GitHub - maiziex/MARS: A package for Multiple Samples Alignment-base SV Calling and Refinement

Multiple Alignment-based Refinement of Svs (MARS)

Hominid Ancestral Population analysis (HARP) mode (set "--HARP_flag" to 1 )

Install through Bioconda (The latest version 1.2.4):

conda install MARS

(Please ensure channels are properly setup for bioconda before installing)

MARS_step1 --help
MARS_step2 --help
# You can also check the below corresponding scripts for more details

Install through Github:

git clone /~https://github.com/maiziex/MARS.git
cd MARS
chmod +x install.sh
./install.sh

Dependencies through Github install:

MARS utilizes Python3, paftools, SAMtools, and minimap2. To be able to execute the above programs by typing their name on the command line, the program executables must be in one of the directories listed in the PATH environment variable (".bashrc").
Or you could just run "./install.sh" to install them, but make sure you have installed "conda" and "wget" first.

To use MARS, please also make sure install the below python packages through pip or conda.
pip install Bio
Conda install pysam
Conda install pandas
conda install minimap2
conda install openpyxl

Example data in Zenodo:

Please download the example data from Zenodo.

Ape_ref
|-Gorilla_gorilla_ref.fasta
|-pongo_abelii_ref.fasta
|-pan_troglodytes_ref.fasta
|-macaca_mulatta_ref.fasta

refdata-GRCh38-2.1.0
|-fasta
     |-genome.fa     
     
gnomAD_hg38_snp
|-gnomad.genomes.r2.1.1.sites.22.liftover_grch38.vcf.bgz
     
Aquila_results_4samples
|-HG00353
|      |-Assembly_Contigs_files
|                |-Aquila_Contig_chr22.fasta
|-HG00250
|      |-Assembly_Contigs_files
|                |-Aquila_Contig_chr22.fasta			 
|-HG00513
|      |-Assembly_Contigs_files
|                |-Aquila_Contig_chr22.fasta
|-HG00512
      |-Assembly_Contigs_files
                 |-Aquila_Contig_chr22.fasta

Running The Code:

Put the "MARS/bin" in the ".bashrc" file, and source the ".bashrc" file
Or use the fullpath of "MARS_step1.py" and "MARS_step2.py"

Step 1: Assembly-based structural variants calling for population

MARS_step1.py  --assembly_dir Aquila_results_4samples --ref_file refdata-GRCh38-2.1.0/fasta/genome.fa  --SV_len 20  --sample_list 'HG00250','HG00353','HG00512','HG00513' --chr_start 22 --chr_end 22 --out_dir MARS_step1_results

*Required parameters

--assembly_dir: "Aquila_results_4samples" is the input folder where you store the diploid assembled contig files for each sample by Aquila/Aquila_stLFR.

--ref_file: "refdata-GRCh38-2.1.0/fasta/genome.fa" is the human reference fasta (hg38) file.

--sample_list: 'HG00250','HG00353','HG00512' ... are the sample names corresponding to your contig files, which is the prefix of the contig files.

*Optional parameters

--out_dir: default = ./MARS_step1_results, it is the folder name you can define to store the final results.

--SV_len: default = 20, it is the SV size you can define.

--num_threads: default = 2, it is the number of threads you can define to perform assembly-based variant calling, which corresponds to number of samples.

--chr_start --chr_end: For example: use "--chr_start 1 --chr_end 5" will perform results for chromsomes 1,2,3,4,5. Use "--chr_start 22 --chr_end 22" will only assembly chromosome 22.

Step 2: Merge SV and Generate the multiple samples alignment-based SV files

MARS_step2.py  --in_dir MARS_step1_results --assembly_dir Aquila_results_4samples --ref_file refdata-GRCh38-2.1.0/fasta/genome.fa  --sample_list 'HG00250','HG00353','HG00512','HG00513' --chr_start 22 --chr_end 22 --out_dir MARS_step2_results

OR add "-gnomad_flag"

MARS_step2.py  --in_dir MARS_step1_results --assembly_dir Aquila_results_4samples --ref_file refdata-GRCh38-2.1.0/fasta/genome.fa  --sample_list 'HG00250','HG00353','HG00512','HG00513' --chr_start 22 --chr_end 22 --out_dir MARS_step2_results -gnomad_flag 1 --gnomad_dir gnomAD_hg38_snp

OR add "--HARP_flag"

MARS_step2.py  --in_dir MARS_step1_results --assembly_dir Aquila_results_4samples --ref_file refdata-GRCh38-2.1.0/fasta/genome.fa  --sample_list 'HG00250','HG00353','HG00512','HG00513' --chr_start 22 --chr_end 22 --out_dir MARS_step2_results  --HARP_flag 1 --Ape_ref_list "Ape_ref/Gorilla_gorilla_ref.fasta","Ape_ref/pan_troglodytes_ref.fasta","Ape_ref/pongo_abelii_ref.fasta","Ape_ref/macaca_mulatta_ref.fasta"

OR add "--HARP_flag" and "-gnomad_flag"

MARS_step2.py  --in_dir MARS_step1_results --assembly_dir Aquila_results_4samples --ref_file refdata-GRCh38-2.1.0/fasta/genome.fa  --sample_list 'HG00250','HG00353','HG00512','HG00513' --chr_start 22 --chr_end 22 --out_dir MARS_step2_results  --HARP_flag 1 --Ape_ref_list "Ape_ref/Gorilla_gorilla_ref.fasta","Ape_ref/pan_troglodytes_ref.fasta","Ape_ref/pongo_abelii_ref.fasta","Ape_ref/macaca_mulatta_ref.fasta"  -gnomad_flag 1 --gnomad_dir gnomAD_hg38_snp

*Required parameters

--in_dir: "MARS_step1_results" is the folder to store SV calling results from step1.

--assembly_dir: "Aquila_results_4samples" is the input folder where you store the diploid assembled contig files for each sample.

--ref_file: "refdata-GRCh38-2.1.0/fasta/genome.fa" is the human reference fasta (hg38) file.

--sample_list: 'HG00250','HG00353','HG00512'... are the sample names corresponding to your contig files, which is the prefix of the contig files.

*Optional parameters

--out_dir: default = ./MARS_step2_results, it is the folder name you can define to store the final results.

--SV_len: default = 20, it is the SV size you can define.

--num_threads_bychr: default = 2, it is the number of chromosomes you can define to perform MARS parallely.

--chr_start --chr_end: For example: use "--chr_start 1 --chr_end 5" will perform results for chromsomes 1,2,3,4,5. Use "--chr_start 22 --chr_end 22" will only assembly chromosome 22.

--gnomad_flag_linked_snp; -gnomad_flag: default = 0. If flag set to 1, MARS will output linked dbSNP from gnomad for each SV.

--gnomad_dir: If "gnomad_flag" set to 1, the users need to download gnomad VCF files and give a path to the folder which stores these gnomAD VCF files. (For example: use "wget https://storage.googleapis.com/gnomad-public/release/2.1.1/liftover_grch38/vcf/genomes/gnomad.genomes.r2.1.1.sites.10.liftover_grch38.vcf.bgz" to download gnomAD VCF file for chr10) For details, please check gnomAD downloads website)

--HARP_flag: default = 0. If flag set to 1, it will output SV with ancestral state: derived deletions and derived insertions.

--Ape_ref_list: If --HARP_flag set to 1, the users need to iniatize the Ape reference genomes they want to use. "Gorilla_gorilla_ref.fasta", "pan_troglodytes_ref.fasta", "pongo_abelii_ref.fasta", and "macaca_mulatta_ref.fasta" are the reference fasta files for each Ape. Each reference file is seperately by comma (",")

Output files:

1. In the "MARS_step2_results/Final_tables" folder:

SV_msa_table_chr*.txt
derived_del_msa_table_chr*.txt (set "--HARP_flag = 1")
derived_ins_msa_table_chr*.txt (set "--HARP_flag = 1")
derived_complex_msa_table_chr*.txt (set "--HARP_flag = 1")
SV_linked_with_gnomad_SNP_chr*.txt (set "-gnomad_flag = 1")
Consensus_seq_for_SV_chr*.fasta

2. In the "MARS_step2_results/MSA_SV_results_chr*/MSA_SV_files" folder, it includes txt/html files for each SV:

SV with 10bp left and right flanking regions around breakpoints, for example a txt file called by "chr22_29430067_29430067_ins_SV_Ref_bk_2.txt", "chr22_29430067_29430067_ins" is the name of the SV which corresponds to the first field/column in "SV_msa_table_chr*.txt" file.

2. SV with around 500bp left and right flanking regions: a MSA html file: a html file

Name		Name	Last commit message	Last commit date
Latest commit History 210 Commits
bin		bin
source		source
LICENSE		LICENSE
README.md		README.md
install.sh		install.sh
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Install through Bioconda (The latest version 1.2.4):

Install through Github:

Dependencies through Github install:

Example data in Zenodo:

Running The Code:

Step 1: Assembly-based structural variants calling for population

*Required parameters

*Optional parameters

Step 2: Merge SV and Generate the multiple samples alignment-based SV files

*Required parameters

*Optional parameters

Output files:

1. In the "MARS_step2_results/Final_tables" folder:

2. In the "MARS_step2_results/MSA_SV_results_chr*/MSA_SV_files" folder, it includes txt/html files for each SV:

Troubleshooting:

Please submit issues on the github page for MARS.

About

Releases 1

Packages

Contributors 2

Languages

License

maiziex/MARS

Folders and files

Latest commit

History

Repository files navigation

Install through Bioconda (The latest version 1.2.4):

Install through Github:

Dependencies through Github install:

Example data in Zenodo:

Running The Code:

Step 1: Assembly-based structural variants calling for population

*Required parameters

*Optional parameters

Step 2: Merge SV and Generate the multiple samples alignment-based SV files

*Required parameters

*Optional parameters

Output files:

1. In the "MARS_step2_results/Final_tables" folder:

2. In the "MARS_step2_results/MSA_SV_results_chr*/MSA_SV_files" folder, it includes txt/html files for each SV:

Troubleshooting:

Please submit issues on the github page for MARS.

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Languages

Packages