This repository contains analysis scripts and data associated with our manuscript:
Stern DB, Anderson NW, Diaz JA, Lee CE. Genome-wide signatures of synergistic epistasis during parallel adaptation in a Baltic Sea copepod.
Scripts are organized into three directories: snp_calling, selection_analyses, and simulations.
Command-line options for each Python script can be listed with the -h flag, e.g., python baypass2freqs_cov.py -h
- assemble_poolseq.commands.txt -- Commands used to generate the 'pseudoreference' genome by 'tiling' Pool-seq data onto the transcriptome in an iterative mapping and assembly approach
- baypass2freqs_cov.py -- Converts a file from multipopulation BayPass format (refcount1 altcount1 etc.) to a matrix of alt allele frequencies and a coverage matrix
- bams2SNPs.commands.sh -- Commands used to call SNPs and generate allele count files
- calculate_coverage_distribution_sync.py -- Calculates the top X percent of coverage across all pools from a sync file
- filter_fasta_by_blast.py -- Filters a multi-FASTA file based on whether each sequence had a significant BLAST hit to a sequence database or genome
- filter_sync_by_snplist.py -- Filters a sync file (PoPoolation2) by a list of SNPs to keep (e.g., a snpdet file produced by poolfstat)
- get_mates.py -- For a set of left/R1 reads, fetches the corresponding right/R2 mates
- get_SNP_position_in_genome.py -- Converts SNP positions called in one reference genome to approximate positions in another genome based on BLAST results
- vcf2genobaypass.R -- R commands to generate the read count file from the VarScan VCF using poolfstat
- ACER_code.R -- R commands used to run the chi-square and Cochran-Mantel-Haenszel (CMH) tests on SNPs
- determine_AFC_cutoff.R -- R commands to simulate neutral allele frequency change (AFC) to determine the cutoff for calling an allele under selection in a given line
- parallelism_functions.R -- R functions to calculate the Jaccard index and RFS for the empirical data
- prep_lmm.R -- R code specific to this study for generating the input file for the linear mixed model (LMM) analysis of SNP frequency trajectories. Uses the files in the data directory
- prep_lmm.rawAFC.R -- Same as above but does not transform the allele frequencies
- prep_lmm.rawFreqs.R -- Same as above but uses raw allele frequencies rather than divergence from the ancestor
- run_lmm.R -- R script to run the linear mixed model with lme4 on every called SNP. Uses the output from prep_lmm.R
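As an illustration of the conversion that baypass2freqs_cov.py performs, here is a minimal Python sketch, not the script itself. It assumes one SNP per row with whitespace-separated ref/alt count pairs per pool (refcount1 altcount1 refcount2 altcount2 ...) and reports NaN for a pool with zero coverage; the function name baypass_to_freqs_cov is hypothetical.

```python
def baypass_to_freqs_cov(lines):
    """Convert BayPass-style count rows to alt-allele frequencies and coverage.

    Hypothetical sketch: assumes each row is refcount1 altcount1 refcount2 altcount2 ...
    (one ref/alt pair per pool). Returns (freqs, covs) as lists of rows.
    """
    freqs, covs = [], []
    for line in lines:
        counts = [int(x) for x in line.split()]
        if not counts:
            continue  # skip blank lines
        if len(counts) % 2:
            raise ValueError("expected an even number of columns (ref/alt pairs)")
        f_row, c_row = [], []
        for ref, alt in zip(counts[0::2], counts[1::2]):
            cov = ref + alt
            c_row.append(cov)
            # Frequency is undefined at zero coverage; NaN is an assumed convention
            f_row.append(alt / cov if cov > 0 else float("nan"))
        freqs.append(f_row)
        covs.append(c_row)
    return freqs, covs


if __name__ == "__main__":
    example = ["10 30 25 25", "0 0 8 2"]
    freqs, covs = baypass_to_freqs_cov(example)
    print(freqs)  # [[0.75, 0.5], [nan, 0.2]]
    print(covs)   # [[40, 50], [0, 10]]
```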
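The coverage calculation behind calculate_coverage_distribution_sync.py can be sketched as below. This is a hypothetical reimplementation, not the repository's code: it assumes the PoPoolation2 sync layout (contig, position, reference base, then one A:T:C:G:N:del count field per pool), counts only A/T/C/G reads toward coverage, and returns the coverage value at which the top X percent begins, which could serve as a maximum-coverage filter.

```python
def pool_coverages(sync_line):
    """Per-pool coverage for one sync line (contig, pos, ref, then A:T:C:G:N:del per pool)."""
    fields = sync_line.rstrip("\n").split("\t")
    coverages = []
    for pool in fields[3:]:
        a, t, c, g, _n, _deletions = (int(x) for x in pool.split(":"))
        coverages.append(a + t + c + g)  # assumption: N and deletion counts are ignored
    return coverages


def top_percent_cutoff(sync_lines, top_percent):
    """Smallest coverage value inside the top `top_percent` percent, pooled over all pools and sites."""
    values = sorted(cov for line in sync_lines for cov in pool_coverages(line))
    if not values:
        raise ValueError("no coverage values found")
    # Index of the first value belonging to the top `top_percent` percent
    k = int(len(values) * (100 - top_percent) // 100)
    return values[max(0, min(k, len(values) - 1))]


if __name__ == "__main__":
    # Hypothetical sync lines: pool 1 has coverage 10..50, pool 2 has 60..100
    demo = [f"ctg1\t{i}\tA\t{5 * i}:{5 * i}:0:0:0:0\t{5 * i + 25}:{5 * i + 25}:0:0:0:0"
            for i in range(1, 6)]
    print(top_percent_cutoff(demo, 10))  # 100
    print(top_percent_cutoff(demo, 20))  # 90
```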
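A filter like filter_sync_by_snplist.py reduces to a set lookup on (contig, position). The Python sketch below assumes the SNP list has the contig name in its first column and the position in its second, and skips any line whose second field is not an integer (e.g., a header); the real snpdet layout and the script's options may differ, and the function names are hypothetical.

```python
def load_snp_keys(snp_lines):
    """Collect (contig, position) keys from a SNP list.

    Assumes contig in column 1 and position in column 2 (hypothetical layout);
    lines whose second field is not an integer, such as a header, are skipped.
    """
    keys = set()
    for line in snp_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            keys.add((fields[0], int(fields[1])))
    return keys


def filter_sync(sync_lines, keys):
    """Yield sync lines whose contig and position appear in `keys`."""
    for line in sync_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and (fields[0], int(fields[1])) in keys:
            yield line


if __name__ == "__main__":
    snps = ["chr\tpos\tref\talt", "ctg1\t150\tA\tG"]  # hypothetical SNP list
    sync = [
        "ctg1\t100\tA\t10:0:0:0:0:0\t9:1:0:0:0:0\n",
        "ctg1\t150\tA\t6:0:0:4:0:0\t3:0:0:7:0:0\n",
    ]
    kept = list(filter_sync(sync, load_snp_keys(snps)))
    print(len(kept))  # 1 (only the ctg1:150 line)
```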
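determine_AFC_cutoff.R derives its cutoff from simulated neutral allele frequency change. One common way to build such a null is Wright-Fisher drift followed by binomial sampling of reads at a given coverage, taking a high quantile of |AFC| as the threshold. The Python sketch below follows that approach under stated assumptions (a diploid population of constant size ne, read sampling at both time points, the 99th percentile by default); it has not been checked against the R script, and the function name and all parameter values are hypothetical rather than the values used in the study.

```python
import random


def binomial(n, p, rng):
    """Exact binomial draw as the sum of n Bernoulli trials (adequate for small n)."""
    return sum(rng.random() < p for _ in range(n))


def neutral_afc_cutoff(p0, ne, generations, coverage, reps=1000, quantile=0.99, seed=1):
    """Return the `quantile` of |AFC| under neutral drift plus Pool-seq sampling noise.

    Hypothetical sketch: assumes a diploid Wright-Fisher population of constant
    size `ne` and binomial read sampling at `coverage` at both time points.
    """
    rng = random.Random(seed)
    changes = []
    for _ in range(reps):
        p = p0
        for _ in range(generations):
            # One generation of drift: sample 2*ne gene copies at frequency p
            p = binomial(2 * ne, p, rng) / (2 * ne)
        # Sequencing the ancestral and evolved pools adds sampling noise
        start = binomial(coverage, p0, rng) / coverage
        end = binomial(coverage, p, rng) / coverage
        changes.append(abs(end - start))
    changes.sort()
    # Empirical quantile by nearest rank, clamped to the last index
    idx = min(int(quantile * reps), reps - 1)
    return changes[idx]


if __name__ == "__main__":
    cutoff = neutral_afc_cutoff(p0=0.5, ne=100, generations=10, coverage=50, reps=500)
    print(f"Neutral |AFC| cutoff: {cutoff:.3f}")
```

Alleles whose observed |AFC| in a line exceeds this cutoff would then be candidates for selection; stronger drift (smaller ne, more generations) or lower coverage raises the cutoff.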
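For reference, the Jaccard index computed in parallelism_functions.R is the size of the intersection of two sets divided by the size of their union. The Python sketch below applies it to hypothetical sets of SNP IDs called as selected in replicate lines and averages it over all pairs of lines; the pairwise averaging and the value 0 for two empty sets are assumptions for illustration, and RFS is not covered.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: intersection size / union size of two collections of SNP IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention when both sets are empty
    return len(a & b) / len(union)


def mean_pairwise_jaccard(lines):
    """Average Jaccard index over all pairs of replicate lines (dict: line -> SNP IDs)."""
    pairs = list(combinations(lines.values(), 2))
    if not pairs:
        raise ValueError("need at least two lines")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    selected = {  # hypothetical SNP IDs
        "line1": {"snp1", "snp2", "snp3"},
        "line2": {"snp2", "snp3", "snp4"},
        "line3": {"snp3"},
    }
    print(jaccard(selected["line1"], selected["line2"]))  # 0.5
    print(round(mean_pairwise_jaccard(selected), 3))       # 0.389
```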
simulations -- SLiM script and commands for running epistasis simulations using our empirical parameters
- epistasis_simulations.slim -- SLiM script to run the simulations. Contains the fitness functions used in the study.
- run_slim.sh -- Command to execute the SLiM script. Parameter values are set in the command line.
- sortedhbdata.csv -- Data from the 121 selected alleles (haplotype blocks) used in the simulations.
Additional information and simulation scenarios can be found here
Software used:
- BLAST 2.7.1+
- BWA-MEM v0.7.17
- CD-HIT v4.7
- PoPoolation2
- SAMBLASTER v0.1.26
- Samtools v1.3.1
- SLiM v3.7
- Trinity v2.6.6
- VarScan v2.4.3
Python version 3.8.2
R version 4.0.4
- ACER v1.0.2
- BBTools
- BEDOPS v2.4.39
- Bowtie v2.3.5
- Gowinda v1.12
- haplovalidate v0.1.4
- HMMER v3.2.1
- RSEM v1.3.1
- PolygenicAdaptationCode
- TransDecoder v5.5
- Trimmomatic v0.39
- TreeMix v1.13
SNPs and allele counts derived from the Pool-seq data are available in the data directory. Please see the README file within for information.
Please contact the authors for questions or issues.