Introduction

This repository contains analysis scripts and data associated with our manuscript:

Stern DB, Anderson NW, Diaz JA, and CE Lee. Genome-wide signatures of synergistic epistasis during parallel adaptation in a Baltic Sea copepod

Usage

Scripts are organized into three directories: snp_calling, selection_analyses, and simulations.
Command-line options for the Python scripts can be displayed with the -h flag, e.g., baypass2freqs_cov.py -h

snp_calling -- reference assembly, SNP calling, SNP data processing

assemble_poolseq.commands.txt -- Commands used to generate the 'pseudoreference' genome by 'tiling' Pool-seq data onto the transcriptome in an iterative mapping and assembly approach
baypass2freqs_cov.py -- Converts a file from multipopulation BayPass format (refcount1 altcount1 etc.) to frequencies of the alt allele and a coverage matrix
bams2SNPs.commands.sh -- Commands used to call SNPs and generate allele count files
calculate_coverage_distribution_sync.py -- Calculates the top X percentage of coverage across all pools from a sync file
filter_fasta_by_blast.py -- Filters a multifasta file based on whether sequences had a significant BLAST hit to some sequence database or genome
filter_sync_by_snplist.py -- Filters a sync file (PoPoolation2) by a list of SNPs to keep (e.g. a snpdet file produced by poolfstat)
get_mates.py -- For a set of left/R1 reads, fetches the corresponding right/R2 mates
get_SNP_position_in_genome.py -- Converts SNP positions called in one reference genome to approximate positions in another genome based on BLAST results
vcf2genobaypass.R -- R commands to generate the read count file from the VarScan VCF using poolfstat
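
To illustrate the kind of conversion baypass2freqs_cov.py is described as performing, here is a minimal sketch. It is not the repository's code: the function name `baypass_to_freqs_cov` and the example counts are hypothetical. It assumes each input line holds one SNP with whitespace-separated count pairs per population (refcount1 altcount1 refcount2 altcount2 ...), as the description above suggests.

```python
def baypass_to_freqs_cov(lines):
    """Convert BayPass-style count lines to alt-allele frequencies and coverage.

    Hypothetical helper for illustration only. Each line is one SNP with
    ref/alt count pairs, one pair per population.
    Returns (freqs, cov), each a list of rows with one value per population.
    """
    freqs, cov = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        counts = [int(x) for x in fields]
        if len(counts) % 2 != 0:
            raise ValueError("expected an even number of count columns")
        f_row, c_row = [], []
        # Even-indexed columns are ref counts, odd-indexed columns are alt counts.
        for ref, alt in zip(counts[0::2], counts[1::2]):
            total = ref + alt
            c_row.append(total)
            # Frequency is undefined when a pool has no reads at this site.
            f_row.append(alt / total if total > 0 else float("nan"))
        freqs.append(f_row)
        cov.append(c_row)
    return freqs, cov


# Made-up counts: two SNPs, two populations.
example = ["10 30 20 20", "0 50 5 15"]
freqs, cov = baypass_to_freqs_cov(example)
print(freqs)  # alt-allele frequency per SNP and population
print(cov)    # total read depth per SNP and population
```

The actual script may handle missing data, headers, or output formatting differently; run it with -h for its real options.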

selection_analyses -- CMH, Chi-square, & LMM tests, calculating Jaccard index

ACER_code.R -- R commands used to run the Chi-square and CMH tests on SNPs
determine_AFC_cutoff.R -- R commands to simulate neutral allele frequency change to determine a cutoff for calling an allele as under selection in a given line
parallelism_functions.R -- R functions to calculate the Jaccard index and RFS for the empirical data
prep_lmm.R -- R code specific to this study for generating the input file for the LMM analysis of SNP frequency trajectories. Uses the files in the data directory
- 'prep_lmm.rawAFC.R' - same as above but does not transform the allele frequencies
- 'prep_lmm.rawFreqs.R' - same as above but uses raw allele frequencies rather than divergence from the ancestor
run_lmm.R -- R script to run the linear mixed model with lme4 on every called SNP. Uses the output from prep_lmm.R
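
For readers unfamiliar with the Jaccard index mentioned above, the following is a minimal Python sketch of the standard definition (size of the intersection divided by size of the union). It is not taken from parallelism_functions.R, which is written in R and may differ in detail; the function name `jaccard_index` and the SNP identifiers are hypothetical.

```python
def jaccard_index(a, b):
    """Standard Jaccard index of two collections: |A & B| / |A | B|.

    Hypothetical helper for illustration only. Returns NaN when both
    collections are empty, since the index is undefined in that case.
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return float("nan")
    return len(a & b) / len(union)


# Made-up sets of alleles called as selected in two replicate lines.
line1 = {"snp1", "snp2", "snp3"}
line2 = {"snp2", "snp3", "snp4"}
overlap = jaccard_index(line1, line2)
print(overlap)  # 2 shared alleles out of 4 total
```

A value of 1 means the two lines share exactly the same selected alleles, and 0 means they share none.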

simulations -- SLiM script and commands for running epistasis simulations using our empirical parameters

epistasis_simulations.slim -- SLiM script to run the simulations. Contains the fitness functions used in the study.
run_slim.sh -- Command to execute the SLiM script. Parameter values are set in the command line.
sortedhbdata.csv -- Data from the 121 selected alleles (haplotype blocks) used in the simulations.

Additional information and simulation scenarios can be found here

Software required to run these scripts

Python packages

Python version 3.8.2

R packages

R version 4.0.4

Other software used in the manuscript

Data

SNPs and allele counts derived from the Pool-seq data are available in the data directory. Please see the README file within for information.

Please contact the authors with questions or issues.

TheDBStern/Baltic_Lab_Wild
