
finding deletions in parent-child duos

Brent Pedersen edited this page Jul 21, 2019 · 2 revisions

duo-del

slivar duo-del finds structural deletions in parent-child duos by looking for alleles that were not transmitted. For example, if the parent's genotype is 1/1 and the child's is 0/0, the child did not inherit the alternate allele that the parent must have passed on, so we can infer that the child may have lost this portion of DNA on the chromosome inherited from that parent (a deletion).
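The site-level rule can be sketched in a few lines of bash. This is an illustration under simple assumptions (biallelic GT strings, no depth or quality checks), not slivar's implementation; the function names gt_class and duo_site are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not slivar's code: classify one duo site by genotype.
# Assumes biallelic GT strings such as 0/0, 0|1, 1/1; anything else is "unknown".

# Map a GT string to hom_ref, het, hom_alt, or unknown.
gt_class() {
  case "$1" in
    0/0|0\|0) echo hom_ref ;;
    0/1|1/0|0\|1|1\|0) echo het ;;
    1/1|1\|1) echo hom_alt ;;
    *) echo unknown ;;
  esac
}

# Print "supporting" when parent and kid are opposite homozygotes (the kid
# carries no copy of the allele the parent must have transmitted), "kid_het"
# when the kid is heterozygous, and "other" otherwise.
duo_site() {
  local parent kid
  parent=$(gt_class "$1")
  kid=$(gt_class "$2")
  if { [ "$parent" = hom_alt ] && [ "$kid" = hom_ref ]; } ||
     { [ "$parent" = hom_ref ] && [ "$kid" = hom_alt ]; }; then
    echo supporting
  elif [ "$kid" = het ]; then
    echo kid_het
  else
    echo other
  fi
}
```

For example, `duo_site 1/1 0/0` prints `supporting`. The mirrored case, `duo_site 0/0 1/1`, also prints `supporting`, because there the child lacks the reference allele the parent must have transmitted.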

In practice, a single opposite-homozygous site is most likely a genotyping error, so candidates are filtered by:

  • using only high-quality variants
  • excluding problematic regions such as low-complexity regions (LCRs) and self-chain regions
  • requiring consecutive supporting sites to infer a lost region
  • removing apparent candidates that are interspersed with heterozygous calls in the child
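The consecutive-site and child-het filters can be illustrated with a simplified run-joining sketch. It assumes a position-sorted, tab-separated stream of chrom, pos, and a site label: supporting (parent and child are opposite homozygotes), kid_het, or other. The function name call_runs and its minimum-site threshold are hypothetical, and slivar's actual rules may differ; for example, the hq_kid_hets and kid_hets output columns suggest it reports some hets inside events rather than always breaking on them.

```bash
#!/usr/bin/env bash
# Simplified sketch, NOT slivar's algorithm: join consecutive supporting
# sites into candidate regions.
# stdin:  position-sorted TSV "chrom<TAB>pos<TAB>label" (label is
#         supporting, kid_het, or other).
# stdout: BED "chrom start end n_supporting n_total" for runs with at least
#         $1 supporting sites (hypothetical threshold, default 3).
call_runs() {
  local min_sites=${1:-3}
  awk -v min="$min_sites" 'BEGIN { FS = OFS = "\t" }
    function flush() {
      # Emit the current run if it has enough supporting sites, then reset.
      if (nsup > 0 && nsup >= min) print chrom, first - 1, last, nsup, ntot_at_last
      nsup = 0; ntot = 0
    }
    {
      if ($1 != chrom) { flush(); chrom = $1 }  # runs never span chromosomes
      if ($3 == "kid_het") { flush(); next }    # a het in the kid breaks the run
      if ($3 == "supporting") {
        if (nsup == 0) { first = $2; ntot = 0 }
        nsup++; ntot++; last = $2; ntot_at_last = ntot
      } else if (nsup > 0) {
        ntot++  # non-informative site inside a run: counted, does not break it
      }
    }
    END { flush() }'
}
```

Non-informative sites are counted but do not end a run, mirroring the distinction between n_supporting_sites and n_total_sites in the output. Run it as, for example, `call_runs 3 < labelled_sites.tsv`, where labelled_sites.tsv is a hypothetical file of labelled sites.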

Usage

slivar duo-del \
    --ped "$ped" \
    --exclude selfchain.bed \
    "$vcf" > deletions.bed

The resulting deletions.bed should contain few, high-quality candidates; in high-quality exomes, there is often only about 1 candidate per 20 families.

Using an exclude file such as selfchain.bed is strongly recommended. One that combines self-chain regions and LCRs can be made for hg19 with:

(wget -O - http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/chainSelf.txt.gz \
      | zcat - \
      | awk 'BEGIN{FS=OFS="\t"} $13 > 90 { print $3,$5,$6; print $7,$10,$11 }' \
      | sed -e 's/^chr//' ;
 wget -O - https://github.com/lh3/varcmp/raw/master/scripts/LCR-hs37d5.bed.gz \
      | zcat - ) \
      | sort -k1,1 -k2,2n -k3,3n | bedtools merge > selfchain-LCR.hg19.bed

and for hg38 with:

(wget -O - http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/chainSelf.txt.gz \
      | zcat - \
      | awk 'BEGIN{FS=OFS="\t"} $13 > 90 { print $3,$5,$6; print $7,$10,$11 }';
 wget -O - https://github.com/lh3/varcmp/raw/master/scripts/LCR-hs38.bed.gz \
      | zcat - ) \
      | sed -e 's/^chr//' \
      | sort -k1,1 -k2,2n -k3,3n | bedtools merge > selfchain-LCR.hg38.bed
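Before running duo-del, a quick look at the merged exclude file can catch problems early. The small helper below (bed_summary is a hypothetical name, not part of slivar) reports the number of intervals, the total bases covered, and the distinct chromosome names. Both recipes produce names without a chr prefix, so comparing those names with the contigs in your VCF header is a cheap way to spot a naming mismatch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise a BED file before using it with --exclude.
bed_summary() {
  local bed=$1
  # Interval count and total covered bases (BED intervals are 0-based, half-open).
  awk 'BEGIN { FS = "\t" } { n++; bp += $3 - $2 }
       END { printf "intervals=%d bases=%.0f\n", n, bp }' "$bed"
  # Distinct chromosome names, comma-separated, for comparison with the VCF.
  cut -f1 "$bed" | LC_ALL=C sort -u | paste -sd, -
}
```

For example, run `bed_summary selfchain-LCR.hg38.bed` and compare the names it prints with the output of `bcftools view -h "$vcf" | grep '^##contig'`.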

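Note that duo-del effectively looks for loss of heterozygosity (LOH), but the parent-child (lack of) transmission information gives more power than simply looking for LOH: an opposite-homozygous site is direct evidence that an allele was not transmitted, whereas a homozygous call in the child alone often occurs by chance.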

In addition to chrom, start, and end, the output contains the following columns:

  • n_supporting_sites: the number of sites in the region where the parent and child were opposite homozygotes
  • n_total_sites: the total number of sites in the region (including low-quality and non-informative sites)
  • kid_id, parent_id: sample IDs of the duo
  • kid_median_dp, parent_median_dp: median normalized depth at the LOH sites in the event. Depths are first normalized within each sample to a mean of 1, then within each site (across samples) to a mean of 1.
  • hq_kid_hets, kid_hets: the number of high-quality hets and of all hets in the kid within the event
  • hq_parent_hets, hq_parent_hom_alts: the number of high-quality heterozygous and homozygous-alternate variants in the parent within the event
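The two-step depth normalization behind kid_median_dp and parent_median_dp can be sketched as follows. It follows the description above but assumes details the description does not state: the input is a plain sites-by-samples matrix of raw depths, each sample's mean is taken over all rows of that matrix, and a zero mean maps to 0. normalize_depths and column_median are hypothetical helpers, not part of slivar.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the two-step depth normalisation; not slivar's code.
# Input file: TSV of raw depths, one row per site, one column per sample.
# Output: the matrix after (1) scaling each sample (column) to mean 1 and
# (2) scaling each site (row) to mean 1, printed with 3 decimals.
normalize_depths() {
  awk 'BEGIN { FS = OFS = "\t" }
    { for (j = 1; j <= NF; j++) { d[NR, j] = $j; colsum[j] += $j }; nc = NF }
    END {
      nr = NR
      for (i = 1; i <= nr; i++) {
        rowsum = 0
        # Step 1: divide by the sample (column) mean; zero-depth samples stay 0.
        for (j = 1; j <= nc; j++) {
          v[j] = (colsum[j] > 0) ? d[i, j] / (colsum[j] / nr) : 0
          rowsum += v[j]
        }
        # Step 2: divide by the site (row) mean of the step-1 values.
        line = ""
        for (j = 1; j <= nc; j++) {
          x = (rowsum > 0) ? v[j] / (rowsum / nc) : 0
          line = line (j > 1 ? OFS : "") sprintf("%.3f", x)
        }
        print line
      }
    }' "$1"
}

# Median of one column (1-based index) read from stdin, e.g. the kid's
# normalised depths restricted to the sites of one event.
column_median() {
  cut -f"$1" | LC_ALL=C sort -n | awk '{ a[NR] = $1 }
    END {
      if (NR == 0) exit
      if (NR % 2) print a[(NR + 1) / 2] + 0
      else print (a[NR / 2] + a[NR / 2 + 1]) / 2
    }'
}
```

For example, with the kid in column 1 and the event at rows 3 and 4 of a hypothetical depths.tsv, `normalize_depths depths.tsv | sed -n '3,4p' | column_median 1` gives the kid's median. With only a handful of samples, the site-level mean includes the child itself and pulls its value toward 1, so values near 0.5 are expected mainly when many samples are normalized together.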

A high-quality deletion will have zero (or only a small fraction of) hq_kid_hets and a kid_median_dp of about 0.5, because the child carries only one copy of the region. If the deletion is inherited, the parent will also have few or no hq_parent_hets and a parent_median_dp of about 0.5.
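The criteria above can be applied to deletions.bed with a short awk filter. This is a sketch under stated assumptions: it assumes the file begins with a "#"-prefixed header line naming the columns (at least hq_kid_hets and kid_median_dp), and the default thresholds (0 high-quality kid hets, depth between 0.3 and 0.7) are hypothetical values to tune, not slivar defaults. filter_deletions is a hypothetical name; if no such header is found, the function reports an error instead of guessing column positions.

```bash
#!/usr/bin/env bash
# Sketch: keep duo-del candidates that look like clean single-copy deletions.
# ASSUMPTIONS: a "#"-prefixed header line names the columns (at least
# hq_kid_hets and kid_median_dp); thresholds are hypothetical, not slivar defaults.
# Args: bed [max_hq_kid_hets=0] [min_dp=0.3] [max_dp=0.7]
filter_deletions() {
  local bed=$1 max_hets=${2:-0} lo=${3:-0.3} hi=${4:-0.7}
  awk -v max_hets="$max_hets" -v lo="$lo" -v hi="$hi" '
    BEGIN { FS = OFS = "\t"; max_hets += 0; lo += 0; hi += 0 }
    /^#/ {
      # Record column positions by name, then echo the header unchanged.
      sub(/^#/, "")
      for (i = 1; i <= NF; i++) col[$i] = i
      print "#" $0
      next
    }
    {
      if (!("hq_kid_hets" in col) || !("kid_median_dp" in col)) {
        print "filter_deletions: header with hq_kid_hets and kid_median_dp not found" > "/dev/stderr"
        exit 1
      }
      hets = $(col["hq_kid_hets"]) + 0
      dp = $(col["kid_median_dp"]) + 0
      # Few high-quality kid hets and roughly half the expected depth.
      if (hets <= max_hets && dp >= lo && dp <= hi) print
    }' "$bed"
}
```

For example, `filter_deletions deletions.bed 0 0.3 0.7 > deletions.hq.bed`. Raising the second argument allows a small number of high-quality kid hets, in line with "zero (or only a small fraction of)" above.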
