Finding deletions in parent-child duos
slivar duo-del finds structural deletions in parent-child duos using non-transmission of alleles.
For example, given genotypes of 1/1 for the parent and 0/0 for the child, we can infer that the child must have lost this portion of DNA relative to the parent (a deletion).
In reality, a single site like this is most likely a genotyping error, but we can reduce false positives by:
- using only high-quality variants
- removing problematic regions (LCRs, self-chains)
- requiring consecutive signals like this to infer lost regions
- removing apparent candidates interspersed with heterozygote calls in the child.
slivar duo-del \
--ped $ped \
--exclude selfchain.bed \
$vcf > deletions.bed
Here, deletions.bed should contain only a few high-quality candidates, often only 1 candidate per 20 families in high-quality exomes.
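The opposite-homozygote idea described above can be illustrated with a minimal sketch. This is not slivar's implementation. It assumes a hypothetical tab-separated file sites.tsv (chrom, pos, parent genotype, kid genotype) that has already been restricted to high-quality sites outside excluded regions, unphased genotypes written as 0/0, 0/1, 1/1, and a hypothetical threshold min_sites for the number of consecutive supporting sites.

```shell
# Hypothetical input: chrom, pos, parent genotype, kid genotype.
# Assumed to be high-quality sites outside excluded regions already.
printf '1\t100\t0/1\t0/1\n1\t200\t1/1\t0/0\n1\t300\t0/0\t1/1\n1\t400\t1/1\t0/0\n1\t500\t0/1\t0/1\n1\t600\t1/1\t0/0\n1\t700\t0/0\t0/1\n' > sites.tsv

# Report runs of at least min_sites consecutive opposite-homozygote sites.
# A het call in the kid (or a new chromosome) ends the current run;
# other non-informative sites are skipped without ending it.
awk -v min_sites=3 'BEGIN { FS = OFS = "\t" }
function emit_run() {
    if (n >= min_sites) print chrom, start, stop, n
    n = 0
}
{
    opposite = ($3 == "1/1" && $4 == "0/0") || ($3 == "0/0" && $4 == "1/1")
    kid_het = ($4 == "0/1")
    if ($1 != chrom || kid_het) emit_run()
    chrom = $1
    if (opposite) {
        if (n == 0) start = $2 - 1   # BED start is 0-based
        stop = $2
        n++
    }
}
END { emit_run() }' sites.tsv > candidates.bed

cat candidates.bed
```

In this made-up input, the three supporting sites at positions 200 to 400 form one candidate region, while the lone supporting site at 600 is dropped because a kid het follows it before the run reaches min_sites.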
The use of an exclude file like selfchain.bed is strongly recommended.
One can be made for hg19 with:
(wget -O - http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/chainSelf.txt.gz \
| zcat - \
| awk 'BEGIN{FS=OFS="\t"} $13 > 90 { print $3,$5,$6; print $7,$10,$11 }' \
| sed -e 's/^chr//' ;
wget -O - https://github.com/lh3/varcmp/raw/master/scripts/LCR-hs37d5.bed.gz \
| zcat - ) \
| sort -k1,1 -k2,2n -k3,3n | bedtools merge > selfchain-LCR.hg19.bed
and for hg38 with:
(wget -O - http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/chainSelf.txt.gz \
| zcat - \
| awk 'BEGIN{FS=OFS="\t"} $13 > 90 { print $3,$5,$6; print $7,$10,$11 }';
wget -O - https://github.com/lh3/varcmp/raw/master/scripts/LCR-hs38.bed.gz \
| zcat - ) \
| sed -e 's/^chr//' \
| sort -k1,1 -k2,2n -k3,3n | bedtools merge > selfchain-LCR.hg38.bed
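To see what the awk step in the commands above emits, it can be run on a single made-up row. This sketch assumes the UCSC chainSelf table layout, in which columns 3, 5, 6 hold the target name, start, and end, columns 7, 10, 11 hold the query name, start, and end, and column 13 is the normalized score; the row values and file names here are illustrative only.

```shell
# One made-up row in the assumed 13-column chainSelf layout.
printf '585\t5000\tchr1\t248956422\t10000\t20000\tchr2\t242193529\t+\t30000\t40000\t1\t95.0\n' > chainSelf.example.txt

# Same awk and sed steps as in the commands above: keep rows with column 13 > 90,
# print both ends of the self-alignment as BED intervals, strip the chr prefix.
awk 'BEGIN{FS=OFS="\t"} $13 > 90 { print $3,$5,$6; print $7,$10,$11 }' chainSelf.example.txt \
  | sed -e 's/^chr//' \
  | sort -k1,1 -k2,2n -k3,3n > selfchain.example.bed

cat selfchain.example.bed
```

Each retained self-chain row therefore contributes two intervals, one per side of the alignment, and the resulting chromosome names carry no chr prefix.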
Note that slivar duo-del has the effect of looking for loss-of-heterozygosity (LOH), but the parent-child (lack of) transmission information gives more power than simply looking for LOH.
In addition to chrom, start, and end, the output contains the following columns:
- n_supporting_sites: the number of sites in the region where the parent and child were opposite homozygotes
- n_total_sites: the total number of sites in the region (including low-quality and non-informative sites)
- kid_id, parent_id: sample ids of the duo
- kid_median_dp, parent_median_dp: median normalized depths of LOH sites in the event. They are normalized within samples to a mean of 1 and then within a site (across samples) to a mean of 1.
- hq_kid_hets, kid_hets: number of high-quality and total hets in the kid in the event.
- hq_parent_hets, hq_parent_hom_alts: number of high-quality het and hom-alt variants in the parent in the event.
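The two-step depth normalization mentioned for kid_median_dp and parent_median_dp can be sketched as follows. This is one reading of the description, not slivar's code: the file depths.tsv, its sample names, and its depth values are all made up, with one row per site and one column per sample.

```shell
# Made-up raw depths: one row per site, one column per sample.
printf 'site\tkid\tparent\tother\ns1\t20\t40\t60\ns2\t10\t40\t60\ns3\t20\t40\t60\ns4\t30\t40\t60\n' > depths.tsv

awk 'BEGIN { FS = OFS = "\t" }
NR == 1 { print; next }
{
    # Store every depth and accumulate a per-sample sum.
    nsites++
    name[nsites] = $1
    for (i = 2; i <= NF; i++) { d[nsites, i] = $i; sum[i] += $i }
    ncol = NF
}
END {
    for (s = 1; s <= nsites; s++) {
        # Step 1: divide by the sample mean so each sample averages 1.
        site_sum = 0
        for (i = 2; i <= ncol; i++) {
            d[s, i] /= (sum[i] / nsites)
            site_sum += d[s, i]
        }
        # Step 2: divide by the site mean so each site averages 1 across samples.
        line = name[s]
        for (i = 2; i <= ncol; i++)
            line = line OFS sprintf("%.2f", d[s, i] / (site_sum / (ncol - 1)))
        print line
    }
}' depths.tsv > normalized.tsv

cat normalized.tsv
```

At site s2 the kid has half its usual depth. With only three samples the site mean is pulled down by the kid's own drop, so the kid's normalized value there is 0.60 rather than 0.5; with many samples the site mean stays closer to 1.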
A high-quality deletion will have zero (or a small percentage of) hq_kid_hets and a kid_median_dp of ~0.5.
If it is inherited, the parent will also have few (or zero) hq_parent_hets and a parent_median_dp of ~0.5.
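These criteria can be applied with a small awk filter. The sketch below makes several assumptions that should be checked against real output: that the file has a header line naming the columns as listed above (columns are looked up by name rather than position), that a depth between the hypothetical bounds lo=0.3 and hi=0.7 counts as "~0.5", and that zero high-quality hets is required rather than a small percentage. The example rows and sample ids are made up.

```shell
# Made-up output with an ASSUMED header line; verify against your own deletions.bed.
printf '#chrom\tstart\tend\tn_supporting_sites\tn_total_sites\tkid_id\tparent_id\tkid_median_dp\tparent_median_dp\thq_kid_hets\tkid_hets\thq_parent_hets\thq_parent_hom_alts\n' > deletions.example.bed
printf '1\t1000\t9000\t6\t20\tkidA\tdadA\t0.52\t0.98\t0\t1\t3\t6\n' >> deletions.example.bed
printf '2\t5000\t7000\t4\t15\tkidB\tmomB\t0.49\t0.51\t0\t0\t0\t4\n' >> deletions.example.bed
printf '3\t100\t900\t3\t30\tkidC\tdadC\t1.01\t1.00\t5\t9\t7\t3\n' >> deletions.example.bed

# Keep events with no high-quality kid hets and kid depth near 0.5,
# then label whether the parent shows the same pattern.
awk -v lo=0.3 -v hi=0.7 'BEGIN { FS = OFS = "\t" }
NR == 1 {
    sub(/^#/, "", $1)
    for (i = 1; i <= NF; i++) col[$i] = i   # map column name to index
    next
}
{
    kid_dp = $(col["kid_median_dp"]) + 0
    par_dp = $(col["parent_median_dp"]) + 0
    kid_ok = ($(col["hq_kid_hets"]) == 0 && kid_dp >= lo && kid_dp <= hi)
    if (!kid_ok) next
    par_like = ($(col["hq_parent_hets"]) == 0 && par_dp >= lo && par_dp <= hi)
    print $1, $2, $3, $(col["kid_id"]), (par_like ? "inherited_like" : "not_inherited_like")
}' deletions.example.bed > deletions.filtered.tsv

cat deletions.filtered.tsv
```

Here kidA's event passes but the parent looks diploid, kidB's event passes and the parent shows the same signal, and kidC's event is dropped because of its high-quality hets and normal depth.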