Jg/v3.1 vcf changes #365

Merged: 31 commits, May 18, 2021
31 commits
4bae796
Add dict_hists parameter to make_hist_dict
jkgoodrich Apr 2, 2021
03e5e37
Add REGION_FLAG_FIELDS and docstrings
jkgoodrich Apr 2, 2021
8f1f63b
Add mid to POP_NAMES
jkgoodrich Apr 2, 2021
043dda5
Remove REGION_FLAG_FIELDS
jkgoodrich Apr 5, 2021
11b0c02
Add label_delimiter to make_hist_bin_edges_expr
jkgoodrich Apr 5, 2021
8138191
Fix AS_VQSR_FIELDS
jkgoodrich Apr 7, 2021
57ac5e4
add `adjust_vcf_incompatible_types` and use within `ht_to_vcf_mt`
jkgoodrich Apr 7, 2021
b430607
add a mapping to KG POP Names
jkgoodrich Apr 7, 2021
b1fc46c
switch KG_POP_NAMES to lowercase
jkgoodrich Apr 7, 2021
c8f05c7
fix isinstance check
jkgoodrich Apr 7, 2021
6db2adc
Add default info_expr
jkgoodrich Apr 7, 2021
55ab963
Fix AC to metric in loop
jkgoodrich Apr 8, 2021
cba8f8c
Sanity checks temp changes for faster testing
jkgoodrich Apr 13, 2021
be461e2
Make adjust_vcf_incompatible_types and ht_to_vcf_mt to be only for ta…
jkgoodrich Apr 15, 2021
fed1069
Fix type of label groups
jkgoodrich Apr 15, 2021
7401f2c
Rollback addition of sanity check changes
jkgoodrich Apr 15, 2021
5edf078
Apply suggestions from code review
jkgoodrich Apr 21, 2021
d1b5eac
Merge branches 'jg/v3.1_vcf_changes' and 'master' of https://github.c…
jkgoodrich May 3, 2021
9f80fcd
Fix docstrings with review comments
jkgoodrich May 3, 2021
265ddea
change AF to "allele frequency" in descriptions
jkgoodrich May 3, 2021
0faa81a
Address review comments
jkgoodrich May 3, 2021
4e08e2a
Change KG_POP_NAMES to TGP_POP_NAMES
jkgoodrich May 7, 2021
e2afa12
Add to INFO_DICT
jkgoodrich May 7, 2021
00ef655
Remove BaseQRankSum from SITE_FIELDS and AS_BaseQRankSum from AS_FIELDS
jkgoodrich May 7, 2021
5fb5cbd
update REGION_FLAG_FIELDS description because a segdup file now exists
jkgoodrich May 7, 2021
78f4ae8
Move IN_SILICO_ANNOTATIONS_INFO_DICT from gnomad_qc
jkgoodrich May 7, 2021
5345fa6
tiny docstring change
jkgoodrich May 14, 2021
b04fdf4
Apply suggestions from code review
jkgoodrich May 18, 2021
14c036e
Finish addressing review comments
jkgoodrich May 18, 2021
ff491d4
Merge remote-tracking branch 'origin/jg/v3.1_vcf_changes' into jg/v3.…
jkgoodrich May 18, 2021
f2d95fe
Merge branch 'master' into jg/v3.1_vcf_changes
jkgoodrich May 18, 2021
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,8 @@
* Added function `make_freq_index_dict` to create a look-up dictionary for entries contained in the frequency annotation array [(#349)](/~https://github.com/broadinstitute/gnomad_methods/pull/349/files)
* VersionedResource objects are no longer subclasses of BaseResource [(#359)](/~https://github.com/broadinstitute/gnomad_methods/pull/359)
* gnomAD resources can now be imported from different sources [(#373)](/~https://github.com/broadinstitute/gnomad_methods/pull/373)
* Replaced `ht_to_vcf_mt` with `adjust_vcf_incompatible_types`, which keeps all of its functionality except converting the Table into a MatrixTable, since that conversion is no longer needed to use the Hail module `export_vcf` [(#365)](/~https://github.com/broadinstitute/gnomad_methods/pull/365/files)


## Version 0.5.0 - April 22nd, 2021

73 changes: 73 additions & 0 deletions gnomad/resources/grch38/gnomad.py
@@ -27,10 +27,36 @@
"tgp",
"hgdp",
]
"""
Order to sort subgroupings during VCF export.

Ensures that INFO labels in VCF are in desired order (e.g., tgp_raw_AC_esn_XX).
"""

GROUPS = ["adj", "raw"]
"""
Group names used to generate labels for high-quality genotypes and all raw genotypes.

Used in VCF export.
"""

SEXES = ["XX", "XY"]
"""
Sample sexes used in VCF export.

Used to stratify frequency annotations (AC, AN, AF) for each sex.
"""

POPS = ["afr", "ami", "amr", "asj", "eas", "fin", "nfe", "oth", "sas", "mid"]
"""
Global populations in gnomAD v3.
"""

COHORTS_WITH_POP_STORED_AS_SUBPOP = ["tgp", "hgdp"]
"""
Subsets in gnomAD v3.1 that are broken down by their known subpops instead of global pops in the frequency struct.
"""

TGP_POPS = [
"esn",
"pur",
@@ -59,6 +85,10 @@
"chs",
"gwd",
]
"""
1000 Genomes Project (1KG/TGP) subpops.
"""

HGDP_POPS = [
"japanese",
"papuan",
@@ -113,8 +143,48 @@
"burusho",
"maya",
]
"""
Human Genome Diversity Project (HGDP) subpops.
"""

TGP_POP_NAMES = {
"chb": "Han Chinese",
"jpt": "Japanese",
"chs": "Southern Han Chinese",
"cdx": "Chinese Dai",
"khv": "Kinh",
"ceu": "Utah Residents (European Ancestry)",
"tsi": "Toscani",
"fin": "Finnish",
"gbr": "British",
"ibs": "Iberian",
"yri": "Yoruba",
"lwk": "Luhya",
"gwd": "Gambian",
"msl": "Mende",
"esn": "Esan",
"asw": "African-American",
"acb": "African Caribbean",
"mxl": "Mexican-American",
"pur": "Puerto Rican",
"clm": "Colombian",
"pel": "Peruvian",
"gih": "Gujarati",
"pjl": "Punjabi",
"beb": "Bengali",
"stu": "Sri Lankan Tamil",
"itu": "Indian Telugu",
}
"""
Map of 1000 Genomes Project (1KG/TGP) subpop codes to descriptive population names.
"""

POPS_STORED_AS_SUBPOPS = TGP_POPS + HGDP_POPS
POPS_TO_REMOVE_FOR_POPMAX = {"asj", "fin", "oth", "ami", "mid"}
"""
Populations that are removed before popmax calculations.
"""

DOWNSAMPLINGS = [
10,
20,
@@ -143,6 +213,9 @@
110000,
120000,
]
"""
List of the downsampling numbers to use for frequency calculations.
"""

gnomad_syndip = VersionedMatrixTableResource(
default_version="3.0",
3 changes: 2 additions & 1 deletion gnomad/sample_qc/ancestry.py
@@ -20,7 +20,8 @@
"eas": "East Asian",
"eur": "European",
"fin": "Finnish",
"mde": "Middle Eastern",
"mde": "Middle Eastern", # NOTE: mde is kept for historical purposes, in gnomAD v3.1 mid was used instead
"mid": "Middle Eastern",
"nfe": "Non-Finnish European",
"oth": "Other",
"sas": "South Asian",