Releases: broadinstitute/gnomad_methods
v0.8.2
What's Changed
Breaking Changes
- Update default vep version for context table resource by @ch-kr in #726
- Add coverage_metric param to allow for different metrics of coverage and cov_model_type option to allow for linear or logarithmic by @klaricch in #724
- Add `get_tissues_to_exclude` function to determine what tissues to exclude from transcript annotation calculations by @jkgoodrich in #729
Bug fixes
- Fix `tx_filter_variants_by_csqs` to correctly handle the `ignore_splicing` parameter by @jkgoodrich in #727
New Features
- Add retain cdf option for median calculations when computing info fields by @klaricch in #731
- Add `max_grpmax` option to `get_summary_stats_variant_filter_expr` for filtering by grpmax by @jkgoodrich in #732
- Change filter_mt_to_trios to also filter on vds by @KoalaQin in #739
- Add `get_mu_annotation_expr` function that prevents a shuffle from happening when annotating a HT with mutation rate and use in `annotate_with_mu` by @jkgoodrich in #734
- Add `assemble_constraint_context_ht` function to create a fully annotated context HT for computing constraint on by @jkgoodrich in #733
- Add support for filtering Hail Tables to `filter_to_trios` by @jkgoodrich in #741
- Generalize the `freq_bin_expr` function to take in a list of allele count and allele frequency cutoffs by @jkgoodrich in #745
- Add function `parse_variant` to create a Struct with the locus and alleles from a variant string or contig, position, ref, and alt. by @jkgoodrich in #746
- Modify `filter_vep_transcript_csqs_expr` so it can also accept hl.expr.StructExpression by @jkgoodrich in #748
- Filter to Gencode CDS by genes and by exon paddings by @KoalaQin in #747
- Add functions to support padding and filtering intervals: `filter_by_intervals`, `pad_intervals`, `parse_locus_intervals` by @jkgoodrich in #752
- Add `loftee_labels` and `no_lof_flags` parameters to `filter_vep_transcript_csqs_expr` for filtering by loftee labels and flags by @jkgoodrich in #753
- Add browser tables to resources by @KoalaQin in #750
- Add functions to check struct and array missingness by @klaricch in #738
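To illustrate the two input styles that the `parse_variant` entry above describes, here is a minimal plain-Python sketch. It is not the gnomad_methods implementation: the name `parse_variant_sketch`, the hyphen-separated `contig-position-ref-alt` string format, and the `Variant` tuple are assumptions made for this example, and the real function returns a Hail Struct rather than a Python tuple.

```python
from typing import NamedTuple, Optional, Tuple


class Variant(NamedTuple):
    """Plain-Python stand-in for a locus/alleles struct (hypothetical)."""

    contig: str
    position: int
    alleles: Tuple[str, str]


def parse_variant_sketch(
    variant_str: Optional[str] = None,
    contig: Optional[str] = None,
    position: Optional[int] = None,
    ref: Optional[str] = None,
    alt: Optional[str] = None,
) -> Variant:
    """Build a Variant from a 'contig-pos-ref-alt' string or from the four parts."""
    if variant_str is not None:
        # Assumed string format: four hyphen-separated fields.
        parts = variant_str.split("-")
        if len(parts) != 4:
            raise ValueError(f"Expected 'contig-pos-ref-alt', got {variant_str!r}")
        contig, pos_str, ref, alt = parts
        position = int(pos_str)
    elif None in (contig, position, ref, alt):
        raise ValueError(
            "Provide either variant_str or all of contig, position, ref, alt"
        )
    return Variant(contig, int(position), (ref, alt))
```

Under these assumptions, `parse_variant_sketch("chr1-1000-A-T")` and `parse_variant_sketch(contig="chr1", position=1000, ref="A", alt="T")` describe the same variant.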
Other Changes
- Add import code for GTEx v10 RSEM by @KoalaQin in #742
- Add pext and constraint resources by @KoalaQin in #743
- Bump the pip group in /docs with 2 updates by @dependabot in #715
- Bump jinja2 from 3.1.4 to 3.1.5 in /docs in the pip group across 1 directory by @dependabot in #751
- Bump virtualenv from 20.24.6 to 20.26.6 in the pip group across 1 directory by @dependabot in #754
- Update version to 0.8.2 in setup.py for release by @KoalaQin in #758
- Add gcs connector to PyPi publish by @KoalaQin in #759
Full Changelog: v0.8.1...v0.8.2
v0.8.1
What's Changed
Bug fixes
- Fix `annotate_with_ht` to only use a semi-join when `filter_missing` is True by @jkgoodrich in #709
- Fix bug in `process_consequences` that was introduced when adding support for VEP without polyphen by @jkgoodrich in #710
New Features
- Add explode_downsamplings function by @klaricch in #694
- Update VEP csqs in impact categories to match VEP by @mike-w-wilson in #703
- Add `get_summary_stats_variant_filter_expr` and `get_summary_stats_csq_filter_expr` to build filtering expressions for summary stats by @jkgoodrich in #701
- Add `filter_vep_transcript_csqs_expr`, a version of `filter_vep_transcript_csqs` that takes and returns an ArrayExpression by @jkgoodrich in #713
- Add create_vds function that only supports creating from gvcfs by @mike-w-wilson in #716
- Add functions `fill_missing_key_combinations` and `missing_struct_expr` by @jkgoodrich in #718
Other Changes
- Add a space in joint filter info dict by @KoalaQin in #698
- Change the number of values for stat_union_gen_ancs to unknown by @KoalaQin in #699
- Bump idna from 3.4 to 3.7 in /docs by @dependabot in #692
- Bump jinja2 from 3.1.3 to 3.1.4 in /docs by @dependabot in #700
- Bump requests from 2.31.0 to 2.32.2 in /docs by @dependabot in #708
- Update setup.py for v0.8.1 by @mike-w-wilson in #720
Full Changelog: v0.8.0...v0.8.1
v0.8.0
What's Changed
Breaking Changes
- Add mid to FAF and grpmax calcs by @mike-w-wilson in #658
- Update POPS constant to contain a dictionary of both exomes and genomes by @klaricch in #690
Bug fixes
- Account for missingness in int64 to int32 VCF type conversion by @mike-w-wilson in #668
- Fix `generic_field_check` in validity_checks.py print of failed checks by @jkgoodrich in #693
New Features
- Add RSEM summary function by @jkgoodrich in #647
- Function to get expression proportion by @KoalaQin in #649
- Add GTEx import resources by @KoalaQin in #646
- Add function `agg_by_strata`, which is a generalized version of the `compute_freq_by_strata` by @jkgoodrich in #659
- Clean up `compute_coverage_stats`, change it to use `agg_by_strata` and have an optional `group_membership_ht` parameter by @jkgoodrich in #660
- Add `densify_all_reference_sites` to perform a densify at all sites in a reference HT by @jkgoodrich in #661
- Add `compute_stats_per_ref_site` to generalize computation of aggregate stats at all sites in a reference Table by @jkgoodrich in #662
- Functions to process, filter, annotate and aggregate variants by transcript expression (get the pext scores per variant) by @KoalaQin in #651
- Add gnomAD all sites allele number resource by @jkgoodrich in #669
- Add `read_args` parameter to the read functions of Resource Classes by @jkgoodrich in #672
- Add `get_is_haploid_expr`, `get_dp_gq_adj_expr`, `get_adj_het_ab_expr`, and some helpful parameters to `agg_by_strata` and `compute_stats_per_ref_site` by @jkgoodrich in #673
- Add `sex_karyotype_field` as an argument to `compute_stats_per_ref_site` to include sex ploidy adjustment after densify by @jkgoodrich in #677
- Add function for adding gencode annotation by @klaricch in #681
- Update vcf.py to work on joint freq release Table by @KoalaQin in #688
- Change `get_downsampling_freq_indices` and `downsampling_counts_expr` to support both 'pop' and 'gen_anc' keys in metadata by @jkgoodrich in #633
Other Changes
- Suggestions to get_expression_proportion PR by @jkgoodrich in #653
- Suggestions to tx_annotate_mt PR by @jkgoodrich in #654
- Suggestions to tx_annotate_mt by @jkgoodrich in #655
- Rearrange and enforce adj_group and group_membership being on the sam… by @mike-w-wilson in #666
- Bump jinja2 from 3.1.2 to 3.1.3 in /docs by @dependabot in #665
- Add v4 to genome release constants by @klaricch in #671
- Pull ploidy optimization into a function by @mike-w-wilson in #676
- Fix sex ploidy adjustment so XX samples still get set to missing on chrY by @jkgoodrich in #678
- Minor GKS formatting changes and addition of gnomAD flags to annotation by @theferrit32 in #617
- Add option to exclude polyphen from process consequences by @KoalaQin in #685
- Bump black from 23.7.0 to 24.3.0 by @dependabot in #686
- Add Stat Union to the info dict by @KoalaQin in #695
Full Changelog: v0.7.1...v0.8.0
v0.7.1
This release uses Hail 0.2.122
What's Changed
Bug fixes
- Drop async file exists function by @mike-w-wilson in #643
Full Changelog: v0.7.0...v0.7.1
v0.7.0
This release contained a function that required Hail >= 0.2.126. Please use a newer release.
What's Changed
Breaking Changes
- Update some gnomAD resources from lists to version dictionaries by @mike-w-wilson in #522
- Modifications to `annotate_freq` to improve memory use by @jkgoodrich in #577
Bug fixes
- Add `get_slope_int_relationship_expr` to get relationship between a pair of samples given slope and intercepts of lines to use as cutoffs. by @jkgoodrich in #511
- Fix access to version's SUBSETS and POPS within repo by @mike-w-wilson in #529
- Small changes to bokeh module imports in `utils.plotting` that were failing with Hail update by @jkgoodrich in #540
- Fix `filter_x_nonpar` and `filter_y_nonpar` to use reference genome by @jkgoodrich in #553
- Fix callstats order in `merge_freq_arrays` by @jkgoodrich in #574
- Avoid DeprecationWarnings from superseded hail function and import [minor] by @jmarshall in #576
- Fix `merge_freq_arrays` for cases with more than two arrays by @jkgoodrich in #587
- Fix negative values issue with 'diff' by @KoalaQin in #590
- Fix ValueError for `count_arrays` in `merge_freq_arrays` function by @KoalaQin in #591
- Modify `apply_rf_model` to use `vector_to_array` from `pyspark.ml.functions` instead of `udf` by @matren395 in #592
- Fix to drop 'AS_SB' after converting to 'AS_SB_TABLE' in `get_as_info_expr` by @jkgoodrich in #602
- Fix to GKS Seqloc `new_temp_file` by @matren395 in #612
- Move ga4gh imports to their functions by @mike-w-wilson in #626
New Features
- Add generic constraint function `annotate_constraint_groupings()` by @averywpx in #497
- Add an option for samples that must be kept to `compute_related_samples_to_drop` by @jkgoodrich in #506
- Add `determine_nearest_neighbors` to find nearest neighbors for each sample. Modify `compute_stratified_metrics_filter` to work with a `comparison_sample_expr` that specifies what samples to compare to for filtering, this works well with the output of `determine_nearest_neighbor`. by @jkgoodrich in #509
- Add utility function to repartition HTs prior to join by @ch-kr in #512
- Add VEP 105 init script and its docker image by @KoalaQin in #516
- Add VEP 105 GRCh38 context HT resource by @jkgoodrich in #524
- Add additional groupings to optional stratified allele frequencies by @KoalaQin in #523
- Add 'strata' and 'qc_metrics' as globals on the table returned by `compute_stratified_metrics_filter` by @jkgoodrich in #521
- Modify `annotate_mutation_type` to take optional context length as a parameter. by @jkgoodrich in #530
- Add generic constraint functions: `oe_aggregation_expr()`, `compute_pli()`, `oe_confidence_interval()`, `calculate_raw_z_score()`, `calculate_raw_z_score_sd()` by @averywpx in #505
- Add dbSNP b156 to resources for v4 by @KoalaQin in #525
- Add `pab_max_expr` function and modify `default_compute_info` to add 'AS_pab_max' annotation by @jkgoodrich in #531
- Add generic constraint functions: `get_downsamplings()`, `remove_coverage_outliers()`, and `filter_for_mu()` by @averywpx in #507
- Add `ac_filter_groups` to `default_compute_info` allowing additional allele count groupings by @jkgoodrich in #534
- Add global annotations for 'vep_version', 'vep_help', and 'vep_config' to the returned Table in `vep_or_lookup_vep` by @jkgoodrich in #536
- Add `annotate_allele_info` function to `utils.annotations` by @jkgoodrich in #535
- Add validity check code of VEP annotations in protein-coding genes by @KoalaQin in #548
- Merge freq array function and new frequency dictionary builder by @mike-w-wilson in #551
- Add GRCh38 methylation sites resource by @jkgoodrich in #552
- Modify `comparison_sample_expr` parameter of `compute_stratified_metrics_filter` to also accept a BooleanExpression by @jkgoodrich in #557
- Add parameters `apply_model_func` and `convert_model_func` to `assign_population_pcs` so it has the ability to work with other model types by @jkgoodrich in #558
- Add `sample_list_stratification` option to `create_fake_pedigree` function by @jkgoodrich in #564
- Modify `default_compute_info` with the option to use the `AS_` annotations in gvcf_info for allele specific aggregations by @jkgoodrich in #560
- Modify `annotate_adj` to support LGT and LAD by @jkgoodrich in #567
- Function to annotate downsamplings onto HT/MT by @mike-w-wilson in #570
- Add function to merge histograms with the same bin_edges by @mike-w-wilson in #572
- Add option to also merge an array of counts/ints in the freq array merge by @mike-w-wilson in #565
- Update `annotate_freq` and `qual_hists`, add `split_vds` and `compute_freq_by_strata` by @mike-w-wilson in #571
- Add function `update_structured_annotations` to update structured annotations on a Table by @KoalaQin in #580
- Make naive_coalesce optional in `default_compute_info` by @jkgoodrich in #584
- Add function to remove items from freq and freq_meta by @KoalaQin in #582
- Add a `select_fields` option to `compute_freq_by_strata` by @jkgoodrich in #595
- Modify `split_info_annotation` to allow for splitting an info expression that doesn't include `AS_SB_TABLE` by @jkgoodrich in #594
- Update to allow for grouping and filtering by MANE transcripts by @klaricch in #605
- Add gnomad_gks() and get_gks() for extracting gks information for a specified variant by @matren395 in #596
- Add aggregations to variant QC evaluation for additional plots by @jkgoodrich in #609
- Add function to get max FAF from `faf_expr` by @KoalaQin in #608
- Add optional stratification parameter to coverage by @jkgoodrich in #615
- Add methylation resource for chrX by @klaricch in #622
- Add pop_label option to `pop_max_expr`, `faf_expr`, and `gen_anc_faf_max_expr` by @jkgoodrich in #623
- Add `apply_keep_to_only_items_in_filter` option to `filter_arrays_by_meta` by @jkgoodrich in #624
- Add pprint globals and a global/row length comparison, updates monoallelic expr in validity checks by @mike-w-wilson in #630
- Add MANE Select filtering option to `get_summary_counts` by @jkgoodrich in #634
- Add optional parameters to `set_female_y_metrics_to_na_expr` to use other frequency fields by @jkgoodrich in #635
- Update resource paths by @klaricch in #642
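The idea behind the "merge histograms with the same bin_edges" entry (#572) can be sketched in plain Python. This is only an illustration, not the library's code: the function name `merge_histograms_sketch` is hypothetical, and the dictionaries stand in for Hail histogram structs, using the field names of the `hl.agg.hist` result (`bin_edges`, `bin_freq`, `n_smaller`, `n_larger`).

```python
from typing import Dict, List


def merge_histograms_sketch(hists: List[Dict]) -> Dict:
    """Sum the counts of histograms that share identical bin edges."""
    if not hists:
        raise ValueError("Need at least one histogram")
    edges = hists[0]["bin_edges"]
    # Merging is only meaningful when every histogram uses the same bins.
    if any(h["bin_edges"] != edges for h in hists):
        raise ValueError("All histograms must have the same bin_edges")
    return {
        "bin_edges": list(edges),
        # Element-wise sum of the per-bin counts.
        "bin_freq": [sum(col) for col in zip(*(h["bin_freq"] for h in hists))],
        # Out-of-range counts are simple totals.
        "n_smaller": sum(h["n_smaller"] for h in hists),
        "n_larger": sum(h["n_larger"] for h in hists),
    }
```

Raising an error on mismatched edges is a design choice of this sketch; re-binning would be needed to combine histograms with different edges.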
Other Changes
- Update doc requirements.doc.txt by @jkgoodrich in #520
- Bump requests from 2.28.2 to 2.31.0 in /docs by @dependabot in #543
- Add VEP 105 CSQ FIELDs by @KoalaQin in #546
- Update python 3.8 -> 3.11 by @jkgoodrich in #578
- Add ability ...
v0.6.4
What's Changed
This release uses Hail 0.2.105
Bug fixes
- Fix `assign_population_pcs` error when parameter `pc_cols` is a Hail ArrayExpression by @jkgoodrich in #503
Other Changes
- Modifying `assign_population_pcs` to be more flexible by accepting an array expression in 'pc_cols' and adding a 'pc_expr' parameter instead of always using 'scores' by @jkgoodrich in #500
- add `.he` to file extensions list in `file_exists()` by @averywpx in #501
- add generic constraint functions: `build_models()`, `build_plateau_models_pop()`, `build_plateau_models_total()`, `build_coverage_model()`, `get_all_pop_lengths()` by @averywpx in #485
Full Changelog: v0.6.3...v0.6.4
v0.6.3
What's Changed
This release uses Hail 0.2.104
Breaking Changes
- Change type of "pc_cols" param in ancestry function from hl.expr.ArrayExpression to List[int] to help track PCs that were used in RF model by @klaricch in #448
- Add additional_samples_to_drop option to `run_pca_with_relateds` by @klaricch in #489
Bug fixes
- Fix to only add the `error_rate` annotation if `fit` is not supplied to `assign_population_pcs` by @klaricch in #453
- Modify `merge_sample_qc_expr` to work with the additional VDS sample QC metrics: n_singleton_ti, n_singleton_tv, and r_ti_tv_singleton by @jkgoodrich in #454
- Fix `vep_or_lookup_vep` to drop `vep_proc_id` if it exists by @konradjk in #439
- Fix to paths for VEP 101 resources in init script by @jkgoodrich in #488
- Changed tqdm to SimpleRichProgressBar in file_utils by @ch-kr in #495
New Features
- Add an `n_pcs` option to `run_platform_pca` by @jkgoodrich in #468
- Add n_partitions option to get_qc_mt before LD pruning by @klaricch in #472
- Add block_size option to get_qc_mt for LD pruning by @klaricch in #473
- Add `gaussian_mixture_model_karyotype_assignment` function to assign sex karyotype using Gaussian mixture models by @jkgoodrich in #478
- Add `variants_filter_lcr`, `variants_filter_segdup` and `variants_snv_only` options to `annotate_sex` to filter variants prior to variant only ploidy imputation by @jkgoodrich in #479
- Add an option `compute_x_frac_variants_hom_alt` to `annotate_sex` that computes the fraction of variants on chromosome X that are homozygous alternate per sample by @jkgoodrich in #480
- Add generic constraint functions - annotate_mutation_type(), trimer_from_heptamer(), collapse_strand(), add_most_severe_csq_to_tc_within_vep_root() by @averywpx in #474
- Add more file types to `file_exists` for checking '_SUCCESS' by @jkgoodrich in #486
- Add `coverage_mt` option to `annotate_sex` which takes an optional precomputed coverage MT to use for ploidy imputation instead of remaking it. by @jkgoodrich in #484
- Add function `get_chr_x_hom_alt_cutoffs`, add arguments to `infer_sex_karyotype` and `get_sex_expr` to use the new function and its output. by @jkgoodrich in #492
- Add `bi_allelic_only` and `snv_only` options to `get_qc_mt` by @jkgoodrich in #471
- Add generic constraint functions: annotate_with_mu(), count_variants(), downsampling_counts_expr(), filter_vep_transcript_csqs(), combine_functions(), filter_x_nonpar(), and filter_y_nonpar() by @averywpx in #481
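For readers unfamiliar with the sequence-context helpers named in #474, the following plain-Python sketch shows the general idea behind `trimer_from_heptamer()` and `collapse_strand()`. It is not the library code, which operates on Hail Tables. The `_sketch` function names are hypothetical, and the strand convention (reverse-complement when the reference base is G or T, so the reference is always A or C) is an assumption of this example.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trimer_from_heptamer_sketch(context: str) -> str:
    """Return the central 3 bases of a 7-base context; other lengths pass through."""
    return context[2:5] if len(context) == 7 else context


def collapse_strand_sketch(ref: str, alt: str, context: str):
    """Represent a SNV on the strand where the reference base is A or C.

    Returns (ref, alt, context, was_flipped). The G/T rule is an assumed
    convention for this sketch.
    """
    if ref in ("G", "T"):
        return (
            reverse_complement(ref),
            reverse_complement(alt),
            reverse_complement(context),
            True,
        )
    return ref, alt, context, False
```

Collapsing strands this way halves the number of distinct (context, ref, alt) combinations that need to be tracked.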
Other Changes
- Handle tags created through GitHub in publish release workflow by @nawatts in #451
- Change branch name in CI workflow configuration by @nawatts in #452
Full Changelog: v0.6.2...v0.6.3
v0.6.2
What's Changed
New Features
- Use Google Cloud Public Datasets as default source for public resources by @nawatts in #431
- Add options for reading public resources from Registry of Open Data on AWS and Azure Open Datasets by @nawatts in #430
- Allow setting the default source for public resources with an environment variable by @nawatts in #435
- Use hl.utils.guess_cloud_spark_provider to set default resources source by @nawatts in #436
- add checkpoint option to get_qc_mt by @klaricch in #437
- Modification to the `annotate_sex` pipeline to allow sex ploidy estimation using only variants instead of ref blocks by @jkgoodrich in #445
Other Changes
- Document selecting resource source by @nawatts in #408
- Add VEP 101 init by @jkgoodrich in #411
- Small fix to docstrings for make_freq_index_dict() by @gtiao in #412
- Tiny fix to assign_population_pcs use of known label by @jkgoodrich in #413
- Added option to get file stats for requester-pays files by @ch-kr in #414
- fix to faf description text by @jkgoodrich in #415
- Update current gnomAD GRCh38 genome release v3.1.2 by @jkgoodrich in #416
- Update to new RouterAsyncFS interface in Hail 0.2.79 by @nawatts in #425
- add vds resource by @klaricch in #423
- Modified subset_samples_and_variants() by @wlu04 in #421
- Modified compute_stratified_sample_qc() by @wlu04 in #420
- Modified annotate_sex() by @wlu04 in #427
Full Changelog: v0.6.0...v0.6.2
v0.6.1
v0.6.0
Released September 3rd, 2021
All resources have been moved to a requester pays bucket.
Fixed
- Fix `annotation_type_is_numeric` and `annotation_type_in_vcf_info` (#379)
Changed
- VersionedResource objects are no longer subclasses of BaseResource (#359)
- gnomAD resources can now be imported from different sources (#373)
- Replaced `ht_to_vcf_mt` with `adjust_vcf_incompatible_types` which maintains all functionality except turning the ht into a mt because it is no longer needed for use of the Hail module `export_vcf` (#365)
- Modified `SEXES` in utils/vcf to be 'XX' and 'XY' instead of 'female' and 'male' (#381)
- Changed module `sanity_checks` to `validity_checks`, modified functions `generic_field_check`, `make_filters_expr_dict` (previously `make_filters_sanity_check_expr`), and `make_group_sum_expr_dict` (previously `sample_sum_check`) (#395)
Added
- Added function `region_flag_expr` to flag problematic regions (#349)
- Added function `missing_callstats_expr` to create a Hail Struct with missing values that is inserted into frequency annotation arrays when data is missing (#349)
- Added function `set_female_y_metrics_to_na_expr` to set Y-variant frequency callstats for female-specific metrics to missing (#349)
- Added function `make_faf_index_dict` to create a look-up Dictionary for entries contained in the filter allele frequency annotation array (#349)
- Added function `make_freq_index_dict` to create a look-up Dictionary for entries contained in the frequency annotation array (#349)
- Added function `remove_fields_from_constant` to remove fields from a list and notify which requested fields to remove were missing (#381)
- Added function `create_label_groups` to generate a list of label group dictionaries needed to populate the info dictionary for vcf export (#381)
- Added function `build_vcf_export_reference` to create a subset reference based on an existing reference genome (#381)
- Added function `rekey_new_reference` to re-key a Table or MatrixTable with a new reference genome (#381)
- Added function `parallel_file_exists` to check whether a large number of files exist (#394)
- Added functions `summarize_variant_filters`, `generic_field_check_loop`, `compare_subset_freqs`, `sum_group_callstats`, `summarize_variants`, `check_raw_and_adj_callstats`, `check_sex_chr_metrics`, `compute_missingness`, `vcf_field_check`, and `validate_release_t` (#395)