
Releases: broadinstitute/gnomad_methods

v0.8.2

24 Jan 16:13
73347ea

What's Changed

Breaking Changes

  • Update default vep version for context table resource by @ch-kr in #726
  • Add coverage_metric param to allow for different coverage metrics and cov_model_type option to allow for a linear or logarithmic coverage model by @klaricch in #724
  • Add get_tissues_to_exclude function to determine what tissues to exclude from transcript annotation calculations by @jkgoodrich in #729

Bug fixes

  • Fix tx_filter_variants_by_csqs to correctly handle the ignore_splicing parameter by @jkgoodrich in #727

New Features

  • Add retain cdf option for median calculations when computing info fields by @klaricch in #731
  • Add max_grpmax option to get_summary_stats_variant_filter_expr for filtering by grpmax by @jkgoodrich in #732
  • Change filter_mt_to_trios to also filter on vds by @KoalaQin in #739
  • Add get_mu_annotation_expr function that prevents a shuffle from happening when annotating a HT with mutation rate and use in annotate_with_mu by @jkgoodrich in #734
  • Add assemble_constraint_context_ht function to create a fully annotated context HT for computing constraint by @jkgoodrich in #733
  • Add support for filtering Hail Tables to filter_to_trios by @jkgoodrich in #741
  • Generalize the freq_bin_expr function to take in a list of allele count and allele frequency cutoffs by @jkgoodrich in #745
  • Add function parse_variant to create a Struct with the locus and alleles from a variant string or from contig, position, ref, and alt values by @jkgoodrich in #746
  • Modify filter_vep_transcript_csqs_expr so it can also accept hl.expr.StructExpression by @jkgoodrich in #748
  • Filter to Gencode CDS by genes and by exon paddings by @KoalaQin in #747
  • Add functions to support padding and filtering intervals: filter_by_intervals, pad_intervals, parse_locus_intervals by @jkgoodrich in #752
  • Add loftee_labels and no_lof_flags parameters to filter_vep_transcript_csqs_expr for filtering by loftee labels and flags by @jkgoodrich in #753
  • Add browser tables to resources by @KoalaQin in #750
  • Add functions to check struct and array missingness by @klaricch in #738
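
The parse_variant entry above describes two input styles: a single variant string, or separate contig, position, ref, and alt values. The plain-Python sketch below illustrates that dual-input idea only. It is not the library's Hail implementation; the name `parse_variant_sketch`, the `Variant` tuple, and the dash-delimited `contig-position-ref-alt` string format are all assumptions made for this example.

```python
from typing import NamedTuple, Optional, Tuple


class Variant(NamedTuple):
    """Hypothetical container standing in for a Hail Struct with locus and alleles."""

    contig: str
    position: int
    alleles: Tuple[str, str]


def parse_variant_sketch(
    variant_str: Optional[str] = None,
    contig: Optional[str] = None,
    position: Optional[int] = None,
    ref: Optional[str] = None,
    alt: Optional[str] = None,
) -> Variant:
    """Build a Variant from a 'contig-position-ref-alt' string or from separate fields."""
    if variant_str is not None:
        # Assumed format for this sketch: four dash-separated fields.
        parts = variant_str.split("-")
        if len(parts) != 4:
            raise ValueError(f"Expected 'contig-position-ref-alt', got {variant_str!r}")
        contig, pos_str, ref, alt = parts
        position = int(pos_str)
    elif any(x is None for x in (contig, position, ref, alt)):
        raise ValueError("Provide variant_str or all of contig, position, ref, and alt")
    return Variant(contig, int(position), (ref, alt))
```

Both call styles produce the same record, for example `parse_variant_sketch("chr1-12345-A-T")` and `parse_variant_sketch(contig="chr1", position=12345, ref="A", alt="T")`.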

Other Changes

  • Add import code for GTEx v10 RSEM by @KoalaQin in #742
  • Add pext and constraint resources by @KoalaQin in #743
  • Bump the pip group in /docs with 2 updates by @dependabot in #715
  • Bump jinja2 from 3.1.4 to 3.1.5 in /docs in the pip group across 1 directory by @dependabot in #751
  • Bump virtualenv from 20.24.6 to 20.26.6 in the pip group across 1 directory by @dependabot in #754
  • Update version to 0.8.2 in setup.py for release by @KoalaQin in #758
  • Add gcs connector to PyPi publish by @KoalaQin in #759

Full Changelog: v0.8.1...v0.8.2

v0.8.1

29 Jul 17:52
6411b7e

What's Changed

Bug fixes

  • Fix annotate_with_ht to only use a semi-join when filter_missing is True by @jkgoodrich in #709
  • Fix bug in process_consequences that was introduced when adding support for VEP without polyphen by @jkgoodrich in #710

New Features

  • Add explode_downsamplings function by @klaricch in #694
  • Update VEP csqs in impact categories to match VEP by @mike-w-wilson in #703
  • Add get_summary_stats_variant_filter_expr and get_summary_stats_csq_filter_expr to build filtering expressions for summary stats by @jkgoodrich in #701
  • Add filter_vep_transcript_csqs_expr, a version of filter_vep_transcript_csqs that takes and returns an ArrayExpression by @jkgoodrich in #713
  • Add create_vds function that only supports creating from gvcfs by @mike-w-wilson in #716
  • Add functions fill_missing_key_combinations and missing_struct_expr by @jkgoodrich in #718
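
Going only by the name of fill_missing_key_combinations above, one plausible reading is "make sure every combination of key values has a row, filling absent ones with a missing placeholder". The sketch below shows that idea with plain dictionaries. The function name, its parameters, and the use of `None` as the placeholder are assumptions for illustration, not the library's signature or behavior.

```python
from itertools import product
from typing import Any, Dict, Sequence, Tuple


def fill_missing_key_combinations_sketch(
    rows: Dict[Tuple, Any],
    key_values: Sequence[Sequence[Any]],
    fill: Any = None,
) -> Dict[Tuple, Any]:
    """Return a mapping with one entry per combination of key values.

    rows maps a key tuple to its value; key_values lists the allowed
    values for each key field, in key order. Combinations absent from
    rows receive the fill placeholder.
    """
    return {combo: rows.get(combo, fill) for combo in product(*key_values)}
```

With `rows = {("a", 1): 10}` and `key_values = [["a", "b"], [1, 2]]`, the result has four entries, three of which hold the placeholder.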

Other Changes

Full Changelog: v0.8.0...v0.8.1

v0.8.0

19 Apr 14:23

What's Changed

Breaking Changes

Bug fixes

  • Account for missingness in int64 to int32 VCF type conversion by @mike-w-wilson in #668
  • Fix generic_field_check in validity_checks.py print of failed checks by @jkgoodrich in #693

New Features

  • Add RSEM summary function by @jkgoodrich in #647
  • Function to get expression proportion by @KoalaQin in #649
  • Add GTEx import resources by @KoalaQin in #646
  • Add function agg_by_strata, which is a generalized version of the compute_freq_by_strata by @jkgoodrich in #659
  • Clean up compute_coverage_stats, change it to use agg_by_strata and have an optional group_membership_ht parameter by @jkgoodrich in #660
  • Add densify_all_reference_sites to perform a densify at all sites in a reference HT by @jkgoodrich in #661
  • Add compute_stats_per_ref_site to generalize computation of aggregate stats at all sites in a reference Table by @jkgoodrich in #662
  • Functions to process, filter, annotate and aggregate variants by transcript expression (get the pext scores per variant) by @KoalaQin in #651
  • Add gnomAD all sites allele number resource by @jkgoodrich in #669
  • Add read_args parameter to the read functions of Resource Classes by @jkgoodrich in #672
  • Add get_is_haploid_expr, get_dp_gq_adj_expr, get_adj_het_ab_expr, and some helpful parameters to agg_by_strata and compute_stats_per_ref_site by @jkgoodrich in #673
  • Add sex_karyotype_field as an argument to compute_stats_per_ref_site to include sex ploidy adjustment after densify by @jkgoodrich in #677
  • Add function for adding gencode annotation by @klaricch in #681
  • Update vcf.py to work on joint freq release Table by @KoalaQin in #688
  • Change get_downsampling_freq_indices and downsampling_counts_expr to support both 'pop' and 'gen_anc' keys in metadata by @jkgoodrich in #633
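
The agg_by_strata entry above generalizes per-stratum frequency computation to arbitrary aggregations. As a rough, Hail-free illustration of the pattern (one aggregate per stratum, returned in the same order as the strata definitions), consider the sketch below. The function name, the dictionary-based sample records, and the matching rule are assumptions made for this example and do not reflect the library's actual parameters.

```python
from typing import Any, Callable, Dict, List


def agg_by_strata_sketch(
    samples: List[Dict[str, Any]],
    strata: List[Dict[str, Any]],
    agg_func: Callable[[List[Dict[str, Any]]], Any],
) -> List[Any]:
    """Apply agg_func to the samples matching each stratum definition.

    A stratum is a dict of field -> required value; an empty dict
    matches every sample. Results are returned in stratum order, so
    they can be paired with the strata list as metadata.
    """
    results = []
    for stratum in strata:
        members = [
            s for s in samples if all(s.get(k) == v for k, v in stratum.items())
        ]
        results.append(agg_func(members))
    return results


samples = [
    {"pop": "afr", "sex": "XX", "ac": 1},
    {"pop": "afr", "sex": "XY", "ac": 0},
    {"pop": "nfe", "sex": "XX", "ac": 2},
]
strata = [{}, {"pop": "afr"}, {"pop": "nfe", "sex": "XX"}]
allele_counts = agg_by_strata_sketch(
    samples, strata, lambda members: sum(s["ac"] for s in members)
)
# allele_counts == [3, 1, 2]
```

Passing the aggregation as a function is what makes the pattern general: swapping the lambda for a mean or a count needs no change to the stratification logic.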

Other Changes

Full Changelog: v0.7.1...v0.8.0

v0.7.1

31 Oct 18:53

This release uses Hail 0.2.122

What's Changed

Bug fixes

Full Changelog: v0.7.0...v0.7.1

v0.7.0

31 Oct 16:27

This release contained a function that required Hail >= 0.2.126. Please use a newer release.

What's Changed

Breaking Changes

  • Update some gnomAD resources from lists to version dictionaries by @mike-w-wilson in #522
  • Modifications to annotate_freq to improve memory use by @jkgoodrich in #577

Bug fixes

  • Add get_slope_int_relationship_expr to get relationship between a pair of samples given slope and intercepts of lines to use as cutoffs by @jkgoodrich in #511
  • Fix access to version's SUBSETS and POPS within repo by @mike-w-wilson in #529
  • Small changes to bokeh module imports in utils.plotting that were failing with Hail update by @jkgoodrich in #540
  • Fix filter_x_nonpar and filter_y_nonpar to use reference genome by @jkgoodrich in #553
  • Fix callstats order in merge_freq_arrays by @jkgoodrich in #574
  • Avoid DeprecationWarnings from superseded hail function and import [minor] by @jmarshall in #576
  • Fix merge_freq_arrays for cases with more than two arrays by @jkgoodrich in #587
  • Fix negative values issue with 'diff' by @KoalaQin in #590
  • Fix ValueError for count_arrays in merge_freq_arrays function by @KoalaQin in #591
  • Modify apply_rf_model to use vector_to_array from pyspark.ml.functions instead of udf by @matren395 in #592
  • Fix to drop 'AS_SB' after converting to 'AS_SB_TABLE' in get_as_info_expr by @jkgoodrich in #602
  • Fix to GKS Seqloc new_temp_file by @matren395 in #612
  • Move ga4gh imports to their functions by @mike-w-wilson in #626

New Features

  • Add generic constraint function annotate_constraint_groupings() by @averywpx in #497
  • Add an option for samples that must be kept to compute_related_samples_to_drop by @jkgoodrich in #506
  • Add determine_nearest_neighbors to find nearest neighbors for each sample. Modify compute_stratified_metrics_filter to work with a comparison_sample_expr that specifies which samples to compare to for filtering; this works well with the output of determine_nearest_neighbors by @jkgoodrich in #509
  • Add utility function to repartition HTs prior to join by @ch-kr in #512
  • Add VEP 105 init script and its docker image by @KoalaQin in #516
  • Add VEP 105 GRCh38 context HT resource by @jkgoodrich in #524
  • Add additional groupings to optional stratified allele frequencies by @KoalaQin in #523
  • Add 'strata' and 'qc_metrics' as globals on the table returned by compute_stratified_metrics_filter by @jkgoodrich in #521
  • Modify annotate_mutation_type to take optional context length as a parameter by @jkgoodrich in #530
  • Add generic constraint functions: oe_aggregation_expr(), compute_pli(), oe_confidence_interval(), calculate_raw_z_score(), calculate_raw_z_score_sd() by @averywpx in #505
  • Add dbSNP b156 to resources for v4 by @KoalaQin in #525
  • Add pab_max_expr function and modify default_compute_info to add 'AS_pab_max' annotation by @jkgoodrich in #531
  • Add generic constraint functions: get_downsamplings(), remove_coverage_outliers(), and filter_for_mu() by @averywpx in #507
  • Add ac_filter_groups to default_compute_info allowing additional allele count groupings by @jkgoodrich in #534
  • Add global annotations for 'vep_version', 'vep_help', and 'vep_config' to the returned Table in vep_or_lookup_vep by @jkgoodrich in #536
  • Add annotate_allele_info function to utils.annotations by @jkgoodrich in #535
  • Add validity check code of VEP annotations in protein-coding genes by @KoalaQin in #548
  • Merge freq array function and new frequency dictionary builder by @mike-w-wilson in #551
  • Add GRCh38 methylation sites resource by @jkgoodrich in #552
  • Modify comparison_sample_expr parameter of compute_stratified_metrics_filter to also accept a BooleanExpression by @jkgoodrich in #557
  • Add parameters apply_model_func and convert_model_func to assign_population_pcs so it has the ability to work with other models types by @jkgoodrich in #558
  • Add sample_list_stratification option to create_fake_pedigree function by @jkgoodrich in #564
  • Modify default_compute_info with the option to use the AS_ annotations in gvcf_info for allele specific aggregations by @jkgoodrich in #560
  • Modify annotate_adj to support LGT and LAD by @jkgoodrich in #567
  • Function to annotate downsamplings onto HT/MT by @mike-w-wilson in #570
  • Add function to merge histograms with the same bin_edges by @mike-w-wilson in #572
  • Add option to also merge an array of counts/ints in the freq array merge by @mike-w-wilson in #565
  • Update annotate_freq and qual_hists, add split_vds and compute_freq_by_strata by @mike-w-wilson in #571
  • Add function update_structured_annotations to update structured annotations on a Table by @KoalaQin in #580
  • Make naive_coalesce optional in default_compute_info by @jkgoodrich in #584
  • Add function to remove items from freq and freq_meta by @KoalaQin in #582
  • Add a select_fields option to compute_freq_by_strata by @jkgoodrich in #595
  • Modify split_info_annotation to allow for splitting an info expression that doesn't include AS_SB_TABLE by @jkgoodrich in #594
  • Update to allow for grouping and filtering by MANE transcripts by @klaricch in #605
  • Add gnomad_gks() and get_gks() for extracting gks information for a specified variant by @matren395 in #596
  • Add aggregations to variant QC evaluation for additional plots by @jkgoodrich in #609
  • Add function to get max FAF from faf_expr by @KoalaQin in #608
  • Add optional stratification parameter to coverage by @jkgoodrich in #615
  • Add methylation resource for chrX by @klaricch in #622
  • Add pop_label option to pop_max_expr, faf_expr, and gen_anc_faf_max_expr by @jkgoodrich in #623
  • Add apply_keep_to_only_items_in_filter option to filter_arrays_by_meta by @jkgoodrich in #624
  • Add pprint of globals and a global/row length comparison, and update monoallelic expr in validity checks by @mike-w-wilson in #630
  • Add MANE Select filtering option to get_summary_counts by @jkgoodrich in #634
  • Add optional parameters to set_female_y_metrics_to_na_expr to use other frequency fields by @jkgoodrich in #635
  • Update resource paths by @klaricch in #642
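
One entry above adds a function to merge histograms that share the same bin_edges. The underlying arithmetic is simple: when the edges are identical, the per-bin counts and the out-of-range counts can be summed elementwise. The sketch below shows that with plain dictionaries. The field names follow the struct returned by Hail's `hl.agg.hist` (bin_edges, bin_freq, n_smaller, n_larger); the function name and the dictionary representation are assumptions for illustration, not the library's implementation.

```python
from typing import Any, Dict, List


def merge_histograms_sketch(hists: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge histograms that share identical bin edges by summing their counts."""
    if not hists:
        raise ValueError("At least one histogram is required")
    edges = hists[0]["bin_edges"]
    for hist in hists[1:]:
        # Summing bins is only meaningful when every histogram uses the same edges.
        if hist["bin_edges"] != edges:
            raise ValueError("All histograms must have the same bin_edges")
    return {
        "bin_edges": list(edges),
        "bin_freq": [sum(col) for col in zip(*(h["bin_freq"] for h in hists))],
        "n_smaller": sum(h["n_smaller"] for h in hists),
        "n_larger": sum(h["n_larger"] for h in hists),
    }


h1 = {"bin_edges": [0, 10, 20], "bin_freq": [1, 2], "n_smaller": 0, "n_larger": 1}
h2 = {"bin_edges": [0, 10, 20], "bin_freq": [3, 4], "n_smaller": 2, "n_larger": 0}
merged = merge_histograms_sketch([h1, h2])
# merged["bin_freq"] == [4, 6]
```

Rejecting mismatched edges up front is a deliberate choice in this sketch: silently re-binning would change the meaning of the counts.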

Other Changes


v0.6.4

08 Nov 15:00
608aed2

What's Changed

This release uses Hail 0.2.105

Bug fixes

  • Fix assign_population_pcs error when parameter pc_cols is a Hail ArrayExpression by @jkgoodrich in #503

Other Changes

  • Modifying assign_population_pcs to be more flexible by accepting an array expression in 'pc_cols' and adding a 'pc_expr' parameter instead of always using 'scores' by @jkgoodrich in #500
  • Add .he to file extensions list in file_exists() by @averywpx in #501
  • Add generic constraint functions: build_models(), build_plateau_models_pop(), build_plateau_models_total(), build_coverage_model(), get_all_pop_lengths() by @averywpx in #485

Full Changelog: v0.6.3...v0.6.4

v0.6.3

27 Oct 20:02
f87db40

What's Changed

This release uses Hail 0.2.104

Breaking Changes

  • Change type of "pc_cols" param in ancestry function from hl.expr.ArrayExpression to List[int] to help track PCs that were used in RF model by @klaricch in #448
  • Add additional_samples_to_drop option to run_pca_with_relateds by @klaricch in #489

Bug fixes

  • Fix to only add the error_rate annotation if fit is not supplied to assign_population_pcs by @klaricch in #453
  • Modify merge_sample_qc_expr to work with the additional VDS sample QC metrics: n_singleton_ti, n_singleton_tv, and r_ti_tv_singleton by @jkgoodrich in #454
  • Fix vep_or_lookup_vep to drop vep_proc_id if it exists by @konradjk in #439
  • Fix to paths for VEP 101 resources in init script by @jkgoodrich in #488
  • Changed tqdm to SimpleRichProgressBar in file_utils by @ch-kr in #495

New Features

  • Add an n_pcs option to run_platform_pca by @jkgoodrich in #468
  • Add n_partitions option to get_qc_mt before LD pruning by @klaricch in #472
  • Add block_size option to get_qc_mt for LD pruning by @klaricch in #473
  • Add gaussian_mixture_model_karyotype_assignment function to assign sex karyotype using Gaussian mixture models by @jkgoodrich in #478
  • Add variants_filter_lcr, variants_filter_segdup and variants_snv_only options to annotate_sex to filter variants prior to variant-only ploidy imputation by @jkgoodrich in #479
  • Add an option compute_x_frac_variants_hom_alt to annotate_sex that computes the fraction of variants on chromosome X that are homozygous alternate per sample by @jkgoodrich in #480
  • Add generic constraint functions - annotate_mutation_type(), trimer_from_heptamer(), collapse_strand(), add_most_severe_csq_to_tc_within_vep_root() by @averywpx in #474
  • Add more file types to file_exists for checking '_SUCCESS' by @jkgoodrich in #486
  • Add coverage_mt option to annotate_sex which takes an optional precomputed coverage MT to use for ploidy imputation instead of remaking it by @jkgoodrich in #484
  • Add function get_chr_x_hom_alt_cutoffs, add arguments to infer_sex_karyotype and get_sex_expr to use the new function and its output by @jkgoodrich in #492
  • Add bi_allelic_only and snv_only options to get_qc_mt by @jkgoodrich in #471
  • Add generic constraint functions: annotate_with_mu(), count_variants(), downsampling_counts_expr(), filter_vep_transcript_csqs(), combine_functions(), filter_x_nonpar(), and filter_y_nonpar() by @averywpx in #481
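
Two of the constraint helpers named above, trimer_from_heptamer() and collapse_strand(), correspond to simple sequence manipulations. The sketch below shows one common convention in plain Python: a trimer is the middle three bases of a 7-base context, and strand collapsing rewrites every variant so that the reference base is A or C by reverse-complementing when it is G or T. The function names ending in `_sketch`, the string inputs, and the exact convention are assumptions for illustration; the library versions operate on Hail Tables and may differ in detail.

```python
from typing import Tuple

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trimer_from_heptamer_sketch(context: str) -> str:
    """Trim a 7-base context to the 3 bases centered on the variant site."""
    if len(context) != 7:
        raise ValueError("Expected a 7-base context")
    return context[2:5]


def collapse_strand_sketch(ref: str, alt: str, context: str) -> Tuple[str, str, str, bool]:
    """Represent the variant on the strand where ref is A or C.

    Returns (ref, alt, context, was_flipped).
    """
    if ref in ("G", "T"):
        return (
            reverse_complement(ref),
            reverse_complement(alt),
            reverse_complement(context),
            True,
        )
    return ref, alt, context, False
```

For example, a G>A change in context TGA collapses to C>T in context TCA, so both strands of the same mutation type are counted together.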

Other Changes

  • Handle tags created through GitHub in publish release workflow by @nawatts in #451
  • Change branch name in CI workflow configuration by @nawatts in #452

New Contributors

Full Changelog: v0.6.2...v0.6.3

v0.6.2

10 May 18:38
ae139ce

What's Changed

New Features

  • Use Google Cloud Public Datasets as default source for public resources by @nawatts in #431
  • Add options for reading public resources from Registry of Open Data on AWS and Azure Open Datasets by @nawatts in #430
  • Allow setting the default source for public resources with an environment variable by @nawatts in #435
  • Use hl.utils.guess_cloud_spark_provider to set default resources source by @nawatts in #436
  • Add checkpoint option to get_qc_mt by @klaricch in #437
  • Modification to the annotate_sex pipeline to allow sex ploidy estimation using only variants instead of ref blocks by @jkgoodrich in #445
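
The entries above describe choosing a default source for public resources, optionally through an environment variable. A generic version of that pattern is sketched below. The environment variable name `EXAMPLE_PUBLIC_RESOURCE_SOURCE`, the source labels, and the function name are hypothetical placeholders chosen for this example; consult the library's documentation for the real variable name and accepted values.

```python
import os
from typing import Mapping, Optional

# Hypothetical labels for the three providers mentioned in the release notes.
VALID_SOURCES = ("google-cloud-public-datasets", "aws-open-data", "azure-open-datasets")


def default_resource_source_sketch(
    environ: Optional[Mapping[str, str]] = None,
    var_name: str = "EXAMPLE_PUBLIC_RESOURCE_SOURCE",  # hypothetical variable name
    fallback: str = "google-cloud-public-datasets",
) -> str:
    """Pick the resource source from the environment, else use the fallback."""
    environ = os.environ if environ is None else environ
    value = environ.get(var_name)
    if value is None:
        return fallback
    if value not in VALID_SOURCES:
        raise ValueError(f"Unknown resource source: {value!r}")
    return value
```

Accepting the environment as a parameter keeps the function easy to test without mutating `os.environ`.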

Other Changes

New Contributors

Full Changelog: v0.6.0...v0.6.2

v0.6.1

06 Jan 16:52
  • Update for new RouterAsyncFS import/interface in recent Hail versions (55214e8)
  • Fix assign_population_pcs's use of known population label (9c8f089)

v0.6.0

06 Jan 14:58

Released September 3rd, 2021

All resources have been moved to a requester pays bucket.

Fixed

  • Fix annotation_type_is_numeric and annotation_type_in_vcf_info (#379)

Changed

  • VersionedResource objects are no longer subclasses of BaseResource (#359)
  • gnomAD resources can now be imported from different sources (#373)
  • Replaced ht_to_vcf_mt with adjust_vcf_incompatible_types, which keeps all functionality except converting the HT into an MT; that conversion is no longer needed to use Hail's export_vcf (#365)
  • Modified SEXES in utils/vcf to be 'XX' and 'XY' instead of 'female' and 'male' (#381)
  • Changed module sanity_checks to validity_checks, modified functions generic_field_check, make_filters_expr_dict (previously make_filters_sanity_check_expr), and make_group_sum_expr_dict (previously sample_sum_check) (#395)

Added

  • Added function region_flag_expr to flag problematic regions (#349)
  • Added function missing_callstats_expr to create a Hail Struct with missing values that is inserted into frequency annotation arrays when data is missing (#349)
  • Added function set_female_y_metrics_to_na_expr to set Y-variant frequency callstats for female-specific metrics to missing (#349)
  • Added function make_faf_index_dict to create a look-up Dictionary for entries contained in the filtering allele frequency annotation array (#349)
  • Added function make_freq_index_dict to create a look-up Dictionary for entries contained in the frequency annotation array (#349)
  • Added function remove_fields_from_constant to remove fields from a list and notify which requested fields to remove were missing (#381)
  • Added function create_label_groups to generate a list of label group dictionaries needed to populate the info dictionary for vcf export (#381)
  • Added function build_vcf_export_reference to create a subset reference based on an existing reference genome (#381)
  • Added function rekey_new_reference to re-key a Table or MatrixTable with a new reference genome (#381)
  • Added function parallel_file_exists to check whether a large number of files exist (#394)
  • Added functions summarize_variant_filters, generic_field_check_loop, compare_subset_freqs, sum_group_callstats, summarize_variants, check_raw_and_adj_callstats, check_sex_chr_metrics, compute_missingness, vcf_field_check, and validate_release_t (#395)
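
The make_freq_index_dict and make_faf_index_dict entries above describe look-up dictionaries for entries in a frequency annotation array. The general idea can be sketched as follows: given a list of metadata dictionaries that runs parallel to the array, build a mapping from a readable label to the array index. The function name, the metadata layout, and the underscore-joined label format are assumptions for this example and may not match the labels the library produces.

```python
from typing import Dict, List


def make_index_dict_sketch(
    freq_meta: List[Dict[str, str]], label_delimiter: str = "_"
) -> Dict[str, int]:
    """Map a label built from each metadata entry to its position in the array."""
    return {
        label_delimiter.join(meta.values()): index
        for index, meta in enumerate(freq_meta)
    }


freq_meta = [
    {"group": "adj"},
    {"group": "raw"},
    {"group": "adj", "pop": "afr"},
    {"group": "adj", "pop": "afr", "sex": "XX"},
]
index_dict = make_index_dict_sketch(freq_meta)
# index_dict == {"adj": 0, "raw": 1, "adj_afr": 2, "adj_afr_XX": 3}
```

A dictionary like this lets downstream code write `freq[index_dict["adj_afr"]]` instead of hard-coding array positions.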