Releases: PacificBiosciences/pb-StarPhase
StarPhase v1.1.0
Changes
- HLA data configuration has been automated to support future haplotype additions
- The pbstarphase build command now requires a reference genome FASTA file for hg38; this file is specified via --reference
- pbstarphase build will now automatically pull the latest RefSeq file; coordinates for the MANE transcript are extracted from this resource
- Configurations for HLA genes have been completely updated to a new structure; if using an older database file, the existing defaults from v1.0.0 will be loaded for HLA-A and HLA-B
- Additional HLA genes are now reported by StarPhase: HLA-C, -DPA1, -DPB1, -DQA1, -DQB1, -DRB1, -DRB3, -DRB4, and -DRB5
- HLA algorithm has been updated to accommodate additional HLA genes
- Reads that only partially span an HLA locus are now supported (at least 50% overlap with a gene)
- Reads are now pulled from all specified HLA regions and assigned to a single gene based on closest database match
- Each batch of assigned reads is run through the HLA diplotyping process independently
- The cDNA consensus step has been removed and replaced with an HPC consensus step (similar to CYP2D6); haplotype label assignment still uses the cDNA sequence
- Additional logic has been added to support genes that are commonly absent or hemizygous (HLA-DRB3, -DRB4, and -DRB5) in individuals
- Debug folder updates:
- hla_debug.json has been updated to include target and query unmapped base counts
- read_debug.json has been added to the outputs. This file includes the best mapping of each read to an HLA allele.
- hla_igv_custom has been added to the outputs. This is similar to the previous cyp2d6_igv_custom, but contains the assembled haplotypes for the HLA genes. Consensus sequence, database sequences, and user-specified sequences are mapped to the custom assemblies. The IGV options on the session have been modified to reasonable defaults for comparing consensus to the reads.
- A new version of the database file has been uploaded to support the above changes (v1.1.0/pbstarphase_20250110.json.gz)
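The partial-span rule above (at least 50% overlap with a gene) can be illustrated with a small sketch. It assumes the overlap fraction is measured against the gene length and that coordinates are 0-based and half-open; the release notes do not state either detail, and the function names and coordinates are hypothetical rather than taken from StarPhase.

```python
def gene_overlap_fraction(read_start, read_end, gene_start, gene_end):
    """Fraction of the gene interval covered by the read (0-based, half-open)."""
    if gene_end <= gene_start:
        raise ValueError("gene interval must have positive length")
    overlap = max(0, min(read_end, gene_end) - max(read_start, gene_start))
    return overlap / (gene_end - gene_start)


def read_is_usable(read_start, read_end, gene_start, gene_end, min_fraction=0.5):
    # Assumption: a read qualifies when it covers at least min_fraction of the gene.
    fraction = gene_overlap_fraction(read_start, read_end, gene_start, gene_end)
    return fraction >= min_fraction


# Hypothetical coordinates for illustration only.
gene = (1_000, 5_000)
print(read_is_usable(500, 3_200, *gene))    # read covers 2,200 of 4,000 gene bases
print(read_is_usable(4_500, 9_000, *gene))  # read covers 500 of 4,000 gene bases
```

Under these assumptions the first read passes the threshold and the second does not.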
StarPhase v1.0.1
Fixed
- Fixed an issue where a homozygous deletion in CYP2D6 (*5/*5) was not considered a valid diplotype when using --normalize-d6-only
StarPhase v1.0.0
Changes
- Source code is included in the GitHub repository
- LICENSE.md has been updated to reflect terms and conditions of source code usage
pb-StarPhase v0.14.2
Fixed
- Fixed an issue where a CYP2D6 graph with no edges would lead to a panic during SVG graph visualization
pb-StarPhase v0.14.1
Changes
- Added a new output folder through the --output-debug option: cyp2d6_igv_custom. This folder contains an XML file describing an IGV session as well as the supporting data files to visualize full-length alignments through the two constructed CYP2D6 haplotypes. For details, see the updated user docs.
- Released an updated database file: data/v0.14.1/pbstarphase_20240826.json.gz
Fixed
- Fixed some off-by-one errors in the coordinates for miscellaneous extra regions in CYP2D6. These caused tiny overlaps in the output BAM file described above, but otherwise did not have an impact on diplotyping accuracy.
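As an illustration of how an off-by-one coordinate error can produce the tiny overlaps described above, the sketch below converts 1-based inclusive regions to 0-based half-open ones, once correctly and once with a deliberate bug. The regions and function names are hypothetical; these notes do not say which coordinate convention StarPhase uses internally or what the actual error was.

```python
def to_half_open(start_1based, end_1based):
    """Convert a 1-based inclusive interval to 0-based half-open."""
    return (start_1based - 1, end_1based)


def buggy_to_half_open(start_1based, end_1based):
    # Deliberate bug for illustration: the inclusive end is already the
    # exclusive end in 0-based coordinates, so adding 1 extends it too far.
    return (start_1based - 1, end_1based + 1)


def overlap_bases(a, b):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# Two adjacent hypothetical regions: 101-200 and 201-300 (1-based inclusive).
second = to_half_open(201, 300)
print(overlap_bases(to_half_open(101, 200), second))        # adjacent, no shared bases
print(overlap_bases(buggy_to_half_open(101, 200), second))  # one shared base
```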
pb-StarPhase v0.14.0
Changes
- HLA allele labeling has been updated to improve 4th-field accuracy: When two potential definitions are compared, we now restrict the initial comparison to only the shared regions of the two haplotype sequence definitions (this is often different, especially for DNA sequences). In the event of a tie, we revert to the full-length allele definitions.
- The HLA database configuration has been updated to include strand information for HLA genes. Defaults for HLA-A and HLA-B are set, so no database update is required. This modification will show in the next database release.
- HLA debug consensus outputs are now written on the strand where the gene is located, to improve matching to IMGT/HLA sequences. For example, HLA-A is already on the forward strand, so no change will be made. In contrast, HLA-B is on the reverse strand, so the consensus sequences will be reverse complemented in the output FASTA file.
- Breaking change: CYP2D6 and the HLA genes now share a single debug BAM file through the --output-debug option: debug_consensus.bam
- The previous debug file for CYP2D6, cyp2d6_consensus.bam, has been removed from the outputs. The mappings from this file have been moved into the new debug_consensus.bam file.
- For both HLA genes, the BAM file contains alignments of the HLA consensus sequences and corresponding read sequences used to generate the consensus. Additionally, if the assigned haplotypes have DNA sequences in the database, those sequences are also aligned for comparison purposes.
- Previously deprecated option --debug-hla-target has been repurposed to allow for specification of additional HLA haplotypes to get mapped in this debug BAM. As with the assigned haplotypes, these must have a DNA sequence in the database to get mapped.
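The strand handling described in this release amounts to reverse complementing consensus sequences for reverse-strand genes before they are written out. A minimal sketch follows, assuming plain-string sequences and a hypothetical in-memory strand lookup; StarPhase reads strand information from its database configuration, and its actual implementation is not shown in these notes.

```python
# Complement table covering upper- and lower-case bases plus N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# Strands as stated in the release notes for these two genes.
GENE_STRAND = {"HLA-A": "+", "HLA-B": "-"}


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_to_gene_strand(gene, consensus):
    """Return the consensus as it would read on the gene's own strand."""
    if GENE_STRAND[gene] == "-":
        return reverse_complement(consensus)
    return consensus


print(orient_to_gene_strand("HLA-A", "ACCGT"))  # forward strand: unchanged
print(orient_to_gene_strand("HLA-B", "ACCGT"))  # reverse strand: reverse complemented
```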
pb-StarPhase v0.13.3
Fixed
- Replaced a panic with an error message when low coverage datasets fail to identify any CYP2D6 haplotypes to chain together. These will have a "NO_MATCH" diplotype in the results.
- Fixed a bug where duplicate consensus sequences in CYP2D6 could create a panic; duplicates are now flagged as FalseAlleles and ignored.
pb-StarPhase v0.13.2
Fixed
- Adjusted the alignment parameters for HLA mapping to reduce errors caused by soft-clipping of alignments near the end of a haplotype
- Replaced a panic with an error message when variants are found in unexpected states of zygosity and phase (e.g., phased homozygous)
- Debug messages for HLA calling have been adjusted to improve log reviewability
pb-StarPhase v0.13.1
Changes
- For HLA genes, StarPhase would previously ignore any HLA allele definitions that were missing a DNA sequence in the database. StarPhase now allows these partial HLA allele definitions by default.
- A new option was added to enable the previous behavior: --hla-require-dna. If this option is enabled, any HLA allele definition that is missing a DNA sequence will be ignored and never reported in StarPhase outputs.
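The effect of --hla-require-dna can be pictured as a filter over allele definitions. The sketch below is only an illustration: the record layout (a dict with name, cdna, and dna keys) and the allele names are hypothetical and do not reflect the real database schema.

```python
def usable_alleles(definitions, require_dna=False):
    """Keep every definition by default; drop those lacking DNA when require_dna is set."""
    if not require_dna:
        return list(definitions)
    return [d for d in definitions if d.get("dna")]


# Hypothetical definitions; the second one is "partial" (no DNA sequence).
definitions = [
    {"name": "allele_1", "cdna": "ATGGCC", "dna": "ATGGTAAGCC"},
    {"name": "allele_2", "cdna": "ATGGCA", "dna": None},
]

print([d["name"] for d in usable_alleles(definitions)])
print([d["name"] for d in usable_alleles(definitions, require_dna=True)])
```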
Fixed
- Fixed an issue where a CYP2D6 deletion allele (*5) could be reported on the same haplotype as another allele. While this is biologically possible (e.g., deletion of one *10 in a "*10x2" haplotype), it is not considered a valid star-allele at this time. This combination will still show up in the debug log files, but it will get filtered in final reporting. For example: a "*10+*5" haplotype will now get reported as "*10".
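The reporting rule above can be sketched as a small filter on haplotype labels. This assumes alleles on one haplotype are joined with "+", as in the example; the function is hypothetical and is not StarPhase's implementation.

```python
def filter_deletion(haplotype, deletion="*5"):
    """Drop the deletion allele when it shares a haplotype with other alleles."""
    alleles = haplotype.split("+")
    if len(alleles) > 1:
        kept = [a for a in alleles if a != deletion]
        # Assumption: if only deletion alleles were present, report one deletion.
        alleles = kept or [deletion]
    return "+".join(alleles)


print(filter_deletion("*10+*5"))  # *10
print(filter_deletion("*5"))      # *5
```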
pb-StarPhase v0.13.0
Changes
- The algorithm for HLA-A and HLA-B has been modified to use a consensus-based approach to solve the alleles, a simpler version of the algorithm for CYP2D6.
- CLI options related to consensus generation now control both HLA and CYP2D6 calling. These have been moved into a separate category on the CLI labeled "Consensus (HLA and CYP2D6)".
- In internal tests, these changes slightly improved the accuracy of 4th-field entries in the HLA calls (2nd- and 3rd-field were unaffected). Additionally, the approach significantly reduced compute time requirements, averaging ~10% of CPU time required for v0.12.0.
- With this change, the --threads option does not provide any benefit to the current algorithms. It has been deprecated, but may be added again if future optimizations allow it.
- The --max-error-rate default has been adjusted for comparison to just the reference allele for each HLA gene, with a new default of 0.07 (previously 0.05).
- Previous option --min-allele-fraction for HLA has been removed. The consensus option --min-consensus-fraction is used instead.
- Added a new option, --output-debug, that will create a debug folder with multiple additional files that are primarily for debugging the results from HLA and CYP2D6 calling, but may be useful for researchers. This folder is subject to change as the underlying methods develop. Some of the initial files included:
- consensus_{GENE}.fa - Contains the full consensus sequences generated for a given {GENE}. Currently, this is only for HLA genes and CYP2D6.
- cyp2d6_consensus.bam - Contains mapped substrings from the reads that were used to generate CYP2D6 consensus sequences. The phase set tag (PS) indicates which consensus the sequence was a part of. Useful for visualizing how the consensus ran and whether there are potential errors.
- cyp2d6_link_graph.svg - A graphical representation of the connections present between CYP2D6 consensus segments.
- hla_debug.json - Contains the summary mapping information of each database entry to the generated HLA consensus sequences.
Fixed
- Fixed an issue with build where CPIC genes with no known chromosome would cause an error and exit. These entries are now ignored with a warning.
- Fixed an off-by-one error in the HLA gene region start coordinates. This has been corrected in the latest database release: data/v0.13.0/pbstarphase_20240730.json.gz