Commit

Merge remote-tracking branch 'origin/staging'
susannasiebert committed Aug 21, 2024
2 parents d21bb95 + 6744e24 commit d70277a
Showing 70 changed files with 41,157 additions and 1,817 deletions.
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -68,9 +68,9 @@
 # built documents.
 #
 # The short X.Y version.
-version = '4.2'
+version = '4.3'
 # The full version, including alpha/beta/rc tags.
-release = '4.2.1'
+release = '4.3.0'


 # The language for content autogenerated by Sphinx. Refer to documentation
6 changes: 6 additions & 0 deletions docs/courses.rst
@@ -0,0 +1,6 @@
Courses
=======

- `Introductory pVACtools course <https://course.pvactools.org>`_
- `Introductory immunotherapy workflow course <https://workflow-course.pvactools.org>`_
- `Precision medicine course <https://pmbio.org/>`_
6 changes: 6 additions & 0 deletions docs/funding.rst
@@ -0,0 +1,6 @@
Funding
=======

pVACtools is supported by the National Cancer Institute (U01CA248235) and the
V Foundation for Cancer Research (V2018-007) as well as a generous gift from
Mrs. Cindy Goldberg and Mr. Evan Goldberg.
51 changes: 42 additions & 9 deletions docs/index.rst
@@ -42,27 +42,60 @@ Contents
:maxdepth: 1

install
courses
tools
frequently_asked_questions
releases
license
citation
funding
contribute
contact
mailing_list

New in Release |release|
------------------------

This is a bugfix release. It fixes the following problem(s):

- When running the reference proteome similarity step with a reference proteome
  peptide fasta file and a species other than human or mouse, the run would be aborted
  with an error that the refseq_protein_prot BLASTp database was incompatible with
  the species. This error should not be emitted in this circumstance since
  BLASTp is not run when using a reference proteome peptide fasta file. This
  release fixes this error and allows users to run the reference proteome
  similarity step on non-human and non-mouse data with a reference proteome peptide fasta.
This is a minor feature release. It adds the following features:

- Add a new helper command ``pvacseq|pvacfuse|pvacbind|pvacvector valid_algorithms``
  by @ldhtnp in https://github.com/griffithlab/pVACtools/pull/1108
- When running the ``pvacseq generate_protein_fasta`` command with the ``--phased-proximal-variants-vcf``
  argument, output the intermediate ``proximal_variants.tsv`` file by @evelyn-schmidt
  in https://github.com/griffithlab/pVACtools/pull/1091
- In pVACview, clear the comment text input box after saving the comment by @ldhtnp
  in https://github.com/griffithlab/pVACtools/pull/1113
- Add support for mouse allele anchor positions by @ldhtnp in
  https://github.com/griffithlab/pVACtools/pull/1110
- Skip variants where VEP didn't predict an amino acid change by @susannasiebert
  in https://github.com/griffithlab/pVACtools/pull/1121
- Update the ordering of the fasta file output of the ``pvacseq|pvacfuse generate_protein_fasta``
  command when running with the ``--input-tsv`` argument so that the order of the fasta sequences
  is consistent with the order of the neoantigen candidates in the input TSV by @mhoang22 in
  https://github.com/griffithlab/pVACtools/pull/1002
- Update the ``pvacfuse generate_protein_fasta`` command to allow aggregated TSVs as an input TSV
  by @susannasiebert in https://github.com/griffithlab/pVACtools/pull/1134
- Update pVACview to display the anchor positions currently applied to the data by @susannasiebert
  in https://github.com/griffithlab/pVACtools/pull/1114

This release also fixes the following bug(s):

- Handle invalid pVACfuse characters by trimming the sequence instead of skipping it. The previous
  implementation would lead to missing sequences in certain downstream steps, resulting in errors.
  by @susannasiebert in https://github.com/griffithlab/pVACtools/pull/1130
- Add new pVACview R files to the list of files getting copied into the pVACseq output folder.
  These files were previously not copied into the results folder, leading to errors when running
  the ``pvacview run`` commands on a pVACseq output directory. by @susannasiebert in
  https://github.com/griffithlab/pVACtools/pull/1126
- Remove single DP and DQ alpha and beta chain alleles from the list of supported alleles in MHCnuggetsII.
  This is because those alleles need to be defined as a pair of alpha- and beta-chains in order to be
  meaningful. Also remove DRA alleles from the same list since the DR locus is defined only by the beta
  chain because functional variation in mature DRA gene products is absent. by @susannasiebert in
  https://github.com/griffithlab/pVACtools/pull/1133
- Fix errors in the rounding of the min and max values of the sliders in the custom pVACview module by
  @evelyn-schmidt in https://github.com/griffithlab/pVACtools/pull/1116
- Remove unused code in the Frameshift.pm VEP plugin that causes errors with certain types of variants
  by @susannasiebert in https://github.com/griffithlab/pVACtools/pull/1122

Past release notes can be found on our :ref:`releases` page.

12 changes: 12 additions & 0 deletions docs/pvacbind/additional_commands.rst
@@ -19,6 +19,18 @@ List Valid Alleles

.. program-output:: pvacbind valid_alleles -h

.. _pvacbind_valid_algorithms:

List Valid Algorithms
---------------------

.. program-output:: pvacbind valid_algorithms -h

.. .. argparse::
:module: lib.valid_algorithms
:func: define_parser
:prog: pvacbind valid_algorithms

List Allele-Specific Cutoffs
----------------------------

12 changes: 12 additions & 0 deletions docs/pvacfuse/additional_commands.rst
@@ -29,6 +29,18 @@ List Valid Alleles
:func: define_parser
:prog: pvacfuse valid_alleles

.. _pvacfuse_valid_algorithms:

List Valid Algorithms
---------------------

.. program-output:: pvacfuse valid_algorithms -h

.. .. argparse::
:module: lib.valid_algorithms
:func: define_parser
:prog: pvacfuse valid_algorithms

List Allele-Specific Cutoffs
----------------------------

6 changes: 5 additions & 1 deletion docs/pvacseq.rst
@@ -7,7 +7,11 @@
pVACseq
=======

pVACseq is a cancer immunotherapy pipeline for the identification of **p**\ ersonalized **V**\ ariant **A**\ ntigens by **C**\ ancer **Seq**\ uencing (pVACseq) that integrates tumor mutation and expression data (DNA- and RNA-Seq). It enables cancer immunotherapy research by using massively parallel sequence data to predicting tumor-specific mutant peptides (neoantigens) that can elicit anti-tumor T cell immunity. It is being used in studies of checkpoint therapy response and to identify targets for personalized cancer vaccines and adoptive T cell therapies. For more general information, see the `manuscript published in Genome Medicine <http://www.genomemedicine.com/content/8/1/11>`_.
pVACseq is a cancer immunotherapy pipeline for the identification of **p**\ ersonalized **V**\ ariant **A**\ ntigens by **C**\ ancer **Seq**\ uencing (pVACseq)
that integrates tumor mutation and expression data (DNA- and RNA-Seq). It enables cancer immunotherapy research by using massively parallel sequence data to
predict tumor-specific mutant peptides (neoantigens) that can elicit anti-tumor T cell immunity. It is being used in studies of checkpoint therapy response and
to identify targets for personalized cancer vaccines and adoptive T cell therapies. For more general information, see the
`manuscript published in Cancer Immunology Research <https://doi.org/10.1158/2326-6066.CIR-19-0401>`_.

.. toctree::
:maxdepth: 2
12 changes: 12 additions & 0 deletions docs/pvacseq/additional_commands.rst
@@ -43,6 +43,18 @@ List Valid Alleles
:func: define_parser
:prog: pvacseq valid_alleles

.. _valid_algorithms:

List Valid Algorithms
---------------------

.. program-output:: pvacseq valid_algorithms -h

.. .. argparse::
:module: lib.valid_algorithms
:func: define_parser
:prog: pvacseq valid_algorithms

List Allele-Specific Cutoffs
----------------------------

12 changes: 12 additions & 0 deletions docs/pvacvector/additional_commands.rst
@@ -41,6 +41,18 @@ List Valid Alleles
:func: define_parser
:prog: pvacfuse valid_alleles

.. _pvacvector_valid_algorithms:

List Valid Algorithms
---------------------

.. program-output:: pvacvector valid_algorithms -h

.. .. argparse::
:module: lib.valid_algorithms
:func: define_parser
:prog: pvacvector valid_algorithms

List Allele-Specific Cutoffs
----------------------------

1 change: 1 addition & 0 deletions docs/releases.rst
@@ -18,3 +18,4 @@ Release Notes
releases/4_0
releases/4_1
releases/4_2
releases/4_3
46 changes: 46 additions & 0 deletions docs/releases/4_3.rst
@@ -0,0 +1,46 @@
Version 4.3
===========

Version 4.3.0
-------------

This is a minor feature release. It adds the following features:

- Add a new helper command ``pvacseq|pvacfuse|pvacbind|pvacvector valid_algorithms``
  by @ldhtnp in https://github.com/griffithlab/pVACtools/pull/1108
- When running the ``pvacseq generate_protein_fasta`` command with the ``--phased-proximal-variants-vcf``
  argument, output the intermediate ``proximal_variants.tsv`` file by @evelyn-schmidt
  in https://github.com/griffithlab/pVACtools/pull/1091
- In pVACview, clear the comment text input box after saving the comment by @ldhtnp
  in https://github.com/griffithlab/pVACtools/pull/1113
- Add support for mouse allele anchor positions by @ldhtnp in
  https://github.com/griffithlab/pVACtools/pull/1110
- Skip variants where VEP didn't predict an amino acid change by @susannasiebert
  in https://github.com/griffithlab/pVACtools/pull/1121
- Update the ordering of the fasta file output of the ``pvacseq|pvacfuse generate_protein_fasta``
  command when running with the ``--input-tsv`` argument so that the order of the fasta sequences
  is consistent with the order of the neoantigen candidates in the input TSV by @mhoang22 in
  https://github.com/griffithlab/pVACtools/pull/1002
- Update the ``pvacfuse generate_protein_fasta`` command to allow aggregated TSVs as an input TSV
  by @susannasiebert in https://github.com/griffithlab/pVACtools/pull/1134
- Update pVACview to display the anchor positions currently applied to the data by @susannasiebert
  in https://github.com/griffithlab/pVACtools/pull/1114

This release also fixes the following bug(s):

- Handle invalid pVACfuse characters by trimming the sequence instead of skipping it. The previous
  implementation would lead to missing sequences in certain downstream steps, resulting in errors.
  by @susannasiebert in https://github.com/griffithlab/pVACtools/pull/1130
- Add new pVACview R files to the list of files getting copied into the pVACseq output folder.
  These files were previously not copied into the results folder, leading to errors when running
  the ``pvacview run`` commands on a pVACseq output directory. by @susannasiebert in
  https://github.com/griffithlab/pVACtools/pull/1126
- Remove single DP and DQ alpha and beta chain alleles from the list of supported alleles in MHCnuggetsII.
  This is because those alleles need to be defined as a pair of alpha- and beta-chains in order to be
  meaningful. Also remove DRA alleles from the same list since the DR locus is defined only by the beta
  chain because functional variation in mature DRA gene products is absent. by @susannasiebert in
  https://github.com/griffithlab/pVACtools/pull/1133
- Fix errors in the rounding of the min and max values of the sliders in the custom pVACview module by
  @evelyn-schmidt in https://github.com/griffithlab/pVACtools/pull/1116
- Remove unused code in the Frameshift.pm VEP plugin that causes errors with certain types of variants
  by @susannasiebert in https://github.com/griffithlab/pVACtools/pull/1122
1 change: 1 addition & 0 deletions pvactools/lib/__init__.py
@@ -11,6 +11,7 @@
     "fasta_generator",
     "output_parser",
     "valid_alleles",
+    'valid_algorithms',
     'net_chop',
     "netmhc_stab",
     "filter",
59 changes: 38 additions & 21 deletions pvactools/lib/aggregate_all_epitopes.py
@@ -8,6 +8,9 @@
 from abc import ABCMeta, abstractmethod
 import itertools
 import csv
+import glob
+import ast
+from pvactools.lib.run_utils import get_anchor_positions

 from pvactools.lib.prediction_class import PredictionClass

@@ -123,6 +126,17 @@ def determine_used_prediction_algorithms(self):
                 prediction_algorithms.append(algorithm)
         return prediction_algorithms

+    def determine_used_epitope_lengths(self):
+        col_name = self.determine_epitope_seq_column_name()
+        return list(set([len(s) for s in pd.read_csv(self.input_file, delimiter="\t", usecols=[col_name])[col_name]]))
+
+    def determine_epitope_seq_column_name(self):
+        headers = pd.read_csv(self.input_file, delimiter="\t", nrows=0).columns.tolist()
+        for header in ["MT Epitope Seq", "Epitope Seq"]:
+            if header in headers:
+                return header
+        raise Exception("No mutant epitope sequence header found.")
+
     def problematic_positions_exist(self):
         headers = pd.read_csv(self.input_file, delimiter="\t", nrows=0).columns.tolist()
         return 'Problematic Positions' in headers
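
For readers following the two helpers added in this hunk, the idea can be sketched without pandas. This is a minimal stand-alone illustration, not the project's code: it swaps `pd.read_csv` for the standard-library `csv` module, the function names and the demo rows are invented, and it returns a sorted list where the hunk returns an unordered `list(set(...))`.

```python
import csv
import io

# Candidate column names, tried in the same priority order as the hunk above.
CANDIDATE_HEADERS = ["MT Epitope Seq", "Epitope Seq"]


def epitope_seq_column_name(headers):
    """Return the first candidate header present in the TSV header row."""
    for header in CANDIDATE_HEADERS:
        if header in headers:
            return header
    raise ValueError("No mutant epitope sequence header found.")


def used_epitope_lengths(tsv_text):
    """Collect the distinct epitope lengths found in a tab-separated table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    col_name = epitope_seq_column_name(reader.fieldnames)
    # A set removes duplicate lengths; sorting only makes the result stable.
    return sorted({len(row[col_name]) for row in reader})


# Made-up rows for illustration only (hypothetical gene names and peptides).
demo_tsv = (
    "Gene Name\tMT Epitope Seq\n"
    "GENE1\tKLLEAAAYV\n"
    "GENE2\tVVGADGVGK\n"
    "GENE3\tLLDIDETEYH\n"
)
lengths = used_epitope_lengths(demo_tsv)
print(lengths)
```

Two of the demo peptides have 9 residues and one has 10, so only two distinct lengths are reported.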
@@ -160,18 +174,18 @@ def determine_columns_used_for_aggregation(self, prediction_algorithms, el_algor

     def set_column_types(self, prediction_algorithms):
         dtypes = {
-            'Chromosome': str,
+            'Chromosome': "string",
             "Start": "int32",
             "Stop": "int32",
-            'Reference': str,
-            'Variant': str,
+            'Reference': "string",
+            'Variant': "string",
             "Variant Type": "category",
             "Mutation Position": "category",
             "Median MT IC50 Score": "float32",
             "Median MT Percentile": "float32",
             "Best MT IC50 Score": "float32",
             "Best MT Percentile": "float32",
-            "Protein Position": "str",
+            "Protein Position": "string",
             "Transcript Length": "int32",
         }
         for algorithm in prediction_algorithms:
@@ -183,6 +197,7 @@ def set_column_types(self, prediction_algorithms):

     def execute(self):
         prediction_algorithms = self.determine_used_prediction_algorithms()
+        epitope_lengths = self.determine_used_epitope_lengths()
         el_algorithms = self.determine_used_el_algorithms()
         used_columns = self.determine_columns_used_for_aggregation(prediction_algorithms, el_algorithms)
         dtypes = self.set_column_types(prediction_algorithms)
@@ -209,6 +224,7 @@ def execute(self):
                 'allele_specific_anchors': self.allele_specific_anchors,
                 'alleles': self.hla_types.tolist(),
                 'anchor_contribution_threshold': self.anchor_contribution_threshold,
+                'epitope_lengths': epitope_lengths,
             }
         else:
             metrics = {}
@@ -281,6 +297,20 @@ def __init__(
                     probs[hla] = line
             anchor_probabilities[length] = probs
         self.anchor_probabilities = anchor_probabilities
+
+        mouse_anchor_positions = {}
+        for length in [8, 9, 10, 11]:
+            base_dir = os.path.abspath(os.path.join(os.path.dirname(os.path.realpath(__file__)), '..'))
+            file_name = os.path.join(base_dir, 'tools', 'pvacview', 'data', "mouse_anchor_predictions_{}_mer.tsv".format(length))
+            values = {}
+            with open(file_name, 'r') as fh:
+                reader = csv.DictReader(fh, delimiter="\t")
+                for line in reader:
+                    allele = line.pop('Allele')
+                    values[allele] = {int(k): ast.literal_eval(v) for k, v in line.items()}
+            mouse_anchor_positions[length] = values
+        self.mouse_anchor_positions = mouse_anchor_positions
+
         self.allele_specific_anchors = allele_specific_anchors
         self.anchor_contribution_threshold = anchor_contribution_threshold
         super().__init__()
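
The parsing pattern added in this hunk (pop the `Allele` column, convert the remaining header names to integers, and evaluate each cell as a Python literal) can be tried in isolation. The sketch below is only an illustration under assumptions: the actual layout of the `mouse_anchor_predictions_*_mer.tsv` files is not shown in this commit view, so the column headers, the allele name and the cell values are invented, and `parse_anchor_table` is a hypothetical helper name.

```python
import ast
import csv
import io


def parse_anchor_table(handle):
    """Map each allele to {position: parsed literal} from a TSV file handle."""
    values = {}
    reader = csv.DictReader(handle, delimiter="\t")
    for line in reader:
        # Remove the key column so that only per-position columns remain.
        allele = line.pop("Allele")
        # ast.literal_eval accepts Python literals only (numbers, booleans,
        # lists, ...) and raises on anything else, unlike eval().
        values[allele] = {int(k): ast.literal_eval(v) for k, v in line.items()}
    return values


# Invented table contents, standing in for one of the real data files.
demo = io.StringIO("Allele\t1\t2\t3\nH-2-Kb\tFalse\tTrue\t0.25\n")
table = parse_anchor_table(demo)
print(table)
```

Using `ast.literal_eval` instead of `eval` keeps the loader from executing arbitrary expressions if a data file is malformed.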
@@ -362,7 +392,7 @@ def is_anchor_residue_pass(self, mutation):
             binding_threshold = self.binding_threshold

         anchor_residue_pass = True
-        anchors = self.get_anchor_positions(mutation['HLA Allele'], len(mutation['MT Epitope Seq']))
+        anchors = get_anchor_positions(mutation['HLA Allele'], len(mutation['MT Epitope Seq']), self.allele_specific_anchors, self.anchor_probabilities, self.anchor_contribution_threshold, self.mouse_anchor_positions)
         # parse out mutation position from str
         position = mutation["Mutation Position"]
         if pd.isna(position):
@@ -382,20 +412,6 @@ def is_anchor_residue_pass(self, mutation):
                     anchor_residue_pass = False
         return anchor_residue_pass

-    def get_anchor_positions(self, hla_allele, epitope_length):
-        if self.allele_specific_anchors and epitope_length in self.anchor_probabilities and hla_allele in self.anchor_probabilities[epitope_length]:
-            probs = self.anchor_probabilities[epitope_length][hla_allele]
-            positions = []
-            total_prob = 0
-            for (pos, prob) in sorted(probs.items(), key=lambda x: x[1], reverse=True):
-                total_prob += float(prob)
-                positions.append(int(pos))
-                if total_prob > self.anchor_contribution_threshold:
-                    return positions
-
-        return [1, 2, epitope_length - 1 , epitope_length]
-
-
     #assign mutations to a "Classification" based on their favorability
     def get_tier(self, mutation, vaf_clonal):
         if self.use_allele_specific_binding_thresholds and mutation['HLA Allele'] in self.allele_specific_binding_thresholds:
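
The method removed in this hunk picks anchor positions by accumulating per-position probabilities, highest first, until a contribution threshold is exceeded, and otherwise falls back to the first two and last two positions. A stand-alone sketch of that logic follows. It is an approximation for illustration: the function name and demo probabilities are invented, it sorts on `float(prob)` rather than on the raw value as the removed code did, and it does not model the `mouse_anchor_positions` argument accepted by the replacement in `pvactools.lib.run_utils`, whose body is not shown in this commit view.

```python
def anchor_positions_sketch(hla_allele, epitope_length, allele_specific_anchors,
                            anchor_probabilities, anchor_contribution_threshold):
    """Return anchor positions for one allele and epitope length."""
    if (allele_specific_anchors
            and epitope_length in anchor_probabilities
            and hla_allele in anchor_probabilities[epitope_length]):
        probs = anchor_probabilities[epitope_length][hla_allele]
        positions = []
        total_prob = 0.0
        # Walk positions from most to least probable anchor.
        for pos, prob in sorted(probs.items(), key=lambda x: float(x[1]), reverse=True):
            total_prob += float(prob)
            positions.append(int(pos))
            if total_prob > anchor_contribution_threshold:
                return positions
    # Default: the first two and last two positions of the epitope.
    return [1, 2, epitope_length - 1, epitope_length]


# Invented probabilities, stored as strings the way csv.DictReader yields them.
demo_probs = {9: {"HLA-A*02:01": {"1": "0.05", "2": "0.45", "5": "0.10", "9": "0.40"}}}

specific = anchor_positions_sketch("HLA-A*02:01", 9, True, demo_probs, 0.8)
fallback = anchor_positions_sketch("HLA-B*07:02", 9, True, demo_probs, 0.8)
print(specific, fallback)
```

In the demo, positions 2 and 9 together contribute 0.85, which exceeds the 0.8 threshold, so only those two are returned; the unknown allele gets the default positions.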
@@ -731,10 +747,11 @@ def sort_table(self, df):
     def copy_pvacview_r_files(self):
         module_dir = os.path.dirname(__file__)
         r_folder = os.path.abspath(os.path.join(module_dir,"..","tools","pvacview"))
+        files = glob.iglob(os.path.join(r_folder, "*.R"))
         destination = os.path.abspath(os.path.dirname(self.output_file))
         os.makedirs(os.path.join(destination, "www"), exist_ok=True)
-        for i in ["ui.R", "app.R", "server.R", "styling.R", "anchor_and_helper_functions.R"]:
-            shutil.copy(os.path.join(r_folder, i), os.path.join(destination, i))
+        for i in files:
+            shutil.copy(i, destination)
         for i in ["anchor.jpg", "pVACview_logo.png", "pVACview_logo_mini.png"]:
             shutil.copy(os.path.join(r_folder, "www", i), os.path.join(destination, "www", i))
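
The switch from a hard-coded file list to a glob is what lets newly added R files be copied without touching this method again. A small self-contained sketch of the pattern, run against a temporary directory, is shown here; the helper name `copy_r_files` and the demo file names are invented for illustration.

```python
import glob
import os
import shutil
import tempfile


def copy_r_files(r_folder, destination):
    """Copy every top-level *.R file from r_folder into destination."""
    os.makedirs(destination, exist_ok=True)
    copied = []
    for path in glob.iglob(os.path.join(r_folder, "*.R")):
        # With a directory as the target, shutil.copy keeps the file name.
        shutil.copy(path, destination)
        copied.append(os.path.basename(path))
    return sorted(copied)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "pvacview")
    dst = os.path.join(tmp, "out")
    os.makedirs(src)
    # Hypothetical file names; only the ones ending in .R should be copied.
    for name in ["ui.R", "server.R", "new_module.R", "notes.txt"]:
        with open(os.path.join(src, name), "w") as fh:
            fh.write("# placeholder\n")
    copied = copy_r_files(src, dst)
    present = sorted(os.listdir(dst))

print(copied)
```

The trade-off is that any `*.R` file placed in the source folder is now shipped, so the folder contents effectively become the list.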
