Releases: WrightonLabCSU/DRAM
v2.0.0-beta5
Fix typo causing bug in main script
Full Changelog: v2.0.0-beta4...v2.0.0-beta5
v2.0.0-beta4
What's Changed
- Give default to ch_distill_sql_script since it is used always in annotations by @madeline-scyphers in #383
- Bugfix/path creation bug by @madeline-scyphers in #385
- Feature/kegg pep directory by @madeline-scyphers in #387
Full Changelog: v2.0.0-beta3...v2.0.0-beta4
v2.0.0-beta3
What's Changed
DRAM v2 is wrapped in Nextflow for its scalability on HPC systems and its support for containerization, which together improve reproducibility and version control. DRAM is also containerized, so users can run it with Docker, Singularity, or other container runtimes, or continue to use Conda. Databases are now largely preformatted for users. All of this serves the goal of making DRAM easier to install, use, and scale.
Pre Beta
- Nextflow initial wrapping
- DRAM package restructuring for Nextflow
- Database preformatting changes
- Containerization
Previous Betas from old repo
Beta 1
- Removed hard-coded slurm node and slurm_queue in nextflow.config by @BioRRW in WrightonLabCSU/DRAM2#1
- Dev by @BioRRW in #2 - WrightonLabCSU/DRAM2#13
- Visualizations/make product by @madeline-scyphers in WrightonLabCSU/DRAM2#12
- Dev by @BioRRW in WrightonLabCSU/DRAM2#14
- Visualizations/docstrings by @madeline-scyphers in WrightonLabCSU/DRAM2#15
- Add README to visualization package by @madeline-scyphers in WrightonLabCSU/DRAM2#16
- Viz/move viz to installable package by @madeline-scyphers in WrightonLabCSU/DRAM2#20
- Feature/kegg db formatting by @madeline-scyphers in WrightonLabCSU/DRAM2#23
- Package/add docker file by @madeline-scyphers in WrightonLabCSU/DRAM2#24
- Kegg formating, docker, visualization package, dev notes by @madeline-scyphers in WrightonLabCSU/DRAM2#19
Beta 2
- Replace many ./ paths with using NF's projectDir variable by @madeline-scyphers in WrightonLabCSU/DRAM2#25
- Config/split config by @madeline-scyphers in WrightonLabCSU/DRAM2#26
- Update docs with new install instructions by @madeline-scyphers in WrightonLabCSU/DRAM2#27
Beta 3
- Move from DRAM2 name back to DRAM
- Split the DRAM Nextflow configuration more cleanly between internal and user settings
New Contributors
- @jrr-microbio made their first contribution in #320
- @BioRRW made their first contribution in WrightonLabCSU/DRAM2#1
- @madeline-scyphers made their first contribution in #364
Full Changelog: v1.5.0...v2.0.0-beta3
DRAM 1.5.0 release
This is the official release of DRAM1.5.0. The 1.5.0 release has significant changes that could impact your research. Please review these changes and help us validate this release!
Install / upgrade:
If DRAM was installed with Bioconda, it can be upgraded like any Conda package. Note that the Conda package for DRAM may be delayed slightly while it is validated, but it should be available within a day or two of the release.
If you already have a DRAM environment and want to upgrade:
# Activate your old DRAM environment first!
# Save your old config
DRAM-setup.py export_config > my_old_config.txt
# install DRAM
wget https://raw.githubusercontent.com/shafferm/DRAM/master/environment.yaml
conda env update -f environment.yaml -n DRAM --prune
# import your old databases
DRAM-setup.py import_config --config_loc my_old_config.txt
If you are using an old database, as in the example above, you may need to check out a special version of DRAM from GitHub.
git clone https://github.com/WrightonLabCSU/DRAM.git
cd DRAM
git checkout dbcan_no_ec
conda env update -f environment.yaml -n DRAM --prune
conda activate DRAM
conda install pip
pip install ./
To install DRAM in a new Conda environment, follow the instructions in the README.
Change log DRAM1.5.0:
- DRAM annotate can now include a new database, CAMPER (Curated Annotations for Microbial (Poly)phenol Enzymes and Reactions). Please visit the CAMPER GitHub for more information: https://github.com/WrightonLabCSU/CAMPER
- Temporary mmseqs files that accumulate during annotation are now removed immediately after a given sample has been processed. Before, these files were removed only after all samples were annotated. This reduces the storage space needed for a given DRAM run.
- Fixed a "scikit-bio" related error. This error arose when scikit-bio was updated. While always using the latest version of software can be important for security updates, the stability of DRAM is our main concern. To solve this, we have explicitly stated the version of each dependency in the environment.yaml file.
DRAM 1.4.6 -> Point Release
This is the official release of DRAM1.4.6. The 1.4.0 release has significant changes that could impact your research. The 1.4.4 point release is less significant, but still important for DRAM-v and DRAM users. DRAM 1.4.5 and 1.4.6 are bug fix releases, so there is no new information. Please review these changes and help us validate this release!
Install / upgrade:
If DRAM was installed with Bioconda, it can be upgraded like any Conda package. Note that the Conda package for DRAM may be delayed slightly while it is validated, but it should be available within a day or two of the release.
If you already have a DRAM environment and want to upgrade:
# Activate your old DRAM environment first!
# Save your old config
DRAM-setup.py export_config > my_old_config.txt
# install DRAM
wget https://raw.githubusercontent.com/shafferm/DRAM/master/environment.yaml
conda env update -f environment.yaml -n DRAM --prune
# import your old databases
DRAM-setup.py import_config --config_loc my_old_config.txt
If you are using an old database, as in the example above, you may need to check out a special version of DRAM from GitHub.
git clone https://github.com/WrightonLabCSU/DRAM.git
cd DRAM
git checkout dbcan_no_ec
conda env update -f environment.yaml -n DRAM --prune
conda activate DRAM
conda install pip
pip install ./
To install DRAM in a new Conda environment, follow the instructions in the README.
Point Release Update:
1.4.6:
- To accommodate changes in RefSeq viral, the number of viral files has been changed from 2 to 1.
1.4.5:
- Bug fix related to the default values of the config file. Specifically, the CONFIG file retained information from testing that could interfere with the setup process if the paths were not overwritten by new databases.
- Added a unit test to check that the CONFIG file committed to GitHub remains compatible in the future.
1.4.4:
- Bug fixes have been made to the setup script to support the many ways the DRAM databases get built. You will see them in the merge history.
- Previously, the DRAM-v AMG summary did not add match data for AMGs that were matched to the AMG Database only. This was confusing, so information relevant to the AMG Database is now in the AMG summary along with the Metabolic Database. This adds the new columns "metabolism", "reference", and "verified", and the "gene_id_origin" field, which tells you where this Gene ID came from. Remember that a sequence can match to more than one sequence and this is more common in the AMG Database, so your AMG Summary will be longer and contain more duplicates.
- DRAM1.4.X collects subfamily EC numbers for the raw annotations, but does not use them in the distillation process. We have future plans for these EC numbers, but in the meantime this makes it impossible to use older versions of the DRAM databases with the newer DRAM1.4.X. This is not ideal, as we do strive for backwards compatibility. Sadly, the only solution at this time is a branch that does not look for the EC numbers. Use the instructions above or in the README to install the dbcan_no_ec branch from git.
- Most output arguments are now required, with only a few exceptions. Most people will not notice this.
Change log DRAM1.4.0:
- DRAM distill now includes a new metabolism for methylation. Although planned for DRAM2, you can already include this tool in annotation and distillation, provided you follow the instructions below.
In order to distill with methyl, you need only download the new FASTA file and point to it with the DRAM custom database options that were introduced in DRAM1.3. Note that in order to distill correctly, you will need to use the correct name ‘methyl’ and must use DRAM 1.4.
To annotate with methyl, do something like:
wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy.faa
DRAM.py annotate -i '/some/path/*.fasta' -o dram_output --threads 30 --custom_db_name methyl --custom_fasta_loc methylotrophy.faa
To distill with methyl:
wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy_distillate.tsv
DRAM.py distill -i dram_output/annotations.tsv -o dram_output/distillate --custom_distillate methylotrophy_distillate.tsv
Learn more about custom databases in the Wiki.
- Glycoside hydrolase subfamily calls: subfamily calls are now being incorporated into annotations with changes in databases and code. This impacts what gets pulled into the distillate and product, because these look for the family level (e.g. AA1), not the subfamily level (e.g. AA1_1, AA2_2).
In response, DRAM is changing the output of the dbCAN database in DRAM1.4. Raw cazyme subfamilies will be output into the cazy_id column, and the corresponding description for the cazyme family will be put into the cazy_hit column.
The distillation in DRAM1.4 will count cazymes marked at the subfamily level at the family level; this means that for cazyme family AA1 there will be 4 entries in the distillate, AA1, AA1_1, AA1_2, and AA1_3, and the sum of these four will be the total number of AA1 cazymes. In DRAM1.3 and previous, the distillate entry for this example, AA1 with no underscore, would include cazymes that can be assigned to family AA1 but do not have a subfamily designation.
The DRAM product will also count cazymes at the family level. For the AA1 example, AA1_1, AA1_2, and AA1_3 will be counted as AA1 under the current rules for assigning cazymes to compounds.
- More changes are also being made that will affect CAZY IDs in DRAM1.4. The cutoff e-value is being changed to 1e-18 to conform to best practices for the database.
DRAM1.4 also introduced a new column for the best hit per gene from the dbCAN database, named cazy_best_hit. This column will be the match to the gene that has the highest coverage and lowest full-sequence e-value as calculated by mmseqs, with priority on e-value. cazy_best_hit will be the only column considered downstream in the distillate and product. DRAM1.3 pulls and counts all dbCAN hits above e-value 1e-15, rather than profiling best hits.
A new column corresponding to EC number information from subfamilies, named cazy_subfamily_ec, has been added in DRAM1.4. These EC numbers will also be used as part of the distillate along with those from KEGG, as part of pathways and other tools. For now, incomplete EC numbers will be included, but not considered for the distillate. The subfamilies will be excluded from the product in order to facilitate its goal of being a larger overview.
- Logging is now fully implemented in DRAM1.4. Log files will be created for almost all DRAM functions. The log file for annotations will appear in the annotations folder by default, and the log file for the DRAM distillation will by default be in the distillation folder. You can also use the --log_file_path argument to set the log path. A log file for database processing is set by the config file, and by default it will be in the databases directory. All content that DRAM prints to the command line will appear in the log file.
- The DRAM config now stores when databases were downloaded, citation information, and version information when applicable. This information is printed to the log at the beginning of each run. The old format can still be imported if you want to keep your DRAM1.3 databases.
- In 1.4 you can set a config file to use in DRAM annotation and distillation at run time in 2 ways: (1) use --config_loc with DRAM.py or DRAM-v.py, or (2) set the environment variable DRAM_CONFIG_LOCATION. This will not store or import the config, and that config will only be used for that run.
- Significant bug fixes are also included in this release.
- When the input FASTAs contain duplicate header names, the DRAM annotate step should fail with an error immediately, not at the end of the annotation process; this will save some people a lot of time. It may be that this is only a problem for annotating genomes, but in any case the check must be in place across workflows.
- Some users have firewalls on their HPC environments that prevent downloads via FTP; in some cases, converting to HTTP can solve download problems. In DRAM1.4, if FTP links fail, a backup HTTP link will be attempted before an error is thrown. See issue #206.
- DRAM1.4 will ensure that if no databases are downloaded, DRAM setup will still work. Previously, some databases depended on data being downloaded and could not be set up with a provided data set.
- Reduced unnecessary warnings in various repetitive tasks in DRAM distillation by refactoring pandas code.
- BIO-RELATED: This bug fix could affect biology. In the past, the counting of EC numbers was inconsistent. When counting the number of EC numbers in a row of the annotations file, duplicates were not counted; however, when counting the EC numbers for the full set of data, the count included such duplicates. This is now corrected, but it could have some small unexpected downstream effects.
- Glycoside hydrolase subfamily calls.
- In response to issue #122, you can now pass a config file at run time or by setting the environment variable DRAM_CONFIG_LOCATION. Read more in the Wiki.
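The two run-time config options in this change log (the --config_loc flag and the DRAM_CONFIG_LOCATION environment variable) could be resolved roughly as in the sketch below. This is an illustration only, not DRAM's actual code: the function name resolve_config is hypothetical, and the precedence shown (command-line flag over environment variable) is an assumption that the release notes do not state.

```python
import argparse
import os

ENV_VAR = "DRAM_CONFIG_LOCATION"  # environment variable named in the release notes


def resolve_config(argv, environ=None):
    """Return the config path to use for this run, or None for the stored config.

    Assumption (not from the release notes): an explicit --config_loc
    takes precedence over the environment variable.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--config_loc", default=None)
    # parse_known_args ignores the other arguments of the real command line
    args, _unknown = parser.parse_known_args(argv)
    if args.config_loc:
        return args.config_loc
    return environ.get(ENV_VAR)
```

Because nothing is written back to disk, a path chosen this way applies to a single run only, which matches the behavior the notes describe.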
Known issues:
- Speed and memory remain a big problem for DRAM and the estimates in the wiki and other documentation are woefully out of date. Fixing this is a major priority.
- The annotation merging tool lacks sufficient checks, and fails when files are missing.
- Code c...
DRAM 1.4.5 -> Point Release
This is the official release of DRAM1.4.5. The 1.4.0 release has significant changes that could impact your research. The 1.4.4 point release is less significant, but still important for DRAM-v and DRAM users. DRAM 1.4.5 is a bug fix release, so there is no new information. Please review these changes and help us validate this release!
Install / upgrade:
If DRAM was installed with Bioconda, it can be upgraded like any Conda package. Note that the Conda package for DRAM may be delayed slightly while it is validated, but it should be available within a day or two of the release.
If you already have a DRAM environment and want to upgrade:
# Activate your old DRAM environment first!
# Save your old config
DRAM-setup.py export_config > my_old_config.txt
# install DRAM
wget https://raw.githubusercontent.com/shafferm/DRAM/master/environment.yaml
conda env update -f environment.yaml -n DRAM --prune
# import your old databases
DRAM-setup.py import_config --config_loc my_old_config.txt
If you are using an old database, as in the example above, you may need to check out a special version of DRAM from GitHub.
git clone https://github.com/WrightonLabCSU/DRAM.git
cd DRAM
git checkout dbcan_no_ec
conda env update -f environment.yaml -n DRAM --prune
conda activate DRAM
conda install pip
pip install ./
To install DRAM in a new Conda environment, follow the instructions in the README.
Point Release Update:
1.4.6:
- To accommodate changes in RefSeq viral, the number of viral files has been changed from 2 to 1.
1.4.5:
- Bug fix related to the default values of the config file. Specifically, the CONFIG file retained information from testing that could interfere with the setup process if the paths were not overwritten by new databases.
- Added a unit test to check that the CONFIG file committed to GitHub remains compatible in the future.
1.4.4:
- Bug fixes have been made to the setup script to support the many ways the DRAM databases get built. You will see them in the merge history.
- Previously, the DRAM-v AMG summary did not add match data for AMGs that were matched to the AMG Database only. This was confusing, so information relevant to the AMG Database is now in the AMG summary along with the Metabolic Database. This adds the new columns "metabolism", "reference", and "verified", and the "gene_id_origin" field, which tells you where this Gene ID came from. Remember that a sequence can match to more than one sequence and this is more common in the AMG Database, so your AMG Summary will be longer and contain more duplicates.
- DRAM1.4.X collects subfamily EC numbers for the raw annotations, but does not use them in the distillation process. We have future plans for these EC numbers, but in the meantime this makes it impossible to use older versions of the DRAM databases with the newer DRAM1.4.X. This is not ideal, as we do strive for backwards compatibility. Sadly, the only solution at this time is a branch that does not look for the EC numbers. Use the instructions above or in the README to install the dbcan_no_ec branch from git.
- Most output arguments are now required, with only a few exceptions. Most people will not notice this.
Change log DRAM1.4.0:
- DRAM distill now includes a new metabolism for methylation. Although planned for DRAM2, you can already include this tool in annotation and distillation, provided you follow the instructions below.
In order to distill with methyl, you need only download the new FASTA file and point to it with the DRAM custom database options that were introduced in DRAM1.3. Note that in order to distill correctly, you will need to use the correct name ‘methyl’ and must use DRAM 1.4.
To annotate with methyl, do something like:
wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy.faa
DRAM.py annotate -i '/some/path/*.fasta' -o dram_output --threads 30 --custom_db_name methyl --custom_fasta_loc methylotrophy.faa
To distill with methyl:
wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy_distillate.tsv
DRAM.py distill -i dram_output/annotations.tsv -o dram_output/distillate --custom_distillate methylotrophy_distillate.tsv
Learn more about custom databases in the Wiki.
- Glycoside hydrolase subfamily calls: subfamily calls are now being incorporated into annotations with changes in databases and code. This impacts what gets pulled into the distillate and product, because these look for the family level (e.g. AA1), not the subfamily level (e.g. AA1_1, AA2_2).
In response, DRAM is changing the output of the dbCAN database in DRAM1.4. Raw cazyme subfamilies will be output into the cazy_id column, and the corresponding description for the cazyme family will be put into the cazy_hit column.
The distillation in DRAM1.4 will count cazymes marked at the subfamily level at the family level; this means that for cazyme family AA1 there will be 4 entries in the distillate, AA1, AA1_1, AA1_2, and AA1_3, and the sum of these four will be the total number of AA1 cazymes. In DRAM1.3 and previous, the distillate entry for this example, AA1 with no underscore, would include cazymes that can be assigned to family AA1 but do not have a subfamily designation.
The DRAM product will also count cazymes at the family level. For the AA1 example, AA1_1, AA1_2, and AA1_3 will be counted as AA1 under the current rules for assigning cazymes to compounds.
- More changes are also being made that will affect CAZY IDs in DRAM1.4. The cutoff e-value is being changed to 1e-18 to conform to best practices for the database.
DRAM1.4 also introduced a new column for the best hit per gene from the dbCAN database, named cazy_best_hit. This column will be the match to the gene that has the highest coverage and lowest full-sequence e-value as calculated by mmseqs, with priority on e-value. cazy_best_hit will be the only column considered downstream in the distillate and product. DRAM1.3 pulls and counts all dbCAN hits above e-value 1e-15, rather than profiling best hits.
A new column corresponding to EC number information from subfamilies, named cazy_subfamily_ec, has been added in DRAM1.4. These EC numbers will also be used as part of the distillate along with those from KEGG, as part of pathways and other tools. For now, incomplete EC numbers will be included, but not considered for the distillate. The subfamilies will be excluded from the product in order to facilitate its goal of being a larger overview.
- Logging is now fully implemented in DRAM1.4. Log files will be created for almost all DRAM functions. The log file for annotations will appear in the annotations folder by default, and the log file for the DRAM distillation will by default be in the distillation folder. You can also use the --log_file_path argument to set the log path. A log file for database processing is set by the config file, and by default it will be in the databases directory. All content that DRAM prints to the command line will appear in the log file.
- The DRAM config now stores when databases were downloaded, citation information, and version information when applicable. This information is printed to the log at the beginning of each run. The old format can still be imported if you want to keep your DRAM1.3 databases.
- In 1.4 you can set a config file to use in DRAM annotation and distillation at run time in 2 ways: (1) use --config_loc with DRAM.py or DRAM-v.py, or (2) set the environment variable DRAM_CONFIG_LOCATION. This will not store or import the config, and that config will only be used for that run.
- Significant bug fixes are also included in this release.
- When the input FASTAs contain duplicate header names, the DRAM annotate step should fail with an error immediately, not at the end of the annotation process; this will save some people a lot of time. It may be that this is only a problem for annotating genomes, but in any case the check must be in place across workflows.
- Some users have firewalls on their HPC environments that prevent downloads via FTP; in some cases, converting to HTTP can solve download problems. In DRAM1.4, if FTP links fail, a backup HTTP link will be attempted before an error is thrown. See issue #206.
- DRAM1.4 will ensure that if no databases are downloaded, DRAM setup will still work. Previously, some databases depended on data being downloaded and could not be set up with a provided data set.
- Reduced unnecessary warnings in various repetitive tasks in DRAM distillation by refactoring pandas code.
- BIO-RELATED: This bug fix could affect biology. In the past, the counting of EC numbers was inconsistent. When counting the number of EC numbers in a row of the annotations file, duplicates were not counted; however, when counting the EC numbers for the full set of data, the count included such duplicates. This is now corrected, but it could have some small unexpected downstream effects.
- Glycoside hydrolase subfamily calls.
- In response to issue #122, you can now pass a config file at run time or by setting the environment variable DRAM_CONFIG_LOCATION. Read more in the Wiki.
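The EC-counting inconsistency described in the BIO-RELATED item can be illustrated with a small sketch. It is not DRAM's implementation: the regular expression, the function names, and the choice to deduplicate per row in both code paths are assumptions made for illustration, since the notes do not say which way the fix went. The idea is that the dataset-wide count is built from the same per-row sets, so the two numbers cannot disagree.

```python
import re
from collections import Counter

# Hypothetical pattern for EC numbers such as "EC:1.1.1.1" or "EC 3.2.1.-"
EC_PATTERN = re.compile(r"EC[: ]?(\d+(?:\.(?:\d+|-)){3})")


def row_ecs(description):
    """Unique EC numbers mentioned in one annotation row."""
    return set(EC_PATTERN.findall(description))


def count_ecs(descriptions):
    """Dataset-wide EC counts built from the same per-row unique sets."""
    totals = Counter()
    for text in descriptions:
        totals.update(row_ecs(text))
    return totals


rows = [
    "alcohol dehydrogenase [EC:1.1.1.1] (EC 1.1.1.1)",  # same EC listed twice
    "enzyme [EC:1.1.1.1]; endoglucanase [EC:3.2.1.4]",
]
totals = count_ecs(rows)
# Per-row counts and the dataset total now agree
assert sum(len(row_ecs(r)) for r in rows) == sum(totals.values())
```

Counting the raw regex matches across the whole table instead would report the first row's EC number twice, which is the kind of mismatch the release notes describe.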
Known issues:
- Speed and memory remain a big problem for DRAM and the estimates in the wiki and other documentation are woefully out of date. Fixing this is a major priority.
- The annotation merging tool lacks sufficient checks, and fails when files are missing.
- Code coverage remains low, especially for the l...
DRAM 1.4.4+ -> Point Release
This is the official release of DRAM1.4.4. The 1.4.0 release has significant changes that could impact your research. The 1.4.4 point release is less significant, but still important for dram-v and dram users. Please review these changes and help us validate this release!
Install / upgrade:
If DRAM was installed with Bioconda, it can be upgraded like any Conda package. Note that the Conda package for DRAM may be delayed slightly while it is validated, but it should be available within a day or two of the release.
If you already have a DRAM environment and want to upgrade:
# Activate your old DRAM environment first!
# Save your old config
DRAM-setup.py export_config > my_old_config.txt
# install DRAM
wget https://raw.githubusercontent.com/shafferm/DRAM/master/environment.yaml
conda env update -f environment.yaml -n DRAM --prune
# import your old databases
DRAM-setup.py import_config --config_loc my_old_config.txt
If you are using an old database, as in the example above, you may need to check out a special version of DRAM from GitHub.
git clone https://github.com/WrightonLabCSU/DRAM.git
cd DRAM
git checkout dbcan_no_ec
conda env update -f environment.yaml -n DRAM --prune
conda activate DRAM
conda install pip
pip install ./
To install DRAM in a new Conda environment, follow the instructions in the README.
Change log DRAM1.4.4 addendum:
- Bug fixes have been made to the setup script to support the many ways the DRAM databases get built. You will see them in the merge history.
- Previously, the DRAM-v AMG summary did not add match data for AMGs that were matched to the AMG Database only. This was confusing, so information relevant to the AMG Database is now in the AMG summary along with the Metabolic Database. This adds the new columns "metabolism", "reference", and "verified", and the "gene_id_origin" field, which tells you where this Gene ID came from. Remember that a sequence can match to more than one sequence and this is more common in the AMG Database, so your AMG Summary will be longer and contain more duplicates.
- DRAM1.4.X collects subfamily EC numbers for the raw annotations, but does not use them in the distillation process. We have future plans for these EC numbers, but in the meantime this makes it impossible to use older versions of the DRAM databases with the newer DRAM1.4.X. This is not ideal, as we do strive for backwards compatibility. Sadly, the only solution at this time is a branch that does not look for the EC numbers. Use the instructions above or in the README to install the dbcan_no_ec branch from git.
- Most output arguments are now required, with only a few exceptions. Most people will not notice this.
Change log DRAM1.4.0:
- DRAM distill now includes a new metabolism for methylation. Although planned for DRAM2, you can already include this tool in annotation and distillation, provided you follow the instructions below.
In order to distill with methyl, you need only download the new FASTA file and point to it with the DRAM custom database options that were introduced in DRAM1.3. Note that in order to distill correctly, you will need to use the correct name ‘methyl’ and must use DRAM 1.4.
To annotate with methyl, do something like:
wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy.faa
DRAM.py annotate -i '/some/path/*.fasta' -o dram_output --threads 30 --custom_db_name methyl --custom_fasta_loc methylotrophy.faa
To distill with methyl:
wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy_distillate.tsv
DRAM.py distill -i dram_output/annotations.tsv -o dram_output/distillate --custom_distillate methylotrophy_distillate.tsv
Learn more about custom databases in the Wiki.
- Glycoside hydrolase subfamily calls: subfamily calls are now being incorporated into annotations with changes in databases and code. This impacts what gets pulled into the distillate and product, because these look for the family level (e.g. AA1), not the subfamily level (e.g. AA1_1, AA2_2).
In response, DRAM is changing the output of the dbCAN database in DRAM1.4. Raw cazyme subfamilies will be output into the cazy_id column, and the corresponding description for the cazyme family will be put into the cazy_hit column.
The distillation in DRAM1.4 will count cazymes marked at the subfamily level at the family level; this means that for cazyme family AA1 there will be 4 entries in the distillate, AA1, AA1_1, AA1_2, and AA1_3, and the sum of these four will be the total number of AA1 cazymes. In DRAM1.3 and previous, the distillate entry for this example, AA1 with no underscore, would include cazymes that can be assigned to family AA1 but do not have a subfamily designation.
The DRAM product will also count cazymes at the family level. For the AA1 example, AA1_1, AA1_2, and AA1_3 will be counted as AA1 under the current rules for assigning cazymes to compounds.
- More changes are also being made that will affect CAZY IDs in DRAM1.4. The cutoff e-value is being changed to 1e-18 to conform to best practices for the database.
DRAM1.4 also introduced a new column for the best hit per gene from the dbCAN database, named cazy_best_hit. This column will be the match to the gene that has the highest coverage and lowest full-sequence e-value as calculated by mmseqs, with priority on e-value. cazy_best_hit will be the only column considered downstream in the distillate and product. DRAM1.3 pulls and counts all dbCAN hits above e-value 1e-15, rather than profiling best hits.
A new column corresponding to EC number information from subfamilies, named cazy_subfamily_ec, has been added in DRAM1.4. These EC numbers will also be used as part of the distillate along with those from KEGG, as part of pathways and other tools. For now, incomplete EC numbers will be included, but not considered for the distillate. The subfamilies will be excluded from the product in order to facilitate its goal of being a larger overview.
- Logging is now fully implemented in DRAM1.4. Log files will be created for almost all DRAM functions. The log file for annotations will appear in the annotations folder by default, and the log file for the DRAM distillation will by default be in the distillation folder. You can also use the --log_file_path argument to set the log path. A log file for database processing is set by the config file, and by default it will be in the databases directory. All content that DRAM prints to the command line will appear in the log file.
- The DRAM config now stores when databases were downloaded, citation information, and version information when applicable. This information is printed to the log at the beginning of each run. The old format can still be imported if you want to keep your DRAM1.3 databases.
- In 1.4 you can set a config file to use in DRAM annotation and distillation at run time in 2 ways: (1) use --config_loc with DRAM.py or DRAM-v.py, or (2) set the environment variable DRAM_CONFIG_LOCATION. This will not store or import the config, and that config will only be used for that run.
- Significant bug fixes are also included in this release.
- When the input fastas contain duplicates in their header names, the dram annotate step should fail with an error immediately, not at the end of the annotation process, this will save some people a lot of time. It may be that this is only a problem for annotating genomes, in any case it must be in place across workflows.
- Some users have firewalls on their HPC environments that prevent the download via ftp in some cases converting to http can solve download problems. In DRAM1.4 if ftp links fail, a back-up http link will be attempted before an error is thrown. See issue #206.
- DRAM1.4 will ensure that if no databases are downloaded, DRAM setup will still work. Previously, some databases depend on data being downloaded and can't be set up with a provided data set.
- Reduced unnecessary warnings in various repetitive tasks in DRAM distillation by refactoring pandas code.
- BIO-RELATED This bug change could affect biology. In the past, the counting of EC numbers was inconsistent. When counting the number of EC numbers in a row of the annotations file duplicates were not counted, however if counting the EC numbers for the full set of data the count of EC numbers included such duplicates. This is now corrected, but it could have some small unexpected downstream effects.
- Glycoside hydrolase subfamily calls.
- In response to issue #122, you can now pass a config file at run time or by setting the environment variable DRAM_CONFIG_LOCATION. Read more in the Wiki.
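The FTP-to-HTTP fallback mentioned in the bug fixes above can be sketched roughly as follows. This is a minimal illustration, not DRAM's code: the function names are hypothetical, the downloader is passed in as a callable, and it assumes the server exposes the same path over HTTP.

```python
from typing import Callable
from urllib.parse import urlsplit


def http_fallback_url(url: str) -> str:
    """Rewrite an ftp:// URL to http://, leaving other schemes untouched."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        return url
    return parts._replace(scheme="http").geturl()


def download_with_fallback(url: str, fetch: Callable[[str], bytes]) -> bytes:
    """Try the original link first; if it fails, retry once over HTTP."""
    try:
        return fetch(url)
    except OSError:
        backup = http_fallback_url(url)
        if backup == url:
            raise  # No alternative link to try, so surface the original error.
        return fetch(backup)
```

Passing the downloader as `fetch` keeps the retry logic independent of any particular HTTP or FTP library.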
Known issues:
- Speed and memory remain a big problem for DRAM and the estimates in the wiki and other documentation are woefully out of date. Fixing this is a major priority.
- The annotation merging tool lacks sufficient checks, and fails when files are missing.
- Code coverage remains low, especially for the less prominent tools.
DRAM 1.4.0
This is the official release of DRAM1.4.0. The 1.4.0 release has significant changes that could impact your research. Please review these changes and help us validate this release!
Install / upgrade:
If DRAM is installed with Bioconda, it can be upgraded like any Conda package. Note that the Conda package for DRAM may be delayed slightly while it is validated, but it should be available within a day or two of the release.
If you already have a DRAM environment and want to upgrade:
# Activate your old DRAM environment first!
# Save your old config
DRAM-setup.py export_config > my_old_config.txt
# install DRAM
wget https://raw.githubusercontent.com/shafferm/DRAM/master/environment.yaml
conda env update -f environment.yaml -n DRAM --prune
# import your old databases
DRAM-setup.py import_config --config_loc my_old_config.txt
To install DRAM in a new Conda environment, follow the instructions in the README.
Change log:
-
Dram distill now includes a new metabolism for methylation. Although planned for DRAM2 you can already include this tool in annotation and distillation provided you follow the instructions below.
In order to distill with methyl, you need only download the new FASTA file and point to it with the dram custom database options that were introduced in DRAM1.3. Note that in order to distill correctly, you will need to use the correct name ‘methyl’ and must use DRAM 1.4.
To annotate with methyl, do something like:
wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy.faa
DRAM.py annotate -i '/some/path/*.fasta' -o dram_output --threads 30 --custom_db_name methyl --custom_fasta_loc methylotrophy.faa
To distill with methyl:
wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy_distillate.tsv
DRAM.py distill -i dram_output/annotations.tsv -o dram_output/distillate --custom_distillate methylotrophy_distillate.tsv
Learn more about custom databases in the Wiki.
-
Glycoside hydrolase subfamily calls: subfamily calls are now being incorporated into annotations with changes in databases and code. This impacts what gets pulled into the distillate and product, because these look for the family level (e.g. AA1), not the subfamily level (e.g. AA1_1, AA2_2).
In response, DRAM is changing the output of the dbCAN database in DRAM1.4. Raw cazyme subfamilies will be output into the cazy_id column, and the corresponding description for the cazyme family will be put into the cazy_hit column.
The Distillation in DRAM1.4 will count cazymes marked at subfamily level on the family level; this means that for cazyme family AA1 there will be 4 entries in the distillate (AA1, AA1_1, AA1_2, and AA1_3), and the sum of these four will be the total number of AA1 cazymes. In DRAM1.3 and previous, the distillate for this example AA1 with no underscore would include cazymes that can be assigned to family AA1, but do not have a subfamily designation.
The DRAM Product will also count cazymes at the family level. For the AA1 example, AA1_1, AA1_2, and AA1_3 will be counted as AA1 for the current rules in assigning cazymes to compounds.
-
More changes are also being made that will affect CAZY IDs in DRAM1.4. The cutoff e-value is being changed to 1e-18 to conform to best practices for the database.
DRAM1.4 also introduced a new column for the best hit per gene from the dbCAN database, named cazy_best_hit. This column will be the match to the gene that has the highest coverage and lowest full-sequence e-value as calculated by mmseqs, with priority on e-value. cazy_best_hit will be the only column considered downstream in the distillate and product. DRAM1.3 pulls and counts all dbCAN hits above e-value 1e-15, rather than profiling best hits.
A new column corresponding to EC number information from subfamilies, named cazy_subfamily_ec, has been added in DRAM1.4. These EC numbers will also be used as part of the distillate along with those from KEGG, as part of pathways and other tools. For now, incomplete EC numbers will be included, but not considered for the distillate. The subfamilies will be excluded from the product in order to facilitate its goal of being a larger overview.
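To make the family-level counting described above concrete, here is a small sketch that rolls subfamily IDs up to their family. It assumes subfamilies are always written as the family name plus an underscore suffix (as in AA1_1); the function names are hypothetical and this is not DRAM's own code.

```python
from collections import Counter
from typing import Iterable


def cazy_family(cazy_id: str) -> str:
    """Strip a subfamily suffix: 'AA1_2' becomes 'AA1'; 'AA1' is unchanged."""
    return cazy_id.split("_", 1)[0]


def count_by_family(cazy_ids: Iterable[str]) -> Counter:
    """Count cazymes per family, folding subfamily calls into their family."""
    return Counter(cazy_family(cazy_id) for cazy_id in cazy_ids)
```

With this rollup, the family total for AA1 is the sum of the AA1, AA1_1, AA1_2, and AA1_3 entries, matching the description of the distillate.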
-
Logging is now fully implemented in DRAM1.4. Log files will be created for almost all DRAM functions. The log file for annotations will appear in the annotations folder by default, and the log file for the DRAM distillation will by default be in the distillation folder. You can also use the --log_file_path argument to set the log path. A log file for database processing is set by the config file, and by default it will be in the databases directory. All content that DRAM prints to the command line will appear in the log file.
-
The dram config now stores when databases were downloaded, citation information and version information when applicable. This information is printed to the log at the beginning of each run. The old format can still be imported if you want to keep your DRAM1.3 databases.
-
In 1.4 you can set a config file to use in dram annotation and distillation at run time in 2 ways. (1) use --config_loc with DRAM.py or DRAM-v.py or (2) set the environment variable DRAM_CONFIG_LOCATION. This will not store or import the config, and that config will only be used for that run.
-
Significant Bug fixes are also included in this release.
- When the input FASTA files contain duplicate header names, the DRAM annotate step now fails with an error immediately rather than at the end of the annotation process, which will save some people a lot of time. This may only be a problem when annotating genomes, but the check must be in place across workflows.
- Some users have firewalls on their HPC environments that prevent downloads via FTP; in some cases, converting to HTTP can solve download problems. In DRAM1.4, if FTP links fail, a backup HTTP link will be attempted before an error is thrown. See issue #206.
- DRAM1.4 will ensure that if no databases are downloaded, DRAM setup will still work. Previously, some databases depended on data being downloaded and couldn't be set up with a provided data set.
- Reduced unnecessary warnings in various repetitive tasks in DRAM distillation by refactoring pandas code.
- BIO-RELATED: This bug fix could affect biology. In the past, the counting of EC numbers was inconsistent. When counting the EC numbers in a single row of the annotations file, duplicates were not counted; when counting the EC numbers for the full set of data, however, the count included such duplicates. This is now corrected, but it could have some small unexpected downstream effects.
- Glycoside hydrolase subfamily calls.
- In response to issue #122, you can now pass a config file at run time or by setting the environment variable DRAM_CONFIG_LOCATION. Read more in the Wiki.
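The fail-fast check for duplicate FASTA headers listed among the bug fixes above could look something like the sketch below. It is an illustration under assumptions: the sequence ID is taken to be the first whitespace-delimited word after ">", and the function names are hypothetical rather than DRAM's.

```python
from collections import Counter
from typing import Iterable, List


def fasta_headers(lines: Iterable[str]) -> List[str]:
    """Collect sequence IDs (the first word after '>') from FASTA lines."""
    return [line[1:].split()[0]
            for line in lines
            if line.startswith(">") and line[1:].strip()]


def check_unique_headers(lines: Iterable[str]) -> None:
    """Raise ValueError before any annotation work if a header is repeated."""
    counts = Counter(fasta_headers(lines))
    dupes = sorted(header for header, n in counts.items() if n > 1)
    if dupes:
        raise ValueError(f"Duplicate FASTA headers: {', '.join(dupes)}")
```

Running a check like this over every input file before annotation starts is what moves the failure from the end of a long run to the beginning.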
Known issues:
- Speed and memory remain a big problem for DRAM and the estimates in the wiki and other documentation are woefully out of date. Fixing this is a major priority.
- The annotation merging tool lacks sufficient checks, and fails when files are missing.
- Code coverage remains low, especially for the less prominent tools.
DRAM 1.4.0 rc1
This is the first release candidate of DRAM1.4.0. The 1.4.0 release has significant changes that could impact your research. Please review these changes and help us validate this release!
Install / upgrade:
In a few weeks DRAM will be upgraded in Bioconda and then can be upgraded like any Conda package. You will still be able to install DRAM1.3.5 with the traditional Conda method outlined in the README, but for early adoption you will need to use the method of install below. This method is also added in the README under Install Release Candidate.
To install a potentially unstable release candidate of DRAM, use the set of commands below that suits your situation. Note the comments within the code sections; they describe the context in which the commands must be used.
If you already have a DRAM environment and want to upgrade:
# Activate your old DRAM environment first!
# Save your old config
DRAM-setup.py export_config > my_old_config.txt
# If you want to install in a new environment follow the instructions below and import your config with the last command in this block
# Clone the git repository
git clone https://github.com/WrightonLabCSU/DRAM.git
# you may need to install pip
conda install pip
# Make sure the pip path is in your conda environment path
which pip3
# install DRAM
pip install ./DRAM
# import your old databases
DRAM-setup.py import_config --config_loc my_old_config.txt
To install the DRAM release candidate in a new Conda environment:
git clone https://github.com/WrightonLabCSU/DRAM.git
cd DRAM
# Install dependencies, this will also install a stable version of DRAM that will then be replaced.
conda env create --name my_dram_env -f environment.yaml
conda activate my_dram_env
# Install pip
conda install pip
pip3 install ./
Change log:
-
Dram distill now includes a new metabolism for methylation. Although planned for DRAM2 you can already include this tool in annotation and distillation provided you follow the instructions below.
In order to distill with methyl, you need only download the new FASTA file and point to it with the dram custom database options that were introduced in DRAM1.3. Note that in order to distill correctly, you will need to use the correct name ‘methyl’ and must use DRAM 1.4.
To annotate with methyl, do something like:
wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy.faa
DRAM.py annotate -i '/some/path/*.fasta' -o dram_output --threads 30 --custom_db_name methyl --custom_fasta_loc methylotrophy.faa
To distill with methyl:
wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy_distillate.tsv
DRAM.py distill -i dram_output/annotations.tsv -o dram_output/distillate --custom_distillate methylotrophy_distillate.tsv
Learn more about custom databases in the Wiki.
-
Glycoside hydrolase subfamily calls: subfamily calls are now being incorporated into annotations with changes in databases and code. This impacts what gets pulled into the distillate and product, because these look for the family level (e.g. AA1), not the subfamily level (e.g. AA1_1, AA2_2).
In response, DRAM is changing the output of the dbCAN database in DRAM1.4. Raw cazyme subfamilies will be output into the cazy_id column, and the corresponding description for the cazyme family will be put into the cazy_hit column.
The Distillation in DRAM1.4 will count cazymes marked at subfamily level on the family level; this means that for cazyme family AA1 there will be 4 entries in the distillate (AA1, AA1_1, AA1_2, and AA1_3), and the sum of these four will be the total number of AA1 cazymes. In DRAM1.3 and previous, the distillate for this example AA1 with no underscore would include cazymes that can be assigned to family AA1, but do not have a subfamily designation.
The DRAM Product will also count cazymes at the family level. For the AA1 example, AA1_1, AA1_2, and AA1_3 will be counted as AA1 for the current rules in assigning cazymes to compounds.
-
More changes are also being made that will affect CAZY IDs in DRAM1.4. The cutoff e-value is being changed to 1e-18 to conform to best practices for the database.
DRAM1.4 also introduced a new column for the best hit per gene from the dbCAN database, named cazy_best_hit. This column will be the match to the gene that has the highest coverage and lowest full-sequence e-value as calculated by mmseqs, with priority on e-value. cazy_best_hit will be the only column considered downstream in the distillate and product. DRAM1.3 pulls and counts all dbCAN hits above e-value 1e-15, rather than profiling best hits.
A new column corresponding to EC number information from subfamilies, named cazy_subfamily_ec, has been added in DRAM1.4. These EC numbers will also be used as part of the distillate along with those from KEGG, as part of pathways and other tools. For now, incomplete EC numbers will be included, but not considered for the distillate. The subfamilies will be excluded from the product in order to facilitate its goal of being a larger overview.
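The best-hit rule described above (lowest e-value first, then highest coverage) can be sketched as a simple sort key. The Hit record, the best_hit function, and the inclusive treatment of the 1e-18 cutoff are assumptions for illustration; DRAM's real column handling may differ.

```python
from typing import Iterable, NamedTuple, Optional


class Hit(NamedTuple):
    """Hypothetical record for one dbCAN match to a gene."""
    cazy_id: str
    evalue: float
    coverage: float


def best_hit(hits: Iterable[Hit], max_evalue: float = 1e-18) -> Optional[Hit]:
    """Pick one hit per gene: e-value has priority, coverage breaks ties."""
    passing = [hit for hit in hits if hit.evalue <= max_evalue]
    if not passing:
        return None
    # Smaller e-value is better; negate coverage so that larger coverage wins ties.
    return min(passing, key=lambda hit: (hit.evalue, -hit.coverage))
```

Keeping only this single hit per gene is what distinguishes the 1.4 behaviour from counting every hit that passes the cutoff.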
-
Logging is now fully implemented in DRAM1.4. Log files will be created for almost all DRAM functions. The log file for annotations will appear in the annotations folder by default, and the log file for the DRAM distillation will by default be in the distillation folder. You can also use the --log_file_path argument to set the log path. A log file for database processing is set by the config file, and by default it will be in the databases directory. All content that DRAM prints to the command line will appear in the log file.
-
The dram config now stores when databases were downloaded, citation information and version information when applicable. This information is printed to the log at the beginning of each run. The old format can still be imported if you want to keep your DRAM1.3 databases.
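For readers curious how "everything printed to the command line also lands in the log file" is commonly achieved in Python, here is a minimal sketch using the standard logging module with two handlers. The logger name, message format, and function name are assumptions; this is not taken from DRAM's source.

```python
import logging
import sys


def setup_logger(log_file_path: str, name: str = "dram_sketch") -> logging.Logger:
    """Create a logger that writes each message to the console and to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # Avoid duplicate output if called more than once.
    formatter = logging.Formatter("%(asctime)s - %(message)s")
    for handler in (logging.StreamHandler(sys.stdout),
                    logging.FileHandler(log_file_path)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

A run could then record database download dates, citations, and versions with a single `logger.info(...)` call at start-up, and the same text would appear both on screen and in the file.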
-
Significant Bug fixes are also included in this release.
- When the input FASTA files contain duplicate header names, the DRAM annotate step now fails with an error immediately rather than at the end of the annotation process, which will save some people a lot of time. This may only be a problem when annotating genomes, but the check must be in place across workflows.
- Some users have firewalls on their HPC environments that prevent downloads via FTP; in some cases, converting to HTTP can solve download problems. In DRAM1.4, if FTP links fail, a backup HTTP link will be attempted before an error is thrown. See issue #206.
- DRAM1.4 will ensure that if no databases are downloaded, DRAM setup will still work. Previously, some databases depended on data being downloaded and couldn't be set up with a provided data set.
- Reduced unnecessary warnings in various repetitive tasks in DRAM distillation by refactoring pandas code.
- BIO-RELATED: This bug fix could affect biology. In the past, the counting of EC numbers was inconsistent. When counting the EC numbers in a single row of the annotations file, duplicates were not counted; when counting the EC numbers for the full set of data, however, the count included such duplicates. This is now corrected, but it could have some small unexpected downstream effects.
- Glycoside hydrolase subfamily calls.
- In response to issue #122, you can now pass a config file at run time or by setting the environment variable DRAM_CONFIG_LOCATION. Read more in the Wiki.
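The EC-counting inconsistency noted in the bug fixes above comes down to whether duplicates within a row are collapsed. The sketch below shows one consistent convention, where duplicates are removed per row and the full-data count is the sum of the per-row counts. Which convention DRAM actually adopted is not stated here, so treat this as an assumption; the regular expression and function names are also illustrative only.

```python
import re
from typing import Iterable, Set

# Matches four-part EC numbers, allowing '-' for unspecified trailing levels.
EC_PATTERN = re.compile(r"\b\d+\.(?:\d+|-)\.(?:\d+|-)\.(?:\d+|-)")


def ecs_in_row(text: str) -> Set[str]:
    """Return the distinct EC numbers mentioned in one annotation row."""
    return set(EC_PATTERN.findall(text))


def count_ecs(rows: Iterable[str]) -> int:
    """Count EC numbers over all rows using the same per-row de-duplication."""
    return sum(len(ecs_in_row(row)) for row in rows)
```

Because both functions share the same de-duplication step, the per-row and full-data counts can no longer disagree.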
DRAM v1.3
DRAM v1.3 change log
- Add --amg_database_loc parameter that was missing in DRAM-setup.py
- Shift DRAM download of UniRef from FTP to HTTP address to address firewall issues
- Rename of headers in annotations.tsv files to be more uniform across databases
- DRAM.py annotate no longer annotates with VOGDB by default; a flag was added to use VOGDB
- By default don't split DRAM-v.py annotate input contigs into separate files because HMMER doesn't care for E-values
- Users can now pass multiple --input_fasta arguments to DRAM.py annotate and DRAM-v.py annotate
- Now DRAM makes sure bin names (pulled from file names) are unique and not full paths
- Update pandas methods to get rid of warnings and increase speed
- When annotating with KEGG Genes the KEGG Genes IDs are stored in the annotations.tsv in addition to the KO IDs
- Complete rewrite of how HMM annotation is handled inside DRAM to reduce redundancy and allow...
- Users can now annotate using custom HMM sets which may include custom bitScore cutoffs
- Complete rewrite of database handling from setup through annotation, in the future this will allow more flexible configuration
- Change CI to CircleCI from travis
- DRAM strainer and gene neighborhood pulling can now both use custom distillate information