Releases: edgardomortiz/Captus
Captus v1.1.0
New in the assemble module:
- Contig depth of coverage is now calculated by mapping the reads back to the contigs using Salmon right after the assembly with MEGAHIT. This is now the default behavior unless --disable_mapping is enabled.
- The assembly is then automatically filtered by contig depth of coverage: if --disable_mapping is used, only contigs with depth of coverage >1x are retained; otherwise, contigs with depth of coverage >=1.5x are retained. The filtering threshold for depth can be changed with --min_contig_depth.
- To replicate the behavior of previous versions use --disable_mapping --min_contig_depth 0.
- The filtering can be repeated with --redo_filtering, without the need to reassemble, to try different values for --max_contig_gc and --min_contig_depth.
- The assembly HTML report has been completely rewritten to reflect these changes.
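The default depth thresholds described above can be sketched as follows. This is only an illustration of the stated rules, not Captus code: the function name `keep_contig` is hypothetical, and treating a user-supplied `--min_contig_depth` as an inclusive bound is an assumption.

```python
from typing import Optional


def keep_contig(depth: float,
                disable_mapping: bool = False,
                min_contig_depth: Optional[float] = None) -> bool:
    """Decide whether a contig passes the depth filter (illustrative only)."""
    if min_contig_depth is not None:
        # Assumption: a user-supplied threshold is treated as inclusive.
        return depth >= min_contig_depth
    if disable_mapping:
        # Without read mapping: keep contigs with depth > 1x.
        return depth > 1.0
    # Default (depth from read mapping): keep contigs with depth >= 1.5x.
    return depth >= 1.5


if __name__ == "__main__":
    for d in (0.8, 1.0, 1.2, 1.5, 3.0):
        print(d, keep_contig(d), keep_contig(d, disable_mapping=True))
```

Under these assumptions, a threshold of 0 combined with disabled mapping keeps every contig, which matches the note about replicating the behavior of previous versions.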
New in the extract module:
- Options --nuc_depth_tolerance, --ptd_depth_tolerance, --mit_depth_tolerance, and --dna_depth_tolerance allow filtering contigs by depth of coverage during locus extraction. Among the contigs with hits to a particular marker type (e.g., nuclear), the median of the depths of coverage is calculated and this tolerance factor is used to determine the minimum (median / tolerance) and maximum (median * tolerance) depth allowed. The depth of coverage is taken from the contig names when they contain the pattern _cov_X.XX_.
- To replicate the behavior of previous versions use --ignore_depth.
- Added option --disable_stitching. By default, Captus recovers a locus across multiple contigs; this option forces the recovery of a locus in a single contig (for example when providing chromosome-level genome assemblies).
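The depth tolerance rule above (minimum = median / tolerance, maximum = median * tolerance) can be illustrated with a short sketch. It is not the Captus implementation: the function names and contig names are hypothetical, the bounds are assumed to be inclusive, and contigs whose names lack the _cov_X.XX_ pattern are assumed to be kept.

```python
import re
from statistics import median

# Matches the depth embedded in contig names, e.g. "..._cov_12.34_..."
COV_PATTERN = re.compile(r"_cov_(\d+(?:\.\d+)?)_")


def depth_from_name(name):
    """Return the depth parsed from a contig name, or None if absent."""
    match = COV_PATTERN.search(name)
    return float(match.group(1)) if match else None


def depth_bounds(depths, tolerance):
    """Return (minimum, maximum) allowed depth around the median."""
    mid = median(depths)
    return mid / tolerance, mid * tolerance


def filter_by_depth(names, tolerance):
    """Keep contigs whose depth lies within the tolerance bounds."""
    depths = {name: depth_from_name(name) for name in names}
    known = [d for d in depths.values() if d is not None]
    if not known:
        return list(names)
    low, high = depth_bounds(known, tolerance)
    # Assumption: contigs without a parsable depth are not filtered out.
    return [n for n in names
            if depths[n] is None or low <= depths[n] <= high]


if __name__ == "__main__":
    contigs = ["c1_cov_1.00_x", "c2_cov_8.00_x", "c3_cov_10.00_x",
               "c4_cov_12.00_x", "c5_cov_100.00_x"]
    print(filter_by_depth(contigs, tolerance=2.0))
```

In this hypothetical example the median depth is 10, so a tolerance of 2 allows depths between 5 and 20 and drops the contigs at 1x and 100x.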
Other improvements or additions:
- The accessory script filter_most_common_target_per_locus.py creates a new reference target file with only the most common target per locus found during the extraction step. This new reference target set can be used to re-extract the loci and potentially improve the informed paralog filtering.
- All the reports have been updated to include the version and command of Captus used.
- Updated installation instructions and documentation.
- Some long output filenames have been shortened.
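The idea behind selecting the most common target per locus can be sketched as below. This is a simplified illustration and not the logic of filter_most_common_target_per_locus.py itself: the input format (a list of locus/target pairs, one per sample hit) and the function name are assumptions.

```python
from collections import Counter, defaultdict


def most_common_target_per_locus(hits):
    """Map each locus to the target that was matched most often.

    `hits` is an iterable of (locus, target) pairs, one per sample hit
    (a hypothetical input format used only for this sketch).
    """
    counts = defaultdict(Counter)
    for locus, target in hits:
        counts[locus][target] += 1
    # Ties are resolved by first occurrence, as Counter.most_common does.
    return {locus: c.most_common(1)[0][0] for locus, c in counts.items()}


if __name__ == "__main__":
    hits = [("L1", "A"), ("L1", "B"), ("L1", "A"), ("L2", "C")]
    print(most_common_target_per_locus(hits))
```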
Captus v1.0.1
- During assembly of hits when extracting a miscellaneous DNA reference target, the delta in identity percentage between two hits to be considered compatible has been reduced from 5% to 3.33%. Initial tests indicate a slight improvement in recovery.
- In some edge cases, when translating a CDS reference target set, the same nucleotide sequence can produce a perfectly translated protein in more than one reading frame. We now give priority to positive reading frames in case of a tie.
- The latest pandas versions introduced breaking changes; we provide a fix.
- When creating a new miscellaneous DNA reference from clustering, the target sequences in a reference locus can be on different strands. We added a method to make the strand uniform per reference locus.
- Added the option --only_collect to the align step, which only collects the extracted markers and exits afterwards (requested by Diego Morales).
- Fixed multiple small bugs.
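The reading-frame tie-break mentioned above can be illustrated with a minimal sketch. It assumes frames are encoded as the integers 1, 2, 3 (positive strand) and -1, -2, -3 (negative strand); the function name is hypothetical, and preferring the lower frame number within the same strand is an extra assumption not stated in the release notes.

```python
def pick_frame(perfect_frames):
    """Choose one frame among those that translate perfectly.

    Positive frames win over negative ones. Within the same strand the
    lower frame number is preferred (an assumption for this sketch).
    Raises ValueError if `perfect_frames` is empty.
    """
    return min(perfect_frames, key=lambda frame: (frame < 0, abs(frame)))


if __name__ == "__main__":
    print(pick_frame([-1, 2]))   # positive frame preferred over negative
    print(pick_frame([-3, -2]))  # only negative frames available
```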
Captus v1.0.0
- Additional improvements to captusd bait: added options --min_expected_tiling and --remove_ambiguous_loci for the creation of baitsets and their corresponding reference target files.
Captus v0.9.99
- Now any BUSCO lineage database can be used as a reference target file: download a .tar.gz from https://busco-data.ezlab.org/v5/data/lineages/ and provide the file path for Captus extraction
- Added shortcut for captus_assembly as simply captus (data assembly)
- Added entry point for captus_design and a shortcut as captusd (bait design)
- The cluster step of bait design now reports the mean number of copies per locus instead of just classifying it as single- or multi-copy
- Added a function to create a reference target file (for locus extraction) after bait clustering and tiling
- Code cleanup and minor cosmetic changes
Captus v0.9.98
- Fixed potential problem with recognition of _R1. or _R1_ patterns in filenames
- Support for FastQC v0.12.1 update (s-andrews/FastQC@fbd9cf5)
- Speed up QC step during cleaning step
- If the user provides a clustering threshold with --cl_min_identity then the miscellaneous DNA extraction is performed using the same identity.
- Allow decimals in maximum average number of copies in a cluster via --cl_max_copies
- Minor cosmetic improvements
Captus v0.9.97
- Fixed a bug in the extraction report happening when the extraction statistics tables are not sorted. This bug doesn't affect the output at all, just the report heatmap.
Captus v0.9.96
- Fixed indentation bugs that prevented Falco or FastQC from running during the clean step and the subsampling of reads during the assemble step
- Secret feature: coding gene databases can also be extracted as nucleotides
- Code cleanup and minor fixes
Captus v0.9.95
- Updated perl dependencies; the latest bioperl and yaml can now be used by Scipio
- Improved Scipio parallelization: assemblies are sorted by size in decreasing order before processing
- Reduced maxIntron search for Scipio to 50000 bp (previous settings took too long and created unlikely gene models when chromosome-level assemblies were analyzed)
- Code cleanup and multiple cosmetic changes
Captus v0.9.93
- Fixed bugs found in extract.py during clustering (thanks to Lydia Paradiso)
Captus v0.9.92
- Added module for bait design
- To see help on the new module use captus_design --help