An error occurs if there is an "_" in the FASTA header name. #39

Closed
akikuno opened this issue May 29, 2024 · 1 comment
Assignees
Labels
bug🐛 Something isn't working

Comments


akikuno commented May 29, 2024

Describe the bug

If an underscore _ (e.g., >1_hoge) is included in the header name of the input FASTA file for DAJIN2, the following error occurs:

2024-05-29 17:42:33, INFO, 🏃 Start running DAJIN2 version 0.4.6
2024-05-29 17:42:33, INFO, example_single/control is now processing...
2024-05-29 17:42:33, INFO, Preprocess example_single/control...
2024-05-29 17:42:57, INFO, Output BAM files of example_single/control...
2024-05-29 17:42:57, INFO, 🍵 example_single/control is finished!
2024-05-29 17:42:57, INFO, example_single/sample is now processing...
2024-05-29 17:42:57, INFO, Preprocess example_single/sample...
2024-05-29 17:43:30, INFO, Classify example_single/sample...
2024-05-29 17:43:33, INFO, Clustering example_single/sample...
2024-05-29 17:43:41, INFO, Consensus calling of example_single/sample...
2024-05-29 17:43:41, ERROR, Catch an Exception. Traceback:
Traceback (most recent call last):
  File "/home/kuno/miniconda/envs/env-dajin2/bin/DAJIN2", line 10, in <module>
    sys.exit(execute())
  File "/home/kuno/miniconda/envs/env-dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 236, in execute
    execute_single_mode(arguments)
  File "/home/kuno/miniconda/envs/env-dajin2/lib/python3.10/site-packages/DAJIN2/main.py", line 46, in execute_single_mode
    core.execute_sample(arguments)
  File "/home/kuno/miniconda/envs/env-dajin2/lib/python3.10/site-packages/DAJIN2/core/core.py", line 187, in execute_sample
    consensus.cache_mutation_loci(ARGS, clust_subset_sample)
  File "/home/kuno/miniconda/envs/env-dajin2/lib/python3.10/site-packages/DAJIN2/core/consensus/mutation_extractor.py", line 101, in cache_mutation_loci
    cache_normalized_indels(ARGS, path_midsv_sample)
  File "/home/kuno/miniconda/envs/env-dajin2/lib/python3.10/site-packages/DAJIN2/core/consensus/mutation_extractor.py", line 73, in cache_normalized_indels
    sequence = ARGS.fasta_alleles[allele]
KeyError: '1'

Solutions

The error is caused by the frequent use of split("_") on file paths and names without accounting for underscores in the header name. DAJIN2 appends various annotations to the header name using _ as the delimiter, so if the user-specified FASTA header name contains _, the split fields are misaligned.
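
A minimal sketch of the failure, assuming an intermediate file whose stem has the form `{allele}_{label}`. The file name `1_hoge_001.jsonl` and the dictionary contents are made up for illustration; only the `allele, label, *_ = ....split("_")` pattern is taken from the code listed in this issue.

```python
from pathlib import Path

# Hypothetical stand-in for ARGS.fasta_alleles: FASTA header name -> sequence.
fasta_alleles = {"control": "ACGT", "1_hoge": "ACGTT"}

# Hypothetical intermediate file written for allele "1_hoge" and label "001".
path_midsv_sample = Path("1_hoge_001.jsonl")

# Same unpacking pattern as in mutation_extractor.py.
allele, label, *_ = path_midsv_sample.stem.split("_")

print(allele, label)            # 1 hoge (the allele is cut at its own underscore)
print(allele in fasta_alleles)  # False, so fasta_alleles[allele] raises KeyError: '1'
```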

To handle header names that contain "_", the split must not cut through the name itself. Specifically, it is recommended to remove the FASTA header name before splitting.
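
One way this could look, as a sketch only: strip a known allele name from the front of the string, then split what remains. The helper `split_allele_and_rest` and its arguments are hypothetical and not part of DAJIN2. It assumes the allele name comes first and that the list of user-supplied allele names is available.

```python
def split_allele_and_rest(stem: str, allele_names: list[str]) -> tuple[str, list[str]]:
    """Strip a known allele name from the front of `stem`, then split the remainder on "_"."""
    # Try longer names first so "1_hoge" is preferred over a shorter allele named "1".
    for name in sorted(allele_names, key=len, reverse=True):
        if stem == name:
            return name, []
        if stem.startswith(name + "_"):
            return name, stem[len(name) + 1:].split("_")
    raise ValueError(f"No known allele name at the start of {stem!r}")


# Hypothetical stem for allele "1_hoge" and label "001".
allele, (label, *_) = split_allele_and_rest("1_hoge_001", ["control", "1", "1_hoge"])
print(allele, label)  # 1_hoge 001
```

Prefix matching is still ambiguous when one allele name is a prefix of another (for example, allele "1" with a label that starts with "hoge"), so it reduces the problem rather than eliminating it.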

The following lines contain hard-coded instances of this pattern and need to be corrected:

DAJIN2/src/DAJIN2/core/consensus/mutation_extractor.py:    allele, label, *_ = path_midsv_sample.stem.split("_")
DAJIN2/src/DAJIN2/core/consensus/mutation_extractor.py:        allele, label, *_ = path_indels_normalized_sample.stem.split("_")
DAJIN2/src/DAJIN2/core/consensus/similarity_searcher.py:    allele, label, *_ = Path(path_midsv_sample).stem.split("_")
DAJIN2/src/DAJIN2/core/preprocess/midsv_caller.py:        preset = path.stem.split("_")[0]
DAJIN2/src/DAJIN2/core/preprocess/midsv_caller.py:        preset = path.stem.split("_")[0]
DAJIN2/src/DAJIN2/core/report/sequence_exporter.py:    allele = header.split("_")[1]
DAJIN2/src/DAJIN2/utils/report_generator.py:        label, allele, type_, *_ = reads["NAME"].split("_")

Steps/Code to Reproduce

Operating System

WSL2

Python version

3.10

DAJIN2 version

0.4.6

Additional context

Thank you @geedrn for reporting the issue!!

@akikuno akikuno added the bug🐛 Something isn't working label May 29, 2024
@akikuno akikuno self-assigned this May 29, 2024
akikuno added a commit that referenced this issue May 30, 2024

akikuno commented Jun 5, 2024

Modified the system to separate intermediate files using a directory structure instead of underscores ("_"), ensuring that no errors occur even if users use allele names containing underscores ("_").

The implementation will be reflected in DAJIN2 v0.5.0.
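
A rough sketch of the directory-based idea described in this comment. The layout `<root>/midsv/<allele>/<label>.jsonl` and both helper functions are assumptions for illustration; the actual layout used by DAJIN2 is not shown in this issue.

```python
import tempfile
from pathlib import Path


def midsv_path(root: Path, allele: str, label: str) -> Path:
    # Hypothetical layout: <root>/midsv/<allele>/<label>.jsonl
    return root / "midsv" / allele / f"{label}.jsonl"


def parse_midsv_path(path: Path) -> tuple[str, str]:
    # The allele is the parent directory name and the label is the file stem,
    # so no split("_") is needed and underscores in the allele are harmless.
    return path.parent.name, path.stem


with tempfile.TemporaryDirectory() as tmp:
    path = midsv_path(Path(tmp), "1_hoge", "001")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()
    allele, label = parse_midsv_path(path)
    print(allele, label)  # 1_hoge 001
```

Because each field occupies its own path component, the allele name is recovered unchanged regardless of how many underscores it contains.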

@akikuno akikuno closed this as completed Jun 5, 2024
@akikuno akikuno mentioned this issue Jun 5, 2024