URLopen error #254

Closed
spencerlong1 opened this issue Jan 18, 2023 · 12 comments
Labels
bug Something isn't working

Comments

@spencerlong1

spencerlong1 commented Jan 18, 2023

Hi,

I have left DRAM downloading the databases over the last few days and have run into the following error both times (which I know is common):

(DRAM) [sdl1u18@cyan51 ~]$ cd ../../scratch/sdl1u18/
(DRAM) [sdl1u18@cyan51 sdl1u18]$ DRAM-setup.py prepare_databases --output_dir ../../scratch/sdl1u18/
/home/sdl1u18/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_handler.py:123: UserWarning: Database does not exist at path None
warnings.warn("Database does not exist at path %s" % description_loc)
2023-01-17 10:14:16,620 - Starting the process of downloading data
2023-01-17 10:14:16,620 - The kegg_loc argument was not used to specify a downloaded kegg file, and dram can not download it its self. So it is assumed that the user wants to set up DRAM without it
2023-01-17 10:14:16,620 - The gene_ko_link_loc argument was not used to specify a downloaded gene_ko_link file, and dram can not download it its self. So it is assumed that the user wants to set up DRAM without it
2023-01-17 10:14:16,620 - Database preparation started
2023-01-17 10:14:16,620 - Downloading kofam_hmm
2023-01-17 10:20:25,663 - Downloading kofam_ko_list
2023-01-17 10:20:30,338 - Downloading uniref
2023-01-17 18:47:01,406 - Downloading pfam
2023-01-17 18:48:11,888 - Downloading pfam_hmm
2023-01-17 18:48:12,088 - Downloading dbcan
2023-01-17 18:48:17,232 - Downloading dbcan_fam_activities
2023-01-17 18:48:17,232 - Downloading dbCAN family activities from : https://bcb.unl.edu/dbCAN2/download/Databases/V11/CAZyDB.08062022.fam-activities.txt
2023-01-17 18:48:17,878 - Downloading dbcan_subfam_ec
2023-01-17 18:48:17,879 - Downloading dbCAN sub-family encumber from : https://bcb.unl.edu/dbCAN2/download/Databases/V11/CAZyDB.08062022.fam.subfam.ec.txt
2023-01-17 18:48:18,887 - Downloading vogdb
2023-01-17 18:48:25,272 - Downloading vog_annotations
2023-01-17 18:48:25,593 - Downloading viral
2023-01-17 18:48:37,411 - Something went wrong with the download of the url: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.2.protein.faa.gz
2023-01-17 18:48:37,411 - <urlopen error <urlopen error ftp error: error_perm('550 viral.2.protein.faa.gz: No such file or directory')>>
2023-01-17 18:48:37,840 - Something went wrong with the download of the url: https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.2.protein.faa.gz
2023-01-17 18:48:37,840 - HTTP Error 404: Not Found
Traceback (most recent call last):
  File "/home/sdl1u18/.conda/envs/DRAM/bin/DRAM-setup.py", line 184, in <module>
    args.func(**args_dict)
  File "/home/sdl1u18/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 532, in prepare_databases
    locs[i] = download_functions[i](
  File "/home/sdl1u18/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/database_processing.py", line 218, in download_viral
    download_file(url, output_name, logger, alt_urls=[url_http], verbose=verbose)
  File "/home/sdl1u18/.conda/envs/DRAM/lib/python3.10/site-packages/mag_annotator/utils.py", line 33, in download_file
    raise URLError("DRAM whas not able to download a key database, check the logg for details")
urllib.error.URLError: <urlopen error DRAM whas not able to download a key database, check the logg for details>

It looks like viral.2.protein.faa.gz hasn't downloaded. I see in my database_files that viral.1.protein.faa.gz is present, so I am wondering why this might be. Those who run our HPC don't think it is the firewall (which has let everything else through so far), and FTP seems fine given that viral.1 made it through. I was also wondering how much more is required after this step, as I will just use the database_loc commands for what is already there (assuming uniref, pfam, etc. are fine at this stage).

Apologies if this is basic; I am new to DRAM and to annotation software as a whole!

Cheers!
Spencer

@nikolasbasler

nikolasbasler commented Jan 19, 2023

Hello,

I am getting the same error, also with DRAM 1.4.5, which was recently made available on conda.

Looking at the FTP address (https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/), there is no viral.2.protein.faa.gz, so it's not surprising DRAM doesn't find it. In case it helps to work out what's going on: all files in that folder are only a few days old (2023-01-13), and one day earlier (on the 12th) I could successfully download and prepare the databases, including viral (but with --skip_uniref). Now there is only a viral.1.protein.faa.gz at that address.

Cheers,
Nikolas
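
The check described above (looking at the directory listing to see which viral protein files exist) can be sketched in Python. This is a minimal sketch, not part of DRAM: the function names are hypothetical, and the file-name pattern is inferred only from the names quoted in this thread (viral.1.protein.faa.gz, viral.2.protein.faa.gz).

```python
import re
from urllib.request import urlopen

# Pattern inferred from the file names quoted in this thread; it is an
# assumption, not an NCBI specification.
VIRAL_PROTEIN_RE = re.compile(r"viral\.(\d+)\.protein\.faa\.gz")


def find_viral_protein_files(listing_text):
    """Return the distinct viral.N.protein.faa.gz names found in a listing, ordered by N."""
    numbers = {int(m.group(1)) for m in VIRAL_PROTEIN_RE.finditer(listing_text)}
    return [f"viral.{n}.protein.faa.gz" for n in sorted(numbers)]


def fetch_listing(url="https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/"):
    """Download the directory listing page as text (requires network access)."""
    with urlopen(url, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling find_viral_protein_files(fetch_listing()) shows how many protein files the release currently has, which is what DRAM's hard-coded viral.2 URL was tripping over.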

@spencerlong1
Author

Hi Nikolas,

Good find, and I am seeing the same thing. I wonder whether viral.2 is no longer needed and, if so, whether there is a way to skip it or, alternatively, a way to find the old versions. I will play around today.

Cheers,
Spencer

@rmFlynn
Member

rmFlynn commented Jan 19, 2023

Thanks, guys. It looks like we might have to update the path; I will get on it.

@rmFlynn added the "bug" (Something isn't working) label Jan 20, 2023
@rmFlynn
Member

rmFlynn commented Jan 20, 2023

It looks like the change is real and also that it is here to stay. I am testing a fix to only pull one file now and will make a new point release when it is done. Or more likely, I will have @dmitrisvetlov do it.

@JoseLopezArcondo

Hello, I am new to DRAM. I had the same issue with the latest conda version. How should we skip or solve this?
If I run DRAM now, it seems it cannot find any database path, although the databases are in "DRAM_data/database_files". Is that because the database download did not finish correctly due to the missing viral.2 file?

@rmFlynn
Member

rmFlynn commented Feb 2, 2023

First, can you post the output of DRAM-setup.py version?

@rmFlynn
Member

rmFlynn commented Feb 2, 2023

As in issue #236, you can download the file from https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/. Put it in a folder on your server and point to it using DRAM-setup.py prepare_databases --viral_loc viral_file.faa.gz. There will only be one file, but if in the future there are separate viral files, you can cat them together to make the merged.faa.gz.
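
For anyone who prefers not to use cat, the merge step can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, and it relies on the fact that gzip files concatenated byte-for-byte form a valid multi-member gzip stream, which is the same thing cat produces.

```python
import gzip
import shutil
from pathlib import Path


def merge_gzip_files(parts, merged_path):
    """Concatenate gzip files byte-for-byte, like `cat a.gz b.gz > merged.gz`."""
    merged_path = Path(merged_path)
    with merged_path.open("wb") as merged:
        for part in parts:
            with Path(part).open("rb") as source:
                shutil.copyfileobj(source, merged)
    return merged_path


def count_fasta_records(gz_path):
    """Count FASTA header lines as a quick sanity check on a gzipped FASTA file."""
    with gzip.open(gz_path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

The record count of the merged file should equal the sum of the counts of the parts; the merged path can then be passed to --viral_loc as described above.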

@JoseLopezArcondo

Thanks.
The output of DRAM-setup.py version: 1.4.5
All right, I downloaded the viral.1.protein.faa.gz file and put it in the DRAM_data folder, but when running
DRAM-setup.py prepare_databases --viral_loc path_to_viral.1.protein.faa.gz
this happens:
FileExistsError: [Errno 17] File exists: './database_files'
I guess that because I already ran the DRAM-setup.py prepare_databases --output_dir DRAM_data step, it collides with the already downloaded files...

@rmFlynn
Member

rmFlynn commented Feb 2, 2023

The latest version of DRAM in conda is 1.4.6 (https://anaconda.org/bioconda/dram); you will see in the release notes that it is the point release for single viral files. You may want to upgrade for future stability.

Yes, you must put it in a new location or delete the failed folder. For now, you must set up all the databases at the same time, at least if you want a reliable setup.
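
A small pre-flight check can catch the FileExistsError before a long setup run starts. This is a sketch with a hypothetical helper name; it assumes, based only on the error message and paths quoted in this thread, that prepare_databases creates a database_files folder inside the output directory.

```python
from pathlib import Path


def has_previous_setup(output_dir="."):
    """Return True if output_dir already holds a database_files folder.

    Assumption (from the error message in this thread): a leftover
    database_files folder makes prepare_databases fail with FileExistsError.
    """
    return (Path(output_dir) / "database_files").exists()
```

If this returns True, either choose a different output directory or delete the failed folder first, as suggested above.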

@JoseLopezArcondo

All right, but is there any way to use the already downloaded files, or must I remove the DRAM_data folder and download everything again with the new version installed? Thanks.

@rmFlynn
Member

rmFlynn commented Feb 2, 2023

You can use already downloaded files via the corresponding _loc argument for each (for example, --viral_loc). Use DRAM-setup.py prepare_databases --help to see the many arguments, then use a new location for the output. In my opinion, it would be more work than it is worth; the downloading is typically the fast part of the setup process.
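
As a rough illustration of that approach, the command line can be assembled from a mapping of _loc arguments to already downloaded files. The helper name is hypothetical, and only --output_dir and --viral_loc are confirmed in this thread; any other _loc name should be checked against DRAM-setup.py prepare_databases --help before use.

```python
import shlex


def build_prepare_command(output_dir, loc_args):
    """Build a DRAM-setup.py prepare_databases command as an argument list.

    loc_args maps argument names such as "viral_loc" to file paths. Names are
    passed through unchanged, so they must match what --help reports.
    """
    command = ["DRAM-setup.py", "prepare_databases", "--output_dir", str(output_dir)]
    for name, path in loc_args.items():
        if not name.endswith("_loc"):
            raise ValueError(f"expected a *_loc argument name, got {name!r}")
        command += [f"--{name}", str(path)]
    return command


# The paths here are placeholders, not real locations.
example = build_prepare_command(
    "new_DRAM_data", {"viral_loc": "old_downloads/viral.1.protein.faa.gz"}
)
print(shlex.join(example))
```

The resulting list can be handed to subprocess.run(command, check=True), with the output directory pointing at a new, empty location.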

@rmFlynn closed this as completed Feb 3, 2023
@wuhuiyun07

I am using DRAM version 1.4.6, and also have the same error:
database_handler.py:123: UserWarning: Database does not exist at path None
warnings.warn("Database does not exist at path %s" % description_loc)
