Hello,
While running `sourmash compare` on ~300k genes with `scaled=10`, I kept running into both out-of-memory (`bus error: core dumped`) and `no space left on device` errors, and I think the way to fix them is non-obvious.
Error executing process > 'sourmash_compare_sketches (dayhoff__k-30)'
Caused by:
Process `sourmash_compare_sketches (dayhoff__k-30)` terminated with an error exit status (135)
Command executed:
sourmash compare \
--ksize 30 \
--dayhoff \
--csv similarities__dayhoff__k-30.csv \
--processes 10 \
--traverse-directory .
# Use --traverse-directory instead of listing all the files explicitly,
# to avoid bash's "too many arguments" error when there are lots of samples
Command exit status:
135
Command output:
(empty)
Command error:
...loading from '.' / 280690 sigs total
... redacted for brevity ...
...loading from '.' / 280920 sigs total
...loading from '.' / 280930 sigs total
.command.sh: line 7:    29 Bus error               (core dumped) sourmash compare --ksize 30 --dayhoff --csv similarities__dayhoff__k-30.csv --processes 10 --traverse-directory .
And if I did `ls -lha` in that directory with my `zsh` setup, I'd get `no space left on device`:
(immune-evolution)
✘ Fri 23 Apr - 05:01 ~/code/botryllus/workflows/kmermaid/mhc olgabot/kmermaid-mhc ✔ 1☀
olga@lrrr ll
Permissions Size User Group Date Modified Git Name
drwxr-xr-x - olga czb 23 Apr 5:01 -N .nextflow
.rw-r--r-- 323k olga czb 23 Apr 5:01 -- .nextflow.log
.rw-r--r-- 613k olga czb 22 Apr 10:07 -N .nextflow.log.1
.rw-r--r-- 14k olga czb 21 Apr 16:55 -N .nextflow.log.2
.rw-r--r-- 32k olga czb 21 Apr 16:53 -N .nextflow.log.3
.rw-r--r-- 43k olga czb 21 Apr 16:12 -N .nextflow.log.4
.rw-r--r-- 29k olga czb 21 Apr 15:03 -N .nextflow.log.5
.rw-r--r-- 18k olga czb 21 Apr 14:46 -N .nextflow.log.6
.rw-r--r-- 14k olga czb 21 Apr 14:38 -N .nextflow.log.7
.rw-r--r-- 15k olga czb 21 Apr 14:38 -N .nextflow.log.8
.rw-r--r-- 14k olga czb 21 Apr 14:36 -N .nextflow.log.9
.rw-r--r-- 951 olga czb 21 Apr 14:47 -N Makefile
.rw-r--r-- 437 olga czb 21 Apr 13:51 -N Makefile~
.rw-r--r-- 246 olga czb 22 Apr 17:11 -N nextflow.config
.rw-r--r-- 46 olga czb 21 Apr 13:52 -N nextflow.config~
drwxr-xr-x - olga czb 21 Apr 13:52 -N ROJECT_BASE
drwxr-xr-x - olga czb 21 Apr 14:32 -N work
prompt_git:33: write failed: no space left on device
prompt_git:37: write failed: no space left on device
prompt_git:40: write failed: no space left on device
prompt_git:47: write failed: no space left on device
prompt_git:48: write failed: no space left on device
prompt_git:55: write failed: no space left on device
prompt_git:62: write failed: no space left on device
I realized that the code makes a temporary file, which by default lands in `/var/tmp`, a directory that does not have a ton of space in this particular configuration. So I set `export TMPDIR=$HOME/data_lg/tmp`, which points to mounted storage with a LOT more space.
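This works because Python's `tempfile` module consults the `TMPDIR`, `TEMP`, and `TMP` environment variables, in that order, before falling back to platform defaults such as `/tmp` or `/var/tmp`. A minimal sketch, using a hypothetical scratch path:

```python
import os
import tempfile

# Hypothetical roomy scratch directory; created here so tempfile accepts it.
scratch = os.path.join(os.getcwd(), "big_scratch")
os.makedirs(scratch, exist_ok=True)

os.environ["TMPDIR"] = scratch
tempfile.tempdir = None  # clear the cached default so TMPDIR is re-read

print(tempfile.gettempdir())
```

Note that `tempfile` caches the chosen directory on first use, so the environment variable has to be set before the first temporary file is created (trivially true when you `export` it before launching the process).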
Running the command manually with a different temporary directory, it turned out this temp file was ~634 GB! No wonder it was running out of both memory and space!
(nf-core--kmermaid-1.1.0dev)
✘ Mon 26 Apr - 10:34 ~/data_lg/tmp
olga@hulk ll
Permissions Size User Group Date Modified Name
.rw------- 634G olga czb 26 Apr 10:23 arrayk2nn1fdp.mmap
.rw------- 2.3M olga czb 26 Apr 9:53 arraynmt55kmf.mmap
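That size is consistent with a dense pairwise similarity matrix for the signature count shown in the log, assuming one float64 per pair:

```python
# ~280,930 signatures were loaded, per the "sigs total" log lines above;
# a dense all-vs-all float64 matrix needs 8 bytes per pair.
n_signatures = 280_930
matrix_bytes = n_signatures ** 2 * 8
print(f"{matrix_bytes / 1e9:.0f} GB")  # prints '631 GB'
```

So the ~634 GB `.mmap` file is essentially the full similarity matrix materialized on disk.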
This still didn't run to completion: I got an `OverflowError`, probably because the array is so huge... or something else. Anyway, I downsampled the signatures to `scaled=100` and am running them now.
OverflowError: cannot serialize a string larger than 4GiB
Process ForkPoolWorker-1: done in 7.03321 seconds
Traceback (most recent call last):
File "/data_sm/home/olga_ibm/miniconda3/envs/nf-core--kmermaid-1.1.0dev/lib/python3.7/multiprocessing/pool.py", line 127, in worker
put((job, i, result))
File "/data_sm/home/olga_ibm/miniconda3/envs/nf-core--kmermaid-1.1.0dev/lib/python3.7/multiprocessing/queues.py", line 364, in put
self._writer.send_bytes(obj)
File "/data_sm/home/olga_ibm/miniconda3/envs/nf-core--kmermaid-1.1.0dev/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/data_sm/home/olga_ibm/miniconda3/envs/nf-core--kmermaid-1.1.0dev/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data_sm/home/olga_ibm/miniconda3/envs/nf-core--kmermaid-1.1.0dev/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/data_sm/home/olga_ibm/miniconda3/envs/nf-core--kmermaid-1.1.0dev/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/data_sm/home/olga_ibm/miniconda3/envs/nf-core--kmermaid-1.1.0dev/lib/python3.7/multiprocessing/pool.py", line 132, in worker
put((job, i, (False, wrapped)))
File "/data_sm/home/olga_ibm/miniconda3/envs/nf-core--kmermaid-1.1.0dev/lib/python3.7/multiprocessing/queues.py", line 358, in put
obj = _ForkingPickler.dumps(obj)
File "/data_sm/home/olga_ibm/miniconda3/envs/nf-core--kmermaid-1.1.0dev/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a string larger than 4GiB
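Two limits are visible in this traceback: `multiprocessing`'s connection layer frames each queue message with a signed 32-bit length header (`struct.pack("!i", n)`), so a worker result of 2 GiB or more cannot be sent back, and the error-reporting fallback then hits pickle's own limit on strings over 4 GiB (newer Python versions relax these for large payloads; the Python 3.7 shown here does not). The header limit is easy to reproduce in isolation:

```python
import struct

# multiprocessing.connection frames messages with a signed 32-bit length,
# so any payload of 2**31 bytes or more overflows the "!i" header.
too_big = 2 ** 31
try:
    struct.pack("!i", too_big)
except struct.error as exc:
    print(exc)  # 'i' format requires -2147483648 <= number <= 2147483647
```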
All this is to say that I happened to know @pranathivemuri's code creates a temporary file for a memory-mapped siglist, but I didn't see this in the documentation (maybe I missed it). It would be helpful either to add an explicit `--tmpdir` flag, or to state in the `sourmash compare` documentation that if you are running into performance issues, you may want to set one of the `TMPDIR`, `TEMP`, or `TMP` environment variables consulted by `tempfile.gettempdir()`. Open to ideas! Curious to hear your thoughts on this.
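A hypothetical sketch of what such a flag could look like; the flag name, the `.mmap` suffix, and the plumbing here are illustrative, not sourmash's actual API:

```python
import argparse
import tempfile

parser = argparse.ArgumentParser()
parser.add_argument(
    "--tmpdir", default=None,
    help="directory for the memory-mapped scratch array "
         "(default: tempfile.gettempdir())",
)
args = parser.parse_args([])  # or e.g. ["--tmpdir", "/big/scratch"]

# dir=None falls back to the TMPDIR/TEMP/TMP lookup, preserving the
# current behavior when the flag is not given.
with tempfile.NamedTemporaryFile(dir=args.tmpdir, suffix=".mmap") as handle:
    print("scratch file:", handle.name)
```

The nice property of routing the flag through `dir=` is that users who already set `TMPDIR` see no change, while everyone else gets an explicit, documented knob.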
I wonder if we could use a zarr file here, since the mmap file is getting huge, about half a TB. If it takes less space and still provides easy access, that could be good. But it would introduce a zarr dependency in sourmash, and I'm not sure that's acceptable.