
Commit

Checkpoint
robertaboukhalil committed Jan 17, 2024
1 parent 63dd7b0 commit ec8a82e
Showing 4 changed files with 102 additions and 12 deletions.
2 changes: 1 addition & 1 deletion workflows/bulk-download/Dockerfile
@@ -3,7 +3,7 @@ FROM ubuntu:20.04

 LABEL maintainer="CZ ID Team <idseq-tech@chanzuckerberg.com>"

-RUN apt-get update && apt-get -y install curl git python3 python3-pip gcc make libz-dev libncurses-dev libbz2-dev liblzma-dev g++ zip
+RUN apt-get update && apt-get -y install curl git python3 python3-pip make zip
 RUN ln -s /usr/bin/python3 /usr/bin/python

 COPY requirements.txt requirements.txt
24 changes: 13 additions & 11 deletions workflows/bulk-download/run.wdl
@@ -4,27 +4,27 @@ workflow bulk_download {
 input {
 String action
 Array[File] files
-String docker_image_id
+String docker_image_id = "czid-bulk-download"
 }

 if (action == "concatenate") {
 call concatenate {
 input:
-files = files
-docker_image_id = host_filtering_docker_image_id,
+files = files,
+docker_image_id = docker_image_id
 }
 }

-if (action == "group") {
-call group {
+if (action == "zip") {
+call zip {
 input:
-files = files
-docker_image_id = host_filtering_docker_image_id,
+files = files,
+docker_image_id = docker_image_id
 }
 }

 output {
-File? file = select_first([ concatenate.file, group.file ])
+File? file = select_first([ concatenate.file, zip.file ])
 }
 }
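
Note on the output expression in this hunk: only one of the two conditional calls runs, so one of `concatenate.file` and `zip.file` is undefined and `select_first` picks the other. As I read the WDL 1.0 specification, `select_first` fails at runtime when no element is defined, so an `action` other than "concatenate" or "zip" would likely fail the workflow rather than produce an empty `File?`. The Python sketch below only models that selection logic; `workflow_output` and the name `concatenated.fastq` are hypothetical stand-ins, since the `concatenate` task's output is not shown in this diff.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_first(values: Sequence[Optional[T]]) -> T:
    """Return the first element that is not None, modelling WDL's select_first."""
    for value in values:
        if value is not None:
            return value
    # Modelled on the WDL behaviour as I understand it: no defined value is an error.
    raise ValueError("select_first: no defined value in array")


def workflow_output(action: str) -> str:
    """Hypothetical stand-in for the workflow's output block."""
    # Assumption: the concatenate task's real output file name is not shown in the diff.
    concatenate_file = "concatenated.fastq" if action == "concatenate" else None
    # "result.zip" is the zip task's output name in this commit.
    zip_file = "result.zip" if action == "zip" else None
    return select_first([concatenate_file, zip_file])


if __name__ == "__main__":
    print(workflow_output("zip"))
```

If an unknown `action` should be tolerated, one option would be `select_all` or an explicit check on `action`; whether that is wanted here is a design decision for the authors.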

@@ -45,17 +45,19 @@ task concatenate {
 }
 }

-task group {
+task zip {
 input {
 String docker_image_id
 Array[File] files
 }
 command <<<
 set -euxo pipefail
-zip ~{sep=" " files} > group.zip
+
+# Don't store full path of original files in the .zip file
+zip --junk-paths result.zip ~{sep=" " files}
 >>>
 output {
-File file = "group.zip"
+File file = "result.zip"
 }
 runtime {
 docker: docker_image_id
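
Note on `--junk-paths`: the new command stores each input under its base name only, so the archive does not expose the localized input paths. A rough Python equivalent, using the standard `zipfile` module, is sketched below to show the effect. The helper names `zip_junk_paths` and `demo` and the sample file names are hypothetical, and raising on a duplicate base name is this sketch's own choice, not a claim about how the `zip` CLI handles that case.

```python
import os
import tempfile
import zipfile


def zip_junk_paths(paths, out_path):
    """Write each file into out_path under its base name only."""
    seen = set()
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            name = os.path.basename(path)
            if name in seen:
                # Dropping directories can make two inputs share one archive name.
                raise ValueError(f"duplicate base name after flattening: {name}")
            seen.add(name)
            # arcname controls the name stored inside the archive.
            archive.write(path, arcname=name)
    return out_path


def demo():
    """Zip two files from different directories and list the stored names."""
    with tempfile.TemporaryDirectory() as tmp:
        inputs = []
        for sub, fname in (("a", "reads_1.fastq"), ("b", "reads_2.fastq")):
            os.makedirs(os.path.join(tmp, sub))
            path = os.path.join(tmp, sub, fname)
            with open(path, "w") as handle:
                handle.write("@read\nACGT\n+\nFFFF\n")
            inputs.append(path)
        out = zip_junk_paths(inputs, os.path.join(tmp, "result.zip"))
        with zipfile.ZipFile(out) as archive:
            return archive.namelist()


if __name__ == "__main__":
    print(demo())
```

One thing worth checking for the real task: if two entries in `files` share a base name, flattening the paths makes them collide, so the inputs may need unique names before zipping.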
44 changes: 44 additions & 0 deletions workflows/bulk-download/test/host_filter_1.fastq
@@ -0,0 +1,44 @@
@M05295:617:000000000-KL64F:1:1101:3078:7376
TTTTGCCGTAACGGCTTTTTACCACAGCCAGCTTGCGGCGCAACACCTCCGCCAGAAAGTTGCCGTTGCCGCAGGCGGGTTCCAGAAAACGGCTCTCGATGCGCTCCGTCTCGCTCTTTACAAGGTCGCACATCGCCTTTACCTCC
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG8DF@FDEGGGGGGGGGGGGGGGCDGGFFFFEFFFFFF>/
@M05295:617:000000000-KL64F:1:1101:3125:11405
TCTTTGGTATACTGCAGTGCTTATATGCGGTTTGCTGATTTTTTCGGCGGCAGCTTGTGCAGGAACGATTCTTTCCTGCAATAACCGGCTGAAAAGAAAAAGGAAAAAGATACGCAAGGCGGCACTCTTGTCAACTATGTGCATTA
+
CCCCCGGGGGGGGFFFDFGGGAFGCFGGEGGGG>AECFGEFFGGCEFGGEGGGGGGGGGGGGGGFGGGGFFGGGGGGFGGGGGGFG:FGGGFGGGG7DFCGGGGGDFGEGGGGGGCGGGFGEGGGGGFGCGGGGFF5D9CD7<DF+
@M05295:617:000000000-KL64F:1:1101:2016:13202
CCACCAAATAACACTCAAGGACTTCAAATGTCGGAGAGTGTGAGATGTTCTTTGAAAATTGAATAACGAAACAACAAAGAGGAAATTAAAGATATCCAATTAAAGAAATTTAATGGGTAAAATACAATTTCAAACAATTCTTCTGT
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGFGGGGGGGGGGGGGGFFFFFFFFFFFF
@M05295:617:000000000-KL64F:1:1101:2666:12975
CCTCTTTTTTTTGCAGAAGAGTACACAACTGCTTTATTTTATGCTAAAAGACCCCTGCCTACGCAAAGGCAGAGGTCCGATTTTTTCATAGTCTGGGGAGATAAAACAACTTTCCGATTTCACAGAATGCGCACGGCCTTCCAGAT
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFBBF
@M05295:617:000000000-KL64F:1:2112:3938:13885
CCTCTTTTTTTTGCAGAAGAGTACACAACTGCTTTATTTTATGCTAAAAGACCCCTGCCTACGCAAAGGCAGAGGTCCGATTTTTTCATAGTCTGGGGAGATAAAACAACTTTCCGATTTCACAGAATGCGCACGGCCTTCCAGAT
+
CCCCCGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGDEGGGGEGGCFGGGGGGEGGCGGGGGGFGG8FGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@M05295:617:000000000-KL64F:1:2108:9015:16045
CCTCTTTTTTTTGCAGAAGAGTACACAACTGCTTTATTTTATGCTAAAAGACCCCTGCCTACGCAAAGGCAGAGGTCCGATTTTTTCATAGTCTGGGGAGATAAAACAACTTTCCGATTTCACAGAATGCGCACGGCCTTCCAGAT
+
CCCCCGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGDGGGGCGFGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFG
@M05295:617:000000000-KL64F:1:2106:11795:12187
CCTCTTTTTTTTGCAGAAGAGTACACAACTGCTTTATTTTATGCTAAAAGACCCCTGCCTACGCAAAGGCAGAGGTCCGATTTTTTCATAGTCTGGGGAGATAAAACAACTTTCCGATTTCACAGAATGCGCACGGCCTTCCAGAT
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGFGGGGGGGGGCGGGGGGGDGEFCFFDFFCGGGGGGGFGGGGGGGGGGFEF8FDCEGGGGGGGGGD>FFGGGGGGGGGGGFGGGGDGGGGGGGGGGGGGGG
@M05295:617:000000000-KL64F:1:2117:7228:7910
CCTCTTTTTTTTGCAGAAGAGTACACAACTGCTTTATTTTATGCTAAAAGACCCCTGCCTACGCAAAGGCAGAGGTCCGATTTTTTCATAGTCTGGGGAGATAAAACAACTTTCCGATTTCACAGAATGCGCACGGCCTTCCAGAT
+
CCCCCGGGGGGGGGGGGGGGGFFGGGGGGFFGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCDGGGGGGGGGGGDGGGGFGGAFGG7FGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGF
@M05295:617:000000000-KL64F:1:1119:25439:5751
CCTCTTTTTTTTGCAGAAGAGTACACAACTGCTTTATTTTATGCTAAAAGACCCCTGCCTACGCAAAGGCAGAGGTCCGATTTTTTCATAGTCTGGGGAGATAAAACAACTTTCCGATTTCACAGAATGCGCACGGCCTTCCAGAT
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGFGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGF
@M05295:617:000000000-KL64F:1:2101:21157:16235
CCTCTTTTTTTTGCAGAAGAGTACACAACTGCTTTATTTTATGCTAAAAGACCCCTGCCTACGCAAAGGCAGAGGTCCGATTTTTTCATAGTCTGGGGAGATAAAACAACTTTCCGATTTCACAGAATGCGCACGGCCTTCCAGAT
+
CCCCCGGGGGGDGGGGFGGGGGGGGF@FGGGGG<FGGGGGGGGGFGGGGGGGFGEGDGGGGGGGGGCF@F<FFGGFGGGGG=@FGEGGGGFFA9E88CCEGGFAGFGGGGGFGG8CCCEEGGGGG??D??CGGD69DFGFF6DFGF
@M05295:617:000000000-KL64F:1:1101:1908:15400
CGCTCACATGAACGGAATAATACTCTCCCAAATATTCACTTCCCGCCCCATCTTTGTATACTTCCTCTGTTTCAAGATCATATGTTTGATATAAATACGCCTGTCCATCATTCTTCAGATCTACATAAGCAGCCCAGTCATCAATC
+
<8-AC@FCGFGGGGGCFFEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGFGFGFGGCGFGGGFBFFFFBFBDDF9
44 changes: 44 additions & 0 deletions workflows/bulk-download/test/host_filter_2.fastq
@@ -0,0 +1,44 @@
@M05295:617:000000000-KL64F:1:1101:3078:7376
CTTTGCCCTCAGATTTGCTTTTGTACCAATTATAGCATATTTCCCGGTTAAATCCACAGATTTTTAGCTATTCGTTTCATCTCTTGAGCCGCTTGTCAAAAGGTACACTTTTTGGCAAGCCCTTCAAAGAGGTGGAACGAATGGCA
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGFGFFFFDFFFFF5
@M05295:617:000000000-KL64F:1:1101:3125:11405
TCTTTGGTAAGTCCGAACAGATTTTATTCTACTCCTCGGGTGTTCTGAGCGATTGTTTGTGTTGAAAAGTCATTCAGGTCATAGTACCGCATTGCTTCTGTTCCGTCTTTGGCGATATACTCGACTAAAATGTAATGCTCGGTTAT
+
CCCCCFGFFFGF<FF@CFG8<<F<FGGFGGGGGGGGGGCFEGG7FGF9FC7@FGFAFGGGGGDCFEFGGGGGGGGFGGEGGGFGGGGGGGGGGGFGEGGC@EFGGFFGCGGFFGDEGGGGDFDGGGDCG?FFGBFFFFFBD>@ABD
@M05295:617:000000000-KL64F:1:1101:2016:13202
TTCAGTTCGGGCGGTTCCCCTCATATACCTATTTATTCAGTATATGATACATGGACTTGACTCCATGTGGATTGCTCCATTCGGACATCTACGGATCATATCGTGCTTGCCAATCCCCGTAGCTTTTCGCAGCTTACCACGTCCTT
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGFFGGGGDGDFFGGGGGGGFGFD?FFFFFFFF
@M05295:617:000000000-KL64F:1:1101:2666:12975
CCTTTTTTTTTTTTAACTGGAATCGACATTGATTTTTATATTCCGTCGGCAAGGCAGGTCGTTCAGGTGGCGTATTCCATTCAGGGGGATGCCTATGAGCGCGAAGTCGGAAATCTGAAAAAATTTGCAGCCACCACGACAGAAAC
+
CCCCCGGGGGGGGGGG9FAFGFGFGFGGGGGGGGGGGGGGGGGGGGGGDEGGGFGGDCGF:FGGGGGGGGFC=CFGGGGGFGGGGGGGGGGGGGFGGGGEEGDCC>EFGDEEGGGGGGGGFFDGGGGCDGG7DFGFFF5**@FFFA
@M05295:617:000000000-KL64F:1:2112:3938:13885
CCTTTTTTTTTTTTAACTGGAATCGACATTGATTTTTATATTCCGTCGGCAAGGCAGGTCGTTCAGGTGGCGTATTCCATTCAGGGGGATGCCTCTGAGCGCGAAGTCGGAAATCTGAAAAAATTTGCAGCCACCACGACAGAAAC
+
CCCCCGGGGGGGGGCCFF<FGFGGGFCFGGGGGGGGGGGEFGEFGGGGEG7FFDGFEGGG:CFEFFFGGGEGG+CFFGCGGGGGGGGD+?FFCF,AAF,,@B@F7C@D@CCEGGGFFGFGGDGGGGFFGGCFCFG6C>+;*7*AF5
@M05295:617:000000000-KL64F:1:2108:9015:16045
CCTTTTTTTTTTTTAACTGGAATCGACATTGATTTTTATATTCCGTCGGCAAGGCAGGTCGTTCAGGTGGCGTATTCCATTCAGGGGGATGCCTATGAGCGCGAAGTCGGAAATCTGAAAAAATTTGCAGCCACCACGACAGAAAC
+
CCCCCGGGGGGGGGGFEFG8F8AFGG:FFGFGGGGGGGGGGGGG,9BF:E:FD,@@::F8CCEFEG9FGC,7@CFDC9,CF<AFGFG:+84AAE,@9,E9CBCEE6+7BCEE6=6=F,EFGF>CDB:,CFF9CFFF677C57;>7*
@M05295:617:000000000-KL64F:1:2106:11795:12187
CCTTTTTTTTTTTTAACTGGAATCGACATTGATTTTTATATTCCGTCGGCAAGGCAGGTCGTTCAGGTGGCGTATTCCATTCAGGGGGATGCCTATGAGCGCGAAGTCGGAAATCTGAAAAAATTTGCAGCCACCACGACAGAAAC
+
CCCCCGGGGGGGGGGFGGG8EEFEFGGGGGG,FFGGGGGGGGGGFGG@CEFCDFGG77F@CFFGFGCFD@EFFDFEAEFGFGFGGCCC+@FFCFEFFFFFEGG>C6+@EEEC>DCFFGDGGDEGGGFGGGGFFD6CD66?7BFF5;
@M05295:617:000000000-KL64F:1:2117:7228:7910
CCTTTTTTTTTTTTAACTGGAATCGACATTGATTTTTATATTCCGTCGGCAAGGCAGGTCGTTCAGGTGGCGTATTCCATTCAGGGGGATGCCTATGAGCGCGAAGTCGGAAATCTGAAAAAATTTGCAGCCACCACGACAGAAAC
+
CCCCCGGGGGGGGGFE,C9,FFFGFCFGGGGGGFGGGFFCF<FFG8ECFGCFGGGGG7FFGGEF8FG<ECFGGGGGFGGF@9FEEGG+=F@8FF9F9AEEGGEGGGG<BFEEEFGFFGGGGGGGGGAFFFCFGGCCFCE5CGFF?8
@M05295:617:000000000-KL64F:1:1119:25439:5751
CCTTTTTTTTTTTTAACTGGAATCGACATTGATTTTTATATTCCGTCGGCAAGGCAGGTCGTTCAGGTGGCGTATTCCATTCAGGGGGATGCCTATGAGCGCGAAGTCGGAAATCTGAAAAAATTTGCAGCCACCACGACAGAAAC
+
CCCCCGGGGGGGGGGF9@FGGGFGGFEGGGGCFEGGGGFGGGGGGGGGGGDDGGGGEDGF=FE@FGGGGFEFGGGGGGGGGGCFGGECC@FGEGFDCDFDGGGGCE@FFCFGGGFCFGGGGGFGGFGGGFGCFGGGF>>CGGGGGD
@M05295:617:000000000-KL64F:1:2101:21157:16235
CCTTTTTTTTTTTTAACTGGAATCGACATTGATTTTTATATTCCGTCGGCAAGGCAGGTCGTTCAGGTGGCGTATTCCATTCAGGGGGATGCCTTTGAGCGCGAAGTCGGAAATCTGAAAAAATTTGCAGCCACCACGACAAAAAC
+
CCCCCGGGGGGGGG<<6@<C66CC8EDC@FC<FFF@FFF9FGGGGGFEGGDEEFC=CCF8FGGGGA?EE?8@F=FF9DE,FFBFDFEEECE8BF,E=,DFEDCEC>F8@CEGCDF:,,@+6=@F8FCGG6;+0=00*3**0**33:
@M05295:617:000000000-KL64F:1:1101:1908:15400
TGGTCCTCGCCGTTATGAATCTTACAAAGAAAGTATAGTTGACCATTCAGATATTTTATTGGATGAGAGGTATGCGGAAGCATGGGAATATAAGGACAATCCATTTATTTATGTATCTATCATAGGACCTATTTATGCAACGGGAA
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDFGGGGGGGCFGGGDGFGGGGGGGGFFGGGGGGGGGGGGGGGGGGGGFGFGGGDGGGGGGGGGGGGDFGGGGGGGGGGGGGGGGGGGGG8DFGFFFFFFF5@A:
