Merge pull request #26 from mahesh-panchal/update_sops

Update sops
NBISweden · Dec 11, 2024 · b9c2f5e
2 parents 53f8f50 + 6db600d

Showing 7 changed files with 131 additions and 49 deletions.
diff --git a/.gitignore b/.gitignore
@@ -8,6 +8,10 @@ data/frozen/*/
 nobackup/
 nxf-work/
 
+# Pipeline files
+results/
+*_cache/
+
 # Common bioinformatic file formats
 *.sam
 *.bam
@@ -33,10 +37,18 @@ nxf-work/
 *.ped
 *.map
 
+# Pixi files
+.pixi/
+*.egg-info
+
 # Nextflow files
 .nextflow*
 work/
 
+# nf-test files
+.nf-test.log
+.nf-test/
+
 # Quarto files
 .quarto/
 _site/
@@ -45,5 +57,6 @@ _book/
 
 # misc
 .DS_Store
+.screenrc
 slurm*.out
 slurm*.err
diff --git a/.gitpod.yml b/.gitpod.yml
@@ -1,14 +1,13 @@
-image: nfcore/gitpod:latest
+image: nfcore/gitpod:dev
+
+tasks:
+  - name: Install Pixi
+    command: |
+      sudo chown gitpod -R /home/gitpod/
+      curl -fsSL https://pixi.sh/install.sh | bash
+      . /home/gitpod/.bashrc
 
 vscode:
-  extensions: # based on nf-core.nf-core-extensionpack
-    - codezombiech.gitignore # Language support for .gitignore files
-    # - cssho.vscode-svgviewer                 # SVG viewer
-    - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code
-    - eamodio.gitlens # Quickly glimpse into whom, why, and when a line or code block was changed
-    - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files
-    - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar
-    - mechatroner.rainbow-csv # Highlight columns in csv files in different colors
-    # - nextflow.nextflow                      # Nextflow syntax highlighting
-    - oderwat.indent-rainbow # Highlight indentation level
-    - streetsidesoftware.code-spell-checker # Spelling checker for source code
+  extensions:
+    - nf-core.nf-core-extensionpack
+    - quarto.quarto
diff --git a/docs/gh-pages/_quarto.yml b/docs/gh-pages/_quarto.yml
@@ -26,6 +26,8 @@ format:
     toc-depth: 2
     number-depth: 2
     theme: minty
+    mermaid:
+      theme: forest
 
 bibliography: references.bib
 

diff --git a/docs/gh-pages/index.qmd b/docs/gh-pages/index.qmd
@@ -1,15 +1,6 @@
-# Protocols
+# Running assembly projects
 
-Here are the standard operating procedures to follow when performing a genome assembly,
-annotation, and/or further analysis.
-
-## Why do we need these protocols?
-
-- To make data findable - (strict folder structure)
-- Ease project tracking - (git)
-- Reduce workload - (automation, code sharing)
-- Reproducibility - (workflows, notebooks, git, documentation, containers, interoperability)
-- Documentation - (reporting, summaries, issue tracking)
+If you're new to these protocols, please see the [onboarding material](preface.qmd) first.
 
 ## Quick Start
 
@@ -22,27 +13,56 @@ annotation, and/or further analysis.
           - `VREBP`: For VR-EBP projects
           - `ERGA`: For ERGA projects 
           - `BGE`: For BGE projects
-          - `SMS`: For NBIS short term projects
+          - `SMS`: For NBIS user-fee projects
+          - `LTS`: For NBIS peer-review projects
       - `<species>`: Species name
       - `<year>`: Year project started
       - `<short_description>`: Short project description.
   5. Ensure repository is private, then click Create repository.
-- Clone it into the NAISS Storage project.
+- Clone it into the NAISS Storage project or your folder on NAC.
 
   ```{.bash}
-  cd /proj/snic2021-6-194
+  cd <project allocation>
   git clone git@github.com:NBISweden/<repo>.git 
   ```
 - Update README in the repository with project details.
 - Add references to references.bib of important information.
-- Copy NGI deliveries to data folder.
+- Copy NGI deliveries to data folder (see [launch page](launch.qmd)).
 - Link relevant raw data in `data/raw-data`.
 - Update `assembly_parameters.yml` to point to files in `data/raw-data`.
-- Run analyses, activating any necessary compute environments. 
+- Run analyses (`./run_nextflow.sh`)
 - Refer to the other pages here for more in-depth descriptions of the protocols.
 
 The template provides an organised folder structure, and skeleton files to quickly
 start analyzing.
 
-Analyses are primarily run on Uppmax. Github is used as the primary repository, and
+Analyses are primarily run on Uppmax or PDC. Github is used as the primary repository, and
 analysis files should be tracked and pushed regularly.
+
+## Running a test assembly analysis
+
+Follow the steps above to make a repository for a test species. If you would like to use real data
+then feel free to use [Laetiporus sulphureus (Chicken of the Woods)](https://portal.darwintreeoflife.org/data/root/details/Laetiporus%20sulphureus).
+
+From the Data tab, download the bam file for PacBio HiFi into the deliveries folder:
+
+```{.bash}
+wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR680/ERR6808041/m64229e_210602_121910.ccs.bc1020_BAK8B_OA--bc1020_BAK8B_OA.bam
+```
+
+and the FastQ files for HiC (Arima v2) into the deliveries folder:
+
+```{.bash}
+wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR668/000/ERR6688740/ERR6688740_1.fastq.gz
+wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR668/000/ERR6688740/ERR6688740_2.fastq.gz
+```
+
+Symlink the files into appropriate folders under `raw-data`.
+
+Then edit the `assembly_parameters.yml` to point to the data linked under `raw-data`, using
+the bash snippets in the `assembly_parameters.yml` to help you write the input file.
+
+Update the `workflow_parameters.yml` and change the `mitohifi.code` parameter to 4 
+(see [NCBI Taxonomy Browser](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=5630)).
+
+Finally, open a `screen` session and then run the launch script (`./run_nextflow.sh`).
diff --git a/docs/gh-pages/initialize.qmd b/docs/gh-pages/initialize.qmd
@@ -6,8 +6,5 @@ What happens when a new species is to be assembled?
 
 - [ ] Make a private GitHub repository from the [Assembly template](https://github.com/NBISweden/assembly-project-template).
 - [ ] Fill in the README.
-- [ ] Make a Project Task List (Click "Issues" on the GitHub repository > Select "Get Started" next to 'Project Task List').
-- [ ] Make an issue for achieved standards (Click "Issues" on the GitHub repository > Select "Get Started" next to 'Achieved standards').
-- [ ] Assign yourself to both issues.
-- [ ] Add these issues to the EBP GitHub Project board.
 - [ ] Update references.bib with relevant references.
+- [ ] Add project details to the project spreadsheet linked in #vr-accessibility-ebp.
diff --git a/docs/gh-pages/launch.qmd b/docs/gh-pages/launch.qmd
@@ -40,25 +40,64 @@ data/
 - `data/frozen` contains symlinks to folders in `data/outputs` which are stage end-points, e.g. the raw-reads have been processed
   in various ways, and after looking at QC controls, one folder is selected to be used for assembly. This is symlinked in frozen.
 
-1. Make a translation table in data/raw-data linking the NGI delivery files to the data we're going to use. Need a way to mark bad data.
+### Link data between folders
+
+Data in `data/raw-data/*` should be symlinked from `data/deliveries/**`. 
+
+E.g.,
+
+```{.bash}
+cd data/raw-data/PacBio-Hifi
+find ../../deliveries \( -name "*.bam" -o -name "*.pbi" \) -exec ln -s {} . \;
+```
+
+and
+
+```{.bash}
+cd data/raw-data/Illumina-HiC
+find ../../deliveries -name "*.fastq.gz" -exec ln -s {} . \;
+```
 
 ### Assemble sequence data
 
+```{.bash}
+cd analyses/01_ebp-assembly-workflow
+```
+
+1. Update the `assembly_parameters.yml` with the paths to input files. Check the YML for a bash
+snippet to fill out the section.
+2. Update the `workflow_parameters.yml` with any extra workflow parameters. In particular, 
+check anything marked as TODO, e.g., selecting the mitochondrial code table to use.
+3. Run the workflow.
+
+  ```{.bash}
+  ./run_nextflow.sh
+  ```
+
+::: {.callout-note}
+The workflow above only runs until Hi-C mapping. Steps for manual curation onwards are still
+being implemented.
+:::
+
 ### Annotate assemblies
 
+No protocols yet.
+
 ### Perform downstream analyses
 
+No protocols yet.
+
 ### Integrate new analyses
 
-2. Need a protocol to integrate custom scripts into template while it's not
-integrated into the workflow. 
+Custom analyses might be needed. In these cases please make use of the other project folders,
+and do your best to version control all the steps. 
 
-- Put custom code in `code/scripts`, `code/snakemake`, `code/nextflow`, and launch scripts under `code/launch_templates`.
+- Put custom code in `code/scripts`, `code/snakemake`, and `code/nextflow`.
 - Make sure the code uses containers or conda environments to package the software environment.
 - Make an issue on the template to integrate the code into the template so that it's shareable until it's integrated into
 a workflow.
 - Make an issue on the relevant workflow to integrate the tools.
 
 ### Troubleshoot
 
-3. Need a protocol for troubleshooting. Who to ask
+If you encounter any issues with using these protocols please ask on #vr-accessibility-ebp.
diff --git a/docs/gh-pages/preface.qmd b/docs/gh-pages/preface.qmd
@@ -1,16 +1,27 @@
 # Onboarding {.unnumbered}
 
-Here you can find instructions on how to run assembly projects
-for the VR-EBP, ERGA, and BGE projects.
+Here you can find instructions on how to run assembly projects for the VR-EBP, ERGA, and BGE 
+projects.
+
+To ensure consistent, reproducible, and efficient genome assembly and analysis projects, we've 
+established these standard operating procedures (SOPs). By following these guidelines, we aim to 
+optimize our workflows, streamline data management, and facilitate collaboration.
+
+## Why do we need these protocols?
+
+- To make data findable - (strict folder structure)
+- Ease project tracking - (git)
+- Reduce workload - (automation, code sharing)
+- Reproducibility - (workflows, notebooks, git, documentation, containers, interoperability)
+- Documentation - (reporting, summaries, issue tracking)
 
 ## Getting started
 
 A Github account is needed. A new member needs to be added to the NBISweden Github organisation 
-(Responsible: FIXME), and then to the ERGA assemblies team (Responsible: Martin P.) to access 
-this webpage and template.
+(ask on #technical-operations), and then to the ERGA assemblies team (Responsible: Martin P.).
 
 New members also need to be added to the NAISS compute and storage allocations in SUPR 
-(Responsible: Henrik).
+(Responsible: Henrik / Mahesh).
 
 Life-cycle:
 ```{mermaid}
@@ -31,11 +42,11 @@ flowchart LR
 
 - Lead: Henrik (NBIS), Lucile (NBIS)
 - Sequencer: Ignas (NGI), Christian (NGI)
-- Assembler: Martin P. (NBIS), Mahesh (NBIS), André (NBIS), Guilherme (NBIS), Estelle (NBIS)
+- Assembler: Martin P. (NBIS), Mahesh (NBIS), André (NBIS), Guilherme (NBIS), Estelle (NBIS), Tomas (NBIS)
 - Annotator: Lucile (NBIS), André (NBIS), Guilherme (NBIS), Martin P. (NBIS)
-- Steward: Stephan (NBIS)
+- Steward: Stephan (NBIS), Yvonne (NBIS)
 - Analyst: André (NBIS), Guilherme (NBIS)
-- Developer: Mahesh (NBIS)
+- Developer: Mahesh (NBIS), Martin P. (NBIS)
 - Monitor: Mahesh (NBIS)
 
 ```{mermaid}
@@ -57,11 +68,12 @@ sequenceDiagram
 
 ### Who to talk to:
 
-- Add to Github organisation: FIXME
+- Add to Github organisation: #technical-operations
 - Add to Github team: Martin P.
-- Add to NAISS compute allocation: Henrik
-- Add to NAISS storage allocation: Henrik
+- Add to NAISS compute allocation: Henrik / Mahesh
+- Add to NAISS storage allocation: Henrik / Mahesh
 - How to use the template: Mahesh
 - Code review: Mahesh
 - Protocol review: Mahesh
 - Disk space issues: Entire team
+- Anything else: #vr-accessibility-ebp
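
Note on the data-linking step in `docs/gh-pages/launch.qmd`: in `find`, `-o` binds more loosely than the implicit "and" between a test and its action, so an `-exec` that follows two `-name` tests joined by `-o` only applies to both patterns when the tests are grouped with `\( ... \)`. The following is a minimal sketch of the grouped form, run in a scratch directory. The `workdir` variable, the `run1` subfolder, and the file names are hypothetical and only stand in for a real delivery.

```shell
# Hypothetical scratch layout mirroring data/deliveries and data/raw-data.
workdir=$(mktemp -d)
mkdir -p "$workdir/data/deliveries/run1" "$workdir/data/raw-data/PacBio-Hifi"
touch "$workdir/data/deliveries/run1/reads.bam" \
      "$workdir/data/deliveries/run1/reads.bam.pbi" \
      "$workdir/data/deliveries/run1/notes.txt"

cd "$workdir/data/raw-data/PacBio-Hifi"

# The escaped parentheses group both -name tests, so -exec runs for a match on either.
find ../../deliveries \( -name "*.bam" -o -name "*.pbi" \) -exec ln -s {} . \;

# List the symlinks that were created in the current folder.
ls -l
```

Because `find` is started from inside the target folder, the relative paths it prints are also valid symlink targets from that folder, which keeps the links working if the whole project directory is moved.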