Merge pull request #26 from mahesh-panchal/update_sops
Update sops
LucileSol authored Dec 11, 2024
2 parents 53f8f50 + 6db600d commit b9c2f5e
Showing 7 changed files with 131 additions and 49 deletions.
13 changes: 13 additions & 0 deletions .gitignore
@@ -8,6 +8,10 @@ data/frozen/*/
nobackup/
nxf-work/

# Pipeline files
results/
*_cache/

# Common bioinformatic file formats
*.sam
*.bam
@@ -33,10 +37,18 @@ nxf-work/
*.ped
*.map

# Pixi files
.pixi/
*.egg-info

# Nextflow files
.nextflow*
work/

# nf-test files
.nf-test.log
.nf-test/

# Quarto files
.quarto/
_site/
@@ -45,5 +57,6 @@ _book/

# misc
.DS_Store
.screenrc
slurm*.out
slurm*.err
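
One way to confirm that the new ignore patterns behave as intended is `git check-ignore`, which exits with status 0 when a path is ignored. The sketch below is only an illustration: it builds a throwaway repository with a subset of the patterns added here, and the file and folder names (`sample_cache`, `src/keep.txt`, and so on) are made up for the test.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the check does not touch a real project.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .

# Subset of the patterns added to .gitignore in this commit.
cat > .gitignore <<'EOF'
results/
*_cache/
.pixi/
.nf-test.log
.nf-test/
.screenrc
EOF

# Hypothetical files: four that should be ignored, one that should not.
mkdir -p results sample_cache .pixi src
touch results/out.txt sample_cache/index .pixi/env .nf-test.log src/keep.txt

ignored=0
for path in results/out.txt sample_cache/index .pixi/env .nf-test.log; do
    # -q suppresses output; the exit status is 0 when the path is ignored.
    if git check-ignore -q "$path"; then
        ignored=$((ignored + 1))
    fi
done

# A path matching no pattern should not be reported as ignored.
if git check-ignore -q src/keep.txt; then kept=no; else kept=yes; fi
echo "ignored=$ignored kept=$kept"
```

Running `git check-ignore -v <path>` in a real project repository additionally prints which pattern matched, which helps when a file is unexpectedly ignored.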
23 changes: 11 additions & 12 deletions .gitpod.yml
@@ -1,14 +1,13 @@
image: nfcore/gitpod:latest
image: nfcore/gitpod:dev

tasks:
- name: Install Pixi
command: |
sudo chown gitpod -R /home/gitpod/
curl -fsSL https://pixi.sh/install.sh | bash
. /home/gitpod/.bashrc
vscode:
extensions: # based on nf-core.nf-core-extensionpack
- codezombiech.gitignore # Language support for .gitignore files
# - cssho.vscode-svgviewer # SVG viewer
- esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code
- eamodio.gitlens # Quickly glimpse into whom, why, and when a line or code block was changed
- EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files
- Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar
- mechatroner.rainbow-csv # Highlight columns in csv files in different colors
# - nextflow.nextflow # Nextflow syntax highlighting
- oderwat.indent-rainbow # Highlight indentation level
- streetsidesoftware.code-spell-checker # Spelling checker for source code
extensions:
- nf-core.nf-core-extensionpack
- quarto.quarto
2 changes: 2 additions & 0 deletions docs/gh-pages/_quarto.yml
@@ -26,6 +26,8 @@ format:
toc-depth: 2
number-depth: 2
theme: minty
mermaid:
theme: forest

bibliography: references.bib

54 changes: 37 additions & 17 deletions docs/gh-pages/index.qmd
@@ -1,15 +1,6 @@
# Protocols
# Running assembly projects

Here are the standard operating procedures to follow when performing a genome assembly,
annotation, and/or further analysis.

## Why do we need these protocols?

- To make data findable - (strict folder structure)
- Ease project tracking - (git)
- Reduce workload - (automation, code sharing)
- Reproducibility - (workflows, notebooks, git, documentation, containers, interoperability)
- Documentation - (reporting, summaries, issue tracking)
If you're new to these protocols, please see the [onboarding material](preface.qmd) first.

## Quick Start

@@ -22,27 +13,56 @@ annotation, and/or further analysis.
- `VREBP`: For VR-EBP projects
- `ERGA`: For ERGA projects
- `BGE`: For BGE projects
- `SMS`: For NBIS short term projects
- `SMS`: For NBIS user-fee projects
- `LTS`: For NBIS peer-review projects
- `<species>`: Species name
- `<year>`: Year project started
- `<short_description>`: Short project description.
5. Ensure repository is private, then click Create repository.
- Clone it into the NAISS Storage project.
- Clone it into the NAISS Storage project or your folder on NAC.

```{.bash}
cd /proj/snic2021-6-194
cd <project allocation>
git clone git@github.com:NBISweden/<repo>.git
```
- Update README in the repository with project details.
- Add references to references.bib of important information.
- Copy NGI deliveries to data folder.
- Copy NGI deliveries to data folder (see [launch page](launch.qmd)).
- Link relevant raw data in `data/raw-data`.
- Update `assembly_parameters.yml` to point to files in `data/raw-data`.
- Run analyses, activating any necessary compute environments.
- Run analyses (`./run_nextflow.sh`)
- Refer to the other pages here for more in-depth descriptions of the protocols.

The template provides an organised folder structure, and skeleton files to quickly
start analyzing.

Analyses are primarily run on Uppmax. Github is used as the primary repository, and
Analyses are primarily run on Uppmax or PDC. Github is used as the primary repository, and
analysis files should be tracked and pushed regularly.

## Running a test assembly analysis

Follow the steps above to make a repository for a test species. If you would like to use real data
then feel free to use [Laetiporus sulphureus (Chicken of the Woods)](https://portal.darwintreeoflife.org/data/root/details/Laetiporus%20sulphureus).

From the Data tab, download the bam file for PacBio HiFi into the deliveries folder:

```{.bash}
wget ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR680/ERR6808041/m64229e_210602_121910.ccs.bc1020_BAK8B_OA--bc1020_BAK8B_OA.bam
```

and the FastQ files for HiC (Arima v2) into the deliveries folder:

```{.bash}
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR668/000/ERR6688740/ERR6688740_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR668/000/ERR6688740/ERR6688740_2.fastq.gz
```

Symlink the files into appropriate folders under `raw-data`.

Then edit the `assembly_parameters.yml` to point to the data linked under `raw-data`, using
the bash snippets in the `assembly_parameters.yml` to help you write the input file.

Update the `workflow_parameters.yml` and change the `mitohifi.code` parameter to 4
(see [NCBI Taxonomy Browser](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=5630)).

Finally, open a `screen` session and then run the launch script (`./run_nextflow.sh`).
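
A long-running launch is easy to start outside `screen` by mistake. As a minimal sketch, the helper below refuses to run a command unless it is inside a session. It relies on `screen` exporting the `STY` variable within a session; the function names `in_screen` and `launch_in_screen` are hypothetical and not part of the template.

```bash
#!/usr/bin/env bash

# screen exports STY inside a session; unset or empty means we are outside one.
in_screen() {
    [ -n "${STY:-}" ]
}

# Hypothetical wrapper: run the given command only from inside a screen session.
launch_in_screen() {
    if in_screen; then
        "$@"
    else
        echo "Not inside a screen session; start one with: screen -S <name>" >&2
        return 1
    fi
}

# Usage, from the analysis folder:
#   launch_in_screen ./run_nextflow.sh
```

After detaching with `Ctrl-a d`, the session can be resumed with `screen -r <name>`.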
5 changes: 1 addition & 4 deletions docs/gh-pages/initialize.qmd
@@ -6,8 +6,5 @@ What happens when a new species is to be assembled?

- [ ] Make a private GitHub repository from the [Assembly template](https://github.com/NBISweden/assembly-project-template).
- [ ] Fill in the README.
- [ ] Make a Project Task List (Click "Issues" on the GitHub repository > Select "Get Started" next to 'Project Task List').
- [ ] Make an issue for achieved standards (Click "Issues" on the GitHub repository > Select "Get Started" next to 'Achieved standards').
- [ ] Assign yourself to both issues.
- [ ] Add these issues to the EBP GitHub Project board.
- [ ] Update references.bib with relevant references.
- [ ] Add project details to the project spreadsheet linked in #vr-accessibility-ebp.
49 changes: 44 additions & 5 deletions docs/gh-pages/launch.qmd
@@ -40,25 +40,64 @@ data/
- `data/frozen` contains symlinks to folders in `data/outputs` which are stage end-points, e.g. the raw-reads have been processed
in various ways, and after looking at QC controls, one folder is selected to be used for assembly. This is symlinked in frozen.

1. Make a translation table in data/raw-data linking the NGI delivery files to the data we're going to use. Need a way to mark bad data.
### Link data between folders

Data in `data/raw-data/*` should be symlinked from `data/deliveries/**`.

E.g.,

```{.bash}
cd data/raw-data/PacBio-Hifi
find ../../deliveries \( -name "*.bam" -o -name "*.pbi" \) -exec ln -s {} . \;
```

and

```{.bash}
cd data/raw-data/Illumina-HiC
find ../../deliveries -name "*.fastq.gz" -exec ln -s {} . \;
```
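
When several `-name` tests are joined with `-o`, `find` attaches a trailing `-exec` only to the last alternative unless the tests are grouped with `\( ... \)`. The sketch below shows the grouped form in a scratch directory and counts the links that resolve. The layout mirrors `data/deliveries` and `data/raw-data`, but the run folder and read file names are invented for the example.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Scratch layout mimicking the project folders (file names are hypothetical).
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
mkdir -p "$scratch/data/deliveries/run1" "$scratch/data/raw-data/PacBio-Hifi"
touch "$scratch/data/deliveries/run1/reads.bam" \
      "$scratch/data/deliveries/run1/reads.bam.pbi"

cd "$scratch/data/raw-data/PacBio-Hifi"
# Group the alternatives so -exec applies to both *.bam and *.pbi matches.
find ../../deliveries \( -name "*.bam" -o -name "*.pbi" \) -exec ln -s {} . \;

# Count entries that are symlinks and whose targets exist.
linked=0
for f in *; do
    if [ -L "$f" ] && [ -e "$f" ]; then
        linked=$((linked + 1))
    fi
done
echo "linked=$linked"
```

Because the links are relative, they keep resolving if the whole project folder is moved, provided `data/deliveries` and `data/raw-data` keep their relative positions.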

### Assemble sequence data

```{.bash}
cd analyses/01_ebp-assembly-workflow
```

1. Update the `assembly_parameters.yml` with the paths to input files. Check the YML for a bash
snippet to fill out the section.
2. Update the `workflow_parameters.yml` with any extra workflow parameters. In particular,
check anything marked as TODO, e.g., selecting the mitochondrial code table to use.
3. Run the workflow.

```{.bash}
./run_nextflow.sh
```

::: {.callout-note}
The workflow above only runs until Hi-C mapping. Steps for manual curation onwards are still
being implemented.
:::

### Annotate assemblies

No protocols as of yet

### Perform downstream analyses

No protocols as of yet

### Integrate new analyses

2. Need a protocol to integrate custom scripts into template while it's not
integrated into the workflow.
Custom analyses might be needed. In these cases please make use of the other project folders,
and do your best to version control all the steps.

- Put custom code in `code/scripts`, `code/snakemake`, `code/nextflow`, and launch scripts under `code/launch_templates`.
- Put custom code in `code/scripts`, `code/snakemake`, and `code/nextflow`.
- Make sure the code uses containers or conda environments to package the software environment.
- Make an issue on the template to integrate the code into the template so that it's shareable until it's integrated into
a workflow.
- Make an issue on the relevant workflow to integrate the tools.

### Troubleshoot

3. Need a protocol for troubleshooting. Who to ask
If you encounter any issues with using these protocols please ask on #vr-accessibility-ebp.
34 changes: 23 additions & 11 deletions docs/gh-pages/preface.qmd
@@ -1,16 +1,27 @@
# Onboarding {.unnumbered}

Here you can find instructions on how to run assembly projects
for the VR-EBP, ERGA, and BGE projects.
Here you can find instructions on how to run assembly projects for the VR-EBP, ERGA, and BGE
projects.

To ensure consistent, reproducible, and efficient genome assembly and analysis projects, we've
established these standard operating procedures (SOPs). By following these guidelines, we aim to
optimize our workflows, streamline data management, and facilitate collaboration.

## Why do we need these protocols?

- To make data findable - (strict folder structure)
- Ease project tracking - (git)
- Reduce workload - (automation, code sharing)
- Reproducibility - (workflows, notebooks, git, documentation, containers, interoperability)
- Documentation - (reporting, summaries, issue tracking)

## Getting started

A Github account is needed. A new member needs to be added to the NBISweden Github organisation
(Responsible: FIXME), and then to the ERGA assemblies team (Responsible: Martin P.) to access
this webpage and template.
(ask on #technical-operations), and then to the ERGA assemblies team (Responsible: Martin P.).

New members also need to be added to the NAISS compute and storage allocations in SUPR
(Responsible: Henrik).
(Responsible: Henrik / Mahesh).

Life-cycle:
```{mermaid}
@@ -31,11 +42,11 @@ flowchart LR

- Lead: Henrik (NBIS), Lucile (NBIS)
- Sequencer: Ignas (NGI), Christian (NGI)
- Assembler: Martin P. (NBIS), Mahesh (NBIS), André (NBIS), Guilherme (NBIS), Estelle (NBIS)
- Assembler: Martin P. (NBIS), Mahesh (NBIS), André (NBIS), Guilherme (NBIS), Estelle (NBIS), Tomas (NBIS)
- Annotator: Lucile (NBIS), André (NBIS), Guilherme (NBIS), Martin P. (NBIS)
- Steward: Stephan (NBIS)
- Steward: Stephan (NBIS), Yvonne (NBIS)
- Analyst: André (NBIS), Guilherme (NBIS)
- Developer: Mahesh (NBIS)
- Developer: Mahesh (NBIS), Martin P. (NBIS)
- Monitor: Mahesh (NBIS)

```{mermaid}
@@ -57,11 +68,12 @@ sequenceDiagram

### Who to talk to:

- Add to Github organisation: FIXME
- Add to Github organisation: #technical-operations
- Add to Github team: Martin P.
- Add to NAISS compute allocation: Henrik
- Add to NAISS storage allocation: Henrik
- Add to NAISS compute allocation: Henrik / Mahesh
- Add to NAISS storage allocation: Henrik / Mahesh
- How to use the template: Mahesh
- Code review: Mahesh
- Protocol review: Mahesh
- Disk space issues: Entire team
- Anything else: #vr-accessibility-ebp
