diff --git a/docs/getting_started.md b/docs/getting_started.md
index 0289e9f..ba37d8f 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -2,18 +2,17 @@
 
 ## `dso init` -- Initialize a project
 
-`dso init` initializes a new project in your current directory.
+`dso init` initializes a new project in your current directory. In the context of DSO, a project is a structured environment where data science workflows are organized and managed.
 
-```{command-output} dso init test_project --description "This is a test project"
+To initialize a project, use the following command:
+```bash
+# Initialize a project called "test_project"
+dso init test_project --description "This is a test project"
 
 ```
 
-It creates the root directory of your project with all the necessary configuration files for `git`, `dvc`, `uv` and
-`dso` itself:
-
-```{command-output} ls -a test_project
+It creates the root directory of your project with all the necessary configuration files for `git`, `dvc`, `uv` and `dso` itself.
 
-```
 
 ## `dso create` -- Add folders or stages to your project
 
@@ -46,8 +45,58 @@ stage
 |-- report # contains HTML Report generated by Analysis Scripts
 ```
 
-## Writing configuration files
+## Configuration files
+
+The config files in a _project_, _folder_, or _stage_ are the cornerstone of any reproducible analysis, serving as a single point of truth. Additionally, using config files reduces the time needed to make _project_/_folder_-wide changes.
+
+Config files are designed to contain all necessary parameters, input, and output files that should be consistent across the analyses. For this purpose, configurations can be defined at each level of your project in a `params.in.yaml` file. These configurations are then transferred into the `params.yaml` files when running `dso compile-config`.
+
+A `params.yaml` file consolidates configurations from `params.in.yaml` files located in its parent directories, as well as from the `params.in.yaml` file in its own directory. For your analysis, reading the `params.yaml` of the respective stage then gives you access to all the configurations.
+
+The following diagram displays the inheritance of configurations:
+
+```{eval-rst}
+.. image:: ../img/dso-yaml-inherit.png
+    :width: 60%
+```
+
+### Writing configuration files
 
+To define your configurations in the `params.in.yaml` files, please adhere to the YAML syntax. Due to the implemented configuration inheritance, relative paths need to be resolved within each __folder__ or __stage__. Therefore, relative paths need to be specified with `!path`.
+
+An example `params.in.yaml` can look as follows:
+
+```yaml
+thresholds:
+  fc: 2
+  p_value: 0.05
+  p_adjusted: 0.1
+
+samplesheet: !path "01_preprocessing/input/samplesheet.txt"
+
+metadata_file: !path "metadata/metadata.csv"
+
+file_with_abs_path: "/data/home/user/typical_analysis_data_set.csv"
+
+remove_outliers: true
+
+exclude_samples:
+  - sample_1
+  - sample_2
+  - sample_6
+  - sample_42
+```
+
+### Compiling `params.yaml` files
+
+All `params.yaml` files are automatically generated using:
+
+```bash
+dso compile-config
+```
+
+### Overwriting Parameters
+When multiple `params.in.yaml` files (such as those at the project, folder, or stage level) contain the same configuration, the value specified at the more specific level (e.g., stage) takes precedence over the value set at the broader level (e.g., project). This makes the analysis adaptable and enhances modifiability across the project.
 ## Implementing a stage
 
 ### R
diff --git a/img/dso-yaml-inherit.png b/img/dso-yaml-inherit.png
new file mode 100644
index 0000000..430aa29
Binary files /dev/null and b/img/dso-yaml-inherit.png differ
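
Note on the "Overwriting Parameters" section added in this diff: the precedence rule (a more specific level wins over a broader one) can be illustrated with a small Python sketch. This is not how `dso compile-config` is implemented. The helper name `merge_configs`, the key-by-key merging of nested mappings, and the sample values are all assumptions made purely for illustration.

```python
from copy import deepcopy


def merge_configs(*levels):
    """Merge config dicts ordered from broadest to most specific.

    Hypothetical helper: later (more specific) levels override earlier ones.
    Nested dicts are merged key by key, which is an assumption and may
    differ from what dso actually does.
    """
    merged = {}
    for level in levels:
        for key, value in level.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are mappings: merge them recursively.
                merged[key] = merge_configs(merged[key], value)
            else:
                # Otherwise the more specific value replaces the broader one.
                merged[key] = deepcopy(value)
    return merged


# Illustrative values only, loosely based on the example params.in.yaml.
project = {"thresholds": {"fc": 2, "p_value": 0.05}, "remove_outliers": True}
folder = {"thresholds": {"p_value": 0.01}}
stage = {"remove_outliers": False}

params = merge_configs(project, folder, stage)
print(params)
```

In this sketch the stage-level `remove_outliers` and the folder-level `p_value` replace the project-level values, while `fc` is inherited unchanged from the project level.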