backbone_pretrained_models
external_repo
models
notebooks
utils
Readme.md
__init__.py
hpo_config.yaml
knife.yaml
main.py
main_downstream.py
main_downstream_DTI.py
precompute_3d.py
preprocess.py
preprocess_MoleOOD.py
requirements.txt

Readme.md

💊 Model comparison for molecular data

This directory contains the code to compare models on molecular data. The code is written in Python and uses the RDKit/Datamol libraries to handle molecular data. The datasets are imported from the Therapeutic Data Commons (TDC) platform.

📋 Installation

To install the required packages, you need an environment with "pytorch", "pytorch-geometric" and "torch-scatter" already installed. Then run the following commands:

pip install -r requirements.txt

cd external_repo/pre-training-via-denoising
pip install -e .
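Before running the commands above, it can help to confirm that the prerequisites are importable in the active environment. The snippet below is a minimal sketch and not part of this repository: the function name `check_prerequisites` is hypothetical, and the import names (`torch`, `torch_geometric`, `torch_scatter`) are assumed to be the modules provided by the three packages listed above.

```python
import importlib.util

# Assumed import names for the prerequisites named in this README:
# "pytorch" -> torch, "pytorch-geometric" -> torch_geometric,
# "torch-scatter" -> torch_scatter.
PREREQUISITES = ("torch", "torch_geometric", "torch_scatter")


def check_prerequisites(modules=PREREQUISITES):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Print one status line per prerequisite without importing the packages.
    for name, found in check_prerequisites().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any module reported as MISSING should be installed before running pip install -r requirements.txt.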

📁 Data Preprocessing

Several descriptors rely on a 3-dimensional representation of the molecules. To generate these representations and preprocess the datasets, use the script precompute_.py.

python precompute_molf_descriptors.py --dataset <dataset_name> --descriptors <[optional] molecular descriptors to compute>

📈 Information Sufficiency

To evaluate the Information Sufficiency, you can use the following command:

python main.py --dataset <dataset_name> --X <models_names> --Y <models_names> --out-dir <output_dir>

This command will train the models specified in the --X and --Y arguments on the dataset <dataset_name>, and save the results in the directory <output_dir>. Once the mutual information is computed, the results can be visualized using the notebooks in the notebooks directory.
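For readers unfamiliar with the quantity involved, the sketch below computes mutual information for paired discrete samples with a plug-in estimator. It is only a toy illustration: it is not the estimator used by main.py, and the function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    # Y is a copy of X: the estimate equals the entropy of X (log 2 nats).
    print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))
    # X and Y are independent in this sample: the estimate is 0.
    print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

A high value means that knowing one variable tells you a lot about the other, which is the intuition behind comparing what one model's representation reveals about another's.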

🦠 Downstream tasks

The various models considered can be trained on downstream tasks using the script main_downstream.py.

python main_downstream.py --dataset <dataset_name> --embedders <models_names> --n-runs <number_of_runs> --split-method <random/scaffold> --hidden-dim <hidden_dim> --n-layers <n_layers> --n-epochs <n_epochs> --d-rate <dropout_rate> --lr <learning_rate> --batch-size <batch_size>

This command will train the models on the single instance tasks using the dataset <dataset_name>, and save the results in the directory <output_dir>.
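Because the command above takes many flags, it can be convenient to keep the hyperparameters in one place and generate the argument list from them. The following is a minimal sketch under stated assumptions: the class `DownstreamConfig` is hypothetical, every default value is an illustrative placeholder rather than the script's own default, and --embedders is assumed to accept several space-separated names. Only the flag names come from the command above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DownstreamConfig:
    """Hypothetical container for the flags of main_downstream.py."""

    dataset: str
    embedders: List[str]
    # The defaults below are placeholders chosen for illustration only.
    n_runs: int = 5
    split_method: str = "random"  # the command lists "random" and "scaffold"
    hidden_dim: int = 128
    n_layers: int = 2
    n_epochs: int = 100
    d_rate: float = 0.1
    lr: float = 1e-3
    batch_size: int = 64

    def to_argv(self) -> List[str]:
        """Build the argument list for the command shown above."""
        if self.split_method not in ("random", "scaffold"):
            raise ValueError(f"unknown split method: {self.split_method!r}")
        return [
            "python", "main_downstream.py",
            "--dataset", self.dataset,
            # Assumption: the script accepts several names after --embedders.
            "--embedders", *self.embedders,
            "--n-runs", str(self.n_runs),
            "--split-method", self.split_method,
            "--hidden-dim", str(self.hidden_dim),
            "--n-layers", str(self.n_layers),
            "--n-epochs", str(self.n_epochs),
            "--d-rate", str(self.d_rate),
            "--lr", str(self.lr),
            "--batch-size", str(self.batch_size),
        ]


if __name__ == "__main__":
    # "my_dataset", "model_a" and "model_b" are placeholder names.
    cfg = DownstreamConfig(dataset="my_dataset", embedders=["model_a", "model_b"])
    print(" ".join(cfg.to_argv()))
```

The resulting list can be passed to subprocess.run(cfg.to_argv(), check=True) from the directory that contains main_downstream.py.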

For DTI tasks, the script main_downstream_DTI.py can be used.

python main_downstream_DTI.py --dataset <dataset_name> --embedders <models_names>
