Prior knowledge-guided tree-based models

Integration of prior domain knowledge into tree-based models.

Description

We developed a robust framework that improves tree-based models on high-dimensional, noisy data by integrating prior knowledge into feature selection, tree construction, and weighting, thereby combining data-driven insights with established domain understanding.
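To make the idea of prior knowledge-guided feature selection more concrete, the following is a minimal sketch, not the PkTree implementation. It assumes each feature has a positive prior knowledge score and draws a candidate feature subset with probability proportional to that score. The function name `sample_features`, the gene names, and the score values are hypothetical and chosen only for illustration.

```python
import random


def sample_features(prior_scores, k, seed=None):
    """Draw k distinct feature names, favouring features with higher prior scores.

    prior_scores: dict mapping feature name -> positive prior knowledge score
                  (hypothetical input format, assumed for this sketch).
    """
    if k > len(prior_scores):
        raise ValueError("k cannot exceed the number of features")
    rng = random.Random(seed)
    remaining = dict(prior_scores)  # copy so the caller's dict is left untouched
    chosen = []
    for _ in range(k):
        names = list(remaining)
        weights = [remaining[name] for name in names]
        # random.choices draws proportionally to the given weights
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # remove the pick: sampling without replacement
    return chosen


# Hypothetical scores: higher means stronger prior evidence for the feature
prior_scores = {"GENE_A": 0.9, "GENE_B": 0.6, "GENE_C": 0.1, "GENE_D": 0.05}
subset = sample_features(prior_scores, k=2, seed=0)
print(subset)
```

In a tree or forest, a subset drawn this way could replace the uniformly random candidate features usually considered at each split, so that well-supported features are evaluated more often while the others still have a chance of being selected.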

Application Use Case

We compared standard tree-based models with our proposed approaches on a cancer subtype prediction task based on gene expression data: the classification of Breast Invasive Carcinoma (BRCA) patients into their corresponding cancer subtypes.

We also performed two distinct sensitivity analyses to evaluate the impact of incorporating prior knowledge into tree-based models. For these analyses, we used a controlled dataset with limited correlation among the features, built from publicly available RNA-seq profiles of Kidney Renal Clear Cell Carcinoma patients from The Cancer Genome Atlas (TCGA) project. The preprocessed dataset is available here, along with the list of features considered in the controlled dataset.

For all datasets analysed, the data are preprocessed as described in the notebooks found here.

Implementation

We developed PkTree, a Python package that implements the proposed modifications to tree-based models. More information on the usage of the PkTree package is available here.

First, build a dedicated conda environment:

conda create -n env_pktree python=3.9 
conda activate env_pktree

Install the PkTree package:

pip install pktree

Lastly, install the required packages listed in requirements.txt (for example, with pip install -r requirements.txt).
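After the installation steps, a quick check from Python can confirm that the package is visible in the active environment. This is a small sketch using only the standard library; it assumes the import name matches the pip name `pktree`, which is not confirmed here, and the helper `is_installed` is a hypothetical name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# "pktree" as import name is an assumption based on the pip package name
print("pktree importable:", is_installed("pktree"))
```

If this prints False inside env_pktree, the environment may not be activated or the import name may differ from the package name.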

Prior domain knowledge

In these experiments, we used the score of biological knowledge described here and available here.
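As an illustration of how a per-feature knowledge score could influence tree construction, the sketch below scales the Gini impurity decrease of a candidate split by the score of the feature being split on. This is one possible scheme under stated assumptions, not necessarily the one used in these experiments; the function names, the toy expression values, the labels, and the score of 0.8 are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(values, labels, threshold):
    """Impurity decrease obtained by splitting on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - children


def prior_weighted_gain(values, labels, threshold, prior_score):
    """Scale the data-driven gain by the feature's prior score (assumed in [0, 1])."""
    return prior_score * gini_gain(values, labels, threshold)


# Toy data: expression values of one hypothetical gene and subtype labels
values = [1.0, 2.0, 3.0, 4.0]
labels = ["subtype_1", "subtype_1", "subtype_2", "subtype_2"]

plain = gini_gain(values, labels, threshold=2.5)
weighted = prior_weighted_gain(values, labels, threshold=2.5, prior_score=0.8)
print(plain, weighted)
```

With a multiplicative weighting like this, two features that separate the classes equally well are ranked by their prior score, while a feature with a low score can still be chosen if its data-driven gain is large enough.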

Additional Information

All the code is available here. To replicate the experiments, run the following scripts:

BRCA_dt.py for experiments with the Decision Tree model
BRCA_rf_parallel.py for experiments with the Random Forest model

The code to replicate the two sensitivity analyses is available here. All the results from the experiments we performed can be found here.

DEIB-GECO/prior_tree_models_repo

Folders and files

Latest commit

History

Repository files navigation

Prior knowledge-guided tree-based models

Description

Application Use Case

Implementation

Prior domain knowledge

Additional Information

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages