This repository contains code and analysis for detecting cancer using various machine learning algorithms. We compare the performance of logistic regression, decision tree, and random forest models. Additionally, we include preprocessing steps to prepare the data for modeling.
The data is included in the repository (data.csv). The dataset contains information on various risk factors associated with breast cancer.
- data.csv: Contains the dataset.
- cancer_clissifier.ipynb: Jupyter notebook for data exploration, preprocessing, and model training.
- README.md: This file.
- Feature Selection: We remove features that are irrelevant to the labels.
- Feature Scaling: We standardize the features so that they all share a consistent scale.
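The two preprocessing steps above can be sketched in plain Python. This is only an illustration of the arithmetic, not the notebook's actual code: the column names (`id`, `radius`, `texture`) and the helper functions `drop_columns` and `standardize` are hypothetical, and the real columns in data.csv may differ.

```python
from statistics import mean, pstdev


def drop_columns(rows, unwanted):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


def standardize(rows):
    """Rescale every column to zero mean and unit variance (z-score)."""
    columns = list(rows[0].keys())
    stats = {}
    for col in columns:
        values = [row[col] for row in rows]
        sigma = pstdev(values)  # population standard deviation
        # Guard against constant columns to avoid dividing by zero.
        stats[col] = (mean(values), sigma if sigma > 0 else 1.0)
    return [
        {col: (row[col] - stats[col][0]) / stats[col][1] for col in columns}
        for row in rows
    ]


# Hypothetical toy rows standing in for data.csv.
rows = [
    {"id": 101, "radius": 10.0, "texture": 20.0},
    {"id": 102, "radius": 14.0, "texture": 22.0},
    {"id": 103, "radius": 18.0, "texture": 24.0},
]

# Assumption: an identifier column carries no information about the label.
features = drop_columns(rows, unwanted={"id"})
scaled = standardize(features)
```

In a real train/test workflow, the mean and standard deviation would normally be computed on the training split only and then reused for the test split, so that no information leaks from the test data.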
We train the following machine learning models:
- Logistic Regression: A linear model that predicts the probability of an instance belonging to a particular class.
- Decision Tree: A tree-based model that splits the data based on feature thresholds to make predictions.
- Random Forest: An ensemble of decision trees that combines their predictions for improved accuracy.
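As a rough sketch of how these three models might be trained and compared with scikit-learn, consider the example below. It assumes scikit-learn is installed and uses a synthetic dataset from `make_classification` as a stand-in for data.csv. The split ratio, hyperparameters, and random seeds are illustrative assumptions, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption: stands in for data.csv).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Train each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    scores[name] = model.score(X_test_s, y_test)

for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.3f}")
```

Standardization mainly matters for logistic regression here; tree-based models are insensitive to feature scale, but sharing one preprocessed input keeps the comparison simple.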
We evaluate the models using the following metrics:
- Accuracy: Overall correctness of predictions.
- Recall: Ability to correctly identify positive cases (cancer).
- Precision: Proportion of true positive predictions among all positive predictions.
- F1-score: Harmonic mean of precision and recall.
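To make the four definitions concrete, here is a minimal sketch that computes them from the confusion counts. The function name `classification_metrics` and the toy labels are hypothetical. Label 1 is assumed to mean "cancer".

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no positive predictions or cases).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels: 4 positive cases, 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these toy labels there are 3 true positives, 1 false negative, 1 false positive, and 5 true negatives, so accuracy is 0.8 while precision, recall, and F1 are all 0.75. In a screening setting, recall is often the metric to watch, since a missed positive case (false negative) is usually costlier than a false alarm.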
There are two ways that you can work with this project:
- Clone this repository:

  git clone https://github.com/barzansaeedpour/cancer-detection.git

Navigate to the appropriate directory and run the Jupyter notebooks or Python scripts to explore the data, preprocess it, and train the models.