Heart Disease Prediction
Personal Projects | Python, Machine Learning, Data Science
Overview
An evaluation of the accuracy of Logistic Regression, Decision Tree, and Random Forest classifiers on heart disease data.
Key Achievements
- Achieved 85.07% accuracy with both the Logistic Regression and Decision Tree classifiers
- Achieved 80% accuracy with the Random Forest classifier
- Demonstrated an understanding of the end-to-end machine learning pipeline: data preprocessing -> training -> tuning -> prediction -> evaluation
- Created multiple plots with matplotlib and seaborn, including correlation heatmaps
- Identified highly correlated column pairs (correlation > 0.6), the strongest being sysBP-diaBP (0.78)
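A small sketch of how highly correlated column pairs can be extracted from a correlation matrix with NumPy. The function name highly_correlated_pairs, the use of absolute Pearson correlation against the 0.6 threshold, and the synthetic data (labelled with the sysBP/diaBP names purely for illustration) are assumptions, not the project's actual code.

```python
import numpy as np


def highly_correlated_pairs(X, names, threshold=0.6):
    """Return (col_a, col_b, r) for column pairs with |Pearson r| > threshold,
    sorted from strongest to weakest. Hypothetical helper for illustration."""
    corr = np.corrcoef(X, rowvar=False)  # treat columns as variables
    pairs = []
    n = corr.shape[0]
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: skip self-pairs and duplicates
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Synthetic stand-in data (hypothetical): "diaBP" is built to track "sysBP"
rng = np.random.default_rng(42)
sys_bp = rng.normal(size=500)
dia_bp = sys_bp + 0.5 * rng.normal(size=500)
age = rng.normal(size=500)  # independent column
X = np.column_stack([sys_bp, dia_bp, age])

pairs = highly_correlated_pairs(X, ["sysBP", "diaBP", "age"])
for a, b, r in pairs:
    print(f"{a}-{b}: {r:.2f}")
```

The matrix returned by np.corrcoef can also be passed to seaborn.heatmap(corr, annot=True) to draw the kind of correlation heatmap described above.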
Implementation
- Data Preprocessing: drop irrelevant columns, split the data 80/20 into training and test sets, remove outliers to reduce inflation, standardize the columns from which outliers were removed using fit_transform, and fill null values in the test data with the most frequent values
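A minimal sketch of these preprocessing steps with scikit-learn, assuming a numeric feature matrix X (irrelevant columns already dropped) and binary labels y. The project does not say how outliers were detected, so the 1.5×IQR rule is an assumption, as is dropping training rows that contain nulls; the helper names remove_outliers_iqr and preprocess are hypothetical. For simplicity every column is scaled. The imputer and scaler are fit on the training data only and then applied to the test set, so no test-set statistics leak into preprocessing.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def remove_outliers_iqr(X, y, k=1.5):
    """Drop rows where any feature lies outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    mask = np.all((X >= q1 - k * iqr) & (X <= q3 + k * iqr), axis=1)
    return X[mask], y[mask]


def preprocess(X, y, seed=0):
    # 80/20 train-test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Assumption: training rows with missing values are simply dropped
    keep = ~np.isnan(X_train).any(axis=1)
    X_train, y_train = X_train[keep], y_train[keep]
    # Remove outliers from the training set only
    X_train, y_train = remove_outliers_iqr(X_train, y_train)
    # Fill test-set nulls with the most frequent training value in each column
    imputer = SimpleImputer(strategy="most_frequent").fit(X_train)
    X_test = imputer.transform(X_test)
    # Standardize: fit_transform on the training set, transform only on the test set
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test
```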
- Model Training: train Logistic Regression, Decision Tree (max_depth of 3), and Random Forest (3 estimators and k-nearest neighbour) classifiers; predict target values for the test set and determine accuracy by comparing the predictions with the actual values in testY
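A minimal sketch of the training and evaluation step with scikit-learn, using the hyperparameters listed above (max_depth=3 for the Decision Tree, n_estimators=3 for the Random Forest). Synthetic data from make_classification stands in for the heart disease dataset, so the printed accuracies will not match the figures reported for the project. The k-nearest neighbour detail is not modeled, since scikit-learn's RandomForestClassifier has no such parameter; evaluate_models is a hypothetical helper name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(X_train, X_test, y_train, y_test, seed=0):
    """Fit each classifier and return its accuracy on the test set."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=3, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)  # predict from the test features
        scores[name] = accuracy_score(y_test, y_pred)  # compare with actual labels
    return scores


# Synthetic stand-in for the heart disease data (hypothetical)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
for name, acc in evaluate_models(X_train, X_test, y_train, y_test).items():
    print(f"{name}: {acc:.2%}")
```

Accuracy here is simply the fraction of test rows whose predicted label equals the actual label, which is what accuracy_score computes.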
Technologies
Python, Seaborn, Matplotlib, NumPy, Scikit-learn, Logistic Regression, Decision Trees, Random Forest