Heart Disease Prediction
Personal Projects #Python #Machine Learning #Data Science

Overview

An evaluation of Logistic Regression, Decision Tree, and Random Forest model accuracy on Heart Disease data.

View the project on GitHub

Key Achievements

  • Achieved 85.07% accuracy with both the Logistic Regression and Decision Tree classifiers
  • Achieved 80% accuracy with the Random Forest classifier
  • Demonstrated understanding of the machine learning model pipeline: data preprocessing -> training -> tuning -> prediction -> evaluation
  • Created multiple plots with Matplotlib and Seaborn, including correlation heatmaps
  • Identified highly correlated (>0.6) column pairs, the strongest being sysBP and diaBP (0.78)
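
The correlation check in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the project's code: the helper name `highly_correlated_pairs`, the use of absolute correlation, and the synthetic data (which only borrows the sysBP and diaBP column names) are all assumptions.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df, threshold=0.6):
    """Return (col_a, col_b, r) for column pairs whose |r| exceeds threshold.

    Hypothetical helper; pairs are sorted with the strongest correlation first.
    """
    corr = df.select_dtypes("number").corr()
    cols = list(corr.columns)
    pairs = []
    # Walk the upper triangle only, so each pair is reported once.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = float(corr.iloc[i, j])
            if abs(r) > threshold:
                pairs.append((cols[i], cols[j], r))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Synthetic stand-in data (assumption: not the project's dataset).
rng = np.random.default_rng(0)
sys_bp = rng.normal(130, 20, 500)
demo = pd.DataFrame({
    "sysBP": sys_bp,
    "diaBP": 0.5 * sys_bp + rng.normal(0, 6, 500),  # built to track sysBP
    "age": rng.normal(50, 9, 500),                  # independent of the others
})

pairs = highly_correlated_pairs(demo)
for col_a, col_b, r in pairs:
    print(f"{col_a} - {col_b}: {r:.2f}")
```

The same correlation matrix (`demo.corr()`) could be passed to `seaborn.heatmap` to draw a heatmap like the ones mentioned above.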

Implementation

  • Data Preprocessing: Drop irrelevant columns, split the data 80/20 into training and test sets, remove outliers to reduce inflation, standardize the outlier-free columns with fit_transform, and fill null values in the test data with the most frequent values
  • Model Training: Train Logistic Regression, Decision Tree (with max_depth of 3), and Random Forest (with 3 estimators and k-nearest neighbour) classifiers, predict target values for the test set, and determine accuracy by comparing the predictions with the actual values in testY
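
The two steps above might look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's actual code: the data is synthetic, the outlier rule (more than 3 standard deviations from the mean) is assumed, and the k-nearest neighbour part is left out because the description does not say how it was combined with the Random Forest. Variable names such as `trainX` and `testX` are chosen to match the `testY` mentioned above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart disease data (assumption).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           random_state=0)

# 80/20 train-test split.
trainX, testX, trainY, testY = train_test_split(X, y, test_size=0.2,
                                                random_state=0)

# Drop training rows where any feature lies 3 or more standard deviations
# from its mean (assumed outlier rule; the project's rule is not stated).
z = np.abs((trainX - trainX.mean(axis=0)) / trainX.std(axis=0))
keep = (z < 3).all(axis=1)
trainX, trainY = trainX[keep], trainY[keep]

# Simulate a few missing test values, then fill them with the most
# frequent training value of each column.
testX = testX.copy()
testX[::25, 0] = np.nan
imputer = SimpleImputer(strategy="most_frequent").fit(trainX)
testX = imputer.transform(testX)

# Standardize: fit the scaler on the training data, reuse it on the test data.
scaler = StandardScaler()
trainX = scaler.fit_transform(trainX)
testX = scaler.transform(testX)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=3, random_state=0),
}

# Train each model, predict from the test features, compare with testY.
accuracies = {}
for name, model in models.items():
    model.fit(trainX, trainY)
    predictions = model.predict(testX)
    accuracies[name] = accuracy_score(testY, predictions)

for name, acc in accuracies.items():
    print(f"{name}: {acc:.2%}")
```

Fitting the imputer and scaler on the training data only, then applying them to the test data, keeps information from the test set out of preprocessing. The accuracies printed here come from synthetic data and are not comparable to the figures reported above.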

Technologies

Python, Seaborn, Matplotlib, NumPy, Scikit-learn, Logistic Regression, Decision Trees, Random Forest
