Titanic Survival Prediction
Personal Projects #Data Science #Python
Overview
A machine learning project that predicts Titanic passenger survival using a Random Forest classifier, demonstrating feature engineering, data cleaning, and model evaluation techniques on the classic Kaggle dataset.
Key Achievements
- Achieved 83.8% prediction accuracy on test data
- Demonstrated that feature engineering (binning Age and Fare, encoding categorical columns) affects model performance
Implementation
Feature Engineering
- Dropped the features treated as non-predictive: Cabin, Name, and Ticket
- Handled missing values by filling Embarked nulls with the mode ("S") and Age nulls with category-specific modes
- Created two derived features: AgeGroup and FareBand
- Encoded the categorical columns Sex and Embarked as numerical values
| Original | Engineered | Reasoning |
|---|---|---|
| Age | Binned into AgeGroup | Better predictability from categorical groups |
| Fare | Quartile-based FareBand | Reduces the influence of outliers |
| Sex, Embarked | Numerical encoding | Needed for model input |
| Cabin, Name, Ticket | Dropped | No predictive value |
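The steps in the table can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual code: the `toy` frame is made-up data, the age bin edges and the integer codes are assumptions, and grouping by Sex for the Age fill is only a guess at what "category-specific" means here. Dropping the raw Age and Fare columns after binning is likewise an assumption.

```python
import pandas as pd


def fill_with_group_mode(values: pd.Series, fallback: float) -> pd.Series:
    """Fill nulls with the most frequent value, or a fallback if none exists."""
    modes = values.mode()
    fill_value = modes.iloc[0] if not modes.empty else fallback
    return values.fillna(fill_value)


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    # Drop the columns the write-up treats as non-predictive.
    out = df.drop(columns=["Cabin", "Name", "Ticket"]).copy()

    # Embarked: fill nulls with "S", the mode reported in the write-up.
    out["Embarked"] = out["Embarked"].fillna("S")

    # Age: fill nulls with the mode of each group (ASSUMPTION: grouped by Sex).
    overall_mode = out["Age"].mode().iloc[0]
    out["Age"] = out.groupby("Sex")["Age"].transform(
        lambda ages: fill_with_group_mode(ages, overall_mode)
    )

    # AgeGroup: fixed-width bins (ASSUMPTION: these edges are illustrative).
    out["AgeGroup"] = pd.cut(
        out["Age"], bins=[0, 12, 18, 35, 60, 120], labels=False, include_lowest=True
    )

    # FareBand: quartile-based bins, coded 0 (cheapest) to 3 (most expensive).
    out["FareBand"] = pd.qcut(out["Fare"], q=4, labels=False, duplicates="drop")

    # Numerical encoding (ASSUMPTION: the specific integer codes are arbitrary).
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out["Embarked"] = out["Embarked"].map({"S": 0, "C": 1, "Q": 2})

    return out.drop(columns=["Age", "Fare"])


# Made-up rows that mimic the Kaggle column layout; not real passenger data.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "Pclass": [3, 1, 3, 3, 2, 1, 3, 2],
    "Name": [f"Passenger {c}" for c in "ABCDEFGH"],
    "Sex": ["male", "female", "female", "male", "female", "male", "male", "female"],
    "Age": [22.0, 38.0, None, 35.0, 4.0, 54.0, None, 27.0],
    "Fare": [7.25, 71.28, 7.92, 8.05, 16.70, 51.86, 21.07, 11.13],
    "Ticket": [f"T{i}" for i in range(8)],
    "Cabin": [None, "C85", None, None, None, "E46", None, None],
    "Embarked": ["S", "C", "S", None, "S", "S", "Q", "S"],
})

features = engineer_features(toy)
print(features)
```

Integer codes keep the binned features ordinal, which tree-based models such as Random Forest can split on directly; one-hot encoding would be a reasonable alternative for Embarked, which has no natural order.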
Model Training
- 80/20 train/validation split
- Trained Random Forest classifier on engineered features
- Evaluated accuracy on the unseen test set
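A training loop along these lines might look like the sketch below, using scikit-learn. The feature matrix is randomly generated stand-in data with the engineered column names, the labels follow an invented rule with added noise, and the hyperparameters (`n_estimators=100`, `random_state=42`) are assumptions, so the accuracy it prints says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features (NOT the Kaggle data).
rng = np.random.default_rng(0)
n_rows = 200
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n_rows),
    "Sex": rng.integers(0, 2, n_rows),
    "AgeGroup": rng.integers(0, 5, n_rows),
    "FareBand": rng.integers(0, 4, n_rows),
    "Embarked": rng.integers(0, 3, n_rows),
})

# Invented labelling rule with 10% of labels flipped, for illustration only.
base = ((X["Sex"] == 1) | (X["Pclass"] == 1)).astype(int).to_numpy()
flip = rng.random(n_rows) < 0.1
y = pd.Series(np.where(flip, 1 - base, base), name="Survived")

# 80/20 train/validation split, as described above.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the Random Forest on the training portion only.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Score on the held-out validation rows.
val_pred = model.predict(X_val)
val_accuracy = accuracy_score(y_val, val_pred)
print(f"Validation accuracy: {val_accuracy:.3f}")
```

Fixing `random_state` in both the split and the model makes the run reproducible, which helps when comparing feature-engineering variants against each other.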
Survival Analysis
Overall survival distribution in training data
Survival disparity based on sex
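The numbers behind these two views could be computed with a couple of pandas aggregations, as in this small sketch. The `toy` rows are made up for illustration, so the resulting rates do not reflect the actual training data.

```python
import pandas as pd

# Made-up rows with the two columns the charts rely on; not real passenger data.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 0, 1, 1],
    "Sex": ["male", "female", "female", "male", "female", "male", "male", "female"],
})

# Overall survival distribution: share of rows with Survived == 0 and == 1.
overall = toy["Survived"].value_counts(normalize=True).sort_index()

# Survival rate per sex: the mean of a 0/1 column is the fraction that survived.
by_sex = toy.groupby("Sex")["Survived"].mean()

print(overall)
print(by_sex)
```

With Matplotlib installed, calling `overall.plot(kind="bar")` or `by_sex.plot(kind="bar")` would turn either series into a bar chart.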
Technologies
Python, Pandas, scikit-learn, Random Forest, Matplotlib