Titanic Survival Prediction
Personal Projects #Data Science #Python

Overview

A machine learning project that predicts Titanic passenger survival using a Random Forest classifier, demonstrating feature engineering, data cleaning, and model evaluation techniques on the classic Kaggle dataset.

Key Achievements

  • Achieved 83.8% prediction accuracy on test data
  • Demonstrated that feature engineering (binning Age and Fare, encoding Sex and Embarked) affects model performance

Implementation

Feature Engineering

  • Dropped non-predictive features Cabin, Name, and Ticket
  • Handled missing values by filling Embarked nulls with mode ‘S’ and Age with category-specific modes
  • Created derived features AgeGroup and FareBand
  • Encoded categorical data Sex and Embarked to numerical values

Original             | Engineered               | Reasoning
Age                  | Binned into AgeGroup     | Better predictability for categorical groups
Fare                 | Quartile-based FareBand  | Reduces the effect of outliers
Sex, Embarked        | Numerical encoding       | Needed for model input
Cabin, Name, Ticket  | Dropped                  | No predictive value
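
The steps above could be sketched in Pandas roughly as follows. This is a minimal illustration, not the project's actual code: the function name `engineer_features`, the age bin edges, the integer codes for Sex and Embarked, and the reading of "category-specific" as "per Pclass and Sex group" are all assumptions made for the example.

```python
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the cleaning steps to a Titanic-style frame."""
    # Drop the columns the project treats as non-predictive.
    out = df.drop(columns=["Cabin", "Name", "Ticket"]).copy()

    # Fill missing ports of embarkation with 'S', the mode named in the write-up.
    out["Embarked"] = out["Embarked"].fillna("S")

    # Assumption: "category-specific" is read here as the modal Age within
    # each (Pclass, Sex) group, falling back to the overall mode.
    global_mode = out["Age"].mode().iloc[0]

    def fill_with_group_mode(ages: pd.Series) -> pd.Series:
        modes = ages.mode()
        return ages.fillna(modes.iloc[0] if not modes.empty else global_mode)

    out["Age"] = out.groupby(["Pclass", "Sex"])["Age"].transform(fill_with_group_mode)

    # Assumption: these bin edges are illustrative, not the project's own.
    out["AgeGroup"] = pd.cut(
        out["Age"],
        bins=[0, 12, 18, 35, 60, float("inf")],
        labels=False,
        include_lowest=True,
    )

    # Quartile-based fare bands, coded 0 (cheapest) to 3 (most expensive).
    out["FareBand"] = pd.qcut(out["Fare"], q=4, labels=False, duplicates="drop")

    # Assumption: the specific integer codes are arbitrary choices.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out["Embarked"] = out["Embarked"].map({"S": 0, "C": 1, "Q": 2})
    return out
```

The sketch keeps the raw Age and Fare columns next to the derived ones; whether the project drops them before training is not stated.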

Model Training

  • 80/20 train/validation split
  • Trained Random Forest classifier on engineered features
  • Evaluated the accuracy score on the unseen test set
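
As a rough sketch of this training loop with scikit-learn: the helper name `train_and_validate`, the seed, and the `n_estimators=100` setting are assumptions for illustration, since the project's hyperparameters are not given. The example scores the model on the 20% hold-out split.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_validate(features: pd.DataFrame, target: pd.Series, seed: int = 42):
    """Hypothetical helper: fit a Random Forest and score it on a 20% hold-out."""
    # 80/20 split, as described above; the fixed seed only makes runs repeatable.
    X_train, X_val, y_train, y_val = train_test_split(
        features, target, test_size=0.2, random_state=seed
    )

    # Assumption: 100 trees and otherwise default settings.
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)

    # Accuracy is the fraction of hold-out passengers classified correctly.
    accuracy = accuracy_score(y_val, model.predict(X_val))
    return model, accuracy
```

With an engineered frame `df` containing a `Survived` column, a call might look like `train_and_validate(df.drop(columns=["Survived"]), df["Survived"])`.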

Survival Analysis

Figure: Survival Distribution. Overall survival distribution in training data.

Figure: Survival by Sex. Survival disparity based on sex.
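
The numbers behind charts like these can be computed with a couple of Pandas aggregations. The sketch below assumes the standard Kaggle column names `Survived` and `Sex` (with Sex not yet encoded); the function name `survival_summary` is hypothetical.

```python
import pandas as pd


def survival_summary(train: pd.DataFrame):
    """Hypothetical helper returning overall and per-sex survival rates."""
    # Share of passengers who died (0) and survived (1).
    overall = train["Survived"].value_counts(normalize=True).sort_index()

    # Mean of a 0/1 column is the survival rate within each group.
    by_sex = train.groupby("Sex")["Survived"].mean()
    return overall, by_sex
```

With Matplotlib installed, calling `.plot(kind="bar")` on either returned Series would draw a bar chart of the corresponding rates.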

Technologies

Python, Pandas, scikit-learn, Random Forest, Matplotlib
