Technical Knowledge (Advanced, 2 years of experience)
Summary
Data scientist with experience in the complete analytics pipeline: EDA, feature engineering, model training, and result interpretation. Applied data science to healthcare analytics, resulting in peer-reviewed IEEE publications.
How I Apply This Skill
- Conducted comprehensive EDA on beer/brewery datasets, identifying a 0.67 correlation between IBU and ABV and data quality issues (41.7% of IBU values missing)
- Performed feature engineering on Titanic data: binning ages, creating fare bands using quartiles, encoding categoricals
- Trained and evaluated models using train/validation/test splits and cross-validation
- Created visualizations with Matplotlib and Seaborn for statistical insights and result presentation
- Published healthcare analytics research in 3 IEEE conferences
- Designed a semantic schema mapping pipeline for the Text-to-SQL dashboards that translates business vocabulary to database views across 48 tables and 561 columns, using precomputed embeddings with a cosine similarity threshold of 0.65
- Built search analytics with gap analysis for the RAG Document Assistant, logging query volume, latency, and zero-result queries in SQLite to identify documentation coverage gaps
- Explored the 2020 Johns Hopkins and Worldometer COVID-19 datasets across 712k+ rows with seven standalone Plotly visualizations (global/USA county choropleths, bubble maps, mortality and recovery rates, WHO region comparisons) tied together by a Dash dashboard
- Applied Holt-Winters exponential smoothing via statsmodels for a 30-day forecast of global daily confirmed COVID-19 cases with a 95% confidence band
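The schema mapping bullet above relies on matching a business term to a database column by cosine similarity against precomputed embeddings. A minimal sketch of that idea follows, assuming toy 3-dimensional vectors and hypothetical column names (`orders.total_amount`, `customers.signup_date`); only the 0.65 threshold comes from the project description, and the real pipeline may differ.

```python
import math

SIMILARITY_THRESHOLD = 0.65  # threshold quoted in the project description


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def map_term(term_vec, column_vecs, threshold=SIMILARITY_THRESHOLD):
    """Return (best_column, score), or (None, score) if below the threshold."""
    best_name, best_score = None, -1.0
    for name, vec in column_vecs.items():
        score = cosine_similarity(term_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical precomputed embeddings (illustrative values, not real data).
COLUMN_VECS = {
    "orders.total_amount": [0.9, 0.1, 0.0],
    "customers.signup_date": [0.0, 0.2, 0.9],
}

match, score = map_term([0.8, 0.2, 0.1], COLUMN_VECS)        # close to the first column
no_match, low_score = map_term([0.0, 1.0, 0.0], COLUMN_VECS)  # close to neither column
print(match, round(score, 3))
print(no_match, round(low_score, 3))
```

Returning `None` below the threshold lets the caller fall back to asking the user for clarification instead of guessing a column.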
Key Strengths
- EDA Workflow: Missing value analysis, statistical examination, plotting distributions, correlation matrices
- Feature Engineering: Categorical encoding, binning, derived features, filling null values
- Healthcare Analytics: Long COVID prediction, demographic analysis, symptom clustering
- Visualization: Matplotlib, Seaborn, distribution plots, heatmaps, model performance curves
- Statistical Analysis: Correlation, quantile statistics, descriptive statistics
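The binning techniques listed under Feature Engineering (quartile-based fare bands and age bins) can be sketched with the standard library alone. The fares, age edges, and labels below are made up for illustration and are assumptions, not values from the Titanic project; in practice a library such as pandas would typically handle this.

```python
import bisect
import statistics

# Hypothetical fares (illustrative only, not Titanic data).
fares = [5, 10, 15, 20, 25, 30, 35, 40]

# Three cut points that split the fares into four quartile bands.
fare_cuts = statistics.quantiles(fares, n=4)


def fare_band(fare, cuts=fare_cuts):
    """Band index 0..3; a fare equal to a cut point falls in the lower band."""
    return bisect.bisect_left(cuts, fare)


# Assumed age edges and labels, chosen only for this sketch.
AGE_EDGES = [12, 18, 35, 60]
AGE_LABELS = ["child", "teen", "young_adult", "adult", "senior"]


def age_band(age):
    """Map an age to a label; each bin includes its upper edge."""
    return AGE_LABELS[bisect.bisect_left(AGE_EDGES, age)]


fare_bands = [fare_band(f) for f in fares]
print(fare_cuts)
print(fare_bands)
print(age_band(8), age_band(30), age_band(70))
```

Quartile cut points give roughly equal-sized bands, which keeps a skewed feature such as fare from putting most passengers in a single bin.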
Related Projects
- Text-to-SQL Dashboard - Metabase
- Text-to-SQL Dashboard - Native React Charts
- RAG Document Assistant
- Long COVID Prediction Research (IEEE Published)
- CIFAR-10 Image Classification
- Exploratory Data Analysis on Beers & Breweries Datasets
- Titanic Survival Prediction
- COVID-19 Data Exploration Dashboard
- Protein Structure Viewer