Data Science
Technical Knowledge · Advanced · 2 years of experience

Summary

Data scientist with experience in the complete analytics pipeline: EDA, feature engineering, model training, and result interpretation. Applied data science to healthcare analytics, resulting in peer-reviewed IEEE publications.

How I Apply This Skill

  • Conducted EDA on beer/brewery datasets, identifying a 0.67 correlation between IBU and ABV and data quality issues such as 41.7% missing IBU values
  • Performed feature engineering on Titanic data: binning ages, creating fare bands using quartiles, encoding categoricals
  • Trained and evaluated models with proper train/test/validation splits and cross-validation
  • Created visualizations with Matplotlib and seaborn for statistical insights and result presentation
  • Published healthcare analytics research at three IEEE conferences
  • Designed a semantic schema mapping pipeline for the Text-to-SQL dashboards that translates business vocabulary to database views across 48 tables and 561 columns, using precomputed embeddings with a cosine similarity threshold of 0.65
  • Built search analytics with gap analysis for the RAG Document Assistant, logging query volume, latency, and zero-result queries in SQLite to identify documentation coverage gaps
  • Explored the 2020 Johns Hopkins and Worldometer COVID-19 datasets across 712k+ rows with seven standalone Plotly visualizations (global/USA county choropleths, bubble maps, mortality and recovery rates, WHO region comparisons) tied together by a Dash dashboard
  • Applied Holt-Winters exponential smoothing via statsmodels for a 30-day forecast of global daily confirmed COVID-19 cases with a 95% confidence band
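
The schema mapping item above relies on comparing precomputed embeddings against a cosine similarity threshold of 0.65. A minimal sketch of that thresholding step follows, under loud assumptions: the three-dimensional vectors are hand-made toy values rather than real embeddings, and the names `COLUMN_EMBEDDINGS`, `map_term`, and the view/column identifiers are hypothetical, not taken from the actual pipeline.

```python
import math

SIMILARITY_THRESHOLD = 0.65  # threshold quoted in the list above

# Hypothetical precomputed embeddings (toy 3-d vectors, not real model output).
COLUMN_EMBEDDINGS = {
    "v_sales.net_revenue": [0.9, 0.1, 0.0],
    "v_sales.order_count": [0.1, 0.9, 0.1],
    "v_customers.region": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def map_term(term_embedding, threshold=SIMILARITY_THRESHOLD):
    """Return (column, score) for the best match at or above the threshold,
    or None when no column is similar enough."""
    best_column, best_score = None, -1.0
    for column, embedding in COLUMN_EMBEDDINGS.items():
        score = cosine_similarity(term_embedding, embedding)
        if score > best_score:
            best_column, best_score = column, score
    if best_score >= threshold:
        return best_column, best_score
    return None
```

Returning `None` below the threshold, rather than the weakest available match, is one way to keep a dashboard from silently mapping unfamiliar vocabulary to an unrelated column.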

Key Strengths

  • EDA Workflow: Missing value analysis, statistical examination, plotting distributions, correlation matrices
  • Feature Engineering: Categorical encoding, binning, derived features, filling null values
  • Healthcare Analytics: Long COVID-19 prediction, demographic analysis, symptom clustering
  • Visualization: Matplotlib, seaborn, distribution plots, heatmaps, model performance curves
  • Statistical Analysis: Correlation, quantile statistics, descriptive statistics
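
The first two EDA steps listed above, missing value analysis and correlation, can be sketched in a few lines. This is an illustrative version using only the Python standard library, not the code behind the figures quoted earlier; the helper names `missing_share` and `pearson` and the toy `ibu`/`abv` values are assumptions made up for the example.

```python
import math


def missing_share(values):
    """Fraction of entries that are None (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Toy values for illustration only; None marks a missing measurement.
ibu = [35, None, 60, 20, None, 45]
abv = [0.050, 0.048, 0.072, 0.042, 0.055, 0.061]

print(f"missing IBU share: {missing_share(ibu):.1%}")
print(f"IBU-ABV correlation: {pearson(ibu, abv):.2f}")
```

Dropping incomplete pairs before computing the correlation is a deliberate choice here: filling the gaps first (for example with the mean) would change the estimate, so the missing share is worth reporting alongside it.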