Exploratory Data Analysis on Beers & Breweries Datasets
Personal Projects #Data Science #Python
Overview
An exploratory data analysis (EDA) project examining how beer characteristics, chiefly ABV (alcohol by volume) and IBU (International Bitterness Units), relate to each other and to brewery data.
Key Achievements
- Discovered moderate correlation (0.670) between IBU and ABV values
- Identified that ounces is at most weakly correlated with the other features (0.054 with IBU, 0.172 with ABV)
- Found 41.7% missing values in IBU column
- Determined that American IPA is the most common style (424 occurrences, across 99 distinct styles)
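The style count in the last bullet can be reproduced with a frequency table. The following is a minimal sketch on a toy DataFrame, not the project's actual code; the column name `style` and the sample values are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the beers table; the column name "style" is an assumption.
toy = pd.DataFrame({
    "style": ["American IPA", "American IPA", "Saison", None,
              "American Pale Ale (APA)"],
})

# value_counts() ignores missing values and sorts by frequency, descending.
style_counts = toy["style"].value_counts()

top_style = style_counts.idxmax()        # most frequent style
top_count = int(style_counts.iloc[0])    # how often it occurs
n_styles = toy["style"].nunique()        # number of distinct styles

print(f"{top_style}: {top_count} occurrences across {n_styles} styles")
```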
Descriptive Statistics of Numerical Values
| Column | Non Null | Missing % | Min | Max | Mean | Median | Mode | Std Dev | Quartile 0.25 | Quartile 0.5 | Quartile 0.75 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| IBU | 1405 | 41.7% | 4.0 | 138.0 | 42.713 | 35.0 | 20.0 | 25.954 | 21.0 | 35.0 | 64.0 |
| ABV | 2348 | 2.6% | 0.001 | 0.128 | 0.0598 | 0.056 | 0.05 | 0.0135 | 0.05 | 0.056 | 0.067 |
| Ounces | 2410 | 0% | 8.4 | 32.0 | 13.592 | 12.0 | 12.0 | 2.352 | 12.0 | 12.0 | 16.0 |
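A table like the one above could be assembled with pandas roughly as follows. This is a sketch under assumptions: the helper `describe_numeric` is hypothetical, and the toy values stand in for the real dataset, so the printed numbers will not match the table.

```python
import pandas as pd


def describe_numeric(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Hypothetical helper: one summary row per numerical column."""
    rows = []
    for col in columns:
        s = df[col]
        valid = s.dropna()  # statistics are computed on non-missing values
        rows.append({
            "Column": col,
            "Non Null": int(s.notna().sum()),
            "Missing %": round(s.isna().mean() * 100, 1),
            "Min": valid.min(),
            "Max": valid.max(),
            "Mean": valid.mean(),
            "Median": valid.median(),
            "Mode": valid.mode().iloc[0],  # smallest value if there are ties
            "Std Dev": valid.std(),        # sample standard deviation (ddof=1)
            "Quartile 0.25": valid.quantile(0.25),
            "Quartile 0.5": valid.quantile(0.5),
            "Quartile 0.75": valid.quantile(0.75),
        })
    return pd.DataFrame(rows).set_index("Column")


# Toy data only; column names mirror the table but the values are made up.
toy = pd.DataFrame({
    "abv": [0.05, 0.05, 0.07, None],
    "ibu": [20.0, None, 60.0, None],
    "ounces": [12.0, 12.0, 16.0, 12.0],
})

summary = describe_numeric(toy, ["ibu", "abv", "ounces"])
print(summary)
```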
Correlation Matrix
| | abv | ibu | ounces |
|---|---|---|---|
| abv | 1.0 | 0.670 | 0.172 |
| ibu | 0.670 | 1.0 | 0.054 |
| ounces | 0.172 | 0.054 | 1.0 |
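A Pearson correlation matrix of this shape is what `DataFrame.corr` returns. The sketch below uses made-up values, so its coefficients differ from the table; the column names are taken from the matrix above.

```python
import pandas as pd

# Toy data only; the values are invented for illustration.
toy = pd.DataFrame({
    "abv": [0.04, 0.05, 0.06, 0.07],
    "ibu": [20.0, None, 40.0, 50.0],
    "ounces": [12.0, 12.0, 16.0, 12.0],
})

# corr() excludes missing values pairwise, so a row with a missing ibu
# still contributes to the abv/ounces coefficient.
corr = toy[["abv", "ibu", "ounces"]].corr(method="pearson")
print(corr.round(3))
```

Because missing values are dropped per pair, each coefficient may be based on a different number of rows, which matters here given how many IBU values are missing.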
Frequency Distribution Plots
For each numerical column, a distribution plot is created with seaborn and pyplot after dropping missing values.
ibu Distribution

abv Distribution

ounces Distribution

Technologies
Python, Pandas, NumPy, Seaborn, Matplotlib, Pearson Correlation