Long COVID Prediction Research (IEEE Published)
University Projects #Data Science #Machine Learning #Python
You can view the full paper here
Overview
Key Achievements
- Co-authored 3 peer-reviewed IEEE conference papers
- Custom random forest reached an AUC of 0.721, compared with 0.76 for Sudre’s model and 0.706 for the decision tree
- Identified that individuals assigned female at birth develop Long COVID at higher rates
- Found that cough, headache, and fatigue are the most prevalent symptoms among patients who go on to develop Long COVID
- Determined that symptom severity is a strong distinguishing feature for prediction
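For readers unfamiliar with the metric behind the scores above, AUC can be read as the probability that a model ranks a randomly chosen positive case above a randomly chosen negative one. The sketch below computes it that way in plain Python. The function name `roc_auc` and the toy labels and scores are invented for illustration; this is not the project's code or data.

```python
def roc_auc(labels, scores):
    """Return the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth values (1 = positive case)
    scores: iterable of model scores, higher means "more likely positive"
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical values, not study data)
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. In practice a library routine such as scikit-learn's `roc_auc_score` would be used instead of this quadratic loop.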
Implementation
- Applied the Apriori algorithm to mine association rules with confidence above 80% from demographic data
- Performed demographic-symptom clustering using symptom frequency and chi-square tests to identify differences between groups
- Used the Boruta algorithm and descriptive analysis to identify predictive features
- Trained and compared 3 models: decision tree, Sudre’s random forest, and a custom random forest classifier
- Applied fuzzy logic techniques to handle uncertainty in symptom severity data
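The association rule step can be illustrated with a minimal Apriori-style sketch in plain Python. It finds frequent itemsets level by level, then keeps rules whose confidence exceeds 0.8, mirroring the threshold mentioned above. Everything else here is an assumption made for illustration: the function names, the `min_support` value of 0.4, and the placeholder items (`attr_a`, `attr_b`, `outcome`) are hypothetical, and the toy records are not study data. The project itself may have used a library implementation.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: a k-itemset becomes a candidate only if
    every one of its (k-1)-subsets was frequent at the previous level."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    while current:
        # Count how many transactions contain each candidate
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)

        # Join frequent k-itemsets into (k+1)-candidates, then prune
        keys = list(level)
        k = len(keys[0]) + 1 if keys else 0
        candidates = set()
        for a, b in combinations(keys, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent


def strong_rules_from(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = support / frequent[antecedent]
                if confidence > min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical toy records; each set is one individual's attributes
toy_records = [
    {"attr_a", "attr_b", "outcome"},
    {"attr_a", "attr_b", "outcome"},
    {"attr_a", "outcome"},
    {"attr_a", "attr_b"},
    {"attr_b"},
]
itemsets = frequent_itemsets(toy_records, min_support=0.4)
strong_rules = strong_rules_from(itemsets, min_confidence=0.8)
for antecedent, consequent, confidence in strong_rules:
    print(sorted(antecedent), "->", sorted(consequent), f"confidence={confidence:.2f}")
```

Confidence alone can overstate a rule when the consequent is common in the whole dataset, so measures such as lift are often checked alongside it.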
Publications
- K. I. Dotzlaw, R. E. Dotzlaw, C. K. Leung, A. G. M. Pazdor, S. A. Szturm, and D. Tan, “Data Analytics and Prediction of Long COVID Cases with Fuzzy Logic,” in 2023 IEEE International Conference on Fuzzy Systems (FUZZ), 2023, DOI: 10.1109/FUZZ52849.2023.10309753.
- D. Tan, C. K. Leung, K. I. Dotzlaw, R. E. Dotzlaw, A. G. M. Pazdor, and S. A. Szturm, “A Data Science Solution for Analyzing Long COVID Cases,” in 2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI), 2023, pp. 227-232, DOI: 10.1109/IRI58017.2023.00046.
- K. Dotzlaw, R. Dotzlaw, C. K. Leung, A. G. M. Pazdor, S. Szturm, and D. Tan, “Mining Big Healthcare Data to Predict Long COVID Cases,” in 2023 IEEE International Conference on Industrial Technology (ICIT), 2023, DOI: 10.1109/ICIT58465.2023.10143145.
Technologies
Python, pandas, scikit-learn, Apriori algorithm, Boruta feature selection, Random Forest, Decision Trees, Fuzzy Logic