Long COVID Prediction Research (IEEE Published)
University Projects #Data Science #Machine Learning #Python Featured
rdotzlaw/Long-Covid-Prediction

Overview#

Since COVID-19 emerged in late 2019, large volumes of patient data have been collected and analyzed. As Long COVID became a significant health concern, this research focused on understanding and predicting its development. The work mined associations in demographic data, clustered common symptoms, and applied machine learning models with fuzzy logic to handle uncertainty in symptom severity. The result is a classifier that identifies individuals at risk of developing Long COVID. The models highlight key demographic and symptom-based risk factors, and the findings were published in three peer-reviewed IEEE conference papers.
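As a rough illustration of how fuzzy logic can represent uncertain symptom severity, the sketch below maps a severity score to overlapping degrees of membership in "mild", "moderate", and "severe". The 0 to 10 scale, the breakpoints, and the function names are assumptions made for this example; they are not taken from the published papers.

```python
def falling(x, lo, hi):
    """Membership that is 1 up to `lo`, then drops linearly to 0 at `hi`."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def rising(x, lo, hi):
    """Membership that is 0 up to `lo`, then climbs linearly to 1 at `hi`."""
    return 1.0 - falling(x, lo, hi)


def triangular(x, a, b, c):
    """Membership that is 0 at `a` and `c` and peaks at 1 when x == b."""
    return min(rising(x, a, b), falling(x, b, c))


def fuzzify_severity(score):
    """Map a hypothetical 0-10 severity score to fuzzy membership degrees.

    The breakpoints (2, 5, 8) are illustrative assumptions only.
    """
    return {
        "mild": falling(score, 2, 5),
        "moderate": triangular(score, 2, 5, 8),
        "severe": rising(score, 5, 8),
    }


# A score of 6.5 is partly "moderate" and partly "severe" rather than
# being forced into a single crisp category.
print(fuzzify_severity(6.5))
```

Because the sets overlap, a borderline score contributes to two categories at once, which is one common way to keep hard thresholds from discarding information about severity.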

View the original paper here

Key Achievements#

  • 3 IEEE publications as co-author on peer-reviewed conference papers
  • Custom random forest achieved an AUC of 0.721, compared with 0.76 for Sudre’s model and 0.706 for the decision tree
  • Identified that individuals assigned female at birth develop Long COVID at higher rates
  • Discovered that cough, headache, and fatigue are the most prevalent symptoms associated with Long COVID development
  • Determined that symptom severity is a strong distinguishing feature for prediction
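To make the AUC figures above easier to interpret, here is a minimal sketch of how AUC can be computed from predicted risk scores. It uses the pairwise definition: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The labels and scores are invented toy values, not data or results from the study, and `roc_auc` is a name chosen for this example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = developed the condition).
    scores: predicted risk for each individual, higher = more at risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above the negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example with made-up values.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
print(round(roc_auc(toy_labels, toy_scores), 3))
```

Under this definition, an AUC of 0.721 means a randomly chosen positive case is scored above a randomly chosen negative case about 72% of the time, while 0.5 corresponds to random guessing.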

Implementation#

  • Applied the Apriori algorithm to mine association rules with confidence above 80% from demographic data
  • Performed demographic-symptom clustering using symptom frequency and chi-square tests to identify differences between demographic groups
  • Used the Boruta algorithm and descriptive analysis to identify predictive features
  • Trained and compared three models: a decision tree, Sudre’s random forest, and a custom random forest classifier
  • Applied fuzzy logic techniques to handle uncertainty in symptom severity data
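The association-rule step listed above can be sketched in plain Python. This is a minimal, illustrative version of Apriori-style mining, not the project's actual code: the attribute names, the toy records, and the 0.3 support threshold are all invented for the example, and only the confidence cutoff of 80% comes from the description above.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates of size k that are frequent enough.
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) above min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = sup / frequent[antecedent]
                if confidence > min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Invented toy records; they carry no findings from the study.
records = [
    {"region=urban", "age=40-59", "long_covid=yes"},
    {"region=urban", "age=40-59", "long_covid=yes"},
    {"region=urban", "age=18-39", "long_covid=yes"},
    {"region=urban", "age=40-59", "long_covid=yes"},
    {"region=rural", "age=40-59", "long_covid=no"},
    {"region=rural", "age=18-39", "long_covid=no"},
    {"region=urban", "age=18-39", "long_covid=no"},
    {"region=rural", "age=18-39", "long_covid=yes"},
]

itemsets = frequent_itemsets(records, min_support=0.3)
for lhs, rhs, conf in association_rules(itemsets, min_confidence=0.8):
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is only counted if all of its smaller subsets were already frequent, which keeps the search tractable on wide demographic tables.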

Publications#

  • K. I. Dotzlaw, R. E. Dotzlaw, C. K. Leung, A. G. M. Pazdor, S. A. Szturm, and D. Tan, “Data Analytics and Prediction of Long COVID Cases with Fuzzy Logic,” in 2023 IEEE International Conference on Fuzzy Systems (FUZZ), 2023, DOI: 10.1109/FUZZ52849.2023.10309753.
  • D. Tan, C. K. Leung, K. I. Dotzlaw, R. E. Dotzlaw, A. G. M. Pazdor, and S. A. Szturm, “A Data Science Solution for Analyzing Long COVID Cases,” in 2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI), 2023, pp. 227-232, DOI: 10.1109/IRI58017.2023.00046.
  • K. Dotzlaw, R. Dotzlaw, C. K. Leung, A. G. M. Pazdor, S. Szturm, and D. Tan, “Mining Big Healthcare Data to Predict Long COVID Cases,” in 2023 IEEE International Conference on Industrial Technology (ICIT), 2023, DOI: 10.1109/ICIT58465.2023.10143145.

Technologies#

Python, pandas, scikit-learn, Apriori algorithm, Boruta feature selection, Random Forest, Decision Trees, Fuzzy Logic
