You can view the full paper here
Analysis
Our group used the Apriori algorithm to mine association rules from demographic data, keeping rules with a confidence of at least 0.3. We then performed demographic-symptom clustering, using symptom frequencies and chi-square tests to determine differences between groups. Using the Boruta algorithm and descriptive analysis, we identified the most important features for prediction. We created and trained three models to predict whether a person has Long Covid-19: a decision tree model, a random forest model created by Sudre, and a custom random forest model.
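The rule-mining step can be illustrated with a minimal pure-Python sketch of Apriori. This is not our project code: the records, the item names such as `sex=female`, and the `min_support` value of 0.2 are hypothetical and chosen only for illustration. Only the confidence threshold of 0.3 comes from the analysis described above.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {itemset: support}."""
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset (Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for qualifying rules."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for items in combinations(sorted(itemset), r):
                antecedent = frozenset(items)
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical demographic records, one set of attribute=value items each.
records = [
    frozenset({"sex=female", "vaccinated=yes", "long_covid=yes"}),
    frozenset({"sex=female", "vaccinated=yes", "long_covid=yes"}),
    frozenset({"sex=female", "vaccinated=no", "long_covid=no"}),
    frozenset({"sex=male", "vaccinated=yes", "long_covid=no"}),
    frozenset({"sex=male", "vaccinated=no", "long_covid=no"}),
]

itemsets = frequent_itemsets(records, min_support=0.2)  # assumed threshold
rules = association_rules(itemsets, min_confidence=0.3)  # threshold from the text

for antecedent, consequent, confidence in rules:
    if consequent == frozenset({"long_covid=yes"}):
        print(sorted(antecedent), "->", sorted(consequent), round(confidence, 3))
```

In practice a library implementation would likely be preferable on a large dataset; the sketch only shows how support pruning and the confidence filter fit together.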
Results
- Demographic analysis showed that the individuals developing Long Covid-19 tended to be assigned female at birth, female-identifying, white, or vaccinated at least once
- Association rule mining identified high-confidence rules indicating that individuals assigned female at birth develop Long Covid-19
- Symptom clustering determined that cough, headache, and fatigue were the most prevalent symptoms for individuals developing Long Covid-19
- Descriptive analysis and feature selection indicated that symptom severity was a strong distinguisher
- Our decision tree had an AUC of 0.706
- Our custom random forest model had an AUC of 0.721
- Sudre’s random forest model had an AUC of 0.76
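For readers unfamiliar with the AUC figures above, the following is a minimal sketch of how an AUC can be computed from predicted scores. It uses the pairwise definition (the probability that a randomly chosen positive case is scored above a randomly chosen negative case). The labels and scores are hypothetical and are not our model outputs, and the function name `roc_auc` is our own for this illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = Long Covid-19) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the three models above are compared.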
Publications
K. I. Dotzlaw, R. E. Dotzlaw, C. K. Leung, A. G. M. Pazdor, S. A. Szturm, and D. Tan, “Data Analytics and Prediction of Long COVID Cases with Fuzzy Logic,” in 2023 IEEE International Conference on Fuzzy Systems (FUZZ), 2023, DOI: 10.1109/FUZZ52849.2023.10309753.
D. Tan, C. K. Leung, K. I. Dotzlaw, R. E. Dotzlaw, A. G. M. Pazdor, and S. A. Szturm, “A Data Science Solution for Analyzing Long COVID Cases,” in 2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI), 2023, pp. 227-232, DOI: 10.1109/IRI58017.2023.00046.
K. Dotzlaw, R. Dotzlaw, C. K. Leung, A. G. M. Pazdor, S. Szturm, and D. Tan, “Mining Big Healthcare Data to Predict Long COVID Cases,” in 2023 IEEE International Conference on Industrial Technology (ICIT), 2023, DOI: 10.1109/ICIT58465.2023.10143145.