Predicting Long-COVID Sequelae: A Multi-Label Classification Approach

Mattia Bellan, A. Chiocchetti, C. Irwin, L. Piovesan, L. Portinale
2025-01-01

Abstract

We present a study on the prediction of long-COVID sequelae through multi-label classification (MLC). Data on more than 300 patients were collected during a long-COVID study at Ospedale Maggiore of Novara (Italy), covering both their baseline situation and their condition at the onset of acute COVID-19. The goal is to predict the presence of specific long-COVID sequelae at a one-year follow-up. To broaden the representativeness of the analysis, we investigated two complementary strategies: augmenting the dataset by generating scenarios with different levels in the number of complications, and reducing the number of features used for prediction. For data augmentation, we applied MLSmote under six different policies; for feature reduction, we generated new datasets with both a supervised and an unsupervised dimensionality reduction approach (RELIEF and PCA, respectively). A representative set of MLC methods was tested on all the resulting datasets. Results, evaluated in terms of Accuracy, Exact Match, Hamming score and macro-averaged AUC, show that MLC methods can be useful for predicting specific long-COVID sequelae under the different conditions represented by the datasets considered. Finally, we addressed the interpretability of the results with an approach based on SHAP, showing that the method can capture clinically meaningful interpretations of specific predictions and that data augmentation does not degrade these explanations.
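The abstract reports four evaluation measures without defining them. The Python sketch below shows how these measures are commonly defined in the multi-label literature; it is not the authors' evaluation code. It assumes that "Accuracy" means example-based (Jaccard) accuracy, that "Hamming score" means 1 minus the Hamming loss, and that "macro-averaged AUC" is the unweighted mean of per-label ROC AUCs computed from predicted scores. The function names, the 0.5 threshold and the toy data (4 patients, 3 sequelae) are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch of common multi-label metrics (assumed definitions, not the paper's code).
# Each row is one patient; each column is one sequela (label), encoded as 0/1.


def exact_match(y_true, y_pred):
    """Subset accuracy: fraction of patients whose whole predicted label set is correct."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


def example_accuracy(y_true, y_pred):
    """Assumed 'Accuracy': mean intersection-over-union of true and predicted label sets.
    A patient with empty true and predicted sets counts as a perfect match."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for a, b in zip(t, p) if a and b)
        union = sum(1 for a, b in zip(t, p) if a or b)
        total += 1.0 if union == 0 else inter / union
    return total / len(y_true)


def hamming_score(y_true, y_pred):
    """Assumed 'Hamming score' = 1 - Hamming loss: share of correct per-label decisions."""
    n_labels = len(y_true[0])
    correct = sum(1 for t, p in zip(y_true, y_pred) for a, b in zip(t, p) if a == b)
    return correct / (len(y_true) * n_labels)


def binary_auc(truth, scores):
    """ROC AUC for a single label via the Mann-Whitney statistic (ties count as 1/2)."""
    pos = [s for y, s in zip(truth, scores) if y]
    neg = [s for y, s in zip(truth, scores) if not y]
    if not pos or not neg:
        return None  # AUC is undefined when a label has only one class
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, y_score):
    """Unweighted mean of per-label AUCs, skipping labels whose AUC is undefined."""
    aucs = []
    for j in range(len(y_true[0])):
        auc = binary_auc([row[j] for row in y_true], [row[j] for row in y_score])
        if auc is not None:
            aucs.append(auc)
    return sum(aucs) / len(aucs) if aucs else None


# Hypothetical toy data: 4 patients, 3 sequelae.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
y_score = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.5], [0.7, 0.6, 0.2], [0.3, 0.55, 0.1]]
# Binarize scores with an assumed threshold of 0.5.
y_pred = [[int(s > 0.5) for s in row] for row in y_score]

print(exact_match(y_true, y_pred))               # 0.5
print(example_accuracy(y_true, y_pred))          # 0.625
print(round(hamming_score(y_true, y_pred), 3))   # 0.833
print(round(macro_auc(y_true, y_score), 3))      # 0.889
```

On the same predictions, Exact Match is the strictest measure, since a single wrong sequela makes the whole patient count as an error, while the Hamming score credits every correct per-label decision. This is why the two can diverge noticeably (0.5 versus 0.833 on the toy data). Macro-averaged AUC, unlike the other three, is computed from scores rather than thresholded predictions, and it weights rare sequelae the same as common ones.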
File: bellan-et-al-2025-predicting-long-covid-sequelae-a-multi-label-classification-approach.pdf (publisher's version, Adobe PDF, 2.21 MB; available to authorized users only; license not specified)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11579/204703