Development of a classification and ranking method for the identification of possible biomarkers in two-dimensional gel-electrophoresis based on principal component analysis and variable selection procedures

ROBOTTI, Elisa; GOSETTI, Fabio; MARENGO, Emilio
2011-01-01

Abstract

The identification of biomarkers is one of the leading research areas in proteomics. When biomarkers are searched for in spot volume datasets produced by 2D gel-electrophoresis, problems may arise from the large number of spots present in each map and the small number of samples available in each class (control/pathological). In such cases, multivariate methods are usually combined with variable selection procedures to provide a set of possible biomarkers; these procedures, however, usually aim at selecting the smallest set of variables (spots) that provides the best predictive performance. This approach does not seem suitable for the identification of potential biomarkers, since here all the possible candidate biomarkers have to be identified to provide a general picture of the "pathological state": exhaustivity has to be preferred in order to provide a complete understanding of the mechanisms underlying the pathology. We propose here a ranking and classification method, "Ranking-PCA", based on Principal Component Analysis and forward-search variable selection: at each step, the method selects the one variable that provides the best separation of the two classes investigated in the space given by the relevant PCs. The method was applied to an artificial dataset and a real case study: Ranking-PCA exhaustively identified the potential biomarkers and provided reliable and robust results.
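
The forward-search idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the preprocessing, the separation criterion, or how the "relevant PCs" are chosen, so the autoscaling step, the Fisher-like ratio, the fixed `n_pcs` parameter, and the function names below are all assumptions made for illustration only.

```python
import numpy as np


def separation(scores, y):
    """Fisher-like ratio (assumed criterion, not taken from the paper):
    squared distance between class centroids over summed within-class variance."""
    a, b = scores[y == 0], scores[y == 1]
    between = np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2)
    within = a.var(axis=0).sum() + b.var(axis=0).sum()
    return between / (within + 1e-12)


def pca_scores(X, n_pcs):
    """Scores on the first n_pcs principal components of column-centred X (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_pcs, len(s))
    return U[:, :k] * s[:k]


def ranking_pca_sketch(X, y, n_select, n_pcs=1):
    """Rank variables by forward search: at each step add the variable whose
    inclusion gives the best two-class separation in the PC score space."""
    std = X.std(axis=0)
    Z = (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)  # autoscaling (assumed)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        best = max(
            remaining,
            key=lambda j: separation(pca_scores(Z[:, selected + [j]], n_pcs), y),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: 10 control and 10 pathological samples, 20 "spots",
    # two of which (columns 3 and 7) are shifted in the pathological class.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 20))
    y = np.array([0] * 10 + [1] * 10)
    X[y == 1, 3] += 10.0
    X[y == 1, 7] += 10.0
    print(ranking_pca_sketch(X, y, n_select=5))
```

Because the search keeps ranking variables until `n_select` is reached instead of stopping at the smallest well-performing subset, the sketch reflects the exhaustive ranking the abstract argues for; the stopping rule used in the paper itself is not given here.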
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11579/12376