Development of a classification and ranking method for the identification of possible biomarkers in two-dimensional gel-electrophoresis based on principal component analysis and variable selection procedures

Robotti, Elisa; Demartini, Marco; Gosetti, Fabio; Calabrese, Giorgio; Marengo, Emilio

doi:10.1039/c0mb00124d

The identification of biomarkers is one of the leading research areas in proteomics. When biomarkers are sought in spot volume datasets produced by 2D gel-electrophoresis, problems may arise from the large number of spots present in each map and the small number of samples available in each class (control/pathological). In such cases multivariate methods are usually combined with variable selection procedures to provide a set of possible biomarkers; however, these procedures usually aim to select the smallest set of variables (spots) that gives the best predictive performance. This approach does not seem suitable for the identification of potential biomarkers, because all the candidate biomarkers have to be identified to provide a general picture of the "pathological state": exhaustiveness is to be preferred, so as to provide a complete understanding of the mechanisms underlying the pathology. We propose here a ranking and classification method, "Ranking-PCA", based on Principal Component Analysis and forward-search variable selection: at each step the method selects the variable that provides the best separation of the two classes investigated in the space given by the relevant PCs. The method was applied to an artificial dataset and a real case study: Ranking-PCA exhaustively identified the potential biomarkers and provided reliable and robust results.
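
The forward search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how class separation is measured or how the relevant PCs are chosen, so the separation criterion (distance between class centroids divided by the pooled within-class spread), the autoscaling step, the fixed `n_components` parameter and all function names below are assumptions made for illustration only.

```python
import numpy as np


def pc_scores(X, n_components=1):
    """Project autoscaled data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    std = Xc.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    Xs = Xc / std
    _, _, vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ vt[:n_components].T


def separation(scores, y):
    """Assumed criterion: centroid distance over pooled within-class spread."""
    a, b = scores[y == 0], scores[y == 1]
    dist = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    spread = np.sqrt(a.var(axis=0, ddof=1).sum() + b.var(axis=0, ddof=1).sum())
    return dist / (spread + 1e-12)


def rank_variables(X, y, n_components=1):
    """Forward search: at each step add the variable (spot) whose inclusion
    gives the best class separation in the PC space of the selected set.
    Every variable is ranked, rather than stopping at a minimal subset."""
    remaining = list(range(X.shape[1]))
    ranking = []
    while remaining:
        best_var, best_score = None, -np.inf
        for j in remaining:
            cols = ranking + [j]
            k = min(n_components, len(cols))
            score = separation(pc_scores(X[:, cols], k), y)
            if score > best_score:
                best_var, best_score = j, score
        ranking.append(best_var)
        remaining.remove(best_var)
    return ranking
```

Here `X` is assumed to be a samples-by-spots array of spot volumes and `y` an array of 0/1 class labels (control/pathological). The function returns the column indices in the order in which they were selected, so discriminating spots should appear early in the list and a cut-off can be chosen afterwards.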