Biomarkers Discovery through Multivariate Statistical Methods: A Review of Recently Developed Methods and Applications in Proteomics
Marcello Manfredi
2013-01-01
Abstract
Biomarker discovery is a discipline of increasing importance: it provides diagnostic and prognostic markers and may help to investigate and understand the mechanism by which a pathology develops, possibly suggesting new biomolecular therapeutic targets. In proteomics, biomarker discovery is hampered by the use of high-throughput techniques, which yield a large number of candidates among which the true biomarkers must be sought. Moreover, often only a small number of samples is available. Two main problems arise when biomarkers are sought in such datasets: 1) the identification of reliable markers, avoiding false positives due to chance correlations; 2) the exhaustive identification of all candidate markers, to obtain a complete snapshot of the effect investigated. Biomarkers can be identified by two approaches: classical monovariate methods, which treat each biomarker as independent (Student's t-test, Mann-Whitney test, etc.), or multivariate methods, which take into consideration the correlation structure of the data (i.e. interactions). The latter are certainly to be preferred and should achieve the best compromise between predictive ability (accomplished through the use of variable selection procedures) and exhaustivity. Here, we review the most recent applications of multivariate methods for the identification of biomarkers in proteomics, with particular regard to the statistical methods exploited.
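The first problem named in the abstract, false positives due to chance correlations when many candidates are tested on few samples, can be illustrated with a minimal sketch. It is not taken from the review: the dataset is simulated pure noise, and the sizes (1000 candidates, 10 samples per group), the function names and the 5% threshold are all assumptions chosen for illustration. Each candidate is tested independently with a pooled-variance Student's t statistic, as a monovariate approach would do.

```python
import random
import statistics

N_PER_GROUP = 10      # assumed number of samples per class (illustrative)
T_CRIT_DF18 = 2.101   # two-sided 5% critical value of Student's t, 18 degrees of freedom


def t_statistic(a, b):
    """Pooled-variance two-sample Student's t statistic."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def count_chance_hits(n_candidates=1000, seed=0):
    """Count candidates that pass the 5% threshold although no real effect exists."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_candidates):
        # Both groups come from the same distribution: there is no true marker.
        control = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
        disease = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
        if abs(t_statistic(control, disease)) > T_CRIT_DF18:
            hits += 1
    return hits


if __name__ == "__main__":
    print(f"{count_chance_hits()} of 1000 pure-noise candidates pass the 5% threshold")
```

Under these assumptions roughly 5% of the candidates, on the order of fifty, are expected to look significant by chance alone, which is why candidate markers found in such datasets need validation or correction for multiple testing before they are considered reliable.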