Spaced seeds are a fundamental tool for similarity search in biosequences. The best sensitivity/selectivity trade-offs are obtained using many seeds simultaneously: This is known as the multiple seed approach. Unfortunately, spaced seeds use a large amount of memory and the available RAM is a practical limit to the number of seeds one can use simultaneously. Inspired by some recent results on lossless seeds, we revisit the approach of using a single spaced seed and considering two regions homologous if the seed hits in at least t sufficiently close positions. We show that by choosing the locations of the don't care symbols in the seed using quadratic residues modulo a prime number, we derive single seeds that when used with a threshold t > 1 have competitive sensitivity/selectivity trade-offs, indeed close to the best multiple seeds known in the literature. In addition, the choice of the threshold t can be adjusted to modify sensitivity and selectivity a posteriori, thus enabling a more accurate search in the specific instance at issue. The seeds we propose also exhibit robustness and allow flexibility in usage.

Multiple seeds sensitivity using a single seed with threshold

EGIDI, Lavinia;MANZINI, Giovanni
2015-01-01

Abstract

Spaced seeds are a fundamental tool for similarity search in biosequences. The best sensitivity/selectivity trade-offs are obtained using many seeds simultaneously: This is known as the multiple seed approach. Unfortunately, spaced seeds use a large amount of memory and the available RAM is a practical limit to the number of seeds one can use simultaneously. Inspired by some recent results on lossless seeds, we revisit the approach of using a single spaced seed and considering two regions homologous if the seed hits in at least t sufficiently close positions. We show that by choosing the locations of the don't care symbols in the seed using quadratic residues modulo a prime number, we derive single seeds that when used with a threshold t > 1 have competitive sensitivity/selectivity trade-offs, indeed close to the best multiple seeds known in the literature. In addition, the choice of the threshold t can be adjusted to modify sensitivity and selectivity a posteriori, thus enabling a more accurate search in the specific instance at issue. The seeds we propose also exhibit robustness and allow flexibility in usage.
File in questo prodotto:
File Dimensione Formato  
qr-jbcb-selfarch.pdf

Open Access dal 01/01/2017

Descrizione: main paper
Tipologia: Documento in Post-print
Licenza: Creative commons
Dimensione 431.63 kB
Formato Adobe PDF
431.63 kB Adobe PDF Visualizza/Apri
jbcb2015.pdf

file disponibile solo agli amministratori

Descrizione: main paper
Tipologia: Versione Editoriale (PDF)
Licenza: DRM non definito
Dimensione 688.12 kB
Formato Adobe PDF
688.12 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11579/70359
Citazioni
  • ???jsp.display-item.citation.pmc??? 2
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 4
social impact