As a first step in designing relatively-compressed data structures—i.e., such that storing an instance for one dataset helps us store instances for similar datasets—we consider how to compress spaced suffix arrays relative to normal suffix arrays and still support fast access to them. This problem is of practical interest when performing similarity search with spaced seeds because using several seeds in parallel significantly improves their performance, but with existing approaches we keep a separate linear-space hash table or spaced suffix array for each seed. We first prove a theoretical upper bound on the space needed to store a spaced suffix array when we already have the suffix array. We then present experiments indicating that our approach works even better in practice.
Compressed Spaced Suffix Arrays
MANZINI, Giovanni;
2017-01-01
Abstract
As a first step in designing relatively-compressed data structures—i.e., such that storing an instance for one dataset helps us store instances for similar datasets—we consider how to compress spaced suffix arrays relative to normal suffix arrays and still support fast access to them. This problem is of practical interest when performing similarity search with spaced seeds because using several seeds in parallel significantly improves their performance, but with existing approaches we keep a separate linear-space hash table or spaced suffix array for each seed. We first prove a theoretical upper bound on the space needed to store a spaced suffix array when we already have the suffix array. We then present experiments indicating that our approach works even better in practice.File | Dimensione | Formato | |
---|---|---|---|
cssa_selfarch.pdf
Open Access dal 04/02/2018
Descrizione: Articolo Principale
Tipologia:
Documento in Post-print
Licenza:
Creative commons
Dimensione
240.15 kB
Formato
Adobe PDF
|
240.15 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.