Compressed Data Structures

The team led by Professor Simon J. Puglisi focuses on efficient algorithms and data structures for searching, storing, and manipulating strings, trees and graphs, and applications thereof (like bioinformatics, information retrieval, database systems, and data mining).

Compressed data structures lie at the intersection of data structures and data compression. Often, if a particular data set is compressible, a data structure built from it is compressible too. The aim of compressed data structuring is to reduce the space consumed by a data structure using techniques from data compression, but to simultaneously support fast operations on the data structure too. Compressed data structures (and their cousins, succinct data structures) are a relatively young branch of computer science, but have already found heavy use in bioinformatics, database systems, and search engines.

    Team members

    Selected Software

    Selected Recent Publications

    • Jarno N. Alanko, Jaakko Vuohtoniemi, Tommi Mäklin, and Simon J. Puglisi. Themisto: a scalable colored k-mer index for sensitive pseudoalignment against hundreds of thousands of bacterial genomes. Bioinformatics, 39(Supplement 1):i260–i269, 2023 [Article online] [Code]
    • Jarno N. Alanko, Simon J. Puglisi, Jaakko Vuohtoniemi:
      Small Searchable k-Spectra via Subset Rank Queries on the Spectral Burrows-Wheeler Transform. ACDA 2023: 225-236 [Article online] [Code]
    • Jarno N. Alanko, Elena Biagi, Simon J. Puglisi:
      Longest Common Prefix Arrays for Succinct k-Spectra. SPIRE 2023: 1-13 [Article online] [Code]
    • Diego Díaz-Domínguez, Saska Dönges, Simon J. Puglisi, Leena Salmela:
      Simple Runs-Bounded FM-Index Designs Are Fast. SEA 2023: 7:1-7:16 [Article online] [Code]
      Dominik Köppl, Simon J. Puglisi, Rajeev Raman:
      Fast and Simple Compact Hashing via Bucketing. Algorithmica 84(9): 2735-2766 (2022) [Article online]
    • Saska Dönges, Simon J. Puglisi, Rajeev Raman:
      On Dynamic Bitvector Implementations. DCC 2022: 252-261. [Article online] [Code]
    • Block Trees, Journal of Computer and System Sciences 117:1-22 (2021). [Article online]
    • L. Salmela, K. Mukherjee, S.J. Puglisi, M.D. Muggli, and C. Boucher: Fast and accurate correction of optical mapping data via spaced seeds. Bioinformatics 36(3): 682-689 (2020). [Article online] [Implementation]
    • M. Cáceres, S.J. Puglisi, B. Zhukova: Fast Indexes for Gapped Pattern Matching. Proceedings of SOFSEM 2020: 493-504. [Article online]
    • S. Gog, J. Kärkkäinen, D. Kempa, M. Petri, S.J. Puglisi: Fixed Block Compression Boosting in FM-Indexes: Theory and Practice, Algorithmica 81(4): 1370-1391 (2019). [Article online]
    • Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi, Bella Zhukova: Engineering External Memory Induced Suffix Sorting. ALENEX 2017: 98-108
    • Juha Kärkkäinen, Marcin Piatkowski, Simon J. Puglisi: String Inference from Longest-Common-Prefix Array. ICALP 2017: 62:1-62:14
    • Djamal Belazzougui, Simon J. Puglisi: Range Predecessor and Lempel-Ziv Parsing. SODA 2016: 2053-2071
    • Yasuo Tabei, Hiroto Saigo, Yoshihiro Yamanishi, Simon J. Puglisi: Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices. KDD 2016: 1875-1884

    Recent News/Activities

    December 2023

    Simon visiting Hideo Bannai in Tokyo.

    Touko Puro receives his MSc degree.

    November 2023

    Simon is an opponent at Garance Gourdel's PhD defence at Ecole Normale Superieur in Paris.

    October 2023

    Simon is an opponent at Max Pedersen's PhD defence at TU Denmark.

    Elena, Jarno, and Simon attend the Symposium on String Processing and Information Retrieval (SPIRE) in Pisa. Elena presenting work on Longest-common-suffix array construction.

    July 2022

    Simon attends the Annual Symposium on Combinatorial Pattern Matching in Paris.

    June 2022

    Elena, Saska, and Simon attend the Symposium on Experimental Algorithms (SEA) in Barcelona.

    Elena, Jarno, and Simon visit Zamin Iqbal's lab at the European Bioinformatics Institute.

    May 2023

    "Small Searchable k-Spectra via Subset Rank Queries on the Spectral Burrows-Wheeler Transform" wins the Best Paper Award at the SIAM Conference on Applied and Computational Discrete Algorithms in Seattle, USA.

    October 2022

    Elena Biagi (late of U. Verona) is starting a PhD in our group!

    April 2022

    Saska presents at the Data Compression Conference in Snowbird, Utah.

    June 2021

    Weighted-level ancestor queries revisited wins the Alberto Apostolico Best Paper Award at the Annual Symposium on Combinatorial Pattern Matching.

    March 2021

    Papers accepted to CPM and SEA:

    D. Belazzogui, D. Kosolobov, S. J. Puglisi, & R. Raman, "Weighted-level ancestor queries revisited", to appear at CPM
    S. J. Puglisi & B. Zhukova, "Document Listing Hacks", to appeat at SEA

    January 2021

    Papers appear in print at JCSS (on block trees) and TCS (on suffix tree breadth)
    Saska Dönges starts as a PhD student

    December 2020

    Three journal papers appear this month at Bioinformatics, Algorithmica, and Algorithms + two papers accepted to the 2021 Data Compression Conference:

    Simon J. Puglisi & Bella Zhokova, "Smaller RLZ-compressed suffix arrays"
    Danyang Ma, Simon J. Puglisi, Rajeev Raman & Bella Zhokova, "On Elias-Fano for Rank Queries in FM-Indexes"

    November 2020
    Lauri Heino receives his MSc degree

    September 2020
    Pekka Jylha-Ollila receives his MSc degree

    July 2020
    Jussi Timonen, Antti Karjalainen, and Risto Haapasalmi receive their MSc degrees
    Bella and Simon have paper "Relative Lempel-Ziv compression of suffix array" accepted to SPIRE 2020

    May 2020
    Simon is opponent for Florian Kurpicz's PhD thesis defence at TU Dortmund

    March 2020
    Two papers accepted to SEA 2020 in Palermo

    February 2020
    Simon attending London Stringology Days at King's College London

    January 2020
    Bella presenting "Fast Indexes for Gapped Pattern Matching" at SOFSEM 2020 in Cyprus

    December 2019
    Simon visiting Yasuo Tabei and Hideo Bannai at RIKEN AIP Tokyo
    Simon visiting Alistair Moffat and Joel MacKenzie at U. Melbourne

    November 2019
    Dustin Cobas visiting from U. Chile

    October 2019
    Simon co-chairing (with Nieves Brisaboa) SPIRE 2019 conference in Segovia, Spain
    Bella attending SPIRE 2019 and WCTA 2019 in Segovia, Spain

    September 2019
    Javier Soto visiting from U. Concepcion

    August 2019
    Mabel Vidal visiting from U. Concepcion

    July 2019
    Rajeev Raman visiting from U. Leicester

    June 2019
    Massimiliano Rossi visiting from U. Verona

    May 2019
    Simon visiting Rayan Chikhi's group at Institute Pasteur in Paris
    Simon visiting Golnaz Badkobeh at Goldsmiths University of London
    Rayan Chikhi and Camille Marchet visiting from Institute Pasteur in Paris for Helsinki Bioinformatics Day
    Sara Giuliani starting as a summer research intern
    Uula Ulkuniemi starting us as a summer research intern

    Alumni/former students

    • Touko Puro (MSc, U. Helsinki, December 2023)
    • Eve Kivivuori (MSc, U. Helsinki, July 2023)
    • Heikki Lehtosalo (Research Programmer, Nov 2019-Nov 2020)
    • Lauri Heino (MSc, U. Helsinki, November 2020)
    • Pekka Jylha-Ollila (MSc, U. Helsinki, September 2020)
    • Antti Karjalainen (MSc, U. Helsinki, July 2020)
    • Risto Haapasalmi (MSc, U. Helsinki, July 2020)
    • Jussi Timonen (MSc, U. Helsinki, July 2020)
    • Sara Giuliani (summer research intern, May-July 2019)
    • Uula Ulkuniemi (summer research intern, May-July 2019)
    • Joonsa Nietosvaara (MSc, U. Helsinki, graduated 2019)
    • Massimiliano Rossi (visiting PhD student from U. Verona, July-October 2018)
    • Christopher Hoobin (PhD, RMIT, graduated 2015, now at NASDAQ)
    • Jasbir Dhaliwal (PhD, RMIT, graduated 2014, now at Monash U., formerly at IBM)
    • Alex Bowe (MSc, RMIT, graduated 2013, now at Cruise Automation, PhD from U. Tokyo)
    • Shanika Kuruppu (PhD, U. Melbourne, graduated 2013, now at Google, co-supervised with Justin Zobel)