Compressed Data Structures

The team led by Associate Professor Simon J. Puglisi focuses on efficient algorithms and data structures for searching, storing, and manipulating strings, trees and graphs, and applications thereof (like bioinformatics, information retrieval, database systems, and data mining).

Compressed data structures lie at the intersection of data structures and data compression. Often, if a particular data set is compressible, a data structure built from it is compressible too. The aim of compressed data structuring is to reduce the space consumed by a data structure using techniques from data compression, but to simultaneously support fast operations on the data structure too. Compressed data structures (and their cousins, succinct data structures) are a relatively young branch of computer science, but have already found heavy use in bioinformatics, database systems, and search engines. 

Team members

  • Simon J. Puglisi (Associate Professor)
  • Jarno Alanko (Postdoctoral Fellow)
  • Bella Zhukova (PhD student, U. Helsinki, current)
  • Saska Dönges (PhD student, U. Helsinki, current)
  • Silvia Nepšinská (Research Programmer, current, co-supervised with Leena Salmela)
  • Aki Utoslahti (MSc student, U. Helsinki, current)
  • Martin Vidjeskog (MSc student, U. Helsinki, current)
  • Arttu Kilpinen (MSc student, U. Helsinki, current)
  • Johannes Verwijnen (MSc student, U. Helsinki, current)
  • Yan Zhengtong (PhD student, U. Helsinki, current, co-supervied with Jiaheng Lu)

Selected Recent Publications

  • Block Trees, Journal of Computer and System Sciences 117:1-22 (2021). [Article online]
  • L. Salmela, K. Mukherjee, S.J. Puglisi, M.D. Muggli, and C. Boucher: Fast and accurate correction of optical mapping data via spaced seeds. Bioinformatics 36(3): 682-689 (2020). [Article online] [Implementation]
  • M. Cáceres, S.J. Puglisi, B. Zhukova: Fast Indexes for Gapped Pattern Matching. Proceedings of SOFSEM 2020: 493-504. [Article online]
  • S. Gog, J. Kärkkäinen, D. Kempa, M. Petri, S.J. Puglisi: Fixed Block Compression Boosting in FM-Indexes: Theory and Practice, Algorithmica 81(4): 1370-1391 (2019). [Article online]
  • A. Poyias, S.J. Puglisi, R. Raman: m-Bonsai: A Practical Compact Dynamic Trie. Int. J. Found. Comput. Sci. 29(8): 1257-1278 (2018)
  • Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi, Bella Zhukova: Engineering External Memory Induced Suffix Sorting. ALENEX 2017: 98-108
  • Juha Kärkkäinen, Marcin Piatkowski, Simon J. Puglisi: String Inference from Longest-Common-Prefix Array. ICALP 2017: 62:1-62:14
  • Djamal Belazzougui, Simon J. Puglisi: Range Predecessor and Lempel-Ziv Parsing. SODA 2016: 2053-2071
  • Yasuo Tabei, Hiroto Saigo, Yoshihiro Yamanishi, Simon J. Puglisi: Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices. KDD 2016: 1875-1884

Recent News/Activities

March 2021

Papers accepted to CPM and SEA:

D. Belazzogui, D. Kosolobov, S. J. Puglisi, & R. Raman, "Weighted-level ancestor queries revisited", to appear at CPM
S. J. Puglisi & B. Zhukova, "Document Listing Hacks", to appeat at SEA

January 2021

Papers appear in print at JCSS (on block trees) and TCS (on suffix tree breadth)
Saska Dönges starts as a PhD student

December 2020

Three journal papers appear this month at Bioinformatics, Algorithmica, and Algorithms + two papers accepted to the 2021 Data Compression Conference:

Simon J. Puglisi & Bella Zhokova, "Smaller RLZ-compressed suffix arrays"
Danyang Ma, Simon J. Puglisi, Rajeev Raman & Bella Zhokova, "On Elias-Fano for Rank Queries in FM-Indexes"

November 2020
Lauri Heino receives his MSc degree

September 2020
Pekka Jylha-Ollila receives his MSc degree

July 2020
Jussi Timonen, Antti Karjalainen, and Risto Haapasalmi receive their MSc degrees
Bella and Simon have paper "Relative Lempel-Ziv compression of suffix array" accepted to SPIRE 2020

May 2020
Simon is opponent for Florian Kurpicz's PhD thesis defence at TU Dortmund

March 2020
Two papers accepted to SEA 2020 in Palermo

February 2020
Simon attending London Stringology Days at King's College London

January 2020
Bella presenting "Fast Indexes for Gapped Pattern Matching" at SOFSEM 2020 in Cyprus

December 2019
Simon visiting Yasuo Tabei and Hideo Bannai at RIKEN AIP Tokyo
Simon visiting Alistair Moffat and Joel MacKenzie at U. Melbourne

November 2019
Dustin Cobas visiting from U. Chile

October 2019
Simon co-chairing (with Nieves Brisaboa) SPIRE 2019 conference in Segovia, Spain
Bella attending SPIRE 2019 and WCTA 2019 in Segovia, Spain

September 2019
Javier Soto visiting from U. Concepcion

August 2019
Mabel Vidal visiting from U. Concepcion

July 2019
Rajeev Raman visiting from U. Leicester

June 2019
Massimiliano Rossi visiting from U. Verona

May 2019
Simon visiting Rayan Chikhi's group at Institute Pasteur in Paris
Simon visiting Golnaz Badkobeh at Goldsmiths University of London
Rayan Chikhi and Camille Marchet visiting from Institute Pasteur in Paris for Helsinki Bioinformatics Day
Sara Giuliani starting as a summer research intern
Uula Ulkuniemi starting us as a summer research intern

Alumni/former students

  • Heikki Lehtosalo (Research Programmer, Nov 2019-Nov 2020)
  • Lauri Heino (MSc, U. Helsinki, November 2020)
  • Pekka Jylha-Ollila (MSc, U. Helsinki, September 2020)
  • Antti Karjalainen (MSc, U. Helsinki, July 2020)
  • Risto Haapasalmi (MSc, U. Helsinki, July 2020)
  • Jussi Timonen (MSc, U. Helsinki, July 2020)
  • Sara Giuliani (summer research intern, May-July 2019)
  • Uula Ulkuniemi (summer research intern, May-July 2019)
  • Joonsa Nietosvaara (MSc, U. Helsinki, graduated 2019)
  • Massimiliano Rossi (visiting PhD student from U. Verona, July-October 2018)
  • Christopher Hoobin (PhD, RMIT, graduated 2015, now at NASDAQ)
  • Jasbir Dhaliwal (PhD, RMIT, graduated 2014, now at Monash U., formerly at IBM)
  • Alex Bowe (MSc, RMIT, graduated 2013, now at Cruise Automation, PhD from U. Tokyo)
  • Shanika Kuruppu (PhD, U. Melbourne, graduated 2013, now at Google, co-supervised with Justin Zobel)