Research Topics - Graph Algorithms

Here is a selection of representative research topics of the Graph Algorithms team (Algorithmic Bioinformatics group).
Safe and Complete Algorithms

Computational problems are usually proposed as models of real-world problems. In some cases we are interested in just a simple answer to a problem, e.g., any route from A to B minimizing the amount of fuel consumed. However, other problems are only approximately modeled by a mathematical formulation, and the best solution may depend on more complex criteria, unavailable when modeling the problem. One established way of coping with this is to enumerate all solutions, or only the first k best solutions. However, in many cases this is simply unfeasible, since there can be a large, even exponential, number of optimal solutions.

Another way to cope with issue, interesting from both a theoretical and a practical viewpoint, is to consider only those partial solutions that are common to all solutions. We call these safe. An algorithm outputting all safe partial solutions is called safe and complete. A practical reason why we are intersted in safe and complete algorithms comes from Bioinformatics, and more precisely from the analysis of high-throughput sequencing data. 

 

Our key contributions until 2019 are:

For further details and more recent papers on this topic, check the page of our ERC Starting Grant "Safe and Complete Algorithms for Bioinformatics".

String Problems Generalized to Labeled Graphs (and Pan-genomics)

Many problems on strings can be naturally generalized to labeled graphs, since a string is just a labeled path. We work problems of the following type: given a string and a labeled graph, find an "occurrence" of the string in the graph. There are several ways to define "occurrence", the simplest of which being "a path whose concatenation of labels equals the given string". This is relevant in Bioinformatics (in Pan-genomics) since genomic references are being replaced by labeled graphs, which encoding not only a single reference, but also all the variation observed in a population.

The following publications deal with such problems:

Genome Assembly

The genome assembly problem asks for the reconstructing for a genome sequence from many small fragments sequenced from it, for example with high-throughut sequencing. Genome assembly is nowadays split into different stages, such as contig assembly, scaffolding, gap filling. Our key contributions to this field are:

Applications of Network Flows in Bioinformatics

Network flows, and in particular min-cost flows, are a computational model generalizing several classical problems on graphs. We were among the first to propose their application to the problem of assembling RNA transcripts from RNA-Seq data:

Later applications of network flows co0authored by our team are:

We described some of these applications in our Bioinformatics textbook:

Tumor Phylogenies

By sequecing several samples of a tumor, we can discover the mutations present in each of these, and further use them to reconstruct the history of the tumor, for example as a phylogenetic tree. We did some work involcing such reconstructions, by assuming that the tumor evolution was a perfect phylogeny:

Graph-Theoretic Problems on Sets

Well-founded sets are sets which contain as elements only other sets. They can be interpreted as directed graphs, by viewing each set as a node, and each membership relation as a directed edge between the two sets. Once this translation is made, there is a wide range of graph-theoretic questions that can be asked about such directed graphs corresponding to sets. Below is a list of publications dealing with such questions:

We discuss some of these result in this monograph: