Two papers on Flow Decomposition accepted at RECOMB 2022

Flow decomposition lies at the core of the multi-assembly problem in bioinformatics, whose main objective is to reconstruct each genomic sequence from a set of mixed reads sequenced from them. The Graph Algorithms Team has contributed to this problem with a new ILP (Integer Linear Program) formulation for minimum flow decomposition and a “safe and complete” algorithm. Both works will be presented in RECOMB 2022.

A popular approach to solve multi-assembly consists of building a weighted acyclic graph from the different reads and their abundances in the mixed sample. The weights in the graph form an (approximate) flow, and the different genomic sequences (and their abundances) can be derived by decomposing such flow into a set of weighted paths explaining it.  

In “Fast, Flexible, and Exact Minimum Flow Decompositions via ILP” a team formed by researchers from Montana State University and University of Helsinki (Postdoctoral  
Researcher Fernando Dias and Associate Professor Alexandru Tomescu) consider the NP-hard problem of decomposing the flow into the minimum number of paths. They propose a new Integer Linear Program (ILP) encoding the exponentially many solution paths using only a quadratic number of variables. This significant result will lead to more scalable, faster, and more flexible multi-assemblers based on ILP.  

Pre-print: https://arxiv.org/abs/2201.10923 

In “Safety and Completeness in Flow Decompositions for RNA Assembly” researchers from Montana State University and University of Helsinki (Research Assistant Milla Kortelainen, Doctoral Researcher Manuel Caceres, Postdoctoral researcher Shahbaz Khan and Associate Professor Alexandru Tomescu) studied the problem of finding sub-paths that are common to all flow decompositions. They developed a “safe and complete” algorithm that reports all 100% precise sub-solutions that can be derived from the flow, instead of reporting one candidate solution. Their algorithm efficiently finds all safe and complete paths, and these are shown to have better quality compared to existing methods.  

Pre-print: https://arxiv.org/abs/2201.10372 

Both research works have been accepted to the 26th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2022) and will be published in the conference proceedings.