Topics for Master's Thesis

Below you can find topics that are currently available in the Algorithmic Data Science group.

IMPORTANT Please make sure to read these instructions for how to start and complete your thesis in our group.

Topics

Topic 1: Materialization of Bayesian Networks

Variable Elimination is a fundamental algorithm for probabilistic inference over Bayesian Networks (BNs). In recent work, we have developed a materialization scheme for BN models, such that the expected running time of Variable Elimination over a given query workload is optimal under given space constraints.

Current limitations and possible thesis contributions:

  • The current implementation assumes that the Bayesian network remains fixed over time. If even a small part of the BN gets updated, then the materialization has to be performed again from scratch for the entire BN. Thesis contribution: Adapt the current implementation to enable incremental updates to the BN materialization.
  • The current implementation uses the entire Bayesian network for answering any possible inference queries. In fact, however, part of the Bayesian network may be redundant, depending on the inference query. To take such redundancy into account, we have devised a materialization scheme to take into account redundancy. Thesis contribution: Implement the redundancy-aware materialization scheme.

Existing implementation:
Query the Model (QtM) https://github.com/aslayci/qtm

Reference:

Aslay, Cigdem, Martino Ciaperoni, Aristides Gionis, and Michael Mathioudakis. "Workload-aware materialization for efficient variable elimination on Bayesian networks." In Proceedings of the IEEE International Conference on Data Engineering, 2021. Extended version at https://arxiv.org/abs/1904.00079.

Topic 2: Node Classification on Large Networks

Semi-supervised node classification is a common task in network analysis. Given the labels for some of the nodes in a network, we aim to predict the labels for the remaining nodes. In our on-going work (Merchant and Mathioudakis, 2020), we have developed a novel approach for this task that is based on network embeddings. However, computing the embedding of a given network is a bottleneck for this approach.

Thesis contribution: Enhance the existing implementation of the proposed approach to make it scalable for larger networks, e.g., via parallelization, or state-of-the-art techniques for computation of network embeddings (Akyildiz et.al., 2020; Zhao et.al., 2020).

Existing implementation:
JANE https://version.helsinki.fi/ads/jane.

References:

Merchant, Arpit, and Michael Mathioudakis. "Joint Use of Node Attributes and Proximity for Semi-Supervised Classification on Graphs". International Conference on Complex Networks. 2021. Springer-Verlag (to appear). Pre-print: https://arxiv.org/abs/2010.11536

Akyildiz, Taha Atahan, Amro Alabsi Aljundi, and Kamer Kaya. 2020. GOSH: Embedding Big Graphs on Small Hardware. In 49th International Conference on Parallel Processing - ICPP (ICPP '20). Association for Computing Machinery, New York, NY, USA, Article 4, 1–11. DOI:https://doi.org/10.1145/3404397.3404456

Zhao, Zhiqiang, and Ying Zhang, and Zhuo Feng. 2021. Towards Scalable Spectral Embedding and Data Visualization via Spectral Coarsening. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM '21). Association for Computing Machinery, New York, NY, USA, 869–877. DOI: https://doi.org/10.1145/3437963.3441767

Completed Theses

See here: https://www2.helsinki.fi/en/researchgroups/algorithmic-data-science/people#section-98775.