The methodological core of the Discovery Research Group lies in data mining and machine learning. These are complemented by language technology, including both text mining/natural language analysis and natural language generation. We collaborate in software architecture research for self-adaptive systems. Most of our research takes place in the context of computational creativity. The wider fields we identify with are artificial intelligence and data science.
Our research methodology consists of identifying relevant computational problems, developing new concepts and algorithms for them, and building systems that apply the algorithms. For our scientific results in all the above fields, see the Publications page.
Our work on computational creativity also aims at producing creative results. Please see the Art page for artistic results.
Embeddia — Cross-Lingual Embeddings for Less-Represented Languages in European News Media
Within the European Union, access to fundamental resources such as local news and government services is limited by the great diversity of the EU’s 37 languages. The Embeddia project seeks to address these challenges by leveraging innovations in the use of cross-lingual embeddings to allow existing monolingual resources to be used across languages. For this, the project will develop novel multilingual solutions in the domain of news analysis and media production. (Funding: EU H2020, 2019-2021)
NewsEye — A Digital Investigator for Historical Newspapers
Newspapers collect information about cultural, political and social events in a more detailed way than any other public record. In the last decades, tens of millions of newspaper pages from European libraries have been digitized and made available online. Whilst the broad public shows general interest in this historical and cultural resource, it is of crucial importance for many humanities scholars. The multi-disciplinary NewsEye project involves national libraries, humanities and social science research groups and computer science research groups to address a number of challenges in text recognition, text analysis, natural language processing, computational creativity and natural language generation; in digital newspaper research; in digital humanities; as well as in history. (Funding: EU H2020, 2018-2021)
Script Generation for Radio Drama
Can computational creativity be used to generate dialogue for radio plays? How can research in dramaturgy, conversation analysis, and computer science come together to create such a system? We study computational generation of dialogue for radio plays in collaboration with Uniarts Helsinki (Prof. Otso Huopaniemi, Dramaturgy and Playwriting), the Department of Finnish, Finno-Ugrian and Scandinavian Studies (Prof. Marja-Leena Sorjonen, Finnish) and the Finnish broadcasting company YLE. The first outcome of the ongoing project, a series of generated dialogues, was produced in autumn 2019. YLE plans to produce and broadcast these scripts as part of its radio drama programming. (Funding: YLE, Helsinki Institute for Information Technology, the Department of Computer Science and the Department of Finnish, Finno-Ugrian and Scandinavian Studies, 2019-2020).
Cooperation-Aware Software and Creative Self-Adaptivity — CACS
We develop models and architectures for intelligently self-adaptive, collaborative software components. We combine research in software architectures with research in artificial intelligence, more specifically computational creativity. We use computational creativity to empower intelligent software components or agents with the capability to communicate and cooperate in novel and valuable ways in unanticipated situations. Work on self-adaptive and collaborative systems will advance the design of autonomous and resilient agents, with potential applications e.g. in industrial Internet-of-Things, in collaborating gadgets of a smart home, or in earthquake area rescue service with drones, crawlers and robots deployed in high numbers. Despite its basic research nature, the project has high potential for practical impact via the software-intensive industries. (Funding: Academy of Finland, 2018-2019)
Digital language typology (DLT) is a multi-disciplinary project that aims to produce a computer-based platform for assessing the structurally manifested family relationships within any set of languages for which sufficiently large digital text and speech material is available. To this end, we have assembled a group of specialists from phonetics, linguistics, and computer science. DLT is part of the Academy of Finland Digital Humanities programme, which covers novel methods and techniques in which digital technologies and state-of-the-art computational science methods are used for collecting, managing, and analysing data in humanities and social sciences research. The principal investigators are Martti Vainio (University of Helsinki, coordinator), Hannu Toivonen (University of Helsinki), and Markku Turunen (University of Tampere). (Funding: Academy of Finland, 2016-2019).
In this project we investigate computational linguistic creativity, i.e., the ability of computers to act in verbally creative ways. Such creative skills will give computers more flexibility in their verbal communication with users. Software with creative skills can also be used to build tools that help people use language in novel and creative ways. We develop novel methods inspired by text mining for computational linguistic creativity, especially for supporting human creativity, and we also investigate the use of these methods as pedagogical tools in primary and secondary schools. The project combines computer science, data mining and computational creativity with pedagogy and the use of digital technology in education. (Funding: Academy of Finland, 2014-2018)
The demise of the old strategies of newspaper publishers has created an urgent need for a radical transformation of operations. The aim of this project is to develop new strategies based on evolving technical solutions. We propose a holistic strategic approach: a new ecosystem for news that will open up an economically viable and technically sophisticated approach to news production and consumption. The research consortium is formed by the Swedish School of Social Science, University of Helsinki, together with the Department of Computer Science, University of Helsinki, and VTT Technical Research Centre of Finland Ltd; the project involves collaboration with several Finnish media houses. (Funding: Tekes and companies, 2017-2018).
Concept Creation Technology (ConCreTe), Promoting the Scientific Exploration of Computational Creativity (PROSECCO)
Computational creativity is a new area of computer science, the goal of which is to model, simulate and enhance creativity. We capitalise on our data mining background by investigating the discovery and use of patterns in creative systems. Our current research topics include automatic production of creative texts, especially computational poetry and machine humor, and also music. We participate in the ConCreTe: Concept Creation Technology project, and we are partners in the co-ordination action PROSECCO: Promoting the Scientific Exploration of Computational Creativity. (Funding: EU FP7, 2013-2016)
We view biological databases of sequences, proteins, genes etc. as weighted graphs and develop methods for link discovery and analysis in such graphs. Try out the prototype search engine at biomine.cs.helsinki.fi! We are also affiliated with InterPregGen: Genetic studies of pre-eclampsia in Central Asian & European populations (EU FP7, participation with Hannele Laivuori). (Funding: National Technology Agency (Tekes) and companies) (project web page)
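As a rough illustration of what link discovery in a weighted graph can look like, the following minimal Python sketch finds the most reliable path between two nodes. It is not the Biomine implementation: the function name, node names and weights are all invented for illustration, and it assumes that each edge weight is a probability in (0, 1] and that the strength of a path is the product of its edge weights.

```python
import heapq
import math


def best_path(graph, source, target):
    """Return (probability, path) for the most reliable path from source to target.

    graph maps each node to a dict {neighbor: weight}, with weights in (0, 1].
    Maximizing the product of weights is the same as minimizing the sum of
    -log(weight), so ordinary Dijkstra search applies.
    """
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return math.exp(-cost), path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in settled:
                new_cost = cost - math.log(weight)
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    # No path exists between the two nodes.
    return 0.0, []


# Toy, invented data: these nodes and weights are hypothetical.
toy_graph = {
    "gene:A": {"protein:P1": 0.9, "article:X": 0.4},
    "protein:P1": {"gene:A": 0.9, "phenotype:D": 0.7},
    "article:X": {"gene:A": 0.4, "phenotype:D": 0.9},
    "phenotype:D": {"protein:P1": 0.7, "article:X": 0.9},
}

probability, path = best_path(toy_graph, "gene:A", "phenotype:D")
print(path, round(probability, 2))
```

In this toy graph, the route through the protein node is preferred over the route through the article node because the product of its edge weights is higher. A realistic system would likely consider many paths and subgraphs rather than a single best path.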
Data and text mining
We participate in two programmes of Tivit, the Strategic Centre for Science, Technology and Innovation in the Field of ICT. In the Next Media research programme, we develop methods for on-line analysis and surveillance of social media for local news in the Software Newsroom project. In the Security Ecosystem of the Data to Intelligence research programme, in turn, the goal of the consortium is to invent products, solutions and services that use security and related data sources to provide added value to the customers and good business for the providers. (Funding: Tekes)
The aim is to develop and validate a novel computational methodology, which facilitates bisociative information discovery in large-scale heterogeneous information environments. (Funding: European Commission under the Framework 7 programme.) (project web site)
The Context project studies the characterization and analysis of information about the user's context and its use in proactive adaptivity. We have developed data analysis algorithms as well as ContextPhone, a mobile context-aware prototyping platform, available as free software. (Funding: Academy of Finland, PROACT Programme.) (project web page)
We develop models, methods and tools for analyzing genetic data, in particular for gene mapping and haplotype analysis. (Funding: National Technology Agency (Tekes) and companies in several projects, HIIT.) (project web page)