All the software tools and databases are freely available.
drda: An R package for dose-response data analysis
Analysis of dose-response data is an important step in many scientific disciplines, including but not limited to pharmacology, toxicology, and epidemiology. The R package drda is designed to facilitate the analysis of dose-response data by implementing efficient and accurate functions with a familiar interface. With drda, it is possible to fit models by the method of least squares, perform goodness of fit tests, and conduct model selection. Compared to other similar packages, drda provides, in general, more accurate estimates in the least-squares sense. This result is achieved by a smart choice of the starting point in the optimization algorithm and by implementing the Newton method with a trust region with analytical gradients and Hessian matrices. In this article, drda is presented through the description of its methodological components and examples of its user-friendly functions. Performance is finally evaluated using a real, large-scale drug sensitivity screening dataset.
Minimal information for Chemosensitivity assays (MICHA): A next-generation pipeline to enable the FAIRification of drug screening experiments
Chemosensitivity assays are commonly used for preclinical drug discovery and clinical trial optimization. However, data from independent assays are often discordant, largely attributed to uncharacterized variation in the experimental materials and protocols. We report here the launching of MICHA (Minimal Information for Chemosensitivity Assays), accessed via https://micha-protocol.org. Distinguished from existing efforts that are often lacking support from data integration tools, MICHA can automatically extract publicly available information to facilitate the assay annotation including: 1) compounds, 2) samples, 3) reagents, and 4) data processing methods. For example, MICHA provides an integrative web server and database to obtain compound annotation including chemical structures, targets, and disease indications. In addition, the annotation of cell line samples, assay protocols and literature references can be greatly eased by retrieving manually curated catalogues. Once the annotation is complete, MICHA can export a report that conforms to the FAIR principle (Findable, Accessible, Interoperable and Reusable) of drug screening studies. To consolidate the utility of MICHA, we provide FAIRified protocols from five major cancer drug screening studies, as well as six recently conducted COVID-19 studies. With the MICHA webserver and database, we envisage a wider adoption of a community-driven effort to improve the open access of drug sensitivity assays.
Web application: https://micha-protocol.org
 Brief Bioinform. 2021:bbab350. doi: 10.1093/bib/bbab350. Epub ahead of print. PMID: 34472587.
Drug combination therapy has the potential to enhance efficacy, reduce dose-dependent toxicity and prevent the emergence of drug resistance. However, discovery of synergistic and effective drug combinations has been a laborious and often serendipitous process. In recent years, identification of combination therapies has been accelerated due to the advances in high-throughput drug screening, but informatics approaches for systems-level data management and analysis are needed. To contribute toward this goal, we created an open-access data portal called DrugComb (https://drugcomb.org) where the results of drug combination screening studies are accumulated, standardized and harmonized. Through the data portal, we provided a web server to analyze and visualize users' own drug combination screening data. The users can also effectively participate a crowdsourcing data curation effect by depositing their data at DrugComb. To initiate the data repository, we collected 437 932 drug combinations tested on a variety of cancer cell lines. We showed that linear regression approaches, when considering chemical fingerprints as predictors, have the potential to achieve high accuracy of predicting the sensitivity of drug combinations. All the data and informatics tools are freely available in DrugComb to enable a more efficient utilization of data resources for future drug combination discovery.
 Nucleic Acids Res. 2021 Jun 1; gkab438. doi: 10.1093/nar/gkab438
 Nucleic Acids Res. 2019 Jul 2;47(W1):W43-W51.doi: 10.1093/nar/gkz337
Database link: https://drugcomb.org
Combinatorial therapies have been recently proposed for improving anticancer treatment efficacy. SynergyFinder R package is a software tool to analyze pre-clinical drug combination datasets developed in our group. We report the major updates of the R package to improve the interpretation and annotation of drug combination screening results. Compared to the existing implementations, the novelty of the updated SynergyFinder R package consists of 1) extending to higher order drug combination data analysis and the implementation of dimension reduction techniques for visualizing the synergy landscape for unlimited number of drugs in a combination; 2) statistical analysis of drug combination synergy and sensitivity with confidence intervals and p-values; 3) incorporating a synergy barometer to harmonize multiple synergy scoring methods to provide a consensus metric of synergy; 4) incorporating the evaluation of drug combination synergy and sensitivity simultaneously to provide an unbiased interpretation of the clinical potential. Furthermore, we provide the annotation of drugs and cell lines that are tested in an experiment, including their chemical information, targets and signaling network information. These annotations shall improve the interpretation of the mechanisms of action of drug combinations. To facilitate the use of the R package for the drug discovery community, we also provide a web server at www.synergyfinder.org that provides a user-friendly interface to enable a more flexible and versatile analysis of drug combination data.
 doi: https://doi.org/10.1101/2021.06.01.446564
 Bioinformatics. 2017 Aug 1;33(15):2413-2415. doi: 10.1093/bioinformatics/btx162.
 Comput Struct Biotechnol J. 2015 Sep 25;13:504-13. doi: 10.1016/j.csbj.2015.09.001
FAIRification of drug target interaction data (drugtargetcommons.org)
Knowledge of the full target space of bioactive substances, approved and investigational drugs as well as chemical probes, provides important insights into therapeutic potential and possible adverse effects. The existing compound-target bioactivity data resources are often incomparable due to non-standardized and heterogeneous assay types and variability in endpoint measurements. To extract higher value from the existing and future compound target-profiling data, we implemented an open-data web platform, named Drug Target Commons (DTC), which features tools for crowd-sourced compound-target bioactivity data annotation, standardization, curation, and intra-resource integration. We demonstrate the unique value of DTC with several examples related to both drug discovery and drug repurposing applications and invite researchers to join this community effort to increase the reuse and extension of compound bioactivity data.
 Cell Chem Biol. 2018 Feb 15;25(2):224-229.e2. doi: 10.1016/j.chembiol.2017.11.009
 Database (Oxford). 2018 Jan 1;2018:1-13. doi: 10.1093/database/bay083
 Brief Bioinform. 2021 Mar 22;22(2):1656-1678. doi: 10.1093/bib/bbaa003
We carried out a systematic evaluation of target selectivity profiles across three recent large-scale biochemical assays of kinase inhibitors and further compared these standardized bioactivity assays with data reported in the widely used databases ChEMBL and STITCH. Our comparative evaluation revealed relative benefits and potential limitations among the bioactivity types, as well as pinpointed biases in the database curation processes. Ignoring such issues in data heterogeneity and representation may lead to biased modeling of drugs' polypharmacological effects as well as to unrealistic evaluation of computational strategies for the prediction of drug-target interaction networks. Toward making use of the complementary information captured by the various bioactivity types, including IC50, K(i), and K(d), we also introduce a model-based integration approach, termed KIBA, and demonstrate here how it can be used to classify kinase inhibitor targets and to pinpoint potential errors in database-reported drug-target interactions. An integrated drug-target bioactivity matrix across 52,498 chemical compounds and 467 kinase targets, including a total of 246,088 KIBA scores, has been made freely available.
Citation: Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. J Chem Inf Model. 2014 Mar 24;54(3):735-43. doi: 10.1021/ci400709d.
Download the dataset:
A recent trend in drug development is to identify drug combinations or multi-target agents that effectively modify multiple nodes of disease-associated networks. Such polypharmacological effects may reduce the risk of emerging drug resistance by means of attacking the disease networks through synergistic and synthetic lethal interactions. However, due to the exponentially increasing number of potential drug and target combinations, systematic approaches are needed for prioritizing the most potent multi-target alternatives on a global network level. We took a functional systems pharmacology approach toward the identification of selective target combinations for specific cancer cells by combining large-scale screening data on drug treatment efficacies and drug-target binding affinities. Our model-based prediction approach, named TIMMA, takes advantage of the polypharmacological effects of drugs and infers combinatorial drug efficacies through system-level target inhibition networks. Case studies in MCF-7 and MDA-MB-231 breast cancer and BxPC-3 pancreatic cancer cells demonstrated how the target inhibition modeling allows systematic exploration of functional interactions between drugs and their targets to maximally inhibit multiple survival pathways in a given cancer type. The TIMMA prediction results were experimentally validated by means of systematic siRNA-mediated silencing of the selected targets and their pairwise combinations, showing increased ability to identify not only such druggable kinase targets that are essential for cancer survival either individually or in combination, but also synergistic interactions indicative of non-additive drug efficacies. These system-level analyses were enabled by a novel model construction method utilizing maximization and minimization rules, as well as a model selection algorithm based on sequential forward floating search. Compared with an existing computational solution, TIMMA showed both enhanced prediction accuracies in cross validation as well as significant reduction in computation times. Such cost-effective computational-experimental design strategies have the potential to greatly speed-up the drug testing efforts by prioritizing those interventions and interactions warranting further study in individual cancer cases.
 Bioinformatics. 2015 Jun 1;31(11):1866-8. doi: 10.1093/bioinformatics/btv067.
 PLoS Comput Biol. 2013;9(9):e1003226. doi: 10.1371/journal.pcbi.1003226.
Bayesian Analysis of Population Structure
During the most recent decade many Bayesian statistical models and software for answering questions related to the genetic structure underlying population samples have appeared in the scientific literature. Most of these methods utilize molecular markers for the inferences, while some are also capable of handling DNA sequence data. In a number of earlier works, we have introduced an array of statistical methods for population genetic inference that are implemented in the software BAPS. However, the complexity of biological problems related to genetic structure analysis keeps increasing such that in many cases the current methods may provide either inappropriate or insufficient solutions. We discuss the necessity of enhancing the statistical approaches to face the challenges posed by the ever-increasing amounts of molecular data generated by scientists over a wide range of research areas and introduce an array of new statistical tools implemented in the most recent version of BAPS. With these methods it is possible, e.g., to fit genetic mixture models using user-specified numbers of clusters and to estimate levels of admixture under a genetic linkage model. Also, alleles representing a different ancestry compared to the average observed genomic positions can be tracked for the sampled individuals, and a priori specified hypotheses about genetic population structure can be directly compared using Bayes' theorem. In general, we have improved further the computational characteristics of the algorithms behind the methods implemented in BAPS facilitating the analyses of large and complex datasets. In particular, analysis of a single dataset can now be spread over multiple computers using a script interface to the software. The Bayesian modelling methods introduced in this article represent an array of enhanced tools for learning the genetic structure of populations. Their implementations in the BAPS software are designed to meet the increasing need for analyzing large-scale population genetics data. The software is freely downloadable for Windows, Linux and Mac OS X systems at https://web.abo.fi/fak/mnf/mate/jc/software/ or https://www2.helsinki.fi/en/researchgroups/network-pharmacology-for-prec....
 Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations. BMC Bioinformatics. 2008 Dec 16;9:539. doi: 10.1186/1471-2105-9-539
 Bayesian analysis of population structure based on linked molecular information. Math Biosci. 2007 Jan;205(1):19-31. doi: 10.1016/j.mbs.2006.09.015.
 Identifying currents in the gene pool for bacterial populations using an integrative approach.
PLoS Comput Biol. 2009 Aug;5(8):e1000455. doi: 10.1371/journal.pcbi.1000455
 Hyper-recombination, diversity, and antibiotic resistance in pneumococcus. Science. 2009 Jun 12;324(5933):1454-7.doi: 10.1126/science.1171908
Download source code (v5.2) and executable (v6.0). For more recent updates, please contact Prof. Jukka Corander
T-RFLP Bayesian Analysis of Population Structures in Bacteria
The investigation of microbial communities is an essential part of the study of the biosphere. Flexible molecular fingerprinting tools such as terminal-restriction fragment length polymorphism (T-RFLP) analysis are often applied in the studies to enable the characterization of the microbial population. However, such data have so far been primarily analyzed using conventional clustering methods. Here we introduce a Bayesian model-based method for the purpose of comparing microbial communities using T-RFLP data. Such datasets have in general several challenging features, e.g. sparseness, missing values and structurally zero-valued observations. These features are taken into account by developing a Bayesian latent class mixture model for the observations in our framework. To make inferences under the model we use a recent Markov chain Monte Carlo (MCMC) -based method for the Bayesian model selection. To assess the introduced method we analyze both simulated and real datasets. The simulations show that our approach compares preferably to standard statistical clustering tools, such as k-means, hierarchical clustering, and Autoclass. The developed tool is freely available as a software package T-BAPS at http://www.abo.fi/fak/mnf/mate/jc/software/t-baps.html.
Citation: T-BAPS: a Bayesian statistical tool for comparison of microbial communities using terminal-restriction fragment length polymorphism (T-RFLP) data. Stat Appl Genet Mol Biol. 2007;6:Article30. doi: 10.2202/1544-6115.1303