DrugComb - an integrative cancer drug combination data portal

Drug combination therapy has the potential to enhance efficacy, reduce dose-dependent toxicity and prevent the emergence of drug resistance. However, discovery of synergistic and effective drug combinations has been a laborious and often serendipitous process. In recent years, identification of combination therapies has been accelerated by advances in high-throughput drug screening, but informatics approaches for systems-level data management and analysis are needed. To contribute toward this goal, we created an open-access data portal called DrugComb, where the results of drug combination screening studies are accumulated, standardized and harmonized. Through the data portal, we provide a web server for analyzing and visualizing users' own drug combination screening data. Users can also participate in a crowdsourcing data curation effort by depositing their data at DrugComb. To initiate the data repository, we collected 437,932 drug combinations tested on a variety of cancer cell lines. We showed that linear regression approaches, when considering chemical fingerprints as predictors, have the potential to achieve high accuracy in predicting the sensitivity of drug combinations. All the data and informatics tools are freely available in DrugComb to enable a more efficient utilization of data resources for future drug combination discovery.
Citation: Nucleic Acids Res. 2019 Jul 2;47(W1):W43-W51. doi: 10.1093/nar/gkz337
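
The idea of predicting combination sensitivity from chemical fingerprints with linear regression can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the DrugComb paper: the 4-bit toy fingerprints, the drug names, the choice to represent a pair as the element-wise sum of its two fingerprints, the small ridge term that keeps the toy system solvable, and the synthetic sensitivity values.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def pair_features(fp_a: Sequence[int], fp_b: Sequence[int]) -> List[float]:
    """Order-invariant features for a drug pair: intercept plus summed fingerprint bits."""
    return [1.0] + [float(a + b) for a, b in zip(fp_a, fp_b)]


def solve_linear_system(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ridge(X: List[List[float]], y: List[float], lam: float = 1e-6) -> List[float]:
    """Least-squares fit via the normal equations, with a small ridge term for stability."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(XtX, Xty)


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical 4-bit fingerprints (real fingerprints have hundreds of bits or more).
fingerprints: Dict[str, List[int]] = {
    "drugA": [1, 0, 1, 0],
    "drugB": [0, 1, 1, 0],
    "drugC": [1, 1, 0, 1],
    "drugD": [0, 0, 0, 1],
}

# Synthetic sensitivities generated from a known linear rule, for illustration only.
true_weights = [10.0, 5.0, -3.0, 2.0, 8.0]
pairs = list(combinations(sorted(fingerprints), 2))
X = [pair_features(fingerprints[a], fingerprints[b]) for a, b in pairs]
y = [predict(true_weights, row) for row in X]

weights = fit_ridge(X, y)
for (a, b), row, target in zip(pairs, X, y):
    print(f"{a}+{b}: observed={target:.2f} predicted={predict(weights, row):.2f}")
```

Summing the two fingerprints makes the prediction independent of the order in which the drugs are listed; other pair encodings (for example concatenation or bitwise OR) are equally plausible choices.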

Database link:

Combinatorial therapies have recently been proposed for improving anticancer treatment efficacy. The SynergyFinder R package is a software tool developed in our group to analyze pre-clinical drug combination datasets. We report the major updates of the R package to improve the interpretation and annotation of drug combination screening results. Compared to the existing implementations, the novelty of the updated SynergyFinder R package consists of 1) extending to higher-order drug combination data analysis and implementing dimension reduction techniques for visualizing the synergy landscape for an unlimited number of drugs in a combination; 2) statistical analysis of drug combination synergy and sensitivity with confidence intervals and p-values; 3) incorporating a synergy barometer to harmonize multiple synergy scoring methods into a consensus metric of synergy; 4) evaluating drug combination synergy and sensitivity simultaneously to provide an unbiased interpretation of the clinical potential. Furthermore, we provide the annotation of drugs and cell lines that are tested in an experiment, including their chemical information, targets and signaling network information. These annotations should improve the interpretation of the mechanisms of action of drug combinations. To facilitate the use of the R package by the drug discovery community, we also provide a web server with a user-friendly interface that enables a more flexible and versatile analysis of drug combination data.
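
To make the notion of a synergy score concrete, the sketch below computes the excess of an observed combination effect over two commonly used reference models, Bliss independence and highest single agent (HSA). The abstract above does not say which models are harmonized, so the choice of these two, the function names and the example numbers are assumptions for illustration; this is not the SynergyFinder API.

```python
def bliss_expected(effect_a: float, effect_b: float) -> float:
    """Expected fractional inhibition if the two drugs act independently (Bliss)."""
    return effect_a + effect_b - effect_a * effect_b


def hsa_expected(effect_a: float, effect_b: float) -> float:
    """Expected fractional inhibition under the highest-single-agent reference."""
    return max(effect_a, effect_b)


def synergy_excess(observed: float, expected: float) -> float:
    """Positive values suggest synergy, negative values suggest antagonism."""
    return observed - expected


# Hypothetical fractional inhibitions (0 = no effect, 1 = complete inhibition).
effect_a, effect_b, observed_combo = 0.3, 0.4, 0.7

bliss_score = synergy_excess(observed_combo, bliss_expected(effect_a, effect_b))
hsa_score = synergy_excess(observed_combo, hsa_expected(effect_a, effect_b))
print(f"Bliss excess: {bliss_score:.2f}, HSA excess: {hsa_score:.2f}")
```

The two reference models give different scores for the same measurement, which is the kind of disagreement that a consensus metric is meant to reconcile.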

Web application:

R package:


[1] Bioinformatics. 2017 Aug 1;33(15):2413-2415. doi: 10.1093/bioinformatics/btx162.
[2] Comput Struct Biotechnol J. 2015 Sep 25;13:504-13. doi: 10.1016/j.csbj.2015.09.001

FAIRification of drug target interaction data

Knowledge of the full target space of bioactive substances, approved and investigational drugs as well as chemical probes, provides important insights into therapeutic potential and possible adverse effects. The existing compound-target bioactivity data resources are often incomparable due to non-standardized and heterogeneous assay types and variability in endpoint measurements. To extract higher value from the existing and future compound target-profiling data, we implemented an open-data web platform, named Drug Target Commons (DTC), which features tools for crowd-sourced compound-target bioactivity data annotation, standardization, curation, and intra-resource integration. We demonstrate the unique value of DTC with several examples related to both drug discovery and drug repurposing applications and invite researchers to join this community effort to increase the reuse and extension of compound bioactivity data.


[1] Cell Chem Biol. 2018 Feb 15;25(2):224-229.e2. doi: 10.1016/j.chembiol.2017.11.009

[2] Database (Oxford). 2018 Jan 1;2018:1-13. doi: 10.1093/database/bay083

[3] Brief Bioinform. 2021 Mar 22;22(2):1656-1678. doi: 10.1093/bib/bbaa003

We carried out a systematic evaluation of target selectivity profiles across three recent large-scale biochemical assays of kinase inhibitors and further compared these standardized bioactivity assays with data reported in the widely used databases ChEMBL and STITCH. Our comparative evaluation revealed relative benefits and potential limitations among the bioactivity types, as well as pinpointed biases in the database curation processes. Ignoring such issues in data heterogeneity and representation may lead to biased modeling of drugs' polypharmacological effects as well as to unrealistic evaluation of computational strategies for the prediction of drug-target interaction networks. Toward making use of the complementary information captured by the various bioactivity types, including IC50, Ki, and Kd, we also introduce a model-based integration approach, termed KIBA, and demonstrate here how it can be used to classify kinase inhibitor targets and to pinpoint potential errors in database-reported drug-target interactions. An integrated drug-target bioactivity matrix across 52,498 chemical compounds and 467 kinase targets, including a total of 246,088 KIBA scores, has been made freely available.
Citation: Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. J Chem Inf Model. 2014 Mar 24;54(3):735-43. doi: 10.1021/ci400709d.

Download the dataset:

Target inhibition network analysis using Minimization and Maximization Averaging

A recent trend in drug development is to identify drug combinations or multi-target agents that effectively modify multiple nodes of disease-associated networks. Such polypharmacological effects may reduce the risk of emerging drug resistance by means of attacking the disease networks through synergistic and synthetic lethal interactions. However, due to the exponentially increasing number of potential drug and target combinations, systematic approaches are needed for prioritizing the most potent multi-target alternatives on a global network level. We took a functional systems pharmacology approach toward the identification of selective target combinations for specific cancer cells by combining large-scale screening data on drug treatment efficacies and drug-target binding affinities. Our model-based prediction approach, named TIMMA, takes advantage of the polypharmacological effects of drugs and infers combinatorial drug efficacies through system-level target inhibition networks. Case studies in MCF-7 and MDA-MB-231 breast cancer and BxPC-3 pancreatic cancer cells demonstrated how the target inhibition modeling allows systematic exploration of functional interactions between drugs and their targets to maximally inhibit multiple survival pathways in a given cancer type. The TIMMA prediction results were experimentally validated by means of systematic siRNA-mediated silencing of the selected targets and their pairwise combinations, showing increased ability to identify not only such druggable kinase targets that are essential for cancer survival either individually or in combination, but also synergistic interactions indicative of non-additive drug efficacies. These system-level analyses were enabled by a novel model construction method utilizing maximization and minimization rules, as well as a model selection algorithm based on sequential forward floating search. 
Compared with an existing computational solution, TIMMA showed both enhanced prediction accuracies in cross validation as well as significant reduction in computation times. Such cost-effective computational-experimental design strategies have the potential to greatly speed-up the drug testing efforts by prioritizing those interventions and interactions warranting further study in individual cancer cases.


[1] Bioinformatics. 2015 Jun 1;31(11):1866-8. doi: 10.1093/bioinformatics/btv067.

[2] PLoS Comput Biol. 2013;9(9):e1003226. doi: 10.1371/journal.pcbi.1003226.

Bayesian Analysis of Population Structure 

During the most recent decade many Bayesian statistical models and software for answering questions related to the genetic structure underlying population samples have appeared in the scientific literature. Most of these methods utilize molecular markers for the inferences, while some are also capable of handling DNA sequence data. In a number of earlier works, we have introduced an array of statistical methods for population genetic inference that are implemented in the software BAPS. However, the complexity of biological problems related to genetic structure analysis keeps increasing such that in many cases the current methods may provide either inappropriate or insufficient solutions. We discuss the necessity of enhancing the statistical approaches to face the challenges posed by the ever-increasing amounts of molecular data generated by scientists over a wide range of research areas and introduce an array of new statistical tools implemented in the most recent version of BAPS. With these methods it is possible, e.g., to fit genetic mixture models using user-specified numbers of clusters and to estimate levels of admixture under a genetic linkage model. Also, alleles representing a different ancestry compared to the average observed genomic positions can be tracked for the sampled individuals, and a priori specified hypotheses about genetic population structure can be directly compared using Bayes' theorem. In general, we have improved further the computational characteristics of the algorithms behind the methods implemented in BAPS facilitating the analyses of large and complex datasets. In particular, analysis of a single dataset can now be spread over multiple computers using a script interface to the software. The Bayesian modelling methods introduced in this article represent an array of enhanced tools for learning the genetic structure of populations. 
Their implementations in the BAPS software are designed to meet the increasing need for analyzing large-scale population genetics data. The software is freely downloadable for Windows, Linux and Mac OS X systems.
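
Comparing a priori specified hypotheses with Bayes' theorem amounts to turning each hypothesis's marginal likelihood and prior probability into a posterior probability. The sketch below shows that arithmetic in log space, which avoids numerical underflow for the very small likelihoods typical of genetic data. It is a generic illustration, not BAPS code; the function name and the example log marginal likelihoods are assumptions.

```python
import math
from typing import List, Optional, Sequence


def posterior_probabilities(log_marginal_likelihoods: Sequence[float],
                            priors: Optional[Sequence[float]] = None) -> List[float]:
    """Posterior probability of each hypothesis via Bayes' theorem.

    P(H_i | data) is proportional to P(data | H_i) * P(H_i). Uniform priors
    are assumed when none are given.
    """
    n = len(log_marginal_likelihoods)
    if priors is None:
        priors = [1.0 / n] * n
    log_joint = [lml + math.log(p) for lml, p in zip(log_marginal_likelihoods, priors)]
    # Subtract the maximum before exponentiating (log-sum-exp trick).
    shift = max(log_joint)
    weights = [math.exp(v - shift) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical population-structure hypotheses; the second has a marginal
# likelihood three times larger than the first.
log_mls = [-1000.0, -1000.0 + math.log(3.0)]
posteriors = posterior_probabilities(log_mls)
print([round(p, 3) for p in posteriors])
```

With equal priors, a threefold difference in marginal likelihood translates into posterior probabilities of one quarter and three quarters.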


[1] Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations. BMC Bioinformatics. 2008 Dec 16;9:539. doi: 10.1186/1471-2105-9-539

[2] Bayesian analysis of population structure based on linked molecular information. Math Biosci. 2007 Jan;205(1):19-31. doi: 10.1016/j.mbs.2006.09.015.

[3] Identifying currents in the gene pool for bacterial populations using an integrative approach. PLoS Comput Biol. 2009 Aug;5(8):e1000455. doi: 10.1371/journal.pcbi.1000455

[4] Hyper-recombination, diversity, and antibiotic resistance in pneumococcus. Science. 2009 Jun 12;324(5933):1454-7. doi: 10.1126/science.1171908

T-RFLP Bayesian Analysis of Population Structures in Bacteria

The investigation of microbial communities is an essential part of the study of the biosphere. Flexible molecular fingerprinting tools such as terminal-restriction fragment length polymorphism (T-RFLP) analysis are often applied in such studies to characterize the microbial population. However, such data have so far been primarily analyzed using conventional clustering methods. Here we introduce a Bayesian model-based method for comparing microbial communities using T-RFLP data. Such datasets generally have several challenging features, e.g. sparseness, missing values and structurally zero-valued observations. In our framework, these features are taken into account by developing a Bayesian latent class mixture model for the observations. To make inferences under the model we use a recent Markov chain Monte Carlo (MCMC)-based method for Bayesian model selection. To assess the introduced method we analyze both simulated and real datasets. The simulations show that our approach compares favorably to standard statistical clustering tools, such as k-means, hierarchical clustering, and Autoclass. The developed tool is freely available as a software package, T-BAPS.

Citation: T-BAPS: a Bayesian statistical tool for comparison of microbial communities using terminal-restriction fragment length polymorphism (T-RFLP) data. Stat Appl Genet Mol Biol. 2007;6:Article30. doi: 10.2202/1544-6115.1303

Paper download: