What is ClusTRace?

ClusTRace is a bioinformatic pipeline for high-level handling and phylogenetic/cluster analysis of virus sequences.
ClusTRace was created to aid COVID-19 transmission chain tracing at the University of Helsinki, Finland.

ClusTRace flowchart


ClusTRace supports:

  • assigning lineages with Pangolin
  • collecting sequences to multi-fasta files according to lineage
  • filtering outlier sequences
  • creating multiple sequence alignments (MSA)
  • creating phylogenetic trees from MSAs
  • cluster analysis
    • extracting sequence clusters from the obtained phylogenetic trees
    • extracting clusters at different mutation rates and with the different methods supported by TreeCluster
    • visualizing clusters with different colors/labels in phylogenetic trees
    • summarizing clusters in spreadsheets that report cluster size, sequence composition, growth rate and support information
  • variant calling
    • calling nucleotide variants for lineage and/or cluster MSA(s)
    • calling amino acid variants for lineage and/or cluster VCF(s)
    • summarizing nucleotide and amino acid variants with spreadsheets
    • visualizing lineage amino acid variants with interactive lollipop graphs (g3viz)
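The step of collecting sequences into per-lineage multi-FASTA files can be pictured with a small, dependency-free sketch. This is not ClusTRace's own code. It assumes a Pangolin-style CSV lineage report with `taxon` and `lineage` columns, and it assumes that each FASTA header (the text after `>`) matches the `taxon` value. The function names `read_fasta`, `read_lineages` and `group_by_lineage`, as well as the `Unassigned` fallback label, are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].strip()  # assumption: the full header is the taxon ID
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def read_lineages(report_text):
    """Map taxon -> lineage from a Pangolin-style CSV report (assumed columns)."""
    reader = csv.DictReader(io.StringIO(report_text))
    return {row["taxon"]: row["lineage"] for row in reader}


def group_by_lineage(records, lineages, unassigned="Unassigned"):
    """Build one multi-FASTA string per lineage; unmatched IDs go to a fallback group."""
    groups = defaultdict(list)
    for name, seq in records.items():
        lineage = lineages.get(name) or unassigned
        groups[lineage].append(f">{name}\n{seq}\n")
    return {lineage: "".join(entries) for lineage, entries in groups.items()}


if __name__ == "__main__":
    fasta = ">s1\nACGT\nAC\n>s2\nGGTT\n>s3\nTTAA\n"
    report = "taxon,lineage\ns1,B.1.1.7\ns2,B.1.617.2\ns3,B.1.1.7\n"
    per_lineage = group_by_lineage(read_fasta(fasta), read_lineages(report))
    for lineage, text in per_lineage.items():
        # In a real run each string would be written to e.g. f"{lineage}.fasta".
        print(lineage)
        print(text, end="")
```

Keeping the grouped output as strings (rather than writing files directly) keeps the sketch easy to test; writing each value to a file named after its lineage is a one-line addition.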
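The cluster-size part of the spreadsheet summary can likewise be sketched from a TreeCluster assignment table. This is an illustration under stated assumptions, not ClusTRace's implementation. It assumes a tab-separated file with a header row followed by `sequence name` / `cluster number` columns, with unclustered singletons labeled `-1`. The functions `read_clusters` and `summary_csv` and the `cluster,size,members` column layout are hypothetical, and growth rate and support information are not computed here.

```python
import csv
import io
from collections import defaultdict


def read_clusters(tsv_text, singleton_id=-1):
    """Map cluster number -> member names, skipping sequences labeled as singletons."""
    clusters = defaultdict(list)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        name, cluster_id = row[0], int(row[1])
        if cluster_id == singleton_id:
            continue  # assumed TreeCluster convention: -1 means "not in any cluster"
        clusters[cluster_id].append(name)
    return dict(clusters)


def summary_csv(clusters):
    """Render one spreadsheet row per cluster, largest clusters first."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["cluster", "size", "members"])
    ordered = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for cluster_id, members in ordered:
        writer.writerow([cluster_id, len(members), ";".join(members)])
    return out.getvalue()


if __name__ == "__main__":
    tsv = "SequenceName\tClusterNumber\ns1\t1\ns2\t1\ns3\t-1\ns4\t2\ns5\t1\n"
    print(summary_csv(read_clusters(tsv)), end="")
```

Sorting by descending size and then by cluster number gives a stable, readable order for the largest transmission clusters; a CSV string opens directly in common spreadsheet tools.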

Citing ClusTRace

Plyusnin, I., Truong Nguyen, P.T., Sironen, T. et al. ClusTRace, a bioinformatic pipeline for analyzing clusters in virus phylogenies. BMC Bioinformatics 23, 196 (2022). https://doi.org/10.1186/s12859-022-04709-8

The publication PDF is available from the journal via the DOI above.