What is ClusTRace?

ClusTRace is a bioinformatics pipeline for high-level handling and phylogenetic/cluster analysis of virus sequences.
ClusTRace was created as an aid for COVID-19 transmission chain tracing at the University of Helsinki, Finland.


ClusTRace flowchart

ClusTRace supports:

  • assigning lineages to consensus sequences with Pangolin
  • collecting consensus sequences into multi-fasta files according to lineage annotations
  • filtering outlier sequences from multi-fasta files
  • creating multiple sequence alignments (MSA) from multi-fasta files
  • creating phylogenetic trees from MSAs
  • updating MSAs and phylogenetic trees with novel sequence batches
  • cluster analysis
    • extracting sequence clusters from the obtained phylogenetic trees
    • extracting clusters at different mutation rates and with different methods supported in TreeCluster
    • visualizing clusters with different colors/labels in phylogenetic trees
    • summarizing clusters with Excel tables that show cluster size, sequence composition, growth rate and support information
    • tracing a set of predefined clusters to an updated set of sequences
  • variant calling
    • calling nucleotide variants for lineage and/or cluster MSA(s)
    • calling amino acid variants for lineage and/or cluster VCF(s)
    • summarizing nucleotide and amino acid variants with Excel tables
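
To make the lineage-collection step in the list above more concrete, here is a minimal Python sketch of grouping consensus sequences into per-lineage multi-fasta text. It is an illustration only, not ClusTRace's own code: the function names (`parse_fasta`, `group_by_lineage`, `to_multifasta`) are hypothetical, and it assumes a Pangolin-style CSV report with `taxon` and `lineage` columns whose `taxon` values match the FASTA headers exactly.

```python
import csv
import io
from collections import defaultdict


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_lineage(fasta_handle, report_handle):
    """Group sequences by lineage (hypothetical helper, not part of ClusTRace).

    Assumes the report is a CSV with 'taxon' and 'lineage' columns.
    """
    lineage_of = {
        row["taxon"]: row["lineage"] for row in csv.DictReader(report_handle)
    }
    groups = defaultdict(list)
    for header, seq in parse_fasta(fasta_handle):
        # Sequences missing from the report are collected as "unassigned".
        groups[lineage_of.get(header, "unassigned")].append((header, seq))
    return dict(groups)


def to_multifasta(records):
    """Serialize (header, sequence) pairs as multi-fasta text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# Small in-memory demonstration with made-up sequence names.
fasta = io.StringIO(">sampleA\nACGT\nAC\n>sampleB\nGGTT\n>sampleC\nTTAA\n")
report = io.StringIO("taxon,lineage\nsampleA,B.1.1.7\nsampleB,B.1.1.7\n")
for lineage, records in group_by_lineage(fasta, report).items():
    print(lineage, len(records))
```

In a real run, each group would be written to its own file and passed on to the alignment step; the in-memory handles here only keep the sketch self-contained.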