This guide explains how to set up your own copy of HAVoC.
The tool can be downloaded directly from the repository (https://bitbucket.org/auto_cov_pipeline/havoc.git), either by visiting the link or by cloning it with git from a terminal.
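A minimal sketch of the clone step, assuming git is installed. The helper name `get_havoc` and the default target directory `havoc` are illustrative choices and are not prescribed by this guide.

```shell
# Repository URL as given in this guide.
HAVOC_REPO="https://bitbucket.org/auto_cov_pipeline/havoc.git"

# Hypothetical helper: clone HAVoC into a directory (default: "havoc"),
# skipping the download if that directory already exists.
get_havoc() {
    target="${1:-havoc}"
    if [ -d "$target" ]; then
        echo "Directory '$target' already exists, skipping clone"
        return 0
    fi
    git clone "$HAVOC_REPO" "$target"
}

# Usage:
#   get_havoc            # clones into ./havoc
#   get_havoc my_havoc   # clones into ./my_havoc
```

Running `git clone` with the same URL directly has the same effect; the wrapper only avoids cloning over an existing directory.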
Note that downloading the FASTQ files may take some time, depending on your internet speed, as they are relatively large (200–400 MB).
Installing bioinformatics software
Before you start using HAVoC, install the tools described in this section on your system. For read trimming and for alignment, either of two alternatives can be used:
- Trimmomatic or Fastp
- BWA-MEM or Bowtie2
All dependencies can be conveniently installed through Bioconda using the following command:
conda install fastp trimmomatic bowtie2 bwa sambamba samtools bedtools lofreq bcftools pangolin
Alternatively, follow the installation instructions on each tool's website. These are widely used bioinformatics tools, and most of them are already installed on many university servers.
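After installation, it can be useful to confirm that every tool is actually on your PATH. The sketch below is one way to do that; it assumes the executables carry the same names as the packages in the conda command above, and the `missing_tools` helper is hypothetical.

```shell
# Executables expected after the conda install above
# (assumption: executable names match the package names).
HAVOC_TOOLS="fastp trimmomatic bowtie2 bwa sambamba samtools bedtools lofreq bcftools pangolin"

# Print every tool that is not found on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return $status
}

# Usage:
#   missing_tools $HAVOC_TOOLS || echo "some tools are still missing"
```

If conda cannot find the packages, the Bioconda channels may not be configured yet; adding `-c conda-forge -c bioconda` to the install command is the usual remedy.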
Trimmomatic performs a variety of useful preprocessing tasks for Illumina paired-end and single-end data. See the Trimmomatic documentation for further information. The tool can be downloaded from the Trimmomatic project website.
Fastp is a fast all-in-one read preprocessing tool similar to Trimmomatic. It includes automatic adapter detection and polyG tail trimming. For further information, refer to the Fastp documentation. The tool can be downloaded from the Fastp project page.
BWA is a fast and accurate aligner designed to align reads and other short DNA sequences against large reference genomes. See the documentation of the Burrows-Wheeler Aligner for installation and use.
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences or genomes.
Samtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format. See the documentation of Samtools for installation and use.
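As a rough illustration of how an aligner and Samtools fit together, the sketch below indexes a reference, aligns paired-end reads with BWA-MEM, then sorts and indexes the result. It is not necessarily the sequence of commands HAVoC itself runs. The `run` and `align_reads` helpers and all file names are hypothetical; setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Align paired-end reads to a reference and produce a sorted, indexed BAM.
# Arguments: reference FASTA, forward reads, reverse reads, output prefix.
align_reads() {
    ref="$1"; r1="$2"; r2="$3"; prefix="$4"
    run bwa index "$ref"                              # build the BWA index
    run bwa mem -o "$prefix.sam" "$ref" "$r1" "$r2"   # align, write SAM
    run samtools sort -o "$prefix.bam" "$prefix.sam"  # coordinate-sort to BAM
    run samtools index "$prefix.bam"                  # create the BAM index
}

# Usage (placeholder file names):
#   DRY_RUN=1 align_reads reference.fa sample_R1.fastq sample_R2.fastq sample
```

With Bowtie 2, the corresponding steps would be `bowtie2-build` for indexing and `bowtie2 -x ... -1 ... -2 ... -S ...` for alignment.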
Bedtools is a collection of tools for a wide range of genomics analysis tasks. One useful function is masking low-coverage regions in a sequence.
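One common way to mask low-coverage regions, sketched here under assumptions, is to compute per-interval depth with `bedtools genomecov`, keep the intervals below a minimum depth, and pass them to `bedtools maskfasta`. The `low_coverage` helper, the file names, and the threshold of 30 are placeholders and not HAVoC's documented settings.

```shell
# Filter BedGraph lines (chrom, start, end, depth) read from stdin and print
# the intervals whose depth is below the given minimum as tab-separated BED.
low_coverage() {
    awk -v min="$1" '($4 + 0) < (min + 0) { print $1 "\t" $2 "\t" $3 }'
}

# Usage (placeholder file names and threshold):
#   bedtools genomecov -bga -ibam sample.bam | low_coverage 30 > lowcov.bed
#   bedtools maskfasta -fi consensus.fa -bed lowcov.bed -fo masked.fa
```

The `-bga` option makes `genomecov` report zero-coverage intervals as well, so regions with no reads at all are also masked.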
LoFreq is a sensitive and robust tool for calling single-nucleotide variants (SNVs) from high-coverage sequencing datasets.
Pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. It allows the user to assign the most likely Pango lineage to a SARS-CoV-2 query sequence.
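Pangolin writes its assignments to a CSV report. The sketch below pulls the sequence name and assigned lineage out of such a report; it assumes the report has a header row containing `taxon` and `lineage` columns and that no field contains a comma. The `lineages_from_report` helper and the file names are hypothetical.

```shell
# Print "taxon<TAB>lineage" for every row of a pangolin CSV report.
# Columns are located by header name, so their position does not matter.
lineages_from_report() {
    awk -F, '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        { print $(col["taxon"]) "\t" $(col["lineage"]) }
    ' "$1"
}

# Usage (placeholder file names):
#   pangolin consensus.fasta --outfile lineage_report.csv
#   lineages_from_report lineage_report.csv
```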