For up-to-date User Guides please see Lazypipe Wiki:
In this example we will use a sample paired-end (PE) library that is included with the repository (data/samples/M15small_R*.fastq).
Preprocess reads with fastp:
perl lazypipe.pl -1 data/samples/M15small_R1.fastq --pipe pre -t 8 -v
Download the Neovison vison genome and use it to filter host reads. Note that the first host-filtering run with a newly downloaded genome takes extra time, because the genome must be indexed first:
mkdir -p $data/hostgen
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/108/605/GCA_900108605.1_NNQGG… -P $data/hostgen/
perl lazypipe.pl -1 data/samples/M15small_R1.fastq --pipe flt --hostgen $data/hostgen/GCA_900108605.1_NNQGG.v01_genomic.fna.gz -t 8 -v
Run assembly with Megahit and realign reads to the assembly:
perl lazypipe.pl -1 data/samples/M15small_R1.fastq -p ass,rea --ass megahit -t 8 -v
Run 1st round annotation with Minimap2 against your local minimap.refseq database:
perl lazypipe.pl -1 data/samples/M15small_R1.fastq -p ann1 --ann1 minimap.refseq -t 8 -v
Run 1st round annotation with SANSparallel against UniProt TrEMBL. Note that SANSparallel runs on a remote server and requires an internet connection. Append the results to the Minimap2 annotations from the previous step:
perl lazypipe.pl -1 data/samples/M15small_R1.fastq -p ann1 --ann1 sans --append -t 8 -v
Now run a more complex 1st round annotation. Start by mapping contigs with Minimap2, then map the still-unmapped contigs with SANSparallel, then map the remaining unmapped contigs with BLASTN against the blastn.vi database. Note that without the --append flag this will overwrite existing 1st round annotations:
perl lazypipe.pl -1 data/samples/M15small_R1.fastq -p ann1 --ann1 minimap.refseq,sans,blastn.vi -t 8 -v
Run 2nd round annotation. In the second round you can target archaeal+bacterial (=ab), bacteriophage (=ph), viral (=vi) and unmapped (=un) contigs, based on labeling from the 1st round. Local databases for the 2nd round annotations are defined in the ann2.databases section of config.yaml. For example, to map viral contigs with BLASTN and BLASTP against local viral databases, type:
perl lazypipe.pl -1 data/samples/M15small_R1.fastq --pipe ann2 --ann2 blastn.vi.refseq,blastp.vi -t 8 -v
Run 2nd round annotation for bacteria with BLASTN. Append results to BLASTN and BLASTP annotations from the previous step:
perl lazypipe.pl -1 data/samples/M15small_R1.fastq --pipe ann2 --ann2 blastn.ab.refseq --append -t 8 -v
You can also combine these runs in any order. For example:
perl lazypipe.pl -1 data/samples/M15small_R1.fastq --pipe ann2 --ann2 blastn.ab.refseq,blastn.vi.refseq,blastp.vi -t 8 -v
The most common combinations of 1st and 2nd round annotations can be saved to config.yaml in the ann.strategies section. Each annotation strategy is saved as a key-value pair. Several annotation strategies are predefined.
Generate reports based on created annotations:
perl lazypipe.pl -1 data/samples/M15small_R1.fastq --pipe rep -t 8 -v
Generate assembly stats, pack for sharing and remove temporary files:
perl lazypipe.pl -1 data/samples/M15small_R1.fastq -p stats,pack,clean -t 8 -v
For convenience, routine analysis steps (pre,flt,ass,rea,ann1,ann2,rep,sta,pack,clean) can be called with the main tag. To run the main analysis with the normal annotation strategy, type:
perl lazypipe.pl -1 data/samples/M15small_R1.fastq -p main --anns norm -t 8 -v
Results are output to $res/$sample. The default value for $res is set in config.yaml and the default value for $sample is created from the name of the input reads. Both can be changed at runtime with --res mydir --sample mysample.
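As a sketch of how the main tag and the --res and --sample options could be combined for several libraries, the function below only prints one command per forward-read file and does not run anything. The function name lazypipe_batch_cmds and the rule for deriving the sample name (dropping the _R1.fastq suffix) are assumptions made for this example; they are not part of Lazypipe and may differ from how Lazypipe builds its default sample name.

```shell
# Print (do not run) one Lazypipe command per forward-read file.
# Usage: lazypipe_batch_cmds RESDIR FILE_R1.fastq [FILE_R1.fastq ...]
# Assumption: the sample name is the file name without "_R1.fastq".
lazypipe_batch_cmds() {
    resdir="$1"
    shift
    for r1 in "$@"; do
        sample="$(basename "$r1" _R1.fastq)"
        printf 'perl lazypipe.pl -1 %s -p main --anns norm --res %s --sample %s -t 8 -v\n' \
            "$r1" "$resdir" "$sample"
    done
}
```

Calling `lazypipe_batch_cmds results data/samples/*_R1.fastq` lists the commands for review; piping that output to `sh` would execute them one after another.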
File or Directory | Description |
---|---|
contigs | contigs sorted by taxa |
contigs.fa | contigs in a single fasta file |
contigs.ann1.ab.fa | archaeal+bacterial contigs (based on 1st round annotation) |
contigs.ann1.ph.fa | bacteriophage contigs (1st round) |
contigs.ann1.vi.fa | viral contigs (1st round) |
contigs.ann1.un.fa | unmapped contigs (1st round) |
contigs.ann2.ab.fa | archaeal+bacterial contigs (2nd round) |
contigs.ann2.ph.fa | bacteriophage contigs (2nd round) |
contigs.ann2.vi.fa | viral contigs (2nd round) |
contigs.ann2.un.fa | unmapped contigs (2nd round) |
contigs.orfs.aa.fa | predicted ORFs as aa sequences |
contigs.orfs.nt.fa | predicted ORFs as nt sequences |
scaffolds.fa | scaffolds, if available |
Table 1: Lazypipe results: contigs and ORFs.
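To get a quick overview of how contigs were split between the classes in Table 1, one option is to count FASTA header lines in each file. This is a minimal sketch; the helper name count_seqs is hypothetical, and the files are passed explicitly because their exact location inside the results directory is not assumed here.

```shell
# Print "file<TAB>number of sequences" for each FASTA file given.
# A sequence is counted as one line starting with ">".
count_seqs() {
    for fa in "$@"; do
        printf '%s\t%s\n' "$fa" "$(grep -c '^>' "$fa")"
    done
}
```

For example, `count_seqs contigs.ann1.vi.fa contigs.ann1.un.fa` would compare the number of viral and unmapped contigs after the 1st round.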
Spreadsheets with taxon abundances are printed to abund_table.xlsx. Abundances are displayed in separate tables for viruses (excluding bacteriophages), bacteria, bacteriophages and eukaryotes. For each domain, abundances are displayed at three taxonomic levels: species, genus and family.
For raw abundance data see abund_table.tsv.
column | description |
---|---|
readn | read pairs assigned to this taxon |
readn_pc | percentage of read pairs assigned to this taxon |
csum | cumulative read distribution score (percentage of reads mapped to this taxon and more abundant taxa) |
csumq | confidence score based on csum (1 ~ reliable, 2 ~ intermediate, 3 ~ unreliable) |
contign | contigs assigned to this taxon |
species | species name (NCBI taxonomy) |
species_id | species taxid (NCBI taxonomy) |
genus | genus name |
genus_id | genus taxid |
family | family name |
family_id | family taxid |
Table 2: Columns in abund_table.xlsx
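The relation between readn, readn_pc and csum can be illustrated with a small worked example. The sketch below follows one reading of the definitions in Table 2: taxa are sorted by decreasing read count, and csum is the running total expressed as a percentage. It is not Lazypipe's own code. The helper name csum_table, the headerless two-column input, and the use of the summed readn as the denominator are assumptions for this illustration.

```shell
# Input:  headerless TSV with "taxon<TAB>readn" per line.
# Output: taxon, readn, readn_pc, csum (tab-separated), most abundant first.
# Assumption: percentages are relative to the sum of readn in the input.
csum_table() {
    sort -t "$(printf '\t')" -k2,2nr "$1" | awk -F'\t' '
        { name[NR] = $1; n[NR] = $2; total += $2 }
        END {
            for (i = 1; i <= NR; i++) {
                run += n[i]   # reads in this taxon and all more abundant taxa
                printf "%s\t%d\t%.1f\t%.1f\n", name[i], n[i], 100 * n[i] / total, 100 * run / total
            }
        }'
}
```

With three taxa holding 600, 300 and 100 read pairs, readn_pc comes out as 60.0, 30.0 and 10.0, and csum as 60.0, 90.0 and 100.0.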
Spreadsheets with contig annotations are printed to contigs_annot.xlsx. Separate sheets are provided for viruses (excluding bacteriophages), bacteria, bacteriophages and eukaryotes.
For raw annotation data see contigs_annot.tsv.
column | description |
---|---|
search | applied database search (e.g. blastn) |
db | applied database (e.g. UniRef100.vi) |
dbtype | nucl for nucleotide and prot for protein databases |
contig | contig id |
orf | orf description in start-end:strand format |
clen | contig length |
sseqid | subject sequence id |
bitscore | alignment score |
alen | alignment length |
pident | percent identity |
qlen | query sequence length |
qcov | query coverage |
slen | subject sequence length |
scov | subject coverage |
staxid | subject sequence taxid |
sname | subject sequence name |
bphage | yes for bacteriophage staxids |
species | assigned species |
genus | assigned genus |
family | assigned family |
order | assigned order |
class | assigned class |
Table 3: Columns in contigs_annot.xlsx
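The raw file contigs_annot.tsv can be filtered on the command line. The sketch below keeps non-bacteriophage hits at or above a minimum percent identity. It assumes that the TSV starts with a header row using the column names from Table 3 (the table describes the spreadsheet, so this should be checked against your file); the helper name annot_filter is hypothetical.

```shell
# Print contig, subject name and percent identity for rows that are
# not flagged as bacteriophage and have pident >= MIN_PIDENT.
# Usage: annot_filter contigs_annot.tsv MIN_PIDENT
# Assumption: the first line is a header with the column names of Table 3.
annot_filter() {
    awk -F'\t' -v min="$2" '
        NR == 1 {
            # Map column names to positions so column order does not matter.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("contig" in col) || !("sname" in col) || !("pident" in col) || !("bphage" in col)) {
                print "annot_filter: expected column missing" > "/dev/stderr"
                exit 1
            }
            next
        }
        $(col["bphage"]) != "yes" && $(col["pident"]) + 0 >= min + 0 {
            print $(col["contig"]) "\t" $(col["sname"]) "\t" $(col["pident"])
        }
    ' "$1"
}
```

For example, `annot_filter contigs_annot.tsv 90` lists hits with at least 90 percent identity. Further conditions on qcov or alen could be added in the same way.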
Quality Control (QC) plots include length histograms for reads and contigs, and survival plots. The survival plots track retained reads after each pipeline step.
file | description |
---|---|
qc.read1.jpeg | length hist for forward reads |
qc.read2.jpeg | length hist for reverse reads |
qc.contigs.jpeg | length hist for contigs |
qc.readsurv.jpeg | read survival plots |
Table 4: Quality Control plots