Recommended Pipeline Directory Structure

Benefits:

  • separates workflow logic from data
  • easier debugging
  • easier collaboration

Common practice:

  • config → parameters and sample tables
  • envs → reproducible environments
  • rules → modular workflow steps
  • results → generated outputs

A clean directory structure makes pipelines easier to maintain and reproduce.

Snakefile Breakdown

  • Fastq files that need trimming - input: sample.fastq
  • Cutadapt - output: sample-trimmed.fastq
  • BWA - align trimmed fastq to assembly output: sample-aligned.sam
  • Samtools sorting, indexing - output: sample-sorted.bam
  • Freebayes variant calling - output: sample-variants.vcf

Example snakefile

rule all:

 input:
 
variants/sample1.vcf

Snakemake take first rule as target then constructs graph of dependencies

rule trim:

 input:
 
”reads/sample1.fastq”
output:

 ”trimmed_reads/sample1-trimmed.fastq”
shell:

  cutadapt -A TCCGGGTS -o {output} {input} 

rule align:

 input:
 
"trimmed_reads/sample1-trimmed.fastq"
 output:
 
    "bam/sample1.bam"

 threads: 1
 
 shell:
 
    bwa mem -t {threads} ref.fa {input} | samtools view -Sb -> {output}"

wildcards serve as placeholders within rules to operate on multiple files via pattern matching

Previous
Next
RC Logo RC Logo © 2026 The Rector and Visitors of the University of Virginia