nf-core/sarek
Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
Define where the pipeline should find input data and save output data.
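The sample sheet path in this section is constrained by the filename pattern ^\S+\.(csv|tsv|yaml|yml|json)$. As a minimal sketch of what that pattern accepts, the Python snippet below checks a few candidate paths. The function name and example paths are hypothetical and only illustrate the regular expression, not how the pipeline itself validates input.

```python
import re

# Pattern copied from the parameter listing for the sample sheet path.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.(csv|tsv|yaml|yml|json)$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path has no whitespace and a supported extension."""
    return SAMPLESHEET_PATTERN.match(path) is not None


# Hypothetical example paths.
for candidate in ["samplesheet.csv", "my samples.csv", "samplesheet.xlsx"]:
    print(candidate, is_valid_samplesheet_path(candidate))
```

Paths containing whitespace are rejected because the pattern uses \S+, so quoting such a path on the command line would not help.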
Path to comma-separated file containing information about the samples in the experiment.
  Type: string; pattern: ^\S+\.(csv|tsv|yaml|yml|json)$
Automatic retrieval for restart
  Type: string; pattern: ^\S+\.(csv|tsv|yaml|yml|json)$
Starting step
  Type: string
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
  Type: string

Most common options used for the pipeline
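The read-splitting option in this section takes a number of reads per split (default 50000000), with 0 turning splitting off. The sketch below estimates how many splits a FastQ file of a given size would produce under that description. The helper name is hypothetical, and the calculation is an assumption based only on the option's wording, not on the pipeline's actual splitting code.

```python
def count_fastq_splits(total_reads: int, reads_per_split: int = 50_000_000) -> int:
    """Estimate the number of FastQ splits for a file with total_reads reads.

    Assumption: each split holds at most reads_per_split reads, and a value
    of 0 disables splitting so the file stays in one piece.
    """
    if total_reads < 0 or reads_per_split < 0:
        raise ValueError("read counts must be non-negative")
    if reads_per_split == 0:
        return 1  # splitting disabled
    # Ceiling division without floating point; at least one file remains.
    return max(1, -(-total_reads // reads_per_split))


print(count_fastq_splits(120_000_000))     # default split size
print(count_fastq_splits(120_000_000, 0))  # splitting turned off
```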
Specify how many reads each split of a FastQ file contains. Set to 0 to turn off splitting entirely.
  Type: integer; default: 50000000
Estimate interval size.
  Type: integer; default: 200000
Path to target bed file in case of whole exome or targeted sequencing or intervals file.
  Type: string; pattern: \S+\.(bed|interval_list)$
Disable usage of intervals.
  Type: boolean
Enable when exome or panel data is provided.
  Type: boolean
Tools to use for contamination removal, duplicate marking, variant calling and/or for annotation.
  Type: string
Disable specified tools.
  Type: string

Trim fastq file or handle UMIs
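The trimming options in this section remove a fixed number of bases from the 5' and 3' ends of each read and then discard reads shorter than a minimum length (default 15). The following Python sketch shows that idea on a single sequence string. It is an illustration under stated assumptions, not fastp's implementation: the function and argument names are hypothetical, and quality strings are ignored.

```python
from typing import Optional


def trim_read(seq: str, front: int = 0, tail: int = 0, min_length: int = 15) -> Optional[str]:
    """Trim `front` bases from the 5' end and `tail` bases from the 3' end.

    Returns the trimmed sequence, or None if it is shorter than min_length.
    """
    if front < 0 or tail < 0:
        raise ValueError("trim lengths must be non-negative")
    # Guard against trimming more bases than the read contains.
    end = max(len(seq) - tail, front)
    trimmed = seq[front:end]
    return trimmed if len(trimmed) >= min_length else None


read = "ACGT" * 5  # hypothetical 20 bp read
print(trim_read(read, front=2, tail=3))  # 15 bp remain, read is kept
print(trim_read(read, front=3, tail=3))  # 14 bp remain, read is dropped
```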
Run fastp for read trimming
  Type: boolean
Remove bp from the 5’ end of read 1
  Type: integer
Remove bp from the 5’ end of read 2
  Type: integer
Remove bp from the 3’ end of read 1
  Type: integer
Remove bp from the 3’ end of read 2
  Type: integer
Remove poly-G tails.
  Type: boolean
Minimum length of reads to keep
  Type: integer; default: 15
Save trimmed FastQ file intermediates.
  Type: boolean
If set, publishes split FASTQ files. Intended for testing purposes.
  Type: boolean

Parameters related to the handling of Unique Molecular Identifiers (UMIs)
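Two options in this section give the length of the UMI in the read and the number of bases to skip after it when extracting with fastp. The sketch below illustrates how those two numbers carve up a read. It assumes the UMI sits at the start of the read, which is only one of the possible locations, and the function name is hypothetical. It does not reproduce fastp's behaviour.

```python
from typing import Tuple


def extract_umi(seq: str, umi_length: int, skip: int = 0) -> Tuple[str, str]:
    """Split a read into (umi, remaining_sequence).

    Assumption: the UMI occupies the first umi_length bases, followed by
    `skip` bases that are discarded before the biological sequence starts.
    """
    if umi_length < 0 or skip < 0:
        raise ValueError("umi_length and skip must be non-negative")
    if len(seq) < umi_length:
        raise ValueError("read is shorter than the UMI length")
    umi = seq[:umi_length]
    remainder = seq[umi_length + skip:]
    return umi, remainder


# Hypothetical read: 4 bp UMI, 2 bp spacer, then the insert.
print(extract_umi("AACCGGTTACGTACGT", umi_length=4, skip=2))
```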
Specify UMI read structure for fgbio UMI consensus read generation
  Type: string
Default strategy for fgbio UMI-based consensus read generation
  Type: string
Move UMIs from fastq read headers to a tag prior to deduplication.
  Type: boolean
Location of the UMI(s) to be extracted with fastp.
  Type: string
Length of the UMI(s) in the read.
  Type: integer
Number of bases to skip after the UMI(s) in the read when extracting with fastp.
  Type: integer
Tag detailing where UMIs are present inside the bam/cram file (e.g. RX).
  Type: string
Path to comma-separated file containing a list of reference genomes to filter reads against with BBSplit. You have to also explicitly set --tools bbsplit if you want to use BBSplit.
  Type: string
Path to directory or tar.gz archive for pre-built BBSplit index.
  Type: string
If this option is specified, FastQ files split by reference will be saved in the results directory.
  Type: boolean

Configure preprocessing tools
Specify aligner to be used to map reads to reference genome.
  Type: string
Save mapped files.
  Type: boolean
Save output from mapping (if --save_mapped), MarkDuplicates and base recalibration as BAM files instead of CRAM.
  Type: boolean
Enable usage of GATK Spark implementation for duplicate marking and/or base quality score recalibration
  Type: string
(no description)
  Type: integer
Generate consensus reads with Sentieon dedup rather than choosing one best read.
  Type: boolean

Configure variant calling tools
If true, skips germline variant calling for matched normal to tumor sample. Normal samples without matched tumor will still be processed through germline variant calling tools.
  Type: boolean
Overwrite ASCAT min base quality required for a read to be counted.
  Type: integer; default: 20
Overwrite ASCAT minimum depth required in the normal for a SNP to be considered.
  Type: integer; default: 10
Overwrite ASCAT min mapping quality required for a read to be counted.
  Type: integer; default: 35
Overwrite ASCAT ploidy.
  Type: number
Overwrite ASCAT purity.
  Type: number
Specify a custom chromosome length file.
  Type: string; pattern: ^\S+\.(fai|len)$
Overwrite Control-FREEC coefficientOfVariation
  Type: number; default: 0.05
Overwrite Control-FREEC contaminationAdjustment
  Type: boolean
Define known contamination value for Control-FREEC
  Type: integer
Minimal sequencing quality for a position to be considered in BAF analysis.
  Type: integer
Minimal read coverage for a position to be considered in BAF analysis.
  Type: integer
Genome ploidy used by Control-FREEC
  Type: string; default: 2
Overwrite Control-FREEC window size.
  Type: number
Copy-number reference for CNVkit
  Type: string; pattern: ^\S+\.cnn$
Filtering expression for vcflib/vcffilter
  Type: string; default: 30
Turn on the joint germline variant calling for GATK HaplotypeCaller
  Type: boolean
Runs Mutect2 in joint (multi-sample) mode for better concordance among variant calls of tumor samples from the same patient. Mutect2 outputs will be stored in a subfolder named after the patient ID under the variant_calling/mutect2/ folder. Only a single normal sample per patient is allowed. Tumor-only mode is also supported.
  Type: boolean
Do not analyze soft clipped bases in the reads for GATK Mutect2.
  Type: boolean
Panel-of-normals VCF (bgzipped) for GATK Mutect2
  Type: string; pattern: ^\S+\.vcf\.gz$
Index of the panel-of-normals (PON) VCF.
  Type: string; pattern: ^\S+\.vcf\.gz\.tbi$
Option for selecting output and emit-mode of Sentieon’s Haplotyper.
  Type: string; default: variant
Option for selecting output and emit-mode of Sentieon’s Dnascope.
  Type: string; default: variant
Option for selecting the PCR indel model used by Sentieon Dnascope.
  Type: string; default: CONSERVATIVE
(no description)
  Type: string; default: CONSERVATIVE
Option for concatenating germline vcf-files.
  Type: boolean
Option for normalization of vcf-files.
  Type: boolean
Number of chunks to split the vcf-files for varlociraptor
  Type: integer; default: 15
YTE-compatible scenario file for germline samples. Defaults to assets/varlociraptor_germline.yte.yaml
  Type: string
YTE-compatible scenario file for somatic samples. Defaults to assets/varlociraptor_somatic.yte.yaml
  Type: string
YTE-compatible scenario file for tumor only samples. Defaults to assets/varlociraptor_tumor_only.yte.yaml
  Type: string
Allow usage of fasta file for annotation with VEP
  Type: boolean
Enable the use of the VEP dbNSFP plugin.
  Type: boolean
Path to dbNSFP processed file.
  Type: string; pattern: ^\S+\.gz$
Path to dbNSFP tabix indexed file.
  Type: string; pattern: ^\S+\.vcf\.gz\.(csi|tbi)$
Consequence to annotate with
  Type: string
Fields to annotate with
  Type: string; default: rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF
Enable the use of the VEP LOFTEE plugin.
  Type: boolean
Enable the use of the VEP SpliceAI plugin.
  Type: boolean
Path to spliceai raw scores snv file.
  Type: string; pattern: ^\S+\.vcf\.gz$
Path to spliceai raw scores snv tabix indexed file.
  Type: string; pattern: ^\S+\.vcf\.gz\.(csi|tbi)$
Path to spliceai raw scores indel file.
  Type: string; pattern: ^\S+\.vcf\.gz$
Path to spliceai raw scores indel tabix indexed file.
  Type: string; pattern: ^\S+\.vcf\.gz\.(csi|tbi)$
Enable the use of the VEP SpliceRegion plugin.
  Type: boolean
Add an extra custom argument to VEP.
  Type: string; default: --everything --filter_common --per_gene --total_length --offline --format vcf
Should reflect the VEP version used in the container.
  Type: string; default: 111.0-0
The output directory where the cache will be saved. You have to use absolute paths to storage on Cloud infrastructure.
  Type: string
VEP output-file format.
  Type: string
A vcf file containing custom annotations to be used with bcftools annotate. Needs to be bgzipped.
  Type: string; pattern: ^\S+\.vcf\.gz$
Index file for bcftools_annotations
  Type: string; pattern: ^\S+\.vcf\.gz\.tbi$
Optional text file with list of columns to use from bcftools_annotations, one name per row
  Type: string
Text file with the header lines of bcftools_annotations
  Type: string

General options to interact with reference genomes.
The base path to the iGenomes reference files
  Type: string; default: s3://ngi-igenomes/igenomes/
Do not load the iGenomes reference config.
  Type: boolean
Save built references.
  Type: boolean
Only build references.
  Type: boolean
Download annotation cache.
  Type: boolean

Reference genome related files and options required for the workflow. If you use AWS iGenomes, this has already been set for you appropriately.
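The FASTA genome file path in this section is constrained by the pattern ^\S+\.fn?a(sta)?(\.gz)?$. The short Python sketch below shows which common extensions that pattern accepts. The function name and file names are hypothetical, and the check only mirrors the regular expression, not any other validation the pipeline may perform.

```python
import re

# Pattern copied from the parameter listing for the FASTA genome file.
FASTA_PATTERN = re.compile(r"^\S+\.fn?a(sta)?(\.gz)?$")


def is_valid_fasta_path(path: str) -> bool:
    """Return True if the path matches the FASTA filename pattern."""
    return FASTA_PATTERN.match(path) is not None


# Hypothetical file names.
for name in ["genome.fa", "genome.fna", "genome.fasta.gz", "genome.fastq", "genome.txt"]:
    print(name, is_valid_fasta_path(name))
```

The optional `n` and `sta` groups are what let `.fa`, `.fna` and `.fasta` all pass, with or without a trailing `.gz`.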
Name of iGenomes reference.
  Type: string; default: GATK.GRCh38
ASCAT genome.
  Type: string
Path to ASCAT allele zip file.
  Type: string; pattern: ^\S+\.zip$
Path to ASCAT loci zip file.
  Type: string; pattern: ^\S+\.zip$
Path to ASCAT GC content correction file.
  Type: string; pattern: ^\S+\.zip$
Path to ASCAT RT (replication timing) correction file.
  Type: string; pattern: ^\S+\.zip$
Path to BWA mem indices.
  Type: string
Path to bwa-mem2 mem indices.
  Type: string
Path to chromosomes folder used with Control-FREEC.
  Type: string
Path to dbsnp file.
  Type: string; pattern: ^\S+\.vcf\.gz$
Path to dbsnp index.
  Type: string; pattern: ^\S+\.vcf\.gz\.tbi$
Label string for VariantRecalibration (haplotypecaller joint variant calling). If you use AWS iGenomes, this has already been set for you appropriately.
  Type: string
Path to FASTA dictionary file.
  Type: string; pattern: ^\S+\.dict$
Path to dragmap indices.
  Type: string
Path to FASTA genome file.
  Type: string; pattern: ^\S+\.fn?a(sta)?(\.gz)?$
Path to FASTA reference index.
  Type: string
Path to GATK Mutect2 Germline Resource File.
  Type: string; pattern: \S+\.vcf\.gz$
Path to GATK Mutect2 Germline Resource Index.
  Type: string; pattern: \S+\.vcf\.gz\.tbi$
Path to known indels file.
  Type: string
Path to known indels file index.
  Type: string
Label string for VariantRecalibration (haplotypecaller joint variant calling). If you use AWS iGenomes, this has already been set for you appropriately.
  Type: string
Path to known snps file.
  Type: string; pattern: ^\S+\.vcf\.gz$
Path to known snps file index.
  Type: string; pattern: ^\S+\.vcf\.gz\.tbi$
Label string for VariantRecalibration (haplotypecaller joint variant calling). If you use AWS iGenomes, this has already been set for you appropriately.
  Type: string
Path to Control-FREEC mappability file.
  Type: string; pattern: ^\S+\.gem$
Path to models folder used with MSIsensor2.
  Type: string
Path to scan file used with MSIsensor2.
  Type: string
Path to scan file used with MSIsensorPro.
  Type: string
Path to SNP bed file for sample checking with NGSCheckMate
  Type: string; pattern: ^\S+\.bed$
Machine learning model for Sentieon Dnascope.
  Type: string; pattern: ^\S+\.model$
Path to snpEff cache.
  Type: string; default: s3://annotation-cache/snpeff_cache/
snpEff DB version.
  Type: string
Path to VEP cache.
  Type: string; default: s3://annotation-cache/vep_cache/
VEP cache version.
  Type: string
VEP genome.
  Type: string
VEP species.
  Type: string

Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
  Type: string; default: master
Base directory for Institutional configs.
  Type: string; default: https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
  Type: string
Institutional config description.
  Type: string
Institutional config contact information.
  Type: string
Institutional config URL link.
  Type: string
Base path / URL for data used in the test profiles
  Type: string; default: https://raw.githubusercontent.com/nf-core/test-datasets/sarek3
Base path / URL for data used in the modules
  Type: string
Sequencing center information to be added to read group (CN field).
  Type: string
Sequencing platform information to be added to read group (PL field).
  Type: string; default: ILLUMINA

Less common options for the pipeline, typically set in a config file.
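One option in this section limits the size of MultiQC reports attached to summary emails. Its value (default 25.MB) must match the pattern ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$. The sketch below parses such a string into a byte count. The regular expression is the listed pattern with capture groups added; the 1024-based unit factors and the function name are assumptions made for illustration, not something the listing states.

```python
import re

# Same pattern as in the listing, with the number and unit captured.
SIZE_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)\.?\s*(K|M|G|T)?B$")

# Assumption: units are powers of 1024.
UNIT_FACTORS = {None: 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size string such as '25.MB' into a number of bytes."""
    match = SIZE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid size string: {text!r}")
    number, unit = match.groups()
    return int(float(number) * UNIT_FACTORS[unit])


print(parse_size("25.MB"))
print(parse_size("1.5 GB"))
```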
Display version and exit.
  Type: boolean
Method used to save pipeline results to output directory.
  Type: string
Email address for completion summary.
  Type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Email address for completion summary, only when pipeline fails.
  Type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
  Type: boolean
File size limit when attaching MultiQC reports to summary emails.
  Type: string; default: 25.MB; pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Do not use coloured log outputs.
  Type: boolean
Incoming hook URL for messaging service
  Type: string
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
  Type: string
Custom config file to supply to MultiQC.
  Type: string
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file
  Type: string
Custom MultiQC yaml file containing HTML including a methods description.
  Type: string
Boolean whether to validate parameters against the schema at runtime
  Type: boolean; default: true
Base URL or local path to location of pipeline test dataset files
  Type: string; default: https://raw.githubusercontent.com/nf-core/test-datasets/
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
  Type: string
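
The trace report suffix above defaults to the date and time in the format yyyy-MM-dd_HH-mm-ss. For readers scripting around the pipeline's outputs, the Python sketch below produces a string in the same layout. The function name is hypothetical, and the mapping of the format letters to strftime codes is an assumption based on the usual meaning of that notation (MM for month, mm for minutes).

```python
from datetime import datetime


def trace_report_suffix(now: datetime) -> str:
    """Format a timestamp as yyyy-MM-dd_HH-mm-ss (24-hour clock)."""
    return now.strftime("%Y-%m-%d_%H-%M-%S")


print(trace_report_suffix(datetime.now()))
```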