sarek: Parameters

Define where the pipeline should find input data and save output data.

Path to comma-separated file containing information about the samples in the experiment.

type: string

pattern: ^\S+\.(csv|tsv|yaml|yml|json)$
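As a quick illustration of what the pattern above accepts, the following Python sketch tests a few file names against it. The file names and the helper function are hypothetical; only the regular expression is taken from the schema. Note that the pattern also admits TSV, YAML and JSON extensions and rejects any path containing whitespace.

```python
import re

# Pattern copied from the schema entry above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.(csv|tsv|yaml|yml|json)$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path satisfies the schema pattern."""
    return SAMPLESHEET_PATTERN.match(path) is not None


# Hypothetical file names, for illustration only.
for name in ["samplesheet.csv", "samples.tsv", "my samples.csv", "samplesheet.txt"]:
    print(f"{name!r}: {is_valid_samplesheet_path(name)}")
```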

Automatic retrieval for restart

hidden

type: string

pattern: ^\S+\.(csv|tsv|yaml|yml|json)$

Starting step

required

type: string

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

required

type: string

Most common options used for the pipeline

Specify how many reads each split of a FastQ file contains. Set to 0 to turn off splitting entirely.

type: integer

default: 50000000
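To get a feel for the effect of this value, the sketch below estimates how many chunks a FastQ file would be split into. It is plain arithmetic under the assumption that every chunk except the last holds exactly the configured number of reads; it is not the pipeline's own splitting code, and the function name is made up for the example.

```python
import math

DEFAULT_READS_PER_SPLIT = 50_000_000  # default value from the schema above


def expected_chunks(total_reads: int, reads_per_split: int = DEFAULT_READS_PER_SPLIT) -> int:
    """Estimate the number of chunks produced for one FastQ file.

    A value of 0 disables splitting, so the file stays in one piece.
    """
    if reads_per_split == 0:
        return 1
    # Round up: a partially filled last chunk still counts as a chunk.
    return max(1, math.ceil(total_reads / reads_per_split))


print(expected_chunks(120_000_000))     # 3
print(expected_chunks(120_000_000, 0))  # 1
```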

Estimate interval size.

type: integer

default: 200000

Path to a target BED file (for whole exome or targeted sequencing) or to an intervals file.

type: string

pattern: \S+\.(bed|interval_list)$

Disable usage of intervals.

type: boolean

Enable when exome or panel data is provided.

type: boolean

Tools to use for contamination removal, duplicate marking, variant calling and/or for annotation.

type: string

Disable specified tools.

type: string

Trim fastq file or handle UMIs

Run FastP for read trimming

type: boolean

Remove bp from the 5’ end of read 1

type: integer

Remove bp from the 5’ end of read 2

type: integer

Remove bp from the 3’ end of read 1

type: integer

Remove bp from the 3’ end of read 2

type: integer

Remove poly-G tails.

type: boolean

Minimum length of reads to keep

type: integer

default: 15

Save trimmed FastQ file intermediates.

type: boolean

If set, publishes split FASTQ files. Intended for testing purposes.

type: boolean

Parameters related to the handling of Unique Molecular Identifiers (UMIs)

Specify UMI read structure for fgbio UMI consensus read generation

type: string

Default strategy for fgbio UMI-based consensus read generation

type: string

Move UMIs from fastq read headers to a tag prior to deduplication.

type: boolean

Location of the UMI(s) to be extracted with fastp.

type: string

Length of the UMI(s) in the read.

type: integer

Number of bases to skip after the UMI(s) in the read when extracting with fastp.

type: integer

Tag detailing where UMIs are present inside the bam/cram file (e.g. RX).

type: string

Path to comma-separated file containing a list of reference genomes to filter reads against with BBSplit. You have to also explicitly set --tools bbsplit if you want to use BBSplit.

type: string

Path to directory or tar.gz archive for pre-built BBSplit index.

type: string

If this option is specified, FastQ files split by reference will be saved in the results directory.

type: boolean

Configure preprocessing tools

Specify aligner to be used to map reads to reference genome.

type: string

Save mapped files.

type: boolean

Save output from mapping (if --save_mapped), MarkDuplicates and base recalibration as BAM files instead of CRAM.

type: boolean

Enable usage of GATK Spark implementation for duplicate marking and/or base quality score recalibration

type: string

type: integer

Generate consensus reads with Sentieon dedup rather than choosing one best read.

type: boolean

Configure variant calling tools

If true, skips germline variant calling for matched normal to tumor sample. Normal samples without matched tumor will still be processed through germline variant calling tools.

type: boolean

Overwrite Ascat min base quality required for a read to be counted.

type: integer

default: 20

Overwrite Ascat minimum depth required in the normal for a SNP to be considered.

type: integer

default: 10

Overwrite Ascat min mapping quality required for a read to be counted.

type: integer

default: 35

Overwrite ASCAT ploidy.

type: number

Overwrite ASCAT purity.

type: number

Specify a custom chromosome length file.

type: string

pattern: ^\S+\.(fai|len)$

Overwrite Control-FREEC coefficientOfVariation

type: number

default: 0.05

Overwrite Control-FREEC contaminationAdjustment

type: boolean

Specify a known contamination value for Control-FREEC

type: integer

Minimal sequencing quality for a position to be considered in BAF analysis.

type: integer

Minimal read coverage for a position to be considered in BAF analysis.

type: integer

Genome ploidy used by Control-FREEC

type: string

default: 2

Overwrite Control-FREEC window size.

type: number

Copy-number reference for CNVkit

type: string

pattern: ^\S+\.cnn$

Filtering expression for vcflib/vcffilter

type: string

default: 30

Turn on joint germline variant calling for GATK HaplotypeCaller

type: boolean

Runs Mutect2 in joint (multi-sample) mode for better concordance among variant calls of tumor samples from the same patient. Mutect2 outputs will be stored in a subfolder named with patient ID under variant_calling/mutect2/ folder. Only a single normal sample per patient is allowed. Tumor-only mode is also supported.

type: boolean

Do not analyze soft clipped bases in the reads for GATK Mutect2.

type: boolean

Panel-of-normals VCF (bgzipped) for GATK Mutect2

type: string

pattern: ^\S+\.vcf\.gz$

Index of the panel-of-normals (PON) VCF.

type: string

pattern: ^\S+\.vcf\.gz\.tbi$

Option for selecting output and emit-mode of Sentieon’s Haplotyper.

type: string

default: variant

Option for selecting output and emit-mode of Sentieon’s Dnascope.

type: string

default: variant

Option for selecting the PCR indel model used by Sentieon Dnascope.

type: string

default: CONSERVATIVE

type: string

default: CONSERVATIVE

Option for concatenating germline vcf-files.

type: boolean

Option for normalization of vcf-files.

type: boolean

Number of chunks to split the vcf-files for varlociraptor

hidden

type: integer

default: 15

Yte compatible scenario file for germline samples. Defaults to assets/varlociraptor_germline.yte.yaml

type: string

Yte compatible scenario file for somatic samples. Defaults to assets/varlociraptor_somatic.yte.yaml

type: string

Yte compatible scenario file for tumor only samples. Defaults to assets/varlociraptor_tumor_only.yte.yaml

type: string

Allow usage of fasta file for annotation with VEP

type: boolean

Enable the use of the VEP dbNSFP plugin.

type: boolean

Path to dbNSFP processed file.

type: string

pattern: ^\S+\.gz$

Path to dbNSFP tabix indexed file.

type: string

pattern: ^\S+\.vcf\.gz\.(csi|tbi)$

Consequence to annotate with

type: string

Fields to annotate with

type: string

default: rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF
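The default above is a single comma-separated string. The sketch below shows one way to inspect it and to derive a narrower value, here keeping only the allele-frequency columns. The helper name and the `_AF` suffix filter are choices made for this example, not part of the pipeline.

```python
# Default value copied from the schema entry above.
DEFAULT_DBNSFP_FIELDS = (
    "rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,"
    "LRT_score,GERP++_RS,gnomAD_exomes_AF"
)


def select_fields(field_string: str, suffix: str) -> str:
    """Keep only the fields ending with the given suffix, as a comma-separated string."""
    fields = field_string.split(",")
    return ",".join(name for name in fields if name.endswith(suffix))


print(len(DEFAULT_DBNSFP_FIELDS.split(",")))        # 8
print(select_fields(DEFAULT_DBNSFP_FIELDS, "_AF"))  # 1000Gp3_EAS_AF,1000Gp3_AMR_AF,gnomAD_exomes_AF
```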

Enable the use of the VEP LOFTEE plugin.

type: boolean

Enable the use of the VEP SpliceAI plugin.

type: boolean

Path to spliceai raw scores snv file.

type: string

pattern: ^\S+\.vcf\.gz$

Path to spliceai raw scores snv tabix indexed file.

type: string

pattern: ^\S+\.vcf\.gz\.(csi|tbi)$

Path to spliceai raw scores indel file.

type: string

pattern: ^\S+\.vcf\.gz$

Path to spliceai raw scores indel tabix indexed file.

type: string

pattern: ^\S+\.vcf\.gz\.(csi|tbi)$

Enable the use of the VEP SpliceRegion plugin.

type: boolean

Add an extra custom argument to VEP.

type: string

default: --everything --filter_common --per_gene --total_length --offline --format vcf
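Because the default is one string holding several flags, it can help to see how it breaks into individual arguments before extending it. The following sketch tokenizes the string with Python's standard `shlex` module. How the pipeline itself forwards this string to VEP is not described here, so treat the tokenization as an approximation of ordinary shell splitting; the helper name is hypothetical.

```python
import shlex

# Default value copied from the schema entry above.
DEFAULT_VEP_ARGS = (
    "--everything --filter_common --per_gene --total_length --offline --format vcf"
)


def tokenize_args(arg_string: str) -> list[str]:
    """Split an argument string the way a POSIX shell would."""
    return shlex.split(arg_string)


tokens = tokenize_args(DEFAULT_VEP_ARGS)
print(tokens)
print("--offline" in tokens)  # True
```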

Should reflect the VEP version used in the container.

type: string

default: 111.0-0

The output directory where the cache will be saved. You have to use absolute paths to storage on Cloud infrastructure.

type: string

VEP output-file format.

type: string

A vcf file containing custom annotations to be used with bcftools annotate. Needs to be bgzipped.

type: string

pattern: ^\S+\.vcf\.gz$

Index file for bcftools_annotations

type: string

pattern: ^\S+\.vcf\.gz\.tbi$

Optional text file with list of columns to use from bcftools_annotations, one name per row

type: string

Text file with the header lines of bcftools_annotations

type: string

General options to interact with reference genomes.

The base path to the igenomes reference files

type: string

default: s3://ngi-igenomes/igenomes/

Do not load the iGenomes reference config.

type: boolean

Save built references.

type: boolean

Only build references.

type: boolean

Download annotation cache.

type: boolean

Reference genome related files and options required for the workflow. If you use AWS iGenomes, this has already been set for you appropriately.

Name of iGenomes reference.

type: string

default: GATK.GRCh38

ASCAT genome.

type: string

Path to ASCAT allele zip file.

type: string

pattern: ^\S+\.zip$

Path to ASCAT loci zip file.

type: string

pattern: ^\S+\.zip$

Path to ASCAT GC content correction file.

type: string

pattern: ^\S+\.zip$

Path to ASCAT RT (replication timing) correction file.

type: string

pattern: ^\S+\.zip$

Path to BWA mem indices.

type: string

Path to bwa-mem2 mem indices.

type: string

Path to chromosomes folder used with Control-FREEC.

type: string

Path to dbsnp file.

type: string

pattern: ^\S+\.vcf\.gz$

Path to dbsnp index.

type: string

pattern: ^\S+\.vcf\.gz\.tbi$

Label string for VariantRecalibration (haplotypecaller joint variant calling). If you use AWS iGenomes, this has already been set for you appropriately.

type: string

Path to FASTA dictionary file.

type: string

pattern: ^\S+\.dict$

Path to dragmap indices.

type: string

Path to FASTA genome file.

type: string

pattern: ^\S+\.fn?a(sta)?(\.gz)?$
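The FASTA pattern is compact, so it is not obvious at a glance which extensions it admits. The sketch below runs a handful of hypothetical file names through it. Only the regular expression comes from the schema; the file names and the helper are made up for illustration.

```python
import re

# Pattern copied from the schema entry above.
FASTA_PATTERN = re.compile(r"^\S+\.fn?a(sta)?(\.gz)?$")


def is_valid_fasta_path(path: str) -> bool:
    """Return True if the path satisfies the schema pattern."""
    return FASTA_PATTERN.match(path) is not None


# .fa, .fna and .fasta are accepted, each optionally followed by .gz.
for name in ["genome.fa", "genome.fna", "genome.fasta", "genome.fa.gz",
             "genome.fas", "genome.fasta.bgz"]:
    print(f"{name!r}: {is_valid_fasta_path(name)}")
```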

Path to FASTA reference index.

type: string

Path to GATK Mutect2 Germline Resource File.

type: string

pattern: \S+\.vcf\.gz$

Path to GATK Mutect2 Germline Resource Index.

type: string

pattern: \S+\.vcf\.gz\.tbi$

Path to known indels file.

type: string

Path to known indels file index.

type: string

Label string for VariantRecalibration (haplotypecaller joint variant calling). If you use AWS iGenomes, this has already been set for you appropriately.

type: string

Path to known snps file.

type: string

pattern: ^\S+\.vcf\.gz$

Path to known snps file index.

type: string

pattern: ^\S+\.vcf\.gz\.tbi$

Label string for VariantRecalibration (haplotypecaller joint variant calling). If you use AWS iGenomes, this has already been set for you appropriately.

type: string

Path to Control-FREEC mappability file.

type: string

pattern: ^\S+\.gem$

Path to models folder used with MSIsensor2.

type: string

Path to scan file used with MSIsensor2.

type: string

Path to scan file used with MSIsensorPro.

type: string

Path to SNP bed file for sample checking with NGSCheckMate

type: string

pattern: ^\S+\.bed$

Machine learning model for Sentieon Dnascope.

type: string

pattern: ^\S+\.model$

Path to snpEff cache.

type: string

default: s3://annotation-cache/snpeff_cache/

snpEff DB version.

type: string

Path to VEP cache.

type: string

default: s3://annotation-cache/vep_cache/

VEP cache version.

type: string

VEP genome.

type: string

VEP species.

type: string

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Base path / URL for data used in the test profiles

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/sarek3

Base path / URL for data used in the modules

hidden

type: string

Sequencing center information to be added to read group (CN field).

hidden

type: string

Sequencing platform information to be added to read group (PL field).

hidden

type: string

default: ILLUMINA

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
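One consequence of the email pattern is easy to miss: the final domain label must be two to five letters, so addresses with longer top-level domains are rejected. The sketch below checks a few made-up addresses against the pattern. The addresses and the helper name are illustrative only.

```python
import re

# Pattern copied from the schema entry above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address: str) -> bool:
    """Return True if the address satisfies the schema pattern."""
    return EMAIL_PATTERN.match(address) is not None


# Hypothetical addresses, for illustration only.
for address in ["user@example.com", "first.last@uni-lab.org",
                "user@lab.example.museum", "no-at-sign.org"]:
    print(f"{address!r}: {is_valid_email(address)}")
```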

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
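The size pattern accepts several spellings of the same limit, including the dotted form used by the default. The following sketch tests a few candidate values. It only checks the pattern and makes no claim about how the pipeline interprets the value afterwards; the helper name is hypothetical.

```python
import re

# Pattern copied from the schema entry above.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_size(value: str) -> bool:
    """Return True if the value satisfies the schema pattern."""
    return SIZE_PATTERN.match(value) is not None


# The unit letters are case-sensitive and the trailing "B" is mandatory.
for value in ["25.MB", "25MB", "1.5 GB", "100B", "25", "25 mb"]:
    print(f"{value!r}: {is_valid_size(value)}")
```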

Do not use coloured log outputs.

hidden

type: boolean

Incoming hook URL for messaging service

hidden

type: string

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Custom config file to supply to MultiQC.

hidden

type: string

Custom logo file to supply to MultiQC. The file name must also be set in the MultiQC config file.

hidden

type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Whether to validate parameters against the schema at runtime.

hidden

type: boolean

default: true

Base URL or local path to location of pipeline test dataset files

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

hidden

type: string
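The default trace report suffix follows the pattern yyyy-MM-dd_HH-mm-ss. For readers who want to reproduce or parse such a suffix outside the pipeline, the sketch below gives what should be the equivalent Python `strftime` format. The mapping assumes the usual Java-style meaning of the pattern letters (MM for month, mm for minutes), and the function name is made up for the example.

```python
from datetime import datetime

# Assumed Python equivalent of the pattern yyyy-MM-dd_HH-mm-ss.
TRACE_SUFFIX_FORMAT = "%Y-%m-%d_%H-%M-%S"


def trace_suffix(moment: datetime) -> str:
    """Format a timestamp in the style of the default trace report suffix."""
    return moment.strftime(TRACE_SUFFIX_FORMAT)


print(trace_suffix(datetime(2024, 3, 5, 14, 7, 9)))  # 2024-03-05_14-07-09
```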
