.. This file was automatically generated by docs/create.py. CLI *** Introduction ============ This section describes the command line interface (CLI) for the fuc package. For getting help on the fuc CLI: .. code-block:: text $ fuc -h usage: fuc [-h] [-v] COMMAND ... positional arguments: COMMAND bam-aldepth Compute allelic depth from a BAM file. bam-depth Compute per-base read depth from BAM files. bam-head Print the header of a BAM file. bam-index Index a BAM file. bam-rename Rename the sample in a BAM file. bam-slice Slice a BAM file. bed-intxn Find the intersection of BED files. bed-sum Summarize a BED file. cov-concat Concatenate depth of coverage files. cov-rename Rename the samples in a depth of coverage file. fa-filter Filter sequence records in a FASTA file. fq-count Count sequence reads in FASTQ files. fq-sum Summarize a FASTQ file. fuc-bgzip Write a BGZF compressed file. fuc-compf Compare the contents of two files. fuc-demux Parse the Reports directory from bcl2fastq. fuc-exist Check whether certain files exist. fuc-find Retrieve absolute paths of files whose name matches a specified pattern, optionally recursively. fuc-undetm Compute top unknown barcodes using undetermined FASTQ from bcl2fastq. maf-maf2vcf Convert a MAF file to a VCF file. maf-oncoplt Create an oncoplot with a MAF file. maf-sumplt Create a summary plot with a MAF file. maf-vcf2maf Convert a VCF file to a MAF file. ngs-bam2fq Pipeline for converting BAM files to FASTQ files. ngs-fq2bam Pipeline for converting FASTQ files to analysis-ready BAM files. ngs-hc Pipeline for germline short variant discovery. ngs-m2 Pipeline for somatic short variant discovery. ngs-pon Pipeline for constructing a panel of normals (PoN). ngs-quant Pipeline for running RNAseq quantification from FASTQ files with Kallisto. ngs-trim Pipeline for trimming adapters from FASTQ files. tabix-index Index a GFF/BED/SAM/VCF file with Tabix. tabix-slice Slice a GFF/BED/SAM/VCF file with Tabix. tbl-merge Merge two table files.
tbl-sum Summarize a table file. vcf-call Call SNVs and indels from BAM files. vcf-filter Filter a VCF file. vcf-index Index a VCF file. vcf-merge Merge two or more VCF files. vcf-rename Rename the samples in a VCF file. vcf-slice Slice a VCF file for specified regions. vcf-split Split a VCF file by individual. vcf-vcf2bed Convert a VCF file to a BED file. vcf-vep Filter a VCF file by annotations from Ensembl VEP. optional arguments: -h, --help Show this help message and exit. -v, --version Show the version number and exit. For getting help on a specific command (e.g. vcf-merge): .. code-block:: text $ fuc vcf-merge -h bam-aldepth =========== .. code-block:: text $ fuc bam-aldepth -h usage: fuc bam-aldepth [-h] bam sites Count allelic depth from a BAM file. Positional arguments: bam Input alignment file. sites TSV file containing two columns, chromosome and position. This can also be a BED or VCF file (compressed or uncompressed). Input type will be detected automatically. Optional arguments: -h, --help Show this help message and exit. [Example] Provide sites with a TSV file: $ fuc bam-aldepth in.bam sites.tsv > out.tsv [Example] Provide sites with a VCF file: $ fuc bam-aldepth in.bam sites.vcf > out.tsv bam-depth ========= .. code-block:: text $ fuc bam-depth -h usage: fuc bam-depth [-h] [-r TEXT [TEXT ...]] [--zero] bams [bams ...] Compute per-base read depth from BAM files. Under the hood, the command computes read depth using the 'samtools depth' command. Positional arguments: bams One or more input BAM files. Alternatively, you can provide a text file (.txt, .tsv, .csv, or .list) containing one BAM file per line. Optional arguments: -h, --help Show this help message and exit. -r TEXT [TEXT ...], --regions TEXT [TEXT ...] By default, the command counts all reads in BAM files, which can be excruciatingly slow for large files (e.g. whole genome sequencing). Therefore, use this argument to only output positions in given regions. 
Each region must have the format chrom:start-end and be a half-open interval with (start, end]. This means, for example, chr1:100-103 will extract positions 101, 102, and 103. Alternatively, you can provide a BED file (compressed or uncompressed) to specify regions. Note that the 'chr' prefix in contig names (e.g. 'chr1' vs. '1') will be automatically added or removed as necessary to match the input BAM's contig names. --zero Output all positions including those with zero depth. [Example] Specify regions manually: $ fuc bam-depth 1.bam 2.bam \ -r chr1:100-200 chr2:400-500 > out.tsv [Example] Specify regions with a BED file: $ fuc bam-depth bam.list \ -r in.bed > out.tsv bam-head ======== .. code-block:: text $ fuc bam-head -h usage: fuc bam-head [-h] bam Print the header of a BAM file. Positional arguments: bam Input alignment file. Optional arguments: -h, --help Show this help message and exit. [Example] Print the header of a BAM file: $ fuc bam-head in.bam bam-index ========= .. code-block:: text $ fuc bam-index -h usage: fuc bam-index [-h] bam Index a BAM file. Positional arguments: bam Input alignment file. Optional arguments: -h, --help Show this help message and exit. [Example] Index a BAM file: $ fuc bam-index in.bam bam-rename ========== .. code-block:: text $ fuc bam-rename -h usage: fuc bam-rename [-h] bam name Rename the sample in a BAM file. Positional arguments: bam Input alignment file. name New sample name. Optional arguments: -h, --help Show this help message and exit. [Example] Write a new BAM file after renaming: $ fuc bam-rename in.bam NA12878 > out.bam bam-slice ========= .. code-block:: text $ fuc bam-slice -h usage: fuc bam-slice [-h] [--format TEXT] [--fasta PATH] bam regions [regions ...] Slice a BAM file. Positional arguments: bam Input BAM file. It must be already indexed to allow random access. You can index a BAM file with the bam-index command. regions One or more regions to be sliced. 
Each region must have the format chrom:start-end and be a half-open interval with (start, end]. This means, for example, chr1:100-103 will extract positions 101, 102, and 103. Alternatively, you can provide a BED file (compressed or uncompressed) to specify regions. Note that the 'chr' prefix in contig names (e.g. 'chr1' vs. '1') will be automatically added or removed as necessary to match the input BAM's contig names. Optional arguments: -h, --help Show this help message and exit. --format TEXT Output format (default: 'BAM') (choices: 'SAM', 'BAM', 'CRAM'). --fasta PATH FASTA file. Required when --format is 'CRAM'. [Example] Specify regions manually: $ fuc bam-slice in.bam 1:100-300 2:400-700 > out.bam [Example] Specify regions with a BED file: $ fuc bam-slice in.bam regions.bed > out.bam [Example] Slice a BAM file and write CRAM output: $ fuc bam-slice in.bam regions.bed --format CRAM --fasta ref.fa > out.cram bed-intxn ========= .. code-block:: text $ fuc bed-intxn -h usage: fuc bed-intxn [-h] bed [bed ...] Find the intersection of BED files. Positional arguments: bed Input BED files. Optional arguments: -h, --help Show this help message and exit. [Example] Find the intersection of three BED files: $ fuc bed-intxn in1.bed in2.bed in3.bed > out.bed bed-sum ======= .. code-block:: text $ fuc bed-sum -h usage: fuc bed-sum [-h] [--bases INT] [--decimals INT] bed Summarize a BED file. This command will compute various summary statistics for a BED file. The returned statistics include the total numbers of probes and covered base pairs for each chromosome. By default, covered base pairs are displayed in bp, but if you prefer you can, for example, use '--bases 1000' to display in kb. Positional arguments: bed Input BED file. Optional arguments: -h, --help Show this help message and exit. --bases INT Number to divide covered base pairs (default: 1). --decimals INT Number of decimals (default: 0). cov-concat ========== ..
code-block:: text $ fuc cov-concat -h usage: fuc cov-concat [-h] [--axis INT] tsv [tsv ...] Concatenate depth of coverage files. Positional arguments: tsv Input TSV files. Optional arguments: -h, --help Show this help message and exit. --axis INT The axis to concatenate along (default: 0) (choices: 0, 1 where 0 is index and 1 is columns). [Example] Concatenate vertically: $ fuc cov-concat in1.tsv in2.tsv > out.tsv [Example] Concatenate horizontally: $ fuc cov-concat in1.tsv in2.tsv --axis 1 > out.tsv cov-rename ========== .. code-block:: text $ fuc cov-rename -h usage: fuc cov-rename [-h] [--mode TEXT] [--range INT INT] [--sep TEXT] tsv names Rename the samples in a depth of coverage file. There are three different renaming modes using the names file: - 'MAP': Default mode. Requires two columns, old names in the first and new names in the second. - 'INDEX': Requires two columns, new names in the first and 0-based indices in the second. - 'RANGE': Requires only one column of new names but --range must be specified. Positional arguments: tsv TSV file (compressed or uncompressed). names Text file containing information for renaming the samples. Optional arguments: -h, --help Show this help message and exit. --mode TEXT Renaming mode (default: 'MAP') (choices: 'MAP', 'INDEX', 'RANGE'). --range INT INT Index range to use when renaming the samples. Applicable only with the 'RANGE' mode. --sep TEXT Delimiter to use when reading the names file (default: '\t'). [Example] Using the default 'MAP' mode: $ fuc cov-rename in.tsv old_new.tsv > out.tsv [Example] Using the 'INDEX' mode: $ fuc cov-rename in.tsv new_idx.tsv --mode INDEX > out.tsv [Example] Using the 'RANGE' mode: $ fuc cov-rename in.tsv new_only.tsv --mode RANGE --range 2 5 > out.tsv fa-filter ========= .. code-block:: text $ fuc fa-filter -h usage: fuc fa-filter [-h] [--contigs TEXT [TEXT ...]] [--exclude] fasta Filter sequence records in a FASTA file.
Positional arguments: fasta Input FASTA file (compressed or uncompressed). Optional arguments: -h, --help Show this help message and exit. --contigs TEXT [TEXT ...] One or more contigs to be selected. Alternatively, you can provide a file containing one contig per line. --exclude Exclude specified contigs. [Example] Select certain contigs: $ fuc fa-filter in.fasta --contigs chr1 chr2 > out.fasta [Example] Exclude certain contigs: $ fuc fa-filter in.fasta --contigs contigs.list --exclude > out.fasta fq-count ======== .. code-block:: text $ fuc fq-count -h usage: fuc fq-count [-h] [fastq ...] Count sequence reads in FASTQ files. Positional arguments: fastq Input FASTQ files (compressed or uncompressed) (default: stdin). Optional arguments: -h, --help Show this help message and exit. [Example] When the input is a FASTQ file: $ fuc fq-count in1.fastq in2.fastq [Example] When the input is stdin: $ cat fastq.list | fuc fq-count fq-sum ====== .. code-block:: text $ fuc fq-sum -h usage: fuc fq-sum [-h] fastq Summarize a FASTQ file. This command will output a summary of the input FASTQ file. The summary includes the total number of sequence reads, the distribution of read lengths, and the numbers of unique and duplicate sequences. Positional arguments: fastq Input FASTQ file (compressed or uncompressed). Optional arguments: -h, --help Show this help message and exit. [Example] Summarize a FASTQ file: $ fuc fq-sum in.fastq fuc-bgzip ========= .. code-block:: text $ fuc fuc-bgzip -h usage: fuc fuc-bgzip [-h] [file ...] Write a BGZF compressed file. BGZF (Blocked GNU Zip Format) is a modified form of gzip compression which can be applied to any file format to provide compression with efficient random access. In addition to being required for random access to and writing of BAM files, the BGZF format can also be used for most of the sequence data formats (e.g. FASTA, FASTQ, GenBank, VCF, MAF). Positional arguments: file Input file to be compressed (default: stdin).
Optional arguments: -h, --help Show this help message and exit. [Example] When the input is a VCF file: $ fuc fuc-bgzip in.vcf > out.vcf.gz [Example] When the input is stdin: $ cat in.vcf | fuc fuc-bgzip > out.vcf.gz fuc-compf ========= .. code-block:: text $ fuc fuc-compf -h usage: fuc fuc-compf [-h] left right Compare the contents of two files. This command will compare the contents of two files, returning 'True' if they are identical and 'False' otherwise. Positional arguments: left Input left file. right Input right file. Optional arguments: -h, --help Show this help message and exit. [Example] Compare two files: $ fuc fuc-compf left.txt right.txt fuc-demux ========= .. code-block:: text $ fuc fuc-demux -h usage: fuc fuc-demux [-h] [--sheet PATH] reports output Parse the Reports directory from bcl2fastq. This command will parse, and extract various statistics from, HTML files in the Reports directory created by the bcl2fastq or bcl2fastq2 program. After creating an output directory, the command will write the following files: - flowcell-summary.csv - lane-summary.csv - top-unknown-barcodes.csv - reports.pdf Use --sheet to sort samples in the lane-summary.csv file in the same order as your SampleSheet.csv file. You can also provide a modified version of your SampleSheet.csv file to subset samples for the lane-summary.csv and reports.pdf files. Positional arguments: reports Input Reports directory. output Output directory (will be created). Optional arguments: -h, --help Show this help message and exit. --sheet PATH SampleSheet.csv file. Used for sorting and/or subsetting samples. fuc-exist ========= .. code-block:: text $ fuc fuc-exist -h usage: fuc fuc-exist [-h] [files ...] Check whether certain files exist. This command will check whether or not specified files including directories exist, returning 'True' if they exist and 'False' otherwise. Positional arguments: files Files and directories to be tested (default: stdin).
Optional arguments: -h, --help Show this help message and exit. [Example] Test a file: $ fuc fuc-exist in.txt [Example] Test a directory: $ fuc fuc-exist dir [Example] When the input is stdin: $ cat test.list | fuc fuc-exist fuc-find ======== .. code-block:: text $ fuc fuc-find -h usage: fuc fuc-find [-h] [-r] [-d PATH] pattern Retrieve absolute paths of files whose name matches a specified pattern, optionally recursively. Positional arguments: pattern Filename pattern. Optional arguments: -h, --help Show this help message and exit. -r, --recursive Turn on recursive retrieving. -d PATH, --directory PATH Directory to search in (default: current directory). [Example] Retrieve VCF files in the current directory only: $ fuc fuc-find "*.vcf" [Example] Retrieve VCF files recursively: $ fuc fuc-find "*.vcf" -r [Example] Retrieve VCF files in a specific directory: $ fuc fuc-find "*.vcf" -d /path/to/dir fuc-undetm ========== .. code-block:: text $ fuc fuc-undetm -h usage: fuc fuc-undetm [-h] [--count INT] fastq Compute top unknown barcodes using undetermined FASTQ from bcl2fastq. This command will compute top unknown barcodes using undetermined FASTQ from the bcl2fastq or bcl2fastq2 program. Positional arguments: fastq Undetermined FASTQ (compressed or uncompressed). Optional arguments: -h, --help Show this help message and exit. --count INT Number of top unknown barcodes to return (default: 30). [Example] Compute top unknown barcodes: $ fuc fuc-undetm Undetermined_S0_R1_001.fastq.gz maf-maf2vcf =========== .. code-block:: text $ fuc maf-maf2vcf -h usage: fuc maf-maf2vcf [-h] [--fasta PATH] [--ignore_indels] [--cols TEXT [TEXT ...]] [--names TEXT [TEXT ...]] maf Convert a MAF file to a VCF file. In order to handle INDELs the command makes use of a reference assembly (i.e. FASTA file). If SNVs are your only concern, then you do not need a FASTA file and can just use --ignore_indels.
If you are going to provide a FASTA file, please make sure to select the appropriate one (e.g. one that matches the genome assembly). In addition to basic genotype calls (e.g. '0/1'), you can extract more information from the MAF file by specifying the column(s) that contain additional genotype data of interest with the '--cols' argument. If provided, this argument will append the requested data to individual sample genotypes (e.g. '0/1:0.23'). You can also control how this additional genotype information appears in the FORMAT field (e.g. AF) with the '--names' argument. If this argument is not provided, the original column name(s) will be displayed. Positional arguments: maf MAF file (compressed or uncompressed). Optional arguments: -h, --help Show this help message and exit. --fasta PATH FASTA file (required to include INDELs in the output). --ignore_indels Use this flag to exclude INDELs from the output. --cols TEXT [TEXT ...] Column(s) in the MAF file. --names TEXT [TEXT ...] Name(s) to be displayed in the FORMAT field. [Example] Convert both SNVs and indels: $ fuc maf-maf2vcf in.maf --fasta hs37d5.fa > out.vcf [Example] Convert SNVs only: $ fuc maf-maf2vcf in.maf --ignore_indels > out.vcf [Example] Extract AF field: $ fuc maf-maf2vcf \ in.maf \ --fasta hs37d5.fa \ --cols i_TumorVAF_WU \ --names AF > out.vcf maf-oncoplt =========== .. code-block:: text $ fuc maf-oncoplt -h usage: fuc maf-oncoplt [-h] [--count INT] [--figsize FLOAT FLOAT] [--label_fontsize FLOAT] [--ticklabels_fontsize FLOAT] [--legend_fontsize FLOAT] maf out Create an oncoplot with a MAF file. The format of output image (PDF/PNG/JPEG/SVG) will be automatically determined by the output file's extension. Positional arguments: maf Input MAF file. out Output image file. Optional arguments: -h, --help Show this help message and exit. --count INT Number of top mutated genes to display (default: 10). --figsize FLOAT FLOAT Width, height in inches (default: [15, 10]).
--label_fontsize FLOAT Font size of labels (default: 15). --ticklabels_fontsize FLOAT Font size of tick labels (default: 15). --legend_fontsize FLOAT Font size of legend texts (default: 15). [Example] Output a PNG file: $ fuc maf-oncoplt in.maf out.png [Example] Output a PDF file: $ fuc maf-oncoplt in.maf out.pdf maf-sumplt ========== .. code-block:: text $ fuc maf-sumplt -h usage: fuc maf-sumplt [-h] [--figsize FLOAT FLOAT] [--title_fontsize FLOAT] [--ticklabels_fontsize FLOAT] [--legend_fontsize FLOAT] maf out Create a summary plot with a MAF file. The format of output image (PDF/PNG/JPEG/SVG) will be automatically determined by the output file's extension. Positional arguments: maf Input MAF file. out Output image file. Optional arguments: -h, --help Show this help message and exit. --figsize FLOAT FLOAT Width, height in inches (default: [15, 10]). --title_fontsize FLOAT Font size of subplot titles (default: 16). --ticklabels_fontsize FLOAT Font size of tick labels (default: 12). --legend_fontsize FLOAT Font size of legend texts (default: 12). [Example] Output a PNG file: $ fuc maf-sumplt in.maf out.png [Example] Output a PDF file: $ fuc maf-sumplt in.maf out.pdf maf-vcf2maf =========== .. code-block:: text $ fuc maf-vcf2maf -h usage: fuc maf-vcf2maf [-h] vcf Convert a VCF file to a MAF file. Positional arguments: vcf Annotated VCF file. Optional arguments: -h, --help Show this help message and exit. [Example] Convert VCF to MAF: $ fuc maf-vcf2maf in.vcf > out.maf ngs-bam2fq ========== .. code-block:: text $ fuc ngs-bam2fq -h usage: fuc ngs-bam2fq [-h] [--thread INT] [--qsub TEXT] [--force] manifest output Pipeline for converting BAM files to FASTQ files. This pipeline assumes that input BAM files consist of paired-end reads, and will output two zipped FASTQ files for each sample (forward and reverse reads). That is, SAMPLE.bam will produce SAMPLE_R1.fastq.gz and SAMPLE_R2.fastq.gz. By default, the pipeline will be run in a local environment.
Use --qsub to leverage a parallel environment, in which case SGE is required. External dependencies: - [Required] SAMtools: Required for BAM to FASTQ conversion. - [Optional] SGE: Required for job submission (i.e. qsub). Manifest columns: - [Required] BAM: Input BAM file. Positional arguments: manifest Sample manifest CSV file. output Output directory. Optional arguments: -h, --help Show this help message and exit. --thread INT Number of threads to use (default: 1). --qsub TEXT SGE resource to request with qsub for BAM to FASTQ conversion. Since this operation supports multithreading, it is recommended to specify a parallel environment (PE) to speed up the process (also see --thread). --force Overwrite the output directory if it already exists. [Example] Run in local environment: $ fuc ngs-bam2fq \ manifest.csv \ output_dir \ --thread 10 $ sh output_dir/shell/runme.sh [Example] Run in parallel environment with specific queue: $ fuc ngs-bam2fq \ manifest.csv \ output_dir \ --qsub "-q queue_name -pe pe_name 10" \ --thread 10 $ sh output_dir/shell/runme.sh [Example] Run in parallel environment with specific nodes: $ fuc ngs-bam2fq \ manifest.csv \ output_dir \ --qsub "-l h='node_A|node_B' -pe pe_name 10" \ --thread 10 $ sh output_dir/shell/runme.sh ngs-fq2bam ========== .. code-block:: text $ fuc ngs-fq2bam -h usage: fuc ngs-fq2bam [-h] [--qsub TEXT] [--bed PATH] [--thread INT] [--platform TEXT] [--job TEXT] [--force] [--keep] manifest fasta output java vcf [vcf ...] Pipeline for converting FASTQ files to analysis-ready BAM files. Here, "analysis-ready" means that the final BAM files are: 1) aligned to a reference genome, 2) sorted by genomic coordinate, 3) marked for duplicate reads, 4) recalibrated by BQSR model, and 5) ready for downstream analyses such as variant calling. By default, the pipeline will be run in a local environment. Use --qsub to leverage a parallel environment, in which case SGE is required.
External dependencies: - [Required] BWA: Required for read alignment (i.e. BWA-MEM). - [Required] SAMtools: Required for sorting and indexing BAM files. - [Required] GATK: Required for marking duplicate reads and recalibrating BAM files. - [Optional] SGE: Required for job submission (i.e. qsub). Manifest columns: - Name: Sample name. - Read1: Path to forward FASTQ file. - Read2: Path to reverse FASTQ file. Positional arguments: manifest Sample manifest CSV file. fasta Reference FASTA file. output Output directory. java Java resource to request for GATK. vcf One or more reference VCF files containing known variant sites (e.g. 1000 Genomes Project). Optional arguments: -h, --help Show this help message and exit. --qsub TEXT SGE resource to request for qsub. --bed PATH BED file. --thread INT Number of threads to use (default: 1). --platform TEXT Sequencing platform (default: 'Illumina'). --job TEXT Job submission ID for SGE. --force Overwrite the output directory if it already exists. --keep Keep temporary files. [Example] Run in local environment: $ fuc ngs-fq2bam \ manifest.csv \ ref.fa \ output_dir \ "-Xmx15g -Xms15g" \ 1.vcf 2.vcf 3.vcf \ --thread 10 $ sh output_dir/shell/runme.sh [Example] Run in parallel environment with specific queue: $ fuc ngs-fq2bam \ manifest.csv \ ref.fa \ output_dir \ "-Xmx15g -Xms15g" \ 1.vcf 2.vcf 3.vcf \ --qsub "-q queue_name -pe pe_name 10" \ --thread 10 $ sh output_dir/shell/runme.sh [Example] Run in parallel environment with specific nodes: $ fuc ngs-fq2bam \ manifest.csv \ ref.fa \ output_dir \ "-Xmx15g -Xms15g" \ 1.vcf 2.vcf 3.vcf \ --qsub "-l h='node_A|node_B' -pe pe_name 10" \ --thread 10 $ sh output_dir/shell/runme.sh ngs-hc ====== .. code-block:: text $ fuc ngs-hc -h usage: fuc ngs-hc [-h] [--bed PATH] [--dbsnp PATH] [--thread INT] [--batch INT] [--job TEXT] [--force] [--keep] [--posix] manifest fasta output qsub java1 java2 Pipeline for germline short variant discovery.
External dependencies: - SGE: Required for job submission (i.e. qsub). - GATK: Required for variant calling (i.e. HaplotypeCaller) and filtration. Manifest columns: - BAM: Recalibrated BAM file. Positional arguments: manifest Sample manifest CSV file. fasta Reference FASTA file. output Output directory. qsub SGE resource to request for qsub. java1 Java resource to request for single-sample variant calling. java2 Java resource to request for joint variant calling. Optional arguments: -h, --help Show this help message and exit. --bed PATH BED file. --dbsnp PATH VCF file from dbSNP. --thread INT Number of threads to use (default: 1). --batch INT Batch size used for GenomicsDBImport (default: 0). This controls the number of samples for which readers are open at once and therefore provides a way to minimize memory consumption. The size of 0 means no batching (i.e. readers for all samples will be opened at once). --job TEXT Job submission ID for SGE. --force Overwrite the output directory if it already exists. --keep Keep temporary files. --posix Set GenomicsDBImport to allow for optimizations to improve the usability and performance for shared Posix Filesystems (e.g. NFS, Lustre). If set, file level locking is disabled and file system writes are minimized by keeping a higher number of file descriptors open for longer periods of time. Use with --batch if keeping a large number of file descriptors open is an issue. [Example] Specify queue: $ fuc ngs-hc \ manifest.csv \ ref.fa \ output_dir \ "-q queue_name" \ "-Xmx15g -Xms15g" \ "-Xmx30g -Xms30g" \ --dbsnp dbSNP.vcf [Example] Specify nodes: $ fuc ngs-hc \ manifest.csv \ ref.fa \ output_dir \ "-l h='node_A|node_B'" \ "-Xmx15g -Xms15g" \ "-Xmx30g -Xms30g" \ --bed in.bed ngs-m2 ====== .. code-block:: text $ fuc ngs-m2 -h usage: fuc ngs-m2 [-h] [--bed PATH] [--force] [--keep] manifest fasta output pon germline qsub java Pipeline for somatic short variant discovery.
External dependencies: - SGE: Required for job submission (i.e. qsub). - GATK: Required for variant calling (i.e. Mutect2) and filtration. Manifest columns: - Tumor: Recalibrated BAM file for tumor. - Normal: Recalibrated BAM file for matched normal. Positional arguments: manifest Sample manifest CSV file. fasta Reference FASTA file. output Output directory. pon PoN VCF file. germline Germline VCF file. qsub SGE resource to request for qsub. java Java resource to request for GATK. Optional arguments: -h, --help Show this help message and exit. --bed PATH BED file. --force Overwrite the output directory if it already exists. --keep Keep temporary files. ngs-pon ======= .. code-block:: text $ fuc ngs-pon -h usage: fuc ngs-pon [-h] [--bed PATH] [--force] [--keep] manifest fasta output qsub java Pipeline for constructing a panel of normals (PoN). Dependencies: - GATK: Required for constructing PoN. Manifest columns: - BAM: Path to recalibrated BAM file. Positional arguments: manifest Sample manifest CSV file. fasta Reference FASTA file. output Output directory. qsub SGE resource to request for qsub. java Java resource to request for GATK. Optional arguments: -h, --help Show this help message and exit. --bed PATH BED file. --force Overwrite the output directory if it already exists. --keep Keep temporary files. [Example] Specify queue: $ fuc ngs-pon \ manifest.csv \ ref.fa \ output_dir \ "-q queue_name" \ "-Xmx15g -Xms15g" [Example] Specify nodes: $ fuc ngs-pon \ manifest.csv \ ref.fa \ output_dir \ "-l h='node_A|node_B'" \ "-Xmx15g -Xms15g" ngs-quant ========= .. code-block:: text $ fuc ngs-quant -h usage: fuc ngs-quant [-h] [--thread INT] [--bootstrap INT] [--job TEXT] [--force] [--posix] [--stranded TEXT] manifest index output qsub Pipeline for running RNAseq quantification from FASTQ files with Kallisto. External dependencies: - SGE: Required for job submission (i.e. qsub). - kallisto: Required for RNAseq quantification. Manifest columns: - Name: Sample name.
- Read1: Path to forward FASTQ file. - Read2: Path to reverse FASTQ file. Positional arguments: manifest Sample manifest CSV file. index Kallisto index file. output Output directory. qsub SGE resource to request for qsub. Optional arguments: -h, --help Show this help message and exit. --thread INT Number of threads to use (default: 1). --bootstrap INT Number of bootstrap samples (default: 50). --job TEXT Job submission ID for SGE. --force Overwrite the output directory if it already exists. --posix Set the environment variable HDF5_USE_FILE_LOCKING=FALSE before running Kallisto. This is required for shared Posix Filesystems (e.g. NFS, Lustre). --stranded TEXT Strand specific reads (default: 'none') (choices: 'none', 'forward', 'reverse'). [Example] Specify queue: $ fuc ngs-quant \ manifest.csv \ transcripts.idx \ output_dir \ "-q queue_name -pe pe_name 10" \ --thread 10 ngs-trim ======== .. code-block:: text $ fuc ngs-trim -h usage: fuc ngs-trim [-h] [--thread INT] [--job TEXT] [--force] manifest output qsub Pipeline for trimming adapters from FASTQ files. External dependencies: - SGE: Required for job submission (i.e. qsub). - cutadapt: Required for trimming adapters. Manifest columns: - Name: Sample name. - Read1: Path to forward FASTQ file. - Read2: Path to reverse FASTQ file. Positional arguments: manifest Sample manifest CSV file. output Output directory. qsub SGE resource to request for qsub. Optional arguments: -h, --help Show this help message and exit. --thread INT Number of threads to use (default: 1). --job TEXT Job submission ID for SGE. --force Overwrite the output directory if it already exists. [Example] Specify queue: $ fuc ngs-trim \ manifest.csv \ output_dir \ "-q queue_name -pe pe_name 10" \ --thread 10 tabix-index =========== .. code-block:: text $ fuc tabix-index -h usage: fuc tabix-index [-h] [--force] file Index a GFF/BED/SAM/VCF file with Tabix.
The Tabix program is used to index a TAB-delimited genome position file (GFF/BED/SAM/VCF) and create an index file (.tbi). The input data file must be position sorted and compressed by bgzip. Positional arguments: file File to be indexed. Optional arguments: -h, --help Show this help message and exit. --force Force to overwrite the index file if it is present. [Example] Index a GFF file: $ fuc tabix-index in.gff.gz [Example] Index a BED file: $ fuc tabix-index in.bed.gz [Example] Index a SAM file: $ fuc tabix-index in.sam.gz [Example] Index a VCF file: $ fuc tabix-index in.vcf.gz tabix-slice =========== .. code-block:: text $ fuc tabix-slice -h usage: fuc tabix-slice [-h] file regions [regions ...] Slice a GFF/BED/SAM/VCF file with Tabix. After creating an index file (.tbi), the Tabix program is able to quickly retrieve data lines overlapping regions specified in the format 'chr:start-end'. Coordinates specified in this region format are 1-based and inclusive. Positional arguments: file File to be sliced. regions One or more regions. Optional arguments: -h, --help Show this help message and exit. [Example] Slice a VCF file: $ fuc tabix-slice in.vcf.gz chr1:100-200 > out.vcf tbl-merge ========= .. code-block:: text $ fuc tbl-merge -h usage: fuc tbl-merge [-h] [--how TEXT] [--on TEXT [TEXT ...]] [--lsep TEXT] [--rsep TEXT] [--osep TEXT] left right Merge two table files. This command will merge two table files using one or more shared columns. The command essentially wraps the 'pandas.DataFrame.merge' method from the pandas package. For details on the merging algorithms, please visit the method's documentation page. Positional arguments: left Input left file. right Input right file. Optional arguments: -h, --help Show this help message and exit. --how TEXT Type of merge to be performed (default: 'inner') (choices: 'left', 'right', 'outer', 'inner', 'cross'). --on TEXT [TEXT ...] Column names to join on. --lsep TEXT Delimiter to use for the left file (default: '\t'). 
--rsep TEXT Delimiter to use for the right file (default: '\t'). --osep TEXT Delimiter to use for the output file (default: '\t'). [Example] Merge two tables: $ fuc tbl-merge left.tsv right.tsv > merged.tsv [Example] When the left table is a CSV: $ fuc tbl-merge left.csv right.tsv --lsep , > merged.tsv [Example] Merge with the outer algorithm: $ fuc tbl-merge left.tsv right.tsv --how outer > merged.tsv tbl-sum ======= .. code-block:: text $ fuc tbl-sum -h usage: fuc tbl-sum [-h] [--sep TEXT] [--skiprows TEXT] [--na_values TEXT [TEXT ...]] [--keep_default_na] [--expr TEXT] [--columns TEXT [TEXT ...]] [--dtypes PATH] table_file Summarize a table file. Positional arguments: table_file Table file. Optional arguments: -h, --help Show this help message and exit. --sep TEXT Delimiter to use (default: '\t'). --skiprows TEXT Comma-separated line numbers to skip (0-indexed) or number of lines to skip at the start of the file (e.g. `--skiprows 1,` will skip the second line, `--skiprows 2,4` will skip the third and fifth lines, and `--skiprows 10` will skip the first 10 lines). --na_values TEXT [TEXT ...] Additional strings to recognize as NA/NaN (by default, the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'). --keep_default_na Whether or not to include the default NaN values when parsing the data (see 'pandas.read_table' for details). --expr TEXT Query the columns of a pandas.DataFrame with a boolean expression (e.g. `--expr "A == 'yes'"`). --columns TEXT [TEXT ...] Columns to be summarized (by default, all columns will be included). --dtypes PATH File of column names and their data types (either 'categorical' or 'numeric'); one tab-delimited pair of column name and data type per line. [Example] Summarize a table: $ fuc tbl-sum table.tsv vcf-call ======== ..
code-block:: text $ fuc vcf-call -h usage: fuc vcf-call [-h] [-r TEXT [TEXT ...]] [--min-mq INT] [--max-depth INT] [--dir-path PATH] [--gap_frac FLOAT] [--group-samples PATH] fasta bams [bams ...] Call SNVs and indels from BAM files. Under the hood, the command utilizes the bcftool program to call variants. Positional arguments: fasta Reference FASTA file. bams One or more input BAM files. Alternatively, you can provide a text file (.txt, .tsv, .csv, or .list) containing one BAM file per line. Optional arguments: -h, --help Show this help message and exit. -r TEXT [TEXT ...], --regions TEXT [TEXT ...] By default, the command looks at each genomic position with coverage in BAM files, which can be excruciatingly slow for large files (e.g. whole genome sequencing). Therefore, use this argument to only call variants in given regions. Each region must have the format chrom:start-end and be a half-open interval with (start, end]. This means, for example, chr1:100-103 will extract positions 101, 102, and 103. Alternatively, you can provide a BED file (compressed or uncompressed) to specify regions. Note that the 'chr' prefix in contig names (e.g. 'chr1' vs. '1') will be automatically added or removed as necessary to match the input BAM's contig names. --min-mq INT Minimum mapping quality for an alignment to be used (default: 1). --max-depth INT At a position, read maximally this number of reads per input file (default: 250). --dir-path PATH By default, intermediate files (likelihoods.bcf, calls.bcf, and calls.normalized.bcf) will be stored in a temporary directory, which is automatically deleted after creating final VCF. If you provide a directory path, intermediate files will be stored there. --gap_frac FLOAT Minimum fraction of gapped reads for calling indels (default: 0.002). --group-samples PATH By default, all samples are assumed to come from a single population. 
                           This option allows grouping samples into
                           populations and applying the HWE assumption within
                           but not across the populations. To use this option,
                           provide a tab-delimited text file with sample names
                           in the first column and group names in the second
                           column. If '--group-samples -' is given instead, no
                           HWE assumption is made at all and single-sample
                           calling is performed. Note that in low coverage data
                           this inflates the rate of false positives. Therefore,
                           make sure you know what you are doing.

   [Example] Specify regions manually:
     $ fuc vcf-call ref.fa 1.bam 2.bam \
       -r chr1:100-200 chr2:400-500 > out.vcf

   [Example] Specify regions with a BED file:
     $ fuc vcf-call ref.fa bam.list \
       -r in.bed > out.vcf

vcf-filter
==========

.. code-block:: text

   $ fuc vcf-filter -h
   usage: fuc vcf-filter [-h] [--expr TEXT] [--samples PATH]
                         [--drop_duplicates [TEXT ...]] [--greedy] [--opposite]
                         [--filter_empty]
                         vcf

   Filter a VCF file.

   Positional arguments:
     vcf                   VCF file (compressed or uncompressed).

   Optional arguments:
     -h, --help            Show this help message and exit.
     --expr TEXT           Expression to evaluate.
     --samples PATH        File of sample names to apply the marking (one
                           sample per line).
     --drop_duplicates [TEXT ...]
                           Only consider certain columns for identifying
                           duplicates, by default use all of the columns.
     --greedy              Use this flag to mark even ambiguous genotypes as
                           missing.
     --opposite            Use this flag to mark all genotypes that do not
                           satisfy the query expression as missing and leave
                           those that do intact.
     --filter_empty        Use this flag to remove rows with no genotype calls
                           at all.

   [Example] Mark genotypes with 0/0 as missing:
     $ fuc vcf-filter in.vcf --expr 'GT == "0/0"' > out.vcf

   [Example] Mark genotypes that are not 0/0 as missing:
     $ fuc vcf-filter in.vcf --expr 'GT != "0/0"' > out.vcf

   [Example] Mark genotypes whose DP is less than 30 as missing:
     $ fuc vcf-filter in.vcf --expr 'DP < 30' > out.vcf

   [Example] Same as above, but also mark ambiguous genotypes as missing:
     $ fuc vcf-filter in.vcf --expr 'DP < 30' --greedy > out.vcf

   [Example] Build a complex query to select genotypes to be marked missing:
     $ fuc vcf-filter in.vcf --expr 'AD[1] < 10 or DP < 30' --opposite > out.vcf

   [Example] Compute summary statistics and subset samples:
     $ fuc vcf-filter in.vcf \
       --expr 'np.mean(AD) < 10' --greedy --samples sample.list > out.vcf

   [Example] Drop duplicate rows:
     $ fuc vcf-filter in.vcf --drop_duplicates CHROM POS REF ALT > out.vcf

   [Example] Filter out rows without genotypes:
     $ fuc vcf-filter in.vcf --filter_empty > out.vcf

vcf-index
=========

.. code-block:: text

   $ fuc vcf-index -h
   usage: fuc vcf-index [-h] [--force] vcf

   Index a VCF file.

   This command will create an index file (.tbi) for the input VCF.

   Positional arguments:
     vcf         Input VCF file to be indexed. When an uncompressed file is
                 given, the command will automatically create a BGZF
                 compressed copy of the file (.gz) before indexing.

   Optional arguments:
     -h, --help  Show this help message and exit.
     --force     Force to overwrite the index file if it is already present.

   [Example] Index a compressed VCF file:
     $ fuc vcf-index in.vcf.gz

   [Example] Index an uncompressed VCF file (will create a compressed VCF first):
     $ fuc vcf-index in.vcf

vcf-merge
=========

.. code-block:: text

   $ fuc vcf-merge -h
   usage: fuc vcf-merge [-h] [--how TEXT] [--format TEXT] [--sort] [--collapse]
                        vcf_files [vcf_files ...]

   Merge two or more VCF files.

   Positional arguments:
     vcf_files      VCF files (compressed or uncompressed). Note that the 'chr'
                    prefix in contig names (e.g. 'chr1' vs.
                    '1') will be automatically added or removed as necessary
                    to match the contig names of the first VCF.

   Optional arguments:
     -h, --help     Show this help message and exit.
     --how TEXT     Type of merge as defined in pandas.DataFrame.merge
                    (default: 'inner').
     --format TEXT  FORMAT subfields to be retained (e.g. 'GT:AD:DP')
                    (default: 'GT').
     --sort         Use this flag to turn off sorting of records
                    (default: True).
     --collapse     Use this flag to collapse duplicate records
                    (default: False).

   [Example] Merge multiple VCF files:
     $ fuc vcf-merge 1.vcf 2.vcf 3.vcf > merged.vcf

   [Example] Keep the GT, AD, DP fields:
     $ fuc vcf-merge 1.vcf 2.vcf --format GT:AD:DP > merged.vcf

vcf-rename
==========

.. code-block:: text

   $ fuc vcf-rename -h
   usage: fuc vcf-rename [-h] [--mode TEXT] [--range INT INT] [--sep TEXT]
                         vcf names

   Rename the samples in a VCF file.

   There are three different renaming modes using the 'names' file:
     - 'MAP': Default mode. Requires two columns, old names in the first
       and new names in the second.
     - 'INDEX': Requires two columns, new names in the first and 0-based
       indices in the second.
     - 'RANGE': Requires only one column of new names but '--range' must
       be specified.

   Positional arguments:
     vcf              VCF file (compressed or uncompressed).
     names            Text file containing information for renaming the samples.

   Optional arguments:
     -h, --help       Show this help message and exit.
     --mode TEXT      Renaming mode (default: 'MAP') (choices: 'MAP',
                      'INDEX', 'RANGE').
     --range INT INT  Index range to use when renaming the samples.
                      Applicable only with the 'RANGE' mode.
     --sep TEXT       Delimiter to use for reading the 'names' file
                      (default: '\t').

   [Example] Using the default 'MAP' mode:
     $ fuc vcf-rename in.vcf old_new.tsv > out.vcf

   [Example] Using the 'INDEX' mode:
     $ fuc vcf-rename in.vcf new_idx.tsv --mode INDEX > out.vcf

   [Example] Using the 'RANGE' mode:
     $ fuc vcf-rename in.vcf new_only.tsv --mode RANGE --range 2 5 > out.vcf

vcf-slice
=========

.. code-block:: text

   $ fuc vcf-slice -h
   usage: fuc vcf-slice [-h] vcf regions [regions ...]

   Slice a VCF file for specified regions.

   Positional arguments:
     vcf         Input VCF file must be already BGZF compressed (.gz) and
                 indexed (.tbi) to allow random access. A VCF file can be
                 compressed with the fuc-bgzip command and indexed with the
                 vcf-index command.
     regions     One or more regions to be sliced. Each region must have the
                 format chrom:start-end and be a half-open interval with
                 (start, end]. This means, for example, chr1:100-103 will
                 extract positions 101, 102, and 103. Alternatively, you can
                 provide a BED file (compressed or uncompressed) to specify
                 regions. Note that the 'chr' prefix in contig names (e.g.
                 'chr1' vs. '1') will be automatically added or removed as
                 necessary to match the input VCF's contig names.

   Optional arguments:
     -h, --help  Show this help message and exit.

   [Example] Specify regions manually:
     $ fuc vcf-slice in.vcf.gz 1:100-300 2:400-700 > out.vcf

   [Example] Specify regions with a BED file:
     $ fuc vcf-slice in.vcf.gz regions.bed > out.vcf

   [Example] Output a compressed file:
     $ fuc vcf-slice in.vcf.gz regions.bed | fuc fuc-bgzip > out.vcf.gz

vcf-split
=========

.. code-block:: text

   $ fuc vcf-split -h
   usage: fuc vcf-split [-h] [--clean] [--force] vcf output

   Split a VCF file by individual.

   Positional arguments:
     vcf         VCF file to be split.
     output      Output directory.

   Optional arguments:
     -h, --help  Show this help message and exit.
     --clean     By default, the command will only return variants present in
                 each individual. Use this flag to stop this behavior and make
                 sure that all individuals have the same number of variants.
     --force     Overwrite the output directory if it already exists.

   [Example] Split a VCF file by individual:
     $ fuc vcf-split in.vcf output_dir

vcf-vcf2bed
===========

.. code-block:: text

   $ fuc vcf-vcf2bed -h
   usage: fuc vcf-vcf2bed [-h] vcf

   Convert a VCF file to a BED file.

   Positional arguments:
     vcf         VCF file (compressed or uncompressed).

   Optional arguments:
     -h, --help  Show this help message and exit.

   [Example] Convert VCF to BED:
     $ fuc vcf-vcf2bed in.vcf > out.bed

vcf-vep
=======

.. code-block:: text

   $ fuc vcf-vep -h
   usage: fuc vcf-vep [-h] [--opposite] [--as_zero] vcf expr

   Filter a VCF file by annotations from Ensembl VEP.

   Positional arguments:
     vcf         VCF file annotated by Ensembl VEP (compressed or uncompressed).
     expr        Query expression to evaluate.

   Optional arguments:
     -h, --help  Show this help message and exit.
     --opposite  Use this flag to return only records that don't meet the said
                 criteria.
     --as_zero   Use this flag to treat missing values as zero instead of NaN.

   [Example] Select variants in the TP53 gene:
     $ fuc vcf-vep in.vcf "SYMBOL == 'TP53'" > out.vcf

   [Example] Exclude variants from the TP53 gene:
     $ fuc vcf-vep in.vcf "SYMBOL != 'TP53'" > out.vcf

   [Example] Same as above:
     $ fuc vcf-vep in.vcf "SYMBOL == 'TP53'" --opposite > out.vcf

   [Example] Select splice donor or stop-gain variants:
     $ fuc vcf-vep in.vcf \
       "Consequence in ['splice_donor_variant', 'stop_gained']" > out.vcf

   [Example] Build a complex query to select specific variants:
     $ fuc vcf-vep in.vcf \
       "(SYMBOL == 'TP53') and (Consequence.str.contains('stop_gained'))" > out.vcf

   [Example] Select variants whose gnomAD AF is less than 0.001:
     $ fuc vcf-vep in.vcf "gnomAD_AF < 0.001" > out.vcf

   [Example] Variants without AF data will be treated as having AF of 0:
     $ fuc vcf-vep in.vcf "gnomAD_AF < 0.001" --as_zero > out.vcf
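
The difference between the last two vcf-vep examples can be illustrated with plain Python floats. The following is only a sketch of the comparison semantics, assuming the query follows ordinary floating-point rules in which any ordered comparison against NaN is false; it is not how fuc implements the filter. The helper ``passes`` and the sample values are hypothetical.

.. code-block:: python

   import math

   def passes(af, threshold=0.001, as_zero=False):
       """Return True if an allele frequency satisfies 'AF < threshold'.

       `af` is a float, with NaN standing in for a missing annotation.
       `passes` is a hypothetical helper, not part of fuc.
       """
       if as_zero and math.isnan(af):
           af = 0.0  # mimic --as_zero: treat a missing value as zero
       # Any ordered comparison involving NaN evaluates to False.
       return af < threshold

   # Hypothetical gnomAD_AF values: rare, common, and missing.
   values = [0.0004, 0.25, float("nan")]

   default_result = [passes(v) for v in values]
   as_zero_result = [passes(v, as_zero=True) for v in values]

   print(default_result)
   print(as_zero_result)

Under these assumptions, the record with a missing frequency fails the comparison by default and passes it once missing values are treated as zero, which is why ``--as_zero`` can enlarge the output of a "less than" query.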