CLI

Introduction

This section describes the command line interface (CLI) for the fuc package.

To get help on the fuc CLI:

$ fuc -h
usage: fuc [-h] [-v] COMMAND ...

positional arguments:
  COMMAND
    bam-aldepth  Compute allelic depth from a SAM/BAM/CRAM file.
    bam-depth    Compute read depth from SAM/BAM/CRAM files.
    bam-head     Print the header of a SAM/BAM/CRAM file.
    bam-index    Index a SAM/BAM/CRAM file.
    bam-rename   Rename the sample in a SAM/BAM/CRAM file.
    bam-slice    Slice a SAM/BAM/CRAM file.
    bed-intxn    Find the intersection of BED files.
    bed-sum      Summarize a BED file.
    cov-concat   Concatenate depth of coverage files.
    cov-rename   Rename the samples in a depth of coverage file.
    fa-filter    Filter sequence records in a FASTA file.
    fq-count     Count sequence reads in FASTQ files.
    fq-sum       Summarize a FASTQ file.
    fuc-bgzip    Write a BGZF compressed file.
    fuc-compf    Compare the contents of two files.
    fuc-demux    Parse the Reports directory from bcl2fastq.
    fuc-exist    Check whether certain files exist.
    fuc-find     Find all filenames matching a specified pattern recursively.
    fuc-undetm   Compute top unknown barcodes using undetermined FASTQ from bcl2fastq.
    maf-maf2vcf  Convert a MAF file to a VCF file.
    maf-oncoplt  Create an oncoplot with a MAF file.
    maf-sumplt   Create a summary plot with a MAF file.
    maf-vcf2maf  Convert a VCF file to a MAF file.
    ngs-bam2fq   Pipeline for converting BAM files to FASTQ files.
    ngs-fq2bam   Pipeline for converting FASTQ files to analysis-ready BAM files.
    ngs-hc       Pipeline for germline short variant discovery.
    ngs-m2       Pipeline for somatic short variant discovery.
    ngs-pon      Pipeline for constructing a panel of normals (PoN).
    tabix-index  Index a GFF/BED/SAM/VCF file with Tabix.
    tabix-slice  Slice a GFF/BED/SAM/VCF file with Tabix.
    tbl-merge    Merge two table files.
    tbl-sum      Summarize a table file.
    vcf-filter   Filter a VCF file.
    vcf-index    Index a VCF file.
    vcf-merge    Merge two or more VCF files.
    vcf-rename   Rename the samples in a VCF file.
    vcf-slice    Slice a VCF file for specified regions.
    vcf-vcf2bed  Convert a VCF file to a BED file.
    vcf-vep      Filter a VCF file by annotations from Ensembl VEP.

optional arguments:
  -h, --help     Show this help message and exit.
  -v, --version  Show the version number and exit.

To get help on a specific command (e.g. vcf-merge):

$ fuc vcf-merge -h

bam-aldepth

$ fuc bam-aldepth -h
usage: fuc bam-aldepth [-h] bam sites

Count allelic depth from a SAM/BAM/CRAM file.

Positional arguments:
  bam         Alignment file.
  sites       TSV file containing two columns, chromosome and position.
              This can also be a BED or VCF file (compressed or
              uncompressed). The input type will be detected automatically.

Optional arguments:
  -h, --help  Show this help message and exit.

[Example] Provide sites with a TSV file:
  $ fuc bam-aldepth in.bam sites.tsv > out.tsv

[Example] Provide sites with a VCF file:
  $ fuc bam-aldepth in.bam sites.vcf > out.tsv

bam-depth

$ fuc bam-depth -h
usage: fuc bam-depth [-h] [--bam PATH [PATH ...]] [--fn PATH] [--bed PATH]
                     [--region TEXT] [--zero]

Compute read depth from SAM/BAM/CRAM files.

By default, the command will count all reads within the alignment files. You
can specify regions of interest with --bed or --region. When you do this, pay
close attention to the 'chr' string in contig names (e.g. 'chr1' vs. '1').
Note also that --region requires the input files to be indexed.

Optional arguments:
  -h, --help            Show this help message and exit.
  --bam PATH [PATH ...]
                        One or more alignment files. Cannot be used with --fn.
  --fn PATH             File containing one alignment file per line. Cannot
                        be used with --bam.
  --bed PATH            BED file. Cannot be used with --region.
  --region TEXT         Target region ('chrom:start-end'). Cannot be used
                        with --bed.
  --zero                Output all positions including those with zero depth.

[Example] To specify regions with a BED file:
  $ fuc bam-depth \
  --bam 1.bam 2.bam \
  --bed in.bed > out.tsv

[Example] To specify regions manually:
  $ fuc bam-depth \
  --fn bam.list \
  --region chr1:100-200 > out.tsv

bam-head

$ fuc bam-head -h
usage: fuc bam-head [-h] bam

Print the header of a SAM/BAM/CRAM file.

Positional arguments:
  bam         Alignment file.

Optional arguments:
  -h, --help  Show this help message and exit.

[Example] Print the header of a BAM file:
  $ fuc bam-head in.bam

bam-index

$ fuc bam-index -h
usage: fuc bam-index [-h] bam

Index a SAM/BAM/CRAM file.

Positional arguments:
  bam         Alignment file.

Optional arguments:
  -h, --help  Show this help message and exit.

[Example] Index a BAM file:
  $ fuc bam-index in.bam

bam-rename

$ fuc bam-rename -h
usage: fuc bam-rename [-h] bam name

Rename the sample in a SAM/BAM/CRAM file.

Positional arguments:
  bam         Alignment file.
  name        New sample name.

Optional arguments:
  -h, --help  Show this help message and exit.

[Example] Write a new BAM file after renaming:
  $ fuc bam-rename in.bam NA12878 > out.bam

bam-slice

$ fuc bam-slice -h
usage: fuc bam-slice [-h] [--format TEXT] [--fasta PATH]
                     bam regions [regions ...]

Slice an alignment file (SAM/BAM/CRAM).

Positional arguments:
  bam            Input alignment file. It must already be indexed (.bai) to
                 allow random access. You can index an alignment file with
                 the bam-index command.
  regions        One or more regions to be sliced. Each region must have the
                 format chrom:start-end and is treated as a half-open
                 interval (start, end]. For example, chr1:100-103 will
                 extract positions 101, 102, and 103. Alternatively, you can
                 provide a BED file (compressed or uncompressed) to specify
                 regions. Note that the 'chr' prefix in contig names (e.g.
                 'chr1' vs. '1') will be automatically added or removed as
                 necessary to match the contig names in the alignment file.

Optional arguments:
  -h, --help     Show this help message and exit.
  --format TEXT  Output format (default: 'BAM') (choices: 'SAM', 'BAM',
                 'CRAM').
  --fasta PATH   FASTA file. Required when --format is 'CRAM'.

[Example] Specify regions manually:
  $ fuc bam-slice in.bam 1:100-300 2:400-700 > out.bam

[Example] Specify regions with a BED file:
  $ fuc bam-slice in.bam regions.bed > out.bam

[Example] Slice a CRAM file:
  $ fuc bam-slice in.bam regions.bed --format CRAM --fasta ref.fa > out.cram
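
The (start, end] convention and the 'chr' prefix handling described above can be sketched in Python. The helpers below (`parse_region`, `covered_positions`, `match_chr_prefix`) are hypothetical and not part of fuc; the sketch assumes that region contig names are adjusted to match the contig names in the alignment file.

```python
import re

# Hypothetical helpers for illustration only; they are not part of fuc.
REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError(f"expected 'chrom:start-end', got {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if start >= end:
        # Under (start, end] an interval with start >= end covers nothing.
        raise ValueError(f"empty interval: {region!r}")
    return match["chrom"], start, end


def covered_positions(region):
    """List the positions in the (start, end] interval: start+1 .. end."""
    _, start, end = parse_region(region)
    return list(range(start + 1, end + 1))


def match_chr_prefix(chrom, alignment_contigs):
    """Add or drop a leading 'chr' so chrom matches the alignment's contigs."""
    if chrom in alignment_contigs:
        return chrom
    if chrom.startswith("chr") and chrom[3:] in alignment_contigs:
        return chrom[3:]
    if "chr" + chrom in alignment_contigs:
        return "chr" + chrom
    raise KeyError(f"contig {chrom!r} not found in the alignment file")
```

With these definitions, `covered_positions("chr1:100-103")` returns `[101, 102, 103]`, and `match_chr_prefix("chr1", {"1", "2"})` returns `"1"`.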

bed-intxn

$ fuc bed-intxn -h
usage: fuc bed-intxn [-h] bed [bed ...]

Find the intersection of BED files.

Positional arguments:
  bed         BED files.

Optional arguments:
  -h, --help  Show this help message and exit.

[Example] Find the intersection of three BED files:
  $ fuc bed-intxn in1.bed in2.bed in3.bed > out.bed
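
For intuition about what intersecting BED files involves, the following Python sketch parses BED lines into per-chromosome interval lists and intersects them pairwise with a two-pointer sweep. It assumes standard 0-based, half-open BED coordinates; the function names are hypothetical and the code is not taken from fuc.

```python
from collections import defaultdict


def merge(intervals):
    """Sort and merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def read_bed(lines):
    """Group (start, end) intervals by chromosome from BED-formatted lines."""
    intervals = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals[chrom].append((int(start), int(end)))
    return {chrom: merge(ivs) for chrom, ivs in intervals.items()}


def intersect_two(a, b):
    """Two-pointer intersection of two sorted, merged interval lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def intersect_beds(beds):
    """Intersect any number of parsed BED dicts, chromosome by chromosome."""
    result = beds[0]
    for other in beds[1:]:
        result = {
            chrom: intersect_two(ivs, other[chrom])
            for chrom, ivs in result.items()
            if chrom in other
        }
    return {chrom: ivs for chrom, ivs in result.items() if ivs}
```

Intersecting more than two files is a left-to-right reduction; because intersection is commutative and associative, the order of the input files does not change the result.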

bed-sum

$ fuc bed-sum -h
usage: fuc bed-sum [-h] [--bases INT] [--decimals INT] bed

Summarize a BED file.

This command will compute various summary statistics for a BED file. The
returned statistics include the total numbers of probes and covered base
pairs for each chromosome.

By default, covered base pairs are displayed in bp, but if you prefer, you can,
for example, use '--bases 1000' to display them in kb.

Positional arguments:
  bed             BED file.

Optional arguments:
  -h, --help      Show this help message and exit.
  --bases INT     Number by which to divide covered base pairs (default: 1).
  --decimals INT  Number of decimals (default: 0).
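
The per-chromosome statistics can be approximated with a short Python sketch. `summarize_bed` is a hypothetical helper whose `bases` and `decimals` parameters play the roles of --bases and --decimals; it assumes the BED intervals do not overlap, since it does not merge them.

```python
def summarize_bed(lines, bases=1, decimals=0):
    """Return {chrom: (probes, covered)} for BED-formatted lines.

    `covered` is the summed interval length (end - start) divided by
    `bases` and rounded to `decimals` places. Overlapping intervals are
    not merged here, so overlapping bases would be counted twice.
    """
    probes, covered = {}, {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        probes[chrom] = probes.get(chrom, 0) + 1
        covered[chrom] = covered.get(chrom, 0) + int(end) - int(start)
    return {
        chrom: (probes[chrom], round(covered[chrom] / bases, decimals))
        for chrom in probes
    }
```

For example, two intervals of 1,000 bp and 1,500 bp on chr1 give `(2, 2.5)` for chr1 when called with `bases=1000, decimals=1`.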

cov-concat

$ fuc cov-concat -h
usage: fuc cov-concat [-h] [--axis INT] PATH [PATH ...]

Concatenate depth of coverage files.

Positional arguments:
  PATH        One or more TSV files.

Optional arguments:
  -h, --help  Show this help message and exit.
  --axis INT  The axis to concatenate along (default: 0) (choices:
              0, 1 where 0 is index and 1 is columns).

[Example] Concatenate vertically:
  $ fuc cov-concat in1.tsv in2.tsv > out.tsv

[Example] Concatenate horizontally:
  $ fuc cov-concat in1.tsv in2.tsv --axis 1 > out.tsv

cov-rename

$ fuc cov-rename -h
usage: fuc cov-rename [-h] [--mode TEXT] [--range INT INT] [--sep TEXT]
                      tsv names

Rename the samples in a depth of coverage file.

There are three different renaming modes using the names file:
  - 'MAP': Default mode. Requires two columns, old names in the first
    and new names in the second.
  - 'INDEX': Requires two columns, new names in the first and 0-based
    indices in the second.
  - 'RANGE': Requires only one column of new names but --range must
    be specified.

Positional arguments:
  tsv              TSV file (compressed or uncompressed).
  names            Text file containing information for renaming the samples.

Optional arguments:
  -h, --help       Show this help message and exit.
  --mode TEXT      Renaming mode (default: 'MAP') (choices: 'MAP',
                   'INDEX', 'RANGE').
  --range INT INT  Index range to use when renaming the samples.
                   Applicable only with the 'RANGE' mode.
  --sep TEXT       Delimiter to use when reading the names file
                   (default: '\t').

[Example] Using the default 'MAP' mode:
  $ fuc cov-rename in.tsv old_new.tsv > out.tsv

[Example] Using the 'INDEX' mode:
  $ fuc cov-rename in.tsv new_idx.tsv --mode INDEX > out.tsv

[Example] Using the 'RANGE' mode:
  $ fuc cov-rename in.tsv new_only.tsv --mode RANGE --range 2 5 > out.tsv
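
The three renaming modes can be expressed as a small Python function. `rename_samples` and `read_names` are hypothetical helpers operating on a plain list of sample names. In particular, the sketch assumes that --range describes a 0-based, half-open index range, which the help text does not state.

```python
import csv


def read_names(path, sep="\t"):
    """Read the names file into a list of rows (lists of strings)."""
    with open(path, newline="") as fh:
        return [row for row in csv.reader(fh, delimiter=sep) if row]


def rename_samples(samples, rows, mode="MAP", index_range=None):
    """Return a renamed copy of `samples` (sample names in column order)."""
    renamed = list(samples)
    if mode == "MAP":
        # Two columns: old name, new name. Unlisted samples keep their name.
        mapping = {old: new for old, new in rows}
        renamed = [mapping.get(name, name) for name in renamed]
    elif mode == "INDEX":
        # Two columns: new name, 0-based index of the sample to rename.
        for new, index in rows:
            renamed[int(index)] = new
    elif mode == "RANGE":
        # One column of new names. ASSUMPTION: the range is half-open,
        # i.e. (2, 5) renames the samples at indices 2, 3 and 4.
        if index_range is None:
            raise ValueError("RANGE mode requires index_range")
        start, end = index_range
        new_names = [row[0] for row in rows]
        if len(new_names) != end - start:
            raise ValueError("number of new names must equal the range size")
        renamed[start:end] = new_names
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return renamed
```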

fa-filter

$ fuc fa-filter -h
usage: fuc fa-filter [-h] [--contigs TEXT [TEXT ...]] [--exclude] fasta

Filter sequence records in a FASTA file.

Positional arguments:
  fasta                 FASTA file (compressed or uncompressed).

Optional arguments:
  -h, --help            Show this help message and exit.
  --contigs TEXT [TEXT ...]
                        One or more contigs to be selected. Alternatively, you can
                        provide a file containing one contig per line.
  --exclude             Exclude specified contigs.

[Example] Select certain contigs:
  $ fuc fa-filter in.fasta --contigs chr1 chr2 > out.fasta

[Example] Exclude contigs listed in a file:
  $ fuc fa-filter in.fasta --contigs contigs.list --exclude > out.fasta
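
Selecting or excluding contigs amounts to streaming through the FASTA records and keeping those whose ID is (or is not) in the requested set. The sketch below is a hypothetical Python illustration, not fuc's code. It assumes the contig ID is the first word of the header line, and that a single argument naming an existing file is a list of contigs.

```python
import os


def resolve_contigs(values):
    """Treat a single existing path as a file of contig names (one per line).

    ASSUMPTION: this is one plausible way to tell a file apart from a list
    of contig names; fuc's own detection logic may differ.
    """
    if len(values) == 1 and os.path.isfile(values[0]):
        with open(values[0]) as fh:
            return [line.strip() for line in fh if line.strip()]
    return list(values)


def filter_fasta(lines, contigs, exclude=False):
    """Yield FASTA lines for records whose ID is (or, with exclude, is not)
    one of `contigs`. The ID is the first word after '>'."""
    wanted = set(contigs)
    keep = False
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            keep = (name in wanted) != exclude
        if keep:
            yield line
```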

fq-count

$ fuc fq-count -h
usage: fuc fq-count [-h] [fastq ...]

Count sequence reads in FASTQ files.

Positional arguments:
  fastq       FASTQ files (compressed or uncompressed) (default: stdin).

Optional arguments:
  -h, --help  Show this help message and exit.

[Example] When the inputs are FASTQ files:
  $ fuc fq-count in1.fastq in2.fastq

[Example] When the input is stdin:
  $ cat fastq.list | fuc fq-count
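
Because each FASTQ record normally spans exactly four lines, counting reads reduces to counting lines. The Python sketch below assumes that layout (no wrapped sequence lines); `count_reads` is a hypothetical helper, not part of fuc.

```python
import gzip


def count_reads(path):
    """Count records in a FASTQ file (plain or .gz), 4 lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        n_lines = sum(1 for _ in fh)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```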

fq-sum

$ fuc fq-sum -h
usage: fuc fq-sum [-h] fastq

Summarize a FASTQ file.

This command will output a summary of the input FASTQ file. The summary
includes the total number of sequence reads, the distribution of read
lengths, and the numbers of unique and duplicate sequences.

Positional arguments:
  fastq       FASTQ file (compressed or uncompressed).

Optional arguments:
  -h, --help  Show this help message and exit.

[Example] Summarize a FASTQ file:
  $ fuc fq-sum in.fastq
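
A rough Python equivalent of these summary statistics is sketched below. It assumes 4-line FASTQ records and reports duplicates as reads whose sequence already occurred earlier (total minus distinct sequences); the help text does not say exactly how fuc defines unique and duplicate sequences, and `summarize_fastq` is a hypothetical helper.

```python
from collections import Counter


def summarize_fastq(lines):
    """Summarize FASTQ records supplied as an iterable of text lines.

    Assumes the standard 4-line record layout (header, sequence, '+',
    qualities), so every 4th line starting at index 1 is a sequence.
    """
    lengths = Counter()
    sequences = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:
            seq = line.rstrip("\n")
            lengths[len(seq)] += 1
            sequences[seq] += 1
    total = sum(sequences.values())
    distinct = len(sequences)
    return {
        "total": total,
        "lengths": dict(sorted(lengths.items())),
        "distinct": distinct,
        # Reads whose sequence already appeared earlier in the input.
        "duplicates": total - distinct,
    }
```

An open file handle can be passed directly as `lines`; for compressed input, open the file with `gzip.open(path, "rt")`.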

fuc-bgzip

$ fuc fuc-bgzip -h
usage: fuc fuc-bgzip [-h] [file ...]

Write a BGZF compressed file.

BGZF (Blocked GNU Zip Format) is a modified form of gzip compression which
can be applied to any file format to provide compression with efficient
random access. In addition to being required for random access to and writing
of BAM files, the BGZF format can also be used for most of the sequence data
formats (e.g. FASTA, FASTQ, GenBank, VCF, MAF).

Positional arguments:
  file        File to be compressed (default: stdin).

Optional arguments:
  -h, --help  Show this help message and exit.

[Example] When the input is a VCF file:
  $ fuc fuc-bgzip in.vcf > out.vcf.gz

[Example] When the input is stdin:
  $ cat in.vcf | fuc fuc-bgzip > out.vcf.gz
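
To make the block structure concrete, here is a minimal BGZF writer that uses only the Python standard library. It follows the BGZF layout described in the SAM/BAM format specification (gzip members carrying a 'BC' extra subfield that stores the block size, followed by a fixed 28-byte end-of-file block). It is an illustration, not fuc's implementation, and the 65280-byte chunk size is an assumption taken from htslib's default.

```python
import struct
import zlib

# Uncompressed bytes per block; htslib uses 0xff00 (65280) so that even
# incompressible data fits within the 64 KiB BGZF block limit.
MAX_INPUT = 0xFF00

# Standard empty BGZF block that marks end-of-file (28 bytes).
EOF_BLOCK = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200" "1b00"
    "0300" "00000000" "00000000"
)


def make_block(chunk, level=6):
    """Compress one chunk (<= MAX_INPUT bytes) into a single BGZF block."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)  # raw deflate
    cdata = compressor.compress(chunk) + compressor.flush()
    # BSIZE is the total block size minus 1 (18-byte header, 8-byte footer).
    bsize = 18 + len(cdata) + 8 - 1
    if bsize > 0xFFFF:
        raise ValueError("compressed block too large; use a smaller chunk")
    header = struct.pack(
        "<BBBBIBBHBBHH",
        31, 139, 8, 4,  # gzip magic, deflate method, FEXTRA flag
        0,              # MTIME
        0, 255,         # XFL, OS (unknown)
        6,              # XLEN: one 6-byte extra subfield follows
        66, 67, 2,      # subfield ID 'B', 'C' and its length (2)
        bsize,
    )
    footer = struct.pack("<II", zlib.crc32(chunk) & 0xFFFFFFFF, len(chunk))
    return header + cdata + footer


def bgzf_compress(data):
    """Return `data` as a series of BGZF blocks followed by the EOF block."""
    blocks = [
        make_block(data[i:i + MAX_INPUT])
        for i in range(0, len(data), MAX_INPUT)
    ]
    return b"".join(blocks) + EOF_BLOCK
```

Because each block is an ordinary gzip member, standard tools such as `gzip -d` or Python's `gzip.decompress` can still read the output, which is what lets BGZF files be used wherever gzip files are expected.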

fuc-compf

$ fuc fuc-compf -h
usage: fuc fuc-compf [-h] left right

Compare the contents of two files.

This command will compare the contents of two files, returning 'True' if they
are identical and 'False' otherwise.

Positional arguments:
  left        Left file.
  right       Right file.

Optional arguments:
  -h, --help  Show this help message and exit.

[Example] Compare two files:
  $ fuc fuc-compf left.txt right.txt

fuc-demux

$ fuc fuc-demux -h
usage: fuc fuc-demux [-h] [--sheet PATH] reports output

Parse the Reports directory from bcl2fastq.

This command will parse, and extract various statistics from, HTML files in
the Reports directory created by the bcl2fastq or bcl2fastq2 program. After
creating an output directory, the command will write the following files:
  - flowcell-summary.csv
  - lane-summary.csv
  - top-unknown-barcodes.csv
  - reports.pdf

Use --sheet to sort samples in the lane-summary.csv file in the same order
as your SampleSheet.csv file. You can also provide a modified version of your
SampleSheet.csv file to subset samples for the lane-summary.csv and
reports.pdf files.

Positional arguments:
  reports       Reports directory.
  output        Output directory (will be created).

Optional arguments:
  -h, --help    Show this help message and exit.
  --sheet PATH  SampleSheet.csv file. Used for sorting and/or subsetting
                samples.

fuc-exist

$ fuc fuc-exist -h
usage: fuc fuc-exist [-h] [files ...]

Check whether certain files exist.

This command will check whether the specified files (including directories)
exist, returning 'True' if they exist and 'False' otherwise.

Positional arguments:
  files       Files and directories to be tested (default: stdin).

Optional arguments:
  -h, --help  Show this help message and exit.

[Example] Test a file:
  $ fuc fuc-exist in.txt

[Example] Test a directory:
  $ fuc fuc-exist dir

[Example] When the input is stdin:
  $ cat test.list | fuc fuc-exist

fuc-find

$ fuc fuc-find -h
usage: fuc fuc-find [-h] [--dir PATH] pattern

Find all filenames matching a specified pattern recursively.

This command will recursively find all the filenames matching a specified
pattern and then return their absolute paths.

Positional arguments:
  pattern     Filename pattern.

Optional arguments:
  -h, --help  Show this help message and exit.
  --dir PATH  Directory to search in (default: current directory).

[Example] Find VCF files in the current directory:
  $ fuc fuc-find "*.vcf"

[Example] Find files whose names contain '.vcf.' (e.g. in.vcf.gz):
  $ fuc fuc-find "*.vcf.*"

[Example] Find zipped VCF files in a specific directory:
  $ fuc fuc-find "*.vcf.gz" --dir ~/test_dir
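
Recursive filename matching maps naturally onto `pathlib.Path.rglob`. The sketch below is a hypothetical Python equivalent of fuc-find; returning only regular files, sorted, is this sketch's choice, and it is not fuc's implementation.

```python
from pathlib import Path


def find_files(pattern, directory="."):
    """Recursively find files matching a glob pattern; return absolute paths."""
    root = Path(directory).expanduser()
    return sorted(
        str(path.resolve()) for path in root.rglob(pattern) if path.is_file()
    )
```

Note the quotes around the patterns in the shell examples: without them, the shell may expand `*.vcf` against the current directory before fuc ever sees the pattern.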

fuc-undetm

$ fuc fuc-undetm -h
usage: fuc fuc-undetm [-h] [--count INT] fastq

Compute top unknown barcodes using undetermined FASTQ from bcl2fastq.

This command will compute top unknown barcodes using undetermined FASTQ from
the bcl2fastq or bcl2fastq2 program.

Positional arguments:
  fastq        Undetermined FASTQ (compressed or uncompressed).

Optional arguments:
  -h, --help   Show this help message and exit.
  --count INT  Number of top unknown barcodes to return (default: 30).

[Example] Compute top unknown barcodes:
  $ fuc fuc-undetm Undetermined_S0_R1_001.fastq.gz
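
Conceptually, finding the top unknown barcodes is a frequency count over the index sequences recorded in the read headers. The Python sketch below is hypothetical (not fuc's code) and assumes the bcl2fastq header layout in which the index sequence is the last ':'-separated field.

```python
import gzip
from collections import Counter


def open_fastq(path):
    """Open a FASTQ file as text, decompressing it if it ends in '.gz'."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt")


def top_unknown_barcodes(lines, count=30):
    """Tally index sequences from FASTQ header lines (every 4th line).

    ASSUMPTION: headers follow the bcl2fastq layout, where the index
    sequence is the last ':'-separated field, e.g.
    '@M001:1:FC:1:1101:100:200 1:N:0:ATCACG+GCTAGC'.
    """
    barcodes = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0:
            barcodes[line.rstrip("\n").split(":")[-1]] += 1
    return barcodes.most_common(count)
```

For example, `with open_fastq("Undetermined_S0_R1_001.fastq.gz") as fh: print(top_unknown_barcodes(fh))` would print the 30 most frequent barcodes with their read counts.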

maf-maf2vcf

$ fuc maf-maf2vcf -h
usage: fuc maf-maf2vcf [-h] [--fasta PATH] [--ignore_indels]
                       [--cols TEXT [TEXT ...]] [--names TEXT [TEXT ...]]
                       maf

Convert a MAF file to a VCF file.

In order to handle INDELs, the command makes use of a reference assembly
(i.e. FASTA file). If SNVs are your only concern, then you do not need a
FASTA file and can just use --ignore_indels.

If you are going to provide a FASTA file, please make sure to select the
appropriate one (e.g. one that matches the genome assembly of the MAF file).

In addition to basic genotype calls (e.g. '0/1'), you can extract more
information from the MAF file by specifying the column(s) that contain
additional genotype data of interest with the '--cols' argument. If provided,
this argument will append the requested data to individual sample genotypes
(e.g. '0/1:0.23').

You can also control how this additional genotype information appears in the
FORMAT field (e.g. AF) with the '--names' argument. If this argument is not
provided, the original column name(s) will be displayed.

Positional arguments:
  maf                   MAF file (compressed or uncompressed).

Optional arguments:
  -h, --help            Show this help message and exit.
  --fasta PATH          FASTA file (required to include INDELs in the output).
  --ignore_indels       Use this flag to exclude INDELs from the output.
  --cols TEXT [TEXT ...]
                        Column(s) in the MAF file.
  --names TEXT [TEXT ...]
                        Name(s) to be displayed in the FORMAT field.

[Example] Convert both SNVs and INDELs:
  $ fuc maf-maf2vcf in.maf --fasta hs37d5.fa > out.vcf

[Example] Convert SNVs only:
  $ fuc maf-maf2vcf in.maf --ignore_indels > out.vcf

[Example] Extract AF field:
  $ fuc maf-maf2vcf \
  in.maf \
  --fasta hs37d5.fa \
  --cols i_TumorVAF_WU \
  --names AF > out.vcf
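
How --cols and --names shape the output can be sketched as a string-building step. The Python function below is hypothetical: `gt` is a genotype call the converter has already determined, and `row` is a dict holding one MAF record's column values.

```python
def build_format(gt, row, cols, names=None):
    """Return (FORMAT, sample) strings for one variant.

    gt: genotype call such as '0/1'; row: dict of MAF column values;
    cols: MAF columns to append (like --cols); names: FORMAT keys (like
    --names), defaulting to the original column names.
    """
    if names is None:
        names = cols
    if len(names) != len(cols):
        raise ValueError("cols and names must have the same length")
    fmt = ":".join(["GT", *names])
    sample = ":".join([gt, *(str(row[col]) for col in cols)])
    return fmt, sample
```

For the last example above, a row with `i_TumorVAF_WU` equal to 0.23 would yield the FORMAT string `GT:AF` and the sample string `0/1:0.23`.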

maf-oncoplt

$ fuc maf-oncoplt -h
usage: fuc maf-oncoplt [-h] [--count INT] [--figsize FLOAT FLOAT]
                       [--label_fontsize FLOAT] [--ticklabels_fontsize FLOAT]
                       [--legend_fontsize FLOAT]
                       maf out

Create an oncoplot with a MAF file.

The format of output image (PDF/PNG/JPEG/SVG) will be automatically
determined by the output file's extension.

Positional arguments:
  maf                   MAF file.
  out                   Output image file.

Optional arguments:
  -h, --help            Show this help message and exit.
  --count INT           Number of top mutated genes to display (default: 10).
  --figsize FLOAT FLOAT
                        Width, height in inches (default: [15, 10]).
  --label_fontsize FLOAT
                        Font size of labels (default: 15).
  --ticklabels_fontsize FLOAT
                        Font size of tick labels (default: 15).
  --legend_fontsize FLOAT
                        Font size of legend texts (default: 15).

[Example] Output a PNG file:
  $ fuc maf-oncoplt in.maf out.png

[Example] Output a PDF file:
  $ fuc maf-oncoplt in.maf out.pdf

maf-sumplt

$ fuc maf-sumplt -h
usage: fuc maf-sumplt [-h] [--figsize FLOAT FLOAT] [--title_fontsize FLOAT]
                      [--ticklabels_fontsize FLOAT] [--legend_fontsize FLOAT]
                      maf out

Create a summary plot with a MAF file.

The format of output image (PDF/PNG/JPEG/SVG) will be automatically
determined by the output file's extension.

Positional arguments:
  maf                   MAF file.
  out                   Output image file.

Optional arguments:
  -h, --help            Show this help message and exit.
  --figsize FLOAT FLOAT
                        Width, height in inches (default: [15, 10]).
  --title_fontsize FLOAT
                        Font size of subplot titles (default: 16).
  --ticklabels_fontsize FLOAT
                        Font size of tick labels (default: 12).
  --legend_fontsize FLOAT
                        Font size of legend texts (default: 12).

[Example] Output a PNG file:
  $ fuc maf-sumplt in.maf out.png

[Example] Output a PDF file:
  $ fuc maf-sumplt in.maf out.pdf

maf-vcf2maf

$ fuc maf-vcf2maf -h
usage: fuc maf-vcf2maf [-h] vcf

Convert a VCF file to a MAF file.

Positional arguments:
  vcf         Annotated VCF file.

Optional arguments:
  -h, --help  Show this help message and exit.

[Example] Convert VCF to MAF:
  $ fuc maf-vcf2maf in.vcf > out.maf

ngs-bam2fq

$ fuc ngs-bam2fq -h
usage: fuc ngs-bam2fq [-h] [--thread INT] [--force] manifest output qsub

Pipeline for converting BAM files to FASTQ files.

This pipeline will assume input BAM files consist of paired-end reads
and output two zipped FASTQ files for each sample (forward and reverse
reads). That is, SAMPLE.bam will produce SAMPLE_R1.fastq.gz and
SAMPLE_R2.fastq.gz.

External dependencies:
  - SGE: Required for job submission (i.e. qsub).
  - SAMtools: Required for BAM to FASTQ conversion.

Manifest columns:
  - BAM: BAM file.

Positional arguments:
  manifest      Sample manifest CSV file.
  output        Output directory.
  qsub          SGE resource to request with qsub for BAM to FASTQ
                conversion. Since this operation supports multithreading,
                it is recommended to specify a parallel environment (PE)
                to speed up the process (also see --thread).

Optional arguments:
  -h, --help    Show this help message and exit.
  --thread INT  Number of threads to use (default: 1).
  --force       Overwrite the output directory if it already exists.

[Example] Specify queue:
  $ fuc ngs-bam2fq \
  manifest.csv \
  output_dir \
  "-q queue_name -pe pe_name 10" \
  --thread 10

[Example] Specify nodes:
  $ fuc ngs-bam2fq \
  manifest.csv \
  output_dir \
  "-l h='node_A|node_B' -pe pe_name 10" \
  --thread 10

ngs-fq2bam

$ fuc ngs-fq2bam -h
usage: fuc ngs-fq2bam [-h] [--bed PATH] [--thread INT] [--platform TEXT]
                      [--job TEXT] [--force] [--keep]
                      manifest fasta output qsub java vcf [vcf ...]

Pipeline for converting FASTQ files to analysis-ready BAM files.

Here, "analysis-ready" means that the final BAM files are: 1) aligned to a
reference genome, 2) sorted by genomic coordinate, 3) marked for duplicate
reads, 4) recalibrated by BQSR model, and 5) ready for downstream analyses
such as variant calling.

External dependencies:
  - SGE: Required for job submission (i.e. qsub).
  - BWA: Required for read alignment (i.e. BWA-MEM).
  - SAMtools: Required for sorting and indexing BAM files.
  - GATK: Required for marking duplicate reads and recalibrating BAM files.

Manifest columns:
  - Name: Sample name.
  - Read1: Path to forward FASTQ file.
  - Read2: Path to reverse FASTQ file.

Positional arguments:
  manifest         Sample manifest CSV file.
  fasta            Reference FASTA file.
  output           Output directory.
  qsub             SGE resource to request for qsub.
  java             Java resource to request for GATK.
  vcf              One or more reference VCF files containing known variant
                   sites (e.g. 1000 Genomes Project).

Optional arguments:
  -h, --help       Show this help message and exit.
  --bed PATH       BED file.
  --thread INT     Number of threads to use (default: 1).
  --platform TEXT  Sequencing platform (default: 'Illumina').
  --job TEXT       Job submission ID for SGE.
  --force          Overwrite the output directory if it already exists.
  --keep           Keep temporary files.

[Example] Specify queue:
  $ fuc ngs-fq2bam \
  manifest.csv \
  ref.fa \
  output_dir \
  "-q queue_name -pe pe_name 10" \
  "-Xmx15g -Xms15g" \
  1.vcf 2.vcf 3.vcf \
  --thread 10

[Example] Specify nodes:
  $ fuc ngs-fq2bam \
  manifest.csv \
  ref.fa \
  output_dir \
  "-l h='node_A|node_B' -pe pe_name 10" \
  "-Xmx15g -Xms15g" \
  1.vcf 2.vcf 3.vcf \
  --thread 10

ngs-hc

$ fuc ngs-hc -h
usage: fuc ngs-hc [-h] [--bed PATH] [--dbsnp PATH] [--thread INT]
                  [--batch INT] [--job TEXT] [--force] [--keep] [--posix]
                  manifest fasta output qsub java1 java2

Pipeline for germline short variant discovery.

External dependencies:
  - SGE: Required for job submission (i.e. qsub).
  - GATK: Required for variant calling (i.e. HaplotypeCaller) and filtration.

Manifest columns:
  - BAM: Recalibrated BAM file.

Positional arguments:
  manifest      Sample manifest CSV file.
  fasta         Reference FASTA file.
  output        Output directory.
  qsub          SGE resource to request for qsub.
  java1         Java resource to request for single-sample variant calling.
  java2         Java resource to request for joint variant calling.

Optional arguments:
  -h, --help    Show this help message and exit.
  --bed PATH    BED file.
  --dbsnp PATH  VCF file from dbSNP.
  --thread INT  Number of threads to use (default: 1).
  --batch INT   Batch size used for GenomicsDBImport (default: 0). This
                controls the number of samples for which readers are
                open at once and therefore provides a way to minimize
                memory consumption. A size of 0 means no batching (i.e.
                readers for all samples will be opened at once).
  --job TEXT    Job submission ID for SGE.
  --force       Overwrite the output directory if it already exists.
  --keep        Keep temporary files.
  --posix       Set GenomicsDBImport to allow for optimizations to improve
                the usability and performance for shared Posix Filesystems
                (e.g. NFS, Lustre). If set, file level locking is disabled
                and file system writes are minimized by keeping a higher
                number of file descriptors open for longer periods of time.
                Use with --batch if keeping a large number of file
                descriptors open is an issue.

[Example] Specify queue:
  $ fuc ngs-hc \
  manifest.csv \
  ref.fa \
  output_dir \
  "-q queue_name" \
  "-Xmx15g -Xms15g" \
  "-Xmx30g -Xms30g" \
  --dbsnp dbSNP.vcf

[Example] Specify nodes:
  $ fuc ngs-hc \
  manifest.csv \
  ref.fa \
  output_dir \
  "-l h='node_A|node_B'" \
  "-Xmx15g -Xms15g" \
  "-Xmx30g -Xms30g" \
  --bed in.bed

ngs-m2

$ fuc ngs-m2 -h
usage: fuc ngs-m2 [-h] [--bed PATH] [--force] [--keep]
                  manifest fasta output pon germline qsub java

Pipeline for somatic short variant discovery.

External dependencies:
  - SGE: Required for job submission (i.e. qsub).
  - GATK: Required for variant calling (i.e. Mutect2) and filtration.

Manifest columns:
  - Tumor: Recalibrated BAM file for tumor.
  - Normal: Recalibrated BAM file for matched normal.

Positional arguments:
  manifest    Sample manifest CSV file.
  fasta       Reference FASTA file.
  output      Output directory.
  pon         PoN VCF file.
  germline    Germline VCF file.
  qsub        SGE resource to request for qsub.
  java        Java resource to request for GATK.

Optional arguments:
  -h, --help  Show this help message and exit.
  --bed PATH  BED file.
  --force     Overwrite the output directory if it already exists.
  --keep      Keep temporary files.

ngs-pon

$ fuc ngs-pon -h
usage: fuc ngs-pon [-h] [--bed PATH] [--force] [--keep]
                   manifest fasta output qsub java

Pipeline for constructing a panel of normals (PoN).

External dependencies:
  - SGE: Required for job submission (i.e. qsub).
  - GATK: Required for constructing PoN.

Manifest columns:
  - BAM: Path to recalibrated BAM file.

Positional arguments:
  manifest    Sample manifest CSV file.
  fasta       Reference FASTA file.
  output      Output directory.
  qsub        SGE resource to request for qsub.
  java        Java resource to request for GATK.

Optional arguments:
  -h, --help  Show this help message and exit.
  --bed PATH  BED file.
  --force     Overwrite the output directory if it already exists.
  --keep      Keep temporary files.

[Example] Specify queue:
  $ fuc ngs-pon \
  manifest.csv \
  ref.fa \
  output_dir \
  "-q queue_name" \
  "-Xmx15g -Xms15g"

[Example] Specify nodes:
  $ fuc ngs-pon \
  manifest.csv \
  ref.fa \
  output_dir \
  "-l h='node_A|node_B'" \
  "-Xmx15g -Xms15g"

tabix-index

$ fuc tabix-index -h
usage: fuc tabix-index [-h] [--force] file

Index a GFF/BED/SAM/VCF file with Tabix.

The Tabix program is used to index a TAB-delimited genome position file
(GFF/BED/SAM/VCF) and create an index file (.tbi). The input data file must
be position sorted and compressed by bgzip.

Positional arguments:
  file        File to be indexed.

Optional arguments:
  -h, --help  Show this help message and exit.
  --force     Overwrite the index file if it already exists.

[Example] Index a GFF file:
  $ fuc tabix-index in.gff.gz

[Example] Index a BED file:
  $ fuc tabix-index in.bed.gz

[Example] Index a SAM file:
  $ fuc tabix-index in.sam.gz

[Example] Index a VCF file:
  $ fuc tabix-index in.vcf.gz

tabix-slice

$ fuc tabix-slice -h
usage: fuc tabix-slice [-h] file regions [regions ...]

Slice a GFF/BED/SAM/VCF file with Tabix.

After creating an index file (.tbi), the Tabix program is able to quickly
retrieve data lines overlapping regions specified in the format
'chr:start-end'. Coordinates specified in this region format are 1-based and
inclusive.

Positional arguments:
  file        File to be sliced.
  regions     One or more regions.

Optional arguments:
  -h, --help  Show this help message and exit.

[Example] Slice a VCF file:
  $ fuc tabix-slice in.vcf.gz chr1:100-200 > out.vcf
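
Because genomics tools mix coordinate systems, a short Python sketch of the 1-based, inclusive region format may help. The helper names are hypothetical; `to_half_open` shows the equivalent 0-based, half-open interval, which is the convention BED files use.

```python
def parse_tabix_region(region):
    """Split 'chr:start-end' into (chrom, start, end), 1-based inclusive."""
    chrom, _, span = region.rpartition(":")
    start, end = (int(x) for x in span.split("-"))
    if not chrom or start < 1 or start > end:
        raise ValueError(f"invalid region: {region!r}")
    return chrom, start, end


def to_half_open(region):
    """Convert to the 0-based, half-open [start, end) form used by BED."""
    chrom, start, end = parse_tabix_region(region)
    return chrom, start - 1, end


def contains(region, chrom, pos):
    """True if a 1-based position (e.g. a VCF POS) falls inside the region."""
    r_chrom, start, end = parse_tabix_region(region)
    return chrom == r_chrom and start <= pos <= end
```

Under this format, `chr1:100-200` includes both position 100 and position 200, and corresponds to the BED interval `chr1 99 200`.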

tbl-merge

$ fuc tbl-merge -h
usage: fuc tbl-merge [-h] [--how TEXT] [--on TEXT [TEXT ...]] [--lsep TEXT]
                     [--rsep TEXT] [--osep TEXT]
                     left right

Merge two table files.

This command will merge two table files using one or more shared columns.
The command essentially wraps the 'pandas.DataFrame.merge' method from the
pandas package. For details on the merging algorithms, please visit the
method's documentation page.

Positional arguments:
  left                  Left file.
  right                 Right file.

Optional arguments:
  -h, --help            Show this help message and exit.
  --how TEXT            Type of merge to be performed (default: 'inner')
                        (choices: 'left', 'right', 'outer', 'inner', 'cross').
  --on TEXT [TEXT ...]  Column names to join on.
  --lsep TEXT           Delimiter to use for the left file (default: '\t').
  --rsep TEXT           Delimiter to use for the right file (default: '\t').
  --osep TEXT           Delimiter to use for the output file (default: '\t').

[Example] Merge two tables:
  $ fuc tbl-merge left.tsv right.tsv > merged.tsv

[Example] When the left table is a CSV:
  $ fuc tbl-merge left.csv right.tsv --lsep , > merged.tsv

[Example] Merge with the outer algorithm:
  $ fuc tbl-merge left.tsv right.tsv --how outer > merged.tsv

tbl-sum

$ fuc tbl-sum -h
usage: fuc tbl-sum [-h] [--sep TEXT] [--skiprows TEXT]
                   [--na_values TEXT [TEXT ...]] [--keep_default_na]
                   [--expr TEXT] [--columns TEXT [TEXT ...]] [--dtypes PATH]
                   table_file

Summarize a table file.

Positional arguments:
  table_file            Table file.

Optional arguments:
  -h, --help            Show this help message and exit.
  --sep TEXT            Delimiter to use (default: '\t').
  --skiprows TEXT       Comma-separated line numbers to skip (0-indexed) or
                        number of lines to skip at the start of the file
                        (e.g. `--skiprows 1,` will skip the second line,
                        `--skiprows 2,4` will skip the third and fifth lines,
                        and `--skiprows 10` will skip the first 10 lines).
  --na_values TEXT [TEXT ...]
                        Additional strings to recognize as NA/NaN (by
                        default, the following values are interpreted
                        as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND',
                        '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN',
                        '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan',
                        'null').
  --keep_default_na     Whether or not to include the default NaN values when
                        parsing the data (see 'pandas.read_table' for details).
  --expr TEXT           Query the columns of a pandas.DataFrame with a
                        boolean expression (e.g. `--expr "A == 'yes'"`).
  --columns TEXT [TEXT ...]
                        Columns to be summarized (by default, all columns
                        will be included).
  --dtypes PATH         File of column names and their data types (either
                        'categorical' or 'numeric'); one tab-delimited pair of
                        column name and data type per line.
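The `--skiprows` value can mean two different things, depending on whether it contains a comma. The sketch below shows how such a string could be interpreted under the documented rules. `parse_skiprows` and `skip_lines` are hypothetical helpers, not part of fuc, and the way fuc itself parses the string is assumed here:

```python
def parse_skiprows(text):
    """Hypothetical helper mirroring the documented --skiprows semantics.

    Returns the set of 0-indexed line numbers to skip.
    """
    if "," in text:
        # '1,' -> {1}; '2,4' -> {2, 4}: explicit 0-indexed line numbers.
        return {int(x) for x in text.split(",") if x.strip()}
    # '10' -> {0, ..., 9}: skip that many lines at the start of the file.
    return set(range(int(text)))


def skip_lines(lines, text):
    """Drop the lines selected by a --skiprows-style string."""
    skip = parse_skiprows(text)
    return [line for i, line in enumerate(lines) if i not in skip]


lines = ["line0", "line1", "line2", "line3", "line4"]
print(skip_lines(lines, "1,"))   # ['line0', 'line2', 'line3', 'line4']
print(skip_lines(lines, "2,4"))  # ['line0', 'line1', 'line3']
print(skip_lines(lines, "3"))    # ['line3', 'line4']
```

The trailing comma in `1,` tells the parser to read a list of line numbers rather than a count.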

[Example] Summarize a table:
  $ fuc tbl-sum table.tsv
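The `--expr` option refers to `pandas.DataFrame.query`. The sketch below shows that filtering step on a toy table. It assumes the expression is applied before the columns are summarized. It also assumes that categorical columns are reported as counts and numeric columns as descriptive statistics. The actual layout of the tbl-sum report is not reproduced here:

```python
import pandas as pd

# Toy table standing in for table.tsv (hypothetical data).
df = pd.DataFrame({
    "A": ["yes", "no", "yes", "yes"],
    "B": [1.5, 2.0, 3.5, 4.0],
})

# `--expr "A == 'yes'"`: keep only rows where the boolean expression holds.
subset = df.query("A == 'yes'")

# Assumed summary style: value counts for a categorical column,
# descriptive statistics for a numeric one.
print(subset["A"].value_counts())
print(subset["B"].describe())
```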

vcf-filter

$ fuc vcf-filter -h
usage: fuc vcf-filter [-h] [--expr TEXT] [--samples PATH]
                      [--drop_duplicates [TEXT ...]] [--greedy] [--opposite]
                      [--filter_empty]
                      vcf

Filter a VCF file.

Positional arguments:
  vcf                   VCF file (compressed or uncompressed).

Optional arguments:
  -h, --help            Show this help message and exit.
  --expr TEXT           Expression to evaluate.
  --samples PATH        File of sample names to which the marking is
                        applied (one sample per line).
  --drop_duplicates [TEXT ...]
                        Only consider certain columns for identifying
                        duplicates (by default, all columns are used).
  --greedy              Use this flag to mark even ambiguous genotypes
                        as missing.
  --opposite            Use this flag to mark all genotypes that do not
                        satisfy the query expression as missing and leave
                        those that do intact.
  --filter_empty        Use this flag to remove rows with no genotype
                        calls at all.
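The interaction between `--expr` and `--opposite` can be modelled in plain Python. In the sketch below, each genotype is a hypothetical dict. The expression string is replaced by a Python callable, and a marked genotype has its GT set to `./.`. The `--greedy` handling of ambiguous genotypes is not modelled, and this is not how fuc represents genotypes internally:

```python
# Hypothetical per-genotype records for one sample (GT and DP only).
genotypes = [
    {"GT": "0/1", "DP": 45},
    {"GT": "0/0", "DP": 12},
    {"GT": "1/1", "DP": 28},
]

MISSING = "./."


def mark(genotypes, predicate, opposite=False):
    """Set GT to ./. when predicate holds (or fails, if opposite=True)."""
    out = []
    for g in genotypes:
        hit = predicate(g)
        if hit != opposite:  # XOR: --opposite flips which genotypes are marked
            g = {**g, "GT": MISSING}
        out.append(g)
    return out


# Analogue of `--expr 'DP < 30'`.
print([g["GT"] for g in mark(genotypes, lambda g: g["DP"] < 30)])
# ['0/1', './.', './.']

# Analogue of `--expr 'DP < 30' --opposite`.
print([g["GT"] for g in mark(genotypes, lambda g: g["DP"] < 30, opposite=True)])
# ['./.', '0/0', '1/1']
```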

[Example] Mark genotypes with 0/0 as missing:
  $ fuc vcf-filter in.vcf --expr 'GT == "0/0"' > out.vcf

[Example] Mark genotypes that are not 0/0 as missing:
  $ fuc vcf-filter in.vcf --expr 'GT != "0/0"' > out.vcf

[Example] Mark genotypes whose DP is less than 30 as missing:
  $ fuc vcf-filter in.vcf --expr 'DP < 30' > out.vcf

[Example] Same as above, but also mark ambiguous genotypes as missing:
  $ fuc vcf-filter in.vcf --expr 'DP < 30' --greedy > out.vcf

[Example] Build a complex query and mark genotypes that do not satisfy it as missing:
  $ fuc vcf-filter in.vcf --expr 'AD[1] < 10 or DP < 30' --opposite > out.vcf

[Example] Compute summary statistics and apply the marking only to listed samples:
  $ fuc vcf-filter in.vcf \
  --expr 'np.mean(AD) < 10' --greedy --samples sample.list > out.vcf

[Example] Drop duplicate rows:
  $ fuc vcf-filter in.vcf --drop_duplicates CHROM POS REF ALT > out.vcf

[Example] Filter out rows without genotypes:
  $ fuc vcf-filter in.vcf --filter_empty > out.vcf
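In pandas terms, `--drop_duplicates` and `--filter_empty` correspond roughly to `DataFrame.drop_duplicates(subset=...)` and a row-wise "all genotypes missing" test. The sketch below works on a toy table. It omits the QUAL, FILTER, INFO and FORMAT columns. It also assumes that a missing call is written as `./.` or `.` and that the first of each duplicate set is kept, which is the pandas default. fuc's exact rules may differ:

```python
import pandas as pd

# Toy VCF body: the first two records share CHROM/POS/REF/ALT but differ in ID.
df = pd.DataFrame({
    "CHROM": ["chr1", "chr1", "chr2"],
    "POS": [100, 100, 200],
    "ID": ["rs1", ".", "."],
    "REF": ["A", "A", "G"],
    "ALT": ["T", "T", "C"],
    "Sample1": ["0/1", "0/1", "./."],
    "Sample2": ["0/0", "0/0", "./."],
})

# `--drop_duplicates CHROM POS REF ALT`: keep the first record of each set.
deduped = df.drop_duplicates(subset=["CHROM", "POS", "REF", "ALT"])

# `--filter_empty`: drop rows where every sample genotype is missing.
samples = ["Sample1", "Sample2"]
has_call = ~deduped[samples].isin(["./.", "."]).all(axis=1)
filtered = deduped[has_call]
print(filtered)
```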

vcf-index

$ fuc vcf-index -h
usage: fuc vcf-index [-h] [--force] vcf

Index a VCF file.

This command will create an index file (.tbi) for the input VCF.

Positional arguments:
  vcf         Input VCF file to be indexed. When an uncompressed file is
              given, the command will automatically create a BGZF
              compressed copy of the file (.gz) before indexing.

Optional arguments:
  -h, --help  Show this help message and exit.
  --force     Overwrite the index file if it already exists.

[Example] Index a compressed VCF file:
  $ fuc vcf-index in.vcf.gz

[Example] Index an uncompressed VCF file (will create a compressed VCF first):
  $ fuc vcf-index in.vcf
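BGZF is ordinary gzip with an extra header subfield tagged `BC`. This is why a file compressed with plain `gzip` cannot be indexed with Tabix even though it ends in `.gz`. The stdlib-only sketch below checks whether a file's first bytes look like a BGZF block, based on the block layout in the SAM/BAM specification. `looks_like_bgzf` is a hypothetical helper and is not part of fuc:

```python
import gzip

# Standard 28-byte BGZF end-of-file block (SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00"
    " 00 00 00 00 00 00 00 00"
)


def looks_like_bgzf(header: bytes) -> bool:
    """Return True if the first 16 bytes match a BGZF block header."""
    return (
        len(header) >= 16
        and header[:4] == b"\x1f\x8b\x08\x04"  # gzip magic, DEFLATE, FEXTRA set
        and header[12:14] == b"BC"  # BGZF extra subfield identifier
        and header[14:16] == b"\x02\x00"  # subfield length = 2
    )


print(looks_like_bgzf(BGZF_EOF))  # True
print(looks_like_bgzf(gzip.compress(b"##fileformat=VCFv4.2\n")))  # False
```

For a file on disk, pass the first 16 bytes, e.g. `open("in.vcf.gz", "rb").read(16)`.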

vcf-merge

$ fuc vcf-merge -h
usage: fuc vcf-merge [-h] [--how TEXT] [--format TEXT] [--sort] [--collapse]
                     vcf_files [vcf_files ...]

Merge two or more VCF files.

Positional arguments:
  vcf_files      VCF files (compressed or uncompressed).

Optional arguments:
  -h, --help     Show this help message and exit.
  --how TEXT     Type of merge as defined in pandas.DataFrame.merge
                 (default: 'inner').
  --format TEXT  FORMAT subfields to be retained (e.g. 'GT:AD:DP')
                 (default: 'GT').
  --sort         Use this flag to turn off sorting of records
                 (by default, records are sorted).
  --collapse     Use this flag to collapse duplicate records
                 (default: False).
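Because `--how` follows `pandas.DataFrame.merge`, the difference between `inner` and `outer` can be shown on two toy single-sample tables. The sketch assumes records are matched on CHROM, POS, REF and ALT. It also assumes that genotypes absent from one input are filled with `./.`. Both choices are illustrative, not confirmed fuc behaviour:

```python
import pandas as pd

keys = ["CHROM", "POS", "REF", "ALT"]

# Hypothetical single-sample VCF records (GT only).
a = pd.DataFrame({"CHROM": ["chr1", "chr1"], "POS": [100, 200],
                  "REF": ["A", "G"], "ALT": ["T", "C"], "S1": ["0/1", "1/1"]})
b = pd.DataFrame({"CHROM": ["chr1", "chr2"], "POS": [100, 300],
                  "REF": ["A", "T"], "ALT": ["T", "G"], "S2": ["0/0", "0/1"]})

inner = a.merge(b, on=keys, how="inner")  # only records present in both inputs
outer = a.merge(b, on=keys, how="outer")  # union of records; gaps become NaN

# Assumption: genotypes absent from one input are filled as missing.
outer[["S1", "S2"]] = outer[["S1", "S2"]].fillna("./.")
print(inner)
print(outer)
```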

[Example] Merge multiple VCF files:
  $ fuc vcf-merge 1.vcf 2.vcf 3.vcf > merged.vcf

[Example] Keep the GT, AD, and DP fields:
  $ fuc vcf-merge 1.vcf 2.vcf --format GT:AD:DP > merged.vcf
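`--format` selects FORMAT subfields by name. The sketch below applies that selection to a single sample column. `keep_format` is a hypothetical helper. It assumes that a requested subfield absent from a record is written as `.`, which may not match fuc exactly:

```python
def keep_format(format_field, sample_field, keep="GT"):
    """Keep only the requested FORMAT subfields (e.g. keep='GT:AD:DP')."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    wanted = keep.split(":")
    # Assumption: a subfield missing from this record is written as '.'.
    return ":".join(wanted), ":".join(values.get(k, ".") for k in wanted)


print(keep_format("GT:AD:DP:GQ:PL", "0/1:12,9:21:99:200,0,300"))
# ('GT', '0/1')
print(keep_format("GT:AD:DP:GQ:PL", "0/1:12,9:21:99:200,0,300", "GT:AD:DP"))
# ('GT:AD:DP', '0/1:12,9:21')
```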

vcf-rename

$ fuc vcf-rename -h
usage: fuc vcf-rename [-h] [--mode TEXT] [--range INT INT] [--sep TEXT]
                      vcf names

Rename the samples in a VCF file.

There are three different renaming modes using the 'names' file:
  - 'MAP': Default mode. Requires two columns, old names in the first
    and new names in the second.
  - 'INDEX': Requires two columns, new names in the first and 0-based
    indices in the second.
  - 'RANGE': Requires only one column of new names but '--range' must
    be specified.

Positional arguments:
  vcf              VCF file (compressed or uncompressed).
  names            Text file containing information for renaming the samples.

Optional arguments:
  -h, --help       Show this help message and exit.
  --mode TEXT      Renaming mode (default: 'MAP') (choices: 'MAP',
                   'INDEX', 'RANGE').
  --range INT INT  Index range to use when renaming the samples.
                   Applicable only with the 'RANGE' mode.
  --sep TEXT       Delimiter to use for reading the 'names' file
                   (default: '\t').
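The three renaming modes can be sketched as operations on a list of sample names. `rename` below is a hypothetical helper, not fuc's API. In RANGE mode it assumes that `--range START END` is a 0-based interval with START included and END excluded. Check fuc's API documentation before relying on that:

```python
def rename(samples, names, mode="MAP", rng=None):
    """Hypothetical sketch of the three documented renaming modes."""
    new = list(samples)
    if mode == "MAP":
        # names: rows of (old_name, new_name).
        mapping = dict(names)
        new = [mapping.get(s, s) for s in new]
    elif mode == "INDEX":
        # names: rows of (new_name, 0-based index).
        for name, idx in names:
            new[int(idx)] = name
    elif mode == "RANGE":
        # names: a single column of new names; rng = (start, end).
        # Assumption: start is inclusive and end is exclusive.
        start, end = rng
        if end - start != len(names):
            raise ValueError("range length must equal the number of names")
        new[start:end] = names
    else:
        raise ValueError(f"unknown mode: {mode}")
    return new


samples = ["A", "B", "C", "D", "E", "F"]
print(rename(samples, [("A", "X"), ("C", "Y")]))
# ['X', 'B', 'Y', 'D', 'E', 'F']
print(rename(samples, [("X", "0"), ("Y", "5")], mode="INDEX"))
# ['X', 'B', 'C', 'D', 'E', 'Y']
print(rename(samples, ["X", "Y", "Z"], mode="RANGE", rng=(2, 5)))
# ['A', 'B', 'X', 'Y', 'Z', 'F']
```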

[Example] Using the default 'MAP' mode:
  $ fuc vcf-rename in.vcf old_new.tsv > out.vcf

[Example] Using the 'INDEX' mode:
  $ fuc vcf-rename in.vcf new_idx.tsv --mode INDEX > out.vcf

[Example] Using the 'RANGE' mode:
  $ fuc vcf-rename in.vcf new_only.tsv --mode RANGE --range 2 5 > out.vcf

vcf-slice

$ fuc vcf-slice -h
usage: fuc vcf-slice [-h] vcf regions [regions ...]

Slice a VCF file for specified regions.

Positional arguments:
  vcf         Input VCF file, which must already be BGZF compressed (.gz)
              and indexed (.tbi) to allow random access. A VCF file can be
              compressed with the fuc-bgzip command and indexed with the
              vcf-index command.
  regions     One or more regions to be sliced. Each region must have the
              format chrom:start-end and be a half-open interval with
              (start, end]. This means, for example, chr1:100-103 will
              extract positions 101, 102, and 103. Alternatively, you can
              provide a BED file (compressed or uncompressed) to specify
              regions. Note that the 'chr' prefix in contig names (e.g.
              'chr1' vs. '1') will be automatically added or removed as
              necessary to match the input VCF's contig names.

Optional arguments:
  -h, --help  Show this help message and exit.

[Example] Specify regions manually:
  $ fuc vcf-slice in.vcf.gz 1:100-300 2:400-700 > out.vcf

[Example] Specify regions with a BED file:
  $ fuc vcf-slice in.vcf.gz regions.bed > out.vcf

[Example] Output a compressed file:
  $ fuc vcf-slice in.vcf.gz regions.bed | fuc fuc-bgzip > out.vcf.gz
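The region convention (start, end] matches BED's 0-based half-open coordinates. It differs from the 1-based, fully closed `chrom:beg-end` strings accepted by tools such as samtools. The sketch below parses a region, lists the 1-based positions it covers, and adjusts the 'chr' prefix to the VCF's contig naming. `parse_region`, `positions` and `match_prefix` are hypothetical helpers, not fuc functions:

```python
import re

REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region):
    """Parse 'chrom:start-end' as the half-open interval (start, end]."""
    m = REGION.match(region)
    if m is None:
        raise ValueError(f"bad region: {region!r}")
    return m["chrom"], int(m["start"]), int(m["end"])


def positions(region):
    """1-based positions covered by a region: start + 1 through end."""
    _, start, end = parse_region(region)
    return list(range(start + 1, end + 1))


def match_prefix(chrom, vcf_contigs):
    """Add or strip 'chr' so that chrom matches the VCF's naming style."""
    if chrom in vcf_contigs:
        return chrom
    alt = chrom[3:] if chrom.startswith("chr") else "chr" + chrom
    return alt if alt in vcf_contigs else chrom


print(positions("chr1:100-103"))            # [101, 102, 103]
print(match_prefix("chr1", {"1", "2"}))     # 1
print(match_prefix("2", {"chr1", "chr2"}))  # chr2
```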

vcf-vcf2bed

$ fuc vcf-vcf2bed -h
usage: fuc vcf-vcf2bed [-h] vcf

Convert a VCF file to a BED file.

Positional arguments:
  vcf         VCF file (compressed or uncompressed).

Optional arguments:
  -h, --help  Show this help message and exit.

[Example] Convert VCF to BED:
  $ fuc vcf-vcf2bed in.vcf > out.bed

vcf-vep

$ fuc vcf-vep -h
usage: fuc vcf-vep [-h] [--opposite] [--as_zero] vcf expr

Filter a VCF file by annotations from Ensembl VEP.

Positional arguments:
  vcf         VCF file annotated by Ensembl VEP (compressed or uncompressed).
  expr        Query expression to evaluate.

Optional arguments:
  -h, --help  Show this help message and exit.
  --opposite  Use this flag to return only records that do not
              meet the query criteria.
  --as_zero   Use this flag to treat missing values as zero instead of NaN.

[Example] Select variants in the TP53 gene:
  $ fuc vcf-vep in.vcf "SYMBOL == 'TP53'" > out.vcf

[Example] Exclude variants from the TP53 gene:
  $ fuc vcf-vep in.vcf "SYMBOL != 'TP53'" > out.vcf

[Example] Same as above:
  $ fuc vcf-vep in.vcf "SYMBOL == 'TP53'" --opposite > out.vcf

[Example] Select splice donor or stop-gain variants:
  $ fuc vcf-vep in.vcf \
  "Consequence in ['splice_donor_variant', 'stop_gained']" > out.vcf

[Example] Build a complex query to select specific variants:
  $ fuc vcf-vep in.vcf \
  "(SYMBOL == 'TP53') and (Consequence.str.contains('stop_gained'))" > out.vcf

[Example] Select variants whose gnomAD AF is less than 0.001:
  $ fuc vcf-vep in.vcf "gnomAD_AF < 0.001" > out.vcf

[Example] Same as above, but treat variants without AF data as having an AF of 0:
  $ fuc vcf-vep in.vcf "gnomAD_AF < 0.001" --as_zero > out.vcf