CLI

Introduction

This section describes the command line interface (CLI) for the fuc package.

To get help on the fuc CLI:

$ fuc -h
usage: fuc [-h] [-v] COMMAND ...

positional arguments:
  COMMAND
    bam-depth    Compute read depth from SAM/BAM/CRAM files.
    bam-head     Print the header of a SAM/BAM/CRAM file.
    bam-index    Index a SAM/BAM/CRAM file.
    bam-rename   Rename the sample in a SAM/BAM/CRAM file.
    bam-slice    Slice a SAM/BAM/CRAM file.
    bed-intxn    Find the intersection of two or more BED files.
    bed-sum      Summarize a BED file.
    cov-concat   Concatenate TSV files containing depth of coverage data.
    fq-count     Count sequence reads in FASTQ files.
    fq-sum       Summarize a FASTQ file.
    fuc-compf    Compare the contents of two files.
    fuc-demux    Parse the Reports directory from bcl2fastq.
    fuc-exist    Check whether certain files exist.
    fuc-find     Find all filenames matching a specified pattern recursively.
    fuc-undetm   Compute top unknown barcodes using undetermined FASTQ from bcl2fastq.
    maf-maf2vcf  Convert a MAF file to a VCF file.
    maf-oncoplt  Create an oncoplot with a MAF file.
    maf-sumplt   Create a summary plot with a MAF file.
    maf-vcf2maf  Convert a VCF file to a MAF file.
    ngs-fq2bam   Pipeline for converting FASTQ files to analysis-ready BAM files.
    ngs-hc       Pipeline for germline short variant discovery.
    ngs-m2       Pipeline for somatic short variant discovery.
    ngs-pon      Pipeline for constructing a panel of normals (PoN).
    tbl-merge    Merge two table files.
    tbl-sum      Summarize a table file.
    vcf-filter   Filter a VCF file.
    vcf-merge    Merge two or more VCF files.
    vcf-rename   Rename the samples in a VCF file.
    vcf-slice    Slice a VCF file for specified regions.
    vcf-vcf2bed  Convert a VCF file to a BED file.
    vcf-vep      Filter a VCF file annotated by Ensembl VEP.

optional arguments:
  -h, --help     Show this help message and exit.
  -v, --version  Show the version number and exit.

To get help on a specific command (e.g. vcf-merge):

$ fuc vcf-merge -h

bam-depth

$ fuc bam-depth -h
usage: fuc bam-depth [-h] [--bam PATH [PATH ...]] [--fn PATH] [--bed PATH]
                     [--region TEXT] [--zero]

###############################################
# Compute read depth from SAM/BAM/CRAM files. #
###############################################

Alignment files must be specified with either '--bam' or '--fn', but it's an error to use both.

By default, the command will count all reads within the alignment files. You can specify target regions with either '--bed' or '--region', but not both. When you do this, pay close attention to the 'chr' string in contig names (e.g. 'chr1' vs. '1'). Note also that '--region' requires the input files be indexed.

Under the hood, the command computes read depth using the 'samtools depth' command.

Usage examples:
  $ fuc bam-depth --bam 1.bam 2.bam --bed in.bed > out.tsv
  $ fuc bam-depth --fn bam.list --region chr1:100-200 > out.tsv

Optional arguments:
  -h, --help            Show this help message and exit.
  --bam PATH [PATH ...]
                        One or more alignment files.
  --fn PATH             File containing one alignment file per line.
  --bed PATH            BED file.
  --region TEXT         Target region ('chrom:start-end').
  --zero                Output all positions including those with zero depth.
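
The 'chr' caveat above can be checked before running bam-depth. The following is a minimal sketch, not part of fuc: the helper names `contig_style` and `styles_match` are hypothetical, and the contig names are assumed to have been collected already (e.g. from the first column of the BED file and from the alignment header).

```python
def contig_style(names):
    """Classify contig names as 'chr', 'no-chr', 'mixed', or 'empty'."""
    prefixed = [name.startswith("chr") for name in names]
    if not prefixed:
        return "empty"
    if all(prefixed):
        return "chr"
    if not any(prefixed):
        return "no-chr"
    return "mixed"


def styles_match(bed_contigs, bam_contigs):
    """Return True if both collections use the same naming style."""
    return contig_style(bed_contigs) == contig_style(bam_contigs)


# A BED file with 'chr1' and an alignment file with '1' do not match.
print(styles_match(["chr1", "chr2"], ["1", "2"]))
```

If the styles differ, the target regions are unlikely to line up with the contigs in the alignment files, so it is worth fixing the names first.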

bam-head

$ fuc bam-head -h
usage: fuc bam-head [-h] bam

############################################
# Print the header of a SAM/BAM/CRAM file. #
############################################

Usage examples:
  $ fuc bam-head in.sam
  $ fuc bam-head in.bam
  $ fuc bam-head in.cram

Positional arguments:
  bam         Alignment file.

Optional arguments:
  -h, --help  Show this help message and exit.

bam-index

$ fuc bam-index -h
usage: fuc bam-index [-h] bam

##############################
# Index a SAM/BAM/CRAM file. #
##############################

Usage examples:
  $ fuc bam-index in.bam

Positional arguments:
  bam         Alignment file.

Optional arguments:
  -h, --help  Show this help message and exit.

bam-rename

$ fuc bam-rename -h
usage: fuc bam-rename [-h] bam name

##############################################
# Rename the sample in a SAM/BAM/CRAM file. #
##############################################

Usage examples:
  $ fuc bam-rename in.bam NA12878 > out.bam

Positional arguments:
  bam         Alignment file.
  name        New sample name.

Optional arguments:
  -h, --help  Show this help message and exit.

bam-slice

$ fuc bam-slice -h
usage: fuc bam-slice [-h] [--format TEXT] [--fasta PATH]
                     bam region [region ...]

##############################
# Slice a SAM/BAM/CRAM file. #
##############################

This command will slice the input alignment file for specified region(s).

Usage examples:
  $ fuc bam-slice in.bam chr1:100-200 > out.bam
  $ fuc bam-slice in.bam chr1:100-200 chr2:100-200 > out.bam
  $ fuc bam-slice in.bam chr1:100-200 --format SAM > out.sam
  $ fuc bam-slice in.bam chr1:100-200 --format CRAM --fasta ref.fa > out.cram

Positional arguments:
  bam            Alignment file.
  region         Space-separated regions ('chrom:start-end').

Optional arguments:
  -h, --help     Show this help message and exit.
  --format TEXT  Output format (default: 'BAM') (choices: 'SAM', 'BAM', 'CRAM'). A FASTA file must be specified with '--fasta' for 'CRAM'.
  --fasta PATH   FASTA file. Required when '--format' is 'CRAM'.

bed-intxn

$ fuc bed-intxn -h
usage: fuc bed-intxn [-h] bed [bed ...]

###################################################
# Find the intersection of two or more BED files. #
###################################################

Usage examples:
  $ fuc bed-intxn 1.bed 2.bed 3.bed > intersect.bed

Positional arguments:
  bed         BED files.

Optional arguments:
  -h, --help  Show this help message and exit.
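
To illustrate what an intersection of BED files means, here is a minimal sketch in plain Python. It is not fuc's implementation: the function names are hypothetical, intervals are assumed to be half-open `(chrom, start, end)` tuples as in BED, and each input is assumed to contain no overlapping intervals of its own.

```python
from functools import reduce


def intersect_two(a, b):
    """Intersect two lists of (chrom, start, end) half-open intervals."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        (ca, sa, ea), (cb, sb, eb) = a[i], b[j]
        if ca != cb:
            # Advance whichever list is behind in chromosome order.
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        start, end = max(sa, sb), min(ea, eb)
        if start < end:
            out.append((ca, start, end))
        # Advance the interval that ends first.
        if ea < eb:
            i += 1
        else:
            j += 1
    return out


def intersect_all(interval_lists):
    """Intersect two or more interval lists by pairwise reduction."""
    return reduce(intersect_two, interval_lists)


a = [("chr1", 0, 100), ("chr1", 200, 300), ("chr2", 0, 50)]
b = [("chr1", 50, 250), ("chr2", 40, 60)]
print(intersect_all([a, b]))
```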

bed-sum

$ fuc bed-sum -h
usage: fuc bed-sum [-h] [--bases INT] [--decimals INT] bed

#########################
# Summarize a BED file. #
#########################

This command will compute various summary statistics for a BED file. The returned statistics include the total numbers of probes and covered base pairs for each chromosome.

By default, covered base pairs are displayed in bp, but you can, for example, use '--bases 1000' to display them in kb.

Usage examples:
  $ fuc bed-sum in.bed

Positional arguments:
  bed             BED file.

Optional arguments:
  -h, --help      Show this help message and exit.
  --bases INT     Number to divide covered base pairs (default: 1).
  --decimals INT  Number of decimals (default: 0).
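
The per-chromosome summary described above can be sketched as follows. This is an illustration only, not fuc's code: `summarize_bed` is a hypothetical name, the `bases` and `decimals` parameters merely mirror the options above, and overlapping probes are counted twice for simplicity.

```python
from collections import defaultdict


def summarize_bed(lines, bases=1, decimals=0):
    """Return {chrom: (probe_count, covered_bases / bases)}."""
    probes = defaultdict(int)
    covered = defaultdict(int)
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        probes[chrom] += 1
        # BED intervals are 0-based and half-open, so length is end - start.
        covered[chrom] += int(end) - int(start)
    return {c: (probes[c], round(covered[c] / bases, decimals)) for c in probes}


lines = ["chr1\t0\t100", "chr1\t200\t500", "chr2\t10\t20"]
print(summarize_bed(lines))                          # counts in bp
print(summarize_bed(lines, bases=1000, decimals=1))  # counts in kb
```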

cov-concat

$ fuc cov-concat -h
usage: fuc cov-concat [-h] [--axis INT] PATH [PATH ...]

############################################################
# Concatenate TSV files containing depth of coverage data. #
############################################################

Usage examples:
  $ fuc cov-concat 1.tsv 2.tsv > rows.tsv
  $ fuc cov-concat 1.tsv 2.tsv --axis 1 > cols.tsv

Positional arguments:
  PATH        One or more TSV files.

Optional arguments:
  -h, --help  Show this help message and exit.
  --axis INT  The axis to concatenate along (default: 0) (choices: 0, 1 where 0 is index and 1 is columns).

fq-count

$ fuc fq-count -h
usage: fuc fq-count [-h] [fastq ...]

########################################
# Count sequence reads in FASTQ files. #
########################################

The command will look for stdin if there are no arguments.

Usage examples:
  $ fuc fq-count in.fastq
  $ cat fastq.list | fuc fq-count

Positional arguments:
  fastq       FASTQ files (zipped or unzipped) (default: stdin).

Optional arguments:
  -h, --help  Show this help message and exit.
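
Counting reads in a FASTQ file can be sketched in a few lines. This is a hedged illustration, not fuc's implementation: `count_reads` is a hypothetical name, zipped files are assumed to end in '.gz', and every record is assumed to span exactly four lines.

```python
import gzip


def count_reads(path):
    """Count reads in a FASTQ file, assuming four lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4
```

Multi-line FASTQ records, which are rare in modern sequencing output, would need a real parser instead of a line count.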

fq-sum

$ fuc fq-sum -h
usage: fuc fq-sum [-h] fastq

###########################
# Summarize a FASTQ file. #
###########################

This command will output a summary of the input FASTQ file. The summary includes the total number of sequence reads, the distribution of read lengths, and the numbers of unique and duplicate sequences.

Usage examples:
  $ fuc fq-sum in.fastq

Positional arguments:
  fastq       FASTQ file (zipped or unzipped).

Optional arguments:
  -h, --help  Show this help message and exit.

fuc-compf

$ fuc fuc-compf -h
usage: fuc fuc-compf [-h] left right

######################################
# Compare the contents of two files. #
######################################

This command will compare the contents of two files, returning 'True' if they are identical and 'False' otherwise.

Usage examples:
  $ fuc fuc-compf left.txt right.txt

Positional arguments:
  left        Left file.
  right       Right file.

Optional arguments:
  -h, --help  Show this help message and exit.
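
A comparable check can be written with the standard library. The sketch below assumes a byte-by-byte comparison is what is wanted; whether fuc itself uses `filecmp` is not stated here, and `same_contents` is a hypothetical name.

```python
import filecmp


def same_contents(left, right):
    """Return True if the two files have identical contents."""
    # shallow=False forces a byte-by-byte comparison instead of
    # relying only on os.stat() signatures (type, size, mtime).
    return filecmp.cmp(left, right, shallow=False)
```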

fuc-demux

$ fuc fuc-demux -h
usage: fuc fuc-demux [-h] [--sheet PATH] reports output

###############################################
# Parse the Reports directory from bcl2fastq. #
###############################################

This command will parse HTML files in the Reports directory created by the bcl2fastq or bcl2fastq2 program and extract various statistics from them.

After creating an output directory, the command will write the following files:
  - flowcell-summary.csv
  - lane-summary.csv
  - top-unknown-barcodes.csv
  - reports.pdf

Use '--sheet' to sort samples in the lane-summary.csv file in the same order as your SampleSheet.csv file. You can also provide a modified version of your SampleSheet.csv file to subset samples for the lane-summary.csv and reports.pdf files.

Usage examples:
  $ fuc fuc-demux Reports output
  $ fuc fuc-demux Reports output --sheet SampleSheet.csv

Positional arguments:
  reports       Reports directory.
  output        Output directory (will be created).

Optional arguments:
  -h, --help    Show this help message and exit.
  --sheet PATH  SampleSheet.csv file. Used for sorting and/or subsetting samples.

fuc-exist

$ fuc fuc-exist -h
usage: fuc fuc-exist [-h] [files ...]

######################################
# Check whether certain files exist. #
######################################

This command will check whether the specified files, including directories, exist, returning 'True' if they exist and 'False' otherwise.

The command will look for stdin if there are no arguments.

Usage examples:
  $ fuc fuc-exist test.txt
  $ fuc fuc-exist test_dir
  $ cat test.list | fuc fuc-exist

Positional arguments:
  files       Files and directories to be tested (default: stdin).

Optional arguments:
  -h, --help  Show this help message and exit.

fuc-find

$ fuc fuc-find -h
usage: fuc fuc-find [-h] [--dir PATH] pattern

################################################################
# Find all filenames matching a specified pattern recursively. #
################################################################

This command will recursively find all the filenames matching a specified pattern and then return their absolute paths.

Usage examples:
  $ fuc fuc-find "*.vcf"
  $ fuc fuc-find "*.vcf.*"
  $ fuc fuc-find "*.vcf.gz" --dir ~/test_dir

Positional arguments:
  pattern     Filename pattern.

Optional arguments:
  -h, --help  Show this help message and exit.
  --dir PATH  Directory to search in (default: current directory).
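
The recursive search can be approximated with `pathlib`. This is a minimal sketch under the assumption that the pattern is a shell-style glob matched against file names; `find_files` is a hypothetical name and not part of fuc.

```python
from pathlib import Path


def find_files(pattern, directory="."):
    """Recursively find paths matching a glob pattern; return absolute paths."""
    return sorted(str(p.resolve()) for p in Path(directory).rglob(pattern))
```

Quoting the pattern on the command line, as in the usage examples above, keeps the shell from expanding it before the command sees it.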

fuc-undetm

$ fuc fuc-undetm -h
usage: fuc fuc-undetm [-h] [--count INT] fastq

#########################################################################
# Compute top unknown barcodes using undetermined FASTQ from bcl2fastq. #
#########################################################################

This command will compute top unknown barcodes using undetermined FASTQ from the bcl2fastq or bcl2fastq2 program.

Usage examples:
  $ fuc fuc-undetm Undetermined_S0_R1_001.fastq.gz

Positional arguments:
  fastq        Undetermined FASTQ (zipped or unzipped).

Optional arguments:
  -h, --help   Show this help message and exit.
  --count INT  Number of top unknown barcodes to return (default: 30).
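
The idea of ranking unknown barcodes can be sketched as below. This is an illustration rather than fuc's implementation: `top_barcodes` is a hypothetical name, records are assumed to span four lines, and the barcode is assumed to be the last colon-separated field of each header line, as in Illumina-style headers such as '@M:1:FC:1:1101:1000:2000 1:N:0:AAAA'.

```python
import gzip
from collections import Counter


def top_barcodes(path, count=30):
    """Return the most common barcodes as (barcode, n_reads) pairs."""
    opener = gzip.open if str(path).endswith(".gz") else open
    counter = Counter()
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # header line of each four-line record
                counter[line.strip().split(":")[-1]] += 1
    return counter.most_common(count)
```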

maf-maf2vcf

$ fuc maf-maf2vcf -h
usage: fuc maf-maf2vcf [-h] [--fasta PATH] [--ignore_indels]
                       [--cols TEXT [TEXT ...]] [--names TEXT [TEXT ...]]
                       maf

#####################################
# Convert a MAF file to a VCF file. #
#####################################

In order to handle INDELs the command makes use of a reference assembly (i.e. FASTA file). If SNVs are your only concern, then you do not need a FASTA file and can just use the '--ignore_indels' flag.

If you are going to provide a FASTA file, please make sure to select the appropriate one (e.g. one that matches the genome assembly).

In addition to basic genotype calls (e.g. '0/1'), you can extract more information from the MAF file by specifying the column(s) that contain additional genotype data of interest with the '--cols' argument. If provided, this argument will append the requested data to individual sample genotypes (e.g. '0/1:0.23').

You can also control how this additional genotype information appears in the FORMAT field (e.g. AF) with the '--names' argument. If this argument is not provided, the original column name(s) will be displayed.

Usage examples:
  $ fuc maf-maf2vcf in.maf --fasta hs37d5.fa > out.vcf
  $ fuc maf-maf2vcf in.maf --ignore_indels > out.vcf
  $ fuc maf-maf2vcf in.maf --fasta hs37d5.fa --cols i_TumorVAF_WU --names AF > out.vcf

Positional arguments:
  maf                   MAF file (zipped or unzipped).

Optional arguments:
  -h, --help            Show this help message and exit.
  --fasta PATH          FASTA file (required to include INDELs in the output).
  --ignore_indels       Use this flag to exclude INDELs from the output.
  --cols TEXT [TEXT ...]
                        Column(s) in the MAF file.
  --names TEXT [TEXT ...]
                        Name(s) to be displayed in the FORMAT field.
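
How '--cols' and '--names' shape the output can be sketched with two small helpers. The names `build_format` and `build_genotype` are hypothetical and only illustrate the string layout described above (e.g. 'GT:AF' and '0/1:0.23'); they are not fuc functions.

```python
def build_format(cols, names=None):
    """Build the FORMAT field: GT plus one key per extra column."""
    # Fall back to the original column names when no names are given.
    labels = names if names else cols
    return ":".join(["GT"] + list(labels))


def build_genotype(gt, extra_values):
    """Append extra per-sample values to a basic genotype call."""
    return ":".join([gt] + [str(v) for v in extra_values])


print(build_format(["i_TumorVAF_WU"], names=["AF"]))
print(build_genotype("0/1", [0.23]))
```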

maf-oncoplt

$ fuc maf-oncoplt -h
usage: fuc maf-oncoplt [-h] [--count INT] [--figsize FLOAT FLOAT]
                       [--label_fontsize FLOAT] [--ticklabels_fontsize FLOAT]
                       [--legend_fontsize FLOAT]
                       maf out

#######################################
# Create an oncoplot with a MAF file. #
#######################################

The format of the output image (PDF/PNG/JPEG/SVG) will be automatically determined by the output file's extension.

Usage examples:
  $ fuc maf-oncoplt in.maf out.png
  $ fuc maf-oncoplt in.maf out.pdf

Positional arguments:
  maf                   MAF file.
  out                   Output image file.

Optional arguments:
  -h, --help            Show this help message and exit.
  --count INT           Number of top mutated genes to display (default: 10).
  --figsize FLOAT FLOAT
                        Width, height in inches (default: [15, 10]).
  --label_fontsize FLOAT
                        Font size of labels (default: 15).
  --ticklabels_fontsize FLOAT
                        Font size of tick labels (default: 15).
  --legend_fontsize FLOAT
                        Font size of legend texts (default: 15).

maf-sumplt

$ fuc maf-sumplt -h
usage: fuc maf-sumplt [-h] [--figsize FLOAT FLOAT] [--title_fontsize FLOAT]
                      [--ticklabels_fontsize FLOAT] [--legend_fontsize FLOAT]
                      maf out

##########################################
# Create a summary plot with a MAF file. #
##########################################

The format of the output image (PDF/PNG/JPEG/SVG) will be automatically determined by the output file's extension.

Usage examples:
  $ fuc maf-sumplt in.maf out.png
  $ fuc maf-sumplt in.maf out.pdf

Positional arguments:
  maf                   MAF file.
  out                   Output image file.

Optional arguments:
  -h, --help            Show this help message and exit.
  --figsize FLOAT FLOAT
                        Width, height in inches (default: [15, 10]).
  --title_fontsize FLOAT
                        Font size of subplot titles (default: 16).
  --ticklabels_fontsize FLOAT
                        Font size of tick labels (default: 12).
  --legend_fontsize FLOAT
                        Font size of legend texts (default: 12).

maf-vcf2maf

$ fuc maf-vcf2maf -h
usage: fuc maf-vcf2maf [-h] vcf

#####################################
# Convert a VCF file to a MAF file. #
#####################################

Usage examples:
  $ fuc maf-vcf2maf in.vcf > out.maf

Positional arguments:
  vcf         Annotated VCF file.

Optional arguments:
  -h, --help  Show this help message and exit.

ngs-fq2bam

$ fuc ngs-fq2bam -h
usage: fuc ngs-fq2bam [-h] [--bed PATH] [--thread INT] [--platform TEXT]
                      [--force] [--keep]
                      manifest fasta output qsub1 qsub2 java vcf [vcf ...]

####################################################################
# Pipeline for converting FASTQ files to analysis-ready BAM files. #
####################################################################

Here, "analysis-ready" means that the final BAM files are: 1) aligned to a reference genome, 2) sorted by genomic coordinate, 3) marked for duplicate reads, 4) recalibrated by a BQSR model, and 5) ready for downstream analyses such as variant calling.

External dependencies:
  - SGE: Required for job submission (i.e. qsub).
  - BWA: Required for read alignment (i.e. BWA-MEM).
  - SAMtools: Required for sorting and indexing BAM files.
  - GATK: Required for marking duplicate reads and recalibrating BAM files.

Manifest columns:
  - Name: Sample name.
  - Read1: Path to forward FASTQ file.
  - Read2: Path to reverse FASTQ file.

Usage examples:
  $ fuc ngs-fq2bam manifest.csv ref.fa output_dir "-q queue_name -pe pe_name 10" "-q queue_name" "-Xmx15g -Xms15g" 1.vcf 2.vcf 3.vcf --thread 10
  $ fuc ngs-fq2bam manifest.csv ref.fa output_dir "-l h='node_A|node_B' -pe pe_name 10" "-l h='node_A|node_B'" "-Xmx15g -Xms15g" 1.vcf 2.vcf 3.vcf --thread 10

Positional arguments:
  manifest         Sample manifest CSV file.
  fasta            Reference FASTA file.
  output           Output directory.
  qsub1            SGE resource to request with qsub for read alignment and sorting. Since both tasks support multithreading, it is recommended to specify a parallel environment (PE) to speed up the process (also see '--thread').
  qsub2            SGE resource to request with qsub for the rest of the tasks, which do not support multithreading.
  java             Java resource to request for GATK.
  vcf              One or more reference VCF files containing known variant sites (e.g. 1000 Genomes Project).

Optional arguments:
  -h, --help       Show this help message and exit.
  --bed PATH       BED file.
  --thread INT     Number of threads to use (default: 1).
  --platform TEXT  Sequencing platform (default: 'Illumina').
  --force          Overwrite the output directory if it already exists.
  --keep           Keep temporary files.
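
A sample manifest with the columns listed above (Name, Read1, Read2) can be generated with the `csv` module. This is a minimal sketch: `write_manifest` is a hypothetical helper, and the sample names and file paths are made up for illustration.

```python
import csv


def write_manifest(path, samples):
    """Write a manifest CSV from {name: (read1_path, read2_path)}."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["Name", "Read1", "Read2"])
        writer.writeheader()
        for name, (read1, read2) in samples.items():
            writer.writerow({"Name": name, "Read1": read1, "Read2": read2})


# Hypothetical samples and paths.
write_manifest(
    "manifest.csv",
    {
        "A": ("A_R1.fastq.gz", "A_R2.fastq.gz"),
        "B": ("B_R1.fastq.gz", "B_R2.fastq.gz"),
    },
)
```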

ngs-hc

$ fuc ngs-hc -h
usage: fuc ngs-hc [-h] [--bed PATH] [--dbsnp PATH] [--job TEXT] [--force]
                  [--keep]
                  manifest fasta output qsub java1 java2

##################################################
# Pipeline for germline short variant discovery. #
##################################################

External dependencies:
  - SGE: Required for job submission (i.e. qsub).
  - GATK: Required for variant calling (i.e. HaplotypeCaller) and filtration.

Manifest columns:
  - BAM: Recalibrated BAM file.

Usage examples:
  $ fuc ngs-hc manifest.csv ref.fa output_dir "-q queue_name" "-Xmx15g -Xms15g" "-Xmx30g -Xms30g" --dbsnp dbSNP.vcf
  $ fuc ngs-hc manifest.csv ref.fa output_dir "-l h='node_A|node_B'" "-Xmx15g -Xms15g" "-Xmx30g -Xms30g" --bed in.bed

Positional arguments:
  manifest      Sample manifest CSV file.
  fasta         Reference FASTA file.
  output        Output directory.
  qsub          SGE resource to request for qsub.
  java1         Java resource to request for single-sample variant calling.
  java2         Java resource to request for joint variant calling.

Optional arguments:
  -h, --help    Show this help message and exit.
  --bed PATH    BED file.
  --dbsnp PATH  VCF file from dbSNP.
  --job TEXT    Job submission ID for SGE.
  --force       Overwrite the output directory if it already exists.
  --keep        Keep temporary files.

ngs-m2

$ fuc ngs-m2 -h
usage: fuc ngs-m2 [-h] [--bed PATH] [--force] [--keep]
                  manifest fasta output pon germline qsub java

#################################################
# Pipeline for somatic short variant discovery. #
#################################################

External dependencies:
  - SGE: Required for job submission (i.e. qsub).
  - GATK: Required for variant calling (i.e. Mutect2) and filtration.

Manifest columns:
  - Tumor: Recalibrated BAM file for tumor.
  - Normal: Recalibrated BAM file for matched normal.

Usage examples:
  $ fuc ngs-m2 manifest.csv ref.fa output_dir pon.vcf germline.vcf "-q queue_name" "-Xmx15g -Xms15g"
  $ fuc ngs-m2 manifest.csv ref.fa output_dir pon.vcf germline.vcf "-l h='node_A|node_B'" "-Xmx15g -Xms15g" --bed in.bed

Positional arguments:
  manifest    Sample manifest CSV file.
  fasta       Reference FASTA file.
  output      Output directory.
  pon         PoN VCF file.
  germline    Germline VCF file.
  qsub        SGE resource to request for qsub.
  java        Java resource to request for GATK.

Optional arguments:
  -h, --help  Show this help message and exit.
  --bed PATH  BED file.
  --force     Overwrite the output directory if it already exists.
  --keep      Keep temporary files.

ngs-pon

$ fuc ngs-pon -h
usage: fuc ngs-pon [-h] [--bed PATH] [--force] [--keep]
                   manifest fasta output qsub java

#######################################################
# Pipeline for constructing a panel of normals (PoN). #
#######################################################

The pipeline is based on GATK's tutorial "(How to) Call somatic mutations using GATK4 Mutect2" (https://gatk.broadinstitute.org/hc/en-us/articles/360035531132).

Dependencies:
  - GATK: Required for constructing PoN.

Manifest columns:
  - BAM: Path to recalibrated BAM file.

Usage examples:
  $ fuc ngs-pon manifest.csv ref.fa output_dir "-q queue_name" "-Xmx15g -Xms15g"
  $ fuc ngs-pon manifest.csv ref.fa output_dir "-l h='node_A|node_B'" "-Xmx15g -Xms15g"

Positional arguments:
  manifest    Sample manifest CSV file.
  fasta       Reference FASTA file.
  output      Output directory.
  qsub        SGE resource to request for qsub.
  java        Java resource to request for GATK.

Optional arguments:
  -h, --help  Show this help message and exit.
  --bed PATH  BED file.
  --force     Overwrite the output directory if it already exists.
  --keep      Keep temporary files.

tbl-merge

$ fuc tbl-merge -h
usage: fuc tbl-merge [-h] [--how TEXT] [--on TEXT [TEXT ...]] [--lsep TEXT]
                     [--rsep TEXT] [--osep TEXT]
                     left right

##########################
# Merge two table files. #
##########################

This command will merge two table files using one or more shared columns. The command essentially wraps the 'pandas.DataFrame.merge' method from the pandas package. For details on the merging algorithms, please visit the method's documentation page.

Usage examples:
  $ fuc tbl-merge left.tsv right.tsv > merged.tsv
  $ fuc tbl-merge left.csv right.tsv --lsep , > merged.tsv
  $ fuc tbl-merge left.tsv right.tsv --how outer > merged.tsv

Positional arguments:
  left                  Left file.
  right                 Right file.

Optional arguments:
  -h, --help            Show this help message and exit.
  --how TEXT            Type of merge to be performed ['left', 'right', 'outer', 'inner', 'cross'] (default: 'inner').
  --on TEXT [TEXT ...]  Column names to join on.
  --lsep TEXT           Delimiter to use for the left file (default: '\t').
  --rsep TEXT           Delimiter to use for the right file (default: '\t').
  --osep TEXT           Delimiter to use for the output file (default: '\t').

tbl-sum

$ fuc tbl-sum -h
usage: fuc tbl-sum [-h] [--sep TEXT] [--skiprows TEXT]
                   [--na_values TEXT [TEXT ...]] [--keep_default_na]
                   [--expr TEXT] [--columns TEXT [TEXT ...]] [--dtypes PATH]
                   table_file

###########################
# Summarize a table file. #
###########################

Usage examples:
  $ fuc tbl-sum table.tsv
  $ fuc tbl-sum table.csv --sep ,

Positional arguments:
  table_file            Table file.

Optional arguments:
  -h, --help            Show this help message and exit.
  --sep TEXT            Delimiter to use (default: '\t').
  --skiprows TEXT       Comma-separated line numbers to skip (0-indexed) or number of lines to skip at the start of the file (e.g. `--skiprows 1,` will skip the second line, `--skiprows 2,4` will skip the third and fifth lines, and `--skiprows 10` will skip the first 10 lines).
  --na_values TEXT [TEXT ...]
                        Additional strings to recognize as NA/NaN (by default, the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null').
  --keep_default_na     Whether or not to include the default NaN values when parsing the data (see 'pandas.read_table' for details).
  --expr TEXT           Query the columns of a pandas.DataFrame with a boolean expression (e.g. `--expr "A == 'yes'"`).
  --columns TEXT [TEXT ...]
                        Columns to be summarized (by default, all columns will be included).
  --dtypes PATH         File of column names and their data types (either 'categorical' or 'numeric'); one tab-delimited pair of column name and data type per line.
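
The two forms of '--skiprows' described above (a comma-separated list of 0-indexed line numbers, or a single count) can be told apart by the presence of a comma. The sketch below is an assumption about how such a value could be parsed, not fuc's code; `parse_skiprows` is a hypothetical name. The result has the shape that the `skiprows` parameter of `pandas.read_table` accepts (a list of line numbers or an integer).

```python
def parse_skiprows(text):
    """Parse a '--skiprows' value into a list of line numbers or a count."""
    if "," in text:
        # '1,' -> [1]; '2,4' -> [2, 4]. Empty items are ignored.
        return [int(x) for x in text.split(",") if x.strip()]
    # No comma: number of lines to skip at the start of the file.
    return int(text)


print(parse_skiprows("1,"))
print(parse_skiprows("2,4"))
print(parse_skiprows("10"))
```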

vcf-filter

$ fuc vcf-filter -h
usage: fuc vcf-filter [-h] [--expr TEXT] [--samples PATH]
                      [--drop_duplicates [TEXT ...]] [--greedy] [--opposite]
                      [--filter_empty]
                      vcf

######################
# Filter a VCF file. #
######################

Usage examples:
  $ fuc vcf-filter in.vcf --expr 'GT == "0/0"' > out.vcf
  $ fuc vcf-filter in.vcf --expr 'GT != "0/0"' > out.vcf
  $ fuc vcf-filter in.vcf --expr 'DP < 30' > out.vcf
  $ fuc vcf-filter in.vcf --expr 'DP < 30' --greedy > out.vcf
  $ fuc vcf-filter in.vcf --expr 'AD[1] < 10' --greedy > out.vcf
  $ fuc vcf-filter in.vcf --expr 'AD[1] < 10 and DP < 30' --greedy > out.vcf
  $ fuc vcf-filter in.vcf --expr 'AD[1] < 10 or DP < 30' --greedy > out.vcf
  $ fuc vcf-filter in.vcf --expr 'AD[1] < 10 or DP < 30' --opposite > out.vcf
  $ fuc vcf-filter in.vcf --expr 'np.mean(AD) < 10' --greedy --samples sample.list > out.vcf
  $ fuc vcf-filter in.vcf --drop_duplicates CHROM POS REF ALT > out.vcf
  $ fuc vcf-filter in.vcf --filter_empty > out.vcf

Positional arguments:
  vcf                   VCF file (zipped or unzipped).

Optional arguments:
  -h, --help            Show this help message and exit.
  --expr TEXT           Expression to evaluate.
  --samples PATH        File of sample names to apply the marking (one sample per line).
  --drop_duplicates [TEXT ...]
                        Only consider certain columns for identifying duplicates; by default, use all of the columns.
  --greedy              Use this flag to mark even ambiguous genotypes as missing.
  --opposite            Use this flag to mark all genotypes that do not satisfy the query expression as missing and leave those that do intact.
  --filter_empty        Use this flag to remove rows with no genotype calls at all.

vcf-merge

$ fuc vcf-merge -h
usage: fuc vcf-merge [-h] [--how TEXT] [--format TEXT] [--sort] [--collapse]
                     vcf_files [vcf_files ...]

################################
# Merge two or more VCF files. #
################################

Usage examples:
  $ fuc vcf-merge 1.vcf 2.vcf 3.vcf > merged.vcf
  $ fuc vcf-merge 1.vcf 2.vcf --format GT:AD:DP > merged.vcf

Positional arguments:
  vcf_files      VCF files (zipped or unzipped).

Optional arguments:
  -h, --help     Show this help message and exit.
  --how TEXT     Type of merge as defined in `pandas.DataFrame.merge` (default: 'inner').
  --format TEXT  FORMAT subfields to be retained (e.g. 'GT:AD:DP') (default: 'GT').
  --sort         Use this flag to turn off sorting of records (default: True).
  --collapse     Use this flag to collapse duplicate records (default: False).

vcf-rename

$ fuc vcf-rename -h
usage: fuc vcf-rename [-h] [--mode TEXT] [--range INT INT] [--sep TEXT]
                      vcf names

#####################################
# Rename the samples in a VCF file. #
#####################################

There are three different renaming modes using the 'names' file:
  - 'MAP': Default mode. Requires two columns, old names in the first and new names in the second.
  - 'INDEX': Requires two columns, new names in the first and 0-based indices in the second.
  - 'RANGE': Requires only one column of new names but '--range' must be specified.

Usage examples:
  $ fuc vcf-rename in.vcf old_new.tsv > out.vcf
  $ fuc vcf-rename in.vcf new_idx.tsv --mode INDEX > out.vcf
  $ fuc vcf-rename in.vcf new_only.tsv --mode RANGE --range 2 5 > out.vcf
  $ fuc vcf-rename in.vcf old_new.csv --sep , > out.vcf

Positional arguments:
  vcf              VCF file (zipped or unzipped).
  names            Text file containing information for renaming the samples.

Optional arguments:
  -h, --help       Show this help message and exit.
  --mode TEXT      Renaming mode (default: 'MAP') (choices: 'MAP', 'INDEX', 'RANGE').
  --range INT INT  Index range to use when renaming the samples. Applicable only with the 'RANGE' mode.
  --sep TEXT       Delimiter to use for reading the 'names' file (default: '\t').
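
The three renaming modes can be sketched on a plain list of sample names. This is an illustration, not fuc's implementation: `rename_samples` is a hypothetical function, `rows` stands for the parsed lines of the 'names' file, and the range is assumed to be half-open (start included, end excluded), which the help text above does not specify.

```python
def rename_samples(samples, rows, mode="MAP", index_range=None):
    """Return a renamed copy of `samples` according to `mode`."""
    new = list(samples)
    if mode == "MAP":
        # rows: (old_name, new_name); unlisted samples keep their names.
        mapping = {old: new_name for old, new_name in rows}
        new = [mapping.get(s, s) for s in new]
    elif mode == "INDEX":
        # rows: (new_name, 0-based index)
        for new_name, idx in rows:
            new[int(idx)] = new_name
    elif mode == "RANGE":
        if index_range is None:
            raise ValueError("'RANGE' mode requires an index range")
        start, end = index_range  # assumed half-open: [start, end)
        names = [row[0] for row in rows]
        if len(names) != end - start:
            raise ValueError("number of names does not match the range")
        new[start:end] = names
    else:
        raise ValueError("unknown mode: %r" % mode)
    return new


samples = ["A", "B", "C", "D"]
print(rename_samples(samples, [("B", "X")]))
print(rename_samples(samples, [("X", "0"), ("Y", "3")], mode="INDEX"))
print(rename_samples(samples, [("P",), ("Q",)], mode="RANGE", index_range=(1, 3)))
```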

vcf-slice

$ fuc vcf-slice -h
usage: fuc vcf-slice [-h] [--region TEXT] [--bed PATH] [--vcf PATH] input

###########################################
# Slice a VCF file for specified regions. #
###########################################

Target regions can be specified with either '--region', '--bed', or '--vcf'.

Pay attention to the 'chr' string in contig names (e.g. 'chr1' vs. '1').

Usage examples:
  $ fuc vcf-slice in.vcf --region 1 > out.vcf
  $ fuc vcf-slice in.vcf --region 1:100-300 > out.vcf
  $ fuc vcf-slice in.vcf --region 1:100 > out.vcf
  $ fuc vcf-slice in.vcf --region chr1:100- > out.vcf
  $ fuc vcf-slice in.vcf --region chr1:-300 > out.vcf
  $ fuc vcf-slice in.vcf --bed targets.bed > out.vcf
  $ fuc vcf-slice in.vcf --vcf targets.vcf > out.vcf

Positional arguments:
  input          Input VCF file (zipped or unzipped).

Optional arguments:
  -h, --help     Show this help message and exit.
  --region TEXT  Target region to use for slicing ('chrom:start-end').
  --bed PATH     BED file to use for slicing (zipped or unzipped).
  --vcf PATH     VCF file to use for slicing (zipped or unzipped).
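
The region strings in the usage examples above ('1', '1:100-300', '1:100', 'chr1:100-', 'chr1:-300') can be decomposed as sketched below. This is a hedged illustration: `parse_region` here is a standalone hypothetical helper, open ends are represented as `None`, and reading '1:100' as the single position 100 is an assumption. Contig names that themselves contain ':' are not handled.

```python
def parse_region(region):
    """Split a region string into (chrom, start, end); None means open-ended."""
    if ":" not in region:
        return region, None, None  # whole contig
    chrom, _, span = region.partition(":")
    if "-" not in span:
        pos = int(span)  # single position (assumed interpretation)
        return chrom, pos, pos
    start, _, end = span.partition("-")
    return chrom, int(start) if start else None, int(end) if end else None


for text in ["1", "1:100-300", "1:100", "chr1:100-", "chr1:-300"]:
    print(text, parse_region(text))
```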

vcf-vcf2bed

$ fuc vcf-vcf2bed -h
usage: fuc vcf-vcf2bed [-h] vcf

#####################################
# Convert a VCF file to a BED file. #
#####################################

Usage examples:
  $ fuc vcf-vcf2bed in.vcf > out.bed

Positional arguments:
  vcf         VCF file.

Optional arguments:
  -h, --help  Show this help message and exit.

vcf-vep

$ fuc vcf-vep -h
usage: fuc vcf-vep [-h] [--opposite] [--as_zero] vcf expr

###############################################
# Filter a VCF file annotated by Ensembl VEP. #
###############################################

Usage examples:
  $ fuc vcf-vep in.vcf "SYMBOL == 'TP53'" > out.vcf
  $ fuc vcf-vep in.vcf "SYMBOL != 'TP53'" > out.vcf
  $ fuc vcf-vep in.vcf "SYMBOL == 'TP53'" --opposite > out.vcf
  $ fuc vcf-vep in.vcf "Consequence in ['splice_donor_variant', 'stop_gained']" > out.vcf
  $ fuc vcf-vep in.vcf "(SYMBOL == 'TP53') and (Consequence.str.contains('stop_gained'))" > out.vcf
  $ fuc vcf-vep in.vcf "gnomAD_AF < 0.001" > out.vcf
  $ fuc vcf-vep in.vcf "gnomAD_AF < 0.001" --as_zero > out.vcf

Positional arguments:
  vcf         VCF file annotated by Ensembl VEP.
  expr        Query expression to evaluate.

Optional arguments:
  -h, --help  Show this help message and exit.
  --opposite  Use this flag to return only records that don't meet the said criteria.
  --as_zero   Use this flag to treat missing values as zero instead of NaN.