Using MetaCoAG
A user can start an analysis by running the metacoag subcommand to bin a metagenomic dataset using the metagenomic binning tool MetaCoAG and obtain contig bins or metagenome-assembled genomes (MAGs).
Run gbintk metacoag --help or gbintk metacoag -h to list the help message for MetaCoAG.
Usage: gbintk metacoag [OPTIONS]
MetaCoAG: Binning Metagenomic Contigs via Composition, Coverage and Assembly
Graphs
Options:
--assembler [spades|megahit|flye]
name of the assembler used (SPAdes, MEGAHIT
or Flye) [required]
--graph PATH path to the assembly graph file [required]
--contigs PATH path to the contigs file [required]
--paths PATH path to the contigs.paths (metaSPAdes) or
assembly.info (metaFlye) file
--abundance PATH path to the abundance file [required]
--output PATH path to the output folder [required]
--hmm TEXT path to marker.hmm file. [default:
auxiliary/marker.hmm]
--prefix TEXT prefix for the output file
--min_length INTEGER minimum length of contigs to consider for
binning. [default: 1000]
--p_intra FLOAT RANGE minimum probability of an edge matching to
assign to the same bin. [default: 0.1;
0<=x<=1]
--p_inter FLOAT RANGE maximum probability of an edge matching to
create a new bin. [default: 0.01; 0<=x<=1]
--d_limit INTEGER distance limit for contig matching.
[default: 20]
--depthlp INTEGER depth to consider for label propagation.
[default: 10]
--n_mg INTEGER total number of marker genes. [default:
108]
--no_cut_tc do not use --cut_tc for hmmsearch.
--mg_threshold FLOAT RANGE length threshold to consider marker genes.
[default: 0.5; 0<=x<=1]
--bin_mg_threshold FLOAT RANGE minimum fraction of marker genes that should
be present in a bin. [default: 0.33333;
0<=x<=1]
--min_bin_size INTEGER minimum size of a bin to output in base
pairs (bp). [default: 200000]
--delimiter [comma|tab] delimiter for input/output results. Supports
a comma and a tab. [default: comma]
--nthreads INTEGER number of threads to use. [default: 8]
-h, --help Show this message and exit.
min_length, p_intra, p_inter, d_limit, mg_threshold, bin_mg_threshold, min_bin_size, depthlp and nthreads parameters are set by default to 1000, 0.1, 0.01, 20, 0.5, 0.3333, 200000, 10 and 8 respectively. However, the user can specify them when running MetaCoAG.
Input Format
For the metaSPAdes version, MetaCoAG takes in 4 files as inputs.
- Assembly graph file (in
.gfaformat) - Contigs file (in
.fastaformat) - Contig paths file (in
.pathsformat) - A tab delimited file containing the contig identifier and its average read coverage for each contig. A
.tsvcan be obtained by running a read coverage calculation tool such as CoverM or Koverage.
For the MEGAHIT version, MetaCoAG takes in 3 files as inputs.
- Assembly graph file (in
.gfaformat) - Contigs file (in
.fastaformat) - A tab delimited file containing the contig identifier and its average read coverage for each contig
For the Flye version, MetaCoAG takes in 4 files as inputs.
- Assembly graph file (
assembly_graph.gfa) - Contigs file (
assembly.fasta) - Contig paths file (
assembly_info.txt) - A tab delimited file containing the contig identifier and its average read coverage for each contig
Note: You can specify the delimiter for the initial binning result file and the final output file using the delimiter paramter. Supports comma (comma) and tab (tab).
Example Usage
# SPAdes assembly available in tests/data/
gbintk metacoag --assembler spades --graph tests/data/5G_metaSPAdes/assembly_graph_with_scaffolds.gfa --contigs tests/data/5G_metaSPAdes/contigs.fasta --paths tests/data/5G_metaSPAdes/contigs.paths --abundance tests/data/5G_metaSPAdes/abundance.tsv --output tests/data/5G_metaSPAdes/metacoag_results
# MEGAHIT assembly available in tests/data/
gbintk metacoag --assembler megahit --graph tests/data/5G_MEGAHIT/final.gfa --contigs tests/data/5G_MEGAHIT/final.contigs.fasta --abundance tests/data/5G_MEGAHIT/abundance.tsv --output tests/data/5G_MEGAHIT/metacoag_results
# Flye assembly available in tests/data/
gbintk metacoag --assembler flye --graph tests/data/1Y3B_Flye/assembly_graph.gfa --contigs tests/data/1Y3B_Flye/assembly.fasta --paths tests/data/1Y3B_Flye/assembly_info.txt --abundance tests/data/1Y3B_Flye/abundance.tsv --output tests/data/1Y3B_Flye/metacoag_results
Output
The output of MetaCoAG will contain the following main files and folders.
- A delimited text file containing the contig identifier and bin identifier for each binned contig (e.g.
contig_to_bin.csv) binscontaining the identified bins (FASTA file for each bin)low_quality_binscontaining the identified low-quality bins, i.e., having a fraction of marker genes lower thanbin_mg_threshold(FASTA file for each bin)*.frag.faa,*.frag.ffnand*.frag.gfffiles containing FragGeneScan output*.hmmoutfile containing HMMER output