Using GraphBin2
To refine a formatted initial binning result produced by the prepare subcommand, pass it to GraphBin2 with the graphbin2 subcommand.
Run gbintk graphbin2 --help or gbintk graphbin2 -h to display the help message for GraphBin2.
Usage: gbintk graphbin2 [OPTIONS]

  GraphBin2: Refined and Overlapped Binning of Metagenomic Contigs Using
  Assembly Graphs

Options:
  --assembler [spades|megahit|flye]
                            name of the assembler used (SPAdes, MEGAHIT
                            or Flye)  [required]
  --graph PATH              path to the assembly graph file  [required]
  --contigs PATH            path to the contigs file  [required]
  --paths PATH              path to the contigs.paths (metaSPAdes) or
                            assembly.info (metaFlye) file
  --abundance PATH          path to the abundance file  [required]
  --binned PATH             path to the .csv file with the initial
                            binning output from an existing tool
                            [required]
  --output PATH             path to the output folder  [required]
  --prefix TEXT             prefix for the output file
  --depthb INTEGER          maximum depth for the breadth-first-search.
                            [default: 5]
  --threshold FLOAT         threshold for determining inconsistent
                            vertices.  [default: 1.5]
  --delimiter [comma|tab]   delimiter for input/output results. Supports
                            a comma and a tab.  [default: comma]
  --nthreads INTEGER        number of threads to use.  [default: 8]
  -h, --help                Show this message and exit.
Input Format
The SPAdes version of GraphBin2 requires 5 input files.
- Contigs file (in .fasta format)
- Assembly graph file (in .gfa format)
- Contig paths file (in .paths format)
- A delimited text file containing the initial binning result (e.g. <contig_id>,<ground_truth_bin> in .csv format)
- A tab delimited file containing the contig identifier and its average read coverage for each contig
  - A .tsv file can be obtained by running a read coverage calculation tool such as CoverM or Koverage.
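Before running GraphBin2, it can help to confirm that every contig in the initial binning result also has a coverage value. The following is a minimal Python sketch, not part of gbintk. It assumes the abundance file has two tab-separated columns (contig identifier, average coverage) and no header row; the function names parse_abundance and contigs_missing_coverage are hypothetical, and the contig names are illustrative.

```python
import csv
import io


def parse_abundance(handle):
    """Read <contig_id><TAB><coverage> rows into a dict (assumes no header row)."""
    coverage = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        contig_id, value = row[0], row[1]
        coverage[contig_id] = float(value)
    return coverage


def contigs_missing_coverage(binned_contigs, coverage):
    """Return the binned contig identifiers that have no coverage entry."""
    return sorted(c for c in binned_contigs if c not in coverage)


# Illustrative data only; real contig names depend on the assembler.
example_tsv = io.StringIO("contig_1\t12.5\ncontig_2\t30.0\n")
coverage = parse_abundance(example_tsv)
missing = contigs_missing_coverage(["contig_1", "contig_3"], coverage)
print(missing)
```

To check a real file, pass an open file handle instead of the in-memory example, for instance open("abundance.tsv", newline="").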
The MEGAHIT version of GraphBin2 requires 4 input files.
- Contigs file (in .fasta format)
- Assembly graph file (in .gfa format)
- A delimited text file containing the initial binning result (e.g. <contig_id>,<ground_truth_bin> in .csv format)
- A tab delimited file containing the contig identifier and its average read coverage for each contig
The Flye version of GraphBin2 requires 5 input files.
- Assembly graph file (assembly_graph.gfa)
- Contigs file (assembly.fasta)
- Contig paths file (assembly_info.txt)
- A delimited text file containing the initial binning result (e.g. <contig_id>,<ground_truth_bin> in .csv format)
- A tab delimited file containing the contig identifier and its average read coverage for each contig
Note: Make sure that each contig in the initial binning result is assigned to exactly one bin. GraphBin2 is designed for initial binning results in which no contig belongs to more than one bin.
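The one-bin-per-contig requirement in the note above can be checked with a short script. This is a minimal Python sketch, not part of gbintk. It assumes the initial binning file has the contig identifier in the first column and the bin in the second; the function name find_multi_bin_contigs and the example rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def find_multi_bin_contigs(handle, delimiter=","):
    """Return {contig_id: [bins]} for contigs assigned to more than one bin."""
    bins_per_contig = defaultdict(set)
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2:  # skip blank or malformed rows
            continue
        contig_id, bin_id = row[0].strip(), row[1].strip()
        bins_per_contig[contig_id].add(bin_id)
    return {c: sorted(b) for c, b in bins_per_contig.items() if len(b) > 1}


# Illustrative data: contig_1 appears in two different bins.
example_csv = io.StringIO("contig_1,bin_1\ncontig_2,bin_1\ncontig_1,bin_2\n")
conflicts = find_multi_bin_contigs(example_csv)
print(conflicts)
```

An empty result means every contig is assigned to a single bin. For a tab delimited file, pass delimiter="\t".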
Example Usage
# SPAdes assembly available in tests/data/
gbintk graphbin2 --assembler spades --graph tests/data/5G_metaSPAdes/assembly_graph_with_scaffolds.gfa --contigs tests/data/5G_metaSPAdes/contigs.fasta --paths tests/data/5G_metaSPAdes/contigs.paths --binned tests/data/5G_metaSPAdes/initial_contig_bins.csv --abundance tests/data/5G_metaSPAdes/abundance.tsv --output tests/data/5G_metaSPAdes/graphbin2_results
# MEGAHIT assembly available in tests/data/
gbintk graphbin2 --assembler megahit --graph tests/data/5G_MEGAHIT/final.gfa --contigs tests/data/5G_MEGAHIT/final.contigs.fa --binned tests/data/5G_MEGAHIT/initial_contig_bins.csv --abundance tests/data/5G_MEGAHIT/abundance.tsv --output tests/data/5G_MEGAHIT/graphbin2_results
# Flye assembly available in tests/data/
gbintk graphbin2 --assembler flye --contigs tests/data/1Y3B_Flye/assembly.fasta --paths tests/data/1Y3B_Flye/assembly_info.txt --graph tests/data/1Y3B_Flye/graph_file.gfa --binned tests/data/1Y3B_Flye/initial_contig_bins.csv --abundance tests/data/1Y3B_Flye/abundance.tsv --output tests/data/1Y3B_Flye/graphbin2_results
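When GraphBin2 is run from a script, the same commands can be assembled programmatically. The following is a minimal Python sketch that uses only the options listed in the help message above. The helper build_graphbin2_command is hypothetical and not part of gbintk. Its check that --paths is supplied for SPAdes and Flye follows the input lists in the Input Format section and is an assumption about what the tool itself enforces.

```python
import shlex


def build_graphbin2_command(assembler, graph, contigs, abundance, binned,
                            output, paths=None, prefix=None):
    """Build the argument list for a gbintk graphbin2 run."""
    if assembler not in ("spades", "megahit", "flye"):
        raise ValueError(f"unsupported assembler: {assembler}")
    # The SPAdes and Flye input lists include a contig paths file.
    if assembler in ("spades", "flye") and paths is None:
        raise ValueError("a paths file is needed for SPAdes and Flye assemblies")
    cmd = [
        "gbintk", "graphbin2",
        "--assembler", assembler,
        "--graph", graph,
        "--contigs", contigs,
        "--abundance", abundance,
        "--binned", binned,
        "--output", output,
    ]
    if paths is not None:
        cmd += ["--paths", paths]
    if prefix is not None:
        cmd += ["--prefix", prefix]
    return cmd


cmd = build_graphbin2_command(
    assembler="megahit",
    graph="tests/data/5G_MEGAHIT/final.gfa",
    contigs="tests/data/5G_MEGAHIT/final.contigs.fa",
    abundance="tests/data/5G_MEGAHIT/abundance.tsv",
    binned="tests/data/5G_MEGAHIT/initial_contig_bins.csv",
    output="tests/data/5G_MEGAHIT/graphbin2_results",
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where gbintk is installed.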
Output
The output of GraphBin2 will contain the following main files and folders.
- A delimited text file containing the contig identifier and bin identifier for each binned contig (e.g. graphbin2_output.csv).
- bins folder containing .fasta files of the refined bins. The bins include shared contigs as well.
- Shared contigs and their corresponding bins can be found in the graphbin2.log file.
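For a quick summary of the refined bins, the delimited output file can be grouped by bin. This is a minimal Python sketch, not part of gbintk. It assumes the output file has the contig identifier in the first column, the bin identifier in the second, and no header row; the function name group_contigs_by_bin and the example rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_contigs_by_bin(handle, delimiter=","):
    """Return {bin_id: [contig_ids]} from a contig/bin delimited file."""
    bins = defaultdict(list)
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2:  # skip blank or malformed rows
            continue
        contig_id, bin_id = row[0].strip(), row[1].strip()
        bins[bin_id].append(contig_id)
    return dict(bins)


# Illustrative data standing in for a file such as graphbin2_output.csv.
example_output = io.StringIO("contig_1,1\ncontig_2,1\ncontig_3,2\n")
bins = group_contigs_by_bin(example_output)
for bin_id, contigs in sorted(bins.items()):
    print(bin_id, len(contigs))
```

For a real run, open the output file written to the folder given by --output; its exact name depends on the --prefix option.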