Formatting binning results for GraphBin/GraphBin2
You can use the prepare subcommand of gbintk to format the binning results so they are accepted by GraphBin/GraphBin2.
Run gbintk prepare --help or gbintk prepare -h to list the help message for formatting.
Usage: gbintk prepare [OPTIONS]
Format the initial binning result from an existing binning tool
Options:
--assembler [spades|megahit|flye]
name of the assembler used (SPAdes, MEGAHIT
or Flye) [required]
--resfolder PATH path to the folder containing FASTA files
for individual bins [required]
--delimiter [comma|tab] delimiter for input/output results. Supports
a comma and a tab. [default: comma]
--prefix TEXT prefix for the output file
--output PATH path to the output folder [required]
-h, --help Show this message and exit.
Input
prepare subcommand takes the path to the folder containing the .fasta files of the bins as input.
Example usage
You can use the prepare subcommand to format an initial binning result in to the .csv format (by default) with contig identifiers and bin ID. You can run the prepare subcommand as follows.
# For SPAdes data available in tests/data/
gbintk prepare --assembler spades --resfolder tests/data/5G_metaSPAdes/initial_bins --output tests/data/5G_metaSPAdes/prepare_results
# For MEGAHIT data available in tests/data/
gbintk prepare --assembler megahit --resfolder tests/data/5G_MEGAHIT/initial_bins --output tests/data/5G_MEGAHIT/prepare_results
# For Flye data available in tests/data/
gbintk prepare --assembler flye --resfolder tests/data/1Y3B_Flye/initial_bins --output tests/data/1Y3B_Flye/prepare_results
Output
Formatted binning result will be stored in a delimited text file in the output folder provided (e.g. initial_contig_bins.csv). Contigs are named according to their original identifier and bins are numbered according to the fasta file name.