CompostBin is a DNA-composition-based binning algorithm for classifying metagenomic reads. Unlike previous methods, which bin assembled contigs and often require training on known reference genomes, CompostBin can accurately bin raw sequence reads without the need for assembly or training. It applies principal component analysis to project the data into an informative lower-dimensional space, and then uses the normalized cut clustering algorithm on this filtered data set to classify sequences into taxon-specific bins. We have demonstrated the algorithm’s accuracy on a variety of simulated data sets and on one metagenomic data set with known species assignments. CompostBin is a work in progress, with several refinements of the algorithm planned for the future.
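The pipeline described above (composition features, PCA projection, normalized cut) can be illustrated with a minimal NumPy sketch. This is not the CompostBin code and makes several assumptions not stated here: the function names are hypothetical, the k-mer length `k=4` and the three retained components are arbitrary choices, plain unweighted PCA stands in for whatever variant the authors use, and the graph is a dense Gaussian similarity graph cut once into two bins via the second eigenvector of the normalized Laplacian.

```python
import itertools
import numpy as np

def kmer_frequencies(reads, k=4):
    """Return an (n_reads, 4**k) matrix of per-read k-mer frequencies.
    k=4 is an illustrative assumption, not a value taken from the text."""
    index = {"".join(p): i
             for i, p in enumerate(itertools.product("ACGT", repeat=k))}
    X = np.zeros((len(reads), len(index)))
    for r, read in enumerate(reads):
        for j in range(len(read) - k + 1):
            col = index.get(read[j:j + k])  # k-mers with N etc. are skipped
            if col is not None:
                X[r, col] += 1
        total = X[r].sum()
        if total > 0:
            X[r] /= total  # normalize counts so read length does not dominate
    return X

def pca_project(X, n_components=3):
    """Project rows of X onto the top principal components (plain PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def normalized_cut_bipartition(Y, sigma=None):
    """Split the rows of Y into two bins with a spectral relaxation of the
    normalized cut on a Gaussian similarity graph (assumed graph construction)."""
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    if sigma is None:
        # Heuristic bandwidth: square root of the median squared distance.
        sigma = np.sqrt(np.median(sq[sq > 0]))
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(Y)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)          # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]        # second-smallest eigenvector
    return (fiedler > 0).astype(int)         # the sign gives the two bins
```

Under these assumptions, calling `normalized_cut_bipartition(pca_project(kmer_frequencies(reads)))` on a list of read strings returns one 0/1 bin label per read. Binning into more than two taxa would need recursive bipartitioning or a multi-way cut, which this sketch leaves out.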
The software is available at Figshare here.
CompostBin is described in Chatterji S, Yamazaki I, Bai Z, Eisen JA. CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads. In RECOMB 2008.
Alas, despite assurances by the first author that the paper would be openly available, it is not. A pre-publication version has been submitted to arxiv.org here.