CompostBin is a DNA-composition-based binning algorithm for classifying metagenomic reads. Unlike previous methods, which bin assembled contigs and often require training on known reference genomes, CompostBin can accurately bin raw sequence reads without assembly or training. It applies principal component analysis to project the data into an informative lower-dimensional space, and then runs the normalized cut clustering algorithm on the projected data to classify sequences into taxon-specific bins. We have demonstrated the algorithm’s accuracy on a variety of simulated data sets and on one metagenomic data set with known species assignments. CompostBin is a work in progress, and several refinements of the algorithm are planned.
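To make the two stages described above concrete, here is a minimal Python sketch of the general idea: a PCA projection followed by a single normalized-cut bisection. It is not the CompostBin implementation. The function names (`kmer_frequencies`, `pca_project`, `normalized_cut_bisect`, `bin_reads`), the use of k-mer frequencies with `k=3` as the composition feature, the Gaussian similarity graph, and the median-based kernel width are all assumptions made for illustration. The sketch uses the standard spectral relaxation of the normalized cut and splits the reads into only two bins.

```python
import itertools
import numpy as np


def kmer_frequencies(reads, k=3):
    """Return an (n_reads, 4**k) matrix of normalized k-mer frequencies.

    Using k-mer frequencies (and k=3) as the composition feature is an
    assumption of this sketch, not a setting taken from the text.
    """
    index = {"".join(p): i
             for i, p in enumerate(itertools.product("ACGT", repeat=k))}
    counts = np.zeros((len(reads), len(index)))
    for row, read in enumerate(reads):
        read = read.upper()
        for start in range(len(read) - k + 1):
            col = index.get(read[start:start + k])
            if col is not None:  # skip k-mers containing ambiguous bases
                counts[row, col] += 1
    totals = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(totals, 1)


def pca_project(features, n_components=3):
    """Project the centered features onto the top principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def normalized_cut_bisect(points):
    """Split points into two groups via the relaxed normalized cut.

    Builds a dense Gaussian similarity graph (an assumption of this
    sketch) and thresholds the second eigenvector of the symmetric
    normalized Laplacian at zero.
    """
    n = points.shape[0]
    sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    positive = sq_dist[sq_dist > 0]
    # Heuristic kernel width: median squared pairwise distance.
    sigma_sq = np.median(positive) if positive.size else 1.0
    weights = np.exp(-sq_dist / (2.0 * sigma_sq))
    degree = weights.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(n) - d_inv_sqrt[:, None] * weights * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    # Map back to the generalized eigenvector of (D - W) y = lambda * D y.
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    return (fiedler > 0).astype(int)


def bin_reads(reads, k=3, n_components=3):
    """Composition features -> PCA projection -> one normalized-cut split."""
    features = kmer_frequencies(reads, k=k)
    projected = pca_project(features, n_components=n_components)
    return normalized_cut_bisect(projected)
```

Calling `bin_reads(list_of_read_strings)` returns one 0/1 label per read. Separating more than two taxa would require applying the bisection recursively or another multi-way extension, which this sketch leaves out.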
The software is available here.
CompostBin is described in: Sourav Chatterji, Ichitaro Yamazaki, Zhaojun Bai and Jonathan Eisen, “CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads,” in RECOMB 2008.
A pre-publication version has been submitted to arxiv.org here.