CompostBin “a dna COMPOSiTion-based algorithm for BINning environmental shotgun reads”

CompostBin is a DNA-composition-based binning algorithm for classifying metagenomic reads. Unlike previous methods, which bin assembled contigs and often require training on known reference genomes, CompostBin can accurately bin raw sequence reads without assembly or training. It applies principal component analysis to project the data into an informative lower-dimensional space, and then runs the normalized cut clustering algorithm on this filtered data set to classify sequences into taxon-specific bins. We have demonstrated the algorithm’s accuracy on a variety of simulated data sets and on one metagenomic data set with known species assignments. CompostBin is a work in progress, and several refinements of the algorithm are planned.
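The pipeline described above (composition vectors, a PCA projection, then a normalized-cut split) can be sketched in dependency-free Python. This is a minimal sketch under stated assumptions, not CompostBin's actual code. The k-mer length (k=3), the number of principal components (3), the Gaussian similarity graph with a median-distance bandwidth, and thresholding the relaxed normalized-cut eigenvector at zero are all illustrative choices. Helper names such as `kmer_profile`, `pca_project` and `ncut_bisect` are hypothetical. It performs only a single two-way split, and the two "genomes" are synthetic reads that differ in G+C content.

```python
import itertools
import math
import random
import statistics


def kmer_profile(seq, k=3):
    """Relative k-mer frequencies over ACGT (k=3 is an illustrative assumption)."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other symbols
            counts[j] += 1
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    n = math.sqrt(dot(v, v)) or 1.0
    return [x / n for x in v]


def power_eigvecs(mat_vec, dim, count, known=(), iters=150, seed=0):
    """Leading eigenvectors of a symmetric PSD operator via power iteration,
    orthogonalizing (Gram-Schmidt) against `known` and earlier results."""
    rng = random.Random(seed)
    basis, found = list(known), []
    for _ in range(count):
        v = unit([rng.random() - 0.5 for _ in range(dim)])
        for _ in range(iters):
            w = mat_vec(v)
            for u in basis:
                c = dot(w, u)
                w = [wi - c * ui for wi, ui in zip(w, u)]
            v = unit(w)
        basis.append(v)
        found.append(v)
    return found


def pca_project(profiles, n_components=3):
    """Center the profiles and project them onto the leading principal axes."""
    n, dim = len(profiles), len(profiles[0])
    mean = [sum(col) / n for col in zip(*profiles)]
    X = [[x - m for x, m in zip(row, mean)] for row in profiles]

    def cov_vec(v):  # (1/n) X^T X v, without forming the covariance matrix
        out = [0.0] * dim
        for row in X:
            s = dot(row, v) / n
            out = [o + s * r for o, r in zip(out, row)]
        return out

    axes = power_eigvecs(cov_vec, dim, n_components)
    return [[dot(row, a) for a in axes] for row in X]


def ncut_bisect(points):
    """Two-way normalized-cut split (Shi-Malik spectral relaxation, threshold 0)."""
    n = len(points)
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    # Kernel bandwidth: median pairwise distance (a common heuristic, assumed here).
    sigma = statistics.median(
        math.sqrt(d2[i][j]) for i in range(n) for j in range(i + 1, n)) or 1.0
    W = [[math.exp(-x / (2 * sigma ** 2)) for x in row] for row in d2]  # Gaussian kernel, PSD
    deg = [sum(row) for row in W]
    s = [1 / math.sqrt(x) for x in deg]

    def m_vec(v):  # D^-1/2 W D^-1/2 v
        return [s[i] * sum(W[i][j] * s[j] * v[j] for j in range(n)) for i in range(n)]

    trivial = unit([math.sqrt(x) for x in deg])  # eigenvalue 1, carries no split
    (v,) = power_eigvecs(m_vec, n, 1, known=[trivial])
    y = [vi * si for vi, si in zip(v, s)]  # solution of (D - W) y = lam * D * y
    return [0 if yi >= 0 else 1 for yi in y]


def synthetic_read(rng, gc, length=400):
    """Random read with a given G+C fraction (stand-in for one genome)."""
    return "".join(rng.choice("GC") if rng.random() < gc else rng.choice("AT")
                   for _ in range(length))


rng = random.Random(42)
reads = ([synthetic_read(rng, 0.7) for _ in range(20)]
         + [synthetic_read(rng, 0.3) for _ in range(20)])
labels = ncut_bisect(pca_project([kmer_profile(r) for r in reads]))
print(labels)
```

Because the eigenvector's sign is arbitrary, which group receives label 0 carries no meaning; only the partition matters. Getting more than two bins would require splitting recursively or using a k-way cut, and real reads are far noisier than this synthetic pair.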

The software is available on Figshare.

CompostBin is described in: Chatterji S, Yamazaki I, Bai Z, Eisen JA. CompostBin: a DNA composition-based algorithm for binning environmental shotgun reads. RECOMB 2008.

Alas, despite assurances from the first author that the paper would be openly available, it is not. A pre-publication version has been submitted to arXiv.org.

5 Responses to CompostBin “a dna COMPOSiTion-based algorithm for BINning environmental shotgun reads”

  1. Mark Blaxter says:

    Hi Jonathan and co,
    Is CompostBin still a ‘live’ project? I note the arXiv post is from 2007… but I haven’t found a publication since, and the download link above gives “Server not found: Firefox can’t find the server at bobcat.genomecenter.ucdavis.edu.”

    • It is not still active, but those links should work. Our bobcat server died, and I thought I had moved all the links pointing to it to other sites, but I missed a few, like this one.

      I will fix this ASAP.

      Note that there is a publication:

      Chatterji S, Yamazaki I, Bai Z, Eisen JA. 2008. CompostBin: a DNA composition-based algorithm for binning environmental shotgun reads. RECOMB 2008.

      My postdoc Sourav swore to me that these conference proceedings were “Open Access”. He was wrong. You can get a copy of the paper here: http://phylogenomics.files.wordpress.com/2012/02/100-chatterji-2008.pdf

      • Mark Blaxter says:

        Many thanks… I didn’t _really_ mean that conference proceedings weren’t a paper. Oops.
        Are you still using CompostBin, or is it, in your humble opinion, no longer the best tool for sorting metagenome data?

  2. We’re not really using it any more. Sourav left the lab a bit earlier than expected for a software engineering job elsewhere. I think it was (and still is) a good approach, but we have not done anything more with it. My dream is to create a binning combiner tool that takes dozens of methods and combines them, the way some combiner gene finders do.

  3. Mark Blaxter says:

    Yes, that would indeed be a Good Thing. We are leaning towards classifying reads by first classifying a preliminary assembly of them. The targets are then longer contigs, which are easier to classify, and properties such as read depth can become part of the feature set that goes into the classifier. Dream big.
