Which includes clades for the abundant sample population that are inferred from the appropriate reference sequences

This approach is computationally more expensive than sequence composition and thus requires more hardware resources for the analysis of large datasets. Hybrid methods combine information from both sequence composition and alignment to assess similarity between sequences. From another perspective, taxonomic assignment methods can be categorized as either unsupervised or supervised. Unsupervised methods cluster the sequences based on a similarity measure and then assign a taxonomic affiliation to the clusters. Supervised methods, on the other hand, infer a taxonomic model from sequences of known taxonomic origin, and this model is then used for taxonomic assignment of novel metagenome sequences. Provided that sufficient reference data for modeling are available, supervised methods are likely to be more accurate in taxonomic assignment than clustering techniques, as the effect of non-taxonomic signals, such as guanine and cytosine strand biases, on taxonomic assignment is minimized during model induction. We recently developed a new method, PhyloPythiaS, which is a successor to the previously published software PhyloPythia. PhyloPythiaS exhibits high prediction accuracy and allows rapid analysis of datasets of several hundred megabases or gigabases. It was benchmarked on simulated and real datasets and shows good predictive performance. PhyloPythiaS shows notably reduced execution times in comparison to MEGAN and PhymmBL, as no similarity searches are performed against large databases. It also shows better predictive performance on both simulated and real metagenome samples, in particular when only a limited amount of reference sequence from particular species is available. While all methods perform less favorably for short fragments than for fragments of 1 kb or more in length, similarity-based assignment with MEGAN has the lowest error rate for short fragments.
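The distinction between composition-based and supervised assignment can be illustrated with a minimal sketch. The example below is not the PhyloPythiaS model and makes no claim about its internals; it assumes a deliberately simple stand-in, a nearest-centroid classifier over normalized dinucleotide frequencies. The taxon labels, the toy reference sequences, the choice of k = 2, and all function names are hypothetical and serve only to show how a model inferred from sequences of known origin can be applied to a novel fragment.

```python
from collections import Counter
from itertools import product
import math

K = 2  # k-mer length; an illustrative assumption, not a recommended setting
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(seq, k=K):
    """Return the normalized k-mer frequency vector of a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers over A, C, G, T are counted; ambiguous bases are ignored.
    total = sum(counts[m] for m in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[m] / total for m in KMERS]


def train_centroids(labeled):
    """Supervised step: average the profiles of reference sequences per taxon.

    `labeled` is a list of (taxon, sequence) pairs of known taxonomic origin.
    """
    sums, n = {}, Counter()
    for taxon, seq in labeled:
        prof = kmer_profile(seq)
        prev = sums.get(taxon, [0.0] * len(KMERS))
        sums[taxon] = [a + b for a, b in zip(prev, prof)]
        n[taxon] += 1
    return {t: [v / n[t] for v in s] for t, s in sums.items()}


def assign(seq, centroids):
    """Assign a novel fragment to the taxon with the closest centroid."""
    prof = kmer_profile(seq)
    return min(centroids, key=lambda t: math.dist(prof, centroids[t]))


# Hypothetical toy reference data: two made-up taxa with distinct composition.
REFERENCE = [
    ("taxon_AT", "ATATATTTAAATAT"),
    ("taxon_AT", "TTTAAATATATA"),
    ("taxon_GC", "GCGCGGCCGCGC"),
    ("taxon_GC", "CCGGCGCGGGCC"),
]

centroids = train_centroids(REFERENCE)
print(assign("ATATTAATAT", centroids))
print(assign("GCGGCCGCGC", centroids))
```

Because the model is built once from the reference sequences, assigning a new fragment requires only a profile computation and a comparison against a few centroids, with no similarity search against a large database. An unsupervised method would instead cluster the fragment profiles first and label the resulting clusters afterwards.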
PhyloPythiaS is freely available for non-commercial users and can be installed on a Linux-based machine. It can be used in two different modes: generic and sample-specific. The generic model is suitable for the analysis of a metagenome sample if no further information on the sample's taxonomic composition and no relevant reference data are available. Assignment accuracy can be improved by creating and using a sample-specific model. A sample-specific model is inferred from public sequence data combined with sequences of known taxonomic affiliation identified in the metagenome sample, along with a customized taxonomy. When they match the taxa in the metagenome sample more closely, sample-specific models exhibit higher predictive accuracy, improved resolution for low-ranking clades, and higher coverage in terms of assigned sequences than the generic model. Accurate assignments can be obtained based on approximately 100 kb of reference sequence for a modeled sample population. Here we present a web server that makes PhyloPythiaS available for web-based taxonomic sequence assignment. The underlying functionality of the software is as described previously.