HM16STR - 16S rRNA Trimmed Data Set
Raw 16S sequence reads must be processed before they can be used to infer useful taxonomic information. The HMP DCC performed baseline processing and analysis of all 16S variable region sequences generated from >10,000 samples from healthy human subjects. A subset of these samples was described in a series of 2012 publications in Nature and PLoS. This subset consisted of >5,000 samples, corresponding to 16S variable regions V1-V3, V3-V5 and V6-V9. Here we provide access to trimmed, deconvoluted FASTA files corresponding to this subset of samples.
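As a quick sanity check after download, the trimmed FASTA files can be read with a few lines of Python. The following is a minimal sketch, not part of the HMP DCC tooling: the function names `read_fasta` and `length_summary` are hypothetical, and it assumes plain FASTA input in which each record starts with a `>` header line.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper for illustration; sequences that wrap over
    several lines are joined into a single string.
    """
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading '>'
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_summary(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return the read count and min/max/mean read length."""
    lengths = [len(seq) for _, seq in records]
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }
```

Because `read_fasta` accepts any iterable of lines, an open file handle can be passed directly, for example `length_summary(read_fasta(open(path)))`, where `path` points to a downloaded file.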
These trimmed datasets were then processed by a pipeline with four analysis steps: a) 16S reference alignment via the NAST-iEr alignment tool; b) chimera identification via ChimeraSlayer; c) aberrant sequence identification via WigeoN; and d) taxonomic binning using the RDP Classifier. The first three steps were performed using components of the Broad Institute’s Microbiome Utilities Portal.
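The order of the four steps can be sketched as a small driver function. This is an illustration only and does not invoke NAST-iEr, ChimeraSlayer, WigeoN or the RDP Classifier: the callables `align`, `is_chimera`, `is_aberrant` and `classify` are hypothetical stand-ins supplied by the caller. The sketch also assumes that the chimera and aberrant-sequence checks run on the aligned read and that the classifier runs on the unaligned read.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_pipeline(
    reads: Iterable[Tuple[str, str]],
    align: Callable[[str], str],         # stand-in for step a
    is_chimera: Callable[[str], bool],   # stand-in for step b
    is_aberrant: Callable[[str], bool],  # stand-in for step c
    classify: Callable[[str], str],      # stand-in for step d
) -> List[Dict[str, object]]:
    """Apply the four steps to each (read_id, sequence) pair, in order.

    Reads are flagged rather than removed, so downstream code can
    decide how to treat chimeric or aberrant sequences.
    """
    results: List[Dict[str, object]] = []
    for read_id, seq in reads:
        aligned = align(seq)
        results.append(
            {
                "id": read_id,
                "chimera": is_chimera(aligned),
                "aberrant": is_aberrant(aligned),
                "taxon": classify(seq),
            }
        )
    return results
```

Passing the steps in as callables keeps the sketch independent of how each tool is actually run, so toy functions can be substituted for testing.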
Protocols and Tools