HM16STR Healthy - 16S rRNA Trimmed Data Set
Raw 16S sequence reads must be processed before they can be used to infer useful taxonomic information. The HMP DCC performed baseline processing and analysis of all 16S variable region sequences (regions v1v3, v3v5 and v6v9) generated from >10,000 samples from healthy human subjects. Here we provide access to all trimmed, deconvoluted FASTA files, including a subset of >5000 samples described in a series of 2012 publications (see the 'Published Data' tab for more information), as well as all subsequently sequenced samples.
These trimmed datasets were then processed by a pipeline that ran the following analysis steps in order: a) 16S reference alignment via the NAST-iEr alignment tool; b) chimera identification via ChimeraSlayer; c) aberrant sequence identification via WigeoN; and d) taxonomic binning using the RDP Classifier. The first three steps were performed using components of the Broad Institute’s Microbiome Utilities Portal.
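As a quick sanity check before running any downstream analysis on a trimmed FASTA file, it can help to count the reads and look at their lengths. The following is a minimal sketch in Python, not part of the HMP DCC pipeline: the function names read_fasta and summarize, and the example records and header text, are hypothetical and used only for illustration. It assumes only the standard FASTA layout (a header line starting with ">" followed by one or more sequence lines).

```python
from io import StringIO
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(handle: TextIO) -> Dict[str, float]:
    """Return the read count and min/max/mean read length for a FASTA stream."""
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    if not lengths:
        return {"reads": 0, "min_len": 0, "max_len": 0, "mean_len": 0.0}
    return {
        "reads": len(lengths),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "mean_len": sum(lengths) / len(lengths),
    }


# Hypothetical records for illustration only; real headers will differ.
EXAMPLE = ">read_1\nACGTACGT\nACGT\n>read_2\nGGGTTT\n"

if __name__ == "__main__":
    print(summarize(StringIO(EXAMPLE)))
```

To run the same summary on a downloaded file, open it in text mode and pass the file object to summarize in place of the StringIO example.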
Protocols and Tools