Each 25-μl reaction consisted of 2.5 μl of Takara 10× Ex Taq Buffer (Mg2+ free), 2 μl of dNTP Mix (2.5 mM), 1.5 μl of Mg2+ (25 mM), 0.25 μl of Takara Ex Taq DNA polymerase (2.5 units), 1 μl of template DNA, 0.5 μl of 10 μM barcode primer 967F, 0.5 μl of 10 μM primer 1406R, and 16.75 μl of ddH2O. The two PCR products were sequenced independently in two sequencing batches at the Beijing Genomics Institute using paired-end sequencing on an Illumina HiSeq 2000 platform, and 101 bp were sequenced from each end. The sequences have been deposited in the Sequence Read Archive (SRA) under accession numbers ERS346316 to ERS346371.

Sequence processing and analysis

We wrote a Perl script to separate tags according to their barcodes with the following steps: the primer region of each tag was first identified with no mismatches allowed; tags that failed to match the primers were replaced by their reverse complements, and the primer region was searched for again; the barcodes (region before the primer) and target V6
region (region after the primer) were stored for each tag; tags were separated according to their barcodes, and tags whose barcodes matched no sample were discarded. For quality control purposes, no mismatches were allowed in the primer or barcode regions (see above). Furthermore, we removed tags with ambiguous bases (N) and screened potential chimeras with UCHIME (de novo mode, parameters set as follows: –minchunk 20 –xn 7 –noskipgaps 2) [12]. To unify the target region of the tags from the two primer sets, we extracted the V6 region of each tag by cutting 60 bp from the right end of the sequences from V6R primers (960 bp to 1,028 bp in E. coli). To avoid the effects of different sequencing depths, all samples were normalized to 5,000 sequences
for subsequent analyses. We calculated the Good's coverage of each sample at this depth. The formula used was C = 1 − n/N, where C is the Good's coverage, n is the number of OTUs with only one tag per sample, and N is the number of all tags in that sample. TSC was used to cluster the tags into OTUs, with the similarity threshold set to 0.97 [13]. GAST was used to assign these sequences to taxa with the V6 database [7]. The α-diversity indices, including Chao, ACE, Shannon and observed OTUs, were calculated using MOTHUR [14]. PCA was implemented using QIIME based on the Jaccard distance [15]. LEfSe was used to determine the biomarkers with the LDA score threshold set to 3 [16]. Statistical analysis was performed using SigmaPlot 12.0.

Results and discussion

Illumina paired-end sequencing results

In total, we obtained 417,821 tags with the V4F-V6R primer set (an average of 14,992 tags per sample) and 756,514 tags with the V6F-V6R primer set (an average of 27,018 tags per sample).
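
The depth normalization and Good's coverage calculation described in the methods can be illustrated with a short Python sketch. It assumes that every tag has already been assigned an OTU label. The function names, the fixed random seed, and the choice of random subsampling without replacement are assumptions made for illustration; the text does not state how the normalization was actually carried out.

```python
import random
from collections import Counter


def subsample(otu_labels, depth, seed=0):
    """Draw `depth` tags at random without replacement (assumed normalization)."""
    if len(otu_labels) < depth:
        raise ValueError("sample has fewer tags than the requested depth")
    return random.Random(seed).sample(otu_labels, depth)


def goods_coverage(otu_labels):
    """Good's coverage C = 1 - n/N.

    n is the number of OTUs represented by exactly one tag in the sample,
    N is the total number of tags in the sample.
    """
    total = len(otu_labels)
    if total == 0:
        raise ValueError("sample contains no tags")
    counts = Counter(otu_labels)
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1 - singletons / total


if __name__ == "__main__":
    # Hypothetical toy sample: 10 tags, one OTU seen only once.
    tags = ["otu1"] * 6 + ["otu2"] * 3 + ["otu3"]
    print(goods_coverage(tags))
```

In the toy sample, N = 10 and n = 1, so the coverage is 1 − 1/10. For the study's data, `subsample(tags, 5000)` would be applied to each sample before calling `goods_coverage`.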