1 Centre for Infectious Diseases and Microbiology (CIDM), Institute of Clinical Pathology and Medical Research (ICPMR), Westmead, NSW 2145, Australia
2 Wuhan First Hospital, Hubei Province, Wuhan 430022, PR China
3 TEDA School of Biological Sciences and Biotechnology and Tianjin State Laboratory of Microbial Functional Genomics, Nankai University, TEDA College, Tianjin 300457, PR China
4 School of Biotechnology and Biomolecular Sciences, The University of New South Wales, NSW 2052, Australia
5 Tianjin Biochip Technology Corporation, TEDA, Tianjin 300457, PR China
6 Department of Medicine, The University of Sydney, NSW 2066, Australia
7 National Centre for Immunisation Research and Surveillance of Vaccine Preventable Diseases, Children's Hospital at Westmead, NSW 2145, Australia
Correspondence Gwendolyn L. Gilbert lyng{at}icpmr.wsahs.nsw.gov.au
Received October 13, 2004
Accepted December 1, 2004
In a previous study, a molecular capsular type (MCT) prediction system for 51 Streptococcus pneumoniae serotypes was developed based on a combination of partial cpsA–cpsB sequencing and serotype(s)/group(s)-specific PCR. In this study, another 169 S. pneumoniae isolates were added to the existing database of 427 isolates, including representatives of all 39 serotypes not previously studied. In addition to the authors' own limited sequence data for all 90 serotypes, cpsA–cpsB sequence data published by the S. pneumoniae capsular loci-sequencing group (http://www.sanger.ac.uk/Projects/S_pneumoniae/CPS/) at the Sanger Institute or available from GenBank were incorporated into the database. All serotypes, except 25A, were represented by at least two isolates. The number of sequence types identified was 138, of which 110 corresponded to single conventional serotypes (CSs); of these, 57 were represented by two or more isolates. Twenty-six sequence types were shared by between two and four CSs. To resolve these shared cpsA–cpsB sequence types and increase the discriminatory power of our system, the genes encoding the capsular polysaccharide flippase (wzx) and polymerase (wzy) were annotated and 24 new serotype(s)/group(s)-specific PCRs targeting wzy and two targeting wzx were designed. Using both cpsA–cpsB sequencing and wzx/wzy PCR, MCT correctly predicted the CSs of 516 (73 %) and the serogroup of an additional 155 (22 %) of the 708 isolates evaluated. For 5 % of isolates, MCT could not distinguish between members of five serotype pairs (37 isolates) containing members of different serogroups. Although further study of the relationship between MCT and CS is needed, this system now allows serotype or serogroup identification of 95 % of S. pneumoniae isolates.
The GenBank/EMBL/DDBJ accession numbers for the new partial cpsA (wzg)–cpsB (wzh) genes are AY508586–AY508641, AY621659, AY621660 and AY661448–AY661457.
Three phylogenetic trees, a schematic representation of related wzx genes and five tables of data are available as supplementary material in JMM Online.
INTRODUCTION
S. pneumoniae comprises at least 90 serotypes (Henrichsen, 1995) distinguished by capsular polysaccharide antigens (Garcia et al., 2000). Capsule production in S. pneumoniae is largely controlled by the capsular polysaccharide synthesis (cps) gene cluster (Garcia et al., 2000). While serotyping and antibiotic-susceptibility testing remain the primary methods for characterizing pneumococci, molecular typing can add greater discrimination and complementary information (Hall, 1998). A molecular capsular typing (MCT) system for S. pneumoniae will be of greatest value when it can fully replace serotyping and allow monitoring of capsule evolution (Lawrence et al., 2000, 2003; Enright & Spratt, 1998). In a previous study, we developed an MCT prediction system for 51 S. pneumoniae serotypes based on sequencing and serotype(s)/group(s)-specific PCR, targeting the cps gene cluster (Kong & Gilbert, 2003). In this study, we aimed to extend the system to allow prediction of all 90 S. pneumoniae serotypes.
METHODS
A subset of well-characterized isolates, the serotypes of which were unknown at the time of receipt, was used to evaluate our genotyping system. It included 35 isolates provided by the Royal College of Pathologists of Australasia, Quality Assurance Program Pty Limited, New South Wales, Australia and 49 by the Department of Microbiology, Children's Hospital at Westmead.
Isolates were retrieved from storage by subculture on blood agar plates (Columbia II agar base supplemented with 5 % horse blood) and incubated overnight at 37 °C in 5 % CO2.
Annotation and analysis of wzx and wzy.
Analysis of homology and protein hydrophobicity was performed to annotate the wzx and wzy genes in S. pneumoniae cps gene clusters. BLAST and PSI-BLAST (Altschul et al., 1997) were used to search GenBank and Pfam protein motif databases (Bateman et al., 2002) for possible gene functions. The TMHMM v2.0 analysis program (http://www.cbs.dtu.dk/services/TMHMM-2.0/) was used to identify potential transmembrane segments from amino acid sequences (Chen et al., 2003). Sequence alignment and comparison were done using the program CLUSTAL_W (Thompson et al., 1994).
Phylogenetic trees.
The phylogenetic trees for partial cpsA–cpsB, wzx and wzy were generated by the neighbour-joining method using the program MEGA (Kumar et al., 1994).
MCT prediction
Oligonucleotide primers.
In addition to our previous MCT primer pairs (Kong & Gilbert, 2003), 24 new serotype(s)/group(s)-specific oligonucleotide primer pairs targeting wzy and two targeting wzx were designed for this study. The specificities, sequences, numbered base positions and melting temperatures (Tm) are shown in Table S1 (available as supplementary data in JMM Online). Expected amplicon lengths of different primer pairs were calculated from the 5'-end positions of the corresponding primers. Since these PCRs were designed to resolve serotypes with shared sequence types, all available isolates of the relevant serotypes were tested. Finally, the specificities of all 26 new primer pairs were checked against a reference set of one strain of each of the 90 serotypes.
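The amplicon-length calculation mentioned above can be sketched as follows. This is a minimal illustration, assuming that both primer positions are 1-based and numbered on the same reference sequence; the function name and the example positions are hypothetical and are not taken from Table S1.

```python
def expected_amplicon_length(forward_5p: int, reverse_5p: int) -> int:
    """Return the expected PCR product length in base pairs.

    Assumption: both arguments are 1-based positions, on the same reference
    numbering, of the 5'-terminal base of the forward and reverse primers.
    The product spans both 5' ends inclusively.
    """
    if reverse_5p < forward_5p:
        raise ValueError("reverse primer 5' end must lie downstream of the forward primer 5' end")
    return reverse_5p - forward_5p + 1


# Illustrative positions only (not real primer coordinates).
print(expected_amplicon_length(101, 450))  # 350
```

Because the 5' ends of both primers are incorporated into the product, the length is the inclusive distance between them, hence the `+ 1`.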
DNA preparation, PCR, sequencing and sequence analysis. These steps were performed as previously described (Kong & Gilbert, 2003). The only difference was that for the new PCRs 55–60 °C was used as the annealing temperature because of the low Tm values of the new primers.
Nucleotide sequence accession numbers.
Fifty-six new sequences generated in this study for partial cpsA (wzg)–cpsB (wzh) genes were deposited in GenBank with the following accession numbers: AY508586–AY508641, AY621659, AY621660 and AY661448–AY661457 (see Table S2, available as supplementary data in JMM Online).
RESULTS AND DISCUSSION
Extension of partial cpsA–cpsB sequence database
Partial cpsA–cpsB sequencing primers.
The sequencing primers cpsS1–cpsA3 produced amplicons from all isolates studied in this and our previous study, except for three belonging to rare serotypes, 25F, 25A and 38, and five that were non-serotypable (Kong & Gilbert, 2003). Two additional primer pairs, cpsS1–cpsA1 and cpsS3–cpsA2, formed amplicons from isolates belonging to serotypes 25F, 25A and 38 and two non-serotypable isolates (the other three non-serotypable isolates were not studied further).
Sequence type nomenclature. Sequence types were generally named according to the corresponding serotype, with a suffix representing the source of the isolate in which they were first identified. The name given where a sequence type was shared by multiple serotypes includes all (two to four) of the serotypes in ascending numerical order (e.g. 15B-15C-22F-22A) (Henrichsen, 1995). One or two representative sequences of each sequence type were deposited into GenBank (see Tables S2 and S3, available as supplementary data in JMM Online, for sequence type nomenclature and corresponding GenBank accession numbers).
Phylogenetic tree based on partial cpsA–cpsB genes. A tree based on partial cpsA–cpsB sequences of all isolates studied is shown in Fig. S1 (available as supplementary data in JMM Online). The tree shows similarities between sequence types analogous to a multilocus sequence type tree (Enright & Spratt, 1998) rather than true phylogenetic relationships. Adding the sequence of an isolate of unknown serotype to the cpsA–cpsB tree, which contains all the sequences in our database, will allow us to infer its most likely serotype; the predictive accuracy will increase as the number of sequences increases (see Table S2, available as supplementary data in JMM Online).
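The idea of inferring the most likely serotype from the closest database sequence can be illustrated with a simplified stand-in for tree placement: a nearest-match search by number of mismatches between aligned sequences. This is only a sketch under stated assumptions, not the neighbour-joining procedure used for Fig. S1; the function names, the toy sequences and the type labels are all invented for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_sequence_types(query: str, database: dict) -> tuple:
    """Return (smallest distance, sorted names of all types at that distance).

    `database` maps a sequence type name to its aligned sequence.
    Ties are all reported, since a query may be equally close to several types.
    """
    distances = {name: hamming(query, seq) for name, seq in database.items()}
    best = min(distances.values())
    return best, sorted(name for name, d in distances.items() if d == best)


# Toy aligned sequences, invented for illustration (not real cpsA-cpsB data).
toy_db = {
    "typeA": "ACGTACGTAC",
    "typeB": "ACGTTCGTAC",
    "typeC": "TTGTACGAAC",
}
print(closest_sequence_types("ACGTTCGTAA", toy_db))  # (1, ['typeB'])
```

A distance of zero corresponds to an exact match with a known sequence type; a non-zero distance indicates a new sequence type whose nearest neighbours only suggest a likely serotype.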
Sequence type vs. mutation. Based on our sequence type definition, heterogeneity at one or more sites defines a new sequence type (Kong & Gilbert, 2003), which is consistent with the widely accepted principle for definition of a multilocus sequence type (Enright & Spratt, 1998, 1999). This strategy does not allow us to distinguish significant evolutionary mutations from accidental point mutations, but it is a consistent, unambiguous basis for sequence type nomenclature (see Tables S2 and S3, available as supplementary data in JMM Online).
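The definition above (any difference at one or more sites defines a new sequence type) amounts to exact-match lookup. The following minimal sketch assumes sequences are trimmed to the same region before comparison; the function name and the `serotype-source` naming format are hypothetical simplifications of the nomenclature described earlier.

```python
def assign_sequence_type(sequence: str, known_types: dict, serotype: str, source: str) -> tuple:
    """Return (sequence type name, is_new) for a partial sequence.

    `known_types` maps each distinct sequence to its type name. A sequence that
    differs from every known sequence at one or more sites is registered as a
    new type, named here (hypothetically) as "<serotype>-<source>".
    """
    if sequence in known_types:
        return known_types[sequence], False
    name = f"{serotype}-{source}"
    known_types[sequence] = name
    return name, True


# Toy data, invented for illustration.
known = {"ACGTACGT": "14-x"}
print(assign_sequence_type("ACGTACGT", known, "14", "y"))  # ('14-x', False)
print(assign_sequence_type("ACGTACGA", known, "14", "y"))  # ('14-y', True)
```

A single-base difference is enough to create a new entry, which is why this rule cannot separate evolutionarily meaningful changes from incidental point mutations.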
Extension and ongoing evaluation of our sequence type database. Our database has been extended to 90 serotypes, but its development will continue as new sequence types or even serotypes are identified. In future, more isolates of each serotype will be examined as they become available and the results added to our database. It may be necessary to modify the database when additional sequence types are identified and discrepancies are resolved (for example, our data differ from cps gene cluster sequences reported by the Sanger Institute for serotypes 3, 25A and 28F).
Progress so far demonstrates that it is possible to generate an accessible cpsA–cpsB sequence database for practical use by S. pneumoniae serotyping reference laboratories (McEllistrem et al., 2004). In general, the more serotypes, sequence types and isolates of each that are included in the database, the greater the accuracy of serotype prediction using sequence data. The rationale for making our database available at this stage is to allow others to use and contribute to further evaluation of the effectiveness of the serotype prediction system.
The results of our preliminary evaluation of 84 selected well-characterized isolates, representing 46 serotypes, are shown in Table S4 (available as supplementary data in JMM Online). cpsA–cpsB sequencing alone correctly characterized the serotype of 41 isolates and the serogroup of 13. Six isolates belonged to one of four new sequence types, which have been added to our database (Table S2, available as supplementary data in JMM Online). Serotype(s)/group(s)-specific PCR allowed correct serotype (18) or serogroup (seven) identification of another 25 isolates (including those belonging to new sequence types). The remaining five isolates belonged to one of three pairs of serotypes, individual members of which can be rapidly distinguished using serotype-specific antisera (see Table S4, available as supplementary data in JMM Online).
Other potential uses of the cpsA–cpsB database. It has been recently reported that serotype 14 variants of the France 9V-3 clone from Baltimore, Maryland, can be differentiated by cpsB gene sequence variation (McEllistrem et al., 2004). In addition, sequence variation in cpsB is related to S. pneumoniae strain virulence (Morona et al., 2004). Incorporation of these sequence variants and related epidemiological and virulence data into the cpsA–cpsB database would allow them to be easily recognized by other researchers.
Are shared sequence types plausible? In order to explain the sharing of cpsA–cpsB sequence types by more than one serotype, we studied their antigenic formulae (Henrichsen, 1995). Among the 24 shared cps sequence types (genotypes), the majority involved closely related serotypes (or phenotypes). However, four (2-41A, 10A-17A, 10A-23F, 13-20) were shared between apparently unrelated serotypes (no antigenic cross-reactions) and three (11A-11D-18F, 15B-15C-22F-22A, 17F-35B-35C-42) between both cross-reacting and non-cross-reacting serotypes (Henrichsen, 1995) (see Table S3, available as supplementary data in JMM Online). The latter probably can be explained by recombination events (Coffey et al., 1998, 1999).
S. pneumoniae is characterized by high-frequency recombination within the cps gene cluster, including wzx, leading to serotype switching among isolates within genetic lineages defined by relationships between the more conserved housekeeping genes (Coffey et al., 1998; Jiang et al., 2001). Although wzx sequences are usually highly variable (Samuel & Reeves, 2003), we found that those of 24 serotypes share high-level (72–100 %) homology. We found three main recombination sites within these 24 wzx sequences (base positions 395, 775 and 1150) using PhylPro 1.0 (Weiller, 1998), which generated the diagrammatic representation of polymorphic sites and hypothetical recombination events as shown in Fig. S2 (available as supplementary data in JMM Online). These regions of high-level similarity in wzx suggest recent recombination.
Are wzx and wzy helpful?
In our previous study, we showed that wzx- and wzy-based PCRs increase the accuracy of cpsA–cpsB sequence-based serotype prediction (Kong & Gilbert, 2003; Rubin & Rizvi, 2004). Therefore, in order to extend our serotype-prediction strategy to all 90 serotypes, we examined all known wzx and wzy sequences (see Table S3, available as supplementary data in JMM Online). This was facilitated by the timely publication, by the Sanger Institute (http://www.sanger.ac.uk/Projects/S_pneumoniae/CPS/), of the complete sequences of the cps gene clusters of 90 serotypes. We independently annotated all 89 available wzx and wzy sequences (not including serotype 3, which lacks these genes), using 17 available serotype cps gene cluster sequences from GenBank as reference (see Table S5, available as supplementary data in JMM Online) (Kong & Gilbert, 2003). Our sequence data showed significant discrepancies when compared with the Sanger Institute sequences for serotypes 3, 25A and 28F, which have not been resolved despite repetition of sequencing.
Based on previous studies, both wzx and wzy should be serotype-specific (Jiang et al., 2001; Kong & Gilbert, 2003), but the present study suggests that this is not straightforward. Our analysis showed that, for most serotypes, wzy is shorter but more heterogeneous than wzx (see Tables S3 and S5, available as supplementary data in JMM Online). These observations, as well as evidence of wzx recombination events (see above and Fig. S2, available as supplementary data in JMM Online), suggest that wzy is a more suitable target for serotype(s)/group(s)-specific PCR for all 90 serotypes except serotype 3, which lacks these genes and so will need to be identified on the basis of other serotype 3-specific cps genes (Kong & Gilbert, 2003).
To increase the predictive accuracy of our system, we designed 24 serotype(s)/group(s)-specific PCRs targeting wzy and two targeting wzx, in addition to those developed in our previous study. The sensitivities and specificities of the 26 new PCR primer pairs were assessed, initially, for the corresponding shared sequence types and then with a reference set of all 90 serotypes (see Table S1, available as supplementary data in JMM Online, for primer pair specificity). All primer pairs amplified isolates belonging to corresponding serotypes (see Tables S1–S4, available as supplementary data in JMM Online) and did not amplify unrelated serotypes. As shown in our previous study (Kong & Gilbert, 2003), partial wzy and wzx sequencing can distinguish serotypes 7B and 7C from 40, 10F from 10C, 11F/11B (identical) from 11C, 12A/46 (identical) from 12F/12B/44 (identical), 35A from 35C/42 (identical) and 35F from 47F. However, they cannot resolve individual serotypes within some isolates of sequence types 25F-38 and 6A-6B.
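The two-step logic described here (a shared sequence type yields a set of candidate serotypes, which specific PCRs then narrow down) can be sketched as set operations. This is an illustrative model only: the function name, the placeholder serotype labels `S1` to `S4` and the PCR specificities are invented, and real specificities would come from Table S1.

```python
def resolve_candidates(candidates: set, pcr_results: dict) -> set:
    """Narrow candidate serotypes using serotype(s)/group(s)-specific PCR results.

    `candidates` is the set of serotypes sharing one sequence type.
    `pcr_results` maps the set of serotypes a PCR is specific for (as a
    frozenset) to True (amplicon obtained) or False (no amplicon).
    A positive PCR keeps only serotypes within its specificity; a negative
    PCR excludes them.
    """
    remaining = set(candidates)
    for specificity, amplified in pcr_results.items():
        if amplified:
            remaining &= specificity
        else:
            remaining -= specificity
    return remaining


# Placeholder labels, not real serotypes or real primer specificities.
shared_type = {"S1", "S2", "S3", "S4"}
results = {
    frozenset({"S3", "S4"}): True,   # group-specific PCR positive
    frozenset({"S4"}): False,        # serotype-specific PCR negative
}
print(resolve_candidates(shared_type, results))  # {'S3'}
```

If more than one candidate remains after all available PCRs, the isolate is resolved only to that group or pair, which is the situation described for sequence types such as 25F-38 and 6A-6B.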
Comprehensive MCT results. The final MCT results for 596 isolates (427 previously studied and 169 new isolates) (Kong & Gilbert, 2003) and 112 previously published cps sequences are shown in Table S2 (available as supplementary data in JMM Online). Our database now includes all 90 S. pneumoniae serotypes and 140 partial cpsA–cpsB sequence types (including two non-serotypable strains). We have at least two isolates or sequences of 89 serotypes. We did not use the cps sequence published by the Sanger Institute for serotype 25A, which was reported to be identical to that of serotype 29 and differs from our partial cpsA–cpsB sequence for the same serotype 25A strain (supplied to us and the Sanger Institute by Statens Serum Institut). Our partial sequence results for serotypes 38, 25F and 25A also show that serotype 25A cps is not identical to 29.
Most (110) sequence types correspond to a single serotype and, of these, 57 are represented by two or more isolates. For 516 of 708 (73 %) isolates and published sequences, CS and MCT are identical. MCT of another 155 (22 %) isolates identified the correct serogroup. For the remaining 37 (5 %) isolates, MCT could not distinguish between members of five pairs of CSs which shared the same sequence type (7B/40, 12A/46, 25F/38, 35F/47F, 35C/42; see Table S2, available as supplementary data in JMM Online). Two antisera would be required to identify individual members of these groups.
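The proportions quoted above follow directly from the counts in the text and can be checked with a few lines of arithmetic. The function name and category labels below are illustrative; the counts (516, 155 and 37 of 708) are those reported in this section.

```python
def percentages(counts: dict) -> dict:
    """Convert category counts to percentages of their combined total."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


# Counts as reported in the text for the 708 isolates and published sequences.
mct_outcomes = {
    "serotype predicted": 516,
    "serogroup only": 155,
    "unresolved pair": 37,
}

for label, pct in percentages(mct_outcomes).items():
    print(f"{label}: {pct:.1f} %")
```

Rounded to whole numbers, the three categories give 73 %, 22 % and 5 %, so serotype or serogroup is identified for 95 % of the 708 entries.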
Relationship between the partial cpsA–cpsB, wzx and wzy trees
In the partial cpsA–cpsB tree (see Fig. S1, available as supplementary data in JMM Online), and as suggested in Table S2 (available as supplementary data in JMM Online), some serotypes are clustered because they share the same or very similar sequences. In the wzx and wzy trees these show similar relationships (see Figs S3 and S4, available as supplementary data in JMM Online), but there are differences between the trees. For example, serotypes 17A, 34 and 10B, which are closely related in the cpsA–cpsB tree (see Fig. S1), are only distantly related in both the wzx and the wzy trees (see Figs S3 and S4), suggesting that they have different evolutionary histories. This illustrates the potential risk of using a single gene or even one gene cluster to infer phylogenetic relationships (Trzcinski et al., 2003). It also implies that, for a final accurate MCT prediction system, the combination of different cps genes may increase the predictive accuracy. Based on our study, we recommend the combined use of both cpsA–cpsB and wzy, at least.
PCR or microarray?
In future, microarray (genechip or equivalent)-based technology should be a practical solution for MCT prediction of all 90 pneumococcal serotypes (Magee et al., 2001). As a prototype, we have developed a practical multiplex PCR and reverse line blot hybridization assay (van den Brule et al., 2002; Wang et al., 2004) to identify the 23 serotypes included in the polysaccharide vaccine. This assay showed very promising results in preliminary evaluation, using the 90-serotype reference panel and a small number of clinical isolates (data not shown). The results will be reported separately after systematic evaluation of a large number of clinical isolates. We are also trying to develop a genechip microarray to identify all 90 serotypes. Meanwhile, we will use the cpsA–cpsB sequencing and selected wzy/wzx PCR strategy we previously described (Kong & Gilbert, 2003), for which we now have 26 additional primer sets, to resolve shared sequence types (Rubin & Rizvi, 2004).
The relationship between cps gene clusters and CSs
Because cpsA–cpsB, wzx and wzy PCR/sequencing cannot resolve all serotypes, we studied selected whole cps gene cluster sequences, especially for serotypes in which wzx and wzy were very similar (see Table S3, available as supplementary data in JMM Online). Nevertheless, some serotypes remain unresolved, either because the heterogeneity between their cps sequences was minor and inconsistent (e.g. serotypes 6A and 6B) (Kong & Gilbert, 2003) or because the serotype-specific gene was located outside the cps gene cluster (e.g. serotype 37) (Llull et al., 2001). We cannot rule out the possibility that some rare serotypes have arisen as a result of aberrant gene replication and expression such as serotypes 15B and 15C (van Selm et al., 2003) or as an isolated accidental event (Waite et al., 2001, 2003), without a consistent molecular basis, which could explain their rarity (Henrichsen, 1995).
Benefits from the Sanger Institute S. pneumoniae capsular loci project
In addition to several other completed and continuing S. pneumoniae genomic projects (Hoskins et al., 2001; Tettelin et al., 2001), the S. pneumoniae capsular loci project at the Sanger Institute (http://www.sanger.ac.uk/Projects/S_pneumoniae/CPS/) has been an invaluable resource in development of our MCT prediction system. Without it, our annotation of wzx and wzy and development of many of our serotype(s)/group(s)-specific PCRs would have been impossible. By making available whole cps gene cluster sequences, it also helped us to understand relationships between different serotypes that share the same sequence. Integration of the Sanger Institute S. pneumoniae cps cluster sequences into our database allowed us to determine the subtypes to which they belong and examine them within the context of a larger S. pneumoniae population. However, discrepancies between our results and those of the Sanger Institute for several cps sequences, including serotypes 3, 25A and 28F, need to be resolved by repeat sequencing.
Conclusion
In this study, we have extended our previous MCT prediction system to 90 serotypes. The combination of cps sequence data from the S. pneumoniae capsular loci project and other known mechanisms (Llull et al., 2001; Waite et al., 2001) cannot fully account for all conventional serotype differences. Therefore, it is too early to fully replace CS with MCT (Hall, 1998). However, MCT can be used as an objective alternative to identify serotypes of 73 % of isolates and serogroups of another 22 %, and limit the identification of the remainder to two serotypes. It can resolve discrepancies in CS and identify non-serotypable isolates. Moreover, it will allow development of rapid and relatively inexpensive typing systems (such as reverse line blot or, in future, genechips) for surveillance of distribution and prevalence of serotypes/groups and other important characteristics, such as antibiotic resistance and virulence markers (Magee et al., 2001). However, unresolved controversies between CS and MCT deserve further study to improve our understanding of CS and the accuracy of the MCT system.