J Med Microbiol 53 (2004), 35-45; DOI: 10.1099/jmm.0.05365-0
© 2004 Society for General Microbiology
ISSN 0022-2615

Identification and interrogation of highly informative single nucleotide polymorphism sets defined by bacterial multilocus sequence typing databases

Gail A. Robertson (1), Venugopal Thiruvenkataswamy (1), Hayden Shilling (2), Erin P. Price (1), Flavia Huygens (1), Frans A. Henskens (2) and Philip M. Giffard (1)

(1) Cooperative Research Centre for Diagnostics, Queensland University of Technology (Gardens Point Campus), GPO Box 2434 Brisbane, Queensland 4001, Australia
(2) Discipline of Computer Science and Software Engineering, University of Newcastle, Newcastle, New South Wales, Australia

Correspondence: Philip M. Giffard (p.giffard@qut.edu.au)

Received June 26, 2003
Accepted September 11, 2003

A unified, bioinformatics-driven, single nucleotide polymorphism (SNP)-based approach to microbial genotyping has been developed. Multilocus sequence typing (MLST) databases consist of known variants of standardized housekeeping genes. Normally, seven fragments are defined; a sequence type (ST) consists of the variants of these fragments that are found in a particular isolate. A computer program that can identify highly informative sets of SNPs in entire MLST databases has been constructed. The SNPs either define a particular user-specified ST or provide a high value for Simpson's index of diversity (D), and may thus be generally applicable to that species. SNP sets that are diagnostic for Neisseria meningitidis ST-11 and ST-42, and high-D SNP sets for N. meningitidis and Staphylococcus aureus, were identified and real-time PCR methods to interrogate these SNPs were demonstrated. High-D SNP sets were also identified in other MLST databases. This widely applicable approach allows rapid genetic fingerprinting of infectious agents.


Abbreviations: CT, cycles to threshold; ΔCT, difference in cycles to threshold; D, Simpson's index of diversity; MLST, multilocus sequence typing; MRSA, methicillin-resistant Staphylococcus aureus; SNP, single nucleotide polymorphism; ST, sequence type.


    INTRODUCTION
 
The last decade has seen a great increase in quantity of gene sequence data that are contained in searchable and downloadable databases. These data are now acquiring a significant second dimension, in that there has been a recent rapid accumulation of data concerning the intraspecific variation of genes. Important examples of this are the single nucleotide polymorphism (SNP) databases for humans (Sachidanandam et al., 2001) and the multilocus sequence typing (MLST) databases for infectious micro-organisms (Maiden et al., 1998). MLST databases consist of known variants of standardized housekeeping genes. Normally, seven fragments are defined; a sequence type (ST) consists of the variants of these fragments that are found in a particular isolate.

Known intraspecific variation can be interrogated to generate genetic fingerprints. For genetic fingerprinting procedures to be useful, their resolving power must be known. In the case of human SNPs, low average linkage between SNPs and complete allelic reassortment at each generation define the algorithms used to deduce the resolving power of SNP combinations (Krawczak, 1999). Bacteria differ in that there is usually a single haploid chromosome, recombination occurs sporadically and at different rates in different species, a given recombination event will probably not encompass the entire genome, and mutation and recombination may occur at comparable rates (Feil et al., 2001). As a consequence, in some bacterial species, a degree of linkage may exist over an appreciable fraction of the genome and observed de-linkage may either be due to recombination or the appearance of new alleles as a result of point mutation. This prevents estimation of the resolving power of bacterial SNPs by using methods that were developed for sexually reproducing diploid organisms.

The aim of this study was to develop a straightforward approach to SNP-based bacterial typing. We have made use of computer software that can identify highly informative sets of SNPs in databases of gene sequence variants and applied this to MLST databases for a number of bacterial pathogens. The algorithms used are free of assumptions concerning modes of generation of diversity. We have also developed kinetic PCR assays for interrogation of informative sets of SNPs that were identified in the Neisseria meningitidis and Staphylococcus aureus MLST databases. Kinetic PCR does not require any fluorescently labelled probes and is potentially very cost-effective. It is similar in principle to the allele-specific PCR/amplification refractory mutation system family of SNP interrogation methods, which depend on reduced extension from 3' mismatched PCR primers (Germer et al., 2000). In the non-time-resolved format, these methods are difficult to optimize, as any extension from a mismatched primer provides a template for unimpeded amplification (Giffard et al., 2001). However, in the real-time format, optimization is easier as SNP calling can be done on the basis of a difference in the number of cycles to threshold (ΔCT) between allele-specific reactions, rather than a difference between the amounts of amplified material at the reaction end points.


    METHODS
 
Bacterial strains.

Seven N. meningitidis isolates were used (Table 1); these were obtained from Queensland Health Scientific Services. The STs of these isolates were determined as part of this study. The 11 S. aureus isolates that were used were all methicillin-resistant S. aureus (MRSA) from Australian healthcare facilities (Table 1). Isolates 66460/98 (ST-30), F829549 (ST-88), IP01M2046 (ST-88) and MC8011535 (ST-88) were from the Queensland Health Pathology Service collection and their STs were determined as part of this study. The remaining isolates (two with ST-30, three with ST-239 and two with ST-1) were from the Australian Group on Antimicrobial Resistance collection. The STs of these were determined as part of a study that is as yet unpublished.


Table 1. Bacterial isolates used in this study
 

Downloading of MLST databases.

All MLST data were downloaded from http://www.mlst.net in the form of alignments in FASTA format. The first 2510 STs were used for all analyses of N. meningitidis STs unless otherwise indicated, whereas for S. aureus, the following STs were available for download and were used for all analyses unless otherwise indicated: 1–103, 109, 120, 121, 123, 134, 145, 169, 178, 182, 188–190, 192–205, 207–215, 217, 220–222, 225, 228, 231, 235, 238–241, 243, 246–247 and 250.
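As an illustration of working with alignments in FASTA format, the following minimal sketch parses FASTA text and lists the polymorphic columns, which are the candidate SNPs. It is not the code of the program described below; the function names, the allele names and the toy sequences are all assumptions made for this example.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}; a sequence may span several lines."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def polymorphic_columns(alignment: Dict[str, str]) -> List[int]:
    """Return the 1-based positions at which the aligned sequences are not all identical."""
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences in an alignment must have equal length")
    return [i + 1 for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


# Toy alignment with made-up allele names and sequences (not real MLST data).
example = """>allele-1
ACGTAC
>allele-2
ACGTTC
>allele-3
ATGTAC
"""
alleles = parse_fasta(example)
snp_positions = polymorphic_columns(alleles)  # positions 2 and 5 vary in this toy data
```

Positions are reported 1-based so that they can be combined with a locus name in the style used later in this article (for example, a locus name followed by a position).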

Identification of informative SNP sets.

SNP sets were identified by using a computer program called ‘Minimum SNPs’. It is designed to provide a suite of methods for extraction of informative SNP sets from comparative sequence data. The functions of Minimum SNPs can be summarized as follows:

(1) Minimum SNPs can either carry out SNP searches on alignments derived from individual MLST loci or convert an MLST database into a single concatenated alignment that can be searched in a single operation.

(2) Highly informative sets of SNPs are identified by an approximation that we have termed the ‘anchored method’. This entails initial identification of the most informative SNP, which is termed SNP1, then identification of the next SNP (SNP2) that, in combination with SNP1, forms the most informative pair. This process is iterated, creating progressively larger combinations until a preset target, based on either informative power or number of SNPs, is reached. If two or more sets of SNPs exhibit equal informative power, then the program will provide all these sets as output. Given that this may occur at more than one point in the analysis, large numbers of different sets of SNPs may be produced. Cumulative informative power for each SNP identified is included in the output.

(3) The minimum number of loci required to define any given ST can be calculated.

(4) The informative power of sets of SNPs can be calculated by two different methods, depending on the user's requirements. The ‘specified variant’ algorithm identifies SNPs that are diagnostic for a user-specified variant, i.e. one sequence in the alignment. Such SNPs are most useful for efficient determination of whether or not an unknown isolate is a variant of particular interest. If this option is selected, the program assesses informative power by calculating the proportion of non-selected variants that have the same polymorphs as the selected variant at the SNPs being assessed (the term ‘polymorph’ is used to denote a particular state of an SNP; this is to avoid confusion with the term ‘allele’, which can be used to denote variants of MLST loci). The ‘generalized’ algorithm identifies sets of SNPs that are suitable for the efficient determination of whether two isolates are likely to be the same or different. If this option is selected, the informative power of sets of SNPs is assessed by calculating Simpson's index of diversity (D) (Simpson, 1949; Hunter & Gaston, 1988). In this context, this is the probability that two sequences in the alignment, selected at random without replacement, will type differently if interrogated at the SNPs in question. This is calculated as follows:


$$D = 1 - \frac{1}{n(n-1)} \sum_{j=1}^{s} n_j (n_j - 1)$$

where n is the number of sequences in the alignment, s is the number of classes defined by the SNPs under test and n_j is the number of sequences in the jth class.

(5) The program allows the performance of sets of SNPs to be explored fully by incorporating a function that returns STs that conform to any combination of SNPs and polymorphs defined by the user.
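The anchored method with the generalized (D) criterion can be sketched as a greedy search. The code below is only an illustration under stated assumptions, not the Minimum SNPs implementation (which was written in Java): the function names are hypothetical, D is computed in the Hunter & Gaston form 1 - [sum of n_j(n_j - 1)] / [n(n - 1)], ties are broken by taking the lowest column index rather than reporting every tied set, and the four toy sequences are made up.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def simpsons_d(sequences: Sequence[str], columns: Sequence[int]) -> float:
    """Simpson's index of diversity for the classes defined by the given 0-based columns."""
    n = len(sequences)
    if n < 2:
        return 0.0
    classes = Counter(tuple(seq[c] for c in columns) for seq in sequences)
    return 1.0 - sum(nj * (nj - 1) for nj in classes.values()) / (n * (n - 1))


def anchored_search(sequences: Sequence[str], max_snps: int,
                    target_d: float = 1.0) -> List[Tuple[int, float]]:
    """Greedy search: keep the SNPs already chosen and add the best next one.

    Returns a list of (0-based column, cumulative D) in order of selection.
    """
    chosen: List[int] = []
    history: List[Tuple[int, float]] = []
    length = len(sequences[0])
    while len(chosen) < max_snps:
        candidates = [c for c in range(length) if c not in chosen]
        if not candidates:
            break
        # Assumption: ties go to the lowest column index. The published
        # program instead reports all sets of equal informative power.
        best = max(candidates, key=lambda c: (simpsons_d(sequences, chosen + [c]), -c))
        best_d = simpsons_d(sequences, chosen + [best])
        if history and best_d <= history[-1][1]:
            break  # no remaining column adds resolving power
        chosen.append(best)
        history.append((best, best_d))
        if best_d >= target_d:
            break
    return history


# Made-up toy alignment: columns 0 and 1 together separate all four sequences.
toy = ["AAC", "AGC", "TAC", "TGG"]
history = anchored_search(toy, max_snps=3)
```

In this toy case the first column alone gives D = 2/3, and adding the second column raises the cumulative D to 1. Because each step is conditioned on the SNPs already chosen, the result is an approximation and is not guaranteed to be the globally best set of a given size.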

The program is menu-driven, with a comprehensive graphical user interface. It was written by using the Java programming language (http://java.sun.com). Its engineering is in accordance with the object-oriented methodology (Booch, 1993). This structure facilitates future adaptation of the software to implement, for example, a web-based interface that would allow clients to download a relatively small applet that communicates with a remote-server-based compute and storage engine.

The decision to use Java rather than, for instance, C++ as the implementation language was predicated on a desire for maximum portability, as the program files generated by the Java compiler on one platform do not require recompilation to run on any alternative platform for which there is an available Java Virtual Machine. Thus, while the software is currently in use on Microsoft Windows-based PCs, it will also run on systems such as Solaris, Linux or Macintosh OS-X. It is recognized that programs written in Java execute more slowly than those written in languages that compile to native code for a given architecture (e.g. C++), but the benefits of portability were felt to far outweigh the slight performance penalty that is imposed by execution in a virtual environment. Moreover, the data structures used to store SNP data and the algorithms used to manipulate those data were chosen carefully, with maximum efficiency as a primary goal.

Interrogation of SNPs by kinetic PCR.

All reactions were carried out in an Applied Biosystems ABI 7000 Sequence Detection system by using Applied Biosystems SYBR Green PCR MasterMix. Primer design was carried out with the assistance of Primer Express 2.0 (Applied Biosystems) on sequences that were aligned by using Clustal X 1.8 (Jeanmougin et al., 1998) [accessed through the Australian National Genome Information Service (ANGIS), Sydney, Australia]. Primer sequences are listed in Table 2. Primers were obtained from Proligo and were unlabelled and purified by desalting only.


Table 2. Kinetic PCR primers. Bases that provide allele specificity are underlined.
 

For N. meningitidis, cells were grown on chocolate agar that was made as follows: 36 g GC agar base (Oxoid), 10 g haemoglobin powder (Oxoid), 0.4 of one vial of Vitox (Oxoid) and water to 1000 ml. To prepare genomic DNA, individual colonies were suspended in 400 µl Tris/EDTA buffer (10 mM Tris, 1 mM EDTA, pH 8.0) and boiled for 6 min, then centrifuged at 13 400 g for 5 min to pellet debris. Aliquots of the supernatant were transferred to fresh microfuge tubes and stored at -82 °C. Reactions contained 1 µl extract, 2.5–5 pmol each primer and 1x SYBR Green PCR MasterMix in a final volume of 20 µl. A two-step temperature cycling protocol was used: 50 °C for 2 min and 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s and 59 °C for 30 s.

For S. aureus, genomic DNA was prepared from 1 ml aerated overnight culture in brain heart infusion broth (Oxoid) by using a Qiagen DNA Extraction kit according to the manufacturer's instructions, with the exception that to lyse the cells, cell pellets were incubated for 30 min at 37 °C in 180 µl lysostaphin (200 µg ml-1). DNA was stored at -20 °C. Reactions contained 2 µl template DNA, a final concentration of 0.5x SYBR Green PCR MasterMix and 6.25 pmol each primer and were made up to final volume of 25 µl. A three-step temperature cycling procedure was used: 50 °C for 2 min and 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s, 56 °C for 10 s and 72 °C for 33 s.

Integrity of kinetic PCRs was checked routinely by determining melt curves of the products, to ensure that melting-points were consistent with the size of the expected product and not with primer-dimers, and also to ensure that a single peak was produced.

ΔCT values were calculated by subtracting the CT of the matched primer/template pair from the CT of the mismatched primer/template pair. CT values were obtained by using the default threshold setting.
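The ΔCT calculation can be turned into a simple polymorph call. The sketch below is an assumption about how such a call might be automated, not a description of the authors' procedure: it takes the allele-specific reaction with the lowest CT as the matched one and reports ΔCT relative to the next-lowest CT. The function name and the CT values are hypothetical.

```python
from typing import Dict, Tuple


def call_polymorph(ct_by_primer: Dict[str, float]) -> Tuple[str, float]:
    """Call the polymorph whose allele-specific primer gave the lowest CT.

    Returns (called polymorph, delta CT), where delta CT is the CT of the
    closest mismatched reaction minus the CT of the presumed matched reaction.
    """
    if len(ct_by_primer) < 2:
        raise ValueError("need at least two allele-specific reactions")
    ranked = sorted(ct_by_primer.items(), key=lambda item: item[1])
    (matched, matched_ct), (_, mismatched_ct) = ranked[0], ranked[1]
    return matched, mismatched_ct - matched_ct


# Hypothetical CT values for two allele-specific reactions on one template.
polymorph, delta_ct = call_polymorph({"A": 18.0, "G": 29.5})
```

With these made-up values the call is A with a ΔCT of 11.5 cycles. For a primer triplet, using the second-lowest CT gives the most conservative ΔCT.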

MLST determination.

Genomic DNA was prepared as for kinetic PCR reactions. Amplification and sequencing primers and amplification conditions were as recommended at http://www.mlst.net/, with the exception that, for the N. meningitidis fumC and pgm loci, the recommended sequencing primers were used for amplification as, in our hands, the recommended amplification primers were unreliable in yielding product. Sequencing reactions were carried out by using ABI PRISM BigDye Terminator mix (version 3.1) according to the manufacturer's instructions. Sequencing products were electrophoresed at the Australian Genome Research Facility (University of Queensland, Brisbane, Australia).


    RESULTS AND DISCUSSION
 
Experimental strategy

The strategy pursued involved initial studies that used non-computer-aided sequence analysis, which confirmed that informative SNP sets, composed of small numbers of SNPs, are indeed defined by MLST databases (data not shown). This was followed by software engineering, with incremental improvements to the program over a period of time. The software was used to test different data-mining strategies and, finally, in order to demonstrate the potential of this genotyping approach, we reduced to practice procedures for interrogation of potentially useful examples of SNP sets. Most of our focus was on N. meningitidis and S. aureus, as both of these species are epidemiologically interesting, but they differ greatly with respect to diversity in MLST loci.

Identification of informative SNPs in the N. meningitidis MLST database

SNPs that were diagnostic for N. meningitidis ST-11 and ST-42 were identified. ST-11 was chosen as this is the ST of the electrophoretic type-15 variant that has been responsible for outbreaks in the Czech Republic, Greece and Canada and is also isolated frequently in Australia (Tribe et al., 2002). ST-42 is a member of lineage 3, which is a genetically related cluster of disease-causing strains (Caugant et al., 1990). Two methods were used to identify SNP sets. Firstly, a semi-empirical, two-step strategy was adopted, which used an earlier version of Minimum SNPs that was unable to accommodate concatenated databases. This entailed identification of subsets of loci that defined these STs with a high degree of reliability, identification of SNPs that are diagnostic for alleles at these loci and use of these SNPs as a pool for empirical searches for highly discriminatory subsets. Discriminatory powers of these subsets were assessed by using the Minimum SNPs facility that returns STs that conform to any user-defined combination of SNPs and polymorphs. Secondly, SNPs that were diagnostic for these STs were identified in a single step by using the concatenated database.
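The first step of the semi-empirical strategy, finding a small subset of loci that defines an ST, can be illustrated with an exhaustive search over locus combinations, which is cheap for seven loci. This is a sketch under assumptions: the function name, the ST labels and the allele numbers are invented for the example, and the search demands that the subset be unique to the target ST, whereas the study accepted subsets that defined an ST with a high degree of reliability.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

Profile = Dict[str, int]  # locus name -> allele number


def minimum_defining_loci(profiles: Dict[str, Profile],
                          target: str) -> Optional[Tuple[str, ...]]:
    """Smallest subset of loci whose allele numbers are shared by no ST other than `target`."""
    wanted = profiles[target]
    loci = sorted(wanted)
    others = [p for st, p in profiles.items() if st != target]
    for size in range(1, len(loci) + 1):
        for subset in combinations(loci, size):
            shared = any(all(p[locus] == wanted[locus] for locus in subset) for p in others)
            if not shared:
                return subset
    return None  # another ST has an identical profile


# Made-up allelic profiles over three loci (real schemes normally use seven).
profiles = {
    "ST-A": {"abcZ": 1, "adk": 1, "aroE": 2},
    "ST-B": {"abcZ": 1, "adk": 2, "aroE": 2},
    "ST-C": {"abcZ": 3, "adk": 1, "aroE": 2},
}
loci_for_a = minimum_defining_loci(profiles, "ST-A")
loci_for_b = minimum_defining_loci(profiles, "ST-B")
```

In the toy data, ST-B is defined by a single locus, whereas ST-A needs two because each of its alleles is shared with at least one other ST.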

The results of the semi-empirical method were as follows: for ST-11, T at fumC435 and C at pdhC12 excluded 98.1 % of known STs, whereas for ST-42, T at abcZ411, A at aroE455, A at fumC201 and T at pdhC274 excluded 97.7 % of known STs. It was concluded that significant resolution may be obtained from a small number of SNPs.
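Exclusion percentages of this kind can be obtained from a query that returns the STs conforming to a chosen combination of SNPs and polymorphs. The following is a minimal sketch with hypothetical function names and a made-up alignment; it assumes the percentage is taken over STs other than the target, which the text does not state explicitly for the semi-empirical results.

```python
from typing import Dict, List


def matching_sts(alignment: Dict[str, str], profile: Dict[int, str]) -> List[str]:
    """Return the STs that carry the given polymorph at every 1-based position in `profile`."""
    return [st for st, seq in alignment.items()
            if all(seq[pos - 1] == base for pos, base in profile.items())]


def percent_excluded(alignment: Dict[str, str], profile: Dict[int, str],
                     target: str) -> float:
    """Percentage of STs other than `target` that the SNP profile rules out."""
    others = [st for st in alignment if st != target]
    hits = set(matching_sts(alignment, profile)) - {target}
    return 100.0 * (len(others) - len(hits)) / len(others)


# Made-up concatenated sequences keyed by invented ST labels.
toy = {"ST-1": "ACGT", "ST-2": "ACGA", "ST-3": "TCGT", "ST-4": "TCGA", "ST-5": "ACCT"}
profile = {1: "A", 4: "T"}          # polymorphs of ST-1 at positions 1 and 4
same_profile = matching_sts(toy, profile)
excluded = percent_excluded(toy, profile, "ST-1")
```

Here two SNPs leave ST-5 indistinguishable from ST-1, so three of the four other STs (75 %) are excluded.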

The results of the direct approach that used the concatenated database were broadly similar. With this method, it was possible to identify sets of SNPs that would, within the context of known STs, identify these STs unambiguously. For ST-11, 26 SNPs were required. These are listed, with the polymorph present in ST-11 and the cumulative percentage of STs discriminated from ST-11, as follows: pgm124, A (95.2 %); pdhC12, C (98.0 %); fumC435, T (98.4 %); gdh132, T (98.7 %); abcZ27, T (98.8 %); adk108, C (99.0 %); aroE352, A (99.2 %); gdh60, G (99.2 %); abcZ366, C (99.3 %); abcZ375, G (99.3 %); adk29, G (99.4 %); adk135, A (99.4 %); adk189, C (99.4 %); adk371, A (99.5 %); aroE43, C (99.5 %); aroE126, C (99.6 %); aroE169, A (99.6 %); aroE207, C (99.6 %); gdh290, G (99.7 %); gdh339, T (99.7 %); pdhC201, C (99.8 %); pgm106, A (99.8 %); pgm276, C (99.8 %); pgm373, G (99.9 %); pgm430, G (99.9 %); pgm433, G (100.0 %). For ST-42, only eight SNPs were required for unambiguous identification: abcZ411, T (88.4 %); gdh129, T (95.6 %); abcZ423, C (98.9 %); aroE82, T (99.5 %); fumC9, G (99.8 %); pdhC129, A (99.9 %); adk21, T (99.9 %); gdh492, C (100.0 %). This demonstrates that it is technically feasible to identify STs unambiguously by using SNPs, assuming that the unknown sample has a known ST. However, it is striking that for ST-11, 20 SNPs are required to move from 99 to 100 % discrimination, and that 95.2 % discrimination is obtained with just one SNP. For many applications, interrogation of a very small number of SNPs may provide useful information.
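Cumulative percentages like those listed above follow naturally from a greedy specified-variant search. The sketch below is one plausible reading of that procedure, not the Minimum SNPs source: at each step it picks the position at which the fewest still-undiscriminated STs share the target's polymorph. The function name, the tie-breaking rule (lowest position) and the toy alignment are assumptions.

```python
from typing import Dict, List, Tuple


def specified_variant_search(alignment: Dict[str, str],
                             target: str) -> List[Tuple[int, str, float]]:
    """Greedy search for SNPs that progressively exclude every ST except `target`.

    Returns (1-based position, polymorph in target, cumulative % of other STs excluded).
    """
    target_seq = alignment[target]
    others = {st: seq for st, seq in alignment.items() if st != target}
    total = len(others)
    remaining = set(others)  # STs not yet discriminated from the target
    steps: List[Tuple[int, str, float]] = []
    while remaining:
        def survivors(col: int) -> int:
            # Number of remaining STs that share the target's base at this column.
            return sum(1 for st in remaining if others[st][col] == target_seq[col])

        best = min(range(len(target_seq)), key=lambda c: (survivors(c), c))
        if survivors(best) == len(remaining):
            break  # the remaining STs match the target at every column
        remaining = {st for st in remaining if others[st][best] == target_seq[best]}
        steps.append((best + 1, target_seq[best], 100.0 * (total - len(remaining)) / total))
    return steps


# Made-up concatenated sequences keyed by invented ST labels.
toy = {"ST-1": "ACGT", "ST-2": "ACGA", "ST-3": "TCGT", "ST-4": "TCGA", "ST-5": "ACCT"}
steps = specified_variant_search(toy, "ST-1")
```

For the toy data the search needs three SNPs, excluding 50 %, 75 % and finally 100 % of the other STs. The long tail seen for ST-11 arises in the same way: late SNPs each remove only one or a few remaining STs.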

SNPs that are discriminatory for specified STs are potentially very useful, but only for specialized applications. In order to define a set of SNPs that could be used to derive a discriminatory fingerprint from any N. meningitidis isolate, the Minimum SNPs program was used with the D option selected as the criterion for assessment of the discriminatory power of SNP sets. A set of seven SNPs was identified and their cumulative D-values were: pgm93, 0.653; aroE283, 0.871; pdhC456, 0.933; fumC114, 0.965; abcZ54, 0.980; abcZ183, 0.988; fumC330, 0.992. This demonstrates that a small number of SNPs can provide potentially useful resolution.

Identification of informative SNPs in the S. aureus MLST database

The S. aureus MLST database is much smaller than the N. meningitidis database, and the level of sequence diversity is much less. Therefore, it was of interest to explore the relationship between the number of SNPs identified and resolving power for this species. Firstly, SNPs that were diagnostic for a subset of the clonal complex progenitors and one single-locus variant, as defined by Enright et al. (2002), were identified. Clonal complex progenitors were tested because we conjectured that these may be difficult to define unambiguously, as they would need to be discriminated individually from each descendant. The STs chosen were the clonal complex progenitors ST-30, ST-22 and ST-45, and the ST-30 single-locus variant ST-36. SNPs that identify these STs unambiguously are as follows.

For ST-30, 14 SNPs were required: glpF66, T (83.7 %); tpi241, G (88.3 %); arcC78, A (90.9 %); pta387, G (92.8 %); pta181, G (94.1 %); arcC165, A (94.8 %); aroE121, C (95.4 %); aroE310, G (96.1 %); aroE362, C (96.7 %); glpF124, G (97.4 %); gmk331, C (98.0 %); gmk390, A (98.7 %); pta115, A (99.3 %); tpi158, C (100.0 %).

For ST-22, seven SNPs were required: tpi13, A (96.1 %); aroE184, G (96.7 %); glpF128, G (97.4 %); glpF213, C (98.0 %); gmk303, A (98.7 %); gmk340, G (99.3 %); gmk373, A (100 %).

For ST-45, six SNPs were required: glpF57, A (95.4 %); aroE178, G (96.7 %); tpi60, T (98.0 %); aroE270, T (98.7 %); gmk6, A (99.3 %); pta357, T (100 %).

For ST-36, two SNPs were required: pta181, A (99.3 %); glpF178, G (100 %).

As expected, the single-locus variant required markedly fewer SNPs for unambiguous identification than the clonal complex progenitors. In addition, during these searches, the program settings were adjusted so that up to 10 SNP sets would be selected for each ST if they were comparable in performance. For the clonal complex progenitor STs, 10 essentially equivalent pathways to 100 % were identified. However, in the case of the single-locus variant ST-36, the set of two SNPs was the only set identified. These results are consistent with the notion that in the case of clonal complex progenitors, each descendant must be discriminated separately, whereas single-locus variants need only be discriminated from the progenitor.

In order to obtain a set of SNPs with generalized ability to discriminate S. aureus STs, a set of SNPs that provides a high D-value was identified. These seven SNPs and their cumulative D-values are: arcC210, 0.51; tpi243, 0.75; arcC162, 0.84; gmk129, 0.90; aroE132, 0.92; yqiL333, 0.94; tpi241, 0.95. Several alternatives to this set of SNPs with essentially equivalent performance were identified. It is clear that, although a potentially useful level of discrimination was achieved, the D-values did not rise as swiftly as with the N. meningitidis database.

Kinetic PCR interrogation of N. meningitidis SNPs

In order to test the practicality and robustness of kinetic PCR interrogation of SNPs in N. meningitidis DNA, kinetic PCR assays were developed for: (a) SNPs diagnostic for ST-11 and identified by the semi-empirical two-step method (fumC435 and pdhC12); (b) SNPs diagnostic for ST-42 and identified by the semi-empirical two-step method (abcZ411, aroE455, fumC201 and pdhC274); (c) the high-D SNP set (pgm93, 0.653; aroE283, 0.871; fumC114, 0.933; abcZ183, 0.963; abcZ54, 0.979; gdh60, 0.987; pdhC103, 0.992). This is slightly different from the N. meningitidis high-D SNP set identified above, as we were unable to reduce to practice a reliable kinetic PCR assay for pdhC456. Therefore, this position was deleted from the alignment and the Minimum SNPs analysis was repeated, yielding the SNP set above.

After optimization, all assays were carried out at least twice on each isolate. To assess the robustness of the assays, results from each polymorph of each SNP were pooled and the mean and SD of ΔCT values were calculated. The results are shown in Tables 3–5. On all occasions, the assays clearly indicated the correct polymorph. In the case of isolates 02M5007 (ST-11) and 02M5044 (ST-23), MLST determination was not carried out until the kinetic PCR assays had been completed, so the SNP interrogation was carried out blind. The kinetic PCR and MLST results were entirely consistent. As expected, the high-D SNP set provided unique polymorph profiles for each ST included in the study, despite there being no reference to these STs in the selection process for these SNPs. It was concluded that this method has potential as a rapid means for obtaining an epidemiological type for N. meningitidis. An alternative strategy for reducing the time and cost of MLST-based genotyping has been reported by Shlush et al. (2002); this involved detection of mismatches between PCR products from a standard isolate and an unknown isolate by using denaturing HPLC. This approach is very effective for determining whether or not an isolate has an ST of interest. However, it requires post-PCR processing and, unlike our high-D SNP sets, cannot yield a generally applicable genetic fingerprint.
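The pooling step can be sketched as a grouping of ΔCT values by SNP and polymorph, followed by a mean and SD per group. This is an illustrative assumption about the bookkeeping only: the function name is hypothetical, the ΔCT values are invented, and the sample SD (n - 1 denominator) is used because the text does not say which form of SD was reported.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (SNP name, polymorph)


def pool_delta_ct(results: Iterable[Tuple[str, str, float]]) -> Dict[Key, Tuple[float, float]]:
    """Group delta-CT values by (SNP, polymorph) and return (mean, sample SD) per group."""
    pooled: Dict[Key, List[float]] = defaultdict(list)
    for snp, polymorph, delta_ct in results:
        pooled[(snp, polymorph)].append(delta_ct)
    # A single replicate has no defined sample SD; report 0.0 in that case.
    return {key: (mean(values), stdev(values) if len(values) > 1 else 0.0)
            for key, values in pooled.items()}


# Invented delta-CT values; only the SNP name is taken from the text.
summary = pool_delta_ct([
    ("fumC435", "T", 10.0),
    ("fumC435", "T", 12.0),
    ("fumC435", "C", 9.0),
])
```

A small SD relative to the mean ΔCT within each group is what makes a fixed calling rule workable across isolates.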


Table 3. Kinetic PCR results on N. meningitidis DNA for ST-specific SNPs. All ΔCT values obtained were in the expected orientation. NA, Not applicable.
 

Table 4. Kinetic PCR results on N. meningitidis DNA for high-D SNPs. All ΔCT values obtained were in the expected orientation. NA, Not applicable; NT, not tested.
 

Table 5. ΔCT values from interrogation of high-D SNPs in ST-11 DNA
 

During the process of primer design and optimization, a potentially useful observation was made. Scrutiny of the results of kinetic PCR interrogation of fumC435 revealed that in some fumC alleles, there was a subterminal mismatch 7 bp from the 3' end of the primer. This mismatch is a consequence of diversity in the primer-binding site and is present in ST-8 and ST-42, but not in ST-11 or ST-32. This mismatch was shown to improve allele specificity by increasing CT for reactions with mismatched primers, whilst having little or no effect on CT values for matched primers (Fig. 1). The net effect of this is to increase the ΔCT. The likely reason for this effect is that the mismatch lowers the melting-point of the target–primer duplex, thus reducing the probability that the primer site will be occupied at any given time-point during the annealing step.



Fig. 1. Effect on allele discrimination for the N. meningitidis fumC435 SNP of a subterminal mismatch 7 bp from the 3' end of the allele-specific primers.

 

Kinetic PCR interrogation of S. aureus SNPs

Kinetic PCR assays were developed for the S. aureus high-D SNP set: arcC210, tpi243, arcC162, tpi241, yqiL333, aroE132 and gmk129. The results are shown in Tables 6 and 7. As with the N. meningitidis SNPs, the assay was very reliable. After optimization, ΔCT values obtained were always in the expected orientation. This method has potential as a means for rapid fingerprinting of S. aureus.


Table 6. Kinetic PCR on S. aureus genomic DNA for high-D SNPs. All ΔCT values obtained were in the expected orientation. NA, Not applicable; NT, not tested.
 

Table 7. ΔCT values from interrogation of high-D SNPs in ST-30 DNA
 

Identification of high-D SNP sets in other organisms and comparison of the relationship between the number of SNPs and D-values obtained

Our approach is applicable to all species for which there are MLST databases. Therefore, MLST databases for a number of different species were searched for high-D SNP sets. Species subject to this analysis were Burkholderia pseudomallei/Burkholderia mallei, Helicobacter pylori, Campylobacter jejuni, Streptococcus pneumoniae, Streptococcus pyogenes, Haemophilus influenzae and Enterococcus faecium (Feil et al., 2000; Enright et al., 2001; Sa-Leao et al., 2001; Dingle et al., 2002; Homan et al., 2002; Godoy et al., 2003; Meats et al., 2003). Ten different sets of SNPs were identified for each species. For each species, the relationship between D and the number of SNPs was essentially identical for each SNP set, but there were large differences between species (Fig. 2). Despite the differences, it was possible to identify high-D SNP sets in all cases. It was concluded that SNP-based typing could potentially be applied to any bacterial species for which there is sufficient comparative sequence data.



Fig. 2. Relationships between number of SNPs and D-values obtained for MLST databases for a number of different bacterial species. STs analysed were: Helicobacter pylori, 1–370; Neisseria meningitidis, 1–2859; Streptococcus pneumoniae, 1–900; Campylobacter jejuni, 1–812; Streptococcus pyogenes, 1–108; Haemophilus influenzae, 1–95; Burkholderia pseudomallei/Burkholderia mallei, 1–101; Enterococcus faecium, 1–63; Staphylococcus aureus, the 202 STs available for download in September 2003. In the case of H. pylori, YphC alleles 286, 288 and 310–315 contained 6 bp of missing sequence; these were filled in manually by using the consensus sequence for this region.

 

The basis for differences in the relationship between the number of SNPs and D-values obtained is an intriguing puzzle. It is tempting to look for a relationship between clonality and D-values: a highly clonal organism would have linkage disequilibrium that extended over the entire genome, so different SNPs would be polymorphic only within certain lineages, resulting in lower combinatorial power between SNPs. The recent finding that S. aureus has a low recombination rate (Feil et al., 2003) may explain the low D-values that were obtained with this species. However, there are other factors. The extent of genetic diversity would be expected to have an effect, simply because more genetic diversity means a larger pool of SNPs and, all things being equal, a consequently higher probability of finding highly informative sets of SNPs. Also, the presence of trimorphic or tetramorphic SNPs potentially speeds up the accumulation of resolving power as each SNP is added to the set; such SNPs are much more common in more diverged species. Our current view is that both the frequency of recombination and the number and nature of SNPs available contribute to D-values that can be obtained for a given number of SNPs.
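The effect of trimorphic or tetramorphic SNPs can be made concrete with a short worked bound. This is a sketch that assumes D is computed as 1 - [sum of n_j(n_j - 1)] / [n(n - 1)], with n sequences divided into s classes of sizes n_j; the symbols k (number of SNPs) and m (maximum number of polymorphs per SNP) are introduced here for illustration only. For a fixed s, D is largest when all classes are the same size, n/s:

```latex
% Upper bound on D for s classes among n sequences (all classes of size n/s)
D_{\max}(s) = 1 - \frac{s \cdot \frac{n}{s}\left(\frac{n}{s} - 1\right)}{n(n-1)}
            = \frac{n(s-1)}{s(n-1)} \approx 1 - \frac{1}{s} \quad (n \gg s)

% k SNPs with at most m polymorphs each define at most s = m^k classes
k = 1,\ m = 2:\ D_{\max} \approx 0.50 \qquad
k = 1,\ m = 3:\ D_{\max} \approx 0.67 \qquad
k = 7,\ m = 2:\ s \le 128,\ D_{\max} \approx 0.992
```

Under this assumption, a single dimorphic SNP cannot give a D much above 0.5. The value of 0.653 reported for the first N. meningitidis SNP (pgm93) would therefore imply that this position has more than two polymorphs, which is in line with the point that multi-state SNPs speed up the accumulation of resolving power.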

Possible applications

The purpose of this study was to develop a systematic and practical approach to making use of known genetic diversity in micro-organisms, in order to assemble rapid and cost-effective genetic fingerprinting methodologies. The algorithms used are entirely empirical and there are no assumptions made concerning population structure or diversity. Therefore, this approach is suitable for bacterial pathogens, as bacteria exhibit significant interspecies variation in their diversity and clonality and, hence, also vary in the extent of linkage disequilibrium between markers that are separated by any given distance. The results presented here demonstrate that SNP combinations with quantitatively defined and useful resolving powers can indeed be identified and can also be interrogated by using efficient and convenient assays.

Possible criticisms of our work are that SNP combinations will almost never have the same resolution as the full ST, that SNP combinations are ineffective at detecting new STs and that sequencing technology is itself rapid and cost-effective. The major counter to this is that MLST is itself somewhat arbitrary in nature. There is no certainty that two isolates with the same ST do not differ in regions that are subject to high-frequency transposition or site-specific recombination events, or even in housekeeping genes that are outside the MLST-defined fragments and have undergone recombination or mutation. MLST appears to be an excellent method for identifying clonal background and carrying out long-range studies of population structure and evolution. Higher typing resolution, such as that required for reliable outbreak detection, may require methods such as PFGE. This raises the question as to the practical value of the increase in resolution provided by MLST, as compared to our SNP-based approach. It is likely that highly effective typing methodologies of the future will encompass interrogation of housekeeping genes to determine clonal background, and interrogation of hypervariable regions to increase resolution. A SNP-based approach in combination with interrogation of hypervariable regions would be expected to have a very similar performance to full ST determination in combination with interrogation of hypervariable regions.

Although the ability of SNP-based typing to detect new STs is limited, in the case of high-D SNP sets, it is not zero, as novel SNP profiles can be detected. In the case of specified variant SNPs, the concern is not relevant, as these SNPs are designed only to allow efficient determination of whether or not an isolate has an ST of interest. Also, SNP-based typing does not rule out the use of MLST or other higher-resolution typing methods. Specified variant SNPs for ST-x will never provide a false-negative answer to the question ‘Does this isolate have ST-x?’, whereas high-D SNPs will never provide a false-negative answer to the question ‘Are these isolates identical?’. When these questions are answered in the affirmative on the basis of SNP profiles, it may be appropriate in some instances to carry out full ST determination or other higher-resolution typing procedures. We therefore consider SNP-based typing to be most suitable as a routinely applied, high-throughput screening tool that can flag potential outbreaks or the appearance of variants of concern. Possible applications include infection control in healthcare facilities, monitoring of food-borne disease and biodefence.
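
The false-negative argument follows from the fact that a SNP profile is determined entirely by the sequence: two isolates with the same ST necessarily share a SNP profile, so a mismatch excludes the ST, whereas a match is only consistent with it. A short Python sketch with made-up sequences and positions (all names and data here are hypothetical and error-free interrogation is assumed):

```python
def snp_profile(sequence, positions):
    """Return the bases of `sequence` at the chosen SNP positions."""
    return tuple(sequence[p] for p in positions)

def consistent_with(isolate, reference, positions):
    """True if the isolate's SNP profile matches the reference ST's profile.

    False rules the reference ST out; True only means it cannot be excluded.
    """
    return snp_profile(isolate, positions) == snp_profile(reference, positions)

# Hypothetical data: 'st_x' is the ST of interest; 'other_st' differs from it
# only at position 2, which is not among the interrogated positions.
st_x = "ACGTAC"
other_st = "ACCTAC"
unrelated = "TCGTAA"
positions = [0, 5]

same_st_match = consistent_with(st_x, st_x, positions)         # same ST always matches
lookalike_match = consistent_with(other_st, st_x, positions)   # possible false positive
unrelated_match = consistent_with(unrelated, st_x, positions)  # excluded
```

The second comparison matches even though the sequences differ, which is the kind of affirmative result that may warrant full ST determination.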

It was in recognition of this that we tested the practicality of kinetic PCR as an SNP-interrogation method. This approach is single-step: no prior amplification is required. It uses unlabelled primers and no probes. There is a trend toward lower reaction volumes and reduced reagent costs in real-time PCR applications, and kinetic PCR has the potential to be extremely cost-effective. We envisage this method being carried out on colonies on primary isolation plates or on blood or enrichment cultures from normally sterile sites. It could be combined easily with other gene-detection assays, for example for virulence factor-encoding genes or antibiotic-resistance determinants, to gain a complete picture of a micro-organism in a very short time.

In our hands, kinetic PCR is a tractable methodology. It is well-known that different 3' mismatches are subject to different mis-priming frequencies (Huang et al., 1992; Ayyadevara et al., 2000), so the different mean ΔCT values obtained at different SNPs were not unexpected. Remarkably, kinetic PCR reactions yielded approximately equal ΔCT values (although always in the expected orientation) with different polymorphs at any given SNP, indicating that the allele-specific primers in any given primer pair (or triplet) mis-primed at approximately equal frequencies. This was not expected and was quite fortuitous. Primer design for N. meningitidis reactions was more challenging than for S. aureus reactions, as greater diversity in N. meningitidis means that diversity within primer sites is a factor that must be accommodated. However, we were unsuccessful in assembling a reliable kinetic PCR assay for only one SNP (pdhC456), and degenerate primers were necessary for only one other SNP (aroE283). In the case of fumC435, primer site diversity proved to be advantageous, as a mismatch 7 bp from the 3' end increased the ΔCT. Deliberate inclusion of mismatches is likely to prove a valuable tactic for maximizing ΔCT values, as has been found previously for conventional allele-specific PCR (Ishikawa et al., 1995). It can be seen in Table 2 that the N. meningitidis allele-specific primers differ markedly at locations other than the 3' end. This was necessary to accommodate diversity in the primer sites and appears to be a viable option, as examination of sequence alignments indicated that the diversity was not distributed evenly: particular bases at polymorphic sites were linked strongly to certain sequences within the primer-binding sites. This may be the effect of insertions, deletions or inversions in the evolutionary past. With the much less diverse species S. aureus, diversity within the primer-binding sites was not an issue.
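
The way a ΔCT value translates into an allele call can be sketched as follows. This is an illustrative Python example only: it assumes one reaction per allele-specific primer for each isolate, takes ΔCT as the difference between the two lowest CT values, and uses a hypothetical minimum ΔCT of 3 cycles as the calling threshold. None of these specifics, nor the CT values shown, is taken from the assays reported here.

```python
def call_allele(ct_by_primer, min_delta_ct=3.0):
    """Call the allele whose specific primer gives the lowest CT.

    `ct_by_primer` maps each candidate base to the CT of its allele-specific
    reaction. Returns (base, delta_ct); base is None when the gap between the
    best and second-best reactions is below `min_delta_ct` (assumed cut-off).
    """
    if len(ct_by_primer) < 2:
        raise ValueError("need CT values for at least two allele-specific primers")
    # Rank reactions from earliest (matched primer) to latest (mis-primed).
    ranked = sorted(ct_by_primer.items(), key=lambda item: item[1])
    (best_base, best_ct), (_, next_ct) = ranked[0], ranked[1]
    delta_ct = next_ct - best_ct
    return (best_base if delta_ct >= min_delta_ct else None), delta_ct

# Hypothetical CT values for a two-primer and a three-primer SNP.
clear_call = call_allele({"A": 18.0, "G": 27.5})
ambiguous = call_allele({"A": 21.0, "G": 22.0, "T": 30.0})
```

Under these assumptions the first isolate is called as A with a ΔCT of 9.5 cycles, while the second is left uncalled because the two earliest reactions are only 1 cycle apart.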

In conclusion, we have demonstrated a novel and widely applicable approach to bacterial typing. It makes use of known genetic diversity and allows the typing method to be designed with reference to the resolving power required. It would be worthwhile to test this approach against large numbers of isolates.


    ACKNOWLEDGEMENTS
The authors thank Graeme Nimmo, John Bates, Helen Smith and the Australian Group on Antimicrobial Resistance for providing samples and helpful discussions, David Hammond for assistance with kinetic PCR and Alex Stephens for assistance with MLST determinations. This study was supported by the Australian Federal Government Cooperative Research Centres Programme and the Queensland State Government Department of Innovation and Information Economy.


    REFERENCES

  • Ayyadevara, S., Thaden, J. J. & Shmookler Reis, R. J. (2000). Discrimination of primer 3'-nucleotide mismatch by Taq DNA polymerase during polymerase chain reaction. Anal Biochem 284, 11–18.

  • Booch, G. (1993). Object-oriented Analysis and Design with Applications, 2nd edn. Boston: Addison-Wesley.

  • Caugant, D. A., Bol, P., Høiby, E. A., Zanen, H. C. & Frøholm, L. O. (1990). Clones of serogroup B Neisseria meningitidis causing systemic disease in The Netherlands, 1958–1986. J Infect Dis 162, 867–874.

  • Dingle, K. E., Colles, F. M., Ure, R., Wagenaar, J. A., Duim, B., Bolton, F. J., Fox, A. J., Wareing, D. R. A. & Maiden, M. C. J. (2002). Molecular characterization of Campylobacter jejuni clones: a basis for epidemiologic investigation. Emerg Infect Dis 8, 949–955.

  • Enright, M. C., Spratt, B. G., Kalia, A., Cross, J. H. & Bessen, D. E. (2001). Multilocus sequence typing of Streptococcus pyogenes and the relationships between emm type and clone. Infect Immun 69, 2416–2427.

  • Enright, M. C., Robinson, D. A., Randle, G., Feil, E. J., Grundmann, H. & Spratt, B. G. (2002). The evolutionary history of methicillin-resistant Staphylococcus aureus (MRSA). Proc Natl Acad Sci U S A 99, 7687–7692.

  • Feil, E. J., Smith, J. M., Enright, M. C. & Spratt, B. G. (2000). Estimating recombinational parameters in Streptococcus pneumoniae from multilocus sequence typing data. Genetics 154, 1439–1450.

  • Feil, E. J., Holmes, E. C., Bessen, D. E. & 9 other authors (2001). Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term phylogenetic consequences. Proc Natl Acad Sci U S A 98, 182–187.

  • Feil, E. J., Cooper, J. E., Grundmann, H. & 9 other authors (2003). How clonal is Staphylococcus aureus? J Bacteriol 185, 3307–3316.

  • Germer, S., Holland, M. J. & Higuchi, R. (2000). High-throughput SNP allele-frequency determination in pooled DNA samples by kinetic PCR. Genome Res 10, 258–266.

  • Giffard, P. M., McMahon, J. A., Gustafson, H. M., Barnard, R. T. & Voisey, J. (2001). Comparison of competitively primed and conventional allele-specific nucleic acid amplification. Anal Biochem 292, 207–215.

  • Godoy, D., Randle, G., Simpson, A. J., Aanensen, D. M., Pitt, T. L., Kinoshita, R. & Spratt, B. G. (2003). Multilocus sequence typing and evolutionary relationships among the causative agents of melioidosis and glanders, Burkholderia pseudomallei and Burkholderia mallei. J Clin Microbiol 41, 2068–2079.

  • Homan, W. L., Tribe, D., Poznanski, S., Li, M., Hogg, G., Spalburg, E., van Embden, J. D. A. & Willems, R. J. L. (2002). Multilocus sequence typing scheme for Enterococcus faecium. J Clin Microbiol 40, 1963–1971.

  • Huang, M. M., Arnheim, N. & Goodman, M. F. (1992). Extension of base mispairs by Taq DNA polymerase: implications for single nucleotide discrimination in PCR. Nucleic Acids Res 20, 4567–4573.

  • Hunter, P. R. & Gaston, M. A. (1988). Numerical index of the discriminatory ability of typing systems: an application of Simpson's index of diversity. J Clin Microbiol 26, 2465–2466.

  • Ishikawa, Y., Tokunaga, K., Kashiwase, K., Akaza, T. & Juji, T. (1995). Sequence-based typing of HLA-A2 alleles using a primer with an extra base mismatch. Hum Immunol 42, 315–318.

  • Jeanmougin, F., Thompson, J. D., Gouy, M., Higgins, D. G. & Gibson, T. J. (1998). Multiple sequence alignment with Clustal X. Trends Biochem Sci 23, 403–405.

  • Krawczak, M. (1999). Informativity assessment for biallelic single nucleotide polymorphisms. Electrophoresis 20, 1676–1681.

  • Maiden, M. C. J., Bygraves, J. A., Feil, E. & 10 other authors (1998). Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci U S A 95, 3140–3145.

  • Meats, E., Feil, E. J., Stringer, S., Cody, A. J., Goldstein, R., Kroll, J. S., Popovic, T. & Spratt, B. G. (2003). Characterization of encapsulated and noncapsulated Haemophilus influenzae and determination of phylogenetic relationships by multilocus sequence typing. J Clin Microbiol 41, 1623–1636.

  • Sachidanandam, R., Weissman, D., Schmidt, S. C. & 38 other authors (2001). A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms (International SNP Map Working Group). Nature 409, 928–933.

  • Sa-Leao, R., Tomasz, A. & de Lencastre, H. (2001). Multilocus sequence typing of Streptococcus pneumoniae clones with unusual drug resistance patterns: genetic backgrounds and relatedness to other epidemic clones. J Infect Dis 184, 1206–1210.

  • Shlush, L. I., Behar, D. M., Zelazny, A., Keller, N., Lupski, J. R., Beaudet, A. L. & Bercovich, D. (2002). Molecular epidemiological analysis of the changing nature of a meningococcal outbreak following a vaccination campaign. J Clin Microbiol 40, 3565–3571.

  • Simpson, E. H. (1949). Measurement of diversity. Nature 163, 688.

  • Tribe, D. E., Zaia, A. M., Griffith, J. M., Robinson, P. M., Li, H. Y., Taylor, K. N. & Hogg, G. G. (2002). Increase in meningococcal disease associated with the emergence of a novel ST-11 variant of serogroup C Neisseria meningitidis in Victoria, Australia, 1999–2000. Epidemiol Infect 128, 7–14.





