The invention provides a method of tracking, identifying, and/or sorting classes or subpopulations of molecules by the use of oligonucleotide tags. Oligonucleotide tags of the invention each consist of a plurality of subunits 3 to 6 nucleotides in length selected from a minimally cross-hybridizing set. A subunit of a minimally cross-hybridizing set forms a duplex or triplex having two or more mismatches with the complement of any other subunit of the same set. The number of oligonucleotide tags available in a particular embodiment depends on the number of subunits per tag and on the length of the subunit. An important aspect of the invention is the use of the oligonucleotide tags for sorting polynucleotides by specifically hybridizing tags attached to the polynucleotides to their complements on solid phase supports. This embodiment provides a readily automated system for manipulating and sorting polynucleotides, particularly useful in large-scale parallel operations, such as large-scale DNA sequencing, mRNA fingerprinting, and the like, wherein many target polynucleotides or many segments of a single target polynucleotide are sequenced simultaneously.

Patent: RE39793
Priority: Oct 13 1994
Filed: Aug 02 1999
Issued: Aug 21 2007
Expiry: Oct 13 2014
Original assignee entity: Large; all paid
1. A composition of matter comprising:
a solid phase support having one or more spatially discrete regions; and
a uniform population of substantially identical oligonucleotide tag complements covalently attached to the solid phase support in at least one of the one or more spatially discrete regions, the oligonucleotide tag complements comprising a plurality of subunits, each subunit consisting of an oligonucleotide having a length from three to six nucleotides and each subunit being selected from a minimally cross-hybridizing set, wherein a subunit of the set and a complement of any other subunit of the set would have at least two mismatches.
2. The composition of matter of claim 1 wherein said plurality of said subunits is in the range of from 4 to 10.
3. The composition of matter of claim 2 wherein said solid phase support is a microparticle having a single spatially discrete region.
4. The composition of matter of claim 3 wherein said microparticle is selected from the group consisting of glass microparticles, magnetic beads, and polystyrene microparticles.
5. A composition of matter comprising a plurality of from ten thousand to a hundred thousand different polynucleotides, selected from cDNA molecules or fragments of a target polynucleotide to be analyzed or sequenced, said composition including a mixture of microparticles,
wherein each microparticle has identical polynucleotides of the plurality attached thereto,
and wherein substantially all different polynucleotides in the plurality are attached to different microparticles.
6. The composition of claim 5 wherein each microparticle has about 10^5 identical polynucleotides attached thereto.
7. A composition of matter comprising a plurality of different polynucleotides, selected from cDNA molecules or fragments of a target polynucleotide to be analyzed or sequenced, said composition including a mixture of microparticles,
wherein tag complements are attached to each said microparticle,
and wherein each said cDNA molecule or fragment has an oligonucleotide tag attached, such that substantially all the same molecules have the same oligonucleotide tag attached and substantially all different molecules have different oligonucleotide tags attached,
such that perfectly matched duplexes are formed between the tag complements of said microparticles and the oligonucleotide tags of said cDNA molecules or fragments,
whereby, each microparticle has identical polynucleotides of the plurality attached thereto, and substantially all different polynucleotides in the plurality are attached to different microparticles.

This is a continuation of U.S. patent application Ser. No. 08/358,810 filed 19 Dec. 1994, which is a continuation-in-part of U.S. patent application Ser. No. 08/322,348 filed 13 Oct. 1994, now abandoned, which application is incorporated by reference.

The invention relates generally to methods for identifying, sorting, and/or tracking molecules, especially polynucleotides, with oligonucleotide labels, and more particularly, to a method of sorting polynucleotides by specific hybridization to oligonucleotide tags.

Specific hybridization of oligonucleotides and their analogs is a fundamental process that is employed in a wide variety of research, medical, and industrial applications, including the identification of disease-related polynucleotides in diagnostic assays, screening for clones of novel target polynucleotides, identification of specific polynucleotides in blots of mixtures of polynucleotides, amplification of specific target polynucleotides, therapeutic blocking of inappropriately expressed genes, DNA sequencing, and the like, e.g. Sambrook et al, Molecular Cloning: A Laboratory Manual 2nd Edition (Cold Spring Harbor Laboratory, New York, 1989); Keller and Manak, DNA Probes, 2nd Edition (Stockton Press, New York, 1993); Milligan et al, J. Med. Chem., 36: 1923-1937 (1993); Drmanac et al, Science, 260: 1649-1652 (1993); Bains, J. DNA Sequencing and Mapping, 4: 143-150 (1993).

Specific hybridization has also been proposed as a method of tracking, retrieving, and identifying compounds labeled with oligonucleotide tags. For example, in multiplex DNA sequencing oligonucleotide tags are used to identify electrophoretically separated bands on a gel that consist of DNA fragments generated in the same sequencing reaction. In this way, DNA fragments from many sequencing reactions are separated on the same length of a gel, which is then blotted with separate solid phase materials on which the fragment bands from the separate sequencing reactions are visualized with oligonucleotide probes that specifically hybridize to complementary tags, Church et al, Science, 240: 185-188 (1988). Similar uses of oligonucleotide tags have also been proposed for identifying explosives, potential pollutants, such as crude oil, and currency for prevention and detection of counterfeiting, e.g. reviewed by Dollinger, pages 265-274 in Mullis et al, editors, The Polymerase Chain Reaction (Birkhauser, Boston, 1994). More recently, systems employing oligonucleotide tags have also been proposed as a means of manipulating and identifying individual molecules in complex combinatorial chemical libraries, for example, as an aid to screening such libraries for drug candidates, Brenner and Lerner, Proc. Natl. Acad. Sci., 89: 5381-5383 (1992); Alper, Science, 264: 1399-1401 (1994); and Needels et al, Proc. Natl. Acad. Sci., 90: 10700-10704 (1993).

The successful implementation of such tagging schemes depends in large part on the success in achieving specific hybridization between a tag and its complementary probe. That is, for an oligonucleotide tag to successfully identify a substance, the number of false positive and false negative signals must be minimized. Unfortunately, such spurious signals are not uncommon because base pairing and base stacking free energies vary widely among nucleotides in a duplex or triplex structure. For example, a duplex consisting of a repeated sequence of deoxyadenosine (A) and thymidine (T) bound to its complement may have less stability than an equal-length duplex consisting of a repeated sequence of deoxyguanosine (G) and deoxycytidine (C) bound to a partially complementary target containing a mismatch. Thus, if a desired compound from a large combinatorial chemical library were tagged with the former oligonucleotide, a significant possibility would exist that, under hybridization conditions designed to detect perfectly matched AT-rich duplexes, undesired compounds labeled with the GC-rich oligonucleotide, even in a mismatched duplex, would be detected along with the perfectly matched duplexes consisting of the AT-rich tag. In the molecular tagging system proposed by Brenner and Lerner (cited above), the related problem of mis-hybridization of closely related tags was addressed by employing a so-called “commaless” code, which ensures that a probe out of register (or frame shifted) with respect to its complementary tag would result in a duplex with one or more mismatches for each of its five or more three-base words, or “codons.”

Even though reagents, such as tetramethylammonium chloride, are available to negate base-specific stability differences of oligonucleotide duplexes, the effect of such reagents is often limited and their presence can be incompatible with, or render more difficult, further manipulations of the selected compounds, e.g. amplification by polymerase chain reaction (PCR) or the like.

Such problems have made the simultaneous use of multiple hybridization probes in the analysis of multiple or complex genetic loci, e.g. via multiplex PCR, reverse dot blotting, or the like, very difficult. As a result, direct sequencing of certain loci, e.g. HLA genes, has been promoted as a reliable alternative to indirect methods employing specific hybridization for the identification of genotypes, e.g. Gyllensten et al, Proc. Natl. Acad. Sci., 85: 7652-7656 (1988).

The ability to sort cloned and identically tagged DNA fragments onto distinct solid phase supports would facilitate such sequencing, particularly when coupled with a non gel-based sequencing methodology simultaneously applicable to many samples in parallel.

In view of the above, it would be useful if there were available an oligonucleotide-based tagging system which provided a large repertoire of tags, but which also minimized the occurrence of false positive and false negative signals without the need to employ special reagents for altering natural base pairing and base stacking free energy differences. Such a tagging system would find applications in many areas, including construction and use of combinatorial chemical libraries, large-scale mapping and sequencing of DNA, genetic identification, medical diagnosis, and the like.

An object of my invention is to provide a molecular tagging system for tracking, retrieving, and identifying compounds.

Another object of my invention is to provide a method for sorting identical molecules, or subclasses of molecules, especially polynucleotides, onto surfaces of solid phase materials by the specific hybridization of oligonucleotide tags and their complements.

A further object of my invention is to provide a combinatorial chemical library whose member compounds are identified by the specific hybridization of oligonucleotide tags and their complements.

A still further object of my invention is to provide a system for tagging and sorting many thousands of fragments, especially randomly overlapping fragments, of a target polynucleotide for simultaneous analysis and/or sequencing.

Another object of my invention is to provide a rapid and reliable method for sequencing target polynucleotides having a length in the range of a few hundred basepairs to several tens of thousands of basepairs.

My invention achieves these and other objects by providing a method and materials for tracking, identifying, and/or sorting classes or subpopulations of molecules by the use of oligonucleotide tags. An oligonucleotide tag of the invention consists of a plurality of subunits, each subunit consisting of an oligonucleotide of 3 to 6 nucleotides in length. Subunits of an oligonucleotide tag are selected from a minimally cross-hybridizing set. In such a set, a duplex or triplex consisting of a subunit of the set and the complement of any other subunit of the set contains at least two mismatches. In other words, a subunit of a minimally cross-hybridizing set at best forms a duplex or triplex having two mismatches with the complement of any other subunit of the same set. The number of oligonucleotide tags available in a particular embodiment depends on the number of subunits per tag and on the length of the subunit. This number is generally much less than the number of all possible sequences of the length of the tag, which for a tag n nucleotides long would be 4^n. More preferably, subunits are oligonucleotides from 4 to 5 nucleotides in length.
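As a rough illustration of the counting argument above, the sketch below compares the number of distinct tags that can be built from a set of m subunits used k at a time (m^k) with the number of all sequences of the same total length (4^n). The choice of eight 4-mer subunits and eight subunits per tag is an illustrative assumption, not a parameter fixed by the text.

```python
def repertoire_size(set_size: int, subunits_per_tag: int) -> int:
    """Distinct tags built from `subunits_per_tag` words drawn from a set of `set_size` words."""
    return set_size ** subunits_per_tag


def all_sequences(length: int) -> int:
    """Number of all DNA sequences of the given length (4**n)."""
    return 4 ** length


# Assumed example: eight 4-mer subunits, eight subunits per tag -> 32-nucleotide tags.
words_in_set, word_length, words_per_tag = 8, 4, 8
tag_length = word_length * words_per_tag
print(tag_length)                                     # 32
print(repertoire_size(words_in_set, words_per_tag))   # 16777216
print(all_sequences(tag_length))                      # 18446744073709551616
```

The gap between roughly 1.7 x 10^7 coded tags and 4^32 possible sequences is the price paid for guaranteeing mismatches between every pair of tags.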

In one aspect of my invention, complements of oligonucleotide tags attached to a solid phase support are used to sort polynucleotides from a mixture of polynucleotides each containing a tag. In this embodiment, complements of the oligonucleotide tags are synthesized on the surface of a solid phase support, such as a microscopic bead or a specific location on an array of synthesis locations on a single support such that populations of identical sequences are produced in specific regions. That is, the surface of each support in the case of a bead, or of each region, in the case of an array, is derivatized by only one type of complement which has a particular sequence. The population of such beads or regions contains a repertoire of complements with distinct sequences, the size of the repertoire depending on the number of subunits per oligonucleotide tag and the length of the subunits employed. Similarly, the polynucleotides to be sorted each comprises an oligonucleotide tag in the repertoire, such that identical polynucleotides have the same tag and different polynucleotides have different tags. Thus, when the populations of supports and polynucleotides are mixed under conditions which permit specific hybridization of the oligonucleotide tags with their respective complements, subpopulations of identical polynucleotides are sorted onto particular beads or regions. The subpopulations of polynucleotides can then be manipulated on the solid phase support by micro-biochemical techniques.

Generally, the method of my invention comprises the following steps: (a) attaching an oligonucleotide tag from a repertoire of tags to each molecule in a population of molecules (i) such that substantially all the same molecules or same subpopulation of molecules in the population have the same oligonucleotide tag attached and substantially all different molecules or different subpopulations of molecules in the population have different oligonucleotide tags attached and (ii) such that each oligonucleotide tag from the repertoire comprises a plurality of subunits and each subunit of the plurality consists of an oligonucleotide having a length from three to six nucleotides or from three to six basepairs, the subunits being selected from a minimally cross-hybridizing set; and (b) sorting the molecules or subpopulations of molecules of the population by specifically hybridizing the oligonucleotide tags with their respective complements.
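The sorting step (b) can be pictured with an idealized model in which a tagged molecule lands on the bead whose tag complement forms a perfectly matched duplex with its tag, and nowhere else. This is a minimal sketch under that assumption; the function names and the dictionary representation of beads are hypothetical, and real hybridization is governed by conditions and stabilities that this model ignores.

```python
from collections import defaultdict

_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Watson-Crick complement of `seq`, written 5'->3'."""
    return "".join(_PAIR[b] for b in reversed(seq))


def sort_onto_beads(tagged, bead_complements):
    """Idealized sorting by specific hybridization.

    tagged: list of (molecule_id, tag) pairs, tags written 5'->3'
    bead_complements: dict bead_id -> tag complement carried by that bead
    Returns (beads, unsorted): molecules per bead, and molecules with no matching bead.
    """
    bead_for_complement = {comp: bead for bead, comp in bead_complements.items()}
    beads = defaultdict(list)
    unsorted = []
    for molecule, tag in tagged:
        # A perfectly matched duplex forms only with the exact reverse complement.
        bead = bead_for_complement.get(reverse_complement(tag))
        if bead is None:
            unsorted.append(molecule)
        else:
            beads[bead].append(molecule)
    return dict(beads), unsorted
```

For example, two copies of a fragment carrying tag GATTTGAT both end up on the bead carrying reverse_complement("GATTTGAT"), which is the outcome the method relies on: identical molecules share one tag and therefore one bead.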

An important aspect of my invention is the use of the oligonucleotide tags to sort polynucleotides for parallel sequence determination. Preferably, such sequencing is carried out by the following steps: (a) generating from the target polynucleotide a plurality of fragments that cover the target polynucleotide; (b) attaching an oligonucleotide tag from a repertoire of tags to each fragment of the plurality (i) such that substantially all the same fragments have the same oligonucleotide tag attached and substantially all different fragments have different oligonucleotide tags attached and (ii) such that each oligonucleotide tag from the repertoire comprises a plurality of subunits and each subunit of the plurality consists of an oligonucleotide having a length from three to six nucleotides or from three to six basepairs, the subunits being selected from a minimally cross-hybridizing set;

My invention overcomes a key deficiency of current methods of tagging or labeling molecules with oligonucleotides: by coding the sequences of the tags in accordance with the invention, the stability of any mismatched duplex or triplex between a tag and the complement of another tag is far lower than that of any perfectly matched duplex between the tag and its own complement. Thus, the problem of incorrect sorting because mismatched duplexes of GC-rich tags are more stable than perfectly matched AT-rich tags is eliminated.

When used in combination with solid phase supports, such as microscopic beads, my invention provides a readily automated system for manipulating and sorting polynucleotides, particularly useful in large-scale parallel operations, such as large-scale DNA sequencing, wherein many target polynucleotides or many segments of a single target polynucleotide are sequenced and/or analyzed simultaneously.

FIGS. 1a-1c illustrate structures of labeled probes employed in a preferred method of “single base” sequencing which may be used with the invention.

FIG. 2 illustrates the relative positions of the nuclease recognition site, ligation site, and cleavage site in a ligated complex (SEQ. ID NO:16) formed between a target polynucleotide and a probe used in a preferred “single base” sequencing method.

FIG. 3 is a flow chart illustrating a general algorithm for generating minimally cross-hybridizing sets.

FIG. 4 illustrates a scheme for synthesizing and using a combinatorial chemical library in which member compounds are labeled with oligonucleotide tags in accordance with the invention.

FIG. 5 diagrammatically illustrates an apparatus for carrying out parallel operations, such as polynucleotide sequencing, in accordance with the invention.

“Complement” or “tag complement” as used herein in reference to oligonucleotide tags refers to an oligonucleotide to which an oligonucleotide tag specifically hybridizes to form a perfectly matched duplex or triplex. In embodiments where specific hybridization results in a triplex, the oligonucleotide tag may be selected to be either double stranded or single stranded. Thus, where triplexes are formed, the term “complement” is meant to encompass either a double stranded complement of a single stranded oligonucleotide tag or a single stranded complement of a double stranded oligonucleotide tag.

The term “oligonucleotide” as used herein includes linear oligomers of natural or modified monomers or linkages, including deoxyribonucleosides, ribonucleosides, α-anomeric forms thereof, peptide nucleic acids (PNAs), and the like, capable of specifically binding to a target polynucleotide by way of a regular pattern of monomer-to-monomer interactions, such as Watson-Crick type of base pairing, base stacking, Hoogsteen or reverse Hoogsteen types of base pairing, or the like. Usually monomers are linked by phosphodiester bonds or analogs thereof to form oligonucleotides ranging in size from a few monomeric units, e.g., 3-4, to several tens of monomeric units. Whenever an oligonucleotide is represented by a sequence of letters, such as “ATGCCTG,” it will be understood that the nucleotides are in 5′→3′ order from left to right and that “A” denotes deoxyadenosine, “C” denotes deoxycytidine, “G” denotes deoxyguanosine, and “T” denotes thymidine, unless otherwise noted. Analogs of phosphodiester linkages include phosphorothioate, phosphorodithioate, phosphoranilidate, phosphoramidate, and the like. Usually oligonucleotides of the invention comprise the four natural nucleotides; however, they may also comprise non-natural nucleotide analogs. It is clear to those skilled in the art when oligonucleotides having natural or non-natural nucleotides may be employed; e.g. where processing by enzymes is called for, usually oligonucleotides consisting of natural nucleotides are required.

“Perfectly matched” in reference to a duplex means that the poly- or oligonucleotide strands making up the duplex form a double stranded structure with one another such that every nucleotide in each strand undergoes Watson-Crick basepairing with a nucleotide in the other strand. The term also comprehends the pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine bases, and the like, that may be employed. In reference to a triplex, the term means that the triplex consists of a perfectly matched duplex and a third strand in which every nucleotide undergoes Hoogsteen or reverse Hoogsteen association with a basepair of the perfectly matched duplex. Conversely, a “mismatch” in a duplex between a tag and an oligonucleotide means that a pair or triplet of nucleotides in the duplex or triplex fails to undergo Watson-Crick and/or Hoogsteen and/or reverse Hoogsteen bonding.
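The duplex definitions above can be made concrete with a small mismatch counter. This sketch assumes two equal-length strands of natural nucleotides, each written 5′→3′, paired antiparallel and in register with no bulges or loops; nucleoside analogs and triplexes are outside its scope, and the function names are illustrative.

```python
# Watson-Crick pairs (first strand base, second strand base).
_WC_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def duplex_mismatches(strand_a: str, strand_b: str) -> int:
    """Count positions lacking Watson-Crick pairing when strand_a and strand_b
    (both written 5'->3') are aligned antiparallel and in register."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    # Antiparallel alignment: the 5' end of strand_a faces the 3' end of strand_b.
    return sum((x, y) not in _WC_PAIRS for x, y in zip(strand_a, reversed(strand_b)))


def is_perfectly_matched(strand_a: str, strand_b: str) -> bool:
    """True when every nucleotide in each strand is Watson-Crick paired."""
    return duplex_mismatches(strand_a, strand_b) == 0


print(duplex_mismatches("GATT", "AATC"))  # 0: AATC is the complement of GATT
print(duplex_mismatches("GATT", "ATCA"))  # 3: ATCA is the complement of TGAT
```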

As used herein, “nucleoside” includes the natural nucleosides, including 2′-deoxy and 2′-hydroxyl forms, e.g. as described in Kornberg and Baker, DNA Replication, 2nd Ed. (Freeman, San Francisco, 1992). “Analogs” in reference to nucleosides includes synthetic nucleosides having modified base moieties and/or modified sugar moieties, e.g. described by Scheit, Nucleotide Analogs (John Wiley, New York, 1980); Uhlman and Peyman, Chemical Reviews, 90: 543-584 (1990), or the like, with the only proviso that they are capable of specific hybridization. Such analogs include synthetic nucleosides designed to enhance binding properties, reduce degeneracy, increase specificity, and the like.

The invention provides a method of labeling and sorting molecules, particularly polynucleotides, by the use of oligonucleotide tags. The oligonucleotide tags of the invention comprise a plurality of “words” or subunits, selected from minimally cross-hybridizing sets of subunits. Subunits of such sets cannot form a duplex or triplex with the complement of another subunit of the same set with less than two mismatched nucleotides. Thus, the sequences of any two oligonucleotide tags of a repertoire that form duplexes will never be “closer” than differing by two nucleotides. In particular embodiments sequences of any two oligonucleotide tags of a repertoire can be even “further” apart, e.g. by designing a minimally cross-hybridizing set such that subunits cannot form a duplex with the complement of another subunit of the same set with less than three mismatched nucleotides, and so on. The invention is particularly useful in labeling and sorting polynucleotides for parallel operations, such as sequencing, fingerprinting or other types of analysis.

The nucleotide sequences of the subunits for any minimally cross-hybridizing set are conveniently enumerated by simple computer programs following the general algorithm illustrated in FIG. 3, as exemplified by the program minhx, whose source code is listed in Appendix I. minhx computes all minimally cross-hybridizing sets having subunits composed of three kinds of nucleotides and having a length of four.

The algorithm of FIG. 3 is implemented by first defining the characteristics of the subunits of the minimally cross-hybridizing set, i.e. length, number of base differences between members, and composition, e.g. whether they consist of two, three, or four kinds of bases. A table M_n, n=1, is generated (100) that consists of all possible sequences of a given length and composition. An initial subunit S_1 is selected and compared (120) with successive subunits S_i for i=n+1 to the end of the table. Whenever a successive subunit has the required number of mismatches to be a member of the minimally cross-hybridizing set, it is saved in a new table M_{n+1} (125) that also contains subunits previously selected in prior passes through step 120. For example, in the first set of comparisons, M_2 will contain S_1; in the second set of comparisons, M_3 will contain S_1 and S_2; in the third set of comparisons, M_4 will contain S_1, S_2, and S_3; and so on. Similarly, comparisons in table M_j will be between S_j and all successive subunits in M_j. Note that each successive table M_{n+1} is smaller than its predecessors as subunits are eliminated in successive passes through step 130. After every subunit of table M_n has been compared (140), the old table is replaced by the new table M_{n+1}, and the next round of comparisons is begun. The process stops (160) when a table M_n is reached that contains no successive subunits to compare to the selected subunit S_i, i.e. M_n=M_{n+1}.
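One pass of the FIG. 3 procedure can be sketched as a greedy filter over the table of all candidate subunits. This is a minimal sketch, not the minhx program: it produces a single set (the one whose first member is the first sequence in the table) rather than enumerating all sets, and it assumes that mismatches are counted only for in-register alignment. Under that assumption, the number of mismatches between one subunit and the complement of another equals the number of positions at which the two subunits differ.

```python
from itertools import product


def mismatches(a: str, b: str) -> int:
    """In-register mismatches between subunit a and the complement of subunit b,
    i.e. the number of positions at which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def minimally_cross_hybridizing_set(alphabet: str, length: int, required: int) -> list[str]:
    # Table M_1: all sequences of the given length and composition (step 100).
    table = ["".join(p) for p in product(alphabet, repeat=length)]
    j = 0
    while j < len(table):
        selected = table[j]  # S_j
        # New table: previously selected subunits S_1..S_j, plus successive subunits
        # having at least `required` mismatches with S_j (steps 120 and 125).
        table = table[: j + 1] + [
            s for s in table[j + 1:] if mismatches(selected, s) >= required
        ]
        j += 1
    # The loop ends when no successive subunits remain to compare (step 160).
    return table


words = minimally_cross_hybridizing_set("AGT", 4, 3)
print(words[0])  # AAAA
```

Because every surviving subunit passed the comparison against each earlier selected subunit, any two members of the returned set differ at no fewer than `required` positions. Sets found this way will generally differ from hand-chosen sets such as Table I, which also balance composition.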

Preferably, minimally cross-hybridizing sets comprise subunits that each make approximately the same contribution to duplex stability as every other subunit in the set. In this way, the stability of perfectly matched duplexes between every subunit and its complement is approximately equal. Guidance for selecting such sets is provided by published techniques for selecting optimal PCR primers and calculating duplex stabilities, e.g. Rychlik et al, Nucleic Acids Research, 17: 8543-8551 (1989) and 18: 6409-6412 (1990); Breslauer et al, Proc. Natl. Acad. Sci., 83: 3746-3750 (1986); Wetmur, Crit. Rev. Biochem. Mol. Biol., 26: 227-259 (1991); and the like. For shorter tags, e.g. about 30 nucleotides or less, the algorithm described by Rychlik and Wetmur is preferred, and for longer tags, e.g. about 30-35 nucleotides or greater, an algorithm disclosed by Suggs et al, pages 683-693 in Brown, editor, ICN-UCLA Symp. Dev. Biol., Vol. 23 (Academic Press, New York, 1981) may be conveniently employed.
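To illustrate what "approximately equal stability" means in practice, the sketch below scores each subunit with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) in °C. This rule is a deliberately crude stand-in chosen for brevity; it is not the Rychlik, Wetmur, Breslauer, or Suggs method cited above, all of which account for nearest-neighbor effects that the Wallace rule ignores.

```python
def wallace_tm(seq: str) -> int:
    """Wallace rule estimate of melting temperature (deg C): 2*(A+T) + 4*(G+C).
    A coarse proxy for short oligonucleotides only."""
    at = sum(seq.count(b) for b in "AT")
    gc = sum(seq.count(b) for b in "GC")
    return 2 * at + 4 * gc


def stability_spread(words) -> int:
    """Difference between the highest and lowest estimated Tm in a set of subunits."""
    tms = [wallace_tm(w) for w in words]
    return max(tms) - min(tms)


# The eight 4-mers of Table I: each contains one G and three A/T.
TABLE_I = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]
print([wallace_tm(w) for w in TABLE_I])  # [10, 10, 10, 10, 10, 10, 10, 10]
print(stability_spread(TABLE_I))         # 0
```

Even this rough score shows that the Table I words are compositionally balanced; a finer screen would substitute nearest-neighbor free energies for the Wallace score.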

A preferred embodiment of minimally cross-hybridizing sets are those whose subunits are made up of three of the four natural nucleotides. As will be discussed more fully below, the absence of one type of nucleotide in the oligonucleotide tags permits target polynucleotides to be loaded onto solid phase supports by use of the 5′→3′ exonuclease activity of a DNA polymerase. The following is an exemplary minimally cross-hybridizing set of subunits each comprising four nucleotides selected from the group consisting of A, G, and T:

TABLE I
Word: W1 W2 W3 W4
Sequence: GATT TGAT TAGA TTTG
Word: W5 W6 W7 W8
Sequence: GTAA AGTA ATGT AAAG

In this set, each member would form a duplex having three mismatched bases with the complement of every other member.
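The three-mismatch property of Table I can be checked directly. This sketch assumes in-register alignment, in which the mismatch count between a subunit and the complement of another subunit equals the number of positions where the two subunits differ. It also builds all two-word tags from the set to show that any two distinct tags differ at three or more positions, as the coding scheme intends.

```python
from itertools import combinations, product

TABLE_I = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]


def differences(a: str, b: str) -> int:
    """Positions at which two equal-length sequences differ; for a subunit and the
    complement of another subunit aligned in register, this is the mismatch count."""
    return sum(x != y for x, y in zip(a, b))


# Every pair of subunits in Table I differs at exactly three positions.
pairwise = {differences(a, b) for a, b in combinations(TABLE_I, 2)}
print(pairwise)  # {3}

# Two-subunit tags: 8 * 8 = 64 tags, any two of which differ at >= 3 positions.
tags = ["".join(p) for p in product(TABLE_I, repeat=2)]
print(len(tags))                                                  # 64
print(min(differences(a, b) for a, b in combinations(tags, 2)))  # 3
```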

Further exemplary minimally cross-hybridizing sets are listed below in Table II. Clearly, additional sets can be generated by substituting different groups of nucleotides, or by using subsets of known minimally cross-hybridizing sets.

TABLE II
Exemplary Minimally Cross-Hybridizing Sets of 4-mer Subunits
CATT ACCC AAAC AAAG AACA AACG
CTAA AGGG ACCA ACCA ACAC ACAA
TCAT CACG AGGG AGGC AGGG AGGC
ACTA CCGA CACG CACC CAAG CAAC
TACA CGAC CCGC CCGG CCGC CCGG
TTTC GAGC CGAA CGAA CGCA CGCA
ATCT GCAG GAGA GAGA GAGA GAGA
AAAC GGCA GCAG GCAC GCCG GCCC
AAAA GGCC GGCG GGAC GGAG
AAGA AAGC AAGG ACAG ACCG ACGA
ACAC ACAA ACAA AACA AAAA AAAC
AGCG AGCG AGCC AGGC AGGC AGCG
CAAG CAAG CAAC CAAC CACC CACA
CCCA CCCC CCCG CCGA CCGA CACA
CGGC CGGA CGGA CGCG CGAG CGGC
GACC GACA GACA GAGG GAGG GAGG
GCGG GCGG GCGC GCCC GCAC GCCC
GGAA GGAC GGAG GGAA GGCA GGAA

The oligonucleotide tags of the invention and their complements are conveniently synthesized on an automated DNA synthesizer, e.g. an Applied Biosystems, Inc. (Foster City, Calif.) model 392 or 394 DNA/RNA Synthesizer, using standard chemistries, such as phosphoramidite chemistry, e.g. disclosed in the following references: Beaucage and Iyer, Tetrahedron, 48: 2223-2311 (1992); Molko et al, U.S. Pat. No. 4,980,460; Koster et al, U.S. Pat. No. 4,725,677; Caruthers et al, U.S. Pat. Nos. 4,415,732; 4,458,066; and 4,973,679; and the like. Alternative chemistries, e.g. resulting in non-natural backbone groups, such as phosphorothioate, phosphoramidate, and the like, may also be employed provided that the resulting oligonucleotides are capable of specific hybridization. In some embodiments, tags may comprise naturally occurring nucleotides that permit processing or manipulation by enzymes, while the corresponding tag complements may comprise non-natural nucleotide analogs, such as peptide nucleic acids, or like compounds, that promote the formation of more stable duplexes during sorting.

When microparticles are used as supports, repertoires of oligonucleotide tags and tag complements are preferably generated by subunit-wise synthesis via “split and mix” techniques, e.g. as disclosed in Shortle et al, International patent application PCT/US93/03418. Briefly, the basic unit of the synthesis is a subunit of the oligonucleotide tag. Preferably, phosphoramidite chemistry is used and 3′-phosphoramidite oligonucleotides are prepared for each subunit in a minimally cross-hybridizing set, e.g. for the set first listed above, there would be eight 4-mer 3′-phosphoramidites. Synthesis proceeds as disclosed by Shortle et al, in direct analogy with the techniques employed to generate diverse oligonucleotide libraries using nucleosidic monomers, e.g. as disclosed in Telenius et al, Genomics, 13: 718-725 (1992); Welsh et al, Nucleic Acids Research, 19: 5275-5279 (1991); Grothues et al, Nucleic Acids Research, 21: 1321-1322 (1993); Hartley, European patent application 90304496.4; Lam et al, Nature, 354: 82-84 (1991); Zuckerman et al, Int. J. Pept. Protein Research, 40: 498-507 (1992); and the like. Generally, these techniques simply call for the application of mixtures of the activated monomers to the growing oligonucleotide during the coupling steps.
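The bookkeeping of subunit-wise split-and-mix synthesis can be simulated in a few lines. In this sketch, which models only which subunit each bead receives per cycle and none of the chemistry, beads are randomly divided among one reaction vessel per subunit, each vessel couples its own subunit, and the beads are pooled again. Using the eight Table I words as the subunit set is an assumption for illustration. Because every bead sees exactly one vessel per cycle, each bead ends up carrying a single uniform tag, while the bead population as a whole samples the 8^k possible tags.

```python
import random

# Assumed subunit set: the eight 4-mer words of Table I.
WORDS = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]


def split_and_mix(n_beads: int, n_cycles: int, words=WORDS, seed: int = 0) -> list[str]:
    """Simulate subunit-wise split-and-mix synthesis; returns each bead's tag 5'->3'."""
    rng = random.Random(seed)
    beads = [""] * n_beads
    for _ in range(n_cycles):
        # Split: distribute the pooled beads at random among one vessel per subunit.
        vessels = {w: [] for w in words}
        for i in range(n_beads):
            vessels[rng.choice(words)].append(i)
        # Couple: each bead receives its vessel's subunit. Chemical synthesis runs
        # 3'->5', so the new subunit is added at the 5' (left) end.
        for word, members in vessels.items():
            for i in members:
                beads[i] = word + beads[i]
        # Mix: pooling is implicit, since the next split re-randomizes every bead.
    return beads


beads = split_and_mix(n_beads=1000, n_cycles=3)
print(len(beads[0]))  # 12: three 4-mer subunits per tag
```

In an actual synthesis each bead carries many copies of its tag; the simulation tracks only the one sequence that all those copies share.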

Double stranded forms of tags are made by separately synthesizing the complementary strands, followed by mixing under conditions that permit duplex formation. Such duplex tags may then be inserted into cloning vectors along with target polynucleotides for sorting and manipulation of the target polynucleotide in accordance with the invention.

In embodiments where specific hybridization occurs via triplex formation, coding of tag sequences follows the same principles as for duplex-forming tags; however, there are further constraints on the selection of subunit sequences. Generally, third strand association via Hoogsteen type of binding is most stable along homopyrimidine-homopurine tracts in a double stranded target. Usually, base triplets form in T-A*T or C-G*C motifs (where “-” indicates Watson-Crick pairing and “*” indicates Hoogsteen type of binding); however, other motifs are also possible. For example, Hoogsteen base pairing permits parallel and antiparallel orientations between the third strand (the Hoogsteen strand) and the purine-rich strand of the duplex to which the third strand binds, depending on conditions and the composition of the strands. There is extensive guidance in the literature for selecting appropriate sequences, orientation, conditions, nucleoside type (e.g. whether ribose or deoxyribose nucleosides are employed), and base modifications (e.g. methylated cytosine, and the like) in order to maximize, or otherwise regulate, triplex stability as desired in particular embodiments, e.g. Roberts et al, Proc. Natl. Acad. Sci., 88: 9397-9401 (1991); Roberts et al, Science, 258: 1463-1466 (1992); Distefano et al, Proc. Natl. Acad. Sci., 90: 1179-1183 (1993); Mergny et al, Biochemistry, 30: 9791-9798 (1991); Cheng et al, J. Am. Chem. Soc., 114: 4465-4474 (1992); Beal and Dervan, Nucleic Acids Research, 20: 2773-2776 (1992); Beal and Dervan, J. Am. Chem. Soc., 114: 4976-4982 (1992); Giovannangeli et al, Proc. Natl. Acad. Sci., 89: 8631-8635 (1992); Moser and Dervan, Science, 238: 645-650 (1987); McShan et al, J. Biol. Chem., 267: 5712-5721 (1992); Yoon et al, Proc. Natl. Acad. Sci., 89: 3840-3844 (1992); Blume et al, Nucleic Acids Research, 20: 1777-1784 (1992); Thuong and Helene, Angew. Chem. Int. Ed. Engl., 32: 666-690 (1993); and the like.
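The T-A*T and C-G*C motifs can be expressed as a simple base-mapping rule for the third strand. This sketch assumes the pyrimidine motif with the Hoogsteen strand parallel to a homopurine target strand, which is only one of the orientations mentioned above; it ignores pH, cytosine methylation, and the other conditions addressed in the cited literature, and the function names are hypothetical.

```python
# Pyrimidine-motif Hoogsteen partners for purines: A pairs with T (T-A*T), G with C (C-G*C).
_HOOGSTEEN_PYRIMIDINE = {"A": "T", "G": "C"}
_WATSON_CRICK = {"A": "T", "G": "C", "T": "A", "C": "G"}


def watson_crick_partner(purine_strand: str) -> str:
    """Complementary (homopyrimidine) strand of the duplex, written 5'->3'."""
    return "".join(_WATSON_CRICK[b] for b in reversed(purine_strand))


def pyrimidine_third_strand(purine_strand: str) -> str:
    """Hoogsteen third strand for a homopurine strand, assuming the pyrimidine motif
    and a third strand parallel to the purine strand (written 5'->3')."""
    if any(b not in _HOOGSTEEN_PYRIMIDINE for b in purine_strand):
        raise ValueError("target strand must be homopurine (A and G only)")
    # Parallel orientation: position i of the third strand faces position i of the purine strand.
    return "".join(_HOOGSTEEN_PYRIMIDINE[b] for b in purine_strand)


print(watson_crick_partner("AAGGAG"))     # CTCCTT
print(pyrimidine_third_strand("AAGGAG"))  # TTCCTC
```

The homopurine check reflects the constraint noted above: triplex-forming tag subunits are drawn from a much narrower sequence space than duplex-forming ones.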
Conditions for annealing single-stranded or duplex tags to their single-stranded or duplex complements are well known, e.g. Ji et al, Anal. Chem. 65: 1323-1328 (1993).

Oligonucleotide tags of the invention may range in length from 12 to 60 nucleotides or basepairs. Preferably, oligonucleotide tags range in length from 18 to 40 nucleotides or basepairs. More preferably, oligonucleotide tags range in length from 25 to 40 nucleotides or basepairs. Most preferably, oligonucleotide tags are single stranded and specific hybridization occurs via Watson-Crick pairing with a tag complement.
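Because a tag is a whole number of equal-length subunits, each length range above translates into a range of subunits per tag. The sketch below does that arithmetic; it assumes all subunits in a tag share one length and adds no spacer or flanking sequence.

```python
def subunit_counts(word_length: int, min_len: int, max_len: int) -> list[int]:
    """Numbers of subunits per tag whose total length lies in [min_len, max_len]."""
    lo = -(-min_len // word_length)  # ceiling division
    hi = max_len // word_length
    return list(range(lo, hi + 1))


print(subunit_counts(4, 25, 40))  # [7, 8, 9, 10]
print(subunit_counts(5, 25, 40))  # [5, 6, 7, 8]
```

With 4-mer subunits, for example, the more preferred 25 to 40 nucleotide range corresponds to 7 to 10 subunits per tag.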

Oligonucleotide tags may be attached to many different classes of molecules by a variety of reactive functionalities well known in the art; e.g. Haugland, Handbook of Fluorescent Probes and Research Chemicals (Molecular Probes, Inc. Eugene, 1992); Khanna et al, U.S. Pat. No. 4,318,846; or the like. Table III provides exemplary functionalities and counterpart reactive groups that may reside on oligonucleotide tags or the molecules of interest. When the functionalities and counterpart reactants are reacted together, after activation in some cases, a linking group is formed. Moreover, as described more fully below, tags may be synthesized simultaneously with the molecules undergoing selection to form combinatorial chemical libraries.

TABLE III
Reactive Functionalities and Their Counterpart Reactants and Resulting Linking Groups

Reactive Functionality   Counterpart Functionality    Linking Group
—NH2                     —COOH                        —CO—NH—
—NH2                     —NCO                         —NHCONH—
—NH2                     —NCS                         —NHCSNH—
—NH2                     [structure not reproduced]   [structure not reproduced]
—SH                      —C═C—CO—                     —S—C—C—CO—
—NH2                     —CHO                         —CH2NH—
—NH2                     —SO2Cl                       —SO2NH—
—OH                      —OP(NCH(CH3)2)2              —OP(═O)(O)O—
—OP(═O)(O)S              —NHC(═O)CH2Br                —NHC(═O)CH2SP(═O)(O)O—

A class of molecules particularly convenient for the generation of combinatorial chemical libraries includes linear polymeric molecules of the form:
—(M—L)n
wherein L is a linker moiety and M is a monomer that may be selected from a wide range of chemical structures to provide a range of functions: from serving as an inert, non-sterically hindering spacer moiety to providing a reactive functionality that can serve as a branching point to attach other components, a site for attaching labels, a site for attaching oligonucleotides or other binding polymers for hybridizing or binding to a therapeutic target, or a site for attaching other groups for affecting solubility or promoting duplex and/or triplex formation, such as intercalators, alkylating agents, and the like. The sequence, and therefore the composition, of such linear polymeric molecules may be encoded within a polynucleotide attached to the tag, as taught by Brenner and Lerner (cited above). However, after a selection event, instead of amplifying and then sequencing the tag of the selected molecule, the tag itself or an additional coding segment can be sequenced directly, using a so-called “single base” approach described below, after releasing the molecule of interest, e.g. by restriction digestion of a site engineered into the tag. Clearly, any molecule produced by a sequence of chemical reaction steps compatible with the simultaneous synthesis of the tag moieties can be used in the generation of combinatorial libraries.

Conveniently, there is a wide diversity of phosphate-linked monomers available for generating combinatorial libraries. The following references disclose several phosphoramidite and/or hydrogen phosphonate monomers suitable for use in the present invention and provide guidance for their synthesis and inclusion into oligonucleotides: Newton et al, Nucleic Acids Research, 21: 1155-1162 (1993); Griffin et al, J. Am. Chem. Soc., 114: 7976-7982 (1992); Jaschke et al, Tetrahedron Letters, 34: 301-304 (1992); Ma et al, International application PCT/CA92/00423; Zon et al, International application PCT/US90/06630; Durand et al, Nucleic Acids Research, 18: 6353-6359 (1990); Salunkhe et al, J. Am. Chem. Soc., 114: 8768-8772 (1992); Urdea et al, U.S. Pat. No. 5,093,232; Ruth, U.S. Pat. No. 4,948,882; Cruickshank, U.S. Pat. No. 5,091,519; Haralambidis et al, Nucleic Acids Research, 15: 4857-4876 (1987); and the like. More particularly, M may be a straight chain, cyclic, or branched organic molecular structure containing from 1 to 20 carbon atoms and from 0 to 10 heteroatoms selected from the group consisting of oxygen, nitrogen and sulfur. Preferably, M is alkyl, alkoxy, alkenyl, or aryl containing from 1 to 16 carbon atoms; a heterocycle having from 3 to 8 carbon atoms and from 1 to 3 heteroatoms selected from the group consisting of oxygen, nitrogen, and sulfur; glycosyl; or nucleosidyl. More preferably, M is alkyl, alkoxy, alkenyl, or aryl containing from 1 to 8 carbon atoms; glycosyl; or nucleosidyl.

Preferably, L is a phosphorus(V) linking group which may be phosphodiester, phosphotriester, methyl or ethyl phosphonate, phosphorothioate, phosphorodithioate, phosphoramidate, or the like. Generally, linkages derived from phosphoramidite or hydrogen phosphonate precursors are preferred so that the linear polymeric units of the invention can be conveniently synthesized with commercial automated DNA synthesizers, e.g. Applied Biosystems, Inc. (Foster City, Calif.) model 394, or the like.

n may vary significantly depending on the nature of M and L. Usually, n varies from about 3 to about 100. When M is a nucleoside or analog thereof or a nucleoside-sized monomer and L is a phosphorus(V) linkage, then n varies from about 12 to about 100. Preferably, when M is a nucleoside or analog thereof or a nucleoside-sized monomer and L is a phosphorus(V) linkage, then n varies from about 12 to about 40.

Peptides are another preferred class of molecules to which tags of the invention are attached. Synthesis of peptide-oligonucleotide conjugates that may be used in the invention is taught in Nielsen et al, J. Am. Chem. Soc., 115: 9812-9813 (1993); Haralambidis et al (cited above) and International patent application PCT/AU88/004417; Truffert et al, Tetrahedron Letters, 35: 2353-2356 (1994); de la Torre et al, Tetrahedron Letters, 35: 2733-2736 (1994); and like references. Preferably, peptide-oligonucleotide conjugates are synthesized as described below. Peptides synthesized in accordance with the invention may consist of the natural amino acid monomers or of non-natural monomers, including the D isomers of the natural amino acids and the like.

Combinatorial chemical libraries employing tags of the invention are preferably prepared by the method disclosed in Nielsen et al (cited above) and illustrated in FIG. 4 for a particular embodiment. Briefly, a solid phase support, such as CPG, is derivatized with a cleavable linker that is compatible with both the chemistry employed to synthesize the tags and the chemistry employed to synthesize the molecule that will undergo some selection process. Preferably, tags are synthesized using phosphoramidite chemistry as described above and with the modifications recommended by Nielsen et al (cited above); that is, DMT-5′-O-protected 3′-phosphoramidite-derivatized subunits having methyl-protected phosphite and phosphate moieties are added in each synthesis cycle. Library compounds are preferably built from monomers having Fmoc (or equivalent) protecting groups masking the functionality to which the successive monomer will be coupled. A suitable linker for chemistries employing both DMT and Fmoc protecting groups (referred to herein as a sarcosine linker) is disclosed by Brown et al, J. Chem. Soc. Chem. Commun. 1989: 891-893, which reference is incorporated by reference.

FIG. 4 illustrates a scheme for generating a combinatorial chemical library of peptides conjugated to oligonucleotide tags. Solid phase support 200 is derivatized by sarcosine linker 205 (exemplified in the formula below) as taught by Nielsen et al (cited above), which has an extended linking moiety to facilitate reagent access.

(CPG)-NHC(O)CN(CH3)C(O)CH2CH2C(O)O(CH2)6NHC(O)CH2(O-DMT)NC(O)-
CH2O(CH2CH2O)2CH2CH2NHC(O)CH2O(CH2CH2O)2CH2CH2NH-Fmoc

Here “CPG” represents a controlled-pore glass support, “DMT” represents dimethoxytrityl, and “Fmoc” represents 9-fluorenylmethoxycarbonyl.

In a preferred embodiment, an oligonucleotide segment 214 is synthesized initially so that, in double stranded form, a restriction endonuclease site is provided for cleaving the library compound after sorting onto a microparticle, or like substrate. Synthesis proceeds by successive alternating additions of subunits S1, S2, S3, and the like, to form tag 212, and of their corresponding library compound monomers A1, A2, A3, and the like, to form library compound 216. A “split and mix” technique is employed to generate diversity.

The subunits in a minimally cross-hybridizing set code for the monomer added in the library compound. Thus, a nine-word set can unambiguously encode library compounds constructed from nine monomers. If some ambiguity is acceptable, then a single subunit may encode more than one monomer.
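The one-subunit-per-monomer coding can be sketched in Python as follows. The nine 4-mer words and the monomer names `A1` to `A9` are hypothetical placeholders, not the subunit set of the invention; the words were chosen so that any two differ at two or more positions, and the sketch checks that property before encoding a compound (monomers in synthesis order) into a tag and decoding it back.

```python
from itertools import combinations

# Hypothetical nine-word subunit set (4-mers); any two words differ at >= 2 positions.
SUBUNITS = ["ACGT", "CGTG", "GTAC", "TACA", "AGCT", "CTAA", "GATC", "TCGG", "ATGC"]
# Hypothetical monomer names, echoing the A1, A2, A3 labels of FIG. 4.
MONOMERS = [f"A{i}" for i in range(1, 10)]
CODE = dict(zip(MONOMERS, SUBUNITS))
DECODE = {word: monomer for monomer, word in CODE.items()}


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def check_min_cross_hybridizing(words, min_mismatch: int = 2) -> bool:
    """True if every pair of words differs at >= min_mismatch positions."""
    return all(mismatches(a, b) >= min_mismatch for a, b in combinations(words, 2))


def encode(compound: list[str]) -> str:
    """Tag for a compound given as its monomers in synthesis order."""
    return "".join(CODE[m] for m in compound)


def decode(tag: str, word_len: int = 4) -> list[str]:
    """Recover the monomer sequence by reading the tag one subunit at a time."""
    if len(tag) % word_len:
        raise ValueError("tag length is not a multiple of the subunit length")
    return [DECODE[tag[i:i + word_len]] for i in range(0, len(tag), word_len)]


if __name__ == "__main__":
    assert check_min_cross_hybridizing(SUBUNITS)
    tag = encode(["A3", "A1", "A7"])
    print(tag, decode(tag))
```

Because each subunit is added in the same cycle as its monomer, reading the tag word by word recovers the synthesis history of the compound; allowing one word to stand for several monomers would simply make `DECODE` one-to-many.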

After synthesis is completed, the product is cleaved and deprotected (220) to form tagged library compound 225, which then undergoes selection 230, e.g. binding to a predetermined target 235, such as a protein. The subset of library compounds recovered from selection process 230 is then sorted (24) onto a solid phase support 245 via their tag moieties (their complementary subunits and nucleotides are shown in italics). After ligating oligonucleotide splint 242 to tag complement 250 to form restriction site 225, the conjugate is digested with the corresponding restriction endonuclease to cleave the library compound, a peptide in the example of FIG. 4, from the oligonucleotide moiety. The sequence of the tag, and hence the identity of the library compound, is then determined by the preferred single base sequencing technique of the invention, described below.

Solid phase supports for use with the invention may have a wide variety of forms, including microparticles, beads, membranes, slides, plates, micromachined chips, and the like. Likewise, solid phase supports of the invention may comprise a wide variety of compositions, including glass, plastic, silicon, alkanethiolate-derivatized gold, cellulose, low cross-linked and high cross-linked polystyrene, silica gel, polyamide, and the like. Preferably, either a population of discrete particles is employed such that each has a uniform coating, or population, of complementary sequences of the same tag (and no other), or a single or a few supports are employed with spatially discrete regions each containing a uniform coating, or population, of complementary sequences to the same tag (and no other). In the latter embodiment, the area of the regions may vary according to particular applications; usually, the regions range in area from several μm², e.g. 3-5, to several hundred μm², e.g. 100-500. Preferably, such regions are spatially discrete so that signals generated by events, e.g. fluorescent emissions, at adjacent regions can be resolved by the detection system being employed. In some applications, it may be desirable to have regions with uniform coatings of more than one tag complement, e.g. for simultaneous sequence analysis, or for bringing separately tagged molecules into close proximity.

Tag complements may be used with the solid phase support that they are synthesized on, or they may be separately synthesized and attached to a solid phase support for use, e.g. as disclosed by Lund et al, Nucleic Acids Research, 16: 10861-10880 (1988); Albretsen et al, Anal. Biochem., 189: 40-50 (1990); Wolf et al, Nucleic Acids Research, 15: 2911-2926 (1987); or Ghosh et al, Nucleic Acids Research, 15: 5353-5372 (1987). Preferably, tag complements are synthesized on and used with the same solid phase support, which may comprise a variety of forms and include a variety of linking moieties. Such supports may comprise microparticles or arrays, or matrices, of regions where uniform populations of tag complements are synthesized. A wide variety of microparticle supports may be used with the invention, including microparticles made of controlled pore glass (CPG), highly cross-linked polystyrene, acrylic copolymers, cellulose, nylon, dextran, latex, polyacrolein, and the like, disclosed in the following exemplary references: Meth. Enzymol., Section A, pages 11-147, vol. 44 (Academic Press, New York, 1976); U.S. Pat. Nos. 4,678,814; 4,413,070; and 4,046,720; and Pon, Chapter 19, in Agrawal, editor, Methods in Molecular Biology, Vol. 20 (Humana Press, Totowa, N.J., 1993). Microparticle supports further include commercially available nucleoside-derivatized CPG and polystyrene beads (e.g. available from Applied Biosystems, Foster City, Calif.); derivatized magnetic beads; polystyrene grafted with polyethylene glycol (e.g. TentaGel™, Rapp Polymere, Tübingen, Germany); and the like. Selection of the support characteristics, such as material, porosity, size, shape, and the like, and of the type of linking moiety employed depends on the conditions under which the tags are used. For example, in applications involving successive processing with enzymes, supports and linkers that minimize steric hindrance of the enzymes and that facilitate access to substrate are preferred.
Exemplary linking moieties are disclosed in Pon et al, Biotechniques, 6: 768-775 (1988); Webb, U.S. Pat. No. 4,659,774; Barany et al, International patent application PCT/US91/06103; Brown et al, J. Chem. Soc. Chem. Commun., 1989: 891-893; Damha et al, Nucleic Acids Research, 18: 3813-3821 (1990); Beattie et al, Clinical Chemistry, 39: 719-722 (1993); Maskos and Southern, Nucleic Acids Research, 20: 1679-1684 (1992); and the like.

As mentioned above, tag complements may also be synthesized on a single (or a few) solid phase support to form an array of regions uniformly coated with tag complements. That is, within each region in such an array the same tag complement is synthesized. Techniques for synthesizing such arrays are disclosed in McGall et al, International application PCT/US93/03767; Pease et al, Proc. Natl. Acad. Sci., 91: 5022-5026 (1994); Southern and Maskos, International application PCT/GB89/01114; Maskos and Southern (cited above); Southern et al, Genomics, 13: 1008-1017 (1992); and Maskos and Southern, Nucleic Acids Research, 21: 4663-4669 (1993).

Preferably, the invention is implemented with microparticles or beads uniformly coated with complements of the same tag sequence. Microparticle supports and methods of covalently or noncovalently linking oligonucleotides to their surfaces are well known, as exemplified by the following references: Beaucage and Iyer (cited above); Gait, editor, Oligonucleotide Synthesis: A Practical Approach (IRL Press, Oxford, 1984); and the references cited above. Generally, the size and shape of a microparticle is not critical; however, microparticles in the size range of a few, e.g. 1-2, to several hundred, e.g. 200-1000 μm diameter are preferable, as they facilitate the construction and manipulation of large repertoires of oligonucleotide tags with minimal reagent and sample usage.

Preferably, commercially available controlled-pore glass (CPG) or polystyrene supports are employed as solid phase supports in the invention. Such supports are available with base-labile linkers and initial nucleosides attached, e.g. from Applied Biosystems (Foster City, Calif.). Preferably, microparticles having pore sizes between 500 and 1000 angstroms are employed.

An important aspect of the invention is the sorting of populations of identical polynucleotides, e.g. from a cDNA library, and their attachment to microparticles or separate regions of a solid phase support such that each microparticle or region has only a single kind of polynucleotide. This latter condition can be essentially met by ligating a repertoire of tags to a population of polynucleotides followed by cloning and sampling of the ligated sequences. A repertoire of oligonucleotide tags can be ligated to a population of polynucleotides in a number of ways, such as through direct enzymatic ligation, amplification, e.g. via PCR, using primers containing the tag sequences, and the like. The initial ligating step produces a very large population of tag-polynucleotide conjugates such that a single tag is generally attached to many different polynucleotides. However, by taking a sufficiently small sample of the conjugates, the probability of obtaining “doubles,” i.e. the same tag on two different polynucleotides, can be made negligible. (Note that it is also possible to obtain different tags with the same polynucleotide in a sample. This case simply leads to a polynucleotide being processed, e.g. sequenced, twice.) As explained more fully below, the probability of obtaining a double in a sample can be estimated by a Poisson distribution, since the number of conjugates in a sample will be large, e.g. on the order of thousands or more, and the probability of selecting a particular tag will be small because the tag repertoire is large, e.g. on the order of tens of thousands or more. Generally, the larger the sample, the greater the probability of obtaining a double. Thus, a design trade-off exists between selecting a large sample of tag-polynucleotide conjugates, which, for example, ensures adequate coverage of a target polynucleotide in a shotgun sequencing operation, and selecting a small sample, which ensures that a minimal number of doubles will be present.
In most embodiments, the presence of doubles merely adds an additional source of noise or, in the case of sequencing, a minor complication in scanning and signal processing, as microparticles giving multiple fluorescent signals can simply be ignored. As used herein, the term “substantially all” in reference to attaching tags to molecules, especially polynucleotides, is meant to reflect the statistical nature of the sampling procedure employed to obtain a population of tag-molecule conjugates essentially free of doubles. The meaning of substantially all in terms of actual percentages of tag-molecule conjugates depends on how the tags are being employed. Preferably, for nucleic acid sequencing, substantially all means that at least eighty percent of the tags have unique polynucleotides attached. More preferably, it means that at least ninety percent of the tags have unique polynucleotides attached. Still more preferably, it means that at least ninety-five percent of the tags have unique polynucleotides attached. Most preferably, it means that at least ninety-nine percent of the tags have unique polynucleotides attached.
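The Poisson estimate can be made concrete with a short Python sketch. It assumes each sampled conjugate carries a tag drawn uniformly from a repertoire of R tags, so that for a sample of N conjugates the number sharing any given tag is Poisson with mean λ = N/R. The fraction of tags present in the sample that carry exactly one polynucleotide is then λe^(−λ)/(1 − e^(−λ)) = λ/(e^λ − 1), which falls as the sample grows. The function names and the repertoire size below are illustrative assumptions.

```python
import math


def fraction_unique(sample_size: int, repertoire_size: int) -> float:
    """Fraction of tags present in the sample that carry exactly one polynucleotide.

    Poisson approximation with mean lam = sample_size / repertoire_size:
    P(exactly one) / P(at least one) = lam / (exp(lam) - 1).
    """
    if sample_size <= 0:
        return 1.0
    lam = sample_size / repertoire_size
    return lam / math.expm1(lam)  # expm1 keeps precision when lam is small


def max_sample_size(repertoire_size: int, threshold: float) -> int:
    """Largest sample size whose unique-tag fraction is still >= threshold (0 < threshold < 1)."""
    lo, hi = 0, 1
    # fraction_unique decreases with sample size: grow hi past the threshold, then bisect.
    while fraction_unique(hi, repertoire_size) >= threshold:
        lo, hi = hi, hi * 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fraction_unique(mid, repertoire_size) >= threshold:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    repertoire = 100_000  # hypothetical tag repertoire size
    for pct in (0.80, 0.90, 0.95, 0.99):
        n = max_sample_size(repertoire, pct)
        print(f"{pct:.0%} unique: sample at most {n} conjugates")
```

Under these assumptions, keeping ninety-nine percent of the sampled tags unique requires λ of roughly 0.02, i.e. a sample of about two percent of the repertoire, which is the trade-off between coverage and doubles described above.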

Preferably, when the population of polynucleotides is messenger RNA (mRNA), oligonucleotide tags are attached by reverse transcribing the mRNA with a set of primers containing complements of tag sequences. An exemplary set of such primers could have the following sequence:
5′-mRNA-[A]n-3′
[T]19GG[W,W,W,C]9ACCAGCTGATC-5′-biotin
where “[W,W,W,C]9” represents the sequence of an oligonucleotide tag of nine subunits of four nucleotides each and “[W,W,W,C]” represents the subunit sequences listed above, i.e. “W” represents T or A. The underlined sequences identify an optional restriction endonuclease site that can be used to release the polynucleotide from attachment to a solid phase support via the biotin, if one is employed. For the above primer, the complement attached to a microparticle could have the form

After reverse transcription, the mRNA is removed, e.g. by RNase H digestion, and the second strand of the cDNA is synthesized using, for example, a primer of the following form (SEQ ID NO:6):


where N is any one of A, T, G, or C; R is a purine-containing nucleotide; and Y is a pyrimidine-containing nucleotide. This particular primer creates a Bst YI restriction site in the resulting double stranded DNA which, together with the Sal I site, facilitates cloning into a vector with, for example, Bam HI and Xho I sites. After Bst YI and Sal I digestion, the exemplary conjugate would have the form
GGT[G,W,W,W]9CC[A]19-cDNA-NNNYCTAG-5′
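Why Bst YI/Sal I fragments can be cloned into a Bam HI/Xho I-digested vector can be checked with a small Python sketch. Each of these four enzymes cuts its six-base palindromic site after the first base on each strand, leaving a four-base 5′ overhang, and two ends can be ligated when their overhangs match. The recognition sites used (Bam HI G^GATCC, Bst YI R^GATCY, Sal I G^TCGAC, Xho I C^TCGAG) are the standard ones; the helper names are illustrative.

```python
from itertools import combinations

# Recognition site and cut offset from the 5' end of each strand.
ENZYMES = {
    "Bam HI": ("GGATCC", 1),
    "Bst YI": ("RGATCY", 1),  # R = A or G, Y = C or T
    "Sal I": ("GTCGAC", 1),
    "Xho I": ("CTCGAG", 1),
}


def overhang(site: str, cut: int) -> str:
    """5' overhang left by a palindromic site cut `cut` bases in from each end.

    Assumes a staggered cut producing a 5' overhang (cut < len(site) / 2).
    """
    return site[cut:len(site) - cut]


def compatible_pairs(enzymes: dict) -> list[tuple[str, str]]:
    """Pairs of different enzymes whose sticky ends can be ligated together."""
    ends = {name: overhang(site, cut) for name, (site, cut) in enzymes.items()}
    return [(a, b) for a, b in combinations(sorted(ends), 2) if ends[a] == ends[b]]


if __name__ == "__main__":
    for name, (site, cut) in ENZYMES.items():
        print(name, overhang(site, cut))
    print(compatible_pairs(ENZYMES))
```

The degenerate R and Y positions of the Bst YI site fall outside the overhang, so every Bst YI end carries GATC and pairs with a Bam HI end, while Sal I and Xho I ends share TCGA.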
Preferably, when the ligation-based method of sequencing is employed, the Bst YI and Sal I digested fragments are cloned into a Bam HI-/Xho I-digested vector having the following single-copy restriction sites (SEQ ID NO:1):
5′-GAGGATGCCTTTATGGATCCACTCGAGATCCCAATCCA-3′
     Fok I       Bam HI Xho I
This adds the Fok I site which will allow initiation of the sequencing process discussed more fully below.
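The positions of these sites in SEQ ID NO:1 can be located with a short Python sketch that searches both strands. Fok I recognizes the non-palindromic sequence GGATG and cuts 9 and 13 nucleotides downstream of it, leaving a four-nucleotide 5′ overhang, so its site is searched on both strands; Bam HI (GGATCC) and Xho I (CTCGAG) are palindromes. The variable and function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# SEQ ID NO:1, as listed above.
SEQ_ID_NO_1 = "GAGGATGCCTTTATGGATCCACTCGAGATCCCAATCCA"
SITES = {"Fok I": "GGATG", "Bam HI": "GGATCC", "Xho I": "CTCGAG"}


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def find_sites(seq: str, sites: dict[str, str]) -> dict[str, list[tuple[int, str]]]:
    """Map each enzyme to (position, strand) hits; '-' means the site is on the bottom strand."""
    result = {}
    for name, motif in sites.items():
        hits = [(p, "+") for p in find_all(seq, motif)]
        rc = revcomp(motif)
        if rc != motif:  # non-palindromic site, e.g. Fok I
            hits += [(p, "-") for p in find_all(seq, rc)]
        result[name] = sorted(hits)
    return result


if __name__ == "__main__":
    print(find_sites(SEQ_ID_NO_1, SITES))
```

Each enzyme is found exactly once, on the top strand, which is consistent with the sites being described as single-copy.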

A general method for exposing the single stranded tag after amplification involves digesting a target polynucleotide-containing conjugate with the
CCCGG(**)(**)(**)(**)(**)(**)(**)(**)(**)TTCGAp-5′
The mixture of duplexes is then ligated into a Sma I/Hind III-digested M13mp19. A repertoire of tag complements is synthesized on CPG microparticles as described above.

Next, the following adapter (SEQ ID NO:2 and SEQ ID NO:7) is prepared, which contains a Fok I site and portions of the Eco RI and Sma I sites:

5′-pAATTCGGATGATGCATGCATCGACCC
       GCCTACTACGTACGTAGCTGGGp-5′
Eco RI   Fok I             Sma I

The adapter is ligated into the Eco RI/Sma I digested M13 described above.

Separately, SV40 DNA is fragmented by sonication following the protocol set forth in Sambrook et al (cited above). The resulting fragments are repaired using standard protocols and separated by size. Fragments in the range of 300-500 basepairs are selected and ligated into the Sma I digested M13 described above to form a library of fragment-tag conjugates, which is then amplified. A sample containing several thousand different fragment-tag conjugates is taken from the library, further amplified, and the fragment-tag inserts are excised by digesting with Eco RI and Hind III. The excised fragment-tag conjugates are treated with T4 DNA polymerase in the presence of deoxycytidine triphosphate, as described in Example I, to expose the oligonucleotide tags for specific hybridization to the CPG microparticles.

After hybridization and ligation, as described in Example I, the loaded microparticles are treated with Fok I to produce a 4-nucleotide protruding strand of a predetermined sequence. A 10:1 mixture (probe 1:probe 2) of the following probes (SEQ ID NO:3, SEQ ID NO:8, SEQ ID NO:9, and SEQ ID NO:10) is ligated to the polynucleotides on the microparticles.

Probe 1 FAM- ATCGGATGAC
TAGCCTACTGAGCT
Probe 2 biotin- ATCGGATGAC
TAGCCTACTGAGCT

FAM represents a fluorescein dye attached to the 5′-hydroxyl of the top strand of Probe 1 through an aminophosphate linker available from Applied Biosystems (Aminolinker). The biotin may also be attached through an Aminolinker moiety and optionally may be further extended via polyethylene oxide linkers, e.g. Jaschke et al (cited above).

The loaded microparticles are then deposited on the surface of an avidinated glass slide to which and from which reagents and wash solutions can be delivered and removed. The avidinated slide with the attached microparticles is examined with a scanning fluorescent microscope (e.g. Zeiss Axioskop equipped with a Newport Model PM500-C motion controller, a Spectra-Physics Model 2020 argon ion laser producing a 488 nm excitation beam, and a 520 nm long-pass emission filter, or like apparatus). The excitation beam and fluorescent emissions are delivered and collected, respectively, through the same objective lens. The excitation beam and collected fluorescence are separated by a dichroic mirror which directs the collected fluorescence through a series of bandpass filters and to photon-counting devices corresponding to the fluorophores being monitored, e.g. comprising Hamamatsu model 9403-02 photomultipliers, a Stanford Research Systems model SR445 amplifier and model SR430 multichannel scaler, and a digital computer, e.g. a 486-based computer. The computer generates a two dimensional map of the slide which registers the positions of the microparticles.

After cleavage with Fok I to remove the initial probe, the polynucleotides on the attached microparticles undergo 20 cycles of probe ligation, washing, detection, cleavage, and washing, in accordance with the preferred single base sequencing methodology described below. Within each detection step, the scanning system records the fluorescent emission corresponding to the base identified at each microparticle. Reactions and washes below are generally carried out with the manufacturer's (New England Biolabs') recommended buffers for the enzymes employed, unless otherwise indicated. Standard buffers are also described in Sambrook et al (cited above).

The following four sets of mixed probes (SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, and SEQ ID NO:15) are provided for addition to the target polynucleotides:

TAMRA- ATCGGATGACATCAAC
TAGCCTACTGTAGTTGANNN
FAM- ATCGGATGACATCAAC
TAGCCTACTGTAGTTGCNNN
ROX- ATCGGATGACATCAAC
TAGCCTACTGTAGTTGGNNN
JOE- ATCGGATGACATCAAC
TAGCCTACTGTAGTTGTNNN

where TAMRA, FAM, ROX and JOE are spectrally resolvable fluorescent labels attached by way of Aminolinker II (all being available from Applied Biosystems, Inc., Foster City, Calif.); the bold-faced nucleotides (GGATG in the upper strand) are the recognition site for Fok I endonuclease; and “N” represents any one of the four nucleotides, A, C, G, T. TAMRA (tetramethylrhodamine), FAM (fluorescein), ROX (rhodamine X), and JOE (2′,7′-dimethoxy-4′,5′-dichlorofluorescein) and their attachment to oligonucleotides are also described in Fung et al, U.S. Pat. No. 4,855,225.
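The relationship between the four probes can be expressed in a short Python sketch that rebuilds each lower strand from the shared upper strand: the complement of the upper strand, written 3′→5′ beneath it, followed by one discriminating base and three degenerate N positions. The dye for each discriminating base is read from the list above; treating the template base as the Watson-Crick partner of that discriminating base is an assumption of the sketch, as are the helper names.

```python
# Shared upper strand (5'->3') of the four probes listed above.
UPPER = "ATCGGATGACATCAAC"
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Discriminating base of each probe's lower strand -> label, as listed above.
DYE_FOR_BASE = {"A": "TAMRA", "C": "FAM", "G": "ROX", "T": "JOE"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def lower_strand(upper: str, base: str) -> str:
    """Lower strand written 3'->5' under the upper strand: complement, base, NNN."""
    return "".join(PAIR[b] for b in upper) + base + "NNN"


def probe_set(upper: str = UPPER) -> dict[str, str]:
    """Lower strand of each labeled probe, keyed by dye."""
    return {DYE_FOR_BASE[b]: lower_strand(upper, b) for b in "ACGT"}


def call_template_base(dye: str) -> str:
    """Assumed read-out: the template base pairs with the probe's discriminating base."""
    return PAIR[BASE_FOR_DYE[dye]]


if __name__ == "__main__":
    for dye, strand in probe_set().items():
        print(f"{dye:5s} {UPPER}\n      {strand}")
```

All four rebuilt lower strands begin TAGCCTACTG, the complement of the shared ATCGGATGAC segment that carries the Fok I site, and differ only at the discriminating position.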

The above probes are incubated in approximately 5-fold molar excess over the target polynucleotide ends as follows: the probes are incubated for 60 minutes at 16° C. with 200 units of T4 DNA ligase and the anchored target polynucleotide in T4 DNA ligase buffer; after washing, the target polynucleotide is then incubated with 100 units of T4 polynucleotide kinase in the manufacturer's recommended buffer for 30 minutes at 37° C., washed, and again incubated for 30 minutes at 16° C. with 200 units of T4 DNA ligase and the anchored target polynucleotide in T4 DNA ligase buffer. Washing is accomplished by successively flowing volumes of wash buffer, e.g. TE, disclosed in Sambrook et al (cited above), over the slide. After the cycle of ligation-phosphorylation-ligation and a final washing, the attached microparticles are scanned for the presence of fluorescent label, the positions and characteristics of which are recorded by the scanning system. The labeled target polynucleotide, i.e. the ligated complex, is then incubated with 10 units of Fok I in the manufacturer's recommended buffer for 30 minutes at 37° C., followed by washing in TE. As a result, the target polynucleotide is shortened by one nucleotide on each strand and is ready for the next cycle of ligation and cleavage. The process is continued until twenty nucleotides are identified.
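The bookkeeping of this protocol can be followed in a small Python simulation. It assumes an idealized reaction in which every cycle ligates exactly the probe whose discriminating base pairs with the next exposed template base and Fok I then removes exactly one nucleotide; the template string is hypothetical, and the time tally counts only the timed incubations given above (ligation, kinase, second ligation, Fok I), not washes or scanning.

```python
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}
DYE_FOR_PROBE_BASE = {"A": "TAMRA", "C": "FAM", "G": "ROX", "T": "JOE"}
PROBE_BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_PROBE_BASE.items()}
# Timed incubations per cycle, in minutes, from the protocol above.
INCUBATION_MINUTES = {"ligation": 60, "kinase": 30, "second ligation": 30, "Fok I": 30}


def run_cycles(template: str, n_cycles: int = 20) -> list[str]:
    """Dyes recorded at one microparticle over n_cycles of ligate/scan/cleave (idealized)."""
    dyes = []
    remaining = template
    for _ in range(n_cycles):
        if not remaining:
            break
        probe_base = PAIR[remaining[0]]               # matching probe is ligated
        dyes.append(DYE_FOR_PROBE_BASE[probe_base])  # scan records its label
        remaining = remaining[1:]                    # Fok I shortens the target by one
    return dyes


def decode(dyes: list[str]) -> str:
    """Template bases implied by the recorded dyes."""
    return "".join(PAIR[PROBE_BASE_FOR_DYE[d]] for d in dyes)


def incubation_minutes(n_cycles: int = 20) -> int:
    """Total timed incubation for n_cycles, excluding washes and scanning."""
    return n_cycles * sum(INCUBATION_MINUTES.values())


if __name__ == "__main__":
    template = "GATTACAGATTACAGATTACAGG"  # hypothetical 23-base template
    dyes = run_cycles(template)
    print(dyes)
    print(decode(dyes))  # first 20 template bases
    print(incubation_minutes(20) / 60, "hours of timed incubation")
```

With these assumptions, twenty cycles involve 20 × 150 = 3000 minutes, i.e. 50 hours, of timed incubation alone.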

APPENDIX I
Exemplary computer program for generating minimally cross-hybridizing sets
      program minxh
c
c     Generates minimally cross-hybridizing sets of subunits.
c     Bases are coded 1, 2, 3 (a three-letter alphabet).  The
c     four nested loops below hard-wire a subunit length of
c     four, so enter 4 at the prompt.  The output file
c     sub4.dat must not already exist (status='new').
c
      integer*2 sub1(6),mset1(1000,6),mset2(1000,6)
      dimension nbase(6)
c
      write(*,*)'ENTER SUBUNIT LENGTH'
      read(*,100)nsub
  100 format(i1)
      open(1,file='sub4.dat',form='formatted',status='new')
c
c     Each possible subunit in turn seeds a set.
c
      nset=0
      do 7000 m1=1,3
      do 7000 m2=1,3
      do 7000 m3=1,3
      do 7000 m4=1,3
      sub1(1)=m1
      sub1(2)=m2
      sub1(3)=m3
      sub1(4)=m4
c
      ndiff=3
c
c     Generate set of subunits differing from
c     sub1 by at least ndiff nucleotides.
c     Save in mset1.
c
      jj=1
      do 900 j=1,nsub
  900 mset1(1,j)=sub1(j)
c
      do 1000 k1=1,3
      do 1000 k2=1,3
      do 1000 k3=1,3
      do 1000 k4=1,3
c
      nbase(1)=k1
      nbase(2)=k2
      nbase(3)=k3
      nbase(4)=k4
c
      n=0
      do 1200 j=1,nsub
      if(sub1(j).eq.1 .and. nbase(j).ne.1 .or.
     1   sub1(j).eq.2 .and. nbase(j).ne.2 .or.
     2   sub1(j).eq.3 .and. nbase(j).ne.3) then
      n=n+1
      endif
 1200 continue
c
c     If number of mismatches
c     is greater than or equal
c     to ndiff then record
c     subunit in matrix mset1
c
      if(n.ge.ndiff) then
      jj=jj+1
      do 1100 i=1,nsub
 1100 mset1(jj,i)=nbase(i)
      endif
c
 1000 continue
c
      do 1325 j2=1,nsub
      mset2(1,j2)=mset1(1,j2)
 1325 mset2(2,j2)=mset1(2,j2)
c
c     Compare subunit 2 from
c     mset1 with each successive
c     subunit in mset1, i.e. 3,
c     4, 5, ... etc.  Save those
c     with mismatches .ge. ndiff
c     in matrix mset2 after
c     position 2.
c     Next transfer contents
c     of mset2 into mset1 and
c     start comparisons again, this
c     time starting with subunit 3.
c     Continue until all subunits
c     undergo the comparisons.
c
      npass=0
c
 1700 continue
      kk=npass+2
      npass=npass+1
c
      do 1500 m=npass+2,jj
      n=0
      do 1600 j=1,nsub
      if(mset1(npass+1,j).eq.1.and.mset1(m,j).ne.1.or.
     2   mset1(npass+1,j).eq.2.and.mset1(m,j).ne.2.or.
     3   mset1(npass+1,j).eq.3.and.mset1(m,j).ne.3) then
      n=n+1
      endif
 1600 continue
      if(n.ge.ndiff) then
      kk=kk+1
      do 1625 i=1,nsub
 1625 mset2(kk,i)=mset1(m,i)
      endif
 1500 continue
c
c     kk is the number of subunits
c     stored in mset2
c
c     Transfer contents of mset2
c     into mset1 for next pass.
c
      do 2000 k=1,kk
      do 2000 m=1,nsub
 2000 mset1(k,m)=mset2(k,m)
c
c     Repeat until every subunit left
c     in the set has served as the
c     reference for a pass.
c
      jj=kk
      if(npass+2.lt.jj) goto 1700
c
      nset=nset+1
      write(1,7009)
 7009 format(/)
      do 7008 k=1,kk
 7008 write(1,7010)(mset1(k,m),m=1,nsub)
 7010 format(4i1)
      write(*,*)
      write(*,120) kk,nset
  120 format(1x,'Subunits in set=',i5,2x,'Set No=',i5)
 7000 continue
      close(1)
c
      end

Brenner, Sydney

5104791, Feb 09 1988 DADE BEHRING INC ; BADE BEHRING INC Particle counting nucleic acid hybridization assays
5206143, Nov 01 1985 SmithKline Beecham Corporation Method and reagents for performing subset analysis using quantitative differences in fluorescence intensity
5302509, Aug 14 1989 BECKMAN INSTRUMENTS, INC , A CORP OF DELAWARE Method for sequencing polynucleotides
5405746, Mar 23 1988 Qiagen GmbH Method of sequencing DNA
5482836, Jan 14 1993 Regents of the University of California, The DNA purification by triplex-affinity capture and affinity capture electrophoresis
5512439, Nov 21 1988 Life Technologies AS Oligonucleotide-linked magnetic particles and uses thereof
5518883, Jul 02 1992 Biospecific multiparameter assay method
5567627, Jul 16 1991 Trans-Med Biotech, Incorporated Method and composition for the simultaneous and discrete analysis of multiple analytes
CA2036946,
EP303459,
EP304845,
EP392546,
WO9003382,
WO9200091,
WO9210587,
WO9210588,
WO9306121,
WO9317126,
WO9321203,
WO9322680,
WO9322684,
WO9408051,
WO9520053,
Executed on | Assignor | Assignee | Conveyance | Frame/Reel
Oct 23 1996 | SPECTRAGEN | LYNX THERAPEUTICS, INC | MERGER; CHANGE OF NAME | 0192970508
Aug 02 1999 |  | Solexa, Inc. (assignment on the face of the patent) |  | 
Mar 04 2005 | LYNX THERAPEUTICS, INC | SOLEXA, INC | MERGER (SEE DOCUMENT FOR DETAILS) | 0193130433
Date | Maintenance Fee Events
Feb 09 2009 | REM: Maintenance Fee Reminder Mailed.
Feb 16 2009 | M1553: Payment of Maintenance Fee, 12th Year, Large Entity.
Feb 16 2009 | M1556: 11.5 yr surcharge - late pmt w/in 6 mo, Large Entity.

Date | Maintenance Schedule
Aug 21 2010 | 4 years fee payment window open
Feb 21 2011 | 6 months grace period start (w surcharge)
Aug 21 2011 | patent expiry (for year 4)
Aug 21 2013 | 2 years to revive unintentionally abandoned end. (for year 4)
Aug 21 2014 | 8 years fee payment window open
Feb 21 2015 | 6 months grace period start (w surcharge)
Aug 21 2015 | patent expiry (for year 8)
Aug 21 2017 | 2 years to revive unintentionally abandoned end. (for year 8)
Aug 21 2018 | 12 years fee payment window open
Feb 21 2019 | 6 months grace period start (w surcharge)
Aug 21 2019 | patent expiry (for year 12)
Aug 21 2021 | 2 years to revive unintentionally abandoned end. (for year 12)