In some embodiments, a non-transitory processor-readable medium includes code to cause a processor to receive a set of variants identified by a comparison of a test DNA sequence with a reference DNA sequence and associate at least one of the set of variants with at least one of a set of annotations each indicative of at least one criterion. The code includes code to cause the processor to filter, based on the set of annotations, the set of variants to identify a subset of variants from the set of variants. Each variant from the subset of variants is associated with at least one common annotation from the set of annotations. The code further includes code to cause the processor to present the subset of variants such that the subset of variants can be used to render a clinical diagnosis.
29. A method for identifying a genetic disorder in an individual, comprising:
determining a DNA sequence for a patient suspected of having a genetic disorder,
comparing the dna sequence with one or more reference sequences to identify a plurality of variants,
annotating, at an annotation module implemented in at least one of a memory or a processing device, each variant from the plurality of variants with at least one annotation from a set of annotations including at least one of:
(i) reported to cause a disorder and recognized to cause a disorder,
(ii) unreported but expected to cause a disorder,
(iii) unreported and of the type that might be causative of a disorder,
(iv) unreported and unlikely to cause a disorder,
(v) reported and recognized neutral variant, or
(vi) unknown or not expected to be causative of disease, but associated with clinical presentation,
filtering the plurality of variants on the basis of the set of annotations; and
identifying the presence or absence of the genetic disorder.
1. A non-transitory processor-readable medium storing code representing instructions to be executed by a processor, the code comprising code to cause the processor to:
receive a plurality of variants identified by a comparison of a test DNA sequence with a reference DNA sequence;
associate at least one variant from the plurality of variants with at least one annotation from a plurality of annotations including at least one of:
(i) reported to cause a disorder and recognized to cause a disorder,
(ii) unreported but expected to cause a disorder,
(iii) unreported and of the type that might be causative of a disorder,
(iv) unreported and unlikely to cause a disorder,
(v) reported and recognized neutral variant, or
(vi) unknown or not expected to be causative of disease, but associated with clinical presentation;
filter, based on the plurality of annotations, the plurality of variants to identify a set of variants from the plurality of variants, each variant from the set of variants being associated with at least one common annotation from the plurality of annotations; and
present the set of variants such that the set of variants can be used to render a clinical diagnosis.
2. The non-transitory processor-readable medium of
3. The non-transitory processor-readable medium of
4. The non-transitory processor-readable medium of
5. The non-transitory processor-readable medium of
6. The non-transitory processor-readable medium of
7. The non-transitory processor-readable medium of
8. The non-transitory processor-readable medium of any one of
9. The non-transitory processor-readable medium of
10. The non-transitory processor-readable medium of
11. The non-transitory processor-readable medium of
12. The non-transitory processor-readable medium of
13. The non-transitory processor-readable medium of
14. The non-transitory processor-readable medium of
15. The non-transitory processor-readable medium of
16. The non-transitory processor-readable medium of
17. The non-transitory processor-readable medium of
18. The non-transitory processor-readable medium of
19. The non-transitory processor-readable medium of
20. The non-transitory processor-readable medium of
21. The non-transitory processor-readable medium of
22. The non-transitory processor-readable medium of
23. The non-transitory processor-readable medium of
24. The non-transitory processor-readable medium of
25. The non-transitory processor-readable medium of
26. The non-transitory processor-readable medium of
27. The non-transitory processor-readable medium of
receive, in a second syntax different from the first syntax, a second plurality of variants identified by a comparison of a third nucleotide sequence with the second nucleotide sequence;
convert the first plurality of variants to a third syntax; and
convert the second plurality of variants to the third syntax.
28. The non-transitory processor-readable medium of
30. The method of
31. The method of
32. The method of
33. The method of
34. The method of
36. The method of
37. The method of
38. The method of
40. The method of
41. The method of
42. The method of
43. The method of
44. The method of
45. The method of
46. The method of
While the costs associated with genomic sequencing have dropped dramatically over the last several years, the costs and labor associated with data analysis have remained relatively constant. Thus, there is a great need for tools to support clinical genetics and enable the analysis of an individual's genomic data, for example, in identifying genetic variations potentially associated with a disease phenotype.
In some embodiments, a non-transitory processor-readable medium includes code to cause a processor to receive a set of genetic variants identified by a comparison of an experimental sample DNA sequence with a reference DNA sequence and associate at least one of the set of variants with at least one of a set of functional or genomic annotations each indicative of at least one criterion. The code includes code to cause the processor to filter, based on the set of annotations, the set of variants to identify a subset of variants from the set of variants. Each variant from the subset of variants is associated with at least one common annotation from the set of annotations. The code further includes code to cause the processor to present the subset of variants such that the subset of variants can be used to render a clinical diagnosis.
The invention provides methods for analysis of DNA sequence data and associated software and computer systems. The method, which is generally computer implemented, enables a clinical geneticist or other healthcare technician to sift through vast amounts of DNA sequence data, to identify potential disease-causing genomic variants. In some cases, the DNA sequence data is from a patient who may be suspected of having a genetic disorder.
Therefore, in one aspect, the invention provides a method for identifying a genetic disorder in an individual, or identifying a genetic variant that is causative of a phenotype in an individual. The method comprises determining a DNA sequence for a patient suspected of having a genetic disorder, identifying sequence variants, annotating the identified variants based on one or more criteria, and filtering or searching the variants at least partially based on the annotations, to thereby identify potential disease-causing variants.
In some embodiments, the sequence is obtained by use of a sequencing instrument, or alternatively, DNA sequence data is obtained from another source, such as, for example, a commercial sequencing service provider. The term “DNA sequence” as used herein refers not only to chromosomal sequence, but also to cDNA sequence, or any nucleotide sequence information that allows for detection of genetic disease. Generally, the amount of sequence information is such that computational tools are required for data analysis. For example, the sequence data may represent at least half of the individual's genomic or cDNA sequence (e.g., of a representative cell population or tissue), or the individual's entire genomic or cDNA sequence. In various embodiments, the sequence data comprises the nucleotide sequence for at least 1 million base pairs, at least 10 million base pairs, or at least 50 million base pairs. In certain embodiments, the DNA sequence is the individual's exome sequence or full exonic sequence component (i.e., the exome; sequence for each of the exons in each of the known genes in the entire genome). Further, the source of genomic DNA or cDNA may be any suitable source, and may be a sample particularly indicative of a disease or phenotype of interest, including blood cells (e.g., PBMCs, or a T-cell or B-cell population). In certain embodiments, the source of the sample is a tissue or sample that is potentially malignant.
As used herein, “whole genome sequence” includes the entire sequence (including all chromosomes) of an individual's germline genome. In some embodiments, the concatenated length for a whole genome sequence is approximately 3.2 Gbases or 3.2 billion nucleotides.
The DNA sequence may be determined by any suitable method. For example, the DNA sequence may be a cDNA sequence determined by clonal amplification (e.g., emulsion PCR) and sequencing. Base calling may be conducted based on any available method, including Sanger sequencing (chain termination), pH sequencing, pyrosequencing, sequencing-by-hybridization, sequencing-by-ligation, etc. The sequencing output data may be subject to quality controls, including filtering for quality (e.g., confidence) of base reads. Exemplary sequencing systems include 454 pyrosequencing (454 Life Sciences), Illumina (Solexa) sequencing, SOLiD (Applied Biosystems), and Ion Torrent Systems' pH sequencing system.
The DNA sequence is mapped against one or more reference sequences to identify sequence variants. For example, the base reads are mapped against a reference sequence, which in various embodiments is presumed to be a “normal” non-disease sequence. The DNA sequence derived from the Human Genome Project is generally used as a “premier” reference sequence. A number of mapping applications are known, including GSMAPPER, ELAND, MOSAIK, and MAQ. Various other alignment tools are known and could also be implemented to map the base reads.
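As a toy illustration of what mapping base reads against a reference involves, the following sketch indexes every k-mer of the reference, uses the first k-mer of each read as a seed, and keeps only the seed hits where the whole read matches exactly. Real mappers such as those named above tolerate mismatches and gaps and take base qualities into account; the function names here are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the 0-based positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, reference: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Return the 0-based reference positions where the read matches exactly.

    The read's first k-mer is the seed; each seed hit is then verified by
    comparing the full read against the reference at that position.
    """
    if len(read) < k:
        return []
    hits = []
    for start in index.get(read[:k], []):
        if reference[start:start + len(read)] == read:
            hits.append(start)
    return hits


reference = "ACGTACGTTAGCCGATAGGCTTACG"
index = build_kmer_index(reference, k=4)
print(map_read("TAGCCGAT", reference, index, k=4))  # [8]
```

Exact seeding keeps the sketch short; it would miss any read that carries a variant in its first k bases, which is one reason practical aligners use several seeds per read.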
Based on the sequence alignments and mapping results, sequence variants are identified. Types of variants include insertions, deletions, indels (a colocalized insertion and deletion), translocations, inversions, and substitutions. While the types of variants analyzed are not limited, the most numerous variant type will generally be single nucleotide substitutions, for which a wealth of data is currently available. In various embodiments, comparison of the test sequence with the reference sequence will produce at least 500 variants, at least 1,000 variants, at least 3,000 variants, at least 5,000 variants, or at least 10,000 variants, and in some embodiments will produce at least 1 million variants, at least 3 million variants, or at least 10 million variants. The tools provided herein enable the user to navigate these vast amounts of genetic data to identify potentially disease-causing variants.
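For the most common case, single nucleotide substitutions, variant identification can be illustrated as a pileup comparison: count the bases that aligned reads place at each reference position and report positions where the majority base differs from the reference. This sketch assumes gap-free reads that have already been placed on the reference as (start, sequence) pairs; the minimum-depth threshold and the names are hypothetical, and real variant callers use statistical genotype models rather than a simple majority.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Substitution:
    position: int  # 0-based reference coordinate
    ref: str       # reference base
    alt: str       # majority base observed in the reads
    depth: int     # number of reads covering the position


def call_substitutions(reference: str, placed_reads: list[tuple[int, str]],
                       min_depth: int = 2) -> list[Substitution]:
    """Call substitutions from gap-free reads given as (start, sequence) pairs."""
    pileup: list[Counter] = [Counter() for _ in reference]
    for start, seq in placed_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1
    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        alt, _ = counts.most_common(1)[0]
        if alt != reference[pos]:
            calls.append(Substitution(pos, reference[pos], alt, depth))
    return calls


reference = "ACGTACGT"
reads = [(0, "ACGAAC"), (2, "GAACGT"), (1, "CGAA")]
print(call_substitutions(reference, reads))
# [Substitution(position=3, ref='T', alt='A', depth=3)]
```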
A wealth of data is extracted for the identified variants, including one or more of conservation scores, genic/genomic location, zygosity, SNP ID, PolyPhen and SIFT predictions, splice site predictions, amino acid properties, disease associations, annotations for known variants, variant or allele frequency data, and gene annotations. Data may be calculated and/or extracted from one or more internal or external databases. Because certain categories of annotations (e.g., amino acid properties and PolyPhen and SIFT data) depend on the nature of the genomic region containing the variant (e.g., whether the variant lies within a region that is translated into the amino acid sequence of the resulting protein), these annotations can be carried out for each known transcript. Exemplary external databases include OMIM (Online Mendelian Inheritance in Man), HGMD (The Human Gene Mutation Database), PubMed, PolyPhen, SIFT, SpliceSite, reference genome databases, the University of California Santa Cruz (UCSC) genome database, the BioBase biological databases, the dbSNP Short Genetic Variations database, the Rat Genome Database (RGD), and/or the like. Various other databases may be employed for extracting data on identified variants. Variant information may further be stored in a central data repository, and the data extracted for future sequence analyses.
Based on extracted information, the variants are annotated to facilitate filtering of the variants, such that the variants meeting certain criteria can be easily identified. In some embodiments, variants are interpreted as being benign, pathogenic, or variants of unknown significance, based upon available information including for example, information from human mutational databases, and data stored in a central repository. In some embodiments, the variants are annotated as meeting, for example, one or more of the following criteria:
Alternative or additional bases for annotation may be employed, as long as the annotations essentially categorize variants as having a known association with disease, having a known neutral effect, or having an unknown effect. For variants having an unknown effect, various sub-categories are usually preferred to identify or score variants based on a predicted biological impact. In various embodiments, the annotations are assigned to sequence variants on the basis of a change in encoded amino acid, a change in an RNA processing site, or information from a human gene mutational database. For example, the variants may lead to a change in the encoded amino acid at one or more positions (a non-conservative substitution), and thus the annotation may take into account the damage prediction as a result of such amino acid change to determine whether a resulting phenotype change (and a resulting disorder) is likely. In some embodiments, annotations can be produced for all known transcripts. Annotations may be conducted automatically, as described in detail below, or conducted by independent examination, or both.
Once variants have been annotated, the variants can be filtered, based on the annotations, in response to user queries to identify potential disease causing variants, and/or to confirm or rule out the presence of a genetic disorder. Alternatively, one can browse variants according to other criteria of interest, including browsing or searching for variants in one or more genes of interest or a genetic locus of interest, or browsing or searching for variants with predicted biological impact, which may be based on polypeptide damage predictions, splice site predictions, known gene-disease associations, or matching with variants stored in a central data repository with patient phenotype information, or variant frequency. As described in detail herein, the invention in certain aspects provides a user interface to enable such filtering and searching.
Variants may be tagged by the user with additional descriptive information to aid subsequent analysis. For example, confidence in the existence of the variant can be recorded as confirmed, preliminary, or sequence artifact. Certain sequencing technologies have a tendency to produce certain types of sequence artifacts, and the invention allows such suspected artifacts to be recorded. The variants may be further tagged in basic categories of benign, pathogenic, or unknown, or as potentially of interest.
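User-applied tags of this kind could be modeled as enumerations attached to a variant record. The data model below is a hypothetical illustration, not one specified by the source; the class names, field names, and variant identifiers are invented for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Confidence(Enum):
    CONFIRMED = "confirmed"
    PRELIMINARY = "preliminary"
    SEQUENCE_ARTIFACT = "sequence artifact"


class Interpretation(Enum):
    BENIGN = "benign"
    PATHOGENIC = "pathogenic"
    UNKNOWN = "unknown"


@dataclass
class TaggedVariant:
    variant_id: str
    confidence: Confidence = Confidence.PRELIMINARY
    interpretation: Interpretation = Interpretation.UNKNOWN
    of_interest: bool = False
    notes: list[str] = field(default_factory=list)


def exclude_artifacts(variants: list[TaggedVariant]) -> list[TaggedVariant]:
    """Drop variants the user has flagged as suspected sequencing artifacts."""
    return [v for v in variants if v.confidence is not Confidence.SEQUENCE_ARTIFACT]


# Hypothetical identifiers, for illustration only.
v1 = TaggedVariant("chr1:12345A>G", Confidence.CONFIRMED, Interpretation.PATHOGENIC, of_interest=True)
v2 = TaggedVariant("chr2:500delT", Confidence.SEQUENCE_ARTIFACT)
kept = exclude_artifacts([v1, v2])
```

Recording suspected artifacts as a tag, rather than deleting the variant, keeps the call available if later evidence (for example, a confirmatory assay) changes its status.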
The method may employ a computer-readable medium, or “non-transitory processor-readable medium.”
In particular, queries can be run to identify variants meeting certain criteria, or variant report pages can be browsed by chromosomal position or by gene, the latter allowing researchers to focus on only those variations that exist in a particular set of genes of interest. In some embodiments, the user selects only variants with well-documented and published disease associations (e.g., by filtering based on HGMD or other disease annotation). Alternatively, the user can filter for variants not previously associated with disease, but of a type likely to be deleterious, such as those introducing frameshifts, non-synonymous substitutions predicted by PolyPhen or SIFT to be damaging, or premature terminations. Further, the user can exclude from analysis those variants believed to be neutral (based on their frequency of occurrence in studied populations), for example, through exclusion of variants in dbSNP. Additional exclusion criteria include mode of inheritance (e.g., heterozygosity), depth of coverage, and quality score.
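The filtering strategies in this paragraph can be composed as predicates over annotated variant records. The sketch below assumes a flat record with hypothetical fields (`consequence`, `damaging_prediction`, `in_hgmd`, `dbsnp_id`, `depth`, `quality`); the consequence labels and the default depth and quality thresholds are illustrative, not values taken from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedVariant:
    gene: str
    consequence: str           # e.g. "frameshift", "missense", "stop_gained", "synonymous"
    damaging_prediction: bool  # True if a PolyPhen/SIFT-style tool predicts damage
    in_hgmd: bool              # known disease association in a mutation database
    dbsnp_id: Optional[str]    # None if the variant is absent from dbSNP
    depth: int
    quality: float


# Consequence types treated as likely deleterious regardless of prediction tools.
LIKELY_DELETERIOUS = {"frameshift", "stop_gained"}


def is_novel_and_likely_deleterious(v: AnnotatedVariant) -> bool:
    """Not previously associated with disease, but of a type likely to be harmful."""
    if v.in_hgmd:
        return False
    if v.consequence in LIKELY_DELETERIOUS:
        return True
    return v.consequence == "missense" and v.damaging_prediction


def passes_exclusions(v: AnnotatedVariant, min_depth: int = 10,
                      min_quality: float = 20.0) -> bool:
    """Exclude presumed-neutral dbSNP variants and poorly supported calls."""
    return v.dbsnp_id is None and v.depth >= min_depth and v.quality >= min_quality


def candidate_variants(variants: list[AnnotatedVariant]) -> list[AnnotatedVariant]:
    """Apply both filters, preserving the input order."""
    return [v for v in variants
            if is_novel_and_likely_deleterious(v) and passes_exclusions(v)]
```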
In certain embodiments, base calling is carried out to extract the sequence of the sequencing reads from an image file produced by an instrument scanner. Following base calling and base quality trimming/filtering, the reads are mapped against a reference sequence (assumed to be normal for the phenotype under analysis) to identify variations (variants) between the two with the assumption that one or more of these differences will be associated with phenotype of the individual whose DNA is under analysis. Subsequently, each variant is annotated with data that can be used to determine the likelihood that that particular variant is associated with the phenotype under analysis. The analysis may be fully or partially automated as described in detail below, and may include use of a central repository for data storage and analysis, and to present the data to analysts and clinical geneticists in a format that makes identification of variants with a high likelihood of being associated with the phenotypic difference more efficient and effective.
In some embodiments, a user is provided with the ability to run cross-sample queries, in which the variants from multiple samples are interrogated simultaneously. In such embodiments, for example, a user can build a query to return data on only those variants that are exactly shared across a user-defined group of samples. This can be useful for family-based analyses where the same variant is believed to be associated with disease in each of the affected family members. For another example, the user can build a query to return only those variants that are present in genes containing at least one variant, though not necessarily the same variant, in each sample of the group. This can be useful where a group of individuals with disease are not related (the variants associated with the disease are not necessarily exactly the same, but result in a common alteration in normal function). For yet another example, the user can specify that genes containing variants in a user-defined group of samples be ignored. This can be useful to exclude polymorphisms (variants believed or confirmed not to be associated with disease) where the user has access to a user-defined group of control individuals who are believed not to have the disease-associated variant. For each of these queries a user can additionally filter the variants by specifying any or all of the previously discussed filters (using the advanced search shown, for example, in
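The three cross-sample query types described above reduce to set operations. In this sketch each sample is represented, by assumption, as a set of (gene, variant) pairs; the function names, gene symbols, and variant notations are hypothetical.

```python
Variant = tuple[str, str]  # (gene symbol, variant identifier)


def shared_variants(samples: list[set[Variant]]) -> set[Variant]:
    """Variants present, identically, in every sample (e.g. affected relatives)."""
    return set.intersection(*samples) if samples else set()


def genes_hit_in_all(samples: list[set[Variant]]) -> set[str]:
    """Genes carrying at least one variant, not necessarily the same one, in every sample."""
    gene_sets = [{gene for gene, _ in s} for s in samples]
    return set.intersection(*gene_sets) if gene_sets else set()


def exclude_control_genes(candidates: set[str], controls: list[set[Variant]]) -> set[str]:
    """Drop genes that carry any variant in the unaffected control group."""
    control_genes = {gene for s in controls for gene, _ in s}
    return candidates - control_genes


affected = [
    {("GENE1", "c.100A>G"), ("GENE2", "c.5del"), ("GENE3", "c.7C>T")},
    {("GENE1", "c.100A>G"), ("GENE2", "c.88G>A")},
]
controls = [{("GENE1", "c.100A>G")}]
```

Here `GENE2` survives the gene-level query even though the two affected samples carry different `GENE2` variants, which is the unrelated-individuals case described above; `GENE1` is then removed because a control carries a variant in it.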
The secondary analysis phase 50 includes mapping the test DNA sequence against a reference DNA sequence, at 16, and identifying variations and/or differences (variants) between the test DNA sequence and the reference DNA sequence, at 18. In some embodiments, the reference DNA sequence can be deemed “normal” and/or “a baseline” for one or more phenotypes, disorders and/or diseases under analysis. Thus, an assumption can be made that one or more variants between the test DNA sequence and the reference DNA sequence can be associated with the phenotype, disorder and/or disease under analysis. In some embodiments, the test DNA sequence and/or the reference DNA sequence is an entire human genome. In other embodiments, the test DNA sequence and/or the reference DNA sequence includes between 1 million base pairs and 50 million base pairs. In still other embodiments, the test DNA sequence and/or the reference DNA sequence includes fewer than 1 million base pairs or greater than 50 million base pairs.
In some embodiments, any suitable application, tool and/or system can be used to perform the primary analysis phase 40 and/or the secondary analysis phase 50. In some embodiments, for example, a tool such as Roche's GSMAPPER, ELAND, SourceForge.net's MAQ (Mapping and Assembling with Qualities), Boston College's MOSAIK and/or the like can be used to perform the primary analysis phase 40 and/or the secondary analysis phase 50. These tools can output a result that includes the identified variants that can be input into a variant analysis system that implements the tertiary analysis phase 60. Different tools can output the results of the secondary analysis phase 50 in different file formats and/or syntaxes. As such, the variant analysis system can be configured to receive various file formats and/or syntaxes of variant data and convert the various file formats and/or syntaxes of variant data into a common file format or syntax prior to performing the tertiary analysis phase 60.
The tertiary analysis phase 60 includes annotating the variants, at 20. Specifically, each variant can be annotated with data that can be used to determine a likelihood that that particular variant is associated with the phenotype, disorder and/or disease under analysis. A variant can be annotated, for example, with functional information, amino acid properties, conservation scores, zygosity data, allele frequencies, quality scores, disease associations, PolyPhen damage predictions, a change in an RNA processing site, splice site predictions, or other information from a human gene mutational database. Additionally, in some embodiments, each variant can be annotated with data that indicates whether (1) the variant has been reported to cause a disorder and recognized to cause a disorder; (2) the variant is unreported but expected to cause a disorder; (3) the variant is unreported and of the type that might be causative of a disorder; (4) the variant is unreported and unlikely to cause a disorder; (5) the variant has been reported and is a recognized neutral variant; and/or (6) the variant is unknown or not expected to be causative of disease but has a known association with clinical presentation. Such annotations are extracted from existing datasets and/or calculated using the variant analysis system.
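One way to assign the six categories listed above is a rule cascade over a few pre-computed annotations. The inputs (a reported-pathogenic flag, a reported-neutral flag, a predicted-impact label, and a clinical-association flag) and the order of the rules are assumptions made for illustration; the source does not specify how each category is decided.

```python
from enum import IntEnum


class Category(IntEnum):
    # Numbered to match categories (1) through (6) in the text.
    REPORTED_CAUSATIVE = 1
    UNREPORTED_EXPECTED_CAUSATIVE = 2
    UNREPORTED_POSSIBLY_CAUSATIVE = 3
    UNREPORTED_UNLIKELY_CAUSATIVE = 4
    REPORTED_NEUTRAL = 5
    CLINICAL_ASSOCIATION_ONLY = 6


def categorize(reported_pathogenic: bool, reported_neutral: bool,
               predicted_impact: str, clinically_associated: bool) -> Category:
    """Assign a category; predicted_impact is a hypothetical 'high'/'moderate'/'low' label."""
    if reported_pathogenic:
        return Category.REPORTED_CAUSATIVE
    if reported_neutral:
        return Category.REPORTED_NEUTRAL
    if predicted_impact == "high":
        return Category.UNREPORTED_EXPECTED_CAUSATIVE
    if predicted_impact == "moderate":
        return Category.UNREPORTED_POSSIBLY_CAUSATIVE
    # Not expected to be causative: flag it if it is nonetheless clinically associated.
    if clinically_associated:
        return Category.CLINICAL_ASSOCIATION_ONLY
    return Category.UNREPORTED_UNLIKELY_CAUSATIVE
```

Checking reported evidence before predictions reflects the assumption that database reports outrank computational predictions; a different ordering would be equally consistent with the text.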
The variants can then be filtered based on these annotations, at 22, and the filtered variants can be presented to a user (e.g., a clinical geneticist) for use in a clinical diagnosis, at 24. For example, a user can filter the variants using the annotations such that only variants meeting specific criteria are presented. The user can then use these variants in rendering a clinical diagnosis.
The host device 120 can be any type of device configured to send data over the network 170 to and/or receive data from one or more of the communication devices 180. In some embodiments, the host device 120 can be configured to function as, for example, a server device (e.g., a web server device), a network management device, and/or so forth.
The host device 120 includes a memory 124 and a processor 122. The memory 124 can be, for example, a random access memory (RAM), a memory buffer, a hard drive, and/or so forth. In some embodiments, the memory 124 of the host device 120 includes data used to facilitate a variant analysis system. In such embodiments, for example, the host device 120 is configured to execute code that implements the variant analysis system. As such, the host device 120 can send data associated with the variant analysis system to, and receive such data from, the communication devices 180. For example, as described in further detail herein, the host device 120 can send data associated with a result of a variant analysis to the communication devices 180. Additionally, the host device 120 can receive data associated with a variant filtering request from a communication device 180. In some embodiments, the host device 120 can receive data from the communication devices 180 pertaining to variant analysis. For example, the host device 120 can receive an indication that a user of a communication device 180 wishes to import variant data, annotate variant data, filter variant data, and/or the like.
In some embodiments, the memory 124 of the host device 120 stores account information associated with users of the variant analysis system. In such embodiments, for example, the host device 120 can store a username and password associated with a user, preferences associated with the user, a listing of variant analyses conducted by the user, and/or the like. In other embodiments, such information is stored in a database (e.g., database 126) operatively coupled to the host device 120.
The database 126 is operatively coupled to the host device 120. In some embodiments, the data associated with the variant analysis (e.g., variant annotation data, variant identification data, data associated with diseases and/or phenotypes, specific variant data imported to the memory 124 by a user of a communication device 180, and/or the like) is stored in the database 126. In such embodiments, the host device 120 can query the database 126 for data associated with the variant analysis. Specifically, when a user wishes to view data associated with a variant analysis, a communication device 180 can send a request for data to the host device 120; the host device 120 can query the database 126 for the requested data; and the host device 120 can send the requested data to the communication device 180.
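The request, query, and response cycle described above can be sketched with an in-memory SQLite database standing in for database 126. The source names an Oracle database; SQLite, the table layout, the sample and gene identifiers, and the function names here are assumptions chosen only to keep the example self-contained.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a toy stand-in for database 126 with a single variant table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variant (sample_id TEXT, gene TEXT, position INTEGER, "
        "ref TEXT, alt TEXT, zygosity TEXT)"
    )
    conn.executemany(
        "INSERT INTO variant VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("S1", "GENE1", 1000, "A", "G", "heterozygous"),
            ("S1", "GENE2", 2000, "C", "T", "homozygous"),
            ("S2", "GENE1", 1000, "A", "G", "homozygous"),
        ],
    )
    return conn


def handle_request(conn: sqlite3.Connection, sample_id: str, gene: str) -> list[tuple]:
    """Host-side handler: fetch one sample's variants in one gene for a client."""
    # Parameter placeholders keep client-supplied values out of the SQL text.
    cur = conn.execute(
        "SELECT position, ref, alt, zygosity FROM variant "
        "WHERE sample_id = ? AND gene = ? ORDER BY position",
        (sample_id, gene),
    )
    return cur.fetchall()
```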
In some embodiments, for example, the database 126 can be an Oracle 11g database and/or the like. In some embodiments, the host device 120 can facilitate communication between the communication devices 180 and the database 126 using a middleware layer. In some embodiments, such a middleware layer can be a Java middleware layer and the host device 120 can be a Java application server.
Each of the communication devices 180 can be, for example, a computing entity (e.g., a personal computing device such as a desktop computer, a laptop computer, etc.), a mobile phone, a personal digital assistant (PDA), and/or so forth. Although not shown, in some embodiments, each of the communication devices 180 can include one or more network interface devices (e.g., a network interface card) configured to connect the communication devices 180 to the network 170. In some embodiments, the communication devices 180 can be referred to as client devices.
As shown in
In some embodiments, a web browser application can be stored in the memory 164 of the communication device 160. Using the web browser application, the communication device 160 can send data to and receive data from the host device 120. Similarly, the communication device 150 can include a web browser application. In such embodiments, the communication devices 180 act as thin clients. This allows minimal data to be stored on the communication devices 180. In other embodiments, the communication devices 180 can include an application specific to communicating with the host device 120 when using the variant analysis system. In such embodiments, the communication devices 180 can download the application from the host device 120 prior to running the variant analysis system. In some embodiments, such an application can be, for example, an Adobe Flex application executing on the host device 120 using a web browser application.
As discussed above, the communication devices 180 can send data to and receive data from the host device 120 associated with a variant analysis system. In some embodiments, the data sent between the communication devices 180 and the host device 120 can be formatted using any suitable format. In some embodiments, for example, the data can be formatted using extensible markup language (XML), hypertext markup language (HTML) and/or the like.
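As an illustration of XML-formatted exchange, the sketch below serializes a list of variants with Python's standard `xml.etree.ElementTree` module and parses it back on the receiving side. The element and attribute names are hypothetical; the source does not define a schema.

```python
import xml.etree.ElementTree as ET


def variants_to_xml(sample_id: str, variants: list[dict]) -> str:
    """Serialize dicts with 'gene', 'pos', 'ref', 'alt' keys to an XML string."""
    root = ET.Element("variantResult", {"sample": sample_id})
    for v in variants:
        ET.SubElement(root, "variant", {
            "gene": v["gene"], "pos": str(v["pos"]), "ref": v["ref"], "alt": v["alt"],
        })
    return ET.tostring(root, encoding="unicode")


def xml_to_variants(payload: str) -> list[dict]:
    """Parse the same XML back into dicts on the receiving side."""
    root = ET.fromstring(payload)
    return [
        {"gene": e.get("gene"), "pos": int(e.get("pos")),
         "ref": e.get("ref"), "alt": e.get("alt")}
        for e in root.findall("variant")
    ]
```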
In some embodiments, one or more portions of the host device 120 and/or one or more portions of the communication devices 180 can include a hardware-based module (e.g., a digital signal processor (DSP), a field programmable gate array (FPGA)) and/or a software-based module (e.g., a module of computer code to be executed at a processor, a set of processor-readable instructions that can be executed at a processor). In some embodiments, one or more of the functions associated with the host device 120 (e.g., the functions associated with the processor 122) can be included in one or more modules (see, e.g.,
Such modules can be hardware-based modules (e.g., a digital signal processor (DSP), a field programmable gate array (FPGA)) and/or software-based modules (e.g., a module of computer code to be executed at processor 200, a set of processor-readable instructions that can be executed at a processor 200). While each module is shown in
The communication module 210 can facilitate communication between the processor 200 of the host device and one or more communication devices (e.g., communication devices 180 of
The data input module 202 is configured to receive data from a communication device (e.g., via communication module 210). In some embodiments, such data can include variant data associated with a specific individual, variant annotation data, and/or the like. Variant data can include variations identified by mapping a test DNA sequence against a reference DNA sequence as produced in a secondary analysis phase of a sequencing analysis method, as described above with respect to
Variant annotation data can include information and/or identified relationships used to annotate variants. In some embodiments, the data input module 202 can send the data to a database (e.g., database 126) for storage and future retrieval and/or use. In other embodiments, the data input module 202 can send the data to another module (e.g., annotation module 204) for further processing and/or analysis.
As discussed above, in some embodiments, different secondary analysis phase tools output variant data in different file formats and/or syntax. Accordingly, in some embodiments, the data input module 202 can be configured to receive variant data in multiple file formats and/or syntaxes. The data input module 202 can then convert the variant data into a single common format and/or syntax. For example, the data input module 202 can be configured to convert variant data received in a first file format and/or syntax and produced by a first secondary analysis phase tool to a second file format and/or syntax. Similarly, the data input module 202 can be configured to convert variant data received in a third file format and/or syntax and produced by a second secondary analysis phase tool to the same second file format and/or syntax. The remaining portions of the variant analysis system can be configured to read and/or analyze the variant data in the common format and/or syntax.
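A minimal sketch of this normalization step follows, assuming two hypothetical input syntaxes: syntax A is tab-separated with a `chr`-prefixed chromosome and a 1-based position, and syntax B is a colon-delimited string with a 0-based position. Neither corresponds to a specific tool's actual output format; both are converted to one common record so downstream modules read a single representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonVariant:
    chrom: str  # chromosome name without a "chr" prefix
    pos: int    # 1-based position
    ref: str
    alt: str


def from_syntax_a(line: str) -> CommonVariant:
    """Hypothetical syntax A: tab-separated chr-prefixed chromosome, 1-based position, ref, alt."""
    chrom, pos, ref, alt = line.rstrip("\n").split("\t")
    return CommonVariant(chrom.removeprefix("chr"), int(pos), ref.upper(), alt.upper())


def from_syntax_b(line: str) -> CommonVariant:
    """Hypothetical syntax B: 'chrom:0-based position:ref>alt'."""
    chrom, pos, change = line.strip().split(":")
    ref, alt = change.split(">")
    return CommonVariant(chrom.removeprefix("chr"), int(pos) + 1, ref.upper(), alt.upper())


PARSERS = {"A": from_syntax_a, "B": from_syntax_b}


def normalize(lines: list[str], syntax: str) -> list[CommonVariant]:
    """Convert non-blank, non-comment lines in the named syntax to the common record."""
    parse = PARSERS[syntax]
    return [parse(line) for line in lines if line.strip() and not line.startswith("#")]
```

Adding support for another tool would then mean registering one more parser in the hypothetical `PARSERS` table, leaving the rest of the system unchanged.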
The annotation module 204 is configured to receive variant data from the data input module 202 and to associate the variants in the variant data with annotations. For example, each variant can be annotated with data that can be used to determine a likelihood that that particular variant is associated with a disease, disorder and/or phenotype under analysis. The annotation module 204 can annotate a variant with, for example, functional information, amino acid properties, conservation scores, zygosity data, allele frequencies, quality scores, disease associations, PolyPhen damage predictions, a change in an RNA processing site, splice site predictions, and/or the like. Additionally, in some embodiments, the annotation module 204 can annotate one or more variants with data that indicates whether (1) the variant has been reported to cause a disorder and recognized to cause a disorder; (2) the variant is unreported but expected to cause a disorder; (3) the variant is unreported and of the type that might be causative of a disorder; (4) the variant is unreported and unlikely to cause a disorder; (5) the variant has been reported and is a recognized neutral variant; and/or (6) the variant is unknown or not expected to be causative of disease but has a known association with clinical presentation. In some embodiments, variants can be associated with annotations on the basis of a change in encoded amino acid, a change in an RNA processing site, or information from a human gene mutational database. Such annotations are extracted from existing datasets and/or calculated using the variant analysis system.
The annotation module 220 includes functionality to calculate variant consequence 282, RNA splice site predictions 284, depth of coverage 286 and variant zygosity 288. In some embodiments, RNA splice site predictions 284 can be calculated using GeneSplicer, Splice Site and/or any other tool or algorithm. Variant consequence 282 can be calculated using PolyPhen, SIFT, and/or the like.
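PolyPhen and SIFT predict whether an amino acid change is damaging. Before that prediction can run, the substitution has to be classified at the codon level. The sketch below shows only that simpler first step and is not a reimplementation of either tool. It assumes a single-nucleotide substitution and a reference codon already oriented on the coding strand (obtaining that codon requires a transcript model, which is not shown). It uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def coding_consequence(codon: str, offset: int, alt_base: str) -> str:
    """Classify a single-base substitution at `offset` (0-2) within a reference codon."""
    codon = codon.upper()
    alt_codon = codon[:offset] + alt_base.upper() + codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # premature stop codon
    if ref_aa == "*":
        return "stop_lost"
    return "missense"
```

For instance, `GAA` to `GAG` (offset 2) stays glutamate and is synonymous. `GAA` to `TAA` (offset 0) creates a stop codon and is nonsense. `GAA` to `GTA` (offset 1) changes glutamate to valine and is missense.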
Returning to
The filter module 206 is configured to allow users (e.g., clinical geneticists) to filter, search and/or query the variant data for variants having one or more commonalities. Specifically, the filter module 206 can be configured to receive search parameters from a communication device (e.g., via communication module 210). For example, a user might send an instruction to the filter module 206 to search for all variants that are highly conserved, homozygous and novel (i.e., not identified in a reference data set). For another example, a user might send an instruction to the filter module 206 to search for all variants already associated with a disease. Based on the search parameters, the filter module 206 can provide a query to the database (e.g., database 126 of
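The example query above ("highly conserved, homozygous and novel") amounts to combining several predicates with logical AND. The following in-memory sketch illustrates that. The described filter module issues a query against a database instead. The record fields and the 0.9 conservation threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

@dataclass
class AnnotatedVariant:
    variant_id: str
    conservation: float           # hypothetical normalized score in [0, 1]
    zygosity: str                 # "homozygous" or "heterozygous"
    in_reference_set: bool        # False means the variant is novel
    diseases: List[str] = field(default_factory=list)

Predicate = Callable[[AnnotatedVariant], bool]

def highly_conserved(threshold: float = 0.9) -> Predicate:
    return lambda v: v.conservation >= threshold

def homozygous() -> Predicate:
    return lambda v: v.zygosity == "homozygous"

def novel() -> Predicate:
    return lambda v: not v.in_reference_set

def disease_associated() -> Predicate:
    return lambda v: bool(v.diseases)

def filter_variants(variants: Iterable[AnnotatedVariant],
                    *criteria: Predicate) -> List[AnnotatedVariant]:
    """Keep variants that satisfy every supplied criterion (logical AND)."""
    return [v for v in variants if all(c(v) for c in criteria)]
```

A search for highly conserved, homozygous, novel variants would then be `filter_variants(variants, highly_conserved(), homozygous(), novel())`. A search for variants already associated with a disease would be `filter_variants(variants, disease_associated())`.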
The presentation module 208 is configured to provide a graphical user interface to a user of the variant analysis system. In some embodiments, for example, the presentation module 208 sends instructions to a communications device (e.g., via communications module 210) to render various user interfaces on a display of the communications device. For example, the presentation module 208 can send instructions to cause the communications device to provide a data input user interface, which allows a user to load variant data, on a display of the communications device. For another example, the presentation module can send instructions to cause the communications device to provide a variant filter user interface (e.g.,
User interface 300 includes a gene search portion 310, an advanced search portion 320 and a search results summary portion 330. The gene search portion 310 of user interface 300 allows a user to search for and/or filter variants by gene identifier and/or gene set identifier. For example, as shown in
The advanced search portion 320 of user interface 300 allows a user to search for and/or filter variants using a variety of parameters such as, for example, conservation, SIFT prediction, frequency, PolyPhen prediction, heterozygosity, splice site, and/or the like. The advanced search portion 320 of user interface 300 also allows a user to search for and/or filter variants that are synonymous, non-synonymous, novel, known-biobase, known-dbSNP, intergenic, genic, protein coding, intronic, 3 prime UTR and/or 5 prime UTR. The search results summary portion 330 includes a summary of the genes containing variants that match the parameters specified in the gene search portion 310 and the advanced search portion 320.
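The gene search, the advanced search and the results summary can be combined into one function, sketched here under stated assumptions. The gene-set identifier `example_set`, the region labels and the dictionary-based variant records are hypothetical. The summary is simply a count of matching variants per gene.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set

# Hypothetical gene-set identifiers mapped to member gene symbols.
GENE_SETS: Dict[str, Set[str]] = {
    "example_set": {"GENE1", "GENE2"},
}

def search_summary(variants: Iterable[dict],
                   genes: Optional[Set[str]] = None,
                   gene_set: Optional[str] = None,
                   regions: Optional[Set[str]] = None) -> Dict[str, int]:
    """Count matching variants per gene.

    genes/gene_set restrict by gene (union of both); regions restricts by
    variant class such as "intronic" or "protein_coding". Omitted filters
    match everything.
    """
    wanted = set(genes or ())
    if gene_set is not None:
        wanted |= GENE_SETS[gene_set]
    counts: Counter = Counter()
    for v in variants:
        if wanted and v["gene"] not in wanted:
            continue
        if regions and v["region"] not in regions:
            continue
        counts[v["gene"]] += 1
    return dict(counts)
```

With records such as `{"gene": "GENE1", "region": "intronic"}`, a call like `search_summary(records, gene_set="example_set", regions={"protein_coding"})` returns per-gene counts restricted to protein-coding variants in that set's genes.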
Specifically, user interface 400 includes a gene summary portion 410 and a variant summary portion 420. The gene summary portion 410 provides information associated with a particular gene such as, for example, a description, protein identifiers, position information, and/or the like. The variant summary portion 420 includes information associated with the variants identified for that gene. Additionally, the variant summary portion 420 includes a variant summary for each known transcript 430.
Returning to
Specifically, user interface 500 includes a selection portion 510, a methodology portion 520, an additional filter portion 530 and a results portion 540. The selection portion 510 allows a user to select the samples to be analyzed. The methodology portion 520 allows a user of the communication device to select a methodology to be used for cross-sample analysis. The additional filter portion 530 can be used to further define the methodology selected in the methodology portion 520. The genes containing variants returned by the cross-sample analysis can be presented to the user as a table in the results portion 540. In some embodiments, additional columns 550 of data, based on the variant annotations, can be added to the table in the results portion 540. A user can select an order in which the data appears in the table of the results portion 540 (e.g., an order of the columns) using selection inputs 560.
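The description does not name specific cross-sample methodologies. The sketch below assumes two plausible ones for illustration: genes carrying variants in every selected sample, and genes carrying variants in at least N of them. It also builds a results table whose annotation columns follow a user-chosen order. All names and data shapes are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set

def cross_sample_genes(sample_genes: Dict[str, Set[str]],
                       samples: Sequence[str],
                       min_samples: Optional[int] = None) -> List[str]:
    """Genes with variants in every selected sample (default) or in >= min_samples."""
    needed = len(samples) if min_samples is None else min_samples
    # Each sample's genes form a set, so a gene is counted once per sample.
    counts = Counter(g for s in samples for g in sample_genes[s])
    return sorted(g for g, n in counts.items() if n >= needed)

def results_table(genes: Sequence[str],
                  annotations: Dict[str, Dict[str, str]],
                  columns: Sequence[str]) -> List[List[str]]:
    """Header row plus one row per gene; annotation columns in the chosen order."""
    header = ["gene", *columns]
    rows = [[g, *(annotations.get(g, {}).get(c, "") for c in columns)] for g in genes]
    return [header, *rows]
```

Passing a different `columns` sequence to `results_table` reorders the table without recomputing the cross-sample result. This loosely mirrors the separation between the results portion 540 and the selection inputs 560.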
At least one of the set of variants is associated with at least one of a set of annotations each indicative of at least one criterion, at 604. As discussed above, in some embodiments, each variant can be annotated with data that can be used to determine a likelihood that that particular variant is associated with a disease, disorder and/or phenotype under analysis. Additionally, such annotations can be extracted from existing datasets and/or calculated using a variant analysis system.
The set of variants is filtered, based on the set of annotations, to identify a subset of variants from the set of variants, at 606. Each variant from the subset of variants is associated with at least one common annotation from the set of annotations. Such filtering can be based on any suitable criteria, such as, for example, the criteria described with respect to filter module 206 of
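The annotate, filter and present steps can be chained as in the following compact sketch. It is hypothetical. Variants are plain string identifiers, annotations are string labels, and `annotate` is any caller-supplied lookup. Only the annotation (604) and filtering (606) step numbers come from the text above.

```python
from typing import Callable, Dict, Iterable, List, Set


def annotate_all(variants: Iterable[str],
                 annotate: Callable[[str], Set[str]]) -> Dict[str, Set[str]]:
    """Step 604: associate each variant with its set of annotation labels."""
    return {v: set(annotate(v)) for v in variants}


def filter_by_common_annotation(annotated: Dict[str, Set[str]],
                                annotation: str) -> List[str]:
    """Step 606: keep only variants that share the chosen annotation."""
    return sorted(v for v, labels in annotated.items() if annotation in labels)


def present(subset: List[str], annotation: str) -> str:
    """Presentation step: render the subset as a plain-text report for review."""
    lines = [f"Variants annotated '{annotation}': {len(subset)}"]
    lines.extend(f"  {v}" for v in subset)
    return "\n".join(lines)
```

For example, if `1:100:A>G` carries `{"novel", "homozygous"}` and `2:200:C>T` carries `{"novel"}`, filtering on `"novel"` keeps both. Every variant in the resulting subset therefore shares that common annotation.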
While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Where methods described above indicate certain events occurring in certain order, the ordering of certain events may be modified. Additionally, certain of the events may be performed concurrently in a parallel process when possible, as well as performed sequentially as described above.
For example, while shown and described above (e.g., with respect to
Some embodiments described herein relate to a computer storage product with a non-transitory computer-readable medium (also can be referred to as a non-transitory processor-readable medium) having instructions or computer code thereon for performing various computer-implemented operations. The computer-readable medium (or processor-readable medium) is non-transitory in the sense that it does not include transitory propagating signals per se (e.g., a propagating electromagnetic wave carrying information on a transmission medium such as space or a cable). The media and computer code (also can be referred to as code) may be those designed and constructed for the specific purpose or purposes. Examples of non-transitory computer-readable media include, but are not limited to: magnetic storage media such as hard disks, floppy disks, and magnetic tape; optical storage media such as Compact Disc/Digital Video Discs (CD/DVDs), Compact Disc-Read Only Memories (CD-ROMs), and holographic devices; magneto-optical storage media such as optical disks; carrier wave signal processing modules; and hardware devices that are specially configured to store and execute program code, such as Application-Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), Read-Only Memory (ROM) and Random-Access Memory (RAM) devices.
Examples of computer code include, but are not limited to, micro-code or micro-instructions, machine instructions, such as produced by a compiler, code used to produce a web service, and files containing higher-level instructions that are executed by a computer using an interpreter. For example, embodiments may be implemented using Java, C++, or other programming languages (e.g., object-oriented programming languages) and development tools. Additional examples of computer code include, but are not limited to, control signals, encrypted code, and compressed code.
While various embodiments have been described above, it should be understood that they have been presented by way of example only, not limitation, and various changes in form and details may be made. Any portion of the apparatus and/or methods described herein may be combined in any combination, except mutually exclusive combinations. The embodiments described herein can include various combinations and/or sub-combinations of the functions, components and/or features of the different embodiments described.
Worthey, Elizabeth Anabel, Dimmock, David Paul
Patent | Priority | Assignee | Title |
10395759, | May 18 2015 | REGENERON PHARMACEUTICALS, INC | Methods and systems for copy number variant detection |
11568957, | May 18 2015 | Regeneron Pharmaceuticals Inc. | Methods and systems for copy number variant detection |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jul 08 2011 | The Medical College of Wisconsin, Inc. | (assignment on the face of the patent) | / | |||
Oct 19 2011 | WORTHEY, ELIZABETH ANABEL | THE MEDICAL COLLEGE OF WISCONSIN, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 027101 | /0562 | |
Oct 19 2011 | DIMMOCK, DAVID PAUL | THE MEDICAL COLLEGE OF WISCONSIN, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 027101 | /0562 | |
Feb 06 2017 | THE MEDICAL COLLEGE OF WISCONSIN, INC | HudsonAlpha Institute for Biotechnology | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 041224 | /0491 |
Date | Maintenance Fee Events |
Feb 15 2017 | ASPN: Payor Number Assigned. |
Jul 28 2017 | LTOS: Pat Holder Claims Small Entity Status. |
Jul 31 2017 | M2551: Payment of Maintenance Fee, 4th Yr, Small Entity. |
May 19 2021 | M2552: Payment of Maintenance Fee, 8th Yr, Small Entity. |
Date | Maintenance Schedule |
May 06 2017 | 4 years fee payment window open |
Nov 06 2017 | 6 months grace period start (w surcharge) |
May 06 2018 | patent expiry (for year 4) |
May 06 2020 | 2 years to revive unintentionally abandoned end. (for year 4) |
May 06 2021 | 8 years fee payment window open |
Nov 06 2021 | 6 months grace period start (w surcharge) |
May 06 2022 | patent expiry (for year 8) |
May 06 2024 | 2 years to revive unintentionally abandoned end. (for year 8) |
May 06 2025 | 12 years fee payment window open |
Nov 06 2025 | 6 months grace period start (w surcharge) |
May 06 2026 | patent expiry (for year 12) |
May 06 2028 | 2 years to revive unintentionally abandoned end. (for year 12) |