Characteristics of proteins, peptides, and/or peptoids can be determined via two-dimensional correlation spectroscopy and/or two-dimensional co-distribution spectroscopy. Spectral data of the proteins, peptides, and/or peptoids can be obtained with respect to an applied perturbation. Two-dimensional co-distribution analysis can be applied to generate an asynchronous co-distribution plot for the proteins, peptides, and/or peptoids to define the population of proteins in solution. In the two-dimensional asynchronous plot, a cross peak can be identified as correlating with an auto peak in the two-dimensional correlation synchronous plot associated with aggregation of the proteins, peptides, and/or peptoids. The two-dimensional asynchronous cross peak can be used to determine an order of a distributed presence of spectral intensities with respect to the applied perturbation. For example, for two wavenumbers v1 and v2, the value of the cross peak corresponding to the two wavenumbers can indicate a presence of spectral intensity at v1 relative to the presence of spectral intensity at v2.
|
1. A system for processing data representing a characteristic of proteins, peptides, and/or peptoids, the system comprising:
a data acquisition module, including a quantum cascade laser microscope, configured to:
sequentially acquire spectral images of the proteins, peptides, and/or peptoids in a solution using the quantum cascade laser microscope without the use of exogenous probes or additives, the sequentially acquired spectral images capturing induced changes in spectral intensities as a function of an applied controlled perturbation;
identify and select, in at least one of the acquired spectral images, a region of interest with respect to the applied perturbation; and
select and analyze spectral data including data for side chains of amino acids in the proteins, peptides and/or peptoids in the solution for the region of interest in a plurality of the sequentially acquired spectral images, wherein analyzing the spectral data includes analyzing side chain modes of the proteins, peptides, and/or peptoids as internal probes; and
a correlation analysis module configured to:
apply two-dimensional co-distribution (2DCDS) analysis to generate an asynchronous co-distribution plot for the proteins, peptides, and/or peptoids; and
identify in the asynchronous co-distribution plot a cross peak associated with aggregation of the proteins, peptides, and/or peptoids.
4. Non-transitory computer-readable medium comprising instructions which, when executed by one or more computers, cause the one or more computers to:
obtain sequentially acquired spectral images, taken using a quantum cascade laser microscope, of the proteins, peptides, and/or peptoids in a solution without the use of exogenous probes or additives, the sequentially acquired spectral images capturing induced changes in spectral intensities as a function of an applied controlled perturbation;
identify and select, in at least one of the acquired spectral images, a region of interest with respect to the applied perturbation;
select and analyze spectral data including data for side chains of amino acids in the proteins, peptides and/or peptoids in the solution for the region of interest in a plurality of the sequentially acquired spectral images, wherein analyzing the spectral data includes analyzing side chain modes of the proteins, peptides, and/or peptoids as internal probes;
apply two-dimensional co-distribution (2DCDS) analysis to generate an asynchronous co-distribution plot for the proteins, peptides, and/or peptoids;
identify in the asynchronous co-distribution plot a cross peak that correlates with an auto peak associated with aggregation of the proteins, peptides, and/or peptoids; and
use the cross peak to determine an order of a distributed presence of spectral intensities with respect to the applied perturbation.
2. The system of
3. The system of
5. The system of
6. The system of
7. The system of
8. The system of
|
This application is a National Stage application under 35 U.S.C. 371 of PCT International Application No. PCT/US2017/014338, filed Jan. 20, 2017, which claims the benefit of U.S. Provisional Application No. 62/281,630, filed Jan. 21, 2016, the entirety of which is hereby incorporated by reference.
This invention was made with Government support under Award Nos. 1632420 and 1447918 awarded by the National Science Foundation. The government has certain rights in this invention.
Protein aggregation phenomena are prevalent throughout the industrial bioprocess. Proteins are expensive to express, isolate, and purify due to their complex physical-chemical characteristics. Aggregation is considered a primary mode of protein degradation, at times leading to immunogenicity, anti-drug antibody response (ADA) in patients and a loss of efficacy. The detection and determination of protein aggregates is a major objective in the biopharmaceutical industry and other areas of scientific research. The formation of protein aggregates is important in industrial applications because they can significantly affect the production of protein therapeutics (i.e., biologics or biosimilars), effectively lowering the production yields.
The subject technology is illustrated, for example, according to various aspects described below. Various examples of aspects of the subject technology are described below. These are provided as examples and do not limit the subject technology.
Aspects of the subject technology provide a method for determining aggregation in protein, peptide and/or peptoid formulation, in solution or lyophilized state without the use of probes or additives.
According to aspects of the subject technology, the protein sample is spectroscopically analyzed and the spectral data analyzed using the established method to determine viability of the protein sample. The method and/or portions thereof can be fully automated and be used for the determination of the mechanism of aggregation.
According to aspects of the subject technology, methods described herein can be applied to membrane proteins, hydrophilic proteins, peptides and peptoids as a single component or in binary or ternary mixtures with other peptides or lipid mixtures. When in mixtures, one of the components must be isotopically labeled to allow for the simultaneous detection of each component.
Aspects of the subject technology allow flexibility in sample preparation, potential for automation, and data analysis, which together have proven useful for pharmaceutical protein formulation.
According to aspects of the subject technology, methods described herein can be applied to any protein, peptide or peptoid sample in several environments, aqueous or lipidic. Methods described herein can be used qualitatively and/or quantitatively for determining protein aggregation. Data analysis is performed through which the mechanism of protein aggregation is determined and the stability and/or viability of the protein, peptide or peptoid can be determined.
According to one aspect of the subject technology, the method involves transmission Fourier transform infrared (“FT-IR”) and/or attenuated total reflectance (“ATR”) spectroscopy, quantum cascade laser microscopy (“QCL”), two-dimensional correlation spectroscopy (“2DCOS”), and/or two-dimensional co-distribution spectroscopy (“2DCDS”) for the analysis of these proteins, peptides or peptoids. According to aspects of the subject technology, spectral data can be obtained using any suitable method and equipment, such as a FT-IR spectrometer, FT-IR microscope, QCL spectrometer or QCL microscope. In aspects of the subject technology, it is preferred to obtain spectral data using a QCL microscope.
Methods, systems, and instructions for processing data representing a characteristic of proteins, peptides, and/or peptoids can include: obtaining spectral data of the proteins, peptides, and/or peptoids with respect to an applied perturbation; applying two-dimensional co-distribution analysis to generate an asynchronous co-distribution plot for the proteins, peptides, and/or peptoids; identifying in the asynchronous co-distribution plot a cross peak that correlates with an auto peak associated with aggregation of the proteins, peptides, and/or peptoids; and using the cross peak to determine an order of a distributed presence of spectral intensities with respect to the applied perturbation.
Using the cross peak can include: determining, for two wavenumbers v1 and v2, whether the cross peak corresponding to the two wavenumbers has a positive value; and when the cross peak has a positive value, determining that a presence of spectral intensity at v1 is distributed within an interval of the applied perturbation that is lower than an interval within which a presence of spectral intensity at v2 is distributed. Using the cross peak can include: determining, for two wavenumbers v1 and v2, whether the cross peak corresponding to the two wavenumbers has a negative value; and when the cross peak has a negative value, determining that a presence of spectral intensity at v2 is distributed within an interval of the applied perturbation that is lower than an interval within which a presence of spectral intensity at v1 is distributed.
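The sign rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `codistribution_order` and the tuple return convention are assumptions made for this example and do not appear in the description.

```python
def codistribution_order(v1, v2, cross_peak):
    """Interpret the sign of an asynchronous co-distribution cross peak at (v1, v2).

    Returns a tuple (earlier, later) of wavenumbers, where "earlier" means the
    spectral intensity is distributed within a lower interval of the applied
    perturbation. Returns None when the cross peak is zero (no order implied).
    The function name and return convention are hypothetical.
    """
    if cross_peak > 0:
        # Positive value: presence at v1 lies at a lower perturbation interval than v2.
        return (v1, v2)
    if cross_peak < 0:
        # Negative value: presence at v2 lies at a lower perturbation interval than v1.
        return (v2, v1)
    return None
```

For example, a positive cross peak at (1652, 1618) would be read as intensity at 1652 being present at a lower interval of the perturbation than intensity at 1618.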
Operations can include: applying the two-dimensional correlation analysis to generate a synchronous plot for the proteins, peptides, and/or peptoids; identifying, in the synchronous plot, synchronous peaks associated with aggregation of the proteins, peptides, and/or peptoids; and using the synchronous peaks to determine a degree of overlap of distribution patterns for spectral intensities with respect to the applied perturbation.
Operations can also include: applying two-dimensional correlation analysis, generating a synchronous plot and an asynchronous plot for the proteins, peptides, and/or peptoids; identifying, in the synchronous plot, positive cross peaks that correlate with auto peaks associated with aggregation of the proteins, peptides, and/or peptoids; and using identified peak intensities of the spectral data to determine an amount of aggregation of the proteins, peptides, and/or peptoids.
The amount of aggregation of the proteins, peptides, and/or peptoids can be compared to an order of a distributed presence of spectral intensities with respect to the applied perturbation. Regions of interest can be recognized for discrimination of particulates and solution. A size and a number of particulates can be determined to ascertain population distribution of the particulates. The spectral data can be analyzed to verify signal-to-noise ratio, perform a baseline correction, determine water vapor content, and/or determine signal intensity within a spectral region. Covariance or dynamic spectral data can be generated based on perturbation of a sample. Changes, comprising peak intensities, can be correlated in the spectral data that are in-phase with one another as obtained in the synchronous plot. Elements that change in the spectral data can be determined. An overall greatest intensity change in the spectral data can be determined. An overall smallest intensity change in the spectral data can be determined. A minimum number of underlying spectral contributions in a band can be determined, curve fitting analysis can be performed, and a secondary structure composition of a sample can be determined. Changes, comprising peak intensities, can be correlated in the spectral data that are out-of-phase from one another as obtained in the asynchronous plot.
Additional features and advantages of the subject technology will be set forth in the description below, and in part will be apparent from the description, or may be learned by practice of the subject technology. The advantages of the subject technology will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the subject technology as claimed.
The accompanying drawings, which are included to provide further understanding of the subject technology and are incorporated in and constitute a part of this description, illustrate aspects of the subject technology and, together with the specification, serve to explain principles of the subject technology.
In the following detailed description, specific details are set forth to provide an understanding of the subject technology. It will be apparent, however, to one ordinarily skilled in the art that the subject technology may be practiced without some of these specific details. In other instances, well-known structures and techniques have not been shown in detail so as not to obscure the subject technology.
Proteins are large organic compounds made of amino acids arranged in a linear chain and joined together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. Most proteins fold into unique 3-dimensional structures. The shape into which a protein naturally folds is known as its native state. Although many proteins can fold unassisted, simply through the chemical properties of their amino acids, others require the aid of molecular chaperones to fold into their native states. There are four distinct aspects of a protein's structure:
Proteins are not entirely rigid molecules. In addition to these levels of structure, proteins may shift between several related structures while they perform their biological function. In the context of these functional rearrangements, these tertiary or quaternary structures are usually referred to as “conformations,” and transitions between them are called conformational changes.
Protein aggregation is characterized as a misfolded, rigid protein grouping which is considered a prevalent phenomenon throughout the industrial bioprocess. Aggregation is considered a primary mode of protein degradation, often leading to immunogenicity of the protein and a loss of bioactivity. Protein aggregation is of critical importance in a wide variety of biomedical situations, ranging from abnormal disease states, such as Alzheimer's and Parkinson's disease, to the production, stability and delivery of protein drugs. Protein aggregation, which could be amorphous or fibrillar in nature, can start by one of two different mechanisms: A) self-aggregation, in which the partially-folded intermediates are the immediate precursors for aggregation, and B) hetero-aggregation, in which the aggregation of one protein is mediated by another protein.
The formation of protein aggregates is critical in industrial applications, because it can highly affect the production of protein-based drugs or commercial enzymes, greatly lowering the production yields. The biologics and biosimilar industry is involved in the research, development, and manufacturing of complex drugs that include protein therapeutics. The research and development efficiency can be undesirably low, which increases costs of drug development due to the high attrition rate of protein therapeutics. The cost of protein therapeutic development is significantly impacted by late stage failure. One way to lower research and development costs is to perform a series of evaluations of the protein therapeutic candidate early in the research and development phase. By performing the characterization of the therapeutic protein under varying formulation conditions and stressors early in the research and development phase, a predictive profile of the therapeutic candidate is generated to assess the risk of protein aggregation. This approach has been defined as a developability assessment. This assessment can provide important information for decision making, such as selecting protein therapeutic candidates for further development. When protein aggregation occurs the protein therapeutic typically has decreased efficacy and can elicit an immune response. In severe cases, such an immune response can be fatal.
Several methods have been proposed in the past for the determination of aggregates in mixtures. These prior methods are designed for a particular protein or peptide and/or require the addition of a foreign probe, and thus do not represent a generalized method with universal application to a class of biological molecules. Several spectroscopic techniques have been used, such as UV-Vis spectroscopy with the aid of probes and fluorescence spectroscopy using internal or exogenous probes. Similarly, near UV circular dichroism (“CD”) has been used but is limited to the detection of the aggregate in its immediate vicinity, and nuclear magnetic resonance (“NMR”) could be used to detect protein aggregation by the appearance of band broadening. Sedimentation analysis could also be used to identify the extent of oligomerization as long as the protein of interest has a large enough molar extinction coefficient. Chromatographic techniques such as size exclusion could also detect the presence of protein aggregates. However, these techniques may require exogenous probes or large amounts of protein, are time consuming, and do not allow for the determination of the mechanism of aggregation.
The problem of protein aggregation is complex and frequently involves several different chemical and/or computational processes, which are difficult to discern. Aggregation may be stress induced and involve physical or chemical changes such as agitation, oxidation, deamination and temperature changes. Even a slight change in pH, salt conditions, protein concentration or formulation conditions can induce protein aggregation. Again, aggregation leads to lower yields in production, loss of efficacy of the protein therapeutic, and safety concerns in relation to immunogenicity risks. Currently available techniques to assess aggregation do not address all of the factors that are involved in the process, such as the size, identity, mechanism and extent of aggregation, and stability of the protein therapeutic in solution. Several techniques have been developed to address the size of the aggregate or particulate, yet they do not determine the identity. Other techniques can determine the size and the identity of aggregates, but cannot determine the extent of aggregation. The amino acid side chains present in a protein are important contributors to the stability of proteins. Yet, the relationship between the weak chemical interactions observed in side chains and the stability of the secondary structure of a protein cannot be determined using routine bench instrumentation in a high throughput process.
The stability of the protein therapeutic is also critical for drug development, and cannot be fully characterized by simply identifying the thermal transition temperature of the protein. A greater level of understanding is needed to address the stability of protein therapeutics. For example, it would be beneficial to understand 1) the relative stability of the domains within the protein of interest, 2) how the amino acid side chains contribute to the stability of the domains, 3) whether the amino acid side chains are involved in the aggregation mechanism, and 4) whether an excipient can stabilize weak interactions (e.g., in amino acid side chains) within the critical regions in specific domains of the protein therapeutic. There is a gap in understanding parameters that are important for determining the mechanism of protein aggregation.
When currently commercially available techniques are used orthogonally, differences in the sensitivity of the available techniques are a concern. In general, such techniques focus on determining the size, purity and stability of the protein therapeutic, and evaluate the presence or absence of protein aggregates or particulates in a formulation, to achieve lot-to-lot consistency.
There is a need for technology that can be used to better assess the developability of protein therapeutics, and for the comparability assessments needed to maintain and ensure product integrity, efficacy and safety. Such a process would need to be recognized as sufficient to ensure product integrity, efficacy and safety by the Food and Drug Administration (“FDA”) Center for Drug Evaluation and Research (“CDER”) and other relevant regulatory bodies.
A solution to the protein aggregation problem for the biopharma industry would lead to: (1) decreased R&D costs, (2) increased product yields, thus ensuring supply meets demand, (3) lower risks of withdrawals, (4) increased FDA approval rates, (5) reduced time-to-market, and (6) in turn, increased valuation. Also, the pipeline of new protein therapeutics is poised to address the treatment of cancer and chronic diseases such as rheumatoid arthritis, Crohn's disease and neurodegenerative disorders, among others, thus improving the quality of life of patients.
Aspects of the subject technology provide a fast, accurate, and reproducible technique to determine the size, identity, mechanism, and extent of aggregation and the stability of a protein therapeutic, or other chemical, in a single experiment. Aspects of the subject technology address comparability assessment of different protein therapeutic candidates and developability assessment of protein therapeutic candidates. The data can be used for classification and chemical characterization of proteins, polymers, organic materials, and inorganic materials for discovery, research and development, at pilot scale or in manufacturing, or for quality control and assurance purposes. The data can also be used for stability assessment during storage and delivery of the protein therapeutic.
The computational methods and systems described herein provide significant improvements over existing analysis for proteins. The computational methods and systems described herein generate and store data in forms that facilitate efficient and meaningful analysis without requiring the use of several pieces of equipment. Accordingly, the computational methods and systems described herein can improve the efficiency of spectral data analysis for evaluation of candidate drugs.
Aspects of the subject technology include the use of two-dimensional correlation spectroscopy (“2DCOS”) and two-dimensional co-distribution spectroscopy (“2DCDS”) to provide essential information towards the extent and mechanism of aggregation of a protein therapeutic. The methods described herein can include analysis of the side chain modes as internal probes, offering information that confirms the stability of the structural motif or domain within proteins. The methods described herein have been shown to be useful in High Throughput-Developability and Comparability Assessment (“HT-DCA”) via a Design of Experiment (“DOE”) approach that complied with Quality by Design (“QBD”).
According to some embodiments, systems and methods described herein can also be used to determine protein-protein interactions (“PPIs”) or protein-macromolecule interactions (e.g., protein-lipid, protein-DNA, protein-RNA, or protein-drug interactions). Also, systems and methods described herein can be used for the analysis of organic solutions, polymers, gels, nanostructures or small liquid crystals, etc.
According to some embodiments, for example as shown in
FT-IR spectroscopy is sensitive to conformational changes and aggregation. This technique allows for qualitative and quantitative analysis of the extent of protein, peptide and peptoid aggregation. The use of 2DCOS allows for further analysis and provides mechanistic information related to the aggregation process. The method may incorporate one or more of the following techniques: Transmission FT-IR Spectroscopy, Attenuated Total Reflectance (“ATR”) FT-IR Spectroscopy, 2DCOS analysis, and/or 2DCDS analysis.
In Transmission FT-IR microscopy or QCL microscopy, sample preparation can involve the use of pure protein, peptide or peptoid, in the appropriate buffer. The sample can be lyophilized and re-suspended in D2O. The protein solution can be applied between a slide and cover and sealed to prevent solvent evaporation. The slide can be set in a slide holder. A similar procedure is used for a reference using the appropriate buffer (PBS or HEPES). A temperature probe located in close contact with the slide is used to register the temperature of the sample. A temperature gradient over time can be used and the acquired spectral data is received automatically through a thermocouple interface. During the spectral analysis the full width at half height (FWHH) of the amide I band can be determined as a function of temperature to establish the transition temperature.
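As an illustration of the FWHH measurement mentioned above, the following Python sketch estimates the full width at half height of the strongest band in a spectral window by linear interpolation. It is only a sketch under stated assumptions: the function name `fwhh`, the flat-baseline treatment (subtracting the window minimum), and the single-band window are choices made for this example and are not specified in the description.

```python
import numpy as np

def fwhh(wavenumbers, absorbance):
    """Full width at half height of the strongest band in the window.

    Assumptions (hypothetical, for illustration): the window contains one
    dominant band, and the baseline is flat at the window minimum.
    """
    x = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(absorbance, dtype=float)
    y = y - y.min()                      # crude flat-baseline removal
    half = y.max() / 2.0
    if half <= 0:
        raise ValueError("no band found in the window")
    peak = int(np.argmax(y))

    # Walk outward from the peak until the signal drops to half height.
    lo = peak
    while lo > 0 and y[lo] > half:
        lo -= 1
    hi = peak
    while hi < len(y) - 1 and y[hi] > half:
        hi += 1
    if y[lo] > half or y[hi] > half:
        raise ValueError("band does not fall to half height inside the window")

    # Linearly interpolate the two half-height crossing positions.
    left = np.interp(half, [y[lo], y[lo + 1]], [x[lo], x[lo + 1]])
    right = np.interp(half, [y[hi], y[hi - 1]], [x[hi], x[hi - 1]])
    return abs(right - left)
```

Calling `fwhh` on the amide I window of each spectrum in a temperature series gives a width-versus-temperature curve from which a transition temperature could then be read.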
Attenuated Total Reflectance (ATR) FT-IR Spectroscopy can be used for hydrogen/deuterium exchange studies, titration experiments and the determination of the orientation of reconstituted membrane proteins. In this method the protein can be fully exchanged by repeated lyophilization and redissolving the sample in D2O. The fully exchanged protein sample and buffer can be spread as a film independently where the buffer is considered as the reference. Typically, a protein sample in D2O is spread onto the ATR crystal and allowed to dry, using a dry air purge. The subsequent spectrum would be representative of the protein sample and if present, the aggregated form of the protein.
According to some embodiments, spectral data can be generated by any suitable method, such as one or more of the above-described methods. A molecule to be analyzed can be provided in solution with a solvent, such as water or D2O, if desired. The concentration of the molecule to be analyzed in solution is preferably within a range that provides a strong signal from the molecule relative to any signal from the solvent (e.g., water) or other components of the sample (i.e., a suitable signal-to-noise ratio), which can facilitate further analysis as described herein. Typically, the concentration of a protein or peptide molecule that will provide a desired signal-to-noise ratio is related and proportional to the size of the protein or peptide. Preferred concentrations provide adequate signal-to-noise ratio for analysis. For example, as described further herein, the sample can facilitate analysis of the spectra for the molecule of interest without the need to subtract the spectra attributable to the solvent (e.g., water or D2O) or other components of the sample. For example, for an IgG or other protein of about 150 kD, the sample can contain the protein at a concentration of from about 50 mg/mL to about 150 mg/mL. The amount of protein can be varied from this range proportionately to the size of the protein of interest; for example, BSA, which is about 67 kD, can be analyzed in solution at a concentration of about 25 mg/mL to about 75 mg/mL. The sample can be provided in a cell having a path length. The path length can be longer (e.g., 30-50 μm, preferably about 40 μm) for D2O and shorter (e.g., 4-12 μm) for water.
According to some embodiments, spectral analysis can be performed in stages, for example as illustrated in
According to some embodiments, the protein sample is perturbed (thermally, chemically, by pressure, or acoustically), inducing a dynamic fluctuation in the vibrational spectrum. In stage 310, raw spectral data can be collected and/or analyzed. The spectral data can be acquired at regular temperature intervals and in a sequential manner. According to some embodiments, the data can be baseline corrected.
According to some embodiments, the spectral data can be used to determine the existence of the aggregated form of the protein, peptide or peptoid. For this, the first spectrum is subtracted from the subsequent spectra to generate the dynamic spectra. In stage 320, covariance (difference) spectra can be generated by subtraction of the first spectrum (24° C.) from all subsequent spectra. Consequently, the covariance (difference) spectra contain positive and negative peaks, also referred to as in-phase and out-of-phase with one another.
Notably, the process described herein does not require the manual subtraction of water or other reference (e.g., solvent) from spectral data. Such manual subtraction is a highly subjective step often incurred in protein spectral analysis. Instead, the process described herein generates the difference spectral data set based on the perturbation of the sample of interest. The output thereof can then be used for further analysis. By subtracting the first spectrum, which has the overlapping water band along with the amide I band, from all subsequent spectra, the spectral contributions of water are automatically subtracted.
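The subtraction described above can be sketched with NumPy as follows. This is a minimal illustration, assuming the spectra are stored as a two-dimensional array with one row per perturbation step (for example, per temperature) and one column per wavenumber; the function name `difference_spectra` and the array layout are assumptions made for this example.

```python
import numpy as np

def difference_spectra(spectra):
    """Subtract the first spectrum from all subsequent spectra.

    spectra: array of shape (m, n); row k is the spectrum acquired at the
             k-th perturbation step, columns are wavenumbers (assumed layout).
    Returns an array of shape (m - 1, n) of difference spectra.
    """
    spectra = np.asarray(spectra, dtype=float)
    # Any contribution common to every spectrum (such as an unchanging
    # background band) cancels in this subtraction.
    return spectra[1:] - spectra[0]
```

In this sketch, a contribution that is identical in every acquired spectrum drops out without any separate reference measurement, which is the point made in the paragraph above.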
In stage 330, a 2D IR correlation technique can be applied to generate a synchronous plot (stage 340) and an asynchronous plot (stage 350). For example, the spectral data can be fast Fourier transformed (“FFT”) to generate a complex matrix, from which an intensity matrix is obtained through the cross-correlation product; the synchronous and asynchronous plots are generated from this intensity matrix. The techniques for generating these plots will be discussed in greater detail herein.
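A minimal Python sketch of generating synchronous and asynchronous correlation maps is given here for orientation. Note the swapped-in technique: rather than the FFT route mentioned above, the sketch uses the Hilbert-Noda transformation matrix, a commonly used time-domain way of computing the same kind of maps. The function names, the array layout (rows are perturbation steps, columns are wavenumbers), and the choice of the perturbation-averaged spectrum as the reference are assumptions made for this example.

```python
import numpy as np

def hilbert_noda_matrix(m):
    """Hilbert-Noda matrix: N[i, j] = 0 if i == j, else 1 / (pi * (j - i))."""
    idx = np.arange(m)
    diff = idx[None, :] - idx[:, None]        # diff[i, j] = j - i
    N = np.zeros((m, m))
    off_diagonal = diff != 0
    N[off_diagonal] = 1.0 / (np.pi * diff[off_diagonal])
    return N

def correlation_2d(spectra):
    """Return (synchronous, asynchronous) maps, each of shape (n, n).

    spectra: array of shape (m, n) with m >= 2 perturbation steps.
    The reference spectrum is taken as the average over the m steps
    (an assumption; other reference choices are possible).
    """
    A = np.asarray(spectra, dtype=float)
    m = A.shape[0]
    dynamic = A - A.mean(axis=0)              # mean-centered dynamic spectra
    synchronous = dynamic.T @ dynamic / (m - 1)
    asynchronous = dynamic.T @ hilbert_noda_matrix(m) @ dynamic / (m - 1)
    return synchronous, asynchronous
```

With this construction the synchronous map is symmetric with non-negative diagonal values (the auto peaks), while the asynchronous map is antisymmetric with a zero diagonal, so it contains only cross peaks.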
The synchronous plot represents the intensity changes that occur during the perturbation. On the diagonal of this plot are the peaks or bands (known as auto peaks) that changed throughout the spectrum. Off the diagonal are the cross peaks which show the correlation between the auto peaks, that is, the relationship between the secondary structure changes observed. The synchronous plot can be used to relate the in-phase peak intensity changes or shifts.
In a synchronous correlation spectrum, auto peaks at diagonal positions represent the extent of perturbation-induced dynamic fluctuations of spectral signals. Cross peaks represent simultaneous changes of spectral signals at two different wavenumbers, suggesting a coupled or related origin of intensity variations. If the sign of a cross peak is positive, the intensities at the corresponding wavenumbers are increasing or decreasing together. If the sign is negative, one is increasing while the other is decreasing.
The asynchronous plot contains only cross peaks which are used to determine the order of events and thus the mechanism of aggregation of the protein. The asynchronous plot can be used to relate the out-of-phase peak intensity changes or shifts.
In an asynchronous correlation spectrum, cross peaks develop only if the intensities vary out of phase with each other for some Fourier frequency components of the signal fluctuations. The sign of a cross peak is positive if the intensity change at wavenumber v2 occurs before the change at wavenumber v1. The sign of a cross peak is negative if the intensity change at wavenumber v2 occurs after the change at wavenumber v1. The above sign rules are reversed if the same asynchronous cross peak position translated to the synchronous plot falls in a negative region (Φ(v1, v2)<0).
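The sign rules in the preceding paragraph can be expressed as a short Python function. This is an illustrative sketch that follows the rules exactly as worded above; the function name `sequential_order` and the string return values are assumptions made for this example.

```python
def sequential_order(sync_value, async_value):
    """Decide which wavenumber changes first from one cross-peak position.

    sync_value:  synchronous value at (v1, v2), written as Phi(v1, v2) above
    async_value: asynchronous cross-peak value at (v1, v2)
    Returns "v1" or "v2" for the wavenumber whose intensity changes first,
    or None when the asynchronous value is zero (no order implied).
    """
    if async_value == 0:
        return None
    # As stated above: a positive asynchronous cross peak means the change
    # at v2 occurs before v1, and a negative one means v2 occurs after v1.
    first = "v2" if async_value > 0 else "v1"
    # The rule is reversed when the synchronous value is negative.
    if sync_value < 0:
        first = "v1" if first == "v2" else "v2"
    return first
```

Applying such a function to every cross peak along the v2 axis yields pairwise orderings that can be collected into a table of events.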
The 2D IR correlation enhances the spectral resolution of the underlying peaks of broad bands such as the amide I and II bands by spreading the peaks in two dimensions. These plots are symmetrical in nature, and for discussion purposes reference will be made to the top triangle for analysis. The synchronous plot (shown at 340) contains two types of peaks: (a) auto peaks that are positive peaks on the diagonal and (b) cross peaks that are off-diagonal peaks that can be either positive or negative. The asynchronous plot (shown at 350) is comprised exclusively of cross peaks that relate the out-of-phase peaks. As a result this plot reveals greater spectral resolution enhancement. The following rules can apply to establish the order of molecular events:
The order of events can be established for each peak observed in the v2 axis. A table can be provided summarizing the order for each event. In stage 360, a sequential order of events plot is generated using the table summarizing the order of each event. On top of each step (event) is the spectroscopic information of the cross peak, v2, while on the bottom of each step is the corresponding peak assignment or the biochemical information for each event in the order in which they are perturbed as a function of temperature. Examples are provided herein.
Two-dimensional correlation spectroscopy (“2DCOS”) analysis can be used to resolve complex bands, such as the amide I band. An example of 2DCOS analysis is described in U.S. Pat. No. 8,268,628, hereby incorporated herein by reference. The skilled artisan's attention is called to Isao Noda, “Two-dimensional co-distribution spectroscopy to determine the sequential order of distributed presence of species”, Journal of Molecular Structure, Vol. 1069, pp. 51-54, which describes algorithms suitable for use in 2DCOS analysis.
A summary of the development of 2DCOS is as follows. A discretely sampled set of spectra A(vj, tk) can be obtained for a system measured under the influence of an external perturbation, which induces changes in the observed spectral intensities. The spectral variable vj with j=1, 2, . . . , n may be for example wave-number, frequency, scattering angle, etc., and the other variable tk with k=1, 2, . . . , m represents the effect of the applied perturbation, e.g., time, temperature, and electrical potential. Only the sequentially sampled spectral data set obtained during the explicitly defined observation interval between t1 and tm will be used for the 2DCOS analysis. For simplicity, wavenumber and time are used here to designate the two variables, but it is understood that use of other physical variables is also valid.
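The dynamic spectrum Ã(vj, tk) used in 2D correlation spectroscopy is explicitly defined as
$$\tilde{A}(\nu_j,t_k)=\begin{cases}A(\nu_j,t_k)-\bar{A}(\nu_j) & \text{for } 1\le k\le m\\ 0 & \text{otherwise}\end{cases}\qquad(1)$$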
where Ā(vj) is the spectrum of the reference state of the system. In the absence of a priori knowledge of the reference state, the reference spectrum can also be set as the time-averaged spectrum over the observation interval between t1 and tm:
$$\bar{A}(\nu_j)=\frac{1}{m}\sum_{k=1}^{m}A(\nu_j,t_k)\qquad(2)$$
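With this specific choice of the reference spectrum, the portion of the dynamic spectra within the observation interval essentially becomes equivalent to the mean-centered spectra. The synchronous and asynchronous 2D correlation spectra, Φ(v1, v2) and Ψ(v1, v2), are given by
$$\Phi(\nu_1,\nu_2)=\frac{1}{m-1}\sum_{j=1}^{m}\tilde{A}(\nu_1,t_j)\,\tilde{A}(\nu_2,t_j)\qquad(3)$$
$$\Psi(\nu_1,\nu_2)=\frac{1}{m-1}\sum_{i=1}^{m}\tilde{A}(\nu_1,t_i)\sum_{j=1}^{m}N_{ij}\,\tilde{A}(\nu_2,t_j)\qquad(4)$$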
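The term Nij is the element of the so-called Hilbert-Noda transformation matrix given by
$$N_{ij}=\begin{cases}0 & \text{if } i=j\\ \dfrac{1}{\pi\,(j-i)} & \text{otherwise}\end{cases}\qquad(5)$$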
Synchronous spectrum Φ(v1, v2) represents the coordinated or simultaneous changes of spectral intensities observed at two different wavenumbers, v1 and v2, along the perturbation variable tk. The sign of the synchronous correlation intensity becomes positive if the spectral intensities measured at the two wavenumbers mostly change in the same direction, either increasing or decreasing. On the other hand, if one is increasing while the other is decreasing, the sign of Φ (v1, v2) becomes negative.
Asynchronous spectrum Ψ(v1, v2) represents the out-of-phase or sequential changes of spectral intensities. If Ψ(v1, v2)=0, the variations of spectral intensities at two wavenumbers, v1 and v2 are completely synchronized. If the signs of Φ(v1, v2) and Ψ(v1, v2) are the same, the overall spectral intensity variation observed at v1 predominantly occurs prior to that at v2. If the signs are different, the order is reversed. Finally, if Φ(v1, v2)=0, the sequential order of intensity variations cannot be determined. It is important to emphasize that 2D correlation spectra only give the sequential order of spectral intensity variations but not the order of the distributed presence of species responsible for the spectral signals.
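By way of non-limiting illustration, the synchronous and asynchronous spectra and the sign rules described above can be sketched in Python with NumPy. The function names (`hilbert_noda_matrix`, `two_d_cos`, `sequential_order`), the array layout (rows are perturbation steps, columns are wavenumbers), and the relative tolerance are assumptions made for this sketch; the time-averaged spectrum is used as the reference spectrum.

```python
import numpy as np


def hilbert_noda_matrix(m: int) -> np.ndarray:
    """m x m matrix with 0 on the diagonal and 1/(pi*(j - i)) elsewhere."""
    idx = np.arange(m)
    diff = idx[None, :] - idx[:, None]        # diff[i, j] = j - i
    n = np.zeros((m, m))
    off_diag = diff != 0
    n[off_diag] = 1.0 / (np.pi * diff[off_diag])
    return n


def two_d_cos(spectra: np.ndarray):
    """Return (synchronous, asynchronous) spectra.

    spectra has shape (m, n): m perturbation steps by n wavenumbers.
    """
    m = spectra.shape[0]
    dynamic = spectra - spectra.mean(axis=0)  # mean-centered dynamic spectra
    sync = dynamic.T @ dynamic / (m - 1)
    asyn = dynamic.T @ hilbert_noda_matrix(m) @ dynamic / (m - 1)
    return sync, asyn


def sequential_order(sync, asyn, i, j, rel_tol=1e-9):
    """Apply the sign rules to wavenumber indices i (v1) and j (v2)."""
    scale = np.sqrt(sync[i, i] * sync[j, j])
    phi, psi = sync[i, j], asyn[i, j]
    if abs(psi) <= rel_tol * scale:
        return "synchronized"                 # no out-of-phase component
    if abs(phi) <= rel_tol * scale:
        return "undetermined"                 # order cannot be determined
    return "v1 before v2" if phi * psi > 0 else "v2 before v1"
```

In this sketch the order is read from the product of the two signs, so a positive product corresponds to the intensity variation at v1 occurring predominantly before that at v2.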
Referring again to
Two-dimensional co-distribution spectroscopy (“2DCDS”) analysis can be used to analyze a population of protein molecules that are in solution and how the different populations of these proteins behave. The skilled artisan's attention is called to Isao Noda, “Two-dimensional co-distribution spectroscopy to determine the sequential order of distributed presence of species”, Journal of Molecular Structure, Vol. 1069, pp. 54-56, which describes algorithms suitable for use in 2DCDS analysis.
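For a set of m time-dependent spectra A(vj, tk) sequentially obtained during the observation interval of t1≤tk≤tm with the time-averaged spectrum Ā(vj) given by Eq. (2), the characteristic (time) index k̄(vj) is defined as
$$\bar{k}(\nu_j)=\frac{1}{m\,\bar{A}(\nu_j)}\sum_{k=1}^{m}k\,A(\nu_j,t_k)=\frac{1}{m\,\bar{A}(\nu_j)}\sum_{k=1}^{m}k\,\tilde{A}(\nu_j,t_k)+\frac{m+1}{2}\qquad(6)$$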
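The dynamic spectrum Ã(vj, tk) used here is the same as that defined in Eq. (1). The corresponding characteristic time of the distribution of spectral intensity observed at wavenumber vj is given by
$$\bar{t}(\nu_j)=\left(t_m-t_1\right)\frac{\bar{k}(\nu_j)-1}{m-1}+t_1\qquad(7)$$
in which k̄(vj) is the characteristic index.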
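Once again, it is understood that time as used here is a generic description of a representative variable of the applied perturbation, so that it could be replaced with any other appropriate physical variable, such as temperature, concentration, or pressure, selected for the experimental condition. The characteristic time t̄(vj) corresponds to the center of gravity of the spectral intensity observed at vj as distributed along the time axis over the observation interval between t1 and tm.
Given the characteristic times, t̄(v1) and t̄(v2), of the distributions of spectral intensities measured at two wavenumbers, v1 and v2, the synchronous and asynchronous co-distribution spectra are defined as
$$\Gamma(\nu_1,\nu_2)=\sqrt{1-\left(\frac{\bar{t}(\nu_2)-\bar{t}(\nu_1)}{t_m-t_1}\right)^{2}}\;T(\nu_1,\nu_2)\qquad(8)$$
$$\Delta(\nu_1,\nu_2)=\frac{\bar{t}(\nu_2)-\bar{t}(\nu_1)}{t_m-t_1}\;T(\nu_1,\nu_2)\qquad(9)$$
where T(v1, v2) is the total joint variance given by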
$$T(\nu_1,\nu_2)=\sqrt{\Phi(\nu_1,\nu_1)\cdot\Phi(\nu_2,\nu_2)}\qquad(10)$$
Synchronous co-distribution intensity Γ(v1, v2) is a measure of the co-existence or overlap of distributions of two separate spectral intensities along the time axis. In contrast, asynchronous co-distribution intensity Δ(v1, v2) is a measure of the difference in the distribution of two spectral signals. The term “co-distribution” denotes the comparison of two separate distributions, distinguishing this metric from the concept of “correlation” which is based on the comparison of two variations.
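By combining Eqs. 6, 7, and 9, the expression for the asynchronous co-distribution spectrum is given as
$$\Delta(\nu_1,\nu_2)=\frac{T(\nu_1,\nu_2)}{m\,(m-1)}\sum_{k=1}^{m}k\left(\frac{\tilde{A}(\nu_2,t_k)}{\bar{A}(\nu_2)}-\frac{\tilde{A}(\nu_1,t_k)}{\bar{A}(\nu_1)}\right)\qquad(11)$$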
The value of Δ(v1, v2) is set to zero if the condition Ā(v1)=0 or Ā(v2)=0 is encountered, which indicates the lack of spectral intensity signals at either wavenumber. The synchronous co-distribution spectrum can be obtained from the relationship
$$\Gamma(\nu_1,\nu_2)=\sqrt{T(\nu_1,\nu_2)^{2}-\Delta(\nu_1,\nu_2)^{2}}\qquad(12)$$
In an asynchronous co-distribution spectrum, for a cross peak with positive sign, i.e., Δ(v1, v2)>0, the presence of spectral intensity at v1 is distributed predominantly at an earlier stage along the time axis compared to that for v2. On the other hand, if Δ(v1, v2)<0, the order is reversed. In the case of Δ(v1, v2)≈0, the average distributions of the spectral intensities observed at the two wavenumbers over the time course are similar. The sign of synchronous co-distribution peaks is always positive, which somewhat limits the information content of the synchronous spectrum beyond the obvious qualitative measure of the degree of overlap of distribution patterns.
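As a non-limiting sketch, the co-distribution spectra can be computed from the characteristic index of each wavenumber. The function name `two_d_cds`, the array layout (rows are perturbation steps, columns are wavenumbers), and the assumption of equally spaced perturbation steps are choices made for this illustration only.

```python
import numpy as np


def two_d_cds(spectra: np.ndarray):
    """Return (synchronous, asynchronous) co-distribution spectra.

    spectra has shape (m, n): m equally spaced perturbation steps
    by n wavenumbers.
    """
    m = spectra.shape[0]
    mean = spectra.mean(axis=0)                    # time-averaged spectrum
    dynamic = spectra - mean
    variance = (dynamic ** 2).sum(axis=0) / (m - 1)    # Phi(v, v)
    total = np.sqrt(np.outer(variance, variance))      # total joint variance T

    # Characteristic index: intensity-weighted mean of the step number k.
    k = np.arange(1, m + 1)
    valid = mean != 0
    k_bar = np.full(mean.shape, np.nan)
    k_bar[valid] = (k @ spectra[:, valid]) / (m * mean[valid])

    # delta[i, j] compares wavenumber j (v2) against wavenumber i (v1).
    delta = total * (k_bar[None, :] - k_bar[:, None]) / (m - 1)
    delta = np.nan_to_num(delta, nan=0.0)          # zero where the mean is zero
    gamma = np.sqrt(np.maximum(total ** 2 - delta ** 2, 0.0))
    return gamma, delta
```

A positive `delta[i, j]` in this sketch indicates that the intensity at wavenumber index `i` is distributed earlier along the perturbation axis than the intensity at index `j`.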
2DCDS is capable of providing elements of the mechanism of aggregation in a protein or any process being investigated in a weighted fashion. 2DCDS can be used to directly provide the sequence of distributed presence of species along the perturbation (e.g., time, temperature, concentration, pressure, etc.) variable axis. The technique can be used as a complementary tool to augment 2DCOS analysis in directly identifying the presence of intermediate species. According to some embodiments, perturbation-dependent spectra are sequentially obtained during an observation interval. 2D correlation spectra (synchronous spectrum and asynchronous spectrum) are derived from the spectral variations. Synchronous co-distribution intensity is measured as the coexistence or overlap of distributions of two separate spectral intensities along the perturbation axis. Asynchronous co-distribution intensity is measured as the difference in the distribution of two spectral signals. For a cross peak with positive sign, i.e., Δ(v1, v2)>0, the presence of spectral intensity at v1 is distributed predominantly at the earlier stage along the time axis compared to that for v2. On the other hand, if Δ(v1, v2)<0, the order is reversed. In the case of Δ(v1, v2)≈0, the average distributions of the spectral intensities observed at two wavenumbers over the time course are similar.
The 2DCOS and 2DCDS analyses differ in that the 2DCOS analysis provides a mean average description of the pathway due to the perturbation process and its effect on the sample, while the 2DCDS analysis provides the weighted elements in a population of molecules (proteins) during the perturbation process. The result of 2DCOS and 2DCDS is a direct and simplified description of the elements that are changing in the spectral data due to the perturbation.
According to some embodiments, for example as shown in
According to some embodiments, for example as shown in
According to some embodiments, for example as shown in
According to some embodiments, for example as shown in
Data manipulation can include automatic recognition of regions of interest (ROI) for the discrimination of particulates and solution. The size and number of the particulates can be determined to ascertain the population distribution of particulates. Data manipulation can be performed to ensure compliance, such as determining the S/N ratio, correcting the baseline, determining the water vapor content, and determining the signal intensity of the elements of interest within the spectral region studied. Data output for statistical analysis can be simplified using, inter alia, the Design of Experiment (DOE) approach. The intensity and spectral position of the elements of interest can be output as comma-delimited files (*.csv). Covariance or dynamic spectral data sets can be generated based on the perturbation of the sample of interest, the output of which can be used for further analysis. For example, data output can be provided in a format that facilitates merging with other bioanalytical results for comparability assessment and sourced by: perturbation type, excipient, protein therapeutic, protein concentration, temperature, date of acquisition, and/or bioanalytical technique. This approach would allow statistical analysis to be performed for all of the experiments that were carried out under similar conditions. More importantly, the results of the DOE analysis would be a standalone document ready for final reporting and would allow for decision making.
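As one hypothetical illustration of the comma-delimited output described above, peak positions and intensities can be written together with the experimental metadata so that the rows can later be merged with other results. The column names in `FIELDS` and the function `peaks_to_csv` are assumptions for this sketch and are not a prescribed file layout.

```python
import csv
import io

# Hypothetical column layout; the names are illustrative only.
FIELDS = ["protein", "excipient", "perturbation", "temperature_c",
          "technique", "wavenumber_cm1", "intensity"]


def peaks_to_csv(peaks, metadata):
    """Serialize (wavenumber, intensity) pairs plus shared metadata to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for wavenumber, intensity in peaks:
        writer.writerow({**metadata,
                         "wavenumber_cm1": wavenumber,
                         "intensity": intensity})
    return buffer.getvalue()
```

Each row repeats the metadata, which keeps every record self-describing when files from different experiments are concatenated.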
According to some embodiments, methods and systems described herein can apply a correlation function to the covariance or the dynamic spectral data to generate two plots (synchronous and asynchronous); this algorithm is termed 2D IR correlation spectroscopy. The changes (e.g., peak intensities) in the spectral data that are in phase with one another can be correlated as obtained in the synchronous plot. The elements that change in the spectral data can be determined. The overall greatest intensity change in the spectral data can be determined. The overall smallest intensity change in the spectral data can be determined. The minimum number of underlying spectral contributions in a broad band, such as the amide band for proteins and peptides, can be determined for curve-fitting analysis, which allows for the determination of secondary structure composition. The resolution of the spectral region being studied can be enhanced, particularly for broad bands in the spectra.
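A minimal sketch of how the greatest and smallest intensity changes might be located is given below. It takes the diagonal of the synchronous plot (the variance of each wavenumber along the perturbation) and ranks its local maxima. The function name `auto_peaks` and the simple three-point peak test are assumptions for illustration; a practical implementation would likely add smoothing or a noise threshold.

```python
import numpy as np


def auto_peaks(wavenumbers, spectra):
    """Rank auto peaks (local maxima of the synchronous diagonal).

    spectra has shape (m, n): m perturbation steps by n wavenumbers.
    Returns a list of (wavenumber, intensity), largest change first.
    """
    m = spectra.shape[0]
    dynamic = spectra - spectra.mean(axis=0)
    power = (dynamic ** 2).sum(axis=0) / (m - 1)   # diagonal of the synchronous plot

    # A point is a peak if it exceeds both of its neighbors.
    interior = np.arange(1, power.size - 1)
    is_peak = (power[interior] > power[interior - 1]) & \
              (power[interior] > power[interior + 1])
    peak_idx = interior[is_peak]

    ranked = peak_idx[np.argsort(power[peak_idx])[::-1]]
    return [(float(wavenumbers[i]), float(power[i])) for i in ranked]
```

The first entry of the returned list corresponds to the overall greatest intensity change and the last entry to the smallest resolved change.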
The changes (e.g., peak intensities) in the spectral data that are out of phase with one another can be correlated as obtained in the asynchronous plot. The asynchronous plot also contains the order of events that describe the protein behavior in molecular detail. A detailed evaluation of the plots could be performed to ascertain the order of events. Alternatively or in combination, this process can be automated. A joint variance function can be applied to the covariance or dynamic spectral data to generate the merged asynchronous plot, which can be interpreted directly to determine the order of events. This method can alternatively be used to validate the above interpretations for the description of the molecular behavior of a protein, which is complex. Furthermore, the input of the number, position, and intensity information for the curve-fitting routine could also be automated, yielding the secondary structure composition of the protein and the extent of protein aggregated species in the samples analyzed. The intensity information from the 2D IR correlation plots can be used for the quantitative determination of oxidative products, such as deamination. For example, deamination can be detected based on side chains. Such analysis can be used for candidate drug selection or during a protein design phase. A machine learning approach can be implemented as a long-term solution to the complexity of the attributes that need to be correlated and solved.
According to some embodiments, for example as shown in
TABLE 1
Amino acids as internal probes in D2O
| Position (cm−1) | Vibrational mode | Side Chain | Comment |
|---|---|---|---|
| 1517 | ring bend | Y | immediate surrounding |
| 1545 | ν (COO−) | E | pH, salt-bridge, H-bonding, flexibility and deamination |
| 1567 | ν (COO−) | D | pH, salt-bridge, H-bonding, deamination and flexibility found in β-hairpins |
| 1589 | νs (C—N) | R | salt-bridge, H-bonding and flexibility |
| 1609 | νa (C—N) | R | salt-bridge, H-bonding and flexibility |
| 1595 | ring bend | H | pH, H-bonding |
| 1849 | SH | C | covalent interaction, oxidative damage, long range flexibility |
By subtracting the initial spectrum at low temperature from all subsequent spectra, the spectral changes due to the temperature increase are revealed (revealing the changes in the protein behavior) which are referred to as covariance spectral data, but also commonly referred to as difference spectra. A cross correlation function is then applied to these spectral changes to determine the relationship between the peaks observed. Two plots are generated, the synchronous and asynchronous plots which provide the correlation between the resulting peaks observed due to the perturbation of the protein sample. These plots provide a wealth of molecular information and the sequential order of molecular events which describe the behavior of the protein. A synchronous plot (lower left of
A developability and comparability assessment was performed for three antibody drug conjugate fragments (
Also, the spectral analysis of aggregate free ADC2 was performed in the presence of varying excipients (sucrose and NaCl) at near-room temperatures, T=24-26° C. (
A full factorial design of 516 spectra and Normal Distribution Analysis were performed for 43 experiments using the QCL Microscope (QCL) under varying conditions. The QbD experimental setup was such that 324 spectral data were analyzed, representing the evaluation of ADC2 in the presence of varying amounts of NaCl, sucrose and varying ratios of both excipients (i.e., NaCl and sucrose). The sample size was determined to be n=8-12 depending on the standard deviation. A developability and comparability assessment was pursued with ADC2; below is a summary of the results obtained at 15, 30 and 60% sucrose at 26° C. and 28° C. Similar results were obtained for varying concentrations (325, 350 and 400 mM) of NaCl and varying ratios of sucrose and NaCl as excipients. Typically, the results obtained converged with p values greater than 0.8 (
The QCL spectral analysis capabilities of a HT-DCA platform provide further molecular analysis and stability determination of the protein therapeutic. This type of analysis is highly informative, allowing for the optimum design of the protein therapeutic candidate. Two types of correlation analysis were performed: 2DCOS analysis and 2DCDS analysis, providing information regarding the behavior of the protein therapeutic in solution.
Conceptual analysis of the 2D IR correlational plots was applied to infrared spectra of proteins. The amide I′ and side chain bands are broad and comprised of many underlying contributions, whether they are conformationally sensitive as are the carbonyl stretches within the peptide bonds or side chain vibrational modes which are informative of their neighboring environment and weak interactions. To extract this information, covariance spectra are generated by subtracting the reference spectrum from all subsequent spectra. For example, in a protein thermal denaturation study (temperature perturbation), the initial spectrum at low temperature would be used for subtraction. The covariance spectra generated include the changes in intensity due to the temperature increase. A correlation function is then applied to the data set which will relate the changes in intensity observed in the covariance spectra in the form of 2 separate graphs with increased resolution. These plots are capable of resolving highly overlapped bands, establishing the most flexible regions of a protein, deciphering the aggregation mechanism in a protein and establishing protein-target interactions. 2D IR correlational plots are termed synchronous and asynchronous plots. These plots are symmetrical in nature and for interpretation purposes, reference is made to the top half of each plot. The synchronous plot has positive peaks on the diagonal known as the auto-peaks. Auto-peaks contain the overall changes in intensity observed for the entire spectral data set. The magnitude of the change can be identified and used to determine the flexibility or susceptibility a region of the protein may have due to the perturbation. The position and number of these peaks is used to determine the underlying spectral contributions for the amide I′ and side chain bands (see Table 2).
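The subtraction step described above can be sketched as follows. The function name `difference_spectra` and the convention that the first row holds the low-temperature reference spectrum are assumptions for this illustration.

```python
import numpy as np


def difference_spectra(spectra: np.ndarray, reference_index: int = 0) -> np.ndarray:
    """Subtract the reference spectrum from every other spectrum.

    spectra has shape (m, n): m perturbation steps by n wavenumbers.
    The reference row itself (all zeros after subtraction) is dropped.
    """
    reference = spectra[reference_index]
    return np.delete(spectra - reference, reference_index, axis=0)
```

The resulting array holds only the intensity changes relative to the reference state, which is the input to which a correlation function would then be applied.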
TABLE 2
Summary of the band assignments for ADC2 in HEPES buffer solution with 15% sucrose
| line | distance (cm) | Wavenumber (cm−1) | Band Assignment | number of residues |
|---|---|---|---|---|
| protein backbone | | | | |
| a′ | 1.0 | 1682.6 | β-turn | |
| a | 1.3 | 1670.3 | β-turn (hinge loop) | |
| b | 2.1 | 1652.8 | random coil | |
| c | 2.5 | 1632.0 | β-sheet (antiparallel) | |
| d | 2.8 | 1626.4 | β-strand | |
| side chains | | | | |
| e | 3.7 | 1609.7 | Arg (νas (CN3H5+)) | 3 |
| f | 4.3 | 1590.1 | His (C═C) | 9 |
| g | 4.9 | 1580.4 | Arg (νs (CN3H5+)) | 3 |
| h | 6.0 | 1553.6 | Asp− (νas (COO−)) | 10 |
| i | 6.4 | 1543.8 | Glu− (νas (COO−)) | 8 |
| j | 7.0 | 1529.2 | COO (C-term) | 1 |
| k | 7.5 | 1517.0 | Tyr (C═C) | 5 |
The synchronous plot also has off-diagonal peaks known as cross peaks. These cross peaks determine the relationship between the auto-peaks. The cross peaks observed in the synchronous plot are due to changes in intensity that are in phase with one another. Consider two peaks whose intensities changed incrementally, or vice versa; these two auto-peaks would have an accompanying cross peak that represents their mutual relationship (
The asynchronous plot does not contain peaks on the diagonal, yet renders enhanced spectral resolution. The resulting cross peaks are due to peaks whose intensities in the covariance spectra changed out of phase with one another, and they consequently provide detailed information, including the sequential order of molecular events due to the thermal perturbation. The cross peaks in the asynchronous plot are either positive or negative, and from their signs one can determine the sequential order. In general, if the sign of a cross peak is positive in both plots, the order defined in the asynchronous plot is retained. Therefore, a positive cross peak means v1 occurs prior to v2. This interpretation holds if and only if the same cross peak in the synchronous plot is also positive. However, when the signs of the cross peak differ between the two plots, the order is reversed.
Applying this to the plots of
Although the temperature perturbation was limited to near room temperature, the analysis still allowed for the determination of the H-bonding interaction between the side chains and its aqueous environment and the excipient (sucrose). Also, these interactions stabilized the secondary structure of ADC2.
The 2DCDS analysis was found to be useful for the evaluation of the dynamics of a protein solution and the distribution of conformational dynamics within a temperature range; in the current case the temperature range was small, only 26-28° C., for ADC2 in HEPES buffer and in the presence of 15% sucrose (
No aggregation was observed for this protein. With reference to the asynchronous plot (
TABLE 3
Summary of the curve-fit results stating the secondary structure composition of ADC fragment 2 at 26° C.
| spectrum # | Sub-band position (cm−1) | Sub-band area | Sub-band assignment | Secondary Structure Composition (%) | Comments |
|---|---|---|---|---|---|
| 12 | 1688.2 | 4.1 | β-turn | | |
| | 1670.6 | 12.5 | β-turn (hinge loop) | 29 | sum of β-turns |
| | 1639.4 | 18.8 | β-sheet (anti) | 32.8 | total area was 57.3 |
| | 1623.7 | 9.9 | β-strand | 17.3 | essentially β-structure |
| | 1655.2 | 11.9 | random coil | 20.7 | 79.10% |
| 13 | 1691.1 | 3.27 | β-turn | | |
| | 1670.2 | 13.82 | β-turn (hinge loop) | 29.4 | sum of β-turns |
| | 1639.4 | 19.3 | β-sheet (anti) | 33.2 | total area was 58.1 |
| | 1624.0 | 10.2 | β-strand | 17.5 | essentially β-structure |
| | 1655.2 | 11.54 | random coil | 19.8 | 80.10% |
| 14 | 1692.2 | 1.86 | β-turn | | |
| | 1672.2 | 16.0 | β-turn (hinge loop) | 29.3 | sum of β-turns |
| | 1639.9 | 21.9 | β-sheet (anti) | 35.9 | total area was 60.8 |
| | 1625.1 | 10.4 | β-strand | 17.0 | essentially β-structure |
| | 1656.1 | 10.7 | random coil | 17.6 | 82.0% |
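As a worked check of how the composition percentages in Table 3 relate to the sub-band areas, each percentage can be recomputed as the area of a sub-band divided by the total fitted area. The sketch below uses the areas listed for spectrum 12; the function name `composition` is an assumption. The rounded areas sum to 57.2 rather than the reported 57.3, so the recomputed values differ from the tabulated ones in the last digit.

```python
# Sub-band areas for spectrum 12, taken from Table 3.
AREAS_SPECTRUM_12 = {
    "β-turn": 4.1,
    "β-turn (hinge loop)": 12.5,
    "β-sheet (anti)": 18.8,
    "β-strand": 9.9,
    "random coil": 11.9,
}


def composition(areas):
    """Return each sub-band area as a percentage of the total fitted area."""
    total = sum(areas.values())
    return {name: 100.0 * area / total for name, area in areas.items()}


percent = composition(AREAS_SPECTRUM_12)
beta_turns = percent["β-turn"] + percent["β-turn (hinge loop)"]
beta_total = beta_turns + percent["β-sheet (anti)"] + percent["β-strand"]
print(f"sum of β-turns: {beta_turns:.1f}%")
print(f"β-structure:    {beta_total:.1f}%")
```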
Samples including the National Institute of Standards & Technology Reference Material 8671 (RM8671) Lot No. 14HB-D-002, a humanized IgG1κ monoclonal antibody (NIST mAb), in H2O were studied for analysis according to methods described herein. Samples were added to cells of a CaF2 slide for data acquisition using a QCL microscope. The applied perturbation was temperature within the range of 24-60° C. with 4° C. temperature intervals. QCL IR spectral data was acquired using a 4× magnification objective at 4 cm−1 with data encoded every 0.5 cm−1 and baseline corrected.
The NIST mAb standard is an IgG1κ protein. The amino acid sequences of the heavy chain (SEQ ID NO:5) and the light chain (SEQ ID NO:6) of the antibody are presented below.
RM 8671 Heavy Chain AA
QVTLRESGPALVKPTQTLTLTCTFSGFSLSTAGMSVGWIRQPPGKALEWL
ADIWWDDKKHYNPSLKDRLTISKDTSKNQVVLKVTNMDPADTATYYCARD
MIFNFYFDVWGQGTTVTVSS ASTKGPSVFPLAPSSKSTSGGTAALGCLV
KDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGQT
QTYICNVNHKPSNTKVDKRV EPKSCDKTHTCPPCP APELLGGSPVFLF
PPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPRE
EQYNSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQP
REPQVYTLPPSREEMTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKT
TPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSL
SPGK
RM 8671 Light Chain
DIQMTQSPSTLSASVGDRVTITCSASSRVGYMHWYQQKPGKAPKLLIYDT
SKLASGVPSRFSGSGSGTEFTLTISSLQPDDFATYYCFQGSGYPFTFGGG
TKVEIK RTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKV
DNALQSGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQG
LSSPVTKSFNRGEC
Assignment of amino acid side chains for the sample is provided in Tables 4 and 5.
TABLE 4
Assignment of heavy amino acid side chains for NIST mAb in H2O
| item | side chain (Heavy Chain) | code | vibrational mode | position (cm−1) | number of residues in sequence |
|---|---|---|---|---|---|
| 1 | Tyr | Y | ν (C═C) | 1518 | 32 |
| 2 | Glu− | E | ν (COO−) | 1543-1560 | 40 |
| 3 | Asp− | D | ν (COO−) | 1570-1574 | 42 |
| 4 | His | H | ν (C═C) | 1596 | 20 |
| 5 | C-term end | | ν (COO−) | 1598 | 2 |
| 6 | Arg | R | νas (CN3H5+) | 1673 | 22 |
| 7 | Arg | R | νs (CN3H5+) | 1633 | 22 |
| 8 | Lys | K | δas (NH3+) | 1629 | 70 |
| 9 | Lys | K | δs (NH3+) | 1526 | 70 |
| 10 | Asn | N | ν (C═O) | 1678 | 36 |
| 11 | Gln | Q | ν (C═O) | 1670 | 32 |
TABLE 5
Assignment of light amino acid side chains for NIST mAb in H2O
| item | side chain (light Chain) | code | vibrational mode | position (cm−1) | number of residues in sequence |
|---|---|---|---|---|---|
| 1 | Tyr | Y | ν (C═C) | 1518 | 20 |
| 2 | Glu− | E | ν (COO−) | 1543-1560 | 18 |
| 3 | Asp− | D | ν (COO−) | 1570-1574 | 20 |
| 4 | His | H | ν (C═C) | 1596 | 6 |
| 5 | C-term end | | ν (COO−) | 1598 | 2 |
| 6 | Arg | R | νas (CN3H5+) | 1673 | 12 |
| 7 | Arg | R | νs (CN3H5+) | 1633 | 12 |
| 8 | Lys | K | δas (NH3+) | 1629 | 28 |
| 9 | Lys | K | δs (NH3+) | 1526 | 28 |
| 10 | Asn | N | ν (C═O) | 1678 | 10 |
| 11 | Gln | Q | ν (C═O) | 1670 | 24 |
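The residue counts used for such side chain assignments can be tallied directly from a sequence. The sketch below counts the side chains of interest in the RM 8671 light chain sequence given above. The function name `side_chain_counts` and the `copies` parameter are assumptions; setting `copies=2` reflects the assumption that the tabulated numbers count both light chains of the antibody.

```python
from collections import Counter

# RM 8671 light chain sequence as listed above, with spacing removed.
LIGHT_CHAIN = (
    "DIQMTQSPSTLSASVGDRVTITCSASSRVGYMHWYQQKPGKAPKLLIYDT"
    "SKLASGVPSRFSGSGSGTEFTLTISSLQPDDFATYYCFQGSGYPFTFGGG"
    "TKVEIKRTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKV"
    "DNALQSGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQG"
    "LSSPVTKSFNRGEC"
)

# One-letter codes of the side chains used as internal probes.
SIDE_CHAIN_CODES = "YEDHRKNQ"


def side_chain_counts(sequence: str, copies: int = 1) -> dict:
    """Count probe residues in a sequence, scaled by the number of chain copies."""
    counts = Counter(sequence)
    return {code: copies * counts[code] for code in SIDE_CHAIN_CODES}


for code, count in side_chain_counts(LIGHT_CHAIN, copies=2).items():
    print(code, count)
```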
As shown in
As shown in
Peak assignments of NIST mAb at 50 mg/mL in H2O are provided in Table 6.
TABLE 6
Summary of peak assignments of NIST mAb at 50 mg/mL in H2O
| Peak Assignment | Peak Position (cm−1) |
|---|---|
| β-turn | 1692 |
| β-turn | 1681.6 |
| Arg | 1668 |
| Hinge Loop | 1660 |
| α-helix | 1652 |
| β-sheet | 1635 |
| Agg | 1618 |
| His | 1602 |
| Asp− | 1573 |
| Glu− | 1542 |
| Tyr | 1515.8 |
Note: Aggregation (Agg)
The sequential order of events for NIST mAb at 50 mg/mL in H2O under thermal stress within the temperature range of 24-60° C. is shown in
The sequential order of events for the NIST mAb at 50 mg/mL in H2O is provided in Table 7.
TABLE 7
Summary of sequential order of events for NIST mAb at 50 mg/mL in H2O
| Event | Asynchronous and Synchronous plot analysis |
|---|---|
| 1 | β-turn (1681.6 cm−1) → Arg (1668 cm−1) |
| 2 | Asp− (1573 cm−1), Glu− (1542 cm−1), Tyr (1515.8 cm−1) → α-helix (1652 cm−1) |
| 3 | Arg (1668 cm−1) → α-helix (1652 cm−1) |
| 4 | Hinge loop (1660 cm−1) → His (1602 cm−1) |
| 5 | β-turn (1681.6 cm−1) → His (1602 cm−1) |
| 6 | Hinge loop (1660 cm−1) → Agg (1618 cm−1) |
| 7 | α-helix (1652 cm−1) → Agg (1618 cm−1) |
| 8 | Hinge loop (1660 cm−1) → α-helix (1652 cm−1) |
| 9 | His (1602 cm−1) → β-turn (1692 cm−1) |
| 10 | β-sheet (1635 cm−1) → β-turn (1692 cm−1) |
Note: Aggregation (Agg)
β-sheet and β-turn appear as coupled modes, indicating the presence of antiparallel β-sheet
Samples including Bovine Serum Albumin (“BSA”) in H2O were studied for analysis according to methods described herein. Samples were added to cells of a CaF2 slide for data acquisition using a QCL microscope. The applied perturbation was temperature within the range of 24-60° C. with 4° C. temperature intervals. QCL spectral data was acquired using a 4× magnification objective at 4 cm−1 with data encoded every 0.5 cm−1 and baseline corrected.
(SEQ ID NO: 7)
DTHKSEIAHRFKDLGEEHFKGLVLIAFSQYLQQCPFDEHVKLVNELTEFA
KTCVADESHAGCEKSLHTLFGDELCKVASLRETYGDMADCCEKQEPERNE
CFLSHKDDSPDLPKLKPDPNTLCDEFKADEKKFWGKYLYEIARRHPYFYA
PELLYYANKYNGVFQECCQAEDKGACLLPKIETMREKVLTSSARQRLRCA
SIQKFGERALKAWSVARLSQKFPKAEFVEVTKLVTDLTKVHKECCHGDLL
ECADDRADLAKYICDNQDTISSKLKECCDKPLLEKSHCIAEVEKDAIPEN
LPPLTADFAEDKDVCKNYQEAKDAFLGSFLYEYSRRHPEYAVSVLLRLAK
EYEATLEECCAKDDPHACYSTVFDKLKHLVDEPQNLIKQNCDQFEKLGEY
GFQNALIVRYTRKVPQVSTPTLVEVSRSLGKVGTRCCTKPESERMPCTED
YLSLILNRLCVLHEKTPVSEKVTKCCTESLVNRRPCFSALTPDETYVPKA
FDEKLFTFHADICTLPDTEKQIKKQTALVELLKHKPKATEEQLKTVMENF
VAFVDKCCAADDKEACFAVEGPKLVVSTQTALA
Assignment of amino acid side chains for the sample is provided in Table 8.
TABLE 8
Assignment of amino acid side chains for BSA in H2O
| item | side chain | code | vibrational mode | position (cm−1) | number of residues in sequence |
|---|---|---|---|---|---|
| 1 | Tyr | Y | ν (C═C) | 1518 | 20 |
| 2 | Glu− | E | ν (COO−) | 1543-1560 | 62 |
| 3 | Asp− | D | ν (COO−) | 1570-1574 | 39 |
| 4 | His | H | ν (C═C) | 1596 | 17 |
| 5 | C-term end | | ν (COO−) | 1598 | 1 |
| 6 | Arg | R | νas (CN3H5+) | 1673 | 21 |
| 7 | Arg | R | νs (CN3H5+) | 1633 | 21 |
| 8 | Lys | K | δas (NH3+) | 1629 | 48 |
| 9 | Lys | K | δs (NH3+) | 1526 | 48 |
| 10 | Asn | N | ν (C═O) | 1678 | 14 |
| 11 | Gln | Q | ν (C═O) | 1670 | 21 |
As shown in
Peak assignments of BSA at 40 mg/mL are provided in Table 9.
TABLE 9
Summary of peak assignments of BSA at 40 mg/mL
| Peak Assignment | Peak Position (cm−1) |
|---|---|
| β-turn | 1698 |
| β-turn | 1684 |
| Arg | 1672.4 |
| α-helix | 1653.9 |
| β-sheet/Arg | 1629.6 |
| His | 1606.5 |
| Asp− | 1584.5 |
| Asp− | 1576.4 |
| Asp− | 1567.1 |
| Glu− | 1559 |
| Glu− | 1541.7 |
| Lys | 1530 |
| Lys | 1525.5 |
| Tyr | 1518.5 |
The sequential order of events for BSA at 40 mg/mL under thermal stress within the temperature range of 24-60° C. is shown in
TABLE 10
Summary of sequential order of events for BSA at 40 mg/mL
| Event | Asynchronous and Synchronous plot analysis |
|---|---|
| 1 | Asp− (1567 cm−1) → Asp− (1584 cm−1) |
| 2 | Lys (1530 cm−1), Lys (1525.5 cm−1) → β-sheet/Arg (1629.5 cm−1) |
| 3 | Glu− (1541.7 cm−1) → α-helix (1653.9 cm−1) |
| 4 | Asp− (1584 cm−1) → Glu− (1541.7 cm−1) |
| 5 | β-sheet/Arg (1629 cm−1) → His (1606.5 cm−1) |
| 6 | Tyr (1518.5 cm−1) → His (1606.5 cm−1) |
| 7 | α-helix (1652 cm−1) → Asp− (1576.4 cm−1) |
| 8 | Arg (1672.4 cm−1) → Glu− (1559 cm−1) |
| 9 | His (1606.5 cm−1) → β-turn (1684 cm−1) |
| 10 | Glu− (1559 cm−1) → Asp− (1576.4 cm−1) |
| 11 | β-turn (1698 cm−1) → β-turn (1684 cm−1) |
The aspartates (1567 cm−1) and glutamates (1584 cm−1) located within the helical regions (1653.9 cm−1) that are involved in salt bridge interactions with lysines (1530.0 and 1525.5 cm−1) are perturbed first, followed by the β-sheets (1629.6 cm−1); then the tyrosines (1518 cm−1) and histidines (1606.5 cm−1) within the antiparallel β-sheets (1629.6 cm−1) and β-turns (1698 cm−1) are perturbed. Finally, at high temperature, the salt bridge interactions involving arginines with glutamates (1560 cm−1) and aspartates (1576.4 cm−1) located close to β-turns (1684.0 cm−1) are perturbed.
Samples including a mixture of NIST mAb and BSA in H2O were studied for analysis according to methods described herein. Samples were added to cells of a CaF2 slide for data acquisition using a QCL microscope. The applied perturbation was temperature within the range of 24-60° C. with 4° C. temperature intervals. QCL spectral data was acquired using a 4× magnification objective at 4 cm−1 with data encoded every 0.5 cm−1 and baseline corrected.
As shown in
Peak assignments of NIST mAb/BSA are provided in Table 11.
TABLE 11
Summary of peak assignments of NIST mAb/BSA
| Protein | Peak Assignment | Peak Position (cm−1) |
|---|---|---|
| NIST mAb | β-turn | 1692 |
| NIST mAb | β-turn | 1681.6 |
| NIST mAb | Arg | 1668 |
| BSA | α-helix | 1653.9 |
| NIST mAb | α-helix | 1652 |
| NIST mAb | β-sheet | 1635 |
| BSA | β-sheet/Arg | 1629.6 |
| NIST mAb | Agg | 1618 |
| BSA | His | 1606.5 |
| NIST mAb | Asp− | 1573 |
| BSA | Asp− | 1567.1 |
| BSA | Glu− | 1559 |
| NIST mAb | Glu− | 1542 |
| BSA | Glu− | 1541.7 |
| BSA | Lys | 1525.5 |
Note: Aggregation (Agg)
Samples including Lysozyme in H2O were studied for analysis according to methods described herein. Custom CaF2 slide cells were used with a 7 μm path length for samples in H2O. The applied perturbation was temperature within the range of 24-60° C. with 4° C. temperature intervals. QCL IR spectral data was acquired using a 4× magnification objective at 4 cm−1 with data encoded every 0.5 cm−1 and baseline corrected.
Below is an amino acid sequence for the Lysozyme analyzed.
(SEQ ID NO: 8)
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGS
TDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVS
DGNGMNAWVAWRNRCKGTDVQAWIRGCRL
Assignment of amino acid side chains for the sample is provided in Table 12.
TABLE 12
Assignment of amino acid side chains for Lysozyme in H2O
| item | side chain | code | vibrational mode | position (cm−1) | number of residues in sequence |
|---|---|---|---|---|---|
| 1 | Tyr | Y | ν (C═C) | 1518 | 3 |
| 2 | Glu− | E | ν (COO−) | 1543-1560 | 5 |
| 3 | Asp− | D | ν (COO−) | 1570-1574 | 7 |
| 4 | His | H | ν (C═C) | 1596 | 1 |
| 5 | C-term end | | ν (COO−) | 1598 | 1 |
| 6 | Arg | R | νas (CN3H5+) | 1673 | 11 |
| 7 | Arg | R | νs (CN3H5+) | 1633 | 11 |
| 8 | Lys | K | δas (NH3+) | 1629 | 6 |
| 9 | Lys | K | δs (NH3+) | 1526 | 6 |
| 10 | Asn | N | ν (C═O) | 1678 | 15 |
| 11 | Gln | Q | ν (C═O) | 1670 | 4 |
As shown in
Peak assignments of Lysozyme at 600 mg/mL are provided in Table 13.
TABLE 13
Summary of peak assignments of Lysozyme at 600 mg/mL
| Peak Assignment | Peak Position (cm−1) |
|---|---|
| β-turn | 1698 |
| β-turn | 1683.8 |
| Arg | 1672.4 |
| Arg/Asn/Gln | 1666.6 |
| Hinge loop | 1660.5 |
| α-helix | 1647 |
| β-sheet | 1637.2 |
| Arg | 1628.7 |
| His | 1596.6 |
| Asp− | 1572.3 |
| Asp− | 1566.1 |
| Glu− | 1556.3 |
| Glu− | 1547.8 |
| Glu− | 1536.8 |
| Lys | 1526.9 |
| Tyr | 1514.6 |
The sequential order of events for Lysozyme at 600 mg/mL under thermal stress within the temperature range of 24-60° C. is shown in
TABLE 14
Summary of sequential order of events for Lysozyme at 600 mg/mL
| Event | Asynchronous and Synchronous plot analysis |
|---|---|
| 1 | Tyr (1514.6 cm−1) → Lys (1526.9 cm−1) |
| 2 | Lys (1526.9 cm−1) → β-sheet (1637.2 cm−1) |
| 3 | Lys (1526.9 cm−1) → Arg (1628.7 cm−1) |
| 4 | Arg (1628.7 cm−1) → Glu− (1536.8 cm−1) |
| 5 | β-sheet (1637.2 cm−1) → Glu− (1556.3 cm−1) |
| 6 | Glu− (1536.8 cm−1) → Glu− (1556.3 cm−1) |
| 7 | Glu− (1556.3 cm−1) → Glu− (1547.8 cm−1) |
| 8 | α-helix (1647 cm−1) → β-turn (1683 cm−1) |
| 9 | β-turn (1698 cm−1) → Arg/Asn/Gln (1666.6 cm−1) |
| 10 | Glu− (1547.8 cm−1) → Asp− (1566.1 cm−1, 1572.3 cm−1) |
| 11 | Glu− (1547.8 cm−1) → Hinge loop (1660.5 cm−1) |
| 12 | Glu− (1556.3 cm−1) → α-helix (1647 cm−1) |
| 13 | Hinge loop (1660.5 cm−1) → His (1596.6 cm−1) |
| 14 | Tyr (1514.6 cm−1) → Arg/Asn/Gln (1666.6 cm−1) |
The tyrosines (1514.6 cm−1) and lysines (1526.9 cm−1) are perturbed first, followed by the arginines (1628.7 cm−1), then the β-sheets (1637.2 cm−1), then the glutamates (1536.8 cm−1) within the β-sheets, followed by the glutamates (1556 cm−1) located within the helical regions (1647.0 cm−1) and the β-turns (1698.0 cm−1 and 1683.8 cm−1), followed by the glutamates (1547.8 cm−1), the hinge loops (1660.5 cm−1), then the aspartates (1566.1, 1572.3 cm−1) and a single histidine (1596.6 cm−1), presumably interacting with an aspartate by H-bonding interaction located near the N-terminal end, and finally the Arg, Asn, and Gln, all assigned to 1666.6 cm−1. No aggregation was observed.
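One possible way to automate the step from pairwise events to an overall sequence is a topological sort of the events in Table 14. This is a sketch of a swapped-in, generic technique and is not stated in the description as the method used; the node labels, the `EVENTS` list, and the function `overall_order` are assumptions for illustration. Where the pairwise events leave the relative order of two elements open, the sort returns one of several valid orders.

```python
from graphlib import TopologicalSorter

# Pairwise (earlier, later) events transcribed from Table 14.
EVENTS = [
    ("Tyr 1514.6", "Lys 1526.9"),
    ("Lys 1526.9", "β-sheet 1637.2"),
    ("Lys 1526.9", "Arg 1628.7"),
    ("Arg 1628.7", "Glu 1536.8"),
    ("β-sheet 1637.2", "Glu 1556.3"),
    ("Glu 1536.8", "Glu 1556.3"),
    ("Glu 1556.3", "Glu 1547.8"),
    ("α-helix 1647", "β-turn 1683"),
    ("β-turn 1698", "Arg/Asn/Gln 1666.6"),
    ("Glu 1547.8", "Asp 1566.1/1572.3"),
    ("Glu 1547.8", "Hinge loop 1660.5"),
    ("Glu 1556.3", "α-helix 1647"),
    ("Hinge loop 1660.5", "His 1596.6"),
    ("Tyr 1514.6", "Arg/Asn/Gln 1666.6"),
]


def overall_order(pairs):
    """Return one ordering consistent with every (earlier, later) pair.

    Raises graphlib.CycleError if the pairs contradict each other.
    """
    sorter = TopologicalSorter()
    for earlier, later in pairs:
        sorter.add(later, earlier)   # 'later' must come after 'earlier'
    return list(sorter.static_order())


for step, element in enumerate(overall_order(EVENTS), start=1):
    print(step, element)
```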
The computer system 1900 includes a bus 1908 or other communication mechanism for communicating information, and a processor 1902 coupled with the bus 1908 for processing information. By way of example, the computer system 1900 may be implemented with one or more processors 1902. The processor 1902 may be a general-purpose microprocessor, a microcontroller, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a state machine, gated logic, discrete hardware components, and/or any other suitable entity that can perform calculations or other manipulations of information.
The computer system 1900 can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them stored in an included memory 1904, such as a Random Access Memory (RAM), a flash memory, a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable PROM (EPROM), registers, a hard disk, a removable disk, a CD-ROM, a DVD, and/or any other suitable storage device, coupled to the bus 1908 for storing information and instructions to be executed by the processor 1902. The processor 1902 and the memory 1904 can be supplemented by, or incorporated in, special purpose logic circuitry.
The instructions may be stored in the memory 1904 and implemented in one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, the computer system 1900, and according to any method well known to those of skill in the art, including, but not limited to, computer languages such as data-oriented languages (e.g., SQL, dBase), system languages (e.g., C, Objective-C, C++, Assembly), architectural languages (e.g., Java, .NET), and/or application languages (e.g., PHP, Ruby, Perl, Python). Instructions may also be implemented in computer languages such as array languages, aspect-oriented languages, assembly languages, authoring languages, command line interface languages, compiled languages, concurrent languages, curly-bracket languages, dataflow languages, data-structured languages, declarative languages, esoteric languages, extension languages, fourth-generation languages, functional languages, interactive mode languages, interpreted languages, iterative languages, list-based languages, little languages, logic-based languages, machine languages, macro languages, metaprogramming languages, multiparadigm languages, numerical analysis languages, non-English-based languages, object-oriented class-based languages, object-oriented prototype-based languages, off-side rule languages, procedural languages, reflective languages, rule-based languages, scripting languages, stack-based languages, synchronous languages, syntax handling languages, visual languages, Wirth languages, and/or XML-based languages. The memory 1904 may also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by the processor 1902.
A computer program as discussed herein does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network. The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
The computer system 1900 further includes a data storage device 1906 such as a magnetic disk or optical disk, coupled to the bus 1908 for storing information and instructions. The computer system 1900 may be coupled via an input/output module 1910 to various devices (e.g., devices 1914 and 1916). The input/output module 1910 can be any input/output module. Exemplary input/output modules 1910 include data ports (e.g., USB ports), audio ports, and/or video ports. In some embodiments, the input/output module 1910 includes a communications module. Exemplary communications modules include networking interface cards, such as Ethernet cards, modems, and routers. In certain aspects, the input/output module 1910 is configured to connect to a plurality of devices, such as an input device 1914 and/or an output device 1916. Exemplary input devices 1914 include a keyboard and/or a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer system 1900. Other kinds of input devices 1914 can be used to provide for interaction with a user as well, such as a tactile input device, visual input device, audio input device, and/or brain-computer interface device. For example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, and/or tactile feedback), and input from the user can be received in any form, including acoustic, speech, tactile, and/or brain wave input. Exemplary output devices 1916 include display devices, such as a cathode ray tube (CRT) or liquid crystal display (LCD) monitor, for displaying information to the user.
According to certain embodiments, a client device and/or a server can be implemented using the computer system 1900 in response to the processor 1902 executing one or more sequences of one or more instructions contained in the memory 1904. Such instructions may be read into the memory 1904 from another machine-readable medium, such as the data storage device 1906. Execution of the sequences of instructions contained in the memory 1904 causes the processor 1902 to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in the memory 1904. In some embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement various aspects of the present disclosure. Thus, aspects of the present disclosure are not limited to any specific combination of hardware circuitry and software.
Various aspects of the subject matter described in this specification can be implemented in a computing system that includes a back end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface and/or a Web browser through which a user can interact with an implementation of the subject matter described in this specification), or any combination of one or more such back end, middleware, or front end components. The components of the system 1900 can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network and a wide area network.
The term “machine-readable storage medium” or “computer readable medium” as used herein refers to any medium or media that participates in providing instructions to the processor 1902 for execution. Such a medium may take many forms, including, but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as the data storage device 1906. Volatile media include dynamic memory, such as the memory 1904. Transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise the bus 1908. Common forms of machine-readable media include, for example, floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH EPROM, any other memory chip or cartridge, or any other medium from which a computer can read. The machine-readable storage medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them.
As used herein, a “processor” can include one or more processors, and a “module” can include one or more modules.
In an aspect of the subject technology, a machine-readable medium is a computer-readable medium encoded or stored with instructions and is a computing element, which defines structural and functional relationships between the instructions and the rest of the system, which permit the instructions' functionality to be realized. Instructions may be executable, for example, by a system or by a processor of the system. Instructions can be, for example, a computer program including code. A machine-readable medium may comprise one or more media.
As used herein, the word “module” refers to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, C++. A software module may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted language such as BASIC. It will be appreciated that software modules may be callable from other modules or from themselves, and/or may be invoked in response to detected events or interrupts. Software instructions may be embedded in firmware, such as an EPROM or EEPROM. It will be further appreciated that hardware modules may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors. The modules described herein are preferably implemented as software modules, but may be represented in hardware or firmware.
It is contemplated that the modules may be integrated into a fewer number of modules. One module may also be separated into multiple modules. The described modules may be implemented as hardware, software, firmware or any combination thereof. Additionally, the described modules may reside at different locations connected through a wired or wireless network, or the Internet.
In general, it will be appreciated that the processors can include, by way of example, computers, program logic, or other substrate configurations representing data and instructions, which operate as described herein. In other embodiments, the processors can include controller circuitry, processor circuitry, processors, general purpose single-chip or multi-chip microprocessors, digital signal processors, embedded microprocessors, microcontrollers and the like.
Furthermore, it will be appreciated that in one embodiment, the program logic may advantageously be implemented as one or more components. The components may advantageously be configured to execute on one or more processors. The components include, but are not limited to, software or hardware components, modules such as software modules, object-oriented software components, class components and task components, processes, methods, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
The foregoing description is provided to enable a person skilled in the art to practice the various configurations described herein. While the subject technology has been particularly described with reference to the various figures and configurations, it should be understood that these are for illustration purposes only and should not be taken as limiting the scope of the subject technology.
There may be many other ways to implement the subject technology. Various functions and elements described herein may be partitioned differently from those shown without departing from the scope of the subject technology. Various modifications to these configurations will be readily apparent to those skilled in the art, and generic principles defined herein may be applied to other configurations. Thus, many changes and modifications may be made to the subject technology, by one having ordinary skill in the art, without departing from the scope of the subject technology.
It is understood that the specific order or hierarchy of steps in the processes disclosed is an illustration of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged. Some of the steps may be performed simultaneously. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
As used herein, the phrase “at least one of” preceding a series of items, with the term “and” or “or” to separate any of the items, modifies the list as a whole, rather than each member of the list (i.e., each item). The phrase “at least one of” does not require selection of at least one of each item listed; rather, the phrase allows a meaning that includes at least one of any one of the items, and/or at least one of any combination of the items, and/or at least one of each of the items. By way of example, the phrases “at least one of A, B, and C” or “at least one of A, B, or C” each refer to only A, only B, or only C; any combination of A, B, and C; and/or at least one of each of A, B, and C.
Terms such as “top,” “bottom,” “front,” “rear” and the like as used in this disclosure should be understood as referring to an arbitrary frame of reference, rather than to the ordinary gravitational frame of reference. Thus, a top surface, a bottom surface, a front surface, and a rear surface may extend upwardly, downwardly, diagonally, or horizontally in a gravitational frame of reference.
Furthermore, to the extent that the term “include,” “have,” or the like is used in the description or the claims, such term is intended to be inclusive in a manner similar to the term “comprise” as “comprise” is interpreted when employed as a transitional word in a claim.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments.
A reference to an element in the singular is not intended to mean “one and only one” unless specifically stated, but rather “one or more.” Pronouns in the masculine (e.g., his) include the feminine and neuter gender (e.g., her and its) and vice versa. The term “some” refers to one or more. Underlined and/or italicized headings and subheadings are used for convenience only, do not limit the subject technology, and are not referred to in connection with the interpretation of the description of the subject technology. All structural and functional equivalents to the elements of the various configurations described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and intended to be encompassed by the subject technology. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the above description.
While certain aspects and embodiments of the subject technology have been described, these have been presented by way of example only, and are not intended to limit the scope of the subject technology. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms without departing from the spirit thereof. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the subject technology.
Pastrana-Rios, Belinda, Rodriguez-Toro, Jose Javier
Patent | Priority | Assignee | Title |
8268628, | Apr 13 2007 | University of Puerto Rico | Method for determination of protein, peptide or peptoid aggregation, stability, and viability and system using the same |
9702810, | Sep 22 2014 | University of Puerto Rico | Dual cell holder system |
WO2017184886, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jan 20 2017 | Protein Dynamic Solutions, Inc. | (assignment on the face of the patent) | / | |||
Dec 26 2018 | PROTEIN DYNAMIC SOLUTIONS LLC | PROTEIN DYNAMIC SOLUTIONS, INC | MERGER AND CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 049849 | /0705 | |
May 02 2019 | RODRIGUEZ-TORO, JOSE JAVIER | PROTEIN DYNAMIC SOLUTIONS LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 049849 | /0727 | |
May 13 2019 | PASTRANA-RIOS, BELINDA | PROTEIN DYNAMIC SOLUTIONS LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 049849 | /0727 | |
Dec 20 2019 | PASTRANA-RIOS, BELINDA | PROTEIN DYNAMIC SOLUTIONS, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 051428 | /0282 |
Date | Maintenance Fee Events |
Jul 18 2018 | BIG: Entity status set to Undiscounted (note the period is included in the code). |
Jul 27 2018 | SMAL: Entity status set to Small. |
May 15 2024 | M2551: Payment of Maintenance Fee, 4th Yr, Small Entity. |
Date | Maintenance Schedule |
Dec 01 2023 | 4 years fee payment window open |
Jun 01 2024 | 6 months grace period start (w surcharge) |
Dec 01 2024 | patent expiry (for year 4) |
Dec 01 2026 | 2 years to revive unintentionally abandoned end. (for year 4) |
Dec 01 2027 | 8 years fee payment window open |
Jun 01 2028 | 6 months grace period start (w surcharge) |
Dec 01 2028 | patent expiry (for year 8) |
Dec 01 2030 | 2 years to revive unintentionally abandoned end. (for year 8) |
Dec 01 2031 | 12 years fee payment window open |
Jun 01 2032 | 6 months grace period start (w surcharge) |
Dec 01 2032 | patent expiry (for year 12) |
Dec 01 2034 | 2 years to revive unintentionally abandoned end. (for year 12) |