User interactions with a database storage system allow creation of virtual databases based on point-in-time copies associated with a source database. Multiple point-in-time copies are obtained for each source database. A point-in-time copy retrieves data changed in the source database since the retrieval of a previous point-in-time copy. A virtual database (VDB) is created by creating a set of files in the data storage system and mounting the files on a database server, allowing the database server to access the files. User interactions allow the user to specify the source database, a point in time associated with the source database, and a destination server for the virtual database. User input can specify other attributes associated with the virtual database, including file paths and database parameters. The user can also specify schedules for various actions, including the creation and retention of point-in-time copies.
1. A method of creating a virtual database system, the method comprising:
storing, by a database storage system, point-in-time copies of one or more source databases, wherein each of the one or more source databases stores metadata describing data stored in the source database, wherein a point-in-time copy of a given source database stored in the database storage system comprises one or more database blocks associated with multiple point-in-time copies of the given source database, wherein information stored in a database block comprises metadata associated with the database block;
configuring for presentation, by the database storage system, one or more user interfaces for receiving a request for creating a virtual database, the one or more user interfaces configured to receive:
information identifying a source database from the one or more source databases,
information identifying a point in time associated with the source database, and
information identifying a destination database server for accessing the virtual database being created;
receiving, from the one or more user interfaces, information identifying a particular source database for the virtual database being created;
receiving, from the one or more user interfaces, information identifying a particular point in time associated with the particular source database;
receiving, from the one or more user interfaces, information identifying a particular destination database server for accessing the virtual database being created;
creating the virtual database on the database storage system, wherein the created virtual database stores data corresponding to a state of the particular source database, the state associated with the particular point in time, wherein the created virtual database shares database blocks with one or more other virtual databases stored in the database storage system; and
providing access to the particular destination database server, the access for allowing the particular destination database server to perform read and write operations on the created virtual database.
15. A non-transitory computer readable storage medium storing instructions for:
storing, by a database storage system, point-in-time copies of one or more source databases, wherein each of the one or more source databases stores metadata describing data stored in the source database, wherein a point-in-time copy of a given source database stored in the database storage system comprises one or more database blocks associated with multiple point-in-time copies of the given source database, wherein information stored in a database block comprises metadata associated with the database block;
configuring for presentation, by the database storage system, one or more user interfaces for receiving a request for creating a virtual database, the one or more user interfaces configured to receive:
information identifying a source database from the one or more source databases,
information identifying a point in time associated with the source database, and
information identifying a destination database server for accessing the virtual database being created;
receiving, from the one or more user interfaces, information identifying a particular source database for the virtual database being created;
receiving, from the one or more user interfaces, information identifying a particular point in time associated with the particular source database;
receiving, from the one or more user interfaces, information identifying a particular destination database server for accessing the virtual database being created;
creating the virtual database on the database storage system, wherein the created virtual database stores data corresponding to a state of the particular source database, the state associated with the particular point in time, wherein the created virtual database shares database blocks with one or more other virtual databases stored in the database storage system; and
providing access to the particular destination database server, the access for allowing the particular destination database server to perform read and write operations on the created virtual database.
30. A computer system comprising:
an electronic processor; and
a non-transitory computer readable storage medium for storing instructions for:
storing, by a database storage system, point-in-time copies of one or more source databases, wherein each of the one or more source databases stores metadata describing data stored in the source database, wherein a point-in-time copy of a given source database stored in the database storage system comprises one or more database blocks associated with multiple point-in-time copies of the given source database, wherein information stored in a database block comprises metadata associated with the database block;
configuring for presentation, by the database storage system, one or more user interfaces for receiving a request for creating a virtual database, the one or more user interfaces configured to receive:
information identifying a source database from the one or more source databases,
information identifying a point in time associated with the source database, and
information identifying a destination database server for accessing the virtual database being created;
receiving, from the one or more user interfaces, information identifying a particular source database for the virtual database being created;
receiving, from the one or more user interfaces, information identifying a particular point in time associated with the particular source database;
receiving, from the one or more user interfaces, information identifying a particular destination database server for accessing the virtual database being created;
creating the virtual database on the database storage system, wherein the created virtual database stores data corresponding to a state of the particular source database, the state associated with the particular point in time, wherein the created virtual database shares database blocks with one or more other virtual databases stored in the database storage system; and
providing access to the particular destination database server, the access for allowing the particular destination database server to perform read and write operations on the created virtual database.
2. The method of
3. The method of
4. The method of
5. The method of
a live production database;
a storage level snapshot of a production database;
a clone of a production database; or
a previously created virtual database.
6. The method of
7. The method of
8. The method of
receiving, from the one or more user interfaces, selection of a particular host server associated with the particular source database;
discovering databases installed on the particular host server; and
sending for presentation via the one or more user interfaces, information describing the discovered databases for selection of a discovered database to be used as the particular source database.
9. The method of
receiving a selection of a position in the geometric shape; and
mapping the selection of the position in the geometric shape to the identified point in time.
10. The method of
receiving a selection of a position in the geometric shape;
mapping the selection of the position in the geometric shape to a database operation performed in the source database; and
determining the identified point in time based on a time of execution of the database operation.
11. The method of
receiving, from the one or more user interfaces, modifications to one or more database parameters from the set of database parameters to obtain a modified set of database parameters; and
using the modified set of database parameters for creating the virtual database.
12. The method of
storing the modified set of database parameters for creating subsequent virtual databases; and
using the modified set of parameters for creating a second virtual database.
13. The method of
14. The method of
configuring for presentation a user interface for specifying a group of virtual databases wherein an attribute specified for the group is applied to all the virtual databases in the group.
16. The non-transitory computer readable storage medium of
information indicating a time of update of data of the database block;
information related to the data stored in the database block; and
information related to one or more objects of a database that the database block is part of.
17. The non-transitory computer readable storage medium of
a live production database;
a storage level snapshot of a production database;
a clone of a production database; or
a previously created virtual database.
18. The non-transitory computer readable storage medium of
19. The non-transitory computer readable storage medium of
20. The non-transitory computer readable storage medium of
receiving, from the one or more user interfaces, selection of a particular host server associated with the particular source database;
discovering databases installed on the particular host server; and
sending for presentation via the one or more user interfaces, information describing the discovered databases for selection of a discovered database to be used as the particular source database.
21. The non-transitory computer readable storage medium of
receiving a selection of a position in the geometric shape; and
mapping the selection of the position in the geometric shape to the identified point in time.
22. The non-transitory computer readable storage medium of
receiving a selection of a position in the geometric shape;
mapping the selection of the position in the geometric shape to a database operation performed in the source database; and
determining the identified point in time based on a time of execution of the database operation.
23. The non-transitory computer readable storage medium of
receiving, from the one or more user interfaces, modifications to one or more database parameters from the set of database parameters to obtain a modified set of database parameters; and
using the modified set of database parameters for creating the virtual database.
24. The non-transitory computer readable storage medium of
storing the modified set of database parameters for creating subsequent virtual databases; and
using the modified set of parameters for creating a second virtual database.
25. The non-transitory computer readable storage medium of
26. The non-transitory computer readable storage medium of
configuring for presentation a user interface for specifying a group of virtual databases wherein an attribute specified for the group is applied to all the virtual databases in the group.
27. The non-transitory computer readable storage medium of
28. The non-transitory computer readable storage medium of
29. The non-transitory computer readable storage medium of
31. The computer system of
information indicating a time of update of data of the database block;
information related to the data stored in the database block; and
information related to one or more objects of a database that the database block is part of.
32. The computer system of
a live production database;
a storage level snapshot of a production database;
a clone of a production database; or
a previously created virtual database.
33. The computer system of
34. The computer system of
35. The computer system of
receiving, from the one or more user interfaces, selection of a particular host server associated with the particular source database;
discovering databases installed on the particular host server; and
sending for presentation via the one or more user interfaces, information describing the discovered databases for selection of a discovered database to be used as the particular source database.
36. The computer system of
receiving a selection of a position in the geometric shape; and
mapping the selection of the position in the geometric shape to the identified point in time.
37. The computer system of
receiving a selection of a position in the geometric shape;
mapping the selection of the position in the geometric shape to a database operation performed in the source database; and
determining the identified point in time based on a time of execution of the database operation.
38. The computer system of
receiving, from the one or more user interfaces, modifications to one or more database parameters from the set of database parameters to obtain a modified set of database parameters; and
using the modified set of database parameters for creating the virtual database.
39. The computer system of
storing the modified set of database parameters for creating subsequent virtual databases; and
using the modified set of parameters for creating a second virtual database.
40. The computer system of
41. The computer system of
configuring for presentation a user interface for specifying a group of virtual databases wherein an attribute specified for the group is applied to all the virtual databases in the group.
42. The computer system of
43. The computer system of
44. The computer system of
This application is a continuation of U.S. patent application Ser. No. 13/894,259 filed on May 14, 2013, which is a continuation of U.S. patent application Ser. No. 13/301,448 filed on Nov. 21, 2011, which claims the benefit of U.S. Provisional Application No. 61/418,396, filed on Nov. 30, 2010, each of which is incorporated by reference in its entirety.
This invention relates generally to databases and in particular to interfacing and interacting with storage efficient systems for managing databases.
Databases store the data that is critical to an organization and thus form an important part of an organization's information technology infrastructure. As the information available in an organization grows, so does the complexity of the infrastructure required to manage the databases that store the information. The increased complexity of the infrastructure increases the resources required to manage the databases and the applications that depend on the databases. These increased costs may include the costs associated with hardware for managing the databases as well as the costs associated with additional personnel needed to maintain the hardware. The increased complexity of the infrastructure also affects the maintenance operations associated with the databases, for example, causing backup and recovery operations to take significantly longer.
In a typical organization's infrastructure environment, production database servers run applications that manage the day-to-day transactions of the organization. Changes to production databases or to applications that depend on the production databases are tested on copies of the databases to protect the production environment. Copies of the production databases may be required for several stages in the lifecycles of workflows associated with the production database and applications that depend on the production databases. For example, the stages in the lifecycle of a change incorporated in a production database may include a development stage, a tuning stage, a testing stage, a quality assurance stage, a certification stage, a training stage, and a staging stage. Making copies of the production databases for each stage requires redundant and expensive hardware infrastructure as well as the time overhead required to copy the data, which may take days or weeks. Additional hardware also requires additional costs associated with physically storing the hardware, such as floor space requirements and costs related to power and cooling. Furthermore, redundant hardware typically causes inefficient use of available resources.
Since databases involve complex manipulations of data and information, database products provide various mechanisms to allow users or database administrators to interact or interface with the database. For example, users and database administrators can interact with the database using a user interface, application programming interface, commands, scripts and the like. The mechanisms provided by databases for interfacing with the database can be complex since a large number of commands and options for commands are typically available for manipulating information in a database.
The patent application file contains at least one drawing executed in color. Copies of this patent application with color drawings will be provided by the U.S. Patent and Trademark Office upon request and payment of the necessary fee.
The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
Virtual Database Systems
In certain embodiments of the invention, one or more virtual databases are created based on the state of a production database or a virtual database at a particular point in time, and the virtual databases can then be individually accessed and modified as desired. A database comprises data stored in a computer for use by computer implemented applications. A database server is a computer program that can interact with the database and provides database services, for example, access to the data stored in the database. The virtual database provides efficient storage of database blocks by sharing database blocks between virtual databases. A database block is a unit of data used by a database and comprises a specific number of bytes stored in the storage. A database block can also be referred to as a page. A portion of the database block stores metadata associated with the database block. Examples of information that may be stored in the metadata of a database block include information related to the data stored in the database block, information related to objects of the database that the database block is part of, or information indicating when the data in the database block was updated. The information indicating when a database block was updated may be available as a relative ordering of the database blocks based on their time of update.
A database storage system interfaces with a user to receive information necessary for creating a virtual database. The database storage system uses default values as attributes of the virtual database being created to reduce the burden on the database administrator creating the virtual database. However, the database administrator can choose to specify more or less information in order to customize the virtual database being created to suit a particular application or purpose.
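The use of default attribute values with optional overrides can be illustrated with a minimal sketch. The class name, the field names, and the default values below are hypothetical and are chosen only for illustration; they are not taken from any embodiment described herein.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VdbRequest:
    # The three inputs the text says the user identifies.
    source_database: str
    point_in_time: str
    destination_server: str
    # Hypothetical optional attributes with illustrative defaults.
    mount_path: str = "/mnt/vdb"
    cache_size_mb: int = 512


def build_request(source_database, point_in_time, destination_server, **overrides):
    """Start from the defaults and apply only the attributes the administrator set."""
    base = VdbRequest(source_database, point_in_time, destination_server)
    # dataclasses.replace raises TypeError for unknown attribute names.
    return replace(base, **overrides)


# An administrator who accepts every default.
default_req = build_request("sales_prod", "2011-11-21T09:00:00", "dev-host-1")

# An administrator who customizes a single attribute.
custom_req = build_request(
    "sales_prod", "2011-11-21T09:00:00", "dev-host-1", cache_size_mb=2048
)
```

In this sketch an administrator who supplies no overrides still obtains a complete request, while an administrator who supplies one override changes only that attribute.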
Database servers include commercially available programs, for example, database servers included with database management systems provided by ORACLE, SYBASE, MICROSOFT SQL SERVER, IBM's DB2, MYSQL, and the like. The term “production database” is used in particular examples to illustrate a useful application of the technology; however, it can be appreciated that the techniques disclosed can be used for any database, regardless of whether the database is used as a production database. The virtual databases are “virtual” in the sense that the physical implementation of the database files is decoupled from the logical use of the database files by a database server.
In one embodiment, information from the production database is copied to a storage system at various times, such as periodically. The schedule for copying the information from the production database can be either a default schedule selected by the system or specified by the database administrator. This enables reconstruction of the database files associated with the production database for these different points in time. The information may be managed in the storage system in an efficient manner so that copies of information are made only if necessary. For example, if a portion of the database is unchanged from a version that was previously copied, that unchanged portion need not be copied. A virtual database created for a point in time is stored as a set of files that contain the information of the database as available at that point in time. Each file includes a set of database blocks and the data structures for referring to the database blocks. In some embodiments, the database blocks may be compressed in order to store them efficiently.
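The idea of copying only the portions that changed can be sketched as follows. This is an illustrative sketch only: it assumes each version of the database is available as a mapping from block number to block contents and compares contents directly, whereas an actual system would more likely rely on vendor-provided interfaces or block metadata to identify changed blocks. The function name is hypothetical, and removed blocks are not handled.

```python
def changed_blocks(previous, current):
    """Return only the blocks of `current` that are new or differ from `previous`."""
    return {
        number: data
        for number, data in current.items()
        if previous.get(number) != data
    }


# Block contents at an earlier copy and at a later copy (illustrative data).
previous = {0: b"a", 1: b"b"}
current = {0: b"a", 1: b"B", 2: b"c"}

# Block 0 is unchanged and is therefore not copied again.
delta = changed_blocks(previous, current)
```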
A virtual database may be created on a database server by creating the database files for the production database corresponding to the state of the production database at a previous point in time, as required for the database server. The files corresponding to the virtual database are made available to the database server using a file sharing mechanism, which links the virtual database to the appropriate database blocks stored on the storage system. The process of making the virtual database available to a database server is called “provisioning” the virtual database. In some embodiments, provisioning the virtual database includes managing the process of creating a running database server based on the virtual database. Multiple VDBs can be provisioned based on the state of the production database at the same point in time. On the other hand, different VDBs can be based on different point-in-time states of the same production database or of different production databases.
The database server on which a virtual database has been provisioned can then read from and write to the files stored on the storage system. A database block may be shared between different files, each file associated with a different VDB. In particular, a database block is shared if the corresponding virtual database systems 130 are only reading the information in the database block and not writing to the database block. In one embodiment, the virtual database manager makes copies of the database blocks only if necessary. For example, a particular database block may be shared by multiple VDBs that read from the same database block. But if one of the virtual database systems attempts to write to the database block, a separate copy of the database block is made because the writing operation causes that database block to be different for the VDB corresponding to that virtual database system than it is for the other VDBs. Systems and methods for creating and using virtual databases are disclosed in U.S. patent application Ser. No. 12/603,541 filed on Oct. 21, 2009, which is incorporated by reference in its entirety.
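The sharing behavior described above resembles a copy-on-write scheme, which can be sketched as follows. The class and method names are hypothetical, and the in-memory dictionary merely stands in for the storage system; the sketch is not the implementation of the virtual database manager.

```python
class BlockStore:
    """Holds block contents keyed by block id (hypothetical stand-in for storage)."""

    def __init__(self):
        self._blocks = {}
        self._next_id = 0

    def put(self, data):
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = data
        return block_id

    def get(self, block_id):
        return self._blocks[block_id]


class VirtualDatabase:
    """Maps logical block numbers to stored block ids."""

    def __init__(self, store, block_map):
        self._store = store
        self._block_map = dict(block_map)  # private copy of the mapping only

    def read(self, number):
        return self._store.get(self._block_map[number])

    def write(self, number, data):
        # Never modify a shared block in place: store a new copy and
        # repoint only this VDB's mapping at it.
        self._block_map[number] = self._store.put(data)

    def block_id(self, number):
        return self._block_map[number]


store = BlockStore()
snapshot = {0: store.put(b"alpha"), 1: store.put(b"beta")}

# Two VDBs provisioned from the same point-in-time state share every block.
vdb_a = VirtualDatabase(store, snapshot)
vdb_b = VirtualDatabase(store, snapshot)

# A write by one VDB creates a separate copy for that VDB only.
vdb_a.write(1, b"beta-modified")
```

After the write, the two VDBs still refer to the same stored block for logical block 0, while logical block 1 refers to different stored blocks in each VDB.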
System Environment
In response to a request from the administrator system 140, or based on a predefined schedule, the database storage system 100 may send a request 150 for data to a production database system 110. The production database system 110 responds by sending information stored in the production database as a stream of data 160. The request 150 is sent periodically and the production database system 110 responds by sending information representing changes of data stored in the production database since the last response 160 sent by the production database system 110. The database storage system 100 receives the data 160 sent by the production database system 110 and stores the data. The database storage system 100 may analyze the data 160 received to determine whether to store the information or skip the information if the information is not useful for reconstructing the database at previous time points. The database storage system 100 stores the information efficiently, for example, by keeping versions of database blocks that have changed and reusing database blocks that have not changed. In an embodiment, database storage system 100 employs a hierarchical caching system where high speed solid-state drive (SSD) or equivalent storage devices are configured for caching read operations and for persisting logs for writing operations to magnetic disks.
To create a virtual database, the database storage system 100 creates files that represent the information corresponding to the production database system 110 at a given point in time. The database storage system 100 exposes 170 the corresponding files to a virtual database system 130 using a file sharing system 120. The virtual database system 130 runs a database server that can operate with the files exposed 170 by the database storage system 100. Hence, a virtual copy of the production database is created for the virtual database system 130 for a given point in time in a storage efficient manner.
System Architecture
A production database system 110 is typically used by an organization for maintaining its daily transactions. For example, an online bookstore may save all the ongoing transactions related to book purchases, book returns, or inventory control in a production system 110. The production system 110 includes a database server 245, a production DB data store 250, a vendor interface module 235, and a production system library 285. In alternative configurations, different and/or additional modules can be included in a production database system 110.
The production DB data store 250 stores data associated with a database that may represent, for example, daily transactions of an enterprise. The database server 245 is a computer program that provides database services and application programming interfaces (APIs) for managing data stored on the production DB data store 250. The production system library 285 provides APIs useful for extracting information from the production database system 110. The vendor interface module 235 represents APIs provided by a vendor for customizing functionality provided by the database server 245, for example, APIs to retrieve database blocks that changed since a previous time point. An example of a vendor interface module is the program code of a database server provided by vendor ORACLE that implements RMAN APIs. Database servers provided by other vendors, for example, MICROSOFT's SQL SERVER or IBM's DB2 have similar APIs. In one embodiment, the vendor interface module 235 mounts the production DB data store 250 of the production database system 110 on the database storage system 100 using a file sharing system similar to the file sharing system 120. Mounting the production DB data store 250 on the database storage system 100 allows transfer of information stored on the production database system 110 to the database storage system 100.
The production system library 285 may be implemented in different ways depending on the requirements of the vendor interface module 235. In an embodiment, the vendor interface module 235 loads the production system library 285 in order to call back functions implemented in the production system library 285. For example, the production system library 285 may be a shared object file with a “.so” or a “.DLL” file extension that contains executable program code that can be called by a C/C++ executable program or by a JAVA program that uses the JAVA NATIVE INTERFACE for interaction with binary code generated by C/C++ programs. Alternatively, the production system library 285 may be implemented using the JAVA programming language and installed in the production database system 110 as a file with “.jar” extension. The JAVA program requires a JAVA VIRTUAL MACHINE running on the production database system 110 for execution. In another embodiment, a part of the production system library 285 may be implemented as an executable “.so” shared object file and another part of the production system library 285 may be implemented as a JAVA program installed as a “.jar” file.
The vendor interface module 235 responds to requests from the database storage system 100, and in response to the requests, collects requested information from the production DB data store 250 and returns the collected information to the database storage system 100. The vendor interface module 235 may send a request to the database server 245 for retrieving information from the production DB data store 250. The vendor interface module 235 loads the program code in the production system library 285 and invokes it to transmit the stream of data to the database storage system 100 for further processing. In some embodiments the vendor interface module 235 may directly interact with the production DB data store 250 instead of sending a request to the database server 245 to retrieve the necessary database blocks. In other embodiments, the vendor interface module 235 may retrieve the necessary database blocks from storage level snapshots of production databases or clones of production databases instead of a live production database.
The database storage system 100 retrieves information available in the production database systems 110 and stores it. The information retrieved includes database blocks comprising data stored in the database, transaction log information, metadata information related to the database, information related to users of the database and the like. The information retrieved may also include configuration files associated with the databases. For example, databases may use vendor specific configuration files to specify various configuration parameters including initialization parameters associated with the databases. Copying the configuration files allows a VDB to be created with configuration parameters similar to the source production database. In some embodiments, the configuration parameter files may be modified by a database administrator using the user interface 295 to customize the VDB configuration for a specific usage scenario. For example, the production database may be accessed by a database server 245 using a particular cache size whereas the corresponding VDB may be accessed by a database server 260 using a different cache size.
The information retrieved may also include information associated with applications using the database, for example, an enterprise resource planning (ERP) application may be using the database and may have data specific to the ERP application. Retrieving the ERP application data allows a similar ERP application to be executed with a VDB created based on the production database system. This is beneficial for usage scenarios where a VDB is created for an environment similar to the production environment, for example, for testing and development. A database administrator can use the user interface 295 to specify logic for copying the information that is specific to a production environment as well as logic for appropriately installing the information with a VDB for use by a virtual database system 130.
In some embodiments, information regarding users of the production database, for example, the users with administrative privileges, may be obtained by using specific APIs or by running specific scripts on the production database. The information about the users can be used to facilitate life cycle management of VDBs in the system. In an embodiment, a database administrator is allowed to use the user interface 295 in order to specify information regarding user accounts to be created and their access permissions. For example, if the VDB is created for testing purposes, test users may be created on the VDB for the test organization, whereas if the VDB is created as a standby for the production database, only users with production support roles should have access. In some embodiments, an access permission may specify whether a user can provision a privileged VDB. One example of a privileged VDB is a VDB with full access to non-public information (information that may not be accessible to non-privileged users), for example, social security numbers or credit card information. The corresponding un-privileged VDB is a VDB with non-public information masked or scrambled. Another example of a privileged VDB is a VDB with sensitive data accessible transparently. The corresponding un-privileged VDB is a VDB with sensitive information encrypted.
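The distinction between a privileged and an un-privileged VDB can be illustrated with a minimal masking sketch. The function names, the field names, and the choice to leave the last four characters visible are assumptions made for illustration; the text does not specify a particular masking or scrambling technique.

```python
def mask_value(value, visible=4, fill="*"):
    """Replace all but the last `visible` characters of `value` with `fill`."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


def unprivileged_view(row, sensitive_fields):
    """Return a copy of `row` with the named non-public fields masked."""
    return {
        key: mask_value(val) if key in sensitive_fields else val
        for key, val in row.items()
    }


# Illustrative record; the values are made up.
row = {"name": "A. Reader", "ssn": "123-45-6789"}

# A privileged VDB would expose `row` unchanged; an un-privileged VDB
# would expose the masked form instead.
masked = unprivileged_view(row, {"ssn"})
```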
In some embodiments, access privileges are simplified to three levels: administrator, owner, and auditor. An administrator has full control of all managed objects including databases and hosts. The control available to an administrator includes policy management. An owner has access to use of resources, for example, an owner can provision a VDB. An auditor can view logs but may not have rights to consume system resources.
The data stored in the storage system data store 290 can be exposed to a virtual database system 130 allowing the virtual database system 130 to treat the data as a copy of the production database stored in the production database system 110. The database storage system 100 includes a point-in-time copy manager 210, a transaction log manager 220, an interface manager 230, a system configuration manager 215, a storage allocation manager 265, a file sharing manager 270, a virtual database manager 275, and a storage system data store 290. In alternative configurations, different and/or additional modules can be included in the database storage system 100.
The point-in-time copy manager 210 interacts with the production database system 110 by sending a request to the vendor interface module 235 to retrieve information representing a point-in-time copy (also referred to as a “PIT copy”) of a database stored in the production DB data store 250. The point-in-time copy manager 210 stores the data obtained from the production database system 110 in the storage system data store 290. The data retrieved by the point-in-time copy manager 210 corresponds to database blocks (or pages) of the database being copied from the production DB data store 250. After a first PIT copy request to retrieve information from the production DB data store 250, a subsequent PIT copy request may need to retrieve only the data that changed in the database since the previous request. The data collected in the first request can be combined with the data collected in a second request to reconstruct a copy of the database corresponding to a point in time at which the data was retrieved from the production DB data store 250 for the second request.
The transaction log manager 220 sends requests to the production database system 110 for retrieving portions of the transaction logs stored in the production database system 110. In some embodiments, the request from the transaction log manager 220 is sent to the vendor interface module 235. The data obtained by the transaction log manager 220 from the vendor interface module 235 is stored in the storage system data store 290. In one embodiment, a request for transaction logs retrieves only the changes in the transaction logs in the production database system 110 since a previous request for the transaction logs was processed. The database blocks retrieved by the point-in-time copy manager 210 combined with the transaction logs retrieved by the transaction log manager 220 can be used to reconstruct a copy of a database in the production system 110 corresponding to times in the past in between the times at which point-in-time copies are made.
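The combination of incremental PIT copies and transaction logs can be illustrated with a minimal Python sketch. It is an illustration under simplifying assumptions, not the described implementation: the function name `reconstruct` and the data shapes are hypothetical, each PIT copy is modeled as a dictionary holding only the blocks changed since the previous copy, and each log record is modeled as a whole-block write, whereas real transaction logs record finer-grained changes.

```python
from bisect import bisect_right


def reconstruct(pit_copies, log_records, target_time):
    """Rebuild the block contents of a database as of target_time.

    pit_copies:  list of (time, {block_id: data}) sorted by time; every copy
                 after the first holds only blocks changed since the previous
                 copy (hypothetical representation).
    log_records: list of (time, block_id, data) sorted by time
                 (hypothetical, simplified to whole-block writes).
    """
    times = [t for t, _ in pit_copies]
    # Number of PIT copies taken at or before the requested time.
    count = bisect_right(times, target_time)
    if count == 0:
        raise ValueError("no point-in-time copy at or before target time")

    # Overlay the incremental copies in order; later copies win.
    state = {}
    for _, changed_blocks in pit_copies[:count]:
        state.update(changed_blocks)

    # Replay the log records that fall after the last PIT copy used.
    base_time = times[count - 1]
    for t, block_id, data in log_records:
        if base_time < t <= target_time:
            state[block_id] = data
    return state
```

With a PIT copy at time 10 and another at time 20, a request for time 14 would start from the first copy and replay only the log records between 10 and 14.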
The storage allocation manager 265 provides the functionality of saving data retrieved from the production database system 110. For example, the point-in-time copy manager 210 may call APIs of the storage allocation manager 265 to save blocks of data retrieved from the production database system 110. The storage allocation manager 265 keeps track of the various versions of each block of data that may be obtained from the production database system 110. For a given time point, the storage allocation manager 265 can be requested to provide the latest version of a block of data obtained before the given time point. The storage allocation manager 265 can also be used for making copies of blocks of data. If a block of data is copied for read-only purposes, the storage allocation manager 265 allocates only sufficient storage to keep a pointer or reference to the existing block of data. However, if an attempt to write to the copied block of data is made, the storage allocation manager 265 allocates sufficient storage to make an actual copy of the block of data to avoid updating the original block of data.
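The version lookup described above can be sketched as follows. This is a minimal, hypothetical illustration: the class name `VersionedBlockStore` and its methods are assumptions made for the example and do not correspond to any API named in this description. It keeps the versions of each block ordered by time and returns the latest version obtained strictly before a given time point.

```python
from bisect import bisect_left


class VersionedBlockStore:
    """Hypothetical in-memory model of per-block version tracking."""

    def __init__(self):
        # block_id -> list of (time, data), kept sorted by time
        self._versions = {}

    def save(self, block_id, time, data):
        versions = self._versions.setdefault(block_id, [])
        versions.append((time, data))
        versions.sort(key=lambda v: v[0])

    def latest_before(self, block_id, time_point):
        """Return the newest data saved strictly before time_point, or None."""
        versions = self._versions.get(block_id, [])
        times = [t for t, _ in versions]
        # bisect_left counts the versions whose time is < time_point.
        count = bisect_left(times, time_point)
        return versions[count - 1][1] if count else None
```

Whether "before" should include a version saved exactly at the time point is a design choice; the sketch treats it as exclusive.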
The file sharing manager 270 allows files stored in the storage system data store 290 to be shared across computers that may be connected with the database storage system 100 over the network. The file sharing manager 270 uses the file sharing system 120 for sharing files. An example of a system for sharing files is a network file system (NFS). A system for sharing files may utilize fiber channel storage area networks (FC-SAN) or network attached storage (NAS) or combinations and variations thereof. The system for sharing files may be based on small computer system interface (SCSI) protocol, internet small computer system interface (iSCSI) protocol, fiber channel protocols or other similar and related protocols. In some embodiments, the database storage system 100 may utilize a logical volume manager. Sharing a file stored in the storage system data store 290 using the file sharing manager 270 allows a remote computer, for example, a virtual database system 130, to access the data in the shared file. A remote system may be able to read from and write to the file shared by the storage system data store 290. In an embodiment, files are organized in a format emulating a given file system disk layout, such as the file system of the WINDOWS operating system called NTFS or the UNIX file system (UFS).
The virtual database manager 275 receives requests for creation of a virtual database for a virtual database system 130. The request for creation of a virtual database may be sent by a database administrator using the administration system 140. The request identifies a production database system 110 and a virtual database system 130, and includes a past point in time for which the virtual database is to be created. The virtual database manager 275 creates the necessary files corresponding to the virtual database being created and shares the files with the virtual database system 130. The database administrator for a virtual database system 130 may be different from a database administrator for the production database system 110.
The interface manager 230 renders information for display using the administration system 140. A database administrator user can see information available in the storage system data store 290 as well as take actions executed by the database storage system. For example, a database administrator can see the different production databases stored in the storage system data store 290 obtained from different production database systems 110. As another example, the database administrator can request the database storage system 100 to make a PIT copy of a database stored on a production database system 110 at a particular point-in-time. In an embodiment, the interface manager 230 allows external applications to access information of the database storage system 100. For example, the database storage system may provide an application programming interface (API) to allow third party vendors to write applications based on the database storage system 100. In an embodiment, the interface manager 230 provides web services that allow web applications to access information available in the database storage system 100. For example, the database storage system can be part of a cloud computing environment. A third party vendor can use web services to implement various workflow scenarios based on VDBs, for example the various workflow scenarios described herein. This allows automation of the workflow scenarios based on VDBs.
The system configuration manager 215 allows a database administrator using the administration system 140 to set up or change the configuration of the database storage system 100. For example, when the database storage system is being initially set up or at a later stage, the system configuration manager 215 allows a database administrator user or an agent to specify production database systems 110 and virtual database systems 130 to connect to. The system configuration manager 215 also allows a user with appropriate roles and privileges to set up policies specifying the schedule with which the point-in-time copy manager 210 retrieves PIT copies of databases in the production database systems 110 as well as the frequency and the times at which the transaction log manager 220 retrieves updates to online transaction logs from the production database systems 110. In an embodiment, a schedule can specify the frequency and times during the day for the PIT and log retrieval actions, or it could be an aperiodic schedule specifying the calendar days when the same action should take place.
In an embodiment, policies can be defined by a database administrator and stored in the system configuration manager 215 for various operations associated with the loading of point-in-time copies from production database systems 110, loading of transaction logs from the production database systems 110, purging of information from the database storage system 100 including point-in-time copies of databases and transaction log information, and provisioning of virtual database systems. A policy specifies rules for executing the specific operation. For example, a policy may specify the operation to be executed based on a predetermined schedule. A policy may determine when to purge PIT copies stored in the database storage system 100 based on the number of PIT copies that have been accumulated for a production database. A policy may measure storage availability to determine when to purge information. For example, if the amount of storage available falls below a threshold level, old PIT copies of selected databases may be purged. The policy may also specify the priority of production databases to be used before purging information, for example, low-priority database information is purged before purging high-priority database information. In a particular workflow scenario, a policy may determine when to obtain new information from a production database and automatically update VDB information and provision the updated VDB based on the new information.
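A purge policy driven by a storage threshold and database priority might be sketched as below. The names `PitCopy` and `select_for_purge` are hypothetical, as is the rule that the newest copy of each database is always retained. The sketch also assumes that purging a copy frees its full recorded size, which ignores block sharing between copies.

```python
from dataclasses import dataclass


@dataclass
class PitCopy:
    database: str
    time: int       # when the copy was made
    size: int       # storage assumed to be freed if purged
    priority: int   # lower value = lower priority = purged first


def select_for_purge(copies, free_space, threshold):
    """Pick PIT copies to purge until free_space reaches threshold."""
    # Assumption: always keep the newest copy of every database.
    newest = {}
    for c in copies:
        if c.database not in newest or c.time > newest[c.database].time:
            newest[c.database] = c
    candidates = [c for c in copies if newest[c.database] is not c]

    # Low-priority databases first, and oldest copies first within a priority.
    candidates.sort(key=lambda c: (c.priority, c.time))

    purged = []
    for c in candidates:
        if free_space >= threshold:
            break
        purged.append(c)
        free_space += c.size
    return purged
```

A count-based rule (purge once more than N copies have accumulated for a database) could be layered on the same candidate ordering.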
A virtual database system 130 includes a database server 260 and a VDB system library 280. The database server 260 is similar in functionality to the database server 245 and is a computer program that provides database services and application programming interfaces (APIs) for managing data stored on a data store 250. The data managed by the database server 260 may be stored on the storage system data store 290 that is shared by the database storage system 100 using a file sharing system 120. The VDB system library 280 contains program code for processing requests sent by the database storage system 100. In alternative configurations, different and/or additional modules can be included in a virtual database system 130.
The user interface 295 can provide a list of source databases to select from. The user can select a particular source database and send the selection to the database storage system 100. The database storage system 100 receives 310 the selection of the source database. The source databases presented to the user for selection comprise source databases for which the database storage system 100 has stored point-in-time copies and transaction logs. In an embodiment, the user provides input identifying a server machine hosting one or more databases. The database storage system 100 analyzes the server machine to determine the databases hosted by the server machine and presents the discovered databases as potential source databases to select from. The discovery of the databases can be based on discovery of names of files or file paths that are typically used by production database system 110 as well as by discovery of processes running on the server machine that are typically present in production database systems 110.
The user interface 295 allows the user to select a point-in-time value. The database storage system 100 receives 315 the selection of the point-in-time value. In an embodiment, the user interface 295 presents a time line to the user indicating a range of point-in-time values to select from, allowing the user to select a point-in-time value by identifying a position in the time line. The database storage system 100 uses the point-in-time value for determining the database blocks of the source database stored in the storage system data store 290 to be used for creating the VDB.
The user interface 295 allows the user to select a destination database system 130 for accessing the VDB being created. The database storage system 100 receives 320 the selection of the destination database system 130 from the user. In an embodiment, the user interface 295 presents a list of previously selected destination database systems 130 to the user. Alternatively, the user interface 295 allows the user to enter information identifying the destination database systems 130, for example, using a machine name or internet protocol (IP) address.
The user interface 295 presents 325 to the user the parameters of the source database selected by the user. The database storage system 100 by default may use values from the parameters of the source database as the corresponding parameters for the VDB being created. Alternatively, the user can modify the parameter values presented by the user interface 295. The modifications of the parameters are received 325 by the database storage system 100. The database storage system 100 uses the set of parameters including the unmodified values as well as the modified values as the parameters for the VDB being created. In an embodiment, the database storage system 100 stores the set of parameter values as modified by the user and uses them as the default for subsequent VDBs created by the user, for example, VDBs created using the same source database.
The user interface 295 presents 330 to the user the file paths where the database storage system 100 expects to create the files associated with the VDB. The user can modify the file paths as well as the file names. For example, certain applications using the VDB may require a special file naming convention or the files to be stored at a particular file path. The database storage system 100 receives 335 the modifications to the file path. In an embodiment, the user interface 295 allows the user to map patterns in the default file path to patterns associated with a desired file path. The mapping of the patterns can be stored by the database storage system 100 and applied to subsequent VDBs created by the user.
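One simple way to picture the pattern mapping is as a table of path prefixes. The sketch below is an assumption made for illustration: the function `map_file_path`, the prefix-based matching rule, and the example paths are all hypothetical, and the description does not specify how patterns are matched.

```python
def map_file_path(default_path, pattern_map):
    """Rewrite default_path using a stored mapping of path prefixes.

    pattern_map maps a prefix of the default path to the desired prefix.
    The longest matching prefix is applied; unmatched paths are unchanged.
    """
    for old_prefix in sorted(pattern_map, key=len, reverse=True):
        if default_path.startswith(old_prefix):
            return pattern_map[old_prefix] + default_path[len(old_prefix):]
    return default_path


# Hypothetical stored mapping, reusable for later VDBs created by the user.
stored_mapping = {"/u01/oradata/prod": "/mnt/vdb/test"}
print(map_file_path("/u01/oradata/prod/system01.dbf", stored_mapping))
```

Because the mapping is stored separately from any one VDB, the same rewrite can be applied to the default paths of each subsequent VDB.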
Based on the input received by the database storage system 100 in the steps described above, the database storage system 100 creates the VDB. The created VDB is based on a snapshot and transaction logs associated with the point-in-time value selected by the user. The database blocks of the source database associated with updates made in the source database prior to the selected point-in-time are linked to a file structure created for the VDB. The file structure for the VDB is mounted on the destination database system 130, thereby allowing the destination database system to access the VDB.
In an embodiment, the steps illustrated in the
In response to the user selecting a particular source database 410, the user interface provides information describing the source database including its status, size, name etc. as well as information describing the various point-in-time copies 430 stored on the database storage system 100. The information describing each point-in-time copy 430 comprises the time at which the point-in-time copy was made, the source database from which the point-in-time copy was made, and information describing the source database as well as the production database system 110 hosting the source database. The source database can be a virtual database associated with the data source.
The user interface shown in
The user interface shown in
The user interface 520 allowing the user to select a time point in between the time of copying of two point-in-time copies can be a time line marked with various time points that can be selected. The user may be provided a slider 530 for allowing the selection of a particular time point on the time line 520. Alternatively, the user interface can provide the user with any geometric shape that can be used for representing various time points, for example, a curved line or a thin rectangle or ellipse. The user can select a particular time point by using a slider 530 or by clicking or double clicking at a particular position in the geometric shape. In an embodiment, the user can be presented with a list of various time points based on textual representation, for example, a drop down list, and the user can make a particular selection. Another embodiment allows the user to enter a time value using a data entry widget, for example, a text box. The value entered by the user can be validated by the interface manager 230, for example, to ensure that the value is within a valid range.
The arrow 710 shown in
The user interface shown in
A VDB may be created using a point-in-time copy of another VDB as a source. For example, assume VDB1 is created and provisioned to a virtual database system 130. Database blocks associated with the VDB are copied when the virtual database system 130 writes to the database blocks for the first time. Point-in-time copies of VDB1 are also made based on a predefined schedule. This allows a user to create a second virtual database VDB2 based on a point-in-time copy of VDB1. Transaction logs of VDB1 are also stored, allowing a user to create the second virtual database VDB2 based on any previous state of VDB1 that may be in-between point-in-time copies of VDB1.
The virtual database system 130 is allowed to read from the file structures created for a VDB as well as write to them. When the virtual database system 130 writes to a block Vij, space is allocated for the database block and the data of the corresponding database block is copied to the space allocated. For example, if the virtual database system 130 writes to the block V11, space is allocated and block F11 is copied to the allocated block. Hence the original copy of the block F11 is maintained as a read-only copy and the virtual database system 130 is allowed to write to a copy of the appropriate database block created specifically for the virtual database system 130. This can be considered a lazy mechanism for creating copies of the database blocks that copies a database block only if the corresponding virtual database system 130 writes to the database block. Since the number of blocks that a virtual database system 130 writes to may be a small fraction of the total number of blocks associated with the VDB, the above structure stores the data associated with the VDB in a highly storage-efficient manner. A database block that is not written to by virtual database systems 130 may be shared by several virtual database systems without being copied for a specific virtual database system 130.
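The lazy copy mechanism can be sketched in a few lines of Python. The class `VdbFile` and its methods are hypothetical names chosen for this illustration; the shared blocks are modeled as immutable `bytes` objects in a dictionary, and a private `bytearray` copy is allocated only on the first write to a block.

```python
class VdbFile:
    """Hypothetical model of a VDB file structure with copy-on-write blocks."""

    def __init__(self, shared_blocks):
        # block_id -> bytes; shared with other VDBs and never modified here.
        self._shared = shared_blocks
        # block_id -> bytearray; private copies created on first write.
        self._private = {}

    def read(self, block_id):
        if block_id in self._private:
            return self._private[block_id]
        return self._shared[block_id]

    def write(self, block_id, offset, data):
        if block_id not in self._private:
            # First write: allocate space and copy the original block.
            self._private[block_id] = bytearray(self._shared[block_id])
        block = self._private[block_id]
        block[offset:offset + len(data)] = data

    def private_block_count(self):
        return len(self._private)


shared = {"F11": b"aaaa", "F12": b"bbbb"}
vdb1, vdb2 = VdbFile(shared), VdbFile(shared)
vdb1.write("F11", 1, b"ZZ")   # only vdb1 gets a private copy of F11
```

After the write, `vdb2` and the shared store still see the original contents of F11, and F12 remains a single shared block for both VDBs.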
A user can specify policies 1010 for scheduling the operation of the initial copy of a source database to the database storage system 100. The initial copy of the database can take a significant amount of time, depending on the size of the source database. For example, for large source databases, the initial database copy can take several hours. The user can specify policies defining when the initial copy operation is performed. For example, the user can specify intervals of time when the work load on the database storage system 100 is low, thereby allowing the database storage system 100 to devote more resources to the database copy operation and avoid affecting a large number of other operations during the process of copying.
A user can specify policies 1020 describing a schedule for making point-in-time copies of source databases. The source databases can include production databases and virtual databases. The policies 1020 may be specified for a group of source databases. A policy specified for a group of source databases is applicable to each source database in the group, unless overridden by a policy for a specific source database in the group. Since a point-in-time copy takes significantly less time compared to an initial copy of the source database, the point-in-time copy can be performed in one continuous time interval. Accordingly, the schedule for making point-in-time copies specifies a frequency at which the point-in-time copies are made.
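The rule that a group-level policy applies unless a database-specific policy exists can be expressed as a simple lookup. The function `effective_policy`, the dictionary layout, and the example database and group names are hypothetical and serve only to illustrate the precedence rule.

```python
def effective_policy(database, group_of, group_policies, database_policies):
    """Return the schedule policy that applies to a source database.

    A policy set for the specific database takes precedence over the policy
    of the group it belongs to. Returns None if neither is defined.
    """
    if database in database_policies:
        return database_policies[database]
    return group_policies.get(group_of.get(database))


# Hypothetical configuration: frequencies expressed in hours.
group_of = {"hr_db": "finance", "payroll_db": "finance"}
group_policies = {"finance": {"pit_copy_every_hours": 24}}
database_policies = {"payroll_db": {"pit_copy_every_hours": 6}}
```

Here `hr_db` would inherit the 24-hour group schedule, while `payroll_db` would use its own 6-hour schedule.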
The following equation provides a quantitative measure of storage savings obtained by utilizing a set S of virtual databases.
$$\mathrm{vdbRatio} = \frac{\sum_{S} db_{uv}}{\sum_{S} db_{v}} \qquad (1)$$
The metric is called vdbRatio (VDB ratio) and is obtained by taking the ratio of the size of storage of unvirtualized databases (databases stored using conventional techniques) and the size of storage of virtual databases. The variable $db_{uv}$ represents the size of storage of an unvirtualized database. The symbol $\sum_{S}$ indicates summation of a parameter associated with each database, the summation computed over all databases belonging to a set S of databases. The variable $db_{v}$ represents the size of storage occupied by a virtualized database.
The following equation provides a quantitative measure of storage savings provided by the information stored in the database storage system 100 based on the mechanisms described herein, for example, as a result of reusing database blocks across point-in-time copies of source databases.
$$\mathrm{timeFlowRatio} = \frac{\sum_{S} \left( dSource_{uv} + SS_{uv} + dbLog_{uv} \right)}{\sum_{S} \left( dSource_{v} + SS_{v} + dbLog_{v} \right)} \qquad (2)$$
The metric determined using equation (2) is called timeFlowRatio (time-flow ratio). The symbol $\sum_{S}$ indicates summation of a parameter associated with each database, the summation computed over all databases belonging to a set S of databases. The variable $dSource_{v}$ represents the size of the storage occupied by the data blocks obtained initially from each source database. The variable $SS_{v}$ represents the size of storage occupied by each point-in-time copy obtained from the source database stored in virtualized form such that database blocks that do not change between two consecutive point-in-time copies are shared. The variable $dbLog_{v}$ represents the size of storage occupied by the database logs obtained from the source database from a given point in time stored in virtualized form. The variables $dSource_{uv}$, $SS_{uv}$, and $dbLog_{uv}$ correspond, respectively, to the size of the storage occupied by the data blocks obtained initially, the size of storage occupied by each point-in-time copy, and the size of storage occupied by the database logs corresponding to each database when the information is stored in unvirtualized form. In an embodiment, the variables $dSource_{uv}$, $SS_{uv}$, and $dbLog_{uv}$ represent the size of the corresponding information as the source database stores it, assuming the source database is a conventional database and not a virtual database.
Other variations of the metrics indicated in equations (1) and (2) can be used, for example, the inverse of the ratio can be used or the two values corresponding to the numerator and denominator can be presented separately.
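As a worked illustration, the two metrics can be computed as below. The sketch assumes that each metric is the sum of the unvirtualized sizes over the set S divided by the sum of the corresponding virtualized sizes, as the variable definitions above suggest. The function names, dictionary keys, and example sizes are hypothetical.

```python
def vdb_ratio(databases):
    """Ratio of total unvirtualized storage to total virtualized storage."""
    unvirtualized = sum(d["db_uv"] for d in databases)
    virtualized = sum(d["db_v"] for d in databases)
    return unvirtualized / virtualized


def time_flow_ratio(databases):
    """Same ratio over initial data blocks, PIT copies, and logs (assumed form)."""
    unvirtualized = sum(
        d["dSource_uv"] + d["SS_uv"] + d["dbLog_uv"] for d in databases
    )
    virtualized = sum(
        d["dSource_v"] + d["SS_v"] + d["dbLog_v"] for d in databases
    )
    return unvirtualized / virtualized


# Hypothetical sizes in gigabytes for a set S of two virtual databases.
vdbs = [{"db_uv": 100, "db_v": 10}, {"db_uv": 100, "db_v": 30}]
print(vdb_ratio(vdbs))  # 200 / 40 = 5.0
```

A ratio greater than 1 indicates that the virtualized form occupies less storage than the conventional form; the inverse can be reported instead if a "fraction of original size" reading is preferred.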
Computing Machine Architecture
The machine may be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a smartphone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions 1424 (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute instructions 1424 to perform any one or more of the methodologies discussed herein.
The example computer system 1400 includes a processor 1402 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), one or more application specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any combination of these), a main memory 1404, and a static memory 1406, which are configured to communicate with each other via a bus 1408. The computer system 1400 may further include a graphics display unit 1410 (e.g., a plasma display panel (PDP), a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)). The computer system 1400 may also include an alphanumeric input device 1412 (e.g., a keyboard), a cursor control device 1414 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 1416, a signal generation device 1418 (e.g., a speaker), and a network interface device 1420, which also are configured to communicate via the bus 1408.
The storage unit 1416 includes a machine-readable medium 1422 on which are stored instructions 1424 (e.g., software) embodying any one or more of the methodologies or functions described herein. The instructions 1424 (e.g., software) may also reside, completely or at least partially, within the main memory 1404 or within the processor 1402 (e.g., within a processor's cache memory) during execution thereof by the computer system 1400, the main memory 1404 and the processor 1402 also constituting machine-readable media. The instructions 1424 (e.g., software) may be transmitted or received over a network 1426 via the network interface device 1420.
While the machine-readable medium 1422 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions (e.g., instructions 1424). The term “machine-readable medium” shall also be taken to include any medium that is capable of storing instructions (e.g., instructions 1424) for execution by the machine and that cause the machine to perform any one or more of the methodologies disclosed herein. The term “machine-readable medium” includes, but is not limited to, data repositories in the form of solid-state memories, optical media, and magnetic media.
Additional Configuration Considerations
The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.
Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times, to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.
Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.
Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a tangible computer readable storage medium or any type of media suitable for storing electronic instructions, and coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
Wang, Yan, Srivastava, Alok, Yueh, Jedidiah, Luiz, Xavier David
Patent | Priority | Assignee | Title |
Patent | Priority | Assignee | Title |
4853843, | Dec 18 1987 | TEKTRONIX, INC , A CORP OF OREGON | System for merging virtual partitions of a distributed database |
5634053, | Aug 29 1995 | HE HOLDINGS, INC , A DELAWARE CORP ; Raytheon Company | Federated information management (FIM) system and method for providing data site filtering and translation for heterogeneous databases |
5680608, | Feb 06 1995 | IBM Corporation | Method and system for avoiding blocking in a data processing system having a sort-merge network |
5680618, | May 26 1993 | Borland Software Corporation | Driver query and substitution for format independent native data access |
5819292, | Jun 03 1993 | NetApp, Inc | Method for maintaining consistent states of a file system and for creating user-accessible read-only copies of a file system |
6523036, | Aug 01 2000 | EMC IP HOLDING COMPANY LLC | Internet database system |
6883083, | Dec 20 2002 | Veritas Technologies LLC | System and method for maintaining and accessing information regarding virtual storage devices |
6920457, | May 16 2002 | Virtual database of heterogeneous data structures | |
7107385, | Aug 09 2002 | Network Appliance, Inc | Storage virtualization by layering virtual disk objects on a file system |
7197491, | Sep 21 1999 | International Business Machines Corporation | Architecture and implementation of a dynamic RMI server configuration hierarchy to support federated search and update across heterogeneous datastores |
7222172, | Apr 26 2002 | Hitachi, Ltd. | Storage system having virtualized resource |
7225204, | Mar 19 2002 | Network Appliance, Inc | System and method for asynchronous mirroring of snapshots at a destination using a purgatory directory and inode mapping |
7269607, | Sep 29 2003 | International Business Machines Coproartion | Method and information technology infrastructure for establishing a log point for automatic recovery of federated databases to a prior point in time |
7334094, | Apr 30 2004 | Network Appliance, Inc | Online clone volume splitting technique |
7334095, | Apr 30 2004 | NetApp, Inc | Writable clone of read-only volume |
7373364, | Mar 05 2002 | NetApp, Inc | System and method for creating a point-in-time restoration of a database file |
7386695, | Dec 23 2004 | International Business Machines Corporation | Storage system with multiple copy targeting |
7409511, | Apr 30 2004 | NetApp, Inc | Cloning technique for efficiently creating a copy of a volume in a storage system |
7457982, | Apr 11 2003 | NetApp, Inc | Writable virtual disk of read-only snapshot file objects |
7539836, | Apr 18 2005 | NetApp, Inc | Method and system for configuring a data storage object |
7587563, | Jul 11 2006 | Network Appliance, Inc. | Method and system to make a read-only file system appear to be writeable |
7590660, | Mar 21 2006 | Network Appliance, Inc. | Method and system for efficient database cloning |
7631021, | Mar 25 2005 | NetApp, Inc | Apparatus and method for data replication at an intermediate node |
7653665, | Sep 13 2004 | Microsoft Technology Licensing, LLC | Systems and methods for avoiding database anomalies when maintaining constraints and indexes in presence of snapshot isolation |
7653794, | May 08 2006 | Microsoft Technology Licensing, LLC | Converting physical machines to virtual machines |
7743035, | Aug 09 2002 | NetApp, Inc. | System and method for restoring a virtual disk from a snapshot |
7757056, | Mar 16 2005 | NetApp, Inc | System and method for efficiently calculating storage required to split a clone volume |
7779051, | Jan 02 2008 | International Business Machines Corporation | System and method for optimizing federated and ETL'd databases with considerations of specialized data structures within an environment having multidimensional constraints |
7809769, | May 18 2006 | GOOGLE LLC | Database partitioning by virtual partitions |
7822758, | Apr 22 2005 | Network Appliance, Inc | Method and apparatus for restoring a data set |
7827366, | Oct 31 2006 | Network Appliance, Inc | Method and system for providing continuous and long-term data protection for a dataset in a storage system |
7856424, | Aug 04 2006 | Apple Computer, Inc; Apple Inc | User interface for backup management |
7877357, | Oct 12 2007 | Network Appliance, Inc | Providing a simulated dynamic image of a file system |
7895228, | Nov 27 2002 | International Business Machines Corporation | Federated query management |
7937547, | Jun 24 2005 | CATALOGIC SOFTWARE, INC | System and method for high performance enterprise data protection |
7941470, | Mar 28 2007 | VMware LLC | Synchronization and customization of a clone computer |
7953749, | May 11 2004 | Oracle International Corporation | Providing the timing of the last committed change to a row in a database table |
7996636, | Nov 06 2007 | NetApp, Inc. | Uniquely identifying block context signatures in a storage volume hierarchy |
8037032, | Aug 25 2008 | VMware LLC | Managing backups using virtual machines |
8150808, | Oct 21 2009 | Silicon Valley Bank | Virtual database system |
8161077, | Oct 21 2009 | Silicon Valley Bank | Datacenter workflow automation scenarios using virtual databases |
8255915, | Oct 31 2006 | Hewlett Packard Enterprise Development LP | Workload management for computer system with container hierarchy and workload-group policies |
8280858, | Jun 29 2009 | Oracle America, Inc | Storage pool scrubbing with concurrent snapshots |
8311988, | Aug 04 2006 | Apple Computer, Inc | Consistent back up of electronic information |
8532973, | Jun 27 2008 | NetApp, Inc. | Operating a storage server on a virtual machine |
8775663, | Apr 25 2007 | NetApp, Inc. | Data replication network traffic compression |
20020083037, | |||
20020143764, | |||
20030204597, | |||
20040054648, | |||
20050114701, | |||
20060242381, | |||
20070219959, | |||
20070260628, | |||
20070294215, | |||
20080005201, | |||
20080034268, | |||
20080154989, | |||
20080183973, | |||
20080256314, | |||
20080306904, | |||
20080307345, | |||
20090019246, | |||
20090080398, | |||
20090132611, | |||
20090132616, | |||
20090144224, | |||
20090177697, | |||
20090222496, | |||
20090292734, | |||
20100070476, | |||
20100125844, | |||
20100131959, | |||
20100174684, | |||
20110004586, | |||
20110004676, | |||
20110093435, | |||
20110093436, | |||
20110161973, | |||
CN101286127, | |||
CN101441582, | |||
CN101473309, | |||
CN1770088, | |||
JP2004110218, | |||
JP2005532611, | |||
JP2009530756, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Dec 17 2014 | Delphix Corporation | (assignment on the face of the patent) | / |
Date | Maintenance Schedule |
Jun 07 2019 | 4 years fee payment window open |
Dec 07 2019 | 6 months grace period start (w surcharge) |
Jun 07 2020 | patent expiry (for year 4) |
Jun 07 2022 | 2 years to revive unintentionally abandoned end. (for year 4) |
Jun 07 2023 | 8 years fee payment window open |
Dec 07 2023 | 6 months grace period start (w surcharge) |
Jun 07 2024 | patent expiry (for year 8) |
Jun 07 2026 | 2 years to revive unintentionally abandoned end. (for year 8) |
Jun 07 2027 | 12 years fee payment window open |
Dec 07 2027 | 6 months grace period start (w surcharge) |
Jun 07 2028 | patent expiry (for year 12) |
Jun 07 2030 | 2 years to revive unintentionally abandoned end. (for year 12) |