Sequential media reclamation is usually performed after portions of a sequential access volume's data are no longer needed and the unused portion of the volume exceeds a threshold. Improved sequential media reclamation is provided by using a sequential access disk volume (for example, a volume of a virtual tape library (VTL)) embodied as a sparse file. Reclamation of objects stored in the volume is accomplished by nulling out regions of the sparse file that contain the objects that are no longer needed. A replication method is also provided in which information about the objects stored in the sparse file (such as offset and length) is used during replication to enable the correct portions of a target volume (embodied as a sparse file) to be nulled out to match a source volume (also embodied as a sparse file).
|
1. A method comprising:
determining a used space of a logical volume of a storage repository that is currently being used by a plurality of needed files, the needed files each being a file that has one of: not expired as opposed to having expired, not deleted as opposed to having been deleted, and not outdated as opposed to being outdated;
determining whether the logical volume qualifies for reclamation by comparing the used space of the logical volume against a logical size of the logical volume, such that where a difference between the logical size and the used space divided by the logical size is greater than a first predetermined reclamation threshold, the logical volume qualifies for reclamation;
only in direct response to determining that the logical volume qualifies for reclamation, determining whether a non-sparse and unused space of the logical volume exists by determining whether the used space of the logical volume is less than a physical size of the logical volume, such that where a difference between the physical size and the used space divided by the logical size is greater than a second predetermined threshold, the non-sparse and unused space of the logical volume exists;
only in direct response to determining that the non-sparse and unused space of the logical volume exists, determining an offset and a length of each hole region of a plurality of hole regions of the logical volume, the offset of a hole region being a location of where the hole region starts within the logical volume relative to a beginning of the logical volume;
after determining the offset and the length of each hole region of the logical volume, nulling out each hole region of the logical volume by writing a null value to each position of a plurality of positions of each hole region, the plurality of positions starting at the offset into the logical volume and ending within the logical volume at the offset plus the length, where nulling out each hole region of the logical volume renders each hole region sparse; and
after nulling out each hole region of the logical volume, updating a database with the physical size of the logical volume that has changed due to each hole region of the logical volume having been nulled out, where updating the physical size of the logical volume is configured to prevent repeated reclamation of space of the logical volume.
7. A system comprising:
a storage repository including a logical volume;
a database; and
a server containing a memory and a processor to:
determine a used space of the logical volume of the storage repository that is currently being used by a plurality of needed files, the needed files each being a file that has one of: not expired as opposed to having expired, not deleted as opposed to having been deleted, and not outdated as opposed to being outdated;
determine whether the logical volume qualifies for reclamation by comparing the used space of the logical volume against a logical size of the logical volume, such that where a difference between the logical size and the used space divided by the logical size is greater than a first predetermined reclamation threshold, the logical volume qualifies for reclamation;
only in direct response to determining that the logical volume qualifies for reclamation, determine whether a non-sparse and unused space of the logical volume exists by determining whether the used space of the logical volume is less than a physical size of the logical volume, such that where a difference between the physical size and the used space divided by the logical size is greater than a second predetermined threshold, the non-sparse and unused space of the logical volume exists;
only in direct response to determining that the non-sparse and unused space of the logical volume exists, determine an offset and a length of each hole region of a plurality of hole regions of the logical volume, the offset of a hole region being a location of where the hole region starts within the logical volume relative to a beginning of the logical volume;
after determining the offset and the length of each hole region of the logical volume, null out each hole region of the logical volume by writing a null value to each position of a plurality of positions of each hole region, the plurality of positions starting at the offset into the logical volume and ending within the logical volume at the offset plus the length, where nulling out each hole region of the logical volume renders each hole region sparse; and
after nulling out each hole region of the logical volume, update the database with the physical size of the logical volume that has changed due to each hole region of the logical volume having been nulled out, where updating the physical size of the logical volume is configured to prevent repeated reclamation of space of the logical volume.
4. A computer program product comprising a non-transitory computer useable medium having a computer readable program, wherein the computer readable program, when executed on a computer, causes the computer to perform a method comprising:
determining a used space of a logical volume of a storage repository that is currently being used by a plurality of needed files, the needed files each being a file that has one of: not expired as opposed to having expired, not deleted as opposed to having been deleted, and not outdated as opposed to being outdated;
determining whether the logical volume qualifies for reclamation by comparing the used space of the logical volume against a logical size of the logical volume, such that where a difference between the logical size and the used space divided by the logical size is greater than a first predetermined reclamation threshold, the logical volume qualifies for reclamation;
only in direct response to determining that the logical volume qualifies for reclamation, determining whether a non-sparse and unused space of the logical volume exists by determining whether the used space of the logical volume is less than a physical size of the logical volume, such that where a difference between the physical size and the used space divided by the logical size is greater than a second predetermined threshold, the non-sparse and unused space of the logical volume exists;
only in direct response to determining that the non-sparse and unused space of the logical volume exists, determining an offset and a length of each hole region of a plurality of hole regions of the logical volume, the offset of a hole region being a location of where the hole region starts within the logical volume relative to a beginning of the logical volume;
after determining the offset and the length of each hole region of the logical volume, nulling out each hole region of the logical volume by writing a null value to each position of a plurality of positions of each hole region, the plurality of positions starting at the offset into the logical volume and ending within the logical volume at the offset plus the length, where nulling out each hole region of the logical volume renders each hole region sparse; and
after nulling out each hole region of the logical volume, updating a database with the physical size of the logical volume that has changed due to each hole region of the logical volume having been nulled out, where updating the physical size of the logical volume is configured to prevent repeated reclamation of space of the logical volume.
2. The method of
wherein nulling out each hole region of the logical volume by writing a null value to each position of each hole region comprises the server instructing the virtual tape library system to write a null value to each position of each hole region to render each hole region sparse as opposed to the server directly writing a null value to each position of each hole region, such that rendering each hole region sparse is offloaded from the server and from the storage repository to the virtual tape library system.
5. The computer program product of
wherein nulling out each hole region of the logical volume by writing a null value to each position of each hole region comprises the server instructing the virtual tape library system to write a null value to each position of each hole region to render each hole region sparse as opposed to the server directly writing a null value to each position of each hole region, such that rendering each hole region sparse is offloaded from the server and from the storage repository to the virtual tape library system.
6. The computer program product of
8. The system of
wherein the server is to null out each hole region of the logical volume by writing a null value to each position of each hole region by instructing the virtual tape library system to write a null value to each position of each hole region to render each hole region sparse as opposed to the server directly writing a null value to each position of each hole region, such that rendering each hole region sparse is offloaded from the server and from the storage repository to the virtual tape library system.
|
This invention relates in general to data storage management. More specifically, the invention relates to reclaiming sequential storage media, such as virtual tape.
Sequential media reclamation is a process in which space is reclaimed on sequential media after portions of the data stored on the media are no longer needed. The most common type of sequential media for which this process is performed is magnetic tape. Storage management systems may implement operations called “reclamation” or “recycling” to reclaim space by copying the data that is still needed from one sequential media volume to a new volume so that the source volume can be reclaimed or reused. This is typically done after a sequential volume has filled and the usable data on the volume falls below a specified threshold, typically established by the product user or administrator. The operation typically requires substantial database update activity in addition to data movement because the data location on the new volume needs to be updated in the database so that the data can be later located when needed by a restore or retrieve operation.
With certain storage management systems, backup or archive data stored on sequential media expires when a management policy (such as a retention or versioning policy) dictates that the data should no longer be retained. Because multiple files are stored sequentially on the media and each of the files may expire at differing times, segments of the data stored on the media are no longer needed over time. Upon expiration of a data object, a storage management server may logically delete the data object by removing references to the locations at which the data object was stored. Such expiration of data objects, as well as deletion of data objects for other reasons, causes logical vacancies to develop in the storage volumes. Such logical vacancies are space that is taken up by objects that are no longer needed. Since sequential media allows data to be appended, but does not allow for internal sections of the media to be overwritten, the logical vacancies cannot be reused unless the media is re-written from the beginning.
The physical reclamation process described above utilizes resources of the storage management server. For example, copying the data objects from the first storage volume to the second storage volume requires server resources. As another example, reclamation typically requires substantial database update activity because the data location on the new volume needs to be updated in the database so that the data can be later located when needed by a restore or retrieve operation.
In one embodiment, a method for reclaiming a sequential access disk volume is implemented by maintaining a set of objects for a sequential access disk volume using a sparse file. In this embodiment, the objects that are no longer needed are reclaimed by calculating region data (such as offset and length) and using this region data to remove the objects by making their regions in the sparse file null.
In another embodiment, the sequential access disk volume is a virtual tape library volume. In yet another embodiment, the method includes receiving an instruction to reclaim objects because the objects are expired or otherwise no longer needed. In yet another embodiment, the removal of objects from the sparse file is triggered when the sequential access disk volume meets a reclamation threshold indicator. In yet another embodiment, after the objects are removed from the sparse file, a database is updated with information concerning the size of the sparse file.
Other embodiments of the invention provide a method for replicating a source sequential access disk volume that is implemented as a sparse file to a target sequential access disk volume that also is implemented as a sparse file. In one embodiment, a server for the source volume sends a message to a server for the target volume. The server for the target volume parses the message to extract region data (such as offset and length) which is then used to remove objects by making their regions in the sparse file null. In one embodiment, the target and source files are virtual tape library volumes.
As illustrated by
The present invention preferably may be a computer program product stored on a computer useable medium (such as a disk) with instructions which are read and executed by the storage management server 410 (or other type of computer), causing the server to perform the steps necessary to implement or use the present invention. The computer program or the operating system may also be tangibly embodied in the server's memory or accessed over a network. The present invention may also be a method for performing the steps necessary to implement or use the invention. In yet another embodiment, the invention may be a system of hardware or software components. Those skilled in the art will recognize that many modifications may be made to the embodiments of the present invention without departing from its scope. Features discussed with regard to the various embodiments of the invention may be combined and need not all appear in a single embodiment. In embodiments described with the use of flowcharts, the steps may be combined or reordered without deviating from the scope of the present invention.
Through the storage management server 410, the system may manage the backup or archiving of data objects from the clients 405 to a storage repository 415. In some embodiments, the storage repository 415 consists of a set of logical storage pools 420, each containing one or more logical or physical storage volumes 105. Once data objects are stored in a storage pool 420, they may be copied or relocated to other storage pools in the storage repository 415.
The present invention makes use of virtual tape libraries (VTLs). While VTLs simulate tape, disks are usually utilized as the actual storage media. Although disks are used, the VTLs preserve the semantics of tape operations. In
The VTL system 440 may include one or more VTL servers 430 in communication with one or more VTL volumes 435. Data stored in the storage repository 415 may be backed up, archived or otherwise moved to the VTL volumes 435 through the VTL system 440. In an alternate embodiment, the VTL system may be part of the storage repository 415 and used in place of the logical storage pools 420 or storage volumes 105.
As part of the invention, regions of the VTL's disk storage that are no longer needed are marked. In a preferred embodiment, the present invention makes use of sparse files for this task: the regions of the VTL media that are no longer needed are nulled out in the sparse file. Because the files are sparse, the offset location for data that is still valid is preserved and database updates are not required to record the new location of the valid data. In addition, since sparse files require less disk space, their use reclaims space as if the good portions of the file were copied to a new volume or file. The present invention may increase the speed of reclamation since it is a logical reclamation instead of a physical reclamation. While sparse files are described, one skilled in the art will recognize that there are other techniques to mark regions of the VTL media that are no longer needed.
Performing reclamation on a sequential media volume that is known to be backed by disk media involves identifying the regions (offsets and lengths) of the volume that are no longer needed. This information is then used to create “holes” in the volume (file) by nulling out those regions, which makes the file sparse so that the file system stores only the needed regions on disk. Logically, the volume (file) size stays the same, but the volume utilizes less physical disk space. The offsets and lengths of the remaining needed data objects are not changed, so database updates are not needed to record the new locations. In addition, the data that is still needed remains on the same logical volume (which is stored in a file) and does not have to be copied to a new logical volume so, again, database updates are not required. A VTL that can support the subject invention can be used to reduce the database activity and the data movement required to reclaim unneeded space in virtual tape volumes.
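The identification and nulling steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation of the invention: the function names `hole_regions` and `null_out`, the 100-byte demonstration volume, and the object extents are all hypothetical. The sketch writes zero bytes over each unneeded region; whether the underlying blocks are actually deallocated depends on the file system, and a real system might instead use a platform facility such as Linux `fallocate` with `FALLOC_FL_PUNCH_HOLE`.

```python
import os
import tempfile


def hole_regions(needed, logical_size):
    """Return (offset, length) pairs for every gap not covered by a needed extent."""
    holes = []
    cursor = 0
    for offset, length in sorted(needed):
        if offset > cursor:
            holes.append((cursor, offset - cursor))
        cursor = max(cursor, offset + length)
    if cursor < logical_size:
        holes.append((cursor, logical_size - cursor))
    return holes


def null_out(path, holes, chunk=64 * 1024):
    """Overwrite each hole with zero bytes; the file length is left unchanged."""
    zeros = bytes(chunk)
    with open(path, "r+b") as f:
        for offset, length in holes:
            f.seek(offset)
            remaining = length
            while remaining > 0:
                n = min(chunk, remaining)
                f.write(zeros[:n])
                remaining -= n


def demo():
    # Hypothetical 100-byte volume holding two needed objects.
    needed = [(10, 20), (50, 30)]
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 100)
        path = tmp.name
    try:
        holes = hole_regions(needed, 100)
        null_out(path, holes)
        with open(path, "rb") as f:
            data = f.read()
    finally:
        os.remove(path)
    return holes, data
```

Because the needed objects keep their original offsets, nothing in this sketch has to touch the records that locate them.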
In addition to VTLs, the technique of the present invention can be used to reclaim space in sequential access disk volumes that are supported directly in a storage management system. The sequential access storage volumes (such as those with device class of type ‘FILE’ in products like the IBM Tivoli Storage Manager) can be implemented as files of a size specified by the administrator and filled sequentially, much like tape volumes. As with tape or with disk files in a VTL, the files need to be reclaimed to reuse unneeded space. The present invention, then, can be used directly by a storage management system and does not require a VTL.
An additional embodiment of the present invention may apply to configurations in which storage volumes on systems at separate locations are replicated over a network. In such environments, the reclamation technique of the present invention may greatly reduce the transfer of data over the network, because in a reclamation procedure the source system only needs to communicate offsets and lengths of the regions that should be nulled out on the target volume in the other system. One skilled in the art will recognize that such an implementation may require new interfaces between the source and target systems.
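As a rough illustration of why only offsets and lengths need to cross the network, the following Python sketch builds a small message on the source side and applies it on the target side. The JSON layout, the function names, and the use of a `bytearray` as a stand-in for a volume are assumptions made for this example; the description does not specify a message format or interface.

```python
import json


def build_reclaim_message(volume_id, holes):
    """Source side: describe the regions to null out, without sending any data."""
    body = {
        "volume": volume_id,
        "regions": [{"offset": o, "length": n} for o, n in holes],
    }
    return json.dumps(body).encode("utf-8")


def parse_reclaim_message(raw):
    """Target side: extract the volume name and the (offset, length) pairs."""
    msg = json.loads(raw.decode("utf-8"))
    return msg["volume"], [(r["offset"], r["length"]) for r in msg["regions"]]


def apply_regions(volume, regions):
    """Null out each region of a bytearray that stands in for a volume."""
    for offset, length in regions:
        if offset < 0 or length < 0 or offset + length > len(volume):
            raise ValueError("region falls outside the volume")
        volume[offset:offset + length] = bytes(length)
```

In this sketch the message size grows with the number of regions rather than with the amount of data being reclaimed, which is the property the paragraph above relies on.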
The implementation of this invention preferably has the logical size and physical size of the sequential media volumes recorded in a database. The physical size can be used to prevent the system from repeatedly reclaiming space that has already been reclaimed (e.g. the holes have already been made in the target volume backed by a sparse file). Reclamation operations may be triggered by calculating the amount of space that is no longer needed in a sequential access volume but which has not yet been nulled out with a sparse operation. When this reclaimable space reaches or surpasses a certain threshold value, the volume may be reclaimed.
The present invention can be explained through an example of a sequential volume that has been written to the end of volume and which has data objects that have been logically deleted. Consider a sequential volume that has a logical size of 1 GB and a physical size of 1 GB. In other words, the sequential volume does not yet contain space that has been nulled out. The storage management system may determine that 25% of the space in the volume is still needed. Thus, 75% of the space is no longer required. Through the present invention, the storage management system may access the database and calculate the offsets and lengths of the regions in the volume that are no longer required. This information may then be used to null out regions of the file, introducing holes in the volume. After creating the holes, the logical size of the volume may still be 1 GB, but the physical size may be only 250 MB.
A subsequent reclamation operation in this example may determine that the volume has 75% free space, which would indicate that the volume may need to be reclaimed. An additional check may inspect the physical size of the volume to determine that 75% of the space has already been reclaimed. This may indicate that the reclamation operation does not need to be performed again. As volumes are reclaimed through the present invention, the database preferably tracks the physical size, logical size and the space used so that repeated reclamation operations are not performed unnecessarily. Later, if the storage management system determines that the volume is now 90% free and that the physical size is 75% less than the logical size, then reclamation may be done again to reduce the physical size to perhaps only 100 MB.
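The 1 GB example can be traced with a short Python sketch. The record layout, the function names, and the two threshold values (0.5 and 0.1) are assumptions chosen for illustration; the description leaves the thresholds to the implementer. Sizes use decimal units so that 1 GB equals 1000 MB, matching the figures in the example, and the `reclaim` function models the ideal case in which every unneeded byte becomes a hole.

```python
from dataclasses import dataclass

MB = 1000 * 1000  # decimal megabyte, so 1 GB = 1000 MB as in the example


@dataclass
class VolumeRecord:
    logical_size: int
    physical_size: int
    used_space: int


def qualifies(v, reclaim_threshold):
    """First check: fraction of the logical volume that is no longer needed."""
    return (v.logical_size - v.used_space) / v.logical_size > reclaim_threshold


def has_unsparse_free_space(v, sparse_threshold):
    """Second check: unneeded space that still occupies physical disk."""
    return (v.physical_size - v.used_space) / v.logical_size > sparse_threshold


def should_reclaim(v, reclaim_threshold=0.5, sparse_threshold=0.1):
    # Thresholds are hypothetical values picked for this walk-through.
    return qualifies(v, reclaim_threshold) and has_unsparse_free_space(
        v, sparse_threshold
    )


def reclaim(v):
    """Idealized reclamation: physical size shrinks to the space still used."""
    v.physical_size = v.used_space


def walk_through():
    v = VolumeRecord(logical_size=1000 * MB, physical_size=1000 * MB,
                     used_space=250 * MB)
    first = should_reclaim(v)    # 75% free, nothing nulled out yet
    reclaim(v)
    second = should_reclaim(v)   # 75% free, but already sparse
    v.used_space = 100 * MB      # more objects expire: 90% free
    third = should_reclaim(v)
    reclaim(v)
    return first, second, third, v.physical_size
```

The second call is the case the paragraph above describes: the volume still looks 75% free, yet the recorded physical size shows that there is nothing left to null out.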
Having now explained one example use of the invention, focus is now turned to
At step 515, it may be determined whether the volume qualifies for reclamation. One way to do this is to compare the space used on the volume (from step 510) against the logical size of the volume. For example, one formula for making the determination may be
If the volume qualifies for reclamation, the process may discover whether there is non-sparse, unused space on the volume (i.e., unused space that has not yet been nulled out) (step 520). One method to do this is to compare the space used on the volume with the physical size of the volume. If the space used is less than the physical size of the volume, the volume has empty space that has not yet been made sparse. This could be determined by requiring that the difference between physical size and space used be greater than some fixed value, or that the ratio of this difference divided by the logical volume size be greater than some threshold.
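Following the wording of the claims, the two checks in steps 515 and 520 can be written compactly. The symbols are notation introduced here for illustration and do not appear in the original: S_logical, S_physical and S_used are the logical size, physical size and used space of the volume, T_1 is the reclamation threshold and T_2 is the second threshold.

```latex
% Step 515: the volume qualifies for reclamation when
\frac{S_{\text{logical}} - S_{\text{used}}}{S_{\text{logical}}} > T_1

% Step 520: non-sparse, unused space exists when
\frac{S_{\text{physical}} - S_{\text{used}}}{S_{\text{logical}}} > T_2
```

For a volume with a logical and physical size of 1 GB and 250 MB of used space, both ratios are 0.75. Once the physical size has been reduced to 250 MB, the second ratio falls to 0 and the volume is not reclaimed again.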
In the process shown in
At step 535, the database is preferably updated with the new physical size for the volume. This prevents the system from repeatedly reclaiming space that has already been reclaimed.
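One possible way to obtain and record the new physical size is sketched below in Python, using an in-memory SQLite table as a stand-in for the storage management database. The table layout, the volume name `VOL001`, and the function names are hypothetical. The sketch assumes a POSIX-style `os.stat` result, where `st_blocks` counts 512-byte units, and falls back to the logical size on platforms that do not report allocated blocks.

```python
import os
import sqlite3
import tempfile


def physical_size(path):
    """Bytes actually allocated on disk, where the platform reports them."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)
    # st_blocks is in 512-byte units; without it, assume the file is fully allocated.
    return blocks * 512 if blocks is not None else st.st_size


def record_physical_size(db, volume_name, path):
    """Store the post-reclamation physical size for the named volume."""
    size = physical_size(path)
    db.execute(
        "UPDATE volumes SET physical_size = ? WHERE name = ?",
        (size, volume_name),
    )
    db.commit()
    return size


def demo():
    logical = 1024 * 1024
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE volumes ("
        "name TEXT PRIMARY KEY, logical_size INTEGER, physical_size INTEGER)"
    )
    db.execute("INSERT INTO volumes VALUES (?, ?, ?)", ("VOL001", logical, logical))
    # A file extended with truncate() is typically stored sparsely.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.truncate(logical)
        path = tmp.name
    try:
        record_physical_size(db, "VOL001", path)
    finally:
        os.remove(path)
    row = db.execute(
        "SELECT logical_size, physical_size FROM volumes WHERE name = ?",
        ("VOL001",),
    ).fetchone()
    db.close()
    return row
```

Only the volume's own record is updated here; the records that locate the remaining objects are untouched, since their offsets have not changed.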
The present invention avoids movement of data objects during reclamation. Furthermore, because none of the objects on the volume are moved during reclamation, there is no need to update the database records for the remaining objects. Thus, reclamation performed according to the present invention may be faster and may require fewer database updates.
The present invention is useful not only for reclamation, but also for replication. For example,
Described herein are various embodiments of the invention, providing a reclamation method for a sequential access disk volume as well as a replication method for a source and target sequential access disk volume. In some of the discussion, embodiments have been discussed in terms of virtual tape library volumes. One skilled in the art will recognize that the invention is applicable to other forms of sequential access disk volumes. For example, the invention may also be practiced with volumes with device class of type ‘FILE’ in products like the IBM Tivoli Storage Manager, etc.
The embodiments described herein and illustrated in the drawings include methods and systems. One skilled in the art will recognize that the scope of the invention extends to methods in which the steps are reordered or accomplished differently. As a system, the invention can be implemented as a series of components created in hardware or software. For example, the invention may make use of a calculation component, an object removal component, a receiver, an evaluator, an updater, a replication message, a parser, etc. These components, which may also be found in the subsequent claims, are readily created by one skilled in the art based on the detailed description and drawings. One skilled in the art will also understand that the invention can also be embodied as a computer program that is stored on a computer readable medium. None of the various embodiments described herein should be read as limiting the invention to just the steps, components and computer code described, but rather the embodiments serve as a way to teach the concepts of the invention to one skilled in the art.
Kaczmarski, Michael Allen, Cannon, David Maxwell
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Mar 08 2007 | CANNON, DAVID MAXWELL | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 019062 | /0802 | |
Mar 14 2007 | KACZMARSKI, MICHAEL ALLEN | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 019062 | /0802 | |
Mar 26 2007 | International Business Machines Corporation | (assignment on the face of the patent) | / |
Date | Maintenance Fee Events |
Jan 08 2018 | REM: Maintenance Fee Reminder Mailed. |
Jun 25 2018 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
May 27 2017 | 4 years fee payment window open |
Nov 27 2017 | 6 months grace period start (w surcharge) |
May 27 2018 | patent expiry (for year 4) |
May 27 2020 | 2 years to revive unintentionally abandoned end. (for year 4) |
May 27 2021 | 8 years fee payment window open |
Nov 27 2021 | 6 months grace period start (w surcharge) |
May 27 2022 | patent expiry (for year 8) |
May 27 2024 | 2 years to revive unintentionally abandoned end. (for year 8) |
May 27 2025 | 12 years fee payment window open |
Nov 27 2025 | 6 months grace period start (w surcharge) |
May 27 2026 | patent expiry (for year 12) |
May 27 2028 | 2 years to revive unintentionally abandoned end. (for year 12) |