Systems and methods of enhanced backup job scheduling are disclosed. An example method may include determining a number of jobs (n) in a backup set, determining a number of tape drives (m) in the backup device, and determining a number of concurrent disk agents (maxDA) configured for each tape drive. The method may also include defining a scheduling problem based on n, m, and maxDA. The method may also include solving the scheduling problem using an integer programming (IP) formulation to derive a bin-packing schedule that minimizes makespan (S) for the backup set.
1. A method of enhanced backup job scheduling, comprising, in a computing device:
determining a number of jobs (n) in a backup set;
determining a number of tape drives (m) in the backup device;
determining a number of concurrent disk agents (maxDA) configured for each tape drive;
defining a scheduling problem based on n, m, and maxDA;
solving the scheduling problem using an integer programming (IP) formulation to derive a bin-packing schedule that minimizes makespan (S) for the backup set;
determining a width for the maxDA of jobs processed in parallel, the width within a capacity (maxTput) of the tape drive; and
starting backup processing.
8. A system for enhancing scheduling of backup jobs, comprising:
a solver stored as non-transitory computer-readable program code and executed by a processor to derive a bin-packing schedule based on a number of jobs (n) in a backup set, a number of tape drives (m) in the backup device, and a number of concurrent disk agents (maxDA) configured for each tape drive;
the solver further executed by the processor to determine a width for the maxDA of jobs processed in parallel, wherein the width is within a capacity (maxTput) of the tape drives; and
wherein the bin-packing schedule is derived by the solver by solving a scheduling problem with an integer programming (IP) formulation, the bin-packing schedule minimizing makespan (S) for the backup set.
2. The method of
3. The method of
D1 defined as a duration of the longest backup job in the set,
D2 defined as a shortest possible time to process the entire set at maxTput, and
D3 defined as a shortest possible time to process the entire set at maxDA.
4. The method of
5. The method of
9. The method of
10. The system of
11. The system of
12. The system of
13. The system of
15. The system of
16. The system of
D1 defined as a duration of the longest backup job in the set,
D2 defined as a shortest possible time to process the entire set at maxTput, and
D3 defined as a shortest possible time to process the entire set at maxDA.
18. The system of
19. The system of
An ongoing challenge for information technology (IT) departments is effectively backing up and protecting the vast amounts of data stored throughout the enterprise. The increase in electronic documents and other files, along with regulations and retention rules for data backup, has only led to a higher demand for performance efficiency in data protection and archival tools. It is estimated that 60% to 70% of the effort associated with storage management is related to backup and recovery.
While there is a growing variety of systems and services that provide efficient file system backups over the Internet, traditional tape-based (and virtual tape) backup is still preferred in many enterprise environments, particularly for long-term data backup and data archival. Consequently, many organizations have significant amounts of backup data stored on tape (or virtual tapes), and those organizations are interested in improving performance of their tape-based data protection solutions.
Typically, a tape-based backup solution has a configuration parameter which defines a level of concurrency (i.e., the number of concurrent processes, also referred to as “disk agents”), which can back up different objects across multiple tapes in parallel. But the backup and restore operations still involve many manual processes, and therefore are labor intensive. There is little or no information on the expected duration and throughput requirements of different backup jobs. This may lead to suboptimal scheduling and longer backup session times.
The systems and methods described herein may be used to automate design of a backup schedule which reduces or minimizes overall completion time for a given set of backup jobs by automating the parameter for setting concurrent disk agents per tape drive to enhance the tape drive throughput. In an example embodiment, an integer programming (IP) formulation is implemented using IP-solvers (e.g., CPLEX) for finding an enhanced or optimized schedule, referred to herein as a “bin-packing” schedule.
The same approach can be applied to job scheduling for incremental backups. In such an embodiment, each backup job is characterized by two metrics, referred to herein as “job duration” and “job throughput.” These metrics are derived from historic information collected about backup jobs during previous backup sessions. A backup schedule can then be designed which minimizes the overall completion time for a given set of backup jobs. In an example embodiment, the design may be formulated as a resource constrained scheduling problem where a set of n jobs should be scheduled on m machines with given capacities. A general IP formulation of the backup job scheduling problem is provided for multiple tape drive configurations, and an improved, more compact IP formulation is provided for the case of a single drive configuration; either formulation may be solved using IP-solvers to find an optimized schedule (the bin-packing job schedule).
The new bin-packing schedule provides upwards of a 60% reduction in backup time. This significantly reduced backup time results in improved resource/power usage and price/performance ratios of the overall backup solution.
Each tape drive 115a-d has a configuration parameter which defines a concurrency level (i.e., the number of concurrent disk agents which can back up different objects 130 in parallel to the tape drive 115a-d). A single data stream may not be able to fully utilize the capacity/bandwidth of the backup tape drive 115a-d due to slow client devices 140a-c. For example, a typical throughput of a client device is 10-20 MB/s. Therefore, a system administrator can configure a high number of disk agents 120a-d for each tape drive 115a-d to enable concurrent backup of different objects 130 at the same time. Of course, the data streams from many different objects 130 are interleaved on the tape, and when the data of a particular object 130 needs to be restored, there is a higher restoration time for retrieving such data, for example, as compared to a continuous data stream written by a single disk agent.
Before continuing, it is noted that client devices (or “clients”) 140a-c may include any of a wide variety of computing systems, such as a stand-alone personal desktop or laptop computer (PC), workstation, personal digital assistant (PDA), or appliance, to name only a few examples. Each of the client devices 140a-c may include memory, storage, and a degree of data processing capability at least sufficient to manage a connection to the tape library 110 either directly or via a network 150. Client devices 140a-c may connect to network 150 via a suitable communication connection, including but not limited to an Internet service provider (ISP).
There are a few potential problems with a traditional backup solution which may cause inefficient backup processing. When a group of n objects 130 is assigned to be processed by the backup device (e.g., library 110), there is no way to enforce an order in which these objects should be processed. If a large (or slow) object 130 with a long backup time is selected significantly later in the backup session, this leads to an inefficient schedule and an increased overall backup time.
Also, when configuring the backup device (e.g., library 110), a system administrator should not over-estimate the number of concurrent DAs 120a-d that will be needed to handle the backup operations. The data streams from these concurrent DAs 120a-d are interleaved on the tape, and may therefore lead to a higher restoration time for retrieving such data. Moreover, when the aggregate throughput of concurrent streams exceeds the throughput of the specified tape drive 115a-d, it may increase the overall backup time. Often the backup time of a large object 130 dominates the overall backup time. Too many concurrent data streams written at the same time to the tape drive 115a-d might decrease the effective throughput of each stream, and therefore, unintentionally increase the backup time of large objects 130 and result in higher overall backup times.
Accordingly, the systems and methods described herein may be utilized so that the backup job scheduling and configuration may be tailored based on the available historical information and the workload profile.
In order to better understand the systems and methods disclosed herein, however, it is useful to explain the LBF (longest backup first) job scheduling mechanism and use this as a comparison basis. According to the LBF job scheduling mechanism, information about the job durations from the previous full backup may be used for an upcoming full backup. First, an ordered list of objects is created, sorted in decreasing order of the backup durations. For purposes of illustration, the ordered list may be expressed as:
OrdObjList={(O1, Dur1), . . . , (On, Durn)}
If there are m tape drives (Tape1, . . . , Tapem), and each tape drive is configured with k disk agents, then two running counters may be established for each tape drive: TapeProcTimei, the amount of processing time assigned to tape drive Tapei, and DiskAgenti, the number of available disk agents at tape drive Tapei.
For each tape drive Tapei (1≤i≤m), these counters are initialized as TapeProcTimei=0 and DiskAgenti=k.
The iteration step of the algorithm is described as follows. Let (Oj, Durj) be the top object in the OrdObjList, and let Taper be the tape drive with the smallest assigned processing time TapeProcTimer among the tape drives that still have an available DA (DiskAgentr&gt;0) for processing the object Oj. Object Oj is assigned for processing to the tape drive Taper, and the running counters of this tape drive Taper are updated as follows:
TapeProcTimer←TapeProcTimer+Durj,
DiskAgentr←DiskAgentr−1.
The longest jobs are thereby assigned for processing first. In addition, each job is assigned to a concurrent DA in such a way that the overall amount of processing time assigned to different tape drives is balanced. Once the objects are assigned to the available DAs, the backup processing can start. When a DA at a tape drive Taper completes the backup of the assigned object, the running counter of this tape drive Taper is updated as follows:
DiskAgentr←DiskAgentr+1.
The DA of this tape drive Taper is assigned the next available object from the OrdObjList, the running counters are updated again, and the backup process continues. According to the LBF job scheduling mechanism, each tape drive concurrently processes a constant number (k) of jobs, independent of their aggregate throughput.
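To make the mechanism concrete, the following is a minimal Python simulation of LBF assembled from the counter updates above. It is an illustrative sketch, not code from the disclosure: the function name, the event-driven bookkeeping with a completion-time heap, and the (name, duration) job tuples are all assumptions layered on the description above.

```python
import heapq

def lbf_schedule(objects, m, k):
    """Simulate the LBF mechanism: `objects` is a list of (name, duration)
    pairs, m is the number of tape drives, and k is the number of
    concurrent disk agents (DAs) configured per drive."""
    # OrdObjList: objects sorted in decreasing order of backup duration.
    ord_obj_list = sorted(objects, key=lambda o: o[1], reverse=True)

    proc_time = [0.0] * m   # TapeProcTime_i: processing time assigned so far
    free_das = [k] * m      # DiskAgent_i: available disk agents
    busy = []               # min-heap of (finish_time, drive) for running jobs
    order = [[] for _ in range(m)]
    now = 0.0

    for name, dur in ord_obj_list:
        while not any(free_das):
            # A DA frees up when its assigned backup completes.
            now, r = heapq.heappop(busy)
            free_das[r] += 1                 # DiskAgent_r <- DiskAgent_r + 1
        # Drive with the smallest assigned processing time among the
        # drives that still have an available DA.
        r = min((i for i in range(m) if free_das[i] > 0),
                key=lambda i: proc_time[i])
        order[r].append(name)
        proc_time[r] += dur                  # TapeProcTime_r <- ... + Dur_j
        free_das[r] -= 1                     # DiskAgent_r <- DiskAgent_r - 1
        heapq.heappush(busy, (now + dur, r))

    return order, max(t for t, _ in busy)

# Example: 8 objects, 2 drives, 2 DAs per drive (illustrative data only).
order, makespan = lbf_schedule(
    [("O%d" % i, dur) for i, dur in enumerate([90, 60, 45, 40, 30, 20, 10, 5])],
    m=2, k=2)
print(order, makespan)
```

Running the example reproduces LBF's defining property noted above: each drive keeps a constant number k of jobs in flight whenever work remains, irrespective of the jobs' aggregate throughput.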
On the other hand, the systems and methods of the present disclosure implement an integer programming formulation of the multiple machine resource constrained scheduling problem. The systems and methods minimize the makespan (i.e., the overall completion time) of a given set of backup jobs for processing by multiple tape drives 115a-d. Accordingly, the systems and methods described herein provide a compact problem formulation that can be efficiently solved with IP solvers (e.g., CPLEX) in a reasonable compute time.
To determine scheduling for multiple tape drives 115a-d, the number of jobs in the backup set is represented by n, and the number of tape drives 115a-d in the backup device (e.g., library 110) is represented by m. The schedule is defined for a given set of n backup jobs that have to be processed by m tape drives 115a-d with given performance capacities. The maximum number of concurrent DAs 120a-d configured for each tape drive 115a-d is represented by maxDA, and the aggregate throughput of the tape drive 115a-d is represented by maxTput.
Each tape library 110 is homogeneous, but there may be different generation tape libraries in the overall set. Each job j, 1≤j≤n, in a given backup set is defined by a pair of attributes (dj, wj), where dj is the duration of job j, and wj is the throughput of job j (e.g., the throughput job j consumes at the tape drive 115a-d, or the resource demand of job j).
At any time, each tape drive 115a-d can process up to maxDA jobs in parallel, but the total “width” of these jobs cannot exceed the capacity of the tape drive 115a-d (maxTput). The objective is to find a schedule that minimizes the processing makespan, i.e., the overall completion time for a given set of backup jobs.
In an example, the variables may be defined as follows. Rij is a 0/1 variable, indicating whether backup job i is assigned to tape drive j at some point in time. Yit is a 0/1 variable, indicating whether job i starts processing at time t. Zijt is a continuous variable (acting as Rij·Yit) indicating whether job i is in processing on tape drive j at time t. S is the makespan of the entire backup session.
First, the lower bound on the makespan S is approximated. The nature of a given backup workload and the tape library configuration parameters define the following three lower bounds on makespan S. D1 represents the duration of the longest backup job in the given set.
The makespan S (i.e., duration of the entire backup session) cannot be smaller than the longest backup job in the set.
D2 is the shortest possible time that is required to process the entire set of submitted backup jobs at maximum tape drive throughput maxTput (multiplied by the number of tape drives).
This time represents the ideal processing of “all the bytes” in the given set of backup jobs at the maximum tape drive rate without any other configuration constraints of the backup server. Therefore, makespan S cannot be smaller than the “ideal” processing time of the backup set.
D3 is the shortest possible time to process the entire set of submitted backup jobs while using the maximum possible number maxDA of concurrent disk agents at all tape drives. This computation approximates the processing time for the case when maxDA parameter is a constraint that limits backup processing.
Accordingly, makespan S cannot be smaller than D3, and reflects the ideal processing time of the backup set with maxDA of concurrent disk agents.
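The defining formulas for D1, D2, and D3 appear as images in the original document and are missing above. From the surrounding definitions, they can plausibly be reconstructed as sketched below (an assumption, treating maxTput and maxDA as per-drive limits and m as the number of drives):

```latex
D_1 = \max_{1 \le i \le n} d_i, \qquad
D_2 = \frac{\sum_{i=1}^{n} d_i \, w_i}{m \cdot \mathit{maxTput}}, \qquad
D_3 = \frac{\sum_{i=1}^{n} d_i}{m \cdot \mathit{maxDA}}
```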
In the IP formulation, estimates of the lower and upper bounds of makespan S are computed as follows.
Mlow=⌈max(D1, D2, D3)⌉
Mup=⌈max(D1, D2, D3)/0.95⌉
First, it is noted that Mlow is a lower bound on makespan S, since S cannot be smaller than D1, D2, or D3. Mup, however, is only a possible approximation of the upper bound on makespan S, and the current estimate might be incorrect. The solution does not depend on Mup in a direct way; as long as S≤Mup, it leads to a feasible solution. If this guess makes the problem infeasible, the computation can be repeated for Mup using 0.90, 0.85, etc., in the denominator, until the problem is feasible.
If Mup is too large, then a higher complexity problem is created by introducing a higher number of equations and variables. If Mup is too small, then the problem could be made infeasible. In practice, 0.95 is a good starting estimate.
Next, the integer programming formulation is defined as follows; a symbolic reconstruction of these constraints is provided after this description. A job is processed by exactly one tape drive (total n equations).
Each job starts backup processing at some time no later than t=Mup−di+1, so that it can complete by time Mup.
The jobs that are processed concurrently by tape drive j have to satisfy the tape drive capacity constraint (at any time t). That is, the jobs' aggregate throughput requirements cannot exceed the tape drive maximum throughput (total m·Mup inequalities).
A maximum of maxDA concurrent jobs can be assigned to tape drive j at any point of time t.
Each job finishes the backup processing within time duration S, formally defining S as the makespan of the backup session. The number of these inequalities is reduced by considering only jobs i that are in processing at times t≥Mlow (total n(Mup−Mlow) inequalities).
Linking Zijt to binary variables Rij and Yit (total n·m·Mup inequalities) gives:
Zijt≥Rij+Yit−1, ∀i, j, t
Non-negativity requirements:
Rij=0/1; Yit=0/1; Zijt≥0
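The constraint equations themselves appear as images in the original and are missing above. Under the variable definitions given earlier (Rij, Yit, Zijt, S), a plausible symbolic reconstruction consistent with the stated inequality counts is sketched below; the exact indexing in the disclosure may differ, and the makespan constraints are shown in aggregate form rather than enumerated only for times t≥Mlow.

```latex
\begin{aligned}
\text{minimize } & S \\
\text{s.t. } & \textstyle\sum_{j=1}^{m} R_{ij} = 1 && \forall i
  && \text{(each job on exactly one drive)} \\
& \textstyle\sum_{t=1}^{M_{up}-d_i+1} Y_{it} = 1 && \forall i
  && \text{(each job starts exactly once)} \\
& \textstyle\sum_{i=1}^{n} w_i \sum_{t'=\max(1,\,t-d_i+1)}^{t} Z_{ijt'} \le \mathit{maxTput}
  && \forall j,\; 1 \le t \le M_{up}
  && \text{(drive capacity)} \\
& \textstyle\sum_{i=1}^{n} \sum_{t'=\max(1,\,t-d_i+1)}^{t} Z_{ijt'} \le \mathit{maxDA}
  && \forall j,\; 1 \le t \le M_{up}
  && \text{(concurrent DA limit)} \\
& \textstyle\sum_{t=1}^{M_{up}-d_i+1} (t + d_i - 1)\, Y_{it} \le S && \forall i
  && \text{(finish within makespan)} \\
& Z_{ijt} \ge R_{ij} + Y_{it} - 1 && \forall i, j, t
  && \text{(linking)} \\
& R_{ij} \in \{0,1\}, \quad Y_{it} \in \{0,1\}, \quad Z_{ijt} \ge 0 .
\end{aligned}
```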
An IP solver (e.g., CPLEX) can be used to find a feasible solution. Once an optimized job schedule is provided by the solver, the backup jobs can be ordered by the assigned “start” timestamps, and then the backup application can schedule these jobs in the determined order. This schedule is the bin-packing schedule.
A modified process may be used for single tape drives. Often, system administrators manually create so-called backup groups, which are assigned to different tape drives for processing. This helps in controlling the number of tapes that are used for different mount points of the same client 140a-c, thereby avoiding having different file systems of the same client machine written to different tapes. Such spreading of the backed up client data across multiple tapes can be especially unacceptable for smaller client machines. Therefore, in the case of a backup group, a given set of backup jobs (a specified backup group) is assigned for processing to a particular tape drive.
If there are n jobs (i=1, 2 . . . n) and a single tape drive (e.g., 115a) for backup processing, then the IP formulation can be simplified as follows. First, the following variables are defined. Yit is a 0/1 variable, indicating whether job i starts its run at time t. S is the makespan of the entire backup session. The lower and upper bounds of makespan S (Mlow and Mup, respectively) are determined as discussed above.
Since job i needs to finish by period t=Mup, job i needs to start no later than t=Mup−di+1.
The jobs that are processed concurrently by the same tape drive (e.g., 115a) have to satisfy a given tape drive capacity constraint. That is, the combined bandwidth requirements should be less than or equal to maxTput (total Mup inequalities).
A maximum of maxDA concurrent jobs can be assigned to the tape drive (e.g., 115a) at any time t.
Each job finishes the backup processing within time duration S, formally defining S as a makespan of the backup session.
It is noted that the number of variables, equations and inequalities is significantly reduced compared to the general case of multiple tape drives 115a-d.
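The single-drive formulation is small enough to sketch directly. The following is a minimal illustration using the open-source PuLP modeler with its bundled CBC solver (an assumption; the disclosure itself names CPLEX) and hypothetical job data. The variables Y[i,t], S, Mlow, and Mup follow the definitions above, with time discretized into unit periods; the retry loop mirrors the Mup adjustment procedure described earlier.

```python
import math
import pulp

# Hypothetical single-drive workload: job name -> (duration in periods, MB/s).
# These numbers are illustrative only and are not from the disclosure.
jobs = {"A": (5, 30), "B": (3, 60), "C": (4, 20), "D": (2, 50)}
maxDA, maxTput = 3, 80

d = {i: v[0] for i, v in jobs.items()}
w = {i: v[1] for i, v in jobs.items()}

# The three lower bounds on makespan (single drive, so m = 1).
D1 = max(d.values())
D2 = sum(d[i] * w[i] for i in jobs) / maxTput
D3 = sum(d.values()) / maxDA
M_low = math.ceil(max(D1, D2, D3))

def solve(M_up):
    """Build and solve the single-drive IP for a guessed horizon M_up."""
    prob = pulp.LpProblem("bin_packing_schedule", pulp.LpMinimize)
    # Y[i, t] = 1 iff job i starts at period t; start no later than
    # t = M_up - d_i + 1 so the job can finish by M_up.
    Y = {(i, t): pulp.LpVariable(f"Y_{i}_{t}", cat="Binary")
         for i in jobs for t in range(1, M_up - d[i] + 2)}
    S = pulp.LpVariable("S", lowBound=M_low, upBound=M_up)
    prob += S  # objective: minimize the makespan

    for i in jobs:
        starts = range(1, M_up - d[i] + 2)
        prob += pulp.lpSum(Y[i, t] for t in starts) == 1          # starts once
        prob += pulp.lpSum((t + d[i] - 1) * Y[i, t]
                           for t in starts) <= S                  # finishes by S

    for t in range(1, M_up + 1):
        # Job i is running at period t iff it started in (t - d_i, t].
        act = [(i, tp) for i in jobs
               for tp in range(max(1, t - d[i] + 1), t + 1) if (i, tp) in Y]
        prob += pulp.lpSum(w[i] * Y[i, tp] for i, tp in act) <= maxTput
        prob += pulp.lpSum(Y[i, tp] for i, tp in act) <= maxDA

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return prob, Y, S

# Guess M_up with 0.95 in the denominator; relax it if infeasible.
for slack in (0.95, 0.90, 0.85, 0.80):
    prob, Y, S = solve(math.ceil(max(D1, D2, D3) / slack))
    if pulp.LpStatus[prob.status] == "Optimal":
        break

order = sorted((t, i) for (i, t), y in Y.items() if (y.value() or 0) > 0.5)
print("makespan:", S.value(), "job start order:", order)
```

With the sample data above, the 0.95 guess happens to be infeasible, so the loop relaxes the denominator to 0.90, exactly as the retry procedure describes.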
In this example, data from six backup servers were used to evaluate the performance benefits of the new bin-packing schedule, and compare its performance with already optimized LBF scheduling. The client machines included a variety of Windows and Linux desktops. In addition, there is a collection of servers with a significant amount of stored data. The computing infrastructure is typical of a medium-size enterprise environment.
There were 665 objects in the overall backup set.
The servers have four tape drives 115a-d (with a maximum data rate of 80 MB/s), each configured with four concurrent DAs 120a-d.
To set a base line for a performance comparison, given workloads were processed using LBF scheduling in the traditional tool architecture configured with a single tape drive 115a-d and a fixed number of four concurrent DAs 120a-d per tape drive. Then the same workloads (from six backup servers) were processed with a new bin-packing schedule. The backup servers were configured with a single tape drive 115a-d and the following parameters: no more than 10 concurrent disk agents were used for each tape drive (maxDA=10); and the aggregate throughput of the assigned concurrent objects for each tape drive did not exceed 80 MB/s (maxTput=80 MB/s).
Table I shows the absolute and relative reduction in the overall backup session times when the bin-packing schedule is used instead of LBF.
TABLE I. Absolute and Relative Reduction of the Overall Backup Time

Backup Server | week1         | week2         | week3
Server1       | 665 min (35%) | 651 min (34%) | 675 min (35%)
Server2       | 340 min (33%) | 212 min (24%) | 163 min (19%)
Server3       | 922 min (52%) | 928 min (52%) | 920 min (52%)
Server4       | 520 min (44%) | 552 min (44%) | 534 min (43%)
Server5       | 126 min (33%) | 124 min (33%) | 165 min (39%)
Server6       | 231 min (28%) | 190 min (26%) | 234 min (29%)
The bin-packing schedule was created with additional information on both job duration and job throughput (observed from past measurements). This additional information on job throughput was used to schedule a higher number of concurrent backup jobs (when appropriate) in order to optimize throughput of the tape drive.
Accordingly, significant time savings were achieved across all six backup servers using the bin-packing job scheduling as compared to the LBF schedule. In this example, the absolute time savings ranged from 124 min to 928 min. These results were consistent for three consecutive weeks. The relative performance benefits and reduction in the backup time were 19%-52% and depended on the specifics of workload, including for example, the size and throughput distribution of objects the backup server is responsible for.
The bin-packing schedule results discussed above were for a single tape drive, a case that can be formulated in a significantly more compact and efficient way than the general multi-tape drive IP formulation.
In order to understand the performance benefits and efficiency of the designed IP approach for multi-tape drive configurations, the overall set of backup jobs from the six backup servers was used as a baseline (the set consisted of 665 jobs), and different backup set “samples” of a given size were created from it. Multiple backup sets with 100, 200, 300, and 400 jobs were generated, four “samples” of each size. Thus, there were sixteen backup sets of different sizes, each with representative characteristics of real workloads.
The backup sets with 100 and 200 jobs were used for evaluating single and double tape drive configurations. The backup sets with 300 and 400 jobs were used for evaluating a full spectrum of one to four tape drive configurations.
The second set of results 320 represents performance benefits using the bin-packing schedule for a multi-drive configuration. Again, the bin-packing schedule significantly outperformed the LBF schedule, and only when the makespan was explicitly bounded by the duration of the longest job, did both bin-packing and LBF schedule produce similar results.
Recall that D2 represents the ideal processing of “all the bytes” in a given set of backup jobs at the maximum tape drive rate (multiplied by the number of drives) without any other configuration constraints of the backup server. The makespan S cannot be smaller than this “ideal” processing time of the backup set.
The relationship between D1 and D2 helps explain the “complexity” of the backup job scheduling problem. When D1≥D2, D1 defines the lower bound of the makespan, and the ideal processing of all the jobs at the maximum tape drive rate completes earlier than D1. In this case, the duration of the longest job strongly impacts the makespan. The difference between D1 and D2 determines the size of the “extra room” for making different job scheduling choices. Typically, this case means that the solver can quickly find a near-optimal or optimal solution by scheduling the longest job as one of the first jobs, and often the remaining jobs can be scheduled in a flexible way without impacting the schedule makespan.
When D1≤D2, D2 defines the lower bound of the makespan, and potentially there are many more possible schedules with different makespans. A larger difference between D2 and D1 creates more choices among different schedules, and the problem becomes much harder to solve. Accordingly, the relationship between D1 and D2 can be defined as Rel(D1,D2)=(D2−D1)/D1.
The Rel(D1,D2) metric correlates well with the solution time of the solver and therefore can be useful in its prediction.
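As a hypothetical illustration (the numbers are not from the disclosure): if the longest job takes D1=400 min while the ideal all-bytes processing time is D2=600 min, then Rel(D1,D2)=(600−400)/400=0.5, flagging a schedule-rich instance that the solver will take longer to solve. If instead D1=600 min and D2=400 min, Rel(D1,D2) is negative and the longest job pins the makespan, so the solver typically terminates quickly.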
It is apparent from the above description that backup tools provide system administrators a variety of means for scheduling designated collections of client machines on a certain timetable. Scheduling of incoming jobs and the assignment of processors is an important factor in optimizing the performance of parallel and distributed systems. The choice of the scheduling/assignment algorithm is driven by performance objectives. If the performance goal is to minimize mean response time, then the optimal algorithm is to schedule the shortest job first. However, if there is a requirement of fairness in job processing, then processor-sharing or round-robin scheduling might be preferable.
For large-scale heterogeneous distributed systems, job scheduling is one of the main components of resource management. Many scheduling problems can be formulated as a resource constrained scheduling problem where a set of n jobs should be scheduled on m machines with given capacities.
In operation 610a-c, a number of jobs (n) in a backup set is determined, a number of tape drives (m) in the backup device is determined, and a number of concurrent disk agents (maxDA) configured for each tape drive is determined. In operation 620, a scheduling problem is defined based on n, m, and maxDA. The scheduling problem is solved in operation 630 using an integer programming (IP) formulation to derive a bin-packing schedule which minimizes makespan (S) for the backup set.
The operations shown and described herein are provided to illustrate exemplary implementations for scheduling backup jobs. It is noted that the operations are not limited to the ordering shown. Still other operations may also be implemented.
It is noted that the exemplary embodiments shown and described are provided for purposes of illustration and are not intended to be limiting. Still other embodiments are also contemplated.