Methods and apparatus for migrating data with minimal disruption in a storage virtualization system. In one embodiment, a method includes collecting information on a backend logical unit (lun) having a direct path from a host, presenting the backend lun to the host through a virtual target via the storage virtualization application, disconnecting the direct path from the host to the backend lun so that the backend lun is accessed by the host via the storage virtualization application transparently to the host, and migrating data from the backend lun to a destination storage transparently to the host.
1. A method, comprising: migrating data in a storage virtualization environment from a backend lun to a destination storage using a data migration module stored in a device coupled to a host by a switch fabric, the migrating comprising:
collecting information on a personality of the backend lun by collecting responses from the backend lun to Small Computer System Interface (SCSI) commands issued from the host to the backend lun while the host is coupled to the backend lun by the switch fabric in a direct path;
presenting a virtualization of the backend lun to include the personality of the backend lun to the host, using the collected information, through a virtual target at the device using a first path through the switch fabric while the host is coupled to the backend lun by the switch fabric in the direct path;
disconnecting the host from the direct path to the backend lun while a destination storage is coupled to the device by the switch fabric; and
migrating data from the backend lun to the destination storage transparently to the host after disconnecting the host from the direct path to the backend lun.
18. A system, comprising:
a host coupled to a device comprising a data migration module in a storage virtualization environment using a switch fabric;
a backend logical unit (lun) coupled to the host using the switch fabric to form a direct path, the backend lun being coupled to the device using the switch fabric;
a destination storage coupled to the device using the switch fabric; and
wherein the data migration module is configured to:
collect information on a personality of the backend lun by collecting responses from the backend lun to Small Computer System Interface (SCSI) commands issued from the host to the backend lun while the host is coupled to the backend lun by the switch fabric in the direct path;
present a virtualization of the backend lun to include the personality of the backend lun to the host, using the collected information, through a virtual target at the device using a first path through the switch fabric while the host is coupled to the backend lun by the switch fabric in the direct path;
disconnect the host from the direct path to the backend lun while a destination storage is coupled to the device by the switch fabric; and
migrate data from the backend lun to the destination storage transparently to the host after disconnecting the host from the direct path.
13. An article, comprising:
a non-transitory machine-readable medium that stores executable instructions to migrate data in a storage virtualization environment from a backend logical unit (lun) to a destination storage, a host being coupled to a device comprising the non-transitory machine-readable medium using a switch fabric, the backend lun being coupled to the host using the switch fabric to form a direct path, a destination storage being coupled to the device using the switch fabric, the instructions causing a machine to:
collect information on a personality of the backend lun by collecting responses to Small Computer System Interface (SCSI) commands issued from the host to the backend lun while the host is coupled to the backend lun by the switch fabric in the direct path;
present a virtualization of the backend lun to include the personality of the backend lun to the host, using the collected information, through a virtual target at the device using a first path through the switch fabric while the host is coupled to the backend lun by the switch fabric in the direct path;
disconnect the host from the direct path to the backend lun while a destination storage is coupled to the device by the switch fabric;
migrate data from the backend lun to the destination storage transparently to the host after disconnecting the host from the direct path.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
7. The method of
8. The method of
9. The method of
11. The method of
12. The method of
14. The article according to
15. The article according to
16. The article according to
persistently store the collected information on the backend lun, and
utilize, after the backend lun is disconnected, the collected information on the backend lun to present to the host a personality identical to that of the prior storage and to write data to the destination storage.
17. The article according to
19. The system according to
20. The system according to
Computer systems are constantly improving in terms of speed, reliability, and processing capability. As is known in the art, computer systems which process and store large amounts of data typically include one or more processors in communication with a shared data storage system in which the data is stored. The data storage system may include one or more storage devices, usually of a fairly robust nature and useful for storage spanning various temporal requirements, e.g., disk drives. The one or more processors perform their respective operations using the storage system. Mass storage systems (MSS) typically include an array of a plurality of disks with on-board intelligent and communications electronics and software for making the data on the disks available.
To leverage the value of MSS, these are typically networked in some fashion. Popular implementations of networks for MSS include network attached storage (NAS) and storage area networks (SAN). In NAS, MSS is typically accessed over known TCP/IP lines such as Ethernet using industry standard file sharing protocols like NFS, HTTP, and Windows Networking. In SAN, the MSS is typically directly accessed over Fibre Channel switching fabric using encapsulated SCSI protocols.
To enhance the user experience with storage systems, storage virtualization applications abstract storage available to the user. For example, storage arrays from a variety of vendors can be used transparently to the user. One feature of such systems, such as the INVISTA system by EMC Corporation, includes data migration from source storage to destination storage. Data migration can require down time to perform the migration. As will be readily appreciated, it is desirable to reduce the amount of down time for data migration and to make it easy for users to perform data migrations.
The present invention provides methods and apparatus for a storage virtualization system to migrate data from source storage to destination storage with minimal disruption, transparently to the host. With this arrangement, data can be migrated using a storage virtualization application, without software being added to the host, through the use of ‘pass through’ logical units. While exemplary embodiments of the invention are shown and described in conjunction with particular configurations, components, and storage, it is understood that the invention is applicable to storage systems in general for which it is desirable to migrate data from a first storage to a second storage.
In one aspect of the invention, a method comprises migrating data in a storage virtualization environment having a storage virtualization application, including collecting information on a backend logical unit (LUN) having a direct path from a host, presenting the backend LUN to the host through a virtual target via the storage virtualization application, disconnecting the direct path from the host to the backend LUN so that the backend LUN is accessed by the host via the storage virtualization application transparently to the host, and migrating data from the backend LUN to a destination storage transparently to the host.
The method can further include one or more of the following features: the backend LUN is identified with the same unique ID for both the direct path to the host and the virtual target, collecting information includes static information collection, collecting information includes information collection through monitoring, collecting information includes self-learning where the storage virtualization application functions as a pass through to the backend LUN, persistently storing the collected information on the backend LUN, utilizing, after the backend LUN is disconnected, the collected information on the backend LUN to present to the host a personality identical to that of the prior storage and writing data to the destination storage, collecting the information on the backend logical unit (LUN) by a virtual initiator, removing the backend LUN after completion of the data migration to the destination storage, moving the destination storage to a remote location, during the data migration, mirroring writes to the backend LUN and the destination storage and servicing reads only through the backend LUN, and retaining a personality of the backend LUN until cleared by a user.
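For illustration only, the pass-through migration flow summarized above (capture the backend LUN's personality, present a virtual target with the same unique ID, mirror writes while servicing reads only from the source, then copy the data) can be sketched in Python. All class, attribute, and method names here are hypothetical, not part of the described system:

```python
# Hypothetical sketch of the minimally disruptive migration flow; names
# are illustrative, and SCSI details are reduced to a captured
# "personality" dictionary standing in for INQUIRY/identity data.

class SimpleLun:
    """Toy stand-in for a backend LUN or destination storage element."""
    def __init__(self, unique_id, vendor="ACME", blocks=None):
        self.inquiry_data = {"unique_id": unique_id, "vendor": vendor}
        self.blocks = blocks or {}

class PassThroughMigration:
    def __init__(self, backend_lun, destination):
        self.backend_lun = backend_lun      # source storage
        self.destination = destination      # destination storage
        self.personality = None             # captured identity data
        self.migrated = False

    def collect_personality(self):
        # Step 1: record the backend LUN's responses (unique ID, vendor)
        # so the virtual target can impersonate it later.
        self.personality = dict(self.backend_lun.inquiry_data)

    def present_virtual_target(self):
        # Step 2: the virtual target reports the SAME unique ID as the
        # direct-path device, so host multipath software treats it as
        # another path to the same LUN.
        return {"unique_id": self.personality["unique_id"],
                "vendor": self.personality["vendor"]}

    def write(self, block, data):
        # During migration, writes are mirrored to source and destination.
        self.backend_lun.blocks[block] = data
        self.destination.blocks[block] = data

    def read(self, block):
        # Reads are serviced only from the backend LUN until the copy
        # completes, then from the destination.
        src = self.destination if self.migrated else self.backend_lun
        return src.blocks[block]

    def migrate(self):
        # Background copy from source to destination, transparent to host.
        self.destination.blocks.update(self.backend_lun.blocks)
        self.migrated = True
```

The key property the sketch illustrates is that the host-visible identity never changes: the personality captured while the direct path still exists is what the virtual target presents after the direct path is disconnected.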
In another aspect of the invention, an article comprises a computer readable medium having stored instructions to enable a machine to perform the steps of: migrating data in a storage virtualization environment having a storage virtualization application, including collecting information on a backend logical unit (LUN) having a direct path from a host, presenting the backend LUN to the host through a virtual target via the storage virtualization application, disconnecting the direct path from the host to the backend LUN so that the backend LUN is accessed by the host via the storage virtualization application transparently to the host, and migrating data from the backend LUN to a destination storage transparently to the host.
In a further aspect of the invention, a system comprises a storage virtualization application having a data migration module to collect information on a backend logical unit (LUN) having a direct path from a host for presenting the backend LUN to the host through a virtual target via the storage virtualization application, and disconnecting the direct path from the host to the backend LUN so that the backend LUN is accessed by the host via the storage virtualization application transparently to the host to enable migration of data from the backend LUN to a destination storage transparently to the host.
The above and further advantages of the present invention may be better understood by referring to the following description taken in conjunction with the accompanying drawings in which:
The methods and apparatus of the present invention are intended for use in Storage Area Networks (SANs) that include data storage systems, such as the Symmetrix Integrated Cache Disk Array system or the Clariion Disk Array system available from EMC Corporation of Hopkinton, Mass. and those provided by vendors other than EMC.
The methods and apparatus of this invention may take the form, at least partially, of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, random-access or read-only memory, or any other machine-readable storage medium, including transmission medium. When the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The methods and apparatus of the present invention may be embodied in the form of program code that is transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via any other form of transmission, and may be implemented such that when the program code is received and loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code combines with the processor to provide a unique apparatus that operates analogously to specific logic circuits. The program code (software-based logic) for carrying out the method is embodied as part of the system described below.
The embodiment of the present invention denominated as FabricX architecture allows storage administrators to manage the components of their SAN infrastructure without interrupting the services they provide to their clients. This provides for a centralization of management allowing the storage infrastructure to be managed without requiring host-based software or resources for this management. For example, data storage volumes can be restructured and moved across storage devices on the SAN while the hosts accessing these volumes continue to operate undisturbed.
Architecture
Referring now to
Generally such a data storage system includes a system memory and sets or pluralities of multiple data storage devices or data stores. The system memory can comprise a buffer or cache memory; the storage devices in the pluralities can comprise disk storage devices, optical storage devices, and the like. However, in a preferred embodiment the storage devices are disk storage devices. The sets represent an array of storage devices in any of a variety of known configurations. In such a data storage system, a computer or host adapter provides communications between a host system and the system memory, and disk adapters provide pathways between the system memory and the storage device pluralities. Regarding terminology related to the preferred data storage system, the host or host network side is sometimes referred to as the front end, and the side from the disk adapters toward the disks is sometimes referred to as the back end. Since the invention includes the ability to virtualize disks using LUNs as described below, a virtual initiator may be interchanged with disk adapters. A bus interconnects the system memory and communicates with the front and back ends. As will be described below, providing such a bus with switches provides discrete access to components of the system.
Referring again to
It is known in SAN networks using Fibre Channel and/or SCSI protocols that such data devices as those represented by disks or storage 30, 40 can be mapped using a protocol to a Fibre Channel logical unit (LUN) that acts as a virtual disk that may be presented for access to one or more hosts, such as hosts 13, 18, for I/O operations. LUNs are also sometimes referred to interchangeably with data volumes, which at a logical level represent physical storage such as that on storage 30, 40.
Over the IP Network 64, and by communicating through the management interface 43, a Storage Administrator using the EMS 29 may create virtual LUNs (disks) that are composed of elements from the back-end storage. These virtual devices, which may be represented, for example, by a disk icon (not shown) grouped with the intelligent switch, are made available through targets created on a specific set of intelligent switch ports. Client host systems connect to these ‘front-end’ ports to access the created volumes. The client host systems, the front-end ports, and the virtual LUNs all form part of the Front-End SAN 20. Note that hosts, such as Host 13, may connect directly to the IMPS.
The combined processing and intelligence of the switch and the FabricX Controller provide the connection between the client hosts in the front-end SAN and the storage in the back-end SAN. The FabricX Controller runs storage applications that are presented to the client hosts. These include the Volume Management, Data Mobility, Snapshots, Clones, and Mirrors, which are terms of art known with EMC's Clariion data storage system. In a preferred embodiment the FabricX Controller implementation is based on the CLARiiON Barracuda storage processor and the CLARiiON Flare software implementation which includes layered drivers that are discussed below.
Hardware Components
Referring to
The CPP provides support for storage and switch software applications and runs the software that handles exceptions that occur on the fast-path. Regarding where software runs, in the exemplary embodiment, software for management by the Storage and Switch Controller is shown running on the CPP; however, that is merely an example and any or all software may be loaded and run from the IMPS or anywhere in the networked environment. Additionally the CPP supports management interfaces used to configure and control the instance. The CPP is composed of redundant storage processors and is further described with reference to
In one embodiment, the DAE, together with the disks that it contains provide the persistent storage of the meta-data for the FabricX instance. The meta data includes configuration information that identifies the components of the instance, for example, the identities of the intelligent switches that make up the instance, data describing the set of exported virtual volumes, the software for the Controller, information describing what hosts and initiators are allowed to see what volumes, etc. The DAE is further described with reference to
Referring to
Referring to
Referring to
An IMPS can be used to support virtual SANs (VSANs), to partition between front-end SANs and back-end SANs even if such SANs are not physically configured. In general, switches that support VSANs allow a shared storage area network to be configured into separate logical SANs providing isolation between the components of different VSANs. The IMPS itself may be configured in accordance with specifications from such known switch vendors as Brocade and Cisco.
Each intelligent switch preferably contains a collection of SCSI ports, such as Fibre Channel, with translation processing functions that allow a port or associated hardware to make various transformations on the SCSI command stream flowing through that port. These transformations are performed at near wire-speeds and hence have little impact on the latency of the command. However, intelligent ports are only able to make translations on READ and WRITE commands. For other SCSI commands, the port blocks the request and passes control for the request to a higher-level control function. This process is referred to as faulting the request. Faulting also occurs for read and write commands when certain conditions exist within the port. For example, a common transformation performed by an intelligent port is to map the data region of a virtual volume presented to a host to the data regions of back-end storage elements. To support this, the port maintains data that allows it to translate (map) logical block addresses of the virtual volume to logical back-end addresses on the back-end devices. If this data is not present in the port when a read or write is received, the port will fault the request to the control function. This is referred to as a map fault.
Once the control function receives a faulted request it takes whatever actions are necessary to respond to the request (for example, it might load missing map data), then either responds directly to the request or resumes it. The control function may be implemented differently on different switches. On some vendors' switches the control function is supported by a processor embedded within the blade containing the intelligent ports; on others it is provided as an adjunct processor which is accessed via the backplane of the switch; a third known configuration supports the control function as a completely independent hardware component that is accessed through a network such as Fibre Channel or IP.
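The map-fault behavior described above (fast-path translation in the port, with misses escalated to a control function that loads the missing map data and resumes the request) can be sketched as follows. The extent representation and all names are assumptions made for illustration:

```python
# Illustrative sketch of an intelligent port's fast path: READ/WRITE
# LBAs are remapped via extents loaded into the port; a request with no
# covering extent raises a "map fault" handled by the control function.

class MapFault(Exception):
    """Raised when no translation extent covers the requested LBA."""

class IntelligentPort:
    def __init__(self):
        # (virtual_start, length) -> (backend_device, backend_start)
        self.extents = {}

    def load_extent(self, virt_start, length, device, backend_start):
        # The control function loads map data into the port on a fault.
        self.extents[(virt_start, length)] = (device, backend_start)

    def translate(self, lba):
        # Fast path: remap the virtual LBA to a back-end (device, LBA).
        for (start, length), (dev, bstart) in self.extents.items():
            if start <= lba < start + length:
                return dev, bstart + (lba - start)
        raise MapFault(lba)   # block the I/O; escalate to control function

def handle_io(port, control_loader, lba):
    # A faulted request is resumed once the control function supplies
    # the missing extent.
    try:
        return port.translate(lba)
    except MapFault:
        control_loader(port, lba)
        return port.translate(lba)
```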
Back-end storage devices connect to FabricX via the Fibre Channel ports of the IMPSs that have been identified as back-end ports (oriented in
The EMS 29 connects to FabricX in a preferred embodiment through an IP network, e.g., an Ethernet network, which may be accessed redundantly. The FabricX CPP 58 in a preferred embodiment has two 10/100 Mbps Ethernet NICs that are used both for connectivity to the IMPS (so that it can manage the IMPS and receive SNMP traps) and for connectivity to the EMS. It is recommended that the IP networks 624a, b provide isolation and dedicated 100 Mbps bandwidth to the IMPS and CPP.
The EMS in a preferred embodiment is configured with IP addresses for each Processor 72, 74 in the FabricX CPP. This allows direct connection to each processor. Each Processor preferably has its own Fibre Channel link that provides the physical path to each IMPS in the FabricX instance. Other connections may also work, such as the use of Gigabit Ethernet control path connections between the CPP and IMPS. A logical control path is established between each Processor of the CPP and each IMPS. The control paths to IMPSs are multiplexed over the physical link that connects the respective SP of the CPP to its corresponding IMPS. The IMPS provides the internal routing necessary to send and deliver Fibre Channel frames between the SP of the CPP and the respective IMPS. Other embodiments are conceivable that could use IP connectivity for the control path. In such a case the IMPS could contain logic to route IP packets to the SP.
Software Components
Reference is made to
The CPP-based software includes a mixture of User-Level Services 122 and Kernel-mode or Kernel services 128. The Kernel services include Layered Drivers 123, Common Services 125, Switch Independent Layer (SIL) 126, and Switch Abstraction Layer-Control Path Processor (SAL-CPP) 127. The IMPS-based software preferably runs on a control processor within the vendor's switch. This processor may be embedded on an I/O blade within the switch or implemented as a separate hardware module.
The SAL-CPP 127 provides a vendor-independent interface to the services provided by the IMPSs that form a FabricX instance. This software layer creates and manages an IMPS Client for each IMPS that is part of the FabricX instance. The following services are provided by the SAL-CPP. There is a Switch Configuration and Management Service (SWITCH CONFIG MGMT) in the SAL-CPP that provides uniform support for configuring the IMPS, zone configuration and management, name service configuration and management, discovering the ports supported by the IMPS, reporting management related events such as Registered State Change Notifications (RSCNs), and component failure notifications. These service interfaces, combined with the interfaces provided by the user-level Switch Management service, encapsulate the differences between different switch vendors and provide a single uniform interface to the FabricX management system. The Switch Adapter Port Driver (SAPD) of the Kernel Services 128 uses these interfaces to learn what ports are supported by the instance so that it can create the appropriate device objects representing these ports.
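The vendor-independent layering can be pictured as an abstract interface with one adapter per switch vendor. The sketch below uses invented names and is only a loose analogy for the SAL-CPP, not its actual interface:

```python
# Hypothetical sketch of a switch abstraction layer: upper layers
# program against a vendor-neutral interface, and one adapter object
# (client) is created per intelligent switch (IMPS) in the instance.
from abc import ABC, abstractmethod

class SwitchClient(ABC):
    """Vendor-independent view of one intelligent switch."""

    @abstractmethod
    def discover_ports(self):
        """Return the ports supported by this switch."""

    @abstractmethod
    def configure_zone(self, zone_name, members):
        """Create or update a zone on the switch."""

class VendorASwitchClient(SwitchClient):
    """One concrete adapter; a real one would speak the vendor's
    management protocol instead of holding state locally."""
    def __init__(self):
        self.zones = {}

    def discover_ports(self):
        return ["fc0/1", "fc0/2"]

    def configure_zone(self, zone_name, members):
        self.zones[zone_name] = list(members)

def create_clients(switches):
    # The abstraction layer creates and manages one client per IMPS.
    return {name: cls() for name, cls in switches.items()}
```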
Referring to
Referring again to
SAL-CPP 127 provides a volume management (Volume MGMT) service interface that supports creating and destroying virtual volumes, associating virtual volumes with back-end Storage Elements, and composing virtual volumes for the purpose of aggregation, striping, mirroring, and/or slicing. The volume management service interface also can be used for loading all or part of the translation map for a volume to a virtualizer, quiescing and resuming IO to a virtual volume, creating and destroying permission maps for a volume, and handling map cache miss faults, permission map faults, and other back-end errors. These services are used by the volume graph manager (VGM) in each SP to maintain the mapping from the virtual targets presented out the logical front of the instance to the Storage Elements on the back-end.
There are other SAL-CPP modules. The SAL copy service (COPY SVCS) functions provide the ability to copy blocks of data from one virtual volume to another. The Event Dispatcher is responsible for delivering events produced from the IMPS to the registered kernel-based services such as Path Management, VGM, Switch Manager, etc.
The Switch and Configuration Management Interface is responsible for managing the connection to an IMPS. Each Storage Processor maintains one IMPS client for each IMPS that is part of the instance. These clients are created when the Switch Manager process directs the SAL-CPP to create a session with an IMPS.
The Switch Independent Layer (SIL) 126 is a collection of higher-level switch-oriented services. These services are implemented using the lower-level services provided by the SAL-CPP, and include the following.

Volume Graph Manager (VGM)—The volume graph manager is responsible for processing map-miss faults, permission map faults, and back-end IO errors that it receives from the SAL-CPP. The VGM maintains volume graphs that provide the complete mapping of the data areas of front-end virtual volumes to the data areas of back-end volumes. The Volume Graph Manager provides its service via a kernel DLL running within the SP.

Data Copy Session Manager—The Data Copy Session Manager provides high-level copy services to its clients. Using this service, clients can create sessions to control the copying of data from one virtual volume to another. The service allows its clients to control the amount of data copied in a transaction and the amount of time between transactions; sessions can be suspended, resumed, and aborted. This service builds on top of capabilities provided by the SAL-CPP's Data Copy Services. The Data Copy Session Manager provides its service as a kernel-level DLL running within the SP.

Path Management—The path management component of the SIL is a kernel-level DLL that works in conjunction with the Path Manager. Its primary responsibility is to provide the Path Manager with access to the path management capabilities of the SAL-CPP. It registers for path change events with the SAL-CPP and delivers these events to the Path Manager running in user-mode. Note that in some embodiments the Path Management service, or any of the other services, may be configured to operate elsewhere, such as being part of another driver, such as FlareX.

Switch Management—The switch management component of the SIL is a kernel-level DLL that works in conjunction with the Switch Manager. Its primary responsibility is to provide the Switch Manager with access to the switch management capabilities of the SAL-CPP.
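The transaction-bounded copy behavior attributed to the Data Copy Session Manager (a bounded amount of data per transaction, with suspend, resume, and completion tracking) can be sketched as follows; the names are illustrative only:

```python
# Hypothetical sketch of a copy session: data moves in bounded
# transactions so clients can limit per-transaction I/O impact, and
# the session can be suspended and resumed mid-copy.

class CopySession:
    def __init__(self, source, dest, blocks_per_txn=2):
        self.source = source                # source blocks (list)
        self.dest = dest                    # destination (same length)
        self.blocks_per_txn = blocks_per_txn
        self.cursor = 0                     # next block to copy
        self.suspended = False

    def run_transaction(self):
        # Copy at most blocks_per_txn blocks; return how many moved.
        if self.suspended or self.done():
            return 0
        end = min(self.cursor + self.blocks_per_txn, len(self.source))
        self.dest[self.cursor:end] = self.source[self.cursor:end]
        copied = end - self.cursor
        self.cursor = end
        return copied

    def suspend(self):
        self.suspended = True

    def resume(self):
        self.suspended = False

    def done(self):
        return self.cursor >= len(self.source)
```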
The CPP also hosts a collection of Common Services 125 that are used by the layered application drivers. These services include the following.

Persistent Storage Mechanism (PSM)—This service provides a reliable persistent data storage abstraction. It is used by the layered applications for storing their meta-data. The PSM uses storage volumes provided by FlareX that are located on the Disk Array Enclosure attached to the CPP. This storage is accessible to both SPs and provides the persistency required to perform recovery actions for failures that occur. Flare provides data protection to these volumes using three-way mirroring. These volumes are private to a FabricX instance and are not visible to external hosts.

Distributed Lock Service (DLS)—This service provides a distributed lock abstraction to clients running on the SPs. The service allows clients running on either SP to acquire and release shared locks and ensures that at most one client has ownership of a given lock at a time. Clients use this abstraction to ensure exclusive access to shared resources such as meta-data regions managed by the PSM.

Message Passing Service (MPS)—This service provides two-way communication sessions, called filaments, to clients running on the SPs. The service is built on top of the CMI service and adds dynamic session creation to the capabilities provided by CMI. MPS provides communication support to kernel-mode drivers as well as user-level applications.

Communication Manager Interface (CMI)—CMI provides a simple two-way message passing transport to its clients. CMI manages multiple communication paths between the SPs and masks communication failures on these paths. The CMI transport is built on top of the SCSI protocol, which runs over 2 Gbps Fibre Channel links that connect the SPs via the mid-plane of the storage processor enclosure. CMI clients receive a reliable and fast message passing abstraction. CMI also supports communication between SPs within different instances of FabricX. This capability will be used to support mirroring data between instances of FabricX.
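The exclusive-ownership guarantee described for the Distributed Lock Service can be sketched as below. This toy version keeps a single in-process table and omits the cross-SP communication and failure recovery a real DLS needs; all names are invented:

```python
# Hypothetical sketch of the DLS ownership rule: at most one client
# owns a named lock at a time; releases by non-owners are refused.
import threading

class LockService:
    def __init__(self):
        self.owners = {}               # lock name -> owning client
        self._mutex = threading.Lock() # protects the owners table

    def acquire(self, lock_name, client):
        # Grant the lock only if it is free (or already held by this
        # same client); otherwise the caller must retry later.
        with self._mutex:
            if self.owners.get(lock_name) in (None, client):
                self.owners[lock_name] = client
                return True
            return False

    def release(self, lock_name, client):
        # Only the current owner may release the lock.
        with self._mutex:
            if self.owners.get(lock_name) == client:
                del self.owners[lock_name]
                return True
            return False
```

In the document's usage, the "clients" would be services on either SP serializing access to shared resources such as PSM meta-data regions.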
The CPP includes Admin Libraries that provide the management software of FabricX with access to the functionality provided by the layered drivers such as the ability to create a mirrored volume or a snapshot. The Admin Libraries, one per managed layer, provide an interface running in user space to communicate with the managed layers. The CPP further includes Layered Drivers 123 providing functionality as described below for drivers denominated as Flare, FlareX (FLARE_X), Fusion, Clone/Mirror, PIT Copy, TDD, TCD, and SAPD.
Flare provides the low-level disk management support for FabricX. It is responsible for managing the local Disk Array Enclosure used for storing the data of the PSM, the operating system and FabricX software, and initial system configuration images, packages, and the like. It provides the RAID algorithms to store this data redundantly.
The FlareX component is responsible for discovering and managing the back-end storage that is consumed by the FabricX instance. It identifies what storage is available and the different paths to these Storage Elements, presents the Storage Elements to the management system, and allows the system administrator to identify which Storage Elements belong to the instance. Additionally, FlareX may provide Path Management support to the system, rather than that service being provided by the SIL as shown. In such a case, FlareX would be responsible for establishing and managing the set of paths to the back-end devices consumed by a FabricX instance, and it would receive path-related events from the Back-End Services of the SAL-CPP and respond to these events by, for example, activating new paths, reporting errors, or updating the state of a path.
The Fusion layered driver provides support for re-striping data across volumes and uses the capabilities of the IMPS to implement striped and concatenated volumes. For striping, the Fusion layer (also known as the Aggregate layer) allows the storage administrator to identify a collection of volumes (identified by LUN) over which data for a new volume is striped. The number of volumes identified by the administrator determines the number of columns in the stripe set. Fusion then creates a new virtual volume that encapsulates the lower layer stripe set and presents a single volume to the layers above.
Fusion's support for volume concatenation works in a similar way; the administrator identifies a collection of volumes to concatenate together to form a larger volume. The new larger volume aggregates these lower layer volumes together and presents a single volume to the layers above. The Fusion layer supports the creation of many such striped and concatenated volumes.
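The striping and concatenation mappings described above reduce to simple address arithmetic. The sketch below assumes a stripe-unit size parameter, which the text does not specify, and uses invented function names:

```python
# Illustrative arithmetic for striped and concatenated volumes: the
# number of member volumes is the number of columns, and each virtual
# block address maps to a (member, offset-within-member) pair.

def stripe_map(lba, num_columns, stripe_unit=64):
    """Map a virtual LBA onto a striped volume.

    Returns (column_index, lba_within_column) for a stripe set with
    num_columns members and stripe_unit blocks per stripe chunk.
    """
    stripe_number = lba // stripe_unit        # which chunk overall
    offset_in_unit = lba % stripe_unit
    column = stripe_number % num_columns      # round-robin across members
    row = stripe_number // num_columns        # full stripes completed
    return column, row * stripe_unit + offset_in_unit

def concat_map(lba, member_sizes):
    """Map a virtual LBA onto a concatenated volume (members laid
    end to end in order)."""
    for idx, size in enumerate(member_sizes):
        if lba < size:
            return idx, lba
        lba -= size
    raise ValueError("LBA beyond end of concatenated volume")
```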
Because of its unique location in the SAN infrastructure, FabricX can implement a truly non-disruptive migration of the dataset by using the Data Mobility layer driver that is part of the Drivers 123. The client host can continue to access the virtual volume through its defined address, while FabricX moves the data and updates the volume mapping to point to the new location.
The Clone driver provides the ability to clone volumes by synchronizing the data in a source volume with one or more clone volumes. Once the data is consistent between the source and a clone, the clone is kept up-to-date with the changes made to the source by using mirroring capabilities provided by the IMPS's. Clone volumes are owned by the same FabricX instance as the source; their storage comes from the back-end Storage Elements that support the instance.
The Mirror driver supports a function similar to that of the Clone driver; however, mirrors are replicated between instances of FabricX. The mirror layered driver works in conjunction with the mirror driver in another instance of FabricX. This application provides the ability to replicate a source volume on a remote FabricX instance and keep the mirror volume in synch with the source.
The PIT (Point-In-Time) Copy driver, also known as Snap, provides the ability to create a snapshot of a volume. The snapshot logically preserves the contents of the source volume at the time the snapshot is taken. Snapshots are useful for supporting non-intrusive data backups, replicas for testing, checkpoints of a volume, and other similar uses.
The Target Class Driver and Target Disk Driver (TCD/TDD) layer provides SCSI Target support. In FabricX these drivers mostly handle non-read and non-write SCSI commands (such as INQUIRY, REPORT_LUNS, etc.). The drivers are also responsible for error handling; when errors cannot be masked by the driver layers below, the TCD/TDD is responsible for creating the SCSI error response to send back to the host. The TCD/TDD layer also implements support for the preferred CLARiiON functionality that provides the means of identifying which LUNs each initiator should see. This is known as LUN masking. The feature also provides for LUN mapping, whereby the host-visible LUN is translated to an instance-based LUN. Additionally, such functionality, when combined with a host agent, provides the ability to identify which initiators belong to a host, simplifying the provisioning of LUN masking and mapping.
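As a rough illustration of the LUN masking and mapping just described, the sketch below keeps a per-initiator table translating host-visible LUNs to instance-based LUNs; the class and method names are assumptions, not the TCD/TDD interface.

```python
# Toy model of LUN masking (which LUNs an initiator may see) and LUN
# mapping (host-visible LUN -> instance-based LUN). Names are assumed.
class LunMaskingTable:
    def __init__(self):
        # initiator WWN -> {host-visible LUN: instance-based LUN}
        self._table = {}

    def grant(self, initiator: str, host_lun: int, instance_lun: int):
        """Expose instance_lun to the initiator as host_lun."""
        self._table.setdefault(initiator, {})[host_lun] = instance_lun

    def resolve(self, initiator: str, host_lun: int):
        """Translate a host-visible LUN; None means the LUN is masked."""
        return self._table.get(initiator, {}).get(host_lun)

    def report_luns(self, initiator: str):
        """Host-visible LUNs this initiator would see in REPORT_LUNS."""
        return sorted(self._table.get(initiator, {}))
```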
The Switch Adapter Port Driver (SAPD) is presented as a Fibre-Channel Port Driver to the TCD/TDD (Target Class Driver/Target Disk Driver) drivers, but rather than interfacing with a physical port device on the SP, the driver interfaces with the SAL-CPP and creates a device object for each front-end port of each IMPS that is part of the FabricX instance. The SAPD registers with the SAL-CPP to receive non-IO SCSI commands that arrive. The SAL-CPP will deliver all non-IO SCSI commands received for LUs owned by this driver's SP to this SAPD. The SAPD runs as a kernel-mode driver.
The following services are user based: Governor and Switch Management. The Governor is an NT Service that is responsible for starting other user-level processes and monitoring their health and restarting them upon failure. The Switch Manager controls sessions created in the SAL-CPP for each IMPS. The Switch Manager is responsible for establishing the connections to the switches under its control and for monitoring and reacting to changes in their health. Each SP hosts a single Switch Manager that runs as a User-level process and a Kernel-mode service within the SP.
Reference is made once again to
The Path Management is responsible for the construction and management of paths to back-end Storage Elements and is part of Kernel-mode services. It notes when paths change state; based on these state changes it applies its path management policies to take any adjusting actions. For example, upon receiving a path failure notification, the Path Management might activate a new path to continue the level of service provided for a back-end volume.
One function of FabricX Volume Management is to combine elements of the physical storage assets of the FabricX Instance into logical devices. The initial implementation of the FabricX Volume Manager is based on the Flare Fusion Driver. As in Flare, the basic building blocks of the volumes exported by FabricX are constructed from the back-end storage devices. Each device visible to the FabricX instance will initially be represented as an un-imported Storage Element. The storage administrator will be able to bind the individual storage elements into single-disk RAID Groups. From these RAID Groups the administrator can define Flare Logical Units (FLU). In the FabricX environment the FLUs will be exported by the FlareX component to the layered drivers above.
Flare Fusion imports FLUs and aggregates them into Aggregate Logical Units (ALU). When a logical unit or SCSI Disk is presented to a client host it is called a Host Logical Unit (HLU). HLUs can be created by: directly exporting a FLU; exporting an ALU created by concatenating two or more FLUs; and exporting an ALU created by striping two or more FLUs.
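The three export paths for creating an HLU can be illustrated with a toy model; the dataclass and the capacity rules below are assumptions for illustration, not the Flare Fusion implementation.

```python
# Illustrative model of the three HLU creation paths: direct export of a
# FLU, concatenation into an ALU, and striping into an ALU. Assumed names.
from dataclasses import dataclass

@dataclass
class FLU:            # Flare Logical Unit built on back-end storage
    name: str
    blocks: int

def export_direct(flu: FLU) -> dict:
    return {"type": "direct", "blocks": flu.blocks, "children": [flu.name]}

def export_concatenated(flus: list[FLU]) -> dict:
    return {"type": "concat", "blocks": sum(f.blocks for f in flus),
            "children": [f.name for f in flus]}

def export_striped(flus: list[FLU]) -> dict:
    # assumed rule: striped capacity = smallest column times column count
    return {"type": "stripe",
            "blocks": min(f.blocks for f in flus) * len(flus),
            "children": [f.name for f in flus]}
```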
The FabricX Inter Process Communication Transport (FIT) provides the message passing support necessary for the SAL Agents running on the IMPS's to communicate with the SAL-CPP client instance running on each SP. This transport provides a model of asynchronous communication to its clients and is responsible for monitoring and reporting on the health of the communications network connecting the IMPSs to the CPPs. FIT uses a proprietary protocol developed on top of the SCSI/FC protocol stack to provide a control path between the SPs and the IMPS's. This protocol runs over the Fibre Channel connecting the SP to the IMPS's switch. FIT supports multiple transport protocols. In addition to SCSI/FC FIT also supports TCP/IP.
FabricX preferably uses the Flare architecture from CLARiiON. This architecture uses a Windows driver stack. At the lowest layer is the code, labeled FlareX (Flare_X) that interfaces to the back-end devices. The storage applications are logically layered above FlareX as Windows device drivers, including Fusion, Mirror_View, Snap_View, Clone_View, and TCD/TDD. These layers provide support for features such as volume concatenation, striping, clones, snapshots, mirrors, and data migration. These layers also define virtualized Logical Unit objects.
This architecture includes replication layers: Snap_View, Clone_View, and Mirror_View are layered above the Fusion layer of code and consume the logical units (volumes) presented by the Fusion layer; they likewise present logical units to the layers above. The replication layers have no knowledge of back-end devices or data placement thereon.
The inventors have critically recognized that prior art storage arrays had limited processes to create clones, which are replicas stored within an array, and mirrors, which are replicas stored on different arrays. This is because the front end of an array has no way to directly access a back-end device in some other array. Remote mirroring in such prior art configurations is processed through two different arrays, one attached to a host and one attached to remote back-end devices. These two arrays are then connected to each other via a WAN. However, the present invention, being based on a switch and storage architecture, does not have such limitations. Instead, all back-end devices are equivalent regardless of their physical location, though latency of access may vary. A back-end device may be connected to a back-end port of a switch through some WAN, making it physically remote. With a switch-based architecture, clones, which are replicas stored within the storage managed by the instance, can be created from storage that is physically remote from the Controller and switch hardware just as easily as from storage that is physically close to this hardware. Only one FabricX instance is necessary to create clones, whether on physically local back-end devices or on physically remote back-end devices. However, if it is desirable to create replicas (mirrors) between instances of FabricX, that is possible. For example, one might want to create a replica on another instance for the purpose of increasing data protection and providing a means to tolerate a failure of a FabricX instance, or to replicate data using an asynchronous protocol for very long distance replication.
Further Operation Details
Several components of FabricX System Management reside on host computers (those using the storage services of FabricX and/or those managing FabricX); these are referred to as Client Components and are shown in group 18. One component in particular, Raid++, has both a client and a server instance, shown respectively in host/client group 18 and server group 210. The C++ Command Line Interface (CLI 200, referred to here as CLI++) component resides on any system where the user wishes to manage FabricX using a text-based command line interface. This component, in conjunction with the Java-based CLI, provides the user with control over FabricX. The security model for the CLI++ is host based; the user/IP address of the host is entered into the FabricX registry to permit/limit the user's access to system management. The CLI++ uses a client instance of Raid++ to hold the model of the FabricX instance and to manipulate the configuration of FabricX. The client-resident Raid++ communicates with the server-based Raid++ using a messaging scheme over TCP/IP.
The Java CLI 206 provides commands that use a different management scheme from that used by the CLI++. The Java CLI captures the user command string, packages it into an XML/HTTP message and forwards it to the CIMOM on the server group 210. The CIMOM directs the command to the CLI Provider which decomposes the command and calls methods in the various CIM providers, primarily the CLARiiON provider, to effect the change.
The Java GUI 208 provides a Windows-based management interface. It communicates with the CIMOM using the standard CIM XML/HTTP protocol. The GUI effects its changes and listens for events using standard CIM XML/HTTP. The Host Agent 204 provides optional functionality by pushing down to FabricX information about the host. The following information is forwarded by the Agent explicitly to the CPP: Initiator type, Initiator options, Host device name used for push, Hostname, Host IP address, Driver name, Host Bus Adapter (HBA) model, HBA vendor string, and Host ID.
The Event Monitor 202 resides on a host and can be configured to send email, page, SNMP traps, and/or use a preferred EMC Call Home feature for service and support. The configuration is performed on the CPP and the configuration information is pushed back to the Event Monitor. The Event Monitor may also run directly on the CPP but due to memory constraints may be limited in function in comparison to running on a host computer.
Referring again to
The management functions not provided by Raid++ are provided by a series of CIMOM providers which are attached to a CIMOM. The CIMOM provides common infrastructure services such as XML coding/decoding and HTTP message transport. The hosted services exclusively implemented in CIMOM providers are: Analyzer Provider—Provides statistics about performance of traffic on ports on the switch; CLI Provider—This provider implements services to allow CLI clients to access CIM managed services such as Clone, Analyzer, Fusion, and switch management; Clone Provider—Provides services to manage the configuration and operation of clones; Data Mobility Provider—Provides services to manage the configuration and operation of data migration between storage volumes transparently to the host applications using the storage; Fusion Provider—Provides services to configure and manage the combining of LUNs to create new LUNs of larger capacity; Mirror Provider—Provides services to manage the configuration and operation of mirrors; and Switch Management Provider—Provides services to configure and manage the attached intelligent switch components owned by FabricX.
The above-described providers periodically poll the system infrastructure to build a model of the existing component configuration and status. If any changes are detected in configuration or status between the existing model and the newly-built model, registered observers are notified of the changes. The model is then updated with the new model and saved for queries by the provider. The services of these providers can be accessed from other providers by formulating XML requests and sending them to the CIMOM. This permits providers which require the services of other providers (such as Raid++ through the CLARiiON Provider, or the CIM local services such as persistent storage, directory services, or security) to access those services. Additionally, Admin STL Driver Access on the server side gives these providers access to the drivers and services of an SP as shown in group 218, including the following drivers: Flare, Clones, Snaps, Fusion, and Mirrors, as well as services for switch management and data mobility.
Other Service Providers are shown in group 212 of the server group 210, and include the Persistent Data Service Provider, Security Provider, and Directory Service Provider. The Persistent Data Service Provider provides the CIMOM with access to the kernel-based PSM. The CIMOM uses the PSM to store meta-data relevant to the management system for example user names and passwords. The Security Provider supports authentication of users and authorization of user roles. The Directory Service Provider is used to obtain the network addresses of the systems in a domain of managed FabricX instances.
Reference will be made below to
This problem is addressed with the architecture of the present invention by using memory within the FabricX Storage Processor 26 or 28 to supplement the memory resources of the IMPS's translation unit and more efficiently use the memory of each. The translation unit's memory resources are used to store a subset of the full set of extent maps for the volumes exported by the translation unit. Maps are loaded to the translation unit from the CPP both on demand and ahead of demand, in a technique implemented by a virtualizer application, which is preferably software running on an SP or on the IMPS. In this embodiment, sequential access is detected and future requests are predicted using protection maps to mark the edges of unmapped regions. Access to a marked region, in combination with the detection of sequential access, triggers the preloading of additional maps prior to the arrival of the actual request.
I/O requests that arrive to find no mapped region are handled by loading the map from the control processor. The control processor uses access data collected from the intelligent multi-protocol switch to determine which data to replace. Supported cache replacement algorithms include least recently used and least frequently used. The IMPS hardware is used to detect the need for extents prior to access, and statistical data is collected on volume access to select which cache entries to replace. The mechanism further identifies volumes whose extent maps have become fragmented and triggers a process to reduce this fragmentation.
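The demand-loading and replacement behavior described above can be modeled with a small least-recently-used cache of extent maps; the capacity, the loader callback, and the fault counter are assumptions of this sketch, not the actual translation-unit design.

```python
# Sketch of an extent-map cache with LRU replacement: a lookup that misses
# (an unmapped region) faults the map in from the control processor via a
# caller-supplied loader, evicting the least recently used entry if full.
from collections import OrderedDict

class ExtentMapCache:
    def __init__(self, capacity: int, load_map):
        self.capacity = capacity
        self.load_map = load_map      # fetches a map from the control processor
        self._maps = OrderedDict()    # extent id -> map, in LRU order
        self.faults = 0

    def lookup(self, extent_id):
        if extent_id in self._maps:
            self._maps.move_to_end(extent_id)   # mark most recently used
            return self._maps[extent_id]
        self.faults += 1                        # unmapped region: fault it in
        mapping = self.load_map(extent_id)
        if len(self._maps) >= self.capacity:
            self._maps.popitem(last=False)      # evict least recently used
        self._maps[extent_id] = mapping
        return mapping
```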
Referring to
Referring to
An example of solving the problem described in general is shown with reference to
Referring to
In this embodiment the storage application (
Referring to
Regarding nomenclature used for presenting an example of how this embodiment functions, in the volume group 412, V7 represents a “top level” volume, or one that is capable of being presented to a host. V6 represents a cloning volume. This volume has two children, V3 and V5, where V3 is the source for the clone and V5 is its replica. Similarly V1 and V2 are children of V3 while V4 is a child of V5. It is the children at the bottommost layer that are presented by the bottommost driver in this example (FlareX) to map the storage extents of the backend array that are of interest.
Referring now to
Information extracted from the IMPS through its API includes the type of I/O operation (e.g., read or write), a logical volume target (in this context, a virtual target (VT)), a virtual LUN (v-LUN), a physical target, a physical LUN, the block address of a physical device, the length of the request, and the error condition. When an error condition is received, the bottommost affected volume is identified and the error is delivered to the bottommost affected object.
Objects in the Volume Graph have one-to-one congruity with objects in the Device Graph, as follows. At the top layer (V7), the Volume Graph Object G represents a slice volume. A slice volume represents a "slice" or partition of its child volume; it has only one child volume and presents a (possibly complete) contiguous subset of the data of this child volume. The next layer down (V6) is congruently mapped to element F, which represents the mirrored volume. V3 represents an aggregated volume consisting of the concatenation of volumes V1 and V2. Volume V5 is a slice volume that presents the data of V4. Slice Volume Objects A, B, and D are congruent with Device Graph Objects V1, V2, and V4, respectively.
Upon receiving an error, the Bottommost Volume Graph Object will deliver the error to its corresponding Device Graph Object. The Device Graph Object as the owner of the error can decide how to handle, e.g. propagate the error up the layer stack, or deal with it directly and not propagate it. Two goals are carried out in the error presentation: consistent presentation to the host, and masking of errors which involves using independent capabilities of the FabricX architecture to solve problems for the host at the unique middle-layer level of a FabricX instance.
Reference is made below to
In handling the error in this example, there are two cases for handling in keeping with the goals discussed above: (a) Case 1—error is not delivered back to host but rather handled by a Device Object at the FabricX instance level; or (b) Case 2—error is transformed to have the error personality of FabricX. Each method of handling is explained in the example, continued with the flow-diagrams of
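The two cases can be sketched as a walk up a chain of device objects, where an object either absorbs the error at the FabricX instance level (Case 1) or lets it surface to the host carrying FabricX's error personality (Case 2). Class and field names are assumptions for illustration.

```python
# Toy model of error delivery up the Device Graph: each object either
# handles the error locally (masking it from the host) or propagates it.
class DeviceObject:
    def __init__(self, name, parent=None, handles_locally=False):
        self.name, self.parent = name, parent
        self.handles_locally = handles_locally
        self.seen = []                          # errors observed by this object

    def on_error(self, err):
        self.seen.append(err)
        if self.handles_locally:
            return f"{self.name} handled {err}"  # Case 1: masked from the host
        if self.parent:
            return self.parent.on_error(err)     # propagate up the layer stack
        return f"host sees {err}"                # Case 2: surfaces to the host
```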
Referring to
Referring to
The system 900 includes an intelligent switch 902 having a series of virtual targets 904 (shown as VT 1 to VT 8) and virtual initiators 906 (shown as VI 1 to VI 8) provided by a virtual volume 908 and a masking and mapping module 910. A host 912, having a first host bus adaptor HBA1 includes a series of ports 914 coupled to first and second zones 916, 918. In the illustrated embodiment, the first and second ports (Port 1 and Port 2) are coupled to the first zone 916 and the third and fourth ports (Port 3 and Port 4) are coupled to the second zone 918. It is understood that the zones can be provided by various switches, such as Fibre Channel switches. It is also understood that direct connections to the intelligent switch 902 can be provided.
The first and second zones 916, 918 are coupled to various ports on the intelligent switch 902 to provide connectivity to various virtual targets VTs. In the illustrated embodiment, the first zone 916 is coupled to virtual targets one through four (VT 1 to VT 4) and the second zone 918 is coupled to virtual targets VT 5 to VT 8. As described above, this can be referred to as the front end.
The masking and mapping module 910 and virtual volume 908 provide connectivity to the virtual initiators 906 (VI 1 to VI 8), which are coupled to third and fourth (backend) zones 920, 922. In the illustrated embodiment, VI 1 to VI 4 are connected to the third zone 920 and VI 5 to VI 8 are coupled to the fourth zone 922.
A storage array 922 is coupled to the third and fourth zones 920, 922 via various ports 926. In the illustrated embodiment, first and second ports (Port 1 and Port 2) are coupled to the third zone 920 and third and fourth ports (Port 3 and Port 4) are coupled to the fourth zone 922. The ports 926 on the array provide access to array storage 930 via a masking and mapping module 932.
It is understood that additional switches 950 providing further zones 952 can form a part of the system. In general, the system can include any practical number of switches, zones, arrays, etc.
In another aspect of the invention, a storage virtualization system running on intelligent storage area network (SAN) switches provides minimally disruptive data migration that avoids network re-wiring, re-zoning, and re-provisioning. In one embodiment, data migration functionality is provided as described in detail below without requiring software to be added to hosts. The inventive system utilizes what are referred to as "pass-through" logical units (LUNs), as described in more detail below. The pass-through LUNs are mapped one-to-one to the storage elements (SEs) of the storage virtualization system.
It is understood that exemplary embodiments are shown and described in conjunction with storage virtualization systems and terms associated with the INVISTA system provided by EMC Corporation. It is understood that such terms are intended to facilitate an understanding of the invention to one of ordinary skill in the art and should be construed broadly to cover functionality of a storage virtualization system, without limitation to any particular system or vendor.
Before describing exemplary embodiments of the invention, some information is provided. So-called back-end storage, such as back-end storage 41 in
In conventional storage systems, the host includes software (referred to as multi-pathing software, e.g., PowerPath multi-pathing software of EMC Corporation) that recognizes, and takes into account, that the same LUN can appear as two different LUNs on two different paths due to the nature of the storage system. For example, a host may recognize back-end storage as Symmetrix storage via two direct paths exposed to it. Without multi-pathing software, the host may believe that the Symmetrix storage is two different devices, one on each path. Multi-pathing software makes it appear to the host that this is in fact the same LUN. Normally, virtual volumes exposed through a storage virtualization system will have the characteristics of the storage virtualization system LUNs. In exemplary embodiments of the invention, storage virtualization system LUNs that have been composed as pass-through LUNs will have the characteristics of the back-end storage elements from which they are constructed. Thus, in accordance with exemplary embodiments providing minimally disruptive data migration described below, storage can be imported into a storage virtualization system without being seen as multiples of the same storage, and without the addition of software on the host.
In an exemplary embodiment, the pass-through LUNs are backend storage LUNs, already seen by the host, that are exposed as frontend LUNs through the storage system with the same unique ID as the backend LUN, so that the host (and the multi-pathing software on it) treats the storage system device as just another path to the original device, and not as a new device. One could consider that the host is 'spoofed' into seeing the original device as the same device after importation. That is, the unique ID, e.g., WWN (world wide name), of a virtual volume presented to the host should be the same as the WWN of the actual backend storage element. While the WWNs of virtual initiators communicating with backend targets can be the same as the WWNs of host initiators, in general they will not be the same.
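The effect of reusing the back-end WWN can be seen in a one-line model of how multi-pathing software counts devices: paths sharing a unique ID collapse into one device, while a new WWN would surface as a new device. This is an illustration only, not the multi-pathing algorithm.

```python
# Multi-pathing groups paths by unique ID (WWN); distinct WWNs look like
# distinct devices to the host. Each path is (wwn, description).
def host_device_count(paths: list[tuple[str, str]]) -> int:
    return len({wwn for wwn, _ in paths})
```

Presenting the virtualized path under the back-end LUN's own WWN keeps the count at one device; presenting it under a new WWN would make the host see a second device.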
The storage virtualization system, using virtual initiators, can record properties of the back-end storage to enable the storage virtualization system to respond to the host as if the responses were being issued by the actual back-end storage element. Once this logical path has been established, then using the data migration feature in the storage virtualization system, the data can be migrated off of the present backend storage to new backend storage. This migration is completely transparent to the host. While the host would 'see' a physical path and one or more virtual/logical paths, the 'real' path can be shut off. The storage virtualization system can then use the recorded properties of the (now) former backend storage to respond to the host.
Once the data migration session is established, in an exemplary embodiment, writes are mirrored to both the source and destination and the reads are serviced only through the source. The user commits the data migration operation (i.e. makes a conscious choice to send writes exclusively to the destination) at some point and de-provisions the backend source LUNs. The storage virtualization presented LUNs should retain the backend source LUN SCSI personality until the “pass through” property of the virtualization system presented LUNs is cleared by the user.
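The session semantics in this paragraph (writes mirrored to both sides, reads served from the source, user-driven commit) can be modeled with dict-backed volumes; the class below is a toy sketch, not the migration feature's interface.

```python
# Toy model of a data migration session: before commit, writes go to both
# source and destination and reads come from the source; after the user
# commits, I/O goes exclusively to the destination.
class MigrationSession:
    def __init__(self, source: dict, destination: dict):
        self.source, self.destination = source, destination
        self.committed = False

    def write(self, lba: int, data: bytes):
        if self.committed:
            self.destination[lba] = data
        else:
            self.source[lba] = data        # mirror the write to both sides
            self.destination[lba] = data

    def read(self, lba: int) -> bytes:
        vol = self.destination if self.committed else self.source
        return vol.get(lba, b"\0")

    def commit(self):
        self.committed = True              # conscious user switch-over
```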
Referring to
Initially, the backend storage/LUNs 1006 is provisioned for the virtual initiators 1012, which are zoned with the virtual target 1014.
As shown in
Some additional detail on pass-through LUN operation is now provided. In a backend information collection phase, the SCSI personality of the backend storage is collected by issuing SCSI commands that yield its personality. The following choices can be considered in this workflow phase. Note that these are not mutually exclusive, and a combination of these choices (up to and including all of them) can be implemented. While SCSI devices are shown and described in exemplary embodiments, it is understood that other technologies and protocols can be used without departing from the present invention.
In one embodiment, a static information collection mechanism is used in which the information is collected by sending some commands to the backend storage at the point of discovery. The set of commands to be issued to the backend can include standard INQUIRY, INQUIRY for Vital Product Data Page 80h, INQUIRY for Vital Product Data Page 83h, READ_CAPACITY and certain other vendor unique commands as deemed appropriate for the back-end array and storage element for which information is collected.
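The static collection step could be sketched as issuing the listed commands once and recording responses keyed by (command, VPD page) for later replay; the issue_command callback and repository shape are assumptions, and real responses would be raw SCSI data rather than strings.

```python
# Hypothetical static-collection sketch: issue a fixed set of
# identity-yielding SCSI commands and record each response for replay.
PERSONALITY_COMMANDS = [
    ("INQUIRY", None),        # standard INQUIRY
    ("INQUIRY", 0x80),        # Vital Product Data page 80h (unit serial number)
    ("INQUIRY", 0x83),        # Vital Product Data page 83h (device identifiers)
    ("READ_CAPACITY", None),
]

def collect_personality(issue_command) -> dict:
    """Build a repository of (command, page) -> response for later replay."""
    return {(cmd, page): issue_command(cmd, page)
            for cmd, page in PERSONALITY_COMMANDS}
```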
In another embodiment, an information-collection-through-monitoring mechanism is used, in which information about the responses the backend array sends to the host is collected via passive observation of commands issued by the host to the actual back-end array. In this implementation, the VT WWN duplicates the WWN of the physical port of the storage array exposed to the host, and the VI WWN duplicates the host physical port WWN. The slow path commands issued by the host are transmitted to the backend; the responses are intercepted, the data from these responses is copied, and the responses are transmitted back to the host. This allows collection of the backend array SCSI personality by determining exactly what the host is sending. To ensure that the entire gamut of SCSI commands for which data is required by the host is collected, discovery-type applications could be run on the host (e.g., ioscan for HPUX hosts or cfgmgr for AIX hosts).
In another embodiment, a self-learning pass-through information collection mechanism is used. An instance of the storage virtualization application functions as a pass through to the backend storage array. Thus, when the host sends SCSI commands that query the SCSI personality, these are issued to the backend. The information from the response coming back is collected and then passed onto the host. This repository of SCSI commands and their responses is collected to be used in case the commands are repeated.
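The self-learning variant amounts to memoizing backend responses on first use; a minimal sketch, assuming a dict repository and a forwarding callback:

```python
# On a repository miss, forward the command to the backend, record the
# response for replay, and return it; on a hit, replay without forwarding.
def pass_through_query(repo: dict, key, issue_to_backend):
    if key not in repo:
        repo[key] = issue_to_backend(key)
    return repo[key]
```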
The above assumes that the virtual initiators (VIs) are registered with the same initiator type as that of the actual initiator. Note that since command responses change based on the registered initiator type, it may be necessary to limit the migration function to one initiator type at a time.
A transfer phase corresponds to collecting the information from the backend and ensuring that the front-end has access to it. This is the information that has been collected in the prior phase.
In one embodiment, information is collected in a flat file. It is understood that the backend information collection phase described above yields information about the responses that the backend will issue to the host. The responses from the backend for the issued slow path commands can be stored in a flat file. This is a combination of Array/Target/LUN information.
In another embodiment, information is collected via an IOCTL query mechanism. For example, FlareX could function as the repository for the backend array information. TCD could query for this backend interface by issuing an IOCTL to FlareX.
A frontend response phase utilizes the data obtained from the transfer phase and responds appropriately to host queries. In one embodiment, in order to achieve appropriate responses, the host is registered using a new initiator type called "pass-through". In this mode, TCD, upon detecting this initiator type, uses the mechanism implemented for the transfer phase to return the appropriate response. Note that the appropriate response should also be returned for commands that have not previously been executed.
In a backend mimicking phase, the original backend array has been disconnected and the migration of data has been accomplished. However, the personality of the former array needs to be maintained. This can be achieved by informing the appropriate driver that the commands should not be issued to the backend. The expectation is that the IT Nexus will be deregistered (after the host is taken down) for the migration and then reconfigured to use the new array.
In conventional data migration systems, a volume is switched from source to destination automatically after the destination is completely synchronized with the source. However, with pass-through LUNs, in accordance with exemplary embodiments, if the switch-over happens automatically, then the SCSI commands will be routed to the destination LUN, which causes multi-pathing software to think it lost access to the original device and that a new device has shown up. In order to avoid that kind of disruption to the host, in an exemplary embodiment manual control of the switch-over is provided in the user interface (referred to as the user "committing" the migration), so that the switch-over happens only on user action, after the user has verified the integrity of the data on the destination and/or the host is disconnected from the storage virtualization application. In addition, the user interface should prevent or warn the user if an admin action is being performed while at least one initiator in the virtual frame is logged into the VT.
In an exemplary embodiment, in order to spoof the initiator and target WWNs, the storage virtualization application first needs to discover the SAN and present the initiators and targets to enable the user to select the initiator-target pairs to be spoofed. Depending on the initiator-target pairs, corresponding VIs and VTs can be created. The user might choose all LUNs or a subset of LUNs to be presented to the host through the storage virtualization application and accordingly configure the virtual frames. If the user chooses to present all the LUNs, the user interface can be modified to automatically import the SEs, create volumes, and configure the virtual frames.
In step 1210, data migration is performed transparently to the host until the destination volume is fully synchronized with the source volume in step 1212. In step 1214, the volume is switched from source to destination. In an exemplary embodiment, the switch is performed manually. In optional step 1216, the source volume is erased and can be used as new storage.
In optional step 1218, the migrated volumes can be physically moved to a new location. Once moved, the volumes can be easily reconnected to the system. In optional step 1220, the server is shut down to enable connection of storage as desired.
In other embodiments, the user can keep running with the new array medium but retain the old array personality.
Having described exemplary embodiments of the invention, it will now become apparent to one of ordinary skill in the art that other embodiments incorporating their concepts may also be used. The embodiments contained herein should not be limited to disclosed embodiments but rather should be limited only by the spirit and scope of the appended claims. All publications and references cited herein are expressly incorporated herein by reference in their entirety.
Palekar, Ashish Arun, Swayampakulaa, Sudhindra, Chinthapatla, Karunaker, Waxman, Matthew D.