Embodiments are described for systems and methods that facilitate dynamic configuration of NPIV virtual ports in a Fibre Channel network. A defined failure detection method, in conjunction with certain SCSI target endpoint and NPIV definitions and monitor component operations, allows SCSI target virtual endpoints to be dynamically created, deleted, and managed during data migration and endpoint disable/enable operations in a backup storage appliance without incurring disruption of other virtual endpoints on the same appliance.
|
1. A method of dynamically managing virtual scsi transport ports in a network, comprising:
mapping one or more virtual ports to a corresponding physical port on a data storage device, the virtual ports representing endpoints that are abstracted objects having attributes that can be moved around a network utilizing the transport ports;
receiving a notification of a failure event from a monitor component requiring reconfiguration of the scsi transport ports;
during runtime of data storage operations, dynamically recreating a target virtual port from a first physical port to a second physical port in response to the failure event of the first physical port; and
managing the endpoints through a multi-threaded scsi target daemon process that sends commands to an operating system of the network to create and maintain multiple virtual ports for the physical port based on at least some of the endpoints.
7. A method of dynamically managing endpoints in a scsi network, comprising:
creating a virtual port on a first physical port of the scsi network and assigning a first virtual port a fixed world-wide port name (WWPN), wherein the virtual port represents an endpoint that is an abstracted object having attributes that can be moved around a network utilizing the transport ports;
detecting a failure of the first physical port;
in the event of the failure of the first physical port, removing the virtual port from the first physical port;
recreating the virtual port on a second physical port with the fixed WWPN to provide a backup port for data transactions intended for the first physical port; and
managing endpoints through a multi-threaded scsi target daemon process that sends commands to an operating system of the network to create and maintain multiple virtual ports for the first and second physical ports.
13. A computer program product comprising a non-transitory computer usable medium having machine readable code embodied therein for dynamically configuring virtual scsi transport ports in a network by:
mapping one or more virtual ports to a corresponding physical port on a data storage device, the virtual ports representing endpoints that are abstracted objects having attributes that can be moved around a network utilizing the transport ports;
receiving a notification of a failure event from a monitor component requiring reconfiguration of the scsi transport ports;
during runtime of data storage operations, dynamically recreating a target virtual port from a first physical port to a second physical port in response to the failure event of the first physical port; and
managing the endpoints through a multi-threaded scsi target daemon process that sends commands to an operating system of the network to create and maintain multiple virtual ports for the physical port based on at least some of the endpoints.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
8. The method of
9. The method of
10. The method of
11. The method of
12. The method of
|
Embodiments are generally directed to data storage systems, and more specifically to dynamic management of SCSI target virtual endpoints during runtime of data transfer operations.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Modern computer data storage systems, such as storage area networks (SAN) in enterprise environments, often use the Fibre Channel (FC) network technology to provide high-speed (e.g., 2 to 16 gigabit/second) data transfers. A Fibre Channel network comprises a number of ports that are connected together, where a port is any entity that actively communicates over the network (either optical fiber or copper) and is usually implemented in a device such as disk storage or a Fibre Channel switch. The Fibre Channel protocol transports SCSI commands over Fibre Channel networks, and network topologies include point-to-point, arbitrated loop (devices in a ring), and switched fabric (devices/loops connected through switches). The Fibre Channel protocol comprises five layers in which a protocol mapping layer (FC4) encapsulates application protocols (e.g., SCSI or IP) into protocol data units (PDUs) for delivery to the physical layers (FC2 to FC0).
The SCSI (Small Computer System Interface) standards used by Fibre Channel networks define certain commands, protocols and electrical/optical interface characteristics for connected devices, such as hard disks, tape drives, and controllers. In data storage networks, a SCSI initiator is typically a computer that initiates a SCSI session by sending a SCSI command, and a SCSI target is a data storage device that responds to initiators' commands and provides the required input/output data transfers.
As the needs for large-scale data storage have increased, storage virtualization techniques have been developed to improve functionality and allow location independence. Storage virtualization abstracts logical storage (e.g., at the block or file level) from the physical storage devices (e.g., disk arrays). The physical storage resources are aggregated into storage pools to form the logical storage, which presents the logical storage space and transparently maps the logical space to physical storage locations. The Fibre Channel standard includes an N_Port ID Virtualization (NPIV) feature in which multiple Fibre Channel node port (N_Port) IDs can share a single physical N_Port. This allows multiple Fibre Channel initiators to occupy a single physical port, easing hardware requirements in SAN systems. This mechanism allows each virtual server to see only its own storage and no other virtual server's storage. NPIV thus allows a single N_Port to register multiple World Wide Port Names (WWPNs) and N_Port identification numbers. In present systems, Fibre Channel base ports and virtual ports are discretely managed and configured on a one-to-one basis in which a physical port is associated with a single virtual port. This makes reconfiguration and management of these ports relatively difficult in most operating conditions. Furthermore, NPIV ports are directly administered, which adds the disadvantage of increased management complexity and difficulty in performing operations such as storage device failover and migration. As the amount of data in enterprise applications increases, the use of highly-available storage is a key consideration and system requirement. A highly-available storage framework allows transparent storage of the same data across several physically separated machines connected within a SAN or other TCP/IP network.
For such systems, it is important that storage devices and data centers are able to be efficiently and quickly reconfigured or replicated in case of failure conditions or even routine maintenance. Reconfiguration and migration tasks in present systems are typically static operations in which the storage devices and/or the entire network are taken down to reconfigure the system or perform large-scale migration of the data. Such a requirement is obviously disruptive to system operation and can cause issues with regard to system performance and integrity. Furthermore, configuration and management operations on virtual ports, such as Fibre Channel NPIV ports, often cause disruption to other ports on the same backup storage appliance, or even in the same system.
What is needed, therefore, is a way to facilitate a non-disruptive reconfiguration of storage devices in a storage area network, such as by allowing the dynamic configuration of SCSI transport endpoints and NPIV virtual ports in a Fibre Channel network.
What is further needed is a method that allows SCSI target virtual endpoints to be dynamically created, deleted, and managed in a backup storage appliance without causing or incurring disruption of other virtual endpoints on the same appliance.
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. EMC, Data Domain, Data Domain Restorer, and Data Domain Boost are trademarks of EMC Corporation.
In the following drawings like reference numerals designate like structural elements. Although the figures depict various examples, the one or more embodiments and implementations described herein are not limited to the examples depicted in the figures.
A detailed description of one or more embodiments is provided below along with accompanying figures that illustrate the principles of the described embodiments. While aspects of the invention are described in conjunction with such embodiments, it should be understood that it is not limited to any one embodiment. On the contrary, the scope is limited only by the claims and the invention encompasses numerous alternatives, modifications, and equivalents. For the purpose of example, numerous specific details are set forth in the following description in order to provide a thorough understanding of the described embodiments, which may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the embodiments has not been described in detail so that the described embodiments are not unnecessarily obscured.
It should be appreciated that the described embodiments can be implemented in numerous ways, including as a process, an apparatus, a system, a device, a method, or a computer-readable medium such as a computer-readable storage medium containing computer-readable instructions or computer program code, or as a computer program product, comprising a computer-usable medium having a computer-readable program code embodied therein. In the context of this disclosure, a computer-usable medium or computer-readable medium may be any physical medium that can contain or store the program for use by or in connection with the instruction execution system, apparatus or device. For example, the computer-readable storage medium or computer-usable medium may be, but is not limited to, a random access memory (RAM), read-only memory (ROM), or a persistent store, such as a mass storage device, hard drives, CDROM, DVDROM, tape, erasable programmable read-only memory (EPROM or flash memory), or any magnetic, electromagnetic, optical, or electrical means or system, apparatus or device for storing information. Alternatively or additionally, the computer-readable storage medium or computer-usable medium may be any combination of these devices or even paper or another suitable medium upon which the program code is printed, as the program code can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory. Applications, software programs or computer-readable instructions may be referred to as components or modules. Applications may be hardwired or hard coded in hardware or take the form of software executing on a general purpose computer or be hardwired or hard coded in hardware such that when the software is loaded into and/or executed by the computer, the computer becomes an apparatus for practicing the invention. 
Applications may also be downloaded, in whole or in part, through the use of a software development kit or toolkit that enables the creation and implementation of the described embodiments. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the described embodiments.
Disclosed herein are methods and systems of simplifying the management of both NPIV and non-NPIV Fibre Channel configurations for SAN and virtual SAN data storage systems. Specifically, embodiments include methods and systems that facilitate non-disruptive reconfiguration of storage devices in a storage area network, such as by allowing the dynamic configuration of SCSI transport endpoints and NPIV virtual ports in Fibre Channel networks. Embodiments may be implemented in a data storage system that performs backup, archiving, and disaster recovery operations with deduplication functionality. The described embodiments allow for dynamic management of SCSI target virtual endpoints in a way that provides maximum flexibility without causing or incurring disruption to or from other virtual endpoints. Such a system is illustrated by the data domain restorer (DDR) system provided by EMC Corporation, though other similar systems are also possible.
For DDR system 100, the Fibre Channel protocol may be employed by one or more of the namespace interfaces, such as the VTL, DD-Boost, or VDisk namespaces, to direct data to devices within the disk storage subsystem 110 that comprise SCSI targets. Fibre Channel allows target names to be advertised as conventional world-wide port names (WWPN) or virtual names (NPIV). In an embodiment, the Fibre Channel protocol is modified through enhancements to the SCSI Target subsystem to take advantage of NPIV functionality on the DDR system 100, including endpoint flexibility and failover. Thus, as shown in
In general, the operating system 136 includes an OS kernel and associated target drivers to control the transmission of data to and from the storage devices, such as local storage 134 and/or cloud-based storage accessible through the cloud 130. In an embodiment, the OS kernel target drivers are configured to support NPIV, as defined by the Fibre Channel protocol. To allow user space applications to utilize the NPIV functionality in the kernel, the PLIB 138 is enhanced to support this new functionality. The PLIB is a peripheral library comprising a simple access library that provides a consistent but very low-level interface to a peripheral on the microcontroller. The PLIB hides register details, making it easier to write drivers that support multiple microcontroller families, and is primarily used to implement device drivers (and some system services) to make them portable.
As opposed to present known Fibre Channel systems in which each physical port 204 is mapped to a single virtual port in a one-to-one relationship, the NPIV mapping component 208 allows the mapping of multiple endpoints (which can be virtual and/or physical) to a single physical port. This means that virtual port management (e.g., port migration, moving ports, removing ports, adding ports, etc.) can be performed on many virtual ports in a unified manner rather than simplistically through the present one-to-one port management procedures. Embodiments thus allow a SAN system to be configured with virtual endpoints that span both base and virtual ports. This is accomplished by adding NPIV functionality to data domain (or similar) operating systems in order to virtualize transport endpoints between base and virtual ports. This mechanism also significantly impacts the dynamic management and configuration of DD OS based systems. Traditionally, reconfiguration or recovery from failure was a static process requiring taking down a system to reconfigure or repair ports. With the virtualization of SCSI target endpoints through NPIV mapping mechanisms, data storage systems can be reconfigured dynamically or on-the-fly while the system is up and running. This also allows movement of virtual ports among physical ports, or even among different systems in the network. In a highly available system, such as critical data recovery systems, the ability to migrate on the fly by moving endpoints port-to-port or system-to-system greatly facilitates the ability for data to be maintained and protected in a non-disruptive manner.
In an embodiment, the NPIV functionality is enhanced to perform endpoint creation on user demand, endpoint segregation based on protocol, and Fibre Channel port failover, and to provide enhanced quality of service. Embodiments include enhancements to the PLIB that support NPIV functionality and that maintain PLIB compatibility with existing PLIB consumers. For purposes of description, certain interface names, programming elements/objects, and programming code segments will be listed and will use conventions as defined by the Fibre Channel protocol and/or in accordance with a specific operating system, such as the Data Domain Operating System (DD OS) provided by EMC Corporation. Adaptation to other operating systems can be performed using techniques known to those of ordinary skill in the art.
NPIV Functionality
Certain changes are made to the operating system, including the PLIB, to accommodate the NPIV feature of Fibre Channel. For example, the size of the PLIB port table is extended from 8 to 64, which is defined as DD_PLIB_SCSITGT_PORT_MAX. The size can be extended further to 256 once SCSI Target has support for a 256-bit port bitmask. The port table will now contain both physical ports and virtual ports. The PLIB port index is unique across all physical ports and virtual ports, which means that a physical port and a virtual port cannot share the same port index. The number of physical ports is static, while the number of virtual ports is dynamic because virtual ports can be created and deleted. The first n entries of the PLIB port table are for physical ports, where n is the number of physical ports on the system. These entries are fixed and contiguous. The remaining entries in the port table are used for virtual ports. These entries are dynamic and not contiguous, so there could be an empty entry between two filled entries. When a virtual port is created, it will be placed in the first available entry in the port table. When a virtual port is deleted, its entry will be cleared in the port table.
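The port table behavior described above can be sketched in C as follows. This is only an illustrative model under stated assumptions, not the PLIB implementation: the type and function names (port_entry_t, port_table_t, port_table_init, port_table_add_vport, port_table_del_vport) are hypothetical, and only the table size of 64, the fixed block of physical entries, the first-available placement, and the clearing of deleted entries are taken from the description.

```c
#include <stdbool.h>

#define PORT_TABLE_MAX 64   /* mirrors the extended table size of 64 */

/* Hypothetical table entry; the real PLIB structure is not reproduced here. */
typedef struct {
    bool in_use;
    bool is_virtual;
    int  parent_idx;   /* table index of the parent physical port, -1 if none */
} port_entry_t;

typedef struct {
    port_entry_t entries[PORT_TABLE_MAX];
    int num_physical;  /* the first num_physical entries are physical ports */
} port_table_t;

/* Fill the first n entries with physical ports; all other entries start empty. */
void port_table_init(port_table_t *t, int num_physical)
{
    if (num_physical < 0) num_physical = 0;
    if (num_physical > PORT_TABLE_MAX) num_physical = PORT_TABLE_MAX;
    for (int i = 0; i < PORT_TABLE_MAX; i++) {
        t->entries[i].in_use = (i < num_physical);
        t->entries[i].is_virtual = false;
        t->entries[i].parent_idx = -1;
    }
    t->num_physical = num_physical;
}

/* Place a new virtual port in the first available entry after the physical
 * block. Returns the table index, or -1 if the parent is not a physical port
 * or the table is full. */
int port_table_add_vport(port_table_t *t, int parent_idx)
{
    if (parent_idx < 0 || parent_idx >= t->num_physical)
        return -1;
    for (int i = t->num_physical; i < PORT_TABLE_MAX; i++) {
        if (!t->entries[i].in_use) {
            t->entries[i].in_use = true;
            t->entries[i].is_virtual = true;
            t->entries[i].parent_idx = parent_idx;
            return i;
        }
    }
    return -1;
}

/* Clear a virtual port's entry. Physical entries are fixed and are rejected. */
int port_table_del_vport(port_table_t *t, int idx)
{
    if (idx < t->num_physical || idx >= PORT_TABLE_MAX || !t->entries[idx].in_use)
        return -1;
    t->entries[idx].in_use = false;
    t->entries[idx].is_virtual = false;
    t->entries[idx].parent_idx = -1;
    return 0;
}
```

With four physical ports, three successive creations in this model occupy indexes 4, 5, and 6. Deleting index 5 leaves a gap that the next creation reuses, which is why the virtual port entries cannot be assumed to be contiguous.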
With respect to user interfaces, PLIB interfaces are abstracted within the SCSI target (scsitgt), which provides interfaces to the command line (CLI). With respect to programming interfaces, PLIB interfaces that can be used for physical ports only are indicated by “pport_idx” in the parameter they take. PLIB interfaces that can be used for virtual ports only are indicated by “vport_idx” in the parameter they take. PLIB interfaces that can be used for both physical and virtual ports are indicated by “port_idx” in the parameter they take.
In an embodiment, to support NPIV functionality, several new PLIB interfaces are defined, as described with reference to certain programming examples below. One new interface is the create_virtual_port interface, which is used to create a virtual port on the physical port specified by pport_idx. The virtual port will be created using the WWPN and WWNN passed in the second and third arguments. After creation, the virtual port will be in the disabled state with target mode not set. To use the virtual port, the target mode of the virtual port must first be set using dd_plib_scsitgt_set_port_target_mode( ), and the virtual port must then be enabled using dd_plib_scsitgt_set_port_state( ).
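The required ordering (create, set target mode, then enable) can be modeled with a small C sketch. The names vport_t, vport_create, vport_set_target_mode, and vport_enable are hypothetical stand-ins and are not the PLIB interfaces named above, whose signatures are not reproduced here. Rejecting an enable request before target mode is set is an assumption made for illustration; the description states only that target mode must be set first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a virtual port's state. */
typedef struct {
    uint64_t wwpn;
    uint64_t wwnn;
    int      parent_pport;  /* physical port the virtual port was created on */
    bool     target_mode;   /* not set immediately after creation */
    bool     enabled;       /* disabled immediately after creation */
} vport_t;

/* A newly created virtual port is disabled and has no target mode set. */
vport_t vport_create(int pport_idx, uint64_t wwpn, uint64_t wwnn)
{
    vport_t v;
    v.wwpn = wwpn;
    v.wwnn = wwnn;
    v.parent_pport = pport_idx;
    v.target_mode = false;
    v.enabled = false;
    return v;
}

void vport_set_target_mode(vport_t *v, bool on)
{
    v->target_mode = on;
}

/* Assumption: enabling is refused (-1) until target mode has been set. */
int vport_enable(vport_t *v)
{
    if (!v->target_mode)
        return -1;
    v->enabled = true;
    return 0;
}
```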
Another new interface is the delete virtual port interface, which is used to delete a virtual port specified by vport_idx. A virtual port can be deleted in any state and does not need to be disabled first.
A return number interface is used to get the number of virtual ports that are currently present on the system.
A get_port_info interface is used to get various information (such as PLIB port index, port state, port id, WWPN, WWNN, etc.) for one or multiple virtual ports in one function call. The information is returned in the form of the dd_plib_scsitgt_port_info_t structure, which is the same structure used in dd_plib_scsitgt_get_port_info( ). At a minimum, the vinfo buffer must be large enough to hold the number of virtual ports requested in the num_requested_vports argument. The vinfo buffer will be filled with either the number of virtual ports requested or the number of virtual ports currently present (at the moment the interface is invoked), whichever is less. It is recommended that dd_plib_scsitgt_get_num_vports( ) be called first to get the number of virtual ports currently present on the system so that the vinfo buffer can be allocated for the number of virtual ports present. The interface will set the num_present_vports parameter to the number of virtual ports present at the time the interface is invoked. If the num_present_vports value returned is less than num_requested_vports, then fewer virtual ports are present than were requested and the vinfo buffer contains only as many virtual ports as stated in num_present_vports. If the num_present_vports value returned is greater than num_requested_vports, then more virtual ports are present than were requested and the vinfo buffer contains only as many virtual ports as stated in num_requested_vports. If the PLIB port index of a virtual port is known, dd_plib_scsitgt_get_port_info( ) can also be used to get port information of the virtual port specified by that PLIB port index.
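The "whichever is less" buffer semantics can be illustrated with the following C sketch. The function get_vports_info and the structure vport_info_t are hypothetical and much simpler than the actual dd_plib_scsitgt_port_info_t structure; the sketch assumes the caller already holds an array describing the virtual ports currently present.

```c
/* Hypothetical, reduced port info record. */
typedef struct {
    int port_idx;         /* index in the port table */
    int parent_port_idx;  /* physical port hosting this virtual port */
} vport_info_t;

/* Copy at most num_requested records into vinfo, report how many virtual
 * ports are present through num_present_vports, and return the number of
 * records actually written: min(num_requested, num_present). */
int get_vports_info(const vport_info_t *present, int num_present,
                    vport_info_t *vinfo, int num_requested,
                    int *num_present_vports)
{
    int n = (num_requested < num_present) ? num_requested : num_present;
    if (n < 0)
        n = 0;
    for (int i = 0; i < n; i++)
        vinfo[i] = present[i];
    *num_present_vports = num_present;
    return n;
}
```

A caller compares the reported count with the requested count to decide whether the buffer holds every virtual port or only the first num_requested of them.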
The following interface is used to get the maximum number of virtual ports that can be created on the system. This is limited by the number of available virtual WWNs (world wide names) on the system and the maximum number of virtual ports supported on each physical port.
The following interface is used to get the maximum number of virtual ports that can be created on the physical port specified by pport_idx. This interface only takes the PLIB port index of a physical port. If the PLIB port index of a virtual port is passed, an error will be returned.
The return number of virtual ports interface listed immediately below is used to get the number of virtual ports that are currently present on the physical port specified by pport_idx. This interface only takes the PLIB port index of a physical port. If the PLIB port index of a virtual port is passed, an error will be returned.
The get NPIV capability interface is used to get the NPIV capability of the physical HBA port specified by pport_idx and of the switch to which the HBA port is connected. If NPIV is supported, then DD_PLIB_FC_NPIV_SUPPORTED will be returned in the corresponding capability. Otherwise DD_PLIB_FC_NPIV_NOT_SUPPORTED will be returned. If the link of the HBA port is not online, then the switch's NPIV capability cannot be determined, and DD_PLIB_FC_NPIV_UNKNOWN will be returned in the switch capability. Virtual ports should be created only when NPIV is supported on both the HBA port and the switch.
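The capability check can be sketched in C as shown below. The enumeration npiv_cap_t and the functions query_npiv_caps and can_create_vport are hypothetical stand-ins that mirror the three outcomes described above; they are not the dd_plib_fc_npiv_cap_t definition, and the boolean inputs simply simulate what a driver query might report.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three capability outcomes. */
typedef enum {
    NPIV_CAP_UNKNOWN,
    NPIV_CAP_NOT_SUPPORTED,
    NPIV_CAP_SUPPORTED
} npiv_cap_t;

typedef struct {
    npiv_cap_t hba;            /* capability of the physical HBA port */
    npiv_cap_t fabric_switch;  /* capability of the attached switch */
} npiv_caps_t;

/* The switch capability cannot be determined while the link is offline. */
npiv_caps_t query_npiv_caps(bool hba_supports, bool link_online,
                            bool switch_supports)
{
    npiv_caps_t caps;
    caps.hba = hba_supports ? NPIV_CAP_SUPPORTED : NPIV_CAP_NOT_SUPPORTED;
    if (!link_online)
        caps.fabric_switch = NPIV_CAP_UNKNOWN;
    else
        caps.fabric_switch = switch_supports ? NPIV_CAP_SUPPORTED
                                             : NPIV_CAP_NOT_SUPPORTED;
    return caps;
}

/* Create virtual ports only when both sides report support. */
bool can_create_vport(npiv_caps_t caps)
{
    return caps.hba == NPIV_CAP_SUPPORTED &&
           caps.fabric_switch == NPIV_CAP_SUPPORTED;
}
```

In this model an offline link yields an unknown switch capability, so virtual port creation is deferred rather than attempted.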
The get port state interface listed immediately below is used to get the port state of the physical or virtual port specified by port_idx. This interface can take the PLIB port index of a physical port or a virtual port. It is introduced to optimize the periodic polling done by the SCSI target, which currently uses dd_plib_scsitgt_get_port_info( ) mostly to get port state.
With respect to data structure definitions, a new field parent_port_idx with data type dd_int32_t is added to the dd_plib_scsitgt_port_info_t structure. The new field is used to store the physical port index of a virtual port in the PLIB port table. If the port itself is a physical port, the field will be -1, which is defined as DD_PLIB_NO_PARENT.
typedef struct dd_plib_scsitgt_port_info {
} dd_plib_scsitgt_port_info_t;
The virtual_port_idx field in the above structure usually has a value of 0. With NPIV support, the field will be non-zero if the port is a virtual port, in which case it indicates the port index of the virtual port within its physical port. It is unique within the same physical port but not unique across multiple physical ports. If the port itself is a physical port, its virtual_port_idx field will be 0. The virtual_port_idx field should not be confused with the port index used in the PLIB port table. It is assigned by the kernel driver and therefore has nothing to do with the index within the PLIB port table. The use of the count field within dd_plib_table_t of dd_plib_scsitgt_ports is also modified, as follows.
typedef struct {
} dd_plib_table_t;
dd_plib_table_t dd_plib_scsitgt_ports={
};
The dd_plib_scsitgt_ports.count field is used for both physical and virtual port counts. The two least significant bytes are used for the physical port count. The two most significant bytes are used for the virtual port count. The dd_plib_scsitgt_ports.count field must not be accessed directly. Instead, the two macros DD_PLIB_SCSITGT_PORT_COUNT and DD_PLIB_SCSITGT_VPORT_COUNT must be used to access the physical and virtual port counts. A new data structure dd_plib_fc_npiv_cap_t is introduced to define the NPIV capability of the HBA port and the switch.
typedef enum {
} dd_plib_fc_npiv_cap_t;
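As an illustration of the packed count field described above, the following C sketch splits a single value into a physical count in the two least significant bytes and a virtual count in the two most significant bytes. It assumes a 32-bit count. The macro and function names (SKETCH_PORT_COUNT, SKETCH_VPORT_COUNT, sketch_pack_counts) are hypothetical and are not the definitions of DD_PLIB_SCSITGT_PORT_COUNT or DD_PLIB_SCSITGT_VPORT_COUNT.

```c
#include <stdint.h>

/* Assumption: the count field is 32 bits wide. */
#define SKETCH_PORT_COUNT(count)   ((uint32_t)(count) & 0xFFFFu)
#define SKETCH_VPORT_COUNT(count)  (((uint32_t)(count) >> 16) & 0xFFFFu)

/* Combine a physical count (low 16 bits) and a virtual count (high 16 bits). */
static inline uint32_t sketch_pack_counts(uint16_t physical, uint16_t virt)
{
    return ((uint32_t)virt << 16) | (uint32_t)physical;
}
```

For example, 4 physical ports and 10 virtual ports pack to 0x000A0004. Accessing the field only through such macros keeps callers independent of the packing.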
With respect to threading and locking, all PLIB interfaces described above are executed in the caller's context. All accesses to PLIB port table (dd_plib_scsitgt_ports) are protected with dd_plib_mutex through a dd_plib_mutex_lock( ) function call.
The following sample code shows how to get the maximum number of virtual ports supported on the system and per physical port.
{
}
The following sample code shows how to get NPIV capability of the HBA port and the switch before creating a virtual port.
{
}
The following sample code shows how to create, operate and delete a virtual port.
{
}
The following sample code shows how to get the number of virtual ports currently present on the system and to get port info of all the virtual ports.
{
}
}
The following sample code for getting port info of multiple virtual ports is not allowed because virtual port indexes in the PLIB port table are not guaranteed to be contiguous (a VPort delete request can leave an empty entry in the port table).
{
}
In general, the PLIB does not use any network protocol, and does not store any persistent information to non-volatile storage. In an embodiment, the changes to the PLIB to accommodate NPIV features in Fibre Channel are used to enhance SCSI target endpoint functionality in a DD OS (or similar) data storage system.
SCSI Target Endpoint Enhancements
Embodiments are directed to providing enhancements to the SCSI target subsystem to take advantage of NPIV functionality on data domain restorer (DDR) systems, including endpoint flexibility and failover. DD OS kernel drivers have added support for NPIV. This allows new functionality to be added to the overall SCSI target solution, including additional endpoint flexibility and the ability to perform endpoint failover between physical Fibre Channel ports. Certain use cases are identified for this functionality including: (1) simplifying multiple system consolidation, (2) endpoint failover to improve single-node availability, (3) port cost reduction, (4) system management isolation for multi-tenancy, and (5) facilitating migration to DDR from environments accustomed to more target ports.
With respect to system consolidation, multiple data domain restorer systems (DDRs) are consolidated into a single larger DDR, for example to reduce power usage or data-center footprint, or to improve de-duplication. To reduce the need to rezone or reconfigure existing initiators, additional endpoints are configured on the consolidated system to provide equivalent access to the old discrete systems.
With respect to endpoint failover, endpoints are integrated with port failure/offline detection to failover endpoints to alternate ports in case of failure. This provides additional resilience for single-system DDRs.
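The endpoint failover idea can be sketched in C as follows. This is a simplified model under stated assumptions, not the scsitgtd implementation: endpoint_t and failover_endpoints are hypothetical names, the fixed port count is arbitrary, and choosing the first online alternate port is an assumed policy. The point illustrated is that an endpoint keeps its WWPN while its hosting port changes.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 4   /* arbitrary number of physical ports for the sketch */

/* Hypothetical endpoint record: a fixed WWPN plus its current physical port. */
typedef struct {
    uint64_t wwpn;
    int      port;
} endpoint_t;

/* Move every endpoint hosted on failed_port to the first other online port.
 * The WWPN of each endpoint is left unchanged. Returns the number of
 * endpoints moved, or -1 if no alternate port is online. */
int failover_endpoints(endpoint_t *eps, int n,
                       const bool online[NUM_PORTS], int failed_port)
{
    int alt = -1;
    for (int p = 0; p < NUM_PORTS; p++) {
        if (p != failed_port && online[p]) {
            alt = p;
            break;
        }
    }
    if (alt < 0)
        return -1;

    int moved = 0;
    for (int i = 0; i < n; i++) {
        if (eps[i].port == failed_port) {
            eps[i].port = alt;   /* recreate on the alternate port */
            moved++;
        }
    }
    return moved;
}
```

Because the WWPN travels with the endpoint, initiators that address the endpoint by name need no rezoning after the move.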
With respect to port cost reduction, the use of multiple endpoints with fewer, higher capacity, physical interfaces reduces the cost of a DDR, or similar storage appliance. It also reduces the need for additional slots to be reserved for those interfaces. For example, instead of using four 2×8 Gb interfaces in a DDR, two 2×16 Gb interfaces can be used, with eight endpoints assigned to the interfaces. This provides the same aggregate bandwidth (4 × 2 × 8 Gb = 2 × 2 × 16 Gb = 64 Gb) and connectivity, yet reduces the system cost and slot usage.
With respect to system management isolation and multi-tenancy, multiple endpoints are used to isolate and compartmentalize access to a DDR (or other SAN) system from a management and access perspective. Consider an SMT (multi-tenant) system where it is desirable to allow tenants to manage aspects of their DDR SCSI Target access, but not interfere with other tenants. In this case the landlord could provision one or more endpoints for a tenant, then give the tenant the ability to configure the groups, devices and LUNs for that endpoint (this assumes the SCSI Target service allows corresponding separation for management). Note that multiple endpoints are a building block in the complete SMT solution for a SAN. Additional functionality would include (but not be limited to): allowing delegation of access groups to tenants, allowing delegation of device creation/deletion to tenants, and allowing delegation of SCSI Target service-specific features to tenants, for example create/delete of VTL tapes and importing/export VTL tapes from a tenant-specific tape pool to a tenant-specific VTL.
Embodiments also facilitate migration to the DDR environment from environments used to more target ports. DDR systems traditionally have a relatively low Fibre Channel port count when compared to comparable systems. By using multiple endpoints it becomes easier for new customers to migrate from environments where higher port counts are common. For example a system previously configured with 12 Fibre Channel ports may be configured with a DDR using 12 endpoints and a smaller number of physical interfaces. Embodiments support multiple SCSI Target endpoints on SCSI Target ports. That is, there may be a M:1 relationship between endpoints and ports. Endpoints can be added, deleted or migrated between ports as needed, either under administrator control or under direction of the single-node endpoint failover functionality. For the Fibre Channel transport this is supported on all topologies where NPIV is supported. For non-NPIV environments, scsitgtd continues to operate as previously, with a 1:1 relationship between an endpoint and its associated system address. The scsitgtd is a multi-threaded SCSI target daemon process that interfaces with the SSM monitor subsystem. As a daemon process, scsitgtd runs as a background task and not under direct control of a user interactive process. It comprises a management process that orchestrates the main functionality of the virtual/physical port management process 108, and sends commands to the operating system kernel to create NPIV ports as well as relevant configuration information. The scsitgtd also waits for critical or defined events related to failover/migration, or other events and then sends messages related to these events through the SSM monitor, as shown in
Under an embodiment, scsitgtd is enhanced as follows: (1) to support NPIV and non-NPIV (backwards compatible) modes; (2) to remove the current implicit 1:1 relationship between endpoints and transport system address; (3) enhance the scsitgtd transport subsystem to allow virtual port support; (4) enhance the scsitgtd Fibre Channel transport subsystem in use the new NPIV functionality supported by the kernel drivers (5) to detect and coordinate endpoint failover and failback when port failure occurs; (6) enhance the dd_scsitgtc API to allow utilization of the enhanced functionality; (7) enhance the SMS functionality and API to allow utilization of the enhanced functionality; and (8) enhance the DDR CLI functionality to allow utilization of the enhanced functionality.
The endpoint mapping and scsitgtd SCSI target daemon process also facilitate the dynamic configuration of virtual ports within a storage system network to allow for non-disruptive migration of data or system reconfiguration. The
SCSI Target Port Definition
In general, a system address is a system-specific name used to identify a specific SCSI target transport interface. For the Fibre Channel transport, the system address is the name of the HBA port used, e.g., 5a. The transport port is a base SCSI target component used to interact with transports. Each interface is identified by a system address. In general, each SCSI Target endpoint has a system address that identifies the transport layer entity used; for example, with the Fibre Channel transport the system address refers to the Fibre Channel physical HBA/port, e.g., “5a”, and for the iSCSI transport the system address refers to the iSCSI portal. This simple model is appropriate when there is a 1:1 relationship between endpoints and the underlying transport entity. Embodiments of the enhanced SCSI target endpoint system relax the 1:1 relationship and allow more operations and attributes to be associated with the underlying transport entity, which currently does not have a clear definition. Expanding the term “system address” to refer to the entire underlying transport entity was considered; however, this leads to some awkward usage. For example, setting the topology for a system address is unnatural: it is not the system address that is having its topology set, it is the underlying transport entity. To clarify the description, the term SCSI Target “transport port,” or more simply just “port,” is defined as the transport entity that endpoints associate with. Each port has a unique name, its system address, and the system address continues to be used as currently. For example, a port may have system address 5a. This interface has attributes, such as topology or link speed, depending upon the transport in use. Additionally, endpoints can be assigned to the interface using the system address 5a.
In an embodiment, the SCSI target Fibre Channel transport can be configured in either NPIV or non-NPIV mode. Non-NPIV mode may be equivalent to many systems' current functionality. It is intended for use in environments where NPIV is either not available or causes issues with the customer SAN. When NPIV is disabled, only a single endpoint is allowed per transport system address, and the Fibre Channel base port is used to configure that endpoint to the SAN. Endpoint failover is disabled. Preferred embodiments may operate in NPIV mode, which allows multiple endpoints per interface, each using an NPIV port. In this case the Fibre Channel base port is used as a place-holder definition for the port and is not associated with an endpoint. A single global setting to enable NPIV support provides the simplest configuration for the customer. In addition, to meet the requirement for concurrent mixed-mode NPIV and non-NPIV operation, each port maintains its own value for NPIV enabled/disabled. This follows the global NPIV value by default, but may be disabled for specific interfaces by the administrator if necessary. For example, if a customer is using NPIV for most interfaces but wishes to use 5a with a legacy switch that does not support NPIV, then the appropriate CLI configuration would be:
// Enable NPIV globally
ddsh# scsitarget transport option set npiv enabled
// Override the global value and disable NPIV for interface 5a:
ddsh# scsitarget interface modify 5a npiv disabled
The npiv enabled option controls whether NPIV functionality can be used by the DDR, for example creating NPIV VPorts in a Fibre Channel SAN. Note: the low-level Fibre Channel subsystem always negotiates the underlying NPIV level in its standard Fibre Channel protocol negotiation. This behavior is unchanged from previous DDOS releases and is not controlled by this option. A system administrator may be responsible for setting the appropriate value for NPIV, or automatic runtime configuration of NPIV-compatible state may be provided.
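The interaction between the global NPIV setting and the per-port override can be illustrated with a minimal C sketch. The type and function names (npiv_override_t, port_cfg_t, port_npiv_enabled, port_can_add_endpoint) are hypothetical and are not part of scsitgtd; the sketch only assumes the behavior stated above, namely that a port follows the global value unless NPIV is explicitly disabled for it, and that a non-NPIV port carries at most one endpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port override: follow the global value unless the
 * administrator has explicitly disabled NPIV for this port. */
typedef enum { NPIV_FOLLOW_GLOBAL, NPIV_FORCE_DISABLED } npiv_override_t;

typedef struct {
    const char     *system_address;  /* e.g. "5a" */
    npiv_override_t npiv_override;
} port_cfg_t;

/* Effective NPIV state for one port. */
bool port_npiv_enabled(bool global_npiv, const port_cfg_t *port)
{
    assert(port != NULL);
    if (port->npiv_override == NPIV_FORCE_DISABLED)
        return false;
    return global_npiv;
}

/* With NPIV disabled, only a single endpoint is allowed per port. */
bool port_can_add_endpoint(bool global_npiv, const port_cfg_t *port,
                           unsigned current_endpoints)
{
    return port_npiv_enabled(global_npiv, port) || current_endpoints == 0;
}
```

In this sketch, the two CLI commands above would correspond to a global value of true and an override of NPIV_FORCE_DISABLED on port 5a.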
The properties of a Fibre Channel interface base port change depending on whether or not NPIV mode is enabled, as outlined above. When NPIV mode is disabled, the interface base port is configured as it is today, using the properties provided by the (single) endpoint along with any global properties for the port. For example, the WWPN for the base port is set to that of the endpoint, and the topology is set from the global interface information. When NPIV mode is enabled, the interface base port is configured using a transport-specific set of properties derived from default values. For example, the desired default WWPN is associated with the “third WWPN default”, as described in the architecture specification. These values are maintained persistently to allow consistent switching between NPIV and non-NPIV mode, and may also be changed by system administrators, if necessary. For non-HA systems the base port can be set to not register an address with the switch. This is similar to EDL operation, and reduces potential confusion when a WWPN is visible through the SAN but cannot be used for I/O. Note that for HA systems the base port is envisioned as being used as a “ping” port between two nodes of a failover pair, so it must still be registered with the switch.
Embodiments allow the configuration and use of multiple endpoints per port. In present systems, scsitgtd discovers ports and automatically creates a single endpoint for each transport port. If a system address is removed, such as by removal of a SLIC, an endpoint can be modified to use a different system address, or deleted entirely. The current implicit 1:1 relationship between endpoint and interface leads to most endpoint and transport level management being performed through the endpoint abstraction. With the ability to define multiple endpoints per interface, and the ability to more dynamically assign endpoints to different system addresses, certain enhanced functionality is provided, namely: (1) the ability to add new endpoints, with a given system address and other properties, is defined; (2) properties of endpoints and the properties associated with transport interfaces are clarified and updated as necessary (for example, a Fibre Channel port topology is a property of the port, not an endpoint; as such, storage and management of the topology must move from the endpoint to the transport layer); and (3) statistics and other monitoring are more clearly defined between the endpoint and each port (i.e., enable and disable of endpoints and ports is clarified and made discrete).
Embodiments also allow for multiple port instance support in the scsitgtd transport layer. In present systems, the scsitgtd transport subsystem uses the concept of an abstract port. Each port is uniquely identified by a transport port id, or tpid (an integer value), along with attributes such as its system address, its online status, and transport-specific attributes. Examples of transport-specific attributes include the link speed and firmware version for the Fibre Channel transport. The transport subsystem also associates host initiators (if any) with each transport port. It should be noted that transport ports are referred to as “interfaces.” This reduces confusion with the term “port.” With support for NPIV, the Fibre Channel transport port abstraction is changed because some attributes are appropriately associated with the physical port (e.g., firmware version, physical presence, link speed, etc.) whereas other attributes are associated with virtual ports (e.g., host initiators, WWPN, WWNN, fcp2-retry state). Note that if NPIV is disabled or not otherwise available then the default behavior is backwards-compatible, i.e., a single port is used. To support multiple instances of a physical port, the concept of multiple port instances is used. New instances of a port may be created, up to a system-defined limit. Each instance has a unique tpid, but has the same system address. In this case, each transport port always has an implicit base port instance. When an endpoint is associated with a system address that allows multiple instances, the transport layer is called to request a new port instance, which returns a new tpid. This is then persistently associated with the instance until the endpoint is otherwise updated or deleted. The transport layer persistently records each port instance in the registry; this is an extension of the existing transport registry information, which describes each physical port.
Transport port APIs are modified to allow the association between a base port and its instances to be determined, as well as to perform operations such as getting statistics on a base port or port instance.
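The port instance model described above (an implicit base instance, additional instances sharing one system address, a unique tpid per instance, and a system-defined limit) can be sketched in C as follows. All names here, including transport_port_t, port_init, port_instance_add and the limit of 8, are assumptions made for illustration; persistence to the registry is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_PORT_INSTANCES 8   /* assumed system-defined limit */

typedef struct {
    char     system_address[8];   /* shared by base port and instances */
    uint32_t base_tpid;           /* implicit base port instance */
    uint32_t instance_tpids[MAX_PORT_INSTANCES];
    uint32_t current_instances;
    bool     multi_instance;      /* true when NPIV is usable on this port */
} transport_port_t;

/* Initialise a port; the base instance is created implicitly.
 * *next_tpid is a caller-owned counter that starts at 1 (0 is invalid). */
void port_init(transport_port_t *p, const char *addr, bool multi_instance,
               uint32_t *next_tpid)
{
    assert(p != NULL && addr != NULL && next_tpid != NULL);
    memset(p, 0, sizeof(*p));
    strncpy(p->system_address, addr, sizeof(p->system_address) - 1);
    p->base_tpid = (*next_tpid)++;
    p->multi_instance = multi_instance;
}

/* Request a port instance for an endpoint. Returns the tpid to associate
 * with the endpoint, or 0 if the instance limit has been reached. */
uint32_t port_instance_add(transport_port_t *p, uint32_t *next_tpid)
{
    assert(p != NULL && next_tpid != NULL);
    if (!p->multi_instance)
        return p->base_tpid;      /* backwards-compatible single port */
    if (p->current_instances >= MAX_PORT_INSTANCES)
        return 0;
    uint32_t tpid = (*next_tpid)++;
    p->instance_tpids[p->current_instances++] = tpid;
    return tpid;
}
```

Passing the tpid counter in explicitly keeps the sketch free of global state; a real transport layer would presumably own that counter itself.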
SCSI Target Endpoint Failover/Failback
Embodiments include mechanisms for managing endpoint failover/failback. Endpoints can be configured to perform failover, and optionally failback, when events associated with the underlying port occur. For example, if the port for an endpoint goes offline for an extended period the endpoint may be automatically failed over to a different, functioning, port by the system. This functionality is supported for Fibre Channel ports using NPIV through a storage subsystem manager (SSM) component, and other components or functional structures and elements. In an embodiment, the SSM monitors the target Fibre Channel port (e.g., HBA) for offline/online status, failure conditions, and/or marginal operation. When thresholds for offline/online, marginal or failure conditions are exceeded, it automatically sends alerts, and when it determines and identifies a port to be failed, it disables that port. Alerts are automatically cleared once the fault condition is cleared. A user-level interface may be provided via the OS or an alert subsystem to pass alerts and messages to the user.
The primary function of SSM 502 is to monitor the target virtual or physical port for offline, failure, or marginal conditions. Alerts are sent for failed and marginal ports through the EMS process 512. When a failed port is identified, that port is disabled by the FC target driver 516 and a notification is sent to scsitgtd 514. SSM 502 will also detect and send an alert when the Fibre Channel target HBA dumps its core. A firmware dump is considered a marginal condition, and the port operational state is set to marginal when this happens. Upon reboot, reloading of the FC target drivers, or when a failed port is enabled by a user space program, the port will resume the state prior to the failure detection. If the failure still exists, then the port operational status will change to failed and that port will be disabled. The SSM 502 will then reconcile the failure with existing alerts, only sending an alert if the failure is a new failure. If the failure is resolved, then the alert will be cleared. SSM 502 will also assume the port monitoring functions for port offline/online oscillations and conditions where an enabled port going offline triggers an alert. In an embodiment, the main functionalities managed or monitored by SSM 502 are: (1) hardware operational status (functional, marginal, failed, missing); (2) firmware dump status; (3) port oscillations and offline/online events; (4) detailed information relating to a port failure or marginal condition; and (5) alert settings and thresholds, though others are also possible.
As shown in
In an embodiment, endpoint failover using the SSM monitor may be automatically enabled on ports that support it (e.g., for Fibre Channel ports with NPIV correctly enabled). Additionally, only those endpoints with a secondary system address are candidates for failover. Each endpoint has a primary (home) system address and zero or more secondary (alternate) system addresses. Each endpoint may have a current (active) system address. The active system address may be the primary system address, a secondary system address, or none if an endpoint is not currently mapped to a valid system address. On failure of a port, any endpoints that use the port as their current system address are candidates to fail over to an alternate system address. Endpoints may be failed back to use their home system address when the underlying issue is resolved. The active, primary and secondary system addresses for each endpoint can be changed under administrative control. From a system perspective, scsitgtd receives notifications from the FC-SSM port monitor when Fibre Channel port related events occur, for example a port becoming online or offline, or changing its operational state. Events are sent to scsitgtd immediately, without the delay that is introduced for alerts raised by the Fibre Channel SSM. The SSM monitors the state of Fibre Channel ports and provides notifications to scsitgtd of changes in state of ports.
For failover detection, scsitgtd performs failover processing based on event notifications from FC-SSM. Table 1 lists certain events that trigger failover, and whether such a failover is delayed or immediate.
TABLE 1
EVENT                              DELAY OR IMMEDIATE FAILOVER
Offline Port                       Delay
Failed Port                        Immediate
Administrative Endpoint Failure    Immediate
Failover on Port Disable           Immediate
Failover Requested
When a failover event is received from FC-SSM, scsitgtd looks for endpoints currently associated with the port and queues endpoint failover events for subsequent processing. For a manual failover, the administrator causes an immediate failover event to be queued for specified endpoints. Failover events may be immediate or delayed, as indicated in Table 1. A delayed failover waits for a given timeout before performing the failover. The delay allows for a transient outage to be resolved without triggering failover. The timeout is an administrator-configured option. For the case of delayed failover, it is possible for the port state to change a second time before failover has occurred; for example, the port becomes online again. When scsitgtd receives such a notification from FC-SSM, it will find and cancel any pending endpoint failovers for that port. For a manual failover, the administrator may wait for completion of the operation.
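The queueing behavior described above, including cancellation of a delayed failover when the port comes back online, can be sketched in C. This is an illustrative model only: the fixed-size queue, the use of seconds for time, and the names failover_event_t, failover_queue_add, failover_queue_cancel_port and failover_queue_due are assumptions, not the scsitgtd implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_EVENTS 16

typedef struct {
    int  endpoint_id;
    char port[8];    /* system address of the affected port */
    long due_time;   /* seconds; equals the queue time when immediate */
    bool pending;
} failover_event_t;

typedef struct {
    failover_event_t ev[MAX_EVENTS];
} failover_queue_t;

/* Queue a failover for one endpoint. delay_secs is 0 for immediate events
 * and the administrator-configured timeout for delayed events.
 * Returns false if the queue is full. */
bool failover_queue_add(failover_queue_t *q, int endpoint_id,
                        const char *port, long now, long delay_secs)
{
    assert(q != NULL && port != NULL && delay_secs >= 0);
    for (size_t i = 0; i < MAX_EVENTS; i++) {
        if (!q->ev[i].pending) {
            q->ev[i].endpoint_id = endpoint_id;
            strncpy(q->ev[i].port, port, sizeof(q->ev[i].port) - 1);
            q->ev[i].port[sizeof(q->ev[i].port) - 1] = '\0';
            q->ev[i].due_time = now + delay_secs;
            q->ev[i].pending = true;
            return true;
        }
    }
    return false;
}

/* The port came back online: cancel its pending failovers.
 * Returns the number of events cancelled. */
int failover_queue_cancel_port(failover_queue_t *q, const char *port)
{
    assert(q != NULL && port != NULL);
    int cancelled = 0;
    for (size_t i = 0; i < MAX_EVENTS; i++) {
        if (q->ev[i].pending && strcmp(q->ev[i].port, port) == 0) {
            q->ev[i].pending = false;
            cancelled++;
        }
    }
    return cancelled;
}

/* Count the pending events whose timeout has expired. */
int failover_queue_due(const failover_queue_t *q, long now)
{
    assert(q != NULL);
    int due = 0;
    for (size_t i = 0; i < MAX_EVENTS; i++)
        if (q->ev[i].pending && q->ev[i].due_time <= now)
            due++;
    return due;
}
```

An offline-port event would be queued with the configured delay, while a failed-port or administrative event would be queued with a delay of zero, matching the delay/immediate split in Table 1.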
Endpoint failover is executed by an independent agent in scsitgtd. This allows it to gather the appropriate resources to change the configuration. Note that performing failover may take significant time, so it is not appropriate to perform it in the context of the FC-SSM notification; the system is therefore configured such that notifications are relatively lightweight. The execution agent runs both periodically and on demand if an immediate failover event occurs. The agent is responsible for handling queued endpoint failover events and executing them. During failover execution, each endpoint that meets the criteria for failover is migrated to an alternate system address. The following general algorithm is used:
If endpoint should failover at this time then
    . . .
end
if error
    . . .
else
    . . .
End
Advise waiters of execution completion.
If failover is not possible the endpoint is left alone. This fits the general architectural goal where the system fails over when possible, but acknowledges that at a given point in time failover may not be possible. The new system address may include any of the system addresses associated with the endpoint that are enabled and online. This is discussed in more detail in the description below.
Part of failover processing determines the failover destination. In an embodiment, the system performs the following to determine the failover destination.
1. If the current address is the primary address, search each address in the secondary address list for an online, normal port. For the first one found, use that for the new current address. Done.
2. If the current address is a secondary address and there is more than one secondary address, then search the secondary address list for an online, normal port that is not the current address. For the first one found, use that for the new current address. Done.
Note that currently there is no failover from a secondary address back to the primary address. This may be configured by enabling automatic failback. Failing over to a marginal port may be possible in certain circumstances, such as if operation on a marginal port is preferable to no service. In other cases, it may be decided that no failover is preferable. Optimization is also possible when multiple secondary ports are available; for example, load balancing could be performed during failover by examining the number of endpoints on each port or the current amount of activity on each port.
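The two destination-selection steps can be expressed as a single search over the secondary address list. The following C sketch is hypothetical: port_state_t, endpoint_t and failover_destination are illustrative names, port state is supplied as a simple table, and marginal ports are treated as ineligible, which is only one of the policies discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SECONDARY 4

typedef struct {
    const char *system_address;
    bool        online;
    bool        normal;   /* false when the port is marginal or failed */
} port_state_t;

typedef struct {
    const char *primary;
    const char *secondary[MAX_SECONDARY];
    size_t      n_secondary;
    const char *current;  /* active system address */
} endpoint_t;

/* True if addr names a port that is both online and operating normally. */
static bool port_usable(const port_state_t *ports, size_t n, const char *addr)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(ports[i].system_address, addr) == 0)
            return ports[i].online && ports[i].normal;
    return false;
}

/* Returns the new current address, or NULL if failover is not possible,
 * in which case the endpoint is left alone. */
const char *failover_destination(const endpoint_t *ep,
                                 const port_state_t *ports, size_t n_ports)
{
    assert(ep != NULL && ports != NULL && ep->current != NULL);
    for (size_t i = 0; i < ep->n_secondary; i++) {
        const char *cand = ep->secondary[i];
        if (strcmp(cand, ep->current) == 0)
            continue;     /* step 2: skip the current address */
        if (port_usable(ports, n_ports, cand))
            return cand;  /* first online, normal port wins */
    }
    return NULL;          /* the primary address is never a candidate */
}
```

When the current address is the primary address, no secondary entry matches it, so the loop reduces to step 1; when the current address is a secondary address, the skip implements step 2.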
While failover moves an endpoint to a different address, failback is the operation of returning a failed-over endpoint to that endpoint's home system address. Depending upon administrative configuration, this can happen automatically when a port has become online and is operating normally, or under manual control by an administrator. For automatic failback, the system delays for an administrator-defined interval before performing failback. This provides additional assurance that the restored port is operating correctly. Administrative (manual) failback is triggered by changing the in-use system address for an endpoint, or by successfully enabling a port and requesting failback of endpoints that have their home on the port and are currently failed over to another system address. In cases of administrator-requested failback, no failback delay is applied. When a failback trigger occurs, an endpoint failback event is queued for subsequent operation. For administrative failback, the administrator may wait for completion of the operation.
Table 2 lists the three main ways for performing failback, and the operation characteristics of each.
TABLE 2
EVENT                                        DELAY OR IMMEDIATE FAILOVER
Administrative failback of selected          Finest control of failback;
endpoints via scsitarget endpoint use        administrator controls location
                                             of each endpoint as primary or
                                             secondary port

Administrator enables a now functioning      Administrator controls when
port and requests that any failed-over       operation occurs; single step
endpoints from that port failback to that    operation to recover system
port when it becomes available               to normal operation

Automatic failback when system detects       Hands-free operation of recovery
a port is operating normally after           from earlier failover.
previous failover event                      Has lowest level of control
Endpoint failback is executed by an independent agent in scsitgtd. This allows it to gather the appropriate resources to change the configuration. Again, as noted above, because failback may take significant time, notifications should be relatively lightweight.
The execution agent runs both periodically, and also on demand if an immediate failback event occurs. The agent is responsible for handling queued endpoint failback events and executing them. During failback execution each endpoint that meets the criteria for failback is migrated to its home system address. The following general algorithm is used for failback:
If endpoint should failback at this time then
    . . .
if error
    . . .
. . . completion.
Further description and illustration of the failover method and system are provided below with reference to
Updating Group Device Port Bitmasks
Devices are visible to specific host initiators, on specific Fibre Channel ports. In an embodiment, the mapping is managed in the kernel by SCST access groups. Each device in an access group has a port bitmask associated with it, providing the definition for which ports that device is visible on. The port bitmask includes NPIV virtual ports. Thus, when executing failover/failback or migrating an endpoint from one port to another in an NPIV environment it is necessary to update the port bitmask information in SCST. If there are many devices this may take a significant amount of time, which could adversely affect the overall failover/failback time. To address this a new kernel SCST port is added that allows batch updating of the port bitmasks for devices in groups.
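A batch update of this kind can be illustrated with a short C sketch. The bitmask layout is an assumption made for illustration (bit N set means the device is visible on the port with unique id N, with at most 64 ports), and group_device_t and portmask_batch_remap are hypothetical names rather than SCST interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device entry in an access group: bit N set means the device
 * is visible on the port (physical or NPIV virtual) with unique id N. */
typedef struct {
    uint64_t port_mask;
} group_device_t;

/* Move visibility from old_id to new_id for every device in one pass.
 * Returns the number of devices that were visible on old_id. */
size_t portmask_batch_remap(group_device_t *devs, size_t n,
                            unsigned old_id, unsigned new_id)
{
    assert(devs != NULL || n == 0);
    assert(old_id < 64 && new_id < 64);
    const uint64_t old_bit = UINT64_C(1) << old_id;
    const uint64_t new_bit = UINT64_C(1) << new_id;
    size_t remapped = 0;
    for (size_t i = 0; i < n; i++) {
        if (devs[i].port_mask & old_bit) {
            devs[i].port_mask = (devs[i].port_mask & ~old_bit) | new_bit;
            remapped++;
        }
    }
    return remapped;
}
```

Handling all devices in one call, rather than one request per device, is the point of the batch interface: the cost of crossing into the kernel is paid once per failover instead of once per device.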
A detailed API and description for Endpoint Failover/Failback/Migration is provided as Appendix 1 attached hereto.
API, Data Structure, and Registry Changes
In an embodiment, certain APIs may also be changed or added. One such API is the dd_scsitgtc_interface_show( ) API, which is added to show detailed interface information, similar to the existing dd_scsitgt_endpoint_show( ) API for endpoints. The dd_scsitgtc_interface_show_free( ) is used to free the allocated results of a show operation.
The scsitgtd transport API provides a port to the transport subsystem within scsitgtd. The existing scsitgtd_transport_list_ports( ) API lists ports by name (i.e., system address). This API is modified to return the names of base ports. The scsitgtd_transport_list_port_ids API lists all transports matching given criteria.
The existing scsitgtd_transport_port_get_initiator_info API returns initiators visible on a given port by name (system address). This is no longer appropriate, and it is replaced by the API scsitgtd_transport_port_get_initiator_info_by_id, which returns initiators visible on a given transport port instance. The existing scsitgtd_transport_port_name_resolve API looks up a transport port by one or more names (system address). It is modified to return the base port if a system address is given, or a port instance if a more specific name is given. The existing scsitgtd_transport_port_get_info API gets information for a port by system address. This is modified to return the information for the base port matching the system address, if any.
The existing scsitgtd_transport_option_set API sets transport options; it is modified to allow the new option npiv, to enable or disable NPIV support. The existing scsitgtd_transport_option_reset API resets transport options, and is modified to allow the new option npiv to be reset. The existing scsitgtd_transport_option_show API shows transport options, and is modified to show the new option npiv, indicating whether NPIV support is enabled or not. The existing scsitgtd_transport_port_set_options API sets individual port options, and is modified so that port options that only apply to base ports, e.g., port topology, can only be applied to base ports.
The scsitgtd_transport_port_instance_add API requests the transport subsystem to create a new port instance and associate a port id with it. The scsitgtd_transport_port_instance_delete API requests the transport subsystem to delete an existing port instance. The existing scsitgtd_transport_show_stats API shows detailed statistics for a given list of endpoints, organized by endpoint. This is modified to return detailed statistics for a list of transport system addresses, with filtering by system address.
Under an embodiment, certain defined data structures are also modified, including: dd_scsitgtc data structures. The dd_scsitgtd_transport_stats_filter_t data structure is used to restrict transport statistics to selected transports and/or system addresses. It is similar to the existing endpoint-oriented dd_scsitgtd_stats_filter. The existing dd_scsitgtd_transport_stats_t data structure is used to return detailed transport port information. It is currently organized by endpoint, which is inappropriate. The dd_scsitgtc_interface_info_t data structure describes a single interface in detail. The existing dd_scsitgtd_endpoint_info_t data structure describes an endpoint, and is updated to reflect the separation between endpoint and port.
Certain scsitgtd RPC data structures are also updated. The scsitgtd_transport_stats_filter_t data structure is added to filter scsitgtd statistics requests. It is used for RPCs that return transport oriented detailed statistics. The existing scsitgtd_transport_stats_t data structure is used to return detailed transport port statistics. Currently this returns a scsitgtd_endpoint_stats_t array, which is inappropriate. The structure is changed as follows. The existing scsitgtd_transport_port_info_t structure describes a given port. This is modified to allow for multiple port instances:
typedef struct {
    . . .
    dd_bool_t base_port;           // TRUE if this is a base port instance
    dd_uint32_t max_instances;     // Maximum number of instances supported
    dd_uint32_t current_instances; // Current number of instances
    scsitgtd_id_t instances[SCSITGTD_MAX_PORT_INSTANCES]; // Current instance ids
} scsitgtd_transport_port_info_t;
In an embodiment, certain registry structures are also modified. For example, the existing scsitgtd.transport registry namespace, which contains information about SCSI Target transports and associated configuration (e.g., transport options, ports, etc.), is modified. Likewise, the existing scsitgtd.endpoint registry namespace, which contains information about SCSI Target endpoints, is modified, and a new scsitgtd.option registry namespace that contains global scsitgtd options is added.
Fibre Channel Port Failover
Embodiments include a port failover feature on the DDR system that allows for automatic failover in the event of port failure, using the NPIV techniques described herein. An NPIV (virtual) port is created on a physical port and assigned a fixed WWPN. The virtual port is an endpoint through which DDR LUN (logical unit number) devices can be accessed. If the physical port fails, then the virtual port will be removed from the failed physical port and recreated on the designated physical port (failover port) with the same WWPN. DDR LUN devices will remain available through the same endpoint, which is now a virtual port on the failover port. A physical link failure is monitored by a user space process, and failover can be triggered automatically when failure is detected.
The port failover feature can be used as a basis for multi-node failover strategies in high availability systems. It can be used to increase LUN availability on DDR systems. Access to DDR LUNs will remain available against link failure that could be a result of HBA failure, port failure or other hardware failures such as failed SFP, bad cables/connections, etc.
As defined above, an endpoint is a named generalization of a transport, a specific name in SCSI Target, and is used to expose SCSI Target devices based on SCSI Target access groups; NPIV is a Fibre Channel technology that allows multiple N_Port IDs to share a single physical N_Port, where each N_Port has a unique identity (WWPN) on the SAN; a base port is a port that always exists within a physical port, and has a one-to-one mapping with the physical port. A base port is assigned a unique WWPN, which is used by the HBA firmware to perform fabric login. When the port failover feature is disabled, the base port serves as an endpoint. When it is enabled, a base port does not serve as an endpoint and is only used to monitor physical link state. A virtual port is an NPIV port created on a physical port, and one physical port can have multiple virtual ports. When created, a virtual port is assigned a unique WWPN, which is used by the HBA firmware to perform FDISC login. When the port failover feature is enabled, a virtual port serves as an endpoint. When port failover is enabled, a port is considered failed when it experiences a link failure. A link failure is a state in which the link goes down and remains down for a specified (wait_time) period. A link failure can result from HBA failure, physical port failure or other hardware failures such as a failed SFP, bad cable, etc. A link that is down because the HBA port was explicitly disabled is not considered a failed port.
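The link-failure definition above reduces to a small predicate. The following C sketch is illustrative only; link_state_t and port_is_failed are hypothetical names, and time is assumed to be measured in seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool link_up;
    bool admin_disabled;  /* port explicitly disabled by an administrator */
    long down_since;      /* time the link went down, in seconds */
} link_state_t;

/* A port is failed when its link has stayed down for at least wait_time
 * seconds and the port was not explicitly disabled. */
bool port_is_failed(const link_state_t *ls, long now, long wait_time)
{
    assert(ls != NULL && wait_time >= 0);
    if (ls->link_up || ls->admin_disabled)
        return false;
    return (now - ls->down_since) >= wait_time;
}
```

Keeping the explicit-disable check separate from the timer means an administratively disabled port never ages into a failed state, however long it stays down.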
In the event of a failure condition caused by a failed port, a defined failover policy is executed. The SCSI Target shall initiate the failover process, which is triggered by a failed port. In an embodiment, the endpoint on the failed port shall fail over to its failover port if a failover port is defined. The failover port is the designated physical port(s) to which an endpoint will fail over if the original physical port (primary system address) fails. The failover port may be system or user defined as a failover system address. It is set on a per-endpoint basis. In an implementation, the default failover port is set to none for each endpoint, which means the endpoint will not fail over on failure. The failover port setting can take a list of system addresses. The order within the list reflects the sequence of physical ports to which the endpoint will fail over. When the primary system address fails, the endpoint will fail over to the first available system address by following the sequence in the list. If the failover system address later also fails, then the endpoint will fail over to the next available address defined in the list. Once the last system address in the list is reached, the search will circle back to the beginning of the list. If none of the system addresses in the failover list is available, then the endpoint will be declared offline. The endpoint shall fail over based on its set method, which can be automatic or manual, is system or user-configurable, and is set on a per-endpoint basis. By default the failover method is set to automatic, which means that if a failover port is defined, the endpoint will automatically fail over to the failover port on failure.
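The circular search through the failover list can be sketched in C as follows. The function name next_failover_index and the representation of availability as a boolean array indexed by list position are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns the index of the next available address in the failover list,
 * searching circularly from the entry after `current`, or -1 if none is
 * available (the endpoint is then declared offline). Pass current = -1
 * when failing over from the primary system address. */
int next_failover_index(const bool *available, size_t n, int current)
{
    assert(available != NULL || n == 0);
    assert(current >= -1 && current < (int)n);
    size_t start = (size_t)(current + 1);
    for (size_t step = 0; step < n; step++) {
        size_t idx = (start + step) % n;   /* wraps to the list start */
        if ((int)idx != current && available[idx])
            return (int)idx;
    }
    return -1;
}
```

For a four-entry list in which only entries 1 and 3 are available, a failure of the primary selects entry 1, a later failure of entry 1 selects entry 3, and a failure of entry 3 wraps around to entry 1 again if it has recovered.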
In a failback situation, an endpoint will always fail back to its original physical port or primary system address. The failback method can be automatic or manual, is system or user-configurable, and is set on a per-endpoint basis. By default the failback method is set to manual, which means the user will manually perform a failback operation on an endpoint to its original physical port once the issue on the original port is resolved. If the failback method is set to automatic, the endpoint will fail back once the link on the primary physical port comes up and remains up after the defined wait time period, which allows the link to achieve stability.
Several driver interfaces can be defined to implement failover and failback functionality according to embodiments. Illustrative driver and PLIB interfaces include:
Driver Interfaces
PLIB Interfaces
In a specific implementation the failover/failback functionality may be implemented as a feature that can be enabled/disabled by the user. In such a case, upon a new installation or system upgrade, the port failover feature may be disabled by default.
With respect to failover operations, upon port failover the SCSI target will receive an asynchronous offline notification from SSM if the physical link goes down. If the failover method is set to automatic, then the SCSI target will monitor for the link failure condition per endpoint. A link is failed when it remains down for a specified wait time period, which is set per endpoint. The wait time period may be set to any appropriate length of time, such as one minute, but other time periods are also possible. If a failover port is defined and the link failure condition is satisfied for an endpoint, then the SCSI target shall initiate endpoint failover. In an embodiment, a failover condition can also be triggered manually by a user with a failover command line interface command. To perform the failover operation, the SCSI target first deletes the virtual port of the endpoint from its primary physical port and then creates a new virtual port with the same WWPN on the failover port. The newly created virtual port on the failover port will serve the same endpoint and will be assigned a new unique id. The PortMask of LUN devices accessed through the affected endpoint shall be updated with a new value computed from the new unique id. After the PortMask is updated, the SCSI Target shall enable the new virtual port.
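The ordering of the failover steps (delete, recreate with the same WWPN, update the PortMask, then enable) can be shown with an in-memory C sketch. Nothing here talks to a driver: vport_t, vport_endpoint_t and endpoint_failover are hypothetical, the PortMask is assumed to be a 64-bit mask with one bit per unique id, and new_id stands in for whatever id the driver would assign to the recreated port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t wwpn;           /* fixed; preserved across failover */
    int      physical_port;  /* index of the hosting physical port */
    unsigned unique_id;      /* reassigned when the vport is recreated */
    bool     enabled;
} vport_t;

typedef struct {
    vport_t  vport;
    uint64_t lun_port_mask;  /* PortMask of this endpoint's LUN devices */
} vport_endpoint_t;

/* Fail the endpoint over to failover_port, keeping its WWPN. */
void endpoint_failover(vport_endpoint_t *ep, int failover_port,
                       unsigned new_id)
{
    assert(ep != NULL && new_id < 64 && ep->vport.unique_id < 64);
    const uint64_t wwpn   = ep->vport.wwpn;
    const unsigned old_id = ep->vport.unique_id;

    /* 1. Delete the virtual port from its current physical port. */
    ep->vport.enabled = false;

    /* 2. Create a new, not yet enabled, virtual port with the same WWPN
     *    on the failover port. */
    vport_t fresh = { .wwpn = wwpn, .physical_port = failover_port,
                      .unique_id = new_id, .enabled = false };
    ep->vport = fresh;

    /* 3. Recompute the PortMask from the new unique id. */
    ep->lun_port_mask &= ~(UINT64_C(1) << old_id);
    ep->lun_port_mask |= UINT64_C(1) << new_id;

    /* 4. Enable the new virtual port only after the PortMask is updated. */
    ep->vport.enabled = true;
}
```

Enabling the port last means initiators that log in to the recreated port find the LUN visibility already in place.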
With respect to failback operation, the system may be implemented such that manual failback is the default failback method. When set to manual, it requires the user to perform a failback command once the issue with the primary physical port has been resolved. If the failback method is set to automatic, once the SCSI target receives a primary physical port online notification from the SSM, it will wait for a wait time period. If the primary physical port remains online after this period, the SCSI target will initiate a failback. For this case, the wait time period may be set to five minutes or some similar time period. To perform a failback operation, the SCSI target will delete the virtual port on the failover port and then create a new virtual port with the same WWPN on the primary physical port. The unique id of the new virtual port on the primary physical port may be different from the unique id prior to the failover. The PortMask of LUN devices accessed through the affected endpoint will again need to be updated with the new unique id. After it is updated, the SCSI target will enable the new port.
In an embodiment, the port failover feature can be enabled and configured in any configuration or topology, but in general, the feature is only functional on physical ports that are connected to an NPIV-supported switch, and has no effect on physical ports that are not connected to an NPIV-supported switch.
The dynamic remapping method described herein, in conjunction with the SCSI target endpoint and NPIV definitions and SSM operation, allows SCSI target virtual endpoints to be dynamically created, deleted and managed during failure, migration, and endpoint disable/enable operations in a backup storage appliance without incurring disruption of other virtual endpoints on the same appliance. In this way, dynamic management of SCSI target virtual endpoints provides flexibility without disrupting other virtual endpoints during failure or exception events. This is accomplished at least in part through a mechanism that maintains the same WWPN address for a virtual port after it is re-mapped from a first port to a second port.
Although embodiments are described with respect to Fibre Channel systems, it should be noted that other transport protocols can also be adapted to use the virtualization methods described herein, including iSCSI and Fibre Channel over Ethernet (FCoE).
Embodiments may be applied to virtualizing SCSI transport endpoints to facilitate dynamic configuration of virtual ports and base ports in any scale of physical, virtual or hybrid physical/virtual network, such as a very large-scale wide area network (WAN), metropolitan area network (MAN), or cloud-based network system. Those skilled in the art will appreciate, however, that embodiments are not limited thereto, and may include smaller-scale networks, such as LANs (local area networks). Thus, aspects of the one or more embodiments described herein may be implemented on one or more computers executing software instructions, and the computers may be networked in a client-server arrangement or similar distributed computer network. The network may comprise any number of server and client computers and storage devices, along with virtual data centers (vCenters) including multiple virtual machines. The network provides connectivity to the various systems, components, and resources, and may be implemented using protocols such as Transmission Control Protocol (TCP) and/or Internet Protocol (IP), well known in the relevant arts. In a distributed network environment, the network may represent a cloud-based network environment in which applications, servers and data are maintained and provided through a centralized cloud-computing platform. It may also represent a multi-tenant network in which a server computer runs a single instance of a program serving multiple clients (tenants), where the program is designed to virtually partition its data so that each client works with its own customized virtual application, with each VM representing virtual clients that may be supported by one or more servers within each VM, or other type of centralized network server.
The data generated and stored within the network may be stored in any number of persistent storage locations and devices, such as local client storage, server storage, or network storage. In an embodiment the network may be implemented to provide support for various storage architectures such as storage area network (SAN), Network-attached Storage (NAS), or Direct-attached Storage (DAS) that make use of large-scale network accessible storage devices, such as large capacity tape or drive (optical or magnetic) arrays, or flash memory devices.
For the sake of clarity, the processes and methods herein have been illustrated with a specific flow, but it should be understood that other sequences may be possible and that some may be performed in parallel, without departing from the spirit of the invention. Additionally, steps may be subdivided or combined. As disclosed herein, software written in accordance with the present invention may be stored in some form of computer-readable medium, such as memory or CD-ROM, or transmitted over a network, and executed by a processor. More than one computer may be used, such as by using multiple computers in a parallel or load-sharing arrangement or distributing tasks across multiple computers such that, as a whole, they perform the functions of the components identified herein; i.e., they take the place of a single computer. Various functions described above may be performed by a single process or groups of processes, on a single computer or distributed over several computers. Processes may invoke other processes to handle certain tasks. A single storage device may be used, or several may be used to take the place of a single storage device.
Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words “herein,” “hereunder,” “above,” “below,” and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word “or” is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list.
All references cited herein are intended to be incorporated by reference. While one or more implementations have been described by way of example and in terms of the specific embodiments, it is to be understood that one or more implementations are not limited to the disclosed embodiments. To the contrary, it is intended to cover various modifications and similar arrangements as would be apparent to those skilled in the art. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
API and description for Endpoint Failover/Failback/Migration. The key APIs are scsitgtd_endpoint_fofb_submit( ), which submits a new request, and the scsitgtd_endpoint_fofb_cancel_* functions, which cancel outstanding requests.
Note that requests can be asynchronous or synchronous.
/**
 * Reason why FO/FB is being requested
 */
typedef enum {
    SCSITGTD_ENDPOINT_FOFB_R_UNKNOWN = 0,
    SCSITGTD_ENDPOINT_FOFB_R_USE,                  // scsitgtd endpoint use primary/secondary
    SCSITGTD_ENDPOINT_FOFB_R_AUTO_FAILOVER,        // automatically triggered failover
    SCSITGTD_ENDPOINT_FOFB_R_AUTO_FAILBACK,        // automatically triggered failback
    SCSITGTD_ENDPOINT_FOFB_R_PORT_ENABLE_FAILBACK, // failback on administrative port enable
    SCSITGTD_ENDPOINT_FOFB_R_PORT_DISABLE_FAILOVER // failover on administrative port disable
} scsitgtd_endpoint_fofb_reason_e;
/**
 * Flags to control how an endpoint failover/failback operation is
 * executed.
 * Rules:
 *  - Exactly one of ASYNC or WAIT must be given
 *  - Exactly one of PRIMARY or SECONDARY must be given
 *  - If NOP_OK is set then a NOOP is allowed (i.e. primary to primary)
 */
typedef enum {
    SCSITGTD_ENDPOINT_FOFB_ASYNC        = 0x001, // Don't wait, excl WAIT
    SCSITGTD_ENDPOINT_FOFB_WAIT         = 0x002, // Block, wait for timeout
    SCSITGTD_ENDPOINT_FOFB_NOP_OK       = 0x004, // If set a NOOP is allowed
    SCSITGTD_ENDPOINT_FOFB_TO_PRIMARY   = 0x008, // If set destination is primary
    SCSITGTD_ENDPOINT_FOFB_TO_SECONDARY = 0x010, // If set destination is secondary
    SCSITGTD_ENDPOINT_FOFB_NO_DELAY     = 0x020  // Don't delay operation
} scsitgtd_endpoint_fofb_e;
/**
 * Endpoint failover/failback information, external interface
 */
typedef struct {
    scsitgtd_endpoint_fofb_e flags;         // Flags
    dd_uint32_t wait_secs;                  // Time to wait for the operation, or 0 to block
    scsitgtd_cb_fcn_t complete_cb;          // Optional callback called on completion, or NULL
    void *complete_cb_cookie;               // Optional cookie to pass to complete_cb
    scsitgtd_endpoint_fofb_reason_e reason; // Reason the operation was requested
} scsitgtd_endpoint_fofb_t;
/*
* Initialization/shutdown.
*/
extern dd_err_t * scsitgtd_endpoint_fofb_init(void);
extern dd_err_t * scsitgtd_endpoint_fofb_startup(void);
extern void scsitgtd_endpoint_fofb_shutdown(void);
/**
* Request failover/failback/use operation.
*/
extern dd_err_t * scsitgtd_endpoint_fofb_submit(scsitgtd_endpoint_t
*endpoint,
/**
 * Flag pending requests for cancel. This is only complete when the
 * worker has completed its next run.
 */
extern dd_err_t * scsitgtd_endpoint_fofb_cancel_all(void);
extern dd_err_t * scsitgtd_endpoint_fofb_cancel_port_id(scsitgtd_id_t port_id, dd_bool_t auto_only);
extern dd_err_t * scsitgtd_endpoint_fofb_cancel_endpoint(scsitgtd_endpoint_t *endpoint);
extern dd_err_t * scsitgtd_endpoint_failover_port_disable(scsitgtd_id_t port_id);
extern dd_err_t * scsitgtd_endpoint_failback_port_enable(scsitgtd_id_t port_id);
extern void scsitgtd_endpoint_fofb_wake_worker(void);
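The flag rules stated in the listing above (exactly one of ASYNC or WAIT, exactly one destination, and NOP_OK gating a no-op move) lend themselves to a small validation routine. The sketch below repeats the flag values from the listing so that it compiles on its own; the function fofb_flags_valid and its currently_on_primary parameter are hypothetical, and rejecting a no-op when NOP_OK is absent is an assumption inferred from the rule's wording.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values repeated from the listing so this sketch is self-contained. */
typedef enum {
    SCSITGTD_ENDPOINT_FOFB_ASYNC        = 0x001,
    SCSITGTD_ENDPOINT_FOFB_WAIT         = 0x002,
    SCSITGTD_ENDPOINT_FOFB_NOP_OK       = 0x004,
    SCSITGTD_ENDPOINT_FOFB_TO_PRIMARY   = 0x008,
    SCSITGTD_ENDPOINT_FOFB_TO_SECONDARY = 0x010,
    SCSITGTD_ENDPOINT_FOFB_NO_DELAY     = 0x020
} scsitgtd_endpoint_fofb_e;

/* True when exactly one of the two given bits is set in flags. */
static bool exactly_one(unsigned flags, unsigned a, unsigned b)
{
    return ((flags & a) != 0) != ((flags & b) != 0);
}

/* Hypothetical validator for a failover/failback request's flags.
 * currently_on_primary says where the endpoint lives right now. */
bool fofb_flags_valid(unsigned flags, bool currently_on_primary)
{
    if (!exactly_one(flags, SCSITGTD_ENDPOINT_FOFB_ASYNC,
                     SCSITGTD_ENDPOINT_FOFB_WAIT))
        return false;
    if (!exactly_one(flags, SCSITGTD_ENDPOINT_FOFB_TO_PRIMARY,
                     SCSITGTD_ENDPOINT_FOFB_TO_SECONDARY))
        return false;

    bool to_primary = (flags & SCSITGTD_ENDPOINT_FOFB_TO_PRIMARY) != 0;
    bool is_nop = (to_primary == currently_on_primary);

    /* Assumption: a no-op move is rejected unless NOP_OK is set. */
    if (is_nop && (flags & SCSITGTD_ENDPOINT_FOFB_NOP_OK) == 0)
        return false;
    return true;
}
```

For example, an asynchronous move to the secondary port from the primary passes, while a request that sets both ASYNC and WAIT is rejected before any port is touched.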
Internally, a request is validated and then allocated and placed on a linked list of requests:
/**
 * States for Failover/Failback requests
 */
typedef enum {
    SCSITGTD_ENDPOINT_FOFB_S_PENDING = 1, // Request is pending start
    SCSITGTD_ENDPOINT_FOFB_S_EXECUTING,   // Request is executing
    SCSITGTD_ENDPOINT_FOFB_S_CANCEL,      // Request is marked for cancel
    SCSITGTD_ENDPOINT_FOFB_S_DONE         // Request has completed executing
} scsitgtd_endpoint_fofb_state_e;
/**
 * Endpoint failover/failback request structure.
 */
typedef struct {
    delem_t elem;                         // Link in request queue
    dd_lwmutex_t mutex;                   // Mutex for update
    dd_lwcondvar_t cond;                  // Condition for wait
    dd_lwcondvar_t cond_wait_done;        // Condition for wait done
    dd_monotime_t delay_start;            // When delay started
    dd_uint32_t delay_secs;               // Number of seconds to delay
    scsitgtd_endpoint_fofb_state_e state; // Request state
    scsitgtd_endpoint_fofb_t info;        // Original info about FO/FB request
    dd_uint32_t waiters;                  // Number of current waiters
    scsitgtd_id_t eid;                    // Endpoint id
    scsitgtd_id_t tpid;                   // The *old* port id.
    dd_err_t err;                         // Resultant error, if any
} scsitgtd_endpoint_fofb_request_t;
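The four request states suggest a simple state machine. The sketch below is one plausible reading, not the described implementation: the transition table and the helper names fofb_transition_allowed and fofb_request_flag_cancel are assumptions. In particular, it assumes that only a pending request can be flagged for cancel, which follows the API comment about flagging pending requests, and that a cancelled request still reaches DONE on the worker's next run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State values repeated from the listing so this sketch is self-contained. */
typedef enum {
    SCSITGTD_ENDPOINT_FOFB_S_PENDING = 1,
    SCSITGTD_ENDPOINT_FOFB_S_EXECUTING,
    SCSITGTD_ENDPOINT_FOFB_S_CANCEL,
    SCSITGTD_ENDPOINT_FOFB_S_DONE
} scsitgtd_endpoint_fofb_state_e;

/* Assumed transition table:
 *   PENDING   -> EXECUTING | CANCEL
 *   EXECUTING -> DONE
 *   CANCEL    -> DONE
 *   DONE      -> (terminal) */
bool fofb_transition_allowed(scsitgtd_endpoint_fofb_state_e from,
                             scsitgtd_endpoint_fofb_state_e to)
{
    switch (from) {
    case SCSITGTD_ENDPOINT_FOFB_S_PENDING:
        return to == SCSITGTD_ENDPOINT_FOFB_S_EXECUTING ||
               to == SCSITGTD_ENDPOINT_FOFB_S_CANCEL;
    case SCSITGTD_ENDPOINT_FOFB_S_EXECUTING:
    case SCSITGTD_ENDPOINT_FOFB_S_CANCEL:
        return to == SCSITGTD_ENDPOINT_FOFB_S_DONE;
    case SCSITGTD_ENDPOINT_FOFB_S_DONE:
        return false;
    }
    return false;
}

/* Flag a request for cancel; returns false and leaves the state untouched
 * if the request is no longer pending. */
bool fofb_request_flag_cancel(scsitgtd_endpoint_fofb_state_e *state)
{
    assert(state != NULL);
    if (!fofb_transition_allowed(*state, SCSITGTD_ENDPOINT_FOFB_S_CANCEL))
        return false;
    *state = SCSITGTD_ENDPOINT_FOFB_S_CANCEL;
    return true;
}
```

In a multi-threaded daemon these updates would be made while holding the request's mutex; locking is omitted here to keep the sketch short.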
A worker thread then periodically walks this list, checking for work to do. Its general algorithm is:
while (not shutdown) {
} else {
}
Here, process_request( ) has the following general algorithm:
if (request has waiters) {
}
release_request( )
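One pass of a worker over such a request list might look like the following single-threaded sketch. It is a hedged illustration only: the simplified request_t type, the REQ_* names, the error codes, and the execute callback are all hypothetical, and locking, waiter signalling, and request release are deliberately left out. It shows two behaviours mentioned in the description: a request runs only after its delay has elapsed, and a request flagged for cancel is completed without being executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { REQ_PENDING = 1, REQ_EXECUTING, REQ_CANCEL, REQ_DONE } req_state_t;

enum { REQ_ERR_NONE = 0, REQ_ERR_CANCELLED = 1 };  /* hypothetical codes */

/* Simplified stand-in for the request structure. */
typedef struct request {
    struct request *next;   /* link in the request queue */
    req_state_t state;
    uint64_t delay_start;   /* when the delay started, in seconds */
    uint32_t delay_secs;    /* number of seconds to delay */
    int err;                /* resultant error, if any */
} request_t;

typedef int (*execute_fn)(request_t *req);

/* Stand-in for the real failover/failback work; always succeeds. */
int execute_ok(request_t *req)
{
    (void)req;
    return REQ_ERR_NONE;
}

/* One pass over the queue; returns the number of requests executed. */
size_t worker_pass(request_t *head, uint64_t now, execute_fn execute)
{
    assert(execute != NULL);
    size_t ran = 0;
    for (request_t *r = head; r != NULL; r = r->next) {
        if (r->state == REQ_CANCEL) {
            /* Cancelled requests complete without doing any work. */
            r->err = REQ_ERR_CANCELLED;
            r->state = REQ_DONE;
        } else if (r->state == REQ_PENDING &&
                   now >= r->delay_start &&
                   (now - r->delay_start) >= r->delay_secs) {
            r->state = REQ_EXECUTING;
            r->err = execute(r);
            r->state = REQ_DONE;
            ran++;
        }
        /* Pending requests still inside their delay are left for a later pass. */
    }
    return ran;
}
```

A real worker would call a pass like this inside its loop until shutdown, sleeping between passes and waking early when a new request is submitted.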