An information processing device includes a device, a management device that is connected to the device via a first transmission route and configured to acquire information regarding the device via the first transmission route, and a processing device that is connected to the device via a second transmission route, connected to the management device via a third transmission route, and configured to initialize the device and acquire the information from the management device via the third transmission route.
|
7. An information processing method comprising:
acquiring information regarding a device via a first transmission route by a management device that is connected to the device via the first transmission route;
initializing the device by a processing device that is connected to the device via a second transmission route and connected to the management device via a third transmission route;
acquiring the information from the management device via the third transmission route by the processing device, the management device is configured to acquire the information regarding the device via the second transmission route without passing through the processing device; and
controlling, by the management device, the processing device to activate, in a first activation of the processing device, the processing device in a minimum activation mode in which first resources less than second resources which are initialized in a normal activation mode are initialized and activate, in a reactivation of the processing device, the processing device in the normal activation mode in which the second resources are initialized based on the acquired information regarding the device by the management device via the second transmission route.
1. An information processing device comprising:
a device;
a management device that is connected to the device via a first transmission route and configured to acquire information regarding the device via the first transmission route; and
a processing device that is connected to the device via a second transmission route, connected to the management device via a third transmission route, and configured to initialize the device and acquire the information from the management device via the third transmission route, the management device is configured to acquire the information regarding the device via the second transmission route without passing through the processing device,
the management device is further configured to control the processing device to activate, in a first activation of the processing device, the processing device in a minimum activation mode in which first resources less than second resources which are initialized in a normal activation mode are initialized and activate, in a reactivation of the processing device, the processing device in the normal activation mode in which the second resources are initialized based on the acquired information regarding the device by the management device via the second transmission route.
13. A non-transitory computer-readable medium storing instructions executable by a computer that includes a device, a management device that is connected to the device via a first transmission route, and a processing device that is connected to the device via a second transmission route, and connected to the management device via a third transmission route, the instructions comprising:
instructions for acquiring information regarding the device via the first transmission route by the management device, the management device is configured to acquire the information regarding the device via the second transmission route without passing through the processing device;
instructions for initializing the device by the processing device;
instructions for acquiring the information from the management device via the third transmission route by the processing device; and
instructions for controlling, by the management device, the processing device to activate, in a first activation of the processing device, the processing device in a minimum activation mode in which first resources less than second resources which are initialized in a normal activation mode are initialized and activate, in a reactivation of the processing device, the processing device in the normal activation mode in which the second resources are initialized based on the acquired information regarding the device by the management device via the second transmission route.
2. The information processing device according to
3. The information processing device according to
4. The information processing device according to
5. The information processing device according to
6. The information processing device according to
8. The information processing method according to
9. The information processing method according to
when an error has occurred during initialization of the device, identifying, by the processing device, the device that has the error on the basis of the acquired information.
10. The information processing method according to
11. The information processing method according to
12. The information processing method according to
|
This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2017-168568, filed on Sep. 1, 2017, the entire contents of which are incorporated herein by reference.
The embodiment discussed herein is related to information processing technology.
An information processing device exemplified by a computer, a server, or the like is activated in accordance with a program controlling basic inputs and outputs, which is referred to as a basic input/output system (BIOS). At the time of activation, the information processing device initializes each of its units, exemplified by a central processing unit (CPU), a memory, and an input/output (I/O) device. The initialization of the I/O device is implemented by a driver of the I/O device. The CPU that executes the BIOS calls the driver and waits until the initialization is completed. In the description below, processing that the CPU performs using the BIOS may be described simply as processing performed by the BIOS.
Furthermore, some information processing devices are mounted with a management device such as a baseboard management controller (BMC). Such a management device is operated by a dedicated CPU, a memory, and an operating system (OS), separately from the information processing device. The BIOS is able to communicate with such a management device using predetermined procedures.
For example, the BIOS collects information of the I/O device after the initialization of the I/O device. The BIOS notifies the management device of the collected information of the I/O device. The management device is able to support management of the information processing device based on the information of the I/O device provided from the BIOS.
The related arts are disclosed in Japanese Laid-open Patent Publication No. 2013-041390, Japanese Laid-open Patent Publication No. 2007-249761, and Japanese Laid-open Patent Publication No. 2002-259130, for example.
According to an aspect of the invention, an information processing device includes a device, a management device that is connected to the device via a first transmission route and configured to acquire information regarding the device via the first transmission route, and a processing device that is connected to the device via a second transmission route, connected to the management device via a third transmission route, and configured to initialize the device and acquire the information from the management device via the third transmission route.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
During activation of an information processing device, until initialization of each unit is completed, there is a period during which information on units in the information processing device, such as an I/O device, cannot be acquired. With a conventional technique, when a failure occurs in any part of an information processing device during initialization, the information processing device is in some cases unable to identify the part in which the failure has occurred. This prevents the information processing device from presenting information indicating the part in which the failure has occurred, whereby time and labor have to be spent to recover from the failure.
In the description below, an information processing device according to an embodiment will be described with reference to the drawings. The configuration in the embodiment below is an example, and the information processing device is not limited to that described in the embodiment.
In
Furthermore, the server 1 is mounted with a BMC 20. The BMC 20 performs remote power source control and hardware monitoring with respect to the server 1 based on the Intelligent Platform Management Interface (IPMI) specification. As illustrated in
For example, the BMC 20 has a keyboard controller style (KCS) interface 25 and communicates with the CPU 11 via the I/O device 13 of the server 1, for example, in accordance with the IPMI specification. Furthermore, the BMC 20 has a serial device 23 and a LAN device 24. The BMC 20 is connected to a personal computer (PC) prepared separately from the server 1 with a LAN cable connected to a LAN external connector 34 or a serial cable connected to a serial external connector 33. The PC is able to obtain information of the BMC 20 from a web server in the BMC 20 via the LAN cable or the serial cable. The transmission route connecting the BMC 20 and the I/O device 13 via the KCS interface 25 is an example of a third transmission route.
Furthermore, the server 1 manages each unit of the hardware inside the server 1 in accordance with the interface specification compliant with the IPMI specification. The IPMI is a specification for a standardized message-based hardware management interface. The information of the BMC 20 is able to be obtained via the BIOS and an IPMI driver. The IPMI driver is a program for the OS of the server 1 for communicating with the BMC 20. That is to say, the server 1 is communicable with the BMC 20 using a command compliant with the IPMI specification via an interface such as KCS, Server Management Interface Chip (SMIC), or Block Transfer (BT), in accordance with the BIOS or the OS. The CPU 11 is an example of a processor and a first processor, and the I/O device 13 is an example of an electrical device.
The Management Component Transport Protocol (MCTP) is used by a management controller such as the BMC 20 that is provided on the server 1 or an information processing device similar to the server 1. The management controller manages the state of the server 1 or statistical information using the MCTP protocol and retains management parameters. The management parameters include a communication speed of a transmission route, a power state, a use ratio of a hardware resource, a link state of a transmission route, an error count of the hardware of each unit (such as the I/O device), and an uncorrectable error count. Accordingly, the management controller integrates the management parameters from one or a plurality of monitoring target devices using the MCTP protocol. Furthermore, the management controller enables an information processing device such as a local or remote server 1 or other management controller to access these management parameters.
Furthermore, the management controller is able to use the MCTP protocol to monitor a monitoring target device (for example, the I/O device) connected to the server 1 or the like without using an OS running on the server 1 or the like. This reduces the load on the server 1 or the like for monitoring the monitoring target device. The management controller is able to use the MCTP protocol to access the monitoring target device (such as the I/O device) via various buses such as PCIe or I2C.
At this point, as a physical wire for the BMC 20 and the hardware of each unit, Peripheral Component Interconnect-Express (PCIe) or Inter-Integrated Circuit (I2C) may be cited as an example. However, the physical wire is not limited to the PCIe or the I2C. The physical wire may be a System Management Bus (SMBus) derived from the I2C, for example. Meanwhile, for example, the BIOS of the server 1 is able to acquire pieces of information of the hardware of each unit of the server 1 that have been collected by the BMC 20 using a command compliant with the IPMI specification. The BMC 20 is an example of the management device. That is to say, the BMC 20 is connected to the I/O device 13 as an electrical device via the I2C as a second transmission route. Furthermore, the BMC 20 is connected to the CPU 11 as the processor and the first processor via the KCS interface 25 as the third transmission route and a transmission route thereof. The MCTP is an example of a transmission protocol for communicating with an electrical device via at least one of a first interface and a second interface, without using any processor.
Furthermore, the server 301 is mounted with the BMC 320. The BMC 320 includes a CPU 21, a memory 22, and a LAN device 24 and performs processing compliant with the IPMI specification. The BMC 320 communicates with the BIOS of the server 301 using an interface in accordance with the KCS, for example. Furthermore, the BMC 320 acquires information (I/O device information) of the I/O devices 13-1, 13-2, and 13-3 and the like of the server 301 from the BIOS. Furthermore, the BMC 320 provides a function of a web server to a PC over a LAN via the LAN device 24. Accordingly, the PC is able to acquire the information from the BMC 320 and monitor the state of the server 301 via the LAN.
At the time of activation of the server 301, the I/O device 13 is initialized by a driver of the I/O device 13 in accordance with the control of the BIOS. For example, at the time of activation of the server 301, the BIOS calls the driver and waits until the initialization is completed. Furthermore, the BIOS collects information of the I/O device 13 after the initialization of the I/O device 13. At this point, the information of the I/O device 13 includes the number of the slot to which the I/O device 13 is attached, information of the vendor of the I/O device 13, and information identifying the type (for example, the card type) of the I/O device 13. The information of the I/O device 13 is referred to for identifying the I/O device 13 when a trouble occurs in the I/O device 13.
Furthermore, as described above, the BIOS and the BMC 320 are able to transmit and receive information with the IPMI specification. Accordingly, the BIOS notifies the BMC 320 of the collected information, and a server manager is able to check the information of an I/O device via the BMC 320, using a PC.
With the AC power turned on, the BMC 320 starts activation (A2). The activation process of the BMC 320 is then completed, and the BMC 320 completes the activation (A3). The time from turning on the AC power to completing the activation of the BMC 320 is a BMC activation time.
Next, the server 301 receives a user operation, for example, and turns on the power of the system of the server 301 (A4). This starts the activation process of the server 301 and the BIOS starts power-on self test (POST) (A5). The BIOS inquires of the BMC 320 about the setting at the time of activation. The setting at the time of activation includes whether the memory 12 is used in a mirror mode and whether a plurality of CPUs 11 is used as nodes so that the memory 12 is separately used for each node, for example. When the memory 12 is not separated for each node, for example, the memory 12 is accessed by the plurality of CPUs 11 evenly, as a system memory. The BIOS performs POST using the setting notified from the BMC 320.
The BIOS performs normal POST so as to provide a service to all devices. The BIOS sets a POST watchdog to the BMC 320 before the start of the POST (A6). The POST watchdog has a mechanism in which the system is reset in a case where processing is not completed within a set time.
Next, the BIOS initializes all the CPUs 11 and the memories 12 that are mounted in the server 301 (A6A). It is to be noted that the memory 12 is mounted with a dual inline memory module (DIMM), for example.
Next, the BIOS performs I/O device search processing in the POST to search for the I/O device 13 (A7). Furthermore, the BIOS allocates a resource to the I/O device 13 that has been found (A8). The resource to be allocated to the I/O device 13 is an area in the memory 12 used by the I/O device 13, for example. It is to be noted that when the CPU 11 accesses the I/O device 13, the CPU 11 accesses an address of an address space in the memory 12. Next, the BIOS initializes all of the I/O devices 13 sequentially (A9).
A case where, in the processing from A7 to A9, a failure occurs in the I/O device 13 is assumed. When an error occurs in any stage of the processing from A7 to A9, processing performed by the BIOS is stopped (A9A). This is because, at this point, the initialization of the I/O device 13 has not been completed and the BIOS is not able to identify the information of the I/O device 13 such as the slot number, the card type, or the like. The BIOS thus is not able to perform processing for recovering from the error.
Because the processing of the BIOS is stopped, the POST watchdog times out (A11). Because the POST watchdog times out, the system is reset (A12). However, the BIOS does not retain the information identifying the I/O device 13 and the cause of the error is unknown, whereby resetting is repeated with the same error of the I/O device 13 (from A4 to A12). This prevents the server 301 from being activated. For the server 301 to recover from the error, the user has to perform an isolation operation, for example, by manually removing the I/O devices 13 (for example, I/O cards) one by one to check whether normal activation is possible.
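The reset loop of the comparison example (A4 to A12) can be sketched as a small simulation. This is only an illustration, not firmware: the function names `run_post` and `boot_with_watchdog` and the `max_resets` cutoff are hypothetical, introduced here so that the loop terminates; the description itself gives no such limit.

```python
def run_post(faulty_slot):
    """Simulated POST: True when POST completes, False when the BIOS stops (A9A).

    In the comparison example the BIOS holds no I/O device information yet,
    so it cannot skip a faulty device and simply stops.
    """
    return faulty_slot is None


def boot_with_watchdog(faulty_slot, max_resets=3):
    """Repeat power-on (A4) through reset (A12) until POST completes.

    max_resets is an illustrative cutoff (an assumption); without it the
    loop would repeat indefinitely, as the text describes.
    """
    resets = 0
    while resets < max_resets:
        if run_post(faulty_slot):
            return ("activated", resets)
        # POST stopped: the POST watchdog times out (A11) and the system is reset (A12).
        resets += 1
    return ("stuck_in_reset_loop", resets)
```

With no faulty device the sketch reports activation without any reset; with a faulty slot every attempt ends in a watchdog reset, because nothing changes between attempts.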
In an embodiment, the BMC 20 collects the information of the I/O device 13 before the start of the POST by the BIOS, whereby the problem that occurs in the above-described comparison example is solved. That is to say, even if an error occurs in a state in which the BIOS does not retain the information identifying the I/O device 13 when the BIOS performs the POST, the BIOS receives the information of the I/O device 13 provided from the BMC 20, whereby the BIOS is able to identify the cause of the error and the part in which the error has occurred.
That is to say, even when the server 1 is activated and the POST is not completed, the server 1 is able to acquire the information of the I/O device 13. In a case where the server 1 is a large-scale system, the number of the I/O devices 13 is very large. With this, even when an error has occurred in one of the I/O devices 13, the remaining ones of the I/O devices 13 are usable, enabling activation of the server 1. Furthermore, when errors occur in a plurality of I/O devices 13, the BIOS points out the plurality of errors without stopping at an error in one of the I/O devices 13, so that the operations for replacing the I/O devices 13 that have a trouble are able to be performed collectively.
<Data Flow>
However, unlike the case of the comparison example, the BMC 20 includes an I/O device information acquisition processing unit 210 that acquires information of the I/O device 13 in advance and an MCTP protocol processing unit 203 that performs communication compliant with the MCTP. The I/O device information acquisition processing unit 210 also notifies the BIOS of the acquired information of the I/O device 13 in advance. “In advance” means before the BIOS performs activation of the entire server 1.
That is to say, the BIOS uses not only the information of the I/O device 13 that the BIOS itself has acquired but also information of the I/O device 13 acquired from the BMC 20. However, for the BMC 20 to acquire the information of the I/O device 13 in advance, at least the power has to be applied to the I/O device 13. For this reason, in the server 1, the activation processing proceeds at least up to the stage in which the power is applied to the I/O device 13.
Furthermore, in the present embodiment, the BMC 20 acquires information from the I/O device 13 with a configuration and procedures compliant with the MCTP via a transmission route other than the PCIe, which is the transmission route connecting the CPU 11 and the I/O device 13; for example, via a transmission route of the I2C connecting the BMC 20 and the I/O device 13. However, when the I/O device 13 is initialized by the BIOS, the BMC 20 is able to acquire the information from the I/O device 13 via the PCIe, which is the transmission route connecting the CPU 11 and the I/O device 13.
Furthermore, a control unit that performs communication compliant with the MCTP, which is omitted in
In the present embodiment, the activation processing of the server 1 is performed in a manner separated in two stages. In the first stage, the power is applied to the I/O device 13. After the power is applied to the I/O device 13, the BMC 20 acquires the information from the I/O device 13 via the I2C. It is to be noted that when the BMC 20 is able to acquire the information from the I/O device 13 via the PCIe, the BIOS initializes the I/O device 13 connected to the CPU 11 via the PCIe. However, the initialization of the I/O device 13 via the PCIe in the first stage may be omitted. That is to say, in the first stage, the BMC 20 acquires the information from the I/O device 13 via at least one of the I2C and the PCIe. Then, the BMC 20 provides the information acquired from the I/O device 13 to the BIOS.
In the second stage, the server 1 is reactivated and the BIOS performs the POST and initialization of each unit of the hardware of the server 1, for example, the I/O device 13. Even in a case where an error has occurred in a state in which the BIOS is not able to identify the information of the I/O device 13 when the BIOS performs the POST, the BIOS refers to the information identifying the I/O device 13 provided from the BMC 20. For this reason, the BIOS has an I/O device error processing unit 107 that performs error processing when an error has occurred in the I/O device 13. The I/O device error processing unit 107 identifies the cause of the error based on the information identifying the I/O device 13.
For reduction of processing time in the first stage, the BIOS has a “minimum activation mode of CPU and DIMM” (hereinafter, simply referred to as a “minimum activation mode”) in which only parts of the CPU 11 and the memory 12 are enabled for activation. In this minimum activation mode, to reduce the time for initialization of the CPU 11 and the memory 12, which accounts for a relatively high percentage of the activation time, only one core of the CPU 11 with multiple cores and only one device of the memory 12 (DIMM) are initialized to be usable. Furthermore, the BMC 20 has a function to notify the BIOS of the minimum activation mode. The BIOS has an activation mode determination processing unit 106 that determines the activation mode notified from the BMC 20.
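The resource selection in the two activation modes can be sketched as follows. This is a minimal illustration under stated assumptions: the mode strings and the function name `resources_to_initialize` are hypothetical, and the real BIOS works on hardware state rather than on Python lists.

```python
def resources_to_initialize(mode, cpu_cores, dimms):
    """Return the (cores, dimms) that would be initialized in the given mode.

    mode is "minimum" or "normal" (hypothetical labels for the two
    activation modes described in the text).
    """
    if mode == "minimum":
        # Minimum activation mode: only one CPU core and one DIMM are
        # initialized, to shorten the first-stage activation time.
        return list(cpu_cores[:1]), list(dimms[:1])
    if mode == "normal":
        # Normal activation mode: all mounted CPU cores and DIMMs are initialized.
        return list(cpu_cores), list(dimms)
    raise ValueError(f"unknown activation mode: {mode}")
```

For a system with four cores and two DIMMs, the minimum mode in this sketch selects one core and one DIMM, while the normal mode selects all of them.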
The BMC 20 uses the MCTP over both the PCIe and the I2C in the I/O device information acquisition processing unit 210, which enables a trouble with the I/O device 13 itself to be distinguished from an abnormality in the PCIe or I2C route. However, as described above, using the MCTP in the transmission route of the PCIe is based on an assumption that the I/O device 13 is initialized and a resource such as an area in the memory 12 is allocated to the I/O device 13.
The server 1 is an example of an information processing device, the transmission route of the PCIe is an example of a first transmission route, and the transmission route of the I2C is an example of the second transmission route. The PCIe is an example of the first interface and the I2C is an example of the second interface. Furthermore, the KCS interface 25 illustrated in
After the BMC 20 completes the activation, the BMC 20 turns on the power of the system of the server 1. That is to say, the BMC 20 instructs the server 1 to activate (S3A). The BMC 20 retains setting information of the activation mode of the server 1 in an activation mode storage area. During the time from turning on the AC power to acquiring the information of the I/O device 13, the BMC 20 sets the activation mode of the server 1 to the minimum activation mode and retains the minimum activation mode. The processing from S1 to S3B is an example of a management device activating a processing device. The processing at S3A and S3B is an example of the management device activating the processing device via a third transmission route.
In accordance with the instruction from the BMC 20, the activation process of the server 1 is started (S3B) and the BIOS starts the POST (S3C). The BIOS inquires of the BMC 20 about the activation mode (S3D) and the BMC 20 notifies the BIOS of the activation mode (S3E). The BIOS performs the POST in the notified activation mode (minimum activation mode). That is to say, the BIOS activates the CPU 11 and the memory 12 in the minimum activation mode, and the BMC 20 uses the MCTP through the PCIe and the I2C to acquire the information of the I/O device 13 (S4). Out of procedures at S4, a procedure using the transmission route of the I2C is an example of the management device accessing an electrical device via the I2C as the second transmission route to acquire the information identifying the I/O device 13 as an electrical device.
In communication processing compliant with the MCTP, when an error has occurred in the PCIe and no problem has occurred in the I2C, for example, the BMC 20 provides a notification of the error as an abnormality in the PCIe route. When there is a problem in both of the PCIe and the I2C, the BMC 20 provides a notification of an error of the I/O device 13. When there is an error notification, the processing at and after S5A does not have to be performed, and the BIOS outputs information that prompts replacement of the part causing the error.
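The isolation rule described above can be written as a small decision function. This is a sketch only: the names `Diagnosis` and `isolate_fault` are hypothetical, and the case in which the PCIe route works but the I2C route fails is not spelled out in the text; it is treated here as an I2C route abnormality by symmetry, which is an assumption.

```python
from enum import Enum


class Diagnosis(Enum):
    OK = "no error"
    PCIE_ROUTE = "abnormality in the PCIe route"
    I2C_ROUTE = "abnormality in the I2C route"  # assumed by symmetry, not stated in the text
    IO_DEVICE = "error of the I/O device"


def isolate_fault(pcie_ok, i2c_ok):
    """Classify an MCTP access result over the two routes to one I/O device."""
    if pcie_ok and i2c_ok:
        return Diagnosis.OK
    if not pcie_ok and i2c_ok:
        # The device answers over I2C, so the device itself is alive.
        return Diagnosis.PCIE_ROUTE
    if pcie_ok and not i2c_ok:
        return Diagnosis.I2C_ROUTE
    # Neither route reaches the device: report the device itself.
    return Diagnosis.IO_DEVICE
```

Any result other than `Diagnosis.OK` corresponds to the error notification after which the processing at and after S5A does not have to be performed.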
Furthermore, the BMC 20 acquires the information of the I/O device 13 at S4, and the BMC 20 sets the activation mode to a “normal mode”. After completing the acquisition of the information of the I/O device 13, the BMC 20 turns off the power of the server 1 (S5A, S5B).
Next, with reference to
The BIOS sets a POST watchdog to the BMC 20 at the start of the POST (S7). Furthermore, the BIOS requests the I/O device information acquired by the BMC 20 from the BMC 20 and receives a notification of the I/O device information (S8A, S8B). The processing at S8A is an example of a processor acquiring the information identifying the electrical device from the management device.
The BIOS searches for the I/O device 13 in the I/O device search processing in the POST (S9). The BIOS allocates a demanded resource to the I/O device 13 that has been found (S10). The BIOS initializes all of the I/O devices 13 sequentially (S11).
When a failure occurs in the I/O device 13, an error occurs in the processing at any of S9 to S11 (S12). When an error has occurred, the processing performed by the BIOS moves to an error handler. Having moved to the error handler, the BIOS uses the I/O device information notified from the BMC 20 to point out the I/O device 13 in which the error has occurred. By using the I/O device information notified from the BMC 20, the BIOS is able to perform error processing without stopping the original processing. Accordingly, in the processing in
In accordance with the instruction by the BIOS, the system is reset (S14). The BIOS disables the I/O device 13 in which an error is pointed out at the time of activation after the reset. Because the I/O device 13 having the error is not used, activation of the system is enabled by the reset. Furthermore, in a case where an error has occurred in a plurality of the I/O devices 13, as described above, the BIOS does not stop at an error and points out errors in the plurality of the I/O devices 13. The BIOS disables the plurality of the I/O devices 13 having the errors and is able to activate the system. As described above, in
<Configurations of BIOS and BMC 20>
As illustrated in
The BMC communication processing unit 101 communicates with a BIOS communication processing unit 201 of the BMC 20 using the KCS interface 25 (see
The I/O device table creation unit 105 searches the I/O device 13 in the POST after the activation of the server 1 and records the information of the I/O device 13 in an I/O device table. The activation mode determination processing unit 106 determines the activation mode specified by the BMC 20 in the POST after the activation of the server 1. The activation mode includes a minimum activation mode using a minimum configuration with which the MCTP is able to be operated and a normal activation mode. The I/O device error processing unit 107 is activated by the error handler when an error has occurred in the I/O device 13 and identifies the I/O device 13 in which the error has occurred.
Furthermore, the BMC 20 includes the CPU 21 and the memory 22, as illustrated in
As illustrated in
The BIOS communication processing unit 201 communicates with the BMC communication processing unit 101 of the BIOS using the KCS interface 25 compliant with the IPMI standard. The external display processing unit 202 executes a web server program, distributes information to an external PC or the like, and receives a setting and an instruction from an external PC or the like, for example. The MCTP protocol processing unit 203 accesses the I/O device 13 in accordance with the MCTP protocol via the transmission route of the PCIe, the I2C, or the like to collect the information of the I/O device 13. The I/O device information acquisition processing unit 210 acquires the information of the I/O device 13 via the MCTP protocol processing unit 203. The I/O device table creation unit 205 stores the information identifying the I/O device 13 acquired by the MCTP protocol processing unit 203 and the I/O device information acquisition processing unit 210 in the memory 22 in the format of the I/O device table.
The activation mode setting processing unit 206 sets the activation mode of the server 1 to a predetermined address in the memory 22. The set activation mode is forwarded to the BIOS by communication between the BIOS communication processing unit 201 and the BMC communication processing unit 101. The power-on processing unit 208 applies the power to the server 1 to activate the BIOS. The MCTP operable minimum configuration activation processing unit 207 sets the minimum activation mode using the activation mode setting processing unit 206 and activates the server 1 using the power-on processing unit 208.
The I/O device 13 includes an MCTP protocol processing unit 131. The I/O device 13 also includes a CPU, a memory, and the like. The CPU of the I/O device 13 executes a program such as firmware loaded to be able to be executed in the memory, as the MCTP protocol processing unit 131. The MCTP protocol processing unit 131 acquires information of the hardware of the I/O device 13, for example, the slot number of the PCIe bus, information of the vendor, the type of the I/O device 13, the address of a resource of the memory 12 allocated to the I/O device 13, and the like and provides the acquired information to the BMC 20.
<Processing and Data Flows of BIOS and BMC 20>
Next, under the control of the MCTP operable minimum configuration activation processing unit 207, the power on is performed. The power on is performed by the power-on processing unit 208 and the activation processing of the BIOS is started. The BIOS acquires the activation mode stored in the activation mode storage area using the activation mode determination processing unit 106 and completes the activation in the minimum activation mode enabling the operation of the MCTP using MCTP operable minimum configuration information. After the completion of the activation, the BMC 20 acquires hardware information of the I/O device 13 via the MCTP protocol processing unit 203 using the I/O device table creation unit 205 and retains the information identifying the I/O device 13 in the format of the I/O device table. At the time of normal activation, the BIOS performs the normal POST. When the BIOS has detected an error in the I/O device 13 via the error handler using the I/O device error processing unit 107 during this POST, the BIOS refers to the information identifying the I/O device 13 to handle the error. For example, the BIOS identifies the I/O device 13 in which the error has occurred and provides a notification of the error together with the information of the identified I/O device 13.
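The error handling during the normal POST can be sketched as follows. This is a simplified model under stated assumptions: `run_post_with_bmc_table`, the `init_device` callback, and the dictionary layout of `bmc_table` are hypothetical names for illustration, and an initialization failure is modeled as a Python `RuntimeError`.

```python
def run_post_with_bmc_table(slots, init_device, bmc_table):
    """Initialize every slot; on failure, identify the device and keep going.

    bmc_table maps a slot number to the identifying information that the
    BMC collected in advance (hypothetical keys: "vendor", "card_type").
    Returns (reports, failed_slots).
    """
    reports = []
    failed_slots = []
    for slot in slots:
        try:
            init_device(slot)
        except RuntimeError:
            # Error handler: look up the device in the table provided by the BMC
            # instead of stopping, so that later devices are still processed.
            entry = bmc_table.get(slot, {})
            vendor = entry.get("vendor", "unknown")
            card_type = entry.get("card_type", "unknown")
            reports.append(f"slot {slot}: vendor={vendor}, type={card_type}")
            failed_slots.append(slot)
    return reports, failed_slots
```

Because the loop continues after a failure, errors in several I/O devices are reported in one pass rather than one per reset.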
<Details of Processing of BMC 20>
Back to
The MCTP operable minimum configuration activation processing unit 207 performs the power-on processing of the system of the server 1 using the power-on processing unit 208 (P3), and the CPU 11 of the server 1 activates the BIOS. The processing at P2 and P3 is an example of a management device instructing activation of a processing device with fewer resources than in the normal activation.
After the activation of the BIOS, the BMC 20 causes the activation mode determination processing unit 106 of the BIOS to read the activation mode stored in the activation mode storage area. More specifically, in response to a request from the BMC communication processing unit 101 of the BIOS, the BIOS communication processing unit 201 of the BMC 20 notifies the BMC communication processing unit 101 of the activation mode stored in the activation mode storage area (P4). Thereafter, the BMC 20 waits until the processing of the BIOS (Q0) is completed (P5 to P6).
When the processing of the BIOS has been completed, in the BMC 20, the I/O device table creation unit 205 acquires the information of the I/O device 13 from the MCTP protocol processing unit 131 of each I/O device 13 via the MCTP protocol processing unit 203 (P7). The BMC 20 then writes the “normal mode” (value 1) to the activation mode storage area using the activation mode setting processing unit 206 (P8). Thereafter, the BMC 20 powers off the system and shuts down the server 1 once (P9).
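The BMC-side sequence from P3 to P9 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the firmware of the embodiment: the class `FakeServer`, the function `collect_io_device_info`, the callback `read_devices`, and the value 0 for the minimum mode are all hypothetical (the description gives only the value 1 for the normal mode).

```python
from dataclasses import dataclass, field

MODE_MINIMUM = 0  # assumed value; the text only states that normal mode is 1
MODE_NORMAL = 1


@dataclass
class FakeServer:
    """Hypothetical stand-in for the server 1 as seen from the BMC."""
    powered: bool = False
    bios_done: bool = False
    log: list = field(default_factory=list)

    def power_on(self, mode):
        self.powered = True
        self.log.append(f"power_on(mode={mode})")
        # The BIOS reads the mode (P4) and runs its processing (Q0).
        self.bios_done = True

    def power_off(self):
        self.powered = False
        self.log.append("power_off")


def collect_io_device_info(server, read_devices):
    """Run the P3..P9 sequence and return (device_info, next_mode)."""
    mode_area = MODE_MINIMUM        # P2: minimum mode has already been set
    server.power_on(mode_area)      # P3/P4: power on, BIOS reads the mode
    if not server.bios_done:        # P5..P6: wait for BIOS completion
        raise RuntimeError("BIOS did not complete")
    devices = read_devices()        # P7: query each I/O device over MCTP
    mode_area = MODE_NORMAL         # P8: the next activation is normal
    server.power_off()              # P9: shut the server down once
    return devices, mode_area
```

In this sketch the wait at P5 to P6 is reduced to a single check because the stand-in BIOS completes synchronously; a real BMC would wait for the completion notification from the BIOS.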
The vendor ID is information uniquely identifying the vendor of the I/O device 13. The device ID is information uniquely identifying the device. The allocation resource is, for example, the head address and the size (capacity of the area) of the memory 12 that is allocated to the I/O device 13 attached to the corresponding slot. The PCI address is the address within the PCI configuration space allocated to the I/O device 13. The PCI configuration space is provided within the I/O device 13, for example, and the CPU 11 is able to access the PCI configuration space with a predetermined instruction to the I/O device 13. However, the PCI configuration space may be mapped to the address space of the memory 12 so as to be accessed from the CPU 11 using the address space of the memory 12. The CPU 11 and the I/O device 13 exchange data via the PCI address. Furthermore, a register used by the I/O device 13 is allocated to the PCI address.
<Details of Processing of BIOS>
Furthermore, in the present embodiment, the number of the memories 12 is not limited to two. The memory 12-1 is a memory 12 directly accessed by the CPU 11-1 and the memory 12-2 is a memory 12 directly accessed by the CPU 11-2. Each of the memories 12-1 and 12-2 includes DIMMs #1 to #4.
Furthermore, in
Furthermore, in the server 1 in
In the minimum activation mode in
Back to
When the initialization has been completed, the BIOS notifies the BMC of the completion (Q7). As described with reference to P7 in
<Creation of the I/O Device Table>
In
The I/O device table creation unit 205 compares the pieces of information acquired over the two routes, PCIe and I2C, to check whether they coincide (P73). The I/O device table creation unit 205 then determines whether there is any abnormal route from which information could not be acquired (P74). When there is such a route, the I/O device table creation unit 205 reports an error for the route in which the abnormality has occurred (P75). For example, the BMC 20 outputs information on the route in which the abnormality has occurred and the contents of the error on a screen provided to a PC via a web server program. Furthermore, for example, the BMC 20 outputs an error message including information identifying the slot number of the PCIe on a web page displayed on the PC. At this point, when abnormalities have occurred in both of the routes, the I/O device table creation unit 205 may determine that an abnormality has occurred in the I/O device 13.
When no route is found to have an error at P74, the I/O device table creation unit 205 determines whether the pieces of information acquired from the two routes are both normal (P76). Cases in which the information is not normal include a case where the slot number of the PCIe is not within the range identified from the number of slots in the server 1, a case where the vendor name does not coincide with any entry in the existing vendor information, and a case where the device ID does not coincide with any entry in the device ID group. When at least one of the pieces of information acquired from the two routes is not normal, the I/O device table creation unit 205 reports an error identifying the I/O device 13 that has the abnormality. For example, the BMC 20 outputs information on the device in which the abnormality has occurred and the contents of the error on a screen provided to a PC via a web server program (P77). By contrast, when the determination at P76 finds that the pieces of information are both normal, the I/O device table creation unit 205 ends the processing.
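The decisions at P74 to P77 can be illustrated with a short sketch. It is written under several assumptions that are not in the description: the function names `is_normal` and `check_routes`, the dictionary keys, the 1-based slot numbering, the use of a numeric vendor ID in place of the vendor name, and the sample contents of `KNOWN_VENDORS` and `KNOWN_DEVICE_IDS` are all hypothetical. The coincidence check at P73 is omitted because the description does not state what follows from a mismatch.

```python
from typing import Optional

KNOWN_VENDORS = {0x8086, 0x10DF}      # hypothetical vendor information
KNOWN_DEVICE_IDS = {0x1521, 0xE200}   # hypothetical device ID group
SLOT_COUNT = 8                        # hypothetical number of slots


def is_normal(info: dict) -> bool:
    """P76 plausibility check on the record from one route."""
    return (
        1 <= info["slot"] <= SLOT_COUNT
        and info["vendor_id"] in KNOWN_VENDORS
        and info["device_id"] in KNOWN_DEVICE_IDS
    )


def check_routes(pcie: Optional[dict], i2c: Optional[dict]) -> str:
    """Classify the result of reading one I/O device over both routes.

    None means that nothing could be acquired over that route.
    """
    # P74/P75: a route that returned nothing is reported as a route error;
    # when both routes fail, the device itself is suspected.
    if pcie is None and i2c is None:
        return "device error: both routes failed"
    if pcie is None:
        return "route error: PCIe"
    if i2c is None:
        return "route error: I2C"
    # P76/P77: both routes answered, so validate the contents.
    if not (is_normal(pcie) and is_normal(i2c)):
        return "device error: invalid information"
    return "ok"
```

Returning a plain string keeps the sketch short; an actual implementation would more likely return a structured result that the web server program can render.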
As illustrated in
As described above, when acquisition of the hardware information has failed and the error has occurred in only one of the two routes, the BMC 20 is able to report an abnormality in the route in which the error has occurred. Furthermore, when errors have occurred in both of the routes, the BMC 20 is able to report an abnormality of the I/O device 13 itself. In the processing at P75 in
The BMC 20 imposes no particular restriction on the type of the route as long as the MCTP protocol can be used over it. Furthermore, when errors have occurred in a plurality of routes, the BMC 20 is able to determine that there is an error in the I/O device 13. Furthermore, the route of SMBus/I2C is able to be connected one-to-one to the I/O device 13, and the BMC 20 thus does not stop processing at an error in one of the I/O devices 13 and is able to check for errors in a plurality of the I/O devices 13. After acquisition of information of all of the I/O devices 13 using the MCTP has been completed, the BMC 20 changes the activation mode to “normal” as in the processing at P8 in the flowchart in
The following describes processing performed when the manager of the BMC 20 or the server 1 turns on the power of the server 1. With the processing described above, when an error occurs continuously, as with a permanent fault, the BIOS reports the error of the I/O device 13 and disables every I/O device 13 in which an error has occurred, and the server 1 is then activated.
<Processing when an Error has Occurred>
The I/O device error processing unit 107 identifies the I/O device 13 (slot number) from the access error address notified from the CPU 11 (E1). The I/O device error processing unit 107 acquires the I/O device information from the BMC 20 (E2) and, based on the allocation resource and the PCI address of each slot, checks whether the error address corresponds to one of them. That is to say, the I/O device error processing unit 107 determines whether the access error address is within the range of the allocation resource (E3). The range of the allocation resource is determined by the head address and the size (capacity of the area) of the memory 12 given as the allocation resource in the I/O device information which has already been illustrated in
When the access error address is not within the allocation resource (NO at E3), the I/O device error processing unit 107 determines whether the access error address coincides with the PCI address. As has already been described, the PCI address is an address in the PCI configuration space allocated to the I/O device 13. In the example in
Thereafter, by the determination at E3, when the access error address with which the error has occurred is included in the range of the allocation resource of any slot or corresponds with the PCI address of a slot mapped to the address space in the memory 12, the slot in which the error has occurred is identified (E6). The I/O device error processing unit 107 then sets the I/O device 13 mounted in the slot in which the error has occurred to “Disable” in the device disable setting management table.
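The address matching described for E3 through E6 can be sketched as follows. The sketch rests on assumptions that go beyond the description: the names `SlotInfo`, `find_error_slot`, and `disable_error_slot` and the field names are hypothetical, the allocation range is treated as half-open (head address inclusive, head address plus size exclusive), and the device disable setting management table is modeled as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlotInfo:
    """One row of the I/O device table (field names are illustrative)."""
    slot: int
    resource_base: int   # head address of the allocated memory area
    resource_size: int   # size of the allocated memory area
    pci_address: int     # address in the memory-mapped PCI configuration space


def find_error_slot(error_addr: int, table: list) -> Optional[int]:
    """Return the slot whose resources contain error_addr, else None."""
    for row in table:
        # E3: is the address inside the allocation resource of this slot?
        end = row.resource_base + row.resource_size
        if row.resource_base <= error_addr < end:
            return row.slot
        # NO at E3: does the address coincide with the slot's PCI address?
        if error_addr == row.pci_address:
            return row.slot
    return None


def disable_error_slot(error_addr: int, table: list, disable_table: dict):
    """E6: identify the slot and mark its device as "Disable"."""
    slot = find_error_slot(error_addr, table)
    if slot is not None:
        disable_table[slot] = "Disable"
    return slot
```

For instance, with two slots whose allocation resources are adjacent 4 KiB areas, an error address inside the second area resolves to the second slot, and an address equal to the first slot's PCI address resolves to the first slot.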
In the example in
As described above, according to the server 1 in the present embodiment, the BMC 20 acquires the information of the I/O device 13 as an electrical device with a configuration and procedures compliant with the MCTP via the transmission route of the I2C as the second transmission route by the I2C being the second interface. Furthermore, the BMC 20 is able to communicate with the CPU 11 using the KCS interface 25 as a third transmission route to activate the server 1. Accordingly, when the power is applied to the I/O device 13 and the I/O device 13 is under operation, the BMC 20 is able to acquire the information of the I/O device 13 from the transmission route of the I2C as the second transmission route by the I2C being the second interface. The BMC 20 then is able to provide the acquired information of the I/O device 13 to the BIOS executed in the CPU 11 via the third transmission route using the KCS being a third interface. Accordingly, during the activation of the server 1, even when an error has occurred in the I/O device 13 in a state in which the BIOS is not able to identify the I/O device 13, the BIOS is able to acquire the information identifying the I/O device 13 in which the error has occurred from the BMC 20.
Furthermore, in the present embodiment, when the I/O device 13 has been initialized, the BMC 20 is able to acquire the information of the I/O device 13 as an electrical device with a configuration and procedures compliant with the MCTP via two transmission routes exemplified by the PCIe and I2C. That is to say, the BMC 20 is able to acquire the information of the I/O device 13 with a configuration and procedures compliant with the MCTP via the PCIe as the first transmission route in addition to the I2C as the second transmission route (a third effect). That is to say, when the power is applied to the I/O device 13, even when initialization in accordance with the PCIe has not been performed, the BMC 20 is able to acquire information from the I/O device 13 via the I2C as the second transmission route. Furthermore, when the I/O device 13 has been initialized via the PCIe as the first transmission route, the BMC 20 is able to acquire information from the I/O device 13 via the PCIe. Accordingly, even when an error has occurred in one of the first transmission route and the second transmission route while the BMC 20 acquires the information from the I/O device 13, the BMC 20 is able to acquire the information of the I/O device 13 from the transmission route in which the error has not occurred.
Furthermore, when errors have occurred in both of the first transmission route and the second transmission route during acquisition of the information from the I/O device 13, the BMC 20 may determine that an abnormality has occurred in the I/O device 13 itself.
Furthermore, when the BMC 20 activates the server 1 to acquire the information from the I/O device 13, the BMC 20 and the I/O device 13 may activate the server 1 with a minimum configuration enabling communication between the BMC 20 and the I/O device 13 using the MCTP. That is to say, when the BMC 20 acquires the information from the I/O device 13, the server 1 is initialized and activated with a hardware configuration with resources less than normal, and thus activated in a shorter time than normal.
Furthermore, when the BMC 20 acquires the information from the I/O device 13, the BMC 20 and the I/O device 13 communicate with each other with procedures compliant with the MCTP. Accordingly, the BMC 20 is able to collect the information of the I/O device 13 with the load of the processing in the CPU 11 suppressed.
Furthermore, in the present embodiment, when errors occur in a plurality of I/O devices 13, the BIOS reports all of the errors without stopping at an error in one of the I/O devices 13, so that the faulty I/O devices 13 can be replaced together in a single operation.
<Computer-Readable Storage Medium>
A program for a computer or other machine or device (hereinafter, referred to as a computer or the like) to implement any of the above-described functions may be stored in a computer-readable storage medium. Then, by causing the computer or the like to read the program in this storage medium and execute the program, the functions may be provided.
Here, a computer-readable storage medium means a storage medium that is able to store information such as data or a program using an electrical, magnetic, optical, mechanical, or chemical action and that can be read by a computer or the like. Among such storage media, those removable from a computer or the like include, for example, a flexible disk, a magneto-optical disk, a CD-ROM, a CD-R/W, a DVD, a Blu-ray disc, a DAT, an 8-mm tape, and a memory card such as a flash memory. Furthermore, storage media fixed to a computer or the like include a hard disk and a read-only memory (ROM). Furthermore, a solid state drive (SSD) is able to be used both as a storage medium removable from a computer or the like and as a storage medium fixed to a computer or the like.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiment of the present invention has been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Aug 02 2018 | HAYASHIDA, AKIRA | Fujitsu Limited | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 046958 | /0184 | |
Aug 28 2018 | Fujitsu Limited | (assignment on the face of the patent) | / | |||
Sep 04 2024 | Fujitsu Limited | FSAS TECHNOLOGIES INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 069272 | /0546 |