A method for computer-based simulation or control of a dynamic system using a computer includes: cyclically receiving, by a programmable logic device, at least one input signal; calculating, by the programmable logic device, at least one matrix multiplication; and outputting, by the programmable logic device, at least one output signal. A configuration of the programmable logic device includes: a parallel multiplication of blocks of at least two elements of a matrix by at least one input-signal-dependent element of a vector, and an adder tree for multiplication results. Successive blocks of the matrix are temporarily stored in a pipeline and processed sequentially. A target number of blocks and a target adder stage are determined based on a number and/or values of parameters of at least one system equation. Processing of blocks for a current cycle is terminated based on the target number of blocks and the target adder stage being reached.
8. A computer system comprising:
an operator computer having a human-machine interface; and
a working computer, the working computer including a processor and a programmable logic device;
wherein a configuration of the programmable logic device includes:
a parallel multiplication of blocks of at least two elements of a matrix by at least one input-signal-dependent element of a vector, and
an adder tree for multiplication results;
wherein the programmable logic device is configured to at least temporarily store successive blocks of the matrix and to process the successive blocks sequentially in a pipeline;
wherein the programmable logic device is further configured to:
cyclically receive at least one input signal;
calculate at least one matrix multiplication;
terminate processing of blocks for a current cycle upon a target number of blocks and a target adder stage being reached, wherein the target number of blocks and the target adder stage are based on a number and/or values of parameters of the at least one system equation;
determine at least one output signal based on a result of the target adder stage; and
output the at least one output signal.
9. A non-transitory computer-readable storage medium having stored thereon a configuration for a programmable logic device, wherein the configuration of the programmable logic device includes:
a parallel multiplication of blocks of at least two elements of a matrix by at least one input-signal-dependent element of a vector, and
an adder tree for multiplication results,
wherein the configuration for the programmable logic device configures the programmable logic device to temporarily store successive blocks of the matrix and process the successive blocks of the matrix sequentially;
wherein the configuration, when executed by the programmable logic device, facilitates performance of the following by the programmable logic device:
cyclically receiving at least one input signal,
receiving at least one size parameter,
determining a target number of blocks and a target adder stage in accordance with the at least one size parameter,
calculating at least one matrix multiplication,
terminating processing of blocks for a current cycle upon the target number of blocks and the target adder stage being reached,
determining at least one output signal based on a result of the target adder stage, and
outputting the at least one output signal.
1. A method for computer-based simulation or control of a dynamic system using a computer, wherein behavior of the dynamic system corresponds to at least one system equation,
wherein the computer includes a programmable logic device, wherein a configuration of the programmable logic device includes:
a parallel multiplication of blocks of at least two elements of a matrix by at least one input-signal-dependent element of a vector, and
an adder tree for multiplication results,
wherein successive blocks of the matrix are temporarily stored in a pipeline and processed sequentially,
wherein the method comprises:
cyclically receiving, by the programmable logic device, at least one input signal;
calculating, by the programmable logic device, at least one matrix multiplication;
terminating, by the programmable logic device, processing of blocks for a current cycle upon a target number of blocks and a target adder stage being reached, wherein the target number of blocks and the target adder stage are based on a number and/or values of parameters of the at least one system equation;
determining, by the programmable logic device, at least one output signal based on a result of the target adder stage; and
outputting, by the programmable logic device, the at least one output signal.
2. The method according to
3. The method according to
4. The method according to
5. The method according to
6. The method according to
7. The method as recited in
10. The non-transitory computer-readable storage medium according to
11. The non-transitory computer-readable storage medium according to
Priority is claimed to German Patent Application No. DE 102019107817.2, filed on Mar. 27, 2019, the entire disclosure of which is hereby incorporated by reference herein.
The invention relates to a method for simulating a dynamic system, a computer system and a non-transitory computer-readable storage medium.
Modern products, such as control units, are often developed with the aid of computer-based simulations of a dynamic system. On the one hand, it is possible to simulate the system controlled by a controller in order to test a component, such as a control unit, in a hardware-in-the-loop simulation prior to completion of the overall product. On the other hand, in rapid control prototyping, a prototype control unit can be used in a real environment for accelerated development of a control algorithm. In either case, special real-time computers are used that ensure a response within a predetermined maximum period of time. The system dynamics are often limited, so that, for example, a maximum latency in the range of milliseconds is appropriate and a working computer with a standard processor can be used.
However, for electrical and electronic systems, such as DC to DC voltage converters, inverters, smart grids or the control of an electric motor, the maximum permissible latency may be in the range of microseconds. In such cases, it is preferred to use a working computer having a programmable logic device, in particular a field programmable gate array (FPGA). It may also be provided that the control or a slower, higher-level closed-loop control be performed by a computing node having a processor that exchanges data with the programmable logic device. An FPGA is configured by reading a bit stream that directly influences the interconnection of the individual logic elements and thus determines the implemented circuit. This has the disadvantage that the creation of a configuration is complex and requires very time-consuming synthesis and implementation. It is therefore convenient to predefine a fixed FPGA configuration and to make adjustments only to the programming of the computing node.
A method for simulating electric circuits is described in the document entitled “A method for fast time-domain simulation of networks with switches,” Pejovic and Maksimovic, IEEE Transactions on Power Electronics, vol. 9, No. 4, July 1994. The system equations can be solved by a matrix multiplication, where the number and values of the individual elements of the matrix depend on the properties of the specific circuit. Different approaches to efficient matrix multiplication, which usually assume a sparse matrix, are known in the prior art. However, the matrices occurring in the simulation of electric circuits often do not fall into this category, so that the respective methods become inefficient.
In an exemplary embodiment, the present invention provides a method for computer-based simulation or control of a dynamic system using a computer. Behavior of the dynamic system corresponds to at least one system equation. The computer includes a programmable logic device. The method comprises: cyclically receiving, by the programmable logic device, at least one input signal; calculating, by the programmable logic device, at least one matrix multiplication; and outputting, by the programmable logic device, at least one output signal. A configuration of the programmable logic device includes: a parallel multiplication of blocks of at least two elements of a matrix by at least one input-signal-dependent element of a vector, and an adder tree for multiplication results. Successive blocks of the matrix are temporarily stored in a pipeline and processed sequentially. A target number of blocks and a target adder stage are determined based on a number and/or values of parameters of the at least one system equation. Processing of blocks for a current cycle is terminated based on the target number of blocks and the target adder stage being reached. The at least one output signal is determined based on a result of the target adder stage.
Embodiments of the present invention will be described in even greater detail below based on the exemplary figures. The present invention is not limited to the exemplary embodiments. All features described and/or illustrated herein can be used alone or combined in different combinations in embodiments of the present invention. The features and advantages of various embodiments of the present invention will become apparent by reading the following detailed description with reference to the attached drawings which illustrate the following:
Exemplary embodiments of the invention provide a method and a device for simulating a dynamic system. Exemplary embodiments of the invention are able to dynamically reduce the cycle time or step size for small systems.
In an exemplary embodiment, the invention provides a method for computer-based simulation or control of a dynamic system, particularly an electric circuit, where the behavior is described by at least one system equation and the computer includes a processor and a programmable logic device. The programmable logic device is configured to cyclically receive at least one input signal and calculate at least one matrix multiplication and output at least one output signal. The configuration of the programmable logic device includes a parallel multiplication of blocks of at least two elements of the matrix by at least one input-signal-dependent element of a vector, and an adder tree for the multiplication results, successive blocks of the matrix being temporarily stored in a pipeline and processed sequentially. Based on the number and/or the values of parameters of the at least one system equation, a target number of blocks and a target adder stage are determined, the programmable logic device terminating the processing of blocks for the current cycle as soon as the target number of blocks and the target adder stage are reached and determining at least one output signal based on the result of the target adder stage.
This method advantageously enables dynamic adjustment of the cycle time or a reduction in step size in the simulation of small electric circuits. For this purpose, use is made of the fact that, depending on the number of components, the matrix of the system equation(s) may contain many zeros that can be disregarded. In this connection, the configuration of the programmable logic device can be predefined as a fixed configuration and can thus be used for any matrices up to the given maximum dimensions.
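The following minimal sketch illustrates why the zeros can be disregarded. It is not taken from the described configuration; the helper names `matvec` and `pad` and the 6×4 maximum dimension are hypothetical and chosen only for illustration. A small matrix embedded in a zero-filled matrix of the maximum dimensions yields the same outputs as the small matrix alone, so only the occupied part needs to be calculated.

```python
def matvec(matrix, vector):
    """Plain dense matrix-vector product."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def pad(matrix, max_rows, max_cols):
    """Embed a small matrix in the top-left corner of a zero matrix."""
    padded = [[0.0] * max_cols for _ in range(max_rows)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            padded[i][j] = value
    return padded


# Hypothetical 3x2 system matrix and input vector.
small = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [10.0, 1.0]

# The fixed configuration is assumed to support up to 6x4 here.
full = pad(small, max_rows=6, max_cols=4)
x_full = x + [0.0, 0.0]

full_result = matvec(full, x_full)   # calculation over the maximum dimensions
reduced_result = matvec(small, x)    # calculation over the occupied part only
```

The first three entries of `full_result` equal `reduced_result`, and the remaining entries are zero, which is the property the early termination relies on.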
In an exemplary embodiment, the system equation is predefined via a graphical model, at least two elements of the matrix being defined by the system equation and/or the graphical model. The user may, for example, draw a diagram of the electric circuit to be simulated and specify circuit parameters. The elements of the matrix and thus also the target number of blocks as well as the target adder stage are determined based on an automatic analysis of the model according to methods known per se. The graphical representation provides a high degree of user comfort.
The data type of the input signal and/or the data type of the output signal and/or the data type of the elements of the matrix may be a floating-point type, such as, in particular, double. The method is applicable regardless of the data type and not limited to fixed-point types which are problematic in terms of accuracy and scaling.
In an exemplary embodiment, the attainment of the target number of blocks and/or the target adder stage is detected based on an abort signal. In an exemplary embodiment, the configuration is structured in different functional blocks, and the abort signal is then transmitted between individual functional blocks of the configuration. For example, the adder tree may constitute an independent functional block.
In an exemplary embodiment, the target number of blocks and/or the target adder stage and/or a size parameter are stored in an internal memory of the programmable logic device, and at least one functional block generates the abort signal in accordance with the stored value or values.
In an exemplary embodiment, the invention further provides a computer system including an operator computer having a human-machine interface and a working computer, the working computer including a processor and a programmable logic device and being adapted to perform an exemplary embodiment of the above-described method.
In an exemplary embodiment, the invention provides a non-transitory computer-readable storage medium having stored thereon a configuration that configures a programmable logic device to cyclically receive at least one input signal and calculate at least one matrix multiplication and output at least one output signal. The configuration includes a parallel multiplication of blocks of at least two elements of the matrix by at least one input-signal-dependent element of a vector, and an adder tree for the multiplication results, successive blocks of the matrix being temporarily stored in a pipeline and processed sequentially. The programmable logic device is further configured or adapted to receive at least one size parameter, to determine a target number of blocks and a target adder stage in accordance with the size parameter, to terminate the processing of blocks for the current cycle as soon as the target number of blocks and the target adder stage are reached, and to determine at least one output signal based on the result of the target adder stage.
In an exemplary embodiment, the programmable logic device is configured to store the elements of the matrix and/or the size parameter and/or the target number of blocks and/or the target adder stage in an internal memory.
In an exemplary embodiment, the computer-readable storage medium includes at least one non-volatile memory device that is electrically connected to a programmable logic device.
Exemplary embodiments of the invention will now be described in more detail with reference to the drawings, in which like parts are designated by the same reference numerals. The illustrated embodiments are highly schematic; i.e. the distances and dimensions are not true to scale and, unless indicated otherwise, do not have any derivable geometric relations to each other either.
Interface NET allows connection of additional computers, such as, in particular, a working computer ES. One or more interfaces of any type, in particular wired interfaces, may be provided on operator computer PC and may each be used for connecting to other computers. Conveniently, a network interface according to the Ethernet standard may be used, where at least the physical layer is implemented in compliance with the standard. One or more higher protocol layers may also be implemented in a proprietary manner or in a manner adapted to the process computer. Interface NET may also be implemented in a wireless form, such as, in particular, as a wireless local area network (WLAN) interface or according to a standard such as Bluetooth. In an exemplary embodiment, this may also be a mobile cellular connection, such as Long-Term Evolution (LTE), where the exchanged data may be encrypted. It is advantageous if at least one interface of the operator computer is implemented as a standard Ethernet interface, so that other computers and/or servers can easily be connected to operator computer PC.
Operator computer PC may have a secure data container SEC, which facilitates the use of licensed applications and also allows the operator computer to be used as a license server for the working computer. Secure data container SEC may be implemented in the form of a dongle, for example, that is connected, in particular, to a peripheral interface. Alternatively, provision may be made for a secure data container SEC to be permanently integrated as a component in the operator computer or to be stored in the form of a file on non-volatile data storage device HDD.
When implementing the matrix multiplication on an FPGA, the matrix can be divided into blocks in order to reduce the calculation time. For example, a plurality of rows (or also a portion of a row) may be grouped into a block, the elements of the block being processed together or in parallel. Parallel processing of as many elements as possible reduces the number of required clock cycles, but requires larger areas on the FPGA. Depending on the FPGA device or the number of available logic elements, the practically usable block size is limited. Successive blocks are processed sequentially, it being possible to achieve accelerated processing or minimum possible latency through pipelining. The addition of the values is conveniently accomplished by the structure of an adder tree. An adder tree has a large number of parallel adder elements of a first stage, two each of which are connected to an adder element of a second stage. Two each adder elements of the second stage are connected to an adder element of the third stage, and this structure is repeated in subsequent stages until all addition results are accumulated in an adder element of the highest stage. Thus, the output value of this adder element indicates the sum over all input values of the adder elements of the first stage.
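The adder tree and the effect of stopping at a target stage can be sketched in software as follows. This is an illustrative model only, not the FPGA implementation; the function name `adder_tree` and the parameter `target_stage` are assumptions introduced here. If only the first 8 of 32 inputs are non-zero, their sum is already available at the first adder element of the third stage, so the two remaining stages add nothing.

```python
def adder_tree(values, target_stage=None):
    """Add values pairwise, stage by stage.

    Returns the outputs of the last executed stage and the number of
    stages executed. If target_stage is given, the tree stops there.
    """
    level = list(values)
    stage = 0
    while len(level) > 1 and (target_stage is None or stage < target_stage):
        if len(level) % 2:
            level.append(0.0)  # pad an odd level so every adder has two inputs
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        stage += 1
    return level, stage


# 32 inputs, of which only the first 8 carry non-zero products.
inputs = [float(i) for i in range(1, 9)] + [0.0] * 24

full_level, full_stages = adder_tree(inputs)                      # all 5 stages
early_level, early_stages = adder_tree(inputs, target_stage=3)    # stop after stage 3
```

Here `early_level[0]` holds the same sum as `full_level[0]`, obtained after three instead of five stages.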
In order to calculate a block, the necessary matrix values are first read from the memory (matrix memory) and made available together with the vector data (input and initialization values) for the multiplication. The values are all read out in parallel within one clock cycle. When creating the configuration, the ARRAY_PARTITION pragma in VIVADO HLS (a high-level-language development environment for FPGA configurations), for example, may be used to configure the memory accordingly. This allows all multiplication operations required for the values of a block to occur in parallel. The final addition is performed using an adder tree. For efficient calculation of multiple blocks, pipelining may be used so that after each clock cycle, the calculation for a new block can be started.
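A software model of this block-wise calculation might look as follows. It is a sketch under stated assumptions: each block is assumed to consist of whole rows, the name `process_blocks` and its parameters are hypothetical, and the parallel hardware multipliers are modeled by ordinary sequential Python code.

```python
def process_blocks(matrix, vector, rows_per_block, target_blocks):
    """Process blocks one after another and stop after target_blocks blocks."""
    outputs = []
    for block_index in range(target_blocks):
        start = block_index * rows_per_block
        block = matrix[start:start + rows_per_block]
        # On the FPGA all products of one block are formed in parallel;
        # here they are computed in a nested comprehension.
        products = [[a * x for a, x in zip(row, vector)] for row in block]
        # Each row of products is then summed (by an adder tree in hardware).
        outputs.extend(sum(row_products) for row_products in products)
    return outputs


# Hypothetical 4x4 maximum dimension; only the top-left 2x2 part is occupied.
matrix = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
vector = [1.0, 1.0, 0.0, 0.0]

all_blocks = process_blocks(matrix, vector, rows_per_block=2, target_blocks=2)
first_block_only = process_blocks(matrix, vector, rows_per_block=2, target_blocks=1)
```

Stopping after the first block already delivers every non-zero output of this example.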
As shown in
In an embodiment, the large functional blocks, namely matrix memory, matrix multiplication, and adder tree, may conveniently be implemented in VIVADO HLS, and the activation of the abort mechanism may be accomplished by way of a self-programmed logic circuit in XSG. The functional blocks may be set up for pipelining using a pragma functionality available in VIVADO HLS by specifying the “PIPELINE” pragma to add additional register stages.
In the lower half of the scheme, the sequential transfer of data of successive blocks (BL. 1 through BL. 20) is shown against a clock signal (CLK). The pipelining allows the reading of the data for the second block to be started immediately after the reading of the data for the first block. The further procedure corresponds to that for the data of the first block.
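The timing benefit of the pipelining can be illustrated with a small scheduling model. The sketch assumes that one new block is issued per clock cycle and that every block needs the same number of cycles to pass through the pipeline; the name `pipeline_schedule` and the latency value are hypothetical and do not describe the actual configuration.

```python
def pipeline_schedule(num_blocks, latency):
    """Return (block, issue cycle, finish cycle) for each block.

    Block b is issued in cycle b and its result is available
    `latency` cycles later.
    """
    return [(b, b, b + latency) for b in range(num_blocks)]


num_blocks = 20   # e.g. BL. 1 through BL. 20
latency = 6       # hypothetical number of cycles per block

schedule = pipeline_schedule(num_blocks, latency)
pipelined_cycles = schedule[-1][2]           # finish cycle of the last block
sequential_cycles = num_blocks * latency     # without overlapping the blocks
```

In this model the pipelined total is `latency + (num_blocks - 1)` cycles, compared with `num_blocks * latency` cycles when each block waits for the previous one to finish.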
A first example of a circuit to be simulated is a single-phase three-level converter, which, in real applications, is used in the area of power electronics. In the following, it will be used as a basis for a more detailed description of the method presented here. Using the fixed nodal approach (FNA), a matrix having the dimensions 60×23 is created from the upper circuit, which matrix describes the dynamics and outputs of the circuit. In order to calculate the outputs, the matrix is multiplied by a vector that contains the input values and initialization values. An FPGA build is generated based on the matrix dimension 60×23. This FPGA build can then also be used for all other circuits whose matrices do not exceed these maximum dimensions.
A second example of a circuit to be simulated is a circuit for charging and discharging a capacitor, which is an example of a smaller circuit that can be simulated with shorter cycle times using a method according to an exemplary embodiment of the invention. From this circuit, a smaller matrix having the dimensions 15×8 can be generated in an automated manner, which matrix can be calculated using the same FPGA configuration.
In order to illustrate the effect of exemplary embodiments of the inventive method, several simulations were performed using a programmable logic device, namely a Xilinx Kintex-7 410T. The table below illustrates, by way of example, the improvements in cycle time that may be achieved according to the inventive method when using the same FPGA configuration for a simulation of the larger circuit and the smaller circuit:
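Maximum dimension | Type | #Rows | #Columns | #Blocks | #Stages | Clock cycles |
60 × 23 | old | 60 | 23 | 60 | 5 | 79 |
60 × 23 | old | 15 | 8 | 60 | 5 | 79 |
60 × 23 | new | 60 | 23 | 20 | 5 | 39 |
60 × 23 | new | 15 | 8 | 5 | 3 | 16 |
300 × 150 | old | 300 | 150 | 300 | 8 | 331 |
300 × 150 | old | 15 | 8 | 300 | 8 | 331 |
300 × 150 | new | 300 | 150 | 100 | 8 | 131 |
300 × 150 | new | 15 | 8 | 5 | 3 | 16 |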
The term “old” is used here to refer to a calculation according to the prior art: Based on fixed matrix dimensions of 60×23, the calculation for the entire matrix takes as long as for only a small portion having the dimensions 15×8, namely 79 clock cycles. The term “new” is used here to refer to a calculation including parallel processing of blocks and aborting after the required adder stage: By partitioning the matrix into blocks alone, only 39 cycles are needed, so the required time is approximately halved. By reducing the portions to be calculated based on the abort criteria, the cycle time is further reduced to 16 clock cycles. Thus, in the case of a configuration with a fixed maximum dimension of 60×23, the time required for the smaller matrix of the dimension 15×8 is reduced to 20% of the previously required time (“old”).
For even larger matrix dimensions, the performance increase is even more noticeable, as can be seen in the example of a matrix having the dimensions 300×150. A block-by-block parallel processing of blocks of 3 rows alone reduces the cycle time to 40% of the initial value. By aborting the calculations based on the required adder stage or an abort signal, the time required can here be reduced to less than 10% of the previously required time (“old”).
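The block and stage counts listed for the “new” calculation are consistent with a simple rule, which the following sketch assumes: the target number of blocks is the number of occupied rows divided by the rows per block, rounded up, and the target adder stage is the smallest number of stages whose pairwise additions cover all occupied columns. The text does not state this rule explicitly; the function name `targets` and its parameters are hypothetical.

```python
def targets(rows, cols, rows_per_block=3):
    """Return (target number of blocks, target adder stage).

    Assumed rule: ceil(rows / rows_per_block) blocks and
    ceil(log2(cols)) adder stages.
    """
    target_blocks = -(-rows // rows_per_block)   # ceiling division
    target_stage = (cols - 1).bit_length()       # ceil(log2(cols)) for cols >= 1
    return target_blocks, target_stage


large_converter = targets(60, 23)      # single-phase three-level converter
capacitor_circuit = targets(15, 8)     # capacitor charging/discharging circuit
largest_example = targets(300, 150)
```

With blocks of 3 rows, this rule gives 20 blocks and 5 stages for the 60×23 matrix, 5 blocks and 3 stages for the 15×8 matrix, and 100 blocks and 8 stages for the 300×150 matrix.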
The clock cycles required for the calculation can be determined by the equation
#clock cycles=A+(#blocks−1)+B*#adder stages,
where A and B are constants. The constants may depend on the type of the programmable logic device and, possibly, on the type of implementation. In the example shown, A is equal to 4 and B is equal to 4. In an exemplary embodiment, a simulation method includes automatic setting of the cycle time based on the required clock cycles. Thus, exemplary embodiments of the present invention allow smaller circuits to be simulated with significantly reduced cycle times without having to create a specialized FPGA configuration each time.
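The equation can be evaluated with a few lines of code. The sketch below is illustrative; the function name `clock_cycles` is hypothetical. One observation is worth noting: with B equal to 4, the clock cycles listed in the table (79, 39, 16, 331 and 131) are reproduced when the constant offset is zero, whereas inserting A equal to 4 literally yields values that are four cycles higher. How the offset is counted in the measured figures is not stated in the text, so both evaluations are shown.

```python
def clock_cycles(blocks, stages, a, b):
    """#clock cycles = A + (#blocks - 1) + B * #adder stages."""
    return a + (blocks - 1) + b * stages


# (blocks, stages) pairs and the clock cycles listed in the table.
table = {
    (60, 5): 79,
    (20, 5): 39,
    (5, 3): 16,
    (300, 8): 331,
    (100, 8): 131,
}

literal = {key: clock_cycles(*key, a=4, b=4) for key in table}
zero_offset = {key: clock_cycles(*key, a=0, b=4) for key in table}
```

In either reading, the cycle count grows by one per additional block and by B per additional adder stage, which is why reducing both the target number of blocks and the target adder stage shortens the cycle time.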
While embodiments of the invention have been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. It will be understood that changes and modifications may be made by those of ordinary skill within the scope of the following claims. In particular, the present invention covers further embodiments with any combination of features from different embodiments described above and below. Additionally, statements made herein characterizing the invention refer to an embodiment of the invention and not necessarily all embodiments.
The terms used in the claims should be construed to have the broadest reasonable interpretation consistent with the foregoing description. For example, the use of the article “a” or “the” in introducing an element should not be interpreted as being exclusive of a plurality of elements. Likewise, the recitation of “or” should be interpreted as being inclusive, such that the recitation of “A or B” is not exclusive of “A and B,” unless it is clear from the context or the foregoing description that only one of A and B is intended. Further, the recitation of “at least one of A, B and C” should be interpreted as one or more of a group of elements consisting of A, B and C, and should not be interpreted as requiring at least one of each of the listed elements A, B and C, regardless of whether A, B and C are related as categories or otherwise. Moreover, the recitation of “A, B and/or C” or “at least one of A, B or C” should be interpreted as including any singular entity from the listed elements, e.g., A, any subset from the listed elements, e.g., A and B, or the entire list of elements A, B and C.
Chandra, Vivien, Grunert, Philip
Patent | Priority | Assignee | Title |
10387122, | May 04 2018 | Olsen IP Reserve, LLC | Residue number matrix multiplier |
8626815, | Jul 14 2008 | Altera Corporation | Configuring a programmable integrated circuit device to perform matrix multiplication |
9558156, | Nov 24 2015 | International Business Machines Corporation | Sparse matrix multiplication using a single field programmable gate array module |
20180157465, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Feb 19 2020 | CHANDRA, VIVIEN | dspace digital signal processing and control engineering GmbH | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 052206 | /0795 | |
Feb 19 2020 | GRUNERT, PHILIP | dspace digital signal processing and control engineering GmbH | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 052206 | /0795 | |
Mar 24 2020 | dspace digital signal processing and control engineering GmbH | (assignment on the face of the patent) | / | |||
Nov 03 2021 | dspace digital signal processing and control engineering GmbH | dSPACE GmbH | CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 059704 | /0924 |