Methods and systems for simulating logic may translate logic design into executable code for a multi-processor based parallel logic simulation device. A system may implement one or more parallel execution methods, which may include IPMD, MPMD, and/or DDMT.
1. A computer-implemented method of preparing code for execution by a multi-core logic simulation system, the method comprising:
translating a target logic design from a high-level logic design language to an intermediate form comprising code lines, each code line including at least one logic operation and one or more data dependencies with respect to one or more other operations in the code lines;
translating the intermediate code into fixed-width instructions to be executed by core processors of said multi-core logic simulation system; and
constructing digital representation of said fixed-width instructions to be output to at least one digital device.
9. A non-transitory machine-readable medium containing software code that, when executed by a processor, causes the processor to implement a method of preparing code for execution by a multi-core logic simulation system, the computer-implemented method comprising:
translating a target logic design from a high-level logic design language to an intermediate form comprising code lines, each code line including at least one logic operation and one or more data dependencies with respect to one or more other operations in the code lines;
translating the intermediate code into fixed-width instructions to be executed by core processors of said multi-core logic simulation system;
and constructing digital representation of said fixed-width instructions to be output to at least one digital device.
7. A computer-implemented method of preparing code for execution by a multi-core logic simulation system, the method comprising:
translating a target logic design from a high-level logic design language to an intermediate form comprising code lines, each code line including at least one logic operation and one or more data dependencies with respect to one or more other operations in the code lines;
translating the intermediate code into fixed-width instructions to be executed by core processors of said multi-core logic simulation system; and
constructing a digital representation of said fixed-width instructions to be output to at least one digital device, wherein said translating the intermediate code comprises tailoring the fixed-width instructions for an execution model selected from the group consisting of: identical program multiple data (IPMD), multiple program multiple data (MPMD), and data-driven multi-threaded (DDMT).
14. A non-transitory machine-readable medium containing software code that when executed by a processor causes the processor to implement a method of preparing code for execution by a multi-core logic simulation system, the computer-implemented method comprising:
translating a target logic design from a high-level logic design language to an intermediate form comprising code lines, each code line including at least one logic operation and one or more data dependencies with respect to one or more other operations in the code lines; and
translating the intermediate code into fixed-width instructions to be executed by core processors of said multi-core logic simulation system;
and constructing a digital representation of said fixed-width instructions to be output to at least one digital device, wherein said translating the intermediate code comprises tailoring the fixed-width instructions for an execution model selected from the group consisting of: identical program multiple data (IPMD), multiple program multiple data (MPMD), and data-driven multi-threaded (DDMT).
2. The method according to
translating the high-level logic design language into a netlist; and
translating the netlist into said intermediate form.
3. The method according to
4. The method according to
using a library of standard logic to create said intermediate form from said high-level logic design language.
5. The method according to
optimizing the fixed-width instructions for execution by said multi-core logic simulation system.
6. The method according to
downloading software code to implement said translating a target logic design and said translating the intermediate code.
8. The method according to
receiving a user input to indicate an execution model to be used, said execution model selected from the group consisting of IPMD, MPMD, and DDMT.
10. The medium according to
translating the high-level logic design language into a netlist; and
translating the netlist into said intermediate form.
11. The medium according to
12. The medium according to
using a library of standard logic to create said intermediate form from said high-level logic design language.
13. The medium according to
optimizing the fixed-width instructions for execution by said multi-core logic simulation system.
15. The medium according to
receiving a user input to indicate an execution model to be used, said execution model selected from the group consisting of IPMD, MPMD, and DDMT.
This application claims the priority of U.S. Provisional Patent Application No. 60/866,517, filed on Nov. 20, 2006, and incorporated herein by reference.
Embodiments of the invention may address multi-core chip architectures that may be used for logic verification and associated methods for using such architectures.
Existing logic verification technology is mostly based on the use of field-programmable gate arrays (FPGAs), clusters of computers (e.g., PCs), or specially designed application-specific integrated circuit (ASIC) systems.
Current FPGA-based technologies usually try to map the target logic directly onto a group of FPGAs and to emulate the target system. This approach is not scalable and becomes extremely expensive as the complexity of the target logic increases. Also, the synthesizing process normally takes a long time, which makes this approach very inefficient at the early stages of chip logic development, when design changes occur very often. Furthermore, FPGAs are intrinsically much slower than custom-designed circuits.
The biggest problem with simulating complex chip logic on a PC cluster is low performance. The main hindering factors are instruction and data cache locality that is poorly suited to this type of simulation, inefficient communication channels, and operating system overhead.
Some companies have developed dedicated logic simulation machines with specially designed ASICs to accelerate the logic simulation process. Those systems are usually extremely expensive to develop and upgrade, and tend to be less flexible than other types of systems. The existing machines are generally not commercially available to outside users.
Various embodiments of the invention will now be described in conjunction with the attached drawings, in which:
An LVC 112 may comprise a logic verification core processor 1131 (which may be referred to below as “LP”), and may include local data memory to hold various associated components 1132, such as input, output, etc. The logic verification core processor may also include local instruction memory 1133 for the LVC to access for execution.
Under traditional event-driven simulation (e.g., CSim), events may be generated when logic cells (netlist design) or signal variables (RTL design) change their values. These events may be stored in an event queue and eventually consumed by the simulation engine to update affected logic cells (netlist design) or RTL processes (RTL design).
In contrast, in embodiments of the invention, the input logic design may be translated into a program composed of a set of primitive logic operations, which may be arranged in such a way that the dependencies between the operations in the original input are satisfied. This may be based, at least in part, on the principle that, no matter how complex a logic circuit is, it may be mapped to a group of primitive logic operations, such as AND, OR, MUX, etc.
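As a minimal sketch of this principle, a small circuit may be expressed as primitive operations and ordered so that each operation runs after the operations that feed it. The table layout, the names `PRIMITIVES`, `order_operations`, and `run`, and the MUX argument order are assumptions made for illustration and are not taken from the LVC compiler.

```python
from graphlib import TopologicalSorter

# Hypothetical primitive operations; MUX(sel, a, b) returns a when sel is set.
PRIMITIVES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "MUX": lambda sel, a, b: a if sel else b,
}


def order_operations(ops):
    """Return output names so that every operation follows its producers.

    ops maps an output name to (primitive name, list of input names).
    Inputs that are not produced by another operation are module inputs.
    """
    graph = {out: [i for i in ins if i in ops] for out, (_, ins) in ops.items()}
    return list(TopologicalSorter(graph).static_order())


def run(ops, inputs):
    """Evaluate all operations once, in dependency order."""
    values = dict(inputs)
    for out in order_operations(ops):
        prim, ins = ops[out]
        values[out] = PRIMITIVES[prim](*(values[i] for i in ins))
    return values


# y = sel ? (a AND b) : (a OR b), deliberately listed out of order.
ops = {
    "y": ("MUX", ["sel", "t1", "t2"]),
    "t1": ("AND", ["a", "b"]),
    "t2": ("OR", ["a", "b"]),
}
order = order_operations(ops)
result = run(ops, {"a": 1, "b": 0, "sel": 0})
```

Here the order is computed once, before simulation, so the simulation loop itself needs no event queue.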
Note that the LVC synthesizer 13 may be designed such that LVC IR 14 may be able to represent both the functional/applicative subset of the translated logic program and the associated non-functional/imperative parts. Optimizations may then be applied to increase simulation speed, reduce resource usage, and make trade-offs between these two, while generating the final logic programs that are to be mapped on the LVCs 112. This may be accomplished by LVC code generator 15, whose output may then be provided to an LVC chip 11.
In embodiments of the invention, a logic simulation may be converted for execution of the logic programs on logic processors. The LVC compiler (13-15) may be used to bridge the gap between target logic design source 12 and the LVC simulation hardware. The LVC compiling process may be divided into two stages: the “front end” handled by the LVC synthesizer 13 and the “back end” handled by the LVC code generator 15. The target logic design 12 may be written in any hardware description language (HDL) (Verilog, VHDL, etc.) and any code style (RTL or netlist). At the first stage, an LVC synthesizer 13, an embodiment of which is shown in further detail in
As shown in
As shown in
The LVC IR 14 may be thought of as a netlist composed of predefined primitive logic cells. The following is an example of an LVC IR 14:
Block ICache
1 ZDIMES_DBG_FRP0_A WIRE Inputs I:1:W28 Outputs W28 Width 28
2 ZDIMES_DBG_FRP1_A WIRE Inputs I:4:W28 Outputs W28 Width 28
...
307 MC.DG.ZGROUP_EDGE.ZQ6 AND Inputs C:285:W1 C:306:W1 Outputs W1 Width 1
308 MC.DG.ZGROUP_EDGE.ZQ7_1_1 CONST Inputs K:7 Outputs W3 Width 3
...
9952 ZX_TOP_HAVE GLUE Inputs C:7773:W1 C:7772:W1 C:7771:W1
C:7770:W1 C:7769:W1 C:7768:W1 C:7767:W1 C:7766:W1 C:7765:W1 C:7764:W1
Outputs W10 Width 10
Inputs
1 FRP0_A[27:0]
2 FRP0_HAVE
...
39 CLK
Outputs
1 TOP_D_0[511:0] C:9833:1
2 TOP_D_1[511:0] C:9834:1
...
30 MTB[3:0] C:870:1
This example is an LVC IR 14 that may represent a hypothetical instruction cache unit. The block in this example has 9952 nodes, each of which may correspond to a primitive logic cell. Every node may be represented by a one-line statement that may include a statement ID, statement name, logic operation type, input and output information, and width (in bits). The input information may define the type of the incoming source, which may be any one of three sources: a module input, a constant, or the output of another node. At the end of the LVC IR 14 definition, the module inputs and outputs may be defined. For the module outputs, the source of each output may be specified with an associated statement ID. Those statements may correspond to the nodes that have their outputs directly connected to the module outputs. The primitive logic cells may handle signals of variable width, whereas the LVC logic processors 112 may, in some embodiments, comprise fixed 8-bit processing units. The LVC code generator may therefore be needed to translate the primitive logic cells in the LVC IR 14 into a set of even more primitive fixed-width LVC instructions that may be executed by fixed-width logic processors.
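The statement layout and the fixed-width lowering may be sketched as follows. This is an illustration under stated assumptions, not the LVC code generator: the field order is inferred from the example statements above, the source prefix (I, K, or C in the example) is stored without interpretation, and `Node`, `parse_statement`, and `lanes` are hypothetical names.

```python
from dataclasses import dataclass

LANE_BITS = 8  # fixed processing width mentioned in the text


@dataclass
class Node:
    node_id: int
    name: str
    op: str
    inputs: list  # (source prefix, reference number, width or None)
    width: int


def parse_statement(line):
    """Parse one single-line statement of the form shown in the example."""
    head, rest = line.split(" Inputs ", 1)
    node_id, name, op = head.split()
    ins, tail = rest.split(" Outputs ", 1)
    width = int(tail.split("Width")[1])
    inputs = []
    for token in ins.split():
        parts = token.split(":")
        # Constant sources such as K:7 carry no width field.
        bits = int(parts[2][1:]) if len(parts) > 2 else None
        inputs.append((parts[0], int(parts[1]), bits))
    return Node(int(node_id), name, op, inputs, width)


def lanes(width):
    """Split a signal into (low bit, bit count) slices of at most 8 bits."""
    return [(lo, min(LANE_BITS, width - lo)) for lo in range(0, width, LANE_BITS)]


node = parse_statement("1 ZDIMES_DBG_FRP0_A WIRE Inputs I:1:W28 Outputs W28 Width 28")
slices = lanes(node.width)
```

Under these assumptions, the 28-bit WIRE node would be covered by three full 8-bit slices and one 4-bit slice, each of which could become one fixed-width instruction.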
Aspects of embodiments of the invention on LVC code generation may feature a new method for register allocation and instruction scheduling that departs from the traditional implementation in normal optimizing compilers for general purpose microprocessors. In logic verification simulations, there may simply be too many variables for the classical register allocation algorithm to work effectively. Heuristic approaches may be developed to reduce the compilation time without a significant increase in the demand for storage resources.
The LVC code generator 15 is the “back end” of the LVC logic compiler. It may translate the LVC IR 14 into the LVC executables that may be executed by multiple LVC logic processors 112. The LVC code generator 15 may generally be aware of the architectural features of the LVC logic processors 112. Those features may include the on-chip data memory size for each execution engine, the on-chip instruction memory size, and so on. LVC code generator 15 may try to schedule the logic instructions of the logic program so that the temporary storage needed during execution can fit into the on-chip memory of the LVC chip 11. The LVC code generator 15 may also generate debugging information at the same time for signal tracing support.
From the compiler's point of view, the LVC IR 14 may be thought of as a “basic block” composed of logic instructions (or nodes). These logic instructions may generally belong to either of two categories: combinatorial and sequential. The majority of the gates, such as AND, OR, DECODE, and so on, may be combinatorial, and signals may propagate through them in a certain order. The rest of the logic nodes in the LVC IR 14 may be registers (or other sequential instructions). They may retain their values during a simulation cycle until, for example, the next rising edge of the simulated clock, when they may be updated with new values. Given this observation, the LVC IR 14 may also be thought of as a directed acyclic graph (DAG), and the logic instructions may be scheduled to maintain the dependences the DAG imposes.
For example, data storage for the register class of instructions may need to be specially treated with double buffering, one buffer for an old value and one for a new value. The register buffer updating may generally take place between two simulation cycles. Finally, a separate storage space may be allocated for the inputs and the outputs of the “basic block”, so that their values can be used to check the simulation result or to communicate with other simulated modules.
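The double buffering of register values may be sketched as follows, assuming a hypothetical `RegisterFile` class and a two-stage shift register as the simulated logic. Reads during a cycle see only the old buffer, and the buffers are exchanged between cycles.

```python
class RegisterFile:
    """Double-buffered storage for sequential (register) nodes."""

    def __init__(self, names):
        self.old = {n: 0 for n in names}  # values visible during the cycle
        self.new = dict(self.old)         # values latched for the next cycle

    def read(self, name):
        return self.old[name]

    def write(self, name, value):
        self.new[name] = value

    def commit(self):
        """Update the visible values; called between two simulation cycles."""
        self.old = dict(self.new)


def cycle(regs, data_in):
    """Simulate one clock cycle of a shift register: r0 <- data_in, r1 <- r0."""
    regs.write("r0", data_in)
    regs.write("r1", regs.read("r0"))  # still reads the old value of r0
    regs.commit()


regs = RegisterFile(["r0", "r1"])
cycle(regs, 1)
after_first = (regs.read("r0"), regs.read("r1"))
cycle(regs, 0)
after_second = (regs.read("r0"), regs.read("r1"))
```

With a single buffer, the write to r0 would be visible to the update of r1 within the same cycle, and the bit would appear to cross both stages at once.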
The LVC cores 112 may be implemented by simple stack processors. The instruction set architecture (ISA) of the LVC cores 112 may be quite simple, in that it may employ a simplified instruction set compared to modern reduced instruction set computer (RISC) cores. For example, it may not be necessary to include operations on many data types (e.g., float types), nor many addressing modes. It may be supported by a very long instruction word (VLIW) structure that may be exploited by the LVC code generator 15 for multiple logic instruction issues.
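To illustrate how a simple stack processor may evaluate logic, the following sketch interprets a hypothetical five-opcode instruction set. The opcode names, the tuple encoding, and the 8-bit mask are assumptions for illustration and do not describe the actual LVC ISA.

```python
MASK = 0xFF  # assume 8-bit data paths


def run_stack_program(program, memory):
    """Execute (opcode, operand...) tuples against a named data memory."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":    # load a memory value onto the stack
            stack.append(memory[instr[1]] & MASK)
        elif op == "POP":   # store the top of the stack to memory
            memory[instr[1]] = stack.pop()
        elif op == "NOT":
            stack.append(~stack.pop() & MASK)
        elif op == "AND":
            b, a = stack.pop(), stack.pop()
            stack.append(a & b)
        elif op == "OR":
            b, a = stack.pop(), stack.pop()
            stack.append(a | b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory


# y = (a AND b) OR (NOT c)
memory = {"a": 0b1100, "b": 0b1010, "c": 0xFF, "y": 0}
program = [
    ("PUSH", "a"), ("PUSH", "b"), ("AND",),
    ("PUSH", "c"), ("NOT",), ("OR",),
    ("POP", "y"),
]
run_stack_program(program, memory)
```

Because operands are implicit on the stack, only PUSH and POP need an address field, which is one reason such an encoding can stay compact.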
The LVC chip architecture 11 may support three execution models: (1) the IPMD model; (2) the MPMD model; and (3) the DDMT model. These will be discussed further below. The LVC compiler may be directed, e.g., by a user, by a setting in the logic design code, or by some other means, to generate LVC code for one of these execution models.
Under an Identical Program Multiple Data (IPMD) execution model, a single copy of the program may be shared by all the LVC cores 112, and all LVC cores 112 may execute the program independently. This model may be particularly suitable for simulating an array of identical logic circuits and may be well-suited to simulating multiple cores in a multi-core chip. The repetitive functional units within a multi-core chip may be naturally mapped onto a group of LVC cores 112 that share the same target logic program.
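The IPMD idea may be sketched by running one shared instruction list against several private data memories. The three-address instruction format and the names `PROGRAM`, `OPS`, and `run_core` are hypothetical, and the cores are run one after another here rather than in parallel.

```python
# One shared program: (destination, operation, source 1, source 2).
PROGRAM = (
    ("t", "AND", "a", "b"),
    ("y", "XOR", "t", "c"),
)

OPS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}


def run_core(program, data):
    """Execute the shared program against one core's private data memory."""
    for dst, op, s1, s2 in program:
        data[dst] = OPS[op](data[s1], data[s2])
    return data


# Each simulated core holds the signals of one instance of the same circuit.
core_data = [
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 1},
    {"a": 0, "b": 0, "c": 0},
]
outputs = [run_core(PROGRAM, data)["y"] for data in core_data]
```

Only the data memories differ between cores, so instruction storage does not grow with the number of identical instances being simulated.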
Under a Multiple Program Multiple Data (MPMD) execution model, each LVC core 112 may execute its own copy of a program independently. The execution of the LVC cores 112 may be loosely-synchronized: the synchronization may be performed at properly placed barrier synchronizing points. At those synchronization points, interface signals may be exchanged between LVCs 112 to start the next simulation cycle. Under a Data-Driven Multithreaded (DDMT) execution model, each LVC may execute its own program. The execution of the sections of the program may be driven by “events”, which may correspond to data changes at the outputs of the primitive logic cells.
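The loosely synchronized MPMD execution may be sketched with one thread per simulated core and a barrier at each cycle boundary. The function `run_mpmd`, the representation of a program as a Python callable, and the two-phase barrier scheme are assumptions for illustration only.

```python
import threading


def run_mpmd(programs, init_signals, cycles):
    """Run one program per core, exchanging interface signals at barriers.

    programs[i] reads the shared interface signals from the previous cycle
    and returns the new value of signal i.
    """
    n = len(programs)
    barrier = threading.Barrier(n)
    signals = list(init_signals)  # interface signals visible to all cores
    pending = [None] * n          # per-core results for the current cycle

    def core(i):
        for _ in range(cycles):
            pending[i] = programs[i](signals)
            barrier.wait()            # every core has finished computing
            signals[i] = pending[i]   # publish this core's interface signal
            barrier.wait()            # every core has published

    threads = [threading.Thread(target=core, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return signals


# Core 0 inverts core 1's signal; core 1 copies core 0's signal.
programs = [lambda s: s[1] ^ 1, lambda s: s[0]]
final = run_mpmd(programs, [0, 0], cycles=3)
```

The two barrier waits separate the compute phase from the exchange phase, so no core reads a signal while another core is updating it.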
At the LVC chip level 11, embodiments of the invention may employ a multi-core architecture, which may use a shared memory organization, with or without relying on data caches. The explicit memory hierarchy may be exploited by the LVC code generator 15 to ensure that a local memory module of each core is best utilized by exploiting the locality in the LVC IR 14, by means of code partitioning.
As noted above, there may be three execution models (IPMD, MPMD, and DDMT) that may be chosen for simulation. The multi-core architecture of the LVC chip 11 may be adapted to accommodate these three execution models, as will be discussed in further detail below.
IPMD may be well-suited to simulate target logic with many repetitive logic modules. As shown in
The LVC chip 11 may also be configured to let different logic processors execute different logic programs. This may be useful when the target logic is partitioned in such a way that not all sub-modules are identical. Even though the instruction sequencer in the LVC chip 11 may be able to support generating multiple instruction streams, the number of the instruction streams may be limited by the number of read ports of the internal instruction RAM. Therefore, the instruction RAM may be designed to be a set of smaller size dual-port RAM blocks, as shown in
In the DDMT mode, the LP 1131 may execute the logic instructions generated from a node in the LVC IR 14 only when any of its inputs has changed. In a provisional study, using a simple RISC processor logic as an example, it was discovered that, on average, fewer than 10% of the gates in the processor's logic actually produced different outputs every cycle. Given this, the DDMT mode may avoid a large amount of unnecessary execution during simulation and may significantly improve simulation performance. As shown in
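The data-driven behavior may be sketched as follows: a node is evaluated only if one of its inputs changed, and a change propagates only if the node's output actually differs. The function `simulate_ddmt`, the node tuple layout, and the small example circuit are hypothetical and are not drawn from the LVC hardware.

```python
def simulate_ddmt(nodes, values, changed_inputs):
    """Re-evaluate only nodes whose inputs changed; return the evaluation count.

    nodes is a list of (name, function, input names) in dependency order,
    and values holds the current value of every input and node.
    """
    changed = set(changed_inputs)
    evaluated = 0
    for name, fn, ins in nodes:
        if not any(i in changed for i in ins):
            continue  # no input changed, so the stored output is still valid
        evaluated += 1
        new = fn(*(values[i] for i in ins))
        if new != values[name]:
            values[name] = new
            changed.add(name)  # wake up the nodes fed by this output
    return evaluated


nodes = [
    ("n1", lambda a, b: a & b, ["a", "b"]),
    ("n2", lambda n1, c: n1 | c, ["n1", "c"]),
    ("n3", lambda c: c ^ 1, ["c"]),
]
# A consistent starting state for a=1, b=1, c=0.
values = {"a": 1, "b": 1, "c": 0, "n1": 1, "n2": 1, "n3": 1}

values["a"] = 0
first = simulate_ddmt(nodes, values, ["a"])   # n1 and n2 change; n3 is skipped
values["b"] = 0
second = simulate_ddmt(nodes, values, ["b"])  # n1 is checked but does not change
```

In the second step only one of the three nodes is evaluated, which is the kind of saving the study figure above suggests.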
Some of the functional blocks of the LVC chip 11, configured as an IPMD chip, are shown in
Various embodiments of the invention may comprise hardware, software, and/or firmware.
Various embodiments of the invention have now been discussed in detail; however, the invention should not be understood as being limited to these embodiments. It should also be appreciated that various modifications, adaptations, and alternative embodiments thereof may be made within the scope and spirit of the present invention.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Nov 09 2007 | ET International, Inc. | (assignment on the face of the patent) | / | |||
Jan 04 2008 | CHEN, FEI | ET INTERNATIONAL, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 020475 | /0568 | |
Jan 04 2008 | GAO, GUANG R | ET INTERNATIONAL, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 020475 | /0568 |
Date | Maintenance Fee Events |
Oct 27 2014 | M2551: Payment of Maintenance Fee, 4th Yr, Small Entity. |
Dec 17 2018 | REM: Maintenance Fee Reminder Mailed. |
Jun 03 2019 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |