A method and apparatus for creating a representation of a circuit with a pipelined loop from an HDL source code description. The method infers a circuit including a pipelined loop whose cycle level simulation behavior matches that of the source HDL. Loop carry dependencies and memory and signal I/O accesses within the loop are scheduled correctly.

Patent
   RE40925
Priority
May 12 1995
Filed
Jun 08 2000
Issued
Sep 29 2009
Expiry
May 12 2015
Status
   EXPIRED
1. A method performed by a data processing system having a memory, comprising the steps of:
parsing a text description of a circuit, said text description stored in the memory, said text description including a loop with a delayed signal assignment having a delay value;
translating said text description into a digital circuit representation in said memory, said digital circuit representation including a pipeline; and
setting a latency of said pipeline equal to said delay value.
0. 26. A method performed by a data processing system having a memory, comprising the steps of:
parsing a text description of a circuit, said text description stored in the memory, said text description including a loop with N wait statements, where N is greater than zero;
translating said text description into a digital circuit representation in said memory, said digital circuit representation including a pipeline; and
setting an initiation interval of said pipeline equal to N.
0. 38. A method performed by a data processing system having a memory, comprising the steps of:
parsing a text description of a circuit, said text description stored in the memory, said text description including a loop with N clock statements, where N is greater than zero;
translating said text description into a digital circuit representation in said memory, said digital circuit representation including a pipeline; and
setting an initiation interval of said pipeline equal to N.
0. 30. A system for building, in a memory, a digital circuit representation which implements the behavior of a text description in said memory, said system having a processor coupled to a memory unit wherein said processor is programmed to perform logic processing, said system comprising:
parsing logic for parsing said text description into a parsed text description, said text description including a loop with N wait statements, where N is greater than zero;
translating logic for translating said parsed text description into said digital circuit representation, said digital circuit including a pipeline; and
initiation interval setting logic for setting an initiation interval of said pipeline equal to N.
0. 39. A system for building, in a memory, a digital circuit representation which implements the behavior of a text description in said memory, said system having a processor coupled to a memory unit wherein said processor is programmed to perform logic processing, said system comprising:
parsing logic for parsing said text description into a parsed text description, said text description including a loop with N clock statements, where N is greater than zero;
translating logic for translating said parsed text description into said digital circuit representation, said digital circuit including a pipeline; and
initiation interval setting logic for setting an initiation interval of said pipeline equal to N.
18. A system for building, in a memory, a digital circuit representation which implements the behavior of a text description in said memory, said system having a processor coupled to a memory unit wherein said processor is programmed to perform logic processing, said system comprising:
parsing logic for parsing said text description into a parsed text description, said text description including a loop with a delayed signal assignment having a delay value;
translating logic for translating said parsed text description into said digital circuit representation, said digital circuit including a pipeline; and
latency setting logic for setting a latency value of said pipeline to be said delay value of said delayed signal assignment.
7. A method, performed by a data processing system having a memory, of building a digital circuit representation including a pipeline in the memory from a textual description of a loop, comprising the steps of:
identifying a loop carry dependency in said loop;
identifying a producer operation of said loop carry dependency;
identifying a consumer operation of said loop carry dependency;
determining a number, n, of cycles within which said producer operation must be scheduled after said consumer operation;
instantiating a placeholder node in said memory;
node-locking said placeholder node so that it must be scheduled n cycles after said consumer operation; and
constraining said producer operation to be scheduled before said placeholder node.
0. 34. A computer program product comprising a computer usable medium having computer readable code embodied therein for building a digital circuit representation from a text description of a digital circuit, the computer program product comprising:
computer readable program code devices configured to cause a computer to effect parsing said text description, said text description including a loop with N wait statements, where N is greater than zero;
computer readable program code devices configured to cause a computer to effect translating said text description into said digital circuit representation including a pipeline; and
computer readable program code devices configured to cause a computer to effect setting an initiation interval of said pipeline equal to N.
0. 40. A computer program product comprising a computer usable medium having computer readable code embodied therein for building a digital circuit representation from a text description of a digital circuit, the computer program product comprising:
computer readable program code devices configured to cause a computer to effect parsing said text description, said text description including a loop with N clock statements, where N is greater than zero;
computer readable program code devices configured to cause a computer to effect translating said text description into said digital circuit representation including a pipeline; and
computer readable program code devices configured to cause a computer to effect setting an initiation interval of said pipeline equal to N.
21. A computer program product comprising:
a computer usable medium having computer readable code embodied therein for building a digital circuit representation from a text description of a digital circuit, the computer program product comprising:
computer readable program code devices configured to cause a computer to effect parsing said text description, said text description including a loop with a delayed signal assignment having a delay value;
computer readable program code devices configured to cause a computer to effect translating said text description into said digital circuit representation including a pipeline; and
computer readable program code devices configured to cause a computer to effect setting a latency of said pipeline equal to said delay value.
11. A method, performed by a data processing system having a memory, of building a digital circuit representation in said memory, said digital circuit representation including a pipeline derived from a textual description of a loop, said method comprising the steps of:
identifying an access dependency of said loop;
identifying a first access operation of said access dependency;
identifying a second access operation of said access dependency;
determining a number, n, of cycles within which said second access operation must be scheduled after said first access operation;
instantiating a placeholder node in said memory;
node-locking said placeholder node so that it must be scheduled n cycles after said first access operation; and
constraining a scheduling order of said second access operation and said placeholder node.
2. The method of claim 1, wherein said loop further includes N wait statements, where N is greater than zero, said method further comprising the step of setting an initiation interval of said pipeline equal to N.
3. The method of claim 1, wherein said text description is written in Verilog and said delayed signal assignment uses a Verilog “#” operator.
4. The method of claim [3] 2, wherein said wait statements [use] transition on Verilog “@posedge” statements.
5. The method of claim [3] 2, wherein said wait statements [use] transition on Verilog “@negedge” statements.
6. The method of claim [of claim 1] 2, wherein said text description is written in VHDL, said delayed signal assignment uses a VHDL “after” clause, and said wait statements use VHDL “wait” statements.
8. The method of claim 7, wherein the step of node-locking said placeholder node further comprises the step of creating a template structure in said memory which includes said placeholder node and said consumer operation.
9. The method of claim 8,
wherein said producer operation is included in a second template structure in said memory, and
wherein the step of constraining said producer operation further comprises the step of constraining said second template structure to be scheduled before said template structure.
10. The method of claim 7, wherein n is equal to an initiation interval of said pipeline multiplied by a number of iterations of said loop which execute before data produced by said producer operation is consumed by said consumer operation.
12. The method of claim 11,
wherein said first access operation is chosen from [the] a group of access operations including a memory read, a memory write, a signal write and a port write,
said second access operation is chosen from the group of access operations including a memory read, a memory write, a signal read, a signal write, a port read and a port write, and
the step of constraining said scheduling order of said second access operation and said placeholder node further includes the step of forcing said second access operation to be scheduled before said placeholder node.
13. The method of claim 11,
wherein said first access operation is chosen from [the] a group of access operations including a memory read, a memory write, a signal read, a signal write, a port read and a port write,
said second access operation is chosen from the group of access operations including a memory read, a memory write, a signal write and a port write, and
the step of constraining said scheduling order of said second access operation and said placeholder node further includes the step of forcing said second access operation to be scheduled before said placeholder node.
14. The method of claim 11,
wherein said first access operation is chosen from [the] a group of access operations including a signal read and a port read,
said second access operation is chosen from the group of access operations including a signal read and a port read, and
the step of constraining said scheduling order of said second access operation and said placeholder node further includes the step of forcing said second access operation to be scheduled simultaneous with, or before said placeholder node.
15. The method of claim 11, wherein the step of constraining said scheduling order of said second access operation and said placeholder node further includes the step of forcing said second access operation to be scheduled before said placeholder node.
16. The method of claim 11, wherein the step of node-locking said placeholder node further includes the step of creating a template which includes said placeholder node and said first access operation.
17. The method of claim 11, wherein n is equal to an initiation interval of said pipeline multiplied by a number of iterations of said loop which execute between said first access operation and said second access operation.
19. A system as described in claim 18, wherein said pipeline implements said loop.
20. A system as described in claim 19, wherein said loop further includes a number, n, of wait statements, said system further comprising initiation interval setting logic for setting an initiation interval of said pipeline to be equal to n.
22. The computer program product of claim 1 wherein said loop further includes N wait statements, where N is greater than zero, said computer program product further comprising computer readable program code devices configured to cause a computer to effect setting an initiation interval of said pipeline equal to N.
0. 23. The method of claim 1, wherein said loop further includes N clock statements, where N is greater than zero, said method further comprising the step of setting an initiation interval of said pipeline equal to N.
0. 24. A system as described in claim 18, wherein said loop further includes a number, n, of clock statements, said system further comprising initiation interval setting logic for setting an initiation interval of said pipeline to be equal to n.
0. 25. The computer program product of claim 21 wherein said loop further includes N clock statements, where N is greater than zero, said computer program product further comprising computer readable program code devices configured to cause a computer to effect setting an initiation interval of said pipeline equal to N.
0. 27. The method of claim 26, wherein the wait statements are VHDL wait statements.
0. 28. The method of claim 26, wherein the wait statements transition on Verilog HDL @posedge statements.
0. 29. The method of claim 26, wherein the wait statements transition on Verilog HDL @negedge statements.
0. 31. The system of claim 30, wherein the wait statements are VHDL wait statements.
0. 32. The system of claim 30, wherein the wait statements transition on Verilog HDL @posedge statements.
0. 33. The system of claim 30, wherein the wait statements transition on Verilog HDL @negedge statements.
0. 35. The method of claim 34, wherein the wait statements are VHDL wait statements.
0. 36. The method of claim 34, wherein the wait statements transition on Verilog HDL @posedge statements.
0. 37. The method of claim 34, wherein the wait statements transition on Verilog HDL @negedge statements.

This application is related to U.S. patent application Ser. No. 08/440,101 entitled “Behavioral Synthesis Links to Logic Synthesis” with inventors Ronald A. Miller, Donald B. MacMillen, Tai A. Ly and David W. Knapp filed on May 12, 1995, which is hereby incorporated by reference.

FIG. 29 depicts template examples: (a) T1={(a,0) (b,1) (c,2) (d,3) (e,5)}; (b) T2={(f,0) (h,5)}; (c) T3={(f,0) (a,1) (g,2) (b,2) (c,3) (d,4) (h,5) (e,6)}.

This statement includes a delay clause (“#24”) indicating that a delay of twenty-four time units, e.g., nanoseconds, should pass before the write operation is performed by the circuit that is to be generated. The delay clause is an example of delayed signal assignment information. Note that the inclusion of the delay clause in the HDL indicates a delay of the write operation only. The delay clause does not cause a delay in the performance of the subtraction operation. Similarly, in FIG. 19(b), the VHDL source code includes a signal assignment statement:
c <= transport s - p after 24 ns;

This statement also contains a delay clause (“after 24 ns”) indicating that a delay of twenty-four time units should occur in the generated circuit before the write operation is performed. This delay clause is a further example of delayed signal assignment information.

A circuit loop generated from the HDL source code of FIG. 19(a) and FIG. 19(b) will have an initiation interval of “2” because each source code example has two “wait” (or “posedge” or “negedge”) statements within the loop. As discussed below, the delay clause in the source code causes the resulting loop to have a loop latency of “4”. FIG. 19(a) and FIG. 19(b) are included for the purpose of example only. The present invention can use any appropriate type of source code (VHDL, Verilog, etc.) to represent a delay clause.
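
As a worked illustration of these two quantities (a sketch only; the helper name below is invented, and the 6 ns clock period is taken from the 24/6 example used later in the description):

    # Derive the pipeline parameters of the FIG. 19 loop from its source-level
    # properties: each wait/posedge/negedge statement opens a new cycle, and the
    # delayed write lands delay/clock_period cycles after the cstep that issues it.
    def pipeline_parameters(num_wait_statements, delay_time_units, clock_period,
                            write_cstep=0):
        initiation_interval = num_wait_statements
        loop_latency = write_cstep + delay_time_units // clock_period
        return initiation_interval, loop_latency

    ii, latency = pipeline_parameters(num_wait_statements=2,
                                      delay_time_units=24,
                                      clock_period=6)
    print(ii, latency)  # 2 4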

FIG. 20 is a flowchart showing steps performed during translation step 810 of FIG. 6 to generate a cdb. The exact placement of the steps of FIG. 20 is not a part of the present invention; the steps can also be performed, for example, in the preprocessing step 820 of FIG. 6. The input to FIG. 20 is a representation of one of the source code examples of FIG. 19(a) and FIG. 19(b), such as a parse tree generated from the source code. The steps of FIG. 20 are performed for each statement in the source code. The output of the translation step 810 and FIG. 20 is a data flow graph (a “Gtech circuit”) and a control flow graph (a “control data base” (cdb)). It will be understood by persons of ordinary skill in the art that the steps of FIG. 20 and FIG. 23 are performed by processor 109 of FIG. 5 executing instructions stored in memory 104 of FIG. 5.

In step 2002, the processor determines whether the current source code statement is a signal assignment statement (e.g., an assignment to a port using the “<=” operator) that includes a delay clause (e.g., “#24” in Verilog or “after 24 ns” in VHDL). If not, the processor performs standard processing on the statement to build a node in the data flow graph. If the current source code statement does include a delay clause, then, in step 2004, the processor builds a write operation node in the data flow graph and annotates the node with an attribute carrying the delayed signal assignment information, indicating that the write operation corresponding to the write operation node has a delay of, e.g., 24 nanoseconds (see node 2114 of FIG. 21 and FIG. 22).
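
A minimal sketch of this decision, assuming a simple dictionary-based statement and node representation (the field names here are illustrative, not the patent's actual data structures):

    # Build a data flow node for a signal assignment statement, carrying a
    # delayed-signal-assignment attribute when a delay clause is present.
    def build_dataflow_node(statement):
        node = {"kind": "write_op", "target": statement["target"]}
        if "delay_time_units" in statement:     # e.g. "#24" or "after 24 ns"
            # Step 2004: annotate the write operation node with the delay.
            node["delay_time_units"] = statement["delay_time_units"]
        return node

    node = build_dataflow_node({"target": "c", "delay_time_units": 24})
    # -> {'kind': 'write_op', 'target': 'c', 'delay_time_units': 24}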

FIG. 21 shows an example of a data flow graph 2100 generated from one of the source code examples of FIG. 19(a) and FIG. 19(b) in accordance with the steps of FIG. 20. A representation of data flow graph 2100 is stored in memory 104. Data flow graph 2100 includes as inputs a port x, a register p, and ports y and z. Each port has zero or more read operation nodes (“read op”) 2102, 2104, 2106 associated therewith, and each read operation node has an attribute indicating a port name (e.g., “port=‘x’”). Respective ones of the inputs are input to a subtracter node 2110 and an adder node 2112. Subtracter node 2110 is connected to a write operation node 2114. Adder node 2112 is connected to a variable assignment node 2116. Output p′ is input as p during successive iterations of the loop. Thus, the data flow graph of FIG. 21 has seven nodes representing the data flow in the circuit to be synthesized.
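
The same graph can be sketched as plain records (the port-to-node pairings are illustrative and the edge list is omitted; the figure itself specifies the exact fan-in of the subtracter and adder):

    # The seven nodes of data flow graph 2100, keyed by their reference numerals.
    dfg_2100 = {
        2102: {"kind": "read_op", "port": "x"},      # port assignments illustrative
        2104: {"kind": "read_op", "port": "y"},
        2106: {"kind": "read_op", "port": "z"},
        2110: {"kind": "subtract"},                  # feeds write op 2114
        2112: {"kind": "add"},                       # feeds variable assignment 2116
        2114: {"kind": "write_op", "port": "c", "delay_time_units": 24},
        2116: {"kind": "var_assign", "target": "p"}, # p' fed back as p next iteration
    }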

In step 2008 of FIG. 20, if there are more statements in the source code, control returns to step 2002. If all statements have been processed and a data flow graph (including signal delay attributes) has been generated for the source code, control passes to step 2012, where a control flow graph, such as that in FIG. 22, is created.

Control graph 2200 of FIG. 22 adds control information to nodes 2102, 2104, 2106, 2110, 2112, 2114, and 2116 indicating the order and conditions under which the data flow nodes are executed in the synthesized circuit. A representation of control graph 2200 is stored in memory 104 of FIG. 5. The present invention preferably operates in a “cycle fixed mode” in which each “wait” (or “posedge” or “negedge”) statement in the source code indicates a new cycle in the synthesized circuit. Various processes for generating control flow graphs are known to persons of ordinary skill in the art and are described in High-Level Synthesis by Gajski et al.

In FIG. 22, cnodes are used as “placeholder” nodes in the control graph to represent a collection of data flow nodes. Thus, cnode 2202 is associated with write operation node 2114 (including the signal delay attribute), read operation node 2102, and subtracter node 2110. The wait nodes in FIG. 22 are used to represent the transitions between each cycle (or “cstep”). A wait node 2204 is used to mark the transition between the first cstep (cstep 0) and the second cstep (cstep 1). Wait node 2204 also has attributes indicating that it is based on a rising clock edge (due to the “posedge” statement in the source code). “Wait” statements (in VHDL source code) are treated similarly. Cnode 2206 (located in the second cstep) is associated with variable assignment node 2116, read operation node 2104, read operation node 2106, and adder node 2112. The control graph also includes a second wait node 2208 and a third cnode 2210.
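
A sketch of the same control data base as records (assuming the cnode preceding wait node 2204 is numbered 2202; the contents of the third cnode are not given in this excerpt):

    # Control graph 2200: cnodes group data flow nodes, wait nodes separate csteps.
    cdb_2200 = [
        {"id": 2202, "kind": "cnode", "dfg_nodes": [2114, 2102, 2110]},        # cstep 0
        {"id": 2204, "kind": "wait",  "edge": "rising"},                       # -> cstep 1
        {"id": 2206, "kind": "cnode", "dfg_nodes": [2116, 2104, 2106, 2112]},  # cstep 1
        {"id": 2208, "kind": "wait",  "edge": "rising"},                       # -> next iteration
        {"id": 2210, "kind": "cnode", "dfg_nodes": []},                        # contents not shown
    ]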

As shown in FIG. 7, the control flow graph is input to step 920, where a control data flow graph (CDFG) is created. The general procedure for creating a conventional CDFG is known to persons of ordinary skill in the art and is described in High-Level Synthesis by Gajski et al. FIG. 23 shows certain details of the process of creating a CDFG that relate to the delay clause of the present invention. An example CDFG is shown in FIG. 24. The steps of FIG. 23 are performed for each loop in the control flow graph. In step 2302, the processor sets a Wait_count variable and a Max_wait_count variable in the memory 104 to an initial value of “0”. In step 2304, the processor builds a “loop begin” node in the CDFG and assigns to it a cstep attribute value equal to “0”.

Step 2306 is a first step in a loop performed by the processor for each cdb node. In step 2308, if the current cdb node is a cnode, control passes to step 2310, which is a first step in a loop performed for all data flow nodes associated with the current cdb node. In step 2312, if a current data flow node is a write operation node having a delay clause (i.e., if the current data flow node represents a delayed signal assignment), control passes to step 2322.

In step 2322, a temp_wait_count variable is set to the current value of Wait_count plus the number of delay time units in the delayed signal assignment divided by the clock period (e.g., 0 + 24/6 = 4). A CDFG node is created and assigned to cstep temp_wait_count in step 2324. In step 2326, if temp_wait_count is greater than Max_wait_count, then, in step 2328, Max_wait_count is set equal to temp_wait_count. Otherwise, control passes to step 2342. If, in step 2342, there are more data flow nodes associated with the current cdb node, then control passes to step 2310. Otherwise, control passes to step 2336.

If, in step 2312, the current data flow node is not a delayed signal assignment, the processor builds a standard CDFG node in step 2314 and assigns the created data flow node to cstep wait_count in step 2316. If, in step 2318, wait_count is greater than Max_wait_count, then Max_wait_count is set equal to wait_count in step 2320. Control next passes to step 2342.

If, in step 2308, the current cdb node is not a cnode, then control passes to step 2330. If, in step 2330, the current cdb node is a wait node, then wait_count is incremented in step 2332 and control passes to step 2336. If, in step 2330, the current cdb node is not a wait node, then regular processing is performed to create a CDFG node in step 2334 and control passes to step 2336.

In step 2336, if there are more cdb nodes to process, then control passes to step 2306. Otherwise, in step 2338, a loop_latency variable in memory 104 for the loop is set equal to Max_wait_count and an initiation interval variable for the loop is set equal to wait_count. In step 2340, the processor builds a “loop end” node in the CDFG and assigns it to cstep wait_count.
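
The cstep-assignment logic of FIG. 23 can be condensed into the following sketch (a simplified reading of the flowchart using the record formats from the earlier sketches; regular processing of other cdb node kinds and the data flow arcs are omitted):

    def build_cdfg_for_loop(cdb_nodes, dfg, clock_period):
        """Assign csteps to data flow nodes and derive the loop's initiation
        interval and latency (simplified rendering of FIG. 23)."""
        wait_count = 0                                        # step 2302
        max_wait_count = 0
        cdfg = [{"kind": "loop_begin", "cstep": 0}]           # step 2304

        for cdb_node in cdb_nodes:                            # steps 2306/2308
            if cdb_node["kind"] == "cnode":
                for node_id in cdb_node["dfg_nodes"]:         # step 2310
                    node = dfg[node_id]
                    if "delay_time_units" in node:            # step 2312: delayed write
                        temp = wait_count + node["delay_time_units"] // clock_period
                        cdfg.append({"dfg_node": node_id, "cstep": temp})   # step 2324
                        max_wait_count = max(max_wait_count, temp)          # steps 2326/2328
                    else:                                     # steps 2314/2316
                        cdfg.append({"dfg_node": node_id, "cstep": wait_count})
                        max_wait_count = max(max_wait_count, wait_count)    # steps 2318/2320
            elif cdb_node["kind"] == "wait":
                wait_count += 1                               # step 2332
            # other cdb node kinds: regular CDFG node creation (step 2334), omitted

        loop_latency = max_wait_count                         # step 2338
        initiation_interval = wait_count
        cdfg.append({"kind": "loop_end", "cstep": wait_count})  # step 2340
        return cdfg, initiation_interval, loop_latency

    # With the dfg_2100 / cdb_2200 sketches above and a 6 ns clock,
    # build_cdfg_for_loop(cdb_2200, dfg_2100, 6) yields an initiation interval
    # of 2 and a loop latency of 4, matching the FIG. 19 example.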

The output of step 920 of FIG. 7 is input to the scheduler, which uses the CDFG and the loop initiation interval and loop latency to schedule the nodes of the circuit being generated. In the described embodiment, all nodes except read/write operation nodes can “float”, i.e., can be moved between csteps by the scheduler to allow it to create an efficient circuit design. In the CDFG, these nodes are always assigned a cstep value equal to the initial cstep in which they appear in the HDL, as a “suggestion” to the scheduler. It will be understood by persons of ordinary skill in the art that the CDFG of FIG. 24 has been simplified for the sake of example and that the CDFG also includes, e.g., data flow arcs connecting the CDFG nodes that represent data flows in a similar manner to the data flows of FIG. 21.
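
A small sketch of that distinction, under the same illustrative node format (the predicate name is invented):

    # Read and write operation nodes keep their assigned csteps; all other CDFG
    # nodes may "float", their initial cstep serving only as a scheduling hint.
    def is_floatable(dfg_node):
        return dfg_node["kind"] not in ("read_op", "write_op")

    # e.g. is_floatable(dfg_2100[2110]) -> True  (subtracter may move between csteps)
    #      is_floatable(dfg_2100[2114]) -> False (delayed write stays in cstep 4)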

FIG. 14 shows an example circuit synthesized from the CDFG of FIG. 24. FIG. 25 shows an example of placement of CDFG nodes in csteps without and with use of the delay clause. In the left column, which represents the CDFG without the delay clause, CDFG nodes corresponding to write operation node 2114, read operation node 2102, and subtracter node 2110 are assigned to cstep 0. Similarly, CDFG nodes corresponding to adder node 2112, read operation node 2104, read operation node 2106, and assignment node 2116 (and a CDFG loop_end node) are assigned to a second cstep 1. Generation of this CDFG representation causes the synthesizer to generate a circuit whose timing characteristics differ from those of the circuit generated when the source code includes a delay clause. The right column of FIG. 25 shows the assignment of CDFG nodes to cycles in accordance with the present invention. In this example, the CDFG node corresponding to write operation node 2114 is moved into cstep 4 during the steps of FIG. 23. This modification of the process of generating the CDFG (made possible by the addition of a signal delay attribute to data flow graph 2100) allows the synthesis process to generate a circuit whose cycle level simulation behavior is substantially identical to the cycle level simulation behavior of the source HDL.
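
The difference can be tabulated as follows (a sketch; only the movement of the write operation node is stated in the description, so the placement of the remaining nodes in the right column is assumed unchanged):

    # cstep placement per FIG. 25, using the node numerals of FIGS. 21/22.
    without_delay_clause = {
        0: [2114, 2102, 2110],                    # write, read, subtract
        1: [2112, 2104, 2106, 2116, "loop_end"],  # add, reads, assign, loop end
    }
    with_delay_clause = {
        0: [2102, 2110],                          # write no longer in cstep 0
        1: [2112, 2104, 2106, 2116, "loop_end"],
        4: [2114],                                # delayed write: 0 + 24/6 = 4
    }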

FIG. 26 shows an example of loop pipelining when the present invention is used. The figure shows an nth iteration of the loop and an (n+1)st iteration of the loop over time. As can be seen in the figure, the initiation interval of successive iterations of the loop is equal to the number of wait statements (or “posedge” or “negedge” statements). The loop latency is equal to the longest cycle delay from the beginning of the loop to a latest operation. The throughput of the pipelined loop is not decreased by the use of delayed signal assignments. In general, the scheduler will schedule a circuit having the CDFG of FIG. 24 as a pipelined circuit because the loop latency is longer than the initiation interval.
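
A brief worked example of that overlap (a sketch; the function name is invented):

    # Start cycle of each loop iteration in the pipelined circuit.
    def iteration_start_cycle(n, initiation_interval):
        return n * initiation_interval

    # With initiation interval 2 and loop latency 4, iteration n occupies cycles
    # 2n .. 2n + 4, so iteration n+1 (starting at cycle 2n + 2) begins while
    # iteration n is still completing its delayed write: the loop is pipelined.
    print(iteration_start_cycle(0, 2), iteration_start_cycle(1, 2))  # 0 2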

In summary, the use of delayed signal assignments allows behavioral synthesis to infer circuits with pipelined loops whose cycle level simulation behavior matches that of the source HDL. Pipelined loops may include loop carry dependencies and/or I/O and/or memory accesses which must be scheduled correctly. The use of a placeholder node within a template is an efficient representation of such scheduling constraints.
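
As a closing illustration, the placeholder/template mechanism of claims 7 through 10 can be sketched using the FIG. 29 notation, in which a template is a set of (node, cycle offset) pairs (the helper and constraint names below are invented):

    # Express a loop carry dependency as scheduling constraints: node-lock a
    # placeholder node n cycles after the consumer by putting both in one
    # template, then require the producer to be scheduled before the placeholder.
    def carry_dependency_constraints(consumer, producer,
                                     initiation_interval, iterations_apart):
        # Claim 10: n = initiation interval * number of loop iterations that
        # execute before the produced value is consumed.
        n = initiation_interval * iterations_apart
        placeholder = "placeholder"
        template = [(consumer, 0), (placeholder, n)]          # fixed relative offsets
        ordering = (producer, "scheduled_before", placeholder)
        return template, ordering

    t, c = carry_dependency_constraints("consumer_op", "producer_op",
                                        initiation_interval=2, iterations_apart=1)
    # t == [('consumer_op', 0), ('placeholder', 2)]
    # c == ('producer_op', 'scheduled_before', 'placeholder')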

Miller, Ronald A., Ly, Tai A., MacMillen, Donald B., Knapp, David W.

Patent Priority Assignee Title
11520570, Jun 10 2021 Xilinx, Inc Application-specific hardware pipeline implemented in an integrated circuit
7930666, Dec 12 2006 Altera Corporation System and method of providing a memory hierarchy
8056030, Nov 27 2007 NEC SOLUTION INNOVATORS, LTD Behavioral synthesis system, behavioral synthesis method, and behavioral synthesis program
8140883, May 03 2007 Altera Corporation Scheduling of pipelined loop operations
8205187, Jun 09 2009 JASPER DESIGN AUTOMATION, INC Generalizing and inferring behaviors of a circuit design
8458621, Jun 09 2009 Jasper Design Automation, Inc. Comprehending a circuit design
8527911, Jun 09 2009 JASPER DESIGN AUTOMATION, INC Comprehending a circuit design
8630824, Jun 09 2009 JASPER DESIGN AUTOMATION AB Comprehending waveforms of a circuit design
8731894, Jun 09 2009 Jasper Design Automation, Inc. Indexing behaviors and recipes of a circuit design
8831925, Jun 09 2009 JASPER DESIGN AUTOMATION, INC Indexing behaviors and recipes of a circuit design
9460252, Apr 25 2012 Jasper Design Automation, Inc. Functional property ranking
9477802, Jun 09 2009 JASPER DESIGN AUTOMATION, INC Isolating differences between revisions of a circuit design
9619601, Jan 22 2015 XILINX, Inc.; Xilinx, Inc Control and data flow graph generation for hardware description languages
9639416, Nov 18 2012 Altera Corporation CRC circuits with extended cycles
Patent Priority Assignee Title
4827427, Mar 05 1987 Instantaneous incremental compiler for producing logic circuit designs
5111413, Mar 24 1989 Synopsys, Inc Computer-aided engineering
5128871, Mar 07 1990 Lattice Semiconductor Corporation Apparatus and method for allocation of resources in programmable logic devices
5237513, Nov 20 1989 KAPLAN, JONATHAN T Optimal integrated circuit generation
5274793, Mar 08 1989 Hitachi, LTD Automatic logic generation method for pipeline processor
5437037, Jun 05 1992 Mega Chips Corporation Simulation using compiled function description language
5544066, Apr 06 1990 LSI Logic Corporation Method and system for creating and validating low level description of electronic design from higher level, behavior-oriented description, including estimation and comparison of low-level design constraints
5572437, Apr 06 1990 LSI Logic Corporation Method and system for creating and verifying structural logic model of electronic design from behavioral description, including generation of logic and timing models
6044023, Feb 10 1995 Hynix Semiconductor Inc Method and apparatus for pipelining data in an integrated circuit
Executed on: Jun 08 2000
Assignee: Synopsys, Inc.
Conveyance: Assignment on the face of the patent

Date Maintenance Schedule
Sep 29 2012: 4 years fee payment window open
Mar 29 2013: 6 months grace period start (w surcharge)
Sep 29 2013: patent expiry (for year 4)
Sep 29 2015: 2 years to revive unintentionally abandoned end (for year 4)
Sep 29 2016: 8 years fee payment window open
Mar 29 2017: 6 months grace period start (w surcharge)
Sep 29 2017: patent expiry (for year 8)
Sep 29 2019: 2 years to revive unintentionally abandoned end (for year 8)
Sep 29 2020: 12 years fee payment window open
Mar 29 2021: 6 months grace period start (w surcharge)
Sep 29 2021: patent expiry (for year 12)
Sep 29 2023: 2 years to revive unintentionally abandoned end (for year 12)