Approaches are disclosed for sizing first-in-first-out (FIFO) buffers for pipelining functions of a circuit. The functions of the circuit are performed on an input data set, with respective FIFO buffers buffering data elements between coupled pairs of functional blocks. While performing the functions of the circuit, a respective current number of elements added to each FIFO buffer since a previous element was removed from that FIFO buffer is counted, and then compared to a respective saved number. The respective current number is saved as a new respective saved number in response to the respective current number being greater than the respective saved number, and the respective current number is reset after the comparison. Respective sizes for the FIFO buffers are determined as a function of the respective saved numbers, and the sizes are then stored.
1. A method for sizing first-in-first-out (FIFO) buffers for pipelining functions of a circuit, the method comprising:
performing the functions of the circuit on an input data set, wherein the circuit includes respective FIFO buffers for buffering data elements between coupled pairs of functional blocks;
while performing the functions of the circuit, for each FIFO buffer,
counting a respective current number of elements added to the FIFO buffer since a previous element was removed from the FIFO buffer,
comparing the respective current number to a respective saved number in response to removal of an element from the FIFO buffer,
saving the respective current number as a new respective saved number in response to the respective current number being greater than the respective saved number, and
resetting the respective current number after the comparing of the respective current number to the respective saved number;
determining respective sizes for the FIFO buffers as a function of the respective saved numbers; and
storing the respective sizes for the FIFO buffers in a computer-readable storage medium that provides non-transitory storage.
11. A non-transitory program storage medium, comprising:
a processor-readable storage device configured with instructions for sizing first-in-first-out (FIFO) buffers for pipelining functions of a circuit, wherein execution of the instructions by one or more processors causes the one or more processors to perform operations including:
performing the functions of the circuit on an input data set, wherein the circuit includes respective FIFO buffers for buffering data elements between coupled pairs of functional blocks;
while performing the functions of the circuit, for each FIFO buffer,
counting a respective current number of elements added to the FIFO buffer since a previous element was removed from the FIFO buffer,
comparing the respective current number to a respective saved number in response to removal of an element from the FIFO buffer,
saving the respective current number as a new respective saved number in response to the respective current number being greater than the respective saved number, and
resetting the respective current number after the comparing of the respective current number to the respective saved number;
determining respective sizes for the FIFO buffers as a function of the respective saved numbers; and
storing the respective sizes for the FIFO buffers.
20. A system for sizing first-in-first-out (FIFO) buffers for pipelining functions of a circuit, the system comprising:
a processor;
a processor-readable storage device coupled to the processor and configured with instructions for sizing the first-in-first-out (FIFO) buffers for pipelining the functions of a circuit, wherein execution of the instructions by the processor causes the processor to perform operations including:
performing the functions of the circuit on an input data set, wherein the circuit includes respective FIFO buffers for buffering data elements between coupled pairs of the functional blocks;
while performing the functions of the circuit, for each FIFO buffer,
counting a respective current number of elements added to the FIFO buffer since a previous element was removed from the FIFO buffer,
comparing the respective current number to a respective saved number in response to removal of an element from the FIFO buffer,
saving the respective current number as a new respective saved number in response to the respective current number being greater than the respective saved number, and
resetting the respective current number after the comparing of the respective current number to the respective saved number;
determining respective sizes for the FIFO buffers as a function of the respective saved numbers; and
storing the respective sizes for the FIFO buffers.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
7. The method of
for each of two or more paths to respective inputs of a functional block, determining whether the two or more paths have equal or unequal timing; and
in response to determining that the two or more paths have unequal timing, establishing respective FIFO buffers at the respective inputs of the functional block.
8. The method of
for each of two or more paths to respective inputs of a functional block, determining a slowest of the paths; and
establishing respective FIFO buffers at each input of the functional block other than the input for the slowest path.
9. The method of
in a graph representation of the circuit, for respective paths to each of two or more inputs of a first functional block, determining a second functional block from which the respective paths diverge;
determining whether the respective paths have equal or unequal timing from the second functional block to the first functional block; and
in response to determining that the respective paths have unequal timing, establishing respective FIFO buffers at the two or more inputs of the first functional block.
10. The method of
establishing the respective FIFO buffers at all inputs of each of the functional blocks.
12. The program storage medium of
13. The program storage medium of
14. The program storage medium of
15. The program storage medium of
16. The program storage medium of
17. The program storage medium of
for each of two or more paths to respective inputs of a functional block, determining whether the two or more paths have equal or unequal timing; and
in response to determining that the two or more paths have unequal timing, establishing respective FIFO buffers at the respective inputs of the functional block.
18. The program storage medium of
for each of two or more paths to respective inputs of a functional block, determining a slowest of the paths; and
establishing respective FIFO buffers at each input of the functional block other than the input for the slowest path.
19. The program storage medium of
in a graph representation of the circuit, for respective paths to each of two or more inputs of a first functional block, determining a second functional block from which the respective paths diverge;
determining whether the respective paths have equal or unequal timing from the second functional block to the first functional block; and
in response to determining that the respective paths have unequal timing, establishing respective FIFO buffers at the two or more inputs of the first functional block.
The present invention generally relates to determining sizes of first-in-first-out (FIFO) buffers used between functional blocks in an electronic circuit design.
The functional blocks of a pipelined circuit have different throughputs because of differences in the functions performed and in the input data processed by each functional block. To maximize the performance of asynchronous circuits, it is desirable to minimize the time any functional block spends waiting for input from any other functional block, and to minimize the time any functional block spends waiting to provide output to any other functional block.
FIFO buffers are often used between the functional blocks of a design in order to accommodate differences in throughput and variations in input data rates. One of the challenges in using FIFO buffers in a design is choosing suitable sizes for the FIFO buffers. A single size for all FIFO buffers would be unsuitable in view of the differences between the functional blocks. If a FIFO buffer is too big, it unnecessarily occupies too large a circuit area. If a FIFO buffer is too small, a functional block may have to wait for input data or wait to output data. For a design having a large number of pipelined functional blocks, the task of selecting suitable sizes for the many FIFO buffers may be difficult and time-consuming.
The present invention may address one or more of the above issues.
The present invention provides various approaches for sizing first-in-first-out (FIFO) buffers for pipelining functions of a circuit. In one embodiment, a method performs the functions of the circuit on an input data set. The circuit includes respective FIFO buffers for buffering data elements between coupled pairs of the functional blocks. While performing the functions of the circuit, for each FIFO buffer, a respective current number of elements added to the FIFO buffer since a previous element was removed from the FIFO buffer is counted. The respective current number is compared to a respective saved number in response to removal of an element from the FIFO buffer. The respective current number is saved as a new respective saved number in response to the respective current number being greater than the respective saved number. The respective current number is reset after the comparing of the respective current number to the respective saved number. Respective sizes for the FIFO buffers are determined as a function of the respective saved numbers. The respective sizes for the FIFO buffers are stored.
In another embodiment, a program storage medium includes a processor-readable storage device having instructions for sizing FIFO buffers for pipelining functions of a circuit. In yet another embodiment, a system is provided for sizing FIFO buffers for pipelining functions of a circuit. The system includes a processor and a processor-readable storage device.
It will be appreciated that various other embodiments are set forth in the Detailed Description and Claims, which follow.
Various aspects and advantages of the invention will become apparent upon review of the following detailed description and upon reference to the drawings, in which:
The various embodiments of the invention provide an approach for determining suitable sizes for first-in-first-out (FIFO) buffers that are used in pipelining functions of a circuit. The general approach entails performing the functions of the circuit on an input data set. The design of the circuit may be simulated or the actual circuit may be operated to perform the circuit's functions, depending on design objectives and restrictions. The circuit includes respective FIFO buffers for buffering data elements between coupled pairs of the functional blocks. These FIFO buffers may be automatically established as part of the design, as described in more detail below. The approaches generally seek to determine the minimum size for each FIFO buffer that may be attained while allowing sufficient space for buffering data to be passed between functional blocks. As a general measure, the maximum number of elements added to a FIFO buffer before an element is removed is determined while performing the functions of the circuit. This maximum number is used in determining the size of the FIFO buffer. Thus, while performing the functions of the circuit, for each FIFO buffer the number of elements added to the FIFO buffer is counted until an element is removed from the FIFO buffer. When an element is removed, the count is compared to the current maximum count of additions until a removal. If the current count is greater than the current maximum count, the current count is saved as the new maximum. The count of additions is then reset for counting additions until the next removal. The respective sizes for the FIFO buffers are determined based on the respective maximum counts for the FIFO buffers.
The above-described approach is beneficial compared to simply using the maximum depth the FIFO buffer reaches during simulation. An example scenario illustrating the benefit is one in which a producer functional block is capable of pushing a new value onto the FIFO buffer every 1 ns, but the consumer functional block can pull only one value from the FIFO buffer every 5 ns. In this case, the maximum depth of the FIFO buffer depends on how long the simulation is run, because every five nanoseconds the FIFO buffer grows by four elements. With the embodiments of the present invention, the FIFO buffer will be given a depth of at most 5, which should be sufficient to avoid slowing down the circuit.
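The counting scheme described above can be sketched in code. The following Python model is illustrative only (the class and method names are assumptions, not taken from the patent): it replays the 1 ns producer / 5 ns consumer scenario in discrete 1 ns steps and shows that the FIFO occupancy grows without bound while the dynamic depth stays at 5.

```python
from collections import deque

class MonitoredFifo:
    """Unbounded FIFO that records its 'dynamic depth': the maximum
    number of elements added between consecutive removals."""

    def __init__(self):
        self._items = deque()
        self._adds_since_removal = 0  # additions counted since last removal
        self.dynamic_depth = 0        # maximum of those counts so far

    def __len__(self):
        return len(self._items)

    def push(self, item):
        self._items.append(item)
        self._adds_since_removal += 1

    def pop(self):
        # On removal: compare the count to the saved maximum, then reset it.
        if self._adds_since_removal > self.dynamic_depth:
            self.dynamic_depth = self._adds_since_removal
        self._adds_since_removal = 0
        return self._items.popleft()

fifo = MonitoredFifo()
for t in range(1, 101):      # simulate 100 ns in 1 ns steps
    fifo.push(t)             # producer pushes one value every 1 ns
    if t % 5 == 0:
        fifo.pop()           # consumer pulls one value every 5 ns

print(len(fifo))             # occupancy depends on run length: 80 after 100 ns
print(fifo.dynamic_depth)    # the dynamic depth stays bounded at 5
```

The maximum occupancy (80 here) would keep growing with a longer run, while the dynamic depth of 5 captures the size actually needed to avoid stalling the consumer.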
The functions of the circuit are then performed on an input data set in step 104. For example, operation of the circuit is simulated using input data and control signals consistent with the expected real-world conditions. There are a number of methods that may be used to simulate circuits. ModelSim, for example, is a commonly used logic simulator for application specific integrated circuits (ASICs) and designs for programmable ICs such as field programmable gate arrays (FPGAs). The user provides data for the simulation that is representative of the data expected when the implemented circuit is operating in its intended application, since the sizes of the FIFO buffers will be determined based on the data provided for the simulation.
For the purposes of simulation, the FIFO buffers are assumed to have unlimited depth. Whenever the functional block driving a FIFO buffer has a result generated, it is assumed that the FIFO buffer will be able to store that result. In one implementation, the circuit is an asynchronous programmable logic device (PLD) and the inserted FIFO buffers have a depth substantially greater than is anticipated to be needed. During the simulation, each FIFO buffer stores information about how deep it actually needs to be, which can be read back with boundary scan or other methods available from the PLD.
The sizes of the FIFO buffers are chosen based on the results of the simulation. To minimize the size of the resulting circuit, the size of each FIFO buffer is set to the minimum required to handle the data provided in the simulation. While the functions of the circuit are performed on an input data set, the additions and removals of elements in each FIFO buffer are monitored to determine a maximum dynamic depth, as shown by step 106. For example, during operation the simulation model counts the number of data elements added to the FIFO buffer since the last removal of a data element from the FIFO buffer. The dynamic depth is the greater of the current count and the previously stored dynamic depth. Once an item is removed from the FIFO buffer, the count is reset to zero, and the dynamic depth is saved for a subsequent comparison. Thus, the dynamic depth of a FIFO buffer is the largest of the numbers of data items added to the FIFO buffer between removals of data items. The sizes of the FIFO buffers are then determined based on the dynamic depths determined from the simulation. The actual depth of a FIFO buffer is one less than its dynamic depth, since a FIFO buffer having a dynamic depth of one is unnecessary and can be removed. The size of each FIFO buffer is then stored in step 108. The determined sizes for the FIFO buffers will likely provide sufficient space for buffering data in the final implemented circuit.
The connecting arcs between ENTER block 202 and DEMUX block 208 show where buffering may be beneficial in order to efficiently pipeline functions of the circuit. Using the connections (or arcs) between ENTER block 202 and DEMUX block 208 as an example, several of the connections are direct from the ENTER block 202 to the DEMUX block 208, while some of the connections are via MULTIPLY block 204 and CMP block 206. In order to efficiently pipeline the functions represented by the graph, the direct connections from the ENTER block 202 to the DEMUX block 208 need to buffer the data for a period of time that matches the time it takes for the MULTIPLY block 204 and the CMP block 206 to perform their functions and propagate the data to the DEMUX block 208. Specifically, in a pipelined arrangement all outputs from the ENTER block 202 occur at the same time, with a goal of one set of output values provided on each cycle to the FIFO buffers 210, 212, 214, 216, 218, and 220. Thus, if any one of the FIFO buffers 210, 212, 214, 216, 218, and 220 is full, the ENTER block cannot be permitted to output another set of values. When the MULTIPLY and CMP blocks complete processing and provide an output value to the DEMUX block via FIFO buffer 222, the DEMUX block reads a set of values from the ENTER→DEMUX FIFO buffers 214, 216, 218, and 220. A new set of values can then be output from the ENTER block to the FIFO buffers 210, 212, 214, 216, 218, and 220. The ENTER block must again wait for the values from FIFO buffers 210 and 212 to flow through the MULTIPLY and CMP blocks before a new set of values can be output. The outcome is that there are many cycles in which the ENTER block must wait to output a set of values.
The graph of
In order for asynchronous systems to work in the absence of a clock, there generally needs to be a mechanism for one functional block to communicate that a value is ready for the next functional block in the pipeline. That next block then needs to make sure that all of its input values are ready and wait if those values are not ready. Thus, the previous block must hold its output data until the next block is ready to receive the data. Once a block has all of its input values available and the block is ready to process the data, the block must signal the previous block(s) that the data has been received in order to enable the previous block(s) to process the next set of input values. Assuming that the graph of
With reference to
As an example, assuming that all of the functional blocks take the same amount of time to execute, inserting a FIFO buffer capable of holding two values (i.e., a two-deep FIFO buffer) on the direct connections between the ENTER block 202 and DEMUX block 208 allows the circuit to run at full speed. While the MULTIPLY block 204 and the CMP block 206 are executing, the values from the ENTER block 202 can be held in the FIFO buffers until the DEMUX block 208 receives the result from the CMP block 206. In one implementation, FIFO buffers are placed on connections between functional blocks in a manner that prevents backpressure and maximizes the performance of the circuit. In another implementation, the FIFO buffers are placed only on connections where they are needed and the FIFO buffers are as small as possible (e.g., capable of holding only the minimum required number of data values) in order to minimize the size of the circuit.
If the timing of all functional blocks is fixed and known in advance, it is possible to statically balance the FIFO buffer depths to ensure maximum performance. However, in asynchronous circuits the timing of the functional blocks typically cannot be accurately predicted at synthesis time. Also, the timing of a functional block may be data dependent, such as an implementation of a FOR loop in which the number of iterations depends on the input data. In the FOR loop example, the values of input data would be unknown and suitable sizes of FIFO buffers may not be accurately determined when using static analysis. In contrast, by simulating the pipelined circuit design (or operating a prototype circuit) using data values expected for an implemented circuit, the behavior of the function blocks may be observed in a dynamic environment and suitable sizes of FIFO buffers may be more accurately determined.
In an example embodiment, the circuit design is represented as a graph such as the example shown in
In another embodiment, FIFO buffers may be placed on all of the inputs of a functional block where paths from different timing domains meet, except for those inputs on the slowest path. For example, the inputs to DEMUX block 208 are connected to paths from two different timing domains. One of the timing domains has the paths that directly connect the ENTER block 202 to the DEMUX block 208, and the other timing domain has the path that connects to an input of DEMUX block 208 through CMP block 206. In this embodiment, the paths that directly connect ENTER block 202 to DEMUX block 208 are faster than the path that flows through CMP block 206 and the path that flows through CMP block 206 is the slowest. Thus, FIFO buffers 214, 216, 218, and 220 are inserted at the inputs of the DEMUX block, while FIFO buffer 222 is omitted.
The procedure for determining whether a functional block needs to have one or more FIFO buffers inserted at its inputs involves following the paths from all of the inputs of the functional block backward through the graph until those paths converge at outputs from the same node. In the graph of
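As a rough illustration of this backward-trace check, the following Python sketch uses an assumed graph with invented per-block latencies (the block names echo the figure, but the latency values and input names are purely illustrative). It tests whether the paths from the divergence node to a block's inputs have unequal timing and, following the embodiment that omits the slowest path, selects which inputs receive FIFO buffers.

```python
# Assumed per-block latencies, in cycles (illustrative values only).
LATENCY = {"ENTER": 1, "MULTIPLY": 3, "CMP": 2, "DEMUX": 1}

# Paths from the divergence node (ENTER) to each DEMUX input, listed
# as the intermediate blocks traversed on the way to that input.
paths_to_inputs = {
    "in0": [],                   # direct ENTER -> DEMUX connection
    "in1": [],                   # direct connection
    "in2": ["MULTIPLY", "CMP"],  # ENTER -> MULTIPLY -> CMP -> DEMUX
}

def path_latency(intermediates):
    # Timing of a path is the sum of latencies of the blocks it crosses.
    return sum(LATENCY[block] for block in intermediates)

latencies = {inp: path_latency(p) for inp, p in paths_to_inputs.items()}

# If the paths have unequal timing, establish FIFO buffers. This variant
# omits the input on the slowest path, since it never needs to wait.
if len(set(latencies.values())) > 1:
    slowest = max(latencies, key=latencies.get)
    fifo_inputs = [inp for inp in latencies if inp != slowest]
else:
    fifo_inputs = []             # equal timing: no FIFO buffers needed

print(sorted(fifo_inputs))       # inputs that receive a FIFO buffer
```

With these assumed latencies the two direct inputs get FIFO buffers while the input fed through MULTIPLY and CMP (the slowest path) does not, mirroring the placement of buffers 214-220 and the omission of buffer 222 described above.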
In beginning the process of performing the functions of the circuit, the dynamic depth of the FIFO buffer is initialized to zero in step 302. It may be recalled that the functions of the circuit may be performed by way of simulation or by operating a physical circuit, depending on design and implementation objectives. The process tracks the maximum of the numbers of data items that are added to the FIFO buffer between removals of data items from the FIFO buffer. An additions counter, which stores the number for additions between removals, is set to zero in step 304. When a data item is added to the FIFO buffer (decision step 306), then the additions count is incremented in step 308 and the process returns to decision step 306.
If a data item is not added but is removed from the FIFO buffer (decision step 310), the process determines whether the value of the additions count is greater than the dynamic depth in step 312. If the value of the additions count is not greater than the dynamic depth, then the process returns to step 304 where the additions count is reset to zero. The process then repeats steps 306-310 until another data item is removed from the FIFO buffer. If the value of the additions count is greater than the dynamic depth, then the dynamic depth is set to the value of the additions count in step 314. The process then proceeds to step 316 where it is determined whether or not performing the circuit functions is complete. If continued performance of the functions of the circuit is desired, then the process returns to step 304 where the additions count is reset to zero. The process then repeats steps 306-310 until another data item is removed from the FIFO buffer.
When the process of performing the functions of the circuit is complete, the dynamic depth indicates the maximum of the numbers of data items that were added to the FIFO buffer between removals of data items from the FIFO buffer. The appropriate size for the FIFO buffer based upon the results of the simulation is one less than the dynamic depth of the FIFO buffer determined by the process of
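A minimal sketch of this sizing rule, assuming the dynamic depths have already been measured (the buffer names and depth values below are illustrative, not results from the patent's simulation): the size chosen for each FIFO buffer is one less than its dynamic depth, and a buffer with a dynamic depth of one (or zero) is unnecessary and can be removed.

```python
def fifo_size(dynamic_depth):
    # The appropriate size is one less than the dynamic depth; a FIFO
    # buffer with a dynamic depth of one needs no storage at all.
    return max(dynamic_depth - 1, 0)

# Hypothetical measured dynamic depths, keyed by illustrative buffer names.
measured = {"fifo_210": 2, "fifo_214": 3, "fifo_222": 1}

sizes = {name: fifo_size(depth) for name, depth in measured.items()}
removable = [name for name, size in sizes.items() if size == 0]

print(sizes)      # sizes chosen for the implemented circuit
print(removable)  # buffers whose dynamic depth of 1 makes them unnecessary
```

This directly encodes the statement above that a FIFO buffer having a dynamic depth of one can be removed from the design.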
The example shown in
The results of performing the functions of the circuit indicated that FIFO buffer 542 is needed between the DEMUX block 508 and the MUX block 520 and that FIFO buffer 544 is needed between ENTER block 502 and EXIT block 522. The results of performing the functions of the circuit also provide the appropriate sizes for the FIFO buffers 542 and 544.
Processor-readable device 602 is configured with software modules 604, 606, 608, 610, and 612. Execution of the instructions of software modules 604, 606, 608, 610, and 612 by processor 614 (or multiple processors) causes processor 614 to determine the sizes of the FIFO buffers for pipelining functions of a circuit and to store the FIFO buffer sizes. In one embodiment, the FIFO buffer sizes 616 are stored on the processor readable device 602.
Execution of the instructions of software module 604 causes processor 614 to insert respective FIFO buffers between functional blocks of the circuit. Execution of the instructions of software module 606 causes processor 614 to perform the functions of the circuit on an input data set. Execution of the instructions of software module 608 causes processor 614 to determine the number of data elements that are added to each of the FIFO buffers since the removal of a data element from that FIFO buffer. Execution of the instructions of software module 610 causes processor 614 to determine a respective dynamic depth for each of the FIFO buffers. Execution of the instructions of software module 612 causes processor 614 to store the respective sizes of the FIFO buffers. The size of each of the FIFO buffers is based on its dynamic depth.
Advanced FPGAs can include several different types of programmable logic blocks in the array. For example,
In some FPGAs, each programmable tile includes a programmable interconnect element (INT 711) having standardized connections to and from a corresponding interconnect element in each adjacent tile. Therefore, the programmable interconnect elements taken together implement the programmable interconnect structure for the illustrated FPGA. The programmable interconnect element (INT 711) also includes the connections to and from the programmable logic element within the same tile, as shown by the examples included at the top of
For example, a CLB 702 can include a configurable logic element (CLE 712) that can be programmed to implement user logic plus a single programmable interconnect element (INT 711). A BRAM 703 can include a BRAM logic element (BRL 713) in addition to one or more programmable interconnect elements. Typically, the number of interconnect elements included in a tile depends on the height of the tile. In the pictured embodiment, a BRAM tile has the same height as five CLBs, but other numbers (e.g., four) can also be used. A DSP tile 706 can include a DSP logic element (DSPL 714) in addition to an appropriate number of programmable interconnect elements. An IOB 704 can include, for example, two instances of an input/output logic element (IOL 715) in addition to one instance of the programmable interconnect element (INT 711). As will be clear to those of skill in the art, the actual I/O pads connected, for example, to the I/O logic element 715 typically are not confined to the area of the input/output logic element 715.
In the pictured embodiment, a columnar area near the center of the die (shown shaded in
Some FPGAs utilizing the architecture illustrated in
Note that
Those skilled in the art will appreciate that various alternative computing arrangements, including one or more processors and a memory arrangement configured with program code, would be suitable for hosting the processes and data structures of the different embodiments of the present invention. In addition, the processes may be provided via a variety of computer-readable storage media or delivery channels such as magnetic or optical disks or tapes, electronic storage devices, or as application services over a network.
The present invention is thought to be applicable to a variety of methods for sizing FIFO buffers for pipelining functions of a circuit. Other aspects and embodiments of the present invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and illustrated embodiments be considered as examples only, with a true scope and spirit of the invention being indicated by the following claims.
Bennett, David W., Mason, Jeffrey M.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Oct 28 2008 | BENNETT, DAVID W | Xilinx, Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 021760 | /0950 | |
Oct 28 2008 | MASON, JEFFREY M | Xilinx, Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 021760 | /0950 | |
Oct 30 2008 | XILINX, Inc. | (assignment on the face of the patent) | / |