Examples described herein provide a non-transitory computer-readable medium storing instructions, which when executed by one or more processors, cause the one or more processors to perform operations. The operations include: generating, using the one or more processors, a plurality of child processes according to a number of programmable dies of a multi-die device, each of the plurality of child processes corresponding to a respective programmable die of the multi-die device, wherein the plurality of child processes execute on different processors; partitioning a design for the multi-die device into a plurality of portions, each of the portions to be used to configure one of the programmable dies of the multi-die device; transmitting the plurality of portions of the design to the plurality of child processes for placement; and receiving placements from the plurality of child processes.
|
9. A non-transitory computer-readable medium storing instructions, which when executed by one or more processors, cause the one or more processors to perform operations, the operations comprising:
partitioning a circuit design for a multi-die device into a plurality of circuit design portions, each of the plurality of circuit design portions to be placed on a respective die of the multi-die device;
generating a plurality of thread pools according to a number of dies of the multi-die device, each of the plurality of thread pools corresponding to a die of the multi-die device;
generating a context for each of the dies of the multi-die device by isolating a plurality of data structures of each die, the context having the isolated data structures; and
generating a placement for each die of the multi-die device with the context using a respective thread pool of the plurality of thread pools.
19. A method for configuring a multi-die device comprising a plurality of dies, the method comprising:
generating, using one or more processors, a plurality of child processes according to a number of the dies of the multi-die device, each of the plurality of child processes corresponding to a respective one of the dies of the multi-die device, wherein the plurality of child processes execute on different processors;
partitioning a circuit design for the multi-die device into a plurality of circuit design portions, each of the circuit design portions to be used to configure a respective one of the dies of the multi-die device;
transmitting the plurality of circuit design portions to the plurality of child processes for placement;
receiving a plurality of placements from the plurality of child processes; and
merging the plurality of placements into a global placement for the multi-die device by merging information associated with placement of components of the circuit design for each of the dies of the multi-die device into the global placement.
1. A non-transitory computer-readable medium storing instructions, which when executed by one or more processors, cause the one or more processors to perform operations, the operations comprising:
generating, using the one or more processors, a plurality of child processes according to a number of dies of a multi-die device, each of the plurality of child processes corresponding to a respective one of the dies of the multi-die device, wherein the plurality of child processes execute on different processors;
partitioning a circuit design for the multi-die device into a plurality of circuit design portions, each of the circuit design portions to be used to configure a respective one of the dies of the multi-die device;
transmitting the plurality of circuit design portions to the plurality of child processes for placement;
receiving a plurality of placements from the plurality of child processes; and
merging the plurality of placements into a global placement for the multi-die device by merging information associated with placement of components of the circuit design for each of the dies of the multi-die device into the global placement.
2. The non-transitory computer-readable medium of
receiving a configuration of the multi-die device, the configuration comprising the number of dies and constraints.
3. The non-transitory computer-readable medium of
identifying critical nets which are nets that have not met their timing constraint;
identifying crossings between the dies of the multi-die device; and
performing an initial placement of the circuit design to place critical paths, wherein the one or more processors avoid placing the critical paths along the crossings between the dies.
4. The non-transitory computer-readable medium of
5. The non-transitory computer-readable medium of
6. The non-transitory computer-readable medium of
7. The non-transitory computer-readable medium of
generating a placement of instances for a portion of the design, wherein the one or more processors correspond to a die of the multi-die device.
8. The non-transitory computer-readable medium of
receive a circuit design portion for the multi-die device, the circuit design portion corresponding to a die of the multi-die device;
generate a placement for the circuit design portion; and
transmit the placement to a parent process.
10. The non-transitory computer-readable medium of
11. The non-transitory computer-readable medium of
synchronizing the contexts of the dies with each other for generating the placement of each die, wherein after the synchronization, the context of each die comprises the same information.
12. The non-transitory computer-readable medium of
13. The non-transitory computer-readable medium of
merging the context of each die into a global context by assembling together information of the context of each die into the global context, wherein the global context comprises information associated with placement of components of the circuit design for each die of the multi-die device.
14. The non-transitory computer-readable medium of
identifying critical nets;
identifying crossings between the dies of the multi-die device; and
performing an initial placement of the circuit design to place the critical nets, wherein the one or more processors avoid placing the critical nets along the crossings between the dies.
15. The non-transitory computer-readable medium of
16. The non-transitory computer-readable medium of
17. The non-transitory computer-readable medium of
18. The non-transitory computer-readable medium of
receiving a configuration of the multi-die device, the configuration comprising the number of dies and constraints.
20. The non-transitory computer-readable medium of
|
Examples of the present disclosure generally relate to electronic circuit design and, in particular, to multi-processing and massive multi-threading flow used for circuit design.
A programmable integrated circuit (IC) refers to a type of IC that includes programmable circuitry. An example of a programmable IC is a field programmable gate array (FPGA). An FPGA is characterized by the inclusion of programmable circuit blocks. Circuit designs may be physically implemented within the programmable circuitry of a programmable IC by loading configuration data, sometimes referred to as a configuration bitstream, into the device. The configuration data may be loaded into internal configuration memory cells of the device. The collective states of the individual configuration memory cells determine the functionality of the programmable IC. For example, the particular operations performed by the various programmable circuit blocks and the connectivity between the programmable circuit blocks of the programmable IC are defined by the collective states of the configuration memory cells once loaded with the configuration data.
Generally, placement and routing algorithms targeting FPGAs have not used multiple processors or massive multi-threading and thus have not saved as much runtime as they could have. Even when multiple processors or massive multi-threading were used, they were used to speed up a specific algorithm and were not applicable across the entire placement flow. Moreover, these algorithms suffered from quality of results (QoR) loss (e.g., failure to meet timing constraints). QoR is defined here in terms of whether the multi-die device meets its timing constraints and maximum frequency of operation. The QoR loss resulted from the algorithms being unable to optimize critical paths across die boundaries, or from a lack of runtime improvement because a number of placement algorithms did not scale well beyond four to eight threads (e.g., because of bottlenecking in shared resources).
Examples of the present disclosure generally relate to configuring devices with multiple dies in an effort to improve QoR and to improve runtime of placement algorithms.
One example of the present disclosure is a method for configuring a multi-die device using a multi-processing flow (MPF). The method generally includes: generating, using one or more processors, a plurality of child processes according to a number of programmable dies of the multi-die device, each of the plurality of child processes corresponding to a respective programmable die of the multi-die device, wherein the plurality of child processes execute on different processors; partitioning a design for the multi-die device into a plurality of portions, each of the portions to be used to configure one of the programmable dies of the multi-die device; transmitting the plurality of portions of the design to the plurality of child processes for placement; and receiving placements from the plurality of child processes.
Another example of the present disclosure is a method for configuring a multi-die device using a massively multi-thread (MMT) flow. The method generally includes: partitioning a design for the multi-die device into a plurality of portions, each of the plurality of portions to be placed on a respective programmable die of the multi-die device; generating a plurality of thread pools according to a number of programmable dies of the multi-die device, each of the plurality of thread pools corresponding to a programmable die of the multi-die device; generating a context for each of the programmable dies of the multi-die device by isolating a plurality of data structures of each programmable die; and generating a placement for each programmable die of the multi-die device with the context using a respective thread pool of the plurality of thread pools.
Aspects of the present disclosure also provide apparatus, methods, processing systems, and computer readable mediums for performing the operations described above.
These and other aspects may be understood with reference to the following detailed description.
So that the manner in which the above recited features can be understood in detail, a more particular description, briefly summarized above, may be had by reference to example implementations, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical example implementations and are therefore not to be considered limiting of its scope.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements of one example may be beneficially incorporated in other examples.
Various features are described hereinafter with reference to the figures. It should be noted that the figures may or may not be drawn to scale and that the elements of similar structures or functions are represented by like reference numerals throughout the figures. It should be noted that the figures are only intended to facilitate the description of the features. They are not intended as an exhaustive description of the disclosure or as a limitation on the scope of the claims. In addition, an illustrated example need not have all the aspects or advantages shown. An aspect or an advantage described in conjunction with a particular example is not necessarily limited to that example and can be practiced in any other example even if not so illustrated or not so explicitly described.
Examples described herein describe a multi-processing flow (MPF) and a massively multi-threaded (MMT) flow for configuring and placing circuit designs on multi-die devices. One example of MPF involves partitioning a circuit design to be configured and placed on a multi-die device by splitting the circuit design into multiple circuit design portions and placing each circuit design portion on a respective programmable die of the multi-die device using multiple processors. By placing each circuit design portion separately, the multiple processors limit interactions with the other circuit design portions, which makes each circuit design portion smaller and faster to place. One example of the MMT flow involves partitioning the circuit design into circuit design portions to be placed on the multi-die device by thread pools of a processor, and in this example, contexts are generated for each of the circuit design portions. The context of each circuit design portion can include data structures isolated from the contexts of the other circuit design portions, such that the contexts are independent from each other. By having an independent context for each programmable die, the thread pools can independently place each circuit design portion on its programmable die of the multi-die device.
Accordingly, by subdividing the circuit design into multiple circuit design portions, MPF and MMT flows allow for parallel placement without suffering from QoR loss or lack of thread scaling. Furthermore, placement is performed on a per-die basis, which saves placement runtime, because placement per-die is done in parallel.
In some examples, routing problems (e.g., routing congestion) tend to manifest along the boundaries and corners of the programmable dies of multi-die devices, due to the reduction of adjacent routing resources. This may result in a circuit design failing to route certain nets and/or failing to meet timing requirements. Furthermore, certain components need to be placed next to one another to avoid a penalty. Accordingly, MPF and MMT flows are configured to partition the circuit design along the boundaries and corners of the dies of the multi-die device to reduce the number of nets crossing the die boundaries and to make the nets that do cross the boundaries non-timing-critical. Because MPF and MMT flows avoid cutting nets that are likely to be critical, QoR does not suffer, and splitting the circuit design into smaller circuit design portions makes placement of the circuit design portions faster to solve without degrading the quality of the solution.
Additionally, devices with multiple dies often have a limited number of crossings across the dies, and this limited number of crossings provides a natural way of partitioning the circuit design on such devices. MPF and MMT flows save runtime by taking advantage of this natural partition to allow parallelism without suffering from QoR loss or lack of thread scaling. Prior attempts at multi-threading or multi-processing sped up a specific algorithm, not the entire placement flow. Thus, MPF and MMT flows also save runtime by implementing multi-processing and multi-threading across the entire placement flow. The terms “mapping” and “placement” are used herein interchangeably.
Generally,
The IC dies 122-1, 122-2, 122-3 (collectively “IC dies 122”) can be IC dies of the multi-die device 120. In some examples, the IC dies 122 of the multi-die device 120 are vertically stacked and disposed on a carrier. The IC die 122-1 is the top IC die or top-most IC die, and the IC die 122-3 is the bottom IC die or bottom-most IC die. In general, the top-most IC die is the IC die that is farthest from the carrier and has an exposed backside. The bottom-most IC die is the IC die that is closest to the carrier (e.g., mounted on the carrier). The carrier can be a circuit board, interposer, or the like. Each of the IC dies 122 is an active IC. An “active IC” is an IC that includes active circuitry (e.g., transistors), as opposed to a passive IC, such as an interposer, that includes only conductive interconnect. Each of the IC dies 122 can be a mask-programmed IC, such as an application specific integrated circuit (ASIC), or a programmable IC, such as an FPGA. The multi-die device 120 can include all mask-programmed ICs, all programmable ICs, or a combination of both mask-programmed ICs and programmable ICs. While the multi-die device 120 is shown as having three IC dies, in general the multi-die device 120 can include two or more IC dies. The terms “programmable die” and “IC die” are used herein interchangeably.
A user interacts with the circuit design system 100 to generate a circuit design, which is then implemented for the multi-die device 120. The circuit design system 100 implements different circuit design portions 124-1, 124-2, 124-3 of the circuit design (collectively circuit design portions 124) in different IC dies 122 in a manner that optimizes placement of the design onto the multi-die device 120. In the example, the circuit design portion 124-1 is implemented using resources of the IC die 122-1, the circuit design portion 124-2 is implemented using resources of the IC die 122-2, and the circuit design portion 124-3 is implemented using resources of the IC die 122-3. As discussed below, the circuit design system 100 implements the portions 124 of the circuit design using multiple child processes 104, each corresponding to a die of the multi-die device. In the example, each child process 104 can generate a placement for a corresponding circuit design portion.
In some examples, each processor 102 having a child process 104 is coupled to an IC die 122 of the multi-die device 120. The parent process 103 generates the child processes 104 that can run on the processors 102. The number of generated child processes 104 can equal the number of dies of the multi-die device 120. In some examples, the parent process 103 can configure each child process 104 to perform placement for a corresponding IC die of the multi-die device 120. The parent process 103 can also perform placement for an IC die of the multi-die device 120. In some examples, the parent process 103 can assign a die to each child process 104 (or to itself if it is configured to perform placement for an IC die of the multi-die device 120).
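The parent/child process structure described above can be illustrated with a short Python sketch. This is a minimal, hypothetical example rather than the tool's actual implementation: the function names, the Pipe-based message protocol, and the trivial per-die "placement" are assumptions made purely to illustrate spawning one child process per die and exchanging portions and placements with the parent.

```python
# Hypothetical sketch of MPF process plumbing: one child process per IC die.
import multiprocessing as mp

def place_die_portion(die_id, conn):
    """Child process: wait for a circuit design portion, place it, send the result back."""
    portion = conn.recv()                       # circuit design portion for this die
    # Stand-in placement: assign each instance a (die, slot) pair deterministically.
    placement = {inst: (die_id, slot) for slot, inst in enumerate(sorted(portion))}
    conn.send((die_id, placement))              # return the per-die placement to the parent
    conn.close()

def spawn_children(num_dies):
    """Parent process: generate one child process per die of the multi-die device."""
    children = []
    for die_id in range(num_dies):
        parent_conn, child_conn = mp.Pipe()
        proc = mp.Process(target=place_die_portion, args=(die_id, child_conn))
        proc.start()
        children.append((proc, parent_conn))
    return children

if __name__ == "__main__":
    children = spawn_children(num_dies=3)       # e.g., a three-die device
    portions = [{"lut_a", "ff_b"}, {"bram_c"}, {"dsp_d", "io_e"}]
    for (_, conn), portion in zip(children, portions):
        conn.send(portion)                      # transmit each portion for placement
    placements = [conn.recv() for _, conn in children]
    for proc, _ in children:
        proc.join()
    print(placements)
```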
In some examples, the computer 101 can be coupled to input/output (IO) devices and a display to allow a user to interact with the computer 101. In some examples, the computer can include various support circuits and IO interfaces to support the computer 101 and to allow interaction between the computer and a user. The support circuits can include conventional cache, power supplies, clock circuits, data registers, IO interfaces, and the like. The IO interface can be directly coupled to the memory 108 or coupled through the processors 102. The IO interface can be coupled to the IO devices, which can include a conventional keyboard, mouse, and the like. The IO interface can also be coupled to the display, which can present a GUI to a user.
The memory 108 may store all or portions of one or more programs and/or data to implement aspects of the circuit design system 100 described herein. The memory 108 can store circuit design tool code 110 that is executable by the processors 102. In some examples, the memory 108 can store code for MPF 112 to implement MPF, which is described below. The memory 108 can include one or more of random access memory (RAM), read only memory (ROM), magnetic read/write memory, FLASH memory, solid state memory, or the like as well as combinations thereof.
In some examples, the circuit design tool 110 can be configured to receive a behavioral description of a circuit design for the multi-die device. The circuit design tool 110 processes the behavioral description to produce a logical description of the circuit design. The logical description includes a logical network list (“netlist”) of lower-level circuit elements and logic gates, as well as connections (nets) between inputs and outputs thereof, in terms of the hierarchy specified in the behavioral description. For example, the logical description may be compliant with the Electronic Design Interchange Format (EDIF). The circuit design tool 110 may also generate constraint data associated with the logical description that includes various timing and layout constraints. Alternatively, the logical description may be annotated with constraint data. Such an annotated netlist is produced by the XST synthesis tool, commercially available from Xilinx, Inc., of San Jose, Calif.
The circuit design tool 110 can pass the logical description of the circuit design to the MPF 112. Because the MPF 112 is configured to implement the circuit design on the multi-die device 120, the MPF 112 can include a map tool and a place-and-route (PAR) tool. The map tool maps the logical description onto physical resources within the multi-die device (i.e., the circuit components, logic gates, and signals are mapped onto LUTs, flip-flops, clock buffers, I/O pads, and the like of the target FPGA). The map tool produces a mapped circuit description in accordance with any constraints in the constraint data. The mapped circuit description includes groupings of the physical resources of the multi-die device 120 expressed in terms of CLBs and IOBs that include these resources. In one embodiment, the mapped circuit description does not include physical location information. The PAR tool is configured to receive the mapped circuit description and the constraint data. The PAR tool determines placement for the physical resource groupings of the mapped circuit description in the multi-die device 120 and apportions the appropriate routing resources. The PAR tool performs such placement and routing in accordance with any constraints in the constraint data. The PAR tool produces physical design data.
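To make the hand-off between the map step and the place-and-route step concrete, the following Python sketch uses hypothetical function names and data shapes (they are not the actual tool interfaces): the map step attaches a resource grouping but no location, and the place-and-route step then assigns concrete sites subject to a trivial per-die capacity constraint.

```python
# Hypothetical sketch of a map -> place-and-route hand-off.
def map_design(netlist, constraints):
    """Map logical netlist elements onto physical resource groupings (no locations yet)."""
    return {cell: {"resource": kind, "site": None} for cell, kind in netlist.items()}

def place_and_route(mapped, constraints, num_dies):
    """Assign each mapped grouping a concrete (die, slot) site, honoring a per-die capacity."""
    capacity = constraints.get("sites_per_die", 4)
    placed, die, slot = {}, 0, 0
    for cell, info in mapped.items():
        if slot == capacity:                    # spill to the next die when one die is full
            die, slot = (die + 1) % num_dies, 0
        placed[cell] = dict(info, site=(die, slot))
        slot += 1
    return placed

netlist = {"u_ctrl": "CLB", "u_fifo": "BRAM", "u_mac": "DSP", "u_io": "IOB"}
mapped = map_design(netlist, {})
print(place_and_route(mapped, {"sites_per_die": 2}, num_dies=2))
```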
In some examples, the MPF 112 can also include a bitstream generator. In some examples, the bitstream generator is configured to receive the physical design data and produce bitstream data for the multi-die device 120.
Each of the parent computer 152 and child computers 154 has memory 108 storing circuit design tool code 110 and code for MPF 112. The parent computer 152 can be configured with the parent process 103 similar to the parent processor 102 shown in
Because the processors 156 of the child computers 154 are disposed on different machines than the processor 153 for the parent computer 152, the parent computer 152 comprises components (not illustrated) that facilitate communication with the child computers 154. Similarly, the child computers 154 comprise components (not illustrated) that facilitate communication with the parent computer 152. Generally,
At 202, the parent process generates child processes according to a number of IC dies of the multi-die device 120. Each child process corresponds to a respective IC die of the multi-die device 120. For example, a first child process corresponds to IC die 122-1 of the multi-die device 120, a second child process corresponds to IC die 122-2 of the multi-die device 120, and so forth. In some examples, the child processes can correspond to a processor 102 of
At 204, the parent process partitions the circuit design for the multi-die device 120 into a plurality of circuit design portions 124. Each of the circuit design portions 124 of the circuit design can be used to configure one of the IC dies 122 of the multi-die device 120. When partitioning the circuit design, the parent process chooses nets to cut to accommodate die boundaries. The parent process can choose nets that are not likely to be critical nets, and the parent process can optimize and legalize these cross-die connections. In some cases, during the partitioning of the circuit design at 204, the parent process performs an initial global mapping and placement to assist in the partitioning of the circuit design along the die boundaries. This initial global mapping and placement can map circuit design portions 124 to particular IC dies to avoid having critical nets mapped along inter-die connections and to place components next to each other to avoid a penalty. The parent process can also consider clock region and clock routing constraints during partitioning.
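One way to think about "choosing nets that are not likely to be critical" is to score each candidate net by its timing slack and fanout and prefer cheap cuts. The following Python sketch is an assumption-laden illustration of such a scoring heuristic, not the tool's actual cost function; the threshold and the weighting are made up for the example.

```python
# Hypothetical scoring of candidate nets to cut along a die boundary.
def cut_cost(slack_ns, fanout, critical_threshold_ns=0.0):
    """Lower cost means the net is a better candidate to cross a die boundary."""
    if slack_ns <= critical_threshold_ns:
        return float("inf")                 # never cut a net that is already timing-critical
    return fanout / slack_ns                # low-slack, high-fanout nets are expensive to cut

nets = {"net_a": (3.2, 4), "net_b": (0.1, 2), "net_c": (-0.4, 8)}   # name: (slack ns, fanout)
ranked = sorted((cut_cost(slack, fo), name) for name, (slack, fo) in nets.items())
print(ranked)    # net_a is the cheapest cut; net_c (negative slack) must stay within a die
```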
At 206, the parent process transmits the plurality of circuit design portions 124 to the plurality of child processes for placement. For example, the parent process transmits the circuit design portion 124-1 to the child process 104-1, and the child process 104-1 performs placement operations for the circuit design portion 124-1 to IC die 122-1. Similarly, the parent process transmits the circuit design portion 124-2 to the child process 104-2 and the circuit design portion 124-3 to the child process 104-3. In some examples, the parent process 103 can act as a child process and keeps a circuit design portion (e.g., the circuit design portion 124-1) for placement operations.
At 302, in some examples, the child process can begin an initialization process, where the child process instantiates and prepares for the rest of the operations 200.
At 304, the child process receives a circuit design portion 124 for the multi-die device 120 from the parent process. As mentioned, the circuit design portion 124 corresponds to an IC die of the multi-die device 120.
At 306, the child process generates a placement of instances for the circuit design portion 124. When each child process generates a placement of instances, the child process generates a placement for its corresponding die. In some examples, placement of the circuit design portion 124 can include mapping components in the netlist and inputs and outputs to hardware of the corresponding die. In some examples, routing is performed, which can generate routes for streaming data between components of the dies. The routing, e.g., global and/or detailed routing, can include using a Boolean satisfiability problem (SAT) algorithm, an integer linear programming (ILP) algorithm, a PathFinder algorithm, a greedy algorithm, and/or the like.
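As one illustration of the routing options named above, the sketch below implements a heavily simplified PathFinder-style negotiated-congestion loop in Python: every pass reroutes all nets with a node cost that grows where nodes are over-used, so competing nets gradually spread onto different resources. The graph, the cost update, and the iteration count are illustrative assumptions, not the tool's actual router.

```python
# Simplified PathFinder-style rip-up-and-reroute with negotiated congestion.
import heapq

def shortest_path(graph, src, dst, node_cost):
    """Dijkstra over a dict-of-dicts graph, adding a per-node congestion cost."""
    dist, prev, done = {src: 0.0}, {}, set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, base in graph[u].items():
            nd = d + base + node_cost(v)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]

def negotiate_routes(graph, nets, passes=5):
    """Reroute every net each pass, raising the history cost of persistently congested nodes."""
    history = {n: 0.0 for n in graph}
    routes = {}
    for _ in range(passes):
        usage = {n: 0 for n in graph}
        for net, (src, dst) in nets.items():
            cost = lambda v: history[v] + usage[v]       # present + historical congestion
            routes[net] = shortest_path(graph, src, dst, cost)
            for v in routes[net]:
                usage[v] += 1
        for n, used in usage.items():
            history[n] += max(0, used - 1)               # remember over-use for later passes
    return routes

graph = {"a": {"b": 1, "c": 1}, "b": {"d": 1}, "c": {"d": 1}, "d": {}}
print(negotiate_routes(graph, {"n1": ("a", "d"), "n2": ("a", "d")}))
```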
At 308, the child process transmits the placement back to the parent process.
Returning to
As illustrated, the parent process 402 and the child processes 404 can operate in parallel. At 202, the parent process 402 generates the child processes 404 and then continues to operate in parallel with the generated child processes 404. For example, the parent process 402 generates three child processes 404-1, 404-2, 404-3 corresponding to three dies of the multi-die device 120.
At 204, the parent process 402 partitions a circuit design for the multi-die device 120 and transmits the circuit design portions (e.g., circuit design portions 124) to the child processes 404 for placement.
At 302-1, 302-2, 302-3, while the parent process 402 partitions a design for the multi-die device 120, the child processes 404 can perform initialization, in which each child process is assigned a die of the multi-die device 120. The child processes 404 can also prepare for the receipt of circuit design portions of the multi-die device 120.
Accordingly, when the parent process 402 transmits at 206, at 304-1, 304-2, 304-3, the child processes 404 correspondingly receive a circuit design portion corresponding to the die to which the child process is assigned.
At 306-1, 306-2, 306-3, each child process 404 generates a placement of instances for its corresponding die. The placement generated can be based on the circuit design portion 124 transmitted from the parent process 402.
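A per-die placement step of the kind performed at 306-1, 306-2, 306-3 could be sketched as a greedy assignment: each instance of the die's circuit design portion is dropped onto the free site closest to its already-placed neighbors. The site grid, netlist format, and Manhattan-distance cost below are assumptions for illustration only; real placers use far more sophisticated algorithms.

```python
# Hypothetical greedy placement of one circuit design portion onto one die's site grid.
from itertools import product

def place_portion(instances, nets, grid=(4, 4)):
    """instances: ordered names; nets: list of (inst_a, inst_b) two-pin connections."""
    free = set(product(range(grid[0]), range(grid[1])))   # all sites start free
    placement = {}
    for inst in instances:
        others = [b if a == inst else a for a, b in nets if inst in (a, b)]
        anchors = [placement[o] for o in others if o in placement]
        def cost(site):
            # Total Manhattan distance to the already-placed neighbors of this instance.
            return sum(abs(site[0] - x) + abs(site[1] - y) for x, y in anchors)
        best = min(sorted(free), key=cost)   # sorted() keeps the choice deterministic
        placement[inst] = best
        free.remove(best)
    return placement

portion = ["ff0", "lut0", "lut1", "ff1"]
nets = [("ff0", "lut0"), ("lut0", "lut1"), ("lut1", "ff1")]
print(place_portion(portion, nets))
```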
At 308-1, 308-2, 308-3, each child process 404 transmits the placement to the parent process 402.
In some examples, at 207, while the child processes 404 generate a placement for the circuit design portion of the corresponding die, the parent process 402 can manage the child processes 404-1, 404-2, 404-3. In some examples, the parent process 402 can be configured to operate as a child process and can also generate a placement of instances for a die of the multi-die device 120.
At 208, the parent process 402 receives the placements from the various child processes 404, and at 210, merges all the placements into a global placement for the multi-die device 120.
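The merge at 210 can be illustrated by combining each child's per-die placement into a single dictionary keyed by instance, with every entry tagged by its die so results from different dies cannot collide. The data shapes below are assumptions made for the sketch.

```python
# Hypothetical merge of per-die placements into a global placement.
def merge_placements(per_die):
    """per_die: {die_id: {instance: site}} -> {instance: (die_id, site)}"""
    global_placement = {}
    for die_id, placement in per_die.items():
        for inst, site in placement.items():
            if inst in global_placement:
                raise ValueError(f"instance {inst} was placed on more than one die")
            global_placement[inst] = (die_id, site)
    return global_placement

per_die = {0: {"ff0": (1, 2), "lut0": (1, 3)},
           1: {"bram0": (0, 0)},
           2: {"dsp0": (2, 2)}}
print(merge_placements(per_die))
```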
In some examples, the parent process 103 can perform a first iteration of placement of the circuit design for the multi-die device 120 to generate resource usage information (“planned resource usage”). The planned resource usage can include, for example, a general floorplan for each IC die in the multi-die device 120. Each general floorplan includes the resources to be used by a circuit design portion (e.g., logic resources, routing resources, etc.). The parent process 103 can also generate floorplan constraints for each IC die of the multi-die device 120.
In some examples, during the initial placement for the multi-die device 120, the parent process 103 generates a graph of the floorplan 510 with nodes and weighted edges. Each node is either a netlist component 502, having any number of netlist subcomponents 506, or an IO 504, and occupies a site on a die of the multi-die device 120. Each edge represents a connection between the netlist components 502 or a connection between the netlist components 502 and IOs 504. Some netlist components 502 and some IOs 504 have requirements or constraints for placement and routing. Accordingly, edges between those netlist components 502 and those IOs 504 are weighted differently.
Once the parent process 103 has performed an initial placement of netlist components 502 and IOs 504, the parent process 103 partitions the floorplan 510. In some examples, the parent process 103 finds a cut that minimizes the edges that will be cut, while satisfying utilization requirements between the circuit design portions. By the end of partitioning, every netlist component 502 and IO 504 is assigned to a circuit design portion.
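The cut-minimization idea can be shown on a toy scale: the sketch below exhaustively tries every two-way split of a small weighted netlist graph that satisfies a simple utilization bound and keeps the split with the smallest cut weight. Production partitioners use heuristics (e.g., Fiduccia-Mattheyses) rather than brute force; this example only illustrates the objective described above.

```python
# Brute-force minimum-weight cut of a tiny netlist graph, subject to a utilization bound.
from itertools import combinations

def min_cut_bipartition(nodes, edges, max_util):
    """edges: {(u, v): weight}; max_util: maximum number of nodes allowed on one die."""
    best_cut, best_split = float("inf"), None
    for k in range(len(nodes) - max_util, max_util + 1):
        for left in combinations(nodes, k):
            left = set(left)
            cut = sum(w for (u, v), w in edges.items() if (u in left) != (v in left))
            if cut < best_cut:
                best_cut, best_split = cut, (left, set(nodes) - left)
    return best_cut, best_split

nodes = ["io0", "ctrl", "fifo", "dsp"]
edges = {("io0", "ctrl"): 5, ("ctrl", "fifo"): 3, ("fifo", "dsp"): 1, ("io0", "dsp"): 1}
print(min_cut_bipartition(nodes, edges, max_util=3))   # cuts only the two weight-1 edges
```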
For example, as illustrated in
In some cases, as mentioned, the parent process 103 cuts the circuit design along a net or an edge passing between two circuit design portions. In such cases, the circuit design portions each comprise information regarding this net. For example, in
As with the circuit design system 100 using MPF, a user interacts with the circuit design system 600 to generate a circuit design, which is then implemented for the multi-die device 120. The circuit design system 600 implements different circuit design portions 124 of the circuit design in different IC dies 122 in a manner that optimizes placement of the design onto the multi-die device 120. The circuit design system 600 uses a massively multi-threaded flow to generate placements for the circuit design portions. Accordingly, the processor 602 includes thread pools 604-1, 604-2, 604-3 (collectively “thread pools 604”), each corresponding to a different IC die of the multi-die device 120. In some examples, the thread pools 604 can have any number of threads, and the thread pools 604 can be configured to perform placement operations for the multi-die device 120. The circuit design system 600 also generates contexts for the thread pools 604 (e.g., ActiveContext 804-1, 804-2, 804-3 (collectively “contexts 804”) in
In MMT, the thread pools 604 and contexts 804 are independent of other thread pools and contexts. Because of this independence, MMT does not require mutexes and thus does not suffer from the lack of scaling found in other multi-threading flows. In some examples, the contexts 804 comprise replicated copies of data common to each die of the multi-die device 120. The contexts can include information associated with components of the circuit design for the multi-die device, and the information of each context is associated with the respective die. The contexts allow the processor 602 to perform placement operations for a die of the multi-die device 120 using information and data structures local to that die without affecting the information and data structures of other dies. The use of a massive number of threads improves runtime with improved thread scaling.
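The "isolated context per die" idea can be sketched as a small data structure that holds a deep copy of whatever shared data a die's placement needs, so a thread pool can mutate its own context without locks. The field names below (netlist, sites, boundary_nets) are assumptions for illustration, not the actual ActiveContext contents.

```python
# Hypothetical per-die context with data structures isolated from the other dies.
from copy import deepcopy
from dataclasses import dataclass, field

@dataclass
class DieContext:
    die_id: int
    netlist: dict                                         # this die's circuit design portion
    sites: dict = field(default_factory=dict)             # placement results, local to the die
    boundary_nets: dict = field(default_factory=dict)     # nets that cross die boundaries

def make_contexts(common_data, portions):
    """Replicate the common data once per die so the contexts stay independent."""
    return [DieContext(die_id=i,
                       netlist=deepcopy(portion),
                       boundary_nets=deepcopy(common_data["boundary_nets"]))
            for i, portion in enumerate(portions)]

common_data = {"boundary_nets": {"net_x": {"slack_ns": 1.5}}}
contexts = make_contexts(common_data, [{"ff0": "FF"}, {"lut0": "LUT"}])
contexts[0].sites["ff0"] = (0, 0)        # mutating one context leaves the others untouched
print(contexts[0].sites, contexts[1].sites)
```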
At 702, the processor 602 partitions a circuit design for the multi-die device 120 into one or more circuit design portions 124. Each of the circuit design portions 124 partitioned can be placed on one of the dies of the multi-die device 120. In some examples, the processor 602 can partition the circuit design for MMT using the partitioning techniques described for MPF. In some examples, the processor 602 can perform an initial placement of the circuit design to place critical paths in order to avoid placing critical paths along the crossings between the programmable dies.
At 704, the processor 602 generates thread pools 604 according to a number of dies of the multi-die device 120. Each of the thread pools 604 corresponds to a die of the multi-die device 120. In some examples, dies can have more than one circuit design portion, and so more than one thread pool can correspond to an IC die.
At 706, the processor 602 generates a context for each of the dies of the multi-die device 120. In some examples, the context is generated by isolating one or more data structures of each die.
At 708, the processor 602 generates a placement for each die of the multi-die device 120 with the context using a respective thread pool of the plurality of thread pools 604. In some examples, the processor 602 instructs the thread pools 604 to synchronize the contexts to help preserve QoR. By synchronizing the contexts, the processor has more global information about the dies of the multi-die device 120. With the synchronized contexts, the processor can determine whether nets crossing die boundaries still meet constraints. In some examples, after synchronization, the context of each die comprises the same information.
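A synchronization step of the kind described above could gather each context's view of the boundary-crossing nets, combine it, and write the combined view back into every context, so that afterwards each context holds the same cross-die information. What exactly is synchronized (here, per-net endpoint data) is an assumption made for the sketch.

```python
# Hypothetical context synchronization: every die ends up with the same boundary-net view.
def synchronize(contexts):
    """contexts: list of dicts, each holding a 'boundary_nets' mapping for its die."""
    merged = {}
    for ctx in contexts:                            # gather every die's view of crossing nets
        for net, info in ctx["boundary_nets"].items():
            merged.setdefault(net, {}).update(info)
    for ctx in contexts:                            # write the common view back everywhere
        ctx["boundary_nets"] = {net: dict(info) for net, info in merged.items()}
    return contexts

dies = [{"boundary_nets": {"net_x": {"src_die": 0, "src_site": (3, 1)}}},
        {"boundary_nets": {"net_x": {"dst_die": 1, "dst_site": (0, 2)}}}]
synchronize(dies)
print(dies[0]["boundary_nets"] == dies[1]["boundary_nets"])   # True: same information
```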
At 710, the processor 602 can merge the placements into a global placement for the multi-die device 120. The processor 602 can merge the placements in a way similar to how the parent processor 102 merges the placements for MPF in
At 802, the processor 602 can initialize the plurality of thread pools. As mentioned, the processor 602 initializes as many thread pools 604 as there are dies in the multi-die device 120, and each thread pool 604 corresponds to a die of the multi-die device 120. Each thread pool 604 can have any number of threads to perform placement operations. In some examples, the processor 602 also generates a context 804 (e.g., ActiveContext 804-1, 804-2, 804-3) for each thread pool 604. The context 804 comprises a plurality of databases for the corresponding die of the multi-die device 120.
After initialization at 802, at 702, the processor 602 partitions the circuit design for the multi-die device 120 into circuit design portions 124. The circuit design portions 124 are then transmitted (as illustrated) to each thread pool 604. For example, three circuit design portions 124 are transmitted to three thread pools 604. As mentioned, each of the thread pools 604 has a corresponding context 804 comprising data structures isolated to the corresponding die. At 708-1, 708-2, 708-3, each thread pool 604 can perform placement operations for a respective IC die using the context 804 having isolated data structures. In some examples, the thread pools 604 synchronize the contexts 804. At 710, the processor 602 takes the placements generated from each thread pool 604 and merges them into a global placement for the multi-die device 120.
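Putting the MMT pieces together, the sketch below runs one thread pool per die, each working only on its own context dictionary, and then merges the per-die results into a global placement. The stand-in placement function and the data shapes are assumptions; only the structure (a pool per die, independent contexts, a final merge) mirrors the flow described above.

```python
# Hypothetical end-to-end MMT sketch: one thread pool per die, then a global merge.
from concurrent.futures import ThreadPoolExecutor

def place_with_context(context):
    """Place this context's portion on its die, touching only this context's data."""
    context["placement"] = {inst: (context["die_id"], slot)
                            for slot, inst in enumerate(sorted(context["portion"]))}
    return context

def mmt_place(portions, threads_per_pool=4):
    contexts = [{"die_id": i, "portion": p, "placement": {}} for i, p in enumerate(portions)]
    pools = [ThreadPoolExecutor(max_workers=threads_per_pool) for _ in contexts]
    try:
        futures = [pool.submit(place_with_context, ctx) for pool, ctx in zip(pools, contexts)]
        results = [f.result() for f in futures]
    finally:
        for pool in pools:
            pool.shutdown()
    global_placement = {}
    for ctx in results:                      # merge the per-die placements (step 710)
        global_placement.update(ctx["placement"])
    return global_placement

print(mmt_place([["ff0", "lut0"], ["bram0"], ["dsp0", "io0"]]))
```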
In some FPGAs, each programmable tile can include at least one programmable interconnect element (“INT”) 911 having connections to input and output terminals 920 of a programmable logic element within the same tile. Each programmable interconnect element 911 can also include connections to interconnect segments 922 of adjacent programmable interconnect element(s) in the same tile or other tile(s). Each programmable interconnect element 911 can also include connections to interconnect segments 924 of general routing resources between logic blocks (not shown). The general routing resources can include routing channels between logic blocks (not shown) comprising tracks of interconnect segments (e.g., interconnect segments 924) and switch blocks (not shown) for connecting interconnect segments. The interconnect segments of the general routing resources (e.g., interconnect segments 924) can span one or more logic blocks. The programmable interconnect elements 911 taken together with the general routing resources implement a programmable interconnect structure (“programmable interconnect”) for the illustrated FPGA. Each programmable interconnect element 911 can include an interconnect circuit that can implement various types of switching among input interconnect segments and output interconnect segments, such as cross-point switching, breakpoint switching, multiplexed switching, and the like.
In an example, a CLB 902 can include a configurable logic element (“CLE”) 912 that can be programmed to implement user logic plus a single programmable interconnect element (“INT”) 911. A BRAM 903 can include a BRAM logic element (“BRL”) 913 in addition to one or more programmable interconnect elements. Typically, the number of interconnect elements included in a tile depends on the height of the tile. In the pictured example, a BRAM tile has the same height as five CLBs, but other numbers (e.g., four) can also be used. A DSP tile 906 can include a DSP logic element (“DSPL”) 914 in addition to an appropriate number of programmable interconnect elements. An IOB 904 can include, for example, two instances of an input/output logic element (“IOL”) 915 in addition to one instance of the programmable interconnect element 911. As will be clear to those of skill in the art, the actual I/O pads connected, for example, to the I/O logic element 915 typically are not confined to the area of the input/output logic element 915.
In the pictured example, a horizontal area near the center of the die is used for configuration, clock, and other control logic. Vertical columns 909 extending from this horizontal area or column are used to distribute the clocks and configuration signals across the breadth of the FPGA.
Some FPGAs utilizing the architecture illustrated in
Note that
The various examples described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals, where they or representations of them are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more examples may be implemented as useful machine operations. In addition, one or more examples also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for specific purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
The various examples described herein may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
One or more examples may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media. The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system. A computer readable medium may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer readable medium (e.g., a non-transitory storage medium) include a hard drive, a Solid State Disk (SSD), network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD (Compact Disc) such as a CD-ROM, a CD-R, or a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
The methods disclosed herein comprise one or more steps or actions for achieving the methods. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is specified, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims. Further, the various operations of methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware and/or software component(s) and/or module(s), including, but not limited to a circuit, an application specific integrated circuit (ASIC), or processor. Generally, where there are operations illustrated in figures, those operations may have corresponding counterpart means-plus-function components with similar numbering.
While the foregoing is directed to specific examples, other and further examples may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.