In one embodiment, an integrated circuit may be designed using a library of clocked circuits that have programmable clock delays that may be inserted on the clock input to the clocked circuits. During the design process, timing paths which are challenging due to significant variations across operating states, process corners, and/or temperature may be met by using the clocked circuits with programmable delays and inserting a delay control circuit that programs the delays based on the current operating state, process corner used to manufacture the integrated circuit, and/or temperature. That is, different delays may be selected by the delay control circuit depending on inputs that identify the operating state, the process corner, and/or the temperature. Because the clock delay is intentionally skewed, the timing of the path may be different at different operating states, temperatures, or process corners and thus may meet timing by changing the clock skew during operation.
9. An integrated circuit comprising:
a plurality of programmable delay circuits coupled to a clock input, wherein respective ones of the plurality of programmable delay circuits have respective delay inputs, wherein the respective ones of the plurality of programmable delay circuits are configured to delay the clock input to generate respective clock outputs based on respective values on the respective delay inputs; and
a delay control circuit coupled to the respective delay inputs, wherein the delay control circuit is configured to generate the respective values on the respective delay inputs based on a combination of an operating state indication identifying an operating state of the integrated circuit and a process indication identifying a process corner at which the integrated circuit was manufactured.
1. An integrated circuit comprising:
a plurality of programmable delay circuits coupled to a clock input and having a delay input, wherein a first programmable delay circuit of the plurality of programmable delay circuits is configured to select a first delay based on a value on the delay input, and wherein a second programmable delay circuit of the plurality of programmable delay circuits is configured to select a second delay different from the first delay based on the value, and wherein the plurality of programmable delay circuits are configured to delay the clock input to generate respective clock outputs based on the respective first delay and second delay; and
a delay control circuit coupled to the delay input, wherein the delay control circuit is configured to generate the value on the delay input based on a combination of an operating state indication identifying an operating state of the integrated circuit and a process indication identifying a process corner at which the integrated circuit was manufactured.
16. An apparatus comprising:
a clocked circuit having a clock input, wherein the clocked circuit is configured to perform a specified operation based on a first clock, and wherein the clocked circuit comprises a delay circuit coupled to the clock input and configured to delay a second clock on the clock input to generate the first clock, wherein an amount of the delay is selectable based on a delay input to the clocked circuit; and
a delay control circuit coupled to the delay input of the clocked circuit and configured to generate a value on the delay input to select the amount of the delay, wherein the delay control circuit is configured to map a combination of at least a first indication of an operating state of an integrated circuit and a second indication of a process corner in effect when the integrated circuit implementing the apparatus was manufactured to the value, wherein the mapping is predetermined based on static timing analysis performed during a design of the integrated circuit prior to manufacture of the integrated circuit, wherein the delay control circuit comprises a table programmed with data that is determined via the static timing analysis, and wherein the delay control circuit is configured to read the data from the table to map the combination of the first indication and the second indication to the value.
2. The integrated circuit as recited in
3. The integrated circuit as recited in
4. The integrated circuit as recited in
5. The integrated circuit as recited in
6. The integrated circuit as recited in
7. The integrated circuit as recited in
8. The integrated circuit as recited in
10. The integrated circuit as recited in
11. The integrated circuit as recited in
12. The integrated circuit as recited in
13. The integrated circuit as recited in
14. The integrated circuit as recited in
15. The integrated circuit as recited in
17. The apparatus as recited in
18. The apparatus as recited in
19. The apparatus as recited in
20. The apparatus as recited in
This application is a continuation of U.S. patent application Ser. No. 16/545,120, filed on Aug. 20, 2019. The above application is incorporated herein by reference in its entirety.
Embodiments described herein are related to integrated circuits that include clocked circuit elements having programmable clock skew and a method for using the programmable clock skew to facilitate timing closure across different operating states.
Digital integrated circuits generally implement combinatorial logic circuits that receive inputs launched according to a clock and must complete their operations so that outputs can be captured according to the same clock. That is, the propagation delays through the logic circuitry must generally be less than the clock cycle time. Static timing analysis is used to determine if the propagation delays along various paths (input to output) in the integrated circuit meet the timing requirements for the clock cycle time. Paths that are not meeting timing (e.g. the propagation delay is greater than the required clock cycle time) are identified so that designers can revise the design in attempts to make the paths faster.
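As a rough illustration of the check described above, the following Python sketch flags paths whose propagation delay exceeds the clock cycle time. The path names, delay figures, and the `failing_paths` helper are hypothetical and are not taken from this disclosure. An actual static timing analysis tool also accounts for setup and hold times, clock uncertainty, and other effects.

```python
def failing_paths(path_delays_ps, clock_period_ps):
    """Return the names of paths whose propagation delay exceeds the clock period."""
    return sorted(
        name for name, delay in path_delays_ps.items() if delay > clock_period_ps
    )


# Hypothetical propagation delays (in picoseconds) for three input-to-output paths.
example_paths = {"path_a": 820, "path_b": 1050, "path_c": 970}

# With an assumed 1000 ps clock cycle, only path_b fails to meet timing.
violations = failing_paths(example_paths, clock_period_ps=1000)
```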
In some cases, an integrated circuit has a wide range of potential operating states and other factors that affect the timing of the paths over which the integrated circuit is required to operate. For example, microprocessors and/or systems on a chip (SOCs) can have numerous operating states (combinations of supply voltage and clock frequency) at which they are required to operate. Changes in the supply voltage can cause changes in the delays on the paths, and some paths scale differently based on the amount of delay that is due to the operation of the logic circuits versus the amount of delay that is due to parasitic capacitance and resistance in the path. Thus, a path that meets timing at one operating state may not meet timing at another operating state. In addition to the operating states, the design is generally required to meet timing across variations in the manufacturing process. Differences in the manufacturing process can lead to faster or slower circuits. Generally, the design is timed using various points across the spectrum of manufacturing process differences. Each point at which timing is measured as a function of manufacturing process variation is referred to as a process corner. Additionally, the temperature at which the integrated circuit operates can vary, and timing of the paths can vary as a function of temperature as well.
The process of performing static timing analysis, modifying the design, and repeating the analysis can consume a large part of the integrated circuit design cycle. Achieving timing closure, which refers to the integrated circuit design meeting timing requirements at each combination of operating state, temperature, and process corner, is a significant challenge. In some cases, timing closure is not reached and timing targets have to be relaxed to meet schedule requirements. In addition to meeting both setup and hold time requirements for timing closure, many integrated circuit designs are being optimized for power. Paths that have timing slack, meaning that they meet timing requirements with a margin of additional time, can be revised to consume less power at a cost of a longer path delay, reducing the margin. Power optimization can make the paths more challenging to close across the operating states, process corners, and temperatures.
In one embodiment, an integrated circuit may be designed using a library of clocked circuits that have programmable clock delays that may be inserted on the clock input to the clocked circuits. The clocked circuits may include sequential elements (e.g. flops, latches, registers, etc.), macros such as register files and memory arrays, and/or clock gater circuits that provide a conditionally gated clock to sets of other clocked circuits. During the design process, timing paths which are challenging due to significant variations across operating states, process corners, and/or temperature may be met by using the clocked circuits with programmable delays and inserting a delay control circuit that programs the delays based on the current operating state, process corner used to manufacture the integrated circuit, and/or temperature. That is, different delays may be selected by the delay control circuit depending on inputs that identify the operating state, the process corner, and/or the temperature. Because the clock delay is intentionally varied (or skewed), the timing of the path may be different at different operating states, temperatures, or process corners and thus may meet timing in each case by changing the clock skew during operation. The programmable clock skew and delay controls may also be used, in some embodiments, to improve yield and enhance post silicon validation debugging, as described in more detail below.
The following detailed description makes reference to the accompanying drawings, which are now briefly described.
While embodiments described in this disclosure may be susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean “including, but not limited to.” As used herein, the terms “first,” “second,” etc. are used as labels for nouns that they precede, and do not imply any type of ordering (e.g., spatial, temporal, logical, etc.) unless specifically stated.
Within this disclosure, different entities (which may variously be referred to as “units,” “circuits,” other components, etc.) may be described or claimed as “configured” to perform one or more tasks or operations. This formulation—[entity] configured to [perform one or more tasks]—is used herein to refer to structure (i.e., something physical, such as an electronic circuit). More specifically, this formulation is used to indicate that this structure is arranged to perform the one or more tasks during operation. A structure can be said to be “configured to” perform some task even if the structure is not currently being operated. A “clock circuit configured to generate an output clock signal” is intended to cover, for example, a circuit that performs this function during operation, even if the circuit in question is not currently being used (e.g., power is not connected to it). Thus, an entity described or recited as “configured to” perform some task refers to something physical, such as a device, circuit, memory storing program instructions executable to implement the task, etc. This phrase is not used herein to refer to something intangible. In general, the circuitry that forms the structure corresponding to “configured to” may include hardware circuits. The hardware circuits may include any combination of combinatorial logic circuitry, clocked storage devices such as flops, registers, latches, etc., finite state machines, memory such as static random access memory or embedded dynamic random access memory, custom designed circuitry, analog circuitry, programmable logic arrays, etc. Similarly, various units/circuits/components may be described as performing a task or tasks, for convenience in the description. Such descriptions should be interpreted as including the phrase “configured to.”
The term “configured to” is not intended to mean “configurable to.” An unprogrammed FPGA, for example, would not be considered to be “configured to” perform some specific function, although it may be “configurable to” perform that function. After appropriate programming, the FPGA may then be said to be “configured” to perform that function.
Reciting in the appended claims a unit/circuit/component or other structure that is configured to perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) interpretation for that claim element. Accordingly, none of the claims in this application as filed are intended to be interpreted as having means-plus-function elements. Should Applicant wish to invoke Section 112(f) during prosecution, it will recite claim elements using the “means for” [performing a function] construct.
In an embodiment, hardware circuits in accordance with this disclosure may be implemented by coding the description of the circuit in a hardware description language (HDL) such as Verilog or VHDL. The HDL description may be synthesized against a library of cells designed for a given integrated circuit fabrication technology, and may be modified for timing, power, and other reasons to result in a final design database that may be transmitted to a foundry to generate masks and ultimately produce the integrated circuit. Some hardware circuits or portions thereof may also be custom-designed in a schematic editor and captured into the integrated circuit design along with synthesized circuitry. The integrated circuits may include transistors and may further include other circuit elements (e.g. passive elements such as capacitors, resistors, inductors, etc.) and interconnect between the transistors and circuit elements. Some embodiments may implement multiple integrated circuits coupled together to implement the hardware circuits, and/or discrete elements may be used in some embodiments. Alternatively, the HDL design may be synthesized to a programmable logic array such as a field programmable gate array (FPGA) and may be implemented in the FPGA.
As used herein, the term “based on” or “dependent on” is used to describe one or more factors that affect a determination. This term does not foreclose the possibility that additional factors may affect the determination. That is, a determination may be solely based on specified factors or based on the specified factors as well as other, unspecified factors. Consider the phrase “determine A based on B.” This phrase specifies that B is a factor used to determine A or that affects the determination of A. This phrase does not foreclose that the determination of A may also be based on some other factor, such as C. This phrase is also intended to cover an embodiment in which A is determined based solely on B. As used herein, the phrase “based on” is synonymous with the phrase “based at least in part on.”
This specification includes references to various embodiments, to indicate that the present disclosure is not intended to refer to one particular implementation, but rather a range of embodiments that fall within the spirit of the present disclosure, including the appended claims. Particular features, structures, or characteristics may be combined in any suitable manner consistent with this disclosure.
This specification may use the words “a” or “an” to refer to an element, or “the” to refer to the element. These words are not intended to mean that there is only one instance of the element. There may be more than one in various embodiments. Thus, “a”, “an”, and “the” should be interpreted to mean “one or more” unless expressly described as only one.
This specification may describe various components, units, circuits, etc. as being coupled. In some embodiments, the components, units, circuits, etc. may be coupled if they are electrically coupled (e.g. directly connected or indirectly connected through one or more other circuits) and/or communicatively coupled.
In an embodiment, a library of clocked circuits is provided for the design of an integrated circuit, with programmable clock delays incorporated into the clocked circuits in the library. Different variations of the same underlying clocked circuit (e.g. a flop) may be provided with different amounts of selectable clock delay (or clock skew). The programmable clocked circuits may have a delay input that may be driven by a delay control circuit in the integrated circuit. The delay control circuit may be designed to select the appropriate delay for combinations of current operating states, process corners, and/or current temperatures. The timing of the path may thus be varied as needed to meet timing requirements, providing more flexibility in the design process for the integrated circuit. Timing closure may be more rapidly achieved by providing the designers with the library of programmable clocked circuits to use in tuning the timing paths in their designs. That is, when faced with a difficult timing path that scales differently across the operating states, process corners, and/or temperatures, a designer may instantiate a programmable clocked circuit that supports a set of delays which allow the path to meet timing across the operating states, process corners, and temperatures. The delay control circuit may be coded to select the delays for the instantiated programmable clocked circuit based on inputs that identify the operating state, process corner, and temperature. The process of analyzing the paths and instantiating the programmable clocked circuits may be manual (performed by the designer directly), may be automatically implemented in the design tools available to the designer, or a combination of manual and automatic insertion may be used.
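One way to picture the selection of a delay for each condition is the sketch below. It assumes, for illustration only, that the programmable delay is applied to the clock of the capturing element, so that added delay relaxes the setup check and tightens the hold check. The function name, the condition names, and all numbers are hypothetical and are not taken from this disclosure.

```python
def select_delay(max_path_ps, min_path_ps, period_ps, setup_ps, hold_ps, delays_ps):
    """Return the smallest available capture-clock delay that meets setup and hold.

    Returns None when no available delay satisfies both checks.
    """
    for delay in sorted(delays_ps):
        # Setup: the slowest data must arrive before the (delayed) capture edge.
        setup_ok = max_path_ps + setup_ps <= period_ps + delay
        # Hold: the fastest data must not arrive before the (delayed) hold window closes.
        hold_ok = min_path_ps >= delay + hold_ps
        if setup_ok and hold_ok:
            return delay
    return None


# Hypothetical delays (picoseconds) supported by one programmable clocked circuit.
AVAILABLE_DELAYS_PS = [0, 50, 100, 150]

# Hypothetical path timing at two operating states.
conditions = {
    "high_voltage": dict(max_path_ps=900, min_path_ps=200, period_ps=1000),
    "low_voltage": dict(max_path_ps=2080, min_path_ps=450, period_ps=2000),
}

selected = {
    name: select_delay(setup_ps=30, hold_ps=20, delays_ps=AVAILABLE_DELAYS_PS, **timing)
    for name, timing in conditions.items()
}
```

In this hypothetical case the path needs no added delay at the high voltage state, while the low voltage state requires the largest supported delay. A `None` result would indicate that a circuit variation with a different set of delays is needed.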
In an embodiment, the programmable clocked circuits may be inserted as needed in a design, and standard clocked circuits without programmable clock delays may be used on paths that are not challenging. The programmable clocked circuits may be larger and/or may consume more power than their non-programmable counterparts, and thus it may not be desirable to use the programmable clocked circuits for all clocked circuits in the design.
Generally, a clocked circuit may include any standard cell in a standard cell library that receives a clock input and performs an operation that is at least in part responsive to the clock input. The clocked circuits may include sequential elements which capture and store data from a data input to the sequential element responsive to the clock input. For example, sequential elements may include flops, latches, registers, and the like. The sequential elements may also launch the stored data on a data output of the sequential elements responsive to the clock input. The clocked circuits may include clock gater circuits. A clock gater circuit may be included in the clock tree that distributes the clock across the integrated circuit, and may be used to conditionally gate the clock when the circuitry served by the clock gater circuit is idle. That is, a gated clock is not toggling while it is gated, and is toggling when ungated. The clocked circuits may also include custom macros such as register files, memories such as static random access memories, and the like. The custom macros may use the clock input in a variety of ways that are custom-designed for the macro. The clocked circuits may further include programmable delay circuits that apply a programmable delay to a clock input to generate a clock output.
In addition to facilitating timing closure, the programmable clocked circuits may be used to enhance other aspects of the integrated circuit. For example, programmable clocked circuits may be instantiated on paths that appear to meet timing requirements, but which are vulnerable to failure because the timing slack is relatively small. Also, paths which have characteristics that have shown difficulty in the past may be considered vulnerable and may have programmable clocked circuits instantiated. Examples of vulnerable paths may include paths that experience significant cross talk noise, paths that cover a large area of the IC 10 and thus may be more vulnerable to cross chip variations, etc.
Paths that are vulnerable may be more likely to be causes of failure during post silicon validation, when the integrated circuit has been fabricated and is being tested. Previously, the paths would be identified as failing in the post-silicon validation, and the integrated circuit design may be modified to correct the path. However, the ability to continue testing and identify other potential issues may be limited by the failing path until the revised design has been fabricated, introducing delay and cost into the post-silicon validation process. If the paths are identified as failing and the paths have programmable clocked circuits on them, the clock delays in the programmable clocked circuits may be programmed to overcome the failing path, allowing the path to function correctly. This may allow post silicon validation to progress further, identifying other issues that may be masked by the failing paths, without waiting for a revised integrated circuit to be manufactured.
Even further, the inclusion of the programmable clocked circuits may be used for yield improvement for the integrated circuit. If a given path is failing in the final design of the integrated circuit more frequently than desired (reducing the yield of functional integrated circuits that can be used in products), and the given path has programmable clocked circuits, the failure may be overcome by changing the programming of the programmable clocked circuits so that the paths meet timing requirements.
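The reprogramming described above can be pictured as a simple sweep over the available delay settings. The sketch below is a hypothetical illustration only: `find_passing_setting`, the setting values, and the stand-in `example_path_passes` check are assumptions introduced here. Real post-silicon validation or production testing would exercise the fabricated integrated circuit rather than call a Python function.

```python
def find_passing_setting(settings, path_passes):
    """Return the first delay setting for which the path test passes, else None."""
    for setting in settings:
        if path_passes(setting):
            return setting
    return None


def example_path_passes(setting):
    """Stand-in for a silicon test; assumes the path works only at settings >= 2."""
    return setting >= 2


chosen = find_passing_setting([0, 1, 2, 3], example_path_passes)
```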
Turning now to
The clock source 12 may be any type of clock generation circuitry, in various embodiments. For example, the clock source 12 may include one or more phase locked loops (PLLs) that generate the clock or clocks from a reference clock and lock the phase to the reference clock. The clock source 12 may include one or more delay locked loops (DLLs) that generate the clock or clocks from a reference clock and lock the phase to the reference clock. One or more clock multipliers or clock dividers may be used. In other embodiments, the clock source 12 may simply be an input to the integrated circuit 10.
The clock tree 14 may include one or more clock gater circuits (more briefly “clock gaters” or “CG” in
The clock gaters 16A-16B may be programmable clocked circuits in this embodiment, and thus are coupled to a delay control circuit 24. The delay control circuit 24 may be configured to provide delay values for the clock gaters 16A-16B as discussed below. In other embodiments, there may be clock gaters 16A-16B that do not have programmable clock delays as well, or one of the clock gaters 16A-16B may have a programmable clock delay while the other clock gater 16A-16B does not. There may also be embodiments in which none of the clock gaters 16A-16B have programmable clock delays (e.g. other clocked circuits in the integrated circuit 10 may have programmable clock delays).
The leaf nodes of the clock tree 14 may be coupled to the clock inputs on various sequential elements such as the sequential elements 18A-18B shown in
In the illustrated embodiment, both the sequential elements 18A-18B have programmable clock delays and thus are coupled to the clock delay control circuit 24. The clock delay control circuit 24 may provide delay values as discussed below. In other embodiments, only one of the sequential elements 18A-18B may have a programmable clock delay and the other sequential element 18A-18B may not have a programmable clock delay. Still further, there may be embodiments in which none of the sequential elements 18A-18B have a programmable clock delay (e.g., other clocked circuits in the integrated circuit 10 may have a programmable clock delay).
While one combinatorial logic circuit 22 is shown in
The custom macro(s) 20 in
The clock delay control circuit 24 is configured to generate the delay values for the programmable clocked circuits in the integrated circuit 10 (e.g. the clock gaters 16A-16B, the sequential elements 18A-18B, and the custom macros 20 in the illustrated embodiment). More particularly, the clock delay control circuit 24 may receive inputs that identify the current operating state (“PState” in
In the illustrated embodiment, a power management unit (PMU) 26 may provide the PState input to the clock delay control circuit 24. The PMU 26 may control the PState for various circuits in the integrated circuit 10. In general, the PState may be a combination of clock frequency and supply voltage magnitude being supplied to the integrated circuit 10 or a portion of the integrated circuit 10. Based on the PState that applies to the programmable clocked circuits in the integrated circuit 10, the clock delay control circuit 24 may generate corresponding delay values. In an embodiment, there may be multiple PStates for different subsets of the programmable clocked circuits. For example, an IC 10 may include multiple processors that may be operating at different PStates. An IC 10 may include other circuits (e.g. various peripheral circuits if the IC 10 is an SOC) that may operate at different PStates from each other and/or the one or more processors. The clock delay control circuit 24 may receive each PState that applies to one or more programmable clocked circuits in the IC 10 and may be configured to generate corresponding delay values based on the respective PStates. The PMU 26 may also be referred to as a dynamic voltage and frequency management (DVFM) controller or unit.
The IC 10 may include a thermal control circuit (or thermal controller) 28 which may identify the temperature in the IC 10. The thermal control circuit 28 may include one or more temperature sensors, which may be distributed across the area occupied by the integrated circuit 10, to measure temperature. The thermal control circuit 28 may capture the temperature measurements periodically, and may generate the temperature identification for the clock delay control circuit 24. The identified temperature may be the maximum detected temperature, a combination of the measured temperatures (e.g. an average), or may be selected to be the temperature of the temperature sensors that are physically nearest to the programmable clocked circuits controlled by the clock delay control circuit 24. In an embodiment, a portion of the thermal controller 28 may be implemented in software that executes on one or more processors in the IC 10. The instructions implementing the software may be stored on a non-transitory computer accessible storage medium that may be part of the thermal controller 28. In other embodiments, the thermal controller 28 may provide a temperature input to the PMU/DVFM controller 26, which may determine if a different PState may be appropriate to reduce the temperature. In some such embodiments, the temperature may not be provided directly to the clock delay control circuit 24. Alternatively, the temperature may remain an input to the clock delay control circuit 24 in other embodiments, in addition to potentially affecting the PState.
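The three options for the identified temperature (maximum, average, or nearest sensors) may be sketched as follows. The sensor names, readings, and the `identify_temperature` helper are hypothetical. Taking the maximum over several nearest sensors is an assumption made here for illustration and is not specified in this disclosure.

```python
def identify_temperature(readings_c, policy="max", nearest_sensors=()):
    """Reduce per-sensor readings (degrees Celsius) to one temperature identification."""
    if policy == "max":
        return max(readings_c.values())
    if policy == "average":
        return sum(readings_c.values()) / len(readings_c)
    if policy == "nearest":
        # Assumption: when several sensors are nearest, report the hottest of them.
        return max(readings_c[name] for name in nearest_sensors)
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical readings from three sensors distributed across the die.
readings = {"s0": 60.0, "s1": 70.0, "s2": 80.0}

hottest = identify_temperature(readings, policy="max")
mean = identify_temperature(readings, policy="average")
local = identify_temperature(readings, policy="nearest", nearest_sensors=("s0", "s1"))
```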
The clock delay control circuit 24 may also generate the delay values based on the process corner. The process corner may be fixed for a given instance of the integrated circuit 10, and may be recorded in a manner that is accessible to the clock delay control circuit 24. For example, in the illustrated embodiment, the IC 10 includes a set of fuses 30 that may be selectively blown during manufacture to record various values for the IC 10. One or more of the fuses 30 may be selectively blown to identify the process corner that was in effect at the time the IC 10 was manufactured. The fuses 30 may be coupled to the clock delay control circuit 24 and may identify the process corner.
In the embodiment of
Other embodiments may use a combination of the global delay value distribution and the unique value distribution. For example, a subset of the programmable clocked circuits may receive a global value while others receive unique values. Alternatively, more than one global value may be transmitted. A given programmable clocked circuit may receive one of the global values, and may decode it to determine the selected delay. Such an embodiment may allow more variation in the selected delays than a single global value (and/or may simplify the decode circuitry in the programmable clocked circuits) but may not require as much routing cost as the full unique value solution.
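The mix of global and unique delay values may be pictured with the following sketch, in which each programmable clocked circuit decodes a shared global value through its own table unless a unique value is supplied for it. The circuit names, table contents, and function names are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical per-circuit decode tables: global value -> locally selected delay setting.
DECODE_TABLES = {
    "flop_a": {0: 0, 1: 1, 2: 3},
    "flop_b": {0: 0, 1: 2, 2: 2},
}


def decode_global(global_value, circuit):
    """Decode the shared global value into the delay setting for one circuit."""
    return DECODE_TABLES[circuit][global_value]


def distribute(global_value, unique_values=None):
    """Return the delay setting seen by each circuit.

    Circuits listed in unique_values use that value directly; all others
    decode the global value locally.
    """
    unique_values = unique_values or {}
    return {
        circuit: unique_values.get(circuit, decode_global(global_value, circuit))
        for circuit in DECODE_TABLES
    }


all_global = distribute(1)
mixed = distribute(2, unique_values={"flop_b": 0})
```

Here the same global value of 1 yields different delay settings in the two circuits, because each circuit applies its own decode.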
It is noted that, while some embodiments may include operating state, temperature, and process corner in determining delay values for the programmable clocked circuits, other embodiments may employ subsets of the above depending on which factors contribute to the paths that employ the programmable clocked circuits to meet timing. For example, in some cases, the paths may not be sensitive enough to temperature to need the temperature input. Other embodiments may eliminate the process corner input or the operating state input, if not required. Any combination of factors may be used in various embodiments.
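A table-based mapping from the selected factors to a delay value may be sketched as shown below. The table contents, the PState and corner labels, and the `delay_value` helper are hypothetical. In an actual design the entries would be derived from timing analysis of the paths involved.

```python
# Hypothetical table keyed by (operating state, process corner).
DELAY_TABLE = {
    ("P0", "fast"): 0,
    ("P0", "slow"): 1,
    ("P1", "fast"): 1,
    ("P1", "slow"): 3,
}

# Factors used to index the table; temperature is omitted in this configuration.
FACTORS = ("pstate", "corner")


def delay_value(table, factors, **inputs):
    """Look up the delay value using only the configured subset of factors."""
    key = tuple(inputs[name] for name in factors)
    return table[key]


# The temperature input is supplied but ignored, since it is not in FACTORS.
value = delay_value(DELAY_TABLE, FACTORS, pstate="P1", corner="slow", temperature="hot")
```

Adding or removing a factor in this sketch only changes the tuple used as the table key, which mirrors the way embodiments may use any combination of the factors.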
The clock input is coupled to a programmable delay circuit (PDC) 48 in the sequential element 18A. The embodiment of the PDC 48 shown in
By controlling the inputs to the muxes 40A-40C, various delays may be selected. For example, if each mux 40A-40C selects its delay chain input, a delay equal to the sum of the delays of the delay chains 42A-42C and the muxes 40A-40C may be applied to the clock input to generate the delayed clock on the internal input. If mux 40C selects the clock input, then a delay that approximates zero may be applied (e.g. only the delay through the mux 40C may be incurred). If other muxes 40A-40B select the clock input, then delays between the sum of all the delay chains 42A-42C and zero may be applied. For example, if the mux 40A selects the clock input and the mux 40B selects the delay chain 42B (and if there are not any additional muxes and delay chains in the circuit), then the delay would be the sum of the delays of the delay chains 42B-42C and the muxes 40A-40C.
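The delay arithmetic in the preceding examples can be checked with a small model. The sketch assumes a topology in which each mux chooses between the output of its delay chain and the undelayed clock input, and each delay chain after the first is fed by the preceding mux. That topology, the function name, and the picosecond values are assumptions for illustration.

```python
def pdc_delay(chain_delays_ps, mux_delay_ps, select_chain):
    """Total clock delay through a series of delay-chain/mux stages.

    select_chain[i] is True when mux i passes its delay chain output and
    False when it passes the undelayed clock input.
    """
    total = 0
    for chain, use_chain in zip(chain_delays_ps, select_chain):
        if use_chain:
            # Signal continues through this stage's chain and mux.
            total += chain + mux_delay_ps
        else:
            # Mux picks the undelayed clock, discarding all earlier delay.
            total = mux_delay_ps
    return total


CHAINS_PS = [20, 30, 40]  # hypothetical delays for three delay chains
MUX_PS = 5                # hypothetical delay of each mux

all_chains = pdc_delay(CHAINS_PS, MUX_PS, [True, True, True])
bypass_all = pdc_delay(CHAINS_PS, MUX_PS, [True, True, False])
skip_first = pdc_delay(CHAINS_PS, MUX_PS, [False, True, True])
```

With these numbers, `all_chains` is 105 ps (all three chains plus three muxes), `bypass_all` is 5 ps (only the final mux), and `skip_first` is 85 ps (the second and third chains plus three muxes).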
Optionally, a delay control decode circuit 46 may be provided in the PDC 48 to decode the delay value from the clock delay control circuit 24 to generate the mux selects for the muxes 40A-40C. The delay control decode circuit 46 may be designed during the timing analysis phase of the design, based on the delay value that will be provided and the desired delay that maps to that delay value. The delay control decode circuit 46 may be coupled to the delay control circuit 24 and may receive the delay value, and may be coupled to the mux selects for the muxes 40A-40C to control the selected delay. In some embodiments, the delay control decode circuit 46 may not be needed (e.g. if individual delay values are provided similar to
The delay chains 42A-42C may be designed to have approximately the same delays, or different amounts of delay may be employed for one or more of the delay chains 42A-42C as compared to other ones of the delay chains 42A-42C. In an embodiment, each delay chain 42A-42C may have a different delay. Any combination of delays may be implemented as desired in various embodiments. Generally, each delay chain may include one or more buffers, each of which has a specified amount of delay. For example, a buffer may be formed from a series connection of two inverters in complementary metal-oxide-semiconductor (CMOS) logic.
Other embodiments may use different structures than the mux and delay chain structure for the PDC 48 shown in
Similar to the discussion above with regard to the sequential element 18A, the clock input to the clock gater 16A is coupled to a programmable delay circuit (PDC) 58 in the clock gater 16A. The embodiment of the PDC 58 shown in
By controlling the inputs to the muxes 50A-50C, various delays may be selected. For example, if each mux 50A-50C selects its delay chain input, a delay equal to the sum of the delays of the delay chains 52A-52C and the muxes 50A-50C may be applied to the clock input to generate the delayed clock on the internal input to the gater circuit 54. If mux 50C selects the clock input, then a delay that approximates zero may be applied (e.g. only the delay through the mux 50C may be incurred). If other muxes 50A-50B select the clock input, then delays between the sum of all the delay chains 52A-52C and zero may be applied. For example, if the mux 50A selects the clock input and the mux 50B selects the delay chain 52B (and if there are not any additional muxes and delay chains in the circuit), then the delay would be the sum of the delays of the delay chains 52B-52C and the muxes 50A-50C.
Optionally, a delay control decode circuit 56 may be provided in the PDC 58 to decode the delay value from the clock delay control circuit 24 to generate the mux selects for the muxes 50A-50C. The delay control decode circuit 56 may be designed during the timing analysis phase of the design, based on the delay value that will be provided and the desired delay that maps to that delay value. The delay control decode circuit 56 may be coupled to the delay control circuit 24 and may receive the delay value, and may be coupled to the mux selects for the muxes 50A-50C to control the selected delay. In some embodiments, the delay control decode circuit 56 may not be needed (e.g. if individual delay values are provided similar to
The delay chains 52A-52C may be designed to have approximately the same delays, or different amounts of delay may be employed for one or more of the delay chains 52A-52C as compared to other ones of the delay chains 52A-52C. In an embodiment, each delay chain 52A-52C may have a different delay. Any combination of delays may be implemented as desired in various embodiments. Generally, each delay chain may include one or more buffers, each of which has a specified amount of delay. For example, a buffer may be formed from a series connection of two inverters in complementary metal-oxide-semiconductor (CMOS) logic.
Other embodiments may use different structures than the mux and delay chain structure for the PDC 58 shown in
Similar to the discussion above with regard to the sequential element 18A, the clock input to the macro 20 is coupled to a programmable delay circuit (PDC) 68 in the macro 20. The embodiment of the PDC 68 shown in
By controlling the inputs to the muxes 60A-60C, various delays may be selected. For example, if each mux 60A-60C selects its delay chain input, a delay equal to the sum of the delays of the delay chains 62A-62C and the muxes 60A-60C may be applied to the clock input to generate the delayed clock on the internal input to the macro circuit 64. If mux 60C selects the clock input, then a delay that approximates zero may be applied (e.g. only the delay through the mux 60C may be incurred). If other muxes 60A-60B select the clock input, then delays between the sum of all the delay chains 62A-62C and zero may be applied. For example, if the mux 60A selects the clock input and the mux 60B selects the delay chain 62B (and if there are not any additional muxes and delay chains in the circuit), then the delay would be the sum of the delays of the delay chains 62B-62C and the muxes 60A-60C.
Optionally, a delay control decode circuit 66 may be provided in the PDC 68 to decode the delay value from the clock delay control circuit 24 to generate the mux selects for the muxes 60A-60C. The delay control decode circuit 66 may be designed during the timing analysis phase of the design, based on the delay value that will be provided and the desired delay that maps to that delay value. The delay control decode circuit 66 may be coupled to the delay control circuit 24 and may receive the delay value, and may be coupled to the mux selects for the muxes 60A-60C to control the selected delay. In some embodiments, the delay control decode circuit 66 may not be needed (e.g. if individual delay values are provided similar to
The delay chains 62A-62C may be designed to have approximately the same delays, or different amounts of delay may be employed for one or more of the delay chains 62A-62C as compared to other ones of the delay chains 62A-62C. In an embodiment, each delay chain 62A-62C may have a different delay. Any combination of delays may be implemented as desired in various embodiments. Generally, each delay chain may include one or more buffers, each of which has a specified amount of delay. For example, a buffer may be formed from a series connection of two inverters in complementary metal-oxide-semiconductor (CMOS) logic.
Other embodiments may use different structures than the mux and delay chain structure for the PDC 68 shown in
The delay control generator circuit 70 may be designed during the timing analysis phase, as the programmable clocked circuits are instantiated and the delay selections for various combinations of the PState, temperature, and/or process corners are determined. The delay control generator circuit 70 may include combinatorial logic, state machines, clocked storage such as sequential elements, etc. to determine the delay value or values to be transmitted to the programmable clocked circuits. In an embodiment, the delay control generator circuit 70 may be coded in a hardware description language and synthesized, similar to other logic circuitry in the integrated circuit 10. The delay control generator circuit 70 may map the combinations of PState, temperature, and/or process corner to the delay selections.
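The mapping performed by the delay control generator circuit 70 may be thought of as a lookup from a combination of PState, temperature, and process corner to a set of delay values, one per programmable clocked circuit. The following Python sketch is a behavioral model only, not the synthesized logic; the key encoding, the circuit names `flop_18A` and `gater_16A`, and the table contents are hypothetical.

```python
from typing import Dict, Tuple

# (PState, temperature band, process corner): all encodings are hypothetical.
Key = Tuple[str, str, str]

DELAY_MAP: Dict[Key, Dict[str, int]] = {
    ("P0", "hot", "slow"): {"flop_18A": 2, "gater_16A": 1},
    ("P0", "cold", "fast"): {"flop_18A": 0, "gater_16A": 0},
    ("P3", "hot", "slow"): {"flop_18A": 1, "gater_16A": 1},
}

# Assumed fallback when a combination has no dedicated entry: no added delay.
DEFAULT_DELAYS = {"flop_18A": 0, "gater_16A": 0}


def delay_values(pstate: str, temp_band: str, corner: str) -> Dict[str, int]:
    """Return the delay value to drive to each programmable clocked circuit."""
    entry = DELAY_MAP.get((pstate, temp_band, corner), DEFAULT_DELAYS)
    return dict(entry)  # return a copy so callers cannot alter the table


print(delay_values("P0", "hot", "slow"))
print(delay_values("P1", "cold", "typical"))
```

In hardware the same mapping could be expressed as combinatorial logic or a state machine, as the paragraph above notes; the dictionary is simply the most compact way to show the input-to-output relationship.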
The programmable overrides 72 may provide the ability to override one or more delays for post-silicon validation and/or yield improvement, as described above. That is, if a path is failing and the delay to one or more of the programmable clocked circuits may be changed to permit the path to pass, the programmable overrides 72 may be used to change the delay. The programmable overrides 72 may be programmed with the delay value to provide, and an enable may be set to select the override in place of the delay value from the delay control generator circuit 70. Thus, the mux 74 may be representative of multiple muxes 74, one for each delay value that may be overridden. Some delay values may not be overridden and may not include the mux 74 in the path for those delay values (e.g. the output of the delay control generator circuit 70 may be output directly to the programmable clocked circuits).
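The override selection can be sketched behaviorally as follows. The model assumes one override register (a value plus an enable) per overridable delay value, and represents a delay value with no mux 74 in its path by passing `None`. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Override:
    """Hypothetical model of one programmable override 72 entry."""
    enable: bool = False
    value: int = 0


def select_delay(generated: int, override: Optional[Override]) -> int:
    """Model of mux 74: pass the override when enabled, else the generated value.

    override=None models a delay value that cannot be overridden, i.e. the
    generator output is driven directly to the programmable clocked circuit.
    """
    if override is not None and override.enable:
        return override.value
    return generated


print(select_delay(2, Override(enable=False, value=3)))  # 2
print(select_delay(2, Override(enable=True, value=3)))   # 3
print(select_delay(2, None))                             # 2
```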
Since incorrect values in the delay table 80 may lead to malfunction in the integrated circuit 10, the delay table 80 may be programmed with the delay values in a secure fashion, so that only the validly-determined delays are provided in the table. In an embodiment, the delays may be provided from a secure, on-chip non-volatile memory, for example. The delay table 80 may still support overrides for post-silicon validation and/or yield improvement. In this case, the source of the data for the delay table 80 may be updated with the overrides. The programming of overrides may also be protected by a secure mechanism to prevent invalid delays from being provided, which could cause erroneous operation.
The designers may generate a design of the IC 10 (block 90). Generating the design may include coding the design in a hardware description language and synthesizing the design using a library of standard cells and/or custom designing circuitry using schematic capture tools. The library of standard cells may include various instances of the clock gaters 16A-16B and the sequential elements 18A-18B with different configurations of programmable clock delay. Initially, however, the synthesis tool may select clock gaters 16A-16B and sequential elements 18A-18B that do not include the programmable delay. The macros 20 may be instantiated in the design without programmable delay as well, and may be replaced by macros with programmable delay as needed.
The designers may perform static timing analysis (STA) on the design, using parameters that specify each process corner as well as combinations of the operating state (PState) and temperature (block 92). The designers may also perform static timing analysis on the design at individual corners (process, operating state, and temperature) with the useful skewing option selected (block 94). Useful skewing may be an option that some STA tools support in which the tools attempt to identify clock skewing (e.g. intentional insertion of clock delay) on certain sequential elements that may improve the timing characteristics of paths that include those sequential elements. Typically, the useful skewing is performed across all the corners, and only identifies skewing that benefits a path across all the corners (or that minimally worsens a path at one or more corners while benefitting the path at one or more other corners). By running the useful skewing on individual corners, skewing that may benefit a path at one corner may be identified. The per-corner useful skewing results may help identify paths which could benefit from programmable clocked circuits such as those described herein.
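To illustrate why per-corner useful skewing may reveal opportunities that all-corner skewing misses, consider a simplified single-path model in which delaying the capture clock by d picoseconds adds d to the setup slack and subtracts d from the hold slack of the path. The Python sketch below computes the range of acceptable skew at each corner and the range common to all corners. It is an assumption-laden illustration, not a description of any STA tool: it ignores the effect of the skew on paths launched from the capturing element, and the corner names and slack values are hypothetical.

```python
def skew_window(setup_slack_ps, hold_slack_ps):
    """Range of added capture-clock delay that meets setup and hold at one corner."""
    lo = max(0, -setup_slack_ps)  # delay needed to repair a setup violation
    hi = hold_slack_ps            # delay that can be added before hold fails
    return (lo, hi) if lo <= hi else None


def common_window(corners):
    """Range of a single fixed skew that works at every corner, or None."""
    windows = [skew_window(setup, hold) for setup, hold in corners.values()]
    if any(w is None for w in windows):
        return None
    lo = max(w[0] for w in windows)
    hi = min(w[1] for w in windows)
    return (lo, hi) if lo <= hi else None


# corner: (setup slack, hold slack) in picoseconds, hypothetical values
corners = {"slow_hot": (-40, 100), "fast_cold": (30, 15)}

for name, (setup, hold) in corners.items():
    print(name, skew_window(setup, hold))
print("all corners:", common_window(corners))
```

In this example each corner has a workable skew range on its own (40 to 100 picoseconds at the slow corner, 0 to 15 picoseconds at the fast corner), but no single fixed skew satisfies both, which is the situation a programmable clocked circuit is intended to address.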
The designers may analyze the static timing results from the static timing analysis performed at blocks 92 and 94, identifying paths that fail to meet timing at one or more corners (block 96). Paths that fail to meet timing at all corners, or most corners, may be solved using traditional tuning techniques. However, paths that are sensitive to changes in corners may be identified (block 98). That is, paths that meet timing at most corners but fail at particular corners, or paths for which timing varies significantly based on changes in corners, may be identified. Such paths may benefit from the use of programmable clocked circuits. Additionally, the paths may be analyzed to identify paths that have known characteristics that have caused unexpected timing failures in the past (block 100). For example, paths that may be subject to significant cross talk noise may be identified.
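The triage described above can be sketched as a simple classification over per-corner slack values. The thresholds in the Python sketch below (what counts as "most corners" and what counts as "significant" variation) are hypothetical parameters chosen for illustration; the disclosure does not specify them.

```python
def classify_path(slack_by_corner, most_fraction=0.5, spread_ps=50):
    """Classify one timing path from its slack (picoseconds) at each corner.

    most_fraction and spread_ps are assumed thresholds, not values from
    the disclosure.
    """
    slacks = list(slack_by_corner.values())
    if not slacks:
        raise ValueError("at least one corner is required")
    failing = sum(1 for s in slacks if s < 0)
    if failing > most_fraction * len(slacks):
        # Fails at all or most corners: candidate for traditional tuning.
        return "traditional-fix"
    if failing > 0 or max(slacks) - min(slacks) >= spread_ps:
        # Fails only at particular corners, or timing varies widely.
        return "corner-sensitive"
    return "meets-timing"


print(classify_path({"ss_hot": -5, "tt": 10, "ff_cold": 20}))
print(classify_path({"ss_hot": -5, "tt": -3, "ff_cold": 4}))
print(classify_path({"ss_hot": 5, "ff_cold": 80}))
```

Paths classified as corner-sensitive are the ones that may benefit from programmable clocked circuits; paths flagged for known risk characteristics such as cross talk (block 100) would need a separate check not modeled here.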
The designers may instantiate the programmable clocked circuits on the identified paths, replacing clock gaters, sequential elements, or macros with corresponding circuits that include programmable delays on the clock input (block 102). Different instances of the programmable clocked circuits may have different configurations of the programmable delay circuit, based on the needs of the particular path. For example, if a path needs 20 picoseconds of delay at one corner, and 60 picoseconds of delay at another corner, a programmable delay circuit may be selected that may provide both 20 picoseconds of the delay and 60 picoseconds of delay based on different values on the delay input. A different programmable delay circuit may be selected for another path having different delay requirements.
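Selecting a programmable delay circuit configuration for a path can be sketched as a search over library cells for one whose selectable delays cover every delay the path requires. In the Python sketch below, the library contents, the cell names, and the tie-breaking rule (prefer the cell with the fewest selectable delays as a rough proxy for area) are all hypothetical.

```python
def pick_pdc(library, required_ps, tolerance_ps=0):
    """Return the name of a library cell covering all required delays, or None.

    library maps a cell name to the list of delays (picoseconds) that cell
    can select; required_ps lists the delays the path needs across corners.
    """
    def covers(delays):
        return all(
            any(abs(d - need) <= tolerance_ps for d in delays)
            for need in required_ps
        )

    candidates = [name for name, delays in library.items() if covers(delays)]
    # Assumed preference: the fewest selectable delays among covering cells.
    return min(candidates, key=lambda name: len(library[name]), default=None)


library = {
    "pdc_a": [0, 20, 40],
    "pdc_b": [0, 20, 60],
    "pdc_c": [0, 20, 40, 60],
}

print(pick_pdc(library, [20, 60]))  # pdc_b
print(pick_pdc(library, [100]))     # None
```

The first call mirrors the example in the text of a path needing 20 picoseconds at one corner and 60 picoseconds at another.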
The clock delay control circuit 24 may be instantiated and connected to the programmable clocked circuits (block 104). In one embodiment, the clock delay control circuit 24 may be programmed in a hardware description language and synthesized, similar to other parts of the IC 10. Alternatively, if the clock delay control circuit 24 is implemented as a table, the circuit may be instantiated and the outputs connected to the programmable clocked circuits.
The IC 10 may be tested using various test patterns, developed during the design phase, that stress various paths in the IC 10 (block 110). If one or more tests fail, then one or more failing paths may be identified (block 112). The failing paths may be identified to the designers to determine a fix for the failing path, which may be incorporated into the next revision of the IC 10. Additionally, if the failing paths are equipped with programmable clocked circuits (decision block 114, yes leg), the delay in the programmable clocked circuits may be overridden to permit the failing path to pass (block 116). Additional testing may be performed to determine if there are additional failures that were previously masked by the initial failing paths.
Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Dec 04 2020 | Apple Inc. | (assignment on the face of the patent) | / |