棋子 · 2022年12月09日

时钟门控的作用

作者:semiwiki
链接:https://semiwiki.com/eda/321179-the-role-of-clock-gating/

什么是时钟门控?

有几个因素会影响电路的功耗。逻辑门具有静态或泄漏功率,只要对其施加电压,该功率大致恒定,并且它们具有由切换电线产生的动态或开关功率。Flip-flop触发器非常耗电,大约占总功率的 20%。时钟消耗的可能更多,可能约为 40%!全局时钟无处不在,而且每个周期都会切换两次。正如我们将看到的,时钟门控避免了在不需要时钟脉冲时切换时钟。这减少了时钟分配和触发器的功耗,甚至可以减少逻辑门的动态功耗。

即使在繁忙的电路中,当你仔细观察时,大多数逻辑电路大部分时间都没有做有意义的工作。例如,在这个WARP-V CPU核心的跟踪中,CPU几乎每个周期都在执行指令。但计算分支目标的逻辑并不繁忙。它只需要用于分支指令。而浮点逻辑只需要用于浮点指令,等等。在下面的跟踪波形中,大多数信号值是灰色的,表明它们没有被使用。

image.png
显示时钟门控的 CPU 波形

如前所述,将时钟信号驱动到触发器会消耗总功率的很大一部分,因此触发器可以将其输入值传播到其输出以用于下一个执行周期。如果这些触发器的大部分输入信号都是无意义的,那么就没有必要传播它们浪费大量的功耗。

时钟门控切断了不需要的时钟脉冲。(电路也可能被设计成依赖于没有时钟,但我们不要把事情和这种情况混淆)。下面的电路显示了两个时钟门控块(蓝色),它们切断了不需要的时钟脉冲,只在进行有意义的计算时才刚打开时钟脉冲。

image.png
时钟门控图示

除了减少时钟分配和触发器功耗外,时钟门控还可以保证触发器输出在没有时钟脉冲时不会摆动。这降低了下游动态功耗。总之,与非门控电路相比,时钟门控可以节省相当多的功率。

实施时钟门控

时钟门控的一个先决条件是知道什么时候信号有意义,什么时候没有意义。这是事务级Verilog模型中固有的高级感知的一些方面。一个 "事务 "的逻辑是在表明其有效性的条件下表达的。因为一个单一的条件可以应用到事务所遵循的路径上的所有逻辑,所以应用有效性的开销是最小的。

有效性不仅仅是关于时钟门控。可以说,它有助于是否有意义。例如,前面的CPU波形是来自TL-Verilog模型。调试变得更容易了,因为我们已经自动过滤掉了大部分的信号值,将它们识别为无意义的。我们知道它们是无意义的,因为自动检查确保这些值不会被有意义的计算所消耗。

从一开始就有时钟门控的全部意义可能并不明显。我从来没有参与过一个达到时钟门控目标的项目。我们总是带着大量的机会去做芯片。这是因为省电总是最后要实现的事情。功能必须是第一位的。没有它,验证就不能取得进展。逻辑设计人员在完成他们积压的功能错误之前,不能给予时钟门控任何真正的关注,而这要到结束时才会考虑。在这一点上,许多单元已经成功实现了没有完全的时钟门控。该项目无疑已经落后于计划,而增加时钟门控将需要重新实施,包括需要解决新的时序和芯片面积压力。更糟糕的是,它将带来全新的功能错误。因此,我们只能说,如果从一开始就将时钟门控纳入模型,则不需要任何附加成本。

结论

功耗现在是第一阶设计约束,而时钟门控是整个功率策略的重要组成部分。寄存器传输级的建模并不适合成功使用时钟门控。事务级设计可以从一开始就设置时钟门控,对项目进度产生积极的影响,如果您计划生产具有竞争力的芯片,那么从一开始就采用稳健的时钟门控方法非常重要。

原文

What is clock gating?

Several factors contribute to a circuit’s power consumption. The logic gates have static or leakage power that is roughly constant as long as a voltage is applied to them, and they have dynamic or switching power resulting from toggling wires. Flip-flops are rather power-hungry, accounting for maybe ~20% of total power. Clocks can consume even more, perhaps ~40%! Global clocks go everywhere, and they toggle twice each cycle. As we’ll see, clock gating avoids toggling the clock when clock pulses are not needed. This reduces the power consumption of clock distribution and flip-flops, and it can even reduce dynamic power for logic gates.

Even in a busy circuit, when you look closer, most of the logic is not doing meaningful work most of the time. In this trace of a WARP-V CPU core, for example, the CPU is executing instructions nearly every cycle. But the logic computing branch targets isn’t busy. It is only needed for branch instructions. And floating-point logic is only needed for floating-point instructions, etc. Most signal values in the trace below are gray, indicating that they aren’t used.

CPU waveform showing clock gating opportunity

CPU waveform showing clock gating opportunity

As previously noted, a significant portion of overall power is consumed by driving clock signals to flip-flops so the flip-flops can propagate their input values to their outputs for the next cycle of execution. If most of these flip-flop input signals are meaningless, there’s no need to propagate them, and we’re wasting a lot of power.

Clock gating cuts out clock pulses that aren’t needed. (Circuits may also be designed to _depend_ on the absence of a clock pulse, but let’s not confuse matters with that case.) The circuit below shows two clock gating blocks (in blue) that cut out unneeded clock pulses and only pulse the clock when a meaningful computation is being performed.

Illustration of clock gating

Illustration of clock gating

In addition to reducing clock distribution and flip-flop power, clock gating also guarantees that flip-flop outputs are not wiggling when there are no clock pulses. This reduces downstream dynamic power consumption. In all, clock gating can save a considerable amount of power relative to an ungated circuit.

Implementing Clock Gating

A prerequisite for clock gating is knowing when signals are meaningful and when they are not. This is among the aspects of higher-level awareness inherent in a Transaction-Level Verilog model. The logic of a “transaction” is expressed under the condition that indicates its validity. Since a single condition can apply to all the logic along a path followed by transactions, the overhead of applying validity is minimal.

Validity is not just about clock gating. It helps to separate the wheat from the chaff, so to speak. The earlier CPU waveform, for example, is from a TL-Verilog model. Debugging gets easier as we have automatically filtered out the majority of signal values, having identified them as meaningless. And we know they are meaningless because of automatic checking that ensures that these values are not inadvertently consumed by meaningful computations.

With this awareness in our model, default fine-grained clock gating comes for free. The valid conditions are used by default to enable our clock pulses.

The full implications of having clock gating in place from the start may not be readily apparent. I’ve never been on a project that met its goals for clock gating. We always went to silicon with plenty of opportunity left on the table. This is because power savings is always the last thing to be implemented. Functionality has to come first. Without it, verification can’t make progress, and verification is always the long pole. Logic designers can’t afford to give clock gating any real focus until they have worked through their functional bug backlogs, which doesn’t happen until the end is in sight. At this point, many units have already been successfully implemented without full clock gating. The project is undoubtedly behind schedule, and adding clock gating would necessitate implementation rework including the need to address new timing and area pressure. Worse yet, it would bring with it a whole new flood of functional bugs. As a result, well, let’s just say we’re heating the planet faster than necessary. Getting clock gating into the model from the start, at no cost, completely flips the script.

Conclusion

Power is now a first-order design constraint, and clock gating is an important part of an overall power strategy. Register transfer level modeling does not lend itself to the successful use of clock gating. A transaction-level design can have clock gating in place from the start, having a shift-left effect on the project schedule and resulting in lower-power silicon (and indirectly higher performance and lower area as well). If you are planning to produce competitive silicon, it’s important to have a robust clock-gating methodology in place from the start.

作者: allaboutcircuits
文章来源:EETOP

推荐阅读

更多IC设计技术干货请关注IC设计技术专栏。
迎添加极术小姐姐微信(id:aijishu20)加入技术交流群,请备注研究方向。
推荐阅读
关注数
11152
内容数
1221
主要交流IC以及SoC设计流程相关的技术和知识
目录
极术微信服务号
关注极术微信号
实时接收点赞提醒和评论通知
安谋科技学堂公众号
关注安谋科技学堂
实时获取安谋科技及 Arm 教学资源
安谋科技招聘公众号
关注安谋科技招聘
实时获取安谋科技中国职位信息