LJgibbs · July 7, 2023 · Beijing

[Translation] Constraining Timing Paths with Multiple Synchronous Clocks in Synthesis

Original article: https://vlsitutorials.com/constraining-multiple-synchronous-clock-design-in-synthesis/
The English original is appended below.

This is the third article in the series "how to define Synthesis timing constraint".

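This section discusses how to constrain a multi-clock design; the circuit is shown in Figure 1. The clocks in the design come from a PLL followed by dividers: the PLL directly generates a 3GHz main clock named CLKA, and four divider circuits then derive CLKB, CLKC, CLKD and CLKE from it, at 333.3MHz, 500MHz, 750MHz and 1GHz respectively. Because all of the clocks come from the same clock source, they are synchronous to one another. The PLL clock structure is shown in Figure 2.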

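Returning to the design in Figure 1, note that our IP-1 does not have a port for every clock; it has only the CLKC clock port. However, the logic clocked by CLKC receives its input data from the CLKB domain in IP-2, and its output is sampled by two different clocks in IP-3, CLKD and CLKE.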

Figure 1: Example design with multiple synchronous clocks

Figure 2: The PLL that generates the multiple clocks

Constraining the input port

Assume that the clock-to-Q delay of FF-3 is 0.05ns, the delay introduced by combo logic-4 is 0.5ns, and the setup time of FF-1 is 0.1ns.

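Because the flop driving our IP's input port is in the CLKB clock domain while our sampling clock is CLKC, we set up the input port constraints as follows. First, define the 500MHz clock CLKC on the clock input port CLKC -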

create_clock -period 2 [get_ports CLKC]

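Next, define a virtual clock named CLKB, corresponding to the real CLKB, with a period of 3ns; it serves as the reference clock for the input delay constraint.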

create_clock -period 3 -name CLKB

Then constrain the input delay of the input port Input1 to 0.55ns relative to CLKB -

set_input_delay -max 0.55 -clock CLKB [get_ports Input1]
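
The text does not spell out where 0.55ns comes from, but it matches the sum of the two delays quoted earlier that sit outside IP-1: the clock-to-Q delay of FF-3 plus the delay of combo logic-4. A minimal Python sketch of that arithmetic follows; the variable names are illustrative only, and integer picoseconds are used to avoid floating-point rounding.

```python
# External delay seen at Input1, in integer picoseconds.
# The values are the ones assumed in the text; the names are hypothetical.
FF3_CLK_TO_Q_PS = 50    # clock-to-Q delay of FF-3 (0.05 ns)
LOGIC4_DELAY_PS = 500   # delay of combo logic-4 (0.5 ns)

input_delay_ps = FF3_CLK_TO_Q_PS + LOGIC4_DELAY_PS
input_delay_ns = input_delay_ps / 1000

# Print the constraint line that this value corresponds to.
print(f"set_input_delay -max {input_delay_ns} -clock CLKB [get_ports Input1]")
```

Deriving the value this way keeps the constraint traceable to the upstream block's characteristics if either delay changes.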

With the constraints in place, let us look at how the synthesis tool uses them to perform the setup check.

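When the synthesis tool faces a path with multiple synchronous clocks, it first computes the base period of the clocks involved, that is, the least common multiple of their periods. For CLKC and CLKB, with periods of 2ns and 3ns, the base period is 6ns. The base period is the shortest interval that captures every possible relative relationship between the clock waveforms. As shown in Figure 3, both clocks rise at 0ns and rise together again at 6ns. From then on, the waveforms in 6-12ns, 12-18ns and so on are identical to those in 0-6ns. The tool therefore only needs to find the tightest (worst-case) timing relationship within 0-6ns and apply it to the checks in every clock cycle.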
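
As a quick check of the base period, the least common multiple can be computed directly. This is only a sketch of the arithmetic, not of any tool's internal implementation; the periods are expressed in integer picoseconds so that `math.lcm` (Python 3.9 or later) applies.

```python
import math

# Clock periods in integer picoseconds, taken from the constraints above.
periods_ps = {"CLKC": 2000, "CLKB": 3000}

# Base period = least common multiple of all periods involved.
base_period_ps = math.lcm(*periods_ps.values())

print(base_period_ps / 1000, "ns")  # 6.0 ns
```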

Figure 3: Worst effective clock period for the input path within the base period

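There are several launch and capture edge pairs between the two clocks within 0-6ns, and the tool finds the tightest one as follows. In this example CLKB is the launch clock, so the first launch happens at 0ns and the first capture is the rising edge of CLKC at 2ns; the first effective clock period is therefore 2ns. The second launch is the CLKB edge at 3ns, and the corresponding capture is the CLKC edge at 4ns, so this effective clock period is 1ns. Within the base period, the tightest case is therefore an effective period of 1ns.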
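
The edge search described above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes ideal clocks whose rising edges are aligned at 0ns with no skew or phase offset; a real tool also accounts for latency and uncertainty.

```python
import math

def worst_effective_period(launch_ps, capture_ps):
    """Smallest launch-to-capture edge distance within one base period.

    Assumes both clocks have a rising edge at t = 0 (integer picoseconds).
    """
    base = math.lcm(launch_ps, capture_ps)
    worst = None
    for launch_edge in range(0, base, launch_ps):
        # First capture edge strictly after this launch edge.
        capture_edge = (launch_edge // capture_ps + 1) * capture_ps
        gap = capture_edge - launch_edge
        if worst is None or gap < worst:
            worst = gap
    return worst

# CLKB (3 ns) launches, CLKC (2 ns) captures.
print(worst_effective_period(3000, 2000))  # 1000
```

The launch edges at 0ns and 3ns give gaps of 2000ps and 1000ps, and the smaller one is kept.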

Therefore, to meet the setup time of FF-1 -

  • (Delay due to combo logic-1) ≤ {(worst effective clock period) – (setup time of FF-1) – (Input delay)}
  • (delay due to combo logic-1) ≤ {(1ns) – (0.1ns) – (0.55ns)}
  • Thus the maximum delay that combo logic-1 can introduce is 0.35ns
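
The same budget can be written as a small helper. The function name is hypothetical and the values are the ones used in the inequality above, in integer picoseconds.

```python
def max_logic1_delay_ps(effective_period_ps, setup_ps, input_delay_ps):
    """Largest internal combinational delay that still meets setup."""
    return effective_period_ps - setup_ps - input_delay_ps

budget_ps = max_logic1_delay_ps(
    effective_period_ps=1000,  # worst effective clock period (1 ns)
    setup_ps=100,              # setup time of FF-1 (0.1 ns)
    input_delay_ps=550,        # input delay relative to CLKB (0.55 ns)
)
print(budget_ps)  # 350
```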

Constraining the output port

The Output1 output port of our IP-1 has two different capture clocks.

As with the input port, first define the 500MHz clock CLKC on the clock input port CLKC, then define two virtual clocks, CLKE and CLKD, at their respective frequencies of 1GHz and 750MHz -

create_clock -period 2 [get_ports CLKC]
create_clock -period 1 -name CLKE
create_clock -period [expr {1000/750.0}] -name CLKD

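Note: when using the Tcl expr command, at least one of the two numbers must be a real number so that the result is also a real number; with two integers, 1000/750 would be truncated to 1.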

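Next, constrain the output delay of Output1 relative to CLKD and to CLKE. (Translator's note: the output delay values at this point in the original are inconsistent with its later calculation; the values used in the later calculation are adopted here.)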

set_output_delay -max 0.2 -clock CLKD [get_ports Output1]

set_output_delay -max 0.57 -clock CLKE -add_delay [get_ports Output1]

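We want the synthesis tool to consider the paths to both capture clock domains and to optimize the output path through combo logic-3 for the tighter of the two. Assume that the setup time of both FF-4 and FF-5 is 0.1ns (the original says 1ns, which is an error; it should be 0.1ns) and that the delays of combo logic-7 and logic-8 are 0.1ns and 0.47ns respectively. Let us see how the synthesis tool computes and performs the setup check.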
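
The output delays of 0.2ns and 0.57ns are consistent with adding the capture flop's setup time to the downstream combinational delay. The sketch below checks this under one assumption that the text does not state explicitly: that logic-7 feeds FF-4 (clocked by CLKD) and logic-8 feeds FF-5 (clocked by CLKE).

```python
# Delays outside IP-1 on each capture path, in integer picoseconds.
# ASSUMPTION: logic-7 -> FF-4 (CLKD) and logic-8 -> FF-5 (CLKE);
# this pairing is inferred from the numbers, not stated in the text.
SETUP_PS = 100                                   # setup of FF-4 and FF-5 (0.1 ns)
external_logic_ps = {"CLKD": 100, "CLKE": 470}   # logic-7, logic-8

output_delay_ps = {clk: d + SETUP_PS for clk, d in external_logic_ps.items()}

for clk, d in output_delay_ps.items():
    print(f"{clk}: {d / 1000} ns")
```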

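As before, the tool first computes the base period of the three clocks. The least common multiple of the three clock periods is 4ns; as shown in Figure 4, the rising edges of all the clocks line up again at 4ns. (Translator's note: the horizontal axis in Figure 4 is wrong; it should be 0-4ns.)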
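
The CLKD period (4/3 ns) is not a whole number of nanoseconds or picoseconds, so an integer least common multiple does not apply directly. One way to check the 4ns figure, sketched here with exact fractions, uses the identity that the LCM of fractions in lowest terms is the LCM of the numerators divided by the GCD of the denominators. The helper name is hypothetical, and Python 3.9 or later is assumed for the multi-argument `math.lcm` and `math.gcd`.

```python
import math
from fractions import Fraction

def base_period(periods):
    """LCM of rational periods: lcm(numerators) / gcd(denominators)."""
    fracs = [Fraction(p) for p in periods]  # Fraction is always in lowest terms
    num = math.lcm(*(f.numerator for f in fracs))
    den = math.gcd(*(f.denominator for f in fracs))
    return Fraction(num, den)

# Periods in ns: CLKC = 2, CLKE = 1, CLKD = 1000/750 = 4/3.
periods_ns = [Fraction(2), Fraction(1), Fraction(1000, 750)]
print(base_period(periods_ns))  # 4
```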

Figure 4: Worst effective clock periods for the output paths within the base period

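The tool checks both launch clock/capture clock pairs and looks for the tightest timing requirement in each. As shown in Figure 4, there are four launch-capture edge pairs within 0-4ns; the worst effective clock period is 0.67ns for CLKC→CLKD and 1ns for CLKC→CLKE.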

From these we can compute the maximum allowed delay on each path and identify the critical one -

For CLKC→CLKD path:

(Delay due to combo logic-3) ≤ {(worst effective clock period) – (Output delay)}
(delay due to combo logic-3) ≤ {(0.67ns) – (0.2ns)}
Thus the maximum delay that combo logic-3 can introduce on the CLKC→CLKD path is 0.47ns.

For CLKC→CLKE path:

(Delay due to combo logic-3) ≤ {(worst effective clock period) – (Output delay)}
(delay due to combo logic-3) ≤ {(1ns) – (0.57ns)}
Thus the maximum delay that combo logic-3 can introduce on the CLKC→CLKE path is 0.43ns.

The critical path is therefore the CLKC→CLKE path, and the tool will try to optimize combo logic-3 so that its delay does not exceed 0.43ns.
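
The whole output-side calculation can be reproduced with exact fractions. This is a sketch under the same idealised assumptions as the text (rising edges aligned at 0ns, no skew), and the function and variable names are hypothetical. Because it keeps the CLKC→CLKD effective period as exactly 2/3 ns rather than the rounded 0.67ns, that budget comes out as 0.467ns instead of 0.47ns; the conclusion is unchanged.

```python
import math
from fractions import Fraction

def worst_effective_period(launch, capture, base):
    """Smallest gap from a launch edge to the next capture edge in [0, base)."""
    gaps = []
    t = Fraction(0)
    while t < base:
        # First capture edge strictly after the launch edge at time t.
        next_capture = (math.floor(t / capture) + 1) * capture
        gaps.append(next_capture - t)
        t += launch
    return min(gaps)

BASE_NS = Fraction(4)   # base period of CLKC, CLKD and CLKE
CLKC_NS = Fraction(2)   # launch clock period
capture_clocks = {
    # name: (period in ns, output delay in ns)
    "CLKD": (Fraction(4, 3), Fraction(20, 100)),
    "CLKE": (Fraction(1), Fraction(57, 100)),
}

# Budget for combo logic-3 on each path = effective period - output delay.
budgets = {
    name: worst_effective_period(CLKC_NS, period, BASE_NS) - out_delay
    for name, (period, out_delay) in capture_clocks.items()
}
critical = min(budgets, key=budgets.get)

for name, budget in budgets.items():
    print(f"CLKC->{name}: {float(budget):.3f} ns")
print("critical:", critical)
```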

Original text

This is the third article in the series "how to define Synthesis timing constraint".

Consider the example shown in Figure 1, where we have multiple clocks. As shown in Figure 2, the PLL generates a main clock named CLKA with a frequency of 3 GHz, and four dividers generate CLKB, CLKC, CLKD and CLKE, with frequencies of 333.3 MHz, 500 MHz, 750 MHz and 1 GHz respectively, from the main clock. Since all these clocks are derived from the same clock source, they are all synchronous to each other. Now coming back to Figure 1, notice that some clocks don’t have a corresponding clock port on our IP. The clock to our IP is CLKC, but the input is launched from CLKB in IP-2 and the output is captured by different clocks, CLKD and CLKE, in IP-3.

Figure 1: Multiple synchronous clocks in a design

Figure 2: Multiple synchronous clock generated from a PLL

Constraining the input port

Assume that the clock-to-Q delay of FF-3 is 0.05ns, the delay due to combo logic-4 is 0.5ns and the setup time of FF-1 is 0.1ns.

The clock that is driving the launching flop is CLKB whereas the clock for our IP is CLKC. We first create the clock for our IP with a period of 2ns (corresponding to CLKC’s frequency of 500 MHz) and apply that on our input port CLKC –

create_clock -period 2 [get_ports CLKC]

We then create a virtual clock of 3ns (corresponding to CLKB’s frequency of 333.3 MHz) which can be used as a reference clock for the input delay and name the virtual clock CLKB –

create_clock -period 3 -name CLKB

Now we simply specify the input delay relative to the virtual clock CLKB and we apply that to our input port Input1 –

set_input_delay -max 0.55 -clock CLKB [get_ports Input1]

Now let’s see how the synthesis tool determines the internal delay (in combo logic-1) for setup check.

The first thing the tool does when faced with a timing path that has multiple clocks is to derive the base period. The base period is defined as the least common multiple of the clock periods involved. For clock periods of 2ns and 3ns, the least common multiple is 6ns. The significance of the base period is that it represents the smallest amount of time with a unique clock waveform relationship. As shown in Figure 3, both clocks rise at 0ns and then line up and start rising together again at 6ns. The clock waveforms between 0-6ns look the same as between 6-12ns, 12-18ns and so on; this means the tool only needs to find the worst-case timing situation between 0-6ns, and that will define the worst-case timing for all the clock cycles.

Figure 3: Worst effective clock period for input path in the base period

Since there are multiple launching and capturing clock edges between 0-6ns, here is how the tool figures out the worst-case scenario. In our example CLKB is the launching clock, so the first launching edge happens at 0ns and the first capturing edge from CLKC happens at 2ns, so the first effective clock period is 2ns. The next possible launch edge of CLKB happens at 3ns and the next capturing CLKC edge happens after that at 4ns, so this effective clock period is 1ns, which is smaller than 2ns. Since there are no further launch-capture relationships for CLKB→CLKC within 0-6ns, the worst-case timing is 1ns.

Therefore, for satisfying setup time –

→ (Delay due to combo logic-1) ≤ {(worst effective clock period) – (setup time of FF-1) – (Input delay)}

→ Implies, (delay due to combo logic-1) ≤ {(1ns) – (0.1ns) – (0.55ns)}

→ Thus, maximum possible delay that can be introduced by the combo logic-1 is 0.35ns.

Constraining the output port

On our output port Output1, we have two different capturing clocks.

Here again we create the clock for CLKC of 2ns and we then create two virtual clocks, CLKE of 1ns period (corresponding to 1 GHz) and CLKD with a period corresponding to 750 MHz –

create_clock -period 2 [get_ports CLKC]
create_clock -period 1 -name CLKE
create_clock -period [expr {1000/750.0}] -name CLKD

Note: While using expr command, it is important for one of the two numbers to be a real number so that the resulting number is also a real number.

Now we specify the output delay relative to the virtual clock CLKD and we apply that to our output port Output1 –

set_output_delay -max 0.15 -clock CLKD [get_ports Output1]

And then we follow up with another output delay relative to the virtual clock CLKE on the same output port –

set_output_delay -max 0.52 -clock CLKE -add_delay [get_ports Output1]

Note: We are using the -add_delay option in the second constraint since we are applying the constraint to an already constrained port; this ensures the second constraint doesn’t overwrite the first one.

We want the synthesis tool to consider both the paths and optimize the output path through combo logic-3 for the worst-case scenario of either of the two capturing paths. Assuming the setup time of both the flops FF-4 and FF-5 as 1ns and the combinational delay of combo logic-7 and combo logic-8 as 0.1ns and 0.47ns respectively, let’s see how that calculation is done.

Here again the first thing the tool does is to calculate the base period. The least common multiple of the three clock periods is 4ns. If you look at the waveform shown in Figure 4, at 4ns all the clock edges line up and start rising again.

Figure 4: Worst effective clock period for output paths in the base period

Since there are two launch-capture paths, CLKC→CLKD and CLKC→CLKE, the tool considers both and picks the worst timing relationship among all the edge pairs of each path. As shown in the waveform, there are four pairs of launch-capture edges between 0-4ns; the worst effective clock period is 0.67ns for the CLKC→CLKD path and 1ns for the CLKC→CLKE path.

Now let us calculate the critical path between the two launch-capture paths –

For CLKC→CLKD path:

→ (Delay due to combo logic-3) ≤ {(worst effective clock period) – (Output delay)}

→ Implies, (delay due to combo logic-3) ≤ {(0.67ns) – (0.2ns)}

→ Thus, maximum possible delay that can be introduced by the combo logic-3 for CLKC→CLKD path is 0.47ns.

For CLKC→CLKE path:

→ (Delay due to combo logic-3) ≤ {(worst effective clock period) – (Output delay)}

→ Implies, (delay due to combo logic-3) ≤ {(1ns) – (0.57ns)}

→ Thus, maximum possible delay that can be introduced by the combo logic-3 for CLKC→CLKE path is 0.43ns.

Considering the worst-case scenario (CLKC→CLKE path), the tool will try to optimize combo logic-3 so that its combinational delay is less than or equal to 0.43ns.

Source: Zhihu
Author: LogicJitterGibbs
