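棋子 · January 26, 2024

Lao Li Walks You Through the Cookbook: stx_cookbook Adders (Part 2)

Today we keep reading the cookbook. Next up is ternary addition. First, a vocabulary note: "ternary" means made up of three parts, and here it simply means adding three numbers. Let's see how the code is written.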

module ternary_add (a,b,c,o);
parameter WIDTH=8;
parameter SIGN_EXT = 1'b0;
input [WIDTH-1:0] a,b,c;
output [WIDTH+1:0] o;
wire [WIDTH+1:0] o;
generate 
  if (!SIGN_EXT)
     assign o = a+b+c;
  else
     assign o = {a[WIDTH-1],a[WIDTH-1],a} + {b[WIDTH-1],b[WIDTH-1],b} +
                {c[WIDTH-1],c[WIDTH-1],c};
endgenerate
endmodule

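This code is short and clear, and it also supports sign extension. One point to note: adding three N-bit numbers gives an N+2-bit result, not N+1 bits. The largest unsigned sum is 3(2^N - 1), which exceeds what N+1 bits can hold once N >= 2 but always fits in N+2 bits. What the author really wants to tell you is that to add three numbers in FPGA RTL you can simply write +, and the FPGA synthesis tool will map the + operation onto actual ALMs. In the document's own words: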

“The Quartus II Analysis and Synthesis recognizes sums of three binary words and applies the shared arithmetic mode automatically. Area cost is one cell per bit, packed in two cells per ALM, as compared to two cells per bit on a device without share chain support” 
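To make the N+2 point concrete, here is a small self-checking simulation sketch. It is not from stx_cookbook: the module name tb_ternary_width and the signal names are made up for illustration. It compares a full-width sum against one that is a bit too narrow, and checks that the manual sign extension used in ternary_add matches the same sum written with $signed.

```verilog
// Hypothetical testbench; names are illustrative, not part of stx_cookbook.
module tb_ternary_width;
  localparam WIDTH = 8;
  reg [WIDTH-1:0] a, b, c;

  // Full-width result, as in ternary_add with SIGN_EXT = 0.
  wire [WIDTH+1:0] full   = a + b + c;
  // One bit too narrow: the top carry bit is silently dropped.
  wire [WIDTH:0]   narrow = a + b + c;
  // Manual sign extension, as in ternary_add with SIGN_EXT = 1.
  wire [WIDTH+1:0] manual = {a[WIDTH-1], a[WIDTH-1], a} +
                            {b[WIDTH-1], b[WIDTH-1], b} +
                            {c[WIDTH-1], c[WIDTH-1], c};
  // The same signed sum, letting the language do the sign extension.
  wire signed [WIDTH+1:0] via_signed = $signed(a) + $signed(b) + $signed(c);

  initial begin
    // Unsigned worst case: 255 + 255 + 255 = 765, which needs 10 bits.
    a = 8'hFF; b = 8'hFF; c = 8'hFF;
    #1;
    $display("full=%0d narrow=%0d", full, narrow);
    if (full !== 10'd765 || narrow !== 9'd253)
      $display("FAIL: unsigned case");

    // Signed worst case: -128 * 3 = -384, which is 10'h280 in 10 bits.
    a = 8'h80; b = 8'h80; c = 8'h80;
    #1;
    $display("manual=%h via_signed=%0d", manual, via_signed);
    if (manual !== 10'h280 || via_signed !== manual)
      $display("FAIL: signed case");

    $finish;
  end
endmodule
```

The narrow result wraps to 765 - 512 = 253, which is exactly the silent truncation the extra output bit is there to prevent.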


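The next example adds nine numbers. Here the author says: “When combining ternary additions with other arithmetic logic or as part of adder trees, it is best to place them in a submodule. Verilog HDL and VHDL consider “+” a binary operator, potentially creating ambiguity about which adders to group as a ternary block.” So he recommends wrapping the three-input addition in a submodule (tern_node in the code below, a registered version of ternary_add) and building an adder tree from it, which tells the FPGA tool explicitly how to split the nine numbers into three groups. In this example the three partial sums are registered first and then added together, which takes the pressure off timing closure. Lao Li agrees with this approach: if you know that adding nine numbers will be a real problem for timing closure, modularizing the inputs and inserting registers where appropriate is the better way. Of course this also depends on the signal width. In ASIC design, if the operands are narrow, a direct nine-input sum closes timing without much trouble on current process nodes, so writing a+b+c+d+e+f+g+h+i directly is not out of the question.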

module tern_node (clk,a,b,c,o);
parameter WIDTH = 8;
input clk;
input [WIDTH-1:0] a;
input [WIDTH-1:0] b;
input [WIDTH-1:0] c;
output [WIDTH+2-1:0] o;

reg [WIDTH+2-1:0] o;

always @(posedge clk) begin
  o <= a+b+c;
end

endmodule
//
// pipelined sum of 9 binary words using 
// 4 ternary adder chains.
// This WIDTH 8 example should use 42 DFF and 42
// arithmetic logic cells.   This would require roughly
// 80 arithmetic cells on a binary adder device.
//
module ternary_sum_nine (clk,a,b,c,d,e,f,g,h,i,out);
parameter WIDTH = 8;
input clk;
input [WIDTH-1:0] a,b,c,d,e,f,g,h,i;
output [WIDTH+4-1:0] out;

wire [WIDTH+2-1:0] part0,part1,part2;
// entry layer, 9 => 3
tern_node x (.clk(clk),.a(a),.b(b),.c(c),.o(part0));
defparam x .WIDTH = WIDTH;
tern_node y (.clk(clk),.a(d),.b(e),.c(f),.o(part1));
defparam y .WIDTH = WIDTH;
tern_node z (.clk(clk),.a(g),.b(h),.c(i),.o(part2));
defparam z .WIDTH = WIDTH;
// output layer 3=> 1
tern_node o (.clk(clk),.a(part0),.b(part1),.c(part2),.o(out));
defparam o .WIDTH = WIDTH+2;

endmodule
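For readers who prefer Verilog-2001 named parameter overrides to defparam, here is a sketch of the same nine-input tree in that style, plus a tiny testbench. The names tern_node_p, sum_nine_p and tb_sum_nine are hypothetical and not from the cookbook; the structure follows the listing above and is only meant to show the two-clock latency and the WIDTH+4 output.

```verilog
// Registered 3-input adder, same behavior as tern_node (ANSI-style ports).
module tern_node_p #(parameter WIDTH = 8) (
  input                  clk,
  input      [WIDTH-1:0] a, b, c,
  output reg [WIDTH+1:0] o
);
  always @(posedge clk) o <= a + b + c;
endmodule

// Nine-input pipelined sum: three entry nodes, one output node.
module sum_nine_p #(parameter WIDTH = 8) (
  input              clk,
  input  [WIDTH-1:0] a, b, c, d, e, f, g, h, i,
  output [WIDTH+3:0] out
);
  wire [WIDTH+1:0] part0, part1, part2;

  // Entry layer, 9 => 3. Parameters are set at instantiation, no defparam.
  tern_node_p #(.WIDTH(WIDTH)) u0 (.clk(clk), .a(a), .b(b), .c(c), .o(part0));
  tern_node_p #(.WIDTH(WIDTH)) u1 (.clk(clk), .a(d), .b(e), .c(f), .o(part1));
  tern_node_p #(.WIDTH(WIDTH)) u2 (.clk(clk), .a(g), .b(h), .c(i), .o(part2));

  // Output layer, 3 => 1. Inputs are already WIDTH+2 bits wide.
  tern_node_p #(.WIDTH(WIDTH+2)) u3 (.clk(clk), .a(part0), .b(part1),
                                     .c(part2), .o(out));
endmodule

// Minimal check: drive all nine inputs with 255 and wait two clock edges.
module tb_sum_nine;
  reg         clk = 0;
  reg  [7:0]  v;
  wire [11:0] out;

  sum_nine_p #(.WIDTH(8)) dut (.clk(clk), .a(v), .b(v), .c(v), .d(v), .e(v),
                               .f(v), .g(v), .h(v), .i(v), .out(out));

  always #5 clk = ~clk;

  initial begin
    v = 8'hFF;
    @(posedge clk);   // first edge: partial sums are captured
    @(posedge clk);   // second edge: final sum is captured
    #1;
    if (out === 12'd2295) $display("PASS: out=%0d", out);
    else                  $display("FAIL: out=%0d", out);
    $finish;
  end
endmodule
```

The worst case 9 * 255 = 2295 fits comfortably in the 12-bit output, and the result appears after the second rising edge because each layer adds one register stage.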


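The next module is interesting. The author says that for a very narrow adder you can implement the sum directly as combinational logic instead of invoking an adder to compute the carries. The FPGA synthesis tool can map this combinational logic straight onto LUTs, which is better for both timing and resource use. Lao Li thinks this style should not be needed in ASIC design: the smallest unit in ASIC synthesis is the logic gate, and even an adder that the synthesis tool has prepared in advance ends up as gates, so in ASIC code it is more convenient to just write + in RTL and let the tool produce the simplest logic. In FPGA design the smallest unit is the ALM or the LUT, so using only the LUT and not the adder cell behind it can indeed save resources.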

module sum_of_3bit_pair
(
input [2:0] a,b,
output reg [3:0] sum
);
always @(*) begin
    case ({a,b})
      6'd0: sum=4'd0;
      6'd1: sum=4'd1;
      6'd2: sum=4'd2;
      6'd3: sum=4'd3;
      6'd4: sum=4'd4;
      6'd5: sum=4'd5;
      6'd6: sum=4'd6;
      6'd7: sum=4'd7;
      6'd8: sum=4'd1;
      6'd9: sum=4'd2;
      6'd10: sum=4'd3;
      6'd11: sum=4'd4;
      6'd12: sum=4'd5;
      6'd13: sum=4'd6;
      6'd14: sum=4'd7;
      6'd15: sum=4'd8;
      6'd16: sum=4'd2;
      6'd17: sum=4'd3;
      6'd18: sum=4'd4;
      6'd19: sum=4'd5;
      6'd20: sum=4'd6;
      6'd21: sum=4'd7;
      6'd22: sum=4'd8;
      6'd23: sum=4'd9;
      6'd24: sum=4'd3;
      6'd25: sum=4'd4;
      6'd26: sum=4'd5;
      6'd27: sum=4'd6;
      6'd28: sum=4'd7;
      6'd29: sum=4'd8;
      6'd30: sum=4'd9;
      6'd31: sum=4'd10;
      6'd32: sum=4'd4;
      6'd33: sum=4'd5;
      6'd34: sum=4'd6;
      6'd35: sum=4'd7;
      6'd36: sum=4'd8;
      6'd37: sum=4'd9;
      6'd38: sum=4'd10;
      6'd39: sum=4'd11;
      6'd40: sum=4'd5;
      6'd41: sum=4'd6;
      6'd42: sum=4'd7;
      6'd43: sum=4'd8;
      6'd44: sum=4'd9;
      6'd45: sum=4'd10;
      6'd46: sum=4'd11;
      6'd47: sum=4'd12;
      6'd48: sum=4'd6;
      6'd49: sum=4'd7;
      6'd50: sum=4'd8;
      6'd51: sum=4'd9;
      6'd52: sum=4'd10;
      6'd53: sum=4'd11;
      6'd54: sum=4'd12;
      6'd55: sum=4'd13;
      6'd56: sum=4'd7;
      6'd57: sum=4'd8;
      6'd58: sum=4'd9;
      6'd59: sum=4'd10;
      6'd60: sum=4'd11;
      6'd61: sum=4'd12;
      6'd62: sum=4'd13;
      6'd63: sum=4'd14;
      default: sum=0;
    endcase
end
endmodule
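A 64-entry table like this is tedious to type and easy to get wrong, so one option is to let a simulator print the case items. The sketch below is Lao Li's own illustration rather than anything from the cookbook, and the module name gen_sum_table is made up. It walks all 64 values of {a,b} and prints one line per entry in the same format as the listing above.

```verilog
// Hypothetical helper: prints the case items for a 3-bit + 3-bit lookup table.
module gen_sum_table;
  integer   n;
  reg [2:0] a, b;
  reg [3:0] s;

  initial begin
    for (n = 0; n < 64; n = n + 1) begin
      {a, b} = n[5:0];   // upper 3 bits go to a, lower 3 bits to b
      s = a + b;         // evaluated at 4 bits, so the carry is kept
      $display("      6'd%0d: sum=4'd%0d;", n, s);
    end
    $finish;
  end
endmodule
```

The printed lines can be pasted into the case statement, and the same loop could be turned into a checker by instantiating sum_of_3bit_pair and comparing its output with a + b for every n.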

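Next time we will look at the carry save adder, an adder suited to summing three numbers. Thanks for reading to the end, and if you enjoyed it, feel free to leave a 👍.

Author: 硅谷老李
Source: IC加油站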
