棋子 · February 10, 2020

Is it typical at least 2 cycles taken for load from

I expected load and store instructions that access zero-wait-state memory to take only 1 cycle each (on average, with the pipeline filled), but that does not seem to be the case. Is it typical for loads and stores to take at least 2 cycles even when the memory has zero wait states?

(By zero-wait-state memory I mean, for example, an internal RAM whose operating clock frequency is higher than that of the processor core.)

Below are the test code and the assembly generated from it. (I ran this on an STM32F429ZITx board.)

for (i = 0; i < 20000; i++) {
    data = test_data[i];
    test_data[20000 - 1 - i] = data;
}
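The post does not say how the cycles were counted. One common approach on a Cortex-M4 is the DWT cycle counter, so here is a minimal sketch under that assumption. The helper names cycle_counter_start, cycle_counter_read and elapsed_cycles are hypothetical and are not taken from the original test code. The register block is only compiled for ARM targets, so the subtraction helper can also be checked on a host machine.

```c
#include <stdint.h>

#if defined(__arm__)
/* ARMv7-M debug registers (addresses are fixed by the architecture). */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

/* Hypothetical helper: enable the DWT unit and restart the counter. */
static inline void cycle_counter_start(void)
{
    DEMCR |= (1u << 24);  /* TRCENA: enable the DWT unit */
    DWT_CYCCNT = 0u;      /* reset the count */
    DWT_CTRL |= 1u;       /* CYCCNTENA: count core clock cycles */
}

/* Hypothetical helper: read the current cycle count. */
static inline uint32_t cycle_counter_read(void)
{
    return DWT_CYCCNT;
}
#endif

/* Unsigned subtraction stays correct across a single 32-bit wrap. */
uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}
```

On the target, one would call cycle_counter_start(), read the counter immediately before and after the loop, and pass both readings to elapsed_cycles(). Timing an empty region the same way gives the measurement overhead to subtract.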

Below is the generated assembly (the compiler unrolled the loop so that each pass handles two iterations; optimization options -O3 -Otime). This 14-instruction loop was measured at 36 cycles, which works out to about 2.6 cycles per instruction.

0x080019E0 F8343011 LDRH r3,[r4,r1,LSL #1]
0x080019E4 F8AD3000 STRH r3,[sp,#0x00]
0x080019E8 F8BDC000 LDRH r12,[sp,#0x00]
0x080019EC 1A53     SUBS r3,r2,r1
0x080019EE F824C013 STRH r12,[r4,r3,LSL #1]
0x080019F2 EB040341 ADD r3,r4,r1,LSL #1
0x080019F6 885B     LDRH r3,[r3,#0x02]
0x080019F8 F8AD3000 STRH r3,[sp,#0x00]
0x080019FC F8BDC000 LDRH r12,[sp,#0x00]
0x08001A00 1A43     SUBS r3,r0,r1
0x08001A02 F824C013 STRH r12,[r4,r3,LSL #1]
0x08001A06 1C89     ADDS r1,r1,#2
0x08001A08 42A9     CMP r1,r5
0x08001A0A D3E9     BCC 0x080019E0
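The figures quoted for this loop can be cross-checked with a little integer arithmetic. The sketch below uses only numbers from the post (14 instructions, 36 cycles, two elements copied per pass) plus a hand count of the listing (4 LDRH and 4 STRH). The function and constant names are illustrative, not part of the original test.

```c
#include <assert.h>

/* Figures from the post: one pass of the unrolled loop. */
enum { LOOP_INSTRUCTIONS = 14, LOOP_CYCLES = 36, ELEMENTS_PER_PASS = 2 };

/* Counted by hand from the listing: 4 LDRH + 4 STRH, 6 other instructions. */
enum { LOAD_STORE_INSTRUCTIONS = 8, OTHER_INSTRUCTIONS = 6 };

/* Average cycles per instruction, scaled by 100 and rounded to nearest. */
unsigned cpi_x100(unsigned cycles, unsigned instructions)
{
    assert(instructions > 0);
    return (cycles * 100u + instructions / 2u) / instructions;
}

/* Whole cycles spent per copied array element. */
unsigned cycles_per_element(unsigned cycles, unsigned elements)
{
    assert(elements > 0);
    return cycles / elements;
}
```

With the post's figures, cpi_x100(LOOP_CYCLES, LOOP_INSTRUCTIONS) returns 257, i.e. about 2.57 cycles per instruction, and cycles_per_element(LOOP_CYCLES, ELEMENTS_PER_PASS) returns 18. Note that 8 of the 14 instructions are loads or stores, so the per-instruction average is dominated by memory accesses.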

1 answer
极术小姐姐 · February 10, 2020

You are using CCM RAM, right? If not, wait states apply.

See chapter 3.3 of DDI0439C (the Cortex-M4 Technical Reference Manual); it lists all cycle counts.
