啥都吃的豆芽 · March 13, 2023 · Beijing

Yitian 710 Performance Tracing: CMN Watchpoint Events

Preface

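In the earlier article "Yitian 710 Performance Monitoring: CMN PMU Overview", we described how to use CMN PMU events to measure bandwidth. The bandwidth measured that way, however, includes flits on all four channels: Request, Response, Snoop, and Data. This article describes how to use watchpoint events to filter and count only Data flits, and verifies the accuracy of this method.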

1. DTM watchpoint [1]

A DTM has four WatchPoints (WPs) that monitor flit uploads and downloads at XP device ports. WPs monitor flits by matching on a subset of flit fields that you specify using a pair of val and mask registers. The following figure shows the WP comparator and the registers that control this functionality.

A WP can be configured to monitor flits from one of two XP device ports and one of four CHI channels:

  • Request (REQ)
  • Response (RSP)
  • Snoop (SNP)
  • Data (DAT)

por_dtm_wp:

  • por_dtm_wp{0..3}_config.wp_dev_sel: select device port
  • por_dtm_wp{0..3}_config.wp_chn_sel: select flit channel
  • por_dtm_wp{0..3}_val: specifies the value for matching
  • por_dtm_wp{0..3}_mask: specifies the bits that must be masked from the match comparison

In addition, the WP can be configured to do one or more of the following tasks on detecting a flit match:

  • Set trace tag bit on the flit.
  • Generate flit trace.
  • Generate cross trigger to DTC.
  • Generate debug trigger to DTC.
  • Increment PMU counters.

The four DTM WPs are assigned to flit uploads and downloads according to the following groups:

  • WP0 and WP1 are assigned to flit uploads.
  • WP2 and WP3 are assigned to flit downloads.

On flit upload channels, RN-F or SN-F is the transmitting device and CMN-700 is the receiving device. On flit download channels, CMN-700 is the transmitting device and RN-F or SN-F is the receiving device.
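The val/mask comparison described above can be sketched in a few lines of Python. This is a minimal model, assuming that a set bit in the mask means "exclude this bit from the comparison", which follows the description of the mask register above. The function name `wp_matches` and the sample field values are hypothetical.

```python
def wp_matches(flit_bits: int, val: int, mask: int, width: int = 64) -> bool:
    """Return True if the flit fields match the watchpoint.

    Assumption: bits set in `mask` are excluded from the comparison;
    all remaining bits of the flit fields must equal the same bits of `val`.
    """
    care = ~mask & ((1 << width) - 1)  # bits that take part in the comparison
    return (flit_bits & care) == (val & care)


# With every mask bit set, nothing is compared, so any flit matches.
print(wp_matches(0x1234, val=0x0, mask=0xFFFFFFFFFFFFFFFF))   # True
# With only the low byte compared, the low byte must equal val's low byte.
print(wp_matches(0xAB01, val=0x01, mask=0xFFFFFFFFFFFFFF00))  # True
print(wp_matches(0xAB02, val=0x01, mask=0xFFFFFFFFFFFFFF00))  # False
```

The first call mirrors the all-ones mask used with the perf events later in this article: no field is compared, so every flit on the selected channel counts.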

Example:

To monitor REQ flits uploaded from RNF2, set up watchpoints (WPs) inside XP connected to RNF2. The Opcode and Address fields are mapped to the primary and secondary match registers respectively. Therefore, you must set up two WPs, one to monitor the Opcode and the other to monitor the Address.

To set up these WPs:

1. Program WP0 (upload WP) to monitor REQ.Opcode:
    a. Set dtm_wp0_val/mask registers to match on Opcode=ReadShared
    b. Set dtm_wp0_config to:
        a. Select upload device port (wp_dev_sel=RNF2_port).
        b. Select upload device port ({wp_dev_sel2, wp_dev_sel}=RNF2_port).
        c. Select flit channel (wp_chn_sel=REQ)
        d. Match format group to primary for Opcode match (wp_grp=0)
        e. Set combined mode to gang-up WP0 and WP1 (wp_combine=1)
        f. Enable REQ flit trace packet generation (set wp_pkt_type and wp_pkt_gen=1)
2. Program WP1 (upload WP) to monitor REQ.Address as follows:
    a. Set dtm_wp1_val/mask registers to match on Address=X
    b. Set dtm_wp1_config to:
        a. Select upload device port (wp_dev_sel=RNF2_port).
        b. Select upload device port ({wp_dev_sel2, wp_dev_sel}=RNF2_port).
        c. Select Flit channel (wp_chn_sel=REQ)
        d. Match format group to secondary for Address match (wp_grp=1)
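The effect of ganging WP0 and WP1 together can be illustrated with a small software model. This is only a sketch under stated assumptions: the `Watchpoint` class, the placement of the opcode in the low 7 bits of the primary group, and the opcode and address values are all hypothetical placeholders, and the real field positions are defined by the match-group formats in the TRM.

```python
from dataclasses import dataclass

MASK64 = 0xFFFFFFFFFFFFFFFF


@dataclass
class Watchpoint:
    val: int
    mask: int  # assumption: set bits are excluded from the comparison

    def matches(self, group_bits: int) -> bool:
        care = ~self.mask & MASK64
        return (group_bits & care) == (self.val & care)


def combined_match(wp0: Watchpoint, wp1: Watchpoint,
                   primary_bits: int, secondary_bits: int) -> bool:
    """Model of combined mode: the flit matches only if both WPs match."""
    return wp0.matches(primary_bits) and wp1.matches(secondary_bits)


# Hypothetical layout: opcode in the low 7 bits of the primary group,
# address occupying the secondary group.
OPCODE_READSHARED = 0x01   # placeholder encoding, for illustration only
ADDRESS_X = 0x8000_1000    # placeholder address

wp0 = Watchpoint(val=OPCODE_READSHARED, mask=MASK64 & ~0x7F)  # compare opcode only
wp1 = Watchpoint(val=ADDRESS_X, mask=0x0)                     # compare every bit

print(combined_match(wp0, wp1, OPCODE_READSHARED, ADDRESS_X))         # True
print(combined_match(wp0, wp1, OPCODE_READSHARED, ADDRESS_X + 0x40))  # False
print(combined_match(wp0, wp1, 0x02, ADDRESS_X))                      # False
```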

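2. How the accuracy of CMN bandwidth statistics is verified

We take the value reported by bw_mem from lmbench as the reference value and compare it with the experimental value measured at the CCG in the CMN, in order to verify the accuracy of the CMN bandwidth statistics.

2.1 Inaccurate data

As the previous section showed, CMN watchpoint events let us filter on the DAT channel, which resolves the data inaccuracy problem.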

2.1.1 CMN-700 Topology in Yitian SoC

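To measure D2D data bandwidth, we still need to know the topology of the Yitian CMN and specify a NodeID. The difference from PMU events is that a watchpoint event takes the NodeID of the XP that the CCG is connected to.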

D2D
nodeid=0x1b0(432), logid=0x1b, type=0x006(XP), (x,y,portid,devid)=(3,6,0,0)
nodeid=0x1b5(437), logid=0x7, type=0x105(CCLA), (x,y,portid,devid)=(3,6,1,1)
nodeid=0x1a8(424), logid=0x17, type=0x006(XP), (x,y,portid,devid)=(3,5,0,0)
nodeid=0x1ad(429), logid=0x6, type=0x105(CCLA), (x,y,portid,devid)=(3,5,1,1)
nodeid=0x1a0(416), logid=0x13, type=0x006(XP), (x,y,portid,devid)=(3,4,0,0)
nodeid=0x1a5(421), logid=0x5, type=0x105(CCLA), (x,y,portid,devid)=(3,4,1,1)
nodeid=0x198(408), logid=0xf, type=0x006(XP), (x,y,portid,devid)=(3,3,0,0)
nodeid=0x19d(413), logid=0x4, type=0x105(CCLA), (x,y,portid,devid)=(3,3,1,1)
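The relationship between each CCLA node and its parent XP in the listing above can be checked with a short decoder. This is a sketch that assumes the low three bits of a nodeid hold the port (1 bit) and device (2 bits) IDs and that X and Y each take 4 bits. That layout is inferred from the listing itself, not taken from the TRM, and other mesh sizes may use other widths. The function names are hypothetical.

```python
def decode_nodeid(nodeid: int, coord_bits: int = 4):
    """Split a nodeid into (x, y, portid, devid).

    Assumed layout, inferred from the listing: [x | y | port(1) | dev(2)].
    """
    coord_mask = (1 << coord_bits) - 1
    devid = nodeid & 0x3
    portid = (nodeid >> 2) & 0x1
    y = (nodeid >> 3) & coord_mask
    x = (nodeid >> (3 + coord_bits)) & coord_mask
    return x, y, portid, devid


def xp_nodeid(nodeid: int) -> int:
    """Nodeid of the XP a device hangs off: clear the port and device bits."""
    return nodeid & ~0x7


for ccla in (0x19D, 0x1A5, 0x1AD, 0x1B5):
    print(hex(ccla), decode_nodeid(ccla), "-> XP", hex(xp_nodeid(ccla)))
```

Under these assumptions the four CCLA nodes map to XPs 0x198, 0x1a0, 0x1a8 and 0x1b0, which are the nodeid values passed to the watchpoint events.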

2.1.2 Measuring DAT bandwidth

DAT bandwidth on the CMN is measured as follows:

# - XP uploads and downloads on Die 0
# - XP uploads and downloads on Die 1
perf stat \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -a -- sleep 1
  • type=0x7770: watchpoint event
  • eventid: 0x0 for uploads, 0x2 for downloads
  • bynodeid=1,nodeid=0x198: the nodeid of the XP that the CCG is connected to
  • wp_dev_sel: 0x1 for the CCG (the CCLA nodes in the listing above sit on device port 1)
  • wp_chn_sel: 0x3 for the DAT channel
  • wp_grp: 0x0 for the primary group (WP 0 and WP 2 use the primary group, WP 1 and WP 3 the secondary group)
  • wp_val=0x0: reset value
  • wp_mask=0xFFFFFFFFFFFFFFFF: every field in the primary match group is ignored, so any DAT flit on the selected port and direction is a match

WP is selected by eventid + wp_grp, e.g.:

  • eventid=0x0, wp_grp=0x0: WP 0 for uploads primary group
  • eventid=0x0, wp_grp=0x1: WP 1 for uploads secondary group
  • eventid=0x2, wp_grp=0x0: WP 2 for downloads primary group
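Because the sixteen event specifications differ only in PMU instance, eventid and nodeid, they can be generated instead of typed by hand. The following is a minimal sketch, not part of the original workflow: the helper names `wp_event` and `wp_index` are hypothetical, and `wp_index` simply encodes the "eventid + wp_grp" rule stated above.

```python
import shlex

XP_NODEIDS = (0x198, 0x1A0, 0x1A8, 0x1B0)  # XPs attached to the CCGs (D2D listing)
PMUS = ("arm_cmn_0", "arm_cmn_1")          # the two PMU instances used in the command above
EVENT_IDS = (0x0, 0x2)                     # 0x0: uploads, 0x2: downloads


def wp_event(pmu: str, eventid: int, nodeid: int, wp_grp: int = 0x0) -> str:
    """Build one watchpoint event string in the format used in this article."""
    return (f"{pmu}/type=0x7770,eventid={eventid:#x},bynodeid=1,"
            f"nodeid=0x{nodeid:X},wp_dev_sel=0x1,wp_chn_sel=0x3,"
            f"wp_grp={wp_grp:#x},wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/")


def wp_index(eventid: int, wp_grp: int) -> int:
    """WP number selected by an event: eventid + wp_grp, as described above."""
    return eventid + wp_grp


events = [wp_event(p, e, n) for p in PMUS for e in EVENT_IDS for n in XP_NODEIDS]

command = ["perf", "stat"]
for ev in events:
    command += ["-e", ev]
command += ["-a", "--", "sleep", "1"]

print(shlex.join(command))
```

The printed command lists the events in the same order as the hand-written one, so the counter output lines up with the results shown in this article.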

Bandwidth formulas:

  • Inter-Socket RX Bandwidth = Flit_Size * CCG_WP0_RX_DATA_FLIT * CMN_Freq / PMU_CYCLE_COUNTER
  • Inter-Socket TX Bandwidth = Flit_Size * CCG_WP0_TX_DATA_FLIT * CMN_Freq / PMU_CYCLE_COUNTER
  • Inter-Die RX Bandwidth = Flit_Size * CCG_WP0_RX_DATA_FLIT * CMN_Freq / PMU_CYCLE_COUNTER
  • Inter-Die TX Bandwidth = Flit_Size * CCG_WP0_TX_DATA_FLIT * CMN_Freq / PMU_CYCLE_COUNTER
  • Flit_Size = 32 Bytes
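As a worked example of the formula, the sketch below computes bandwidth in two equivalent ways: from a cycle count and the CMN frequency, and from elapsed wall-clock time (since PMU_CYCLE_COUNTER / CMN_Freq is the elapsed time in seconds). The function names are hypothetical. The flit counts and elapsed time are the upload counters of arm_cmn_0 from the C0M1 read run reported in section 3.2.

```python
FLIT_SIZE = 32  # bytes per DAT flit, as stated above


def bandwidth_from_cycles(flits: int, cycles: int, cmn_freq_hz: float) -> float:
    """Bytes per second: Flit_Size * flits * CMN_Freq / PMU_CYCLE_COUNTER."""
    return FLIT_SIZE * flits * cmn_freq_hz / cycles


def bandwidth_from_elapsed(flits: int, elapsed_s: float) -> float:
    """Bytes per second, using elapsed time instead of cycles / frequency."""
    return FLIT_SIZE * flits / elapsed_s


# Upload DAT flits seen at the four CCG-attached XPs of arm_cmn_0 (C0M1 read run).
counts = [90278067, 90278600, 90277969, 90278284]
elapsed = 1.000867931  # seconds, as reported by perf for that run

mb_per_s = bandwidth_from_elapsed(sum(counts), elapsed) / 1e6
print(f"{mb_per_s:.0f} MB/s")
```

The shorthand used later in this article, `count*4*32/1000/1000`, is the same calculation with one XP's count multiplied by four and the interval rounded to one second.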

2.2 Inaccurate timing

The main logic of bw_mem [2] is as follows:

benchmp(init_loop, rd, cleanup, 0, parallel, warmup, repetitions, &state);
    benchmp_child()         // child process
        initialize()        // init buffer
            bzero((void*)state->buf, state->nbytes);
        while (1) {         // run workload `rd`, repeated `repetitions` times (default 11)
            (*benchmark)(benchmp_interval(&_benchmp_child_state), cookie);
                switch (state->state) {
                case warmup:
                    // ...
                case timing_interval:
                    // the interval that bw_mem actually counts, repeated 11 times
                case cooldown:
                    // ...
                }
        }

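To measure time more accurately, lmbench contains a lot of logic for measuring loop overhead and timing overhead, and these overheads are subtracted when the effective measurement time for the bandwidth is computed. In addition, initializing the data buffer with bzero is slow: a 40 GB buffer takes about 15.31 seconds. As a result, when a large buffer is used to verify bandwidth, the effective run time of bw_mem differs greatly from the total run time of the program, so the experimental value measured by the CMN deviates from the reference value reported by bw_mem.

The bw_mem reference value is the median of the bandwidth computed over multiple runs after initialization and warmup. When the number of repetitions is not specified, the workload runs 11 times. To verify the accuracy of the CMN bandwidth statistics, we use the following techniques:

  • Modify bw_mem to print a marker at case timing_interval
  • Run bw_mem in the background during the test
  • Run perf for 1 second at a time, so that it outputs a bandwidth value every second
  • Use the bandwidth that perf measures during the timing_interval phase as the experimental value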
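The last step, keeping only the perf samples that fall inside the timing_interval phase, can be sketched as follows. This assumes the per-second results have already been collected as (timestamp, MB/s) pairs and that the start and end of the timing window were noted from the marker printed by the modified bw_mem. The sample values are made up purely for illustration, and summarizing with a median is one possible choice, not necessarily what was done for this article.

```python
from statistics import median


def samples_in_window(samples, start_s, end_s):
    """Keep the bandwidth samples whose timestamp lies in [start_s, end_s)."""
    return [bw for t, bw in samples if start_s <= t < end_s]


# Made-up (timestamp in seconds, MB/s) samples, for illustration only:
# low values during buffer initialization and cooldown, steady values in between.
samples = [(1.0, 300.0), (2.0, 11500.0), (3.0, 11520.0), (4.0, 11510.0), (5.0, 200.0)]

inside = samples_in_window(samples, start_s=2.0, end_s=5.0)
print(inside, median(inside))
```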

3. Bandwidth verification

3.1 TL;DR

image.png

  • Bandwidth unit: MB/s

As shown, with watchpoint events the DAT bandwidth measurement error does not exceed 1%.

3.2 READ

C0M1

# First, run bw_mem as background workload
# numactl --cpubind=0 --membind=1 ./bw_mem 40960M rd

# Then run perf command in another console
$ perf stat \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -a -- sleep 1

          90278067      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90278600      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90277969      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90278284      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3611734      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3597502      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3727456      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3553791      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3554765      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3728825      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3598923      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3613123      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90276087      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90276563      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90277616      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90278781      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/


       1.000867931 seconds time elapsed      
       
>>> 90278781*4*32/1000/1000
11555

# node1: memory upload to XP,CCG download from XP
# node0: CCG upload to XP, cpu node 0 download from XP to read
numactl --cpubind=0 --membind=1 ./bw_mem 40960M rd
40960.00 11520.56
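  • upload WP: die 0 is reading data from die 1. After the CCG on die 0 receives the data, it must upload the data to the XP, from where it finally reaches the requester. On die 0 these are therefore flit uploads.
  • download WP: die 0 is reading data from die 1. After the XP on die 1 obtains the data, the CCG must download the data from the XP, from where it finally reaches the requester. On die 1 these are therefore flit downloads.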
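The error of this C0M1 read measurement relative to bw_mem can be checked directly against the 1% bound stated in the TL;DR. The sketch below uses only the numbers shown above; the helper name `relative_error` is hypothetical.

```python
def relative_error(measured: float, reference: float) -> float:
    """Absolute difference as a fraction of the reference value."""
    return abs(measured - reference) / reference


cmn_mb_per_s = 90278781 * 4 * 32 / 1000 / 1000  # CMN watchpoint estimate from above
bw_mem_mb_per_s = 11520.56                      # bw_mem result from above

err = relative_error(cmn_mb_per_s, bw_mem_mb_per_s)
print(f"{err:.2%}")  # 0.30%
```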

C1M0

# First, run bw_mem as background workload
# numactl --cpubind=1 --membind=0 ./bw_mem 40960M rd

# Then run perf command in another console
$ perf stat \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -a -- sleep 1
        
Performance counter stats for 'system wide':

           3419746      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3467562      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3403892      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3457318      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90469072      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90517461      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90570438      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90459938      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90456309      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90567238      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90514305      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          90466295      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3456776      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3403490      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3467108      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
           3419442      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/

       1.000748035 seconds time elapsed
       
$ numactl --cpubind=1 --membind=0 ./bw_mem 40960M rd
40960.00 11566.47

>>> 90517461*4*32/1000/1000
11586

3.3 WRITE

# First, run bw_mem as background workload
# numactl --cpubind=1 --membind=0 ./bw_mem 40960M wr

# Then run perf command in another console
perf stat \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -a -- sleep 1
 
 Performance counter stats for 'system wide':

          49920378      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          49919382      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          49920459      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          49919681      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          49895686      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          49943796      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          50004151      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          49887332      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          49885729      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          50002696      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          49942498      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          49894579      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          49917633      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          49918928      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          49918235      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
          49920091      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/


       1.000911250 seconds time elapsed

>>> 50099224*4*32/1000/1000
6389

numactl --cpubind=0 --membind=1 ./bw_mem 40960M wr
40960.00 6373.56
  • fwr: write every 4 byte word
  • wr: 4 byte write, 32 byte stride
When in write streaming mode, load operations behave as normal, and can still cause linefills. Writes still lookup in the cache, but if they miss then they write out to the L2 or L3 cache rather than starting a linefill. [2]

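wr writes 4 bytes at a 32-byte stride. The cache line size is 64 bytes, and every write miss causes a cache line fill (write allocation), so wr generates both read bandwidth and write bandwidth. fwr writes consecutive 4-byte words, which triggers the write streaming mode of the N2 processor so that writes go out past the cache without a linefill. fwr therefore generates only write bandwidth and no read bandwidth.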
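The difference between the two patterns can be illustrated with a deliberately simplified model that counts how many bytes of each 64-byte cache line a pattern writes. A partially written line has to be fetched first, while a fully written line does not. This is only a sketch of the reasoning above: the real write streaming detection in the core is more involved, and the function names are hypothetical.

```python
CACHE_LINE = 64  # bytes
WORD = 4         # bytes written per store


def bytes_written_per_line(stride: int) -> int:
    """Bytes of one cache line touched by 4-byte stores at the given stride."""
    return (CACHE_LINE // stride) * WORD


def needs_line_fill(stride: int) -> bool:
    """In this simplified model, a partially written line must be read first."""
    return bytes_written_per_line(stride) < CACHE_LINE


for name, stride in (("wr", 32), ("fwr", 4)):
    print(name, bytes_written_per_line(stride), needs_line_fill(stride))
```

In this model wr touches only 8 of every 64 bytes, which is consistent with the wr run above showing roughly equal flit counts in both directions.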

C0M1

# First, run bw_mem as background workload
# numactl --cpubind=0 --membind=1 ./bw_mem 40960M fwr

# Then run perf command in another console
perf stat \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -a -- sleep 1
 
 Performance counter stats for 'system wide':

             99569      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
             97711      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
             98130      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
             97544      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171790464      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171838050      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171896982      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171781045      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171774934      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171891867      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171833314      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171786692      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
             96434      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
             97152      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
             96733      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
             98713      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/

       1.000987867 seconds time elapsed

>>> 171790464*4*32/1000/1000
21989

# node0: cpu node 0 write to XP, CCG download from XP, 
# node1: CCG upload to XP, XP to memnode 1
# 
numactl --cpubind=0 --membind=1 ./bw_mem 40960M fwr
40960.00 21936.82

C1M0

# First, run bw_mem as background workload
# numactl --cpubind=1 --membind=0 ./bw_mem 40960M fwr

# Then run perf command in another console
perf stat \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -e arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/ \
        -a -- sleep 1
 Performance counter stats for 'system wide':

         171633399      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171633694      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171635028      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171635432      arm_cmn_0/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
             44884      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
             92908      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
            147330      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
             36366      arm_cmn_0/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
             36197      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
            147513      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
             93069      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
             45051      arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171630800      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171631315      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171630736      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
         171634061      arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/

       1.000950488 seconds time elapsed
       

>>> 171630800*4*32/1000/1000
21968
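
The calculation above takes one XP's count and multiplies it by 4. As a cross-check, the four per-XP counts can be summed directly and divided by the elapsed time that perf reports. The sketch below assumes each perf result sits on a single unwrapped line; `sum_flits` and `LINE_RE` are hypothetical names, and the sample lines are the `arm_cmn_1` results copied from the output above.

```python
import re

# One `perf stat` result line: "<count>  <pmu>/<key=value,...>/"
LINE_RE = re.compile(r"^\s*(\d+)\s+(arm_cmn_\d+)/(.*)/\s*$")


def sum_flits(perf_output, pmu, eventid):
    """Sum the counts of all lines matching the given PMU and eventid."""
    total = 0
    for line in perf_output.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip headers, blank lines, and the elapsed-time line
        count, line_pmu, params = m.groups()
        fields = dict(p.split("=", 1) for p in params.split(","))
        if line_pmu == pmu and int(fields["eventid"], 16) == eventid:
            total += int(count)
    return total


SAMPLE = """
     36197  arm_cmn_1/type=0x7770,eventid=0x0,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
 171630800  arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x198,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
 171631315  arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
 171630736  arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1A8,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
 171634061  arm_cmn_1/type=0x7770,eventid=0x2,bynodeid=1,nodeid=0x1B0,wp_dev_sel=0x1,wp_chn_sel=0x3,wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/
"""

flits = sum_flits(SAMPLE, "arm_cmn_1", 0x2)
elapsed_s = 1.000950488          # "seconds time elapsed" from the perf output
mb_per_s = flits * 32 / elapsed_s / 1_000_000
print(flits, round(mb_per_s))
```

Normalizing by the elapsed time gives about 21948 MB/s, slightly below the one-line estimate because the measurement window was a little longer than one second.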

$ numactl --cpubind=1 --membind=0 ./bw_mem 40960M fwr
40960.00 21930.69
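
The perf invocation used in this test repeats one watchpoint event template for every combination of PMU instance, eventid, and XP node ID. The sketch below generates that argument list rather than writing it by hand. The helper names `wp_event` and `build_perf_command` are hypothetical, and the node IDs are the ones used on this machine, which will differ on other meshes.

```python
import shlex

PMUS = ["arm_cmn_0", "arm_cmn_1"]
EVENT_IDS = [0x0, 0x2]                    # the two eventid values used above
NODE_IDS = [0x198, 0x1A0, 0x1A8, 0x1B0]   # XP node IDs from the command above


def wp_event(pmu, eventid, nodeid):
    """Format one watchpoint event with the fixed DAT-channel settings."""
    return (
        f"{pmu}/type=0x7770,eventid=0x{eventid:x},bynodeid=1,"
        f"nodeid=0x{nodeid:X},wp_dev_sel=0x1,wp_chn_sel=0x3,"
        "wp_grp=0x0,wp_val=0x0,wp_mask=0xFFFFFFFFFFFFFFFF/"
    )


def build_perf_command():
    """Return the perf stat argv covering every PMU/eventid/node combination."""
    args = ["perf", "stat"]
    for pmu in PMUS:
        for eventid in EVENT_IDS:
            for nodeid in NODE_IDS:
                args += ["-e", wp_event(pmu, eventid, nodeid)]
    args += ["-a", "--", "sleep", "1"]
    return args


print(shlex.join(build_perf_command()))
```

The list form can be handed to `subprocess.run` as is; `shlex.join` is used here only to print a shell-ready line.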
Source: OpenAnolis community (龙蜥社区)
