Inexplicable performance on big.LITTLE technology (on Android)

I apologise for the long question. I am trying to measure the performance of different indexing techniques on various platforms, one of which is the Adaptive Radix Tree (ART).

I have run tests whose basic steps look like this (C/C++):

Step 1: Generate or load the data (a few million key-value pairs).
Step 2: Insert into the index and measure the time taken (insert_time).
Step 3: Retrieve from the index and measure the time taken (retrieve_time).

On most platforms, such as Intel desktops (i386/amd64), iPad (Apple A9), Android (ARMv7) and Raspberry Pi 3 (ARMv8), insert_time is consistently greater than retrieve_time. This is expected, because an insert does more work than a lookup (it may allocate leaves and grow inner nodes).
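
A minimal sketch of Steps 1 to 3 as a timing harness, for readers who want to reproduce the measurement pattern. It is not the benchmark from the question: std::unordered_map is a hypothetical stand-in for the ART index, the keys are synthetic, and generate_data, run_phases and PhaseTimes are hypothetical names. It times each phase with std::chrono::steady_clock (a monotonic clock) and counts the keys found so the lookups have an observable result.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the ART index so the sketch compiles on its own.
using Index = std::unordered_map<std::string, std::string>;

struct PhaseTimes {
    double insert_s = 0.0;    // Step 2: insert_time, in seconds
    double retrieve_s = 0.0;  // Step 3: retrieve_time, in seconds
    std::size_t found = 0;    // keys found in Step 3 (keeps the lookups observable)
};

// Step 1: generate n synthetic key-value pairs (hypothetical data).
std::map<std::string, std::string> generate_data(std::size_t n) {
    std::map<std::string, std::string> m;
    for (std::size_t i = 0; i < n; ++i)
        m.emplace("key" + std::to_string(i), "value" + std::to_string(i));
    return m;
}

// Seconds elapsed since t0 on a monotonic clock.
static double seconds_since(std::chrono::steady_clock::time_point t0) {
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

// Steps 2 and 3: time the insert pass and the retrieve pass separately.
PhaseTimes run_phases(const std::map<std::string, std::string>& m, Index& index) {
    PhaseTimes r;
    auto t0 = std::chrono::steady_clock::now();
    for (const auto& kv : m) index.emplace(kv.first, kv.second);
    r.insert_s = seconds_since(t0);

    t0 = std::chrono::steady_clock::now();
    for (const auto& kv : m) r.found += index.count(kv.first);
    r.retrieve_s = seconds_since(t0);
    return r;
}
```

Both passes walk m in the same sorted order, as the loops in the question do, so the insert and retrieve phases see the same key sequence.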

But when I run the same steps on big.LITTLE platforms, specifically the Snapdragon 845 (Xiaomi POCO F1) and the HiSilicon Kirin 659 (Honor 9 Lite), I find insert_time < retrieve_time, except with very small datasets.

To diagnose what could be wrong, I went through the following steps:

Ensure that the thread runs at maximum priority, using the following code:

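#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

void set_thread_priority() {
    // Lower the nice value; without CAP_SYS_NICE (or a suitable RLIMIT_NICE) this fails with EPERM.
    errno = 0;  // -1 is a legal return value, so errno is the failure signal
    if (nice(-20) == -1 && errno != 0) perror("nice");

    int policy = 0;
    struct sched_param param;
    pthread_getschedparam(pthread_self(), &policy, &param);
    // Under SCHED_OTHER (the default) the only valid priority is 0,
    // so this has no effect unless the policy is SCHED_FIFO or SCHED_RR.
    param.sched_priority = sched_get_priority_max(policy);
    pthread_setschedparam(pthread_self(), policy, &param);
}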
I can see that the nice value is applied to the process, and the thread runs at 100% CPU most of the time (it is essentially a single-threaded algorithm).
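
To confirm from inside the program which of these settings actually took effect, the nice value and the scheduling policy can be read back. A minimal sketch, assuming Linux or Android; SchedInfo and query_sched_info are hypothetical names. Under the default SCHED_OTHER policy sched_priority is always 0, so a raised priority would only show up after switching to SCHED_FIFO or SCHED_RR, which normally requires extra privileges.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <sched.h>
#include <sys/resource.h>

// What the scheduler currently reports for the caller.
struct SchedInfo {
    int nice_value = 0;  // from getpriority(); range -20..19
    int policy = 0;      // e.g. SCHED_OTHER, SCHED_FIFO, SCHED_RR
    int priority = 0;    // sched_priority; always 0 under SCHED_OTHER
    bool ok = false;     // false if either query failed
};

SchedInfo query_sched_info() {
    SchedInfo info;
    errno = 0;  // -1 is a legal nice value, so errno is the only failure signal
    info.nice_value = getpriority(PRIO_PROCESS, 0);
    if (info.nice_value == -1 && errno != 0) return info;

    sched_param param{};
    if (pthread_getschedparam(pthread_self(), &info.policy, &param) != 0) return info;
    info.priority = param.sched_priority;
    info.ok = true;
    return info;
}
```

Calling query_sched_info() right after set_thread_priority() shows whether the nice value changed and whether the priority assignment had any effect under the current policy.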

Set the CPU affinity using the following code:

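#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // cpu_set_t and sched_setaffinity are GNU extensions
#endif
#include <sched.h>
#include <stdio.h>

void set_affinity() {
    // On my devices CPUs 4-7 are the faster cluster; the numbering is SoC-specific.
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(4, &mask);
    CPU_SET(5, &mask);
    CPU_SET(6, &mask);
    CPU_SET(7, &mask);
    // A pid of 0 applies the mask to the calling thread.
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
        perror("sched_setaffinity");
}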
The affinity setting clearly takes effect on big.LITTLE: when I restrict the thread to CPUs 0, 1, 2, 3, the code runs much slower than when I restrict it to CPUs 4, 5, 6, 7. Even so, insert_time < retrieve_time in both cases.
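
Besides comparing timings, the mask can be read back to confirm it was applied, and sched_getcpu() reports which core the thread is on at a given moment. A minimal sketch, assuming Linux or Android with _GNU_SOURCE in effect (g++ defines it for C++ by default); pin_to_cpus, current_affinity and running_on are hypothetical names.

```cpp
#include <cassert>
#include <sched.h>
#include <vector>

// Restrict the calling thread to the given CPU numbers.
// Returns false if the kernel rejects the mask (e.g. none of the CPUs is usable).
bool pin_to_cpus(const std::vector<int>& cpus) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (int c : cpus) CPU_SET(c, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}

// Read back the CPUs the calling thread is currently allowed to run on.
std::vector<int> current_affinity() {
    std::vector<int> cpus;
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return cpus;
    for (int c = 0; c < CPU_SETSIZE; ++c)
        if (CPU_ISSET(c, &mask)) cpus.push_back(c);
    return cpus;
}

// CPU the calling thread is running on right now (-1 on failure).
int running_on() { return sched_getcpu(); }
```

For example, on a device that has CPUs 4 to 7, current_affinity() called right after set_affinity() should return {4, 5, 6, 7}.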

Ensure that enough free RAM is available for my dataset.
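
One way to check this from inside the benchmark is to query page counts with sysconf() before Step 1. A minimal sketch; free_ram_bytes and total_ram_bytes are hypothetical names, and _SC_PHYS_PAGES / _SC_AVPHYS_PAGES are extensions supported by glibc and Android's bionic rather than POSIX. _SC_AVPHYS_PAGES counts only free pages, so memory the kernel could reclaim from caches is not included and the figure may understate what is usable.

```cpp
#include <cassert>
#include <unistd.h>

// Free physical memory in bytes, or -1 if the query is unsupported.
long long free_ram_bytes() {
    long pages = sysconf(_SC_AVPHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages < 0 || page_size < 0) return -1;
    return static_cast<long long>(pages) * page_size;
}

// Total physical memory in bytes, or -1 if the query is unsupported.
long long total_ram_bytes() {
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages < 0 || page_size < 0) return -1;
    return static_cast<long long>(pages) * page_size;
}
```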

To rule out the possibility that Step 3 was reading data that had been swapped out of RAM, I added Step 4, which simply repeats Step 3:

Step 4: Retrieve from the index and measure the time taken again (retrieve_time2).

To my surprise, retrieve_time2 > retrieve_time > insert_time (by 2 to 3 seconds for 10 million records).
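
Rather than a single extra pass, Step 3 can be repeated several times with each pass timed separately, which shows whether the slowdown keeps growing from pass to pass or appears only once. A minimal sketch; std::unordered_map is again a hypothetical stand-in for the ART tree, and time_retrieve_passes is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the ART index so the sketch compiles on its own.
using Index = std::unordered_map<std::string, std::string>;

// Time `passes` consecutive retrieval passes over every key in m
// (pass 0 is Step 3, pass 1 is Step 4, and so on). Returns one duration in
// seconds per pass; `found` counts hits so the lookups have a visible result.
std::vector<double> time_retrieve_passes(const std::map<std::string, std::string>& m,
                                         const Index& index, int passes,
                                         std::size_t& found) {
    std::vector<double> times;
    found = 0;
    for (int p = 0; p < passes; ++p) {
        auto t0 = std::chrono::steady_clock::now();
        for (const auto& kv : m) found += index.count(kv.first);
        auto t1 = std::chrono::steady_clock::now();
        times.push_back(std::chrono::duration<double>(t1 - t0).count());
    }
    return times;
}
```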

The insert code looks like this:

it1 = m.begin();
start = getTimeVal();
for (; it1 != m.end(); ++it1) {
    // The key length includes the terminating '\0' (length() + 1).
    art_insert(&at, (unsigned char*) it1->first.c_str(),
               (int) it1->first.length() + 1, (void *) it1->second.c_str(),
               (int) it1->second.length());
    ctr++;
}
stop = getTimeVal();

The retrieve code looks like this:

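it1 = m.begin();
start = getTimeVal();
for (; it1 != m.end(); ++it1) {
    int len;
    // Same key and key length (including '\0') as used for the insert;
    // value is not used further, only the lookup is timed.
    char *value = (char *) art_search(&at,
        (unsigned char*) it1->first.c_str(), (int) it1->first.length() + 1, &len);
    ctr++;
}
stop = getTimeVal();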
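
getTimeVal() is not shown in the question; its name suggests it may be based on gettimeofday(), which follows wall-clock time. A monotonic clock is the usual choice for benchmarks. A hypothetical replacement, assuming the helper is meant to return seconds as a double:

```cpp
#include <cassert>
#include <time.h>

// Hypothetical getTimeVal(): the original helper is not shown.
// CLOCK_MONOTONIC is not affected by jumps in the wall-clock time
// (e.g. when the system time is set), and the result is in seconds.
double getTimeVal() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec + (double) ts.tv_nsec * 1e-9;
}
```

With this version, stop - start gives the elapsed time of each loop in seconds.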

Are there any further diagnostics I could try? Or is there a platform-level explanation for this behaviour?
