Friday, December 18, 2015

Vectorizing

Everyone wants to write code that performs well, and loops are one of the most common places where a program loses time. If you are writing a loop that walks through every element of one or more arrays and you want it to run faster, vectorization is one way to get there. Vectorization is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.

Example:
void copy(char* dest, const char* source, long size) {
    long i;  /* same type as size, so the index cannot overflow on large sizes */
    for (i = 0; i < size; i++)
        dest[i] = source[i];
}
The code above copies data from one location to another. When size is small (say 100 to 1,000 elements) the performance hardly matters, but when it is in the millions the cost becomes noticeable. With vectorization, the program copies several elements (let's say 4) with a single instruction, for roughly the cost of copying one element in the code above. This is only valid if the destination and source memory regions do not overlap.
To apply vectorization, compile your program with the -O3 flag (capital letter O, not zero). Because the compiler cannot know whether dest and source overlap, it typically guards the vectorized loop with a runtime overlap check. If the loop is guaranteed to operate on non-overlapping memory regions, you can add “#pragma ivdep” (the Intel compiler spelling; GCC uses “#pragma GCC ivdep”) directly above the loop, which tells the compiler to ignore assumed vector dependencies. The loop can then run somewhat faster than with -O3 alone because the overlap checks are no longer needed, but if the loop does operate on overlapping memory, your program will produce incorrect results.
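As a rough sketch of the no-overlap promise in code, the version below marks both pointers with C99 restrict and places the ivdep pragma above the loop. The function name copy_no_overlap is made up for this example, and the preprocessor guards are an assumption about which compiler is in use; other compilers simply see a plain loop.

```c
#include <assert.h>
#include <stddef.h>

/* copy_no_overlap is a hypothetical name for this sketch.
   restrict is the caller's promise that dest and source never overlap. */
void copy_no_overlap(char *restrict dest, const char *restrict source, size_t size) {
    assert(dest != NULL && source != NULL);
#if defined(__INTEL_COMPILER)
#pragma ivdep          /* Intel spelling */
#elif defined(__GNUC__) && !defined(__clang__)
#pragma GCC ivdep      /* GCC spelling */
#endif
    for (size_t i = 0; i < size; i++)
        dest[i] = source[i];
}
```

Calling this with overlapping buffers breaks the promise made to the compiler, so the result is no longer reliable.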
The difference between loop unrolling and vectorization is this: loop unrolling still performs 4 (as an example) separate copy operations, but does them all within one loop iteration, so the index is incremented and tested only once per 4 elements. Vectorization loads 4 (as an example) elements and stores them in the destination in a single operation. It's easier to show in code, so here it is:

Loop Unrolling
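long i;
for (i = 0; i + 3 < size; i += 4) {
    dest[i] = source[i];
    dest[i + 1] = source[i + 1];
    dest[i + 2] = source[i + 2];
    dest[i + 3] = source[i + 3];
}
for (; i < size; i++)   /* leftover elements when size is not a multiple of 4 */
    dest[i] = source[i];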

Vectorization
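long i;
for (i = 0; i + 3 < size; i += 4) {
    copyFourThingsAtOnce(&dest[i], &source[i]);  /* placeholder for one vector load and store */
}
for (; i < size; i++)   /* leftover elements when size is not a multiple of 4 */
    dest[i] = source[i];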
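The copyFourThingsAtOnce call above is only a placeholder. As a portable stand-in, the sketch below moves 4 chars as one 32-bit word using memcpy, which compilers generally turn into a single load and a single store. The names copy_four_at_once and copy_chunked are invented for this example, and real vector instructions usually move 16 bytes or more at a time, so treat this as an illustration of the idea and not as what the compiler actually generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: moves 4 chars as one 32-bit word.
   The two buffers must not overlap. */
static void copy_four_at_once(char *dest, const char *source) {
    uint32_t chunk;
    memcpy(&chunk, source, sizeof chunk);  /* one 4-byte load */
    memcpy(dest, &chunk, sizeof chunk);    /* one 4-byte store */
}

/* Copies size chars: 4 at a time, then the leftovers one by one. */
void copy_chunked(char *dest, const char *source, size_t size) {
    assert(dest != NULL && source != NULL);
    size_t i = 0;
    for (; i + 4 <= size; i += 4)
        copy_four_at_once(&dest[i], &source[i]);
    for (; i < size; i++)   /* tail when size is not a multiple of 4 */
        dest[i] = source[i];
}
```

For a size of 10, the first loop runs twice (8 chars) and the tail loop copies the remaining 2.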
