Ex:
void copy(char* dest, char* source, long size) {
    long i;  /* same type as size, so a large size can't overflow the index */
    for (i = 0; i < size; i++)
        dest[i] = source[i];
}
The code above copies data from one location to another. Performance won't matter if the size is small (say 100 to 1,000), but once the
size gets large (millions) you can see the effect. With vectorization the program copies N elements (let's say 4) for roughly the cost of
the one-element copy in the code above, provided the destination and source memory regions don't overlap. To get vectorization in
your program, compile it with the -O3 flag (that's the capital letter O, not a zero). If the loop is guaranteed to operate on
non-overlapping memory regions, you can add "#pragma ivdep" above the loop (GCC spells it "#pragma GCC ivdep"), which tells the compiler
to ignore assumed vector dependencies. The loop is then vectorized and can run even faster than with -O3 alone, because the overlap checks
the compiler would otherwise insert are skipped. But if the loop does operate on overlapping memory, your program will give incorrect results.
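As a minimal sketch of the pragma approach, the copy loop might be annotated like this. The name copy_ivdep is made up for illustration, and the preprocessor guards are an assumption about which compiler you use: GCC gets "#pragma GCC ivdep", Intel's classic compiler gets "#pragma ivdep", and any other compiler just builds the plain loop.

```c
#include <stddef.h>

/* Hypothetical variant of copy(): the caller promises that dest and
 * source do not overlap, so the compiler may vectorize the loop
 * without checking for dependencies between iterations. */
void copy_ivdep(char *dest, const char *source, size_t size) {
    size_t i;
#if defined(__INTEL_COMPILER)
#pragma ivdep
#elif defined(__GNUC__) && !defined(__clang__)
#pragma GCC ivdep
#endif
    for (i = 0; i < size; i++)
        dest[i] = source[i];
}
```

With GCC, building with -O3 and adding -fopt-info-vec should print a note for each loop that was vectorized, which is a quick way to check whether the hint had any effect.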
The difference between loop unrolling and vectorization: loop unrolling does 4 (as an example) separate operations per index increment,
while vectorization copies 4 (again, as an example) elements and places them in the destination in 1 operation. It's easier to show in
code, so here it is:
Loop Unrolling
for (int i = 0; i < size; i += 4) {  // assumes size is a multiple of 4
    dest[i] = source[i];
    dest[i + 1] = source[i + 1];
    dest[i + 2] = source[i + 2];
    dest[i + 3] = source[i + 3];
}
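The unrolled loop reads and writes past the end when size is not a multiple of 4. Here is a rough sketch of one common way to handle that, with a second loop that finishes the leftover elements. The function name copy_unrolled is just an illustration.

```c
#include <stddef.h>

/* Copy `size` bytes, four per loop iteration, then finish the
 * remaining 0 to 3 bytes one at a time. */
void copy_unrolled(char *dest, const char *source, size_t size) {
    size_t i = 0;

    /* Main unrolled loop: runs only while a full group of 4 remains. */
    for (; i + 4 <= size; i += 4) {
        dest[i]     = source[i];
        dest[i + 1] = source[i + 1];
        dest[i + 2] = source[i + 2];
        dest[i + 3] = source[i + 3];
    }

    /* Cleanup loop: handles the tail when size is not a multiple of 4. */
    for (; i < size; i++)
        dest[i] = source[i];
}
```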
Vectorization
for (int i = 0; i < size; i += 4) {
    copyFourThingsAtOnce(&dest[i], &source[i]);
}
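copyFourThingsAtOnce is a placeholder, not a real function. To make the idea concrete, here is a small sketch that moves 4 chars through a single 32-bit value. The names copy_four_at_once and copy_chunked are invented for this example, and it is only a stand-in for what the compiler does: optimizing compilers typically turn a fixed-size memcpy like this into one load and one store, while real vector instructions work on wider registers (16 bytes with SSE, for instance).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for copyFourThingsAtOnce: moves 4 chars as one 32-bit chunk.
 * memcpy is used so the access is valid for any alignment. */
static void copy_four_at_once(char *dest, const char *source) {
    uint32_t chunk;
    memcpy(&chunk, source, sizeof chunk);
    memcpy(dest, &chunk, sizeof chunk);
}

/* Copy `size` bytes in 4-byte chunks, then the 0 to 3 leftover bytes.
 * Like the loop above, this assumes dest and source do not overlap. */
void copy_chunked(char *dest, const char *source, size_t size) {
    size_t i = 0;
    for (; i + 4 <= size; i += 4)
        copy_four_at_once(&dest[i], &source[i]);
    for (; i < size; i++)
        dest[i] = source[i];
}
```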