
An exploration of using the Intel AVX2 gather load instructions for vectorised image processing

Abstract
Processing image data with single-instruction multiple-data (SIMD) CPU instructions provides a means of vectorising standard image processing operators and thus speeding up their execution. SIMD register loads normally read from consecutive locations in memory, that is, consecutive pixels in a row of the image. For some algorithms, however, data dependencies between pixels along rows render SIMD vectorisation useless. Efficiently loading pixels from columns of the image would remove this obstacle. The Intel AVX2 CPU extension introduces instructions for gather loading data from multiple memory locations into a single SIMD register. We explore using these instructions for column loads of image data in two common image operations, transposing images and mean filtering, to test 1) whether they provide useful speed-ups when other vectorised approaches exist (we find that they do not), and 2) whether they provide a means of implementing operations that would otherwise be difficult or extremely inefficient to achieve without a column load (we find that they can provide speed-ups over scalar code).
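As an illustration of the column load the abstract describes, the following is a minimal sketch and not code from the paper. It assumes a row-major image of 32-bit pixels and uses the AVX2 intrinsic `_mm256_i32gather_epi32` to fetch eight vertically adjacent pixels into one register. The helper name `load_column8` and its parameters are hypothetical, and a scalar fallback is included so the sketch also compiles where AVX2 is not enabled.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper (not from the paper): load 8 vertically adjacent
 * 32-bit pixels, one per row, starting at (row, col) of a row-major image
 * with `stride` elements per row. Assumes 7 * stride fits in a 32-bit int. */
static void load_column8(const int32_t *img, size_t stride,
                         size_t row, size_t col, int32_t out[8])
{
    const int32_t *base = img + row * stride + col;
#if defined(__AVX2__)
    /* Per-lane element offsets: 0, stride, 2*stride, ..., 7*stride. */
    const __m256i lane = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    const __m256i idx = _mm256_mullo_epi32(lane, _mm256_set1_epi32((int)stride));
    /* Scale of 4 because the indices count 4-byte elements. */
    const __m256i v = _mm256_i32gather_epi32((const int *)base, idx, 4);
    _mm256_storeu_si256((__m256i *)out, v);
#else
    /* Scalar fallback with the same result when AVX2 is unavailable. */
    for (size_t k = 0; k < 8; ++k)
        out[k] = base[k * stride];
#endif
}
```

With GCC or Clang the gather path is selected by compiling with `-mavx2`. Whether it outperforms the scalar loop depends on the operation and the CPU, which is the question the paper examines.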
Type
Conference Contribution
Citation
Cree, M. J. (2018). An exploration of using the intel AVX2 gather load instructions for vectorised image processing. Presented at the International Conference on Image and Vision Computing New Zealand (IVCNZ), Auckland, New Zealand: IEEE. https://doi.org/10.1109/IVCNZ.2018.8634707
Date
2018
Publisher
IEEE
Rights
This is an author’s accepted version of an article published in the Proceedings of International Conference on Image and Vision Computing New Zealand (IVCNZ). © 2018 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.