The FFT on a GPU

"The FFT on a GPU." Kenneth Moreland and Edward Angel. In SIGGRAPH/Eurographics Workshop on Graphics Hardware 2003 Proceedings, July 2003, pp. 112–119.


The Fourier transform is a well known and widely used tool in many scientific and engineering fields. The Fourier transform is essential for many image processing techniques, including filtering, manipulation, correction, and compression. As such, the computer graphics community could benefit greatly from such a tool if it were part of the graphics pipeline. As of late, computer graphics hardware has become amazingly cheap, powerful, and flexible. This paper describes how to utilize the current generation of cards to perform the fast Fourier transform (FFT) directly on the cards. We demonstrate a system that can synthesize an image by conventional means, perform the FFT, filter the image, and finally apply the inverse FFT in well under 1 second for a 512 by 512 image. This work paves the way for performing complicated, real-time image processing as part of the rendering pipeline.


Get the paper by clicking this link. Note that in the original publication in the workshop proceedings has an error in Equation 6. The electronic form distributed here has a correction.


At the time of publication I archived the software I was using to test the system and generate images.

I have not touched this code much since publishing the paper, so it is probably out of date. So far no one has reported catastrophic problems that have not been fixed, although I have heard of problems with rendering artifacts.1

Supplemental Code

For those of you who are really adventurous, I have also archived the rest of the source code I generated during my Ph.D. program. The code is contained in a makeshift library called On-Card Algorithms (OCA). It's basically a bunch of boilerplate encapsulated in C++ objects to run Cg algorithms that do not fit will within the classical graphics pipeline. You may or may not find that useful although now there are more complete solutions such as CUDA, OpenCL, and many others2 now available.

In this archive is also the collection of algorithms I happened to be working on at the time, such as in my dissertation. You may enjoy looking through them although many will probably be a bit worthless and other may not even work.

Download oca.tar.gz

You will need CMake in order to compile this code.

Other Implementations

This was the first paper to propose performing an FFT on a GPU. However, since this publication, several other researchers have proposed and implemented algorithms that are either easier to implement or more efficient (or perhaps both). Before spending too much time understanding all the details in this paper, consider looking at other implementations. has links to several recent implementations.

  1. There is also a known issue of a seg fault when you close the window. It's an issue with a bad object deletion order. It does not occur when you stop the application by hitting the 'q' key, and I have no interest in fixing it.
  2. By the time you read this, the list of GPGPU programming systems will probably have changed. See for the latest announcements and information.


SAND 2015-4836 W