The following program loads two images with PyGame, converts them to Numpy arrays, and then performs some other Numpy operations (such as FFT) to emit a final result (of a few numbers). The inputs can be large, but at any moment only one or two large objects should be live.
A test image is about 10M pixels, which translates to 10 MB once it's greyscaled. It gets converted to a Numpy array of dtype uint8, which after some processing (applying Hamming windows) becomes an array of dtype float64. Two images are loaded into arrays this way; the later FFT steps produce an array of dtype complex128. Prior to adding the excessive gc.collect calls, the program's memory size tended to increase with each step. Additionally, it seems most Numpy operations return results in the highest precision available.
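To illustrate the promotion (a small sketch, not taken from the program itself): multiplying a uint8 array by a float64 Hamming window yields float64, and numpy.fft.fft2 then returns complex128 regardless of the input dtype.

```python
import numpy as np

a = np.zeros((4, 4), dtype=np.uint8)  # stand-in for the greyscale image data
w = np.hamming(4)                     # Hamming windows are float64
b = a * w                             # uint8 * float64 -> float64
c = np.fft.fft2(b)                    # fft2 returns complex128
print(b.dtype, c.dtype)               # float64 complex128
```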
Running the test (sans the gc.collect calls) on my 1 GB Linux machine results in prolonged thrashing, which I have not waited out. I don't yet have detailed memory-use stats -- I tried some Python modules and the time command to no avail, and am now looking into valgrind. Watching ps (and dealing with machine unresponsiveness in the later stages of the test) suggests a maximum memory usage of about 800 MB.
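For a rough peak figure without valgrind, the standard-library resource module can report the process's maximum resident set size (a sketch; note that ru_maxrss is in kilobytes on Linux but bytes on macOS):

```python
import resource

def peak_rss_mb():
    # Maximum resident set size so far; kilobytes on Linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

print('peak RSS: %.1f MB' % peak_rss_mb())
```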
A 10 million cell array of complex128 should occupy 160 MB (16 bytes per cell). Having (ideally) at most two of these live at one time, plus the not-insubstantial Python and Numpy libraries and other paraphernalia, probably means allowing for 500 MB.
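The arithmetic can be checked with ndarray.nbytes: a complex128 element is 16 bytes, so 10 million of them occupy 160 MB.

```python
import numpy as np

a = np.zeros(10 * 1000 * 1000, dtype=np.complex128)
print(a.itemsize)   # 16 bytes per element
print(a.nbytes)     # 160000000 bytes, i.e. 160 MB
```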
I can think of two angles from which to attack the problem:
Discarding intermediate arrays as soon as possible. That's what the gc.collect calls are for -- they seem to have improved the situation, as the program now completes with only a few minutes of thrashing ;-). I think one can expect that memory-intensive programming in a language like Python will require some manual intervention.
Using less-precise Numpy arrays at each step. Unfortunately, the operations that return arrays, like fft2, do not appear to allow the output type to be specified.
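As far as I can tell, numpy.fft computes in double precision no matter what it is fed; the best one can do with Numpy alone is downcast afterwards, which trims storage of the result but not the peak (a small demonstration):

```python
import numpy as np

a32 = np.zeros((8, 8), dtype=np.float32)
out = np.fft.fft2(a32)             # upcast: result is complex128
small = out.astype(np.complex64)   # half the storage, but only after the peak
print(out.dtype, small.dtype)      # complex128 complex64
```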
So my main question is: is there a way of specifying output precision in Numpy array operations?
More generally, are there other common memory-conserving techniques when using Numpy?
Additionally, does Numpy have a more idiomatic way of freeing array memory? (I imagine this would leave the array object live in Python, but in an unusable state.) Explicit deletion followed by immediate GC feels hacky.
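For what it's worth, an ndarray's buffer is freed as soon as its reference count drops to zero (in CPython), and since these arrays are not in reference cycles, plain del should suffice and the gc.collect calls may be redundant. A weakref makes this visible:

```python
import weakref
import numpy as np

a = np.zeros((1000, 1000))  # ~8 MB buffer
r = weakref.ref(a)
del a                       # refcount hits zero; CPython frees the buffer now
print(r() is None)          # True -- no gc.collect() was needed
```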
```python
import sys
import numpy
import pygame
import gc

def get_image_data(filename):
    im = pygame.image.load(filename)
    im2 = im.convert(8)                  # 8-bit greyscale surface
    a = pygame.surfarray.array2d(im2)    # dtype uint8
    hw1 = numpy.hamming(a.shape[0])
    hw2 = numpy.hamming(a.shape[1])
    a = a.transpose()
    a = a * hw1                          # promotes to float64
    a = a.transpose()
    a = a * hw2
    return a

def check():
    gc.collect()
    print 'check'

def main(args):
    pygame.init()
    pygame.surfarray.use_arraytype('numpy')
    filename1 = args[1]
    filename2 = args[2]
    im1 = get_image_data(filename1)
    im2 = get_image_data(filename2)
    check()
    out1 = numpy.fft.fft2(im1)
    del im1
    check()
    out2 = numpy.fft.fft2(im2)
    del im2
    check()
    out3 = out1.conjugate() * out2
    del out1, out2
    check()
    correl = numpy.fft.ifft2(out3)
    del out3
    check()
    maxs = correl.argmax()
    maxpt = maxs % correl.shape[0], maxs / correl.shape[0]
    print correl[maxpt], maxpt, (correl.shape[0] - maxpt[0], correl.shape[1] - maxpt[1])

if __name__ == '__main__':
    args = sys.argv
    exit(main(args))
```
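A further option for the windowing step above (a hypothetical rewrite, not tested against the real program): ufuncs accept an out= argument, so once the array is float64 the two window multiplications can reuse one buffer instead of allocating a fresh array each time. Note the original uint8 array still needs one float64 copy first, since out= cannot change dtype.

```python
import numpy as np

a = np.arange(24, dtype=np.float64).reshape(4, 6)  # stand-in image data
hw1 = np.hamming(a.shape[0])
hw2 = np.hamming(a.shape[1])

np.multiply(a, hw2, out=a)   # window along rows, written in place
a *= hw1[:, np.newaxis]      # window along columns, also in place
```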
An answer on SO says "Scipy 0.8 will have single precision support for almost all the fft code", and SciPy 0.8.0 Beta 1 is just out. (I haven't tried it myself yet, cowardly.)