A question that has often been asked over the years is whether IRAF would
benefit from specialized hardware (e.g. the Altivec array processor on PPC
systems) or high-performance compilers. The short answer is, it depends.
Array processors such as the Altivec do indeed provide a performance
boost, but as we've seen from the discussion on memory usage, many IRAF
tasks don't process arrays large enough to offset the overhead of
restructuring the data so that specialized routines can be used. More
recently the focus has been on whether GPU processing power could be
applied; the answer (given without any actual testing to back it up) is
that again we would probably need to restructure applications to make
optimal use of the hardware.
These solutions also don't work as well for general-purpose code
where platform portability is a concern. For a systematic enhancement we can
look at various compiler options; for PC systems in particular we have a
choice of either the free GCC-based compilers or commercial alternatives.
(While not presented here in detail, the Intel compiler was also evaluated;
though it showed better overall performance for general Fortran code,
porting IRAF to this compiler was more effort than we wanted to expend
for a simple test.)
The results for the various GCC-based compilers are shown below in
Table A1. The current IRAF system is built using standard F2C (at the
time of the original Linux port this was the only stable option; G77 and
GFortran have improved considerably since then), which the table shows is
not the worst option. The G77 we tested was the most recent version, and
the GFortran compiler was an advanced version that auto-vectorizes code.
As a further test, we show results obtained when using FITS as the image
format as well as the older IMH format.
Better performance with IMH has been known for a while, due both to
inefficiencies in the FITS kernel and to the advantages of keeping the
header and pixel data in separate files. (Starting with v2.14, FITS is the
default format, and the only option when dealing with mosaic data.)
Although compute-intensive tasks do generally show an improvement when all
the optimizations are enabled, the results are less dramatic when looked at
over the range of tasks typically used in CCD reductions. For Mac systems
these optimizations are disabled entirely because of an optimization bug
that changed the behavior of the floating-point error handling, which was
deemed unacceptable (this has been fixed in recent GCC v4 compilers).
The stability and quality of G77 and GFortran merit a re-examination of
whether these should be used. The primary advantage of keeping the F2C
system, however, is that no extra compiler installations are required by a
user wishing to build external packages. Still, the core system can be
compiled with G77 and linked successfully with an F2C-compiled package on the
user's machine.
As part of this test we also looked at the code structure to see whether we
were taking full advantage of the optimization. For array operator code in
the core IRAF system (i.e. the VOPS interface), the DO loop construct is
consistently used and is preserved in the generated Fortran code (as opposed
to the FOR loop in SPP, which becomes a set of labels/gotos). In an
F2C-compiled system these loops are converted to C code and lose much of
their potential to be optimized because of the label/goto statements
generated. G77/GFortran, however, are able to optimize DO loops quite well,
and we do see some gains by going back to compiling Fortran as the target
language. In general the use of a FOR statement in SPP doesn't hurt
performance too much, since this structure is rarely used in
compute-intensive code. Our tests show that optimized DO loops can run up to
6x faster than the C code generated by F2C.

                Total Time  Make        Proc        Combine     Median
                (seconds)   5 images    5 images    3 images    1 image
F2C             59  46      24  17      17  14       4   2      14  12
  (Optimized)   53  36      24  15      17  12       4   2       8   7
G77             61  45      26  17      18  14       4   2      13  12
  (Optimized)   50  41      21  18      16  13       4   2       9   8
GFortran        53  39      22  16      18  13       4   2       9   8
  (Optimized)   51  40      23  16      16  15       4   1       8   8

Table A1: A comparison of benchmark results using IRAF v2.14.1 systems
built with various compilers and optimizer settings. The benchmark script
simulates real-world data processing, creating images and performing
arithmetic operations meant to exercise both disk-intensive and
memory-intensive tasks. Times are reported in whole seconds; the first number
in each column is the benchmark run using FITS as the image format, and the
second number is the same test with IMH as the image format.