How It Works -- Compiler Optimizations
- Saturday, April 18 2009 @ 05:42 AM GMT
- Contributed by: fitz
A question that has often been asked over the years is whether IRAF would benefit from specialized hardware (e.g. the Altivec array processor on PPC systems) or high-performance compilers. The short answer is, it depends.
Array processors such as the Altivec do indeed provide a performance boost, but as we've seen from the discussion on memory usage, many IRAF tasks don't process arrays large enough to offset the overhead of restructuring the data so that specialized routines can be used. More recently, the focus has been on whether GPU processing power could be applied; the answer (given without any actual testing to back it up) is that we would again probably need to restructure applications to make optimal use of the hardware.
These solutions also don't work as well for general-purpose code where platform portability is a concern. For a systematic enhancement we can look instead at various compiler options; on PC systems in particular we have a choice of either the free GCC-based compilers or commercial alternatives. (While not presented here in detail, we did evaluate the Intel compiler. Although it showed better overall performance for general Fortran code, porting IRAF to this compiler was more effort than we wanted to expend for a simple test.)
The results for the various GCC-based compilers are shown below in Table A1. The current IRAF system is built using standard F2C, which the table shows is not the worst option. (At the time of the original Linux port F2C was the only stable option; G77 and GFortran have improved considerably since then.) The G77 we tested was the most recent version, and the GFortran compiler was an advanced version that auto-vectorizes code. As a further test, we show results obtained using FITS as the image format as well as the older IMH format.
Better performance with IMH has been known for a while, and is due both to inefficiencies in the FITS kernel and to the advantages of keeping the header and pixel data in separate files. (Starting with v2.14, FITS is the default format, and the only option when dealing with mosaic data.) Compute-intensive tasks do generally show an improvement when all the optimizations are enabled, but the results are less dramatic when looked at over the range of tasks typically used in CCD reductions. For Mac systems these optimizations are disabled entirely because of an optimization bug that changed the behavior of the floating-point error handling, which was deemed unacceptable (this has been fixed in recent GCC v4 compilers).
The stability and quality of G77 and GFortran merit a re-examination of whether these should be used. The primary advantage of keeping the F2C system, however, is that no extra compiler installations are required by a user wishing to build external packages. Still, the core system can be compiled with G77 and linked successfully with an F2C-compiled package on the user's machine.
As part of this test we also looked at the code structure to see whether we were taking full advantage of the optimization. For array operator code in the core IRAF system (i.e. the VOPS interface), the DO loop construct is used consistently and is preserved in the generated Fortran code (as opposed to the FOR loop in SPP, which becomes a set of labels/gotos). In an F2C-compiled system these loops are converted to C code and lose much of their potential to be optimized because of the label/goto statements generated. G77 and GFortran, however, are able to optimize DO loops quite well, and we do see some gains by going back to compiling Fortran as the target language. In general the use of a FOR statement in SPP doesn't hurt performance too much, since this structure is rarely used in compute-intensive code. Our tests show that optimized DO loops can run up to 6X faster than the C code generated by F2C.
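To make the structural difference concrete, here is a minimal sketch in C of the same element-wise add written two ways. It is illustrative only: the function names are hypothetical, this is not actual F2C output or VOPS source, and whether a given compiler optimizes one form better than the other depends on the compiler version and flags used.

```c
#include <stddef.h>

/* Structured form: the counted loop a Fortran DO loop expresses directly.
 * The trip count and induction variable are explicit, which is the shape
 * that loop optimizers and auto-vectorizers look for. */
void vadd_loop(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Label/goto form: the same computation spelled out with explicit labels
 * and jumps, roughly the style a mechanical translation can produce.
 * (Hypothetical illustration, not real translator output.) */
void vadd_goto(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
top:
    if (i >= n)
        goto done;
    c[i] = a[i] + b[i];
    i++;
    goto top;
done:
    return;
}
```

Both functions compute identical results; the only difference is how much of the loop structure is visible to the compiler. Building each with and without optimization and timing them on a large array is one way to check how a particular compiler handles the two forms.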
|Compiler||Total Time (seconds)||Make 5 images||Proc 5 images||Combine 3 images||Median 1 image|
|F2C||59 46||24 17||17 14||4 2||14 12|
|F2C (Optimized)||53 36||24 15||17 12||4 2||8 7|
|G77||61 45||26 17||18 14||4 2||13 12|
|G77 (Optimized)||50 41||21 18||16 13||4 2||9 8|
|GFortran||53 39||22 16||18 13||4 2||9 8|
|GFortran (Optimized)||51 40||23 16||16 15||4 1||8 8|
Table A1: A comparison of benchmark results using IRAF v2.14.1
with various compilers and optimizer settings. The benchmark script
simulates real-world data processing, creating images and performing
arithmetic operations meant to exercise both disk-intensive and
memory-intensive tasks. Times are reported in whole seconds; the first number
in each column is the benchmark run using FITS as the image format, the
second number is the same test with IMH as the image format.
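
As a rough aid to reading Table A1, the sketch below computes the percentage reduction in total time when optimization is enabled, for each compiler and image format. The numbers are copied from the Total Time column; the struct and function names are hypothetical. Because the times are whole seconds from what appears to be a single run per configuration, differences of a second or two should not be over-interpreted.

```c
#include <stdio.h>

/* One compiler's Total Time entries from Table A1, in seconds. */
struct row {
    const char *compiler;
    int fits_base, imh_base;   /* default build: FITS, IMH   */
    int fits_opt, imh_opt;     /* optimized build: FITS, IMH */
};

static const struct row totals[] = {
    {"F2C",      59, 46, 53, 36},
    {"G77",      61, 45, 50, 41},
    {"GFortran", 53, 39, 51, 40},
};

/* Percent reduction in elapsed time going from base to opt seconds.
 * A negative value means the optimized build was slower. */
double pct_reduction(int base, int opt)
{
    return 100.0 * (base - opt) / base;
}

/* Print one line per compiler with the FITS and IMH reductions. */
void print_reductions(void)
{
    for (size_t i = 0; i < sizeof totals / sizeof totals[0]; i++) {
        const struct row *r = &totals[i];
        printf("%-9s FITS %6.1f%%  IMH %6.1f%%\n", r->compiler,
               pct_reduction(r->fits_base, r->fits_opt),
               pct_reduction(r->imh_base, r->imh_opt));
    }
}
```

Calling `print_reductions()` from a `main` function prints the three rows. The same helper can be applied to the Median column, where the gain from optimization is most visible.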