How It Works -- Compiler Optimizations

  • Saturday, April 18 2009 @ 05:42 AM GMT

A question that has often been asked over the years is whether IRAF would benefit from specialized hardware (e.g. the Altivec array processor on PPC systems) or high-performance compilers. The short answer is, it depends.

Array processors such as the Altivec do provide a performance boost, but as we've seen from the discussion on memory usage, many IRAF tasks don't process arrays large enough to offset the overhead of restructuring the data so that specialized routines could be used. More recently the question has been whether GPU processing power could be applied. The answer (given without any actual testing to back it up) is that we would again probably need to restructure applications to make optimal use of the hardware.

These solutions also don't work as well for general-purpose code where platform portability is a concern. For a systematic enhancement we can look instead at compiler options; on PC systems in particular we have a choice of either the free GCC-based compilers or commercial alternatives. (While not presented here in detail, the Intel compiler was also evaluated. Though it showed better overall performance for general Fortran code, porting IRAF to this compiler was more effort than we wanted to expend for a simple test.)

The results for the various GCC-based compilers are shown below in Table A1. The current IRAF system is built using standard F2C, which the table shows is not the worst option. (At the time of the original Linux port this was the only stable option; G77 and GFortran have improved considerably since then.) The G77 we tested was the most recent version, and the GFortran compiler was an advanced version that auto-vectorizes code. As a further test, we show results obtained using FITS as the image format as well as the older IMH format.

Better performance with IMH has been known for a while, and is due both to inefficiencies in the FITS kernel and to the advantages of keeping the header and pixel data in separate files. (Starting with v2.14, FITS is the default format, and the only option when dealing with mosaic data.) Compute-intensive tasks do generally show an improvement when all the optimizations are enabled, but the results are less dramatic when looked at over the range of tasks typically used in CCD reductions. For Mac systems these optimizations are disabled entirely because an optimizer bug changed the behavior of the floating-point error handling, which was deemed unacceptable (this has been fixed in recent GCC v4 compilers).

The stability and quality of G77 and GFortran merit a re-examination of whether these should be used. The primary advantage of keeping the F2C system, however, is that no extra compiler installations are required by a user wishing to build external packages. Still, the core system can be compiled with G77 and linked successfully with an F2C-compiled package on the user's machine.

As part of this test we also looked at the code structure to see whether we were taking full advantage of the optimization. For array operator code in the core IRAF system (i.e. the VOPS interface), the DO loop construct is used consistently and is preserved in the generated Fortran code (as opposed to the FOR loop in SPP, which becomes a set of labels/gotos). In an F2C-compiled system these loops are converted to C code and lose much of their potential to be optimized because of the label/goto statements generated. G77 and GFortran, however, are able to optimize DO loops quite well, and we do see some gains by going back to compiling Fortran as the target language. In general the use of a FOR statement in SPP doesn't hurt performance much, since this structure is rarely used in compute-intensive code. Our tests show that optimized DO loops can run up to 6x faster than the C code generated by F2C.
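As a rough illustration of the two loop shapes contrasted above, here is a minimal C sketch of a vector add written both ways. The function names `vadd_counted` and `vadd_goto` are hypothetical, and the goto version is a hand-written approximation of label/goto control flow, not actual F2C output or VOPS source.

```c
#include <stddef.h>

/* Counted loop: the shape a Fortran DO loop has when the compiler sees
 * it directly. The trip count n is fixed on entry, which makes the loop
 * an easy target for unrolling or vectorization. */
void vadd_counted(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* The same operation expressed with labels and gotos. The optimizer has
 * to rediscover the loop structure before it can transform it.
 * Illustrative only; assumed shape, not real translator output. */
void vadd_goto(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
top:
    if (i >= n)
        goto done;
    c[i] = a[i] + b[i];
    i++;
    goto top;
done:
    return;
}
```

Both functions compute the same result. How much the goto form costs depends on the compiler and its version, so the sketch shows only the structural difference, not a measured slowdown.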

              Total Time   Make       Proc       Combine    Median
              (seconds)    5 images   5 images   3 images   1 image
F2C           59  46       24  17     17  14     4  2       14  12
(Optimized)   53  36       24  15     17  12     4  2        8   7
G77           61  45       26  17     18  14     4  2       13  12
(Optimized)   50  41       21  18     16  13     4  2        9   8
GFortran      53  39       22  16     18  13     4  2        9   8
(Optimized)   51  40       23  16     16  15     4  1        8   8
    Table A1: A comparison of benchmark results using IRAF v2.14.1 systems built with various compilers and optimizer settings. The benchmark script simulates real-world data processing, creating images and performing arithmetic operations meant to exercise both disk-intensive and memory-intensive tasks. Times are reported in whole seconds. The first number in each column is the benchmark run using FITS as the image format; the second is the same test with IMH as the image format.
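To put the totals in Table A1 in relative terms, the following C sketch computes the percentage of run time saved by the optimized build for each compiler. The struct layout and the names `build_times`, `percent_saved` and `print_summary` are assumptions made for this example; the numbers are the Total Time entries from the table, with the third compiler row taken to be GFortran.

```c
#include <stdio.h>

/* Total times in whole seconds, copied from Table A1. */
struct build_times {
    const char *compiler;
    int fits_plain, imh_plain;  /* default build: FITS run, IMH run */
    int fits_opt, imh_opt;      /* optimized build: FITS run, IMH run */
};

static const struct build_times table_a1[] = {
    {"F2C",      59, 46, 53, 36},
    {"G77",      61, 45, 50, 41},
    {"GFortran", 53, 39, 51, 40},
};

/* Percentage of run time saved going from `before` to `after`.
 * A negative result means the second run was slower. */
double percent_saved(int before, int after)
{
    return 100.0 * (before - after) / before;
}

/* Print one line per compiler with the savings for each image format. */
void print_summary(void)
{
    size_t n = sizeof table_a1 / sizeof table_a1[0];
    for (size_t i = 0; i < n; i++) {
        const struct build_times *t = &table_a1[i];
        printf("%-9s FITS %5.1f%%  IMH %5.1f%%\n", t->compiler,
               percent_saved(t->fits_plain, t->fits_opt),
               percent_saved(t->imh_plain, t->imh_opt));
    }
}
```

Because the times are reported in whole seconds, differences of a second or two are within rounding and should not be read as significant.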