Welcome to iraf.net Friday, May 17 2024 @ 05:44 AM GMT
PaulBuis | 06/01/2006 08:41PM (Read 5291 times)
I'm interested in a "high performance" version of IRAF. Has anyone inserted OpenMP directives into the Fortran code to improve performance on shared-memory parallel systems such as hyperthreaded/multicore/multi-CPU Intel systems, or multicore/multi-CPU SPARC systems? The Intel compiler (but not the GNU compilers) supports OpenMP, and recent versions of the Sun SPARC compilers also support OpenMP.

Alternatively, has anyone developed Fortran 90 or Fortran 95 versions of the IRAF codes that use implicit parallel constructs?

Most simply, has anyone just used the auto-parallelizing features of a commercial-grade compiler to compile IRAF?

I use a 4-CPU UltraSPARC III box, and my colleagues over in the Physics department are using boxes with 2 hyperthreaded Intel Xeon processors. Both of us should be able to get IRAF processing done faster than we do now.
emiliano | 06/01/2006 08:41PM
PaulBuis | 06/01/2006 08:41PM
One could feed the code to the Intel compiler with the auto-parallelizing flag set. It produces output indicating which loops were auto-parallelized. This would be a good first pass at finding where to insert OpenMP directives for the gcc compiler.
fitz | 06/01/2006 08:41PM
The only problem with this idea is that the code produced from SPP doesn't vectorize. I tried this again just now, and even one of the vector operator routines like sys$vops/lz/amulr.x fails to vectorize on an MBPro with gcc 4, i.e.

[code]
xc -c -/mfpmath=sse -/msse -/ftree-vectorize -/ftree-vectorizer-verbose=5 amulr.x
[/code]

Same goes for various other flags and files, although looking through the latest GCC man page there are a lot more optimization options one could try (and a LOT more combinations that may or may not help). I've also experimented with g77/gfortran/g95 and found that in many cases the code is slower. I can hand-craft code that does auto-parallelize, but there's no improvement (possibly because of the marshalling of the data).

More generally, what OpenMP directives are we talking about? Putting them where? Keep in mind that most tasks operate on images one row at a time, so even the vector operands are dealing with only, say, a 1K array and not the full 1Kx2K image. The low-level routines aren't always ideal for vectorization anyway, not to mention tasks which are i/o bound, or where edge effects (e.g. a sinc interpolation) mean you can't simply divide an image into smaller grids for processing.

I believe you've already been in touch with Ken Mighell about his Beowulf experiments. The thing to remember here is that the MP code is in the application, where it can be put in context to the best effect. There really aren't a lot of places in the VOS where you could do the same thing (an analogy would be putting OpenMP code in libc versus the science code).

I'm interested in seeing this thread continue, especially if somebody could post a set of flags that demonstrably improves the speed. I suspect, however, that it is easier to optimize a given task than the entire system.

-Mike