Welcome to iraf.net Friday, May 17 2024 @ 05:44 AM GMT
PaulBuis | 06/01/2006 08:41PM (Read 5291 times)
I'm interested in a "high performance" version of IRAF. Has anyone inserted OpenMP directives into the Fortran code to improve performance on shared-memory parallel systems such as hyperthreaded/multicore/multi-CPU Intel systems, or multicore/multi-CPU SPARC systems? The Intel compiler (but not the GNU compilers) supports OpenMP, and recent versions of the Sun SPARC compilers also support OpenMP.

Alternatively, has anyone developed Fortran 90 or Fortran 95 versions of the IRAF codes that use implicit parallel constructs?

Most simply, has anyone just used the auto-parallelizing features of a commercial-grade compiler to compile IRAF?

I use a 4-CPU UltraSPARC III box, and my colleagues over in the Physics department are using boxes with 2 hyperthreaded Intel Xeon processors. Both of us should be able to get IRAF processing done faster than we do now.
emiliano | 06/01/2006 08:41PM
PaulBuis | 06/01/2006 08:41PM
One could feed the code to the Intel compiler with the auto-parallelizing flag set. It produces output indicating which loops were auto-parallelized. This would be a good first pass at finding where to insert OpenMP directives for the gcc compiler.
fitz | 06/01/2006 08:41PM
The only problem with this idea is that the code produced from SPP doesn't vectorize. I tried this again just now, and even one of the vector operator routines like sys$vops/lz/amulr.x fails to vectorize on an MBPro with gcc 4, i.e.

[code]
xc -c -/mfpmath=sse -/msse -/ftree-vectorize -/ftree-vectorizer-verbose=5 amulr.x
[/code]

Same goes for various other flags and files, although looking through the latest GCC man page there are a lot more optimization options one could try (and a LOT more combinations that may or may not help). I've also experimented with g77/gfortran/g95 and found that in many cases the code is slower. I can hand-craft code that does auto-parallelize, but there's no improvement (possibly because of the marshalling of the data).

More generally, what OpenMP directives are we talking about? Putting them where? Keep in mind that most tasks operate on images one row at a time, so even the vector operands are dealing with only, say, a 1K array and not the full 1Kx2K image. The low-level routines aren't always ideal for vectorization anyway, not to mention tasks which are i/o bound, or where edge effects (e.g. a sinc interpolation) mean you can't simply divide an image into smaller grids for processing.

I believe you've already been in touch with Ken Mighell about his Beowulf experiments. The thing to remember here is that the MP code is in the application, where it can be put in context to the best effect. There really aren't a lot of places in the VOS where you could do the same thing (an analogy would be putting OpenMP code in libc versus the science code).

I'm interested in seeing this thread continue, especially if somebody could post a set of flags that demonstrably improves the speed. I suspect, however, that it is easier to optimize a given task than the entire system.

-Mike