Welcome to iraf.net Friday, May 17 2024 @ 05:44 AM GMT


 OpenMP for shared memory parallel systems
   
PaulBuis
 06/01/2006 08:41PM (Read 5291 times)  
Newbie

Registered: 06/01/2006
Posts: 3
I'm interested in a "high performance" version of IRAF. Has anyone inserted OpenMP directives into the Fortran code to improve performance on shared-memory parallel systems such as hyperthreaded, multicore, or multi-CPU Intel systems, or multicore or multi-CPU SPARC systems? The Intel compiler (but not the GNU compilers) supports OpenMP, and recent versions of the Sun SPARC compilers also support it.

Alternatively, has anyone developed Fortran 90 or Fortran 95 versions of the IRAF codes that have implicit parallel constructs in them?

Most simply, has anyone just used the autoparallelizing features of a commercial-grade compiler to compile IRAF?

I use a 4-CPU UltraSPARC III box, and my colleagues over in the Physics department are using boxes with 2 hyperthreaded Intel Xeon processors. Both of us should be able to get IRAF processing done faster than we do now.

 
emiliano
 06/01/2006 08:41PM  
Chatty

Registered: 12/05/2005
Posts: 38
Hello Paul,

I'm also interested in OpenMP. I'm going to buy an SMP machine soon and run some experiments...

By the way: gcc now has OpenMP support. It's not in the latest official release; it's in the CVS tree and will become part of the 4.2 release. See:
http://groups.google.com/group/comp.lang.fortran/browse_thread/thread/898465de6f502957/e9fb46c1bed4ed70?lnk=st&q=gcc+openMP&rnum=1#e9fb46c1bed4ed70
and:
http://gcc.gnu.org/wiki/GFortran

Maybe, as a first effort, OpenMP directives could be inserted in some "IRAF kernel" widely used by upper-level tasks... but I don't know the internal structure. I know that it's off topic, but some time ago I was wondering whether parts of CFITSIO (namely the iterator) could be parallelized with OpenMP.

Cheers,
Emiliano

 
PaulBuis
 06/01/2006 08:41PM  
Newbie

Registered: 06/01/2006
Posts: 3
One could feed the code to the Intel compiler with the autoparallelizing flag set. It produces output indicating which loops were autoparallelized. This would be a good first pass at where to insert OpenMP directives for the gcc compiler.
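
For readers unfamiliar with what "inserting an OpenMP directive" means in practice, here is a minimal sketch. It is written in C rather than SPP or Fortran, and `amul_sketch` is a hypothetical stand-in for an element-wise vector multiply, not code taken from IRAF. It only illustrates the kind of independent loop that a compiler's autoparallelization report might flag as a candidate.

```c
/* Hypothetical C analogue of an element-wise vector multiply:
 * c[i] = a[i] * b[i]. This is an illustration, not IRAF source. */
void amul_sketch(const float *a, const float *b, float *c, int npix)
{
    int i;

    /* Every iteration is independent of the others, so the loop can be
     * divided among threads. If the compiler is not run with OpenMP
     * enabled, the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (i = 0; i < npix; i++)
        c[i] = a[i] * b[i];
}
```

With GCC 4.2 or later the directive takes effect when the file is compiled with `-fopenmp`. Whether it helps on short arrays is a separate question, since the cost of starting threads can outweigh the work in the loop.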

 
fitz
 06/01/2006 08:41PM  
Admin

Registered: 09/30/2005
Posts: 4040
The only problem with this idea is that the code produced from SPP doesn't vectorize. I tried this again just now, and even one of the vector operator routines like sys$vops/lz/amulr.x fails to vectorize on an MBPro with gcc 4, i.e.
[code]
xc -c -/mfpmath=sse -/msse -/ftree-vectorize -/ftree-vectorizer-verbose=5 amulr.x
[/code]
The same goes for various other flags and files, although looking through the latest GCC man page there are a lot more optimization options one could try (and a LOT more combinations that may or may not help). I've also experimented with g77/gfortran/g95 and found that in many cases the code is slower. I can hand-craft code that does auto-parallelize, but there's no improvement (possibly because of the marshalling of the data).

More generally, what OpenMP directives are we talking about? Putting them where? Keep in mind that most tasks operate on images one row at a time, so even the vector operators are dealing with only, say, a 1K array and not the full 1Kx2K image. The low-level routines aren't always ideal for vectorization anyway, not to mention tasks that are I/O bound or where edge effects (e.g. a sinc interpolation) mean you can't simply divide an image into smaller grids for processing.

I believe you've already been in touch with Ken Mighell about his Beowulf experiments. The thing to remember here is that the MP code is in the application, where it can be put in context to the best effect. There really aren't a lot of places in the VOS where you could do the same thing (an analogy would be putting OpenMP code in libc versus the science code).

I'm interested in seeing this thread continue, especially if somebody could post a set of flags that demonstrably improve the speed. I suspect, however, that it is easier to optimize a given task than the entire system.

-Mike
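
To make the distinction between low-level and application-level parallelism concrete, here is a rough sketch in C. The names `scale_row` and `scale_image_sketch` are hypothetical, and the example assumes the whole image is already in memory and that rows are independent. Neither assumption holds in general for IRAF tasks, which typically read one row at a time and may have edge effects. The directive goes on the outer loop over rows in the "application" rather than inside the short per-row vector routine.

```c
/* Hypothetical per-row vector routine: multiplies one row by a constant.
 * It is left serial because a single row (about 1K pixels) is too little
 * work to be worth splitting across threads. */
static void scale_row(float *row, int ncols, float factor)
{
    int j;

    for (j = 0; j < ncols; j++)
        row[j] *= factor;
}

/* Hypothetical application-level loop over an nrows x ncols image stored
 * row by row in one array. Assumes rows can be processed independently. */
void scale_image_sketch(float *image, int nrows, int ncols, float factor)
{
    int r;

    /* Parallelize across rows: each thread handles whole rows, so the
     * per-thread work is large compared with the threading overhead. */
    #pragma omp parallel for
    for (r = 0; r < nrows; r++)
        scale_row(image + (long)r * ncols, ncols, factor);
}
```

For an I/O-bound task, a loop like this would still spend most of its time waiting on disk, so the directive alone would not be expected to help.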

 