Welcome to iraf.net Tuesday, November 21 2017 @ 06:25 AM GMT


 Running IRAF on a Cluster.
   
FSBoyden
 07/25/2006 07:43AM (Read 23543 times)  
Regular Member | Registered: 06/07/2006 | Posts: 95
Hi there. I am sending this question out there for anyone who can help, or has done, or is doing the same. We want to build a network or "blade server" cluster to do large amounts of photometric analysis and other data analysis.

The question is: is it possible to run IRAF tasks and/or applications on a cluster, and how difficult is it to develop or modify a system to the extent that it runs on a cluster? Any other general tips and suggestions would be greatly appreciated.

I also have to develop a script to run the photometric analysis. Can I use an IRAF script to develop it, or would it be better to use something more independent like Python?

Thanks
Regards
FSBoyden

 
LBT_Dave
 07/25/2006 07:43AM
Junior | Registered: 03/17/2006 | Posts: 21
> Is it possible to run IRAF tasks and/or applications on a cluster

It is possible (I was doing this yesterday on 4 nodes). As one user, you would need to log into each node and do a mkiraf on a local disk/directory, then start iraf from there. It is much faster to work on data on a local disk than on an NFS-mounted volume, so plan some local storage on each node. Remote display (e.g. if you ssh into the nodes) slows things down as well. It helps to write any scripts you make with processing on a cluster in mind.

In a related question, I am looking into putting together a data reduction computer for the LBT, and the cluster vs. workstation issue has come up. Our optical wide-field cameras will have a total of eight 2x4.6k CCDs. Is iraf purely single-threaded, or can parts of it (e.g. mscred) make proper use of multiple CPUs (and do you need to do anything special to make this happen)? In the past I have seen no difference in processing time between a 3.4GHz Northwood P4 and a dual 3.4GHz Xeon machine of the same generation (with 3x the RAM) when working on single-chip data. My work with iraf on a cluster was driven by the availability of the cluster, but it may not be the best option.

The question is: with the Conroe/Woodcrest CPUs coming out now, what is the preferred forward-looking platform for iraf reductions? Post your wish lists!

On a more esoteric level, would a complete recompilation of iraf on one of these new CPUs gain anything, or is the latest/greatest 3GHz Conroe expected to perform the same as the 3GHz Northwood in my 3-year-old workstation? Does iraf make use of the SSE extensions and such? I've never seen iraf make 100% use of my current CPU, so I was wondering if there are optimizations that could be done to speed things up a bit.

Dave Thompson, LBT Observatory
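P.S. A rough sketch of how a controlling script might drive the per-node sessions. Everything here is hypothetical: the node names, the /scratch/iraf path, and the idea of feeding a pre-written .cl file to a non-interactive cl are placeholders for whatever your cluster actually uses.

```python
import subprocess

def build_node_jobs(nodes, image_sets, workdir="/scratch/iraf"):
    """Build one ssh command per node; each node reduces its own image set
    on local disk. Node names, the scratch path, and the cl invocation are
    all placeholders."""
    jobs = []
    for node, images in zip(nodes, image_sets):
        # run cl non-interactively in the node-local scratch directory
        remote = f"cd {workdir} && cl < reduce_{images}.cl"
        jobs.append(["ssh", node, remote])
    return jobs

def launch(jobs, dry_run=True):
    """With dry_run=True just return the command lines; otherwise spawn them."""
    if dry_run:
        return [" ".join(j) for j in jobs]
    return [subprocess.Popen(j) for j in jobs]
```

The dry-run mode makes it easy to eyeball what would be sent to each node before committing.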

 
emiliano
 07/25/2006 07:43AM
Chatty | Registered: 12/05/2005 | Posts: 38
Hello all. I also asked the same questions, and it seems that the answer is "no": no SMP or clustering support, no SIMD (AltiVec or SSE) support...
At least no direct support inside IRAF, but you can write C or Fortran parallel code "compatible with IRAF", that is, an application you can run from the cl using IRAF parameter files and so on... see this post: http://iraf.net/phpBB2/viewtopic.php?t=85315

Dave, maybe, if you are running an IRAF pipeline which is "database driven" (that is, calibration and science frames are selected from a database server), you can use a sort of scheduler to run the pipeline on different cluster nodes, with different images, at the same time...

Cheers,
Emiliano

 
LBT_Dave
 07/25/2006 07:43AM
Junior | Registered: 03/17/2006 | Posts: 21
> an IRAF pipeline which is "database driven"

Nothing so fancy (or automatic). I was working on data from a camera with 4 arrays. Since the data are independent, all of the basic processing for each array can be done on a separate node. I've set up my pipeline so that I can do that by executing essentially the same iraf command (cut and pasted) on each node to work on the separate sets of images. But that will get cumbersome fast.

Dave Thompson, LBT Observatory
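P.S. The cut-and-paste step could at least be templated. A sketch, with the ccdproc-style command and the file naming purely as placeholders for whatever the real pipeline runs:

```python
def per_array_commands(basename, narrays, template="ccdproc {img} output={out}"):
    """Generate the nearly-identical iraf command for each detector array.
    The template and naming convention are hypothetical."""
    cmds = []
    for i in range(1, narrays + 1):
        img = f"{basename}_{i}.fits"
        cmds.append(template.format(img=img, out=f"red_{img}"))
    return cmds
```

One command per array, ready to paste (or pipe) into the session on each node.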

 
FSBoyden
 07/25/2006 07:43AM
Regular Member | Registered: 06/07/2006 | Posts: 95
Hi there. Emiliano mentioned that it would be possible to write a parallel program in C that controls the database and/or tasks. A fellow student and I discussed that, while building an "automated" photometry and/or reduction package (a package that runs large amounts of data), it might be possible to build a "control" framework within which the IRAF package runs, with the framework controlling the assignment of the threads. PCs already run with internal threads, kernels and GUIs; it should be possible to extend that so you can control where those threads go. I.e., you build the package so that it already has a thread-based structure; then you can control the number of threads and where they are sent. You could then run it on a single PC, or on a cluster with the control program sending the threads to the different nodes.

I hope I'm not running around in the dark here; my friend is the PC guru, not me. I also have not had time to read the article mentioned on the "Beowulf" cluster, so sorry if this was already covered there. I hope this can continue the discussion.

 
LBT_Dave
 07/25/2006 07:43AM
Junior | Registered: 03/17/2006 | Posts: 21
I am less familiar with this method of running iraf, but you can apparently package things together into a "shell script" type format (#!cl). Those can be submitted as batch jobs on a multi-CPU machine. Perhaps the real gurus can comment (or point us to a relevant document)?

With CPUs heading towards multiple cores rather than more GHz, we clearly need some good way to extend iraf to make better use of the hardware that is, or soon will be, available. However it is done, it will point towards multi-threaded or massively parallel architectures.

Dave Thompson, LBT Observatory

 
fitz
 07/25/2006 07:43AM
Admin | Registered: 09/30/2005 | Posts: 3988
An omnibus reply to various statements:FSBoyden writes:
[quote:10b2b8b108]
The question is: is it possible to run IRAF tasks and/or applications on a cluster, and how difficult is it to develop or modify a system to the extent that it runs on a cluster? Any other general tips and suggestions would be greatly appreciated.[/quote:10b2b8b108]In the sense I think you're asking the question (i.e. run a single task that internally splits data across the cluster and uses something like MPI), no. There is no reason, however, why each node of a cluster couldn't be running an IRAF process; the trick is to have some controlling process spawn these as part of a higher-level application like a pipeline. The NOAO Mosaic pipeline works in a similar way, but makes use of a heterogeneous ad hoc network of machines and the ubiquitous iraf networking to access data (along with custom control managers for the pipeline itself).[quote:10b2b8b108]
I also have to develop a script to run the photometric analysis. Can I use an IRAF script to develop it, or would it be better to use something more independent like Python? [/quote:10b2b8b108]Either can be done: #!cl scripts if you want host commands spawned by some process manager, or PyRAF if you want a Python interface. 'Better' is a relative term filled with personal preference.LBT_Dave replies:
[quote:10b2b8b108] In a related question, I am looking into putting together a data reduction computer for the LBT and the cluster vs. workstation issue has come up. Our optical wide-field cameras will have a total of eight 2x4.6k CCDs. Is iraf purely single-threaded, or can parts of it (e.g. mscred) make proper use of multiple CPUs (and do you need to do anything special to make this happen)? ....[/quote:10b2b8b108]IRAF [b:10b2b8b108]is[/b:10b2b8b108] purely single-threaded. The Mosaic pipeline I mention above is similar to your case and achieves parallelization by processing each of the 8 CCDs on a separate node rather than in, say, 8 threads. All you need to do is have each node access a different extension of the MEF (e.g. with iraf networking, which is faster than NFS access but still not as fast as local disk access, so staging of data is a big factor in efficiency). Note that the same trick can sometimes be applied to a single CCD by using image sections; however, many tasks interpolate images or otherwise need to deal with "edge effects", so the logical unit for processing is a complete CCD readout.[quote:10b2b8b108]
On a more esoteric level, would a complete recompilation of iraf on one of these new CPUs gain anything, or is the latest/greatest 3GHz Conroe expected to perform the same as the 3GHz Northwood I have in my 3 year old workstation? Does iraf make use of the SSE extensions and such? I've never seen iraf make 100% use of my current CPU, so I was wondering if there are optimizations that could be done to speed things up a bit. [/quote:10b2b8b108]I've experimented with optimization flags a bit, and while compiling for specific hardware is easy enough to do, it doesn't really gain much. The same can be said for altivec/sse optimization. Remember that most tasks process an image one line at a time, so you're not really optimizing a bias subtraction of, say, the full 2k x 4k CCD, but a single 2k row 4k times. The marshalling/unmarshalling of the data basically wipes out any gains.
The high level app would need to be rewritten to see significant gains and these would be one-off changes and not system wide. Compiling the VOPS (vector operator) interface for altivec/sse can be done, but only a small part of any given task is spent in these routines anyway.Emiliano reminds us:
[quote:10b2b8b108]I also asked the same questions, and seems that the answer is "no": no smp or clustering support, no simd (altivec or sse) support...
At least no "direct support inside IRAF", but you can write a C or Fortran parallel code "compatible with IRAF", that is an application you can run it from the cl using IRAF parameter files and so on... see this post:[/quote:10b2b8b108]I'd remind you that the application mentioned here was new code using MPI and designed to be run on a cluster, not the use of existing iraf tasks.
LBTDave again:
[quote:10b2b8b108]
I am less familiar with this method of running iraf, but you can apparently package things together into a "shell script" type format (#!cl). Those can be submitted as batch jobs on a multi-CPU machine. Perhaps the real gurus can comment (or point us to a relevant document)? [/quote:10b2b8b108]Frank is travelling now and would be the best person to respond about how the Mosaic pipeline handles all this and the state of the software. I do know #!cl scripts are heavily used, and it sounds like your case is very similar. For information on #!cl scripts, see the old (and only) document on them at http://iraf.noao.edu/iraf/web/new_stuff/cl_host.html[quote:10b2b8b108]
With CPUs heading towards multiple cores rather than more GHz, we clearly need some good way to extend iraf to make better use of the hardware that is or will be available in the near future. However that is done will point towards multi-threaded or massively parallel architectures.
[/quote:10b2b8b108]There are various schemes in mind, but the idea of threading in IRAF is misplaced unless you're also willing to undertake a rewrite of the applications. A threads (or MPI) interface could be put into the iraf kernel easily enough (relatively speaking), but it doesn't help system-wide because the IMIO interface that reads one section of an image wouldn't use it. However, a flavor of CCDPROC for mosaic detectors certainly could, but it would need to be written to use the new interface, where it knows when/where threads need to be managed. Multi-core CPUs are already used implicitly, i.e. start two background jobs and each runs on a different core. In the normal interactive mode your CCDPROC is running on one core and the CL/whatever else uses the other. (Not quite your point, I know, but what you propose is a massive project, and finding time to update the FAQ on this website is tough enough.) Anyway, good discussion by all, please keep it up. Cheers,
-Mike
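P.S. The per-extension trick amounts to generating one image reference per MEF extension and pairing it with a node. A sketch, with the filename and node names invented:

```python
def mef_work_items(mef_name, n_ext, nodes):
    """Pair each MEF extension (img[1]..img[n]) with a node, round-robin.
    The mosaic filename and node list are placeholders."""
    items = [f"{mef_name}[{i}]" for i in range(1, n_ext + 1)]
    return [(nodes[i % len(nodes)], ext) for i, ext in enumerate(items)]
```

Each (node, extension) pair then becomes one independent processing job, with the data staged to that node's local disk first.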

 
LBT_Dave
 07/25/2006 07:43AM
Junior | Registered: 03/17/2006 | Posts: 21
Mike, thanks for the details!

> the trick is to have some controlling process spawn these as part of a higher level application like a pipeline.

I do this manually at the moment (I am the controlling process). I need to learn how to do this in a more automatic manner. I do not know PyRAF or the use of #!cl scripting, so I have no initial preferences. Is one 'better' in terms of greater flexibility and/or capability (this is why I went with iraf over idl 12 years ago, despite what the idl aficionados might say)? And it sounds like I should better understand how mscred works.

> most tasks process an image one line at a time

Stupid question perhaps, but what about reformatting a 2k x 4k image as a 1 x 8M image for the purpose of at least some of the calculations (the ones that might benefit from this, like bias subtraction or flatfielding)? Or what about having iraf work on N rows of data at a time (where N = the number of cores available)? Would iraf work faster on a 4k x 2k image than a 2k x 4k image?

> what you propose is a massive project

I was thinking more in terms of coding just a few of the most calculation-intensive tasks to be multi-threaded (or parallel), where you would potentially see the biggest benefit. I had a CR masking routine recently run for 8 days on my workstation. I'm hoping to get to the point where a night's data can be 'reduced' in ~1 day, to where you can start to do science on it.
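Outside of iraf itself, the "N rows at a time" idea looks roughly like this Python sketch. The per-row operation is a stand-in for bias subtraction or similar, and the thread pool only illustrates the scheduling; real CPU-bound gains would need processes or C-level threads:

```python
from concurrent.futures import ThreadPoolExecutor

def row_chunks(nrows, ncores):
    """Split image rows into ncores near-equal contiguous (start, stop) chunks."""
    base, extra = divmod(nrows, ncores)
    chunks, start = [], 0
    for i in range(ncores):
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks

def process_image(image, op, ncores=4):
    """Apply 'op' to each chunk of rows in parallel and reassemble in order."""
    chunks = row_chunks(len(image), ncores)
    with ThreadPoolExecutor(max_workers=ncores) as pool:
        parts = pool.map(lambda c: [op(r) for r in image[c[0]:c[1]]], chunks)
    out = []
    for p in parts:
        out.extend(p)
    return out
```

With N = the number of cores, each core gets one contiguous band of rows and the results are stitched back together in order.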

 
fitz
 07/25/2006 07:43AM
Admin | Registered: 09/30/2005 | Posts: 3988
[quote:69a6970792]
I do this manually at the moment (I am the controlling process). I need to learn how to do this in a more automatic manner. I do not know PyRAF or the use of #!cl scripting, so I have no initial preferences. Is one 'better' in terms of greater flexibility and/or capability ...[/quote:69a6970792]Either (or a mix) could be made to work probably, I don't really know enough details about your project to recommend one over the other.[quote:69a6970792]
Stupid question perhaps, but what about reformatting a 2k x 4k image as a 1 x 8m image for the purpose of at least some of the calculations (the ones that might benefit from this, like bias subtraction or flatfielding)? [/quote:69a6970792]In theory, yes. In reality you do the bias subtraction once, so maybe the 1 sec it takes (assuming a slow 8 MFLOPS cpu) becomes 0.5 sec; so what. You don't really see much benefit without a lot more points, or outside of an algorithm like fitting a surface to the image, where you may iterate a while over the same array. The other practical problem is that the SPP code doesn't always become the kind of code that the optimizers can do anything with. Replacing some of the VOPS routines with hand-coded, hand-optimized C/Fortran is needed to realize all the gains of the compiler altivec/sse optimization.[quote:69a6970792]
Or what about having iraf work on N rows of data at a time (where N = the number of cores available)? Would iraf work faster on a 4k x 2k image than a 2k x 4k image?[/quote:69a6970792]Again, these require task-level changes and aren't always practical. Transposing the 2kx4k array takes time, as does even referencing it as though it were transposed, so there is little gain there. Overall I haven't been impressed enough by tricks like this to believe there is a quick fix to be found in compiler optimizations, and I'm not really motivated enough to consider rewriting the various tasks that could easily process an entire image in memory rather than line-by-line (remember some of these tasks were written when 128Mb RAM was a monster machine!).[quote:69a6970792]
> what you propose is a massive projectI was thinking more in terms of coding just a few of the most calculation intensive tasks to be multi-threaded (or parallel), where you would potentially see the biggest benefit.[/quote:69a6970792]Certainly one could write a new application and call it from IRAF any number of ways. Your new super-fast CR task with MPI and cluster embellishments can easily be a foreign task called from the #!cl/pyraf pipeline.-Mike

 
emiliano
 07/25/2006 07:43AM
Chatty | Registered: 12/05/2005 | Posts: 38
Dave said:[quote:41c3af3ce8]In the past I have seen no difference in processing time between a 3.4GHz Northwood P4 and a dual 3.4GHz Xeon machine of the same generation (with 3x the RAM) when working on single-chip data.[/quote:41c3af3ce8]IMHO your real bottleneck was disk I/O performance... I'd like to see IRAF performance (say the ccdproc task, or a similar one) on a 2GHz machine with a fair amount of RAM, but with a super-fast and big RAID 0 array filled with 10k RPM SCSI disks!

Well... when talking about clustering, or even SMP, for a well-designed parallel code, the number of CPUs/cores and the core frequency are not always the most important parameters. The real bottlenecks are cache misses, memory bandwidth, and the network latency for message passing to other nodes...

Since IRAF is a serial (i.e. not parallel) application, maybe the best we can do now is carefully run IRAF tasks on different nodes/CPUs with different data, to solve "explicitly parallel" problems... for example, calibrate a lot of science frames with the same calibration frames, a different science frame on each cpu/core/node, in order to avoid or minimize data dependencies...

In this picture, a good job scheduler coupled to an automatic way of selecting data to process (I called it a "database driven pipeline") could help, but you'll always have to carefully design your network and disk I/O subsystems...
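The "explicitly parallel" layout can be sketched as below: every node gets the full calibration set plus a disjoint batch of science frames, so there are no cross-node data dependencies. All filenames and node names here are invented:

```python
def explicit_parallel_plan(science, calib, nodes):
    """Give each node the full calibration set plus a disjoint slice of the
    science frames, so nodes never depend on each other's data."""
    plan = {}
    per = -(-len(science) // len(nodes))  # ceiling division
    for i, node in enumerate(nodes):
        batch = science[i * per:(i + 1) * per]
        if batch:
            plan[node] = {"calib": list(calib), "science": batch}
    return plan
```

A scheduler would then stage each node's slice (plus the shared calibration frames) to local disk before starting the reductions.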

 
LBT_Dave
 07/25/2006 07:43AM
Junior | Registered: 03/17/2006 | Posts: 21
> but with a super-fast and big RAID 0 array

My old workstation had a 4-disk raid0 (7200 RPM IDE disks) and I saw no significant difference in processing time relative to a similar machine with one disk. I never saw any swapping on machines with 1GB of RAM or more. My conclusion at the time was that, despite top reporting minimal CPU usage by iraf, things were CPU bound. That scaled pretty well through several newer/faster workstations. But at least one 10k or 15k RPM SCSI scratch disk on each node in a cluster (it does not need to be big, it only needs to hold the data you are working on) would definitely be how I would set things up.

> when 128Mb RAM was a monster machine!

Now that this is no longer the case, I have toyed with setting up a ramdisk, at least as a place to dump temporary/intermediate images (Frank or Mike may have answered an earlier question from me on this subject, pre-iraf.net days). You could conceivably move a whole set of images being worked on to the ramdisk and only dump the final product back to the hard drive. My understanding was that Linux sort of does this automatically and there would be no benefit to setting up a ramdisk.

Dave Thompson, LBT Observatory.

 
fitz
 07/25/2006 07:43AM
Admin | Registered: 09/30/2005 | Posts: 3988
[quote:b22a7d41cf]> when 128Mb RAM was a monster machine! [/quote:b22a7d41cf]....and six people working on iraf was a 'small development group'... but I digress.

Anyway, most modern kernels will keep frequently accessed files in virtual memory, so you won't really notice a ramdisk improvement if you're just banging on the same image over and over. The Linux VM model changed in the 2.4 kernel (at least I think it was before the current 2.6 series) and I haven't done any performance tests since; Torvalds thought it was an improvement, so who am I to argue.

Be careful about declaring a task to be CPU-bound without adequate profiling information. Depending on the situation, there could just be a lot of context switching going on rather than page swapping; I frequently see near-100% cpu usage when an IRAF task is the only thing busy on the machine.

-Mike

 
LBT_Dave
 07/25/2006 07:43AM
Junior | Registered: 03/17/2006 | Posts: 21
> you won't really notice a ramdisk improvement if you're just banging on the same image over and over.

This would be true for reading the same image over and over, but what about the temporary files that get created, used briefly, and then deleted (for example, a set of N sky-subtracted images that will be imcombined)? Rather than writing these out to disk, would dumping them to a ramdisk be faster? If I understand things correctly, this would also not save much time, as they too would be in the cached memory (assuming there is sufficient RAM) and not re-read from disk, so all you would save are the CPU cycles involved in writing the temporary images to disk.

Dave Thompson, LBT Observatory

 
fitz
 07/25/2006 07:43AM
Admin | Registered: 09/30/2005 | Posts: 3988
Seems like an easy enough thing to test; please let us know what you find out 8-)

There is in fact a "VM cache" interface already in the iraf kernel, but it isn't yet connected to anything. The purpose was to allow an app to lock certain files in virtual memory to guarantee they wouldn't be swapped out, effectively creating a ramdisk. Along with other events, the thing that kept this from being fully realized was that at the time the mlock() functionality was simply a stub routine in Linux. I'm certain that's changed by now, and this is one of many projects that should be revived. See iraf$unix/boot/vmcached if interested.

-Mike

 
LBT_Dave
 07/25/2006 07:43AM
Junior | Registered: 03/17/2006 | Posts: 21
[quote:f3af96b333]The purpose was to allow an app to lock certain files in virtual memory to guarantee they wouldn't be swapped out[/quote:f3af96b333] What is iraf doing when a file is being written to disk? If it is waiting for the disk I/O to finish before continuing on, then a ramdisk should help. Dave (LBTO)

 
fitz
 07/25/2006 07:43AM
Admin | Registered: 09/30/2005 | Posts: 3988
At least for image i/o, it eventually all boils down to a standard glibc write() call. A small test program that does a write() to ramdisk and to physical disk should give you an idea of what the relative performance would be. Keep in mind that things like virtual memory keep frequently-used files in memory anyway, although the method used varies by platform.

-Mike

 
BrainBug
 07/25/2006 07:43AM
Junior | Registered: 10/30/2006 | Posts: 33
[quote:564fb730b2="fitz"]
most modern kernels will keep frequently accessed files in virtual memory so you won't really notice a ramdisk improvement if you're just banging on the same image over and over.
-Mike
[/quote:564fb730b2]
In the latest kernel (2.6.22.5) there are some new interesting things: a "multicore task manager" or something like that... I don't remember exactly. I have now installed a hand-built 2.6.22.5 and can say that the kernel attempts "virtual" parallelism of tasks! As an example, the Firefox browser on the old kernel, 2.6.17.4, started in ~6 seconds, loading only one core (I have a Pentium IV 915, 2.8GHz up to 3.22GHz). On 2.6.22.5 this takes about 2 seconds, loading both cores (73%/27% ... 65%/35%)! I don't know exactly what it means, but it works fine for me! I have not tested this feature with IRAF yet... coming soon :)

 
BrainBug
 07/25/2006 07:43AM
Junior | Registered: 10/30/2006 | Posts: 33
So, on the 2.6.22.6 Linux kernel the summary load of the CPUs is 100% when running imcoadd...
And what does this mean:
http://www.astro.yale.edu/chunter/astro-support/new-hosts.html
[i:38d5e6996b]
128.36.139.20 yale128036139020.astro.yale.edu superior 206 iraf cluster/zinn
128.36.139.21 yale128036139021.astro.yale.edu michigan 206 iraf cluster/zinn
128.36.139.22 yale128036139022.astro.yale.edu huron 206 iraf cluster/zinn
128.36.139.23 yale128036139023.astro.yale.edu erie 206 iraf cluster/zinn
128.36.139.24 yale128036139024.astro.yale.edu ontario 206 iraf cluster/zinn
128.36.139.25 yale128036139025.astro.yale.edu saintclair 206 iraf cluster/zinn
[/i:38d5e6996b]

 