 nonrepeatable crashes
   
duvall
 02/11/2015 12:01PM
We have multiple CPUs sharing one file system, and my personally installed IRAF is used by more than one of them. It's Linux and IRAF 2.16.1, which I installed in Sept. 2014. I am getting 'segmentation violations' from what I'm pretty sure are basic IRAF tasks (imcopy, imarith, etc.). When a run is repeated, the errors do not occur in the same place. I've tried putting 'flprc' in the loop and it does not seem to help. It seems that it may have to do with multiple jobs running at once and somehow interacting. I realize this is pretty vague, but do you have any suggestions?
Thanks.
Tom

 
fitz
 02/13/2015 04:50PM  
Two thoughts: Segfaults from tasks like IMCOPY have been reported when the 'cache' environment variable is set to "/tmp/". The variable can sometimes end up with that value during the install process. This has been fixed (but the fix is not yet released), and the workaround is simply to reset the variable to another valid path. For example,

cl> show cache
/tmp/
cl> reset cache = "home$cache/"    # trailing '/' required
 


If your cache is already something else then it may be a conflict with simultaneous access to a parameter file or the image itself. If all CPUs (are we talking multiple machines?) share a common login directory then they also share the uparm directory which can lead to conflicts with parameter files. However, this should not affect tasks like IMCOPY.

Otherwise, are the machines all processing the same images at the same time? Does the error happen only on data on an NFS mounted disk?

Lastly, if you're using image sections or MEF extensions then try doing

cl> reset use_new_imt = no
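
On the shared uparm point above: one way to rule it out is to give each machine its own parameter directory. What follows is only a sketch in Python, run outside IRAF. The helper names and the `uparm_<host>` naming are made up for illustration, and it assumes `uparm` can be redirected with `reset` in the same way as `cache` above.

```python
import os
import socket


def per_host_uparm(iraf_home, host=None):
    """Return a uparm directory path unique to one machine.

    iraf_home is the IRAF login directory (an assumption: the directory
    that holds login.cl and the default uparm/ subdirectory).
    """
    host = host or socket.gethostname()
    short = host.split(".")[0]  # drop any domain suffix
    # IRAF directory values conventionally end with a trailing '/'.
    return os.path.join(iraf_home, "uparm_" + short) + "/"


def cl_reset_line(uparm_dir):
    """Format the CL command that points uparm at the given directory."""
    return 'reset uparm = "%s"' % uparm_dir


def prepare(iraf_home):
    """Create this machine's uparm directory and return the CL command."""
    path = per_host_uparm(iraf_home)
    os.makedirs(path, exist_ok=True)
    return cl_reset_line(path)
```

Calling `prepare()` with the login directory on each machine and pasting the returned line into that machine's session before the processing loop would keep the parameter files apart. Jobs on different cores of the same machine would still share one directory; adding the process ID to the name is one possible extension.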

 
duvall
 02/18/2015 09:42AM  
Thanks for the suggestions.

My cache variable is set to /home/duvall/.iraf/cache/duvall/. I have had a couple of problems where an error occurred with the message that it could not find some file in that directory.

In general, the conflicting programs are not using the same data files.

Yes, they are multiple machines, although each one has a number of cores (32, I think).

All the data being used is on a single file system which is ifs.

One thing I was considering is to have a separate 'iraf' for each system. It seems like a somewhat radical solution, but there are only four systems, so it would not be too bad. Do you think this would fix my problem?

Thanks.
Tom

 
fitz
 02/24/2015 05:22PM  

Unless the problem is some random bug in the code itself, I think it is more likely a conflict caused by multiple IRAF sessions sharing a common resource like the uparm directory, or by simultaneous access to the same images. Having separate IRAF installations on each machine won't fix either of these.

The only 'ifs' I know of is a Windows NT extension; are you sure it isn't something else? We have seen similar problems on filesystems such as Lustre where the underlying file is not sync'd properly to disk before the next I/O operation. Are you running some sort of distributed pipeline or is this just a random collection of users?
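
To check which filesystem type a data directory actually sits on, something like the following could be used on a Linux client. It is a rough sketch, not an IRAF tool: the function names are invented here, and it assumes the usual `/proc/mounts` layout (device, mount point, type, options, ...).

```python
import os


def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Field 1 is the mount point, field 2 the filesystem type.
            # Spaces inside paths appear escaped as \040 and are left as-is.
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts):
    """Return the type of the longest mount point that contains path."""
    path = os.path.abspath(path)
    best = None
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # '>=' lets a later entry for the same mount point win,
            # since later mounts shadow earlier ones.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best[1] if best else None


# Typical use on the machine that runs the jobs:
#   with open("/proc/mounts") as f:
#       print(fs_type_for("/path/to/data", parse_mounts(f.read())))
```

A network type in the result (for example `nfs4` or `lustre`) would point toward the sync issue described above rather than toward IRAF itself.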

 
duvall
 03/16/2015 12:14PM  
Sorry about that. The file systems are OneFS.

I guess I'll need to track down better where the errors are coming from.
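
One possible way to narrow it down is to log the host, time and exit status of every step. The sketch below assumes each step of the loop can be launched as a separate operating-system process, which may not match how the jobs are actually run; `describe_exit` and `run_logged` are illustrative names, not part of IRAF.

```python
import signal
import socket
import subprocess
import sys
import time


def describe_exit(returncode):
    """Turn a subprocess return code into a short readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exit status %d" % returncode


def run_logged(argv, log=sys.stderr):
    """Run one command and write a single log line describing how it ended."""
    started = time.strftime("%Y-%m-%dT%H:%M:%S")
    result = subprocess.run(argv)
    log.write("%s host=%s cmd=%r -> %s\n"
              % (started, socket.gethostname(), argv,
                 describe_exit(result.returncode)))
    return result.returncode
```

Collecting these lines from all four machines in one place should show whether the failures cluster on a particular host, task, or time of overlap between jobs.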

 