Welcome to iraf.net Saturday, April 20 2024 @ 05:40 AM GMT
duvall | 02/11/2015 12:01PM
We have multiple machines sharing one file system, and my personally installed IRAF (Linux, IRAF 2.16.1, installed September 2014) is used from more than one of them. I have a problem in which I get 'segmentation violations' from (I'm pretty sure) basic IRAF tasks (imcopy, imarith, etc.). When the run is repeated, the errors do not occur in the same place. I've tried putting 'flprc' in the loop and it does not seem to help. It seems that it may have to do with multiple jobs running at once and somehow interacting. I realize this is pretty vague, but do you have any suggestions?
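A minimal sketch of the kind of loop in question (the file names and the bias subtraction here are hypothetical, just to show where 'flprc' sits):

    cl> files *.fits > inlist
    cl> list = "inlist"
    cl> while (fscan (list, s1) != EOF) {
    >>>     imarith (s1, "-", "bias.fits", "b" // s1)
    >>>     flprc               # flush the process cache each pass
    >>> }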
Thanks.
Tom
fitz | 02/13/2015 04:50PM
Two thoughts: Segfaults from tasks like IMCOPY have been reported when the 'cache' environment variable is set to "/tmp/". This has been fixed (but not yet released); the bad value can sometimes be set during the install process, but the workaround is simply to reset the variable to another valid path. For example,

    cl> show cache
    /tmp/
    cl> reset cache = "home$cache/"    # trailing '/' required
If your cache is already something else then it may be a conflict with simultaneous access to a parameter file or the image itself. If all CPUs (are we talking multiple machines?) share a common login directory then they also share the uparm directory which can lead to conflicts with parameter files. However, this should not affect tasks like IMCOPY.
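If the shared uparm directory turns out to be the culprit, one way to rule out parameter-file conflicts (a sketch; the directory name is hypothetical and must be created first) is to point each machine at its own private uparm in login.cl:

    # login.cl on machine "node1": use a per-host uparm directory so
    # simultaneous sessions don't clobber each other's parameter files
    set uparm = "home$uparm_node1/"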
Otherwise, are the machines all processing the same images at the same time? Does the error happen only on data on an NFS mounted disk?
Lastly, if you're using image sections or MEF extensions, then try doing

    cl> reset use_new_imt = no
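This setting only matters for tasks given section or extension syntax; a hypothetical example of the kind of access it affects:

    cl> reset use_new_imt = no
    cl> flpr
    cl> imcopy mef.fits[1][100:200,*] out.fits    # extension + image section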
duvall | 02/18/2015 09:42AM
Thanks for the suggestions.
My cache variable is set to /home/duvall/.iraf/cache/duvall/. I have had a couple of problems where an error occurred saying it could not find some file in that directory.
In general, the conflicting programs are not using the same data files.
Yes, they are multiple machines, although each one has a number of cores (32, I think).
All the data being used is on a single file system which is ifs.
One thing I was considering is having a separate IRAF installation for each system. It seems like a somewhat radical solution, but there are only four systems, so it would not be too bad. Do you think this would fix my problem?
Thanks.
Tom
fitz | 02/24/2015 05:22PM
Unless the problem is some random bug in the code itself, I think it is more likely a conflict between multiple IRAF sessions sharing a common resource like the uparm directory, or simultaneous access to the same images. Having separate IRAF installations on each machine won't fix either of these.
The only 'ifs' I know of is a Windows NT extension; are you sure it isn't something else? We have seen similar problems on filesystems such as Lustre, where the underlying file is not synced properly to disk before the next I/O operation. Are you running some sort of distributed pipeline, or is this just a random collection of users?
duvall | 03/16/2015 12:14PM
Sorry about that. The file systems are OneFS.
I guess I'll need to track down better where the errors are coming from.