joequant |
10/29/2013 10:22PM (Read 2710 times)
|
|
|
Status: offline
Registered: 10/25/2013
Posts: 17
|
Just a heads up with a problem we found and fixed on linux. It's a very subtle problem that causes random seeming bugs.
https://bugs.mageia.org/show_bug.cgi?id=11507
The basic issues is that IRAF dynamic memory works with memory that is out of bounds. The problem here is that some loop optimizers will see that there is a loop operating with out of bounds memory and then optimize it with something undefined since according to the C standard, out of bounds behavior is undefined.
This causes problems with f2c running with GCC 4.8. We fixed it by forcing the f77.sh wrapper to run with -fno-aggressive-loop-optimization
|
|
|
|
fitz |
10/29/2013 10:51PM
|
|
|
Status: offline
Registered: 09/30/2005
Posts: 4040
|
I don't quite follow the bug report, the code you have listed as coming from gsacpts (presumable in $iraf/math/gsurfit) is not the same as the code we have here, or even in your git hub. Was something re-written? It isn't shown but I assume 'sum' was initialized? Likewise, the last comment seems to imply you're not using the F2C compiler any longer?
|
|
|
|
joequant |
10/29/2013 10:58PM
|
|
|
Status: offline
Registered: 10/25/2013
Posts: 17
|
Quote by: fitzI don't quite follow the bug report, the code you have listed as coming from gsacpts (presumable in $iraf/math/gsurfit) is not the same as the code we have here, or even in your git hub. Was something re-written? It isn't shown but I assume 'sum' was initialized? Likewise, the last comment seems to imply you're not using the F2C compiler any longer?
The code in the bug report was intermediate code that was used to track down the problem.
The actual fix required no code changes but involved adding the compile option -fno-aggressive-loop-optimizations to the f77.sh script which is used to compile f77 on IRAF. The f77 script is a wrapper around f2c.
This was a tricky bug to find and fix because without that flag, the optimizer will generate bad code.
|
|
|
|
fitz |
10/29/2013 11:03PM
|
|
|
Status: offline
Registered: 09/30/2005
Posts: 4040
|
Intermediate how? I don't see that produced by XPP or RPP.
|
|
|
|
joequant |
10/29/2013 11:16PM
|
|
|
Status: offline
Registered: 10/25/2013
Posts: 17
|
Quote by: fitzIntermediate how? I don't see that produced by XPP or RPP.
Intermediate in the sense that we were using it for trial and error. We had to do a lot of trial and error before we nailed down the problem.
There is nothing wrong code-wise with IRAF. The problem is that because IRAF uses out-of-bounds memory for dynamic allocation, the compiler can assume that seemingly valid IRAF code is undefined because it is modifying out of bounds memory and can generate bad machine code because of it.
|
|
|
|
fitz |
10/29/2013 11:18PM
|
|
|
Status: offline
Registered: 09/30/2005
Posts: 4040
|
Also, note that the -fno-aggressive-loop-optimization flag is new to GCC 4.8 and will cause an error for earlier versions.
|
|
|
|
joequant |
10/29/2013 11:24PM
|
|
|
Status: offline
Registered: 10/25/2013
Posts: 17
|
Quote by: fitz
Also, note that the -fno-aggressive-loop-optimization flag is new to GCC 4.8 and will cause an error for earlier versions.
Yes, and I suspect that this odd behavior we are seeing is because we are using GCC 4.8 to compile things, and GCC 4.8 optimizes things more aggressively than early versions.
|
|
|
|
fitz |
10/29/2013 11:26PM
|
|
|
Status: offline
Registered: 09/30/2005
Posts: 4040
|
The problem is that because IRAF uses out-of-bounds memory for dynamic allocation,
That's not quite how it works, the Mem common is just an indexing trick but the memory is actually allocated.
.... code is undefined because it is modifying out of bounds memory and can generate bad machine code because of it.[/p]
Okay, however the assembly code produced with or without the flag is the same on my machine, and the example runs as described so I don't see either the bug or the fix.
|
|
|
|
joequant |
10/30/2013 03:37AM
|
|
|
Status: offline
Registered: 10/25/2013
Posts: 17
|
Sorry if this is a duplicate....
Yes the memory is properly allocated, however the trick has some unintended side effects. What is happening is that fortran finds the base of the heap with an array of size one. F2C converts this into a static structure of one element. When gcc 4.8 does the static analysis it sees the memd pointer, sees that it is assigned to a static array of size one, and then assumes that any references to the array of greater than 1 are pointing to unallocated memory.
If there were another way of getting a pointer to the base of the heap other than with an array of length one, this would be nice since it would mean that IRAF could be ported to F90. The trick of using an array as a pointer works in f77 but its forbidden in f90, and I haven't been able to think of a workaround.
Also, this was a nasty bug to track down because the compiler has the option to choose what code to generate, and usually it will generate the right code, but the optimizer will sometimes try to optimize out the loop.
|
|
|
|
joequant |
11/02/2013 12:52AM
|
|
|
Status: offline
Registered: 10/25/2013
Posts: 17
|
OK, I have some more information about the root causes and fix for the problem.
The issue is that gcc 4.8 is extremely aggressive at using SSE2 instructions to deal with double arithmetic. However for these instructions to work, the doubles have to be aligned on 128-bit memory addresses. This does not have to do with 32/64-bit.
The compiler issues were because the IRAF memory management caused a situation in which gcc assumed that the memory was 128-bit aligned but it was not in fact aligned, so the SSE2 instructions were causing failures.
The fix is to change IRAF memory management to force alignment on 128-bit addresses. The two changes are
1) Change config.h to
define SZ_MEMALIGN (2*SZ_DOUBLE) # alignment criteria for malloc
2) change f77.sh to include
sed -i -e "s/memd\[1\]/memd\[2\]/" $b.c
Without this change, the compiler may not align memd against a 128-bit boundary. By having two doubles, gcc 4.8 does the initial alignment correctly
|
|
|
|
fitz |
11/02/2013 01:18AM
|
|
|
Status: offline
Registered: 09/30/2005
Posts: 4040
|
Thanks for the extra digging.
Do you have a test program to demonstrate this, and or a reference describing the SSE2 alignment requirement? IRAF memory (at least the part that the SPP code uses) is already 64-bit aligned which is also 16-bit aligned. Additionally, the address of the mem common (i.e. Memd, Memi, etc) is fixed at 0 (zero) so I don't understand how declaring the array to be 2 elements would make any difference. Lastly, are you setting '-msse2' or other GCC flags when compiling or just using defaults for GCC 4.8?
|
|
|
|
joequant |
11/03/2013 03:34PM
|
|
|
Status: offline
Registered: 10/25/2013
Posts: 17
|
Hi. My post seems to have gotten eaten up..
Do you have a test program to demonstrate this, and or a reference describing the SSE2 alignment requirement?
Not a full test program, but I can show how to reproduce ths issue....
pkg/images/imcoords/src/ccfunc.x
around line 150 you see
if (! transpose) {
Memd[w+ax1-1] = tlngref
Memd[w+ax2-1] = tlatref
}
If you put a printf before and after this block of code, you will see that the first assign does not get executed, and this causes problems further downstream. If you change the code to read
if (! transpose) {
Memd[w+ax1-1] = tlngref
call printf("")
Memd[w+ax2-1] = tlatref
}
It works fine.
Looking that assembly, you see that in the first situation it's trying to do a movapd to do the assign in with instruction. However, from the docs....
http://en.wikipedia.org/wiki/MOVAPD
it seems that MOVAPD requires 128-bit alignment
IRAF memory (at least the part that the SPP code uses) is already 64-bit aligned which is also 16-bit aligned. Additionally, the address of the mem common (i.e. Memd, Memi, etc) is fixed at 0 (zero) so I don't understand how declaring the array to be 2 elements would make any difference.
The problem is that GCC-4.8 is issuing instructions that expect 128-bit alignment. To get that you have to set the memory so that it is aligns the address on 128-bit boundaries, also it is necessary to allocate two doubles so that compiler puts the inital zero address on an 128-bit boundary.
Lastly, are you setting '-msse2' or other GCC flags when compiling or just using defaults for GCC 4.8?[/p]
Defaults. It appears that GCC 4.8 assumes that sse2 is standard as I haven't figured out how to turn it off. Setting no-sse2 doesn't work, because GCC tries to generate instructions for the other SSE versions.
The other thing is that everything works with -O2 once I do the alignment. Once I try -O3 then things break again.
|
|
|
|
fitz |
11/04/2013 11:41PM
|
|
|
Status: offline
Registered: 09/30/2005
Posts: 4040
|
I'm not sure why, but I still don't see the issue. I created a Mageia 4 VM (64-bit) using the latest ISO, it comes with GCC 4.8.2 and I had to get several other RPMs (e.g. F2C). Using the v2.16.1 distro we have, or the RPM that you created, I don't even see the 'movapd' instruction generated in any of the code. The test program I used was
PHP Formatted Code procedure test ()
pointer sp , data
double x
int i
begin
call smark (sp )
call salloc (data , 100, TY_DOUBLE )
x = 3.14
do i = 1, 10 {
Memd [data +i -1] = x
Memd [data +i ] = x + 1
}
call sfree (sp )
end
I compiled this to the C code and then to assembly using the commands:
PHP Formatted Code
% xc -c -F -/K test .x
% cc -c -S -I $iraf/unix /bin .linux64 -m64 test .c
and the resulting 'test.s' has no MOVAPD. Likewise, using your rpm installation and making adjustments for the use of /usr/bin/f2c and /usr/include/f2c.h shows essentially the same assembly code. I do see your change to the size of the Mem common, but no movapd. No special compiler flags, stock version of GCC, etc.
Note that with the test program (or the ccfunc.x code) I do see a warning about the aggressive loop optimization, however the GCC seems to support -faggressive-loop-optimization by default but there is no -fno-aggressive-loop-optimization flag or similar to disable it. Either way, the code seems to run as expected.
Are you using a stock system or some developer's setup?
|
|
|
|
olebole |
02/12/2018 11:07AM
|
|
|
Status: offline
Registered: 05/01/2014
Posts: 103
|
This is also reported as issue #73 of IRAFs github repository. This is a major problem, as it may lead to wrong results, if a user compiles his own code.
To bring a specific example here (taken from the test suite), and performed with the original 2.16.1 binary distribution:
Take the following task as otest.x:
SPP Formatted Code task otest = t_otest
procedure t_otest ()
int i
pointer p, sp
begin
call smark(sp)
call salloc(p, 4, TY_DOUBLE)
do i = 1, 4
memd[p+i-1] = i
do i = 1, 4 {
call printf("%d == %g\n")
call pargi(i)
call pargd(memd[p+i-1])
}
call sfree(sp)
end
All this code does is to allocate a temporary array with four integers, fill each position with its index, and then print out the integers. Compile it, declare the task in (e)cl and run it. What is expected to happen:
CL Formatted Code cl> softools
softools> xc otest.x
sys_runtask:
t_otest:
otest.f:
sysruk:
totest:
link:
softools> task $otest = otest.e
softools> otest
1 == 1.
2 == 2.
3 == 3.
4 == 4.
However, what happens is (f.e. with Ubuntu 12.04 and up):
CL Formatted Code cl> softools
softools> xc otest.x
otest.x:
sys_runtask:
t_otest:
otest.f:
sysruk:
totest:
link:
softools> task $otest = otest.e
softools> otest
1 == 9.345989255884568E-307
2 == 5.006895754891793E-308
3 == 8.622161201715412E-308
4 == 4.
The danger with this is that anyone using IRAF on a newer Linux platform and compiling his own tasks will produce errornous code!
Fixed in pull request #60 (and in the snapshot release 2.16.1+2018.02.04).
|
|
|
|