Optimizer / IRAF dynamic memory conflict

Welcome to iraf.net Saturday, April 20 2024 @ 10:09 AM GMT

	Forum Index > Help Desk > Systems	New Topic	Post Reply
Optimizer / IRAF dynamic memory conflict

joequant

10/29/2013 10:22PM (Read 2705 times)

Junior

Status: offline

Registered: 10/25/2013
Posts: 17

Just a heads up with a problem we found and fixed on linux. It's a very subtle problem that causes random seeming bugs.

https://bugs.mageia.org/show_bug.cgi?id=11507

The basic issues is that IRAF dynamic memory works with memory that is out of bounds. The problem here is that some loop optimizers will see that there is a loop operating with out of bounds memory and then optimize it with something undefined since according to the C standard, out of bounds behavior is undefined.

This causes problems with f2c running with GCC 4.8. We fixed it by forcing the f77.sh wrapper to run with -fno-aggressive-loop-optimization

Profile

Quote

fitz

10/29/2013 10:51PM

Admin

Status: offline

Registered: 09/30/2005
Posts: 4040

I don't quite follow the bug report, the code you have listed as coming from gsacpts (presumable in $iraf/math/gsurfit) is not the same as the code we have here, or even in your git hub. Was something re-written? It isn't shown but I assume 'sum' was initialized? Likewise, the last comment seems to imply you're not using the F2C compiler any longer?

Profile

Quote

joequant

10/29/2013 10:58PM

Junior

Status: offline

Registered: 10/25/2013
Posts: 17

Quote by: fitz

The code in the bug report was intermediate code that was used to track down the problem.

The actual fix required no code changes but involved adding the compile option -fno-aggressive-loop-optimizations to the f77.sh script which is used to compile f77 on IRAF. The f77 script is a wrapper around f2c.

This was a tricky bug to find and fix because without that flag, the optimizer will generate bad code.

Profile

Quote

fitz

10/29/2013 11:03PM

Admin

Status: offline

Registered: 09/30/2005
Posts: 4040

Intermediate how? I don't see that produced by XPP or RPP.

Profile

Quote

joequant

10/29/2013 11:16PM

Junior

Status: offline

Registered: 10/25/2013
Posts: 17

Quote by: fitz

Intermediate how? I don't see that produced by XPP or RPP.

Intermediate in the sense that we were using it for trial and error. We had to do a lot of trial and error before we nailed down the problem.

There is nothing wrong code-wise with IRAF. The problem is that because IRAF uses out-of-bounds memory for dynamic allocation, the compiler can assume that seemingly valid IRAF code is undefined because it is modifying out of bounds memory and can generate bad machine code because of it.

Profile

Quote

fitz

10/29/2013 11:18PM

Admin

Status: offline

Registered: 09/30/2005
Posts: 4040

Also, note that the -fno-aggressive-loop-optimization flag is new to GCC 4.8 and will cause an error for earlier versions.

Profile

Quote

joequant

10/29/2013 11:24PM

Junior

Status: offline

Registered: 10/25/2013
Posts: 17

Quote by: fitz

Also, note that the -fno-aggressive-loop-optimization flag is new to GCC 4.8 and will cause an error for earlier versions.

Yes, and I suspect that this odd behavior we are seeing is because we are using GCC 4.8 to compile things, and GCC 4.8 optimizes things more aggressively than early versions.

Profile

Quote

fitz

10/29/2013 11:26PM

Admin

Status: offline

Registered: 09/30/2005
Posts: 4040

The problem is that because IRAF uses out-of-bounds memory for dynamic allocation,

That's not quite how it works, the Mem common is just an indexing trick but the memory is actually allocated.

.... code is undefined because it is modifying out of bounds memory and can generate bad machine code because of it.[/p]

Okay, however the assembly code produced with or without the flag is the same on my machine, and the example runs as described so I don't see either the bug or the fix.

Profile

Quote

joequant

10/30/2013 03:37AM

Junior

Status: offline

Registered: 10/25/2013
Posts: 17

Sorry if this is a duplicate....

Yes the memory is properly allocated, however the trick has some unintended side effects. What is happening is that fortran finds the base of the heap with an array of size one. F2C converts this into a static structure of one element. When gcc 4.8 does the static analysis it sees the memd pointer, sees that it is assigned to a static array of size one, and then assumes that any references to the array of greater than 1 are pointing to unallocated memory.

If there were another way of getting a pointer to the base of the heap other than with an array of length one, this would be nice since it would mean that IRAF could be ported to F90. The trick of using an array as a pointer works in f77 but its forbidden in f90, and I haven't been able to think of a workaround.

Also, this was a nasty bug to track down because the compiler has the option to choose what code to generate, and usually it will generate the right code, but the optimizer will sometimes try to optimize out the loop.

Profile

Quote

joequant

11/02/2013 12:52AM

Junior

Status: offline

Registered: 10/25/2013
Posts: 17

OK, I have some more information about the root causes and fix for the problem.

The issue is that gcc 4.8 is extremely aggressive at using SSE2 instructions to deal with double arithmetic. However for these instructions to work, the doubles have to be aligned on 128-bit memory addresses. This does not have to do with 32/64-bit.

The compiler issues were because the IRAF memory management caused a situation in which gcc assumed that the memory was 128-bit aligned but it was not in fact aligned, so the SSE2 instructions were causing failures.

The fix is to change IRAF memory management to force alignment on 128-bit addresses. The two changes are

1) Change config.h to

define SZ_MEMALIGN (2*SZ_DOUBLE) # alignment criteria for malloc

2) change f77.sh to include

sed -i -e "s/memd\[1\]/memd\[2\]/" $b.c

Without this change, the compiler may not align memd against a 128-bit boundary. By having two doubles, gcc 4.8 does the initial alignment correctly

Profile

Quote

fitz

11/02/2013 01:18AM

Admin

Status: offline

Registered: 09/30/2005
Posts: 4040

Thanks for the extra digging.

Do you have a test program to demonstrate this, and or a reference describing the SSE2 alignment requirement? IRAF memory (at least the part that the SPP code uses) is already 64-bit aligned which is also 16-bit aligned. Additionally, the address of the mem common (i.e. Memd, Memi, etc) is fixed at 0 (zero) so I don't understand how declaring the array to be 2 elements would make any difference. Lastly, are you setting '-msse2' or other GCC flags when compiling or just using defaults for GCC 4.8?

Profile

Quote

joequant

11/03/2013 03:34PM

Junior

Status: offline

Registered: 10/25/2013
Posts: 17

Hi. My post seems to have gotten eaten up..

Do you have a test program to demonstrate this, and or a reference describing the SSE2 alignment requirement?

Not a full test program, but I can show how to reproduce ths issue....

pkg/images/imcoords/src/ccfunc.x

around line 150 you see

if (! transpose) {
Memd[w+ax1-1] = tlngref
Memd[w+ax2-1] = tlatref
}

If you put a printf before and after this block of code, you will see that the first assign does not get executed, and this causes problems further downstream. If you change the code to read

if (! transpose) {
Memd[w+ax1-1] = tlngref
call printf("")
Memd[w+ax2-1] = tlatref
}

It works fine.

Looking that assembly, you see that in the first situation it's trying to do a movapd to do the assign in with instruction. However, from the docs....

http://en.wikipedia.org/wiki/MOVAPD

it seems that MOVAPD requires 128-bit alignment

IRAF memory (at least the part that the SPP code uses) is already 64-bit aligned which is also 16-bit aligned. Additionally, the address of the mem common (i.e. Memd, Memi, etc) is fixed at 0 (zero) so I don't understand how declaring the array to be 2 elements would make any difference.

The problem is that GCC-4.8 is issuing instructions that expect 128-bit alignment. To get that you have to set the memory so that it is aligns the address on 128-bit boundaries, also it is necessary to allocate two doubles so that compiler puts the inital zero address on an 128-bit boundary.

Lastly, are you setting '-msse2' or other GCC flags when compiling or just using defaults for GCC 4.8?[/p]

Defaults. It appears that GCC 4.8 assumes that sse2 is standard as I haven't figured out how to turn it off. Setting no-sse2 doesn't work, because GCC tries to generate instructions for the other SSE versions.

The other thing is that everything works with -O2 once I do the alignment. Once I try -O3 then things break again.

Profile

Quote

fitz

11/04/2013 11:41PM

Admin

Status: offline

Registered: 09/30/2005
Posts: 4040

I'm not sure why, but I still don't see the issue. I created a Mageia 4 VM (64-bit) using the latest ISO, it comes with GCC 4.8.2 and I had to get several other RPMs (e.g. F2C). Using the v2.16.1 distro we have, or the RPM that you created, I don't even see the 'movapd' instruction generated in any of the code. The test program I used was

PHP Formatted Code
procedure test ()

pointer sp, data

double  x

int     i

begin

        call smark (sp)

        call salloc (data,100, TY_DOUBLE)

        x =3.14
do i =1,10{

            Memd[data+i-1]= x

            Memd[data+i  ]= x +1
}

        call sfree (sp)
end

I compiled this to the C code and then to assembly using the commands:

PHP Formatted Code

% xc -c -F -/K test.x
% cc -c -S -I$iraf/unix/bin.linux64 -m64 test.c

and the resulting 'test.s' has no MOVAPD. Likewise, using your rpm installation and making adjustments for the use of /usr/bin/f2c and /usr/include/f2c.h shows essentially the same assembly code. I do see your change to the size of the Mem common, but no movapd. No special compiler flags, stock version of GCC, etc.

Note that with the test program (or the ccfunc.x code) I do see a warning about the aggressive loop optimization, however the GCC seems to support -faggressive-loop-optimization by default but there is no -fno-aggressive-loop-optimization flag or similar to disable it. Either way, the code seems to run as expected.

Are you using a stock system or some developer's setup?

Profile

Quote

olebole

02/12/2018 11:07AM

Regular Member

Status: offline

Registered: 05/01/2014
Posts: 103

This is also reported as issue #73 of IRAFs github repository. This is a major problem, as it may lead to wrong results, if a user compiles his own code.

To bring a specific example here (taken from the test suite), and performed with the original 2.16.1 binary distribution:

Take the following task as otest.x:

SPP Formatted Code
task otest = t_otest

procedure t_otest ()

int i

pointer p, sp

begin

    call smark(sp)

    call salloc(p, 4, TY_DOUBLE)

    do i = 1, 4

        memd[p+i-1] = i

    do i = 1, 4 {

        call printf("%d == %g\n")

            call pargi(i)

            call pargd(memd[p+i-1])

    }

    call sfree(sp)

end

All this code does is to allocate a temporary array with four integers, fill each position with its index, and then print out the integers. Compile it, declare the task in (e)cl and run it. What is expected to happen:

CL Formatted Code
cl> softools

softools> xc otest.x

   sys_runtask:

   t_otest:

otest.f:

   sysruk:

   totest:

link:

softools> task $otest = otest.e

softools> otest

1 == 1.

2 == 2.

3 == 3.

4 == 4.

However, what happens is (f.e. with Ubuntu 12.04 and up):

CL Formatted Code
cl> softools

softools> xc otest.x

otest.x:

   sys_runtask:

   t_otest:

otest.f:

   sysruk:

   totest:

link:

softools> task $otest = otest.e

softools> otest

1 == 9.345989255884568E-307

2 == 5.006895754891793E-308

3 == 8.622161201715412E-308

4 == 4.

The danger with this is that anyone using IRAF on a newer Linux platform and compiling his own tasks will produce errornous code!

Fixed in pull request #60 (and in the snapshot release 2.16.1+2018.02.04).

Profile

Website

Quote