Welcome to iraf.net Sunday, February 25 2018 @ 05:53 PM GMT


 Forum Index > Help Desk > Systems New Topic Post Reply
 Optimizer / IRAF dynamic memory conflict
   
joequant
 10/29/2013 10:22PM (Read 1210 times)  
++---
Junior

Status: offline


Registered: 10/25/2013
Posts: 17
Just a heads up with a problem we found and fixed on linux. It's a very subtle problem that causes random seeming bugs.

https://bugs.mageia.org/show_bug.cgi?id=11507

The basic issues is that IRAF dynamic memory works with memory that is out of bounds. The problem here is that some loop optimizers will see that there is a loop operating with out of bounds memory and then optimize it with something undefined since according to the C standard, out of bounds behavior is undefined.

This causes problems with f2c running with GCC 4.8. We fixed it by forcing the f77.sh wrapper to run with -fno-aggressive-loop-optimization

 
Profile Email
 Quote
fitz
 10/29/2013 10:51PM  
AAAAA
Admin

Status: offline


Registered: 09/30/2005
Posts: 3998
I don't quite follow the bug report, the code you have listed as coming from gsacpts (presumable in $iraf/math/gsurfit) is not the same as the code we have here, or even in your git hub. Was something re-written? It isn't shown but I assume 'sum' was initialized? Likewise, the last comment seems to imply you're not using the F2C compiler any longer?

 
Profile Email
 Quote
joequant
 10/29/2013 10:58PM  
++---
Junior

Status: offline


Registered: 10/25/2013
Posts: 17
Quote by: fitz

I don't quite follow the bug report, the code you have listed as coming from gsacpts (presumable in $iraf/math/gsurfit) is not the same as the code we have here, or even in your git hub. Was something re-written? It isn't shown but I assume 'sum' was initialized? Likewise, the last comment seems to imply you're not using the F2C compiler any longer?



The code in the bug report was intermediate code that was used to track down the problem.

The actual fix required no code changes but involved adding the compile option -fno-aggressive-loop-optimizations to the f77.sh script which is used to compile f77 on IRAF. The f77 script is a wrapper around f2c.

This was a tricky bug to find and fix because without that flag, the optimizer will generate bad code.

 
Profile Email
 Quote
fitz
 10/29/2013 11:03PM  
AAAAA
Admin

Status: offline


Registered: 09/30/2005
Posts: 3998
Intermediate how? I don't see that produced by XPP or RPP.

 
Profile Email
 Quote
joequant
 10/29/2013 11:16PM  
++---
Junior

Status: offline


Registered: 10/25/2013
Posts: 17
Quote by: fitz

Intermediate how? I don't see that produced by XPP or RPP.



Intermediate in the sense that we were using it for trial and error. We had to do a lot of trial and error before we nailed down the problem.

There is nothing wrong code-wise with IRAF. The problem is that because IRAF uses out-of-bounds memory for dynamic allocation, the compiler can assume that seemingly valid IRAF code is undefined because it is modifying out of bounds memory and can generate bad machine code because of it.

 
Profile Email
 Quote
fitz
 10/29/2013 11:18PM  
AAAAA
Admin

Status: offline


Registered: 09/30/2005
Posts: 3998

Also, note that the -fno-aggressive-loop-optimization flag is new to GCC 4.8 and will cause an error for earlier versions.

 
Profile Email
 Quote
joequant
 10/29/2013 11:24PM  
++---
Junior

Status: offline


Registered: 10/25/2013
Posts: 17
Quote by: fitz


Also, note that the -fno-aggressive-loop-optimization flag is new to GCC 4.8 and will cause an error for earlier versions.



Yes, and I suspect that this odd behavior we are seeing is because we are using GCC 4.8 to compile things, and GCC 4.8 optimizes things more aggressively than early versions.

 
Profile Email
 Quote
fitz
 10/29/2013 11:26PM  
AAAAA
Admin

Status: offline


Registered: 09/30/2005
Posts: 3998
The problem is that because IRAF uses out-of-bounds memory for dynamic allocation,


That's not quite how it works, the Mem common is just an indexing trick but the memory is actually allocated.

.... code is undefined because it is modifying out of bounds memory and can generate bad machine code because of it.[/p]


Okay, however the assembly code produced with or without the flag is the same on my machine, and the example runs as described so I don't see either the bug or the fix.

 
Profile Email
 Quote
joequant
 10/30/2013 03:37AM  
++---
Junior

Status: offline


Registered: 10/25/2013
Posts: 17
Sorry if this is a duplicate....

Yes the memory is properly allocated, however the trick has some unintended side effects. What is happening is that fortran finds the base of the heap with an array of size one. F2C converts this into a static structure of one element. When gcc 4.8 does the static analysis it sees the memd pointer, sees that it is assigned to a static array of size one, and then assumes that any references to the array of greater than 1 are pointing to unallocated memory.

If there were another way of getting a pointer to the base of the heap other than with an array of length one, this would be nice since it would mean that IRAF could be ported to F90. The trick of using an array as a pointer works in f77 but its forbidden in f90, and I haven't been able to think of a workaround.

Also, this was a nasty bug to track down because the compiler has the option to choose what code to generate, and usually it will generate the right code, but the optimizer will sometimes try to optimize out the loop.

 
Profile Email
 Quote
joequant
 11/02/2013 12:52AM  
++---
Junior

Status: offline


Registered: 10/25/2013
Posts: 17
OK, I have some more information about the root causes and fix for the problem.

The issue is that gcc 4.8 is extremely aggressive at using SSE2 instructions to deal with double arithmetic. However for these instructions to work, the doubles have to be aligned on 128-bit memory addresses. This does not have to do with 32/64-bit.

The compiler issues were because the IRAF memory management caused a situation in which gcc assumed that the memory was 128-bit aligned but it was not in fact aligned, so the SSE2 instructions were causing failures.

The fix is to change IRAF memory management to force alignment on 128-bit addresses. The two changes are

1) Change config.h to

define SZ_MEMALIGN (2*SZ_DOUBLE) # alignment criteria for malloc

2) change f77.sh to include

sed -i -e "s/memd\[1\]/memd\[2\]/" $b.c

Without this change, the compiler may not align memd against a 128-bit boundary. By having two doubles, gcc 4.8 does the initial alignment correctly


 
Profile Email
 Quote
fitz
 11/02/2013 01:18AM  
AAAAA
Admin

Status: offline


Registered: 09/30/2005
Posts: 3998

Thanks for the extra digging.

Do you have a test program to demonstrate this, and or a reference describing the SSE2 alignment requirement? IRAF memory (at least the part that the SPP code uses) is already 64-bit aligned which is also 16-bit aligned. Additionally, the address of the mem common (i.e. Memd, Memi, etc) is fixed at 0 (zero) so I don't understand how declaring the array to be 2 elements would make any difference. Lastly, are you setting '-msse2' or other GCC flags when compiling or just using defaults for GCC 4.8?

 
Profile Email
 Quote
joequant
 11/03/2013 03:34PM  
++---
Junior

Status: offline


Registered: 10/25/2013
Posts: 17
Hi. My post seems to have gotten eaten up..

Do you have a test program to demonstrate this, and or a reference describing the SSE2 alignment requirement?


Not a full test program, but I can show how to reproduce ths issue....

pkg/images/imcoords/src/ccfunc.x

around line 150 you see

if (! transpose) {
Memd[w+ax1-1] = tlngref
Memd[w+ax2-1] = tlatref
}

If you put a printf before and after this block of code, you will see that the first assign does not get executed, and this causes problems further downstream. If you change the code to read


if (! transpose) {
Memd[w+ax1-1] = tlngref
call printf("")
Memd[w+ax2-1] = tlatref
}

It works fine.

Looking that assembly, you see that in the first situation it's trying to do a movapd to do the assign in with instruction. However, from the docs....

http://en.wikipedia.org/wiki/MOVAPD

it seems that MOVAPD requires 128-bit alignment

IRAF memory (at least the part that the SPP code uses) is already 64-bit aligned which is also 16-bit aligned. Additionally, the address of the mem common (i.e. Memd, Memi, etc) is fixed at 0 (zero) so I don't understand how declaring the array to be 2 elements would make any difference.


The problem is that GCC-4.8 is issuing instructions that expect 128-bit alignment. To get that you have to set the memory so that it is aligns the address on 128-bit boundaries, also it is necessary to allocate two doubles so that compiler puts the inital zero address on an 128-bit boundary.

Lastly, are you setting '-msse2' or other GCC flags when compiling or just using defaults for GCC 4.8?[/p]


Defaults. It appears that GCC 4.8 assumes that sse2 is standard as I haven't figured out how to turn it off. Setting no-sse2 doesn't work, because GCC tries to generate instructions for the other SSE versions.

The other thing is that everything works with -O2 once I do the alignment. Once I try -O3 then things break again.

 
Profile Email
 Quote
fitz
 11/04/2013 11:41PM  
AAAAA
Admin

Status: offline


Registered: 09/30/2005
Posts: 3998
I'm not sure why, but I still don't see the issue. I created a Mageia 4 VM (64-bit) using the latest ISO, it comes with GCC 4.8.2 and I had to get several other RPMs (e.g. F2C). Using the v2.16.1 distro we have, or the RPM that you created, I don't even see the 'movapd' instruction generated in any of the code. The test program I used was

PHP Formatted Code
procedure test ()

pointer sp, data
double  x
int     i

begin
        call smark (sp)
        call salloc (data, 100, TY_DOUBLE)

        x = 3.14
        do i = 1, 10 {
            Memd[data+i-1] = x
            Memd[data+i  ] = x + 1
        }

        call sfree (sp)
end
 


I compiled this to the C code and then to assembly using the commands:

PHP Formatted Code

    % xc -c -F -/K test.x
    % cc -c -S -I$iraf/unix/bin.linux64 -m64 test.c
 


and the resulting 'test.s' has no MOVAPD. Likewise, using your rpm installation and making adjustments for the use of /usr/bin/f2c and /usr/include/f2c.h shows essentially the same assembly code. I do see your change to the size of the Mem common, but no movapd. No special compiler flags, stock version of GCC, etc.

Note that with the test program (or the ccfunc.x code) I do see a warning about the aggressive loop optimization, however the GCC seems to support -faggressive-loop-optimization by default but there is no -fno-aggressive-loop-optimization flag or similar to disable it. Either way, the code seems to run as expected.

Are you using a stock system or some developer's setup?

 
Profile Email
 Quote
olebole
 02/12/2018 11:07AM  
++++-
Regular Member

Status: offline


Registered: 05/01/2014
Posts: 106
This is also reported as issue #73 of IRAFs github repository. This is a major problem, as it may lead to wrong results, if a user compiles his own code. Angry

To bring a specific example here (taken from the test suite), and performed with the original 2.16.1 binary distribution:

Take the following task as otest.x:
SPP Formatted Code
task otest = t_otest
procedure t_otest ()
int i
pointer p, sp
begin
    call smark(sp)
    call salloc(p, 4, TY_DOUBLE)
    do i = 1, 4
        memd[p+i-1] = i

    do i = 1, 4 {
        call printf("%d == %g\n")
            call pargi(i)
            call pargd(memd[p+i-1])
    }

    call sfree(sp)
end

All this code does is to allocate a temporary array with four integers, fill each position with its index, and then print out the integers. Compile it, declare the task in (e)cl and run it. What is expected to happen:
CL Formatted Code
cl> softools
softools> xc otest.x
   sys_runtask:
   t_otest:
otest.f:
   sysruk:
   totest:
link:
softools> task $otest = otest.e
softools> otest
1 == 1.
2 == 2.
3 == 3.
4 == 4.


However, what happens is (f.e. with Ubuntu 12.04 and up):

CL Formatted Code
cl> softools
softools> xc otest.x
otest.x:
   sys_runtask:
   t_otest:
otest.f:
   sysruk:
   totest:
link:
softools> task $otest = otest.e
softools> otest
1 == 9.345989255884568E-307
2 == 5.006895754891793E-308
3 == 8.622161201715412E-308
4 == 4.

The danger with this is that anyone using IRAF on a newer Linux platform and compiling his own tasks will produce errornous code!

Fixed in pull request #60 (and in the snapshot release 2.16.1+2018.02.04).

 
Profile Email Website
 Quote
   
Content generated in: 0.61 seconds
New Topic Post Reply

Normal Topic Normal Topic
Sticky Topic Sticky Topic
Locked Topic Locked Topic
New Post New Post
Sticky Topic W/ New Post Sticky Topic W/ New Post
Locked Topic W/ New Post Locked Topic W/ New Post
View Anonymous Posts 
Anonymous users can post 
Filtered HTML Allowed 
Censored Content 
dog allergies remedies cialis 20 mg chilblain remedies


Privacy Policy
Terms of Use

User Functions

Login