Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

freebsd 386 - test_int_precision tests fail #99

Open
nenadv opened this issue Apr 22, 2017 · 7 comments
Open

freebsd 386 - test_int_precision tests fail #99

nenadv opened this issue Apr 22, 2017 · 7 comments

Comments

@nenadv
Copy link

nenadv commented Apr 22, 2017

Clang UPC 3.9 exhibits a check failure of some values in the array.

test_int_precision.c
78 upc_forall (i = 0; i < SIZE; i++; &clone2[i]) {
79 DTYPE volatile d = (DTYPE)(1.123 * i); /* see BUPC bug 168 */
80 if (clone2[i] != d)
81 err[MYTHREAD] += 1;
82 }

clone2[] array is 3200 in size, and checking fail for indexes 1000, 2000, and 3000. Somehow the value of 'd' is off by 1 (e.g. 1122 instead of 1123). Value inside shared clone2 is correct.

I created a smaller test that fails too, but only for the upc_forall loop. The for loop works as expected.

    for (i = MYTHREAD; i < SIZE; i += THREADS) {
            int volatile d = (int)(1.123 * i);
            if (clone2[i] != d)
                    printf ("for clone2 [%d] %d: %d (%d)\n", MYTHREAD, i, clone2[i], d);
    }
    upc_forall (i = 0; i < SIZE; i++; &clone2[i]) {
            int volatile d = (int)(1.123 * i);
            if (clone2[i] != d)
                    printf ("forall clone2 [%d] %d: %d (%d)\n", MYTHREAD, i, clone2[i], d);
    }

It seems that this happens only on freebsd, 32 bit version. And only if compiled with optimization.

@nenadv
Copy link
Author

nenadv commented Apr 24, 2017

I looked at the IR code and asm. I am attaching my test program, as well as IR for the optimized version.

IR code has three sets of code (I placed '-- [123]' at the beginning of each section) where multiply/conversion is done. Somehow, upc_forall creates two sets of almost identical code but only one of them is being executed (at least this is the case for running with only one thread, confirmed by debugging). Non optimized version (that works) creates only one set of code for upc_forall. You can simple confirm this by looking for printf statements, there are three of them.

Asm code is different for calculating fmul/convert then a simple loop.

bug-freebsd.upc.txt
bad.ll.txt

@swatanabe
Copy link

swatanabe commented Apr 24, 2017 via email

@nenadv
Copy link
Author

nenadv commented Apr 25, 2017

I enclosed an asm code that is relevant, with some comments. Summary is like this:

  • for () loop calculates ref value by converting index into FP, multiply with 1.123, and converting back into integer
  • upc_forloop calculates ref value by multiplying FP value of index and 1.123, and adding 1.0 to the FP value of index in each iteration.

I tried to duplicate with the C code but was not able to create exact sequence.

compare-result.txt

@gary-funck
Copy link

gary-funck commented Apr 25, 2017 via email

@nenadv
Copy link
Author

nenadv commented Apr 25, 2017

I checked the code on Fedora Core 23. It uses SSE (xmm registers), while freebsd uses FP x87 (st registers). Also, the code on FC23 looks like the regular 'for loop' on freebsd. Even though, both of the machines are VMs on the same host, freebsd is probably using different machine target.

Is there a way to ask clang for the default machine attribute?

@swatanabe
Copy link

swatanabe commented Apr 28, 2017 via email

@nenadv
Copy link
Author

nenadv commented Jul 7, 2017

I think the problem is caused by the scheduler where instructions are moved into the region where lower computation precision is used.

Here is the code for non optimized (good) version (which is similar to the 'for' loop).

GOOD (-O0)

    fildl  0xcc(%esp)
    fldl   0x8053f20
[ multiply with 1.123 ]
    fmulp  %st,%st(1)
[ change FP control word]
    fnstcw 0xb6(%esp)
    mov    0xb6(%esp),%dx
    movw   $0xc7f,0xb6(%esp)
    fldcw  0xb6(%esp)
[ new FP control word - 0xc7f
  precision bits (8 and 9) - 00 - 24 bits, single precission
  rounding bits (10 and 11) - 10 - toward 0 ]
    mov    %dx,0xb6(%esp)
[ convert to integer ]
    fistpl 0xd0(%esp)
[ restore FP control word]
    fldcw  0xb6(%esp)


BAD (-O3)
    fstp   %st(0)
    mov    0x80538e4,%edx
    fnstcw 0xc(%esp)
    movzwl 0xc(%esp),%eax
    movw   $0xc7f,0xc(%esp)
    fldcw  0xc(%esp)
[ new FP control word - 24 bits, single precision ]
    mov    %ax,0xc(%esp)
    fstl   0x6c(%esp)
[ multiplay with 1.123 ]
    fmull  0x8051f80
    fistpl 0x58(%esp)
[ restore FP control word ]
    fldcw  0xc(%esp)
'''

Original FP control word is 0x127F which is double precision and in the good case all multiplications are done with double precision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants