Building OpenJ9 for e500v2 core equipped SoC #2585
We only use AltiVec if we detect the processor at runtime and know that it supports AltiVec. The same applies to VSX and various other hardware features. We don't detect the e500, so we will assume we are running on a basic PPC chip that has no support for AltiVec, VSX, crypto instructions, transactional memory, etc. If those sorts of instructions get executed on your chip, that's a bug in the JIT that can be fixed.
Does that mean that OpenJ9 will be compiled for a very basic PPC ISA if no supported architecture is detected? Why do I ask?
This script just attempts to guess the platform you're compiling OpenJ9 on. The compiler options (gcc or xlC) used when compiling OpenJ9 will target the minimum supported architecture level. I'm not sure what that is on Power, but presumably it is a very old processor. What @ymanton is talking about is what happens at runtime. At runtime OpenJ9 will detect what processor you are running under, and the JIT compiler will generate calls to
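To illustrate the idea described in the two comments above, here is a minimal, hypothetical Java sketch (none of these names come from OpenJ9 itself): detect the CPU's features once at startup, then dispatch to the widest implementation the processor actually supports, falling back to a baseline path otherwise.

```java
// Hypothetical sketch of build-time baseline vs. runtime dispatch;
// class and method names are illustrative, not OpenJ9 APIs.
public class FeatureDispatch {
    interface Checksum { int compute(byte[] data); }

    // Pretend this was filled in by querying the CPU at VM startup.
    static final boolean HAS_VECTOR_UNIT = detectVectorUnit();

    static boolean detectVectorUnit() {
        return false; // e500: no AltiVec/VSX detected, so use the baseline
    }

    static Checksum pickImplementation() {
        if (HAS_VECTOR_UNIT) {
            return data -> computeVectorized(data); // fast path, guarded by detection
        }
        return data -> computeScalar(data);         // baseline PPC path
    }

    static int computeVectorized(byte[] data) { return computeScalar(data); }

    static int computeScalar(byte[] data) {
        int sum = 0;
        for (byte b : data) sum += b;
        return sum;
    }
}
```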
As @fjeremic said, if you want to port OpenJ9 to the e500 then most of your work will be in making changes to the build system to work in a 32-bit ppc environment. Once you have a successful build you shouldn't have much trouble running OpenJ9, except for one issue related to using 64-bit instructions -- we assume that ... If you have not already seen issue #2399, please take a look at it; it discusses problems that are very similar to yours.
No, it is not. This is a quite powerful embedded system: 2 cores, 1.5 GHz, 1 GiB RAM. I will look into the thread you pointed to. Thanks for the reply.
I've started the porting. Why? OpenJ9 claims to be much faster than other JVMs. For the sake of simplicity I've decided to use the zero variant (to avoid AltiVec issues) and build it in a native environment. I've followed https://www.eclipse.org/openj9/oj9_build.html and got as far as the configure summary:
Configuration summary:
Tools summary:
Build performance summary:
I've decided to build it with: The build errors popped up in: This file has been appended to the end of: SetupJavaCompilation(BUILD_TOOLS)
When I replace /org/xml -> /X11, the file (WrapperGenerator.java) is present. Any idea why that is? Maybe some explanation could shed some light on this? Regarding the debug infrastructure of the OpenJ9 build:
Are there any others available?
There was work needed to get higher levels working. The JIT specifically made use of a slightly modified CRTP which worked on gcc 4.8 but not on 5+ due to spec conformance. We should be able to build now with gcc 7.3 though, and will be moving to that compiler level soon. See #1684.
I don't know how the zero parts of OpenJDK are built for OpenJ9, but OpenJ9 itself doesn't have a "zero" VM, so unfortunately it will be the same as building a non-zero VM: the various assembly files and the JIT will have to be built.
I don't know if it is a bug in the OpenJDK build system or just the OpenJ9 parts, but the
OK, so this is a dead option.
I assume that you use PPC64? Have you ever tried to cross-compile OpenJ9? Is there any way to improve the debug output? I have a hard time finding the places where files (like _the.BUILD_TOOLS_batch) are generated. Also, please correct me if I'm wrong, but it seems to me that ./configure is already created in the repository (and downloaded). Maybe I need to regenerate it?
No, OpenJ9 only builds on ppc64le, not ppc64 or ppc (the IBM JDK builds on ppc64 in both 32- and 64-bit modes). I have not tried to cross-compile OpenJ9 myself; I think we may support that for ARM targets, but I'm not sure.
Unfortunately not that I know of. OpenJ9 had to make changes to the OpenJDK build system in order to integrate, but some things are still less than perfect. The only thing I can suggest, if you're building jdk8, is that you set
The version that's checked in should be in sync with
Do you maybe have the build system adjustments to cross-compile OpenJ9 on ARM? I mean, ARM is also not supported (at all), so I could reuse some of its code in the ppc port.
Unfortunately I don't; I haven't spent any time on ARM. @JamesKingdon might have some info on how to get OpenJ9 to cross-compile and/or some patches for that on ARM.
If I may ask about OMR's tools - namely tracemerge, hookgen, etc. - what is their purpose? In my native build, for example, tracemerge is used during the build: Why do we need to merge trace information during the build? I've also noticed OMR_CROSS_CONFIG="yes", which gives the tools the possibility to be cross-compiled. However, it seems to be tuned to PPC64 (-m64).
OMR and OpenJ9 use a trace engine to record diagnostic info about how the code is executing into a circular buffer on the thread. The descriptions of these trace points need to be converted into binary form and then merged into a single data file that can be used by the runtime. That's roughly
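As a rough picture of what a build-time merge step like the one described above does (purely illustrative; this is not tracemerge's actual format or code), think of per-component trace-point descriptions being compiled into binary records and concatenated into one file the runtime can index:

```java
import java.io.*;

// Illustrative-only sketch of "merge many trace descriptor files into one
// binary blob"; the real tracemerge format and tool are not shown here.
public class MergeSketch {
    public static void main(String[] args) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                 new FileOutputStream("trace.dat"))) {   // hypothetical output name
            for (String path : args) {                   // one descriptor file per component
                byte[] body = readAll(path);
                out.writeInt(body.length);               // simple length-prefixed record
                out.write(body);
            }
        }
    }

    static byte[] readAll(String path) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (FileInputStream in = new FileInputStream(path)) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) > 0) buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }
}
```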
OK, so those are components which will be used by a running JVM instance, and hence they shall be either cross-compiled or built natively.
They're only needed as part of the build and not at runtime.
I think I've misunderstood you in some way. Are they only used when OpenJ9 is compiled (so they could be compiled as x86_64)?
Sorry I wasn't clear. Most of the tools - like hookgen & tracemerge - are only used when OpenJ9 is compiled and can be compiled as x86_64. There is one that depends on the right architecture: if you support DDR (used for debugging JVM crashes), it will also need to run on the right architecture.
With the current version of the OpenJ9 build system (scripts), a successful configure gives the following output:
Build performance tip: ccache gives a tremendous speedup for C++ recompilations. The problem is that on my system: Is there any workaround to fix this? Or is the ./configure script logic just wrong, so that the version is determined in the wrong way?
@dnakamura Any thoughts on the ccache question?
I believe the openjdk code assumes that the version is < 3.1.4 if it fails to parse the version. It's been a while since I looked at the relevant code, but I think they fail to parse when they see anything other than digits or decimal points. Will look into it.
OK, no, my bad. It will handle alphabetic characters in the version string. However, to check the version number they are just matching against the regex
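A minimal Java sketch of the failure mode being described (the actual configure check is shell/m4, and the exact regex is not quoted above, so the pattern below is only an assumption): a strict digits-and-dots match rejects version strings carrying any suffix, and an unparsable version is then treated as too old.

```java
import java.util.regex.Pattern;

public class CcacheVersionCheck {
    public static void main(String[] args) {
        // Assumed stand-in for the configure-side pattern: digits and dots only.
        Pattern strict = Pattern.compile("[0-9]+(\\.[0-9]+)*");
        System.out.println(strict.matcher("3.4.1").matches());       // true: version parsed
        System.out.println(strict.matcher("3.4.1+2_abc").matches()); // false: parse fails,
        // so the build falls back to "assume version < 3.1.4" and skips ccache
    }
}
```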
If I may ask again the question regarding gcc 4.8 (which is recommended for this VM's native build): I've backported gcc 4.8.2 to my setup. Unfortunately, during the ./configure execution it wants to check that gcc is working: configure:22215: /usr/bin/powerpc-poky-linux-gnuspe-gcc -O2 -pipe -g -feliminate-unused-debug-types -Wno-error=deprecated-declarations -fno-lifetime-dse -fno-delete-null-pointer-checks -m32 -mcpu=8548 -mabi=spe -mspe -mfloat-gprs=double ... The problem is that this particular optimization option is NOT supported in 4.8.[12345]. Why is that? Is '-fno-lifetime-dse' only needed on PPC (as it is possible to compile J9 on x86_64)? From the other reply, the problem with compiling proper code only shows up on gcc 5+, so I guess that 4.9.x can be used?
Looks like that issue comes from OpenJDK code, not OpenJ9. If you look here you'll see that they're trying to disable that opt under GCC 6, so it should not be used when you build using GCC 4.8. Is your default host compiler GCC 6 or later? Perhaps the configure scripts are invoking that in some places instead of your
You should also note there is a runtime check you need to disable to work on 32-bit (see #2399).
I recently followed James' instructions and successfully cross-compiled from Ubuntu/AMD64 to the RPi, and the resulting VM works fine. Caveat: you may want to read the recent conversation on Slack about contributing back directly to the master repo, not via James' fork. I am also actively trying to cross-compile to the e500. I am approaching it differently though: I am trying to start from (pieces of) the OMR testcompiler, which looks more within reach. What I understood, however, is that its build system is quite disconnected from the other two, i.e. from both TR's and J9's. And I have a feeling that it's less actively being looked at: while the other parts cross-compile just fine, I had to dance around things to get the tc/tril/etc to cross-compile to ARM. I'll keep you posted on the progress with tc/tril on e500.
Thanks Boris for your input.
I've looked at your GitHub repositories and I couldn't find the ARM port of J9. Would it be possible to upload it somewhere?
Do you have any references/logs of those conversations?
Could you share the steps (or a repository) that were needed on ARM to get it working?
Thanks.
The scimark.fft.large benchmark took 30 minutes to "Warmup()", and a similar time to run the test.
Unfortunately, this is far too long. I would expect a penalty from the lack of FP support, but on the other hand we do emulate FP instructions on this SoC anyway.
During the compilation I caught some strange warnings. I dug into this a bit and saw that the JIT files were not compiled with the correct flags. I saw that this variable is set by: Compilation broke at only one point after that. This was at the compilation of PPCHWProfiler.cpp, because it includes the file PPCHWProfilerPrivate.hpp, which has 64-bit assembly in it, and of course this is not compatible with our HW. So the compiler stopped. The problem is at lines 250 to 270. I guessed that this code should not be called, otherwise we would have seen a SIGILL during some of the tests.
Yes, you are right, the code in the PPCHWProfiler files will only run on certain POWER processors. Ideally the inline assembly should be better guarded, or the files should not be built at all on platforms like yours, but it's a minor issue. I'll keep it in mind. Most of the code in libj9jit is dedicated to the compiler itself, which doesn't do a lot of floating point math, but some code is runtime code that will be executed by the program. Some of those runtime routines are in object files built via VM makefiles, and they get linked together with the JIT library, which explains those warnings. The files built with hardfloat will never interact with the ones built with softfloat, so you were not in danger, but it's good to fix that anyway. Is floating point performance important to you? If it is, then you really need the JIT to support SPE. With
Out of curiosity, how is your
@ymanton FP cannot be excluded. @shingarov, the configuration comes from ELDK 5.6, which is based on Yocto daisy 1.6. I took the configuration from there in order to do the native build. Later I will also try the cross-compile with a Yocto recipe. But if you explicitly say to gcc ... Either way, even with the
Yes, the floating point support is necessary. The observed performance regression is not acceptable. From my understanding, as we emulate FPU instructions in-kernel, the J9 JIT which uses them should work with fully emulated code. Performance should be better than
The above behaviour puzzles me a bit.
It uses "ev*" ASM SPE instructions (like evdmul -> the same performance as FPU but on GPRs), so this is the fastest possible code on this SoC. Even better, the JIT code uses trampoline to jump to function, which provides the log (@ymanton is there a way to check where When we disable the Maybe the problem with performance regression lies somewhere else? Maybe locking (as JVM uses several threads) as we use |
That is an implementation detail.
Notice that it is a native method; this means that its implementation will be in C. You can still find the assembly for it in gdb or by looking at the
You have to consider that, yes, you will execute a "fast" version of
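For readers unfamiliar with the pattern being discussed, here is a minimal, self-contained sketch of what "a native method" means on the Java side; the class, method, and library names below are hypothetical, not taken from the JCL:

```java
// Hypothetical example of the native-method pattern discussed above.
public class NativeMathExample {
    static {
        // Loads libnativemath.so (hypothetical library name); the C
        // implementation of compute() lives there, reached via JNI.
        System.loadLibrary("nativemath");
    }

    // No Java body: the VM binds this to a C function at runtime,
    // which is why the "real" code only shows up in gdb/disassembly.
    public static native double compute(double x);
}
```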
@shingarov - Have you managed to make any progress there (with e500 or RISC-V)?
@ymanton, during my tests I figured out something a bit strange. Now I did the following test: The strange thing I saw in dmesg was that the JVM was emitting lots of lfd and stfd instructions. Who is emitting those instructions, if not the JIT or AOT?
There are probably a few low-level routines written in assembly that are still being called. For example https://github.com/eclipse/openj9/blob/master/runtime/util/xp32/volatile.s
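A plausible reason such a helper would show up as lfd/stfd (this is my reading of the file's purpose, not something confirmed in the thread): the Java memory model requires reads and writes of volatile long and double fields to be atomic, and on 32-bit PowerPC the floating-point loads and stores are the natural single-instruction 64-bit memory accesses, so they can be used even for non-floating-point data. In Java terms, code as simple as this can trigger that path:

```java
// Sketch: a 64-bit field whose volatile accesses must be atomic even on a
// 32-bit CPU; on ppc32 the VM may implement this with FP loads/stores
// (lfd/stfd), which the kernel then has to emulate on e500.
class SharedCounter {
    private volatile long value;

    long read()           { return value; } // must not tear into two 32-bit reads
    void write(long next) { value = next; } // must not tear into two 32-bit writes
}
```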
@ymanton We do experience some slowdowns when decompressing files (.war/.jar). After enabling the above switch the speedup was not present (but that was expected, as this is a native lib).
Is there any other way to speed up decompression on J9?
No, I've never used that option. It looks like it allows you to use your system's zlib rather than the one in https://github.com/ibmruntimes/openj9-openjdk-jdk8/tree/openj9/jdk/src/share/native/java/util/zip/zlib. Even if you don't use that option you will be getting a native zlib implementation; the only difference is which one. It sounds like there is no performance to be gained by using your system's zlib over the OpenJDK one. I'm not sure how you can speed up decompression, other than reducing the number of JAR files you access, or maybe changing the build so that JAR files are built with no compression (i.e.
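If you want to experiment with uncompressed archives programmatically, here is a hedged sketch using the standard java.util.zip API (the file and entry names are made up). STORED entries must have their size and CRC set before being written:

```java
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Minimal sketch: write a STORED (uncompressed) entry into an archive.
public class StoredJar {
    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
        CRC32 crc = new CRC32();
        crc.update(data);
        try (ZipOutputStream zos =
                 new ZipOutputStream(new FileOutputStream("app-stored.jar"))) {
            ZipEntry e = new ZipEntry("hello.txt");
            e.setMethod(ZipEntry.STORED); // no deflate work at read time
            e.setSize(data.length);       // STORED requires size up front
            e.setCrc(crc.getValue());     // ...and the CRC-32
            zos.putNextEntry(e);
            zos.write(data);
            zos.closeEntry();
        }
    }
}
```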
I do need to check whether those files can be converted.
I've tried it and the results are promising: when put into the /tmp dir I can see a startup speedup of around 15%.
I did some detailed tests on the target production application (with ...
The interesting parts of the JIT log:
And maybe the most important:
The code responsible for unzipping. It seems like this is the main slowdown factor (the 'prod' application is 3x slower and the CPU usage is very high). Hence my question above about whether unzipping can be replaced with an on-system library.
It seems like the zip decompression is the bottleneck, at least from the FPU point of view. I took the webapp (*.war) and re-archived it with ... but the execution time (for this part) was reduced from 60 seconds to 2.5 seconds!!!
That's surprising. I don't see how ... Is the ... With
Output from the perf report (for the part performing the zlib decompression):
It is apparent that some FPU instructions have slipped into the zip "decompression code". However, neither the ... I've also grep'ed the
OK, thanks. I'll look into some of this when I have some time in the next couple of days and get back to you.
I took a quick look at why ... If you want, you can try the following change to your JCL to see how much performance you can get back for unzipping:

diff --git a/jdk/src/share/classes/java/util/zip/ZipCoder.java b/jdk/src/share/classes/java/util/zip/ZipCoder.java
index b920b82..cc449e6 100644
--- a/jdk/src/share/classes/java/util/zip/ZipCoder.java
+++ b/jdk/src/share/classes/java/util/zip/ZipCoder.java
@@ -45,7 +45,7 @@ final class ZipCoder {
String toString(byte[] ba, int length) {
CharsetDecoder cd = decoder().reset();
- int len = (int)(length * cd.maxCharsPerByte());
+ int len = (int)(length * maxCharsPerByte);
char[] ca = new char[len];
if (len == 0)
return new String(ca);
@@ -76,7 +76,7 @@ final class ZipCoder {
byte[] getBytes(String s) {
CharsetEncoder ce = encoder().reset();
char[] ca = s.toCharArray();
- int len = (int)(ca.length * ce.maxBytesPerChar());
+ int len = (int)(ca.length * maxBytesPerChar);
byte[] ba = new byte[len];
if (len == 0)
return ba;
@@ -127,6 +127,8 @@ final class ZipCoder {
private Charset cs;
private CharsetDecoder dec;
private CharsetEncoder enc;
+ private int maxCharsPerByte;
+ private int maxBytesPerChar;
private boolean isUTF8;
private ZipCoder utf8;
@@ -139,11 +141,15 @@ final class ZipCoder {
return new ZipCoder(charset);
}
+ private int maxCharsPerByteRU() { return (int)(dec.maxCharsPerByte() + 0.5f); }
+ private int maxBytesPerCharRU() { return (int)(enc.maxBytesPerChar() + 0.5f); }
+
private CharsetDecoder decoder() {
if (dec == null) {
dec = cs.newDecoder()
.onMalformedInput(CodingErrorAction.REPORT)
.onUnmappableCharacter(CodingErrorAction.REPORT);
+ maxCharsPerByte = maxCharsPerByteRU();
}
return dec;
}
@@ -153,6 +159,7 @@ final class ZipCoder {
enc = cs.newEncoder()
.onMalformedInput(CodingErrorAction.REPORT)
.onUnmappableCharacter(CodingErrorAction.REPORT);
+ maxBytesPerChar = maxBytesPerCharRU();
}
return enc;
}
Thanks @ymanton for your investigation. As one can see above, code which at first glance doesn't require FP support needs it after all.
Taking the above into consideration, we can get away without massive changes in the OpenJ9 code and just add 32-bit PPC support to its repository.
@lmajewski Our immediate goals at this stage are much more modest, being currently confined to just OMR. On RISC-V, we successfully JIT some simple methods such as Fibonacci. We hope to share that initial code during the coming RISC-V summit. On e500, I would like to understand how you were able to run so much of OpenJ9 so successfully. In my experiments so far, I have confined myself to the much simpler TestCompiler, and even for those trivial tests the generated code is sometimes incorrect. For example, I am trying to debug problems in the area of
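For context, the kind of "simple method" used in such JIT bring-up experiments is typically along these lines (the exact test method used is not shown above; this is just the classic form):

```java
// A classic bring-up workload: small, self-recursive, integer-only.
public class Fib {
    static int fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        System.out.println(fib(10)); // prints 55
    }
}
```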
@shingarov PPCSystemLinkage implements the ppc64le ABI only (because OMR is only supported on ppc64le); it does not handle the AIX/ppc64be ABI or the ppc32 ABI. We don't use the native ABIs for Java; we use our own, and you can find the implementations for that stuff in https://github.com/eclipse/openj9/blob/master/runtime/compiler/p/codegen/PPCPrivateLinkage.cpp and https://github.com/eclipse/openj9/blob/master/runtime/compiler/p/codegen/PPCJNILinkage.cpp
@lmajewski and @PTamis, just curious whether you're still pursuing this and/or still using OpenJ9 on e500? I'm going to spend some time figuring out what we can salvage from the various patches discussed in this issue that can be contributed back to OMR and OpenJ9.
Dear @ymanton, please find a small update from this project:
[1] - https://www.phoronix.com/scan.php?page=news_item&px=GCC-Removes-PowerPCSPE |
Further notes upon researching:
I'm not sure if there's enough interest in non-e500-using communities, though, since most of the (non-ancient) obtainable hardware for an average user is 64-bit POWER with AltiVec:
With that said, my PowerBook G4 would certainly be easier to justify keeping if I could run Java software at a reasonable clip (given the system's inherent performance). OpenJDK/zero is absolutely miserable, and IBM's old J9 JVM only runs on it up to version 1.6 or so (where it performs quite well in Debian).
@ymanton sorry for my late reply.
Dear All,
I'm trying to build OpenJ9 on a PPC SoC equipped with an e500v2 core. This core doesn't have the AltiVec IP block (instead it uses the SPE extension for floating point calculations).
The problem seems to be the OpenJ9 assumption that all supported cores support AltiVec instructions. One of the assembly-tuned files:
./openj9/runtime/compiler/p/runtime/J9PPCCRC32.spp
This is the __crc32_vpmsum [1] optimized implementation of the CRC32 calculation for 16-byte data blocks.
Is there any C implementation of this function available? Or maybe one for SPE assembler?
Please correct me if I'm wrong, but it seems to me that one would need to:
or
Personally, I would prefer the first option with C, but I'm not sure what the performance impact on OpenJ9 would be.
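For reference, a hedged sketch of the generic (non-vector) CRC-32 that the vpmsum code accelerates: the classic table-driven form, which is also how zlib's portable C crc32() works. The class name here is mine; it is written in Java purely for illustration, with java.util.zip.CRC32 used as a cross-check.

```java
import java.util.zip.CRC32;

// Table-driven CRC-32 (reflected polynomial 0xEDB88320, as used by zip/zlib).
public class Crc32Ref {
    private static final int[] TABLE = new int[256];
    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            TABLE[n] = c;
        }
    }

    static int update(int crc, byte[] buf) {
        int c = ~crc; // pre-invert, per the CRC-32 convention
        for (byte b : buf)
            c = TABLE[(c ^ b) & 0xFF] ^ (c >>> 8);
        return ~c;    // post-invert
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes();
        CRC32 ref = new CRC32();
        ref.update(data);
        // Both print the well-known check value CBF43926.
        System.out.printf("%08X%n", update(0, data));
        System.out.printf("%08X%n", ref.getValue());
    }
}
```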
Has anybody tried to run OpenJ9 on e500v2?
Thanks in advance,
Łukasz