Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace DirectBuffer with Agrona DirectBuffer #68

Merged
merged 1 commit into from
May 31, 2016

Conversation

benalexau
Copy link

@benalexau benalexau commented May 19, 2016

This PR adopts Agrona's DirectBuffer, rather than an early fork of DirectBuffer. Adopting Agrona yields several benefits:

  • LMDBJNI buffers can be directly used in a zero copy fashion with both Simple Binary Encoding and Aeron
  • Java 9 migration will be simplified via delegation to future Agrona implementations
  • Agrona defines DirectBuffer (read-only) and MutableDirectBuffer (read/write) interfaces, which have the normal benefits of separating interfaces from implementations, plus exposing the most minimal API for each use case

This PR includes:

  • The migration to Agrona DirectBuffer
  • Externalisation of the default key length to the Env class, including tests of that default
  • Introduction of a Buffers utility class to centralise MutableDirectBuffer instantiation
  • A new test case that demonstrates and validates Simple Binary Encoding zero copy buffer use

@coveralls
Copy link

Coverage Status

Coverage increased (+1.2%) to 87.542% when pulling dbd2fc2 on benalexau:agrona into 9a29e34 on deephacks:master.

@benalexau
Copy link
Author

benalexau commented May 19, 2016

Travis has passed the Java 8 test, but failed the Java 7 tests due the third-party SBE tool being compiled with Java 8.

As Java 7 was end of life in April 2015 I would respectfully suggest it is reasonable to discontinue the JDK 7 CI tests.

@krisskross
Copy link
Member

I think replacing DirectBuffer with Agrona buffers is a great idea. Better to have good integration with a library like Agrona than nothing at all. There was discussions earlier about using standard ByteBuffer but this idea haven't been pursued AFAIK.

I do have some concerns on dropping Java 7 support, mostly for sake of Android. However, Android N seems to have limited support for Java 8 and there seems to be some support for Android 6.0 (API level 23) as well. I don't have any Android experience so I don't know how this works in practice.

On the other hand, zero copy with DirectBuffer shouldn't be used on Android since we haven't figured out how to do it without manual JNI hacking yet.

If anyone use lmdbjni for Android I would like know if this is concern or not. Also if there are any users on Java 7 still.

@krisskross
Copy link
Member

Some minor comments on the code.

  • The Buffers class seems unnecessary?

There are not a lot of places where MutableDirectBuffer is instantiated in production code. Maybe add a utility method creating buffers only for tests? The Buffers.slice method could be added to BufferCursor instead?

  • The indentation of pom.xml files should be 2 spaces.

@benalexau benalexau force-pushed the agrona branch 2 times, most recently from ab090fa to eac3d5f Compare May 20, 2016 00:00
@coveralls
Copy link

Coverage Status

Coverage increased (+1.2%) to 87.542% when pulling ab090fa on benalexau:agrona into 9a29e34 on deephacks:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+1.2%) to 87.542% when pulling e980a67 on benalexau:agrona into 9a29e34 on deephacks:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+1.2%) to 87.542% when pulling eac3d5f on benalexau:agrona into 9a29e34 on deephacks:master.

@benalexau
Copy link
Author

benalexau commented May 20, 2016

Further to my earlier comment on CI failing due to a Java 8 dependency in the SBE Tool, upon further investigation the Java 8 requirement also extends to Agrona itself. I had hoped that it was isolated to the SBE Tool, therefore enabling a tactical workaround such as manually generating the flyweights. However Java 8 is also a requirement of Agrona:

$ javap -verbose -cp ~/.m2/repository/org/agrona/Agrona/0.4.13/Agrona-0.4.13.jar org.agrona.DirectBuffer | grep version
  minor version: 0
  major version: 52

$ javap -verbose -cp ~/.m2/repository/uk/co/real-logic/sbe-all/1.4.0-RC4/sbe-all-1.4.0-RC4.jar uk.co.real_logic.sbe.SbeTool | grep version
  minor version: 0
  major version: 52

In relation to the Buffers class, I would submit that it's worth keeping. Today Agrona's DirectBuffer and MutableDirectBuffer interfaces are almost universally satisfied via its UnsafeBuffer implementation. In the future this may no longer be the case, particularly as Java 9 emerges and/or it becomes necessary to return a different MutableDirectBuffer based on JVM version and/or platform (eg Android).

As the Buffers class is a package protected final class, it is unavailable for end users and a subsequent refactor or removal of the class will not break the public contract.

The main benefit of Buffers is it currently encapsulates every UnsafeBuffer reference in the PR:

$ find . -name \*.java -exec grep -H UnsafeBuffer {} \;
./lmdbjni/src/main/java/org/fusesource/lmdbjni/Buffers.java:import org.agrona.concurrent.UnsafeBuffer;
./lmdbjni/src/main/java/org/fusesource/lmdbjni/Buffers.java:      return new UnsafeBuffer(new byte[0]);
./lmdbjni/src/main/java/org/fusesource/lmdbjni/Buffers.java:    return new UnsafeBuffer(ByteBuffer.allocateDirect(capacity));
./lmdbjni/src/main/java/org/fusesource/lmdbjni/Buffers.java:    return new UnsafeBuffer(source.addressOffset(), length);

It is called from some 48 call sites:

$ find . -name \*.java -exec grep -H Buffers.buffer {} \; | wc -l
48

As such we now have a single location if conditional logic is required to select an alternate implementation in the future. Of more minor benefit, we also have reduced the length of lines that would have needed to manually instantiate UnsafeBuffer themselves. We have also guaranteed consistency of buffer types, specifically that a to-be-used-for-wrapping buffer is initially backed by an inexpensive byte[0], and key buffers are of the Env.MAX_KEY_SIZE test-verified length. I believe these collectively enhance the readability of the code.

Neverthess you are the maintainer, so I can switch back to creating UnsafeBuffers at the call sites if you wish. I just wanted to advocate why I did it in the first place, as there are dozens of call sites.

Regarding the POM indentation, I have fixed this up and squashed it into the original commit for further review.

@krisskross
Copy link
Member

Sorry for the late reply. I have been on vacation for a week. I agree that the Buffers class doesn't matter much since its package protected. I'm ready to merge the PR since there hasn't been any objections so far. This also means moving lmdbjni to Java 8. We might get objections later but I feel we can handle with separate branching.

@krisskross
Copy link
Member

There was one thing. Can we shade this dependency or is it a bad thing? I know this technique will be less relevant as Java 9 emerge but I'm curious about the general opinion.

@benalexau
Copy link
Author

Agrona has had 20 releases in 16 months (see here and here). Any application that uses Agrona (eg Aeron, SBE) will probably want to use the latest Agrona. If LMDBJNI shades Agrona, this will complicate the classpath ordering of such applications.

The two benefits I can see for shading are:

  • Agrona 0.5.1 is 224K where LMDBIJNI 0.4.7-SNAPSHOT is 79K. Thus selectively shading only the required types would reduce the footprint of applications not needing the full Agrona. However these footprint-sensitive applications could remove all of their unnecessary class file bytes (including unwanted types, fields, methods etc) using ProGuard or similar. It just seems a footprint-sensitive use case is going to be using a comprehensive approach like ProGuard anyway.
  • It avoids non-Maven users from needing to add Agrona as a dependency. Maven users will simply receive Agrona as a transitive dependency of LMDBJNI. I haven't checked but I'm guessing non-Maven build systems already have this general problem (ie needing to ingest Maven dependency metadata or their users manually adding the dependency).

Personally I prefer to keep things simple and idiomatic by not shading the dependency.

@phraktle
Copy link

I maintain that using a ByteBuffer is the way to go (see #42), instead of relying on unsafe operations. It would keep the library light-weight, more portable and future-proof. Would also work with existing serialization code using ByteBuffer APIs, and would not stop anyone from using their favorite buffer wrappers, such as Agrona and many others out there. The opposite is not true. The only difficulty seems to be that HawtJNI does not generate native code that returns ByteBuffers.

@benalexau
Copy link
Author

@phraktle I agree in principle (I'd still like to benchmark the result to ensure there is no regression though), but do you know if someone is working on making HawtJNI support this?

@phraktle
Copy link

Opened a ticket here: fusesource/hawtjni#25. In the meantime, perhaps a reasonable hack would be to create DirectByteBuffers from the pointer on the Java level (rather than in JNI). This does require reflection to make address and capacity accessible but arguably this is less problematic than using Unsafe.

@krisskross
Copy link
Member

I think this change make sense irrespective if we change to ByteBuffer some time in the future. But maybe it's time I look into this issue. @phraktle: have you looked into how difficult it would be to implement that in HawtJNI?

It's a bit unfortunate that the library grows with 224K though, at the same time, users sensitive to footprint would indeed probably solve that one way or another. I agree with @benalexau comment on shading (so let's not do that).

@krisskross
Copy link
Member

I'll do a benchmark for the ByteBuffer reflection trick this coming week and see how it performs against DirectBuffer.

@benalexau
Copy link
Author

benalexau commented May 30, 2016

Are we opposed to abandoning HawtJNI in favour of raw JNI or another JNI helper library?

A few months ago I did a quick (and encouraging) experiment accessing LMDB via JNAerator-emitted BridJ bindings. I only used this combination as I had recent experience with them and an internal C++ library.

I feel hand-crafted JNI or a more minimalist C-focused JNI binding generator (eg Gluegen) might be worth exploration, particularly given LMDB is such a small and stable API.

@krisskross
Copy link
Member

I did some manual JNI hacking some time ago but that was a nightmare. I haven't looked closer at other JNI code generating tools. But since lmdbjni is quite mature I think our best bet is to solve it in HawtJNI.

@phraktle
Copy link

I haven't looked into how difficult it would be to do this in HawtJNI. Agree that manual JNI is too painful. Alternatives to HawtJNI could work also. Hopefully there will be a better FFI for Java: http://openjdk.java.net/jeps/191

In the meantime JNR-FFI looks pretty solid.

@krisskross
Copy link
Member

JNR-FFI has a nice API but the performance seems so-so.

http://stackoverflow.com/questions/22288909/jni-vs-jna-performance

@phraktle
Copy link

That link is about JNA, not JNR-FFI (https://github.com/jnr/jnr-ffi)

@krisskross
Copy link
Member

Sorry, I confused the two. Couldn't find any benchmarks so I posted a question on their issue tracker.

@krisskross
Copy link
Member

JavaCPP looks quite nice also, well maintained, lots of examples/documentation and promise "zero overhead compared to manually coded JNI functions".

@benalexau
Copy link
Author

JavaCPP also appears to use NewDirectByteBuffer (see Generator.java). @phraktle does it look like this would work to you?

@phraktle
Copy link

phraktle commented May 30, 2016

I'm not familiar with JavaCPP, but as long as it can generate the JNI code for the LMDB wrapper easily and supports returning ByteBuffers, it should do the trick :) The reason why I mentioned JNR is that it seems widely used and might be the basis of JEP191 / Project Panama.

@krisskross
Copy link
Member

I have been playing around with JavaCPP for a short while. Looks promising.

@benalexau
Copy link
Author

benalexau commented May 31, 2016

I still think it would be nice to merge this PR given it would provide a foundation for benchmarking the latest improvements to DirectBuffer against a ByteBuffer-based approach. In my view the single biggest competitive differentiation of LMDBJNI is it’s the lowest latency off-heap sorted map we can access from Java. After all, there are plenty of pure Java off-heap and on-heap sorted and unsorted maps for those who aren’t as latency sensitive. I believe LMDBJNI should continue to prioritise its latency differentiation for those users who depend on that key attribute.

To this end, LMDBJNI currently uses MutableDirectBuffer to wrap a different memory location as the cursor pointer moves. This does not require any additional object allocation or reflection (see source code). If I understand the NewDirectByteBuffer approach, JNI will allocate a new ByteBuffer instance on each cursor move. A user of the new ByteBuffer instance will then either use it directly (with the associated bounds checking of a ByteBuffer) or separately present it to a helper library that detects the ByteBuffer concrete class, reflectively accesses its private memory address, and uses Unsafe to get/set bytes without bounds checking overhead. To the extent this is correct, each cursor move would mean an extra ByteBuffer object allocation plus either bounds checking or reflection to acquire a memory address.

Such latency might border on immaterial versus the JNI call overhead and practical LMDB use cases, but I believe it needs to be benchmarked with a large enough workload to verify the GC pressure, reflection cost and bounds checking overhead. If the latency is of borderline difference I obviously prefer the cleaner, standards-oriented ByteBuffer approach. I just don’t want to pay a latency cost for the common use case of a single transaction making numerous cursor moves and making use of the fetched data.

Separately I am also working on an LMDB DAO that persists SBE types via a Java annotation processor. SBE Pull Request 350 added some code generation features to support this work, so I am keen to finish this off and make it available for others. The annotation processor generated code is simpler and slightly fewer operations if I don't need to obtain the existing LMDBJNI DirectBuffer address and then wrap it into an Agrona [Mutable]DirectBuffer for the SBE use.

Is @krisskross trying out porting to JavaCPP? Can I help out? Is there a NewDirectByteBuffer approach that would negate the latency concern?

@krisskross
Copy link
Member

Yes, I agree that the primary concern for LMDBJNI is performance. I'm also a bit worried about that extra allocation. The manual tests I did before indicated that it might hurt performance. I will not spend time rewriting LMDBJNI if the performance is worse than what we have today.

Right now I'm just playing around with JavaCPP to get a feel for the work needed and it seems relatively straight forward. My C/C++ knowledge is a little shakey so any help in this area would sure be helpful.

Cool initiative there with SBE! I was also doing something similar a while back in a project called Graphene. Ill have a closer look at the PR.

I will merge the PR, in the meantime. Sorry for the detour.

@krisskross krisskross merged commit 14b264a into deephacks:master May 31, 2016
@phraktle
Copy link

If it turns out that the overhead of the ByteBuffer wrapper allocation is indeed an issue, the reflective trick would still allow you to move the pointer without allocating a new one.

@krisskross
Copy link
Member

Yes, I'll explore that option aswell.

@krisskross
Copy link
Member

BTW, thanks for the PR @benalexau!

@benalexau
Copy link
Author

@krisskross you're welcome. How did you want to tackle the JavaCPP piece? A branch here or in your own fork or something else? Regardless of whether we eventually end up with ByteBuffer or DirectBuffer (again I prefer ByteBuffer if the latency for large workloads ends up materially equivalent), it seems desirable to modernise around a more popular and maintained JNI helper (JavaCPP has 1,532 stars vs HawtJNI's 50) and remove Unsafe from LMDBJNI if we can.

@krisskross
Copy link
Member

Stars is indeed how we measure stuff nowadays, right? ;-) No, but seriously HawtJNI is mostly abandoned so I agree that we should move away from it. Looks like JavaCPP makes it easy to access memory through both address and ByteBuffer.

Right now I just play around in a private repository. I can create a public one if you're interested to have a look?

@benalexau
Copy link
Author

Stars aren't everything, but in this case there's 30 times as many in one project vs the other, not to mention a commit log with similar polarisation. Given most of us aren't terribly interested in JNI and the Java 9 Unsafe dance is going to wipe out many convenient tricks, there's some safety in numbers....

Happy to take a look at your private repo or a public one. I just don't want to duplicate work you have already done. It'd also be good to get informal feedback along the way, rather than a huge PR at the end.

@krisskross
Copy link
Member

Cool, I'll create a repo later tonight and let you know when it's up.

@krisskross
Copy link
Member

Got a reply from Charles Nutter regarding performance of JNR-FFI.

We have done some testing for presentations, but not a formal suite of benchmarks to compare with raw JNI or other libraries like JNA. In our testing, however, the overall cost of making a simple native call is about what it is for hand-written JNI...significantly faster than JNA. As you start to use more advanced features, they add overhead too...but in most cases we (really, @wmeissner, the founder of the JNR projects) try to keep that overhead about what you'd have to accept to do the same features with JNI.

krisskross added a commit that referenced this pull request Jun 5, 2016
This reverts commit 14b264a, reversing
changes made to 9a29e34.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants