Adding DFLib Java DataFrame to comparison #4

Open: wants to merge 1 commit into base: master

@andrus andrus commented Jan 11, 2025

Hi @mathijs81,

I recently found your Java DataFrame comparison blog post. It is a pretty cool comparison, and I wanted to add one more contender: DFLib 🙂

DFLib is pure Java and provides immutable DataFrames and a nice fluent transformation API (plus Jupyter integration, charts, etc.). While Tablesaw and Joinery seem dormant, DFLib is a very active project. This PR is based on the latest public release (1.1.0). In 2.0, CSV loading will get faster once we switch away from Apache commons-csv, so the benchmark numbers should improve further. But even with the slower CSV loader, DFLib is much faster than Tablesaw: in my test the results were 409.3 (Tablesaw) vs. 153.3 (DFLib).

I can't fully build the master branch because the specific joinery and krangl dependencies it uses are missing from Maven Central. But if you already have them locally, my PR should work for you. As far as the build goes, the PR switches Java to 11 (the minimum DFLib requires) and adds a single dependency.

Would be great to see what numbers you end up with for DFLib on your side 🙂
