Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Geo spatial interfaces #16029

Merged
merged 7 commits into from
Apr 1, 2024

Conversation

pranavbhole
Copy link
Contributor

@pranavbhole pranavbhole commented Mar 2, 2024

Description

  1. This PR creates an interface for ImmutableRTree and moved the existing implementation to new class which represent 32 bit implementation (stores coordinate as floats). This PR makes the ImmutableRTree extendable to create higher precision implementation as well (64 bit).
  2. In all spatial bound filters, we accept float as input which might not be accurate in the case of high precision implementation of ImmutableRTree. This PR changed the bound filters to accepts the query bounds as double instead of float and it is backward compatible change as it compares double to existing float values in RTree. Previously it was comparing input float to RTree floats which can cause precision loss, now it is little better as it compares double to float which is still not 100% accurate.
  3. There are no changes in the way that we query spatial dimension today except input bound parsing. There is little improvement in string filter predicate which now parse double strings instead of float and compares double to double which is 100% accurate but string predicate is only called when we dont have spatial index.
  4. With allowing the interface to extend ImmutableRTree, we allow to create high precision (HP) implementation and defines new search strategies to perform HP search Iterable<ImmutableBitmap> search(ImmutableDoubleNode node, Bound bound);
  5. With possible HP implementations, Radius bound filter can not really focus on accuracy, it is calculating Euclidean distance in comparing. As EARTH 🌍 is round and not flat, Euclidean distances are not accurate in geo system. This PR adds new param called 'radiusUnit' which allows you to specify units like meters, km, miles etc. It uses https://en.wikipedia.org/wiki/Haversine_formula to check if given geo point falls inside circle or not. Added a test that generates set of points inside and outside in RadiusBoundTest.

PR is fully backward compatible. Let me know if you have any questions.

Release Notes:
Adding radiusUnit in Radius Bound filter. radiusUnit : String value of radius unit in lowercase, default value is 'euclidean'. Allowed units are euclidean, meters, miles, kilometers.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Assert.assertFalse(bound.containsDouble(geoPoint));
float[] floatPoint = new float[]{
Float.parseFloat(String.valueOf(geoPoint[0])), // this is how legacy sptial index
Float.parseFloat(String.valueOf(geoPoint[1]))

Check notice

Code scanning / CodeQL

Missing catch of NumberFormatException Note test

Potential uncaught 'java.lang.NumberFormatException'.
@pranavbhole
Copy link
Contributor Author

Can you please take look at this?
@clintropolis @gianm @cheddar

Assert.assertTrue(distance < circleRadius);
Assert.assertTrue(bound.containsDouble(geoPoint));
float[] floatPoint = new float[]{
Float.parseFloat(String.valueOf(geoPoint[0])),

Check notice

Code scanning / CodeQL

Missing catch of NumberFormatException Note test

Potential uncaught 'java.lang.NumberFormatException'.
Assert.assertTrue(haversineDistance > circleRadius); // asserts that point is outside
Assert.assertFalse(bound.containsDouble(geoPoint));
float[] floatPoint = new float[]{
Float.parseFloat(String.valueOf(geoPoint[0])),

Check notice

Code scanning / CodeQL

Missing catch of NumberFormatException Note test

Potential uncaught 'java.lang.NumberFormatException'.
Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines 85 to 88
double lat1 = coord1[0];
double lon1 = coord1[1];
double lat2 = coord2[0];
double lon2 = coord2[1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe just inline these in the next 4 double declarations since they aren't used separately

(also a lot of these could be marked final)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


/**
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe either add real javadoc or delete this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

/**
* An immutable representation of an {@link RTree} for spatial indexing.
*/
public final class ImmutableRTree32BitImpl implements ImmutableRTree
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: how about ImmutableFloatRTree? or this is fine too, naming things is the worst

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

@pranavbhole pranavbhole force-pushed the geoSpatialInterfaces branch from 48f71dd to f827ea7 Compare March 21, 2024 05:22
Assert.assertTrue(bound.contains(geoPoint));
float[] floatPoint = new float[]{
Float.parseFloat(String.valueOf(geoPoint[0])),
Float.parseFloat(String.valueOf(geoPoint[1]))

Check notice

Code scanning / CodeQL

Missing catch of NumberFormatException Note test

Potential uncaught 'java.lang.NumberFormatException'.
}

@Test
public void testDeSerForDoubleArrays() throws JsonProcessingException
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: test name is stale i guess. Also instead of doing it with a string, it would be nicer to make a RectangularBound serialize it to string, and then deserialize back to an object to make sure round tripping works ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified test


@JsonCreator
public RadiusBound(
@JsonProperty("coords") float[] coords,
@JsonProperty("radius") float radius,
@JsonProperty("limit") int limit
@JsonProperty("limit") int limit,
@JsonProperty("radiusUnit") RadiusUnit radiusUnit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably add a serde test to RadiusBoundTest with the new parameter. also it should be marked @Nullable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added more tests for nullable

Copy link
Contributor

@abhishekagarwal87 abhishekagarwal87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments but not a blocker so I will merge the PR.

Comment on lines +73 to +78
this.childrenOffset = initialOffset
+ offsetFromInitial
+ HEADER_NUM_BYTES
+ 2 * numDims * Float.BYTES
+ Integer.BYTES
+ bitmapSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.childrenOffset = initialOffset
+ offsetFromInitial
+ HEADER_NUM_BYTES
+ 2 * numDims * Float.BYTES
+ Integer.BYTES
+ bitmapSize;
this.childrenOffset = sizePosition
+ Integer.BYTES
+ bitmapSize;

Comment on lines +71 to +80
final int sizePosition = initialOffset + offsetFromInitial + HEADER_NUM_BYTES + 2 * numDims * Float.BYTES;
int bitmapSize = data.getInt(sizePosition);
this.childrenOffset = initialOffset
+ offsetFromInitial
+ HEADER_NUM_BYTES
+ 2 * numDims * Float.BYTES
+ Integer.BYTES
+ bitmapSize;

this.data = data;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
final int sizePosition = initialOffset + offsetFromInitial + HEADER_NUM_BYTES + 2 * numDims * Float.BYTES;
int bitmapSize = data.getInt(sizePosition);
this.childrenOffset = initialOffset
+ offsetFromInitial
+ HEADER_NUM_BYTES
+ 2 * numDims * Float.BYTES
+ Integer.BYTES
+ bitmapSize;
this.data = data;
this(numDims, initialOffset, offsetFromInitial, numChildren, leaf, data, bitmapFactory)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work as this( ) calls needs to first statement. I just renamed the class here with no other changes.

import java.util.Iterator;

/**
* Byte layout:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the child here and what is the dim?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nums of dims = size of float array, needs it here esp for byte layout. Child here in byte layout represents the child node stored at the childOffeset
this.childrenOffset = initialOffset
+ offsetFromInitial
+ HEADER_NUM_BYTES
+ 2 * numDims * Float.BYTES
+ Integer.BYTES
+ bitmapSize;

{
private static Random random = ThreadLocalRandom.current();

public static float[][] generateGeoCoordinatesAroundCircle(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since its only used in test, the method should be moved to test code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is already in test package.

@abhishekagarwal87 abhishekagarwal87 merged commit 20de7fd into apache:master Apr 1, 2024
85 checks passed
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants