[FEA] custom kernel for URI/URL parsing #1326

Closed
8 tasks done
revans2 opened this issue Aug 9, 2023 · 3 comments

@revans2
Collaborator

revans2 commented Aug 9, 2023

Is your feature request related to a problem? Please describe.

NVIDIA/spark-rapids#8761 added URL parsing (actually URI parsing; Spark switched to URI on the backend a while ago), but to do it we had to write an enormous regular expression. Much of that size comes from trying to be IPv6 compatible. We should really write a custom kernel to reduce memory use and improve performance.

https://www.rfc-editor.org/rfc/rfc2396 describes the official syntax, but we could probably find and use an existing Apache-licensed URI parser, such as https://github.com/apache/commons-vfs/blob/master/commons-vfs2/src/main/java/org/apache/commons/vfs2/provider/UriParser.java
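
To illustrate why a hand-written scanner can be much smaller than a regular expression, here is a minimal host-side C++17 sketch that splits a URI into its generic components (scheme, authority, path, query, fragment), following the component layout given in RFC 2396 Appendix B. It is not the kernel that was eventually implemented: `UriParts` and `split_uri` are hypothetical names, it is written from the RFC's description rather than from any existing library, and it performs no character validation, percent-decoding, or IPv6 literal checking. A real kernel would presumably run logic like this per row on device string data.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type: optional members are absent when the
// corresponding delimiter does not appear in the input.
struct UriParts {
  std::optional<std::string_view> scheme;
  std::optional<std::string_view> authority;
  std::string_view path;
  std::optional<std::string_view> query;
  std::optional<std::string_view> fragment;
};

// Splits a URI reference into components without validating them.
// Assumption: the input is already a single, trimmed URI string.
inline UriParts split_uri(std::string_view s) {
  constexpr auto npos = std::string_view::npos;
  UriParts out;

  // Fragment: everything after the first '#'.
  if (std::size_t hash = s.find('#'); hash != npos) {
    out.fragment = s.substr(hash + 1);
    s = s.substr(0, hash);
  }

  // Query: everything after the first '?' in what remains.
  if (std::size_t q = s.find('?'); q != npos) {
    out.query = s.substr(q + 1);
    s = s.substr(0, q);
  }

  // Scheme: a non-empty run before the first ':' that contains no '/'.
  if (std::size_t colon = s.find(':');
      colon != npos && colon > 0 && s.substr(0, colon).find('/') == npos) {
    out.scheme = s.substr(0, colon);
    s = s.substr(colon + 1);
  }

  // Authority: introduced by "//" and terminated by the next '/'.
  // An IPv6 literal such as [::1] stays inside the authority untouched.
  if (s.size() >= 2 && s[0] == '/' && s[1] == '/') {
    s.remove_prefix(2);
    std::size_t slash = s.find('/');
    out.authority = s.substr(0, slash);
    s = (slash == npos) ? std::string_view{} : s.substr(slash);
  }

  // Whatever is left is the path (possibly empty).
  out.path = s;
  return out;
}
```

For example, `split_uri("http://[::1]:8080/a/b?x=1#frag")` yields scheme `http`, authority `[::1]:8080`, path `/a/b`, query `x=1`, and fragment `frag`. Splitting the authority further into user info, host, and port, and validating the IPv6 literal, is where most of the remaining work would be.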

Tasks

  1. hyperbolic2346
  2. enhancement
    hyperbolic2346
  3. 3 of 3
    hyperbolic2346
@thirtiseven
Collaborator

thirtiseven commented Oct 6, 2023

Here is a direct translation of the URL parsing part of the Java URI lib to C++ that might help this issue.

Deleted the gist as it is not allowed by the license.

@revans2
Collaborator Author

revans2 commented Oct 6, 2023

> Here is a direct translation of the URL parsing part of the Java URI lib to C++ that might help this issue.

We cannot use that code. The Java code is under a GPL license with a classpath exception, and our license is Apache; the two are not compatible. Do not copy any of the Java code, or any code from the gist linked above, into the kernel we produce, or we will have to rewrite it from scratch.

@hyperbolic2346
Collaborator

These have been implemented.
