Is your feature request related to a problem? Please describe.
NVIDIA/spark-rapids#8761 added URL parsing (actually URI parsing; Spark made the switch on the backend a while ago), but to do it we had to write a very large regular expression. A lot of that size comes from trying to be IPv6-compatible. We should really write a custom kernel instead, to save memory and improve performance.
Here is a direct translation of the URL parsing part of the Java URI lib to C++ that might help this issue.
We cannot use that code. The Java code is under a GPL license with a Classpath exception, and our license is Apache. The two are not compatible with each other. Do not copy any of the Java code, or any code from the gist URL above, into the kernel we produce, or we will have to rewrite it from scratch.
https://www.rfc-editor.org/rfc/rfc2396 describes the official syntax, but we could probably find and use another Apache-licensed URI parser, such as https://github.com/apache/commons-vfs/blob/master/commons-vfs2/src/main/java/org/apache/commons/vfs2/provider/UriParser.java
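To make the idea concrete, here is a rough host-side sketch of the kind of per-string scan a custom kernel could do in place of the regular expression. It is written only from the generic component breakdown in RFC 2396 Appendix B, not from the Java source or the gist. The names `UriParts` and `split_uri` are hypothetical, it does no character validation or IPv6 literal checking, and it is not meant as the final design.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical result type: views into the input, plus presence flags so an
// empty-but-present component (e.g. "a?") can be told apart from a missing one.
struct UriParts {
  std::string_view scheme, authority, path, query, fragment;
  bool has_scheme = false;
  bool has_authority = false;
  bool has_query = false;
  bool has_fragment = false;
};

// Splits a URI reference into its five generic components, following the
// breakdown in RFC 2396 Appendix B. No validation is performed.
inline UriParts split_uri(std::string_view s) {
  constexpr auto npos = std::string_view::npos;
  UriParts p;

  // Scheme: a non-empty run of characters other than ":/?#", followed by ':'.
  std::size_t i = s.find_first_of(":/?#");
  if (i != npos && i > 0 && s[i] == ':') {
    p.scheme = s.substr(0, i);
    p.has_scheme = true;
    s.remove_prefix(i + 1);
  }

  // Fragment: everything after the first '#'.
  i = s.find('#');
  if (i != npos) {
    p.fragment = s.substr(i + 1);
    p.has_fragment = true;
    s = s.substr(0, i);
  }

  // Query: everything after the first '?' that precedes the fragment.
  i = s.find('?');
  if (i != npos) {
    p.query = s.substr(i + 1);
    p.has_query = true;
    s = s.substr(0, i);
  }

  // Authority: present only after a leading "//", and runs to the next '/'.
  if (s.size() >= 2 && s[0] == '/' && s[1] == '/') {
    s.remove_prefix(2);
    i = s.find('/');
    p.authority = s.substr(0, i);
    p.has_authority = true;
    s = (i == npos) ? std::string_view{} : s.substr(i);
  }

  // Whatever remains is the path.
  p.path = s;
  return p;
}
```

For example, `split_uri("http://[::1]:8080/a?b=c#d")` yields scheme `http`, authority `[::1]:8080`, path `/a`, query `b=c`, and fragment `d`. The bracketed IPv6 host stays inside the authority without any special casing at this stage; validating it would be a separate step.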