Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Profiling tool should detect if there is "file.encoding=UTF-8" #713

Closed
viadea opened this issue Jan 4, 2024 · 3 comments · Fixed by #736
Closed

[FEA] Profiling tool should detect if there is "file.encoding=UTF-8" #713

viadea opened this issue Jan 4, 2024 · 3 comments · Fixed by #736
Assignees
Labels
core_tools Scope the core module (scala) feature request New feature or request

Comments

@viadea
Copy link
Collaborator

viadea commented Jan 4, 2024

I wish Profiling tool should detect if there is "file.encoding=UTF-8 ​​​​​" from event log.
If it is empty or set to some other non-UTF8 encoding, we should raise the warning.

This is because, our regexp related functions might fallback to CPU if it is non-UTF8.
Such as below driver log message might show up:

!Expression <RegExpReplace> regexp_replace(xxx) cannot run on GPU because regular expression support is disabled because the GPU only supports the UTF-8 charset when using regular expressions
@viadea viadea added feature request New feature or request ? - Needs Triage labels Jan 4, 2024
@amahussein amahussein added core_tools Scope the core module (scala) and removed ? - Needs Triage labels Jan 10, 2024
@amahussein amahussein changed the title [FEA] Profiling tool should detect if there is "file.encoding=UTF-8 ​​​​​​" [FEA] Profiling tool should detect if there is "file.encoding=UTF-8" Jan 10, 2024
@amahussein
Copy link
Collaborator

This property is part of the environment information in the eventlog.
It should be one of the "System Properties"

For example

{
   "Event":"SparkListenerEnvironmentUpdate",
   "JVM Information":{
      "Java Home":"/usr/lib/jvm/java-8-openjdk-amd64/jre",
      "Java Version":"1.8.0_232 (Private Build)",
      "Scala Version":"version 2.12.10"
   },
   "Spark Properties":{
      
   },
   "Hadoop Properties":{
      
   },
   "System Properties":{
      "file.encoding":"UTF-8"
   },
   "Classpath Entries":{
      
   }
}

@amahussein
Copy link
Collaborator

@viadea How do you want this information to be reported?
Do you want it to be a log warning message? or part of the application output info?

@amahussein
Copy link
Collaborator

Does this mean we need to update the qualification to consider the regex not supported if the non UTF-8 files are used?

amahussein added a commit to amahussein/spark-rapids-tools that referenced this issue Jan 22, 2024
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>

Fixes NVIDIA#713, Fixes NVIDIA#734

This PR changes the behavior of the AutTuner to display a comment when
the file-encoding of an application is set to a value that is not "utf-8".

The changes also improves the extraction of the RAPIDS jars values.

*Changes for 713*:

- Added a new field in `ApplicationSummaryInfo` that represents the
  systemProperties
- Capture SystemProperties in the App in order to be able to check the
  file-encoding
- Moved map properties to `CacheableProps` so that it can be used by the
  Qualification as well.
- Moved String-conversion methods from AutTuner to StringUtils
- Moved `getEventFromJsonMethod` to EventUtils object
- Added a new UnitTest for the AutoTuner
- Updated ApplicationInfoSuite unitTests

*Changes for 734*:

- Fixed the implementation of `CollectInformation.getRapidsJARInfo`
@amahussein amahussein self-assigned this Jan 22, 2024
amahussein added a commit that referenced this issue Jan 23, 2024
…ons (#736)

* [FEA] AutoTuner warns that non-utf8 may not support some GPU expressions

Fixes #713, Fixes #734

This PR changes the behavior of the AutTuner to display a comment when
the file-encoding of an application is set to a value that is not "utf-8".

The changes also improves the extraction of the RAPIDS jars values.

*Changes for 713*:

- Added a new field in `ApplicationSummaryInfo` that represents the
  systemProperties
- Capture SystemProperties in the App in order to be able to check the
  file-encoding
- Moved map properties to `CacheableProps` so that it can be used by the
  Qualification as well.
- Moved String-conversion methods from AutTuner to StringUtils
- Moved `getEventFromJsonMethod` to EventUtils object
- Added a new UnitTest for the AutoTuner
- Updated ApplicationInfoSuite unitTests

*Changes for 734*:

- Fixed the implementation of `CollectInformation.getRapidsJARInfo`

---------

Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core_tools Scope the core module (scala) feature request New feature or request
Projects
None yet
2 participants