This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
[BUG] Unable to use Spark Rapids with Spark Thrift Server #9867
Comments
Have you tried not placing the RAPIDS Accelerator jar in the Spark jars directory and, instead of specifying the driver/executor classpath via the command line and configs, using the --jars flag? e.g.:
This worked for me: I was able to show tables and select elements from a table using pyhive to connect to the Spark thriftserver, and I verified via the Hive thriftserver log that the RAPIDS Accelerator was being used during the queries. |
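The kind of check described above (show tables, then select from a table over PyHive) can be sketched roughly as follows. This is only an illustration: the host, port, username and the table name `my_table` are assumptions, not values from the thread, and the connection helper is never called automatically.

```python
def run_queries(cursor, queries):
    """Execute each query on a DB-API cursor and collect the fetched rows."""
    results = {}
    for query in queries:
        cursor.execute(query)
        results[query] = cursor.fetchall()
    return results


def query_thriftserver(host="localhost", port=10000, username="spark"):
    """Connect to a Spark Thrift Server and run two smoke-test queries.

    host, port, username and the table name are hypothetical placeholders.
    """
    # Imported lazily so run_queries() can be used without PyHive installed.
    from pyhive import hive

    conn = hive.Connection(host=host, port=port, username=username)
    try:
        return run_queries(
            conn.cursor(),
            ["SHOW TABLES", "SELECT * FROM my_table LIMIT 5"],
        )
    finally:
        conn.close()
```

Calling `query_thriftserver()` against a running server returns a dict keyed by query text. Whether the GPU plan was actually used still has to be confirmed in the thrift server log, as described above.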
I have tried to reproduce as well with different modalities for jar submission. So far I have not been able to.
This looks like an outdated note, $ cat ~/dist/spark-3.5.0-bin-hadoop3/RELEASE
Spark 3.5.0 (git revision ce5ddad9903) built for Hadoop 3.3.4
Build flags: -B -Pmesos -Pyarn -Pkubernetes -Psparkr -Pscala-2.12 -Phadoop-3 -Phive -Phive-thriftserver
This has worked for me too, but this is my least-favorite deployment option. It typically is only required with a standalone mode (your case) but only when the RapidsShuffleManager is used as well (not present in your conf). It does not look like the case here but mixing
this is not necessary when you already placed your jar under In your setup it looks cleanest if you remove the jar from $SPARK_HOME/jars and start thriftserver with --jars. |
Hi @gerashegalov @jlowe. Thanks for your help. Here is a YouTube link which shows how I encountered this error. Furthermore, I am using Spark 3.3.0 instead of the newest Spark version, 3.5.0, so I need to recompile my Spark package. I can try with Spark 3.5.0 later. |
Thanks for the demo @LIN-Yu-Ting. I was using the standard Spark build for 3.3.0, which works with beeline for me. And again, I am not sure why you need to recompile Spark for hivethriftserver. It should already be there.
At any rate, can you provide your exact build command to double-check if it is about custom build? |
@gerashegalov However, when I execute a SQL query using Superset through PyHive, I get the above exception, which is quite weird. |
@jlowe @gerashegalov I got more information from the Spark Thrift Server logs which might give us more insight. Actually, the error occurs when Superset is executing a command
Can you please try to execute this command from your side to see whether you can reproduce the error? Thanks a lot.
Note:
from my side to either beeline or PyHive. All are executable. Only SQL queries from Superset fail. |
We have seen issues with multi-session notebooks such as Databricks. I will give Superset a try; I am not familiar with it yet. |
@LIN-Yu-Ting another suggestion to quickly unblock you while we are looking at it. Classloading issues are likely to go away if you build our artifact from scratch using the instructions for a single-spark-version build. To this end, check out or download the source for the version tag. In your case the Apache Spark version you want to build for is 3.3.0, which can be accomplished by running from the local repo's root dir:
mvn package -pl dist -am -Dbuildver=330 -DallowConventionalDistJar=true -DskipTests
Since the tests are skipped, you do not need a GPU on the machine used for the build. The artifact will be under: |
@gerashegalov Thanks for providing this workaround. I have tried to build locally and replace the jar. Unfortunately, I still got the same error as before. Anyway, I appreciate it. |
@LIN-Yu-Ting Can you double-check that your jar is indeed "conventional"? The output from running the command below should be 0:
$ jar tvf dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar | grep -c spark3xx
0 |
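Where the `jar` tool is not at hand, the same count can be approximated with Python's standard `zipfile` module, since a jar is a zip archive. This is a minimal sketch; the function name is made up for illustration, and it counts entry names containing the substring rather than grepping the `jar tvf` listing lines.

```python
import zipfile


def count_matching_entries(jar_path, needle="spark3xx"):
    """Count archive entries whose name contains `needle`.

    Roughly mirrors `jar tvf <jar> | grep -c <needle>`; a result of 0
    is what the comment above expects for a "conventional" jar.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return sum(1 for name in jar.namelist() if needle in name)
```

For example, `count_matching_entries("dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar")` would be the counterpart of the command quoted above.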
@gerashegalov Here is the screenshot after executing this command: |
With the new jar you built, can you place it back into the $SPARK_HOME/jars/ directory and try it there? Remove it from the --jars parameter. |
Thanks a lot for confirming @LIN-Yu-Ting. Can we try one more thing? Can you start the thrift server with additional params to enable verbose classloading: and grep the thrift server / driver log for the rapids-4-spark jar to rule out additional jars on the classpath
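The log check suggested above can also be scripted. The sketch below collects every distinct rapids-4-spark jar path mentioned in a log; more than one distinct path would hint at duplicate jars on the classpath. The log line shape is an assumption (JVM class-loading output that names the source jar on each line), and the function name is hypothetical.

```python
import re

# Any whitespace-delimited token that contains a rapids-4-spark*.jar path.
RAPIDS_JAR = re.compile(r"\S*rapids-4-spark[^\s\]]*\.jar")


def find_rapids_jars(log_lines):
    """Return the sorted, de-duplicated jar paths found in the log lines."""
    found = set()
    for line in log_lines:
        found.update(RAPIDS_JAR.findall(line))
    return sorted(found)
```

A typical use would be `find_rapids_jars(open(log_path))`, where `log_path` points at the thrift server log under $SPARK_HOME/logs.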
In branch-24.02 we also have a new feature of detecting duplicate jars automatically #9654. You may want to try this #9867 (comment) again but with HEAD of branch-24.02. You can try the default but better add |
I was able to reproduce this I confirmed that it goes away with the simple Even with the simple jar we have While the workaround for |
I confirmed that with the simple -DallowConventionalDistJar=true build and with a static classpath under $SPARK_HOME/jars, there is no NoClassDefFoundError anymore. Thanks a lot @gerashegalov. |
Going back to our multi-shim production jar, it looks like there is a race condition that affects all the sessions if the user classloader deployment --jars is used. After So another workaround is to run a single session beeline
before allowing traffic to the thrift server from superset. We should review the usage of lazy vals |
Describe the bug
Our objective is to activate Spark Rapids (SQLPlugin) with Spark Thrift Server. However, we encountered an exception related to ClassNotFound. For your reference, Spark Thrift Server is also known as Distributed SQL Engine.
Steps/Code to reproduce bug
You need to launch Spark Thrift Server with $SPARK_HOME/sbin/start-thriftserver.sh using the following steps:
Expected behavior
Under the $SPARK_HOME/logs folder, you will see a Spark Thrift Server log with the following exception:
Environment details (please complete the following information)
spark.driver.cores 1
spark.driver.memory 2g
spark.driver.extraClassPath /home/spark-current/jars/rapids-4-spark.jar
spark.executor.cores 1
spark.executor.memory 6g
spark.executor.resource.gpu.discoveryScript /tmp/getGpusResources.sh
spark.executor.resource.gpu.amount 1
spark.executor.extraClassPath /home/spark-current/jars/rapids-4-spark.jar
spark.task.resource.gpu.amount 0.5
spark.task.cpus 1
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.dynamicAllocation.minExecutors 1
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator org.bdgenomics.adam.serialization.ADAMKryoRegistrator
spark.executor.extraJavaOptions -XX:+UseG1GC
spark.hadoop.io.compression.codecs org.seqdoop.hadoop_bam.util.BGZFEnhancedGzipCodec
spark.local.dir /mnt
spark.rapids.memory.pinnedPool.size 2g
spark.rapids.sql.concurrentGpuTasks 2
spark.rapids.sql.csv.read.double.enabled true
spark.rapids.sql.hasNans true
spark.rapids.sql.explain ALL
spark.plugins com.nvidia.spark.SQLPlugin
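Since much of the thread is about how the RAPIDS jar reaches the classpath, it can help to list which settings in a config like the one above reference that jar. The following is a minimal sketch, assuming whitespace-separated `key value` lines as in spark-defaults.conf; both function names are made up for illustration.

```python
def parse_spark_conf(text):
    """Parse whitespace-separated `key value` lines into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def rapids_classpath_entries(conf):
    """Return the extraClassPath settings that point at a rapids-4-spark jar."""
    return {
        key: value
        for key, value in conf.items()
        if key.endswith("extraClassPath") and "rapids-4-spark" in value
    }
```

For the configuration listed here, this would report both `spark.driver.extraClassPath` and `spark.executor.extraClassPath`.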
Additional context
These exceptions only happen with Thrift Server. With the same configuration, I am able to launch spark-shell and execute whatever SQL commands I want.