config executor cores #23
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff            @@
##             main      #23     +/-  ##
=========================================
+ Coverage   76.31%   76.92%   +0.60%
=========================================
  Files           1        1
  Lines          38       39       +1
=========================================
+ Hits           29       30       +1
  Misses          9        9

☔ View full report in Codecov by Sentry.
src/spark/utils.py (Outdated)
@@ -10,6 +10,7 @@
 HADOOP_AWS_VER = os.getenv('HADOOP_AWS_VER')
 DELTA_SPARK_VER = os.getenv('DELTA_SPARK_VER')
 SCALA_VER = os.getenv('SCALA_VER')
+DEFAULT_EXECUTOR_CORES = 1  # the default number of CPU cores that each Spark executor will use
Without this argument, how many cores does each executor use? Can you document that here?
Added comments. I noticed Spark will take all cores on the worker if we don't set this.
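For reference, a minimal sketch of how such a default could be applied in PySpark. DEFAULT_EXECUTOR_CORES matches the constant in the diff above; the helper name and everything else below are illustrative, not the project's actual code:

from pyspark import SparkConf

# Without spark.executor.cores, Spark standalone gives each executor all cores on the worker.
DEFAULT_EXECUTOR_CORES = 1

def _apply_executor_cores(conf: SparkConf, executor_cores: int = DEFAULT_EXECUTOR_CORES) -> SparkConf:
    # Cap the number of CPU cores each executor may use (illustrative helper).
    return conf.set("spark.executor.cores", str(executor_cores))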
src/spark/utils.py (Outdated)
 """
 Helper function to get Delta Lake specific Spark configuration.

 :param jars_str: A comma-separated string of JAR file paths
+:param executor_cores: The number of CPU cores that each Spark executor will use
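For illustration, a hedged sketch of what a Delta Lake config helper taking these two parameters might return. The function name and the exact set of keys are assumptions; only spark.jars and spark.executor.cores are implied by the docstring above, and the two Delta entries are the usual Delta Lake session settings:

def _get_delta_lake_conf_sketch(jars_str: str, executor_cores: int) -> dict:
    # Illustrative only: map Spark config keys to values for a Delta Lake session.
    return {
        "spark.jars": jars_str,                       # comma-separated JAR file paths
        "spark.executor.cores": str(executor_cores),  # CPU cores per executor
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }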
So this only applies if Delta Lake is true and the user doesn't call the get base conf method. It seems like it should always apply.
👍
src/spark/utils.py (Outdated)
@@ -58,32 +63,39 @@ def _stop_spark_session(spark):
     spark.stop()


-def get_base_spark_conf(app_name: str) -> SparkConf:
+def get_base_spark_conf(
Should this be a private method? If not, executor_cores should probably be a default arg.
👍 I think it makes sense to make it a private method.
With these changes, we can control how many cores to use for the Spark session.
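To make the outcome concrete, here is a hedged sketch of the shape discussed above: a private helper with executor_cores as a default argument. The body is illustrative and may not match the final code in src/spark/utils.py:

from pyspark import SparkConf

DEFAULT_EXECUTOR_CORES = 1  # default CPU cores per executor, as in the diff above

def _get_base_spark_conf(app_name: str, executor_cores: int = DEFAULT_EXECUTOR_CORES) -> SparkConf:
    # Build the base configuration and cap cores per executor so a single
    # session cannot claim every core on the worker.
    return (
        SparkConf()
        .setAppName(app_name)
        .set("spark.executor.cores", str(executor_cores))
    )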