
config executor cores #23

Merged
merged 6 commits into main from dev_executor_cores
Jun 6, 2024

Conversation

Tianhao-Gu
Collaborator

With these changes, we can control how many cores to use for the Spark session.
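
For context, a minimal PySpark sketch (plain SparkSession API, not this repo's helpers) of the setting this change exposes; the app name and core count below are arbitrary:

from pyspark.sql import SparkSession

# Cap each executor at 2 CPU cores; without spark.executor.cores, a
# standalone-mode executor claims every core available on its worker.
spark = (
    SparkSession.builder
    .appName("executor-cores-demo")
    .config("spark.executor.cores", "2")
    .getOrCreate()
)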


codecov bot commented Jun 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.92%. Comparing base (a493696) to head (c2a5cc4).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #23      +/-   ##
==========================================
+ Coverage   76.31%   76.92%   +0.60%     
==========================================
  Files           1        1              
  Lines          38       39       +1     
==========================================
+ Hits           29       30       +1     
  Misses          9        9              

☔ View full report in Codecov by Sentry.

@@ -10,6 +10,7 @@
HADOOP_AWS_VER = os.getenv('HADOOP_AWS_VER')
DELTA_SPARK_VER = os.getenv('DELTA_SPARK_VER')
SCALA_VER = os.getenv('SCALA_VER')
DEFAULT_EXECUTOR_CORES = 1 # the default number of CPU cores that each Spark executor will use
Member

Without this argument how many cores does each executor use? Can you document that here?

Collaborator Author

Added comments. I noticed Spark will take all cores on the worker if we don't set this.
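
For reference, plain Spark behavior (not this repo's code): spark.executor.cores defaults to 1 on YARN but to all available worker cores in standalone mode, so pinning it keeps one application from monopolizing a node. A minimal SparkConf sketch:

from pyspark import SparkConf

conf = SparkConf().setAppName("demo")
# Left unset, standalone executors claim every worker core (YARN defaults to 1);
# pin the value so each executor uses exactly one core.
conf = conf.set("spark.executor.cores", "1")
print(conf.get("spark.executor.cores"))  # -> "1"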

"""
Helper function to get Delta Lake specific Spark configuration.

:param jars_str: A comma-separated string of JAR file paths
:param executor_cores: The number of CPU cores that each Spark executor will use
Member

So this only applies if Delta Lake is enabled and the user doesn't call the get base conf method. It seems like it should always apply.

Collaborator Author

👍
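
For reference, a sketch of roughly what the Delta-specific helper documented above might look like; the function name and the specific Delta settings are assumptions rather than this repo's exact code, and per this thread the merged change moves executor_cores into the shared base configuration instead:

from pyspark import SparkConf

def _get_delta_lake_conf(jars_str: str, executor_cores: int) -> SparkConf:
    """
    Helper function to get Delta Lake specific Spark configuration.

    :param jars_str: A comma-separated string of JAR file paths
    :param executor_cores: The number of CPU cores that each Spark executor will use
    """
    return (
        SparkConf()
        .set("spark.jars", jars_str)
        .set("spark.executor.cores", str(executor_cores))
        # Standard Delta Lake wiring: SQL extension plus the Delta catalog.
        .set("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .set("spark.sql.catalog.spark_catalog",
             "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    )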

@@ -58,32 +63,39 @@ def _stop_spark_session(spark):
spark.stop()


-def get_base_spark_conf(app_name: str) -> SparkConf:
+def get_base_spark_conf(
Member

Should this be a private method? If not, executor_cores should probably be a default arg.

Collaborator Author

👍 I think it makes sense to make it a private method.
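
A sketch of the direction settled on in this thread, with the helper made private and carrying the new parameter (an illustration, not the merged code):

from pyspark import SparkConf

DEFAULT_EXECUTOR_CORES = 1  # mirrors the constant added earlier in this PR

def _get_base_spark_conf(app_name: str, executor_cores: int = DEFAULT_EXECUTOR_CORES) -> SparkConf:
    """
    Build the base Spark configuration shared by every session.

    :param app_name: The name of the Spark application
    :param executor_cores: The number of CPU cores that each Spark executor will use
    """
    return (
        SparkConf()
        .setAppName(app_name)
        .set("spark.executor.cores", str(executor_cores))
    )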

@Tianhao-Gu Tianhao-Gu merged commit 33c02dc into main Jun 6, 2024
9 checks passed
@Tianhao-Gu Tianhao-Gu deleted the dev_executor_cores branch June 6, 2024 17:23