
Commit

add comments
Tianhao-Gu committed Jun 6, 2024
1 parent e8a9bc2 commit fbeacd1
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions src/spark/utils.py
@@ -156,7 +156,7 @@ def read_csv(
     Read a file in CSV format from minIO into a Spark DataFrame.
     :param spark: The Spark session.
-    :param path: The minIO path to the CSV file.
+    :param path: The minIO path to the CSV file. e.g. s3a://bucket-name/file.csv or bucket-name/file.csv
     :param header: Whether the CSV file has a header. Default is True.
     :param sep: The delimiter to use. If not provided, the function will try to detect it.
     :param kwargs: Additional arguments to pass to spark.read.csv.
@@ -169,7 +169,7 @@ def read_csv(
     bucket, key = path.replace("s3a://", "").split("/", 1)
     obj = client.get_object(bucket, key)
     sample = obj.read(1024).decode()
-    sep = _detect_delimiter(sample)
+    sep = _detect_delimiter(sample)  # If _detect_delimiter returns None, spark.read.csv will use the default delimiter ','
     print(f"Detected delimiter: {sep}")


     df = spark.read.csv(path, header=header, sep=sep, **kwargs)


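The revised docstring notes that both the s3a://bucket-name/file.csv and bucket-name/file.csv forms are accepted; the bucket/key split shown in the diff handles both. A quick illustration (the function name here is hypothetical, wrapping the inline logic from the diff):

```python
def split_minio_path(path: str):
    """Split a minIO path into (bucket, key), tolerating an s3a:// prefix.

    Hypothetical wrapper around the inline logic shown in the diff. Note that
    a path with no '/' after the bucket would raise ValueError on unpacking.
    """
    bucket, key = path.replace("s3a://", "").split("/", 1)
    return bucket, key


print(split_minio_path("s3a://bucket-name/file.csv"))  # ('bucket-name', 'file.csv')
print(split_minio_path("bucket-name/dir/file.csv"))    # ('bucket-name', 'dir/file.csv')
```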