[GSProcessing] Fix parsing of file paths that start with "./" (#1091)
*Issue #, if available:*

*Description of changes:*

* Some GConstruct or GSProcessing config files from customers have
file paths that start with `./`. These work when the input is local but
break on S3. We fix this by removing the `./` prefix, if present, when
parsing the input files.
* Change the config we use for testing to verify that the fix works
(although we'd probably need a more S3-specific test).
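
To illustrate the failure mode: S3 object keys are literal strings, so a `./` segment is not collapsed the way a local filesystem would collapse it. The sketch below is not the GSProcessing code. The helper names `strip_dot_slash` and `join_prefix` and the bucket prefix are hypothetical, and `posixpath.join` is used here only to keep the output platform-independent (the commit itself uses `os.path.join`).

```python
import posixpath


def strip_dot_slash(files):
    """Return a copy of `files` with one leading './' removed from each path."""
    return [f[2:] if f.startswith("./") else f for f in files]


def join_prefix(prefix, files):
    """Join each relative path onto the input prefix using '/' separators."""
    return [posixpath.join(prefix, f) for f in files]


# Hypothetical S3 prefix, for illustration only.
prefix = "s3://my-bucket/input"
raw_files = ["./nodes/movie.csv", "edges/movie-included_in-genre.csv"]

# Without stripping, the './' ends up inside the object key.
naive_paths = [f"{prefix}/{f}" for f in raw_files]

# With stripping, both entries resolve to clean keys.
clean_paths = join_prefix(prefix, strip_dot_slash(raw_files))

print(naive_paths[0])
print(clean_paths[0])
```

The first printed path still contains `/./`, which would not match an object stored under `input/nodes/movie.csv`. The second does not.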

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
thvasilo authored Nov 14, 2024
1 parent d0873f1 commit f5cf632
Showing 3 changed files with 9 additions and 5 deletions.
@@ -15,7 +15,7 @@
 """

 from dataclasses import dataclass
-from typing import Sequence, Optional
+from typing import Optional

 from graphstorm_processing.constants import SUPPORTED_FILE_TYPES

@@ -27,7 +27,7 @@ class DataStorageConfig:
     """

     format: str
-    files: Sequence[str]
+    files: list[str]
     separator: Optional[str] = None

     def __post_init__(self):
@@ -39,3 +39,7 @@ def __post_init__(self):
                 raise ValueError(
                     f"File paths need to be relative (not starting with '/'), got : {file}"
                 )
+
+        for idx, file in enumerate(self.files):
+            if file.startswith("./"):
+                self.files[idx] = file[2:]

@@ -939,7 +939,7 @@ def process_node_data(self, node_configs: Sequence[NodeConfig]) -> Dict:
         self.graph_info["ntype_to_label_masks"] = defaultdict(list)
         for node_config in node_configs:
             files = node_config.files
-            file_paths = [f"{self.input_prefix}/{f}" for f in files]
+            file_paths = [os.path.join(self.input_prefix, f) for f in files]

             node_type = node_config.ntype
             node_col = node_config.node_col

@@ -17,7 +17,7 @@
 "data": {
     "format": "csv",
     "files": [
-        "nodes/movie.csv"
+        "./nodes/movie.csv"
     ],
     "separator": ","
 },
@@ -104,7 +104,7 @@
 "data": {
     "format": "csv",
     "files": [
-        "edges/movie-included_in-genre.csv"
+        "./edges/movie-included_in-genre.csv"
     ],
     "separator": ","
 },
