Update compare_df to display the diff report on column differences not just rows (#2040)

Prevents false positives when columns are missing from the output.
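To show how a row-only check produces a false positive, here is a minimal stdlib-only sketch. It is not the Morpheus implementation, which works on pandas DataFrames. For illustration it assumes that each table is a dict mapping a column name to a list of values, and that rows are compared only on the columns both tables share. `count_diff_rows` is a hypothetical helper.

```python
def count_diff_rows(expected: dict[str, list], actual: dict[str, list]) -> int:
    """Count rows that differ, comparing only the columns present in both tables.

    Hypothetical helper: assumes `actual` has at least as many rows as `expected`.
    """
    shared = [col for col in expected if col in actual]
    n_rows = len(next(iter(expected.values()), []))
    diff = 0
    for i in range(n_rows):
        # A row counts as different if any shared column disagrees.
        if any(expected[col][i] != actual[col][i] for col in shared):
            diff += 1
    return diff


expected = {"id": [1, 2, 3], "score": [0.1, 0.5, 0.9], "label": ["a", "b", "c"]}
actual = {"id": [1, 2, 3], "score": [0.1, 0.5, 0.9]}  # the "label" column is missing

diff_rows = count_diff_rows(expected, actual)
missing_cols = [col for col in expected if col not in actual]

print(diff_rows)     # 0: a row-only check would report a match
print(missing_cols)  # ['label']: the column difference it overlooks
```

Because the missing column never takes part in the row comparison, only an explicit column count can catch it.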

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #2040
dagardner-nv authored Nov 20, 2024
1 parent 8d8cb01 commit 2bb3859
Showing 2 changed files with 4 additions and 3 deletions.
5 changes: 3 additions & 2 deletions python/morpheus/morpheus/utils/compare_df.py
```diff
@@ -130,6 +130,7 @@ def compare_df(df_a: pd.DataFrame,
 
     total_rows = len(df_a_filtered)
     diff_rows = len(df_a_filtered) - int(comparison.count_matching_rows())
+    diff_cols = len(extra_columns) + len(missing_columns)
 
     if (comparison.matches()):
         logger.info("Results match validation dataset")
@@ -141,7 +142,7 @@ def compare_df(df_a: pd.DataFrame,
 
         mismatch_df = merged.loc[mismatched_idx]
 
-        if diff_rows > 0:
+        if diff_rows > 0 or diff_cols > 0:
             logger.debug("Results do not match. Diff %d/%d (%f %%). First 10 mismatched rows:",
                          diff_rows,
                          total_rows,
@@ -160,5 +161,5 @@ def compare_df(df_a: pd.DataFrame,
         "matching_cols": list(same_columns),
         "extra_cols": list(extra_columns),
         "missing_cols": list(missing_columns),
-        "diff_cols": len(extra_columns) + len(missing_columns)
+        "diff_cols": diff_cols
     }
```
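The change to `compare_df` can be sketched with standard Python only. This is not Morpheus code. `summarize` is a hypothetical helper that takes the two column-name sets and the row counts. Like the patched function, it computes `diff_cols` once, reports a mismatch when either count is non-zero, and reuses the same value in the result dict. Which side counts as "extra" and which as "missing" is an assumption made for the sketch. The lists are sorted only so the output is deterministic.

```python
def summarize(expected_cols: set[str], actual_cols: set[str],
              total_rows: int, matching_rows: int) -> dict:
    """Build a compare_df-style result dict (hypothetical helper, not the Morpheus API)."""
    same_columns = expected_cols & actual_cols
    # Assumption: "extra" = only in the actual output, "missing" = only in the expected data.
    extra_columns = actual_cols - expected_cols
    missing_columns = expected_cols - actual_cols

    diff_rows = total_rows - matching_rows
    diff_cols = len(extra_columns) + len(missing_columns)  # computed once, reused below

    # Report column differences as well as row differences.
    if diff_rows > 0 or diff_cols > 0:
        print(f"Results do not match: {diff_rows} row(s), {diff_cols} column(s) differ")

    return {
        "total_rows": total_rows,
        "diff_rows": diff_rows,
        "matching_cols": sorted(same_columns),
        "extra_cols": sorted(extra_columns),
        "missing_cols": sorted(missing_columns),
        "diff_cols": diff_cols,
    }


result = summarize({"id", "score", "label"}, {"id", "score"}, total_rows=3, matching_rows=3)
print(result["diff_rows"], result["diff_cols"], result["missing_cols"])  # 0 1 ['label']
```

Every row matches here, yet the summary still flags one differing column. Before the change, that case produced no report.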
2 changes: 1 addition & 1 deletion scripts/compare_data_files.py
```diff
@@ -66,7 +66,7 @@ def main():
         abs_tol=args.abs_tol,
         rel_tol=args.rel_tol)
 
-    if results['diff_rows'] > 0:
+    if results['diff_rows'] > 0 or results['diff_cols'] > 0:
         sys.exit(1)
 
 
```
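On the script side, the change amounts to a new exit-status rule. Here is a minimal sketch that assumes a results dict with the `diff_rows` and `diff_cols` keys the script reads. `exit_status` is a hypothetical name. The real script calls `sys.exit(1)` directly inside `main()`.

```python
def exit_status(results: dict) -> int:
    """Return the process exit code: 1 if any row or column differs, else 0."""
    if results["diff_rows"] > 0 or results["diff_cols"] > 0:
        return 1
    return 0


# A column-only difference now fails the check; the old row-only rule returned 0 here.
print(exit_status({"diff_rows": 0, "diff_cols": 1}))  # 1
print(exit_status({"diff_rows": 0, "diff_cols": 0}))  # 0
```

In a script, the return value would be passed to `sys.exit()` so that CI treats any row or column difference as a failure.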