Commit

specification in readme and bugfix in
piconti committed Aug 6, 2024
1 parent 9208421 commit 75f5bb2
Showing 2 changed files with 11 additions and 1 deletion.
2 changes: 2 additions & 0 deletions README.md
@@ -88,6 +88,8 @@ To do so, some simple modifications should be made to the process' code:
- Example instantiation:

```python
from impresso_commons.versioning.data_manifest import DataManifest

manifest = DataManifest(
data_stage="passim", # DataStage.PASSIM also accepted
s3_output_bucket="32-passim-rebuilt-final/passim", # includes partition within bucket
```
10 changes: 9 additions & 1 deletion impresso_commons/versioning/helpers.py
@@ -927,7 +927,15 @@ def compute_stats_in_entities_bag(
"content_items_out": 1,
"ne_mentions": len(ci["nes"]),
"ne_entities": sorted(
-    list(set([m["wkd_id"] for m in ci["nes"] if m["wkd_id"] != "NIL"]))
+    list(
+        set(
+            [
+                m["wkd_id"]
+                for m in ci["nes"]
+                if "wkd_id" in m and m["wkd_id"] not in ["NIL", None]
+            ]
+        )
+    )
), # sorted list to ensure all are the same
}
)
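
The effect of the new filter can be illustrated in isolation. The following is a minimal sketch, not the library's code: the helper name `extract_wikidata_ids`, the `surface` field, and the sample content item are hypothetical and chosen only for illustration. It assumes, as the hunk suggests, that each mention is a dict that may lack `wkd_id` or carry `"NIL"` or `None` for unlinked entities.

```python
def extract_wikidata_ids(mentions):
    """Return the sorted, de-duplicated Wikidata IDs of linked mentions.

    Hypothetical helper mirroring the filter in the hunk above: mentions
    without a "wkd_id" key, or whose value is "NIL" or None, are skipped.
    """
    return sorted(
        {
            m["wkd_id"]
            for m in mentions
            if "wkd_id" in m and m["wkd_id"] not in ("NIL", None)
        }
    )


# Hypothetical content item; only the "nes" / "wkd_id" keys come from the diff.
ci = {
    "nes": [
        {"surface": "Lausanne", "wkd_id": "Q807"},
        {"surface": "Genève", "wkd_id": "Q71"},
        {"surface": "Lausanne", "wkd_id": "Q807"},  # duplicate, counted once
        {"surface": "Dupont", "wkd_id": "NIL"},     # unlinked, skipped
        {"surface": "Durand", "wkd_id": None},      # null ID, skipped
        {"surface": "Martin"},                      # key missing, skipped
    ]
}

ids = extract_wikidata_ids(ci["nes"])
print(ids)  # ['Q71', 'Q807']
```

With the previous condition (`m["wkd_id"] != "NIL"`), the last mention would raise a `KeyError`, and the `None` value would pass the filter and then make `sorted` fail with a `TypeError` when compared against the string IDs.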