This repository has been archived by the owner on Oct 5, 2023. It is now read-only.

Metadata: More flexible specification of metadata type #873

Open
neubig opened this issue Jun 23, 2023 · 0 comments


neubig commented Jun 23, 2023

Currently, string-based metadata is treated as nominal if it has 20 or fewer unique values, and as text if it has 21 or more:

zeno/zeno/util.py, lines 46 to 47 in 808f4b2:

if len(unique) < 21:
    return MetadataType.NOMINAL

This can be limiting. For example, I have a use case where I want to run slice finder on clusters found from a text clustering algorithm. As-is, this means that I am limited to 20 or fewer clusters, which is probably not granular enough to make these clusters meaningful.

One possible design would be to let distill functions (optionally) specify the metadata type, like this:

return DistillReturn(distill_output=document_clusters, metadata_type=MetadataType.NOMINAL)

If no type is specified, we could fall back to the current behavior (though that behavior should probably also be documented somewhere).
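
To make the proposal concrete, here is a rough, self-contained sketch of how the override and the fallback could fit together. It does not import zeno: the `MetadataType` and `DistillReturn` classes below are simplified stand-ins, the `TEXT` member name just follows the wording in this issue, and `infer_string_type` and `resolve_metadata_type` are hypothetical helper names, not existing zeno functions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional, Sequence


class MetadataType(Enum):
    # Stand-in for zeno's MetadataType; only the members needed for this sketch.
    NOMINAL = "nominal"
    TEXT = "text"  # assumed name, following the wording of this issue


@dataclass
class DistillReturn:
    # Stand-in for zeno's DistillReturn, extended with the proposed optional field.
    distill_output: Sequence[Any]
    metadata_type: Optional[MetadataType] = None


NOMINAL_MAX_UNIQUE = 20  # same cutoff as the `len(unique) < 21` check in util.py


def infer_string_type(values: Sequence[str]) -> MetadataType:
    """Current heuristic for string columns, as described in this issue."""
    if len(set(values)) <= NOMINAL_MAX_UNIQUE:
        return MetadataType.NOMINAL
    return MetadataType.TEXT


def resolve_metadata_type(ret: DistillReturn) -> MetadataType:
    """Prefer the explicitly declared type; otherwise fall back to inference."""
    if ret.metadata_type is not None:
        return ret.metadata_type
    return infer_string_type(ret.distill_output)


# 50 distinct cluster labels: too many for the heuristic, fine when declared.
document_clusters = [f"cluster_{i % 50}" for i in range(200)]
inferred = resolve_metadata_type(DistillReturn(distill_output=document_clusters))
declared = resolve_metadata_type(
    DistillReturn(distill_output=document_clusters, metadata_type=MetadataType.NOMINAL)
)
```

With this shape, existing distill functions that never set `metadata_type` would keep today's behavior, so the change would be backward compatible.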
