Reading and validating inputs #2

Open
JMMackenzie opened this issue Nov 22, 2024 · 1 comment

Comments

@JMMackenzie
Contributor

Extend the code in utils.py to validate inputs, and potentially to handle containers of inputs (for example, mapping a TREC run file to a list of lists or a dictionary of lists).
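As a rough sketch of the "dictionary of lists" option, assuming the usual six-column TREC run layout (`qid Q0 docid rank score tag`). The names `RunEntry` and `parse_trec_run` are hypothetical and not taken from utils.py:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class RunEntry(NamedTuple):
    """One line of a run file (hypothetical container, not from utils.py)."""
    docid: str
    rank: int
    score: float


def parse_trec_run(lines: Iterable[str]) -> Dict[str, List[RunEntry]]:
    """Group run lines by topic id, preserving file order within each topic."""
    run: Dict[str, List[RunEntry]] = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(
                f"line {lineno}: expected 6 fields, got {len(fields)}"
            )
        qid, _q0, docid, rank, score, _tag = fields
        run[qid].append(RunEntry(docid, int(rank), float(score)))
    return dict(run)
```

A file can be passed directly, since an open text file iterates over its lines: `with open(path) as f: run = parse_trec_run(f)`. A list of lists falls out of this as `list(run.values())`.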

Validation needs to ensure that:

  • Document ranks are obeyed: ranks are assumed to be non-decreasing. Tied elements share a rank, and the gap that follows a tied group is allowed. For example, a sequence of ranks like [1, 1, 1, 4, 5] should be valid.
  • Scores are used only as a diagnostic against the provided ranks. If the ranks and scores do not agree, either warn or raise an error.
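The two checks above might look something like the following. This is only a sketch: `validate_ranks` and `check_scores` are hypothetical names, and "agreement" is assumed here to mean that scores never increase as we move down the ranking. The issue does not pin that definition down, and this version does not check that scores within a tied group are equal.

```python
import warnings
from typing import Sequence


def validate_ranks(ranks: Sequence[int]) -> None:
    """Raise ValueError if ranks ever decrease; ties and gaps are accepted."""
    for i in range(1, len(ranks)):
        if ranks[i] < ranks[i - 1]:
            raise ValueError(
                f"rank {ranks[i]} at position {i} follows rank {ranks[i - 1]}"
            )


def check_scores(
    ranks: Sequence[int], scores: Sequence[float], strict: bool = False
) -> bool:
    """Diagnostic only: report whether scores agree with the given ranks.

    Assumption: a later item must not have a higher score than the item
    before it. With strict=True a disagreement raises; otherwise it warns.
    """
    if len(ranks) != len(scores):
        raise ValueError("ranks and scores differ in length")
    validate_ranks(ranks)
    for i in range(1, len(scores)):
        if scores[i] > scores[i - 1]:
            msg = (
                f"score {scores[i]} at rank {ranks[i]} exceeds "
                f"score {scores[i - 1]} at rank {ranks[i - 1]}"
            )
            if strict:
                raise ValueError(msg)
            warnings.warn(msg)
            return False
    return True
```

The `strict` flag is one way to leave the "warn or error" choice to the caller; the ranks themselves are always treated as authoritative.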
@JMMackenzie
Contributor Author

Input file types need to be coerced into the appropriate type given the metric of choice.

Rankings

  • If a qrel file is to be treated as a ranking, we can format it into an RBRanking with the highest grade as the first (tied) group, the next-highest grade as the second group, and so on.
  • A TREC run is naturally a ranking.
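To illustrate the qrel-as-ranking idea, here is a minimal sketch that buckets one topic's judgments into tied groups, highest grade first. The function name and the plain list-of-sets output are assumptions; how those groups are then handed to RBRanking is left open, since its constructor is not described here.

```python
from typing import Dict, List, Set


def qrels_to_tied_groups(grades: Dict[str, int]) -> List[Set[str]]:
    """Return tied groups of docids, ordered from highest grade to lowest.

    `grades` maps docid -> relevance grade for a single topic.
    Every grade is kept, including 0; whether non-relevant documents
    should form a final group or be dropped is an open choice.
    """
    by_grade: Dict[int, Set[str]] = {}
    for docid, grade in grades.items():
        by_grade.setdefault(grade, set()).add(docid)
    return [by_grade[g] for g in sorted(by_grade, reverse=True)]
```

For example, grades `{"d1": 2, "d2": 0, "d3": 2, "d4": 1}` yield three groups: `{"d1", "d3"}`, then `{"d4"}`, then `{"d2"}`.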

Sets

  • A qrel file is naturally a set in some sense, but we will convert it to positive/negative sets according to a grade cutoff.
  • A TREC run is treated as all positive; anything not in the run is implicitly negative.
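The set conversions could be sketched as below. Both function names are hypothetical, and treating `grade >= cutoff` as positive with a default cutoff of 1 is an assumption; the issue only says "some cutoff".

```python
from typing import Dict, Iterable, Set, Tuple


def qrels_to_sets(
    grades: Dict[str, int], cutoff: int = 1
) -> Tuple[Set[str], Set[str]]:
    """Split one topic's judged docids into (positive, negative) sets.

    Assumption: a document is positive when its grade is >= cutoff.
    """
    positive = {docid for docid, grade in grades.items() if grade >= cutoff}
    negative = set(grades) - positive
    return positive, negative


def run_to_positive_set(docids: Iterable[str]) -> Set[str]:
    """Treat every retrieved docid as positive; negatives stay implicit."""
    return set(docids)
```

Note the asymmetry: the qrel conversion produces an explicit negative set, while for a run the negatives are simply everything not retrieved and are never materialized.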
