Features/registry #162

steffencruz · 2024-03-17T22:26:52Z

The goal of this PR is to introduce a registry data structure to manage both tasks and datasets.

Adds a task and dataset registry to reduce direct references to task and dataset classes. This simplifies the management of tasks and datasets, including exposing and disabling them. It also streamlines task development and review by reducing the number of files changed in Task PRs.
A further design improvement proposed in this PR enables a many-to-one mapping of datasets to tasks with a controllable way of selecting specific mappings.
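
To make the description above concrete, here is a minimal sketch of a registry with a many-to-one dataset-to-task mapping and a pluggable selector. It is not the code from this PR: the names `TASKS` and `DATASETS` appear in the commit messages, but their contents, the `TASK_REGISTRY` mapping, the `create_task` helper, and all of the classes are hypothetical stand-ins.

```python
import random
from typing import Callable, Dict, List, Sequence


# Hypothetical stand-in dataset classes; each exposes a `name` attribute.
class WikiDataset:
    name = "wiki"


class MockDataset:
    name = "mock_dataset"


# Hypothetical stand-in task classes; each task is built from one dataset.
class QATask:
    name = "qa"

    def __init__(self, dataset: object) -> None:
        self.dataset = dataset


class SummarizationTask:
    name = "summarization"

    def __init__(self, dataset: object) -> None:
        self.dataset = dataset


# Registries keyed by class.name rather than hardcoded strings.
TASKS: Dict[str, type] = {cls.name: cls for cls in (QATask, SummarizationTask)}
DATASETS: Dict[str, type] = {cls.name: cls for cls in (WikiDataset, MockDataset)}

# Many-to-one mapping: several datasets may serve the same task.
TASK_REGISTRY: Dict[str, List[str]] = {
    QATask.name: [WikiDataset.name, MockDataset.name],
    SummarizationTask.name: [WikiDataset.name],
}


def create_task(
    task_name: str,
    selector: Callable[[Sequence[str]], str] = random.choice,
) -> object:
    """Pick a dataset for the task with `selector`, then build the task."""
    if task_name not in TASKS:
        raise ValueError(f"Unknown task: {task_name!r}")
    dataset_name = selector(TASK_REGISTRY[task_name])
    dataset = DATASETS[dataset_name]()  # instantiate the dataset first
    return TASKS[task_name](dataset)
```

Passing a deterministic selector such as `lambda names: names[0]` gives reproducible behaviour in tests, while the default `random.choice` samples uniformly among the datasets registered for the task. Disabling a task or dataset in this sketch amounts to removing its entry from the relevant dictionary.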

p-ferreira · 2024-03-18T13:53:47Z

The way I see it, a Task depends on a Dataset to be created and a Reward model to be evaluated. Imho a config file could be a better place for those configurations than raw Python code: the validator would load the definition of the "task framework" into generic code that knows how to handle it.
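
For illustration, the config-driven alternative suggested here could look roughly like the sketch below. Everything in it is an assumption: the JSON layout, the key names `datasets` and `reward_models`, the example task and reward model names, and the `load_task_definitions` helper are hypothetical and are not part of this PR or the repository.

```python
import json
from typing import Dict, Iterable, List

# Hypothetical config contents; in practice this would live in a file.
CONFIG_TEXT = """
{
  "tasks": {
    "qa": {"datasets": ["wiki", "mock_dataset"], "reward_models": ["rouge"]},
    "summarization": {"datasets": ["wiki"], "reward_models": ["rouge", "relevance"]}
  }
}
"""


def load_task_definitions(
    text: str,
    known_datasets: Iterable[str],
    known_reward_models: Iterable[str],
) -> Dict[str, Dict[str, List[str]]]:
    """Parse the config and check that every referenced name is known."""
    datasets = set(known_datasets)
    reward_models = set(known_reward_models)
    definitions: Dict[str, Dict[str, List[str]]] = {}
    for task_name, spec in json.loads(text)["tasks"].items():
        missing = set(spec["datasets"]) - datasets
        if missing:
            raise ValueError(f"{task_name}: unknown datasets {sorted(missing)}")
        missing = set(spec["reward_models"]) - reward_models
        if missing:
            raise ValueError(f"{task_name}: unknown reward models {sorted(missing)}")
        definitions[task_name] = {
            "datasets": list(spec["datasets"]),
            "reward_models": list(spec["reward_models"]),
        }
    return definitions


definitions = load_task_definitions(
    CONFIG_TEXT,
    known_datasets=["wiki", "mock_dataset"],
    known_reward_models=["rouge", "relevance"],
)
```

In this sketch the config relates each task to both its datasets and its reward models, and the generic loader only validates names and returns plain dictionaries for the rest of the code to act on.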

prompting/__init__.py

prompting/task_registry.py

p-ferreira

Besides minor logging and formatting changes, I believe this approach is less user-friendly than relying on configs. This approach also doesn't relate reward models to the task to be performed, which could be something to address in the future. In any case, it seems to be functional. If it's currently working on prod and we have tests for it, as appears to be the case, we should be fine moving on with this approach.

prompting/__init__.py

prompting/tasks/mock.py

prompting/tools/__init__.py

tests/test_registry.py

steffencruz added 4 commits March 17, 2024 16:20

Add dataset registry

c8404cd

Add task-dataset registry, which will enable multi-dataset tasks

d285e08

Use registry for task creation

21beebc

Add arbitrary selector for improved control

58bdbd8

steffencruz marked this pull request as draft March 17, 2024 22:27

bkb2135 changed the base branch from main to pre-staging April 8, 2024 14:37

bkb2135 and others added 8 commits April 8, 2024 10:40

Merge branch 'pre-staging' into features/registry

ed2d07b

Resolve circular imports

35878b0

Resolve circular imports in prompting/conversation.py

35a53b1

Create separate task_registry file

df1885d

Remove llm import from init

dfb65a2

Import TASKS and DATASETS

cd9f592

Add Mock Task to Registry

dc00d3b

Add registry unit tests

a1eb07b

bkb2135 marked this pull request as ready for review April 8, 2024 16:20

Instantiate the datasets before task creation

b807ccc

steffencruz commented Apr 10, 2024

prompting/__init__.py

prompting/task_registry.py

bkb2135 and others added 4 commits April 10, 2024 14:04

Use class.name rather than hardcoded strings

6f2dda4

Rename date_qa

1fb74fe

Update Registry

8cb53bd

Update qa name

9aa843e

p-ferreira approved these changes Apr 15, 2024

prompting/__init__.py

prompting/tasks/mock.py

prompting/tools/__init__.py

tests/test_registry.py

bkb2135 added 2 commits April 15, 2024 12:41

Remove print statements

a41ff1a

Update test_registry.py

f06994a

p-ferreira approved these changes Apr 15, 2024

bkb2135 merged commit be61a87 into pre-staging Apr 15, 2024
3 checks passed

Hollyqui deleted the features/registry branch August 2, 2024 08:22
