
adding better '<fields>' info with schema parser #327

Closed

Conversation


@LeoGrosjean LeoGrosjean commented Oct 28, 2023

  • Adding support for type
    "<value|type=string>"
  • Adding support for description
    "<value|type=string|description=a description>"
  • Adding support for Enum
    "<["value_1", ..., "value_n"]|type=string>"
  • Adding support for List[Any]
    ["<value|type=string>"]
  • Adding support for Format (replaces type; TBD whether it's a good idea)
    ["<value|format=any>"]
  • Adding a fix for nested Objects

The test prompt has been updated.
Many tests were run with OpenAI models, all ending with being able to instantiate the custom model:

prompt = prompt_func(NestedModel)
NestedModel(**json.loads(model(prompt)))
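The placeholder grammar in the bullets above can be sketched as a small parser. This is a hypothetical helper, not the PR's actual code, and it leaves out the Enum, list and nested-object forms for brevity:

```python
def parse_placeholder(token: str) -> dict:
    """Parse a '<value|key=val|...>' placeholder into its attributes.

    Hypothetical sketch of the grammar described above; the PR's
    real parser also handles enums, lists and nested objects.
    """
    inner = token.strip()
    if inner.startswith("<") and inner.endswith(">"):
        inner = inner[1:-1]
    parts = inner.split("|")
    attrs = {"value": parts[0]}
    for part in parts[1:]:
        key, _, val = part.partition("=")
        attrs[key] = val
    return attrs

# parse_placeholder("<value|type=string|description=a description>")
# -> {'value': 'value', 'type': 'string', 'description': 'a description'}
```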

TODO :

  • Add support for const

    class TemplateValueTask(str, Enum):
        task = "Tâche"

    # 'TemplateValueTask': {'const': 'Tâche',
    #  'title': 'TemplateValueTask',
    #  'type': 'string'}

    class TemplateValueTask(BaseModel):
        task: Literal["Tâche"]
    
    # {'properties': {'task': {'const': 'Tâche', 'title': 'Task'}},
    # 'required': ['task'],
    # 'title': 'TemplateValueTask',
    # 'type': 'object'}
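For reference, the second schema comment above can be reproduced with pydantic v2's model_json_schema() and a Literal field, which is the const case the parser would need to support:

```python
from typing import Literal
from pydantic import BaseModel

class TemplateValueTask(BaseModel):
    task: Literal["Tâche"]

schema = TemplateValueTask.model_json_schema()
# Pydantic v2 encodes a single-value Literal as a JSON Schema 'const',
# matching the commented schema above.
print(schema["properties"]["task"]["const"])  # Tâche
```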

@LeoGrosjean
Author

It was broken, so I monkey patched it (TODO added).
The use of try/except is a bit ugly, but it does the job until a refactor.

@LeoGrosjean
Author

For long schemas, OpenAI returns at most 1000 characters, so it is easy to ask ChatGPT to finish the JSON.

I will add a way to let the user set the maximum number of API calls that may be made to complete the JSON response.
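A minimal sketch of that cap, assuming a `complete` callable that wraps the model API and returns one text chunk per call (both `complete` and `generate_json` are hypothetical names, not the PR's API):

```python
import json

def generate_json(complete, prompt: str, max_calls: int = 3) -> dict:
    """Accumulate model output across up to `max_calls` calls,
    asking the model to continue until the JSON parses."""
    text = ""
    for _ in range(max_calls):
        if not text:
            text = complete(prompt)
        else:
            text += complete(f"Continue this JSON exactly where it stops:\n{text}")
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue  # truncated or invalid so far; ask for more
    raise ValueError(f"no valid JSON after {max_calls} calls")
```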

It's late, but I can detail this more.

Do you want another pull request for it?

@brandonwillard
Member

@LeoGrosjean, thanks for the contribution! We'll follow up shortly with some questions and review comments.

@brandonwillard brandonwillard linked an issue Oct 29, 2023 that may be closed by this pull request
@brandonwillard brandonwillard marked this pull request as draft October 29, 2023 21:24
@LeoGrosjean
Author

LeoGrosjean commented Oct 30, 2023

With an extra-large schema, the output can be plain text rather than JSON.

This occurred twice with a deliberately confusing prompt (bad field descriptions plus text unrelated to the schema), using gpt-3.5-turbo.

It could be caught by checking whether outputs[0] is not in ["{", "["].

OUT OF CONTEXT, to be discussed elsewhere; it might be due to a syntax issue.
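The check mentioned above could be a cheap prefix guard along these lines (`looks_like_json` is a hypothetical helper name):

```python
def looks_like_json(output: str) -> bool:
    """Return True if the model output starts like a JSON object or
    array; a cheap guard, not a full validation."""
    stripped = output.lstrip()
    return bool(stripped) and stripped[0] in ("{", "[")

# looks_like_json('{"task": "Tâche"}')      -> True
# looks_like_json("Sure, here is the JSON") -> False
```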

@LeoGrosjean
Author

I'm currently reading the JSON Schema conventions that pydantic is based on.

I'll find a cleaner way to parse the schema we get from the model_dump_json method.

I'm currently rushing and monkey patching it for personal needs, but it works decently.
The same walking logic could be used elsewhere in Outlines.

I will post here a mermaid graph of the workflow I figured out:

flowchart TD
    A[Model] -->|model_dump_json| B(raw schema)
    value[value]
    type[type]
    format[format]
    description[description]
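The walk sketched in the flowchart could look like the following recursive visit of the raw schema. This is illustrative code: `walk_schema` and the `visit` callback are hypothetical names, and the keyword arguments simply mirror the graph's nodes (value, type, format, description):

```python
def walk_schema(schema: dict, visit) -> None:
    """Visit every property in a JSON schema, handing the node names
    from the flowchart (value, type, format, description) to `visit`,
    and recursing into nested objects."""
    for name, prop in schema.get("properties", {}).items():
        visit(value=name,
              type=prop.get("type"),
              format=prop.get("format"),
              description=prop.get("description"))
        if prop.get("type") == "object":
            walk_schema(prop, visit)  # handle nested Objects
```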

@rlouf
Member

rlouf commented Dec 19, 2023

Any updates on this?

@LeoGrosjean
Author

The kid has arrived! The code is on my other computer; I need to find the time to push it!

But I didn't take the time to update the mermaid graph.

I will do it this week, between two diapers :)

@LeoGrosjean
Author

  • feature ready
  • code quality (might need a review/peer if anyone is available)
  • writing tests for the condition features
  • real-life use case with any transformer

Successfully merging this pull request may close these issues.

Prompting: response_model | schema does not work with Enum