Translate Pydantic models into a regular expression that accepts the corresponding YAML #923

rlouf · 2024-05-27T16:30:18Z

No description provided.

lapp0 · 2024-05-29T07:13:13Z

I love the idea. YAML uses fewer syntactic tokens and lets language models generate without having to keep track of as much "nesting" / context.

Here's what I'm thinking for a strategy, would love to hear your thoughts:

We should refactor fsm/json_schema.py so it uses a class-based approach with handler methods for each type. Then we can subclass to implement the different behavior in yaml.

class JSONSchemaRegexGenerator:
    def __init__(self):
        self.handlers = {
            "string": self.handle_string,
            "array": self.handle_array,
            # ... one entry per remaining JSON Schema type
        }

    @classmethod
    def get_pattern(cls, schema):
        return cls().handle_node(schema)

    def handle_node(self, node):
        # Dispatch on the node's "type"; unknown types fall back to handle_default.
        handler = self.handlers.get(node["type"], self.handle_default)
        return handler(node)

    def handle_string(self, node):
        return STRING

    def handle_array(self, node):
        ...
        return rf"\[{whitespace_pattern}(({'|'.join(regexes)})(,{whitespace_pattern}({'|'.join(regexes)})){num_repeats}){allow_empty}{whitespace_pattern}\]"


class YAMLSchemaRegexGenerator(JSONSchemaRegexGenerator):
    def handle_array(self, node):
        """handle format for yaml arrays:
            - elem0
            - elem1
        """
        ...

This would make the code more readable and extensible, reduce technical debt, and remove the need for conditional handling of a passed is_yaml flag in many of the rules within to_regex().
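
To make the idea concrete, here is a minimal runnable sketch of the handler-based design, assuming deliberately simplified patterns. STRING and INTEGER below are stand-ins written for this example, not the constants from fsm/json_schema.py, and handle_integer and handle_default are hypothetical helpers added so the snippet runs on its own.

```python
import re

# Simplified stand-in patterns (assumptions for illustration only).
STRING = r'"[^"\\\n]*"'
INTEGER = r"-?(0|[1-9][0-9]*)"


class JSONSchemaRegexGenerator:
    def __init__(self):
        # Bound methods are looked up on self, so a subclass override
        # (e.g. handle_array) is picked up automatically.
        self.handlers = {
            "string": self.handle_string,
            "integer": self.handle_integer,
            "array": self.handle_array,
        }

    @classmethod
    def get_pattern(cls, schema):
        return cls().handle_node(schema)

    def handle_node(self, node):
        handler = self.handlers.get(node["type"], self.handle_default)
        return handler(node)

    def handle_default(self, node):
        raise NotImplementedError(f"unsupported type: {node['type']!r}")

    def handle_string(self, node):
        return STRING

    def handle_integer(self, node):
        return INTEGER

    def handle_array(self, node):
        item = self.handle_node(node["items"])
        # JSON flow style: [a, b, c]; the empty array is allowed.
        return rf"\[(({item})(, ?({item}))*)?\]"


class YAMLSchemaRegexGenerator(JSONSchemaRegexGenerator):
    def handle_array(self, node):
        item = self.handle_node(node["items"])
        # YAML block style: one "- item" per line, at least one item.
        return rf"(- ({item})\n)+"


schema = {"type": "array", "items": {"type": "integer"}}
json_re = JSONSchemaRegexGenerator.get_pattern(schema)
yaml_re = YAMLSchemaRegexGenerator.get_pattern(schema)

print(bool(re.fullmatch(json_re, "[1, 2, 3]")))
print(bool(re.fullmatch(yaml_re, "- 1\n- 2\n- 3\n")))
```

Only handle_array differs between the two classes; the dispatch logic and the primitive handlers are shared. This sketch ignores indentation, so nested YAML arrays would need extra state (for example a current indent depth) that is not modeled here.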

rlouf · 2024-06-05T12:17:51Z

I can get on board with this. To follow ast.NodeVisitor's naming scheme we could name the handlers visit_X. I think we should implement a first version of the YAML converter with only a few primitives before refactoring.
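
For reference, a rough sketch of what visit_X dispatch could look like, mirroring how ast.NodeVisitor.visit resolves a method by name with getattr and falls back to generic_visit. The class names SchemaVisitor and YAMLPrimitiveVisitor and the three primitive patterns are assumptions made up for this example, not existing Outlines code.

```python
import re


class SchemaVisitor:
    """Dispatch on node["type"], as ast.NodeVisitor dispatches on the class name."""

    def visit(self, node):
        # Look up visit_<type>; fall back to generic_visit if it is missing.
        method = getattr(self, "visit_" + node["type"], self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        raise NotImplementedError(f"no visitor for type {node['type']!r}")


class YAMLPrimitiveVisitor(SchemaVisitor):
    # A first version covering only a few primitives.
    def visit_boolean(self, node):
        return r"(true|false)"

    def visit_integer(self, node):
        return r"-?(0|[1-9][0-9]*)"

    def visit_null(self, node):
        return r"null"


visitor = YAMLPrimitiveVisitor()
bool_re = visitor.visit({"type": "boolean"})
int_re = visitor.visit({"type": "integer"})
print(bool(re.fullmatch(int_re, "-42")))
```

With name-based lookup there is no handlers dictionary to keep in sync: adding support for a type means adding one visit_X method.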

rlouf added the enhancement label May 27, 2024

rlouf added this to Improve Outlines May 27, 2024

rlouf moved this to Todo in Improve Outlines May 27, 2024

rlouf mentioned this issue Jun 13, 2024

Yaml Grammar #871

Closed

patricebechard linked a pull request Jul 6, 2024 that will close this issue

Translate JSON Schema to YAML regex #1022

Open

rlouf linked a pull request Aug 21, 2024 that will close this issue

Translate JSON Schema to YAML regex #1022

Open

lapp0 linked a pull request Oct 1, 2024 that will close this issue

Refactor json_schema.py, implement JSON Schema to YAML #1182

Open

7 tasks
