Convert/ extend Signature field from a string to structured data. #249

Carreau · 2023-04-19T12:13:53Z

It would be nice to store the signature in a more structured form in order to be able to do some operation after the fact.

We probably need to iterate a bit on it, and maybe some part of the signature need should be str anyway, but we might want to store:

async or not.
Generator function or not.
each parameter name, wether they are positional or key-word.
The default value (maybe as a string), for each.
The type annotation (as a string or structured?)

Even if the rendering in myst, for now we convert all back to string, it's ok, but it would allow us to dynamically modify the rendering to for example not display default values, or not display deprecated parameter, or do search like "is async" or not.

For the default values/annotation, I'm assuming it might be ok to store as string for now, but in the long run I'm hopping we can also compress/expand those and link to relevant object.

For example

DataFrame.aggegate(column:Union[str, List[str]], method=sum,....)

I think in the end we should be able to make tokens like Union, str List link to the relevant object, and maybe
even expand it to if the type is structured

I'm not sure how to do that yet, but I think it's something we should keep in mind.

The text was updated successfully, but these errors were encountered:

Carreau · 2023-04-19T12:17:55Z

For how to store signature we may want to look into https://pypi.org/project/griffe/

aktech · 2023-04-22T20:27:20Z

griffe looks promising so far:

In [1]: import griffe
   ...:
   ...: papyri = griffe.load("papyri")

In [2]: papyri.functions
Out[2]:
{'_intro': <Function('_intro', 228, 235)>,
 'ingest': <Function('ingest', 238, 266)>,
 'install': <Function('install', 272, 401)>,
 'relink': <Function('relink', 404, 414)>,
 'pack': <Function('pack', 477, 481)>,
 'bootstrap': <Function('bootstrap', 484, 492)>,
 'ascii': <Function('ascii', 512, 519)>,
 'serve': <Function('serve', 522, 527)>,
 'serve_static': <Function('serve_static', 530, 544)>,
 'browse': <Function('browse', 547, 551)>,
 'build_parser': <Function('build_parser', 554, 574)>,
 'open': <Function('open', 577, 590)>,
 'sample_async_function': <Function('sample_async_function', 593, 594)>}

In [3]: sample_async_function = papyri.functions["sample_async_function"]

In [4]: vars(sample_async_function)
Out[4]:
{'name': 'sample_async_function',
 'lineno': 593,
 'endlineno': 594,
 'docstring': None,
 'parent': <Module(PosixPath('papyri/__init__.py'))>,
 'members': {},
 'labels': {'async'},
 'imports': {},
 'exports': None,
 'aliases': {},
 'runtime': True,
 '_lines_collection': None,
 '_modules_collection': None,
 'parameters': <griffe.dataclasses.Parameters at 0x1075b75e0>,
 'returns': None,
 'decorators': [],
 'setter': None,
 'deleter': None,
 'overloads': None}

In [5]: ingest_function = papyri.functions["ingest"]

In [6]: ingest_function.parameters['check'].as_dict()
Out[6]:
{'name': 'check',
 'annotation': Name(source='bool', full='bool'),
 'kind': <ParameterKind.positional_or_keyword: 'positional or keyword'>,
 'default': 'False'}

In [7]: ingest_function.parameters._parameters_dict
Out[7]:
{'paths': <griffe.dataclasses.Parameter at 0x107599400>,
 'check': <griffe.dataclasses.Parameter at 0x1075994c0>,
 'relink': <griffe.dataclasses.Parameter at 0x10759b1c0>,
 'dummy_progress': <griffe.dataclasses.Parameter at 0x10759b2b0>}

In [8]: ingest_function.parameters['check'].as_dict()
Out[8]:
{'name': 'check',
 'annotation': Name(source='bool', full='bool'),
 'kind': <ParameterKind.positional_or_keyword: 'positional or keyword'>,
 'default': 'False'}

I think the first step could be to return the griffe's Function object instead of the string signature, what do you think?

Carreau · 2023-04-24T08:18:59Z

Yes, I think that should work.

I think you can also use

from griffe.encoders import JSONEncoder
import json
print(json.dumps(ingest_function, cls=JSONEncoder, indent=2))

And that should give you a JSON representation of the function.

Though it seem the type annotations are still strings, but we can take care of this later.

"default": "typer.Option(False, help='Disable rich progress bar')"

Note that griffe is based on ast-maniputation and papyri does actually import the objects. So there might be some
mangling to do.

pawamoy · 2023-05-14T21:14:40Z

Hello, chiming in :)

For values (actual values of attributes or default values of parameters), Griffe indeed simply unparses the AST into a string.

For type annotatons however, it builds complete "expressions", which are recursive lists of strings (delimiters like [](){}.,), "names" (which reference other objects and can be resolved to their full path) and expressions (the recursive part).

Storing them like this allows to render signatures with links for each part of each type annotation.

In the future we might store values the same way.

pawamoy · 2023-08-25T11:17:56Z

Update: Griffe changed its way of storing expressions. They're now a tree of proper objects representing Python code (basically ast with a specialized class for names, to resolve them easily). Also, we now parse attribute values, decorators, and class bases as expressions too! Previously it was just type annotations 🙂

asmeurer · 2023-09-21T20:29:52Z

Here's how griffe represents the annotation list[int] | int:

            "annotation": {
              "left": {
                "left": {
                  "name": "list",
                  "cls": "ExprName"
                },
                "slice": {
                  "name": "int",
                  "cls": "ExprName"
                },
                "cls": "ExprSubscript"
              },
              "operator": "|",
              "right": {
                "name": "int",
                "cls": "ExprName"
              },
              "cls": "ExprBinOp"
            },

Reconstructing this would be harder work. It would be simpler to just store the annotation as a string, but this would break compatibility if we wanted to add it in the future. Maybe we can store the string form as annotation_string and leave annotation out for now.

But I'm not clear just how much we want to be compatible with griffe. Also, it would probably be useful if griffe also stored the string form of the annotation in addition to the ast form.

pawamoy · 2023-09-21T21:21:27Z

It's a bit more readable if we put the class name at the top of each dict:

"annotation": {
  "cls": "ExprBinOp",
  "left": {
    "cls": "ExprSubscript",
    "left": {
      "cls": "ExprName",
      "name": "list",
    },
    "slice": {
      "cls": "ExprName",
      "name": "int",
    },
  },
  "operator": "|",
  "right": {
    "cls": "ExprName",
    "name": "int",
  }
}

I'm not familiar with the Papyri project so I'll suggest things that might not be acceptable to you, please forgive me 🙂

If you actually depend on Griffe, you can directly work with expressions. Calling str on an expression will "unparse" it into a string.

You can iterate on expressions, recursively, in depth-first order, to process specific types of sub-expressions (subscripts, attributes, function calls, etc.) if you need to. With the previous example, your (recursive) iteration would yield:

[
    ExprBinOp(...),
    ExprSubscript(...)
    ExprName("list"),
    "[",
    ExprName("int"),
    "]",
    "|",
    ExprName("int"),
]

If you iterate with flat=True, the recursion is handled for you and the iteration yields names and punctuation directly:

[
    ExprName("list"),
    "[",
    ExprName("int"),
    "]",
    "|",
    ExprName("int"),
]

And that's exactly how Griffe outputs strings from expressions:

def __str__(self):
    return "".join(elem if isinstance(elem, str) else elem.name for elem in self.iterate(flat=True))

You can dump and reload expression to/from JSON:

import json
from griffe.encoders import json_decoder

annotation = json.dumps({
  "cls": "ExprBinOp",
  "left": {
    "cls": "ExprSubscript",
    "left": {
      "cls": "ExprName",
      "name": "list"
    },
    "slice": {
      "cls": "ExprName",
      "name": "int",
    }
  },
  "operator": "|",
  "right": {
    "cls": "ExprName",
    "name": "int"
  }
})

json.loads(annotation, object_hook=json_decoder)

ExprBinOp(left=ExprSubscript(left=ExprName(name='list', parent=None), slice=ExprName(name='int', parent=None)), operator='|', right=ExprName(name='int', parent=None))

...though in this example we don't have the parent information allowing to resolve names to their full path (but it's possible to have it).

If you're only looking for how to store annotations, without having to depend on Griffe, then I'll just share my experience with you: previously I stored them as simple lists of punctuation and names (the flat example above). Well, it is not enough once you start having more demanding rendering options, such as:

rendering only the last part of attributes (Sequence instead of typing.Sequence as written in the source)
or the opposite, rendering full paths (typing.Mapping instead of just Mapping)
unwrapping subscripts (int instead of Annotated[int, Gt(10)])
etc.

Ultimately, instead of special casing names, then attributes, then subscripts, then ..., I went all in and now store an actual AST, with just the necessary, additional info (parent) that allows me to resolve names to their full paths (in turn allowing interlinks and cross-references).

It sure is a bit more... chunky, but it's far more robust, powerful, and flexible 😄 Getting the N-th item of an optional tuple before was super prone to errors, now it's a breeze.

asmeurer · 2023-09-21T21:29:45Z

The griffe JSON is already pretty similar to the JSON generated by the myst_serialize. Here's Griffe for

def def f(x1, /, x2, x3=None, *, x4, x5=None): pass

        "parameters": [
          {
            "name": "x1",
            "annotation": null,
            "kind": "positional-only",
            "default": null
          },
          {
            "name": "x2",
            "annotation": null,
            "kind": "positional or keyword",
            "default": null
          },
          {
            "name": "x3",
            "annotation": null,
            "kind": "positional or keyword",
            "default": "None"
          },
          {
            "name": "x4",
            "annotation": null,
            "kind": "keyword-only",
            "default": null
          },
          {
            "name": "x5",
            "annotation": null,
            "kind": "keyword-only",
            "default": "None"
          }
        ],

And here's what papyri already gives

  "parameters": [
    {
      "annotation": null,
      "default": null,
      "kind": "POSITIONAL_ONLY",
      "name": "x1",
      "type": "ParameterNode"
    },
    {
      "annotation": null,
      "default": null,
      "kind": "POSITIONAL_OR_KEYWORD",
      "name": "x2",
      "type": "ParameterNode"
    },
    {
      "annotation": null,
      "default": "None",
      "kind": "POSITIONAL_OR_KEYWORD",
      "name": "x3",
      "type": "ParameterNode"
    },
    {
      "annotation": null,
      "default": null,
      "kind": "KEYWORD_ONLY",
      "name": "x4",
      "type": "ParameterNode"
    },
    {
      "annotation": null,
      "default": "None",
      "kind": "KEYWORD_ONLY",
      "name": "x5",
      "type": "ParameterNode"
    }
  ],

The biggest difference is that papyri is using the variable name of the inspect._ParameterKind enum, and Griffe is using the human readable form https://github.com/python/cpython/blob/b3af888342db12915f0cdaaacbdc61aadfb62eff/Lib/inspect.py#L2630-L2635

(side note: can someone explain to me why an "int enum" has variables assigned to strings, and how you can actually access those?)

pawamoy · 2023-09-21T21:38:51Z

(wow)

Looks like each new value gets replaced by an incremented int, while attaching the string to its description attribute:

    def __new__(cls, description):
        value = len(cls.__members__)   # when first value (POSITIONAL_ONLY) is declared, value is 0
        member = int.__new__(cls, value)   # so member is IntEnum(0)  (or something like that??)
        member._value_ = value  # its value is 0
        member.description = description  # its description is 'positional-only'
        return member  # return and proceed to next declared value

Mind blown.

You then get descriptions with _ParameterKind.POSITIONAL_ONLY.description.

asmeurer · 2023-09-21T22:59:52Z

Work in progress here #289. Not sure if we should try to store the annotations in a way that is forward compatible in case we want to store an ast like griffe later. I still think it's a good idea to store the str in the JSON even if it can be technically reconstructed later.

aktech mentioned this issue May 3, 2023

Store structured function signature #250

Merged

Carreau assigned asmeurer Sep 21, 2023

Carreau added the higher-priority label Sep 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Convert/ extend Signature field from a string to structured data. #249

Convert/ extend Signature field from a string to structured data. #249

Carreau commented Apr 19, 2023 •

edited

Loading

Carreau commented Apr 19, 2023 •

edited

Loading

aktech commented Apr 22, 2023

Carreau commented Apr 24, 2023

pawamoy commented May 14, 2023 •

edited

Loading

pawamoy commented Aug 25, 2023

asmeurer commented Sep 21, 2023

pawamoy commented Sep 21, 2023

asmeurer commented Sep 21, 2023 •

edited

Loading

pawamoy commented Sep 21, 2023 •

edited

Loading

asmeurer commented Sep 21, 2023

Convert/ extend Signature field from a string to structured data. #249

Convert/ extend Signature field from a string to structured data. #249

Comments

Carreau commented Apr 19, 2023 • edited Loading

Carreau commented Apr 19, 2023 • edited Loading

aktech commented Apr 22, 2023

Carreau commented Apr 24, 2023

pawamoy commented May 14, 2023 • edited Loading

pawamoy commented Aug 25, 2023

asmeurer commented Sep 21, 2023

pawamoy commented Sep 21, 2023

asmeurer commented Sep 21, 2023 • edited Loading

pawamoy commented Sep 21, 2023 • edited Loading

asmeurer commented Sep 21, 2023

Carreau commented Apr 19, 2023 •

edited

Loading

Carreau commented Apr 19, 2023 •

edited

Loading

pawamoy commented May 14, 2023 •

edited

Loading

asmeurer commented Sep 21, 2023 •

edited

Loading

pawamoy commented Sep 21, 2023 •

edited

Loading