Skeleton architecture documentation #387

Merged (1 commit) on Feb 12, 2024.

221 changes: 155 additions & 66 deletions Readme.md

papyri enabled (left) and disabled (right).
![](assets/vs_math.png)
</details>

---

## Table of contents

- [Installation](#installation)
- [Usage](#usage)
- [Rendering](#rendering)
- [Architecture](#architecture)

## Installation (not fully functional):

Some functionality is not yet available when installing from PyPI. For now you
need a [Development installation](#development-installation) to access all
features.

You'll need Python 3.8 or newer, otherwise pip will tell you it can't find any matching distribution.

Install from PyPI:

```bash
$ pip install papyri
```

This will augment the `?` operator to show better documentation (when installed […]).
*Papyri does not completely build its own docs yet, but you might be able to view a static rendering of it
[here](https://pydocs.github.io/). It is not yet automatically built, so it might be out of date.*

### Development installation

You may need to get a modified version of numpydoc depending on the stage of development. You will need [pip >
21.3](https://pip.pypa.io/en/stable/news/#v21-3-1) if you want to make editable installs.

```
$ pytest
```

## Usage

Papyri relies on three steps (an end-to-end example follows the list):

- IR generation (executed by package maintainers);
- IR installation (executed by end users or via pip/conda);
- IR rendering (usually executed by the IDE, CLI/webserver).
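
Put together, an end-to-end pass over a single library could look like the
following (the TOML file and the data folder name are illustrative; each step is
described in detail below):

```bash
$ papyri gen examples/numpy.toml                # generate the intermediate docs (IR)
$ papyri ingest ~/.papyri/data/numpy_<version>  # ingest/crosslink the generated bundle
$ papyri render                                 # render static HTML into ~/.papyri/html
```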

### IR Generation (`papyri gen`)

This is the step you want to trigger if you are building documentation using Papyri for a library you maintain. As an
end user you will most likely not have to run this step and can install pre-published documentation bundles instead.
This step is likely to occur only once per new release of a project.

The TOML files in `examples` will give you example configurations from some existing libraries.

```
$ ls -1 examples/*.toml
...
examples/skimage.toml
```

Right now these files live in the papyri repository, but they would likely move to the relevant repositories under `docs/papyri.toml` later on.

> [!NOTE]
> It is _slow_ on full numpy/scipy; use `--no-infer` (see below) for a subpar but faster experience.

Use `papyri gen <path to example file>`

```
$ papyri gen examples/numpy.toml
$ papyri gen examples/scipy.toml
```

This will create intermediate docs files in `~/.papyri/data/<library name>_<library_version>`. See [Generation](#generation-papyri-gen) for more details.

You can also generate intermediate docs files for a subset of objects using the `--only` flag. For example:

```
$ papyri gen examples/numpy.toml --only numpy:einsum
```

> [!IMPORTANT]
> To avoid ambiguity, papyri uses [fully qualified names](#qualified-names) to refer to objects. This means that you need to use `numpy:einsum` instead of `einsum` or `numpy.einsum` to refer to the `einsum` function in the `numpy` module, for example.


### Installation/ingestion

You can ingest local folders with the following command:

```
$ papyri ingest ~/.papyri/data/<path to folder generated at previous step>
```

This will crosslink the newly generated folder with the existing ones.
Ingested data can be found in `~/.papyri/ingest/`, but you are not supposed to
interact with this folder using tools external to papyri.

There are currently a couple of pre-built documentation bundles that can be
pre-installed, but are likely to break with each new version of papyri. We
suggest you use the developer installation and ingestion procedure for now.

[…] is of interest to you. This will likely be done by your favorite IDE, probably
just in time when you explore documentation. Nonetheless, we've
implemented a couple of external renderers to help debug issues.

> [!WARNING]
> Many rendering methods currently require papyri's own docs to be built and ingested first.

```
$ papyri gen examples/papyri.toml
$ papyri ingest ~/.papyri/data/papyri_0.0.7 # or any current version
```

Or you can try to pre-install an old papyri doc bundle:

```
$ papyri install papyri
```

### Standalone HTML rendering

To see the rendered documentation for all packages previously ingested, run

```bash
$ papyri serve
```

This will start a live server that will render the pages on the fly.

If you need to render static versions of the pages, use either of the following
commands:

```bash
$ papyri render # render all the html pages statically in ~/.papyri/html
$ papyri serve-static # start a http.server with the proper root to serve above files.
```

### Rich terminal rendering

To render the documentation for a single object on a terminal, use

```
$ papyri rich <fully qualified name>
```

For example:

```
$ papyri rich numpy:einsum # note the colon for the fully qualified name.
```

To use the experimental interactive Textual interface in the terminal, use

```
$ papyri textual <fully qualified name>
```

### IPython extension

To run `papyri` as an IPython extension, run:

```
$ ipython --ext papyri.ipython
```

This will start an IPython session with an augmented `?` operator.
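
For example, inside that session (assuming the numpy documentation bundle has
already been generated and ingested as described above):

```
In [1]: import numpy as np

In [2]: np.linspace?    # the augmented pager shows the papyri-rendered documentation
```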

### Jupyter extension

In progress.

### More commands

You can run `papyri` without a command to see all currently available commands.

When hacking on scraping libraries, `papyri gen --no-infer [...]` will skip type
inference of examples. The `--exec` option needs to be passed to try to execute
examples.

## Papyri - Name's meaning

See the legendary [Villa of the Papyri](https://en.wikipedia.org/wiki/Villa_of_the_Papyri), which gets its name from its
collection of many papyrus scrolls.

## Architecture

### Generation (`papyri gen`)

Collects the documentation of a project into a *DocBundle* -- a number of
*DocBlobs* (currently json files), with a defined semantic structure, and
some metadata (version of the project this documentation refers to, and
potentially some other blobs).

During the generation a number of normalisation and inference steps can and
should happen. For example:

- Using type inference on the `Examples` sections of docstrings and storing
the results as (token, reference) pairs, so that you can later decide that
clicking on `np.array` in an example brings you to the numpy array
documentation, whether or not we are currently in the numpy documentation;
- Parsing "See Also" into a well defined structure;
- Running examples to generate images for docs with images (partially
implemented);
- Resolving local references. For example, when building the NumPy docs,
`zeros_like` is unambiguous and should be normalized to `numpy.zeros_like`.
Similarly, `~.pyplot.histogram` should be normalized to
`matplotlib.pyplot.histogram` as the **target** and `histogram` as the text
(a sketch follows this list).
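
As a purely illustrative sketch (this is not papyri's actual code or API; the
helper name and the `known` set are assumptions), the last normalisation could
be thought of as:

```python
# Illustrative only -- not papyri's implementation.
# `known` stands in for the set of fully qualified names collected during generation.
known = {"numpy.zeros_like", "matplotlib.pyplot.histogram"}

def normalize_ref(ref, current_package="numpy"):
    """Return (target, text) for a docstring cross-reference."""
    if ref.startswith("~"):
        # "~.pyplot.histogram": show only the last component as text, and look the
        # target up among the known names ending with the given suffix.
        suffix = ref.lstrip("~")
        target = next(name for name in known if name.endswith(suffix))
        return target, target.rsplit(".", 1)[-1]
    if "." not in ref:
        # "zeros_like" inside the numpy docs is unambiguous: prefix the package name.
        return f"{current_package}.{ref}", ref
    return ref, ref

print(normalize_ref("zeros_like"))          # ('numpy.zeros_like', 'zeros_like')
print(normalize_ref("~.pyplot.histogram"))  # ('matplotlib.pyplot.histogram', 'histogram')
```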

The generation step is likely project specific, as there might be import
conventions that are defined per-project and should not need to be repeated
(`import pandas as pd`, for example).

The generation step is likely to be the most time consuming and, for each
project, results in the following outputs (a sample listing is shown below):

- A `papyri.json` file, which is a list of unique qualified names corresponding
to the documented objects and some metadata;
- A `toc.json` file, ?
- An `assets` folder, containing all the images generated during the
generation;
- A `docs` folder, ?
- An `examples` folder, ?
- A `module` folder, containing one json file per documented object.
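
As a rough illustration (the folder name and exact contents depend on the
library and papyri version), listing a freshly generated bundle might look like
this:

```
$ ls -1 ~/.papyri/data/<library name>_<library_version>
assets
docs
examples
module
papyri.json
toc.json
```
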
After the generation step, *what should have been processed*?

### Ingestion (`papyri ingest`)

The ingestion step takes a *DocBundle* and/or *DocBlobs* and adds them into a graph
of known items; the ingestion is critical to efficiently build the collection
graph metadata and understand which items refer to which. This allows the
following:

- Update the list of backreferences to a *DocBundle*;
- Update forward references metadata to know whether links are valid (a sketch
follows this list).
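
A very rough sketch of the idea (purely illustrative; papyri's real ingestion
uses its own storage rather than plain dicts):

```python
# Illustrative only -- not papyri's actual data structures.
# Forward links come from the generated DocBlobs; backreferences are obtained
# during ingestion by inverting those links across everything already ingested.
forward = {
    "scipy.signal.welch": ["numpy.linspace"],
    "numpy.linspace": ["numpy.arange"],
}
known = set(forward)  # plus everything already present in ~/.papyri/ingest

backrefs = {}
for source, targets in forward.items():
    for target in targets:
        backrefs.setdefault(target, []).append(source)

# A forward link is "valid" only if its target is a known, ingested item.
valid_links = {src: [t for t in targets if t in known] for src, targets in forward.items()}

print(backrefs.get("numpy.linspace"))  # ['scipy.signal.welch']
print(valid_links["numpy.linspace"])   # [] -- numpy.arange has not been ingested here
```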

Currently the ingestion loads everything in memory and updates all the bundles
in place, but this can likely be done more efficiently.

A lot more can likely be done at a larger scale, like detecting whether
documentation has changed in previous versions to infer for which versions of a
library this documentation is valid.

There is also likely some curating that might need to be done at that point,
as objects such as `numpy.array` have an extremely large number of
back-references.

### Qualified names

To avoid ambiguity when referring to objects, papyri uses the
*fully qualified name* of the object for its operations. This means that instead
of a dot (`.`), we use a colon (`:`) to separate the module part from the
object's name and sub attributes.

To understand why we need this, assume the following situation: a top level
`__init__` imports a class from a submodule that has the same name as the
submodule:

```
# project/__init__.py
from .sub import sub
```

This submodule defines a class (here we use lowercase for the example):

```
# project/sub.py
class sub:
    attribute: str
    attribute = 'hello'
```

and a second submodule is defined:
```
# project/attribute.py
None
```

Using qualified names only with dots (`.`) can make it difficult to find out
which object we are referring to, or to implement the logic to find the object.
For example, to get the object `project.sub.attribute`, one would do:

```
import project
x = getattr(project, 'sub')
getattr(x, 'attribute')
```

But here, because of the `from .sub import sub`, we end up getting the class
attribute instead of the module. This ambiguity is lifted with a `:` as we now
explicitly know the module part, and `project.sub.attribute` is distinct from
`project.sub:attribute`. Note that `project:sub.attribute` is also
non-ambiguous, even if it is not the right fully qualified name for an object.

Moreover, using `:` as a separator makes the implementation much easier, as in
the case of `project.sub:attribute` it is possible to directly execute
`importlib.import_module('project.sub')` to obtain a reference to the `sub`
submodule, without try/except or recursive `getattr` calls checking the type of
each object.
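
As a minimal sketch of that resolution logic (illustrative only, not papyri's
actual resolver):

```python
import importlib

def resolve(fully_qualified_name):
    """Resolve 'module.path:attr.subattr' without guessing where the module ends."""
    module_name, _, rest = fully_qualified_name.partition(":")
    obj = importlib.import_module(module_name)
    for part in filter(None, rest.split(".")):
        obj = getattr(obj, part)
    return obj

# With the example package above, "project.sub:sub.attribute" imports the
# module `project.sub` directly and then follows a plain getattr chain
# (class `sub`, then `attribute`), with no try/except or guessing needed.
# resolve("project.sub:sub.attribute")  # -> 'hello'
```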

### Tree sitter information

See https://tree-sitter.github.io/tree-sitter/creating-parsers


### When things don't work!

#### `SqlOperationalError`:

- The DB schema has likely changed; try `rm -rf ~/.papyri/ingest/`.