Discussion: add HTTP API for looper #433

Open
4 tasks done
simeoncarstens opened this issue Dec 22, 2023 · 5 comments

@simeoncarstens
Collaborator

simeoncarstens commented Dec 22, 2023

This issue is coauthored by @zz1874.

looper is a CLI tool that often runs on the front node of an HPC cluster, so that jobs can be submitted to Slurm / SGE / other job schedulers.
@nsheff expressed a desire for an HTTP API that wraps around looper. That would allow him and other users to run looper on the front node and use a reverse SSH tunnel from a different machine to send HTTP requests to the HTTP API.
Advantages of this would be:

  • use of looper functionalities from any machine without manually copying code to the frontend node,
  • potential for a graphical user interface (GUI) that builds upon that API.

An earlier attempt at this was caravel (https://github.com/pepkit/caravel). @nsheff tells us that there were issues, possibly due to the synchronous nature of the Flask framework. caravel seems to be a Python 2.7 code base that uses 2to3 to convert to Python 3 code on the fly during installation via setuptools' use_2to3. As a result, caravel has become hard to run, for reasons such as: setuptools no longer ships use_2to3, the Docker image can no longer be built, Debian index URLs are out of date, Python 3.6-specific typing imports are used, ...

After browsing the looper and caravel code, we identified the following possibilities:

  1. Revive caravel, meaning bringing it up-to-date with recent Python versions and making it compatible with recent looper versions,
  2. Write a new HTTP API from scratch, which leaves us at least three possibilities:
    1. Figure out a way of automagically creating both CLI and HTTP API from a single definition of commands / options.
      This would likely be the most sustainable idea, as it prevents the need of keeping CLI and HTTP API in sync if commands / options are added / removed in the future. But it would possibly be a larger undertaking with the risk of being only partially finished in the limited time we can work on it. It would also possibly make a nice separate, reusable library!
    2. Implement only the most important top-level commands and their options as HTTP API endpoints, but design this easily transferable to other commands and document the development process. That way, a subset of the looper commands / options could likely be made available via the HTTP API in the little development time we have. But this also means an increased maintenance burden - if a new CLI command / option is added, the HTTP API and its documentation have to be adapted accordingly.
    3. Implement only top-level commands and allow setting of flags / options only via a project configuration file that is POSTed to the API. This would be the easiest and quickest solution, but limits the use cases of the API. A similarly easy and inflexible approach would be POSTing a string of command-line arguments that is then parsed by looper's existing argparse argument parser.
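
The last variant of option 2.3 (POSTing a command-line string that is handed to an argparse parser) could be sketched roughly as follows. This is only an illustration under assumptions: `build_parser` is a hypothetical stand-in and not looper's real parser, and the options shown are chosen for the example. The one point it tries to make is that a server must turn argparse's `SystemExit` into an ordinary error, since argparse otherwise terminates the process on bad input.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in parser; looper's real parser has many more options."""
    parser = argparse.ArgumentParser(prog="looper-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run")
    run.add_argument("--dry-run", action="store_true")
    run.add_argument("--limit", type=int, default=None)
    run.add_argument("config")
    return parser


def parse_posted_command(body: str) -> dict:
    """Parse the body of a POST request as if it were typed on the command line."""
    argv = shlex.split(body)  # shell-like splitting, respects quotes
    try:
        namespace = build_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse exits on invalid input; a server should report an error instead.
        raise ValueError(f"invalid command line: {body!r}") from exc
    return vars(namespace)


if __name__ == "__main__":
    print(parse_posted_command("run --dry-run --limit 2 project.yaml"))
```

An HTTP handler would call `parse_posted_command` on the request body and map a `ValueError` to a 4xx response.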

Important questions that would need to be answered:

  • Which version of looper should we develop against? looper is currently at v1.5.1, but there is a PR open for v1.6.0, and in fact we could only get the hello_looper example working with the future v1.6.0 of looper. A similar question holds for pipestat, if required for development of the HTTP API. The answer is: v1.6.0 for looper and v0.6.0 for pipestat, as both new versions have now been released.
  • What were the exact issues you faced with caravel? Knowing them would help us make a more informed decision whether to revive caravel or to redevelop from scratch, avoiding mistakes made in caravel. Answer: see @nsheff's comment of Jan 4, 2024 below.
  • If we were to decide to implement only a subset of the top-level looper commands: which commands have the highest priority and should thus be implemented first as HTTP API calls? Answer: looper run, looper runp, looper check, looper report (see @nsheff's comment of Jan 4, 2024 below).

And finally, of course:

  • Which of options 1 and 2.1-2.3 should we pursue? We should discuss this question together with @nsheff and add the answer in a comment. Answer: in a call with @nsheff, we decided to go with 2.1. Details in a comment below.
@donaldcampbelljr
Contributor

donaldcampbelljr commented Dec 22, 2023

Hello Simeon,

Thank you for beginning this discussion. I just wanted to make you aware that we released Looper v1.6.0 as well as Pipestat v0.6.0 this morning.

~ Donald

@simeoncarstens
Collaborator Author

Thanks, @donaldcampbelljr! That's good to know - I edited the issue accordingly 🙂

@nsheff
Contributor

nsheff commented Jan 4, 2024

What were the exact issues you faced with caravel?

First, caravel was written using Flask, but we now want to use FastAPI. More important, though, was that the front-end was not reactive. With caravel, there wasn't really an HTTP API. Instead, there was a web interface that you could use to execute commands. In other words, I think caravel tightly coupled the front-end (web interface) with the back-end (HTTP API), and the front-end was not written using a reactive framework. In the new version, I'd like them to be separate. This would emphasize that the API must be asynchronous-friendly, and that the front-end must also be asynchronous-friendly.
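
To make "asynchronous-friendly" a bit more concrete, here is a minimal sketch of running commands without blocking the event loop, using only the standard library's asyncio. It is an assumption about what such a back-end might do, not a description of caravel or looper: the `run_command` helper is hypothetical, and the child processes are trivial Python one-liners standing in for looper invocations.

```python
import asyncio
import sys


async def run_command(argv: list[str]) -> tuple[int, str]:
    """Run one command as a subprocess without blocking the event loop."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    output, _ = await proc.communicate()
    return proc.returncode, output.decode()


async def main() -> list[tuple[int, str]]:
    # Stand-in commands (assumption); a real handler would invoke looper here.
    jobs = [
        [sys.executable, "-c", "print('job A')"],
        [sys.executable, "-c", "print('job B')"],
    ]
    # Both subprocesses run concurrently; results come back in input order.
    return await asyncio.gather(*(run_command(job) for job in jobs))


if __name__ == "__main__":
    for returncode, output in asyncio.run(main()):
        print(returncode, output.strip())
```

An async web framework could await `run_command` inside a request handler, so that one long-running command does not stall other requests.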

which commands have the highest priority and should thus be implemented first as HTTP API calls

The most important commands are looper run, looper runp, looper check, and looper report.

Which of options 1 and 2.1-2.3 should we pursue?

I'd advise a hybrid approach:

  • we can start something from "scratch", but use components/ideas from caravel where it makes sense, or where it is easy.
  • we can just run this on a new branch of caravel, if that's convenient, or use a new repo. Might as well re-use the name caravel, though, as the old one will become defunct.
  • I like all 3 of the concepts under option 2; realistically, I think option 2.3 might make the most sense. A bit more detail on that here:

The original caravel included code to inspect the looper argument parser, and then create HTML forms that mimic it: https://github.com/pepkit/caravel/blob/master/caravel/looper_parser.py

The idea here is that looper's CLI argument parser is the source of truth for how to interact with looper. This code then allows us to make an HTML interface that would automatically update if the CLI changes. What you're proposing in option 2.1 is basically this exact same idea, but instead of creating an HTML form interface, you'd create an HTTP API. That's a good idea; however, it could get hairy... and I like what you said in 2.3, "A similarly easy and inflexible approach would be POSTing a string of command-line arguments that is then parsed by looper's existing argparse argument parser."
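
As a rough illustration of treating the argument parser as the source of truth, the sketch below walks an argparse parser and emits a plain description of each argument, which a front-end could turn into form fields. It is not the code from caravel's looper_parser.py. The `describe_parser` and `demo_parser` names are hypothetical, and the sketch relies on the private attribute `parser._actions`, which is an assumption about argparse internals that may change.

```python
import argparse


def demo_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for looper's parser."""
    parser = argparse.ArgumentParser(prog="looper-sketch")
    parser.add_argument("--dry-run", action="store_true", help="do not submit jobs")
    parser.add_argument("--limit", type=int, default=None, help="max samples")
    parser.add_argument("config", help="path to a config file")
    return parser


def describe_parser(parser: argparse.ArgumentParser) -> list[dict]:
    """Describe every argument of the parser except the built-in --help."""
    described = []
    # NOTE: `_actions` is private to argparse; using it is an assumption.
    for action in parser._actions:
        if action.dest == "help":
            continue
        described.append(
            {
                "name": action.dest,
                "flags": list(action.option_strings),  # empty for positionals
                "required": action.required,
                "default": action.default,
                "type": getattr(action.type, "__name__", None),
                "help": action.help,
            }
        )
    return described


if __name__ == "__main__":
    for field in describe_parser(demo_parser()):
        print(field)
```

Because the description is derived from the parser at run time, it changes automatically whenever an argument is added to or removed from the CLI.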

I think we could combine these ideas. The API could accept a POST of some CLI string. From here, the API would construct a CLI argument string, and use the existing argument parser. I think this is something close to what we were trying to do, in this code here:

https://github.com/pepkit/caravel/blob/ebadf7d0489d431eb316ab38be29b743576c1070/caravel/caravel.py#L374-L403

Then, we can use the HTML form idea for the front-end: from the argparser, create HTML forms, and then have these HTML forms be interpreted into a CLI string, which is then POSTed to the HTTP API.
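
The step from submitted form values to a CLI string might look like the sketch below. The `form_to_cli` helper, the option names, and the convention of mapping `dry_run` to `--dry-run` are all assumptions made for illustration. `shlex.join` quotes values so that the string can be split again safely on the server side.

```python
import shlex


def form_to_cli(command: str, options: dict, positionals: list[str]) -> str:
    """Turn form values into a single shell-quoted command-line string."""
    argv = [command]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)  # checked checkbox becomes a bare flag
        elif value is False or value is None:
            continue  # unchecked or empty fields are omitted
        else:
            argv.extend([flag, str(value)])
    argv.extend(positionals)
    return shlex.join(argv)


if __name__ == "__main__":
    cli = form_to_cli(
        "run",
        {"dry_run": True, "limit": 2, "verbose": False},
        ["my project.yaml"],
    )
    print(cli)
    # The server can recover the argument list with shlex.split(cli).
    print(shlex.split(cli))
```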

Alternatively, instead of operating through the CLI string itself, we could introduce a simple config format for defining the arguments. I think there's already a rudimentary way to do this using looper alone, using the .looper.yaml config file.

What do you think?

@simeoncarstens
Collaborator Author

A belated (written) thanks to @nsheff for the detailed answer!
We have now settled on an approach based on defining the CLI mostly via pydantic models using (for now) the pydantic-argparse library (#438). With a pydantic model that accurately reflects all arguments and flags a given looper command might consume, it is then straightforward to build an HTTP API based on, for example, FastAPI that expects a JSON body conforming to this schema in a POST request.
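
The "one model, two interfaces" idea can be sketched with the standard library alone. Note the substitution: the sketch below uses a plain dataclass as a stand-in for a pydantic model and does not use pydantic-argparse or FastAPI, so it runs without extra dependencies. `RunArgs`, `parser_from_model` and `model_from_json` are hypothetical names, and only `str`, `int` and `bool` fields are handled.

```python
import argparse
import json
from dataclasses import MISSING, dataclass, fields


@dataclass
class RunArgs:
    """Hypothetical arguments of one command; stands in for a pydantic model."""
    config: str
    dry_run: bool = False
    limit: int = 0


def parser_from_model(model) -> argparse.ArgumentParser:
    """Derive a CLI parser from the model's fields."""
    parser = argparse.ArgumentParser(prog=model.__name__)
    for f in fields(model):
        flag = "--" + f.name.replace("_", "-")
        if f.type is bool:
            parser.add_argument(flag, dest=f.name, action="store_true")
        elif f.default is MISSING:
            parser.add_argument(flag, dest=f.name, type=f.type, required=True)
        else:
            parser.add_argument(flag, dest=f.name, type=f.type, default=f.default)
    return parser


def model_from_json(model, body: str):
    """Build the same model from a JSON request body, with strict type checks."""
    obj = model(**json.loads(body))  # TypeError on unknown or missing fields
    for f in fields(model):
        value = getattr(obj, f.name)
        if type(value) is not f.type:
            raise TypeError(f"{f.name}: expected {f.type.__name__}")
    return obj


if __name__ == "__main__":
    cli_ns = parser_from_model(RunArgs).parse_args(["--config", "p.yaml", "--limit", "3"])
    from_cli = RunArgs(**vars(cli_ns))
    from_api = model_from_json(RunArgs, '{"config": "p.yaml", "limit": 3}')
    print(from_cli == from_api)
```

With pydantic, the hand-written type check would be replaced by the model's own validation, and the JSON schema for the API could be generated from the model.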

Let's still keep this issue open to discuss things pertaining to the actual HTTP API implementation, once we get to it.

@donaldcampbelljr
Contributor

After discussion, we've decided to postpone this work until after the major work for milestone 2.0.0: https://github.com/pepkit/looper/milestone/14


3 participants