Initial basic API #2

Open: wants to merge 28 commits into master

@bdc34 commented Sep 23, 2024

This project was started as a proof of concept and intends to replace the
legacy submit UI. The scope is limited to the UI for submitting a paper to
arxiv.org and an API that the UI uses to record and modify submissions. It is
also limited to parity with the existing legacy system.

NG assessment

The "NG" project at arXiv began a replacement for the submit system in 2016. A
barrier to modernizing submit was the question of whether the NG systems could
be put into production quickly. A review of that code found technical problems.
From that review: "The NG system cannot be put into immediate use. It has
flaws that demonstrate a misunderstanding with distributed software
development. It has restrictions on how it interoperates with the legacy
system. It also lacks a usable API to uniformly handle db and file CRUD." See
https://docs.google.com/document/d/1zGmJWdCn5HIJLCTJqPTrxcuMRgcmMzhLNCz0M9r4Jng/edit?usp=sharing

I presented my assessment and proposed mostly starting from scratch, which was
accepted. I hope to reuse isolated, useful packages from NG.

API handles both metadata and files

To avoid race conditions in the handling of files, the API will handle both
files and metadata. This is for simplicity. A hash-based file set model was
considered and discarded. A hash-based system is not excessively complicated,
but it is more complicated, and the NG implementation shows how easily it can go wrong.
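The single-owner idea above can be sketched as follows. This is a minimal in-process illustration, not the project's code: `Submission`, `SubmissionStore`, and the per-submission lock are all hypothetical, and the real v1 backend (legacy DB plus SFS) would need its own transactional guarantees. The point is that because one service writes both the file and the metadata describing it, it can update them together, so no caller ever sees one without the other.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Submission:
    """Hypothetical record holding metadata and files together."""
    title: str
    files: dict[str, bytes] = field(default_factory=dict)
    # Derived from `files`; must never disagree with it.
    file_count: int = 0


class SubmissionStore:
    """Hypothetical backend that owns both the files and the metadata."""

    def __init__(self) -> None:
        self._subs: dict[int, Submission] = {}
        self._locks: dict[int, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the two dicts above

    def create(self, title: str) -> int:
        with self._guard:
            sub_id = len(self._subs) + 1
            self._subs[sub_id] = Submission(title=title)
            self._locks[sub_id] = threading.Lock()
        return sub_id

    def _entry(self, sub_id: int) -> tuple[Submission, threading.Lock]:
        with self._guard:
            return self._subs[sub_id], self._locks[sub_id]

    def upload(self, sub_id: int, name: str, content: bytes) -> None:
        # The file write and the metadata update share one lock, so no
        # reader can observe a file without its metadata (or vice versa).
        sub, lock = self._entry(sub_id)
        with lock:
            sub.files[name] = content
            sub.file_count = len(sub.files)

    def snapshot(self, sub_id: int) -> Submission:
        # Return a copy taken under the lock, not the live object.
        sub, lock = self._entry(sub_id)
        with lock:
            return Submission(sub.title, dict(sub.files), sub.file_count)


# Usage: 20 concurrent uploads to one submission.
store = SubmissionStore()
sid = store.create("A paper")
workers = [
    threading.Thread(target=store.upload, args=(sid, f"fig{i}.png", b"..."))
    for i in range(20)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
snap = store.snapshot(sid)
print(snap.file_count, len(snap.files))  # 20 20
```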

OpenAPI schema via FastAPI

The new system starts with a FastAPI submission CRUD for paper metadata and
file upload. It took little time to get a bare-bones API working, and the FastAPI
Swagger docs can be used as a primitive UI to submit a paper.

There are several mature libraries that can generate a client from the OpenAPI
spec. This will eliminate the hand-written boilerplate seen in the NG projects.

Legacy API integration as v1

The API will start with an implementation that uses the legacy DB and the legacy
CIT SFS as the backend. Once that is in place, it can run at CIT, and the UI can
be served from CIT or from the cloud.

There is an imperative to get a modernized submit working ASAP and not to
gold-plate the first version, since any work put into the legacy system is
moribund. Once we have a modernized API and UI, forward-looking work can be done.

Future API implementation v2

Once the API is in operation, it will provide a foundation for a v2 API
that supports future needs.
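Since the routes are written against an ABC (as the feedback questions in this PR describe), the v1 legacy backend and a later v2 backend can be interchangeable subclasses. A minimal sketch; `SubmitApi`, `LegacySubmitApi`, `rename`, and the method names are hypothetical, and a dict stands in for the legacy DB.

```python
from abc import ABC, abstractmethod


class SubmitApi(ABC):
    """Interface the API routes are written against (hypothetical methods)."""

    @abstractmethod
    def start(self, user_id: int) -> str:
        """Create a submission for user_id and return its id."""

    @abstractmethod
    def set_title(self, submission_id: str, title: str) -> None:
        """Record the submission's title."""

    @abstractmethod
    def get_title(self, submission_id: str) -> str:
        """Return the submission's title."""


class LegacySubmitApi(SubmitApi):
    """v1 stand-in. The real v1 writes to the legacy DB and CIT SFS;
    here a dict plays the role of the legacy DB."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def start(self, user_id: int) -> str:
        submission_id = str(len(self._rows) + 1)
        self._rows[submission_id] = {"submitter_id": user_id, "title": ""}
        return submission_id

    def set_title(self, submission_id: str, title: str) -> None:
        self._rows[submission_id]["title"] = title

    def get_title(self, submission_id: str) -> str:
        return self._rows[submission_id]["title"]


def rename(api: SubmitApi, submission_id: str, title: str) -> str:
    """Route-level logic depends only on the ABC, so a v2 backend
    can be swapped in without touching it."""
    api.set_title(submission_id, title)
    return api.get_title(submission_id)


api = LegacySubmitApi()
sid = api.start(user_id=42)
print(rename(api, sid, "On submit"))  # On submit
```

A v2 backend would be another `SubmitApi` subclass; instantiating the ABC directly, or a subclass missing a method, raises `TypeError`.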

Use and install of arxiv-base

The new submit does not use Flask. It installs the arxiv-base package without
its dependencies, which minimizes the size of the installation.

At this point, the new submit uses pydantic v2 while arxiv-base uses v1, so there
is a feature branch of arxiv-base modified to use pydantic v2 and to work well
without Flask. This will need to be merged to arxiv-base master soon.
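To show why the version split matters, here is a minimal pydantic v2 model with the v1 spellings noted in comments; code written for one major version generally does not run unchanged on the other. It assumes `pydantic>=2`; the `Metadata` model, its fields, and its limits are hypothetical, not the project's schema.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator


class Metadata(BaseModel):
    """Hypothetical submission metadata; fields and limits are illustrative."""

    # v2: config is a ConfigDict attribute (v1 used an inner `class Config`).
    model_config = ConfigDict(str_strip_whitespace=True)

    title: str = Field(min_length=1, max_length=300)
    abstract: str = Field(min_length=1)

    # v2: @field_validator (v1 used @validator).
    @field_validator("title")
    @classmethod
    def collapse_whitespace(cls, v: str) -> str:
        # Turn runs of spaces/newlines inside the title into single spaces.
        return " ".join(v.split())


# v2: model_validate / model_dump (v1: parse_obj / dict).
meta = Metadata.model_validate({"title": "  A new\n  result ", "abstract": "We show X."})
print(meta.model_dump())

try:
    Metadata.model_validate({"title": "   ", "abstract": "We show X."})
    blank_title_ok = True
except ValidationError:
    # Whitespace is stripped before the min_length check, so this is rejected.
    blank_title_ok = False
```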

Current state

Working

  • creates a submission
  • uploads and unpacks tar.gz
  • license, policy and author attestation
  • metadata: title, abstract, etc.
  • Dockerfile
  • tests, coverage at 89%
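Unpacking an uploaded tar.gz is one place where a server must not trust its input. A minimal stdlib sketch, assuming the upload arrives as bytes; `unpack_targz` and `make_targz` are hypothetical names, and rejecting all links is a simplification of this sketch, since the PR does not say how the real code treats them.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def unpack_targz(data: bytes, dest: Path) -> list[str]:
    """Unpack an uploaded .tar.gz into dest; return the regular files written.

    Every member is checked before anything is extracted: paths that would
    land outside dest (absolute paths, '..') and anything other than plain
    files and directories (links, devices) are rejected.
    """
    dest = dest.resolve()
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = (dest / member.name).resolve()
            if not target.is_relative_to(dest):
                raise ValueError(f"path escapes upload dir: {member.name}")
            if not (member.isfile() or member.isdir()):
                raise ValueError(f"unsupported member type: {member.name}")
        for member in members:
            tar.extract(member, dest)
    return sorted(m.name for m in members if m.isfile())


def make_targz(files: dict[str, bytes]) -> bytes:
    """Build a .tar.gz in memory (test helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    ok = unpack_targz(make_targz({"main.tex": b"\\documentclass{article}\n"}), Path(tmp))
    try:
        unpack_targz(make_targz({"../evil.tex": b"x"}), Path(tmp))
        escaped = True
    except ValueError:
        escaped = False
print(ok, escaped)  # ['main.tex'] False
```

Checking every member before extracting any keeps a bad archive from leaving a partial upload behind.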

Not yet started

  • no field validation other than pydantic
  • no legacy "stage" integration
  • no admin_log inserts
  • does not make a PDF from source
  • does not do auth
  • no UI other than fastapi swagger
  • no firing of pubsub events on changes
  • submission schema is totally undefined

Very open to feedback

Would like to hear about:

  • The paths of the API are not great; please make suggestions. Should the paths use resource names rather than verbs? E.g. {submission_id}/license vs {submission_id}/setLicense.
  • I have the FastAPI default API defined as an ABC and then fill that out for an implementation. Do you like or dislike this? I suspect we will soon want a non-legacy backend.
  • Is the runtime setup of an implementation reasonable to you? The implementation used at runtime is set in submit_ce/fastapi/config.py, which sets an ImplementationConfig that is used in the FastAPI definition of the service. The implementation can be changed with an env var.
  • Anything else.
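The env-var-driven selection described in the questions above could look roughly like this. It is a stdlib sketch under stated assumptions: only the name `ImplementationConfig` comes from the PR; its field, the `SUBMIT_IMPLEMENTATION` variable, the registry, and both stand-in classes are hypothetical.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


class LegacySubmitApi:
    """Stand-in for the implementation backed by the legacy DB (hypothetical)."""


class InMemorySubmitApi:
    """Stand-in for a future non-legacy backend (hypothetical)."""


# Hypothetical registry: env var value -> implementation class.
IMPLEMENTATIONS = {"legacy": LegacySubmitApi, "memory": InMemorySubmitApi}


@dataclass(frozen=True)
class ImplementationConfig:
    """Names the implementation the API routes should use."""

    impl_name: str = "legacy"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ImplementationConfig":
        # SUBMIT_IMPLEMENTATION is an assumed variable name, not from the PR.
        name = env.get("SUBMIT_IMPLEMENTATION", "legacy")
        if name not in IMPLEMENTATIONS:
            raise ValueError(f"unknown implementation {name!r}; choose from {sorted(IMPLEMENTATIONS)}")
        return cls(impl_name=name)

    def create(self) -> object:
        """Instantiate the configured implementation."""
        return IMPLEMENTATIONS[self.impl_name]()


default_api = ImplementationConfig.from_env({}).create()
memory_api = ImplementationConfig.from_env({"SUBMIT_IMPLEMENTATION": "memory"}).create()
print(type(default_api).__name__, type(memory_api).__name__)  # LegacySubmitApi InMemorySubmitApi
```

Failing fast on an unknown name surfaces a misconfigured deployment at startup rather than on the first request. In FastAPI, one natural way to hand the created object to the routes is a dependency (`Depends`).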

This is to avoid conflicts with the arxiv. package from arxiv-base
There is a way to resolve this conflict, but it has just caused hassles during
NG, so let's move away from that pattern.
Can make a new submission, and it is written to legacy db.
Adds make_test_db.py
Simplifies Agent, User, Client
The install of the requirements.txt file is strange because I want to
avoid installing the large dependency list from arxiv-base.
@bdc34 self-assigned this Sep 23, 2024