Programs aux/config schema specification #418

Open
frankiebee opened this issue Sep 25, 2024 · 20 comments
Labels
spec-proposal used for discussions around spec proposals

Comments

@frankiebee
Collaborator

frankiebee commented Sep 25, 2024

It would be good to have a schema spec for aux/config data on deployment, so that when we pull program info from chain we have a rough idea of what we are getting. JSON Schema seems like the best option, but then it becomes a question of which version, and whether there is also a compatible Rust crate.

@entropyxyz/core-developers we can use this issue to track the progress on it

@frankiebee frankiebee added the spec-proposal used for discussions around spec proposals label Sep 25, 2024
@frankiebee frankiebee assigned frankiebee and unassigned frankiebee Sep 25, 2024
@mixmix
Collaborator

mixmix commented Oct 29, 2024

Hey @ameba23 I'm gonna be leading out on specifying the Schema for Aux/Config.

One thing I'm pretty keen on is validation happening at the lowest level possible. I am keen on aiming for

  1. no dead code on chain
    • no wasted funds deploying things which are not valid schema
    • no wasted space storing stuff which can never be used
  2. no leaky APIs
    • if there is a Schema in place, it cannot be side-stepped for signing
      • JS sdk/cli
      • Rust lib/ CLI

We don't have to put guard rails in Rust immediately, but I have the sense it would be good to choose a Schema standard which will work for JS + Rust.

A) What do you think?
B) can you give this crate a sniff ? https://docs.rs/jsonschema/latest/jsonschema/
- does it look "healthy"?
- would it be easy to use (documented, nice API)?

@ameba23
Contributor

ameba23 commented Oct 29, 2024

Hey,

i can give a fully thought out response later, but just for info, the device key program's aux data uses https://docs.rs/schemars/latest/schemars/ - a different JSON schema library. It looks to me like jsonschema is more focussed on validating json, whereas schemars is for generating schemas for rust structs (so more useful to the program developer, not the program user).

@ameba23
Contributor

ameba23 commented Oct 29, 2024

@mixmix Where on the rust side would this be used?

I can see three possible places:

  • In the programs or pallet, which is part of the chain runtime. That means we would need a no-std library which jsonschema is not unfortunately.
  • In entropy-tss, to validate aux data against the schema before running a program. This would be possible, but i'm not sure we gain anything by checking at that point. If we don't check, the program will fail with an error when it attempts to deserialize the json, and attempting to run the program is probably no more expensive in terms of computation than validating using the schema beforehand.
  • in entropy-client (the client code used by our tests and the test-cli). We could for sure do this, but it is supposed to be quite a low-level library which can work on different platforms, so i'd be wary of adding anything with a lot of dependencies.

I think it makes most sense for the validation to happen in full-featured client libraries - such as the SDK.

Also, personally i think using JSON for program aux and configuration should be a recommended guideline but not a requirement for program developers. JSON is inefficient, both in storage space and in the computation needed to deserialize, when compared to binary serialization, which could be a deal breaker for program developers who want to have large configurations or auxiliary data. But that's just my opinion and i wouldn't wanna block this.
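To give a rough feel for the storage difference mentioned above, here is a small sketch comparing one 32-byte key stored as JSON text against the same key stored as raw bytes. The `allowed_keys` property name and the placeholder key are made up for illustration, and a real binary format (SCALE, Protobuf, etc.) would add a few bytes of framing on top of the raw payload.

```typescript
// Placeholder 32-byte public key (not a real key).
const publicKey = new Uint8Array(32).fill(0xab);

// JSON cannot hold raw bytes, so the key is hex-encoded: two characters per byte.
const hex = Array.from(publicKey, (b) => b.toString(16).padStart(2, "0")).join("");
const json = JSON.stringify({ allowed_keys: [hex] }); // hypothetical property name

// The JSON text here is ASCII only, so its character count equals its UTF-8 byte count.
const jsonBytes = json.length;

// Binary: just the key bytes, with the layout implied by the program.
const binaryBytes = publicKey.length;

console.log(jsonBytes, binaryBytes); // 85 32
```

For a single small number the gap is negligible; it grows with byte-heavy data such as keys or hashes.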

@JesseAbram
Member

Also, personally i think using JSON for program aux and configuration should be a recommended guideline but not a requirement for program developers. JSON is inefficient, both in storage space and in the computation needed to deserialize, when compared to binary serialization, which could be a deal breaker for program developers who want to have large configurations or auxiliary data. But that's just my opinion and i wouldn't wanna block this.

I agree. It also goes to the point of neutrality on the platform: this is what you can do to work with the SDK or our tools, but anyone can create or write anything, or any tools, they want. Think of an organization using Entropy with its own preferred standards.

@mixmix
Collaborator

mixmix commented Oct 30, 2024

Good questions raised. At the heart of this is the question "What are Schema for?"

Are they:

  • A) a guideline for people inputting data?
  • B) a guard against invalid inputs

I have had in mind the user stories:

As a general entropy user I don't want to write my own program, I want to pick one off the shelf and add some config to tune it to my needs.

As program developer, I want to write a program which can be configured, and I want to define what is a valid configuration.

Example: petty-cash program

Entropy wants to make it possible for team leads to draw on petty-cash for small expenses.

Peg writes a program which will sign "any Eth transaction for <= $X"
Peg deploys the program with

  • bytecode
  • configurationSchema which says X MUST be defined in config, and MUST be a number less than 1

Tux installs this program with a config which says X = 0.1. (She also installs the device-key-proxy program and says "Frankie, Jesse, Vi can all request signatures")
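To make the petty-cash example concrete, here is a minimal sketch of what such a configurationSchema and a client-side check against it could look like. The schema shape borrows JSON Schema keywords (`required`, `type`, `exclusiveMaximum`), but the validator is a hand-rolled subset written only for this illustration; `pettyCashConfigSchema` and `checkConfig` are hypothetical names, not part of any Entropy library.

```typescript
// Hypothetical JSON-Schema-like description of the petty-cash config.
// Only the keywords used here are understood by this sketch.
type NumberRule = { type: "number"; exclusiveMaximum?: number };
type ObjectSchema = {
  type: "object";
  required: string[];
  properties: Record<string, NumberRule>;
};

const pettyCashConfigSchema: ObjectSchema = {
  type: "object",
  required: ["X"], // X MUST be defined
  properties: { X: { type: "number", exclusiveMaximum: 1 } }, // and MUST be < 1
};

// Returns a list of human-readable problems; an empty list means the config passed.
function checkConfig(schema: ObjectSchema, config: unknown): string[] {
  if (typeof config !== "object" || config === null || Array.isArray(config)) {
    return ["config must be an object"];
  }
  const record = config as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of schema.required) {
    if (!(key in record)) problems.push(`missing required property "${key}"`);
  }
  for (const [key, rule] of Object.entries(schema.properties)) {
    if (!(key in record)) continue;
    const value = record[key];
    if (typeof value !== "number" || Number.isNaN(value)) {
      problems.push(`"${key}" must be a number`);
    } else if (rule.exclusiveMaximum !== undefined && value >= rule.exclusiveMaximum) {
      problems.push(`"${key}" must be less than ${rule.exclusiveMaximum}`);
    }
  }
  return problems;
}

console.log(checkConfig(pettyCashConfigSchema, { X: 0.1 })); // Tux's config: no problems
console.log(checkConfig(pettyCashConfigSchema, { X: 100 })); // X too large
console.log(checkConfig(pettyCashConfigSchema, undefined)); // no config at all
```

A check like this only helps if it runs before the install transaction is submitted; where it should run is exactly what the rest of this thread is about.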

Questions

  1. Is config for programs expected to be used to help define valid inputs for signing?
  2. What would happen if Tux installed the program with "X = 100"?
    • at install-time => nothing?
    • at sign-time => ???
  3. What would happen if Tux installed the program without a config?
    • at install time => nothing?
    • at sign time => ???
  4. Whose responsibility is it to honour the Schema?
    • are they a guide or a guard
    • if it's not at a low level then it feels more like a "guide" to me, and users are open to abuse/ hurt?
  5. If we want un-opinionated schema, how do we track the format and know how to use them?
    • depends on the answer to (4)

I think the least worst case is "signing fails, you wasted money badly installing a program".
My experience is saying "if there is a schema then that means I expect it as a guard, and if it's not then it should probably not be on chain"

I am wearing 2 hats here:

  • 👒 UX : I don't want Tux to install something and later discover the install was bad (you just wasted time + money)
  • 🎩 Security : are we offering a guard rail or not?

@ameba23
Contributor

ameba23 commented Oct 31, 2024

Are they:

* A) a _guideline_ for people inputting data?

* B) a **guard** against invalid inputs

Ideally they are both. I think in the best case we can come up with a way of specifying schemas for whichever serialization format people choose. Eg: a json object with one property describing which serialisation format is used (JSON, Protobuf, RLP, SCALE, BSON, whatever), and another property with the actual schema (Json schema, protobuf schema, etc).
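As a rough sketch of that idea, the envelope could be a small JSON object with a `format` property and a `schema` property. The property names, the set of formats, and `parseEnvelope` below are all assumptions made for illustration; nothing here is an agreed spec.

```typescript
// Hypothetical envelope: one property names the serialization format,
// the other carries a schema written in that format's own schema language.
type SchemaEnvelope =
  | { format: "json"; schema: Record<string, unknown> } // a JSON Schema document
  | { format: "protobuf"; schema: string } // .proto source text
  | { format: "scale"; schema: string }; // some textual type description

// Parses and minimally checks an envelope; throws on anything it does not recognise.
function parseEnvelope(raw: string): SchemaEnvelope {
  const value: unknown = JSON.parse(raw);
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("envelope must be a JSON object");
  }
  const { format, schema } = value as { format?: unknown; schema?: unknown };
  switch (format) {
    case "json":
      if (typeof schema !== "object" || schema === null || Array.isArray(schema)) {
        throw new Error("a json envelope needs an object schema");
      }
      return { format: "json", schema: schema as Record<string, unknown> };
    case "protobuf":
      if (typeof schema !== "string") throw new Error("a protobuf envelope needs a string schema");
      return { format: "protobuf", schema };
    case "scale":
      if (typeof schema !== "string") throw new Error("a scale envelope needs a string schema");
      return { format: "scale", schema };
    default:
      throw new Error(`unsupported format: ${String(format)}`);
  }
}

const envelope = parseEnvelope('{"format":"json","schema":{"type":"object"}}');
console.log(envelope.format); // json
```

A client that only understands JSON Schema could then skip validation for any other `format` value rather than failing outright.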

But if that turns out to be too difficult i'm not against sticking with JSON for the sake of having standardisation and keeping things simple.

@JesseAbram
Member

no they should not be guard because then we have to decide what is and isn't allowed and we become opinionated.

They should act almost like an ABI for smart contracts on ethereum, it is how I talk to the program as a user.

Should they be on chain..........well this is a good point to be revisited, no probably not, they were originally added as an ask from @frankiebee as we don't have any public structure to place these items, but if they were to look like an ABI then ya maybe third parties should be responsible for this

@ameba23
Contributor

ameba23 commented Oct 31, 2024

Re: other public place to put them, we do have the program-metadata-http-service and there is an open PR to read configuration schema and aux data schema out of the Cargo.toml file of the project source code - as it is needed for getting the hash that the program will have on-chain.

@ameba23
Contributor

ameba23 commented Oct 31, 2024

at sign-time => ???

At sign time - signing will fail with an error. Which isn't so bad - requesting a signature doesn't cost anything.

@mixmix
Collaborator

mixmix commented Oct 31, 2024

It does cost users I reckon @ameba23 - confusion over why signing is failing. If the system "doesn't work" people will say "it's broken"

@ameba23
Contributor

ameba23 commented Oct 31, 2024

It does cost users I reckon @ameba23 - confusion over why signing is failing. If the system "doesn't work" people will say "it's broken"

I totally agree that we need guards against these things - but it's a bit expensive / impractical to have those guards in the chain runtime.

i think it should be up to client software, like tooling for program developers and the SDK for users, to provide guards against giving bad inputs when deploying or registering with a program. If people choose to use the chain API directly with no client software then yes they can make mistakes and will get confusing errors.

@ameba23
Contributor

ameba23 commented Nov 1, 2024

Having thought about this a bit more - i think the problem is we are saying schemas should be a guide and not a guard - but also that there should be a guard in the SDK to stop users from registering with an invalid program configuration.

The best way to check the validity of a program configuration without using a schema is to attempt to load the program with configuration into the programs runtime. But i'm not sure if we can do this from JS as that would be a wasm runtime inside a wasm runtime.

If we can't do that, then i think we do need schemas to act as a guard and not just as an instruction manual describing to users how the configuration should look.

Edit: i attempted to compile wasmtime (the dependency of entropy-programs-runtime which provides the runtime) to target wasm32-unknown-unknown, and it failed. So it looks to me like it is not going to be possible to provide JS bindings to entropy-programs-runtime.

@mixmix
Collaborator

mixmix commented Nov 4, 2024

I think you named it well - we want a guard, but we want to balance the need for that with the cost.
I also like noticing that we have UX requirements/ expectations, and those could/ should be discussed before diving into the "how" and all the trade-offs.

I notice myself jumping to solutions...

I'm hearing that we do want good UX. Maybe we should write down what we agree that means so we're aligned on the goal we're aiming for, then sort must-haves from the nice-to-haves, then get into the implementation + trade-offs.

@JesseAbram
Member

I think it is important to distinguish where the UX lives and why. Core should not be opinionated, which does cause worse UX but doesn't hamstring future builders. All other user facing repos can be opinionated and force design decisions

@mixmix
Collaborator

mixmix commented Nov 6, 2024

I am trying to bring clarity to form up a spec, which involves:

  • surfacing what needs are for users (program devs, program installers, signers),
  • exploring how we can meet those needs with our code (across levels of the stack)

Core doesn't get to side-step UX conversation because they're "opinionated". We have users who will use our code, we're all looking for the right balance of opinionated (which constrains, but also stabilises), and un-opinionated (which is flexible, but destabilises) to provide a great experience for these humans. If humans have expectations that are not being met, then it may be a problem very relevant to core.

Speaking as one such user (an app developer - JS CLI), right now we have configurationSchema and auxillaryDataSchema and my feedback on the idea that these are guides is:

what is the point of a schema if people can side-step it to potentially abuse Entropy, this makes deploying programs risky for me + my users?...what is the point of this here even.... just move the validation into the wasm.... oh that means deploying my own wasm if I want to constrain config/ aux inputs....groan

I outlined another possible user-journey where Tux has a bad time above.

Are we interested in users having a good time? Whose responsibility is that?

What I'm hearing is @JesseAbram communicating ~ "we don't want schema as a guard in core, that's too opinionated/ not our concern".

I was excited to have stability provided which would guarantee good UX (reliable behaviour, no wigglyness for abuse). Personally I would rather rip the schema out if it's just a guide - don't pretend safety that does not exist.


To me this conversation needs a north-star which will help guide how we prioritize/ approach/ decide. What I'd love is for it to be collaborative, where we sketch a couple of use-cases we currently support or hypothetically would support, and then talk through the implications of those across the stack. I want it to be a conversation where we work together to meet each team's needs as best we can.

What I'm feeling is a bit of a "well that's not our problem" from you Jesse. What I'd like you to hear is "I think JS / CLI has a problem, we'd love to collaborate on that".

Concretely

  • explore if the problem I'm sensing is a valid problem
  • then figuring out what we can all do to address that

If core doesn't have capacity for that conversation, that is important + good to hear too ❤️. I'm totally aware that TDX is a major priority. In that case I would probably recommend we move to remove the schema completely till we have the time to actually design a great solution.

@mixmix
Collaborator

mixmix commented Nov 6, 2024

On reflection, this thread is feeling like the wrong bandwidth level. (I spent 20-30 mins writing that trying to balance the right level of "this is important" (which will probably land as annoying / challenging) with "I'm trying to hear you and collaborate, we're in this together".)

Would someone from Core like to do a 60 min call with me to progress this to a good next step?

@ameba23
Contributor

ameba23 commented Nov 6, 2024

me and @JesseAbram would be totally up to have a call about this - will message you on discord @mixmix .

I think we all agree that it would be great to make it not possible for a user to register with an invalid program configuration. Unfortunately there are practical reasons why it is hard to do this on the backend (chain runtime) and i think it makes sense for this to be a client-side check.

Something we maybe don't all agree on is to what extent we should prioritise standardisation over flexibility for program developers, and what the purpose of the schema fields is. I think that is maybe a conversation for the programs repo rather than this one.

I'm not sure we can come up with a good solution to that without looking at the whole process: how do users find and choose programs, and what software they use to create the program config. Those things feel like a long way off from where we are now with program tooling, and it would be great if you wanna get involved in that.

But i think what we can do right now is come up with a way for the SDK to be able to check the validity of a program configuration for a given program (without using schemas). Either by using a child process which can load the programs runtime, or if that is not possible for some reason (eg: on browser or mobile) by connecting to a network service which can make that check for you (but not the entropy blockchain).
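One way the SDK side of this could be shaped is a small checker interface with a fallback chain: try the local child-process check first, and fall back to a network service where that is not available. Everything below is a sketch with hypothetical names (`ConfigChecker`, `checkConfigWithFallback`, and the two stub checkers); the stubs do not load a real runtime or call a real service.

```typescript
// All names here are hypothetical; none of them exist in the Entropy SDK.
type CheckResult = { ok: true } | { ok: false; reason: string };

interface ConfigChecker {
  // Resolves to a verdict, or rejects if this checker cannot run in the current environment.
  check(programBytecode: Uint8Array, config: Uint8Array): Promise<CheckResult>;
}

// Tries each checker in order and returns the first verdict obtained.
// A rejection means "could not check", not "config is invalid", so we move on.
async function checkConfigWithFallback(
  checkers: ConfigChecker[],
  programBytecode: Uint8Array,
  config: Uint8Array,
): Promise<CheckResult> {
  const failures: string[] = [];
  for (const checker of checkers) {
    try {
      return await checker.check(programBytecode, config);
    } catch (err) {
      failures.push(err instanceof Error ? err.message : String(err));
    }
  }
  throw new Error(`no checker available: ${failures.join("; ")}`);
}

// Stub standing in for "spawn a child process which loads the programs runtime".
const childProcessChecker: ConfigChecker = {
  async check() {
    throw new Error("child processes are unavailable in this environment");
  },
};

// Stub standing in for "ask a network service to do the dry run".
const remoteChecker: ConfigChecker = {
  async check(_programBytecode, config) {
    return config.length > 0 ? { ok: true } : { ok: false, reason: "empty config" };
  },
};

checkConfigWithFallback(
  [childProcessChecker, remoteChecker],
  new Uint8Array([0x00]), // placeholder bytecode
  new Uint8Array([0x7b, 0x7d]), // the bytes of "{}"
).then((result) => console.log(result));
```

Keeping "could not check" distinct from "checked and invalid" lets a client decide separately whether to block the user or just warn.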

@mixmix
Collaborator

mixmix commented Nov 8, 2024

I think you're right that there's a higher level question about "what the user flows are for programs".
We don't need to solve everything, I think we just need to align on:

  • what the long term goals likely are
  • what we're doing now/ later
  • what this all means for these current fields
    • do we scrap them? (till we can do something well?)
    • do we keep them, but clearly document assumptions/ limitations?

Thanks for the call invite 🙏 See you next week!

@mixmix
Collaborator

mixmix commented Nov 12, 2024

@ameba23 @JesseAbram and I had a call!
Here's the rough notes


Entropy Schema spec | 2024-11-12

What can go wrong if a person deliberately puts in config/ auxData

Ideas for

Option A: Dry Run - local

compile the code down to wasm, try some mock input, see if it passes or not
do this in the browser

Option A2: Dry Run - child process / worker

compile the code down to wasm, try some mock input, see if it passes or not
do this in the browser

Option B: Dry Run - 3rd party

if we run this

  • centralized point of failure
  • costs us

could be skipped --no-verify

Option C: try after install

bad in that you waste time on bad install
good in that it's better than what we have

Observations

people are gonna need to learn how to use a program
we're gonna need tooling

Questions

  1. for Frankie: why is this different than Smart-contract ABI
  2. can we read the hard disk from our wasm?
  3. what should we do with schema fields now?
    a) rip them out now
    b) nothing ... wait and watch/ rip them out later
    c) further specify them
  4. could this schema stuff be in pegs metadata service

Recommendations

Rip out schema from on-chain storage
Build the JSON Schema or whatever, but they live in the source code

Eventually go to some 3rd party registry.

Decisions

  1. these are guides, not guards
  2. we would like to aim for "dry run" check
    • before you deploy, your wasm is ok (maybe)
    • before you install/ register a program with a particular config
    • before you sign with a given auxData
  3. [TODO: decide on schema course of action]

Actions

mix

  • talk to JS team + loop back on decisions/ recommendations

jesse

@mixmix
Collaborator

mixmix commented Nov 20, 2024

I shared a brief update of this with the JS team. I did hear from @frankiebee a strong need to not have ABI/ schema be in an external system - this was done in Ethereum, and Frankie had things to say about how that made building things in Metamask harder? (will leave her to unpack)
