Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate env vars with Zod #2362

Open
wants to merge 42 commits into
base: main
Choose a base branch
from
Open

Validate env vars with Zod #2362

wants to merge 42 commits into from

Conversation

infomiho
Copy link
Contributor

@infomiho infomiho commented Oct 29, 2024

This PR will enable us to have strict env vars validation with Zod for Wasp's internal env vars. Also, it will enable users to define their own env var validation. Wasp will use the Zod schemas to output errors for missing env vars.

Left to do

  • After the strict null checks is merged, make sure that the types are correct Enables strict null checks in SDK #2360
  • Make sure all env vars in our code are used through the env object and not directly
  • Experiment with enabling users to add their own env vars validation Enable users to define their env var validations #2363
  • Update docs (I'm going to do it in a separate PR)
    • Make sure to mention the difference between env and process.env
  • Fix keycloak.env.KEYCLOAK_REALM_URL leftover

Closes #1067

@infomiho infomiho changed the base branch from main to miho-strict-null-checks October 29, 2024 11:25
@infomiho infomiho force-pushed the miho-strict-null-checks branch from 18f9fd0 to a509762 Compare October 29, 2024 11:26
Signed-off-by: Mihovil Ilakovac <[email protected]>
@infomiho infomiho mentioned this pull request Nov 20, 2024
4 tasks
Base automatically changed from miho-strict-null-checks to main November 26, 2024 14:35
{=/ isAuthEnabled =}
const allowedCORSOrigins = allowedCORSOriginsPerEnv[env.NODE_ENV]

const config: Config = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified the creation of the config object, instead of having three separate things all, development, production, now we construct a single object and vary some of the values (e.g. allowed CORS origins) based on NODE_ENV.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we loose anyting this way? Anythihg int he old appraoch that was beneficial from having that split? Why was that split there otherwise, I do think it had some benefits possibly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the previous approach we clever since it envisioned many different options for dev and for production, but it in the end, we only vary the CORS origins between dev and production. So, it seemed like an overkill and I know I had these complex types to capture the different objects. But when I looked closer, it turned out this simple Config type is enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, makes sense, we can always modify it if needed!

@@ -16,6 +16,7 @@ export function App() {

const connectionIcon = isConnected ? '🟢' : '🔴'

// TODO: enable users to define their own client env vars
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did it in a follow-up PR: #1067

Copy link
Member

@Martinsos Martinsos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@infomiho looks quite solid to me, improvements in the codebase are visible already! Nice stuff.

I left some comments.

Some general stuff I also had on mind:

  1. What if users try to access our env vars via process.env? I know they can, nothing bad will really happen, but we would rather they use them via our env object, right, so they have better DX (validation happened, types)? This becomes even more important once they start defining their own variables I guess (which is another PR)? I don't think we should in any way forbid them from using process.env of course, that is too crucial of a thing to touch, but I wonder if we should do anything. Maybe monkey patch it in some way so in dev they get a warning that they should use our env object instead? Sounds a bit wild.
  2. process.env and our env object -> are the values in them the same? Shoudl they be? I am guessing they are not the same because of zod default values -> so in env, values will be set to those defaults if they are empty, right? While in process.env, they will not be. This could be confusing for users -> you access variable from env, it has some value, but you know you haven't set it via env, nor anybody else. So how can it have value? Hm. Maybe just documenting this behaviour is good enough.
  3. Just to confirm, what is the behaviour in production now? If some of the env vars are missing and they are required, server will print error and crash? What about in development, it will print errors and also crash or continue?

# of the bundle and references the dotenv package.
# Copying 'server/node_modules' because we require dotenv package to
# load environment variables
# TODO: replace dotenv with native Node.js environment variable loading
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that, what is wrong with dotenv? Node.js now has native support for it that is equally good?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always nicer to use natively supported methods than a package to do the same thing. It's available since Node.js 20 and we should probably go for it when Node.js 20 becomes our minimum version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probabl make nodejs 20 our minimum version any moment hm. Ok, so it does do the same thing? If so, great.

waspc/data/Generator/templates/Dockerfile Show resolved Hide resolved
if (e instanceof z.ZodError) {
const errorOutput = [
'',
'╔═════════════════════════════╗',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool! I do remember in demo that it was still a bit hard for me to figure out that this is the most important output to look at + where it starts / ends.

Maybe it would be better if it hard clear start and end.
E.g. something like this:

====== Env vars validation failed ======
 - error 1
 - error 2
 - error 3
========================================

or

====== Env vars validation failed ======
 - error 1
 - error 2
 - error 3
==== / Env vars validation failed ======

or even

|====== Env vars validation failed ======
| - error 1
| - error 2
| - error 3
|========================================

General idea: we should probably have some kind of universal way for outputing Wasp errors from the TS/JS/node runtime, like we have it for Haskell runtime. E.g. we could have a function defined for it, called consoleWaspError or something like that, that would always format errors like this in a same way. And hopefully it would be similar as the errors we output from the Haskell runtime, so they can easily connect it -> simialr formatting / structure / headers / style / ... .

It might be worth creating this general logging function right now in this PR, it should be quite easy, and then we have it done and can use it? If not ok, we can do it in another PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, let me do another iteration of this output 👍

I'll create an issue for making this more universal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2024-12-16 at 10 32 37

Here's the updated version. Related issue: #2420

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this now -> nice! Why don't you use nice characters for the left border? and and ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it and I didn't like it actually!

The box looks too polished but it's missing the right border, so this look is kinda less polished and the missing right border doesn't bother me that much visually 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok yeah I thought that might be the case. Although this also looks a bit weird. But ok, it is good enough for now, I also don't have better ideas at the moment!

Copy link
Member

@Martinsos Martinsos Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quick ideas though:

===== Env vars validation failed =====
> - some text
> - some more text
> - third problem
======================================
===== Env vars validation failed =====
| - some text
| - some more text
| - third problem
======================================

or even just

===== Env vars validation failed =====
 - some text
 - some more text
 - third problem
======================================

Yeah I think I like this one the best.

There will probalby never be more then a couple of these errors, so if you remove those empty lines and make it more compact, you get it all to look quite tight and connected, I think left border might not even be needed then.

Also notice how I put some more = before and after the title, so it doesn't start at the same point where bullets start and looks a bit more like a title I think.

Do as you like though, you can keep it as it is if you prefer that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd drop all the | at the beginning of lines. It's weird and adds no extra info.

Martin's last suggestion is my favorite.
Obviously, I won't argue over this, so you can choose whatever you want :)

{=/ isAuthEnabled =}
const allowedCORSOrigins = allowedCORSOriginsPerEnv[env.NODE_ENV]

const config: Config = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we loose anyting this way? Anythihg int he old appraoch that was beneficial from having that split? Why was that split there otherwise, I do think it had some benefits possibly?

waspc/data/Generator/templates/sdk/wasp/server/env.ts Outdated Show resolved Hide resolved
waspc/data/Generator/templates/sdk/wasp/server/env.ts Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it bad that this is so centralized? I imagine taht on Haskell side you have to import a ton of things from the codebase in the generator that does this in order to side what goes in and what doesn't. Part of me is tempted to look for a more decentralized appraoch, where each "module" (auth providers, job, email, ...) defines its own env vars and they are somehow all registered in this central place, sure, but they are not all defined in this central place, but it is done via some other mechanism. Maybe that would cause cyclic deps hm, I guess it depends on how you execute it. I haven't thought it through more than this, but you probably did, so I am interested what was your reasoning, why you chose this approach, what are pros/cons.

Copy link
Contributor Author

@infomiho infomiho Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could build such a system, each part of the framework declares what it needs and declares that for development and production. Then we assemble it in this file by importing those schemas and merge them into one large schema which we use to validate the process.env object.

I feel like it could be useful if we start splitting Wasp into multiple full-stack modules or smth like that. Each module declares what it requires. But ... I think it's maybe an overkill for this PR. This PR wanted to use Zod to validate env vars and make sure all framework code uses validated env vars. I think the decentralised env var schemas system could be a nice improvement if we decide it's worth the extra complexity in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm ok yeah, if you don't feel liek this makes our code much worse, then it is ok I believe. We can see how this behaves!

Copy link
Contributor

@sodic sodic Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could build such a system, each part of the framework declares what it needs and declares that for development and production. Then we assemble it in this file by importing those schemas and merge them into one large schema which we use to validate the process.env object.

I wanted to suggest exactly this!

I see it as a clear improvement. While I'm OK with postponing it for later, I vote for it now.

This PR wanted to use Zod to validate env vars and make sure all framework code uses validated env vars. I think the decentralised env var schemas system could be a nice improvement if we decide it's worth the extra complexity in the future.

I might be missing something, but it seems we already had a modular (decentralized) system. It wasn't validated but it was modular (at least in terms of auth).
Just adding Zod to improve validation would have made it a fully validated modular system. Instead, we have a fully validated non-modular system.

In terms of validation and type safety, this is a definite improvement.
But, in terms of modularity, it seems like a regression.

EDIT (after looking at everything): Ok, the current Haskell code doesn't look bad at all. And I realize this would be a big change (not everything was modular). So consider my vote for doing it now softened by 50% :)

waspc/src/Wasp/Generator/EmailSenders.hs Outdated Show resolved Hide resolved
@infomiho
Copy link
Contributor Author

@Martinsos

What if users try to access our env vars via process.env?

They can, it's part of the Node.js platform - we don't want to deny them that. We offer a convenience method of accessing Wasp defined env variables - which are validated and typed. Monkey patching sounds like it could break some other libraries. that users might want to use in the future - so I think that's not the way to go.

process.env and our env object -> are the values in them the same?

One of them is a raw object, one of them is our validated and typed object. We encourage users to use our env object, while they are free to use the source process.env object as well. We can mention in the docs that there will difference so they are not surprised 👍

Just to confirm, what is the behaviour in production now? If some of the env vars are missing and they are required, server will print error and crash? What about in development, it will print errors and also crash or continue?

Failed validation crashes the server / client - in both development and production. This way it's clear upfront what's wrong vs. having some sort of runtime errors that might not be clear to our users.

@infomiho infomiho requested a review from Martinsos December 16, 2024 12:05
@Martinsos
Copy link
Member

@Martinsos

What if users try to access our env vars via process.env?

They can, it's part of the Node.js platform - we don't want to deny them that. We offer a convenience method of accessing Wasp defined env variables - which are validated and typed. Monkey patching sounds like it could break some other libraries. that users might want to use in the future - so I think that's not the way to go.

process.env and our env object -> are the values in them the same?

One of them is a raw object, one of them is our validated and typed object. We encourage users to use our env object, while they are free to use the source process.env object as well. We can mention in the docs that there will difference so they are not surprised 👍

Just to confirm, what is the behaviour in production now? If some of the env vars are missing and they are required, server will print error and crash? What about in development, it will print errors and also crash or continue?

Failed validation crashes the server / client - in both development and production. This way it's clear upfront what's wrong vs. having some sort of runtime errors that might not be clear to our users.

I believe @sodic suggested we more often quote only small parts of the original message, but in this case I would prefer if you quoted the whole thing, because you left out a lot and I had to go back to my original message to see what I actually wrote, which was not evident here and was missing from the discussion if somebody reads it. Not a biggie though.

Ok, I agree about not touching process.env is probably the best -> btw I advised monkey patching only to print something in dev, exactly because we don't know who might use it, so not to be intrusive, but I guess even that could be surprising in a way, so yeah better not to touch it probably at all.
The thing I think will be confusing is that they will just not be aware of env, and neither will LLM they are using, so we will often be seeing people using process.env instead of env, because they didn't even know env exists. So I am tempted to try to do sometihng about it. Monkey patching does sound intrusive tough, and linter rules or something sound too complex.

On the second thing -> You explained shortly the difference between process.env and env + said we can document it, which is what I already knew / suggested above, but you didn't answer the most interesting (to me) part of my question about default values. That from my perspective is the biggest difference and it could be considered quite unexpected, therefore a source of confusion / wrong assumptions. Have you considered that, and if so, what are your thoughts?

Dev/production craches -> ok great, that is what we want, I was just double checking that is how we did it.

Copy link
Member

@Martinsos Martinsos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments but approved!
I would also suggest that @sodic takes a look before merging.

@infomiho
Copy link
Contributor Author

infomiho commented Dec 17, 2024

@Martinsos

What if users try to access our env vars via process.env?

They can, it's part of the Node.js platform - we don't want to deny them that. We offer a convenience method of accessing Wasp defined env variables - which are validated and typed. Monkey patching sounds like it could break some other libraries. that users might want to use in the future - so I think that's not the way to go.

process.env and our env object -> are the values in them the same?

One of them is a raw object, one of them is our validated and typed object. We encourage users to use our env object, while they are free to use the source process.env object as well. We can mention in the docs that there will difference so they are not surprised 👍

Just to confirm, what is the behaviour in production now? If some of the env vars are missing and they are required, server will print error and crash? What about in development, it will print errors and also crash or continue?

Failed validation crashes the server / client - in both development and production. This way it's clear upfront what's wrong vs. having some sort of runtime errors that might not be clear to our users.

I believe @sodic suggested we more often quote only small parts of the original message, but in this case I would prefer if you quoted the whole thing, because you left out a lot and I had to go back to my original message to see what I actually wrote, which was not evident here and was missing from the discussion if somebody reads it. Not a biggie though.

Ok, I agree about not touching process.env is probably the best -> btw I advised monkey patching only to print something in dev, exactly because we don't know who might use it, so not to be intrusive, but I guess even that could be surprising in a way, so yeah better not to touch it probably at all. The thing I think will be confusing is that they will just not be aware of env, and neither will LLM they are using, so we will often be seeing people using process.env instead of env, because they didn't even know env exists. So I am tempted to try to do sometihng about it. Monkey patching does sound intrusive tough, and linter rules or something sound too complex.

On the second thing -> You explained shortly the difference between process.env and env + said we can document it, which is what I already knew / suggested above, but you didn't answer the most interesting (to me) part of my question about default values. That from my perspective is the biggest difference and it could be considered quite unexpected, therefore a source of confusion / wrong assumptions. Have you considered that, and if so, what are your thoughts?

Dev/production craches -> ok great, that is what we want, I was just double checking that is how we did it.

Replying to the full message but I'm only replying to the last question.

People use the config object today to get the serverUrl and frontendUrl nowadays. So they are used to using higher level abstractions. env object is something we made up, we document it and say it has default values for certain things. People can use it and there they get the default values. If they use the platform primitive process.env, it's just what they expect. I don't see this is a potential source of confusion - maybe I'm wrong. I'd rather do less at first than more. Let's see how people use this and then put extra guard rails if we see it as a source of problems.

@Martinsos
Copy link
Member

People use the config object today to get the serverUrl and frontendUrl nowadays. So they are used to using higher level abstractions. env object is something we made up, we document it and say it has default values for certain things. People can use it and there they get the default values. If they use the platform primitive process.env, it's just what they expect. I don't see this is a potential source of confusion - maybe I'm wrong. I'd rather do less at first than more. Let's see how people use this and then put extra guard rails if we see it as a source of problems.

That makes sense to me -> I wanted to hear your thinking on this. Ok, if we make it clear in the docs that it might have default values, then I think that is fine.
Btw I get what you mean by config, but the thing with env and process.env is that they have very similar names and are even more similar in concept, so it is more likely that people will expect them to behave the same, that is why I think it is important to be careful about it.

Copy link
Member

@Martinsos Martinsos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved! One or two small comments left, pls take a look but then do as you like.

waspc/data/Generator/templates/sdk/wasp/env/index.ts Outdated Show resolved Hide resolved
if (e instanceof z.ZodError) {
const errorOutput = [
'',
'╔═════════════════════════════╗',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd drop all the | at the beginning of lines. It's weird and adds no extra info.

Martin's last suggestion is my favorite.
Obviously, I won't argue over this, so you can choose whatever you want :)

Copy link
Contributor

@sodic sodic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I'm a big fan of this change!

I don't have many comments, Martin covered it well.

waspc/data/Generator/templates/sdk/wasp/server/env.ts Outdated Show resolved Hide resolved
waspc/data/Generator/templates/sdk/wasp/prettier.config.js Outdated Show resolved Hide resolved
@@ -55,15 +55,15 @@ WORKDIR /app
# Copying the top level 'node_modules' because it contains the Prisma packages
# necessary for migrating the database.
COPY --from=server-builder /app/node_modules ./node_modules
# Copying the SDK because 'validate-env.mjs' executes independent of the bundle
# Copying the SDK because the server bundle doesn't bundle the SDK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? Why wouldn't it bundle the SDK? It's a dependency like any other (and it's copied with the node modules anyway).

Also, there's the second part of the old comment (below).

I know I wrote the original comment but I am not sure what I meant. But, I think I remember specifically needing it for validate-env.mjs.

Copy link
Contributor Author

@infomiho infomiho Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rollup doesn't bundle the wasp/* deps because they are considered "external" deps. I tried making them "internal" (adjust the regex for external deps) and it still didn't want to bundle them. Maybe because they are from node_modules. So, yep, we still need to copy over the SDK in the build context.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the explanation (or a link to this discussion in the comment)?

Copy link
Contributor

@sodic sodic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work!

I'm approving, just add this please: #2362 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Defining environment variable types using Zod
3 participants