fix: Initial support for open-next v2 and ISR #125

Merged

Conversation

kevin-mitchell
Contributor

@kevin-mitchell kevin-mitchell commented Jul 28, 2023

See: #119

Updates the server handler to include cache bucket configuration, which is likely not correct as I'm just using the static asset bucket temporarily. I need to look at how SST does this and what the spirit of this should be, e.g. if a separate bucket would be more appropriate, a folder, etc.

This also adds the ISR handling logic. I haven't actually tested this yet (am not 100% sure how to).

This is a WIP.

Fixes #119

TODO

  1. Confirm if we should make a new bucket for cache or use an existing bucket
  2. Confirm permissions appropriate for this bucket between nextjs server and the bucket
  3. Confirm if ISR feature is working and what else may be needed
  4. Make sure I didn't break anything along the way
  5. I'd like to add descriptions to other functions

@kevin-mitchell kevin-mitchell changed the title Initial support for open-next v2 and ISR fix: Initial support for open-next v2 and ISR Jul 28, 2023
@revmischa revmischa requested a review from khuezy July 28, 2023 05:42
@bestickley
Collaborator

@kevin-mitchell, thanks for starting on this! One way to test this would be to deploy the Next.js 13 App Dir Example from Vercel here and ensure everything works as expected. I think that example showcases many features we'd want to test against. Not sure about pages directory as I don't have as much experience with those features.

@revmischa revmischa self-requested a review July 28, 2023 15:24
@kevin-mitchell
Contributor Author

kevin-mitchell commented Jul 29, 2023

The current version of this branch is mostly working. However, with the Next.js demo, navigating through the client context page produces an error about a missing key in the cache.

clicking "electronics" on this page
Screenshot 2023-07-29 at 1 10 20 PM

results in
Screenshot 2023-07-29 at 1 10 28 PM

And I think this is the corresponding error. I have to look to see what is wrong here.

edit: I think the error below is actually expected / working as intended, because this exception is caught (see: async getFetchCache(key: string) {).

2023-07-29T17:01:21.900Z	e58948db-e51e-4df8-926a-616be4780b38	ERROR	{
  clientName: 'S3Client',
  commandName: 'GetObjectCommand',
  input: {
    Bucket: 'cdk2stack-webcache4214db24-1eiepcnix6r5t',
    Key: '__fetch/zu8Tv8Vb6COL0vbxjUL0N/99afeb833f222f564e02a181c4c851b948d3afac5d6ade910de49fefee39bfb2'
  },
  error: NoSuchKey: The specified key does not exist.
   
... snip snip snip ...
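
For readers following along, the pattern mentioned in the edit (a missing S3 key is treated as an ordinary cache miss rather than a failure) can be sketched roughly as below. This is not open-next's actual code: `CacheStore`, `isMissingKey`, and `getOrNull` are hypothetical names, and the only detail taken from the log is the `NoSuchKey` error name.

```typescript
// Hypothetical minimal store interface; stands in for an S3 client wrapper.
interface CacheStore {
  get(key: string): Promise<string>;
}

// Returns true when the error looks like S3's "NoSuchKey".
// Assumption: the error exposes that name through its `name` property.
function isMissingKey(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "NoSuchKey"
  );
}

// Treats a missing key as a cache miss (null) and rethrows anything else.
async function getOrNull(store: CacheStore, key: string): Promise<string | null> {
  try {
    return await store.get(key);
  } catch (err) {
    if (isMissingKey(err)) return null;
    throw err;
  }
}
```

With a wrapper like this, the `NoSuchKey` entry can still be logged while the request carries on as a miss, which would match the log line being noisy but harmless.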

I'm sort of guessing the real error here is actually from the request trying to fetch from the server:

2023-07-29T17:01:21.904Z	e58948db-e51e-4df8-926a-616be4780b38	ERROR	 [TypeError: fetch failed] {
  cause:  [Error: connect ECONNREFUSED 127.0.0.1:3000] {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 3000
}
}

@kevin-mitchell
Contributor Author

I switched to using the explicit examples called out here: https://github.com/serverless-stack/open-next/blob/main/README.md#example

I could use some input on the CloudFront Distribution setup. What I have "works" and seems to have the same behavior as SST (without digging too deep - IMHO a 404 should be cached, but both the current branch here and SST return an "Error from cloudfront" x-cache header), but I should look closer to see what the "right" thing to do is.

That said, SOMETHING isn't tuned well because the SSG pages are serving very old CloudFront caches. I'm not sure if this issue predates this PR because I have never actually used cdk-nextjs when it was working (before the open-next v2 update).

I'm copy / pasting some notes I took, although the 404 is fixed as noted above by removing the origin group and replacing it with a direct call to the web server.

Issues

I have never (successfully) been able to use this CDK construct / project, so I’m not sure which of these are new, which are related to the changes in this PR, and

Error related to SST in logs for Server Function

2023-07-30T21:15:15.332Z	e6fba54c-e62a-4d70-9b9f-52758eee431c	ERROR	Error: Cannot access bound resources. This usually happens if you are using an older version of SST. Please update SST to the latest version.
    at Object.get (file:///var/task/node_modules/.pnpm/[email protected]/node_modules/sst/node/util/index.js:27:27)
    at /var/task/.next/server/pages/api/auth/[...nextauth].js:51:75

Not sure why this is happening, but it seems “random” (most requests don’t see it).

Cache isn’t behaving ideally for SSG

  1. Do a new deployment of site
  2. SSG page is still cached, so shows the wrong timestamp
  3. Adding query parameters to “bust” cache shows expected SSG timestamp

Maybe this is working as intended, but it seems like the TTL is either too long or a cache invalidation should happen as part of the deployment, OR some documentation somewhere should say “you have to invalidate the cache yourself on deployment.”

Or I’m just missing something (very possible)

404 pages seem to serve 403 XML response from S3

NOTE THIS WAS AT LEAST PARTIALLY FIXED - I replaced the default origin group with the S3 fallback with a direct call to the Lambda server, which I believe is what SST also does.

I’m not 100% sure this is what’s actually happening; maybe it’s not S3. But it at least seems involved. One header is “Server: AmazonS3”

I’m not sure how this is intended to work, or if this is coming from the cache bucket because the cache bucket is misconfigured, etc.

<Error>
<script/>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>63DM002111YMJTDN</RequestId>
<HostId>bQZYzcGFF5wf6CCiU2VzdkUgD0q535TC6lAB9fwmLjp/MRYGmmbx4hU2gxyhPDePqBgiye7UZhM=</HostId>
</Error>

Note that the server lambda is triggered, and seems to generate the correct response - so it’s something about either the CloudFront distribution or something else about the setup that’s causing the wrong result to be sent.

In other words, the result of debug("ServerResponse data", { statusCode, headers, isBase64Encoded, body }); here https://github.com/serverless-stack/open-next/blob/274d446ed7e940cfbe7ce05a21108f4c854ee37a/packages/open-next/src/adapters/server-adapter.ts#L131 is

2023-07-30T22:43:08.050Z	f1f2aecb-9b57-4598-a4cd-9e40cb1cfee0	INFO	ServerResponse data {
  statusCode: 404,
  headers: [Object: null prototype] {
    'x-powered-by': 'Next.js',
    etag: '"xl782hlhkj1o3"',
    'content-type': 'text/html; charset=utf-8',
    'content-length': 2165
  },
  isBase64Encoded: false,
  body: '<!DOCTYPE html><html><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width"/><link rel="icon" href="/favicon.ico"/><meta name="description" content="Learn how to build a personal website using Next.js"/><meta property="og:image" content="https://og-image.vercel.app/Next.js%20Sample%20Website.png?theme=light&amp;md=0&amp;fontSize=75px&amp;images=https%3A%2F%2Fassets.vercel.com%2Fimage%2Fupload%2Ffront%2Fassets%2Fdesign%2Fnextjs-black-logo.svg"/><meta name="og:title" content="Next.js Sample Website"/><meta name="twitter:card" content="summary_large_image"/><meta name="next-head-count" content="7"/><link rel="preload" href="/_next/static/css/52270e7d977c310f.css" as="style"/><link rel="stylesheet" href="/_next/static/css/52270e7d977c310f.css" data-n-g=""/><link rel="preload" href="/_next/static/css/34339a76372c2b69.css" as="style"/><link rel="stylesheet" href="/_next/static/css/34339a76372c2b69.css" data-n-p=""/><noscript data-n-css=""></noscript><script defer="" nomodule="" src="/_next/static/chunks/polyfills-78c92fac7aa8fdd8.js"></script><script src="/_next/static/chunks/webpack-8fa1640cc84ba8fe.js" defer=""></script><script src="/_next/static/chunks/framework-6698976aa0ea586d.js" defer=""></script><script src="/_next/static/chunks/main-e3ffb9c21b7073be.js" defer=""></script><script src="/_next/static/chunks/pages/_app-b0f7a974fa14c156.js" defer=""></script><script src="/_next/static/chunks/524-6309da801f10dcdd.js" defer=""></script><script src="/_next/static/chunks/pages/404-eeaaba92ee717891.js" defer=""></script><script src="/_next/static/Pe7Ceh4Ds1YSzL2f7c72U/_buildManifest.js" defer=""></script><script src="/_next/static/Pe7Ceh4Ds1YSzL2f7c72U/_ssgManifest.js" defer=""></script></head><body><div id="__next"><div class="layout_container__fbLkO"><main><article><h1>404</h1></article></main><div class="layout_backToHome__9sjx_"><a href="/">← Back to home</a></div></div></div><script id="__NEXT_DATA__" 
type="application/json">{"props":{"pageProps":{}},"page":"/404","query":{},"buildId":"Pe7Ceh4Ds1YSzL2f7c72U","nextExport":true,"autoExport":true,"isFallback":false,"scriptLoader":[]}</script></body></html>'
}

@bestickley
Collaborator

@kevin-mitchell, the "Error related to SST in logs for Server Function" doesn't make sense to me. There should be no code related to SST in this construct. Were you referring to SST's construct?

@bestickley
Collaborator

> Cache isn’t behaving ideally for SSG

Could this be connected to the change in the default origin group code I commented on?

@kevin-mitchell
Contributor Author

kevin-mitchell commented Jul 31, 2023

@bestickley right this is why I called it out. I'm not sure where it came from. I'll try to look with fresh eyes today.

Edit: and nope, I was referring to this construct, the pr I've got open. I figured perhaps open-next might reference it somewhere but couldn't see where.

@bestickley
Collaborator

@kevin-mitchell, any other outstanding issues? Do all routes work on the test website?

@kevin-mitchell
Contributor Author

@bestickley in short, yes, there are still outstanding issues with the test website(s). It's taken me so long to reply because I'm trying to figure out what is actually an issue with this PR vs what is a prior issue / incompatibility between this project and the Vercel app playground / open-next example sites.

More details below, but I wonder if it might be a good idea to do something like:

  1. Open a new PR against main that does nothing but pin the default open-next version to 1.4, so that somebody could always go back to that tagged release of this project and at least have things working out of the box
  2. Either create a "v2-open-next" work branch and then update my PR here to merge into that WIP branch, or merge this PR into main and open new tickets / issues to fix the remaining issues.

Below are some more details on specific issues I've hit, but again in some cases I'm not certain it's not just an issue with my local setup and how I'm deploying, etc. So take these with a grain of salt


I'm just not sure if they are new issues, or old issues. Looking at my PR, honestly there aren't all that many changes so I suspect in the end even if this was landed as is it would be in better shape than before, BUT here are the issues I'm aware of:

General

  • In general, it seems the CloudFront cache for SSG pages isn't cleared on redeploy. I'm not sure if this is a new issue or an existing one, but (for example) /ssg.html will be cached at the CloudFront level for a very long time (days?), so if the site is regenerated and there is a change, the stale cached item keeps being served. Maybe this is expected; I'm not really sure what the solution is. I don't think any of my changes should have impacted this, but maybe.

open-next example

Vercel app playground

  • I wonder if this is deployed anywhere with SST. I'd be curious to know how it performs.
  • Some nested layouts aren't working - it seems random, and I wonder if it could have something to do with rate limiting somewhere in my new "free tier" account, or another default somewhere. Either way, it's a bad experience and needs to be better understood and resolved.
  • Very frustrating: ISR seems to work most of the time, but other times it does not. I feel like I'm losing it a bit because I've watched this work many times, and I know it works with the open-next example.

@bestickley
Collaborator

@kevin-mitchell, I think 1 from above makes sense, if you can create a PR for that, I think we should get it merged so newcomers aren't confused why this construct isn't working out of the box.

I'm fine with incrementally working on v2 support in this PR.

kevin-mitchell pushed a commit to kevin-mitchell/cdk-nextjs that referenced this pull request Aug 2, 2023
There was quite a bit of discussion about this, and it's still not the best solution for all situations. For now it should fix an issue where direct requests to static assets (e.g. files in the `/public` directory) cause a 404.

jetbridge#125 (comment)
@kevin-mitchell
Contributor Author

kevin-mitchell commented Aug 3, 2023

OK so current things that are not working that I'm aware of:

app playground

  1. the "streaming" feature isn't working for either edge or node

  2. "Randomly" I see this issue when navigating through pages quickly
    Screenshot 2023-08-02 at 5 52 07 PM

  3. Cache isn't invalidated when deploying, so /random/cached/page won't be invalidated during deployment if it changes. Should we add something to invalidate /* on each deployment?

@khuezy
Contributor

khuezy commented Aug 3, 2023

  1. Streaming isn't currently supported. Conico has an experimental branch; see "Support Streaming", opennextjs/opennextjs-aws#79.

  2. Can you check the logs?

  3. Yes, we should invalidate /* on deployment, but I thought cdk-next already did that?

I'll be on tomorrow morning PST.

@bestickley
Collaborator

bestickley commented Aug 3, 2023

@kevin-mitchell
For 2. above, did you accidentally paste wrong screenshot? I see the same screenshot as 1.
For 3. above, the code for invalidating is here. It looks like we're not invalidating "/*" but rather all files that are referenced in event.ResourceProperties.s3Keys. You can add a console.log(event.ResourceProperties) here to check to see if the file that you're seeing that is being incorrectly cached is or isn't in that list. It looks like s3Keys is a list of all the static files generated by open-next. See code here.
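
As a rough aid for that check, the comparison could look something like the sketch below. It is only an illustration: `isCovered` is a hypothetical helper, and it assumes the logged keys are plain paths or paths ending in a trailing `*` wildcard.

```typescript
// Hypothetical helper: reports whether `path` is covered by a list of
// invalidated keys. Keys without a leading "/" are normalized first, and a
// trailing "*" is treated as a prefix wildcard.
function isCovered(path: string, invalidated: string[]): boolean {
  return invalidated.some((key) => {
    const pattern = key.charAt(0) === "/" ? key : `/${key}`;
    if (pattern.charAt(pattern.length - 1) === "*") {
      const prefix = pattern.slice(0, -1);
      return path.indexOf(prefix) === 0;
    }
    return pattern === path;
  });
}
```

Calling it with the stale page path and the logged `s3Keys` list would show quickly whether the page was ever part of the invalidation.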

@khuezy
Contributor

khuezy commented Aug 3, 2023

@kevin-mitchell #129, this will affect the fallback stuff we discussed.

@bestickley
Collaborator

@kevin-mitchell, I'm planning on resolving the remaining issues and then resolving the origin group fallback behavior issue in #129. Server actions are an important feature I think we should support. I'll keep you updated.

@kevin-mitchell
Contributor Author

@bestickley @khuezy

RE: Streaming isn't supported - good, I won't worry about having broken it then :)

RE: Error in logs - I haven't been able to reproduce. I suspect it's a concurrency issue with Lambda / AWS, though I'm not sure why I would have hit the limit.

RE: Invalidating "/*" on deployment - previously, as @bestickley mentioned, this was perhaps handled by invalidating the S3 "assets" bucket contents on deployment. Now the SSG content lives in a different bucket, so, at least to me, it appears it isn't invalidated. I added a new config parameter that allows the full invalidation to be skipped, but defaulted to invalidating /* because I think that is the behavior most people would expect.
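
A minimal sketch of that default, for illustration only: the names `InvalidationOptions`, `skipFullInvalidation`, and `invalidationPaths` are hypothetical and not the construct's real API, and falling back to a per-file list when the full invalidation is skipped is an assumption, not something this PR is confirmed to do.

```typescript
// Hypothetical options; `skipFullInvalidation` stands in for the new
// config parameter described above.
interface InvalidationOptions {
  skipFullInvalidation?: boolean;
  s3Keys?: string[];
}

// Returns the paths to invalidate on deployment.
// Default: a single "/*" entry that covers everything.
// Assumption: when the full invalidation is skipped, only the listed
// static file keys are invalidated.
function invalidationPaths(opts: InvalidationOptions = {}): string[] {
  if (!opts.skipFullInvalidation) return ["/*"];
  const keys = opts.s3Keys ?? [];
  return keys.map((key) => (key.charAt(0) === "/" ? key : `/${key}`));
}
```

Defaulting to `/*` trades a broader invalidation for predictable behavior: a redeploy never leaves a stale SSG page behind, whichever bucket the page is served from.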

@bestickley
Collaborator

@kevin-mitchell, on 2nd thought, how about we get this branch merged, then we can address origin group issue in another PR. I want you to "get credit" for this great work. I will test out your changes on this branch tomorrow morning with the open-next test app and if everything looks good I'll merge it.

@kevin-mitchell kevin-mitchell marked this pull request as ready for review August 4, 2023 00:42
Kevin Mitchell added 12 commits August 3, 2023 20:44
Updates the server handler to include cache bucket configuration, **which is likely not correct as I'm just using the static asset bucket temporarily**. I need to look at how SST does this and what the spirit of this should be, e.g. if a separate bucket would be more appropriate, a folder, etc.

This also adds the ISR handling logic. I haven't actually tested this yet (am not 100% sure how to).

**This is a WIP**.
This isn't working 100% quite yet. I am testing with https://github.com/vercel/app-playground - with these changes I'm able to deploy but I need to populate the cache bucket name in another function env I believe. I also haven't really gotten into the details of looking to see if messages are actually being sent to the queue, etc.

The main difference here is some cleanup, and I added the separate bucket for the cache.
* added a new constant so we can keep the default memory size uniform across lambda functions
* Allow the cache bucket to be passed in as a construct property if user wants to supply their own s3 cache bucket
I'm not at all confident this is the "right" thing to do here, but without this I was having issues with S3 fallback serving 403s. It's entirely likely this is a shortsighted change and in reality I need to fix something on the S3 bucket in terms of how it handles 404 / 403s.

See SST implementation here: https://github.com/serverless-stack/sst/blob/b56c2ea021290211c72841c605cec58579ef3591/packages/sst/src/constructs/SsrSite.ts#L1053-L1058
I now see why this was in place originally. It still doesn't work, but I'd rather leave it as it is for now.

SST creates additional behaviors on the CloudFront distribution at deploy time, basically iterating through all of the files in the root of `/public` and creating a rule that routes to the static assets bucket for these matches:
https://github.com/serverless-stack/sst/blob/b56c2ea021290211c72841c605cec58579ef3591/packages/sst/src/constructs/SsrSite.ts#L1113
```
  protected addStaticFileBehaviors() {
    const { cdk } = this.props;

    // Create a template for statics behaviours
    const publicDir = path.join(
      this.props.path,
      this.buildConfig.clientBuildOutputDir
    );
    for (const item of fs.readdirSync(publicDir)) {
      const isDir = fs.statSync(path.join(publicDir, item)).isDirectory();
      this.distribution.addBehavior(isDir ? `${item}/*` : item, this.s3Origin, {
        viewerProtocolPolicy: ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
        allowedMethods: AllowedMethods.ALLOW_GET_HEAD_OPTIONS,
        cachedMethods: CachedMethods.CACHE_GET_HEAD_OPTIONS,
        compress: true,
        cachePolicy: CachePolicy.CACHING_OPTIMIZED,
        responseHeadersPolicy: cdk?.responseHeadersPolicy,
      });
    }
  }
```

This is a bit more complicated with the pitfall of hitting the maximum number of behaviors (25 apparently).
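
To make the pitfall concrete, here is a small sketch of the same pattern generation with an explicit guard. It is not SST's or this construct's code: `PublicEntry`, `staticBehaviorPatterns`, and `ASSUMED_MAX_BEHAVIORS` are hypothetical names, and the limit of 25 is taken from the remark above rather than verified against current CloudFront quotas.

```typescript
// Hypothetical description of one top-level entry in the public directory.
interface PublicEntry {
  name: string;
  isDirectory: boolean;
}

// Assumed limit, taken from the "25 apparently" remark; verify before relying on it.
const ASSUMED_MAX_BEHAVIORS = 25;

// Builds one path pattern per top-level entry ("dir/*" for directories,
// the bare name for files) and throws if the total number of behaviors
// would exceed the assumed limit.
function staticBehaviorPatterns(
  entries: PublicEntry[],
  existingBehaviors = 0,
  max = ASSUMED_MAX_BEHAVIORS
): string[] {
  const patterns = entries.map((e) => (e.isDirectory ? `${e.name}/*` : e.name));
  if (patterns.length + existingBehaviors > max) {
    throw new Error(
      `${patterns.length} static patterns plus ${existingBehaviors} existing behaviors exceed the limit of ${max}`
    );
  }
  return patterns;
}
```

Failing during synthesis this way would at least surface the problem before a deployment is attempted.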

> First do no evil.
There was quite a bit of discussion about this, and it's still not the best solution for all situations. For now it should fix an issue where direct requests to static assets (e.g. files in the `/public` directory) cause a 404.

jetbridge#125 (comment)
I rebased on main to prep for a potential landing but somehow lost the changes in NextjsBuild. Also a typo fix.
@bestickley
Collaborator

bestickley commented Aug 4, 2023

@kevin-mitchell, in my testing, I've run across an issue where the CACHE_* env vars aren't set if I specify defaults.lambda.environment. Could you update the function instantiation in NextjsLambda.ts to below? See comment for details.

const fn = new Function(scope, 'ServerHandler', {
  memorySize: functionOptions?.memorySize || DEFAULT_LAMBA_MEMORY,
  timeout: functionOptions?.timeout ?? Duration.seconds(10),
  runtime: LAMBDA_RUNTIME,
  handler: path.join('index.handler'),
  code,
  // prevents "Resolution error: Cannot use resource in a cross-environment
  // fashion, the resource's physical name must be explicit set or use
  // PhysicalName.GENERATE_IF_NEEDED."
  functionName: Stack.of(this).region !== 'us-east-1' ? PhysicalName.GENERATE_IF_NEEDED : undefined,
  ...functionOptions,
  // `environment` needs to go after `functionOptions` b/c if
  // `functionOptions.environment` is defined, it will override
  // CACHE_* environment variables which are required
  environment,
});

UPDATE: I went ahead and resolved this issue.
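
The ordering issue comes down to plain object-spread semantics: the last property with a given key wins. A standalone sketch, with no CDK involved, is below. `FunctionOptions`, `buildFunctionProps`, and the `CACHE_BUCKET_NAME` value are illustrative assumptions, and merging the user's variables into `environment` is also an assumption about how that variable is built.

```typescript
// Hypothetical, simplified stand-in for the construct's function options.
interface FunctionOptions {
  memorySize?: number;
  environment?: Record<string, string>;
}

// Builds props so that user options override defaults, while required
// environment variables always survive.
function buildFunctionProps(
  functionOptions: FunctionOptions | undefined,
  requiredEnv: Record<string, string>
) {
  // Assumption: user-supplied variables are kept, and required ones win
  // whenever both define the same name.
  const environment = { ...functionOptions?.environment, ...requiredEnv };
  return {
    memorySize: 1024, // default; replaced if functionOptions sets memorySize
    ...functionOptions,
    // Placed last so `functionOptions.environment` cannot replace it.
    environment,
  };
}

const props = buildFunctionProps(
  { memorySize: 2048, environment: { FOO: "bar" } },
  { CACHE_BUCKET_NAME: "example-bucket" }
);
```

If `environment` were listed before `...functionOptions`, the user's `environment` object would replace it wholesale and the required variables would be dropped, which matches the behavior reported above.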

@bestickley
Collaborator

bestickley commented Aug 4, 2023

I've verified that ISR works with the Next.js App Router example! All other pages seem to be working too except for loading. @kevin-mitchell, was loading working for you in App Router example app?

UPDATE: I think this is happening because open-next's cache doesn't respect the documented { cache: "no-cache" } fetch option used in the example. In other words, open-next immediately responds with the cached page, whereas the Vercel version waits the 1000ms to fetch the data. There is probably more to investigate here, but I'll create a separate issue.

@kevin-mitchell
Contributor Author

> I've verified that ISR works with the Next.js App Router example! All other pages seem to be working too except for loading. @kevin-mitchell, was loading working for you in App Router example app?

output

As far as I know it's working for me - at least based on my understanding. Hopefully this gif either looks "right" or "wrong" - but I just deployed what's out there and this is what I'm seeing.

@revmischa
Member

Awesome, if you want to publish a new (major?) version then let's go for it whenever you're ready.

@kevin-mitchell
Contributor Author

FYI I'm heading out to do some backcountry camping this afternoon. Normally I'd try to make sure I was around in case any fast-follow type fixes were needed, but I'll be "out of pocket" as they say from this afternoon with limited connectivity options.

@bestickley
Collaborator

@kevin-mitchell, no worries! Enjoy your trip!
@revmischa, what do you think about doing a pre-release? There are a couple of other updates I'd like to make in the next week that might contain breaking changes so it would be nice to batch them.

@bestickley
Collaborator

@kevin-mitchell, I'm glad the loading is working for you! It's not working for me or @khuezy. Not sure why, but not a show stopper IMO.

@bestickley bestickley merged commit d0df996 into jetbridge:main Aug 4, 2023
1 of 2 checks passed
Successfully merging this pull request may close these issues.

Incompatible with open-next@2
4 participants