Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: tarball url redirect #1688

Merged
merged 18 commits into from
May 3, 2021
Merged

Conversation

favoyang
Copy link
Contributor

@favoyang favoyang commented Jan 29, 2020

Type: feature

The following has been addressed in the PR:

Description:
Part of verdaccio/monorepo/issues/250, redirect tarball URL based on config.experiments.tarball_url_redirect. Major usage is to rewriting the redirect tarball URL to a CDN server.

@favoyang
Copy link
Contributor Author

favoyang commented Jan 29, 2020

There're failed tests (workflow > test_node_lts_10), but seems not related to this changes.

https://circleci.com/gh/verdaccio/verdaccio/21155?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-checks-link

@codecov
Copy link

codecov bot commented Feb 6, 2020

Codecov Report

❗ No coverage uploaded for pull request base (5.x@255650b). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head c488976 differs from pull request most recent head d5b8fd6. Consider uploading reports for the commit d5b8fd6 to get more accurate results

@@          Coverage Diff           @@
##             5.x    #1688   +/-   ##
======================================
  Coverage       ?   83.86%           
======================================
  Files          ?       48           
  Lines          ?     2597           
  Branches       ?      602           
======================================
  Hits           ?     2178           
  Misses         ?      414           
  Partials       ?        5           

Copy link
Member

@juanpicado juanpicado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@favoyang could you elaborate a bit what's the benefit of this? At this point, I don't see why Verdacccio should handle redirections and seems to me just a workaround for a specific scenario.

@favoyang
Copy link
Contributor Author

@juanpicado

  • It majorly benefits the S3 back-end to swap the URL to use the S3's network infrastructure, (see my issue feat(aws-s3-storage): serve tarball files via CDN monorepo#250 or other's issue Ability to override tarball URL #1620).
  • It can also use to separate the domain name for regular request (endpoints) and the static files (tarballs). i.e. example.com/endpoints... and static.example.com/tarballs... a trivial practice for web application for various reason. Like put entire static.example.com under CDN, or use a different web server to serve static files.

The first case is surely a specific (but common) scenario, but there's no way to implement without changing the verdaccio main repository. If we reject for this reason, maybe we could think about offering a alternative solution?

The second case is a common practice for web application.

@juanpicado
Copy link
Member

If we reject for this reason, maybe we could think about offering an alternative solution?

Not necessarily a rejection, but rather start a discussion about what we are trying to solve and which other issues this creates.

As I exposed him worth to research whether a filter plugin might solve this #1620 (comment) (example: https://verdaccio.org/docs/en/dev-plugins#api-1)

I have the same concerns as @DanielRuf exposes here #1620 (comment) , once this flag is enabled you lose the ability to use the proxy (uplinks) for all packages since I see the flag tarball_url_redirect applies all request (independently what you have configured in the package access. This assumes your CDN contains all tarballs in npmjs.

The unique way to go I see is you filter the packument then the package manager will download from the location you provide, then is not required any change at Verdaccio, you can read more here about it #1161.

"dist": {
    "shasum": "0eb570293ca87b272a0f3a62b50adb9a5f0a8f1a",
     "tarball": "https://{custom}/object-assign/-/object-assign-0.1.1.tgz"
},
``

@favoyang
Copy link
Contributor Author

once this flag is enabled you lose the ability to use the proxy (uplinks) for all packages

I see. the downloadStream will invoke storage.getTarball, which will download tarball from the upstream to local storage, then serve from local. Indeed the redirect flag brokes the uplink feature. Maybe I should let it finish the download, then return to CDN URL.

As I exposed him worth to research whether a filter plugin might solve this #1620 (comment) (example: https://verdaccio.org/docs/en/dev-plugins#api-1)

So filter can change the meta data. But seems filter was only invoked by _syncUplinksMetadata, so it's limited to uplink?

The unique way to go I see is you filter the packument then the package manager will download from the location you provide, then is not required any change at Verdaccio, you can read more here about it #1161.

The intention is to let user download from S3/CDN infrastructure directly.

npm fetches meta data => verdaccio returns meta data
npm fetches tarball from verdaccio URL => verdaccio returns S3 URL (HTTP 302)
npm fetches from S3 URL => S3 serving tarball

The benefit of HTTP 302 is that, verdaccio still get the access, so it can trace the DL stats (not available yet, but a popular feature request). If I change the tarball URL using filter, verdaccio will not be able to trace the access.

@favoyang
Copy link
Contributor Author

Update PR to handle uplinks

  • If the local tarball exists, return 302 based on config.tarball_url_redirect
  • If not, fall back to the regular download stream.

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities (and Security Hotspot 1 Security Hotspot to review)
Code Smell A 1 Code Smell

No Coverage information No Coverage information
0.0% 0.0% Duplication

@gordlea
Copy link

gordlea commented Apr 24, 2020

This would make my life so much easier, any chance of it going in?

@juanpicado
Copy link
Member

juanpicado commented Apr 24, 2020

This would make my life so much easier, any chance of it going in?

Your point of view is valuable, could you share more in detail? perhaps helps to move this on.

@gordlea
Copy link

gordlea commented Apr 24, 2020

Sure, basically, we have users around the world who need to access our internal npm registry. Currently we are running multiple instances of this in different parts of the world, and have to jump through some hoops regarding auth/vpn/internal networks etc.

Ideally we'd be able to have the actual download of the tarballs happen via cloudfront signed urls. This would mean that we would only need to run a single instance of verdaccio, while still having the tarball downloads for users in India come from the cloudfront edge cache in Mumbai, etc.

I realize this PR doesn't support signed urls for the redirect, but it's a big step in the right direction.

@gordlea
Copy link

gordlea commented Apr 24, 2020

We use the a javascript config, so if there was also config.tarball_url_redirect_signing_func option that let you pass a function something like:

const config = {
    tarball_url_redirect: 'http://d111111abcdef8.cloudfront.net/${packageName}/${filename}',
    tarball_url_redirect_signing_func(tarballUrl) {
        const signedUrl = // use aws sdk to generate signed url
        return signedUrl;
    }
}

That would let us do what we want, as described above.

@favoyang
Copy link
Contributor Author

@gordlea Thanks for your comment. I didn't realize that the verdaccio configuration supports js format? If so, it can be simplified to

const config = {
    # The tarball_url_redirect can be a template string.
    tarball_url_redirect: 'http://d111111abcdef8.cloudfront.net/${packageName}/${filename}',

   # The tarball_url_redirect can be a function, takes packageName and filename and returns the url
    tarball_url_redirect(packageName, filename) {
        const signedUrl = // use aws sdk to generate signed url
        return signedUrl;
    }
}

@favoyang
Copy link
Contributor Author

It's been months since my last update, I'd like to summarize the status of the PR.

Is the request useful?

Yes, it is necessary if you want to work with modern content delivery network infrastructure.

Will this PR break the uplink feature?

No. It only redirects the URL if the local tarball file exists. Otherwise, it will fall back to the default logic to query uplinks.

Can it be implemented using the filter plugin?

With a filter plugin, you can hijack the packument and modify it. However, it seems the filter plugin is only applied to the uplinks. The only method invokes filter.filter_metadata is _syncUplinksMetadata. If so, it's not useful here.

Even if a filter plugin applied to all packages, the approach of modifying the packument directly to change the tarball URL will make it hard to implement download stats feature like #1201 or #1489, which requires all download requests go through verdaccio at the first place.

route.get('/:package/-/:filename', can('access'), function(req: $RequestExtend, res: $ResponseExtend): void {
    // Update stats first
    UpdateStats(...);
    // Serve the tarball later
    downloadStreamOrRedirect(req.params.package, req.params.filename, storage, req, res, config);
  });

Any other options?

Plugin-middleware seems another option if there's a way to get the local storage instance. If we do this, then the download stats feature have to be implemented as a middleware plugin as well, since it should process before the tarball redirect.

Whether it worth doing more research here depends on how bad to put this PR into the core system.

@gordlea
Copy link

gordlea commented Apr 30, 2020

@gordlea Thanks for your comment. I didn't realize that the verdaccio configuration supports js format? If so, it can be simplified to

const config = {
    # The tarball_url_redirect can be a template string.
    tarball_url_redirect: 'http://d111111abcdef8.cloudfront.net/${packageName}/${filename}',

   # The tarball_url_redirect can be a function, takes packageName and filename and returns the url
    tarball_url_redirect(packageName, filename) {
        const signedUrl = // use aws sdk to generate signed url
        return signedUrl;
    }
}

Any chance you might add this to the PR?

@gordlea
Copy link

gordlea commented May 3, 2020

@favoyang please take a look at: favoyang#1

@sonarqubecloud
Copy link

sonarqubecloud bot commented May 5, 2020

SonarCloud Quality Gate failed.

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities (and Security Hotspot 1 Security Hotspot to review)
Code Smell A 1 Code Smell

No Coverage information No Coverage information
37.2% 37.2% Duplication

@favoyang
Copy link
Contributor Author

favoyang commented May 5, 2020

@gordlea your patch merged, also synced with the master branch.

Next, shall we continue the discussion like #1688 (comment)?

@gordlea
Copy link

gordlea commented May 8, 2020

@gordlea your patch merged, also synced with the master branch.

Next, shall we continue the discussion like #1688 (comment)?

Great, sounds good.

@gordlea
Copy link

gordlea commented May 20, 2020

Just FYI I'm going to be trying this branch out in production soon.

@favoyang
Copy link
Contributor Author

I'm using it on production (before your contribution) for a few months, works for me.

@gordlea
Copy link

gordlea commented Jun 22, 2020

We've been running this in production for about 3 weeks with no issues now.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Jul 9, 2020

SonarCloud Quality Gate failed.

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities (and Security Hotspot 1 Security Hotspot to review)
Code Smell A 0 Code Smells

No Coverage information No Coverage information
38.8% 38.8% Duplication

@gordlea
Copy link

gordlea commented Feb 12, 2021

We've been running this in prod since May 2020 now, with no issues. This PR is so powerful, it basically lets you create a global CDN from a single verdaccio instance.

@juanpicado juanpicado added this to the 5.x.x milestone Feb 12, 2021
@juanpicado
Copy link
Member

juanpicado commented Feb 12, 2021

We've been running this in prod since May 2020 now, with no issues. This PR is so powerful, it basically lets you create a global CDN from a single verdaccio instance.

Thanks for the feedback, I added this PR to 5.x master (milestone) just to be able to track it. Just fyi, I haven't discarded the idea, just :) you know, lack of time. The next major is more flexible for new ideas, please, check 5.x master when you guys have time.

@favoyang favoyang changed the base branch from master to 5.x April 11, 2021 02:22
@favoyang
Copy link
Contributor Author

I changed the base branch to 5.x. I guess this would be a more stable branch to track with.

Copy link
Member

@juanpicado juanpicado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks @favoyang this keep this rebased, it was a long way 👍🏼

@juanpicado juanpicado merged commit 78d04cf into verdaccio:5.x May 3, 2021
@favoyang favoyang deleted the tarball-url-redirect branch May 4, 2021 02:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants