Doing more with less, stop wasting: ways we could reduce the project's resource consumption #12519
Unanswered
marcdumais-work asked this question in General
Replies: 0 comments
These days, there are amazing tools and services at the disposal of a project such as Theia, and it's tempting to use them. Some of the resources we use are "free as in beer" and depend on the goodwill of third parties. One example is GitHub project hosting and related services such as its CI system, including free-of-charge executors. Another example is the public registry for npm packages, npmjs.com, where we publish @theia extensions, each in the form of an npm package. They host our extensions and make it possible for anyone with an internet connection to build a Theia application without having to first build Theia from sources.
When we use such tools and services, it costs the company running them computer resources (CPU, RAM, storage) and employee time to run and support them.
If we do not make judicious use of these resources, we run the risk of being at the top of their list if they ever want to cut operating costs without impacting their average, more nimble users.
There are a couple of ways I think we waste resources, that have been bugging me for a while. They have been in place for a long time, and may have made better sense then, but maybe no longer.
npm resources:
We publish a lot of npm packages!
link: @theia/core npm versions
A set of ~50 and growing Theia extensions is part of this repo. They are considered part of the "core framework". Each Theia monthly release, we publish a @latest set of these extensions, each as its own npm package. If in some months we also do a bugfix release, that's 100 packages published. So far I think this is probably ok.
We also publish a @next set of Theia packages (a test/intermediary version) every time a PR is merged, in our CI-CD workflow. Let's say we merge around 20 PRs a month: at ~50 packages each, that adds up to around 1000 npm packages we publish per month.
We publish relatively big packages
Some projects publish a lot of extensions, but these can be individually tiny, e.g. containing only some typings. The Theia extensions we publish to npm tend to be pretty big. One version of the whole set is probably some tens of MB compressed.
Note: Because of the left-pad debacle, npm now severely restricts un-publishing packages. Instead they can be deprecated. This means that every package we publish is there "forev.. long time", adding a cost to running the public npm registry, in the form of storage, bandwidth, etc.
In the early days, Theia could change a lot between two monthly releases, so it made sense to provide intermediary versions that could easily be used to produce a better, intermediary version of a Theia-based IDE. Today, I would guess that the value of an average @next Theia release is close to zero.
Proposal
I think we could stop publishing next versions systematically when a PR is merged. If we think it's useful, maybe keep it as an "opt-in" option, e.g. triggered through the GitHub UI, or if a certain keyword appears in the PR description or commit message, or maybe if a certain label is attached to the PR, if such a thing is possible.
What if it's still needed sometimes to build against an intermediary version of Theia?
We could offer another way to easily build and use the @next Theia packages that would previously have been available on npm. We could make it easier to locally publish/serve an arbitrary set of Theia packages, using a local npm registry, that could then be used to build a Theia-based app (e.g. Blueprint).
I've seen some GitHub repos integrate the Verdaccio local npm registry and make it easy to use through scripts entries in their package.json. Doing something like this would make it relatively easy for a developer to pick a Theia commit they want to try, build it from sources and release it to their local Verdaccio registry, and then use this registry to build a Theia-based application, e.g. Blueprint.
GitHub resources:
CI Executor resources
TypeDoc API re-generation
Theia re-generates its TypeDoc API documentation during CI, in the "publish" job of workflow "ci-cd". This is executed every time we merge a PR. The resulting HTML documentation is stored on the gh-pages branch of our repo, to be used by our GitHub Page.
Generating that documentation is very memory (1) and CPU intensive. It lasts up to around an hour, making the post-PR-merge CI/CD run last that much longer. Arguably, the benefit of updating the API doc for every merged PR is low. Maybe instead we could do it once per release, as part of the release itself.
(1): Currently we run the node process with "--max_old_space_size=9216", and as Theia grows, we need to bump this periodically or it eventually runs out of memory before completing.
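To make the "once per release" idea a bit more concrete, here is a minimal sketch of a gate that the publish job could consult before running TypeDoc. This is not how our CI works today: the function name shouldRegenerateApiDocs is made up for illustration, and the sketch assumes releases are identified by a Git tag ref of the form refs/tags/vMAJOR.MINOR.PATCH, which may not match our actual tagging scheme.

```typescript
// Hypothetical gate for the TypeDoc step: regenerate docs only for release tags.
// Assumption: releases are pushed as tags named like "v1.2.3"; adjust the
// pattern to whatever tagging scheme the project really uses.
const RELEASE_TAG_REF = /^refs\/tags\/v\d+\.\d+\.\d+$/;

/**
 * Returns true when the Git ref that triggered the CI run looks like a
 * release tag, and false otherwise (e.g. a branch push after a PR merge).
 */
function shouldRegenerateApiDocs(gitRef: string): boolean {
    return RELEASE_TAG_REF.test(gitRef);
}
```

In a GitHub Actions run, the triggering ref is available in the GITHUB_REF environment variable, so a small script step could pass that value in and skip the documentation build whenever the function returns false.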
Theia CI wastes resources rebuilding Theia multiple times
There is a related comment in our ci-cd.yml workflow file: