Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: use nx commands #2493

Merged
merged 26 commits into from
Nov 5, 2024
Merged

Conversation

david-luna
Copy link
Contributor

@david-luna david-luna commented Oct 18, 2024

Which problem is this PR solving?

This PR tries to work towards the migration from lerna to nx open-telemetry/opentelemetry-js#4240. Although the issue is on the core repository I think both repos should be aligned in the tooling. I've just started with this one because it seems more straight-forward.

Changes

This PR surfaces the [email protected] package required by lerna. I haven't tried to use a higher version since lerna is not completely removed yet and it might have version conflicts. The package.json's script section has been changed to use the equivalent nx commands for building and testing the packages, with some nuances:

  • a nx.json file has been added defining task dependencies so some package scripts like precompile are not needed
  • a cache has been defined for some tasks like version:update, lint and compile. More details below

The workflow does not change. Indeed the development guidelines needs no changes since already recommends to run npm run compile at the root folder before start doing changes in a package.

Cache

nx can cache the output of a task like the compilation. Having it active may come handy for developers since you can switch branches and rebuild only what changed just by npm run compile at the root. The rest will come from the cache. Cache is located inside node_modules so its easy to invalidate via npm run ci or just removing the node_modules/.cache/nx folder.

Using it in CI

For unit and TAV tests there is a compilation step before running the test suite. Turns out that the tests are using the tsconfig.json to compile the tests before running. The file has the following options

{
  "extends": "../../../tsconfig.base",
  "compilerOptions": {
    "rootDir": ".",
    "outDir": "build"
  },
  "include": [
    "src/**/*.ts",
    "test/**/*.ts"
  ]
}

so the tests are compiling again the code in order to run. This is an unnecessary use of resources.

Test jobs in workflows do a clean checkout before testing so running npm run test in a package would fail because one or more of this reasons:

  • version.ts file is missing
  • package may depend on other in the same repo (this dependency needs to be compiled)
  • package is referring to build folder in tests (example)

Test workflows run npm run compile for all packages

And because of that workflows run npm run compile for all packages before testing consuming time.

This PR leverages nx cache to avoid this extra compilation given the assumption that TypeScript is giving the same output regardless of the nodejs version used for compilation:

  • TypeScript version is pinned and the same in all workflows
  • TS configuration does not change depending on nodejs version

So we can change the workflows to

  • have a single compilation step and storing the cache. Compilation is done with the same setup done for publishing (nodejs v16) so the build results are "production like".
  • restore the cache when test jobs are running. This way the needed compilation takes less time

The gain is significant since the compilation takes around 57min and with the cache is around 36sec (2sec downloading + 2~3sec compiling).

@@ -13,8 +13,35 @@ on:
type: string

jobs:
build:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for reviewer: this is the build step that is going to be cached. github.run_number is unique for the PR so each time the workflow is run this tasks overwrites the artifact saving space. retention-days is set to 1 so the artifact will be removed the next day after the last run of this workflow.

@@ -5,7 +5,34 @@ on:
pull_request:

jobs:
build:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for reviewer: same here for unit tests. I guess a single compile step can be done for both workflows but I think this will add complexity to them. This way both workflows are independent and still we reduce the resource usage

@@ -10,7 +10,6 @@
"compile": "tsc -p .",
"lint": "eslint . --ext .ts",
"lint:fix": "eslint . --ext .ts --fix",
"precompile": "tsc --version && lerna run version:update --scope @opentelemetry/resource-detector-alibaba-cloud --include-dependencies",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for reviewer: not necessary since nx.json has defined the task dependencies. Same applies for other packages

"compile": {
"dependsOn": [
"version:update",
"^compile"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for reviewer: this is the syntax to tell that dependencies within the monorepo should be compiled before compiling the current package

@david-luna david-luna marked this pull request as ready for review October 22, 2024 09:53
@david-luna david-luna requested a review from a team as a code owner October 22, 2024 09:53
@trentm trentm added the has:sponsor This package or feature has a sponsor that has volunteered to review PRs and respond to questions label Oct 23, 2024
@trentm
Copy link
Contributor

trentm commented Oct 23, 2024

I'll sponsor to not have the "package unmaintained" logic.

@trentm
Copy link
Contributor

trentm commented Oct 24, 2024

The gain is significant since the compilation takes around 57min and with the cache is around 36sec (2sec downloading + 2~3sec compiling).

I got confused by the markup here. Are you trying to say the timing improved from 5-7 minutes to 3-6 seconds? If so, that's pretty nice. :)

@david-luna
Copy link
Contributor Author

The gain is significant since the compilation takes around 57min and with the cache is around 36sec (2sec downloading + 2~3sec compiling).

I got confused by the markup here. Are you trying to say the timing improved from 5-7 minutes to 3-6 seconds? If so, that's pretty nice. :)

Also I think I'm not clear enough. There is improvement in resource (CPU, Memory, ...) usage but not in the total time spent in the workflow. I'll try to elaborate

Currently each job on the matrix does a compilation (necessary to generate version file and have dependencies solved) before testing. Each one of them taking its 5-7 minutes of CPU & Memory consumption.

With the change we compile only once at the beginning, still taking 5-7min but only taking resources from 1 node. Then the job matrix for testing reuses the output from that previous step thanks to the cache. It might be represented like this

Before
test-matrix
├─ node 14/ --- compile (5-7min)---test(?min)---(reports and others)
├─ node 16/ --- compile (5-7min)---test(?min)---(reports and others)
├─ ...

Now
build --- compile with node v16 (5-7min)
├─ test-matrix
   ├─ node 14/ --- compile (3-6sec from cache)---test(?min)---(reports and others)
   ├─ node 16/ --- compile (3-6sec from cache)---test(?min)---(reports and others)
   ├─ ...

so total time is the same but using less resources.

Copy link
Contributor

@trentm trentm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • might want to add 'nx' to ignoreDeps in "renovate.json" for now

One thing that is lost here with nx run-many ... vs lerna run ... is the number of workers to run. When doing npm run compile, for example, with lerna run ... it would run 11 tasks in parallel on my machine. With nx run-many ... it runs 3 tasks in parallel. lerna is setting the "parallel / concurrency" based on the number of CPUs on the machine (https://github.com/lerna/lerna/blob/6.6.2/libs/core/src/lib/command/index.ts#L15). nx is hardcoding the default --parallel value to 3. I think even the latest nx version does that.

I was going to say that this means the initial npm run compile takes longer for me. However, I unscientifically compared time npm run compile (after running npm ci) in git clones with and without this PR, and even with the parallel=3 it is faster with nx.

# with this PR
npm run compile  453.13s user 31.79s system 601% cpu 1:20.59 total

# without this PR (latest main)
npm run compile  655.85s user 86.89s system 880% cpu 1:24.35 total

I'll take it.
For me it means that the initial npm run compile takes longer.

.github/workflows/test-all-versions.yml Outdated Show resolved Hide resolved
- name: Install
run: npm ci
- name: Build
run: npm run compile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little scared of the "if npm ci and npm run compile layout is the same for all node versions". :)
Have you done a basic check comparing two separate git clones that do those steps, one with node 16 and one with node 22, say?

  • I think I'm reasonably confident that for non-binary packages that the node_modules/... tree should be the same between versions -- otherwise the idea of package-lock.json is broken.
  • I'm not sure if we have binary deps -- i.e. ones that are built at install time via a gyp file or install or post-install npm scripts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little scared of the "if npm ci and npm run compile layout is the same for all node versions". :)

We can do even another twist on this because:

  • ts-node compiles src & tests folders so "in theory" is not necessary to compile, just to have the version.ts file available
  • but some packages depend on another in the same monorepo like @opentelemetry/resouce-detector-container and these dependencies need to be build.

Since the published packages are compiled using nodejs v16 maybe is better to have a "production" build as tests assets instead of a different build with each nodejs version (14, 18, 20, ...)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we have binary deps -- i.e. ones that are built at install time via a gyp file or install or post-install npm scripts.

We don't have them anymore AFAICT. The last was removed with grpc-cencus-binary-propagator (#2276) because it always built on macOS (it had pre-built binaries for linux+glibc so building sometimes does not happen based on which system one is using). I think if there was another one left we'd have noticed it by now as it tends to take a lot of time on install. 🙂

Since the published packages are compiled using nodejs v16 maybe is better to have a "production" build as tests assets instead of a different build with each nodejs version (14, 18, 20, ...)?

I think having the Node.js version matched with the published packages makes sense from a testing standpoint. This way we can preempt surprises if there are any. @david-luna would you mind putting a comment in the publish workflow and here that the versions should always be updated together? 🙂

Copy link
Member

@pichlermarc pichlermarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good % some nits, thanks for working on this. 🎉
Looking forward to seeing it in action 🙂

- name: Install
run: npm ci
- name: Build
run: npm run compile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we have binary deps -- i.e. ones that are built at install time via a gyp file or install or post-install npm scripts.

We don't have them anymore AFAICT. The last was removed with grpc-cencus-binary-propagator (#2276) because it always built on macOS (it had pre-built binaries for linux+glibc so building sometimes does not happen based on which system one is using). I think if there was another one left we'd have noticed it by now as it tends to take a lot of time on install. 🙂

Since the published packages are compiled using nodejs v16 maybe is better to have a "production" build as tests assets instead of a different build with each nodejs version (14, 18, 20, ...)?

I think having the Node.js version matched with the published packages makes sense from a testing standpoint. This way we can preempt surprises if there are any. @david-luna would you mind putting a comment in the publish workflow and here that the versions should always be updated together? 🙂

.github/workflows/unit-test.yml Outdated Show resolved Hide resolved
nx.json Outdated Show resolved Hide resolved
@david-luna david-luna merged commit 6234918 into open-telemetry:main Nov 5, 2024
25 checks passed
@david-luna david-luna deleted the dluna/update-scripts branch November 5, 2024 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has:sponsor This package or feature has a sponsor that has volunteered to review PRs and respond to questions pkg:auto-configuration-propagators pkg:auto-instrumentations-node pkg:auto-instrumentations-web pkg:host-metrics pkg:id-generator-aws-xray pkg:instrumentation-amqplib pkg:instrumentation-aws-lambda pkg:instrumentation-aws-sdk pkg:instrumentation-bunyan pkg:instrumentation-cassandra-driver pkg:instrumentation-connect pkg:instrumentation-cucumber pkg:instrumentation-dataloader pkg:instrumentation-dns pkg:instrumentation-document-load pkg:instrumentation-express pkg:instrumentation-fastify pkg:instrumentation-fs pkg:instrumentation-generic-pool pkg:instrumentation-graphql pkg:instrumentation-hapi pkg:instrumentation-ioredis pkg:instrumentation-knex pkg:instrumentation-koa pkg:instrumentation-long-task pkg:instrumentation-lru-memoizer pkg:instrumentation-memcached pkg:instrumentation-mongodb pkg:instrumentation-mongoose pkg:instrumentation-mysql pkg:instrumentation-mysql2 pkg:instrumentation-nestjs-core pkg:instrumentation-net pkg:instrumentation-pg pkg:instrumentation-pino pkg:instrumentation-redis pkg:instrumentation-redis-4 pkg:instrumentation-restify pkg:instrumentation-router pkg:instrumentation-runtime-node pkg:instrumentation-socket.io pkg:instrumentation-tedious pkg:instrumentation-undici pkg:instrumentation-user-interaction pkg:instrumentation-winston pkg:plugin-react-load pkg:propagation-utils pkg:propagator-instana pkg:propagator-ot-trace pkg:redis-common pkg:resource-detector-alibaba-cloud pkg:resource-detector-aws pkg:resource-detector-azure pkg:resource-detector-container pkg:resource-detector-gcp pkg:resource-detector-github pkg:resource-detector-instana pkg:sampler-aws-xray pkg:sql-common pkg:test-utils pkg-status:unmaintained:autoclose-scheduled pkg-status:unmaintained This package is unmaintained. Only bugfixes may be acceped until a new owner has been found.
Projects
None yet
Development

Successfully merging this pull request may close these issues.