Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding QA jobs and expanding test jobs for CI/CD. #292

Merged
merged 28 commits into from
Jan 9, 2022
Merged

Conversation

tegefaulkes
Copy link
Contributor

@tegefaulkes tegefaulkes commented Nov 18, 2021

Description

This PR updates the gitlab CI/CD with new jobs for testing if the nix-build outputs run.

Note the pinning of node-gyp-build dependency discussed here: MatrixAI/TypeScript-Demo-Lib#35 (comment).

Native dependencies are flaky and has to be considered specially when upgrading.

Issues Fixed

Tasks

  • Integrate typescript cached compiler path to the .gitlab-ci.yml is that ts-node cache is reused. Session Bin Test Fixes #296 (comment)
  • Replace testBinUtils.pkAgent with testUtils.setupGlobalAgent
  • See if testUtils.setupGlobalKeyPair is sufficient, in which case no need to share a global agent, it will be faster
    • Turns out that by using the global keypair, it takes 3 seconds to create a PK agent, this is sufficient for many tests and they don't need to synchronous on the global PK agent
  • [ ] Add back in worker manager related testing - KeyManager already has tests for worker manager, CLI cannot test worker manager due to weird threadsjs behaviour when spawned as child process inside jest, and the last would just be DB functions which is already tested in js-db
    • Integrate WorkerManager back into pk agent start, by using a --cores parameter
    • [ ] Test the --workers parameter, by default it's set to all cores, need to ensure that we are using just 1 worker for testing - cannot use workers in pkSpawn thus bin tests are not able to test it at all
  • Add timeout testing to networking domain
  • Find out why allowHalfOpen isn't working in utp-native - The allowHalfOpen doesn't seem to be working on the server mafintosh/utp-native#41 (comment)
  • Fix partial start and partial create for PolykeyAgent, vaults and nodes
  • Need to figure out how best to split up tests, assigning arbitrary number of jobs to all the different domains seems brittle.
  • Integrate NPM, jest and ts-node caching into Gitlab CI/CD
  • Furthermore need to clean up the .gitlab-ci.yml file not to have the special case for this branch as it will be merged into master.
  • Need to test the actual build to see this working, as well as commentary on the node-gyp-build dependency
    • It is expected that nodes may be broken, but this shouldn't affect the build

Final checklist

  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@tegefaulkes tegefaulkes force-pushed the qa-testing branch 3 times, most recently from 905e86b to 0f1f871 Compare November 18, 2021 05:11
@tegefaulkes
Copy link
Contributor Author

This is ready to merge @CMCDragonkai

scripts/runDocker.sh Outdated Show resolved Hide resolved
package.json Outdated Show resolved Hide resolved
.gitlab-ci.yml Outdated Show resolved Hide resolved
.gitlab-ci.yml Outdated Show resolved Hide resolved
.gitlab-ci.yml Outdated Show resolved Hide resolved
.gitlab-ci.yml Outdated Show resolved Hide resolved
@CMCDragonkai
Copy link
Member

CMCDragonkai commented Nov 21, 2021

MatrixAI/TypeScript-Demo-Lib#35 merged into TS Demo Lib. Changes to be ported here:

  1. The package.json fix.
  2. Make sure the /builds is ignored in .gitignore and .npmignore
  3. Review the .gitlab-ci.yml and port over changes here.
  4. Uncomment the macos job obviously since js-polykey can handle it
  5. Fix up the node-gyp-build problem by also updating the @matrixai/db dependency
  6. Do we still need level and other deps in js-polykey? We would if we need those types, but other wise everything should be brought into from the @matrixai/db.

@CMCDragonkai
Copy link
Member

Take note of the changes here: https://github.com/MatrixAI/TypeScript-Demo-Lib/pull/35/files There's quite a few things that I've talked about in the PR that require required as well.

Don't forget to fix up the node-gyp-build problem too.

@CMCDragonkai
Copy link
Member

I noticed that the quality jobs still seem to run even if nix job isn't running due to not being on the master branch. It seems we would need the only spec as well for all the quality jobs.

In the future it maybe better to instead put all the jobs into a DAG instead, then we won't need to worry about this.

@CMCDragonkai
Copy link
Member

Final changes applied here: MatrixAI/TypeScript-Demo-Lib@c393b7d also should be ported here.

@tegefaulkes
Copy link
Contributor Author

I noticed that the quality jobs still seem to run even if nix job isn't running due to not being on the master branch. It seems we would need the only spec as well for all the quality jobs.

In the future it maybe better to instead put all the jobs into a DAG instead, then we won't need to worry about this.

using the needs keyword for the check stage jobs ensured that the job only ran after the needed job succeeded. seems like using the 'dependencies` keyword doesn't do the same.

We can switch back to using needs and artifacts: true unless there was a reason to use dependencies keyword instead.

@CMCDragonkai
Copy link
Member

Yes, but that's why I had to add the only fix on the subsequent quality stage so that they were ran only in master branch. That fixed it for TS Demo Lib. We should keep it consistent with ts demo lib for now and explore the needs DAG later.

@tegefaulkes
Copy link
Contributor Author

This is ready to be merged after rebasing on the pending gitlab MR.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Nov 25, 2021 via email

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Nov 25, 2021 via email

@CMCDragonkai
Copy link
Member

Actually fs.promises.watch is even better.

@tegefaulkes
Copy link
Contributor Author

Simplified the tests to just getting the help page for now. If we need something more complex we can come up with something later.

@tegefaulkes
Copy link
Contributor Author

Once #283 has been merged then this can be merged after re-enabling the test jobs.

@CMCDragonkai
Copy link
Member

Jest caching is also incorporated into the CI/CD too now, so hopefully that reduces the amount of compilation time when running the jobs.

@CMCDragonkai
Copy link
Member

I'm seeing that the TS_CACHED_TRANSPILE_CACHE isn't working. It turns out that it requires an absolute path. The reason is that when it's set as a relative path, it is relative to the CWD. Which is different when running pkSpawn and related. Therefore it has to be set to $(pwd)/tmp/ts-node-cache, and then the cache gets created. Similarly I've also added TS_CACHED_TRANSPILE_PORTABLE=true to ensure that the file/key names are portable, not sure if this will help though since the CI/CD container paths may be different...

Anyway, this seemed to work locally, but gitlab CI/CD still has a problem.

TS_CACHED_TRANSPILE_CACHE=$(pwd)/tmp/ts-node-cache TS_CACHED_TRANSPILE_PORTABLE=true npm test -- tests/bin/agent

In other news, still the concurrent with bootstrap results in 1 success has an rare intermittent failure, where the command that exits actually succeeds. Perhaps very quickly... not sure how to replicate yet until I run a 100 runs.

@CMCDragonkai
Copy link
Member

Trying to use $CI_PROJECT_DIR instead of $(pwd) in child pipelines seems to not work: https://gitlab.com/gitlab-org/gitlab/-/issues/349543#note_804914765. Checking if I can get $CI_PROJECT_DIR somehow in the child pipeline to use for TS_CACHED_TRANSPILE_CACHE.

@CMCDragonkai
Copy link
Member

Further tests show that $CI_PROJECT_DIR does work when echoed inside a job script:

randomjob:
  image: registry.gitlab.com/matrixai/engineering/maintenance/gitlab-runner
  stage: test
  script:
    - echo "$TS_CACHED_TRANSPILE_CACHE"
    - echo "$TS_CACHED_TRANSPILE_PORTABLE"
    - echo "$CI_PROJECT_DIR"
    - echo $(pwd)
    - echo done

However, if used inside variables section, it doesn't work.

variables:
  GIT_SUBMODULE_STRATEGY: "recursive"
  # Cache .npm
  NPM_CONFIG_CACHE: "./tmp/npm"
  # Prefer offline node module installation
  NPM_CONFIG_PREFER_OFFLINE: "true"
  # `ts-node` has its own cache
  TS_CACHED_TRANSPILE_CACHE: "${CI_PROJECT_DIR}/tmp/ts-node-cache"
  TS_CACHED_TRANSPILE_PORTABLE: "true"

However this is only a problem when used in the child pipeline. The parent pipeline is fine with using $CI_PROJECT_DIR in the variables section.

This also means setting NPM_CONFIG_CACHE might be better set with $(pwd) or $CI_PROJECT_DIR as well. But if it's not working inside child pipelines, then either the variable has to be set inside the job variables, and not global variables, or the environment variable must be set in the script section.

@CMCDragonkai
Copy link
Member

I had to submit a support request for this: https://support.gitlab.com/hc/en-us/requests/260663

In the mean time, I'll have to use job-specific variables to use $CI_PROJECT_DIR.

@CMCDragonkai
Copy link
Member

Seems like child pipeline variables has some problems.

Furthermore the top-level variables acts like the "default" variables in each job section. That's probably good to know, so you can't refer to top level variables when creating job-level variables.

@CMCDragonkai
Copy link
Member

Ok I have found out what the problem is.

  1. Top level variables defined in a pipeline are used as the "default values" for any given job, therefore $CI_PROJECT_DIR is expanded only within the job, and this makes sense, as different jobs may have different project directories.
  2. Job-specific variables override top-level variables. But the top-level variables are just merged in. Don't think of them as being appended below in the shell.
  3. Parent pipeline variables get automatically inherited by child pipeline, this can be controlled with the inherit:variables option. It is by default true.
  4. Parent pipeline variables take precedence over child pipeline variables. In fact any variable declared in the parent that gets inherited just completely replaces any variable defined in the child whether in the top-level or job-specific.
  5. The $CI_PROJECT_DIR being used inside $TS_CACHED_TRANSPILE_CACHE gets automatically expanded in the "bridge job" or the "trigger job". And it gets expanded to "" which is an empty string. The $TS_CACHED_TRANSPILE_CACHE then gets inherited by the child pipeline, and no matter how we set it in the child pipeline, the value never has the $CI_PROJECT_DIR that we want.
  6. The solution is set inherit:variables: false so that variables are not automatically inherited, it is possible to give a list of variables to inherit from the parent in the future if we need it (https://docs.gitlab.com/ee/ci/yaml/index.html#inheritvariables).
  7. Then set the $TS_CACHED_TRANSPILE_CACHE appropriately in the child pipeline.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jan 9, 2022

Finally with the ts-node-cache applied foreground starts took about 9 seconds. This is about more than 2x the time it takes to run on my own computer here, so the CI/CD computers are not as powerful.

Without the ts-node cache, it takes about 22 seconds, which is double the time.

Anyway the per-test default timeout is 20s, and we x2 when starting pk agents raw, so this should be sufficient.

    ✓ start in foreground (9915 ms)
./tmp/ts-node-cache/: found 739 matching files and directories 

@CMCDragonkai
Copy link
Member

Ok some tests are expected to fail. Regardless, we are going to now test the final builds and QA runs and then if these all work, then we should be able to pass and merge.

@CMCDragonkai
Copy link
Member

The tests/bin/agent finally passes fully in CI/CD: https://gitlab.com/MatrixAI/open-source/js-polykey/-/jobs/1955156121

@CMCDragonkai
Copy link
Member

There's a pkg build failure now:

> [email protected]
> Error! --no-bytecode and no source breaks final executable
  /tmp/nix-build-polykey-1.0.0-linux-x64.drv-0/polykey/dist/bin/polykey.js
  Please run with "-d" and without "--no-bytecode" first, and make
  sure that debug log does not contain "was included as bytecode".
builder for '/nix/store/jgc3w5h0qv0n60jm8mw622d3z5bby3hx-polykey-1.0.0-linux-x64.drv' failed with exit code 2

This is something we can debug locally. Failing on nix builds:

        builds="$(nix-build \
          --max-jobs "$(nproc)" --cores "$(nproc)" \
          ./release.nix \
          --attr docker \
          --attr package.linux.x64.elf \
          --attr package.windows.x64.exe \
          --attr package.macos.x64.macho)"

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jan 9, 2022

This is really quite strange error. It works fine on TypeScript-Demo-Lib, but something changed in the middle. Adding --debug into the pkg call gives us:

> [debug] Deleting record file : /
> Error! --no-bytecode and no source breaks final executable
  /build/polykey/dist/bin/polykey.js
  Please run with "-d" and without "--no-bytecode" first, and make
  sure that debug log does not contain "was included as bytecode".

And there are thousands of debug lines above.

@tegefaulkes any ideas why this is happening?

Note that in Ts demo lib, the debug messages start to say The directory files list was included but messages in js-polykey is instead saying Directory /build/polykey/dist/proto/js is added to queue. and variations of directories being added in the queue. I suspect this must be an earlier phase of building.

@CMCDragonkai
Copy link
Member

It appears to work once I've added the --public option. The docs indicates now that both --public-packages and --public is needed to ensure no byte code is generated.

@CMCDragonkai
Copy link
Member

It turns out that the license switch to GPL3 is what caused it to fail. When setting --no-bytecode, we have to explicitly tell pkg that this is ok with --public. That's strange, it's as if pkg doesn't know that GPL3 is ok with public source code.

@CMCDragonkai
Copy link
Member

The list of licenses that PKG considers to be foss is here https://github.com/vercel/pkg/blob/main/lib/walker.ts#L118-L142

It has:

    'gplv3',

But not GPL-3.0 as we have set it.

@CMCDragonkai
Copy link
Member

Solution is to add --public to all of our calls to pkg.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jan 9, 2022

Builds work and pipeline passed. Time to clean up the scaffolding and merge!!!

https://gitlab.com/MatrixAI/open-source/js-polykey/-/pipelines/444133744

@CMCDragonkai CMCDragonkai merged commit 46ba7b3 into master Jan 9, 2022
@CMCDragonkai CMCDragonkai deleted the qa-testing branch January 9, 2022 07:19
@tegefaulkes
Copy link
Contributor Author

If I remember correctly some modules are included as bytecode for license reasons. You can disable that by using a certain argument but I forget what it is exactly.

@CMCDragonkai
Copy link
Member

All done.

@emmacasolin @tegefaulkes please review the pipelines page to see what the test failures are as you are working on your PRs:

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment