Switch to ICU tokenizer #939
Merged
firefoxci-taskcluster / export-ru-en
succeeded
Nov 23, 2024 in 2h 32m 5s
FirefoxCI (pull_request)
export for ru-en
Details
Task Status
Started: 2024-11-23T01:21:35.795Z
Resolved: 2024-11-23T01:22:38.542Z
Task Execution Time: 1 minute, 2 seconds, 747 milliseconds
Task Status: completed
Reason Resolved: completed
RunId: 0
Artifacts
- public/build/lex.50.50.ruen.s2t.bin.gz
- public/build/model.ruen.intgemm.alphas.bin.gz
- public/build/vocab.ruen.spm.gz
- public/logs/live_backing.log
- public/logs/live.log
[taskcluster 2024-11-23 01:21:35.937Z] Task ID: bGCHD8mnQJSIDtSUHnTb-w
[taskcluster 2024-11-23 01:21:35.937Z] Worker ID: 4151726480485553056
[taskcluster 2024-11-23 01:21:35.937Z] Worker Group: us-central1-c
[taskcluster 2024-11-23 01:21:35.937Z] Worker Node Type: projects/887720501152/machineTypes/n2-highmem-32
[taskcluster 2024-11-23 01:21:35.937Z] Worker Pool: translations-1/b-linux-large-gcp
[taskcluster 2024-11-23 01:21:35.937Z] Worker Version: 38.0.5
[taskcluster 2024-11-23 01:21:35.937Z] Public IP: 35.238.71.106
[taskcluster 2024-11-23 01:21:35.937Z] Hostname: translations-1-b-linux-large-gcp-ctpvd-pdtwgq2ur11-juiq
[taskcluster 2024-11-23 01:21:35.937Z] using cache "translations-level-1-checkouts-v3-7afeb851dd97df8f3607-KnyIE1GvSz67R9mjL97Now" -> /builds/worker/checkouts
[taskcluster 2024-11-23 01:21:36.438Z] Image 'public/image.tar.zst' from task 'KnyIE1GvSz67R9mjL97Now' loaded. Using image ID sha256:d31e1900b8212f46ff27eab4217df610f5d7a124bb4975b4b8ea07a64443f3ba.
[taskcluster 2024-11-23 01:21:36.496Z] === Task Starting ===
[setup 2024-11-23T01:21:36.764Z] run-task started in /builds/worker
[setup 2024-11-23T01:21:36.764Z] Invoked by command: --firefox_translations_training-checkout=/builds/worker/checkouts/vcs/ -- bash -c export BMT_MARIAN=$MOZ_FETCHES_DIR && export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$MOZ_FETCHES_DIR/cuda-toolkit/lib64" && zstd --rm -d $MOZ_FETCHES_DIR/*.zst && $VCS_PATH/pipeline/quantize/export.sh $MOZ_FETCHES_DIR $MOZ_FETCHES_DIR/lex.s2t.pruned $MOZ_FETCHES_DIR/vocab.spm $TASK_WORKDIR/artifacts
[setup 2024-11-23T01:21:36.764Z] Python version: 3.10.12
[cache 2024-11-23T01:21:36.766Z] cache /builds/worker/checkouts exists; requirements: gid=1000 uid=1000 version=1
[volume 2024-11-23T01:21:36.766Z] volume /builds/worker/checkouts is a cache
[setup 2024-11-23T01:21:36.766Z] running as worker:worker
[vcs 2024-11-23T01:21:36.766Z] executing ['git', 'config', '--global', '--add', 'safe.directory', '/builds/worker/checkouts/vcs']
[vcs 2024-11-23T01:21:36.768Z] executing ['git', 'fetch', '--tags', '--force', 'https://github.com/mozilla/translations', 'icu_tokenizer']
[vcs 2024-11-23T01:21:36.987Z] From https://github.com/mozilla/translations
[vcs 2024-11-23T01:21:36.987Z] * branch icu_tokenizer -> FETCH_HEAD
[vcs 2024-11-23T01:21:36.995Z] executing ['git', 'fetch', '--no-tags', 'https://github.com/mozilla/translations', 'icu_tokenizer']
[vcs 2024-11-23T01:21:37.158Z] From https://github.com/mozilla/translations
[vcs 2024-11-23T01:21:37.158Z] * branch icu_tokenizer -> FETCH_HEAD
[vcs 2024-11-23T01:21:37.165Z] executing ['git', 'checkout', '-f', '-B', 'icu_tokenizer', 'd585a63a6abc04ece83e26ce51a0caa2f7fa21e6']
[vcs 2024-11-23T01:21:37.173Z] Reset branch 'icu_tokenizer'
[vcs 2024-11-23T01:21:37.191Z] executing ['git', 'submodule', 'init']
[vcs 2024-11-23T01:21:37.210Z] executing ['git', 'submodule', 'update', '--force']
[vcs 2024-11-23T01:21:37.296Z] Submodule path '3rd_party/browsermt-marian-dev': checked out '11c6ae7c46be21ef96ed10c60f28022fa968939f'
[vcs 2024-11-23T01:21:37.307Z] Submodule path '3rd_party/extract-lex': checked out '42fa605b53f32eaf6c6e0b5677255c21c91b3d49'
[vcs 2024-11-23T01:21:37.318Z] Submodule path '3rd_party/fast_align': checked out 'cab1e9aac8d3bb02ff5ae58218d8d225a039fa11'
[vcs 2024-11-23T01:21:37.343Z] Submodule path '3rd_party/kenlm': checked out 'bbf4fc511266c5d4515047055d7bdec659a6e158'
[vcs 2024-11-23T01:21:37.460Z] Submodule path '3rd_party/marian-dev': checked out 'e8a1a2530fb84cbff7383302ebca393e5875c441'
[vcs 2024-11-23T01:21:37.478Z] Submodule path '3rd_party/preprocess': checked out '64307314b4d5a9a0bd529b5c1036b0710d995eec'
[vcs 2024-11-23T01:21:37.547Z] Submodule path 'inference/3rd_party/browsermt-marian-dev': checked out '2781d735d4a10dca876d61be587afdab2726293c'
[vcs 2024-11-23T01:21:37.565Z] Submodule path 'inference/3rd_party/emsdk': checked out '2346baa7bb44a4a0571cc75f1986ab9aaa35aa03'
[vcs 2024-11-23T01:21:37.579Z] Submodule path 'inference/3rd_party/ssplit-cpp': checked out 'a311f9865ade34db1e8e080e6cc146f55dafb067'
[vcs 2024-11-23T01:21:37.579Z] cleaning git checkout...
[vcs 2024-11-23T01:21:37.579Z] executing ['git', 'clean', '-nxdff']
[vcs 2024-11-23T01:21:37.582Z] removing []
[vcs 2024-11-23T01:21:37.582Z] successfully cleaned git checkout!
[vcs 2024-11-23T01:21:37.584Z] TinderboxPrint:<a href='https://github.com/mozilla/translations/commit/d585a63a6abc04ece83e26ce51a0caa2f7fa21e6' title='Built from translations commit d585a63a6abc04ece83e26ce51a0caa2f7fa21e6'>d585a63a6abc04ece83e26ce51a0caa2f7fa21e6</a>
[setup 2024-11-23T01:21:37.584Z] MOZ_FETCHES_DIR is /builds/worker/fetches
[fetches 2024-11-23T01:21:37.584Z] fetching artifacts
[fetches 2024-11-23T01:21:37.584Z] executing ['/usr/bin/python3', '-u', '/usr/local/bin/fetch-content', 'task-artifacts']
attempt 1/5
attempt 1/5
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SfKub3XmQoaP6QnvStUlVg/artifacts/public/build/model.intgemm.alphas.bin to /builds/worker/fetches/model.intgemm.alphas.bin
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/VYX7OmYwSMiSBM4cWKRsoQ/artifacts/public/build/lex.s2t.pruned.zst to /builds/worker/fetches/lex.s2t.pruned.zst
attempt 1/5
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SaWdyzsWQRqURmGqlu5cFQ/artifacts/public/build/marian.tar.zst to /builds/worker/fetches/marian.tar.zst
attempt 1/5
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/VYX7OmYwSMiSBM4cWKRsoQ/artifacts/public/build/lex.s2t.pruned.zst
attempt 1/5
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/c3zkspF9RIutgMydCE2W7w/artifacts/public/build/cuda-toolkit.tar.zst to /builds/worker/fetches/cuda-toolkit.tar.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QbzA4AsxRWu2LJDlSoyBgA/artifacts/public/build/vocab.spm to /builds/worker/fetches/vocab.spm
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SfKub3XmQoaP6QnvStUlVg/artifacts/public/build/model.intgemm.alphas.bin
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/c3zkspF9RIutgMydCE2W7w/artifacts/public/build/cuda-toolkit.tar.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SaWdyzsWQRqURmGqlu5cFQ/artifacts/public/build/marian.tar.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QbzA4AsxRWu2LJDlSoyBgA/artifacts/public/build/vocab.spm
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/VYX7OmYwSMiSBM4cWKRsoQ/artifacts/public/build/lex.s2t.pruned.zst resolved to 16589 bytes with sha256 9f95a9cf5a02e117726b45855ad269abbd85013d857e56a86e25cd5a1cbfaf23 in 0.132s
Verified size of https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/VYX7OmYwSMiSBM4cWKRsoQ/artifacts/public/build/lex.s2t.pruned.zst
Extracting /builds/worker/fetches/lex.s2t.pruned.zst to /builds/worker/fetches
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QbzA4AsxRWu2LJDlSoyBgA/artifacts/public/build/vocab.spm resolved to 255368 bytes with sha256 24c72884c471114ef75bb72514d84a898be5adc71cbe1db1fefa6c6618ca353c in 0.146s
Verified size of https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QbzA4AsxRWu2LJDlSoyBgA/artifacts/public/build/vocab.spm
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SfKub3XmQoaP6QnvStUlVg/artifacts/public/build/model.intgemm.alphas.bin resolved to 9081145 bytes with sha256 cc1df96795f8432900a01177f5ad287aac679f334d5d39714ec0b6f869c2167c in 0.390s
Extracting /builds/worker/fetches/model.intgemm.alphas.bin to /builds/worker/fetches
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SaWdyzsWQRqURmGqlu5cFQ/artifacts/public/build/marian.tar.zst resolved to 166210513 bytes with sha256 74f778444cfae13689725562a7a565d2bfd70e8e63846ebd09b41a7e0ac693ef in 30.243s
Verified size of https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SaWdyzsWQRqURmGqlu5cFQ/artifacts/public/build/marian.tar.zst
Extracting /builds/worker/fetches/marian.tar.zst to /builds/worker/fetches
/builds/worker/fetches/marian.tar.zst extracted in 1.249s
Removing /builds/worker/fetches/marian.tar.zst
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/c3zkspF9RIutgMydCE2W7w/artifacts/public/build/cuda-toolkit.tar.zst resolved to 3294613746 bytes with sha256 5ec1190f3a4ed8b115dee5bb97083995163a10b3f4892d777d63e1b7855e6234 in 44.340s
Verified size of https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/c3zkspF9RIutgMydCE2W7w/artifacts/public/build/cuda-toolkit.tar.zst
Extracting /builds/worker/fetches/cuda-toolkit.tar.zst to /builds/worker/fetches
/builds/worker/fetches/cuda-toolkit.tar.zst extracted in 13.735s
Removing /builds/worker/fetches/cuda-toolkit.tar.zst
PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": [{"name": "fetch_content", "value": 58.357447347999994, "lowerIsBetter": true, "shouldAlert": false, "subtests": []}]}
[fetches 2024-11-23T01:22:36.022Z] finished fetching artifacts
[task 2024-11-23T01:22:36.022Z] executing ['bash', '-c', 'export BMT_MARIAN=$MOZ_FETCHES_DIR && export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$MOZ_FETCHES_DIR/cuda-toolkit/lib64" && zstd --rm -d $MOZ_FETCHES_DIR/*.zst && $VCS_PATH/pipeline/quantize/export.sh $MOZ_FETCHES_DIR $MOZ_FETCHES_DIR/lex.s2t.pruned $MOZ_FETCHES_DIR/vocab.spm $TASK_WORKDIR/artifacts']
[task 2024-11-23T01:22:36.025Z]
[task 2024-11-23T01:22:36.025Z] s/lex.s2t.pruned.zst : 0 MB...
[task 2024-11-23T01:22:36.025Z]
[task 2024-11-23T01:22:36.025Z] /builds/worker/fetches/lex.s2t.pruned.zst: 60152 bytes
[task 2024-11-23T01:22:36.026Z] + set -euo pipefail
[task 2024-11-23T01:22:36.026Z] + echo '###### Exporting a quantized model'
[task 2024-11-23T01:22:36.026Z] ###### Exporting a quantized model
[task 2024-11-23T01:22:36.026Z] + test -v SRC
[task 2024-11-23T01:22:36.026Z] + test -v TRG
[task 2024-11-23T01:22:36.026Z] + test -v BMT_MARIAN
[task 2024-11-23T01:22:36.026Z] + model_dir=/builds/worker/fetches
[task 2024-11-23T01:22:36.026Z] + shortlist=/builds/worker/fetches/lex.s2t.pruned
[task 2024-11-23T01:22:36.026Z] + vocab=/builds/worker/fetches/vocab.spm
[task 2024-11-23T01:22:36.026Z] + output_dir=/builds/worker/artifacts
[task 2024-11-23T01:22:36.026Z] + mkdir -p /builds/worker/artifacts
[task 2024-11-23T01:22:36.027Z] + model=/builds/worker/artifacts/model.ruen.intgemm.alphas.bin
[task 2024-11-23T01:22:36.028Z] + cp /builds/worker/fetches/model.intgemm.alphas.bin /builds/worker/artifacts/model.ruen.intgemm.alphas.bin
[task 2024-11-23T01:22:36.035Z] + pigz /builds/worker/artifacts/model.ruen.intgemm.alphas.bin
[task 2024-11-23T01:22:36.064Z] + shortlist_bin=/builds/worker/artifacts/lex.50.50.ruen.s2t.bin
[task 2024-11-23T01:22:36.064Z] + /builds/worker/fetches/marian-conv --shortlist /builds/worker/fetches/lex.s2t.pruned 50 50 0 --dump /builds/worker/artifacts/lex.50.50.ruen.s2t.bin --vocabs /builds/worker/fetches/vocab.spm /builds/worker/fetches/vocab.spm
[task 2024-11-23T01:22:36.092Z] [2024-11-23 01:22:36] [data] Loading SentencePiece vocabulary from file /builds/worker/fetches/vocab.spm
[task 2024-11-23T01:22:36.094Z] [2024-11-23 01:22:36] [data] Loading SentencePiece vocabulary from file /builds/worker/fetches/vocab.spm
[task 2024-11-23T01:22:36.096Z] [2024-11-23 01:22:36] [data] Importing text lexical shortlist as /builds/worker/fetches/lex.s2t.pruned 50 50 0
[task 2024-11-23T01:22:36.098Z] [2024-11-23 01:22:36] [data] Saving binary shortlist dump to /builds/worker/artifacts/lex.50.50.ruen.s2t.bin
[task 2024-11-23T01:22:36.098Z] [2024-11-23 01:22:36] Finished
[task 2024-11-23T01:22:36.105Z] + pigz /builds/worker/artifacts/lex.50.50.ruen.s2t.bin
[task 2024-11-23T01:22:36.107Z] + vocab_out=/builds/worker/artifacts/vocab.ruen.spm
[task 2024-11-23T01:22:36.107Z] + cp /builds/worker/fetches/vocab.spm /builds/worker/artifacts/vocab.ruen.spm
[task 2024-11-23T01:22:36.108Z] + pigz /builds/worker/artifacts/vocab.ruen.spm
[task 2024-11-23T01:22:36.120Z] + echo '### Export is completed. Results: /builds/worker/artifacts'
[task 2024-11-23T01:22:36.120Z] ### Export is completed. Results: /builds/worker/artifacts
[task 2024-11-23T01:22:36.120Z] + echo '###### Done: Exporting a quantized model'
[task 2024-11-23T01:22:36.120Z] ###### Done: Exporting a quantized model
[fetches 2024-11-23T01:22:36.120Z] removing /builds/worker/fetches
[fetches 2024-11-23T01:22:37.101Z] finished
[taskcluster 2024-11-23 01:22:37.308Z] === Task Finished ===
[taskcluster 2024-11-23 01:22:38.028Z] Successful task run with exit code: 0 completed in 62.092 seconds