Switch to ICU tokenizer #939

Merged
merged 15 commits into from
Dec 21, 2024

Use ICU system package

d585a63
firefoxci-taskcluster / export-ru-en succeeded Nov 23, 2024 in 2h 32m 5s

FirefoxCI (pull_request)

export for ru-en


Task Status

Started: 2024-11-23T01:21:35.795Z
Resolved: 2024-11-23T01:22:38.542Z
Task Execution Time: 1 minute, 2 seconds, 747 milliseconds
Task Status: completed
Reason Resolved: completed
RunId: 0

Artifacts

- public/build/lex.50.50.ruen.s2t.bin.gz
- public/build/model.ruen.intgemm.alphas.bin.gz
- public/build/vocab.ruen.spm.gz
- public/logs/live_backing.log
- public/logs/live.log
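
The three build artifacts listed above follow a naming pattern keyed on the language pair (here `ruen`). As a minimal sketch, assuming the pattern is simply the concatenated source and target codes as the names suggest, the list can be reproduced like this. The function name `artifact_names` and its arguments are hypothetical and are not part of the pipeline.

```shell
# Hypothetical helper (not part of the repository): print the exported
# artifact names for a language pair, following the pattern seen in the
# artifact list (e.g. "ruen" for Russian to English).
artifact_names() {
  # $1 = source language code, $2 = target language code
  pair="${1}${2}"
  printf '%s\n' \
    "lex.50.50.${pair}.s2t.bin.gz" \
    "model.${pair}.intgemm.alphas.bin.gz" \
    "vocab.${pair}.spm.gz"
}
```

Calling `artifact_names ru en` prints the three `public/build/` file names from this task, one per line.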


[taskcluster 2024-11-23 01:21:35.937Z] Task ID: bGCHD8mnQJSIDtSUHnTb-w
[taskcluster 2024-11-23 01:21:35.937Z] Worker ID: 4151726480485553056
[taskcluster 2024-11-23 01:21:35.937Z] Worker Group: us-central1-c
[taskcluster 2024-11-23 01:21:35.937Z] Worker Node Type: projects/887720501152/machineTypes/n2-highmem-32
[taskcluster 2024-11-23 01:21:35.937Z] Worker Pool: translations-1/b-linux-large-gcp
[taskcluster 2024-11-23 01:21:35.937Z] Worker Version: 38.0.5
[taskcluster 2024-11-23 01:21:35.937Z] Public IP: 35.238.71.106
[taskcluster 2024-11-23 01:21:35.937Z] Hostname: translations-1-b-linux-large-gcp-ctpvd-pdtwgq2ur11-juiq
[taskcluster 2024-11-23 01:21:35.937Z] using cache "translations-level-1-checkouts-v3-7afeb851dd97df8f3607-KnyIE1GvSz67R9mjL97Now" -> /builds/worker/checkouts

[taskcluster 2024-11-23 01:21:36.438Z] Image 'public/image.tar.zst' from task 'KnyIE1GvSz67R9mjL97Now' loaded.  Using image ID sha256:d31e1900b8212f46ff27eab4217df610f5d7a124bb4975b4b8ea07a64443f3ba.
[taskcluster 2024-11-23 01:21:36.496Z] === Task Starting ===
[setup 2024-11-23T01:21:36.764Z] run-task started in /builds/worker
[setup 2024-11-23T01:21:36.764Z] Invoked by command: --firefox_translations_training-checkout=/builds/worker/checkouts/vcs/ -- bash -c export BMT_MARIAN=$MOZ_FETCHES_DIR && export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$MOZ_FETCHES_DIR/cuda-toolkit/lib64" && zstd --rm -d $MOZ_FETCHES_DIR/*.zst && $VCS_PATH/pipeline/quantize/export.sh $MOZ_FETCHES_DIR $MOZ_FETCHES_DIR/lex.s2t.pruned $MOZ_FETCHES_DIR/vocab.spm $TASK_WORKDIR/artifacts
[setup 2024-11-23T01:21:36.764Z] Python version: 3.10.12
[cache 2024-11-23T01:21:36.766Z] cache /builds/worker/checkouts exists; requirements: gid=1000 uid=1000 version=1
[volume 2024-11-23T01:21:36.766Z] volume /builds/worker/checkouts is a cache
[setup 2024-11-23T01:21:36.766Z] running as worker:worker
[vcs 2024-11-23T01:21:36.766Z] executing ['git', 'config', '--global', '--add', 'safe.directory', '/builds/worker/checkouts/vcs']
[vcs 2024-11-23T01:21:36.768Z] executing ['git', 'fetch', '--tags', '--force', 'https://github.com/mozilla/translations', 'icu_tokenizer']
[vcs 2024-11-23T01:21:36.987Z] From https://github.com/mozilla/translations
[vcs 2024-11-23T01:21:36.987Z]  * branch            icu_tokenizer -> FETCH_HEAD
[vcs 2024-11-23T01:21:36.995Z] executing ['git', 'fetch', '--no-tags', 'https://github.com/mozilla/translations', 'icu_tokenizer']
[vcs 2024-11-23T01:21:37.158Z] From https://github.com/mozilla/translations
[vcs 2024-11-23T01:21:37.158Z]  * branch            icu_tokenizer -> FETCH_HEAD
[vcs 2024-11-23T01:21:37.165Z] executing ['git', 'checkout', '-f', '-B', 'icu_tokenizer', 'd585a63a6abc04ece83e26ce51a0caa2f7fa21e6']
[vcs 2024-11-23T01:21:37.173Z] Reset branch 'icu_tokenizer'
[vcs 2024-11-23T01:21:37.191Z] executing ['git', 'submodule', 'init']
[vcs 2024-11-23T01:21:37.210Z] executing ['git', 'submodule', 'update', '--force']
[vcs 2024-11-23T01:21:37.296Z] Submodule path '3rd_party/browsermt-marian-dev': checked out '11c6ae7c46be21ef96ed10c60f28022fa968939f'
[vcs 2024-11-23T01:21:37.307Z] Submodule path '3rd_party/extract-lex': checked out '42fa605b53f32eaf6c6e0b5677255c21c91b3d49'
[vcs 2024-11-23T01:21:37.318Z] Submodule path '3rd_party/fast_align': checked out 'cab1e9aac8d3bb02ff5ae58218d8d225a039fa11'
[vcs 2024-11-23T01:21:37.343Z] Submodule path '3rd_party/kenlm': checked out 'bbf4fc511266c5d4515047055d7bdec659a6e158'
[vcs 2024-11-23T01:21:37.460Z] Submodule path '3rd_party/marian-dev': checked out 'e8a1a2530fb84cbff7383302ebca393e5875c441'
[vcs 2024-11-23T01:21:37.478Z] Submodule path '3rd_party/preprocess': checked out '64307314b4d5a9a0bd529b5c1036b0710d995eec'
[vcs 2024-11-23T01:21:37.547Z] Submodule path 'inference/3rd_party/browsermt-marian-dev': checked out '2781d735d4a10dca876d61be587afdab2726293c'
[vcs 2024-11-23T01:21:37.565Z] Submodule path 'inference/3rd_party/emsdk': checked out '2346baa7bb44a4a0571cc75f1986ab9aaa35aa03'
[vcs 2024-11-23T01:21:37.579Z] Submodule path 'inference/3rd_party/ssplit-cpp': checked out 'a311f9865ade34db1e8e080e6cc146f55dafb067'
[vcs 2024-11-23T01:21:37.579Z] cleaning git checkout...
[vcs 2024-11-23T01:21:37.579Z] executing ['git', 'clean', '-nxdff']
[vcs 2024-11-23T01:21:37.582Z] removing []
[vcs 2024-11-23T01:21:37.582Z] successfully cleaned git checkout!
[vcs 2024-11-23T01:21:37.584Z] TinderboxPrint:<a href='https://github.com/mozilla/translations/commit/d585a63a6abc04ece83e26ce51a0caa2f7fa21e6' title='Built from translations commit d585a63a6abc04ece83e26ce51a0caa2f7fa21e6'>d585a63a6abc04ece83e26ce51a0caa2f7fa21e6</a>
[setup 2024-11-23T01:21:37.584Z] MOZ_FETCHES_DIR is /builds/worker/fetches
[fetches 2024-11-23T01:21:37.584Z] fetching artifacts
[fetches 2024-11-23T01:21:37.584Z] executing ['/usr/bin/python3', '-u', '/usr/local/bin/fetch-content', 'task-artifacts']
attempt 1/5
attempt 1/5
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SfKub3XmQoaP6QnvStUlVg/artifacts/public/build/model.intgemm.alphas.bin to /builds/worker/fetches/model.intgemm.alphas.bin
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/VYX7OmYwSMiSBM4cWKRsoQ/artifacts/public/build/lex.s2t.pruned.zst to /builds/worker/fetches/lex.s2t.pruned.zst
attempt 1/5
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SaWdyzsWQRqURmGqlu5cFQ/artifacts/public/build/marian.tar.zst to /builds/worker/fetches/marian.tar.zst
attempt 1/5
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/VYX7OmYwSMiSBM4cWKRsoQ/artifacts/public/build/lex.s2t.pruned.zst
attempt 1/5
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/c3zkspF9RIutgMydCE2W7w/artifacts/public/build/cuda-toolkit.tar.zst to /builds/worker/fetches/cuda-toolkit.tar.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QbzA4AsxRWu2LJDlSoyBgA/artifacts/public/build/vocab.spm to /builds/worker/fetches/vocab.spm
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SfKub3XmQoaP6QnvStUlVg/artifacts/public/build/model.intgemm.alphas.bin
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/c3zkspF9RIutgMydCE2W7w/artifacts/public/build/cuda-toolkit.tar.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SaWdyzsWQRqURmGqlu5cFQ/artifacts/public/build/marian.tar.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QbzA4AsxRWu2LJDlSoyBgA/artifacts/public/build/vocab.spm
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/VYX7OmYwSMiSBM4cWKRsoQ/artifacts/public/build/lex.s2t.pruned.zst resolved to 16589 bytes with sha256 9f95a9cf5a02e117726b45855ad269abbd85013d857e56a86e25cd5a1cbfaf23 in 0.132s
Verified size of https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/VYX7OmYwSMiSBM4cWKRsoQ/artifacts/public/build/lex.s2t.pruned.zst
Extracting /builds/worker/fetches/lex.s2t.pruned.zst to /builds/worker/fetches
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QbzA4AsxRWu2LJDlSoyBgA/artifacts/public/build/vocab.spm resolved to 255368 bytes with sha256 24c72884c471114ef75bb72514d84a898be5adc71cbe1db1fefa6c6618ca353c in 0.146s
Verified size of https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QbzA4AsxRWu2LJDlSoyBgA/artifacts/public/build/vocab.spm
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SfKub3XmQoaP6QnvStUlVg/artifacts/public/build/model.intgemm.alphas.bin resolved to 9081145 bytes with sha256 cc1df96795f8432900a01177f5ad287aac679f334d5d39714ec0b6f869c2167c in 0.390s
Extracting /builds/worker/fetches/model.intgemm.alphas.bin to /builds/worker/fetches
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SaWdyzsWQRqURmGqlu5cFQ/artifacts/public/build/marian.tar.zst resolved to 166210513 bytes with sha256 74f778444cfae13689725562a7a565d2bfd70e8e63846ebd09b41a7e0ac693ef in 30.243s
Verified size of https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/SaWdyzsWQRqURmGqlu5cFQ/artifacts/public/build/marian.tar.zst
Extracting /builds/worker/fetches/marian.tar.zst to /builds/worker/fetches
/builds/worker/fetches/marian.tar.zst extracted in 1.249s
Removing /builds/worker/fetches/marian.tar.zst
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/c3zkspF9RIutgMydCE2W7w/artifacts/public/build/cuda-toolkit.tar.zst resolved to 3294613746 bytes with sha256 5ec1190f3a4ed8b115dee5bb97083995163a10b3f4892d777d63e1b7855e6234 in 44.340s
Verified size of https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/c3zkspF9RIutgMydCE2W7w/artifacts/public/build/cuda-toolkit.tar.zst
Extracting /builds/worker/fetches/cuda-toolkit.tar.zst to /builds/worker/fetches
/builds/worker/fetches/cuda-toolkit.tar.zst extracted in 13.735s
Removing /builds/worker/fetches/cuda-toolkit.tar.zst
PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": [{"name": "fetch_content", "value": 58.357447347999994, "lowerIsBetter": true, "shouldAlert": false, "subtests": []}]}
[fetches 2024-11-23T01:22:36.022Z] finished fetching artifacts
[task 2024-11-23T01:22:36.022Z] executing ['bash', '-c', 'export BMT_MARIAN=$MOZ_FETCHES_DIR && export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$MOZ_FETCHES_DIR/cuda-toolkit/lib64" && zstd --rm -d $MOZ_FETCHES_DIR/*.zst && $VCS_PATH/pipeline/quantize/export.sh $MOZ_FETCHES_DIR $MOZ_FETCHES_DIR/lex.s2t.pruned $MOZ_FETCHES_DIR/vocab.spm $TASK_WORKDIR/artifacts']
[task 2024-11-23T01:22:36.025Z] /builds/worker/fetches/lex.s2t.pruned.zst: 60152 bytes 
[task 2024-11-23T01:22:36.026Z] + set -euo pipefail
[task 2024-11-23T01:22:36.026Z] + echo '###### Exporting a quantized model'
[task 2024-11-23T01:22:36.026Z] ###### Exporting a quantized model
[task 2024-11-23T01:22:36.026Z] + test -v SRC
[task 2024-11-23T01:22:36.026Z] + test -v TRG
[task 2024-11-23T01:22:36.026Z] + test -v BMT_MARIAN
[task 2024-11-23T01:22:36.026Z] + model_dir=/builds/worker/fetches
[task 2024-11-23T01:22:36.026Z] + shortlist=/builds/worker/fetches/lex.s2t.pruned
[task 2024-11-23T01:22:36.026Z] + vocab=/builds/worker/fetches/vocab.spm
[task 2024-11-23T01:22:36.026Z] + output_dir=/builds/worker/artifacts
[task 2024-11-23T01:22:36.026Z] + mkdir -p /builds/worker/artifacts
[task 2024-11-23T01:22:36.027Z] + model=/builds/worker/artifacts/model.ruen.intgemm.alphas.bin
[task 2024-11-23T01:22:36.028Z] + cp /builds/worker/fetches/model.intgemm.alphas.bin /builds/worker/artifacts/model.ruen.intgemm.alphas.bin
[task 2024-11-23T01:22:36.035Z] + pigz /builds/worker/artifacts/model.ruen.intgemm.alphas.bin
[task 2024-11-23T01:22:36.064Z] + shortlist_bin=/builds/worker/artifacts/lex.50.50.ruen.s2t.bin
[task 2024-11-23T01:22:36.064Z] + /builds/worker/fetches/marian-conv --shortlist /builds/worker/fetches/lex.s2t.pruned 50 50 0 --dump /builds/worker/artifacts/lex.50.50.ruen.s2t.bin --vocabs /builds/worker/fetches/vocab.spm /builds/worker/fetches/vocab.spm
[task 2024-11-23T01:22:36.092Z] [2024-11-23 01:22:36] [data] Loading SentencePiece vocabulary from file /builds/worker/fetches/vocab.spm
[task 2024-11-23T01:22:36.094Z] [2024-11-23 01:22:36] [data] Loading SentencePiece vocabulary from file /builds/worker/fetches/vocab.spm
[task 2024-11-23T01:22:36.096Z] [2024-11-23 01:22:36] [data] Importing text lexical shortlist as /builds/worker/fetches/lex.s2t.pruned 50 50 0
[task 2024-11-23T01:22:36.098Z] [2024-11-23 01:22:36] [data] Saving binary shortlist dump to /builds/worker/artifacts/lex.50.50.ruen.s2t.bin
[task 2024-11-23T01:22:36.098Z] [2024-11-23 01:22:36] Finished
[task 2024-11-23T01:22:36.105Z] + pigz /builds/worker/artifacts/lex.50.50.ruen.s2t.bin
[task 2024-11-23T01:22:36.107Z] + vocab_out=/builds/worker/artifacts/vocab.ruen.spm
[task 2024-11-23T01:22:36.107Z] + cp /builds/worker/fetches/vocab.spm /builds/worker/artifacts/vocab.ruen.spm
[task 2024-11-23T01:22:36.108Z] + pigz /builds/worker/artifacts/vocab.ruen.spm
[task 2024-11-23T01:22:36.120Z] + echo '### Export is completed. Results: /builds/worker/artifacts'
[task 2024-11-23T01:22:36.120Z] ### Export is completed. Results: /builds/worker/artifacts
[task 2024-11-23T01:22:36.120Z] + echo '###### Done: Exporting a quantized model'
[task 2024-11-23T01:22:36.120Z] ###### Done: Exporting a quantized model
[fetches 2024-11-23T01:22:36.120Z] removing /builds/worker/fetches
[fetches 2024-11-23T01:22:37.101Z] finished
[taskcluster 2024-11-23 01:22:37.308Z] === Task Finished ===
[taskcluster 2024-11-23 01:22:38.028Z] Successful task run with exit code: 0 completed in 62.092 seconds
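
The xtrace lines in the log show the export step doing three things: copying and compressing the quantized model, converting the text lexical shortlist to binary with `marian-conv`, and copying and compressing the vocabulary. The following is a sketch reconstructed from that trace only; the actual `pipeline/quantize/export.sh` may differ. The function name `export_quantized` and the `COMPRESS` variable are assumptions added for illustration (the log shows `pigz` being used), and `${VAR:?}` checks stand in for the `test -v` checks seen in the trace.

```shell
# Sketch reconstructed from the xtrace output above, not the real script.
# export_quantized and COMPRESS are hypothetical names for illustration.
# The function body runs in a subshell so "set -eu" does not leak out.
export_quantized() (
  set -eu

  # The trace checks that SRC, TRG and BMT_MARIAN are set before continuing.
  : "${SRC:?SRC must be set}"
  : "${TRG:?TRG must be set}"
  : "${BMT_MARIAN:?BMT_MARIAN must be set}"

  model_dir=$1
  shortlist=$2
  vocab=$3
  output_dir=$4
  compress=${COMPRESS:-pigz}   # assumption: configurable; the log uses pigz
  pair="${SRC}${TRG}"

  mkdir -p "$output_dir"

  # 1. Quantized model: copy under a pair-specific name, then compress.
  model="$output_dir/model.$pair.intgemm.alphas.bin"
  cp "$model_dir/model.intgemm.alphas.bin" "$model"
  "$compress" "$model"

  # 2. Lexical shortlist: convert the pruned text shortlist to binary,
  #    using the same arguments that appear in the trace.
  shortlist_bin="$output_dir/lex.50.50.$pair.s2t.bin"
  "$BMT_MARIAN/marian-conv" \
    --shortlist "$shortlist" 50 50 0 \
    --dump "$shortlist_bin" \
    --vocabs "$vocab" "$vocab"
  "$compress" "$shortlist_bin"

  # 3. Vocabulary: copy under a pair-specific name, then compress.
  vocab_out="$output_dir/vocab.$pair.spm"
  cp "$vocab" "$vocab_out"
  "$compress" "$vocab_out"
)
```

With `SRC=ru`, `TRG=en` and `BMT_MARIAN` pointing at the directory that contains `marian-conv`, a call such as `export_quantized "$MOZ_FETCHES_DIR" "$MOZ_FETCHES_DIR/lex.s2t.pruned" "$MOZ_FETCHES_DIR/vocab.spm" "$TASK_WORKDIR/artifacts"` mirrors the invocation recorded in the log.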