Switch to ICU tokenizer #939
Open
firefoxci-taskcluster / collect-mono-src-ru-en
succeeded
Nov 23, 2024 in 1h 12m 8s
FirefoxCI (pull_request)
collect mono src ru-en
Task Status
Started: 2024-11-23T00:02:34.328Z
Resolved: 2024-11-23T00:02:40.673Z
Task Execution Time: 6 seconds, 345 milliseconds
Task Status: completed
Reason Resolved: completed
RunId: 0
Artifacts
- public/build/mono.en.zst
- public/logs/live_backing.log
- public/logs/live.log
[taskcluster 2024-11-23 00:02:34.371Z] Task ID: Xt1emcAkQJOYWTQ4C5sr3Q
[taskcluster 2024-11-23 00:02:34.371Z] Worker ID: 7974999744821634500
[taskcluster 2024-11-23 00:02:34.371Z] Worker Group: us-west1-b
[taskcluster 2024-11-23 00:02:34.371Z] Worker Node Type: projects/887720501152/machineTypes/n2-highmem-32
[taskcluster 2024-11-23 00:02:34.371Z] Worker Pool: translations-1/b-linux-large-gcp-300gb
[taskcluster 2024-11-23 00:02:34.371Z] Worker Version: 38.0.5
[taskcluster 2024-11-23 00:02:34.371Z] Public IP: 35.199.179.170
[taskcluster 2024-11-23 00:02:34.371Z] Hostname: translations-1-b-linux-large-gcp-300gb-wpglgp0grjksbehrwlcryw
[taskcluster 2024-11-23 00:02:34.371Z] using cache "translations-level-1-checkouts-v3-7afeb851dd97df8f3607-KnyIE1GvSz67R9mjL97Now" -> /builds/worker/checkouts
[taskcluster 2024-11-23 00:02:35.444Z] Image 'public/image.tar.zst' from task 'KnyIE1GvSz67R9mjL97Now' loaded. Using image ID sha256:d31e1900b8212f46ff27eab4217df610f5d7a124bb4975b4b8ea07a64443f3ba.
[taskcluster 2024-11-23 00:02:35.456Z] === Task Starting ===
[setup 2024-11-23T00:02:37.517Z] run-task started in /builds/worker
[setup 2024-11-23T00:02:37.517Z] Invoked by command: --firefox_translations_training-checkout=/builds/worker/checkouts/vcs/ -- bash -c zstd -d --rm $MOZ_FETCHES_DIR/file* && $VCS_PATH/pipeline/translate/collect.sh fetches $TASK_WORKDIR/artifacts/mono.en.zst $MOZ_FETCHES_DIR/mono.ru.zst
[setup 2024-11-23T00:02:37.517Z] Python version: 3.10.12
[cache 2024-11-23T00:02:37.519Z] cache /builds/worker/checkouts exists; requirements: gid=1000 uid=1000 version=1
[volume 2024-11-23T00:02:37.519Z] volume /builds/worker/checkouts is a cache
[setup 2024-11-23T00:02:37.519Z] running as worker:worker
[vcs 2024-11-23T00:02:37.519Z] executing ['git', 'config', '--global', '--add', 'safe.directory', '/builds/worker/checkouts/vcs']
[vcs 2024-11-23T00:02:37.520Z] executing ['git', 'fetch', '--tags', '--force', 'https://github.com/mozilla/translations', 'icu_tokenizer']
[vcs 2024-11-23T00:02:37.749Z] From https://github.com/mozilla/translations
[vcs 2024-11-23T00:02:37.749Z] * branch icu_tokenizer -> FETCH_HEAD
[vcs 2024-11-23T00:02:37.756Z] executing ['git', 'fetch', '--no-tags', 'https://github.com/mozilla/translations', 'icu_tokenizer']
[vcs 2024-11-23T00:02:38.049Z] From https://github.com/mozilla/translations
[vcs 2024-11-23T00:02:38.049Z] * branch icu_tokenizer -> FETCH_HEAD
[vcs 2024-11-23T00:02:38.056Z] executing ['git', 'checkout', '-f', '-B', 'icu_tokenizer', 'd585a63a6abc04ece83e26ce51a0caa2f7fa21e6']
[vcs 2024-11-23T00:02:38.065Z] Reset branch 'icu_tokenizer'
[vcs 2024-11-23T00:02:38.081Z] executing ['git', 'submodule', 'init']
[vcs 2024-11-23T00:02:38.099Z] executing ['git', 'submodule', 'update', '--force']
[vcs 2024-11-23T00:02:38.183Z] Submodule path '3rd_party/browsermt-marian-dev': checked out '11c6ae7c46be21ef96ed10c60f28022fa968939f'
[vcs 2024-11-23T00:02:38.193Z] Submodule path '3rd_party/extract-lex': checked out '42fa605b53f32eaf6c6e0b5677255c21c91b3d49'
[vcs 2024-11-23T00:02:38.204Z] Submodule path '3rd_party/fast_align': checked out 'cab1e9aac8d3bb02ff5ae58218d8d225a039fa11'
[vcs 2024-11-23T00:02:38.228Z] Submodule path '3rd_party/kenlm': checked out 'bbf4fc511266c5d4515047055d7bdec659a6e158'
[vcs 2024-11-23T00:02:38.343Z] Submodule path '3rd_party/marian-dev': checked out 'e8a1a2530fb84cbff7383302ebca393e5875c441'
[vcs 2024-11-23T00:02:38.361Z] Submodule path '3rd_party/preprocess': checked out '64307314b4d5a9a0bd529b5c1036b0710d995eec'
[vcs 2024-11-23T00:02:38.428Z] Submodule path 'inference/3rd_party/browsermt-marian-dev': checked out '2781d735d4a10dca876d61be587afdab2726293c'
[vcs 2024-11-23T00:02:38.445Z] Submodule path 'inference/3rd_party/emsdk': checked out '2346baa7bb44a4a0571cc75f1986ab9aaa35aa03'
[vcs 2024-11-23T00:02:38.459Z] Submodule path 'inference/3rd_party/ssplit-cpp': checked out 'a311f9865ade34db1e8e080e6cc146f55dafb067'
[vcs 2024-11-23T00:02:38.460Z] cleaning git checkout...
[vcs 2024-11-23T00:02:38.460Z] executing ['git', 'clean', '-nxdff']
[vcs 2024-11-23T00:02:38.463Z] removing []
[vcs 2024-11-23T00:02:38.463Z] successfully cleaned git checkout!
[vcs 2024-11-23T00:02:38.464Z] TinderboxPrint:<a href='https://github.com/mozilla/translations/commit/d585a63a6abc04ece83e26ce51a0caa2f7fa21e6' title='Built from translations commit d585a63a6abc04ece83e26ce51a0caa2f7fa21e6'>d585a63a6abc04ece83e26ce51a0caa2f7fa21e6</a>
[setup 2024-11-23T00:02:38.464Z] MOZ_FETCHES_DIR is /builds/worker/fetches
[fetches 2024-11-23T00:02:38.464Z] fetching artifacts
[fetches 2024-11-23T00:02:38.464Z] executing ['/usr/bin/python3', '-u', '/usr/local/bin/fetch-content', 'task-artifacts']
attempt 1/5
attempt 1/5
attempt 1/5
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Lru4vMdrTYCuM3-_YDBaLw/artifacts/public/build/file.1.out.zst to /builds/worker/fetches/file.1.out.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QRGwRMMlR2Chi_A4GizTMQ/artifacts/public/build/mono.ru.zst to /builds/worker/fetches/mono.ru.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/RyJ61H6TRGqTQPiMv5I8mQ/artifacts/public/build/file.2.out.zst to /builds/worker/fetches/file.2.out.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QRGwRMMlR2Chi_A4GizTMQ/artifacts/public/build/mono.ru.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Lru4vMdrTYCuM3-_YDBaLw/artifacts/public/build/file.1.out.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/RyJ61H6TRGqTQPiMv5I8mQ/artifacts/public/build/file.2.out.zst
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/RyJ61H6TRGqTQPiMv5I8mQ/artifacts/public/build/file.2.out.zst resolved to 5796 bytes with sha256 33706c61de26bce79fb88c1d56d13cd55378f7ced7c39b4204ad89e841ea7424 in 0.078s
Verified size of https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/RyJ61H6TRGqTQPiMv5I8mQ/artifacts/public/build/file.2.out.zst
Extracting /builds/worker/fetches/file.2.out.zst to /builds/worker/fetches
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QRGwRMMlR2Chi_A4GizTMQ/artifacts/public/build/mono.ru.zst resolved to 37563 bytes with sha256 304e08e5d74cb9ea8c5df020122b5ed56472d0b41a2848aecbd08b91d788b528 in 0.132s
Verified size of https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QRGwRMMlR2Chi_A4GizTMQ/artifacts/public/build/mono.ru.zst
Extracting /builds/worker/fetches/mono.ru.zst to /builds/worker/fetches
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Lru4vMdrTYCuM3-_YDBaLw/artifacts/public/build/file.1.out.zst resolved to 5474 bytes with sha256 a6b84028a1fa16d820b91e4e91fb0b75b9e3f00b50aebe298c65b1238764582c in 0.152s
Verified size of https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Lru4vMdrTYCuM3-_YDBaLw/artifacts/public/build/file.1.out.zst
Extracting /builds/worker/fetches/file.1.out.zst to /builds/worker/fetches
PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": [{"name": "fetch_content", "value": 0.15577055299999643, "lowerIsBetter": true, "shouldAlert": false, "subtests": []}]}
[fetches 2024-11-23T00:02:38.695Z] finished fetching artifacts
[task 2024-11-23T00:02:38.696Z] executing ['bash', '-c', 'zstd -d --rm $MOZ_FETCHES_DIR/file* && $VCS_PATH/pipeline/translate/collect.sh fetches $TASK_WORKDIR/artifacts/mono.en.zst $MOZ_FETCHES_DIR/mono.ru.zst']
[task 2024-11-23T00:02:38.698Z]
[task 2024-11-23T00:02:38.698Z] Decompress: 1/ 2 files. Current: .../file.1.out.zst : 0 MB...
[task 2024-11-23T00:02:38.699Z]
[task 2024-11-23T00:02:38.699Z]
[task 2024-11-23T00:02:38.699Z]
[task 2024-11-23T00:02:38.699Z] 2 files decompressed : 1006050 bytes total
[task 2024-11-23T00:02:38.700Z] + set -euo pipefail
[task 2024-11-23T00:02:38.700Z] + chunks_dir=fetches
[task 2024-11-23T00:02:38.700Z] + output_path=/builds/worker/artifacts/mono.en.zst
[task 2024-11-23T00:02:38.700Z] + mono_path=/builds/worker/fetches/mono.ru.zst
[task 2024-11-23T00:02:38.700Z] + echo '### Collecting translations'
[task 2024-11-23T00:02:38.700Z] ### Collecting translations
[task 2024-11-23T00:02:38.700Z] + find fetches -name '*.out'
[task 2024-11-23T00:02:38.700Z] + sort -t . -k2,2n
[task 2024-11-23T00:02:38.700Z] + xargs cat
[task 2024-11-23T00:02:38.700Z] + zstdmt
[task 2024-11-23T00:02:38.705Z] + echo '### Comparing number of sentences in source and artificial target files'
[task 2024-11-23T00:02:38.705Z] ### Comparing number of sentences in source and artificial target files
[task 2024-11-23T00:02:38.705Z] ++ zstdmt -dc /builds/worker/fetches/mono.ru.zst
[task 2024-11-23T00:02:38.705Z] ++ wc -l
[task 2024-11-23T00:02:38.707Z] + src_len=693
[task 2024-11-23T00:02:38.707Z] ++ zstdmt -dc /builds/worker/artifacts/mono.en.zst
[task 2024-11-23T00:02:38.707Z] ++ wc -l
[task 2024-11-23T00:02:38.709Z] + trg_len=693
[task 2024-11-23T00:02:38.709Z] + '[' 693 '!=' 693 ']'
[fetches 2024-11-23T00:02:38.709Z] removing /builds/worker/fetches
[fetches 2024-11-23T00:02:38.710Z] finished
[taskcluster 2024-11-23 00:02:39.991Z] === Task Finished ===
[taskcluster 2024-11-23 00:02:40.204Z] Successful task run with exit code: 0 completed in 5.834 seconds
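
The xtrace lines from collect.sh above show three steps: gather the *.out chunks, order them by their numeric chunk index (sort -t . -k2,2n), concatenate them, and check that the result has as many lines as the source file (693 on both sides in this run). The following is a minimal Python sketch of that logic, not the pipeline's actual implementation. The function names collect, chunk_index and count_lines are hypothetical, the sketch works on uncompressed files to stay within the standard library (the real script pipes through zstdmt), and it takes the index from the file name rather than from the full path as sort does.

```python
from pathlib import Path
import tempfile


def chunk_index(path: Path) -> int:
    """Return the numeric chunk index from a name like file.12.out (hypothetical helper)."""
    return int(path.name.split(".")[1])


def count_lines(path: Path) -> int:
    """Count newline characters, which is what wc -l reports."""
    return path.read_bytes().count(b"\n")


def collect(chunks_dir: Path, output_path: Path, mono_path: Path) -> int:
    """Concatenate chunks in numeric order and verify the line count against the source."""
    # A numeric sort matters: a plain string sort would place file.10.out before file.2.out.
    chunks = sorted(chunks_dir.rglob("*.out"), key=chunk_index)
    with output_path.open("wb") as out:
        for chunk in chunks:
            out.write(chunk.read_bytes())

    src_len = count_lines(mono_path)
    trg_len = count_lines(output_path)
    if src_len != trg_len:
        raise ValueError(
            f"Number of sentences differs: source={src_len}, target={trg_len}"
        )
    return trg_len


if __name__ == "__main__":
    # Tiny demonstration with made-up data; the file names mirror the ones in the log.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        fetches = root / "fetches"
        fetches.mkdir()
        (fetches / "file.1.out").write_text("one\ntwo\n")
        (fetches / "file.2.out").write_text("three\n")
        mono = root / "mono.ru"
        mono.write_text("a\nb\nc\n")
        print(collect(fetches, root / "mono.en", mono))
```

The numeric sort key is the design point worth noting: with ten or more chunks, a lexicographic sort would interleave them and the translated lines would no longer align with the source sentences, even though the line-count check would still pass.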