Switch to ICU tokenizer #939

Open
wants to merge 6 commits into
base: main

Use ICU system package

d585a63
firefoxci-taskcluster / collect-mono-src-ru-en succeeded Nov 23, 2024 in 1h 12m 8s

FirefoxCI (pull_request)

collect mono src ru-en

Task Status

Started: 2024-11-23T00:02:34.328Z
Resolved: 2024-11-23T00:02:40.673Z
Task Execution Time: 6 seconds, 345 milliseconds
Task Status: completed
Reason Resolved: completed
RunId: 0

Artifacts

- public/build/mono.en.zst
- public/logs/live_backing.log
- public/logs/live.log


[taskcluster 2024-11-23 00:02:34.371Z] Task ID: Xt1emcAkQJOYWTQ4C5sr3Q
[taskcluster 2024-11-23 00:02:34.371Z] Worker ID: 7974999744821634500
[taskcluster 2024-11-23 00:02:34.371Z] Worker Group: us-west1-b
[taskcluster 2024-11-23 00:02:34.371Z] Worker Node Type: projects/887720501152/machineTypes/n2-highmem-32
[taskcluster 2024-11-23 00:02:34.371Z] Worker Pool: translations-1/b-linux-large-gcp-300gb
[taskcluster 2024-11-23 00:02:34.371Z] Worker Version: 38.0.5
[taskcluster 2024-11-23 00:02:34.371Z] Public IP: 35.199.179.170
[taskcluster 2024-11-23 00:02:34.371Z] Hostname: translations-1-b-linux-large-gcp-300gb-wpglgp0grjksbehrwlcryw
[taskcluster 2024-11-23 00:02:34.371Z] using cache "translations-level-1-checkouts-v3-7afeb851dd97df8f3607-KnyIE1GvSz67R9mjL97Now" -> /builds/worker/checkouts

[taskcluster 2024-11-23 00:02:35.444Z] Image 'public/image.tar.zst' from task 'KnyIE1GvSz67R9mjL97Now' loaded.  Using image ID sha256:d31e1900b8212f46ff27eab4217df610f5d7a124bb4975b4b8ea07a64443f3ba.
[taskcluster 2024-11-23 00:02:35.456Z] === Task Starting ===
[setup 2024-11-23T00:02:37.517Z] run-task started in /builds/worker
[setup 2024-11-23T00:02:37.517Z] Invoked by command: --firefox_translations_training-checkout=/builds/worker/checkouts/vcs/ -- bash -c zstd -d --rm $MOZ_FETCHES_DIR/file* && $VCS_PATH/pipeline/translate/collect.sh fetches $TASK_WORKDIR/artifacts/mono.en.zst $MOZ_FETCHES_DIR/mono.ru.zst
[setup 2024-11-23T00:02:37.517Z] Python version: 3.10.12
[cache 2024-11-23T00:02:37.519Z] cache /builds/worker/checkouts exists; requirements: gid=1000 uid=1000 version=1
[volume 2024-11-23T00:02:37.519Z] volume /builds/worker/checkouts is a cache
[setup 2024-11-23T00:02:37.519Z] running as worker:worker
[vcs 2024-11-23T00:02:37.519Z] executing ['git', 'config', '--global', '--add', 'safe.directory', '/builds/worker/checkouts/vcs']
[vcs 2024-11-23T00:02:37.520Z] executing ['git', 'fetch', '--tags', '--force', 'https://github.com/mozilla/translations', 'icu_tokenizer']
[vcs 2024-11-23T00:02:37.749Z] From https://github.com/mozilla/translations
[vcs 2024-11-23T00:02:37.749Z]  * branch            icu_tokenizer -> FETCH_HEAD
[vcs 2024-11-23T00:02:37.756Z] executing ['git', 'fetch', '--no-tags', 'https://github.com/mozilla/translations', 'icu_tokenizer']
[vcs 2024-11-23T00:02:38.049Z] From https://github.com/mozilla/translations
[vcs 2024-11-23T00:02:38.049Z]  * branch            icu_tokenizer -> FETCH_HEAD
[vcs 2024-11-23T00:02:38.056Z] executing ['git', 'checkout', '-f', '-B', 'icu_tokenizer', 'd585a63a6abc04ece83e26ce51a0caa2f7fa21e6']
[vcs 2024-11-23T00:02:38.065Z] Reset branch 'icu_tokenizer'
[vcs 2024-11-23T00:02:38.081Z] executing ['git', 'submodule', 'init']
[vcs 2024-11-23T00:02:38.099Z] executing ['git', 'submodule', 'update', '--force']
[vcs 2024-11-23T00:02:38.183Z] Submodule path '3rd_party/browsermt-marian-dev': checked out '11c6ae7c46be21ef96ed10c60f28022fa968939f'
[vcs 2024-11-23T00:02:38.193Z] Submodule path '3rd_party/extract-lex': checked out '42fa605b53f32eaf6c6e0b5677255c21c91b3d49'
[vcs 2024-11-23T00:02:38.204Z] Submodule path '3rd_party/fast_align': checked out 'cab1e9aac8d3bb02ff5ae58218d8d225a039fa11'
[vcs 2024-11-23T00:02:38.228Z] Submodule path '3rd_party/kenlm': checked out 'bbf4fc511266c5d4515047055d7bdec659a6e158'
[vcs 2024-11-23T00:02:38.343Z] Submodule path '3rd_party/marian-dev': checked out 'e8a1a2530fb84cbff7383302ebca393e5875c441'
[vcs 2024-11-23T00:02:38.361Z] Submodule path '3rd_party/preprocess': checked out '64307314b4d5a9a0bd529b5c1036b0710d995eec'
[vcs 2024-11-23T00:02:38.428Z] Submodule path 'inference/3rd_party/browsermt-marian-dev': checked out '2781d735d4a10dca876d61be587afdab2726293c'
[vcs 2024-11-23T00:02:38.445Z] Submodule path 'inference/3rd_party/emsdk': checked out '2346baa7bb44a4a0571cc75f1986ab9aaa35aa03'
[vcs 2024-11-23T00:02:38.459Z] Submodule path 'inference/3rd_party/ssplit-cpp': checked out 'a311f9865ade34db1e8e080e6cc146f55dafb067'
[vcs 2024-11-23T00:02:38.460Z] cleaning git checkout...
[vcs 2024-11-23T00:02:38.460Z] executing ['git', 'clean', '-nxdff']
[vcs 2024-11-23T00:02:38.463Z] removing []
[vcs 2024-11-23T00:02:38.463Z] successfully cleaned git checkout!
[vcs 2024-11-23T00:02:38.464Z] TinderboxPrint:<a href='https://github.com/mozilla/translations/commit/d585a63a6abc04ece83e26ce51a0caa2f7fa21e6' title='Built from translations commit d585a63a6abc04ece83e26ce51a0caa2f7fa21e6'>d585a63a6abc04ece83e26ce51a0caa2f7fa21e6</a>
[setup 2024-11-23T00:02:38.464Z] MOZ_FETCHES_DIR is /builds/worker/fetches
[fetches 2024-11-23T00:02:38.464Z] fetching artifacts
[fetches 2024-11-23T00:02:38.464Z] executing ['/usr/bin/python3', '-u', '/usr/local/bin/fetch-content', 'task-artifacts']
attempt 1/5
attempt 1/5
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Lru4vMdrTYCuM3-_YDBaLw/artifacts/public/build/file.1.out.zst to /builds/worker/fetches/file.1.out.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QRGwRMMlR2Chi_A4GizTMQ/artifacts/public/build/mono.ru.zst to /builds/worker/fetches/mono.ru.zst
attempt 1/5
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QRGwRMMlR2Chi_A4GizTMQ/artifacts/public/build/mono.ru.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Lru4vMdrTYCuM3-_YDBaLw/artifacts/public/build/file.1.out.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/RyJ61H6TRGqTQPiMv5I8mQ/artifacts/public/build/file.2.out.zst to /builds/worker/fetches/file.2.out.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/RyJ61H6TRGqTQPiMv5I8mQ/artifacts/public/build/file.2.out.zst
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/RyJ61H6TRGqTQPiMv5I8mQ/artifacts/public/build/file.2.out.zst resolved to 5796 bytes with sha256 33706c61de26bce79fb88c1d56d13cd55378f7ced7c39b4204ad89e841ea7424 in 0.078s
Verified size of https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/RyJ61H6TRGqTQPiMv5I8mQ/artifacts/public/build/file.2.out.zst
Extracting /builds/worker/fetches/file.2.out.zst to /builds/worker/fetches
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QRGwRMMlR2Chi_A4GizTMQ/artifacts/public/build/mono.ru.zst resolved to 37563 bytes with sha256 304e08e5d74cb9ea8c5df020122b5ed56472d0b41a2848aecbd08b91d788b528 in 0.132s
Verified size of https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QRGwRMMlR2Chi_A4GizTMQ/artifacts/public/build/mono.ru.zst
Extracting /builds/worker/fetches/mono.ru.zst to /builds/worker/fetches
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Lru4vMdrTYCuM3-_YDBaLw/artifacts/public/build/file.1.out.zst resolved to 5474 bytes with sha256 a6b84028a1fa16d820b91e4e91fb0b75b9e3f00b50aebe298c65b1238764582c in 0.152s
Verified size of https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Lru4vMdrTYCuM3-_YDBaLw/artifacts/public/build/file.1.out.zst
Extracting /builds/worker/fetches/file.1.out.zst to /builds/worker/fetches
PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": [{"name": "fetch_content", "value": 0.15577055299999643, "lowerIsBetter": true, "shouldAlert": false, "subtests": []}]}
[fetches 2024-11-23T00:02:38.695Z] finished fetching artifacts
[task 2024-11-23T00:02:38.696Z] executing ['bash', '-c', 'zstd -d --rm $MOZ_FETCHES_DIR/file* && $VCS_PATH/pipeline/translate/collect.sh fetches $TASK_WORKDIR/artifacts/mono.en.zst $MOZ_FETCHES_DIR/mono.ru.zst']
[task 2024-11-23T00:02:38.698Z] Decompress:  1/ 2 files. Current: .../file.1.out.zst : 0 MB...
[task 2024-11-23T00:02:38.699Z] 2 files decompressed : 1006050 bytes total
[task 2024-11-23T00:02:38.700Z] + set -euo pipefail
[task 2024-11-23T00:02:38.700Z] + chunks_dir=fetches
[task 2024-11-23T00:02:38.700Z] + output_path=/builds/worker/artifacts/mono.en.zst
[task 2024-11-23T00:02:38.700Z] + mono_path=/builds/worker/fetches/mono.ru.zst
[task 2024-11-23T00:02:38.700Z] + echo '### Collecting translations'
[task 2024-11-23T00:02:38.700Z] ### Collecting translations
[task 2024-11-23T00:02:38.700Z] + find fetches -name '*.out'
[task 2024-11-23T00:02:38.700Z] + sort -t . -k2,2n
[task 2024-11-23T00:02:38.700Z] + xargs cat
[task 2024-11-23T00:02:38.700Z] + zstdmt
[task 2024-11-23T00:02:38.705Z] + echo '### Comparing number of sentences in source and artificial target files'
[task 2024-11-23T00:02:38.705Z] ### Comparing number of sentences in source and artificial target files
[task 2024-11-23T00:02:38.705Z] ++ zstdmt -dc /builds/worker/fetches/mono.ru.zst
[task 2024-11-23T00:02:38.705Z] ++ wc -l
[task 2024-11-23T00:02:38.707Z] + src_len=693
[task 2024-11-23T00:02:38.707Z] ++ zstdmt -dc /builds/worker/artifacts/mono.en.zst
[task 2024-11-23T00:02:38.707Z] ++ wc -l
[task 2024-11-23T00:02:38.709Z] + trg_len=693
[task 2024-11-23T00:02:38.709Z] + '[' 693 '!=' 693 ']'
[fetches 2024-11-23T00:02:38.709Z] removing /builds/worker/fetches
[fetches 2024-11-23T00:02:38.710Z] finished
[taskcluster 2024-11-23 00:02:39.991Z] === Task Finished ===
[taskcluster 2024-11-23 00:02:40.204Z] Successful task run with exit code: 0 completed in 5.834 seconds
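
The `collect.sh` trace above concatenates the translated chunks in numeric chunk order and then checks that the result has as many lines as the source file (693 on both sides in this run). The sketch below reproduces those two steps on plain text files so the ordering is easy to inspect. It is not the actual script: `list_chunks`, `collect_plain`, the sample file names, and the error message are hypothetical, and the zstd compression and decompression steps are left out. Note that `sort -t . -k2,2n` splits the whole path on `.`, so the sketch assumes, as in the trace where `chunks_dir=fetches`, that the directory part of the path contains no dots.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list chunk outputs in numeric chunk order.
# Assumes names of the form file.<N>.out and a directory path without dots,
# because sort uses "." as the field separator for the whole path.
list_chunks() {
  local chunks_dir="$1"
  find "$chunks_dir" -name '*.out' | sort -t . -k2,2n
}

# Hypothetical helper: concatenate the chunks in order, then fail if the
# line count differs from the source file (plain text here, no zstd).
collect_plain() {
  local chunks_dir="$1" output_path="$2" source_path="$3"
  list_chunks "$chunks_dir" | xargs cat > "$output_path"

  local src_len trg_len
  src_len=$(wc -l < "$source_path")
  trg_len=$(wc -l < "$output_path")
  if (( src_len != trg_len )); then
    echo "Error: $source_path has $src_len lines, $output_path has $trg_len" >&2
    return 1
  fi
}

# Demo with made-up data: three one-line chunks and a three-line source.
workdir=$(mktemp -d)
cd "$workdir"            # keep the chunk path relative and free of dots
mkdir chunks
printf 'a\n' > chunks/file.1.out
printf 'b\n' > chunks/file.2.out
printf 'c\n' > chunks/file.10.out
printf 'x\ny\nz\n' > source.txt

collect_plain chunks collected.txt source.txt
cat collected.txt        # prints a, b, c: chunk 10 sorts after chunk 2
```

The numeric key matters once there are ten or more chunks: a plain lexicographic sort would place `file.10.out` before `file.2.out` and misalign the translations with their source sentences, and the line-count check would not catch that because the totals would still match.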