Switch to ICU tokenizer #939

Open
wants to merge 6 commits into
base: main

Use ICU system package

d585a63

firefoxci-taskcluster / bicleaner-ai-mtdata-ELRC-web_acquired_data_related_to_scientific_resea_78c4de-ru-en succeeded Nov 22, 2024 in 13m 22s

FirefoxCI (pull_request)

bicleaner-ai for mtdata ELRC-web_acquired_data_related_to_scientific_resea_78c4de dataset ru-en

Task Status

Started: 2024-11-22T22:58:49.192Z
Resolved: 2024-11-22T23:03:52.876Z
Task Execution Time: 5 minutes, 3 seconds, 684 milliseconds
Task Status: completed
Reason Resolved: completed
RunId: 0

Artifacts

- public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.best.zst
- public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.en.zst
- public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.filtered.zst
- public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.ru.zst
- public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.scored.zst
- public/logs/live_backing.log
- public/logs/live.log


[taskcluster 2024-11-22T22:58:49.210Z] Worker Type (translations-1/b-linux-v100-gpu-4-300gb) settings:
[taskcluster 2024-11-22T22:58:49.210Z]   {
[taskcluster 2024-11-22T22:58:49.210Z]     "config": {
[taskcluster 2024-11-22T22:58:49.210Z]       "deploymentId": ""
[taskcluster 2024-11-22T22:58:49.210Z]     },
[taskcluster 2024-11-22T22:58:49.210Z]     "generic-worker": {
[taskcluster 2024-11-22T22:58:49.210Z]       "engine": "insecure",
[taskcluster 2024-11-22T22:58:49.210Z]       "go-arch": "amd64",
[taskcluster 2024-11-22T22:58:49.210Z]       "go-os": "linux",
[taskcluster 2024-11-22T22:58:49.210Z]       "go-version": "go1.22.2",
[taskcluster 2024-11-22T22:58:49.210Z]       "release": "https://github.com/taskcluster/taskcluster/releases/tag/v64.2.6",
[taskcluster 2024-11-22T22:58:49.210Z]       "revision": "edab196d7d030a5d625b77335109cd9060ab7e1f",
[taskcluster 2024-11-22T22:58:49.210Z]       "source": "https://github.com/taskcluster/taskcluster/commits/edab196d7d030a5d625b77335109cd9060ab7e1f",
[taskcluster 2024-11-22T22:58:49.210Z]       "version": "64.2.6"
[taskcluster 2024-11-22T22:58:49.210Z]     },
[taskcluster 2024-11-22T22:58:49.210Z]     "image": "projects/taskcluster-imaging/global/images/gw-translations-gcp-googlecompute-2024-04-22t18-22-42z",
[taskcluster 2024-11-22T22:58:49.210Z]     "instance-id": "7870697164117947128",
[taskcluster 2024-11-22T22:58:49.210Z]     "instance-type": "projects/887720501152/machineTypes/custom-40-262144",
[taskcluster 2024-11-22T22:58:49.210Z]     "local-ipv4": "10.138.0.33",
[taskcluster 2024-11-22T22:58:49.210Z]     "project-id": "fxci-production-level1-workers",

...(332 lines hidden)...

[task 2024-11-22T23:02:14.725Z]   Downloading typing_extensions-4.11.0-py3-none-any.whl (34 kB)
[task 2024-11-22T23:02:14.804Z] Collecting urllib3==2.2.1
[task 2024-11-22T23:02:14.814Z]   Downloading urllib3-2.2.1-py3-none-any.whl (121 kB)
[task 2024-11-22T23:02:14.829Z]      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.1/121.1 KB 8.3 MB/s eta 0:00:00
[task 2024-11-22T23:02:14.950Z] Collecting werkzeug==3.0.3
[task 2024-11-22T23:02:14.960Z]   Downloading werkzeug-3.0.3-py3-none-any.whl (227 kB)
[task 2024-11-22T23:02:14.973Z]      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 227.3/227.3 KB 21.3 MB/s eta 0:00:00
[task 2024-11-22T23:02:15.032Z] Collecting wheel==0.43.0
[task 2024-11-22T23:02:15.042Z]   Downloading wheel-0.43.0-py3-none-any.whl (65 kB)
[task 2024-11-22T23:02:15.049Z]      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.8/65.8 KB 11.2 MB/s eta 0:00:00
[task 2024-11-22T23:02:15.289Z] Collecting wrapt==1.14.1
[task 2024-11-22T23:02:15.299Z]   Downloading wrapt-1.14.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (77 kB)
[task 2024-11-22T23:02:15.319Z]      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.9/77.9 KB 3.7 MB/s eta 0:00:00
[task 2024-11-22T23:02:15.321Z] Requirement already satisfied: zstandard==0.22.0 in /usr/local/lib/python3.10/dist-packages (from -r ./checkouts/vcs/pipeline/bicleaner/requirements/bicleaner-ai.txt (line 235)) (0.22.0)
[task 2024-11-22T23:02:15.412Z] Requirement already satisfied: setuptools>=0.7.0 in /usr/lib/python3/dist-packages (from fasttext-wheel==0.9.2->-r ./checkouts/vcs/pipeline/bicleaner/requirements/bicleaner-ai.txt (line 39)) (59.6.0)
[task 2024-11-22T23:02:16.935Z] Building wheels for collected packages: sacremoses, toolwrapper
[task 2024-11-22T23:02:16.936Z]   Building wheel for sacremoses (setup.py): started
[task 2024-11-22T23:02:17.476Z]   Building wheel for sacremoses (setup.py): finished with status 'done'
[task 2024-11-22T23:02:17.479Z]   Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895260 sha256=86f9da97e377dcd5add1dcba3a9cf52d7cf6bcef8e2a183ec12e51e074aae715
[task 2024-11-22T23:02:17.479Z]   Stored in directory: /home/ubuntu/.cache/pip/wheels/00/24/97/a2ea5324f36bc626e1ea0267f33db6aa80d157ee977e9e42fb
[task 2024-11-22T23:02:17.481Z]   Building wheel for toolwrapper (setup.py): started
[task 2024-11-22T23:02:17.707Z]   Building wheel for toolwrapper (setup.py): finished with status 'done'
[task 2024-11-22T23:02:17.707Z]   Created wheel for toolwrapper: filename=toolwrapper-2.1.0-py3-none-any.whl size=3353 sha256=c3234d831205caa4e4f24fad3ca26559901c2c0287f1d3d1558f1294734670bc
[task 2024-11-22T23:02:17.707Z]   Stored in directory: /home/ubuntu/.cache/pip/wheels/e1/af/b1/99b57a06dda78fdcee86d2e22c64743f3b8df8f31cfc04baf7
[task 2024-11-22T23:02:17.709Z] Successfully built sacremoses toolwrapper
[task 2024-11-22T23:02:18.861Z] Installing collected packages: toolwrapper, sentencepiece, libclang, fuzzywuzzy, flatbuffers, wrapt, wheel, urllib3, typing-extensions, tqdm, tomli, threadpoolctl, termcolor, tensorflow-io-gcs-filesystem, tensorflow-estimator, tensorboard-data-server, safetensors, regex, rapidfuzz, pyyaml, pybind11, pyasn1, psutil, protobuf, pluggy, packaging, oauthlib, numpy, markupsafe, markdown, keras, joblib, iniconfig, idna, grpcio, google-pasta, gast, fsspec, filelock, fastspell-dictionaries, exceptiongroup, click, charset-normalizer, certifi, cachetools, absl-py, werkzeug, scipy, sacremoses, rsa, requests, pytest, pyasn1-modules, opt-einsum, ml-dtypes, levenshtein, h5py, fasttext-wheel, astunparse, scikit-learn, requests-oauthlib, python-levenshtein, huggingface-hub, google-auth, bicleaner-ai-glove, tokenizers, google-auth-oauthlib, fastspell, transformers, tensorboard, bicleaner-hardrules, tensorflow, bicleaner-ai
[task 2024-11-22T23:02:19.500Z]   WARNING: The script wheel is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:19.500Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:19.652Z]   WARNING: The script tqdm is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:19.652Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:20.620Z]   WARNING: The script pybind11-config is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:20.620Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:23.108Z]   WARNING: The script f2py is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:23.108Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:23.201Z]   WARNING: The script markdown_py is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:23.201Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:26.690Z]   WARNING: The script normalizer is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:26.690Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:30.141Z]   WARNING: The script sacremoses is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:30.141Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:30.184Z]   WARNING: The scripts pyrsa-decrypt, pyrsa-encrypt, pyrsa-keygen, pyrsa-priv2pub, pyrsa-sign and pyrsa-verify are installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:30.184Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:30.417Z]   WARNING: The scripts py.test and pytest are installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:30.417Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:32.999Z]   WARNING: The script huggingface-cli is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:32.999Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:33.262Z]   WARNING: The script google-oauthlib-tool is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:33.262Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:33.299Z]   WARNING: The scripts fastspell and fastspell-download are installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:33.299Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:37.522Z]   WARNING: The script transformers-cli is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:37.522Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:38.628Z]   WARNING: The script tensorboard is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:38.628Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:39.500Z]   WARNING: The script bicleaner-hardrules is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:39.500Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:53.591Z]   WARNING: The scripts estimator_ckpt_converter, import_pb_to_tensorboard, saved_model_cli, tensorboard, tf_upgrade_v2, tflite_convert, toco and toco_from_protos are installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:53.591Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:53.736Z]   WARNING: The scripts bicleaner-ai-classify, bicleaner-ai-download, bicleaner-ai-download-hf, bicleaner-ai-generate-train and bicleaner-ai-train are installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:53.736Z]   Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:53.771Z] Successfully installed absl-py-2.1.0 astunparse-1.6.3 bicleaner-ai-3.0.1 bicleaner-ai-glove-0.2.1 bicleaner-hardrules-2.10.4 cachetools-5.3.3 certifi-2024.2.2 charset-normalizer-3.3.2 click-8.1.7 exceptiongroup-1.2.1 fastspell-0.11 fastspell-dictionaries-3.2 fasttext-wheel-0.9.2 filelock-3.14.0 flatbuffers-24.3.25 fsspec-2024.5.0 fuzzywuzzy-0.18.0 gast-0.5.4 google-auth-2.29.0 google-auth-oauthlib-1.2.0 google-pasta-0.2.0 grpcio-1.63.0 h5py-3.11.0 huggingface-hub-0.22.2 idna-3.7 iniconfig-2.0.0 joblib-1.4.2 keras-2.15.0 levenshtein-0.25.1 libclang-18.1.1 markdown-3.6 markupsafe-2.1.5 ml-dtypes-0.3.2 numpy-1.26.4 oauthlib-3.2.2 opt-einsum-3.3.0 packaging-24.0 pluggy-1.5.0 protobuf-3.20.3 psutil-5.9.8 pyasn1-0.6.0 pyasn1-modules-0.4.0 pybind11-2.12.0 pytest-8.2.0 python-levenshtein-0.25.1 pyyaml-6.0.1 rapidfuzz-3.9.0 regex-2024.5.15 requests-2.31.0 requests-oauthlib-2.0.0 rsa-4.9 sacremoses-0.0.53 safetensors-0.4.3 scikit-learn-1.4.2 scipy-1.13.0 sentencepiece-0.2.0 tensorboard-2.15.2 tensorboard-data-server-0.7.2 tensorflow-2.15.1 tensorflow-estimator-2.15.0 tensorflow-io-gcs-filesystem-0.37.0 termcolor-2.4.0 threadpoolctl-3.5.0 tokenizers-0.15.2 tomli-2.0.1 toolwrapper-2.1.0 tqdm-4.66.4 transformers-4.36.1 typing-extensions-4.11.0 urllib3-2.2.1 werkzeug-3.0.3 wheel-0.43.0 wrapt-1.14.1
[task 2024-11-22T23:02:54.289Z] + set -euo pipefail
[task 2024-11-22T23:02:54.289Z] + echo '###### Bicleaner filtering'
[task 2024-11-22T23:02:54.289Z] ###### Bicleaner filtering
[task 2024-11-22T23:02:54.289Z] + test -v SRC
[task 2024-11-22T23:02:54.289Z] + test -v TRG
[task 2024-11-22T23:02:54.289Z] + test -v CUDA_DIR
[task 2024-11-22T23:02:54.289Z] + test -v CUDNN_DIR
[task 2024-11-22T23:02:54.289Z] + export LD_LIBRARY_PATH=fetches/cuda-toolkit/lib64:fetches/cuda-toolkit:
[task 2024-11-22T23:02:54.289Z] + LD_LIBRARY_PATH=fetches/cuda-toolkit/lib64:fetches/cuda-toolkit:
[task 2024-11-22T23:02:54.289Z] + corpus_prefix=/home/ubuntu/tasks/task_173231632907177/fetches/ELRC-web_acquired_data_related_to_scientific_resea_78c4de
[task 2024-11-22T23:02:54.289Z] + output_prefix=/home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de
[task 2024-11-22T23:02:54.289Z] + bicleaner_threshold=0.5
[task 2024-11-22T23:02:54.289Z] + threads=auto
[task 2024-11-22T23:02:54.289Z] + pack_dir=/home/ubuntu/tasks/task_173231632907177/fetches/bicleaner-ai-ru-en
[task 2024-11-22T23:02:54.289Z] + '[' auto = auto ']'
[task 2024-11-22T23:02:54.290Z] ++ nproc
[task 2024-11-22T23:02:54.291Z] + threads=40
[task 2024-11-22T23:02:54.292Z] ++ dirname /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de
[task 2024-11-22T23:02:54.293Z] + output_dir=/home/ubuntu/tasks/task_173231632907177/artifacts
[task 2024-11-22T23:02:54.293Z] + mkdir -p /home/ubuntu/tasks/task_173231632907177/artifacts
[task 2024-11-22T23:02:54.296Z] + '[' 0.5 == 0 ']'
[task 2024-11-22T23:02:54.296Z] + '[' 0.5 == 0.0 ']'
[task 2024-11-22T23:02:54.296Z] + export scol=1
[task 2024-11-22T23:02:54.296Z] + scol=1
[task 2024-11-22T23:02:54.296Z] + export tcol=2
[task 2024-11-22T23:02:54.296Z] + tcol=2
[task 2024-11-22T23:02:54.297Z] ++ grep source_lang /home/ubuntu/tasks/task_173231632907177/fetches/bicleaner-ai-ru-en/metadata.yaml
[task 2024-11-22T23:02:54.297Z] ++ awk '{print $2}'
[task 2024-11-22T23:02:54.301Z] + model_source_lang=en
[task 2024-11-22T23:02:54.302Z] ++ grep target_lang /home/ubuntu/tasks/task_173231632907177/fetches/bicleaner-ai-ru-en/metadata.yaml
[task 2024-11-22T23:02:54.302Z] ++ awk '{print $2}'
[task 2024-11-22T23:02:54.306Z] + model_target_lang=xx
[task 2024-11-22T23:02:54.306Z] + '[' en == en ']'
[task 2024-11-22T23:02:54.306Z] + export scol=2
[task 2024-11-22T23:02:54.306Z] + scol=2
[task 2024-11-22T23:02:54.306Z] + export tcol=1
[task 2024-11-22T23:02:54.306Z] + tcol=1
[task 2024-11-22T23:02:54.306Z] + '[' -z '' ']'
[task 2024-11-22T23:02:54.306Z] ++ nvidia-smi --query-gpu=index --format=csv,noheader
[task 2024-11-22T23:02:54.506Z] + export 'CUDA_VISIBLE_DEVICES=0
[task 2024-11-22T23:02:54.506Z] 1
[task 2024-11-22T23:02:54.506Z] 2
[task 2024-11-22T23:02:54.506Z] 3'
[task 2024-11-22T23:02:54.506Z] + CUDA_VISIBLE_DEVICES='0
[task 2024-11-22T23:02:54.506Z] 1
[task 2024-11-22T23:02:54.506Z] 2
[task 2024-11-22T23:02:54.506Z] 3'
[task 2024-11-22T23:02:54.506Z] + echo '### Classifying'
[task 2024-11-22T23:02:54.506Z] ### Classifying
[task 2024-11-22T23:02:54.506Z] + '[' 7 -gt 1 ']'
[task 2024-11-22T23:02:54.506Z] + CUDA_VISIBLE_ARRAY=('0' '1' '2' '3')
[task 2024-11-22T23:02:54.506Z] + export CUDA_VISIBLE_ARRAY
[task 2024-11-22T23:02:54.506Z] + export TF_CPP_MIN_LOG_LEVEL=0
[task 2024-11-22T23:02:54.506Z] + TF_CPP_MIN_LOG_LEVEL=0
[task 2024-11-22T23:02:54.506Z] + export -f biclean
[task 2024-11-22T23:02:54.507Z] + parallel -j 4 --pipe -k --block 10M biclean /home/ubuntu/tasks/task_173231632907177/fetches/bicleaner-ai-ru-en/metadata.yaml '{%}'
[task 2024-11-22T23:02:54.507Z] + zstdmt
[task 2024-11-22T23:02:54.507Z] ++ zstdmt -dc /home/ubuntu/tasks/task_173231632907177/fetches/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.ru.zst
[task 2024-11-22T23:02:54.508Z] + paste /dev/fd/63 /dev/fd/62
[task 2024-11-22T23:02:54.508Z] ++ zstdmt -dc /home/ubuntu/tasks/task_173231632907177/fetches/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.en.zst
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:02:56.104103: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:02:56.104184: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:02:56.105703: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:02:56.113512: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
[task 2024-11-22T23:03:51.018Z] To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:02:57.032599: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:02:58.543578: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:03:08.432163: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:03:08.432575: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:03:13.713454: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:03:13.713504: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:13.715129: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:13.723334: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
[task 2024-11-22T23:03:51.019Z] To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:14.569484: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:16.155543: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:16.222851: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:16.223277: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:17.683716: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:17.684190: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:17.684523: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:18.656239: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:18.656685: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:18.657218: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:18.657513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14784 MB memory:  -> device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:04.0, compute capability: 7.0
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:19.673275: I external/local_tsl/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:30,261 - WARNING - LM filter not present in metadata, disabling.
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:30,262 - WARNING - Porn removal not present in metadata, disabling
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:30,262 - WARNING - Using multilingual model, disabling language-dependant rules: not_too_short, length_ratio, no_only_numbers, no_repeated_words, no_wrong_language
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:30,262 - INFO - Arguments processed
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:30,262 - INFO - Starting process
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:49,906 - INFO - Finished
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:49,906 - INFO - Total: 1287 rows
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:49,906 - INFO - Elapsed time 19.64 s
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:49,906 - INFO - Troughput: 65 rows/s
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:49,906 - INFO - Program finished
[task 2024-11-22T23:03:51.028Z] + echo '### Filtering'
[task 2024-11-22T23:03:51.028Z] ### Filtering
[task 2024-11-22T23:03:51.029Z] + zstdmt -dc /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.scored.zst
[task 2024-11-22T23:03:51.029Z] + awk -v threshold=0.5 '-F\t' '{if ($3>threshold) {print $0}}'
[task 2024-11-22T23:03:51.029Z] + zstdmt
[task 2024-11-22T23:03:51.047Z] + zstdmt -dc /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.scored.zst
[task 2024-11-22T23:03:51.047Z] + awk -v threshold=0.5 '-F\t' '{if ($3<=threshold) {print $0}}'
[task 2024-11-22T23:03:51.047Z] + zstdmt
[task 2024-11-22T23:03:51.063Z] ++ zstdmt -dc /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.scored.zst
[task 2024-11-22T23:03:51.063Z] ++ wc -l
[task 2024-11-22T23:03:51.084Z] + echo 'Lines before filtering: 1287'
[task 2024-11-22T23:03:51.084Z] Lines before filtering: 1287
[task 2024-11-22T23:03:51.085Z] ++ zstdmt -dc /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.best.zst
[task 2024-11-22T23:03:51.085Z] ++ wc -l
[task 2024-11-22T23:03:51.090Z] + echo 'Lines after filtering: 1110'
[task 2024-11-22T23:03:51.090Z] Lines after filtering: 1110
[task 2024-11-22T23:03:51.090Z] + echo '### Writing output corpus'
[task 2024-11-22T23:03:51.090Z] ### Writing output corpus
[task 2024-11-22T23:03:51.090Z] + zstdmt -dc /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.best.zst
[task 2024-11-22T23:03:51.090Z] + tee /dev/fd/63
[task 2024-11-22T23:03:51.090Z] + cut -f2
[task 2024-11-22T23:03:51.091Z] + zstdmt
[task 2024-11-22T23:03:51.091Z] ++ cut -f1
[task 2024-11-22T23:03:51.091Z] ++ zstdmt
[task 2024-11-22T23:03:51.102Z] + echo '###### Done: Bicleaner filtering'
[task 2024-11-22T23:03:51.102Z] ###### Done: Bicleaner filtering
[fetches 2024-11-22T23:03:51.102Z] removing /home/ubuntu/tasks/task_173231632907177/fetches
[fetches 2024-11-22T23:03:52.356Z] finished
[taskcluster 2024-11-22T23:03:52.366Z]    Exit Code: 0
[taskcluster 2024-11-22T23:03:52.366Z]    User Time: 2m3.854372s
[taskcluster 2024-11-22T23:03:52.366Z]  Kernel Time: 43.277244s
[taskcluster 2024-11-22T23:03:52.366Z]    Wall Time: 5m2.155588047s
[taskcluster 2024-11-22T23:03:52.366Z]       Result: SUCCEEDED
[taskcluster 2024-11-22T23:03:52.366Z] === Task Finished ===
[taskcluster 2024-11-22T23:03:52.366Z] Task Duration: 5m2.157569416s
[taskcluster 2024-11-22T23:03:52.411Z] Uploading artifact public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.ru.zst from file /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.ru.zst with content encoding "identity", mime type "application/zstd" and expiry 2025-11-17T22:50:29.816Z
[taskcluster 2024-11-22T23:03:52.422Z] Uploading artifact public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.en.zst from file /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.en.zst with content encoding "identity", mime type "application/zstd" and expiry 2025-11-17T22:50:29.816Z
[taskcluster 2024-11-22T23:03:52.425Z] Uploading artifact public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.scored.zst from file /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.scored.zst with content encoding "identity", mime type "application/zstd" and expiry 2025-11-17T22:50:29.816Z
[taskcluster 2024-11-22T23:03:52.461Z] Uploading artifact public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.best.zst from file /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.best.zst with content encoding "identity", mime type "application/zstd" and expiry 2025-11-17T22:50:29.816Z
[taskcluster 2024-11-22T23:03:52.473Z] Uploading artifact public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.filtered.zst from file /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.filtered.zst with content encoding "identity", mime type "application/zstd" and expiry 2025-11-17T22:50:29.816Z
[taskcluster 2024-11-22T23:03:52.622Z] [mounts] Preserving cache: Moving "/home/ubuntu/tasks/task_173231632907177/checkouts" to "/home/ubuntu/caches/OrAmAfn0Snq8SXkm6xdiRg"
[taskcluster 2024-11-22T23:03:52.682Z] Uploading link artifact public/logs/live.log to artifact public/logs/live_backing.log with expiry 2025-11-17T22:50:29.816Z
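
The "### Filtering" step traced in the log splits the scored file with two awk commands: rows whose third tab-separated field is greater than threshold=0.5 are kept, and rows at or below it go to a separate file. The log reports 1287 lines before filtering and 1110 after. The following is a minimal Python sketch of that split, not the pipeline's own script. The function name `split_by_score` and the sample rows are hypothetical, and it assumes every row carries a numeric score in the third column.

```python
from typing import Iterable, List, Tuple


def split_by_score(
    rows: Iterable[str], threshold: float = 0.5
) -> Tuple[List[str], List[str]]:
    """Split tab-separated scored rows into (kept, rejected).

    Mirrors the two awk filters in the log: a row is kept when its third
    field is strictly greater than the threshold, otherwise it is rejected.
    Assumption: the third field always parses as a float.
    """
    kept: List[str] = []
    rejected: List[str] = []
    for row in rows:
        line = row.rstrip("\n")
        fields = line.split("\t")
        score = float(fields[2])  # third column holds the classifier score
        if score > threshold:
            kept.append(line)
        else:
            rejected.append(line)
    return kept, rejected


if __name__ == "__main__":
    # Hypothetical sample rows: two text columns followed by a score.
    sample = [
        "привет\thello\t0.91\n",
        "пока\tsomething unrelated\t0.12\n",
        "спасибо\tthank you\t0.50\n",
    ]
    kept, rejected = split_by_score(sample, threshold=0.5)
    print(f"Lines before filtering: {len(sample)}")
    print(f"Lines after filtering: {len(kept)}")
```

A score exactly equal to the threshold is rejected here, because the first awk filter in the log uses a strict `>` comparison and the second uses `<=`.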