Switch to ICU tokenizer #939
FirefoxCI (pull_request)
bicleaner-ai for mtdata ELRC-web_acquired_data_related_to_scientific_resea_78c4de dataset ru-en
Task Status
Started: 2024-11-22T22:58:49.192Z
Resolved: 2024-11-22T23:03:52.876Z
Task Execution Time: 5 minutes, 3 seconds, 684 milliseconds
Task Status: completed
Reason Resolved: completed
RunId: 0
Artifacts
- public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.best.zst
- public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.en.zst
- public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.filtered.zst
- public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.ru.zst
- public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.scored.zst
- public/logs/live_backing.log
- public/logs/live.log
[taskcluster 2024-11-22T22:58:49.210Z] Worker Type (translations-1/b-linux-v100-gpu-4-300gb) settings:
[taskcluster 2024-11-22T22:58:49.210Z] {
[taskcluster 2024-11-22T22:58:49.210Z] "config": {
[taskcluster 2024-11-22T22:58:49.210Z] "deploymentId": ""
[taskcluster 2024-11-22T22:58:49.210Z] },
[taskcluster 2024-11-22T22:58:49.210Z] "generic-worker": {
[taskcluster 2024-11-22T22:58:49.210Z] "engine": "insecure",
[taskcluster 2024-11-22T22:58:49.210Z] "go-arch": "amd64",
[taskcluster 2024-11-22T22:58:49.210Z] "go-os": "linux",
[taskcluster 2024-11-22T22:58:49.210Z] "go-version": "go1.22.2",
[taskcluster 2024-11-22T22:58:49.210Z] "release": "https://github.com/taskcluster/taskcluster/releases/tag/v64.2.6",
[taskcluster 2024-11-22T22:58:49.210Z] "revision": "edab196d7d030a5d625b77335109cd9060ab7e1f",
[taskcluster 2024-11-22T22:58:49.210Z] "source": "https://github.com/taskcluster/taskcluster/commits/edab196d7d030a5d625b77335109cd9060ab7e1f",
[taskcluster 2024-11-22T22:58:49.210Z] "version": "64.2.6"
[taskcluster 2024-11-22T22:58:49.210Z] },
[taskcluster 2024-11-22T22:58:49.210Z] "image": "projects/taskcluster-imaging/global/images/gw-translations-gcp-googlecompute-2024-04-22t18-22-42z",
[taskcluster 2024-11-22T22:58:49.210Z] "instance-id": "7870697164117947128",
[taskcluster 2024-11-22T22:58:49.210Z] "instance-type": "projects/887720501152/machineTypes/custom-40-262144",
[taskcluster 2024-11-22T22:58:49.210Z] "local-ipv4": "10.138.0.33",
[taskcluster 2024-11-22T22:58:49.210Z] "project-id": "fxci-production-level1-workers",
...(332 lines hidden)...
[task 2024-11-22T23:02:14.725Z] Downloading typing_extensions-4.11.0-py3-none-any.whl (34 kB)
[task 2024-11-22T23:02:14.804Z] Collecting urllib3==2.2.1
[task 2024-11-22T23:02:14.814Z] Downloading urllib3-2.2.1-py3-none-any.whl (121 kB)
[task 2024-11-22T23:02:14.829Z] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.1/121.1 KB 8.3 MB/s eta 0:00:00
[task 2024-11-22T23:02:14.950Z] Collecting werkzeug==3.0.3
[task 2024-11-22T23:02:14.960Z] Downloading werkzeug-3.0.3-py3-none-any.whl (227 kB)
[task 2024-11-22T23:02:14.973Z] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 227.3/227.3 KB 21.3 MB/s eta 0:00:00
[task 2024-11-22T23:02:15.032Z] Collecting wheel==0.43.0
[task 2024-11-22T23:02:15.042Z] Downloading wheel-0.43.0-py3-none-any.whl (65 kB)
[task 2024-11-22T23:02:15.049Z] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.8/65.8 KB 11.2 MB/s eta 0:00:00
[task 2024-11-22T23:02:15.289Z] Collecting wrapt==1.14.1
[task 2024-11-22T23:02:15.299Z] Downloading wrapt-1.14.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (77 kB)
[task 2024-11-22T23:02:15.319Z] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.9/77.9 KB 3.7 MB/s eta 0:00:00
[task 2024-11-22T23:02:15.321Z] Requirement already satisfied: zstandard==0.22.0 in /usr/local/lib/python3.10/dist-packages (from -r ./checkouts/vcs/pipeline/bicleaner/requirements/bicleaner-ai.txt (line 235)) (0.22.0)
[task 2024-11-22T23:02:15.412Z] Requirement already satisfied: setuptools>=0.7.0 in /usr/lib/python3/dist-packages (from fasttext-wheel==0.9.2->-r ./checkouts/vcs/pipeline/bicleaner/requirements/bicleaner-ai.txt (line 39)) (59.6.0)
[task 2024-11-22T23:02:16.935Z] Building wheels for collected packages: sacremoses, toolwrapper
[task 2024-11-22T23:02:16.936Z] Building wheel for sacremoses (setup.py): started
[task 2024-11-22T23:02:17.476Z] Building wheel for sacremoses (setup.py): finished with status 'done'
[task 2024-11-22T23:02:17.479Z] Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895260 sha256=86f9da97e377dcd5add1dcba3a9cf52d7cf6bcef8e2a183ec12e51e074aae715
[task 2024-11-22T23:02:17.479Z] Stored in directory: /home/ubuntu/.cache/pip/wheels/00/24/97/a2ea5324f36bc626e1ea0267f33db6aa80d157ee977e9e42fb
[task 2024-11-22T23:02:17.481Z] Building wheel for toolwrapper (setup.py): started
[task 2024-11-22T23:02:17.707Z] Building wheel for toolwrapper (setup.py): finished with status 'done'
[task 2024-11-22T23:02:17.707Z] Created wheel for toolwrapper: filename=toolwrapper-2.1.0-py3-none-any.whl size=3353 sha256=c3234d831205caa4e4f24fad3ca26559901c2c0287f1d3d1558f1294734670bc
[task 2024-11-22T23:02:17.707Z] Stored in directory: /home/ubuntu/.cache/pip/wheels/e1/af/b1/99b57a06dda78fdcee86d2e22c64743f3b8df8f31cfc04baf7
[task 2024-11-22T23:02:17.709Z] Successfully built sacremoses toolwrapper
[task 2024-11-22T23:02:18.861Z] Installing collected packages: toolwrapper, sentencepiece, libclang, fuzzywuzzy, flatbuffers, wrapt, wheel, urllib3, typing-extensions, tqdm, tomli, threadpoolctl, termcolor, tensorflow-io-gcs-filesystem, tensorflow-estimator, tensorboard-data-server, safetensors, regex, rapidfuzz, pyyaml, pybind11, pyasn1, psutil, protobuf, pluggy, packaging, oauthlib, numpy, markupsafe, markdown, keras, joblib, iniconfig, idna, grpcio, google-pasta, gast, fsspec, filelock, fastspell-dictionaries, exceptiongroup, click, charset-normalizer, certifi, cachetools, absl-py, werkzeug, scipy, sacremoses, rsa, requests, pytest, pyasn1-modules, opt-einsum, ml-dtypes, levenshtein, h5py, fasttext-wheel, astunparse, scikit-learn, requests-oauthlib, python-levenshtein, huggingface-hub, google-auth, bicleaner-ai-glove, tokenizers, google-auth-oauthlib, fastspell, transformers, tensorboard, bicleaner-hardrules, tensorflow, bicleaner-ai
[task 2024-11-22T23:02:19.500Z] WARNING: The script wheel is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:19.500Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:19.652Z] WARNING: The script tqdm is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:19.652Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:20.620Z] WARNING: The script pybind11-config is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:20.620Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:23.108Z] WARNING: The script f2py is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:23.108Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:23.201Z] WARNING: The script markdown_py is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:23.201Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:26.690Z] WARNING: The script normalizer is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:26.690Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:30.141Z] WARNING: The script sacremoses is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:30.141Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:30.184Z] WARNING: The scripts pyrsa-decrypt, pyrsa-encrypt, pyrsa-keygen, pyrsa-priv2pub, pyrsa-sign and pyrsa-verify are installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:30.184Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:30.417Z] WARNING: The scripts py.test and pytest are installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:30.417Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:32.999Z] WARNING: The script huggingface-cli is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:32.999Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:33.262Z] WARNING: The script google-oauthlib-tool is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:33.262Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:33.299Z] WARNING: The scripts fastspell and fastspell-download are installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:33.299Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:37.522Z] WARNING: The script transformers-cli is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:37.522Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:38.628Z] WARNING: The script tensorboard is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:38.628Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:39.500Z] WARNING: The script bicleaner-hardrules is installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:39.500Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:53.591Z] WARNING: The scripts estimator_ckpt_converter, import_pb_to_tensorboard, saved_model_cli, tensorboard, tf_upgrade_v2, tflite_convert, toco and toco_from_protos are installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:53.591Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:53.736Z] WARNING: The scripts bicleaner-ai-classify, bicleaner-ai-download, bicleaner-ai-download-hf, bicleaner-ai-generate-train and bicleaner-ai-train are installed in '/home/ubuntu/.local/bin' which is not on PATH.
[task 2024-11-22T23:02:53.736Z] Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[task 2024-11-22T23:02:53.771Z] Successfully installed absl-py-2.1.0 astunparse-1.6.3 bicleaner-ai-3.0.1 bicleaner-ai-glove-0.2.1 bicleaner-hardrules-2.10.4 cachetools-5.3.3 certifi-2024.2.2 charset-normalizer-3.3.2 click-8.1.7 exceptiongroup-1.2.1 fastspell-0.11 fastspell-dictionaries-3.2 fasttext-wheel-0.9.2 filelock-3.14.0 flatbuffers-24.3.25 fsspec-2024.5.0 fuzzywuzzy-0.18.0 gast-0.5.4 google-auth-2.29.0 google-auth-oauthlib-1.2.0 google-pasta-0.2.0 grpcio-1.63.0 h5py-3.11.0 huggingface-hub-0.22.2 idna-3.7 iniconfig-2.0.0 joblib-1.4.2 keras-2.15.0 levenshtein-0.25.1 libclang-18.1.1 markdown-3.6 markupsafe-2.1.5 ml-dtypes-0.3.2 numpy-1.26.4 oauthlib-3.2.2 opt-einsum-3.3.0 packaging-24.0 pluggy-1.5.0 protobuf-3.20.3 psutil-5.9.8 pyasn1-0.6.0 pyasn1-modules-0.4.0 pybind11-2.12.0 pytest-8.2.0 python-levenshtein-0.25.1 pyyaml-6.0.1 rapidfuzz-3.9.0 regex-2024.5.15 requests-2.31.0 requests-oauthlib-2.0.0 rsa-4.9 sacremoses-0.0.53 safetensors-0.4.3 scikit-learn-1.4.2 scipy-1.13.0 sentencepiece-0.2.0 tensorboard-2.15.2 tensorboard-data-server-0.7.2 tensorflow-2.15.1 tensorflow-estimator-2.15.0 tensorflow-io-gcs-filesystem-0.37.0 termcolor-2.4.0 threadpoolctl-3.5.0 tokenizers-0.15.2 tomli-2.0.1 toolwrapper-2.1.0 tqdm-4.66.4 transformers-4.36.1 typing-extensions-4.11.0 urllib3-2.2.1 werkzeug-3.0.3 wheel-0.43.0 wrapt-1.14.1
[task 2024-11-22T23:02:54.289Z] + set -euo pipefail
[task 2024-11-22T23:02:54.289Z] + echo '###### Bicleaner filtering'
[task 2024-11-22T23:02:54.289Z] ###### Bicleaner filtering
[task 2024-11-22T23:02:54.289Z] + test -v SRC
[task 2024-11-22T23:02:54.289Z] + test -v TRG
[task 2024-11-22T23:02:54.289Z] + test -v CUDA_DIR
[task 2024-11-22T23:02:54.289Z] + test -v CUDNN_DIR
[task 2024-11-22T23:02:54.289Z] + export LD_LIBRARY_PATH=fetches/cuda-toolkit/lib64:fetches/cuda-toolkit:
[task 2024-11-22T23:02:54.289Z] + LD_LIBRARY_PATH=fetches/cuda-toolkit/lib64:fetches/cuda-toolkit:
[task 2024-11-22T23:02:54.289Z] + corpus_prefix=/home/ubuntu/tasks/task_173231632907177/fetches/ELRC-web_acquired_data_related_to_scientific_resea_78c4de
[task 2024-11-22T23:02:54.289Z] + output_prefix=/home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de
[task 2024-11-22T23:02:54.289Z] + bicleaner_threshold=0.5
[task 2024-11-22T23:02:54.289Z] + threads=auto
[task 2024-11-22T23:02:54.289Z] + pack_dir=/home/ubuntu/tasks/task_173231632907177/fetches/bicleaner-ai-ru-en
[task 2024-11-22T23:02:54.289Z] + '[' auto = auto ']'
[task 2024-11-22T23:02:54.290Z] ++ nproc
[task 2024-11-22T23:02:54.291Z] + threads=40
[task 2024-11-22T23:02:54.292Z] ++ dirname /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de
[task 2024-11-22T23:02:54.293Z] + output_dir=/home/ubuntu/tasks/task_173231632907177/artifacts
[task 2024-11-22T23:02:54.293Z] + mkdir -p /home/ubuntu/tasks/task_173231632907177/artifacts
[task 2024-11-22T23:02:54.296Z] + '[' 0.5 == 0 ']'
[task 2024-11-22T23:02:54.296Z] + '[' 0.5 == 0.0 ']'
[task 2024-11-22T23:02:54.296Z] + export scol=1
[task 2024-11-22T23:02:54.296Z] + scol=1
[task 2024-11-22T23:02:54.296Z] + export tcol=2
[task 2024-11-22T23:02:54.296Z] + tcol=2
[task 2024-11-22T23:02:54.297Z] ++ grep source_lang /home/ubuntu/tasks/task_173231632907177/fetches/bicleaner-ai-ru-en/metadata.yaml
[task 2024-11-22T23:02:54.297Z] ++ awk '{print $2}'
[task 2024-11-22T23:02:54.301Z] + model_source_lang=en
[task 2024-11-22T23:02:54.302Z] ++ grep target_lang /home/ubuntu/tasks/task_173231632907177/fetches/bicleaner-ai-ru-en/metadata.yaml
[task 2024-11-22T23:02:54.302Z] ++ awk '{print $2}'
[task 2024-11-22T23:02:54.306Z] + model_target_lang=xx
[task 2024-11-22T23:02:54.306Z] + '[' en == en ']'
[task 2024-11-22T23:02:54.306Z] + export scol=2
[task 2024-11-22T23:02:54.306Z] + scol=2
[task 2024-11-22T23:02:54.306Z] + export tcol=1
[task 2024-11-22T23:02:54.306Z] + tcol=1
[task 2024-11-22T23:02:54.306Z] + '[' -z '' ']'
[task 2024-11-22T23:02:54.306Z] ++ nvidia-smi --query-gpu=index --format=csv,noheader
[task 2024-11-22T23:02:54.506Z] + export 'CUDA_VISIBLE_DEVICES=0
[task 2024-11-22T23:02:54.506Z] 1
[task 2024-11-22T23:02:54.506Z] 2
[task 2024-11-22T23:02:54.506Z] 3'
[task 2024-11-22T23:02:54.506Z] + CUDA_VISIBLE_DEVICES='0
[task 2024-11-22T23:02:54.506Z] 1
[task 2024-11-22T23:02:54.506Z] 2
[task 2024-11-22T23:02:54.506Z] 3'
[task 2024-11-22T23:02:54.506Z] + echo '### Classifying'
[task 2024-11-22T23:02:54.506Z] ### Classifying
[task 2024-11-22T23:02:54.506Z] + '[' 7 -gt 1 ']'
[task 2024-11-22T23:02:54.506Z] + CUDA_VISIBLE_ARRAY=('0' '1' '2' '3')
[task 2024-11-22T23:02:54.506Z] + export CUDA_VISIBLE_ARRAY
[task 2024-11-22T23:02:54.506Z] + export TF_CPP_MIN_LOG_LEVEL=0
[task 2024-11-22T23:02:54.506Z] + TF_CPP_MIN_LOG_LEVEL=0
[task 2024-11-22T23:02:54.506Z] + export -f biclean
[task 2024-11-22T23:02:54.507Z] + parallel -j 4 --pipe -k --block 10M biclean /home/ubuntu/tasks/task_173231632907177/fetches/bicleaner-ai-ru-en/metadata.yaml '{%}'
[task 2024-11-22T23:02:54.507Z] + zstdmt
[task 2024-11-22T23:02:54.507Z] ++ zstdmt -dc /home/ubuntu/tasks/task_173231632907177/fetches/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.ru.zst
[task 2024-11-22T23:02:54.508Z] + paste /dev/fd/63 /dev/fd/62
[task 2024-11-22T23:02:54.508Z] ++ zstdmt -dc /home/ubuntu/tasks/task_173231632907177/fetches/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.en.zst
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:02:56.104103: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:02:56.104184: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:02:56.105703: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:02:56.113512: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
[task 2024-11-22T23:03:51.018Z] To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:02:57.032599: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:02:58.543578: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:03:08.432163: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:03:08.432575: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:03:13.713454: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
[task 2024-11-22T23:03:51.018Z] 2024-11-22 23:03:13.713504: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:13.715129: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:13.723334: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
[task 2024-11-22T23:03:51.019Z] To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:14.569484: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:16.155543: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:16.222851: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:16.223277: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:17.683716: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:17.684190: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:17.684523: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:18.656239: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:18.656685: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:18.657218: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:18.657513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14784 MB memory: -> device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:04.0, compute capability: 7.0
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:19.673275: I external/local_tsl/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:30,261 - WARNING - LM filter not present in metadata, disabling.
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:30,262 - WARNING - Porn removal not present in metadata, disabling
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:30,262 - WARNING - Using multilingual model, disabling language-dependant rules: not_too_short, length_ratio, no_only_numbers, no_repeated_words, no_wrong_language
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:30,262 - INFO - Arguments processed
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:30,262 - INFO - Starting process
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:49,906 - INFO - Finished
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:49,906 - INFO - Total: 1287 rows
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:49,906 - INFO - Elapsed time 19.64 s
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:49,906 - INFO - Troughput: 65 rows/s
[task 2024-11-22T23:03:51.019Z] 2024-11-22 23:03:49,906 - INFO - Program finished
[task 2024-11-22T23:03:51.028Z] + echo '### Filtering'
[task 2024-11-22T23:03:51.028Z] ### Filtering
[task 2024-11-22T23:03:51.029Z] + zstdmt -dc /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.scored.zst
[task 2024-11-22T23:03:51.029Z] + awk -v threshold=0.5 '-F\t' '{if ($3>threshold) {print $0}}'
[task 2024-11-22T23:03:51.029Z] + zstdmt
[task 2024-11-22T23:03:51.047Z] + zstdmt -dc /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.scored.zst
[task 2024-11-22T23:03:51.047Z] + awk -v threshold=0.5 '-F\t' '{if ($3<=threshold) {print $0}}'
[task 2024-11-22T23:03:51.047Z] + zstdmt
[task 2024-11-22T23:03:51.063Z] ++ zstdmt -dc /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.scored.zst
[task 2024-11-22T23:03:51.063Z] ++ wc -l
[task 2024-11-22T23:03:51.084Z] + echo 'Lines before filtering: 1287'
[task 2024-11-22T23:03:51.084Z] Lines before filtering: 1287
[task 2024-11-22T23:03:51.085Z] ++ zstdmt -dc /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.best.zst
[task 2024-11-22T23:03:51.085Z] ++ wc -l
[task 2024-11-22T23:03:51.090Z] + echo 'Lines after filtering: 1110'
[task 2024-11-22T23:03:51.090Z] Lines after filtering: 1110
[task 2024-11-22T23:03:51.090Z] + echo '### Writing output corpus'
[task 2024-11-22T23:03:51.090Z] ### Writing output corpus
[task 2024-11-22T23:03:51.090Z] + zstdmt -dc /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.best.zst
[task 2024-11-22T23:03:51.090Z] + tee /dev/fd/63
[task 2024-11-22T23:03:51.090Z] + cut -f2
[task 2024-11-22T23:03:51.091Z] + zstdmt
[task 2024-11-22T23:03:51.091Z] ++ cut -f1
[task 2024-11-22T23:03:51.091Z] ++ zstdmt
[task 2024-11-22T23:03:51.102Z] + echo '###### Done: Bicleaner filtering'
[task 2024-11-22T23:03:51.102Z] ###### Done: Bicleaner filtering
[fetches 2024-11-22T23:03:51.102Z] removing /home/ubuntu/tasks/task_173231632907177/fetches
[fetches 2024-11-22T23:03:52.356Z] finished
[taskcluster 2024-11-22T23:03:52.366Z] Exit Code: 0
[taskcluster 2024-11-22T23:03:52.366Z] User Time: 2m3.854372s
[taskcluster 2024-11-22T23:03:52.366Z] Kernel Time: 43.277244s
[taskcluster 2024-11-22T23:03:52.366Z] Wall Time: 5m2.155588047s
[taskcluster 2024-11-22T23:03:52.366Z] Result: SUCCEEDED
[taskcluster 2024-11-22T23:03:52.366Z] === Task Finished ===
[taskcluster 2024-11-22T23:03:52.366Z] Task Duration: 5m2.157569416s
[taskcluster 2024-11-22T23:03:52.411Z] Uploading artifact public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.ru.zst from file /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.ru.zst with content encoding "identity", mime type "application/zstd" and expiry 2025-11-17T22:50:29.816Z
[taskcluster 2024-11-22T23:03:52.422Z] Uploading artifact public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.en.zst from file /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.en.zst with content encoding "identity", mime type "application/zstd" and expiry 2025-11-17T22:50:29.816Z
[taskcluster 2024-11-22T23:03:52.425Z] Uploading artifact public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.scored.zst from file /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.scored.zst with content encoding "identity", mime type "application/zstd" and expiry 2025-11-17T22:50:29.816Z
[taskcluster 2024-11-22T23:03:52.461Z] Uploading artifact public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.best.zst from file /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.best.zst with content encoding "identity", mime type "application/zstd" and expiry 2025-11-17T22:50:29.816Z
[taskcluster 2024-11-22T23:03:52.473Z] Uploading artifact public/build/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.filtered.zst from file /home/ubuntu/tasks/task_173231632907177/artifacts/ELRC-web_acquired_data_related_to_scientific_resea_78c4de.filtered.zst with content encoding "identity", mime type "application/zstd" and expiry 2025-11-17T22:50:29.816Z
[taskcluster 2024-11-22T23:03:52.622Z] [mounts] Preserving cache: Moving "/home/ubuntu/tasks/task_173231632907177/checkouts" to "/home/ubuntu/caches/OrAmAfn0Snq8SXkm6xdiRg"
[taskcluster 2024-11-22T23:03:52.682Z] Uploading link artifact public/logs/live.log to artifact public/logs/live_backing.log with expiry 2025-11-17T22:50:29.816Z
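
Note on the filtering step: the two awk commands traced above split the scored file on its third tab-separated column, sending rows with a score strictly greater than the threshold (0.5 here) to the .best output and rows at or below it to the .filtered output. The log reports 1287 lines before filtering and 1110 after, which implies 177 rows at or below the threshold. The following is a minimal Python sketch of that split, not the pipeline's own code. The function name split_by_score and the sample rows are hypothetical, and it assumes every row has a third column that parses as a float (awk is more lenient with non-numeric fields).

```python
from typing import Iterable, List, Tuple


def split_by_score(rows: Iterable[str], threshold: float) -> Tuple[List[str], List[str]]:
    """Split tab-separated rows on the score held in the third column.

    Mirrors the two awk filters in the log: `$3 > threshold` keeps a row,
    `$3 <= threshold` drops it, so a score equal to the threshold is dropped.
    """
    kept: List[str] = []
    dropped: List[str] = []
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        # Assumption: the third field is always present and numeric.
        score = float(fields[2])
        if score > threshold:
            kept.append(row)
        else:
            dropped.append(row)
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical rows in the same three-column layout as the scored file.
    sample = [
        "first sentence\tfirst translation\t0.91\n",
        "second sentence\tsecond translation\t0.50\n",
        "third sentence\tthird translation\t0.12\n",
    ]
    kept, dropped = split_by_score(sample, threshold=0.5)
    print(f"kept={len(kept)} dropped={len(dropped)}")
```

Because the comparison is strict, a pair scored exactly 0.5 lands in the dropped set, matching the `<=` branch of the second awk command.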