Description
When running the filter_parallel_alignment script, if the score file specified on the command line already exists, the script fails with a TypeError while computing the quantile threshold (command and full traceback below).
Specifying a different score file, or deleting the previous score file before each run, avoids the problem, but this is a cumbersome workaround.
$ poetry run python -m silnlp.common.filter_parallel_alignment M:/MT/experiments/OpCap.Suluh/Levuka/Data/8000/id-XriLevuka.8000.all.txt M:/MT/experiments/OpCap.Suluh/Levuka/Data/8000/lvu-XriLevuka.8000.all.txt M:/MT/experiments/OpCap.Suluh/Levuka/Data/8000/id-XriLevuka.8000.all.scores.txt --quantile 0.40 --aligner hmm
2025-04-03 04:39:20,690 - silnlp.common.environment - INFO - Using workspace: M:/ as per environment variable SIL_NLP_DATA_PATH.
2025-04-03 04:39:21,678 - silnlp.common.utils - INFO - Git commit: 32a8754d11
Git commit: 32a8754d11
Loading corpus ...
Loading alignment scores ...
Filtering corpus (lowest 40.0% of alignment scores)
Traceback (most recent call last):
File "C:\Users\Michael Martin\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Michael Martin\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\Michael Martin\PycharmProjects\silnlp\silnlp\common\filter_parallel_alignment.py", line 65, in <module>
main()
File "C:\Users\Michael Martin\PycharmProjects\silnlp\silnlp\common\filter_parallel_alignment.py", line 45, in main
score_threshold = corpus["score"].quantile(args.quantile)
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\pandas\core\series.py", line 2887, in quantile
result = df.quantile(q=q, interpolation=interpolation, numeric_only=False)
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\pandas\core\frame.py", line 12146, in quantile
res_df = self.quantile(
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\pandas\core\frame.py", line 12191, in quantile
res = data._mgr.quantile(qs=q, interpolation=interpolation)
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\pandas\core\internals\managers.py", line 1548, in quantile
blocks = [
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\pandas\core\internals\managers.py", line 1549, in <listcomp>
blk.quantile(qs=qs, interpolation=interpolation) for blk in self.blocks
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\pandas\core\internals\blocks.py", line 1891, in quantile
result = quantile_compat(self.values, np.asarray(qs._values), interpolation)
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\pandas\core\array_algos\quantile.py", line 39, in quantile_compat
return quantile_with_mask(values, mask, fill_value, qs, interpolation)
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\pandas\core\array_algos\quantile.py", line 97, in quantile_with_mask
result = _nanpercentile(
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\pandas\core\array_algos\quantile.py", line 218, in _nanpercentile
return np.percentile(
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\numpy\lib\function_base.py", line 4283, in percentile
return _quantile_unchecked(
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\numpy\lib\function_base.py", line 4555, in _quantile_unchecked
return _ureduce(a,
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\numpy\lib\function_base.py", line 3823, in _ureduce
r = func(a, **kwargs)
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\numpy\lib\function_base.py", line 4722, in _quantile_ureduce_func
result = _quantile(arr,
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\numpy\lib\function_base.py", line 4841, in _quantile
result = _lerp(previous,
File "C:\Users\Michael Martin\AppData\Local\pypoetry\Cache\virtualenvs\silnlp-8WA6DU8h-py3.10\lib\site-packages\numpy\lib\function_base.py", line 4655, in _lerp
diff_b_a = subtract(b, a)
TypeError: unsupported operand type(s) for -: 'str' and 'str'
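
The final frame shows NumPy trying to subtract two `str` values, so the `score` column holds strings rather than floats at the point where `quantile` is called. One possible mitigation, sketched below, is to convert the column to a numeric dtype before computing the threshold. This is a minimal sketch and not the script's actual code: the helper name `score_threshold` and the sample values are hypothetical, and it assumes the strings come from scores read back from the existing file as text.

```python
import pandas as pd


def score_threshold(scores: pd.Series, quantile: float) -> float:
    """Return the quantile threshold for a score column that may hold strings.

    Hypothetical helper: it illustrates the conversion step only and is not
    taken from filter_parallel_alignment.py.
    """
    # Convert text such as "0.25" to floats. Entries that cannot be parsed
    # become NaN, which Series.quantile ignores.
    numeric = pd.to_numeric(scores, errors="coerce")
    return float(numeric.quantile(quantile))


# Made-up scores, stored as strings the way a text file might be read back.
loaded_scores = pd.Series(["0.10", "0.20", "0.30", "0.40", "0.50"])
threshold = score_threshold(loaded_scores, 0.40)
```

If the conversion removes the error, the remaining question is why the scores are loaded as text only when the file already exists, which this sketch does not answer.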