Describe the issue
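Writing a DataFrame with pyreadstat.write_sav and reading the file back with pyreadstat.read_sav fails with ReadstatError: Unable to convert string to the requested encoding (invalid byte sequence). The input that triggers it is described below.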
To Reproduce
In one cell you need 755 ASCII letters followed by a non-ASCII character. You also need a column whose name has 5 letters and ends in a 2, and another column whose name starts with an ASCII letter and ends in a 0.
Here's an example to generate them:
from __future__ import annotations

import pathlib
import os
import io
import tempfile

import pandas as pd
import pyreadstat

"""
numpy==1.20.2
pandas==1.2.4
pyreadstat==1.1.0
python-dateutil==2.8.1
pytz==2021.1
six==1.15.0
"""


def main():
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = pathlib.Path(tmp)
        dst_path = os.fsdecode(tmp_path / "eg.sav")
        df = pd.read_csv(io.StringIO('aaaaa2,y,a0\n\n"' + ("a" * 755) + 'ü"'))

        pyreadstat.write_sav(
            dst_path=tmp_path / "eg.sav",
            df=df,
            column_labels=["x", "y", "z"],
        )

        pyreadstat.read_sav(dst_path)


if __name__ == "__main__":
    main()
This results in:
Traceback (most recent call last):
  File "foo.py", line 37, in <module>
    main()
  File "foo.py", line 33, in main
    pyreadstat.read_sav(dst_path)
  File "pyreadstat/pyreadstat.pyx", line 342, in pyreadstat.pyreadstat.read_sav
  File "pyreadstat/_readstat_parser.pyx", line 1034, in pyreadstat._readstat_parser.run_conversion
  File "pyreadstat/_readstat_parser.pyx", line 845, in pyreadstat._readstat_parser.run_readstat_parser
  File "pyreadstat/_readstat_parser.pyx", line 775, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to convert string to the requested encoding (invalid byte sequence)
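One possible reading of the "invalid byte sequence" message, which nothing in this report confirms, is that the long string ends up stored in fixed-size byte segments and the two-byte "ü" lands across a segment boundary. The sketch below uses only the standard library and does not call pyreadstat or ReadStat. The 252-byte segment size and the helper names split_into_segments and undecodable_segments are assumptions made for illustration. It shows that the 755-letter value from the script above would be cut in the middle of "ü" under that assumption, while a value one letter shorter would not.

```python
from typing import List

# Hypothetical illustration only; this is not pyreadstat or ReadStat code.
# Assumption: long string data is stored in fixed-size byte segments.
SEGMENT_BYTES = 252  # assumed segment size, chosen for illustration


def split_into_segments(data: bytes, size: int = SEGMENT_BYTES) -> List[bytes]:
    """Cut a byte string into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def undecodable_segments(text: str, size: int = SEGMENT_BYTES) -> List[int]:
    """Return the indexes of segments that are not valid UTF-8 on their own."""
    bad = []
    segments = split_into_segments(text.encode("utf-8"), size)
    for index, segment in enumerate(segments):
        try:
            segment.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(index)
    return bad


cell = "a" * 755 + "ü"          # the cell value from the reproduction script
encoded = cell.encode("utf-8")  # "ü" occupies two bytes in UTF-8

print(len(cell), len(encoded))                # 756 757
print(undecodable_segments(cell))             # [2, 3]
print(undecodable_segments("a" * 754 + "ü"))  # []
```

With 755 leading letters, the first byte of "ü" is the last byte of the third segment (3 × 252 = 756) and the second byte starts a fourth segment, so neither piece decodes by itself. If the reader decodes segment by segment, that would match the error above, but this is a guess about the cause rather than a diagnosis.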
Expected behavior
I'd expect to be able to round-trip it: read_sav should read back the file that write_sav just wrote, without raising an error.
Setup Information:
How did you install pyreadstat? pip, see pip freeze output above
Platform: Ubuntu 20.04.2 LTS
Python Version: Python 3.8.5 (default, Jan 27 2021, 15:41:15)
Using Virtualenv or condaenv? python3.8 -m venv