Unable to round trip some files with some specific column and cell values #128

Closed
@graingert

Describe the issue
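Writing a DataFrame with certain column names and cell values using pyreadstat.write_sav and then reading the file back with pyreadstat.read_sav fails with a ReadstatError.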

To Reproduce

Three conditions are needed: one cell must contain 755 ASCII letters followed by a non-ASCII character, one column name must be five letters followed by a 2 (for example aaaaa2), and another column name must start with an ASCII letter and end in a 0 (for example a0).

Here's an example to generate them:

from __future__ import annotations

import pathlib
import os
import io
import tempfile

import pandas as pd
import pyreadstat


"""
numpy==1.20.2
pandas==1.2.4
pyreadstat==1.1.0
python-dateutil==2.8.1
pytz==2021.1
six==1.15.0
"""


def main():
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = pathlib.Path(tmp)
        dst_path = os.fsdecode(tmp_path / "eg.sav")

        df = pd.read_csv(io.StringIO('aaaaa2,y,a0\n\n"' + ("a" * 755) + 'ü"'))
        pyreadstat.write_sav(
            dst_path=tmp_path / "eg.sav",
            df=df,
            column_labels=["x", "y", "z"],
        )
        pyreadstat.read_sav(dst_path)


if __name__ == "__main__":
    main()

Running this results in:

Traceback (most recent call last):
  File "foo.py", line 37, in <module>
    main()
  File "foo.py", line 33, in main
    pyreadstat.read_sav(dst_path)
  File "pyreadstat/pyreadstat.pyx", line 342, in pyreadstat.pyreadstat.read_sav
  File "pyreadstat/_readstat_parser.pyx", line 1034, in pyreadstat._readstat_parser.run_conversion
  File "pyreadstat/_readstat_parser.pyx", line 845, in pyreadstat._readstat_parser.run_readstat_parser
  File "pyreadstat/_readstat_parser.pyx", line 775, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to convert string to the requested encoding (invalid byte sequence)
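The "invalid byte sequence" message hints that a multi-byte character may be getting cut in half somewhere. The sketch below is only a way to explore that idea: it encodes the same cell value as UTF-8, splits it into fixed-size chunks, and reports which chunks no longer decode on their own. The 252-byte SEGMENT_SIZE and the helper names split_segments and undecodable_segments are assumptions made for illustration; nothing in this report confirms how pyreadstat actually stores long strings.

```python
from typing import List

# Same cell value as in the reproduction: 755 ASCII letters plus one non-ASCII character.
value = "a" * 755 + "ü"
encoded = value.encode("utf-8")  # "ü" takes two bytes in UTF-8

# Hypothetical segment size, assumed for illustration only (not taken from pyreadstat).
SEGMENT_SIZE = 252


def split_segments(data: bytes, size: int) -> List[bytes]:
    """Cut data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def undecodable_segments(data: bytes, size: int) -> List[int]:
    """Return the indexes of the chunks that are not valid UTF-8 on their own."""
    bad = []
    for index, segment in enumerate(split_segments(data, size)):
        try:
            segment.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(index)
    return bad


if __name__ == "__main__":
    print(len(encoded))
    print(undecodable_segments(encoded, SEGMENT_SIZE))
```

Under that assumed segment size, the two bytes of "ü" sit at offsets 755 and 756, on either side of the 3 * 252 = 756 boundary, so the third and fourth chunks both fail to decode. With 754 letters instead of 755 every chunk decodes cleanly, which would be consistent with the bug depending on the exact string length. This does not explain why the column names matter.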

Expected behavior
I'd expect to be able to round trip it

Setup Information:
How did you install pyreadstat? pip, see pip freeze output above
Platform: Ubuntu 20.04.2 LTS
Python Version: Python 3.8.5 (default, Jan 27 2021, 15:41:15)
Using Virtualenv or condaenv? python3.8 -m venv
