plumbum's path objects look like files but behave differently #651

Open
gnurbs opened this issue Jun 25, 2023 · 0 comments

I'm using plumbum to read a remote file that is then consumed by another library, pandas, like this:

import pandas, plumbum

remote = plumbum.SshMachine('myremote')
fd = remote.path('/tmp/test.csv')
data = pandas.read_csv(fd)

I expected this to work since 'read' in dir(fd) is true. However, it fails in a confusing way (the traceback is at the bottom, though I don't think it's necessary) that made me think the problem was on pandas' side. I now think it isn't, judging by the help for fd.read:

fd.read?
Signature: dataFd.read(encoding=None)
Docstring:
returns the contents of this file as a ``str``. By default the data is read
as text, but you can specify the encoding, e.g., ``'latin1'`` or ``'utf8'``
File:      /usr/lib/python3/dist-packages/plumbum/path/remote.py

This differs from the usual file-like objects: read takes the encoding as its argument rather than the maximum number of bytes to read.
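A possible workaround, sketched under assumptions: read the whole file through the path's own read() and wrap the result in io.StringIO, which is a real file-like object. FakePath and as_text_buffer are hypothetical names used only for illustration. FakePath stands in for a plumbum path so the sketch runs without a remote machine. The encoding is passed explicitly on the assumption that read() then returns a str, as the docstring above suggests. If read() returns bytes instead, io.BytesIO would be the analogous wrapper.

```python
import csv
import io


class FakePath:
    """Hypothetical stand-in for a plumbum path: read() takes an encoding, not a size."""

    def __init__(self, text):
        self._text = text

    def read(self, encoding=None):
        # Mirrors the signature shown in the help output; returns the whole file at once.
        return self._text


def as_text_buffer(path, encoding="utf8"):
    """Read the whole file via path.read() and wrap it in a standard text buffer."""
    return io.StringIO(path.read(encoding))


fd = FakePath("a,b\n1,2\n3,4\n")
buf = as_text_buffer(fd)

# With pandas installed, pandas.read_csv(buf) should accept this buffer.
# The stdlib csv module is used here so the sketch runs without extra dependencies.
rows = list(csv.reader(buf))
print(rows)
```

This loads the entire file into memory, so it is probably only suitable for files of moderate size.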

I understand plumbum doesn't necessarily try to be pythonic, but it creates confusion when well-established conventions such as file-like objects don't work as one would expect. This needs a warning, or maybe an API change.

Traceback of my actual code
File "plot.py", line 31, in <module>
  data = pandas.read_csv(dataFd)
             ^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/pandas/util/_decorators.py", line 211, in wrapper
  return func(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/pandas/util/_decorators.py", line 331, in wrapper
  return func(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/pandas/io/parsers/readers.py", line 950, in read_csv
  return _read(filepath_or_buffer, kwds)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/pandas/io/parsers/readers.py", line 605, in _read
  parser = TextFileReader(filepath_or_buffer, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/pandas/io/parsers/readers.py", line 1442, in __init__
  self._engine = self._make_engine(f, self.engine)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/pandas/io/parsers/readers.py", line 1735, in _make_engine
  self.handles = get_handle(
                 ^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/pandas/io/common.py", line 856, in get_handle
  handle = open(
           ^^^^^
FileNotFoundError: [Errno 2] No such file or directory: <RemotePath /EnvironSensors.log>

This led me to think that pandas uses repr(filepath_or_buffer) (that's fd in my example), but I think it only takes that code path after plumbum's fd.read behaves differently than expected.

It's possible this particular problem only appears with remote paths or with files above a certain size, but I'm pretty sure the root cause is that plumbum doesn't behave in the standard way. For plumbum in particular I think that's fine, as it tries to do something that doesn't particularly look pythonic, but it should be easier to understand what fails when one reasonably assumes it works the standard way.
