Reading hyper crashes pantab+python (with pantab 5.2.0) #392

Open
jwhitaker-gridcog opened this issue Nov 15, 2024 · 7 comments

@jwhitaker-gridcog

jwhitaker-gridcog commented Nov 15, 2024

Describe the bug
Reading the attached hyper file makes pantab+python crash with a SIGABRT.

The logs from python include:

terminate called after throwing an instance of 'hyperapi::HyperException'
  what():  The Hyper server closed the connection unexpectedly.
Hint: The server process may have been shut down or terminated before or while processing the request.
Context: 0xfa6b0e2f
Fatal Python error: Aborted

This hyper file was produced by the same version of pantab, by passing in a pyarrow Table constructed from a pandas DataFrame.
Tableau itself can open the hyper file fine, as can tableauhyperapi==0.0.20027.

To Reproduce
Steps to reproduce the behavior:

import pantab
assert pantab.__version__ == '5.2.0'
df = pantab.frame_from_hyper("wind_scenario_comparisons.hyper", table="scenario_comparisons")

Expected behavior

  1. no crash :)

  2. Even if a crash happens, as a secondary concern I wonder whether pantab could make this easier to debug. In particular, is there a message in that HyperException that isn't being surfaced to Python or stderr?
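On the secondary question, one way to keep the abort from taking down the calling interpreter while still seeing everything the native code prints is to run the read in a child process. This is a minimal sketch using only the standard library (`subprocess` and Python's `-X faulthandler` option); `run_isolated` and `describe` are hypothetical helper names, not part of pantab.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 300.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter with faulthandler enabled.

    If native code aborts (SIGABRT), only the child process dies; the parent
    keeps running and gets the child's stdout/stderr, including any
    "terminate called after throwing ..." text from the C++ runtime.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,
        timeout=timeout,
    )


def describe(proc: subprocess.CompletedProcess) -> str:
    """Summarize how the child exited; on POSIX a negative code is a signal number."""
    if proc.returncode < 0:
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exited with code {proc.returncode}"
```

For example, passing the three-line reproduction from the report as `code` should give `killed by SIGABRT` on Linux if the crash reproduces, with the full native error text available in `proc.stderr`.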

wind_scenario_comparisons.hyper.gz

System details
Linux x86-64
Python 3.12

Edit: this appears to be a new regression in 5.2.0; 5.1.0 works fine.

@jwhitaker-gridcog jwhitaker-gridcog changed the title Reading hyper crashes pantab+python Reading hyper crashes pantab+python (with pantab 5.2.0) Nov 15, 2024
@WillAyd
Collaborator

WillAyd commented Nov 15, 2024

Thanks for the report. I agree it would be nice to have that error message propagated up. Maybe nanobind has a feature for that?

As to the issue, do you have any way of providing data to reproduce it? A self-contained code sample helps a lot with debugging. Thanks!

@jwhitaker-gridcog
Author

Did you see the attached hyper?

@WillAyd
Collaborator

WillAyd commented Nov 15, 2024

My mistake - overlooked that from my phone.

This is super strange... it seems the Hyper process crashes while reading this file, but I can't see any details about why, since the error comes from Hyper rather than pantab.

The major change between 5.1.0 and 5.2.0 was to iterate over the chunks of a Hyper result set, so I'm guessing it is related to that. Interestingly, if you enable logging for the Hyper process, everything works cleanly, so I wonder whether something in the Hyper process tries to write a log entry during chunked iteration but fails to recognize that logging is disabled.
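To make the chunked-iteration pattern concrete, here is a minimal Python model of iterating a result set chunk by chunk up to an end sentinel, loosely mirroring `hyperapi::ChunkedResultIterator`. It is not pantab's reader code, and `iter_chunks` and `CountingFetcher` are hypothetical names. It shows that detecting the end costs one fetch beyond the last real chunk, so a failure in that final fetch would appear only after all rows seem to have been read.

```python
from typing import Callable, Iterator, List, Optional

Chunk = List[int]


def iter_chunks(fetch_next_chunk: Callable[[], Optional[Chunk]]) -> Iterator[Chunk]:
    """Yield chunks until the fetch function returns None (the end sentinel)."""
    while True:
        chunk = fetch_next_chunk()
        if chunk is None:  # exhausted: this extra fetch is what detects the end
            return
        yield chunk


class CountingFetcher:
    """Hypothetical stand-in for a result set that hands out pre-built chunks."""

    def __init__(self, chunks: List[Chunk]) -> None:
        self._chunks = list(chunks)
        self.calls = 0

    def __call__(self) -> Optional[Chunk]:
        self.calls += 1
        return self._chunks.pop(0) if self._chunks else None


fetch = CountingFetcher([[1, 2], [3]])
rows = [row for chunk in iter_chunks(fetch) for row in chunk]
print(rows)         # [1, 2, 3]
print(fetch.calls)  # 3: two real chunks plus the fetch that hits the end
```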

Unfortunately will have to defer to Tableau to help on this one. @vogelsgesang may know, and I'll raise this on the Tableau Slack channel as well. Thanks for the report!

@WillAyd
Collaborator

WillAyd commented Nov 15, 2024

@jwhitaker-gridcog can you try installing the package from GitHub and see whether enabling logging also lets you read the file?

pip install git+https://github.com/innobi/pantab.git@add-atomic-keyword

then

import pantab as pt

df = pt.frame_from_hyper(
    "wind_scenario_comparisons.hyper",
    table="scenario_comparisons",
    # enable Hyper process logging; the value must not be blank
    process_params={"log_config": "can_write_anything_here_except_blank_space"},
)

@jwhitaker-gridcog
Author

No worries! Sorry, I raised this at a bit of a silly time for me: I've just gone on holiday for a week. I'll be able to try it when I get back. :)

@WillAyd
Collaborator

WillAyd commented Nov 18, 2024

Posted this on Slack, but copying it here in case it's easier for anyone at Tableau who comes across this.

When debugging the crash, I see the following relevant parts of the backtrace:

#12 0x00007fff878ed0f0 in hyperapi::Result::fetchNextChunk (this=0x555556ead460) at /home/willayd/clones/pantab/build/_deps/tableauhyperapi-cxx-src/include/hyperapi/impl/Result.impl.hpp:441
#13 0x00007fff878edf4a in hyperapi::ChunkedResultIterator::operator++ (this=0x5555560903e0)
    at /home/willayd/clones/pantab/build/_deps/tableauhyperapi-cxx-src/include/hyperapi/impl/Result.impl.hpp:648
#14 0x00007fff878e7577 in operator() (__closure=0x0, stream=0x555556f4a878, out=0x7fffffffb700) at /home/willayd/clones/pantab/src/pantab/reader.cpp:568
#15 0x00007fff878e765b in _FUN () at /home/willayd/clones/pantab/src/pantab/reader.cpp:579

The failing code in Result.impl.hpp looks like:

inline void Result::fetchNextChunk() {
   if (!isOpen()) {
      currentChunk_ = Chunk();
      currentChunkIterator_ = end(currentChunk_);
      return;
   }
   hyper_rowset_chunk_t* newChunk = nullptr;
   hyper_error_t* error = hyper_rowset_get_next_chunk(rowset_, &newChunk);
   if (error) {
      throw internal::makeHyperException(error);
   }
   // ... (remainder of fetchNextChunk omitted)
}

Within pantab, we hold a reference to a hyperapi::Result and a hyperapi::ChunkedResultIterator. We increment the latter until it reaches the end sentinel hyperapi::ChunkedResultIterator{result_, hyperapi::IteratorEndTag{}}.

From what I can tell, the abort happens when the iterator is one step before the end sentinel and we increment it to reach the sentinel. The associated error value I see is:

{discriminator = 0, value = {integer = 373254, 
    string = 0x5b206 <error: Cannot access memory at address 0x5b206>, pointer = 0x5b206, uinteger = 373254}}
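On surfacing the message: the frames at reader.cpp:568 and :579 (a lambda converted to a plain function, `_FUN`, taking `stream` and `out`) together with `terminate called after throwing` suggest, as an unconfirmed hypothesis, that the HyperException escapes a stream callback that cannot propagate C++ exceptions. The Arrow C stream interface handles errors by having `get_next` return an errno-style code and exposing the text through `get_last_error`. This sketch models that convention in Python; `ChunkStream`, `read_all`, and `flaky_chunks` are hypothetical names, and it is not pantab's implementation.

```python
import errno
from typing import Iterator, List, Optional


class ChunkStream:
    """Hypothetical producer following the Arrow C stream error convention.

    get_next() returns 0 on success, leaving the next chunk in self.out
    (None at end of stream), or a nonzero errno-style code on failure, with
    the failure text retrievable from get_last_error(). Exceptions are
    caught here, at the callback boundary, instead of escaping it.
    """

    def __init__(self, chunks: Iterator[str]) -> None:
        self._chunks = chunks
        self._last_error: Optional[str] = None
        self.out: Optional[str] = None

    def get_next(self) -> int:
        try:
            self.out = next(self._chunks, None)  # None signals end of stream
            return 0
        except Exception as exc:
            self.out = None
            self._last_error = f"{type(exc).__name__}: {exc}"
            return errno.EIO

    def get_last_error(self) -> Optional[str]:
        return self._last_error


def read_all(stream: ChunkStream) -> List[str]:
    """Consumer side: drain the stream, turning an error code into a Python exception."""
    chunks: List[str] = []
    while True:
        code = stream.get_next()
        if code != 0:
            raise RuntimeError(f"stream error {code}: {stream.get_last_error()}")
        if stream.out is None:
            return chunks
        chunks.append(stream.out)


def flaky_chunks() -> Iterator[str]:
    """Stand-in producer whose final fetch fails, as in the backtrace above."""
    yield "chunk-1"
    yield "chunk-2"
    raise ConnectionError("The Hyper server closed the connection unexpectedly.")


try:
    read_all(ChunkStream(flaky_chunks()))
except RuntimeError as exc:
    print(exc)  # the server's message reaches Python instead of aborting the process
```

If the hypothesis holds, the analogous change would live in the C++ callback; the model only illustrates the contract.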

@WillAyd
Collaborator

WillAyd commented Dec 16, 2024

Hi @jwhitaker-gridcog were you ever able to try out the process with logging enabled?
