
Generated Data Shape always 0 #9

Open
iamamiramine opened this issue Jul 23, 2024 · 8 comments

Comments

@iamamiramine

I am facing an issue when generating data using Tabula.

I trained Tabula on the following datasets:

  1. Census
  2. Fake Hotel Guests
  3. Adult
  4. Health
  5. News

However, when generating, the generation loop gets stuck because the generated data shape is always 0 (num_samples is always greater than gen_data.shape[0]).

I tried re-training and changing the max_length parameter in the sampling function, but neither helped.

Can you please help me figure out how to fix this issue?
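For context, the sampling loop behaves roughly like the sketch below (my own simplification of what I understand the generation to do, not the library's actual code): rows are generated and decoded, invalid ones are dropped, and if every row in a batch is dropped the loop never makes progress.

import pandas as pd

def sample_until_enough(generate_batch, num_samples):
    # generate_batch() stands in for one decode pass that returns a DataFrame of candidate rows
    gen_data = pd.DataFrame()
    while gen_data.shape[0] < num_samples:  # stuck here forever if every batch comes back empty
        batch = generate_batch()
        batch = batch.dropna()  # rows that could not be parsed or validated are dropped
        gen_data = pd.concat([gen_data, batch], ignore_index=True)
    return gen_data.head(num_samples)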

@zhao-zilong
Owner

Hi @iamamiramine, sorry that I only just saw your message. Did you solve it? One likely reason is that your max_length is too small, so the generation cannot produce one complete row of data.
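As a quick check, raise max_length when sampling. A minimal sketch (the import path and argument names follow the README example; the values are only illustrative):

from tabula import Tabula
import pandas as pd

data = pd.read_csv("train.csv")  # your training table
model = Tabula(llm="distilgpt2", batch_size=32, epochs=400)  # illustrative settings
model.fit(data)
synthetic = model.sample(n_samples=100, max_length=400)  # large enough for one fully encoded row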

@iamamiramine
Author

iamamiramine commented Oct 6, 2024

Hello, I tried changing the max_length parameter and it did not work.
Another thing to note is that the Fake Hotel Guests dataset consists of only 9 columns, so one row from this dataset is relatively short.

@omaralvarez

omaralvarez commented Nov 9, 2024

I am also having problems with this. I am using max_length=1024, the maximum (if I use more I get a CUDA error), and on this dataset I cannot get a single sample:

from imblearn.datasets import fetch_datasets

# Download imbalanced-learn's benchmark datasets and pick the 'sick' dataset
sick = fetch_datasets()['sick']
sick.data.shape
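(For completeness: the Bunch has to be turned into a DataFrame before fitting. A sketch of one way to do that, with made-up column names since the Bunch carries none:)

import pandas as pd

# fetch_datasets() returns Bunch objects holding plain numpy arrays without column names
df = pd.DataFrame(sick.data, columns=[f"f{i}" for i in range(sick.data.shape[1])])
df["target"] = sick.target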

@zhao-zilong
Owner

Hi @omaralvarez @iamamiramine

You do not need to set max_length as big as 1024. You can uncomment this part of the code to see the length of your encoded row:

# Use following print to observe encoded token sequence length
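A rough way to check the same thing outside the library (a sketch only: distilgpt2 is assumed as the underlying tokenizer, Tabula's exact textual encoding of a row may differ, and data stands for your training DataFrame, so treat the number as a ballpark):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
# Textualise one training row in the "column is value" style used by LLM-based tabular generators
row_text = ", ".join(f"{col} is {val}" for col, val in data.iloc[0].items())
print(len(tokenizer(row_text)["input_ids"]))  # max_length should comfortably exceed this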

Let me know if that helps.

@omaralvarez

Yes, I don't think it has to do with max_length. The issue in this case is that some numbers are always outside the requested ranges in the predicted dataframe, so they are always filtered out. I have tried switching temperature, k, and training epochs, to no avail.
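For reference, this is the kind of comparison that shows it (a sketch: raw_batch stands for whatever decoded dataframe you can capture before the filtering step, and df is the training data; both names are placeholders):

import pandas as pd

num_cols = df.select_dtypes("number").columns
report = pd.DataFrame({
    "train_min": df[num_cols].min(),
    "train_max": df[num_cols].max(),
    "gen_min": raw_batch[num_cols].min(),
    "gen_max": raw_batch[num_cols].max(),
})
print(report)  # generated values outside the training range are the ones being filtered out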

@xxxx-lzw

[quotes @iamamiramine's original issue description above]

May I ask if your problem has been solved? I'm experiencing this problem as well.

@tmacleod

This problem also occurs if you train with too few epochs. Train with more epochs and sampling speed improves.

@omaralvarez

In my case, it wouldn't work no matter how many epochs I used (I trained for a week on an 80GB A100).
