crash when data size exceeds int32 #5705

Closed
SiNZeRo opened this issue Feb 9, 2023 · 5 comments
@SiNZeRo
Contributor

SiNZeRo commented Feb 9, 2023

Maybe related to this code, where the index arithmetic could overflow:

const size_t data_offset = offset + data_index * num_columns_in_cur_partition;

@jameslamb
Collaborator

jameslamb commented Feb 9, 2023

Thanks for using LightGBM.

I've updated the link in your question to one that's anchored to a specific commit, so even if that file you've linked to is altered, anyone reading this in the future will know what line you meant. If you don't know how to do that, see https://docs.github.com/en/repositories/working-with-files/using-files/getting-permanent-links-to-files#press-y-to-permalink-to-a-file-in-a-specific-commit.

Typically a report like "this crashed" without any other details is very difficult for us to investigate. Since you didn't provide any details like a minimal, reproducible example, I'm going to interpret this as a question: "does training with device_type=cuda support using training data with more than max(int32) (2,147,483,647) rows?"

I SUSPECT that the answer is "no", given that even CPU-based LightGBM does not support more than int32 rows in the input data: #5454 .

@guolinke or @shiyu1994 can you please comment?

@SiNZeRo
Contributor Author

SiNZeRo commented Feb 9, 2023


Thanks for the updates.

Sorry for the confusion: by "data size" I actually meant num_rows * num_feats, not the row count alone.

Somewhat related to this PR: #5167

@jameslamb
Collaborator

Closed via #5706.

@jameslamb
Collaborator

Thanks for the help @SiNZeRo!

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023