Truncated download results #645

Open
tucotuco opened this issue Dec 17, 2017 · 1 comment

@tucotuco
Member

From Scott Chamberlain...

"The user is using rvertnet::bigsearch - the interface to your download service. He was getting only 33K records (exactly that many, which makes it sound especially like a hard limit) for a search on class="Aves", while he is getting 210K records for class="Aves" + inst="UMMZ". It seems that the first query should surely return a larger set of data than the second. So we're wondering if there's some kind of limit that is sometimes imposed, sometimes not. Because if it were always imposed, he would only get 33K for both of those queries."

@tucotuco
Member Author

There is a hard limit, but it comes from a Google Cloud Storage concatenation limit of 1024 files. We make files of 1000 records each and join them to make the final download file, so with our current approach the limit is 1,024,000 records (1024 × 1000). Our reasoning is that, for anything bigger, people should be using the snapshots to avoid excessive costs to us. We'd have to look back through the logs and Google Cloud Storage to see if we can figure out why the Aves query (which WOULD fail to give all desired records) stops at 33K records.
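To make the arithmetic behind that cap concrete, here is a minimal sketch. It assumes only the two numbers stated above (1000 records per file, at most 1024 files per concatenation). The function names `shards_needed` and `plan_download` are hypothetical illustrations, not part of the VertNet download service or rvertnet.

```python
import math

# Numbers taken from the comment above: each intermediate file holds
# 1000 records, and at most 1024 files can be concatenated.
RECORDS_PER_SHARD = 1000
MAX_SHARDS = 1024
MAX_RECORDS = RECORDS_PER_SHARD * MAX_SHARDS  # 1,024,000


def shards_needed(record_count: int) -> int:
    """Number of 1000-record files needed to hold record_count records."""
    return math.ceil(record_count / RECORDS_PER_SHARD)


def plan_download(record_count: int) -> dict:
    """Hypothetical planner: report how many files a result would need and
    whether it fits under the concatenation limit or should come from a
    snapshot instead."""
    shards = shards_needed(record_count)
    fits = shards <= MAX_SHARDS
    return {
        "records": record_count,
        "shards": shards,
        "fits_limit": fits,
        "use_snapshot": not fits,
    }


if __name__ == "__main__":
    for n in (33_000, 210_000, 1_024_000, 1_500_000):
        print(plan_download(n))
```

A 33K-record result needs only 33 files, far below the 1024-file cap. The cap alone therefore doesn't explain why the class="Aves" download stopped there, which is why the logs need to be checked.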
