Optimize count returns #3

Open
jotegui opened this issue May 13, 2016 · 3 comments
@jotegui
Member

jotegui commented May 13, 2016

Currently, counts are calculated by retrieving the full list of records and then returning just the length of the array. This is highly inefficient (e.g. it took more than 2 h to get the number of records mentioning mvz).

@tucotuco
Member

I do not know of a way to get counts efficiently AND accurately with GAE. However, for the case in question (small record sets), I believe the estimated count is good enough and could be used to make a determination.
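To illustrate how an estimated count might be used for small record sets, here is a minimal sketch. It assumes the behaviour documented for the GAE Search API, where a query can request a `number_found_accuracy` and the results report a `number_found` that is accurate only up to that value. The helper name `describe_count` and its return shape are hypothetical, not part of the project or of GAE.

```python
def describe_count(number_found, accuracy):
    """Classify a reported match count as exact or estimated.

    Assumption: `number_found` comes from a search result and `accuracy`
    is the `number_found_accuracy` requested for that query. The count is
    treated as exact only when it does not exceed the requested accuracy.
    """
    return {"count": number_found, "exact": number_found <= accuracy}


# In a GAE application the inputs would come from something like the
# following (not executed here, shown only for orientation):
#   options = search.QueryOptions(limit=1, number_found_accuracy=accuracy)
#   results = index.search(search.Query(query_string=q, options=options))
#   describe_count(results.number_found, accuracy)
```

With this, a small result set can be reported as an exact count without retrieving the records, while a large one is flagged as an estimate.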

@jotegui
Member Author

jotegui commented May 24, 2016

You are right, @tucotuco. I was not familiar with Google's Search API and I guess I was expecting a bit too much, like a count method or similar. So it seems the only way of counting records is to actually retrieve them and return the length of the array. sigh

Actually, given this difficulty and the current structure, I have been thinking of dropping this whole issue, and here is why:

  1. There is little (if any) potential use for a count method from the users' perspective.
  2. Record counts are only useful for direct calls to the download API, since portal downloads come after a search event, where the record count has already been calculated. And direct downloads via portal-web have already been implemented.
  3. If we enable a new parameter in the search API (like format) that lets users choose between JSON and TXT output, they will be able to download via that method. But that makes the distinction between the two methods a bit blurry...
  4. We can use an approach such as GBIF's: put a hard limit on the number of records retrievable via a direct call to the search API, and suggest using the download API for larger searches...

Again, just thinking out loud here...
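Point 4 could be sketched roughly as follows. This is only an illustration: the cap value, the function name `plan_search_request`, and the response fields are all hypothetical and would need to match whatever the search API actually returns.

```python
HARD_LIMIT = 1000  # hypothetical cap, not a value agreed in this thread


def plan_search_request(requested, hard_limit=HARD_LIMIT):
    """Clamp the requested number of records to a hard limit.

    Returns the limit to apply and, when the request was cut down,
    a message pointing the caller to the download API.
    """
    if requested < 0:
        raise ValueError("requested must be non-negative")
    if requested <= hard_limit:
        return {"limit": requested, "truncated": False, "message": None}
    return {
        "limit": hard_limit,
        "truncated": True,
        "message": "Only the first %d records are returned; "
                   "use the download API for larger searches." % hard_limit,
    }
```

A call such as `plan_search_request(50000)` would then return the capped limit together with the hint, so no exact count is ever needed on the search side.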

@tucotuco
Member

I agree with all of these observations.

