
Bulk Data Access - Caching/Reduce Server Load/Research #120

Open
danieldjewell opened this issue Apr 4, 2023 · 0 comments

Hello NY Senate OpenLegislation Team!

First, I'd like to commend you, the NY Senate, and I suppose the entire State of New York for creating and hosting what is, in my experience, one of the most open and accessible systems for State-level legislative information that exist in the USA today. Many kudos!

One thought/suggestion - it appears that there is not currently a bulk export/collection available for download. Having a batch-generated (i.e. static) export available in, say, compressed JSON or MessagePack format would be advantageous:

  • A bulk export would allow for easier offline access (sadly, not everyone in our country - or even in the State of New York - always has access to an internet connection)
    • On this note especially, in my own experience cellular data coverage in Manhattan is widely variable - particularly inside the multitude of buildings in NYC...
  • A bulk export could reduce server load (I suspect at least a few people out there are already crawling the API to generate a complete dataset; providing a bulk export would obviate the need for those developers to perform thousands of API queries)
  • A bulk export is ideal for research purposes and can also be a great help for developers, assuming the bulk data format closely mirrors what the API returns (see the sketch after this list)
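
To make that last point concrete, here is a minimal sketch of how a researcher might consume such a dump entirely offline. The file name, gzip compression, one-JSON-object-per-line layout, and the field names in the filter are all assumptions for illustration, not an existing format:

```python
# Hypothetical consumer of a bulk dump. The file name, compression,
# line-delimited JSON layout, and field names are assumptions only.
import gzip
import json

with gzip.open("bills-2023.jsonl.gz", "rt", encoding="utf-8") as dump:
    for line in dump:
        bill = json.loads(line)
        # Works offline once the dump is downloaded; if each record mirrors
        # the API's bill responses, existing parsing code carries over.
        if bill.get("status", {}).get("statusType") == "SIGNED_BY_GOV":
            print(bill.get("printNo"), bill.get("title"))
```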

A daily export of the data could be set up to run as a batch process (i.e. overnight, during periods of low activity) and stored/delivered statically. (I've seen others attempt this as a "live generation" process - e.g. triggering the API to produce a dump on demand - and while that does yield very up-to-date data, it puts a LOT of load on the servers... not to mention download speeds usually suffer.)
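
For illustration, a rough sketch of what such an overnight batch job might look like, producing the hypothetical dump used above. The endpoint path, paging parameters, response shape, and API key handling here are assumptions - the real details live in the Open Legislation API docs:

```python
# Sketch of a nightly batch export. Endpoint, paging, and response
# shape are assumptions for illustration; consult the API docs.
import gzip
import json
import urllib.request

BASE = "https://legislation.nysenate.gov/api/3/bills/2023"  # assumed endpoint
API_KEY = "YOUR_KEY_HERE"  # hypothetical placeholder
PAGE_SIZE = 1000           # assumed maximum page size

def fetch_page(offset: int) -> dict:
    url = f"{BASE}?limit={PAGE_SIZE}&offset={offset}&key={API_KEY}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

with gzip.open("bills-2023.jsonl.gz", "wt", encoding="utf-8") as out:
    offset = 1
    while True:
        page = fetch_page(offset)
        items = page.get("result", {}).get("items", [])  # assumed shape
        if not items:
            break
        for bill in items:
            out.write(json.dumps(bill) + "\n")  # one record per line
        offset += len(items)
```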

Additionally, if storage space is available, an archive of past exports could be very useful for researchers as well (e.g. for tracking changes over time). If stored with deduplication, the overall storage footprint would probably be relatively low - a rough sketch of one approach follows.
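
One way the deduplicated archive could work is content-addressed storage: each day's export is hard-linked against a store of unique blobs, so an unchanged file costs essentially no extra space. The directory layout and naming below are purely illustrative (and Path.hardlink_to requires Python 3.10+):

```python
# Illustrative content-addressed archive: identical exports across days
# are stored once, keyed by SHA-256 digest. Layout/names are assumptions.
import hashlib
import shutil
from datetime import date
from pathlib import Path

STORE = Path("archive/objects")   # one copy per unique content hash
SNAPSHOTS = Path("archive/days")  # per-day hard links into the store

def archive_export(export_file: Path) -> Path:
    # Reads the whole file for simplicity; chunk the hash for huge dumps.
    digest = hashlib.sha256(export_file.read_bytes()).hexdigest()
    STORE.mkdir(parents=True, exist_ok=True)
    blob = STORE / digest
    if not blob.exists():             # only new content costs space
        shutil.copy2(export_file, blob)
    day_dir = SNAPSHOTS / date.today().isoformat()
    day_dir.mkdir(parents=True, exist_ok=True)
    link = day_dir / export_file.name
    if not link.exists():
        link.hardlink_to(blob)        # hard link: near-zero extra space
    return link
```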

Thank you!
