Load data with Marshal instead of YAML #17

Open
wants to merge 2 commits into
base: master

Conversation

markprzepiora

Loading 4 MB of YAML data takes about 1 second on my development machine. Replacing the YAML-dumped data with Marshal-dumped data brings this down to about 0.08 seconds. This makes a big difference in feedback speed, especially when running individual tests that rely on ZipCodes.identify.

                      user     system      total        real
YAML.load         0.859559   0.029983   0.889542 (  0.889713)
Marshal.load      0.043685   0.030232   0.073917 (  0.073922)
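
A comparison like the one above could be reproduced with a short script along these lines. This is only a sketch: the data set is a generated, hypothetical stand-in for the gem's real YAML file, so absolute timings will differ from the numbers reported here.

```ruby
require 'benchmark'
require 'yaml'

# Hypothetical stand-in for the gem's data: ZIP code => attribute hash.
data = (10_000...15_000).each_with_object({}) do |n, h|
  h[n.to_s] = { 'state_code' => 'NY', 'city' => "City #{n}" }
end

# Serialize once up front so only deserialization is timed.
yaml_blob    = YAML.dump(data)
marshal_blob = Marshal.dump(data)

from_yaml = from_marshal = nil
Benchmark.bm(14) do |x|
  x.report('YAML.load')    { from_yaml    = YAML.load(yaml_blob) }
  x.report('Marshal.load') { from_marshal = Marshal.load(marshal_blob) }
end
```

Both loaders should return a hash equal to `data`; only the time spent parsing differs.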

@Hengjie

Hengjie commented Aug 18, 2019

This looks awesome. Thank you for writing it so that loading can be sped up. I just wish the gem author would merge it now.

@brodyhoskins
Collaborator

@markprzepiora, I'd like to explore options for storing the data.

What I ended up doing in a fork was converting the YAML to CSV and using the FastCSV gem to process the data more quickly and to avoid loading it all into memory at once; however, I don't feel that it was necessarily the best method.

Another idea is to keep the YAML around for development purposes but bundle a SQLite database generated by a rake task.

Since you've opened this PR I wanted to get your feedback.
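
The CSV idea mentioned above might look roughly like the following. This sketch uses Ruby's standard `csv` library rather than FastCSV, and the column names and the `identify` helper are hypothetical, not the fork's actual code.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical CSV layout; the gem's real columns may differ.
csv_file = Tempfile.new(['zip_codes', '.csv'])
csv_file.write(<<~CSV)
  zip,state_code,city
  10001,NY,New York
  60601,IL,Chicago
  94105,CA,San Francisco
CSV
csv_file.close

# Scan the file one row at a time and stop at the first match,
# so the full data set is never held in memory.
def identify(path, zip)
  CSV.foreach(path, headers: true) do |row|
    return row.to_h if row['zip'] == zip
  end
  nil
end

result = identify(csv_file.path, '60601')
puts result.inspect
```

The trade-off is that every lookup is a linear scan of the file, which is where an indexed SQLite table would likely do better.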

@@ -1,5 +1,3 @@
require 'yaml'

module ZipCodes
VERSION = '0.2.1'

This is a breaking change (semver-1)

Suggested change
- VERSION = '0.2.1'
+ VERSION = '1.0'

If it's going to be included one day, I think the PR needs to include a build script that converts the YAML (easy for humans to deal with) into the marshaled format.
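
A build step of the kind suggested here could be as small as the sketch below. The `build_marshal` helper and the file names are hypothetical; in a real gem this would probably live in a rake task that runs before packaging.

```ruby
require 'yaml'
require 'tmpdir'

# Convert a YAML data file into a Marshal dump and return the loaded data.
def build_marshal(yaml_path, marshal_path)
  data = YAML.load_file(yaml_path)
  File.binwrite(marshal_path, Marshal.dump(data))
  data
end

# Demo in a throwaway directory; the file names are hypothetical.
Dir.mktmpdir do |dir|
  yaml_path    = File.join(dir, 'US.yml')
  marshal_path = File.join(dir, 'US.marshal')
  File.write(yaml_path, { '10001' => { 'city' => 'New York' } }.to_yaml)

  source   = build_marshal(yaml_path, marshal_path)
  reloaded = Marshal.load(File.binread(marshal_path))
  puts "round trip ok: #{reloaded == source}"
end
```

Keeping the YAML as the source of truth means the marshaled file is a generated artifact that can be rebuilt at any time.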

Feel free to also open a PR against https://github.com/suvie-eng/zip-codes/

I don't want to create maintainer controversy, but until the maintainers here integrate all of these, I'd like to have a gem that is the most up to date. Also, I like the performance of this change 👌

@lostapathy

Library data like this really shouldn't be distributed via Marshal: the Marshal format is not guaranteed to be stable between Ruby versions (and indeed, is not in practice). I'd suggest either pursuing fast CSV options or just querying a SQLite file.
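
For context on the versioning concern: every Marshal dump starts with two bytes giving the format's major and minor version, and Ruby exposes its own as `Marshal::MAJOR_VERSION` and `Marshal::MINOR_VERSION`. The sketch below shows how a loader could inspect that header before trusting a shipped dump. The `marshal_compatible?` helper is hypothetical, and the check covers only the format version, not changes in how individual classes serialize.

```ruby
# The first two bytes of any Marshal dump are the format's major and minor version.
blob = Marshal.dump({ '10001' => 'New York' })
major, minor = blob.unpack('CC')

# Hypothetical guard: accept a dump only if this Ruby can read its version
# (same major version, equal or lower minor version).
def marshal_compatible?(blob)
  major, minor = blob.unpack('CC')
  major == Marshal::MAJOR_VERSION && minor <= Marshal::MINOR_VERSION
end

puts "dump header: #{major}.#{minor}"
puts "this Ruby:   #{Marshal::MAJOR_VERSION}.#{Marshal::MINOR_VERSION}"
puts "compatible:  #{marshal_compatible?(blob)}"
```

A gem that shipped marshaled data could fall back to the YAML source whenever such a check fails.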

5 participants