
support of pkl output format and compression #14

Open
wants to merge 1 commit into master
Conversation


@Evgeny7777 Evgeny7777 commented Oct 24, 2020

Tests are still to be added, but what do you think in general? Does it make sense?

Owner

ffeast commented Oct 24, 2020

Hey @Evgeny7777

Thanks for posting your suggestion.
What's the use case so that you want it built into finam-export?

It could be solved trivially via an external tool, applied in the following way:

set -e
finam-export.py ... --destdir=some_dir --ext=txt
find some_dir -name '*.txt' | xargs convert_script

where convert_script could convert the data into whatever format is wanted, not only pickle or gzip.
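For illustration, such a convert_script could be a short pandas script along these lines (a sketch only; the file names and the `convert` helper are hypothetical, not part of finam-export):

```python
# Hypothetical convert_script: reads each exported CSV-formatted file and
# writes a gzip-compressed pickle next to it. Names are illustrative only.
import pandas as pd


def convert(path: str) -> str:
    """Read a CSV export and re-save it as a gzip-compressed pickle."""
    df = pd.read_csv(path)
    out = path.rsplit('.', 1)[0] + '.pkl.gz'
    df.to_pickle(out)  # pandas infers gzip compression from the .gz suffix
    return out


def main(paths: list[str]) -> None:
    for p in paths:
        print(convert(p))
```

Wired up as `find some_dir -name '*.txt' | xargs python convert_script.py`, this would keep the exporter itself format-agnostic.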

Author

Evgeny7777 commented Oct 26, 2020

Hey @ffeast ,

There are a few different reasons for me:

  1. The pd.DataFrame --> csv --> pd.DataFrame conversion path can easily introduce conversion problems; pickle dumps and restores the object without any conversion.
  2. I'm thinking about maintaining a local data lake, so size matters. There is no sense in keeping csv.
  3. It could be done via a pipe or by making my own version of the script, but I thought it might be useful for others.
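A quick sketch of the round-trip issue from point 1 (illustrative only): a csv round-trip silently changes column dtypes, while a pickle round-trip restores the DataFrame exactly.

```python
# csv round-trip loses dtype information; pickle restores the object as-is.
import io
import pandas as pd

df = pd.DataFrame({
    'ts': pd.to_datetime(['2020-10-24', '2020-10-26']),
    'vol': pd.array([100, 200], dtype='Int64'),  # nullable integer dtype
})

# csv round-trip: datetimes come back as plain strings,
# and the nullable Int64 column degrades to int64
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df_csv = pd.read_csv(buf)

# pickle round-trip: the object is restored without conversion
pkl = io.BytesIO()
df.to_pickle(pkl)
pkl.seek(0)
df_pkl = pd.read_pickle(pkl)
```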

Another feature I'm going to add anyway is delta-loading: the script would open the file if it exists and request only the missing data. Do you think it would be useful for others?
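Roughly, the delta-loading idea could look like this (a hypothetical sketch, not the actual finam-export API; `fetch` stands in for the real download call, and a daily frequency is assumed):

```python
# Hypothetical delta-loading sketch: if a local pickle exists,
# fetch only the rows after its last stored timestamp.
import os
import pandas as pd


def delta_load(path, fetch, start, end):
    """fetch(start, end) stands in for the real download call and must
    return a DataFrame with a DatetimeIndex; daily frequency is assumed."""
    if os.path.exists(path):
        existing = pd.read_pickle(path)
        last = existing.index.max()
        if last >= end:
            return existing  # local copy is already up to date
        new = fetch(max(start, last + pd.Timedelta(days=1)), end)
        merged = pd.concat([existing, new])
    else:
        merged = fetch(start, end)
    merged.to_pickle(path)
    return merged
```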

Owner

ffeast commented Oct 26, 2020

Hey @Evgeny7777

  1. What kind of conversions do you mean?
  2. Sounds reasonable
  3. That's definitely a useful feature from both user and operational standpoints!
  • for users it would allow faster download times, as I believe a typical scenario is to update historical data regularly, so that 99% of the data downloaded is already stored locally
  • it would decrease the load on finam's services
    Let's just move it to a separate issue

@Evgeny7777
Author

@ffeast,

  1. I mean the conversion of a DataFrame object to csv (on saving) and back (on loading), because later on you will most probably need to work with a DataFrame again. At least this is my use case. The interim csv state may introduce weird type/precision problems.
  2. Then I will eventually cover it with tests 👍
  3. I will make an issue then

Cheers
