[ENH] CSV Import: guess data types #4838

Merged
merged 1 commit into from
Jun 5, 2020

PrimozGodec
Contributor

Issue

Implements #4794

Description of changes

Implemented a type-guessing strategy for CSV import that is intended to match the strategy in io_utils.guess_data_type.

This PR adds another iteration over the columns. For each column, it infers the data type from the column's values. The expensive operations here are:

  • unique: pandas' unique is based on a hash table, so it is cheaper than NumPy's unique, which sorts the values
  • casting times to date-time
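
To make the per-column pass concrete, here is a minimal sketch of this kind of guessing. It is not the code from this PR and not the logic of io_utils.guess_data_type: the function name `guess_column_type`, the helper `_all_parse`, the labels it returns, the `max_categories` threshold, the `time_format` default and the order of the checks are all assumptions made for illustration. It uses only the standard library, with a Python `set` standing in for a hash-based unique.

```python
from datetime import datetime


def _all_parse(values, parser):
    """Return True if `parser` accepts every value without raising ValueError."""
    for value in values:
        try:
            parser(value)
        except ValueError:
            return False
    return True


def guess_column_type(column, max_categories=5, time_format="%Y-%m-%d"):
    """Guess a type label for one column of string cells (hypothetical helper)."""
    # Missing cells (None or empty string) carry no type information.
    values = [v for v in column if v not in (None, "")]
    if not values:
        return "text"
    # A set is hash-based, so deduplicating does not require sorting.
    unique = set(values)
    # Only the unique values are parsed, which keeps the casts cheap
    # for columns with many repeated cells.
    if _all_parse(unique, float):
        return "numeric"
    if _all_parse(unique, lambda v: datetime.strptime(v, time_format)):
        return "time"
    # Assumed threshold: few distinct non-numeric values means categorical.
    if len(unique) <= max_categories:
        return "categorical"
    return "text"
```

For example, `guess_column_type(["2020-06-02", "2020-06-03"])` yields `"time"` under the assumed ISO date format, while a column of many distinct identifiers falls through to `"text"`. Parsing only the unique values is one way to limit the cost of the date-time cast mentioned above.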

@ales-erjavec is this kind of guessing acceptable, or would it decrease the performance of the widget too much?

TODO

  • Tests
Includes
  • Code changes
  • Tests
  • Documentation

@codecov

codecov bot commented Jun 2, 2020

Codecov Report

Merging #4838 into master will increase coverage by 0.05%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4838      +/-   ##
==========================================
+ Coverage   84.01%   84.07%   +0.05%     
==========================================
  Files         281      277       -4     
  Lines       56901    56487     -414     
==========================================
- Hits        47804    47490     -314     
+ Misses       9097     8997     -100     

@ales-erjavec
Contributor

is this kind of guessing acceptable or would decrease the performance of the widget too much?

I think it is ok.

@PrimozGodec
Contributor Author

PrimozGodec commented Jun 2, 2020

Ok. Then I will write a few tests before it is merged.

@PrimozGodec PrimozGodec force-pushed the csvimport-autovariable branch 2 times, most recently from 2d5cdc5 to e970535 Compare June 3, 2020 11:02
@PrimozGodec PrimozGodec force-pushed the csvimport-autovariable branch from e970535 to 9c19f2f Compare June 3, 2020 11:16
@ales-erjavec ales-erjavec changed the title CSV Import: guess data types [ENH] CSV Import: guess data types Jun 5, 2020
@ales-erjavec ales-erjavec merged commit 8ecbe68 into biolab:master Jun 5, 2020
@PrimozGodec PrimozGodec deleted the csvimport-autovariable branch January 21, 2022 12:46