Set field_size_limit to max long int size on csv plugin #352
This pull request addresses issue #306 and fixes it for all platforms without falling back to a hardcoded value.
The overflow happened because the Python CSV implementation uses a `long` to store the max limit (https://github.com/python/cpython/blob/c88239f864a27f673c0f0a9e62d2488563f9d081/Modules/_csv.c#L21).

The previous implementation, which used `sys.maxsize`, has a drawback on 64-bit Windows: under the LLP64 data model, pointers are 64 bits wide but `long` stays a 32-bit integer to preserve type compatibility (see more in https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models).

So, to get a reliable max `long` value on the target system, you can use `ctypes.c_ulong(-1)`, which wraps around to the `ulong` max value; dividing that by 2 gives the max value of `long` on the target platform.

Regarding memory impact, the `long` value for the limit is already allocated to store the max limit, and it is used only to block sizes higher than that. The CSV itself is allocated on demand.

I ran the profiler on a simple load of a big file from brasil.io (https://brasil.io/dataset/genero-nomes/nomes/) and got the same values after the csv import operation:
versus
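For reference, the max `long` computation described above can be sketched as follows. This is a minimal illustration, not the exact code in the diff; the helper name `max_c_long` is hypothetical and chosen only for this example.

```python
import csv
import ctypes
import sys


def max_c_long() -> int:
    """Return the largest value a C signed long can hold on this platform.

    Hypothetical helper for illustration; not necessarily the name used in the plugin.
    """
    # c_ulong(-1) wraps around to ULONG_MAX (ctypes does no overflow checking),
    # and integer-dividing ULONG_MAX by 2 yields LONG_MAX.
    return ctypes.c_ulong(-1).value // 2


limit = max_c_long()

# On LLP64 platforms (64-bit Windows) sys.maxsize is larger than LONG_MAX,
# so passing sys.maxsize to csv.field_size_limit would overflow there.
# On LP64 platforms (64-bit Linux/macOS) the two values are equal.
assert limit <= sys.maxsize

# field_size_limit returns the previous limit when given a new one.
previous_limit = csv.field_size_limit(limit)
```

Since the result is derived from the width of `long` on the running interpreter, the same code works on both LP64 and LLP64 systems without a hardcoded fallback.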