I've been here before and I'm back 😊
This gem has become a cornerstone of one of the projects I've developed. In most cases it performs very well, apart from some configuration options we need to customize for each scenario, but so it goes.
Now we are working with larger files, around 1.5 million rows, and in some cases a diff seems to take hours. In earlier tests with files of 500,000 to 1,000,000 rows, fully processing a diff with the gem took around 15 minutes or more. We can live with that, even though it's not lovely, but anything longer than that is detrimental.
I'm not sure whether this is an issue with how we provide `key_fields` or something similar, but I'm mainly writing this issue to ask what experiences people have had comparing large files. Is this a constraint of the gem, our own CSVDiff configuration, or something else?
What have you seen when working with files of one million plus rows, with up to 100 columns?
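For context, our usage looks roughly like the sketch below; the file paths and key columns are placeholders rather than our actual configuration:

```ruby
require 'csv-diff'

# Compare two extracts, matching rows on a composite key.
# Paths and column names here are illustrative placeholders.
diff = CSVDiff.new('extract_old.csv', 'extract_new.csv',
                   key_fields: ['account_id', 'period'])

puts "Adds:    #{diff.adds.size}"
puts "Deletes: #{diff.deletes.size}"
puts "Updates: #{diff.updates.size}"
```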
@agardiner You're gonna hear from me a lot but that's because this tool is incredibly valuable to our team.
We have been seeing very long processing times for files over 100,000 rows, mainly files of 500k+ rows; these run for hours without producing results. Have you done any performance testing with large files?
I've not used this with files larger than 100k records, but I'd expect that performance drops off exponentially as your inputs grow. The implementation is pretty simple and works well for small inputs, but it was not designed for speed or to scale to large volume inputs.
Sorry I don't have any better news for you.
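If you need to handle that kind of volume, one option is a hash-keyed, single-pass comparison along the lines of the sketch below. To be clear, this is not what CSVDiff does internally, just an illustration of an approach that stays roughly linear in input size; the key columns are placeholders.

```ruby
require 'csv'

# A minimal sketch (not CSVDiff itself): index the old file by key,
# then stream the new file once. Time is O(rows); memory is
# proportional to the old file's contents.
def quick_diff(old_path, new_path, key_cols)
  old_rows = {}
  CSV.foreach(old_path, headers: true) do |row|
    key = row.values_at(*key_cols).join('|')
    old_rows[key] = row.to_h
  end

  adds, updates = [], []
  CSV.foreach(new_path, headers: true) do |row|
    key = row.values_at(*key_cols).join('|')
    old = old_rows.delete(key)
    if old.nil?
      adds << row.to_h
    elsif old != row.to_h
      updates << row.to_h
    end
  end

  # Anything left in the index exists only in the old file.
  { adds: adds, updates: updates, deletes: old_rows.values }
end

result = quick_diff('extract_old.csv', 'extract_new.csv',
                    ['account_id', 'period'])
puts result[:updates].size
```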