Skip to content

Long processing times when handling very large files #12

Open
@hammzj

Description

@hammzj

Hello,

I've been here before and I'm back 😊
This gem has become a cornerstone in one of the projects I've developed. In most cases, it handles very well, minus some configuration options we need to customize to each scenario, but so it goes.

Now, we are working with larger files, 1.5million and such rows. In some cases, it seems to take hours. I've tested this before in tests with files between 500,000 and 1,000,000 rows, and have experienced around 15 minutes or more to fully process diffs of these files using the gem. We can deal with that even though it's not lovely, but any time taking longer than that is detrimental.

Now, I am not sure if this is an issue with how we provide key_fields or such, but I am mainly writing this issue out as a question on what experiences people have had with comparing large files? Is this a gem constraint, our own CSVDiff configuration, or something else?

What have you recorded for working with files of one-million plus rows, with up to 100 columns?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions