
DX: Improve "Maximum retries of 100 reached without finding a unique value." message #109

Closed · staabm opened this issue Feb 19, 2024 · 8 comments

staabm (Contributor) commented Feb 19, 2024

Preconditions

GdprDump Version: 4.0.3

PHP Version: any

Database Version: any

Steps to reproduce

When running with a complex schema and a big dump, the process can take hours. When unique-value rules are not properly configured, "dump-time" errors like the following can occur:

```
mstaab@mst22:/cluster/www/www/www/gdpr-dump$ time ../gdpr-dump.phar config.yaml  > gdpr-dumped.sql.bz2
Maximum retries of 100 reached without finding a unique value.

real    37m4.517s
user    36m45.867s
sys     0m16.900s
```

The error message itself is not very helpful when a big config file containing several unique rules is involved.

Expected result

An error message that denotes which column in which table could not be uniquely randomized.

Actual result

Maximum retries of 100 reached without finding a unique value.
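
For illustration, here is a minimal sketch of the kind of context the exception could carry. This is hypothetical code, not gdpr-dump's actual implementation; the class and parameter names are made up:

```php
<?php

// Hypothetical exception type; gdpr-dump's real class name may differ.
final class UniqueValueException extends \RuntimeException
{
}

/**
 * Sketch of a unique-value converter loop that reports the failing
 * table/column instead of a bare "maximum retries" message.
 */
function convertUnique(callable $converter, string $table, string $column, array &$seen, int $maxRetries = 100): mixed
{
    for ($attempt = 0; $attempt < $maxRetries; $attempt++) {
        $value = $converter();
        $key = serialize($value);

        if (!isset($seen[$key])) {
            $seen[$key] = true;
            return $value;
        }
    }

    // Including the table and column makes the failing rule easy to
    // locate in a large config file.
    throw new UniqueValueException(sprintf(
        'Maximum retries of %d reached without finding a unique value for column "%s.%s".',
        $maxRetries,
        $table,
        $column
    ));
}
```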

guvra (Collaborator) commented Feb 20, 2024

Fixed by #110

guvra closed this as completed on Feb 20, 2024
guvra (Collaborator) commented Feb 20, 2024

@staabm I forgot to warn you about something.

You mentioned creating a dump of a big database, and in another issue you mentioned using the faker converter. In some situations this can have a noticeable impact on performance, because a lot of faker formatters are not optimized for being called billions of times.

This is why the default templates mainly use custom converters instead of faker.

However, it only starts being problematic when you're using it to convert at least hundreds of millions of values.
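
To make that overhead concrete, here is a minimal micro-benchmark sketch. It assumes fakerphp/faker is installed via Composer; the absolute numbers will vary by machine and formatter:

```php
<?php

require 'vendor/autoload.php';

$faker = Faker\Factory::create();

// A faker formatter called in a tight loop, which is roughly what a dump
// of a large table does once per anonymized value.
$start = microtime(true);
for ($i = 0; $i < 1_000_000; $i++) {
    $faker->email();
}
printf("faker email():    %.2fs for 1M values\n", microtime(true) - $start);

// A trivial custom converter doing plain string work, for comparison.
$start = microtime(true);
for ($i = 0; $i < 1_000_000; $i++) {
    $value = 'user' . $i . '@example.com';
}
printf("custom converter: %.2fs for 1M values\n", microtime(true) - $start);
```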

staabm (Contributor, Author) commented Feb 20, 2024

Thanks for the heads-up. I am already in the process of optimizing performance and looking for bottlenecks.

See e.g.

* [PDO quote bottleneck php/php-src#13440](https://github.com/php/php-src/issues/13440)
* [Inline escape() method to improve performance ifsnop/mysqldump-php#277](https://github.com/ifsnop/mysqldump-php/pull/277)

(A rough sketch of this escaping cost is shown below.)
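
To illustrate the per-value escaping cost those two issues are about, a micro-benchmark sketch (hypothetical code, not the patch from the linked PRs; a real dumper must match the server's charset and escaping rules):

```php
<?php

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'password');
$values = array_map(fn ($i) => "value-$i", range(1, 1_000_000));

// PDO::quote() calls into the driver once per value.
$start = microtime(true);
foreach ($values as $v) {
    $quoted = $pdo->quote($v);
}
printf("PDO::quote():    %.2fs\n", microtime(true) - $start);

// An inlined userland escape, similar in spirit to what mysqldump-php
// does with str_replace (illustrative only).
$search  = ["\\", "\0", "\n", "\r", "'", '"', "\x1a"];
$replace = ["\\\\", "\\0", "\\n", "\\r", "\\'", '\\"', "\\Z"];
$start = microtime(true);
foreach ($values as $v) {
    $quoted = "'" . str_replace($search, $replace, $v) . "'";
}
printf("userland escape: %.2fs\n", microtime(true) - $start);
```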

guvra (Collaborator) commented Feb 20, 2024

> Thanks for the heads-up. I am already in the process of optimizing performance and looking for bottlenecks.
>
> See e.g.
>
> * [PDO quote bottleneck php/php-src#13440](https://github.com/php/php-src/issues/13440)
> * [Inline escape() method to improve performance ifsnop/mysqldump-php#277](https://github.com/ifsnop/mysqldump-php/pull/277)

Nice!

FYI, we switched to druidfi/mysqldump-php a while ago because the original repo was inactive, so you might also need to create a PR on their repo.

staabm (Contributor, Author) commented Feb 20, 2024

Noted, thanks ;-)

druidfi/mysqldump-php#34

staabm (Contributor, Author) commented Feb 20, 2024

40-50% faster dump with druidfi/mysqldump-php#37

back-2-95 commented
I combined 3 performance-related PRs into one for testing: druidfi/mysqldump-php#38

@guvra There are also commands there if you want to test it with this library. My first test went well: dump time went down from 40s to 5s.

guvra (Collaborator) commented Feb 27, 2024

@back-2-95 I made a quick test on a medium-sized Magento database.

* Without the optimizations: 13 mins 2 secs (13.1 MiB)
* With the optimizations: 5 mins 53 secs (15.1 MiB)

So it looks pretty good 👍

On another note, I used #113 to monitor the total execution time (this PR adds a progress bar to gdpr-dump when using the -v option). The progress bar uses setInfoHook from mysqldump-php to advance. However, it doesn't behave as expected, because this hook is triggered after a table was dumped, not before. So when the progress bar displays a table name, that table has actually already been dumped, whereas I would expect to see the name of the table that is currently being dumped.
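
For reference, a minimal sketch of wiring up that hook. The class name and the callback shape shown here (an object-type string plus an info array with name/rowCount keys) are my reading of druidfi/mysqldump-php's API and should be verified against the version in use:

```php
<?php

require 'vendor/autoload.php';

use Druidfi\Mysqldump\Mysqldump;

$dump = new Mysqldump('mysql:host=localhost;dbname=app', 'user', 'password');

// As observed above, the hook fires AFTER each table is dumped, so
// $info['name'] is the table that just finished, not the next one.
$dump->setInfoHook(function (string $object, array $info): void {
    if ($object === 'table') {
        printf("Finished table: %s (%d rows)\n", $info['name'], $info['rowCount']);
    }
});

$dump->start('dump.sql');
```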
