Performance And Memory Usage #19

Open
radix0000 opened this issue Sep 19, 2024 · 1 comment

@radix0000
Collaborator

For the new updates version of the pipeline, some trade-offs with memory usage have had to be made as part of the performance improvements (which have been substantial). Various critical data is loaded out of Elasticsearch into memory on startup (see caching.py) and saved back at the end; this results in a speed improvement of approximately an order of magnitude. In future, for larger datasets like UK PSC, some optimisation will be needed to keep the size of this in-memory data within acceptable limits. There is significant scope for this: for instance, various strings being stored take only a limited set of values and could be represented as integers.
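As a rough illustration of the string-to-integer idea, here is a minimal sketch. It is not taken from caching.py: the `StringCodes` class and the example field values are hypothetical, and it assumes a field has at most 256 distinct values so each code fits in one byte.

```python
from array import array


class StringCodes:
    """Map a small set of repeated strings to compact integer codes."""

    def __init__(self):
        self._code_of = {}   # string -> integer code
        self._value_of = []  # integer code -> string

    def encode(self, value: str) -> int:
        # Assign the next free code the first time a value is seen.
        code = self._code_of.get(value)
        if code is None:
            code = len(self._value_of)
            self._code_of[value] = code
            self._value_of.append(value)
        return code

    def decode(self, code: int) -> str:
        return self._value_of[code]


# Hypothetical field values, for illustration only.
statement_types = ["entity", "person", "ownership", "person", "entity"]

codes = StringCodes()
# Type code "B" stores each code as one unsigned byte (max 256 distinct values).
encoded = array("B", (codes.encode(s) for s in statement_types))
decoded = [codes.decode(c) for c in encoded]
```

Each cached value then costs one byte in the array rather than a reference to a Python string object. A field with more distinct values would need a wider type code such as "H" or "I".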
