Add OpenSearch implementation #107

Draft
wants to merge 96 commits into
base: master

ElliottKasoar
Contributor

OpenSearchDatabase

The focus of this PR is the implementation of the OpenSearchDatabase class in abcd/backends/atoms_opensearch.py, designed to mirror the MongoDatabase class in abcd/backends/atoms_pymongo.py. Where possible, functions should behave equivalently between the two classes, although in at least one case (OpenSearchDatabase.property) a more efficient alternative is also provided (OpenSearchDatabase.count_property).
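To illustrate why a counting method can be cheaper than fetching every value, here is a minimal sketch of a request body using OpenSearch's `terms` aggregation, which asks the server to do the counting. The PR does not say that count_property is implemented this way; the function names, the aggregation name `value_counts`, and the `max_values` parameter are assumptions made for illustration.

```python
def build_count_query(prop, max_values=10000):
    """Build a request body that counts distinct values of ``prop``.

    Hypothetical helper, not taken from the PR. ``size: 0`` suppresses the
    document hits so only the aggregation result is returned. Note that a
    terms aggregation needs a keyword or numeric field, not analysed text.
    """
    return {
        "size": 0,
        "aggs": {
            "value_counts": {
                "terms": {"field": prop, "size": max_values},
            }
        },
    }


def counts_from_response(response):
    """Turn the aggregation buckets of a search response into a dict."""
    buckets = response["aggregations"]["value_counts"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

The body returned by `build_count_query` would be passed to a search call on the client, and `counts_from_response` applied to the result, so only one small response per property crosses the network rather than one value per document.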

While it would be possible to use OpenSearch in combination with MongoDB (both generally, and as a relatively straightforward extension of this implementation), it makes more sense to use OpenSearch as the database itself, because the efficiency of OpenSearch queries comes from processing at ingestion. Once ingested, the data is stored in OpenSearch as JSON documents, so also storing it in MongoDB would duplicate most, if not all, of the data.

Unit tests have also been written, both in mock form, similar to those currently written for MongoDB, and as a more complete set of new tests designed to connect to a live containerised database through GitHub Actions.

Properties

A new class in abcd/backends/atoms_properties.py is designed to read in extra information from a CSV file, as well as infer units and the relevant structure files via a template. Unit tests for this class have also been written.
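As a rough sketch of the idea (not the class from the PR), the snippet below reads a CSV, infers units from column headers, and derives a structure file name from a template. The header conventions (`name / unit` and `name (unit)`), the `{name}.xyz` template, the `key_column` parameter, and both function names are assumptions made for illustration.

```python
import csv
import io
import re


def split_header(header):
    """Split a header such as 'energy / eV' or 'density (g/cm3)'.

    Returns (name, unit), with unit None when no unit is present.
    The two header conventions handled here are assumed, not from the PR.
    """
    header = header.strip()
    match = re.match(r"^(.*?)\s*[/(]\s*([^)]+?)\s*\)?$", header)
    if match:
        return match.group(1), match.group(2)
    return header, None


def load_properties(csv_text, struct_template="{name}.xyz", key_column="name"):
    """Read rows from CSV text and attach units and a structure file name."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = [split_header(h) for h in next(reader)]
    names = [name for name, _ in headers]
    units = {name: unit for name, unit in headers if unit is not None}

    records = []
    for row in reader:
        # Values are kept as strings; type conversion is left out of this sketch.
        record = dict(zip(names, (value.strip() for value in row)))
        # Hypothetical template: the structure file is named after the key column.
        record["struct_file"] = struct_template.format(name=record[key_column])
        records.append(record)
    return records, units
```

Returning the units separately from the records keeps the per-row data flat, which suits storage as JSON documents.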

Query parsing

OpenSearch queries can be relatively complex to construct, so this PR proposes the use of Luqum, which allows queries to be written using the Lucene Query DSL and parsed into an Elastic/OpenSearch string query.
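For orientation, the sketch below shows how a Lucene-syntax expression can be sent to OpenSearch by wrapping it in a `query_string` request body. It deliberately does not use Luqum and is not the PR's implementation; the function name and the example field names are hypothetical.

```python
def lucene_to_body(expression):
    """Wrap a Lucene-syntax expression in an OpenSearch request body.

    Hypothetical helper for illustration only. An empty expression falls
    back to ``match_all`` so that every document is returned.
    """
    expression = expression.strip()
    if not expression:
        return {"query": {"match_all": {}}}
    return {"query": {"query_string": {"query": expression}}}
```

For example, `lucene_to_body("elements:Fe AND n_atoms:[1 TO 10]")` produces a body that leaves the Lucene parsing to the server, whereas a parser such as Luqum makes it possible to inspect or transform the expression on the client first.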

Parsing to enable extra information to be added in abcd/parsers/extras.py is largely unchanged, although I extended it slightly to allow expressions in the form of Lucene queries (e.g. key:value).
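A minimal sketch of accepting both forms of extra information is shown below. It assumes extras arrive as separate `key=value` or `key:value` tokens and that numeric-looking values should be converted; the real parser in abcd/parsers/extras.py may differ, and both function names are hypothetical.

```python
import re


def _convert(raw):
    """Convert a raw string to int or float where possible, else keep it."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw.strip('"')


def parse_extras(tokens):
    """Parse 'key=value' or 'key:value' tokens into a dict.

    Hypothetical helper: the key ends at the first '=' or ':' in the token.
    """
    extras = {}
    for token in tokens:
        match = re.match(r"^([^=:\s]+)\s*[=:]\s*(.*)$", token)
        if match is None:
            raise ValueError(f"Cannot parse extra information: {token!r}")
        key, raw = match.groups()
        extras[key] = _convert(raw)
    return extras
```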

Misc

Note: The initial commits are required for the later OpenSearch commits, but were written on a separate branch, as they focus on implementing poetry for package installation and dependency management and GitHub Actions for unit testing, as well as fixes to query parsing and pymongo for newer versions of the packages. A separate PR could therefore be made for these non-OpenSearch changes, if desired. More general changes to legacy code due to the use of flake8 and black could also be separated out, but would be more work to untangle.

To do

Remaining work is documented in more detail here. Of the features that already exist for MongoDB, testing integration with the GUI is perhaps the most significant one still to be worked on. A number of new features will also be required for PSDI, including integration with AiiDA and external databases, storage of potentials, and new metadata.

@ElliottKasoar force-pushed the add_opensearch branch 2 times, most recently from 114c10a to 861e685 (September 7, 2023 14:58)
@ElliottKasoar
Contributor Author

Note: further changes to be added following merge of ElliottKasoar#31

As discussed with @stenczelt, ideally this will be split into 2-3 PRs (CI + OpenSearch)

codecov bot commented Jun 12, 2024

Codecov Report

Attention: Patch coverage is 78.87029% with 101 lines in your changes missing coverage. Please review.

Please upload report for BASE (master@25a79ff). Learn more about missing BASE report.

| Files | Patch % | Lines |
|---|---|---|
| abcd/backends/atoms_opensearch.py | 87.94% | 34 Missing ⚠️ |
| abcd/backends/utils.py | 39.62% | 32 Missing ⚠️ |
| abcd/frontends/commandline/commands.py | 48.14% | 14 Missing ⚠️ |
| abcd/backends/atoms_pymongo.py | 52.38% | 10 Missing ⚠️ |
| abcd/backends/atoms_properties.py | 89.85% | 7 Missing ⚠️ |
| abcd/frontends/commandline/decorators.py | 75.00% | 3 Missing ⚠️ |
| abcd/model.py | 66.66% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff            @@
##             master     #107   +/-   ##
=========================================
  Coverage          ?   59.29%           
=========================================
  Files             ?       25           
  Lines             ?     1646           
  Branches          ?        0           
=========================================
  Hits              ?      976           
  Misses            ?      670           
  Partials          ?        0           


@ElliottKasoar force-pushed the add_opensearch branch 9 times, most recently from 8e4036d to 0e30256 (June 12, 2024 19:54)