Feature: Enhance SOLR integration and add a Schema API #54
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## development #54 +/- ##
===============================================
+ Coverage 85.93% 86.36% +0.43%
===============================================
Files 41 46 +5
Lines 2702 3022 +320
===============================================
+ Hits 2322 2610 +288
- Misses 380 412 +32
@syphax-bouazzouni, this looks great! Really useful feature for us as well! It would be great to test it against the latest Solr 9.5.0 to make sure the code is compatible. I am wrapping up my work on a few search enhancements for the RADx project, which we plan to deploy to production shortly. The next major task is upgrading Solr to the latest version, which may very well coincide with merging this pull request. Are you planning to make any other significant changes to this feature (you mentioned that this is a "first iteration")? Thank you!
Hello @mdorf, I tested it locally with Solr 9.5.0 and all the tests are green.
* add an abstraction to SOLR integration and add Schema API
* add SOLR Schema API tests
* update SOLR backend configuration and init
* use the new Solr connector in the model search interface
* update search test to cover the new automatic indexing and unindexing
* handle the solr container initialization when running docker for tests
* add omit_norms options for SolrSchemaGenerator
* fix solr schema initial dynamic fields declaration and replace the usage of mapping-ISOLatin1Accent
* delay the schema generation to after model declarations or on demand
* add solr edismax filters tests
* fix indexBatch and unindexBatch tests
* add security checks to the index and unindex functions
* change dynamic fields names to have less code migration
* update clear_all_schema to remove all copy and normal fields
* add an option to force solr initialization if wanted
* handle indexing embed objects of a model
* add index update option
* fix clear all schema to just remove all the fields and recreate them
* add index_enabled? helper for models
* perform a status test when initializing the solr connector
* extract init_search_connection function from init_search_connections
* fix typo in indexOptimize call
* add solr search using HTTP post instead of GET for large queries
@syphax-bouazzouni, I am working on merging this functionality into our develop branch. It's a bit tricky given that the pull request is not against our own repo, so it will probably need a lot of manual merging. Were you planning on submitting this pull request against the ncbo repo? If so, should I wait for that or just proceed with my manual merging? Thank you!
Hello, the proposal was to move and create PRs directly on the OntoPortal repo now that everyone is positioned under it. I think @syphax-bouazzouni has scheduled some time to create a PR related to our work soon.
Hello @mdorf, I would suggest waiting at least 1 month before merging this, as it has only been tested in our development environment and will ship in our next release, see ontoportal-lirmm/ontologies_api#73. Once it is deployed to our production environment and tested, I will open a PR. Does that work for you?
@syphax-bouazzouni, @jonquet, no problem. In general, we are very interested in this feature to facilitate the functionality sought by the RADx project, in which our Solr index would be packaged in a way to be accessible by the third-party API. See bmir-radx/radx-project#49. However, based on my conversation with @alexskr, this is not immediately urgent. A month is definitely a reasonable wait for us to be able to merge this feature in its more stable and tested iteration.
…DF 3.0 and SOLR API (#58)
* Feature: Add Virtuoso, Allegrograph and Graphdb integration to GOO (#48)
* simplify the test configuration init
* add docker based tests rake task to run test against 4s, ag, gb, vo
* remove faraday gem usage
* update test CI to test against all the supported backends with different slice sizes
* add high level helper to know which backend we are currently using
* extract sparql processor module from where module
* handle language_match? value to upcase by default
* add support for virtuoso and graphdb sparql client
* replace delete sparql query by delete graph in the model complex test
* add some new edge cases tests to test_where.rb and test_schemaless
* make test_chunks_write.rb tests support multiple backends
* replace native insert_data with execute_append_request in model save
* remove add_rules as it seems to no more be used
* move expand_equivalent_predicates from loader to builder module
* build two different queries depending on which backend used
* update mapper to handle the two different queries depending on the backend used
* simplify the loader code, by removing inferable variables
* refactor and simplify map_attributes method
* fix test chunks write concurrency issues
* Refactor: clean model settings module code (#52)
* remove old file no more used
* extract attribute settings module from the model settings module
* remove the inmutable feature as deprecated and not used
* rename callbacks method names
* Feature: Add after_save and after_destroy hooks to models (#53)
* remove old file no more used
* extract attribute settings module from the model settings module
* remove the inmutable feature as deprecated and not used
* rename callbacks method names
* add hooks module
* Feature: update rdf gem to latest version (#56)
* un pin rdf version, to use the latest and add rdf vocab and xml
* update URI class monkey patch because Addressable does no more exist
* RDF::SKOS is replaced with RDF::Vocab::SKOS in the latest version of RDF
* pin rdf version to 3.2.11 the latest version that support ruby 2.7
* monkey patch Literal::DateTime format to be supported by 4store
* remove addressable dependency
* Fix: saving a model removing unmodified attributes after consecutive save
* Fix: enforce to use str() when doing a filter with a string value (#57)
* enforce to use str() when doing a filter with a string
* update agraph version to 8.1.0
* Fix: monkey patch RDF to not remove xsd:string by default
* Feature: Enhance SOLR integration and add a Schema API (#54)
* add an abstraction to SOLR integration and add Schema API
* add SOLR Schema API tests
* update SOLR backend configuration and init
* use the new Solr connector in the model search interface
* update search test to cover the new automatic indexing and unindexing
* handle the solr container initialization when running docker for tests
* add omit_norms options for SolrSchemaGenerator
* fix solr schema initial dynamic fields declaration and replace the usage of mapping-ISOLatin1Accent
* delay the schema generation to after model declarations or on demand
* add solr edismax filters tests
* fix indexBatch and unindexBatch tests
* add security checks to the index and unindex functions
* change dynamic fields names to have less code migration
* update clear_all_schema to remove all copy and normal fields
* add an option to force solr initialization if wanted
* handle indexing embed objects of a model
* add index update option
* fix clear all schema to just remove all the fields and recreate them
* add index_enabled? helper for models
* perform a status test when initializing the solr connector
* extract init_search_connection function from init_search_connections
* fix typo in indexOptimize call
* add solr search using HTTP post instead of GET for large queries
* make indexed resource_id case insensitive (#59)
* Fix: Invalidating cache after insertion of a new element (#60)
* create a test to reproduce the cache invalidate on insert bug
* use again insert_data instead of execute_append_request because the first invalidate the cache
* update sparql client to version 3.2.0
* handle the case virtuoso insert data bug
* use development branch of sparql-client
* fix search resource_id case insensitive by using string_ci instead
Prerequisites
Goals
Context
SOLR is the indexing tool we use for our search features. It works by defining a collection (the equivalent of a table in the database world). For each collection, a schema defines the properties to index, giving each one's type, whether it is a list or not, and so on, along with dynamic or special fields that handle fuzzy search and similar features.
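To make the vocabulary above concrete, here is a minimal Ruby sketch of what a collection schema describes. It is an in-memory illustration only: the `SCHEMA` constant, the field names, and the `resolve_field` helper are hypothetical and are not taken from this PR. Picking the first matching dynamic field in declaration order is a simplification of how SOLR chooses between patterns.

```ruby
# Hypothetical, simplified picture of a SOLR collection schema.
# Field names and types are illustrative, not taken from this project.
SCHEMA = {
  # Explicit fields: a name, a type, and whether the field holds a list.
  fields: [
    { name: "id",        type: "string",       multi_valued: false },
    { name: "prefLabel", type: "text_general", multi_valued: false },
    { name: "synonym",   type: "text_general", multi_valued: true  }
  ],
  # Dynamic fields: a glob pattern applied to field names that are
  # not declared explicitly.
  dynamic_fields: [
    { name: "*_t",   type: "text_general", multi_valued: false },
    { name: "*_txt", type: "text_general", multi_valued: true  }
  ]
}.freeze

# Returns the definition that applies to a document field name:
# the explicit field if one exists, otherwise the first dynamic field
# whose pattern matches, otherwise nil.
def resolve_field(schema, field_name)
  explicit = schema[:fields].find { |f| f[:name] == field_name }
  return explicit if explicit

  schema[:dynamic_fields].find { |f| File.fnmatch(f[:name], field_name) }
end
```

With this sketch, `resolve_field(SCHEMA, "synonym")` returns the explicit list field, while a name such as `"label_t"` falls through to the `*_t` dynamic field.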
Previously, we had to define the collection and its schema in XML configuration files up front, and could not change them from the code afterwards. Because these files were static, you had to update the schema and recreate the collection configuration files each time you wanted to add something to the index. This limited what we could do and made it hard to add new search features to our system.
This PR integrates the SOLR Schema API into this project, which lets us create or delete a collection and update a collection's schema dynamically. The full list of implemented actions is in the SOLR::Admin, SOLR::Schema and SOLR::SchemaGenerator modules in the code.
In addition to implementing the SOLR Schema and Admin APIs, we added a DSL to the Goo model to enable indexing for any model, either in schemaless mode or with a custom schema.
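As a rough illustration of what "dynamic" means here, the sketch below builds the kind of requests that SOLR's Schema API and Collections API accept. It is not the code from this PR: the `SolrRequestSketch` module and its method names are hypothetical, the base URL and collection name are placeholders, and the collection URLs assume SOLR runs in SolrCloud mode. The `add-field` and `add-dynamic-field` commands and the `action=CREATE` / `action=DELETE` parameters are SOLR's own.

```ruby
require "json"
require "uri"

# Hypothetical helper, NOT the PR's SOLR::Admin or SOLR::Schema modules.
# It only builds request bodies and URLs; it performs no HTTP calls.
module SolrRequestSketch
  module_function

  # JSON body for POST <base_url>/solr/<collection>/schema adding one field.
  def add_field_body(name, type, multi_valued: false, stored: true, indexed: true)
    JSON.generate(
      "add-field" => {
        "name" => name, "type" => type,
        "multiValued" => multi_valued, "stored" => stored, "indexed" => indexed
      }
    )
  end

  # JSON body for the same endpoint adding a dynamic field such as "*_txt".
  def add_dynamic_field_body(pattern, type, multi_valued: false)
    JSON.generate(
      "add-dynamic-field" => {
        "name" => pattern, "type" => type, "multiValued" => multi_valued
      }
    )
  end

  # URL of the Collections API call that creates a collection.
  def create_collection_url(base_url, name, num_shards: 1, config_set: "_default")
    query = URI.encode_www_form(
      [["action", "CREATE"], ["name", name],
       ["numShards", num_shards], ["collection.configName", config_set]]
    )
    "#{base_url}/solr/admin/collections?#{query}"
  end

  # URL of the Collections API call that deletes a collection.
  def delete_collection_url(base_url, name)
    query = URI.encode_www_form([["action", "DELETE"], ["name", name]])
    "#{base_url}/solr/admin/collections?#{query}"
  end
end
```

Because the schema is changed through plain HTTP requests like these, adding a field to the index becomes a code change rather than an edit to static XML files followed by a redeploy.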
Changes