SIREn is an extension for Apache Lucene and Solr. SIREn adds new features to Lucene and Solr for processing and searching highly heterogeneous semi-structured data (e.g., RDF). In essence, SIREn adds a new "Field Type" with a set of specific tools such as Analyzers, Query Operators and a Query Parser. If you are looking for a way to:
- have a real schema-less solution, i.e., you don't have to define all of your fields ahead of time, without penalties on the system performance;
- search efficiently over millions of fields;
- create a Lucene Document containing sub-elements (i.e., nested child elements);
then SIREn might be a solution for you.
SIREn extends Lucene and Solr, meaning that you can still use all the features of Solr in conjunction with the features provided by SIREn.
As SIREn introduces a new "Field Type" with a different data model than what can be found in Lucene, SIREn needs its own implementation of each query type supported by Lucene. Currently, most of the core query types that can be found in Lucene have been implemented for SIREn. The table below summarises the current status.
Query Types | SIREn | Lucene |
---|---|---|
Boolean Query | Yes | Yes |
Phrase Query | Yes | Yes |
Proximity (Span) Query | No | Yes |
Wildcard Query | Yes | Yes |
Prefix Query | Yes | Yes |
Fuzzy Query | Yes | Yes |
Range Query | Yes | Yes |
Numeric Range Query | Yes | Yes |
In addition, SIREn provides new query types such as Tuple Query and Cell Query. These new query types allow more complex "structured queries" than Lucene offers. For example, by using these query types, it is possible to search efficiently over an unlimited number of fields, or to query nested child elements.
The SIREn query types are compatible with the Lucene Boolean query type, i.e., you can combine SIREn query types using the Lucene BooleanQuery.
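To make the idea of structured queries combined with Boolean logic more concrete, here is a toy in-memory sketch. It does not use the SIREn or Lucene APIs: the class `TupleModelSketch`, the helpers `cell` and `tuple`, and the data layout (a document as a list of tuples, each tuple a list of text cells) are all assumptions made for illustration. The point is only that the field name lives inside the data, so no field has to be declared ahead of time.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Toy model only: these names are hypothetical and are NOT SIREn's API.
class TupleModelSketch {

    // Hypothetical "cell query": the cell at the given position in a tuple
    // contains the term (case-insensitive substring match).
    static Predicate<List<String>> cell(int index, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return tuple -> index < tuple.size()
                && tuple.get(index).toLowerCase(Locale.ROOT).contains(needle);
    }

    // Hypothetical "tuple query": a document matches when at least one of
    // its tuples satisfies every cell condition at the same time.
    @SafeVarargs
    static Predicate<List<List<String>>> tuple(Predicate<List<String>>... cells) {
        return doc -> doc.stream().anyMatch(t -> {
            for (Predicate<List<String>> c : cells) {
                if (!c.test(t)) {
                    return false;
                }
            }
            return true;
        });
    }

    public static void main(String[] args) {
        // Each tuple stores a field name in cell 0 and a value in cell 1.
        List<List<String>> doc = List.of(
                List.of("title", "Lucene in Action"),
                List.of("author", "Jane Doe"));

        Predicate<List<List<String>>> titleQuery =
                tuple(cell(0, "title"), cell(1, "lucene"));
        Predicate<List<List<String>>> authorQuery =
                tuple(cell(0, "author"), cell(1, "lucene"));

        // Boolean combination, analogous in spirit to a BooleanQuery.
        System.out.println(titleQuery.and(authorQuery).test(doc));
        System.out.println(titleQuery.or(authorQuery).test(doc));
    }
}
```

In this sketch the two structured conditions are combined with `Predicate.and` and `Predicate.or`, which plays the role that the Lucene BooleanQuery plays for real SIREn queries.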
In the future, SIREn will offer new query types that are similar to XPath, such as the Parent-Child query types.
Yes, SIREn returns a list of results that are automatically ranked based on their relevance to your query.
Yes, you can create arbitrary facets using SIREn's query with the Solr Query Faceting feature.
Yes, SIREn supports highlighting.
At the moment, SIREn does not support sorting on a particular value of a SIREn field. This might be supported in a future release. However, you can still sort on a Lucene field.
Yes. Similarly to Lucene, language agnostic search can be achieved by carefully designing your indexing and querying analysis pipeline, e.g., by using appropriate word stemming filters.
In addition, SIREn provides more flexibility than Lucene/Solr for such a task. In Lucene, a field is restricted to a single analyzer. In SIREn, you can associate an analyzer with each field value, i.e., SIREn allows more than one analyzer to be used for a single field. You can then have one field with multiple values, each one in a different language, and you can configure SIREn to use a different analyzer based on the language of the value.
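The per-value analyzer idea can be sketched in plain Java. This is a conceptual model only, not SIREn configuration or Lucene code: `PerValueAnalysis`, `TaggedValue`, the `ANALYZERS` map and the crude "stemming" rules are hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model only: none of these names come from SIREn or Lucene.
class PerValueAnalysis {

    // A field value tagged with the language it is written in.
    static final class TaggedValue {
        final String language;
        final String text;

        TaggedValue(String language, String text) {
            this.language = language;
            this.text = text;
        }
    }

    // Fallback "analyzer": lowercase and split on whitespace.
    static List<String> analyzeDefault(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Crude plural stripping stands in for a real English stemmer.
    static List<String> analyzeEnglish(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : analyzeDefault(text)) {
            boolean plural = token.endsWith("s") && token.length() > 3;
            terms.add(plural ? token.substring(0, token.length() - 1) : token);
        }
        return terms;
    }

    // Dropping a leading "l'" stands in for a real French elision filter.
    static List<String> analyzeFrench(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : analyzeDefault(text)) {
            terms.add(token.startsWith("l'") ? token.substring(2) : token);
        }
        return terms;
    }

    // Hypothetical registry mapping a language tag to its analyzer.
    static final Map<String, Function<String, List<String>>> ANALYZERS = Map.of(
            "en", PerValueAnalysis::analyzeEnglish,
            "fr", PerValueAnalysis::analyzeFrench);

    // Analyze every value of one field, choosing the analyzer per value.
    static List<String> analyzeField(List<TaggedValue> values) {
        List<String> terms = new ArrayList<>();
        for (TaggedValue value : values) {
            Function<String, List<String>> analyzer =
                    ANALYZERS.getOrDefault(value.language, PerValueAnalysis::analyzeDefault);
            terms.addAll(analyzer.apply(value.text));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<TaggedValue> label = List.of(
                new TaggedValue("en", "Red Cars"),
                new TaggedValue("fr", "L'auto rouge"));
        System.out.println(analyzeField(label));
    }
}
```

The design choice illustrated here is that the analyzer is selected from metadata attached to each value rather than from the field as a whole, so a single field can hold values in several languages.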