This package allows indexing of Django models in Elasticsearch. It requires Django, elasticsearch-py, elasticsearch-dsl, and a running instance of Elasticsearch.
- Management commands (`rebuild_index`, `update_index`, `clear_index`)
- Django signal receivers on save and delete for keeping ES in sync
- Complex field type support (`ObjectField`, `NestedField`, `ListField`)
- Built on the features of elasticsearch-dsl
Add `elasticmodels` to `INSTALLED_APPS`.
You must define `ELASTICSEARCH_CONNECTIONS` in your Django settings. For example:
```python
ELASTICSEARCH_CONNECTIONS = {
    'default': {
        'hosts': ['http://localhost:9200'],
        'index_name': 'my_index',
    },
    'fish': {
        'hosts': ['http://example.com:9200'],
        'index_name': 'fish',
    },
}
```
Now consider a model like this, defined in our app's `models.py` file:
```python
from django.db import models

class Car(models.Model):
    # note: CharField requires a max_length
    license = models.CharField(primary_key=True, max_length=32)
    color = models.CharField(max_length=32)
    description = models.TextField()
    type = models.IntegerField(choices=[
        (1, "Sedan"),
        (2, "Truck"),
        (4, "SUV"),
    ])
```
To make this model work with Elasticsearch, create a subclass of `elasticmodels.Index`. You can define the class wherever you want. We'll put it in a file called `indexes.py` inside our Django app:
```python
from elasticmodels import Index, StringField, IntegerField, NestedField

from .models import Car

class CarIndex(Index):
    class Meta:
        model = Car
        fields = [
            'license',
            'color',
            'description',
            'type',
        ]
```
The Index subclass needs to be imported during the normal execution of your
program. A good way to make that happen is to place this line of code at the
bottom of the `models.py` file:

```python
from .indexes import *  # noqa isort:skip
```
We have to put it at the bottom of the file; otherwise, we'd have a circular
import problem. (The `# noqa isort:skip` comment is useful if you're using the
flake8 and isort packages.)
Elasticmodels will automatically set up a mapping in Elasticsearch for the Car
model, where the Elasticsearch fields are derived from the `fields` attribute
on the Meta class.
To create the Elasticsearch index and mappings, use the rebuild_index management command:

```
./manage.py rebuild_index
```
Now, when you do something like:

```python
car = Car(license="PYNERD", color="red", type=1, description="A beautiful car")
car.save()
```

the object will be saved in Elasticsearch too (using a signal handler). To get a pre-filtered elasticsearch-dsl `Search` instance, use:

```python
CarIndex.objects.all()
# or
CarIndex.objects.filter("term", color="red")
# or
CarIndex.objects.query("match", description="beautiful")
```

The return value of these method calls is an elasticsearch-dsl `Search` instance.
Let's say you don't want to store the type of the car as an integer, but as the corresponding string instead. You need some way to convert the type field on the model to a string, so we'll just add a method for it:
```python
class Car(models.Model):
    # ... #

    def type_to_string(self):
        """Convert the type field to its string representation (the boneheaded way)"""
        if self.type == 1:
            return "Sedan"
        elif self.type == 2:
            return "Truck"
        else:
            return "SUV"
```
Now we need to tell our Index subclass to use that method instead of just
accessing the `type` field on the model directly. Change the CarIndex to look
like this:
```python
class CarIndex(Index):
    # add a string field to the Elasticsearch mapping called type, the value of
    # which is derived from the model's type_to_string attribute
    type = StringField(attr="type_to_string")

    class Meta:
        model = Car
        # we removed the type field from here
        fields = [
            'license',
            'color',
            'description',
        ]
```
Of course, we need to rebuild the index (`./manage.py rebuild_index`) after we
make a change like this.
Now when a Car is saved, to determine the value to use for the "type" field,
Elasticmodels looks up the attribute "type_to_string", sees that it's callable,
and calls it (instead of just accessing `model_instance.type` directly).
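The lookup-then-call behavior can be sketched in plain Python. This is only an illustration of the behavior described above, not elasticmodels' actual code; the `resolve` helper and the stripped-down `Car` class are hypothetical:

```python
# A sketch of "attribute or callable" resolution: fetch the attribute, and
# if it turns out to be callable, call it to get the value.

class Car:
    def __init__(self, type):
        self.type = type

    def type_to_string(self):
        # the same "boneheaded" mapping as above, written as a dict lookup
        return {1: "Sedan", 2: "Truck"}.get(self.type, "SUV")

def resolve(instance, attr):
    """Fetch `attr` from `instance`, calling it if it is callable."""
    value = getattr(instance, attr)
    return value() if callable(value) else value

car = Car(type=2)
print(resolve(car, "type_to_string"))  # Truck
print(resolve(car, "type"))            # 2
```

The same `resolve` call works whether the index field points at a plain model field or at a method, which is what lets `attr="type_to_string"` slot in transparently.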
Elasticsearch supports object and nested field types. So does Elasticmodels.
Consider a property like this on our Car model:
```python
class Car(models.Model):
    # ... #

    @property
    def extra_data(self):
        """Generate some extra data to save with the model in ES"""
        return {
            "a_key": "a value",
            "another_key": 5,
        }
```
We can add a NestedField or ObjectField to our CarIndex to save this extra_data to ES.
```python
class CarIndex(Index):
    type = StringField(attr="type_to_string")

    extra_data = NestedField(properties={
        "a_key": StringField(),
        "number": IntegerField(attr="another_key"),
    })

    class Meta:
        model = Car
        fields = [
            'license',
            'color',
            'description',
        ]
```
When a Car is saved, `model_instance.extra_data` will be looked up (and called
if callable), and whatever it returns will be used as the basis to populate the
sub-fields listed in `properties`.
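How the sub-fields in `properties` pull values out of the returned dict can be sketched like this. It's a hypothetical illustration, not the library's implementation; note how `attr="another_key"` remaps a source key to the differently named `number` field:

```python
# The dict our extra_data property returns
extra_data = {"a_key": "a value", "another_key": 5}

# ES field name -> source key to read (the attr, or the field name itself)
properties = {"a_key": "a_key", "number": "another_key"}

# Build the nested document by reading each sub-field's source key
doc = {es_field: extra_data[source_key]
       for es_field, source_key in properties.items()}
print(doc)  # {'a_key': 'a value', 'number': 5}
```

The resulting nested document is what gets stored under `extra_data` in Elasticsearch.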
If you want to store a list of values for a particular field, wrap the field
with `ListField`:
```python
class Car(models.Model):
    # ... #

    @property
    def some_stuff(self):
        """Generate some extra data to save with the model in ES"""
        return ["alpha", "beta", "gamma"]

class CarIndex(Index):
    # ... #
    some_stuff = ListField(StringField(attr="some_stuff"))
    # ... #
```
Sometimes, you need to do some extra prepping before a field is saved to
Elasticsearch. You can add a `prepare_foo(self, instance)` method to an Index
(where foo is the name of the field), and it will be called when the field
needs to be saved:
```python
class CarIndex(Index):
    # ... #
    foo = StringField()

    def prepare_foo(self, instance):
        return " ".join(instance.foos)
    # ... #
```
The Index subclass has a `get_queryset` method that, by default, just returns the queryset of the model's default manager. If your indexed fields span foreign keys, you may want to override this to select the related models:
```python
class CarIndex(Index):
    # ... #
    # assume `make` and `model` are foreign keys on the Car model
    make = StringField(attr="make.name")
    model = StringField(attr="model.name")
    # ... #

    def get_queryset(self, start=None, end=None):
        return super().get_queryset(start, end).select_related("make", "model")
```
Now when the CarIndex is rebuilt, it won't have to do a bunch of extra queries
to access the `make` and `model` fields.
The get_queryset method takes `start` and `end` datetime objects as arguments,
which is useful if you're using the `--start` and `--end` flags with the
management commands. The default implementation of get_queryset uses those
arguments to filter based on the field named by the `date_field` attribute
on the Meta class.
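Concretely, that default filtering can be sketched with plain Python objects in place of a Django queryset. This is assumed behavior for illustration; `FakeCar` and this standalone `get_queryset` are made up, not elasticmodels' code:

```python
from datetime import datetime

class FakeCar:
    def __init__(self, license, modified_on):
        self.license = license
        self.modified_on = modified_on

def get_queryset(objects, date_field, start=None, end=None):
    # roughly equivalent to:
    #   queryset.filter(modified_on__gte=start, modified_on__lte=end)
    result = objects
    if start is not None:
        result = [o for o in result if getattr(o, date_field) >= start]
    if end is not None:
        result = [o for o in result if getattr(o, date_field) <= end]
    return result

cars = [
    FakeCar("OLD", datetime(2014, 1, 1)),
    FakeCar("NEW", datetime(2015, 6, 1)),
]

# only objects modified on/after the start date are selected
selected = get_queryset(cars, "modified_on", start=datetime(2015, 1, 1))
print([c.license for c in selected])  # ['NEW']
```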
Elasticmodels watches for the post_save and post_delete signals and updates the ES index appropriately.
If you're updating a bunch of objects at once, you should use the `suspended_updates` context manager so you can more efficiently batch process the ES updates:
```python
from elasticmodels import suspended_updates

with suspended_updates():
    model1.save()
    model2.save()
    model3.save()
    model4.delete()
```
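The batching idea behind `suspended_updates` can be sketched as follows. This is the assumed general pattern, not the library's actual implementation; `index_update`, `send_to_es`, and the module-level state are made up for illustration:

```python
from contextlib import contextmanager

sent_batches = []   # stand-in for bulk requests actually sent to ES
pending = []        # updates queued while suspended
suspended = False

def send_to_es(batch):
    # stand-in for a bulk indexing request
    sent_batches.append(list(batch))

def index_update(obj):
    if suspended:
        pending.append(obj)   # queue the update for later
    else:
        send_to_es([obj])     # normally, send immediately

@contextmanager
def suspended_updates():
    global suspended
    suspended = True
    try:
        yield
    finally:
        suspended = False
        if pending:
            send_to_es(pending)   # one bulk request instead of many
            pending.clear()

with suspended_updates():
    index_update("model1")
    index_update("model2")
    index_update("model3")

print(sent_batches)  # [['model1', 'model2', 'model3']]
```

Instead of three separate requests, the three updates go out as a single batch when the context manager exits.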
```
clear_index [--using default --using ...] [--noinput] <app[.model] app[.model] ...>
```
By default, this clears every model index (an Elasticsearch mapping), prompting before doing it. You can limit which connections and models/apps are affected.
```
update_index [--using default --using ...] [--start yyyy-mm-dd] [--end yyyy-mm-dd] <app[.model] app[.model] ...>
```
Update every model index. You can limit the scope of the updates by passing a start and end date, and/or which models/apps/connections to use.
```
rebuild_index [--clopen] [--using default --using ...] [--noinput] <app[.model] app[.model] ...>
```
Shortcut for clear_index followed by update_index. It will detect a conflict in your
analyzers. If there is a conflict, it will show a diff of the analysis sections
defined in Python and in ES. Use `--clopen` to close the ES index, update the
analysis, and reopen the ES index.
Most Elasticsearch field types are supported. The `attr` argument is a dotted
"attribute path" which will be looked up on the model using Django template
semantics (dict lookup, attribute lookup, list-index lookup). For example,
`attr="foo.bar"` will try to fetch the first value that doesn't raise an
exception, in this order:
```
if instance['foo'] doesn't raise an exception:
    instance['foo']['bar']
    instance['foo'].bar
    instance['foo'][bar]
elif instance.foo doesn't raise an exception:
    instance.foo['bar']
    instance.foo.bar
    instance.foo[bar]
elif instance[foo] doesn't raise an exception:
    instance[foo]['bar']
    instance[foo].bar
    instance[foo][bar]
```
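That fallback chain can be sketched as a small helper. This is an illustration with simplified error handling, not elasticmodels' actual code; the `lookup` function and sample data are hypothetical:

```python
def lookup(instance, attr):
    """Resolve a dotted path using Django-template-style semantics:
    dict lookup, then attribute lookup, then list-index lookup."""
    value = instance
    for segment in attr.split("."):
        try:
            value = value[segment]                 # dict lookup
        except (TypeError, KeyError, IndexError):
            try:
                value = getattr(value, segment)    # attribute lookup
            except AttributeError:
                value = value[int(segment)]        # list-index lookup
    return value

class Bar:
    baz = "hello"

data = {"foo": {"bar": 1}, "obj": Bar(), "items": ["a", "b"]}
print(lookup(data, "foo.bar"))   # 1
print(lookup(data, "obj.baz"))   # hello
print(lookup(data, "items.1"))   # b
```

Each segment falls through the three lookup styles independently, which is why a single path like `"items.1"` can mix dict access with a numeric list index.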
Extra keyword arguments are passed directly to elasticsearch when the mapping is created.
- StringField(attr=None, **elasticsearch_properties)
- FloatField(attr=None, **elasticsearch_properties)
- DoubleField(attr=None, **elasticsearch_properties)
- ByteField(attr=None, **elasticsearch_properties)
- ShortField(attr=None, **elasticsearch_properties)
- IntegerField(attr=None, **elasticsearch_properties)
- DateField(attr=None, **elasticsearch_properties)
- BooleanField(attr=None, **elasticsearch_properties)
For ObjectField and NestedField, `properties` is a dict where each key is a field
name and each value is a field instance.
- TemplateField(template_name, **elasticsearch_properties)
- ObjectField(properties, attr=None, **elasticsearch_properties)
- NestedField(properties, attr=None, **elasticsearch_properties)
- ListField(field)
You can define analyzers and use them on fields:
```python
from elasticmodels import Index, ListField, IntegerField, StringField
from elasticsearch_dsl import analyzer, tokenizer, token_filter

name = analyzer(
    "name",
    # the standard analyzer splits the words nicely by default
    tokenizer=tokenizer("standard"),
    filter=[
        # technically, the standard filter doesn't do anything but we include
        # it anyway just in case ES decides to make use of it
        "standard",
        # obviously, lowercasing the tokens is a good thing
        "lowercase",
        # ngram it up
        token_filter(
            "simple_edge",
            type="nGram",
            min_gram=2,
            max_gram=4,
        ),
    ],
)
```
```python
class CarIndex(Index):
    # ... #
    # use the builtin ES keyword analyzer
    foo = StringField(analyzer="keyword")
    # use our fancy analyzer
    name = StringField(analyzer=name)
    # ... #
```
When the mapping is created in ES, the analyzer will be created for you.
```python
class Meta:
    # a list of model field names, as strings, which will be included in the
    # ES mapping
    fields = []

    # the mapping name to use for this index in Elasticsearch. The
    # default is derived from the app and model name
    doc_type = "appname_modelname"

    # the ELASTICSEARCH_CONNECTIONS connection to use for this index
    using = "default"

    # the ES dynamic property to use for the mapping
    # dynamic = "strict" <-- this isn't supported by elasticsearch-dsl yet

    # the field to use for management commands when using the `--start` and
    # `--end` options. The default is None.
    date_field = "modified_on"

    # when the .save() or .delete() method is called on a model object, any
    # indexes for that model will automatically update the index in ES. If
    # you don't want that behavior, change this to True
    ignore_signals = False
```
In your settings file, set:

```python
TEST_RUNNER = 'elasticmodels.SearchRunner'
```

or subclass it with your own test runner. By default, no data is inserted/updated/deleted in Elasticsearch by Elasticmodels during tests, because it's slow.
If you need a TestCase that actually hits ES, add the mixin `elasticmodels.ESTestCase`
(this class subclasses unittest.TestCase). For each test, all the indexes are destroyed and recreated. The index names are suffixed with "_test" so your real data is not clobbered. If you override setUp(), make sure you call the superclass's setUp() before creating model objects you want inserted into Elasticsearch (via the post_save handler).
Here's an example:
```python
import elasticmodels
from django.test import TestCase

from project.cars.models import Car
from project.cars.indexes import CarIndex

class SomeTest(TestCase, elasticmodels.ESTestCase):
    # ... #
    def setUp(self):
        # by calling setUp(), elasticmodels will create all the elasticsearch
        # indexes defined in your project, and suffix their names with _test
        super().setUp()
        self.car = Car(license="ND4SPD")
        # when `car` is saved, it will be inserted into elasticsearch via
        # elasticmodels' post_save handler
        self.car.save()

    def test_something(self):
        self.assertEqual(1, CarIndex.objects.query("match", license="ND4SPD").count())
    # ... #
```
To run the test suite (Python 3):

```
make test
```

It is assumed you have Elasticsearch running on localhost:9200. An index called "elasticmodels-unit-test-db" will be used.