Skip to content

Commit

Permalink
Search refactor (#497)
Browse files Browse the repository at this point in the history
## Understandings

- Model filtering `Filtered()`
  - is not working as expected (not 100%)
  - this is implemented per-field (we see 40+ filters instead of 5)
- Indexed fields are not being inherited across models
  - At query generation not index time
- Interim mappings on indexed fields are not necessary 
- Model field name is not treated consistently

## Tasks

- [x] Wagtail update to 5.2 (LTS)
- [x] Update SearchFields - #499 
  - [x] RenamedField mixin to match Wagtail PR understandings
  - [x] `model_field_name` usage needs to be rationalised
- [x] `fields.py`/`index.py` review and agree where things belong
(taking inheritance into account)
- [x] Strip out field_mapping, and use each Model's indexed fields -
#502
  - [x] Remove from index time
  - [x] Remove from query building
  - [x] Remove all functions from SearchFields
- [x] Refactor out IndexManager #504 
- [x] Move functionality out of the `ModelIndexManager` and into
`Indexed`
  - [x] Review `IndexManager`/`QueryBuilder` relationship
  - [x] `get_indexed_models` and `is_class_indexed`
- [x] Add django-silk for local python profiling #508
- [x] Add Nested Query support and resolve field naming inconsistencies
#507
- [x] Add a single place / way to handle various forms of field naming,
including for nested
- [x] Update all existing places in the code (inc. compiler) to use
these new methods
  - [x] Get Nested Query building working inc with parent_field_name
  - [x] Search for `}.{` and review
- [x] Add ability to override a search field from parent classes in
`get_search_fields` and `get_indexed_fields` #512
  - [x] `Filtered` DSL is working as expected
  - [x] QueryBuilder uses `get_indexed_fields`
- [ ] Testing
  - [x] review xfailing tests
  - [x] review full test suite, writing new tests as needed
  - [x] write new smoke tests for all main eventualities
  - [ ] write new playwright e2e tests for main eventualities
- [ ] write smoke tests to cover all expected indexing - e.g.
Person.network etc
- [x] Optimisations #513
- [ ] See if we can't simplify output from `Or` where could be simpler
(ie with combine_queries)
  - [ ] Verify generated ES DSL to see if any obvious stuff there
  - [x] Look into resource usage to check bottlenecks
  - [x] output settings as dict 
  - [x] pre-cache built queries
  - [x] cache expensive model traversal methods
  - [x] prewarm caches
- [x] check everything is in the right file and the files are in the
right folders
- [x] review `@TODO`

---------

Co-authored-by: Marcel Kornblum <[email protected]>
Co-authored-by: marcelkornblum <[email protected]>
Co-authored-by: Cameron Lamb <[email protected]>
  • Loading branch information
4 people authored Nov 27, 2023
1 parent 8407f04 commit de2bd9c
Show file tree
Hide file tree
Showing 64 changed files with 4,507 additions and 2,211 deletions.
2 changes: 2 additions & 0 deletions .env.example
Original file line number Diff line number Diff line change
Expand Up @@ -108,3 +108,5 @@ FEEDBACK_NOTIFICATION_EMAIL_RECIPIENTS="[email protected],"

# Search
SEARCH_SHOW_INACTIVE_PROFILES_WITHIN_DAYS=90
# Manage apllication caching
SEARCH_ENABLE_QUERY_CACHE=False
9 changes: 9 additions & 0 deletions .github/CODEOWNERS
Validating CODEOWNERS rules …
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# Uses CodeOwners to ensure the relevant Tech Lead sees any relevant PRs
#

# Search: Marcel
/src/extended_search/ @marcelkornblum
/src/search/ @marcelkornblum

# PeopleFinder: Cam
/src/peoplefinder @CamLamb
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
*~
/logs
/data
/profiler_results
.~*
common/CACHE
__pycache__
Expand Down Expand Up @@ -68,3 +69,5 @@ ner_output_file.txt
# Playwright
/playwright-report
/playwright/.auth

dw.dump
11 changes: 10 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,9 @@ local-setup:
dump-db:
pg_dump digital_workspace -U postgres -h localhost -p 5432 -O -x -c -f dw.dump

db-from-dump:
PGPASSWORD='postgres' psql -h localhost -U postgres digital_workspace -f dw.dump

reset-db:
docker-compose stop db
rm -rf ./.db/
Expand All @@ -175,6 +178,9 @@ first-use:
superuser:
$(wagtail) python manage.py shell --command="from django.contrib.auth import get_user_model; get_user_model().objects.create_superuser('admin', email='admin', password='password', first_name='admin', last_name='test')"

su-all:
$(wagtail) python manage.py shell --command="from django.contrib.auth import get_user_model; get_user_model().objects.all().update(is_superuser=True, is_staff=True)"

#
# Django
#
Expand Down Expand Up @@ -209,6 +215,9 @@ fixtree:
test:
$(testrunner) pytest -m "not e2e" --reuse-db $(tests)

test-fresh:
$(testrunner) pytest -m "not e2e" $(tests)

test-e2e: up-all
docker-compose exec playwright poetry run pytest -m "e2e" $(tests)
docker-compose stop playwright
Expand Down Expand Up @@ -271,4 +280,4 @@ ingest-uk-staff-locations:
$(wagtail) python manage.py ingest_uk_staff_locations

serve-docs:
poetry run mkdocs serve -a localhost:8002
poetry run mkdocs serve -a localhost:8002
2 changes: 1 addition & 1 deletion package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

126 changes: 96 additions & 30 deletions poetry.lock

Large diffs are not rendered by default.

4 changes: 3 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ django_log_formatter_ecs = "^0.0.5"
django-staff-sso-client = "^4.1.1"
notifications-python-client = "^8.0.0"
# Wagtail
wagtail = "5.1.3"
wagtail = "^5.2"
wagtailmedia = "^0.14.2"
wagtailmenus = "^3.1.8"
wagtail-draftail-anchors = "^0.6.0"
Expand Down Expand Up @@ -84,6 +84,8 @@ blessings = "^1.7"
pytest-mock = "^3.11.1"
pytest-freezer = "^0.4.8"
playwright = "^1.36"
django-silk = "^5.0.4"
snakeviz = "^2.2.0"

[tool.poetry.group.docs.dependencies]
mkdocs-awesome-pages-plugin = "^2.8.0"
Expand Down
2 changes: 1 addition & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -99,7 +99,7 @@ vine==5.0.0 ; python_version >= "3.11" and python_version < "4.0"
wagtail-adminsortable @ git+https://github.com/uktrade/[email protected] ; python_version >= "3.11" and python_version < "4.0"
wagtail-draftail-anchors==0.6.0 ; python_version >= "3.11" and python_version < "4.0"
wagtail-generic-chooser==0.6 ; python_version >= "3.11" and python_version < "4.0"
wagtail==5.1.3 ; python_version >= "3.11" and python_version < "4.0"
wagtail==5.2 ; python_version >= "3.11" and python_version < "4.0"
wagtailmedia==0.14.4 ; python_version >= "3.11" and python_version < "4.0"
wagtailmenus==3.1.9 ; python_version >= "3.11" and python_version < "4.0"
waitress==2.1.2 ; python_version >= "3.11" and python_version < "4.0"
Expand Down
10 changes: 8 additions & 2 deletions src/config/settings/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -98,11 +98,11 @@

# Set up Django
LOCAL_APPS = [
"patch",
"core",
"feedback",
"home",
"content",
"search",
"news",
"working_at_dit",
"tools",
Expand Down Expand Up @@ -174,7 +174,9 @@
+ WAGTAIL_APPS
+ DJANGO_APPS
+ [
"extended_search", # must be last because it depends on models being loaded into memory
# Search apps must be last because it depends on models being loaded into memory
"search",
"extended_search",
]
)

Expand Down Expand Up @@ -464,6 +466,7 @@
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_CACHE_ALIAS = "default"


# Twitter
TWITTER_ACCESS_TOKEN = env("TWITTER_ACCESS_TOKEN")
TWITTER_ACCESS_SECRET = env("TWITTER_ACCESS_SECRET")
Expand Down Expand Up @@ -702,3 +705,6 @@
SEARCH_SHOW_INACTIVE_PROFILES_WITHIN_DAYS = env.int(
"SEARCH_SHOW_INACTIVE_PROFILES_WITHIN_DAYS", 90
)

# Enable the caching of the generated search query DSLs
SEARCH_ENABLE_QUERY_CACHE = env.bool("SEARCH_ENABLE_QUERY_CACHE", True)
20 changes: 20 additions & 0 deletions src/config/settings/developer.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,3 +12,23 @@
INSTALLED_APPS += [ # noqa F405
"django_extensions",
]

try:
# Add django-silk for profiling
import silk # noqa F401

INSTALLED_APPS += [ # noqa F405
"silk",
]
MIDDLEWARE += [ # noqa F405
"silk.middleware.SilkyMiddleware",
]
SILKY_PYTHON_PROFILER = True
SILKY_PYTHON_PROFILER_BINARY = True
SILKY_PYTHON_PROFILER_RESULT_PATH = os.path.join( # noqa F405
PROJECT_ROOT_DIR, # noqa F405
"profiler_results",
)
SILKY_META = True
except ModuleNotFoundError:
...
5 changes: 3 additions & 2 deletions src/config/settings/test.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,10 @@
"AUTO_UPDATE": False,
}

INSTALLED_APPS += [ # noqa F405
INSTALLED_APPS = [
"testapp",
"django_extensions",
]
] + INSTALLED_APPS # noqa F405

# Remove the uk_staff_locations database
if "uk_staff_locations" in DATABASES: # noqa F405
Expand Down
18 changes: 10 additions & 8 deletions src/config/urls.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,12 +11,7 @@

from core.admin import admin_site
from core.urls import urlpatterns as core_urlpatterns
from peoplefinder.urls import (
api_urlpatterns,
people_urlpatterns,
teams_urlpatterns,
)

from peoplefinder.urls import api_urlpatterns, people_urlpatterns, teams_urlpatterns

urlpatterns = [
# URLs for Staff SSO Auth broker
Expand Down Expand Up @@ -50,10 +45,12 @@
path("sitemap.xml", sitemap),
# Feedback
path("feedback/", include(feedback_urls), name="feedback"),
# Wagtail
path("", include(wagtail_urls)),
]

# If django-silk is installed, add its URLs
if "silk" in settings.INSTALLED_APPS:
urlpatterns += [path("silk/", include("silk.urls", namespace="silk"))]

if settings.DEBUG:
from django.conf.urls.static import static
from django.contrib.staticfiles.urls import staticfiles_urlpatterns
Expand All @@ -62,6 +59,11 @@
urlpatterns += staticfiles_urlpatterns()
urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)

urlpatterns += [
# Wagtail
path("", include(wagtail_urls)),
]

# Removed until we find a fix for Wagtail's redirect behaviour
handler404 = "core.views.view_404"
handler500 = "core.views.view_500"
Expand Down
65 changes: 30 additions & 35 deletions src/content/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,8 @@
from content import blocks
from content.utils import manage_excluded, manage_pinned, truncate_words_and_chars
from core.utils import set_seen_cookie_banner
from extended_search.fields import IndexedField
from extended_search.index import Indexed
from extended_search.managers.index import ModelIndexManager
from extended_search.index import DWIndexedField as IndexedField
from peoplefinder.widgets import PersonChooser
from search.utils import split_query
from user.models import User as UserModel
Expand Down Expand Up @@ -141,38 +140,6 @@ def exclusions(self, query):
return self.filter(self.exclusions_q(query))


class ContentPageIndexManager(ModelIndexManager):
fields = [
IndexedField(
"search_title",
tokenized=True,
explicit=True,
fuzzy=True,
boost=5.0,
),
IndexedField(
"search_headings",
tokenized=True,
explicit=True,
fuzzy=True,
boost=3.0,
),
IndexedField(
"excerpt",
tokenized=True,
explicit=True,
boost=2.0,
),
IndexedField(
"search_content",
tokenized=True,
explicit=True,
),
IndexedField("is_creatable", filter=True),
IndexedField("published_date", proximity=True),
]


class ContentOwnerMixin(models.Model):
content_owner = models.ForeignKey(
"peoplefinder.Person",
Expand Down Expand Up @@ -295,7 +262,35 @@ class ContentPage(BasePage):
null=True,
)

search_fields = BasePage.search_fields + ContentPageIndexManager()
indexed_fields = [
IndexedField(
"search_title",
tokenized=True,
explicit=True,
fuzzy=True,
boost=5.0,
),
IndexedField(
"search_headings",
tokenized=True,
explicit=True,
fuzzy=True,
boost=3.0,
),
IndexedField(
"excerpt",
tokenized=True,
explicit=True,
boost=2.0,
),
IndexedField(
"search_content",
tokenized=True,
explicit=True,
),
IndexedField("is_creatable", filter=True),
IndexedField("published_date", proximity=True),
]

@property
def published_date(self):
Expand Down
9 changes: 8 additions & 1 deletion src/content/test/factories.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,15 @@
import factory
import wagtail_factories

from content.models import ContentPage
from peoplefinder.test.factories import PersonFactory


class ContentPageFactory(factory.Factory):
class ContentPageFactory(wagtail_factories.PageFactory):
class Meta:
model = ContentPage


class ContentOwnerFactoryMixin:
content_owner = factory.SubFactory(PersonFactory)
content_contact_email = factory.Faker("email")
4 changes: 0 additions & 4 deletions src/extended_search/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,10 +49,6 @@ DEFAULT_SETTINGS = {
"index_fieldname_suffix": "_keyword",
"query_types": ["phrase"],
},
"proximity": { # @TODO think this needs cleanup to work more like Fuzzy - i.e. built into query but analyzed as a FileterField
"es_analyzer": "keyword",
"index_fieldname_suffix": None,
},
},
}
```
Expand Down
6 changes: 4 additions & 2 deletions src/extended_search/admin.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@

from core.admin import admin_site
from extended_search.models import Setting
from extended_search.settings import extended_search_settings
from extended_search import settings


class SettingAdminForm(forms.ModelForm):
Expand All @@ -15,7 +15,9 @@ class Meta:

def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.fields["key"].choices = [(k, k) for k in extended_search_settings.all_keys]
self.fields["key"].choices = [
(k, k) for k in settings.extended_search_settings.all_keys
]


class SettingAdmin(admin.ModelAdmin):
Expand Down
19 changes: 13 additions & 6 deletions src/extended_search/apps.py
Original file line number Diff line number Diff line change
@@ -1,15 +1,22 @@
from django.apps import AppConfig
from django.conf import settings
from django.conf import settings as django_settings


class ExtendedSearchConfig(AppConfig):
name = "extended_search"

def ready(self):
import extended_search.signals # noqa
from extended_search.settings import extended_search_settings as search_settings
from extended_search import query_builder, settings
from extended_search.index import get_indexed_models

search_settings.initialise_field_dict()
search_settings.initialise_env_dict()
if settings.APP_ENV not in ["test", "build"]:
search_settings.initialise_db_dict()
settings.settings_singleton.initialise_field_dict()
settings.settings_singleton.initialise_env_dict()
if django_settings.APP_ENV not in ["test", "build"]:
settings.settings_singleton.initialise_db_dict()
settings.extended_search_settings = settings.settings_singleton.to_dict()
query_builder.extended_search_settings = settings.extended_search_settings

for model_class in get_indexed_models():
if hasattr(model_class, "indexed_fields") and model_class.indexed_fields:
query_builder.CustomQueryBuilder.build_search_query(model_class, True)
Loading

0 comments on commit de2bd9c

Please sign in to comment.