@isobel-softwire
Contributor

Ticket number

PRSD-1023

Goal of change

Makes LA property search efficient

Description of main change(s)

  • Creates SQL script for generating load test data
  • Adds copy of single line address to property ownership entity, alongside triggers for synchronisation
  • Adds index on property ownership single line addresses
  • Updates property search query accordingly
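A rough sketch of the shape these changes could take, not the actual migration. The table name `property_ownership`, the `id` column, the index name, the search term, and in particular the choice of a `pg_trgm` GIN index are assumptions for illustration; the description only says that the single line address is copied onto the property ownership entity and indexed.

```sql
-- Hypothetical sketch: table, column and index names, and the index type, are assumed.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Denormalised copy of address.single_line_address on the ownership row.
ALTER TABLE property_ownership
    ADD COLUMN IF NOT EXISTS single_line_address text;

-- A trigram GIN index allows ILIKE '%term%' filters to use an index scan.
CREATE INDEX IF NOT EXISTS property_ownership_single_line_address_trgm_idx
    ON property_ownership
    USING gin (single_line_address gin_trgm_ops);

-- The search filters on the ownership table alone, with no join to address.
SELECT po.id, po.single_line_address
FROM property_ownership AS po
WHERE po.single_line_address ILIKE '%' || 'high street' || '%'
ORDER BY po.id
LIMIT 25;
```

If a trigram index is what is used, patterns such as `'%a%'` yield no extractable trigrams, so the search degenerates to a full-index scan. That would be consistent with single-letter searches being slower, though the PR does not state which index type was chosen.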

Anything you'd like to highlight to the reviewer?

  • The SQL script generates 2,496,401 landlords and 4,992,802 property ownerships. This approximates the 2.82 million landlords and 4.7 million properties in the private rented sector (PRS), according to the English Private Landlord Survey.
  • Property searches take about 2 seconds on average, but this varies with how specific the search is. For example, searching for a single letter takes longer.

Checklist

Delete any that are not applicable, and add explanation below for any that are applicable but haven't been done

  • Test suite has been run in full locally and is passing
  • Branch has been rebased onto main and run locally, with everything working as expected (both for your new feature
    and any related functionality)

CREATE TRIGGER update_property_ownership_single_line_address
AFTER UPDATE OF single_line_address ON address
FOR EACH ROW
WHEN (OLD.single_line_address IS DISTINCT FROM NEW.single_line_address)
Collaborator

Does this approach not massively slow down the seeding of the DB with NGD address data?

Contributor Author

This has been removed as we'll be referencing property ownerships from addresses rather than the other way around

Contributor Author

@isobel-softwire Nov 4, 2025

Actually, querying the address table is much slower than querying the property ownership table, even with a partial index. I'll go back to referencing addresses from property ownerships. To combat slowing down NGD data loading, we can refresh the addresses once after loading finishes, rather than using this trigger.

Collaborator

Intrigued that it takes longer - I would expect it to be at least similar - but yes, if we can update them for each batch in one go during ingest instead of using the trigger, that should at least limit the slow-down there.
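The post-load refresh discussed in this thread could look roughly like the statement below. It is only a sketch: the join columns `property_ownership.address_id` and `address.id` are assumed names, not taken from the PR, and the same statement could be restricted to a single batch of addresses if run during ingest.

```sql
-- Hypothetical sketch: the join columns (address_id, id) are assumed names.
-- Run once after the NGD load finishes, in place of a per-row trigger on address.
UPDATE property_ownership AS po
SET single_line_address = a.single_line_address
FROM address AS a
WHERE po.address_id = a.id
  -- Skip rows that are already in sync, so unchanged ownerships are not rewritten.
  AND po.single_line_address IS DISTINCT FROM a.single_line_address;
```

The `IS DISTINCT FROM` filter mirrors the condition in the trigger's `WHEN` clause and also treats NULL values as comparable, so ownership rows whose copy is still NULL are picked up.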

@Travis-Softwire
Collaborator

Does the 2 second average drop if we disallow searches below a certain number of characters? Or is that the best case?

@@ -0,0 +1,389 @@
-- This script populates the database with n^2 landlords, where 'n' is the cardinality of the name array passed to load_search_data().
Collaborator

I did pause a bit over doing all of this in SQL, though, versus doing the manipulation on the host machine in either a Kotlin Gradle task or a standalone Node/TS script and just doing the batched inserts into the database.

Collaborator

As mentioned on Slack - on reflection this is fine, given we'll use it so infrequently.
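For readers unfamiliar with the pattern, the n^2 population described in the script's header comment can be illustrated with a cross join of the name array against itself. This is a minimal sketch of the idea only, not the body of `load_search_data()`; the sample names and column aliases are made up.

```sql
-- Hypothetical sketch: pairs every name with every name, giving n^2 rows for n names.
SELECT first_name || ' ' || last_name AS landlord_name
FROM unnest(ARRAY['Alex', 'Blake', 'Casey']) AS first_name
CROSS JOIN unnest(ARRAY['Alex', 'Blake', 'Casey']) AS last_name;
-- 3 names produce 3 * 3 = 9 rows.
```

Because the row count grows quadratically, a name array of under two thousand entries is enough to reach the millions of landlords mentioned in the PR description.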

Collaborator

@Travis-Softwire left a comment

I've left some comments. I'm not 100% sure about doing this all in SQL: assuming this is run from a dev laptop, it'll all execute on the DB instance rather than the (much more powerful) host. It's also not the most readable language for non-trivial logic! But I'm open to being persuaded that we should do it this way.

@isobel-softwire
Contributor Author

> Does the 2 second average drop if we disallow searches below a certain number of characters? Or is that the best case?

Not necessarily, as longer words that appear in many addresses (e.g. 'Manchester') will still return a large number of results.


3 participants