Load government data in an asynchronous task #1522

Merged: 30 commits into master from bulk-data-cache-nov on Dec 17, 2019

Conversation

kevindew (Member):

Trello: https://trello.com/c/TQ6GPOHU/1228-spike-into-loading-governments-from-publishing-api

This introduces a new system, called BulkData, which is used to load data asynchronously for use by Content Publisher. It is intended for situations where we need to load and cache a large amount of data; loading this on demand can cause timeouts at various places across the app, which is undesirable. The replacement system uses a regular ActiveJob that updates the data every 15 minutes, making the performance impact a background concern. This builds on the work identified in #1021 and is the first step towards implementing it to resolve various timeout problems.

What this PR does specifically is start an ActiveJob whenever the application is deployed, and run it again every 15 minutes. This job calls the Publishing API to fetch all governments and stores the response in Redis for 24 hours.

Whenever Content Publisher needs to look up governments it queries Redis for this data. If the data is unavailable it schedules the job to populate the data and throws an exception. Users will be shown an error until the data is re-populated.

Users are likely to see this error fairly often in development environments if they are not used to starting the worker alongside the app. As a follow-up we should update govuk-docker to start content-publisher with a worker.

Screenshots of error pages:

Production / Test

[Screenshot: production/test error page]

Development

[Screenshot: development error page]

Further information in the commits


module BulkData
class Cache
include Singleton
Contributor:

Nice! I'm surprised you don't need to require "singleton" here.

Member Author:

Ah Rails auto/eager loading takes care of it.

@ChrisBAshton (Contributor) left a comment:

Well done on getting this significant piece of work to this point! I've attempted to review it; some of my questions/suggestions may be missing context, so do feel free to bat back on them.

lib/bulk_data/cache.rb (outdated; resolved)
lib/bulk_data/cache.rb (resolved)

describe ".wrote_at" do
it "returns the time the entry was created" do
time = 2.hours.ago.change(usec: 0)
Contributor:

A shame that the .change(usec: 0) bit is needed. Why is that?

Member Author:

It is a shame indeed. It's because travel_to truncates the microseconds from the time it freezes, so we need ours to match: https://github.com/rails/rails/blob/8bec77cc0f1fd47677a331a64f68c5918efd2ca9/activesupport/lib/active_support/testing/time_helpers.rb#L145

I could use an absolute time, but I need it to be within the last 24 hours, so it's tricky :-/
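For illustration, a minimal sketch of the mismatch; this assumes the .wrote_at API from the spec excerpt above, and the write call is also an assumption:

```
time = Time.zone.now                    # e.g. ...09:15:32.123456
travel_to(time) do                      # travel_to freezes ...09:15:32.000000
  BulkData::Cache.write("key", "value")
end

BulkData::Cache.wrote_at("key") == time                 # false, usec differ
BulkData::Cache.wrote_at("key") == time.change(usec: 0) # true
```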

spec/lib/bulk_data/cache_spec.rb (outdated; resolved)
spec/lib/bulk_data/cache_spec.rb (outdated; resolved)
app/models/edition.rb (resolved)
spec/models/edition_spec.rb (resolved)
@@ -64,6 +64,7 @@ def and_a_publish_intent_is_created

def when_the_publishing_job_runs
travel_to(Time.zone.parse("2019-6-22 9:00"))
populate_default_government_bulk_data
Contributor:

I think the fact that you've had to make this change in 3 places suggests we should move the when_the_publishing_job_runs implementation somewhere shared 🤔; all three implementations are almost identical.

Member Author:

I think ideally we'd just have fewer feature tests for this, with more of the other conditions handled by request specs, where this would not be a concern. It doesn't seem out of place to do this sort of thing a few times in our feature specs.

I was rather sad when I realised the effect of the travel_to on clearing the cache and that I had to do this.

Contributor:

What was the effect of travel_to on the cache?

Member Author:

We cache bulk data for 24 hours. If we travel more than 24 hours past that time in Ruby code, the cache is invalidated. Here we jump from June 21st 0:00am to June 22nd 9:00am.

spec/services/resync_service_spec.rb (outdated; resolved)
<%= render partial: "errors/error", locals: { title: title, body_govspeak: body_govspeak } %>
<% else %>
<%= render partial: "errors/error", locals: t("errors.local_data_unavailable") %>
<% end %>
Contributor:

This whole template is smashing 👍

Member Author:

Thanks 😁

@leenagupte (Contributor) left a comment:

This looks great! 💯
The commit messages are great and make the code changes easy to follow, and the tests are really clear. Thanks for breaking up the commit history into smaller commits too, it made reviewing this much easier.

I only have a few minor comments and things that could be clarified, but nothing major.

spec/lib/bulk_data/cache_spec.rb (outdated; resolved)
spec/models/government_spec.rb (outdated; resolved)
lib/bulk_data/government_repository.rb (resolved)
spec/lib/bulk_data/government_repository_spec.rb (outdated; resolved)
spec/lib/bulk_data/government_repository_spec.rb (outdated; resolved)
spec/jobs/application_job_spec.rb (outdated; resolved)
describe "POST /documents/:document/unwithdraw" do
context "when the edition is in history mode" do
let(:edition) do
create(:edition, :withdrawn, :political, government: past_government)
end
Contributor:

I'm not sure why this is being defined twice now, rather than once at the beginning?

Member Author:

This is now moot, as I had to rebase against Peter's changes, which have superseded this.

I tend to try to keep lets near the code that uses them. I don't mind a bit of repetition if it's easy to find where the let is defined, mainly because I find it difficult to read tests where the let definitions occur a long way before the test.

describe "POST /documents/:document/withdraw" do
context "when the edition is in history mode" do
let(:edition) do
create(:edition, :published, :political, government: past_government)
end
Contributor:

Same here. Is there a reason for duplicating this setup?

spec/jobs/populate_bulk_data_job_spec.rb (resolved)
lib/bulk_data/cache.rb (resolved)
Bulk data is the namespace that we will be using for downloading large amounts of data periodically for use in Content Publisher. Some examples of candidates for this are topics, contacts and organisations. These are data entities where we need to download all of the Publishing API's data rather than a small sample on demand.

Previously we loaded this data on demand, which risks periodic timeouts for the application. As this data can be loaded in numerous places, it also makes performance issues difficult to pin down, since the loading may be a contributing factor to any other task we think is slow.

This module contains two common exceptions that we anticipate using. LocalDataUnavailableError is for situations where we expect data to be available in our cache but it is not present. RemoteDataUnavailableError is for when we try to load remote data and this fails.
@kevindew (Member Author):

@leenagupte and @ChrisBAshton thanks for the reviews. I've addressed the comments.

This class is intended to be the entry point for writing any BulkData.
It uses the Rails Redis cache as an underlying store.

Using Redis was a design choice for the following reasons:

- Redis offers caching across multiple GOV.UK boxes, so one write will
cover all machines
- Redis is fast to read and write from

And using the Rails cache implementation offered the following
advantages:

- Can be configured with [middleware][1] to avoid needing to hit Redis multiple times in a request
- Redis namespacing
- Automatic serialisation/deserialisation

An important caveat is that we need to ignore Rails' guidance to use an LRU Redis cache, as we don't want keys to be automatically evicted from Redis. Instead we just want them to expire naturally.

The methods on this class allow adding an item to the cache, removing an item, and removing all items.

On top of this there are time-based methods, intended to determine when an item was added to the cache. This enables checking how old something is when deciding whether to re-populate it. Redis doesn't offer this by default, so it's simplest to store the write time as a separate entry. The Rails cache entry does hold some information about creation time, but there's no obvious way to access it, and it's probably nicer not to have to pull out the whole cache entry just to check its age.

[1]: https://github.com/rails/rails/blob/9926fd86626690746c7ee78174c4d3577bdfa58d/activesupport/lib/active_support/cache/redis_cache_store.rb#L44
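As a rough, hedged sketch of the shape this describes (the method names, key layout and expiry constant are assumptions rather than the actual implementation):

```
require "singleton"

module BulkData
  class Cache
    include Singleton

    EXPIRES_IN = 24.hours

    def self.write(key, value)
      instance.store.write(key, value, expires_in: EXPIRES_IN)
      # Redis has no easily reachable "created at" metadata per key, so the
      # write time is recorded as a separate entry alongside the data.
      instance.store.write("#{key}:wrote-at", Time.zone.now, expires_in: EXPIRES_IN)
    end

    def self.read(key)
      instance.store.read(key)
    end

    def self.wrote_at(key)
      instance.store.read("#{key}:wrote-at")
    end

    def self.delete(key)
      instance.store.delete(key)
      instance.store.delete("#{key}:wrote-at")
    end

    def self.clear
      instance.store.clear
    end

    def store
      # Entries expire naturally rather than being evicted by an LRU policy.
      @store ||= ActiveSupport::Cache::RedisCacheStore.new(namespace: "bulk-data")
    end
  end
end
```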
This sets up this model to work with data from the Publishing API. It no
longer has class methods to look up the model as these are going to be
on a different class.

The input we expect from the Publishing API is a hash of base_path,
content_id, details, locale and title. We can look up started_on,
ended_on and current dates from the details hash.
This allows classes that mix this module in to be serialised to a hash.
This has been added so that instances that use this module can be
serialised to fake cache data.
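A hedged sketch of such a mixin; the module name and attribute handling are assumptions:

```
module AsHash
  # Serialise an object's instance variables to a plain hash, suitable for
  # writing to the cache or for faking cache data in tests.
  def to_h
    instance_variables.each_with_object({}) do |variable, hash|
      hash[variable.to_s.delete_prefix("@")] = instance_variable_get(variable)
    end
  end
end
```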
@ChrisBAshton (Contributor) left a comment:

Have skimmed through all the commits again, looks good to me 👍

The purpose of this class is to mediate communication between Publishing
API and our cache adapter for governments.

It writes entries to the cache using a cache key that is defined as a
constant in this class. This is named as "government-v1" to indicate
that this is version 1 of the data we are caching. If we are to make any
backwards incompatible changes to the data we expect (such as adding an
extra mandatory field) then we will need to increment this number. This
will ensure that when the code change is deployed the old cache will not
be used anymore and will expire naturally.

It provides a number of methods for finding a particular government and a method to populate the cache.
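A hedged sketch of the repository; the cache key convention comes from the commit above, while the finder methods and the Publishing API adapter call are assumptions:

```
module BulkData
  class GovernmentRepository
    CACHE_KEY = "government-v1" # bump for backwards-incompatible data changes

    def find(content_id)
      all.find { |government| government.content_id == content_id }
    end

    def for_date(date)
      all.find { |government| government.covers?(date) }
    end

    def current
      all.find(&:current?)
    end

    def populate_cache
      # Fetch every government from the Publishing API; the exact adapter
      # call and parameters here are assumptions.
      data = GdsApi.publishing_api.get_editions(document_types: %w[government]).to_hash
      Cache.write(CACHE_KEY, data)
    end

    private

    def all
      data = Cache.read(CACHE_KEY) || raise(LocalDataUnavailableError)
      data.map { |attributes| Government.new(attributes) }
    end
  end
end
```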
This clears the BulkData::Cache after every spec and adds helpers to populate the cache before tests.

Default governments are set up for the most common case where we just
need to know about the current and previous governments.

It also provides helper methods for current and past government as these
are likely to be needed quite frequently.
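A hedged sketch of how this support file might look; populate_default_government_bulk_data is named elsewhere in this PR, while the factory traits here are assumptions:

```
RSpec.configure do |config|
  # Avoid cache state leaking between examples.
  config.after { BulkData::Cache.clear }

  # Feature and request specs get default government data up front.
  %i[feature request].each do |type|
    config.before(:each, type: type) { populate_default_government_bulk_data }
  end
end

def populate_default_government_bulk_data
  governments = [build(:government), build(:government, :past)]
  BulkData::Cache.write("government-v1", governments.map(&:to_h))
end
```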
This creates instances of the GovernmentRepository class whenever we
need to load governments in the context of an edition. The factory has
been updated to reflect that governments can't be determined without an
external dependency and thus governments are always injected.

This does raise some potential performance concerns about the number of objects initialised if we frequently look up government information. Mostly, though, I expect these lookups not to occur multiple times in the same request, so I don't think it'll cause any actual performance issues.
This removes unnecessary instance variables and reduces long lines; it also removes an unnecessary test expectation.
By using I18n.t we wouldn't raise errors if a translation was missing. We don't actually need to call I18n directly, as Rails has a t view helper which roughly corresponds to I18n.t!
This field is referred to as a title in the Publishing API
This sets up some default governments in advance of tests so that the government data is populated. These are set up before all feature and request tests.
By jumping more than 24 hours into the future the cache entries for
governments expire, thus the government data needs to be re-stubbed.
As we no longer have past_government and current_government traits, this injects these objects in instead.
This cuts back on the boilerplate needed to run a worker job on a single box and turn away other boxes that try to run it at the same time. This situation occurs with the Sidekiq scheduler, where every box with a Sidekiq process tries to run the job at the same time and the first one to start it wins.

We counter this with a PostgreSQL advisory lock, which prevents any other client from acquiring the same lock while it is held (they gracefully exit). Ideally we take this lock inside a transaction so that if anything goes wrong the lock is released automatically, whereas if we take it outside a transaction and something crashes we may need to release the lock manually.
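A hedged sketch of the helper this describes; the method name and the SQL are assumptions, showing one standard way to take a transaction-scoped advisory lock:

```
require "zlib"

class ApplicationJob < ActiveJob::Base
  def run_exclusively(lock_name: self.class.name)
    ActiveRecord::Base.transaction do
      # pg_try_advisory_xact_lock returns false immediately if another client
      # holds the lock, and releases it automatically when the transaction
      # ends, even if the job crashes midway.
      acquired = ActiveRecord::Base.connection.select_value(
        "SELECT pg_try_advisory_xact_lock(#{Zlib.crc32(lock_name)})",
      )

      yield if acquired # boxes that lose the race gracefully exit
    end
  end
end
```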
It was more common for us to use Sidekiq::Testing.fake! than to rely on Sidekiq::Testing.inline!, so switching to fake! as the default saves us code. It's also the preferable approach, since it keeps us more honest by making us explicitly tell Sidekiq to perform actions rather than expecting them to just happen synchronously.
The first_published_at value should always be set on documents that have
a live edition.

I didn't know a nice way to specify an optional trait to the association method, so I've ended up with this array monstrosity. If you know a better technique, feel free to change it.
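For what it's worth, a hedged sketch of the FactoryBot pattern this likely refers to; the factory and trait names are assumptions:

```
factory :edition do
  # An association can name a factory plus traits as an array, which is the
  # "array monstrosity" way to apply an optional trait.
  association :document, factory: %i[document with_live_edition]
end
```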
This moves the it block outside of a context so that it sits before the contexts, as per the more common convention. It also removes the government and political tests from their draft and live contexts, since they aren't resultant effects of those particular contexts.

Finally, it makes the draft and live tests more consistent with the other edition state tests by creating an edition rather than a document with FactoryBot.
This sets up a job that is run every 15 minutes to update the bulk data.
If accessing the data fails this job is retried.

The schedule is set with an interval of 15 minutes and a delay of 0s, which makes the job start immediately whenever the worker process is restarted. As this job runs on a schedule it will fire on all machines at the same time; to counter this, the job is run exclusively using the method on the parent class.
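A hedged sketch of the job; the class name matches the spec file referenced earlier in this thread, while the retry policy is an assumption:

```
class PopulateBulkDataJob < ApplicationJob
  retry_on BulkData::RemoteDataUnavailableError, wait: :exponentially_longer

  # Scheduled via sidekiq-scheduler with a 15 minute interval and a 0s
  # initial delay, so it also runs as soon as a worker process starts.
  def perform
    run_exclusively do
      BulkData::GovernmentRepository.new.populate_cache
    end
  end
end
```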
This schedules a background job to load the data if, for some reason, it's not available in the current environment and a service has tried to use it.

There is a risk that this could cause a spike of identical queued jobs were a bunch of users to hit this at the same time. This shouldn't present much of a problem, as the job only executes one at a time and won't do anything if the data was updated in the last 5 minutes.
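A hedged sketch of what this fallback might look like in the repository's read path (names are assumptions):

```
def all
  data = Cache.read(CACHE_KEY)

  unless data
    PopulateBulkDataJob.perform_later # repopulate in the background
    raise LocalDataUnavailableError   # callers show an error page meanwhile
  end

  data.map { |attributes| Government.new(attributes) }
end
```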
If the application is ever in a state where local data is unavailable, it will return a 503 response and a simple error message. As developers may often be the people who see this message (at least while Sidekiq isn't run by default with the app), I thought it was useful to have explicit instructions in a development environment.

It was pretty hard to think of the best way to test this, so I went with this rather contrived example, since documents_show is one of the main routes. I expect that when the BulkData technique is used as part of index page filters we can replace this test with a less contrived one.
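A hedged sketch of the controller handling; rescue_from is standard Rails, while the template path is an assumption:

```
class ApplicationController < ActionController::Base
  rescue_from BulkData::LocalDataUnavailableError do
    # The error template renders developer instructions in development.
    render "errors/local_data_unavailable", status: :service_unavailable # 503
  end
end
```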
This is no longer needed given we are now using the Publishing API
This spec relies on a government existing at the date we specify, which caused this test to fail. By creating the government in the test we resolve this complication.
This sets up a local cache that wraps the Redis cache layer of BulkData::Cache for the duration of a request.

For this to work in a development environment, the BulkData::Cache class must not be reloaded automatically. I achieved this with direct require calls; I first tried using the Rails autoload_once_paths option but didn't have much luck.

As an example, before fixed require calls:

```
irb(main):002:0> BulkData::Cache.object_id
=> 70342543814920
irb(main):003:0> reload!
Reloading...
=> true
irb(main):004:0> BulkData::Cache.object_id
=> 70342595490220
```

and after:

```
irb(main):002:0> BulkData::Cache.object_id
=> 70249028613320
irb(main):004:0> reload!
Reloading...
=> true
irb(main):005:0> BulkData::Cache.object_id
=> 70249028613320
```
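A hedged sketch of wiring this up; RedisCacheStore supports ActiveSupport's LocalCache strategy, whose middleware memoises reads per request, though the insertion point and accessor here are assumptions:

```
# config/application.rb (sketch)
config.middleware.insert_before(
  Rack::Runtime,
  BulkData::Cache.instance.store.middleware,
)
```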
By default the RedisCacheStore takes a fault-tolerant approach: it returns nil if it can't access Redis and writes to a logger that we haven't configured [1].

This isn't an ideal configuration for our use case, as an unavailable cache is a critical problem for us. To mitigate connection problems we allow 3 reconnect attempts if we fail to connect to Redis or lose our connection.

In the event of an error reading from the cache, we report the error to Sentry, which should help make problems easier to diagnose.

[1]: https://github.com/rails/rails/blob/09a2979f75c51afb797dd60261a8930f84144af8/activesupport/lib/active_support/cache/redis_cache_store.rb#L472-L485
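A hedged sketch of the configuration; reconnect_attempts and error_handler are real RedisCacheStore options, while the GovukError reporting call assumes the GOV.UK Sentry wrapper:

```
ActiveSupport::Cache::RedisCacheStore.new(
  namespace: "bulk-data", # assumed namespace
  reconnect_attempts: 3,
  error_handler: lambda do |method:, returning:, exception:|
    # An unavailable cache is critical for us, so report instead of only
    # logging and silently returning nil.
    GovukError.notify(exception)
  end,
)
```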
There's not a whole lot we can do when a dependent service is erroring, so let's not report it and instead wait to see if it causes problems in the app.

This resolves an issue where we started getting server errors overnight in integration at the point of the data sync.
@leenagupte (Contributor) left a comment:

Thanks for addressing the comments.

LGTM 👍 🥇

@kevindew kevindew merged commit 151f9b4 into master Dec 17, 2019
@kevindew kevindew deleted the bulk-data-cache-nov branch December 17, 2019 08:59
kevindew added a commit to alphagov/govuk-docker that referenced this pull request Jan 6, 2020
Since alphagov/content-publisher#1522 was introduced, Content Publisher has required a worker to operate normally. Because of this, it is now added as an app dependency.