Load government data in an asynchronous task #1522
Commits on Dec 13, 2019
BulkData is the namespace that we will use for periodically downloading large amounts of data for use in Content Publisher. Candidates for this include topics, contacts and organisations: data entities where we need to download all of the Publishing API's data rather than a small sample on demand. Previously we loaded this data on demand, which risks periodic timeouts for the application. As this data can be loaded in numerous places, on-demand loading also makes performance issues difficult to diagnose, since it may be a contributing factor to any other task we think is slow. This module contains two common exceptions that we anticipate using: LocalDataUnavailableError is for situations where we expect data to be available in our cache but it is not present; RemoteDataUnavailableError is for when an attempt to load remote data fails.
(commit 62bab09)
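The module described above might look like the following minimal sketch; the exception names come from the commit, while the RuntimeError parent is an assumption.

```ruby
# Sketch of the BulkData namespace module; the RuntimeError parent is assumed.
module BulkData
  # Raised when data is expected in the local cache but is not present.
  class LocalDataUnavailableError < RuntimeError; end

  # Raised when an attempt to load data from a remote service fails.
  class RemoteDataUnavailableError < RuntimeError; end
end
```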
Commits on Dec 15, 2019
BulkData cache singleton class
This class is intended to be the entry point for writing any BulkData. It uses the Rails Redis cache as the underlying store. Redis was a design choice for the following reasons:

- Redis offers caching across multiple GOV.UK boxes, so one write covers all machines
- Redis is fast to read from and write to

Using the Rails cache implementation offered the following advantages:

- it can be configured with [middleware][1] to avoid hitting Redis multiple times in a request
- Redis namespacing
- automatic serialisation/deserialisation

An important concern is that we need to ignore the Rails guidance to use an LRU Redis cache, as we don't want keys to be automatically evicted from Redis; instead we just want them to expire naturally. The methods on this class allow adding an item to the cache, removing an item, and removing all items. On top of this there are time-based methods intended to determine when an item was added to the cache, so we can check how old something is when deciding whether to add a replacement. Redis doesn't offer this by default, so it's quite simple to just add it as a separate entry. The Rails cache entry actually holds some information about its created time, but I have no idea how to access it and also think it's probably nicer not to have to pull out the whole cache entry to check it.

[1]: https://github.com/rails/rails/blob/9926fd86626690746c7ee78174c4d3577bdfa58d/activesupport/lib/active_support/cache/redis_cache_store.rb#L44
(commit a764576)
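The interface this describes can be sketched in plain Ruby; here a Hash stands in for the Rails Redis cache store so the example runs standalone, and all class and method names are assumptions.

```ruby
require "singleton"

# Sketch of a BulkData cache singleton. A Hash stands in for the Rails Redis
# cache store so the example is self-contained; all names are assumed.
class BulkCache
  include Singleton

  def initialize
    @store = {}
  end

  # Store the value alongside a separate "written_at" entry, since Redis has
  # no built-in way to ask when a key was created.
  def write(key, value, time: Time.now)
    @store[key] = value
    @store["#{key}:written_at"] = time
  end

  def read(key)
    @store[key]
  end

  def written_at(key)
    @store["#{key}:written_at"]
  end

  def delete(key)
    @store.delete(key)
    @store.delete("#{key}:written_at")
  end

  def clear
    @store.clear
  end
end
```

Storing the timestamp as its own entry mirrors the commit's rationale: checking an item's age never requires pulling out and deserialising the whole cached value.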
Adapt Government model for loading from Publishing API
This sets up the model to work with data from the Publishing API. It no longer has class methods to look up the model, as these are going to live on a different class. The input we expect from the Publishing API is a hash of base_path, content_id, details, locale and title. The started_on and ended_on dates and the current flag can be looked up from the details hash.
(commit 659c391)
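Assuming a payload shaped as described, the model's initialiser might look roughly like this; the exact attribute handling in the real model is an assumption.

```ruby
require "date"

# Sketch of a Government model built from a Publishing API hash; the precise
# attribute handling in the real model is assumed.
class Government
  attr_reader :base_path, :content_id, :locale, :title, :started_on, :ended_on

  def initialize(payload)
    @base_path = payload["base_path"]
    @content_id = payload["content_id"]
    @locale = payload["locale"]
    @title = payload["title"]
    details = payload["details"] || {}
    @started_on = details["started_on"] && Date.parse(details["started_on"])
    @ended_on = details["ended_on"] && Date.parse(details["ended_on"])
    @current = details["current"] == true
  end

  def current?
    @current
  end
end
```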
Provide a to_h method on InitializeWithHash
This allows classes that mix in this module to be serialised to a hash. It has been added so that instances using this module can be serialised to build fake cache data.
(commit ec71269)
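The mixin plus the new to_h method might be sketched like this; the implementation details are assumptions.

```ruby
# Sketch of an InitializeWithHash mixin with the to_h described above;
# the real implementation's details are assumed.
module InitializeWithHash
  def initialize(params = {})
    # Set an instance variable for each key in the input hash.
    params.each { |key, value| instance_variable_set("@#{key}", value) }
  end

  # Serialise the instance back to a hash, e.g. to build fake cache data.
  def to_h
    instance_variables.each_with_object({}) do |ivar, hash|
      hash[ivar.to_s.delete_prefix("@")] = instance_variable_get(ivar)
    end
  end
end

# Hypothetical class using the mixin, for illustration only.
class Topic
  include InitializeWithHash
  attr_reader :title
end
```

Round-tripping an instance through to_h and new yields equal data, which is what makes the method handy for faking cache entries in specs.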
Commits on Dec 16, 2019
GovernmentRepository class for BulkData
The purpose of this class is to mediate communication between the Publishing API and our cache adapter for governments. It writes entries to the cache using a cache key defined as a constant in this class, named "government-v1" to indicate that this is version 1 of the data we are caching. If we make any backwards-incompatible changes to the data we expect (such as adding an extra mandatory field) we will need to increment this number; this ensures that when the code change is deployed the old cache will no longer be used and will expire naturally. The class provides a number of methods for finding a particular government and a method to populate the cache.
(commit 77a50c2)
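The repository pattern described can be sketched as follows; the versioned cache key follows the commit, while the method names, injected cache, and error handling are assumptions.

```ruby
# Sketch of a repository mediating between remote data and the cache.
# The cache key versioning follows the commit; other details are assumed.
class GovernmentRepository
  # Bump the version suffix for any backwards-incompatible data change.
  CACHE_KEY = "government-v1"

  def initialize(cache)
    @cache = cache # any object with Hash-like [] / []= / fetch
  end

  def find(content_id)
    all.find { |gov| gov["content_id"] == content_id } ||
      raise(KeyError, "no government with content_id #{content_id}")
  end

  def populate_cache(governments)
    @cache[CACHE_KEY] = governments
  end

  private

  def all
    @cache.fetch(CACHE_KEY) { raise "local government data unavailable" }
  end
end
```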
Set up spec helpers for bulk data
This clears BulkData::Cache after every spec and adds helpers to populate the cache before tests. Default governments are set up for the most common case, where we just need to know about the current and previous governments. It also provides helper methods for the current and past government, as these are likely to be needed quite frequently.
(commit 2e55195)
Transition edition to use BulkData::GovernmentRepository
This creates instances of the GovernmentRepository class whenever we need to load governments in the context of an edition. The factory has been updated to reflect that governments can't be determined without an external dependency, so governments are always injected. This does raise some potential performance concerns about the number of objects initialised if we frequently look up government information. Mostly, though, I expect these lookups not to occur multiple times in the same request, so I don't think it'll cause any actual performance issues.
(commit 10bd447)
(commit 5a5fc1a)
(commit 5aff300)
Minor refactoring of document request spec
This removes unnecessary instance variables and reduces long lines; it also removes an unnecessary test expectation.
(commit 3bac691)
By using I18n.t we wouldn't raise errors if a translation was missing. We don't actually need to call I18n directly anyway, as Rails has a t view helper which roughly correlates to I18n.t!
(commit 49240a9)
(commit dbe6b81)
Switch government name to title
This field is referred to as a title in the Publishing API.
(commit 59e57e4)
This sets up some default governments in advance of tests so that the government data is populated. These are set up before all feature and request tests.
(commit 7ede62a)
Re-populate governments when working with future dates
By jumping more than 24 hours into the future, the cache entries for governments expire, so the government data needs to be re-stubbed.
(commit 5cc6f84)
Adjust request/feature tests for change in government trait
As we no longer have past_government and current_government traits, these objects are injected in instead.
(commit 06d58ae)
Helper method to run ActiveJobs exclusively
This cuts back on the boilerplate needed to run a worker job on a single box while denying other boxes that try to run it at the same time. This situation occurs with the Sidekiq scheduler, where every box with a Sidekiq process tries to run the job at the same time and the first one to start it wins. We counter this with a PostgreSQL advisory lock, which prevents any other client from acquiring the same lock while it is held (they gracefully exit). Ideally we take this lock inside a transaction, so that if anything goes wrong the lock is automatically released; if we take it outside a transaction and something crashes, we may need to manually release the lock.
(commit edcf545)
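The "first one wins, the rest exit gracefully" semantics can be sketched as follows. In the app the lock would be a transaction-level PostgreSQL advisory lock (SELECT pg_try_advisory_xact_lock(key), released when the transaction ends); here a Mutex#try_lock stands in so the example runs standalone, and the method and constant names are assumptions.

```ruby
# A Mutex stands in for the PostgreSQL advisory lock so this sketch is
# self-contained; in the app the lock would be pg_try_advisory_xact_lock
# taken inside a database transaction.
EXCLUSIVE_LOCK = Mutex.new

def run_exclusively
  # If another "box" already holds the lock, exit gracefully without running.
  return :skipped unless EXCLUSIVE_LOCK.try_lock

  begin
    yield
    :ran
  ensure
    EXCLUSIVE_LOCK.unlock
  end
end
```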
Default to fake sidekiq testing
It was more common that we were using Sidekiq::Testing.fake! than relying on Sidekiq::Testing.inline!, so switching to this saves us code. It is also a preferred approach since it keeps us more honest: we have to explicitly tell Sidekiq to perform actions rather than expecting them to just happen synchronously.
(commit c59ee81)
(commit 070b673)
Add document live trait that sets first_published_at
The first_published_at value should always be set on documents that have a live edition. I didn't know a nice way to specify an optional trait to the association method so have ended up with this array monstrosity. If you know a better technique feel free to change it.
(commit a7fd0d4)
(commit 4f42b1b)
Minor refactoring of ResyncService specs
This moves the it block outside of a context to before the contexts, as per the more common convention. It also removes the government and political tests from the draft and live contexts, since they aren't effects of those particular contexts. Finally it makes the draft and live tests more consistent with the other edition state tests by creating an edition rather than a document with factory bot.
(commit 3d3245c)
Add job to populate bulk data asynchronously
This sets up a job that is run every 15 minutes to update the bulk data. If accessing the data fails this job is retried. The schedule is set with an interval of 15 minutes and a delay of 0s which makes the job start immediately whenever the worker process is restarted. As this job is run on a schedule it will execute on all machines at the same time. To counter this the job is run exclusively using the method on the parent class.
(commit 69f4964)
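A schedule like the one described might be expressed in a sidekiq-scheduler config roughly as below; the schedule name and job class are hypothetical, and the every/first_in pairing follows that gem's interval syntax.

```yaml
# Hypothetical sidekiq-scheduler entry matching the described schedule:
# run every 15 minutes, starting immediately after the worker boots.
:schedule:
  populate_bulk_data:
    every: ['15m', first_in: '0s']
    class: PopulateBulkDataJob
```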
Asynchronously load bulk data if it is not available
This will schedule a background job to load the data if for some reason it's not available in the current environment and a service has tried to use it. There is a risk that this could cause a spike in identical queued jobs were a bunch of users to get this at the same time. This shouldn't present much of a problem as the job only executes one at a time and won't do anything if the data was updated in the last 5 minutes.
(commit dfb1eb2)
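The "won't do anything if the data was updated in the last 5 minutes" guard reduces to a pure time comparison, sketched here with assumed names.

```ruby
# Returns true when the cached data is missing or older than the threshold,
# i.e. when a repopulation job should actually do work. Names are assumed.
UPDATE_THRESHOLD = 5 * 60 # seconds

def needs_repopulating?(written_at, now: Time.now)
  written_at.nil? || (now - written_at) > UPDATE_THRESHOLD
end
```

Because every queued duplicate job re-checks this guard before doing work, a spike of identical jobs collapses into one real update followed by cheap no-ops.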
Add an error view for BulkData::LocalDataUnavailableError
If the application is ever in a state where local data is unavailable, it will return a 503 response and a simple error message. As I anticipate that developers may often be the people who see this message (at least while Sidekiq isn't run by default with the app), I thought it useful to have explicit instructions in a development environment. It was pretty hard to think of the best way to test this, so I went with a rather contrived example, since documents_show is one of the main routes. I expect that when the BulkData technique is used as part of index page filters we can replace this test with a less contrived one.
(commit b177dc2)
Remove hardcoded government config
This is no longer needed given we are now using the Publishing API.
(commit d2ad3e7)
Use specific government in history mode spec
This spec relies on a government existing at the date we specify, which caused the test to fail. Creating the government in the test resolves this complication.
(commit be7cb1f)
Use local cache Rack middleware with BulkData::Cache
This sets up a local cache that wraps the Redis cache layer of BulkData::Cache for the duration of a request. For this to work in a development environment, the BulkData::Cache class must not be reloaded automatically. I achieved this by doing direct require calls; I first tried the Rails autoload_once_paths option but didn't have much luck. As an example, before fixed require calls:

```
irb(main):002:0> BulkData::Cache.object_id
=> 70342543814920
irb(main):003:0> reload!
Reloading...
=> true
irb(main):004:0> BulkData::Cache.object_id
=> 70342595490220
```

and after:

```
irb(main):002:0> BulkData::Cache.object_id
=> 70249028613320
irb(main):004:0> reload!
Reloading...
=> true
irb(main):005:0> BulkData::Cache.object_id
=> 70249028613320
```
(commit 8399dd5)
Configure Redis for resiliency and reporting
By default the RedisCacheStore takes a fault-tolerant approach: it returns nil if it can't access Redis and writes to a logger that we haven't configured [1]. This isn't an ideal configuration for our use case, as it's a critical problem if the cache isn't available. To resolve this we allow 3 reconnect attempts if we don't manage to connect to Redis or have lost our connection. In the event of an error reading from the cache, the error is reported to Sentry, which will hopefully make problems easier to diagnose. [1]: https://github.com/rails/rails/blob/09a2979f75c51afb797dd60261a8930f84144af8/activesupport/lib/active_support/cache/redis_cache_store.rb#L472-L485
(commit fa27e4c)
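The configuration described might look roughly like this; reconnect_attempts and error_handler are real RedisCacheStore options, while GovukError as the app's Sentry wrapper is an assumption.

```ruby
# Sketch of a resilient RedisCacheStore configuration; GovukError is assumed
# to be the app's Sentry wrapper.
ActiveSupport::Cache::RedisCacheStore.new(
  redis: Redis.new,
  reconnect_attempts: 3,
  error_handler: lambda do |method:, returning:, exception:|
    GovukError.notify(exception)
  end,
)
```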
Don't report BulkData::RemoteDataUnavailableError on server errors
There's not a whole lot we can do when a dependent service is erroring, so let's not report it and wait to see if it causes problems in the app. This resolves an issue where we started getting server errors overnight in integration at the point of the data sync.
(commit d395aa5)