Commit: Updates
petermnhull committed Mar 10, 2024
1 parent 97e02c9 commit 634aa5a
Showing 16 changed files with 55 additions and 45 deletions.
22 changes: 11 additions & 11 deletions asset-manifest.json
@@ -1,23 +1,23 @@
 {
   "files": {
     "main.css": "/static/css/main.011f2bdb.css",
-    "main.js": "/static/js/main.32ea4769.js",
-    "static/js/636.495f3be9.chunk.js": "/static/js/636.495f3be9.chunk.js",
-    "static/js/520.0d1a7abe.chunk.js": "/static/js/520.0d1a7abe.chunk.js",
-    "static/js/208.c85cdd2a.chunk.js": "/static/js/208.c85cdd2a.chunk.js",
-    "static/js/837.0f347d72.chunk.js": "/static/js/837.0f347d72.chunk.js",
+    "main.js": "/static/js/main.e7fdd1af.js",
+    "static/js/636.f8139331.chunk.js": "/static/js/636.f8139331.chunk.js",
+    "static/js/520.dc9fcd40.chunk.js": "/static/js/520.dc9fcd40.chunk.js",
+    "static/js/208.1963d242.chunk.js": "/static/js/208.1963d242.chunk.js",
+    "static/js/837.b4751bdb.chunk.js": "/static/js/837.b4751bdb.chunk.js",
     "static/js/787.96d091b9.chunk.js": "/static/js/787.96d091b9.chunk.js",
-    "static/media/2024_01_02_gophercon_2023.md": "/static/media/2024_01_02_gophercon_2023.5c6354491d20ad0939cf.md",
-    "static/media/2022_10_24_mini_data_warehouse.md": "/static/media/2022_10_24_mini_data_warehouse.f880da82467e5826446b.md",
-    "static/media/2023_07_11_net_zero.md": "/static/media/2023_07_11_net_zero.6b29a7e315dd6389e5c8.md",
-    "static/media/2022_11_02_lessons_in_ml_ops.md": "/static/media/2022_11_02_lessons_in_ml_ops.1b42175a395a965357c6.md",
+    "static/media/2024_01_02_gophercon_2023.md": "/static/media/2024_01_02_gophercon_2023.39f4f4f64b11d75e7c93.md",
+    "static/media/2022_10_24_mini_data_warehouse.md": "/static/media/2022_10_24_mini_data_warehouse.63f9383468cf9753573a.md",
+    "static/media/2023_07_11_net_zero.md": "/static/media/2023_07_11_net_zero.ac0cbac6493142d41bc9.md",
+    "static/media/2022_11_02_lessons_in_ml_ops.md": "/static/media/2022_11_02_lessons_in_ml_ops.fdd4b1598ccbb902d818.md",
     "index.html": "/index.html",
     "main.011f2bdb.css.map": "/static/css/main.011f2bdb.css.map",
-    "main.32ea4769.js.map": "/static/js/main.32ea4769.js.map",
+    "main.e7fdd1af.js.map": "/static/js/main.e7fdd1af.js.map",
     "787.96d091b9.chunk.js.map": "/static/js/787.96d091b9.chunk.js.map"
   },
   "entrypoints": [
     "static/css/main.011f2bdb.css",
-    "static/js/main.32ea4769.js"
+    "static/js/main.e7fdd1af.js"
   ]
 }
2 changes: 1 addition & 1 deletion index.html
@@ -1 +1 @@
-<!doctype html><html lang="en"><head><meta charset="utf-8"/><meta name="viewport" content="width=device-width,initial-scale=1"/><meta name="theme-color" content="#000000"/><title>Peter Hull</title><script defer="defer" src="/static/js/main.32ea4769.js"></script><link href="/static/css/main.011f2bdb.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>
+<!doctype html><html lang="en"><head><meta charset="utf-8"/><meta name="viewport" content="width=device-width,initial-scale=1"/><meta name="theme-color" content="#000000"/><title>Peter Hull</title><script defer="defer" src="/static/js/main.e7fdd1af.js"></script><link href="/static/css/main.011f2bdb.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>
3 changes: 1 addition & 2 deletions manifest.json
@@ -1,8 +1,7 @@
 {
   "short_name": "Peter Hull",
   "name": "petermnhull.github.io",
-  "icons": [
-  ],
+  "icons": [],
   "start_url": ".",
   "display": "standalone",
   "theme_color": "#000000",
2 changes: 1 addition & 1 deletion static/css/main.011f2bdb.css.map

Some generated files are not rendered by default.
1 change: 0 additions & 1 deletion static/js/main.32ea4769.js.map

This file was deleted.

6 changes: 3 additions & 3 deletions static/js/main.32ea4769.js → static/js/main.e7fdd1af.js

Large diffs are not rendered by default.

File renamed without changes.
1 change: 1 addition & 0 deletions static/js/main.e7fdd1af.js.map

Large diffs are not rendered by default.

@@ -13,19 +13,22 @@ About a year ago, we decided that it was time for a rewrite, so we started porting
We chose Go as we evaluated it to be the best choice for a high-performance case management system, and the team had experience in writing backend servers with it. We named the application `Arpeggio`, and focused on simple but high-volume `GET` endpoints. We got `Arpeggio` serving customers in production within months, by redirecting traffic via our API gateway (i.e. [Ambassador](https://www.getambassador.io/products/edge-stack/api-gateway/kubernetes-ingress-controller/)) from `AppCA-PHP` to `Arpeggio`.

### What challenges did we face?

Our testing framework for `GET` endpoints initially focused on shadowing production traffic to `Arpeggio` and manually testing our UI. Shadowing in particular was super useful, as it meant we got to compare the results in production without actually serving the new results to clients. This gave us a lot of confidence, and as we weren't modifying data, this was enough - if we needed to roll back, we just reverted configuration changes in our API gateway.

This turned out to be totally insufficient when we got to `POST`, `DELETE` and `PATCH` endpoints - you can't shadow non-idempotent requests with side effects, and there's no room for error when messing with customer data. If something wasn't covered by our tests which resulted in bad data getting stored, we were stuck.

### The Mini Data Warehouse

All of this led to one idea: a Mini Data Warehouse. We needed to be able to query over the requests that `Arpeggio` served, so that we could both understand the responses clients were served and perform a database rollback in the case of an incident. But we didn't need a fully-fledged data warehouse solution such as Snowflake or Redshift; we just needed a comprehensive log of production requests.

This is what the pipeline we came up with looks like:

<img src="blog_2022_10_24_a.png" alt="Pipeline Part 1" width="90%" height="auto">

#### Kafka Event Middleware
The first step was to set up a middleware in `Arpeggio` which captures every request _and_ the corresponding response. This then gets sent to our shared in-house Kafka cluster.

We made use of a [MultiWriter](https://pkg.go.dev/io#MultiWriter) to copy the output to a buffer. Everything is then serialised in Protobuf format and finally sent to Kafka using SegmentIO's [kafka-go](https://github.com/segmentio/kafka-go) package.

@@ -51,30 +54,35 @@ The events look something like this:
</CodeBlock>

#### Kafka Connect and S3

Now that the requests are recorded in Kafka, we sink the events from Kafka into S3 using [Kafka Connect](https://docs.confluent.io/platform/current/connect/index.html).

S3 was an obvious decision: at ComplyAdvantage, we use it anytime we need a cheap blob store. But using Kafka as a buffer between Arpeggio and S3 is a more interesting design choice.

We decided to do this instead of calling S3 directly for a number of reasons:

- We already had this design pattern in a few places (e.g. sinking customer profiles from one service to a MongoDB via Kafka Connect). Our Strimzi Kafka Connect clusters were pretty much ready to plug-and-play.
- It was a good opportunity to validate our vision for Event Driven Design (i.e. using Kafka to decouple consumers and producers).
- The latency requirements for storing requests were low and so a lag introduced by a Kafka broker was acceptable.
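For illustration, a sink like this can be declared with a Kafka Connect S3 sink connector config along these lines (connector name, topic, bucket, region, and partitioning are placeholder values, not our production settings):

```json
{
  "name": "arpeggio-requests-s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "arpeggio.requests",
    "s3.bucket.name": "arpeggio-request-log",
    "s3.region": "eu-west-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "path.format": "'year'=YYYY/'month'=MM/'day'=dd",
    "partition.duration.ms": "3600000",
    "locale": "en-GB",
    "timezone": "UTC",
    "flush.size": "1000"
  }
}
```

A time-based partitioner lands objects under date-stamped S3 prefixes, which keeps later day-scoped queries cheap because only the relevant prefixes get scanned.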

#### Athena

Once in S3, we can query over our data by using Athena on the AWS console. Athena provides a SQL interface (it is built on Presto) for searching over all the records. It can be a bit slow (e.g. it takes ~10 seconds to look at a day’s worth of requests), but that’s okay as we only need it for one-off queries when investigating an incident or for analysing the responses that our users receive. This limited usage also kept costs low, with our queries and S3 storage altogether accounting for less than 1% of our AWS bill.
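As a sketch, an incident investigation query might look like the following; the table and column names are hypothetical stand-ins for however the S3 objects are registered in the catalog:

```sql
-- Find failing requests to a suspect endpoint on the day of an incident
-- (table, columns, and partition keys are illustrative).
SELECT request_method,
       request_path,
       response_status,
       request_body
FROM arpeggio_request_log
WHERE year = '2022' AND month = '10' AND day = '24'
  AND request_path LIKE '/cases/%'
  AND response_status >= 500
ORDER BY received_at;
```

Filtering on the partition columns first is what keeps the scan (and the per-query cost) small.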

### Did it work?

Yes!

With all of `Arpeggio`'s requests and responses being stored in S3, we were able to fully understand what customers were receiving during the migration of an endpoint from PHP to Go.

Unfortunately, the real proof that the system worked came through an incident - the processing for one endpoint was incorrect, leading to invalid data in the database.

Luckily, we could find all the failed requests via Athena and replay the payloads through `AppCA-PHP` to recreate the request properly.

The pipeline enabled us to rebuild the data into a consistent state.

### What came next?

A downside of this design is that we can only send request/response pairs to S3 if the request was served by `Arpeggio`.

This downside, along with a few other reasons, led us to make some further changes so we could use the pipeline for a broader scope. Instead of using Ambassador to route requests to `Arpeggio` or `AppCA-PHP`, we moved that logic into the `Arpeggio` application code. This means that everything gets served by `Arpeggio`.
@@ -84,9 +92,11 @@ At the time of writing, the pipeline looks a little something like this:
<img src="blog_2022_10_24_b.png" alt="Pipeline Part 2" width="90%" height="auto">

This gives us numerous advantages:

- As mentioned above, responses served by the PHP monolith can be seen in S3.
- Unlike `AppCA-PHP`, `Arpeggio` has comprehensive observability tooling with [DataDog](https://www.datadoghq.com/). If everything goes through `Arpeggio`, our distributed tracing shows up in DataDog.
- Finally, the routing logic in Ambassador is very complicated and difficult to test. With that logic now in `Arpeggio`, we can unit test it and avoid issues caused by human error when updating Ambassador's complicated Regex-based configuration.

### Conclusion

Now that we've got a real-time event store and a stable process for migrating traffic, decommissioning the PHP monolith is much safer to do. It is primarily thanks to this that `Arpeggio` is now serving >99% of requests to EU and US customers, giving us improved uptime, latency, and velocity.