Merge pull request #276 from LBHackney-IT:DPF-184-4.x-docs
DPF-184-4.x-docs
stevefarrhackneygovuk authored Sep 2, 2024
2 parents 1daa632 + 711a6c6 commit dd50b9d
Showing 41 changed files with 984 additions and 343 deletions.
Binary file added docs/dap-airflow/images/DAPairflowFLOWwide.png
Binary file added docs/dap-airflow/images/branch.png
Binary file added docs/dap-airflow/images/github-access-five.png
Binary file added docs/dap-airflow/images/github-access-four.png
Binary file added docs/dap-airflow/images/github-branch-four.png
Binary file added docs/dap-airflow/images/github-branch-three.png
Binary file modified docs/dap-airflow/images/prototype-simple-transforms-nine-b.png
Binary file modified docs/dap-airflow/images/prototype-simple-transforms-seven.png
93 changes: 50 additions & 43 deletions docs/dap-airflow/introduction.md
@@ -1,112 +1,119 @@
---
id: introduction
title: Introduction
title: 📚Introduction
description: "The DAP⇨flow guide for data analysts and engineers, for developing and deploying Airflow DAGs, running data pipelines in the Data Analytics Platform (DAP)."
layout: playbook_js
tags: [dap-airflow]
---

# Introduction
![DAP⇨flow](../dap-airflow/images/DAPairflowFLOW.png)
# 📚Introduction
![DAP⇨flow](../dap-airflow/images/DAPairflowFLOWwide.png)
## What is **DAP⇨flow**?
**DAP⇨flow** is an integration of ***Apache Airflow*** with ***Amazon Athena***, built upon Hackney's ***Data Analytics Platform***.
#### It allows Data Analysts, in the simplest way possible, to develop and run data pipelines using their own service's data and create data products for their service and service users.

Building data pipelines used to be harder and more complex and time consuming. Data Analysts after prototyping their SQL queries using ***Amazon Athena*** were required to migrate Athena's SQL code to Spark SQL, a different SQL dialect, then embed their code within an ***Amazon Glue*** job. They also had to negotiate querying across multiple generations of the same data stored in the ***Amazon S3*** Data lake. That meant they could not simply take a legacy SQL query and run it in ***Amazon Athena*** or create an ***Amazon Glue*** product directly from it.
Building data pipelines used to be harder, more complex, and more time-consuming.

Data Analysts, after prototyping their SQL queries using ***Amazon Athena***, were required to convert *Athena SQL* code to *Spark SQL*, a different SQL dialect, then embed their code within an ***Amazon Glue*** job, which they had to deploy using ***Terraform***.

Data Analysts were forced to query across multiple generations of the same data stored in the ***Amazon S3*** Data lake when all they actually wanted was just their current data. That meant they could not simply take legacy SQL queries and run them directly in ***Amazon Athena***.
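As an illustrative sketch of that burden (the table and partition-column names below are hypothetical, not taken from the platform), a "current data" query used to require the analyst to find the latest generation themselves:

```sql
-- Before: every query had to pick out the latest generation of the data,
-- e.g. by filtering on an ingestion-date partition column.
-- (housing_raw_zone.tenancies and import_date are illustrative names.)
SELECT *
FROM housing_raw_zone.tenancies
WHERE import_date = (
    SELECT max(import_date) FROM housing_raw_zone.tenancies
);

-- With current-data views, the same intent is simply:
SELECT * FROM housing_raw_zone.tenancies;
```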

#### How **DAP⇨flow** solves these problems
* Firstly, Data Analysts no longer need to use the more complex ***Amazon Glue*** because data pipelines built using ***Apache Airflow*** can use ***Amazon Athena*** to generate the data transformation products using the same prototype SQL code.
* Firstly, Data Analysts no longer need to convert and re-test their prototype SQL transforms to run in the separate and more complex ***Amazon Glue*** run-time environment.

Instead, ***Apache Airflow*** can use exactly the same ***Amazon Athena*** engine to transform data in production, with the outputs going directly into data products. This means Data Analysts' prototype SQL transform queries, which they spent time testing until they worked, can simply be reused instead of being discarded.

**This cuts development time by more than half while Data Analysts no longer need to context-switch between the two SQL dialects.**
**That cuts development time by more than half while Data Analysts no longer need to context-switch between the two SQL dialects.**
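As a sketch of what "reuse without conversion" can look like (all table and column names here are illustrative assumptions, not the platform's actual schema), the same SELECT that was prototyped interactively in Athena could be materialized as a data product, for example with an Athena `CREATE TABLE AS SELECT`:

```sql
-- The prototype join, promoted unchanged into a production output
-- (housing_raw_zone / housing_refined_zone names are illustrative):
CREATE TABLE housing_refined_zone.active_tenancies AS
SELECT t.tenancy_ref,
       t.start_date,
       p.address
FROM housing_raw_zone.tenancies t
JOIN housing_raw_zone.properties p
  ON p.property_ref = t.property_ref
WHERE t.end_date IS NULL;
```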

* Secondly, Data Analysts no longer need to adapt their SQL queries to ***Amazon S3***'s Data Lake partitioning architecture, because ***Apache Airflow*** is used to generate views of the underlying table data which presents Data Analysts with only current ingested service data, in readiness for prototyping and also later on when the automated transforms are run by Airflow.
* Secondly, Data Analysts no longer need to adapt their legacy SQL queries to ***Amazon S3***'s Data Lake partitioning architecture.

**This further cuts development time while Data Analysts can very easily take the legacy SQL code from their service database system and run it directly on *Amazon Athena* with few changes.**
Instead, ***Apache Airflow*** is configured to generate views of the underlying table data, presenting Data Analysts with only the current ingested service data, both during prototyping and testing and later, when the working transforms are deployed, automated, and run by Airflow.

Data Analysts can also migrate their existing Athena SQL prototypes, previously adapted for the ***Amazon S3***'s Data Lake partitioning architecture, because the table history is still available to them, although the table names will now be suffixed "**_history**".
**That further cuts development time while Data Analysts can very easily take the legacy SQL code from their service database system and run it directly on *Amazon Athena* with few changes.**

Data Analysts can also migrate their existing Athena SQL prototypes, previously adapted to ***Amazon S3***'s Data Lake partitioning architecture, because the same table history is still available to them, although the table names are now suffixed "**_history**", which is more intuitive for new users.
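To illustrate the split between the two access paths (table and column names here are hypothetical examples, not the platform's actual schema):

```sql
-- Current data only, via the generated view:
SELECT count(*) FROM housing_raw_zone.repairs;

-- Full ingestion history, via the "_history"-suffixed table
-- (import_date is an illustrative partition column name):
SELECT import_date, count(*)
FROM housing_raw_zone.repairs_history
GROUP BY import_date;
```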

* Lastly, Data Analysts no longer need to use ***Terraform*** for deploying their data pipeline jobs because ***Apache Airflow*** simply takes care of that as soon as they commit their transform queries to **DAP⇨flow**'s ***GitHub*** repository.

## 📚Onboarding

#### A series of onboarding documents is available here to help Data Analysts get started with **DAP⇨flow**

Anyone new to **DAP⇨flow** will need to read [**📚Before you begin**](../dap-airflow/onboarding/begin).

Thereafter, Data Analysts do not need to read every documents in the order they are listed below, especially if they are already familiar with the ***AWS Management Console*** and have used ***Amazon Athena*** before.
Thereafter, Data Analysts do not need to read every document in the order they are listed below, especially if they are already familiar with the ***AWS Management Console*** and have used ***Amazon Athena*** before.

Data Analysts are encouraged to think about what they need to do before deciding which document to read next. For example, if they have a ***legacy SQL query*** that they want to migrate to **DAP⇨flow**, they might jump straight to [**📚Prototype legacy transforms**](../dap-airflow/onboarding/prototype-legacy-transforms).
Data Analysts are encouraged to think about what they need to do before deciding which document to read next. For example, if they have a ***legacy SQL query*** that they want to migrate to **DAP⇨flow**, they could jump straight to [**📚Prototype legacy transforms**](../dap-airflow/onboarding/prototype-legacy-transforms).

#### ***"We ❤️ your feedback!"***
Your continuous feedback enables us to improve **DAP⇨flow** and our ***Data Analytics Platform*** service. Survey links are provided at the end of each onboarding document.

#### Here below, is the full list of topics currently on offer...

Further documents will be added as they are developed. [**Jump to the end**](#topics-arriving-here-soon) to discover what is coming next!
#### **Below is the full list of topics currently on offer...**
And more topics will be added as they are ready. [**Skip to the end**](#coming-soon) to discover what's coming next!

### [Before you begin](../dap-airflow/onboarding/begin)
#### What must happen before I can begin DAP⇨flow?
#### What must happen before I can begin **DAP⇨flow**?

### [AWS Console access](../dap-airflow/onboarding/access-the-AWS-Management-Console)
#### How will I access the AWS Management Console?
#### How will I access the ***AWS Management Console***?

### [AWS region](../dap-airflow/onboarding/access-the-AWS-region)
#### How will I ensure I am in the correct AWS region?
#### How will I ensure I am in the correct **AWS region**?

### [Amazon Athena](../dap-airflow/onboarding/access-my-Amazon-Athena-database)
#### How will I access my database from Amazon Athena?
#### How will I use ***Amazon Athena*** to access my database?

### [My current service data](../dap-airflow/onboarding/access-my-current-service-data)
#### How will I access my current service data from Amazon Athena?
#### How will I access my `[service]`'s current data from ***Amazon Athena***?

### [My service data history](../dap-airflow/onboarding/access-my-service-data-history)
#### How will I access my service data history from Amazon Athena?
#### How will I access my `[service]`'s data history from ***Amazon Athena***?

### [Query my service data](../dap-airflow/onboarding/query-my-service-data)
#### How will I query and analyze my service data with Amazon Athena?
#### How will I query and analyze my `[service]`'s data with ***Amazon Athena***?

### [Prototype simple transforms](../dap-airflow/onboarding/prototype-simple-transforms)
#### How can I use Amazon Athena to prototype a simple table-join data transformation?
#### How can I use ***Amazon Athena*** to prototype a simple table-join data transformation?

### [Prototype legacy transforms](../dap-airflow/onboarding/prototype-legacy-transforms)
#### How do I use Amazon Athena to prototype a data transformation from a `[legacy SQL query]`?
#### How do I use ***Amazon Athena*** to prototype a data transformation from my `[legacy SQL query]`?

## Topics arriving here soon...
### [GitHub access](../dap-airflow/onboarding/github-access)
#### How do I set up my ***GitHub*** access for **DAP⇨flow**?

### Setting up Github
#### How do I set up my Github access for **DAP⇨flow**?
[DPF-185 DOCUMENTATION / 4.0 Set up Github access for DAP Airflow](https://docs.google.com/document/d/1gEjFshKJYYV5w9IbM1Rj0u2tj_BwSWpOLGV8aGwId1A/edit?usp=drive_link)
If you haven’t done so already, get yourself a GitHub account. This document will tell you how to get set up for automating and deploying your DAP Airflow transforms.
### [GitHub branching](../dap-airflow/onboarding/github-branch)
#### How do I create `[transform branch]` as my new working branch of **DAP⇨flow**'s repository?

### Github branching
#### How do I add a new development branch to **DAP⇨flow**'s `[dap-airflow]` repository?
[DPF-185 DOCUMENTATION / 4.1 Add a new development branch to the DAP Airflow repository](https://docs.google.com/document/d/1g6s14JK-9LBM8HT-F6-T-rGq7TxnWHD37FJi04Fh41Q/edit?usp=drive_link)
### [Committing transforms](../dap-airflow/onboarding/github-commit-transform)
#### How do I commit my working `[transform SQL]` to **DAP⇨flow**'s repository?

### [GitHub pull requests](../dap-airflow/onboarding/github-pull-request)
#### How do I raise a *"pull request"* to merge my `[transform branch]` into the `main` trunk of the **DAP⇨flow** repository?

### Committing transforms
#### How do I commit my working transform query to **DAP⇨flow**'s `[dap-airflow]` repository?
[DPF-185 DOCUMENTATION / 4.2 Commit a working transform query to the DAP Airflow repository](https://docs.google.com/document/d/18TL2ep1laWzHU9MW-XvC_N1S-gJx9SmPh9M2DC9XNQ4/edit?usp=drive_link)
## 📚Coming soon...
The following guides are due for completion.

### Raising pull requests
#### How do I raise a Pull Request to merge my development branch into the main trunk of **DAP⇨flow**'s `[dap-airflow]` repository?
[DPF-146 DOCUMENTATION / 4.3 Raise a Pull Request to merge the development branch into the main trunk of the DAP Airflow repository](https://docs.google.com/document/d/1LJjJobb2FVLoadUNCl3R7w9FedTFo8IJWHne-Zw9l-M/edit?usp=drive_link)
### Merging branches
#### How do I complete the merge of `[transform branch]` into the main trunk of **DAP⇨flow**'s repository?

### Merging branches
#### How do I complete the merge of my development branch into the main trunk of **DAP⇨flow**'s `[dap-airflow]` repository?
### Airflow
#### How will I access my data transforms using ***Airflow*** on the web?

### Adding tables to the raw-zone
#### How do I add a new table ingestion to my service's raw-zone database?
#### How do I add a new table ingestion to my `[service raw-zone]` database?

## Topics suggested for the later...
## 📚Suggested for later...
The following guides are on our backlog.

#### Migrating old Athena prototype SQL to the new **DAP⇨flow**

#### Refined-zone views

#### External access to **DAP⇨flow** products

#### Removing tables from the raw-zone
#### Removing tables from my `[service raw-zone]` database

#### Removing products from the refined-zone
#### Removing products from the `[service refined-zone]` database

<br/>

54 changes: 31 additions & 23 deletions docs/dap-airflow/onboarding/access-my-Amazon-Athena-database.md
Original file line number Diff line number Diff line change
@@ -6,59 +6,67 @@ layout: playbook_js
tags: [onboarding]
---

# How will I access my database from Amazon Athena?
# How will I use ***Amazon Athena*** to access my database?

### 1. Access the AWS Management Console
**`🖱`** In your web browser, log in to your AWS account to access the AWS Management Console.
## 1. Access the ***AWS Management Console***
**`🖱`** In your web browser, log in to your AWS account to access the ***AWS Management Console***.

👉 First time AWS Users should **start here ►** **[DAP⇨flow📚AWS Console access](../onboarding/access-the-AWS-Management-Console)**

### 2. Familiarize Yourself with the console
**`🖱`** Take a moment to get comfortable with the layout and check out options available in the AWS console interface.
## 2. Familiarize yourself with the console
**`👁`** Take a moment to get comfortable with the layout and check out the options available in the ***AWS Management Console*** interface.

**`Fig. 2 & 3`** ![Fig. 2 & 3](../images/access-my-Amazon-Athena-database-two-three.png)

### 3. Open Amazon Athena
**`🖱`** Locate and open Amazon Athena from the services menu.
## 3. Open ***Amazon Athena***
**`🖱`** Locate and open ***Amazon Athena*** from the services menu.

### 4. Select the Workgroup
**`🖱`** On the Athena interface, look to the top right corner next to "**Workgroup**". Then from the list box, select `[my service]`.
## 4. Select your `[service workgroup]`
**`👁`** Look to the top right corner of the ***Amazon Athena*** interface, next to "**Workgroup**".

**`🖱`** From the list box there, click on **``** to select your `[service workgroup]`.

**`Fig. 4 & 5`** ![Fig. 4 & 5](../images/access-my-Amazon-Athena-database-four-five.png)

### 5. Select the Database
**`🖱`** On the left side of the Athena interface, under the "**Database**" section, find the list box and check you have `[my service raw zone]`.
## 5. Select your `[service raw zone]` database
**`👁`** On the left side of the ***Amazon Athena*** interface, below the "**Database**" section, find the list box and check you can already see your `[service raw zone]` database.

**`🖱`** If you don't see it there, then simply click on **``** to select your `[service raw zone]` database from the list box.
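Once the workgroup and database are both selected, a quick sanity check in the query editor confirms everything is wired up. This is an illustrative sketch only: `my_first_table` is a placeholder, not a real table name.

```sql
-- List the tables visible in the selected raw-zone database:
SHOW TABLES;

-- Then preview a few rows from any table listed
-- (replace my_first_table with one of your own tables):
SELECT * FROM my_first_table LIMIT 10;
```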

---
## ***"We ❤️ your feedback!"***
![DAP⇨flow](../images/DAPairflowFLOWleft.png)
:::tip UX
👉 Please use **this link ►** [**DAP⇨flow** `UX` **Feedback / access-my-Amazon-Athena-database**](https://docs.google.com/forms/d/e/1FAIpQLSdqeNyWIPMNBHEr-YSyxnXQ4ggTwJPkffMYgFaJ4hGEhIL6LA/viewform?usp=pp_url&entry.339550210=access-my-Amazon-Athena-database)
### 👉 Please use **this link ►** [**DAP⇨flow** `UX` **Feedback / access-my-Amazon-Athena-database**](https://docs.google.com/forms/d/e/1FAIpQLSdqeNyWIPMNBHEr-YSyxnXQ4ggTwJPkffMYgFaJ4hGEhIL6LA/viewform?usp=pp_url&entry.339550210=access-my-Amazon-Athena-database)

- Your feedback enables us to improve **DAP⇨flow** and our Data Analytics Platform service.
- We encourage all our users to be generous with their time, in giving us their recollections and honest opinions about our service.
- We especially encourage our new users to give feedback at the end of every **📚Onboarding** task because the quality of the onboarding experience really matters.
**Please use this link to help us understand your user experience!**

**Please use this link to help us understand your user experience!**

:::

#### UX Criteria
## 📚`UX` Criteria
:::info ABILITY
* Hackney **AWS Management Console** user
* `[my service]` Data Analyst
* **AWS Management Console** user
* Hackney `[service]` Data Analyst

:::

:::note BEHAVIOR
**Measures** the behavior of **Amazon Athena** when first run and configured by the user
### How will I use ***Amazon Athena*** to access my database?
**Measures** the behavior of **Amazon Athena** when first run and configured by the user:

**Given** in my web browser, I have access to the Data Platform AWS Management Console
**Given** in my web browser, I have access to the ***AWS Management Console***
**~and** I have familiarized myself with the console interface
**When** I open Amazon Athena via the console menu system, by clicking on “**Athena**” where it appears
**~and** I click through any splash screen that might appear, eg. “Explore the Query Editor”
**Then** I will be presented with the Amazon Athena interface
**~and** up at the right next to “**Workgroup**”, I should be offered `[my service]` from the list box
**~and** over on the left under “**Database**” I should be offered `[my service raw zone]` from the list box.

**When** I open ***Amazon Athena*** via the console menu system, by clicking on “**Athena**” wherever it appears
**~and** I click through any splash screen that might appear, e.g. “**Explore the Query Editor**”

**Then** I will be presented with the **Amazon Athena** interface
**~and** up at the right next to “**Workgroup**”, I should be offered my `[service workgroup]` from the list box
**~and** over on the left under “**Database**” I should be offered my `[service raw zone]` from the list box.

**Scale** of 3 to 4 **~and** flow features.
:::
