Integration for aws-alb on top of S3 using Materialized view #1496

Draft
wants to merge 22 commits into
base: main
Changes from 21 commits
Commits
b9d42bf
Integration for aws.alb -> s3 -> MaterializedView using opensearch SQ…
YANG-DB Apr 5, 2023
c152f90
fix broken markdown documents links and remove redundant docs (#1495)
YANG-DB Apr 5, 2023
c13f7a6
add additional information of the steps and assets dependencies with …
YANG-DB Apr 5, 2023
3af084e
Merge remote-tracking branch 'origin/aws_alb_integration' into aws_al…
YANG-DB Apr 13, 2023
46a0f2b
add diagrams for the data flow
YANG-DB Apr 13, 2023
b5050d8
Merge remote-tracking branch 'origin/aws_alb_integration' into aws_al…
YANG-DB Apr 13, 2023
8bdf6d4
add diagram for the flow manager to load the assets of the integration
YANG-DB Apr 13, 2023
3c1ba96
Merge remote-tracking branch 'origin/aws_alb_integration' into aws_al…
YANG-DB Apr 14, 2023
25d8cb8
update sample document fields references
YANG-DB Apr 14, 2023
4e6cee1
Merge remote-tracking branch 'origin/aws_alb_integration' into aws_al…
YANG-DB Apr 14, 2023
13166ab
update reference fields references
YANG-DB Apr 14, 2023
7674a3a
Merge remote-tracking branch 'origin/aws_alb_integration' into aws_al…
YANG-DB Apr 14, 2023
da20c62
add connectivity flow diagram
YANG-DB Apr 14, 2023
8b94fcb
Merge remote-tracking branch 'origin/aws_alb_integration' into aws_al…
YANG-DB Apr 14, 2023
4b43e17
move alb into a distinct aws folder
YANG-DB Apr 14, 2023
2fc3fc0
Merge remote-tracking branch 'origin/aws_alb_integration' into aws_al…
YANG-DB Apr 19, 2023
e8a9ceb
add fields.md support for spec and details
YANG-DB Apr 19, 2023
311a58f
Merge remote-tracking branch 'origin/aws_alb_integration' into aws_al…
YANG-DB Apr 20, 2023
3689a46
add dashboard preview images
YANG-DB Apr 20, 2023
c0002f3
Merge branch 'main' into aws_alb_integration
YANG-DB May 3, 2023
a1180ee
Merge branch 'main' into aws_alb_integration
YANG-DB May 3, 2023
7cdad55
append service & traces analytics docs & info
YANG-DB May 18, 2023
49 changes: 49 additions & 0 deletions integrations/aws/README.md
@@ -0,0 +1,49 @@
# AWS Integrations

This library contains Observability integrations for various AWS services.

## Load Balancers Logs

Load balancers are a significant part of a cloud environment. Once your system needs high availability, you are likely to require a load balancer in front of the different instances of your app.

AWS offers three types of load balancers, suitable for different scenarios: Elastic Load Balancers, Application Load Balancers, and Network Load Balancers.
ELB stands for [Elastic Load Balancer](https://aws.amazon.com/elasticloadbalancing/); this was the product's name when it was introduced in 2009 as the only type of load balancer available.

ELB works at both layer 4 (TCP) and layer 7 (HTTP), but it has several limitations:
- it can’t forward traffic on more than one port per instance
- it doesn’t support forwarding to IP addresses (it can only forward to explicit EC2 instances or containers in ECS or EKS)
- it doesn’t support websockets

In 2016 AWS launched ELB version 2, which is made up of two products:
- [Application Load Balancer (ALB)](https://aws.amazon.com/elasticloadbalancing/application-load-balancer/)
- [Network Load Balancer (NLB)](https://aws.amazon.com/elasticloadbalancing/network-load-balancer/)

Both use a similar architecture built around “target groups”, which add one level of indirection.
Listeners receive requests and decide which target group to forward them to. Both ALB and NLB can forward traffic to IP addresses.
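
As a rough illustration of the listener and target group relationship described above, the following sketch models a listener that picks a target group by path prefix and then picks a target inside that group. The class names, the prefix rule and the round-robin choice are assumptions made for this example only; they are not part of any AWS API, and path-based rules apply to ALB rather than NLB.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TargetGroup:
    """A named pool of targets (hypothetical model, IP addresses here)."""
    name: str
    targets: List[str]
    _next: int = field(default=0, repr=False)

    def pick(self) -> str:
        # Simple round-robin choice; used here only for illustration.
        target = self.targets[self._next % len(self.targets)]
        self._next += 1
        return target


@dataclass
class Listener:
    """Maps path prefixes to target groups, with a default group as fallback."""
    rules: Dict[str, TargetGroup]
    default: TargetGroup

    def route(self, path: str) -> str:
        # The first matching prefix decides the target group.
        for prefix, group in self.rules.items():
            if path.startswith(prefix):
                return group.pick()
        return self.default.pick()


api = TargetGroup("api", ["10.0.1.10", "10.0.1.11"])
web = TargetGroup("web", ["10.0.2.20"])
listener = Listener(rules={"/api": api}, default=web)
```

Calling `listener.route("/api/users")` twice alternates between the two `api` targets, while any other path falls through to the `web` group.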

**Observability supports the following load balancer log integrations:**

- [V1: Classic load balancer logs]()
- [V2: Application load balancer logs](elb/info/README.md)
- [V2: Network load balancer logs]()

---

## s3
Amazon Simple Storage Service [(Amazon S3)](https://aws.amazon.com/s3/) is an object storage service offering industry-leading scalability, data availability, security, and performance.

---

## cloudfront
[Amazon CloudFront](https://aws.amazon.com/cloudfront/) is a content delivery network (CDN) service built for high performance, security, and developer convenience.

---

## config
[AWS config service](https://aws.amazon.com/config/) assesses, audits, and evaluates the configurations and relationships of your resources on AWS, on premises, and on other clouds.

- AWS Config records details of changes to your resources to provide you with a configuration history. You can use the AWS Management Console, API, or CLI to obtain details of what a resource’s configuration looked like at any point in the past.
- AWS Config helps you record software configuration changes within your Amazon Elastic Compute Cloud (EC2) instances and servers running on-premises, as well as servers and virtual machines in environments provided by other cloud providers.
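- AWS Config discovers, maps, and tracks AWS resource relationships in your account. For example, if a new EC2 security group is associated with an EC2 instance, AWS Config records the updated configurations of both the EC2 security group and the EC2 instance.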

---
9 changes: 9 additions & 0 deletions integrations/aws/elb/assets/datasource/datasources.json
@@ -0,0 +1,9 @@
[
{
"name" : "myspark",
"connector": "jdbc",
"properties" : {
"url" : "jdbc:hive2://spark-thrift:10000/default"
}
}
]
654 changes: 654 additions & 0 deletions integrations/aws/elb/assets/display/elb.ndjson

Large diffs are not rendered by default.

29 changes: 29 additions & 0 deletions integrations/aws/elb/assets/display/fields.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# Queried Fields

| Fields | Mapping Source |
|--------|----------------|
| `@timestamp` | |
| `cloud.account.id` | |
| `cloud.region` | |
| `http.response.status_code` | |
| `http.request.bytes` | |
| `http.response.bytes` | |
| `http_protocol` | |
| `http.request.method` | |
| `ssl_protocol` | |
| `ssl_cipher` | |
| `url.domain` | |
| `url.path` | |
| `user_agent.original` | |
| `source.geo.country_name` | |
| `source.geo.country_iso_code` | |
| `source.ip` | |
| `destination.ip` | |
| `request_processing_time` | |
| `response_processing_time` | |
| `event.category` | |
| `event.ingested` | |
| `event.kind` | |
| `event.module` | |
3 changes: 3 additions & 0 deletions integrations/aws/elb/assets/display/index-pattern.ndjson

Large diffs are not rendered by default.

30 changes: 30 additions & 0 deletions integrations/aws/elb/assets/index/alb_logs_metrics.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
{
"mappings" : {
"properties" : {
"count2xx" : {
"type" : "long"
},
"count4xx" : {
"type" : "long"
},
"count5xx" : {
"type" : "long"
},
"latencyInSec" : {
"type" : "float"
},
"timestamp" : {
"type" : "date"
},
"totalCount" : {
"type" : "long"
},
"totalReceivedBytes" : {
"type" : "long"
},
"totalSentBytes" : {
"type" : "long"
}
}
}
}
42 changes: 42 additions & 0 deletions integrations/aws/elb/assets/index/alb_logs_raw.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
{
"mappings" : {
"properties" : {
"receivedBytes" : {
"type" : "long"
},
"requestUrl" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"requestVerb" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"sentBytes" : {
"type" : "long"
},
"statusCode" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"timestamp" : {
"type" : "date"
}
}
}
}
38 changes: 38 additions & 0 deletions integrations/aws/elb/assets/table/alb_logs_temp.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
CREATE TABLE IF NOT EXISTS alb_logs_temp
(
type string,
time timestamp,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code int,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string,
domain_name string,
chosen_cert_arn string,
matched_rule_priority string,
request_creation_time string,
actions_executed string,
redirect_url string,
lambda_error_reason string,
target_port_list string,
target_status_code_list string,
classification string,
classification_reason string
)
USING PARQUET
LOCATION 's3a://xxx/'
14 changes: 14 additions & 0 deletions integrations/aws/elb/assets/view/alb_logs_metrics.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
CREATE MATERIALIZED VIEW alb_logs_metrics
AS
SELECT
UNIX_MILLIS(window.start) AS timestamp,
COUNT(*) AS totalCount,
AVG(target_processing_time) FILTER(WHERE target_processing_time != -1) AS latencyInSec,
COUNT(*) FILTER(WHERE target_status_code LIKE '2__') AS count2xx,
COUNT(*) FILTER(WHERE target_status_code LIKE '4__') AS count4xx,
COUNT(*) FILTER(WHERE target_status_code LIKE '5__') AS count5xx,
SUM(received_bytes) AS totalReceivedBytes,
SUM(sent_bytes) AS totalSentBytes
FROM alb_logs_temp
WHERE client_ip = '10.212.10.101'
GROUP BY TUMBLE(time, '1 Minute');
11 changes: 11 additions & 0 deletions integrations/aws/elb/assets/view/alb_logs_raw.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
CREATE MATERIALIZED VIEW alb_logs_raw
AS
SELECT
UNIX_MILLIS(time) AS timestamp,
request_verb AS requestVerb,
request_url AS requestUrl,
target_status_code AS statusCode,
received_bytes AS receivedBytes,
sent_bytes AS sentBytes
FROM alb_logs_temp
WHERE client_ip = '10.212.10.101'
35 changes: 35 additions & 0 deletions integrations/aws/elb/config.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
{
"name": "alb_logs",
"version": {
"integ": "0.1.0",
"schema": "1.0.0",
"resource": "^2.6.0"
},
"description": "AWS ALB Integration",
"catalog": "observability",
"components": [
"http,communication,cloud,container,aws_elb"
],
"collection":[
{
"logs": [{
"info": "ALB logs",
"input_type":"logfile",
"dataset":"aws.alb",
"labels" :["aws","ALB"]
}]
},
{
"metrics": [{
"info": "ALB metrics signals ",
"input_type": "metrics",
"dataset": "alb.status",
"labels": ["metrics",""]
}]
}
],
"repo": {
"github": "https://github.com/opensearch-project/observability/tree/main/integrarions/alb"
}
}

94 changes: 94 additions & 0 deletions integrations/aws/elb/info/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
![Alb logs](alb_logo.png)

# What is [ALB](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html)

AWS Application Load Balancer (ALB) is a service offered by Amazon Web Services (AWS) that helps distribute incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, or IP addresses. It is a part of AWS Elastic Load Balancing (ELB) services, which also include the Classic Load Balancer (CLB) and the Network Load Balancer (NLB).

The ALB operates at the application layer (Layer 7) of the Open Systems Interconnection (OSI) model and is designed to route HTTP/HTTPS traffic. It can make routing decisions based on the content of the request, allowing more advanced load distribution compared to the Classic Load Balancer.


## What is ALB Integration

The ALB integration is concerned with the following aspects:

- Allow simple and automatic generation of all schema-structured data:
  - logs (using the standard SS4O logs schema, including specific cloud component logs and custom AWS ELB logs)
  - metrics (using the standard SS4O schema)

- Add dashboard assets for both logs and metrics

- Add correlation queries to investigate log-based metrics

The integration supports the data flow shown in the following diagram:

![](data-flow-diagram.png)


## ALB logs fields

The fields of an access log entry are described in the [AWS access logs documentation](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html).
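
As a rough illustration of the entry layout, the sketch below splits one access log line into named fields. It is a minimal example under stated assumptions, not the parser used by this integration: the field names mirror the columns of the `alb_logs_temp` table, the sample line is illustrative, and `shlex` is used only as a convenient way to honor the double-quoted fields.

```python
import shlex
from typing import Any, Dict, Optional, Tuple

# Space-separated fields in the order they appear in one log line.
# Names follow the alb_logs_temp columns (an assumption of this sketch).
RAW_FIELDS = [
    "type", "time", "elb", "client", "target",
    "request_processing_time", "target_processing_time",
    "response_processing_time", "elb_status_code", "target_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol", "target_group_arn", "trace_id",
    "domain_name", "chosen_cert_arn", "matched_rule_priority",
    "request_creation_time", "actions_executed", "redirect_url",
    "lambda_error_reason", "target_port_list", "target_status_code_list",
    "classification", "classification_reason",
]
INT_FIELDS = ("elb_status_code", "received_bytes", "sent_bytes")
FLOAT_FIELDS = ("request_processing_time", "target_processing_time",
                "response_processing_time")


def split_host_port(value: str) -> Tuple[Optional[str], Optional[int]]:
    """Split 'ip:port'; a bare '-' (no target) yields (None, None)."""
    host, sep, port = value.rpartition(":")
    if not sep:
        return None, None
    return host, int(port)


def parse_alb_line(line: str) -> Dict[str, Any]:
    tokens = shlex.split(line)  # keeps quoted fields together
    if len(tokens) < len(RAW_FIELDS):
        raise ValueError("unexpected number of fields: %d" % len(tokens))
    entry: Dict[str, Any] = dict(zip(RAW_FIELDS, tokens))
    entry["client_ip"], entry["client_port"] = split_host_port(entry.pop("client"))
    entry["target_ip"], entry["target_port"] = split_host_port(entry.pop("target"))
    # The request field holds "<verb> <url> <protocol>".
    verb, url, proto = entry.pop("request").split(" ", 2)
    entry.update(request_verb=verb, request_url=url, request_proto=proto)
    for name in INT_FIELDS:
        entry[name] = int(entry[name])
    for name in FLOAT_FIELDS:
        entry[name] = float(entry[name])
    return entry


# Illustrative sample line, not taken from this integration's data.
SAMPLE = (
    'http 2018-07-02T22:23:00.186641Z app/my-loadbalancer/50dc6c495c0c9188 '
    '192.168.131.39:2817 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 '
    '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.46.0" - - '
    'arn:aws:elasticloadbalancing:us-east-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 '
    '"Root=1-58337262-36d228ad5d99923122bbe354" "-" "-" 0 '
    '2018-07-02T22:22:48.364000Z "forward" "-" "-" "10.0.0.1:80" "200" "-" "-"'
)
entry = parse_alb_line(SAMPLE)
```

The resulting dictionary has one key per `alb_logs_temp` column, with the client, target and request fields already split into their parts.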

## Logs reference

The `elb` dataset collects logs from AWS ELBs.

**Exported fields**

The AWS logs mapping fields belong to the following component categories:
- [http](../../../../src/main/resources/schema/observability/logs/http.mapping)
- [communication](../../../../src/main/resources/schema/observability/logs/communication.mapping)
- [cloud](../../../../src/main/resources/schema/observability/logs/cloud.mapping)
- [container](../../../../src/main/resources/schema/observability/logs/container.mapping)
- [aws_elb](../../../../src/main/resources/schema/observability/logs/aws_alb.mapping)

### Integration Loading Process
The AWS ALB logs integration loading process includes the following assets:
- **Connectivity**
  - S3 datasource connectivity
  - Spark compute engine connectivity
- **Tables**
  - ALB logs external table (definition including the field mapping to the observability logs template)
  - ALB logs index (definition based on the observability logs template)
- **Views**
  - ALB logs materialized view (a pre-calculated query based on a specific given dimension)
  - ALB metrics materialized view (a pre-calculated query based on a specific given dimension)
- **Display**
  - ALB dashboard for viewing all the given information in a meaningful manner
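
To make the metrics materialized view more concrete, the following plain-Python sketch computes the same kind of per-minute summary over a handful of rows. It only illustrates what a one-minute tumbling-window aggregation produces; it is not how Spark or the integration evaluates the view. The row keys mirror the `alb_logs_temp` columns, the output keys mirror the `alb_logs_metrics` index, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WINDOW_MS = 60_000  # one-minute tumbling window


def window_start_ms(ts: datetime) -> int:
    """Epoch milliseconds of the start of the window that contains ts."""
    epoch_ms = (ts - EPOCH) // timedelta(milliseconds=1)
    return epoch_ms - epoch_ms % WINDOW_MS


def status_in_class(code: str, first_digit: str) -> bool:
    # Same idea as matching a three-character code such as '2__'.
    return len(code) == 3 and code[0] == first_digit


def aggregate(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    buckets: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        buckets[window_start_ms(row["time"])].append(row)

    metrics = []
    for start, group in sorted(buckets.items()):
        # -1 marks requests without a measured target time; skip them.
        latencies = [r["target_processing_time"] for r in group
                     if r["target_processing_time"] != -1]
        codes = [r["target_status_code"] for r in group]
        metrics.append({
            "timestamp": start,
            "totalCount": len(group),
            "latencyInSec": sum(latencies) / len(latencies) if latencies else None,
            "count2xx": sum(status_in_class(c, "2") for c in codes),
            "count4xx": sum(status_in_class(c, "4") for c in codes),
            "count5xx": sum(status_in_class(c, "5") for c in codes),
            "totalReceivedBytes": sum(r["received_bytes"] for r in group),
            "totalSentBytes": sum(r["sent_bytes"] for r in group),
        })
    return metrics


def _row(second_offset, status, latency, received, sent):
    base = datetime(2023, 4, 5, 12, 0, 0, tzinfo=timezone.utc)
    return {"time": base + timedelta(seconds=second_offset),
            "target_status_code": status, "target_processing_time": latency,
            "received_bytes": received, "sent_bytes": sent}


rows = [_row(5, "200", 0.2, 10, 100), _row(40, "404", -1, 20, 200),
        _row(70, "502", 0.4, 30, 300)]
result = aggregate(rows)
```

With these three sample rows, the first two fall into the same one-minute window and the third into the next one, so `result` holds two summary records.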


#### Assets Loading Order
The assets must be loaded in the following order:

1) Connectivity - First, the connection-related configuration must be validated for existence and correctness.

![](datasource-connection-flow.png)

---
2) Mapping - Next, the schema-specific instructions dictated by the integration's config must be verified to exist or be created.

---
3) Tables - Next, the external / internal tables / indices are verified to exist or are created (they are based on the mapping and connectivity phases).

---
4) Views - Next, the views' search templates are created (they are based on the tables and connectivity phases).

---
5) Display - Last, the dashboards and visual assets are uploaded (they are based on the views, tables and connectivity phases).

---
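
The loading order above can be sketched as a small dependency-aware loop. This is a minimal sketch, assuming each phase is represented by a callable that raises on failure; the `PHASES` table, the `load_assets` function and the status strings are hypothetical names for illustration and do not describe the actual flow manager.

```python
from typing import Callable, Dict, List, Tuple

# Phase name and the phases it is based on, in loading order.
PHASES: List[Tuple[str, List[str]]] = [
    ("connectivity", []),
    ("mapping", []),
    ("tables", ["mapping", "connectivity"]),
    ("views", ["tables", "connectivity"]),
    ("display", ["views", "tables", "connectivity"]),
]


def load_assets(loaders: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Run each phase in order; skip a phase if anything it is based on did not load."""
    status: Dict[str, str] = {}
    for name, depends_on in PHASES:
        if any(status.get(dep) != "loaded" for dep in depends_on):
            status[name] = "skipped"
            continue
        try:
            loaders[name]()  # hypothetical loader; raises on failure
            status[name] = "loaded"
        except Exception:
            status[name] = "failed"
    return status


def ok() -> None:
    """Stand-in loader that always succeeds."""


def broken() -> None:
    """Stand-in loader that always fails."""
    raise RuntimeError("table could not be created")


all_ok = load_assets({name: ok for name, _ in PHASES})
tables_fail = load_assets({**{name: ok for name, _ in PHASES}, "tables": broken})
```

In the second run the failing tables phase causes the views and display phases to be skipped, which matches the stated dependencies between phases.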

### Integration assets flow
The following diagram describes how the flow manager loads and validates the ALB logs integration assets as part of its state transitions.

![](flint-integration-flow-chart.png)


The following diagram details how the dashboards and panels interact with the backing SQL queries that are executed.

![](alb-integration-load-assets.png)


### Dashboards
The following dashboard preview shows the summarized information collected from the load balancer log index:

![dashboard-elb.jpg](dashboard-elb.jpg)
Binary file added integrations/aws/elb/info/alb_logo.png
Binary file added integrations/aws/elb/info/dashboard-elb.jpg