read SA data & write to cloud logging #1145

Open · wants to merge 3 commits into base: main

Changes from 1 commit
66 changes: 66 additions & 0 deletions examples/python/extract-logs-write-to-cloud-logging/README.md
@@ -0,0 +1,66 @@
## Overview

A Python script that extracts Looker system/audit logs from [System Activity](https://docs.looker.com/admin-options/system-activity) and exports the logs to Cloud Logging. This example formats the output logs like a [GCP Audit Log](https://cloud.google.com/logging/docs/audit/understanding-audit-logs) on a best-effort basis. See the [mapping](#gcp-audit-log-fields-to-looker-system-activity-mapping) for a comparison between Looker System Activity fields and GCP Audit Log fields.
@jeremytchang (Collaborator), Sep 3, 2022:

First lines should state what it does, not be saved for the note.

A Python script that extracts [System Activity](https://docs.looker.com/admin-options/system-activity) data from the last 10 minutes, formats the data as Audit Logs, and exports the logs to Cloud Logging. The data formatting/mapping is best effort. See [data mapping](#gcp-audit-log-fields-to-looker-system-activity-mapping) below.

Then can keep note as:

**_NOTE:_**  You can schedule this script to run every 10 minutes using a cron job or equivalent to continually create and export logs.

Author:

done


> **_NOTE:_** The script extracts System Activity data from the last 10 minutes. You can then schedule this script to run every 10 minutes using a cron job or equivalent.

## Requirements
- A Looker instance on which you have the Admin role or the `see_system_activity` permission
- Google Cloud Project with Cloud Logging API enabled
- [pyenv](https://github.com/pyenv/pyenv#installation) installed
Collaborator:

Should add the python version and gcloud to this

Author:

done

itodotimothy6 marked this conversation as resolved.

## Deployment

- Clone the repo and navigate to this directory
```
git clone https://github.com/looker-open-source/sdk-codegen.git
cd sdk-codegen/examples/python/extract-logs-write-to-cloud-logging
```

- Set up a Python virtual environment
Collaborator:

Is python virtual environment necessary for development?
Seems like Python version 3.8.2 is what's necessary.

Author:

A virtual environment is not required. However, it is generally encouraged to install dependencies in a virtual environment

Collaborator:

This sample script is not the place for this. We should only list bare minimum requirements to run the sample script.

```
pyenv install 3.8.2
pyenv local 3.8.2
python -m venv .venv
source .venv/bin/activate
```

- Install dependencies
```
pip install looker-sdk
pip install --upgrade google-cloud-logging
```

Collaborator:

Does the --upgrade assume the developer already has google-cloud-logging installed?

Author:

This is a mistake. The --upgrade option is not required.


- Create API credentials and set environment variables
```
export LOOKERSDK_BASE_URL="<Your API URL>"
export LOOKERSDK_CLIENT_ID="<Your Client ID>"
export LOOKERSDK_CLIENT_SECRET="<Your Client Secret>"
```
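The SDK reads these three variables at initialization, so a quick preflight check avoids a confusing authentication failure later. A minimal sketch; the `missing_looker_env` helper is our own illustration, not part of this example:

```python
import os

# Variables the Looker SDK's init40() expects to find in the environment.
REQUIRED = ("LOOKERSDK_BASE_URL", "LOOKERSDK_CLIENT_ID", "LOOKERSDK_CLIENT_SECRET")


def missing_looker_env(environ=os.environ):
    """Return the names of required Looker SDK variables that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_looker_env()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```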

- Configure gcloud and [set up a service account](https://cloud.google.com/logging/docs/reference/libraries#setting_up_authentication) to write logs to Cloud Logging
```
gcloud config set project <Project ID>
export GOOGLE_APPLICATION_CREDENTIALS="<Service Account Key Path>"
```

- Run `main.py`
```
python main.py
```


## GCP Audit Log Fields to Looker System Activity Mapping

| GCP Audit Log Field | Looker System Activity Field |
@jeremytchang (Collaborator), Sep 3, 2022:

Can the second column be Looker System Activity Field OR Value? So that the other hardcoded values can be documented here too?

| ----------- | ----------- |
| [logName](https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#:~:text=Fields-,logName,-string) | `looker_system_activity_logs` |
@jeremytchang (Collaborator), Sep 3, 2022:

Do the GCP services that consume Audit Logs have expectations on the values in the Audit Log?
This example would set precedent and probably be defacto standard for Audit logs coming from Looker. I assume compliance/auditing expects certain values for Audit Logs.

Author:

Yes, GCP Audit logs have expected values.

However, some of these values do not apply or make sense for a Looker resource. For example, a GCP resource always falls under a project, but a Looker resource doesn't; a project in Looker means something different. The mapping is on a best-effort basis.

Technically the output of this script is not a GCP Audit log (when you filter for GCP Audit logs, these logs won't appear). I'm only using the Audit log format in this example for similarity when querying these logs.

Collaborator:

Good to know. So this is only for querying standard GCP Audit Logs with Looker System Activity logs included.
However, my main worry is compliance/auditing issues. Like I previously asked, are there any GCP services that consume audit logs that may require specific values? (And have you consulted them on this?)

Author:

Yes, all GCP resources require certain values in audit logs (e.g. project, organization, IAM details). I already mentioned that some of these values don't apply to Looker because technically Looker is not a GCP resource (yet).

No, I did not consult with the Audit Logs team. This repo is only an example showing a way you can programmatically export System Activity logs today, until the product team integrates Looker as a GCP service. It's my understanding that only the product team can get the type of compliance approval you're referring to.

System Activity logs are NOT GCP Audit Logs, and I'm sure the Audit team would echo that if I consulted them. This is not an official solution, but an example demonstrating one workaround you can use until the product team provides an official solution.

I'm only mapping the SA logs to look like GCP Audit logs since most customers are familiar with the format, and I mentioned in the readme that the mapping is best effort.

| [timestamp](https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#:~:text=reported%20the%20error.-,timestamp,-string) | [event.created](https://docs.looker.com/admin-options/tutorials/events#:~:text=for%20example%2C%20create_dashboard-,created,-Date%20and%20time) |
| [resource.type](https://cloud.google.com/logging/docs/reference/v2/rest/v2/MonitoredResource#:~:text=Fields-,type,-string) | `looker_system_activity_logs` |
itodotimothy6 marked this conversation as resolved.
| [insertId](https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#:~:text=is%20LogSeverity.DEFAULT.-,insertid,-string) | [event.id](https://docs.looker.com/admin-options/tutorials/events#:~:text=Description-,id,-Unique%20numeric%20identifier) |
| `protoPayload.status` | [event.attribute.status](https://docs.looker.com/admin-options/tutorials/events#:~:text=Trigger-,Attributes,-add_external_email_to_scheduled_task) |
| `protoPayload.authenticationInfo` | [event.user_id](https://docs.looker.com/admin-options/tutorials/events#:~:text=of%20the%20event-,user_id,-Unique%20numeric%20ID), [event.sudo_user_id](https://docs.looker.com/admin-options/tutorials/events#:~:text=for%20example%2C%20dashboard-,sudo_user_id,-Unique%20numeric%20ID) |
| `protoPayload.authorizationInfo` | `permission_set.permissions` |
| `protoPayload.methodName` | [event.name](https://docs.looker.com/admin-options/tutorials/events#:~:text=triggered%20the%20event-,name,-Name%20of%20the) |
| `protoPayload.response` | [event_attributes](https://docs.looker.com/admin-options/tutorials/events#:~:text=Trigger-,Attributes,-add_external_email_to_scheduled_task) |
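Applied to one event, the mapping above produces a structure like the following. This is an illustrative sketch mirroring main.py's `format()` with made-up field values; the `to_log_entry` helper is hypothetical, not part of the PR:

```python
import json


def to_log_entry(event, user):
    """Sketch: shape one System Activity event per the mapping table above.

    `event` and `user` are plain dicts keyed by System Activity field names.
    This is a best-effort illustration, not an official GCP Audit Log schema.
    """
    return {
        "logName": "looker_system_activity_logs",
        "timestamp": event["event.created_time"],
        "insertId": event["event.id"],
        "resource": {"type": "looker"},
        "protoPayload": {
            "@type": "looker_system_activity_logs",
            "authenticationInfo": {"principalEmail": user["user.email"]},
            "serviceName": "looker.com",
            "methodName": event["event.name"],
        },
    }


entry = to_log_entry(
    {"event.created_time": "2022-09-03 12:00:00", "event.id": "42", "event.name": "login"},
    {"user.email": "analyst@example.com"},
)
print(json.dumps(entry, indent=2))
```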
210 changes: 210 additions & 0 deletions examples/python/extract-logs-write-to-cloud-logging/main.py
@@ -0,0 +1,210 @@
import json
from collections import defaultdict

import looker_sdk
from looker_sdk import models40 as models

from google.cloud import logging

sdk = looker_sdk.init40()


def create_query():
    response = sdk.create_query(
        body=models.WriteQuery(
            model="system__activity",
            view="event_attribute",
            fields=[
                "event.id",
                "event.name",
                "event.category",
                "event.sudo_user_id",
                "event.created_time",
                "user.email",
                "user.name",
                "permission_set.permissions",
                "permission_set.name",
                "permission_set.id",
                "model_set.models",
                "model_set.name",
                "event_attribute.name",
                "event_attribute.value",
                "event_attribute.id",
                "group.id",
                "group.name",
                "group.external_group_id",
                "model_set.id",
                "user.dev_branch_name"
Collaborator:

Small nitpick but can we alphabetize?

Author:

done

            ],
            filters={"event.created_time": "10 minutes"},
            sorts=["event.created_time desc"],
            filter_config={"event.created_time": [{
@jeremytchang (Collaborator), Sep 3, 2022:

This should be set to null unless i'm misunderstanding behavior or api docs.

https://developers.looker.com/api/explorer/4.0/types/Query/WriteQuery
The filter_config represents the state of the filter UI on the explore page for a given query. When running a query via the Looker UI, this parameter takes precedence over "filters". When creating a query or modifying an existing query, "filter_config" should be set to null. Setting it to any other value could cause unexpected filtering behavior. The format should be considered opaque.

Author:

True. Having both filters & filter_config is redundant.

"type": "past",
"values": [{
"constant": "10",
"unit": "min"
}]
}]}
))

return response


def get_looker_data():
    # create_query returns a Query model object, so use attribute access for the id
    query_id = create_query().id
    response = sdk.run_query(
        query_id=query_id,
        result_format="json")
    return json.loads(response)


def group_permission_by_event_id(data):
    output = defaultdict(set)
    for r in data:
        event_id = r['event.id']
        permission_data = json.dumps({
            'permission_set_id': r['permission_set.id'],
            'permission_set_name': r['permission_set.name'],
            'permission_set_permissions': r['permission_set.permissions'],
        })
        output[event_id].add(permission_data)
    return output


def group_event_attribute_by_event_id(data):
    output = defaultdict(set)
    for r in data:
        event_id = r['event.id']
        event_attribute_data = json.dumps({
            'event_attribute_id': r['event_attribute.id'],
            'event_attribute_name': r['event_attribute.name'],
            'event_attribute_value': r['event_attribute.value'],
        })
        output[event_id].add(event_attribute_data)
    return output


def group_model_set_by_event_id(data):
    output = defaultdict(set)
    for r in data:
        event_id = r['event.id']
        model_set_data = json.dumps({
            'model_set_id': r['model_set.id'],
            'model_set_name': r['model_set.name'],
            'model_set_models': r['model_set.models'],
        })
        output[event_id].add(model_set_data)
    return output


def group_user_by_event_id(data):
    output = defaultdict(set)
    for r in data:
        event_id = r['event.id']
        user_data = json.dumps({
            'user_email': r['user.email'],
            'user_name': r['user.name'],
            'user_dev_branch_name': r['user.dev_branch_name'],
        })
        output[event_id].add(user_data)
    return output


def group_event_by_event_id(data):
    output = defaultdict(set)
    for r in data:
        event_id = r['event.id']
        event_data = json.dumps({
            'event_category': r['event.category'],
            'event_name': r['event.name'],
            'event_id': r['event.id'],
            'event_created_time': r['event.created_time'],
            'event_sudo_user_id': r['event.sudo_user_id'],
        })
        output[event_id].add(event_data)
    return output


def group_all(data):
    user = group_user_by_event_id(data)
    model_set = group_model_set_by_event_id(data)
    event_attribute = group_event_attribute_by_event_id(data)
    permission = group_permission_by_event_id(data)
    event = group_event_by_event_id(data)

    event_id_set = set()
    for r in data:
        event_id_set.add(r['event.id'])

    output = {}
    for id in event_id_set:
        output[id] = {
            'event': list(event[id]),
            'permission_set': list(permission[id]),
            'event_attribute': list(event_attribute[id]),
            'user': list(user[id]),
            'model_set': list(model_set[id]),
        }
    return output


def parse_event_attribute(event_attribute):
    output = {}
    for data in event_attribute:
        r = json.loads(data)
        output[r['event_attribute_name']] = r['event_attribute_value']
    return output


def get_status(data):
    ea = parse_event_attribute(data)
    if 'status' in ea:
        return ea['status']
    return ''


def format(aggregated_data):
    data = aggregated_data
    output = []

    for id in aggregated_data:
        output.append({
            'logName': 'looker_system_activity_logs',
            'timestamp': json.loads(data[id]['event'][0])['event_created_time'],
            'insertId': id,
            'resource': {
                'type': 'looker',
            },
            'protoPayload': {
                '@type': 'looker_system_activity_logs',
                'authenticationInfo': {
                    'principalEmail': json.loads(data[id]['user'][0])['user_email']
                },
                'serviceName': 'looker.com',
                'methodName': json.loads(data[id]['event'][0])['event_name'],
                'details': parse_event_attribute(data[id]['event_attribute']),
                'status': get_status(data[id]['event_attribute']),
            }
        })

    return output


def write_log_entry(formatted_data):
    logging_client = logging.Client()
    logger = logging_client.logger('looker_system_activity_logs')

    for log in formatted_data:
        logger.log_struct(log)

    print("Wrote logs to {}.".format(logger.name))


if __name__ == "__main__":
    data = get_looker_data()
    agg_data = group_all(data)
    formatted_data = format(agg_data)
    write_log_entry(formatted_data)
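The group_*_by_event_id helpers above all share one pattern: serialize each sub-record to a JSON string and add it to a set keyed by event.id, so repeated rows from the System Activity join are deduplicated. A self-contained sketch of that pattern; the function name and sample rows are illustrative, not part of the PR:

```python
import json
from collections import defaultdict


def group_by_event_id(rows, fields):
    """Group the given fields from each row under row['event.id'],
    deduplicating repeated field combinations via their JSON serialization."""
    output = defaultdict(set)
    for r in rows:
        # sort_keys gives a canonical string, so equal dicts collide in the set
        output[r["event.id"]].add(json.dumps({f: r[f] for f in fields}, sort_keys=True))
    # Decode back to dicts for the caller
    return {event_id: [json.loads(s) for s in sorted(group)]
            for event_id, group in output.items()}


rows = [
    {"event.id": 1, "user.email": "a@example.com"},
    {"event.id": 1, "user.email": "a@example.com"},  # duplicate join row
    {"event.id": 2, "user.email": "b@example.com"},
]
grouped = group_by_event_id(rows, ["user.email"])
# grouped[1] keeps a single entry despite the duplicate input row
```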