Summarizes AWS and GCP billing data into an email report. Originally written by Erich Weiler.
This repository was previously named ucsc-cgp/aws-billing-report. It was renamed to ucsc-cgp/cloud-billing-report.
Billing data is provided by AWS and GCP via:

- the AWS Cost and Usage Report feature, and
- the Cloud Billing feature for GCP.
AWS Cost and Usage Reports are delivered by Amazon to a specified S3 bucket in Parquet format. A BigQuery Data Transfer job runs daily, automatically importing the AWS billing data to a BigQuery table where it is queried for report generation. As such, report generation requires GCP resources even if GCP reports are not being generated.
Google automatically loads billing data into a specified BigQuery dataset. (This must be set up manually.)
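Since both providers' billing data ultimately lands in BigQuery, report generation amounts to querying those tables. The sketch below is only an illustration of that idea using the google-cloud-bigquery client; the project, dataset, table, and column names are placeholders, and the actual queries used by report.py will differ:

```python
# Illustrative only: project, dataset, table, and column names are placeholders,
# not this project's actual configuration (see config.json / report.py).
import datetime
from google.cloud import bigquery

client = bigquery.Client(project="my-billing-project")  # hypothetical project

# Sum one day of AWS Cost and Usage Report line items imported by the
# BigQuery Data Transfer job, grouped by service.
query = """
    SELECT product_servicecode, SUM(line_item_unblended_cost) AS cost
    FROM `my-billing-project.aws_billing.cur_table`
    WHERE DATE(line_item_usage_start_date) = @report_date
    GROUP BY product_servicecode
    ORDER BY cost DESC
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter(
            "report_date", "DATE", datetime.date(2020, 10, 10)
        ),
    ]
)
for row in client.query(query, job_config=job_config).result():
    print(f"{row.product_servicecode}: ${row.cost:,.2f}")
```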
At the time of writing, many of these resources were deployed manually by Erich, while some are managed with Terraform. For the manually deployed resources, "example" Terraform configuration is included. All such configuration lives in [terraform/](terraform/).
Credentials configured in `config.json` must be authorized to access the billing data generated by these features.
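As a quick sanity check (not part of this repository), you can confirm that the configured GCP credentials are at least able to start BigQuery jobs by issuing a dry-run query, for example:

```python
# Hypothetical credential check; not part of the repository's code.
from google.cloud import bigquery

# Picks up GOOGLE_APPLICATION_CREDENTIALS or gcloud user credentials.
client = bigquery.Client()
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query("SELECT 1", job_config=job_config)
print(f"Credentials accepted; dry run would process {job.total_bytes_processed} bytes")
```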
- Python 3.8.6
- Terraform 0.12
First, populate `config.json` and install requirements:
$ cp config.json.example config.json # and populate it
$ python -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
Now you can generate reports:
$ python report.py aws # AWS report for yesterday
$ python report.py aws 2020-10-10 # AWS report for a given date
$ python report.py gcp # GCP report for yesterday
$ python report.py gcp 2020-10-10 | /usr/sbin/sendmail -t # etc.
Alternatively, you can build a Docker image:
$ docker build -t report .
and run it:
$ docker run \
    --volume $(pwd)/config.json:/config.json:ro \
    report aws
$ docker run \
    --volume $(pwd)/config.json:/config.json:ro \
    --volume ~/.config/gcloud/:/root/.config/gcloud:ro \
    report gcp 2019-12-31
There are two ways to authenticate to GCP. The quick and dirty way is to install the `gcloud` command line tool and run `gcloud auth login`. If you're running the script in a Docker container, it's sufficient to run `gcloud auth login` on the host and then mount the gcloud config directory inside the container, a la:
$ docker run --volume ~/.config/gcloud/:/root/.config/gcloud:ro ... report gcp
but you shouldn't do that. Instead, create a service account with limited permissions and use it to authenticate:
$ SERVICE_ACCOUNT_NAME=my-service-account-name-here
$ PROJECT_ID=project-name-123456
$ gcloud iam service-accounts create $SERVICE_ACCOUNT_NAME
$ gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role="roles/bigquery.jobUser"
$ gcloud iam service-accounts key create ./gcp-credentials.json \
    --iam-account "${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
$ # You might need to grant the service account access to the data set
$ # separately.
$ docker run \
    -e GOOGLE_APPLICATION_CREDENTIALS=/gcp-credentials.json \
    -v $(pwd)/gcp-credentials.json:/gcp-credentials.json:ro \
    -v $(pwd)/config.json:/config.json:ro \
    report gcp | sendmail -t
Billing data (on both AWS and GCP) are not provided in real time. In particular,
- AWS billing data usually lands in S3 between 6 and 12 hours after the end of the day. For example, billing data for the period between 00:00 PT September 15, 2020 and 23:59 PT September 15, 2020 will generally be available for download by 06:00 PT September 16, 2020.
- GCP billing data usually lands in GCS between 12 and 18 hours after the end of the day. So billing data for that same period will generally be available for download by 17:00 PT September 16, 2020.
These observations are general and not guaranteed. For example, at the beginning of the billing period (the first few days of the month), billing data may not be available at all.
Erich runs these reports via Docker daily, generating AWS reports at 6 AM PT and GCP reports at 5 PM PT. As mentioned above, this means that some reports may fail to generate on time, especially at the beginning of the month. To address this, there is retry logic that tracks failed reports and automatically retries generating them.
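The retry script itself isn't reproduced here, but its general shape is easy to sketch. The following is a rough approximation, not the actual scripts/retry-failed-reports.py, and the fail.log format (one report type and date per line) is an assumption:

```python
#!/usr/bin/env python3
# Rough sketch of the retry flow, NOT the actual scripts/retry-failed-reports.py.
# Assumes fail.log holds one "report_type date" pair per line, e.g. "aws 2020-10-01".
import subprocess
import sys


def retry_failed_reports(log_path: str) -> None:
    with open(log_path) as f:
        entries = [line.split() for line in f if line.strip()]
    still_failing = []
    for report_type, date in entries:
        # Re-run the report; on success it is dropped from the log.
        result = subprocess.run(["python", "report.py", report_type, date])
        if result.returncode != 0:
            still_failing.append(f"{report_type} {date}\n")
    # Rewrite the log so only reports that still failed are retried next time.
    with open(log_path, "w") as f:
        f.writelines(still_failing)


if __name__ == "__main__":
    retry_failed_reports(sys.argv[1])
```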
Admittedly, Docker seems a little heavyweight for something this small. But since running and developing this code is manually coordinated, running via Docker helps smooth the upgrade path and makes managing dependencies a little easier.
The cron jobs look something like this:
0 6 * * * root bash /root/reporting/run-report.sh aws
0 17 * * * root bash /root/reporting/run-report.sh gcp
0 19 * * * root python3 /root/reporting/retry-failed-reports.py /root/reporting/fail.log
(This assumes that everything in `scripts/` lives at `/root/reporting`.)