DBOS Log Analysis on VPC Logs

5 Rules

Aggregate rules/ queries

Rejected/Failed SSH connections threshold of k within a duration
New high throughput external connection

Look-up rules/ queries

Communication with IPs that have bad reputation (known IOCs)
Probing from TOR exit nodes
Unauthorized destination port

Table Schema

ts: formatted timestamp using VPC_START
day_part: partition/ interval within the day (e.g. if consider 30-min interval, a day has 48 partitions)
dotted boxes are views

Connect to database

ssh -i id_jd [email protected]
docker exec -it vertica_ce /opt/vertica/bin/vsql

# test query
select count(*) from VPCFLOWLOG_TEST; # --> should return 17373056

# to exit the database terminal, type \q and then ENTER

Look-up Queries

Details

Lookup rules

Create 4 tables for each of the known bad source/ destination addresses/ ports (ingest as more are discovered)

Experiment

Randomly populate each of the 4 tables with 10 (distinct) rows
Query over all logs to detect if any log matched the bad fields

Template

SELECT COUNT(*) FROM VPCFLOWLOG_TEST t1 WHERE [[CONDITIONS]];

Example

SELECT COUNT(*) FROM VPCFLOWLOG_TEST t1 WHERE t1.VPC_DSTADDR IN (SELECT VPC_DSTADDR FROM VPCFLOWLOG_BAD_DSTADDR);

SELECT COUNT(*) FROM VPCFLOWLOG_TEST t1 WHERE t1.VPC_DSTADDR IN (SELECT VPC_DSTADDR FROM VPCFLOWLOG_BAD_DSTADDR) OR t1.VPC_SRCADDR IN (SELECT VPC_SRCADDR FROM VPCFLOWLOG_BAD_SRCADDR) OR t1.VPC_DSTPORT IN (SELECT VPC_DSTPORT FROM VPCFLOWLOG_BAD_DSTPORT) OR t1.VPC_SRCPORT IN (SELECT VPC_SRCPORT FROM VPCFLOWLOG_BAD_SRCPORT);

Aggregate Queries

Rule 1: New high throughput connection

Strategy: compare current statistics with rolling baselines

Details

Views: a stored query that dynamically access and compute data at execution time, so it’s always up-to-date

VPCFLOWLOG_HIST: histogram of incoming traffic to a particular destination within an interval on a day
VPCFLOWLOG_HIST_BL: Summarize over a period of days to obtain several statistics of the incoming traffic on a particular day interval (e.g. average, standard deviation)
VPCFLOWLOG_ANLY: Join VPCFLOWLOG_HIST with VPCFLOWLOG_HIST_BL to compare count with baseline statistics

Experiment: From VPCFLOWLOG_ANLY, fetch all logs that is one standard deviation above/ below the average

Template

SELECT COUNT(*) FROM VPCFLOWLOG_ANLY WHERE [[CONDITIONS]];

Example

SELECT COUNT(*) FROM VPCFLOWLOG_ANLY WHERE count > avg_count + stddev_count OR count < avg_count - stddev_count;

SELECT COUNT(*) FROM VPCFLOWLOG_ANLY WHERE count > avg_count + stddev_count;

SELECT COUNT(*) FROM VPCFLOWLOG_ANLY WHERE count > avg_count + 2 * stddev_count OR count < avg_count - stddev_count;

Rule 2: Rejected SSH connections more than 20 times within half an hour

Details

View

Filter the logs by destination port = 22 and action = reject
Group by day, interval within the day, and source address

Experiment:

From the histogram, fetch all logs with count > 20

Template

SELECT COUNT(*) FROM VPCFLOWLOG_HIST2 WHERE [[CONDITIONS]];

Example

SELECT COUNT(*) FROM VPCFLOWLOG_HIST2 WHERE count > 20;

Summary

Performance

#logs/ #files	Ingest into database	Lookup queries	Aggregate query 1	Aggregate query 2
1M (12K)	~42s	~761ms	~510ms	~400ms
17M (47K)	~5min 30s	9650ms	6729ms	3317ms

Query performances scale well even when the number of logs increases significantly

Tuning

A lot of knobs in our implementation can be easily tuned (e.g. interval within the day, metrics interested)
We can also leverage machine learning to help better determine thresholds

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
img		img
.gitignore		.gitignore
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DBOS Log Analysis on VPC Logs

5 Rules

Aggregate rules/ queries

Look-up rules/ queries

Table Schema

Connect to database

Look-up Queries

Lookup rules

Experiment

Template

Example

Aggregate Queries

Rule 1: New high throughput connection

Template

Example

Rule 2: Rejected SSH connections more than 20 times within half an hour

Template

Example

Summary

Performance

Tuning

About

Releases

Packages

Contributors 2

DBOS-project/vpc-analysis-public

Folders and files

Latest commit

History

Repository files navigation

DBOS Log Analysis on VPC Logs

5 Rules

Aggregate rules/ queries

Look-up rules/ queries

Table Schema

Connect to database

Look-up Queries

Lookup rules

Experiment

Template

Example

Aggregate Queries

Rule 1: New high throughput connection

Template

Example

Rule 2: Rejected SSH connections more than 20 times within half an hour

Template

Example

Summary

Performance

Tuning

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Packages