This project performed anomaly detection on incoming API traffic using an LSTM-autoencoder architecture.
Here are parts of the preprocessing, modelling, and scoring utility code that I wrote for the project.
The functionality includes, but is not limited to:
- Importing custom CSV files with nuanced datetime formats.
- Resampling missing timestamps by various methods, e.g. setting missing values to the mean of the last n timestamps.
- Custom train_test_split alternative that lets you easily split by whole months of data instead of a decimal percentage.
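The month-based split mentioned in the list can be illustrated with plain pandas. This is only a minimal sketch under stated assumptions, not the code from this repository: `split_by_months` and its `test_months` parameter are hypothetical names, and the data frame is assumed to have a sorted DatetimeIndex.

```python
import pandas as pd


def split_by_months(df: pd.DataFrame, test_months: int):
    """Hold out the last `test_months` calendar months as the test set.

    Hypothetical helper for illustration; assumes `df` has a DatetimeIndex.
    """
    # Label every row with its calendar month, e.g. 2021-03.
    periods = df.index.to_period("M")
    months = periods.unique().sort_values()
    if not 0 < test_months < len(months):
        raise ValueError("test_months must leave at least one training month")
    # Rows whose month is among the last `test_months` months go to the test set.
    test_mask = periods.isin(months[-test_months:])
    return df[~test_mask], df[test_mask]


# Four months of daily data; hold out the final month.
idx = pd.date_range("2021-01-01", "2021-04-30", freq="D")
demo = pd.DataFrame({"request_size": range(len(idx))}, index=idx)
train, test = split_by_months(demo, test_months=1)
```

Splitting on month boundaries keeps each calendar month entirely in one set, which a fractional split of a time series does not guarantee.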
src/SupervisedModel.py contains code for the final preprocessing and model training.
src/Utility.py contains code for various dataframe operations that were relevant to the analysis and data cleaning part of the project.
src/exceptions contains some custom Python exception classes.
src/tests contains some manual test cases.
from Utility import TimeSeriesDf
tsdf = TimeSeriesDf(
csvpath,
granularity_ts='%Y-%m-%d %H:%M',
timestamp_column_name='client_received_start_timestamp',
delimiter='|',
)
tsdf = TimeSeriesDf(
csvfolder,
granularity_ts='%Y-%m-%d %H:%M',
timestamp_column_name='client_received_start_timestamp',
delimiter='|',
folder='yes'
)
tsdf.scale('request_size')
tsdf.set_missing_minutes_to_zero()
tsdf.impute_zero_to_mean('request_size')
tsdf.dataframe_max_transformer("apiproxy")
tsdf.dataFrame.to_csv(csvpath)
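For readers without the repository at hand, the gap-filling steps in the usage above can be approximated in plain pandas. The sketch below is an assumption about what such steps might look like, not the actual implementation in Utility.py: the sample data, the per-minute sum aggregation, and the helper `impute_zeros_with_trailing_mean` are all hypothetical.

```python
import io

import pandas as pd

# Tiny pipe-delimited sample standing in for a real traffic export (made up).
raw = io.StringIO(
    "client_received_start_timestamp|request_size\n"
    "2021-01-01 00:00:12|10\n"
    "2021-01-01 00:01:47|20\n"
    "2021-01-01 00:04:05|30\n"
)
df = pd.read_csv(raw, delimiter="|", parse_dates=["client_received_start_timestamp"])

# Truncate timestamps to minute granularity and aggregate per minute.
df["minute"] = df["client_received_start_timestamp"].dt.floor("min")
per_minute = df.groupby("minute")["request_size"].sum()

# Reindex onto every minute in the observed range; absent minutes become 0.
full_range = pd.date_range(per_minute.index.min(), per_minute.index.max(), freq="min")
filled = per_minute.reindex(full_range, fill_value=0)


def impute_zeros_with_trailing_mean(series: pd.Series, n: int) -> pd.Series:
    """Replace each zero with the mean of the up-to-n values just before it.

    Hypothetical helper: already-imputed values feed into later windows.
    """
    values = series.astype(float).tolist()
    for i, value in enumerate(values):
        if value == 0 and i > 0:
            window = values[max(0, i - n):i]
            values[i] = sum(window) / len(window)
    return pd.Series(values, index=series.index, name=series.name)


imputed = impute_zeros_with_trailing_mean(filled, n=2)
```

Note that this sketch treats every zero as a missing value, so a minute with genuinely zero traffic would also be overwritten; whether that is acceptable depends on the data.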