Streaming multi-scale anomaly detection

Problem setup

  • \(x(t)\) are observations over time \(t\) where new data arrives over time \(t\).

  • Problem : Window of \(x(t)\), i.e. \([t:t-p+1]\) which ‘‘deviates’’ from normal behavior.

  • Motivation : How to become invariant to window-size/scale of pseudo-periodic structure in \(x(t)\) ?

  • Solution : Track correlations of principal subspace at various scales

Methodology

  • At \(t\) we build multiple lag-embeddings of series \(x(t)\) at size \(p\) : \(X_t^p = [x_{t}, x_{t-1}, \ldots, x_{t-p+1}]^T \in \mathbb{R}^p\).

  • Given \(X^p \in \mathbb{R}^{T \times p}\), \(\mathbf{w}_p\) is defined as 1-D projection capturing most of the energy of samples :

\( \mathbf{w}_p = \arg\min_{\|\mathbf{w}\|=1} \sum^T_{t=1} \| X_{t}^p - (\mathbf{w}\mathbf{w}^T)X_t^p \|^2 \)

We update the principal directions at each lag \(\mathbf{w}_p\) to maximize the variance captured by \(\mathbf{w}_p \mathbf{w}_p^T\). An example of a 2-d lag matrix embedding with the principal directions are provided :

Time Series

pca plot 

PCA embedding of sliding windows in two dimensions

alt text 

Colormap shows the increasing time instant \(t\) at which new samples \(x(t)\), and thus lag-vectors \(X_t^2\) arrive.

colormap 

Global glow for Multi-scale Anomaly detection

colormap 

Least correlated Scale Vs 2nd Iteration of Streaming PCA

Detector performance (AUC) Vs Avg. Reconstruction error

Streaming PCA on multi-scale Lag-matrix \(\| \pmb{\alpha}_t \|^2 \)

AUCVsErr 
AUCVsErr 
AUCVsErr 

2nd Iteration of streaming PCA on multi-scale Lag-matrix \(\| \widetilde{\pmb{\alpha}_t} - \pmb{\alpha}_t\|^2\)

AUCVsErr 
AUCVsErr 
AUCVsErr 
  • Representation : Multi-scale lagmatrix

  • Aggregation methods : Least correlated scale, Norm of multi-scale anomaly score, Norm of reconstruction error of anomaly scores.

  • Each point in each scatterplot corresponds to a time series from dataset B2, B3, B4 in Yahoo!.

Non-linear features for predictive model based anomaly detection

In second study, we study the performance of scattering transform representation of time series for the purpose of time series prediction using auto-regressive models. We use the prediction error for the purpose of anomaly and change point detection. We also compare our performance with a simple representation of the time series window created by evaluating concatenating non-linear functions of the window at the current time instant with scattering transform representation. The two types of features that we study in the report.

  • Simple features on moving window \(w\) by the embedding

\(\mathbf{x} = \{\min(x_{1:T}), \max(x_{1:T}), \text{avg}(x_{1:T}), \text{std}(x_{1:T}), \text{TV}(x_{1:T})\}\) where \(\mathbf{x} \in \mathbb{R}^5\)

and TV\((w)\) is the digital version of the total variation of the series \(\sum^T_{t=1} |x_t - x_{t+1}|\).

  • Scattering transform \(S x\) of signal \(x\)

We train the following linear predictors to evaluate the variance scaled prediction error as anomaly score :

  • Scalar and Vector Autoregressive Models (AR, VAR)

  • AR and VAR models on PCA transformed features. (PCR, PC-VAR)

(Results coming up shortly)

Reading List

Multiscale Anomaly detection

  • Multi-scale anomaly detection algorithm based on infrequent pattern of time series 2006 link

  • Optimal Multi-scale Patterns in Time Series Streams 2006 pdf

  • Streaming Pattern Discovery in Multiple Time-Series 2005 (SPIRIT) pdf

  • Real-Time Anomaly Detection for Streaming Analytics pdf

  • Combining Filtering and Statistical Methods for Anomaly Detection pdf

  • ASAP: A Streaming Operator for Smoothing Time Series Visualizations

  • Generic and Scalable Framework for Automated Time-series Anomaly Detection KDD 2015 pdf

  • Measuring predictability using multiscale embedding 1996 link

  • Derivative Delay Embedding: Online Modeling of Streaming Time Series 2016 pdf

  • Joakim & Mallat. “Deep scattering spectrum.”Trans. on Signal Processing (2014) pdf.

Subspace Tracking and Sketching

  • B. Yang. Projection approximation subspace tracking 1995 (PAST-D algorithm)

  • Online PCA with Spectral Bounds 2015 pdf

  • Online PCA in High Dimension: Which Algorithm to Choose? 2015 pdf

  • Structure Discovery in Nonparametric Regression by Compositional Kernel Search ICML 2013 pdf

  • Modeling Time Series & Sequences: Learning Representations & Making Predictions (2015) thesis

Others

  • LOF: Identifying Density-Based Local Outliers 2000 pdf

  • Incremental Local Outlier Detection for Data Streams 2007 pdf

  • Machine Learning for Sequential Data: A Review pdf

  • How to Evaluate the Quality of Unsupervised Anomaly Detection Algorithms? 2015 pdf

  • Robust Multivariate Autoregression for Anomaly Detection in Dynamic Product Ratings 2014 pdf

  • Time delay neural network Wiki

Workshops and Conferences on anomaly detection

  • NIPS Time series workshop 2015, 2016

  • Workshop on Outlier Detection, Description KDD-ODD:

  • ICML 2016 Anomaly Detection Workshop 2013