Get started
Inside your code, you just have to add the spot.h
header file.
#include "spot.h"
One paramount point is that libspot
does not know how to allocate/free memory (it does not know libc
by design). So you have to provide these functions. By default you can pass the common malloc
and free
functions from stdlib.h
.
#include "spot.h"
#include <stdlib.h>
set_allocators(malloc, free);
Ok, now you want to use the SPOT algorithm. You can allocate a Spot
object either on stack or on heap.
// stack allocation
struct Spot spot;
// or heap allocation
struct Spot* spot_ptr = (struct Spot*)malloc(sizeof(struct Spot));
Then you must init that structure with the spot_init
function.
// here we assume stack allocation
struct Spot spot;
// init with SPOT parameters
int status = spot_init(
&spot, // pointer to the allocated structure
1e-4, // q: anomaly probability
0, // low: observe upper tail
1, // discard_anomalies: flag anomalies
0.998, // level: tail quantile (the 0.2% higher values shapes the tail)
200 // max_excess: number of data to keep to summarize the tail
);
// you can check the initialization
if (status < 0) {
// print error
char buffer[100];
error_msg(-status, buffer, 100);
printf("ERROR %d: %s\n", -status, buffer);
}
Basically q
is the anomaly probability. The algorithm will flag events that have a lower probability than q
. In practice, it must be very low (like 1e-3
or less).
The low
parameter just defines whether we flag high (low = 0
) of low (low = 1
) values while discard_anomalies
says that we want to reject anomalies.
The level
should be a high quantile (a value close to 1
). It is useful to delimitate the tail of the distribution. One may use values like 0.98
, 0.99
or 0.995
.
Finally max_excess
is the number of data that will be kept to model the tail of the distribution.
You can read more about the parameters in the dedicated section.
Before prediction, we commonly need to fit the algorithm with first data. In practice you must provide a buffer of double
(pointer + size of the buffer). How many records are needed? Briefly, few thousands (like 2000) but it depends on the parameters passed to SPOT (and also whether you have enough data).
// double* initial_data = ...
// unsigned long size = ...
status = spot_fit(&spot, initial_data, size);
if (status < 0) {
// print error
char buffer[100];
error_msg(-status, buffer, 100);
printf("ERROR %d: %s\n", -status, buffer);
}
Full example
Here we present a basic example where the SPOT algorithm is run on an exponential stream.
// basic.c
// BUILD:
// $ make
// $ cc -o /tmp/basic examples/basic.c -Idist/ -Ldist/ -l:libspot.so.2.0b0 -lm
// RUN:
// $ LD_LIBRARY_PATH=dist /tmp/basic
#include "spot.h"
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
// U(0, 1)
double runif() { return (double)rand() / (double)RAND_MAX; }
// Exp(1)
double rexp() { return -log(runif()); }
int main() {
// set random seed
srand(1);
// provide allocators to libspot
set_allocators(malloc, free);
// stack allocation
struct Spot spot;
int status = 0;
// init the structure with some parameters
status = spot_init(
&spot,
1e-4, // q: anomaly probability
0, // low: observe upper tail
1, // discard_anomalies: flag anomalies
0.998, // level: tail quantile (the 1% higher values shapes the tail)
200 // max_excess: number of data to keep to summarize the tail
);
if (status < 0) {
return -status;
}
// initial data (for the fit)
unsigned long const N = 20000;
double initial_data[N];
for (unsigned long i = 0; i < N; i++) {
initial_data[i] = rexp();
}
// fit
status = spot_fit(&spot, initial_data, N);
if (status < 0) {
return -status;
}
// now we can run the algorithm
int K = 50000000;
int normal = 0;
int excess = 0;
int anomaly = 0;
clock_t start = clock();
for (int k = 0; k < K; k++) {
// rexp();
switch (spot_step(&spot, rexp())) {
case ANOMALY:
anomaly++;
break;
case EXCESS:
excess++;
break;
case NORMAL:
normal++;
break;
}
}
clock_t end = clock();
printf("%lf\n", (double)(end - start) / (double)(CLOCKS_PER_SEC));
printf("ANOMALY=%d EXCESS=%d NORMAL=%d\n", anomaly, excess, normal);
printf("Z=%.6f T=%.6f\n", spot.anomaly_threshold, spot.excess_threshold);
return 0;
}