Parameters
SPOT can be very powerful once its parameters are coherent. Tuning them is not an exact science, but there are rules of thumb you can follow.
Detection
The main parameter is q. It defines the probability of an abnormal event. For example, if q = 0.001, it means that the algorithm will consider events with a probability lower than 0.1% as anomalies.
The parameter q controls the trade-off between the detection rate and the false positive rate. When q is low (e.g. 1e-8), SPOT flags only very extreme events, so the false positive rate is low (very extreme events are likely to be true anomalies), but so is the detection rate (more anomalies go unflagged). The reverse occurs when q is "high".
q | Example | Detection rate | False positive rate |
---|---|---|---|
low | 1e-8 | low | low |
high | 1e-3 | high | high |
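To get a feel for this trade-off, it can help to translate q into an expected number of alarms. The sketch below is only an illustration: it assumes that the fitted tail model is exact and that observations are independent, so that each normal observation is flagged with probability q. The function name and the stream rate are hypothetical and not part of the library.

```python
def expected_alarms(q: float, n_observations: int) -> float:
    """Expected number of normal observations flagged as anomalies.

    Assumption (illustrative): the tail model is exact and observations are
    independent, so each one exceeds the anomaly threshold with probability q.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return q * n_observations


# Hypothetical stream: one observation per second for a day.
per_day = 24 * 60 * 60
for q in (1e-8, 1e-3):
    print(f"q = {q:g}: about {expected_alarms(q, per_day):g} alarms per day")
```

Under these assumptions, a "high" q on a fast stream produces alarms every day even when nothing is wrong, while a "low" q stays almost silent.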
Bias/variance tradeoff
The parameters level and size (from the spot_fit function) are involved in the fit step of the algorithm.
The size is the number of data used for calibration, while 1-level represents the proportion of these initial data that belong to the tail of the distribution (so level is a high quantile in practice). For example, take size = 1000 and level = 0.99. The algorithm will drop the 990 lowest values and keep the 10 highest to make a first tail fit.
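The split described above can be sketched in a few lines. This is not the library's implementation: the helper name is hypothetical, and taking the threshold as the largest value left out of the tail is an assumption made for the illustration.

```python
import random


def initial_tail(data, level):
    """Split calibration data into an initial threshold and its excesses.

    Assumption (illustrative): the threshold is the largest value NOT kept in
    the tail, so about len(data) * (1 - level) observations lie above it.
    """
    n_tail = round(len(data) * (1.0 - level))
    if not 0 < n_tail < len(data):
        raise ValueError("level leaves no usable tail for this sample size")
    ordered = sorted(data)
    threshold = ordered[-n_tail - 1]
    excesses = [x - threshold for x in ordered[-n_tail:]]
    return threshold, excesses


rng = random.Random(42)
calibration = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # size = 1000
threshold, excesses = initial_tail(calibration, level=0.99)
print(len(excesses))  # 10 values are kept for the first tail fit
```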
Warning
In practice, the user must ensure that size * (1 - level) is high enough to perform the fit (at least a few dozen data). See the paragraph below.
When size is fixed, the parameter level tunes the bias/variance tradeoff.
As explained above, the number of data available for the fit is (1-level)*size. If level is high (close to 1), we are more likely to capture the true shape of the tail (low bias), but since few data are available for the fit, the estimate will be more variable (high variance). Conversely, if level is not too close to 1, we have more data to fit the tail (low variance), but the fit may include data that do not belong to the tail (high bias).
level | Example | Bias | Variance |
---|---|---|---|
low | 0.95 | high | low |
high | 0.999 | low | high |
Ideally, if you have plenty of records, you can take a very high level without worrying about the variance.
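The variance side of this tradeoff can be observed with a small simulation. The sketch below does not perform the tail fit used by SPOT: as a simple stand-in, it estimates the mean of the excesses over the empirical level quantile, on exponential data (an arbitrary choice for which this mean equals 1 at any threshold, so only the variance effect is visible). All names and settings are illustrative.

```python
import random
import statistics


def mean_excess(data, level):
    """Mean of the excesses over the empirical `level` quantile."""
    n_tail = max(1, round(len(data) * (1.0 - level)))
    ordered = sorted(data)
    threshold = ordered[-n_tail - 1]
    return statistics.fmean([x - threshold for x in ordered[-n_tail:]])


def spread_of_estimate(level, size=1000, trials=200, seed=0):
    """Standard deviation of the estimate across repeated samples."""
    rng = random.Random(seed)
    estimates = [
        mean_excess([rng.expovariate(1.0) for _ in range(size)], level)
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)


# With size fixed, a higher level leaves fewer tail data (50 vs. 1 here),
# so the estimate fluctuates more from one sample to the next.
for level in (0.95, 0.999):
    print(f"level = {level}: spread = {spread_of_estimate(level):.3f}")
```

Increasing size in this sketch reduces the spread at level = 0.999, which matches the remark about having plenty of records.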
Warning
Remember also that 1-level cannot be lower than q, otherwise there is a contradiction between what should be flagged and what should be in the tail.
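The two warnings of this section can be turned into a quick sanity check to run before fitting. The helper below is hypothetical (it is not provided by the library), and the default of 30 tail data is only one possible reading of "a few dozen".

```python
def check_parameters(q, level, size, min_tail=30):
    """Return a list of human-readable problems (empty if none were found).

    min_tail is an illustrative default, not a value taken from the library.
    """
    problems = []
    n_tail = size * (1.0 - level)
    if n_tail < min_tail:
        problems.append(
            f"only about {n_tail:.0f} tail data for the initial fit "
            f"(wanted at least {min_tail})"
        )
    if 1.0 - level < q:
        problems.append(f"1 - level = {1.0 - level:g} is lower than q = {q:g}")
    return problems


print(check_parameters(q=1e-3, level=0.99, size=1000))     # too few tail data
print(check_parameters(q=1e-3, level=0.98, size=5000))     # no problem reported
print(check_parameters(q=1e-2, level=0.999, size=100000))  # q contradicts level
```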
Bounded memory
In theory, the number of data in the tail can grow indefinitely while monitoring an infinite stream. Since memory is limited, we cannot store all the data to update the tail. This is where the max_excess parameter comes in: it defines the number of tail data we keep.
Moreover, it creates a forgetting effect, since the model keeps only the last max_excess tail data to perform the fit. Thus, it must be high enough to perform a good fit in terms of bias and variance (see the paragraph above), but beware of the tail dynamics: if you need to adapt quickly to a "new" shape of the tail, max_excess should not be too high (a few hundred is a reasonable starting point, but it can be more if the monitored stream is stable).
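The bounded-memory behaviour described above can be pictured as a fixed-size buffer that discards its oldest value when a new one arrives. The class below is a minimal sketch built on Python's collections.deque; it is an assumption made for illustration and says nothing about how the library stores the tail internally.

```python
from collections import deque


class TailBuffer:
    """Keep only the most recent max_excess tail values (illustrative)."""

    def __init__(self, max_excess):
        self._excesses = deque(maxlen=max_excess)

    def push(self, excess):
        # When the buffer is full, appending discards the oldest value.
        self._excesses.append(excess)

    def values(self):
        return list(self._excesses)


buffer = TailBuffer(max_excess=3)
for excess in (0.4, 1.2, 0.1, 2.5, 0.7):
    buffer.push(excess)
print(buffer.values())  # [0.1, 2.5, 0.7]
```

A small max_excess makes the buffer forget old tail data quickly, which is the adaptation effect mentioned above; a large one keeps more history and gives a steadier fit.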