- Summary: The article highlights time series discords as a popular method for practitioners due to its simplicity. However, the effectiveness of discords is limited by the sensitivity to the subsequence length parameter set by the user. The article introduces MERLIN, an algorithm that can efficiently and exactly find discords of all lengths in massive time series archives. The authors demonstrate the utility of MERLIN in detecting subtle anomalies that cannot be detected by existing algorithms or human inspection, and also highlight how computational redundancies can make MERLIN significantly faster than comparable algorithms. (paper, code)
Time series discords are subsequences of a time series that are maximally far away from their nearest neighbors. In other words, they are the parts of a time series that are the most different from the rest of the data.
(1) The need for manual tuning of parameters: Many existing methods require the user to manually tune one or more parameters, such as the window size or threshold value, in order to achieve good performance. This can be time-consuming and subjective, and may result in suboptimal performance.
(2) Overfitting: With so many parameters to fit on a small dataset, the risk of overfitting increases, which can lead to suboptimal performance and decreased reliability in anomaly detection. This means that the model may be detecting patterns or features that are specific to the training data but not present in new data. As a result, the model may mistakenly identify these patterns or features as anomalies, even if they are not truly anomalous (increase false positives).
The following example shows that there is a “sweet spot” (or rather, sweet range) for subsequence length when performing anomaly discovery. In some cases, the analyst may have prior knowledge or experience that can guide them in choosing a good value for the subsequence length parameter. However, in anomaly and novelty discovery tasks, it is often necessary to try different subsequence lengths to see which works best for the particular dataset and anomaly detection task.
DRAG requires a single input parameter
For example, the time taken for DRAG given values for r that range from 1.0 to 40.0. For any value greater than 10.27 (discord value) the algorithm reports failure and must be restarted with a lower value. Suppose that we had guessed r = 10.25, then DRAG would have taken 1.48 seconds to find the discord. However, had we guessed a value that was just 2.5% less, DRAG would have taken 9.7 times longer. Has we guessed r = 1.0 (a perfectly reasonable value on visually similar data), DRAG would have taken 98.9 times longer. In the other direction, had we guessed any greater than 1% more, DRAG would have failed. The time it takes to complete a failed run is about 1/6 the time of our successful run when r =
was set to the 10.25 guess. So, while failure is cheaper, it is not free.
(1) MERLINE can remove the need to set even that sole parameter (sequence length). It can efficiently and exactly discover discords of every possible length, then either report all of them, or just the top-K discords under an arbitrary user-defined scoring metric.
(2) MERLINE relieves the issue of estimation of
Given a time series
Start with empty set
is_candidate = T
For each subsequence
For each candidate
if
prune
is_candidate = F
if is_candidate
# if no one is pruned (all
add
*Note that at this phase, the distance is only calculated from the current subsequence to every subsequence before it. So, the distance from the current subsequence to the subsequence after it is NOT calculated.
for each
initialize
end for
Start with empty set
is_discord = T
For each subsequence
For each candidate
if
get
if
prune
is_discord = F
else
We simply consider each subsequence’s distance to every member of our set, doing a best-so-far search for each candidate’s nearest neighbor. There are there situations for the subsequences in time series:
(1) The distance
(2) The distance
(3) The distance
For more details about DRAG, it can be found here.
The z-normalized Euclidean distance
Given the variance of the sequence
we can derive the length of the sequence
Then, we can derive the z-normalized Euclidean distance between
As the Pearson correlation is limited to range
Find the optimal range of
As we can observe, when we make the discord length longer, the discord score can increase, decrease or stay the same. Thus, it is a bad idea that we use the discord value
For the first five discord lengths, MERLIN sets the first
Reason for discovering discords from short sequence length to long sequence length: It is only for the first invocation of DRAG that we are completely uncertain about a good value for r, and we may have multiple failure runs and/or invoke DRAG with too small of a value for r, making it run slow. It is much faster to do this on the shorter subsequence lengths.
(1) A subsequence of the constant region: the z-normalize will fail as the
(2) 'twin freak' problem: more than one anomaly, and they look the same. This can be solved by changing the first nearest neighbor to the
(1) Anomalies without label: Any algorithm that does find this anomaly will be penalized as having produced a false positive.
(2) Obvious and trivial anomalies: obvious anomalies that any algorithm can easily detect.
(1) Can discover ultra-subtle anomalies
(2) Free of assumptions about the anomaly duration
(3) MERLIN achieves better running efficiency
They will use the worst-case dataset from MERLIN, random walk. For such data, the top-1 discord is only slightly further away from its nearest neighbor than any randomly chosen subsequence, meaning that the candidate set in DRAG phase 1 grows relatively large even if given a good value for
Thresholds can often be learned with simple human-in-the-loop algorithms. In brief, the user can simply examine a sorted list of all candidate anomalies. The discord distance of the first one she rejects as “not an anomaly”, can be used as the threshold for future datasets from the same domain.
Each algorithm is tasked with locating the one location it thinks is most likely to be anomalous (We removed the handful of examples that have no claimed anomaly). If that location is within
By utilizing the indexing technique called Orchard's algorithm, MERLIN++ is faster than MERLIN but produces identical results. For more details, it can be found here.