Sequence Mining^Analysis.ppt

  • About 13,100 words
  • About 29 pages
  • Published 2017-01-20 in Beijing
Simple Example
  • N1 and N2 are regions of normal behavior
  • Points o1 and o2 are anomalies
  • Points in region O3 are anomalies

Related problems
  • Rare Class Mining
  • Chance discovery
  • Novelty Detection
  • Exception Mining
  • Noise Removal
  • Black Swan*

Key Challenges
  • Defining a representative normal region is challenging
  • The boundary between normal and outlying behavior is often not precise
  • The exact notion of an outlier is different for different application domains
  • Availability of labeled data for training/validation
  • Malicious adversaries
  • Data might contain noise
  • Normal behavior keeps evolving

Input Data – Complex Data Types
  • Relationship among data instances
      • Sequential
      • Temporal
      • Spatial
      • Spatio-temporal
      • Graph

Data Labels
  • Supervised Anomaly Detection
      • Labels available for both normal data and anomalies
      • Similar to rare class mining
  • Semi-supervised Anomaly Detection
      • Labels available only for normal data
  • Unsupervised Anomaly Detection
      • No labels assumed
      • Based on the assumption that anomalies are very rare compared to normal data

Type of Anomaly
  • Point Anomalies
  • Contextual Anomalies
  • Collective Anomalies

Point Anomalies
  • An individual data instance is anomalous w.r.t. the data

Contextual Anomalies
  • An individual data instance is anomalous within a context
  • Requires a notion of context
  • Also referred to as conditional anomalies*

Collective Anomalies
  • A collection of related data instances is anomalous
  • Requires a relationship among data instances
      • Sequential Data
      • Spatial Data
      • Graph Data
  • The individual instances within a collective anomaly are not anomalous by themselves

Output of Anomaly Detection
  • Label
      • Each test instance is given a normal or anomaly label
      • This is especially true of classification-based approaches
  • Score
      • Each test instance is assigned an anomaly score
      • Allows the output to be ranked
      • Requires an additional threshold parameter

Evaluation of Anomaly Detection – F-value
  • Accuracy is not a sufficient metric for evaluation
  • Example: network traffic data set with 99.9% of normal data and 0.1% o
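
The point about accuracy versus the F-value can be checked with a small calculation. The sketch below is illustrative only: the slide text is cut off, so the data set size (10,000 instances, 10 of them anomalies, matching the 99.9% / 0.1% split), the anomaly scores, and the threshold of 0.5 are all assumptions, and the helper names are hypothetical. It turns anomaly scores into labels with a threshold, then compares a detector that always answers "normal" against one that actually flags some anomalies.

```python
def labels_from_scores(scores, threshold):
    """Convert anomaly scores to labels: 1 = anomaly, 0 = normal."""
    return [1 if s >= threshold else 0 for s in scores]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (anomaly) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f_value(y_true, y_pred):
    """Harmonic mean of precision and recall on the anomaly class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumed data: 10 anomalies followed by 9,990 normal instances (0.1% / 99.9%).
y_true = [1] * 10 + [0] * 9990

# Detector A: labels everything as normal.
always_normal = [0] * len(y_true)

# Detector B: made-up scores; it ranks 8 of the 10 anomalies high
# and also raises 20 false alarms among the normal instances.
scores = [0.9] * 8 + [0.2] * 2 + [0.8] * 20 + [0.1] * 9970
thresholded = labels_from_scores(scores, threshold=0.5)

acc_a, f_a = accuracy(y_true, always_normal), f_value(y_true, always_normal)
acc_b, f_b = accuracy(y_true, thresholded), f_value(y_true, thresholded)

print(f"always normal: accuracy={acc_a:.4f}  F={f_a:.3f}")  # accuracy=0.9990  F=0.000
print(f"thresholded:   accuracy={acc_b:.4f}  F={f_b:.3f}")  # accuracy=0.9978  F=0.421
```

Under these assumed numbers, the detector that finds nothing has the higher accuracy, while the F-value separates the two clearly. Moving the threshold changes the balance between missed anomalies and false alarms, which is why score-based output needs that extra parameter.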
