Anomaly Detection Series
PART 1: AN INTRODUCTION IN LAYMAN’S TERMS
THE CONCEPTS COVERED IN THIS ARTICLE ARE:
- About the Problem Statement
- Why is the problem important?
- Practical Use Cases
- Various Concepts involved in the upcoming articles of this series
Let us begin with a story:
- Let us assume that there is a company named Splash, and that you have just joined it as a Machine Learning Engineer with a CTC of 30 LPA. The points below describe the company and the task at hand.
- Currently, we at Splash help retailers with business intelligence.
- One of our main challenges is monitoring and analyzing our clients’ business metrics in real time, so that incidents that may impact their revenue are detected instantly.
- One part of this challenge is Anomaly Detection, which generates alerts on our clients’ business metrics.
- The problem statement below describes the problem the company is currently tackling.
1. About the Problem Statement:
- Since the company mainly helps retailers with business intelligence, its focus is on examining each client’s data, checking whether there is an anomaly in a given period, and quantifying that anomaly with an anomaly score.
So, the question we frame is: predict whether an anomaly is present at a given time and, if it is, quantify it.
- Let me clarify the word “quantify”: it means how sure you are (on a scale of 0–100), compared with all the previous anomalies, that the given value is an anomaly.
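To make “quantify” concrete, here is a minimal sketch under one possible reading of the definition above. It assumes that the size of an anomaly is measured by its residual, i.e. how far an observed value lies from a model’s prediction, and that the 0–100 score is the percentile rank of the new residual among the residuals of previously labeled anomalies. The function name `anomaly_score` and the sample residuals are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right


def anomaly_score(residual: float, previous_anomaly_residuals: list[float]) -> float:
    """Hypothetical 0-100 score: the percentage of previous anomalies whose
    residual is less than or equal to this one (a percentile rank)."""
    if not previous_anomaly_residuals:
        return 0.0  # assumption: with no history we express no confidence
    ordered = sorted(previous_anomaly_residuals)
    # bisect_right counts how many past residuals are <= the new residual
    rank = bisect_right(ordered, residual)
    return 100.0 * rank / len(ordered)


# Hypothetical residuals |value - predicted| of past anomalies
past = [12.0, 30.0, 45.0, 80.0]
print(anomaly_score(50.0, past))  # 75.0
```

A residual larger than every past anomaly’s residual scores 100, and one smaller than all of them scores 0. The series may well define the score differently; this only shows the shape of the idea.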
DATASET:
You will be given a comma-separated values (CSV) file containing the following columns:
- timestamp (Data Type: String; the point in time the observation refers to)
- value (Data Type: Integer; the quantity that serves as our input, i.e. the observation we must classify as an anomaly or not)
- is_anomaly (Data Type: Boolean; True or False, whether the value is labeled as an anomaly)
- predicted (Data Type: Integer; the output of a black-box model that was developed to predict the value from past values)
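As a starting point, the columns above can be loaded with Python’s standard `csv` module. This is a minimal sketch: the `Row` class, the `parse_rows` helper, and the two sample rows (including their timestamp format) are hypothetical, since the actual file is not shown here. For a real file you would pass the file contents, or use `csv.DictReader` directly on an open file object.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Row:
    timestamp: str   # kept as a string, as in the column description
    value: int
    is_anomaly: bool
    predicted: int


def parse_rows(text: str) -> list[Row]:
    """Parse CSV text with the columns timestamp, value, is_anomaly, predicted."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for rec in reader:
        rows.append(Row(
            timestamp=rec["timestamp"],
            value=int(rec["value"]),
            # bool("False") would be True, so compare the text explicitly
            is_anomaly=rec["is_anomaly"].strip().lower() == "true",
            predicted=int(rec["predicted"]),
        ))
    return rows


# Hypothetical sample rows, for illustration only
SAMPLE = """timestamp,value,is_anomaly,predicted
2020-01-01 00:00:00,100,False,98
2020-01-01 01:00:00,250,True,102
"""

for row in parse_rows(SAMPLE):
    residual = abs(row.value - row.predicted)
    print(row.timestamp, row.is_anomaly, residual)
```

The gap between `value` and `predicted` is one natural signal to build on, because a large gap means the black-box model did not expect that value.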
SO, ARE THE PROBLEM STATEMENT AND DATASET CLEAR? I HOPE SO.
2. Why is the problem statement important?
First, let us think about it from our own perspective, and then from the world’s perspective; maybe something new will come up.
OUR PERSPECTIVE:
- If we are running a business, why would anomalies matter to us? Say I am running a sales business: why would anomalies matter? Because…
- Because they would help me see where I am going wrong, explain what went wrong, and define steps to avoid (or benefit from) such events in the future.
WORLD’S PERSPECTIVE:
- Eric Ogren, senior security analyst at 451 Research, describes anomaly detection as “security analytics”.
- In Ogren’s words: “Two years from now, analytics will drive most organizations’ security strategies as operations teams use insights gleaned from analytics to apply preventive measures. It will be analytics first, and then more pinpoint, siloed-type approaches based on what the analytics tell you.”
3. PRACTICAL USE CASES:
The question is: why gather information about anomalies at all, what is their significance in real life, and how does the corporate world deal with them?
- MEDICAL DOMAIN: Anomalies in the medical domain? Yes: there, anomaly detection is used to find cells that are anomalous in nature (for example, detecting a tumor among brain cells).
- STOCK TRADING: As more of us become aware of the stock market, anomaly detection can help identify anomalies in a given period. An example is shown in the following picture.
THERE ARE MANY MORE USE CASES, BUT HERE WE WILL JUST SKIM THEM AND FOCUS ON LEARNING HOW TO BUILD AN ANOMALY DETECTION PROGRAM.
4. VARIOUS CONCEPTS INVOLVED IN THE UPCOMING ARTICLES OF THE SERIES
So, in the upcoming articles, we will look at various algorithms used for classification, and use some statistical methods to compute the anomaly score for a given anomaly.
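As a taste of what “statistical methods” can mean here, the sketch below uses the z-score, one common and simple technique that may or may not be the one used later in the series. It assumes the residual `value - predicted` is the quantity of interest, and flags points whose residual lies more than `threshold` standard deviations from the mean residual. The function name `zscore_flags`, the default threshold, and the sample numbers are hypothetical.

```python
import statistics


def zscore_flags(values, predicted, threshold=3.0):
    """Flag points whose residual (value - predicted) lies more than
    `threshold` standard deviations away from the mean residual."""
    residuals = [v - p for v, p in zip(values, predicted)]
    mean = statistics.mean(residuals)
    stdev = statistics.pstdev(residuals)  # population standard deviation
    if stdev == 0:
        return [False] * len(residuals)  # all residuals equal: nothing stands out
    return [abs(r - mean) / stdev > threshold for r in residuals]


# Hypothetical data: one value far from its prediction
values = [100, 101, 99, 100, 250, 100]
predicted = [100] * 6
print(zscore_flags(values, predicted, threshold=2.0))
```

The example uses `threshold=2.0` because, with only six points, a single outlier cannot reach a population z-score above about 2.24; on realistic series lengths, the more common threshold of 3 becomes usable.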