k-Means is a data partitioning technique that is widely used for clustering. It has variants (such as mini-batch k-Means) that scale well to large amounts of data, and its clustering results are easy to interpret. However, k-Means has many applications that are rarely discussed. They are:
In my previous article (https://medium.com/analytics-vidhya/anomaly-detection-in-python-part-1-basics-code-and-standard-algorithms-37d022cdbcff) we discussed the basics of Anomaly Detection, the types of problems and the types of methods used. We discussed the EDA, Univariate and Multivariate methods of performing Anomaly Detection, along with one example of each. We discussed why Multivariate Outlier detection is a difficult problem that requires specialized techniques. We also discussed the Mahalanobis Distance method with FastMCD for detecting Multivariate Outliers.
In this article, we will discuss 2 other widely used methods to perform Multivariate Unsupervised Anomaly Detection. We will discuss:
Anomaly detection is a tool to identify unusual or interesting occurrences…
An Anomaly/Outlier is a data point that deviates significantly from normal/regular data. Anomaly detection problems can be classified into 3 types:
Principal Component Analysis is among the most popular, fastest and easiest-to-interpret Dimensionality Reduction techniques. It exploits the Linear Dependence among variables. Some of its applications are:
In the following article we will discuss the applications and why PCA works.
Why does Dimensionality Reduction using PCA Work?
Dimensionality reduction using PCA works because of the presence of Collinearity (or Linear Dependence among features) in the data. Let us see what this means. Imagine the following 2 scenarios:
Regularization is a method used to reduce the variance of a Machine Learning model; in other words, it is used to reduce overfitting. Overfitting occurs when a machine learning model performs well on the training examples but fails to yield accurate predictions for data that it has not been trained on.
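To make the idea concrete, here is a minimal sketch using only NumPy. Everything in it is an illustrative assumption rather than something taken from the article: the sine-shaped toy data, the degree-9 polynomial features, the penalty strength `alpha = 1.0`, and the helper names `poly_features`, `fit_ridge` and `mse`. It fits the same flexible model twice, once without a penalty and once with an L2 (ridge) penalty, and compares the two fits.

```python
import numpy as np

rng = np.random.default_rng(0)

def poly_features(x, degree):
    # Columns x^0, x^1, ..., x^degree (hypothetical helper for this sketch)
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, alpha):
    # Minimizes ||Xw - y||^2 + alpha * ||w||^2 by solving an augmented
    # least-squares problem. alpha = 0 gives ordinary least squares.
    # For simplicity the intercept column is penalized too.
    n_features = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(alpha) * np.eye(n_features)])
    y_aug = np.concatenate([y, np.zeros(n_features)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

# Assumed toy data: a noisy sine curve with only 15 training points
x_train = rng.uniform(-1, 1, 15)
y_train = np.sin(np.pi * x_train) + rng.normal(0, 0.3, 15)
x_test = rng.uniform(-1, 1, 200)
y_test = np.sin(np.pi * x_test) + rng.normal(0, 0.3, 200)

degree = 9  # deliberately flexible, so the unpenalized fit can chase noise
X_train = poly_features(x_train, degree)
X_test = poly_features(x_test, degree)

w_ols = fit_ridge(X_train, y_train, alpha=0.0)
w_ridge = fit_ridge(X_train, y_train, alpha=1.0)

train_mse_ols = mse(X_train, y_train, w_ols)
train_mse_ridge = mse(X_train, y_train, w_ridge)

print("no penalty: train MSE", train_mse_ols, "test MSE", mse(X_test, y_test, w_ols))
print("ridge:      train MSE", train_mse_ridge, "test MSE", mse(X_test, y_test, w_ridge))
print("coefficient norms:", np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

The unpenalized fit always has the lower training error, while the ridge fit always has the smaller coefficient norm. Whether the ridge fit also does better on the held-out points depends on the data and on the choice of `alpha`, which is why the penalty strength is usually tuned on validation data.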
In theory, there are 2 major ways to build a machine learning model with the ability to generalize well on unseen data:
It has been…
k-Means is a data partitioning algorithm that is often among the first choices for clustering. Some reasons for the popularity of k-Means are: