Overview of Clustering


<aside> <img src="/icons/table_red.svg" alt="/icons/table_red.svg" width="40px" /> Table of Contents

</aside>

<aside> 💡

  1. Introduction to Clustering </aside>

<aside> 💡

  2. Clustering Methods </aside>

<aside> 💡

  3. K-Means Clustering Algorithm </aside>

Introduction to Clustering


  1. Introduction to Clustering.

    1. Clustering is a fundamental technique in unsupervised machine learning that involves grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups. The goal is to partition the data into clusters with:
      1. High intra-class similarity: Objects within the same cluster are similar.
      2. Low inter-class similarity: Objects from different clusters are dissimilar.

  2. Clustering Key Concepts:

    1. Similarity Measure: The similarity between records is typically quantified with a distance function; the most common choice is the Euclidean distance (a short code sketch follows the examples below).

    2. Euclidean Distance: For two points $(x_1, y_1)$ and $(x_2, y_2)$, the Euclidean distance is calculated as:

      $$ \text{Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} $$

    3. Example 1: Distance between points 3 and 8 in one dimension:

      $$ \text{Distance} = |3 - 8| = 5 $$

    4. Example 2: Distance between points (4, 3) and (2, 8):

      $$ \text{Distance} = \sqrt{(2 - 4)^2 + (8 - 3)^2} = \sqrt{4 + 25} = \sqrt{29} \approx 5.39 $$

    5. Example 3: Distance between points (-1, 2, 3) and (4, 0, -3):

      $$ \text{Distance} = \sqrt{(4 - (-1))^2 + (0 - 2)^2 + (-3 - 3)^2} = \sqrt{25 + 4 + 36} = \sqrt{65} \approx 8.06 $$
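
A minimal Python sketch of the distance function used above (the helper name `euclidean_distance` and the tuple representation of points are illustrative assumptions, not part of the original notes); it reproduces the three worked examples:

```python
import math

def euclidean_distance(p, q):
    # Euclidean distance between two points of equal dimension:
    # sqrt of the sum of squared coordinate differences.
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

# Example 1: points 3 and 8 in one dimension
print(euclidean_distance((3,), (8,)))               # 5.0

# Example 2: points (4, 3) and (2, 8)
print(euclidean_distance((4, 3), (2, 8)))           # ≈ 5.39

# Example 3: points (-1, 2, 3) and (4, 0, -3)
print(euclidean_distance((-1, 2, 3), (4, 0, -3)))   # ≈ 8.06
```

The same values can also be obtained with `math.dist(p, q)` from the standard library, which computes the Euclidean distance directly.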