Overview of Clustering


<aside> <img src="/icons/table_red.svg" alt="/icons/table_red.svg" width="40px" /> Table of Contents

</aside>

<aside> 💡

  1. Introduction to Clustering </aside>

<aside> 💡

  2. Clustering Methods </aside>

<aside> 💡

  3. K-Means Clustering Algorithm </aside>

Introduction to Clustering


  1. Introduction to Clustering.

    1. Clustering is a fundamental technique in unsupervised machine learning that involves grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups. The goal is to partition the data into clusters with:
      1. High intra-class similarity: Objects within the same cluster are similar.
      2. Low inter-class similarity: Objects from different clusters are dissimilar.

  2. Clustering Key Concepts:

    1. Similarity Measure: The similarity between records is typically quantified with a distance function; the most common choice is the Euclidean distance (a short code sketch follows the examples below).

    2. Euclidean Distance: For two points $(x_1, y_1)$ and $(x_2, y_2)$, the Euclidean distance is calculated as:

      $$ \text{Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} $$

    3. Example 1: Distance between points 3 and 8 in one dimension:

      $$ \text{Distance} = |3 - 8| = 5 $$

    4. Example 2: Distance between points (4, 3) and (2, 8):

      $$ \text{Distance} = \sqrt{(2 - 4)^2 + (8 - 3)^2} = \sqrt{4 + 25} = \sqrt{29} \approx 5.39 $$

    5. Example 3: Distance between points (-1, 2, 3) and (4, 0, -3):

      $$ \text{Distance} = \sqrt{(4 - (-1))^2 + (0 - 2)^2 + (-3 - 3)^2} = \sqrt{25 + 4 + 36} = \sqrt{65} \approx 8.06 $$
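
A minimal Python sketch of the distance function used above (the helper name `euclidean_distance` and the tuple representation of points are illustrative assumptions, not part of the original notes); it reproduces the three worked examples:

```python
import math

def euclidean_distance(p, q):
    # Euclidean distance between two points of equal dimension:
    # sqrt of the sum of squared coordinate differences.
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

# Example 1: points 3 and 8 in one dimension
print(euclidean_distance((3,), (8,)))               # 5.0

# Example 2: points (4, 3) and (2, 8)
print(euclidean_distance((4, 3), (2, 8)))           # ≈ 5.39

# Example 3: points (-1, 2, 3) and (4, 0, -3)
print(euclidean_distance((-1, 2, 3), (4, 0, -3)))   # ≈ 8.06
```

The same values can also be obtained with `math.dist(p, q)` from the standard library, which computes the Euclidean distance directly.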