Changyu Lee

Clustering & Gaussian Mixture Model

Summary

Clustering

Given: a set of data points (feature vectors) without labels
Output: group the data into some clusters, which means
- assign each point to a specific cluster
- find the center (representative / prototype / …) of each cluster
Formal Definition
Given: data points $x_1, \dots, x_N \in \mathbb{R}^D$ and the number of clusters $K$ we want
Output: group the data into $K$ clusters, which means
- find an assignment $\gamma_{nk} \in \{0,1\}$ for each data point $n \in [N]$ and cluster $k \in [K]$, such that $\sum_{k \in [K]} \gamma_{nk} = 1$ for any fixed $n$
- find the cluster centers $\mu_1, \dots, \mu_K \in \mathbb{R}^D$
Example
Turn it into an optimization problem
using the K-means objective
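Stated with the assignments and centers defined above, the standard K-means objective is the total squared distance of each point to its assigned center, minimized jointly over the assignments $\gamma$ and the centers $\mu$:

```latex
J(\gamma, \mu) = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma_{nk} \, \lVert x_n - \mu_k \rVert_2^2
```

Minimizing $J$ jointly is hard, but fixing one of $\gamma$ or $\mu$ makes the other easy to optimize; alternating between the two gives the K-means algorithm below.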
Alternative
Closer Look
K-means Algorithm
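A minimal NumPy sketch of the alternating procedure (Lloyd's algorithm). The function name and the plain random initialization are my own choices for illustration, not from the lecture:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Alternate between the assignment step and the center-update step."""
    rng = np.random.default_rng(seed)
    # initialize centers as K random (distinct-index) data points
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # assignment step: each point goes to its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # update step: each center becomes the mean of its assigned points
        new_centers = np.array([
            X[labels == k].mean(0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break  # converged: assignments can no longer change
        centers = new_centers
    return labels, centers
```

Each iteration can only decrease the objective $J$, so the algorithm always converges, though possibly to a local optimum that depends on the initialization.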
How to initialize it?
Why does initialization matter?
Simple example: 4 data points, 2 clusters, 2 different initializations
randomly
Greedy Approach
K-means++
expected K-means objective
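K-means++ replaces uniform random seeding with a distance-weighted greedy scheme, which guarantees the expected K-means objective is within an $O(\log K)$ factor of the optimum. A sketch of the seeding step (function name is my own):

```python
import numpy as np

def kmeanspp_init(X, K, seed=0):
    """K-means++ seeding: first center uniformly at random, each next
    center with probability proportional to the squared distance to
    its nearest already-chosen center."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        # squared distance from every point to its nearest chosen center
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(1)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```

Points far from all current centers are exponentially more likely to be picked, so the seeds tend to land in different clusters; the result is then used as the initialization for the usual K-means iterations.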

Gaussian Mixture Model (GMM)

GMM is a probabilistic approach for clustering
more explanatory than minimizing the K-means objective
can be seen as a soft version of K-means
learned with the Expectation-Maximization (EM) algorithm
Formal Definition
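In the standard formulation, a GMM models the data density as a weighted sum of $K$ Gaussians, with mixing weights $\pi_k \ge 0$, $\sum_k \pi_k = 1$, and parameters $\theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$:

```latex
p(x \mid \theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)
```

Equivalently, each point is generated by first sampling a latent cluster label $z_n$ with $p(z_n = k) = \pi_k$, then drawing $x_n \sim \mathcal{N}(\mu_k, \Sigma_k)$; clustering amounts to inferring these latent labels.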

Learning GMMs

How to learn Parameters?
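The natural objective is the maximum-likelihood log-likelihood of the data under the mixture:

```latex
\log p(X \mid \theta) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)
```

The log of a sum does not decouple across components, so setting the gradient to zero yields no closed-form solution; this is what motivates the EM algorithm.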

EM Algorithm

E-Step
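In the standard E-step, holding the parameters fixed, each point receives a soft assignment (responsibility) for every component via Bayes' rule:

```latex
\gamma_{nk} = p(z_n = k \mid x_n, \theta)
            = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}
```

Unlike the hard $\gamma_{nk} \in \{0,1\}$ of K-means, these responsibilities lie in $[0,1]$ and sum to 1 over $k$ for each $n$.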

M-Step
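In the standard M-step, holding the responsibilities fixed, the parameters are re-estimated by responsibility-weighted maximum likelihood, with $N_k = \sum_{n} \gamma_{nk}$ the effective number of points in component $k$:

```latex
\pi_k = \frac{N_k}{N}, \qquad
\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \, x_n, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \, (x_n - \mu_k)(x_n - \mu_k)^\top
```

These are the usual sample weight, mean, and covariance formulas, except every point contributes to every component in proportion to its responsibility.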

EM for Learning GMMs in total
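Putting the two steps together, a minimal NumPy sketch of EM for a GMM (function names, the random-data-point initialization, and the small covariance regularizer are my own choices for illustration):

```python
import numpy as np

def gaussian_pdf(X, mu, cov):
    """Multivariate normal density evaluated at each row of X."""
    D = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** D * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum('nd,de,ne->n', diff, inv, diff)) / norm

def em_gmm(X, K, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # init: random data points as means, identity covariances, uniform weights
    mus = X[rng.choice(N, K, replace=False)].astype(float)
    covs = np.array([np.eye(D) for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E-step: responsibilities gamma[n, k]
        dens = np.stack([pis[k] * gaussian_pdf(X, mus[k], covs[k])
                         for k in range(K)], axis=1)
        gamma = dens / dens.sum(1, keepdims=True)
        # M-step: re-estimate weights, means, covariances
        Nk = gamma.sum(0)
        pis = Nk / N
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            # small diagonal term keeps the covariance invertible
            covs[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return pis, mus, covs, gamma
```

Each EM iteration is guaranteed not to decrease the log-likelihood, but like K-means the result can depend on initialization, so multiple restarts are common in practice.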

GMM and K-means
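The "soft K-means" view can be made precise: with shared spherical covariances $\Sigma_k = \sigma^2 I$ and uniform weights, the responsibilities become

```latex
\gamma_{nk} = \frac{\exp\!\left(-\lVert x_n - \mu_k \rVert^2 / 2\sigma^2\right)}
                   {\sum_{j=1}^{K} \exp\!\left(-\lVert x_n - \mu_j \rVert^2 / 2\sigma^2\right)}
\;\xrightarrow{\;\sigma^2 \to 0\;}\;
\begin{cases} 1 & k = \arg\min_j \lVert x_n - \mu_j \rVert^2 \\ 0 & \text{otherwise} \end{cases}
```

so in the zero-variance limit the E-step reduces to the hard nearest-center assignment and the M-step to the plain cluster mean, i.e. EM degenerates to the K-means algorithm.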
