
Professor Jo Jeonghyo’s Information Geometry and Machine Learning Trilogy

Haebom
Professor Jo Jeonghyo of the Department of Physics at Seoul National University wrote this series last year. It shows how information geometry and machine learning are connected, and just how crucial mathematics is to the field of AI.
I picked it up because a PhD acquaintance of mine recommended it, and I found it not only fascinating but also very well written, so I wanted to share it here too.
In machine learning, models are generally categorized as either classification models or generative models.
You can represent models using probabilities.
Classification models are described by the conditional probability P(y|x;θ), while generative models correspond to the probability P(x;θ) for the data x.
With a probabilistic model, you can generate samples by drawing data x in proportion to their probability.
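To make the two kinds of models concrete, here is a minimal sketch in NumPy (the categorical probabilities and the logistic weights are made-up values, not taken from the series): a generative model P(x; θ) over a small discrete alphabet that we sample from in proportion to its probabilities, and a classification model P(y|x; θ) given by logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generative model P(x; theta): a categorical distribution over 4 symbols.
theta_gen = np.array([0.1, 0.2, 0.3, 0.4])           # hypothetical parameters
samples = rng.choice(len(theta_gen), size=10, p=theta_gen)
print("samples from P(x; theta):", samples)          # high-probability symbols appear more often

# Classification model P(y | x; theta): logistic regression with weights w and bias b.
w, b = np.array([1.5, -2.0]), 0.3                    # hypothetical parameters
x = np.array([0.8, 0.1])
p_y_given_x = 1.0 / (1.0 + np.exp(-(w @ x + b)))     # P(y = 1 | x; theta)
print("P(y=1 | x; theta) =", p_y_given_x)
```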
The explanation centers on probability models from the exponential family.
For exponential-family models, the cumulants (mean, variance, and higher-order statistics) can be computed from the cumulant generating function, i.e. the log-partition function.
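As a minimal sketch of this fact, using the Bernoulli family as the example (my choice, not necessarily the one used in the series): the log-partition function A(θ) = log(1 + e^θ) acts as the cumulant generating function, so its first and second derivatives give the mean and the variance.

```python
import numpy as np

def log_partition(theta):
    # Cumulant generating function A(theta) of the Bernoulli exponential family.
    return np.log1p(np.exp(theta))

theta = 0.7
eps = 1e-4
# Numerical first and second derivatives of A(theta).
mean = (log_partition(theta + eps) - log_partition(theta - eps)) / (2 * eps)
var = (log_partition(theta + eps) - 2 * log_partition(theta) + log_partition(theta - eps)) / eps**2

p = 1 / (1 + np.exp(-theta))              # mean parameter of the Bernoulli
print(mean, p)                            # A'(theta)  ~= E[x]   = p
print(var, p * (1 - p))                   # A''(theta) ~= Var[x] = p(1 - p)
```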
The Bregman divergence (often called the Bregman distance, though it is not symmetric) can be used to define the distance between different probability models.
For exponential-family models, the Bregman divergence satisfies a generalized Pythagorean theorem.
By using projections in both the original parameter space and its dual space, you can compute the distance between models.
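A minimal sketch of the Bregman divergence D_F(p, q) = F(p) − F(q) − ⟨∇F(q), p − q⟩, with two generating functions chosen by me for illustration: the squared norm, which recovers the squared Euclidean distance, and the negative entropy, which recovers the KL divergence on the probability simplex.

```python
import numpy as np

def bregman(F, grad_F, p, q):
    # Bregman divergence D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>.
    return F(p) - F(q) - grad_F(q) @ (p - q)

# F(x) = 0.5 * ||x||^2  -> half the squared Euclidean distance.
sq = lambda x: 0.5 * x @ x
sq_grad = lambda x: x

# F(p) = sum p log p (negative entropy) -> KL divergence on the simplex.
negent = lambda p: np.sum(p * np.log(p))
negent_grad = lambda p: np.log(p) + 1.0

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])
print(bregman(sq, sq_grad, p, q), 0.5 * np.sum((p - q) ** 2))          # identical values
print(bregman(negent, negent_grad, p, q), np.sum(p * np.log(p / q)))   # identical values (KL)
```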
The Kullback–Leibler (KL) divergence is one way to define the distance between probability models.
The KL divergence can be used to characterize sufficient statistics, which compress the data without losing information about the model.
Sufficient statistics contain all the information necessary to estimate the model's parameters; other information is irrelevant.
When you transform the data before measuring the distance between probability models, information can only be lost, so the distance can only decrease or stay the same (the data-processing inequality).
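A minimal sketch of that monotonicity (the two discrete distributions below are made up for illustration): merging outcomes of a discrete variable is a deterministic transformation of the data, and the KL divergence computed after the merge is never larger than before.

```python
import numpy as np

def kl(p, q):
    # Kullback-Leibler divergence KL(p || q) for discrete distributions.
    return np.sum(p * np.log(p / q))

p = np.array([0.50, 0.20, 0.20, 0.10])
q = np.array([0.25, 0.25, 0.25, 0.25])

# Coarse-grain: merge outcomes {0,1} and {2,3} into two bins (a deterministic map of x).
p_coarse = np.array([p[0] + p[1], p[2] + p[3]])
q_coarse = np.array([q[0] + q[1], q[2] + q[3]])

print(kl(p, q))                  # divergence on the original variable
print(kl(p_coarse, q_coarse))    # divergence after the transformation: never larger
```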
The f-divergence is a family of distance measures defined by a convex function f with f(1) = 0.
The Kullback–Leibler divergence is a particular f-divergence, and it is invariant under sufficient statistics.
The KL divergence is frequently used in machine learning to compare models or data distributions.
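A minimal sketch of the f-divergence D_f(P || Q) = Σ_x q(x) f(p(x)/q(x)), with two convex functions chosen for illustration: f(t) = t log t recovers the KL divergence, and f(t) = ½|t − 1| recovers the total variation distance.

```python
import numpy as np

def f_divergence(f, p, q):
    # D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)), with f convex and f(1) = 0.
    return np.sum(q * f(p / q))

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])

kl_f = lambda t: t * np.log(t)            # f for the KL divergence
tv_f = lambda t: 0.5 * np.abs(t - 1.0)    # f for the total variation distance

print(f_divergence(kl_f, p, q), np.sum(p * np.log(p / q)))    # same value
print(f_divergence(tv_f, p, q), 0.5 * np.sum(np.abs(p - q)))  # same value
```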
Gradient descent is the basic tool for optimizing model parameters.
Natural gradient descent updates the parameters more efficiently because it takes the curvature of the parameter space, measured by the Fisher information, into account.
Natural gradient descent is invariant under reparameterizations (for example, changes of scale) of the parameters.
Gradient descent and natural gradient descent are closely related: through the convex potential and its Legendre transform, an ordinary gradient step in the dual coordinates corresponds to a natural gradient step in the original coordinates.
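A minimal sketch comparing the two update rules on a one-parameter Bernoulli model in its natural parameter θ (the data mean, step size, and iteration count are my own toy choices): the gradient of the negative log-likelihood is σ(θ) − x̄, and the natural gradient divides it by the Fisher information σ(θ)(1 − σ(θ)).

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

x_bar = 0.8                    # empirical mean of hypothetical 0/1 data
lr = 0.5
theta_gd = theta_ngd = -2.0    # same (poor) starting point for both methods

for _ in range(20):
    # Gradient of the negative log-likelihood of a Bernoulli model in its natural parameter.
    grad_gd = sigmoid(theta_gd) - x_bar
    grad_ngd = sigmoid(theta_ngd) - x_bar
    fisher = sigmoid(theta_ngd) * (1.0 - sigmoid(theta_ngd))   # Fisher information I(theta)

    theta_gd -= lr * grad_gd                 # ordinary gradient descent
    theta_ngd -= lr * grad_ngd / fisher      # natural gradient descent (precondition by 1/I)

theta_star = np.log(x_bar / (1.0 - x_bar))   # maximum-likelihood natural parameter
print(theta_gd, theta_ngd, theta_star)       # natural gradient ends up much closer to theta_star
```

With the same step size and starting point, the natural gradient iterate ends up much closer to the maximum-likelihood value, because each step is rescaled by the local Fisher information rather than treating all directions of the parameter space equally.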
Mirror descent, which performs the update in the dual space, includes natural gradient descent as a special case.
Mirror descent achieves the effect of natural gradient descent without having to explicitly compute the curvature (the Fisher information matrix).
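A minimal sketch of a mirror descent step with the negative entropy as the mirror map, on a toy objective of my own choosing: the plain gradient step is taken in the dual (log-probability) space, which yields a multiplicative update that stays on the probability simplex, with no Fisher matrix ever formed.

```python
import numpy as np

def mirror_descent_step(p, grad, eta):
    # Negative-entropy mirror map: move to the dual space (log p), take a plain
    # gradient step there, then map back and renormalize onto the simplex.
    logits = np.log(p) - eta * grad
    w = np.exp(logits - logits.max())      # subtract the max for numerical stability
    return w / w.sum()

# Toy objective on the simplex: L(p) = 0.5 * ||p - target||^2 (hypothetical).
target = np.array([0.7, 0.2, 0.1])
loss_grad = lambda p: p - target

p = np.ones(3) / 3.0                       # start from the uniform distribution
for _ in range(200):
    p = mirror_descent_step(p, loss_grad(p), eta=0.5)

print(p)   # close to the target, and every iterate stayed on the simplex
```

This is the exponentiated-gradient form of mirror descent; the curvature information is carried implicitly by the mirror map rather than by an explicit Fisher matrix.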
Using a data-dependent Bregman divergence lets you update the model parameters with the data taken into account.
The distance between models and the curvature of the objective function are key geometric concepts in machine learning.
The development of information geometry and machine learning is closely interconnected.
You may find this useful as a reference: Maths_for_ML.pdf (5.47 MB)