Professor Jo Jeong-hyo’s Information Geometry and Machine Learning Trilogy
Haebom
This is a three-part series of articles written last year by Professor Jo Jeong-hyo of the Department of Physics at Seoul National University. It explains how information geometry and machine learning are related, and how important mathematics is in the field of artificial intelligence.
I read the series because a doctor I know recommended it, and I found it so interesting and well written that I wanted to recommend it here.
Machine learning models are divided into classification models and generative models.
Models can be expressed using probabilities.
The classification model is expressed as a conditional probability P(y|x;θ), and the generative model corresponds to the probability P(x;θ) of data x.
A probability model can be used to generate samples, that is, to draw data x that have high probability under the model.
The explanation focuses on probability models in the exponential family.
For exponential-family probability models, the cumulants can be computed from the cumulant generating function.
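To make the distinction concrete, here is a minimal sketch in Python, assuming logistic regression as a stand-in for the classification model P(y|x;θ) and a one-dimensional Gaussian as a stand-in for the generative model P(x;θ); all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Classification model: conditional probability P(y|x; theta)
# (logistic regression with parameters w, b standing in for theta)
def p_y_given_x(x, w=1.5, b=-0.5):
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

# Generative model: probability P(x; theta) of the data itself
# (a Gaussian with parameters mu, sigma standing in for theta)
def p_x(x, mu=0.0, sigma=1.0):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Sampling from the generative model draws x values concentrated where p_x is large
samples = rng.normal(loc=0.0, scale=1.0, size=5)
print(samples)
print(p_y_given_x(samples))  # class-1 probability for each sampled x
```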
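As a small numeric check of this point, the sketch below writes the Bernoulli distribution in exponential-family form and differentiates its cumulant generating function ψ(θ) = log(1 + exp(θ)) to recover the mean and variance; the parameter value is illustrative.

```python
import numpy as np

# Bernoulli in exponential-family form: p(x; theta) = exp(theta * x - psi(theta)),
# x in {0, 1}, with cumulant generating function psi(theta) = log(1 + exp(theta)).
def psi(theta):
    return np.log1p(np.exp(theta))

theta = 0.7
eps = 1e-5

# First derivative of psi -> first cumulant (mean)
mean = (psi(theta + eps) - psi(theta - eps)) / (2 * eps)
# Second derivative of psi -> second cumulant (variance)
var = (psi(theta + eps) - 2 * psi(theta) + psi(theta - eps)) / eps ** 2

p = 1 / (1 + np.exp(-theta))   # Bernoulli success probability
print(mean, p)                 # both ≈ 0.668
print(var, p * (1 - p))        # both ≈ 0.222
```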
The distance between probability models can be defined using the Bregman distance.
For exponential-family probability models, the Bregman distance satisfies a generalized Pythagorean theorem.
The distance between models can be obtained by using projections in the original space and the dual space.
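Here is a minimal sketch of the Bregman distance itself, assuming the standard definition D_F(p, q) = F(p) − F(q) − ⟨∇F(q), p − q⟩: with F the squared norm it reduces to the squared Euclidean distance, and with the negative entropy it reduces to the Kullback–Leibler distance. The helper functions and distributions are illustrative.

```python
import numpy as np

def bregman(F, gradF, p, q):
    """Bregman distance D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - gradF(q) @ (p - q)

# F(u) = ||u||^2 / 2  ->  half the squared Euclidean distance
sq = lambda u: 0.5 * u @ u
sq_grad = lambda u: u

# F(u) = sum u_i log u_i (negative entropy)  ->  KL distance for distributions
negent = lambda u: np.sum(u * np.log(u))
negent_grad = lambda u: np.log(u) + 1.0

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

print(bregman(sq, sq_grad, p, q))          # 0.5 * ||p - q||^2
print(bregman(negent, negent_grad, p, q))  # equals KL(p || q) when p, q sum to 1
print(np.sum(p * np.log(p / q)))           # KL(p || q) for comparison
```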
The Kullback–Leibler distance is one of the ways to define the distance between probability models.
The Kullback–Leibler distance can be used to obtain sufficient statistics, which are related to data compression.
Sufficient statistics contain all the information in the data that is needed to estimate the parameters of the model; the rest of the information is irrelevant.
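As a quick illustration of sufficiency, the sketch below assumes i.i.d. Bernoulli data: the log-likelihood of the parameter depends on the data only through the number of ones, so two different samples with the same count give exactly the same likelihood, and the rest of the sample can be compressed away without losing information about the parameter.

```python
import numpy as np

def bernoulli_log_likelihood(data, p):
    """Log-likelihood of i.i.d. Bernoulli data at success probability p."""
    data = np.asarray(data)
    return np.sum(data * np.log(p) + (1 - data) * np.log(1 - p))

# Two different samples with the same sufficient statistic (number of ones = 3)
sample_a = [1, 1, 1, 0, 0, 0, 0, 0]
sample_b = [0, 0, 1, 0, 1, 0, 1, 0]

for p in (0.2, 0.5, 0.8):
    # Identical for every p: the parameter only "sees" the sufficient statistic
    print(p, bernoulli_log_likelihood(sample_a, p), bernoulli_log_likelihood(sample_b, p))
```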
When the distance between probability models is measured after a variable transformation, the distance can only decrease, because information is lost in the transformation.
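A small numeric check of this data-processing effect, with made-up distributions: merging two outcomes is a variable transformation that loses information, and the Kullback–Leibler distance shrinks accordingly.

```python
import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q))

# Original 3-outcome distributions
p = np.array([0.6, 0.3, 0.1])
q = np.array([0.4, 0.4, 0.2])

# Variable transformation that merges the last two outcomes (information is lost)
p2 = np.array([p[0], p[1] + p[2]])
q2 = np.array([q[0], q[1] + q[2]])

print(kl(p, q))    # ≈ 0.088 before the transformation
print(kl(p2, q2))  # ≈ 0.081 afterwards: the distance is reduced
```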
The f-distance is a family of distance measures defined using a convex function f.
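Here is a minimal sketch of the f-distance family, assuming the common definition D_f(p || q) = Σ q_i f(p_i / q_i) with f convex and f(1) = 0; different choices of f recover the Kullback–Leibler, total-variation, and chi-squared distances. The distributions are illustrative.

```python
import numpy as np

def f_divergence(p, q, f):
    """f-distance D_f(p || q) = sum_i q_i * f(p_i / q_i), f convex with f(1) = 0."""
    return np.sum(q * f(p / q))

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.4, 0.4, 0.2])

kl   = f_divergence(p, q, lambda t: t * np.log(t))        # Kullback-Leibler
tv   = f_divergence(p, q, lambda t: 0.5 * np.abs(t - 1))  # total variation
chi2 = f_divergence(p, q, lambda t: (t - 1) ** 2)         # chi-squared

print(kl, tv, chi2)
```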
The Kullback–Leibler distance is special in that it is invariant under reduction of the data to a sufficient statistic.
The Kullback–Leibler distance is often used in machine learning to compare models or data distributions.
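One concrete instance of that use, with made-up distributions: minimizing the usual cross-entropy loss against the data is the same as minimizing the Kullback–Leibler distance from the data distribution to the model, since the two differ only by the entropy of the data, which does not depend on the model parameters.

```python
import numpy as np

data = np.array([0.7, 0.2, 0.1])    # empirical data distribution
model = np.array([0.5, 0.3, 0.2])   # model distribution P(x; theta)

cross_entropy = -np.sum(data * np.log(model))
entropy = -np.sum(data * np.log(data))
kl = np.sum(data * np.log(data / model))

print(cross_entropy, entropy + kl)  # equal: CE = H(data) + KL(data || model)
```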
Gradient descent is the basic tool for optimizing the parameters of a model.
Natural gradient descent updates the parameters more effectively by taking the curvature of the objective function into account.
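Here is a minimal sketch for a single Bernoulli parameter, with illustrative numbers: the natural gradient rescales the ordinary gradient by the inverse Fisher information, which plays the role of the curvature, so a single natural-gradient step lands on the maximum-likelihood answer while a plain gradient step overshoots.

```python
# Fit the Bernoulli success probability p to data with mean m by minimizing
# the average negative log-likelihood L(p) = -(m*log p + (1-m)*log(1-p)).
m = 0.8          # data mean (illustrative)
p = 0.3          # initial parameter
lr = 1.0

def grad(p):
    # dL/dp = (p - m) / (p * (1 - p))
    return (p - m) / (p * (1 - p))

def fisher(p):
    # Fisher information of the Bernoulli model: I(p) = 1 / (p * (1 - p))
    return 1.0 / (p * (1 - p))

# Ordinary gradient step vs natural gradient step (gradient rescaled by 1/I(p))
p_gd  = p - lr * grad(p)
p_ngd = p - lr * grad(p) / fisher(p)

print(p_gd)   # overshoots badly: 0.3 - (-2.38) ≈ 2.68, outside [0, 1]
print(p_ngd)  # lands exactly on the maximum-likelihood answer m = 0.8
```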
Natural gradient descent is invariant to rescaling (reparameterization) of the parameters.
Gradient descent and natural gradient descent are related: natural gradient descent can be expressed through a convex function and its Legendre transform, which defines the dual coordinates.
Mirror descent, which operates in the dual space, includes natural gradient descent as a special case.
Mirror descent achieves the effect of natural gradient descent without having to compute the curvature explicitly.
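Here is a minimal sketch of mirror descent on the probability simplex, assuming the negative-entropy mirror map and an illustrative objective: the gradient step is taken in the dual space (log-probabilities), and the update reproduces the effect of natural gradient descent without ever forming a curvature matrix.

```python
import numpy as np

target = np.array([0.7, 0.2, 0.1])   # illustrative target distribution
p = np.ones(3) / 3                   # start at the uniform distribution
lr = 0.5

def grad_kl(p, target):
    # Gradient (w.r.t. p) of the objective KL(p || target) = sum p log(p / target)
    return np.log(p / target) + 1.0

for _ in range(200):
    # Mirror descent with the negative-entropy mirror map:
    # 1) map p to the dual space (log p), 2) take a gradient step there,
    # 3) map back and renormalize (exponentiated-gradient update)
    dual = np.log(p) - lr * grad_kl(p, target)
    p = np.exp(dual)
    p /= p.sum()

print(p)   # ≈ target: mirror descent converges without using curvature
```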
A data-dependent Bregman distance makes it possible to update the model parameters while taking the data itself into account.
The distance between models and the curvature of the objective function are important geometric concepts in machine learning.
Advances in information geometry and machine learning are interrelated.
Haebom
Worth referring to. 😇