Machine Learning Overview
Jan Byška, PA214 - Visualization II

Challenges with Data (4V's of Data)
• Volume: Dealing with large volumes of data.
• Velocity: Handling the speed at which data is generated.
• Variety: Managing different types of data (structured, unstructured, semi-structured).
• Veracity: Ensuring data quality and reliability.
By Maximilien Brice (CERN) – Wikimedia Commons, CC BY-SA 3.0
By Thomas Mc Cauley; Lucas Taylor - CMS Collaboration, CC BY-SA 4.0

Challenges with Data (5V's of Data)
• A fifth V is often added:
• Value: Extracting meaningful insights from data.

(Visual) Data Science

Introduction
• Machine Learning:
• "Field of study that gives computers the ability to learn without being explicitly programmed." (1959)
Arthur Samuel (pioneer in AI & ML)

Why ML?
• Chess is hard to solve with a hand-written computer program:
• ~10^40 legal positions; the number of game variations is estimated between 10^111 and 10^123
• cannot be brute-forced
• cannot be modeled
• cannot be visualized
• Best players rely on experience.
• Computers can obtain "experience" much faster.
Fabiano Caruana (Photo by: Soeren Stache)

Application Examples
• Autonomous cars/drones
Source: http://theoatmeal.com/blog/google_self_driving_car
• Black & White (2001): the avatar learns from the player (adaptation in games, imitation learning)
• ChatGPT
https://www.edureka.co/blog/how-to-become-a-machine-learning-engineer/

• The main problems solved by ML:
• classification
• clustering
• dimensionality reduction, embedding
• outlier detection
• prediction
• ...

Principles
• Machine learning: the data "tells" what the "good answers" are (training); no explicit commands are coded.
• The key point of ML is the training of the algorithm.
• Three main learning styles: supervised, unsupervised, semi-supervised.
https://www.mathworks.com

Learning Styles
• Supervised learning
• Labeled input
• Model prepared through training that requires predictions, corrected when wrong
• Problem examples: classification, regression
• Algorithmic examples: neural networks, Bayes classifiers, decision trees, ... (a minimal sketch follows)
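A minimal supervised-learning sketch in Python, assuming scikit-learn is available; the toy data, feature meanings, and labels are invented for illustration and are not from the slides.

```python
# Supervised learning in miniature: the model sees labeled examples
# (features X, labels y), is corrected during training, and then
# predicts labels for unseen inputs.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training set: [height_cm, weight_kg] -> 0 = cat, 1 = dog
X_train = [[25, 4], [30, 5], [55, 20], [60, 25], [28, 4], [50, 22]]
y_train = [0, 0, 1, 1, 0, 1]

model = DecisionTreeClassifier(max_depth=2)  # small tree, less overfitting
model.fit(X_train, y_train)                  # training on labeled input

print(model.predict([[27, 5], [58, 24]]))    # expected: [0 1] (cat, dog)
```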
Learning Styles
• Unsupervised learning
• Input not labeled, no known result
• Model is prepared by deducing structures in the data
• Problem examples: clustering, dimensionality reduction
• Algorithmic examples: a priori algorithm, k-means (see the k-means sketch after the semi-supervised slide)

Learning Styles
• Semi-supervised learning
• Input is a mixture of labeled and unlabeled data
• Model has to recognize structures and make predictions
• Problem examples: classification, regression
• Algorithmic examples: label propagation (adaptive learning; see the sketch below)
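A k-means sketch for the unsupervised slide above, assuming scikit-learn; the two synthetic 2-D blobs are invented for illustration.

```python
# Unsupervised learning: k-means groups unlabeled points by deducing
# structure in the data. No labels are given; cluster ids are arbitrary.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two blobs of 2-D points, no labels attached
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:5], kmeans.labels_[-5:])  # two distinct cluster ids
print(kmeans.cluster_centers_)                  # roughly (0,0) and (5,5)
```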
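And a label-propagation sketch for the semi-supervised slide, again assuming scikit-learn; the 1-D points are invented for illustration.

```python
# Semi-supervised learning: a few labeled points (y >= 0) spread their
# labels to nearby unlabeled points (marked with y == -1).
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, -1, -1, 1, -1, -1])   # -1 marks "no label"

model = LabelPropagation().fit(X, y)
print(model.transduction_)             # expected: [0 0 0 1 1 1]
```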
Principles
• The success of an ML algorithm is highly dependent on two key decisions: the data representation and the choice of the classifier.
• Data representation:
• What is the important information in the data?
• How simply can the data be represented?
• Is a basis change needed for a better representation?
• Classifier choice:
• The classifier makes the "decisions"; a badly suited classifier will make bad decisions.
• The choice of the classifier depends on the size, variance, and bias of the data, ...

Algorithm Families
• Group the ML algorithms based on their function
• There is no universal family structure; some algorithms can be placed into multiple groups
• The following examples are only a fraction of the existing algorithms

Regression Algorithms
• Modeling the relationship between variables
• Uses one (or multiple) independent variables
• Tries to explain or predict the outcome of the dependent variable
• Example: predict sales for a company based on weather, previous sales, GDP growth, etc.
• Iteratively refined using a measure of error in the prediction made by the model
• Examples:
• least squares regression
• linear regression
• step-wise regression
https://bookdown.org/dli/rguide/scatterplots-and-best-fit-lines-two-sets.html

Least Squares Regression
• Fit the line $y = mx + b$ to the data:

  X : 1  2  3  4  5  6  7   | Σx  = 28
  Y : 2  1  5  3  7  6  8   | Σy  = 32
  XY: 2  2 15 12 35 36 56   | Σxy = 158
  X²: 1  4  9 16 25 36 49   | Σx² = 140

• Slope and intercept (n = 7 data points):

  $m = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{7 \cdot 158 - 28 \cdot 32}{7 \cdot 140 - 28^2} = 1.07143$

  $b = \frac{\sum y - m \sum x}{n} = \frac{32 - 1.07143 \cdot 28}{7} = 0.28571$

• Resulting fit: $y = 1.07x + 0.29$ (a numeric check in code follows)
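A plain-Python check of the least-squares computation above, using only the slide's data and formulas.

```python
# Reproduces m = 1.07143 and b = 0.28571 from the tabulated sums.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 1, 5, 3, 7, 6, 8]
n  = len(xs)

sum_x  = sum(xs)                              # 28
sum_y  = sum(ys)                              # 32
sum_xy = sum(x * y for x, y in zip(xs, ys))   # 158
sum_x2 = sum(x * x for x in xs)               # 140

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n
print(f"y = {m:.5f}x + {b:.5f}")              # y = 1.07143x + 0.28571
```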
Instance-based Algorithms
• Instead of explicit generalization, compare new problems with instances seen in training
• Typically uses a database of example data
• Also called winner-take-all methods
• To reduce complexity and overfitting, instance reduction is used as preprocessing
• Examples:
• k-nearest neighbor (k-NN)
• kernel methods
• Face recognition example: construct feature vectors (color of the eyes, distance between them, size of the nose) and use k-NN to compare with a database.

Regularization Algorithms
• Rather an extension to other algorithms
• Introduces additional information to simplify models, reduce overfitting, and create a more general algorithm
• Examples:
• dropout regularization
• batch normalization
• early stopping
https://www.analyticsvidhya.com/...

Regularization Example: Predict Animal Character

  Name                Color   Species  Size    Character
  Ramses              black   cat      small   Unfriendly
  Snoop               brown   dog      medium  Friendly
  Boo                 green   snake    small   Friendly
  Lucilia             white   cat      medium  Unfriendly
  Chap                yellow  dog      big     Friendly
  Lis                 white   dog      medium  Friendly
  Napoleon the Third  orange  cat      small   Unfriendly
  Luke                brown   snail    small   Friendly
  Antonetta           black   cat      medium  Unfriendly

• Overfitted rule: "Pets with names shorter than 5 letters, that are not small (except for snakes and snails) and that are not white (except for dogs) are friendly."
• Regularized rule: "Cats are unfriendly."

Decision Tree Algorithms
• Construct a decision tree as a predictive model
• Finite target variable: classification trees
• Continuous target variable: regression trees
• Requires little data preparation
• Can handle numerical and categorical data
• Examples:
• CART (classification and regression trees)
• decision stump (components in ensembles)
• random forest (extension of bagging)

• Example tree (Machine Learning, Tom Mitchell):

  Outlook?
  ├─ Sunny    → Humidity?  High → No;  Normal → Yes
  ├─ Overcast → Yes
  └─ Rain     → Wind?      Weak → Yes; Strong → No

Recursive Partitioning
• Training data (Machine Learning, Tom Mitchell):

  Day  Outlook   Temperature  Humidity  Wind    Play Tennis
  D1   Sunny     Hot          High      Weak    No
  D2   Sunny     Hot          High      Strong  No
  D3   Overcast  Hot          High      Weak    Yes
  D4   Rain      Mild         High      Weak    Yes
  D5   Rain      Cool         Normal    Weak    Yes
  D6   Rain      Cool         Normal    Strong  No
  D7   Overcast  Cool         Normal    Strong  Yes
  D8   Sunny     Mild         High      Weak    No
  D9   Sunny     Cool         Normal    Weak    Yes
  D10  Rain      Mild         Normal    Weak    Yes
  D11  Sunny     Mild         Normal    Strong  Yes
  D12  Overcast  Mild         High      Strong  Yes
  D13  Overcast  Hot          Normal    Weak    Yes
  D14  Rain      Mild         High      Strong  No

• Step 1: split on Outlook. All Overcast days (D3, D7, D12, D13) are Yes, so that branch becomes a leaf.
• Step 2: the Rain subset (D4, D5, D6, D10, D14) is perfectly separated by Wind: Weak → Yes, Strong → No.
• Step 3: the Sunny subset (D1, D2, D8, D9, D11) is perfectly separated by Humidity: High → No, Normal → Yes.
• The result is the final tree shown above. (A sketch of the splitting criterion that selects Outlook as the root follows.)
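A sketch of why Outlook becomes the root, assuming the ID3-style entropy/information-gain criterion used in Mitchell's book (the slides show only the resulting tree); the data is copied from the table above.

```python
# Information gain for each attribute of the Play Tennis data.
from collections import Counter
from math import log2

# (Outlook, Temperature, Humidity, Wind, PlayTennis), days D1..D14
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(rows):
    """Shannon entropy of the Play Tennis labels (last column)."""
    counts = Counter(row[-1] for row in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def gain(rows, col):
    """Information gain of splitting `rows` on attribute column `col`."""
    remainder = 0.0
    for value in {row[col] for row in rows}:
        subset = [row for row in rows if row[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

for name, col in [("Outlook", 0), ("Temperature", 1), ("Humidity", 2), ("Wind", 3)]:
    print(f"{name:11s} gain = {gain(data, col):.3f}")
# Outlook wins (0.247 > 0.152 > 0.048 > 0.029), so it is chosen as the root;
# the same criterion is then applied recursively to the Sunny and Rain subsets.
```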
Bayesian Network
• Classification based on Bayes' theorem:

  $P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$

• Very fast, real-time prediction
• Explainable, and therefore used in medicine
• Simplistic assumption: the presence of a feature in a class is unrelated to the presence of any other feature
• A fruit is an apple if it is round, red, and 7 cm in diameter
• Cancer risk factors: tobacco use, alcohol, unhealthy diet, excess body weight, physical inactivity
• Examples:
• Gaussian naive Bayes (normal distribution of features)
• Bernoulli naive Bayes (binary features)
https://towardsdatascience.com/...

Neural Networks
Perceptron
• Inputs I1...I5 with weights W1...W5 are summed (Σ) and passed through a threshold (a code sketch appears at the end of this section):

  $y = \begin{cases} 1 & \text{if } \sum_i w_i x_i + b > 0 \\ 0 & \text{if } \sum_i w_i x_i + b \le 0 \end{cases}$

Multilayer Perceptron
• Inputs X1...X4 are mapped to outputs Y1...Y3 through one or more hidden layers

Deep Learning Algorithms
Source: mathworks.com

Personalized Sketch-Based Brushing in Scatterplots
• Predicting the user's brushing goal
• Average brushing preference
• Improve the brushing technique while using it

Recurrent Neural Network
• Internal memory to include the result from the previous classification
• Used in cases where the temporal domain is important
• Used for speech recognition
Bao, Wei, Jun Yue, and Yulei Rao. "A deep learning framework for financial time series using stacked autoencoders and long-short term memory."

Transformers
https://builtin.com/artificial-intelligence/transformer-neural-network

Generative Networks
Ian J. Goodfellow et al.
• Result progression: 2014 Goodfellow et al., 2015 Radford et al., 2016 Liu and Tuzel, 2017 Karras et al.
https://www.boredpanda.com/ai-fails/

Generative Modeling of Cell Shape Using 3D GANs
• Obtaining real data may be expensive
• Generating synthetic cellular specimens to produce suitable testing datasets
Wiesner, D., Nečasová, T., & Svoboda, D. (2019)

Explainable AI
Using a Model to Explain Another (see the surrogate-model sketch at the end)
Source: Hung-yi Lee

Explainable ML
• An ML explanation does not mean completely knowing how the ML model works
Bernard et al. 2018; Ward et al. 2010
http://juergen-bernard.de/
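Two short sketches close the section. First, the perceptron threshold rule from the neural-network slides above; the inputs, weights, and bias are invented for illustration.

```python
# Perceptron decision rule: fire (output 1) when the weighted sum of the
# inputs plus the bias exceeds zero, otherwise output 0.
def perceptron(inputs, weights, bias):
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s > 0 else 0

inputs  = [0.5, -1.0, 0.2, 0.0, 1.0]   # I1..I5
weights = [0.4,  0.3, 0.9, 0.1, -0.2]  # W1..W5
print(perceptron(inputs, weights, bias=-0.1))  # -> 0 (sum = -0.22 <= 0)
```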
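Second, for the "Using a Model to Explain Another" slide: a global-surrogate sketch, assuming scikit-learn; the black-box model, data, and feature names are invented for illustration, as the slides do not specify an implementation.

```python
# A small decision tree is fitted to the *predictions* of a black-box
# model, so the tree's readable rules approximate (not fully reveal)
# the black box's behaviour.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # hidden ground truth

black_box = RandomForestClassifier(random_state=0).fit(X, y)

# Surrogate: learn the black box's outputs, not the original labels
surrogate = DecisionTreeClassifier(max_depth=2, random_state=0)
surrogate.fit(X, black_box.predict(X))

print(export_text(surrogate, feature_names=["x0", "x1"]))
```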