Machine Learning Overview
Jan Byška, PA214 - Visualization II

Challenges with Data (4V's of Data)
• Volume: Dealing with large volumes of data.
• Velocity: Handling the speed at which data is generated.
• Variety: Managing different types of data (structured, unstructured, semi-structured).
• Veracity: Ensuring data quality and reliability.
By Maximilien Brice (CERN) – Wikimedia Commons, CC BY-SA 3.0
By Thomas Mc Cauley; Lucas Taylor - CMS Collaboration, CC BY-SA 4.0

Challenges with Data (5V's of Data)
• A fifth V is often added:
• Value: Extracting meaningful insights from data.

(Visual) Data Science

Introduction
• Machine Learning:
• "Field of study that gives computers the ability to learn without being explicitly programmed." (1959)
Arthur Samuel (pioneer in AI & ML)

Why ML?
• Chess is hard to solve with a hand-written computer program:
• ~10^40 legal positions; the number of game variations is estimated between 10^111 and 10^123
• cannot be brute-forced
• cannot be modeled
• cannot be visualized
• Best players rely on experience.
• Computers can obtain "experience" much faster.
Fabiano Caruana (Photo by: Soeren Stache)

Application Examples
• Autonomous cars/drones
Source: http://theoatmeal.com/blog/google_self_driving_car
• Black & White (2001): the avatar learns from the player (adaptation in games, imitation learning)
• ChatGPT
https://www.edureka.co/blog/how-to-become-a-machine-learning-engineer/

• The main problems solved by ML:
• classification
• clustering
• dimensionality reduction, embedding
• outlier detection
• prediction
• ...

Principles
• Machine learning: the data "tells" what the "good answers" are (training); no explicit commands are coded.
• The key point of ML is the training of the algorithm.
• Three main learning styles: supervised, unsupervised, semi-supervised.
https://www.mathworks.com

Learning Styles
• Supervised learning
• Labeled input
• Model prepared through training that requires predictions, corrected when wrong
• Problem examples: classification, regression
• Algorithmic examples: neural networks, Bayes classifiers, decision trees, ... (a minimal sketch follows)
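A minimal supervised-learning sketch in Python, assuming scikit-learn is available; the toy data, feature meanings, and labels are invented for illustration and are not from the slides.

```python
# Supervised learning in miniature: the model sees labeled examples
# (features X, labels y), is corrected during training, and then
# predicts labels for unseen inputs.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training set: [height_cm, weight_kg] -> 0 = cat, 1 = dog
X_train = [[25, 4], [30, 5], [55, 20], [60, 25], [28, 4], [50, 22]]
y_train = [0, 0, 1, 1, 0, 1]

model = DecisionTreeClassifier(max_depth=2)  # small tree, less overfitting
model.fit(X_train, y_train)                  # training on labeled input

print(model.predict([[27, 5], [58, 24]]))    # expected: [0 1] (cat, dog)
```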
Learning Styles
• Unsupervised learning
• Input not labeled, no known result
• Model is prepared by deducing structures in the data
• Problem examples: clustering, dimensionality reduction
• Algorithmic examples: a priori algorithm, k-means (see the k-means sketch after the semi-supervised slide)

Learning Styles
• Semi-supervised learning
• Input is a mixture of labeled and unlabeled data
• Model has to recognize structures and make predictions
• Problem examples: classification, regression
• Algorithmic examples: label propagation (adaptive learning; see the sketch below)
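A k-means sketch for the unsupervised slide above, assuming scikit-learn; the two synthetic 2-D blobs are invented for illustration.

```python
# Unsupervised learning: k-means groups unlabeled points by deducing
# structure in the data. No labels are given; cluster ids are arbitrary.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two blobs of 2-D points, no labels attached
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:5], kmeans.labels_[-5:])  # two distinct cluster ids
print(kmeans.cluster_centers_)                  # roughly (0,0) and (5,5)
```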
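And a label-propagation sketch for the semi-supervised slide, again assuming scikit-learn; the 1-D points are invented for illustration.

```python
# Semi-supervised learning: a few labeled points (y >= 0) spread their
# labels to nearby unlabeled points (marked with y == -1).
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, -1, -1, 1, -1, -1])   # -1 marks "no label"

model = LabelPropagation().fit(X, y)
print(model.transduction_)             # expected: [0 0 0 1 1 1]
```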
Principles
• The success of an ML algorithm is highly dependent on two key decisions: the data representation and the choice of the classifier.
• Data representation:
• What is the important information in the data?
• How simply can the data be represented?
• Is a basis change needed for a better representation?
• Classifier choice:
• The classifier makes the "decisions"; a badly suited classifier will make bad decisions.
• The choice of the classifier depends on the size, variance, and bias of the data, ...

Algorithm Families
• Group the ML algorithms based on their function
• There is no universal family structure; some algorithms can be placed into multiple groups
• The following examples are only a fraction of the existing algorithms

Regression Algorithms
• Modeling the relationship between variables
• Uses one (or multiple) independent variables
• Tries to explain or predict the outcome of the dependent variable
• Example: predict sales for a company based on weather, previous sales, GDP growth, etc.
• Iteratively refined using a measure of error in the prediction made by the model
• Examples:
• least squares regression
• linear regression
• step-wise regression
https://bookdown.org/dli/rguide/scatterplots-and-best-fit-lines-two-sets.html

Least Squares Regression
• Fit the line $y = mx + b$ to the data:

  X : 1  2  3  4  5  6  7   | Σx  = 28
  Y : 2  1  5  3  7  6  8   | Σy  = 32
  XY: 2  2 15 12 35 36 56   | Σxy = 158
  X²: 1  4  9 16 25 36 49   | Σx² = 140

• Slope and intercept (n = 7 data points):

  $m = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{7 \cdot 158 - 28 \cdot 32}{7 \cdot 140 - 28^2} = 1.07143$

  $b = \frac{\sum y - m \sum x}{n} = \frac{32 - 1.07143 \cdot 28}{7} = 0.28571$

• Resulting fit: $y = 1.07x + 0.29$ (a numeric check in code follows)
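A plain-Python check of the least-squares computation above, using only the slide's data and formulas.

```python
# Reproduces m = 1.07143 and b = 0.28571 from the tabulated sums.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 1, 5, 3, 7, 6, 8]
n  = len(xs)

sum_x  = sum(xs)                              # 28
sum_y  = sum(ys)                              # 32
sum_xy = sum(x * y for x, y in zip(xs, ys))   # 158
sum_x2 = sum(x * x for x in xs)               # 140

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n
print(f"y = {m:.5f}x + {b:.5f}")              # y = 1.07143x + 0.28571
```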
Instance-based Algorithms
• Instead of explicit generalization, compare new problems with instances seen in training
• Typically uses a database of example data
• Also called winner-take-all methods
• To reduce complexity and overfitting, instance reduction is used as preprocessing
• Examples:
• k-nearest neighbor (k-NN)
• kernel methods
• Face recognition example: construct feature vectors (color of the eyes, distance between them, size of the nose) and use k-NN to compare with a database.

Regularization Algorithms
• Rather an extension to other algorithms
• Introduces additional information to simplify models, reduce overfitting, and create a more general algorithm
• Examples:
• dropout regularization
• batch normalization
• early stopping
https://www.analyticsvidhya.com/...

Regularization Example: Predict Animal Character

  Name                Color   Species  Size    Character
  Ramses              black   cat      small   Unfriendly
  Snoop               brown   dog      medium  Friendly
  Boo                 green   snake    small   Friendly
  Lucilia             white   cat      medium  Unfriendly
  Chap                yellow  dog      big     Friendly
  Lis                 white   dog      medium  Friendly
  Napoleon the Third  orange  cat      small   Unfriendly
  Luke                brown   snail    small   Friendly
  Antonetta           black   cat      medium  Unfriendly

• Overfitted rule: "Pets with names shorter than 5 letters, that are not small (except for snakes and snails) and that are not white (except for dogs) are friendly."
• Regularized rule: "Cats are unfriendly."

Decision Tree Algorithms
• Construct a decision tree as a predictive model
• Finite target variable: classification trees
• Continuous target variable: regression trees
• Requires little data preparation
• Can handle numerical and categorical data
• Examples:
• CART (classification and regression trees)
• decision stump (components in ensembles)
• random forest (extension of bagging)

• Example tree (Machine Learning, Tom Mitchell):

  Outlook?
  ├─ Sunny    → Humidity?  High → No;  Normal → Yes
  ├─ Overcast → Yes
  └─ Rain     → Wind?      Weak → Yes; Strong → No

Recursive Partitioning
• Training data (Machine Learning, Tom Mitchell):

  Day  Outlook   Temperature  Humidity  Wind    Play Tennis
  D1   Sunny     Hot          High      Weak    No
  D2   Sunny     Hot          High      Strong  No
  D3   Overcast  Hot          High      Weak    Yes
  D4   Rain      Mild         High      Weak    Yes
  D5   Rain      Cool         Normal    Weak    Yes
  D6   Rain      Cool         Normal    Strong  No
  D7   Overcast  Cool         Normal    Strong  Yes
  D8   Sunny     Mild         High      Weak    No
  D9   Sunny     Cool         Normal    Weak    Yes
  D10  Rain      Mild         Normal    Weak    Yes
  D11  Sunny     Mild         Normal    Strong  Yes
  D12  Overcast  Mild         High      Strong  Yes
  D13  Overcast  Hot          Normal    Weak    Yes
  D14  Rain      Mild         High      Strong  No

• Step 1: split on Outlook. All Overcast days (D3, D7, D12, D13) are Yes, so that branch becomes a leaf.
• Step 2: the Rain subset (D4, D5, D6, D10, D14) is perfectly separated by Wind: Weak → Yes, Strong → No.
• Step 3: the Sunny subset (D1, D2, D8, D9, D11) is perfectly separated by Humidity: High → No, Normal → Yes.
• The result is the final tree shown above. (A sketch of the splitting criterion that selects Outlook as the root follows.)
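A sketch of why Outlook becomes the root, assuming the ID3-style entropy/information-gain criterion used in Mitchell's book (the slides show only the resulting tree); the data is copied from the table above.

```python
# Information gain for each attribute of the Play Tennis data.
from collections import Counter
from math import log2

# (Outlook, Temperature, Humidity, Wind, PlayTennis), days D1..D14
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(rows):
    """Shannon entropy of the Play Tennis labels (last column)."""
    counts = Counter(row[-1] for row in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def gain(rows, col):
    """Information gain of splitting `rows` on attribute column `col`."""
    remainder = 0.0
    for value in {row[col] for row in rows}:
        subset = [row for row in rows if row[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

for name, col in [("Outlook", 0), ("Temperature", 1), ("Humidity", 2), ("Wind", 3)]:
    print(f"{name:11s} gain = {gain(data, col):.3f}")
# Outlook wins (0.247 > 0.152 > 0.048 > 0.029), so it is chosen as the root;
# the same criterion is then applied recursively to the Sunny and Rain subsets.
```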
Bayesian Network
• Classification based on Bayes' theorem:

  $P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$

• Very fast, real-time prediction
• Explainable, and therefore used in medicine
• Simplistic assumption: the presence of a feature in a class is unrelated to the presence of any other feature
• A fruit is an apple if it is round, red, and 7 cm in diameter
• Cancer risk factors: tobacco use, alcohol, unhealthy diet, excess body weight, physical inactivity
• Examples:
• Gaussian naive Bayes (normal distribution of features)
• Bernoulli naive Bayes (binary features)
https://towardsdatascience.com/...

Neural Networks
Perceptron
• Inputs I1...I5 with weights W1...W5 are summed (Σ) and passed through a threshold (a code sketch appears at the end of this section):

  $y = \begin{cases} 1 & \text{if } \sum_i w_i x_i + b > 0 \\ 0 & \text{if } \sum_i w_i x_i + b \le 0 \end{cases}$

Multilayer Perceptron
• Inputs X1...X4 are mapped to outputs Y1...Y3 through one or more hidden layers

Deep Learning Algorithms
Source: mathworks.com

Personalized Sketch-Based Brushing in Scatterplots
• Predicting the user's brushing goal
• Average brushing preference
• Improve the brushing technique while using it

Recurrent Neural Network
• Internal memory to include the result from the previous classification
• Used in cases where the temporal domain is important
• Used for speech recognition
Bao, Wei, Jun Yue, and Yulei Rao. "A deep learning framework for financial time series using stacked autoencoders and long-short term memory."

Transformers
https://builtin.com/artificial-intelligence/transformer-neural-network

Generative Networks
Ian J. Goodfellow et al.
• Result progression: 2014 Goodfellow et al., 2015 Radford et al., 2016 Liu and Tuzel, 2017 Karras et al.
https://www.boredpanda.com/ai-fails/

Generative Modeling of Cell Shape Using 3D GANs
• Obtaining real data may be expensive
• Generating synthetic cellular specimens to produce suitable testing datasets
Wiesner, D., Nečasová, T., & Svoboda, D. (2019)

Explainable AI
Using a Model to Explain Another (see the surrogate-model sketch at the end)
Source: Hung-yi Lee

Explainable ML
• An ML explanation does not mean completely knowing how the ML model works
Bernard et al. 2018; Ward et al. 2010
http://juergen-bernard.de/
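Two short sketches close the section. First, the perceptron threshold rule from the neural-network slides above; the inputs, weights, and bias are invented for illustration.

```python
# Perceptron decision rule: fire (output 1) when the weighted sum of the
# inputs plus the bias exceeds zero, otherwise output 0.
def perceptron(inputs, weights, bias):
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s > 0 else 0

inputs  = [0.5, -1.0, 0.2, 0.0, 1.0]   # I1..I5
weights = [0.4,  0.3, 0.9, 0.1, -0.2]  # W1..W5
print(perceptron(inputs, weights, bias=-0.1))  # -> 0 (sum = -0.22 <= 0)
```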
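Second, for the "Using a Model to Explain Another" slide: a global-surrogate sketch, assuming scikit-learn; the black-box model, data, and feature names are invented for illustration, as the slides do not specify an implementation.

```python
# A small decision tree is fitted to the *predictions* of a black-box
# model, so the tree's readable rules approximate (not fully reveal)
# the black box's behaviour.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # hidden ground truth

black_box = RandomForestClassifier(random_state=0).fit(X, y)

# Surrogate: learn the black box's outputs, not the original labels
surrogate = DecisionTreeClassifier(max_depth=2, random_state=0)
surrogate.fit(X, black_box.predict(X))

print(export_text(surrogate, feature_names=["x0", "x1"]))
```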