Introduction to Regression
(linear, logistic, multivariate, nonlinear)
and a message at the end
Martin Sebera
Magdeburg, January 2024

Content
•I. Linear regression
•II. Logistic regression
•III. Multivariate Regression
•IV. Nonlinear regression - Neural net

What is regression?
•statistical method that helps us understand the relationship between variables.
•exploring how one variable affects another.
•In a sports setting, we can use linear regression to predict outcomes or analyze performances.
Let's demonstrate this with three examples:

History of regression
•The term regression comes from the works of anthropologist and meteorologist Francis Galton, which
he presented to the public between 1877 and 1885.
•Galton studied the question of heredity, specifically the relationship between the heights of
fathers and their first-born sons.
•The "return tendency" of the next generation towards the mean was called regression by Galton (he
originally called this phenomenon reversion, which he later changed to regression = a step back).
•Although the current concept of regression analysis has little in common with Galton's original
intention, the idea of working from empirical data has remained, and the term regression has become
so accepted that it is still used today.

Correlation
•Correlation - the mutual relationship between two variables.
•If there is a correlation between two variables, it is likely that they depend on each other, but
this does not mean that one of them must be the cause and the other the effect. Correlation alone
does not allow us to decide.
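As a minimal sketch of how the strength of a correlation is quantified, the following Python snippet computes Pearson's correlation coefficient r from its definition. The data are the distance (m) and shooting success (%) values from Example 3 in these slides; the function name pearson_r is only illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed directly from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Basketball data from Example 3: distance from the basket (m), shooting success (%)
distance_m = [0.9, 1.8, 2.7, 3.7, 4.6, 5.5, 6.4, 7.3, 8.2, 9.1, 12.0]
success_pct = [62, 52, 40, 32, 28, 24, 21, 20, 18, 17, 13]

r = pearson_r(distance_m, success_pct)
print(f"r = {r:.3f}")  # strongly negative: success falls as distance grows
```

A value of r close to -1 or +1 indicates a strong linear association, but, as noted above, it says nothing about which variable is the cause.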

Procedure
•Model design, where we choose the appropriate form of the regression function. If the theoretical
model is not known, we examine the scatter plot and the graph of conditional means.
•Estimation of regression parameters and tests of their significance.
•Regression diagnostics, where we analyze the residuals and identify influential points.
•Assessment of model quality. The result is either the acceptance of the proposed model or the
design of another model.

Regression procedure
•In practice, working with regression models is much more demanding.
•It is necessary to test many assumptions (normality, homogeneity of variances, multicollinearity),
choose an appropriate estimation method (least squares, maximum likelihood), test the residuals,
assess the quality of the model (residual variance, coefficient of determination, Akaike
information criterion, ROC curve, gain chart), etc.
•The following examples are meant as motivation; they are intended to show the possibilities of
regression.

I. Linear regression
•Linear regression analysis is used to predict the value of a variable based on the value of
another variable. The variable you want to predict is called the dependent variable. The variable
you are using to predict the other variable's value is called the independent variable.
•It models the relationship between the unknown (dependent) variable and the known (independent)
variable as a linear equation.
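Written out, the linear equation mentioned above usually takes the following form. The notation is chosen here for illustration and is not taken from the slides: y is the dependent variable, x the independent variable, \beta_0 and \beta_1 the intercept and slope, and \varepsilon the random error. The estimates shown are the standard least-squares ones.

```latex
\[
  y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
\]
\[
  \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                       {\sum_{i=1}^{n} (x_i - \bar{x})^2},
  \qquad
  \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]
```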

The most frequently used functions


Example 1
•dependence of performance in the long jump on performance in the 100 m run
•LJ = -0.98 * 100m + 17.94
•Geometrically speaking, the coefficient of the independent variable is the tangent of the angle
the line makes with the x-axis.
•arctg(-0.9796) ≈ -44.4°
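The equation and the angle above can be checked with a few lines of Python. This is only a sketch: it assumes the long jump is measured in metres and the 100 m time in seconds (the slide does not state the units), and the function name predict_long_jump is made up for illustration.

```python
import math

SLOPE = -0.9796     # coefficient of the 100 m time (rounded to -0.98 in the equation)
INTERCEPT = 17.94

def predict_long_jump(time_100m):
    """Predicted long jump for a given 100 m time (units assumed: m and s)."""
    return SLOPE * time_100m + INTERCEPT

# Angle between the regression line and the x-axis, in degrees
angle_deg = math.degrees(math.atan(SLOPE))

print(f"{predict_long_jump(11.0):.2f}")  # about 7.16
print(f"{angle_deg:.1f}")                # about -44.4
```

The angle has this geometric meaning only when both axes are drawn to the same scale; the slope itself does not depend on how the graph is drawn.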

Example 2
•Linear and quadratic regression models.
•The quadratic model fits slightly better (both models are very accurate, because R² → 1). The
quadratic model takes "fatigue" into account, i.e. the decrease in speed during the sprint.

Example 3 - Shooting success in basketball
•The greater the basketball player's horizontal distance from the basket, the lower the shooting
success percentage.
Distance (ft)   Distance (m)   Shooting success (%)
3               0.9            62
6               1.8            52
9               2.7            40
12              3.7            32
15              4.6            28
18              5.5            24
21              6.4            21
24              7.3            20
27              8.2            18
30              9.1            17
40              12             13

Example 3 - Shooting success in basketball
•However, the best regression model will be a power model. Why? See how the curve would behave
with further distance…
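The question on this slide can be explored with a short NumPy sketch that fits both a straight line and a power model to the basketball data and extrapolates them to larger distances. The power model is fitted here as a straight line in log-log space, which is one common approach and an assumption of this sketch; other software may estimate it differently. The function names are illustrative.

```python
import numpy as np

# Basketball data from the table: distance (m) and shooting success (%)
distance_m = np.array([0.9, 1.8, 2.7, 3.7, 4.6, 5.5, 6.4, 7.3, 8.2, 9.1, 12.0])
success_pct = np.array([62, 52, 40, 32, 28, 24, 21, 20, 18, 17, 13], dtype=float)

# Linear model: success = b1 * distance + b0
b1, b0 = np.polyfit(distance_m, success_pct, 1)

# Power model: success = a * distance**b, fitted as a line in log-log space
b, log_a = np.polyfit(np.log(distance_m), np.log(success_pct), 1)
a = np.exp(log_a)

def linear_model(d):
    return b1 * d + b0

def power_model(d):
    return a * d ** b

def r_squared(observed, predicted):
    """Coefficient of determination on the original (percentage) scale."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print("R2 linear:", r_squared(success_pct, linear_model(distance_m)))
print("R2 power: ", r_squared(success_pct, power_model(distance_m)))

# Extrapolation beyond the measured range
for d in (15.0, 20.0):
    print(d, "m -> linear:", linear_model(d), " power:", power_model(d))
```

Beyond the measured range the straight line eventually predicts a negative success rate, which is impossible, whereas the power curve approaches zero and stays positive. This is the kind of logical constraint the choice of regression function has to respect.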

Example 3 - Shooting success in basketball in SPSS
•The chosen type of regression function must first of all respect the logical and objective
connections of the phenomena and their regularities.
•SPSS → Analyze → Regression → Curve estimation



Example 4 – height, weight
•We take the measurements of the people in the class and try to estimate the shape of the
regression curve.


II. Logistic regression
•Logistic regression is a statistical method used to analyze data where the dependent variable is
categorical, usually binary (i.e. it has two possible values, such as yes/no, success/failure, 0/1).
•The main goal of logistic regression is to model the probability that a given input sample belongs
to one of two categories.
•Example:
–Pollard, R., & Reep, C. (1997). Measuring the Effectiveness of Playing Strategies at
Soccer. Journal of the Royal Statistical Society. Series D (The Statistician), 46(4), 541–550.

Logistic regression - example
•Pollard and Reep (1997) used logistic regression to investigate the effectiveness of different
strategies in soccer and their effect on the probability of scoring a goal. They wanted to discover
how certain characteristics of the situation affect the probability of a goal being scored. As the
basic event (dependent variable), they chose a situation that ended with a shot on goal. There were
489 of these events. They also identified the characteristics of the situations that could affect
the outcome of the event. They chose as predictors:
•distance from the goal in meters (DIST);
•the angle (ANGLE) to the nearest goal post;
•the number of touches the player had on the ball before shooting: one (TOUCH = 0), more than one
(TOUCH = 1);
•distance to the closest opponent: less than one meter (TIGHTNESS = 0), more than one meter
(TIGHTNESS = 1);
•how possession of the ball was gained: from open play (GAIN = 0), from a free kick or throw-in
(GAIN = 1).

Logistic regression - example
•The available information made it possible to complete these variables for all 489 events. Headers
and kicked shots were analyzed separately. For the 410 kicked shots, the following regression
equation was found:
•Ln(goal chance) = 1.245 - 0.219 DIST - 1.578 ANGLE + 0.947 TIGHTNESS - 1.069 GAIN
•This formula allows you to calculate the probability of scoring a goal in different situations.
•For example, let's assume a kick from 15 meters (DIST=15) directly in front of the goal (ANGLE=0)
with an opponent less than one meter away (TIGHTNESS=0) when the player got to the ball after a
free kick (GAIN=1). The value of Ln(goal chance) = y = 1.245 - 0.219*15 - 1.069 = -3.109. The
probability is calculated according to the formula p = e^y / (1 + e^y), which gives p ≈ 0.043.
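The worked example can be reproduced with a short Python sketch. The coefficients are those of the equation above; the function names are made up for illustration. Note that, following the coding in the predictor list, a free kick corresponds to GAIN = 1, which gives a negative value of y.

```python
import math

INTERCEPT = 1.245
COEF = {"DIST": -0.219, "ANGLE": -1.578, "TIGHTNESS": 0.947, "GAIN": -1.069}

def log_odds(dist, angle, tightness, gain):
    """Ln(goal chance): the linear predictor of the logistic model."""
    return (INTERCEPT
            + COEF["DIST"] * dist
            + COEF["ANGLE"] * angle
            + COEF["TIGHTNESS"] * tightness
            + COEF["GAIN"] * gain)

def goal_probability(dist, angle, tightness, gain):
    """Logistic transform p = e^y / (1 + e^y), written as 1 / (1 + e^(-y))."""
    y = log_odds(dist, angle, tightness, gain)
    return 1.0 / (1.0 + math.exp(-y))

# Kick from 15 m, straight in front of goal, opponent closer than 1 m, after a free kick
y = log_odds(15, 0, 0, 1)
p = goal_probability(15, 0, 0, 1)
print(f"y = {y:.3f}")  # -3.109
print(f"p = {p:.3f}")  # about 0.043
```

In other words, under this model roughly 4 out of 100 such shots would be expected to end in a goal.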

Logistic regression - example
Pollard, R., & Reep, C. (1997). Measuring the Effectiveness of Playing Strategies at
Soccer. Journal of the Royal Statistical Society. Series D (The Statistician), 46(4), 541–550.

III. Multivariate Regression
•Multivariate regression is a method used to measure the degree to which more than one independent
variable (predictors) and more than one dependent variable (responses) are linearly related.

Multivariate Regression - example
•For twenty selected households, data on quarterly expenditure on food and beverages (y), quarterly
household income (x1), number of children (x2), average age of earning household members (x3) and
number of household members (x4) were obtained.
•Decide which variables contribute significantly to explaining the variability in quarterly
spending values.
•Try to guess: which independent variables will they be?

Multivariate Regression - example
x1      x2   x3     x4   y
11172   0    55     1    3464
8868    0    21     1    1982
17414   0    49     1    3228
10730   0    22     1    3034
24110   0    62.5   2    10146
38530   0    57     2    8202
22902   0    54.5   2    9332
25448   0    57.5   2    7096
20326   0    28     2    6248
39186   1    38.5   3    13816
28758   1    45.5   3    10328
33658   1    28.5   3    4786
24272   1    36     3    9710
30386   2    35     4    10778
31750   2    30.5   4    10568
39456   2    32.5   4    14260
48458   2    38     4    10934
37990   2    37     4    6388
24920   2    33.5   4    8584
40064   3    47     5    16950

Multivariate Regression – example
•y = - 4027 + 0.042063 x1 - 1348.3 x2 + 84.188 x3 + 3353.4 x4
•The adjusted coefficient of determination, which takes into account the number of independent
variables, is R² = 0.629, and the residual standard deviation is se = 2448.5.
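A model of this kind can be fitted by ordinary least squares in a few lines of NumPy. The sketch below uses the twenty households from the table; it is not necessarily how the equation on the slide was obtained, and the variable names are illustrative. With a single dependent variable y, this setup is usually called multiple regression. The printed values can be compared with the coefficients, adjusted R² and se reported above.

```python
import numpy as np

# Columns: x1 income, x2 children, x3 average age of earners, x4 members, y expenditure
data = np.array([
    [11172, 0, 55.0, 1, 3464],  [8868, 0, 21.0, 1, 1982],
    [17414, 0, 49.0, 1, 3228],  [10730, 0, 22.0, 1, 3034],
    [24110, 0, 62.5, 2, 10146], [38530, 0, 57.0, 2, 8202],
    [22902, 0, 54.5, 2, 9332],  [25448, 0, 57.5, 2, 7096],
    [20326, 0, 28.0, 2, 6248],  [39186, 1, 38.5, 3, 13816],
    [28758, 1, 45.5, 3, 10328], [33658, 1, 28.5, 3, 4786],
    [24272, 1, 36.0, 3, 9710],  [30386, 2, 35.0, 4, 10778],
    [31750, 2, 30.5, 4, 10568], [39456, 2, 32.5, 4, 14260],
    [48458, 2, 38.0, 4, 10934], [37990, 2, 37.0, 4, 6388],
    [24920, 2, 33.5, 4, 8584],  [40064, 3, 47.0, 5, 16950],
])

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(len(data)), data[:, :4]])
y = data[:, 4]

# Ordinary least squares estimate of [intercept, b1, b2, b3, b4]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

n, k = X.shape                      # k counts the intercept as a parameter
ss_res = resid @ resid
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
se = np.sqrt(ss_res / (n - k))      # residual standard deviation

print("coefficients:", coef)
print("R2:", r2, " adjusted R2:", adj_r2, " se:", se)
```

Deciding which variables contribute significantly would additionally require the standard errors and t-tests of the individual coefficients, which this sketch does not compute.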

IV. Nonlinear regression - Neural net
•Assumptions: normality of data, homogeneity of variances (→ parametric vs. nonparametric
methods); nominal, ordinal and categorical variables cannot be combined in one model.
•Often these conditions are not met.
•Many inputs generate an output that is a nonlinear function of the weighted sum of these inputs.
•The weights assigned to each of the inputs are obtained on the basis of a learning process, where
the generated outputs are compared with the so-called target outputs.
•The obtained deviations between the known values and the obtained outputs serve as feedback for
the adjustment of the weights.
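The learning process described above can be sketched for a single neuron in plain Python. Everything concrete here is an assumption made for illustration: the two input values, the target, the sigmoid activation, the squared-error loss and the learning rate are not taken from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

inputs = [1.0, 2.0]    # hypothetical input values
target = 1.0           # hypothetical target output
weights = [0.0, 0.0]   # starting weights
bias = 0.0
learning_rate = 0.5    # hypothetical step size

def forward(weights, bias):
    """Output = nonlinear function of the weighted sum of the inputs."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def loss(output):
    """Squared deviation between the generated output and the target."""
    return 0.5 * (output - target) ** 2

out_before = forward(weights, bias)

# Feedback: gradient of the loss with respect to the weighted sum
delta = (out_before - target) * out_before * (1.0 - out_before)

# Adjust each weight against its gradient (one gradient-descent step)
weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
bias = bias - learning_rate * delta

out_after = forward(weights, bias)
print(out_before, loss(out_before))
print(out_after, loss(out_after))  # the output moves towards the target
```

Repeating this step many times over many training examples is, in essence, what training a network means.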

Nonlinear regression - Neural net
•NN is a method in artificial intelligence
•NN teaches computers to process data in a way that is inspired by the human brain.
•It is a type of machine learning process, called deep learning, that uses interconnected nodes or
neurons in a layered structure that resembles the human brain.
•In other words, it is a very complex regression, where I have one dependent and many independent
variables.

Nonlinear regression - Neural net
•Multilayer Perceptron (MLP): a class of feed-forward neural networks
•3 types of layers: the input layer, the hidden layer and the output layer
•Activation function: defines how the weighted sum of the inputs is transformed into an output
from a node or nodes in a layer of the network
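The forward pass of such a network can be sketched in NumPy. The layer sizes follow the MLP 30-11-1 architecture of the overtraining example in these slides; the random weights, the tanh and sigmoid activation functions and the random input vector are assumptions chosen purely for illustration, since an untrained network like this predicts nothing meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 30 inputs, 11 hidden neurons, 1 output neuron
n_inputs, n_hidden, n_outputs = 30, 11, 1

# Randomly initialised weights and zero biases (hypothetical, untrained)
W1 = rng.normal(scale=0.1, size=(n_hidden, n_inputs))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_outputs, n_hidden))
b2 = np.zeros(n_outputs)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x):
    """Input layer -> hidden layer (tanh) -> output layer (sigmoid)."""
    hidden = np.tanh(W1 @ x + b1)      # activation applied to the weighted sums
    return sigmoid(W2 @ hidden + b2)   # output squashed into the interval (0, 1)

x = rng.normal(size=n_inputs)          # one hypothetical observation with 30 predictors
output = mlp_forward(x)
print(output.shape, output)
```

The sigmoid on the output neuron keeps the result between 0 and 1, so it can be read as a probability-like score, much like the output of logistic regression.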

Example - Overtraining
•Bernacikova, M., Kumstat, M., Buresova, I., Kapounkova, K., Struhar, I., ☺ Sebera, M., & Paludo,
A. C. (2022). Preventing chronic fatigue in Czech young athletes: The features description of the
“SmartTraining” mobile application. FRONTIERS IN PHYSIOLOGY, 13, 919982.
https://doi.org/10.3389/fphys.2022.919982


Example - Overtraining
•How can variables that are numerical, categorical, nominal and ordinal be combined in one
regression model?
•The assumptions of data normality, homogeneity of variances, etc. are not met.

Example - Overtraining
Figure 1 – A Simple Neural Network
MLP 30-11-1
A neural network with 30 inputs, one hidden layer with 11 hidden neurons and 1 output neuron.


The most important predictors of overtraining
•Amount of regeneration (active regeneration)
•Sleep (passive regeneration)
•Number of tournaments/races per year
•Type of sport
•…

CONCLUSION


How to fight disinformation and conspiracy?


•Verifying claims and sources:
–This involves analyzing data sets that can confirm or disprove certain claims.
•Recognizing and detecting data manipulation:
–Statistics offers tools for identifying unusual or improbable patterns in data, which may signal
an attempt at disinformation.
•Using predictive regression models:
–Statistical modeling and machine learning can help predict the spread of misinformation and
identify potential new misinformation before it spreads.
•Statistics education initiatives:
–These teach the public how to interpret data and statistical results. This helps people better
understand how data is used to support different arguments.
•Promoting data transparency and openness:
–This allows the public to verify information and conduct independent analysis.