Network Defence Strategy Evaluation: Simulation vs. Live Network

Jana Medková∗†, Martin Husák∗†, Martin Drašar∗
∗Institute of Computer Science, †Faculty of Informatics
Masaryk University, Brno, Czech Republic
{medkova, husakm, drasar}@ics.muni.cz

Abstract—A lot of research has been dedicated to finding an optimal strategy to defend network infrastructure. The proposed methods are usually evaluated using simulations, replayed attacks, or testbed environments. However, these evaluation methods may give biased results, because in real life, attackers can follow a suboptimal strategy or react to a defence in an unexpected way. In this paper, we use a network of honeypots as a testing environment for evaluating network defence strategies. The honeypot network provides the opportunity to test a defence strategy against real attackers and is not as time- and resource-consuming as using white hat hackers. In our experiment, we use two different strategies to defend a group of honeypots in a live network and compare the results to the results of a simulation with replayed attacks. We show that the results of the strategies in the simulation differ significantly from the results on the honeypot network, which implies that simulations are not sufficient for strategy evaluation. We also investigate how the attacker adapts to the responses taken by a defence strategy and how this change in behaviour affects the evaluation results.

I. INTRODUCTION

A lot of research has been dedicated to finding an optimal strategy to defend network infrastructure [1]. There are many theoretical approaches, each having its advantages and disadvantages. Although many of the proposed methods work well in theory, there are too few cases of them being evaluated and verified in practice.

Network defence strategies are usually evaluated in a simulated environment. Deployment in a real environment requires a lot of effort to set up and is risky, because the test network could be abused by the attacker. Internal evaluation with a white hat hacker would mitigate the risk; however, it is even more time-consuming, and getting a significant number of results for a detailed analysis is almost impossible. Simulated scenarios are easily set up and do not pose any risks. A defence strategy is tested against selected attacks and the efficiency of the defence strategy's response is evaluated. The attacks are usually based on attacks from public datasets, attack vectors or, sometimes, an optimal attack strategy obtained as a by-product of defence strategy synthesis.

The evaluation in a simulated environment is extremely useful for the initial evaluation of a defence strategy and has a lot of advantages. It is easy to set up, the results are available almost immediately, the datasets provide a wide variety of labelled attacks, and a simulated environment allows simulation parameters to be varied, which makes it easy to test different aspects of the strategy. However, there are a couple of potential drawbacks. The first drawback is that the attacks in the datasets may be outdated and may not contain the latest attacks. The second drawback is that the simulated scenarios probably do not consider changes in an attacker's behaviour in response to the actions taken by the strategy. In reality, an attacker may adapt to the defence and change their actions in an unexpected way. This could significantly affect the evaluation's results, and using replayed attacks would stop being a sufficient evaluation option.
Implementing an attacker's logic in the simulated attacks, based on attack vectors, is very demanding and prone to incompleteness.

In this paper, we focus on the consequences of the second drawback. We investigate whether the evaluation results are significantly affected by the attacker's change in behaviour. We show that the evaluation gives different results in a simulated environment than on a live network facing real attackers. To formalize the scope of our work, we pose two research questions which we shall answer:

1) What are the differences between defence strategy evaluation in simulated and real environments?
2) Does the attacker change his behaviour based on the defender's actions?

In our experiment, we use a honeypot network to evaluate two response strategies in a live network against real attackers. Then, we use the same setup in a simulated scenario with replayed attacks, which were observed on the honeypot network. We found significant differences between the results in the real and simulated environments.

The benefit of using the honeypot network is that the network is attacked by a wide variety of attackers with no risk of actual damage being done. Therefore, we gain a high number of black hat hackers working towards our goal with minimal effort and resources required on our side. The honeypots also provide very detailed logging, because their purpose is to analyse attackers' actions. Therefore, estimating the impact of an attack on a honeypot is much easier than on a regular host. Moreover, we can be sure the traffic consists solely of attacks, since legitimate users would never access the honeypot. The disadvantage is that attackers sooner or later recognize they are interacting with a honeypot. Thus, we cannot test the response strategy against multi-stage attacks.

This paper is organised into five sections. Related work is surveyed in Section II. We describe the methodology of the experiment and the evaluated strategies in Section III. The results of the evaluation of the strategies are shown in Section IV. The paper is concluded in Section V.

II. RELATED WORK

We are unaware of a paper that would focus solely on the topic of strategy evaluation; however, the area is touched upon in almost every paper on Intrusion Response Systems (IRSs). Therefore, we mention the methodology of strategy evaluation in those papers. We took a list of the latest intrusion response system papers from [1] and went through the techniques used for evaluating network-based IRSs. The evaluation techniques can be classified by the environment used, the source of attacks, and the evaluated aspects of the strategy, such as performance against the attack (how well the strategy defends the network), sensitivity (how much the results depend on inputs) and scalability (what size of network the strategy could defend).

The most basic evaluation used is the verification of the strategy's decision logic. The evaluation is only theoretical, as the strategy is not implemented in any environment. Several attack scenarios are considered based on existing attack vectors. The authors show which actions are chosen by the strategy in each scenario and argue that those actions are the most advantageous in the situation [2], [3], so the only evaluated aspect of the strategy is performance. This type of evaluation is very limited and is not sufficient. It is also used by Kanoun et al. [4]; however, the authors are preparing for a more extensive evaluation in a live deployment.
Most of the strategies are evaluated in a simulated environment. In the simulation scenarios, the sources of the attacks are usually either replayed attacks [5], [6], [7] or simulated attacks based on attack vectors [8], [9]. All of the works evaluate only the strategy's performance, except for Strasburg et al. [6], who evaluate the performance and scalability of the strategy, and Wang et al. [9], who vary simulation parameters to test the performance, sensitivity and scalability. The replayed attacks come from various public DARPA datasets. The simulated attacks, based on attack vectors, cover only a few scenarios.

The most realistic way to evaluate a defence strategy is in a real environment [10], [11]. The strategies are tested in a small network setup with detection tools. The attacks against the network are based on attack vectors and are performed in-house. Because of the effort required, only a few different attacks are used. Both works evaluate not only the strategy's performance, but also vary the setup to test other aspects of the strategy.

Apart from intrusion response systems, the evaluation of intrusion detection systems (IDSs) is also a much-investigated topic [12], [13]. The evaluation of IDSs also faces many issues [14]; however, those issues are quite different. Intrusion detection evaluation suffers from a lack of labelled, up-to-date datasets, problems with the description and generation of "normal" traffic, and a lack of zero-day attack data.

TABLE I: Defended honeypots.

  IP address      Domain name   Username   Password
  xxx.xxx.xxx.2   test          admin      password
  xxx.xxx.xxx.3   -             root       raspberry
  xxx.xxx.xxx.4   www           root       123123
  xxx.xxx.xxx.5   mail          admin      admin
  xxx.xxx.xxx.6   db            root       password1

III. METHODOLOGY

This section describes the experimental environment, strategies, and evaluation process. First, we describe the network of honeypots used in the experiment and the process of attack detection and response. Then, we present the evaluated strategies, including their theoretical background and parameters given by the experimental environment. Finally, a simulation run and the process of strategy evaluation are described.

A. Experimental Environment

We used five high-interaction honeypots in our experiment, the list of which can be seen in Table I. The honeypots were chosen because they have no production value and there is a certainty that any incoming traffic is, by nature, suspicious [15]. Thus, we did not have to take false positive detections into consideration and the production environment was spared from changes in configuration. The honeypots were part of an existing honeynet, which has been running for several years [16]. The historical records of attacks against the honeynet were used in the simulation and for cost estimation.

The honeypots were placed in the same subnet and were assigned consecutive IP addresses. Thus, we were able to observe attackers who targeted several victims at the same time or one by one. The honeypots were also assigned a domain name to lure potential attackers. The role of each honeypot corresponds to its domain name (www, mail, db). Each honeypot ran an SSH server and responded to authentication attempts. The usernames and passwords were established for each service so that the attackers could successfully guess the combination (see Table I). The probability of a successful attack was based on historical records of authentication attempts against the honeynet.
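As an illustration, the environment in Table I can be captured as a small configuration structure consumed by the scripts described in the following subsections. The structure and names below are only a sketch of one possible representation, not our actual implementation.

    # Defended honeypots from Table I as a configuration structure (illustrative only).
    HONEYPOTS = {
        "xxx.xxx.xxx.2": {"domain": "test", "username": "admin", "password": "password"},
        "xxx.xxx.xxx.3": {"domain": None,   "username": "root",  "password": "raspberry"},
        "xxx.xxx.xxx.4": {"domain": "www",  "username": "root",  "password": "123123"},
        "xxx.xxx.xxx.5": {"domain": "mail", "username": "admin", "password": "admin"},
        "xxx.xxx.xxx.6": {"domain": "db",   "username": "root",  "password": "password1"},
    }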
B. Attack Detection and Response

The honeypots were deployed in a honeynet with central logging mechanisms and a database of authentication attempts. Each honeypot captured the credentials used in an authentication attempt and sent them to the central database. The database record consists of a timestamp, the attacker's IP address, the honeypot's IP address, a username, and a password. These records were used as a detection tool for the defence strategy. The attacker is identified by a single IP address and an attack is detected if at least one attempt from that IP address is observed in the database.

The network traffic of the honeynet flowed through a gate that was able to manipulate the traffic, e.g., by blocking access to certain services for certain hosts. Every minute, a script went through the new records in the database and computed an optimal strategy for the current situation. The experiment had several phases using different strategies, which are discussed later. If a strategy resulted in restricting the attacker's access to a honeypot, the honeynet gate was commanded to block the attacker from accessing that specific honeypot. The blocking was implemented using a firewall rule that dropped the traffic only between the attacker and one of the honeypots. Thus, restricting the attacker's access to one of the targets did not influence the attacker's access to other targets unless the interaction between the attacker and the other target was also found to be malicious. The restrictions could also be lifted if the strategy recognized the rule as no longer needed.

TABLE II: Values of services for attacker and defender.

  Target          Value for attacker   Value for defender   Unavailability cost
  xxx.xxx.xxx.2   0.03                 500                  1
  xxx.xxx.xxx.3   0.29                 500                  0
  xxx.xxx.xxx.4   0.02                 2000                 10
  xxx.xxx.xxx.5   0.38                 3000                 2
  xxx.xxx.xxx.6   0.28                 4000                 5

C. Strategies

We implemented two different strategies, which were evaluated on the network of honeypots. We chose the two concepts that are most common in intrusion response systems: game theory and cost analysis. The requirements placed on the defence are as follows:

• Each service has a value for the defender. The value indicates the damage the defender suffers when the service is successfully attacked. The values of the services are listed in Table II.
• Each service should be available all the time. For each minute the service is not available, the defender suffers a loss. This unavailability penalty is also listed in Table II for each service.
• Frequent reconfiguration of iptables and a high number of iptables rules are undesirable, so for each added or removed iptables rule, the defender receives a reconfiguration penalty. The penalty was set to 10.

The value and the cost of unavailability for each service were established prior to the start of the experiment. We based these values on the domain names of the honeypots. We treat the honeypot with the domain name www as a web server; therefore, it has the highest cost of unavailability and a smaller value. On the other hand, the honeypot with the domain name db is treated as a database server; therefore, it has the highest value and a lower unavailability cost. The honeypot with the domain name mail has a medium value and unavailability cost. The two remaining honeypots have both a low value and a low unavailability cost; the one with the domain name test is treated as a test web server, the other as a workstation.
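As an illustration of the response mechanics and penalty parameters, the sketch below re-expresses Table II as data and shows how a per-attacker, per-honeypot block could be added or removed on a Linux-based gate. It is a minimal sketch under simplifying assumptions; the gate interface is abstracted and the helper names are hypothetical, not our actual implementation.

    import subprocess

    SERVICES = {
        # honeypot IP: (value for attacker, value for defender, unavailability cost per minute)
        "xxx.xxx.xxx.2": (0.03, 500, 1),
        "xxx.xxx.xxx.3": (0.29, 500, 0),
        "xxx.xxx.xxx.4": (0.02, 2000, 10),
        "xxx.xxx.xxx.5": (0.38, 3000, 2),
        "xxx.xxx.xxx.6": (0.28, 4000, 5),
    }
    RECONFIGURATION_PENALTY = 10  # per added or removed firewall rule

    def set_block(attacker_ip, honeypot_ip, enable):
        """Add (enable=True) or remove (enable=False) a rule dropping traffic
        between one attacker and one honeypot, leaving other targets reachable."""
        action = "-I" if enable else "-D"
        subprocess.run(["iptables", action, "FORWARD",
                        "-s", attacker_ip, "-d", honeypot_ip, "-j", "DROP"],
                       check=True)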
1) Game theory: In the first scenario, we used a game-theory based strategy. Game theory takes into account both the attacker's and the defender's goals. We used the Nash equilibria to find the optimal defender's strategy. We modelled the interaction between the attacker and the defender as a finite, non-zero-sum, two-player game in extensive form. Due to the computational complexity, we were forced to model an attack against each service as a separate game and to limit the number of rounds to five, which corresponds to a five-minute interval. Since the upper quartile of attack lengths was less than five minutes, we accepted this limitation.

The two players (attacker and defender) select an action in turns. One game round is depicted in Figure 1. The actions available to the attacker are to leave the game, gaining nothing, or to attack the target. The actions available to the defender are to block the attacker from the target by reconfiguring iptables or not to block the attacker from the target. If the attacker chooses to attack and the defender chooses not to block, a chance node is reached. In the chance node, the game ends with probability ps = 0.22 (a successful attack) and continues with probability 1 − ps = 0.78. The probability ps was estimated as a weighted average of the probabilities of success for each service, based on the observed frequency of the combination of username and password and the average number of attempts per minute.

Fig. 1: One game round (the n-th round with history h). The A nodes mark the attacker's moves, the D node marks the defender's move, and the C node marks a chance move. The values in brackets denote the attacker's and defender's rewards in terminal nodes: (0, c(h)) if the attacker leaves and (β^n · ua(s), −β^n · ci(s) + c(h)) if the attack succeeds. c(h) is the accumulated discounted defender's reward based on history h, ps = 0.22 is the probability of attack success, β is the discount factor, ci(s) is the value of the service for the defender, and ua(s) is the value of the service for the attacker.

The reward in the terminal node is discounted by a factor of β = 0.9 for each round and is computed as follows:

• If the game ends because of a successful attack, the attacker gains the discounted value of the service, based on his utility function. The defender gains the discounted penalty for the lost service and the discounted penalties for reconfiguration and unavailability based on the history of the final state.
• If the attacker leaves, he gains nothing and the defender gains the discounted penalties for reconfiguration and unavailability based on the history of the final state.
• If the game reaches the final state after five rounds, the attacker gains nothing and the defender gains the discounted penalties for reconfiguration and unavailability based on the history of the final state.

The discount factor expresses that a gain or loss further in the future has a lower value than the same gain or loss received sooner. The values of each service for the attacker are listed in Table II. The value of a service for the attacker is the reward an attacker receives upon a successful attack against the service. Given that the difficulty of attacking each target is exactly the same and assuming the attackers choose their targets based on the subjective target value, we based the value for the attacker on the popularity of each target before the experiment. We found the mixed Nash equilibria of the game using the Gambit tool [17] and implemented a strategy based on the equilibrium strategy.
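The equilibria themselves were computed with Gambit. As a rough, self-contained illustration of the underlying reasoning, the sketch below performs backward induction on a deliberately simplified per-service model in which the defender commits to blocking or not blocking in each round and the attacker best-responds; it is not the exact extensive-form model we solved with Gambit, and the helper names are hypothetical.

    BETA = 0.9     # discount factor
    P_S = 0.22     # probability that one round of attacking succeeds
    RECONF = 10    # penalty for adding or removing a firewall rule
    ROUNDS = 5     # horizon in rounds (minutes)

    def game_value(n, blocked, u_att, c_def, c_avail):
        """Return (attacker value, defender value) of the simplified subgame
        starting at round n with the given blocking state."""
        if n == ROUNDS:
            return 0.0, 0.0
        disc = BETA ** n
        options = []
        for block in (True, False):
            reconf = RECONF if block != blocked else 0.0
            if block:
                # the attacker cannot succeed this round; the defender pays unavailability
                a_next, d_next = game_value(n + 1, True, u_att, c_def, c_avail)
                a_val = a_next
                d_val = -disc * (reconf + c_avail) + d_next
            else:
                a_next, d_next = game_value(n + 1, False, u_att, c_def, c_avail)
                # the attacker attacks only if the expected discounted reward is positive
                a_attack = P_S * disc * u_att + (1 - P_S) * a_next
                if a_attack > 0:
                    a_val = a_attack
                    d_val = -disc * reconf - P_S * disc * c_def + (1 - P_S) * d_next
                else:
                    a_val, d_val = 0.0, -disc * reconf
            options.append((d_val, a_val, block))
        d_val, a_val, _ = max(options)  # the defender picks the action maximising its value
        return a_val, d_val

    # e.g., the honeypot with the domain name mail: u_att = 0.38, c_def = 3000, c_avail = 2
    print(game_value(0, False, u_att=0.38, c_def=3000, c_avail=2))

In this simplification, blocking dominates for all parameter values in Table II, whereas the full extensive-form game yields mixed equilibria.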
2) Cost sensitive: The second implemented strategy was based on cost evaluation. It considers the immediate cost associated with each action from the defender's point of view and does not take the attacker's goals into consideration. It performs the action with the lowest associated cost. The defender has the following responses available: to block the attacker from the target or not to block the attacker from the target. Each action a is described by two parameters: the type of the action, type(a), which is either to block or not to block, and the target of the action, target(a), which denotes the service the attacker will or will not be blocked from. This means a total of 10 actions is considered.

The cost of action a consists of two parts, C(a) = Cn(a) − Cp(a). The cost Cn(a) = Cr(a) + Cu(a) sums the negative impacts of the action on the system and the cost Cp(a) sums the positive impacts of the action. The negative impacts of the action are the cost of reconfiguration Cr(a) and the cost of unavailability Cu(a). The cost of reconfiguration Cr(a) is computed based on the current configuration and the considered action as follows:

  Cr(a) = 0   if the current blocking state is the same as the blocking state prescribed by action a,
  Cr(a) = 10  otherwise.

The cost of unavailability depends on the target of the action and its associated availability value cA. It is computed for one round, which corresponds to one minute:

  Cu(a) = 0                 if type(a) is not to block,
  Cu(a) = cA(target(a))     otherwise.

Finally, the positive impact of an action is estimated as the potential damage that is mitigated by the defensive action. The potential damage depends on the probability of an attack's success, ps = 0.22, which was estimated as a weighted average of the probabilities of an attack's success against each service, based on the observed frequency of username/password combinations and the average number of attempts per minute:

  Cp(a) = ps · ci(target(a)).

This cost estimation follows the cost estimation in [6], which uses the operational cost, the impact of the response on the system, and the response goodness to calculate the total cost. Each round, a script is called which evaluates the cost of every action for every detected attack. The action with the lowest cost is selected and performed.

D. Simulation Run

In order to decide whether there is a difference between how a strategy performs in the semi-real environment of a honeypot network and in a simulated environment with replayed attacks, we created a simulation scenario for each strategy. In the simulation, the attacks observed before the experiment were replayed against the strategy and the strategy's responses were logged. The strategy employed the same logic as it did when deployed on the honeypot network; however, the attacker was deprived of the possibility of reacting to the responses. To simulate the reconfiguration of iptables, the attempts that would have occurred during active blocking were not considered.
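A minimal sketch of such a replay loop is shown below. It assumes that the recorded attempts are tuples of (timestamp, attacker IP, honeypot IP, username, password) as stored in the central database; the strategy interface and helper names are hypothetical, not our actual implementation.

    from itertools import groupby

    def group_by_minute(attempts):
        """Group attempts (sorted by timestamp, in seconds) into one batch per minute."""
        return groupby(attempts, key=lambda a: int(a[0] // 60))

    def replay(attempts, strategy):
        """Replay recorded attempts against a strategy, skipping attempts that fall
        into a window in which the (attacker, honeypot) pair was blocked."""
        blocked = set()        # currently active (attacker_ip, honeypot_ip) blocks
        responses = []
        for minute, batch in group_by_minute(attempts):
            visible = [a for a in batch if (a[1], a[2]) not in blocked]
            # the strategy sees only the unblocked attempts and returns the new set of blocks
            blocked = strategy.decide(minute, visible, blocked)
            responses.append((minute, set(blocked)))
        return responses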
E. Strategy Evaluation

In order to evaluate the strategies, we first separated the observed attempts into attacks. An attack is a sequence of attempts by the same attacker, regardless of the target, which closely followed one another. We assume that if there are no attempts by the attacker for at least one hour, the attacker has stopped the attack. For gaps shorter than one hour, we looked into the pace of the attack: if the gap between the attempts was uncharacteristically long, we also assumed the attacker had stopped the attack. Based on these requirements, the gap between the last attempt of an attack and the first attempt of the following attack must satisfy one of the following conditions:

1) the gap between the attempt and the next one is longer than one hour;
2) the gap between the attempt and the next one is longer than five minutes and is more than five times the average gap between the previous five attempts.

For attacks that occurred during the experiment, we added the additional condition that the gap was not caused by a block. We also checked whether the combination of username and password had been used by the attacker before (restarted attacks). A graph of attack lengths is shown in Figure 2, illustrating the distribution of attacks before the experiment.

Fig. 2: Attack lengths.

The performance of the strategy was evaluated for each separate attack. We computed three indicators for each attack:

• the number of reconfigurations of iptables, nr, made in response to the attack,
• the total time, nu(s), each service s was blocked (in minutes),
• whether each service s was compromised.

The overall score, e(a), of the strategy per attack a was computed as follows:

  e(a) = 10 · nr + Σ_{s∈S} nu(s) · cA(s) + Σ_{s∈S} nsucc(s) · cI(s),

where cA(s) is the availability value of service s, cI(s) is the value of service s for the defender, and nsucc(s) = 1 if service s was successfully attacked and nsucc(s) = 0 otherwise.
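As an illustration, the splitting heuristic and the per-attack score can be written down directly. The sketch below assumes timestamps in seconds and simplified bookkeeping of blocks and compromises; the function and parameter names are hypothetical.

    def split_into_attacks(timestamps):
        """Split one attacker's sorted attempt timestamps (seconds) into attacks
        using conditions 1) and 2) above."""
        attacks, current = [], [timestamps[0]]
        for t in timestamps[1:]:
            gap = t - current[-1]
            recent = [b - a for a, b in zip(current[-6:-1], current[-5:])]  # up to five last gaps
            avg_gap = sum(recent) / len(recent) if recent else 0.0
            too_long = gap > 3600                                     # condition 1
            pace_break = gap > 300 and recent and gap > 5 * avg_gap   # condition 2
            if too_long or pace_break:
                attacks.append(current)
                current = [t]
            else:
                current.append(t)
        attacks.append(current)
        return attacks

    def score(n_reconf, minutes_blocked, compromised, c_avail, c_value):
        """e(a) = 10*nr + sum_s nu(s)*cA(s) + sum_s nsucc(s)*cI(s)."""
        e = 10 * n_reconf
        for s, avail_cost in c_avail.items():
            e += minutes_blocked.get(s, 0) * avail_cost
            e += (1 if compromised.get(s) else 0) * c_value[s]
        return e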
IV. RESULTS AND DISCUSSION

The experiment consisted of two steps: a simulation run and a live evaluation. We ran the live evaluation in the honeynet in August 2016. Each strategy was deployed for at least two weeks. In Table III, we summarize the overall statistics of the experiment, i.e., the number of attacks, successful attacks, reconfigurations, and the time for which the machines were unavailable. We observed a total of 207 attacks during the experiment with the game theory based strategy and 437 attacks during the experiment with the cost sensitive strategy. In the simulation with replayed attacks, we replayed a vast collection of attacks against each strategy. The attacks were reconstructed from the historical records of authentication attempts against the honeynet. An attack was a series of authentication attempts that had the same source and target IP address. We used a collection of 2,148,805 authentication attempts in 15,214 attacks observed from December 2011 till June 2016.

TABLE III: Experiment statistics.

  Strategy         # attacks   # reconfigurations   # minutes blocked   # successful attacks   Start              End
  Game theory      207         2374                 5294                55                     2016-08-01 00:14   2016-08-16 03:21
  Cost sensitive   437         1029                 22467               62                     2016-08-16 03:41   2016-08-30 17:44

A. Strategy Comparison

We evaluated the strategies as described in Section III-E. Figure 3 shows a comparison of the results for both strategies. The overall statistics of the results for each strategy are summed up in Table IV. The cost sensitive strategy did better than the game theory based strategy in the semi-real environment of the honeypot network. It had the advantage of a longer minimal block duration over the game theory based strategy. The average length of an attack observed prior to the experiment was around 26 minutes. The short minimal block duration of 5 minutes led to a high number of repeated reconfigurations, resulting in a high penalty: in the case of longer attacks, when the block is removed, the attack is detected again and the block has to be put back in place, resulting in two additional reconfigurations. The longer block duration also prevented a small window of opportunity for attackers to perform a successful attack after the block was removed and before the attack was detected again (and the block was put back in place).

As can be observed in Table III, the cost sensitive strategy has a lower relative number of successful attacks than the game theory based strategy. The cost sensitive strategy is also easily scalable, which could give it an even greater advantage on larger networks.

Fig. 3: Comparison of the cost sensitive and game theory strategies (strategy score per attack).

TABLE IV: Strategy results per attack.

  Strategy          Count   Mean strategy score   Stdev
  Game theory       207     803.37                1279.24
  Cost sensitive    437     489.41                937.52
  None              15214   1004.47               2407.20
  Game theory*      15214   1006.363              2371.38
  Cost sensitive*   15214   1109.63               2343.42

  * simulation run

B. Simulated and Semi-real Execution

Next, we compared how each strategy performed in the simulated environment and in the semi-real environment. The overall statistics for each strategy in the simulation run are presented in Table IV. It shows that both strategies performed differently in the simulated environment than in the semi-real environment of the honeypot network. The mean results of the strategies in the semi-real environment are lower than the mean results of the strategies in the simulated environment, and the standard deviation is also lower in the semi-real environment. Moreover, the order of the strategies is reversed: in the experiment, the game theory based strategy performed worse than the cost sensitive strategy, while in the simulation run it was the other way around.

To confirm the difference in the outcomes of the strategies, we want to show that the results in the simulated environment do not come from the same distribution as the results in the semi-real environment. We used the Wilcoxon–Mann–Whitney two-sample rank-sum test [18] to test the following null hypotheses:

1) the distribution of the game theory based strategy's results in the semi-real environment does not significantly differ from the distribution of the game theory based strategy's results in the simulated environment;
2) the distribution of the cost sensitive strategy's results in the semi-real environment does not significantly differ from the distribution of the cost sensitive strategy's results in the simulated environment.

We were able to reject both null hypotheses at a significance level of α = 0.01, which indicates that for both strategies there is a statistically significant difference between how the strategy performs in a simulated environment and how it performs in a honeypot network.

We tried to explain why the strategies would perform better facing real attackers than replayed attacks. We suspect the difference is tied to the difference in attack lengths, as discussed in detail in Section IV-C1. The defender's actions probably forced the attacker to split his attacks over time, and the result is several attacks of medium length instead of one long attack. This leads to a better result in our evaluation scheme; however, it might be argued whether this is the desired outcome of employing a network defence.
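The rank-sum comparison can be reproduced with a standard implementation; a minimal sketch is given below, with placeholder arrays standing in for the measured per-attack scores.

    from scipy.stats import mannwhitneyu

    def differs(scores_semi_real, scores_simulated, alpha=0.01):
        """Two-sided Wilcoxon-Mann-Whitney rank-sum test on per-attack scores;
        returns whether the null hypothesis of equal distributions is rejected."""
        statistic, p_value = mannwhitneyu(scores_semi_real, scores_simulated,
                                          alternative="two-sided")
        return p_value < alpha, p_value

    # usage with placeholder data (not the measured scores):
    # rejected, p = differs([803.4, 12.0, 250.0], [1006.4, 90.0, 4100.0])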
C. Attacker's Behaviour

We tried to investigate whether the attacker's behaviour changes based on the defender's actions. We computed several statistics from the data gathered during the experiment: the lengths of the attacks, the correlation between attack lengths and strategy evaluation scores, the ratio of returning attackers, and the time needed to succeed. Then, we compared these statistics between the semi-real environment runs, in which the attacker can change his behaviour based on the defender's responses, and the simulated environment run, in which this option is taken from him.

1) Attack Lengths: We compared the lengths of the attacks before the experiment and throughout the experiment. The estimated density of attack lengths for both strategies and for the historical records is shown in Figure 4. We can see that the lengths are shorter in the historical records than in the semi-real environment. We suspect the longer attacks could be caused by an attacker waiting for the defender to stop blocking him before finishing the attack. The relatively high values for the cost sensitive strategy were caused by several very slow attacks that occurred during the experiment: although the attacker made only a few attempts, the time span between the attempts was almost one hour.

Fig. 4: Attack lengths (estimated density over minutes) for the cost sensitive strategy, the game theory strategy, and no strategy.

To show that there is a significant difference between the lengths of attacks in the semi-real and simulated environments, we used the Wilcoxon–Mann–Whitney test with the following null hypotheses:

1) the distribution of attack lengths observed during the experiment with the game theory based strategy in the semi-real environment does not significantly differ from the distribution of attack lengths prior to the experiment;
2) the distribution of attack lengths observed during the experiment with the cost sensitive strategy in the semi-real environment does not significantly differ from the distribution of attack lengths prior to the experiment.

We were able to reject both hypotheses at a significance level of α = 0.01. This confirms that the attackers were indeed behaving differently during the experiment (when they were actively opposed) than prior to the experiment (when they did not meet any resistance).

2) Correlation between attack length and strategy result: We were interested in how much the overall penalty depends on the length of the attack. Since the defender receives a penalty for unavailability, we would expect these two quantities to be correlated. The correlation coefficients are shown in Table V. There is almost no correlation between the attack length and the strategy result when deployed in the semi-real environment, neither for the game theory based strategy nor for the cost sensitive strategy, nor in the results of no strategy. A slightly higher correlation was observed in the simulated environment for both the cost sensitive and the game theory strategy. This means that the correlation differs between the data captured while the attacker can react to the defender's actions (no strategy is how the honeypot network was defended up to now) and the data from the simulated environment with replayed attacks.

TABLE V: Correlation between attack length and strategy result.

  Strategy          Correlation   95% confidence interval
  Game theory       0.11          [-0.02, 0.25]
  Cost sensitive    0.06          [-0.03, 0.15]
  None              0.05          [0.03, 0.07]
  Game theory*      0.35          [0.33, 0.36]
  Cost sensitive*   0.41          [0.39, 0.42]

  * simulation run

The overall low correlation is caused by the minimal block duration of each strategy.
The low correlation between penalty and attack length for no strategy indicates that an attack's success depends mostly on whether the attacker has the right combination of username and password in his dictionary, and not on the length of the attack.

3) The Return of the Attacker: Another interesting aspect of the attacker's behaviour is the tendency to repeat the attack. For both strategies, we computed the average ratio of attackers who repeated their attack within the next 10 days. We compared the results with the same values computed for attacks observed before the experiment. A boxplot of the results is shown in Figure 5. The average ratio of returning attackers is 0.24 during the experiment with the game theory based strategy, 0.26 during the experiment with the cost sensitive strategy, and 0.11 in the historical data.

Fig. 5: Attacker's return ratio for the cost sensitive strategy, the game theory strategy, and no strategy.

We can see that the return rate of the attacker is higher when the attacker has a chance to react to the defender's actions. One explanation might be that the attacker gives up after being blocked but returns later to finish the attack. We also investigated whether the returning attackers were starting from where they left off (meaning the combinations of usernames and passwords had not been observed in a previous attack by the same attacker). We saw only a very low number of restarted attacks, which supports the hypothesis that the attackers paused their attacks when blocked and continued them later.

4) Time to Succeed: We investigated how long the attacker needs to successfully attack a service. We took all the successful attempts and computed the time from the beginning of the attack. The statistics of the measured times are shown in seconds in Table VI. The average values are rather small due to the weak passwords. The highest average value was computed for no strategy, which is not very surprising, since this is the only strategy that allows the attacker to succeed over the length of the whole attack. Overall, these statistics do not point to any change in attacker behaviour.

TABLE VI: Time to succeed.

  Strategy          Mean time to succeed   Stdev
  Game theory       238.80 s               546.39 s
  Cost sensitive    855.03 s               2068.91 s
  None              1055.36 s              7561.65 s
  Game theory*      431.64 s               3938.81 s
  Cost sensitive*   300.08 s               7561.65 s

  * simulation run

D. General Discussion

In this section, we would like to summarize comments and lessons learned during the experiment's setup and run. The following comments might not relate directly to the research questions; however, we find them relevant to the subject of strategy verification.

The data shows that the attacker does indeed change his behaviour if he faces an active defence. We found out that the active defence does not deter the attacker. On the contrary, the attackers were more persistent in their attacks during the experiment and often returned to continue the attack. We have two probable hypotheses for the attacker's motivation for this change of behaviour: either the act of defending suggests that there is something worth defending, which increases the attacker's motivation, or we were dealing with automated attacks scripted so that a given sequence of username/password combinations is always finished, and the observed change in behaviour was caused by retry mechanisms in the script. Since we cannot confirm or reject these hypotheses with the attackers themselves, we cannot determine the true reasoning behind their actions.
The computational complexity of the strategies is often not reflected in the evaluation. We encountered this limitation with the game theory based strategy. If a strategy cannot be scaled with reasonable computational complexity, it might perform very well, yet it could not be used in a real deployment, since networks tend to be of a larger scale than in experiments. Although the game theory based strategy performed well, it could not (in its current form) be used in practice. However, this attribute could be detected during simulation-based verification, since we can easily scale the network in a simulated environment.

The combination of one-time costs and costs per unit of time has to be considered very carefully. Specifically, in the security triad Confidentiality, Integrity, Availability, the former two are one-time costs and the latter is a cost per unit of time. Since we were running an experiment, we defined these values without the need to justify them. However, when deploying in a real network, the system administrator would be hard-pressed to estimate these parameters.

Deployment in a real environment forces the authors to address all aspects of the strategy, such as the estimation of parameters, what inputs are available, whether the assumptions about other components are realistic, and so on. The strategy is viable only if all parameters can be estimated, all inputs supplied, and all assumptions met. In our case, we needed to estimate the probability of an attack's success and the values of the services, and we also had to take into account that, by blocking, we lose detection capability.

V. CONCLUSIONS

In this paper, we have evaluated two network defence strategies in a simulation run and on a live honeypot network to compare evaluation in real-life and simulated environments.

Our first research question regarded the difference between defence strategy evaluation in simulated and real environments. When we compared the results of the simulation run and the live evaluation, we could see that both strategies performed better in the live evaluation than in the simulation run. We attribute this difference to the change in attackers' behaviour. This suggests there are significant differences between theoretical and practical evaluation that have to be taken into account if we want to evaluate new defence strategies.

To answer our second research question concerning the change in attackers' behaviour when facing an active defence, we studied various statistics from the data gathered during and before the experiment. We found that the attacks were on average longer during the experiment and that a higher number of attackers repeated their attack. We concluded that an active defence does not deter the attacker. On the contrary, the attackers are more persistent in their attacks and often return to continue the attack.

We observed notable differences between the evaluation in a simulated environment and on a honeypot network. We conclude that using replayed attacks for evaluating a strategy is not sufficient. Since implementing an attacker's behaviour in simulation scenarios is going to be a very demanding task, it is clear that the field of strategy evaluation faces a considerable challenge.

In our future work, we are going to focus more deeply on the attacker's reaction to a defence. We would like to expand our experiment so that the attacker can observe and react to the success of his attack. However, for this, we would need a high-interaction honeypot that allows successful intrusion.
It would also be very useful to find ways to distinguish between scripted attacks and human-driven attacks. Since scripted attacks follow very simple logic, the defence could be tailored to counter this logic and could, therefore, be much more efficient.

Acknowledgements

This research was supported by the Security Research Programme of the Czech Republic 2015–2020 (BV III/1 VS) granted by the Ministry of the Interior of the Czech Republic under No. VI20172020070, Research of Tools for Cyber Situational Awareness and Decision Support of CSIRT Teams in Protection of Critical Infrastructure.

REFERENCES

[1] A. Shameli-Sendi, M. Cheriet, and A. Hamou-Lhadj, "Taxonomy of intrusion risk assessment and response system," Computers & Security, vol. 45, pp. 1–16, 2014.
[2] N. Kheir, "Response policies and countermeasures: Management of service dependencies and intrusion and reaction impacts," Ph.D. dissertation, École Nationale Supérieure des Télécommunications de Bretagne, 2010.
[3] M. Papadaki and S. Furnell, "Achieving automated intrusion response: a prototype implementation," Information Management & Computer Security, vol. 14, no. 3, pp. 235–251, 2006.
[4] W. Kanoun, N. Cuppens-Boulahia, F. Cuppens, and J. Araujo, "Automated reaction based on risk analysis and attackers skills in intrusion detection systems," in 2008 Third International Conference on Risks and Security of Internet and Systems. IEEE, 2008, pp. 117–124.
[5] Z. Zhang, P.-H. Ho, and L. He, "Measuring IDS-estimated attack impacts for rational incident response: A decision theoretic approach," Computers & Security, vol. 28, no. 7, pp. 605–614, 2009.
[6] C. Strasburg, N. Stakhanova, S. Basu, and J. S. Wong, "A framework for cost sensitive assessment of intrusion response selection," in 2009 33rd Annual IEEE International Computer Software and Applications Conference, vol. 1. IEEE, 2009, pp. 355–360.
[7] N. Stakhanova, S. Basu, and J. Wong, "A cost-sensitive model for preemptive intrusion response systems," in AINA, vol. 7, 2007, pp. 428–435.
[8] W. Kanoun, N. Cuppens-Boulahia, F. Cuppens, and S. Dubus, "Risk-aware framework for activating and deactivating policy-based response," in Network and System Security (NSS), 2010 4th International Conference on. IEEE, 2010, pp. 207–215.
[9] S. Wang, Z. Zhang, and Y. Kadobayashi, "Exploring attack graph for cost-benefit security hardening: A probabilistic approach," Computers & Security, vol. 32, pp. 158–169, 2013.
[10] C. Mu and Y. Li, "An intrusion response decision-making model based on hierarchical task network planning," Expert Systems with Applications, vol. 37, no. 3, pp. 2465–2472, 2010.
[11] B. Foo, Y.-S. Wu, Y.-C. Mao, S. Bagchi, and E. Spafford, "ADEPTS: adaptive intrusion response using attack graphs in an e-commerce environment," in 2005 International Conference on Dependable Systems and Networks (DSN'05). IEEE, 2005, pp. 508–517.
[12] N. J. Puketza, K. Zhang, M. Chung, B. Mukherjee, and R. A. Olsson, "A methodology for testing intrusion detection systems," IEEE Transactions on Software Engineering, vol. 22, no. 10, pp. 719–729, 1996.
[13] N. Athanasiades, R. Abler, J. Levine, H. Owen, and G. Riley, "Intrusion detection testing and benchmarking methodologies," in Information Assurance, 2003. IWIAS 2003. Proceedings. First IEEE International Workshop on. IEEE, 2003, pp. 63–72.
[14] P. Mell, V. Hu, R. Lippmann, J. Haines, and M. Zissman, "An overview of issues in testing intrusion detection systems," 2003.
[15] The Honeynet Project, Know Your Enemy: Learning about Security Threats, 2nd ed. Boston: Addison-Wesley, 2004.
[16] M. Husák and M. Drašar, "Flow-based Monitoring of Honeypots," in Security and Protection of Information 2013. Brno: Univerzita obrany, 2013, pp. 63–70.
[17] R. D. McKelvey, A. M. McLennan, and T. L. Turocy, "Gambit: Software tools for game theory, version 14.1.0," 2014. [Online]. Available: http://www.gambit-project.org/
[18] H. B. Mann and D. R. Whitney, "On a test of whether one of two random variables is stochastically larger than the other," The Annals of Mathematical Statistics, pp. 50–60, 1947.