MASARYK UNIVERSITY
FACULTY OF INFORMATICS
Service for Cybernetic Proving Ground's visualizations
MASTER'S THESIS
Bc. Robert Dubecký
Brno, 2014
Declaration
Hereby I declare that this paper is my original authorial work, which I have worked out
on my own. All sources, references and literature used or excerpted during elaboration
of this work are properly cited and listed in complete reference to the due source.
Bc. Robert Dubecký
Advisor: RNDr. Radek Ošlejšek, Ph.D.
Acknowledgement
Above all I would like to thank both my advisor RNDr. Radek Ošlejšek, Ph.D. and my
team leader Dalibor Toth for providing me with a lot of help and guidance during the
work on my thesis. I would also like to thank the rest of my colleagues from the
visualization group for being a great team to work with; namely Andrej Lučanský,
Zdenek Eichler, Petr Jelínek, Adam Brauner, Michal Kňazský and Karolína Burská.
Last but not least, I would like to thank my good friend Lenka Plháková for helping me
with creating some nice pictures for my thesis.
Abstract
The Cybernetic Proving Ground is a prototype testbed for executing simulations of
cybernetic attacks on various network infrastructures. It should be able to visualize the
executed simulations. The goal of the thesis was to create a service that will process the
data generated during the simulations and provide them to the visualizations. The
thesis describes the Cybernetic Proving Ground project and its data model. Furthermore
it focuses on the data access and object-relational mapping technologies in Java and lists
their advantages and disadvantages. In the end the design and implementation of the
developed data service are described. The data service exposes a REST API and is
currently deployed in a testing environment.
Keywords
Cybernetic Proving Ground, MyBatis, ORM, Java, impedance mismatch, REST, Spring
MVC, data service
Contents
1 Introduction
2 Cybernetic Proving Ground
2.1 Goals and requirements of the infrastructure
2.1.1 Network specific requirements
2.1.2 Host configuration requirements
2.1.3 Monitoring requirements
2.1.4 Control requirements
2.1.5 Deployment requirements
2.2 Scenario execution
2.3 Architecture
2.3.1 Scenario management
2.3.2 Cloud management
2.3.3 Measurement
2.3.4 Visualization
2.4 CPG Entity-relationship diagram
2.4.1 Network topology configuration
2.4.2 Logical topology and network properties
2.4.3 Measurement infrastructure
2.4.4 Definitions of network parameters and characteristics
2.4.5 Observation storage
3 Technologies for Mapping Relational Data to Objects in Java
3.1 The object-relational impedance mismatch
3.2 Mapping relational data to objects in persistence layer
3.3 Direct JDBC API calls
3.3.1 History
3.3.2 Technology
3.3.3 When to use the JDBC API directly
3.4 ORM frameworks
3.4.1 History
3.4.2 Technologies
3.4.3 When to use ORM framework
3.5 Hybrid frameworks
3.5.1 Technologies
3.5.2 When to use hybrid frameworks
4 System Requirements and Architecture
4.1 Requirements
4.2 Choosing technologies
4.3 Architecture
5 Implementation
5.1 Package structure
5.2 Presentation layer
5.2.1 Configuration and initialization
5.2.2 Request handling
5.2.3 Creating JSONs
5.2.4 Exception handling
5.2.5 Cross-origin requests
5.3 Service layer
5.3.1 Architecture and configuration
5.3.2 Processing data and optimization
5.4 Data Access Layer
5.4.1 SqlSession and SqlSessionFactory
5.4.2 Configuration of the data access layer's components
5.4.3 Mapper XML files
5.4.4 Mapper interfaces and mapper cooperation
5.4.5 Dynamic SQL
5.5 REST API documentation
5.6 Deployment
6 Conclusion
Bibliography
List of used abbreviations
A The REST API of the Visualization Data Service
B Tutorials for deployment and configuration of the service
C Visualization portlets
D List of electronic appendices
1 Introduction
In today's digital age, when almost all computers are networked and connected to the
Internet, cyber-attacks are becoming an increasing threat. Therefore, network security
needs to be continuously improved and updated. However, attackers are unceasingly
coming up with new techniques to get past even the latest defense mechanisms and to
disrupt a system or exploit a vulnerability of a service. To design a successful functional
protection, one has to fully understand the nature of the attacks, their consequences and
possible limitations. One viable solution to this problem is to simulate the attacks in a
testing environment. A prototype of such an environment is currently being developed at
Masaryk University under the name Cybernetic Proving Ground.
The simulations executed in the Cybernetic Proving Ground produce a lot of data about
the attacks. Afterwards, the data need to be processed and visualized so that the people
who analyze the attacks can understand them easily. The goal of this thesis is to design
and implement a prototype of a service which accesses those data, processes them and
distributes them to the Cybernetic Proving Ground's visualizations. The thesis also
examines various approaches to data access and the object-relational mapping
technologies available in the Java environment in order to find a suitable technology for
the implementation of the data service. Since more visualizations with unpredictable
data requirements are likely to be developed in the future, the thesis describes the
implementation of the service in detail so that new data services can be added quickly
by different developers.
The second chapter describes the requirements for the Cybernetic Proving Ground's
infrastructure and the modules of which the infrastructure is comprised. It also focuses
on explaining the data model of the Cybernetic Proving Ground's database. Chapter
three analyzes the problem of mapping relational data to objects and introduces several
Java technologies that address the problem. The fourth chapter lists the requirements for
the implemented data service, explains the technology choices and the design of the
service's architecture. Chapter five talks about the actual implementation of the service
and shows how the chosen technologies were employed in the project.
2 Cybernetic Proving Ground
Cybernetic Proving Ground (CPG) is a virtual testbed specifically designed for testing
and simulation of cyber-attacks on various network infrastructures. Several similar
testing environments already exist, but each of them comes with a set of restrictions
which limits its functionality or variability. There are projects like DETER [1] which
utilize the publicly available Emulab [2] infrastructure solution. It allows them to deploy
various virtual appliances with flexible configuration of network topologies and
adjustment of network characteristics. Although Emulab simplifies many of the tasks
required for setting up different experiment environments, it introduces several
constraints regarding the infrastructure and its features (e.g. only IPv4 support in
topology configuration, operating system and hardware restrictions). Some other
security-related testbeds need to have their own dedicated infrastructure purchased
and built. A particular set of hardware and software components has to be acquired
and set up in a specific way, allowing it to be used only for a single purpose. On one
hand, this provides better control over the testbed deployment and its supported
features, but on the other hand, it limits the growth potential and requires high initial
investments [3].
2.1 Goals and requirements of the infrastructure
To overcome the limitations of the other security testbeds, the CPG team decided to
develop a completely new solution in the field of security simulations. The main goal of
CPG is to introduce a system in which executing various attack scenarios in a network is
simple (i.e. the user does not need extensive knowledge of creating and running virtual
networks) and which contains sufficient monitoring and visualization tools that enable
the user to study the simulated attacks. Five categories of requirements for the testbed
infrastructure have been established [4]:
1. Network specific requirements
2. Host configuration requirements
3. Monitoring requirements
4. Control requirements
5. Deployment requirements
2.1.1 Network specific requirements
CPG is supposed to give its users full control over the third layer of the ISO/OSI model
of the simulated network. That means it must be possible to simulate any network
topology and use an arbitrary IP addressing scheme or third-layer network protocol. To
be able to simulate any network realistically, the new system has to allow the
modification of network characteristics such as link bandwidth, latency or packet loss.
Furthermore, it should be possible to establish a controlled connection of the testbed
networks with the real world, since some scenarios may require communication with the
Internet. However, this kind of communication must be properly filtered so that the
ongoing attack does not spread outside of the simulated network [5].
2.1.2 Host configuration requirements
The system must be able to simulate computers run by common operating systems
(specifically Linux and Windows so far) in both their 32-bit and 64-bit versions. It should
be possible to configure the computers easily and to install any software required by a
scenario.
2.1.3 Monitoring requirements
For the simulation to provide usable data about the attacks, a sophisticated monitoring
infrastructure has to be in place. However, monitoring of the network communication
and line usage must not interfere with the actual traffic, since it would introduce
undesired noise into the measured data. Link monitoring as well as host monitoring
should be implemented in the infrastructure.
2.1.4 Control requirements
The testbed should be equipped with a control layer which would manage the other
components of the system. It will expose a user interface which should enable the user to
simply and intuitively set up and configure virtual networks for simulations, execute
and re-execute scenarios and should comprehensively visualize what is happening in the
simulated environment.
2.1.5 Deployment requirements
It was concluded that the most suitable solution for building a system with the
above-mentioned requirements would be to use a cloud environment. Clouds possess
powerful tools for creating a virtual infrastructure which is able to simulate the network
layer as well as host computers. By using common cloud interfaces, CPG would not need
to rely on a particular cloud provider, which would give it the desired portability
and flexibility.
2.2 Scenario execution
Before explaining the architecture of the CPG, it is important to understand what a CPG
scenario is and how it is executed.
A scenario defines what will be simulated in the CPG and how the components in the
sandbox environment will be built. It describes all the actions that will happen during
the experiment and also contains the details of their technical realization, for example:
• Nodes - the virtual machines that will be used for the experiment
• Network (physical) topology - how the nodes will be connected to form a virtual
network
• Logical topology - the logical roles of the nodes (attacker, victim, etc.)
• Monitoring rules - define the monitoring infrastructure for the scenario and the
most interesting observation points in the network topology
• Experiment schedule - the separate steps describing what is supposed to happen
during the experiment
The scenario execution has three main phases which occur in sequence:
1. Initialization - the infrastructure elements are instantiated according to the
configuration (network and logical topology, monitoring rules, etc.) defined in the
scenario.
2. Scenario run - the network and host monitoring infrastructures capture data about
the running attack and store them in the database.
3. Evaluation - the data stored during the attack are analyzed and interactively
visualized.
2.3 Architecture
The architecture of the CPG consists of four infrastructure elements, which are called
modules. The modules communicate by sending messages which contain control
information, configuration or measured data. The following sections describe the
purpose and responsibilities of each module.
2.3.1 Scenario management
The scenario management module is the main module of the CPG. It communicates with
all of the other modules and supervises and controls their work according to the needs of
the scenario simulation. Its main tasks are managing configuration, controlling the
scenario execution and evaluation of the scenario.
Figure 1: Schema of the CPG configuration [4]
The configuration of the CPG is illustrated in Figure 1. After receiving the scenario
information, the scenario management module creates the configuration of the logical
topology and the initial configuration of the network topology and measurement
infrastructure. First, it contacts the cloud management module with the initial
configuration of the network topology. The cloud management module sets up the
network, completes the configuration and sends it back to the scenario management
module. The scenario management module then passes the network topology and the
initial measurement configuration to the measurement management module, which
completes the measurement configuration and sends it back. In the end, the scenario
management module provides both the network and logical topologies and the
measurement infrastructure information to the visualization module.
The primary role of the scenario management module during scenario execution is the
supervision of scenario progress. It ensures that every step defined in the scenario is
performed and completed. When the scenario is completed and evaluation starts, the
module makes sure that all of the outputs are correctly prepared and stored for later
visualization and analysis.
The creation of the scenarios is the responsibility of the scenario development group. Its
members study various kinds of cyber-attacks and transfer them into scenarios that can
be executed in the CPG. So far, they have prepared a detailed version of a DDoS
(Distributed Denial of Service) scenario based on observation of a real attack against the
Internet infrastructure of the Czech Republic that took place in March 2013 [6]. This
scenario has been used as a proof of concept of the CPG [3] and provided testing data for
the development of the visualization module prototype.
2.3.2 Cloud management
The creation and management of virtual machines and networks in the CPG is hidden
behind the cloud management module. The module acts as an abstraction layer above the
cloud provider. It exposes simple but powerful Application Programming Interface
(API) functions that communicate with the underlying cloud service and facilitate the
construction of a virtual network environment for the experiments.
Every security scenario is executed in a separate closed environment (sandbox), which
ensures that the simulated attack will not affect any other simulation or another part of
the CPG infrastructure. Each assembled virtual environment is controlled by a dedicated
node, called the Scenario Management Node (SMN), which provides the functionality of
the scenario management module for the newly created sandbox and also contains
additional components for measurement (see Section 2.3.3). The control node takes care
of the configuration and instantiation of virtual machines and of setting up the network
topology based on the scenario description. The SMN communicates with the rest of the
infrastructure over a separate control network to ensure that the control data and
module configuration do not interfere with the actual experiments.
2.3.3 Measurement
The measurement infrastructure currently implemented in the CPG consists of two
monitoring infrastructures - network and host.
The network monitoring infrastructure is responsible for gathering the relevant network
traffic data, processing them and storing them for later analysis and visualization. For
these purposes a set of probes, a data processing unit and a database are used. A probe is
a device that performs the data metering and exporting processes. It gathers raw
information about the network traffic and sends it to the data processing unit. A probe
can be either a software device or may use hardware support. The data processing unit
collects, processes and stores data sent by probes. Its core part is called a collector and it
is located in the Scenario Management Node, which means there is one collector per
scenario. After receiving a certain amount of data from the probes, the collector
computes basic statistics, aggregates the data and sends them to the database.
The host monitoring infrastructure collects data about the node performance. It is
possible to monitor CPU and memory utilization, disk usage, number of open
connections, interface statistics and other characteristics of hosts [3]. The
infrastructure is built from two components - a master node and child nodes. The master
node is a process which controls the child nodes and gathers their monitored
information. This information is afterwards processed and stored. The master node is
deployed in the collector and usually has only one instance per scenario. Child nodes are
processes running directly in host machines. Each child node monitors the host in which
it is located and sends the information to the master node.
2.3.4 Visualization
The purpose of the visualization module is to offer simple and responsive control over
the CPG and to comprehensively represent the measured data. The module should
therefore provide a control Graphical User Interface (GUI) and scenario visualizations.
Although some visualizations will be shared among several scenarios (like network
topology monitoring), there will also be specific visualizations for specific scenarios.
These will present their respective scenario in a unique way, concentrating only on the
data important for its visualization (e.g. showing the number of packets received by the
victim's computer during a DDoS scenario).
The visualization module also has to perform further processing and analysis of the
measured data before they can be passed to the actual visualizations. The designed
visualization module will be implemented as a web application. The application should
be effective, scalable, extensible and maintainable so that new visualizations are simple
to add when necessary.
This thesis is a part of the visualization module. It focuses on accessing data measured
during a scenario execution, processing them and distributing them to other parts of the
module so they can properly visualize them to CPG users. To be able to correctly access
the data, it is important to understand where and how the data will be stored. This issue
is addressed in the next section.
2.4 CPG Entity-relationship diagram
The CPG stores the data about scenarios and measurement results in relational databases.
Each scenario execution has its own database with a common schema. The database
model is shown in Figure 2 in its latest version, but because the CPG is still in
development, it is likely that the model will undergo some modifications in the future.
The tables in the model are colored according to their purpose:
• Blue tables are used for storing the network topology configuration
• Green tables store logical topology and network properties
• Yellow tables store information about the measurement infrastructure
• Orange tables contain definitions of network parameters and characteristics
which will be measured in the CPG
• Grey tables form the observation storage
Figure 2: Entity-relationship model of CPG database [7]
2.4.1 Network topology configuration
The network topology configuration part of the model describes a simplified logical
structure of the simulated network. It does not represent the concrete architecture used
to build the topology within the CPG environment (specific routers and switches). Local
networks in the topology (table Network) are connected by links which are stored in table
Routing. Each link in table Routing is defined by its source and destination networks,
which means that for each two-way communication there have to be two records. This
solution makes it possible to measure the link usage separately in each direction. Table
Disk holds information about disk images that are used for initializing host computers in
the virtual environment. Since several logical characteristics can be assigned to networks
as well as hosts, they are connected by a generalized table Network_element.
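The rule that a two-way connection needs two Routing records can be illustrated with a minimal Java sketch. The class RoutingSketch, the field names srcNetworkId and dstNetworkId and the helper connectBothWays are assumptions made for this illustration only; they are not taken from the CPG code base or its schema.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory mirror of table Routing; not taken from the CPG code base. */
class RoutingSketch {

    /** One row of table Routing: a one-way link from a source to a destination network. */
    static final class Routing {
        final int id;
        final int srcNetworkId; // assumed column name
        final int dstNetworkId; // assumed column name

        Routing(int id, int srcNetworkId, int dstNetworkId) {
            this.id = id;
            this.srcNetworkId = srcNetworkId;
            this.dstNetworkId = dstNetworkId;
        }
    }

    /** Builds the two records that together describe one two-way connection. */
    static List<Routing> connectBothWays(int firstId, int networkA, int networkB) {
        List<Routing> rows = new ArrayList<>();
        rows.add(new Routing(firstId, networkA, networkB));     // A -> B
        rows.add(new Routing(firstId + 1, networkB, networkA)); // B -> A
        return rows;
    }

    public static void main(String[] args) {
        for (Routing r : connectBothWays(1, 10, 20)) {
            System.out.println("routing " + r.id + ": network " + r.srcNetworkId
                    + " -> network " + r.dstNetworkId);
        }
    }
}
```

Because each direction has its own identifier, an observation can refer to exactly one direction of the link.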
2.4.2 Logical topology and network properties
With the logical topology tables, it is possible to assign a logical role¹ to any host
computer or network at any given time. This is done by setting the attributes from and to
in table Assigned_logical_role. These attributes represent relative time from the
beginning of the simulation (i.e. if there is a record where the attribute from is set to 10
seconds and the attribute to is set to 25 seconds, the logical role assigned to the network
element is active between the 10th and 25th second of the simulation). A network
element may change its logical role any number of times² but must not have more than
one role assigned at any moment. Network elements may also belong to different
logical groups according to their geographical, organizational or any other logical
structure (tables Group and Network_element_group). Logical groups are not currently
used in the CPG prototype. Table Network_property holds values of various network
parameters that can be specified for either links or interfaces.
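A minimal Java sketch of how the from and to attributes could be evaluated is given below. It assumes that the interval is half-open (from inclusive, to exclusive), which the text does not state; the class and method names (LogicalRoleSketch, roleAt, hasOverlap) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper around rows of table Assigned_logical_role. */
class LogicalRoleSketch {

    static final class AssignedLogicalRole {
        final int networkElementId;
        final String role;      // e.g. "idle", "victim", "bot", "master"
        final long fromSeconds; // relative to simulation start, assumed inclusive
        final long toSeconds;   // relative to simulation start, assumed exclusive

        AssignedLogicalRole(int networkElementId, String role, long fromSeconds, long toSeconds) {
            this.networkElementId = networkElementId;
            this.role = role;
            this.fromSeconds = fromSeconds;
            this.toSeconds = toSeconds;
        }
    }

    /** Returns the role active for the element at the given second, or null if none. */
    static String roleAt(List<AssignedLogicalRole> rows, int elementId, long second) {
        for (AssignedLogicalRole r : rows) {
            if (r.networkElementId == elementId && r.fromSeconds <= second && second < r.toSeconds) {
                return r.role;
            }
        }
        return null;
    }

    /** Checks the rule that an element never has two roles at the same moment. */
    static boolean hasOverlap(List<AssignedLogicalRole> rows) {
        for (int i = 0; i < rows.size(); i++) {
            for (int j = i + 1; j < rows.size(); j++) {
                AssignedLogicalRole a = rows.get(i);
                AssignedLogicalRole b = rows.get(j);
                if (a.networkElementId == b.networkElementId
                        && a.fromSeconds < b.toSeconds && b.fromSeconds < a.toSeconds) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<AssignedLogicalRole> rows = new ArrayList<>();
        rows.add(new AssignedLogicalRole(1, "victim", 10, 25));
        rows.add(new AssignedLogicalRole(1, "bot", 25, 40));
        System.out.println("role at second 12: " + roleAt(rows, 1, 12));
        System.out.println("overlapping roles: " + hasOverlap(rows));
    }
}
```

With a half-open interval, two consecutive roles can share a boundary value without violating the single-role rule.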
2.4.3 Measurement infrastructure
Measurement infrastructure storage contains data about all probes deployed in the
virtual network. Each probe has a record in table Measuring_probe which contains general
information about the probe. Additionally, since probes can be of different types, every
probe must have a record in exactly one of the Measuring_probe's child tables. Probes
deployed directly in hosts have their records in the Host_probe table, probes that monitor
communication that passes through host interfaces within a network are stored in the
Internal_network_probe table and probes monitoring communication between networks
are kept in the External_network_probe table. Because one external network probe can
monitor several links and one link can be monitored by several probes, there is also a
binding table Monitored_link which reflects this possibility in the database.
¹ Currently one of: idle, victim, bot, master.
² For example, a victim may become the attacker and then a victim again in a single scenario.
2.4.4 Definitions of network parameters and characteristics
There are many types of network parameters and characteristics that can be measured
during scenario executions. Any new scenario may define a new set of such measurable
characteristics and that is why they need to be stored in a general and extensible way.
All the definitions of network parameters and characteristics are stored in the table
Phenomenon_type, which holds their name and their unit. If a phenomenon type cannot
be measured as a numerical value with a unit, it is assigned an enumeration of values
that it may acquire (table Phenomenon). This is an example of the definition of a
phenomenon type with a unit:
phenomenon_type
• name = "Number of bits"
• unit = "bit"
This is a phenomenon type that has a list of supported values specified:
phenomenon_type
• name = "network protocol"
• unit = null
phenomenon
• name = "UDP"
phenomenon
• name = "TCP"
The Measurement_type table stores information about particular measured characteristics
for a scenario. Such characteristics may be more complex and be composed out of several
phenomenon types. Therefore, there is a binding Measured_phenomenon_type table
between the tables Measurement_type and Phenomenon_type. The binding table also
specifies whether the phenomenon type it binds is a derived characteristic or not:
measurement_type
• name = "traffic increase"
via measured_phenomenon_type
• derived_characteristic = "no"
phenomenon_type
• name = "flows"
via measured_phenomenon_type
• derived_characteristic = "yes"
phenomenon_type
• name = "5 min cumulative flows"
Table Detection_method should serve as a place for storing different methods for
measuring network characteristics but it has not been used in the prototype so far.
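The distinction between phenomenon types with a unit and phenomenon types with an enumeration of supported values can be sketched in Java as follows. This is only an illustration under assumptions: the class PhenomenonSketch, the factory methods and the accepts check are hypothetical and do not describe how the CPG validates its data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical object view of tables Phenomenon_type and Phenomenon. */
class PhenomenonSketch {

    static final class PhenomenonType {
        final String name;
        final String unit;            // null when the type uses enumerated values
        final List<String> phenomena; // names of the supported values (table Phenomenon)

        private PhenomenonType(String name, String unit, List<String> phenomena) {
            this.name = name;
            this.unit = unit;
            this.phenomena = Collections.unmodifiableList(new ArrayList<>(phenomena));
        }

        /** A type measured as a number with a unit, e.g. "Number of bits" in "bit". */
        static PhenomenonType numeric(String name, String unit) {
            return new PhenomenonType(name, unit, Collections.<String>emptyList());
        }

        /** A type without a unit that lists its supported values, e.g. UDP and TCP. */
        static PhenomenonType enumerated(String name, String... supportedValues) {
            return new PhenomenonType(name, null, Arrays.asList(supportedValues));
        }

        boolean isEnumerated() {
            return unit == null;
        }

        /** Assumed validation rule: enumerated types accept listed values, others accept numbers. */
        boolean accepts(String value) {
            if (isEnumerated()) {
                return phenomena.contains(value);
            }
            try {
                Double.parseDouble(value);
                return true;
            } catch (NumberFormatException e) {
                return false;
            }
        }
    }

    public static void main(String[] args) {
        PhenomenonType bits = PhenomenonType.numeric("Number of bits", "bit");
        PhenomenonType protocol = PhenomenonType.enumerated("network protocol", "UDP", "TCP");
        System.out.println(bits.name + " accepts 1024: " + bits.accepts("1024"));
        System.out.println(protocol.name + " accepts ICMP: " + protocol.accepts("ICMP"));
    }
}
```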
2.4.5 Observation storage
Measurements obtained during the experiment are located in the observation storage.
Table Measurement keeps all the numeric values of all measured phenomenon types. The
values stored there are the actual measured values sent by probes during the
experiment. The phenomenon types that use an enumeration of values rather than
numeric values (the ones with records in table Phenomenon) have their measurements
stored in table Category_observation. There is also a table Observation which serves as a
parent of two previously mentioned tables and keeps the information about the origin
(which probe acquired the data) and the date and time of each measurement. It also
specifies whether the measurement took place in a link or in a host's interface (either
incoming or outgoing communication) by providing its identifier.
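How a consumer of the observation storage might decide where a measurement took place can be sketched as below. The sketch assumes that table Observation carries three nullable identifiers (a link, an incoming interface and an outgoing interface) and that at most one of them is set; the Java names are hypothetical.

```java
/** Hypothetical object view of a row of table Observation. */
class ObservationSketch {

    enum Place { LINK, INTERFACE_IN, INTERFACE_OUT, UNKNOWN }

    static final class Observation {
        final int id;
        final int measuringProbeId;       // which probe acquired the data
        final long timestampMillis;       // date and time of the measurement
        final Integer routingId;          // set when measured on a link, else null
        final Integer nodeInterfaceIdIn;  // set for incoming interface traffic, else null
        final Integer nodeInterfaceIdOut; // set for outgoing interface traffic, else null

        Observation(int id, int measuringProbeId, long timestampMillis,
                    Integer routingId, Integer nodeInterfaceIdIn, Integer nodeInterfaceIdOut) {
            this.id = id;
            this.measuringProbeId = measuringProbeId;
            this.timestampMillis = timestampMillis;
            this.routingId = routingId;
            this.nodeInterfaceIdIn = nodeInterfaceIdIn;
            this.nodeInterfaceIdOut = nodeInterfaceIdOut;
        }

        /** Derives the place of measurement from whichever identifier is present. */
        Place place() {
            if (routingId != null) {
                return Place.LINK;
            }
            if (nodeInterfaceIdIn != null) {
                return Place.INTERFACE_IN;
            }
            if (nodeInterfaceIdOut != null) {
                return Place.INTERFACE_OUT;
            }
            return Place.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        Observation onLink = new Observation(1, 7, 0L, 3, null, null);
        Observation incoming = new Observation(2, 7, 1000L, null, 5, null);
        System.out.println("observation 1 measured at: " + onLink.place());
        System.out.println("observation 2 measured at: " + incoming.place());
    }
}
```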
3 Technologies for Mapping Relational Data to Objects in Java
Many contemporary applications use huge amounts of data which need to be stored and
retrieved in some fashion. Although there is a wide range of available technologies for
data management, relational databases still remain the most popular general purpose
data stores that exist. The vast majority of world's corporate data is most likely stored in
them [8].
However, in an object-oriented environment such as the one provided by Java, objects are
used to represent data. They are an ideal abstraction for building complex systems as
they offer features like encapsulation, inheritance and polymorphism. The
problem is that objects are only accessible while the Java Virtual Machine (JVM) is
running. If the JVM stops, all of the objects are lost unless there is a mechanism which
stores them for later use [9].
The common solution is to store important objects in a relational database. The
mechanisms that are responsible for it must ensure that data which were held by the
objects during execution of the application are accessible even after the application has
been terminated. They must also provide a way of recreating the objects from the stored
data. This is called object persistence [10]. Using relational databases to achieve
object persistence, however, introduces a new problem to application development
which is commonly known as the object-relational impedance mismatch.
3.1 The object-relational impedance mismatch
The impedance (or paradigm) mismatch between the relational and object environments
occurs due to their different perception of data. Relational databases see data as
relations that are stored in tables made of rows and columns. Data identification is
provided by special and unique columns or sets of columns called primary keys. Foreign
keys and join tables represent the relationships between tables. On the other hand,
objects have their identity, state and behavior. They can inherit from other objects and
may have references to collections of other objects or themselves [9]. It can also be said
that the object-oriented paradigm is based on proven software engineering principles,
while the relational paradigm is based on proven mathematical principles, and therefore
the two technologies do not work together seamlessly [11]. There are five particular
mismatch problems [12] [13]:
1. Granularity
In relational databases granularity can be implemented in only two levels: table and
column. Moreover, columns should have only atomic values (first normal form). If there
is a need to model a composition, the table must either have all columns of all composite
objects or have foreign keys to other tables that again can only have atomic columns.
However in object oriented languages, programmers can define classes with different
levels of granularity: coarse-grained classes like User which could be composed of
several finer-grained classes such as Address or Person and also atomic values like
username (String class).
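The granularity problem can be illustrated with a short Java sketch in which a coarse-grained User object containing a finer-grained Address has to be flattened into the atomic columns of a single row. The column names address_street and address_city and the helper methods toRow and fromRow are assumptions chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical example of flattening a composed object into atomic columns. */
class GranularitySketch {

    /** Finer-grained class that has no table of its own in this example. */
    static final class Address {
        final String street;
        final String city;

        Address(String street, String city) {
            this.street = street;
            this.city = city;
        }
    }

    /** Coarse-grained class composed of an atomic value and an Address. */
    static final class User {
        final String username;
        final Address address;

        User(String username, Address address) {
            this.username = username;
            this.address = address;
        }
    }

    /** Maps the object graph to one row whose columns hold only atomic values. */
    static Map<String, Object> toRow(User user) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("username", user.username);
        row.put("address_street", user.address.street);
        row.put("address_city", user.address.city);
        return row;
    }

    /** Rebuilds the composed object from the flat row. */
    static User fromRow(Map<String, Object> row) {
        Address address = new Address((String) row.get("address_street"),
                (String) row.get("address_city"));
        return new User((String) row.get("username"), address);
    }

    public static void main(String[] args) {
        User user = new User("jdoe", new Address("Botanicka 68a", "Brno"));
        System.out.println(toRow(user));
    }
}
```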
2. Subtypes
Inheritance is a widely used feature in object oriented languages and therefore many
object models have a lot of subtypes defined. This complicates conversion between the
object and relational environments since there is no inheritance defined in the standard
for relational databases. Although some databases implement it (like PostgreSQL [14]),
inheritance is not usually supported.
3. Identity
In relational databases, identity of a row is specified by its primary key. Each tuple in a
table must differ from the others at least by its primary key and therefore each tuple is
unique. Object oriented languages, however, define identity and equality for objects.
Two non-identical objects (their location in memory differs) may have the same state and
be recognized as equal.
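The difference between identity and equality in Java can be shown with a minimal sketch. The class Host and its fields are hypothetical; the sketch only demonstrates that two objects loaded from the same row may be equal without being identical.

```java
import java.util.Objects;

/** Hypothetical example of object identity versus object equality. */
class IdentitySketch {

    static final class Host {
        final int id;      // would correspond to the primary key of a row
        final String name;

        Host(int id, String name) {
            this.id = id;
            this.name = name;
        }

        /** Equality compares state, not the location in memory. */
        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof Host)) {
                return false;
            }
            Host that = (Host) other;
            return id == that.id && name.equals(that.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, name);
        }
    }

    public static void main(String[] args) {
        Host first = new Host(1, "victim");
        Host second = new Host(1, "victim");
        System.out.println("identical: " + (first == second));
        System.out.println("equal: " + first.equals(second));
    }
}
```

In a relational table both objects would correspond to the same tuple, because the tuple is identified solely by its primary key.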
4. Associations
Associations in an object environment are represented as unidirectional references between
objects. If there is a need to define a bidirectional relationship, the association has to be
defined twice. On the other hand, modelling relationships in relational databases is done
by foreign keys and binding tables (if needed).
5. Data navigation
To navigate through an object-oriented environment, associations between objects have to
be used. Getting from one object to another is therefore done by walking through the
object network. This is not an efficient way of retrieving data from a relational database.
To minimize the number of SQL queries, tables are usually joined first and data are
selected from the resulting table.
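The identity mismatch (item 3 above) can be illustrated with a short Java sketch. The Person class and its fields are hypothetical and serve only as an illustration: two objects built from the same database row are equal by state, yet they remain two distinct objects in memory.

```java
import java.util.Objects;

// Hypothetical domain class used only for illustration.
final class Person {
    private final int id;          // mirrors the primary key column
    private final String lastName;

    Person(int id, String lastName) {
        this.id = id;
        this.lastName = lastName;
    }

    // Equality is defined on state, i.e. on the values a row would hold.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Person)) return false;
        Person p = (Person) other;
        return id == p.id && Objects.equals(lastName, p.lastName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, lastName);
    }
}

final class IdentityDemo {
    public static void main(String[] args) {
        // Two objects materialized from the same (imaginary) row.
        Person first = new Person(1, "Novak");
        Person second = new Person(1, "Novak");

        System.out.println(first == second);      // false: different objects
        System.out.println(first.equals(second)); // true: same state
    }
}
```

A relational database has no counterpart to the first comparison: two rows with the same primary key cannot coexist in one table, so a mapping layer has to decide how many in-memory objects may represent a single row.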
3.2 Mapping relational data to objects in persistence layer
Java Enterprise Edition (Java EE) applications are usually divided into several layers,
which allows them to have different kinds of functionality separated. Most applications
have three distinct layers [15]:
• The presentation layer exposes a user interface, which is responsible for the
presentation of data and interaction with users.
• The business logic layer encapsulates all application logic, data analysis and processing.
• The data access layer is responsible for storing and retrieving data from long-term
storage.
Figure 3: Three-layered architecture [61]
The mapping of relational data to objects in Java is usually implemented in the data access
layer, which is also called the persistence layer. There are three currently popular
approaches for handling the mapping in Java [16] [17]. Developers of the persistence layer
can directly use a call-level API for SQL-based database access. In Java it is the JDBC
API³. If using the JDBC API directly is not considered a suitable option for a project, a
persistence framework can be used. A lot of persistence frameworks in Java provide
Object-Relational Mapping (ORM) solutions. These frameworks hide the underlying
relational storage and JDBC API calls. Also, by employing a variety of mapping
techniques they allow developers to work only with an object domain model without the
need for deep SQL knowledge. Although ORM technology has many advantages,
sometimes it is not the best solution for data access. In these cases there are certain
hybrid and non-ORM (usually SQL-centric) frameworks, each one having its own
advantages and disadvantages. The following sections focus on each of the approaches
along with some available technologies that implement them.
3.3 Direct JDBC API calls
3.3.1 History
The JDBC API is the industry standard for connectivity between the Java programming
language and a wide range of databases. It was first released as a part of Java
Development Kit (JDK) 1.1 in February 1997 and since then has formed part of the Java
Standard Edition (Java SE) [18]. Later versions came in years 1999 (JDBC 2.1), 2001 (JDBC
3.0) and 2006 (JDBC 4.0). The newest version, JDBC 4.2, was specified by a maintenance
release of Java Specification Request (JSR) 221 in October 2011 and included in Java SE 8
[19].
³ JDBC is a trademarked name, but it is often thought that it stands for Java Database
Connectivity. Java™ DataBase Connectivity has been later added as a second trademarked name
[21].
3.3.2 Technology
The JDBC API is said to allow a Java programmer to exploit "Write Once, Run Anywhere"
capabilities for applications that need to access enterprise data [20]. As shown in
Figure 4, there are two sets of interfaces that together form the technology: the first one
is the JDBC API for application developers and the second one is the JDBC Driver API for
driver writers [20]. In the rest of the text only the JDBC API for application developers
will be considered.
Figure 4: JDBC interfaces [21] (diagram labels: Java Application, JDBC API, JDBC Driver
Manager, JDBC Driver API, JDBC Implementation)
The JDBC API allows the developers to do three things [21]:
1. Establish a connection with a data source
2. Send queries and update statements to the data source
3. Process the results
A simple example of all three steps is shown in the following code fragment:
// The types used below come from the java.sql package.
/* 1. Establish connection with a database */
Connection connection =
    DriverManager.getConnection(url, username, password);
/* 2. Prepare and execute a query */
Statement statement = connection.createStatement();
String sql = "SELECT id, last_name FROM person";
ResultSet rs = statement.executeQuery(sql);
/* 3. Extract data from the result set */
while (rs.next()) {
    int id = rs.getInt("id");
    String lastName = rs.getString("last_name");
    // ... use the retrieved data
}
3.3.3 When to use the JDBC API directly
The JDBC API is a low level database access tool in Java. As such, it does not introduce
any unnecessary overhead while establishing connections with databases and executing
SQL statements, which means that it has a very good performance. It also allows
developers to use any desired features of the target relational database since it can
execute any required SQL statement.
However, there are several reasons why using the JDBC API directly is not a good choice
for many applications. The JDBC API does not offer any automatic features for object
mapping, so programmers have to map the data to objects themselves. Moreover, writing
JDBC code is time-consuming, verbose and therefore error-prone. A programmer has
to write code to obtain connections, prepare statements to execute, handle the results
and also close all the connections and handle exceptions. All of this leads to very
complex and unmaintainable code in larger projects. Also, as the JDBC API calls and
SQL statements are usually embedded into code, any change (such as changing the
database vendor) is very difficult.
Accessing relational data by direct JDBC API calls should be employed when there are
very specific requirements laid upon the executed SQL statements or when the best
possible performance is required. It should not be used when there is a complex business
object model and changes are expected. It should not be used for large projects.
Advantages                          | Disadvantages
Simple syntax, easy to learn        | Complex when used in large projects
Good performance with large data    | Programming overhead
Good for small applications         | No cache
Control over executed SQL           | Database specific queries
                                    | No SQL code generated
                                    | Does not provide transparent persistence
Table 1: Advantages and disadvantages of using the JDBC API directly
3.4 ORM frameworks
Object-Relational Mapping, broadly referred to as ORM, is a technique for converting
data between incompatible type systems in object-oriented languages and relational
databases. ORM forms an intermediary between the object model and the relational model
and creates, in effect, a "virtual object database" that can be used from within a
programming language [22]. This should allow developers to work only with objects
during the application development and shield them from the underlying relational
database. Developers, therefore, do not need a thorough knowledge of SQL and the
source code is not bound to any database vendor, which simplifies changing the
database vendor if needed.
3.4.1 History
The need for ORM frameworks emerged after Java 2 Enterprise Edition (J2EE) was
released in 1999. The two means for accessing persistent stores (mostly relational databases)
provided by J2EE were JDBC and entity beans, which were a part of the Enterprise JavaBeans
(EJB) framework [23]. Since using JDBC directly is not a suitable choice in many
cases and the EJB framework was considered to be too heavyweight and resource
consuming, new persistence solutions were sought. In 2002 the first popular,
fully featured open source ORM solution was released - Hibernate [24]. Hibernate
provided a simpler and more efficient way of persisting objects than EJB and soon
became the most used persistence framework. Later, in 2006, a standard called Java
Persistence API (JPA) was released as a part of JSR 220 [25]. As a new standard for
managing relational data in Java, JPA combined the best ideas from other actively used
frameworks and standards at that time [26], which apart from Hibernate and EJB were
also the Java Data Objects [27] (JDO) standard and the TopLink framework. JPA is
currently in version 2.0 and the standard is distributed as JSR 317 [28].
3.4.2 Technologies
The Java Persistence API is a specification describing how POJOs (Plain Old Java Objects)
can be persisted to a relational database without requiring the classes to implement any
special interfaces or methods. It allows the object-relational mappings of objects
to be described by standard annotations within Java code or by XML
(Extensible Markup Language) files. These mappings contain information about how
Java classes map to relational database tables. JPA also describes, in its EntityManager
API, how query processing and transactions should be handled. Part of the JPA standard
is focused on the Java Persistence Query Language (JPQL) - an object query language that
allows querying of the objects from a relational database.
Nowadays there are several implementations of the JPA standard, such as EclipseLink
[29], Apache OpenJPA [30] and Hibernate⁴ [31].
Hibernate, with its many features, extensive documentation and vast active community,
is the most popular ORM framework in Java. As an open source project, it can be used
for free. It supports lazy initialization, numerous fetching strategies and offers a scalable
architecture. It also implements a second level cache to speed up the execution of hot
queries⁵. Hibernate supports most of the currently used relational databases like Oracle,
DB2, SQL Server, MySQL, PostgreSQL and many more [32]. Since Hibernate provides
many features and is generally a very mature project, it is very complex. Its complexity
means a steep learning curve and it makes debugging difficult.
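The difference between a cold and a hot query can be illustrated with a deliberately simplified result cache. The sketch below is not how Hibernate implements its second level cache; the class QueryResultCache and the function standing in for the database are assumptions made only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified result cache (hypothetical; not Hibernate's implementation).
final class QueryResultCache<R> {
    private final Map<String, R> results = new HashMap<>();
    private final Function<String, R> database; // stands in for real query execution
    private int databaseHits = 0;

    QueryResultCache(Function<String, R> database) {
        this.database = database;
    }

    R execute(String query) {
        R cached = results.get(query);
        if (cached != null) {
            return cached;               // hot query: answered from memory
        }
        databaseHits++;                  // cold query: the database is reached
        R loaded = database.apply(query);
        results.put(query, loaded);
        return loaded;
    }

    int databaseHits() {
        return databaseHits;
    }
}

final class CacheDemo {
    public static void main(String[] args) {
        QueryResultCache<String> cache =
            new QueryResultCache<>(sql -> "rows for: " + sql);
        cache.execute("SELECT id FROM person"); // cold
        cache.execute("SELECT id FROM person"); // hot
        System.out.println(cache.databaseHits()); // 1
    }
}
```

A production cache additionally has to invalidate entries when the underlying data change, which is one source of the configuration effort mentioned above.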
On the other hand, one of the biggest advantages that Hibernate has over other ORM
frameworks is a toolset created for the framework called Hibernate Tools [33]. It contains
several powerful features that ease the development with Hibernate such as:
• Mapping Editor - an editor for creating Hibernate mapping files that supports
autocompletion and syntax highlighting
• Hibernate Console - helps with the configuration of database connections and
allows executing Hibernate Query Language⁶ (HQL) queries interactively
against a database
• a database reverse engineering tool for generating domain model classes,
Hibernate mapping files and HTML documentation
There are also many tools and frameworks that employ the idea of ORM but do not
follow the JPA standard. Some of these frameworks are:
• Apache Cayenne [34] - an open source framework, which provides ORM
features, caching and remoting services. Cayenne offers a mode where multiple
clients can connect to a data source through a Cayenne controlled service (instead
of via JDBC) which gives better control over centralized validation, caching and
seamless persistence of objects. It supports a number of relational databases,
which results in good portability. It also includes a GUI-based database schema
modeler to simplify the learning of the framework and quicken the development
of applications.
⁴ Although Hibernate was created before the JPA standard was established and has its own
non-standardized functionality and features, it also implemented the standard after the JPA was
released.
⁵ Hot query is second, third or any other later execution of a query (first execution is a cold query).
⁶ Hibernate Query Language [62] is a broader version of the JPQL. JPQL is a subset of HQL.
• Ujorm [35] - a small Java open source library partly inspired by Hibernate and
Cayenne. It is a very lightweight framework with no library dependencies at
runtime. Ujorm offers a unique ORM module for rapid Java development and
allows an easy configuration of an ORM model by Java source code, annotations
or XML. The key features include type-safe database queries, which ensure that
most typing errors are detected before running the application.
3.4.3 When to use ORM framework
Opinions about using ORM frameworks differ a lot. Some state that ORM is a bad
concept, an anti-pattern [36], and that ORM frameworks usually introduce a lot of
problems into a project [37] [38] [39]. The problem with ORM is that it is a leaky
abstraction [40], so it cannot completely shield the underlying relational database from a
programmer. If a user of an ORM tool knows how ORM works and how it
cooperates with relational databases, ORM may be the right choice for many projects.
Since trying to reach a full object abstraction over relational data generates complexity,
ORM frameworks tend to be very complicated. This affects the performance of ORM
frameworks. More mature and complex frameworks such as Hibernate must be properly
configured to achieve acceptable performance [41], which may slow down a project. On
the other hand, if there are no special requirements on a project's performance or there
is an expert on the chosen ORM framework in the team, this problem may not
appear and the framework can be used successfully.
ORM frameworks can also be used when a development team has full control over both
the business object model and the database model (schema). In this case, if one
model evolves, there is no problem with evolution of the other model. However, in most
cases the database and application developers are in separate groups and the database is
often used by more teams. This renders any changes to the database schema very
inconvenient and it must be considered when choosing the framework.
A huge advantage of ORM frameworks is that they generate SQL statements themselves,
so the statements are not embedded in the code and the resulting code is portable between
supported databases. Switching the underlying database is therefore very quick and
usually only includes changing a database driver in the framework's configuration. The
downside of this behavior is that some ORM frameworks do not support stored
procedures and advanced non-standardized database features provided by many
vendors.
Some ORM frameworks also support the creation of mappings between objects and
relations via annotations in the code, which can speed up the development significantly.
Figure 5 contains simple guidelines about when an ORM should be used. If the business
object model of a project is very complex and the application's performance is not as
important as, for example, portability, using an ORM tool should be beneficial for the
project. On the other hand, if the model is simple and there is a requirement of high
performance, an ORM framework could cause problems. If, however, an expert on the
framework is present in the project, the ORM framework may be considered.
Figure 5: When to use an ORM [63] (axes: Model Complexity, Throughput; regions: Don't Use,
ORM Expert, Optionally Use, Definitely Use)
Advantages                                  | Disadvantages
Transparent persistence                     | Performance issues (needs tuning)
Encourages object-oriented design           | Huge mapping overhead
Easy to change database vendor              | Often very complex, difficult debugging
Often packed with powerful tools            | Little control over executed SQL
No deep understanding of database required  | Difficulties with handling complex queries
Good caching support                        |
Table 2: Advantages and disadvantages of ORM frameworks
3.5 Hybrid frameworks
ORM frameworks can solve a lot of problems that come with the impedance mismatch
but may introduce new problems to a project if used inappropriately or in a situation
when they are not suitable. Direct JDBC API usage is also usually not the best solution,
mostly due to its verbosity, limited portability and code reusability. For these situations
there are several frameworks which, despite not providing full object-relational
mappings, offer a higher level of functionality than the pure JDBC API. They could
be categorized somewhere between ORM and JDBC and could be called hybrid
frameworks.
Hybrid frameworks tend to be more SQL-centric than ORM frameworks. They usually
serve as an intelligent wrapper around the JDBC API and hide the need of a lot of
boilerplate code that is typical for the JDBC API-based applications. Developers,
therefore, do not have to create connections, prepare statements, iterate through result
sets and sometimes populate objects with data themselves as the frameworks usually
take care of such operations. The frameworks facilitate the execution of SQL via Java
code and offer additional features.
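The wrapper idea described above can be sketched in plain Java. The following is a toy model and not the API of any real framework: the names MiniTemplate and RowMapper are hypothetical, and result rows are modelled as in-memory maps so that the sketch runs without a database. A real wrapper would obtain a connection, execute the statement and iterate over a java.sql.ResultSet at the point marked in the comments.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Callback that turns one result row into one object (hypothetical name).
interface RowMapper<T> {
    T mapRow(Map<String, Object> row);
}

// Toy stand-in for a JDBC wrapper: it owns the iteration,
// the caller only supplies the SQL text and the mapping.
final class MiniTemplate {
    private final List<Map<String, Object>> table; // simulated query result

    MiniTemplate(List<Map<String, Object>> table) {
        this.table = table;
    }

    <T> List<T> query(String sql, RowMapper<T> mapper) {
        // A real wrapper would open a connection, run `sql` and close all
        // resources here; this sketch ignores `sql` and reads from memory.
        List<T> result = new ArrayList<>();
        for (Map<String, Object> row : table) {
            result.add(mapper.mapRow(row));
        }
        return result;
    }
}

final class MiniTemplateDemo {
    static List<Map<String, Object>> sampleRows() {
        List<Map<String, Object>> rows = new ArrayList<>();
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("id", 1);
        first.put("last_name", "Novak");
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("id", 2);
        second.put("last_name", "Svoboda");
        rows.add(first);
        rows.add(second);
        return rows;
    }

    public static void main(String[] args) {
        MiniTemplate template = new MiniTemplate(sampleRows());
        List<String> names = template.query(
            "SELECT id, last_name FROM person",
            row -> (String) row.get("last_name"));
        System.out.println(names); // [Novak, Svoboda]
    }
}
```

The calling code contains no connection handling and no loop over the result; this is the boilerplate that such frameworks are meant to remove.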
3.5.1 Technologies
Each of the hybrid frameworks is unique and solves particular problems. The purpose of
each framework has to be considered when choosing the correct technology for a project.
Three different technologies will be briefly described in this section: Spring JDBC
Framework, jOOQ and MyBatis.
The Spring JDBC Framework [42] is a part of the Spring Framework [43]. It provides
solutions to the low-level details of the JDBC API, like opening and closing connections,
preparing and executing SQL statements, processing exceptions and handling
transactions. A developer must only define connection parameters, specify the SQL
statements to be executed and determine the work for the iterations that take place
while data are fetched from a database. The core class of the framework, which manages
all database communication and exception handling, is the JdbcTemplate class. It
supports the full functionality of the JDBC API while offering automatic clean-up of
resources, translation of the standard JDBC exceptions into RuntimeExceptions for
better flexibility, and several ways of querying a database.
Java Object Oriented Querying (jOOQ) [44] is a Java database framework for building
type-safe portable SQL queries and their execution. The main idea behind jOOQ is that
SQL is a declarative language that is hard to integrate into object-oriented programming
languages but it is the correct tool for database querying. jOOQ takes SQL as an external
domain-specific language [45] and maps it onto Java, creating an internal domain-specific
language (DSL). jOOQ's main feature is the SQL building. Developers are able to
construct valid SQL statements directly from the Java code using the internal DSL
provided by jOOQ. The constructed statements can be afterwards executed against any
of the many supported databases. The builder supports all of the standardized SQL
functionality (like insert, update or any sort of select with joins, groups, etc.) and also
some vendor specific features (like MySQL's encryption functions) if a corresponding
DSL subclass is used. Other features of jOOQ include SQL execution and code
generation tools.
The following code fragment shows a simple SQL query written with the jOOQ framework:
create.select(AUTHOR.FIRST_NAME, AUTHOR.LAST_NAME, BOOK.TITLE)
      .from(AUTHOR)
      .join(BOOK).on(BOOK.AUTHOR_ID.equal(AUTHOR.ID))
      .where(BOOK.LANGUAGE.equal("EN"))
MyBatis [46] is the most popular hybrid framework. Its first version was released on
July 1, 2002 by Clinton Begin under the name "The iBATIS Database Layer" [47]. The
framework was not actually released as a separate product back then; rather, it was a part of
JPetStore - a Java implementation of Microsoft .NET's Pet Shop [48]. However, it was
well received by the Java community [49] and was later released separately under the
Apache Software Foundation. When the project left the Apache Software Foundation, its
name was changed to MyBatis.
MyBatis is a Java persistence framework that couples objects with stored procedures or
SQL statements using an XML descriptor or annotations. When compared to the ORM
tools, the biggest advantage of the MyBatis data mapper is simplicity. Unlike ORM
frameworks, MyBatis does not map objects to database tables but methods to SQL
statements. It provides a mapping engine that maps SQL results to object trees in a
declarative way (i.e. after the mapping of the result columns to object properties is
specified, MyBatis automatically creates and populates new objects with the data from
the result). One of the most powerful features of MyBatis is also its Dynamic SQL
capabilities that allow developers to write flexible SQL statements which may be
interpreted differently with different parameter values.
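The idea behind dynamic SQL can be sketched without MyBatis itself. In MyBatis such conditions are declared in the mapper (for example with if elements in the XML descriptor); the plain Java below only imitates the effect and is not MyBatis code. The class PersonQuery, the table person and its columns are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Imitates dynamic SQL: the statement text depends on which parameters are set.
final class PersonQuery {
    final String sql;
    final List<Object> parameters;

    private PersonQuery(String sql, List<Object> parameters) {
        this.sql = sql;
        this.parameters = parameters;
    }

    // A null argument means "do not filter on this column".
    static PersonQuery build(String lastName, Integer minAge) {
        StringBuilder sql = new StringBuilder("SELECT id, last_name FROM person");
        List<String> conditions = new ArrayList<>();
        List<Object> params = new ArrayList<>();
        if (lastName != null) {
            conditions.add("last_name = ?");
            params.add(lastName);
        }
        if (minAge != null) {
            conditions.add("age >= ?");
            params.add(minAge);
        }
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return new PersonQuery(sql.toString(), params);
    }

    public static void main(String[] args) {
        // prints: SELECT id, last_name FROM person
        System.out.println(build(null, null).sql);
        // prints: SELECT id, last_name FROM person WHERE last_name = ? AND age >= ?
        System.out.println(build("Novak", 18).sql);
    }
}
```

The values are passed as statement parameters rather than concatenated into the SQL text, so one mapped method can serve several filter combinations without exposing the query to SQL injection.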
Since all SQL queries in MyBatis applications are hand coded rather than generated, they
can be properly optimized and tested before being deployed. All the database
functionality and vendor specific features can be used easily this way, too. The downside
is that it makes portability between databases more inconvenient if vendor-specific
syntax is used.
3.5.2 When to use hybrid frameworks
Hybrid frameworks differ a lot from each other. All of them have their own advantages
and disadvantages and solve specific problems tied to accessing relational databases
from Java. This means that if neither an ORM framework nor the direct use of the JDBC API
is suitable for a project, a hybrid framework can be handpicked to match a particular
problem.
As hybrid frameworks are generally oriented towards SQL more than ORM frameworks,
they excel in projects where developers have good knowledge of SQL and know how
relational databases work. They are usually very lightweight since no complex
object-relational mappings take place, and therefore they are easy to learn, use and debug
and have decent performance.
Advantages                            | Disadvantages
Full power of SQL                     | Does not provide full ORM solution
Good performance                      | SQL must be hand-written
Simplicity                            | Weak caching support
Easy debugging                        | No fully transparent persistence
Can be chosen for a specific problem  | Knowledge about relational databases and SQL needed
Table 3: Advantages and disadvantages of hybrid frameworks
4 System Requirements and Architecture
The Cybernetic Proving Ground's visualization module has two main responsibilities: to
expose a GUI for controlling the system and to present the process of scenario simulation
execution and its results to a user. The presentation of scenario execution is done by a
number of more or less independent visualizations, each concentrating on a specific part
or view of the simulation. Security scenarios may need to be visualized by specific
visualizations. As new scenarios are created, additional visualizations will be
deployed to the existing module.
The module is developed as a web application. To ensure appropriate scalability, the
visualizations are implemented as portlets inside an enterprise portal. The chosen
platform for hosting the visualization portlets is a free and open source enterprise portal
written in Java, called Liferay Portal [50].
In order to present the simulation process and results, each visualization portlet needs a
mechanism for accessing the database and acquiring the appropriate data. One
possibility was that every portlet would maintain its own connection with the database,
but that was not considered a suitable option since every
portlet developer would have to understand the database schema and write the data
access code and queries themselves. This could slow down the development
process and the resulting code would be difficult to maintain. Therefore it was decided
that there should be a separate backend data service that would shield the frontend
portlets from the database. Only the new Visualization Data Service (VDS) would access
the database and prepare all of the required data. This concept is shown in Figure 6.
The first step when developing a new application or service is to analyze the problem
and specify the requirements. Afterwards, based on the results of the analysis, the
selection of technologies suitable for implementation and the design of application
architecture take place. These parts of software product development are extremely
important as any design or technology change would need a lot more resources (time,
people, money, etc.) in later stages of the project than in the beginning. The analysis and
design of the VDS is described in the following sections.
Figure 6: The placement of the Visualization Data Service
4.1 Requirements
The Visualization Data Service was going to be designed and implemented at the same
time as the visualization portlets to which it would provide the data. Therefore, most of
the functional requirements on the service would be specified during the development.
In this case the functional requirements are almost entirely various data services that the
VDS would provide to the portlets based on their requests. The architecture of the VDS
has to allow new functionality to be added easily when needed.
The result was that only a set of non-functional requirements on the Visualization Data
Service and its architecture was established. It was based on the information about the
CPG project, the data model and planned visualization portlets. The non-functional
requirements are as follows:
Simple architecture
The architecture of the service should be as simple and transparent as possible. Firstly, it
is generally a good idea to follow the KISS principle [51] as it usually leads to faster
development and more maintainable code. The second reason is that the current
implementation of the VDS is supposed to be a prototype for the Release Candidate 1 of
the visualization module and after that it will probably be taken over by a different
developer, most likely a student. Simple architecture and easy-to-learn frameworks
should be used in order to ease the handover to the new developer.
Ability to adapt to frequent database schema changes
At the beginning of the development of the data service, the CPG database schema (the
current version is described in section 2.4) was not final and was supposed to change
occasionally if any of the CPG modules required it. The implementation of the VDS must
not need complex code refactoring if such a change occurs.
Optimized for reading relational data
The data measured during a scenario execution will be stored in a relational database⁷ by
modules other than the visualization module. The responsibility of the VDS will be only to
read the required data, process them and expose them to the visualization portlets. It
will not support any insert, update or delete operations.
Performance
The amount of data that are going to be stored during the experiments is expected to be
enormous as there will be several network and host characteristics measured
periodically on a huge number of hosts, links and network interfaces. The service should
not introduce unnecessary overhead while reading the data so it does not excessively
slow the visualization portlets and their interaction with the user.
Possibility to change the database vendor
Although PostgreSQL was agreed upon as the database, the service application
should not be tightly coupled with any database vendor and should allow a relatively
easy change of the vendor without extensive code modifications.
Independent from visualization portlets
By separating the data service from the portlets, the visualization module becomes more
scalable. The development of the service and portlets can be easily separated as well
(provided that the format for data requests and responses is agreed upon) and they can
be deployed individually.
4.2 Choosing technologies
Several technology decisions had to be made prior to designing the application
architecture. First of all, it was decided that the application would be written in Java,
like the rest of the visualization module. Using Java ensures that the application will be
platform independent, which is important as no information about the expected deployment of
the service had been provided. Furthermore, it was decided that the Visualization Data
Service will be implemented as a web service. It will expose a Representational State
Transfer (REST) API for data requests and return JavaScript Object Notation (JSON)
objects in its responses. This eases the separation of the service from the visualization
portlets and enables it to be deployed independently.
⁷ Specifically, a PostgreSQL [64] database has been chosen for the CPG prototype.
The Spring Framework (later in the text referred to as Spring) [43] has been chosen as
the core framework the project will be built upon. Spring provides many useful features
for Java projects such as its implementation of Inversion-of-Control (IoC) container,
which simplifies unit testing and promotes the creation of reusable code. Spring also
includes the powerful Web Model-View-Controller (MVC) Framework with
support for creating RESTful services.
As the VDS is primarily a data providing service, the most important technology
decision was to choose a suitable means of data access. Embedding pure JDBC API calls
would make the project hard to maintain, and changing the database schema or vendor
would require significant code changes. Therefore, the main decision was whether to use
an ORM framework or one of the hybrid frameworks. After comparing the leading
technologies, the MyBatis [46] framework has been selected. Hibernate, or generally any
framework implementing JPA, works better as a full ORM solution in projects where various
CRUD⁸ operations are executed frequently and database portability is the most
important aspect rather than simplicity and adaptability to data model changes [52].
jOOQ is good for executing portable SQL statements but it is also not suitable for
projects with possible model changes. On the other hand, MyBatis suits the project
requirements well as it is not a complex framework and has a good overall performance
without unnecessary overhead, while providing sufficient data-to-object mapping
features. The SQL statements in MyBatis are hand written so they can be easily modified
if there are any changes of the data model.
For the integration of the two chosen frameworks, the MyBatis-Spring [53] project has been
used. It is specifically designed to connect the MyBatis and Spring frameworks so they
form a well cooperating base for building applications.
The project's library dependencies and build will be managed by Maven [54], a proven
software management and comprehension tool released by Apache.
4.3 Architecture
The architecture of the VDS is designed in three main layers. Although it is not a typical
web application but rather a web data service, the layers are similar to the ones
described in section 3.2, with a few differences as shown in Figure 7. The layered
architecture was chosen because it improves the reusability and maintainability of the
code and also transparently separates the responsibility of the classes. The
implementation of each layer is thoroughly explained in chapter 5.
⁸ Create, Read, Update, Delete
Figure 7: The architecture of the Visualization Data Service (request URL in, JSON response
out; presentation layer: REST API and controllers on the Spring Web MVC Framework; service
layer: service interfaces and service implementations; data access layer: mapper interfaces
and mapper XML files with MyBatis; database)
The top layer is the presentation layer, which exposes a web interface of the service; in
this case it is the REST API. It receives requests from its clients and responds to them with
JSON objects populated with the desired data. The objects that form the presentation
layer are called controllers. For the request handling and response creation, controllers
use features of the Spring Web MVC Framework.
Beneath the presentation layer there are the service objects, forming the service layer.
Services gather raw data objects from the underlying data access layer, process and
transform the data (if needed) and send them to their respective controllers. Each service
exposes an interface, which is used by the controller objects, in order to offer a simple
way of switching the implementation of service objects if needed.
The last layer, which separates the rest of the application from a database, is the already
mentioned data access layer. This layer is formed mostly by the MyBatis framework,
which ensures the correct execution of SQL statements defined in files called mappers.
Mappers are formed by mapper interfaces and mapper XML files and their purpose and
usage will be explained in section 5.4.
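The interplay of the three layers can be illustrated with a minimal sketch in plain Java. It is not the VDS code: the Spring and MyBatis wiring is left out, all type names ending in Sketch are hypothetical, and the rule that the experiment start equals the smallest stored timestamp is only an assumption made for the illustration.

```java
import java.util.List;

// Data access layer: stands in for a MyBatis mapper interface.
interface TimeMapperSketch {
    List<Long> getAllTimestamps();
}

// Service layer: the interface that controllers depend on.
interface TimeServiceSketch {
    long getExperimentStart();
}

// Service implementation: processes the raw data obtained from the data access layer.
class TimeServiceSketchImpl implements TimeServiceSketch {
    private final TimeMapperSketch mapper;

    TimeServiceSketchImpl(TimeMapperSketch mapper) {
        this.mapper = mapper;
    }

    @Override
    public long getExperimentStart() {
        // Assumption: the experiment start is the smallest stored timestamp.
        long start = Long.MAX_VALUE;
        for (long timestamp : mapper.getAllTimestamps()) {
            start = Math.min(start, timestamp);
        }
        return start;
    }
}

// Presentation layer: turns the service result into a JSON response body.
class TimeControllerSketch {
    private final TimeServiceSketch service;

    TimeControllerSketch(TimeServiceSketch service) {
        this.service = service;
    }

    String experimentStart() {
        return "{\"experimentStart\":" + service.getExperimentStart() + "}";
    }
}
```

Because the controller sketch only knows the service interface, a different service implementation (or a different data access technology behind it) can be supplied without touching the controller.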
Apart from the described horizontal layers, there is also a vertical separation of the
project in the form of components based on their functionality and usage by visualization
portlets. There are currently three main components - the network, measurement and time
components. Each of them encompasses data services that are logically related to each
other and to the type of visualization portlets that are going to use them.
The network component makes the network-related data available and is used by the CPG
topology visualization portlet. In the service layer it is divided into the network usage
service and the topology service.
The measurement component gives information about the phenomenon types measured
during a scenario execution and about the actual stored data. It is used by portlets that
visualize the measurements of various phenomenon types and the change of their values
during a simulation.
The time component offers the information about the time, date and the time zone of the
stored measurements. The service is mostly used by the time portlet which controls the
timeline of all of the visualizations.
The documentation of the services offered by each component of the VDS can be found
in appendix A.
The last architectural decision was to use Data Transfer Objects (DTOs) instead of
business model objects [55]. DTOs are ordinary POJOs that are serializable [56] and
contain only private fields accessible via getters and setters. When a request is received
and the desired data are extracted from a database, appropriate DTOs are created and
populated with the data. Afterwards they are moved through the layers, modified if
necessary and transformed into JSON as the web service's response. The reason for
using DTOs instead of business objects is that the visualization portlets usually have
very specific data needs and therefore require specific objects to hold and transfer these
data. If more general business objects were used, they would still have to pass the data to
some DTOs before creating a response, which would introduce unnecessary complexity.
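A DTO of the kind described above could look roughly like the following sketch. The class name LinkTOSketch is hypothetical; the property names id, srcId and destId are borrowed from the LinkTO object discussed in section 5.4.3, while their int types are an assumption.

```java
import java.io.Serializable;

// Hypothetical sketch of a DTO: a serializable POJO with private fields,
// getters and setters, and no business logic.
class LinkTOSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private int id;      // assumed type
    private int srcId;   // assumed type
    private int destId;  // assumed type

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public int getSrcId() { return srcId; }
    public void setSrcId(int srcId) { this.srcId = srcId; }

    public int getDestId() { return destId; }
    public void setDestId(int destId) { this.destId = destId; }
}
```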
5 Implementation
This chapter first explains the package structure of the project and then focuses on each
of the project's layers and how they utilize the chosen technologies. Afterwards, it
explains how the documentation of the REST API was created. In the end, the current
deployment of the VDS within the CPG is portrayed. The main purpose of this chapter is to
guide developers who will take over the project in the future through the whole
implementation and technology usage and to help them integrate faster into the
visualization development group.
5.1 Package structure
Before the actual implementation can be described, it is important to understand the
package and directory structure of the project. As shown in Figure 8, the structure
follows the standard directory layout [57] used in Maven projects. On the top level there
is pom.xml, an XML file that contains project information and configuration details used
by Maven to build the project. The file includes the project's library dependencies, the
plugins used and some additional information such as the project version and license.
src
    main
        java
            cz.muni.fi.kypo
                mappers
                rest
                service
                transfer
                    error
                    measurement
                    networkusage
                    time
                    topology
        resources
            mappers
            applicationContext.xml
            jdbc.properties
            log4j.properties
            project.properties
        webapp
            WEB-INF
    test
        java
        resources
pom.xml
Figure 8: Package structure of the VDS project
The application source directory is /src/main/java. Inside, there is the project's main
package cz.muni.fi.kypo, which contains all source packages and Java files that form the
application. The packages are:
• Rest - contains controller classes that handle the REST requests. Controllers are
described in section 5.2.
• Service - this package is further divided into api and impl packages; api provides
the interfaces that are called and used by controller classes and impl provides the
concrete implementations of these interfaces, which are then injected into the
controller objects. Services are discussed in section 5.3.
• Mappers - the package includes all of the mapper interfaces that are used by
MyBatis in the data access layer for calling SQL statements. The role of mapper
interfaces is explained in section 5.4.4.
• Transfer - the transfer package is the base package for all DTOs used by the
application. It is subdivided into several packages - error, measurement,
networkusage, time and topology - each containing the DTOs used by their
respective service or controller classes. The error package contains a class for
creating error objects which are used in exception handling (section 5.2.4).
The application's resource files are held in the directory /src/main/resources. There are
three kinds of files:
• The application context XML file used by Spring for IoC container initialization.
• Properties files with various project settings for database connection, logging,
etc.
• Mapper XML files inside the mappers directory. The mapper XML files are
described in section 5.4.3.
Directory /src/main/webapp is the web application source directory. In web applications
it usually keeps all the frontend view files, cascading style sheets and JavaScript libraries.
However, since this project exposes a REST API and returns JSON objects to the callers,
there is no need for such files. Therefore, the directory contains only the WEB-INF
subdirectory with the web application deployment descriptor file, web.xml.
The test sources and resources are located in /src/test/java and /src/test/resources. The
unit tests implemented here ensure that the data processing methods at the service
layer return correctly transformed data. They also verify that the service methods do not
return null values where none are expected.
5.2 Presentation layer
The presentation layer of the VDS exposes a REST API for the implemented data
services. For handling the REST requests, the Spring Web MVC framework is used, as it
has supported easy creation of RESTful web applications and services since Spring 3.0.
After receiving a request, the presentation layer calls the underlying service layer for the
processed data and generates a response in the form of a JSON object.
The following sections explain how the presentation layer is implemented, how it handles
the requests and how the JSON objects used in responses are created. Further, the error
handling and the solution for cross-origin requests are described.
5.2.1 Configuration and initialization
The core of the Spring Web MVC framework and also the presentation layer of the VDS
is a dispatcher servlet, which dispatches the received requests to the appropriate
handlers. Requests are in the form of Uniform Resource Locator (URL). In the VDS, there
is a sole dispatcher servlet with the name kypoDispatcher configured in the web.xml file
and it is mapped to all incoming URLs, which ensures that it receives all of the REST
requests:
<servlet-mapping>
    <servlet-name>kypoDispatcher</servlet-name>
    <url-pattern>/*</url-pattern>
</servlet-mapping>
The dispatcher servlet then needs to know which classes contain handler methods, so it
can properly forward the requests at runtime. These classes are called
controllers and are marked with the @Controller annotation. Classes annotated as
controllers are identified as Spring application components and are automatically
searched for and instantiated during the initialization of the IoC container. This is done
thanks to the following two lines defined in the applicationContext.xml:
1.
2.
The first line instructs Spring to look for and instantiate components that are located in
the cz.muni.fi.kypo package and the second line specifies that these components can
be recognized by their annotations.
There are currently three independent controllers implemented, one for each of the
project's components: the network, measurement and time controllers. Each of them extends
the BaseController class, which contains methods common to all controllers.
5.2.2 Request handling
The request handling in the project's controllers is implemented with various annotations.
The most important one is the @RequestMapping annotation and its value property. It maps
a URL⁹ (or URL pattern) onto an entire controller class or a particular handler method.
The class-level annotation maps a specific request path (or path pattern) onto a
controller, and additional method-level annotations narrow the primary mapping.
Some handler methods annotated with @RequestMapping also take parameters that
can afterwards be used in the method's body. Parameters are either placed directly in the
URL pattern defined in the @RequestMapping and then bound via the @PathVariable
annotation on a method parameter, or they are specified exclusively as a method
parameter annotated with @RequestParam.
This is a code fragment from the MeasurementController class showing the usage of
the above-mentioned annotations:
@Controller
@RequestMapping(value = "measurement")
public class MeasurementController extends BaseController {
    @RequestMapping(value = "/{element}/{id}/initialData", method = RequestMethod.GET)
    @ResponseBody
    public MeasuredDataTO getInitialData(
            @PathVariable String element,
            @PathVariable int id,
            @RequestParam(value = "phenomenons") List<String> phenomenons) {
        //implementation omitted
    }
}
In this case, all requests with the path /measurement are forwarded to
MeasurementController by the dispatcher servlet. More specifically, the requests that
match the pattern /{element}/{id}/initialData will be handled by the
getInitialData() method. A correct URL with specified parameters can be for
example:
/measurement/link/1/initialData?phenomenons=Number of bits,Number of
packets
After receiving such a request, the element variable would be set to "link", id would be
1 and phenomenons would contain a list of the strings "Number of bits" and "Number of
packets".
⁹ Note: all mentioned URL mappings mean just the ending of the URL of the REST request. For
example, the whole URL would be http://localhost:8080/service-name/mapping, while the
@RequestMapping annotation defines just the /mapping part.
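What the annotations achieve for this request can be imitated by hand in plain Java. The following is only a sketch of the binding, not of Spring's actual mechanism: the class InitialDataRequestSketch is hypothetical, the URL pattern is rewritten as a regular expression, and URL decoding of the query string is skipped for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder of the values that Spring would bind to the handler method parameters.
class InitialDataRequestSketch {
    // Regular-expression counterpart of /measurement/{element}/{id}/initialData
    private static final Pattern PATH =
            Pattern.compile("^/measurement/([^/]+)/(\\d+)/initialData$");

    final String element;
    final int id;
    final List<String> phenomenons;

    private InitialDataRequestSketch(String element, int id, List<String> phenomenons) {
        this.element = element;
        this.id = id;
        this.phenomenons = phenomenons;
    }

    static InitialDataRequestSketch parse(String url) {
        // Split the request into the path and the query string.
        int queryStart = url.indexOf('?');
        String path = queryStart < 0 ? url : url.substring(0, queryStart);
        String query = queryStart < 0 ? "" : url.substring(queryStart + 1);

        Matcher matcher = PATH.matcher(path);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("No handler for path " + path);
        }

        // The comma-separated phenomenons parameter becomes a list of strings.
        List<String> phenomenons = new ArrayList<String>();
        for (String pair : query.split("&")) {
            if (pair.startsWith("phenomenons=")) {
                String value = pair.substring("phenomenons=".length());
                phenomenons.addAll(Arrays.asList(value.split(",")));
            }
        }
        return new InitialDataRequestSketch(
                matcher.group(1), Integer.parseInt(matcher.group(2)), phenomenons);
    }
}
```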
5.2.3 Creating JSONs
JSON responses are created automatically by using Spring's @ResponseBody annotation
on handler methods in controllers. If this annotation is put on a method, it indicates that
the returned object should be serialized and written straight to the HTTP response body.
That means that there is no need to create or modify any HTTP response manually as
Spring takes care of it completely. The usage of @ResponseBody can be seen in the code
fragment from MeasurementController class in the previous section.
Spring converts the returned object to a response body by using an
HttpMessageConverter. There are various types of converters, such as
StringHttpMessageConverter for strings or ByteArrayHttpMessageConverter for
byte arrays. For converting objects to (or from) JSON (which is desired in the project),
MappingJackson2HttpMessageConverter has to be initialized. Most of the
HttpMessageConverters are automatically set up by Spring if the
<mvc:annotation-driven /> tag is specified in the application context file. However, for
the initialization of MappingJackson2HttpMessageConverter, the Jackson 2 library [58]
for processing the JSON data format also has to be included and present in the
project, otherwise the converter would not be loaded.
Since several different converters are initialized, the Jackson converter is not
always chosen by Spring to do the conversion. This happens with List, String, Long
and some other common types, as they have their own converters. This is a problem,
because such converters do not produce JSON responses. To force Spring to pick the
Jackson converter, objects of these troublesome types are put into a Map, which is
correctly converted into JSON. A code fragment from TimeController shows an
example of returning a value of type Long:
@RequestMapping(value = "/experiment/start", method = RequestMethod.GET)
@ResponseBody
public Map<String, Long> getExperimentStart() {
    Map<String, Long> experimentStart = new HashMap<String, Long>();
    experimentStart.put("experimentStart", timeService.getExperimentStart());
    return experimentStart;
}
5.2.4 Exception handling
As in every application, an exception may be raised during the runtime of the VDS.
However, the client that issued the REST request expects a JSON object in the
response. Therefore, if an exception occurs, it must be caught, logged for later
debugging and transformed into a comprehensible error message in JSON format that
will form the response.
Since the exception handling is required in each controller, it is implemented in the
BaseController abstract class, which the other controllers inherit from. The
BaseController contains two exception handler methods marked with Spring's
@ExceptionHandler annotation - one for catching runtime exceptions and the other for
catching checked exceptions. When an exception is raised, the appropriate handler method
intercepts it and logs the problem. Afterwards, an error object (which will be serialized
into correct JSON) is constructed and the HTTP response status code is set to 500¹⁰ so
the client knows that there was a problem while processing the request.
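The idea of turning any exception into a JSON error object can be sketched without Spring as follows. The classes ErrorSketchTO and BaseControllerSketch and the fields status and message are hypothetical; the real error DTO and the @ExceptionHandler methods of the VDS may look different.

```java
import java.util.concurrent.Callable;

// Hypothetical error DTO; the field names are assumptions.
class ErrorSketchTO {
    private final int status;
    private final String message;

    ErrorSketchTO(int status, String message) {
        this.status = status;
        this.message = message;
    }

    String toJson() {
        return "{\"status\":" + status + ",\"message\":\"" + message + "\"}";
    }
}

class BaseControllerSketch {
    // Runs a request handler and converts any exception into a JSON error body.
    static String respond(Callable<String> handler) {
        try {
            return handler.call();
        } catch (RuntimeException e) {
            // counterpart of the handler for runtime exceptions
            return error(e);
        } catch (Exception e) {
            // counterpart of the handler for checked exceptions
            return error(e);
        }
    }

    private static String error(Exception e) {
        // A real handler would use the logging framework here.
        System.err.println("Request failed: " + e);
        return new ErrorSketchTO(500, "Internal Server Error").toJson();
    }
}
```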
5.2.5 Cross-origin requests
The fact that the VDS is independent from the visualization portlets means it may be
deployed at a different site than the portlets - it has a different origin. Some portlets use
JavaScript's XMLHttpRequest API for sending data requests to the VDS. The problem is
that the same-origin security restrictions may prevent a client-side web application
running from one origin (= a visualization portlet) from obtaining data retrieved from
another origin (= the VDS).
A Cross-Origin Resource Sharing (CORS) mechanism [59] has been invented to
overcome such restrictions. CORS defines a technique for exchanging data between a
client and a server with diverse origins by using a number of Access-Control headers.
The VDS uses a CORS Filter library to set up a filter which enables and controls any
cross-origin communication. The filter is configured to allow the serving of requests
from any origin, but only the HTTP GET requests, as the data service should not permit
any data modification. The configuration is located in the web application deployment
descriptor file, web.xml.
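The policy described above (any origin, GET only) can be expressed as a small decision function. This is a simplified sketch of the policy, not the behaviour of the CORS Filter library: the class CorsPolicySketch is hypothetical, preflight requests are ignored, and only the Access-Control-Allow-Origin response header is considered.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CorsPolicySketch {
    // Returns the CORS headers to add to a response for the given request.
    // origin is the value of the request's Origin header, or null if it is absent.
    static Map<String, String> responseHeaders(String origin, String method) {
        Map<String, String> headers = new LinkedHashMap<String, String>();
        if (origin == null) {
            // Not a cross-origin request, no CORS headers are needed.
            return headers;
        }
        if (!"GET".equals(method)) {
            // Only reading is permitted; without the header the browser
            // will not expose the response to the calling script.
            return headers;
        }
        // Requests from any origin are served.
        headers.put("Access-Control-Allow-Origin", "*");
        return headers;
    }
}
```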
5.3 Service layer
The service layer, as mentioned before, is the main data processing layer of the project.
Services call the SQL queries via the mapper interfaces from the data access layer and
process and transform the data into their final form.
¹⁰ Internal Server Error
The following sections will discuss the architecture and configuration of the service layer
and describe how the service layer helps with data processing and service optimization.
5.3.1 Architecture and configuration
The service layer comprises four service interfaces and their implementations: time,
measurement, topology and network-usage. Although both topology and network-usage
are parts of the network component, they are separated at the service layer as they offer
two different types of data about a network - the topology service returns information
about the network nodes and their roles, while the network-usage service focuses on the
link load and the general usage of the network.
The separation of the service layer into interface and implementation classes allows simple
switching of the data access technology. If a technology other than MyBatis were needed
in the data access layer in the future, the presentation layer would require only minor
changes, because the controllers depend only on the service interfaces.
Each service implementation class is annotated with @Service, which ensures that it
will be recognized as a Spring application component. The process is similar to that for
controllers - the <context:component-scan /> tag from the Spring application context
will search for and instantiate all annotated services during the initialization of the IoC
container. Services are then ready to be used.
5.3.2 Processing data and optimization
A lot of the data processing and application logic is placed in the actual SQL queries in
the data access layer. They query only for the required data and use database functions
to transform the data if needed. However, this only works for simpler tasks, where there
is just one SQL call sufficient to gather all the data for a response. In such cases, services
only forward the result that they got from the data access layer. The primary task of the
service layer is the extraction and processing of more complex data (i.e. when more than
one SQL query has to be executed) and the optimization.
Regarding the optimization, there are cases in the project when a result for a data request
could be easily constructed by executing several simple SQL queries. That would be,
however, highly inefficient as each database call is a costly operation. Therefore, a single
and more general SQL query is executed and the result it provides is processed at the
service layer, which afterwards prepares the correct data for the response. Naturally, the
algorithm that processes the data must provide faster results than the execution of
multiple SQL queries; otherwise it would not be beneficial to the project.
The getDataByTimestampRange() method from the measurement service is a good
example of this technique. By using multiple queries, it returned results in around two to
three seconds, which was unacceptable performance. After the optimization with a
single SQL call and a processing algorithm with O(n) time complexity, it is able to
generate a result in roughly 50 milliseconds.
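The technique of one general query followed by a single linear pass can be sketched as follows. This is not the algorithm used in getDataByTimestampRange(): the Row shape, the class RangeGroupingSketch and the assumption that the query returns rows already ordered by timestamp are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RangeGroupingSketch {
    // Hypothetical shape of one row returned by the single, more general SQL query.
    static class Row {
        final long timestamp;
        final String phenomenon;
        final double value;

        Row(long timestamp, String phenomenon, double value) {
            this.timestamp = timestamp;
            this.phenomenon = phenomenon;
            this.value = value;
        }
    }

    // Groups the values by timestamp in one pass over the rows. Each row is
    // visited once and each map operation takes expected constant time, so the
    // whole pass is linear in the number of rows.
    static Map<Long, List<Double>> groupByTimestamp(List<Row> rows) {
        Map<Long, List<Double>> grouped = new LinkedHashMap<Long, List<Double>>();
        for (Row row : rows) {
            List<Double> values = grouped.get(row.timestamp);
            if (values == null) {
                values = new ArrayList<Double>();
                grouped.put(row.timestamp, values);
            }
            values.add(row.value);
        }
        return grouped;
    }
}
```

The alternative of issuing one query per timestamp or per phenomenon would pay the cost of a database round trip many times, which is what the optimization avoids.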
5.4 Data Access Layer
The core of the data access layer is the MyBatis framework in version 3.2.3. After
acquiring the database connection from the Spring framework it executes the SQL
queries, creates the DTOs and populates them with the extracted data. The constructed
DTOs then work their way up through the rest of the application's layers and are
returned as a response to the user.
The following sections describe the main components of the MyBatis framework that
are used in the VDS and how they are initialized and configured. Later, the concept of
the mappers and their usage is explained. The Dynamic SQL capabilities provided by
MyBatis and their contribution to the project are shown in the end.
5.4.1 SqlSession and SqlSessionFactory
The primary component and the most powerful class of MyBatis is the SqlSession.
Everything from getting the correct mappers to executing the SQL statements is done via
SqlSession instances. The creation of SqlSession objects is the responsibility of the
SqlSessionFactory class. Normally, an instance of the SqlSessionFactory is
acquired from the SqlSessionFactoryBuilder which has to be manually invoked in
the code and must be given the configuration details defined in a configuration XML file.
However, the VDS is built with the Spring framework and uses the MyBatis-Spring
library, which eases the configuration and concentrates it to the Spring application
context file. The SqlSessionFactoryBuilder is replaced by the
SqlSessionFactoryBean, which is instantiated automatically during the IoC container
initialization and handles the creation of a SqlSessionFactory object when it is
needed.
5.4.2 Configuration of the data access layer's components
The configuration of the data access layer's components is located in the
applicationContext.xml file and consists of two parts.
The first part is the construction of the SqlSessionFactoryBean, which is given three
properties as shown in the following fragment:
The dataSource is a common JDBC data source bean specified elsewhere in the
application context file and referenced here. The dataSource is later used by the
SqlSession instances created from the SqlSessionFactoryBean to acquire database
connections. The mapperLocations property specifies the path to the mapper XML files
which contain the SQL queries. The last property, typeAliasesPackage, is optional.
However, if it is configured, the full class names can be replaced in the mapper XML files
by their shorter versions without the package name. For example, a class with the name
cz.muni.fi.kypo.transfer.topology.RouterTO can be referenced just as RouterTO,
which improves the readability of the mapper files.
The second part of the configuration is the registration of the mapper interfaces. Each
mapper interface has to be registered in order to be used for SQL query execution.
Normally the interfaces would have to be specified separately, but the MyBatis-Spring
library eases the process: with the library it is possible to automatically register all
mapper interfaces with a single tag placed in the application context file:
Although the mapper interfaces are registered with MyBatis after this configuration,
they also need to be recognized as Spring components, so they can be injected and used
by the services. In order to accomplish this, each one of them is marked with Spring's
@Repository annotation.
5.4.3 Mapper XML files
There are two types of mappers in the VDS's data access layer - the mapper XML files
and the mapper interfaces. Their cooperation is the key to executing SQL queries and
mapping the returned data to the DTOs. Before explaining interfaces and mapper
cooperation, it is important to understand the concept of the mapper XML files.
The mapper XML files contain all SQL statements that the project is able to execute
against the database. The SQL statements may be of any kind - data manipulation
language (CRUD operations) or data definition language. The VDS, however, only
extracts the data, so the only type of SQL statements used in the project are SQL queries
- selects. Each SQL query defined in the mapper XML file is enclosed in the <select>
element, which is identified by its unique id attribute. It also has to specify either the
resultType or the resultMap attribute, which is used to resolve the mapping of the
returned data onto the Java objects. The fragment from the TopologyMapper.xml file
shows a simple example of a select with resultType:
The identifier of this query is getAllLinks and the result type is LinkTO, which means
that one row of values from the resulting columns id, srcId and destId will populate
the newly created LinkTO object's properties of the same name. Also, if there are more
rows in the result, more objects will be created and returned as a list of LinkTO objects.
The resultType attribute can be used for simple mapping when these two conditions
are fulfilled:
• The names of returned columns match the names of result type's properties
• The result type contains only single object properties
If the result type contains a collection of objects that needs to be mapped to the returned
data, or the column names do not match the property names, a resultMap must be
specified and used in the select. In the following code fragment from the
TopologyMapper.xml file, there is a result map prepared for the RouterTO result type,
which contains a collection of NodeInterfaceTO objects:
<collection property="nodeInterfaces" ofType="NodeInterfaceTO">
The following fragment shows how the previous result map is used in a select:
5.4.4 Mapper interfaces and mapper cooperation
The SQL queries that are stored in the mapper XML files can only be executed by an
SqlSession object. Direct work with the SqlSession objects is impractical and
therefore the concept of mapper interfaces was introduced in newer versions of MyBatis.
Mapper interfaces are Java interfaces that do not have any implementation and act as a
facade of mapper XML files - calling a method of a mapper interface will in fact execute
an SQL statement from a mapper XML file. Internally, a mapper interface is instantiated
by an SqlSession object, which then provides all the SQL execution capabilities. After
the mapper interfaces are registered in the application context file (as described in
section 5.4.2), MyBatis-Spring creates the SqlSession objects and instantiates the mapper
interfaces automatically so they can be used without any further configuration. The only
constraint is that mapper interfaces must have the same name as the mapper XML files¹¹.
When a mapper interface's method is called in the code of the VDS, the SqlSession
object invokes the execution of the SQL query whose id is the same as the name of the
called method. For example, the timeMapper.getAllTimestamps() call invokes the
execution of the select with id="getAllTimestamps". The return type of the mapper
interface's method must match the result type (or result map's type) defined with the
SQL query. If a list of objects is going to be returned after the SQL query execution, the
method's return type has to be a list of those objects.
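How an interface without any implementation can still be called is easiest to see with a dynamic proxy from the Java standard library. The sketch below only illustrates the lookup of a statement by method name; it is not MyBatis code. The names DemoTimeMapper and MapperProxySketch are hypothetical, the map stands in for the statements parsed from a mapper XML file, and the proxy simply returns the SQL text instead of executing it.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;

// Hypothetical mapper interface; for the demonstration the method returns
// the SQL text that would be executed rather than the query result.
interface DemoTimeMapper {
    String getAllTimestamps();
}

class MapperProxySketch {
    // statementsById stands in for the statements of a mapper XML file (id -> SQL).
    static <T> T create(Class<T> mapperInterface, final Map<String, String> statementsById) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) {
                // The name of the called method selects the statement with the same id.
                String sql = statementsById.get(method.getName());
                if (sql == null) {
                    throw new IllegalStateException("No statement with id " + method.getName());
                }
                // A real session would execute the SQL and map the rows to DTOs here.
                return sql;
            }
        };
        Object proxy = Proxy.newProxyInstance(
                mapperInterface.getClassLoader(), new Class<?>[] {mapperInterface}, handler);
        return mapperInterface.cast(proxy);
    }
}
```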
Mapper interfaces can also supply the mapper XML files with parameters. The
parameters in method signatures are marked with the @Param annotation and are
assigned a name. The following example is a code fragment from the
MeasurementMapper interface:
public String getMinTimestamp(@Param("elementId") int id
        //rest of the parameters omitted
Such parameters can afterwards be used in the SQL queries in mapper XML files with the
#{param_name} syntax.
¹¹ Except the file extension
5.4.5 Dynamic SQL
Sometimes there is a need to construct SQL queries dynamically at runtime, based on
the values of specific variables. It is usually very troublesome to prepare dynamic SQL
queries, but MyBatis offers a fairly simple way of doing it in the mapper XML files.
MyBatis's dynamic SQL allows using a variety of tags inside the SQL queries, which
then act as SQL templates. When a method of a mapper interface is called, the
corresponding SQL template is first evaluated into a correct SQL query and only then
executed against the database. The evaluation is based on the parameters that the mapper
interface passes to the mapper XML file.
The ability of MyBatis to create dynamic selects easily is widely used in the VDS,
particularly in two situations. The first situation happens when there are two SQL
queries that are almost the same except for a small part of the where or from clause. In
such a case the possible fragments are enclosed in if or choose-when tags and the correct
one is chosen by passing a specific parameter value via the mapper interface. In the
following example from MeasurementMapper.xml, the value of the parameter elementType
determines which column of the table observation should be used for the id comparison:
observation.routing_id = #{elementId} AND
observation.node_interface_id_in = #{elementId} AND
observation.node_interface_id_out = #{elementId} AND
The previous structure prevents the repetition of code and lowers the possibility of
creating bugs if the SQL query has to be modified (the changes are only made in one
query).
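The effect of the choose-when construct corresponds to the following plain Java selection. It is only a sketch of the evaluation result: the class ObservationConditionSketch is hypothetical, the element type values are taken from the REST API documentation in appendix A, and the assignment of each value to a column is an assumption.

```java
class ObservationConditionSketch {
    // Returns the id comparison that the SQL template would evaluate to
    // for the given elementType (with ? as the prepared statement placeholder).
    static String idCondition(String elementType) {
        switch (elementType) {
            case "link":
                // Assumption: links are identified by the routing_id column.
                return "observation.routing_id = ?";
            case "node-interface-in":
                return "observation.node_interface_id_in = ?";
            case "node-interface-out":
                return "observation.node_interface_id_out = ?";
            default:
                throw new IllegalArgumentException("Unknown element type: " + elementType);
        }
    }
}
```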
The second situation in which the dynamic SQL capabilities of MyBatis are employed in
the VDS happens when a parameter that is going to be passed to the mapper XML file
contains a list of values. Such a list usually varies in length and there is no other way to
construct an SQL query which uses it than building the query dynamically. For these
cases there is the foreach tag, whose usage is shown in the next code fragment, also from
the MeasurementMapper.xml file:
WHERE phenomenon_type.name IN
#{item}
This structure correctly expands the phenomenonNames list parameter into a list of values
separated by commas and enclosed in parentheses.
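The expansion can be mimicked in plain Java as shown in the following sketch. It is not how MyBatis is implemented; the class InClauseSketch is hypothetical and the exact whitespace of the SQL generated by MyBatis may differ. Each list item becomes one ? placeholder, to which the actual value is bound when the prepared statement is executed.

```java
import java.util.List;

class InClauseSketch {
    // Expands a list of phenomenon names into an IN clause with one placeholder per item.
    static String expand(List<String> phenomenonNames) {
        if (phenomenonNames.isEmpty()) {
            // "IN ()" is not valid SQL, so an empty list is rejected here.
            throw new IllegalArgumentException("At least one phenomenon name is required");
        }
        StringBuilder sql = new StringBuilder("WHERE phenomenon_type.name IN (");
        for (int i = 0; i < phenomenonNames.size(); i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append('?');
        }
        return sql.append(')').toString();
    }
}
```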
5.5 REST API documentation
The documentation of the REST API exposed by the VDS is very important for the
developers of the visualization portlets. The portlets are the clients of the service and
therefore their developers need to know how to call the data services (the URL patterns),
what data the services return (the data semantics) and how the data are organized (the
structure of JSONs).
Maintaining the documentation in an external file may introduce many problems
because it has to be separately updated whenever there is a change in the code. If a
discrepancy between the external documentation and the code occurs, it often ends up
with numerous errors on the client side and requires developer time to perform
unnecessary debugging of their code. Therefore it was decided that the documentation
will be kept in the code and will be rebuilt each time the code compiles. However, such
documentation also needs a way to be accessed from the outside of the code and should
be easily readable and understandable.
For the above-mentioned reasons the VDS uses the JSONDoc [60] library for the
construction of REST API documentation. JSONDoc provides a number of annotations
that can be used to document the request handler methods in controllers and also the
objects that are being returned in JSON format - in VDS's case they are the DTOs. If
JSONDoc is correctly set up and all controller and DTO classes are properly annotated,
the documentation can be returned from the VDS in form of a JSON object by sending a
REST request with the path /jsondoc.
Although the documentation in the form of a JSON object can be easily parsed, it is not the
most suitable format for people to read. Therefore the JSONDoc UI project is utilized,
which parses the returned JSON object and provides an intuitive GUI for browsing the
documentation from a web browser. It also incorporates a playground component via
which a user can easily send REST requests to the service and see what they return. The
JSONDoc UI is deployed along with the VDS and is used by the visualization portlets'
developers.
5.6 Deployment
The Visualization Data Service is currently deployed in the same virtual machine as the
Liferay portal with the visualization portlets. Each portlet uses the REST API of the VDS
via the HTTP protocol to obtain the data it requires. The VDS gathers the data by
connecting to a PostgreSQL database, which is located on another virtual machine, via
JDBC. The data are stored in the database by the measurement module of the CPG
during the execution of a scenario.
[Deployment diagram (only partially recovered): a virtual machine running Ubuntu hosts an
Apache Tomcat application server with the Liferay component.]
Allowed values: link, node-interface-in, node-interface-out
A list of required phenomena. Must contain their full names separated
by comma from each other
Response object
Object: Measured Data
Multiple: False
Path: /measurement/{element}/{id}/nextData?phenomenons=
{phenomenons}×tamp={timestamp}
Description: Returns the next timestamp (the next to the timestamp specified in the
request URL) that has got the data for at least one of the specified
phenomena + list of data. The data in the list are in the order of the
phenomenon names specified in the request URL. Each phenomenon
has exactly one value. The returned measured values are the most
recent known data for the returned timestamp (i.e. if there are no data
which would match the timestamp for the phenomenon, the last
known data (=prior to returned timestamp) for this phenomenon are
returned).
Path parameters
element Type: String
Allowed values: link, node-interface-in, node-interface-out
Demanded topology element (if node interface is the element, specify
whether ingoing or outgoing communication should be returned).
id Type: int
The id of the element (linkId, nodeInterfaceId)
Query parameters
phenomenons Type: List
Allowed values: link, node-interface-in, node-interface-out
A list of required phenomena. Must contain their full names separated
by commas.
timestamp Type: Long
Timestamp formatted as UNIX epoch time in seconds in UTC time
zone.
Response object
Object: Measured Data
Multiple: False
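The selection rule described for nextData can be illustrated with a small in-memory sketch. It is not taken from the VDS sources: the class name, the method names and the phenomenon names ("load", "delay") are hypothetical, and the real service reads its data from PostgreSQL, not from maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the nextData rule; not the VDS implementation.
class NextDataSketch {

    // Smallest timestamp strictly greater than `after` that has data for at
    // least one of the requested phenomena, or null if there is none.
    static Long nextTimestamp(Map<String, TreeMap<Long, Double>> series,
                              List<String> phenomena, long after) {
        Long best = null;
        for (String name : phenomena) {
            TreeMap<Long, Double> values = series.get(name);
            if (values == null) {
                continue;
            }
            Long candidate = values.higherKey(after);
            if (candidate != null && (best == null || candidate < best)) {
                best = candidate;
            }
        }
        return best;
    }

    // One value per phenomenon, in request order: the most recent value
    // at or before `timestamp`, or null when nothing is known yet.
    static List<Double> lastKnown(Map<String, TreeMap<Long, Double>> series,
                                  List<String> phenomena, long timestamp) {
        List<Double> result = new ArrayList<>();
        for (String name : phenomena) {
            TreeMap<Long, Double> values = series.get(name);
            Map.Entry<Long, Double> entry =
                    values == null ? null : values.floorEntry(timestamp);
            result.add(entry == null ? null : entry.getValue());
        }
        return result;
    }

    // Made-up sample data: two phenomena measured at different times.
    static Map<String, TreeMap<Long, Double>> sample() {
        Map<String, TreeMap<Long, Double>> series = new LinkedHashMap<>();
        TreeMap<Long, Double> load = new TreeMap<>();
        load.put(10L, 0.2);
        load.put(30L, 0.6);
        TreeMap<Long, Double> delay = new TreeMap<>();
        delay.put(20L, 5.0);
        series.put("load", load);
        series.put("delay", delay);
        return series;
    }

    public static void main(String[] args) {
        Map<String, TreeMap<Long, Double>> series = sample();
        List<String> request = Arrays.asList("load", "delay");
        Long next = nextTimestamp(series, request, 10L);
        System.out.println(next + " " + lastKnown(series, request, next));
    }
}
```

With the sample data, a request with timestamp 10 yields timestamp 20: "delay" has a value exactly at 20, while "load" falls back to its last known value from timestamp 10.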
Path: /measurement/{element}/{id}/actualData?phenomenons={phenomenons}&timestamp={timestamp}
Description: Returns the same timestamp as specified in the request URL, together with a list of
data. The data in the list are in the order of the phenomenon names
specified in the request URL. Each phenomenon has exactly one value.
The returned measured values are the most recent known data for the
returned timestamp (i.e. if there are no data which would match the
timestamp for the phenomenon, the last known data (=prior to
returned timestamp) for this phenomenon are returned).
Path parameters
element Type: String
Allowed values: link, node-interface-in, node-interface-out
Demanded topology element (if node interface is the element, specify
whether ingoing or outgoing communication should be returned).
id Type: int
The id of the element (linkId, nodeInterfaceId)
Query parameters
phenomenons Type: List
Allowed values: link, node-interface-in, node-interface-out
A list of required phenomena. Must contain their full names separated
by commas.
timestamp Type: Long
Timestamp formatted as UNIX epoch time in seconds in UTC time
zone.
Response object
Object: Measured Data
Multiple: False
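From the client side, a request URL for actualData could be assembled as in the following sketch. The helper class, the base URL (host, port 8080) and the phenomenon names are assumptions made for illustration only; phenomenon names containing special characters would additionally need URL encoding.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side helper; the base URL and phenomenon names are assumptions.
class MeasurementUrlSketch {

    // Fills the path and query parameters of the actualData service.
    static String actualDataUrl(String baseUrl, String element, int id,
                                List<String> phenomena, long timestamp) {
        return baseUrl + "/measurement/" + element + "/" + id
                + "/actualData?phenomenons=" + String.join(",", phenomena)
                + "&timestamp=" + timestamp;
    }

    public static void main(String[] args) {
        // Request two phenomena for the link with id 3 at a given UNIX time.
        System.out.println(actualDataUrl("http://localhost:8080/vds", "link", 3,
                Arrays.asList("load", "delay"), 1400000000L));
    }
}
```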
Path: /measurement/{element}/{id}/dataRange?phenomenons={phenomenons}&startTimestamp={startTimestamp}&endTimestamp={endTimestamp}
Description: Returns the list of MeasuredData structures called
"rangedMeasuredData". Each item in the list has a timestamp and a list
of data. RangedMeasuredData list is ordered by timestamps and all
timestamps are between start and end timestamps specified in the
request URL. The data in the list of each MeasuredData structure are in
the order of the phenomenon names specified in the request URL. Each
phenomenon has exactly one value. The returned measured values are
the most recent known data for the assigned timestamp (i.e. if there are
no data which would match the timestamp for the phenomenon, the
last known data (=prior to assigned timestamp) for this phenomenon
are returned).
Path parameters
element Type: String
Allowed values: link, node-interface-in, node-interface-out
Demanded topology element (if node interface is the element, specify
whether ingoing or outgoing communication should be returned).
id Type: int
The id of the element (linkId, nodeInterfaceId)
Query parameters
phenomenons Type: List
Allowed values: link, node-interface-in, node-interface-out
A list of required phenomena. Must contain their full names separated
by commas.
startTimestamp Type: Long
Timestamp formatted as UNIX epoch time in seconds in UTC time
zone.
Specifies the start of the range of returned data.
endTimestamp Type: Long
Timestamp formatted as UNIX epoch time in seconds in UTC time
zone.
Specifies the end of the range of returned data.
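The shape of the dataRange result can be sketched in memory as follows. This is an illustration under stated assumptions, not the VDS implementation: the class and phenomenon names are hypothetical, and both range bounds are assumed to be inclusive, which the description above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of the dataRange rule; not the VDS implementation.
class DataRangeSketch {

    // Maps each timestamp in [start, end] (bounds assumed inclusive, start <= end)
    // to one value per phenomenon, in request order. A missing value is replaced
    // by the last known earlier value, or null if none exists.
    static Map<Long, List<Double>> dataRange(Map<String, TreeMap<Long, Double>> series,
                                             List<String> phenomena, long start, long end) {
        TreeSet<Long> timestamps = new TreeSet<>();
        for (String name : phenomena) {
            TreeMap<Long, Double> values = series.get(name);
            if (values != null) {
                timestamps.addAll(values.subMap(start, true, end, true).keySet());
            }
        }
        Map<Long, List<Double>> ranged = new LinkedHashMap<>();
        for (Long timestamp : timestamps) {
            List<Double> row = new ArrayList<>();
            for (String name : phenomena) {
                TreeMap<Long, Double> values = series.get(name);
                Map.Entry<Long, Double> entry =
                        values == null ? null : values.floorEntry(timestamp);
                row.add(entry == null ? null : entry.getValue());
            }
            ranged.put(timestamp, row);
        }
        return ranged;
    }

    // Made-up sample data: two phenomena measured at different times.
    static Map<String, TreeMap<Long, Double>> sample() {
        Map<String, TreeMap<Long, Double>> series = new LinkedHashMap<>();
        TreeMap<Long, Double> load = new TreeMap<>();
        load.put(10L, 0.2);
        load.put(30L, 0.6);
        TreeMap<Long, Double> delay = new TreeMap<>();
        delay.put(20L, 5.0);
        series.put("load", load);
        series.put("delay", delay);
        return series;
    }

    public static void main(String[] args) {
        System.out.println(dataRange(sample(), Arrays.asList("load", "delay"), 10L, 30L));
    }
}
```

For the sample data the result contains the timestamps 10, 20 and 30; at timestamp 10 no "delay" value is known yet, so the sketch reports null for it.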
Time component services
Path: /time/timezone
Description: Returns the time zone of the minimal timestamp in database. Other
timestamps should have the same time zone.
Response object
Object: Timezone
Multiple: False
Path: /time/experiment/start
Description: Returns the first timestamp of any measured value in the database,
which is considered as the start of the experiment. Timestamp is
formatted as UNIX epoch time in seconds in UTC time zone and
returned as a number called "experimentStart".
Path: /time/experiment/end
Description: Returns the last timestamp of any measured value in the database,
which is considered as the end of the experiment. Timestamp is
formatted as UNIX epoch time in seconds in UTC time zone and
returned as a number called "experimentEnd".
Path: /time/all-timestamps
Description: Returns all (distinct) timestamps of measured values from the database
(= all times of interest from measurement's point of view). Timestamps
are formatted as UNIX epoch time in seconds in UTC time zone and
returned as a list of numbers called "timestamps".
Structures of returned JSON objects
Name: Error
Description: Error is returned when an exception occurs in application.
Fields
status Type: int
HTTP response status code.
message Type: String
Information about the error.
Name: Link
Description: Link between routers (networks).
Fields
id Type: int
Identifier of the link.
srcId Type: int
Identifier of the source router (network). It is returned also as "source"
in the object and contains router's topologyId.
destId Type: int
Identifier of the destination router (network). It is returned also as
"target" in the object and contains router's topologyId.
Name: Node Interface Node
Description: Base class for other objects used as nodes in topology.
Fields
id Type: int
Identifier of the node. Topology objects also return "topologyId" which
should be unique in topology.
name Type: String
Name of the node.
physicalRole Type: String
Physical role of the node in the topology.
address4 Type: String
IPv4 address for node interfaces or cidr4 address for routers
(networks).
hostNodeId Type: int
Identifier of node-interface's host node (computer).
Name: Router Node
Description: Base class for other objects used as nodes in topology.
Fields
id Type: int
Identifier of the node. Topology objects also return "topologyId" which
should be unique in topology.
name Type: String
Name of the node.
physicalRole Type: String
Physical role of the node in the topology.
address4 Type: String
IPv4 address for node interfaces or cidr4 address for routers
(networks).
nodeInterfaces Type: List
List of node interfaces connected to this router. This field is called
"children" in returned JSON because that name is specifically needed
by d3.js in visualization of topology.
Name: Topology Objects
Description: Contains information about all separately visualized objects in
topology.
Fields
links Type: List
List of visualized links.
interfaces Type: List
List of visualized interfaces.
routers Type: List
List of visualized routers. The list does not contain the node interfaces
connected to the routers.
Name: Topology
Description: Describes the network topology. Topology consists of routers, node
interfaces connected to these routers and links between routers.
Topology structure is specifically prepared for topology visualization.
Fields
links Type: List
List of links between routers in topology.
routers Type: List
List of routers in topology. This field is called "children" in returned
JSON because that name is specifically needed by d3.js in visualization
of topology.
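The nesting that results from the "children" naming can be illustrated with a reduced, hand-written rendering. The sketch below is hypothetical: it shows only the id and name fields, does not escape names, uses made-up sample values, and says nothing about how the VDS actually serializes its objects.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, reduced rendering of the Topology JSON; most fields are omitted.
class TopologyJsonSketch {

    static String nodeInterface(int id, String name) {
        return "{\"id\":" + id + ",\"name\":\"" + name + "\"}";
    }

    // A router carries its node interfaces under "children".
    static String router(int id, String name, List<String> nodeInterfaces) {
        return "{\"id\":" + id + ",\"name\":\"" + name + "\",\"children\":["
                + String.join(",", nodeInterfaces) + "]}";
    }

    // The topology carries its routers under "children" as well.
    static String topology(List<String> links, List<String> routers) {
        return "{\"links\":[" + String.join(",", links) + "],\"children\":["
                + String.join(",", routers) + "]}";
    }

    public static void main(String[] args) {
        String router = router(1, "router-1",
                Arrays.asList(nodeInterface(10, "pc-1"), nodeInterface(11, "pc-2")));
        System.out.println(topology(Collections.<String>emptyList(), Arrays.asList(router)));
    }
}
```

Using the same key on both levels gives a uniform tree (topology, routers, node interfaces), which is the kind of structure the d3.js hierarchy layouts traverse by default.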
Name: Node Interface Link Usage
Description: Contains information about usage of link connecting node interface
with a router.
Fields
nodeInterfaceId Type: int
Identifier of the node interface from (or to) which leads the observed
link.
numberOfBits Type: double
Absolute number of bits that are being sent through a link at a moment.
bandwidth Type: double
Link's maximum bandwidth.
bwUnit Type: String
Unit in which link bandwidth is expressed.
load Type: double
Load is always between 0 and 1. Load of the link =
number_of_bits / bandwidth_in_bits.
speed Type: double
Speed is always between 0 and 1. Speed of the link =
number_of_bits / number_of_bits_of_the_fastest_link.
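The two ratios can be written down directly. The following sketch assumes that the bandwidth has already been converted to bits (the conversion from bwUnit is not shown) and treats an empty or idle network as speed 0; both the helper names and the zero handling are assumptions, not behaviour documented for the VDS.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers for the load and speed ratios of a link.
class LinkUsageSketch {

    // load = number_of_bits / bandwidth_in_bits
    static double load(double numberOfBits, double bandwidthInBits) {
        return numberOfBits / bandwidthInBits;
    }

    // speed = number_of_bits / number_of_bits_of_the_fastest_link
    static double speed(double numberOfBits, List<Double> numberOfBitsOfAllLinks) {
        double fastest = 0.0;
        for (double bits : numberOfBitsOfAllLinks) {
            fastest = Math.max(fastest, bits);
        }
        // Assumption: when no link carries any traffic, report speed 0.
        return fastest == 0.0 ? 0.0 : numberOfBits / fastest;
    }

    public static void main(String[] args) {
        List<Double> allLinks = Arrays.asList(250.0, 500.0, 100.0);
        System.out.println(load(250.0, 1000.0));   // a quarter of the capacity is used
        System.out.println(speed(250.0, allLinks)); // half as busy as the busiest link
    }
}
```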
Name: Router Link Usage
Description: Contains information about usage of link connecting two routers.
Fields
id Type: int
Identifier of the link between routers.
numberOfBits Type: double
Absolute number of bits that are being sent through a link at a moment.
bandwidth Type: double
Link's maximum bandwidth.
bwUnit Type: String
Unit in which link bandwidth is expressed.
load Type: double
Load is always between 0 and 1. Load of the link =
number_of_bits / bandwidth_in_bits.
speed Type: double
Speed is always between 0 and 1. Speed of the link =
number_of_bits / number_of_bits_of_the_fastest_link.
Name: Network Link Usages
Description: Contains lists of link usages in network.
Fields
routerLinks Type: List
List of usages of links between routers (networks).
interfaceLinksIn Type: List
List of usages of links going from routers to node interfaces.
interfaceLinksOut Type: List
List of usages of links going from node interfaces to routers.
Name: Node Interface Role
Description: Denotes the logical role of node interface in the network topology.
Fields
id Type: int
Identifier of the node interface. Returns also "topologyId" which should
uniquely identify the node interface in the network topology.
role Type: String
The logical role of the node interface.
Name: Phenomenon
Description: Object describing a phenomenon.
Fields
name Type: String
Name of the phenomenon.
unit Type: String
Unit in which the data about this phenomenon is stored.
Name: Measured Data
Description: Contains measured values for different phenomena at a certain time.
Fields
timestamp Type: String
Timestamp formatted as UNIX epoch time in seconds in UTC time
zone.
data Type: List
List of measured values. Each value represents one phenomenon - their
order should depend on the request URL. For more information, check
methods that use MeasuredData as return object.
Name: Timezone
Description: Time zone of the data in the database. All time data received from this
service should be converted into this time zone so that it matches the time
when the data were stored (the service returns all times converted to
UTC).
Fields
timezoneIDs Type: List
Possible text name representations of the time zone.
offsetSeconds Type: Integer
Offset of the time zone against the UTC, measured in seconds.
offsetSeconds Type: Integer
Offset of the time zone against the UTC, measured in hours.
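A client can apply the offset reported in the Timezone structure to any UTC timestamp received from the service. The sketch below uses the standard java.time API; the class name and the sample values (a timestamp of 1400000000 and an offset of 7200 seconds) are made up for illustration.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical client-side conversion of a UTC timestamp into the database time zone.
class TimezoneSketch {

    // epochSeconds: UNIX epoch time in seconds (UTC), as returned by the service.
    // offsetSeconds: offset of the database time zone against UTC, in seconds.
    static OffsetDateTime toDatabaseTime(long epochSeconds, int offsetSeconds) {
        return Instant.ofEpochSecond(epochSeconds)
                .atOffset(ZoneOffset.ofTotalSeconds(offsetSeconds));
    }

    public static void main(String[] args) {
        // Sample values only: two hours ahead of UTC.
        System.out.println(toDatabaseTime(1400000000L, 7200));
    }
}
```

For these sample values the program prints 2014-05-13T18:53:20+02:00; the instant itself is unchanged, only its representation moves into the offset of the database.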
B Tutorials for deployment and configuration of the service
Deploying the VDS to the Tomcat
Method 1 - by using the vds directory
• In the vds directory [set up the database connection information]
• In the vds directory [configure the server IP and port]
• Copy the vds directory into the designated Tomcat's /webapps directory
• If Tomcat is not started, start it
• Wait for the automatic deploy (usually around 10 seconds)
Method 2 - by using the vds.war file
• Copy the vds.war file into the designated Tomcat's /webapps directory
• If Tomcat is not started, start it
• Wait for the automatic deploy (usually around 10 seconds)
• A vds directory should appear in the Tomcat's /webapps directory
• In the vds directory [set up the database connection information]
• In the vds directory [configure the server IP and port]
Setting up the database connection information
• Open the classes directory at /vds/WEB-INF/classes
• Open the file jdbc.properties
• Set up the database driver, URL, username and password (for PostgreSQL the
driver is already set and only the IP and the database name need to be filled
in the URL)
• Save the changes and restart the Tomcat if the VDS is already deployed
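As an illustration of the values involved, the following sketch builds a PostgreSQL JDBC URL and parses a properties file with the standard java.util.Properties class. The property keys (jdbc.driverClassName, jdbc.url, jdbc.username, jdbc.password), the host, the database name and the credentials are all hypothetical; the actual keys are those already present in the shipped jdbc.properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch only: the property keys used here are assumptions, not the real keys.
class JdbcPropertiesSketch {

    // Standard PostgreSQL JDBC URL form: jdbc:postgresql://host/database
    static String postgresUrl(String host, String database) {
        return "jdbc:postgresql://" + host + "/" + database;
    }

    // Parses the text of a .properties file.
    static Properties parse(String content) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    public static void main(String[] args) {
        // Made-up file content with hypothetical keys and values.
        String file = "jdbc.driverClassName=org.postgresql.Driver\n"
                + "jdbc.url=" + postgresUrl("10.0.0.5", "cpg") + "\n"
                + "jdbc.username=vds\n"
                + "jdbc.password=secret\n";
        System.out.println(parse(file).getProperty("jdbc.url"));
    }
}
```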
Configuring the server IP and port
• Open the classes directory at /vds/WEB-INF/classes
• Open the file project.properties
• Set up the project.serverIP and project.serverPort properties
• Save the changes and restart the Tomcat if the VDS is already deployed
Deploying the VDS documentation GUI
• Copy the vds-doc directory into the designated Tomcat's /webapps directory
• Open the vds-doc directory and then open the connection.js file
• Set the connectionString variable so it correctly points to the /jsondoc REST
service of the VDS as in the following:
o http://[IP_of_the_server_with_VDS]:[port]/vds/jsondoc
• Save the changes
• For browsing the documentation open the web browser and navigate to URL:
o http://[IP_of_the_server_with_vds-doc]:[port]/vds-doc/jsondoc.jsp
Changing the name of the VDS
Changing the name of the service
• Rename the directory from vds to [new_name] (if it is already deployed it is in
the /webapps directory of Tomcat)
• Open the renamed directory at /[new_name]/WEB-INF/classes
• Open the file project.properties
• Change the project.finalName property from vds to [new_name]
• Save the changes (if it was already deployed, restart the Tomcat for changes to
take effect)
Registering the [new_name] of the VDS with the VDS documentation GUI
• Open the vds-doc directory and then open the connection.js file
• Change the name of the service in the connectionString variable from vds to
[new_name] so it looks as follows:
o http://[IP_of_the_server_with_VDS]:[port]/[new_name]/jsondoc
• Save the changes
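The value expected in the connectionString variable follows one pattern for both the default and the renamed service. A minimal sketch of that pattern is given below; the helper class and the sample host, port and service name are assumptions used only for illustration.

```java
// Hypothetical helper that composes the value for the connectionString variable.
class JsonDocUrlSketch {

    // Pattern from the tutorial: http://[IP]:[port]/[service_name]/jsondoc
    static String connectionString(String host, int port, String serviceName) {
        return "http://" + host + ":" + port + "/" + serviceName + "/jsondoc";
    }

    public static void main(String[] args) {
        // Default service name and a made-up new name.
        System.out.println(connectionString("10.0.0.7", 8080, "vds"));
        System.out.println(connectionString("10.0.0.7", 8080, "cpg-data"));
    }
}
```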
C Visualization portlets
Network topology visualization portlet
The portlet visualizes the network topology and the usage of links between the network
nodes.
[Figure: Network Topology portlet]
Time management portlet
The portlet exposes an interface to control the playback of a scenario execution.
Users can also choose which phenomenon types on a particular link they would like to
observe.
[Figure: Time Manager portlet]
2D spider chart and line charts portlet
The portlet visualizes the actual values and changes of values of phenomenon types
selected in the Time management portlet.
[Figure: Spider Chart and line charts portlet]
3D spider chart portlet
The portlet visualizes the changes of values of phenomenon types selected in the
Time management portlet in a 3D sequence spider chart object.
[Figure: SpiderChart3D portlet]
D List of electronic appendices
The archive file electronic_appendices.zip is located in the thesis archive in the IS MU
and contains the following electronic appendices:
• vds - a directory with the compiled Visualization Data Service
• vds-doc - a directory with GUI for VDS's documentation (a slightly modified
JSONDoc UI project)
• vds-sources - a directory with the sources of the VDS
• vds.war - a web application archive file with the compiled VDS