MASARYK UNIVERSITY
FACULTY OF INFORMATICS

Service for Cybernetic Proving Ground's visualizations

MASTER'S THESIS

Bc. Robert Dubecký

Brno, 2014

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Bc. Robert Dubecký

Advisor: RNDr. Radek Ošlejšek, Ph.D.

Acknowledgement

Above all, I would like to thank both my advisor RNDr. Radek Ošlejšek, Ph.D. and my team leader Dalibor Toth for providing me with a lot of help and guidance during the work on my thesis. I would also like to thank the rest of my colleagues from the visualization group for being a great team to work with; namely Andrej Lučanský, Zdenek Eichler, Petr Jelínek, Adam Brauner, Michal Kňazský and Karolína Burská. Last but not least, I would like to thank my good friend Lenka Plháková for helping me create some nice pictures for my thesis.

Abstract

The Cybernetic Proving Ground is a prototype testbed for executing simulations of cybernetic attacks on various network infrastructures. It should be able to visualize the executed simulations. The goal of the thesis was to create a service that processes the data generated during the simulations and provides them to the visualizations. The thesis describes the Cybernetic Proving Ground project and its data model. Furthermore, it focuses on data access and object-relational mapping technologies in Java and lists their advantages and disadvantages. Finally, the design and implementation of the developed data service are described. The data service exposes a REST API and is currently deployed in a testing environment.
Keywords

Cybernetic Proving Ground, MyBatis, ORM, Java, impedance mismatch, REST, Spring MVC, data service

Contents

1 Introduction
2 Cybernetic Proving Ground
  2.1 Goals and requirements of the infrastructure
    2.1.1 Network specific requirements
    2.1.2 Host configuration requirements
    2.1.3 Monitoring requirements
    2.1.4 Control requirements
    2.1.5 Deployment requirements
  2.2 Scenario execution
  2.3 Architecture
    2.3.1 Scenario management
    2.3.2 Cloud management
    2.3.3 Measurement
    2.3.4 Visualization
  2.4 CPG Entity-relationship diagram
    2.4.1 Network topology configuration
    2.4.2 Logical topology and network properties
    2.4.3 Measurement infrastructure
    2.4.4 Definitions of network parameters and characteristics
    2.4.5 Observation storage
3 Technologies for Mapping Relational Data to Objects in Java
  3.1 The object-relational impedance mismatch
  3.2 Mapping relational data to objects in persistence layer
  3.3 Direct JDBC API calls
    3.3.1 History
    3.3.2 Technology
    3.3.3 When to use the JDBC API directly
  3.4 ORM frameworks
    3.4.1 History
    3.4.2 Technologies
    3.4.3 When to use ORM framework
  3.5 Hybrid frameworks
    3.5.1 Technologies
    3.5.2 When to use hybrid frameworks
4 System Requirements and Architecture
  4.1 Requirements
  4.2 Choosing technologies
  4.3 Architecture
5 Implementation
  5.1 Package structure
  5.2 Presentation layer
    5.2.1 Configuration and initialization
    5.2.2 Request handling
    5.2.3 Creating JSONs
    5.2.4 Exception handling
    5.2.5 Cross-origin requests
  5.3 Service layer
    5.3.1 Architecture and configuration
    5.3.2 Processing data and optimization
  5.4 Data Access Layer
    5.4.1 SqlSession and SqlSessionFactory
    5.4.2 Configuration of the data access layer's components
    5.4.3 Mapper XML files
    5.4.4 Mapper interfaces and mapper cooperation
    5.4.5 Dynamic SQL
  5.5 REST API documentation
  5.6 Deployment
6 Conclusion
Bibliography
List of used abbreviations
A The REST API of the Visualization Data Service
B Tutorials for deployment and configuration of the service
C Visualization portlets
D List of electronic appendices

1 Introduction

In today's digital age, when almost all computers are networked and part of the Internet, cyber-attacks are becoming an increasing threat. Therefore, network security needs to be continuously improved and updated. However, attackers keep coming up with new techniques for getting past even the latest defense mechanisms in order to disrupt a system or exploit a vulnerability of a service. To design a functional protection, one has to fully understand the nature of the attacks, their consequences and possible limitations. One viable solution to this problem is to simulate the attacks in a testing environment. A prototype of such an environment is currently being developed at Masaryk University under the name Cybernetic Proving Ground.

The simulations executed in the Cybernetic Proving Ground produce a lot of data about the attacks. Afterwards, the data need to be processed and visualized so that the people who analyze the attacks can understand them easily. The goal of this thesis is to design and implement a prototype of a service which accesses those data, processes them and distributes them to the Cybernetic Proving Ground's visualizations. The thesis also examines various approaches to data access and the object-relational mapping technologies available in the Java environment in order to find a suitable technology for the implementation of the data service. Since more visualizations with unpredictable data requirements are likely to be developed in the future, the thesis describes the implementation of the service in detail so that new data services can be added quickly by different developers.
The second chapter describes the requirements for the Cybernetic Proving Ground's infrastructure and the modules of which the infrastructure is composed. It also explains the data model of the Cybernetic Proving Ground's database. Chapter three analyzes the problem of mapping relational data to objects and introduces several Java technologies that address the problem. The fourth chapter lists the requirements for the implemented data service and explains the technology choices and the design of the service's architecture. Chapter five describes the actual implementation of the service and shows how the chosen technologies were employed in the project.

2 Cybernetic Proving Ground

The Cybernetic Proving Ground (CPG) is a virtual testbed specifically designed for testing and simulating cyber-attacks on various network infrastructures. Several similar testing environments already exist, but each of them comes with a set of restrictions which limits its functionality or variability. There are projects like DETER [1] which utilize the publicly available Emulab [2] infrastructure solution. It allows them to deploy various virtual appliances with flexible configuration of network topologies and adjustment of network characteristics. Although Emulab simplifies many of the tasks required for setting up different experiment environments, it introduces several constraints regarding the infrastructure and its features (e.g. only IPv4 support in topology configuration, operating system and hardware restrictions etc.). Some other security-related testbeds need their own dedicated infrastructure to be purchased and built. A particular set of hardware and software components has to be acquired and set up in a specific way, allowing it to be used only for a single purpose.
On one hand, this provides better control over the testbed deployment and its supported features, but on the other hand, it limits the growth potential and requires high initial investments [3].

2.1 Goals and requirements of the infrastructure

To overcome the limitations of the other security testbeds, the CPG team decided to develop a completely new solution in the field of security simulations. The main goal of the CPG is to introduce a system in which executing various attack scenarios in a network is simple (i.e. the user needs no extensive knowledge of creating and running virtual networks) and which contains sufficient monitoring and visualization tools that enable the user to study the simulated attacks. Five categories of requirements for the testbed infrastructure have been established [4]:

1. Network specific requirements
2. Host configuration requirements
3. Monitoring requirements
4. Control requirements
5. Deployment requirements

2.1.1 Network specific requirements

The CPG is supposed to give its users full control over the third layer of the ISO/OSI model in the simulated network. That means it must be possible to simulate any network topology and to use an arbitrary IP addressing scheme or third-layer network protocol. To be able to realistically simulate any network, the new system has to allow the modification of network characteristics such as link bandwidth, latency, packet loss etc. Furthermore, it should be possible to establish a controlled connection between the testbed networks and the real world, since some scenarios may require communication with the Internet. However, this kind of communication must be properly filtered so that the ongoing attack does not spread outside of the simulated network [5].

2.1.2 Host configuration requirements

The system must be able to simulate computers running common operating systems (specifically Linux and Windows so far) both in their 32-bit and 64-bit versions.
It should be possible to configure the computers easily and to install any software required by a scenario.

2.1.3 Monitoring requirements

For the simulation to provide usable data about the attacks, a sophisticated monitoring infrastructure has to be in place. Monitoring of the network communication and line usage, however, must not interfere with the actual traffic, since it would introduce undesired noise into the measured data. Link monitoring as well as host monitoring should be implemented in the infrastructure.

2.1.4 Control requirements

The testbed should be equipped with a control layer which manages the other components of the system. It will expose a user interface which should enable the user to simply and intuitively set up and configure virtual networks for simulations and to execute and re-execute scenarios, and which should comprehensively visualize what is happening in the simulated environment.

2.1.5 Deployment requirements

It was concluded that the most suitable way of building a system with the above-mentioned requirements would be to use a cloud environment. Clouds possess powerful tools for creating a virtual infrastructure which is able to simulate the network layer as well as host computers. By using common cloud interfaces, the CPG would not need to rely on a particular cloud provider, which would give it the desired transferability and flexibility.

2.2 Scenario execution

Before explaining the architecture of the CPG, it is important to understand what a CPG scenario is and how it is executed. A scenario defines what will be simulated in the CPG and how the components in the sandbox environment will be built.
It describes all the actions that will happen during the experiment and also contains the details of their technical realization, among which are for example:

• Nodes - the virtual machines that will be used for the experiment
• Network (physical) topology - how the nodes will be connected to form a virtual network
• Logical topology - the logical roles of the nodes (attacker, victim, etc.)
• Monitoring rules - define the monitoring infrastructure for the scenario and the most interesting observation points in the network topology
• Experiment schedule - the separate steps describing what is supposed to happen during the experiment

The scenario execution has three main phases which occur in sequence:

1. Initialization - the infrastructure elements are instantiated according to the configuration (network and logical topology, monitoring rules...) defined in the scenario.
2. Scenario run - the network and host monitoring infrastructures capture data about the running attack and store them in the database.
3. Evaluation - the data stored during the attack are analyzed and interactively visualized.

2.3 Architecture

The architecture of the CPG consists of four infrastructure elements, which are called modules. The modules communicate by sending messages which contain control information, configuration or measured data. The following sections describe the purpose and responsibilities of each module.

2.3.1 Scenario management

The scenario management module is the main module of the CPG. It communicates with all of the other modules and supervises and controls their work according to the needs of the scenario simulation. Its main tasks are managing the configuration, controlling the scenario execution and evaluating the scenario.
Figure 1: Schema of the CPG configuration [4]

The configuration of the CPG is illustrated in Figure 1. After receiving the scenario information, the scenario management module creates the configuration of the logical topology and the initial configuration of the network topology and measurement infrastructure. First, it contacts the cloud management module with the initial configuration of the network topology. The cloud management module sets up the network, completes the configuration and sends it back to the scenario management module. The scenario management module then passes the network topology and the initial measurement configuration to the measurement management module, which completes the measurement configuration and sends it back. In the end, the scenario management module provides both the network and logical topologies and the measurement infrastructure information to the visualization module.

The primary role of the scenario management module during scenario execution is the supervision of scenario progress. It ensures that every step defined in the scenario is performed and completed. When the scenario is completed and the evaluation starts, the module makes sure that all of the outputs are correctly prepared and stored for later visualization and analysis.

The creation of the scenarios is the responsibility of the scenario development group. They study various kinds of cyber-attacks and transfer them into scenarios that can be executed in the CPG. So far, they have prepared a detailed version of a DDoS (Distributed Denial of Service) scenario based on the observation of a real attack against the internet infrastructure of the Czech Republic that took place in March of 2013 [6].
This scenario has been used as a proof of concept of the CPG [3] and provided testing data for the development of the visualization module prototype.

2.3.2 Cloud management

The creation and management of virtual machines and networks in the CPG is hidden behind a cloud management module. The module acts as an abstraction layer above the cloud provider. It exposes simple but powerful Application Programming Interface (API) functions that communicate with the underlying cloud service and facilitate the construction of the virtual network environment for the experiments.

Every security scenario is executed in a separate closed environment (sandbox), which ensures that the simulated attack will not affect any other simulation or another part of the CPG infrastructure. Each assembled virtual environment is controlled by a dedicated node, called the Scenario Management Node (SMN), which provides the functionality of the scenario management module for the newly created sandbox and also contains additional components for measurement (see the Measurement section). The control node takes care of the configuration and instantiation of virtual machines and of setting up the network topology based on the scenario description. The SMN communicates with the rest of the infrastructure over a separate control network to ensure that the control data and module configuration do not interfere with the actual experiments.

2.3.3 Measurement

The measurement infrastructure currently implemented in the CPG consists of two monitoring infrastructures: network and host.

The network monitoring infrastructure is responsible for gathering the relevant network traffic data, processing them and storing them for later analysis and visualization. For these purposes a set of probes, a data processing unit and a database are used. A probe is a device that performs the data metering and exporting processes.
It gathers raw information about the network traffic and sends it to the data processing unit. A probe can be either a software device or may use hardware support. The data processing unit collects, processes and stores the data sent by probes. Its core part is called a collector and it is located in the Scenario Management Node, which means there is one collector per scenario. After receiving a certain amount of data from the probes, the collector computes basic statistics, aggregates the data and sends them to the database.

The host monitoring infrastructure collects data about node performance. It is possible to monitor CPU and memory utilization, disk usage, number of open connections, interface statistics and other characteristics of hosts [3]. The infrastructure is built from two components: a master node and child nodes. The master node is a process which controls the child nodes and gathers their monitored information. This information is afterwards processed and stored. The master node is deployed in the collector and usually has only one instance per scenario. Child nodes are processes running directly in host machines. Each child node monitors the host in which it is located and sends the information to the master node.

2.3.4 Visualization

The purpose of the visualization module is to offer simple and responsive control over the CPG and to comprehensively represent the measured data. The module should therefore provide a control Graphical User Interface (GUI) and scenario visualizations. Although some visualizations will be shared among several scenarios (like network topology monitoring), there will also be specific visualizations for specific scenarios. These will present their respective scenario in a unique way, concentrating only on the data important for its visualization (e.g. showing the number of packets received by the victim's computer during a DDoS scenario).
The visualization module also has to perform further processing and analysis of the measured data before they can be passed to the actual visualizations. The designed visualization module will be implemented as a web application. The application should be effective, scalable, extensible and maintainable so that new visualizations are simple to add when necessary.

This thesis is a part of the visualization module. It focuses on accessing the data measured during a scenario execution, processing them and distributing them to other parts of the module so they can properly visualize them to CPG users. To be able to access the data correctly, it is important to understand where and how the data will be stored. This issue is addressed in the next section.

2.4 CPG Entity-relationship diagram

The CPG stores the data about scenarios and measurement results in relational databases. Each scenario execution has its own database with a common schema. Figure 2 shows the latest version of the database model, but because the CPG is still in development, the model is likely to undergo some modifications in the future. The tables in the model are colored by their purpose:

• Blue tables are used for storing the network topology configuration
• Green tables store logical topology and network properties
• Yellow tables store information about the measurement infrastructure
• Orange tables contain definitions of network parameters and characteristics which will be measured in the CPG
• Grey tables form the observation storage

Figure 2: Entity-relationship model of CPG database [7]

2.4.1 Network topology configuration

The network topology configuration part of the model describes a simplified logical structure of the simulated network. It does not represent the concrete architecture used to build the topology within the CPG environment (specific routers and switches). Local networks in the topology (table Network) are connected by links which are stored in table Routing. Each link in table Routing is defined by its source and destination networks, which means that each two-way communication requires two records. This solution ensures the ability to separately measure the link usage in each computer. Table Disk holds information about the disk images that are used for initializing host computers in the virtual environment. Since several logical characteristics can be assigned to networks as well as hosts, they are connected by a generalized table Network_element.

2.4.2 Logical topology and network properties

With the logical topology tables, it is possible to assign a logical role¹ to any host computer or network at any given time. This is done by setting the attributes from and to in table Assigned_logical_role. These attributes represent relative time from the beginning of the simulation (i.e. if there is a record where the attribute from is set to 10 seconds and the attribute to is set to 25 seconds, it means that the logical role which was assigned to a network element is active between the 10th and 25th second of the simulation).
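The relative-time semantics of from and to can be sketched in Java. This is only an illustration and not code from the CPG: the class AssignedLogicalRole and its methods are hypothetical names, and treating both bounds as inclusive is an assumption, since the text does not say whether the boundary seconds belong to the interval.

```java
import java.time.Duration;

/** Small demo: the record from the example above (from = 10 s, to = 25 s). */
final class LogicalRoleDemo {
    public static void main(String[] args) {
        AssignedLogicalRole role = new AssignedLogicalRole(
                1, "victim", Duration.ofSeconds(10), Duration.ofSeconds(25));
        System.out.println(role.isActiveAt(Duration.ofSeconds(12))); // prints "true"
        System.out.println(role.isActiveAt(Duration.ofSeconds(30))); // prints "false"
    }
}

/** Hypothetical in-memory view of one record of table Assigned_logical_role. */
final class AssignedLogicalRole {
    private final int networkElementId;
    private final String roleName;
    private final Duration from; // time since the beginning of the simulation
    private final Duration to;   // time since the beginning of the simulation

    AssignedLogicalRole(int networkElementId, String roleName, Duration from, Duration to) {
        if (to.compareTo(from) < 0) {
            throw new IllegalArgumentException("'to' must not precede 'from'");
        }
        this.networkElementId = networkElementId;
        this.roleName = roleName;
        this.from = from;
        this.to = to;
    }

    int getNetworkElementId() {
        return networkElementId;
    }

    String getRoleName() {
        return roleName;
    }

    /** Assumption: both bounds are inclusive. */
    boolean isActiveAt(Duration sinceSimulationStart) {
        return sinceSimulationStart.compareTo(from) >= 0
                && sinceSimulationStart.compareTo(to) <= 0;
    }
}
```

Using java.time.Duration rather than a timestamp keeps the value explicitly relative to the start of the simulation, which matches how the attributes are described.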
A network element may change its logical role any number of times² but must not have more than one role assigned at any given moment. Network elements may also belong to different logical groups according to their geographical, organizational or any other logical structure (tables Group and Network_element_group). Logical groups are not currently used in the CPG prototype. Table Network_property holds the values of various network parameters that can be specified for either links or interfaces.

2.4.3 Measurement infrastructure

The measurement infrastructure storage contains data about all probes deployed in the virtual network. Each probe has a record in table Measuring_probe which contains general information about the probe. Additionally, since probes can be of different types, every probe must have a record in exactly one of the child tables of Measuring_probe. Probes deployed directly in hosts have their records in table Host_probe, probes that monitor communication passing through host interfaces within a network are stored in table Internal_network_probe, and probes monitoring communication between networks are kept in table External_network_probe. Because one external network probe can monitor several links and one link can be monitored by several probes, there is also a binding table Monitored_link which reflects this possibility in the database.

2.4.4 Definitions of network parameters and characteristics

There are many types of network parameters and characteristics that can be measured during scenario executions. Any new scenario may define a new set of such measurable characteristics, which is why they need to be stored in a general and extensible way. All the definitions of network parameters and characteristics are stored in the table Phenomenon_type, which holds their name and their unit.
If a phenomenon type cannot be measured as a numerical value with a unit, it is assigned an enumeration of values that it may acquire (table Phenomenon). This is an example of the definition of a phenomenon type with a unit:

    phenomenon_type
      name = "Number of bits"
      unit = "bit"

This is a phenomenon type that has a list of supported values specified:

    phenomenon_type
      name = "network protocol"
      unit = null
    phenomenon
      name = "UDP"
    phenomenon
      name = "TCP"

The Measurement_type table stores information about the particular measured characteristics for a scenario. Such characteristics may be more complex and may be composed of several phenomenon types. Therefore, there is a binding table Measured_phenomenon_type between the tables Measurement_type and Phenomenon_type. The binding table also specifies whether the phenomenon type it binds is a derived characteristic or not:

    measurement_type
      name = "traffic increase"
        via measured_phenomenon_type (derived_characteristic = "no")
          phenomenon_type
            name = "flows"
        via measured_phenomenon_type (derived_characteristic = "yes")
          phenomenon_type
            name = "5 min cummulative flows"

Table Detection_method should serve as a place for storing different methods of measuring network characteristics, but it has not been used in the prototype so far.

¹ Currently one of: idle, victim, bot, master.
² For example, a victim may become the attacker and then a victim again in a single scenario.

2.4.5 Observation storage

Measurements obtained during the experiment are located in the observation storage. Table Measurement keeps all the numeric values of all measured phenomenon types. The values stored there are the actual measured values sent by probes during the experiment.
The phenomenon types that use an enumeration of values rather than numeric values (the ones with records in table Phenomenon) have their measurements stored in table Category_observation. There is also a table Observation which serves as the parent of the two previously mentioned tables and keeps the information about the origin (which probe acquired the data) and the date and time of each measurement. It also specifies whether the measurement took place on a link or on a host's interface (either incoming or outgoing communication) by providing its identifier.

3 Technologies for Mapping Relational Data to Objects in Java

Many contemporary applications use huge amounts of data which need to be stored and retrieved in some fashion. Although there is a wide range of available technologies for data management, relational databases remain the most popular general-purpose data stores. The vast majority of the world's corporate data is most likely stored in them [8].

However, in an object-oriented environment such as the one provided by Java, objects are used to represent data. They are an ideal abstraction for building complex systems as they offer features like encapsulation, inheritance and polymorphism. The problem is that objects are only accessible while the Java Virtual Machine (JVM) is running. If the JVM stops, all of the objects are lost unless there is a mechanism which stores them for later use [9].

The common solution is to store important objects in a relational database. The mechanisms that are responsible for it must ensure that the data held by the objects during execution of the application are accessible even after the application has been terminated. They must also provide a way of recreating the objects from the stored data. This is called object persistence [10].
Using relational databases to achieve object persistence, however, introduces a new problem to application development which is commonly known as the object-relational impedance mismatch.

3.1 The object-relational impedance mismatch

The impedance (or paradigm) mismatch between the relational and object environments occurs due to their different perception of data. Relational databases see data as relations that are stored in tables made of rows and columns. Data identification is provided by special unique columns or sets of columns called primary keys. Foreign keys and join tables represent the relationships between tables. On the other hand, objects have their identity, state and behavior. They can inherit from other objects and may have references to collections of other objects or to themselves [9]. It can also be said that the object-oriented paradigm is based on proven software engineering principles, while the relational paradigm is based on proven mathematical principles, and therefore the two technologies do not work together seamlessly [11]. There are five particular mismatch problems [12] [13]:

1. Granularity
In relational databases, granularity can be implemented on only two levels: table and column. Moreover, columns should have only atomic values (first normal form). If there is a need to model a composition, the table must either have all columns of all composite objects or have foreign keys to other tables that again can only have atomic columns. In object-oriented languages, however, programmers can define classes with different levels of granularity: coarse-grained classes like User, which could be composed of several finer-grained classes such as Address or Person, and also atomic values like username (String class).

2. Subtypes
Inheritance is a widely used feature in object-oriented languages and therefore many object models have a lot of subtypes defined.
This complicates conversion between the object and relational environments since there is no inheritance defined in the standard for relational databases. Although some databases implement it (like PostgreSQL [14]), inheritance is not usually supported.

3. Identity
In relational databases, the identity of a row is specified by its primary key. Each tuple in a table must differ from the others at least by its primary key and therefore each tuple is unique. Object-oriented languages, however, define both identity and equality for objects. Two non-identical objects (their locations in memory differ) may have the same state and be recognized as equal.

4. Associations
Associations in the object environment are represented as unidirectional references between objects. If there is a need to define a bidirectional relationship, the association has to be defined twice. On the other hand, relationships in relational databases are modelled by foreign keys and binding tables (if needed).

5. Data navigation
To navigate through an object-oriented environment, associations between objects have to be used. Getting from one object to another is therefore done by walking through the object network. This is not an efficient way of retrieving data from a relational database. To minimize the number of SQL queries, tables are usually joined first and the data are selected from the resulting table.

3.2 Mapping relational data to objects in persistence layer

Java Enterprise Edition (Java EE) applications are usually divided into several layers, which allows them to keep different kinds of functionality separated. Most applications have three distinct layers [15]. The presentation layer exposes the user interface, which is responsible for the presentation of data and interaction with users. The business logic layer encapsulates all application logic, data analysis and processing. The data access layer is responsible for storing data in and retrieving data from long-term storage.
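The division of responsibilities between the three layers can be sketched in plain Java. This is a minimal illustration under stated assumptions rather than the design of any particular application: all class and method names are hypothetical, and an in-memory class stands in for a real database-backed data access object.

```java
import java.util.ArrayList;
import java.util.List;

/** Small demo that wires the three layers together. */
final class LayersDemo {
    public static void main(String[] args) {
        MeasurementView view =
                new MeasurementView(new MeasurementService(new InMemoryMeasurementDao()));
        System.out.println(view.render(1)); // prints "probe 1: average = 20.0"
    }
}

/** Data access layer: hides where and how the values are stored. */
interface MeasurementDao {
    List<Double> findValues(int probeId);
}

/** In-memory stand-in for a database-backed DAO (illustration only). */
final class InMemoryMeasurementDao implements MeasurementDao {
    @Override
    public List<Double> findValues(int probeId) {
        List<Double> values = new ArrayList<>();
        if (probeId == 1) {
            values.add(10.0);
            values.add(30.0);
        }
        return values;
    }
}

/** Business logic layer: processes data, unaware of storage and output format. */
final class MeasurementService {
    private final MeasurementDao dao;

    MeasurementService(MeasurementDao dao) {
        this.dao = dao;
    }

    /** Returns the arithmetic mean of the stored values, or 0.0 if there are none. */
    double average(int probeId) {
        List<Double> values = dao.findValues(probeId);
        if (values.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double value : values) {
            sum += value;
        }
        return sum / values.size();
    }
}

/** Presentation layer: formats results for the user. */
final class MeasurementView {
    private final MeasurementService service;

    MeasurementView(MeasurementService service) {
        this.service = service;
    }

    String render(int probeId) {
        return "probe " + probeId + ": average = " + service.average(probeId);
    }
}
```

Because the service depends only on the MeasurementDao interface, the in-memory implementation could be swapped for one backed by a relational database without touching the other two layers.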
The mapping of relational data to objects in Java is usually implemented in the data access layer, which is also called the persistence layer.

Figure 3: Three-layered architecture [61]

There are three currently popular approaches for handling the mapping in Java [16] [17]. Developers of the persistence layer can directly use a call-level API for SQL-based database access; in Java this is the JDBC API³. If using the JDBC API directly is not considered suitable for a project, a persistence framework can be used. Many persistence frameworks in Java provide Object-Relational Mapping (ORM) solutions. These frameworks hide the underlying relational storage and the JDBC API calls. By employing a variety of mapping techniques they also allow developers to work only with an object domain model, without needing deep SQL knowledge. Although ORM technology has many advantages, sometimes it is not the best solution for data access. For these cases there are hybrid and non-ORM (usually SQL-centric) frameworks, each with its own advantages and disadvantages. The following sections focus on each of the approaches along with some available technologies that implement them.

³ JDBC is a trademarked name, but it is often thought to stand for Java Database Connectivity. Java™ DataBase Connectivity was later added as a second trademarked name [21].

3.3 Direct JDBC API calls

3.3.1 History

The JDBC API is the industry standard for connectivity between the Java programming language and a wide range of databases. It was first released as a part of Java Development Kit (JDK) 1.1 in February 1997 and has formed part of the Java Standard Edition (Java SE) ever since [18]. Later versions came in 1999 (JDBC 2.1), 2001 (JDBC 3.0) and 2006 (JDBC 4.0). The newest version, JDBC 4.2, was specified by a maintenance release of Java Specification Request (JSR) 221 in October 2011 and included in Java SE 8 [19].
3.3.2 Technology

The JDBC API is said to allow a Java programmer to exploit "Write Once, Run Anywhere" capabilities for applications that need to access enterprise data [20]. As shown in Figure 4, two sets of interfaces together form the technology: the JDBC API for application developers and the JDBC Driver API for driver writers [20]. In the rest of the text only the JDBC API for application developers will be considered.

Figure 4: JDBC interfaces [21] (components shown: Java Application, JDBC API, JDBC Driver Manager, JDBC Driver API, JDBC Implementation)

The JDBC API allows developers to do three things [21]:

1. Establish a connection with a data source
2. Send queries and update statements to the data source
3. Process the results

A simple example of all three steps is shown in the following code fragment:

    // uses java.sql.Connection, DriverManager, Statement and ResultSet

    /* 1. Establish connection with a database */
    Connection connection = DriverManager.getConnection(url, username, password);

    /* 2. Prepare and execute a query */
    Statement statement = connection.createStatement();
    String sql = "SELECT id, last_name FROM person";
    ResultSet rs = statement.executeQuery(sql);

    /* 3. Extract data from the result set */
    while (rs.next()) {
        int id = rs.getInt("id");
        String lastName = rs.getString("last_name");
        // ... use the retrieved data
    }

3.3.3 When to use the JDBC API directly

The JDBC API is a low-level database access tool in Java. As such, it introduces no unnecessary overhead when establishing connections with databases and executing SQL statements, which means that it performs very well. It also lets developers use any desired feature of the target relational database, since it can execute any SQL statement. However, there are several reasons why using the JDBC API directly is not a good choice for many applications. The JDBC API offers no automatic object mapping, so programmers have to map the data to objects themselves.
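To make the manual mapping concrete, the following is a minimal sketch that assumes the person table from the fragment above. The Person and PersonJdbcDao classes are hypothetical names introduced only for this illustration; the JDBC calls themselves come from the standard java.sql package.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain class matching the person table used above.
class Person {
    private final int id;
    private final String lastName;

    Person(int id, String lastName) {
        this.id = id;
        this.lastName = lastName;
    }

    int getId() { return id; }
    String getLastName() { return lastName; }
}

class PersonJdbcDao {
    // Copies the current row of the result set into a new object by hand.
    static Person mapRow(ResultSet rs) throws SQLException {
        return new Person(rs.getInt("id"), rs.getString("last_name"));
    }

    // Runs the query and maps every row; the caller supplies an open connection.
    static List<Person> findAll(Connection connection) throws SQLException {
        String sql = "SELECT id, last_name FROM person";
        List<Person> people = new ArrayList<>();
        // try-with-resources closes the statement and the result set automatically
        try (PreparedStatement statement = connection.prepareStatement(sql);
             ResultSet rs = statement.executeQuery()) {
            while (rs.next()) {
                people.add(mapRow(rs));
            }
        }
        return people;
    }
}
```

Every additional column or table needs another hand-written line of this kind, which is where the programming overhead of direct JDBC usage comes from.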
Moreover, writing JDBC code is time-consuming and verbose, and therefore error-prone. A programmer has to write code to obtain connections, prepare the statements to execute, handle the results, close all the connections and handle exceptions. All of this leads to very complex and hard-to-maintain code in larger projects. Also, as the JDBC API calls and SQL statements are usually embedded in the code, any change (such as changing the database vendor) is very difficult.

Accessing relational data by direct JDBC API calls should be employed when there are very specific requirements on the executed SQL statements or when the best possible performance is required. It should not be used when the business object model is complex and changes are expected, nor for large projects.

| Advantages | Disadvantages |
| Simple syntax, easy to learn | Complex when used in large projects |
| Good performance with large data | Programming overhead |
| Good for small applications | No cache |
| Control over executed SQL | Database specific queries |
| No SQL code generated | Does not provide transparent persistence |

Table 1: Advantages and disadvantages of using the JDBC API directly

3.4 ORM frameworks

Object-Relational Mapping, broadly referred to as ORM, is a technique for converting data between incompatible type systems in object-oriented languages and relational databases. ORM forms an intermediary between the object model and the relational model and creates, in effect, a "virtual object database" that can be used from within a programming language [22]. This should allow developers to work only with objects during application development and shield them from the underlying relational database. Developers therefore do not need thorough knowledge of SQL, and the source code is not bound to any database vendor, which simplifies changing the vendor if needed.
3.4.1 History

The need for ORM frameworks emerged after Java 2 Enterprise Edition (J2EE) was released in 1999. J2EE provided two means of accessing persistent stores (mostly relational databases): JDBC and entity beans, which were a part of the Enterprise JavaBeans (EJB) framework [23]. Since using JDBC directly is not suitable in many cases and the EJB framework was considered too heavyweight and resource-consuming, new persistence solutions were sought. In 2002 the first popular, fully featured open source ORM solution was released - Hibernate [24]. Hibernate provided a simpler and more efficient way of persisting objects than EJB and soon became the most used persistence framework. Later, in 2006, a standard called Java Persistence API (JPA) was released as a part of JSR 220 [25]. As a new standard for managing relational data in Java, JPA combined the best ideas from other frameworks and standards actively used at that time [26], which apart from Hibernate and EJB were the Java Data Objects (JDO) standard [27] and the TopLink framework. JPA is currently in version 2.0 and the standard is distributed as JSR 317 [28].

3.4.2 Technologies

The Java Persistence API is a specification describing how POJOs (Plain Old Java Objects) can be persisted to a relational database without requiring the classes to implement any special interfaces or methods. It allows an object's object-relational mappings to be described by standard annotations within Java code or by XML (Extensible Markup Language) files. These mappings contain information about how Java classes map to relational database tables. In its EntityManager API, JPA also describes how query processing and transactions should be handled. Part of the JPA standard is devoted to the Java Persistence Query Language (JPQL) - an object query language that allows querying of objects from a relational database.
Nowadays there are several implementations of the JPA standard, such as EclipseLink [29], Apache OpenJPA [30] and Hibernate⁴ [31].

Hibernate, with its many features, extensive documentation and vast active community, is the most popular ORM framework in Java. As an open source project, it can be used for free. It supports lazy initialization and numerous fetching strategies and offers a scalable architecture. It also implements a second-level cache to speed up the execution of hot queries⁵. Hibernate supports most of the currently used relational databases, such as Oracle, DB2, SQL Server, MySQL, PostgreSQL and many more [32]. Since Hibernate provides many features and is generally a very mature project, it is very complex. Its complexity means a steep learning curve and makes debugging difficult. On the other hand, one of the biggest advantages that Hibernate has over other ORM frameworks is a toolset created for the framework, called Hibernate Tools [33]. It contains several powerful features that ease development with Hibernate, such as:

• Mapping Editor - an editor for creating Hibernate mapping files that supports autocompletion and syntax highlighting
• Hibernate Console - helps with the configuration of database connections and allows executing Hibernate Query Language⁶ (HQL) queries interactively against a database
• a database reverse engineering tool for generating domain model classes, Hibernate mapping files and HTML documentation

There are also many tools and frameworks that employ the idea of ORM but do not follow the JPA standard. Some of these frameworks are:

• Apache Cayenne [34] - an open source framework that provides ORM features, caching and remoting services. Cayenne offers a mode where multiple clients can connect to a data source through a Cayenne-controlled service (instead of via JDBC), which gives better control over centralized validation, caching and seamless persistence of objects.
It supports a number of relational databases, which results in good portability. It also includes a GUI-based database schema modeler to simplify learning the framework and speed up application development.
• Ujorm [35] - a small open source Java library partly inspired by Hibernate and Cayenne. It is a very lightweight framework with no runtime library dependencies. Ujorm offers a unique ORM module for rapid Java development and allows easy configuration of an ORM model by Java source code, annotations or XML. The key features include type-safe database queries, which ensure that most typing errors are detected before the application is run.

⁴ Although Hibernate was created before the JPA standard was established and has its own non-standardized functionality and features, it implemented the standard after JPA was released.
⁵ A hot query is the second, third or any later execution of a query (the first execution is a cold query).
⁶ Hibernate Query Language [62] is a broader version of JPQL; JPQL is a subset of HQL.

3.4.3 When to use ORM framework

Opinions about using ORM frameworks differ a lot. Some state that ORM is a bad concept, an anti-pattern [36], and that ORM frameworks usually introduce a lot of problems into a project [37] [38] [39]. The problem with ORM is that it is a leaky abstraction [40], so it cannot completely shield the programmer from the underlying relational database. If the user of an ORM tool knows how ORM works and how it cooperates with relational databases, ORM may be the right choice for many projects.

Since trying to reach a full object abstraction over relational data generates complexity, ORM frameworks tend to be very complicated. This affects their performance. More mature and complex frameworks such as Hibernate must be properly configured to achieve acceptable performance [41], which may slow down a project.
On the other hand, if a project has no special performance requirements or the team includes an expert on the ORM framework used, this problem may not appear and the framework can be used successfully.

ORM frameworks can also be used when the development team has full control over both the business object model and the database model (schema). In this case, if one model evolves, the other can evolve with it without problems. However, in most cases the database and application developers are in separate groups and the database is often used by several teams. This makes any change to the database schema very inconvenient, which must be considered when choosing the framework.

A huge advantage of ORM frameworks is that they generate SQL statements themselves, so the statements are not embedded in the code and the resulting code is portable between supported databases. Switching the underlying database is therefore very quick and usually only involves changing the database driver in the framework's configuration. The downside of this behavior is that some ORM frameworks do not support stored procedures and advanced non-standardized database features provided by many vendors. Some ORM frameworks also support defining the mappings between objects and relations via annotations in the code, which can speed up development significantly.

Figure 5 contains simple guidelines about when an ORM should be used. If the business object model of a project is very complex and the application's performance is less important than, for example, portability, using an ORM tool should be beneficial for the project. On the other hand, if the model is simple and high performance is required, an ORM framework could cause problems. If, however, an expert on the framework is present in the project, the ORM framework may be considered.
Figure 5: When to use an ORM [63] (labels recovered from the figure: "Model Complexity", "Throughput", "Don't Use", "Optionally Use", "Definitely Use", "ORM Expert")

| Advantages | Disadvantages |
| Transparent persistence | Performance issues (needs tuning) |
| Encourages object-oriented design | Huge mapping overhead |
| Easy to change database vendor | Often very complex, difficult debugging |
| Often packed with powerful tools | Little control over executed SQL |
| No deep understanding of database required | Difficulties with handling complex queries |
| Good caching support | |

Table 2: Advantages and disadvantages of ORM frameworks

3.5 Hybrid frameworks

ORM frameworks can solve a lot of the problems that come with the impedance mismatch, but they may introduce new problems into a project if used inappropriately or in a situation for which they are not suitable. Direct JDBC API usage is usually not the best solution either, mostly due to its verbosity, limited portability and limited code reusability. For these situations there are several frameworks which, although they do not provide full object-relational mappings, offer a higher level of functionality than the pure JDBC API. They could be placed somewhere between ORM and JDBC and could be called hybrid frameworks.

Hybrid frameworks tend to be more SQL-centric than ORM frameworks. They usually serve as an intelligent wrapper around the JDBC API and remove the need for much of the boilerplate code that is typical for JDBC API-based applications. Developers therefore do not have to create connections, prepare statements, iterate through result sets and sometimes populate objects with data themselves, as the frameworks usually take care of such operations. The frameworks facilitate the execution of SQL via Java code and offer additional features.

3.5.1 Technologies

Each of the hybrid frameworks is unique and solves particular problems. The purpose of each framework has to be considered when choosing the correct technology for a project.
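The wrapper role that section 3.5 attributes to hybrid frameworks can be sketched with plain JDBC. The RowMapper interface and the SimpleQueryRunner class below are hypothetical helpers written for this illustration; they are not classes of any of the frameworks discussed here and only show which boilerplate such a wrapper takes over.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback: turns the current row into an object.
interface RowMapper<T> {
    T mapRow(ResultSet rs) throws SQLException;
}

// Hypothetical helper that hides statement preparation, parameter binding,
// iteration over the result set and resource clean-up.
class SimpleQueryRunner {
    private final Connection connection;

    SimpleQueryRunner(Connection connection) {
        this.connection = connection;
    }

    <T> List<T> query(String sql, RowMapper<T> mapper, Object... params) throws SQLException {
        List<T> result = new ArrayList<>();
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            for (int i = 0; i < params.length; i++) {
                statement.setObject(i + 1, params[i]); // JDBC parameters are 1-based
            }
            try (ResultSet rs = statement.executeQuery()) {
                while (rs.next()) {
                    result.add(mapper.mapRow(rs));
                }
            }
        }
        return result;
    }
}
```

With such a helper, the calling code shrinks to the SQL text and a one-line mapping, for example `runner.query("SELECT last_name FROM person WHERE id >= ?", row -> row.getString("last_name"), 1)`.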
Three different technologies will be briefly described in this section: Spring JDBC Framework, jOOQ and MyBatis.

The Spring JDBC Framework [42] is a part of the Spring Framework [43]. It takes care of the low-level details of the JDBC API, such as opening and closing connections, preparing and executing SQL statements, processing exceptions and handling transactions. A developer only has to define the connection parameters, specify the SQL statements to be executed and define the work to be done in each iteration while data are fetched from a database. The core class of the framework, which manages all database communication and exception handling, is the JdbcTemplate class. It supports the full functionality of the JDBC API while offering automatic clean-up of resources, translating the standard JDBC exceptions into RuntimeExceptions for better flexibility, and providing several ways of querying a database.

Java Object Oriented Querying (jOOQ) [44] is a Java database framework for building and executing type-safe, portable SQL queries. The main idea behind jOOQ is that SQL is a declarative language that is hard to integrate into object-oriented programming languages but is the correct tool for database querying. jOOQ takes SQL as an external domain-specific language [45] and maps it onto Java, creating an internal domain-specific language (DSL). jOOQ's main feature is SQL building. Developers can construct valid SQL statements directly in Java code using the internal DSL provided by jOOQ. The constructed statements can afterwards be executed against any of the many supported databases. The builder supports all of the standardized SQL functionality (like insert, update or any sort of select with joins, groups, etc.) and also some vendor-specific features (like MySQL's encryption functions) if a corresponding DSL subclass is used. Other features of jOOQ include SQL execution and code generation tools.
The following code fragment shows a simple SQL query written with the jOOQ framework:

    create.select(AUTHOR.FIRST_NAME, AUTHOR.LAST_NAME, BOOK.TITLE)
          .from(AUTHOR)
          .join(BOOK).on(BOOK.AUTHOR_ID.equal(AUTHOR.ID))
          .where(BOOK.LANGUAGE.equal("EN"))

MyBatis [46] is the most popular hybrid framework. Its first version was released on July 1, 2002 by Clinton Begin under the name "The iBATIS Database Layer" [47]. Back then the framework was not released as a separate product but as a part of JPetStore - a Java implementation of Microsoft .NET's Pet Shop [48]. However, it was well accepted by the Java community [49] and was later released separately under the Apache Software Foundation. When the project left the Apache Software Foundation, its name was changed to MyBatis.

MyBatis is a Java persistence framework that couples objects with stored procedures or SQL statements using an XML descriptor or annotations. Compared to ORM tools, the biggest advantage of the MyBatis data mapper is simplicity. Unlike ORM frameworks, MyBatis does not map objects to database tables but methods to SQL statements. It provides a mapping engine that maps SQL results to object trees in a declarative way (i.e. once the mapping of result columns to object properties is specified, MyBatis automatically creates new objects and populates them with the data from the result). One of the most powerful features of MyBatis is its Dynamic SQL, which allows developers to write flexible SQL statements that are interpreted differently for different parameter values. Since all SQL queries in MyBatis applications are hand-coded rather than generated, they can be properly optimized and tested before being deployed. All database functionality and vendor-specific features can be used easily this way, too. The downside is that portability between databases becomes less convenient if vendor-specific syntax is used.
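The idea behind dynamic SQL, a statement whose final text depends on the supplied parameter values, can be illustrated in plain Java. The sketch below is not the MyBatis API: MyBatis expresses such conditions declaratively in the mapper (for example with its if and where elements), whereas the hypothetical DynamicPersonQuery class here only mimics the resulting behavior, assuming the person table used earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of the dynamic SQL idea; this is NOT the MyBatis API.
class DynamicPersonQuery {
    final String sql;
    final List<Object> parameters;

    private DynamicPersonQuery(String sql, List<Object> parameters) {
        this.sql = sql;
        this.parameters = parameters;
    }

    // Builds a SELECT whose WHERE clause depends on which filters are non-null.
    static DynamicPersonQuery build(String lastName, Integer minId) {
        StringBuilder sql = new StringBuilder("SELECT id, last_name FROM person");
        List<String> conditions = new ArrayList<>();
        List<Object> parameters = new ArrayList<>();
        if (lastName != null) {
            conditions.add("last_name = ?");
            parameters.add(lastName);
        }
        if (minId != null) {
            conditions.add("id >= ?");
            parameters.add(minId);
        }
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return new DynamicPersonQuery(sql.toString(), parameters);
    }
}
```

Calling build(null, null) yields the unfiltered statement, while build("Novak", 5) appends both conditions and collects the two values to be bound as statement parameters.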
3.5.2 When to use hybrid frameworks

Hybrid frameworks differ a lot from each other. All of them have their own advantages and disadvantages and solve specific problems tied to accessing relational databases from Java. This means that if neither an ORM framework nor the direct use of the JDBC API is suitable for a project, a hybrid framework can be handpicked to match the particular problem.

As hybrid frameworks are generally more SQL-oriented than ORM frameworks, they excel in projects where developers know SQL well and understand how relational databases work. They are usually very lightweight, since no complex object-relational mappings take place, and are therefore easy to learn, use and debug and have decent performance.

| Advantages | Disadvantages |
| Full power of SQL | Does not provide full ORM solution |
| Good performance | SQL must be hand-written |
| Simplicity | Weak caching support |
| Easy debugging | No fully transparent persistence |
| Can be chosen for a specific problem | Knowledge about relational databases and SQL needed |

Table 3: Advantages and disadvantages of hybrid frameworks

4 System Requirements and Architecture

The Cybernetic Proving Ground's visualization module has two main responsibilities: to expose a GUI for controlling the system and to present the process of scenario simulation execution and its results to a user. The presentation of scenario execution is done by a number of more or less independent visualizations, each concentrating on a specific part or view of the simulation. Security scenarios may need to be presented by specific visualizations. As new scenarios are created, additional visualizations will be deployed to the existing module. The module is developed as a web application. To ensure appropriate scalability, the visualizations are implemented as portlets inside an enterprise portal.
The chosen platform for hosting the visualization portlets is Liferay Portal [50], a free and open source enterprise portal written in Java.

In order to present the simulation process and results, each visualization portlet needs a mechanism for accessing the database and acquiring the appropriate data. One possibility was for every portlet to maintain its own connection with the database, but that was not considered a suitable option, since every portlet developer would have to understand the database schema and write the data access code and queries himself. This could slow down the development process, and the resulting code would be difficult to maintain. Therefore it was decided that there should be a separate backend data service that would shield the frontend portlets from the database. Only the new Visualization Data Service (VDS) would access the database and prepare all of the required data. This concept is shown in Figure 6.

The first step when developing a new application or service is to analyze the problem and specify the requirements. Afterwards, based on the results of the analysis, technologies suitable for the implementation are selected and the application architecture is designed. These parts of software product development are extremely important, as any design or technology change would need far more resources (time, people, money, etc.) in later stages of the project than at the beginning. The analysis and design of the VDS are described in the following sections.

Figure 6: The placement of the Visualization Data Service

4.1 Requirements

The Visualization Data Service was to be designed and implemented at the same time as the visualization portlets to which it would provide the data. Therefore, most of the functional requirements on the service would be specified during the development.
In this case the functional requirements are almost entirely the various data services that the VDS would provide to the portlets based on their requests. The architecture of the VDS has to allow functionality to be added easily when needed. As a result, only a set of non-functional requirements on the Visualization Data Service and its architecture was established. It was based on the information about the CPG project, the data model and the planned visualization portlets. The non-functional requirements are as follows:

Simple architecture
The architecture of the service should be as simple and transparent as possible. Firstly, it is generally a good idea to follow the KISS principle [51], as it usually leads to faster development and more maintainable code. Secondly, the current implementation of the VDS is supposed to be a prototype for Release Candidate 1 of the visualization module, and after that it will probably be taken over by a different developer, most likely a student. A simple architecture and easy-to-learn frameworks should be used in order to ease the handover.

Ability to adapt to frequent database schema changes
At the beginning of the development of the data service, the CPG database schema (the current version is described in section 2.4) was not final and was expected to change occasionally if any of the CPG modules required it. The implementation of the VDS must not require complex code refactoring if such a change occurs.

Optimized for reading relational data
The data measured during a scenario execution will be stored in a relational database⁷ by modules other than the visualization module. The only responsibility of the VDS will be to read the required data, process them and expose them to the visualization portlets. It will not support any insert, update or delete operations.
Performance
The amount of data stored during the experiments is expected to be enormous, as several network and host characteristics will be measured periodically on a huge number of hosts, links and network interfaces. The service should not introduce unnecessary overhead while reading the data, so that it does not excessively slow down the visualization portlets and their interaction with the user.

Possibility to change the database vendor
Although PostgreSQL was agreed upon as the database, the service application should not be tightly coupled to any database vendor and should allow a relatively easy change of the vendor without extensive code modifications.

Independent from visualization portlets
Separating the data service from the portlets makes the visualization module more scalable. The development of the service and of the portlets can easily be separated as well (provided that the format of data requests and responses is agreed upon) and they can be deployed individually.

⁷ Specifically, a PostgreSQL [64] database has been chosen for the CPG prototype.

4.2 Choosing technologies

Several technology decisions had to be made prior to designing the application architecture. First of all, Java was chosen as the implementation language, as for the rest of the visualization module. Using Java ensures that the application will be platform independent, which is important because no information about the expected deployment of the service had been provided. Furthermore, it was decided that the Visualization Data Service would be implemented as a web service. It will expose a Representational State Transfer (REST) API for data requests and return JavaScript Object Notation (JSON) objects in its responses. This eases the separation of the service from the visualization portlets and enables it to be deployed independently.
The Spring Framework (referred to as Spring in the rest of the text) [43] has been chosen as the core framework the project will be built upon. Spring provides many useful features for Java projects, such as its implementation of an Inversion-of-Control (IoC) container, which simplifies unit testing and promotes the creation of reusable code. Spring also comes with the powerful Web Model-View-Controller (MVC) Framework, which supports creating RESTful services.

As the VDS is primarily a data-providing service, the most important technology decision was the choice of a suitable means of data access. Embedding pure JDBC API calls would make the project hard to maintain, and changing the database schema or vendor would require significant code changes. Therefore, the main decision was whether to use an ORM framework or one of the hybrid frameworks. After comparing the leading technologies, the MyBatis [46] framework was selected. Hibernate, or generally any framework implementing JPA, works better as a full ORM solution in projects where various CRUD⁸ operations are executed frequently and where database portability matters more than simplicity and adaptability to data model changes [52]. jOOQ is good for executing portable SQL statements, but it is also not suitable for projects with possible model changes. MyBatis, on the other hand, suits the project requirements well: it is not a complex framework and has good overall performance without unnecessary overhead, while providing sufficient data-to-object mapping features. The SQL statements in MyBatis are hand-written, so they can be easily modified if the data model changes.

For the integration of the two chosen frameworks, the MyBatis-Spring [53] project has been used. It is specifically designed to connect the MyBatis and Spring frameworks so that they form a well-cooperating base for building applications.
The project's library dependencies and build will be managed by Maven [54], a proven software project management and comprehension tool released by Apache.

⁸ Create, Read, Update, Delete

4.3 Architecture

The architecture of the VDS is designed in three main layers. Although it is not a typical web application but rather a web data service, the layers are similar to the ones described in section 3.2, with a few differences, as shown in Figure 7. The layered architecture was chosen because it improves the reusability and maintainability of the code and also transparently separates the responsibilities of the classes. The implementation of each layer is thoroughly explained in chapter 5.

Figure 7: The architecture of the Visualization Data Service (labels recovered from the figure: request URL and JSON response; presentation layer with the REST API, controllers and the Spring Web MVC Framework; service layer with service interfaces and service implementations; data access layer with mapper interfaces, mapper XML files and MyBatis; database)

The top layer is the presentation layer, which exposes the web interface of the service; in this case it is the REST API. It receives requests from its clients and responds with JSON objects populated with the desired data. The objects that form the presentation layer are called controllers. For request handling and response creation, controllers use features of the Spring Web MVC Framework.

Beneath the presentation layer are the service objects, which form the service layer. Services gather raw data objects from the underlying data access layer, process and transform the data (if needed) and send them to their respective controllers. Each service exposes an interface, which is used by the controller objects, so that the implementation of the service objects can easily be switched if needed.

The last layer, which separates the rest of the application from the database, is the already mentioned data access layer.
This layer is formed mostly by the MyBatis framework, which ensures the correct execution of SQL statements defined in files called mappers. Mappers consist of mapper interfaces and mapper XML files; their purpose and usage will be explained in section 5.4.

Apart from the described horizontal layers, the project is also separated vertically into components based on their functionality and their usage by the visualization portlets. There are currently three main components - the network, measurement and time components. Each of them encompasses data services that are logically related to each other and to the type of visualization portlets that are going to use them.

The network component makes the network-related data available and is used by the CPG topology visualization portlet. In the service layer it is divided into the network usage service and the topology service. The measurement component gives information about the phenomenon types measured during a scenario execution and about the actual stored data. It is used by portlets that visualize the measurements of various phenomenon types and the change of their values during a simulation. The time component offers information about the time, date and time zone of the stored measurements. Its service is mostly used by the time portlet, which controls the timeline of all of the visualizations. The documentation of the services offered by each component of the VDS can be found in appendix A.

The last architecture decision was to use Data Transfer Objects (DTOs) instead of business model objects [55]. DTOs are ordinary POJOs that are serializable [56] and contain only private fields accessible via getters and setters. When a request is received and the desired data are extracted from the database, appropriate DTOs are created and populated with the data. Afterwards they are passed through the layers, modified if necessary and transformed into JSON as the web service's response.
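A DTO of the kind described above might look like the following sketch. The TimeInfoDto class, its two fields and the TimeInfoJson helper are hypothetical and serve only as an illustration; in particular, the hand-written JSON rendering is an assumption made to keep the example self-contained and is not how the VDS produces its responses.

```java
import java.io.Serializable;

// Hypothetical DTO; the class and field names are illustrative only.
class TimeInfoDto implements Serializable {
    private static final long serialVersionUID = 1L;

    private long timestamp;
    private String timeZone;

    public long getTimestamp() { return timestamp; }
    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
    public String getTimeZone() { return timeZone; }
    public void setTimeZone(String timeZone) { this.timeZone = timeZone; }
}

class TimeInfoJson {
    // Hand-written rendering for illustration only; it does not escape special
    // characters, and a real service would delegate this to a JSON library.
    static String toJson(TimeInfoDto dto) {
        return "{\"timestamp\":" + dto.getTimestamp()
                + ",\"timeZone\":\"" + dto.getTimeZone() + "\"}";
    }
}
```

The DTO carries no behavior of its own, so each layer can fill in or adjust its fields before the object is finally serialized.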
The reason behind using DTOs instead of business objects is that the visualization portlets usually have very specific data needs and therefore require specific objects to hold and transfer these data. If more general business objects were used, they would still have to pass the data to some DTOs before a response is created, which would introduce unnecessary complexity.

5 Implementation

This chapter first explains the package structure of the project and then focuses on each of the project's layers and how they utilize the chosen technologies. Afterwards, it explains how the documentation of the REST API was created. At the end, the current deployment of the VDS within the CPG is described. The main purpose of this chapter is to guide the developers who will take over the project in the future through the whole implementation and technology usage, and to help them integrate faster into the visualization development group.

5.1 Package structure

Before the actual implementation can be described, it is important to understand the package and directory structure of the project. As shown in Figure 8, the structure follows the standard directory layout [57] used in Maven projects. On the top level there is pom.xml, an XML file that contains project information and configuration details used by Maven to build the project. The file includes the project's library dependencies, the plugins used and some additional information such as the project version and license.

The application source directory is /src/main/java. Inside, there is the project's main package cz.muni.fi.kypo, which contains all source packages and Java files that form the application. The packages are:

• rest - contains the controller classes that handle the REST requests. Controllers are described in section 5.2.
• service - this package is further divided into the api and impl packages; api provides the interfaces that are called and used by the controller classes and impl provides the concrete implementations of these interfaces, which are then injected into the controller objects. Services are discussed in section 5.3.
• mappers - the package includes all of the mapper interfaces that are used by MyBatis in the data access layer for calling SQL statements. The role of mapper interfaces is explained in section 5.4.4.
• transfer - the base package for all DTOs used by the application. It is subdivided into several packages - error, measurement, networkusage, time and topology - each containing the DTOs used by their respective service or controller classes. The error package contains a class for creating error objects, which are used in exception handling (section 5.2.4).

    src
        main
            java
                cz.muni.fi.kypo
                    mappers
                    rest
                    service
                    transfer
                        error
                        measurement
                        networkusage
                        time
                        topology
            resources
                mappers
                applicationContext.xml
                jdbc.properties
                log4j.properties
                project.properties
            webapp
                WEB-INF
        test
            java
            resources
    pom.xml

Figure 8: Package structure of the VDS project

The application's resource files are held in the directory /src/main/resources. There are three kinds of files:

• The application context XML file used by Spring for IoC container initialization.
• Properties files with various project settings for database connection, logging, etc.
• Mapper XML files inside the mappers package. The mapper XML files are described in section 5.4.3.

The directory /src/main/webapp is the web application source directory. In web applications it usually keeps all the frontend view files, cascading style sheets and JavaScript libraries. However, since this project exposes a REST API and returns JSON objects to the callers, there is no need for such files.
Therefore, the directory contains only the WEB-INF subdirectory with the web application deployment descriptor file, web.xml.

The test sources and resources are located in /src/test/java and /src/test/resources. The unit tests implemented here ensure that the data processing methods at the service layer return correctly transformed data. They also verify that the service methods return no null values unless they are expected.

5.2 Presentation layer

The presentation layer of the VDS exposes a REST API for the implemented data services. The REST requests are handled by the Spring Web MVC framework, which has supported easy creation of RESTful web applications and services since Spring 3.0. After receiving a request, the presentation layer calls the underlying service layer for the processed data and generates a response in the form of a JSON object.

The following sections explain how the presentation layer is implemented, how it handles the requests and how the JSON objects used in responses are created. Furthermore, the error handling and the solution for cross-origin requests are described.

5.2.1 Configuration and initialization

The core of the Spring Web MVC framework, and also of the presentation layer of the VDS, is a dispatcher servlet, which dispatches the received requests to the appropriate handlers. Requests are in the form of a Uniform Resource Locator (URL). In the VDS, there is a sole dispatcher servlet with the name kypoDispatcher configured in the web.xml file. It is mapped to all incoming URLs, which ensures that it receives all of the REST requests:

    <servlet-mapping>
        <servlet-name>kypoDispatcher</servlet-name>
        <url-pattern>/*</url-pattern>
    </servlet-mapping>

The dispatcher servlet then needs to know which classes contain handler methods, so that it can properly forward the requests at runtime. These classes are called controllers and are marked with the @Controller annotation.
Classes annotated as controllers are identified as Spring application components and are automatically searched for and instantiated during the initialization of the IoC container. This is done thanks to two lines defined in applicationContext.xml. The first line instructs Spring to look for and instantiate components that are located in the cz.muni.fi.kypo package and the second line specifies that these components can be recognized by their annotations.

There are currently three independent controllers implemented, one for each of the project's components: the network, measurement and time controller. Each of them extends the BaseController class, which contains methods common to all controllers.

5.2.2 Request handling

The request handling in the project's controllers is driven by various annotations. The most important one is the @RequestMapping annotation and its value property. It maps a URL⁹ (or URL pattern) onto an entire controller class or a particular handler method. The class-level annotation maps a specific request path (or path pattern) onto a controller, and additional method-level annotations narrow this primary mapping.

Some handler methods annotated with @RequestMapping also take parameters that can afterwards be used in the method's body. Parameters are either placed directly in the URL pattern defined in @RequestMapping and then bound via the @PathVariable annotation on a method parameter, or they are specified exclusively as a method parameter annotated with @RequestParam.
This is a code fragment from the MeasurementController class showing the usage of the above-mentioned annotations:

    @Controller
    @RequestMapping(value = "measurement")
    public class MeasurementController extends BaseController {

        @RequestMapping(value = "/{element}/{id}/initialData", method = RequestMethod.GET)
        @ResponseBody
        public MeasuredDataTO getInitialData(
                @PathVariable String element,
                @PathVariable int id,
                @RequestParam(value = "phenomenons") List<String> phenomenons) {
            // implementation omitted
        }
    }

In this case, all requests with the path /measurement are forwarded to MeasurementController by the dispatcher servlet. More specifically, the requests that match the pattern /{element}/{id}/initialData are handled by the getInitialData() method. A correct URL with specified parameters can be for example:

    /measurement/link/1/initialData?phenomenons=Number of bits,Number of packets

⁹ Note: all mentioned URL mappings denote just the ending of the URL of the REST request. For example, the whole URL would be http://localhost:8080/service-name/mapping, while the @RequestMapping annotation defines just the /mapping part.

After receiving such a request, the element variable would be set to "link", id would be 1 and phenomenons would contain a list of the strings "Number of bits" and "Number of packets".

5.2.3 Creating JSONs

JSON responses are created automatically by using Spring's @ResponseBody annotation on handler methods in controllers. If this annotation is put on a method, it indicates that the returned object should be serialized and written straight to the HTTP response body. There is therefore no need to create or modify any HTTP response manually, as Spring takes care of it completely. The usage of @ResponseBody can be seen in the code fragment from the MeasurementController class in the previous section.

Spring converts the returned object to a response body by using an HttpMessageConverter.
There are various types of converters, such as StringHttpMessageConverter for strings or ByteArrayHttpMessageConverter for byte arrays. For converting objects to (or from) JSON, which is what the project needs, MappingJackson2HttpMessageConverter has to be initialized. Most of the HttpMessageConverters are automatically set up by Spring if the <mvc:annotation-driven /> tag is specified in the application context file. However, MappingJackson2HttpMessageConverter is only initialized if the Jackson 2 library [58] for processing the JSON data format is also present in the project; otherwise the converter is not loaded.

Since several different converters are initialized, the Jackson converter is not always chosen by Spring to do the conversion. This happens with List, String, Long and some other common types, as they have their own converters. It is a problem, because such converters do not produce JSON responses. To force Spring to pick the Jackson converter, objects of these troublesome types are put into a Map, which is correctly converted into JSON. A code fragment from TimeController shows an example of returning a value of type Long:

    @RequestMapping(value = "/experiment/start", method = RequestMethod.GET)
    @ResponseBody
    public Map<String, Long> getExperimentStart() {
        Map<String, Long> experimentStart = new HashMap<String, Long>();
        experimentStart.put("experimentStart", timeService.getExperimentStart());
        return experimentStart;
    }

5.2.4 Exception handling

As in every application, an exception may be raised during the runtime of the VDS. However, the client that issued the REST request expects a JSON object in the response. Therefore, if an exception occurs, it must be caught, logged for later debugging and transformed into a comprehensible error message in JSON format that will form the response.
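The transformation of an exception into an error object can be sketched in plain Java as follows. The fields status and message follow the Error structure listed in appendix A and the status value 500 follows the text; the class name ErrorTO, the factory method fromException and the fallback for a missing message are assumptions made only for this illustration.

```java
/** Hypothetical error DTO with the two fields documented for the Error structure. */
final class ErrorTO {
    private final int status;      // HTTP response status code
    private final String message;  // information about the error

    ErrorTO(int status, String message) {
        this.status = status;
        this.message = message;
    }

    public int getStatus() { return status; }
    public String getMessage() { return message; }

    /** Turns any exception into an error object with status 500 (Internal Server Error). */
    static ErrorTO fromException(Exception e) {
        // Fall back to the exception type when no message is available.
        String message = e.getMessage() != null ? e.getMessage() : e.getClass().getSimpleName();
        return new ErrorTO(500, message);
    }

    public static void main(String[] args) {
        ErrorTO error = fromException(new IllegalStateException("database unreachable"));
        System.out.println(error.getStatus() + " " + error.getMessage());
    }
}
```

An object of this kind can be serialized to JSON in the same way as any other DTO, so the client always receives a response in the format it expects.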
Since exception handling is required in each controller, it is implemented in the abstract BaseController class, which the other controllers inherit from. The BaseController contains two exception handler methods marked with Spring's @ExceptionHandler annotation - one for catching runtime exceptions and the other for catching checked exceptions. When an exception is raised, the appropriate handler method intercepts it and logs the problem. Afterwards, an error object (which will be serialized into correct JSON) is constructed and the HTTP response status code is set to 500¹⁰, so the client knows that there was a problem while processing the request.

5.2.5 Cross-origin requests

The fact that the VDS is independent of the visualization portlets means that it may be deployed at a different site than the portlets - it has a different origin. Some portlets use JavaScript's XMLHttpRequest API for sending data requests to the VDS. The problem is that the same-origin security restrictions may prevent a client-side web application running from one origin (a visualization portlet) from obtaining data retrieved from another origin (the VDS).

The Cross-Origin Resource Sharing (CORS) mechanism [59] was designed to overcome such restrictions. CORS defines a technique for exchanging data between a client and a server with different origins by using a number of Access-Control headers. The VDS uses a CORS Filter library to set up a filter which enables and controls any cross-origin communication. The filter is configured to serve requests from any origin, but only HTTP GET requests, as the data service should not permit any data modification. The configuration is located in the web application deployment descriptor file, web.xml.

5.3 Service layer

The service layer, as mentioned before, is the main data processing layer of the project.
Services call the SQL queries via the mapper interfaces from the data access layer and process and transform the data into their final form.

¹⁰ Internal Server Error

The following sections discuss the architecture and configuration of the service layer and describe how the service layer helps with data processing and service optimization.

5.3.1 Architecture and configuration

The service layer consists of four service interfaces and their implementations: time, measurement, topology and network-usage. Although both topology and network-usage are parts of the network component, they are separated in the service layer as they offer two different types of data about a network - the topology service returns information about the network nodes and their roles, while the network-usage service focuses on the link load and the general usage of the network.

The separation of the service layer into interface and implementation classes allows simple switching of the data access technology. If a technology other than MyBatis had to be used in the data access layer in the future, only minor changes would be needed in the presentation layer thanks to the usage of interfaces.

Each service implementation class is annotated with @Service, which ensures that it is recognized as a Spring application component. The process is similar to the one for controllers - the <context:component-scan /> tag from the Spring application context searches for and instantiates all annotated services during the initialization of the IoC container. Services are then ready to be used.

5.3.2 Processing data and optimization

A lot of the data processing and application logic is placed in the actual SQL queries in the data access layer. They query only for the required data and use database functions to transform the data if needed. However, this only works for simpler tasks, where a single SQL call is sufficient to gather all the data for a response.
In such cases, services only forward the result that they got from the data access layer.

The primary task of the service layer is the extraction and processing of more complex data (i.e. when more than one SQL query has to be executed) and optimization. Regarding optimization, there are cases in the project when the result for a data request could easily be constructed by executing several simple SQL queries. That would, however, be highly inefficient, as each database call is a costly operation. Therefore, a single, more general SQL query is executed and its result is processed at the service layer, which afterwards prepares the correct data for the response. Naturally, the algorithm that processes the data must provide results faster than the execution of multiple SQL queries; otherwise it would not be beneficial to the project.

The getDataByTimestampRange() method from the measurement service is a good example of this technique. With multiple queries, it returned results in around two to three seconds, which was unacceptable performance. After the optimization with a single SQL call and a processing algorithm with O(n) time complexity, it is able to generate a result in roughly 50 milliseconds.

5.4 Data Access Layer

The core of the data access layer is the MyBatis framework in version 3.2.3. After acquiring a database connection from the Spring framework, it executes the SQL queries, creates the DTOs and populates them with the extracted data. The constructed DTOs then work their way up through the rest of the application's layers and are returned as a response to the user.

The following sections describe the main components of the MyBatis framework that are used in the VDS and how they are initialized and configured. Later, the concept of the mappers and their usage is explained. The dynamic SQL capabilities provided by MyBatis and their contribution to the project are shown at the end.
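Before the individual MyBatis components are described, the single-query technique from section 5.3.2 can be made more concrete with a small sketch. The thesis does not list the actual algorithm, so the code below is only an assumed illustration. It supposes that one general query returns rows of (timestamp, phenomenon name, value) ordered by timestamp, and it carries the last known value of each phenomenon forward, as described for the dataRange service in appendix A. The names Row, MeasuredData and process are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RangeProcessingSketch {

    /** One row of the assumed general query result, ordered by timestamp. */
    static final class Row {
        final long timestamp;
        final String phenomenon;
        final double value;

        Row(long timestamp, String phenomenon, double value) {
            this.timestamp = timestamp;
            this.phenomenon = phenomenon;
            this.value = value;
        }
    }

    /** One result item: a timestamp with one value per requested phenomenon. */
    static final class MeasuredData {
        final long timestamp;
        final List<Double> data;

        MeasuredData(long timestamp, List<Double> data) {
            this.timestamp = timestamp;
            this.data = data;
        }
    }

    /** Single pass over the rows; linear in the number of rows for a fixed number of phenomena. */
    static List<MeasuredData> process(List<Row> rows, List<String> phenomenons) {
        // Position of each requested phenomenon in the output list.
        Map<String, Integer> position = new HashMap<>();
        for (int i = 0; i < phenomenons.size(); i++) {
            position.put(phenomenons.get(i), i);
        }
        Double[] lastKnown = new Double[phenomenons.size()]; // null = no value seen yet
        List<MeasuredData> result = new ArrayList<>();
        int i = 0;
        while (i < rows.size()) {
            long timestamp = rows.get(i).timestamp;
            // Consume all rows that share the current timestamp.
            while (i < rows.size() && rows.get(i).timestamp == timestamp) {
                Row row = rows.get(i++);
                Integer pos = position.get(row.phenomenon);
                if (pos != null) {
                    lastKnown[pos] = row.value;
                }
            }
            // Snapshot of the most recent known values for this timestamp.
            result.add(new MeasuredData(timestamp, new ArrayList<>(Arrays.asList(lastKnown))));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(100L, "Number of bits", 5.0),
                new Row(100L, "Number of packets", 1.0),
                new Row(200L, "Number of bits", 7.0));
        for (MeasuredData item : process(rows, Arrays.asList("Number of bits", "Number of packets"))) {
            System.out.println(item.timestamp + " " + item.data);
        }
    }
}
```

The point of the design is that the database is contacted once, and every row is visited once in memory, instead of one query being issued per timestamp or per phenomenon.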
5.4.1 SqlSession and SqlSessionFactory

The primary component and the most powerful class of MyBatis is the SqlSession. Everything from getting the correct mappers to executing the SQL statements is done via SqlSession instances. The creation of SqlSession objects is the responsibility of the SqlSessionFactory class. Normally, an instance of the SqlSessionFactory is acquired from the SqlSessionFactoryBuilder, which has to be invoked manually in the code and must be given the configuration details defined in a configuration XML file.

However, the VDS is built with the Spring framework and uses the MyBatis-Spring library, which eases the configuration and concentrates it in the Spring application context file. The SqlSessionFactoryBuilder is replaced by the SqlSessionFactoryBean, which is instantiated automatically during the IoC container initialization and handles the creation of a SqlSessionFactory object when it is needed.

5.4.2 Configuration of the data access layer's components

The configuration of the data access layer's components is located in the applicationContext.xml file and consists of two parts. The first part is the construction of the SqlSessionFactoryBean, which is given three properties: dataSource, mapperLocations and typeAliasesPackage.

The dataSource is a common JDBC data source bean specified elsewhere in the application context file and referenced here. It is later used by the SqlSession instances created from the SqlSessionFactoryBean to acquire database connections. The mapperLocations property specifies the path to the mapper XML files which contain the SQL queries. The last property, typeAliasesPackage, is optional. However, if it is configured, the full class names can be replaced in the mapper XML files by their shorter versions without the package name. For example, a class with the name cz.muni.fi.kypo.transfer.topology.RouterTO can be referenced just as RouterTO, which improves the readability of the mapper files.
The second part of the configuration is the registration of the mapper interfaces. Each mapper interface has to be registered in order to be used for SQL query execution. Normally the interfaces would have to be specified separately, but the MyBatis-Spring library makes it possible to register all mapper interfaces automatically with a single tag placed in the application context file.

Although the mapper interfaces are registered with MyBatis after this configuration, they also need to be recognized as Spring components, so that they can be injected into and used by the services. In order to accomplish this, each one of them is marked with Spring's @Repository annotation.

5.4.3 Mapper XML files

There are two types of mappers in the VDS's data access layer - the mapper XML files and the mapper interfaces. Their cooperation is the key to executing SQL queries and mapping the returned data to the DTOs. Before the interfaces and the cooperation of mappers are explained, it is important to understand the concept of the mapper XML files.

The mapper XML files contain all SQL statements that the project is able to execute against the database. The SQL statements may be of any kind - data manipulation language (CRUD operations) or data definition language. The VDS, however, only extracts data, so the only type of SQL statements used in the project are SQL queries - selects. Each SQL query defined in a mapper XML file is enclosed in a select element:

    <select id="getAllLinks" resultType="LinkTO">
        SELECT r.id as id,
               r.src_network_id as srcId,
               r.dst_network_id as destId
        FROM routing r
    </select>

The identifier of this query is getAllLinks and the result type is LinkTO, which means that one row of values from the resulting columns id, srcId and destId populates the properties of the same name in a newly created LinkTO object. If there are more rows in the result, more objects are created and returned as a list of LinkTO objects.
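The name-based mapping described above can be pictured with the following sketch. It is a conceptual stand-in written with plain reflection, not MyBatis code: a result row is modelled as a map from column label to value, and each value is passed to the setter whose name matches the label. The class RowMappingSketch, the method mapRows and the nested LinkTO are hypothetical names used only here.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowMappingSketch {

    /** Hypothetical DTO whose property names equal the column aliases of the query. */
    static final class LinkTO {
        private int id;
        private int srcId;
        private int destId;

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public int getSrcId() { return srcId; }
        public void setSrcId(int srcId) { this.srcId = srcId; }
        public int getDestId() { return destId; }
        public void setDestId(int destId) { this.destId = destId; }
    }

    /** Creates one object per row and fills every property whose setter matches a column label. */
    static <T> List<T> mapRows(List<Map<String, Object>> rows, Class<T> type) {
        List<T> result = new ArrayList<>();
        try {
            for (Map<String, Object> row : rows) {
                T target = type.getDeclaredConstructor().newInstance();
                for (Map.Entry<String, Object> column : row.entrySet()) {
                    String label = column.getKey();
                    if (label.isEmpty() || column.getValue() == null) {
                        continue; // nothing to set
                    }
                    String setter = "set" + Character.toUpperCase(label.charAt(0)) + label.substring(1);
                    for (Method method : type.getMethods()) {
                        if (method.getName().equals(setter) && method.getParameterCount() == 1) {
                            method.invoke(target, column.getValue());
                        }
                    }
                }
                result.add(target);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map row to " + type.getName(), e);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("srcId", 10);
        row.put("destId", 20);
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row);
        LinkTO link = mapRows(rows, LinkTO.class).get(0);
        System.out.println(link.getId() + ": " + link.getSrcId() + " -> " + link.getDestId());
    }
}
```

The sketch also shows why the column aliases in the query matter: a column labelled src_network_id would find no matching setter, whereas the alias srcId does.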
The resultType attribute can be used for simple mapping when these two conditions are fulfilled:

• The names of the returned columns match the names of the result type's properties
• The result type contains only single-object properties

If the result type contains a collection of objects that needs to be mapped to the returned data, or the column names do not match the property names, a resultMap must be specified and used in the select. In the TopologyMapper.xml file, there is a result map prepared for the RouterTO result type, which contains a collection of NodeInterfaceTO objects declared as follows:

    <collection property="nodeInterfaces" ofType="NodeInterfaceTO">

The result map is then referenced from a select through its resultMap attribute.

5.4.4 Mapper interfaces and mapper cooperation

The SQL queries that are stored in the mapper XML files can only be executed by an SqlSession object. Direct work with SqlSession objects is impractical and therefore the concept of mapper interfaces was introduced in newer versions of MyBatis. Mapper interfaces are Java interfaces that do not have any implementation and act as a facade of the mapper XML files - calling a method of a mapper interface in fact executes an SQL statement from a mapper XML file.

Internally, a mapper interface is instantiated by an SqlSession object, which then provides all the SQL execution capabilities. After the mapper interfaces are registered in the application context file (as described in section 5.4.2), MyBatis-Spring creates the SqlSession objects and instantiates the mapper interfaces automatically, so they can be used without any further configuration. The only constraint is that mapper interfaces must have the same name as the mapper XML files¹¹.

When a mapper interface's method is called in the code of the VDS, the SqlSession object invokes the execution of the SQL query whose id is the same as the name of the called method.
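The dispatch by method name can be illustrated with a JDK dynamic proxy, which is also the mechanism MyBatis relies on for its mapper instances. The sketch below is a conceptual stand-in only: the map of statements replaces the mapper XML file, and the names MapperFacadeSketch and createMapper are hypothetical.

```java
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

final class MapperFacadeSketch {

    /** Mapper interface without any implementation class. */
    interface TimeMapper {
        List<Long> getAllTimestamps();
    }

    /**
     * Creates an instance of the interface whose methods look up a statement by
     * the method name. The map stands in for the statements of a mapper XML file.
     */
    static <T> T createMapper(Class<T> type, Map<String, Supplier<Object>> statements) {
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(),
                new Class<?>[] { type },
                (instance, method, args) -> {
                    Supplier<Object> statement = statements.get(method.getName());
                    if (statement == null) {
                        throw new IllegalStateException("No statement with id " + method.getName());
                    }
                    return statement.get();
                });
        return type.cast(proxy);
    }

    public static void main(String[] args) {
        Map<String, Supplier<Object>> statements = new HashMap<>();
        // Canned result in place of a real query execution.
        statements.put("getAllTimestamps", () -> Arrays.asList(100L, 200L, 300L));

        TimeMapper timeMapper = createMapper(TimeMapper.class, statements);
        System.out.println(timeMapper.getAllTimestamps());
    }
}
```

The caller works only with the TimeMapper interface and never sees how the statement behind the method name is located or executed.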
For example, the timeMapper.getAllTimestamps() call invokes the execution of the select with id="getAllTimestamps". The return type of the mapper interface's method must match the result type (or the result map's type) defined with the SQL query. If a list of objects is going to be returned after the SQL query execution, the method's return type has to be a list of those objects.

Mapper interfaces can also supply the mapper XML files with parameters. The parameters in method signatures are marked with the @Param annotation and are assigned a name. The following example is a code fragment from the MeasurementMapper interface:

    public String getMinTimestamp(@Param("elementId") int id /* rest of the parameters omitted */);

¹¹ Except the file extension

Such parameters can afterwards be used in the SQL queries in the mapper XML files with the #{param_name} syntax.

5.4.5 Dynamic SQL

Sometimes there is a need to construct SQL queries dynamically at runtime, based on the values of specific variables. Preparing dynamic SQL queries is usually troublesome, but MyBatis offers a fairly simple way of doing it in the mapper XML files. MyBatis's dynamic SQL allows a variety of tags to be used inside the SQL queries, which then act as SQL templates. When a method of a mapper interface is called, the SQL template is first evaluated into a correct SQL query and only then executed against the database. The evaluation is based on the parameters that the mapper interface passes to the mapper XML file.

The ability of MyBatis to create dynamic selects easily is widely used in the VDS, particularly in two situations. The first situation occurs when two SQL queries are almost the same except for a small part of the where or from clause. In such a case the possible fragments are enclosed in if or choose-when tags and the correct one is chosen by passing a specific parameter value via the mapper interface.
In the following example from MeasurementMapper.xml, the value of the parameter elementType determines which column of the table observation should be used for the id comparison. Exactly one of these three conditions ends up in the query:

    observation.routing_id = #{elementId} AND
    observation.node_interface_id_in = #{elementId} AND
    observation.node_interface_id_out = #{elementId} AND

This structure prevents the repetition of code and lowers the possibility of introducing bugs if the SQL query has to be modified (the changes are only made in one query).

The second situation in which the dynamic SQL capabilities of MyBatis are employed in the VDS occurs when a parameter that is going to be passed to the mapper XML file contains a list of values. Such a list usually varies in its length and there is no other way to construct an SQL query which uses it than building the query dynamically. For these cases there is the foreach tag, whose usage is shown in the next code fragment, also from the MeasurementMapper.xml file:

    WHERE phenomenon_type.name IN
    <foreach item="item" collection="phenomenonNames" open="(" separator="," close=")">
        #{item}
    </foreach>

This structure correctly expands the phenomenonNames list parameter into a list of values separated by commas and enclosed in brackets.

5.5 REST API documentation

The documentation of the REST API exposed by the VDS is very important for the developers of the visualization portlets. The portlets are the clients of the service and therefore their developers need to know how to call the data services (the URL patterns), what data the services return (the data semantics) and how the data are organized (the structure of the JSON objects).

Maintaining the documentation in an external file may introduce many problems, because it has to be updated separately whenever there is a change in the code. If a discrepancy between the external documentation and the code occurs, it often leads to numerous errors on the client side and costs the client developers time spent on unnecessary debugging of their code.
Therefore it was decided that the documentation will be kept in the code and will be rebuilt each time the code compiles. However, such documentation also needs a way to be accessed from outside of the code and should be easily readable and understandable.

For the above-mentioned reasons the VDS uses the JSONDoc [60] library for the construction of the REST API documentation. JSONDoc provides a number of annotations that can be used to document the request handler methods in controllers and also the objects that are being returned in JSON format - in the VDS's case they are the DTOs. If JSONDoc is correctly set up and all controller and DTO classes are properly annotated, the documentation can be returned from the VDS in the form of a JSON object by sending a REST request with the path /jsondoc.

Although the documentation in the form of a JSON object can be easily parsed, it is not the most suitable format for people to read. Therefore the JSONDoc UI project is utilized, which parses the returned JSON object and provides an intuitive GUI for browsing the documentation from a web browser. It also incorporates a playground component via which a user can easily send REST requests to the service and see what they return. The JSONDoc UI is deployed along with the VDS and is used by the developers of the visualization portlets.

5.6 Deployment

The Visualization Data Service is currently deployed in the same virtual machine as the Liferay portal with the visualization portlets. Each portlet uses the REST API of the VDS via the HTTP protocol to obtain the data it requires. The VDS gathers the data by connecting via JDBC to a PostgreSQL database, which is located on another virtual machine. The data are stored in the database by the measurement module of the CPG during the execution of a scenario.

[Deployment diagram: «Virtual machine» Ubuntu, «Application server» Apache Tomcat, «Component» Liferay]

        Allowed values: link, node-interface-in, node-interface-out
        A list of required phenomena. Must contain their full names separated by comma from each other.

Response object
    Object: Measured Data
    Multiple: False

Path: /measurement/{element}/{id}/nextData?phenomenons={phenomenons}&timestamp={timestamp}
Description: Returns the next timestamp (the next to the timestamp specified in the request URL) that has got the data for at least one of the specified phenomena + list of data. The data in the list are in the order of the phenomenon names specified in the request URL. Each phenomenon has exactly one value. The returned measured values are the most recent known data for the returned timestamp (i.e. if there are no data which would match the timestamp for the phenomenon, the last known data (= prior to returned timestamp) for this phenomenon are returned).

Path parameters
    element
        Type: String
        Allowed values: link, node-interface-in, node-interface-out
        Demanded topology element (if node interface is the element, specify whether ingoing or outgoing communication should be returned).
    id
        Type: int
        The id of the element (linkId, nodeInterfaceId)

Query parameters
    phenomenons
        Type: List
        Allowed values: link, node-interface-in, node-interface-out
        A list of required phenomena. Must contain their full names separated by comma from each other.
    timestamp
        Type: Long
        Timestamp formatted as UNIX epoch time in seconds in UTC time zone.

Response object
    Object: Measured Data
    Multiple: False

Path: /measurement/{element}/{id}/actualData?phenomenons={phenomenons}&timestamp={timestamp}
Description: Returns the same timestamp as specified in the request URL + list of data. The data in the list are in the order of the phenomenon names specified in the request URL. Each phenomenon has exactly one value. The returned measured values are the most recent known data for the returned timestamp (i.e. if there are no data which would match the timestamp for the phenomenon, the last known data (= prior to returned timestamp) for this phenomenon are returned).
Path parameters
    element
        Type: String
        Allowed values: link, node-interface-in, node-interface-out
        The requested topology element (if the element is a node interface, specify whether incoming or outgoing communication should be returned).
    id
        Type: int
        The id of the element (linkId, nodeInterfaceId).
Query parameters
    phenomenons
        Type: List
        Allowed values: link, node-interface-in, node-interface-out
        A list of required phenomena. Must contain their full names, separated by commas.
    timestamp
        Type: Long
        Timestamp formatted as UNIX epoch time in seconds in the UTC time zone.
Response object
    Object: Measured Data
    Multiple: False

Path: /measurement/{element}/{id}/dataRange?phenomenons={phenomenons}&startTimestamp={startTimestamp}&endTimestamp={endTimestamp}
Description: Returns a list of MeasuredData structures called "rangedMeasuredData". Each item in the list has a timestamp and a list of data. The rangedMeasuredData list is ordered by timestamp, and all timestamps lie between the start and end timestamps specified in the request URL. The data in the list of each MeasuredData structure are in the order of the phenomenon names specified in the request URL. Each phenomenon has exactly one value. The returned measured values are the most recent known data for the assigned timestamp (i.e. if no data match the timestamp for a phenomenon, the last known data (= prior to the assigned timestamp) for this phenomenon are returned).
Path parameters
    element
        Type: String
        Allowed values: link, node-interface-in, node-interface-out
        The requested topology element (if the element is a node interface, specify whether incoming or outgoing communication should be returned).
    id
        Type: int
        The id of the element (linkId, nodeInterfaceId).
Query parameters
    phenomenons
        Type: List
        Allowed values: link, node-interface-in, node-interface-out
        A list of required phenomena. Must contain their full names, separated by commas.
    startTimestamp
        Type: Long
        Timestamp formatted as UNIX epoch time in seconds in the UTC time zone. Specifies the start of the range of returned data.
    endTimestamp
        Type: Long
        Timestamp formatted as UNIX epoch time in seconds in the UTC time zone. Specifies the end of the range of returned data.

Time component services

Path: /time/timezone
Description: Returns the time zone of the minimal timestamp in the database. Other timestamps should have the same time zone.
Response object
    Object: Timezone
    Multiple: False

Path: /time/experiment/start
Description: Returns the first timestamp of any measured value in the database, which is considered the start of the experiment. The timestamp is formatted as UNIX epoch time in seconds in the UTC time zone and returned as a number called "experimentStart".

Path: /time/experiment/end
Description: Returns the last timestamp of any measured value in the database, which is considered the end of the experiment. The timestamp is formatted as UNIX epoch time in seconds in the UTC time zone and returned as a number called "experimentEnd".

Path: /time/all-timestamps
Description: Returns all distinct timestamps of measured values from the database (= all times of interest from the measurement's point of view). The timestamps are formatted as UNIX epoch time in seconds in the UTC time zone and returned as a list of numbers called "timestamps".

Structures of returned JSON objects

Name: Error
Description: Error is returned when an exception occurs in the application.
Fields
    status
        Type: int
        HTTP response status code.
    message
        Type: String
        Information about the error.

Name: Link
Description: Link between routers (networks).
Fields
    id
        Type: int
        Identifier of the link.
    srcId
        Type: int
        Identifier of the source router (network). It is also returned as "source" in the object and contains the router's topologyId.
    destId
        Type: int
        Identifier of the destination router (network). It is also returned as "target" in the object and contains the router's topologyId.

Name: Node Interface
Description: Node. Base class for other objects used as nodes in topology.
Fields
    id
        Type: int
        Identifier of the node. Topology objects also return "topologyId", which should be unique in the topology.
    name
        Type: String
        Name of the node.
    physicalRole
        Type: String
        Physical role of the node in the topology.
    address4
        Type: String
        IPv4 address for node interfaces or cidr4 address for routers (networks).
    hostNodeId
        Type: int
        Identifier of the node interface's host node (computer).

Name: Router
Description: Node. Base class for other objects used as nodes in topology.
Fields
    id
        Type: int
        Identifier of the node. Topology objects also return "topologyId", which should be unique in the topology.
    name
        Type: String
        Name of the node.
    physicalRole
        Type: String
        Physical role of the node in the topology.
    address4
        Type: String
        IPv4 address for node interfaces or cidr4 address for routers (networks).
    nodeInterfaces
        Type: List
        List of node interfaces connected to this router. This field is called "children" in the returned JSON because that name is specifically needed by d3.js in the visualization of the topology.

Name: Topology Objects
Description: Contains information about all separately visualized objects in the topology.
Fields
    links
        Type: List
        List of visualized links.
    interfaces
        Type: List
        List of visualized interfaces.
    routers
        Type: List
        List of visualized routers. The list does not contain the node interfaces connected to the routers.

Name: Topology
Description: Describes the network topology. The topology consists of routers, node interfaces connected to these routers and links between routers. The topology structure is specifically prepared for topology visualization.
Fields
    links
        Type: List
        List of links between routers in the topology.
    routers
        Type: List
        List of routers in the topology.
        This field is called "children" in the returned JSON because that name is specifically needed by d3.js in the visualization of the topology.

Name: Node Interface Link Usage
Description: Contains information about the usage of a link connecting a node interface with a router.
Fields
    nodeInterfaceId
        Type: int
        Identifier of the node interface from (or to) which the observed link leads.
    numberOfBits
        Type: double
        Absolute number of bits being sent through the link at a given moment.
    bandwidth
        Type: double
        Link's maximum bandwidth.
    bwUnit
        Type: String
        Unit in which the link bandwidth is expressed.
    load
        Type: double
        Load is always between 0 and 1. Load of the link = number_of_bits / bandwidth_in_bits.
    speed
        Type: double
        Speed is always between 0 and 1. Speed of the link = number_of_bits / number_of_bits_of_the_fastest_link.

Name: Router Link Usage
Description: Contains information about the usage of a link connecting two routers.
Fields
    id
        Type: int
        Identifier of the link between routers.
    numberOfBits
        Type: double
        Absolute number of bits being sent through the link at a given moment.
    bandwidth
        Type: double
        Link's maximum bandwidth.
    bwUnit
        Type: String
        Unit in which the link bandwidth is expressed.
    load
        Type: double
        Load is always between 0 and 1. Load of the link = number_of_bits / bandwidth_in_bits.
    speed
        Type: double
        Speed is always between 0 and 1. Speed of the link = number_of_bits / number_of_bits_of_the_fastest_link.

Name: Network Link Usages
Description: Contains lists of link usages in the network.
Fields
    routerLinks
        Type: List
        List of usages of links between routers (networks).
    interfaceLinksIn
        Type: List
        List of usages of links going from routers to node interfaces.
    interfaceLinksOut
        Type: List
        List of usages of links going from node interfaces to routers.

Name: Node Interface Role
Description: Denotes the logical role of a node interface in the network topology.
Fields
    id
        Type: int
        Identifier of the node interface.
        Also returns "topologyId", which should uniquely identify the node interface in the network topology.
    role
        Type: String
        The logical role of the node interface.

Name: Phenomenon
Description: Object describing a phenomenon.
Fields
    name
        Type: String
        Name of the phenomenon.
    unit
        Type: String
        Unit in which the data about this phenomenon are stored.

Name: Measured Data
Description: Contains measured values for different phenomena at a certain time.
Fields
    timestamp
        Type: String
        Timestamp formatted as UNIX epoch time in seconds in the UTC time zone.
    data
        Type: List
        List of measured values. Each value represents one phenomenon; their order depends on the request URL. For more information, check the methods that use MeasuredData as the return object.

Name: Timezone
Description: Time zone of the data in the database. All time data received from this service should be converted into this time zone so that they match the time when the data were stored (the service returns all times converted to UTC).
Fields
    timezoneIDs
        Type: List
        Possible text name representations of the time zone.
    offsetSeconds
        Type: Integer
        Offset of the time zone against UTC, measured in seconds.
    offsetSeconds
        Type: Integer
        Offset of the time zone against UTC, measured in hours.
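
The measurement services and link usage fields documented in this appendix can be illustrated with a short client-side sketch. The class VdsClientSketch, its method names and the phenomenon names "phenomenonA" and "phenomenonB" are hypothetical and are not part of the VDS; the sketch only mirrors the documented dataRange path, the "most recent known data" rule and the load and speed ratios. It does not show how the service itself computes these values.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.OptionalDouble;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical client-side helper; none of these names belong to the VDS itself.
final class VdsClientSketch {

    // Builds the relative dataRange path with the documented query parameters.
    static String dataRangePath(String element, int id, List<String> phenomenons,
                                long startTimestamp, long endTimestamp) {
        // Each name is URL-encoded, then the names are joined with commas.
        String names = phenomenons.stream()
                .map(name -> URLEncoder.encode(name, StandardCharsets.UTF_8))
                .collect(Collectors.joining(","));
        return "/measurement/" + element + "/" + id + "/dataRange?phenomenons=" + names
                + "&startTimestamp=" + startTimestamp
                + "&endTimestamp=" + endTimestamp;
    }

    // "Most recent known data": the value at the timestamp, or else the closest earlier one.
    static OptionalDouble lastKnownValue(NavigableMap<Long, Double> series, long timestamp) {
        Map.Entry<Long, Double> entry = series.floorEntry(timestamp);
        return entry == null ? OptionalDouble.empty() : OptionalDouble.of(entry.getValue());
    }

    // Load of a link = number of bits / bandwidth in bits.
    static double load(double numberOfBits, double bandwidthInBits) {
        return numberOfBits / bandwidthInBits;
    }

    // Speed of a link = number of bits / number of bits of the fastest link.
    static double speed(double numberOfBits, double fastestLinkBits) {
        return numberOfBits / fastestLinkBits;
    }

    public static void main(String[] args) {
        // Placeholder phenomenon names, chosen only for illustration.
        System.out.println(dataRangePath("link", 7,
                List.of("phenomenonA", "phenomenonB"), 100L, 200L));

        // Made-up sample series: timestamp in seconds -> measured value.
        NavigableMap<Long, Double> series = new TreeMap<>();
        series.put(100L, 1.5);
        series.put(130L, 2.5);
        System.out.println(lastKnownValue(series, 120L)); // falls back to the value at 100

        System.out.println(load(500.0, 1000.0));
        System.out.println(speed(500.0, 2000.0));
    }
}
```

A sorted map is used here only because its floorEntry lookup expresses the "value at the timestamp, otherwise the last earlier value" rule directly; a real client would receive values already resolved this way by the service.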
B Tutorials for deployment and configuration of the service

Deploying the VDS to Tomcat

Method 1 - by using the vds directory
• In the vds directory, [set up the database connection information]
• In the vds directory, [configure the server IP and port]
• Copy the vds directory into the designated Tomcat's /webapps directory
• If Tomcat is not running, start it
• Wait for the automatic deployment (usually around 10 seconds)

Method 2 - by using the vds.war file
• Copy the vds.war file into the designated Tomcat's /webapps directory
• If Tomcat is not running, start it
• Wait for the automatic deployment (usually around 10 seconds)
• A vds directory should appear in Tomcat's /webapps directory
• In the vds directory, [set up the database connection information]
• In the vds directory, [configure the server IP and port]

Setting up the database connection information
• Open the classes directory at /vds/WEB-INF/classes
• Open the file jdbc.properties
• Set up the database driver, URL, username and password (for PostgreSQL the driver is already set and only the IP and the database name need to be filled in the URL)
• Save the changes and restart Tomcat if the VDS is already deployed

Configuring the server IP and port
• Open the classes directory at /vds/WEB-INF/classes
• Open the file project.properties
• Set up the project.serverIP and project.serverPort properties
• Save the changes and restart Tomcat if the VDS is already deployed

Deploying the VDS documentation GUI
• Copy the vds-doc directory into the designated Tomcat's /webapps directory
• Open the vds-doc directory and then open the connection.js file
• Set the connectionString variable so that it correctly points to the /jsondoc REST service of the VDS, as follows:
    o http://[IP_of_the_server_with_VDS]:[port]/vds/jsondoc
• Save the changes
• To browse the documentation, open a web browser and navigate to the URL:
    o http://[IP_of_the_server_with_vds-doc]:[port]/vds-doc/jsondoc.jsp

Changing the name of the VDS

Changing the name of the service
• Rename the directory from vds to [new_name] (if it is already deployed, it is in the /webapps directory of Tomcat)
• Open the renamed directory at /[new_name]/WEB-INF/classes
• Open the file project.properties
• Change the project.finalName property from vds to [new_name]
• Save the changes (if the service was already deployed, restart Tomcat for the changes to take effect)

Registering the [new_name] of the VDS with the VDS documentation GUI
• Open the vds-doc directory and then open the connection.js file
• Change the name of the service in the connectionString variable from vds to [new_name] so that it looks as follows:
    o http://[IP_of_the_server_with_VDS]:[port]/[new_name]/jsondoc
• Save the changes

C Visualization portlets

Network topology visualization portlet
The portlet visualizes the network topology and the usage of links between the network nodes.
[Figure: Network Topology portlet]

Time management portlet
The portlet exposes an interface to control the playback of a scenario execution. Users can also choose which phenomenon types on a particular link they would like to observe.
[Figure: Time Manager portlet]

2D spider chart and line charts portlet
The portlet visualizes the actual values and changes of values of phenomenon types selected in the Time management portlet.
[Figure: Spider Chart and line charts portlet]

3D spider chart portlet
The portlet visualizes the changes of values of phenomenon types selected in the Time management portlet in a 3D sequence spider chart object.
[Figure: SpiderChart3D portlet]

D List of electronic appendices

The archive file electronic_appendices.zip is located in the thesis archive in the IS MU and contains the following electronic appendices:
• vds - a directory with the compiled Visualization Data Service
• vds-doc - a directory with the GUI for the VDS's documentation (a slightly modified JSONDoc UI project)
• vds-sources - a directory with the sources of the VDS
• vds.war - a web application archive file with the compiled VDS