Conceptual Model of Visual Analytics for Hands-on Cybersecurity Training

Radek Ošlejšek, Vít Rusňák, Karolína Burská, Valdemar Švábenský, Jan Vykopal, and Jakub Čegan

Abstract: Hands-on training is an effective way to practice theoretical cybersecurity concepts and increase participants' skills. In this paper, we discuss the application of visual analytics principles to the design, execution, and evaluation of training sessions. We propose a conceptual model employing visual analytics that supports the sensemaking activities of users involved in various phases of the training life cycle. The model emerged from our long-term experience in designing and organizing diverse hands-on cybersecurity training sessions. It provides a classification of visualizations and can be used as a framework for developing novel visualization tools supporting phases of the training life cycle. We demonstrate the application of the model on examples covering two types of cybersecurity training programs.

Index Terms: Visual analytics, cybersecurity, hands-on training, classification, education.

1 INTRODUCTION

Our society is being exposed to an increasing number of cyber threats and attacks. The lack of a strong cybersecurity workforce presents a critical danger for companies and nations [1]. Hands-on training of new professionals is an effective way to remedy this situation. In our work, we use visual-based sensemaking and reasoning to support participants in better and faster comprehension of attacks, threats, and defense strategies. The ability to use visual-based analytical reasoning is essential in many fields, including biology [2], medicine [3], urbanization [4], and education [5].

The goal of this paper is to create a conceptual framework providing broader insight into the application of visual analytics (VA) principles [6] in hands-on cybersecurity training. Conceptual models like the one proposed in this paper help researchers design effective visual techniques in a given domain.
To the best of our knowledge, the current literature for cybersecurity training lacks such a conceptual model. There are several reasons for this absence. Existing hands-on cybersecurity training is largely heterogeneous. Training sessions differ in content, organization, target audience, and technical means. Moreover, the cybersecurity domain represents a sensitive area similar to military or intelligence services, in which many sources are secret or restricted. Therefore, it is challenging to become familiar with this domain and clarify the terms and processes.

Fortunately, we have the benefit of seven years of experience with the design and organization of training sessions. The results of this paper arise from close cooperation with domain experts who directly participate in the development and operation of the KYPO Cyber Range [7], a sophisticated platform for cybersecurity training. Their knowledge and the survey of other existing approaches are essential for this work.

• R. Ošlejšek and K. Burská are with the Faculty of Informatics, Masaryk University, Brno, Czech Republic. E-mail: {oslejsek, xburska}@fi.muni.cz
• V. Rusňák and J. Čegan are with the Institute of Computer Science, Masaryk University, Brno, Czech Republic. E-mail: {rusnak, cegan}@ics.muni.cz
• V. Švábenský and J. Vykopal are with the Institute of Computer Science and Faculty of Informatics, Masaryk University, Brno, Czech Republic. E-mail: {svabensky, vykopal}@ics.muni.cz
Manuscript received August 6, 2019.

The two most widely recognized hands-on cybersecurity training activities are Capture the Flag (CTF) and the Cyber Defense Exercise (CDX). The main difference lies in their educational goals. While CTFs focus mainly on improving hard skills in the cybersecurity domain, CDXs target both hard and soft skills. CTF features a game-like approach [8]-[11]. Participants gain points for solving technical tasks that exercise their cybersecurity skills.
Completing each task yields a text string called a flag. In contrast, CDXs have been traditionally organized by military and governmental agencies [12] that emphasize realistic training scenarios that authentically mimic the operational environment of a real organization [13].

We analyzed these types of training programs in depth to distill a unified visual analytics model that fits the heterogeneous cyber-training events and is simultaneously instructive for the design of specialized visual analytics tools. The major contributions of this paper are: (a) a definition of a unified training life cycle with user roles having clear responsibilities and requirements; (b) a proposal for a conceptual model of visual analytics for hands-on cybersecurity training that can be used as a framework for further research and for developing visualizations supporting particular life-cycle tasks; and (c) demonstrations of the applicability of the model using real examples and lessons learned from our long-term experience in designing and organizing hands-on cybersecurity training.

The paper is organized as follows: Section 2 introduces the related work. In Section 3, we discuss the generic life cycle of hands-on cybersecurity training sessions with user roles that delimit requirements put on analytical tasks and visualizations. Sections 4 and 5 provide classification schemes for data and analytical visualizations. A demonstration of the conceptual model is presented in Section 6. Section 7 summarizes the observations attained during our research. Section 8 outlines the direction for future research topics.

© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Cite this article as follows: R. Ošlejšek, V. Rusňák, K. Burská, V. Švábenský, J. Vykopal and J. Čegan, "Conceptual Model of Visual Analytics for Hands-on Cybersecurity Training", in IEEE Transactions on Visualization and Computer Graphics, vol. 27, no. 8, pp. 3425-3437, 1 Aug. 2021, DOI: https://doi.org/10.1109/TVCG.2020.2977336.

2 RELATED WORK

Our work is unique in its close interconnection of three areas: visual analytics, cybersecurity, and education. Publications dealing directly with the intersection of these fields are rare. Therefore, we have explored related work from several relevant points of view.

2.1 Visual Analytics in Cybersecurity

Many works have addressed the challenges related to the design or evaluation of cybersecurity tools and techniques [14]-[18]. A visual analytics approach to automated planning attacks has also been discussed [19]. All the surveys have confirmed the importance of supporting analytic tasks by visual interfaces. However, they focus only on security-related aspects and do not tackle the educational aspect of training new experts. We took these challenges into account in our work, and we incorporated specific aspects of hands-on cybersecurity exercises.

2.2 Visual Analytics in Education and Training

Another perspective that considers visualizations in relation to cybersecurity emphasizes the educational aspect. There are distinct approaches to enhancing cybersecurity abilities that focus on training or teaching computer security [20]-[22]. However, these works again provide outputs of a narrow scope and often omit any profound conceptualization of their findings.

To help us comprehend the topic more thoroughly, we do not focus exclusively on the cybersecurity field; we also consider studies that relate to education and training from a broader view.
A recent survey [23] introduces a literature classification in the field of interactive visualization for education with a focus on evaluation, and it lists common categories of educational visualizations from distinct fields. In this respect, our work is unique as it considers more than the educational theory. It also includes the application of hands-on training with practical and technical aspects that are an essential part of the learning process. The issue of education has also been approached from the opposite direction [24]. In that work, the authors focus on predictive models for teachers of higher education institutions. They confirm the need for insight, for both teachers and students, that exceeds simple summative feedback.

2.3 Generic Models of Visual Analytics

Many generic design frameworks, models, and methods exist in the literature. These provide a structure and explanation of activities that designers perform when proposing suitable visualization tools [25]-[28]. However, the aim of this paper is not to discuss processes leading to the development of specific visualizations for cybersecurity training. Instead, we provide a conceptualization of the domain so that our model can serve as a framework for discussion and the efficient application of existing design methods for specific training tasks.

Fig. 1. Altered version of models by Keim [29] and Sacha [30] for insight retrieval based on visual analytics approaches.

Our solution builds upon Keim's [29] and Sacha's [30] conceptual models for the visual analytics process. The VA process is characterized by the interaction between data, visualizations, models of the data, and users discovering knowledge, as shown in Fig. 1. Keim emphasizes the computer-driven components of the VA process; Sacha extends the model with human reasoning. Data carries facts in structured, semi-structured, or unstructured form. The model captures the results of automated analysis methods.
The interactive visualizations are the primary user interface presenting data and models in a comprehensible manner. The human-centered part consists of three loops. The exploration loop captures low-level visual interactions using actions and findings that are specific for individual visualizations and interests. The analysts then refine their hypotheses in the verification loop. The knowledge generation loop describes the transition from observations into generalized knowledge.

These two models form the foundations of our work. We utilize the data and visualization components of Keim's model and narrow our focus to the verification loop, which plays a crucial role in building knowledge in any domain. The model component of the VA process represents a crosscutting concern, which is out of the scope of this paper. Therefore, we do not provide a separate classification for it. Instead, we mention suitable models in our discussion of the classification of visualizations and hypotheses. The exploration loop and knowledge generation loop are omitted since they provide either too detailed or too generic concepts.

3 CYBERSECURITY TRAINING LIFE CYCLE

The human loops of Sacha's VA model (see Fig. 1) reflect the needs of users who interact with the computer system. Based on the literature review, our experience, and the application of analytical methods, we distilled the following general life cycle that clarifies who is involved in the human loops, what they expect (at a high level of abstraction), and when they conduct their VA tasks. These pieces of information are later used for the detailed conceptualization of the "computer part" of the VA model by answering what (data and hypotheses) and how (visualizations) can be analyzed in the cyber training.

3.1 Phases

Based on the literature review and our experience, we distilled three generic phases (see Fig. 2) of the cybersecurity training life cycle.
We performed a theory-driven qualitative coding method [31] on four key papers [32]-[35] that deal with organizational aspects of cybersecurity training. Using an open coding method helped us to structure the analysis and consolidate observations. Phases and outcomes discussed in the analyzed papers can slightly differ from our model. Nevertheless, the differences are negligible since the terminology in this domain is not yet established.

Fig. 2. Cybersecurity training life-cycle phases with corresponding user roles, and main outcomes of each phase.

Planning is the first phase of any new training. The goal is to formulate technical and educational requirements, set measurable objectives, and allocate necessary resources. The training definition, the main output, is a set of (more or less) formally defined configurations of the computer network and its nodes, specification of attacks, training tasks and objectives, scoring rules, expected skills of participants, and related configuration data of the training.

The execution phase represents a training session in which participants are physically involved. User activities and the state of the training infrastructure are monitored, and the data is stored for further analysis. We refer to the data from this phase as training runs.

During the reflection phase, training definitions and training runs are analyzed and evaluated. Reflection can be conducted at any time. Analysts usually explore the data after each training run to learn from it or provide feedback to the people involved. However, they can also analyze the data before or during the planning phase of a new training session to gradually improve its quality. The reflection phase, therefore, helps to increase the proficiency in designing and organizing training events.

3.2 User Roles

The requirements put on visual analytics interfaces are affected by user roles.
The basic roles emerged from the life cycle. They reflect individual phases captured in Fig. 2. For clarity, our roles are CAPITALIZED in the paper.

TRAINING DESIGNERS (DESIGNERS for short) are responsible for the design of training definitions during the planning phase. Multiple designers with different skills are usually involved in the preparation of new training content. Cybersecurity experts contribute primarily to the technical aspects; education experts are responsible for defining the learning objectives and assessment criteria.

PARTICIPANTS represent everyone involved in the training event. Their analytical activities are associated with situational awareness and gaining insight into the training during the execution phase.

The TRAINING ANALYST (ANALYST for short) role covers all the people who conduct the post-training analysis of collected data. In our VA model, this role is used to capture the requirements of generic analytical interactions. Various people interested in the relevant data can take on this role, e.g., cybersecurity experts looking for talented participants.
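
The correspondence between phases, basic roles, and main outcomes described above can be summarized in a small data structure. The following Python sketch is only an illustration and not part of the proposed model: the names `Phase`, `Role`, `LifeCycleStep`, `LIFE_CYCLE`, and `step_for` are hypothetical, and the one-to-one mapping is a simplification, since reflection can be conducted at any time.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    PLANNING = "planning"
    EXECUTION = "execution"
    REFLECTION = "reflection"


class Role(Enum):
    DESIGNER = "training designer"
    PARTICIPANT = "participant"
    ANALYST = "training analyst"


@dataclass(frozen=True)
class LifeCycleStep:
    """One phase with its basic role and main outcome (illustrative only)."""
    phase: Phase
    role: Role
    outcome: str


# Simplified mapping taken from the text: each phase is paired with
# the basic role that emerged from it and with its main outcome.
LIFE_CYCLE = (
    LifeCycleStep(Phase.PLANNING, Role.DESIGNER, "training definition"),
    LifeCycleStep(Phase.EXECUTION, Role.PARTICIPANT, "training runs"),
    LifeCycleStep(Phase.REFLECTION, Role.ANALYST, "proficiency"),
)


def step_for(role: Role) -> LifeCycleStep:
    """Return the life-cycle step associated with the given basic role."""
    for step in LIFE_CYCLE:
        if step.role is role:
            return step
    raise ValueError(f"unknown role: {role}")
```

For instance, `step_for(Role.DESIGNER).phase` evaluates to `Phase.PLANNING`. A tool built on such a structure could use it to decide which visualizations to offer to which user, under the assumption that a user holds exactly one basic role at a time.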