Faculty of Informatics
Masaryk University
Czech Republic

Habilitation Thesis

Interactive Virtual and Augmented Reality Environments

Fotis Liarokapis, Ph.D.
March 2015

Preface

Interactive computer graphics applications have gained a lot of attention over the past decade. In this broad field, the two major technologies, virtual and augmented reality, are influencing consumers' lives in a number of ways. Virtual reality has already become dominant in certain applications such as movies and video games, whereas augmented reality has now started to produce more robust applications. The main goal of this thesis is to provide an overview of my most significant achievements in the areas of interactive virtual and augmented reality environments. My achievements are subdivided into four areas: (a) procedural modelling, (b) virtual and augmented reality interfaces, (c) interactive environments, and (d) application domains. This work covers a complete set of methods and techniques, from content generation to visualisation and interaction, and finally to their application in different domains. This thesis is written as a commentary on a collection of 10 peer-reviewed journal papers and 6 peer-reviewed conference papers. An estimate of my percentage contribution to each paper is included in the thesis, together with a brief description of my work. My personal contribution to the papers ranges from 10% to 100%, with an average of approximately 40%.

Acknowledgements

Firstly, I would like to thank all of my colleagues and collaborators who contributed to the papers that are provided in this thesis. I would also like to thank all of my colleagues at the Human Computer Interaction laboratory for their support. Special thanks to Petr Matula and Michal Kozubek for their inspiration. Finally, my greatest thanks go to my family and my girlfriend for their support and patience throughout the whole process of this thesis. Parts of the work presented in this thesis have been supported by the EU IST Framework V programme, Key Action III-Multimedia Content and Tools, Augmented Representation of Cultural Objects (ARCO) project IST-2000-28336 and by the EPSRC Pinpoint Faraday project GR/T04212/01, called LOCUS.

Table of Contents

Chapter 1 Introduction .........................................................................1 1.1 Introduction...................................................................................... 1 1.2 Motivation ........................................................................................ 1 1.3 Background ...................................................................................... 2 1.3.1 Virtual Reality............................................................................. 2 1.3.2 Augmented Reality ...................................................................... 2 1.3.3 Procedural Modelling.................................................................... 3 1.3.4 Crowd Modelling.......................................................................... 3 1.3.5 Serious Games............................................................................ 3 1.3.6 Human Computer Interaction........................................................ 4 1.4 Goal and Overview ............................................................................
4 Chapter 2 Procedural Modelling .............................................................5 2.1 Introduction...................................................................................... 5 2.2 Terrain Environments......................................................................... 5 2.3 Buildings and Cities ........................................................................... 7 2.4 Behaviour of Crowd Simulation ........................................................... 8 Chapter 3 Virtual and Augmented Reality Interfaces...............................10 3.1 Introduction.....................................................................................10 3.2 Virtual Reality Interfaces ...................................................................10 3.2.1 Indoor VR Interfaces...................................................................10 3.2.2 Mobile VR Interfaces...................................................................12 3.3 Augmented Reality Interfaces ............................................................13 3.3.1 Indoor AR Interfaces...................................................................13 3.3.2 Mobile AR Interfaces...................................................................15 Chapter 4 Interactive Environments ...................................................17 4.1 Introduction.....................................................................................17 4.2 Multimodal Interaction ......................................................................17 4.3 Wireless Sensor Network Based Interaction .........................................18 4.4 Brain-Computer Interaction ...............................................................20 Chapter 5 Application Domains ............................................................22 5.1 Introduction.....................................................................................22 5.2 Virtual Archaeology ..........................................................................22 5.3 Urban Navigation .............................................................................24 5.4 Higher Education..............................................................................26 5.4.1 VR and AR in Education...............................................................26 5.4.2 Activity-Led Introduction to First Year Creative Computing ..............27 5.5 Serious Games and Virtual Environments ............................................29 5.5.1 Serious Games Technologies........................................................29 5.5.2 Learning as Immersive Experiences within Serious Games...............29 Chapter 6 Conclusions and Future Work ...............................................31 6.1 Conclusions .....................................................................................31 6.2 Future Work ....................................................................................31 Chapter 7 References.........................................................................32 Chapter 8 Appendix – Paper Reprints ...................................................38 8.1 Paper #1.........................................................................................39 8.2 Paper #2.........................................................................................48 8.3 Paper #3.........................................................................................53 8.4 Paper 
#4.........................................................................................62 8.5 Paper #5.........................................................................................67 8.6 Paper #6.........................................................................................72 8.7 Paper #7.........................................................................................86 Interactive Virtual and Augmented Reality Environments v 8.8 Paper #8.......................................................................................108 8.9 Paper #9.......................................................................................125 8.10 Paper #10 ..................................................................................138 8.11 Paper #11 ..................................................................................145 8.12 Paper #12 ..................................................................................155 8.13 Paper #13 ..................................................................................165 8.14 Paper #14 ..................................................................................174 8.15 Paper #15 ..................................................................................190 8.16 Paper #16 ..................................................................................212 Interactive Virtual and Augmented Reality Environments vi List of Figures Figure 2-1 Procedural terrain [40]................................................................................... 6 Figure 2-2 (a) Roman Settlement, (b) Vitruvian Temple Comparions .............. 7 Figure 2-3 The urban crowd simulation displaying crowds of agents (a) No graphical complexity [44] (b) High realistic scenes.......................................... 9 Figure 3-1 Indoor online VR Interfaces ....................................................................... 11 Figure 3-2 Mobile VR Interfaces (a) Manual mode (b) GPS mode (c) VR view ............................................................................................................................................. 12 Figure 3-3 Indoor AR Interfaces [54]........................................................................... 14 Figure 3-4 Mobile AR Interfaces ..................................................................................... 15 Figure 4-1 Multimodal augmented reality interface [48]...................................... 18 Figure 4-2 Wireless Sensor Network Based Interaction........................................ 19 Figure 4-3 Brain-computer interaction [47]............................................................... 20 Figure 5-1 Archaeology (a) Complete solution and (b) Artefact visualisation in AR .................................................................................................................................. 23 Figure 5-2 Interfaces for presenting information retrieved from a mobile information system...................................................................................................... 25 Figure 5-3 Operation of the AR application (a) AR environment (b) Visualisation of educational content [59]............................................................ 26 Figure 5-4 3D etch-a-sketch. (a) Student-based drawing application [60], (b) student group’s hardware interface [60]..................................................... 28 Figure 5-5 Learning as immersive experiences. 
(a) Four Dimensional Framework [63], (b) Meeting in-world in Second Life for virtual tour [62].................................................................................................................................... 30

Abbreviations

2D Two Dimensional
3D Three Dimensional
ALL Activity-Led Learning
AR Augmented Reality
BCI Brain-Computer Interface
DOF Degrees of Freedom
EEG Electroencephalography
GPS Global Positioning System
GUI Graphical User Interface
HCI Human-Computer Interaction
HMD Head Mounted Display
UMPC Ultra Mobile Personal Computer
VE Virtual Environment
VR Virtual Reality
WSN Wireless Sensor Network

Chapter 1 Introduction

1.1 Introduction

The presented habilitation thesis consists of a collection of 16 publications: 10 peer-reviewed journal papers and 6 peer-reviewed conference papers. The introduction chapter presents the motivation of this thesis, followed by a brief background of the research areas covered, and then the goal and overview of this work. The next four chapters provide a summary of the main research contributions. Finally, the last chapter presents conclusions and future work.

1.2 Motivation

Interactive virtual and augmented reality environments are becoming more and more appealing to a wider audience. The creation of realistic virtual and augmented reality environments is an important issue in the computer animation, computer games, digital film effects and simulation industries. In recent years, the computer and video games industry has overtaken both the film and music industries. Developing top commercial interactive applications nowadays usually requires investments of several million dollars. This typically involves large development teams of hundreds of workers, many of whom are artists and designers providing content for rich virtual and augmented reality environments. While many creative companies have the necessary budget to develop these expensive interactive environments (e.g. movies, games), which employ state-of-the-art computer graphics, not all companies have the same resources. In addition, hardware improvements allow for better and faster tracking and visualisation devices that can be used for creating novel applications.

1.3 Background

This section provides a brief overview of the main technologies covered in the thesis: virtual reality, augmented reality, procedural modelling, crowd modelling, serious games and human-computer interaction.

1.3.1 Virtual Reality

The first virtual reality (VR) environment was introduced in the 1960s by Ivan Sutherland [1]. Since then, many studies have been published [2], [3], [4], [5]. The main characteristic of a VR system is that the user's natural sensory information is completely replaced with digital information. The user's experience of a computer-simulated environment is called immersion. VR systems can completely immerse a user inside a synthetic environment by blocking all the signals of the real world. The most common problems of VR systems are of an emotional and psychological nature, including motion sickness, nausea and other symptoms caused by the user's high degree of immersion [6].
VR systems are also sometimes called virtual environments (VEs); however, the term typically refers to online virtual world applications. Nowadays, more than 100 VEs exist, and they provide excellent capabilities for creating effective distance and online learning opportunities through the provision of unique support for distributed groups (online chat, the use of avatars, document sharing, etc.) [2].

1.3.2 Augmented Reality

The basic concept of augmented reality (AR) is to superimpose digital information directly upon a user's sensory perception [7], rather than replacing it with a synthetic environment as VR systems do. Both technologies process and display the same digital information and often make use of the same dedicated hardware, but AR systems use more complex software approaches compared to VR systems [8]. In technical terms, AR is not a single technology but a collection of different technologies that operate in conjunction, with the aim of enhancing the user's perception of the real world through computer-generated information [9]. This kind of information is usually referred to as virtual, digital or synthetic information. The real world must be matched with the virtual in position and context in order to provide an understandable and meaningful view [10]. Users can work individually or collectively, experiment with computer-generated information and interact with a mixed environment in a natural way [11]. In the coming years, AR systems are expected to provide a complete set of augmentations exploiting all of the human senses [12]. Finally, a recent survey of AR describes some known limitations regarding human factors that developers need to overcome [13].

1.3.3 Procedural Modelling

A number of survey papers have recently been published in the areas of terrains [14], cities [15] and virtual worlds [16]. Procedural modelling can be considered as a set of formal production rules that specify how geometric shapes are created and transformed. Procedural modelling is mainly used to generate content representing a number of aspects of real environments, including terrains, buildings, cities, road structures, trees and vegetation. Parish and Müller [17] proposed a city generation approach that made use of self-sensitive L-systems to automatically lay out a set of streets and generate virtual architecture. Greuter et al. described a set of methods that allowed for the procedural generation of a 'pseudo-infinite' digital environment [18]. Wonka et al. [19] devised a variation on shape grammars for use in the construction of building facades, which they named split grammars. More recently, shape grammars have been extended through the use of context-sensitive shape rules [20].

1.3.4 Crowd Modelling

The process of simulating huge crowds of intelligent agents in real time is still a challenging task due to numerous different issues [21], [22]. The real-time simulation of crowds can be conducted using a variety of approaches. The most common methods involve employing a series of models and algorithms working in tandem to animate each agent. These include decision-making [23], pathfinding navigation [24], local steering mechanics [25] and agent perception systems [26]. Social forces models [27] can also be utilised to enhance crowd believability under certain situations.
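As a rough illustration of how such social forces can be combined with goal-directed steering, the following minimal Python sketch attracts each agent towards its goal while pushing it away from nearby agents. It is a generic, illustrative example rather than the model of [27] or the simulation described in Chapter 2, and all gains, radii and speed limits are arbitrary example values.

    import math

    def step(positions, velocities, goals, dt=0.1,
             goal_gain=1.0, repel_gain=0.5, repel_radius=2.0, max_speed=1.5):
        """One Euler step of a toy social-forces update for 2D point agents."""
        new_pos, new_vel = [], []
        for i, (p, v, g) in enumerate(zip(positions, velocities, goals)):
            # Attraction: unit vector towards the agent's goal.
            to_goal = (g[0] - p[0], g[1] - p[1])
            d_goal = math.hypot(*to_goal) or 1e-9
            force = [goal_gain * to_goal[0] / d_goal, goal_gain * to_goal[1] / d_goal]
            # Avoidance: repulsion from every agent closer than repel_radius.
            for j, q in enumerate(positions):
                if i == j:
                    continue
                away = (p[0] - q[0], p[1] - q[1])
                d = math.hypot(*away) or 1e-9
                if d < repel_radius:
                    force[0] += repel_gain * away[0] / (d * d)
                    force[1] += repel_gain * away[1] / (d * d)
            vx, vy = v[0] + force[0] * dt, v[1] + force[1] * dt
            speed = math.hypot(vx, vy)
            if speed > max_speed:  # clamp to a plausible walking speed
                vx, vy = vx * max_speed / speed, vy * max_speed / speed
            new_vel.append((vx, vy))
            new_pos.append((p[0] + vx * dt, p[1] + vy * dt))
        return new_pos, new_vel

    # Two agents walking towards each other are pushed apart as they approach.
    pos, vel = [(0.0, 0.0), (8.0, 0.3)], [(0.0, 0.0), (0.0, 0.0)]
    goals = [(8.0, 0.0), (0.0, 0.0)]
    for _ in range(50):
        pos, vel = step(pos, vel, goals)

In a full crowd simulation such forces would be layered on top of the decision-making, pathfinding and perception components listed above.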
However, some form of quantification is required to assess the behaviour of agents within crowd simulations, and past research has utilised perception as a tool for evaluating crowds [28], [29]. Realism is the degree of plausibility of the crowd behaviour, whereas perceived realism is centred on the judgement of human observers.

1.3.5 Serious Games

Serious games are part of a new emerging field that focuses on computer games designed for non-leisure, often educational, purposes. They have important applications in several distinct areas, such as the military, health, government and education [30]. Serious games have the capability of enabling learners to undertake tasks and experience situations that would otherwise have been impossible. The success of serious computer games in educational scenarios is based on the combination of audiovisual media that is prevalent in these games, which enhances the absorption of information in the learner's memory [31], [32]. The state of the art in serious games technology is identical to the state of the art in entertainment computer games, as both types of games share the same technical infrastructure [2]. Moreover, there are two diverse views on how serious games should be designed. One argues that while pedagogy is an implicit component of a serious game, it should be secondary to entertainment, meaning that a serious game that is not 'fun' to play would be useless, regardless of its pedagogical content or value [33]. On the other hand, design methodologies exist for the development of games incorporating pedagogic elements, such as the four-dimensional framework [34], which outlines the centrality of four elements that can be used as design and evaluation criteria for the creation of serious games. As a result, this approach focuses mainly on educational and pedagogical theories.

1.3.6 Human Computer Interaction

Human-computer interaction (HCI) is the study of the interaction between humans and computer systems [35]. As a result, it is one of the most important issues when designing interactive environments [36], [37]. The design and implementation of robust software user interfaces is closely tied to the use of HCI techniques. The integration of such interfaces into AR/VR systems can reduce the complexity of the HCI by using implicit contextual input information [38]. Nevertheless, the design and implementation of effective VR and AR environments is a difficult task and an area of continuous research. Nowadays, most common HCI techniques rely on different types of sensors to provide user-friendly applications. Typical sensing technologies include acoustic, mechanical, optical, electromagnetic, inertial, global positioning system (GPS) and electroencephalography (EEG) devices. Multimodal systems combine natural input modes (i.e. speech, pen, touch, hand gestures, eye gaze, head and body movements) in a coordinated manner with multimedia system output [39].

1.4 Goal and Overview

The goal of this thesis is to illustrate the most significant results in the area of interactive virtual and augmented reality environments. The main results are summarised in the next four chapters. Each chapter provides a brief overview of the published work, together with an indication of my personal contribution.
Chapter 2 Procedural Modelling

2.1 Introduction

This chapter presents a set of techniques for creating content for the computer graphics community as well as for research purposes, including the creative industry and interactive VR and AR applications. The focus of this chapter is on procedural modelling techniques for: (a) terrains, (b) buildings and cities, and (c) the behaviour of crowd simulations.

2.2 Terrain Environments

A variety of methods for automatically creating detailed but also randomised terrain environments have been developed. The use of these procedural methods saves time and reduces the budget for creating effective computer graphics applications (e.g. games, VR, AR). This work explained some of the problems that can arise when procedurally generating such environments and described a variety of methods that can be used to overcome them [40]. These methods have been applied to a basic flight simulator, so that the results could be observed and evaluated. Figure 2-1 (a) illustrates an overview of the randomly generated environment for the proposed flight simulator, which can be used for developing games and serious games. Heightmaps were generated using the diamond-square algorithm to provide surface detail.

Figure 2-1 Procedural terrain [40]

Based on a recursive algorithm, the level of detail can be adjusted as necessary, which can be an advantage when dealing with different methods that require different levels of processing power. To smooth the terrain, a two-dimensional (2D) Lorentz distribution as well as a Gaussian filter, both producing bell-shaped profiles, were used. Next, the terrain was generated so as to give the illusion of being infinite. When the player reaches the right side of the landscape, a tile of terrain is moved in front of the player to provide the illusion of endless terrain (Figure 2-1 b). When the player reaches the right side of terrain grid A, the values of its far-right points are copied to the far-left points of terrain grid B (Figure 2-1 c). The other points of terrain grid B are then calculated via randomisation and mid-point displacement, as done for grid A. Moreover, a simplified method was implemented that made use of randomised positioning of vegetation models, rather than procedurally creating vegetation. Evaluation with two different types of user groups (remote and hallway) showed that overall the flight simulator is enjoyable, looks realistic for a gaming scenario and thus has the potential to be used for the development of serious games [40].

Paper: Noghani, J., Liarokapis, F., Anderson, E.F. Randomly Generated 3D Environments for Serious Games, Proc. of the 2nd IEEE International Conference in Games and Virtual Worlds for Serious Applications, IEEE Computer Society, Braga, Portugal, 25-26 March, 3-10, 2010.

Contribution (40%): Design of the architecture, implementation of smoothing techniques and advice on evaluation. Write-up of most of the paper (full text on section 8.1).
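To make the terrain pipeline more concrete, the sketch below generates a diamond-square heightmap and starts the next tile by copying the previous tile's far-right column into the new tile's far-left column, as described above. It is a simplified illustration rather than the implementation of [40]: the Lorentz and Gaussian smoothing passes are omitted, and here the shared edge is simply overwritten after displacement instead of seeding it.

    import random

    def diamond_square(n, roughness=1.0, seed=None):
        """Return a (2**n + 1) x (2**n + 1) heightmap as a list of rows."""
        rng = random.Random(seed)
        size = 2 ** n + 1
        h = [[0.0] * size for _ in range(size)]
        for y in (0, size - 1):               # seed the four corner heights
            for x in (0, size - 1):
                h[y][x] = rng.uniform(-1.0, 1.0)
        step, scale = size - 1, roughness
        while step > 1:
            half = step // 2
            # Diamond step: each square's centre = mean of its corners + noise.
            for y in range(half, size, step):
                for x in range(half, size, step):
                    avg = (h[y - half][x - half] + h[y - half][x + half] +
                           h[y + half][x - half] + h[y + half][x + half]) / 4.0
                    h[y][x] = avg + rng.uniform(-scale, scale)
            # Square step: each edge midpoint = mean of its in-range neighbours + noise.
            for y in range(0, size, half):
                for x in range((y + half) % step, size, step):
                    total, count = 0.0, 0
                    for dy, dx in ((-half, 0), (half, 0), (0, -half), (0, half)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < size and 0 <= nx < size:
                            total, count = total + h[ny][nx], count + 1
                    h[y][x] = total / count + rng.uniform(-scale, scale)
            step, scale = half, scale / 2.0   # halve the step size and the noise amplitude
        return h

    def next_tile(previous, seed=None):
        """Start tile B so its far-left column matches tile A's far-right column
        (cf. Figure 2-1 c), keeping the join between adjoining tiles continuous."""
        size = len(previous)
        n = (size - 1).bit_length() - 1
        tile = diamond_square(n, seed=seed)
        for y in range(size):
            tile[y][0] = previous[y][size - 1]
        return tile

    grid_a = diamond_square(4, seed=1)        # a 17 x 17 heightmap
    grid_b = next_tile(grid_a, seed=2)        # its left edge equals grid A's right edge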
2.3 Buildings and Cities

This work proposed the development of a novel shape grammar [19] inspired by 'CGA Shape' [20] for describing Roman settlements derived from the writings of Vitruvius (Figure 2-2), initially with a focus on the description of classical Roman temples, meaning the main building of a religious site, excluding its courtyard [41]. Moreover, the technique was extended to generate complete Roman settlements.

The construction of Roman temples included a large number of common elements found in Roman architecture, e.g. palaces shared many of these and often also incorporated temples themselves [42]. Structures generated from these Vitruvian rules can provide an exemplar of archetypal Roman architecture in a similar manner to the "Virtual Egyptian Temple" by Jacobson and Holden [43], which depicts architecture in ancient Egypt. Different approaches were taken for the various elements of the generated city. A weighted formula was designed for the purpose of siting a city location upon a heightmap, incorporating factors like the distance to the nearest body of water and the gradient of the land. Three methods of situating generic structures within a city were considered, including a probability distribution method that assigned buildings to districted allotments with a flexible degree of randomness.

Figure 2-2 (a) Roman Settlement, (b) Vitruvian Temple Comparisons

For the generation of the rest of the city, a novel formal grammar syntax was devised, capable of describing shapes in a deterministic and technical fashion. The grammar made use of superscripts preceding symbols for notating conditional rules, and superscripts and subscripts following symbols for the purpose of adding attributes to existing symbols. In this way, architecture was described using grammar rules in a way that would be impractical or outright impossible through the use of traditional grammar syntax.

The dominant feature of a temple is its main building, which in most Roman settlements would be built within the courtyard and the enclosure wall. While the overall makeup of this usually followed the same pattern – the 'cella' (the temple building's enclosed room), fronted or surrounded by a portico and raised on top of a podium – there is considerable architectural variation possible in Roman temple construction. Temple buildings were built on a podium with steps only at the front or with steps on all four sides, with the number of steps in both cases being an odd number. The temple's proportions would be such that the length of the main temple building would be twice the width of the temple, with the length of the temple's cella being 25% larger than the overall width of the temple [41].

Paper: Noghani, J., Anderson, E., Liarokapis, F. Towards a Vitruvian Shape Grammar for Procedurally Generating Classical Roman Architecture, Proc. of the 13th International Symposium on Virtual Reality, Archaeology and Cultural Heritage VAST 2012, Short and Project Papers, Eurographics, Brighton, UK, 19-21 November, 41-44, 2012.

Contribution (30%): Design of the architecture and advice on the implementation. Collaboration on the writing of the paper (full text on section 8.2).
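As a worked example of the proportions quoted above, the hypothetical helper below derives a temple's plan dimensions from a chosen width. In the actual work [41] these constraints are expressed as shape-grammar production rules rather than as code; the function merely illustrates the arithmetic.

    def vitruvian_temple(width, steps=5, steps_all_sides=False):
        """Plan dimensions following the Vitruvian proportions described above."""
        if steps % 2 == 0:
            raise ValueError("the number of podium steps must be odd")
        return {
            "width": width,
            "length": 2.0 * width,          # temple length is twice its width
            "cella_length": 1.25 * width,   # cella is 25% longer than the overall width
            "podium_steps": steps,
            "steps_all_sides": steps_all_sides,
        }

    print(vitruvian_temple(10.0))  # a 10 x 20 plan with a 12.5-unit-long cella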
2.4 Behaviour of Crowd Simulation

This work examined (a) the development of intelligent crowd simulation in virtual environments, and (b) a perceptual experiment to identify features of behaviour which can be linked to perceived realism [44]. The urban crowd simulation developed as part of this research implements a range of real-time simulation techniques [23], [24], [25], [26], [27]. To carry out the psychophysical experimentation, a platform was developed in the form of the urban crowd simulation. The results of this research can feed back into the development processes of simulating inhabited locations, by identifying the key features, in order to achieve more perceptually realistic crowd behaviour.

Perceptual experimentation methodologies can be adapted and potentially utilised to test other types of crowd simulation, for application within computer games, or more specific simulations such as for urban planning or health and safety purposes. In the initial stage of the research, the perceived realism of agent crowd behaviour is evaluated through the features that shape behaviour traits, for example the velocity type, the behavioural annotation and so on; graphical complexity is therefore not essential to the core of the research (Figure 2-3, a).

Figure 2-3 The urban crowd simulation displaying crowds of agents (a) No graphical complexity [44] (b) Highly realistic scenes

Data is collected from experiments in the form of a perceived realism value between '0' (completely unrealistic) and '1' (completely realistic) [44]. Initial results were obtained from 32 participants who completed the social forces experiment using an online survey platform. The experiment consists of two key variables, one for each of the agent-based social forces. These variables were tested in specific trials: (a) agent avoidance and (b) agent attraction. The majority of participants (94%) found the behaviour of the agents to be more realistic when the agent avoidance social force is present, and 95% selected the videos with the agent attraction social force present as more realistic. Results showed that the majority of the participants found the simulation with social forces to be more realistic than a simulation without. In the next stage of this research, the realism of the environment is also included (Figure 2-3, b), and another study will examine whether there is a correlation with the previous results.

Paper: O'Connor, S., Liarokapis, F., Peters, C. An Initial Study to Assess the Perceived Realism of Agent Crowd Behaviour in a Virtual City, Proc. of the 5th International Conference on Games and Virtual Worlds for Serious Applications (VS-Games 2013), IEEE Computer Society, Bournemouth, UK, 11-13 September, 85-92, 2013.

Contribution (30%): Collaboration on the design of the architecture and advice on the experimental part. Collaboration on the writing of the paper (full text on section 8.3).

Chapter 3 Virtual and Augmented Reality Interfaces

3.1 Introduction

This chapter demonstrates novel solutions developed in the area of virtual and augmented reality for both indoor and outdoor environments. In particular, a number of novel virtual and augmented reality interfaces are presented, illustrating how these technologies can be used effectively for both types of environments.

3.2 Virtual Reality Interfaces

A number of novel VR interfaces have been developed for both indoor and outdoor environments; they can be categorised as: (a) indoor interfaces and (b) mobile interfaces.

3.2.1 Indoor VR Interfaces

VR systems vary from custom-made laboratory systems [45] to modern gaming environments (which rely on the functionality of commercial game engines [46], [47]), as well as online virtual environments. The focus of this work was on the presentation of realistic graphics in an interactive online VR environment [49]. Online VR interfaces allow multiple users to access the content in an easy and convenient manner from remote locations.
The most significant work performed here was the integration of multimodal, database-connected VR visualisation environments, allowing users to switch between web and VR views in real time.

Figure 3-1 Indoor online VR Interfaces

Metadata is also associated with the virtual information and presented appropriately (Figure 3-1). In the web-based interface, a user can browse information presented in the form of 3D VRML virtual galleries or 2D web pages with embedded multimedia objects. Virtual exhibitions can also be visualised in the web browser in the form of 3D galleries [55]. In this visualisation, users can browse objects simply by walking through the 3D environment (i.e. a reconstruction of a real gallery). Different interaction devices were also integrated into the system, allowing users to manipulate 3D content in a more appealing manner.

Paper: Liarokapis, F., Mourkoussis, N., White, M., Darcy, J., Sifniotis, M., Petridis, P., Basu, A., Lister, P.F. Web3D and Augmented Reality to support Engineering Education, World Transactions on Engineering and Technology Education, UICEE, 3(1): 11-14, 2004.

Contribution (80%): Collaboration on the design of the architecture. Implementation of most of the VR interface. Write-up of most of the paper (full text on section 8.4).

Paper: White, M., Mourkoussis, N., Darcy, J., Petridis, P., Liarokapis, F., Lister, P.F., Walczak, K., Wojciechowski, R., Cellary, W., Chmielewski, J., Stawniak, M., Wiza, W., Patel, M., Stevenson, J., Manley, J., Giorgini, F., Sayd, P., Gaspard, F. ARCO-An Architecture for Digitization, Management and Presentation of Virtual Exhibitions, Proc. of the 22nd International Conference on Computer Graphics (CGI'2004), IEEE Computer Society, Hersonissos, Crete, June 16-19, 622-625, 2004.

Contribution (15%): Collaboration on the design of the VR and AR architecture. Implementation of most of the VR and AR interface. Write-up of parts of the paper (full text on section 8.5).

3.2.2 Mobile VR Interfaces

In this work, visualisation within the mobile virtual environment (the spatial 3D map) can take place in two modes: automatic and manual. In the automatic mode, a GPS receiver automatically feeds and updates the spatial 3D map with respect to the user's position in real space (Figure 3-2, b). This mode is designed for intuitive navigation. In the manual mode, the control rests fully with the user; it was designed to provide alternative ways of navigating in areas where a GPS signal cannot be obtained (Figure 3-2, a). Users might also want to stop and observe parts of the environment, in which case control is left in their hands (Figure 3-2, c).

Figure 3-2 Mobile VR Interfaces (a) Manual mode (b) GPS mode (c) VR view

The immersion provided by GPS navigation is considered pseudo-egocentric because the camera is positioned at a height that does not represent a realistic scenario. If, however, the user switches to manual navigation, any perspective can be obtained, which is very helpful for decision-making purposes. In manual mode, any model can be explored and analysed; therefore, additional enhancements of the graphical representation are of vital importance [50].
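The two navigation modes described above amount to a small state machine around the virtual camera. The sketch below is purely illustrative (a hypothetical class with made-up coordinate values, not the implementation of [50]): in automatic mode the camera follows each GPS fix, while in manual mode the user moves it freely, for example where no GPS signal is available.

    class MapCamera:
        """Viewpoint over the spatial 3D map, driven either by GPS or by the user."""

        def __init__(self, eye_height=50.0):
            self.x, self.y, self.height = 0.0, 0.0, eye_height
            self.automatic = True        # True = GPS mode, False = manual mode

        def on_gps_fix(self, easting, northing):
            # Called whenever the GPS receiver delivers a new position.
            if self.automatic:
                self.x, self.y = easting, northing

        def manual_move(self, dx=0.0, dy=0.0, dh=0.0):
            # Ignored while GPS navigation is active.
            if not self.automatic:
                self.x += dx
                self.y += dy
                self.height = max(1.0, self.height + dh)

    cam = MapCamera()
    cam.on_gps_fix(431200.0, 5411850.0)  # automatic (GPS) mode, cf. Figure 3-2 b
    cam.automatic = False                # switch to manual mode, cf. Figure 3-2 a
    cam.manual_move(dh=-30.0)            # lower the viewpoint towards street level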
Paper: Liarokapis, F., Brujic-Okretic, V., Papakonstantinou, S. Exploring Urban Environments using Virtual and Augmented Reality, Journal of Virtual Reality and Broadcasting, GRAPP 2006 Special Issue, Digital Peer Publishing, 3(5): 1-13, 2006.

Contribution (70%): Collaboration on the design of the architecture. Implementation of the majority of the VR interface. Write-up of most of the paper (full text on section 8.6).

3.3 Augmented Reality Interfaces

Results from the previous sub-sections were used as input for the implementation of the interactive AR interfaces, which again can be categorised as: (a) indoor interfaces and (b) mobile interfaces.

3.3.1 Indoor AR Interfaces

Human-computer interaction techniques can offer greater autonomy compared with traditional window-style interfaces. Although some research has been performed into the integration of such interfaces into AR systems [51], [52], [53], the design and implementation of an effective AR system that can deliver realistic audio-visual information in a user-friendly manner is a difficult task and an area of continuous research. It is still very difficult to eliminate these barriers [53], which even nowadays hinder the creation of new AR applications. To address the above issues, a number of prototype AR interfaces were proposed and implemented.

Figure 3-3 Indoor AR Interfaces [54]

A prototype high-level AR architecture was developed, using a selection of cost-effective software and hardware components to realise robust visualisation of, and interaction with, virtual information in indoor environments. The software libraries are based on the integration of computer vision, computer graphics and auditory techniques, resulting in three prototype video see-through AR architectures and eventually in a general-purpose AR interface. The main novelty of the interface is that it is capable of simultaneously superimposing digital information such as metadata, 2D images, 3D models, spatial sound and videos [54]. Spatial sound was simulated based on a linear approximation of distance to give the impression of 3D space. The greatest advantage of the proposed AR interface is that it allows participants to perform complex operations very accurately (Figure 3-3). Specifically, it is sometimes of crucial importance to superimpose objects at specific locations in the real environment. Using other methods, it could take a great amount of time and effort (depending on the experience of the user) to achieve this, and the result would not be very accurate. The graphical user interface (GUI) interaction techniques solve this issue with double-precision accuracy. Finally, the interface allows users to transfer data from the Internet into a tabletop AR environment [55].
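The linear distance approximation used for the spatial sound can be summarised in a few lines. The sketch below is an illustrative reconstruction under simple assumptions (linear attenuation between a near and a far distance, naive left/right panning from the horizontal offset) and does not reproduce the audio pipeline of [54].

    import math

    def linear_gain(distance, near=0.2, far=5.0):
        """Linear attenuation: full volume below 'near', silent beyond 'far' (metres)."""
        if distance <= near:
            return 1.0
        if distance >= far:
            return 0.0
        return 1.0 - (distance - near) / (far - near)

    def stereo_levels(dx, dz, far=5.0):
        """Left/right gains for a source at offset (dx, dz) from the viewer;
        dx > 0 means the source lies to the viewer's right."""
        dist = math.hypot(dx, dz)
        gain = linear_gain(dist, far=far)
        pan = 0.0 if dist == 0 else max(-1.0, min(1.0, dx / dist))
        return gain * (1.0 - pan) / 2.0, gain * (1.0 + pan) / 2.0

    print(stereo_levels(1.0, 1.0))  # a source ahead of and slightly to the right of the viewer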
Paper: Liarokapis, F. An Augmented Reality Interface for Visualizing and Interacting with Virtual Content, Virtual Reality, Springer, 11(1): 23-43, 2007.

Contribution (100%): Design of the architecture and implementation of the AR interface. Write-up of the paper (full text on section 8.7).

Paper: White, M., Mourkoussis, N., Darcy, J., Petridis, P., Liarokapis, F., Lister, P.F., Walczak, K., Wojciechowski, R., Cellary, W., Chmielewski, J., Stawniak, M., Wiza, W., Patel, M., Stevenson, J., Manley, J., Giorgini, F., Sayd, P., Gaspard, F. ARCO-An Architecture for Digitization, Management and Presentation of Virtual Exhibitions, Proc. of the 22nd International Conference on Computer Graphics (CGI'2004), IEEE Computer Society, Hersonissos, Crete, June 16-19, 622-625, 2004.

Contribution (15%): Collaboration on the design of the VR and AR architecture. Implementation of most of the VR and AR interface. Write-up of parts of the paper (full text on section 8.5).

3.3.2 Mobile AR Interfaces

The two most common tracking techniques used in AR applications are computer vision and external sensor systems. In this work, both approaches were investigated, but since the requirement was an AR system that could operate anywhere and everywhere, the sensor-based approach was preferred. A GPS receiver and digital compass can provide sufficient accuracy for displaying points of interest in the approximate location relative to the user's position. At present, however, these sensor solutions lack the accuracy required for more advanced AR functionality, such as aligning an alternative facade on the front of a building in the real-world scene. There is no need for a head-mounted display (HMD), since the screen on the device can be aligned with the real-world scene. On the screen of the device, information can either be overlaid on imagery captured from the device's internal camera, or the screen can display just the virtual information with the user viewing the real-world scene directly.

Figure 3-4 Mobile AR Interfaces

For the computer vision approach, road signs, which are usually rendered in black on a white background, were used initially. Later on, road signs were replaced by distinctive natural features such as door entrances and windows, which were experimentally tested to see whether they can be used as 'natural markers' (Figure 3-4). For the sensor-based approach, a solution similar to that of section 3.2.2 was adopted. The main challenge, however, was to reduce the latency produced by the sensors (GPS and digital compass), as well as to provide text-based augmentation. The AR interface can then provide navigational information, in the form of distance and direction annotations, to guide the user to the location associated with those results [56].

Paper: Liarokapis, F., Brujic-Okretic, V., Papakonstantinou, S. Exploring Urban Environments using Virtual and Augmented Reality, Journal of Virtual Reality and Broadcasting, GRAPP 2006 Special Issue, Digital Peer Publishing, 3(5): 1-13, 2006.

Contribution (70%): Collaboration on the design of the architecture. Implementation of the majority of the VR interface. Write-up of most of the paper (full text on section 8.6).

Paper: Mountain, D., Liarokapis, F. Mixed reality (MR) interfaces for mobile information systems, Aslib Proceedings, Special issue: UK library & information schools, Emerald Press, 59(4/5): 422-436, 2007.

Contribution (50%): Collaboration on the design of the architecture and implementation of the VR interface. Write-up of half of the paper (full text on section 8.8).
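As an illustration of the distance and direction annotations described in this section, the helper below computes the great-circle distance to a point of interest and its bearing relative to the digital compass heading. It is a generic sketch (standard haversine and bearing formulae with example coordinates), not code taken from [56].

    import math

    EARTH_RADIUS_M = 6371000.0

    def distance_and_direction(lat1, lon1, lat2, lon2, heading_deg):
        """Distance in metres and bearing of the target relative to the user's
        compass heading, in degrees clockwise."""
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlmb = math.radians(lon2 - lon1)
        a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
        dist = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
        # Initial bearing from the user to the target, clockwise from north.
        y = math.sin(dlmb) * math.cos(p2)
        x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
        bearing = math.degrees(math.atan2(y, x)) % 360.0
        return dist, (bearing - heading_deg) % 360.0

    # Annotate a point of interest north-east of the user while the user faces north.
    print(distance_and_direction(50.736, -1.988, 50.737, -1.986, heading_deg=0.0))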
Chapter 4 Interactive Environments

4.1 Introduction

HCI is an important aspect of any computer system, and this chapter focuses on illustrating the different novel interaction paradigms that were developed. Both VR and AR users can make use of more sophisticated hardware devices to perceive and interact with the environment. These can be categorised into three different areas: multimodal interaction, wireless sensor network based interaction, and brain-computer interaction.

4.2 Multimodal Interaction

In this work, tangible AR gaming environments that can be used to enhance entertainment through a multimodal interface were explored [48]. The main objective of the research was to design and implement generic tangible interfaces that are user-friendly in terms of interaction and can be used by a wide range of players, including the elderly or people with disabilities. To allow for seamless interaction between the users and the superimposed environmental information, a number of custom interaction devices have been researched. In particular, six different types of interaction were implemented, including: hand position and orientation, pinch glove interaction, head orientation, Wii interaction, and ultra mobile personal computer (UMPC) I/O manipulation. However, since usability and mobility were crucial, only a few interaction devices were integrated into the final architecture. In the final configuration, players can interact using different combinations of a pinch glove, a Wiimote and a six degrees-of-freedom (DOF) tracker, in tangible ways as well as through I/O controls. An overview of the system is shown in Figure 4-1.

Figure 4-1 Multimodal augmented reality interface [48]

Two tabletop AR games have been designed and implemented: a racing game and a pipe game. The goal of the AR racing game was to start the car and move around the track without colliding with either the wall or the objects that exist in the gaming arena. Initial evaluation results showed that multimodal interaction games can be beneficial in gaming. Based on these results, an AR pipe game was implemented with the goal of completing a circuit of pipes (from a starting point to an end point on a grid). Initial evaluation showed that tangible interaction is preferred to keyboard interaction and that tangible games are much more enjoyable. Based on the proposed research, many potential gaming applications could be produced, such as strategy, puzzle and action games.

Paper: Liarokapis, F., Macan, L., Malone, G., Rebolledo-Mendez, G., de Freitas, S. Multimodal Augmented Reality Tangible Gaming, Journal of Visual Computer, Springer, 25(12): 1109-1120, 2009.

Contribution (30%): Contribution to the design of the architecture. Implementation of parts of the AR interface. Write-up of most of the paper (full text on section 8.9).

4.3 Wireless Sensor Network Based Interaction

Wireless Sensor Network (WSN) technology uses networks of sense-enabled miniature computing devices to gather information about the world around them. While the gathering of data within a sensor network is one challenge, another of equal importance is presenting the data in a useful way to the user. A prototype mobile AR system for visualising environmental information, including temperature and sound data, was proposed [57]. Sound and temperature data are transmitted wirelessly to the client (which is a handheld device). Environmental information is represented graphically, as 3D objects and textual information, in real time (Figure 4-2). Participants visualise and interact with the augmented environmental information using a small but powerful handheld computer. The main contribution of this work is the visual representation of wireless sensor data in a meaningful and tangible way.

Figure 4-2 Wireless Sensor Network Based Interaction

In terms of operation, as soon as the temperature and sound sensors are ready to transmit data, visual representations, including a 3D thermometer and a 3D music note as well as textual annotations, are superimposed onto the appropriate marker. When environmental data is transferred to the AR interface, the colour of the 3D thermometer and the 3D music note change according to the temperature level and sound volume, respectively. Textual annotations indicate the sensor readings. For the temperature data, the readings from the sensors are superimposed as text next to the 3D thermometer. For the sound data, a different measure was employed based on a scale of 0-4, where '0' corresponds to 'quiet', '1' corresponds to 'low', '2' corresponds to 'medium', '3' corresponds to 'loud' and '4' corresponds to 'very loud'.
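The mapping from raw sensor readings to the superimposed 3D cues can be illustrated as follows. The thresholds (a 0-100 volume range and a -10 to 40 degrees Celsius colour ramp) are assumed example values and are not taken from [57]; only the 0-4 sound scale and its labels come from the description above.

    SOUND_LABELS = {0: "quiet", 1: "low", 2: "medium", 3: "loud", 4: "very loud"}

    def sound_level(volume, max_volume=100.0):
        """Map a raw volume reading onto the 0-4 scale used for the 3D music note."""
        return min(4, int(5 * max(0.0, min(volume, max_volume)) / max_volume))

    def thermometer_colour(celsius, cold=-10.0, hot=40.0):
        """Blend the 3D thermometer's colour from blue (cold) to red (hot)."""
        t = max(0.0, min(1.0, (celsius - cold) / (hot - cold)))
        return (t, 0.0, 1.0 - t)  # simple RGB blend

    def annotations(celsius, volume):
        """Build the colour and textual annotations superimposed onto the marker."""
        level = sound_level(volume)
        return {
            "thermometer_rgb": thermometer_colour(celsius),
            "temperature_text": f"{celsius:.1f} C",
            "note_level": level,
            "sound_text": SOUND_LABELS[level],
        }

    print(annotations(21.5, 63.0))  # e.g. sound_text == 'loud'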
Paper: Goldsmith, D., Liarokapis, F., Malone, G., Kemp, J. Augmented Reality Environmental Monitoring Using Wireless Sensor Networks, Proc. of the 12th International Conference on Information Visualisation (IV08), IEEE Computer Society, 8-11 July, 539-544, 2008.

Contribution (30%): Collaboration on the design of the architecture. Advice on the implementation of the majority of the VR interface. Write-up of most of the paper (full text on section 8.10).

4.4 Brain-Computer Interaction

Non-invasive BCIs operate by recording brain activity from the scalp with EEG sensors attached to the head via an electrode cap or headset, without being surgically implanted. However, they still have a number of problems, since they cannot function as accurately as other natural user interfaces and traditional input devices such as the standard keyboard and mouse. This research examined the application of commercial, non-invasive EEG-based brain-computer interfaces (BCIs) to serious games [47].

Figure 4-3 Brain-computer interaction [47]

Two different EEG-based BCI devices were used to fully control the same serious game (Figure 4-3). The first device (NeuroSky MindSet) uses only a single dry electrode and requires no calibration. The second device (Emotiv EPOC) uses 14 wet sensors and requires additional training of a classifier. User testing was performed on both devices with sixty-two participants, measuring the player experience as well as key aspects of serious games, primarily learnability, satisfaction, performance and effort. Recorded feedback indicates that current BCIs can be used in the future as alternative game interfaces following familiarisation and, in some cases, calibration. Comparative analysis showed significant differences between the two devices. The first device provides more satisfaction to the players, whereas the second device is more effective in terms of adaptation and interaction with the serious game.

Paper: Liarokapis, F., Debattista, K., Vourvopoulos, A., Ene, A., Petridis, P. Comparing interaction techniques for serious games through brain-computer interfaces: A user perception evaluation study, Entertainment Computing, Elsevier, 5(4): 391-399, 2014.

Contribution (40%): Collaboration on the design of the architecture. Advice on the implementation of the serious game as well as of the BCI interface. Write-up of most of the paper (full text on section 8.11).
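For readers unfamiliar with how such headsets are typically hooked up to a game, the hypothetical controller below smooths a single 0-100 'attention' reading, of the kind exposed by consumer single-electrode devices, and fires a game command once a threshold is exceeded. It is only a sketch of the general idea and does not reproduce the control schemes evaluated in [47].

    from collections import deque

    class AttentionController:
        """Toy single-channel BCI mapping: rolling average of attention -> command."""

        def __init__(self, threshold=60.0, window=8):
            self.threshold = threshold
            self.samples = deque(maxlen=window)

        def update(self, attention):
            """Feed one attention reading (0-100); return the resulting game command."""
            self.samples.append(attention)
            average = sum(self.samples) / len(self.samples)
            return "move_forward" if average >= self.threshold else "idle"

    ctrl = AttentionController()
    for reading in (40, 55, 62, 71, 80):   # simulated readings at roughly 1 Hz
        print(ctrl.update(reading))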
Chapter 5 Application Domains

5.1 Introduction

This chapter presents how the above-mentioned research can be applied to create applications in different domains, such as archaeology, navigation, education and serious games.

5.2 Virtual Archaeology

A number of museums hold large archives or collections of artefacts which they cannot exhibit in a low-cost and efficient way. Another underlying issue is that museums simply do not have the space to exhibit all the artefacts in an educational manner. Museums are interested in digitising their collections not only for the sake of preserving cultural heritage, but also to make the content accessible to the wider public in an attractive manner. Emerging technologies such as VR, AR and Web3D are widely used to create virtual museum exhibitions, both in the museum environment through informative kiosks and on the World Wide Web. This work surveyed the field: it explored the various kinds of virtual museums in existence and discussed their advantages and limitations, presenting both old and new methods and the tools used for their creation.

Figure 5-1 Archaeology (a) Complete solution and (b) Artefact visualisation in AR

The work also provided a complete tool chain, starting with the stereo-photogrammetry-based digitisation of artefacts, their refinement, collection and management with other multimedia data, and visualisation using virtual and augmented reality (Figure 5-1, a). The generated system is a one-stop solution for museums to create, manage and present both content and context for virtual exhibitions (Figure 5-1, b). Interoperability and standards are also key features of the system, allowing both small and large museums to build a bespoke system suited to their needs [55]. Moreover, different multimodal interfaces have been developed for cultural heritage. The integration of these technologies provides a novel multimodal mixed reality interface that facilitates the implementation of more interesting digital heritage exhibitions. With such exhibitions, participants can switch dynamically from virtual web-based environments to indoor augmented reality environments, as well as make use of various multimodal interaction techniques to better explore different applications such as virtual museums.

Paper: Sylaiou, S., Liarokapis, F., Kotsakis, K., Patias, P. Virtual museums, a survey and some issues for consideration, Journal of Cultural Heritage, Elsevier, 10(4): 520-528, 2009.

Contribution (30%): Collaboration on the collection of the material and write-up of the paper (full text on section 8.12).

Paper: Liarokapis, F. An Augmented Reality Interface for Visualizing and Interacting with Virtual Content, Virtual Reality, Springer, 11(1): 23-43, 2007.

Contribution (100%): Design of the architecture and implementation of the AR interface. Write-up of the paper (full text on section 8.7).

Paper: White, M., Mourkoussis, N., Darcy, J., Petridis, P., Liarokapis, F., Lister, P.F., Walczak, K., Wojciechowski, R., Cellary, W., Chmielewski, J., Stawniak, M., Wiza, W., Patel, M., Stevenson, J., Manley, J., Giorgini, F., Sayd, P., Gaspard, F. ARCO-An Architecture for Digitization, Management and Presentation of Virtual Exhibitions, Proc.
of the 22nd International Conference on Computer Graphics (CGI'2004), IEEE Computer Society, Hersonissos, Crete, June 16-19, 622-625, 2004.

Contribution (15%): Collaboration on the design of the VR and AR architecture. Implementation of most of the VR and AR interface. Write-up of parts of the paper (full text on section 8.5).

5.3 Urban Navigation

Up to now, most attempts to develop pedestrian navigation tools for the urban environment have used GPS technologies to display position on two-dimensional digital maps (as in the classic 'satnav' systems on the market). Although GPS is the key technology for location-based services (LBS), it cannot currently meet all the requirements for navigation in urban environments. Specifically, GPS technologies suffer from multipath signal degradation, and they cannot provide orientation information at low or zero speed, which is an essential component of navigation. It has also been demonstrated that maps are not always the most effective interfaces to pedestrian navigation applications on mobile devices. Orientation information is necessary to help the user self-localise in an unknown environment and can be provided either by sensors (e.g. accelerometers or a digital compass) or through computer vision techniques.

Figure 5-2 Interfaces for presenting information retrieved from a mobile information system

The LOCUS project has developed alternative, mixed reality interfaces for existing mobile information system technology based upon the WebPark platform. The WebPark platform can assist users in formulating spatially referenced, mobile queries. The retrieved set of spatially referenced results can then be displayed using various alternative interfaces: a list, a map, VR or AR (Figure 5-2). An evaluation exercise was undertaken to assess appropriate levels of detail, realism and interaction for the mobile virtual reality interface. Virtual 3D scenes were found to have many advantages when compared to paper maps: the most positive feature was found to be the ability to recognise features in the surrounding environment, which provides a link between the real and virtual worlds. Overall, results showed that these technologies are helpful; however, the most suitable interface is likely to vary according to the user and the task in hand [56].

Paper: Liarokapis, F., Brujic-Okretic, V., Papakonstantinou, S. Exploring Urban Environments using Virtual and Augmented Reality, Journal of Virtual Reality and Broadcasting, GRAPP 2006 Special Issue, Digital Peer Publishing, 3(5): 1-13, 2006.

Contribution (70%): Collaboration on the design of the architecture. Implementation of the majority of the VR interface. Write-up of most of the paper (full text on section 8.6).
The emergence of new technological innovations such as the Internet, multimedia, virtual and augmented reality technologies, was able to demonstrate the weaknesses of traditional teaching methods but also the potential of improving them. This work focuses on the use of high-level AR interfaces for the construction of collaborative educational applications that can be used in practice to enhance current teaching methods (Figure 5-3). Figure 5-3 Operation of the AR application (a) AR environment (b) Visualisation of educational content [59] Interactive Virtual and Augmented Reality Environments 27 A combination of multimedia information including spatial 3D models, images, textual information, video, animations and sound, can be superimposed in a student-friendly manner into the learning environment. In several case studies, different learning scenarios have been carefully designed based on HCI principles so that meaningful virtual information is presented in an interactive and compelling way. Collaboration between the participants is achieved through use of a tangible AR interface that uses marker cards as well as an immersive AR environment which is based on GUIs and sensors devices. The interactive AR interface has been piloted in the classroom of two UK universities in the Departments of Informatics and Information Science. Initial results indicated that students appreciated this type of tool for assisting the lecturer and improving the learning process [59]. Paper: Liarokapis, F., Mourkoussis, N., White, M., Darcy, J., Sifniotis, M., Petridis, P., Basu, A., Lister, P.F. Web3D and Augmented Reality to support Engineering Education, World Transactions on Engineering and Technology Education, UICEE, 3(1): 11-14, 2004. Contribution (80%): Collaboration on the design of the architecture. Implementation of the most of the VR interface. Write-up of most of the paper (full text on section 8.4). Paper: Liarokapis, F., Anderson, E. Using Augmented Reality as a Medium to Assist Teaching in Higher Education, Proc. of the 31st Annual Conference of the European Association for Computer Graphics (Eurographics 2010), Education Program, Norrkoping, Sweden, 4-7 May, 9-16, 2010. Contribution (90%): Implementation of the AR interface and collection of all the experimental data. Write-up of most of the paper (full text on section 8.13). 5.4.2 Activity-Led Introduction to First Year Creative Computing One of the goals of higher education is to prepare students for life by enabling them to become independent learners. Independent learning does not come easy to students who have adapted to becoming passive participants in the learning process, where they are presented with all of the required learning material, a learning style that many of them acquired during their secondary education [60]. Activity Lead Learning (ALL) is focused on providing students with a specific problem, scenario, task or activity in order to motivate, engage and stimulate them for providing effective and efficient solutions. The range of activities and tasks has a wide range and according to the requirements, different activities have to be planned and disseminated. ALL is a student-centred approach that has its roots in problem-based learning (PBL) [61]. 
Misconceptions about the nature of the computing disciplines pose a serious problem to university faculties that offer computing degrees, as students enrolling on their programmes may come to realise that their expectations are not realistic. This frequently results in the students' early disengagement from the subject of their degrees, which in turn can lead to excessive 'wastage', i.e. reduced retention. This work reports on our academic group's attempts, within creative computing degrees at a UK university, to counter these problems through the introduction of a six-week-long project that newly enrolled students embark on at the very beginning of their studies (Figure 5-4).

Figure 5-4 3D etch-a-sketch. (a) Student-based drawing application [60], (b) student group's hardware interface [60]

This group project provided a breadth-first, activity-led introduction to the students' chosen academic discipline, aiming to increase student engagement while providing a stimulating learning experience, with the overall goal of increasing retention. The methods and results of two iterations of these projects, in the 2009/2010 and 2010/2011 academic years, were presented. Results indicate that the ALL approach worked well for these cohorts, with students expressing increased interest in their chosen discipline, in addition to noticeable improvements in retention following the first year of the students' studies [60].

Paper: Anderson, E.F., Peters, C., Halloran, J., Every, P., Shuttleworth, J., Liarokapis, F., Lane, R., Richards, M. In at the Deep End: An Activity-Led Introduction to First Year Creative Computing, Computer Graphics Forum, Wiley-Blackwell, 31(6): 1852-1866, September, 2012. Contribution (10%): Collaboration on the teaching methods and write-up of the paper (full text in section 0).

5.5 Serious Games and Virtual Environments

5.5.1 Serious Games Technologies

The success of computer games, fuelled by factors such as the high degree of realism that can be attained using modern consumer hardware, and the key games technology techniques that have resulted from it, have given rise to new types of games, including serious games, and to related application areas such as virtual worlds, mixed reality, augmented reality and virtual reality. All of these types of application utilise core games technologies (e.g. 3D environments) as well as novel techniques derived from computer graphics, human-computer interaction, computer vision and artificial intelligence, such as crowd modelling. Together these technologies have given rise to new sets of research questions, often following technologically driven approaches to increasing levels of fidelity, usability and interactivity. The aim has been to use this state-of-the-art report to demonstrate the potential of serious games technology for cultural heritage, to outline key problems and to indicate areas of technology where solutions for the remaining challenges may be found. However, the same technology can easily be applied to other application domains.

Paper: Anderson, E.F., McLoughlin, L., Liarokapis, F., Peters, C., Petridis, P., de Freitas, S. Developing serious games for cultural heritage: a state-of-the-art review, Virtual Reality, Springer, 14(4): 255-275, 2010. Contribution (20%): Write-up of the serious games, virtual and augmented reality sections of the paper. Also co-wrote the introduction and conclusions (full text in section 8.15).
5.5.2 Learning as Immersive Experiences within Serious Games

Traditional approaches to learning have often focused upon knowledge transfer strategies that have centred on textually based engagements with learners and dialogic methods of interaction with tutors. The use of virtual worlds, with text-based and voice-based communication and a natural feeling of 'presence', allows for more complex social interactions, designed learning experiences and role plays, as well as encouraging learner empowerment through increased interactivity. To unpick these complex social interactions and more interactive designed experiences, this work considers the use of virtual worlds in relation to structured learning activities for college and lifelong learners [62]. This consideration necessarily has implications for the learning theories adopted and the practices taken up, with real implications for tutors and learners alike. Alongside this is the notion of learning as an ongoing set of processes mediated via social interactions and experiential learning circumstances within designed virtual and hybrid spaces. This implies the need for new methodologies for evaluating the efficacy, benefits and challenges of learning in these new ways.

Figure 5-5 Learning as immersive experiences. (a) Four Dimensional Framework [63], (b) Meeting in-world in Second Life for virtual tour [62]

Towards this aim, this work proposed an evaluation methodology for supporting the development of specified learning activities in virtual worlds, based upon inductive methods and augmented by the four-dimensional framework [63]. The approach was based upon the assumption that learning experiences need to be designed, used and tested in a multidimensional way due to the multimodal nature of the interface (Figure 5-5). The presented evaluation methodology may be used as a design tool for designing learning activities in-world, as well as for evaluating the efficacy of experiences, due to its set of consistent criteria.

Paper: de Freitas, S., Rebolledo-Mendez, G., Liarokapis, F., Magoulas, G., Poulovassilis, A. Learning as immersive experiences: Using the four-dimensional framework for designing and evaluating immersive learning experiences in a virtual world, British Journal of Educational Technology, Blackwell Publishing, 41(1): 69-85, 2010. Contribution (20%): Collaboration on the design and evaluation of the serious game as well as the write-up of the paper (full text in section 8.16).

Chapter 6 Conclusions and Future Work

6.1 Conclusions

This habilitation thesis presented several contributions to interactive virtual and augmented reality environments, as well as to various application domains. The thesis covered a number of different procedural generation techniques for generating content as well as human behaviour. Moreover, it provided contributions in virtual and augmented reality environments ranging from indoor to outdoor (mobile) solutions. It also covered a significant number of contributions in the area of HCI, ranging from standard sensor-based techniques to more advanced ones such as EEG-based methods. Finally, it showed how all of the above-mentioned methods can be applied to create novel applications. It is worth mentioning that all of these areas are evolving rapidly and the state of the art changes very quickly.
6.2 Future Work In terms of future directions, it is realistic to expect contributions in all areas mentioned in this thesis. Firstly, to explore in more detail procedural approaches in different contexts. Secondly, to develop further the architecture of virtual and augmented reality allowing for more realistic computer graphics functionality. Thirdly, to improve the human computer interaction techniques by making use of multimodal approaches as well as more sensor devices. Finally, to apply the systems in different application domains such as medicine. Interactive Virtual and Augmented Reality Environments 32 Chapter 7 References [1] Sutherland, I. The ultimate display, Proc. of the IFIP Congress, vol.2, 506-508, (1965). [2] Anderson, E.F., McLoughlin, L., Liarokapis, F., Peters, C., Petridis, P., de Freitas, S. Developing serious games for cultural heritage: a state-of-the-art review, Virtual Reality, Springer, 14(4): 255-275, 2010. [3] Pausch, R., Crea, T., Conway, M. A literature survey for virtual environments: military flight simulator visual systems and simulator sickness, Presence: Teleoperators and Virtual Environments, MIT Press, 1(3): 344-363, 1992. [4] Schuemie, M.J., Straaten, P.V.D., et al., Research on Presence in Virtual Reality: A Survey, CyberPsychology & Behavior, 4(2): 183-201, 2001. [5] Zhao, Q.P. A survey on virtual reality, Science in China Series F: Information Sciences, Springer, 52(3): 348-400, 2009. [6] LaViola, J.J. A discussion of cybersickness in virtual environments, ACM SIGCHI Bulletin, ACM Press, 32(1): 47-56, 2000. [7] Feiner, S.K. Augmented Reality: A New Way of Seeing. Scientific American, 286, 4, April 24, 48–55, (2002). [8] Liarokapis, F., Augmented Reality Interfaces - Architectures for Visualising and Interacting with Virtual Information, Sussex theses S 5931, Department of Informatics, School of Science and Technology, University of Sussex, Falmer, UK, 2005. [9] Azuma, R. A Survey of Augmented Reality, Teleoperators and Virtual Environments, 6(4): 355-385, 1997. [10] Mahoney, D. Better Than Real, Computer Graphics World, February 1999, 32-40, 1999. Interactive Virtual and Augmented Reality Environments 33 [11] Klinker, G., Ahlers, et al. Confluence of Computer Vision and Interactive Graphics for Augmented Reality, PRESENCE: Teleoperations and Virtual Environments, Special Issue on Augmented Reality, 6(4): 433-451, August 1997. [12] Azuma, R., Baillot, Y., et al. Recent Advances in Augmented Reality, Computers Graphics and Applications, IEEE Computer Society, November/December, 21(6): 34-47, 2001. [13] Van Krevelen, D.W.F., Poelman, R. A survey of augmented reality technologies, applications and limitations, International Journal of Virtual Reality 9(2): 1-20, 2009. [14] Smelik, R.M., De Kraker, K.J., Tutenel, T., Bidarra, R., Groenewegen, S.A. A survey of procedural methods for terrain modelling, Proc. of the CASA Workshop on 3D Advanced Media In Gaming And Simulation (3AMIGAS), 25-34, 2009. [15] Kelly, G., McCabe, H. A survey of procedural techniques for city generation, ITB Journal, 14, 87-130, 2006. [16] Smelik, R. M., Tutenel, T., Bidarra, R., Benes, B. A survey on procedural modelling for virtual worlds, Computer Graphics Forum, 33(6): 31-50, 2014. [17] Parish, Y.I.H. Muller, P. Procedural Modeling of Cities. Proc. of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '01), ACM Press, 301-308, 2001. 
[18] Greuter, S., Parker, J., Stewart, N., Leach, G., Real-time Procedural Generation of 'Pseudo Infinite' Cities, Proc. of the 1st International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia (Graphite), 87-95, 2003. [19] Wonka, P., Wimmer, M., Sillion, F. & Ribarsky, W. Instant Architecture, ACM Transactions on Graphics, 22(3): 669-677, 2003. [20] Muller, P., Wonka P., et al., Procedural Modeling of Buildings. ACM Transactions on Graphics, 25(3): 614-623, 2006. [21] Azahar, M.A.B.M., Sunar, M.S., Daman, D., Bade, A. Survey on Real-Time Crowds Simulation, Technologies for E-Learning and Digital Entertainment, Lecture Notes in Computer Science, Springer, Volume 5093, 573-580, 2008. [22] Zhou, S., Chen, D., et al. Crowd modeling and simulation technologies, ACM Transactions on Modeling and Computer Simulation (TOMACS), ACM Press, 20(4), Article 20, 2010. [23] Luo, L., Zhou, S., Cai, W., Low, M., Lees, M. Toward a Generic Framework for Modeling Human Behaviors in Crowd Simulation, Proc. of the IEEE/WIC/ACM Int’l Joint Interactive Virtual and Augmented Reality Environments 34 Conference on Web Intelligence and Intelligent Agent Technology - Volume 02 (WI-IAT '09), Vol. 2. IEEE Computer Society, Washington, DC, USA, 275-278, 2009. [24] Cui, X., Shi, H. A*-based Pathfinding in Modern Computer Games, International Journal of Computer Science and Network Security, 11(1): 125-130, 2011. [25] Reynolds C. Steering behaviours for autonomous characters, Proc. of game developers conference, Miller Freeman Game Group, San Francisco, California, 763-782, 1999. [26] Ondej, J., Pettre, J., Olivier, A.-H., Donikian, S. A Synthetic-Vision-Based Steering Approach for Crowd Simulation, ACM Transactions on Graphics, 29(4): 123, 2010. [27] Helbing, D., Molnar, P. Social force model for pedestrian dynamics, Phys. Rev E, American Physical Society, 51(5): 4282-4286, 1995. [28] Ennis, C., Peters, C., O'Sullivan, C. Perceptual effects of scene context and viewpoint for virtual pedestrian crowds, ACM Transactions on Applied Perception (TAP), 8(2), Article 10, 2011. [29] O'Sullivan, C., Ennis, C. Metropolis: multisensory simulation of a populated city, International Conference on Games and Virtual Worlds for Serious Applications (VSGames), IEEE Computer Society, Athens, Greece, 1-7, 2011. [30] Rego, P., Moreira, P.M., Reis, L.P. Serious games for rehabilitation: A survey and a classification towards a taxonomy, Proc. of the 5th Iberian Conference on Information Systems and Technologies (CISTI), IEEE Computer Society, 1-6, 2010. [31] Paivio, A. Mental representations: A dual coding approach, Oxford University Press, New York, 1990. [32] Baddeley, A.D. The episodic buffer: a new component of working memory?, Trends in Cognitive Science, 4(11): 417-423, 2000. [33] Zyda, M. From visual simulation to virtual reality to games, IEEE Computer, 38(9): 25-32, 2005. [34] de Freitas, S., Oliver, M. How can exploratory learning with games and simulations within the curriculum be most effectively evaluated?, Computers and Education, Elsevier, 46(3): 249-264, 2006. [35] Dix, A., Finlay, J., Abowd, G., Beale, R. Human-computer interaction, 3rd edition, Prentice Hall, 2003. Interactive Virtual and Augmented Reality Environments 35 [36] Wright, P.C., Fields, R.E., Harrison, M.D. Analyzing Human-Computer Interaction as Distributed Cognition: The Resources Model, Human–Computer Interaction, Taylor and Francis, 15(1): 1-41, 2000. [37] Kjeldskov, J., Graham, C. 
A review of mobile HCI research methods. In Human-computer interaction with mobile devices and services, Springer Berlin Heidelberg, 317-335, 2003. [38] Rekimoto, J., Nagao, K. The World through the Computer: Computer Augmented Interaction with Real World Environments, Proc. of UIST ’95, (ed B.A. Myers), ACM Press, Pennsylvania, 29-36, 1995. [39] Oviatt, S. Ten myths of multimodal interaction, Communications of the ACM, ACM Press, 42(11): 74-81, 1999. [40] Noghani, J., Liarokapis, F., Anderson, E.F. Randomly Generated 3D Environments for Serious Games, Proc. of the 2nd IEEE International Conference in Games and Virtual Worlds for Serious Applications, IEEE Computer Society, Braga, Portugal, 25-26 March, 3- 10, 2010. [41] Noghani, J., Anderson, E., Liarokapis, F. Towards a Vitruvian Shape Grammar for Procedurally Generating Classical Roman Architecture, Proc. of the 13th International Symposium on Virtual Reality, Archaeology and Cultural Heritage VAST 2012, Short and Project Papers, Eurographics, Brighton, UK, 19-21 November, 41-44, 2012. [42] Barton, I.M. Palaces. In Roman Domestic Buildings, University of Exeter Press, 91-120, 1996. [43] Jacobson, J., Holden, L. The virtual egyptian temple, ED-MEDIA: Proccedings of the World Conference on Educational Media, Hypermedia & Telecommunications 2005. [44] O'Connor, S., Liarokapis, F., Peters, C. An Initial Study to Assess the Perceived Realism of Agent Crowd Behaviour in a Virtual City, Proc. of the 5th International Conference on Games and Virtual Worlds for Serious Applications (VS-Games 2013), IEEE Computer Society, Bournemouth, UK, 11-13 September, 85-92, 2013. [45] Liarokapis, F. An exploration from virtual to augmented reality gaming, Simulation and Gaming, Symposium: Virtual Reality Simulation, SAGE Publications, December, 37(4): 507-533, 2006. [46] Vourvopoulos, A., Liarokapis, F. Evaluation of commercial brain–computer interfaces in real and virtual world environment: A pilot study, Computers and Electrical Engineering, Elsevier, 40(2): 714-729, 2014. Interactive Virtual and Augmented Reality Environments 36 [47] Liarokapis, F., Debattista, K., Vourvopoulos, A., Ene, A., Petridis, P. Comparing interaction techniques for serious games through brain-computer interfaces: A user perception evaluation study, Entertainment Computing, Elsevier, 5(4): 391-399, 2014. [48] Liarokapis, F., Macan, L., Malone, G., Rebolledo-Mendez, G., de Freitas, S. Multimodal Augmented Reality Tangible Gaming, Journal of Visual Computer, Springer, 25(12): 1109- 1120, 2009. [49] Liarokapis, F., Mourkoussis, N., White, M., Darcy, J., Sifniotis, M., Petridis, P., Basu, A., Lister, P.F. Web3D and Augmented Reality to support Engineering Education, World Transactions on Engineering and Technology Education, UICEE, 3(1): 11-14, 2004. [50] Liarokapis, F., Brujic-Okretic, V., Papakonstantinou, S. Exploring Urban Environments using Virtual and Augmented Reality, Journal of Virtual Reality and Broadcasting, GRAPP 2006 Special Issue, Digital Peer Publishing, 3(5): 1-13, 2006. [51] Feiner S, MacIntyre B, et al. Windows on the World: 2D Windows for 3D Augmented Reality, Proc. of the ACM Symposium on User Interface Software and Technology, Atlanta, November 3-5, ACM Press, 145-155, 1993. [52] Haller M, Hartmann W, et al. Combining ARToolKit with Scene Graph Libraries, Proc. of The 1st IEEE International Augmented Reality Toolkit Workshop, Darmstadt, Germany, 29 September, 2002. [53] MacIntyre B., Gandy M., Dow S., Bolter J.D. 
DART: a toolkit for rapid design exploration of augmented reality experiences, ACM Transactions on Graphics (TOG), 24(3): 932, 2005. [54] Liarokapis, F. An Augmented Reality Interface for Visualizing and Interacting with Virtual Content, Virtual Reality, Springer, 11(1): 23-43, 2007. [55] White, M., Mourkoussis, N., Darcy, J., Petridis, P., Liarokapis, F., Lister, P.F., Walczak, K., Wojciechowski, R., Cellary, W., Chmielewski, J., Stawniak, M., Wiza, W., Patel, M., Stevenson, J., Manley, J., Giorgini, F., Sayd, P., Gaspard, F. ARCO-An Architecture for Digitization, Management and Presentation of Virtual Exhibitions, Proc. of the 22nd International Conference on Computer Graphics (CGI'2004), IEEE Computer Society, Hersonissos, Crete, June 16-19, 622-625, 2004. [56] Mountain, D., Liarokapis, F. Mixed reality (MR) interfaces for mobile information systems, Aslib Proceedings, Special issue: UK library & information schools, Emerald Press, 59(4/5): 422-436, 2007. Interactive Virtual and Augmented Reality Environments 37 [57] Goldsmith, D., Liarokapis, F., Malone, G., Kemp, J. Augmented Reality Environmental Monitoring Using Wireless Sensor Networks, Proc. of the 12th International Conference on Information Visualisation (IV08), IEEE Computer Society, 8-11 July, 539-544, 2008. [58] Sylaiou, S, Liarokapis, F., Kotsakis, K., Patias, P. Virtual museums, a survey and some issues for consideration, Journal of Cultural Heritage, Elsevier, 10(4): 520-528, 2009. [59] Liarokapis, F., Anderson, E. Using Augmented Reality as a Medium to Assist Teaching in Higher Education, Proc. of the 31st Annual Conference of the European Association for Computer Graphics (Eurographics 2010), Education Program, Norrkoping, Sweden, 4-7 May, 9-16, 2010. [60] Anderson, E.F., Peters, C., Halloran, J., Every, P., Shuttleworth, J., Liarokapis, F., Lane, R., Richards, M. In at the Deep End: An Activity-Led Introduction to First Year Creative Computing, Computer Graphics Forum, Wiley-Blackwell, 31(6): 1852-1866, September, 2012. [61] Savin-Baden, M., Major, C. Foundations of Problem Based Learning, Open University Press, Buckingham, UK, 2004. [62] de Freitas, S., Rebolledo-Mendez, G., Liarokapis, F., Magoulas, G., Poulovassilis, A. Learning as immersive experiences: Using the four-dimensional framework for designing and evaluating immersive learning experiences in a virtual world, British Journal of Educational Technology, Blackwell Publishing, 41(1): 69-85, 2010. [63] de Freitas, S. Serious virtual worlds: a scoping study. Bristol: Joint Information Systems Committee, Report, 3rd November 2008. (Available at: http://www.jisc.ac.uk/publications/publications/seriousvirtualworldsreport.aspx, Accessed at: January 2015). Interactive Virtual and Augmented Reality Environments 38 Chapter 8 Appendix – Paper Reprints In the following sections, copies of the papers used for this habilitation thesis are provided. The selected conference papers concern topics which have not yet been published in journal papers, but this will happen in the future, since the work is on-going. Interactive Virtual and Augmented Reality Environments 39 8.1 Paper #1 Noghani, J., Liarokapis, F., Anderson, E.F. Randomly Generated 3D Environments for Serious Games, Proc. of the 2nd IEEE International Conference in Games and Virtual Worlds for Serious Applications, IEEE Computer Society, Braga, Portugal, 25-26 March, 3-10, 2010. Contribution (40%): Design of the architecture, implementation of smoothing techniques and advice on evaluation. 
Write-up of most of the paper. Randomly Generated 3D Environments for Serious Games Jeremy Noghani Interactive Worlds Applied Research Group Coventry University Coventry, UK noghanij@coventry.ac.uk Fotis Liarokapis Interactive Worlds Applied Research Group Coventry University Coventry, UK F.Liarokapis@coventry.ac.uk Eike Falk Anderson Interactive Worlds Applied Research Group Coventry University Coventry, UK Eike.Anderson@coventry.ac.uk Abstract— This paper describes a variety of methods that can be used to create realistic, random 3D environments for serious games requiring real-time performance. These include the generation of terrain, vegetation and building structures. An interactive flight simulator has been created as proof of concept. An initial evaluation with two small samples of users (remote and hallway) revealed some usability issues but also showed that overall the flight simulator is enjoyable and appears realistic and believable. Keywords – serious games; 3D terrain modeling; computer graphics; flight simulator. I. INTRODUCTION The creation of realistic virtual environments is an important issue in the computer animation, computer games, digital film effects and simulation industries. In recent years, the computer and video games industry has overtaken both the film and music industries as the top revenue producers, and the cost for developing a commercial game now usually requires investments of several million dollars, involving large teams of developers that can number in the hundreds of workers, many of whom are artists and designers providing content for the decoration of rich virtual game worlds. While many games companies have the necessary budget to develop these expensive modern computer games that employ state of the art computer graphics, not all game developers have the same resources. Serious games refer to computer games that are not limited to the aim of providing just entertainment but which can be used for other purposes, such as education or training in a number of application domains. There are several game engines and online virtual environments that have been used to design and implement these games for non-leisure purposes [1]. The development of serious games using the same approach as used for entertainment games is not possible because their budget is usually limited to a few thousand dollars. The literature states that when games and simulations technologies are applied to nonentertainment domains, serious gaming applications can be created [2]. When classifying a game, the definition of the term ‘game’ does not necessarily require formalised criteria for success such as praising winners, totalling points or reaching certain areas in a level [3]. “Gaming is by no means a replacement for existing model and simulation building processes and practices but it has tangible advantages that ultimately could result in wider, more flexible, and more versatile products” [4]. To overcome these problems, a variety of methods for automatically creating detailed but also randomised environments have been developed. The use of these procedural methods [5] saves time and reduces the budget for creating effective serious games. However, if a user wishes to interact with the environment in a meaningful way, such as in a flight simulator that has an expansive world and implements collision detection, then numerous problems arise that are often not dealt with during the creation stage. 
Figure 1 Randomly generated environment for an entertainment flight simulator

This paper explains some of the problems that can arise from this situation and describes a variety of methods that can be used to overcome them. These methods have been applied to a basic flight simulator (see Figure 1), so that the results could be observed and evaluated. Initial results with two types of user groups (remote and hallway) revealed some usability issues but also illustrated that overall the flight simulator is enjoyable, fun and looks realistic. The rest of the paper is structured as follows. Section II reviews past methods used in terrain generation. Section III presents how our flight simulator serious game was created to allow for navigation and interaction with the terrain, whereas section IV describes techniques used for creating infinite terrains. Section V provides an overview of procedural techniques for adding vegetation and buildings into randomised environments. Section VII presents a flight simulator as a case study and section VIII illustrates initial evaluation results. Finally, section IX presents conclusions and future work.

II. TERRAIN CREATION METHODS

The majority of the traditional methods used to create partially randomised terrain involve the use of fractals, such as fault formation [6] and noise algorithms [7]. Fractals are objects or shapes which, when split into smaller parts, result in shapes that are similar to the original shape as a whole [8] (self-similarity). Their use is advantageous from a computer graphics point of view, due to their ability to define complex geometry from a small set of instructions, and due to their ability to define shapes that are often difficult to describe with simple Euclidean geometry. Random one-dimensional midpoint displacement is a simple algorithm that can be used to create fractals that appear similar to the two-dimensional silhouette of mountain ranges. It is implemented by finding the midpoint of a single line and displacing its height by a random offset value. This process is then repeated at the midpoints between these newly defined points with a reduced random number range. This algorithm is usually implemented recursively to allow the silhouette to be made as detailed as the user requires [9]. When the random midpoint displacement is applied to the centre of a terrain grid square only, this can be defined in terms of the displacements of the centre points of the square's sides. A more efficient way is to derive the same result by adding the four corners of the square, dividing by four and adding the random value to the result. The diamond-square algorithm can be considered an effective way of applying this one-dimensional method to a second dimension, creating three-dimensional terrain if the resulting lattice is used as a virtual heightmap [10]. The recursive algorithm works by refining a square area, whose four corner points' height values may be initialised randomly, and then calculating its centre point as the mean of the corner points, to which a random value is added. Midpoints of the edges between the corners are then calculated in a similar manner and the original shape is then subdivided by generating new edges between the newly generated points, forming new squares for further subdivision, as well as diamond shapes within the squares.
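To make the recursion concrete, the following short Python sketch illustrates the diamond-square scheme described above; it is not code from the original paper, and the grid size, roughness parameter and function name are illustrative assumptions only.

    import random

    def diamond_square(n, roughness=1.0, seed=None):
        """Return a (2**n + 1) x (2**n + 1) heightmap built with diamond-square."""
        rng = random.Random(seed)
        size = 2**n + 1
        h = [[0.0] * size for _ in range(size)]
        # Initialise the four corner heights randomly.
        for x, z in ((0, 0), (0, size - 1), (size - 1, 0), (size - 1, size - 1)):
            h[x][z] = rng.uniform(-1.0, 1.0)
        step, scale = size - 1, roughness
        while step > 1:
            half = step // 2
            # Diamond step: centre of each square = mean of its corners + random offset.
            for x in range(half, size, step):
                for z in range(half, size, step):
                    avg = (h[x - half][z - half] + h[x - half][z + half] +
                           h[x + half][z - half] + h[x + half][z + half]) / 4.0
                    h[x][z] = avg + rng.uniform(-scale, scale)
            # Square step: edge midpoints = mean of their in-bounds neighbours + offset.
            for x in range(0, size, half):
                start = half if (x // half) % 2 == 0 else 0
                for z in range(start, size, step):
                    total, count = 0.0, 0
                    for dx, dz in ((-half, 0), (half, 0), (0, -half), (0, half)):
                        nx, nz = x + dx, z + dz
                        if 0 <= nx < size and 0 <= nz < size:
                            total += h[nx][nz]
                            count += 1
                    h[x][z] = total / count + rng.uniform(-scale, scale)
            step, scale = half, scale * 0.5   # reduce the random range each pass
        return h

For instance, diamond_square(7) would produce a 129 x 129 lattice that can be used directly as a virtual heightmap; halving the random range after every pass corresponds to the reduced random number range mentioned above.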
Using a smaller random value tends to result in the creation of smoother terrain, whereas larger offsets result in more jagged edges. The use of hexagonal and triangular shapes instead of a square grid has been proposed to reduce the problems of 'creasing' in the terrain [11]. There has been some work on modelling terrain based on realistic physical constraints. Kelley et al. [12] produced a system in which water drainage is simulated to shape and constrain the landscape, in a similar manner to the way in which water erosion affects real terrain. Musgrave et al. [13] managed to achieve realistic results through a different method that took hydraulic and thermal erosion into account when creating a fractal terrain. Attempts to create more geographically accurate models have led to increased realism in some aspects, but have also increased the complexity of the design and rendering [14]. An alternative method to create randomised terrain is the use of Lindenmayer systems. L-systems were originally created to study organic growth, such as is the case with plants, but they can easily be adapted to cover other self-similar structures, such as mountainous terrain [15]. The distinctive feature of L-systems is the use of rules that rewrite strings, which can be called recursively to make a hierarchy of strings. When displayed visually, these may produce results similar to those of a mountain range silhouette, for example.

III. SIMULATOR CREATION

For the purpose of this paper, a small, simple flight simulator was created to allow for navigation and interaction with the terrain. In the design, usability took precedence over realism, as a result of which the controls were deliberately kept simple; the mouse is used to alter the yaw and pitch of the aircraft, and two keys are used for acceleration and deceleration. Additionally, to allow for close examination of the terrain, the in-program physics were kept liberal; i.e. the plane was allowed to come to a complete stop in mid-air without gravity taking effect. For the terrain itself, heightmaps that were generated using the diamond-square algorithm were chosen to provide surface detail. This method was chosen primarily due to the algorithm's simplicity and adaptability, meaning that the system itself could easily be altered to accommodate a more complex algorithm or to accept alterations to factors such as surface roughness without the need to be rewritten from scratch. Additionally, by choosing a recursive algorithm, the level of detail could be adjusted as necessary, which proved to be an advantage when dealing with different methods that required different levels of processing power.

Figure 2 Pyramid created by a random height-displacement of the centre point of the square base

Figure 2 illustrates how a vertical deformation of the centre of the square base connected to the neighbouring points can produce a pyramid. Instead of using this linear deformation of the neighbouring points, a two-dimensional Lorentz distribution [16] for the height array was assumed. By adjusting the width of the Lorentzian shape, one could obtain a means for controlling the smoothness of the terrain. Another distribution that could be used is the Gaussian shape.
After some trials, it was found that the bell-shaped distribution that creates a smooth terrain was the following:

height = random number × (D^2/8) / ((x - x0)^2 + (z - z0)^2 + D^2/8)

where (x0, z0) is the position of the peak and D is the side length of the square, which is related to the width of the Lorentz distribution. The value of the width directly affects the smoothness of the terrain. By decreasing the width of the bell-shaped distribution, the terrain becomes steeper. Water was added to the landscape in the form of a single translucent plane placed at an appropriate height (see Figure 1). Small buildings and trees were also added as decorations for the terrain, using pre-fabricated models. Their placement on the landscape was decided randomly, although rules were implemented to prevent their creation on top of mountain peaks, below the virtual world's water level, or on steep slopes.

IV. OUTER BOUNDARIES

In any simulator that requires travelling for a long time in one direction (flight simulators being the most notable example), the user may find that they see or pass an "outer boundary"; the terrain is only created within certain dimensions, so reaching an area where nothing is rendered can be a possibility, depending on the actual implementation. Three potential solutions were devised and implemented. The first was to create a terrain of such large extent that the user would never reach the outer boundary (see Figure 3). The success of this method depended upon the scale of the landscape relative to the user's movement speed; if a user moved over the length of a single terrain grid square per second, then they would be less likely to reach a boundary than a user who moved at 5 grid squares per second, assuming that other factors remained equal. The implication of this is that landscapes must be scaled to be as large as possible to minimise the chance of the user discovering a boundary. Upon attempting this method, another problem arose in the form of the terrain looking bland and flat, due to the spread-out nature of the polygons. The solution to this was to increase the number of subdivisions, thus increasing the level of detail. However, it ought to be noted that if the designer were to keep increasing the scale and the level of detail of the terrain at the same time, then eventually the computer would reach the limitations of its hardware. For this reason, this method can be considered appropriate for small demonstration purposes or applications where the user moves slowly relative to the virtual landscape, but inappropriate for a full flight simulator where a large detailed environment is desirable. The second attempted method was to "loop" the old landscape when the user reaches a boundary. In order for the landscape to loop seamlessly, it was vital to ensure that the edges have equal height values; on an A1 to H9 grid, for example, B1 must equal B9, A5 must equal H5, and the corner values (A1, A9, H1, and H9) must equal each other. The landscape can then be kept in the computer memory as a single "tile", which can be duplicated to a new spot when needed. Tiles of terrain that are a particular distance from the player's avatar, i.e. the aircraft, can be deleted or moved to a more appropriate position, to keep memory usage to a minimum.
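As a rough illustration of this looping approach (not code from the original paper; the data layout and function names are assumptions), the single edge-matched tile can be re-used by wrapping the aircraft's world position back onto it:

    def wrap_to_tile(x, z, tile_size):
        """Map a world-space position onto the single repeating terrain tile."""
        return x % tile_size, z % tile_size

    def sample_height(heightmap, x, z, tile_size):
        """Sample the looping terrain under an arbitrary world position."""
        size = len(heightmap)                      # lattice of size x size points
        local_x, local_z = wrap_to_tile(x, z, tile_size)
        i = int(local_x / tile_size * (size - 1))  # nearest lattice indices
        j = int(local_z / tile_size * (size - 1))
        return heightmap[i][j]

    def visible_tile_origins(x, z, tile_size, radius=1):
        """World-space origins at which copies of the tile are drawn around the player."""
        base_x = (x // tile_size) * tile_size
        base_z = (z // tile_size) * tile_size
        return [(base_x + i * tile_size, base_z + j * tile_size)
                for i in range(-radius, radius + 1)
                for j in range(-radius, radius + 1)]

Because opposite edges of the tile share the same height values, the seam between neighbouring copies is not visible to the player.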
Figure 3 Upon reaching the right side of the landscape, a tile of terrain is moved in front of the player to provide the illusion of endless terrain This solution could cause the problem of the user noticing the repetitive nature of the terrain (especially if the terrain contained a notable feature, such as a peak), but the severity of this issue would depend on several factors. For example, if the user intended on using the same terrain for a long period of time, then he or she would be more likely to notice the copied terrain tiles than a user who intended on playing for a short period of time. Additionally, if the terrain is seemingly large (i.e. if it took 60 seconds to travel across a single tile, for instance), then the repeating nature of the terrain tiles would be less noticeable than if the terrain were small (i.e. if it took 20 seconds to travel across a tile). The implications of this are that looping the landscape with the same terrain tile would work in a number of simulation scenarios, but it is difficult to assess whether this could be successfully applied to a particular flying simulator without some form of user testing. A final note on this method is that by making the four corner values identical, the deviation between the highest and lowest points of the landscape may be reduced. Alternate methods of increasing the stochasticity (such as using more random points) may be considered to avoid this problem. The final method was to automatically generate a new, unique tile of terrain when the user reaches a boundary (see Figure 4). To ensure that the tiles matched seamlessly, one edge of the terrain would have its heights copied to the matching edge of the new tile. The rest of the heights can then be calculated via random values and midpoint displacement using the diamond- 5 square algorithm, in the same manner as was used to create the first tile. Figure 4 Upon reaching the right side of terrain grid A, the values of the far-right points are copied to the far left points of terrain grid B. The other points of terrain grid B are then calculated via randomisation and mid-point displacement, as done for grid A The advantage of this method is that the terrain is genuinely infinite; from a user’s perspective, the land would continue in all directions with no repetitions. However, there is the problem of memory usage. If a user were to continue travelling in a straight line, increasingly more tiles would have to be generated and stored in memory (even if they were not rendered), which could cause a memory overflow. The solutions to this are to either store previously visited tiles in a separate cache file, or to simply delete previously visited areas that are far away from the avatar. The former solution has the problem of requiring an efficient caching system (i.e. one that allows for fast writing and reading of large sets of coordinates), and the latter suffers from not allowing the user to backtrack to a previously-visited area that is a certain distance away. V. AUTOMATIC DECORATION OF VIRTUAL ENVIRONMENTS Empty, featureless spaces resulting from terrain generation alone are insufficient for the creation of convincing virtual environments. To overcome this, the environment needs to be decorated with suitable vegetation and artificial structures, including buildings as well as settlements. A. 
Procedural Generation and Placement of Vegetation There are different methods for the procedural creation of vegetation, many of which are based on fractal or simpler rule-based techniques. One of the latter methods has been used for on-the-fly generation of forests for real-time virtual environments [17], using a skeletal topology for procedurally generated and animated trees, which has also been combined effectively with on-the-fly generated grass to create a rich natural scenery [18]. A much more powerful, rulebased approach applying component-based modelling is the one by Lintermann and Deussen [19], which provides a more intuitive way for controlling plant modelling than the well-known L-Systems [15]. The decision of what type of plant needs to be placed into the virtual world usually depends on a number of factors, including the elevation and slope of the terrain, as well as topographic features that dictate the probability of a specific plant’s occurrence [20]. Once a position in the generated terrain has been decided, the generation of plant models can be followed with the placement of the vegetation. For the proof of concept application, a simplified method was implemented that made use of randomised positioning of vegetation models, rather than random vegetation itself. A series of low-polygon trees and bushes were created and exported as 3D models, which were then loaded at the start of the program. Once the terrain had been created, the vegetation was randomly assigned to various places on the terrain grid. However, if a particular part of the terrain was too high, low, submerged in water, or on a heavily inclined slope, then that area was rejected and ignored, the purpose being to prevent vegetation appearing in unrealistic locations. To reduce the problem of repetition, the vegetation was also rotated and scaled by a random value. The result was surprisingly effective; the plants appeared to have stochastic properties despite being pre-defined models, and it was only when the density of the vegetation was increased to the level of woodland when the repetition became noticeable. B. Procedural Generation of Buildings Most real-world environments include some sort of artificial structures. In a rural setting these might be scattered houses that make up only a fraction of the decorations of the terrain, with the majority of decorations being plants, whereas in urban settings this would be reversed with buildings providing the majority of virtual world decorations. If the level of detail required for buildings is relatively low, as would be the case in a flight simulator that depicts the virtual world from a high altitude, then simple geometric bodies can create adequate results if combined with suitable texture maps that hide the lack of actual detail in the geometry. The use of ‘split grammars’ [21] and ‘shape grammars’ [22] for describing architectural features allow the use of much more complex shapes and building structures, which can be intricately detailed [23]. At the greatest level of detail, even building interiors can be generated [24]. The placement of these artificial structures in the virtual world can reach great levels of complexity if the buildings form part of an urban environment [25]. 
These more complex settlements are created in a series of steps [26]: (a) first, a suitable road network is generated, effectively providing street maps that partition the terrain and constrain the placement of buildings; (b) this network is then used to direct the division of the terrain into lots, which may be partitioned further to generate building footprints; and (c) these footprints are then used as input for the generation of the buildings themselves. Due to time restrictions, attempts at implementing a more complex urban generation system had to be simplified. The method of distributing buildings was therefore nearly identical to the method of distributing plants, with a few distinct changes. Firstly, the building models were adjusted to have 'foundations', or basement levels. The purpose of this was to prevent the underside of the model showing, should the building be positioned on a slope. Secondly, rather than being rotated and scaled, which would be inappropriate for the majority of buildings, structures were assigned random 'height' and 'extension' values that copied parts of the building model above or to the side of the original, the purpose being to reduce repetition and to reduce the isolated feel that can be associated with solitary buildings. The results were acceptable, and would be especially fitting if applied to a small settlement of village size, but the environment as a whole lacked the structure and density associated with urban areas. The solution to this problem would be to redesign the placement system, and possibly the terrain generation system, from scratch, whilst taking into account the architectural shape grammars and road networks used in previous city generation applications.

VI. COLLISION DETECTION

Accurate detection of when an object hits the terrain surface is highly desirable in many applications, especially flight simulators. However, calculating precise polygon overlap between an aircraft and a landscape would be too computationally expensive, especially given the recursive and detailed nature of fractal terrain, which could result in thousands of checks per frame. Alternative methods were therefore implemented in an attempt to find a method that was both fast and accurate. Traditionally, the most common method of calculating collisions is through the use of bounding volumes, such as spheres or orientated cuboids, which can be positioned in place of a complex model and then be checked for overlap. They can also be used hierarchically (i.e. checking a large bounding volume first, followed by more precise checks), to give results that are both memory efficient and precise [27]. In this paper, a single axis-aligned bounding box was used to cover the aircraft. The trees, vegetation and buildings were covered by single oriented bounding boxes. Additionally, a single plane check was used to check whether the aircraft had hit the water surface. More complex and accurate methods, such as a series of bounding boxes, would be more appropriate for a final game or simulation, but simplicity was adhered to for the sake of shortening the debugging process and for achieving consistent results when testing the speed and accuracy of the collisions. For the terrain in this instance, bounding spheres were quickly deemed inappropriate, as they would fit awkwardly with the relatively flat polygons unless used at a high level of detail.
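A minimal Python sketch of the two simplest checks mentioned above (axis-aligned box overlap and the water-plane test) is given below; it is illustrative only, the class and field names are assumptions, and it does not cover the oriented boxes used for the decorations.

    class AABB:
        """Axis-aligned bounding box defined by its minimum and maximum corners."""
        def __init__(self, min_corner, max_corner):
            self.min = min_corner   # (x, y, z)
            self.max = max_corner   # (x, y, z)

        def overlaps(self, other):
            # Two axis-aligned boxes intersect only if their extents overlap on every axis.
            return all(self.min[i] <= other.max[i] and other.min[i] <= self.max[i]
                       for i in range(3))

    def hits_water(aircraft_box, water_height):
        """Single plane check: the aircraft touches the water if its box dips below the plane."""
        return aircraft_box.min[1] <= water_height

Testing an axis-aligned box against an oriented one additionally requires separating-axis tests, which is one reason the rotated boxes discussed below are more expensive to evaluate.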
Axis-aligned bounding boxes would be faster than spheres to calculate, due to the simpler calculations needed for every frame [27], and they could potentially be more accurate on less mountainous terrain due to the nearly-aligned nature of the landscape. The bounding boxes were applied by making use of the terrain data arrays; for every polygon point, a box was applied that would match the point’s height, and have a polygon’s length and width. An advantage of this method was that, as with the fractal terrain itself, the checks could be called recursively to achieve a higher level of detail (collision accuracy), at the cost of efficiency. For example, a bounding box could be applied every two polygons across, or bounding boxes of twice the width and length could be applied every four polygons across, resulting in significantly fewer calculations. During testing, it was found that, relative to the complexity of the fractal terrain, only a small number of bounding boxes were needed for the terrain collisions to be perceived as accurate, so program efficiency was not an issue. More bounding boxes were needed to maintain accuracy if the terrain was made to be mountainous. However, a scenario where more bounding boxes were needed than were possible with the terrain data array was not implemented. One observed point was that mountain peaks seemed to suffer occasionally from inaccurate collisions, probably due to the fact that this was the area with the most ‘space’ contained within the box. There did not seem to be a simple fix to this problem, as manually adjusting the bounding boxes used for mountain peaks was impractical and delivered mixed results. Nonetheless, axis-aligned bounding boxes were considered successful for this simulator due to their speed and overall accuracy. Before oriented bounding boxes can be discussed, it ought to be noted that since we are working in three dimensions, there are two sets of rotation that can be implemented. The first would align the boxes according to the way the landscape is facing (the yaw); this would appear to be a rotated square, if viewed from above. The second would align the boxes according to the slope of a polygon or terrain face (the pitch and roll). Performing a check on whether an axis-aligned box (the aircraft) collides with a rotated square (part of the terrain after being rotated once) would require a simple series of checks for each of the oriented box’s points; the collision detection is still being performed in two dimensions. However, after applying the second rotation, checks must be performed between two bounding boxes aligned on separate axes, and consequently the number of calculations rises steeply. Considering the number of bounding boxes on the terrain, it was predicted that this could potentially become an issue. Figure 5 Comparison of axis-aligned (top) and oriented bounding box method (bottom). Recursive calls lead to a more accurate terrain match. 7 Upon testing, it was found that performing only a single rotation on the bounding boxes gave very similar results to the axis aligned boxes, both in terms of efficiency and accuracy. This method carried the same problems and advantages of the axis aligned method. Upon rotating the boxes a second time, the accuracy improved somewhat and the problems associated with the mountain peak inaccuracies disappeared. 
However, the program was notably less efficient; attempting to apply the fully oriented bounding boxes to every terrain polygon slowed the program down to an unusable level (although whether such a level of accuracy is actually required for a flight simulator is questionable). Deciding whether it would be more accurate and efficient to use a small number of fully oriented bounding boxes or a large number of axis-aligned boxes for fractal terrain is a matter that requires further research and testing. The program could be further streamlined through the addition of Bounding Volume Hierarchies (BVHs). By splitting up the terrain area into large bounding volumes that in turn contain consecutively smaller bounding volumes, the number of collision checks carried out per frame could be substantially reduced. Since this particular program uses a square grid for the terrain, it would be logical for the checks to take the form of testing which side the aircraft falls on on an imaginary plane placed in the middle of the terrain grid. This is repeated to further divide the grid into quarters and eighths within the section in which the craft is located. Precise collision checks can then be carried out in the appropriate section. In this case, the efficiency of using BVHs depends upon the complexity of the landscape; a more complex landscape would benefit more from having BVHs implemented for collision detection, as a large number of calculations would be removed per frame, whereas if the landscape were only a few polygons in size, implementing BVHs would have little effect. If numerous aircraft are included, or buildings and objects overlap BVH boundaries, then the efficiency of BVHs becomes more difficult to calculate, and their use ought to be carefully considered. VII. FLIGHT SIMULATOR The simple flight simulator game created as a case study for the random world generation allows navigation and interaction with the randomised 3D environment. Thematically the game is set to take place on alien worlds. This could potentially offer a wider range of potential scenarios (i.e. the online virtual world Second Life is based on this philosophy) and thus offer a higher level of entertainment and enjoyability compared to ‘Earth’ based scenarios. While of course this type of representation does not automatically lead to exploratory, challenging and problem-based learning experiences, the opportunities for players to “define their learning experiences or pathways, using the virtual mediations within virtual worlds, has the potential to invert the more hierarchical relationships associated with traditional learning, thereby leading to more learner-led approaches based upon activities for example” [28]. Players can navigate intuitively inside the the alien worlds using mouse and keyboard inputs. As mentioned before, the controls were deliberately kept simple; the mouse is used to steer the aircraft, and two keys, ‘W’ and ‘S’, are used for acceleration and deceleration respectively. Players can also fire a weapon by pointing and clicking with the mouse. An overview of the flight simulator in operation is shown in Figure 6. Figure 6 Alien world flight simulator in operation A digital compass and a speedometer are implemented os overlays in the interface. The digital compass (see top left hand side of Figure 6) allows players to navigate inside the virtual environment using directional information. 
The speedometer representation (see top right hand side of Figure 6) was kept as simple as possible in order to leave maximum screen space available for the game. Additionally, a widget menu was implemented that allows players to change specific components of the 3D environment by right-clicking the mouse. The environmental components that can be modified include: terrain re-generation, weather alteration (i.e. rain, fog, sunny, etc.), and colouring of the grass, water and sky. An overview of the widget menu is shown in Figure 7. Figure 7 Flight simulator widget menu Options to change the music track or mute it entirely were also added to the widget menu. These controls were introduced to the user in the form of a simple menu 8 screen that appeared at the start of the game. The collision detection algorithm was based on axis-aligned bounding boxes for the terrain and the building structures (see section VI). In addition, based on the techniques described in section IV, the landscape creates the illusion that is infinite. An example screenshot of explosions generated when the aircraft collides with the ground is shown in Figure 8. Figure 8 Collision between the aircraft and the terrain To simulate the effect of collisions, explosions were incorporated into the game based on particle systems. Each particle effect (snow, rain, engine fumes and explosions) has a velocity value, rotation value, or transparency applied to it and for each collision, each particle is changed appropriately (such as a slight decrease in y position, for rain), and if it reaches a certain condition (such as rain falling too low), it is set to a new start position. VIII. INITIAL EVALUATION To acquire feedback on the finished core of the application, a self-contained executable file was supplied to two sets of users: a small Internet general discussion forum (remote usability testing), and a group of Coventry University students (hallway usability testing). The intention of these tests was primarily to gather information on the playability and enjoyability of the game, but also to discover potential technical problems. All of the end-users had some experience with games, and the vast majority described themselves as ‘gamers’. A few of those involved also had experience with games programming, or had some knowledge of the architecture behind creating a game. For both sets of users, the aim of the flight simulator project was presented and it was explained that the players should not expect a complete game, but rather a prototype. A. Remote Testing For the Internet forum users, the following set of questions was provided: (a) do the collisions seem accurate? (b) would you be interested in playing a fuller version of this game?, (c) what would you like to see added? (e.g. larger variety of landscapes, different controls) and (d) how large and varied were the environments? – the final question testing, whether the attempts at making the different environments varied was a success. A qualitative analysis was done with five users. Feedback was received some in direct reply to the questions, and some raising additional issues. Recorded feedback was very encouraging and all users agreed that the methods used were very useful for the creation of serious games applications, although important issues were pointed out. The answers to the second and third questions were positive and similar. Several users commented that they would play such a game, on the condition that further additions were made to the gameplay. 
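The per-frame particle update described above follows a simple pattern, sketched below in Python for the rain effect; this is an illustrative reconstruction with assumed field names and reset conditions, not the game's actual code.

    import random

    class Particle:
        def __init__(self, position, velocity, alpha=1.0):
            self.position = list(position)   # [x, y, z]
            self.velocity = list(velocity)   # units per frame
            self.alpha = alpha               # transparency, used for fading effects

    def update_rain(particles, floor_y, spawn_height):
        """Move each drop down slightly; respawn it once it falls below the floor."""
        for p in particles:
            for axis in range(3):
                p.position[axis] += p.velocity[axis]
            if p.position[1] < floor_y:                     # reset condition reached
                p.position[0] += random.uniform(-1.0, 1.0)  # re-scatter horizontally
                p.position[1] = spawn_height                # back to a new start position
                p.position[2] += random.uniform(-1.0, 1.0)

Engine fumes and explosion particles follow the same update loop, differing only in which attributes (velocity, rotation or transparency) are modified each frame and in their reset conditions.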
“It needs more to do but the engine is cool”, one user noted. In regards to the final question, reactions were mixed. One user commented that they “enjoyed exploring the worlds”, and mentioned how the use of colour made a lot of difference, but another noted that “the buildings look samey. They need more variation”. Additional comments were also made. One player complimented the “atmosphere” of the game. However, the collision detection was criticised by another player. Specifically, he stated “I sometimes crash when I drive too close to mountains. [The collision detection] is fine for the water and flat land though“. The other users claimed that the collision detection was acceptable. B. Hallway Testing Four students from the Faculty of Engineering and Computing, Coventry University were asked to partake in the second test group. Instead of asking the university students questions, they were asked to talk through what they were doing and how they felt as they played the game. Two students had some issues with the controls. Specifically, they found the delay between hitting a key and the aircraft movement difficult to get to grips with. It is worth-mentioning that after playing for long enough, the players adjusted to the issue. The object ‘popping’ due to terrain decorations being added to the scene was criticised by two players; one commented that “it’s nice that I can keep going forever, but it’s annoying that I can’t see the horizon properly”. Similar to the Internet test group, two players complimented the ‘feel’ of the virtual worlds. One admired the water and sky effects, and the other spent a fair amount of time recreating different landscapes. Despite making use of the repeated tile method for this test, none of the users were aware of the repetition, which meant that this method is successful and efficient for terrains that are explored by the user for less than five minutes. Further testing would be needed for the effectiveness of this method over longer periods of time, however. IX. CONCLUSIONS AND FUTURE WORK This paper discussed a number of methods that can be used to create realistic but also randomised 3D environments for serious games. These methods referred 9 to the automated generation of fractal terrains, vegetation and building structures. To prove the feasibility of the techniques, an interactive flight simulator has been implemented and evaluated. Initial results with two different types of user groups (remote and hallway) showed that overall the flight simulator is enjoyable, looks realistic for a gaming scenario and thus also has the potential to be used for the development of serious games. In the future, a classification regarding buildings and vegetation will be developed allowing for automatic random generation of larger urban environments. To improve the cognitive perception of the players, additional urban geometry will be generated automatically in the game such as: streets, pavements, signs etc., similar to the framework proposed by Smelik et al. [28]. Finally, scenarios will be developed and more evaluation studies will be performed with more users. ACKNOWLEDGEMENTS The authors would like to thank the Interactive Worlds Applied Research Group (iWARG) members for their their support and inspiration. A video that illustrates the application in action can be found at: http://www.youtube.com/watch?v=6G1NSALgSEY REFERENCES [1] Anderson, E.F., McLoughlin, L., Liarokapis, F., Peters, C., Petridis, P., de Freitas, S. Serious Games in Cultural Heritage, Proc. 
of the 10th Int’l Symposium on Virtual Reality, Archaeology and Cultural Heritage, VASTSTAR, Short and Project Proceedings, Eurographics, Malta, 22-25 September, 29-48, (2009). [2] Zyda, M. From visual simulation to virtual reality to games. IEEE Computer 38(9): 25-32, (2005). [3] Krause, D. Serious Games – The State of the Game, The relationship between virtual worlds and Web 3D, White Paper, Pixelpark Agentur, (2008). [4] Sawyer, B. Serious Games: Improving Public Policy through Game-based Learning and Simulation, Foresight and Governance Project, Available at: [http://www.seriousgames.org/images/seriousarticle.pdf], Accessed at: 26/10/2009. [5] Smelik, R.M., de Kraker, K.J., Groenewegen, S.A., Tutenel, T., Bidarra, R. A Survey of Procedural Methods for Terrain Modelling, Proc. of the CASA Workshop on 3D Advanced Media In Gaming And Simulation (3AMIGAS), (2009). [6] Shankel. J. Fractal Terrain generation – Fault Formation, Game Programming Gems, Charles River Media, 499- 502, (2000). [7] Perlin, K. An Image Synthesizer. Proc. of ACM SIGGRAPH ‘85, 287-296, (1985). [8] Mandelbrot, B. The Fractal Geometry of Nature, W H Freeman, New York, (1983). [9] Fournier, A., Fussel, D., Carpenter, L. Computer Rendering of Stochastic Models. Communications of the ACM, 25(6): 371-384, (1982). [10] Miller, G.S.P. The definition and rendering of terrain maps, Proc. of ACM SIGGRAPH ‘86, ACM Press, 39- 48, (1986). [11] Peitgen, H. Saupe, D. The Science of Fractal Images, Springer-Verlag, (1998). [12] Kelley, A.D., Malin, M.C., Nielson, G.M. Terrain simulation using a model of stream erosion. Proc. of ACM SIGGRAPH ‘88, ACM Press, 263-268, (1988). [13] Musgrave, F.K., Kolb, C.E., Mace, R.S. The synthesis and rendering of eroded fractal terrains. Computer Graphics 23(3): 41-50, (1989). [14] Belhadj, F. Terrain Modeling: A Constrained Fractal Model. Proc. of the 5th Int’l Conference on Computer Graphics, Virtual Reality, Visualisation and Interaction in Africa, Grahamstown, South Africa, 197-204, (2007). [15] Prusinkiewicz, P., Lindenmayer, A. The Algorithmic Beauty of Plants, Springer-Verlag New York, Inc, (1990). [16] Hecht, E. Optics, 2nd Edition, Addison Wesley, 603, (1987). [17] Di Giacomo, T., Capo, S. and Faure, F. An interactive forest. Proc. of the 2001 Eurographics Workshop on Computer Animation and Simulation, 65-74, (2001). [18] Guerraz, S., Perbet, F., Raulo, D., Faure, F., and Cani, M-P. A Procedural Approach to Animate Interactive Natural Sceneries. Proc. of the 16th Int’l Conference on Computer Animation and Social Agents (CASA 2003), 73-78, (2003). [19] Lintermann, B. and Deussen, O. Interactive Modeling of Plants. IEEE Computer Graphics and Applications 19 (1): 56-65, (1999). [20] Wells, W.D. Generating Enhanced Natural Environments and Terrain for Interactive Combat Simulations. Doctoral Dissertation, Naval Postgraduate School, Monterey (CA), (2005). [21] Wonka, P., Wimmer, M., Sillion, F., Ribarsky, W. Instant Architecture. ACM Transactions on Graphics 22(3): 669-677, (2003). [22] Müller, P. Wonka, P., Haegler, S., Ulmer, A. Van Gool, L. Procedural Modeling of Buildings. Proc. of ACM SIGGRAPH 2006, ACM Press, 614-623, (2006). [23] Havemann, S. Generative Mesh Modelling. Doctoral Dissertation, Technische Universität Braunschweig, (2005). [24] Hahn, E., Bose, P., Whitehead, A. Persistent Realtime Building Interior Generation. Proc. of Sandbox Symposium 2006, 179-186, (2006). [25] Greuter, S., Parker, J., Stewart, N., Leach, G. Real-time Procedural Generation of ‘Pseudo Infinite’ Cities. Proc. 
of GRAPHITE 2003, ACM SIGGRAPH, 87-94, (2003). [26] Parish, Y.I.H., Müller, P. Procedural Modeling of Cities. Proc. of ACM SIGGRAPH 2001, ACM Press, 301-308, (2001). [27] Gottschalk, S., Lin, M.C., Manocha, D. OBBTree: A Hierarchical Structure for Rapid Interference Detection, Proc. of SIGGRAPH ‘96, ACM Press, 171-180, (1996). [28] De Freitas, S. Serious Virtual Worlds - A scoping study, JISC, (2008), Available at: [http://www.jisc.ac.uk/media/documents/publications/ser iousvirtualworldsv1.pdf], Accessed at: 26/10/2009. [29] Smelik, R.M., Tutenel. T., de Kraker, K.J., Bidarra, R. A procedural Terrain Modelling Framework, Poster Proc. of the Eurographics Symposium on Virtual Environments EGVE08, 39-42, (2008). 10 Interactive Virtual and Augmented Reality Environments 48 8.2 Paper #2 Noghani, J., Anderson, E., Liarokapis, F. Towards a Vitruvian Shape Grammar for Procedurally Generating Classical Roman Architecture, Proc. of the 13th International Symposium on Virtual Reality, Archaeology and Cultural Heritage VAST 2012, Short and Project Papers, Eurographics, Brighton, UK, 19-21 November, 41-44, 2012. Contribution (30%): Design of the architecture and advice on the implementation. Collaboration on writing of the paper. Interactive Virtual and Augmented Reality Environments 53 8.3 Paper #3 O'Connor, S., Liarokapis, F., Peters, C. An Initial Study to Assess the Perceived Realism of Agent Crowd Behaviour in a Virtual City, Proc. of the 5th International Conference on Games and Virtual Worlds for Serious Applications (VS-Games 2013), IEEE Computer Society, Bournemouth, UK, 11-13 September, 85-92, 2013. Contribution (30%): Collaboration on the design of the architecture and advice on the experimental part. Collaboration on writing of the paper. An Initial Study to Assess the Perceived Realism of Agent Crowd Behaviour in a Virtual City Stuart O'Connor Interactive Worlds Applied Research Group (iWARG) Coventry University Coventry, CV1 5FB, UK oconno13@uni.coventry.ac.uk Fotis Liarokapis Interactive Worlds Applied Research Group (iWARG) Coventry University Coventry, CV1 5FB, UK F.Liarokapis@coventry.ac.uk Christopher Peters HPCViz, CSC KTH Royal Institute of Technology Stockholm, Sweden chpeters@kth.se Abstract— This paper examines the development of a crowd simulation in a virtual city, and a perceptual experiment to identify features of behaviour which can be linked to perceived realism. This research is expected to feedback into the development processes of simulating inhabited locations, by identifying the key features which need to be implemented to achieve more perceptually realistic crowd behaviour. The perceptual experimentation methodologies presented can be adapted and potentially utilised to test other types of crowd simulation, for application within computer games or more specific simulations such as for urban planning or health and safety purposes. Keywords – crowd simulation, perceptual studies, artificial intelligence, agent behaviour, virtual environments. I. INTRODUCTION Simulating vast crowds of agents within a virtual environment is a challenging endeavour from a technical perspective [1]; however it becomes even more difficult when the subjective nature of viewer perception is also taken into account. Agent behaviour is the product of artificial intelligence systems working in tandem; nevertheless the sophistication of these systems is not a guarantee of achieving believable behaviour [2]. 
Within a medium such as computer games that require viewer immersion [3], the perceived realism of agent behaviour is a crucial factor for consideration. The specific features of implemented behaviours may have a great impact towards creating a believable scene. Crowd simulation is the process of populating a virtual scene with a large number of intelligent agents that display behaviour in a manner not dissimilar from a real person within the same context [4]. Realism in the context of crowd simulations and computer games has been given multiple definitions over time, making it a difficult factor to define for measurement. There are two presented definitions [5], one considering plausibility in terms of the graphically quality and the other considering plausibility in terms of the similarities to reality. Recent research considers these types of realism within predefined virtual environments, as well as the perceptual effects [6]. Perceived realism is one definition for realism within the context of simulations and computer games. It is the plausibility of an aspect or a feature when perceived by a human viewer, and can have a varying level of intensity depending on whether it is perceptually realistic or not. As it is a general definition it can be applied to different features such as 3D models or lighting, but for the purposes of this research it is applied to aspects of agent crowd behaviour. Since this type of realism is perceptually based, it can be measured by applying psychophysical testing methodologies. In this paper, an overview of the types of perceptual experiments that can be utilised to assess the perceived realism of agent crowd behaviour within a virtual simulation are presented, in addition to preliminary results. The core research challenge is how to assess the perceived realism of agent crowd behaviour within a virtual urban environment. To carry out the psychophysical experimentation a platform was developed in the form of the urban crowd simulation utilising the C++ programming language and the OpenGL graphics library, both of which are commonly used for developing computer games. For this simulation, behaviour features are added consistently through the methodology of analysis, synthesis and perception [7], allowing for the definition of parameter spaces and customisability within the platform specifically for perceptual experimentation. As the perceived realism of agent crowd behaviour is evaluated through the features that shape behaviour traits, for example velocity type, behavioural annotation and so on, graphical complexity is not essential to the core of the research. It is highly important that a crowd simulation is perceived to be realistic by human viewers or else plausibility will be lost, which is especially true for computer games that require a level of immersion [8]. This research investigates the perceived realism of agent behaviour within an urban environment through perceptual experimentation, to identify features of behaviour that are the most effective for ensuring the perceptual plausibility of the virtual scene. The paper is structured as follows. Section II presents related research on crowd simulation and computer games. The methodology is detailed in Section III and Section IV describes the implementation of the urban crowd stimulation. The perceptual experiments and the psychophysical methods, along with preliminary results are outlined in Section V. Finally, Section VI provides conclusions and the direction of future research. 
II. BACKGROUND Crowds can be simulated for numerous purposes such as health and safety, where the simulated agents are utilised to test for current or possible dangers [9]. The identification of these dangers is used to improve the current system or inform the development processes if the simulation is pre-emptive. There are professionally developed simulations for crowd management, evacuation procedures, etc., and research [10] has been conducted into evaluating the movement paths and density of virtual crowds for urban planning purposes. These types of simulations require virtual realism, whereby the simulation must be as close to reality as possible [11], otherwise inaccuracies arise that can lead to significant issues. This is one of the many definitions of realism in the context of simulations and computer games. In others, such as [12], aesthetic realism is set apart from realism of representation. While this virtual realism is important for a serious simulation to achieve its purpose [13], other media (e.g. entertainment) require a different type of realism that is not entirely dependent on mimetic representation [14]. Computer games in particular require a sense of immersion [3] within the game world to be successful, as discussed in [15], where it is also conveyed that, for immersion, total photo and audio realism is not required for a sense of the world being real and complete. This is where the idea of perceived realism is relevant, as it acts as a gauge for the perceptual plausibility of features within the simulation or computer game. Game designer Chris Crawford wrote that “games represent a subset of reality” [16], which can be considered true in terms of the subjective nature of perceived realism. This can help to ensure that the virtual scene is perceived to be real, potentially aiding immersion and flow [17]. Crowds have been a recurring theme in computer games over the past decade [18] and continue to be so in the present and foreseeable future. Computer games are at the forefront of consumer media, with game launches often breaking sales records. The game Call of Duty: Modern Warfare 3 [19], for example, broke records when it sold over 6.5 million copies in the first twenty-four hours of its release in 2011. As a platform, games offer intuitive yet technically advanced visual and interactive simulation, so it is not surprising that games have incorporated crowd-based systems within their gameplay for innovation [20]. In fact, it is often the case that technologies from the games industry are utilised for research purposes and vice versa. One example is agent-based crowd simulation in airports using games technology [21], research aimed at simulating the navigational traits of agents within a specific public location while maintaining an interactive frame rate. Simulating crowds in games is not specific to a single genre either, with the action-adventure title Assassin's Creed 3 [8], the stealth shooter game Hitman: Absolution [22], the city-builder title Tropico 4 [23] and the open-world shooter set within a city, Grand Theft Auto IV [24], all having some crowd elements in their gameplay. In Hitman, crowds are utilised for blending into the game world to assassinate targets or hide from pursuers (Figure 1). Figure 1 Crowds in the China Town level of Hitman: Absolution [22] In Tropico, the crowds of agents react to the changes made within the environment by the player or from other events.
In Grand Theft Auto, crowds are a living part of the city, simulated to act as normal pedestrians. The latest Assassin's Creed title takes crowd simulation in games forward by utilising the perceived realism of the in-game crowds as a core gameplay mechanic for its online multiplayer. In a multiplayer match there are only a few distinct character models for the many agents that populate the crowds within the map; the players must try to hunt each other, but the catch is that each player also looks like one of these character models (Figure 2). Figure 2 A multiplayer match showing crowds and assassinations in Assassin's Creed 3 [8] This means that a player is often identified because they do not react in a manner that is perceptually realistic to other players, whereas the agents typically do. As is described in research towards understanding realism in computer games through phenomenology [25], for the game world to be perceived as real it must react in a realistic way, an especially relevant statement when considering crowd behaviour. Assassin's Creed's gameplay takes this sentiment to its core, showing the significance of perceived realism and crowd simulation within computer games. In The Art of Computer Game Design [16] it is highlighted that the nature of human fantasy can turn an objectively unreal situation into a subjectively real situation, which in essence may indicate that the perceived realism of a virtual scene is a highly important aspect that is set apart from the virtual realism. Furthermore, within research into immersion and presence in computer games [3], it is noted that areas for further research include the links between immersion and perception, showing that investigating the perceived realism of agent crowd behaviour within a virtual city is a viable line of enquiry. III. METHODOLOGY To assess the perceived realism of agent crowd behaviour, a general three-stage methodology was employed. This allowed for the development of the urban crowd simulation as an iterative process, meaning features could be added over time and perceptually tested. This forms a cycle allowing for a corpus of data to be collected, while at the same time adding more sophistication to the simulation. There are three distinct aspects to this methodology: analysis, synthesis and perception [7], as outlined below:
• Analysis: Identify a feature and inform algorithm construction, by analysing real-world and similar instances of crowd behaviour.
• Synthesis: Synthesise a new simulation with further refinement and the behaviour-impacting feature that was identified in analysis.
• Perception: Conduct the psychophysical experiment for gauging the perceived realism values of the added feature.
As the methodology was employed, the most obvious features were identified as part of analysis. The first core feature distinguished in analysis was the varying velocity of agents, because observation of reality readily shows that pedestrians move at different rates. This may seem like a simplistic choice, but each feature carries its own depth of parameter space for customisability. In this instance, what is the maximum velocity? What is the minimum velocity? Is there a specific velocity range that is most effective? Should the distribution of velocities be closer to the maximum or minimum? This type of methodology therefore allows each feature to be added and psychophysically tested individually, enabling perceptual study before other features are added to the system.
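To make this parameter space concrete, the following minimal C++ sketch shows one way such a feature could be represented and sampled per agent. The structure, names and the simple quadratic bias used for the distribution are illustrative assumptions, not the simulation's actual code.

#include <random>

// Hypothetical parameter space for the varying-velocity feature.
struct VelocityFeature {
    float minVelocity;   // lower bound of the velocity range (units per second)
    float maxVelocity;   // upper bound of the velocity range
    float distribution;  // 0 = skewed towards the minimum, 1 = skewed towards the maximum
};

// Draw one agent speed from the configured range, biasing the draw
// towards whichever end of the range the distribution value requests.
float sampleAgentSpeed(const VelocityFeature& f, std::mt19937& rng)
{
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    float u = uniform(rng);
    float towardsMin = u * u;                          // clusters values near 0
    float towardsMax = 1.0f - (1.0f - u) * (1.0f - u); // clusters values near 1
    float biased = (1.0f - f.distribution) * towardsMin + f.distribution * towardsMax;
    return f.minVelocity + biased * (f.maxVelocity - f.minVelocity);
}

Each configuration shown to participants would then correspond to one such parameter value set, which is the kind of setting the experiments described later place on a low-to-high stimulus scale.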
After the feature is identified in analysis, it is then implemented into the urban crowd simulation with the specific parameter spaces and customisability required. The urban crowd simulation is discussed in detail in Section IV. In the case of varying velocity parameter spaces the minimum and maximum velocity was added, along with a value to control the distribution of velocities within the agent populous. The output stimuli produced in this stage are video clips of the same virtual scene with the different configurations required for the experimentation methods. These configurations are generally the different parameter value set-ups for the newly added feature but can include different features as required. Once the output from the synthesis stage is acquired the primary psychophysical experiment can be conducted as part of the perception stage. The aim of the experiment is to acquire the highest level of perceived realism for the newly implemented feature so that it can be linked to a specific configuration and set of values. This will allow for the setup to be replicated, helping to ensure perceptual plausibility can be obtained in other instances. There are two other experiments that have different aims. The first, aims to identify the optimum number of features required in a simulation before it becomes overall perceptually plausible in terms of crowd behaviour. The other test will allow for the features to be ranked with regards to their overall effectiveness at implementing a sense of perceptual realism for crowd behaviour within a virtual scene. These two tests are not primary and require multiple features to be implemented to be successful. As such, they are not conducted at each iteration of the methodology but only after several new features have been added. An online survey platform has been developed for the purposes of conducting the perceived realism experiments on large numbers of participants. It is currently in the prototype stage but has been utilised for a pilot study into the varying velocity feature, which is covered in Section V. These experiments are aimed at gauging the perceived realism values for features within the simulation to allow a set of guidelines to be shaped with the corpus of data that will allow the developers of other simulations or computer games to implement high quality of perceived realism in terms of the agent crowd behaviour. IV. URBAN CROWD SIMULATION The main purpose for developing the urban crowd simulation was to create a platform with alterable parameters capable of customising agent behaviour for the purposes of perceptual evaluation. Since the primary aim is to probe human perception, the standard modelling and behaviour approaches had to be altered to accommodate the fact that more stimuli were needed than just the configuration that appeared most realistic to the developer. The real-time simulation of crowds has been conducted using a variety of approaches. The most common methods involve employing a series of models and algorithms working in tandem to animate each agent. These include decision making [26], pathfinding navigation [27], local steering mechanics [28] and an agent perception system [29]. Social forces models [30] can also be utilised to enhance crowd believability under certain situations. The urban crowd simulation developed as part of this research implements a range of these techniques for simulating agents. 
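As a rough illustration of how such components can work in tandem each frame, the C++ sketch below combines waypoint (path) following with a simple separation force. The names, radii and constants are hypothetical stand-ins, not the paper's actual implementation.

#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

// Hypothetical per-frame agent update: follow the next waypoint produced by
// the path planner, while pushing away from nearby agents (separation).
struct Agent {
    Vec2 position{0.f, 0.f};
    Vec2 velocity{0.f, 0.f};
    std::vector<Vec2> path;   // waypoints, e.g. from A* over the node graph
    float speed = 1.4f;       // preferred walking speed (units per second)

    void update(float dt, const std::vector<Agent>& neighbours) {
        if (path.empty()) return;                                // nothing to follow yet
        Vec2 to = { path.front().x - position.x, path.front().y - position.y };
        float dist = std::sqrt(to.x * to.x + to.y * to.y);
        if (dist < 0.5f) { path.erase(path.begin()); return; }   // waypoint reached

        Vec2 desired = { to.x / dist, to.y / dist };             // path-following direction
        for (const Agent& other : neighbours) {                  // simple separation force
            Vec2 away = { position.x - other.position.x, position.y - other.position.y };
            float d = std::sqrt(away.x * away.x + away.y * away.y);
            if (d > 0.001f && d < 1.0f) {                        // inside the personal-space radius
                desired.x += away.x / (d * d);
                desired.y += away.y / (d * d);
            }
        }
        float len = std::sqrt(desired.x * desired.x + desired.y * desired.y);
        if (len < 0.001f) return;                                // forces cancelled out this frame
        velocity = { desired.x / len * speed, desired.y / len * speed };
        position.x += velocity.x * dt;
        position.y += velocity.y * dt;
    }
};

In a real system the decision making and path planning steps would run less frequently than this local steering update, which is why they are kept outside the sketch.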
While the general methodology adds behaviour orientated features to the system, a base containing the virtual urban environment and the core AI elements was still required. Development of the urban crowd simulation is conducted using the C++ programming language and the OpenGL graphics library. The three core components in the urban crowd simulation are discussed in the following subsections. A. Procedural City Generation The first major aspect considered for the urban crowd simulation was the virtual urban environment. A procedural approach was used to generate the virtual city, allowing for the possibility of multiple layouts and setups. Procedural generation means that the city is automatically created based on a series of predefined rules for the general shape, structure and layout. It is possible to generate many different layouts, due to the parameterised nature of the approach. At the same time it is also possible to record a specific virtual environment if needed for experimentation purposes. The main benefit of procedurally generating the virtual city is that it allows for substantial complex geometry in terms of the buildings and roads, without the need to manually place them. Figure 3 A procedurally generated city in the urban crowd simulation The geometry for the architecture within the virtual city is rendered using OpenGL. For the urban crowd simulation, a procedural city modelling open source toolkit [31] written in C++ was utilised. The geometry for the architecture and layout within the virtual city is defined using the toolkit. The layout of city is produced by land generation rules, which produce templates from subdividing quads and triangles. These templates are populated with the urban architecture models from the geometric generation rules in order to create the virtual city (see Figure 3). Other features initialised as part of the procedural generation routine include materials, light sources and camera controls. The generated virtual city is very large at around 100km² and includes three zone types: commercial, residential and industrial. Given the approach above for generating the urban environment, agents are introduced to populate it. B. Core AI Components There are four core AI components implemented in the urban crowd simulation: decision making, pathfinding, steering and perception. These support the real-time simulation of crowds of agents. The core AI components are separate from the behaviour-oriented features, as they only allow for the most basic elements of operation. Using the core systems an agent can perceive, think and act [32], to select destinations and navigate through the virtual environment. Each agent is updated on a frame-by-frame basis and is modelled as an individual entity, with its own data structure containing key variables. The decision making system is a highly important aspect in any artificial intelligence system, as it allows for the selection of a specific behaviour or action from a range of possible behaviours or actions. The decision making system discerns which of these is the most appropriate to choose at that given moment. Finite-state machines were implemented as the decision making mechanism for agents in the simulation. Currently agents follow the main paths within the environment. However the approach is extensible, so more states can be added to accommodate new behaviours. 
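A finite-state machine of this kind can remain very small while staying open to extension. The sketch below is purely illustrative (hypothetical state names) of how further states could later be slotted in alongside path following.

// Hypothetical finite-state machine for an agent's decision making.
// Only path following is active at present; more states can be added.
enum class AgentState { ChoosingDestination, FollowingPath, Idle };

AgentState nextState(AgentState current, bool hasPath, bool reachedDestination)
{
    switch (current) {
    case AgentState::ChoosingDestination:
        return hasPath ? AgentState::FollowingPath : AgentState::ChoosingDestination;
    case AgentState::FollowingPath:
        return reachedDestination ? AgentState::ChoosingDestination : AgentState::FollowingPath;
    case AgentState::Idle:   // placeholder for behaviours added in later iterations
    default:
        return AgentState::ChoosingDestination;
    }
}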
The main purpose of pathfinding is to plan a path for an agent from its current location to the next selected location, as resolved by its decision making system. A* pathfinding was implemented to achieve this [27]. Connected nodes are defined for the major paths within the virtual environment. The A* algorithm calculates the path in the form of a list of nodes from the agents starting location to its destination location according to a number of heuristics. A perception system allows an agent to sense its local environment. The sophistication of the perception system is highly dependent upon the AI systems and the features available to utilise the local data. A simple yet common approach is to designate a radius around the agent to act as its locally accessible neighbourhood. This is the type of system implemented for the urban crowd simulation. Steering allows an agent to navigate in a local and reactive manner to dynamic obstacles. There are multiple types of steering mechanics that are suited to different purposes [28]. Here, crowd path-following behaviour has been implemented. This includes path-following and separation mechanics so that agents follow the short paths between nodes as part of the calculated A* main path. They also, have a degree of separation to prevent them clustering and forming large masses. C. Quantitative Evaluation Given these four core AI components and the virtual environment, it is possible to simulate crowds of agents in real-time within an urban context (see Figure 4). Some behaviour-orientated features were specified as a focus of interest for the analysis. These features require parameter space and customisability for perceptual evaluation. In this work, agent velocity is the main feature, which has parameter space ranging from minimum to maximum velocities, as well as velocity distribution. A mechanism for behavioural annotation has been implemented in preparation for the identification of future features placed in the environment, such as pedestrian crossings, roads and stationary positions. These features will be embedded within cells in the virtual world in order to alter the behaviour of nearby agents or influence them when detected by their perception systems. For example, in the context of the current focus on agent velocities, the detection of a road could cause the agent to increase velocity in order to cross it faster. It is hoped that when these annotations are implemented they will allow a clearer contextual relationship between the agents and their local environment that can be studied as part of the perceived realism experiments. Figure 4 The urban crowd simulation displaying crowds of agents While the next specific behaviour orientated feature to be implemented will be identified in the analysis step, there are several prominent feature considerations that will be considered for incorporation in future models. These could include physiological aspects such as age, height and weight, and psychological aspects such as emotions, internal states and predispositions. These features will be visually represented. Other features will almost certainly touch upon context related considerations and environmental awareness, adding further levels to the decision making processes in order to give agents refined objectives. These inclusions will add more depth and sophistication to agent behaviour but as a gradual process that can be examined, as outlined in the methodology. V. 
PERCEIVED REALISM EXPERIMENTS At the core of the experimental methodology are a number of perceptual experiments that permit the exploration of perceived realism based on the parameter spaces for the implemented crowd simulation features. As the main purpose is to evaluate the plausibility of crowd behaviour, the aim is to identify thresholds for parameters and features that can produce credible virtual scenes. A corpus of data is generated in this stage which is analysed in order to rank the behaviour orientated features on their effectiveness, discern the optimum number of features for perceptual plausibility in terms of the crowds and discover the most effective configuration values for the parameters and customisability of the features. The perceptual experiments utilise psychophysical methods for acquiring this threshold data. “The art of psychophysics is to formulate a question that is precise and simple enough to obtain a convincing answer” [33], such that it is possible to study the perceptual effects of particular physical dimensions. To this end it is possible to investigate the limits of visual perception by parametrically varying the stimuli within a virtual scene, with the aim of measuring the thresholds and levels of realistic plausibility. The link between the level of stimuli and the subjective nature of the human response is known as the psychometric function. A. Psychophysical Methods There are various psychophysical experimental methodologies ranging from the three classical methods of limits, constant stimuli and adjustment, to adaptive methods which include staircase and magnitude estimation. Here three main psychophysical methods are utilised: the constant stimuli procedure, the adjustment procedure and the staircase procedure. The constant stimuli procedure is a classical psychophysical methodology where participants are asked to perceptually evaluate a stream of different levels of stimulus that are randomly shown rather than being presented in a given order such as ascending or descending. The adjustment procedure consists of participants taking control of the level of the stimulus in order to identify the detectable threshold. Finally, the staircase procedure is an adaptive type of psychophysical methodology in which the stimulus is constantly adapted to the individual participant. It involves starting at a high level of stimulus and then reducing the stimulus until the participant can notice the change, at which point there is a reversal of the staircase and the stimulus is increased until the participant notices again. Research comparing the constant stimuli classical method to the staircase up-down adaptive method [34] has found that while both have the same accuracy of results, the staircase procedure has some advantages by automatically setting the dynamic range for the psychometric function. This is the reason why the staircase procedure is the primary experimental methodology employed in this work. B. Experiments Three experiments are presented as part of this research. The first and primary experiment utilises the staircase procedure in order to establish the perceptual thresholds and perceived realism levels of a feature within the simulation. The second experiment utilises the adjustment procedure to rank the features based on their perceptual effectiveness. The third experiment utilises the constant stimuli procedure to determine the threshold and most effective number of features required for perceptual plausibility. 
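The adaptive logic behind the staircase procedure can be summarised in a short C++ sketch. The step size, stopping rule and structure below are assumptions chosen for illustration rather than the survey platform's actual parameters.

// Hypothetical one-up/one-down staircase controller: the stimulus level is
// lowered while the participant still rates the clip as realistic, and raised
// again after a reversal, converging on the perceptual threshold.
struct Staircase {
    float level = 1.0f;      // current stimulus level on the underlying 0..1 scale, starting high
    float step = 0.1f;       // how far the level moves after each response
    bool descending = true;
    int reversals = 0;

    // Feed in one participant response (true = "the behaviour looked realistic").
    void respond(bool perceivedRealistic) {
        bool shouldDescend = perceivedRealistic;    // keep reducing while it still looks plausible
        if (shouldDescend != descending) {          // a change of direction counts as a reversal
            descending = shouldDescend;
            ++reversals;
        }
        level += descending ? -step : step;
        if (level < 0.0f) level = 0.0f;
        if (level > 1.0f) level = 1.0f;
    }

    bool finished() const { return reversals >= 6; }  // stop after a handful of reversals
};

A session would repeatedly present the video clip corresponding to the current level and feed the participant's realistic/unrealistic judgement back into respond() until finished() reports enough reversals.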
In these experiments the video clips obtained in the synthesis stage of the general methodology are utilised as stimuli and presented in the manner dictated by the respective psychophysical method. An online survey platform has been developed (see Figure 5) in order for a large number of participants to perceptually evaluate the footage. The platform automates the display of video clips to the participant and provides a slider at the bottom of the page in order to collect ratings of realism. Data collected from the participants includes: initials, date of birth, gender and primary language. The number of video clips and the order in which they are shown varies according to the specific feature and type of experiment. Figure 5 The survey platform prototype showing a video clip Data is collected from the experiments in the form of a perceived realism value between 0 and 1 for each configuration shown, where a value of 1 maps to completely realistic and 0 maps to completely unrealistic. The manner in which the perceptual thresholds and perceptual plausibility levels are calculated varies between experiments. Generally each configuration shown is on a scale from low to high stimuli. Here, when the perceived realism value is above 0.5 we consider the stimuli to be perceptually realistic. The general approach for calculating the thresholds is to consider the lowest and highest stimuli on the underlying scale that are perceptually plausible. From these thresholds the mean stimuli can also be calculated, which should be close to the optimum configuration. This is not necessarily always the case however. The optimum perceptually plausible level is identified as the highest average positive perceived realism response from the participants. This is the average perceived realism value closest to 1, which can then be linked to a specific configuration. Depending on the experiment type this configuration can be parameter space values, a specific feature or even a number of features. By completing these experiments, we endeavour to identify thresholds for features in addition to optimal configurations in order to create perceptually plausible crowd behaviour. In the experiments the results will be treated fairly as perception is being tested. In terms of the bias, it may become apparent that some groups perceive the stimuli in different ways to other groups. This data will be noted and will become an important consideration in the final analysis. If this causes a skewer in the results then modifiers can be utilised for the groups that are not represented equally, in order to combat the bias and ensure the optimum configuration is accurate as an average for all groups. A small number of outliers in later results with large pools of participants is to be expected and will not be considered an anomalous condition, unless presented in a relevant density. Results from the experiments will be statistically analysed using the general linear model, with X representing the configurations starting with low intensity stimuli and with Y representing the perceived realism value responses from participants. C. Perceived Realism Pilot Study The staircase based primary experiment described above was conducted in pilot study form. Three participants were shown a series of video clips with different velocity configurations, using a prototype of the online survey platform. 
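As a brief aside before the pilot results, the threshold and optimum calculation outlined above could be expressed along the following lines. The data structures and names are hypothetical and simply mirror the description given earlier, with ratings above 0.5 treated as perceptually plausible.

#include <algorithm>
#include <vector>

// One configuration on the low-to-high stimulus scale, with the average
// perceived realism value (0..1) collected from participants for it.
struct ConfigResult {
    float stimulus;      // position on the underlying low-to-high scale
    float meanRealism;   // average participant rating, 0 = unrealistic, 1 = realistic
};

struct RealismSummary { float lower, upper, optimum; };

// Thresholds are the lowest and highest stimuli rated above 0.5;
// the optimum is the stimulus with the highest average rating.
RealismSummary summarise(std::vector<ConfigResult> results)
{
    std::sort(results.begin(), results.end(),
              [](const ConfigResult& a, const ConfigResult& b) { return a.stimulus < b.stimulus; });
    RealismSummary s{ 0.f, 0.f, 0.f };
    float best = -1.f;
    bool foundLower = false;
    for (const ConfigResult& r : results) {
        if (r.meanRealism > 0.5f) {                 // perceptually plausible configuration
            if (!foundLower) { s.lower = r.stimulus; foundLower = true; }
            s.upper = r.stimulus;
        }
        if (r.meanRealism > best) { best = r.meanRealism; s.optimum = r.stimulus; }
    }
    return s;
}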
While within the crowd simulation system agent velocity is represented with a directional vector as well as a speed component, in the case of the results the agent velocity is essentially just the speed component as the direction would have no purpose for representation. The speed component is measured as a decimal of a single unit which is one. The speed is determined on a per second basis. This single unit can be altered depending on specific requirements, however within the urban crowd simulation a unit is the equivalent of 2.5 meters. While the small test pool means that the results presented here are preliminary, the purpose of the study was to test the experimental approach and its viability for collecting larger, statistically valid samples. The underlying scale of stimuli for the velocity feature consisted of different configurations. The low end of the scale consisted of configurations with a small velocity range and distribution toward the minimum velocity. The high end of the scale consisted of configurations with a large velocity range and a distribution towards the maximum velocity. The order in which configurations were presented was adapted according to the psychophysical method but the general approach was to start at a high level and reduce it until the participant rated the behaviour as being unrealistic. At this point the stimuli would be increased until the participant found it to be realistic again. This was repeated several times to identify thresholds and perceived realism values. As the velocity feature had two distinct factors, velocity range based on minimum and maximum velocities and velocity distribution, different passes were required in order to ensure both were evaluated properly. Firstly, the range was evaluated starting at a high stimuli with a large velocity range and being reduced to low stimuli of small velocity range and so on. Secondly, the velocity distribution was evaluated again starting at a high stimuli with distribution towards maximum velocity and then reduced and reversed again. The aggregate results from this experiment are as follows: • The normalised velocity range was 0.3, based on an average perceived realism value of 0.82. • Normalised velocity range thresholds were 0.2 and 0.5. • The normalised velocity distribution was 0.5, based on an average perceived realism value of 0.85. • Normalised velocity distribution thresholds were 0.3 and 0.7. Even though these results cannot be considered accurate due to the small number of participants and therefore do not allow conclusions related to the agent velocity feature, the pilot study was appropriate for testing the viability of the experimental method. Our intention is to conduct a larger scale study to obtain statistically relevant data which can then be used to provide more useful information about the potential role of the features in viewer perception. VI. CONCLUSIONS AND FUTURE WORK Details have been presented about ongoing research towards assessing the perceived realism of crowd behaviour in a virtual city. The methodology consists of the analysis of real-world and related instances of crowd behaviour, synthesis to replicate the crowd behaviour through features and output experimental stimuli, and perceptual experimentation to investigate participants subjective views of the realism of agent crowd behaviour within an urban context. 
It is based on the insight that there are multiple types of realism each with their own merits and that the realism of simulations and computer games should not be judged only by aesthetic means, but also by more in-depth methods that consider the content of the virtual world. Crowds not only add a sense of life and realism to virtual environments but can be used as tools for serious and entertainment purposes. It's therefore important that they can be properly evaluated to highlight specific features that make them so effective with respect to viewers. The research is envisaged to feedback into improving the development processes of simulations and applications implementing virtual crowds, especially those within urban environments. It has been shown that computer games in particular are developing at an accelerated rate and it is hoped they could benefit from the outcomes of this research. In the broader sense, these may include identifying human mental models of crowd behaviour through perceptual experimentation, thus adding new perspectives to enable AI systems to better predict or understand real behaviour. Guidelines will be constructed through data from the perceptual experiments, which will be useful for identifying key features of crowd behaviour for implementation. The simulation thus far is a prototype in order to support the development of the perceptual experiments and allow refinements to take place to both the simulation and experimental approaches. Currently the urban crowd simulation is in the process of enhancement to improve the structural and visual quality of stimuli. In particular, one possibility is to investigate a procedural annotation mechanism, for automatically generating information for supporting artificial behaviours within the environment when the city is initialised. This would allow new behavioural annotation types to be added with relative ease and would enable the use of multiple city layouts since manual annotation efforts would no longer be required. Other planned improvements include more complex pedestrian models and new behavioural features such as collision avoidance and a social forces model. These additions will need to be carefully managed however, as experimentally they can influence participant expectation in relation to crowd behaviour plausibility. In relation to this aspect, the use of simple stimuli thus far in the experimentation is important and could be applied across other simulation and application contexts. In the future, the experiment will be evaluated by a larger sample of participants. This will be achieved by launching the survey platform online. In addition, guidelines for implementing perceptual plausibility in terms of agent crowd behaviour will be investigated. ACKNOWLEDGEMENTS The authors would like to thank the Interactive Worlds Applied Research Group (iWARG) at Coventry University, Faculty of Engineering and Computing, UK as well as Dr. Etienne Roesch for their support and inspiration. A video that illustrates the system in operation can be found at: http://www.youtube.com/watch?v=BqnmMksd30I&featu re=youtu.be REFERENCES [1] Leggett, R. “Real Time Crowd Simulation: A Review”, Retrieved, March 20, (2004). [2] McDonnell, R., Newell, F., O' Sullivan, C. “Smooth movers: Perceptually guided human motion simulation”, Proc. of the 2007 ACM SIGGRAPH/ Eurographics symposium on computer animation, 259-269, (2007). [3] Jennett, C, I. Cox, A, L. Cairns, P. “Being In The Game”. Proc. of the Philosophy of Computer Games, 210-227. 
(2008). [4] Peters, C., Ennis, C., McDonnell, R., O'Sullivan, C. “Crowds in context: Evaluating the perceptual plausibility of pedestrian orientations”, Proc. of Eurographics Short Papers, Springer-Verlag, Berlin, 227- 230, (2008). [5] Barlett, C. Rodeheffer, C. “Effects of Realism on Extended Violent and Nonviolent Video Game Play on Aggressive Thoughts, Feelings and Physiological Arousal”, Aggressive Behaviour, 35, 213-224, (2009). [6] Peters, C. Ennis, C. “Modelling groups of plausible virtual pedestrians”, IEEE Computer Graphics and Applications, Special Issue on Virtual Populace, IEEE Computer Society, 29(4): 54-63, (2009). [7] Ennis, C., Peters, C., O' Sullivan, C. “Perceptual effects of scene context and viewpoint for virtual pedestrian crowds”, ACM Transactions Applied Perception, 8(2), Article Number 10, (2011). [8] Assassin's Creed 3, Ubisoft Montreal, (Available at: http://assassinscreed.ubi.com/ac3/en-gb/index.aspx), Accessed at: 28/03/2013. [9] Almeida, J.E., Rossetti, R., Coelho, A.L. “Crowd Simulation Modeling Applied to Emergency and Evacuation Simulations using Multi-Agent Systems”, Proc. of 6 th Doctoral Symposium on Informatics Engineering, Porto, Portugal, 93-104, (2011). [10] Aschwanden, G., Halatsch, J., Schmitt, G. “Crowd simulation for urban planning”, eCAADe 2008, Antwerp, (2008). [11] Banerjee, B., Kraemer, L. “Evaluation and comparison of multi-agent based crowd simulation systems”. Agents for games and simulations II, Frank Dignum (eds), SpringerVerlag, Berlin, Heidelberg, 53-66, (2011). [12] Galloway, A.R. “Social Realism in Gaming”, The International Journal of Computer Game Research, Volume 4, Issue 1. (2004). [13] Chalmers, A. “Level of Realism for Serious Games”, Proc. of the IEEE Int’l Conference on Games and Virtual Worlds for Serious Applications (VS-Games 09), IEEE Computer Society, 225-232, (2009). [14] Sommerseth, H. “Gamic Realism: Player, Perception and Action in Video Game Play”, Situated Play, DiGRA Conference, (2007). [15] McMahan, A. “Immersion, Engagement and Presence”, The Video Game Theory Reader, Mark J.P. Wolf and Bernard Perron, (eds), New York. NY: Routledge, 77-78. (2003). [16] Crawford, C. “The Art of Computer Game Design”, McGraw-Hill/Glencoe, (1982). [17] Cowley, B., Charles, D., Black, M., Hickey, R. “Towards an Understanding of Flow in Video Games”, Computers in Entertainment, ACM Press, 6(2), Article 20, (2008). [18] Ennis, C., McDonnell, R., O' Sullivan, C. “Seeing is believing: Body motion dominates in multi-sensory conversations”, ACM Trans Graph 29(4):19, (2010). [19] Infinity Ward, Call of Duty: Modern Warfare 3, Activision, (Available at: http://www.callofduty.com/mw3), Accessed at: 28/03/2013. [20] Bernard, S., Therien, J., Malone, C., Beeson, S., Gubman, A., Pardo, R. “Taming the Mob: Creating believable crowds in Assassin’s Creed”, Game Developers Conference, San Francisco, CA. Feb 18-22. (2008). [21] Szymanezyk, O., Dickinson, P., Duckett, T. “Towards Agent-based Crowd Simulation in Airports using Games Technology”, Agent and MultiAgent Systems Technologies and Applications, Volume: 6682, 524-533. (2011). [22] Hitman: Absolution, Square Enix, IO Interactive, (Avaliable at: http://hitman.com/), Accessed at: 28/03/2013. [23] Tropico 4, Kalypso Media, Haemimont Games, (Available at: http://www.tropico3.com/en/T4/en/index.php), Accessed at: 28/03/2013. [24] Grand Theft Auto IV, Rockstar Games, Rockstar North, (Available at: http://www.rockstargames.com/IV/), Accessed at: 28/03/2013. [25] Low, G. 
“Understanding Realism in Computer Games through Phenomenology”, Stanford University, California, (2001). [26] Luo, L., Zhou, S., Cai, W., Low, M., Lees, M. “Toward a Generic Framework for Modeling Human Behaviors in Crowd Simulation”, Proc. of the IEEE/WIC/ACM Inter’l Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 02 (WI-IAT '09), Vol. 2. IEEE Computer Society, Washington, DC, USA, 275- 278. (2009). [27] Cui, X., Shi, H. “A*-based Pathfinding in Modern Computer Games”, International Journal of Computer Science and Network Security, 11(1): 125-130, (2011). [28] Reynolds C. “Steering behaviours for autonomous characters”, Proc. of game developers conference, Miller Freeman Game Group, San Francisco, California, 763- 782, (1999). [29] Ondej, J., Pettre, J., Olivier, A.-H., Donikian, S. “A Synthetic-Vision-Based Steering Approach for Crowd Simulation”, ACM Transactions on Graphics, 29(4): 123. (2010). [30] Helbing, D. and Molnar, P. “Social force model for pedestrian dynamics”, Phys. Rev E, American Physical Society, 51(5): 4282-4286, (1995). [31] Lance, F., Matheossian, D., Poli, A. City Procedural Modelling Open Source Tool-kit, GitHub, (Available at: https://github.com/Akado/City-procedural-modeling), Accessed at: 01/03/2013 [32] Anderson, E. “Playing smart – artificial intelligence in computer games”, Proc. of zfxCON03 Conference on Game Development, (2003). [33] Ehrenstein, W.H., Ehrenstein, A. “Psychophysical Methods”, Modern techniques in neuroscience research, 1211-1241, (1999). [34] Dai, H. “On measuring psychometric functions: A comparison of the constant-stimulus and adaptive updown methods”, Journal of the Acoustical Society of America, 98(6): 3135-3139, (1995). Interactive Virtual and Augmented Reality Environments 62 8.4 Paper #4 Liarokapis, F., Mourkoussis, N., White, M., Darcy, J., Sifniotis, M., Petridis, P., Basu, A., Lister, P.F. Web3D and Augmented Reality to support Engineering Education, World Transactions on Engineering and Technology Education, UICEE, 3(1): 11-14, 2004. Contribution (80%): Collaboration on the design of the architecture. Implementation of the most of the VR interface. Write-up of most of the paper. World Transactions on Engineering and Technology Education © 2004 UICEE Vol.3, No.1, 2004 11 INTRODUCTION Traditional methods of educating students have well-proven advantages, but some deficiencies have also been detected. A typical problem has been how to engage students with appropriate information and communication technologies (ICT) during the learning process. In order to implement innovative interactive communication and learning paradigms with students, teachers should make innovative use of new ICT [1]. Although multimedia material is provided in a number of formats, including textual, images, video animations and aural, educational systems are not designed according to current teaching and learning requirements. That requirement is to efficiently integrate these formats in well-proven means, eg through the Web. The system described here does this by introducing Web3D, virtual and augmented reality (AR) in the same Web-based learning support application. Research into educational systems associated with the use of Web3D technologies is very limited. Web3D has the potential for a number of different applications ranging from 2D to 3D visualisation [2]. One of the most appropriate means of presenting 2D information is through the WWW Consortium [3]. 
On the other hand, a promising and effective way of 3D visualisation is AR, which combines computer-generated information with the real world, and it can be used successfully to provide assistance to the user necessary to carry out difficult procedures or understand complex problems [4]. An overview of existing AR systems in education and learning has been presented elsewhere [5]. A more recent educational application is an experimental system that demonstrates how to aid teaching undergraduate geography students using AR technologies [6]. An educational approach for collaborative teaching targeted at teachers and trainees that makes use of AR and the Internet has been illustrated by Wichert [7]. An educational system is presented here for improving the understanding of the students through the use of Web3D and AR presentation scenarios. An engineering and design application has been experimentally designed to support the teaching of mechanical engineering concepts such as machines, vehicles, platonic solids and tools. It should be noted that more emphasis has been given to the visualisation of 3D objects because 3D immediately enhances the process of learning. For example, a teacher can explain what a camshaft is using diagrams, pictures and text, etc. However, it still may be difficult for a student to understand what a camshaft does. In the current system’s Web3D pictures, text and 3D model (which can be animated) are visualised so that the student can manipulate and interact with the camshaft, and also see other related components such as the tappets, follower, etc, arranged as they might be with an engine. In this article, the authors present four example themes to support the teaching of engineering design. These four themes may represent different courses or different teaching sessions as part of the same course. The remainder of this article describes the requirements for augmented learning, provides a brief discussion of the presented system’s architecture, and illustrates how the system might be used to support teaching processes using Web3D and AR technologies. Finally, conclusions are made and future work suggested. THE REQUIREMENTS OF AUGMENTED LEARNING The requirements for virtual learning environments have been already well defined [8]. However, in AR learning environments, they have not been systematically studied. In general, any educational application requires technological, pedagogical and psychological aspects to be carefully investigated before their implementation [9]. Especially when introducing new technologies, such as Web3D and AR into the Web3D and augmented reality to support engineering education Fotis Liarokapis, Nikolaos Mourkoussis, Martin White, Joe Darcy, Maria Sifniotis, Panos Petridis, Anirban Basu & Paul F. Lister University of Sussex Falmer, England, United Kingdom ABSTRACT: In the article, the authors present an educational application that allows users to interact with 3D Web content (Web3D) using virtual and augmented reality (AR). This enables an exploration of the potential benefits of Web3D and AR technologies in engineering education and learning. A lecturer’s traditional delivery can be enriched by viewing multimedia content locally or over the Internet, as well as in a tabletop AR environment. The implemented framework is composed in an XML data repository, an XML-based communications server, and an XML-based client visualisation application. 
In this article, the authors illustrate the architecture by configuring it to deliver multimedia content related to the teaching of mechanical engineering. Four mechanical engineering themes (machines, vehicles, platonic solids and tools) are illustrated here to demonstrate the use of the system to support learning through Web3D. 12 education process, many aspects need to be considered. The authors have classified some of the most important issues that are involved in AR learning scenarios. To begin with, the educational system must be simple and robust and provide users with clear and concise information. This will increase the level of students’ understanding and their skills. Moreover, the system must provide easy and efficient interaction between the lecturer, students and the teaching material. Apart from these issues, the digitisation of the teaching material must be carried out carefully so that all of the information is accurately and clearly presented to users. This digitisation or content preparation is usually an offline process and consists of many different operations, depending on the target application. The authors believe that a combination of Web3D and AR technologies can help students explore the multidimensional augmentation of teaching materials in various levels of detail. Students can navigate through the augmented information and, therefore, concentrate and study in detail any part of the teaching material in different presentation formats, thus enhancing understanding. With Web3D environments traditional teaching materials may be augmented by high quality images, 3D models, single- or multi-part models, as well as textual metadata information. An image could be a complex diagram, a picture or even a QuickTime movie. The 3D model allows the student to understand aspects of the teaching material that is not evident in the pictures, because they are hidden. Finally, metadata can provide descriptive information about the teaching material that cannot be provided by the picture and the 3D model. SYSTEM ARCHITECTURE The system presented here can be used to create and deliver multimedia teaching material using Web3D and AR technologies. The authors have already demonstrated this in other application domains, such as virtual museum exhibitions [10]. The architecture of this system is based on an improvement of the researchers’ previously defined three-tier architecture [11]. The architecture, as shown in Figure 1, consists of content production, a server and visualisation clients. Figure 1: The three-tier architecture. The first tier is the content production side, which consists of the content acquisition process – content can consist of 3D models, static images, textual information, animations and sounds – and a content management application for XML (CMAX) that gathers content from the file system and packages this content into an XML repository called XDELite. In the example illustrated in this article, most of the 3D models utilised were downloaded from the Internet [12]. This is quite important, because teachers should make best use of freely available content because generating 3D content can be expensive and time consuming. The server side tier is based on XML and Java-Servlet technologies. The Apache Tomcat server was used and was configured with a Java Servlet, named the ARCOLite XML Transformation Engine (AXTE) [10]. 
The purpose of this server is to respond to user requests for data, stored in the XDELite repository, and dynamically deliver this content to the visualisation tier. XSL stylesheets are then utilised to render the content to the visualisation clients. The client visualisation tier consists of three different visualisation domains, namely: the local, the remote and the AR domains. The local domain is used for delivering supporting teaching material over a Local Area Network (LAN), while the remote domain may be used to deliver the same presentations over the Internet, both utilising standard Web browsers. The AR domain allows the presentation of the same content in a tabletop AR environment [10]. The authors have developed an application called ARIFLite that consists of a standard Web browser and an AR interface integrated inside a user friendly visualisation client built from Microsoft Foundation Class (MFC) libraries. The software architecture of ARIFLite is implemented in C++ using an Object-Oriented (OO) style. ARIFLite uses technologies, such as ARToolKit’s tracking and vision libraries [13] and computer graphics algorithms based in the OpenGL API [14]. The only restriction of the AR system is that the marker cards and the camera are always in line of sight of the camera. USER OPERATION The user, eg a student, accesses this system simply by typing a URL into a Web browser that addresses the index page of the presentation or launches the presentation from a desktop icon. In this case, the student will be accessing a Web3D presentation with 3D, but no AR view (see Figure 2), which illustrates the Web browser embedded in ARIFLite. This is the mode of operation for the Internet. For local Web and AR use, eg in a university laboratory environment or a seminar room, the student would launch ARIFLite from an icon on the PC desktop. By using ARIFLite, the student can browse multimedia content as usual, but also extend the 3D models into the AR view. Switching to AR view causes the Web browser to be replaced with a video window in which the 3D model appears. The user can then interact with the 3D model and can compare it to real objects in a natural way, as illustrated in Figure 5. WEB3D PRESENTATION Demonstration in seminars and lecture rooms is one of the most effective means of transferring knowledge to groups of 13 people. One of the capabilities of the presented system is to increase the level of understanding of students through interactive Web3D and AR presentation scenarios. The lecturer can control the sequence of the demonstration using the visualisation client [10]. One can imagine a group of students and the lecturer gathered around a table on which there is a computer and large screen display. The virtual demonstration starts by launching a Web browser (ie Internet Explorer) or ARIFlite. Figure 2 actually illustrates the Web browser embedded in ARIFLite. Figure 2: Web browser embedded in ARIFLite showing the presentation’s homepage. Figure 5: AR visualisation of a piston. On the homepage, the user has the option to choose between four different supporting material themes, namely: platonic solids, tools, machines and vehicles. Each module contains a list of thumbnails representing links to relevant sub-categories, as shown in Figure 3. Next, the user can access more specific information about any of the existing sub-categories. 
Returning to the Web3D presentation, in Figure 4 the user has clicked on the camshaft, which accesses a new Web page showing a thumbnail (that could access a larger picture or a QuickTime movie), a description of the camshaft and an interactive 3D model displayed in an embedded VRML browser. At this stage, the lecturer can describe the underlying theory of a camshaft while interacting with the 3D model, eg rotating, translating or scaling the model. Figure 3: Selection of machines. Figure 4: Web3D visualisation. Augmenting a Web-based presentation with 3D information (as shown in Figure 4) can enhance student understanding and allow the lecturer to present material in a more efficient manner. AUGMENTED REALITY PRESENTATION By using ARIFLite, the authors can extend the Web3D presentation into a tabletop AR environment. AR can be extremely effective in providing information to a user dealing with multiple tasks at the same time [15]. With ARIFLite, users can easily perceive visual information in a new and exciting way. In order to increase the level of understanding of the teaching material, 3D information is presented on the tabletop in conjunction with real objects. Figure 5 shows an AR view of a user examining a virtual 3D model of a camshaft arrangement in conjunction with a set of real engine components. Similarly to the system demonstrated by Kato et al., users can physically manipulate the marker cards in the environment by simply picking up the markers and moving them in the real world [16]. In this way, students are able to visualise how a camshaft is arranged in relation to other engine components and examine the real components at the same time. Users can interact with the 3D model using standard I/O devices, such as the keyboard and the mouse. In order to manipulate the 3D model more effectively, haptic interfaces, such as a 3D mouse (ie the SpaceMouse XT Plus), are integrated within the system. The SpaceMouse provides an 11-button menu and a puck allowing six degrees of freedom, which gives a more efficient interface than the keyboard [17]. The user can zoom, pan and rotate virtual information as naturally as if it were an object in the real world. CONCLUSION AND FUTURE WORK In this article, a simple and powerful system for supporting learning based on Web3D and AR technologies is presented. Students can explore a 3D visualisation of the teaching material, thus enabling them to understand it more effectively through interactivity with multimedia content. It is believed that the presented experimental scenarios can provide a rewarding learning experience that would otherwise be difficult to obtain. In the future, the authors plan to create more educational templates and add further multimedia content to the XML repository so as to apply the system in practice. In order to optimise the system's rendering capabilities, greater realism will be added to the augmented environment using augmented shadows. Finally, more work needs to be conducted on improving human-computer interaction by adding haptic interfaces so that the system will have a more collaborative flavour. ACKNOWLEDGEMENTS This research was partially funded by the EU IST Framework V programme, Key Action III-Multimedia Content and Tools, Augmented Representation of Cultural Objects (ARCO) project IST-2000-28336. REFERENCES 1. Barajas, M. and Owen, M., Implementing virtual learning environments: looking for holistic approach. Educational Technology and Society, 3, 3 (2000). 2. Web3D Consortium (2004), http://www.web3d.org 3.
World Wide Web Consortium (2004), http://www.w3c.org 4. Schwald, B. and Laval, B., An augmented reality system for training and assistance to maintenance in the industrial context. J. of WSCG, Science Press, 1 (2003). 5. Liarokapis, F., Petridis, P., Lister, P.F. and White, M., Multimedia Augmented Reality Interface for E-learning (MARIE). World Trans. on Engng. and Technology Educ., 1, 2, 173-176 (2002). 6. Shelton, B.E. and Hedley, N.R., Using augmented reality for teaching earth-sun relationships to undergraduate geography students. Proc. 1st IEEE Inter. Augmented Reality Toolkit Workshop, Darmstadt, Germany (2002). 7. Wichert, R., A mobile augmented reality environment for collaborative learning and teaching. Proc. World Conf. on E-Learning in Corporate, Government, Healthcare and Higher Education (E-Learn), Montreal, Canada (2000). 8. Dillenbourg, P., Virtual learning environments. Proc. EUN Conf. 2000, Workshop on virtual environments (2000). 9. Kaufmann, H., Collaborative augmented reality in education. Proc. Imagina 2003 Conf. (Imagina03), Monaco (2003). 10. White, M., Liarokapis, F., Mourkoussis, N., Basu, A., Darcy, J., Petridis, P. and Lister, P.F., A lightweight XML driven architecture for the presentation of virtual cultural exhibitions (ARCOLite). Proc. IADIS Inter. Conf. on Applied Computing 2004, Lisbon, Portugal, 205-212 (2004). 11. Liarokapis, F., Mourkoussis, N., Petridis, P., Rumsey, S., Lister, P.F. and Whiet, M., An interactive augmented reality system for engineering education. Proc. 3rd Global Congress on Engng. Educ., Glasgow, Scotland, UK, 334-337 (2002). 12. VRML Models (2004), http://www.ocnus.com/models/ 13. Kato, H., Billinghurst M. and Poupyrev, I., ARToolkit User Manual, Version 2.33, Human Interface Lab. Seattle: University of Washington (2000). 14. Woo, M., Neider, J. and Davis, T., OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, Addison Wesley, September, (1999). 15. Kalawsky, R.S. and Hill, K., Experimental Research into Human Cognitive Processing in an Augmented Reality Environment for Embedded Training Systems. London: Springer-Verlag (2000). 16. Kato, H., Billinghurst, M., Poupyrev, I., Imamoto, K. and Tachibana, K., Virtual object manipulation on a table-top AR environment. Proc. Inter. Symp. on Augmented Reality 2000, Munich, Germany, 111-119 (2000). 17. SpaceMouse XT Plus (2004), http://www.logicad3d.com/press/archive/2000/20001002.html Interactive Virtual and Augmented Reality Environments 67 8.5 Paper #5 White, M., Mourkoussis, N., Darcy, J., Petridis, P., Liarokapis, F., Lister, P.F., Walczak, K., Wojciechowski, R., Cellary, W., Chmielewski, J., Stawniak, M., Wiza, W., Patel, M., Stevenson, J., Manley, J., Giorgini, F., Sayd, P., Gaspard, F. ARCO-An Architecture for Digitization, Management and Presentation of Virtual Exhibitions, Proc. of the 22nd International Conference on Computer Graphics (CGI'2004), IEEE Computer Society, Hersonissos, Crete, June 16-19, 622-625, 2004. Contribution (15%): Collaboration on the design of the VR and AR architecture. Implementation of the most of the VR and AR interface. Write-up of parts of the paper. 
1 ARCO—An Architecture for Digitization, Management and Presentation of Virtual Exhibitions Martin White1 , Nicholaos Mourkoussis1 , Joe Darcy1 , Panos Petridis1 , Fotis Liarokapis1 , Paul Lister1 , Krzysztof Walczak2 , Rafa Wojciechowski2 , Wojciech Cellary2 , Jacek Chmielewski2 , Miros aw Stawniak2 , Wojciech Wiza2 , Manjula Patel3 , James Stevenson4 , John Manley5 , Fabrizio Giorgini6 , Patrick Sayd7 , Francois Gaspard7 University of Sussex1 , Poznan University of Economics2 , UKOLN –University of Bath3 , Victoria and Albert Museum4 , Sussex Archaeological Society5 , Giunti Publishing Group6 , Commissariat a l’Enegie Atomique7 {M.White, N.Mourkoussis, J.Darcy, P.Petridis, F.Liarokapis, P.F.Lister}@sussex.ac.uk {walczak, wojciechowski, cellary, jchmiel, stawniak, wiza}@kti.ae.poznan.pl M.Patel@ukoln.ac.uk, j.stevenson@vam.ac.uk, ceo@sussexpast.co.uk, f.giorgini@giuntilabs.com, {SAYD, francois.gaspard}@ortolan.cea.fr Abstract A complete tool chain starting with stereo photogrammetry based digitization of artefacts, their refinement, collection and management with other multimedia data, and visualization using virtual and augmented reality is presented. Our system provides a one-stop-solution for museums to create, manage and present both content and context for virtual exhibitions. Interoperability and standards are also key features of our system allowing both small and large museums to build a bespoke system suited to their needs. 1. Introduction The concept of using virtual exhibitions in museums has been around for many years. Museums are keen on presenting their collections in a more appealing and exciting manner using the Internet to attract visitors both virtually and into the physical museum site. Recent surveys show that about 35% of museums have already started developments with some form of 3D presentation of objects [4]. Requirements related to the development of augmented reality (AR) applications in the Cultural Heritage field have been well documented [3]. Many museum applications based on VRML have been developed for the web [1][5][6]. An example of an interactive virtual exhibition is the Meta-Museum visualized guide system based on AR [2]. Another simple museum AR system is the automated tour guide, which superimposes audio on the world based on the location of the user [7]. The European Union has also funded many research projects in the field of cultural heritage and archaeology. For example, the SHAPE project [8] applies AR to the field of archaeology to educate visitors about the artefacts and their history. The 3DMURALE project [9] is developing and using 3D multimedia tools to record, reconstruct, encode and visualize archaeological ruins in VR. In the Ename 974 project [10] visitors can enter a specially designed on-site kiosk where real-time video images and architectural reconstructions are superimposed, and visitors can control the video camera and display images using a touch screen. The ARCHEOGUIDE project [11] provides an interactive AR guide for the visualization of outdoor archaeological sites. Similar to ARCHEOGUIDE is the LIFEPLUS project which additionally encompasses real-time 3D simulations of ancient fauna and flora [12]. The main advantage of the ARCO system over the projects described above are that ARCO offers a complete museum focused solution that can be configured for museum needs—we can build bespoke museum systems from interoperable ARCO components. 
But more importantly, ARCO offers methods for digitization, management and presentation of heritage artefacts in virtual exhibitions based on well understood metaphors that are also interactive and appealing [13]. 2. ARCO System Overview The ARCO functionality mentioned above defines the specification of the system architecture, illustrated in Figure 1. For the content production process ARCO provides two tools for 3D modelling of museum artefacts: the Object Modeller (OM) and the Model Refiner (MR). The OM tool is a 3D stereo photogrammetry system based on the principles of image-based modelling. The MR tool is a 3D reconstruction refinement tool based on the 3ds max framework that complements the OM tool. Note that content production also includes acquiring other multimedia data, such as images, movies, etc., for input to the content management process. Figure 1: ARCO System Architecture. For the content management process ARCO provides a multimedia database management system based on Oracle9i and the ARCO Content Management Application (ACMA). The database is the central component of the ARCO system in that it stores, manages and organises virtual artefacts into collections for display in virtual exhibitions. The final part of the ARCO architecture is the content visualization process. The visualization of the virtual museum artefacts is performed by VR and AR browsers. These browsers combine a Web-based form of presentation with either VR or AR virtual exhibitions. The end user is able to browse content stored in the database either remotely through the web or in a museum kiosk, or to interact with the virtual objects in an AR table-top environment using either a simple monitor display or an HMD. The ARCO system is based on the data model [14] illustrated in Figure 2. Figure 2: ARCO Data Model. A key element of the ARCO system is the specification of an appropriate metadata element set that underpins both the heritage and technical aspects of ARCO. We need to describe both the museum artefacts and the technical processes that transform the artefacts from the physical to the virtual. Accordingly, we have designed a metadata element set called the ARCO Metadata Schema (AMS) [14]. 3. Virtual Museum Exhibitions Virtual museum artefacts are displayed as virtual exhibitions through three presentation domains: WEB_LOCAL for use on local web-based displays inside museums, WEB_REMOTE for use on the Internet, and WEB_AR for use in AR presentations. The ARCO system provides two main kinds of user interfaces for browsing cultural heritage exhibitions: Web-based interfaces and Augmented Reality interfaces, see Sections 3.1 and 3.2. 3.1. Virtual Reality Exhibitions In the Web-based interface a user can browse information presented in the form of 3D VRML virtual galleries or 2D Web pages with embedded multimedia objects. The Web-based interface requires a standard Web browser such as Internet Explorer with a VRML plug-in. This kind of user interface can be used both on local displays inside a museum (WEB_LOCAL) and remotely on the Internet (WEB_REMOTE).
An example visualization of virtual exhibitions in a Web browser is presented in Figure 3. This visualization consists of Web pages with embedded 3D VRML models and other multimedia objects and can be used remotely over the Internet. Users can browse the hierarchy of virtual exhibitions and virtual museum artefacts by clicking on appropriate icons at the top of the page. Figure 3: Web-based visualization. Virtual exhibitions can also be visualized in the Web browser in the form of 3D galleries, see Figure 4. In this visualization, users can browse objects simply by walking along the 3D room, which is a reconstruction of a real gallery, an exhibition corridor in the Victoria and Albert Museum in London. Figure 4: Example 3D virtual exhibitions. 3.2. Augmented Reality Exhibitions To enable visualization of selected objects in an AR environment, an AR application has been developed. The AR application is used instead of the typical Web browser used in the Web-based interfaces. The AR application integrates two components: a Web browser and an AR browser. For the AR visualization, a camera and a set of physical markers placed in a real environment are used. Video captured by the camera is passed on to the AR browser, which overlays virtual representations of virtual museum artefacts using the markers for object positioning [15]. Users can interact with the displayed objects using both the markers and standard input devices, such as the SpaceMouse®. In the first method, a user can manipulate a marker in front of a camera, as presented in Figure 5, and look at the overlaid objects from different angles and distances. This is a natural and intuitive method of interaction with virtual objects. Figure 5: Real scene augmented with superimposed virtual models. The content and layout of the visualized scenes are determined by visualization templates that define which components of a virtual museum artefact are composed into one VRML scene. One of the important goals of the ARCO system is presenting virtual museum artefacts in an attractive manner that would make people, especially children, more interested in cultural heritage. ARCO enables museum curators to build interactive learning scenarios, where visitors can gain information not only by browsing it, but also by answering a series of questions presented in the form of a quiz. As an example, we have implemented an interactive AR quiz based on Fishbourne Roman Palace [16], illustrated in Figure 6. Figure 6: Example quiz scene. In this quiz we use one of the markers to display the virtual museum artefact and a question, and three more markers to display potential answers. The user then chooses an answer and, depending on whether the answer is correct or not, an appropriate response appears in the AR scene, see Figure 7. Figure 7: Wrong and correct answers. 4. Conclusions The ARCO system provides a complete solution for digitization, management and presentation of virtual museum exhibitions. We have addressed digital acquisition, storage, management and visualization in interactive VR and AR interfaces by adopting a component based approach. Furthermore, mixing and matching of individual components is supported through the use of XML for interoperability purposes.
A system such as ARCO has the potential to revolutionise the use of computer-based systems in museums in the future, so that they are no longer regarded as mere tools for cataloguing purposes, but rather as ways of engaging and enhancing the experience of their users. 5. Acknowledgments This research was funded by the EU IST Framework V programme, Key Action III-Multimedia Content and Tools, Augmented Representation of Cultural Objects (ARCO) project IST-2000-28336. 6. References [1] Martin White, Krzysztof Walczak, Nicholaos Mourkoussis, ‘ARCO—Augmented Representation of Cultural Objects’ Advanced Imaging, Len Yencharis (ed.), June 2003, Vol 18, No. 6 pp14-15, 46. [2] Mase, K., Kadobayashi, R., et al., Meta-Museum: A Supportive Augmented-Reality Environment for Knowledge Sharing, ATR Workshop on Social Agents: Humans and Machines, Kyoto, Japan, April 21-22, 1997. [3] Brogni A., Avizzano C.A., Evangelista C., Bergamasco M., Technological Approach for Cultural Heritage: Augmented Reality, Proc. of 8th International Workshop on Robot and Human Interaction (RO-MAN '99). [4] ORION, Object Rich Information Network [http://www.orion-net.org/index.asp], [accessed 11/12/2003] [5] Sinclair, P and Martinez, K., Adaptive Hypermedia in Augmented Reality, Proceedings of Hypertext, 2001. [6] Gatermann, H., From VRML to Augmented Reality Via Panorama-Integration and EAI-Java, in Constructing the Digital Space, Proc. of SiGraDi’2000, 254-256, September 2000. [7] Bederson, B.B., Audio Augmented Reality: A Prototype Automated Tour Guide, In the ACM Human Computer in Computing Systems conference (CHI'95), pp 210-211 [8] Hall, T., Ciolfi, L., et al. The Visitor as Virtual Archaeologist: Using Mixed Reality Technology to Enhance Education and Social Interaction in the Museum, Proc. of Virtual Reality, Archaeology, and Cultural Heritage (VAST 2001), (Ed Spencer, S.) New York, ACM SIGGRAPH, pp 91-96, Glyfada, Nr Athens, November 2001. [9] Cosmas, J., et al. 3D MURALE: a multimedia system for archaeology, In Proceedings of the 2001 conference on Virtual Reality, archaeology, and cultural heritage, pp 297-306 (2001). [10] Pletinckx, D., Callebaut, D., Killebrew, A., Silberman, N., Virtual-Reality Heritage Presentation at Ename, IEEE Multimedia April-June 2000 (Vol. 7, No. 2), pp. 45-48. [11] Gleue, T., Dähne, P., Design and Implementation of a Mobile Device for Outdoor Augmented Reality in the ARCHEOGUIDE Project, Virtual Reality, Archaeology, and Cultural Heritage International Symposium (VAST01), Glyfada, Nr Athens, Greece, 28-30 November 2001. [12] LIFEPLUS, [http://www.miralab.unige.ch/subpages/lifeplus/HTML /home.htm], [accessed 11/12/2003] [13] ARCO Consortium, ‘Augmented Representation of Cultural Objects’, [http://www.arco-web.org], [accessed 11/12/2003] [14] Nicholaos Mourkoussis, Martin White, Manjula Patel, Jecek Chmielewski and Krzysztof Walczak, ‘AMSMetadata for Cultural Exhibitions using Virtual Reality', DC-2003 Proc. of the International DCMI Metadata Conference and Workshop, September 29Oct 2, 2003, Seattle, Washington, USA, ISBN 0- 9745303-0-1. [15] Kato, H., Billinghurst, M., Poupyrev, I., Imamoto, K. and Tachibana, K., Virtual object manipulation on a table-top AR environment, Proceedings of International Symposium on Augmented Reality (ISAR’00), 111-119. 
[16] Fishbourne Roman Palace, [http://www.sussexpast.co.uk/fishbo/fishbo.htm], [accessed 11/12/2003] Proceedings of the Computer Graphics International (CGI’04) 1530-1052/04 $20.00 © 2004 IEEE Interactive Virtual and Augmented Reality Environments 72 8.6 Paper #6 Liarokapis, F., Brujic-Okretic, V., Papakonstantinou, S. Exploring Urban Environments using Virtual and Augmented Reality, Journal of Virtual Reality and Broadcasting, GRAPP 2006 Special Issue, Digital Peer Publishing, 3(5): 1-13, 2006. Contribution (70%): Collaboration on the design of the architecture. Implementation of the majority of the VR interface. Write-up of most of the paper. Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 5 Exploring Urban Environments Using Virtual and Augmented Reality Fotis Liarokapis, Vesna Brujic-Okretic, Stelios Papakonstantinou City University giCentre, Department of Information Science, School of Informatics London EC1V 0HB email: fotisl, vesna, stelios@soi.city.ac.uk www: www.soi.city.ac.uk/organisation/is/research/giCentre/ Abstract In this paper, we propose the use of specific system architecture, based on mobile device, for navigation in urban environments. The aim of this work is to assess how virtual and augmented reality interface paradigms can provide enhanced location based services using real-time techniques in the context of these two different technologies. The virtual reality interface is based on faithful graphical representation of the localities of interest, coupled with sensory information on the location and orientation of the user, while the augmented reality interface uses computer vision techniques to capture patterns from the real environment and overlay additional way-finding information, aligned with real imagery, in real-time. The knowledge obtained from the evaluation of the virtual reality navigational experience has been used to inform the design of the augmented reality interface. Initial results of the user testing of the experimental augmented reality system for navigation are presented. Digital Peer Publishing Licence Any party may pass on this Work by electronic means and make it available for download under the terms and conditions of the current version of the Digital Peer Publishing Licence (DPPL). The text of the licence may be accessed and retrieved via Internet at http://www.dipp.nrw.de/. First presented at the International Conference on Computer Graphics Theory and Applications (GRAPP) 2006, extended and revised for JVRB Keywords: Mobile Interfaces, Augmented and Virtual Environments, Virtual Tours, Humancomputer in- teraction. 1 Introduction Navigating in urban environments is one of the most compelling challenges of wearable and ubiquitous computing. The term navigation which can be defined as the process of moving in an environment can be extended to include the process of wayfinding [DS93]. Wayfinding refers to the process of determining one or more routes (also known as paths). Mobile computing has brought the infrastructure for providing navigational and wayfinding assistance to users, anywhere and anytime. Moreover, recent advances in positioning technologies - as well as virtual reality (VR), augmented reality (AR) and user interfaces (UIs) - pose new challenges to researchers to create effective wearable navigation environments. Although a number of prototypes have been developed in the past few years there is no system that can provide a robust solution for unprepared urban navigation. 
There has been significant research in position- and orientation-based navigation in urban environments. Experimental systems that have been designed range from simple location-based services to more complicated VR and AR interfaces. An account of the user's cognitive environment is required to ensure that representations are delivered not just on technical but also on usability criteria. A key concept for all mobile applications based upon location is the 'cognitive map' of the environment held in mental image form by the user. Studies have shown that cognitive maps have asymmetries (distances between points are different in different directions), that they are resolution-dependent (the greater the density of information the greater the distance between two points) and that they are alignment-dependent (distances are influenced by geographical orientation) [Tve81]. Thus, calibration of application space concepts against the cognitive frame(s) of reference is vital to usability. Reference frames can be divided into the egocentric (from the perspective of the perceiver) and the allocentric (from the perspective of some external framework) [Kla98]. End-users can have multiple egocentric and allocentric frames of reference and can transform between them without information loss [MA01]. Scale, by contrast, is a framing control that selects and makes salient entities and relationships at a level of information content that the perceiver can cognitively manipulate. Whereas an observer establishes a 'viewing scale' dynamically, digital geographic representations must be drawn from a set of preconceived map scales. Inevitably, the cognitive fit with the current activity may not always be acceptable [Rap00]. Alongside the user's cognitive abilities, understanding the spatio-temporal knowledge users have is vital for developing applications. This knowledge may be acquired through landmark recognition, path integration or scene recall, but will generally progress from declarative (landmark lists), to procedural (rules to integrate landmarks), to configurational knowledge (landmarks and their inter-relations) [SW75]. There are quite significant differences between these modes of knowledge, requiring distinct approaches to application support on a mobile device. Hence, research has been carried out on landmark saliency [MD01] and on the process of self-localisation [Sho01] in the context of navigation applications. This work demonstrates that the cognitive value of landmarks is in preparation for the unfamiliar and that self-localisation proceeds by the establishment of rotations and translations of body coordinates with landmarks. Research has also been carried out on spatial language for direction-giving, showing, for example, that path prepositions such as 'along' and 'past' are distance-dependent [KBZ+01]. These findings suggest that mobile applications need to help users add to their knowledge and use it in real navigation activities. Höll et al. [HLSM03] illustrate the achievability of this aim by demonstrating that users who pre-trained for a new routing task in a VR environment made fewer errors than those who did not. This finding encourages us to develop navigational wayfinding and commentary support on mobile devices accessible to the customer. The objectives of this research include a number of urban navigation issues ranging from mobile VR to mobile AR.
The rest of the paper is structured as follows. In section 2, we present background work while in section 4 we describe the architecture of our mobile solution and explain briefly the major components. Sections 5 and 6 present the most significant design issues faced when building the VR interface, together with the evaluation of some initial results. In section 8, we present the initial results of the development towards a mobile AR interface that can be used as a tool to provide location and orientation-based services to the user. Finally, we conclude and present our future plans. 2 Background Work There are a few location-based systems that have proposed how to navigate through urban environments. Campus Aware [BGKF02] demonstrated a locationsensitive college campus tour guide, which allows users to annotate physical spaces with text notes. However, user-studies showed that navigation was not well supported. The ActiveCampus project [GSB+04] tests whether wearable technology can be used to enhance the classroom and campus experience for a college student. The project also illustrates ActiveCampus Explorer, which provides location aware applications that could be used for navigation. The latest application is EZ NaviWalk, a pedestrian navigation service launched in Japan in October 2003 by KDDI [oTI04] but in terms of visualisation it offers only the ’standard’ 2D map. From the other hand, many VR prototypes have been designed for navigation and exploration purposes. A good overview of the potential and challenges for geographic visualisation has been previously provided [MEH99]. One example is LAMP3D a system for the location-aware presentation of VRML content on mobile devices, applied in tourist mobile guides [BC05]. Although the system provides tourists with a 3D visualization of the environment they are exploring, synchronized with the physical world through the use of GPS data, there is no orientation information available. Darken and Sibert [DS96] examined whether real world wayfinding and environmental design principles can be effective in designing large virtual environments that support skilled wayfinding be- haviour. urn:nbn:de:0009-6-7720, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 5 Another example is the mobile multi-modal interaction platform [WSK03] which supports both indoor and outdoor pedestrian navigation by combining 3D graphics with synthesised speed generation. Indoor tracking is achieved through infra-red beacon communication while outdoor via GPS. However, the system does not use georeferenced or accurate virtual representations of the real environment, neither report on any evaluation studies. For the route guidance applications, 3D City models have been demonstrated as useful for mobile navigation [KK02], but studies pointed out the need for detailed modelling of the environment and additional route information. To enhance the visualisation, to aid navigation, a combination of 3D scene representation and a digital map were previously used in a single interface [RV01], [LGS03]. In terms of AR navigation, a few experimental systems have been reported on, until present. One of the first wearable navigation systems is MARS (Mobile Augmented Reality Systems) [FMHW97], which aimed at exploring the synergy of two promising fields of user interface research: AR and mobile computing. Thomas et al, [TD+98] proposed the use of a wearable AR system with a GPS and a digital compass as a new way of navigating into the environment. 
Moreover, the ANTS project [RCD04] proposes an AR technological infrastructure that can be used to explore physical and natural structures, mainly for environmental management purposes. Finally, Reitmayr, et al., [RS04] demonstrated the use of mobile AR for collaborative navigation and browsing tasks in an urban environ- ment. Although the experimental systems listed above focus on some of the issues involved in navigation, they cannot deliver a functional system capable of combining all accessible interfaces, consumer devices and web metaphors. The motivation for the research reported on in this paper is to address those issues, namely an integration of a variety of hardware and software components to provide effective and flexible navigational and wayfinding tool for urban environments. In addition, we compare potential solutions for detecting the user location and orientation in order to provide appropriate urban navigation applications and services. To realise this we have designed a mobile platform based on both VR and AR interfaces. To understand in depth all the issues that relate to location and orientation-based services, first a VR interface was designed and tested on a personal digital assistant (PDA) as a navigation tool. Then, we have incorporated the user feedback into the design of an experimental AR interface. Both prototypes require the precise calculation of the user position and orientation, for the registration purpose. The VR interface is coupled with the GPS and digital compass output to correlate the model with the location and orientation of the user, while the AR interface is only dependent on detecting features belonging to the environment. 3 Urban Modelling Figure 1: Accurate modelling of urban environment (a) high resolution aerial image (b) 3D building ex- truding The objectives of this research include issues, such as modelling the urban environment and using visualisation concepts and techniques on a mobile device to help navigation. Currently, the scene surrounding the user is modelled in 3D, and the output is used as a base for both VR and AR navigation scenarios. A partner on the project, GeoInformation Group (GIG), Cambridge, provided a unique and comprehensive data urn:nbn:de:0009-6-7720, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 5 set, containing the building height/type and footprint data, for the entire City of London. We are using 3D modelling techniques, ranging from manual to semiautomated methods, to create virtual representation of the users immediate environment. The first step of the process involves the extrusion of a geo-referenced 3D mesh using aerial photographs as well as building footprints and heights (Figure 1). The data set is enhanced by texture information, obtained from the manually captured photographs of the building sides, using a standard, higher resolution digital camera. The steps in the semiautomated technique for preparing and texturing the 3D meshes include: detaching the objects in the scene; un-flipping the mesh normals; unifying the mesh normals; collapsing mesh faces into polygons and texturing the faces. An example screenshot of the textured model is shown in Figure 3. All 3D content is held in the GIG City heights database for the test sites in London. The geo-referenced models acquire both the orientation information and the location through a client API on the mobile device, and the application is currently fully functional on a local device. 
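The extrusion step described above can be pictured with a simplified sketch that turns a footprint polygon and a building height into wall and roof triangles. The real City of London footprints are geo-referenced polygons, may contain holes, and go through the semi-automated texturing workflow described above; the types and the convex-roof assumption here are illustrative only.

// Simplified sketch: extrude a 2D building footprint into walls and a flat roof.
// Real footprints from the City database are geo-referenced polygons with holes;
// this example ignores those details and fan-triangulates only convex roofs.
#include <vector>

struct Vec3 { double x, y, z; };
struct Triangle { Vec3 a, b, c; };

// footprint: polygon vertices in metres (counter-clockwise); height: building height in metres.
std::vector<Triangle> extrudeBuilding(const std::vector<Vec3>& footprint, double height) {
    std::vector<Triangle> mesh;
    const size_t n = footprint.size();
    if (n < 3) return mesh;

    for (size_t i = 0; i < n; i++) {
        const Vec3& p0 = footprint[i];
        const Vec3& p1 = footprint[(i + 1) % n];
        Vec3 q0 = {p0.x, p0.y, height};
        Vec3 q1 = {p1.x, p1.y, height};
        // Two triangles per wall quad.
        mesh.push_back({p0, p1, q1});
        mesh.push_back({p0, q1, q0});
    }
    // Fan-triangulate the roof (valid for convex footprints only).
    Vec3 r0 = {footprint[0].x, footprint[0].y, height};
    for (size_t i = 1; i + 1 < n; i++) {
        Vec3 r1 = {footprint[i].x,     footprint[i].y,     height};
        Vec3 r2 = {footprint[i + 1].x, footprint[i + 1].y, height};
        mesh.push_back({r0, r1, r2});
    }
    return mesh;
}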
In the final version, the models will be sent to the server in the packet-based message transmitted over the used network. The server will build and render the scene graph associated with the location selected and return it to the client for portrayal. 4 Mobile Platform and Functionality Figure 2: Architecture of our mobile interfaces Based on these geo-referenced models as building blocks, a generic mobile platform architecture has been designed and implemented for urban navigation and wayfinding applications and services (Figure 2). 4.1 System Configuration Figure 2 illustrates the system architecture aimed at optimising navigation by using intelligent data retrieval inside an urban area and providing types of digital appropriately visualised information, suitable to be offered as a core of an enhanced location based service. The hardware configuration consists of two distinct sub-systems: i) the remote server equipment and ii) the client device (e.g. a PDA) enhanced with a selection of sensors and peripherals to facilitate the information acquisition, in real time. Both sides feed into the interface on a mobile device, in the form adequate for the chosen mode of operation. 4.2 System Functionality Software applications are custom made and include the information retrieval application, clientserver communication software and a cluster of applications on the client side, which process sensory information, in real-time, and ensure seamless integration of the outputs into a unique interface. The calibration and registration algorithms are at the core of the client side applications ensuring all information is geo-referenced and aligned with the real scene. Registration, in this context, is achieved using two different methods: i) a sensor based solution, taking and processing the readings off the sensors directly, and ii) the image analysis techniques coupled with the information on user’s location and orientation obtained from the sensors. The sensor system delivers position and orientation data, in real-time, while a vision system is used to identify fiducial points in the scene. All this information is used as input to the VR and AR interfaces. The VR interface uses GPS and digital compass information for locating and orientating the user. 4.3 Interface modalities Information visualisation techniques used vary according to the nature of the digital content, and/or the navigational task in hand, throughout the navigation. In terms of the content to be visualised, the VR interface can present only 3D maps and textual information. On the other hand, the AR interface uses the calculated user’s position and orientation coordinates from the image analysis to superimpose 2D and 3D urn:nbn:de:0009-6-7720, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 5 maps as well as text and auditory information on the ’spatially aware’ framework. 4.4 Notes on Hardware Components Initially, the mobile software prototype was tested on a portable hardware prototype consisting of a standard laptop computer (equipped with 2.0 GHz M-processor, 1GB RAM and a GeForce FXGo5200 graphics card), a Honeywell HMR 3300 digital compass, a Holux GPS component and a Logitech web-camera (with 1.3 mega-pixel resolution). Then, the prototype system has been ported to a mobile platform based on a Personal Digital Assistant (PDA) and is currently being tested with users. 
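The sensor-based registration outlined in Section 4.2 can be sketched as a simple mapping from a GPS fix and a compass heading to a camera pose in the geo-referenced model's local frame. The equirectangular approximation below is adequate over a few kilometres; the constants and names are illustrative rather than the project's actual code.

// Sketch: convert a GPS fix and digital compass heading into a camera pose in the
// geo-referenced model's local frame (equirectangular approximation).
#include <cmath>

struct CameraPose {
    double east, north, up;   // metres relative to the model origin
    double headingDeg;        // view direction, degrees clockwise from north
};

const double kEarthRadiusM = 6378137.0;                  // WGS84 equatorial radius
const double kDegToRad     = 3.14159265358979 / 180.0;

CameraPose registerFromSensors(double latDeg, double lonDeg,            // GPS fix
                               double originLatDeg, double originLonDeg,
                               double compassHeadingDeg,                 // digital compass
                               double cameraAltitudeM) {                 // e.g. a fixed fly-over height
    CameraPose pose;
    double originLatRad = originLatDeg * kDegToRad;
    pose.east  = (lonDeg - originLonDeg) * kDegToRad * kEarthRadiusM * std::cos(originLatRad);
    pose.north = (latDeg - originLatDeg) * kDegToRad * kEarthRadiusM;
    pose.up    = cameraAltitudeM;
    pose.headingDeg = compassHeadingDeg;      // align the virtual camera with the device
    return pose;
}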
4.5 Software infrastructure In terms of the software infrastructure used in this project, both interfaces are implemented based on Microsoft Visual C++ and Microsoft Foundation Classes (MFC). The graphics libraries used are based on OpenGL, Direct3D and VRML. Video operations are supported by the DirectX SDK (DirectShow libraries). 5 Virtual Reality Navigation Navigation within our virtual environment (the spatial 3D map) can take place in two modes: automatic and manual. In the automatic mode, GPS automatically feeds and updates the spatial 3D map with respect to the users position in the real space. This mode is designed for intuitive navigation. In the manual mode, the control is fully with the user, and it was designed to provide alternative ways of navigating into areas where we cannot obtain a GPS signal. Users might also want to stop and observe parts of the environment in which case control is left in their hands. During navigation, there are minor modifications obtained continuously from the GPS to improve the accuracy, which results in minor adjustments in the camera position information. This creates a feeling of instability in user, which can be avoided by simply restricting minor positional adjustments. The immersion provided by GPS navigation is considered as pseudo-egocentric because fundamentally the camera is positioned at a height which does not represent a realistic scenario. If, however, the user switches to manual navigation, any perspective can be obtained, which is very helpful for decision-making purposes. While in a manual mode, any model can be explored and analysed, therefore additional enhancements of the graphical representation are of vital importance. One of the problems that quickly surfaced during the system evaluation is the viewing angle during navigation which can make it difficult to position the user. This can make it difficult to understand at which point the user is positioned. After informal observation of users during the development process, an altitude of fifty meters over the surface was finally adopted as adequate. In this way, the user can visualise a broader area plus the tops of the buildings, and acquire richer knowledge about their location, in the VR environment. The height information is hard-coded when the navigation is in the automatic mode because user testing (section 7) showed that it can be extremely useful in cases where a user tries to navigate between tall buildings, having low visibility. Figure 3: FOV differences (a) low angle (b) high angle Figure 3, illustrates to what extent the FOV is influenced by that angle and how much more information can be included from the same field-ofview, if the angle is favourable. In both Figure 3 (a) and Figure 3 (b), the camera is placed at exactly the same position urn:nbn:de:0009-6-7720, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 5 and orientation in the horizontal plane, with the only difference in the pitch angle. In Figure 3 (a), the pitch angle is very low and in the Figure 3 (b) it is set to maximum (90◦). This feature was considered important to implement after initial testing. The obvious advantage is that, once in a position, no additional rotations are required from the user to understand the exact position of the camera. 
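The restriction of minor positional adjustments mentioned above can be sketched as a simple movement threshold applied to incoming GPS fixes: the camera only moves when the new fix is sufficiently far from the current position. The 2 m threshold below is an illustrative value, not the one used in the system.

// Sketch: suppress small GPS-induced camera jitter in automatic navigation mode.
#include <cmath>

struct Position { double east, north; };   // metres in the local model frame

class JitterFilter {
public:
    explicit JitterFilter(double minMoveMetres = 2.0) : minMove_(minMoveMetres) {}

    // Returns the position the camera should use for the incoming GPS fix.
    Position update(const Position& fix) {
        double de = fix.east - current_.east;
        double dn = fix.north - current_.north;
        if (!initialised_ || std::sqrt(de * de + dn * dn) >= minMove_) {
            current_ = fix;                // accept the update
            initialised_ = true;
        }
        return current_;                   // otherwise keep the camera still
    }

private:
    Position current_ {0.0, 0.0};
    double   minMove_;
    bool     initialised_ = false;
};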
Taking into consideration the fact that the normal human viewing angle is about 60◦ and the application supports angles in the range from 0◦ to 90◦ , wide angles (including more objects of the landscape) can be interactively obtained. This can be extremely useful in cases where a user tries to navigate between tall buildings, having low visibility. We are currently implementing two different technologies for presenting 3D maps on PDA interfaces, involving VRML and Managed Direct3D Mobile (MD3DM). The first solution operates as a stand-alone mobile application and uses VRML technology combined with GPS for determining the position and a digital compass for calculating orientation. Figure 4: VR navigation in City Universitys campus Figure 4 illustrates how the PDA-based navigation inside a virtual environment can be performed. Specifically, stylus interactions can be used to navigate inside a realistic virtual representation of City University’s campus. Alternatively, menu interactions can be used as another medium for performing navigation and wayfinding tasks. In terms of performance, the framerate per second (FPS) achieved varies depending on the device capabilities. For example, using an HTC Universal device the efficiency ranges between 3 to 5 FPS while in a Dell Axim X51v PDA (with a dedicated 16 MB graphics accelerator) the efficiency ranges between 12 to 15 FPS. The second interface is based on MD3DM that operates as a separate mode, with the aim of handling the output from the GPS/compass automatically providing sufficient functionality to generate mobile VR applications. Compared to the VRML interface, the major advantage of MD3DM is that it takes full advantage of graphics hardware support and enables the development of highperformance three-dimensional rendering [LRBO06]. On the other hand, the major disadvantage of MD3DM is that the Application Programming Interface (API) is low level and thus a lot of functionality which is standard in VRML has to be re-implemented. 6 Preliminary Evaluation The aims of the evaluation of the VR prototype included assessment of the user experience with particular focus on interaction via movement, identification of specific usability issues with this type of interaction, and to stimulate suggestions regarding future directions for research and development. A ’thinking aloud’ evaluation strategy was employed [DFAB04]; this form of observation involves participants talking through the actions they are performing, and what they believe to be happening, whilst interacting with the system. This qualitative form of evaluation is highly appropriate for small numbers of participants testing prototype software: Dix et al, [DFAB04] suggested that the majority of usability problems can be discovered from testing in this way. In addition, Tory and M¨oller [TM05] argued that formal laboratory user studies can effectively evaluate visualisation when a small sample of expert users is used. The method used for the evaluation of our VR prototype was based on the Black Box technique which offers the advantage that it does not require the user to hold any low-level information about the design and implementation of the system. The usertesting took place at City University campus which includes building structures similar to the surrounding area with urn:nbn:de:0009-6-7720, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 5 eight subjects in total (testing each one individually). 
All subjects had a technical background and some were familiar with PDAs. Their age varied between 25 and 55. For each test, each subject followed a predetermined path represented by a highlighted line. Before the start of the walk, the GPS receiver was turned on and flow of data was guaranteed between it and the ’Registration’ entity of the system. The navigational attributes that were qualitatively measured include the: user perspective, movement with device and decision points. 6.1 User Perspective The main point of investigation was to test whether the user can understand where they are located in the VR scene, in correspondence to the real world position. An examination of the initial orientation and level of immersion was also evaluated after minimum interaction with the application and understanding of the available options. The information that was obtained by the users was concerning mainly four topics including: level-ofdetail (LOD), user-perspective, orientation and field-of-view (FOV). Most of the participants agreed that the LOD is not sufficiently high for a prototype navigational application. Some concluded that texture based models would be a lot more appropriate but others expressed the opinion that more abstract, succinct annotations would help, at a different level (i.e. A to Z abstract representations). Both groups of answers can fit in the same context, if all interactions could be visualised from more than one perspective. A suggested improvement was to add geo-bookmarks (also known as hotspots) that would embed information about the nature of the structures or even the real world functionality. As far as the ’user-perspective’ attribute is concerned, each user expressed a different optimal solution. Some concluded that more than one perspective is required to fully comprehend their position and orientation. Both perspectives, the egocentric and the allocentric, are useful during navigation for different reasons [LGM+05] and under different circumstances. During the initial registration, it would be more appropriate to view the model from an allocentric point of view (which would cover a larger area) and by minimising the LOD just to include annotations over buildings and roads. This proved easier to increase the level of immersion with the system but not being directly exposed to particular information such as the structure of the buildings. In contrast, an egocentric perspective is considered productive only when the user was in constant movement. When in movement, the VR interface retrieves many updates and the number of decision points is increased. Further studies should be made on how the system would assist an everyday user, but a variation on the user perspective is considered useful in most cases. The orientation mechanism provided by the VR application consists of two parts. The first maintains the user’s previous orientation whilst the second restores the camera to the predefined orientation (which is parallel to the ground). Some users noted that when angle direction points towards the ground gives better appreciation of the virtual navigation. Another subject that the users agree in is the occurrence of fast updates. This can make it difficult to navigate, because the user needs to align the camera on three axes and not two. Based on our experiments we noticed that the used orientation mechanisms are inadequate for navigational purpose and it is imperative that the scene should be aligned in the same direction as the device in the real world. 
Furthermore, all participants appreciated the usermaintained FOV. They agreed that it should be wide enough to include as much information, on the screen, as possible. They added that in the primary viewing angle, there should be included recognisable landmarks that would aid the user comprehend the initial positioning. One mentioned that the orientation should stay constant between consecutive decision points, and hence should not be gesturebased. Most users agreed that the functionality of the VR interface provides a wide enough viewing angle able to recognise some of the surroundings even when positioned between groups of buildings with low detail level. 6.2 Movement with the Device The purpose of this stage was to explore how respondents interpreted their interaction with the device, whilst moving. The main characteristics include the large number of updates as well as the change of direction followed by the user. The elements, which are going to be discussed, are mainly considered with the issues of making the navigation easier, the use of the most appropriate perspective, and the accuracy of the underlying system as well as the performance issues that drive the application. One important issue is to consider the inheritance of a specific perspective for urn:nbn:de:0009-6-7720, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 5 use throughout the navigation process. Some participants mentioned the lack of accurate direction waypoints that would assist route tracking. A potential solution is to consider the adoption of a user-focused FOV during navigation using a simple line on the surface of the model. However, this was considered partially inadequate because the user expects more guidance when reaching a decision point. Some participants suggested to use arrows on top of the route line which would be either visible for the whole duration of the movement or when a decision point was reached. In addition, it was positively suggested that the route line should be more distinct, minimising the probability of missing it while moving. Some expressed the opinion that the addition of recognisable landmarks would provide a clearer cognitive link between the VR environment and the real world scene. However, the outcomes of this method are useful only for registering the users in the scene and not for navigation purposes. A couple of participants included in their answers that the performance of the system was very satisfactory. This is an important factor to consider, because in the change of the camera position occurs when new data is being retrieved from the external sensor. The characterisation, of the position transition, as smooth reflects that the main objective of any actor is to obtain new information about his position, at the time it is available. The latency that the system supports is equal to the latency the H\W receiver obtains meaning that the performance of the application is solely dependent on the quality of operating hardware. The adaptation to a mobile operating system (i.e. Windows Mobile 5.0) would significantly increase the latency of the system, since devices are not powerful enough to handle heavy operations. Moreover, opinions, about the accuracy of the system, differ. 
One of respondents was convinced that the accuracy, provided by the GPS receiver, was inside the acceptable boundaries, which reflected the GPS specifications supporting that the level of accuracy between urban canyons was reflecting the correspondence to reality, in a good manner. A second test subject revealed that the occlusion problem was in effect due to GPS inaccuracy reasons underlining that when the GPS position was not accurate enough, the possibility to miss the route line or any developed direction system increased. Both opinions are equally respected and highlighted the need for additional feedback. 6.3 Decision Points The last stage is concerned with the decision points and the ability of the user to continue the interaction with the system when it reaches them. A brief analysis of the users’ answers is provided to identify ways forward with the design, but full analysis will be published in a separate publication. As described previously, the user has the feeling of full freedom to move at any direction, without being restricted by any visualisation limitations of the computergenerated environment. Nonetheless, participants may feel overwhelmed by the numerous options they may have available and be confused about what action to take next. We take into consideration that large proportion of users is not sufficiently experienced in 3D navigational systems and the appropriate time is given to them to familiarise with the system. Preliminary feedback suggests that some users would prefer the application to be capable of manipulating the users perspective automatically, when a decision point (or, an area close to it) is reached. This should help absorb more information about the current position as well as supporting the future decision making process. Another interesting point relates to the provision of choice to the user in the future to accommodate sudden, external factors that may allow them to detour from a default path. Partially, some of these requirements would be met if the user could manually add geo-bookmarks in the VR environment representing points in space with supplementary personal context. The detailed analysis of the responses will be taken into account in further developments of the system, which is underway. 7 Augmented Reality Navigation The AR interface is the alternative way of navigating in the urban environment using mobile systems. Unlike the VR interface, which uses the hardware sensor solution (a GPS component and a digital compass), the AR interface uses a webcamera (or video camera) and computer vision techniques to calculate position and orientation. Based on the findings of the previous section and a previously developed prototypes [Lia05], [LGM+05], a high-level AR interface has been designed for outdoor use. The major difference with other existing AR interfaces, such as the ones described in [FMHW97], [TD+98], [RS04] and [RCD04], is that our approach allows for the combiurn:nbn:de:0009-6-7720, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 5 nation of four different types of navigational information: 3D maps, 2D maps, text and spatial sound. In addition, two different modes of registration have been designed and experimented upon, based upon fiducial points and feature recognition. The purpose for the exercise was to understand some of the issues involved in two of the key aspects of urban navigation: wayfinding and commentary. 
In the fiducial points recognition mode, the outdoor environment needs to be populated with fiducials prior to the navigational experience. Fiducials are placed at points-of-interest of the environment, such as corners of buildings, ends of streets, etc., and play a significant role in the decision making process. In our current implementation we have adopted ARToolKit's template matching algorithm [KB99] for detecting marker cards and we are trying to extend it to natural feature detection. Features that we currently detect can come in different shapes, such as square, rectangular, parallelogram, trapezium and rhombus [Lia05], similar to shapes that exist in the environment. In addition, it is not convenient, and sometimes even impossible, to populate large urban areas with fiducials. Therefore, we have experimentally used road signs as fiducials to compute the user's pose [LRBO06]. Road signs are usually printed in black on a white background. Also, they are usually placed at decision points, such as the beginnings and ends of streets, corners and junctions. As a result, if a high-resolution camera is used to capture the object, it is relatively easy to detect the road signs, as illustrated in Figure 5. One of the known limitations of this technique is that road signs are sometimes not in good condition, which makes it more difficult to recognise a pattern. Also, the size of the road signs is usually fixed (depending on the urban area), severely limiting the number of operations that can be performed on them. An example screenshot of how road signs can be used in practice as fiducial points (instead of using pre-determined markers) during urban navigation is illustrated in Figure 5. Alternatively, in the feature recognition mode, the user 'searches' for natural features of the real environment to serve as fiducial points and points-of-interest, respectively. Distinctive natural features like door entrances, windows, etc., have been experimentally tested to see whether they can be used as 'natural markers'. Figure 6 shows the display presented to a user navigating in City University's campus, to acquire location and orientation information using 'natural markers'. Figure 5: Pattern recognition of road signs: (a) original image; (b) detected image. As soon as the user turns the camera (on a mobile device) towards these predefined natural markers, audio-visual information (3D arrows, textual and/or auditory information) can be superimposed on the real-scene imagery (Figure 7), thus satisfying some of the requirements identified in section 6.1. User studies for tour guide systems showed that visual information could sometimes distract the user [BGKF02], while audio information could be used to decrease the distraction [WAHS01]. With this in mind, we have introduced a spatially referenced sound into the interface, to be used simultaneously with the visual information. In our preliminary test case scenario, a pre-recorded sound file is assigned to the corresponding fiducial point for each point-of-interest. As the user approaches a fiducial point, commentary information can be spatially identified; the closer the user is to the object, the louder the volume of the commentary audio information. Depending on the end-user's preferences or needs, the system allows for a different type of digital information to be selected and superimposed.
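A minimal sketch of the distance-based commentary volume just described is given below. The linear falloff and the range values are illustrative assumptions rather than measurements from the system; the returned gain would simply scale the playback volume of the pre-recorded commentary as position updates arrive.

// Sketch: distance-based gain for commentary audio attached to a fiducial point.
// Volume is maximal near the fiducial, fades linearly, and is silent beyond the
// audible range; the 2 m / 30 m values are illustrative only.
#include <cmath>

struct Point2D { double east, north; };

double commentaryGain(const Point2D& user, const Point2D& fiducial,
                      double fullVolumeRange = 2.0,     // metres at full volume
                      double audibleRange    = 30.0) {  // metres until silent
    double de = user.east  - fiducial.east;
    double dn = user.north - fiducial.north;
    double d  = std::sqrt(de * de + dn * dn);

    if (d <= fullVolumeRange) return 1.0;
    if (d >= audibleRange)    return 0.0;
    // Linear fade between the two ranges.
    return (audibleRange - d) / (audibleRange - fullVolumeRange);
}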
For example, for visually impaired users audio information may be preferred over visual information, or a combination of the two may be found optimal [Lia05]. A coarse comparison between the fiducial points mode and the feature recognition mode is shown in Table 1. Further testing is underway and the detailed analysis will be published in a separate publication.

Table 1: Fiducial vs feature recognition mode

Recognition Mode    Range      Error    Robustness
Fiducial            0.5∼2 m    Low      High
Feature             2∼10 m     High     Low

In the feature recognition mode, the advantage is that the range within which it may operate is much greater because it does not require preparation of the environment. Thus, it can be applied when wayfinding is the focus of the navigation. However, the accuracy of the position and orientation information provided by the natural feature tracking algorithm used in this scenario is currently limited and requires improvement. In contrast, the fiducial points recognition mode offers the advantage of a very low error during the tracking process (i.e. detecting fiducial points). However, the limited space of operation, due to the need to populate the area with tags, makes it more appropriate for confined areas and commentary navigation modes. The research suggests, however, that the combination of the fiducial and feature recognition modes allows the user to pursue both wayfinding and commentary-based navigation in urban environments within a single application.

Figure 6: Road sign pedestrian navigation

8 Discussion

After completing the development of a portable prototype application (based on a laptop computer), specific requirements to enhance the user interface and interaction mechanisms on a mobile device (PDA) were identified. Through this research, it was found necessary to retrieve and visualise spatio-temporal content from a remote server in order to support real-time operation and meet the information needs of a user. This was accomplished by transmitting geographic coordinates (i.e. GPS input) to the server side and automatically retrieving geo-referenced information in the form of VRML 3D maps. The 3D content was designed to cover an area encompassing the current position of the user and the position of one or more actors/points-of-interest in their proximity. The quality and accuracy of these models proved good, while the techniques used are custom-developed and based on a semi-automated routine implemented in a specialised software development environment.

9 Conclusions

This paper addresses how virtual and augmented reality interface paradigms can provide enhanced location-based services for urban navigation and wayfinding. The VR interface operates on a PDA and presents a realistic and geo-referenced graphical representation of the localities of interest, coupled with sensory information on the location and orientation of the user. The knowledge obtained from the evaluation of the VR navigational experience has been used to inform the design of the AR interface, which operates on a portable computer and overlays additional wayfinding information onto the captured patterns from the real environment. Both systems calculate the user's position and orientation, but using a different methodology.

Figure 7: Detecting door entrances
The VR interface relies on a combination of GPS and digital compass data, whereas the AR interface depends only on detecting features of the immediate environment. In terms of information visualisation, the VR interface can only present 3D maps and textual information, while the AR interface can, in addition, handle other related geographical information, such as digitised maps and spatial auditory information. Work on both modes and interfaces is in progress and we are also considering a hybrid approach, which aims to find a balance between the use of hardware sensors (GPS and digital compass) and software techniques (computer vision) to achieve the best registration results. In parallel, we are designing a spatial database to store our geo-referenced urban data, which will feed the client-side interfaces as well as the routing algorithms we are developing to provide more services to mobile users. The next step in the project is a thorough evaluation process, using both qualitative and quantitative methods. The results will be published in due course.

10 Acknowledgments

The work presented in this paper is conducted within the LOCUS project, funded by EPSRC through the Location and Timing (KTN) Network. We would also like to thank our partner on the project, GeoInformation Group, Cambridge, for making the entire database of the City of London buildings available to the project. The invaluable input from David Mountain on resolving the sensor fusion issues and from Christos Gatzidis in generating components of the 3D content is gratefully acknowledged.

References

[BC05] Stefano Burigat and Luca Chittaro, Location-aware visualization of VRML models in GPS-based mobile guides, Proceedings of the 10th International Conference on 3D Web Technology, ACM Press, 2005, ISBN 1-59593-012-4, pp. 57–64.

[BGKF02] Jenna Burrell, Geri K. Gay, Kiyo Kubo, and Nick Farina, Context-aware computing: a test case, Proceedings of UbiComp, Lecture Notes in Computer Science Vol. 2498, Springer, 2002, ISBN 3-540-44267-7, pp. 1–15.

[DFAB04] Alan J. Dix, Janet E. Finlay, Gregory D. Abowd, and Russell Beale, Human-Computer Interaction, 3rd ed., Prentice Hall, Harlow, 2004, ISBN 0-13-046109-1.

[DS93] Rudy P. Darken and John L. Sibert, A toolset for navigation in virtual environments, Proceedings of the 6th Annual ACM Symposium on User Interface Software and Technology (New York, NY, USA), ACM Press, 1993, ISBN 0-89791-628-X, pp. 157–165.

[DS96] Rudy P. Darken and John L. Sibert, Navigating Large Virtual Spaces, International Journal of Human-Computer Interaction 8 (1996), no. 1, 49–72, ISSN 1044-7318.

[FMHW97] Steven Feiner, Blair MacIntyre, Tobias Höllerer, and Anthony Webster, A touring machine: Prototyping 3D mobile augmented reality systems for exploring the urban environment, Proceedings of the 1st IEEE International Symposium on Wearable Computers, IEEE Computer Society, 1997, ISBN 0-8186-8192-6, pp. 74–81.

[GSB+04] William G. Griswold, Patricia Shanahan, Steven W. Brown, Robert S. Boyer, Matt Ratto, R. Benjamin Shapiro, and Tan Minh Truong, ActiveCampus: Experiments in Community-Oriented Ubiquitous Computing, Computer 37 (2004), no. 10, 73–81, ISSN 0018-9162.

[HLSM03] Doris Höll, Bernd Leplow, Robby Schönfeld, and Maximilian Mehdorn, Is it possible to learn and transfer spatial information from virtual to real worlds?, in: Spatial Cognition III, Lecture Notes in Computer Science Vol. 2685, Springer, Berlin, 2003, ISBN 3-540-40430-9, pp. 143–156.
[KB99] Hirokazu Kato and Mark Billinghurst, Marker Tracking and HMD Calibration for a Video-Based Augmented Reality Conferencing System, Proceedings of the 2nd IEEE and ACM International Workshop on Augmented Reality, IEEE Computer Society, 1999, ISBN 0-7695-0359-4, pp. 85–94.

[KBZ+01] Christian Kray, Jörg Baus, Hubert D. Zimmer, Harry R. Speiser, and Antonio Krüger, Two path prepositions: along and past, Proceedings of the International Conference on Spatial Information Theory: Foundations of Geographic Information Science (London) (D. Montello, ed.), Lecture Notes in Computer Science Vol. 2205, Springer, 2001, ISBN 3-540-42613-2, pp. 263–277.

[KK02] M. Kulju and E. Kaasinen, Guidance Using a 3D City Model on a Mobile Device, Workshop on Mobile Tourism Support, Mobile HCI 2002 Symposium, Pisa, Italy, Sept. 17th, 2002.

[Kla98] R. L. Klatzky, Allocentric and egocentric spatial representations: Definitions, distinctions, and interconnections, in: Spatial Cognition – An Interdisciplinary Approach to Representation and Processing of Spatial Knowledge, Springer, Berlin, 1998, ISBN 3-540-64603-5, pp. 1–18.

[LGM+05] Fotis Liarokapis, Ian Greatbatch, David Mountain, Anil Gunesh, Vesna Brujic-Okretic, and Jonathan Raper, Mobile Augmented Reality Techniques for GeoVisualisation, Proceedings of the 9th International Conference on Information Visualisation (IV'05), IEEE Computer Society, 2005, ISBN 0-7695-2397-8, pp. 745–751.

[LGS03] K. Laakso, O. Gjesdal, and J. R. Sulebak, Tourist information and navigation support by using 3D maps displayed on mobile devices, Workshop on Mobile Guides, Mobile HCI 2003 Symposium, Udine, Italy, 2003.

[Lia05] Fotis Liarokapis, Augmented Reality Interfaces – Architectures for Visualising and Interacting with Virtual Information, Ph.D. thesis, Department of Informatics, School of Science and Technology, University of Sussex, 2005, Sussex theses S 5931.

[LRBO06] Fotis Liarokapis, Jonathan Raper, and Vesna Brujic-Okretic, Navigating within the urban environment using Location and Orientation-based Services, European Navigation Conference, 7–10 May, Manchester, UK, 2006.

[MA01] Christy R. Miller and Gary L. Allen, Spatial frames of reference used in identifying directions of movement: an unexpected turn, Proceedings of the International Conference on Spatial Information Theory: Foundations of Geographic Information Science (London) (D. Montello, ed.), Lecture Notes in Computer Science Vol. 2205, Springer, 2001, ISBN 3-540-42613-2, pp. 206–216.

[MD01] Pierre-Emmanuel Michon and Michel Denis, When and why are visual landmarks used in giving directions?, in: Proceedings of the International Conference on Spatial Information Theory: Foundations of Geographic Information Science, Lecture Notes in Computer Science Vol. 2205, Springer, London, 2001, ISBN 3-540-42613-2, pp. 292–305.

[MEH99] A. M. MacEachren, R. Edsall, and D. Haug, Virtual environments for Geographic Visualization: Potential and Challenges, Proceedings of the ACM Workshop on New Paradigms in Information Visualization and Manipulation, ACM Press, 1999, ISBN 1-58113-254-9, pp. 35–40.

[oTI04] DTI Department of Trade and Industry, Location-based services: understanding the Japanese experience, Global Watch Mission Report, http://www.oti.globalwatchonline.com/online pdfs/36246MR.pdf, 2004, visited: 10/02/2006.
[Rap00] Jonathan F. Raper, Multidimensional geographic information science, Taylor and Francis, London, 2000, ISBN 0-7484-0506-2.

[RCD04] T. Romão, N. Correia, and E. Dias, ANTS – Augmented Environments, Computers & Graphics 28 (2004), no. 5, 625–633, ISSN 0097-8493.

[RS04] Gerhard Reitmayr and Dieter Schmalstieg, Collaborative Augmented Reality for Outdoor Navigation and Information Browsing, Proceedings of the Symposium on Location Based Services and TeleCartography, Vienna, Austria, January 2004, pp. 31–41.

[RV01] I. Rakkolainen and T. Vainio, A 3D City Info for Mobile Users, Computers & Graphics 25 (2001), no. 4, 619–625, ISSN 0097-8493.

[Sho01] M. Jeanne Sholl, The role of self-reference systems in spatial navigation, Proceedings of the International Conference on Spatial Information Theory: Foundations of Geographic Information Science (London) (D. Montello, ed.), Lecture Notes in Computer Science Vol. 2205, Springer, 2001, ISBN 3-540-42613-2, pp. 217–232.

[SW75] A. W. Siegel and S. H. White, The development of spatial representations of large-scale environments, Advances in Child Development and Behaviour 10 (1975), 9–55, ISSN 0065-2407.

[TD+98] B. H. Thomas, V. Demczuk, W. Piekarski, D. Hepworth, and B. Gunther, A Wearable Computer System with Augmented Reality to Support Terrestrial Navigation, Proceedings of the 2nd International Symposium on Wearable Computers, IEEE and ACM, 1998, ISBN 0-8186-9074-7, pp. 168–171.

[TM05] M. Tory and T. Möller, Evaluating Visualizations: Do Expert Reviews Work?, IEEE Computer Graphics and Applications 25 (2005), no. 5, 8–11, ISSN 0272-1716.

[Tve81] B. Tversky, Distortions in memory for maps, Cognitive Psychology 13 (1981), 407–433, ISSN 0010-0285.

[WAHS01] Allison Woodruff, Paul M. Aoki, Amy Hurst, and Margaret H. Szymanski, Electronic Guidebooks and Visitor Attention, Proceedings of the 6th International Cultural Heritage Informatics Meeting (ICHIM'01), Milan, Italy, Sep. 2001, ISBN 1-885626-24-X, pp. 437–454.

[WSK03] Rainer Wasinger, Christoph Stahl, and Antonio Krüger, Mobile Multi-Modal Pedestrian Navigation, Second International Workshop on Interactive Graphical Communication (IGC 2003), London, 2003.

Citation: Fotis Liarokapis, Vesna Brujic-Okretic, Stelios Papakonstantinou, Exploring Urban Environments Using Virtual and Augmented Reality, Journal of Virtual Reality and Broadcasting, 3(2006), no. 5, December 2006, urn:nbn:de:0009-6-7720, ISSN 1860-2037.

8.7 Paper #7

Liarokapis, F. An Augmented Reality Interface for Visualizing and Interacting with Virtual Content, Virtual Reality, Springer, 11(1): 23-43, 2007.

Contribution (100%): Design of the architecture and implementation of the AR interface. Write-up of the paper.

ORIGINAL ARTICLE

An augmented reality interface for visualizing and interacting with virtual content

Fotis Liarokapis

Received: 15 December 2004 / Accepted: 19 October 2006 / Published online: 9 November 2006
© Springer-Verlag London Limited 2006

Abstract In this paper, a novel AR interface is proposed that provides generic solutions to the tasks involved in simultaneously augmenting different types of virtual information and processing tracking data for natural interaction.
Participants within the system can experience a real-time mixture of 3D objects, static video, images, textual information and 3D sound with the real environment. The user-friendly AR interface can achieve maximum interaction using simple but effective forms of collaboration based on combinations of human–computer interaction techniques. To prove the feasibility of the interface, indoor AR techniques are employed to construct innovative applications, with demonstration examples ranging from heritage to learning systems. Finally, an initial evaluation of the AR interface, including some initial results, is presented.

Keywords Augmented reality · Human–computer interaction · Tangible interfaces · Virtual heritage · Learning systems

F. Liarokapis, Department of Informatics, University of Sussex, Falmer, Brighton BN1 9QT, UK; Department of Information Science, City University, London EC1V 0HB, UK. e-mail: fotisl@soi.city.ac.uk; f.Liarokapis@sussex.ac.uk
Virtual Reality (2007) 11:23–43, DOI 10.1007/s10055-006-0055-1

1 Introduction

Augmented reality (AR) is an increasingly important and promising area of mixed reality (MR) and user interface design. In technical terms, it is not a single technology but a collection of different technologies that operate in conjunction, with the aim of enhancing the user's perception of the real world through virtual information (Azuma 1997). This sort of information is usually referred to as virtual, digital or synthetic information. The real world must be matched with the virtual in position and context in order to provide an understandable and meaningful view (Mahoney 1999). Participants can work individually or collectively, experiment with virtual information and interact with a mixed environment in a natural way (Klinker et al. 1997). In an ideal AR visualisation scenario, the virtual information must be mixed with the real world in real-time in such a way that the user cannot tell the difference (Vallino 1998). In cases where the virtual information looks just like the real environment, the AR visualisation can be considered the ultimate immersive system, where participants cannot become more immersed in the real environment (RE).

The term AR usually refers to one of the following definitions (Milgram and Colquhoun 1999): a class of display systems that consist of a type of head-mounted display (HMD) (Azuma 1997); systems that utilise an equivalent of an HMD, encompassing both large-screen and monitor-based displays (Milgram and Kishino 1994); and a third classification that covers any type of mixture of real and virtual environments. Overall, the majority of AR systems rely on electronic sensors or video input in order to gain knowledge of the environment (Haniff et al. 2000). All these variables make these systems more complex than systems that do not rely on sensors. Vision-based systems, on the other hand, often use markers as feature points so that they can estimate the camera pose (position and orientation). In the upcoming years, AR systems will be able to include a complete set of augmentations exploiting all of the human senses (Azuma et al. 2001). However, although there are many examples of AR systems where users can interact with and manipulate virtual content, and even create virtual content within some AR environments, one of their major constraints is the lack of ability to allow participants to control multiple forms of virtual information in a number of different ways.
To a great extent, this deficiency derives mainly from the lack of robustness of currently existing AR interface systems. At this stage, this can be dealt with by using a user-friendly interface that allows users to position audio–visual information anywhere inside the physical world. Since the pose can be easily estimated through an existing vision-based tracking system such as the well-known ARToolKit (Kato et al. 2000a, b), the focus of this research is to provide effective solutions for interactive indoor AR environments.

Vision-based AR interface environments depend highly on four key elements. The first two relate to marker implementation and calibration techniques. The remaining two are interrelated with the construction of software user interfaces that allow the effective visualisation and manipulation of the virtual information. The integration of such interfaces into AR systems can reduce the complexity of the human–computer interaction using implicit contextual input information (Rekimoto and Nagao 1995). Human–computer interaction techniques can offer greater autonomy when compared with traditional windows-style interfaces. Although some work has been performed on the integration of such interfaces into AR systems (Feiner et al. 1993; Haller et al. 2002; MacIntyre et al. 2005), the design and implementation of an effective AR system that can realistically deliver audio–visual information in a user-friendly manner is a difficult task and an area of continuous research. However, it is very difficult even for technologists to create AR experiences and to eliminate the barriers (MacIntyre et al. 2005) that prevent users from creating new AR applications.

To address the above issues, a prototype AR interface is proposed for assisting users that have some virtual reality experience in creating AR applications quickly and effectively. The main novel contributions of this paper include the following:

• Simultaneous and realistic 3D audio–visual augmentation in real-time performance;
• Implementation and combination of five different ways of interacting with the virtual content;
• Design and implementation of a high-level user-centred interface that provides accurate and reliable control of the AR scene;
• Two innovative applications: one for cultural heritage and one for higher education; and
• An initial evaluation regarding the overall effectiveness of the system.

In the remainder of this paper, we describe our system starting with Sect. 2, which gives a historical overview of AR interfaces. In Sect. 3, the architecture of the prototype AR interface is presented in detail. Section 4 presents the various calibration approaches followed to calibrate our camera sub-system accurately. Section 5 presents realistic augmentation techniques that can be applied in real-time performance. Section 6 proposes five different ways of interacting with the AR scene, while in Sect. 7 two application scenarios are presented. In Sect. 8, the results from an initial evaluation are presented, whereas Sect. 9 summarises the key findings and the current status of the research and suggests future work.

2 Historical overview of AR interfaces

One of the earliest applications involved an experimental AR system that supports a full X11 server on a see-through HMD. The display overlays a selected portion of the X bitmap on the user's view of the world, creating an X-based AR. Three different types of windows were developed: surround-fixed windows, display-fixed windows and world-fixed windows.
The performance of the system was in the range of 6–20 frames-per-second (FPS). A fast display server was developed supporting multiple overlaid bitmaps, with the ability to index into and display a selected portion of a larger bitmap (Feiner et al. 1993). EMMIE (Butz et al. 1999) is another experimental hybrid user interface designed for a collaborative augmented environment that combines various different technologies and techniques, such as virtual components (i.e. 3D widgets) and physical objects (tracked displays, input devices). The objects in the system can be moved among various types of displays, ranging from see-through HMDs to additional 2D and 3D displays. These vary from palm-sized to wall-sized depending on the nature of the task.

The MagicBook (Billinghurst et al. 2001) and the Tiles system (Poupyrev et al. 2002) are two of the most well-known AR interfaces based on the ARToolKit. The Tiles system proposes a way of creating an AR workspace blending together virtual and physical objects. The interface combines the advantages (power and flexibility) of computing environments with the comfort and awareness of the traditional workplace (Poupyrev et al. 2002). On the other hand, the MagicBook uses a real book to transfer users from reality to virtuality. Virtual objects are superimposed on the pages of the book and users can interact with the augmented scene (Billinghurst et al. 2001). Another example of an AR tangible interface is a tabletop system designed for virtual interior design (Kato et al. 2000a). One or multiple users can interact with the augmented scene, which consists of virtual furniture, and manipulate the virtual objects.

MARE (Grasset and Gascuel 2002) is a collaborative system that mixes AR techniques with human–computer interaction techniques, in order to provide a combination of natural metaphors of communication (voice, gesture, expression) with virtual information (simulation, animation, persistent data). The architecture of the system is based on OpenGL Performer and XML configuration files and it can be easily adapted to many application domains. Another interesting workspace is a wearable AR generic platform that supports true stereoscopic 3D graphics (Reitmayr and Schmalstieg 2001). The system supports six degrees-of-freedom (DOF) manipulations of virtual objects in the near field using a pen and pad interface. Slay et al. (2001) developed an AR system that extends interactions from a traditional desktop interaction paradigm to a tangible AR paradigm. A range of issues related to the rapid assembly and deployment of adaptive visualisation systems was investigated, and three different techniques for the task of switching the attributes of the virtual information in AR views were presented. Furthermore, the AMIRE project (Haller et al. 2002) aims to enable rapid prototyping through vision-based AR for users without detailed knowledge of the underlying base technologies of computer graphics and AR. AMIRE uses a component-oriented technology consisting of a reusable GEM collection, a visual authoring tool and an object tracking system based on the ARToolKit library. Another system that allows users to create AR experiences is the designer's augmented reality toolkit (DART) (MacIntyre et al. 2005). The system is based on the Macromedia Director multimedia-programming environment to allow a user to visually create complex AR applications, as well as providing support for trackers, sensors and the camera.
Although most of the above systems describe generic frameworks that allow for AR and/or MR applications, they have not focused on designing a high-level, user-focused interface that can deliver audio–visual information. The DART system is the most similar to this approach, but it is based on a commercial multimedia package and thus it is addressed to designers and not general-purpose developers. However, this sometimes limits the capabilities of the generated applications because they will be limited to the specific package (i.e. Director). On the contrary, this work is targeting developers who want to develop AR applications and use higher-level tools than those that currently exist (i.e. ARToolKit).

3 Architecture of the system

The scope of the AR interface is to provide all the necessary tools for developers to generate user-specific AR applications (see Sect. 7). They will select which sort of functionality is useful, and either use it as it is or extend it to fit the needs of the application. Based on previous prototypes (Liarokapis et al. 2004a, b), a tangible AR interface focused on superimposing five different types of virtual information and allowing users to interact using a combination of five different interaction techniques was designed and implemented. The system allows for the natural arrangement of virtual information anywhere inside the interior of a building or any other type of indoor environment. A diagrammatic overview of the operation of the system is presented in Fig. 1.

In the simplest configuration, a laptop computer with a USB web-camera and a set of trained marker cards are employed. The most complex configuration performed for the purpose of this research included two cy-visor HMDs, four LCD monitors, an 18 in. iiyama touch screen and a 42 in. plasma screen (Sony PFM-42V1N). Depending on the capabilities of the video splitter, different configurations can be supported depending on the level of immersion and collaboration required.

Fig. 1 Overview of operation of the system

For example, for some applications (i.e. museum environments) the plasma screen could provide an ideal cognitive environment for collaboration, while the touch screen could be preferred as an effective means for user-centred interaction. All displays have been used to present the capabilities of the system in various demonstrations and other dissemination events, and the plasma screen was found to be the most appealing one. To further increase the level of interaction, a 3D mouse is integrated into the system, allowing users to manipulate the virtual information in a natural way in six DOF (see Sect. 6.5). Audio–visual augmentation techniques have also been implemented (see Sect. 5) in order to achieve a realistic visualisation, such as matching virtual lighting to real lighting, texture mapping techniques, shading and clipping. To further improve the quality of the visualisation, planar shadows and reflections are generated in real-time so that the user can get a more realistic perception of the augmented information with respect to the real world. It is worth mentioning that the software and hardware infrastructure of the prototype AR interface developed in this research is based on off-the-shelf hardware components and low-priced software resources. The hierarchy of the software architecture is presented in Fig. 2.
The blue boxes represent the off-line tools used, which form the basis of the implementation. The technologies in the orange boxes show the software components implemented for the creation of the AR interface. A brief overview of how each technology was used is presented in the following sections.

Fig. 2 Software technologies

3.1 Off-line technologies

The off-line software technologies include a number of commercial tools that must be used before the execution of the AR interface to prepare the content used in the augmentation (i.e. virtual information) as well as the AR environment. Specifically, ARToolKit's tracking libraries were used for the calibration of the camera (see Sect. 4.2) as well as for the training of new markers designed for the needs of our research. Image processing software (Adobe Photoshop) was used for creating appropriate 2D images that were used as part of the visualisation process (see Sect. 5.2) and for generating textures for the 3D models. To create professional-quality 3D models, 3ds max was employed to digitise the models and export them into 3ds format. Next, Deep Exploration was utilised to convert 3ds models into a number of formats including VRML and ASCII. CoolEdit Pro served as a useful off-line tool to record and process all the necessary wave samples required for the augmentation. WinHex was helpful for analysing the robustness of the markers existing in the AR environment. Finally, the Calibration Toolbox for Matlab was used to improve the camera parameters calculated by ARToolKit (Sect. 4.2).

3.2 Real-time technologies

Real-time software technologies consist of all the software libraries that have been integrated into the single application that comprises the AR interface. The Microsoft Vision software development kit (SDK) was used as a basic platform to develop an interface between the video input (from video or web cameras) and the rest of the AR application. Based on this, only ARToolKit's (Kato et al. 2000b) tracking library (AR32.lib) was integrated, to calculate the camera pose in real-time. On top of the tracking library a high-level computer graphics rendering engine was implemented in C++ that can perform mathematical operations between 3D vectors and matrices. Standard graphics functionalities like shading, lighting and colouring were based on the OpenGL API (Woo et al. 1999), while more advanced functions like shadowing and reflection were implemented in the rendering engine to provide a platform for the rapid development of AR applications (Sect. 7). GLUT (OpenGL utility toolkit) (Angel 2003) was initially used to create a user interface and to control the visualisation window of the AR interface. In addition, it was used for the textual augmentations (Sect. 5.4) because it provides sufficient support for bitmap and stroke fonts. However, GLUT provides only a minimum set of functions for the user to control the visualisation and therefore a more advanced solution was implemented based on MFC (Microsoft foundation classes). The advantage of implementing a windows-based interface is that it allows users to familiarise themselves quickly with the GUI (graphical user interface), and it provides menus and toolbars with which to implement any type of user interaction. Finally, the OpenAL (open audio library) API was employed to generate audio in a simulated 3D space (Sect. 5.5) because its coding style is similar to that of OpenGL and it can be considered an extension of it.
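To make the hand-over between the tracking library and the rendering engine more concrete, the following minimal per-frame sketch is provided for this commentary; it is not code from the system itself, and the pattern id, marker width and threshold values are illustrative assumptions. In the interface described here, the equivalent logic would sit behind the MFC-driven visualisation window and feed the C++ rendering engine.

    // Minimal per-frame AR update (illustrative sketch, not the system's code):
    // detect a trained marker with ARToolKit and load its pose into OpenGL
    // before drawing the virtual content. Video capture, pattern training and
    // error handling are omitted for brevity.
    #include <AR/ar.h>
    #include <AR/gsub.h>
    #include <GL/gl.h>

    static int    g_pattId        = 0;             // id returned by arLoadPatt() at start-up
    static double g_pattWidth     = 80.0;          // marker width in mm (assumption)
    static double g_pattCentre[2] = {0.0, 0.0};

    void renderAugmentedFrame(ARUint8* videoFrame)
    {
        ARMarkerInfo* markerInfo = nullptr;
        int           markerNum  = 0;
        const int     threshold  = 100;            // binarisation threshold (assumption)

        // 1. Detect all candidate markers in the current video frame.
        if (arDetectMarker(videoFrame, threshold, &markerInfo, &markerNum) < 0 || markerNum == 0)
            return;                                // nothing to augment in this frame

        // 2. Keep the most confident detection of our trained pattern.
        int best = -1;
        for (int i = 0; i < markerNum; ++i)
            if (markerInfo[i].id == g_pattId &&
                (best < 0 || markerInfo[i].cf > markerInfo[best].cf))
                best = i;
        if (best < 0)
            return;

        // 3. Estimate the 3x4 camera-to-marker transformation.
        double pattTrans[3][4];
        arGetTransMat(&markerInfo[best], g_pattCentre, g_pattWidth, pattTrans);

        // 4. Convert it to an OpenGL modelview matrix and draw the virtual content.
        double glPara[16];
        argConvGlpara(pattTrans, glPara);
        glMatrixMode(GL_MODELVIEW);
        glLoadMatrixd(glPara);
        // drawVirtualContent();  // 3D model, image, video, text or a sound anchor
    }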
4 Tracking

A key objective of this research was to provide a robust platform for developing innovative AR interfaces. However, to achieve the best tracking (with commercial web-cameras), accurate calculation of the camera parameters is required. As mentioned before, ARToolKit's tracking library was preferred because it seems to provide accurate results with regard to the estimation of the location of the object, especially at small distances and in cases where the camera is not moving fast. However, the major flaw of this approach is that all fiducials must be visible continuously. Also, in un-calibrated environments with poor lighting conditions, tracking might not work at all. In this section, the results obtained from measuring ARToolKit's error and the algorithms used for calibrating the camera (calculating the camera parameters) are briefly analysed.

4.1 Measuring ARToolKit's error

ARToolKit was originally designed for small applications working over a limited range of operation, usually around one metre. In these applications the distance between the marker and the user is often small, so most of the errors that occur are not easily detectable. But in wide-area applications its positioning accuracy is not very robust: at distances between 1 and 2.5 m the error in the x and y values increases proportionally with the distance from the marker (Malbezin et al. 2002). Because this research is focused on indoor environments, it is very important to work accurately at small distances of up to around 1 m and reasonably well for up to 3 m. For this reason an experimental measurement of the accuracy of ARToolKit's tracking libraries was performed in the laboratory environment. The aim of the experiment was to evaluate the error at distances ranging between 20 and 80 cm under normal lighting conditions. The experimental apparatus of this procedure is illustrated in Fig. 3. The optimal area, which contains the least error, is the one that is perpendicular to the marker card. To allow placing the camera on specific points with high precision, a grid is positioned on the ground (Fig. 3). In addition, a rigid path was designed so that the camera cannot lose its direction while moving backwards. For each point on the grid, numerous measurements of the location of the web camera in a local coordinate system were taken. The camera is set up at the shortest operating distance (20 cm) and, after completing measurements of its position, it moves backwards in steps of 1 cm. When the camera has moved 60 cm (60 different positions) the program exits. For each position 20 measurements were taken and averaged.

Figure 4 illustrates the results of this experiment (purple line), showing that the error is proportional to the distance. At very small distances the error in the detection of the marker is small, while at larger distances the error becomes considerably bigger. It also increases proportionally to the angle of rotation when the camera does not change position but is rotated around the Y-axis. To verify that the best location is on the axis perpendicular to the camera and the marker, another set of measurements was recorded with the camera facing the marker at a variable angle (yaw) with the other two angles (pitch, roll) kept stable.

Fig. 3 Experimental setup for the measurement of ARToolKit's error
In this case, the camera was set up again in the same plane (ground plane), but the measurements were taken when the x and y values tended towards zero. It was measured that the angle at the initial position (20 cm from the wall, Fig. 3) is approximately 12°, while for the final position the angle is approximately 4°. It is worth mentioning that the second set of measurements was not taken automatically, so at each step the camera had to be manually adjusted to bring the x and y values as close as possible to zero. Figure 4 shows the results from the second experiment and illustrates that the measured error, when the camera is not pointing directly at the marker, is proportional to the distance. However, the difference is of minimal significance. This means that when the camera lies within a certain area and does not change its orientation, the error is quite small. In contrast, if the camera changes direction the error increases considerably. Figure 4 illustrates the differences in the errors produced by the experiments compared with the actual value (top dark line), and shows that the best results can be obtained when the camera is oriented to point at the centre of the marker. Even if the camera has only a small offset, the error increases linearly with the distance. Nevertheless, the tracking results are acceptable in the area of less than 1 m since the error is hardly noticeable.

4.2 Camera calibration

This section describes the procedures used in order to calculate the intrinsic and extrinsic camera parameters. The purpose of this was to define an accurate camera model that can be effectively applied to indoor AR environments. Although there are a few camera calibration techniques available for calculating the intrinsic camera parameters (Weng et al. 1992; Shi and Tomasi 1994), ARToolKit's calibration library (Kato et al. 2000b) was preferred since it works reasonably well at small distances and in cases where the camera is not moving fast. This method was originally applied to measure camera model properties such as the centre point of the camera image, the lens distortion and the camera's focal length. ARToolKit provides two software tools that can be used to calculate these camera properties: one to measure the lens distortion and the image centre point, and a second to compute the focal length of the camera (Kato et al. 2000b). Based on this, the initial calibration was performed, since it produces reasonable results for the calculation of the intrinsic camera parameters. However, the greatest limitations of this vision solution include the tracking accuracy and the range of operation (Malbezin et al. 2002). To minimise some of the errors produced in the tracking of the markers, the extrinsic camera parameters had to be accurately estimated. The virtual objects will only appear when the tracking marks are in view, and the size of the predefined patterns influences the effectiveness of the tracking algorithms; for instance, if the pattern is large then it is detected from further away. To calculate the extrinsic camera parameters, the Camera Calibration Toolbox for Matlab (2003) was used, which provides a user-friendly interface and is very convenient when working with a large number of images. Another advantage over the previous method is that it provides very accurate results. Before the camera calibration begins, two steps need to be followed.
In the first step the calibration rig must be generated, while in the second all the calibration images must be collected. When done, the grid corners are easily extracted. The toolbox offers an automatic mechanism for counting the number of squares in each grid; all calibration images used are searched and the focal and distortion factors are automatically estimated. However, similarly to the ARToolKit method, on most occasions the algorithm may not predict the right number of squares and thus provides a poor result. This becomes clearer by observing the results of the calculation of the re-projection error. As can be clearly observed from Fig. 5a, the re-projection error is quite large compared to the scale. The reason for this is that the extraction of the corners is not acceptable on some highly distorted images. However, the advantage of this technique is that it allows the user to improve the calibration. Specifically, the whole procedure can be repeated until the error is minimised up to a certain point. After repeating the procedure five times, the error is reduced from a scale of five to a scale of one, as illustrated in Fig. 5b.

Fig. 4 Comparison of measured values

Furthermore, because ARToolKit accepts only a binary data format for the calibration, a simple way to apply the result is to estimate the extrinsic parameters and then save the computed parameters in the data structure, replacing the old values. The old data structure that holds the calculated camera parameters (the ARParam struct) is shown below:

typedef struct {
    int    xsize, ysize;       /* size of the camera image in pixels */
    double mat[3][4];          /* 3x4 camera projection (intrinsic) matrix */
    double dist_factor[4];     /* lens distortion centre and factor */
} ARParam;

In this structure the xsize, ysize and dist_factor values have been experimentally replaced with the new values calculated as described above. Specifically, the camera parameters, including the focal length (fc), the principal point (cc), the skew (sk) and the distortion (kc), have been computed, and based on these values the intrinsic matrix can be defined as shown in Eq. (1):

\begin{pmatrix} fc_0 & sk \cdot fc_0 & cc_0 \\ 0 & fc_1 & cc_1 \\ 0 & 0 & 1 \end{pmatrix}   (1)

ARToolKit does not take the skew factor into account and makes use of the following matrix:

\begin{pmatrix} s_x f & 0 & x_0 \\ 0 & s_y f & y_0 \\ 0 & 0 & 1 \end{pmatrix}   (2)

To match the camera matrix output by Matlab and fit it into ARToolKit's matrix, the following matrix can be derived:

\begin{pmatrix} fc_0 & 0 & cc_0 \\ 0 & fc_1 & cc_1 \\ 0 & 0 & 1 \end{pmatrix}   (3)

After testing the new camera model, a small improvement of around 3–4% was achieved in the distortion. As a further improvement, it was decided to add the skew parameter to the matrix, so the skew parameter was used instead of zero, as shown below:

\begin{pmatrix} fc_0 & sk \cdot fc_0 & cc_0 \\ sk & fc_1 & cc_1 \\ sk & sk & 1 \end{pmatrix}   (4)

Although this last modification provided a more correct camera model, with an estimated improvement of about 1%, the effectiveness of the tracking system was not significantly improved. This is due to the fact that the optics used in the camera system (a web camera) are really poor compared to professional video cameras. Other environmental issues that influence tracking include lighting conditions and range of operation.

5 Audio–visual augmentations

Each type of virtual media information is designed for specific purposes and as a result produces different outcomes. For instance, textual explanation can be utilised much more effectively than auditory description when communicating verbal information. On the other hand, pictures work better than text for recalling or explaining a procedure diagrammatically.
To describe a sequence of events, video seems to be one of the most efficient techniques. In this section, the methodology used for the simultaneous multimedia visualisation of virtual information in an indoor AR environment is presented.

5.1 Object augmentation

An ideal AR system must be able to mix the virtual information with the real in a physical way. The participants should not realise the difference between the real and the augmented visualisation. The focus of this research is to present and implement methods of realistically rendering 3D representations of real objects in an easy and interactive manner. The selection of the most appropriate 3D format is a crucial task in order to achieve a high level of realism in the system. In this research, both 3ds and VRML file formats have been used, as shown in Table 1.

Fig. 5 Calculation of camera error: (a) re-projection error; (b) minimisation of error

In any case, one of the first problems encountered when displaying a 3D representation of a real model is its correct alignment at the required position. Virtual objects may appear to float above the marker and the user will be easily confused. This usually occurs because the 3D model is not registered correctly into the scene. For example, when a 3D object is transformed into the real scene it may appear below the origin, as illustrated in the left image of Fig. 6. To correct the problem of misalignment (Fig. 6a), a sorting algorithm for registering 3D objects precisely on top of the markers was implemented. To achieve a correct registration, the virtual information needs first to be sorted and then initialised to exactly the same level as the marker along the Z-axis. An efficient way to align objects is by using a two-stage process. In the first part, the vertices of the object are sorted along the Z-axis. Upon completion, the vertices are translated to the minimum value, which is the origin of the marker cards, resulting in a proper object registration.

Next, to improve the realism of the AR scene, a fast algorithm for planar shadows and reflections was implemented. The location of the shadow can be calculated by projecting all the vertices of the AR object in the direction of the light source. To generate augmented shadows, an algorithm that creates a 4 × 4 projection matrix (Ps) in homogeneous coordinates must be calculated, based only on the plane equation coefficients and the position of the light (Möller 1999). Say that L is the position of the point light source, P the position of a vertex of the AR object where the shadow is cast, and n the normal vector of the plane. The projection matrix of the shadow can be calculated by solving the system, which consists of the equation of the plane and a straight line that passes through the plane point in the direction of the light source (see Eq. 5), where Lp · Pc denotes the dot product of the light position and the plane coefficients:

P_s = \begin{pmatrix}
Lp \cdot Pc - Lp_0 Pc_0 & -Lp_1 Pc_0 & -Lp_2 Pc_0 & -Lp_3 Pc_0 \\
-Lp_0 Pc_1 & Lp \cdot Pc - Lp_1 Pc_1 & -Lp_2 Pc_1 & -Lp_3 Pc_1 \\
-Lp_0 Pc_2 & -Lp_1 Pc_2 & Lp \cdot Pc - Lp_2 Pc_2 & -Lp_3 Pc_2 \\
-Lp_0 Pc_3 & -Lp_1 Pc_3 & -Lp_2 Pc_3 & Lp \cdot Pc - Lp_3 Pc_3
\end{pmatrix}   (5)

The projection matrix has a number of advantages compared with other methods (i.e. fake shadows), but the most important is that it is fast and generic, so that it can generate hard shadows in real-time for any type of object independently of its complexity (Liarokapis 2005). An example screenshot that illustrates planar shadows is shown in Fig. 7. The main disadvantage of this algorithm is that it renders the virtual information twice for each frame: once for the virtual object and once for its shadow.
Another obvious flaw is that it can cast shadows only onto planar surfaces, but with some modifications it can be extended to specific cases such as curved surfaces (Liarokapis 2005). To realistically model reflections in AR environments, many issues must be taken into account. Although in reality light is scattered in all directions depending on the material of the object, in this work the effect of mirror reflections has been implemented. An example screenshot of a virtual object casting a shadow and a reflection on a virtual plane is illustrated in Fig. 8. Based on OpenGL's stencil buffer, a reflection of the object is rendered onto a user-defined virtual ground plane. The stencil buffer is initially set to sixteen bits in the pixel format function; then the buffer is cleared and finally the stencil test is enabled.

Table 1 Categorisation of 3D file formats
3ds – Advantages: includes per-vertex texture coordinates; unknown parts can be skipped. Disadvantages: a mesh can have a maximum of 2^16 vertices; poor normal information.
VRML – Advantages: easy to read; the standard for 3D internet presentations; contains animation and collision detection. Disadvantages: contains less information than 3ds; does not support advanced lighting and texturing.

Fig. 6 Object augmentation: (a) misalignment of object; (b) correct registration
Fig. 7 Illustration of planar shadows
Fig. 8 Planar shadows and reflections

5.2 Image augmentation

Images are widely used as a means to increase realism and, in the past, they have been used with success for educational purposes. The augmentation of images is a highly cost-effective means of presenting simple 2D information in the real world. They may be used in a number of different ways depending on the learning scenario applied. Digital image augmentation can be either static or dynamic. Dynamic image augmentation is widely used for achieving video augmentation (see Sect. 5.3). With static augmentation, only a single image is rendered into the scene. Based on the theoretical framework provided by Smith (1994), images used for AR environments have been categorised into description, symbolic, iconic and functional, as shown in Table 2. The algorithm used is simple but very efficient and can be applied to two types of image formats (BMP and TGA). First, it loads an image file and checks whether it is a valid image format. In the next step, textures are generated using data from the image file. Following this, the texture is created and its parameters are set based on the OpenGL API. Finally, the texture is bound to the target, which is a quadrilateral.

5.3 Video augmentation

The mode of operation within the video AR system is to read an AVI file, decompose it into 256 × 256 × 24-bit images, mix it with the dynamic video (coming from the camera) and finally display it on the selected visualisation display (Liarokapis 2005). When the video file is loaded, the program automatically counts the number of frames so that its size is known. Then all frames are decomposed into 2D images and each image is applied to a square quad, exactly in the same way as textures are wrapped onto objects.
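As an illustration of this mechanism, the following sketch, written for this commentary rather than taken from the system, uploads one decoded frame as an OpenGL texture and draws it on a quad; the fixed 256 × 256 resolution mirrors the constraint mentioned in the text, and the function and parameter names are assumptions.

    // Illustrative sketch: upload one decoded 256x256x24-bit video frame as an
    // OpenGL texture and draw it on a unit quad, the same mechanism used for
    // static image augmentation. Legacy fixed-function OpenGL for clarity.
    #include <GL/gl.h>

    void drawVideoFrame(GLuint texId, const unsigned char* rgb /* 256*256*3 bytes */)
    {
        glBindTexture(GL_TEXTURE_2D, texId);
        // Re-specify the texture with the current frame's pixels (RGB, 8 bits per channel).
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, 256, 256, 0, GL_RGB, GL_UNSIGNED_BYTE, rgb);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

        glEnable(GL_TEXTURE_2D);
        glBegin(GL_QUADS);                    // quad in marker (object) coordinates
            glTexCoord2f(0.0f, 0.0f); glVertex3f(-0.5f, -0.5f, 0.0f);
            glTexCoord2f(1.0f, 0.0f); glVertex3f( 0.5f, -0.5f, 0.0f);
            glTexCoord2f(1.0f, 1.0f); glVertex3f( 0.5f,  0.5f, 0.0f);
            glTexCoord2f(0.0f, 1.0f); glVertex3f(-0.5f,  0.5f, 0.0f);
        glEnd();
        glDisable(GL_TEXTURE_2D);
    }

In practice, glTexSubImage2D could be used instead of glTexImage2D to update the existing texture without reallocating it each frame, which is relevant given the frame-rate cost of video augmentation discussed next.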
It is worth mentioning here that, because each animation has a specific length (in seconds) and its own frame rate, the time required for each frame is calculated. Moreover, the augmented video starts automatically when two things occur: a marker is detected and the user has loaded a particular file from the file system using the interface menu. When the animation is completed it repeats itself until the user decides to stop it (by pressing a keyboard key or using the interface menu). To increase the usability of the system, if the camera is not in line of sight with the marker card, the video augmentation will continue playing until the video sequence is finished. This was designed on purpose to prevent cases where the user changes position or orientation rapidly and thus loses the perceived visualisation. The augmented animation can be controlled in a number of different ways, including stopping the animation, resizing the animation window or even manipulating the augmented video in six DOF (see Sect. 6). Figure 9 shows four frames of a video sequence superimposed onto a marker card, describing a complex concept in electronics (i.e. a Moore diagram). In terms of efficiency, the video augmentation results range between 20 and 35 FPS, depending on the resolution of the videos. However, the drawback of this method is that when the animation is augmented the overall performance of the system is significantly reduced. In particular, the performance was experimentally measured to drop by approximately 20–50% of the system's frame rate. For instance, if the performance of the system is real-time (i.e. 25 FPS), then the AR video algorithm would drop the performance to 12–20 FPS. Another limitation of the proposed video augmentation is that it can currently decompose videos into only 256 × 256 × 24-bit images.

5.4 Textual augmentation

Textual annotations are the simplest form of information that can be easily augmented in any type of AR environment. They can be presented either as a label or as a description. Label text has been used in the past (Klinker et al. 1997; Sinclair and Martinez 2001) to point out specific parts of a complex system using the minimum of textual information. In this case, the most important aspect is to ensure that the augmented labels do not obscure each other and that the information is clearly presented to the user. Description text requires a much more demanding process because it needs to provide complete information about an object or about a virtual operation. The problems begin in cases where the amount of textual information that needs to be augmented on a display is large. In this research, label and descriptive textual augmentation was performed by dynamically loading ASCII text files. Each file contained a very different level of information depending on the reasons for utilising it. For example, label text files were defined to specify the type of visualisation (i.e. image augmentation) or the name of an object. The main advantage of this method is that the textual information, which will be augmented on the real environment, is stored in a txt file.
Text files are widely used and can be easily transferred over all types of networks. Users of the system can position textual augmentations anywhere in the real environment using standard transformations. In addition, they can change their appearance in terms of colour, size and font type (Bitmap, Times Roman and Helvetica).

Table 2 Categorisation of image augmentation
Description – Purpose: the most popular format; explain a 3D real object; describe the real world. Usage: the textual information has a useful meaning; the image itself can tell a self-explanatory story.
Symbolic – Purpose: identify a basic principle or symbol. Usage: concerns images that represent various types of well-known symbols; allows both simple and complex symbolism; interpretation can change over time.
Iconic – Purpose: identify a multinational, meaningful icon that is not related to a specific language. Usage: the image contains different types of iconic representations that can illustrate something useful, for example the "exit" or "danger" sign.
Functional – Purpose: a single operation can be expressed. Usage: functional images act as virtual buttons and a specific operation is assigned to each; multiple operations can also be supported.

5.5 Audio augmentation

In the real world, audio is a process that is heard spatially and thus it is a very important aspect of any simulation scenario. The most important issue when designing 3D sound is to "see" the sound source (Yewdall 1999). However, most AR applications have not incorporated a 3D sound component even though it can contribute to the sense of immersion. The augmented sound methodology followed in this work has some similarities with the ASR approach (Dobler et al. 2002) in the way virtual sounds are augmented in the real environment. Unlike that experimental approach, which is based on the Creative EAX API, the implemented 3D audio system is based on OpenAL, which has many similarities with OpenGL and was originally designed for generating 3D sounds around a listener. The recording of speech sounds was done in mono format using a standard microphone, and the mono samples were converted into stereo format. Furthermore, each sound source in the system has been specified to have the following three properties: position, orientation and velocity. The spatial audio system can handle multiple sound sources and mix them together. The user can move the sources in 3D space using the keyboard and menu interaction techniques illustrated in Sect. 6. The spatial sound algorithm first initialises all the necessary OpenAL variables (position, orientation and velocity) and then loads them into the appropriate buffers (format, length and frequency) for further processing. Next, sound sources and buffers are initialised and the sources are assigned to the buffers. The pitch and gain are set to one and the sources are set to a continuous loop unless stopped by the user. Each time the camera detects the marker, the transformation matrix is inverted to estimate the position of the camera. In the context of this research, the distance model experimentally applied to simulate the distance followed a linear equation, as illustrated below:

y = ax + b   (6)

where a represents the distance between the camera and the marker and b the offset position of the marker card. Although this cannot accurately represent the distribution of sound in 3D space, it provides very good results. To provide more freedom to the listener, the values of the linear function may change depending on the requirements of the visualisation.
If the sound source is positioned at the origin, then the above equation may be re-written as shown below:

Listener = camera_position / distance_factor   (7)

where camera_position refers to the inverse transformation of the camera and distance_factor to a constant number. To achieve a realistic simulation of the sound, different values of the distance_factor have been tried. This constant value may, however, be changed off-line depending on the requirements of the visualisation. For example, some users may prefer to perceive the auditory information louder than others do. In addition, the system is capable of loading and mixing music sound files. This option can be extremely useful for simulating surround music audio. The sound files may be overlaid onto the same marker or onto a different marker depending on the needs of the application.

6 Human–computer interactions

Human–computer interactions are one of the most important issues when designing a robust real-time system. They have to be performed in a natural way so that inexperienced participants quickly familiarise themselves with the AR environment (Liarokapis et al. 2004a). The proposed interface allows users tangible interaction with various types of multimedia information, such as 3D models, images, textual information and 3D sound, using a number of interaction techniques. User–computer interactions can be distinguished into five different categories: physical manipulation, interface menu interaction, standard I/O, touch screen and SpaceMouse interaction, as illustrated in Fig. 10. Although some types of interaction proposed in Fig. 10 are not novel (i.e. physical manipulation), the novelty comes in the way they are used by the participants. Participants can combine two or more types and experience a novel form of interaction with great flexibility. For example, the most significant combination of human–computer interactions is the use of intuitive methods like physical manipulation together with sophisticated devices such as the SpaceMouse. Users can hold a marker card with a virtual object superimposed in one hand and use the SpaceMouse with the other to perform graphics operations such as changing the virtual lighting. In the following sections, all the types of interactions are explained in detail.

Fig. 9 Video augmentation

6.1 Standard interactions

The first method is addressed to users with some computer experience and is based on standard interaction input devices like the keyboard and the mouse. For example, by pressing buttons (hot keys) the visual parameters of the virtual objects can be changed faster than by using the menu dialogues. Some of the most characteristic operations are described in (Liarokapis et al. 2004a, b; Liarokapis 2005) and include changing the lighting conditions (ambient, diffuse, specular and shininess); the texturing information (standard and environmental); switching from solid mode to wireframe mode; and others (see Sect. 6.2). Moreover, the keyboard is also employed for changing the position (translation), orientation (rotation) and scaling of the virtual information in six DOF. Initially, the above transformations were implemented based on the OpenGL functionality, but it soon became obvious that OpenGL could not meet the requirements of this research because it provides only the minimum functionality to rotate an object around the X, Y or Z-axis.
However, in a tabletop AR environment this constrains the user when rotating the virtual information and restricts the use of simultaneous interactions. To tackle this problem, a generic rotation matrix that takes as input three angles (φ, θ and ψ) and rotates the object around an arbitrary axis is specified in Eq. 8:

R(\phi, \theta, \psi) =
\begin{pmatrix}
\cos\theta\cos\psi & -\cos\theta\sin\psi & \sin\theta \\
\sin\phi\sin\theta\cos\psi + \cos\phi\sin\psi & -\sin\phi\sin\theta\sin\psi + \cos\phi\cos\psi & -\sin\phi\cos\theta \\
-\cos\phi\sin\theta\cos\psi + \sin\phi\sin\psi & \cos\phi\sin\theta\sin\psi + \sin\phi\cos\psi & \cos\phi\cos\theta
\end{pmatrix}
\quad (8)

Based on the above rotation matrix, it became possible for users to rotate virtual information around an arbitrarily defined axis. The above matrix was also mapped onto the standard mouse, providing a quick way to perform intuitive rotations. Although it provides the means to perform a rotation around all three axes simultaneously when a single interaction device is used, problems occur when more than one device is used (e.g. keyboard and mouse). An alternative way of performing transformations is to use quaternions. To specify multiple rotations with matrices, many intermediate control points are required, whereas a quaternion interpolation depends only on the relation between the initial and final rotations. The easiest way to show the link between a rotation matrix and a quaternion is to relate them in three dimensions. Say that q = s + v is a unit quaternion, where v = (u_x, u_y, u_z)^T; it can be shown that there is a 3 × 3 matrix that represents the corresponding rotation, of the form (Eq. 9):

\mathbf{v}\mathbf{v}^T + (sI_{3\times3} + C_v)^2 =
\begin{pmatrix}
s^2 + u_x^2 - u_y^2 - u_z^2 & 2(u_x u_y - s u_z) & 2(u_x u_z + s u_y) \\
2(u_x u_y + s u_z) & s^2 - u_x^2 + u_y^2 - u_z^2 & 2(u_y u_z - s u_x) \\
2(u_x u_z - s u_y) & 2(u_y u_z + s u_x) & s^2 - u_x^2 - u_y^2 + u_z^2
\end{pmatrix}
\quad (9)

where C_v denotes the skew-symmetric cross-product matrix of v. To obtain the quaternion corresponding to a given rotation matrix, we first define an arbitrary rotation matrix R and then the corresponding quaternion q = s + u_x i + u_y j + u_z k. Using the above equation it is easy to solve for the values of u_x, u_y and u_z, respectively. In OpenGL, rotations are specified as matrices, since homogeneous matrices are the standard 3D representation. By combining the unit-quaternion property (s^2 + u_x^2 + u_y^2 + u_z^2 = 1) with the above rotation matrix, we can deduce the following equation (Eq. 10):

\mathbf{v}\mathbf{v}^T + (sI_{3\times3} + C_v)^2 =
\begin{pmatrix}
1 - 2(u_y^2 + u_z^2) & 2(u_x u_y - s u_z) & 2(u_x u_z + s u_y) \\
2(u_x u_y + s u_z) & 1 - 2(u_x^2 + u_z^2) & 2(u_y u_z - s u_x) \\
2(u_x u_z - s u_y) & 2(u_y u_z + s u_x) & 1 - 2(u_x^2 + u_y^2)
\end{pmatrix}
\quad (10)

Other functions that were integrated into the mouse include translations and scaling. In addition, using the mouse, users can access the carefully designed GUI. This allows users to have full access to the superimposed virtual information. An example is presented in Fig. 11, where users can select the information that is going to be augmented onto the real environment.
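The rotation handling just described can be illustrated with a small sketch. This is an illustrative reconstruction under assumptions, not the system's original code: the structure and function names are invented here, and the functions simply evaluate the generic rotation of Eq. 8 (the product Rx(φ)·Ry(θ)·Rz(ψ)) and the quaternion form of Eq. 10.

#include <cmath>

struct Mat3 { float m[3][3]; };

// Generic rotation of Eq. 8, built from the three input angles; equivalent to
// the product Rx(phi) * Ry(theta) * Rz(psi).
Mat3 rotationFromAngles(float phi, float theta, float psi)
{
    const float cf = std::cos(phi),   sf = std::sin(phi);
    const float ct = std::cos(theta), st = std::sin(theta);
    const float cp = std::cos(psi),   sp = std::sin(psi);
    return {{{ ct*cp,            -ct*sp,             st     },
             { sf*st*cp + cf*sp, -sf*st*sp + cf*cp, -sf*ct  },
             {-cf*st*cp + sf*sp,  cf*st*sp + sf*cp,  cf*ct  }}};
}

// Unit quaternion q = s + (ux, uy, uz) converted to the rotation matrix of Eq. 10.
Mat3 rotationFromQuaternion(float s, float ux, float uy, float uz)
{
    return {{{ 1 - 2*(uy*uy + uz*uz), 2*(ux*uy - s*uz),      2*(ux*uz + s*uy)      },
             { 2*(ux*uy + s*uz),      1 - 2*(ux*ux + uz*uz), 2*(uy*uz - s*ux)      },
             { 2*(ux*uz - s*uy),      2*(uy*uz + s*ux),      1 - 2*(ux*ux + uy*uy) }}};
}

Once expanded to a 4 × 4 homogeneous matrix, either result can be passed to OpenGL with glMultMatrixf.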
6.2 GUI interactions

Using the mouse or the Touch Screen, users can access the functionality that has been carefully integrated into a novel GUI. The GUI consists of a menu, a toolbar, a status bar and a number of dialog boxes. This allows participants to have the same access to the augmented virtual information as if they were using the standard interaction techniques. Four example screenshots that illustrate some of the functionalities of the GUI are presented in Fig. 11.

Fig. 11 GUI functionality

The greatest advantage of the proposed GUI is that it allows participants to perform complex operations very accurately. Specifically, it is sometimes of crucial importance to place a virtual object at a specific location in the real environment. Using other methods it could take a great amount of time and effort (depending on the experience of the user) to achieve this, and the result would certainly not be very accurate. The GUI interaction techniques, however, solve this problem with double precision. Next, the "Edit" category consists of three basic operations: video (start or stop), a zoom dialog box and a scale dialog box. The "View" category consists of two sets of operations. The first comprises a toolbar and a status bar, which are commonly found in Windows-based applications. It is worth mentioning here that the GUI has been built on top of the Windows API so that full compatibility with Windows-based operating systems is ensured. As far as this research is concerned, this is the only true Windows-based interface that can superimpose five different types of multimedia content onto the real environment. The second set of operations consists of three functions called axis (to insert a Cartesian set of axes indicating the origin of the AR environment), debug (to threshold the live video sequence and thus check whether a marker is detectable) and clip (to clip the graphics geometry). The rest of the menu categories (graphics and augment) are used to control visualisation properties of the augmented information. Functions that have been implemented include shadows, fog, lighting, material, texturing, colouring, transparency and shading. Finally, the "help" category provides some information about the release version of the AR interface as well as the date and the author name.

6.3 Physical manipulation

Physical manipulation was specifically designed for users with no computer experience and refers to the physical handling of the marker cards (Kato et al. 2000a; Billinghurst et al. 2001). As illustrated in Fig. 12, users can freely manipulate the marker cards in six DOF to obtain a different perception of the superimposed information. Another benefit of natural interactions is that they can be combined with the other types of interaction described in this section. This allows unique combinations to be produced (see Fig. 14) that can provide solutions for specific AR applications requiring a high level of interaction. In addition, apart from using the marker cards just for superimposing virtual information, they have been used to perform some basic operations, such as assigning an object to a marker, de-assigning an object from a marker, scaling, rotating and translating. The advantage of this method is that users can use only physical objects (the marker cards) to visualise and interact with the virtual information. However, the disadvantage is that when multiple markers are used the overall efficiency of the system is reduced. Specifically, the template matching algorithm operates very effectively in real time with one marker, but performance starts to decrease drastically as more markers are added. The reason is that, for each marker the algorithm is aware of, it creates four templates, one for each orientation. Each marker has to be compared to all known templates until the best match is detected.

Fig. 12 Natural manipulation of a virtual object
To calculate the number of comparisons performed by the algorithm, the following equation can be used:

N_c = 4 × N_m × N_t    (11)

where N_c is the number of comparisons, N_m is the number of markers present in the scene and N_t corresponds to the number of markers known to the application, each of which contributes four templates (one per orientation). If there are 10–20 markers in the scene and the application knows about 250 markers, the system performs around 10,000–20,000 comparisons. This makes the system run much slower and makes the application operate at less than 25 FPS. Thus, to achieve a fast AR application, it was preferred in the final system to use as few markers as possible, with a limit of ten markers.

6.4 Touch screen interactions

An alternative way of interacting with the virtual information is to make use of interaction devices such as Touch Screens. This is ideal for some application scenarios where the use of other interaction devices is not possible. For example, in museum environments, Touch Screens are the most appropriate means of interacting with the virtual exhibitions. However, although it was easy to integrate the Touch Screen into the AR interface, many problems arose when users tried to interact with the GUI menu. The reason is that the menus in the GUI were too small and were difficult for some users to select. To tackle the problem, large toolbar buttons and dialog boxes were designed and associated with the appropriate functionality. The main advantage of using the Touch Screen is that it can serve both visualisation and interaction in a single device. However, the major drawback is that the effectiveness of the interactions depends on the effectiveness of the GUI. If the GUI is not user-friendly, it will affect the usefulness of the Touch Screen interactions.

6.5 SpaceMouse interactions

Finally, users can manipulate virtual information using sophisticated VR sensor devices such as the SpaceMouse (Liarokapis et al. 2004b) and the InertiaCube. The SpaceMouse allows the programmer to assign functionality to its buttons, providing a customised nine-button menu interface. This method has the advantage of manipulating virtual information in six DOF in a natural way using only one hand. A combination of C++ functions, SpaceMouse commands and OpenGL allowed the integration of the 3D mouse into the system. Important functionalities that have been implemented and assigned to the menu buttons include either standard graphics transformations for easier manipulation or more advanced graphics operations (Fig. 13). In Fig. 13, S represents the scaling operations, Tx, Ty and Tz represent the translations, and Rx, Ry and Rz the rotations. To perform one of the above operations, the user has to press one of the buttons (the translation button, for example) and then use the bar to translate the object in 3D space. Depending on the direction in which force is applied, the object will move accordingly. Furthermore, the ambient lighting, the clipping of superimposed geometry through an infinite plane and the augmentation of a virtual plane can be switched on and off using the remaining SpaceMouse buttons.

Fig. 13 Pseudo code for SpaceMouse
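The button mapping summarised in Fig. 13 can be pictured with the following sketch. This is a hedged illustration only: the event structure, helper names and the particular button assignments are assumptions made here, not the pseudo code of Fig. 13 itself.

enum class Operation { Scale, TranslateX, TranslateY, TranslateZ,
                       RotateX, RotateY, RotateZ, ToggleLighting, ToggleClipping };

struct SpaceMouseEvent {
    int   button;      // which of the nine menu buttons was pressed
    float force[3];    // force applied along the x, y and z directions
};

// Hypothetical helper that applies the selected operation to the virtual object
// through OpenGL, scaled by the applied force.
void applyOperation(Operation op, float fx, float fy, float fz);

// One operation per menu button: S, Tx, Ty, Tz, Rx, Ry, Rz plus two toggles.
Operation selectOperation(int button)
{
    static const Operation map[9] = {
        Operation::Scale,
        Operation::TranslateX, Operation::TranslateY, Operation::TranslateZ,
        Operation::RotateX,    Operation::RotateY,    Operation::RotateZ,
        Operation::ToggleLighting, Operation::ToggleClipping };
    return map[button % 9];
}

void onSpaceMouseEvent(const SpaceMouseEvent &e)
{
    // The direction and magnitude of the applied force drive the transformation,
    // so the object moves according to how the cap is pushed or twisted.
    applyOperation(selectOperation(e.button), e.force[0], e.force[1], e.force[2]);
}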
Four example screenshots of a user interacting with 3D information using the SpaceMouse are illustrated in Fig. 14, which shows how a user can adapt the MagicBook approach (Billinghurst et al. 2001) in conjunction with the SpaceMouse to visualise and interact with the virtual artefacts. In the top left image, the user only visualises the virtual artefact (marker B), while in the top right image the user translates the artefact using the SpaceMouse. In the bottom left image, the user interacts with (rotates) another artefact (marker A), and in the bottom right image the user visualises another artefact (belonging to marker E). The most important limitation of this tangible interface is the use of a single marker for tracking by the computer-vision-based tracking system.

Fig. 14 SpaceMouse interactions

7 Application scenarios

To test the functionality of the proposed AR interface, two application scenarios have been designed. The first (see Sect. 7.1) presents an educational application used to support and simplify teaching and learning techniques currently applied in the higher education sector. The second (see Sect. 7.2) illustrates a museum application with the aim of facilitating access to museums and other cultural heritage galleries. In the following sections, each application is briefly analysed and the most important findings of the research are presented.

7.1 Educational application

Most educational AR applications operate in indoor environments (Begault 1994; Fuhrmann and Schmalstieg 1999), and the scenarios proposed in this section are focused on enhancing the teaching and learning process in higher education institutions such as colleges and universities. With this purpose in mind, AR educational scenarios have been designed to assist teachers in transferring knowledge to students in ways other than has traditionally been the case (Liarokapis 2005). The aim is to provide a rewarding learning experience that is otherwise difficult or impossible to obtain, by offering better user interaction with teaching material and complex tools, while the provision of an interactive augmented presentation gives students a high degree of flexibility and understanding of the teaching material. All scenarios are specifically concerned with the improvement of learning and teaching techniques in the fields of engineering and informatics at the University of Sussex. Based on the functionality of the AR interface described in the above sections, a lecture was prepared introducing students to how computers work. This application has, in practice, some similarities with the experimental application proposed by Fernandes and Miranda (2003). However, the higher education application offers a very powerful user interface that allows audio–visual augmentation as well as simultaneous interactions. From a visualisation point of view, the system displays the data in a single window, and the lecturer can describe basic IT principles with the use of AR technology in a number of different ways. In Fig. 15, a PowerPoint slide that describes the characteristics of a computer system, as well as related textual information, is augmented onto the appropriate marker card. Learners can zoom into the diagram in two ways: firstly, by using the predefined functionality (scale and translate) provided by the keyboard, the menu and the SpaceMouse interfaces; alternatively, learners can intuitively move the marker card closer to or further from the camera. In both cases, potential users can clearly observe and understand the theoretical operation of a computer. The textual information describes the diagram in detail, providing a more complete learning presentation at the same time.

Fig. 15 Teaching IT using AR
Learners can now simultaneously receive appropriate audio–visual information that helps them acquire deeper knowledge about the characteristics of a computer. In the same way, 3D information can be presented to increase the level of understanding of the teaching material and deepen the knowledge transfer. Along these lines, learners can gain a more rounded idea of what the main characteristics of a computer are, what the main parts are and how they look in reality. The main advantage of the educational application over traditional teaching methods is that learners can actually "see" and "listen to" the virtual information superimposed on the real world (Liarokapis et al. 2002). Students can naturally manipulate the virtual information using standard or sophisticated VR devices, and they can repeat a specific part of the augmentation as many times as they want. Another benefit of the system is that it does not require students to have any previous experience to operate it. Finally, even though AR has been experimentally applied here to teaching engineering and information technology (IT) courses, the system has been designed in such a way that it can easily be adapted and applied to other educational courses.

7.2 Cultural heritage application

The concept of virtual exhibitions in museums has been around for many years, and researchers have designed and developed several applications (Liarokapis et al. 2004a, b; Hall et al. 2001; Gatermann 2000). In addition, a number of museums hold innumerable archives or collections of artefacts which they cannot exhibit in a low-cost and efficient way. Museums simply do not have the space to exhibit all their artefacts in an educational manner. Augmented Representation of Cultural Objects (ARCO) was an EU-funded research project (completed in September 2004) that aimed to analyse and provide innovative but simple-to-use technical solutions for virtual cultural object creation and visualisation. In short, ARCO provides museums with a set of tools that allow them to digitise, manage and present artefacts in virtual exhibitions. To evaluate the usability of the system properly, ARCO collaborated with the Victoria and Albert Museum and the Sussex Archaeological Society. The work illustrated in the previous sections has been applied in ARCO to explore the potential of AR in a museum environment by mixing virtual information into an environment composed of real objects. The success of an AR exhibition is highly related to the level of realism achieved. In general, there are a few AR applications that do not require a high level of realism, but within the cultural heritage field realistic visualisation is an important issue (Liarokapis et al. 2004a). The scenarios illustrate how virtual museum visitors can visualise archaeological information, such as virtual artefacts or even whole virtual museum galleries, providing an educational narrative for the preservation of cultural heritage. Example screenshots of four different virtual galleries from the Victoria and Albert Museum are illustrated in Fig. 16. In theory, this technique can be extended to as many markers as the camera can detect within its field of view. The major drawback of this method is that the frame rate drops in proportion to the number of markers used (as illustrated in Sect. 6.3), but the overall performance in all galleries remained between 25 and 30 FPS.

Fig. 16 Virtual museum gallery visualisation
Furthermore, the realism of the system depends highly on the 3D modelling procedure, and for this reason the 3D models used in this scenario are very high resolution models. The visualisation of an exhibition to large groups of people can be considered a collaborative activity. By looking at and interacting with the artefact visualisation, visitors can communicate with each other by expressing their thoughts about any aspects relating to the history of the artefact. This results in an exchange of opinions amongst the visitors in both an implicit and an explicit way. By zooming into the artefact, more detailed arguments can be made about the nature of the material used for its construction. On the other hand, in a perspective view more verbal communication is possible. By using the collaboration configuration setting of the AR interface, visitors can use HMDs and obtain a completely immersive view.

8 Preliminary evaluation

The knowledge gained from reviewing the literature and the experimental results enabled an initial dissemination of the prototype AR interface. Even if this work is still at an experimental stage, the results can be taken into consideration to improve the effectiveness of the presented system as well as to design future high-level AR interfaces. An expert-based evaluation approach was followed, which argues that formal laboratory user studies can effectively evaluate visualisation when a small sample of expert users is used (Tory and Möller 2005). In terms of evaluating the system, some initial empirical research was conducted based on a two-stage human-centred questionnaire. The first part is generic and aims at evaluating the usability of the system in the learning process, while the second part is more technical and refers to the effectiveness of the visualisation and interaction techniques of the interface. An educator would design the questionnaire in a different way, taking educational aspects primarily into consideration, whereas in this case the purpose was to obtain a number of useful conclusions regarding the technicalities and practicalities of the system. This pilot study was disseminated to five research staff from Sussex University who had experience of working with VR applications. Four were men and one was a woman. Subjects were between 24 and 28 years of age, and the average duration of the evaluation was 30 min.

8.1 General questions

The feedback received following the completion of the evaluation process varied but was, in general, encouraging. As far as the first part of the questionnaire is concerned, all the users thought that the system has the potential to be used as a learning tool in the future, although it currently suffers from interoperability limitations. Specifically, they argued that the application scenarios were really interesting and exciting, but that more comprehensive learning scenarios have to be implemented for teaching purposes. The findings from this study are summarised in Table 3. The results illustrate that 92% of the users believe that the system has the potential to be used as a basic platform to create AR scenarios and applications, whereas 80% rate the quality of the system as good. On the other hand, 72% of the users liked the overall usage of the system and 64% feel that educational applications could benefit from this technology.

Table 3 General questions about the system
General question                     Mean (max = 5)   SD
Rate quality of performance?         4                0.7071
Rate the overall usage?              3.6              0.5477
Can aid the education process?       3.2              0.4472
Create new AR applications?          4.6              0.5477
Moreover, two users mentioned that the system would be much more useful if a multimedia database system with a content management system were used to improve interoperability. Another stated that a print function would help to capture and store the different views of the AR environment as images.

8.2 Technical questions

Regarding the second part, all users agreed that the system is very easy to use and that the visualisation process is more than satisfactory. Surprisingly, most of the users preferred the HMD-based visualisation over the monitor-based visualisation (Fig. 17). Figure 17 shows the user response comparing monitor-based AR (mean = 6.8, SD = 2.77489, SE = 1.24097) with video see-through HMD-based AR (mean = 8.2, SD = 2.48998, SE = 1.11355). Similar studies have shown the exact opposite result, but in this study all users were computer literate and all had previously used VR prototypes that make use of HMDs. Moreover, many difficulties were observed when participants tried to move around with the camera mounted on the HMD, because they could not keep it in line with their line of sight. Also, because the resolution of the HMD is limited to 800 × 600 and the quality of the graphics overlaid in the optics system is not very good, two participants felt nausea and motion sickness after 10 min of usage. However, even if these problems seem to restrict the use of HMDs, participants appreciated the level of immersion provided and thus preferred it. As far as the interaction techniques are concerned, the natural interaction techniques based on the marker cards were found to be very effective and intuitive to use compared to the other interaction techniques. Figure 18 illustrates a comparison, based on the user responses, between the most important interaction techniques implemented. The I/O interaction techniques received the second highest score (mean = 6.2, SD = 2.16795, SE = 0.96954), since they are the standard way of interacting with computers and the one end-users are most familiar with. Surprisingly, the SpaceMouse interactions (mean = 5.4, SD = 2.50998, SE = 1.1225) received the most variable responses. Some participants argued that it is extremely useful to manipulate the virtual information using only one hand, but others recorded that a lot of time is required to become fully familiar with the device, and even then it is not as easy to use as other means such as the I/O devices and the marker cards. Moreover, the GUI interactions (mean = 4.4, SD = 2.19089, SE = 0.9798) received the worst score of all the types of interaction. One of the end-users argued that it is difficult to understand how to alter the orientation of the virtual objects, since it was specified as yaw, pitch and roll. Other users stated that it takes the most time to perform a single rotation compared to the rest of the methods. For example, the keyboard keys replicate the same functionality, and as soon as the user becomes familiar with the "shortcuts" it is much faster.
In contrast, the marker card interaction (mean = 7.8, SD = 2.58844, SE = 1.15758) received the most positive feedback of all the types of interaction; although this was largely expected, it provides an initial comparison between the different techniques. All participants agreed that it is very easy and intuitive to manipulate the virtual information in 3D space using any type of physical interface, but they also proposed that a physical interface consisting of a handle be used in the future. Overall, the preliminary evaluation was a valuable experience that completed the first cycle of this research, but more user studies need to be performed in the future.

9 Conclusions and future work

In this paper the design and implementation of effective AR interfaces for indoor environments were presented and analysed. The proposed framework can be used as a generic tool to create high-level AR applications. The final visualisation can be performed on a variety of display technologies, ranging from monitor-based to video see-through displays. A series of visualisation and interaction techniques were investigated in order to create the illusion that the virtual information coexists with the real world. In addition, two innovative AR case studies have been implemented: one for higher education purposes (university environments) and the second for archaeological and cultural heritage purposes (museum environments). Finally, an initial evaluation was performed to obtain useful critique concerning the overall technicalities and practicalities of the system.

Fig. 17 Monitor versus HMD user response
Fig. 18 Interaction techniques user response (satisfaction scores for the I/O, 3D mouse, GUI and marker card interactions)

The main advantages of the AR architecture are its low cost and its multimedia augmentation in real time. The structure of the architecture is based on the philosophy that the most appropriate tool or device must be used for the task one is seeking to achieve. This, however, does not imply that the best tool or device is the most expensive one. The two different experimental setups successfully tested for this research clearly demonstrate this: one cost-effective setup was constructed from off-the-shelf hardware, and a second was based on state-of-the-art, more expensive hardware components (e.g. SpaceMouse, Touch Screen). Although the system is designed for indoor environments, it can easily be extended to operate in outdoor environments. The current status of the research is focused on various mobile devices, such as personal digital assistants (PDAs) and third-generation (3G) phones, as well as positioning technologies (such as GPS). This will create a robust mobile AR environment that will be integrated with the rest of the interface framework to provide prototype applications for outdoor environments. Acknowledgments Part of this research work was funded by the EU IST Framework V programme, Key Action III-Multimedia Content and Tools, Augmented Representation of Cultural Objects (ARCO) project IST-2000-28366. References Angel E (2003) Interactive computer graphics: a top-down approach using OpenGL, 3rd edn. Addison–Wesley, Reading, pp 17–18, 69, 107, 322–349, 472 Azuma R (1997) A survey of augmented reality. Teleoper Virtual Environ 6(4):355–385 Azuma R, Baillot Y et al (2001) Recent advances in augmented reality.
IEEE Comput Graph November/December 21(6):34–47 Begault DR (1994) 3D Sound for virtual reality and multimedia, Academic, New York, 1, 17–18 Billinghurst M, Kato H, Poupyrev I (2001) The magicbook: a traditional AR interface. Comput Graph 25:745–753 Butz A, Ho¨llerer T et al (1999) Enveloping users and computers in a collaborative 3D augmented reality. In: Proceedings of the 2nd IEEE and ACM international workshop on augmented reality ‘99. San Francisco, October 20–21 Camera calibration toolbox for Matlab, available at: [http:// www.vision.caltech.edu/bouguetj/calib_doc/], Accessed at 14/01/2003 Dobler D, Haller M, Stampfl P (2002) ASR—augmented sound reality, ACM SIGGRAPH 2002 conference abstracts and applications, San Antonio, p 148 Feiner S, MacIntyre B et al (1993) Windows on the world: 2D Windows for 3D augmented reality. In: Proceedings of the ACM symposium on user interface software and technology, Atlanta, November 3–5, Association for Computing Machinery, pp 145–155 Fernandes B, Miranda JC (2003) Learning how computer works with augmented reality. In: Proceedings of the 2nd international conference on multimedia and information and communication technologies in education, Badajoz, December 3–6 Fuhrmann A, Schmalstieg D (1999) Concept and implementation of a collaborative workspace for augmented reality, GRAPHICS ‘99, 18(3) Gatermann H (2000) From VRML to augmented reality via panorama-integration and EAI-Java, in constructing the digital space. In: Proceeding of the SiGraDi, September, 254–256 Grasset R, Gascuel J-D (2002) MARE: multiuser augmented reality environment on table setup. ACM SIGGRAPH conference abstracts and applications Hall T, Ciolfi L et al (2001) The visitor as virtual archaeologist: using mixed reality technology to enhance education and social interaction in the museum. In: Spencer S (ed) Proceedings of the virtual reality, archaeology, and cultural heritage (VAST 2001), New York, ACM SIGGRAPH, Glyfada, Nr Athens, November, pp 91–96 Haller M, Hartmann W et al (2002) Combining ARToolKit with scene graph libraries. In: Proceedings of the first IEEE international augmented reality toolkit workshop, Darmstadt, Germany, 29 September Haniff D, Baber C, Edmondson W (2000) Categorizing augmented reality systems. J Three Dimens Images 14(4):105– 109 Kato H, Billinghurst M, et al (2000a) Virtual object manipulation on a table-top AR environment. In: Proceedings of the international symposium on augmented reality 2000, Munich,5–6 Oct, pp111–119 Kato H, Billinghurst M, Poupyrev I (2000b) ARToolkit user manual, version 2.33, Human Interface Lab, University of Washington Klinker G, Ahlers KH et al (1997) Confluence of computer vision and interactive graphics for augmented reality, PRESENCE: teleoperations and virtual environments. special issue on augmented reality, August 6(4):433–451 Liarokapis F, White M, Lister PF (2004a) Augmented reality interface toolkit. In: Proceedings of the international symposium on augmented and virtual reality, London, pp 761–767 Liarokapis F, Sylaiou S, et al (2004b) An interactive visualisation interface for virtual museum. In: Proceedings of the 5th international symposium on virtual reality, ArchaeologyCultural Heritage, pp 47–56 Liarokapis F (2005) Augmented reality interfaces—architectures for visualising and interacting with virtual information. PhD thesis. University of Sussex, Falmer Liarokapis, Petridis P, Lister PF, White M (2002) Multimedia augmented reality interface for E-learning (MARIE). 
World TransEng Technol Educ 1(2):173–176 MacIntyre B, Gandy M, Dow S, Bolter JD (2005) DART: a toolkit for rapid design exploration of augmented reality experiences. ACM Trans Graph (TOG), 24(3):932 Mahoney D (1999b) Better than real, computer graphics world, pp 32–40 Malbezin P, Piekarski W and Thomas B (2002) Measuring ARToolKit accuracy in long distance tracking experiments. In: Proceedings of the 1st international augmented reality toolkit workshop, Germany, Darmstadt, September 29 Milgram P, Colquhoun H (1999) A Taxonomy of real and virtual world display integration, mixed reality merging real and virtual worlds. Ohta Y, Tamura H (eds) Ohmsha Ltd, Chapter 1, pp 5–30 42 Virtual Reality (2007) 11:23–43 123 Milgram P, Kishino F (1994) A taxonomy of mixed reality visual displays, IEICE Trans Inf Syst E77-D(12):1321–1329 Moller T (1999) Real-time rendering. AK Peters Ltd, Natick, 23– 38, 171 Poupyrev I, Tan D et al (2002) Developing a generic augmented reality interface. Computer 35(3):44–50 Reitmayr G, Schmalstieg D (2001) A wearable 3D augmented reality workspace. In: Proceedings of the 5th international symposium on wearable computers, October 8–9 Rekimoto J, Nagao K (1995) The world through the computer: computer augmented interaction with real world environments. In: Myers BA (ed) Proceedings of UIST ‘95. ACM, Pennsylvania, pp 29–36 Shi J, Tomasi C (1994) Good features to track, IEEE conference on computer vision and pattern recognition, Seattle, June, pp 593–600 Sinclair P, Martinez K (2001) Adaptive hypermedia in augmented reality. In: Proceedings of the third workshop on adaptive hypertext and hypermedia at the twelfth ACM conference on hypertext and hypermedia, Denmark, August 2001, pp217–219 Slay H, Phillips M et al (2001) Interaction modes for augmented reality visualization, Australian symposium on information visualization, Sydney, December Smith GC (1994) The art of interaction. In: MacDonald L, Vince J (eds) Interacting with virtual environments. Wiley, New York,pp 79–94 Tory M, Mo¨ller T (2005) Evaluating visualizations: do expert reviews work? IEEE Comput Graph Appl 25(5):8–11 Vallino J (1998) Interactive augmented reality. PhD thesis, Department of Computer Science, University of Rochester, pp 1–25 Weng J, Cohen P, Herniou M (1992) Camera calibration with distortion models and accuracy evaluation, IEEE transactions on pattern analysis and machine intelligence, 14(10) Woo M, Neider J, Davis T (1999) OpenGL programming guide: the official guide to learning OpenGL, Version 1.2, Addison–Wesley, Reading Yewdall D (1999) Practical art of motion picture sound. Focal Press, Boston Virtual Reality (2007) 11:23–43 43 123 Interactive Virtual and Augmented Reality Environments 108 8.8 Paper #8 Mountain, D., Liarokapis, F. Mixed reality (MR) interfaces for mobile information systems, Aslib Proceedings, Special issue: UK library & information schools, Emerald Press, 59(4/5): 422-436, 2007. Contribution (50%): Collaboration on the design of the architecture and implementation of the VR interface. Write-up of half of the paper. Mixed reality (MR) interfaces for mobile information systems David Mountain giCentre, Department of Information Science, City University London, London, UK, and Fotis Liarokapis Coventry University, Coventry, UK and Department of Information Science, City University London, London, UK Abstract Purpose – The motivation for this research is the emergence of mobile information systems where information is disseminated to mobile individuals via handheld devices. 
A key distinction between mobile and desktop computing is the significance of the relationship between the spatial location of an individual and the spatial location associated with information accessed by that individual. Given a set of spatially referenced documents retrieved from a mobile information system, this set can be presented using alternative interfaces of which two presently dominate: textual lists and graphical two-dimensional maps. The purpose of this paper is to explore how mixed reality interfaces can be used for the presentation of information on mobile devices. Design/methodology/approach – A review of relevant literature is followed by a proposed classification of four alternative interfaces. Each interface is the result of a rapid prototyping approach to software development. Some brief evaluation is described, based upon thinking aloud and cognitive walk-through techniques with expert users. Findings – The most suitable interface for mobile information systems is likely to be user- and task-dependent; however, mixed reality interfaces offer promise in allowing mobile users to make associations between spatially referenced information and the physical world. Research limitations/implications – Evaluation of these interfaces is limited to a small number of expert evaluators, and does not include a full-scale evaluation with a large number of end users. Originality/value – The application of mixed reality interfaces to the task of displaying spatially referenced information for mobile individuals. Keywords Reality, Mobile communication systems, Information systems, Geography Paper type Research paper 1. Introduction Two of the most significant technological trends of the past 15 years have been the increased portability of computer hardware – such as laptop computers and personal digital assistants (PDAs) – and the increasing availability of wireless networks such as mobile telecommunications, and more recently wireless access points (Brimicombe and Li, 2006). The convergence of these technological drivers presents opportunities The current issue and full text archive of this journal is available at www.emeraldinsight.com/0001-253X.htm The work presented in this paper is conducted within the LOCUS project, funded by EPSRC through the Pinpoint Faraday Partnership. The authors would also like to thank their partner on the project, GeoInformation Group, Cambridge, for contributing the building geometry and heights data used by the project. The authors are also grateful to Hulya Guzel for her assistance in the expert user evaluation. AP 59,4/5 422 Received 15 December 2006 Accepted 12 June 2007 Aslib Proceedings: New Information Perspectives Vol. 59 No. 4/5, 2007 pp. 422-436 q Emerald Group Publishing Limited 0001-253X DOI 10.1108/00012530710817618 DownloadedbyMASARYKOVAUNIVERZITAAt03:1726January2015(PT) within the emerging field of mobile computing. Increasingly there is ubiquitous access to information stored via a variety of media (for example, text, audio, image and video) via mobile devices with wireless network connections. Advances in software development tools for mobile devices have resulted in the implementation of user-friendly interfaces that aim to appeal to a wide audience of end users. A key challenge for researchers of mobile information systems is to decide the type of interface to adopt when presenting this information on mobile devices. 
Additionally, developers should assess whether the most suitable interface is dependent upon the audience, the task-in-hand and geographic context in which the mobile information system is likely to be used (Jiang and Yao, 2006). The LOCUS project (LOcation Context tools for UMTS Services) being conducted within the Department of Information Science at City University is addressing some of the research challenges described above (LOCUS, 2007). The main aim of the project is to enhance the effectiveness of location-based services (LBS) in urban environments by investigating how mixed reality interfaces compare with the current map- and text-based approaches used by the majority of location-based services for the tasks of navigation and wayfinding (Mountain and Liarokapis, 2005). To satisfy this aim, LOCUS is tackling a number of issues including the three-dimensional representation of urban environments, the presentation of spatially referenced information – such as the information retrieved as the result of a user query, and navigational information to specific locations – and advanced visualisation and interaction techniques (Liarokapis et al., 2006). The LOCUS system is built on top of the WebPark mobile client-server architecture (WebPark, 2006) which provides the basic functionality associated with LBS including the retrieval of information based upon spatial and semantic criteria, and the presentation of this information as a list or on a map (see Figures 1a and b). In common with the majority of LBS, the basic architecture provides no mechanism for the display of information in a three-dimensional environment, such as a mixed reality interface. Mixed reality environments occupy a spectrum between entirely real environments at one extreme and entirely virtual environments on the other. This mixing of the real and the virtual domain offers great potential in terms of displaying information retrieved as a result of a location-based search, since this requires the presentation of digital information relative to your location in the physical world. This presentation may on the one hand be entirely synthetic, for example, placing virtual objects representing individual results within a virtual scene as a backdrop. Alternatively, an augmented reality interface can superimpose this information over the real world scene in the appropriate spatial location from the mobile user’s perspective. Both interfaces can present the location of information within the scene as well as navigation tools that describe the routes to the spatial locations associated with retrieved information. The LOCUS project is extending the functionality of the WebPark architecture to allow the presentation of spatially referenced information via these mixed reality interfaces on mobile devices (see Figures 1c and d). The rest of the paper is structured as follows. First, a review of relevant background literature in mobile computing and mixed reality is presented. Next, candidate interfaces for mobile information provision into mobile devices are suggested: these include the list, the map, and virtual and augmented reality interfaces. The paper closes with a discussion and conclusions. MR interfaces for mixed reality interfaces 423 DownloadedbyMASARYKOVAUNIVERZITAAt03:1726January2015(PT) 2. 
Background 2.1 Mobile computing Just as the evolution of the internet has had a profound impact upon application development, forcing a change from a stand-alone desktop architecture to a more flexible, client-server architecture (Peng and Tsou, 2003), researchers in mobile computing are currently having a similar impact, forcing the development of web resources and applications that can be run on a wider range of devices than traditional desktop machines. According to Peng and Tsou (2003), mobile computing environments have three defining characteristics: (1) mobile clients that have limited processing and display capacity (e.g. PDAs and smart phones); Figure 1. Interfaces for presenting information retrieved from a mobile information system AP 59,4/5 424 DownloadedbyMASARYKOVAUNIVERZITAAt03:1726January2015(PT) (2) non-stationary users who may use their devices whilst on the move; and (3) wireless connections that are often more volatile, and have more constrained bandwidth, compared to the “fixed” internet. These three characteristics suggest that mobile devices have both specific constraints and unique opportunities when compared to their desktop counterparts. First, screen real estate is limited; typically screens are small (usually less than 60 mm by 80 mm) with low resolution (typically 240 pixels width), and a relatively large proportion of this space may be taken up with marginalia such as scroll bars and menus; hence every pixel should be used wisely. Next, the outdoor environment is a more unpredictable and dynamic environment than the typically familiar indoor home and office environments in which desktop machines are used; hence, user attention is more likely to be distracted in the mobile context. Mobile computer usage tends to be characterised by multiple short sessions per day, as compared with desktop usage, which tends to be for relatively few, longer durations (Ostrem, 2002). Given these constraints, there is a clear need for information to be communicated concisely and effectively for mobile users. Despite constraints, the mobile computing environment offers a unique opportunity for the presentation of information, in particular taking advantage of location sensors to organise information relative to the device user’s position, or their spatial behaviour (Mountain and MacFarlane, 2007). While spatial proximity is perhaps the most intuitive and easily calculated measure of geographic relevance, it may not be the most appropriate in all situations and a variety of other measures of geographic relevance (Mountain and MacFarlane, 2007; Raper, 2007) have been suggested. Individuals may be more interested in the relative accessibility of results, which can be quantified by travel time and can take account for natural and manmade boundaries (Golledge and Stimson, 1997) or the transportation network, to discount results that are relatively inaccessible despite being physically close (Mountain, 2005). Geographic relevance can also be quantified as the results are most likely to be visited in the future (Brimicombe and Li, 2006), or those that are most visible from the current location (Kray and Kortuem, 2004). However geographic relevance is quantified, there are opportunities to use this property to retrieve documents from document collections. Given a set of spatially referenced results that are deemed to be geographically relevant according to some criterion, there are a variety of different approaches to presenting this information. 
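As a concrete illustration of the proximity measure discussed above, the following minimal sketch ranks a set of spatially referenced results by great-circle distance from the user's current GPS fix. It is not part of the WebPark or LOCUS code; the Result structure and function names are assumptions, and a real system might substitute travel time, visibility or another measure of geographic relevance as noted above.

#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

struct Result { std::string name; double lat, lon; double distanceM = 0.0; };

// Great-circle (haversine) distance in metres between two WGS84 coordinates.
double haversineMetres(double lat1, double lon1, double lat2, double lon2)
{
    const double PI = 3.14159265358979323846;
    const double R = 6371000.0;                  // mean Earth radius in metres
    const double toRad = PI / 180.0;
    const double dLat = (lat2 - lat1) * toRad;
    const double dLon = (lon2 - lon1) * toRad;
    const double a = std::sin(dLat / 2) * std::sin(dLat / 2) +
                     std::cos(lat1 * toRad) * std::cos(lat2 * toRad) *
                     std::sin(dLon / 2) * std::sin(dLon / 2);
    return 2.0 * R * std::asin(std::sqrt(a));
}

// Orders retrieved results so that the spatially nearest appears first.
void rankByProximity(std::vector<Result> &results, double userLat, double userLon)
{
    for (auto &r : results)
        r.distanceM = haversineMetres(userLat, userLon, r.lat, r.lon);
    std::sort(results.begin(), results.end(),
              [](const Result &a, const Result &b) { return a.distanceM < b.distanceM; });
}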
Various mobile information systems have been developed. Kirste (1995) developed one of the first experimental mobile information systems based on wireless data communication. A few years later, Afonso et al. (1998) presented an adaptable framework for mobile computing information dissemination systems called UbiData. This model adopts a “push” model where relevant information is sent to the user, without them making a specific request, based upon their location. There are now a host of commercial and prototype mobile information systems that can present information dependent upon an individual’s semantic and geographic criteria (Yell Group, 2006; WebPark, 2006), the majority of which present results either as a list or over a backdrop map. 2.2 Mixed reality The mixed reality spectrum was proposed by Milgram and Kishino (1994), who depicted representations on a continuum with the tangible, physical (“real”) world at MR interfaces for mixed reality interfaces 425 DownloadedbyMASARYKOVAUNIVERZITAAt03:1726January2015(PT) one extreme and entirely synthetic virtual reality (VR) at the other. Two classes were identified between these extremes. Augmented reality (AR) refers to virtual information placed within the context of the real world scene, for example, virtual chess pieces on a real chessboard. The second case – augmented virtuality – refers to physical information being placed in a virtual scene, for example, real chess pieces on a virtual board. The resulting reality-virtuality continuum is shown in Figure 2. The first VR system was introduced in the 1950s (Rheingold, 1991), and since then VR interfaces have taken two approaches: (1) immersive head-mounted displays (HMDs); and (2) through the window approaches. HMDs are very effective at blocking the signals from the real world and replacing this natural sensory information with digital information. Navigation within the scene can be controlled by mounting orientation sensors on top of the HMD, a form of gesture computing whereby the user physically turning their head results in a rotation of the viewpoint in the virtual scene. The ergonomic limitations of HMDs proved unpopular with users and this immersive interface has failed to be taken up on a wide scale (Ghadirian and Bishop, 2002). In contrast to HMDs, the through the window (Bodum, 2005) – or monitor-based VR/AR – approach exploits monitors on desktop machines to visualise the virtual scene, a far less immersive approach since the user is not physically cut-off from the physical world around them. This simplistic form of visualisation has the advantage that it is cost-effective (Azuma, 1997). Interaction is usually realised via standard input/output (I/O) devices such as the mouse or the keyboard but also more sophisticated devices (such as spacemouse, inertia cube, etc.) may be employed (Liarokapis, 2005). Both HMDs and through the window approaches of VR aim to replace the physical world with the virtual. The distinction of AR is that it aims to seamlessly combine real and virtual information (Tamura and Katayama, 1999) by superimposing digital information directly into a user’s sensory perception (Feiner, 2002) (see Figure 3). Whilst VR and AR can process and display similar information (for example three-dimensional buildings) the combination of the “real” and the “virtual” in the AR case is inherently more complex than the closed virtual worlds of VR systems. 
This combination of real and virtual requires accurate tracking of the location of the user (in three spatial dimensions: x, y and z) and the orientation of their view (around three axes of orientation: yaw, pitch and roll), in order to be able to superimpose digital information at the appropriate location with respect to the real world scene, a procedure known as registration. In the past few years, research has achieved great advances in tracking, display and interaction technologies, which can improve the effectiveness of AR systems (Liarokapis, 2005). The required accuracy of the AR Figure 2. The reality-virtuality continuum AP 59,4/5 426 DownloadedbyMASARYKOVAUNIVERZITAAt03:1726January2015(PT) tracking depends to a degree upon the scenario of use. In order to correctly superimpose an alternative building fac¸ade (for example, a historic or planned building fac¸ade) over an existing building, highly accurate tracking is required in terms of position and orientation, else the illusion will fail since the real and virtual fac¸ades will not align, or may drift apart as the user moves or turns their head (Hallaway et al., 2004). However, if simply augmenting the real world scene with annotations in the forms of text or symbols, for example, an arrow indicating the direction to turn at an upcoming junction, this tracking may not be required to be so accurate. The two most common tracking techniques used in AR applications include computer vision and external sensor systems. The visual approach uses fiducial reference points, where a specific number of locations act as links between the real and virtual scenes (Hallaway et al., 2004). These locations are usually marked with distinctive high contrast markers to assist identification, but alternatively can be distinctive landmarks within the real-world scene. Computer vision algorithms first need to identify at least three reference points in real time from a video camera input, then calculate the distance and orientation of the camera with respect to those reference points. Tracking using a computer vision-based system therefore establishes a relative spatial relationship between a finite number of locations in the real-world scene and the observer, via a video camera carried or worn by that observer (Hallaway et al., 2004), which can allow very accurate registration between the real and virtual scenes in a well-lit indoor environment. This computer vision approach nevertheless has significant constraints. First, the system must be trained to identify these fiducial reference points, and may further require the real-world scene to have markers placed within it. It requires both good lighting conditions (although infrared cameras can be also used for night vision) and significant computing resources to perform real-time tracking, and therefore has usually been conducted in an indoor, desktop environment (Liarokapis and Brujic-Okretic, 2006). An alternative to the vision-based approach is to use external sensors to determine the position of the user and the orientation of their view. Positioning sensors such as Figure 3. Augmented reality representation: a computer vision sensor recognises the doorway outline, and augments the video stream with virtual information (the direction arrow). 
Developed as part of the LOCUS project MR interfaces for mixed reality interfaces 427 DownloadedbyMASARYKOVAUNIVERZITAAt03:1726January2015(PT) the Global Positioning System (GPS) can determine position in three dimensions and digital compasses, gyroscopes and accelerometers can be employed to determine the orientation of the user’s view. These sensor-based approaches have the advantage that they are not constrained to specific locations, unlike computer vision algorithms, which must be trained to recognise specific reference points within a scene. Also, the user’s location is known with respect to an external spatial referencing system, rather than establishing relative relationships between the user and specific reference points. A major disadvantage is the accuracy of the positioning systems, which can produce errors measured in tens of metres and can produce poor results when attempting to augment the real-world scene with virtual information. While advances in GPS systems such as differential GPS and real-time kinematic GPS can bring down the accuracy to one metre and a few centimetres respectively, GPS receivers still struggle to attain a positional fix where there is no clear view of the sky, for example in doors. Digital compasses also have limitations; the main flaw is that they are prone to environmental factors such as magnetic fields. Having identified a spatial relationship between the real-world scene and the user location, virtual information needs to somehow be superimposed upon the real world scene. Traditionally there have been two approaches to achieving this: (1) video see-through displays; and (2) optical see-through displays. Video see-through displays are comprised of a graphics system, a video camera, a monitor and a video combiner (Azuma, 1997). They operate by combining a HMD with a video camera. The video camera records the real environment and then sends the recorded video to the graphics system for processing. There the outputted video and the generated graphics images, by the graphics system, are blended together. Finally, the user perceives the augmented view in the closed-view display system. Using the alternative approach, optical see-through displays are usually comprised of a graphics system, a monitor and an optical combiner (Azuma, 1997). They work by simply placing the optical combiners in front of the user’s view. The main characteristic of the optical combiners is that they are partially transmissive and reflective. That is because the combiners operate like half-silvered mirrors, permitting only a portion of the light to penetrate. As a result, the intensity of the light that the user finally sees is reduced. A novel approach to augmenting the real-world scene with virtual information, emerging from within the field of mobile computing, is to use the screen of a handheld device to act as a virtual window on the physical world. Knowing the position and orientation of the device, the information displayed on screen can respond to movements and gestures of a mobile individual, for example, presenting the name of a building as text on the screen when a user points their mobile device at it, or updating navigational instructions via symbols or text as a user traverses a route. MARS is one of the first outdoor AR systems and a characteristic example of a wireless mobile interface system for indoor and outdoor applications. MARS was developed to aid navigation and to deliver location-based information to tourists in a city (Ho¨llerer et al., 1999). 
The user stands in an outdoor environment wearing a prototype system consisting of a computer, a GPS system, a see-through head-worn display and a stylus-operated computer. Interaction is via a stylus and display is via a tracked see-through head-worn display. MARS like most current mobile AR systems AP 59,4/5 428 DownloadedbyMASARYKOVAUNIVERZITAAt03:1726January2015(PT) has significant ergonomic restrictions which stretch the definitive of mobile and wearable computing beyond what is acceptable for most users (the system is driven by a computer contained in a backpack). Tinmith-Hand AR/VR is a unified interface technology designed to support outdoor mobile AR applications and indoor VR applications (Piekarski and Thomas, 2002). This system employs various techniques including 3D interaction techniques, modelling techniques, tracked input gloves and a menu control system, in order to build VR/AR applications that can be applied to construct complex models of objects in both indoor and outdoor environments. A location-based application that was designed for a mobile AR system is ARLib: it aims to assist the user in typical tasks that are performed within a library environment (Umlauf et al., 2002). The system follows a wide area tracking approach (Hedley et al., 2002) based on fiducial-based registration. Many distinct markers are attached to bookshelves and walls so that the book’s positions are superimposed on the shelves as the user navigates inside the library. To provide extra support to the user, a simple interface and a search engine are integrated to provide maximum usability and speed during book searches. 3. Interfaces for mobile information systems There are many candidate interfaces for the presentation of the results of an information retrieval query on mobile devices (Mannings and Pearson, 2003; Schofield and Kubin, 2002; Mountain and Liarokapis, 2005). This section describes how the interfaces described previously can be applied to the task of presenting information retrieved as the result of a mobile query. As described in the introduction, the LOCUS project has developed alternative, mixed reality interfaces for existing mobile information system technology based upon the WebPark platform. The WebPark platform can assist users in formulating spatially referenced, mobile queries. The retrieved set of spatially referenced results can then be displayed using various alternative interfaces: a list, a map, virtual reality or augmented reality. Each interface is described in more detail in the rest of this section. 3.1 List interface The most familiar interface for the presentation of the results of an information retrieval query is a list; this is the approach taken by the majority of internet search engines where the most relevant result is placed at the top of the list, with relevance decreasing further down the list (see Figure 1a). In the domain of location-aware computing, results that are deemed to be particularly geographically relevant (Mountain and MacFarlane, 2007; Raper, 2007) will be presented higher up the list (Google, 2006; WebPark, 2006). While familiar, this approach of simply ordering the results does not convey their location relative to your current position. 3.2 Map interface The current paradigm in the field of LBS is to present information relevant to an individual’s query or task over a backdrop map (see Figure 1b). 
This information may include the individual's current position (and additionally some representation of the spatial accuracy), the locations of features of interest that were retrieved as the result of a user query (e.g. the results from a "find my nearest" search), or navigation information such as a route to be followed. This graphical approach has the advantage of displaying the direction and distance of results relative to the user's location (a vector value), as opposed to just an ordering of results based on distance. The viewpoint is generally allocentric (Klatzky, 1998), adopting a bird's eye view looking straight down on a flat, two-dimensional scene (see Figure 1c). The backdrop contextual map used is usually an abstract representation and may display terrain, points or regions of interest, transportation links, or other information; alternatively, a degree of realism can be included by using aerial photography (WebPark, 2006; Google, 2006).
3.3 VR interface
An alternative to the allocentric viewpoint of a two-dimensional, abstract scene is to choose an egocentric viewpoint within a three-dimensional scene (see Figure 4). Such a perspective is familiar from VR, discussed in section 2.2. While the concept of VR has existed for many decades, only during the past few years has it been used on handheld mobile devices. Traditionally, VR applications have been deployed on desktop devices and have attempted to create realistic-looking models of environments to promote a feeling of immersion within a virtual scene. This has resulted in less opportunity for individuals to compare the virtual scene with its real-world counterpart. This separation of the real and the virtual is due in part to the static nature of desktop devices, and in part to the fact that the appeal of many virtual scenes is that they allow the viewing of locations that cannot be visited easily, for example, virtual fly-throughs on other planets (NASA Jet Propulsion Laboratory, 2006) and imagined landscapes (Elf World, 2006). In a location-aware, mobile computing context, the position of the user's viewpoint within a VR scene can be controlled from an external location sensor such as GPS, and the orientation of the viewpoint can be controlled by sensing the direction of movement (from the GPS heading), or by an orientation sensor that gauges the direction an individual is facing (e.g. a digital compass). The VR scenes themselves can adopt different levels of detail and realism (Bodum, 2005). A particular building may be represented with an exact three-dimensional geometric representation, and graphics added as textures to the façades of the building to create as true a representation as possible – known as a verisimilar representation (Bodum, 2005). Alternatively, the building may be modelled with a generalised approximation of the geometry within specific tolerances. For texturing the building façades, generic images may be applied that are typical of that class of building. The building block can also be left untextured, with more abstract information conveyed using shading, icons, symbols or text (Bodum, 2005). The level of detail and realism required by different users for different tasks is an open question currently under investigation (Liarokapis and Brujic-Okretic, 2006).
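To make the sensor-driven viewpoint concrete, the following short sketch illustrates how a GPS fix and a digital-compass heading might be turned into an egocentric camera pose for a locally referenced VR scene. This is not the LOCUS implementation: the structure names, the local equirectangular projection and the eye height are assumptions made purely for illustration.

#include <cmath>

// Hypothetical sketch: deriving an egocentric VR camera pose from a GPS fix
// and a digital-compass heading. The projection below is a simple local
// equirectangular approximation, adequate over the few kilometres typically
// covered by an urban VR scene.
struct Camera { double east, north, up, yawDeg; };

void gpsToLocalMetres(double lat, double lon, double lat0, double lon0,
                      double& east, double& north) {
    const double kEarthRadius = 6371000.0;                      // metres
    const double kDegToRad = 3.14159265358979323846 / 180.0;
    east  = (lon - lon0) * kDegToRad * kEarthRadius * std::cos(lat0 * kDegToRad);
    north = (lat - lat0) * kDegToRad * kEarthRadius;
}

// Place the viewpoint at the user's position, at eye height, facing the
// direction reported by the compass (0 degrees = north, clockwise positive).
Camera updateViewpoint(double lat, double lon, double compassDeg,
                       double lat0, double lon0, double eyeHeight = 1.7) {
    Camera cam{};
    gpsToLocalMetres(lat, lon, lat0, lon0, cam.east, cam.north);
    cam.up = eyeHeight;
    cam.yawDeg = compassDeg;
    return cam;
}

The same pose could equally be driven from the GPS heading rather than from a compass when the user is moving.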
Traditionally, for VR applications deployed in a static, desktop context, there has been greater emphasis placed upon scenes looking realistic than on ensuring that the content of these scenes is spatially referenced. However, in a mobile context, accurate spatial referencing of VR scenes is required when setting the viewpoint within that scene (using position and orientation sensors), to ensure that the viewpoint in the virtual scene is registered accurately with the user's location in the real-world scene. Realism is still important since this can help the user make associations between objects in the virtual scene and those in the real world. For the applications developed as part of the LOCUS project, within this VR backdrop, additional, non-realistic visual information can be included to augment the scene. Such information can include nodes representing documents retrieved from a spatially referenced document collection (see Figure 1c), or navigational information and instructions (i.e. 3D textual directions). Figure 4 shows a virtual representation of a London neighbourhood. This approach has the advantage of promoting a feeling of immersion, and of creating a stronger association between the physical world and relevant geo-referenced information, but is potentially less effective than a map in providing a quick synopsis of larger volumes of information relative to the user's location. There are opportunities to adopt multiple viewpoints within the VR scene that fall between the extremes of the allocentric–egocentric spectrum, for example an oblique perspective several metres higher than the user's viewpoint (see Figure 4).
3.4 AR interface
A fourth approach to the display of information in mobile computing is to use the device to merge the real-world scene with relevant, spatially referenced information by using an AR interface – the virtual window approach described in section 2.2. Just as for the mobile VR case described above, knowing the location and orientation of the device is an essential requirement for outdoor AR, in order to superimpose information in the correct location. As described in the literature review, a GPS receiver and digital compass can provide sufficient accuracy for displaying points of interest in the approximate location relative to the user's position. At present, however, these sensor solutions lack the accuracy required for more advanced AR functionality, such as aligning an alternative façade on the front of a building in the real-world scene. In the LOCUS system, the handheld mobile device presents text, symbols and annotations in response to the location and orientation of the device. There is no need for an HMD, since the screen of the device can be aligned with the real-world scene. On the screen of the device, information can either be overlaid on imagery captured from the device's internal camera, or the screen can display just the virtual information with the user viewing the real-world scene directly. The information displayed is dependent upon the task in hand. When viewing a set of results, as the user pans the device around them, the name and distance of each result is displayed in turn as it coincides with the direction in which the user is pointing the device, allowing the user to interrogate the real-world scene by gesturing.
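As a rough illustration of this behaviour, the sketch below shows one way the "virtual window" test could be carried out: compute the bearing from the user to each retrieved result and annotate those that lie within a small angular tolerance of the direction in which the device is pointing. This is not the LOCUS code; the PointOfInterest structure, the local metric coordinates and the 15-degree tolerance are assumptions for illustration only.

#include <cmath>
#include <iostream>
#include <string>
#include <vector>

// A retrieved result expressed in a local metric frame (metres east/north).
struct PointOfInterest { std::string name; double east, north; };

// Bearing in a north-aligned frame: 0 degrees = north, increasing clockwise.
double bearingDeg(double de, double dn) {
    const double kPi = 3.14159265358979323846;
    double b = std::atan2(de, dn) * 180.0 / kPi;
    return b < 0.0 ? b + 360.0 : b;
}

// Report the name and distance of each result lying within +/- 15 degrees of
// the device's current compass heading (an on-screen overlay in a real UI).
void annotateVisibleResults(double userEast, double userNorth, double headingDeg,
                            const std::vector<PointOfInterest>& results) {
    for (const auto& poi : results) {
        double de = poi.east - userEast;
        double dn = poi.north - userNorth;
        double diff = std::fabs(bearingDeg(de, dn) - headingDeg);
        if (diff > 180.0) diff = 360.0 - diff;          // handle wrap-around at north
        if (diff <= 15.0) {
            double dist = std::hypot(de, dn);
            std::cout << poi.name << " (" << dist << " m)\n";
        }
    }
}

In a deployed interface the matching results would of course be rendered as on-screen annotations rather than printed.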
By adopting an egocentric perspective to combine real and virtual information in this way, users of the system can base their decisions about which location to visit not only on quantifiable criteria – such as the distance to a particular result, or its relevance according to semantic criteria – but also on more subjective criteria that could never be quantified by an information system. For example, following a mobile search for places to eat conducted at a crossroads, by gesturing with a mobile device, users can see the distance and direction of candidate restaurants, and make an assessment based upon the ambience of the streets upon which different restaurants are located. Having selected a particular result from the list of candidates, the AR interface can then provide navigational information, in the form of distance and direction annotations (see Figure 1d), to guide the user to the location associated with those results. Although most examples from location-based services suggest "where's my nearest" shop or service, there is no reason that this information could not be the location of breaking news stories from a news website, or spatially referenced HTML pages providing historical information associated with a particular era or event.
4. Discussion
An evaluation exercise was undertaken to assess appropriate levels of detail, realism and interaction for the mobile virtual reality interface. Whilst there has been extensive evaluation of these requirements in a static desktop context (Dollner, 2005), relatively little attention has been paid to the specific needs of mobile users. In order to gauge these specific requirements, an expert evaluation was conducted. Two common evaluation techniques were applied: (1) think aloud; and (2) cognitive walkthrough (Dix et al., 2004). Think aloud is a form of observation that involves participants talking through the actions they are performing, and what they believe to be happening, whilst interacting with a system. The cognitive walkthrough technique was also used, where a prototype of a mobile VR application and a scenario of use were presented to expert users: evaluating in this way allows fast assessment of an early mock-up, and hence can influence subsequent development and the suitability of the final application. Both forms of evaluation are appropriate for small numbers of participants testing prototype software, and it has been suggested that the majority of usability problems can be discovered from testing in this way (Dix et al., 2004). The expert user testing took place at City University with a total of four users with varied backgrounds: one human-computer interaction expert, one information visualization expert, one information retrieval expert and one geographic information scientist. Each user spent approximately one hour performing four tasks. The aims of the evaluation of the VR prototype included assessment of the expert user experience with particular focus on:
• the degree of realism required in the scene;
• the required spatial accuracy and level of detail of the building outlines; and
• a comparison of 3D virtual scenes with 2D paper maps.
A virtual reality scene was created of the University campus and surrounding area, and viewpoints were placed to describe trajectories of movement through the scene. The expert-evaluation process covered two tasks: mobile search and navigation. The first scenario was in relation to searching for, then locating, specific features.
For example, a user might search on a mobile system for entrances to the City University campus from a nearby station. The second scenario was in relation to navigation from one point to another, for example, from the station to the University. Starting and target locations were marked in the 3D maps, and sequences of viewpoints were presented to mimic movement through the scene. There was a great deal of variation in terms of the level of photorealism required in the scene, and whether buildings should have image textures placed over the building faces, or whether the building outlines would be sufficient alone. Opinions varied between evaluators and according to the task in hand. Plain, untextured buildings are hard to distinguish from each other and, in contrast, buildings with realistic textures were considered easy to recognise in a micro-scale navigation context (for example, trying to find the entrance to a particular building). However, many evaluators thought that much of this realism would not be required or visible on a small-screen device when an overview of the area was required, for example, when considering one's present location in relation to information retrieved from a mobile search. Expert users also suggested various departures from the realism traditionally aspired to within the field of virtual reality. These included transparency, to allow users to see through buildings as an aid to navigation, since this allows the identification of the location of a concealed destination point. Other suggestions included the labelling of objects in the scene (for example, building and street names). The inclusion of symbology in the scene to represent points, and routes to those points, was considered to be beneficial to the task of navigation. In terms of the level of detail and spatial accuracy, some users thought that it was not important to have very detailed models of building geometry. Building outlines that are roughly the right size and shape are sufficient, especially when considering an overview of an area, as often required in the mobile search task. For micro-navigation, a higher degree of accuracy may be required. Virtual 3D scenes were found to have many advantages when compared to paper maps: the most positive feature was found to be the possibility of recognising features in the surrounding environment, which provides a link between the real and virtual worlds. This removes the need to map-read, which is required when attempting to link your position in the real world with a 2D map; hence the VR interface offers an effective way to gauge your initial position and orientation. A more intangible response was that the majority of the users enjoyed interacting with the VR interface more than with a 2D map. However, the 3D interface also has significant drawbacks. Some users said that they are so used to using 2D maps that they do not really need a 3D map for navigating; however, they thought this attitude may change with the next generation. The size, resolution and contrast of the device screen were also highlighted as potential problems for the VR interface.
5. Conclusions
This paper has presented some insights on how mixed reality interfaces can be used in conjunction with mobile information systems to enhance the user experience. We have explored how the LOCUS project has extended LBS through different interfaces to aid the tasks of urban navigation and wayfinding.
In particular, we have described how virtual and augmented reality interfaces can be used in place of text- and map-based interfaces, providing an egocentric perspective on location-based information that is lacking from map- and text-based representations. Expert user evaluation has proven to be a useful technique to aid development, and suggests that the most suitable interface is likely to vary according to the user and the task in hand. Continued research, development and evaluation are required to provide increasingly intuitive interfaces for location-based services that can allow users to make associations between spatially referenced information retrieved from mobile information systems and their location in the physical world.
References
Afonso, A.P., Regateiro, F.S. and Silva, M.J. (1998), "UbiData: an adaptable framework for information dissemination to mobile users", Object Oriented Technology, ECOOP'98 Workshop on Mobility and Replication, Brussels, July 20-24, p. 1543.
Azuma, R. (1997), "A survey of augmented reality", Presence: Teleoperators and Virtual Environments, Vol. 6 No. 4, pp. 355-85.
Bodum, L. (2005), "Modelling virtual environments for geovisualization: a focus on representation", in Dykes, J.A., Kraak, M.J. and MacEachren, A.M. (Eds), Exploring Geovisualization, Elsevier, London, pp. 389-402.
Brimicombe, A. and Li, Y. (2006), "Mobile space-time envelopes for location-based services", Transactions in GIS, Vol. 10 No. 1, pp. 5-23.
Dix, A., Finlay, J.E., Abowd, G.D. and Beale, R. (2004), Human-Computer Interaction, Prentice-Hall, Harlow.
Dollner, J. (2005), "Geovisualization and real-time computer graphics", in Dykes, J.A., Kraak, M.J. and MacEachren, A.M. (Eds), Exploring Geovisualization, Elsevier, London, pp. 325-44.
Elf World (2006), "Elven Forest, 3D", available at: www.allelves.ru/forest/ (accessed 12 December 2006).
Feiner, S.K. (2002), "Augmented reality: a new way of seeing", Scientific American, Vol. 4 No. 24, pp. 48-55.
Ghadirian, P. and Bishop, I.D. (2002), "Composition of augmented reality and GIS to visualise environmental changes", Proceedings of the Joint AURISA and Institution of Surveyors Conference, Adelaide, 25-30 November.
Golledge, R.G. and Stimson, R.J. (1997), Spatial Behaviour: A Geographic Perspective, The Guildford Press, New York, NY.
Google (2006), Google Local, available at: http://local.google.co.uk/ (accessed 10 December 2006).
Hallaway, D., Hollerer, T. and Feiner, S. (2004), "Bridging the gaps: hybrid tracking for adaptive mobile augmented reality", Applied Artificial Intelligence, Vol. 18 No. 6, pp. 477-500.
Hedley, N.R., Billinghurst, M., Postner, L., May, R. and Kato, H. (2002), "Explorations in the use of augmented reality for geographic visualization", Presence, Vol. 11 No. 2, pp. 119-33.
Höllerer, T., Feiner, S.K., Terauchi, T., Rashid, G. and Hallaway, D. (1999), "Exploring MARS: developing indoor and outdoor user interfaces to a mobile augmented reality system", Computers and Graphics, Vol. 23 No. 6, pp. 779-85.
Jiang, B. and Yao, X. (2006), "Location-based services and GIS in perspective", Computers Environment and Urban Systems, Vol. 30 No. 6, pp. 712-25.
Kirste, T. (1995), "An infrastructure for mobile information systems based on a fragmented object model", Distributed Systems Engineering Journal, Vol. 2 No. 3, pp. 161-70.
Klatzky, R.L. (1998), "Allocentric and egocentric spatial representations: definitions, distinctions, and interconnections", in Freksa, C., Habel, C. and Wender, K.F. (Eds), Spatial Cognition – An Interdisciplinary Approach to Representation and Processing of Spatial Knowledge, Springer, Berlin, pp. 1-18.
Kray, C. and Kortuem, G. (2004), "Interactive positioning based on object visibility", in Brewster, S. and Dunlop, M. (Eds), Mobile Human-Computer Interaction, Springer, Berlin, pp. 276-87.
Liarokapis, F. (2005), "Augmented reality interfaces – architectures for visualising and interacting with virtual information", DPhil thesis, Department of Informatics, School of Science and Technology, University of Sussex, Brighton.
Liarokapis, F. and Brujic-Okretic, V. (2006), "Location-based mixed reality for mobile information services", Advanced Imaging, Vol. 21 No. 4, pp. 22-5.
Liarokapis, F., Mountain, D., Papakonstantinou, S., Brujic-Okretic, V. and Raper, J. (2006), "Mixed reality for exploring urban environments", Proceedings of the 1st International Conference on Computer Graphics Theory and Applications, Setúbal, 25-28 February, pp. 208-15.
LOCUS (2007), Homepage, available at: www.locus.org.uk (accessed 22 January 2007).
Mannings, R. and Pearson, I. (2003), "'Virtual air': a novel way to consider and exploit location-based services with augmented reality", Journal of the Communications Network, Vol. 2 No. 1, pp. 29-33.
Milgram, P. and Kishino, F. (1994), "A taxonomy of mixed reality visual displays", IEICE Transactions on Information Systems E Series D, Vol. 77 No. 12, pp. 1321-9.
Milgram, P., Takemura, H., Utsumi, A. and Kishino, F. (1994), "Augmented reality: a class of displays on the reality-virtuality continuum", Telemanipulator and Telepresence Technologies, Vol. 2351, pp. 282-92.
Mountain, D.M. (2005), "Exploring mobile trajectories: an investigation of individual spatial behaviour and geographic filters for information retrieval", PhD thesis, Department of Information Science, City University London, London.
Mountain, D.M. and Liarokapis, F. (2005), "Interacting with virtual reality scenes on mobile devices", Human Computer Interaction with Mobile Devices and Services, University of Salzburg, Salzburg, 19-22 September.
Mountain, D.M. and MacFarlane, A. (2007), "Geographic information retrieval in a mobile environment: evaluating the needs of mobile individuals", Journal of Information Science, forthcoming.
NASA Jet Propulsion Laboratory (2006), "Lander and Rover on Mars", available at: http://mars.sgi.com/worlds/pathfinder/pathfinder.html (accessed 12 December 2006).
Ostrem, J. (2002), "Palm OS user interface guidelines", available at: www.palmos.com/dev/support/docs/ui/UIGuide_Front.html (accessed 10 April 2006).
Peng, Z.-R. and Tsou, M.-H. (2003), Internet GIS: Distributed Geographic Information Services for the Internet and Wireless Networks, Wiley, New York, NY.
Piekarski, W. and Thomas, B.H. (2002), Unifying Augmented Reality and Virtual Reality User Interfaces, University of South Australia, Adelaide.
Raper, J.F. (2007), "Geographic relevance", Journal of Documentation, forthcoming.
Rheingold, H.R. (1991), Virtual Reality, Summit Books, New York, NY.
Schofield, E. and Kubin, G. (2002), "On interfaces for mobile information retrieval", Lecture Notes in Computer Science, Vol. 2411, pp. 383-7.
Tamura, H. and Katayama, A. (1999), "Steps toward seamless mixed reality", in Ohta, Y. and Tamura, H. (Eds), Mixed Reality: Merging Real and Virtual Worlds, Ohmsha, Tokyo, pp. 59-84.
Umlauf, E., Piringer, H., Reitmayr, G. and Schmalstieg, D. (2002), "ARLib: the augmented library", Proceedings of the First IEEE International Augmented Reality ToolKit Workshop, Darmstadt.
WebPark (2006), "Geographically relevant information for mobile users in protected areas", available at: www.webparkservices.info (accessed 12 December 2006).
Yell Group (2006), The UK's Local Search Engine, available at: www.yell.com (accessed 12 December 2006).
Corresponding author: David Mountain can be contacted at: dmm@soi.city.ac.uk
8.9 Paper #9
Liarokapis, F., Macan, L., Malone, G., Rebolledo-Mendez, G., de Freitas, S. Multimodal Augmented Reality Tangible Gaming, The Visual Computer, Springer, 25(12): 1109-1120, 2009.
Contribution (30%): Contribution to the design of the architecture. Implementation of parts of the AR interface. Write-up of most of the paper.
Vis Comput (2009) 25: 1109–1120, DOI 10.1007/s00371-009-0388-3
ORIGINAL ARTICLE
Multimodal augmented reality tangible gaming
Fotis Liarokapis · Louis Macan · Gary Malone · Genaro Rebolledo-Mendez · Sara de Freitas
Published online: 27 August 2009, © Springer-Verlag 2009
Abstract This paper presents a tangible augmented reality gaming environment that can be used to enhance entertainment using a multimodal tracking interface. Players can interact using different combinations of a pinch glove, a Wiimote and a six-degrees-of-freedom tracker, through tangible ways as well as through I/O controls. Two tabletop augmented reality games have been designed and implemented, including a racing game and a pile game. The goal of the augmented reality racing game is to start the car and move around the track without colliding with either the wall or the objects that exist in the gaming arena. Initial evaluation results showed that multimodal-based interaction games can be beneficial in gaming. Based on these results, an augmented reality pile game was implemented with the goal of completing a circuit of pipes (from a starting point to an end point on a grid). Initial evaluation showed that tangible interaction is preferred to keyboard interaction and that tangible games are much more enjoyable.
F. Liarokapis · L. Macan · G. Malone, Interactive Worlds Applied Research Group, Coventry University, Coventry, UK; e-mail: F.Liarokapis@coventry.ac.uk, macanl@coventry.ac.uk, maloneg@coventry.ac.uk
G. Rebolledo-Mendez · S. de Freitas, Serious Games Institute, Coventry University, Coventry, UK; e-mail: GRebolledo-Mendez@coventry.ac.uk, s.defreitas@coventry.ac.uk
Keywords Serious games · Pervasive computing · Augmented reality · Multimodal interfaces
1 Introduction
Computerized games which have learning or training purposes represent a popular trend in training due to the wide availability and ease of use of virtual worlds. The use of serious games in virtual worlds not only opens up the possibility of defining learning game-based scenarios but also of enabling collaborative or mediated learning activities that could lead to better learning [1]. An added benefit of using serious games in combination with virtual worlds is that learners engage with these in a multimodal fashion (i.e. using different senses), helping learners to fully immerse themselves in a learning situation [2], which might lead to learning gains [3]. The multimodal nature of virtual worlds [4] and the facilities they offer to share resources, spaces and ideas greatly support the development and employment of serious games and virtual worlds for learning and training. The use of games as learning devices is not new. The popularity of video games among younger people led to the idea of using them for educational purposes [5]. As a result, there has been a tendency to develop more complex serious games which are informed by both pedagogical and game-like, fun elements. One common example of these combinations is the use of agents [6]: the idea behind agents is to provide pedagogical support [7] while offering motivating environments [8]. However, the use of agents is not the only motivating element in serious games, as metaphors [9] and narratives [10] have also been used to support learning and training in game-like scenarios. Tangible games can sometimes have an educational aspect. The whole idea of playability in tangible games is the player's interaction with the physical reality. In addition, the accessibility space is the key to the oscillation between embedded and tangible information [11]. In contrast, augmented reality (AR) has existed for quite a few years and numerous prototypes have been proposed, mainly from universities and research institutes. AR refers to the seamless integration of virtual information with the real environment in real-time performance. AR interfaces have the potential of enhancing ubiquitous environments by allowing necessary information to be visualized in a number of different ways, depending on the user's needs. However, only a few gaming applications have combined the two to offer an enjoyable and easy-to-use interface. This paper presents a tangible augmented reality gaming environment that can be used to enhance entertainment using a multimodal tracking interface. The main objective of the research is to design and implement generic tangible augmented reality interfaces that are user-friendly in terms of interaction and can be used by a wide range of players, including the elderly or people with disabilities. Players can interact using different combinations of a pinch glove, a Wiimote and a six-degrees-of-freedom tracker, through tangible ways as well as through I/O controls. Two tabletop augmented reality games have been designed and implemented, including a racing game and a pile game.
The goal of the augmented reality racing game is to start the car and move around the track without colliding with either the wall or the objects that exist in the gaming arena. Initial evaluation results showed that multimodal-based interaction games can be beneficial in gaming. Based on these results, an augmented reality pile game was implemented with the goal of completing a circuit of pipes (from a starting point to an end point on a grid). Initial evaluation showed that tangible interaction is preferred to keyboard interaction and that tangible games are much more enjoyable.
2 Background
In the past, a number of AR games have been designed in different areas including education, learning, enhanced entertainment and training [12]. A good survey of tracking sensors used in ubiquitous AR environments [13], as well as a taxonomy of mobile and ubiquitous applications [14], has previously been documented. This section presents an overview of the most characteristic applications and prototypes that integrate tracking sensors into AR tabletop and gaming environments. One of the earliest examples of educational AR was the MagicBook [15]. This is a real book which shows how AR can be used in schools for educational purposes and is an interesting method of teaching. MagicBook was also used as a template for a number of serious applications and numerous AR games. One of the earliest pervasive AR prototypes is NaviCam [16], which has the ability to recognize the user's situation by detecting colour-code IDs in real-world environments and displays situation-sensitive information by superimposing messages on its video see-through screen. Another early important work is the Remembrance Agent [17], a text-based wearable AR system which allows users to explore augmented representations over a long period of time and provides better ways of managing such information. EMMIE [18] is a hybrid user interface to a collaborative augmented environment which combines a variety of different technologies and techniques, including virtual elements such as 3D widgets, and physical objects such as tracked displays and input devices. Users share a 3D virtual space and manipulate virtual objects that can be moved among displays (including across dimensionalities) through drag-and-drop. A more recent prototype is DWARF [19], which includes user interface concepts such as multimedia, multimodal, wearable, ubiquitous, tangible, or augmented reality-based interfaces. DWARF covers different approaches that are all needed to support complex human–computer interactions. Higher-level functionality can be achieved, allowing users to manage complex, interrelated processes using a number of physical objects in their surroundings. The framework can be used for single-user as well as multi-user applications. In another prototype, the combination of AR and ubiquitous computing leads to more complex requirements for the geometric models that are appearing [20]. For such models a number of new requirements appear concerning cost, ease of reuse, inter-operability between providers of data, and finally use in the individual application. In terms of enhanced entertainment, outdoor AR gaming plays a significant role. A characteristic example is the Human Pacman project [21], which is built upon position and perspective sensing via GPS, inertia sensors and tangible human–computer interfacing with the use of Bluetooth and capacitive sensors.
The game brings the computer gaming experience to a new level of emotional and sensory gratification by embedding the natural physical world ubiquitously and seamlessly with a fantasy virtual playground. AR Tennis [22] is the first example of a face-to-face collaborative AR application developed for mobile phones. Two players sit across the table from each other, while computer vision techniques are used to track the phone position relative to the tracking markers. When the player points the phone camera at the markers, they see a virtual tennis court overlaid on live video of the real world. Another interesting project is STARS [23], which focused on the nature of state representation in augmented game designs and developed several games based on these principles. Moreover, Mixed Fantasy [24] presents an MR experience that applies basic research to the media industries of entertainment, training and informal education. As far as training is concerned, the US Army paid more than $5 million to design an educational game based on the Xbox platform to train troops in urban combat [25]. Another example is the MR OUT project [26], which uses an extreme and complex layered representation of combat reality, covering all the simulation domains (live, virtual and constructive) by applying advanced video see-through mixed reality (MR) technologies. MR OUT is installed at the US Army's Research Development and Engineering Command and focuses on a layered representation of combat reality.
3 Architecture
The architecture of our system has been based on an earlier prototype [29], but it shares similarities with AR interfaces such as [18, 19, 27, 28]. In the current system, interaction is performed using a pinch glove, a six-DOF tracker and a Wiimote (2 DOF). The processing unit can be wearable (or mobile) and thus the Sony VAIO UMPC was selected (1.3 GHz, two 1.3 mega-pixel cameras, VGA port, Wi-Fi, USB ports, Bluetooth and keyboard/mouse). The rest of the hardware devices used and integrated with the UMPC included a pinch glove (5DT Data Glove), a Wii Remote, a 6-DOF tracker (Polhemus Patriot) and an HMD (eMagin Z800). An overview of the system is shown in Fig. 1. Visualization is enhanced through the use of a head-mounted display (HMD) which includes a three-DOF orientation tracker. Visual tracking is based on multiple markers, which provide better robustness and range of operation, based on the ARToolKit [30] and ARTag [31] libraries. To retrieve multimodal tracking data in real time, a socket was created which was constantly waiting for input. The input comes in a structured form, so data structures were set up in order to grab and store this information for the visualization. This socket server function was placed inside a thread of its own to stop it affecting the whole process while it waits for data to be retrieved. Once the data are received, attributes are assigned to a particular marker or to multiple markers. When that marker comes into sight of the camera, the rendering part recognizes that the current marker has further data attached to it.
4 Tracking
An issue that arose early on with using the player's hand to interact with virtual objects in an AR environment was occlusion. Once the hand moves to interact with the AR scene, it obscures a real-world visual marker. As a result, the on-screen objects disappear; the AR system relies on the markers to accurately register the virtual information on screen.
This being the case, one of the objectives early on was to create a system that would use multiple markers to represent a single object or set of objects. This would mean that even if one or several markers were blocked by a user's hand, for example, the objects would still be displayed based on where the visible markers were placed. The game uses a method of detecting which marker of the available markers on a sheet of paper is currently the most visible, based on a confidence rating assigned to it by a class in ARToolKit. This marker becomes the origin from which all other objects are drawn on the game board. If this marker becomes obscured, the program automatically switches to the next highest confidence-rated marker. The confidence is based on a comparison between the marker pattern stored in memory and what is detected by the camera in the current frame. The first problem encountered with a tangible interface incorporated into an augmented reality application was that of marker occlusion. In order for an object to be drawn on screen, the camera must have a direct line of sight to a recognized marker. If that marker is even partially obscured, the program cannot recognize it as a square nor read the information on it, so the object will not be shown. This presented an important challenge for the project; if the game is to be controlled via a tangible interface, then a user must be able to physically interact with the graphics, which will mean frequently putting their hand between the camera and the marker. Thus, a method of preventing marker occlusion that would be simple for a user to set up and would still allow 3D movement was developed. Moreover, inspired by the currently unavailable ARTag [31], a multiple-marker system was implemented. Using several markers to represent the game playing area, one marker at any given time is selected as the basis from which to draw the objects. This marker is selected through the use of a confidence value, as explained in [29].
5 Multimodal interaction
The main objective of this work was to allow for seamless interaction between the users and the superimposed environmental information. To achieve this, a number of custom interaction devices were researched, such as the PS3 controller, 3D mouse, etc. However, since usability and mobility were crucial, only a few interaction devices were finally integrated into the final architecture. In particular, six different types of interaction were implemented, including: hand position and orientation, pinch glove interaction, head orientation, Wii interaction and UMPC I/O manipulation. A brief overview of these techniques is presented below.
5.1 Polhemus
Once integrated into the architecture, there were several issues that prevented the Polhemus tracker from being as effective as previously hoped. The sensors often suffer from inversion problems, meaning the user's hand is displayed in the wrong position, disrupting the interface for the game. A fix employed was to place the base sensor above and to the left of the game board, a position where the hand would always be expected to appear on the positive side of each axis. By taking the absolute value of each of the position vector's values, the inversion problem was resolved, though this does reduce the area in which movement is tracked. Moreover, the Polhemus Patriot offers two ways in which the data can be captured: single mode and continuous mode.
In the Pile game, the single mode of capture was used, as the program is not multithreaded and would stop functioning when the Polhemus's continuous method of data capture was selected. In testing, however, the single mode method was very slow in reporting the data, causing a severely detrimental effect on the frame rate of the game.
5.2 Hand tracking
Detecting the orientation of the user's hand plays a large part in this project. The intention is to move around environments with ease. A separate measurement is needed from the player's body position, due to the hand being free to move in a different orientation to the player's body. The tracking data were obtained by attaching a small USB web camera to the pinch glove. Based on the ARTag tracking libraries, the hand's pose was combined with data on the player's position and orientation in the environment and then used to compute where the hand is located in the real environment. Based on those readings, it is easy to define different functionalities that may be used for different configurations. As an example, a 'firing' function was implemented based on localization of the hand (see the next section). Another function that was experimentally implemented is multiple camera viewports (one originating from the UMPC camera and another from the mounted web camera) to provide a more immersive view to the user.
5.3 Head orientation
Head orientation was achieved through the capabilities of the HMD, since it includes a three-DOF orientation tracker. The advantage of using head orientation is that it can eliminate the use of computer vision methods for head tracking. However, when used with monitor-based AR, it can produce a distracting effect. Another problem that occurred after experimentation is that if it is used in conjunction with the rest of the sensors (Wiimote and pinch glove), it can confuse the user. For this reason the tracking capabilities of the HMD were not used in the application scenarios.
5.4 Wiimote interaction
It was decided to implement the Wii remote as a device to obtain positional data of the user's hand, as an alternative to mouse controls. Implementation was based on an extensive library written to manage the actual communication with the Wiimote, called Wiiuse. This takes care of all of the Bluetooth communication between the Wiimote and the computer. It also recognizes events and data received from the Wiimote accelerometers, giving orientation information. When the Wiimote was implemented into the system, another thread was added to continuously retrieve data without affecting the rest of the application. When directional or action buttons are activated, different operations may be performed (i.e. start the game, help screen, etc.). The Wiimote was also found to be very useful since it is a very 'mobile' piece of equipment. It is battery-powered, can work for roughly 35 hours without needing a replacement, and emits data via Bluetooth. The only disadvantage of the Wiimote is that it provides 2-DOF tracking, so it is not a complete orientation device. However, for a number of tabletop games (i.e. puzzles, racing, etc.) it is a very useful device, since only the yaw rotation is useful.
5.5 Pinch glove
The pinch glove has internal sensors that give the system data on each finger's position. If a user has placed their index finger into a curled-up position, any event could be triggered.
In some circumstances this is an ideal choice for user input; however, if the user has to hold any other piece of hardware then it would become difficult to make use of the glove's data, because their hand position would be set by whatever is in their hand. The pinch glove allows up to 15 different combinations. However, only five flexures have been implemented at this stage, each corresponding to a finger (translate X-axis, translate Y-axis, translate Z-axis, rotate clockwise and scale). In terms of operation, the glove is initialized and then a thread is created for the constant monitoring of the glove; this thread is responsible for grabbing the data for each finger and assigning it to a variable which can be used throughout the application.
5.6 I/O interaction
The I/O interaction (mouse/keyboard) is adequate only when the HMD is not used. However, it allows users to perform more accurate manipulations of the superimposed information and thus it was explored only as a backup option. On the other hand, the camera mounted on the rear could be used for marker detection and, as the hand was holding the camera, the value returned by ARToolKit would be the position and orientation of the hand.
6 Gaming techniques
In the following subsections, an overview of the main functionality of the generic AR multimodal system is presented.
6.1 Picking and firing
Once it has been established that a user is interacting with a particular object, the program checks the state of the sensors on the glove. If it is detected that the user is bending all five sensors over a certain threshold, then the object adopts the same position and orientation as the user's hand. This gives the impression that the object has been picked up and is now held by the user. If the sensors running along the glove's fingers are detected to straighten, then the item is dropped and falls to the plane representing the virtual ground. One method of interaction that was not fully integrated into the game, but for which the framework was created, is a way of firing from the virtual hand. Figure 2 provides an overview of how firing is performed. By making a predetermined gesture, detected by the glove, a user is able to fire a virtual projectile into the scene. From the marker on the user's hand, we can apply a transformation from its orientation to that of the virtual world and thereby determine the direction a projectile would travel from the hand and whether it would intersect with other objects in the scene. Once integrated into the game, this would allow a user to destroy obstacles or non-player characters (NPCs).
6.2 Collision detection and spatial sound
Using bounding boxes around each of the items in the scene and one to encompass the user's hand, intersection testing was used as a simple method of collision detection. Much as in any game, collision detection plays a vital role in gameplay; in the AR Racing game, for example, the car is not able to cross the boundaries of the game board and its progress is impeded by the other obstacles. Importantly for this particular application, the collision detection also enables the program to determine when the user's hand in the real world is intersecting with an object in the virtual scene. To enhance the level of immersion of the application, features from the OpenAL and AL Utility Toolkit (ALUT) APIs were added. In our system, the camera is always defined as the listener, making the levels of all sounds in the augmented part of the environment relative to the camera's position and orientation.
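To indicate how such a setup might look in practice, the following sketch shows a listener and a single positional source created with OpenAL and ALUT, in the spirit of the description above. It is a minimal illustration rather than the game's actual code; the file name and positions are placeholders.

#include <AL/al.h>
#include <AL/alut.h>

// Create a looping positional source (e.g. the car's engine) from a WAV file.
ALuint createEngineSource(const char* wavPath, float x, float y, float z) {
    ALuint buffer = alutCreateBufferFromFile(wavPath);    // load the ".wav" sample
    ALuint source;
    alGenSources(1, &source);
    alSourcei(source, AL_BUFFER, buffer);
    alSourcei(source, AL_LOOPING, AL_TRUE);
    alSource3f(source, AL_POSITION, x, y, z);              // position of the virtual car
    alSourcePlay(source);
    return source;
}

// Called every frame: the listener follows the tracked camera pose, so volume
// and stereo balance change as the car moves away from the centre of the view.
void updateListener(float camX, float camY, float camZ, const float orientation[6]) {
    alListener3f(AL_POSITION, camX, camY, camZ);
    alListenerfv(AL_ORIENTATION, orientation);              // "at" vector then "up" vector
}

int main(int argc, char** argv) {
    alutInit(&argc, argv);                                  // open device and context
    ALuint engine = createEngineSource("engine.wav", 0.0f, 0.0f, -2.0f);
    float orientation[6] = { 0.0f, 0.0f, -1.0f, 0.0f, 1.0f, 0.0f };
    updateListener(0.0f, 0.0f, 0.0f, orientation);
    alutSleep(2.0f);                                        // let the sample play briefly
    alDeleteSources(1, &engine);
    alutExit();
    return 0;
}

In the actual game the source and listener positions would be updated from the tracked marker and camera poses inside the render loop.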
For the AR Racing game, examples of sound sources defined by the game include 'engine' and 'collision' sounds. The 'engine' sound is assigned to the virtual car, whereas the 'collision' sound represents the noise created when the car collides with other virtual objects in the scene, such as the movable cubes. As the car is directed away from the centre of the camera's view, the sound of its engine gradually reduces in volume and, depending on the direction of movement taken, the balance of the stereo sound is altered accordingly. Collisions between the car and virtual objects on-screen will trigger the playback of a ".wav" file. The volume of this file in each audio channel will again be relative to the positions of both the camera and the collision. This functionality creates a base upon which a more complex system of sound could be developed. For example, the speed of the car affects the sound of the engine, by using different samples depending on the current velocity. Also, given the vehicle's velocity and the perceived material of another object in the scene, different sound files were tested representing varying levels of collision. A fast impact into a hard surface could sound completely different from a slight glance against a soft, malleable object.
6.3 Gestures
The main idea of using gestures in AR gaming was to perform appropriate transformations (i.e. translations, rotations or scaling). Several possible solutions were considered. Firstly, using threshold values for rotation, wrist movements could be interpreted as larger rotations. By monitoring whether rotations occur about a particular axis within a certain number of frames, it is inferred that the user wished to perform an operation (i.e. rotate a piece in that direction, Fig. 3). Depending on the AR gaming scenario, appropriate functionality is assigned. For instance, in the Pile game (see the next section) the pieces can only be placed at right angles; it is reasonable to conclude that if a user wishes to rotate the pipe pieces in a certain direction, then they wish to do so in increments of 90 degrees. Such a gesture operation presents a more comfortable way of playing the game when compared to carrying out the full rotations each time. There were, however, several issues that this functionality presented, which needed to be addressed. Firstly, the speed and extent of the rotations that would trigger the function were very difficult to define. Different users would move at different speeds and have variable ranges of motion. Setting the speed that triggered the rotation function too high would prevent certain users from accessing it, but setting it too low would cause it to be triggered when not required (i.e. by moving the hand to reach different parts of the board).
Fig. 3 As the hand moves from position A to B, the unintentional anticlockwise rotation created is shown highlighted in yellow.
7 Tangible racing AR game
To illustrate the effectiveness of the multimodal interface, the interaction techniques presented above have been combined with a tabletop AR car gaming application. It was decided to use a simple gaming scenario and focus more on the reaction of the players during interaction. The goal of the game is to start the car and move around the track without colliding with either the wall or the objects that exist in the gaming arena. In addition, the objects can be rearranged in real time by picking and dropping them anywhere in the arena.
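The pick-and-drop behaviour referred to here follows section 6.1: when every finger flexure reported by the pinch glove exceeds a threshold, the grasped object adopts the hand's pose; when the fingers straighten, it is released onto the ground plane. The sketch below is only an illustration of that logic, not the published implementation; the Pose structure, the threshold value and the ground height are assumed.

#include <array>

struct Pose { float x, y, z, yaw, pitch, roll; };

struct GraspState {
    bool holding = false;

    // flex[i] in [0, 1] is the flexure of finger i as reported by the glove driver.
    void update(const std::array<float, 5>& flex, const Pose& hand,
                Pose& object, float groundHeight) {
        const float kFlexThreshold = 0.7f;                  // assumed "fist closed" level
        bool fistClosed = true;
        for (float f : flex)
            if (f < kFlexThreshold) { fistClosed = false; break; }

        if (fistClosed) {
            holding = true;
            object = hand;                                  // object follows the hand pose
        } else if (holding) {
            holding = false;
            object.z = groundHeight;                        // released: drop to the virtual ground
        }
    }
};

A full implementation would also use the bounding-box test from section 6.2 to decide which object the hand is actually grasping.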
A screenshot of the starting stage of the car game is shown in Fig. 4. The main aim of the game is to move the car around the scene using the Wiimote without colliding with the other objects or the fountain. However, alternative interaction techniques may be used, such as picking using the pinch glove. Players can interactively change the sound levels (of the car engine as well as the collisions), the speed of the simulation, and finally the colour and the size of the car. In addition, they can interact with the whole gaming arena in a tangible manner by simply physically manipulating the multi-markers. Interaction is performed in a far more instinctive and tangible way than is possible using a conventional control system (for example, keyboard and mouse or a video game controller). The pinch glove was used to move objects in the scene by grabbing them, as illustrated in Fig. 5. It is worth mentioning that the game can be played in a collaborative environment by eliminating the use of the HMD. One player can be in charge of the Wiimote interaction and another of the pinch glove manipulation. So far, the game has only been qualitatively evaluated in two demonstrations, at the 'Cogent Computing Applied Research Centre' [32] and the 'Serious Games Institute' (SGI) [33]. At Cogent, the basic functionality of the game was tested based on the 'think aloud' evaluation technique [34]. Think aloud is a form of observation that involves participants talking through the actions they are performing, and what they believe to be happening, whilst interacting with a system. Overall, the feedback received was encouraging but certain aspects need to be improved in the future. The tasks that were examined included Wiimote interaction and pinch glove interaction. For the first task, a virtual sword was superimposed with a Wiimote placed next to it, as shown in Fig. 6(a). It must be mentioned that yaw rotation cannot be detected, as the motion sensor chip used in the Wiimote is sensitive to gravity, whereas rotation around the yaw axis is parallel to the ground. Users were asked to angle the sword upwards as if to point to an object in the sky. The feedback received from four users was positive. However, one user stated that although it is possible to detect the yaw rotation against one area, it is impossible to identify different IR sensors placed around the whole environment. As a result, interaction using the Wiimote would become confused as one IR sensor went out of sight. For the second task, users were presented with the data glove placed on their right hand, as illustrated in Fig. 6(b). They were then asked to manipulate the virtual information in 3D, in this case the virtual sword used in the previous task. Virtual manipulation included scaling, rotations and translations using the fingers of the pinch glove. All users managed to interact with the pinch glove without any problems as soon as they were briefed on its operation. Four users agreed that it was very intuitive to perform the pre-programmed operations. Two users mentioned that they would like to have more combinations, such as changing the colour and activating/deactivating the textual augmentations. One user stated that it can be tiring to control the UMPC with one hand only. At the SGI the game was demonstrated at an internal event with 20 visitors. Initial feedback received stated that the game is very realistic in terms of interactions and enjoyable to play.
In particular, the idea of picking up virtual objects and placing them in arbitrary positions was received very enthusiastically. Most visitors felt that tangible games presented potential for the next generation of gaming. On the negative side, they would have preferred to experience a more complete gaming scenario, including a score indicating successful achievements. In addition, some users requested more objects in the scene (i.e. obstacles), multi-player capabilities (i.e. more racing cars) and more tracks with different levels of difficulty.
Fig. 5 Pinch glove interaction scenarios: (a) the user is picking up a 3D object (in this case a 3D cube) that exists in the AR game; (b) shows how the user can manipulate the 3D object in three dimensions; (c) the user is dropping the 3D object in the gaming arena; (d) the object is placed in the gaming arena in a random position.
Fig. 6 (a) Wiimote interaction test; (b) data glove interaction test.
8 Pile AR game
The game created for this project is based on a simple game from 1989, 'Pipe Mania' [35], which has sold over 4 million units in the past twenty years. The goal of the game is to complete a circuit of pipes from a starting point to an end point on a grid. While a player is laying these pipes, there is a liquid that is gradually flowing through the pipework. If a player does not connect the pipes quickly enough, and the liquid spills out, the game is lost. There were specific motivations behind using the template of a game already available. To begin with, it allowed for rapid development of the final product, which in itself was primarily a testing ground for the method of interaction that the project is proposing. In the version of 'Pipe Mania' created for this project, the interaction is entirely carried out through movements of a user's hand. Players can reach over to pipe pieces, grip their hand into a fist to hold a piece, then move to a new position and open their hand to drop the piece. On the right-hand side of the board, there is a supply point for pipe pieces, which is automatically replenished when the player picks a piece from there. Crates are positioned over some of the game board's squares, preventing pipes from being placed in some areas and blocking certain paths from the starting pipe to the end pipe (Fig. 7).
Fig. 7 AR pile game: (a) initial setup of the game; (b) pile game in process.
If a user attempts to release a pipe piece on squares blocked by a crate or another pipe piece, they will find that the pipe remains in their hand until they move over an unoccupied square. The pipe pieces can be rotated by turning the hand in relation to the game board or vice versa. The game automatically corrects the rotation to increments of 90 degrees when the piece is placed on the board. Using this technique, a player can change any curved pipe piece to make any turn, and straight pieces can be made to run either left to right or top to bottom. The goal of the game remains the same: place the pieces to complete the pipe from the fixed start to the end before the pipe fills with water. To obtain their opinions on particular aspects of the game's functionality, users were asked to rate their agreement with certain statements on a Likert scale.
For the purposes of this project, the most important aspects of the game were related to its controls, so many of the questions related to this. The set of questions relating to the keyboard and the tangibly controlled versions of the game was exactly the same, to attempt to find the different ways in which users perceived them. Specifically, the project aimed to discover which type of control the users found the most intuitive, the easiest to use and the most enjoyable to interact with. The answers to the questionnaire, along with the observations from the tests and the notes from the post-test interviews, form the basis for the conclusions drawn about the effectiveness of the method of control, as well as the quality of the game developed. After playing the two versions of the pipe game, nine users were asked to indicate how strongly they agreed or disagreed with a number of statements, to gauge their enjoyment of the different types of game. The graphs in Fig. 8 show their responses to some of the questions.
Fig. 8 Interaction test: (a) graph showing users' enjoyment of the tangible interaction method versus their enjoyment of the keyboard interaction; (b) graph showing the users' level of enjoyment of the keyboard and tangibly controlled versions of the game.
By observing the players while they played the two versions of the game, several general points were noted. Firstly, players with a background in gaming, and particularly those with experience of PC games, were much faster to pick up the keyboard controls. People who had little experience of games, or who solely played console games, were slower to understand the controls. Several people in this category were observed to forget the controls and move the hand in ways they did not intend, slowing down the game. They were also noticeably frustrated at times, rotating the hand in the wrong direction and then pressing several keys to find the correct movement through trial and error. Whilst playing the tangible version, all players were able to quickly understand the nature of the controls, even with little or no explanation from the experimenter. The games were generally completed more slowly, however, as the players became used to the interface and also explored the limits of the interaction. Several players had to be prompted to move the board to assist them with rotation, struggling to complete certain moves.
9 Conclusions and future work
Tangible AR gaming has the potential to change a number of applications that we use in our day-to-day activities. This paper has presented a generic tangible augmented reality gaming environment that can be used to enhance entertainment using a multimodal tracking interface. The main objective of the research is to design and implement generic tangible interfaces that are user-friendly in terms of interaction and can be used by a wide range of players, including the elderly or people with disabilities. Players can interact using different combinations of a pinch glove, a Wiimote and a six-degrees-of-freedom tracker, through tangible ways as well as through I/O controls. Two tabletop augmented reality games have been designed and implemented, including a racing game and a pile game. The goal of the augmented reality racing game is to start the car and move around the track without colliding with either the wall or the objects that exist in the gaming arena. Initial evaluation results showed that multimodal-based interaction games can be beneficial in gaming. Based on these results, an augmented reality pile game was implemented with the goal of completing a circuit of pipes (from a starting point to an end point on a grid).
Initial evaluation showed that tangible interaction is preferred to keyboard interaction and that tangible games are much more enjoyable. From the research proposed many potential gaming applications could be produced such as strategy, puzzles and action games. Future development will include more work on the graphical user interface to make it more user-friendly, and speech recognition is considered as an alternative option to enhance the usability of interactions. A potentially better solution that will be tested for the glove in the future on this system is to use a model for the hand that has separate sections for each finger, the position of which would be determined by the current readings on each of the finger sensors on the glove. This model could then use an alpha color value of 1.0, meaning that it is entirely transparent in the scene. As a result the model would become an alpha mask for the live video of the user’s hand; the real-world hand would then appear to be above any virtual objects that the depth testing determined were further away from the camera. Finally, a formal evaluation with 30 users is currently under way and results will be used to refine the architecture. Acknowledgements The authors would like to thank ‘Interactive Worlds Applied Research Group (iWARG)’ as well as ‘Cogent Computing Applied Research Centre (Cogent)’ for their support and inspiration. Videos of the AR racing game and pile game can be found at: http://www.youtube.com/watch?v=k3r181_GW-o and http://www.youtube.com/watch?v=0xPIpinN4r8 respectively. References 1. Tudge, J.R.H.: Processes and consequences of peer collaboration: a Vygotskian analysis. Child Dev. 63, 1364–1379 (1992) 2. Csikszentmihalyi, M.: Flow: The Psychology of Optimal Experience. Harper and Row, New York (1990) 3. Craig, S., Graesser, A., et al.: Affect and learning: an exploratory look into the role of affect in learning. J. Educ. Media 29, 241–250 (2004) 4. de Freitas, S.: Serious Virtual Worlds: A Scoping Study JISC. Serious Games Institute Coventry University, London (2008) 5. Malone, T., Lepper, M.: Making learning fun. In: Snow, R., Farr, M. (eds.) Aptitude, Learning and Instruction: Co-native and Affective Process Analyses, pp. 223–253. Erlbaum, Lawrence (1987) 6. Lester, J.C., Towns, S.G., et al.: Deictic and emotive communication in animated pedagogical agents. In: Cassell, J., Prevost, S., Sullivant, J., Churchill, E. (eds.) Embodied Conversational Agents, pp. 123–154. MIT Press, Boston (2000) 7. Lester, J.C., Converse, S.A., et al.: Animated pedagogical agents and problem-solving effectiveness: a large-scale empirical evaluation. In: Proc. of the 8th Int’l Conference on Artificial Intelligence in Education, pp. 23–30. IOS Press, Kobe (1997) 8. Yoon, S.-Y., Blumberg, B.M., et al.: Motivation driven learning for interactive synthetic characters. In: Fourth International Conference on Autonomous Agents, Barcelona (2000) 9. Laurel, B.: Interface agents: Metaphors with characters. In: Bradshaw, J.M. (ed.) Software Agents, pp. 67–78. AAAI Press/MIT Press, London (1997) 10. Iuppa, N., Weltman, G., et al.: Bringing Hollywood storytelling techniques to branching storylines for training applications. In: 3rd Narrative and Interactive Learning Environments. Edinburgh, Scotland, pp. 1–8 (2004) 11. Walther, B.K.: Reflections on the methodology of pervasive gaming. In: Proc. of the 2005 ACM SIGCHI Int’l Conference on Advances in Computer Entertainment Technology, pp. 176–179. ACM Press, Valencia (2005), 12. 
Oda, O., Lister, L., J., White, S., Feiner, S.: Developing an augmented reality racing game. In: Proc. of the 2nd Int’l Conference on INtelligent TEchnologies for Interactive Entertainment, January 8–10. Cancun, Mexico, Article No. 2 (2008) 13. Beigl, M., Krohn, A., Zimmer, T., Decker, C.: Typical sensors needed in ubiquitous and pervasive computing. In: Proc. of the 1st Int’l Workshop on Networked Sensing Systems (INSS), Tokyo, Japan, June, pp. 153–158 (2004) 14. Dombroviak, K.M., Ramnath, R.: A taxonomy of mobile and pervasive applications. In: Proc. of the 2007 ACM Symposium on Applied Computing, pp. 1609–1615. ACM Press, Seoul (2007) 15. Billinghurst, M., Kato, H., Poupyrev, I.: The MagicBook: a transitional AR interface. Comput. Graph. 25(5), 745–753 (2001) 16. Rekimoto, J., Nagao, K.: The world through the computer: computer augmented interaction with real-world environments. In: Proc. of the 8th Annual Symposium on User Interface Software and Technology (UIST’95), Pittsburgh, Pennsylvania, USA, November, pp. 29–36. ACM Press, New York (1995) 17. Starner, T., Mann, S., et al.: Augmented reality through wearable computing. Presence: Teleoper. Virtual Environ. 6(4), 386–398 (1997) 18. Butz, A., Höllerer, T., et al.: Enveloping users and computers in a collaborative 3D augmented reality. In: Proc. of the 2nd IEEE and ACM International Workshop on Augmented Reality, October, pp. 35–44. IEEE Computer Society, San Francisco (1999), 19. Sandor, C., Klinker, G.: A rapid prototyping software infrastructure for user interfaces in ubiquitous augmented reality. Personal Ubiquitous Comput. 9(3):169–185 (2005) Multimodal augmented reality tangible gaming 1119 20. Reitmayr, G., Schmalstieg, D.: Semantic world models for ubiquitous augmented reality. In: Proc. of Workshop Towards Semantic Virtual Environments’ (SVE 2005), March (2005) 21. Cheok, A.D., Fong, S.W., et al.: Human pacman: a sensing-based mobile entertainment system with ubiquitous computing and tangible interaction. In: Proc. of the 2nd Workshop on Network and System Support for Games, pp. 106–117. ACM Press, California (2003) 22. Henrysson, A., Billinghurst, M., Ollila, M.: AR tennis. In: International Conference on Computer Graphics and Interactive Techniques Archive ACM SIGGRAPH 2006 Sketches, Article No. 13. ACM Press, New York (2006) 23. Magerkurth, C., Engelke, T., Memisoglu, M.: Augmenting the virtual domain with physical and social elements: towards a paradigm shift in computer entertainment technology. Comput. Entertainment 2(4), 12 (2004) 24. Stapleton, C.B., Hughes, C.E., Moshell, J.M.: MIXED FANTASY: exhibition of entertainment research for mixed reality. In: Proc. of the 2nd Int’l Symposium on Mixed and Augmented Reality, pp. 354–355. IEEE Computer Society, Tokyo (2003) 25. Korris, J.: Full spectrum warrior: How the institute for creative technologies built a cognitive training tool for the xbox. In: 24th Army Science Conference. Florida, Orlando, December (2004) 26. Hughes, C.E., et al.: Mixed reality in education, entertainment, and training. IEEE Comput. Graph. Appl., November/December: 24–30 (2005) 27. Benford, S., Magerkurth, C., Ljungstrand, P.: Bridging the physical and digital in pervasive gaming. Commun. ACM 48(3), 54–57 (2005) 28. Magerkurth, C., Engelke, T., Grollman, D.: A component-based architecture for distributed, pervasive gaming applications. In: Proc. of the 2006 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology. ACM Press, California (2006). Article No 15 29. 
Goldsmith, D., Liarokapis, F., et al.: Augmented reality environmental monitoring using wireless sensor networks. In: Proc. of the 12th Int’l Conference on Information Visualisation, pp. 539–544. IEEE Computer Society, Los Alamitos (2008) 30. ARToolKit. Available at: http://www.hitl.washington.edu/ artoolkit/. Accessed at: 21/09/2008 31. ARTag. Available at: http://www.artag.net/. Accessed at: 21/09/2008 32. Cogent Computing Applied Research Centre. Available at: http://www.coventry.ac.uk/researchnet/cogent. Accessed at: 11/10/2008 33. Serious Games Institute. Available at: http://www. seriousgamesinstitute.co.uk/. Accessed at: 11/10/2008 34. Dix, A., Finlay, J., Abowd, G., Beale, R.: Human–Computer Interaction. Prentice Hall, Harlow (2004) 35. Empire Interactive Entertainment: Pipemania 2008 New _2_ With Consoles. Available at: www.empireinteractive.co.uk/ corporate/files/PDFs/Pipemania%202008%20New%20_2_%20 With%20Consoles.pdf. Accessed at: 19/05/2009 Fotis Liarokapis is the Director of Interactive Worlds Applied Research Group (iWARG) at the Faculty of Engineering and Computing, Coventry University and a Research Fellow at the Serious Games Institute (SGI), Coventry University. He is also a Visiting Lecturer at the Centre for VLSI and Computer Graphics, University of Sussex and a visiting research fellow at the giCentre, City University. His research interests include virtual and augmented reality, computer graphics, human–computer interaction and serious games. Furthermore, he is a member of IEEE, IET, ACM and BCS and he is on the editorial advisory board of The Open Virtual Reality Journal published by Bentham. Louis Macan graduated this year with a first class honors B.Sc. degree in Creative Technologies from Coventry University. His previous research involving augmented reality has been published as part of VAST 2007 and IEEE VS-GAMES 2009, at the latter of which he also presented the work during the conference proceedings. Louis has recently worked as a consultant on a communications technology project in the Midlands and is preparing to start development on a serious game with a company in Milan. He will begin his Ph.D. with Coventry University towards the end of 2009. Gary Malone obtained his Bachelor of Arts degree in Creative Computing at Coventry University in 2008. He is currently studying a Master of Science degree at The University of Newcastle-upon Tyne in Mobile and Pervasive Computing. His research interests include computer vision, 3D visualization, augmented reality, biometrics, serious games and virtual worlds. 1120 F. Liarokapis et al. Genaro Rebolledo-Mendez is a Senior Lecturer and Researcher at the Faculty of Informatics University of Veracruz, Mexico. Previously, he was a Senior Researcher at the Serious Games Institute, University of Coventry, UK. He has also been a Research Fellow at the London Knowledge Lab, University of London and the IDEAS Lab, Sussex University, UK. Genaro’s interest is the design and evaluation of educational technology that adapts sensitively to affective and cognitive differences among students. To do so, he studies how cognitive and affective differences impact students’ behavior while interacting with educational technology and how, in turn, technology impacts students’ learning and affect. To that end, he uses techniques from artificial intelligence, computer science, education and psychology. 
Sara de Freitas is Professor of Virtual Environments and Director of Research at the Serious Games Institute (SGI)—an international hub of excellence in the area of games, virtual worlds and interactive digital media for serious purposes. Situated on the Technology Park at the University of Coventry, Sara leads an interdisciplinary and crossuniversity applied research group with expertise in AI and games, visualization, mixed reality, augmented reality and location-aware technologies. The Research Group works closely with international industrial and academic research and development partners. Sara is a Visiting Fellow of the London Knowledge Lab, London, and a Fellow of the Royal Society of Arts. Interactive Virtual and Augmented Reality Environments 138 8.10 Paper #10 Goldsmith, D., Liarokapis, F., Malone, G., Kemp, J. Augmented Reality Environmental Monitoring Using Wireless Sensor Networks, Proc. of the 12th International Conference on Information Visualisation (IV08), IEEE Computer Society, 8-11 July, 539-544, 2008. Contribution (30%): Collaboration on the design of the architecture. Advice on the implementation of the majority of the VR interface. Write-up of most of the paper. Augmented Reality Environmental Monitoring Using Wireless Sensor Networks Daniel Goldsmith, Fotis Liarokapis, Garry Malone, John Kemp Cogent Computing Applied Research Centre, Coventry University Department of Computer Science, Coventry CV1 5FB {goldsmid, F.Liarokapis, maloneg, kempj}@coventry.ac.uk Abstract Environmental monitoring brings many challenges to wireless sensor networks: including the need to collect and process large volumes of data before presenting the information to the user in an easy to understand format. This paper presents SensAR, a prototype augmented reality interface specifically designed for monitoring environmental information. The input of our prototype is sound and temperature data which are located inside a networked environment. Participants can visualise 3D as well as textual representations of environmental information in real-time using a lightweight handheld computer. Keywords--- Augmented Reality, Handheld Interfaces, Human-Computer Interaction, Wireless Sensor Networks. 1. Introduction Augmented Reality (AR) is a subset of a Mixed Reality (MR) that allows for seamless integration of virtual and real information in real-time. Other important characteristics of AR include real-time and accurate representation in three-dimensions (3D) as well as being interactive. However, AR is not limited to vision but can be applied to all senses including touch, and hearing [1]. Although many applications of AR have emerged, they are usually concerned with tracking. This is achieved using computer vision techniques, sensor devices, or multimodal interactions to calculate position and orientation of a camera/user. However, AR has not been actively employed for the visualisation of environmental information originating from a Wireless Sensor Network (WSN). Research within the WSN community has led to the development of new computing models ranging from distributed computing to large-scale pervasive computing environments [2]. This rapid evolution of pervasive computing technologies has allowed the development of novel interfaces which are capable of interacting with sensory information originating from the environment with little or no manual intervention. Although a number of technologies are able to perform natural interactions, pervasive AR is one of the strongest candidates. 
WSN technology uses networks of sense-enabled miniature computing devices to gather information about the world around them. Common applications include environmental monitoring, military, health, home, and education [3], [4]. While the gathering of data within a sensor network is one challenge, another of equal importance is presenting the data in a useful way to the user. A sensor network is composed of a large number of low-cost, low-power sensor nodes equipped with wireless communication and sensing hardware. These are deployed within the area of interest to monitor and measure phenomena, and they collaboratively process the data before relaying information to a base station or sink node. The constrained nature of the data-gathering platform has led much of the active WSN research to focus on network concerns such as data communication and energy efficiency. Recent initiatives such as Nokia's Sensor Planet [5] aim to incorporate sensor networks, mobile phones, and other devices into a large-scale ad-hoc multipurpose sensor network [6] with sensor information available via a web-based Application Programming Interface (API). The use of these commonly available and familiar devices is envisaged to allow WSNs to become part of the pervasive computing mainstream, requiring new approaches to information visualisation to process the vast amount of information available. This paper presents SensAR, an environmental monitoring prototype that uses a WSN to gather temperature and audio data about the user's surroundings. SensAR displays the environmental information in an understandable format using a real-time handheld AR interface. Participants can visualise 3D as well as textual representations of the sound and temperature information in a tangible manner. To our knowledge, SensAR is the first prototype to embody the idea of combining sound and temperature data in a handheld AR environment. The remainder of this paper is organised as follows: Section 2 describes some of the most important related work. Section 3 gives an overview of the system architecture, including a brief description of its main components. Section 4 presents the operation of the handheld AR interface in an indoor networked environment. Section 5 illustrates how the environmental information coming from the sound and temperature sensors is visualised in an AR environment through 3D objects and textual annotations. Finally, section 6 concludes by presenting our plans for future work. 2. Related Work A number of WSN applications have been proposed in the past and some of the most characteristic systems are presented here. iPower [7] utilises a WSN to provide intelligent energy conservation for buildings. The system is composed of a sensor network that gathers data on light levels, temperature, and sound to activate appliances based on the likelihood of a room being occupied. If the system detects low temperature or high brightness in a room that is unlikely to be occupied, a signal can be sent to turn off the air-conditioning or reduce lighting levels. If the network receives a signal that the area is still occupied (for instance the detection of a noise), the system returns the light and temperature levels to values suitable for comfortable use of the room.
Aside from a system overview provided by the user interface iPower has no data visualization, it nevertheless presents a practical application of wireless sensor networks in environmental monitoring. SpyGlass [8] is concerned with the provision of a visualization framework for WSNs. Data is gathered on a gateway node within the network then passed to the visualization application on a remote machine. The data is passed using the TCP/IP suite of protocols and therefore can be carried over many network types including Local Area Network (LAN), Wireless Local Area Network (WLAN), and General Packet Radio Service (GPRS). Network visualisation is provided by a Graphical User Interface (GUI) allowing an overall view of the network to be displayed. This visualization component is comprised of a relation layer to display relationships between nodes and a node layer to draw the nodes themselves. The ‘Plug’ sensor network [9] is a ubiquitous networked sensing platform ideally suited to broad deployment in environments where people work and live. The backbone of the Plug sensor network is a set of 35 sensor, radio, and computation enabled power strips. A single Plug device fulfils all the functional requirements of a normal power strip and can be used without special training. Additionally, each Plug has a wide range of sensing modalities (e.g., sound, light, electrical voltage and current, vibration, motion, and temperature) for gathering data about how it is being used and its nearby environment. In terms of handheld AR sensing applications, most prototypes that exist focus on multimodal interactions using tracking sensors. An interesting approach to 3D multimodal interaction in immersive AR environment that accounts for the uncertain nature of the information sources was proposed by [10]. The multimodal system fuses symbolic and statistical information from a set of 3D gesture, spoken language, and referential agents. The referential agents employ visible or invisible volumes that can be attached to 3D trackers in the environment, and which use a time stamped history of the objects that intersect them to derive statistics for ranking potential referents. Another approach proposed an architecture for handling events from different tracking systems and maintaining a consistent spatial model of people and objects [11]. The principal distinguishing feature is the automatic derivation of dataflow network of distributed sensors, dynamically and at run-time, based on requirements expressed by clients. This work also classifies sensor characteristics for AR and Ubicomp. Moreover, a grid of sensors was used to synthesize images in AR by interpolating the data and mapping them to colour values [12]. This application used an optically tracked mobile phone as a see-through handheld AR display allowing for interaction metaphors already familiar to most mobile phone users. The sensor network is interfaced by visualizing its data within its context, taking advantage of the spatial information. Furthermore, techniques for creating indoor location based applications for mobile augmented reality systems using computer vision and sensors have been also well documented [13]. An indoor tracking system was proposed that covers a substantial part of a building. It is based on visual tracking of fiducial markers enhanced with an inertial sensor for fast rotational updates. To scale such a system to a whole building, a space partitioning scheme was introduced to reuse fiducial markers throughout the environment. 3. 
System Architecture SensAR follows an experimental prototype recently presented [14]. However, there are many differences with the earlier prototype. Firstly, sound and temperature sensors are populated inside the environment (see Figure 2). Secondly, WiFi is used instead of Bluetooth providing a much faster method of communication, although Bluetooth can be enabled for connecting other hardware devices. Finally, the mobile client side provides enhanced visualisation options including textual and 3D information. SensAR uses a three-tier architecture consisting of a sensor layer, communication layer, and visualisation layer. A diagrammatic overview of the pipeline of our system is presented in Figure 1. Figure 1 Architecture of SensAR The sensor layer handles multimodal data from temperature and sound sensors, positioned at fixed locations within an indoor environment. These sensors are attached to a WSN node, which is capable of performing the initial processing before passing the data up the protocol stack. In the case of the WSN node, the 540 data is formatted ready for transmission by the communication layer. The data is transferred over a WiFi link via User Datagram Protocol (UDP) to a dedicated server on the visualisation machine. This link is bidirectional, and allows control packets to be sent between each device. The visualisation layer contains the handheld device running the AR software. The data received is represented using visual information such as 3D objects and textual information. 3.1 Hardware There are a variety of available embedded platforms for sensing applications. Communication technologies such as Bluetooth, WiFi and ZigBee [15] allow for network collection and transfer of environmental data to wearable devices. The hardware choice decision for the network discussed here was based on the available platforms' sensing capability, ease of software development and size. Gumstix Verdex XM4-bt boards were selected as the main processing platform. Although not as popular as Mica2 motes for wireless sensing applications, they are becoming more prevalent [16]. These devices offer more processing power and memory (in terms of both RAM and flash) than many similarly sized platforms. The particular model chosen includes an Intel XScale PXA270 400MHz processor, 16MB of flash memory, 64MB of RAM, a Bluetooth controller and antenna, 60pin and 120-pin connectors for expansion boards, and a further 24-pin flex ribbon connector. There are no onboard sensors provided, though a variety of interface methods are available. Figure 2 Sound and Temperature Sensors Commercially available expansion boards for the Gumstix platform include communications options such as WiFi and Ethernet, along with additional storage provided by Compact Flash (CF) cards. An expansion board developed in house additionally provides an I2 C bus for the connection of sensors, along with a ZigBee compatible module. The sensors used for temperature sensing were the Analog Digital ADT75A chip [17], which performs sampling and conversion internally, providing the sensed temperature values via an I2 C bus. For visualisation, a VAIO UX Ultra Mobile PC (UMPC) was used, which is one of the smallest fully functioning PCs ever made. Comparable to PDAs in size, but with more powerful processing capabilities, it is able to run complex AR applications. 
The VAIO UX includes an Intel® Core™ Solo processor at 1.3 GHz, wireless 802.11a/b/g, a 32GB hard drive, 1GB SDRAM, a 4.5" touch panel LCD, a graphics accelerator and 2 built-in digital cameras. This makes it a suitable device to handle our WSN configuration and display the visualisation with real-time performance. 3.2 Software At the heart of the sensing system is a collection of software libraries developed as part of a software support system for WSNs. The provision of a generic interface to common sensor network tasks allows the implementation details of complex tasks to be hidden, thereby offering the systems designer a cleaner workflow. Software abstractions of sensing and communication tasks have been created, allowing the user to plug functionality into the application. A generic interface to the I²C bus has been implemented to allow access to data from the temperature modules. The API allows other I²C-enabled devices such as digital compasses, pressure sensors, accelerometers, and light meters to be supported. Using an abstraction model for sensing interfaces, the process of gathering data is simplified, as similar function calls are used to retrieve information from different devices. This in turn allows a modular approach to application development. The framework supports a range of communication protocols and interfaces, offering the choice of Bluetooth, WiFi and Ethernet-based data transfer. Support is also provided for the network protocols offered by each communications stack. As an example, WiFi offers connection-oriented TCP and connectionless UDP, allowing the user to balance the requirements of the application against the quality of service received. In keeping with the modular theme of the framework, the communication modules are interchangeable. This allows the user to swap between radio devices by simply changing the software module used. In the case of WiFi and Ethernet this is a straight swap, as the two communication media use the IP suite of protocols and addressing schemes. However, if the user wishes to switch to Bluetooth communication, an alternative hardware addressing scheme would need to be used; all other communication calls are handled in the same way regardless of the communication medium. The sensing layer was developed using the above framework. Using the high-level Python programming language for development has allowed the algorithms for the gathering of data to be prototyped with a development cycle much shorter than that associated with compiled languages such as C. Although Python offered ease of development, the framework has also been implemented as a collection of C libraries, allowing the final application to be transferred to this faster-executing compiled language for deployment. Whilst Python and C differ in syntax, the framework has been designed to take account of the similarities in functionality and programming methodology afforded by both languages. This allows the code developed to be transferred between the two languages with only small syntactical changes. The visualisation layer used the OpenGL API for the rendering of the 3D environmental representations. The textual augmentations were implemented with the GLUT API, which provides support for bitmap fonts. The six-degrees-of-freedom tracking of the user inside the environment was based on the ARToolKit library [18], and the rest of the coding of the handheld interface was performed in the C programming language.
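To make the data path from the sensing layer to the visualisation machine concrete, the short Python sketch below mirrors the kind of loop such a framework might expose: a node reads a sensor value through a generic call and forwards it over connectionless UDP. It is an illustrative sketch under stated assumptions, not the SensAR code; the packet layout, host address, port and the read_temperature helper are hypothetical.

```python
import socket
import struct
import time

# Assumed address of the visualisation machine (example values only).
VIS_HOST, VIS_PORT = "192.168.0.10", 5005

def read_temperature(node_id: int) -> float:
    """Placeholder for the framework's generic sensing call (e.g. an ADT75 read over I²C)."""
    return 21.5  # dummy value for illustration

def run_node(node_id: int, period_s: float = 1.0) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # connectionless UDP
    while True:
        temp_c = read_temperature(node_id)
        # node id (unsigned short) + temperature (float), network byte order
        packet = struct.pack("!Hf", node_id, temp_c)
        sock.sendto(packet, (VIS_HOST, VIS_PORT))
        time.sleep(period_s)
```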
Finally, the 3D models used in the visualisation were designed using an open source modelling tool (Blender) and exported in VRML file format. 4. Handheld AR Interface A handheld AR interface has been implemented in order to allow a user to experience the environmental information gathered. Sensors collect sound and temperature level data at various points in space and relay this information to SensAR. A user interface is then used to seamlessly superimpose computer generated representations of sound and temperature based on the readings of these sensors. Figure 3 illustrates how a user operating the handheld interface would perceive a 3D representation of environmental information (in this case temperature and sound) in a mobile AR environment. Figure 3 Handheld AR Visualisation Users can navigate inside the room by moving the UMPC and detecting different markers. SensAR checks each video frame for predetermined patterns that are included in the environment. These are squares containing a unique black and white image that the program can be programmed to recognize. The markers used in this project have been specifically selected from the ARTag library [19] to be distinct from one another regardless of orientation or reflection. The current version of the system uses patterns numbered 1 to 12, taken from the ARTag implementation of the ARToolKit as shown in Figure 4. Figure 4 Marker Setup The markers are placed so that the centre of the pattern is halfway up the height of the wall (142.5cm from the floor). For each marker different sound and temperature sensors are attached as close to the markers as possible to give accurate localisation readings. The markers are enlarged as much as possible whilst still fitting on a single sheet of A4 size sheet of paper. When the program detects a marker within a video frame, it overlays a 3D model of a thermometer and a music note onto the video image (Figure 6 and Figure 7). One of the versions of our program also includes a 3D representation of the entire room, which is projected over the real room in AR. In order for this to line up with the real image sent from the camera, we have to attach the model to one of the 12 markers (Figure 4), much in the same way as the 3D virtual sensors as illustrated in Figure 5. 542 Figure 5 Virtual Representation of Environment However, if there are several markers in view, we don't want the program to draw multiple versions of the virtual room. To prevent this, we exploit the confidence value that is used in marker detection. Each detected pattern is then checked for correlation with the markers detected by the program and a confidence value is generated to show the level of similarity. SensAR compares the confidence values of the patterns that have been established as being markers. The marker that has the greatest confidence value is used as the point from which to draw the virtual room. One advantage with using this system is that the room will automatically revert to the next best marker in sight should the most visible marker become obscured. 5. Environmental Data Visualisation There is an open issue of how to visually represent environmental data coming from the WSNs. One of the aims of this work was to select an appropriate metaphor to assist users in rapid interpretation of the information. After some informal evaluation, it was decided to represent the environmental information through the use of a 3D thermometer and a 3D music note. 
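The single-anchor marker rule described above is simple enough to state in a few lines. The Python fragment below sketches that selection logic under the assumption that the tracking library reports, for each frame, the id, confidence value and pose of every detected pattern; the data structure and function name are illustrative and are not the actual ARToolKit/ARTag calls.

```python
# Sketch of the rule described above: among the patterns detected in the current
# frame, the highest-confidence marker anchors the virtual room, so the scene
# automatically falls back to the next best marker if the most visible one
# becomes obscured. Marker ids 1-12 match the setup in Figure 4.
ROOM_MARKERS = set(range(1, 13))

def pick_room_anchor(detections):
    """detections: iterable of (marker_id, confidence, transform) tuples."""
    candidates = [d for d in detections if d[0] in ROOM_MARKERS]
    if not candidates:
        return None                               # no known marker in view
    return max(candidates, key=lambda d: d[1])    # highest-confidence marker
```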
In the previous prototype (which included only sound data) a 3D microphone was used. In terms of operation, as soon as the temperature and sound sensors are ready to transmit data, visual representations including a 3D thermometer and a 3D music note as well as textual annotations are superimposed onto the appropriate marker. This is the neutral stage of SensAR where no sensor readings are actually inputted to the AR interface. An example screenshot of the neutral stage is illustrated in Figure 6. Figure 6 Low Levels of Sound and Temperature When environmental data is transferred to the AR interface, the color of the 3D thermometer and the 3D music note change according to the temperature level and sound volume accordingly. In addition, textual annotations indicate the sensor readings. For the temperature data, the readings from the sensors (which have an error of ± 0.1) are superimposed as text next to the 3D thermometer. For the sound data, a different measure was employed based on a scale 0 to 4, where 0 corresponds to ‘quiet’, 1 corresponds to ‘low’, 2 corresponds to ‘medium’, 3 corresponds to ‘loud’ and 4 corresponds to ‘very loud’. This choice of banding has been based on user input, to provide a clearer representation of sound levels than a raw value could. Also note that the bottom right side displays the intensity of the sound level. A screenshot of the above configuration is shown in Figure 7. Figure 7 High Levels of Sound and Temperature It is worth-mentioning that the camera position is also displayed on the top left side of the interface. This feature is useful for calculating the position of the user in respect to the rest of the environment. Moreover, users can interact with the superimposed information using the keyboard or the mouse of the UMPC. In this way, it is possible to translate, rotate or scale the visual augmentations in real-time. In addition, it is possible to 543 hide the various elements of the interface such as the camera position, the textual annotations and the 3D objects. 6. Conclusions and Future Work This paper describes SensAR, a prototype mobile AR system for visualising environmental information including temperature and sound data. Sound and temperature data are transmitted wirelessly to our client which is a handheld device. Environmental information is represented graphically as 3D objects and textual information in real-time based. Participants visualise and interact with the augmented environmental information using a small but powerful handheld computer. The main advantage of SensAR is the visual representation of wireless sensor data in a meaningful and tangible way. We believe that SensAR design principle is essential for the effective realisation of ubiquitous computing. In the future we are planning to integrate more sensors to SensAR including light, pressure and humidity. On the visualization side, we are currently working with a head-mounted display that includes orientation tracking to provide a greater level of immersion to the users. In terms of interaction other forms of interaction will be added to the prototype such as a digital compass, a virtual reality glove and the Wii controller. Finally we plan to do user extensive studies to test the feasibility of SensAR application. Acknowledgements The authors would like to thank Dr. Elena Gaura and the rest of the team in the Cogent Computing Applied Research Centre for their support and inspiration as well as Louis Macan, Sarah Mount and Prof. 
Robert Newman who worked so hard during the development of the first prototype. References [1] Azuma, R., Baillot, Y., et al. Recent Advances in Augmented Reality, IEEE Computer Graphics and Applications, 21(6): 34-47, 2001. [2] Harihar, K., Kurkovsky, S. Using Jini to Enable Pervasive Computing Environments, In Proc. of the 43rd Annual Southeast Regional Conference - Volume 1, Architecture and distributed systems, ACM Press, Kennesaw, Georgia, 188-193, 2005. [3] Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E. Wireless sensor networks: a survey. Computer Networks, 38(4): 393-422, 2002. [4] Culler, D., Estrin, D., Srivastava, M. Guest Editors' Introduction: Overview of Sensor Networks, Computer 37(8): 41-49, 2004. [5] SensorPlanet, Available at: [http://www.sensorplanet.org/], Accessed at: 29/02/2008. [6] Tuulos, V.H., Scheible, J., Nyholm, H. Combining Web, Mobile Phones and Public Displays in Large-Scale: Manhattan Story Mashup, In Proc. of the 5th Int’l Conference on Pervasive Computing, Canada, 37-54, 2007. [7] Yeh, L.W. Wang, Y.C, Tseng, Y.C. iPower: An Energy Conservation System for Intelgent Buildings by Wireless Sensor Networks, To appear in Int'l Journal of Sensor Networks, 5(2), 2009. [8] Buschmann, C., Pfisterer, D., et al. Spyglass: a wireless sensor network visualiser, SIGBED Review, 2(1): 1-6, 2005. [9] Lifton, J., Feldmeier, M., et al. A platform for ubiquitous sensor deployment in occupational and domestic environments, In Proc. of the 6th Int’l Conference on Information Processing in Sensor Networks, ACM Press, Cambridge, Massachusetts, USA, 119-127, 2007. [10] Kaiser, E., Olwal, A., et al. Mutual Disambiguation of 3D Multimodal Interaction in Augmented and Virtual Reality, In Proc. of the 5th Int’l Conference on Multimodal Interfaces, ACM Press, November 5-7, Vancouver, British Columbia, Canada, 12-19, 2003. [11] Newman, J., Schall, G., Schmalstieg, D. Modelling and Handling Seams in Wide-Area Sensor Networks, In Proc. of the 10th Int’l Symposium on Wearable Computers, IEEE Computer Society, Montreux, Switzerland, 51-54, 2006. [12] Rauhala, M., Gunnarsson, A.S., Henrysson, A. A novel interface to sensor networks using handheld augmented reality, In Proc. of the 8th Int’l Conference on HumanComputer Interaction with Mobile Devices and Services, ACM Press, Helsinki, Finland, 145-148, 2006. [13] Reitmayr, G., Schmalstieg, D. Location based Applications for Mobile Augmented Reality, In Proc. of the 4th Australasian User Interface Conference, Adelaide, Australia, 65-73, 2003. [14] Liarokapis, F., Newman, R., et al. Sense-Enabled Mixed Reality Museum Exhibitions, In Proc. of the 8th Int’l Symposium on Virtual Reality, Archaeology and Cultural Heritage, Eurographics, Brighton, UK, 26-30 November, 31-38, 2007. [15] IEEE 802.15.4. IEEE Standard for Information technology Part 15.4: Specifications for Low-Rate Wireless Personal Area Networks (LR-WPANs), 2003. [16] Keoh, S.L., Dulay, N., et al. Self managed cell: A middleware for managing body sensor networks. In Proc of the 4th Int’l Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (Mobiquitous), Philadelphia, USA, August, 1-5, 2007. [17] ADT75, Available at: [http://www.analog.com/en/prod/0%2C2877%2CADT75 %2C00.html], Accessed at: 25/02/2008. [18] ARToolKit, Available at: [http://www.hitl.washington.edu/artoolkit/], Accessed at: 25/02/2008. [19] ARTAG, Available at: [http://www.artag.net/], Accessed at: 29/02/2008. 
544 Interactive Virtual and Augmented Reality Environments 145 8.11 Paper #11 Liarokapis, F., Debattista, K., Vourvopoulos, A., Ene, A., Petridis, P. Comparing interaction techniques for serious games through brain-computer interfaces: A user perception evaluation study, Entertainment Computing, Elsevier, 5(4): 391-399, 2014. Contribution (40%): Collaboration on the design of the architecture. Advice on the implementation of the serious game as well as in the BCI interface. Write-up of most of the paper. Interactive Virtual and Augmented Reality Environments 155 8.12 Paper #12 Sylaiou, S, Liarokapis, F., Kotsakis, K., Patias, P. Virtual museums, a survey and some issues for consideration, Journal of Cultural Heritage, Elsevier, 10(4): 520-528, 2009. Contribution (30%): Collaboration on the collection of the material and write-up of the paper. Journal of Cultural Heritage 10 (2009) 520–528 Review Virtual museums, a survey and some issues for consideration Sylaiou Styliania,∗, Liarokapis Fotisb, Kotsakis Kostasa,c, Patias Petrosa,d a Inter-departmental Postgraduate Program of School of Technology ‘Protection, Conservation and Restoration of Cultural Monuments’, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece b Interactive Worlds Applied Research Group, Coventry University, Coventry, CV1 5FB, United Kingdom c Department of History and Archaeology, Aristotle University of Thessaloniki, Thessaloniki, Greece d Department of Rural & Surveying Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece Received 12 November 2007; accepted 23 March 2009 Abstract Museums are interested in the digitizing of their collections not only for the sake of preserving the cultural heritage, but to also make the information content accessible to the wider public in a manner that is attractive. Emerging technologies, such as VR, AR and Web3D are widely used to create virtual museum exhibitions both in a museum environment through informative kiosks and on the World Wide Web. This paper surveys the field, and while it explores the various kinds of virtual museums in existence, it discusses the advantages and limitation involved with a presentation of old and new methods and of the tools used for their creation. © 2009 Elsevier Masson SAS. All rights reserved. Keywords: Virtual museums; E-Heritage; Cultural informatics; Virtual reality; Augmented reality; Haptics 1. Introduction Silverstone states that ‘museums are in many respects like other contemporary media. They entertain and inform; they tell stories and construct arguments; they aim to please and to educate; they define, consciously or unconsciously; effectively or ineffectively, an agenda; they translate the otherwise unfamiliar and inaccessible into the familiar and accessible’ [1, p. 162]. An extensive research work [2,3] and a survey of the European museum sector [4] have shown that information technologies such as the World Wide Web (WWW) enhanced by three-dimensional visualization tools can provide valuable help to achieve the aims mentioned above. Furthermore, their use by a wide range of cultural institutions, such as museums, has become easier due to an ever-increasing development of interactive techniques and of new information technology software and hardware, accompanied by a decrease in cost. Information technologies provide solutions to issues of space limitation, of the considerable exhibitions cost and of curator’s concerns concerning the fragility of some museum artefacts. Conferences ∗ Corresponding author. 
Tel.: +30 2310 996407; fax: +30 2310 994207. E-mail address: sylaiou@photo.topo.auth.gr (S. Styliani). such as the ICHIM Conferences on Hypermedia and Interactivity in Museums1 started in 1991 and Museums and the Web,2 established in 1997, highlight the importance of introducing new technologies in museums. The utility and the potential benefits for museums of emerging technologies such as Virtual Reality (VR) [5–7], Augmented Reality (AR) [8–10] and Web technologies [11,12] have been well documented by a number of researchers [13]. In the 1980s, museums influenced by the New Museology and began to change the way they conveyed the context information of the exhibits to the wider public. There was a shift in the museology concept towards considering that the context of a cultural artefact was more important than the item itself [14–17]. By means of innovative methods and tools and by taking advantage of the WWW potential as an information source, virtual museums were created. They have made the content and context of museum collections more accessible and attractive to the wide public and have enriched the museum experience. There is no official figure yet for the number of virtual museums presently existing worldwide but we know that there are thou- 1 Available at: http://www.archimuse.com/conferences/ichim.html. 2 Available at: http://www.archimuse.com/conferences/mw.html. 1296-2074/$ – see front matter © 2009 Elsevier Masson SAS. All rights reserved. doi:10.1016/j.culher.2009.03.003 S. Styliani et al. / Journal of Cultural Heritage 10 (2009) 520–528 521 sands of them and that their number is rapidly on the increase [18]. This article will present the results of a survey on the current state-of-the-art in virtual museums. The purpose behind this is threefold: (a) to review the various types and forms that a virtual museum can have and the characteristics of these; (b) to present an analysis of their advantages and to highlight their potential; (c) to present an overview of emerging technologies used by virtual museums. 2. Types of virtual museums The idea of the virtual museum was first introduced by André Malraux in 1947. He put forward the concept of an imaginary museum (le musée imaginaire) [19], a museum without walls, location or spatial boundaries, like a virtual museum, with its content and information surrounding the objects, might be made accessible across the planet. A virtual museum is: “a collection of digitally recorded images, sound files, text documents and other data of historical, scientific, or cultural interest that are accessed through electronic media” [20]. With no standard definition prevailing for the term ‘virtual museum’, the definition adopted for the purpose of this article describes it as: “(. . .) a logically related collection of digital objects composed in a variety of media, and, because of its capacity to provide connectedness and various points of access, it lends itself to transcending traditional methods of communicating and interacting with the visitors being flexible toward their needs and interests; it has no real place or space, its objects and the related information can be disseminated all over the world” [21]. Another less rigid definition states that a virtual museum can be a digital collection that is presented either over the Web, or to an intranet, either via a personal computer (PC), an informative kiosk, a personal digital assistant (PDA), or even to a CD-ROM as an extension of a physical museum, or that it can be completely imaginary. 
Furthermore, the abstract term virtual museum can take various forms depending on the application scenario and end-user. It can be a 3D reconstruction of the physical museum [22]. Alternatively, it can be a completely imaginary environment, in the form of various rooms, in which the cultural artifacts are placed [23]. According to ICOM [24], there are three categories of virtual museums on the Internet that are developed as extensions of physical museums: the brochure museum, the content museum and the learning museum. The brochure museum aims at informing future visitors about the museum and is mainly used as a marketing tool, with basic information such as location, opening hours and sometimes a calendar of events etc. [25,26], in order to create motivation to visit the walled museum. The content museum is a website created with the purpose of making information about the museum collections available. It can be identified to a database containing detailed information about the museum collections, with the content presented in an objectoriented way. The learning museum is a website, which offers different points of access to its virtual visitors, depending on their age, background and knowledge. The information is presented in a context-oriented, rather than object-oriented way. Moreover, the site is educationally enhanced and linked to additional information intended to motivate the virtual visitor to learn more about a subject of particular interest to them and to visit the site again. The goal of the learning museum is to make the virtual visitor come back and to make him/her establish a personal relationship with the online collection. 3. Emerging tools and technologies used by virtual museums Technological advances that have emerged as areas of crucial interest are making it possible to use sophisticated tools to provide customized interfaces for the generation of virtual museums, to design a virtual museum exhibition in a number of ways [27,58] and to get used as conveyors of information for knowledge construction, acquisition and integration. New types of interfaces, interaction techniques and tracking devices are developing at a rapid pace and can be integrated into multimodal interactive VR and AR interfaces [9]. The first studies in the field were mainly focused on static presentations of texts and photos concerning a museum. Later on, the exhibits tended to be more dynamic and interactive rather than static in nature and authoritative [28,27], thus creating an approach which was closer to reality and enhancing the experience for virtual visitors. Usually, the structure of most virtual exhibitions is defined by the structure of exhibition spaces [11] that consist of two types of elements:theVirtualGalleriesandtheCulturalObjects.Exhibits are the principal means through which museums communicate their mission objectives and they can be static or interactive. According to research the key features of an online interactive exhibit are: (a) multiplicity of contexts for the user to connect with the exhibit in a seamless manner; (b) good instructional design; (c) pro-active learning contexts; (d) good balance between learning and leisure; (e) no text-heavy pages to interfere with the learning experience [29]. In this section, a brief overview of the most characteristic methods and tools currently used for the generation of virtual museum exhibitions and their exhibits are presented. 3.1. 
Imaging technology Virtual museums need high-resolution images in order to provide as much information as possible about the virtual exhibits. However, the level-of-detail (LOD) is dependent on the resolution of the digital images and high-resolution conventional images produce very large files that are difficult to manage and 522 S. Styliani et al. / Journal of Cultural Heritage 10 (2009) 520–528 to transmit across networks because of their dependence on bandwidth availability (slow Internet connections). A strategy adopted to confront this problem is the image servers that use a “Russian doll” imaging architecture and give the user scalability and interactivity opportunities, because multiple resolutions of an image are stored in a single file and make it possible to progressively transmit an image. FlashPix and then JPEG2000 are the two image formats that introduced a new concept for imaging architecture [30]. Metadata storing is also allowed. This image format is used by various museums such [31–34]. Some of the FlashPix features are adopted by the JPEG2000 image format that also has the potential of progressive image transmission and scalability and some new features that fill the gaps for the inclusion of metadata and the protection of the content [35] of earlier standards for encoding digital media. The advantages of the image format have been extensively investigated in research work [36,37] and the JPEG2000 format has been adopted by cultural institutions [38–40]. 3.2. Web3D exhibitions Internet technologies have the tremendous potential of offering virtual visitors ubiquitous access via the WWW to a virtual museum environment. Additionally, the increased efficiency of Internet connections (i.e. ADSL) makes it possible to transmit significant media files relating to the artefacts of virtual museum exhibitions. The most popular technology for the WWW visualisation includes Web3D which offers tools such as VRML and X3D, which can be used for the creation of an interactive virtual museum. The Web3D consortium [41] contains open standards for real-time 3D communication and the most important standards include: VRML97 and X3D and are presented below. Many museum applications based on VRML have been developed for the web [12,42]. As from 4 April 1997, VRML97 has stood for Virtual Reality Modeling Language. Technically speaking, VRML is neither VR, nor a modelling language, but a 3D interchange format which defines most of the commonly used semantics found in today’s 3D applications such as hierarchical transformations, light sources, viewpoints, geometry, animation, fog, material properties, and texture mapping. Another definition states that VRML serves as a simple, multiplatform language for publishing 3D Web pages as well as for providing the necessary technology to integrate three dimensions, two dimensions, text, and multimedia into a coherent model. “When these media types are combined with scripting languages and Internet capabilities, an entirely new genre of interactive applications is possible” [43]. This is due to the fact that some information is best experienced in threedimensional form, such as the information of virtual museums [11,9]. However, VRML can be excessively labour-intensive, time consuming and expensive. QuickTime VR (QTVR) and panoramas that allow animation and provide dynamic and continuous 360◦ views might represent an alternative solution for museums such as in [44]. As with VRML, the image allows panning and high-quality zooming. 
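The progressive, multi-resolution ("Russian doll") storage described above can be illustrated in a few lines of Python. The sketch below assumes the Pillow imaging library and, for simplicity, writes one JPEG per level rather than a single packaged file as FlashPix or JPEG2000 would; it only conveys the principle of precomputing coarser versions of an exhibit image for progressive delivery.

```python
from PIL import Image  # Pillow is an assumption made for this illustration

def build_pyramid(src_path: str, levels: int = 5) -> list[str]:
    """Write progressively smaller copies of one exhibit image, halving each level."""
    image = Image.open(src_path).convert("RGB")
    paths = []
    for level in range(levels):
        scale = 2 ** level                      # level 0 = full resolution
        w = max(1, image.width // scale)
        h = max(1, image.height // scale)
        out_path = f"{src_path}.level{level}.jpg"
        image.resize((w, h)).save(out_path, "JPEG", quality=85)
        paths.append(out_path)
    return paths
```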
Furthermore, hotspots that connect the QTVR images and panoramas with other files can be added [45]. In contrast, X3D is an open-standards XML-enabled 3D file format offering real-time communication of 3D data across all applications and network applications. Although X3D is sometimes considered an Application Programming Interface (API) or a file format for geometry interchange, its main characteristic is that it combines both geometry and runtime behavioral descriptions into a single file. Moreover, X3D is considered to be the next revision of the VRML97 ISO specification, incorporating the latest advances in commercial graphics hardware features, as well as improvements based on years of feedback from the VRML97 development community. For a virtual museum presenting virtual exhibitions, the visualization usually consists of dynamic Web pages with embedded 3D VRML models [9]. This can be enhanced with other multimedia information (i.e. movie clips, sound) and used remotely over web protocols (i.e. HTTP). A more recent 3D graphics format is COLLAborative Design Activity (COLLADA) [46], which defines an open-standard XML schema for exchanging digital assets among various graphics software applications that might otherwise store their assets in incompatible formats. One of the main advantages of COLLADA is that it includes more advanced physics functionality such as collision detection and friction (which Web3D does not support). Moreover, more powerful technologies that have been used in museum environments include OpenSceneGraph (OSG) [47] and a variety of 3D game engines [48,49]. OSG is an open source, multi-platform, high-performance 3D graphics toolkit, used by museums [50,51] to generate more powerful VR applications, especially in terms of immersion and interactivity, since it supports text, video, audio and 3D scenes within a single 3D environment. On the other hand, 3D game engines are also very powerful and provide superior visualization and physics support. Serious gaming is a newer concept that allows for the collaborative use of 3D spaces for learning and educational purposes in a number of domains. The main strengths of serious gaming applications can be generalised as lying in the areas of communication, visual expression of information, collaboration mechanisms, interactivity and entertainment. Compared to VRML and X3D, both technologies (OSG and 3D game engines) can provide very realistic and immersive museum environments, but they have two main drawbacks. First, they require advanced programming skills in order to design and implement custom applications. Secondly, they do not have support for mobile devices such as PDAs and 3G phones. 3.3. Virtual reality exhibitions VR is a simulation of a real or imaginary environment generated in 3D by digital technologies that is experienced visually and provides the illusion of reality. Over the past few years, modeling software has become affordable and the cost of building virtual environments has fallen considerably, thus fuelling new application domains such as virtual heritage. For example, low-cost and highly interactive VR experiences for museum visitors can be created on the basis of standard hardware components (a relatively low-cost PC with a cheap graphics accelerator, a touch screen and a sensor device, e.g. an inertia cube), some application software and suitable browser plug-ins.
VR applications can be used by distributed groups of large numbers of players, and are immersive and interactive. In a VR environment participants get immersed into a completely artificial world but there are various types of VR systems, which provide different levels of immersion and interaction. Heim believes that weak VR can be characterized by the appearance of a 3D environment on a 2D screen [52]. In contrast, strong VR is the total sensory immersion, which includes immersion displays, tracking and sensing technologies. Common visualization displays include head-mounted displays and 3D polarizing stereoscopic glasses while inertia and magnetic trackers are the most popular positional and orientation devices. As far as sensing is considered, 3D mouse and gloves can be used to create a feeling of control of an actual space. An example of a high immersion VR environment is Kivotos, a VR environment that uses the CAVE® system, in a room of 3 meters by 3 meters, where the walls and the floor act as projection screens and in which visitors take off on a journey thanks to stereoscopic 3D glasses [53]. As mentioned earlier, virtual exhibitions can be visualized in the Web browser in the form of 3D galleries, but they can also be used as a stand-alone interface (i.e. not within the web browser). In addition, a number of commercial VR software tools and libraries exist, such as Cortona [54], which can be used to generate fast and effectively virtual museum environments. However, the cost of creating and storing the content (i.e. 3D galleries) is considerably high for the medium and small sized museums that represent the majority of cultural heritage institutions. An overview of the tools and methods available to visitors visualizing a virtual museum has been already carried out [55]. 3.4. Augmented reality exhibitions In addition to the VR exhibitions, museum visitors can enjoy an enhanced experience by visualizing, interacting and navigating into museum collections (i.e. artifacts), or even by creating museum galleries in an AR environment. The virtual visitors can position virtual artifacts anywhere in the real environment by using either sophisticated software methods (i.e. computer vision techniques) or specialized tracking devices (i.e. InertiaCube). Although the AR exhibition is harder to achieve, it offers more advantages to museum visitors as compared to Web3D and VR exhibitions. Specifically, in an AR museum exhibition, virtual information (usually 3D objects but it can also be any type of multimedia information, such as textual or pictorial information) is overplayed upon video frames captured by a camera, giving users an impression that the virtual cultural artifacts actually exist in the real environment. Through human–computer interaction techniques users can examine thoroughly the virtual artifacts through tactile manipulation of fiducials (i.e. markers) or sensor devices (i.e. pinch-gloves). This ‘augmentation’ of the real-world environment can lead to an intuitive access to the museum information and enhance the impact of the museum exhibition on virtual visitors. One of the earliest examples of an interactive virtual exhibition is an automated tour guide system that uses AR techniques [56]. It can superimpose meaningful audio on the real world on the basis of the location of the user, offering the advantage of enriching visitors’ experiences. 
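The behaviour of such a location-aware guide is straightforward to sketch: track the visitor's position, find the closest exhibit, and trigger its audio when the visitor comes within range. The Python fragment below is a toy illustration of that rule, not the cited system; the exhibit coordinates, trigger radius and clip names are invented for the example.

```python
import math

# Hypothetical example data: exhibit positions in metres and an audio clip each.
EXHIBITS = {
    "amphora": ((2.0, 1.5), "amphora_intro.ogg"),
    "mosaic":  ((6.5, 3.0), "mosaic_intro.ogg"),
}
TRIGGER_RADIUS_M = 2.0

def clip_for_position(x: float, y: float):
    """Return the audio clip of the nearest exhibit within range, or None."""
    best_clip, best_dist = None, float("inf")
    for (ex, ey), clip in EXHIBITS.values():
        dist = math.hypot(x - ex, y - ey)
        if dist < best_dist:
            best_clip, best_dist = clip, dist
    return best_clip if best_dist <= TRIGGER_RADIUS_M else None
```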
Also, the Meta-Museum guide system [57] is based on AR and artificial intelligence technologies and provides a communication environment between the real world and cyberspace to maximize the utilization of a museum’s archives and knowledge base. Furthermore, AR has been experimentally applied to make it possible to visualise incomplete or broken real objects as they were in their original state by superimposition of the missing parts [10]. Finally, the ARCO system [23,11] provides customised tools for virtual museum environments, ranging from the digitisation of museum collections to the tangible visualization of both museum galleries and artifacts. ARCO developed tangible interfaces that allow museum visitors to visualise virtual museums in Web3D, VR and AR environments sequentially. A major benefit of an AR-based interface resides in the fact that carefully designed applications can themselves provide novel and intuitive interaction without the need for expensive input devices. 3.5. Mixed reality exhibitions Finally, mixed reality (MR) relies on a combination of VR, AR and the real environment. According to Milgram and Kishino’s virtuality continuum, real-world and virtual-world objects are presented together on a single display [58] with visual representation of real and virtual space [59]. An example of the use of MR techniques in a museum environment is the Situating Hybrid Assemblies in Public Environments (SHAPE) project [60], which uses hybrid reality technology to enhance users’ social experience and learning in museum and other exhibition environments, with regard to cultural artifacts and to their related contexts. It proposes the use of a sophisticated device called the periscope (now called the Augurscope), a portable mixed reality interface, inside museum environments to support visitors’ interaction and visualisation of artifacts. 3.6. Haptics ‘Haptics, from the Greek word ‘haptein’, involves the modality of touch and the sensation of shape and texture which an observer feels when exploring a virtual object’ [61]. Haptics makes it possible to extend visual displays and render them more realistic, useful and engaging for visitors. One of the most characteristic museum applications using haptics is at the University of Southern California’s Interactive Art Museum [62]. In this case, the PHANToM device was used within a museum, allowing visitors to touch and feel virtual artifacts [63]. PHANToM is a desk-grounded robot that allows simulation of single fingertip contact with virtual objects through a pointing device (i.e. stylus). In addition, its actuators communicate forces back to the user’s fingertips as it detects collisions with virtual objects, simulating the sense of touch. Another application is the ‘Museum of Pure Form’, a VR system where users can interact, through the senses of touch and sight, with digital models of 3D art forms and sculptures. Its aim was to change the way normal users perceive sculptures, statues or, more generally, any type of 3D artwork [64]. Two different presentations of this application were developed, including a system placed inside several museums and art galleries around Europe as well as a system placed inside a CAVE™ environment [27]. 3.7. Use of handheld devices in museums Handheld devices represent a wide range, including cell phones, personal digital assistants (PDAs) and tablets.
Improvements during the past few years in optics, processing power and ergonomics have initiated a number of museum-based applications. A prototype application is the City co-visiting system, which combines VR, hypermedia technology, handheld devices and ultrasound tracking technology to allow three visitors, one on-site and two remote, to visit a gallery together [65]. A location-aware PDA is used by the on-site visitor to display the ongoing positions of all three visitors on a map of the gallery, while the two off-site visitors use two different environments: a web-only environment and a VE. The application also supports web-based multimedia information for the off-site visitors that is dynamically presented upon movement across the map. The San Francisco Museum of Modern Art (SFMoMA) has also presented work from its permanent collection on iPAQ handhelds [66]. Furthermore, the giCentre at City University is exploring location-based services (LBS) through the use of mobile computing, including the use of third-generation (3G) phones and PDAs [67]. Users can interact with the virtual artifacts using either the menu interface or the stylus. In addition, using external sensors (i.e. inertia cube, accelerometers and digital compass), museum visitors can perceive virtual information about the artifacts in relation to their location inside the museum. 4. Real and virtual museum According to the definition of the International Council of Museums (ICOM) about museums [68]: “A museum is a non-profit making, permanent institution in the service of society and of its development, and open to the public, which acquires, conserves, researches, communicates and exhibits, for purposes of study, education and enjoyment, material evidence of people and their environment.” Virtual museums enjoy the same functions of acquisition, storage, documentation, research, exhibition and communication as the ‘brick and mortar’ museums, as set out by the above definition. They can, in addition, act in a complementary and auxiliary manner. A virtual museum website can provide worldwide publicity. Research has revealed that 70% of people visiting a museum website would subsequently be more likely to go and visit the ‘real’ museum [69]. Museum curators can digitally preserve the artifacts of their collections. The effective safeguarding of cultural artifacts can be achieved through the use of technological advances, by means of the comparison of different images across time to monitor their conservation. Furthermore, virtual museums provide the means to create digital representations of cultural artifacts and database technologies with which multimedia information about the virtual museum artifacts can be stored and retrieved whenever needed. The digitized information can be re-used in a variety of ways, for different purposes and probably even by other cultural institutions. Additionally, virtual museums allow museum curators to experiment with various arrangements of 3D objects inside the gallery, to test different designs before deciding on the presentation style of a temporary exhibition. They create and disseminate to the wider public virtual models of cultural artifacts that combine archaeological accuracy and reliability with aesthetic pleasure. Finally, they visualize the digital representation of the cultural objects via VR and AR interfaces, so as to make available to the wider audience more realistic and appealing virtual museum exhibitions that can be interactively and easily explored.
In addition to this, they can overcome limitations of space in respect of the number of objects accessible in the real museum [70]. The WWW is widely used by museums for putting their collections online [71], not only because it is very popular (especially among young people), but also because, in the hands of museum curators, it is a powerful communication tool that can deliver information about the museum to potential virtual visitors in a fast, user-friendly and low-cost way, and it provides museum curators with a great variety of opportunities in terms of museum data dissemination. As has already been mentioned, virtual museums, through innovative technologies, provide unrestricted round-the-clock access to their visitors through the WWW. Virtual museums can provide access from any place and to anyone, including people with special needs (visual, acoustic, speech and motor disabilities and learning difficulties). The UN Convention on the Rights of Persons with Disabilities [72], the Americans with Disabilities Act of 1990 (ADA) [73] and the Disability Discrimination Act (DDA) in the UK state that disabled people have equal rights of ‘access to goods, facilities and services’ [74]. It is therefore the responsibility of cultural institutions, such as museums, to find ways of providing access to their exhibitions for people with disabilities. Digital museums take into account the need, emphasized by the Resource Disability Action Plan formed by the Council of Museums, Archives and Libraries, for efficient ways of using new technologies that allow access to museum exhibitions for all end-user groups, including virtual access for disabled people [75], using AR interfaces designed to operate on off-the-shelf computer systems [9]. The cultural artifacts that are exhibited in the physical environment of a museum are usually shown in display cases, where only a limited amount of information about them is available. In virtual museum exhibitions, museum artifacts can be digitized and visualized in a virtual interactive environment. A virtual exhibit can contain information that a physical exhibit in a museum showcase cannot. Thus, museum curators are given the opportunity to offer a more rewarding experience thanks to rich contextual multimedia information about the objects, in comparison to artifacts that are locked in a museum glass case with a simple description on a card. In these virtual exhibitions, users may explore exhibits in an interactive and more flexible way. Virtual museum exhibitions allow virtual visitors to observe and examine an object from all angles. AR exhibitions can also involve physical interfaces (i.e. marker-cards), which are used as the link between real and virtual worlds. Physical interfaces allow museum visitors to pick up and manipulate virtual cultural objects and examine them within the display system in their hands (i.e. flat screen) [9]. Additionally, a virtual museum gives the user control of the virtual tour, because it may provide 3D views of a museum and a floor plan. Virtual visitors can orient themselves, know in which room of the virtual exhibition the exhibits are found and to which group of exhibits an object belongs.
The exhibits themselves can convey their meaning when they are examined in conjunction with the other exhibits of the room and through a narrative that connects the objects and their context and ‘brings to life the potential dynamism of objects and their stories’ [76]. The communities targeted by virtual museums are the museum curators and the end-users. The second category can be divided into three subcategories: the specialists, the students and the tourists [77,78]. Virtual museum exhibitions can contain a great amount and depth of information, meant to broaden perspectives, satisfy needs and encourage a deeper understanding among virtual visitors of any of the above profiles. They can fulfil the need for ‘basic and distinguishing information’ of simple tourists [79], who do not need any additional help in deciphering the concepts and the ideas behind museum objects [80, p. 210]. Virtual museums are also capable of providing information to a degree of detail that is sufficient for various kinds of visitors [72], while they may also assist the specialised research needs, including the comparative study requirements, of specialists and students, by providing access not only to one but to multiple museum collections. Furthermore, creative websites may attract audiences that ‘would not normally use libraries or museums’ [81] and do not have prior knowledge of or interest in the subject of the museum exhibition [82]. The visitors of virtual museum exhibitions are not passive, nor do they lack opportunities to develop their critical skills. A virtual museum can provide visitors with the freedom to explore, to exercise autonomy and to be active participants as they create their own virtual tour and paths. Additionally, the digital tools provided are used as cognitive technologies that help the virtual visitors transcend the limitations of the human mind, such as memory or problem solving limitations [83], and construct their own knowledge. A representative example of the above is the ability provided to virtual museum visitors to create a personal online exhibition of digitized material, a ‘gallery’ that corresponds to their interests and that they can share with others [33]. In a virtual museum environment, there are more learning opportunities via educational games than in a physical museum [84,85], as cited by [86]. Most virtual museums have been designed by taking into account the constructivist principles of learning through construction and learning through play [87,88], and they involve interaction, experiencing and learning at the same time. In a virtual museum environment, the visitor is not an observer but interacts with the learning objects and constructs the knowledge her/himself. Museum visitors use and interact with the virtual museum environment via a constructive dialogue that provides them with access to thematic information and explanations about the museum objects’ context with the level of information and the amount of detail they prefer [89]. Learning is an active process and the end-users are engaged in hands-on involvement in an engaging experience that enhances understanding, fosters fruitful learning interactions, awakens and keeps interest alive and enriches aesthetic sensitivities. Most of the time, virtual visitors do not want to ‘learn something’ but rather to engage in an ‘experience of learning’ or ‘learning for fun’ that can be ‘important and enjoyable in its own right’ [90]. 5.
Problems and implications confronted New technologies provide new possibilities and impose new restrictions [91]. Despite significant advantages, a virtual museum also presents drawbacks. The forms these take will now be examined. ‘VR’ (an oxymoron) cannot have the complexity of the real objects. The term ‘virtual’ corresponds to the Greek dynaton (gr. δυνατóν = possible), meaning ‘that which exists in potential’ (Aristotle, Analitici primi): the virtual museum exists in potential form and not in reality [92]. The problem is that the advanced graphic systems used for computer reconstructions adopted by virtual museums may sometimes be too realistic. They are based on partial evidence, but they suggest an impression of good knowledge of the past. Sometimes advanced graphic systems present the ‘image’ as true, giving a sense of misleading accuracy [93,94]. When the reconstructed item has many missing elements then – obviously – scientists must use their imagination or rely on ethno-historical information about how similar cases might have looked, in order to reconstruct it. However, in these cases, the result will not be an explanation of the past, but a personal and subjective way of seeing it. A good ‘image’ can give the viewer the impression that museologists know more than they actually do. Some products of computer reconstructions are considered scientifically accurate simply because they seem to be accurate. The term “user” is used for virtual museum visitors because, in order to retrieve information on virtual exhibits, computer skills are required [86]. This means that the computer illiterate are automatically excluded, and many visitors encounter difficulties with understanding the use of plug-ins and other applications that need to be downloaded from the Internet and installed in order to retrieve information from sophisticated virtual museum exhibitions. The idea of the ambiguity between reality and virtuality can first be traced to the Metaphor of the Cave in Plato’s Republic, where people take as real a fact that is an illusion [95]. Prisoners that have been chained and held immobile can only look at a wall in front of them. Behind them there is a fire, and between them and the fire there is a walkway with shadows of moving things and creatures. So they consider the shadows and the echoes as the only ‘reality’ and the reflections of objects more important than the objects themselves. When it comes to building virtual reconstructions, even if there is a degree of accuracy, the one-sided view of the reconstructed site is still wrong. Computer reconstructions that offer only one aspect of the subject they examine and do not provide any alternative reconstructions contradict the fact that there are many ways to examine the Past. In virtual reconstructions there is only one aspect of the subject that has been reconstructed and no alternative reconstructions have been created. Some high-quality and sophisticated virtual museums involve collaborations between museologists and computer experts. In such cases, communication problems often arise between those with theoretical knowledge in museology and those with practical knowledge of computers. In most circumstances, the software used by virtual museums is not accessible to museologists and computer scientists stand between them and the data. In some cases, it is probable that the Past is both misinterpreted and misrepresented.
The visualization results are impressive, thus fulfilling the primary goal of general public consumption, but without, in turn, serving the museum's goals. Virtual museums may provide users with fragmented museum-related information items that often bear no obvious relation to each other or to a useful context. In addition to this, some virtual museums suffer from the lack of clearly identified purposes. Their design must be carried out according to their raison d’être and the information provided must be organized in order to construct a narrative [96]. A virtual museum has to define its target community/ies, its aims, its content and how this will be structured and delivered. Throughout all the creation phases of the virtual museum, evaluation studies that involve real users must be undertaken, in order to identify the parts of the program that need further improvement [97]. 6. Conclusions In this paper, the various types of virtual museums have been discussed in the light of a range of classifications. With the use of imaging technology, Web3D, VR, AR, MR, haptics and handheld devices such as PDAs, museums can exploit all the possibilities of the new media, analyze and respond in various ways to visitors’ needs, enable an intuitive interaction with the displayed content and provide an entertaining and educational experience. The benefits of virtual museums are noteworthy as far as museum curators are concerned, in terms of documentation, conservation, research and exhibition. Virtual museums have the potential to both preserve and disseminate cultural information effectively and at a low cost through innovative methods and tools. They are an engaging medium with great appeal to a variety of groups of visitors and can promote the ‘real sites’ by providing information about museum exhibitions and by offering an enhanced display of museum artifacts through emerging technologies. Various groups of end-users such as tourists, students and specialists can take advantage of them and satisfy their learning and entertainment needs. The visit of a virtual museum can be an enjoyable and productive experience that draws the user into involvement and participation and helps the promotion of real museums [98]. Virtual museums enrich the museum experience by allowing an intuitive interaction with the virtual museum artifacts. A comparison between real and virtual museums indicates that there are still important issues for virtual museums to solve. Good collaboration must be ensured between cultural heritage specialists (museum curators, historians, archaeologists, etc.) and information science specialists to achieve optimal results, in order to avoid dependence on market-produced software and to promote open-source software that may be produced with the aid of cultural heritage specialists. Virtual museums cannot and do not intend to replace the walled museums. They can be characterised as ‘digital reflections’ of physical museums that do not exist per se, but act complementarily to become an extension of the physical museum's exhibition halls and the ubiquitous vehicle of the ideas, concepts and ‘messages’ of the real museum. Their primary aim is (or should be) to investigate and propose models for the exploration of the real purpose and conceptual orientation of a museum. References [1] R. Silverstone, The medium is the museum, in: R. Miles, L. Zavala (Eds.), Towards the Museum of the Future, Routledge, London/New York, 1994, pp. 161–176. [2] J. Jones, M.
Christal, The Future of Virtual Museums: On-Line, Immersive, 3D Environments, Created Realities Group, 2002. [3] G. Scali, M. Segbert, B. Morganti, Multimedia applications for innovation in cultural heritage, in: Proceedings of 68th IFLA Council and General Conference, August 2002, Glasgow, U.K., 2002. [4] ORION Report on Scientific/Technological Trends and Platforms, available at: http://www.orion-net.org. [5] D. Pletinckx, D. Callebaut, A. Killebrew, N. Silberman, Virtual-reality heritage presentation at Ename, IEEE Multimedia 7 (2) (2000) 45–48. [6] M. Roussou, Immersive interactive virtual reality in the museum, in: Proceedings of TiLE, June 2001, London, U.K, 2001. [7] R. Wolciechowski, K. Walczak, M. White, W. Cellary, Building Virtual and Augmented Reality Museum Exhibitions, in: Proceedings of the 9th Int. Conference on 3D Web Technology, California, USA, April 2004, ACM SIGGRAPH, 2004, pp. 135–144. [8] A. Brogni, C.A. Avizzano, C. Evangelista, M. Bergamasco, Technological approach for cultural heritage: augmented reality, in: Proceedings of the RO-MAN 99 Conference, Pisa, Italy, September 1999, 1999, pp. 206–212. [9] F. Liarokapis, S. Sylaiou, A. Basu, N. Mourkoussis, M. White, P.F. Lister, An interactive visualisation interface for virtual museums, in: K. Cain, Y. Chrysanthou, F. Niccolucci, N. Silberman (Eds.), Proceedings of the VAST 2004 Conference, Belgium, EPOCH Publication, Belgium, 2004, pp. 47–56. [10] F. Liarokapis, M. White, Augmented reality techniques for museum environments, The Mediterranean Journal of Computers and Networks 1 (2) (2005) 90–96. [11] M. White, N. Mourkoussis, J. Darcy, P. Petridis, F. Liarokapis, P.F. Lister, K. Walczak, R. Wolciechowski, W. Cellary, J. Chmielewski, M. Stawniak, W. Wiza, M. Patel, J. Stevenson, J. Manley, F. Giorgini, P. Sayd, F. Gaspard, ARCO—An Architecture for digitization, management and presentation of virtual exhibitions, in: Proceedings of the CGI’2004 Conference, Hersonissos, Crete, June 2004, Los Alamitos, California: IEEE Computer Society, 2004, pp. 622–625. [12] P.A.S. Sinclair, K. Martinez, D.E. Millard, M.J. Weal, Augmented reality as an interface to adaptive hypermedia systems, New Review of Hypermedia and Multimedia, Special Issue on Hypermedia beyond the Desktop 9 (1) (2003) 117–136. [13] P. Patias, Y. Chrysanthou, S. Sylaiou, H. Georgiadis, S. Stylianidis, The development of an e-museum for contemporary arts, in: Proceedings of the VSMM Conference on Virtual Systems and Multimedia dedicated to Cultural Heritage 2008, 20–25 October, Nicosia, Cyprus, 2008. [14] S.M. Pearce, Thinking about things. Approaches to the study of artifacts, Museum Journal (1986) 198–201. [15] W.E. Washburn, Collecting information, not objects, Museum News 62 (3) (1984) 5–15. [16] G. McDonald, S. Alsford, The museum as information utility, Museum Management and Curatorship 10 (1991) 305–311. [17] S. Alsford, Museums as hypermedia: Interacting on a museum-wide scale, in: D. Bearman (Ed.), Proceedings of the ICHIM ‘91 Conference, Pittsburgh, Pennsylvania, USA, October 1991, 1991, pp. 7–16. [18] Information Today, December 2005, pp. 31–34, available at: http://www.infotoday.com. [19] A. Malraux, Le Musée imaginaire, Gallimard, Paris, 1996 [orig. 1947]. [20] Encyclopaedia Britannica online, available at: http://www.britannica.com/eb/article-9000232. [21] W.
Schweibenz, The virtual museum: new perspectives for museums to present objects and information using the Internet as a knowledge base and communication system, in: H. Zimmermann, H. Schramm (Eds.), Proceedings of the 6th ISI Conference, Prague, November 1998, Konstanz, UKV, 1991, pp. 185–200. [22] 3D Van Gogh, Museum Virtual Tour’, available at: http://www3. vangoghmuseum.nl/vgm/index.jsp?page=49335&lang=en. [23] ARCO, ARCO (Augmented Representation of Cultural Objects) Consortium. Available at: http://www.arco-web.org. [24] ICOM News, no. 3, 2004, available at: http://icom.museum/pdf/ E news2004/p3 2004-3.pdf. [25] L. Teather, A museum is a museum. Or is it?: Exploring museology and the web, in: D. Bearman, J. Trant (Eds.), Proceedings of the Conference Museums and the Web, 1998, Pittsburgh, 1998. [26] M. McDonald, The Museum and the Web: Three Case Studies, available at : http://xroads.virginia.edu/∼MA05/macdonald/museums/method.html. [27] M. Bergamasco, A. Frisoli, F. Barbagli, Haptics Technologies and Cultural Heritage Applications, in: S. Kawada (Ed.), IEEE Proceedings of the CA Conference 2002, Geneva, Switzerland, June 2002, IEEE Computer Society Press, 2002, pp. 25–32. [28] S. Worden, Thinking critically about virtual museums, in: D. Bearman, J. Trant (Eds.), Proceedings of the Conference Museums and the Web, 1997, Pittsburgh, 1997, pp. 93–109. [29] L.T.W. Hin, R. Subramaniam, A.K. Aggarwal, Virtual Science Centers: A new genre of learning in web-based promotion of science education, in: Proceedings of the 36th Annual HICSS’03 Conference, IEEE Computer Society, 2003, pp. 156–165. [30] O. Georgoula, P. Patias, Visualization tools using FlashPix image format, in: A. Gruen, S. Murai, J. Niederoest, F. Remondino (Eds.), Proceedings of the International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XXXIV, PART 5/W10, February 2003, 2003. [31] San Francisco Fine Arts Museums, http://www.thinker.org/. [32] N. Talagala, S. Asami, D. Patterson, B. Futernick, D. Hart, The Berkeley-San Francisco Fine Arts image database, in: B. Kobler, P.C. Hariharan (Eds.), Proceedings of the 15th IEEE Symposium on Mass Storage Systems, Maryland, USA, March 1998, 1998, available at: http://romulus.gsfc.nasa.gov/msst/conf1998/B5 06/TALAGA.pdf. [33] Metropolitan Museum of Art in New York, available at: http://www. metmuseum.org/. [34] D. Marshak, J. Paul Getty Museum Re-Architects Technology to Enhance Visitors Experience, available at: http://www.sun.com/productsn-solutions/edu/casestudy/pdf/getty museum.pdf. [35] Virtual Display Case, Making Museum Image Assets Safely Visible, 3rd ed., available at: http://www.chin.gc.ca/English/Intellectual Property/Virtual Display Case/. [36] A.N. Skodras, C.A. Christopoulos, T. Ebrahimi, The JPEG2000 still image compression standard, IEEE Signal Processing Magazine 18 (5) (2001) 36–58. [37] S. Sylaiou, P. Patias, O. Georgoula, L. Sechidis, Digital image formats suitable for museum publications, in: Proceedings of the 2nd International Museology Conference in Technology for the Cultural Heritage, Lesvos, Greece, June 2004, 2004. [38] Charles Olson’s Melville Project digital library, available at: http://charlesolson.uconn.edu/Works in the Collection/Melville Project/index.htm. [39] National Archives of Japan, available at http://jpimg.digital.archives. go.jp/kouseisai/word/abc.html. [40] Digital Collections of the University of South Carolina Libraries, available at: http://www.sc.edu/library/digital/. 
[41] Web3D consortium, available at: http://www.web3d.org (accessed at 6/08/2007). [42] S. Goodall, P.H. Lewis, K. Martinez, P. Sinclair, M. Addis, C. Lahanier, J. Stevenson, Knowledge-Based Exploration of Multimedia Museum Collections, in: Proceedings of the EWIMT Conference, November 2004, London, U.K., 2004. [43] VRML, The Annotated VRML 97 Reference, available at: http://accad.osu. edu/∼pgerstma/class/vnv/resources/info/AnnotatedVrmlRef/ch1.htm. [44] Benaki Museum, available at: http://www.benaki.gr. [45] Rembrandt House Museum, available at: http://www. rembrandthuis.nl/cms pages/index main.html. [46] COLLADA – Digital Asset Schema Release 1.5.0, http://www.khronos. org/files/collada spec 1 5.pdf. [47] OpenSceneGraph, 2009. Available at: http://www.openscenegraph.org/ projects/osg. [48] QuakeDev, 2009. Available at: http://www.quakedev.com/. [49] Second Life, 2009. Available at: http://secondlife.com/. [50] J. Looser, R. Grasset, H. Seichter, M. Billinghurst, 2006. OSGART – A pragmatic approach to MR, In ISMAR 2006. [51] L. Calori, C. Camporesi, M. Forte, A. Guidazzoli, S. Pescarin, Openheritage: integrated approach to web 3D publication of virtual landscape, in: Proceedings of the ISPRS Working Group V/4 Workshop, 3D-ARCH 2005: Virtual Reconstruction and Visualization of Complex Architectures, Mestre-Venice, Italy, 22–24 August, 2005. [52] M. Heim, The Metaphysics of Virtual Reality, Oxford University Press, Oxford, 1993. [53] Foundation of the Hellenic World, available at: http://www.ime.gr. [54] Cortona, VRML Client – Web3D Products, available at: http://www. parallelgraphics.com/products/cortona/. [55] Y.M. Kwon, J.E. Hwang, T.S. Lee, M.J. Lee, J.K. Suhl, S.W. Ryul, Toward the synchronized experiences between real and virtual museum, in: Proceedings of Conference of APAN, January 2003, Japan, 2003. [56] B.B. Bederson, Audio augmented reality: a prototype automated tour guide, in: R. Mack, J. Miller, I. Katz, L. Marks (Eds.), Proceedings of the ACM Conference on CHI’95, Denver Colorado, USA, May 1995, ACM Press, New York, 2003. [57] K. Mase, R. Kadobayashi, R. Nakatsu, Meta-Museum: A Supportive Augmented-Reality Environment for Knowledge Sharing, in: Proceedings of the Conference VSMM ‘96, Japan, September 1996, IEEE Computer Society Press, 1996, pp. 107–110. [58] P. Milgram, F. Kishino, A Taxonomy of Mixed Reality Visual Displays, IEICE Transactions on Information and Systems, Special issue on Networked Reality, E77-D (12), (1994) 1321–1329. [59] C.E. Hughes, C.B. Stapleton, D.E. Hughes, E. Smith, Mixed reality in education, entertainment and training: An interdisciplinary approach, IEEE Computer Graphics and Applications 26 (6) (2005) 24–30. [60] T. Hall, L. Ciolfi, M. Fraser, S. Benford, J. Bowers, C. Greenhalgh, S. Hellstrom, S. Izadi, H. Schnadelbach, The visitor as virtual archaeologist: using mixed reality technology to enhance education and social interaction in the museum, in: S. Spencer (Ed.), Proceedings of the VAST 2001 Conference, Greece, November 2001, ACM Press, New York, 2001. [61] B. Baird, Using haptics and sound in a virtual gallery, in: M.A. Srinivasan (Ed.), Proceedings of the Fifth Annual PHANToM Users Group Workshop, October 2000, Aspen, Colorado, USA, 2000. [62] S. Brewster, The impact of haptic ‘Touching’ technology on cultural applications, in: J. Hemsley (Ed.), Proceedings of the EVA 2001 Conference, Glasgow, UK, July 2001, Academic Press, Vasari UK, 2001, pp. 1–14, s28. [63] M. McLaughlin, G. Sukhatme, J. Hespanha, C. Sharabi, A. Ortega, G. 
Medioni, The haptic museum, in: V. Cappellini, J. Hemsley (Eds.), Proceedings of the EVA 2000 Conference, Florence, Italy, March 2000, Pitagora Editrice Bologna, 2000. [64] M. Bergamasco, Le musée de formes purés, in: M. Bergamasco (Ed.), Proceedings of the EVA 2000 Conference, Proceedings of the 8th IEEE International Workshop on Robot and Human Interaction, RoMan ‘99, Pisa, Italy, September 1999, IEEE Computer Society, 1999, pp. 27–29. [65] A. Galani, M. Chalmers, B. Brown, I. McColl, C. Randell, A. Steed, Developing a mixed reality co-visiting experience for local and remote museum companions, in: J. Jacko, C. Stephanidis (Eds.), Proceedings of the 10th Conference of HCI, Crete, Greece, June 2003, Lawrence Erlbaum Associates, 2003, pp. 1143–1147. [66] San Francisco Museum of Modern Art, 2001, ‘Points of Departure’, available at: http://www.sfmoma.org/press/pressroom.asp?arch=y&id=117&do=events. [67] S. Sauer, S. Göbel, Focus your young visitors: kids innovation – fundamental changes in digital edutainment, in: M. Bergamasco (Ed.), Proceedings of the Conference Museums and the Web 2003, Charlotte, USA, Toronto, 2003, pp. 131–141. [68] Development of the Museum Definition according to ICOM Statutes (1946–2001), available at: http://icom.museum/hist def eng.html. [69] R.J. Loomis, S.M. Elias, M. Wells, 2003. Website availability and visitor motivation: An evaluation study for the Colorado Digitization Project. Unpublished Report. Fort Collins, CO: Colorado State University, available at: http://www.cdpheritage.org/resource/reports/loomis report.pdf. [70] S. Sylaiou, F. Liarokapis, L. Sechidis, P. Patias, O. Georgoula, Virtual museums, the first results of a survey on methods and tools, in: Proceedings of the CIPA and the ISPRS Conference, Torino, Italy, 2005, pp. 1138–1143. [71] Museums in the USA, available at: http://www.museumca.org/usa/alpha.html. [72] UN Convention on the Rights of Persons with Disabilities, http://www.ohchr.org/english/law/disabilities-convention.htm. [73] Americans with Disabilities Act of 1990 (ADA), http://www.usdoj.gov/crt/ada/adahom1.htm. [74] Disability Discrimination Act 1995, available at: http://www.disability.gov.uk/dda/. [75] Resource Disability Action Plan, available at: http://www.mla.gov.uk/documents/dap.pdf. [76] Research on ‘Quality’ in Online Experiences for Museum Users, available at: http://www.chin.gc.ca/English/Digital Content/Research Quality/about vmc.html. [77] S. Filippini-Fantoni, Museums with a personal touch, in: J. Hemsley, V. Cappellini, G. Stanke (Eds.), Proceedings of EVA 2003 Conference, University College London, July 2003, Vasari, UK, s25, 2003, pp. 1–10. [78] J.P. Bowen, S. Filippini-Fantoni, Personalization and the web from a museum perspective, in: D. Bearman, J. Trant (Eds.), Proceedings of the Conference Museums and the Web 2004, Arlington, Virginia, USA, April 2004, 2004, pp. 63–78. [79] F. Paternò, C. Mancini, Effective levels of adaptation to different types of users in interactive museum systems, Journal of the American Society for Information Science 51 (1) (2000) 5–13. [80] E. Hooper-Greenhill, Museums and the Shaping of Knowledge, Routledge, London, 1992. [81] M. McDonald, The Museum and The Web, Comparing the Virtual and the Physical Visits, available at: http://xroads.virginia.edu/∼ma05/macdonald/museums/virtual.pdf. [82] M.
Economou, The evaluation of museum multimedia applications: lessons from research, Museum Management and Curatorship 17 (2) (1998) 173–187. [83] R.D. Pea, Beyond amplification: Using the computer to reorganize mental functioning, Educational Psychologist 20 (4) (1985) 167–182. [84] J. Davallon, Une écriture éphémère : l’exposition face au multimedia, Degrés (92–93) (1998) 25–26. [85] M. Mokre, New technologies and established institutions, in: How Museum Present Themselves in the World Wide Web, Technisches Museum Wien, Austria, 1998. [86] R. Bernier, The uses of virtual museums: the French viewpoint, in: D. Bearman, J. Trant (Eds.), Proceedings of the Conference Museums and the Web 2002, Boston, USA, April 2002, 2002, available at: http://www.archimuse.com/mw2002/papers/bernier/bernier.html. [87] G. Hein, Constructivist Learning Theory, in: Proceedings of Developing Museum Exhibitions for Lifelong Learning Conference, ICOM/CECA, Israel, 1991, 1991, available at: http://www.exploratorium.edu/IFI/resources/constructivistlearning.html. [88] J.H. Falk, L.D. Dierking, Learning from Museums: Visitor Experiences and the Making of Meaning, Altamira Press, Walnut Creek, CA, 2000. [89] F. Liarokapis, S. Sylaiou, D. Mountain, Personalizing Virtual and Augmented Reality for Cultural Heritage Indoor and Outdoor Experiences, in: Proceedings of the 9th International Symposium on Virtual Reality, Archaeology and Cultural Heritage (VAST 08), Eurographics, Braga, Portugal, 2–5 December 2008, pp. 55–62. [90] J. Packer, Learning for fun: The unique contribution of educational leisure experiences, Curator: The Museum Journal 49 (3) (2006) 329–344. [91] S. Sylaiou, P. Patias, Virtual reconstructions in archaeology and some issues for consideration, IMEros, Journal for Culture and Technology (4) (2004) 180–191. [92] M. Forte, About virtual archaeology: disorders, cognitive interactions and virtuality, in: J. Barcelò, M. Forte, D.H. Sanders (Eds.), Virtual Reality in Archaeology, BAR International Series 843, Archaeopress, Oxford, 2000, pp. 247–259. [93] P. Miller, J. Richards, The good, the bad, and the downright misleading: archaeological adoption of computer visualization, in: J. Hugget, N. Ryan (Eds.), Proceedings of the CAA 1995 Conference, Oxford, U.K., Tempus Reparatum, BAR International Series 600, 1995, pp. 19–22. [94] N.S. Ryan, Computer based visualisation of the past: technical ‘realism’ and historical credibility, Imaging the Past, British Museum Occasional Paper 114 (1996) 95–108. [95] Plato: Republic. Barnes & Noble Books, 2004. [96] J. Bonnett, New technologies, new formalisms for historians: The 3D virtual buildings, Literary and Linguistic Computing 19 (3) (2004) 273–287. [97] S. Sylaiou, M. Economou, A. Karoulis, L.M. White, The evaluation of ARCO: a lesson in curatorial competence and intuition with new technology, ACM Computers in Entertainment 6 (2), ACM Press, New York, 2008. [98] R. Jackson, M. Bazley, D. Patten, M. King, Using the web to change the relation between a museum and its users, in: D. Bearman, J. Trant (Eds.), Proceedings of the Conference Museums and the Web 1998, Toronto, Canada, April 1998, Archives and Museum Informatics, Pittsburgh, 1998. Interactive Virtual and Augmented Reality Environments 165 8.13 Paper #13 Liarokapis, F., Anderson, E. Using Augmented Reality as a Medium to Assist Teaching in Higher Education, Proc. of the 31st Annual Conference of the European Association for Computer Graphics (Eurographics 2010), Education Program, Norrkoping, Sweden, 4-7 May, 9-16, 2010.
Contribution (90%): Implementation of the AR interface and collection of all the experimental data. Write-up of most of the paper. Interactive Virtual and Augmented Reality Environments 174 8.14 Paper #14 Anderson, E.F., Peters, C., Halloran, J., Every, P., Shuttleworth, J., Liarokapis, F., Lane, R., Richards, M. In at the Deep End: An Activity-Led Introduction to First Year Creative Computing, Computer Graphics Forum, Wiley-Blackwell, 31(6): 1852-1866, September, 2012. Contribution (10%): Collaboration on the teaching methods and write-up of the paper. DOI: 10.1111/j.1467-8659.2012.03066.x COMPUTER GRAPHICS forum Volume 31 (2012), number 6 pp. 1852–1866 In at the Deep End: An Activity-Led Introduction to First Year Creative Computing E. F. Anderson, C. E. Peters, J. Halloran, P. Every, J. Shuttleworth, F. Liarokapis, R. Lane and M. Richards Interactive Worlds ARG, Coventry University, UK eikea@siggraph.org Abstract Misconceptions about the nature of the computing disciplines pose a serious problem to university faculties that offer computing degrees, as students enrolling on their programmes may come to realise that their expectations are not met by reality. This frequently results in the students’ early disengagement from the subject of their degrees which in turn can lead to excessive ‘wastage’, that is, reduced retention. In this paper, we report on our academic group’s attempts within creative computing degrees at a UK university to counter these problems through the introduction of a 6 week long project that newly enrolled students embark on at the very beginning of their studies. This group project, involving the creation of a 3D etch-a-sketch-like computer graphics application with a hardware interface, provides a breadth-first, activity-led introduction to the students’ chosen academic discipline, aiming to increase student engagement while providing a stimulating learning experience with the overall goal to increase retention. We present the methods and results of two iterations of these projects in the 2009/2010 and 2010/2011 academic years, and conclude that the approach worked well for these cohorts, with students expressing increased interest in their chosen discipline, in addition to noticeable improvements in retention following the first year of the students’ studies. ACM CCS: [Computers and Education]: K.3.2 Computer and Information Science Education—Computer science education 1. Introduction When applying for a university degree in the computing disciplines [ACM06], relatively few potential students have a fully accurate conception of what their chosen degree may entail. Many may believe computing or computer science to be an extension of the use of office suites that they are familiar with from ICT (information and communication technologies) courses at school, confusing the degree programmes with basic computer literacy [BM05]. This problem appears to be exacerbated by current teaching practices in schools [Roy10]. In other words, school or college learners may not be aware of the differences between ICT, based on the use of computing technology, and Computer Science with its emphasis on problem solving and the production of solutions which often involve programming. As a result, many students are disappointed when they enrol at university and, to their dismay, discover their mistake. 
This is reflected in the observable decline of retention in computing programmes, and to remedy this, it has been suggested to modify degrees to become ‘more fun’ and to offer ‘multidisciplinary and crossdisciplinary programs’ [Car06] that will keep students interested in the subject. Unfortunately, retention problems are not restricted to traditional computing courses, but also extend to some of the multidisciplinary and cross-disciplinary degree programmes, such as creative computing degrees. Creative computing degrees are those degree programmes that expose students to the use of computing outside of the traditional desktop computing context. They include computing for the creative industries (see http://www.skillset.org) and also explore the creative use of computing itself, for example, in wireless sensor networks, embedded in consumer devices or as collections of services that augment our physical and digital environments. In these degrees, there is the potential to find a completely new set of misconceptions, where potential students confuse programmes such as multimedia computing, for example, with more vocational training courses for content creation software packages or web design. These applicants often demonstrate very strong expectations that their courses will predominantly feature artistic and creative content production topics, usually at the expense of more low-level technical topics such as mathematics, computer architectures or programming. Furthermore, the complexity of undergraduate computing degree programmes tends to be greatly underestimated. Once students become aware of this, they often disengage from the subject matter, often resulting in assessment failure or, in the worst cases, withdrawal from their degree programmes. Consequently, retaining computing students remains a serious problem, one possible solution for which is to deepen the students’ engagement with the subject. Following the adoption of a new pedagogic model by the Faculty of Engineering and Computing (EC), Coventry University (UK), the solution of the Creative Computing subject group to address this problem has been the development of an integrative, interdisciplinary learning experience, providing new students with a breadth-first introduction to their chosen academic discipline. Newly enrolled students embark on a subject-spanning group project dubbed the ‘Six Week Challenge’ (see http://vimeo.com/neophyte/pressplay), which encompasses the first 6 weeks of their first year at university, replacing the regular teaching schedule and combining various aspects of the courses that make up the first year of the creative computing degree programmes. This project, which is not formally assessed, aspires to confront students with a challenging and ambitious task requiring them to take on a proactive role in problem-solving and to use their own initiative if they want their ‘product’ to succeed. They are encouraged to ‘learn by doing’, assuming responsibility for their student experience in the process, aiming to engage them closely with the subject matter of their degree programme while improving cohort cohesion, engagement and retention.
In this paper, we first present (Section 2) the background details of Activity-Led Learning (ALL), which has been adopted as the educational methodology in the 6 week group project. In Section 3 we describe how the group project of building a 3D etch-a-sketch-like computer graphics application engages students with activities integrating software and hardware development with usability evaluation, viral marketing techniques and academic writing. We present the results of an evaluation of the methods, based on student surveys, in Section 4. and discuss implications of the results in Section 5. We conclude in Section 6 with insights gained and issues for further consideration. 2. Activity-Led Learning (ALL) One of the goals of higher education is to prepare students for life by enabling them to become independent learners. Independent learning does not come easy to students who have adapted to becoming passive participants in the learning process, where they are presented with all of the required learning material, a learning style that many of them acquired during their secondary education. The students of this ‘Plug&Play Generation’ [AM06, AM07] are sometimes described as suffering from shorter attention spans and impatience with the expectation to achieve quick and effortless results. However, ‘active involvement in learning helps the student to develop the skills of self-learning while at the same time contributing to a deeper, longer lasting knowledge of the theoretical material’ [MK02]. This is a key reason why our faculty has adopted ALL [WM08, IJP*08, PJB*10], a student-centred approach that has its roots in problem-based learning (PBL) [SBM04]. PBL is a constructivist instructional method [SD95] that provides a ‘complex mixture of a general teaching philosophy, learning objectives and goals’ [VB93]. 2.1. Advantages The problems that students are required to solve in PBL are usually much broader and more extensive than the relatively small, self-contained and well-defined exercises used in more traditional teaching sequences [BFG*00]. Furthermore, in PBL and similar approaches, such as ALL, educators take on the role of facilitator, guiding the students’ learning and monitoring their progress [HS04], which some studies on the subject have concluded may be superior in some aspects over more traditional teaching methods [VB93]. Such activity based educational approaches are supposed to work especially well in group projects, as they take advantage of group members’ distributed expertise by allowing the whole group to tackle problems that would normally be too difficult for individuals [HS04], including other students in mutually supporting roles, as well as tutors and faculty [AM93]. PBL has gained some acceptance as an effective approach within a variety of disciplines in higher education environments [YG96, Fel96, BFG*00]. This may be attributed to it providing an environment where the student is immersed, receiving guidance and support from fellow students and where the learning process is functional [Per92]. ALL and PBL not only lend themselves to the teaching of computing [BFG*00] and computer graphics [MGJ06], but the use of computer graphics itself offers the possibility of defining interesting PBL scenarios whilst also enabling collaborative or mediated learning activities that could lead to better learning [Tud92]. 
Learning occurs through multiple interactions within the learning environment [SD95, Cam96], and thus a potential added benefit of using computer graphics in combination with PBL scenarios is that learners engage with these using different senses, helping them to fully immerse themselves in the learning situation [Csi90], which could be expected to result in learning gains [CGSG04]. 2.2. Pitfalls This type of student-centred education is not without problems, however. It has been criticised due to the amount of guidance given to students [KSC06], relying on the use of ‘scaffolding’, that is, close guidance of the learner’s discovery, which others consider a simple improvement of a fundamentally ineffective approach [SKC07]. Finding an adequate balance for the amount of guidance given to students is one of the challenges of this type of educational approach [BBA09], as students might become too dependent on the provision of guidance, defeating one of the main aims of this type of approach, that is, to create independent problem solvers. It has been suggested that one precondition for the success of activity centred instruction is that participants need to already be highly motivated, well educated and possess some degree of base competency in the subject area before engaging in activities [Mer07], and that the success of PBL approaches may depend upon the ability of students to work together to identify and analyse problems and to generate solutions [Cam96]. The use of scaffolding is not universally seen as a negative, and it has been suggested that the idea that PBL implies a ‘minimally guided’ form of education is wrong [HSDC07]. Our experience has been that to be useful, it is far from minimally guided, but also that this does not imply that the students are encouraged to become dependent on constant guidance from staff. The staff time requirements, however, are significant in comparison to traditional teaching. 2.3. Creative computing at Coventry University The Creative Computing subject group of Coventry University’s EC faculty delivers degree courses which aim to produce graduates and computing professionals capable of working in environments where art and technology meet. Our courses have a strong Computer Science core, balanced with studies in design theory, game development, programming, graphics and content creation, pervasive and sensing technologies, usability and video and sound production. The teaching team strives to develop a strong interdisciplinary environment integrating content from these distinct domains. Computing curriculum recommendations state that ‘the breadth of the discipline should be taught early in the curriculum’ [Tuc96]. This is realised in a breadth-first computing curriculum, where students are exposed to the computing domain through a broad introduction to the major areas of Computer Science [VW00], allowing them to gain a more comprehensive understanding and appreciation of the discipline. They are able to gain ‘a holistic view of a topic before they learn about more complicated details’ [DG06] that empowers them. Important concepts are touched upon early on to provide students with the basis for a much larger range of activities than would be possible in more traditional/conservative teaching sequences.
This is because students experience the tasks that they embark upon in the wider context of the computing discipline, rather than as isolated subject matter. While to many students this may seem intimidating at first, it nevertheless tends to result in much deeper understanding. In line with a faculty driven move towards more activity-led teaching and learning, the Creative Computing subject group has developed a 6 week group project. The project aims to immerse first year students in an engaging activity designed to address some of their apprehensions, while introducing, in microcosm, the entire spread of topics in the first year curriculum. The design of the project is described in more detail next, in Section 3. 3. A Six Week Challenge—Learning by Doing First piloted at the start of the 2009/2010 academic year [SEA*10] (see also http://vimeo.com/neophyte/pressplay), the activity for our creative computing degrees, including Multimedia Computing and Games Technology pathways, integrates software and hardware development with usability evaluation, viral marketing techniques and academic writing. In its refined second iteration at the onset of the 2010/2011 academic year, the software development aspect focussed on computer graphics, resulting in the students’ creation of a computer graphics application with a physical hardware interface. Our creative computing degree programmes are heavily reliant on modern multimedia concepts and technologies. ‘Multimedia—while embracing computer graphics—describes the foray of other disciplines into the digital realm’ [Gon00] and through their projects our students not only ‘learn computer graphics’, but also ‘learn through computer graphics’, effectively making our students’ learning experience a hybrid of both aspects of teaching computer graphics in context [CC09]. The purpose of a Six Week Challenge is to allow students to evaluate the flavour of the course they are about to embark upon, addressing a number of issues in the orientation of new students whilst promoting high levels of engagement, which aim at both deep learning and increased retention. The activities were intended to be challenging and engaging without requiring assessment to monitor progress or encourage participation. Next, we describe our rationale for finding a suitable challenge (Section 3.1) and the details related to running one (Section 3.2). 3.1. Finding a suitable challenge To meet our goal of engaging the students with the creative computing discipline we had to face our own challenge of finding a suitable set of integrative activities for students. In the development of such activities it is important that they are meaningful to the student [Cun99], and appropriate for the intended student group, which in our case are absolute beginners embarking on their first steps in higher education. The activities designed for the 6 week group project would have to be related to the degree programmes of the students, complex enough to appear challenging, yet achievable within the set time-frame. At the same time the problem that ‘students . . . expect to see immediate (and spectacular) results, often before they have learned enough to achieve anything remotely spectacular’ [AM07] needs to be addressed by enabling the students to achieve results that appear ‘spectacular’.
We first delivered a Six Week Challenge in the 2009/2010 academic year, and did so again in the 2010/2011 academic year. The student cohorts, staff numbers and tasks set for both years were as follows: • The 2009/2010 cohort consisted of 56 students, with 6 faculty members and one graduate intern involved, of whom only 4 faculty members were actively delivering content. Students were tasked with developing a hardware controlled media player (see [SEA*10] for more details). • The 2010/2011 cohort consisted of 54 students supported by 6 faculty and 2 teaching assistants. The students were tasked with the development of a graphics application based on the popular Etch A Sketch® drawing toy by the Ohio Art Company (http://www.etch-a-sketch.com), the computer implementation of which would not only involve graphics, but would also provide an interesting exercise in user interface design and evaluation [Bux86]. To provide students with an additional challenge, we extended the basic concept of a 2D drawing toy to the third dimension: a 3D etch-a-sketch-like graphics application with turnable knobs as inputs for drawing on the three axes. In both the 2009/2010 and 2010/2011 challenges, Processing [RF06] (http://www.processing.org) was chosen as the development environment for the task. It is a Java-derivative language for computer arts creation, which lends itself well to introductory programming and computer graphics education [PBTF09] and also interfaces with the Arduino microcontroller [Sto09] (http://www.arduino.cc/) that we chose for the development of the hardware interface. The Arduino is an Open Hardware design that has been successfully employed as an educational tool [FW10], which allows the easy creation of input devices for computers. The kits we used were ideal for our purposes as they did not require any soldering, allowing the hardware to be simply slotted together. Table 1: A Six Week Challenge consists of six sub-challenges, or themes. Each theme adds a new element to the overall project and can be completed by students within a week. Week 1: 2D graphics programming (Section 3.2.1); Week 2: 3D graphics programming (Section 3.2.1); Week 3: Hardware design & interfacing (Section 3.2.2); Week 4: Usability evaluation (Section 3.2.3); Week 5: Viral marketing (Section 3.2.4.1); Week 6: Academic communication & reporting (Section 3.2.4.2). Our careful selection and presentation of topics was aimed to provide students with the opportunity to quickly evaluate the flavour of the course they were about to embark upon, addressing a number of issues in the orientation of new students and attempting to promote high levels of engagement, deep learning and increased retention. 3.2. Running the challenge Since the Six Week Challenge is a group project, the 2010/2011 cohort of 54 students was split up into groups of 6 to 7 students. For the duration of the project normal delivery of teaching was suspended entirely whilst the teaching team worked collaboratively with the student groups to develop their products. The task of creating the 3D etch-a-sketch-like graphics application with a dedicated hardware interface was broken down into six sub-challenges, or themes (Table 1), that each added new elements to the overall project and that each could be completed within 1 week, including: • graphics programming and software control, consisting of 2D and 3D graphics programming in the Processing language and the mapping of manipulation functions to keyboard controls (see Section 3.2.1).
• hardware interface, concerning the construction of a hardware interface with the Arduino micro-controller for the 3D etch-a-sketch-like application (see Section 3.2.2).
• usability evaluation, to consider usability aspects of the controller—this gives the students their first experience of what it means for software not just to be correctly implemented, but also acceptable to users. This topic, which relates to human-computer interaction (HCI) and usability, is particularly important on the degrees for which this programme was developed (Section 3.2.3).
• dissemination (Section 3.2.4), consisting of a viral marketing campaign (Section 3.2.4.1) and academic communication (Section 3.2.4.2).
Figure 1: The Activity-Led Instruction cycle. The main activity is introduced through an introductory lecture on subject-specific aspects of the students’ task, which they then solve independently; this may lead to further activities and additional lectures that are based on students’ needs/demands.
We employed an activity-led instruction cycle [AP09] (Figure 1) in which students were first, in a Monday morning briefing at the start of each week, introduced to the sub-challenges. Important subject-related information was covered in a short introductory lecture, followed by a variety of guided learning activities, focussed on the week’s challenge, in which the students participated. The students were then left to work out how to solve each of the sub-challenges, being allowed to organise their remaining time as they saw fit. Teaching staff were available throughout the week to provide encouragement and additional guidance when requested and, depending on the student groups’ progress, to run additional sessions to cover subject areas that the students discovered while working on their projects, with these support sessions timetabled from Tuesday to Thursday. No group structure was enforced, although in many groups individuals began to take on obvious roles, such as leader, tester or presenter. A special ‘show and tell’ session consisting of a gathering of all of the students and lecturers involved in the project was organised for the end of every week (Section 3.2.4.3). This was an opportunity for students to demonstrate their work to the whole cohort and to members of the faculty. Overall, this mode of delivery allows students to actively influence the direction of their learning, as they are given some level of control over the delivery of subject-specific information, that is, while students receive an introductory lecture on subject-specific aspects in support of their activities, any additional teaching sessions (lectures and/or tutorials) are dependent on the students’ needs and/or demands.
3.2.1. Graphics programming and software control
The first set of tasks for the student groups concentrated on the development of the graphics application. This required each team to develop, using the ‘new to them’ Processing language, know-how in the creation of the graphical elements needed to create the etch-a-sketch-like application. It consisted of:
• implementation of the drawing environment itself, commencing with the drawing of simple 2D points, and progressing to lines, squares and more complicated 2D shapes.
A primary goal here was the understanding of how objects could be created for display on the screen, particularly their specification using vertices, edges and faces. Some experimentation took place with simple 3D objects.
• placing newly created objects within the drawing environment in 2D and 3D, developing an understanding of basic affine transformations in 2D and 3D (i.e. translation, scaling and rotation). In many cases, this led to animation attempts that required an exploration of aspects related to the composition and redisplay of scenes, such as single- and double-buffering.
• definition of a changeable camera/view. The need for knowledge about the camera naturally arose from incidents where objects unexpectedly disappeared from view for some groups, either due to being placed outside of the viewing area of the window or outside of a poorly defined view frustum during 3D experimentation. Some groups also wished to be able to move the camera around in a manner similar to popular first person shooter games, motivating them to learn more about camera parameters.
• user interaction, to allow the program to process input from the keyboard and mouse. This involved a basic understanding of event handling and the event processing loop and was initially based on predefined keyboard input (i.e. controls that allow a user to limit movement to X, Y and Z), while students grasped the relation between the event loop, user input processing and scene redisplay for animation. Basic mouse control was also introduced.
• an appropriate graphical user interface design, building on topics learned during user interaction, but going somewhat further to consider the ease of use for the user and performance issues.
All of the student groups achieved at least a basic implementation of the features and demonstrated prototypes capable of drawing to the screen in 2D and 3D, and allowing the screen to be cleared subsequently. Most groups exceeded the basic requirements (see for example, Figure 2) and included diverse additional features. Many of these related to the selection of different drawing colours from a predefined palette, either by manual selection or, in some cases, automatic schemes that accounted for the drawing depth by changing some of the colour characteristics. A number of implementations also featured the use of 2D shapes as brushes with which to draw.
Figure 2: One of the student-created 3D drawing applications. Keyboard controls allow the turtle to be moved in three dimensions and enable the screen to be cleared.
As students experimented with shapes and drawing in 3D, important questions arose. For example, technical issues relating to camera set-up, object and scene rotation, and 3D object positioning using transformations all arose naturally as the task was feature-driven. Furthermore, in cases where groups redisplayed the scene each frame, they also required a means for storing and updating previously drawn lines or shapes so that a full sketch could be displayed each frame. This represented an interesting and challenging problem for the students, who investigated a number of data structures and methods to do this. In this way, students discovered for themselves the need to understand these concepts, which might otherwise have seemed obscure or unimportant.
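To make the redisplay and data structure issues above concrete, the following minimal Processing sketch illustrates the kind of program the groups converged on; it is not taken from any student submission, and the key bindings and step size are invented for the example. Drawn segments are kept in a list of pen positions and the whole sketch is rebuilt every frame, which is exactly why a retained representation of the drawing becomes necessary once the frame buffer is cleared.

// Minimal 3D etch-a-sketch-style sketch (illustrative only; key bindings are hypothetical).
ArrayList<PVector> points = new ArrayList<PVector>();  // pen positions drawn so far
PVector pen = new PVector(0, 0, 0);                    // current pen position
float step = 5;                                        // movement per key press

void setup() {
  size(600, 600, P3D);        // 3D renderer
  points.add(pen.copy());
}

void draw() {
  background(255);
  translate(width/2, height/2, 0);            // centre the drawing
  rotateY(map(mouseX, 0, width, -PI, PI));    // mouse orbits the view
  stroke(0);
  // Redraw every stored segment each frame: the frame buffer is cleared above,
  // so the sketch itself must live in a data structure.
  for (int i = 1; i < points.size(); i++) {
    PVector a = points.get(i - 1);
    PVector b = points.get(i);
    line(a.x, a.y, a.z, b.x, b.y, b.z);
  }
}

void keyPressed() {
  if (key == 'x') pen.x += step;
  if (key == 'X') pen.x -= step;
  if (key == 'y') pen.y += step;
  if (key == 'Y') pen.y -= step;
  if (key == 'z') pen.z += step;
  if (key == 'Z') pen.z -= step;
  if (key == 'c') points.clear();   // clear the screen
  points.add(pen.copy());
}

Because the scene is cleared and rebuilt every frame, camera changes and simple animation come essentially for free; this is the trade-off the groups ran into when moving from immediate drawing to redisplaying a stored sketch.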
Students were also encouraged to investigate different interaction schemes, for example, by mapping different keys onto controls and considering mouse movement. In particular, they were tasked with attempting to control the application using the minimum number of keys possible and to create novel mouse-keyboard methods for control. This added an extra challenge beyond the more obvious 1:1 mapping between keys and functions, and additionally helped raise important issues for consideration during the modelling of the physical controller (Section 3.2.2) and in the usability evaluation (Section 3.2.3). A number of groups succeeded in enabling more advanced interaction control by combining both the mouse and the keyboard. This proved to be very useful when the groups were subsequently asked to design user interaction tasks for usability studies.
3.2.2. Hardware interface
Once the basic graphics application was developed, teams were asked to integrate their application with a dedicated hardware interface and then to evaluate the hardware prototypes. For the hardware task, students used the Arduino prototyping platform, which allows users to quickly construct devices ranging from simple flashing lights to autonomous spy-planes and hand-held consoles. Online resources were provided to the student groups, including eBooks and hardware tutorials. In addition, students were given instructions on how to create a blinking LED device using resistors and potentiometers. Resistors were used to protect the circuit and the potentiometer to control the speed of an LED scanning light effect.
Figure 3: Example of a student group’s hardware interface. Three potentiometers (right side of board) allow the user to draw in three dimensions.
At the end of the task, all of the groups had created circuit diagrams for their etch-a-sketch-like applications using the ‘Fritzing’ application [KWC09], and many students created solutions with three potentiometers for controlling the drawing (see for example, Figure 3), similar to the Digital Airbrush by Batagelj et al. [BMTM09]. Some of the most important keyboard functions that were assigned to hardware buttons included: change of colour, drawing speed change, background colour change, clearing the screen, restoring the screen, precision mode, camera movement, zoom in/out, and the provision of a help screen. Some groups also decided to provide a combination of two or more button pushes to perform a particular action, solving the problem of having too many key-assigned features.
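On the software side, connecting such a controller to the drawing application amounts to a small amount of serial-port glue code. The Processing sketch below is a hedged illustration only: the port selection, baud rate and message format (one line of three comma-separated potentiometer readings) are assumptions made for the example rather than details reported for the student projects, and the Arduino program that would print those readings is omitted to keep to a single language.

// Illustrative Processing sketch: map three Arduino potentiometer readings
// (assumed to arrive as lines such as "512,300,87") onto a 3D pen position.
import processing.serial.*;

Serial arduino;
float penX, penY, penZ;     // current pen position derived from the knobs

void setup() {
  size(600, 600, P3D);
  // Serial.list()[0] and 9600 baud are assumptions; adjust for the actual board.
  arduino = new Serial(this, Serial.list()[0], 9600);
  arduino.bufferUntil('\n');      // call serialEvent() once per complete line
}

void serialEvent(Serial port) {
  String msg = port.readStringUntil('\n');
  if (msg == null) return;
  String[] values = split(trim(msg), ',');
  if (values.length == 3) {
    // Arduino analogRead() values range from 0 to 1023; map them to the window.
    penX = map(float(values[0]), 0, 1023, 0, width);
    penY = map(float(values[1]), 0, 1023, 0, height);
    penZ = map(float(values[2]), 0, 1023, -200, 200);
  }
}

void draw() {
  // In the full application the new (penX, penY, penZ) position would be appended
  // to the stored sketch, as in the keyboard-driven example above.
  background(255);
  translate(penX, penY, penZ);
  fill(0);
  box(5);   // show the current pen position
}

Mapping the raw 0–1023 readings onto screen coordinates mirrors the keyboard-to-function mapping discussed above, and is where design decisions such as drawing speed and a precision mode would be handled.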
3.2.3. Usability evaluation
The usability component of the Six Week Challenge involved students in designing a simple usability study for their etch-a-sketch. Usability is perhaps the key issue in HCI. HCI is concerned with how systems are used in practice, and usability is about how to design systems so that using them is easy, effective and enjoyable [RSP11]. Thus, students focussed on one or two key tasks for their etch-a-sketch, running the study on four or five users, collecting data, and analysing it to develop an informed view on whether or not the interface to their graphics application was usable in terms of the tasks tested. There are alternative approaches to usability in HCI. Broadly, studies can be ‘ethnographic’, or ‘lab-based’. Lab-based studies [DFAB03] feature pre-defined tasks carried out by users in controlled settings where there is informed consent. Aspects of performance like completion times and error rates are measured. This approach is useful for establishing common issues across a range of users, and is particularly appropriate in the context of system development, for evaluating prototypes. In contrast, ethnographic studies [Cra03] are of naturally occurring use of real systems in authentic contexts of use (i.e. the real world), unscripted and uncontrolled. This generates descriptive rather than numeric data, and is good for looking at particular cases of use in depth. It is particularly appropriate where the system is a product rather than a prototype. The lab-based approach was the one experienced by students during the Six Week Challenge. The students engaged in the design of a usability study in which the setting, tasks and measurements were all pre-defined and kept uniform across different users to allow comparability of results. In designing their usability tests, groups were first instructed to define the core tasks required of users by the etch-a-sketch. To do this, it was suggested that a simple task analysis [Hor10] be created, showing the steps and any sub-steps required to complete the task. This required students to think about scoping: what should the realistic limits of the tested task be? How long should it take? What should count as its beginning and end, and what is the necessary sequence of actions? Following this, students turned to metrics: what aspects of the users’ performances could and should be counted? This relates to a quantitative approach to data, where numbers are the basis for claims about usability. Students made sensible suggestions: for example, the number of errors made, and the time taken overall. This naturally led into the need for ‘baseline’ measures, that is, benchmark performances with which to compare user performance, and how these should be established [Nie93]. After addressing this issue, students were asked to prepare observational instruments (paper forms) they would use to record data, and to explain to tutors in advance how they would carry out data analysis, which led into consideration of individual and mean scores, variance and representation, for example, by bar charts. Crucially, students needed to be able to explain how they would make usability claims on the basis of their data. Most groups realised that the numerical scores they got from users needed to approach or equal baselines. In that case, it could be claimed that, in terms of the tasks tested, their design was usable. Conversely, students were asked to consider what they could say about design revision if the numbers were further away from baselines, that is, when it was more difficult to claim usability. This issue links usability studies to technology design and is crucial to start negotiating early on in the study of human-computer interaction. These methods and techniques, although elementary, are crucial to usability studies [BTT05], but can be hard to teach. The most difficult issue is that students, while they may be able to perform aspects of the practical work, are frequently not so clear on how to design it or why they are doing it in the first place. In the context of ALL, one goal of the usability week was to start to inculcate a scientific approach, where claims about usability are evidence-based, and the process is explicit, repeatable and replicable.
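As a concrete sketch of the quantitative side of such a study, the short Processing program below computes the kind of summary statistics the groups compared against their baselines; all task data, the baseline value and the variable names are invented for illustration.

// Illustrative summary statistics for a usability test (all data invented).
float[] completionTimes = { 42.0, 55.5, 38.2, 61.0, 47.3 };  // seconds, one per user
int[]   errorCounts     = { 1, 3, 0, 2, 1 };                 // errors per user
float   baselineTime    = 45.0;                              // assumed benchmark performance

void setup() {
  float meanTime = mean(completionTimes);
  float sdTime = standardDeviation(completionTimes, meanTime);
  float meanErrors = 0;
  for (int e : errorCounts) meanErrors += e;
  meanErrors /= errorCounts.length;

  println("Mean completion time: " + nf(meanTime, 0, 1) + " s (baseline " + baselineTime + " s)");
  println("Standard deviation:   " + nf(sdTime, 0, 1) + " s");
  println("Mean errors per user: " + nf(meanErrors, 0, 1));
}

float mean(float[] xs) {
  float sum = 0;
  for (float x : xs) sum += x;
  return sum / xs.length;
}

float standardDeviation(float[] xs, float m) {
  float sumSq = 0;
  for (float x : xs) sumSq += (x - m) * (x - m);
  return sqrt(sumSq / xs.length);
}

A group's usability claim would then rest on the mean approaching or bettering the baseline, read alongside the qualitative observations gathered during the sessions.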
This was eased by the fact that the groups had a vested interest in showing the usability of their designs. This helped leverage understanding of these principles: in other words, it was important for groups to show that their claims were not just their own subjective opinion, but evidence-based according to scientific practice, in such a way that they would gain credibility. This is a crucial hurdle for students to clear, and the motivation provided by the Six Week Challenge undoubtedly helped (although developing a scientific attitude is not immediate). That there was general appreciation of this was clear from the group-to-peer presentations made at the end of the usability week. Having developed their usability tests, students had to run them. This meant engaging with users in systematic ways. In particular, instructions needed to be developed and kept consistent across users. Students had to learn not to interrupt or make hints to users, and crucially to keep their own behaviour discreet and uniform across users to control for any researcher effect. This resulted in tests being run in ways that began to approach professional practice. Many students worked out that in addition to the metrics they were using, they could add in other qualitative observational data, for example, questions users asked, things they said, facial expressions they made and so on. This spontaneous activity was the beginning of the important process of gathering both quantitative and qualitative data and looking for the complementarities between these, particularly how qualitative data can help explain numbers: for example, where time was slow, did the user ask a lot of questions? If so, this might indicate confusion, which helps explain slow times. The main difficulty in teaching HCI, including usability, is that it is highly conceptual and often abstract. Typically it is taught by asking students to run studies on interfaces they may not have a personal interest in. The Six Week Challenge meant that students had a strong motivation to show their designs were usable. Personal investment in the work helped leverage engagement in many issues which can be a challenge to teach, in particular the forming of a research question for a usability study, the collection and analysis of different types of data, realistic and relevant scoping of user tasks, and the correct setting up and running of user sessions. The embedding of advanced usability material within the Six Week Challenge increased its accessibility: there was impressive work within a short period. Our activity-led approach in general can be claimed to ease the transition from pre-degree to degree education, particularly helping to ameliorate the feelings of dismay and difficulty we identified in the introduction (Section 1).
3.2.4. Dissemination
An important further aim beyond developing students’ technical abilities and teamwork was to develop their awareness of the importance of dissemination and how dissemination should be tailored to both target audience and goal. Students disseminated their work both internally and externally through group demonstrations, a viral marketing campaign and academic communication methods.
The aims of dissemination were to inform the work of other groups, to provide the students with the experience of presenting work to different external target-groups, highlighting the necessity of differing dissemination methods based on the target audience (e.g. academics, consumers), and to get them to think about ways in which quantitative and qualitative feedback could be collected. In addition to internal demonstration, students also had to disseminate their work externally in two ways: through a viral marketing campaign (Section 3.2.4.1) and in the form of an academic manuscript (Section 3.2.4.2).
3.2.4.1. Viral marketing campaign
This challenge involved student groups generating publicity for their products by creating web-pages for presenting their programs and gathering usage statistics, as well as an online viral advert linked to their product to tempt users back to their groups’ product homepages. The stipulation of a minimum number of viewers was an important inclusion, as students would need to solve the problem of digital verification and customer tracking. A suggested operational strategy for the week was to upload their source code to an open source repository, and to upload their executable to an online storage site or a hosted product website. Students were encouraged to create a video or other promotional device and to disseminate this through social networks. For visitor tracking we demonstrated the use of Google Analytics software [Cli10] and tracking code. This task allows the students to work in media that most of them are familiar with already: blogs, on-line videos, social networking sites and so on. Rather than simply allow them to demonstrate their familiarity and facility with these media, however, the marketing task asks them to think more critically about what they can achieve through them and how these media might be applied in their studies or careers, and it ensures at least a basic level of skill in the minority of students who, before coming to university, have not had any experience in this area. The students, who we might describe as ‘digital natives’ [Pre01], do not always have a great deal of skill in transferring their skills [KJCG08] or realise how they might be of use in their studies or careers. The goal of this media week component of the challenge was to get the students to think about the context under which their future productive activities may take place, and how to shape products and messages for a particular audience. Whilst presented in a light-hearted fashion, the media week provided opportunities for discussions about the nature of digital goods, ethics and piracy, copyright, open source and creative commons solutions to intellectual property rights problems.
Figure 4: Example of a student group’s graphics application embedded in their website. Control knobs on the interface allow the user to create a sketch interactively in three dimensions.
We found that many students published their work by placing interactive demonstrations of their graphics applications on their web-pages (Figure 4) and uploading pre-recorded videos to YouTube [BG09]. The Processing system provided the necessary facilities for allowing students to do this themselves, as it allows interactive graphics programs to be embedded in websites. Most of the student groups successfully completed the website integration of their applications and interfaces, while some groups chose to provide downloadable executables of their applications instead.
Unsurprisingly, very few of the students exhibited any difficulty with the technical components of the week’s challenge: producing simple web pages, embedding JavaScript tracking code, uploading video content and accessing analytic data. The students performed particularly well during this week, being able to share their existing knowledge of how to resource web activity for free, and they welcomed the opportunity to proudly demonstrate their achievements to their friends on social networking sites. The graphical nature of the work seemed particularly amenable to such sites, as a means for attracting interest from peers and potential employers, and also serving as a starting point for the creation of a graphics programming portfolio. In fact, in hindsight, we probably set the ‘number of viewers’ stipulation too low, as between them the members of each group could muster over one thousand contacts on social networking sites. The more interesting learning outcomes of the week occurred in the conversations that the tasks entailed. Some students worried about how to protect their products from piracy (even though they were free) and then had to consider this in the light of the fact that the tools with which they had made them were free also. Students were encouraged to read about copyright and creative commons solutions to the problem of intellectual property. Similarly, the mechanics of viral marketing were a topic for discussion during the week, leading students to examine what makes an individual share a link with their friends on the internet and which content was most likely to trigger exponential sharing. This also helped raise an awareness that the dissemination method must account for the target audience, which may also include potential employers. This week’s activities also served to raise the question of feedback, by looking at ways in which qualitative and quantitative data could be collected. This involved accounting for simple metrics, such as tracking the number and types of comments and views that their work attracted. The issue of feedback is sometimes underestimated from the students’ point of view. Graphics work published on the web may be a very useful way of attracting comments from more skilled graphics practitioners from around the world, allowing students to obtain broader formative feedback on their portfolio work from a diverse audience. At the end-of-week presentation all of the groups had met their viewing targets, a few were able to share customers’ comments, and one group had even ‘monetized’ their website and was deriving an income stream.
3.2.4.2. Academic communication
The academic writing and research component deals directly with the process of critically evaluating students’ own work and the work of others, reading academic texts, synthesising arguments and presenting information; skills that will be used and developed throughout any degree course, yet are not necessarily obviously critical to students beginning a technical degree. The task involved preparing a short paper (3–4 pages of collaborative academic writing) providing background information to their projects and stressing the relevance of this research to their product. Each student group was presented with a different research question.
Many of these were related to the graphics techniques they used and that they were tasked with describing in their short papers. For this the groups had to: • engage with a number of academic texts, providing a basic understanding of academic writing (language and style), some of which [LR88, Lar09], originating from the computer graphics community, were provided to them; • adopt appropriate strategies for finding and evaluating relevant textual sources [Gri09], including the use of citation databases; • learn to organise information in a logical manner, suitable for presentation in written form, as well as for oral presentation [Ger04]. The introductory lecture for the academic communication week provided students with an overview of academic writing, that is, the academic writing style and the structure of academic texts, which students were exposed to in a light-hearted manner [Sch96], as well as considerations of good academic conduct, including issues of proper citing of sources. Students were then introduced to literature search strategies, as well as the LaTex document preparation system [Lam86] to ease them into the practice of preparing consistently formatted documents. Students were then directed towards the compilation of a comprehensive reading list of academic articles that appeared relevant to their set research questions, providing the basis for their short review/survey paper. Throughout this activity, students were repeatedly briefed on the principles of academic honesty to prevent problems like plagiarism. The resulting short papers showed an unexpected level of maturity, rarely seen in students in their first year at university. The students also developed a much greater appreciation for the academic writing style, contrasting it to the much more informal communication forms they were familiar with before (Section 3.2.4.1). 3.2.4.3. Group demonstrations Over the course of the Six Week Challenge, a special ‘show and tell’ session consisting of a gathering of all of the students and lecturers involved in the project was organised for the end of every week, so that students could demonstrate the week’s results to the other groups of their cohort, as well as to members of the faculty. This was primarily a student-driven activity: while lecturers had the opportunity to provide feedback on the work of the students, the student demonstration sessions focused on students commenting on the work of others. Most importantly, it allowed groups to demonstrate any innovative features that they had implemented over the course of the previous week. We believe that the fostering of this type of constructive competition between groups was a major contributing factor in motivating them to seek new and interesting features to be demonstrated the following week. 4. Evaluation The previous sections all have an evaluative aspect, in indicating the gains accruing from the Six Week Challenge for the teaching of the discipline represented in each week. This suggests that ALL has definite advantages over more traditional teaching methods. In terms of overall evaluation, a range of anonymous surveys were carried out, and the Six Week Challenge was also externally evaluated, concluding that the Six Week Challenge ‘potentially represents one of the most c 2012 The Authors Computer Graphics Forum c 2012 The Eurographics Association and Blackwell Publishing Ltd. E. F. 
Anderson et al./An Activity-Led Introduction to First Year Creative Computing 1861 interesting developments in PjBL across the UK’ [Gra10] (Graham refers to PBL as PjBL—Project Based Learning). The external expert ‘was particularly impressed by the extent of the students’ awareness and understanding of the active learning approach that had been adopted. Hearing them reflect on their own learning, it was clear that this awareness was an important element of their development through the 6-week activity’ [Gra09]. In a faculty-wide student survey of the activities offered by the 11 subject groups in the EC faculty, conducted at the end of the 2010/2011 group project, our group’s project was found to have received the overall best feedback from students [WM11]. This survey asked students some key questions concerning the relevance and importance of their learning to their futures; how far ALL challenges are achievable; if students felt part of a learning community; and whether the workload was right. All these questions met with high average scores of the order of 4 out of 5 (on a 5 point Likert-type scale), indicating high satisfaction. Thus, it appears that, despite the potential disorientation that computer science students can face at degree level, discussed in the introduction (see Section 1), students generally felt what they learned was relevant and important, reported a sense of belonging, and believed the workload was feasible. Students were asked more specifically about their learning: whether or not the ALL experience had developed their subject knowledge; how far the teaching staff encouraged them to learn effectively; and if there were sufficient opportunities to learn from others. Responses to such questions are important to gauge, to see whether the passing of initiative and direction to students that ALL implies results in any difficulties compared to traditional alternatives. The average response scores for these questions were all of the order of 4.3 out of 5, which again indicates high satisfaction. To complement the questions about learning, questions were also asked about teaching: the extent to which students were satisfied with how they were being taught; how far tutors were available informally and whether the group size and teaching environment were right. Again, these are important questions to ask, especially concerning the more agile, ad-hoc tutor responsiveness required throughout an ALL process, and whether this works compared with the more formal traditional alternatives. Again, the responses are of the order of 4 of 5 and over, indicating high satisfaction. These scores are gratifying and indicate that students were happy with the teaching and learning that took place during the Six Week Challenge. Importantly, there seems to have been a sense of engagement and involvement which could help mitigate attrition rates, which, as we saw in Section 1, are a problem in degree-level computing education. Graphics is a tough area of computer science, but the Six Week Challenge Table 2: Student responses to the prompt ‘Taking part in the six week activity has helped improve my ...’. Results are based on the responses of 56 students from the 2009/2010 cohort and displayed as percentages, where SD, Strongly Disagree; D, Disagree; N, Neutral; A, Agree; SA, Strongly Agree. 
Improvement SD(%) D(%) N(%) A(%) SA(%) Problem solving 0 5 17 71 7 Team-working 0 7 8 46 39 Communication 0 0 15 56 29 Time-management 0 7 27 29 37 Self confidence 0 2 34 49 15 Analytical & 0 2 35 56 7 critical abilities indicates that if an ALL approach is taken, graphics plus linked relevant disciplines can be effectively taught with high satisfaction at this level. The results of the EC faculty survey are very similar to a further anonymous survey that we conducted of the students of the 2009/2010 cohort in our subject group, for which the students’ responses were also highly positive. We have been particularly concerned to track how students reflect on their own learning during the 6 week period, particularly in the absence of traditional lectures and tutorials (Table 2). Asked if they would recommend this type of learning to other students, 98% of our 2009/2010 cohort agreed that they would. ‘The Six Week Challenge began as difficult and uncertain but the results showed our potential. This was a triumph’ (Student feedback on the 2009/2010 activity). 5. Discussion Our student-centred, activity-led introduction to creative computing through the development of a simple, yet intriguing interactive computer graphics application, appears to have achieved its aims. Over the course of the six weeks, we observed the transformation in our students from ‘nervous and unsure’ to ‘confident and proud’ as they became increasing capable communicators. The group presentations at the end of each week especially were an arena where the groups competed in terms of the features and capabilities of their product. Indeed, we believe that this competitive atmosphere was crucial to driving student effort and engagement, allowing us to forego assessment as a means of motivation. We have found that: • by introducing students to all components of their course in a concentrated short term exercise, they are better able c 2012 The Authors Computer Graphics Forum c 2012 The Eurographics Association and Blackwell Publishing Ltd. 1862 E .F. Anderson et al./An Activity-Led Introduction to First Year Creative Computing to assess quickly what the coming 3 years will involve in terms of content and approach. • by working in small groups alongside, and supported by, the teaching team, students are rapidly introduced to our academic community. This is further enhanced by social activities which help to develop a strong sense of cohort identity. • by focussing on activity and production, students are introduced to the practical nature of their subject and, by example, realise that their learning will be active, rather than passive, and that the production of technically sound artefacts will be a predominant feature of their course. One reason for the success may be the novelty effect of our approach, which Vernon and Blake believed to be a possible factor of the success of PBL, as ‘participating in something new and different . . . may create positive attitudes by psychological mechanisms that are unrelated to the theory, content, or learning objectives’ [VB93]. However, a review conducted by the EC faculty of the six week project designed by our group has led to our project being characterised as a ‘true “high impact” activity’ [WM11] as described by the US National Survey of Student Engagement [Nat07], which could explain the success that this project seems to have had with the participating students. 5.1. High points We have experienced a number of other positive outcomes. 
Our first year students have retained a significant degree of group cohesion throughout the year, organising social events and often speaking with one voice on issues that affect them. The early use of group-based activities, which can provide a social support structure that helps to retain students who might otherwise consider leaving their degree programme, is likely to be one factor that has influenced this apparent success. Furthermore, many students have retained some of the good habits they learned in the six week group project, particularly in academic writing, and assessments submitted by the students so far appear to be of a better quality than previously observed. Furthermore, the students appear more amenable to challenging material and tasks than in previous years. Finally, the introduction of the Six Week Challenge has coincided with a significant improvement in first year student retention. In the 2010/2011 academic year we have suffered no early withdrawals and at the end of this academic year we expect year 1 retention to be over 90%. 5.2. Comparison of the 2009/2010 and 2010/2011 challenges Although the 2009/2010 media player challenge involved some notable computer graphics aspects, particularly relating to the visualisation of the audio component, the graphics component was not core to the functionality of the player and the implementation of 3D graphical effects was voluntary. In contrast, the nature of the 2010/2011 3D etcha-sketch meant that 3D graphics programming formed a mandatory core of the challenge, necessitating the use of linear algebra and transformations to define and animate interactive scenes. Computer graphics and mathematics lie at the core of the students’ degree programmes; it is essential for the games technology course and important for multimedia computing. The 2010/2011 3D etch-a-sketch challenge therefore seemed more relevant to the students’ degrees, providing a better practical introduction to the use of mathematics and programming for defining 3D scenes and interactive animations. 5.3. Issues for further consideration The Six Week Challenge is highly resource intensive both in terms of staffing, accommodation and technology. One important observation that should be taken seriously is that despite our approach’s expectation that the students should demonstrate initiative and solve the set challenges on their own, this does not imply reduced responsibility or workload on behalf of faculty involved in preparing the challenges and developing teaching materials for the sessions that are led by an instructor. In the 2010/2011 Six Week Challenge, the project involved 6 academics and 2 teaching assistants working with a cohort of 54 students and it is unclear how well this activity would ‘scale up’ for larger cohorts, especially as the instructors need to closely monitor the students’ progress to ensure that the learning goals are met. As the student groups have freedom in the way in which they approach any task, their solution may very well miss a specific aspect of vital importance to the outcome of their activities and instructors must watch for these ‘wrong turns’ and if the need should arise, make the students aware of potential problems with their chosen approach. 
One of the more demanding aspects of the Six Week Challenge for the support staff (besides the physical requirements of extensive ad-hoc student support) was ensuring that each member of the student groups was participating as much as possible, and it was not uncommon to find some students trying to avoid doing parts of the tasks they did not enjoy by taking a back seat. Generally this could be rectified by engaging these students and trying to get them to think about the problem faced by the group and to provide input. This monitoring was not implemented as a formal process or assessment, but as part of the close relationship developed between groups and their personal tutor. Additionally due to the problem-based, self discovery structure of the Six Week Challenge, support staff would often find the demand for guidance from the students would fluctuate throughout the week depending on the overall complexity of the task. One issue that did become apparent during the programming element of the project was that we found that within the groups a minority of the students had previous c 2012 The Authors Computer Graphics Forum c 2012 The Eurographics Association and Blackwell Publishing Ltd. E. F. Anderson et al./An Activity-Led Introduction to First Year Creative Computing 1863 experience with programming, resulting in these students tending to take on the majority of the workload in this area. This often caused a divide in the group and would further isolate the students who were new to computer science. The main solution to this was for staff to encourage the students to share knowledge with the group and for the less technically experienced members to understand that even small contributions to technical aspects, coupled with the experience of seeing a software project develop, was beneficial. Despite these efforts, some students did still become disillusioned during this activity. This could be addressed by running optional programming orientation sessions for students who are completely new to computer science. Finally, we have found that student expectations are significantly higher at the end of the Six Week Challenge, in terms of pace and direction of their degree programme. Management of these expectations can be problematic as the students return to more traditional classroom formats. The delivery of the latter has also been affected by the Six Week Challenge, as a side effect of the suspension of regular teaching activities for the duration of the project has been the need to redesign courses which started after the project which now have to run within a shorter time frame. During each academic year, students are asked to complete a feedback questionnaire on each module of study. When the data for this year is available, we intend to examine how the attitudes of students might have changed in comparison to previous years. Combined with a study of the performance of the year group, we hope to have a much better understanding of the longer-term effects of this kind of introductory programme. 6. Conclusions Our mode of delivery has very much followed the concept of activity-led instruction, which in this context refers to the instruction of students on how to embrace the ALL process. At the introduction for every sub-challenge (Section 3.2), exemplar-based activity sessions were organised with the primary purpose of familiarising students with the process, rather than the task’s content per se. 
Students were thus provided with a concrete, real-world example of the processes involved in addressing the challenges, eventually turning them into pro-active problem solvers who were not ‘afraid’ to face new problem domains. In this respect the weekly ‘show and tell’ sessions were also highly useful, as the competition they instilled between the different student groups prompted many students to independently investigate different techniques, which they then disseminated among their peers—effectively students took on the role of instructors. The evaluation of the students’ experience during the Six Week Challenge suggests that students have reflected on themselves and their learning and the reasons for which they enrolled at university, which in itself is a positive outcome of the six week group project. Acknowledgements The authors would like to thank the creative computing students who achieved much more than we had anticipated. Furthermore we wish to thank Sarah Wilson-Medhurst, who conducted the faculty-wide survey for the evaluation of the six week group project in the 11 subject groups of the EC faculty. The screenshots shown in this article show the projects of two of the student groups. Figure 2 shows the program developed by the student group Clumsy Penguin Entertainment (Sareena Hussain, Sennel Ionus, Charnjeet Kaur, Jaipreet Panesar, Shaun Richardson, Anthony Rickhuss and Sarah Wardle). Figure 4 shows the program developed by the group DDG (Ahsan Ahmed, Sean Bhadrinath, William Brady, Thomas Bridger, Constantin Cercel and Ian Evans). References [ACM06] ACM – ASSOCIATION FOR COMPUTING MACHINERY, INC: Computing disciplines & majors. ACM Computing Careers Website: http://computingcareers.acm.org/ (2006). Accessed 18 March 2012. [AM93] ALBANESE M., MITCHELL S.: Problem based learning: A review of literature on its outcomes and implementation issues. Academy of Medicine 68, 1 (1993), 52–81. [AM06] ANDERSON E., MCLOUGHLIN L.: Do robots dream of virtual sheep: Rediscovering the karel the robot paradigm for the plug&play generation. In Proceedings of the Fourth Game Design and Technology Workshop and Conference (GDTW 2006) (Liverpool, UK, 2006), pp. 92–96. [AM07] ANDERSON E., MCLOUGHLIN L.: Critters in the classroom: A 3D computer-game-like tool for teaching programming to computer animation students. In Proceedings of the ACM SIGGRAPH 2007 Educators Program (San Diego, CA, 2007). [AP09] ANDERSON E. F., PETERS C. E.: On the provision of a comprehensive computer graphics education in the context of computer games: An activity-led instruction approach. In Proceedings of the Eurographics 2009 - Education Papers (Munich, Germany, 2009), G. Domik and R. Scateni (Eds.), Eurographics Association, pp. 7–14. [BBA09] BRUNSTEIN A., BETTS S., ANDERSON J.: Practice enables successful learning under minimal guidance. Journal of Educational Psychology 101, 4 (2009), 790–802. [BFG*00] BARG M., FEKETE A., GREENING T., HOLLANDS O., KAY J., KINGSTON J., CRAWFORD K.: Problem-based learning for foundation computer science courses. Computer Science Education 10, 2 (2000), 109–128. [BG09] BURGESS J., GREEN J.: YouTube: Online Video and Participatory Culture. Polity press, Cambridge, UK, 2009. c 2012 The Authors Computer Graphics Forum c 2012 The Eurographics Association and Blackwell Publishing Ltd. 1864 E .F. Anderson et al./An Activity-Led Introduction to First Year Creative Computing [BM05] BEAUBOUEF T., MASON J.: Why the high attrition rate for computer science students: Some thoughts and observations. 
ACM SIGCSE Bulletin 37, 2 (2005), 103–106. [BMTM09] BATAGELJ B., MAROVT J., TROHA M., MAHNIC D.: Digital airbrush. In Proceedings of the 51st International Symposium ELMAR-2009 (Zadar, Croatia, 2009), pp. 305–308. [BTT05] BENYON D., TURNER P., TURNER S.: Designing Interactive Systems. Addison Wesley, Harlow, UK, 2005. [Bux86] BUXTON W.: There’s more to interaction than meets the eye: Some issues in manual input. In User Centered System Design: New Perspectives on Human-Computer Interaction, D. Norman, S. Draper (Eds.). Lawrence Erlbaum Associates, Hillsdale, NJ, 1986, pp. 319–337. [Cam96] CAMP G.: Problem-based learning: A paradigm shift or a passing fad? Medical Education Online 1, 2 (1996). [Car06] CARTER L.: Why students with an apparent aptitude for computer science don’t choose to major in computer science. ACM SIGCSE Bulletin 38, 1 (2006), 27–31. [CC09] CASE C., CUNNINGHAM S.: Teaching computer graphics in context—CGE 09 workshop report. Available from: http://education.siggraph.org/conferences/ eurographics/eurographics-2009-computer-graphics- education-09-workshop/teaching-computer-graphics-incontext/ (2009). [CGSG04] CRAIG S., GRAESSER A., SULLINS J., GHOLSON B.: Affect and learning: An exploratory look into the role of affect in learning. Journal of Educational Media 29 (2004), 241–250. [Cli10] CLIFTON B.: Advanced Web Metrics with Google Analytics (2nd edition). SYBEX Inc., Alameda, CA, 2010. [Cra03] CRABTREE A.: Designing Collaborative Systems: A Practical Guide to Ethnography. Springer-Verlag, Secaucus, NJ, 2003. [Csi90] CSIKSZENTMIHALYI M.: Flow: The Psychology of Optimal Experience. Harper and Row, NY, 1990. [Cun99] CUNNINGHAM S.: Re-inventing the introductory computer graphics course: Providing tools for a wider audience. In GVE ’99: Proceedings of the Graphics and Visualization Education Workshop (Coimbra, Portugal, 1999), pp. 45–50. [DFAB03] DIX A., FINLAY J. E., ABOWD G. D., BEALE R.: Human-Computer Interaction (2nd edition) Prentice-Hall, Upper Saddle River, NJ, 2003. [DG06] DOMIK G., GOETZ F.: A breadth-first approach for teaching computer graphics. In Proceedings of the EG Education Papers (Vienna, Austria, 2006), pp. 1–5. [Fel96] FELTON J.: Problem-based learning as a training modality in the occupational medicine curriculum. Occupational Medicine-Oxford 46, 1 (1996), 5–11. [FW10] FURMAN B., WERTZ E.: A first course in computer programming for mechanical engineers. In Proceedings of the IEEE/ASME International Conference on Mechatronics and Embedded Systems and Applications (MESA) (Qingdao, Shandong, 2010), pp. 70–75. [Ger04] GERODIMOS R.: How to present at conferences: A guide for graduate students. PSA Graduate Network Newsletter (October 2004) (2004), 13–16. [Gon00] GONZALEZ R.: Disciplining multimedia. Multimedia, IEEE 7, 3 (2000), 72–78. [Gra09] GRAHAM R.: Personal communication, 2009. [Gra10] GRAHAM R.: UK approaches to engineering projectbased learning. White Paper sponsored by the Bernard M, Gordon MIT Engineering Leadership Program, MIT, Boston, MA, 2010. [Gri09] GRISWOLD W.: How to read an engineering research paper. Available at: http://cseweb.ucsd.edu/users/ wgg/CSE210/howtoread.html (2009). Accessed 18 March 2012. [Hor10] HORNSBY P.: Hierarchical task analysis. UX Matters. Available at: http://www.uxmatters.com/mt/archives/ 2010/02/hierarchical-task-analysis.php (2010). Accessed 18 March 2012. [HS04] HMELO-SILVER C.: Problem-based learning: What and how do students learn? Educational Psychology Review 16, 3 (2004), 235–266. 
[HSDC07] HMELO-SILVER C., DUNCAN R., CHINN C.: Scaffolding and achievement in problem-based and inquiry learning: A response to Kirschner, Sweller, and Clark (2006). Educational Psychologist 42, 2 (2007), 99–107. [IJP*08] IQBAL R., JAMES A., PAYNE L., ODETAYO M., AROCHENA H.: Moving to activity-led-learning in computer science. In Proceedings of iPED 2008 (Coventry, UK, 2008). [KJCG08] KENNEDY G. E., JUDD T. S., CHURCHWARD A., GRAY K.: First year students’ experiences with technology: Are they really digital natives? Australasian Journal of Educational Technology 24, 1 (2008), 108–122. c 2012 The Authors Computer Graphics Forum c 2012 The Eurographics Association and Blackwell Publishing Ltd. E. F. Anderson et al./An Activity-Led Introduction to First Year Creative Computing 1865 [KSC06] KIRSCHNER P. A., SWELLER J., CLARK R. E.: Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educational Psychologist 41, 2 (2006), 75–86. [KWC09] KN¨ORIG A., WETTACH R., COHEN J.: Fritzing: a tool for advancing electronic prototyping for designers. In Proceedings of the 3rd International Conference on Tangible and Embedded Interaction (Cambridge, UK, 2009), pp. 351–358. [Lam86] LAMPORT L.: Latex: a document preparation system. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986. [Lar09] LARAMEE R. S.: How to write a visualization research paper: The art and mechanics. In Proceedings of the EG 2009 - Education Papers (Munich, Germany, 2009), pp. 59–66. [LR88] LEVIN R., REDELL D. D., : An evaluation of the ninth sosp submissions or how (and how not) to write a good systems paper. SIGGRAPH Computer Graphics 22 (1988), 264–266. [Mer07] MERRILL M.: A task-centered instructional strategy. Journal of Research on Technology in Education 40, 1 (2007), 33–50. [MGJ06] MART´I E., GIL D., JULI`A C.: A PBL experience in the teaching of computer graphics. Computer Graphics Forum 25, 1 (2006), 95–103. [MK02] MCCOWAN J., KNAPPER C.: An integrated and comprehensive approach to engineering curricula, part one: Objectives and general approach. International Journal of Engineering Education 18, 6 (2002), 633–637. [Nat07] NATIONAL SURVEY OF STUDENT ENGAGEMENT: Experiences that matter: Enhancing student learning and success. NSSE Annual Report 2007, 2007. [Nie93] NIELSEN J.: Usability Engineering. Morgan Kaufman, San Francisco, CA, 1993. [PBTF09] PELLICER J. L., BLANES J. S., TORMOS P. M., FRAU D. C.: Using processing.org in an Introductory Computer Graphics Course. In Proceedings of the Eurographics 2009 - Education Papers (Munich, Germany, 2009), pp. 23–28. [Per92] PERELMAN L.: School’s out: hyperlearning, the new technology, and the end of education. William Morro, NY, 1992. [PJB*10] POOLE N., JINKS R., BATE S., OLIVER M., BLAND C.: An activity led learning experience for first year electronic engineers. In Proceedings of the 2010 Engineering Education (EE2010) Conference (Birmingham, UK, 2010). [Pre01] PRENSKY M.: Digital natives, digital immigrants part 1. On the Horizon 9 (2001), 1–6. [RF06] REAS C., FRY B.: Processing: Programming for the media arts. AI&Society 20 (2006), 526–538. [Roy10] ROYAL SOCIETY: Current ICT and computer science in schools: Damaging to UK’s future economic prospects? Available at: http://royalsociety.org/CurrentICT-and-Computer-Science-in-schools/ (2010). Accessed 18 March 2012. 
[RSP11] ROGERS Y., SHARP H., PREECE J.: Interaction Design: Beyond Human - Computer Interaction (3rd edition) Wiley Publishing, Chichester, UK, 2011. [SBM04] SAVIN-BADEN M., MAJOR C.: Foundations of Problem Based Learning. Open University Press, Buckingham, UK, 2004. [Sch96] SCHULMAN E.: How to write a scientific paper. Annals of Improbable Research 2, 5 (1996), 8–9. [SD95] SAVERY J., DUFFY T.: Problem based learning: An instructional model and its constructivist framework. Educational Technology 35, 5 (1995), 31–38. [SEA*10] SHUTTLEWORTH J., EVERY P., ANDERSON E., HALLORAN J., PETERS C., LIAROKAPIS F.: Press play: An experiment in creative computing using a novel pedagogic approach. Anglo Higher 2, 1 (2010), 23–24. [SKC07] SWELLER J., KIRSCHNER P., CLARK R.: Why minimally guided teaching techniques do not work: A reply to commentaries. Educational Psychologist 42, 2 (2007), 115–121. [Sto09] STORNI C.: The ambivalence of engaging technology: Artifacts as products and processes. In Proceedings of the NORDIC Design Research Conference (Oslo, Norway, 2009). [Tuc96] TUCKER A. B., : Strategic directions in computer science education. ACM Computing Surveys 28 (1996), 836–845. [Tud92] TUDGE J.: Processes and consequences of peer collaboration: A vygotskian analysis. Child development 63 (1992), 1364–1379. [VB93] VERNON D., BLAKE R.: Does problem-based learning work? A meta-analysis of evaluative research. Academic Medicine 68, 7 (1993), 550–563. [VW00] VANDERBERG S., WOLLOWSKI W.: Introducing computer science using a breadth-first approach and functional c 2012 The Authors Computer Graphics Forum c 2012 The Eurographics Association and Blackwell Publishing Ltd. 1866 E .F. Anderson et al./An Activity-Led Introduction to First Year Creative Computing programming. In Proceedings of SIGCSE 2000 (Austin, TX, 2000). [WM08] WILSON-MEDHURST S.: Towards sustainable activity led learning innovations in teaching, learning and assessment. In Proceedings of the 2008 Engineering Education (EE2008) Conference (Loughborough, UK, 2008). [WM11] WILSON-MEDHURST S.: Key findings from 2010/11 first year UG first integrative ALL experience. Unpublished Report, Faculty of Engineering and Computing, Coventry University, 2011. [YG96] YATES W., GERDES T.: Problem-based learning in consultation psychiatry. General Hospital Psychiatry 18, 3 (1996), 139–144. c 2012 The Authors Computer Graphics Forum c 2012 The Eurographics Association and Blackwell Publishing Ltd. Interactive Virtual and Augmented Reality Environments 190 8.15 Paper #15 Anderson, E.F., McLoughlin, L., Liarokapis, F., Peters, C., Petridis, P., de Freitas, S. Developing serious games for cultural heritage: a state-of-the-art review, Virtual Reality, Springer, 14(4): 255-275, 2010. Contribution (20%): Write-up of the serious games, virtual and augmented reality sections of the paper. Also co-written the introduction and conclusions. ORIGINAL ARTICLE Developing serious games for cultural heritage: a state-of-the-art review Eike Falk Anderson • Leigh McLoughlin • Fotis Liarokapis • Christopher Peters • Panagiotis Petridis • Sara de Freitas Received: 14 September 2009 / Accepted: 12 October 2010 / Published online: 16 November 2010 Ó Springer-Verlag London Limited 2010 Abstract Although the widespread use of gaming for leisure purposes has been well documented, the use of games to support cultural heritage purposes, such as historical teaching and learning, or for enhancing museum visits, has been less well considered. 
The state-of-the-art in serious game technology is identical to that of the state-ofthe-art in entertainment games technology. As a result, the field of serious heritage games concerns itself with recent advances in computer games, real-time computer graphics, virtual and augmented reality and artificial intelligence. On the other hand, the main strengths of serious gaming applications may be generalised as being in the areas of communication, visual expression of information, collaboration mechanisms, interactivity and entertainment. In this report, we will focus on the state-of-the-art with respect to the theories, methods and technologies used in serious heritage games. We provide an overview of existing literature of relevance to the domain, discuss the strengths and weaknesses of the described methods and point out unsolved problems and challenges. In addition, several case studies illustrating the application of methods and technologies used in cultural heritage are presented. Keywords Cultural heritage Á Serious games Á Computer games technology 1 Introduction Computer games with complex virtual worlds for entertainment are enjoying widespread use, and in recent years we have witnessed the introduction of serious games, including the use of games to support cultural heritage purposes, such as historical teaching and learning, or for enhancing museum visits. At the same time, game development has been fuelled by dramatic advances in computer graphics hardware—in turn driven by the success of video games—which have led to a rise in the quality of real-time computer graphics and increased realism in computer games. The successes of games that cross over into educational gaming—or serious gaming, such as the popular Civilization (although ‘‘abstract and ahistorical’’ (Apperley 2006)) and Total War series of entertainment games, as well as games and virtual worlds that are specifically developed for educational purposes, such as Revolution (Francis 2006) and the Virtual Egyptian Temple (Jacobson and Holden 2005), all of which exist within a cultural heritage context, reveal the potential of these technologies to engage and motivate beyond leisure time activities. The popularity of video games, especially among younger people, makes them an ideal medium for educational purposes (Malone and Lepper 1987). As a result, there has been a trend towards the development of more complex, serious games, which are informed by both pedagogical and game-like, fun elements. The term ‘serious games’ describes a relatively new concept, computer games that are not limited to the aim of providing entertainment, that allow for collaborative use of 3D spaces that E. F. Anderson (&) Á F. Liarokapis Á C. Peters Interactive Worlds Applied Research Group (iWARG), Coventry University, Coventry, UK e-mail: eikea@siggraph.org L. McLoughlin The National Centre for Computer Animation (NCCA), Bournemouth University, Bournemouth, UK P. Petridis Á S. de Freitas Serious Games Institute (SGI), Coventry University, Coventry, UK 123 Virtual Reality (2010) 14:255–275 DOI 10.1007/s10055-010-0177-3 are used for learning and educational purposes in a number of application domains. Typical examples are game engines and online virtual environments that have been used to design and implement games for non-leisure purposes, e.g. in military and health training (Macedonia 2002; Zyda 2005), as well as cultural heritage (Fig. 1). 
This report explores the wider research area of interactive games and related applications with a cultural heritage context and the technologies used for their creation. Modern games technologies (and related optimisations (Chalmers and Debattista 2009) allow the real-time interactive visualisation/simulation of realistic virtual heritage scenarios, such as reconstructions of ancient sites and monuments, while using relatively basic consumer machines. Our aim is to provide an overview of the methods and techniques used in entertainment games that can potentially be deployed in cultural heritage contexts, as demonstrated by particular games and applications, thus making cultural heritage much more accessible. Serious games can exist in the form of mobile applications, simple Web-based solutions, more complex ‘mashup’ applications (e.g. combinations of social software applications) or in the shape of ‘grown-up’ computer games, employing modern games technologies to create virtual worlds for interactive experiences that may include socially based interactions, as well as mixed reality games that combine real and virtual interactions, all of which can be used in cultural heritage applications. This state-of-theart report focuses on the serious games technologies that can be found in modern computer games. The report is divided into two main sections: – The first of these is concerned with the area of cultural heritage and serious games, which integrate the core technologies of computer games with principled pedagogical methodologies. This is explored in a range of characteristic case studies, which include entertainment games that can be used for non-leisure purposes as well as virtual museums and educationally focused and designed cultural heritage projects. – The second part investigates those computer games technologies that are potentially useful for the creation of cultural heritage games, such as real-time rendering techniques, mixed reality technologies and subdomains of (game) artificial intelligence. This literature review includes discussions of strengths and weaknesses of the most prominent methods, indicating potential uses for cultural heritage serious games and illustrating challenges in their application. 2 The state-of-the-art in serious games The state-of-the-art in Serious Game technology is identical to the state-of-the-art in Entertainment Games technology. Both types of computer game share the same infrastructure, or as Zyda notes, ‘‘applying games and simulations technology to non-entertainment domains results in serious games’’ (Zyda 2005). The main strengths of serious gaming applications may be generalised as being in the areas of communication, visual expression of information, collaboration mechanisms, interactivity and entertainment. Over the past decade, there have been tremendous advances in entertainment computing technology, and ‘‘today’s games are exponentially more powerful and sophisticated than those of just three or four years ago’’ (Sawyer 2002), which in turn is leading to very high consumer expectations. Real-time computer graphics can achieve near-photorealism and virtual game worlds are usually populated with considerable amounts of high quality content, creating a rich user experience. In this respect, Zyda (2005) argues that while pedagogy is an implicit component of a serious game, it should be secondary to entertainment, meaning that a serious game that is not ‘fun’ to play would be useless, independent of its pedagogical content or value. 
This view is not shared by all, and there exist design methodologies for the development of games incorporating pedagogic elements, such as the four-dimensional framework (de Freitas and Oliver 2006), which outlines the centrality of four elements that can be used as design and evaluation criteria for the creation of serious games. In any case, there is a need for the game developers and instructional designers to work together to develop engaging and motivating serious games for the future. 2.1 Online virtual environments There is a great range of different online virtual world applications—at least 80 virtual world applications existedFig. 1 ‘Roma Nova’–experiencing ‘Rome Reborn’ as a game 256 Virtual Reality (2010) 14:255–275 123 in 2008 with another 100 planned for 2009. The field is extensive, not just in terms of potential use for education and training but also in terms of actual usage and uptake by users, which is amply illustrated by the online platform Second Life (Linden Labs), which currently has 13 million registered accounts worldwide. The use of Second Life for supporting seminar activities, lectures and other educational purposes has been documented in a number of recent reports and a wide range of examples of Second Life use by UK universities has been documented (Kirriemuir 2008). Online virtual worlds provide excellent capabilities for creating effective distance and online learning opportunities through the provision of unique support for distributed groups (online chat, the use of avatars, document sharing, etc.). This benefit has so far been most exploited in business where these tools have been used to support distributed or location-independent working groups or communities (Jones 2005). Online virtual worlds in this way facilitate the development of new collaborative models for bringing together subject matter experts and tutors from around the world, and in terms of learning communities are opening up opportunities for learning in international cohorts where students from more than one country or location can learn in mixed reality contexts including classroom and non-classroom based groups (https://lg3d-wonderland.dev.java.net). Online virtual worlds also notably offer real opportunities for training, rehearsing and role playing. 2.2 Application to cultural heritage: case studies This section provides an overview of some of the most characteristic case studies in cultural heritage. In particular, the case studies have been categorised into three types of computer-game-like applications including: prototypes and demonstrators, virtual museums and commercial historical games. 2.2.1 Prototypes and demonstrators The use of visualisation and virtual reconstruction of ancient historical sites is not new, and a number of projects have used this approach to study crowd modelling (Arnold et al. 2008; Maim et al. 2007). Several projects are using virtual reconstructions in order to train and educate their users. Many of these systems have, however, never been released to the wider public, and have only been used for academic studies. In the following section, the most significant and promising of these are presented. 2.2.1.1 Roma Nova The Rome Reborn project is the world’s largest digitisation project and has been running for 15 years. The main aims of the project are to produce a high-resolution version of Rome at 320 AD (Fig. 
2), a lower resolution model for creating a ‘mashup’ application with ‘Google Earth’ (http://earth.google.com/rome/), and finally the collaborative mode of the model for use with virtual world applications and aimed primarily at education (Frischer 2008). In order to investigate the efficacy of the Rome Reborn Project for learning, exploration, re-enactment and research of cultural and architectural aspects of ancient Rome the serious game ‘Roma Nova’ is currently under development. In particular, the project aims at investigating the suitability of using this technology to support the archaeological exploration of historically accurate societal aspects of Rome’s life, with an emphasis on political, religious and artistic expressions. To achieve these objectives, the project will integrate four cutting-edge virtual world technologies with the Rome Reborn model, the most detailed three-dimensional model of Ancient Rome available. These technologies include: – the Quest3D visualisation engine (Godbersen 2008) – Instinct(maker) artificial life engine (Toulouse University) (Sanchez et al. 2004) – ATOM Spoken Dialogue System (http://www.agilingua. com) – High-resolution, motion-captured characters and objects from the period (Red Bedlam). The use of the Instinct artificial life engine enables coherent crowd animation and therefore the population of the city of Rome with behaviour-driven virtual characters. These virtual characters with different behaviours can teach the player about different aspects of life in Rome (living conditions, politics, military) (Sanchez et al. 2004). Agilingua ATOM’s dialogue management algorithm allows determining how the system will react: asking questions, making suggestions, and/or confirming an answer. Fig. 2 ‘Rome Reborn’ serious game Virtual Reality (2010) 14:255–275 257 123 This project aims to develop a researchers’ toolkit for allowing archaeologists to test past and current hypotheses surrounding architecture, crowd behaviour, social interactions, topography and urban planning and development, using Virtual Rome as a test-bed for reconstructions. By using such a game the researches will be able to analyse the impact of major events. For example, the use of this technique would allow researchers to analyse the impact of major events, such as grain distribution or the influx of people into the city. The experiences of residents and visitors as they pass through and interact with the ancient city can also be explored. 2.2.1.2 Ancient Pompeii Pompeii was a Roman city, which was destroyed and completely buried in the first recorded eruption of the volcano Mount Vesuvius in 79 AD (Plinius 79a, Plinius 79b). For this project, a model of ancient Pompeii was constructed using procedural methods (Mu¨ller et al. 2005) and subsequently populated with avatars in order to simulate life in Pompeii in real time. The main goal of this project was to simulate a crowd of virtual Romans exhibiting realistic behaviours in a reconstructed district of Pompeii (Maim et al. 2007). The virtual entities can navigate freely in several buildings in the city model and interact with their environment (Arnold et al. 2008). 2.2.1.3 Parthenon Project The Parthenon Project is a short computer animation that ‘‘visually reunites the Parthenon and its sculptural decorations’’ (Debevec 2005). The Parthenon itself is an ancient monument, completed in 437 BC, and stands in Athens, while many of its sculptural decorations reside in the collection of the British Museum, London (UK). 
The project goals were to create a virtual version of the Parthenon and its separated sculptural elements so that they could be reunited in a virtual representation. The project involved capturing digital representations of the Parthenon structure and the separate sculptures, recombining them and then rendering the results. The structure was scanned using a commercial laser range scanner, while the sculptures were scanned using a custom 3D scanning system that the team developed specifically for the project (Tchou 2002). The project made heavy use of image-based lighting techniques, so that the structure could be relit under different illumination conditions within the virtual representation. A series of photographs were taken of the structure together with illumination measurements of the scene’s lighting. An inverse global illumination technique was then applied to effectively ‘remove’ the lighting. The resulting ‘‘lighting-independent model’’ (Debevec et al. 2004) could then be relit using any lighting scheme desired (Tchou et al. 2004; Debevec et al. 2004). Although the Parthenon Project was originally an offline-rendered animation, it has since been converted to work in real-time (Sander and Mitchell 2006; Isidoro and Sander 2006). The original Parthenon geometry represented a large dataset consisting of 90 million polygons (after post-processing), which was reduced to 15 million for the real-time version and displayed using dynamic level-of-detail techniques. Texture data consisted of 300 MB and had to be actively managed and compressed, while 2.1 GB of compressed High-Dynamic Range (HDR) sky maps were reduced in a pre-processing step. The reduced HDR maps were used for lighting, and the extracted sun position was used to cast a shadow map. 2.2.2 Virtual museums Modern interactive virtual museums using games technologies (Jones and Christal 2002; Lepouras and Vassilakis 2004) provide a means for the presentation of digital representations for cultural heritage sites (El-Hakim et al. 2006) that entertain and educate visitors (Hall et al. 2001) in a much more engaging manner than was possible only a decade ago. A recent survey paper that examines all the technologies and tools used in museums was recently published (Sylaiou et al. 2009). Here, we present several examples of this type of cultural heritage serious game, including some virtual museums that can be visited in realworld museums. 2.2.2.1 Virtual Egyptian Temple This game depicts a hypothetical Virtual Egyptian Temple (Jacobson and Holden 2005; Troche and Jacobson 2010), which has no real-world equivalent. The temple embodies all of the key features of a typical New Kingdom period Egyptian temple in a manner that an untrained audience can understand. This Ptolemaic temple is divided into four major areas, each one of which houses an instance of the High Priest, a pedagogical agent.Each areaofthis virtualenvironment represents a different feature from the architecture of that era. The objective of the game ‘Gates of Horus’ (Jacobson et al. 2009) is to explore the model and gather enough information to answer the questions asked by the priest (pedagogical agent). The game engine that this system is based on is the Unreal Engine 2 (Fig. 3) (Jacobson and Lewis 2005), existing both as an Unreal Tournament 2004 game modification (Wallis 2007) for use at home, as well as in the form of a Cave Automatic Virtual Environment (CAVE Cruz-Neira et al. 1992) system in a real museum. 
2.2.2.2 The Ancient Olympic Games

The Foundation of the Hellenic World has produced a number of gaming applications associated with the Olympic Games in ancient Greece (Gaitatzes et al. 2004). For example, in the 'Olympic Pottery Puzzle' exhibit the user must re-assemble a number of ancient vases by putting together pot shards. The users are presented with a colour-coded skeleton of the vessels, with the different colours showing the correct position of the pieces. They then try to select one piece at a time from a heap and place it in the correct position on the vase. Another game is the 'Feidias Workshop', a highly interactive virtual experience taking place at the construction site of the 15-m-tall gold and ivory statue of Zeus, one of the seven wonders of the ancient world. The visitors enter the two-storey-high workshop and come into sight of an accurate reconstruction of an unfinished version of the famous statue of Zeus, and walk among the sculptor's tools, scaffolding, benches, materials and moulds used to construct it. They take the role of the sculptor's assistants and actively help finish the creation of the huge statue, using virtual tools to apply the necessary materials onto the statue, process the ivory and gold plates, apply them onto the wooden supporting core and add the finishing touches. Interaction is achieved using the navigation wand of the Virtual Reality (VR) system, onto which the various virtual tools are attached. Using these tools, the user helps finish the work on the statue, learning about the procedures, materials and techniques applied in the creation of these marvellous statues. The last example is the 'Walk through Ancient Olympia', where the user, apart from visiting the historical site, learns about the ancient games themselves by interacting with athletes in the ancient game of pentathlon (Fig. 4). The visitors can wander around, visit the buildings and learn their history and their function: the Heraion, the oldest monumental building of the sanctuary, dedicated to the goddess Hera; the temple of Zeus, a model of a Doric peripteral temple with magnificent sculpted decoration; the Gymnasium, which was used for the training of javelin throwers, discus throwers and runners; the Palaestra, where the wrestlers, jumpers and boxers trained; the Leonidaion, where the official guests stayed; the Bouleuterion, where athletes, relatives and judges took a vow that they would uphold the rules of the Games; the Treasuries of various cities, where valuable offerings were kept; the Philippeion, which was dedicated by Philip II, king of Macedonia, after his victory in the battle of Chaeronea in 338 BC; and the Stadium, where most of the events took place. Instead of just observing the games, the visitors take part in them. They can pick up the discus or the javelin and try their abilities at throwing them towards the far end of the stadium. Excited by the interaction, visitors ask when they will be able to face the wrestler one on one. A role-playing model of interaction with alternating roles was tried here with good success, as the visitors were truly immersed in the environment and wished they could participate in more of the games (Gaitatzes et al. 2004).

2.2.2.3 Virtual Priory Undercroft

Located in the heart of Coventry, UK, the Priory Undercrofts are the remains of Coventry's original Benedictine monastery, dissolved by Henry VIII.
Although archaeologists revealed the architectural structure of the cathedral, the current site is not easily accessible for the public. Virtual Priory Undercroft offers a virtual exploration of the site in both online and offline configurations. Furthermore, a first version of a serious game (Fig. 5) has been developed at Coventry University, using the Object-Oriented Graphics Rendering Engine (OGRE) (Wright and Madey 2008). The motivation is to raise the interest of children in the museum, as well as cultural heritage in general. The aim of the game is to solve a puzzle by collecting medieval objects that used to be located in and around the Priory Undercroft. Each time a new object is found, the user is prompted to answer a question related to the history of the site. A typical userinteraction might take the form of: ‘‘What did St. George slay?–Hint: It is a mythical creature. –Answer: The Dragon’’, meaning that the user then has to find the Dragon. 2.2.3 Commercial historical games Commercial games with a cultural heritage theme are usually of the ‘documentary game’ (Burton 2005) genre that depict real historical events (frequently wars and battles), which the human player can then partake in. These are games that were primarily created for entertainment, but their historical accuracy allows them to be used in educational settings as well. 2.2.3.1 History Line: 1914–1918 An early representative of this type of game was History Line: 1914–1918 (Blue Byte 1992), an early turn-based strategy game depicting the events of the First World War The game was realised using the technology of the more prominent game Battle Isle, Fig. 3 New Kingdom Egyptian temple game Virtual Reality (2010) 14:255–275 259 123 providing players with a 2D top-down view of the game world, divided into hexagons that could be occupied by military units, with the gameplay very much resembling traditional board games. The game’s historical context was introduced in a long (animated) introduction, depicting the geo-political situation of the period and the events leading up to the outbreak of war in 1914. In between battles the player is provided with additional information on concurrent events that shaped the course of the conflict, which is illustrated with animations and newspaper clippings from the period. 2.2.3.2 Great Battles of Rome More recently, a similar approach was used by the History Channel’s Great Battles of Rome (Slitherine Strategies 2007), another ‘documentary game’, which mixes interactive 3D real-time tactical simulation of actual battles with documentary information (Fig. 6), including footage originally produced for TV documentaries, which places the battles in their historical context. 2.2.3.3 Total War The most successful representatives of this type of historical game are the games of the Creative Assembly’s Total War series, which provide a gameplay combination of turn-based strategy (for global events) and real-time tactics (for battles). Here, a historical setting is enriched with information about important events and developmentsthatoccurredduringthe timeframe experienced by the player. While the free-form campaigns allow the game’s players to change the course of history, the games also include several independent battle-scenarios with historical background information that depict real events and allow players to partake in moments of historical significance. Fig. 4 Walk through Ancient Olympia (Gaitatzes et al. 2004) Fig. 
5 Priory Undercroft—a serious game 260 Virtual Reality (2010) 14:255–275 123 The use of up-to-date games technology for rendering, as well as the use of highly detailed game assets that are reasonably true to the historical context, enables a fairly realistic depiction of history. As a result, games from the Total War series have been used to great effect in the visualisation of armed conflicts in historical programmes produced for TV (Waring 2007). The latest titles in the series, ‘Empire: Total War’ (released in 2009), depicting events from the start of the eighteenth century to the middle of the nineteenth century, and ‘Napoleon: Total War’ (released in 2010), depicting European history during the Napoleonic Wars, make use of some of the latest developments in computer games technology (Fig. 7). The games’ renderer is scalable to support different types of hardware, including older systems, especially older graphics cards (supporting the programmable Shader Model 2), but the highest visual fidelity is only achieved on recent systems (Shader Model 3 graphics hardware) (Gardner 2009). If the hardware allows for this, shadows for added realism in the virtual world are generated using Screen Space Ambient Occlusion (Mittring 2007; Bavoil and Sainz 2008), making use of existing depth-buffer information in rendered frames. Furthermore the virtual world of the game is provided with realistic vegetation generated by the popular middleware system SpeedTree (Interactive Data Visualization, Inc.), which ‘‘features realistic tree models and proves to be able to visualise literally thousands of trees in real-time’’ (Fritsch and Kada 2004). As a result, the human player is immersed in the historical setting, allowing the player to re-live history. 3 The technology of cultural heritage serious games Modern interactive virtual environments are usually implemented using game engines, which provide the core technology for the creation and control of the virtual world. A game engine is an open, extendable software system on which a computer game or a similar application can be built. It provides the generic infrastructure for game creation (Zyda 2005), i.e. I/O (input/output) and resource/asset management facilities. The possible components of game engines include, but are not limited to the following: rendering engine, audio engine, physics engine, animation engine. 3.1 Virtual world system infrastructure The shape that the infrastructure for a virtual environment takes is dictated by a number of components, defined by function rather than organisation, the exact selection of which determines the tasks that the underlying engine is suitable for. A game engine does not provide data or functions that could be associated with any game or other application of the game engine (Zerbst et al. 2003). Furthermore, a game engine is not just an API (Application Programming Interface), i.e. a set of reusable components that can be transferred between different games, but also provides a glue layer that connects its component parts. It is this glue layer that sets a game engine apart from an API, making it more than the sum of its components and sub- systems. Modern game engines constitute complex parallel systems that compete for limited computing resources (Blow 2004). They ‘‘provide superior platforms for rendering multiple views and coordinating real and simulated scenes as well as supporting multiuser interaction’’ (Lewis and Jacobson 2002), employing advanced graphics techniques to create virtual environments. 
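To make the role of this 'glue layer' more concrete, the following minimal sketch (Python, with class and method names invented purely for illustration and not taken from any engine discussed here) shows an engine core that owns a shared world state and drives physics and rendering subsystems from a single fixed-order update loop.

class Subsystem:
    """Interface the engine core expects every component to expose."""
    def update(self, dt, world):
        raise NotImplementedError

class PhysicsEngine(Subsystem):
    def update(self, dt, world):
        # Integrate simple linear motion for every entity in the shared world state.
        for entity in world["entities"]:
            entity["position"] = [p + v * dt
                                  for p, v in zip(entity["position"], entity["velocity"])]

class RenderingEngine(Subsystem):
    def update(self, dt, world):
        # A real renderer would submit geometry to the GPU; here we just report state.
        for entity in world["entities"]:
            print("draw", entity["name"], "at", [round(p, 3) for p in entity["position"]])

class EngineCore:
    """The 'glue layer': owns the subsystems and the world, and runs the main loop."""
    def __init__(self, subsystems):
        self.subsystems = subsystems
        self.world = {"entities": []}

    def spawn(self, name, position, velocity):
        self.world["entities"].append(
            {"name": name, "position": list(position), "velocity": list(velocity)})

    def run(self, frames, dt=1.0 / 60.0):
        for _ in range(frames):
            for subsystem in self.subsystems:  # fixed update order: simulate, then draw
                subsystem.update(dt, self.world)

if __name__ == "__main__":
    engine = EngineCore([PhysicsEngine(), RenderingEngine()])
    engine.spawn("avatar", position=(0.0, 1.0, 0.0), velocity=(1.0, 0.0, 0.0))
    engine.run(frames=3)

Real engines add resource management, event dispatch and scripting hooks around the same basic loop, but the coupling point, a core that schedules otherwise independent subsystems over shared state, is what distinguishes an engine from a loose collection of libraries.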
Anderson et al. (2008) provide a discussion of several challenges and open problems regarding game engines, which include the precise definition of the role of content creation tools in the game Fig. 6 Great Battles of Rome Fig. 7 Reliving the battle of Brandywine Creek (McGuire 2006) in ‘Empire: Total War’ Virtual Reality (2010) 14:255–275 261 123 development process and as part of game engines, as well as the identification of links between game genres and game engine architecture, both of which play a crucial role in the process of selecting an appropriate game engine for a given project. Frequently, the technology used for the development of virtual environments, be they games for entertainment, serious games or simulations, is limited by the development budget. Modern entertainment computer games frequently require ‘‘a multimillion-dollar budget’’ (Overmars 2004) that can now rival the budgets of feature film productions, a significant proportion of which will be used for asset creation (such as 3D models and animations). Some of these costs can be reduced through the use of procedural modelling techniques for the generation of assets, including terrain (Noghani et al. 2010), vegetation (Lintermann and Deussen 1999) or whole urban environments (Vanegas et al. 2009). Game developers are usually faced with the choice of developing a proprietary infrastructure, i.e. their own game engine, or to use an existing engine for their virtual world application. Commercially developed game engines are usually expensive, and while there are affordable solutions, such as the Torque game engine which is favoured by independent developers and which has been successfully used in cultural heritage applications (Leavy et al. 2007; Mateevitsi et al. 2008), these generally provide fewer features, thus potentially limiting their usefulness. If one of the project’s requirements is the use of highly realistic graphics with a high degree of visual fidelity, this usually requires a recent high-end game engine, the most successful of which usually come at a very high licensing fee. There are alternatives, however, as several older commercially developed engines have been released under Open Source licences, such as the Quake 3 engine (id Tech 3) (Smith and Trenholme 2008; Wright and Madey 2008), making them easily accessible, and while they do not provide the features found in more recently published games, they nevertheless match the feature sets of the cheaper commercial engines. Furthermore, there exist open source game engines such as the Nebula Device (Re´mond and Mallard 2003), or engine components, such as OGRE (Re´mond and Mallard 2003; Wright and Madey 2008) or ODE (Open Dynamics Engine) (Macagon and Wu¨nsche 2003), which are either commercially developed or close to commercial quality, making them a viable platform for the development of virtual worlds, although they may lack the content creation tools that are frequently packaged with larger commercial engines. Finally, there is the possibility of taking an existing game and modifying it for one’s own purposes, which many recent games allow users to do (Wallis 2007; Smith and Trenholme 2008). This has the benefit of small up-front costs, as the only requirement is the purchase of a copy of the relevant game, combined with access to highspec modern game engines, as well as the content development tools that they contain. 
Examples for this are the use of the game Civilization III for the cultural heritage game The History Game Canada (http://historycanadagame.com) or the use of the Unreal Engine 2 (Smith and Trenholme 2008) for the development of an affordable CAVE (Jacobson and Lewis 2005), which has been used successfully in cultural heritage applications (Jacobson and Holden 2005). 3.2 Virtual world user interfaces There are different types of interface that allow users to interact with virtual worlds. These fall into several different categories, such as VR and Augmented Reality (AR), several of which are especially useful for cultural heritage applications, and which are presented in this section. 3.2.1 Mixed reality technologies In 1994, (Milgram and Kishino 1994) tried to depict the relationship between VR and AR. To illustrate this, he introduced two new terms called Mixed Reality (MR), which is a type of VR but has a wider concept than AR, (Tamura et al. 2001) and Augmented Virtuality (AV). On the left-hand side of the Reality-Virtuality continuum, there is the representation of the real world and on the right-hand side there is the ultimate synthetic environment. MR stretches out in-between these environments, and it can be divided into two sub-categories: AR and AV (Milgram and Kishino 1994). AR expands towards the real world, and thus it is less synthetic than AV which expands towards virtual environments. To address the problem from another perspective, a further distinction has been made. This refers to all the objects that form an AR environment: real objects and virtual objects. Real objects are these, which always exist no matter what the external conditions may be. On the other hand, a virtual object depends on external factors but mimics objects of reality. Some of the most interesting characteristics that distinguish virtual objects, which include holograms and mirror images, and real objects are illustrated below (Milgram and Kishino 1994). The most obvious difference is that a virtual object can only be viewed through a display device after it has been generated and simulated. Real-world objects that exist in essence, on the contrary, can be viewed directly and/or through a synthetic device. Another factor is the quality of viewed images that are generated using state-of-the-art technologies. Virtual information cannot be sampled directly but must be synthesised, therefore, depending on the chosen resolution, displayed objects may appear real, 262 Virtual Reality (2010) 14:255–275 123 but their appearance does not guarantee that the objects are real. Virtual and real information may be distinguished depending on the luminosity of the location that it appears in. Images of real-world objects receive lighting information from the position at which they appear to be located while virtual objects do not necessarily, unless the virtual scene is lit exactly like the real-world location in which objects appear to be displayed. This is true for directly viewed real-world objects, as well as displayed images of indirectly viewed objects. 3.2.2 Virtual reality Ivan Sutherland originally introduced the first Virtual Reality (VR) system in the 1960s (Sutherland 1965). Nowadays VR is moving from the research laboratories to the working environment by replacing ergonomically limited HMDs (Head-Mounted Displays) with projective displays (such as the well known CAVE and Responsive Workbench) as well as online VR communities. 
In a typical VR system the user’s natural sensory information is completely replaced with digital information. The user’s experience of a computer-simulated environment is called immersion. As a result, VR systems can completely immerse a user inside a synthetic environment by blocking all the signals of the real world. In addition, a VR simulated world does not always have to obey all laws of nature. In immersive VR systems, the most common problems of VR systems are of emotional and psychological nature including motion sickness, nausea, and other symptoms, which are created by the high degree of immersiveness of the users. Moreover, internet technologies have the tremendous potential of offering virtual visitors ubiquitous access via the World Wide Web (WWW) to online virtual environments. Additionally, the increased efficiency of Internet connections (i.e. ADSL/broadband) makes it possible to transmit significant media files relating to the artefacts of virtual museum exhibitions. The most popular technology for WWW visualisation includes Web3D which offers tools such as the Virtual Reality Modeling Language (VRML–http://www.web3d.org/x3d/vrml/) and its successor X3D (http://www.web3d.org/x3d/), which can be used for the creation of an interactive virtual museum. Many cultural heritage applications based on VRML have been developed for the Web (Gatermann 2000; Paquet et al. 2001; Sinclair and Martinez 2001). Another 3D graphics format, is COLLAborative Design Activity (COLLADA – https://collada.org) which defines an open standard XML schema (http://www.w3.org/XML/Schema) for exchanging digital assets among various graphics software applications that might otherwise store their assets in incompatible formats. One of the main advantages of COLLADA is that it includes more advanced physics functionality such as collision detection and friction (which Web3D does not support). In addition to these, there are more powerful technologies that have been used in museum environments, which include the OpenSceneGraph (OSG) high performance 3D graphics toolkit (http://www.openscenegraph.org/projects/ osg) and a variety of 3D game engines. OSG is a freely available (open source) multi-platform toolkit, used by museums (Calori et al. 2005: Looser et al. 2006) to generate more powerful VR applications, especially in terms of immersion and interactivity since it supports the integration of text, video, audio and 3D scenes into a single 3D environment. An alternative to OpenSceneGraph, is OpenSG, which is an open-source scene graph system used to create real-time VR applications (http://www.opensg. org/) On the other hand, 3D game engines are also very powerful and they provide superior visualisation and physics support. Both technologies (OSG and 3D game engines), compared to VRML and X3D, can provide very realistic and immersive museum environments but they have two main drawbacks. First, they require advanced programming skills in order to design and implement custom applications. Secondly, they do not have support for mobile devices such as PDAs and third-generation mobile phones. 3.2.3 Augmented reality The concept of AR is the opposite of the closed world of virtual spaces (Tamura et al. 1999) since users can perceive both virtual and real information. Compared to VR systems, most AR systems use more complex software approaches, usually including some form of computer vision techniques (Forsyth and Ponce 2002) for sensing the real world. 
The basic theoretical principle is to superimpose digital information directly into a user’s sensory perception (Feiner 2002), rather than replacing it with a completely synthetic environment as VR systems do. An interesting point is that both technologies—AR and VR— may process and display the same digital information and that they often make use of identical dedicated hardware. Although AR systems are influenced by the same factors, the amount of influence is much less than in VR since only a portion of the environment is virtual. However, there is still a lot of research to be done in AR (Azuma 1997; Azuma et al. 2001; Livingston 2005) to measure accurately its effects on humans. The requirements related to the development of AR applications in the cultural heritage field have been well documented (Brogni et al. 1999; Liarokapis et al. 2008; Sylaiou et al. 2009). An interactive concept is the MetaMuseum visualised guide system based on AR, which tries to establish scenarios and provide a communication Virtual Reality (2010) 14:255–275 263 123 environment between the real world and cyberspace (Mase et al. 1996). Another AR system that could be used as an automated tour guide in museums is the automated tour guide, which superimposes audio in the world based on the location of the user (Bederson 1995). There are many ways in which archaeological sources can be used to provide a mobile AR system. Some of the wide range of related applications includes the initial collection of data to the eventual dissemination of information (Ryan 2000). MARVINS is an AR assembly, initially designed for mobile applications and can provide orientation and navigation possibilities in areas, such as science museums, art museums and other historical or cultural sites. Augmented information like video, audio and text is relayed from a server via the transmitter-receiver to a head-mounted display (Sanwal et al. 2000). In addition, a number of EU projects have been undertaken in the field of virtual heritage. The SHAPE project (Hall et al. 2001) combined AR and archaeology to enhance the interaction of persons in public places like galleries and museums by educating visitors about artefacts and their history. The 3DMURALE project (Cosmas et al. 2001) developed 3D multimedia tools to record, reconstruct, encode and visualise archaeological ruins in virtual reality using as a test case the ancient city of Sagalassos in Turkey. The Ename 974 project (Pletinckx et al. 2000) developed a non-intrusive interpretation system to convert archaeological sites into open-air museums, called TimeScope-1 based on 3D computer technology originally developed by IBM, called TimeFrame. ARCHEOGUIDE (Stricker et al. 2001) provides an interactive AR guide for the visualisation of archaeological sites based on mobile computing, networking and 3D visualisation providing the users with a multi-modal interaction user interface. A similar project is LIFEPLUS (Papagiannakis et al. 2002), which explores the potential of AR so that users can experience a high degree of realistic interactive immersion by allowing the rendering of realistic 3D simulations of virtual flora and fauna (humans, animals and plants) in real-time. AR technologies can be combined with existing game engine subsystems to create AR game engines (Lugrin and Cavazza 2010) for the development of AR games. AR has ben applied successfully to gaming in cultural heritage. One of the earliest examples is the Virtual Showcase (Bimber et al. 
2001) which is an AR display device that has the same form factor as a real showcase traditionally used for museum exhibits and can be used for gaming. The potentials of AR interfaces in museum environments and other cultural heritage institutions (Liarokapis 2007) as well as outdoor heritage sites (Vlahakis et al. 2002) have been also briefly explored for potential educational applications. A more specific gaming example are the MAGIC and TROC systems (Renevier et al. 2004) which were based on a study of the tasks of archaeological fieldwork, interviews and observations in Alexandria. This takes the form of a mobile game in which the players discover archaeological objects while moving. Another cultural heritage AR application is the serious game SUA that was part of the BIDAIATZERA project (Linaza et al. 2007). This project takes the form of a play which recreates the 1813 battle between the English and the French in San Sebastian. Researchers developed an interactive system based on AR and VR technologies for recreational and educational applications with tourist, cultural and socio-economical contents, the prototype for which was presented at the Museo del Monte Urgull in San Sebastian. 3.3 Advanced rendering techniques One of the most important elements of the creation of interactive virtual environments is the visual representation of these environments. Although serious games have design goals that are different from those of pure entertainment video games, they can still make use of the wide variety of graphical features and effects that have been developed in recent years. The state-of-the-art in this subject area is broad and, at times, it can be difficult to specify exactly where the ‘cutting edge’ of the development of an effect lies. A number of the techniques that are currently in use were originally developed for offline applications and have only recently become adopted for use in real-time applications through improvements in efficiency or hardware. Here, the ‘state-of-the-art’ for realtime lags several years behind that for offline—good examples of this would be raytracing or global illumination, which we shall briefly examine. A number of effects, however, are developed specifically for immediate deployment on current hardware and can make use of specific hardware features—these are often written by hardware providers themselves to demonstrate their use or, of course, by game developers. Other real-time graphical features and effects can be considered to follow a development cycle, where initially they are proven in concept demonstrations or prototypes, but are too computationally expensive to implement in a full application or game. Over time these techniques may then be progressively optimised for speed, or held back until the development of faster hardware allows their use in computer games. The primary reason for the proliferation of real-time graphics effects has been due to advances in low-cost graphics hardware that can be used in standard PCs or games consoles. Modern graphics processing units (GPUs) are extremely powerful parallel processors and the graphics pipeline is becoming increasingly flexible. 
Through the use 264 Virtual Reality (2010) 14:255–275 123 of programmable shaders, which are small programs that define and direct part of the rendering process, a wide variety of graphical effects are now possible for inclusion in games and virtual environments, while there also exist a range of effects that are currently possible but still too expensive for practical use beyond anything but the display of simple scenes. The graphics pipeline used by modern graphics hardware renders geometry using rasterisation, where an object is drawn as triangles which undergo viewing transformations before they are converted directly into pixels. In contrast, ray-tracing generates a pixel by firing a corresponding ray into the scene and sampling whatever it may hit. While the former is generally faster, especially using the hardware acceleration on modern graphics cards, it is easier to achieve effects such as reflections using raytracing. Although the flexibility of modern GPUs can allow ray-tracing (Purcell et al. 2002) in real-time (Horn et al. 2007; Shirley 2006), as well as fast ray-tracing now becoming possible on processors used in games consoles (Benthin et al. 2006), rasterisation is currently still the standard technique for computer games. Although the modern graphics pipeline is designed and optimised to rasterise polygonal geometry, it should be noted that other types of geometry exist. Surfaces may be defined using a mathematical representation, while volumes may be defined using ‘3D textures’ of voxels or, again, using a mathematical formula (Engel et al. 2006). The visualisation of volumetric ‘objects’, which are usually semi-opaque, is a common problem that includes features such as smoke, fog and clouds. A wide variety of options exist for rendering volumes (Engel et al. 2006; Cerezo et al. 2005), although these are generally very computationally expensive and it is common to emulate a volumetric effect using simpler methods. This often involves drawing one or more rectangular polygons to which a fourchannel texture has been applied (where the fourth, alpha, channel represents transparency)—for example a cloud element or wisp of smoke. These may be aligned to always face the viewer as billboards (Akenine-Mo¨ller et al. 2008), a common game technique with a variety of uses (Watt and Policarpo 2005), or a series of these may be used to slice through a full volume at regular intervals. An alternative method for rendering full volumes is ray-marching, where a volume is sampled at regular intervals along a viewing ray, which can now be implemented in a shader (Crassin et al. 2009), or on processors that are now being used in games consoles (Kim and Jaja 2009). It is sometimes required to render virtual worlds, or objects within worlds, that are so complex or detailed that they cannot fit into the graphics memory, or even the main memory, of the computer—this can be especially true when dealing with volume data. Assuming that the hardware cannot be further upgraded, a number of options exist for such rendering problems. If the scene consists of many complex objects at varying distances, it may be possible to adopt a level-of-detail approach (Engel et al. 2008) and use less complex geometry, or even impostors (AkenineMo¨ller et al. 2008), to approximate distant objects (Sander and Mitchell 2006). 
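As an aside on the ray-marching approach to volumes mentioned above, the sketch below (Python, with a toy procedural density field standing in for real volume data) performs front-to-back compositing of colour and opacity at regular sample intervals along a single viewing ray; a GPU shader implementation would run the same loop once per pixel. It is a minimal illustration under those assumptions, not a reproduction of any of the cited techniques.

import math

def density(x, y, z):
    """Toy volumetric 'cloud': a soft sphere of smoke centred at the origin."""
    r = math.sqrt(x * x + y * y + z * z)
    return max(0.0, 1.0 - r)  # density falls off linearly with distance

def ray_march(origin, direction, step=0.05, max_distance=4.0):
    """Front-to-back compositing along one viewing ray."""
    colour = 0.0         # accumulated greyscale radiance
    transmittance = 1.0  # how much background light still gets through
    t = 0.0
    while t < max_distance and transmittance > 0.01:
        px = origin[0] + direction[0] * t
        py = origin[1] + direction[1] * t
        pz = origin[2] + direction[2] * t
        d = density(px, py, pz)
        if d > 0.0:
            alpha = 1.0 - math.exp(-d * step * 5.0)  # opacity of this sample slab
            colour += transmittance * alpha * 0.8    # 0.8 = constant smoke brightness
            transmittance *= (1.0 - alpha)
        t += step
    return colour, transmittance

if __name__ == "__main__":
    c, t = ray_march(origin=(0.0, 0.0, -2.0), direction=(0.0, 0.0, 1.0))
    print("pixel value %.3f, remaining transmittance %.3f" % (c, t))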
Alternatively, if only a small subsection of the world or object is in sight at any one time, it may be possible to hold only these visible parts in memory and ‘stream’ replace them as new parts come into view, which is usually achieved by applying some form of spatial partitioning (Crassin et al. 2009). This streaming approach can also be applied to textures that are too large to fit into graphics memory (Mittring and Crytek 2008). If too much is visible at one time for this to be possible, a cluster of computers may be used, where the entire scene is often too large for a single computer to hold in memory but is able to be distributed among the cluster with the computers’ individual renders being accumulated and composited together (Humphreys et al. 2002) or each computer controlling part of a multi-screen tile display (Yin et al. 2006). 3.3.1 Post-processing effects One important category of graphical effect stems from the ability to render to an off-screen buffer, or even to multiple buffers simultaneously, which can then be used to form a feedback loop. A polygon may then be drawn (either to additional buffers or to the visible framebuffer) with the previously rendered texture(s) made available to the shader. This shader can then perform a variety of ‘postprocessing’ effects. Modern engines frequently include a selection of such effects (Feis 2007), which can include more traditional image processing, such as colour transformations (Burkersroda 2005; Bjorke 2004), glow (James and O’Rorke 2004), or edge-enhancement (Nienhaus and Do¨llner 2003), as well as techniques that require additional scene information such as depth of field (Gillham 2007; Zhou et al. 2007), motion blur (Rosado 2008) and others which will be mentioned in specific sections later. The extreme of this type of technique is deferred shading, where the entire lighting calculations are performed as a ‘post-process’. Here, the scene geometry is rendered into a set of intermediate buffers, collectively called the G-buffer, and the final shading process is performed in image-space using the data from those buffers (Koonce 2008). 3.3.2 Transparency, reflection and refraction The modern real-time graphics pipeline does not deal with the visual representation of transparency, reflection or Virtual Reality (2010) 14:255–275 265 123 refraction and their emulation must be dealt with using special cases or tricks. Traditionally, transparency has been emulated using alpha blending (Akenine-Mo¨ller et al. 2008), a compositing technique where a ‘transparent pixel’ is combined with the framebuffer according to its fourth colour component (alpha). The primary difficulty with this technique is that the results are order dependent, which requires the scene geometry to be sorted by depth before it is drawn and transparency can also present issues when using deferred shading (Filion and McNaughton 2008). A number of order-independent transparency techniques have been developed, however, such as depth-peeling (Everitt 2001; Nagy and Klein 2003). Mirrored background reflections may be achieved using an environment map (Blinn and Newell 1976; Watt and Policarpo 2005), which can be a simple but effective method of reflecting a static scene. If the scene is more dynamic, but relatively fast to render, reflections on a flat surface may be achieved by drawing the reflective surface as transparent and mirroring the entire scene geometry about the reflection surface, drawing the mirrored geometry behind it (Fig. 
8) or, for more complex scenes, using reduced geometry methods such as impostors (Tatarchuk and Isidoro 2006). Alternatively, six cameras can be used to produce a dynamic environment map (Blythe 2006). Alternative methods have also been developed to address the lack of parallax, i.e. apparent motion offsets due to objects at different distances, which are missing in a fixed environment map (Yu et al. 2005). Perhaps surprisingly on first note, simple refraction effects can be achieved using very similar techniques to those used for reflection. The only differences are that the sample ray direction points inside the object and that it is bent due to the difference in refractive indices of the two materials, in accordance with Snell’s Law (Akenine-Mo¨ller et al. 2008). Thus, environment mapping can be used for simple refractions in a static scene, which may be expanded to include chromatic dispersion (Fernando and Kilgard 2003). In some cases, refraction may also be achieved as a post-processing effect (Wyman 2007). 3.3.3 Surface detail The simplest method of adding apparent detail to a surface, without requiring additional geometry, is texture mapping. The advent of pixel shaders means that textures can now be used in more diverse ways to emulate surface detail (Rost 2006; Watt and Policarpo 2005; Akenine-Mo¨ller et al. 2008). A variety of techniques exist for adding apparent highresolution bump detail to a low-resolution mesh. In normal mapping (Blinn 1978) the texture map stores surface normals, which can then be used for lighting calculations. Parallax mapping (Kaneko et al. 2001) uses a surface height map and the camera direction to determine an offset for texture lookups. Relief texture mapping (Oliveira et al. 2000; Watt and Policarpo 2005) is a related technique which performs a more robust ray-tracing of the height map and can provide better quality results at the cost of performance. 3.3.4 Lighting The old fixed-function graphics pipeline supported a pervertex Gouraud lighting model [OpenGL ARB], but programmable shaders now allow the developer to implement their own lighting model (Rost 2006; Hoffman 2006). In general, though, the fixed-function lighting equation is split into: a diffuse component, where direct lighting is assumed to be scattered by micro-facets on the surface; a specular component, which appears as a highlight and is dependent on the angle between the viewer and the light; and an ambient component, which is an indirect ‘background’ lighting component due to light that has bounced off other objects in the scene (Akenine-Mo¨ller et al. 2008). 3.3.4.1 Shadows Although the graphics pipeline did not originally support shadows, it does now provide hardware acceleration for texture samples of a basic shadow map (Akenine-Mo¨ller et al. 2008; Engel et al. 2008). However, this basic method suffers from aliasing issues, is typically low resolution and can only result in hard shadow edges. Except in certain conditions, the majority of shadows in the real world exhibit a soft penumbra, so there is a desire within computer graphics to achieve efficient soft shadows, for which a large number of solutions have been developed (Hasenfratz et al. 2003; Bavoil 2008). Shadowing complex Fig. 8 Achieving a mirror effect by rendering the geometry twice (Anderson and McLoughlin 2007) 266 Virtual Reality (2010) 14:255–275 123 objects such as volumes can also present issues, many of which have also been addressed (Lokovic and Veach 2000; Hadwiger et al. 2006; Ropinski et al. 2008). 
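The split of the lighting equation into ambient, diffuse and specular terms, and the effect of perturbing the surface normal as a normal map does, can be illustrated in a few lines. The Python sketch below evaluates a simple Blinn-Phong style shade for one surface point; the vectors and material constants are arbitrary illustrative values, and a real shader would fetch the normal from a texture rather than take it as an argument.

import math

def normalise(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shade(normal, light_dir, view_dir,
          ambient=0.1, diffuse_k=0.7, specular_k=0.4, shininess=32):
    """Per-pixel Blinn-Phong: ambient + diffuse + specular, all greyscale."""
    n = normalise(normal)    # would come from a normal map lookup
    l = normalise(light_dir)
    v = normalise(view_dir)
    h = normalise(tuple(li + vi for li, vi in zip(l, v)))  # half vector
    diffuse = diffuse_k * max(0.0, dot(n, l))
    specular = specular_k * (max(0.0, dot(n, h)) ** shininess)
    return ambient + diffuse + specular

if __name__ == "__main__":
    flat = shade(normal=(0.0, 0.0, 1.0), light_dir=(0.3, 0.3, 1.0), view_dir=(0.0, 0.0, 1.0))
    # The same geometry lit with a perturbed ('bumped') normal, as a normal map would supply:
    bumped = shade(normal=(0.2, -0.1, 0.97), light_dir=(0.3, 0.3, 1.0), view_dir=(0.0, 0.0, 1.0))
    print("flat surface %.3f, normal-mapped surface %.3f" % (flat, bumped))

Comparing the two calls shows how the same light and view directions produce a different intensity once the normal is perturbed, which is all that normal mapping exploits.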
3.3.4.2 High-Dynamic Range Lighting HDR Lighting is a technique that has become very popular in modern games (Sherrod 2006; Engel et al. 2008). It stems from the fact that real world luminance has a very high dynamic range, which means that bright surface patches are several orders of magnitude brighter than dark surface patches—for example, the sun at noon ‘‘may be 100 million times brighter than starlight’’ (Reinhard et al. 2006). In general, this means that the 8-bit integers traditionally used in each component of the RGB triplet of pixels in the framebuffer, are woefully inadequate for representing real luminance ranges. Thankfully, modern hardware now allows a greater precision in data types, so that calculations may be performed in 16 or even 32-bit floating-point format, although it should be noted that a performance penalty usually occurs when using more precise formats. One of the most striking visual effects associated with HDR lighting is bloom, where extremely bright patches appear to glow. Practically, this is usually applied as a postprocess effect in a similar way to a glow effect, where bright patches are drawn into a separate buffer which is blurred and then combined with the original image (Kawase 2004; Kawase 2003). This can also be applied to low-dynamic range images, to make them appear HDR (Sousa 2005). Modern displays still use the traditional 8-bit per colour component format (with a few exceptions (Seetzen et al. 2004)), so the HDR floating point results must be converted, which is the process of tonemapping (Reinhard et al. 2006). Some tonemapping methods allow the specification of a brightness, or exposure value as taken from a physical camera analogy. In an environment where the brightness is likely to change dramatically this exposure should be automatically adjusted—much like a real camera does today. Various methods are available to achieve this, such as by downsampling the entire image to obtain the average brightness (Kawase 2004), or by asynchronous queries to build a basic histogram of the brightness level to determine the required exposure (McTaggart et al. 2006; Sheuermann and Hensley 2007). 3.3.4.3 Indirect lighting: global illumination Incident light on a surface can originate either directly from a light source, or indirectly from light reflected by another surface. Global illumination techniques account for both of these sources of light, although in such methods it is the indirect lighting component that is usually of most interest and the most difficult to achieve. The main difficulty is that in order to render a surface patch, the light that is reflected by all other surface patches in the scene must be known. This interdependence can be costly to compute, especially for dynamic scenes, and although indirect lighting accounts for a high proportion of real world illumination, the computational cost of simulating its effects has resulted in very limited use within real-time applications (Dutr et al. 2003). The simplest inclusion of indirect lighting is through pre-computed and baked texture maps, which can store anything from direct shadows or ambient occlusion results to those from radiosity or photon mapping (Mittring 2007). However, this technique is only viable for completely static objects within a static scene. Another simple global illumination technique, which is commonly associated with HDR lighting, is image-based lighting (Reinhard et al. 2006). 
Here, an environment map stores both direct and indirect illumination as a simple HDR image, which is then used to light objects in the scene. The image may be captured from a real-world location, drawn by an artist as an art asset or generated in a pre-processing stage by sampling the virtual environment. Multiple samples can then be used to light a dynamic character as it moves through the (static) environment (Mitchell et al. 2006). Although the results can be very effective, image-based lighting cannot deal with fully dynamic sceneswithouthavingto recompute the environment maps, which may be costly. Fully dynamic global illumination techniques generally work on reduced or abstracted geometry, such as using discs to approximate the geometry around each vertex for ambient occlusion (Shanmugam and Arikan 2007; Hoberock and Jia 2008). It is also possible to perform some operations as a post-process, such as ambient occlusion (Mittring 2007) and even approximations for single-bounce indirect lighting (Ritschel et al. 2009). The general-purpose use of the GPU has also allowed for radiosity at near real-time for very small scenes (Coombe and Harris 2005) and fast, but not yet real-time, photon mapping (Purcell et al. 2003). The latter technique can also be used to simulate caustics, which are bright patches due to convergent rays from a refractive object, in real-time on the GPU (Kru¨ger et al. 2006), although other techniques for specifically rendering caustics are also possible (Wand and Straßer 2003), including as an image-space post-process effect (Wyman 2007), or by applying the ’Caustic Cones’ that utilise an intensity map generated from real photographic images (Kider et al. 2009). 3.4 Artificial intelligence Another important aspect of the creation of populated virtual environments as used in cultural heritage applications is the creation of intelligent behaviour for the inhabitants of the virtual world, which is achieved using artificial intelligence (AI) techniques. Virtual Reality (2010) 14:255–275 267 123 It is important to understand that when we refer to the AI of virtual entities in virtual environments, that which we refer to is not truly AI—at least not in the conventional sense (McCarthy 2007) of the term. The techniques applied to virtual worlds, such as computer games, are usually a mixture of AI related methods whose main concern is the creation of a believable illusion of intelligence (Scott 2002), i.e. the behaviour of virtual entities only needs to be believable to convey the presence of intelligence and to immerse the human participant in the virtual world. The main requirement for creating the illusion of intelligence is perception management, i.e. the organisation and evaluation of incoming data from the AI entity’s environment. This perception management mostly takes the form of acting upon sensor information but also includes communication between or coordination of AI entities in environments which are inhabited by multiple entities which may have to act co-operatively. The tasks which need to be solved in most modern virtual world applications such as computer games and to which the intelligent actions of the AI entities are usually restricted to (by convention rather than technology) are (Anderson 2003): – decision making – path finding (planning) – steering (motion control) The exact range of problems that AI entities within a computer game have to solve depends on the context in which they exists and the virtual environment in which the game takes place. 
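Of the three tasks listed above, steering is the easiest to isolate. The sketch below (Python, with made-up speeds and force limits) implements a basic 'seek' behaviour in which an entity accelerates towards a target under clamped steering force and speed, the kind of low-level motion control that decision making and path finding ultimately feed into.

def seek_step(position, velocity, target, max_speed=2.0, max_force=0.5, dt=0.1):
    """One steering update: desired velocity points at the target; steer towards it."""
    to_target = [t - p for t, p in zip(target, position)]
    distance = max(1e-6, sum(c * c for c in to_target) ** 0.5)
    desired = [c / distance * max_speed for c in to_target]  # full speed towards the target
    # Clamp each component of the steering force, then apply it to the velocity.
    steering = [min(max(d - v, -max_force), max_force) for d, v in zip(desired, velocity)]
    velocity = [v + s for v, s in zip(velocity, steering)]
    speed = max(1e-6, sum(c * c for c in velocity) ** 0.5)
    if speed > max_speed:  # enforce the speed limit
        velocity = [c / speed * max_speed for c in velocity]
    position = [p + v * dt for p, v in zip(position, velocity)]
    return position, velocity

if __name__ == "__main__":
    pos, vel = [0.0, 0.0], [0.0, 0.0]
    for _ in range(5):
        pos, vel = seek_step(pos, vel, target=[10.0, 5.0])
    print(pos, vel)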
Combs and Ardoint (2004) state that a popular method for the implementation of game AI is the use of an ‘environment-based programming style’, i.e. the creation of the virtual game world followed by the association of AI code with the game world and the entities that exist in it. This means that the AI entity intelligence is built around and is intrinsically linked to the virtual game environment. This type of entity intelligence can be created using ‘traditional’ methods for ‘decision making’, ‘path finding’ and ‘steering’. Of the three common AI tasks named above, ‘decision making’ most strongly implies the use of intelligence. Finite state machines (FSMs) are the most commonly used technique for implementing decision making in games (Fu and Houlette 2004). They arrange the behaviour of an AI entity in logical states—defining one state per possible behaviour—of which only one, the entity’s behaviour at that point in time, is active at any one time. In game FSMs each state is usually associated with a specific behaviour and an entity’s actions are often implemented by linking behaviours with pre-defined animation cycles for the AI entity that allow it to enact the selected behaviour (Orkin 2006). It is relatively simple to program a very stable FSM that may not be very sophisticated but that ‘‘will get the job done’’. The main drawback of FSMs is that they can become very complex and hard to maintain, while on the other hand the behaviour resulting from a too simple FSM can easily become predictable. To overcome this problem sometimes hierarchical FSMs are used that break up complex states into a set of smaller ones that can be combined, allowing the creation of larger and more complex FSMs. In recent years, there has been a move towards performing decision making using goal-directed techniques to enable the creation of nondeterministic behaviour. Dybsand describes this as a technique in which an AI entity ‘‘will execute a series of actions ... that attempt to accomplish a specific objective or goal’’ (Dybsand 2004). In its simplest form, goal-orientation can be implemented by determining a goal with an embedded action sequence for a given AI entity. This action sequence, the entity’s plan, will then be executed by the entity to satisfy the goal (Orkin 2004a). Solutions that allow for more diverse behaviour can improve this by selecting an appropriate plan from a pre-computed ‘plan library’ (Evans 2001) instead of using a built-in plan. More complex solutions use plans that are computed dynamically, i.e. ‘on the fly’, as is the case with Goal-Oriented Action Planning (GOAP) (Orkin 2004a). In GOAP the sequence of actions that the system needs to perform to reach its end-state or goal is generated in real-time by using a planning heuristic on a set of known values which need to exist within the AI entity’s domain knowledge. To achieve this in his implementation of GOAP, Orkin (2004b) separates the actions and goals, implicitly integrating preconditions and effects that define the planner’s search space, placing the decision making process into the domain of the planner. This can be further improved through augmenting the representation of the search space by associating costs with actions that can satisfy goals, effectively turning the AI entity’s knowledge base into a weighted graph. This then allows the use of path planning algorithms that find the shortest path within a graph as the planning algorithm for the entity’s high-level behaviour (Orkin 2006). 
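To make the search just described concrete, the following C++ sketch runs A* over an explicit weighted graph; with a suitable encoding of the search space, the same routine can serve both path finding and GOAP-style plan search. The graph layout and the (admissible) heuristic are supplied by the caller, and the code is an illustrative sketch of the standard algorithm rather than an excerpt from any cited engine.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge { int to; float cost; };
using Graph = std::vector<std::vector<Edge>>;   // adjacency list, one entry per node

// Returns the least-cost path from start to goal, or an empty vector if none
// exists. 'heuristic[n]' must not overestimate the remaining cost from node n
// to the goal for the optimality guarantee to hold.
std::vector<int> aStar(const Graph& g, int start, int goal,
                       const std::vector<float>& heuristic)
{
    const float inf = std::numeric_limits<float>::infinity();
    std::vector<float> gScore(g.size(), inf);   // best known cost from start
    std::vector<int> cameFrom(g.size(), -1);    // predecessor for path recovery
    using Item = std::pair<float, int>;         // (f = g + h, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> open;

    gScore[start] = 0.0f;
    open.push({heuristic[start], start});

    while (!open.empty()) {
        auto [f, node] = open.top();
        open.pop();
        if (node == goal) break;
        if (f > gScore[node] + heuristic[node]) continue;   // stale queue entry
        for (const Edge& e : g[node]) {
            const float tentative = gScore[node] + e.cost;
            if (tentative < gScore[e.to]) {
                gScore[e.to] = tentative;
                cameFrom[e.to] = node;
                open.push({tentative + heuristic[e.to], e.to});
            }
        }
    }
    std::vector<int> path;
    if (gScore[goal] == inf) return path;        // unreachable
    for (int n = goal; n != -1; n = cameFrom[n]) // walk predecessors back to start
        path.insert(path.begin(), n);
    return path;
}
```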
This has the additional benefit of greater code re-use, as the planning method for high-level decision making and for path planning is the same and can therefore be executed by the same code module (Orkin 2004b) if the representations of the search space are kept identical. The most popular path planning algorithm used in modern computer games is the A* (A-Star) algorithm (Stout 2000; Matthews 2002; Nareyek 2004), a generalisation of Dijkstra’s algorithm (1959). A* is optimal, i.e. proven to find the optimal path in a weighted graph if an optimal solution exists (Dechter and Pearl 1985), which guarantees that AI entities will find the least costly path if such a solution exists within the search space. Challenges in game AI that are relevant to serious games include the construction of intelligent interfaces (Livingstone and Charles 2004), such as tutoring systems or virtual guides, and particularly real-time strategy game AI, part of which is concerned with the modelling of great numbers of virtual entities in large scale virtual environments. Challenges there include spatial and temporal reasoning (Buro 2004), which can be addressed through the use of potential fields (Hagelbäck and Johansson 2008).
3.4.1 Crowd simulation
The AI techniques described in the previous section are important tools with which more complex systems can be constructed. A domain of great potential relevance to cultural heritage that is derived from such techniques is the simulation of crowds of humanoid characters. If one wishes to reconstruct and visualise places and events from the past, a crowd of real-time virtual characters, if appropriately attired and behaving, can add new depths of immersion and realism to ancient building reconstructions. These characters can feature merely as a backdrop (Ciechomski et al. 2004) to add life to a reconstruction, or can assume the centre stage in more active roles, for example, as virtual tour guides to direct the spectator (DeLeon 1999). Indeed, the type of crowd or character behaviour to be simulated varies greatly with respect to the type of scenario that needs to be modelled. In this vein, Ulicny and Thalmann (2002) model crowd behaviour of worshippers in a virtual mosque, while Maim et al. (2007) and Ryder et al. (2005) focus on the creation of more general pedestrian crowd behaviours, the former for populating a virtual reconstruction of a city resembling ancient Rome. More general crowd synthesis and evaluation techniques are also directly applicable to crowd simulation in cultural heritage. A variety of different approaches have been taken, most notably the use of social force models (Helbing and Molnar 1995), path planning (Lamarche and Donikian 2004), behavioural models incorporating perception and learning (Shao and Terzopoulos 2005), sociological effects (Musse and Thalmann 1997) and hybrid models (Pelechano et al. 2007). The study of real-world corpora has also been used as a basis for synthesising crowd behaviour in approaches that do not entail the definition of explicit behaviour models. Lerner et al. (2007) manually track pedestrians from an input video containing real world behaviour examples. They use this data to construct a database of pedestrian trajectories for different situations.
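Of the explicit behaviour models cited above, the social force approach of Helbing and Molnar (1995) is perhaps the simplest to sketch. The C++ fragment below performs one integration step of a heavily simplified variant: each pedestrian accelerates towards its goal and is repelled from nearby pedestrians by a force that decays exponentially with distance. All parameter values and names are illustrative assumptions, not those of the original model or of any cited system.

```cpp
#include <cmath>
#include <vector>

struct Vec2 { float x = 0, y = 0; };
static Vec2  operator+(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
static Vec2  operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
static Vec2  operator*(float s, Vec2 a) { return {s * a.x, s * a.y}; }
static float length(Vec2 a) { return std::sqrt(a.x * a.x + a.y * a.y); }

struct Pedestrian {
    Vec2 pos, vel, goal;
    float desiredSpeed = 1.3f;   // metres per second
};

// One explicit Euler step of a simplified social-force model.
void socialForceStep(std::vector<Pedestrian>& crowd, float dt)
{
    const float tau = 0.5f;         // relaxation time towards the desired velocity
    const float A = 2.0f, B = 0.3f; // repulsion strength and range
    std::vector<Vec2> force(crowd.size());

    for (std::size_t i = 0; i < crowd.size(); ++i) {
        // Driving force: steer the current velocity towards the goal direction.
        const Vec2 toGoal = crowd[i].goal - crowd[i].pos;
        const float d = length(toGoal);
        const Vec2 desiredVel = (d > 1e-4f) ? (crowd[i].desiredSpeed / d) * toGoal : Vec2{};
        force[i] = (1.0f / tau) * (desiredVel - crowd[i].vel);

        // Repulsive forces from every other nearby pedestrian.
        for (std::size_t j = 0; j < crowd.size(); ++j) {
            if (j == i) continue;
            const Vec2 away = crowd[i].pos - crowd[j].pos;
            const float dist = length(away);
            if (dist < 1e-4f || dist > 3.0f) continue;   // ignore distant agents
            force[i] = force[i] + (A * std::exp(-dist / B) / dist) * away;
        }
    }
    for (std::size_t i = 0; i < crowd.size(); ++i) {     // integrate velocities and positions
        crowd[i].vel = crowd[i].vel + dt * force[i];
        crowd[i].pos = crowd[i].pos + dt * crowd[i].vel;
    }
}
```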
At runtime, the database is queried for situations similar to those of the simulated pedestrians: the closest matching example from the database is selected as the resulting trajectory for each pedestrian and the process is repeated. Lee et al. (2007) simulate behaviours based on aerial-view video recordings of crowds in controlled environments. A mixture of manual annotation and semi-automated tracking provides information from video about individuals’ trajectories. These are provided as inputs to an agent movement model that can create crowd behaviours of a similar nature to those observed in the original video. Human perception of the animation of crowds and characters has been increasingly recognised as an important factor in achieving more realistic simulations. Research has been conducted regarding the perception of animation and motion of individuals (Reitsma and Pollard 2003; McDonnell et al. 2007), groups (Ennis et al. 2010a; McDonnell et al. 2009a) and crowds (Peters et al. 2008; Ennis et al. 2010b). For example, Peters et al. (2008) examined the perceptual plausibility of pedestrian orientations and found that participants were able to consistently distinguish between those virtual scenes where the character orientations matched the orientations of the humans in the corresponding real scenes and those where the character orientations were artificially generated, according to a number of different rule types. The results of such perceptual studies can be linked to synthesis, in order to create more credible animations (McDonnell et al. 2009b). A key factor of differentiation between crowd control methods concerns where knowledge is stored in the system. One approach is to endow individual characters separately with knowledge, an extreme example of which would create autonomous agents that have their own artificial perceptions, reasoning, memories, etc. with respect to the environment, as in (Lamarche and Donikian 2004). Another method is to place knowledge into the environment itself, to create a shared or partially shared database accessible to characters. According to this smart object methodology (Peters et al. 2003), graphical objects are tagged with behavioural information and may inform, guide or even control characters. Such an approach is also applicable to crowd simulation in urban environments. For example, navigation aids, placed inside the environment description, may be added by the designer during the construction process. These have been referred to as annotations (Doyle and Hayes-Roth 1998). The resulting environment description (Farenc et al. 1999; Thomas and Donikian 2000; Peters and O’Sullivan 2009) contains additional geometric, semantic and spatial partitioning information for informing pedestrian behaviour, thus transferring a degree of the behavioural intelligence into the environment. In Hostetler (2002), for example, skeletal splines are defined that are aligned with walkways. These splines, called ribbons, provide explicit information for groups to use, such as the two major directions of travel on the walkway. In addition to environment annotation and mark-up, interfaces for managing the definition of crowd scenarios have also been investigated. Crowdbrush (Ulicny et al. 2004) provides an intuitive way for designers to add crowds of characters into an environment using tools analogous to those found in standard 2D painting packages.
It allows designers to paint crowds and apply attributes and characteristics using a range of different tools in real-time, obtaining immediate feedback about the results.
3.4.2 Annotated entities and environments
A fairly recent method for enabling virtual entities to interact with one another as well as their surroundings is the use of annotated worlds. The mechanism for this, which we refer to using the term ‘Annotated Entities’, has been described using various names, such as ‘Smart Terrain’ (Cass 2002), ‘Smart Objects’ (Peters et al. 2003; Orkin 2006) and ‘Annotated Environment’ (Doyle 2002), all of which are generally interchangeable and mostly used with very similar meanings, although slight differences in their exact interpretation sometimes remain. A common aspect of all of the implementations that utilise this mechanism is the indirect approach to the creation of believable intelligent entities. The idea of annotated environments is a computer application of the theory of affordance (or affordance theory) (Cornwell et al. 2003) that was originally developed in the fields of psychology and visual perception. Affordance theory states that the makeup and shape of objects contain suggestions about their usage. Affordance itself is an abstract concept, the implementation of which is greatly simplified by annotations that work like labels containing instructions which provide an explicit interpretation of affordances. Transferred into the context of a virtual world, this means that objects in the environment contain all of the information that an AI controlled entity will need to be able to use them, effectively making the environment ‘smart’. A beneficial side effect of this use of ‘annotated’ objects (Doyle 1999) is that the complexity of the entities is independent of the extent of the domain knowledge that is available for their use, i.e. the virtual entities themselves can not only be kept relatively simple, but they do not need to be changed at all to be able to make use of additional knowledge. This allows for the rapid development of game scenarios (Cornwell et al. 2003), and if all annotated objects use the same interface to provide knowledge to the world’s entities then there is no limit to the scalability of the system, i.e. the abilities of AI controlled entities can practically be extended indefinitely (Orkin 2002) while having only a very low impact on the system’s overall performance. Furthermore, this method provides an efficient solution to the ‘anchoring problem’ (Coradeschi and Saffiotti 1999) of matching sensor data to the symbolic representation of the virtual entity’s knowledge, as objects in the world themselves hold the knowledge of how other virtual entities can interact with them. Annotations have been employed in several different types of applications in order to achieve different effects. They have proven popular for the animation of virtual actors in computer animation production, where they facilitate animation selection (Lee et al. 2006), i.e. the choice of appropriate animation sequences that fit the environment. Other uses of annotations include the storage of tactical information in the environment for war games and military simulations (Darken 2007), which is implemented as sensory annotations to direct the virtual entities’ perception of their environment. Probably the most common form of annotations found in real-time simulated virtual environments affects behaviour selection, usually in combination with animation selection (Orkin 2006), i.e.
the virtual entity’s behaviour and its visual representation (animation) are directed by the annotated objects that it uses. Virtual entities that inhabit these annotated worlds can be built utilising rule-based systems built on simple FSMs, in combination with a knowledge interface based on a trigger system that allows the entities to ‘use’ knowledge (instructions) for handling the annotated objects. The interaction protocol employed to facilitate the communication between entity and ‘smart’ object needs to enable the object to ‘advertise’ its features to the entities and then allow them to request from the object relevant instructions (annotations) on its usage (Macedonia 2000). The success of this technique is demonstrated by the best-selling computer game The Sims, where ‘Smart Objects’ were used for behaviour selection to great effect. Forbus and Wright (2001) state that in The Sims all game entities, objects as well as virtual characters, are implemented as scripts that are executed in their own threads within a multitasking virtual machine. A similar approach, based on a scripting language that can represent the behaviours of virtual entities, as well as the objects that they can interact with, has been presented more recently by Anderson (2008). Such scripting-language based approaches are most likely to provide solutions for the creation of large scale virtual environments, such as the serious game component of the Rome Reborn project, through the automatic generation of AI content (Nareyek 2007), which, in combination with techniques such as procedural modelling of urban environments (Vanegas et al. 2009), will require the integration of the creation of complex annotations with the procedural generation of virtual worlds, automating the anchoring of virtual entities into their environment.
4 Conclusions
The success of computer games, fuelled among other factors by the great realism that can be attained using modern consumer hardware, and the key techniques of games technology that have resulted from this, have given rise to new types of games, including serious games, and related application areas, such as virtual worlds, mixed reality, augmented reality and virtual reality. All of these types of application utilise core games technologies (e.g. 3D environments) as well as novel techniques derived from computer graphics, human computer interaction, computer vision and artificial intelligence, such as crowd modelling. Together these technologies have given rise to new sets of research questions, often following technologically driven approaches to increasing levels of fidelity, usability and interactivity. Our aim has been to use this state-of-the-art report to demonstrate the potential of games technology for cultural heritage applications and serious games, to outline key problems and to indicate areas of technology where solutions for remaining challenges may be found. To illustrate this, we first presented some characteristic case studies illustrating the application of methods and technologies used in cultural heritage. Next, we provided an overview of existing literature of relevance to the domain, discussed the strengths and weaknesses of the described methods and pointed out unsolved problems and challenges.
It is our firm belief that we are only at the beginning of the evolution of games technology and that there will be further improvements in the quality and sophistication of computer games, giving rise to serious heritage games of greater complexity and fidelity than is now achievable. Acknowledgments The authors would like to thank the following: The Herbert Art Gallery & Museum (Coventry, UK), Simon Bakkevig, and Lukasz Bogaj. This report includes imagery generated using the Virtual Egyptian Temple, which is a product of PublicVR (http://publicvr.org). References Akenine-Mo¨ller T, Haines E, Hoffman N (2008) Real-time rendering 3rd edn. A. K. Peters, Natick Anderson EF (2003) Playing smart–artificial intelligence in computer games. In: Proceedings of zfxCON03 conference on game development Anderson EF (2008) Scripted smarts in an intelligent virtual environment: behaviour definition using a simple entity annotation language. In: Future Play ’08: Proceedings of the 2008 conference on future play, pp 185–188 Anderson EF, McLoughlin L (2007) Critters in the classroom: a 3d computer-game-like tool for teaching programming to computer animation students. In: SIGGRAPH ’07: ACM SIGGRAPH 2007 educators program, p 7 Anderson EF, Engel S, McLoughlin L, Comninos P (2008) The case for research in game engine architecture. In: Future Play ’08: Proceedings of the 2008 conference on future play, pp 228–231 Apperley TH (2006) Virtual unaustralia: Videogames and australia’s colonial history. In: UNAUSTRALIA 2006: Proceedings of the cultural studies association of Australasia’s annual conference Arnold D, Day A, Glauert J, Haegler S, Jennings V, Kevelham B, Laycock R, Magnenat-Thalmann N, Mam J, Maupu D, Papagiannakis G, Thalmann D, Yersin B, Rodriguez-Echavarria K (2008) Tools for populating cultural heritage environments with interactive virtual humans. In: Open digital cultural heritage systems, EPOCH final event Rome Azuma R (1997) A survey of augmented reality. Presence: teleoperators and virtual environments 6(4):355–385 Azuma R, Baillot Y, Behringer R, Feiner S, Julier S, MacIntyre B (2001) Recent advances in augmented reality. IEEE Comput Graph Appl 21(6):34–47 Bavoil L (2008) Advanced soft shadow mapping techniques. Presentation at the game developers conference 2008 Bavoil L, Sainz M (2008) Screen space ambient occlusion. NVIDIA developer information: http://developers.nvidia.com Bederson BB (1995) Audio augmented reality: a prototype automated tour guide. In: CHI ’95: Conference companion on human factors in computing systems, pp 210–211 Benthin C, Wald I, Scherbaum M, Friedrich H (2006) Ray tracing on the cell processor, pp 15–23 Bimber O, Frhlich B, Schmalstieg D, Encarnao LM (2001) The virtual showcase. IEEE Comput Graph Appl 21(6):48–55 Bjorke K (2004) Color controls. In: Fernando R (ed) GPU gems, Pearson Education, pp 363–373 Blinn JF (1978) Simulation of wrinkled surfaces. SIGGRAPH Comput Graph 12(3):286–292 Blinn JF, Newell ME (1976) Texture and reflection in computer generated images. Commun ACM 19(10):542–547 Blow J (2004) Game development harder than you think. ACM Queue 1(10):28–37 Blythe D (2006) The direct3d 10 system. ACM Trans Graph 25(3):724–734 Brogni B, Avizzano C, Evangelista C, Bergamasco M (1999) Technological approach for cultural heritage: augmented reality. In: RO-MAN ’99: Proceedings of the 8th IEEE international workshop on robot and human interaction, pp 206–212 Burkersroda R (2005) Colour grading. 
In: Engel W (eds) Shader X3: advanced rendering with DirectX and OpenGL. Charles River Media, Hingham, pp 357–362 Buro M (2004) Call for ai research in rts games. In: Proceedings of the AAAI-04 workshop on challenges in game AI, pp 139–142 Burton J (2005) News-game journalism: history, current use and possible futures. Aust J Emerg Technol Soc 3(2):87–99 Calori L, Camporesi C, Forte M, Guidazzoli A, Pescarin S (2005) Openheritage: integrated approach to web 3d publication of virtual landscape. In: Proceedings of the ISPRS working group V/4 workshop 3D-ARCH 2005: virtual reconstruction and visualization of complex architectures Cass S (2002) Mind games. IEEE Spectrum 39(12):40–44 Cerezo E, Perez-Cazorla F, Pueyo X, Seron F, Sillion F (2005) A survey on participating media rendering techniques. The Visual Comput 21(5):303–328 Chalmers A, Debattista K (2009) Level of realism for serious games. In: VS-Games 2009: Proceedings of the IEEE virtual worlds for serious applications first international conference, pp 225–232 Ciechomski PDH, Ulicny B, Cetre R, Thalmann D (2004) A case study of a virtual audience in a reconstruction of an ancient roman odeon in aphrodisias. In: The 5th international symposium on virtual reality, archaeology and cultural heritage, VAST (2004) Virtual Reality (2010) 14:255–275 271 123 Combs N, Ardoint J (2004) Declarative versus imperative paradigms in Games AI. Available from: http://www.red3d.com/cwr/ games/ Coombe G, Harris M (2005) Global illumination using progressive refinement radiosity. In: Pharr M (ed) GPU gems 2, Pearson Education, pp 635–647 Coradeschi S, Saffiotti A (1999) Symbolic object descriptions to sensor data. Problem statement. Linko¨ping Electronic Articles in Computer and Information Science 4(9) Cornwell J, O’Brien K, Silverman B, Toth J (2003) Affordance theory for improving the rapid generation, composability, and reusability of synthetic agents and objects. In: BRIMS 2003: Proceedings of the twelfth conference on behavior representations in modeling and simulation Cosmas J, Itegaki T, Green D, Grabczewski E, Weimer F, Van Gool L, Zalesny A, Vanrintel D, Leberl F, Grabner M, Schindler K, Karner K, Gervautz M, Hynst S, Waelkens M, Pollefeys M, DeGeest R, Sablatnig R, Kampel M (2001) 3d murale: a multimedia system for archaeology. In: VAST ’01: Proceedings of the 2001 conference on virtual reality, archeology, and cultural heritage, pp 297–306 Crassin C, Neyret F, Lefebvre S, Eisemann E (2009) Gigavoxels: rayguided streaming for efficient and detailed voxel rendering. In: I3D ’09: Proceedings of the 2009 symposium on interactive 3D graphics and games, pp 15–22 Cruz-Neira C, Sandin DJ, DeFanti TA, Kenyon RV, Hart JC (1992) The cave: audio visual experience automatic virtual environment. Commun ACM 35(6):64–72 Darken CJ (2007) Level Annotation and Test by Autonomous Exploration. In: AIIDE 2007: Proceedings of the third artificial intelligence and interactive digital entertainment conference Debevec P (2005) Making ‘‘The Parthenon’’. 6th international symposium on virtual reality, archaeology, and cultural heritage Debevec P, Tchou C, Gardner A, Hawkins T, Poullis C, Stumpfel J, Jones A, Yun N, Einarsson P, Lundgren T, Fajardo M, Martinez P (2004) Estimating surface reflectance properties of a complex scene under captured natural illumination. Tech. rep., University of Southern California, Institute for Creative Technologies Dechter R, Pearl J (1985) Generalised best-first search strategies and the optimality of A*. 
J ACM 32(3):505–536 DeLeon VJ (1999) Vrnd: notre-dame cathedral: a globally accessible multi-user real time virtual reconstruction. In: Proceedings of virtual systems and multimedia Dijkstra EW (1959) A note on two problems in connexion with graphs. Numerische Mathematik 1:269–271 Doyle P (1999) Virtual intelligence from artificial reality: building stupid agents in smart environments. In: AAAI ’99 spring symposium on artificial intelligence and computer games Doyle P (2002) Believability through context. In: AAMAS ’02: Proceedings of the first international joint conference on autonomous agents and multiagent systems, pp 342–349 Doyle P, Hayes-Roth B (1998) Agents in annotated worlds. In: AGENTS ’98: Proceedings of the second international conference on autonomous agents, pp 173–180 Dutr P, Bekaert P, Bala K (2003) Advanced global illumination. A. K. Peters, Natick Dybsand E (2004) Goal-directed behaviour using composite tasks. In: AI game programming wisdom 2, Charles River Media, pp 237–245 El-Hakim S, MacDonald G, Lapointe JF, Gonzo L, Jemtrud M (2006) On the digital reconstruction and interactive presentation of heritage sites through time. In: International symposium on virtual reality, archaeology and intelligent cultural heritage, pp 243–250 Engel K, Hadwiger M, Kniss JM, Rezk-Salama C, Weiskopf D (2006) Real-time volume graphics. A. K. Peters, Wellesley Engel W, Hoxley J, Kornmann R, Suni N, Zink J (2008) Programming vertex, geometry, and pixel shaders. Online book available at: http://wiki.gamedev.net/ Ennis C, McDonnell R, O’Sullivan C (2010a) Seeing is believing: body motion dominates in multisensory conversations. ACM Trans Graph 29(4):1–9 Ennis C, Peters C, O’Sullivan C (2010b) Perceptual effects of scene context and viewpoint for virtual pedestrian crowds. ACM Trans Appl Percept (in press) Evans R (2001) AI in computer games: the use of AI techniques in Black & White. Seminar notes, available from: http://www.dcs. qmul.ac.uk/research/logic/seminars/abstract/EvansR01.html Everitt C (2001) Interactive order-independent transparency. NVIDIA Whitepaper Farenc N, Boulic R, Thalmann D (1999) An informed environment dedicated to the simulation of virtual humans in urban context. Comput Graph Forum 18(3):309–318 Feiner S (2002) Augmented reality: a new way of seeing. Sci Am 286(4):48–55 Feis A (2007) Postprocessing effects in design. In: Engel W (ed) Shader X5: advanced rendering techniques. Charles River Media, pp 463–470 Fernando R, Kilgard MJ (2003) The Cg tutorial. Addison Wesley Filion D, McNaughton R (2008) Effects & techniques. In: SIGGRAPH ’08: ACM SIGGRAPH 2008 classes, pp 133–164 Forbus KD, Wright W (2001) Some notes on programming objects in The SimsTM . Class notes, available from: http://qrg.northwestern. edu/papers/papers.html Forsyth DA, Ponce J (2002) Computer vision: a modern approach. Prentice Hall, Upper Saddle River Francis R (2006) Revolution: learning about history through situated role play in a virtual environment. In: Proceedings of the American educational research association conference de Freitas S, Oliver M (2006) How can exploratory learning with games and simulations within the curriculum be most effectively evaluated?. Comput Educ 46:249–264 Frischer B (2008) The rome reborn project. How technology is helping us to study history. OpEd, November 10, University of Virginia Fritsch D, Kada M (2004) Visualisation using game engines. ISPRS commission 5, pp 621–625 Fu D, Houlette R (2004) The ultimate guide to FSMs in games. 
In: AI game programming Wisdom 2. Charles River Media, pp 283–302 Gaitatzes A, Christopoulos D, Papaioannou G (2004) The ancient olympic games: being part of the experience. In: VAST 2004: The 5th international symposium on virtual reality, archaeology and cultural heritage, pp 19–28 Gardner R (2009) Empire total war–graphics work shop. Available from (the official) Total War blog: http://blogs.sega.com/totalwar/ 2009/03/05/empire-total-war-graphics-work-shop/ Gatermann H (2000) From vrml to augmented reality via panoramaintegration and eai-java. In: SIGraDi’2000–Construindo (n)o espacio digital (constructing the digital Space), pp 254–256 Gillham D (2007) Real-time depth-of-field implemented with a postprocessing-only technique. In: Engel W (ed) Shader X5: advanced rendering techniques. Charles River Media, pp 163–175 Godbersen H (2008) Virtual environments for anyone. IEEE Multimedia 15(3):90–95 Hadwiger M, Kratz A, Sigg C, Bu¨hler K (2006) Gpu-accelerated deep shadow maps for direct volume rendering. In: GH ’06: Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS symposium on graphics hardware, pp 49–52 Hagelba¨ck J, Johansson SJ (2008) The rise of potential fields in real time strategy bots. In: AIIDE 08: Proceedings of the fourth 272 Virtual Reality (2010) 14:255–275 123 artificial intelligence and interactive digital entertainment conference, pp 42–47 Hall T, Ciolfi L, Bannon L, Fraser M, Benford S, Bowers J, Greenhalgh C, Hellstro¨m SO, Izadi S, Schna¨delbach H, Flintham M (2001) The visitor as virtual archaeologist: explorations in mixed reality technology to enhance educational and social interaction in the museum. In: VAST ’01: Proceedings of the 2001 conference on virtual reality, archeology, and cultural heritage, pp 91–96 Hasenfratz JM, Lapierre M, Holzschuch N, Sillion F (2003) A survey of real-time soft shadows algorithms Helbing D, Molnar P (1995) Social force model for pedestrian dynamics. Phys Rev E 51(5):4282–4286 Hoberock J, Jia Y (2008) High-quality ambient occlusion. In: Nguyen H (ed) GPU gems 3. Pearson Education, pp 257–274 Hoffman N (2006) Physically based reflectance for games Horn DR, Sugerman J, Houston M, Hanrahan P (2007) Interactive k-d tree gpu raytracing. In: I3D ’07: Proceedings of the 2007 symposium on interactive 3D graphics and games, pp 167–174 Hostetler TR (2002) Controlling steering behavior for small groups of pedestrians in virtual urban environments. PhD thesis, The University of Iowa Humphreys G, Houston M, Ng R, Frank R, Ahern S, Kirchner PD, Klosowski JT (2002) Chromium: a stream-processing framework for interactive rendering on clusters. ACM Trans Graph 21(3):693–702 Isidoro JR, Sander PV (2006) Animated skybox rendering and lighting techniques. In: SIGGRAPH ’06: ACM SIGGRAPH 2006 courses, pp 19–22 Jacobson J, Holden L (2005) The Virtual Egyptian Temple. In: EDMEDIA: Proccedings of the world conference on educational media. Hypermedia & Telecommunications Jacobson J, Lewis M (2005) Game engine virtual reality with CaveUT. IEEE Comput 38(4):79–82 Jacobson J, Handron K, Holden L (2009) Narrative and content combine in a learning game for virtual heritage. In: Computer applications to archaeology 2009 James G, O’Rorke J (2004) Real-time glow. In: Fernando R (ed) GPU gems. Pearson Education, pp 343–362 Jones C (2005) Who are you? theorising from the experience of working through an avatar. E-Learning 2(4):414–425 Jones G, Christal M (2002) The future of virtual museums: On-line, immersive, 3d environments. 
Created realities group Kaneko T, Takahei T, Inami M, Kawakami N, Yanagida Y, Maeda T, Tachi S (2001) Detailed shape representation with parallax mapping. In: Proceedings of ICAT 2001, pp 205–208 Kawase M (2003) Frame buffer postprocessing effects in doubles.t.e.a.l (wreakless). Presentation at the game developers conference 2003 Kawase M (2004) Practical implementation of high dynamic range rendering. Presentation at the game developers conference 2004 Kider JT, Fletcher RL, Yu N, Holod R, Chalmers A, Badler NI (2009) Recreating early islamic glass lamp lighting. In: VAST09: The 10th international symposium on virtual reality, archaeology and intelligent cultural heritage, pp 33–40 Kim J, Jaja J (2009) Streaming model based volume ray casting implementation for cell broadband engine. Sci Program 17(1–2): 173–184 Kirriemuir J (2008) Measuring the impact of second life for educational purposes. Eduserv foundation, Available from: http://www.eduserv.org.uk/foundation/sl/uksnapshot052008 Koonce R (2008) Deferred shading in Tabula Rasa. In: Nguyen H (ed) GPU Gems 3. Pearson Education, pp 429–457 Kru¨ger J, Bu¨rger K, Westermann R (2006) Interactive screen-space accurate photon tracing on GPUs. In: Rendering Techniques (Eurographics symposium on rendering–EGSR), pp 319–329 Lamarche F, Donikian S (2004) Crowd of virtual humans: a new approach for real time navigation in complex and structured environments. Comput Graph Forum 23(3):509–518 Leavy B, Wyeld T, Hills J, Barker C, Gard S (2007) The ethics of indigenous storytelling: using the torque game engine to support australian aboriginal cultural heritage. In: Proceedings of the DiGRA 2007 conference, pp 24–28 Lee KH, Choi MG, Lee J (2006) Motion patches: building blocks for virtual environments annotated with motion data. In: SIGGRAPH ’06: ACM SIGGRAPH 2006 Papers, pp 898–906 Lee KH, Choi MG, Hong Q, Lee J (2007) Group behavior from video: a data-driven approach to crowd simulation. In: SCA ’07: Proceedings of the 2007 ACM SIGGRAPH/Eurographics symposium on computer animation, pp 109–118 Lepouras G, Vassilakis C (2004) Virtual museums for all: employing game technology for edutainment. Virtual Real 8(2):96–106 Lerner A, Chrysanthou Y, Dani L (2007) Crowds by example. Comput Graph Forum 26(3):655–664 Lewis M, Jacobson J (2002) Game engines in scientific research. Commun ACM 45(1):27–31 Liarokapis F (2007) An augmented reality interface for visualising and interacting with virtual content. Virtual Real 11(1):23–43 Liarokapis F, Sylaiou S, Mountain D (2008) Personalizing virtual and augmented reality for cultural heritage indoor and outdoor experiences. In: VAST08: the 9th international symposium on virtual reality, archaeology and intelligent cultural heritage, pp 55–62 Linaza MT, Cobos Y, Mentxaka J, Campos MK, Penalba M (2007) Interactive augmented experiences for cultural historical events. In: VAST07: the 8th international symposium on virtual reality, archaeology and intelligent cultural heritage, pp 23–30 Lintermann B, Deussen O (1999) Interactive modeling of plants. IEEE Comput Graph Appl 19(1):56–65 Livingston MA (2005) Evaluating human factors in augmented reality systems. IEEE Comput Graph Appl 25(6):6–9 Livingstone D, Charles D (2004) Intelligent interfaces for digital games. In: Proceedings of the AAAI-04 workshop on challenges in game AI, pp 6–10 Lokovic T, Veach E (2000) Deep shadow maps. 
In: SIGGRAPH ’00: Proceedings of the 27th annual conference on computer graphics and interactive techniques, pp 385–392 Looser J, Grasset R, Seichter H, Billinghurst M (2006) Osgart–a pragmatic approach to mr. In: ISMAR 06: 5th IEEE and ACM international symposium on mixed and augmented reality Lugrin J, Cavazza M (2010) Towards ar game engines. In: SEARIS 2010–3rd workshop on software engineering and architecture of realtime interactive systems Macagon V, Wu¨nsche B (2003) Efficient collision detection for skeletally animated models in interactive environments. In: Proceedings of IVCNZ ’03, pp 378–383 Macedonia M (2000) Using technology and innovation to simulate daily life. IEEE Comput 33(4):110–112 Macedonia M (2002) Games soldiers play. IEEE Spectrum 39(3): 32–37 Maim J, Haegler S, Yersin B, Mueller P, Thalmann D, Van Gool L (2007) Populating ancient pompeii with crowds of virtual romans. In: VAST07: the 8th international symposium on virtual reality, archaeology and intelligent cultural heritage, pp 109–116 Malone TW, Lepper MR (1987) Making learning fun: A taxonomy of intrinsic motivations for learning. In: Snow RE, Farr MJ (eds) aptitude, learning and instruction: III. Conative and affective process analyses, Erlbaum, pp 223–253 Mase K, Kadobayashi R, Nakatsu R (1996) Meta-museum: a supportive augmented-reality environment for knowledge sharing. In: ATR workshop on social agents: humans and machines, pp 107–110 Virtual Reality (2010) 14:255–275 273 123 Mateevitsi V, Sfakianos M, Lepouras G, Vassilakis C (2008) A gameengine based virtual museum authoring and presentation system. In: DIMEA ’08: Proceedings of the 3rd international conference on digital interactive media in entertainment and arts, pp 451–457 Matthews J (2002) Basic A* pathfinding made simple. In: AI game programming wisdom, Charles River Media, pp 105–113 McCarthy J (2007) What is Artificial Intelligence. Available from: http://www-formal.stanford.edu/jmc/whatisai/whatisai.html McDonnell R, Newell F, O’Sullivan C (2007) Smooth movers: perceptually guided human motion simulation. In: SCA ’07: Proceedings of the 2007 ACM SIGGRAPH/Eurographics symposium on computer animation, pp 259–269 McDonnell R, Ennis C, Dobbyn S, O’Sullivan C (2009a) Talking bodies: Sensitivity to desynchronization of conversations. ACM Trans Appl Percept 6(4):21 McDonnell R, Larkin M, Herna´ndez B, Rudomin I, O’Sullivan C (2009b) Eye-catching crowds: saliency based selective variation. ACM Trans Graph 28(3):1–10 McGuire TJ (2006) The Philadelphia Campaign: volume one: Brandywine and the fall of Philadelphia. Stackpole Books, Washington McTaggart G, Green C, Mitchell J (2006) High dynamic range rendering in valve’s source engine. In: SIGGRAPH ’06: ACM SIGGRAPH 2006 Courses, p 7 Milgram P, Kishino F (1994) A taxonomy of mixed reality visual displays. IEICE Trans Inf Syst E77-D(12):1321–1329 Mitchell J, McTaggart G, Green C (2006) Shading in valve’s source engine. In: SIGGRAPH ’06: ACM SIGGRAPH 2006 Courses, pp 129–142 Mittring M (2007) Finding next gen: Cryengine 2. In: SIGGRAPH ’07: ACM SIGGRAPH 2007 courses, pp 97–121 Mittring M, Crytek GmbH (2008) Advanced virtual texture topics. In: SIGGRAPH ’08: ACM SIGGRAPH 2008 classes, pp 23–51 Mu¨ller P, Vereenooghe T, Ulmer A, Van Gool L (2005) Automatic reconstruction of roman housing architecture. In: Recording, modeling and visualization of cultural heritage, pp 287–298 Musse SR, Thalmann D (1997) A model of human crowd behavior: group inter-relationship and collision detection analysis. 
In: Computer animation and simulation ’97, pp 39–52 Nagy Z, Klein R (2003) Depth-peeling for texture-based volume rendering. In: PG ’03: Proceedings of the 11th Pacific conference on computer graphics and applications, p 429 Nareyek A (2004) Ai in computer games. ACM Queue 1(10):58–65 Nareyek A (2007) Game ai is dead. long live game ai!. IEEE Intell Syst 22(1):9–11 Nienhaus M, Do¨llner J (2003) Edge-enhancement–an algorithm for real-time non-photorealistic rendering. International Winter School of computer graphics. J WSCG 11(2):346–353 Noghani J, Liarokapis F, Anderson EF (2010) Randomly generated 3d environments for serious games. In: VS-GAMES 2010: Proceedings of the 2nd international conference on games and virtual worlds for serious applications, pp 3–10 Oliveira MM, Bishop G, McAllister D (2000) Relief texture mapping. In: SIGGRAPH ’00: Proceedings of the 27th annual conference on computer graphics and interactive techniques, pp 359–368 OpenGL Architecture Review Board, Shreiner D, Woo M, Neider J, Davis T (2007) OpenGL programming guide, 6th edn. AddisonWesley, New York Orkin J (2002) 12 Tips from the trenches. In: AI game programming wisdom. Charles River Media, Hingham, pp 29–35 Orkin J (2004a) Applying goal-oriented action planning to games. In: AI game programming wisdom 2. Charles River Media, Hingham, pp 217–228 Orkin J (2004b) Symbolic representation of game world state: toward real-time planning in games. In: Proceedings of the AAAI-04 workshop on challenges in game AI, pp 26–30 Orkin J (2006) Three states and a plan: the A.I. of F.E.A.R. In: Proceedings of the 2006 game developers conference Overmars M (2004) Teaching computer science through game design. IEEE Comput 37(4):81–83 Papagiannakis G, Ponder M, Molet T, Kshirsagar S, Cordier F, Magnenat-Thalmann M, Thalmann D (2002) LIFEPLUS: revival of life in ancient Pompeii. In: Proceedings of the 8th international conference on virtual systems and multimedia (VSMM ’02) Paquet E, El-Hakim S, Beraldin A, Peters S (2001) The virtual museum: virtualisation of real historical environments and artefacts and three-dimensional shape-based searching. In: VAA’01: Proceedings of the international symposium on virtual and augmented architecture, pp 182–193 Pelechano N, Allbeck JM, Badler NI (2007) Controlling individual agents in high-density crowd simulation. In: SCA ’07: Proceedings of the 2007 ACM SIGGRAPH/Eurographics symposium on computer animation, pp 99–108 Peters C, O’Sullivan C (2009) Metroped: A tool for supporting crowds of pedestrian ai’s in urban environments. In: Proceedings of the AISB 2009 convention: AI and games symposium, pp 64–69 Peters C, Dobbyn S, Mac Namee B, O’Sullivan C (2003) Smart Objects for Attentive Agents. In: Proceedings of the international conference in central Europe on computer graphics, Visualization and computer vision Peters C, Ennis C, McDonnell R, O’Sullivan C (2008) Crowds in context: evaluating the perceptual plausibility of pedestrian orientations. In: Eurographics 2008–Short Papers, pp 33–36 Pletinckx D, Callebaut D, Killebrew AE, Silberman NA (2000) Virtual-reality heritage presentation at ename. IEEE MultiMedia 7(2):45–48 Plinius Caecilius Secundus G (79a) Epistulae vi.16. The Latin Library: http://www.thelatinlibrary.com/pliny.ep6.html Plinius Caecilius Secundus G (79b) Epistulae vi.20. The Latin Library: http://www.thelatinlibrary.com/pliny.ep6.html Purcell TJ, Buck I, Mark WR, Hanrahan P (2002) Ray tracing on programmable graphics hardware. 
ACM Trans Graph 21(3): 703–712 Purcell TJ, Donner C, Cammarano M, Jensen HW, Hanrahan P (2003) Photon mapping on programmable graphics hardware. In: HWWS ’03: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on graphics hardware, pp 41–50 Reinhard E, Ward G, Pattanaik S, Debevec P (2006) High dynamic range imaging: acquisition, display and image-based lighting. Morgan Kaufmann Reitsma PSA, Pollard NS (2003) Perceptual metrics for character animation: sensitivity to errors in ballistic motion. ACM Trans Graph 22(3):537–542 Re´mond M, Mallard T (2003) Rei: an online video gaming platform. In: Proceedings of the 9th international Erlang/OTP User Conference Renevier P, Nigay L, Bouchet J, Pasqualetti L (2004) Generic interaction techniques for mobile collaborative mixed systems. In: CADUI 2004: Proceedings of the fifth international conference on computer-aided design of user interfaces, pp 307–320 Ritschel T, Grosch T, Seidel HP (2009) Approximating dynamic global illumination in image space. In: I3D ’09: Proceedings of the 2009 symposium on interactive 3D graphics and games, pp 75–82 Ropinski T, Kasten J, Hinrichs KH (2008) Efficient shadows for gpubased volume raycasting. In: Proceedings of the 16th international conference in central Europe on computer graphics, visualization and computer vision (WSCG 2008), pp 17–24 Rosado G (2008) Motion blur as a post-processing effect. In: Nguyen H (ed) GPU gems 3, Pearson Education, pp 575–581 274 Virtual Reality (2010) 14:255–275 123 Rost RJ (2006) OpenGL shading language. 2nd edn. Addison-Wesley, Upper Saddle River Ryan N (2000) Back to reality: augmented reality from field survey to tourist guide. In: Virtual archaeology between Scientific Research and Territorial Marketing, proceedings of the VAST Euroconference Ryder G, Flack P, Day A (2005) A framework for real-time virtual crowds in cultural heritage environments. In: M Mudge NR, R S (eds) Vast 2005, short papers prceedings, pp 108–113 Sanchez S, Balet O, Luga H, Duthen Y (2004) Vibes, bringing autonomy to virtual characters. In: Proceedings of the third IEEE international symposium and school on advance distributed systems, pp 19–30 Sander PV, Mitchell JL (2006) Out-of-core rendering of large meshes with progressive buffers. In: ACM SIGGRAPH 2006: Proceedings of the conference on SIGGRAPH 2006 course notes, pp 1–18 Sanwal R, Chakaveh S, Fostirpoulos K, Santo H (2000) Marvins– mobile augmented reality visual navigational system. Eur Res Consort Informatics Math (ERCIM News) 40:39–40 Sawyer B (2002) Serious games: improving public policy through game-based learning and simulation. Whitepaper for the woodrow wilson international center for scholars Scheuermann T, Hensley J (2007) Efficient histogram generation using scattering on gpus. In: I3D ’07: Proceedings of the 2007 symposium on interactive 3D graphics and games, pp 33–37 Scott B (2002) The illusion of intelligence. In: AI game programming wisdom. Charles River Media, Hingham, pp 16–20 Seetzen H, Heidrich W, Stuerzlinger W, Ward G, Whitehead L, Trentacoste M, Ghosh A, Vorozcovs A (2004) High dynamic range display systems. vol 23, pp 760–768 Shanmugam P, Arikan O (2007) Hardware accelerated ambient occlusion techniques on gpus. In: I3D ’07: Proceedings of the 2007 symposium on interactive 3D graphics and games, pp 73–80 Shao W, Terzopoulos D (2005) Autonomous pedestrians. 
In: SCA ’05: Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on Computer animation, pp 19–28 Sherrod A (2006) High dynamic range rendering using opengl frame buffer objects. In: Game programming gems 6. Charles River Media, Hingham, pp 529–536 Shirley P (2006) State of the art in interactive ray tracing. In: ACM SIGGRAPH 2006 courses SinclairP,MartinezK(2001)Adaptivehypermediainaugmentedreality. In: Proceedings of the 3rd workshop on adaptive hypertext and hypermedia systems, ACM hypertext 2001 conference Smith S, Trenholme D (2008) Computer game engines for developing first-person virtual environments. Virtual Real 12(3):181–187 Sousa T (2005) Adaptive glare. In: Engel W (eds) Shader X3: advanced rendering with directX and openGL. Charles River Media, Hingham, pp 349–355 Stout B (2000) The basics of A* for path planning. In: Game programming gems, Charles River Media, Hingham, pp 254–263 Stricker D, Daehne P, Seibert F, Christou I, Almeida L, Carlucci R, Ioannidis N (2001) Design and development issues for ARCHEOGUIDE: an augmented reality based cultural heritage onsite guide. In: icav3d’01: Proceedings of the international conference on augmented, virtual environments and threedimensional imaging, pp 1–5 Sutherland IE (1965) The Ultimate Display. In: Proceedings of the IFIP congress, vol 2. pp 506–508 Sylaiou S, Liarokapis F, Kotsakis K, Patias P (2009) Virtual museums, a survey on methods and tools. J Cult Herit 10(4): 520–528 Tamura H, Yamamoto H, Katayama A (1999) Steps toward seamless mixed reality. In: Ohta Y, Tamura H (eds) Mixed reality: merging real and virtual worlds. Ohmsha Ltd/Springer, Tokyo, pp 59–79 Tamura H, Yamamoto H, Katayama A (2001) Mixed reality: future dreams seen at the border between real and virtual worlds. IEEE Comput Graph Appl 21(6):64–70 Tatarchuk N, Isidoro J (2006) Artist-directable real-time rain rendering in city environments. In: Eurographics workshop on natural phenomena Tchou C (2002) Image-based models: geometry and reflectance acquisition systems. Master’s thesis, University of California, Berkeley Tchou C, Stumpfel J, Einarsson P, Fajardo M, Debevec P (2004) unlighting the parthenon. In: SIGGRAPH ’04: ACM SIGGRAPH 2004 Sketches, p 80 Thomas G, Donikian S (2000) Virtual humans animation in informed urban environments. In: Computer animation 2000, pp 112–119 Troche J, Jacobson J (2010) An exemplar of ptolemaic egyptian temples. In: CAA 2010 the 38th conference on computer applications and quantitative methods in archaeology Ulicny B, Thalmann D (2002) Crowd simulation for virtual heritage. In: Proceedings of first international workshop on 3D virtual heritage, pp 28–32 Ulicny B, de Heras Ciechomski P, Thalmann D (2004) Crowdbrush: interactive authoring of real-time crowd scenes. In: SCA ’04: Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on computer animation, pp 243–252 Vanegas CA, Aliaga DG, Wonka P, Mu¨ller P, Waddell P, Watson B (2009) Modeling the appearance and behavior of urban spaces. In: Eurographics 2009–State of the Art Reports, pp 1–16 Vlahakis V, Ioannidis N, Karigiannis J, Tsotros M, Gounaris M, Stricker D, Gleue T, Daehne P, Almeida L (2002) Archeoguide: an augmented reality guide for archaeological sites. IEEE Comput Graph Appl 22(5):52–60 Wallis A (2007) Is modding useful?. In: Game carreer guide 2007, CMP Media, pp 25–28 Wand M, Straßer W (2003) Real-time caustics. In: Brunet P, Fellner D (eds) Comput Graph Forum, vol 22. p 3 Waring P (2007) Representation of ancient warfare in modern video games. 
Master’s thesis, School of Arts, Histories and Cultures, University of Manchester Watt A, Policarpo F (2005) Advanced game development with programmable graphics hardware. A. K. Peters, Natick Wright T, Madey G (2008) A survey of collaborative virtual environment technologies. Tech Rep 2008–11, University of Notre Dame, USA Wyman C (2007) Interactive refractions and caustics using imagespace techniques. In: Engel W (eds) Shader X5: advanced rendering techniques. Charles River Media, Hingham, pp 359–371 Yin P, Jiang X, Shi J, Zhou R (2006) Multi-screen tiled displayed, parallel rendering system for a large terrain dataset. IJVR 5(4):47–54 Yu J, Yang J, McMillan L (2005) Real-time reflection mapping with parallax. In: I3D ’05: Proceedings of the 2005 symposium on interactive 3D graphics and games, pp 133–138 Zerbst S, Du¨vel O, Anderson E (2003) 3D-Spieleprogrammierung. Markt ? Technik Zhou T, Chen JX, Pullen M (2007) Accurate depth of field simulation in real time. Comput Graph Forum 26(1):655–664 Zyda M (2005) From visual simulation to virtual reality to games. IEEE Comput 38(9):25–32 Virtual Reality (2010) 14:255–275 275 123 Interactive Virtual and Augmented Reality Environments 212 8.16 Paper #16 de Freitas, S., Rebolledo-Mendez, G., Liarokapis, F., Magoulas, G., Poulovassilis, A. Learning as immersive experiences: Using the four-dimensional framework for designing and evaluating immersive learning experiences in a virtual world, British Journal of Educational Technology, Blackwell Publishing, 41(1): 69-85, 2010. Contribution (20%): Collaboration on the design and evaluation of the serious game as well as the write-up of the paper. Learning as immersive experiences: Using the four-dimensional framework for designing and evaluating immersive learning experiences in a virtual world_1024 69..85 Sara de Freitas, Genaro Rebolledo-Mendez, Fotis Liarokapis, George Magoulas and Alexandra Poulovassilis Sara de Freitas, Genaro Rebolledo-Mendez and Fotis Liarokapis are all researchers in the field of serious games and virtual worlds. George and Alex are researchers with a specialism in data management and integration technologies. Sara de Freitas, Genaro Rebolledo-Mendez and Fotis Liarokapis are from Serious Games Institute, Coventry University. George Magoulas and Alexandra Poulovassilis are from London Knowledge Lab, Birkbeck, University of London. Abstract Traditional approaches to learning have often focused upon knowledge transfer strategies that have centred on textually-based engagements with learners, and dialogic methods of interaction with tutors. The use of virtual worlds, with text-based, voice-based and a feeling of ‘presence’ naturally is allowing for more complex social interactions and designed learning experiences and role plays, as well as encouraging learner empowerment through increased interactivity. To unpick these complex social interactions and more interactive designed experiences, this paper considers the use of virtual worlds in relation to structured learning activities for college and lifelong learners. This consideration necessarily has implications upon learning theories adopted and practices taken up, with real implications for tutors and learners alike. Alongside this is the notion of learning as an ongoing set of processes mediated via social interactions and experiential learning circumstances within designed virtual and hybrid spaces. This implies the need for new methodologies for evaluating the efficacy, benefits and challenges of learning in these new ways. 
Towards this aim, this paper proposes an evaluation methodology for supporting the development of specified learning activities in virtual worlds, based upon inductive methods and augmented by the four-dimensional framework reported in a previous study. The study undertaken aimed to test the efficacy of the proposed evaluation methodology and framework, and to evaluate the broader uses of a virtual world for supporting lifelong learners specifically in their educational choices and career decisions. The paper presents the findings of the study and considers that virtual worlds are reorganising significantly how we relate to the design and delivery of learning. This is opening up a transition in learning predicated upon the notion of learning design through the lens of ‘immersive learning experiences’ rather than sets of knowledge to be transferred between tutor and learner. The challenges that remain for tutors rest with the design and delivery of these activities and experiences. The approach advocated here builds upon an incremental testing and evaluation of virtual world learning experiences.
Background
The widespread reporting of Second Life (SL)—a social virtual world—has helped to highlight the more general use of immersive worlds for supporting a variety of human activities and interactions, presenting a wealth of new opportunities and challenges for enriching how we learn (eg, Boulos, Hetherington & Wheeler, 2007; Prasolova-Førland, Sourin & Sourina, 2006), as well as how we work and play. In this way, SL, in common with other virtual world applications, has opened up the potential for users and learners, teachers and trainers, policy makers and decision makers to collaborate easily in immersive three-dimensional (3D) environments regardless of distance in real time. At the heart of the immersive experiences is the presence of the learner or user as an ‘avatar’ in the virtual space. This avatar represents the embodiment of the user in the virtual space and facilitates a greater sense of control within the immersive environments, allowing users to more readily engage with the experiences as they unfold in real time (Gazzard, 2009). The more general use of virtual environments over the last few years has been facilitated greatly through Web-based technologies and applications, as well as increasing broadband connectivity and computer graphics capabilities. Together, these allow a range of options in the context of education and training, not least sharing documents and files, holding meetings and events, networking and hosting virtual seminars, lectures and conferences, running research experiments, providing forums for sharing research findings and meeting international colleagues (eg, de Freitas, 2008). Such applications also have an even greater potential for integrating different technologies by supporting social software applications (eg, Facebook, Flickr and Wikipedia), presenting e-learning materials and content, and offering learners games and rich social interactions.
In addition, custom online virtual platforms originating mainly from universities and research institutes have also been developed for educational and learning purposes (eg, Liarokapis, Petridis, Lister & White, 2002; Liarokapis et al, 2004). These are more experimental prototypes and usually use dedicated hardware devices such as advanced visualisation (head-mounted displays, stereoscopic displays), interaction (3D mouse, orientation and position sensors) as well as haptics (gloves). However, the costs involved in these types of configurations are usually still very high, compared to the alternatives presented above. This flexibility of usage alongside potential global reach for users has led to a sudden and wide growth in the emergence of virtual world applications: in work preparing this paper, 80 virtual world applications were identified with another 100 planned by the end of 2009 (de Freitas, 2008). While not all of these virtual worlds have applicability for learning, and many are aimed at young children (eg, Club Penguin), the extent of the field, not just in terms of potential use for education and training, but actual usage and uptake by users, is extensive. For example, SL, a social open world, currently has 13 million registered accounts (as of March 2008). This paper, however, is focused upon how virtual worlds can be better understood and used specifically in the context of education and training, and here the use of SL for supporting seminar activities, lectures and other educational purposes has been documented in a number of recent studies and reports (eg, Dickey, 2005; Hut, 2007; Jennings and Collins 2008; for a list of examples of SL use by UK universities, see Kirriemuir 2008). Both the broad emergence and the applicability of immersive spaces for undertaking learning have led to wide interest from learning practitioners in finding out more about how they may be best deployed in the class and seminar room. However, the breadth of applications of virtual worlds, and their relatively swift emergence, has made this a challenging area for researchers and tutors (Hendaoui, Limayem & Thompson, 2008). The area is fragmented due to the nature of its cross-disciplinary appeal and the literature is dispersed around a range of disciplines. Suitably then, this study, undertaken as part of the Joint Information Systems Committee (JISC)-funded MyPlan project (see http://www.lkl.ac.uk/research/myplan), led by the London Knowledge Lab, University of London, set out to explore in a cross-disciplinary way how virtual worlds might be most effectively evaluated in relation to designed learning activities, and whether this evaluation methodology could be used as part of the design process and feed back into an iterative design of activities that could then be replicated by other researchers and learning practitioners. Underpinning this cross-disciplinary approach to the emerging field of serious games and virtual worlds, the authors in previous work have been attempting to reconceptualise ideas around learning, in particular away from more traditional approaches and towards a notion of learning as more centred upon experience and exploration.
To understand this, we are considering the role of multimodal interfaces (eg, 3D interfaces) and perceptual modelling (cognitive-based approaches) in our interactions with the environment and our social interactions with others, adopting an approach towards constructing learning experiences as a process of 'choreography' rather than one based around data recall strategies (de Freitas & Neumann, 2009). This approach reorganises how we produce and develop learning activities, with a greater emphasis upon learner control, greater engagement, learner-generated content and peer-supported communities, which jointly may increase learning gains. Work outlining an 'exploratory learning model' to support this experience-based and open-ended approach to learning in training contexts is outlined elsewhere (Jarvis & de Freitas, 2009a), and this paper aims to present the outcomes from a study undertaken to evaluate the efficacy of using SL as a platform for supporting lifelong learners. In particular, the study was testing the 'four-dimensional framework' developed in previous studies (de Freitas & Oliver, 2006).

Methodology
Literature searches have found few other evaluative frameworks for exploring the uses and designs of learning activities in virtual worlds, and those that exist are generally training-centred (eg, Fu, Jensen & Hinkelman, 2008). Therefore, this evaluation study adopted an inductive methodology, which requires researchers to construct theories and explanations based upon observations conducted using educational research approaches, including the use of survey data and observations (Gill & Johnson, 1997). A similar approach has been adopted in the Serious Games - Engaging Training Solutions project co-funded by the UK Technology Strategy Board, Selex Systems and TruSim (a division of Blitz Games), but this focused upon measuring the efficacy of game-based learning rather than virtual world learning activities (Jarvis & de Freitas, 2009a). The methodology was selected to address some of the wider issues of efficacy as well as to highlight some of the main issues and challenges arising from this approach to learning and support.

In addition to the inductive method, the study combined the use of the 'four-dimensional framework' to provide a more structured approach to the synthesis and analysis of the research findings. The four-dimensional framework has been proposed in previous studies and papers (eg, de Freitas & Oliver, 2006). The framework emerged from user studies with tutors and learners around the selection and use of game-based learning, but it has since been used to support the game design and development process (Jarvis & de Freitas, 2009a). In this study, we applied it to supporting other immersive experiences, in virtual worlds. The framework proposes four dimensions: the learner, the pedagogic models used, the representation used and the context within which learning takes place (see Figure 1).

Figure 1: The four-dimensional framework (source: Sara de Freitas, 2008)

The first dimension involves a process of profiling and modelling the learner and their requirements. This profile ensures a close match between the learning activities and the required outcomes. The emphasis upon the learner highlights the importance of the interaction between the learner and their environment.
For example, more naturalistic interactions may provide less of a gap in learning transfer. Information and communication technology (ICT) capabilities may affect the way that the learner interacts with the experience, and their ability to become immersed in the activities in the first place. Feedback to the learner is an important aspect of reflection upon learning and may be central to the most effective learning experience, or to the individual perception of effectiveness (eg, Jarvis & de Freitas, 2009b).

The second dimension analyses the pedagogic perspective of the learning activities, and includes a consideration of the kinds of learning and teaching models adopted alongside the methods for supporting the learning processes. This may include the use of associative models based upon task-centred approaches to learning and consistent with training methodology (eg, Gagné, 1965), and constructivist models of learning that involve building upon existing knowledge on the part of the learner (eg, Vygotsky, 1978). 'Situative' models of learning involve more socially constructed approaches to learning (eg, Wenger's model of communities of practice, 1998). The particular selection of learning theories may anticipate the types of learning outcomes that result. For example, it has been observed that immersive experiences based upon task-centred analysis and learning task construction result in task-centred outputs which, although effective, may be limited to more training-based contexts for learning. Also, certain forms may reinforce particular approaches more readily.

The third dimension outlines the representation itself: how interactive the learning experience needs to be, what levels of fidelity are required and how immersive the experience needs to be. The link between fidelity and learning has been well explored in the work around simulations, but what constitutes interactivity and immersion remains relatively under-researched and so presents challenges for researchers designing experiments. The representational dimension includes the 'diegesis' or world of the experience, and may affect levels of engagement and motivation.

The final dimension, context, concerns the place where learning is undertaken, for example in school or informal settings; it may also reflect the disciplinary context, for example which subject area is being studied and whether the learning is conceptual or applied. Context may also include the supporting resources used for learning. The interactions between the learner and their context are particularly important, as the learner may be present in a physical and a virtual space at the same time. These hybrid spaces are relatively unexplored in research terms, but may allow for different approaches to learning beyond those outlined here.

Each dimension has dependencies upon the others; however, jointly, the four dimensions provide a conceptual framework for exploring immersive learning and, we argue, have implications for learning design as a whole, particularly when applied to immersive learning environments. In part to test the efficacy of the framework and the methodology outlined, the study aims to explore this framework. For ease of use, the findings of the study are synthesised in relation to these four dimensions.
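As an illustration only, and not part of the original study, the four dimensions can also be read as a simple design checklist that a tutor or evaluator fills in when planning or reviewing an in-world activity. The sketch below is a minimal, hypothetical representation in Python; the class and field names are our own, and the example values are drawn loosely from the Learning Day sessions described later in this paper.

    from dataclasses import dataclass

    @dataclass
    class LearningActivityProfile:
        learner: dict          # dimension 1: learner profile and requirements (eg, ICT skills, game experience)
        pedagogy: dict         # dimension 2: learning and teaching models (associative, constructivist, situative)
        representation: dict   # dimension 3: interactivity, fidelity and level of immersion
        context: dict          # dimension 4: setting, discipline and supporting resources

    # Hypothetical profile for a blended Second Life session
    sl_learning_day = LearningActivityProfile(
        learner={"ict_skills": "mixed, self-rated 3.6-4.2 out of 5", "virtual_world_experience": "low"},
        pedagogy={"model": "constructivist, with social and situative elements"},
        representation={"platform": "Second Life", "interaction": "avatar movement and text chat"},
        context={"setting": "formal: college and university computer labs"},
    )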
Using SL to support planning lifelong learning
The JISC MyPlan project as a whole aimed to develop a personalised system for planning lifelong learning. The component of the study outlined in this paper aimed to explore the possibilities of using a virtual world for supporting lifelong learners in their career decisions and educational choices. In particular, we were interested to find out whether this method could support mentoring and social interactions for learners in a blended virtual context supplemented with face-to-face tutoring. The study was therefore designed as user studies with two defined groups of learners: learners studying at Birkbeck College on the IT Applications programme and learners from Hackney Community College studying on BTEC courses. The data collection methods for the study included pre- and post-activity surveys, video observations of the real-world and in-world sessions, recordings and chat logs. The study was undertaken with ethical considerations and active consent from the participants.

The sessions were held in two computer labs at Birkbeck College, University of London (BBK) and Hackney Community College, London (HCC), and in SL. Learner groups from both institutions were selected for the study. The learners from Birkbeck's IT Applications programme were mature part-time learners, all over 18 years of age, and were self-motivated learners. The learners from Hackney Community College were aged between 18 and 24 years and were studying for BTEC courses. The two groups offered significant contrast, allowing the researchers to test a range of different responses to the learning activities under exploration.

The Learning Day sessions were constructed to allow for some degree of structured activity and some degree of exploration on the part of the learner. The activities functioned as a method for highlighting the main issues arising from this mode of learning, and to aid with producing guidelines for tutors using the tools. Although the intention was that each learner would have access to the Internet, some learners' sessions at HCC were shared since not enough computers were available. User groups consisted of 7 learners at BBK and 14 at HCC. A tutor with experience of SL guided the sessions, which lasted between 2 and 3 hours. At the beginning and end of both sessions, individual learners were asked to answer an online survey.

Although factors outside our control altered the sessions (see below), they aimed to take the following structure: an introduction to the session, where the tutor introduces the session, explaining the timetable and answering any questions from the learners. This is followed by an introduction to SL, where the tutor takes learners through an induction into SL, including the creation of an avatar, movement around the virtual world and text chat functions. This is followed by sessions using a blended approach with face-to-face and virtual components. This includes the tutor and learners visiting the Universities & Colleges Admissions Service (UCAS) SL island, a session with a UCAS advisor in SL, a visit to the Serious Games Institute in SL and a short session with David Burden, an expert in SL, who discusses the merits of using SL (see Figure 2, where David Burden takes the participants on a virtual tour). The group then visits the IBM island, where they walk around and converse with an expert. To complete the session, the tutor holds a debrief meeting with the group, including a discussion about their experience and completion of the survey.

Figure 2: Meeting in-world in Second Life for the virtual tour (source: David Burden)
Modelling the learner and their learning experiences
As outlined above, the learner dimension provides a modelling of the needs and requirements of the learner and learner group, including their ICT capabilities. These cohorts were therefore surveyed. A total of 18 learners answered the pre-activity survey: 7 (38.89%) were BBK learners and 11 (61.11%) were from HCC. The average for self-rated ICT skills (using a scale from 1–5, where 1 = not very good and 5 = excellent) was 3.94, where BBK learners' skill was rated as 3.57 and HCC learners' skill as 4.18. The high self-rating for ICT skills, and in particular the fact that HCC learners rated their ICT skills considerably higher than BBK learners, may be attributed to the difference in age groups or the greater familiarity of younger learners with new technologies. Notably, though, our previous user studies had also found a high estimation of technological capabilities among mature learners (de Freitas, Harrison, Magoulas, Mee, Mohamad & Oliver, 2006).

The capabilities of the learners in using related games technologies were also surveyed. It was found that, in the user groups polled, 66.67% of learners play video games (28.57% from BBK and 90.91% from HCC), of which 70% from HCC play every day. Video games are played once a week by 50% of BBK learners and 10% of HCC learners; two to five times a week by 20% of HCC learners; and once a month by 50% of the BBK learners who play video games. The learners who play video games (when asked to select one or more options from the survey) answered that online games were the most popular (40% from HCC and 100% from BBK), followed by PC (30% from HCC) and console (30% from HCC and 50% from BBK). Other forms of video games played are mobile games, by 20% of HCC learners, and virtual games, by 10% of learners. HCC learners are heavy gamers, which may explain the fact that only a few (18.18%) had seen or experienced a virtual world. This is at least partly attributable to the comparatively higher numbers of users using multiplayer online games when compared with virtual worlds. Surprisingly perhaps, only 22.22% of the sample had used virtual worlds before. Broken down by institution, 28.57% of BBK learners and 18.18% of HCC learners had used this type of application before. All of the learners who had used virtual worlds previously had chosen SL, and none had used a different platform, such as Olive. All of these learners had used SL only once.

A total of 16 learners answered the post-activity survey (two learners from HCC left after the session without having completed it). All seven learners from BBK and nine from HCC completed this survey. When asked how much they had enjoyed the SL session (using a scale from 1–5, where 1 = didn't enjoy the session and 5 = really enjoyed the session), BBK learners averaged 3.14 while HCC learners averaged 3.22. The survey also asked learners how much they had enjoyed the different aspects of the sessions. The findings of the survey, including the Likert scores, are included in Table 1 below.

Table 1: A comparison of how well liked each aspect of the session was by each user group

Aspect of session                              BBK learners    HCC learners
The face-to-face sessions                      3               2.5
Using the SL application                       3.14            2.66
Creating avatars                               2.2             3.14
Moving in the virtual space                    2.42            2.75
The visit to the UCAS island                   2.83            3.125
The SGI presentations                          3               3.125
The visit to IBM's island                      2.85            3
Meeting the experts                            3.14            2.87
Interacting with your fellow learners in-world 3.5             3.14

BBK, Birkbeck College, University of London; HCC, Hackney Community College, London; SL, Second Life; SGI, Serious Games Institute; UCAS, Universities & Colleges Admissions Service.
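The whole-sample figures above follow from the per-group numbers weighted by group size. As an illustrative check only, not part of the original paper, the short Python sketch below reproduces two of the reported values from the cohort sizes given above (7 BBK and 11 HCC pre-activity respondents).

    # Illustrative check of the aggregate survey figures from the per-group values.
    n_bbk_pre, n_hcc_pre = 7, 11   # pre-activity survey respondents

    # Whole-sample mean of self-rated ICT skills (1-5 scale): weighted mean of the group means
    ict_mean = (n_bbk_pre * 3.57 + n_hcc_pre * 4.18) / (n_bbk_pre + n_hcc_pre)
    print(round(ict_mean, 2))       # 3.94, as reported

    # Share of learners who play video games: 2 of 7 BBK (28.57%) and 10 of 11 HCC (90.91%)
    gamers = (2 + 10) / (n_bbk_pre + n_hcc_pre)
    print(round(100 * gamers, 2))   # 66.67%, as reported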
More generally, the survey synthesis found that 43.75% of the sample (42.88% from BBK and 44.44% from HCC) would recommend the use of SL to their friends. However, when asked whether the SL sessions helped them to reflect upon their educational choices and career decisions, only 12.5% of the sample answered positively (14.29% from BBK and 11.11% from HCC). Nevertheless, when asked whether they would like to use SL or another virtual world as part of an educational environment for international collaboration with learners globally, the majority of the sample (81.25%) answered affirmatively (100% from BBK and 66.67% from HCC). This indicates that there were problems with the method we used for structuring the learning activities; for example, providing more time for feedback and reflection may have been advantageous.

Learning models and theories
The pedagogic dimension of the study design rested largely upon a posited constructivist model in which knowledge construction on the part of the learner was inferred. It was expected that the learners' experiences would build upon previous experiences, in particular previous experience of similar formats of learning and previous knowledge of career decisions and educational choices. However, this area of the study design perhaps presupposed too much prior knowledge on the part of the learner, and some learners found it difficult to engage with the virtual world. A more structured pedagogic model and more structured activities in-world may have been more effective, and this warrants further testing.

Existing constructivist theories of learning are being supplemented by new ones currently being piloted (eg, the exploratory learning model of de Freitas & Neumann, 2009), and the use of social virtual worlds such as SL favours social interactions. Therefore, although a more constructivist approach was favoured for the study, the findings seemed to point to greater strengths for supporting social learning. One college learner noted that: '[it] brings all people from every aspect of the world together and learn about each other [sic]'. A greater focus upon social interactions, and pedagogic models designed to support more socially focused activities, may be a better approach for future design. The strengths of the social virtual world need to be better reflected in learning design strategies. The strength of the system for supporting social interactions was underlined by a comment from one learner about the use of voice capability: 'we couldn't use voice on this trial, but I'm sure this would help quite a bit.' However, in some studies tutors have expressed a preference for using text interactions due to the ease of turn-taking when managing groups of learners in-world. Other studies with SL have demonstrated similar findings to this study.
In particular, the study undertaken by Dr Diane Carr observing the use of SL with Masters learners at the Institute of Education, UK, as outlined on the Learning in Social Worlds project blog (Carr, 2008), demonstrated some similarities, such as problems with using text chat, disorientation and ambiguity, the need to spend time getting used to the interface, and the complexity around structuring experiences that are useful for supporting learning. Carr summarises this:

A great deal of 'structuring' was going on during the sessions—the tutors' frantically [sic] use of Instant Messenger, for instance, that was not visible to the learners. Also, there were 2 or 3 tutors at each session, taking on different roles in relation to content and class management (Carr, 2008).

The study also pointed to strengths of using SL in terms of enhancing social interactions, which is useful for distance and online learners, adding a greater sense of 'presence' than traditional virtual learning environments such as Blackboard, with which the use of SL was compared (rather than face-to-face learning). Carr's study also found that some individual learners were unable to adapt to the use of virtual worlds. Our observation was that those unfamiliar with text chat had a particular disconnect from the application: as both the 3D interface and text chat were unfamiliar to them, they felt excluded from the session. Learners' capabilities with using the interfaces therefore do need to be considered in advance of using the technologies, and additional induction training may be needed in these cases, or alternative learning strategies (eg, Web-based) may be offered.

Usability, interactivity and accessibility
In the course of this study, the main consideration relevant to the representational dimension was the usability of SL. There were clearly issues with the technology, not least significant problems with connectivity and development work being undertaken at Linden Lab that day, which affected access to the system and had a negative impact upon the study findings. These technical issues had a clear impact upon the transfer of the learning experience, and the comments from learners underlined them. On the usability of the system, learners commented that 'movement was a bit sluggish, but I suppose that's more to do with the Internet connection I think.' One of the college learners noted: 'make it so it dont [sic] glitch as much and add a few more features to the island'. The connectivity problems were significant and led to comments from the learners such as 'a better Internet connection would have allowed us to have a "fuller" experience. I think that would have made it better'. These issues are significant, and tutors aiming to use SL would have to find coping mechanisms for the kinds of problems that occur with limited broadband, accessibility issues and regular maintenance work at Linden Lab. The newness of the technologies and the architectural issues with SL have led a group of open source developers to develop OpenSim (http://www.opensimulator.com), with the aim of developing a more scalable architecture and allowing the application to be hosted behind institutional firewalls, which would considerably reduce the technical issues experienced on the day of testing.
However, despite these difficulties, at least one of the mature learners could see real benefits for users of the application with disabilities:

I work with drama/theatre and people with a disability—acquired brain injury—who are on a programme getting them back to work. I think there are some really interesting possibilities in helping to develop confidence among such clients interacting virtually before or as an adjunct to 'real' life social interaction and skills development.

The representation of the virtual world itself can therefore have a negative impact upon learning, not least because of the level of expectation on the part of the learner. There is evidence that regular gamers find the graphics of virtual worlds too low level, and can experience negative transfer as a result. Learner expectation is a factor for tutors to deal with when using immersive worlds. However, if the activities are well structured and feedback is given by tutors to the learners, then there are possibilities for using the tools, in particular where social interactions and support may be required. The representation of the virtual world then creates an additional design tool for the tutor: once usability and accessibility issues are addressed, the tutor may explore learning through the interchange between the real and virtual representations, or hybrid spaces (and experiences). In this context, virtual worlds may be used as metaphors of learning or life experiences that can be reflected upon and interacted with in social groups.

Real and virtual contexts
There were wider contextual issues that affected the efficacy of the learning experiences, and these centred upon a lack of engagement with the virtual worlds due in part to specific learners' backgrounds and ages. For example, one or two learners did have problems relating to the format. One mature learner commented, 'I am afraid that I cannot relate to the virtual world'. Another learner commented that 'I think anyone new to SL would need someone to show them how to use it, as it is not intuitive to non computer games players.' The first learner also commented that 'my worry is that it would exclude people who weren't technologically sophisticated', and felt that: 'I can't relate to a virtual world and imaginary people; it makes me restless and want to be with real people.' Interestingly, this learner found it difficult to relate to the fact that the avatars were all human-driven, and felt distanced from the real people due to the interface and the use of avatars. This was compounded by the fact that the learner was not familiar with the process of text chat and found it alienating for communicating with others.

In addition, the study raised particular issues around accessibility and usability, including the quality of broadband connectivity and the user interface design. It is undeniable that using SL behind institutional firewalls is a difficult and imprecise undertaking, and negative first impressions can be off-putting to the extent that some will not return. As an indication of this, Linden Lab estimate that half of all users never return after their first hour in SL (Lorica, Magoulas & the O'Reilly Radar Team, 2008). However, for those who do, there are interesting applications that can be investigated (de Freitas, 2008).
It is worth considering that the learners were participating in a study situated at college and university, so the context of learning was strictly formal; it would be interesting to gauge the reactions if the study were undertaken in informal learning settings, at home or in work-based settings.

Discussion
While multiplayer games may have educational potential in the future, virtual worlds are generally regarded as having greater educational potential (de Freitas, 2008). Currently this is broadly because of the focus of activities. However, the method for comparing the benefits of structured activities in games with open-ended explorations of virtual worlds is an area in need of further research. Of interest here may be how to bring together the structured activities of games with the exploration and social power of virtual worlds. The motivational capacities of game-play, when brought together with the social interactions of virtual worlds, may be a powerful teaching combination in the future.

The wider trend of technical convergence between games technologies and educational uses is occurring in the shape of serious games and simulations. However, while simulations and games for learning are more established approaches and have more literature to accompany them, the use of virtual worlds for learning is still a relatively new field, and as this preliminary study has shown, there is a significant learning curve when using virtual world applications to support learning, both for tutors and for learners. The main impediment lies in the context and familiarity of the form. Indeed, factors such as where the virtual world is used and the past experience of users with the system are significant aspects ensuring or preventing effective use. Additionally, prior experience of gameplay may not be a positive factor, and may in fact have a negative impact upon learning with virtual world applications, as game players are used to much higher levels of fidelity and interactivity than are presently available in virtual worlds. With convergence this is in the process of changing, but as the testing session revealed, issues such as firewalls, graphics and hardware capabilities can significantly reduce the immersion of the experience and so reduce its effectiveness.

The technical issues did significantly impede the users' seamless experience and, in contrast with other studies, the least liked aspects of the interaction in SL were creating avatars and moving in-world. This was certainly due to extremely slow connections as a result of maintenance work that day at HCC, and to multiple users on the network at BBK, both of which caused slow download times. In general, the research indicates that control over avatars can be a critical aspect of allowing users to become engaged and motivated through the empowerment of controlling their own representation in-world, although, as Carr has indicated, for some learners this can be off-putting and produce a 'pain barrier' to be overcome. From our study, it was clear that the college learners felt more familiar with the process of avatar creation and that this did hold their attention: Figure 3 shows a college learner who had personalised his avatar within a few minutes of using SL, although he had no prior knowledge of SL.

Figure 3: Photo of one of the students participating in the study in Second Life (source: Sara de Freitas, 2008)
Younger learners are adapting to new approaches more readily, and concepts such as avatars and the customisation of one's avatar are integrated into their prior knowledge of online gaming. The research team experienced significant challenges with assessing and validating the efficacy of SL for supporting educational choices and career decisions, in terms of the methods of structuring the exercises, providing the best support for the learners, and the technical issues experienced by the users. While some learners were clearly engaged, more work is needed to find ways of engaging more learners through how the activities are structured, and greater support in advance of trialling is required. More rigorous frameworks and metrics would also be useful for supporting future efficacy studies. The research team would like to undertake further, larger and more longitudinal studies towards that end.

Reflecting on these difficulties, only a handful of the learners tested (12.5%) expressed that SL helped them to reflect upon their educational choices and career decisions. This indicates that the platform, at least in the format used with these users, would not be appropriate for mentoring learners. In particular, the technical issues around accessibility and usability were too jarring for the learners and got in the way of them appreciating the value of the form. Problems with SL, such as connection speed, difficulty in moving around, orientation, a lack of signposts and not being able to use voice as it would be used in a classroom setting, impeded the study. The HCC learners did visit the UCAS island, but needed more support in their interactions with the information there. They also thought more signposting on the island would be helpful. They enjoyed visiting the IBM island but also needed more support and guidance in-world; due to technical issues it was not possible to provide this. However, if the activities were better structured and the technical issues could be overcome, then the format may have potential for mentoring and other socially driven interactions and learning modes.

On the other hand, 81.25% of learners saw potential in using SL as part of an educational environment for international collaboration with learners globally. This indicates that there are other aspects of SL that may be used in the future for supporting socially based learning activities designed for lifelong learners. The social dimension of SL is clearly a powerful component of the format, and when the technology becomes more stable, and broadband and sufficient graphics capabilities can be guaranteed within institutions, it could be used for role play, mentoring and social skills acquisition.

The main lesson arising from this study is the need to evaluate the platform with a larger sample of learners. While this study is useful for defining some of the evaluation issues, larger numbers of learners would yield a richer dataset and more scope for analysis.
In addition, there is a need to consider the design criteria for more structured activities, to find ways to better orientate the learners and tutors in advance of the study, and to draw upon more concerted and experienced technical support and resources. While it was found that the inductive methodology of data collection was effective for providing information about the use of SL (in particular, the combination of chat logs, video footage and surveys was useful for providing a more multidimensional impression of the usage of SL), the use of in-depth semi-structured interviews with some of the participants would have been useful for adding a more qualitative dimension to the analysis of the study findings. A follow-up study examining the design, development and use of virtual worlds for tertiary education with lifelong learners would be helpful for validating this evaluation methodology. Moreover, a study using greater numbers of users to explore the patterns of use of modules being taught in SL, in particular with a comparison between face-to-face learner groups, pure distance or online learners and hybrid groups of both, would be desirable.

The use of immersive learning centrally implies a shift from considering and designing learning tasks to choreographing learning experiences as a whole, mediated by structured and semi-structured social interactions. This has implications for how the learning day as a whole is structured, in terms of requirements such as the duration of sessions, breaks, and the necessary facilities and technical support. It also has implications for pedagogic considerations, such as the learning theories and models applied, the role of the tutor and the context of learning. This shift merits consideration of learning experiences as involving social interactions between members of the learning group, supporting exploratory individual pathways, and identifying methods of tutoring that focus more upon mentoring and guiding development. Towards this end, tutors may analyse the learner group and consider their ICT skill levels, game experience and learning approaches. They may also consider the pedagogic approaches needed for the subject area taught, the learner group and the context of learning. Use of the four-dimensional framework can support this process, in terms of the selection of media used and the questions that tutors need to ask themselves when structuring and considering the most appropriate ways of integrating immersive learning into their plans.

Orientation is important for new users of virtual worlds, both to induct them into using the platform and to maximise their engagement with virtual worlds as a whole. As this study has demonstrated, those who are familiar with gaming and who use multiplayer games regularly often find the unstructured and open-ended aspect of virtual worlds difficult, as they are used to more structured and purposeful activities, and it can take a long while for them to adapt to these more open and exploratory social worlds. In order to support learners who are novices or regular game players, it would be useful to hold start-up sessions with learners in advance of the learning sessions to allow them to become orientated with the user interface. For example, sessions may be
held where learners log in remotely from home, allowing sufficient time for them to become used to the interface and minimising the technical issues. In addition, orientation sensors (eg, the Wiimote) may be used to allow for more tangible orientation in virtual social worlds.

Conclusions
This study set out with the intention of testing a virtual world using a pre-developed evaluation methodology and approach. The approach was based upon an assumption that learning experiences need to be designed, used and tested in a multidimensional way due to the multimodal nature of the interface. To support this, the four-dimensional framework was used with the inductive method to gather data and to synthesise and analyse the findings. As a whole, the approach has worked well in this first iteration, its main strength being that the use of the evaluation methodology allowed the research team to evaluate the learning experience according to specific criteria. The presented evaluation methodology may be used as a design tool for designing learning activities in-world as well as for evaluating the efficacy of experiences, due to its set of consistent criteria. The approach does augment the existing methods for evaluation, but needs to be tested with a larger sample and in wider contexts of use to verify its efficacy across different platforms.

While the study itself was affected by technical issues that were generally off-putting for those unfamiliar with virtual worlds, some benefits of using SL for supporting under-served learners, for engaging learners and for supporting distributed groups of learners were still highlighted, due to the engaging nature of the form and its international reach. While it is generally considered that improvements to the SL platform, and the advent of OpenSim and other new-generation virtual worlds, will significantly reduce many of the technical issues experienced by the learners, it is also recognised that such tools are still relatively immature and that more work needs to be undertaken to establish their most effective uses, to produce clear guidelines and to exploit their capabilities to the highest degree.

Particular strengths of the medium were highlighted; for example, the learners were positive about using the tools for supporting international collaboration, indicating the power of the tool for supporting distributed learning communities based upon shared interests. While the study has not proved conclusively the power of the tool for mentoring, the sessions with the mentor were very effective in practice, and in the future one-to-one sessions with mentors based abroad or otherwise not co-located could be further explored. However, more context and advance preparation are needed to situate the activities in-world, and greater time for reflection needs to be provided. Virtual worlds may also support peer collaboration and may be used, for example, for collaborative in-world assignments with practical outputs, such as designing a marketing campaign; work centring upon social interactions would be well served in this virtual world. Also, there is real potential for supporting online learning methods by extending the benefits of audio-graphic conferencing to provide a greater sense of presence, thereby potentially reducing non-completion rates.
The potential for using a social virtual world such as SL for supporting life decisions and educational choices has been established with this study, but thorough testing of sessions, appropriate technical support, the use of established and tested pedagogical principles and well-structured sessions are essential for providing enriched experiences that are properly contextualised for the learner. In particular, this immersive learning approach could work well with distance and online learners, distributed user groups or as an additional support for face-to-face learners. The use of virtual worlds may also need to be considered with respect to using a 'blend' of other media support mechanisms, such as videoconferencing and virtual learning environments, which may help to support the community-based and social collaborative strengths of immersive environments.

References
Boulos, M., Hetherington, L. & Wheeler, S. (2007). Second Life: an overview of the potential of 3-D virtual worlds in medical and health education. Health Information & Libraries Journal, 24, 4, 233–245.
Carr, D. (2008). Learning to teach in Second Life. Report for Learning from Online Worlds; Teaching in Second Life. Institute of Education/Eduserv Foundation, April 2008. Retrieved October 13, 2008, from http://learningfromsocialworlds.wordpress.com/learning-to-teach-in-second-life/
Dickey, M. D. (2005). Three-dimensional virtual worlds and distance learning: two case studies of Active Worlds as a medium for distance education. British Journal of Educational Technology, 36, 3, 439–451.
de Freitas, S. (2008). Serious virtual worlds: a scoping study. Bristol: Joint Information Systems Committee. Retrieved April 27, 2009, from http://www.jisc.ac.uk/publications/publications/seriousvirtualworldsreport.aspx
de Freitas, S. & Neumann, T. (2009). The use of 'exploratory learning' for supporting immersive learning in virtual environments. Computers and Education, 52, 2, 343–352.
de Freitas, S. & Oliver, M. (2006). How can exploratory learning with games and simulations within the curriculum be most effectively evaluated? Computers and Education, 46, 249–264.
de Freitas, S., Harrison, I., Magoulas, G., Mee, A., Mohamad, F., Oliver, M. et al (2006). The development of a system for supporting the lifelong learner. British Journal of Educational Technology (special issue: Collaborative e-support for lifelong learning), 37, 6, 867–880.
Fu, D., Jensen, R. & Hinkelman, E. (2008). Evaluating game technologies for training. IEEE Aerospace Conference. San Mateo, CA: Stottler Henke Assoc., Inc.
Gagné, R. M. (1965). The conditions of learning. New York: Holt, Rinehart & Winston.
Gazzard, A. (2009). The avatar and the player: understanding the relationship beyond the screen. In G. Rebolledo-Mendez, F. Liarokapis & S. de Freitas (Eds), Proceedings of the 1st IEEE International Conference in Games and Virtual Worlds for Serious Applications (pp. 190–193). Coventry, UK: IEEE Computer Society, 23–24 March.
Gill, J. & Johnson, P. (1997). Research methods for managers (2nd ed.). London: Paul Chapman Publishing.
Hendaoui, A., Limayem, M. & Thompson, C. W. (2008). 3D social virtual worlds: research issues and challenges. IEEE Internet Computing, 12, 1, 88–92.
Hut, P. (2007). Virtual laboratories. Progress of Theoretical Physics, 164, 38–53.
Jarvis, S. & de Freitas, S. (2009a). Towards a development approach for serious games. In T. M. Connolly, M. Stansfield & E. Boyle (Eds), Games-based learning advancements for multi-sensory human-computer interfaces: techniques and effective practices. Hershey, PA: IGI Global.
Jarvis, S. & de Freitas, S. (2009b). Evaluation of an immersive learning programme to support triage training. In Proceedings of the 1st IEEE International Conference in Games and Virtual Worlds for Serious Applications (pp. 117–122). Coventry, UK: IEEE Computer Society, 23–24 March. ISBN 978-0-7695-3588-3.
Jennings, N. & Collins, C. (2008). Virtual or Virtually U: educational institutions in Second Life. International Journal of Social Sciences, 2, 3, 180–186.
Kirriemuir, J. (2008). Measuring the impact of Second Life for educational purposes. Retrieved August 4, 2008, from http://www.eduserv.org.uk/foundation/sl/uksnapshot052008
Liarokapis, F., Mourkoussis, N., White, M., Darcy, J., Sifniotis, M., Petridis, P. et al (2004). Web3D and augmented reality to support engineering education. World Transactions on Engineering and Technology Education, 3, 1, 11–14.
Liarokapis, F., Petridis, P., Lister, P. F. & White, M. (2002). Multimedia Augmented Reality Interface for E-Learning (MARIE). World Transactions on Engineering and Technology Education, 1, 2, 173–176.
Lorica, B., Magoulas, R. & the O'Reilly Radar Team (2008). Virtual worlds: a business guide: 2008. O'Reilly Radar Report. Retrieved October 29, 2009, from http://radar.oreilly.com/research/virtual-world-report.html
Prasolova-Førland, E., Sourin, A. & Sourina, O. (2006). Cybercampuses: design issues and future directions. Visual Computing, 22, 12, 1015–1028.
Vygotsky, L. S. (1978). Mind in society (M. Cole, V. John-Steiner, S. Scribner & E. Souberman, Eds). Cambridge, MA and London: Harvard University Press.
Wenger, E. (1998). Communities of practice. Cambridge: Cambridge University Press.