9 Auxiliary Data: Events and Representations

This data analysis chapter differs from the earlier ones by adding to the eye-tracking data other data recorded from the participant. Recording auxiliary data along with eye tracking is a technical pursuit currently in its infancy; therefore only limited advice is available on the complex details surrounding the combination of equipment and complementary data representations. Because of the structure of this book, you will find information on how to deal with auxiliary data in two other parts of this book: if you plan to use auxiliary data, see pages 95-108. If you want to record auxiliary data, please read the section on pages 134-139. This chapter deals only with analysing auxiliary data in addition to eye-tracking data. Please note that the analysis of auxiliary data in isolation from eye-movement data is outside the scope of this book. If you are interested in analysing auxiliary data in itself, please refer to the large body of existing literature, such as Ericsson and Simon (1993) and Chafe (1994) for verbal data, and Luck (2005) for EEG data.

Co-aligning the data streams over time should be the first step. The continued analysis depends on whether the other data source has events, or whether its analysis requires processing data over a period of time. The chapter is structured as follows:

• In Section 9.1 (p. 286), we exemplify two common methods for co-analysis of data: first, using the eye-movement data as an onset marker, and analysing the data in the other channel from that onset; second, when there are events in the auxiliary data that can be associated with events in the eye-movement data, latencies can be calculated. We exemplify this analysis using studies combining eye-movement data with EEG, fMRI, motion tracking, verbal data, and keystroke data.

• Section 9.2 (p. 290) introduces a content-oriented co-analysis of eye-movement and verbal data. In this type of analysis, the eye-movement data have only been used to stimulate the elicitation of verbal data and are not intended for further analysis. Recording verbal data along with eye-movement data, however, also allows the researcher to co-analyse the two types of data as a form of methodological triangulation.

9.1 Event-based coalignment

Whether the auxiliary data consist of motion tracking, EEG, speech, or GSR recordings, the analysis with eye-tracking data starts by placing the two synchronized data streams side by side, and then executing some form of latency analysis. Typically, the onset of an event in one of the two data streams is taken as time 0. For auxiliary data in which events cannot be detected, the analysis takes as its starting point the onset of an event in the eye-tracking data, most commonly a fixation or a saccade. Figure 9.1(a) illustrates this case. Around the selected event, periods of analysis are defined, often called 'epochs' in EEG and fMRI terminology. Specific types of analysis are then done on the auxiliary data within those periods.

Fig. 9.1 Principles for the two types of co-alignment of eye-tracking and auxiliary data. (a) When the auxiliary data have no events, alignment is made at the point in time of the onset (t0) of an eye-tracking event, such as a fixation, saccade, micro-saccade, or smooth pursuit; periods for analysis of the auxiliary data are aligned with the onset of that event.
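To make the epoch idea concrete, the following is a minimal sketch, not taken from the book or from any particular analysis package, of how fixation-locked periods could be cut out of a regularly sampled auxiliary signal. The names (extract_epochs, aux_signal, fixation_onsets_s) and the 0-300 ms window are illustrative assumptions, and perfect synchronization of the two streams on a common clock is presupposed.

```python
import numpy as np

def extract_epochs(aux_signal, aux_rate_hz, fixation_onsets_s, window_s=(0.0, 0.3)):
    """Cut fixed-length analysis periods ('epochs') out of a regularly sampled
    auxiliary signal, aligned to fixation onsets (t0 = onset of the fixation)."""
    start_offset = int(round(window_s[0] * aux_rate_hz))
    n_samples = int(round((window_s[1] - window_s[0]) * aux_rate_hz))
    epochs = []
    for t0 in fixation_onsets_s:
        i0 = int(round(t0 * aux_rate_hz)) + start_offset
        if i0 >= 0 and i0 + n_samples <= len(aux_signal):
            epochs.append(aux_signal[i0:i0 + n_samples])
    return np.array(epochs)  # shape: (number of fixations, samples per epoch)

# Averaging the epochs, e.g. extract_epochs(...).mean(axis=0), gives a
# fixation-locked average of the auxiliary signal, comparable in spirit
# to averaging trials when computing an ERP.
```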
However, the selected 0-time event can just as well reside in the other data channel. For instance, Diderichsen (2008) investigated dwell times to AOIs in collaborative puzzle building just before and after the onset of the indefinite pronoun 'one'. Often, a latency analysis is made between the onset of eye-tracking events and the corresponding events in the auxiliary data. Latency analysis was first performed by Buswell (1920), and in this book Chapter 13 is entirely dedicated to latency measures. A crucial question, when there are many events in both the eye-tracking and the auxiliary data, is which event in one stream should correspond to which in the other stream. Solving this often requires using objects in the stimulus as alignment mediators. Note that co-alignment and latency analyses require that both recording systems, the stimulus presentation tools, and the stimulus monitor are synchronized, and that there are no internal system latencies in either type of recording system.

9.1.1 Alignment of eye-tracking events with auxiliary data

The simplest type of co-analysis is to take the onset of an event in one of the data streams as the onset for analysis of the other data stream. EEG co-analysis with eye-tracking data is shown in Figure 9.2. A fixation event starts at time 0, and during the period of its duration and for some time after it has ended (a total of 300 ms in this case) ERP components are analysed. This simple synchronization ties the ERP components to the object looked at during the fixation (Baccino & Manunta, 2005), allowing identification of what are known as eye-fixation-related potentials (EFRPs).

Fig. 9.6 A complex latency situation: data from a participant writing a text, the recording situation shown on page 54. The tiers show gazes on the monitor and keyboard, keystrokes writing the end of a sentence, periods of reading as detected by a reading detector (p. 267), and, at the bottom, the (x,y)-stream of raw data samples.

Table 9.1 Example of transcribed and segmented speech from a description of poor building quality in bathrooms.

(38) but it's a piece/ a part of the basin ...
(39) and it is placed on the bench ...
(40) which means that if water runs over here
(41) then you have to force it over the edge ... back again ...
(42) which is very natural ... for us ...
(43) so the sink actually rests on ...
(44) so if you have a cross section here ...

Eye-movement data are very similar to verbal data: both consist of a data stream over time, and both are composed of events. While the events in eye-movement data are the well-known fixations, saccades, etc., the events in verbal data are words, sentences, or meaningful parts. In this section we will see how these verbal events can be detected in verbal data, which is called 'transcribing and dividing into idea units', and how representations can be built out of these events, known as 'coding'. The section ends by reviewing measures that can be based on such transcribed and coded verbal events and representations. For further detail on the event-detection procedure in verbal data, see Chi (1997) or Ericsson and Simon (1993). Finally, this section will address open issues in the triangulation of verbal and eye-tracking data and what both research communities can learn from each other.

9.2.1 Detecting events in verbal data: transcribing verbalizations and segmenting them into idea units

When analysing the audio recordings, there are two possibilities. The first is to code the audio files by listening to them.
Such direct analysis of audio files is only possible for very simple coding schemata, for instance when you merely want to check whether certain terms were mentioned, by listening through the audio files and ticking the mentioned terms off on a checklist or in a software program designed for this purpose. Several software packages support this function (e.g. Crutcher, 2007). With a large number of possible verbal events and complex verbal processes, you quickly run the risk of losing your overview of the data.

The other option is to transcribe the verbal data into written text. This has many benefits: it is easier to grasp the context of single statements, easier to segment the data, and easier to compare the ratings of multiple coders (see next section). In particular, if you are interested in more detailed analyses of your data, there is no way around a time-consuming transcription. The transcription can be done in two ways: by typing or by using speech recognition software. If you are a good typist, typing is the preferred choice. Using a pedal that controls the sound player (pausing and playing) frees your hands for writing. Current speech recognition software may be helpful, but it needs to be trained on the voice to be transcribed, and it also has problems with surrounding sounds and different dialects.

Just as fixations are aggregated into dwells or scanpath events such as regressions and sweeps, the single words of verbal data are also aggregated into larger units. These segments may be of different granularity: an entire paragraph, a sentence, a turn between different people speaking, or a meaningful idea unit. For instance, Table 9.1 shows a piece of verbal data which has been transcribed and segmented into idea units. Segmenting verbal data into idea units is only one of several possibilities, but it gives units that are not only easy to read but are argued to correspond both to the flow of conscious ideas and to eye-movement data (Holsanova, 2001; Chafe, 1994).

Table 9.2 Some sub-types of speech, adapted from Chafe (1994). Substantive speech relates solely to factual information; contextual regulation indicates the speaker is remaining 'on topic'; interactional utterances signify engagement with another; cognitive verbalizations reflect thinking directly, possibly arrival at a solution; validational utterances reflect judgement of the likelihood that the information being conveyed is accurate.

Type of speech          Example
Substantive             "the voltage is very low"
Contextual regulation   "and then...", "well..."
Interactional           "mhm", "you know"
Cognitive               "let me see", "oh"
Validational            "maybe", "I think"

9.2.2 Coding of verbal data units

In eye-tracking data analysis, eye-movement events, like fixations, are usually aggregated into representations, like different AOIs. A comparable procedure is usually chosen when dealing with verbal data. The transcribed and possibly segmented verbal data must be coded, and may be scored in a next step. Coding refers to assigning each verbal data event to a category within a coding schema. This coding schema provides an overview of the categories that are important for investigating a certain research question. You can either re-use an existing coding schema from other researchers who have investigated a similar research question, or you can develop your own by means of a task analysis. After the verbal data have been coded, these coded events may be scored, i.e.
positive or negative values may be assigned within a code. For example, an utterance that is relevant for the task may be scored as a correct versus an incorrect statement. Here, "utterances relevant for the task" would be a category within the coding schema, while "correct versus incorrect" would be the score. How the transcribed data are coded and scored depends on the purpose of the study in which the data are recorded, and the decision is part of your overall experimental design: what is the question your study wants to answer, and what verbal events would confirm or refute your hypothesis? Hence, verbal data can be coded and scored according to very different systems, depending on the research perspective, as stressed in the following.

Exemplary coding schemas

The verbal data can be recorded either as performance data or as process data. If they are recorded as performance data, the data should be scored according to whether the utterances are correct or wrong, complete or incomplete, and whether the relevant technical terms are mentioned. This is often the case, for instance, for free recall or self-explanation data. If the verbal data were recorded as process data, they can be coded in several ways. They may be coded according to their verbalization level, to the level of processing, to learning and other cognitive processes involved, or based on a cognitive task analysis. The following bullet points expand on these terms, along with the tables that accompany them.

• A very common way to code verbal data is according to a prior cognitive task analysis. In his study of software users, Hansen (1991) categorized (substantive) speech events into three different categories ('cognitive', 'visual', 'manipulative'), and the same categories were used by Hyrskykari et al. (2008) and Holmqvist and Hanson (1999). Van Gog, Paas, Van Merrienboer, and Witte (2005) used 24 different categories for an electrical circuit problem-solving task. In a study where participants evaluated machine translation outputs, Doherty and O'Brien (2009) used five categories ('positive', 'negative', 'mixed', 'silent', and 'N/A'). Based on a task analysis of the steps needed to classify fish locomotion, Jarodzka, Scheiter, et al. (2010) coded verbal data according to categories of knowledge and understanding. In sum, if verbal data are scored according to a cognitive task analysis, the categories will vary across tasks and research questions.

Table 9.3 Major types of idea units appearing in speech when describing a scene; adapted from Holsanova (2008).

Function         Types of foci
Presentational   substantive foci: speakers describe the referents, states, and events in the scene.
                 summarizing foci: speakers connect scene elements with similar characteristics at a higher level of abstraction and introduce them as a global gestalt.
                 localizing foci: speakers state the spatial relations in the scene.
Orientational    evaluative foci: speaker judgements of the scene as a whole, the properties of the scene elements, or relations between picture elements.
                 expert foci: speaker judgements of scene genre or scene composition.
Organizational   interactive foci: speakers signal the start and the end of the description.
                 introspective foci and metatextual comments: thinking aloud; memory processes; procedural comments; monitoring; expressing the scene content on a textual metalevel; speech planning; speech recapitulation.
• A coding schema based on cognitive processes in speech, as in Table 9.2, was devised by Chafe (1994); it subdivides utterances into different categories, e.g. substantive, contextual regulation, interactional, cognitive, and validational. Holsanova (2001), building on Chafe's system, used seven different types of idea units (substantive, summarizing, localizing, evaluative, expert, interactive, and introspective) in her study of picture viewing (Table 9.3).

• Another possibility is to score the data according to the three verbalization levels in Table 9.4 (Ericsson & Simon, 1993). Note that the higher the verbalization level, the more it interferes with the primary task and slows it down.

• Verbal reports may also be scored according to different levels of processing (Craik & Lockhart, 1972). The exemplified levels of verbalization in Table 9.5 can be used to indicate whether a content was processed on a surface or on a deep level.

• Furthermore, verbal reports may be scored according to the learning strategies used: rehearsal, elaboration, overall control strategies, confirming comprehension, comprehension failure, and planning for further learning (Lewalter, 2003).

Table 9.4 Verbalization levels. Adapted from Ericsson and Simon (1993).

Level-1 verbalizations   The thoughts of the participant were verbalized directly. The content of working memory already existed in a verbal code. Thus, the verbalizations represent the direct content of working memory, without processes interposed between attending to the working memory content and its verbalization.

Level-2 verbalizations   The content of working memory initially existed in a non-verbal code. Thus, it needed to be converted into a verbal code before being verbalized. One example is verbalizations of mentally animating a picture of a dynamic system, like a pulley system. This level of verbalization is important to all research dealing with pictures, like scene perception, learning with multimedia, or inspecting advertisements.

Level-3 verbalizations   The participant's thoughts were filtered before verbalizing (most likely by instruction). Here, search and selection processes were interposed before the actual verbalization in order to compare the content of working memory to the to-be-reported information. This level of verbalization also occurs in verbalizing motor activities, like car driving, because those activities are normally not accessible to verbalization.

Table 9.5 Surface and deep levels of processing. Adapted from Craik and Lockhart (1972).

Surface level of processing   For instance, pure verbalizations where no higher level of understanding is inferred, e.g. reading a text word-for-word.

Deep level of processing      Verbalizations which indicate recognition of the concepts referred to in an abstract sense. Comprehension, and the ability to extract meaning and relatedness between the concepts referred to, are necessary requisites of deep levels of processing.

Irrespective of which coding system you use, you have to keep in mind that at least part of the data needs to be coded by two independent raters. Only if the inter-rater reliability is high may the remaining files be coded by a single rater. Otherwise you have to re-think your coding schema.
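As an illustration of such a reliability check, here is a minimal sketch, not taken from the book, of computing Cohen's kappa, a common chance-corrected agreement statistic, for two raters who have independently coded the same idea units. The category labels and codings are invented for the example; in practice a statistics package would typically be used.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters who
    have coded the same sequence of units."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal category proportions
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two raters coding the same ten idea units into hypothetical categories
rater_1 = ["substantive", "cognitive", "interactional", "substantive", "validational",
           "substantive", "cognitive", "contextual", "substantive", "interactional"]
rater_2 = ["substantive", "cognitive", "interactional", "cognitive", "validational",
           "substantive", "cognitive", "contextual", "substantive", "substantive"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.73; values close to 1 mean high agreement
```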
9.2.3 Representations, measures, and statistical considerations for verbal data

Besides simply counting the verbal units, data representations similar to those in eye-movement data can be built. As a coded verbal event is a state similar to an AOI dwell, transitions between verbal codes can also be counted, for instance the number of instances in which a contextual verbal event is followed by a validational verbal event. A transition matrix holding the number of each such transition can easily be calculated, and the measures we know from eye tracking can be applied to it.
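As a minimal sketch of this idea, and not a method prescribed by the book, the following counts transitions between consecutive verbal codes in a hypothetical sequence of coded idea units; the resulting counts can be arranged as a transition matrix and analysed much like an AOI transition matrix.

```python
from collections import defaultdict

def transition_matrix(coded_events):
    """Count transitions between consecutive verbal codes.
    Returns a nested dict: counts[from_code][to_code] = number of transitions."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(coded_events, coded_events[1:]):
        counts[current][following] += 1
    return counts

# Hypothetical sequence of coded idea units from one participant, in temporal order
codes = ["substantive", "contextual", "validational", "substantive",
         "contextual", "validational", "cognitive", "substantive"]
matrix = transition_matrix(codes)
print(matrix["contextual"]["validational"])  # 2: contextual followed by validational, twice
```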
Also, for both eye-movement data and verbal data, count and numerosity measures such as frequencies and rates can be reported, and different experimental groups can be compared statistically. To do so, you have to keep the following in mind. The coded verbal data can have different levels of measurement, depending on the coding. If you have a coding without a particular order, the level of measurement is nominal, for instance if you code whether the participant mentioned one of n possible objects. If you have a coding that can be arranged into a meaningful order, the level of measurement is ordinal. This is the case if you assigned school marks (i.e. A, B, C) to the verbal codes, since you used them as a performance measure. If you have a coding that not only has an order, but in which the difference between two adjacent codes is always equal, then the level of measurement is interval. This might be the case, for instance, if participants have to recall a certain number of items. If your coding additionally has an absolute zero point, then you have a ratio level of measurement. This is the case, for instance, when counting the number of words. Note that only in the case of interval and ratio data can you use parametric statistical tests. In the other cases you have to use non-parametric tests (p. 90). It has not yet been systematically investigated exactly how extensive the similarity in analysis methods really is between eye-movement and verbal data.

9.2.4 Open issues: how to co-analyse eye-movement and verbal data

As we have seen, both data types pass through a similar procedure, from raw data over event detection and representations to measures. At each of these levels the two data types can be compared. The first section in this chapter referred to a comparison on the event level, namely eye-voice latencies. Other research often triangulates the two data types on the representation level. For instance, it may be compared whether participants verbally mention an area as often, or in the same order, as they look at it (e.g. Jarodzka, Scheiter, et al., 2010). That said, it has to be kept in mind that both data types are very rich and require a vast amount of skill, effort, and time from the analyst. Hence, a researcher usually places the emphasis on one of these data sources, and very seldom are both data sources investigated to their full potential. The big open issue in co-analysing verbal and eye-tracking data is to triangulate both in an equally elaborate manner. To achieve this, the eye-movement and the verbal data communities must start to recognize each other's presence far more and to learn from each other.

9.3 Summary: events and representations with auxiliary data

We have learned from this chapter that eye-tracking and auxiliary data sources can be analysed in conjunction. In doing so, the following steps are taken:

• Both data sources are aligned, either according to their onset or offset, or via a mediator.

• The latency events are established, with their onset time, duration, and references to the two events between which the latency is calculated.

Moreover, we learned that eye-tracking and verbal data can be triangulated. The triangulation can in principle take place at each level of the similar data preparation that the two data types go through:

• As raw data, verbal data are audio files, while eye-tracking data are timestamps and (x,y)-coordinates. While in eye-tracking data events like fixations are detected with algorithms, in verbal data the event detection is usually done manually, by identifying so-called idea units.

• Next, the verbal data are coded according to a schema selected (or developed) by the researcher.

• Finally, representations can be built of the verbal data, and measures calculated on these representations.

Part III Measures

In Part II we gave detailed descriptions of how to go about calculating oculomotor events from raw data samples, and also how to build meaningful representations from these eye-movement events. Part III of this book provides a comprehensive taxonomy of eye-movement measures. 'Measures' can be thought of as precisely quantifiable data which can be calculated taking events and/or representations as input. Measures are the dependent variables we explained in Chapter 3. From here, statistical analysis can be performed, allowing you to understand what your data mean in relation to your experimental design. Measures are thus the more quantitative counterparts complementing events and representations, which are more qualitative in nature. Some 120 measures, drawn from a vast literature search, are discussed and evaluated in Part III. The measures covered are grouped thematically according to commonalities in metrics. There are many overlapping and interchangeable terms concerning eye-movement measures, and our aim is to cover the vast majority of these terms and find the most appropriate fit for each in our taxonomy. It is therefore likely that if the reader does not find a measure addressed in a particular chapter of Part III, it will be addressed in another chapter, and the reasoning for this placement has been thoroughly considered (see p. 463).