How to Evaluate (your) Visualizations
PA214 - Visualization II
Vit Rusnak

Talk Outline
• Methodologies for visualization design
• Evaluation categories
• Understanding the tool vs. understanding the processes
• Evaluation without users vs. with users
• Some tips and tricks for doing the evaluation

Why do we evaluate visualizations?

Five Design Sheets
[Diagram: the Five Design Sheets methodology. (a) Sheet 1 (ideation): ideas, filter, categorize, combine & refine, question. (b) Sheets 2-4 (designs): layout, information, operations, focus/parti, discussion. (c) Sheet 5 (realization): layout, information, operations, focus/parti, detail.]

Likert Scales
• Be careful with averaging (the median is often better for ordinal data)
• Even vs. odd number of options (an even number forces a choice; an odd number offers a neutral middle)
[Example questionnaire, each statement rated strongly agree / agree / disagree / strongly disagree:
1. The website has a user-friendly interface.
2. The website is easy to navigate.
3. The website's pages generally have good images.
4. The website allows users to upload pictures easily.
5. The website has a pleasing color scheme.]
Source: https://en.wikipedia.org/wiki/Likert_scale

Standardized Usability Questionnaires
"Questionnaires designed for the assessment of perceived usability, typically with a specific set of questions presented in a specified order using a specified format with specific rules for producing scores based on the answers of respondents."
J. Sauro, J. R. Lewis, Quantifying the User Experience, 2016
• Post-task: SEQ, SMEQ, ER, NASA-TLX, ...
• Post-study: SUS, UMUX, SUMI, PSSUQ, ...
• Benefits: objectivity, replicability, quantification, economy, generalization, communication

Post-task: Examples
Single Ease Question (SEQ)
• "Overall, this task was ..." rated on a seven-point scale from Very Easy to Very Difficult
Source: Sauro J. and Dumas J. S., Comparison of Three One-Question, Post-Task Usability Questionnaires.
Subjective Mental Effort Question (SMEQ)
• A single 0-150 scale with nine verbal anchors from "Absolutely No Effort" to "Extreme Effort"
Source: Sauro et al., Subjective mental effort questionnaire.

Post-study: Examples
System Usability Scale (SUS)
Ten statements, each rated on a five-point scale from Strongly Disagree (1) to Strongly Agree (5):
1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.

Usability Metric for User Experience (UMUX) and UMUX-Lite
Four statements, each rated on a seven-point scale from Strongly Disagree (1) to Strongly Agree (7):
1. [This system's] capabilities meet my requirements.
2. Using [this system] is a frustrating experience.
3. [This system] is easy to use.
4. I have to spend too much time correcting things with [this system].
Source: Finstad K., The Usability Metric for User Experience.
UMUX-Lite: same 7-point Likert scale, only two questions:
• This system's capabilities meet my requirements.
• This system is easy to use.
Source: Sauro J., Measuring Usability: From the SUS to the UMUX-Lite. MeasuringU
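The slides list these questionnaires without their scoring rules. As a concrete illustration, here is a minimal Python sketch of the standard published SUS formula and the raw UMUX-Lite formula; the example response vectors below are invented:

```python
def sus_score(responses):
    """System Usability Scale score (0-100).

    `responses` holds the ten answers in questionnaire order, each on
    the 1-5 scale. Odd-numbered items are positively worded and
    contribute (answer - 1); even-numbered items are negatively worded
    and contribute (5 - answer). The raw 0-40 sum is rescaled to 0-100.
    """
    assert len(responses) == 10
    raw = sum((r - 1) if i % 2 == 1 else (5 - r)
              for i, r in enumerate(responses, start=1))
    return raw * 2.5


def umux_lite_raw(capabilities, ease_of_use):
    """Raw UMUX-Lite score (0-100) from the two 7-point items."""
    return 100 * ((capabilities - 1) + (ease_of_use - 1)) / 12


print(sus_score([5, 1, 5, 2, 4, 2, 5, 1, 4, 1]))  # 90.0 (invented answers)
print(round(umux_lite_raw(6, 7), 1))              # 91.7 (invented answers)
```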
Case Studies
• "A detailed reporting about a small number of individuals working on their own problems in their normal environment"*
• Case study != usage scenario
• Four key aspects:
  • in-depth investigation of a small number of cases (often up to 5)
  • examination in context (how participants use the tool in their natural setting, not in a lab study)
  • multiple data sources
  • emphasis on qualitative data and analysis (which raises validity and reliability concerns)
• Summarized feedback (feature requests, participants' opinions on the tool's functions and limits, and its applicability to their work)
* B. Shneiderman and C. Plaisant. 2006. Strategies for evaluating information visualization tools: multi-dimensional in-depth long-term case studies. BELIV '06

Goals of Case Studies
• Exploration: understanding novel problems or situations
• Explanation: developing models that can be used to understand a context of use
• Description: documenting a system, technology use (in context), or a process
• Demonstration: showing how the tool was successfully used

Case Study Design
• There are four main components of a case study design:
  • (research) questions: What are you interested in?
  • hypotheses or propositions: What do you expect to find?
  • units of analysis: What are you focusing on?
  • data analysis plan: Which data will you collect, and how will you process them?

Evaluation Workflow
• Preparation: goal and method, data and forms, workflow, dry run
• Execution: introduction, demonstration, familiarization, debriefing
• Interpretation: data analysis, outcomes
• Summarization: publication (research paper, tech report, ...)

Preparation
• Set the goal, then choose the method (with/without participants)
• Prepare data and related documents: datasets, consent forms, questionnaires (pre-, post-)
• Always do a pilot or dry run => it identifies unexpected problems
• Make a checklist: always follow the same steps
• Get the participants

Participants
• People participating in the experiment (don't call them "subjects")
• How many?
  • Short answer: use the same number as used in similar research
  • Too many: unnecessary work
  • Too few: failure to get statistically significant results => paper rejection

Execution
• Follow the checklist
• Do not change the experiment design or conditions after starting it
• Use different datasets for the practice trials and the main experiment
• With participants:
  • Get consent first, debrief participants afterwards
  • Record audio/video and mouse traces; take notes

Evaluator's Toolbox

Within vs. Between Subjects: Advantages & Limitations
[Diagram: within-subjects, the same person tests both Site 1 and Site 2; between-subjects, Person A tests Site 1 and Person B tests Site 2.]
Within-subjects design: the same participant tests all conditions corresponding to a variable.
+ Smaller sample size
+ Effective isolation of individual differences
+ More powerful tests
- Hard to control the learning effect
- Large impact of fatigue
Between-subjects design: different participants are assigned to different conditions corresponding to a variable.
+ Avoids the learning effect
+ Better control of confounding factors (e.g., fatigue)
- Requires more people
- Harder to get statistically significant results
- Large impact of individual differences
Source: https://www.nngroup.com/articles/between-within-subjects/ (NN/g)
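To make the two designs concrete, here is a minimal Python sketch of how participants could be assigned under each design, assuming the two-site scenario from the NN/g example above (the participant IDs are invented):

```python
import random

participants = [f"P{i:02d}" for i in range(1, 13)]  # hypothetical IDs
conditions = ["Site 1", "Site 2"]

# Between-subjects: each person is randomly assigned to exactly one
# condition, so learning cannot carry over between conditions.
shuffled = random.sample(participants, len(participants))
between = {cond: shuffled[i::len(conditions)]
           for i, cond in enumerate(conditions)}

# Within-subjects: every person sees every condition; the per-person
# order is randomized to spread out learning and fatigue effects.
within = {p: random.sample(conditions, len(conditions))
          for p in participants}

print(between)  # e.g. {'Site 1': ['P07', ...], 'Site 2': ['P03', ...]}
print(within)   # e.g. {'P01': ['Site 2', 'Site 1'], ...}
```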
Counterbalancing
• Carryover effect: the effect of one condition "carries over" into the next one; common in within-subjects designs (e.g., the learning effect)
• Counterbalancing = compensating for carryover effects by varying the order of tasks or datasets used in the experiment
• (Pseudo)randomized order: a different one for each participant
• Latin square: an n×n array filled with n different symbols, each occurring exactly once in each row and each column (like Sudoku)
  • Problem with odd-ordered squares (from order 3 up): carryover between consecutive conditions stays unbalanced
  • Solution (for even orders only): the balanced Latin square, where each condition also immediately follows every other condition equally often
  • Online generators exist (a small generator sketch follows below)
[Diagram: six groups receive the conditions in different orders, each order followed by a posttest.]
Source: https://tophat.com/marketplace/social-science/psychology/-/research-methods-in-psychological-science-laura-freberg/736/68472/
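As a stand-in for the online generators mentioned above, here is a minimal Python sketch of the standard balanced-Latin-square construction, using condition indices 0..n-1 in place of tasks or datasets (the construction is standard; the function name is ours):

```python
def balanced_latin_square(n):
    """Row orderings of n conditions (0..n-1) for n groups.

    Each condition occurs exactly once per row and per column, and it
    immediately follows every other condition equally often, which
    balances first-order carryover effects. The construction is only
    valid for even n; for odd n, use each row plus its reversal.
    """
    if n % 2:
        raise ValueError("even n required; for odd n also reverse each row")
    # Zig-zag offset pattern 0, 1, n-1, 2, n-2, ... makes every pair
    # of adjacent columns differ by a distinct amount (mod n).
    offsets, lo, hi = [0], 1, n - 1
    while len(offsets) < n:
        offsets.append(lo)
        lo += 1
        if len(offsets) < n:
            offsets.append(hi)
            hi -= 1
    return [[(i + off) % n for off in offsets] for i in range(n)]


for row in balanced_latin_square(4):
    print(row)  # [0, 1, 3, 2], [1, 2, 0, 3], [2, 3, 1, 0], [3, 0, 2, 1]
```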
Consent Form
• Who you are
• What you are asking the participants to do
• What kind of data you will be collecting and how it will be used
• What rights the participant has
• Whether they will be compensated
• The participant must explicitly say "yes" to the consent form
[Example: a scanned informed-consent form from a Simon Fraser University study evaluating an interactive computer system for data visualization. It covers voluntary participation, confidentiality and handling of audio/screen recordings, the right to withdraw at any time, the session activities (pre-task questionnaire, training session, task with the prototype, post-task questionnaire; about one hour), a contact for complaints, and signature fields.]
Source: D. Hepting, "A New Paradigm for Exploration in Computer-Aided Visualization", Dissertation thesis, Simon Fraser University, 1999.

Color Perception Test
• Shinobu Ishihara, 1917
• Ishihara plates: a diagnostic test for color perception deficiencies
• 38 plates (full set); shorter variants with 10, 12, or 24 plates

Statistical Evaluation
Descriptive statistics
• Summary of a data set's characteristics
• Mean, median, mode, standard deviation, spread, central tendency, ...
Inferential statistics
• Infers properties of a population based on sample data
• Testing hypotheses and deriving estimates
• Parametric (t-test, ANOVA) and non-parametric tests
• Concrete methods are out of the scope of this talk
• Further reading: Statistical Methods for HCI Research

Take away
• In SciVis, InfoVis, VAST, we mostly do: qualitative result inspection, algorithm benchmarking (algorithmic performance), quantitative user performance studies, and qualitative user experience studies
[Chart: total numbers and percentages of evaluation scenarios in IEEE Vis/SciVis papers (0-500 scale); scenarios covering work processes, analysis & reasoning, case studies, collaboration, and communication are comparatively rare.]
• Contribution of real users is invaluable but also painful (involve them ASAP)
• Use methodologies and best practices from the field (learn from papers)
• Evaluation methods are similar to (often the same as) those in HCI
Chart source: http://tobias.isenberg.cc/personal/papers/Isenberg_2013_SRP_Slides.pdf

References
1. J. Lazar, J. H. Feng, and H. Hochheiser. Research Methods in Human-Computer Interaction. Wiley Publishing, 2010.
2. I. S. MacKenzie. Human-Computer Interaction: An Empirical Research Perspective. 1st ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2013.
3. J. Sauro and J. R. Lewis. Quantifying the User Experience: Practical Statistics for User Research. 2nd ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2016.
4. J. W. Creswell and Ch. N. Poth. Qualitative Inquiry and Research Design: Choosing Among Five Approaches. 4th ed. SAGE Publishing, 2018.
5. T. Isenberg et al. "A Systematic Review on the Practice of Evaluating Visualization." IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 12, pp. 2818-2827, Dec. 2013.
6. B. Shneiderman et al. Designing the User Interface: Strategies for Effective Human-Computer Interaction. 6th ed. Pearson Global Edition, 2018.