Integrating Dialogue Systems with Images

Jaromír Plhák, Ivan Kopeček and Radek Ošlejšek
{xplhak, kopecek, oslejsek}@fi.muni.cz
Faculty of Informatics, Masaryk University, Brno, Czech Republic

© J. Plhák, I. Kopeček, R. Ošlejšek, FI MU Brno
TSD 2012, Brno

Motivation

- Looking at this photo from a holiday approximately three years ago: It is somewhere in the Czech Republic, but where exactly? And what is the name of that castle? And who is the man behind your girlfriend? ...

Figure 1: Photo from a holiday

Communicative image: an "intelligent" image that is able to discuss its content with the user.

- Pernstejn castle
- Such pieces of information are virtually inaccessible. However, some relevant pieces of information can be retrieved using current information technologies. GPS coordinates allow us to determine where the photo was taken. Face recognition may help reveal the identities of persons. The orientation may help to identify some objects in the picture (e.g. the peak in the background).
- The idea of communicative images presented in this paper lies in enabling people to communicate with images, i.e. enabling users to easily get relevant pieces of information from the images while enabling the images to gather relevant information about themselves from the users.

Key Concepts

- Dialogue-based communication in natural language
- Ontology-based knowledge base
  o Web Ontology Language (OWL)
- Scalable Vector Graphics (SVG)
  o Raster image and object annotation
  o owl:imports
- Information sources and algorithms
  o EXIF data; face detection; similarity search algorithms
  o Information from users
  o Crowdsourcing

- At the beginning, we need to build a communicative image from a classical photo or image.
- SVG supports structured annotation data.
  * Either 2D points lying in the middle of the relevant objects or invisible (transparent) polygons determining approximate silhouettes can be used.
  * owl:imports: ontologies are imported rather than copied, so the knowledge base can be shared by many pictures. On the other hand, concrete annotation data, i.e. concrete values of the properties prescribed by the ontologies, are stored directly in the SVG format in the form of XML elements.
- Once an image is transformed into the SVG format, the system tries to acquire as much information about the image as possible, using auto-detection and image recognition techniques, e.g. face detection and recognition algorithms, similarity search algorithms searching in large collections of tagged pictures, EXIF data extraction from photos, etc.
- After this initial stage, the user is informed about the estimated content and invited to confirm or refute the information and to continue with questioning.
- New pieces of acquired information are stored in the image ontology and reused in subsequent interactions.
- Points in the middle of the relevant objects
- Transparent polygons for approximate silhouettes

Ontologies

- Graphical ontology
  o General visual characteristics
    • Unusual size
    • Typical shape
    • Dominant color
  o Navigation
    • General and mutual position
- Domain-specific ontologies
  o Family and friends
  o Sights
  o Nature

ICCHP-ULD 2012, Linz

Figure 2: Part of graphical ontology

- We have developed a basic graphical ontology which restricts abstraction to the aspects that are suitable and usable for dialogue-based investigation of graphical content.
- This ontology focuses on generic visual characteristics of graphical objects and makes it possible to define several typical or unusual aspects, e.g. unusual size, typical shape, dominant color, etc.
Using this ontology, the annotator can say that some object is "mostly red, oval and unusually big", for instance.
- The ontology includes a navigational part as well, which makes it possible to describe either the general or the mutual position of objects in the picture.
- Although the graphical ontology covers basic visual characteristics that are useful for generic dialogue interactions, verbal descriptions of domain-specific pictures require specialized ontology extensions that help to generate domain-specific dialogues. For example, a domain model, Family, provides vocabulary and background knowledge to classify people by their relationships, similarly to the popular "circles" known from social networks. People in a photo are identified and assigned to "circles" (e.g. my family, friends, colleagues, etc.).

Dialogues with Images

- Communication modes
  o Information retrieval mode
  o Image information supplementing mode
  o Free communication mode
- Communication analysis
  o Small fragment of natural language
    • Relatively simple grammars
    • Frames technology
    • Standard techniques for resolving misunderstandings

How far is it from this hotel to the nearest beach?
How far is it from SLOT1 to SLOT2?

1. The information retrieval mode (the mode with the user's initiative). In this mode the user just asks questions and gets answers.
2. The image information supplementing mode (the mode with the image's initiative). This mode is used for supplying the image with missing pieces of information.
3. The free communication mode (the mode with mixed initiative). In this mode, there are no a priori limitations on the user or the image. Both the user and the image can ask questions and give answers.

In each mode, the user can switch to another mode.
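As an illustration of the question and template pair shown on the slide above, the following minimal Python sketch matches a user question against a template with two slots and checks the slot fillers against the objects known for the image. It is only a sketch under stated assumptions, not the authors' implementation: the regular-expression matching, the function names compile_template and fill_frame, and the KNOWN_OBJECTS set are all hypothetical stand-ins for the grammar-based and frame-based engine.

```python
import re
from typing import Optional

# Hypothetical set of annotated objects; in the described system these would
# come from the image annotation and its ontologies.
KNOWN_OBJECTS = {"this hotel", "the nearest beach", "the castle"}

# Template taken from the slide; SLOT1 and SLOT2 are the open slots.
TEMPLATE = "How far is it from SLOT1 to SLOT2?"


def compile_template(template: str) -> "re.Pattern[str]":
    """Turn a template with SLOTn placeholders into a regular expression."""
    parts = re.split(r"(SLOT\d+)", template)
    pattern = "".join(
        # Each placeholder becomes a named, non-greedy capture group;
        # everything else is matched literally.
        f"(?P<{part.lower()}>.+?)" if re.fullmatch(r"SLOT\d+", part) else re.escape(part)
        for part in parts
    )
    return re.compile(pattern, re.IGNORECASE)


def fill_frame(question: str, template: str = TEMPLATE) -> Optional[dict]:
    """Fill the template slots from the question, or return None if it does not match."""
    match = compile_template(template).fullmatch(question.strip())
    if match is None:
        return None
    slots = {name: value.strip().lower() for name, value in match.groupdict().items()}
    # Slot fillers that are not known objects would need clarification.
    unknown = sorted(value for value in slots.values() if value not in KNOWN_OBJECTS)
    return {"slots": slots, "unknown": unknown}


if __name__ == "__main__":
    print(fill_frame("How far is it from this hotel to the nearest beach?"))
    print(fill_frame("Who is in the picture?"))
```

A slot filler that is not among the known objects ends up in the "unknown" list, which is the point where a misunderstanding-resolution step could ask the user to rephrase or to supply the missing object.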
Typically, we can restrict ourselves to a small fragment of natural language, so that the engine can be based on relatively simple grammars in combination with frame technology and standard techniques for resolving misunderstandings. For instance, the question "How far is it from this hotel to the nearest beach?" is resolved using the template "How far is it from SLOT1 to SLOT2?". The system expects both SLOT1 and SLOT2 to be filled by specific "object" entries.

Dialogues with Images

- Grammar generator
  o General ontology structure (concepts and relations)
  o Set of objects in the image
  o Restrictions on the given slots (cardinality or value type)
  o Current dialogue strategy
  o Dialogue context
  o History of the communication

Figure 3: Example of application structure

The dialogue manager (DM) prunes the redundant patterns and resolves the conflicts.

Dialogues with Images

Figure 4: Example of a dialogue communication in information retrieval mode

S: This is a photo of a group of your friends standing in front of the Pernstejn castle. Now, you can open and view the photo or start the communication with the image.
U: Open the image, please.
S: /opens the photo/ File is opened. Now, you can start the communication with the image.
U: What is the resolution of this picture?
S: The resolution of this picture is 8 MPix. Enter your next query, please.
U: Who is in the picture?
S: There is a group of your friends: George, John and Elisabeth. There are also four other groups of tourists.
U: Give me the detailed information about this castle.
S: Pernstejn Castle is a castle located on a rock above the village of Nedvedice and the rivers Svratka …
U: Thank you, bye.
S: Good bye!

Implemented Modules

- GateDialogueBean
  o Dialogue manager
- GateOwlBean
  o Processing of ontologies
  o OWL reasoner Pellet
- GateSvgBean
  o SVG processing
  o Batik library
- Client scripts
  o Web browser plug-in

Application Domains

- Personal photos and albums
  o Organize
  o Search
- E-learning study materials linked to a domain-specific knowledge base

S: In the picture there is the chemical structure of antioxidant resveratrol.
U: What is antioxidant resveratrol?
S: Antioxidant resveratrol consists of two benzene nuclei and three hydroxyl groups.
U: What is a benzene nucleus?
S: Benzene nucleus … /The system uses the definition from the chemical ontology/

Fig. 2: Antioxidant resveratrol

- Chai et al. proposed an intelligent photo album that makes it possible to organize and search a collection of family photos by means of ontologies and SWRL questioning. If communicative images are implemented in this scheme, we see a straightforward way to enhance its functionality. Because the photos in the album are organized by means of an OWL ontology, it might be possible to employ the mechanism of generating dialogues from domain ontologies. In this way, the user could organize photos via dialogue as well.
- E-learning is another field that could benefit from communicative images. A very important feature of the presented concept is that, because it is based on formal ontologies, it is fully compatible with the Semantic Web paradigm and at the same time fully supports multilinguality.

Application Domains

- Internet graphics exploration
  o Social networks
  o Electronic news
  o Web page editing
- Applications for people with special needs
  o Visually impaired people
  o Older adults
  o People with any constraint in communication

- The concept of communicative images seems to be a promising approach to making graphics accessible for visually impaired people, older adults or people with any constraint in communication.
- It provides users with special needs with another way to explore graphics on the Internet and gives them more detailed information about photos in social networks or electronic news.
- The communicative images paradigm depends on users' cooperation and data sharing. Annotations and knowledge gained during the interaction of one user are used to enhance and simplify the interaction of another user. Crowdsourcing represents a Web 2.0 technology that could support this paradigm. If content cannot be recognized by auto-detection techniques, a blind user can ask a community of sighted users for help, e.g. to describe "what is that building in this picture". Their answers are used to annotate the picture and extend the knowledge base in the same way as the results of machine-based recognition.
- Moreover, the communicative images paradigm makes it possible to build other useful applications. Integrating communicative images with dialogue-based web editing would enable visually impaired people to create their personal web pages with graphical content by means of dialogues.

Conclusion and Future Work

- Communication with an image by means of natural language
- Efficient image investigation using frame-based dialogue management
  o Supported by the ontologies
- Automated generation of grammars from the knowledge base
- Testing of the prototype
- Abstraction of ontologies
  o Semantic terms, attributes and relations valuable for the system
- Modules that search for additional knowledge
  o Internet, picture analysis

- Acquiring additional knowledge (Internet, picture analysis): "Who was the fastest in that race?" "What type of photograph is it?"

Thank you for your attention!
xplhak@fi.muni.cz