Dialogue systems
■ Speech Recognition
■ Speech Recognition Grammars
■ Utterance Semantic Interpretation

Luděk Bártek
Laboratory of Searching and Dialogue, Faculty of Informatics, Masaryk University, Brno
spring 2023

Speech Recognition
■ Continuous speech recognition: transforms continuous speech into the corresponding text.
■ Command/isolated word recognition.
■ Speech recognition principles:
  1. Acquire feature vectors using a short-term signal analysis method.
  2. Classify the speech using the feature vectors from the previous step.

Continuous Speech Recognition
The principal differences from isolated word recognition:
■ a pattern database covering all possible utterances cannot be created
■ prosodic factors must be taken into account
■ the words must be separated (word boundaries must be found)
■ the algorithm must deal with filler sounds and speech errors.
Solution: a statistical approach using
■ a language model
■ a user model.
Example: the HMM returns the same probability for the Czech words "máma" (mother) and "nana" (not a very smart girl). "Máma" is chosen because it is used more often and therefore has a higher probability.

Continuous Speech Recognition: Language Models
Given:
■ a sequence of words (an utterance) W = (w_1, ..., w_n)
■ a sequence of acoustic vectors O = (o_1, ..., o_t).
We want to find the utterance W*, taken from the set of all utterances, that maximizes P(W|O). According to Bayes' rule:
$$P(W^* \mid O) = \max_{W} P(W \mid O) = \max_{W} \frac{P(W)\,P(O \mid W)}{P(O)}$$
Since P(O) does not depend on W, it can be dropped:
$$W^* = \arg\max_{W} P(W)\,P(O \mid W)$$
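The Bayes decision rule above can be illustrated with a minimal Python sketch. It assumes the recognizer has already produced a short list of candidate utterances, each with a language-model probability P(W) and an acoustic likelihood P(O|W); the function name `decode` and all probabilities are hypothetical, chosen only to mirror the "máma"/"nana" example. The scores are summed in log space to avoid numerical underflow, and P(O) is omitted because it does not change which W wins.

```python
import math

def decode(candidates):
    """Pick W* = argmax_W P(W) * P(O|W) over candidate utterances.

    candidates maps an utterance (a tuple of words) to the pair
    (P(W), P(O|W)). P(O) is the same for every W, so it is not needed.
    """
    def log_score(words):
        p_w, p_o_given_w = candidates[words]
        # Work in log space: products of many small probabilities underflow.
        return math.log(p_w) + math.log(p_o_given_w)

    return max(candidates, key=log_score)

# Hypothetical numbers: the acoustic model scores both words equally,
# but "máma" is far more frequent, so its language-model prior is higher.
candidates = {
    ("máma",): (1e-4, 0.3),
    ("nana",): (1e-7, 0.3),
}
print(decode(candidates))  # ('máma',)
```

With equal acoustic likelihoods, the language model alone decides; when the priors are close, a clearly better acoustic match can still win.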
We need the following to compute the maximum:
■ a speaker model: P(O|W)
■ a language model: P(W).
The speaker model can be replaced by the probability that the Markov model corresponding to W generates the observation sequence O.
The trigram model:
■ Experiments show that the following approximation is sufficient:
$$P(w_n \mid w_1 \ldots w_{n-1}) \approx P(w_n \mid w_{n-2}\, w_{n-1})$$

Continuous Speech Recognition: Topic Recognition
The accuracy of continuous speech recognition ranges from 50 % to 99 %, depending on the task, the language, ...
The recognition success rate can be improved by restricting the recognition domain:
■ topic recognition
■ speech recognition grammars.
If the topic is known:
■ the state space and the trigram probabilities change; for example, in stock market news, was "honey" or "money" recognized?
■ a more accurate language model can be developed.

Speech Recognition Grammars
The accuracy of general continuous speech recognition can drop to approx. 50 %. It can be improved by restricting the recognition domain, for example by specifying the accepted inputs. Speech recognition grammars can be used for this:
■ context-free grammars.
Grammar notations in use:
■ a logic programming notation
■ proprietary solutions
■ open standards: JSGF, W3C SRGS, ...

Speech Recognition Grammars: Java Speech Grammar Format (JSGF)
Platform- and vendor-independent textual grammar format used for speech recognition.
Part of the Java Speech API.
Uses the style and conventions of the Java language.
Current version: 1.0 (Oct 1998).
Used, for example, in the Sphinx-4 recognizer, the VoiceGlue VoiceXML interpreter, ...
More details in the second half of the semester (dialogue interface creation tools).

Speech Recognition Grammars: JSGF Example
#JSGF
<…> = I want to go by <…>. I want to go by <…> from <…> to <…>. I want to go by <…> from <…> to <…> at <…>.;
<…> = train | bus;
<…> = <…>;
<…> = <…>;

Speech Recognition Grammars: W3C Speech Recognition Grammar Specification (SRGS)
■ A W3C standard.
■ Current version: 1.0 (Mar 2004).
■ Defines the notation of rules and how rules are referenced.
■ Two notations:
  ■ XML
  ■ ABNF (Augmented BNF).
■ More details in the second half of the semester (dialogue interface creation).

W3C SRGS Example
ABNF form:
#ABNF 1.0 UTF-8;
root $greeting;
language en-GB;
mode voice;
$greeting = hi;

XML form:
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" xml:lang="en-GB" mode="voice" root="greeting">
  <rule id="greeting">hi</rule>
</grammar>

Utterance Semantic Interpretation
■ Objective: a computer-understandable interpretation of the information entered by the user.
■ Example: "I want to buy Shakespeare's The Taming of the Shrew."
  ■ action = shopping
  ■ title = The Taming of the Shrew
  ■ author = Shakespeare
■ Representation: (attribute, value) pairs.
■ General semantic analysis steps:
  1. obtaining the utterance structure (syntactic analysis)
  2. interpreting the parts of the utterance
  3. deriving the interpretation of the whole utterance from the interpretations of its parts.
■ The semantic interpretation of an utterance ≠ its intended meaning (pragmatic interpretation).
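The (attribute, value) representation and the part-to-whole composition can be sketched in Python. This is a deliberately simplified keyword-spotting stand-in for the syntactic analysis step, not the grammar-based method used in these slides; the phrase table `PHRASES` and the function `interpret` are hypothetical.

```python
# Hypothetical phrase table: each part of the utterance carries a partial
# interpretation expressed as (attribute, value) pairs.
PHRASES = {
    "i want to buy": {"action": "shopping"},
    "shakespeare's": {"author": "Shakespeare"},
    "the taming of the shrew": {"title": "The Taming of the Shrew"},
}

def interpret(utterance):
    """Return the utterance interpretation as a dict of attribute -> value."""
    text = utterance.lower().rstrip(".")
    result = {}
    for phrase, partial in PHRASES.items():
        # Steps 1-2 (simplified): locate a known part, take its interpretation.
        if phrase in text:
            # Step 3: merge the part interpretations into the whole.
            result.update(partial)
    return result

print(interpret("I want to buy Shakespeare's The Taming of the Shrew."))
# {'action': 'shopping', 'author': 'Shakespeare', 'title': 'The Taming of the Shrew'}
```

Plain substring matching breaks on paraphrases and cannot resolve ambiguity, which is one reason to attach the interpretation to the rules of a speech recognition grammar instead.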
Utterance Semantic Interpretation: Implementation
Attributes containing the semantic interpretation of a part of the utterance are assigned to the speech recognition grammar rules. Operations on these attributes derive the semantic interpretation of the entire utterance from the interpretations of its parts.
■ ECMAScript can be used (see Semantic Interpretation for Speech Recognition).
To find the intended meaning of an utterance, we need to process its context as well.
■ The context can be described using a finite automaton with output (a Mealy automaton, see later).

Semantic Interpretation Description
JSGF:
■ assigned using tags
■ notation: {semantic interpretation}
<sentence> = <intro> <title> od <author>;
<title> = Pejska a kočičku {Povídání o pejskovi a kočičce} | (Zlou ženu | Zkrocení zlé ženy) {Zkrocení zlé ženy} | ...
(Czech: "od" means "by"; "Zkrocení zlé ženy" is the Czech title of The Taming of the Shrew; "Povídání o pejskovi a kočičce" is a Czech children's book.)
SRGS: the SISR standard
■ a standard of the W3C Voice Browser Activity
■ uses ECMAScript
■ the semantic interpretation is added to the rules using the tag element or attribute
■ the semantic interpretation result is returned using the JSON notation.
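The JSGF example above attaches a tag to each alternative of the <title> rule. The sketch below imitates that idea in plain Python rather than with a real JSGF or SISR processor: every alternative carries its tag value, the <sentence> rule is matched with a regular expression, and the tags of the matched parts are combined into one object and serialized as JSON, loosely analogous to an SISR result. The <title> alternatives come from the slide; the <intro> and <author> alternatives and all function names are hypothetical, since the slide does not define them.

```python
import json
import re

# <title> alternatives and their tags, taken from the JSGF example.
TITLE = [
    ("pejska a kočičku", "Povídání o pejskovi a kočičce"),
    ("zlou ženu", "Zkrocení zlé ženy"),
    ("zkrocení zlé ženy", "Zkrocení zlé ženy"),
]
# Hypothetical <intro> and <author> rules (not defined on the slide).
INTRO = [("chci koupit", "shopping")]  # "I want to buy"
AUTHOR = [("shakespeara", "Shakespeare"), ("josefa čapka", "Josef Čapek")]

def alternatives(rule):
    """Build a regex alternation from a rule, longest phrases first."""
    phrases = sorted((spoken for spoken, _ in rule), key=len, reverse=True)
    return "|".join(re.escape(p) for p in phrases)

# <sentence> = <intro> <title> od <author>;
SENTENCE = re.compile(
    rf"(?P<intro>{alternatives(INTRO)}) (?P<title>{alternatives(TITLE)})"
    rf" od (?P<author>{alternatives(AUTHOR)})"
)

def interpret(utterance):
    """Return the combined tag values as a dict, or None if out of grammar."""
    match = SENTENCE.fullmatch(utterance.lower())
    if match is None:
        return None
    # Compose the whole-utterance interpretation from the parts' tags.
    return {
        "action": dict(INTRO)[match["intro"]],
        "title": dict(TITLE)[match["title"]],
        "author": dict(AUTHOR)[match["author"]],
    }

result = interpret("Chci koupit Zlou ženu od Shakespeara")
print(json.dumps(result, ensure_ascii=False))
# {"action": "shopping", "title": "Zkrocení zlé ženy", "author": "Shakespeare"}
```

Both spoken variants of the Shakespeare title map to the same tag value, which is exactly what the JSGF tag {Zkrocení zlé ženy} expresses: the interpretation is independent of the wording the user chose.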