Dialogue systems
Speech Recognition Grammars
Utterance Semantic Interpretation
Luděk Bártek
Laboratory of Searching and Dialogue, Faculty of Informatics, Masaryk University, Brno
spring 2023
Speech Recognition
■ Continuous speech recognition - transforms continuous speech into the corresponding text.
■ Command/isolated word recognition.
■ Speech recognition principles:
  □ Acquire feature vectors using a short-term signal analysis method.
  □ Classify the speech using the feature vectors from the previous step.
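The two steps above can be sketched roughly as follows. This is only a toy illustration: the features (log energy and zero-crossing rate), the frame sizes, the threshold, and all function names are assumptions made for this example; real recognizers use richer features and statistical classifiers.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a signal into short-term frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def frame_features(frame):
    """Feature vector of one frame: (log energy, zero-crossing rate)."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (math.log(energy + 1e-12), crossings / (len(frame) - 1))

def classify(features, energy_threshold=-5.0):
    """Toy classifier: label a frame as speech or silence by its log energy."""
    return ["speech" if log_e > energy_threshold else "silence"
            for log_e, _ in features]

# 100 silent samples followed by 100 samples of a 440 Hz tone sampled at 8 kHz
signal = [0.0] * 100 + [math.sin(2 * math.pi * 440 * n / 8000) for n in range(100)]
frames = frame_signal(signal, frame_len=50, hop=50)
features = [frame_features(f) for f in frames]
labels = classify(features)
print(labels)
```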
Continuous Speech Recognition
The principal differences from isolated word recognition:
■ a pattern database cannot be created
■ prosodic factors must be taken into account
■ the words must be separated (the word boundaries must be found)
■ the algorithm must deal with filler sounds and speech errors.
Solution - statistical approach:
■ a language model
■ a user model.
Example: an HMM returns the same probability for the Czech words "máma" (mother) and "nána" (a not very smart girl); "máma" is chosen because it is used more often and therefore has the higher prior probability.
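A minimal sketch of how a language model can break such a tie. The probabilities are made-up illustrative numbers and `pick_word` is a hypothetical helper, not part of any recognizer.

```python
def pick_word(acoustic_scores, language_model):
    """Choose the word with the highest acoustic score times prior probability."""
    return max(acoustic_scores,
               key=lambda w: acoustic_scores[w] * language_model.get(w, 0.0))

# The acoustic model (HMM) cannot tell the two words apart ...
acoustic_scores = {"máma": 0.4, "nána": 0.4}
# ... but the language model knows that "máma" is far more frequent (assumed values).
language_model = {"máma": 0.009, "nána": 0.001}

best = pick_word(acoustic_scores, language_model)
print(best)
```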
Continuous Speech Recognition
Language Models
There are:
■ a sequence of words (utterance) $W = (w_1, \ldots, w_n)$
■ a sequence of acoustic vectors $O = (o_1, \ldots, o_t)$.
We want to find the utterance $W^*$ (from the set of all utterances) that maximizes $P(W \mid O)$.
According to Bayes' rule:
$$P(W^* \mid O) = \max_W P(W \mid O) = \max_W \frac{P(W) \, P(O \mid W)}{P(O)}$$
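One step worth spelling out: the probability of the observation, $P(O)$, does not depend on the choice of $W$, so it can be dropped when searching for the best utterance. Only the language model and the acoustic term remain:

```latex
W^* = \arg\max_W P(W \mid O)
    = \arg\max_W \frac{P(W)\, P(O \mid W)}{P(O)}
    = \arg\max_W P(W)\, P(O \mid W)
```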
Continuous Speech Recognition
Language Models - cont.
To calculate the maximum of this probability, we need:
■ a speaker model - $P(O \mid W)$
■ a language model - $P(W)$.
The speaker model can be replaced by the probability of generating $W$ using the corresponding Markov model.
The trigram model:
■ It was found experimentally that:
$$P(w_n \mid w_1 \ldots w_{n-1}) \approx P(w_n \mid w_{n-2} w_{n-1})$$
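A trigram model can be estimated from a text corpus by counting. The sketch below uses plain maximum-likelihood estimates without smoothing; the tiny corpus, the `<s>`/`</s>` padding symbols, and the function names are assumptions made for this example.

```python
from collections import Counter

def train_trigram_model(sentences):
    """Return a function estimating P(word | prev2, prev1) from raw counts."""
    trigram_counts = Counter()
    context_counts = Counter()
    for sentence in sentences:
        # Pad so that the first words also have a two-word history.
        words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        for i in range(2, len(words)):
            trigram_counts[(words[i - 2], words[i - 1], words[i])] += 1
            context_counts[(words[i - 2], words[i - 1])] += 1

    def probability(word, prev2, prev1):
        seen = context_counts[(prev2, prev1)]
        if seen == 0:
            return 0.0  # unseen history; a real model would apply smoothing
        return trigram_counts[(prev2, prev1, word)] / seen

    return probability

corpus = ["I want to go by train", "I want to go by bus", "I want to go by train"]
p = train_trigram_model(corpus)
print(p("train", "go", "by"))  # "train" follows "go by" in 2 of 3 sentences
```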
Continuous Speech Recognition
Topic Recognition
Continuous speech recognition accuracy ranges from 50 % to 99 %, depending on the task, the language, ...
The recognition success rate can be improved by limiting the recognition domain:
■ topic recognition
■ using speech recognition grammars.
When the topic is known:
■ the state space and the trigram probabilities change:
  ■ for example, in stock market news, was the recognized word "honey" or "money"?
■ a more accurate language model can be developed.
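The idea can be sketched as two steps: guess the topic from words that were already recognized, then rescore the ambiguous word with a topic-specific language model. The keyword sets, the probabilities, and the function names below are invented for illustration only.

```python
# Hypothetical topic keywords and topic-specific word probabilities.
TOPIC_KEYWORDS = {
    "stock market": {"shares", "market", "price", "bank"},
    "cooking": {"recipe", "spoon", "tea", "bake"},
}
TOPIC_MODELS = {
    "stock market": {"money": 0.02, "honey": 0.0001},
    "cooking": {"money": 0.001, "honey": 0.01},
}

def detect_topic(words):
    """Pick the topic whose keyword set overlaps most with the recognized words."""
    return max(TOPIC_KEYWORDS, key=lambda t: len(TOPIC_KEYWORDS[t] & set(words)))

def recognize(acoustic_scores, topic):
    """Rescore acoustically similar candidates with the topic's language model."""
    model = TOPIC_MODELS[topic]
    return max(acoustic_scores, key=lambda w: acoustic_scores[w] * model.get(w, 0.0))

acoustic_scores = {"honey": 0.5, "money": 0.5}  # the recognizer cannot decide
topic = detect_topic(["the", "market", "price", "fell"])
print(topic, recognize(acoustic_scores, topic))
```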
Speech Recognition Grammars
The accuracy of general continuous speech recognition can drop to approximately 50 %.
An improvement can be achieved by restricting the recognition domain, for example by specifying the accepted inputs.
Speech recognition grammars can be used:
■ context-free grammars
Grammar notations in use:
■ a logic programming notation
■ proprietary solutions
■ open standards - JSGF, W3C SRGS, ...
Speech Recognition Grammars
Java Speech Grammar Specification (JSGF)
A platform-independent and vendor-independent textual grammar format.
Used for speech recognition.
Part of the Java Speech API.
Uses the style and conventions of the Java language.
Current version: 1.0 (Oct 1998).
Used, for example, in the Sphinx-4 recognizer, the VoiceXML interpreter VoiceGlue, ...
More details in the 2nd half of the semester - tools for creating dialogue interfaces.
Speech Recognition Grammar
JSGF example
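#JSGF V1.0;
grammar travel; // grammar name chosen for illustration
public <root> = I want to go by <what>
              | I want to go by <what> from <where> to <where>
              | I want to go by <what> from <where> to <where> at <when>;
<what> = train | bus;
<where> = <city>; // <city> is assumed to be defined elsewhere
<when> = <dateTime>; // <dateTime> is assumed to be defined elsewhere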
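To see what restricting the input to such a grammar means in practice, here is a rough Python sketch that accepts the same three sentence forms, reading them as alternatives. It is not a JSGF parser: the city list and the time format stand in for the undefined `<city>` and `<dateTime>` rules and are pure assumptions.

```python
import re

WHAT = r"(?P<what>train|bus)"
CITY = r"(?:Brno|Prague|Ostrava)"   # assumed stand-in for <city>
TIME = r"\d{1,2}:\d{2}"             # assumed stand-in for <dateTime>

PATTERN = re.compile(
    rf"I want to go by {WHAT}"
    rf"(?: from (?P<origin>{CITY}) to (?P<destination>{CITY})"
    rf"(?: at (?P<when>{TIME}))?)?"
)

def parse(utterance):
    """Return the matched slots, or None if the grammar does not accept the input."""
    match = PATTERN.fullmatch(utterance)
    return match.groupdict() if match else None

print(parse("I want to go by train from Brno to Prague at 10:30"))
print(parse("I want to go by plane"))  # rejected: "plane" is not in <what>
```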
Speech Recognition Grammars
W3C Speech Recognition Grammar Specification (SRGS)
■ The W3C standard.
■ Current version 1.0 (Mar 2004).
■ Defines the notation of rules and their referencing.
■ Two types of notation:
  ■ XML
  ■ ABNF (Augmented BNF).
■ More details in the 2nd half of the semester - the topic of creating dialogue interfaces.
W3C SRGS Example
ABNF form:
#ABNF 1.0 UTF-8;
root $greeting;
language en-GB;
mode voice;
$greeting = hi;
XML form:
<?xml version="1.0" encoding="utf-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" root="greeting" xml:lang="en-GB" version="1.0">
  <rule id="greeting">
    hi
  </rule>
</grammar>
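As a small illustration of how an application might read the XML form, the sketch below loads a one-rule grammar with Python's standard `xml.etree.ElementTree` and checks an utterance against the root rule. It handles only rules made of plain tokens; `load_rules` and `accepts` are hypothetical helpers, not part of any SRGS toolkit.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

GRAMMAR_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" root="greeting"
         xml:lang="en-GB" version="1.0">
  <rule id="greeting">
    hi
  </rule>
</grammar>"""

def load_rules(xml_bytes):
    """Return the root rule name and a map from rule id to its token list."""
    grammar = ET.fromstring(xml_bytes)
    rules = {rule.get("id"): (rule.text or "").split()
             for rule in grammar.findall(f"{{{SRGS_NS}}}rule")}
    return grammar.get("root"), rules

def accepts(utterance, root_rule, rules):
    """True if the utterance is exactly the token sequence of the root rule."""
    return utterance.lower().split() == rules[root_rule]

root_rule, rules = load_rules(GRAMMAR_XML)
print(root_rule, rules, accepts("hi", root_rule, rules))
```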
Utterance Semantic Interpretation
■ Objective - a computer-understandable interpretation of the information entered by the user.
■ Example: I want to buy Shakespeare's The Taming of the Shrew.
  ■ action = shopping
  ■ title = The Taming of the Shrew
  ■ author = Shakespeare
■ Representation - (attribute, value) pairs.
■ General semantic analysis steps:
  □ acquiring the utterance structure (syntactic analysis)
  □ interpreting the individual parts of the utterance
  □ deriving the interpretation of the whole utterance from the interpretations of its parts.
■ The utterance semantic interpretation ≠ the utterance intended sense (pragmatic interpretation).
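The three analysis steps can be mimicked for the example sentence as follows. The regular expression stands in for a real syntactic analysis, and the verb-to-action table and all function names are assumptions made for this sketch.

```python
import re

# Step 1: acquire the utterance structure (here a single fixed sentence pattern).
SENTENCE = re.compile(
    r"I want to (?P<verb>\w+) (?P<author>\w+)'s (?P<title>.+?)\.?$")

VERB_TO_ACTION = {"buy": "shopping"}  # assumed mapping

def parse(utterance):
    match = SENTENCE.match(utterance)
    return match.groupdict() if match else None

# Step 2: interpret each part as an (attribute, value) pair.
def interpret_part(name, text):
    if name == "verb":
        return ("action", VERB_TO_ACTION.get(text, "unknown"))
    return (name, text)

# Step 3: derive the interpretation of the whole utterance from its parts.
def interpret(utterance):
    structure = parse(utterance)
    if structure is None:
        return None
    return dict(interpret_part(name, text) for name, text in structure.items())

result = interpret("I want to buy Shakespeare's The Taming of the Shrew.")
print(result)
```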
Utterance Semantic Interpretation
Implementation
Attributes containing the semantic interpretation of a part of the utterance are assigned to the speech recognition grammar rules.
Operations can then be performed to derive the semantic interpretation of the entire utterance from the interpretations of its parts.
■ ECMAScript can be used (see Semantic Interpretation for Speech Recognition).
To find the intended meaning of an utterance, its context must be processed as well.
■ The context can be described using a finite automaton with output (the Mealy automaton - see later).
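A Mealy automaton produces an output on every transition, so the same input can mean different things in different states. The sketch below shows this for a hypothetical ordering dialogue; the states, inputs, and system replies are all invented for illustration.

```python
# (state, input) -> (next state, output); a hypothetical book-ordering dialogue
TRANSITIONS = {
    ("ask_title", "title"): ("ask_author", "Which author?"),
    ("ask_author", "author"): ("confirm", "Shall I order it?"),
    ("confirm", "yes"): ("done", "Order placed."),
    ("confirm", "no"): ("ask_title", "Which title?"),
}

def step(state, input_symbol):
    """One Mealy transition: returns the next state and the produced output."""
    return TRANSITIONS.get((state, input_symbol),
                           (state, "Sorry, I did not understand."))

def run(inputs, state="ask_title"):
    """Feed a sequence of inputs and collect the outputs."""
    outputs = []
    for symbol in inputs:
        state, output = step(state, symbol)
        outputs.append(output)
    return state, outputs

final_state, outputs = run(["title", "author", "yes"])
print(final_state, outputs)
# The same input "yes" is only meaningful in the "confirm" state:
print(step("ask_title", "yes"))
```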
Semantic Interpretation Description
JSGF:
■ Assigned using tags
■ notation - {semantic interpretation}
<sentence> = <intro> <title> od <author>;
<title> = Pejska a kočičku {Povídání o pejskovi a kočičce} |
          (Zlou ženu | Zkrocení zlé ženy) {Zkrocení zlé ženy} | ...
(The grammar accepts Czech utterances: "od" means "by", and each tag maps a spoken, inflected form to the canonical title, for example "Zlou ženu" to "Zkrocení zlé ženy", The Taming of the Shrew.)
SRGS - the SISR standard:
■ a standard of the W3C Voice Browser Activity
■ uses ECMAScript
■ The semantic interpretation is added to the rules using the tag element or attribute.
■ The semantic interpretation is returned in JSON notation.
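The effect of such tags can be imitated in a few lines: each accepted spoken form is mapped to its semantic interpretation, and the result is serialized as JSON. This is only a sketch of the idea, not an SISR implementation; the lookup table mirrors the Czech title rule, and `interpret_title` is a hypothetical helper.

```python
import json

# Spoken form -> canonical title, mirroring the tags of the title rule.
TITLE_RULE = {
    "Pejska a kočičku": "Povídání o pejskovi a kočičce",
    "Zlou ženu": "Zkrocení zlé ženy",
    "Zkrocení zlé ženy": "Zkrocení zlé ženy",
}

def interpret_title(spoken):
    """Return the interpretation as a JSON string, or None if not in the rule."""
    canonical = TITLE_RULE.get(spoken)
    if canonical is None:
        return None
    return json.dumps({"title": canonical}, ensure_ascii=False)

interpretation = interpret_title("Zlou ženu")
print(interpretation)
```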