ABSTRACT
With growing interest in conversational agents, dialogue systems are being actively researched. A dialogue system consists of several components, of which the dialogue manager is the core. A dialogue management system can manage a dialogue between two or more agents, be they human or computer. The Dialogue Manager (DM) is the program that coordinates the activity of the subcomponents of a dialogue system; its main goal is to maintain a representation of the current state of the ongoing dialogue. This paper provides an overview of a wide range of techniques applied in developing dialogue manager components. A classification of current approaches to the dialogue management task is presented, and the properties of each approach are discussed analytically.
INTRODUCTION
Conversational agents are designed to converse with humans using natural language processing. These systems are software programs that receive users' statements, interpret them using computational linguistics, and provide responses according to the users' requirements. In recent decades, conversational agents have attracted increasing attention and have become a part of many information systems.
Interactive conversational agents have been applied in various domains such as search and recommender systems, spoken dialogue systems, chatbots, task-oriented dialogue agents and question answering. Conversational agents can be divided into three main categories: question answering, task-oriented dialogue and social chatbots [1]. Question answering systems provide answers to user queries, task-oriented dialogue models attempt to carry out users' tasks, and social chatbots are developed to converse with their users. This paper considers dialogue systems that accomplish tasks such as ticket booking and movie booking. Hulstijn [2] defines a dialogue system as follows:
A dialogue system is a computer system that is able to engage in an interactive dialogue with a human user about a particular topic. Usually it is designed to help the user of the system to perform her task.
In terms of task domain, dialogue systems can be divided into open-domain and goal-oriented systems [5]. Goal-oriented (or task-based) systems are domain-dependent systems that pursue a specific goal, whereas open-domain (non-task-oriented) systems simply converse with humans.
In terms of how responses to the user are selected, dialogue systems follow either single-turn or multi-turn response selection [6]. Single-turn models consider only the user's last utterance when choosing a response, whereas multi-turn systems use the whole conversation context.
Different approaches have been proposed for constructing the dialogue management component. Most methods are based on handcrafted rules, but probabilistic methods have recently attracted extensive attention. In this paper we briefly review the methods applied to the dialogue management component of dialogue systems. A classification of the methods is presented, followed by a qualitative analysis.
The rest of this paper is organized as follows. Section 2 introduces the components of a dialogue system. Section 3 reports related studies, from the earliest to the state of the art. Section 4 presents the classification of dialogue managers. Section 5 presents the analysis, and Section 6 gives conclusions and future work.
Dialogue System
A modular dialogue system usually consists of an input unit, Natural Language Understanding (NLU), a Dialogue Manager (DM), Natural Language Generation (NLG) and output units, with each module trained separately. According to the architecture used to learn their components, dialogue systems can be categorized into pipeline and end-to-end strategies [5].
Input Unit
Depending on the type of user input, a system can be a spoken dialogue system, a text-based dialogue system or a multimodal dialogue system. In a spoken dialogue system, the automatic speech recognition (ASR) unit is the first component, while in a text-based dialogue system the input is raw text. Bernsen [3] defines a multimodal interactive system as follows:
A multimodal interactive system is a system which uses at least two different modalities for input and/or output.
Following this definition, [4] defines a multimodal dialogue system (MDS) as a dialogue system that uses two or more different modalities, for example a mixture of speech, gestures and touch. Strategies developed for text-based dialogue systems can be adapted to spoken dialogue systems, and vice versa.
Natural Language Understanding
Spoken language understanding, or natural language understanding, is the second part of a dialogue system. This component receives an utterance as input. First the domain of the utterance is determined and the user's intent is detected; then the slots are tagged. Semantic utterance classification has been proposed as a solution for domain and intent detection [9]. Predefined slots in semantic frames are filled with values drawn from the information in the input [10]. Slot filling is usually treated as a sequence labeling task, and both generative and discriminative approaches have been used for it [11].
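To make the sequence-labeling view of slot filling concrete, the following minimal sketch collects a tagged utterance into a semantic frame. The IOB tag scheme, the slot names and the function `tags_to_frame` are illustrative assumptions rather than part of the cited work, and the tags are written by hand here, whereas a real system would predict them with a trained model.

```python
from typing import Dict, List, Optional


def tags_to_frame(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Collect IOB-tagged tokens into a slot -> value frame (illustrative only)."""
    frame: Dict[str, str] = {}
    current_slot: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new slot value begins at this token.
            current_slot = tag[2:]
            frame[current_slot] = token
        elif tag.startswith("I-") and current_slot == tag[2:]:
            # Continue the slot value opened by the preceding B- tag.
            frame[current_slot] += " " + token
        else:
            # "O" (outside) or an inconsistent I- tag closes the current slot.
            current_slot = None
    return frame


# Hand-labeled example; tokens, tags and slot names are assumptions.
tokens = ["book", "a", "flight", "to", "new", "york", "on", "friday"]
tags = ["O", "O", "O", "O", "B-destination", "I-destination", "O", "B-date"]
frame = tags_to_frame(tokens, tags)
print(frame)
```

The resulting frame is what the dialogue manager would receive as the interpreted user input.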
Dialogue Manager
This component manages the dialogue flow in order to converse with the user. The dialogue manager consists of two submodules, the dialogue state tracker and policy learning, both of which are trainable. Policy learning is responsible for mapping a dialogue state to a dialogue act. Standard approaches to learning a policy include handcrafted rules, supervised machine learning and reinforcement learning (online and batch).
Natural Language Generation
The natural language generator (NLG) receives the specification of a communicative act from the DM and generates a matching textual representation. The system responses are typically generated as natural language with a list of content items from a part of the external knowledge database (e.g., restaurant database) that answers the specific user query or request.
Output Units
This unit is responsible for conveying the response generated by the NLG to the user. The output can take the form of text or speech: it can be shown on a display if one is available, synthesized by a text-to-speech (TTS) module, or played as pre-recorded audio. Speech output involves two processes: text analysis, a mapping of the text to its matching phoneme representation, including an analysis of linguistic structure and prosodic mark-up; and speech generation, whereby the annotated speech act is finally vocalised to the user.
Related Work
In [&] the authors list five key capabilities of a dialogue manager. First, it supports a mixed-initiative system by fielding spontaneous input from either participant and routing it to the appropriate components. Second, it supports non-linguistic dialogue ‘events’ by accepting them and routing them to the Context Tracker. Third, it increases overall system performance. For example, awareness of system output allows the Dialogue Manager to predict user input, boosting speech recognition accuracy. Similarly, if the back-end introduces a new word into the discourse, the Dialogue Manager can request the speech recognizer to add it to its vocabulary for later recognition. Fourth, it supports meta-dialogues between the dialogue system itself and either participant. An example might be a participant’s questions about the status of the dialogue system. Fifth, it acts as a central point for dialogue troubleshooting. If any component has insufficient input to perform its task, it can alert the Dialogue Manager, which can then reconsult a previously invoked component for different output.
A dialogue manager consists of dialogue state tracking (DST) and policy learning. The DST holds the current state of the dialogue and the possible next states. Policy learning selects the next dialogue act. The authors of [12] identify contextual interpretation, domain knowledge management and action selection as the common tasks of the dialogue manager.
The authors of [13] propose three approaches to dialogue management: finite state, form-filling and information state update. Another classification of dialogue managers is suggested in [14], in which strategies are divided into three main categories: handcrafted approaches, probabilistic methods and hybrid techniques. In this grouping, handcrafted algorithms include finite-state automata, frame-based dialogue and model-based approaches, while example-based, Markov Decision Process (MDP) and memory neural based methods belong to the probabilistic group. In [15], Finite State Automata, the Form-filling Approach, the Agent-based Approach and Information State Update are introduced as the main approaches to dialogue management. Another classification, suggested in [16], comprises Integrated (tree-based), Finite-state, Frame-based, Plan-based and Agent-based (BDI) models. In addition to the aforementioned strategies, several methods are based on agendas and ontologies. The authors of [24] group Finite State Automata, the Form-filling Approach, the Agent-based Approach, plan-based, rule-based and Information State Update methods under handcrafted approaches, and Neural Networks, Bayesian Networks and Markov Decision Processes under machine learning methods. In [25] models are divided into plan-based and collaborative approaches. In [26] the approaches are categorized into finite state/dialogue grammars, plan-based and collaborative. The authors of [1] group conversational systems into three categories: question answering agents, task-oriented dialogue agents and chatbots. For each category, they present a review of state-of-the-art neural approaches.
Classification of Dialogue Management
The following subsections describe the different approaches to dialogue management.
Handcrafted Approaches
This group of models refers to strategies in which human developers or domain experts define a set of rules for deciding on action selection. In this category, decisions are made based on rules or states [14]. Here, these methods are examined under five approaches: finite-state automata, semantic frame-based, agent-based, information state-based and plan-based.
Finite-State Automata:
This kind of system is represented by a finite-state machine (FSM) [24], which is a graph of predefined steps. The state of the dialogue during a conversation is represented by the nodes of the graph, and the possible actions that the user or system can take are represented by transitions between the states. With this structure it is straightforward to encode all possible dialogues. Several studies have adopted this structure.
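The idea of a dialogue as a graph of predefined steps can be sketched as a transition table. The states, user actions and helper names below are hypothetical and describe a toy ticket-booking dialogue; this is a minimal illustration, not an implementation from the cited studies.

```python
from typing import Dict, Iterable, Tuple

# Transition table: (current state, user action) -> next state.
# All state and action names are assumptions made for this example.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("ask_destination", "give_destination"): "ask_date",
    ("ask_date", "give_date"): "confirm",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "ask_destination",
}


def step(state: str, user_action: str) -> str:
    """Follow a predefined transition; stay in the same state if none matches."""
    return TRANSITIONS.get((state, user_action), state)


def run(actions: Iterable[str], start: str = "ask_destination") -> str:
    """Feed a sequence of user actions through the automaton."""
    state = start
    for action in actions:
        state = step(state, action)
    return state


final_state = run(["give_destination", "give_date", "yes"])
print(final_state)
```

Because every path has to appear in the table, an input that arrives out of order (for example a date given while the system is still asking for the destination) is simply not accepted, and the dialogue stays in the current state.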
Semantic Frame Based Systems
These models are also known as form-filling or slot-filling methods, in which the required information is filled into predefined slots. The slots are application-specific and are defined by an expert according to the application domain. In these methods, a slot is called informable if its value can constrain the conversation, and requestable if the user can ask for its value [1]. Because the dialogue flow depends on the information the user provides, it is not predetermined [22].
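A minimal sketch of a frame-based manager is given below, assuming a hypothetical restaurant-search domain. The slot names, the split into informable and requestable slots, and all function names are illustrative assumptions. The point of the example is that the user may fill slots in any order and several at once, so the dialogue path is not fixed in advance.

```python
from typing import Dict, Optional, Set

# Hypothetical restaurant-search domain.
INFORMABLE: Set[str] = {"food", "area"}       # slots whose values constrain the search
REQUESTABLE: Set[str] = {"phone", "address"}  # slots whose values the user can ask for

Frame = Dict[str, Optional[str]]


def update_frame(frame: Frame, user_slots: Dict[str, str]) -> Frame:
    """Fill every informable slot mentioned by the user, in any order."""
    new_frame = dict(frame)
    for slot, value in user_slots.items():
        if slot in INFORMABLE:
            new_frame[slot] = value
    return new_frame


def next_system_act(frame: Frame) -> str:
    """Request the first empty slot; offer a result once the frame is full."""
    for slot in sorted(frame):
        if frame[slot] is None:
            return f"request({slot})"
    return "offer_result()"


def answer_request(slot: str, entry: Dict[str, str]) -> Optional[str]:
    """Return the value of a requestable slot for a database entry, if allowed."""
    return entry.get(slot) if slot in REQUESTABLE else None


empty: Frame = {"food": None, "area": None}
partial = update_frame(empty, {"area": "north"})
full = update_frame(empty, {"food": "italian", "area": "north"})
print(next_system_act(partial), next_system_act(full))
```

In the second call the user supplies both values in a single turn, and the manager moves straight to offering a result.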
Agent-based Approach
The author of [19] defines an agent as something that perceives and acts in an environment. In an agent-based dialogue system, the dialogue participants are agents that interact with each other. To maintain information and pro-attitudes for the purpose of selecting actions, an agent should have knowledge or beliefs [20]. According to [22], agent-based systems view communication as interaction between two agents, each of which is capable of reasoning about its own actions and beliefs, and sometimes also about the actions and beliefs of the other agent. Several architectures, such as BDI [21] and &-Ants [15], have been proposed for this model.
Information State Update (ISU) approach
The authors of [17] introduce five concepts for an information state-based dialogue theory: informational components, formal representations of those components, dialogue moves, update rules and an update strategy. Informational components refer to the common context and internal motivating factors. Dialogue moves trigger updates of the information state, a set of update rules governs how it is updated, and the update strategy decides which rules to apply.
The approach is naturally amenable to, and cohesive with, other dialogue methodologies. Given this description, it appears that the ISU approach does not always share the drawbacks of handcrafted systems.
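The interplay of information state, update rules and update strategy can be sketched as follows. The field names, the two rules and the "first applicable rule" strategy are assumptions chosen for brevity; they are not taken from [17] or from any particular toolkit.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class InformationState:
    """Toy information state; the field names are illustrative assumptions."""
    agenda: List[str] = field(default_factory=list)        # pending system moves
    shared_facts: List[str] = field(default_factory=list)  # common context
    latest_move: Optional[str] = None                       # last observed dialogue move


@dataclass
class UpdateRule:
    name: str
    precondition: Callable[[InformationState], bool]
    effect: Callable[[InformationState], None]


def integrate_ask(state: InformationState) -> None:
    # A question from the user obliges the system to answer.
    state.agenda.append("answer")
    state.latest_move = None


def integrate_answer(state: InformationState) -> None:
    # An answer from the user becomes part of the common context.
    state.shared_facts.append("user_answered")
    state.latest_move = None


RULES = [
    UpdateRule("integrate_ask", lambda s: s.latest_move == "ask", integrate_ask),
    UpdateRule("integrate_answer", lambda s: s.latest_move == "answer", integrate_answer),
]


def apply_first_applicable(state: InformationState, rules: List[UpdateRule]) -> Optional[str]:
    """Update strategy: fire the first rule whose precondition holds."""
    for rule in rules:
        if rule.precondition(state):
            rule.effect(state)
            return rule.name
    return None


state = InformationState(latest_move="ask")
fired = apply_first_applicable(state, RULES)
print(fired, state.agenda)
```

Other update strategies, such as applying every applicable rule in sequence, would only change `apply_first_applicable` and leave the rules themselves untouched.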
Plan Based
In [22] this structure is introduced as a sub-model of agent-based approaches, although it can also be seen as a separate model in its own right. In this approach each utterance is treated as an action performed to achieve some goal [Cohen1994]. In [31] the model is illustrated with an example in which an agent A asks another agent B a question:
A has a goal to acquire certain information. This causes him to create a plan that involves asking B a question. B will hopefully possess the sought information. A then executes the plan, and thereby asks B the question. B will now receive the question and attempt to infer A's plan. In the plan there might be goals that A cannot achieve without assistance. B can accept some of these obstacles as his own goals and create a plan to achieve them. B will then execute his plan and thereby respond to A's question.
Probabilistic Approaches
This group of methods uses statistics and machine learning (ML) techniques. Because they rely on datasets, these approaches are called data-driven in [32]. In contrast to handcrafted models, these systems are dynamic, since learning is applied in the dialogue process. In [24] these methods are categorized into three main models: neural networks, Bayesian networks and Markov decision processes.
Neural Networks
In the context of spoken dialogue systems, neural network (NN) approaches tend to feature less in dialogue management but are especially prominent in speech recognition and natural language processing, for processes such as sequence matching, learning and prediction. The literature survey conducted for this report uncovered very little research on the specific use of NNs for action selection in dialogue management, which the authors found surprising given that NNs have shown increased popularity in recent years.
Bayesian Networks
Bayesian Networks (BNs) are well-studied probabilistic models. They capture probabilistic dependencies between events or variables and comprise two parts: a directed acyclic graph and a conditional probability table for each node. Bayesian networks are generally applied because the environment in which spoken dialogue systems (SDSs) operate is inherently noisy: the user's utterances can be unclear due to unnecessary prolixity or speech recognition errors [24]. Situations where Bayesian inference can assist include keyword and feature recognition, user modelling and intent recognition. The use of BNs for deciding system actions appears to occur only in combination with other methods.
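As a small illustration of Bayesian inference under noisy recognition, the sketch below uses a two-node network in which a hidden user intent influences the keyword reported by the recognizer. The intents, the keyword values and every probability are made-up assumptions; the posterior is computed directly with Bayes' rule rather than with a BN library.

```python
from typing import Dict

# Node 1: prior over the hidden user intent. All numbers are invented.
PRIOR: Dict[str, float] = {"book": 0.5, "cancel": 0.5}

# Node 2: conditional probability table P(recognized keyword | intent),
# modelling a recognizer that sometimes reports the wrong keyword.
CPT: Dict[str, Dict[str, float]] = {
    "book": {"book": 0.8, "cancel": 0.2},
    "cancel": {"book": 0.1, "cancel": 0.9},
}


def posterior(observed_keyword: str) -> Dict[str, float]:
    """Return P(intent | observed keyword) by normalizing the joint probabilities."""
    joint = {
        intent: PRIOR[intent] * CPT[intent][observed_keyword] for intent in PRIOR
    }
    total = sum(joint.values())
    return {intent: p / total for intent, p in joint.items()}


belief = posterior("book")
print(belief)
```

With these numbers, hearing "book" raises the belief in the booking intent to 8/9 without ruling out a misrecognized "cancel", which is the kind of graded evidence a downstream action selector can use.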
Markov Decision Process
A complete dialogue system can be modeled as a Markov decision process (MDP) in which each dialogue exchange results in a state transition from S to S' [27]. An MDP is a formal model of fully observable sequential decision processes; it extends Markov chains with a set of decisions (actions) and a state-based reward structure. For each state, a decision has to be made about which action to take in that state so as to increase some predefined measure of performance. The action affects not only the transition probabilities but also the rewards. A state describes the environment at a particular instant of time. It is assumed that the system can be in a finite number of states and that the agent (the SDS) can choose from a finite set of actions.
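The following sketch shows how a dialogue strategy can be derived from such a model. It defines a toy MDP with three states and two actions and solves it with value iteration, which is one standard solution method chosen here for illustration and is not prescribed by the text above. The states, actions, probabilities, rewards and the discount factor are all assumptions.

```python
from typing import Dict, List, Tuple

STATES = ["empty", "filled", "done"]  # slot not yet known / slot known / dialogue over
ACTIONS = ["ask", "close"]
GAMMA = 0.9  # discount factor (assumed)

# (state, action) -> list of (probability, next state, reward). Numbers are invented.
P: Dict[Tuple[str, str], List[Tuple[float, str, float]]] = {
    ("empty", "ask"): [(0.8, "filled", -1.0), (0.2, "empty", -1.0)],  # user may be misheard
    ("empty", "close"): [(1.0, "done", -10.0)],                       # closing too early fails
    ("filled", "ask"): [(1.0, "filled", -1.0)],                       # a wasted turn
    ("filled", "close"): [(1.0, "done", 10.0)],                       # task completed
}


def q_value(state: str, action: str, values: Dict[str, float]) -> float:
    """Expected return of taking `action` in `state` under the current value estimates."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in P[(state, action)])


def value_iteration(tolerance: float = 1e-9) -> Dict[str, float]:
    """Repeat Bellman optimality backups until the values stop changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == "done":
                continue  # the terminal state keeps value 0
            best = max(q_value(s, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tolerance:
            return values


def greedy_policy(values: Dict[str, float]) -> Dict[str, str]:
    """Pick, in every non-terminal state, the action with the highest expected return."""
    return {
        s: max(ACTIONS, key=lambda a: q_value(s, a, values))
        for s in STATES
        if s != "done"
    }


values = value_iteration()
policy = greedy_policy(values)
print(policy)
```

The resulting policy asks while the slot is empty and closes once it is filled, a strategy produced from the model rather than written by hand.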
Analysis
Finite-state systems do not behave naturally and cannot represent human dialogues. Although this model is suitable for small-domain problems, such systems become non-trivial to build when the domain is complex and wide-ranging. On the other hand, these systems are predictable and simple to construct, and the vocabulary and grammar required for each state can be determined in advance [29]. They do not allow over-informative answers, they inhibit the user's ability to ask questions and take the initiative [29], and they limit the user's answers [22]. Such methods are not sufficient for Dialog, which aims at flexible, natural tutorial dialogues [14].
Frame-based systems are less predictable, since they give users more freedom in their answers [22]. They allow more natural dialogues, and the user can provide over-informative answers. These systems cannot handle complex dialogues, and their range of application is limited to systems that elicit information from users and act on the basis of it [29]. However, even this flexibility does not reach the level required by Dialog. Also, form-filling is better suited to situations in which the information flow is mainly in the direction of the system, for instance in personal banking applications, whereas the dialogue manager for Dialog must support flexible information exchange in both directions [14].
Information State Update systems are suitable for situations in which determinism is important. Like finite-state machines, this approach is predictable. The dialogue manager maintains a dialogue context, a description of the state of the dialogue and its participants, which then forms a framework for communication between the external modules associated with the system. The ISU approach was developed in the Siridus and Trindi projects and implemented in TrindiKit [30]. Here the information state (IS) is divided into ‘private’ system information, such as the internal beliefs of the system, and ‘public’ information shared between the system and the user, such as their common beliefs. The IS stores both dialogue-level knowledge, such as the user’s last speech act or an evaluation of the utterance, and meta-information about the dialogue, such as an utterance history. It can be changed by update rules, which are based on actions in the world and which update the IS accordingly [13].
Agent-based systems allow natural language in complex domains and are user-friendly, much like talking to a human. These systems are hard to build, and the agents themselves are usually very complex [29]. Agent-based systems are suitable for mixed-initiative dialogue because, for instance, the user can introduce new topics of conversation. Such systems can also use expectations to aid error correction. Due to the unconstrained nature of the interaction that agent-based systems support, there is a need for sophisticated natural language abilities. This contrasts with both finite-state and form-filling systems, which restrict the language in which the interaction can take place [15].
Plan-based methods can provide scalable solutions to dialogue management, as they contain the intelligence required to decide automatically on the pathways through a conversation [30]. One objection to plan-based dialogue modelling concerns its lack of a theoretical base. Even though the approach provides a solid computational model, it is not entirely clear how the mental constructs postulated in the model correlate with people’s actual mental states. Disregarding the obvious objections by eliminative materialists and behaviourists alike, one would want the model to be rooted more deeply in psychological research. However, there seem to be similarities, at least at a surface level [28].
The establishment overheads of Bayesian Networks can be reduced if the domain in which the network is applied is suitably small and manageable (the BN approach is not trying to capture the entire dialogue model) and if it is combined with other techniques, notably Markovian models. If the domain were indeed small enough, then one could posit that an FSM could also be applied; however, if the transitions needed to be updated in response to observed interactions or other training data, then a BN would still be necessary [24].
With Markovian models, the system adopts strategies that it has created automatically, without a human-in-the-loop developer, which essentially removes the developers' control over ensuring that the dialogue flow is effective and suitably refined. By contrast, a system that allows the domain expert to create and apply handcrafted rules grants them a greater ability to ensure that the conversation is adequately constrained [24].
Neural networks have shown good performance. Despite positive results, such implementations appear restricted to single-round responses, albeit as their design intends, and appear unsuitable where conversational interaction is needed. The study in [24] states that the use of NNs for action selection should be a future goal for research, as common applications of the technique appear grounded in natural language processing.
Conclusions
This paper has introduced and discussed different approaches to dialogue management. The benefits of handcrafted systems vary depending on the requirements of their domain of implementation and the capabilities desired. Systems concerned with security, safety, or strict adherence to business rules necessarily require the ability to adequately predict and cater for expected and unexpected usage scenarios; here, the key characteristic is determinism. In addition, handcrafted methods are easier to implement in smaller domains and simpler use cases, and their outputs can always be traced back to the conditions and inputs that caused them. A defining characteristic of probabilistic systems is their reliance upon large datasets in order to produce sufficiently reliable dialogue strategies, and this can be positive or negative depending upon the domain of implementation. Corpora and training data may be plentiful in social media scenarios, but this may not be the case in specific domains; without appropriate resources the ML algorithms cannot operate effectively. With appropriate training, however, they are able to respond to inputs in ways that cannot be matched or anticipated with rules prior to deployment. Indeed, they are perhaps sought after for their ability to operate without extensive effort by designers, and to adapt with extended use. Several studies have focused on hybrid systems that combine handcrafted and probabilistic approaches in different ways, in order to benefit from both. Investigating hybrid models is a possible direction for future study.