Scientific report: "A MULTILEVEL APPROACH TO NON-STANDARD INPUT"


A MULTILEVEL APPROACH TO HANDLE NON-STANDARD INPUT

Manfred Gehrke
Project "Prozedurale Dialogmodelle" *
Department of Linguistics and Literature
University of Bielefeld
P.O.Box 8640, D-4800 Bielefeld 1

"da kommen sie doch ungefaehr ganz bestimmt hin."
from one of our dialogues

ABSTRACT

In the project "Procedural Dialogue Models", carried out at the University of Bielefeld, we have developed an incremental multilevel parsing formalism to reconstruct task-oriented dialogues. A major difficulty we have had to overcome is that the dialogues are real ones with numerous ungrammatical utterances. The approach we have devised to cope with this problem is reported here.

I THE INCREMENTAL, MULTILEVEL PARSING FORMALISM

In recent NLU systems, major importance is attached to processing non-standard input.1) This paper reports on our experience in the project "Procedural Dialogue Models" in reconstructing task-oriented dialogues that were uttered in rather colloquial German.2) To this end we have developed an incremental multilevel parsing formalism (Christaller/Metzing 82, Gehrke 82, Gehrke 83), based on an extension of the concept of cascaded ATNs (Woods 80). This formalism (see fig. A) organizes the interaction of several independent processing components, in our case five. The processing components need not be ATNs; it is up to the user of the formalism to choose the tool that suits her/him best for the specific task.

* The project is funded by the Deutsche Forschungsgemeinschaft.
1) See e.g. session VIII in ACL 82, Carbonell 83, Kwasny 80, Sondheimer/Weischedel 80; for the handling of ellipsis see Weischedel/Sondheimer 82, Wahlster et al. 83.
2) The dialogues that we are working with were recorded in the city of Frankfurt/Main (Klein 79).

The first level, an ATN, is responsible for the syntactic analysis.
Its main purpose is to detect phrases as well as wh- and imperative structures and to determine the syntactic status a phrase may have in the utterance. On this level the analysis of an utterance can reach a permissible final state even if no complete sentence structure has been derived. The decision whether this is permissible or not is made on the pragmatic level.

The semantic interpretation is carried out by a case-oriented production rule system. In accordance with the incremental manner of processing, there are two definitions of case slots:
1. a general one for a tentative categorization of phrases before the main verb is detected, and
2. a specific one, connected with the respective verb frame.
This double definition of case slots enables the parsing formalism to make a minimal interpretation of parts of the utterance when a verb is missing, and thus gives suggestions for filling this gap.

The QUESTION-ANSWER-INTERACTION-component is an ATN. It has to categorize an utterance as a question, as part of an answer, or as one of the communication-maintaining categories such as assurance, confirmation, etc. This component is also responsible for recognizing a dialogue within a dialogue, e.g. when some clarification of that dialogue takes place.

Finally, the TASK-COMMUNICATION-component is itself a two-level cascade. One stage, the TASK-INTERACTION-component, provides the formalism with a dialogue scheme that is presumably applicable to most types of information-giving dialogues. The other stage, the TASK-SPECIFICATION-component, is responsible for the task-specific categorization, in this case direction giving, with categories such as route description or place description. We divided this component into two stages, both realized as ATNs,
1. in order to have greater modularization between different components (processing other types of task-oriented dialogues may require changing only the TASK-SPECIFICATION-component on the pragmatic level), and
2. because each level contributes one category to the utterance or a part of it, which avoids double categorizations at one level.

[Figure: the SYNTACTIC-, SEMANTIC-, QUESTION-ANSWER-INTERACTION-, TASK-INTERACTION- and TASK-SPECIFICATION-components, the addresser's KS, the addressee's KS and the common KS. Legend: utterance; transmit (transfer of control); read, resume; write, get; data into/out of KSs.]
Fig. A: Architecture of the Formalism

The pragmatic components are supported by knowledge sources (KS) that hold, for each participant, his knowledge of the world, of the partner and of the course of the dialogue, depending on the task. The processing components exchange their results via a common KS (a kind of blackboard). Only control information is transmitted by the cascade. The parsing formalism is written in MacLISP and in FLAVORS (diPrimio/Christaller 83), an object-oriented language embedded in MacLISP.

II THE DIALOGUE CORPUS

The dialogues that we are dealing with are real task-oriented dialogues. The majority of utterances in these dialogues contain non-standard constructions or are in some sense incomplete. There are dialect words, word duplications, self-corrections and interjections. On the other hand, they do not contain complicated sentence structures such as subordinations, complex noun phrases, etc. The translation of one of our dialogues (see fig. B) may give some impression of these non-standard features.

An extreme approach to the problem of non-standard utterances would be, in our case, to take the dialogues in the corpus, as they are, as the standard. But this would only be an ad hoc solution, lacking generality. Thus we burden the pragmatic components with the decision whether an utterance is acceptable or not.
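The organisation described in section I, where components exchange results through a common KS while the cascade itself passes only control, can be illustrated with a small sketch. This is not the project's MacLISP/FLAVORS implementation. The names `CommonKS`, `run_cascade`, `syntactic` and `semantic` are hypothetical, the two toy components merely stand in for the ATN and the production rule system, and the mini-lexicon and marker table are assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional


class CommonKS:
    """Hypothetical blackboard: each component posts results under a key."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[dict]] = {}

    def write(self, key: str, result: dict) -> None:
        self._entries.setdefault(key, []).append(result)

    def read(self, key: str) -> List[dict]:
        return list(self._entries.get(key, []))

    def clear(self, key: str) -> None:
        self._entries.pop(key, None)


# A component receives the current word and the blackboard. It returns only
# a control signal: True hands control to the next level of the cascade,
# False resumes reading at the first level.
Component = Callable[[str, CommonKS], bool]


def syntactic(word: str, ks: CommonKS) -> bool:
    """Toy stand-in for the syntactic ATN: a noun closes a phrase."""
    nouns = {"Kaufhof", "Fressgass'"}  # assumed mini-lexicon
    ks.write("pending", {"word": word})
    if word not in nouns:
        return False
    phrase = [entry["word"] for entry in ks.read("pending")]
    ks.clear("pending")
    ks.write("syntax", {"phrase": phrase})
    return True


def semantic(word: str, ks: CommonKS) -> bool:
    """Toy stand-in for the semantic rules: tentative slot from a marker."""
    markers = {"zum": "GOAL", "durch": "PATH"}  # assumed case markers
    phrase = ks.read("syntax")[-1]["phrase"]
    slot: Optional[str] = next((markers[w] for w in phrase if w in markers), None)
    ks.write("semantics", {"phrase": phrase, "slot": slot})
    return slot is not None


def run_cascade(words: List[str], levels: List[Component], ks: CommonKS) -> CommonKS:
    """Feed words to the first level; higher levels run only on a True signal."""
    for word in words:
        for level in levels:
            if not level(word, ks):
                break  # control falls back to the first level
    return ks


ks = run_cascade(["vor", "bis", "zum", "Kaufhof"], [syntactic, semantic], CommonKS())
print(ks.read("semantics"))
```

The point of the design is that no analysis result travels along the cascade: a level learns what its predecessor found only by reading the blackboard, so a component could be replaced by a different tool without changing the others.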
III HANDLING OF NON-STANDARDS ON THE WORD LEVEL

Dialect words are handled as words of the standard speech, i.e. they occur in the lexicon. Duplication of words is recognized during the read process: the current word is compared with its predecessor. If they are identical and belong to only one syntactic category, then the next word is processed directly. Otherwise a flag is set, stating that there is possibly a duplication of words to analyse. Such words are analysed as usual, but the syntactic category of the preceding word may not be used. This condition may cause a new problem, namely when a participial construction occurs within a noun phrase, e.g. "die die Strasse ueberquerende Frau" ("the woman crossing the street"). Comparable to this problem are constructions in English that begin with "that that". Luckily such constructions do not occur in our corpus, but this problem has to be kept in mind. If the analysis runs into an error, then the status quo ante is reestablished and the current word is discarded as a duplication.

X: Could You please tell me, how I can come to the old opera? to
Y: What?
X: the old opera
Y: to the old opera; straight ahead, yes. Come on, I show
X: yes, yes (10 sec. pause)
Y: it to you. ahead to the Kaufhof. To the
X: yes
Y: right there is the Kaufhof, isn't it? and there you stay on the
X: yes, the eh
Y: right side, straight on through the Fressgass' it is new
X: eh mhm
Y: it's just in a new shape, the Fressgass', yes then you will
X: thank you
Y: reach directly the opera square, that is the opera ruin.
X: very much.
Y:
Fig. B: a sample translation

Cases of self-correction on the word level, when a word is replaced by another word of the same syntactic category or by the same word with an altered inflection, are recognized during the read process as well.
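The read-time duplication check described at the start of this section can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `read_words` and the toy `LEXICON` are hypothetical, and the later step, in which a flagged word is analysed and then discarded if the analysis fails, is left out.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy lexicon: word form -> set of syntactic categories.
LEXICON: Dict[str, Set[str]] = {
    "die": {"DET", "RELPRON"},  # ambiguous, so a repeat may be legitimate
    "zur": {"PREP"},
    "alten": {"ADJ"},
    "oper": {"NOUN"},
}


def read_words(words: List[str], lexicon: Dict[str, Set[str]]) -> List[Tuple[str, bool]]:
    """Return (word, possible_duplication) pairs after the read-time check."""
    result: List[Tuple[str, bool]] = []
    previous: Optional[str] = None
    for word in words:
        key = word.lower()
        if key == previous:
            if len(lexicon.get(key, set())) == 1:
                # Identical and unambiguous: skip it and read the next word.
                continue
            # Identical but ambiguous or unknown: keep it and set the flag,
            # so that the parser can analyse it as a possible duplication.
            result.append((word, True))
        else:
            result.append((word, False))
        previous = key
    return result


print(read_words(["zur", "zur", "alten", "Oper"], LEXICON))
print(read_words(["die", "die", "Strasse"], LEXICON))
```

In the first call the repeated preposition is dropped at once. In the second the repeated "die" is kept with the flag set, which is the situation of the participial construction discussed above.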
Self-corrections on the word level can be treated in a similar way to duplications, the difference being that the preceding word is discarded and the differing features of the current word are taken. But no rule is without exceptions: the rare case of two succeeding nouns, e.g. in proper names (names of streets or buildings), is captured in the lexicon, while groups of prepositions or adverbs are permissible.

IV HANDLING OF INCOMPLETE UTTERANCES

To handle utterances that are in some sense incomplete we have the great advantage that they have been uttered in a specific context. A linguistic analysis of the dialogues further shows that some types of answers, especially route descriptions and partial goal determinations, tend to be elliptical. In these cases the degree of ellipsis ranges from omitting the optional SOURCE case slot, through omitting the AGENT case slot, up to uttering only a GOAL case slot.

Due to the incremental manner of parsing, the SEMANTIC-component is triggered as soon as a partial analysis of an utterance is obtained. There a phrase is tentatively categorized, depending on case markers (ending, preposition); auxiliary verbs mark tense or mood, etc. Some deictic adverbs such as "hier" ("here") could act as a SOURCE case slot for MOVE-verbs. Categorized phrases are sent to the QUESTION-ANSWER-INTERACTION-component.

When the end of an utterance is recognized (sentence markers; colons can act as end markers too), the SEMANTIC-component tests for completion. If a main verb and/or an obligatory case slot is missing, then a procedure is triggered to fill this gap. This inference procedure first inspects the current states of the pragmatic components to gather information as to which categories they expect next and whether the partial analysis fits the requirements of the respective category. This information is then used by various inference rules to fix the missing verb or case slot. Let us consider some examples:

1. "vor bis zum Kaufhof." ("ahead to the Kaufhof")
Expectations of the pragmatic components:
QUESTION-ANSWER-INTERACTION-comp.: answer
TASK-INTERACTION-comp.: an act of information-giving
TASK-SPECIFICATION-comp.: route description, place description, partial goal determination, goal declaration
SEMANTIC-comp.: "zum Kaufhof" is categorized as a GOAL case slot.
The categories goal declaration and place description can be discarded, because their requirements are not matched. Since an explicit goal (building, street connection etc.) is uttered, the requirements of partial goal determination are fulfilled first. This category requires a verb of the field MOVE, e.g. "gehen" ("to go"). The GOAL case slot matches one of the requirements of the verb, but an AGENT is still missing. Since the utterance is part of a dialogue and is directed from the person who is asked to give a direction to the person who asked for it, a reference to the latter, "sie" ("you"), is taken as AGENT.

2. "gradaus durch die Fressgass'" ("straight on through the Fressgass'")
The expectations of the pragmatic components are the same as above. "durch die Fressgass'" is categorized as a PATH case slot. In this case a route description is checked first, and again a MOVE-verb is taken as a candidate for the verb. The PATH case slot matches its requirements, and the adverb "gradaus" is a possible description of the manner of moving. The AGENT case slot is found as above.

3. Finally, a rather amusing example. One of our dialogues starts with the following sequence:
X: to the old opera?
Y: Yes?
Here Y must have recognized, presumably by eye contact, that X wants to get into contact with him. X's answer, itself a question, is quite impolite but understandable. Syntactically this utterance is an elliptical question (voice rising when uttered), and on the semantic stage it can be categorized as a GOAL case slot, depending on "zur" and the fact that the NP refers to a building.
Since it is at the beginning of a task-oriented dialogue with no task fixed so far, it is categorized as a goal declaration. A complete version of this utterance might be "How can I get to the old opera?" Another possible interpretation is that X only wants to be confirmed in her/his assumption that she/he is on the right way to her/his goal. In this case a correct answer would have been simply "yes". But which interpretation holds true cannot be decided with the available information.

V CONCLUSION

It has been shown how some types of ill-formed input are handled, especially with the help of semantic constraints and pragmatic considerations. At present, our work in this field concentrates on handling self-corrections above the word level, such as the one in line 5 of the sample translation.

ACKNOWLEDGEMENTS

I would like to thank D. Metzing, T. Christaller and B. Terwey, without whose cooperation this work would not have been possible.

REFERENCES

ACL 82
Proc. of 20th Annual Meeting of the Association for Computational Linguistics, Toronto, 1982

Carbonell, J.G.
"The EXCALIBUR project: A natural language interface to expert systems", in: Proc. 8th IJCAI Karlsruhe 1983, Los Altos, Ca., 1983

Christaller, T., Metzing, D.
"Parsing Interaction: a multilevel parser formalism based on cascaded ATNs", in: Sparck-Jones, K., Wilks, Y. (eds.), Automatic Natural Language Parsing, Chichester, 1983

Gehrke, M.
"Rekonstruktion aufgabenorientierter Dialoge mit einem mehrstufigen Parsing-Algorithmus auf der Grundlage kaskadierter ATNs", in: W. Wahlster (ed.), Proc. of 6th German Workshop on AI, Berlin-Heidelberg-New York, 1982

Gehrke, M.
"Syntax, Semantics and Pragmatics in Concert: an incremental, multilevel approach in reconstructing task-oriented dialogues", in: Proc. 8th IJCAI Karlsruhe 1983, Los Altos, Ca., 1983

Klein, W.
"Wegauskuenfte", Zeitschrift fuer Linguistik und Literaturwissenschaft, 9: 9-57, (1979)

Kwasny, S.C.
Treatment of ungrammatical and extragrammatical phenomena in natural language understanding systems, Indiana University, 1980

di Primio, F., Christaller, T.
A poor man's flavor system, ISSCO, Geneva, 1983

Sondheimer, N.K., Weischedel, R.M.
"A rule based Approach to Ill-formed Input", in: Proc. of COLING 80, Tokyo, 1980

Wahlster, W., Marburger, H., Jameson, A., Busemann, S.
"Over-Answering Yes-No Questions: Extended Responses in a NL Interface to a Vision System", in: Proc. 8th IJCAI Karlsruhe 83, Los Altos, Ca., 1983

Weischedel, R.M., Sondheimer, N.K.
"An Improved Heuristic for Ellipsis Processing", ACL 82, 85-88

Woods, W.A.
"Cascaded ATN Grammars", Journal of ACL, 6: 1 (1980), 1-13

Uploaded: 24/03/2014, 05:21
