Báo cáo khoa học: "Evaluation of Importance of Sentences based on Connectivity to Title" doc

5 333 0
Báo cáo khoa học: "Evaluation of Importance of Sentences based on Connectivity to Title" doc

Đang tải... (xem toàn văn)

Thông tin tài liệu

Evaluation of Importance of Sentences based on Connectivity to Title Takehiko Yoshimi and Toshiyuki Okunishi Takahiro Yamaji and Yoji Fukumochi Software Business Development Center, SHARP Corporation 492 Minosho-cho Yamatokoriyama Nara, Japan Abstract This paper proposes a method of selecting impor- tant sentences from a text based on the evaluation of the connectivity between sentences by using sur- face information. We assume that the title of a text is the most concise statement which expresses the most essential information of the text, and that the closer a sentence relates to an important sentence, the more important this sentence is. The importance of a sentence is defined as the connectivity between the sentence and the title. The connectivity between two sentences is measured based on correference be- tween a pronoun and a preceding (pro)noun, and on lexical cohesion of lexical items. In an experiment with 80 English texts, which consist of an average of 29.0 sentences, the proposed method has marked recall of 78.2% and precision of 57.7%, with the se- lection ratio being 25%. The recall and precision values surpass those achieved by conventional meth- ods, which means that our method is more effective in abridging relatively short texts. (Luhn, 1958; Edmundson, 1969; i~I~I~, 1987; l~)g~)~ and ~I~, 1988; r~]i~l~ et al., 1989; Salton et al., 1994; Brandow et al., 1995; ~'2~tZ~ and ~3kl~, 1995; {~)l~ et al., 1995; I£1~$1~ et al., 1995; Watanabe, 1996; Zechner, 1996; {q~it~, 1997). 1443 i~©~ L~, 1) ~~9 ~ b~ ~tS~ (F,~i~k~ et al., 1989; Ono et al., 1994) (Hoey, 1991; Collier, 1994; ~gi~ , 1997; {~ ;~ ~ et al., 1993) 7)~b5. ~&i~-~t,¢<'~-~ ot,¢7)~ 9 ~)~b;5 (Halliday and Hasan, 1976) 7)~, ~ 2 2.1 ff-#Y, I- ~Ili~ J- R:q) ~_~l~ I= ~ ~ 7~ {K~ ~:Z ( ¢~ ~) ~ b ~? :Z ( ©~ ~) ~ © ;~ ~ [A,B,C} ~. $2 $4"~'' ~. ~. '\ {A,D,E} [B,C,G} ~ ~\, / ~'k IA,D,H,K} { A,D,E,F} {D,H} f4~ .~lZ ~ 7o ~}/~[~.( 5~ ~ ~) [] 1: ~¢~/~6~]~ ~. i<3 (1) 2.2 ~a~ ~9 ~1~ o~ ~ffi .~.~ ~ ©~-e~:, #o~ ~ ,~ Iz t~ t ~-c ~, ~. I~, ~-~, ),.~.f~, .~-~, ~-~, ~l~© ~,-~-e~ ~ ~ L.,= e~_~°, t~.~. & eS~¢)~t~ = M~,I (e) y_,y_z2, Mi, i ~S i ~©~ ~ t~S~ (2) ~9,~.~t~ 2.2A ~_/~-~-t6. _ 9¢~ 1444 2.2.1 A,~~(~) :~O)~,,~o)~t~ 2.2.2 ~@~ ~"~ ~/J¢ ~.J ~ ~ to~=~d'h~t~, ff~J~_[~, "put pressure on" "put" [:t~' ~W, "cabinet meeting" ~ "meet- ing" g~ ~ ~k'~ ,~,.~h~- 6. 2.2.3 ~-,, 0) ~:g~ {-~ :k:-~ t~{[~_ ~ (Edmundson, 1969; P,~8,~ et al., 1989; Watanabe, 1996) ©~){~'~bTo. ~g~'~t±, t 1 b, ~.~I: w : 5 ~ bt:. 2.2.4 ~-~:a)~.a) 9~ 9 ~X Sj t:_$31,,~©~ (theme) ~ b'~l~, (Givon, 1979). ~o~:, ~ Sj ¢)~Y~S~ ~e)~tZ]~ (t~, 1985)~L ~T~, ~e{£~$<, ~ ~:~#~, Sj ~ ~l~]~ ~o~ , ~©~i~:Y. ~" ~X~¢) 1/4 ~-F¢~ 3 ¢)~© 17.9% ~b o ~: b'~I~-~hl~T~ (~k~/K and ~ t F ~S¢)~_~ = N -:zx~.~ A, ~~,~o©-:z:~-~ B, C, 25% ~ LT:. ~2 1:-3:~l~, ~ +Yi=~J~ 1445 20 15 ~ b~J~ 10 I I © 0 0 0 © © 0 0 © 0 0 © 0 0 0 0 © O0 0 0 0 0 O0 0 0 0 0 O0 I 40 60 I 2O [] 2: ~-~t:~ 6~/~¢ $ © © © © I 8O 100 ~-~ 78.2% 57.7% 25% -:/x ~.h A 72.3% 52.6% 26% -:/;z ~ ~ B 61.7% 39.5% 29% -:/y~ ~.h C 61.4% 40.9% 29% ->" ~ ~- A D 57.5% 42.2% 27% ~g/~otz=k-~5. ~za~-~x b-~l~, "shooting" "gunfire" ©~1~,~-~-~ t;~z~, "gun- fire" ~ZE~t~ g'~ ~lz ~o ot~7]~ ~ t~!, , ~ 7% ~#©(~ b ~:~ (base) ©~J~ =~ ~t~ ~_tff, nounce" ~ "announcement" t~:, ~.~ L~ b~" 1446 t$5. 4 ~3~ ~) l: t: ~ btz ~ -~, ~ 78.2%, ~#~ 57.7% ©~ ~6 : k ~, U~=. RIU (Hearst, 1997), ~r+)-:~ " b l:°y~ ' ='~ {:-~-T-~ References R. Brandow, K. Mitze, and L. F. Rau. 1995. Auto- matic Condensation of Electric Publications by Sentence Selection. Information Processing &' Management, 31(5):675-685. 100 80 6O ~#m / ~m (%) 4O 20 0 0 ~,~ ' , _ ~'¢¢ x. ~ -~ + ' i~IgN~NNo~N~ -+. - +. I I I I 20 40 60 80 100 ~m (%) ~3:~~~~ A. Collier. 1994. A System for Automating Concor- dance Line Selection. In Proceedings of NeMLaP, pages 95-100. H. P. Ednmndson. 1969. New Methods in Automatic Extracting. Journal of the ACM, 16(2):264-285. T. Givon. 1979. From Discourse to Syntax: Gram- mar as a Processing Strategy. In T. Givon, editor, Discourse and Syntax, pages 81-112. Academic Press. M. A. K. Halliday and R. Hasan. 1976. Cohesion in English. Longman. M. A. Hearst. 1997. TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages. Compu- tational Linguistics, 23(1):33-64. M. Hoey. 1991. Patterns of Lexis in Text. Describ- ing English Language. Oxford University Press. H. P. Luhn. 1958. The Automatic Creation of Lit- erature Abstracts. IBM Journal for Research and Development, 2(2):159-165. K. Ono, K. Sumita, and S. Miike. 1994. Abstract Generation based on Rhetorical Structure Extrac- tion. In Proceedings of COLING, pages 344-348. G. Salton, J. Allan, C. Buckley, and A. Singhal. 1994. Automatic Analysis, Theme Generation, and Summarization of Machine-Readable Texts. Science, 264(3):1421-1426. H. Watanabe. 1996. A Method for Abstracting Newspaper Articles by Using Surface Clues. In Proceedings of COLING, pages 974-979. 1447 K. Zechner. 1996. Fast Generation of Abstracts from General Domain Text Corpora by Extract- ing Relevant Sentences. In Proceedings of COL- ING, pages 986-989. r,~]i¢~, ~i/~, and ~I~ 1989. ~P)~3~cP~')~]~t~ ~I:-"p~'~. NLC89-40, ~-T-~{~. ~I~B. 1987. ~~t~-:,':xff-2~. NL63- 6, '1~¢~~. ~'~ ~, ~, and P~J~ 1993. ~3~ ~. ~J~J~Fq, ~, and ~1~ 1995. ~-Y-=~ :~©~'4~-~ ~-~. ~~:~, 36(10):2371-2379. ~g~n~:, ~1~, and I~ 1995. ~t~ ~-~Iz$~J~ L ~': ~~-:/~ ff h GREEN. ~J~, 2(1):39-55. ~'2~1:1:~ and ~<;~gl~. 1995. ~ffl/¢~ :/©~ ~~~, 36(8):1838-1844. {~. 1997. ~L~$~J~bt:.~[~ • t~ b~ ~. ~. 1985. *~©~t. :k~. ~2g?~ 1997. 3~©~#~t:-~-'-J< I~1~. In ,,J~J~x~ 31~lZ~:~-~l~, pages 321- 324. ~g~)X and ~1~. 1988. ~ ~7 1~'~9~ ~~ ~ ~, 29(3):325-328. . Evaluation of Importance of Sentences based on Connectivity to Title Takehiko Yoshimi and Toshiyuki Okunishi Takahiro Yamaji and Yoji Fukumochi Software. on the evaluation of the connectivity between sentences by using sur- face information. We assume that the title of a text is the most concise statement

Ngày đăng: 23/03/2014, 19:20

Tài liệu cùng người dùng

  • Đang cập nhật ...

Tài liệu liên quan