thematic segmentation of texts

Scientific report: "Thematic segmentation of texts: two methods for two kinds of texts" pdf

Upload date: 08/03/2014, 05:21
... Thematic segmentation of texts: two methods for two kinds of texts Olivier FERRET LIMSI-CNRS Bât. 508 - BP 133 F-91403, Orsay ... process a large amount of texts, word-based approaches have been developed. Hearst (1997) and Masson (1995) make use of the word distribution in a text to find a thematic segmentation. These ... interested in the thematic dimension of the texts, they have to be represented by their significant features from that point of view. So, we only hold for each text the lemmatized form of its nouns,...

Document: Scientific report: "Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure" pdf

Upload date: 20/02/2014, 04:20
... isolated texts, to the problem of segmenting parallel parts of documents. The task of aligning each sentence of an abstract to one or more sentences of the body has been studied in the context of summarization ... facilitate the development of better user interfaces and improve the performance of summarization and information retrieval systems. Discourse segmentation of the documents composed of parallel parts ... emergence of the Web 2.0 technologies: many texts on the web are now accompanied with comments and discussions. Segmentation of these parallel parts into coherent fragments and discovery of hidden...

Document: Scientific report: "Unsupervised Segmentation of Chinese Text by Use of Branching Entropy" pdf

Upload date: 20/02/2014, 12:20
... [Figure residue: axis labels size (KB), recall, precision] Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 428–435, Sydney, July 2006. © 2006...

Document: Scientific report: "REPRESENTATION OF TEXTS FOR INFORMATION RETRIEVAL" pdf

Upload date: 21/02/2014, 20:20
... volume constraints typical of DR systems. The modifications are designed to recognize such aspects of discourse structure as establishment of topic; setting of context; summarizing; concept ... of adjacency, which counted stop-list words, to one which ignores stop-list words. This is to correct for the distortion caused by the distribution of function words in the recognition of ... systems for each of the proposed modifications. In this experiment the original corpus of thirty abstracts (but not the problem statements) is submitted to all versions of the analysis...

Scientific report: "Joint Identification and Segmentation of Domain-Specific Dialogue Acts for Conversational Dialogue Systems" doc

Upload date: 07/03/2014, 22:20
... number of words in utterances with only a single dialogue act is 7.5 (with a maximum of 34, and minimum of 1), and the average length of utterances with multiple dialogue acts is 15.7 (maximum of ... 30 words or more). Figure 2 shows the histogram of the average error (absolute value of word offset) in the start and end of the dialogue act segmentation. Each dialogue act identified by Algorithm ... and (2) instead of using the product of the scores of all segments, use the average score per segment: $(\mathrm{Score}(C_{h,i}) \cdot \mathrm{Score}(S_{h-1}))^{1/(1+N(S_{h-1}))}$ where $N(S_i)$ is the number of segments in...

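The per-segment average score quoted in the snippet above can be sketched in a few lines of Python. This is only an illustration of the quoted expression, not the paper's implementation: the function name and parameter names are hypothetical, and it assumes that Score(S_{h-1}) is the running product of the scores of the N(S_{h-1}) segments found so far, which the truncated snippet does not confirm.

```python
def per_segment_score(candidate_score: float,
                      previous_score: float,
                      previous_segment_count: int) -> float:
    """Evaluate (Score(C_{h,i}) * Score(S_{h-1})) ** (1 / (1 + N(S_{h-1}))).

    candidate_score        -- Score(C_{h,i}), score of the candidate segment
    previous_score         -- Score(S_{h-1}), assumed to be the product of
                              the scores of the earlier segments
    previous_segment_count -- N(S_{h-1}), number of earlier segments
    All three names are illustrative and do not come from the paper.
    """
    if candidate_score < 0 or previous_score < 0:
        raise ValueError("scores must be non-negative")
    if previous_segment_count < 0:
        raise ValueError("segment count must be non-negative")
    # Taking the (1 + N)-th root turns a product of scores into a geometric
    # mean, so segmentations with more segments are not penalized merely
    # for multiplying more factors below 1.
    return (candidate_score * previous_score) ** (1.0 / (1 + previous_segment_count))
```

With no earlier segments the root has exponent 1, so the function returns the plain product; with one earlier segment it returns the square root of the product.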
Scientific report: "Understanding the thematic structure of the Qur’an: an exploratory multivariate approach" pdf

Upload date: 08/03/2014, 04:22
... the thematic structure of the Qur’an based on a fundamental idea in data mining and related disciplines: that, with respect to some collection of texts, the lexical frequency profiles of the ... following: “Figure 6. Plot of groups C and D” Results from figure 6 are also supportive of the thematic structure of each group. Suras of group C are more abundant in the use of narratives and addressing ... following plot of number of content words per sura, sorted in order of descending magnitude. “Figure 1. Plot of number of words per sura” Clearly, given a word with some probability of occurrence,...

Scientific report: "Searching for Topics in a Large Collection of Texts" doc

Upload date: 08/03/2014, 04:22
... be a graph of collection . Each subset is called a cut of ; stands for the complement . If are disjoint cuts then is a set of edges within cut ; is called weight of cut ; is a set of edges between ... weight of the connection between cuts and ; is the expected weight of edge in graph ; is the expected weight of cut ; is the expected weight of the connection between cut X and the rest of the ... parameter. This feature of the GRA has been designed for the sake of generalization, in order to not overfit the input sample. The input of the GRA consists of (i) a sample set of document vectors...

Scientific report: "Combining a Statistical Language Model with Logistic Regression to Predict the Lexical and Syntactic Difficulty of Texts for FFL" potx

Upload date: 08/03/2014, 21:20
... number of classes is small. Exploratory analysis of the corpus highlighted the importance of having a similar number of texts per class. This requirement made it impossible to use all the texts ... proportion of predictions that were within one level of the human-assigned 24 M. Brysbaert, M. Lange, and I. Van Wijnendaele. 2000. The effects of age-of-acquisition and frequency-of-occurrence ... generalisability of the formula, and also to check what sort of texts it does not fit (e.g. statistical descriptive analyses have considered songs and poems as outliers). 4 Selection of lexical and...

a thematic analysis of psycho

Upload date: 21/03/2014, 21:55
... begins with a view of a city that is arbitrarily identified along with an exact date and time. The camera, seemingly at random, chooses first one of the many buildings and then one of the many windows ... that Hitchcock first introduces the notion of a split personality to the audience. Throughout the first part of the film, Marion's reflection is often noted in several mirrors and windows. ... height of its foreshadowing power as Marion battles both sides of her conscience while driving on an ominous and seemingly endless road toward the Bates Motel. Marion wrestles with the voices of...
movies a thematic analysis of alfred hitchcock s

Upload date: 21/03/2014, 22:10
... audience until the end of the film, Movies: A Thematic Analysis of Alfred Hitchcock's Psycho Alfred Hitchcock's Psycho has been commended for forming the archetypical basis of all horror films ... aspects of life. The effective use of character parallels and the creation of the audience's subjective role in the plot allows Hitchcock to entice terror and convey a lingering sense of anxiety ... height of its foreshadowing power as Marion battles both sides of her conscience while driving on an ominous and seemingly endless road toward the Bates Motel. Marion wrestles with the voices of...

Scientific report: "Discourse Segmentation of Multi-Party Conversation" doc

Upload date: 23/03/2014, 19:20
... 19 of the meetings, which seems to indicate that segment identification is a feasible task. 2 4 Segmentation based on Lexical Cohesion Previous work on discourse segmentation of written texts ... error of P_k = 15.79, while the average performance of the algorithm is P_k = 15.31 on the WSJ test corpus (unknown number of segments). mean and the variance of the hypothesized probabilities of ... a segmentation algorithm that utilizes a metric of lexical cohesion and that performs as well as state-of-the-art text-based segmentation techniques. It works both with written and spoken texts. ...

Scientific report: "Automatic Segmentation of Multiparty Dialogue" pot

Upload date: 24/03/2014, 03:20
... alue. 278 Automatic Segmentation of Multiparty Dialogue Pei-Yun Hsueh School of Informatics University of Edinburgh Edinburgh, EH8 9LW, GB p.hsueh@ed.ac.uk Johanna D. Moore School of Informatics University of ... for automatic segmentation of meetings at different levels of granularity, we will explore the use of 279 and van Mulbregt et al. (1999) use topic language models and variants of the hidden ... shifts to the problem of identifying subtopic boundaries. We then explore the impact on performance of using ASR output as opposed to human transcription. Examination of the effect of features shows that...

Scientific report: "Unsupervised Segmentation of Words Using Prior Distributions of Morph Length and Frequency" ppt

Upload date: 31/03/2014, 03:20
... [Figure legend: Probabilistic, Recursive MDL, Linguistica, No segmentation] Figure 2: Expectation of the percentage of recognized morphemes for English data. a baseline of no segmentation fairly high. The no-segmentation baseline ... obtain a segmentation of the words in the corpus, since we can rewrite every word as a sequence of morphs. 2.1 Size of the morph lexicon We start the generation process by deciding the number of ... segmentation of a set of words consists of the following steps: (1) Segment the words in the corpus using the automatic segmentation algorithm. (2) Divide the segmented data into two parts of equal...
