Proceedings of the 12th Conference of the European Chapter of the ACL, pages 835–842,
Athens, Greece, 30 March – 3 April 2009.
© 2009 Association for Computational Linguistics
Growing Finely-Discriminating Taxonomies from Seeds
of Varying Quality and Size
Tony Veale
tony.veale@ucd.ie
Guofu Li
guofu.li@ucd.ie
Yanfen Hao
yanfen.hao@ucd.ie
Abstract
! "
#
$"%%
!
1 Introduction
&
'
!&
(')*++,-
./(%01,,*-
"%(.et al.*++,-!2
$
! &
3(!!4
51,,6-!
)
3! %'0
78 # 3
(!! 59 (*+:6-
52;- #
!
#
!0
"%
2
!0
(5 1,,<-! 4 "%
#!"
=
!
grown
seeds!
&
7>?%/%8
!""%
# 2
&7
8
sharp!;
(-
@"%
X-ness! /
!
0
1
A
!<
=
#
!>
B
#
!&
6!
2 Related Work
#
(>
2*+::-!;5
(*++1-
#
4
(*+++- !
3(
-
! & KnowItAll 2 et al.
(1,,<-
5(!!7%0%0*%01C8-
#
! " (1,,D-
5
#
=
7 E
%/%87%/%E8
(-(-
(-!
?%(1,,<-
##
!&
"%
=
!>
! F et al.
(1,,B-
! >
=
!F et al. (1,,:-
5 (*++1-
!;
7%/%
%/%
E8
E
%/%
%/%
!
!&
$
reckless
(E-(%/%
-
=
!
&
F et al!(1,,:-!"
#
7>?
%/%
E87>?
E
%/%
8!>
(>?
- # !
&
"%"!Fet al!(1,,:-
3 states countries (
-singersfish(
-
food
sweet (G51,,D-!4
"%
"%"%
!
3 Seeds for Taxonomic Growth
>
&
3
HI
3
0
J
0
3
=
3
0
! &
⟨cola, carbonated, drink⟩.
cola
(treatrefreshment-#
7E8
#7
E8!
#
#! "
$"
%%!
3.1 WordNet
& "%
!;"%
{feline, felid}
{true_cat, cat} {big_cat, cat}
!
5
"%
6,K Xess
"%(ess,
ess, ?ess!-
female !
%
"%
! >
⟨lioness, female, lion⟩, ⟨espresso, strong, coffee⟩,
⟨messiah, awaited, king⟩, ⟨messiah, expected, deliverer⟩.
3.2 ConceptNet
"%
!
%('1,,<-
"""
!
%
(-
!'
>
% espresso
strong coffee("%-
bagelJewish word(usemen-
tion-!'expressionism
artistic style ("%
artistic movement- explosion
suicide attack(-!
%
"%
! " %
A,,,,>
(78-
(!!7 8-
(!!78-
"%!
⟨Wyoming, great, state⟩, ⟨wreck, serious, accident⟩,
⟨wolf, wild, animal⟩.
3.3 Web-derived Stereotypes
G5(1,,D-
#
7>?%/%8
!&
#!!
#
!5
*BL
(!!787
8!-
6,,,
1,,, 3 ! 5
G59
!
&
=
⟨surgeon, skilful, ?⟩, ⟨virus, malicious, ?⟩,
⟨dog, loyal, ?⟩.
#
!
3.4 Overview of Seed Resources
%
!&"%
#
!&
%
"%
!>
G5
# 3
3
!>#
&*!
"% %
M
*111D **AA 6B*1
M
B*A*< *:,: *66::
M
<!*1 *!6 1!B6
M
1A,B BB, **D1
&*$&
!
""%
$
(-(
-
(
-! 4
#
!B
#
!
4 Bootstrapping from Seeds
&
!
NN
!&
&
3
HI
3
0
J
#
# (E
-$
*! 7
3
E
8
1! 7
3
0
E8
#
3
0
!
#
#$
A! 7E
3
0
E8
<! 7E
3
E
8
&
! ;
7
8 #
⟨lemonade, cold, beverage⟩, ⟨lemonade, refreshing, beverage⟩.
(
3
-
!
"&
#
#O ex-
pand(T')!"
) >0! /
# 1,,
#B,
#!
"
#
StK
t
S
. &
$$K_0^S = S$$

$$K_1^S = K_0^S \cup \{\, T \mid T' \in S \wedge T \in \mathit{expand}(T') \,\}$$

$$K_{t+1}^S = K_t^S \cup \{\, T \mid T' \in K_t^S \wedge T \in \mathit{expand}(T') \,\}$$
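The recurrence above can be read as a simple fixed-point iteration over sets of triples. The following is a minimal sketch only, not the implementation used in this work: the function `bootstrap`, the toy table `TOY_EXPANSIONS` and the stand-in `toy_expand` are hypothetical names introduced for illustration, and the real expand(T') is left abstract.

```python
from typing import Callable, Iterable, Set, Tuple

# A triple is (term C, discriminator D, parent P), e.g. ("cola", "refreshing", "beverage").
Triple = Tuple[str, str, str]


def bootstrap(seed: Set[Triple],
              expand: Callable[[Triple], Iterable[Triple]],
              cycles: int) -> Set[Triple]:
    """Return K_cycles^S, starting from K_0^S = seed."""
    knowledge = set(seed)  # K_0^S = S
    for _ in range(cycles):
        new_triples: Set[Triple] = set()
        for triple in knowledge:  # every T' in K_t^S
            new_triples.update(expand(triple))  # every T in expand(T')
        knowledge |= new_triples  # K_{t+1}^S = K_t^S united with the new triples
    return knowledge


# Hypothetical toy expansion table standing in for the real expand(T').
TOY_EXPANSIONS = {
    ("cola", "refreshing", "beverage"): [("lemonade", "refreshing", "beverage")],
    ("lemonade", "refreshing", "beverage"): [("lemonade", "cold", "beverage")],
}


def toy_expand(triple: Triple) -> Iterable[Triple]:
    """Look up the toy expansions of a triple (illustrative assumption)."""
    return TOY_EXPANSIONS.get(triple, [])


seed = {("cola", "refreshing", "beverage")}
k1 = bootstrap(seed, toy_expand, 1)
k2 = bootstrap(seed, toy_expand, 2)
```

Because each cycle only adds triples, the sets grow monotonically and the iteration stabilises once expand yields nothing new.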
"#
3
!
# ex-
pand(T') !;
Fet al.(1,,:-
reckless bootstrapping
#
!&
3
!"
"% near-miss$
I
3
0
J"%
0
(-
0
( -!&
#
"%
#
"%(
"%-!&
$
$$K_{t+1}^S = K_t^S \cup \{\, T \mid T' \in K_t^S \wedge T \in \mathit{filter}_{\text{near-miss}}(\mathit{expand}(T')) \,\}$$
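A filtered cycle differs from the unfiltered one only in that each candidate triple must pass a test before it joins the knowledge-base. The sketch below illustrates that shape under loud assumptions: the exact near-miss criterion is not reproduced here, so `toy_accept` is a placeholder predicate over a hypothetical hypernym table `TOY_PARENT`, and `filtered_step` and `toy_expand` are likewise illustrative names.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (term C, discriminator D, parent P)

# Hypothetical toy hypernym links (child -> parent); not taken from WordNet.
TOY_PARENT = {
    "cola": "beverage",
    "lemonade": "beverage",
    "beverage": "liquid",
    "drink": "liquid",
}


def ancestors(term: str) -> Set[str]:
    """Collect every hypernym reachable from term in the toy table."""
    seen: Set[str] = set()
    while term in TOY_PARENT and TOY_PARENT[term] not in seen:
        term = TOY_PARENT[term]
        seen.add(term)
    return seen


def toy_accept(triple: Triple) -> bool:
    """Placeholder test (assumption): accept a hit, where P is an ancestor
    of C, or a near-miss, where a direct hypernym of P is an ancestor of C."""
    term, _discriminator, parent = triple
    up = ancestors(term)
    return parent in up or TOY_PARENT.get(parent) in up


def filtered_step(knowledge: Set[Triple],
                  expand: Callable[[Triple], Iterable[Triple]],
                  accept: Callable[[Triple], bool]) -> Set[Triple]:
    """One cycle: K_{t+1} = K_t united with the accepted members of expand(T')."""
    candidates: Set[Triple] = set()
    for triple in knowledge:
        candidates.update(t for t in expand(triple) if accept(t))
    return knowledge | candidates


def toy_expand(triple: Triple) -> Iterable[Triple]:
    """Hypothetical expansion that proposes both good and bad candidates."""
    return [
        ("lemonade", "refreshing", "beverage"),  # hit
        ("cola", "fizzy", "drink"),              # near-miss
        ("cola", "corrosive", "substance"),      # rejected
    ]


seed = {("cola", "refreshing", "beverage")}
grown = filtered_step(seed, toy_expand, toy_accept)
```

Passing the predicate as a parameter keeps the growth loop independent of whichever resource is used to vet candidates.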
;*1
!
Q*
%
ND
<,
"%N
! & "% near-
miss
#!
;*$)#
B!
;1$)
#B
!
4.1 An Example
cola
: ⟨cola, refreshing, beverage⟩.
cola
effervescent beverage
sweet beverage nonalcoholic beverage
!>
sugary foodfizzy drinkdark mixer!
> sensitive
beverage everyday beverage common
drink!>
irritating food unhealthy drink!>
stimulating
drinktoxic foodcorrosive substance!
cola
*<*<A1
D1A+A<*,1
B!
refreshing beverage
champagne
lemonadebeer!
[Figure: # Triples per Bootstrapping Cycle (0 to 5); series: WordNet, Simile, ConceptNet]
[Figure: # Terms per Bootstrapping Cycle (0 to 5); series: WordNet, Simile, ConceptNet]
5 Empirical Evaluation
&"% near-miss
(0
-
(
-
(
3
- #
!&
#
#
3
#
3
!;
>
0 (1,,B-
#
O
"
%!
>0
<,1 1*
"%!&#
( hot
red!-#R(a|an|
the) * C
i
(is|was)R
3
!
not #
#
(0
-(
Temperature hot
-!&#+<+:+
<,1!&
'&/
F(1,,1-!4
<,1
1* >
0
1* "%
!&
"%
(-
!
3B6!DL
"%!4
61!DL!.0>
,!61D
,!AA:B*A<B
<,1!
*
We replicate the above experiments using the same 402 nouns, and assess
the clustering accuracy (again using WordNet as a gold-standard) after
each bootstrapping cycle. Recall that we use only the D_j fields of each
triple as features for the clustering process, so the comparison with
the WordNet gold-standard is still a fair one. Once again, the goal is
to determine how closely the taxonomy that is clustered automatically
from the discriminating words D_j alone resembles the human-crafted
WordNet taxonomy. The clustering accuracies for all three seeds are
shown in Tables 2, 3 and 4.
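As an illustration of how the D_j fields alone can serve as clustering features, the sketch below collects, for each test noun, the set of discriminators found in its triples. It rests on stated assumptions: the names `discriminator_features` and `coverage` are hypothetical, and coverage is taken to mean the fraction of test nouns that receive at least one feature, which is our reading of the Coverage column rather than a definition given in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (term C_i, discriminator D_j, parent P_k)


def discriminator_features(triples: Iterable[Triple],
                           nouns: Set[str]) -> Dict[str, Set[str]]:
    """Map each test noun to its D_j values, ignoring the parent field P_k."""
    features: Dict[str, Set[str]] = defaultdict(set)
    for term, discriminator, _parent in triples:
        if term in nouns:
            features[term].add(discriminator)
    return dict(features)


def coverage(features: Dict[str, Set[str]], nouns: Set[str]) -> float:
    """Fraction of test nouns with at least one feature (assumed definition)."""
    return len(features) / len(nouns)


triples = [
    ("cola", "refreshing", "beverage"),
    ("cola", "carbonated", "drink"),
    ("lemonade", "cold", "beverage"),
    ("wolf", "wild", "animal"),  # not a test noun in this toy example
]
nouns = {"cola", "lemonade", "dodecahedron"}
features = discriminator_features(triples, nouns)
```

Dropping the parent field means the gold-standard hierarchy never leaks into the feature set being evaluated against it.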
Cycle   E      P      # Features   Coverage
1st     .327   .629      907         66%
2nd     .253   .712     1482         77%
3rd     .272   .717     2114         82%
4th     .312   .640     2473         83%
5th     .289   .684     2752         83%
Table 2: WordNet (2200)
Cycle   E      P      # Features   Coverage
1st     .115   .842      363         41%
2nd     .255   .724      787         59%
3rd     .286   .694     1362         74%
4th     .279   .694     1853         79%
5th     .299   .673     2274         82%
Table 3: ConceptNet
Cycle   E      P      # Features   Coverage
1st     .254   .716      837         59%
2nd     .280   .712     1338         73%
3rd     .289   .693     1944         79%
4th     .313   .660     2312         82%
5th     .157   .843     2614         82%
Table 4: Simile
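For readers who wish to reproduce figures of this kind, the sketch below computes cluster purity and entropy against a gold-standard labelling. It assumes that the P column denotes purity (as the Purity axis of the accompanying plot suggests) and that the E column denotes entropy; the formulas are the standard ones for these measures and are not taken from the text, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def purity(clusters: List[List[str]], gold: Dict[str, str]) -> float:
    """Share of items that belong to the majority gold class of their cluster."""
    total = sum(len(cluster) for cluster in clusters)
    majority = sum(max(Counter(gold[item] for item in cluster).values())
                   for cluster in clusters if cluster)
    return majority / total


def entropy(clusters: List[List[str]], gold: Dict[str, str]) -> float:
    """Size-weighted cluster entropy, normalised by log(number of gold classes)."""
    total = sum(len(cluster) for cluster in clusters)
    num_classes = len(set(gold.values()))
    result = 0.0
    for cluster in clusters:
        if not cluster:
            continue
        counts = Counter(gold[item] for item in cluster)
        h = -sum((n / len(cluster)) * math.log(n / len(cluster))
                 for n in counts.values())
        if num_classes > 1:
            h /= math.log(num_classes)  # scale into [0, 1]
        result += (len(cluster) / total) * h
    return result


# Hypothetical toy data: five nouns, two gold classes, two clusters.
gold = {"cat": "animal", "dog": "animal",
        "car": "vehicle", "bus": "vehicle", "truck": "vehicle"}
clusters = [["cat", "dog", "car"], ["bus", "truck"]]
p = purity(clusters, gold)
e = entropy(clusters, gold)
```

Higher purity and lower entropy both indicate clusters that agree more closely with the gold-standard categories.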
& <,1
#casuarina, cinchona, do-
decahedron concavity>
*
"
!"=
,!61D61!DL!
0
#B *,,
4%!'
0 >
#B
!
#(
&*-
(S:1L-B ! &
yesteryear, nonce ( -
salient(3-jag, droop,
fluting, fete, throb, poundage, stinging, rouble,
rupee, riel, drachma, escudo, dinar, dirham,
lira,dispensationhoardairstream(
-riversidecurling!;
A<
$
#
!
;A$)
!
;<$0
!&
0>$
H,!61D!
4 "% %
6:L6DL
B
61!DL
0 >! 5
:<!AL
66!<L0>
and ( Tem-
peratureColor!-
D,!+L
!;
<,1
G5(1,,:-
(6+!:BL-!
4
#
!&
316*<
0 > B*A<B
!&
0>!
6 Conclusions
&
#
B
!%#
! 4
"%
0>!"
!
&
#! ;
G5(1,,D-
O=
!G
5(1,,:-
[Figure: Coverage per Bootstrapping Cycle (1 to 5); series: WordNet, Simile, ConceptNet]
[Figure: Purity per Bootstrapping Cycle (1 to 5); series: WordNet, Simile, ConceptNet, Poesio & Alm.]
!
7D
j
C
i
8
7D
j
P
k
C
i
8=
C
i
D
j
! 7 8
!
>
:1L
0> !
B
A1:
<,1
:*!B+L !
!>
#
! &
#F et
al. (1,,:-
!
#
#
#
!
References
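Ahlswede, T. and Evens, M. (1988). Parsing vs. Text Processing in the analysis of dictionary definitions. In Proc. of the 26th Annual Meeting of the ACL, 217–224.
Almuhareb, A. and Poesio, M. (2005). Concept Learning and Categorization from the Web. In Proc. of the annual meeting of the Cognitive Science Society.
Budanitsky, A. and Hirst, G. (2006). Evaluating WordNet-based Measures of Lexical Semantic Relatedness. Computational Linguistics, 32(1):13–47.
Cimiano, P. and Wenderoth, J. (2007). Automatic Acquisition of Ranked Qualia Structures from the Web. In Proc. of the 45th Annual Meeting of the ACL, 888–895.
Charniak, E. and Berland, M. (1999). Finding parts in very large corpora. In Proc. of the 37th Annual Meeting of the ACL, 57–64.
Etzioni, O., Kok, S., Soderland, S., Cafarella, M., Popescu, A-M., Weld, D., Downey, D., Shaked, T. and Yates, A. (2004). Web-scale information extraction in KnowItAll (preliminary results). In Proc. of the 13th WWW Conference, 100–109.
Hammond, K. J. (1986). CHEF: A Model of Case-based Planning. In Proc. of the 5th National Conference on Artificial Intelligence, 267–271. Philadelphia, Pennsylvania: American Association for Artificial Intelligence.
Hanks, P. (2004). WordNet: What is to be done? In Proc. of GWC’2004, the 2nd Global WordNet conference. Brno.
Hearst, M. (1992). Automatic acquisition of hyponyms from large text corpora. In Proc. of the 14th Int. Conf. on Computational Linguistics, 539–545.
Kashyap, V., Ramakrishnan, C., Thomas, C. and Sheth, A. (2005). TaxaMiner: an experimentation framework for automated taxonomy bootstrapping. Int. Journal of Web and Grid Services, 1(2), 240–266.
Karypis, G. (2002). CLUTO: A clustering toolkit. Technical Report 02-017, University of Minnesota. http://www.cs.umn.edu/~karypis/cluto/
Kozareva, Z., Riloff, E. and Hovy, E. (2008). Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs. In Proc. of the 46th Annual Meeting of the ACL.
Lenat, D. B. and Guha, R. V. (1990). Building large knowledge-based systems: representation and inference in the Cyc project. NY: Addison-Wesley.
Liu, H. and Singh, P. (2004). ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, 22(4):211–226.
Miller, G., Beckwith, R., Fellbaum, C., Gross, D. and Miller, K. J. (1990). Introduction to WordNet: an on-line lexical database. Int. Journal of Lexicography, 3(4):235–244.
Niles, I. and Pease, A. (2001). Towards a standard upper ontology. In Proc. of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001).
Snow, R., Jurafsky, D. and Ng, A. Y. (2004). Learning syntactic patterns for automatic hypernym discovery. Advances in Neural Information Processing Systems 17.
Veale, T. and Hao, Y. (2007). Making Lexical Ontologies Functional and Context-Sensitive. In Proc. of the 45th Annual Meeting of the ACL, 57–64.
Veale, T. and Hao, Y. (2008). A Fluid Knowledge Representation for Understanding and Generating Creative Metaphors. In Proc. of Coling 2008, The 22nd International Conference on Computational Linguistics. Manchester.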