A quarterly bulletin of the IEEE computer society technical committee on Database engineering (VOL. 8)


... and “where.” The Do-specialist replaces the predicate DO (from the verb “do”) with a more specific verb chosen from those acquired for a domain. Although “do” does not appear as the main verb very often in the database query task, the translators deduce its implied presence in some queries, for instance in such comparative questions as “What countries cover more area than Peru does?” The comparative specialist examines the two arguments of a comparison to determine whether the comparison to be made is between two attribute values (e.g., Jack’s height and seven feet) or between an entity and some value (e.g., Jack and seven feet). In the latter case, TEAM tries to identify the appropriate attribute of the entity (e.g., Jack’s height).

2.3.4 Database Schema

The translation from logical form to SODA query requires knowing the exact structure of the target database and the manner in which the predicates appearing in the logical form are associated with the relations in the database. This information is provided by the database schema, which includes the following information:⁸
• Definition of sorts in terms of database relations (subject) or fields (and field value for sorts derived from feature fields).
• List of convenient identifying fields for each sort corresponding to a file subject or field.
• Definition of predicates in terms of actual database relations and attributes; this is done for predicates derived from both actual and virtual relations (for relation subjects and attributes).
• List of each relation’s key fields.

The database schema relates all the predicates in the conceptual schema to their representation in a particular database.

⁸ The schema translator also uses certain information in the conceptual schema, including taxonomic information in the sort hierarchy and delineation information associated with nonsort predicates.

[Figure 4: The Acquisition Menu (File Menu, Field Menu, Word Menu, and the Question Answering Area, here showing the acquisition of the field PEAK-HEIGHT)]
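A database-schema entry of the kind listed above can be pictured as a mapping. It takes a conceptual predicate, for one delineation (one tuple of argument sorts), to the relation and the fields that supply its arguments. The minimal sketch below assumes several things that are not TEAM's actual data structures. The dataclass, the sort name CITY for a capital, the argument order of HEMISPHERE-OF and the toy rows are all hypothetical. The field names come from the figures, and WORLDC-CAPITAL-OF follows the example discussed in this section.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class PredicateDefinition:
    relation: str            # actual relation (WORLDC, CONT) or virtual one (HEMIC)
    fields: Tuple[str, ...]  # field supplying argument 1, argument 2, ...

# Key: (predicate, sorts of its arguments). A predicate with several
# delineations gets one entry per set of argument sorts.
Schema = Dict[Tuple[str, Tuple[str, ...]], PredicateDefinition]
Database = Dict[str, List[Dict[str, str]]]

SCHEMA: Schema = {
    ("WORLDC-CAPITAL-OF", ("CITY", "COUNTRY")):
        PredicateDefinition("WORLDC", ("WORLDC-CAPITAL", "WORLDC-NAME")),
    ("HEMISPHERE-OF", ("COUNTRY", "HEMISPHERE")):
        PredicateDefinition("HEMIC", ("HEMIC-NAME", "HEMIC-HEMI")),
    ("HEMISPHERE-OF", ("CONTINENT", "HEMISPHERE")):
        PredicateDefinition("CONT", ("CONT-NAME", "CONT-HEMI")),
}

# Invented toy contents, for illustration only.
TOY_DB: Database = {
    "WORLDC": [{"WORLDC-NAME": "Peru", "WORLDC-CAPITAL": "Lima"}],
    "HEMIC": [{"HEMIC-NAME": "Peru", "HEMIC-HEMI": "southern"}],
    "CONT": [{"CONT-NAME": "Europe", "CONT-HEMI": "northern"}],
}

def holds(predicate: str, sorts: Tuple[str, ...], args: Tuple[str, ...],
          schema: Schema = SCHEMA, db: Database = TOY_DB) -> bool:
    """True if some tuple of the defining relation supplies every argument."""
    definition = schema[(predicate, sorts)]
    return any(
        all(row[field] == arg for field, arg in zip(definition.fields, args))
        for row in db[definition.relation]
    )
```

With this encoding, `holds("WORLDC-CAPITAL-OF", ("CITY", "COUNTRY"), ("Lima", "Peru"))` looks for a WORLDC tuple whose WORLDC-CAPITAL is Lima and whose WORLDC-NAME is Peru. Keying on the argument sorts is one simple way to keep a separate definition per delineation.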
For each predicate, the database schema generates a logic formula defining the predicate in terms of database relations. For example, the predicate WORLDC-CAPITAL-OF has as its associated database schema a formula representing the fact that its first argument is taken from the WORLDC-CAPITAL field of a tuple of the WORLDC relation, and that its second argument comes from the WORLDC-NAME field of the same relation. If a predicate has multiple delineations (i.e., if it applies to different sorts of arguments; e.g., a HEMISPHERE-OF predicate could apply to both COUNTRIES and CONTINENTS), the schema will include a separate definition for each set of arguments. In some cases (e.g., predicates resulting from the acquisition of some verbs and adjectives), the mapping associated with a predicate indicates that it is equivalent to another conceptual-schema predicate with certain arguments set to fixed values.

2.4 Acquisition

The acquisition component of TEAM is crucial to its success as a transportable system. Recall that one constraint on TEAM is that the DBE not be required to have any knowledge of TEAM’s internal workings, nor about the intricacies of the grammar, nor of computational linguistics in general. Yet detailed information, often necessarily linguistic in its orientation, must somehow be extracted from ... desirable that the acquisition component be designed to allow a DBE to change answers to questions and ...

... of language-processing tasks from the analysis of an English sentence to the generation of a database query. The rectangular boxes represent the processes, and the ovals to their right, the various knowledge sources.
The acquisition box on the right points to those knowledge sources that are augmented through interaction with the DBE. All other modules and knowledge sources are built into TEAM and remain unchanged during acquisition.

In this section we will look at the TEAM system from several angles. To begin, we will sketch the overall flow of processing during question-answering, describing the various processes involved in transforming an English query into a formal database query. The particular logical form (LF) that TEAM uses to encode the meaning of a query plays a crucial role in mediating between the way queries are posed and the way information is obtained from the database. Because of this, it affects the design of several components of the system. We then look in somewhat more detail at the data structures that encode domain-specific information. Finally, we discuss the overall strategy used for acquiring information about specific domains and databases.

2.1 Flow of Control

The flow of control during TEAM’s translation of a natural-language query into a formal query to the database is illustrated as the path on the left side of Figure 2, from top to bottom. The transformation takes place in two major steps. First, a representation of the literal meaning of the query, or logical form, is constructed. Second, this logical form is transformed into a database query.
The translation into logical form is performed by the DIALOGIC system. DIALOGIC comprises the following components, shown surrounded by the dotted box in Figure 2: the DIAMOND parser, the DIAGRAM grammar, the lexicon, semantic-interpretation functions, basic pragmatic functions, and procedures for determining the scope of quantifiers. Since a description of DIALOGIC is provided elsewhere [Gros82], let us discuss here only those aspects of the system that were influenced by the development of TEAM. Two central data structures in DIALOGIC that are affected by TEAM’s acquisition process are described: the lexicon and the conceptual schema.

To understand the semantic and pragmatic components of TEAM, it is also necessary to appreciate DIALOGIC’s separation of semantic interpretation operations into two main classes:
• translators, which define how the interpretations of the constituents of a phrase are combined into the phrase’s interpretation;
• basic semantic functions, which are called by the translators to assemble the actual logical-form fragments that form the interpretations of phrases.

In brief, when the end user asks a query, DIALOGIC parses the sentence, producing one or more trees representing possible syntactic structures. The “best” parse tree, based on a priori syntactic criteria, is selected and annotated with semantic information [Robi82, Mart83]. Next, pragmatic analysis is applied to assign specific, domain-relevant meanings to noun-noun combinations and to “vague” predicates like HAVE and OF.⁴ Finally, the quantifier-scope determination process, after considering all possible alternatives, determines the best relative scope for the quantifiers in the query. The logical form thus constructed, using a set of predicates that are meaningful with respect to the given domain and database, constitutes an unambiguous representation of the English query.

[Figure 2: TEAM System Diagram]
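The quantifier-scope step is described only as considering all possible alternatives and keeping the best one. A minimal sketch of that enumerate-and-rank idea follows. The quantifier encoding and the scoring function `preference` are hypothetical, not DIALOGIC's actual criteria. The heuristic keeps the surface order of the sentence unless a universal can take wider scope than an existential.

```python
from functools import partial
from itertools import permutations
from typing import Sequence, Tuple

# Illustrative encoding, not DIALOGIC's logical-form syntax:
# (determiner, variable, restriction), e.g. ("every", "c", "COUNTRY(c)").
Quantifier = Tuple[str, str, str]
Scoping = Tuple[Quantifier, ...]  # outermost quantifier first

EXISTENTIALS = {"a", "some"}

def preference(order: Scoping, surface: Sequence[Quantifier]) -> int:
    """Hypothetical score: -1 per inversion of the surface order,
    +2 per 'every' that outscopes an existential."""
    position = {q: i for i, q in enumerate(surface)}
    score = 0
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            outer, inner = order[i], order[j]
            if position[outer] > position[inner]:
                score -= 1
            if outer[0] == "every" and inner[0] in EXISTENTIALS:
                score += 2
    return score

def best_scoping(surface: Sequence[Quantifier]) -> Scoping:
    """Consider every relative scoping and return the highest-scoring one."""
    return max(permutations(surface), key=partial(preference, surface=surface))
```

For "Is there a peak in every country?" (surface order: a peak, every country), this heuristic puts `every` outermost. Exhaustive enumeration is factorial in the number of quantifiers. That is tolerable for the handful of quantifiers in a typical database query.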
The logical form produced by DIALOGIC is translated into a query in the SODA [Moor79]⁵ database query language by the schema translator. In addition to the conceptual schema, the schema translator uses a database schema that furnishes information about the particular database structures. This schema, described briefly below, is also affected by the acquisition process.

⁴ We consider these predicates vague because they can be applied to many kinds of entities; they are replaced by more specific predicates during pragmatic processing.
⁵ SODA is actually a query compiler. It takes queries in a standard relational formalism and compiles them into optimized queries in the languages of other database management systems; both relational and CODASYL DBMSs have been accommodated. For our experiments, an interpreter that follows SODA commands to access a small database in primary memory was used in ...

... add information as he gains experience with TEAM and the types of questions that are asked by the end users. In an attempt to satisfy all these constraints, the menu-oriented system depicted in Figure 4 was developed. The acquisition system consists of:
• a menu of general commands at the very top;
• three menus associated with relations, fields, and lexical items, respectively;
• at the bottom, a window for questions and answers.

[Figure 5: Acquiring the Virtual Relations PKCONT and HEMIC]

When the DBE uses the mouse to select one of the items from the three menus, a set of questions appears in the question-answering area at the bottom of the display, to which he can then respond. One of the general principles of acquisition is evident from this display. The acquisition is centered upon the relations and fields in the database, because this is the information most familiar to the DBE. The answers to each question can affect the lexicon, the conceptual schema, and the database schema. The DBE need not be aware of exactly why TEAM poses the questions it does; all he has to do is answer them correctly. Even the entries displayed in the word menu owe their presence to questions about the database.
The DBE volunteers entries to this menu in only three cases: to acquire a verb, to supply an adjective corresponding to some noun already in TEAM’s lexicon, or to enter a synonym for some lexicon-resident word. The DBE is assumed not to have any knowledge of formal linguistics or of natural-language processing methods. He is assumed, however, to know some general facts about English, for example, what proper nouns, verbs, plurals, and tense are, but nothing more detailed than that. Sometimes more sophisticated linguistic information is required, as in the case of verb acquisition. TEAM then asks questions about sample sentences and extracts the information it needs from the DBE's responses. This lets the DBE rely on his intuition as a native speaker.

Virtual relations are specified iconically. The left side of Figure 5 shows the acquisition of a virtual relation that identifies the continent (PKCONT-CONTINENT, derived from WORLDC-CONTINENT) of a peak (PKCONT-NAME, from PEAK-NAME) by performing a database join on the PEAK-COUNTRY and WORLDC-CONTINENT fields. Similarly, the right side of Figure 5 shows the acquisition of the virtual relation that encodes the hemisphere (HEMIC-HEMI) of a country (HEMIC-NAME) by joining on the WORLDC-CONTINENT and CONT-NAME fields.

If he wishes, the DBE can change previous answers. Incremental updates are possible because most of the methods for updating the various TEAM structures (lexicon, schemata) were devised to undo the effects of previous answers before asserting the effects of new ones. Help information is always available to assist the DBE when he is unsure how to answer a question. Selecting the question text with the mouse produces a more elaborate description of the information TEAM is trying to elicit, usually accompanied by pertinent examples. Finally, the acquisition component keeps track of what information remains to be supplied before TEAM has the minimum it needs to handle queries. The DBE does not have to determine himself how much information is sufficient. All he has to do is check that no acquisition window still shows unanswered questions.
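A virtual relation such as HEMIC is, in effect, a named join plus a projection with renamed fields. The sketch below computes it with a simple hash join on WORLDC-CONTINENT = CONT-NAME. It projects WORLDC-NAME as HEMIC-NAME and CONT-HEMI as HEMIC-HEMI. The rows and the helper `virtual_join` are invented for illustration. The text does not say whether TEAM ever materializes virtual relations; the sketch just evaluates the join once.

```python
from typing import Dict, List

Row = Dict[str, str]

# Invented rows; only the field names come from Figures 4 and 5.
WORLDC: List[Row] = [
    {"WORLDC-NAME": "France", "WORLDC-CONTINENT": "Europe"},
    {"WORLDC-NAME": "Australia", "WORLDC-CONTINENT": "Australia"},
]
CONT: List[Row] = [
    {"CONT-NAME": "Europe", "CONT-HEMI": "northern"},
    {"CONT-NAME": "Australia", "CONT-HEMI": "southern"},
]

def virtual_join(left: List[Row], right: List[Row], left_field: str,
                 right_field: str, projection: Dict[str, str]) -> List[Row]:
    """Equi-join left and right on left_field = right_field, then keep and
    rename fields (projection maps output field -> source field)."""
    # Hash the right relation on its join field.
    buckets: Dict[str, List[Row]] = {}
    for r_row in right:
        buckets.setdefault(r_row[right_field], []).append(r_row)
    result: List[Row] = []
    for l_row in left:
        for r_row in buckets.get(l_row[left_field], []):
            merged = {**l_row, **r_row}
            result.append({out: merged[src] for out, src in projection.items()})
    return result

# HEMIC(HEMIC-NAME, HEMIC-HEMI): hemisphere of a country, via its continent.
HEMIC = virtual_join(WORLDC, CONT, "WORLDC-CONTINENT", "CONT-NAME",
                     {"HEMIC-NAME": "WORLDC-NAME", "HEMIC-HEMI": "CONT-HEMI"})
```

The DBE only names the join fields and the fields to keep. Everything in `virtual_join` is mechanical, which is consistent with the goal of asking the DBE nothing beyond what he knows about the database.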
Of course, the DBE can always provide information beyond the minimum, for example, by supplying additional verbs, derived adjectives, or synonyms.

3 Conclusions

TEAM has been tested in a variety of multifile database domains by a fairly large number of people in addition to its original implementation team. The testing has been much less rigorous than would be required for an actual product. Even so, enough has been learned to conclude that the basic ideas “work.” It is possible to build a natural-language interface general enough to be adapted to new domains by users who know those domains. These users need not be experts on the system itself, nor specialists in AI or linguistics. TEAM handles a wide range of verbs, a capability that is absolutely essential for fluent natural-language communication. As it embodies no discourse model, its handling of pronoun resolution and determiner scoping is correspondingly limited. Its grammar coverage is quite extensive. However, the formalism used to represent the grammar and the processes used to implement it are yielding to newer and more perspicuous designs [Shie84]. We are now investigating ways to provide transportability in natural-language systems that can interact with a variety of software services beyond database access and in which more extensive discourse capabilities will be embodied.

Acknowledgments

Jerry R. Hobbs, Robert C. Moore, Jane J. Robinson, and Daniel Sagalowicz played important roles in the design of TEAM. Armar Archbold, Norman Haas, Gary Hendrix, Lorna Shinkle, Mark Stickel, and David H. Warren also contributed to the project.⁹

References

[Gros85] Barbara Grosz, Douglas E. Appelt, Paul Martin, and Fernando Pereira. TEAM: An Experiment in the Design of Transportable Natural Language Interfaces. Technical Note, Artificial Intelligence Center, SRI International, Menlo Park, California, 1985.
[Gros82] Barbara Grosz, Norman Haas, Gary G. Hendrix, Jerry Hobbs, Paul Martin, Robert Moore, Jane Robinson, and Stan Rosenschein. DIALOGIC: A Core Natural-Language Processing System.
Technical Note 270, Artificial Intelligence Center, SRI International, Menlo Park, California, November 1982.
[Hend77] Gary G. Hendrix. Human engineering for applied natural language processing. In Proc. of the Fifth International Joint Conference on Artificial Intelligence, pages 183-191, International Joint Conferences on Artificial Intelligence, Cambridge, Massachusetts, August 1977.
[Mart83] Paul Martin, Douglas Appelt, and Fernando Pereira. Transportability and generality in a natural-language interface system. In Alan Bundy, editor, Proc. of the Eighth International Joint Conference on Artificial Intelligence, pages 573-581, International Joint Conferences on Artificial Intelligence, August 1983.
[Moor79] Robert C. Moore. Handling Complex Queries in a Distributed Database. Technical Note 470, Artificial Intelligence Center, SRI International, Menlo Park, California, October 1979.
[Moor81] Robert C. Moore. Problems in logical form. In Proc. of the 19th Annual Meeting of the Association for Computational Linguistics, Stanford, California, 1981.
⁹ The development of TEAM was supported by DARPA contracts N00039-80-C-0645, N00039-83-C-0109, and N00039-80-C-0575; the National Library of Medicine NIH grant LM03611; and NSF grant IST-8209346.
[Robi82] Jane J. Robinson. DIAGRAM: A grammar for dialogues. Communications of the ACM, 25(1):27-47, 1982.
[Shie84] Stuart M. Shieber. The design of a computer language for linguistic information. In Proc. of Coling 84, pages 362-366, Association for Computational Linguistics, June 1984.
[Walt75] David Waltz. Natural language access to a large data base: an engineering approach. In Proc.
of the Fourth International Joint Conference on Artificial Intelligence, pages 868-872, International Joint Conferences on Artificial Intelligence, September 1975.

A MULTILINGUAL INTERFACE TO DATABASES

Hubert Lehmann, Nikolaus Ott, Magdalena Zoeppritz
IBM Germany, Heidelberg Scientific Center

Abstract

This paper describes the User Specialty Languages (USL) System. It is a portable interface to relational databases in restricted English, French, German, Italian, and Spanish. We briefly discuss our design objectives, the theoretical and practical problems we encountered during system realization, and the consequences we have drawn for a successor project. The German and English versions of the USL System have been extensively evaluated with real users and real applications. These evaluations showed us where we could improve our system. They also provided valuable insights for the methods of software ergonomics.

Introduction

When we talk about interaction with databases we must clarify two things:
1. who are the groups of people who want to obtain information, and
2. what are the operations to be performed on the database to yield the information desired?
Then we can think about how these operations are to be specified by a given user. A number of query languages were developed during the 70’s. Efforts to show their “user-friendliness” and their appropriateness for “non-DP experts” have met with greater or lesser success (cf. e.g. [LEHN79] for a survey). A different approach is to regard human question-answering dialog as a model for the interaction with a database, as presumably it is best to talk to the computer in one’s own language.
The problem then is to relate natural language expressions to data in the database and to the operations to be performed on them. In the USL project we showed that
• fragments of natural language can be implemented that are large enough to be usable for database access,
• the syntax and semantics of such fragments can be described in such a way that the system becomes independent of the particular domain of discourse (this property has become known as (trans)portability),
• adaptation to a new domain can be achieved without training in linguistics,
• natural language interfaces can be built which operate on standard databases (i.e., which require neither special representation nor special manipulation of data).

Design principles

The USL System was designed with four objectives: to be usable in realistic applications, to be portable, to enable adaptation to new domains by non-linguists, and to provide an interface to standard databases. A later goal was adaptation to a variety of different languages. This brought in a few new aspects, but was on the whole a relatively straightforward task. These objectives had a number of consequences for the design of the USL system, which we discuss in the following sections.

Consequences of portability

A system is portable if...

A quarterly bulletin of the IEEE computer society technical committee on Database engineering (VOL. 9)

... of common subexpression elimination [GM82], which appears particularly useful when flattening occurs. A simple technique using a hill-climbing method is easy to superimpose on the proposed strategy, but more ambitious techniques provide a topic for future research. Further, an extrapolation of common subexpressions in logic queries can be seen in the following example. Let both goals P(a, b, X) and P(a, Y, c) occur in a query. Then it may be more efficient to compute P(a, Y, X) once and restrict the result for each of the two cases.

Acknowledgments: We are grateful to Shamim Naqvi for inspiring discussions during the development of an earlier version of this paper.

References:
[AU79] Aho, A. and J. Ullman, Universality of Data Retrieval Languages, Proc. POPL Conf., San Antonio, TX, 1979.
[B40] Birkhoff, G., “Lattice Theory”, American Mathematical Society, 1940.
[BMSU86] Bancilhon, F., D. Maier, Y. Sagiv and Ullman, Magic Sets and Other Strange Ways to Implement Logic Programs, Proc. 5th ACM SIGMOD-SIGACT Symposium on Principles of Database Systems, pp. 1-16, 1986.
[BR86] Bancilhon, F., and R. Ramakrishnan, An Amateur’s Introduction to Recursive Query Processing Strategies, Proc. 1986 ACM-SIGMOD Intl. Conf. on Mgt. of Data, pp. 16-52, 1986.
[D82] Daniels, D., et al., “An Introduction to Distributed Query Compilation in R*”, Proc. of Second International Conf. on Distributed Databases, Berlin, Sept. 1982.
[GM82] Grant, J. and Minker, J., On Optimizing the Evaluation of a Set of Expressions, Int. Journal of Computer and Information Science, 11, 3 (1982), 179-189.
[IW87] Ioannidis, Y. E., Wong, E., Query Optimization by Simulated Annealing, SIGMOD 87, San Francisco.
[KBZ86] Krishnamurthy, R., Boral, H., Zaniolo, C., Optimization of Nonrecursive Queries, Proc.
of 12th VLDB, Kyoto, Japan, 1986.
[KRS87] Krishnamurthy, R., Ramakrishnan, R., Shmueli, O., “Testing for Safety and Effective Computability”, Manuscript in Preparation.
[KT81] Kellogg, C., and Travis, L., Reasoning with data in a deductively augmented database system, in Advances in Data Base Theory: Vol. 1, H. Gallaire, J. Minker, and J. Nicolas eds., Plenum Press, New York, 1981, pp. 261-298.
[Ll84] Lloyd, J. W., Foundations of Logic Programming, Springer Verlag, 1984.
[M84] Maier, D., The Theory of Relational Databases (pp. 542-553), Comp. Science Press, 1984.
[Na86] Naish, L., Negation and Control in Prolog, Journal of Logic Programming, to appear.
[Sel79] Selinger, P. G. et al., Access Path Selection in a Relational Database Management System, Proc. 1979 ACM-SIGMOD Intl. Conf. on Mgt. of Data, pp. 23-34, 1979.
[SZ86] Saccà, D. and C. Zaniolo, The Generalized Counting Method for Recursive Logic Queries, Proc. ICDT ’86, Int. Conf. on Database Theory, Rome, Italy, 1986.
[TZ86] Tsur, S. and C. Zaniolo, LDL: A Logic-Based Data Language, Proc. of 12th VLDB, Kyoto, Japan, 1986.
[U85] Ullman, J. D., Implementation of logical query languages for databases, TODS, 10, 3 (1985), 289-321.
[UV85] Ullman, J. D. and A. Van Gelder, Testing Applicability of Top-Down Capture Rules, Stanford Univ. Report STAN-CS-85-146, 1985.
[V86] Villarreal, M., “Evaluation of an O(N**2) Method for Query Optimization”, MS Thesis, Dept. of Computer Science, Univ. of Texas at Austin, Austin, TX.
[Z85] Zaniolo, C., The representation and deductive retrieval of complex objects, Proc. of 11th VLDB, pp. 458-469, 1985.
[Z86] Zaniolo, C., Safety and Compilation of Non-Recursive Horn Clauses, Proc. First Int. Conf. on Expert Database Systems, Charleston, S.C., 1986.

OPTIMIZATION OF COMPLEX DATABASE QUERIES USING JOIN INDICES

Patrick Valduriez
Microelectronics and Computer Technology Corporation
3500 West Balcones Center Drive
Austin, Texas 78759

ABSTRACT

New application areas of database systems require efficient support of complex queries. Such queries typically involve a large number of relations and may be recursive. Therefore, they tend to use the join operator more extensively.
A join index is a simple data structure that can significantly improve the performance of joins when incorporated in the database system storage model. Thus, like any other access method, it should be considered as an alternative join method by the query optimizer. In this paper, we elaborate on the use of join indices for the optimization of both non-recursive and recursive queries. In particular, we show that the incorporation of join indices in the storage model enlarges the solution space searched by the query optimizer and thus offers additional opportunities for increasing performance.

1. Introduction

Relational database technology can well be extended to support new application areas, such as deductive database systems [Gallaire84]. Compared to the traditional applications of relational database systems, these applications require the support of more complex queries. Those queries generally involve a large number of relations and may be recursive. Therefore, the quality of the query optimization module (query optimizer) becomes a key issue for the success of database systems.

The ideal goal of a query optimizer is to select the optimal access plan to the relevant data for an input query. Most of the work on traditional query optimization [Jarke84] has concentrated on select-project-join (SPJ) queries, for they are the most frequent ones in traditional data processing (business) applications. Furthermore, emphasis has been given to the optimization of joins [Ibaraki84] because join remains the most costly operator. When complex queries are considered, the join operator is used even more extensively for both non-recursive queries [Krishnamurthy86] and recursive queries [Valduriez86a].

In [Valduriez87], we proposed a simple data structure, called a join index, that significantly improves the performance of joins. In this paper, we elaborate on the use of join indices in the context of non-recursive and recursive queries. We view a join index as an alternative join method that the query optimizer should consider like any other access method. In general, a query optimizer maps a query expressed on conceptual relations into an access plan, i.e., a low-level program expressed on the
physical schema. The physical schema itself is based on the storage model, the set of data structures available in the database system. The incorporation of join indices in the storage model enlarges the solution space searched by the query optimizer, and thus offers additional opportunities for increasing performance.

Join indices could be used in many different storage models. However, to simplify our discussion of query optimization, we present the integration of join indices in a simple storage model with single-attribute clustering and selection indices. We then illustrate the impact of the storage model with join indices on the optimization of non-recursive queries, assumed to be SPJ queries. In particular, the query optimizer can generate efficient access plans in which the most complex (and costly) part of the query is performed through indices. Finally, we illustrate the use of join indices in the optimization of recursive queries, where a recursive query is mapped into a program of relational algebra enriched with a transitive closure operator.

2. Storage Model with Join Indices

The storage model prescribes the storage structures and related algorithms that are supported by the database system to map the conceptual schema into the physical schema. In a relational system implemented on a disk-based architecture, conceptual relations can be mapped into base relations on the basis of two functions, partitioning and replicating. All the tuples of a base relation are clustered based on the value of one attribute. We assume that each conceptual tuple is assigned a surrogate for tuple identity, called a TID (tuple identifier). A TID is a value unique for all tuples of a relation. It is created by the system when a tuple is instantiated. TIDs permit efficient updates and reorganizations of base relations, since references do not involve physical pointers. The partitioning function maps a relation into one or more base relations. Each base relation consists of a TID together with one attribute, several attributes, or all of the conceptual relation’s attributes.
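The TID and partitioning ideas can be sketched in a few lines. The sketch assumes that a conceptual relation is a list of Python dicts and that TIDs are consecutive integers. Sorting on TID stands in for clustering. The attribute groups C_NAME_CITY and C_AGE_JOB, and the CUSTOMER rows, are hypothetical choices for illustration. Because every base relation carries the TID, the conceptual tuples can be rebuilt by joining the fragments on TID. The replicating function is not shown.

```python
from itertools import count
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]

def assign_tids(relation: Sequence[Row]) -> List[Tuple[int, Row]]:
    """Attach a system-generated surrogate (TID) to every conceptual tuple."""
    tids = count(1)
    return [(next(tids), row) for row in relation]

def partition(tuples: Sequence[Tuple[int, Row]],
              groups: Dict[str, Tuple[str, ...]]) -> Dict[str, List[tuple]]:
    """Map one conceptual relation into several base relations, each holding
    the TID plus one attribute group; sorting on TID stands in for clustering."""
    return {
        name: sorted(((tid, *(row[a] for a in attrs)) for tid, row in tuples),
                     key=lambda t: t[0])
        for name, attrs in groups.items()
    }

def reconstruct(base: Dict[str, List[tuple]],
                groups: Dict[str, Tuple[str, ...]]) -> Dict[int, Row]:
    """Rebuild conceptual tuples by joining the base relations on TID."""
    rows: Dict[int, Row] = {}
    for name, attrs in groups.items():
        for tid, *values in base[name]:
            rows.setdefault(tid, {}).update(zip(attrs, values))
    return rows

# Hypothetical data and attribute groups.
CUSTOMER = [
    {"cname": "Smith", "city": "Paris", "age": 40, "job": "clerk"},
    {"cname": "Jones", "city": "Austin", "age": 35, "job": "engineer"},
]
GROUPS = {"C_NAME_CITY": ("cname", "city"), "C_AGE_JOB": ("age", "job")}
BASE = partition(assign_tids(CUSTOMER), GROUPS)
```

A query that projects only cname and city reads C_NAME_CITY alone. That is the projection benefit of grouping attributes with high affinity.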
The rationale for a partitioning function is the optimization of projection, by storing together attributes with high affinity, i.e., attributes frequently accessed together. The replicating function replicates one or more attributes associated with the TID of the relation into one or more base relations. The primary use of replicated attributes is for optimizing selections based on those attributes. Another use is the increased reliability provided by those additional data copies. In this paper, we assume a simple storage model ... ) clustered on TID. Clustering is based on a hashed or tree-structured organization. A selection index on attribute A of relation R is a base relation F(A, TID) clustered on A.

Let R1 and R2 be two relations, not necessarily distinct, and let TID1 and TID2 be identifiers of tuples of R1 and R2, respectively. A join index on relations R1 and R2 is a relation of couples (TID1, TID2), where each couple indicates two tuples matching a join predicate. Intuitively, a join index is an abstraction of the join of two relations. A join index can be implemented by two base relations F(TID1, TID2), one clustered on TID1 and the other on TID2. Join indices are uniquely designed to optimize joins. The join predicate associated with a join index may be quite general and include several attributes of both relations. Furthermore, more than one join index can be defined between any two relations. The identification of various join indices between two relations is based on the associated join predicate. Thus, the join of relations R1 and R2 on the predicate (R1.A = R2.A and R1.B = R2.B) can be captured in two ways: as a single join index on the multi-attribute join predicate, or as two join indices, one on (R1.A = R2.A) and the other on (R1.B = R2.B).
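A selection index and a join index, as defined above, might be sketched as follows. The sketch assumes relations are given as (TID, tuple) pairs and that the join predicate is an arbitrary Python function. Dictionaries of lists model base relations clustered on their first attribute. The naive nested-loop construction is for brevity only; the text does not prescribe how join indices are built or maintained. The data is invented. The names City_IND, CID_JI and OID_JI follow the physical schema used in this section.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
Relation = Sequence[Tuple[int, Row]]  # (TID, tuple) pairs

def selection_index(relation: Relation, attr: str) -> Dict[object, List[int]]:
    """F(A, TID) clustered on A, modelled as attribute value -> TIDs."""
    index: Dict[object, List[int]] = defaultdict(list)
    for tid, row in relation:
        index[row[attr]].append(tid)
    return dict(index)

def join_index(r1: Relation, r2: Relation,
               predicate: Callable[[Row, Row], bool]) -> List[Tuple[int, int]]:
    """All couples (TID1, TID2) whose tuples satisfy the join predicate."""
    return [(t1, t2) for t1, a in r1 for t2, b in r2 if predicate(a, b)]

def clustered_copies(ji: Sequence[Tuple[int, int]]):
    """The two base relations implementing a join index: one clustered on
    TID1, the other on TID2."""
    on_tid1: Dict[int, List[int]] = defaultdict(list)
    on_tid2: Dict[int, List[int]] = defaultdict(list)
    for t1, t2 in ji:
        on_tid1[t1].append(t2)
        on_tid2[t2].append(t1)
    return dict(on_tid1), dict(on_tid2)

# Invented data: CUSTOMER and ORDER joined on cname.
CUSTOMER = [(1, {"cname": "Smith", "city": "Paris"}),
            (2, {"cname": "Jones", "city": "Austin"})]
ORDER = [(10, {"cname": "Smith", "pname": "bolt"}),
         (11, {"cname": "Jones", "pname": "nut"}),
         (12, {"cname": "Smith", "pname": "gear"})]

City_IND = selection_index(CUSTOMER, "city")
JI = join_index(CUSTOMER, ORDER, lambda c, o: c["cname"] == o["cname"])
CID_JI, OID_JI = clustered_copies(JI)
```

Keeping both clustered copies lets the join be driven from either side: from customer TIDs through CID_JI, or from order TIDs through OID_JI.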
The choice between these alternatives (a single join index on the multi-attribute predicate, or two single-attribute join indices) is a database design decision based on join frequencies, update overhead, etc.

Let us consider the following relational database schema (key attributes are bold):

CUSTOMER(cname, city, age, job)
ORDER(cname, pname, qty, date)
PART(pname, weight, price, spname)

A (partial) physical schema for this database, based on the storage model described above, is (clustered attributes are bold):

C_PC(CID, cname, city, age, job)
City_IND(city, CID)
Age_IND(age, CID)
O_PC(OID, cname, pname, qty, date)
Cname_IND(cname, OID)
CID_JI(CID, OID)
OID_JI(OID, CID)

C_PC and O_PC are primary copies of the CUSTOMER and ORDER relations. City_IND and Age_IND are selection indices on CUSTOMER. Cname_IND is a selection index on ORDER. CID_JI and OID_JI are join indices between CUSTOMER and ORDER for the join predicate (CUSTOMER.cname = ORDER.cname).

3. Optimization of Non-Recursive Queries

The objective of query optimization is to select an access plan for an input query that optimizes a given cost function. This cost function typically refers to machine resources such as disk accesses, CPU time, and possibly communication time (for a distributed database system). The query optimizer is in charge of several decisions: the ordering of database operations, the access paths to the data, the algorithms for performing database operations, and the intermediate relations to be materialized. These decisions are made based on the physical database schema and related statistics. A set of decisions that leads to an execution plan can be captured by a processing tree [Krishnamurthy86]. A processing tree (PT) is a tree in which a leaf is a base relation and a non-leaf node is an intermediate relation materialized by applying an internal database operation. Internal database operations efficiently implement relational algebra operations using specific access paths and algorithms. Examples of internal database operations are exact-match select, sort-merge join, n-ary pipelined join, semi-join, etc. The application of algebraic transformation rules [Jarke84] permits generation of many candidate PTs for a single query.
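The partial physical schema can be exercised on toy data (invented here) to show how join indices enlarge the optimizer's options. Take a query such as "products and quantities ordered by customers in a given city." One candidate plan answers both the selection and the join from City_IND and CID_JI. It touches O_PC only for qualifying orders. A second function computes the same answer by scanning the primary copies, for comparison. Dictionaries stand in for clustered base relations. The plan shown is an illustration, not the paper's JOINJI algorithm.

```python
from typing import Dict, List, Tuple

# Toy instances of part of the physical schema; contents are invented.
C_PC: Dict[int, Dict[str, object]] = {  # primary copy of CUSTOMER, by CID
    1: {"cname": "Smith", "city": "Paris", "age": 40, "job": "clerk"},
    2: {"cname": "Jones", "city": "Austin", "age": 35, "job": "engineer"},
}
O_PC: Dict[int, Dict[str, object]] = {  # primary copy of ORDER, by OID
    10: {"cname": "Smith", "pname": "bolt", "qty": 100, "date": "1987-01-05"},
    11: {"cname": "Jones", "pname": "nut", "qty": 50, "date": "1987-01-06"},
    12: {"cname": "Smith", "pname": "gear", "qty": 7, "date": "1987-02-01"},
}
City_IND: Dict[str, List[int]] = {"Paris": [1], "Austin": [2]}  # city -> CIDs
CID_JI: Dict[int, List[int]] = {1: [10, 12], 2: [11]}           # CID -> OIDs

def orders_in_city(city: str) -> List[Tuple[object, object]]:
    """Index plan: selection via City_IND, join via CID_JI, then fetch
    only the qualifying ORDER tuples from O_PC."""
    cids = City_IND.get(city, [])
    oids = [oid for cid in cids for oid in CID_JI.get(cid, [])]
    return [(O_PC[oid]["pname"], O_PC[oid]["qty"]) for oid in oids]

def orders_in_city_naive(city: str) -> List[Tuple[object, object]]:
    """Same query by scanning both primary copies."""
    names = {c["cname"] for c in C_PC.values() if c["city"] == city}
    return [(o["pname"], o["qty"]) for o in O_PC.values() if o["cname"] in names]
```

As a processing tree, the index plan is a selection on City_IND feeding an index-based join through CID_JI, whose output drives fetches from O_PC. The cost-based optimizer would pick it only when it is cheaper than the scan-based alternative.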
The optimization problem can be formulated as finding the PT of minimal cost among all equivalent PTs. Traditional query optimization algorithms [Selinger79] perform an exhaustive search of the solution space, defined as the set of all equivalent PTs, for a given query. The estimation of the cost of a PT is obtained by computing the sum of the costs of the individual internal database operations in the PT. The cost of an internal operation is itself a monotonic function of the operand cardinalities. If the operand relations are intermediate relations, then their cardinalities must also be estimated. Therefore, for each operation in the PT, two numbers must be predicted: (1) the individual cost of the operation and (2) the cardinality of its result based on the selectivity of the conditions [Selinger79, Piatetsky84].

The possible PTs for executing an SPJ query are essentially generated by permutation of the join ordering. With n relations, there are n! possible permutations. The complexity of exhaustive search is therefore prohibitive when n is large (e.g., n > 10). The use of dynamic programming and heuristics, as in [Selinger79], reduces this complexity to 2^n, which is still significant. To handle complex queries involving a large number of relations, the optimization algorithm must be more efficient. The complexity of the optimization algorithm can be further reduced in three ways: by imposing restrictions on the class of PTs [Ibaraki84], by limiting the generality of the cost function [Krishnamurthy86], or by using a probabilistic hill-climbing algorithm [Ioannidis87]. Assuming that the solution space is searched by an efficient algorithm, we now illustrate the possible PTs that can be produced based on the storage model with join indices.

The addition of join indices in the storage model enlarges the solution space for optimization. The query optimizer should consider join indices like any other join method, and use them only when they lead to the optimal PT. In [Valduriez87], we give a precise specification of the join algorithm using a join index, denoted by JOINJI, and its cost. This algorithm takes as input two base relations R1(TID1, A1, B1, ...
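The contrast between searching all n! join orders and dynamic programming over subsets, in the spirit of [Selinger79], can be sketched as follows. This is a toy model under stated assumptions:
• only left-deep processing trees are considered;
• base-relation scans cost nothing;
• each join costs |left| + |right|, a monotonic function of the operand cardinalities, as the text requires;
• result cardinalities are estimated from pairwise selectivities.

The relation sizes and selectivities in CARD and SEL are invented. Real optimizers also track access paths and interesting orders, which are omitted here.

```python
from itertools import combinations, permutations
from math import prod
from typing import Dict, FrozenSet, Sequence, Tuple

Card = Dict[str, float]            # base-relation cardinalities
Sel = Dict[FrozenSet[str], float]  # join selectivity per pair (default 1.0)

def join_step(left_card: float, left_rels: Sequence[str], r: str,
              card: Card, sel: Sel) -> Tuple[float, float]:
    """(cost, result cardinality) of joining relation r to an intermediate
    result; toy cost model: |left| + |r|."""
    f = prod(sel.get(frozenset((r, q)), 1.0) for q in left_rels)
    return left_card + card[r], left_card * card[r] * f

def plan_cost(order: Sequence[str], card: Card, sel: Sel) -> float:
    """Sum of operation costs of a left-deep processing tree."""
    total, cur = 0.0, card[order[0]]
    for k in range(1, len(order)):
        op_cost, cur = join_step(cur, order[:k], order[k], card, sel)
        total += op_cost
    return total

def exhaustive(card: Card, sel: Sel) -> Tuple[float, Tuple[str, ...]]:
    """Try all n! join orders."""
    return min((plan_cost(p, card, sel), p) for p in permutations(card))

def dynamic_programming(card: Card, sel: Sel) -> Tuple[float, Tuple[str, ...]]:
    """Keep only the cheapest plan per subset of relations (2^n subsets)."""
    rels = list(card)
    best = {frozenset([r]): (0.0, (r,), float(card[r])) for r in rels}
    for size in range(2, len(rels) + 1):
        for subset in combinations(rels, size):
            s = frozenset(subset)
            for r in subset:  # r is the relation joined last
                cost_rest, order_rest, card_rest = best[s - {r}]
                op_cost, out_card = join_step(card_rest, order_rest, r, card, sel)
                cost = cost_rest + op_cost
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, order_rest + (r,), out_card)
    cost, order, _ = best[frozenset(rels)]
    return cost, order

# Invented statistics for the CUSTOMER/ORDER/PART schema.
CARD: Card = {"CUSTOMER": 1000, "ORDER": 10000, "PART": 500}
SEL: Sel = {frozenset(("CUSTOMER", "ORDER")): 0.001,
            frozenset(("ORDER", "PART")): 0.002}
```

Under this cost model, a plan's cost splits into the cost of its prefix plus a term that depends only on the prefix set and the last relation. That is why keeping one best plan per subset loses nothing compared with the exhaustive search. A join-index method would enter such a search as one more candidate `join_step` with its own cost.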
)clustered on TID.Clusteringisbased on a hashedortreestructuredorganization. A selectionindex on attribute A of relationRis a baserelationF (A, TID)clustered on A. LetR1andR2betworelations,notnecessarilydistinct,andletTID1andTID2beidentifiers of tuples of R1and A2 ,respectively. A joinindex on relationsR1and A2 is a relation of couples(TID1,TID2),whereeachcoupleindicatestwotuplesmatching a joinpredicate.Intuitively, a joinindexisanabstraction of the join of tworelations. A joinindexcanbeimplementedbytwobaserelationsF(TID1,TID2),oneclustered on TID1and the other on TID2.Joinindicesareuniquelydesignedtooptimizejoins. The joinpredicateassociatedwith a joinindexmaybequitegeneralandincludeseveralattributes of bothrelations.Furthermore,morethanonejoinindexcanbedefinedbetweenanytworelations. The identification of variousjoinindicesbetweentworelationsisbased on the associatedjoinpredicate.Thus, the join of relations A1 andR2 on the predicate(R1 .A =R2 .A andR1.B=R2.B)canbecapturedaseither a singlejoinindex, on the multi—attributejoinpredicate,ortwojoinindices,one on (R1 .A =R2 .A) and the other on (R1.BR2.B). 
The choicebetween the alternativesis a database designdecisionbased on joinfrequencies,updateoverhead,etc.Letusconsider the followingrelational database schema(keyattributesarebold):11CUSTOMER(cname,city,age,job)ORDER(cname,pname,qty,date)PART(pname,weight,price,spname) A (partial)physicalschemaforthis database, based on the storagemodeldescribedabove,is(clusteredattributesarebold)C_PC(CID,cname,city,age,job)City_IND(city,CID)Age_IND(age,CID)0_PC(OlD,cname,pname,qty,date)CnamelND(cname,OlD)CIDJI(CID,OlD)OID_Jl(OlD,CID)C_PCand0_PCareprimarycopies of CUSTOMERandORDERrelations.City_INDandAge_INDareselectionindices on CUSTOMER.CnamelNDis a selectionindex on ORDER.CIDJIandOlDJIarejoinindicesbetweenCUSTOMERandORDERfor the joinpredicate(CUSTOMER.Cname=ORDER.Cname).3.Optimization of Non—RecursiveQueries- The objective of queryoptimizationistoselectanaccessplanforaninputquerythatoptimizes a givencostfunction.Thiscostfunctiontypicallyreferstomachineresourcessuchasdiskaccesses,CPUtime,andpossiblycommunicationtime(for a distributed database system). The queryoptimizerisincharge of decisionsregarding the ordering of database operations,and the choice of the accesspathsto the data, the algorithmsforperforming database operations,and the intermediaterelationstobematerialized.Thesedecisionsareundertakenbased on the physical database schemaandrelatedstatistics. A set of decisionsthatleadtoanexecutionplancanbecapturedby a processingtreeKrishnamurthy86]. A processingtree(PT)is a treeinwhich a leafis a baserelationand a non—leafnodeisanintermediaterelationmaterializedbyapplyinganinternal database operation.Internaldatabaseoperationsimplementefficientlyrelationalalgebraoperationsusingspecificaccesspathsandalgorithms.Examples of internal database operationsareexact—matchselect,sort—mergejoin,n—arypipelinedjoin,semi—join,etc. The application of algebraictransformationrulesJarke84]permitsgeneration of manycandidatePT’sfor a singlequery. 
The optimization problem can be formulated as finding the PT of minimal cost among all equivalent PT's. Traditional query optimization algorithms [Selinger79] perform an exhaustive search of the solution space, defined as the set of all equivalent PT's for a given query. The cost of a PT is estimated as the sum of the costs of the individual internal database operations in the PT. The cost of an internal operation is itself a monotonic function of the operand cardinalities. If the operand relations are intermediate relations, their cardinalities must also be estimated. Therefore, for each operation in the PT, two numbers must be predicted: (1) the individual cost of the operation and (2) the cardinality of its result, based on the selectivity of the conditions [Selinger79, Piatetsky84].

The possible PT's for executing an SPJ (select-project-join) query are essentially generated by permuting the join ordering. With n relations, there are n! possible permutations, so the complexity of exhaustive search is prohibitive when n is large (e.g., n > 10). The use of dynamic programming and heuristics, as in [Selinger79], reduces this complexity to 2^n, which is still significant. To handle complex queries involving a large number of relations, the optimization algorithm must be more efficient. Its complexity can be further reduced by imposing restrictions on the class of PT's [Ibaraki84], limiting the generality of the cost function [Krishnamurthy86], or using a probabilistic hill-climbing algorithm [Ioannidis87].

Assuming that the solution space is searched by an efficient algorithm, we now illustrate the possible PT's that can be produced based on the storage model with join indices. The addition of join indices to the storage model enlarges the solution space for optimization. The query optimizer should consider join indices like any other join method and use them only when they lead to the optimal PT. In [Valduriez87], we give a precise specification of the join algorithm using a join index, denoted JOINJI, and its cost. This algorithm takes as input two base relations R1(TID1, A1, B1, ...
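The following minimal sketch illustrates dynamic-programming join ordering in the spirit of [Selinger79], restricted to left-deep PT's. The best plan for each subset of relations is built from the best plan for that subset minus one relation. As a result, on the order of n·2^(n-1) (subset, last relation) pairs are examined instead of n! orderings. Result cardinalities are estimated as the product of base cardinalities and predicate selectivities, which assumes independent predicates. The per-join cost |left| + |right| is an assumed monotonic function of the operand cardinalities, not the paper's cost model. The cardinalities and selectivities for the CUSTOMER, ORDER, and PART example are made up. The sketch also considers cross products, which practical optimizers usually try to avoid.

```python
from fractions import Fraction
from itertools import combinations
from math import prod
from typing import Dict, FrozenSet, Tuple

Selectivities = Dict[Tuple[str, str], Fraction]
Plan = Tuple[Fraction, Tuple[str, ...]]


def card(subset: FrozenSet[str], base_card: Dict[str, int], sel: Selectivities) -> Fraction:
    """Estimated cardinality of the join of `subset`: product of the base
    cardinalities times the selectivity of every join predicate inside it."""
    c = Fraction(prod(base_card[r] for r in subset))
    for (a, b), s in sel.items():
        if a in subset and b in subset:
            c *= s
    return c


def best_left_deep_plan(base_card: Dict[str, int], sel: Selectivities) -> Plan:
    rels = sorted(base_card)
    # best[S] = (cost, join order) of the cheapest left-deep PT over exactly the
    # relations in S; a leaf (base relation) costs nothing by itself.
    best: Dict[FrozenSet[str], Plan] = {frozenset([r]): (Fraction(0), (r,)) for r in rels}
    for size in range(2, len(rels) + 1):
        for combo in combinations(rels, size):
            s = frozenset(combo)
            for r in combo:  # r is the relation joined last
                rest = s - {r}
                sub_cost, sub_order = best[rest]
                # Assumed join cost: |rest| + |r|, a sum of operand cardinalities.
                cost = sub_cost + card(rest, base_card, sel) + base_card[r]
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, sub_order + (r,))
    return best[frozenset(rels)]


# Hypothetical statistics for the example schema (joins on cname and pname).
base_card = {"CUSTOMER": 1000, "ORDER": 10000, "PART": 500}
sel = {("CUSTOMER", "ORDER"): Fraction(1, 1000), ("ORDER", "PART"): Fraction(1, 1000)}

cost, order = best_left_deep_plan(base_card, sel)
print(cost, order)  # 16500 ('PART', 'ORDER', 'CUSTOMER')
```

With these numbers the sketch joins PART with ORDER first and CUSTOMER last. Starting with CUSTOMER and PART would materialize an estimated 500000-tuple cross product.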
