... the definitions of the complex object types. We will present, in particular, the techniques used in the ESPRIT project MULTOS. In this project, a data server has been implemented in which data objects are constituted by multimedia documents with complex internal structures.

1. Introduction

Many applications, such as office information systems (OIS), particularly filing and retrieval of multimedia documents [IEEE84], computer-aided design and manufacturing (CAD/CAM), and artificial intelligence (AI) in general and knowledge-based expert systems in particular, need to deal with a large number of data objects having complex structures. In such application areas, the data management system has to cope with the large volumes of data and to manage the complexity of the structures of these data objects [BANE87].

An important characteristic of many of these new applications is that there is a much lower ratio of instances per type than in traditional database applications. Consequently, a large number of objects implies a large number of object types. The result is often a very large schema, on which it becomes difficult for the users to specify queries. The data management system must be able to process queries containing both conditions on the schema (i.e. partial conditions on type structures of the complex data objects to be selected) and on the data objects (i.e. conditions on the values of the basic components contained in the complex data objects).

In this paper we will focus on a particular phase in query processing on a database of complex objects. In this phase the query is analyzed, completed, and transformed based on the information contained in the definitions of the complex object types. We call this phase Type-Level Query Processing. With this phase, the system realizes a two-fold functionality:

• The system does not force the user to specify exactly the structures (i.e.
the types) of the complex data objects to select. On the contrary, it allows the user to specify only partial structures of these complex objects, which makes queries on content much more flexible. In fact, the user can specify the type of only a few components of the complex objects (and give conditions on their values), without specifying the complete type of the complex objects.

• The system exploits the complex structures of the data objects, described according to a high-level model, for query transformations which simplify the rest of the query processing. During Type-Level processing, some transformations allow pruning of the query, so that the resulting query contains fewer predicates to evaluate. In other words, for a given query, the Type-Level processor checks whether there are conjuncts or disjuncts in the query that are always true for instances of the object types referenced in the query. In certain cases, during this phase, it may also be deduced that the query is empty without having to access the data.

In ... ) clustered on TID. Clustering is based on a hashed or tree structured organization. A selection index on attribute A of relation R is a base relation F(A, TID) clustered on A.

Let R1 and R2 be two relations, not necessarily distinct, and let TID1 and TID2 be identifiers of tuples of R1 and R2, respectively. A join index on relations R1 and R2 is a relation of couples (TID1, TID2), where each couple indicates two tuples matching a join predicate. Intuitively, a join index is an abstraction of the join of two relations. A join index can be implemented by two base relations F(TID1, TID2), one clustered on TID1 and the other on TID2. Join indices are designed solely to optimize joins. The join predicate associated with a join index may be quite general and include several attributes of both relations. Furthermore, more than one join index can be defined between any two relations.
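The definition of a join index as a relation of (TID1, TID2) couples, kept in two copies clustered on each identifier, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `build_join_index`, `cluster`, and `join_via_index` are hypothetical, relations are modeled as in-memory dictionaries from TID to tuple, clustering is imitated by grouping on one column, and the sample tuples are invented.

```python
from collections import defaultdict


def build_join_index(r1, r2, predicate):
    """Return the join index as a list of (tid1, tid2) couples.

    r1 and r2 map a tuple identifier (TID) to a tuple, modeled as a dict.
    A couple is kept when the two tuples satisfy the join predicate.
    """
    return [
        (tid1, tid2)
        for tid1, row1 in r1.items()
        for tid2, row2 in r2.items()
        if predicate(row1, row2)
    ]


def cluster(pairs, position):
    """Group the couples on one TID column to imitate a clustered copy."""
    clustered = defaultdict(list)
    for pair in pairs:
        clustered[pair[position]].append(pair[1 - position])
    return dict(clustered)


def join_via_index(r1, r2, pairs):
    """Materialize the join by following the couples instead of comparing tuples."""
    return [(r1[tid1], r2[tid2]) for tid1, tid2 in pairs]


# Invented sample relations: TID -> tuple, joined on attribute A.
r1 = {1: {"A": "x"}, 2: {"A": "y"}}
r2 = {10: {"A": "x"}, 11: {"A": "x"}, 12: {"A": "y"}}

ji = build_join_index(r1, r2, lambda t1, t2: t1["A"] == t2["A"])
by_tid1 = cluster(ji, 0)  # copy clustered on TID1
by_tid2 = cluster(ji, 1)  # copy clustered on TID2
joined = join_via_index(r1, r2, ji)
```

Keeping both clustered copies lets the join be driven from either relation: `by_tid1` answers "which R2 tuples match this R1 tuple" and `by_tid2` answers the converse, at the price of maintaining two structures on update.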
The identification of various join indices between two relations is based on the associated join predicate. Thus, the join of relations R1 and R2 on the predicate (R1.A = R2.A and R1.B = R2.B) can be captured as either a single join index, on the multi-attribute join predicate, or two join indices, one on (R1.A = R2.A) and the other on (R1.B = R2.B). The choice between the alternatives is a database design decision based on join frequencies, update overhead, etc.

Let us consider the following relational database schema (key attributes are bold):

CUSTOMER(cname, city, age, job)
ORDER(cname, pname, qty, date)
PART(pname, weight, price, spname)

A (partial) physical schema for this database, based on the storage model described above, is (clustered attributes are bold)

C_PC(CID, cname, city, age, job)
City_IND(city, CID)
Age_IND(age, CID)
O_PC(OID, cname, pname, qty, date)
Cname_IND(cname, OID)
CID_JI(CID, OID)
OID_JI(OID, CID)

C_PC and O_PC are primary copies of CUSTOMER and ORDER relations. City_IND and Age_IND are selection indices on CUSTOMER. Cname_IND is a selection index on ORDER. CID_JI and OID_JI are join indices between CUSTOMER and ORDER for the join predicate (CUSTOMER.cname = ORDER.cname).

3. Optimization of Non-Recursive Queries

The objective of query optimization is to select an access plan for an input query that optimizes a given cost function. This cost function typically refers to machine resources such as disk accesses, CPU time, and possibly communication time (for a distributed database system). The query optimizer is in charge of decisions regarding the ordering of database operations, and the choice of the access paths to the data, the algorithms for performing database operations, and the intermediate relations to be materialized. These decisions are undertaken based on the physical database schema and related statistics. A set of decisions that lead to an execution plan can be captured by a processing tree [Krishnamurthy86].
A processing tree (PT) is a tree in which a leaf is a base relation and a non-leaf node is an intermediate relation materialized by applying an internal database operation. Internal database operations efficiently implement relational algebra operations using specific access paths and algorithms. Examples of internal database operations are exact-match select, sort-merge join, n-ary pipelined join, semi-join, etc. The application of algebraic transformation rules [Jarke84] permits generation of many candidate PT's for a single query. The optimization problem can be formulated as finding the PT of minimal cost among all equivalent PT's.

Traditional query optimization algorithms [Selinger79] perform an exhaustive search of the solution space, defined as the set of all equivalent PT's, for a given query. The estimation of the cost of a PT is obtained by computing the sum of the costs of the individual internal database operations in the PT. The cost of an internal operation is itself a monotonic function of the operand cardinalities. If the operand relations are intermediate relations then their cardinalities must also be estimated. Therefore, for each operation in the PT, two numbers must be predicted: (1) the individual cost of the operation and (2) the cardinality of its result based on the selectivity of the conditions [Selinger79, Piatetsky84].

The possible PT's for executing an SPJ query are essentially generated by permutation of the join ordering. With n relations, there are n! possible permutations. The complexity of exhaustive search is therefore prohibitive when n is large (e.g., n > 10). The use of dynamic programming and heuristics, as in [Selinger79], reduces this complexity to 2^n, which is still significant. To handle the case of complex queries involving a large number of relations, the optimization algorithm must be more efficient.
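The contrast between enumerating all n! join orderings and a dynamic programming search over subsets of relations can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the algorithm of [Selinger79]: it considers left-deep trees only, the cost of a join is assumed to be the sum of its operand cardinalities (one possible monotonic function), result cardinalities are estimated as the product of operand cardinalities and pairwise selectivities, and all names and statistics are hypothetical.

```python
from itertools import combinations, permutations


def cardinality(rels, card, sel):
    """Estimated size of the join of the relations in rels (order independent)."""
    names = sorted(rels)
    size = 1.0
    for r in names:
        size *= card[r]
    for a, b in combinations(names, 2):
        size *= sel.get((a, b), 1.0)  # no predicate: cross product
    return size


def plan_cost(order, card, sel):
    """Cost of a left-deep PT: the sum of the costs of its individual joins."""
    total = 0.0
    for i in range(1, len(order)):
        left = cardinality(order[:i], card, sel)
        total += left + card[order[i]]  # assumed monotonic cost of one join
    return total


def exhaustive(card, sel):
    """Examine all n! join orderings and keep the cheapest one."""
    return min(permutations(card), key=lambda o: plan_cost(o, card, sel))


def dynamic_programming(card, sel):
    """Keep one best plan per subset of relations (about 2^n subsets)."""
    best = {frozenset([r]): (0.0, (r,)) for r in card}
    for k in range(2, len(card) + 1):
        for subset in combinations(card, k):
            s = frozenset(subset)
            candidates = []
            for r in subset:  # r is the last relation joined
                rest = s - {r}
                cost, order = best[rest]
                step = cardinality(rest, card, sel) + card[r]
                candidates.append((cost + step, order + (r,)))
            best[s] = min(candidates)
    return best[frozenset(card)]


# Hypothetical statistics; selectivity keys are alphabetically ordered pairs.
card = {"CUSTOMER": 1000, "ORDER": 10000, "PART": 100}
sel = {("CUSTOMER", "ORDER"): 0.001, ("ORDER", "PART"): 0.01}

dp_cost, dp_order = dynamic_programming(card, sel)
best_perm = exhaustive(card, sel)
```

Under this cost model the cost of the last join depends only on the set of relations already joined, not on the order in which they were joined, which is what allows one best plan per subset to be retained. Both searches therefore reach the same minimal cost, although they may return different orderings when several plans tie.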
The complexity of the optimization algorithm can be further reduced by imposing restrictions on the class of PT's [Ibaraki84], limiting the generality of the cost function [Krishnamurthy86], or using a probabilistic hill-climbing algorithm [Ioannidis87].

Assuming that the solution space is searched by an efficient algorithm, we now illustrate the possible PT's that can be produced based on the storage model with join indices. The addition of join indices in the storage model enlarges the solution space for optimization. Join indices should be considered by the query optimizer as any other join method, and used only when they lead to the optimal PT. In [Valduriez87], we give a precise specification of the join algorithm using join index, denoted by JOINJI, and its cost. This algorithm takes as input two base relations R1(TID1, A1, B1, ... the designer to choose from a variety of access methods and implementations of access methods, data placement strategies and implementations of data placement strategies to be defined for a file. This set of access methods and placement strategies is extensible. We are currently testing the system. The filing system will be used for low-level support of the archival component of the server.
The query processing strategies that will perform best in the performance studies outlined in this paper will be incorporated in the system.

References

[Christodoulakis84] S. Christodoulakis: “Implications of Assumptions in Database Performance Evaluation”, ACM TODS, June 1984.
[Christodoulakis87a] S. Christodoulakis: “Analysis of Retrieval Performance for Records and Objects Using Optical Disk Technology”, ACM TODS, June 1987.
[Christodoulakis87b] S. Christodoulakis: “Analysis and Fundamental Performance Tradeoffs for CLV Optical Disks”, Technical Report, Department of Computer Science, University of Waterloo, 1987.
[Christodoulakis and Velissaropoulos87] S. Christodoulakis and T. Velissaropoulos: “Issues in the Design of a Distributed Testbed for MINOS”, Transactions on Management Information Systems, 1987.
[Christodoulakis and Ng87] S. Christodoulakis and R. Ng: “Query Processing in a Multimedia Retrieval Environment”, in preparation, 1987.
[Christodoulakis et al.87] S. Christodoulakis, E. Ledoux, R. Ng: “An Optical Disk Based Object Filing System”, Technical Report, Department of Computer Science, University of Waterloo, 1987.
[Christodoulakis and Faloutsos84] S. Christodoulakis and C. Faloutsos: “Performance Analysis of a Message File Server”, IEEE Transactions on Software Engineering, March 1984.
[Faloutsos and Christodoulakis87] C. Faloutsos and S. Christodoulakis: “Analysis of Retrieval Performance of Signature Access Methods”, ACM TOOIS, 1987.
[Haskin81] L. A.
Haskin: “Special Purpose Processors for Text Retrieval”, Database Engineering 4, 1, Sept 1981, 16-29.

Query Processing Based on Complex Object Types

Elisa Bertino, Fausto Rabitti
Istituto di Elaborazione della Informazione
Consiglio Nazionale delle Ricerche
Via S. Maria 46, Pisa (Italy)

ABSTRACT

In application areas where the data management system has to deal with a large number of complex data objects with a wide variety of types, the system must be able to process queries containing both conditions on the schema of the data objects and on the values of the data objects. In this paper we will focus on a particular phase in query processing on a database of complex objects called Type-Level Query Processing. In this phase, the query is analyzed, completed, and transformed on the basis of the...