Document: Database and XML Technologies - P7 ppt


S. Flesca et al.

– assign a new, different value to one of the two isbn attributes, so that no two books have the same isbn.

Note that the document can be made consistent by replacing one of the two values "0-451-16194-7" with any value in the domain, apart from those introducing inconsistencies. To this end we shall use the unknown value ⊥ to replace inconsistent data. Moreover, when inconsistencies cannot be repaired by assigning different values to attributes or changing some element content, we consider an alternative strategy which uses a boolean function specifying the reliability of elements.

Generally, more than one strategy can be used to repair a document, thus generating several repaired documents. Concerning the issue of querying an XML document with functional dependencies, we shall consider as certain only the information contained in all possible repaired documents. The violation of a functional dependency suggests a set of possible update operations that restore its satisfaction, yielding a consistent scenario of the information. In repairing documents we prefer the repairs performing minimal sets of changes to the original document, in the same way as the well-known approaches proposed for relational database repairing.

Example 2. Consider the XML document of the previous example where the element title in the first book is missing. In this case, the update action consisting in assigning the value Principles of Database and Knowledge-Base Systems to the title of the first book is reliable.

Consider again the XML document of the previous example with the functional dependency bib.book.@isbn → bib.book, stating that two books having the same isbn coincide. In this case we could consider two repairs which make the isbn value unreliable, and two repairs which make the (node) book unreliable.
However, as the unreliability of a book implies the unreliability of all its (sub-)elements, we consider as feasible only the two repairs updating the isbn value. ✷

Repairs and Consistent Answers for XML Data

2 Preliminaries

XML Trees and DTDs

A tree $T$ is a tuple $(r_T, N_T, E_T, \lambda_T)$, where $N_T \subseteq \mathbb{N}$ is the set of nodes, $\lambda_T : N_T \to \Sigma$ is a node labelling function, $r_T \in N_T$ is the distinguished root of $T$, and $E_T \subseteq N_T \times N_T$ is an (acyclic) set of edges such that starting from any node $n_i \in N_T$ it is possible to reach any other node $n_j \in N_T$, walking through a sequence of edges $e_1, \ldots, e_k$. The set of leaf nodes of a tree $T$ will be denoted as $Leaves(T)$.

Given a tree $T = (r_T, N_T, E_T, \lambda_T)$, we say that a tree $T' = (r_{T'}, N_{T'}, E_{T'}, \lambda_{T'})$ is a subtree of $T$ if the following conditions hold:

1. $N_{T'} \subseteq N_T$;
2. the edge $(n_i, n_j)$ belongs to $E_{T'}$ iff $n_i \in N_{T'}$, $n_j \in N_{T'}$ and $(n_i, n_j) \in E_T$.

The set of trees defined on the alphabet of node labels $\Sigma$ will be denoted as $\mathcal{T}_\Sigma$.

Given a tag alphabet $\tau$, an attribute name alphabet $\alpha$, a string alphabet $Str$ and a symbol $S$ not belonging to $\tau \cup \alpha$, an XML tree is a pair $XT = \langle T, \delta \rangle$, where:

– $T = (r, N, E, \lambda)$ is a tree in $\mathcal{T}_{\tau \cup \alpha \cup \{S\}}$;
– given a node $n$ of $T$, $\lambda(n) \in \alpha \cup \{S\} \Leftrightarrow n \in Leaves(T)$;
– $\delta : Leaves(T) \to Str$ is a function associating a (string) value to every leaf of $T$.

The symbol $S$ is used to represent the #PCDATA content of elements.

A DTD is a tuple $D = (\tau, \alpha, P, R, rt)$ where: i) $P$ is the set of element type definitions; ii) $R$ is the set of attribute lists; iii) $rt \in \tau$ is the tag of the document root element.

Example 3. The following XML document (conforming to the DTD reported after the document) represents a collection of books, and is graphically represented by the XML tree in Fig. 1.
    <bib>
      <book>
        <written_by>
          <author ano="A1">
            <name>Ullman</name>
          </author>
          <author ano="A2">
            <name>Widom</name>
          </author>
        </written_by>
        <title>A First Course in Database Systems</title>
        <pub>Prentice-Hall</pub>
      </book>
      <book>
        <written_by>
          <author ano="A1">
            <name>Ullman</name>
          </author>
        </written_by>
        <title>Principles of Database and Knowledge-Base Systems</title>
        <pub>CS Press</pub>
      </book>
    </bib>

    <!ELEMENT bib (book+)>
    <!ELEMENT book (written_by, title, pub, year?)>
    <!ELEMENT written_by (author+)>
    <!ELEMENT author (name)>
    <!ATTLIST author ano CDATA>
    <!ELEMENT name PCDATA>
    <!ELEMENT title PCDATA>
    <!ELEMENT pub PCDATA>
    <!ELEMENT year PCDATA>

Fig. 1. An XML Tree

The internal nodes of the XML tree have a unique label, denoting the tag name of the corresponding element. The leaf nodes correspond to either an attribute or the textual content of an element, and are labelled with two strings. The first one denotes the attribute name (in the case that the node represents an attribute) or is equal to the symbol $S$ (in the case that the node represents an element content). The second label denotes either the value of the attribute or the string contained inside the element corresponding to the node. ✷

A path $p$ on a DTD $D = (\tau, \alpha, P, R, rt)$ is a sequence $p = s_1, \ldots, s_m$ of symbols in $\tau \cup \alpha \cup \{S\}$ such that:

1. $s_1 = rt$;
2. for each $i \in \{2, \ldots, m-1\}$, $s_i \in \tau$ and $s_i$ appears in the element type definition of $s_{i-1}$;
3. $s_m \in \alpha \Rightarrow s_m$ appears in the attribute list of $s_{m-1}$;
4. $s_m \in \tau \cup \{S\} \Rightarrow s_m$ appears in the element type definition of $s_{m-1}$.

The set of paths which can be defined on a DTD $D$ will be denoted as $paths(D)$. In particular, $paths(D)$ is partitioned into two disjoint sets: 1) $EPaths(D)$, which contains all the paths $p = s_1, \ldots, s_m$ where $s_m \in \tau$ (i.e. the paths whose last symbol denotes an element); 2) $StrPaths(D)$, which contains the paths whose last symbol denotes either the textual content of an element or an attribute.
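The path definition above can be sketched in a few lines of Python for the DTD of Example 3. This is only an illustrative sketch: the dictionaries `CHILDREN` and `ATTRS` and the function names are hypothetical encodings chosen here, not part of the paper, and the enumeration assumes a non-recursive DTD (otherwise the set of paths would be infinite).

```python
# Hypothetical encoding of the DTD of Example 3 (illustrative names only).
# "S" stands for the #PCDATA content of an element.
CHILDREN = {
    "bib": ["book"],
    "book": ["written_by", "title", "pub", "year"],
    "written_by": ["author"],
    "author": ["name"],
    "name": ["S"],
    "title": ["S"],
    "pub": ["S"],
    "year": ["S"],
}
ATTRS = {"author": ["@ano"]}
ROOT = "bib"


def paths(root, children, attrs):
    """Enumerate the paths of a non-recursive DTD as tuples of symbols.

    Returns (epaths, strpaths): paths ending in a tag, and paths ending
    in an attribute name or in the symbol S.
    """
    epaths, strpaths = set(), set()
    stack = [(root,)]
    while stack:
        p = stack.pop()
        epaths.add(p)                      # last symbol is a tag
        tag = p[-1]
        for a in attrs.get(tag, []):       # condition 3: attribute of s_{m-1}
            strpaths.add(p + (a,))
        for c in children.get(tag, []):    # conditions 2 and 4
            if c == "S":
                strpaths.add(p + ("S",))
            else:
                stack.append(p + (c,))
    return epaths, strpaths


def dotted(ps):
    """Render paths in the dotted notation used in the text."""
    return sorted(".".join(p) for p in ps)


EPATHS, STRPATHS = paths(ROOT, CHILDREN, ATTRS)
```

Calling `dotted(EPATHS)` and `dotted(STRPATHS)` lists the two partitions in the dotted notation used in the text.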
Example 4. Consider the DTD $D$ of Example 3. The set of paths defined on $D$ is partitioned into the following sets:

EPaths(D) = { bib, bib.book, bib.book.written_by, bib.book.written_by.author, bib.book.written_by.author.name, bib.book.title, bib.book.pub, bib.book.year }

StrPaths(D) = { bib.book.written_by.author.@ano, bib.book.written_by.author.name.S, bib.book.title.S, bib.book.pub.S, bib.book.year.S } ✷

Given an XML tree $XT = \langle T, \delta \rangle$ conforming to a DTD $D$, a path $p \in paths(D)$ identifies the set of nodes which can be reached, starting from the root of $XT$, by going through a sequence of nodes "spelling" $p$. More formally, $p = s_1, \ldots, s_m$ identifies the set of nodes $\{n_1, \ldots, n_k\}$ of $XT$ such that, for each $i \in \{1, \ldots, k\}$, there exists a sequence of nodes $n^i_1, \ldots, n^i_m$ with the following properties:

1. $n^i_1 = r_T$ and $n^i_m = n_i$;
2. for each $j \in \{1, \ldots, m-1\}$, $n^i_{j+1}$ is a child of $n^i_j$;
3. for each $j \in \{1, \ldots, m\}$, $\lambda(n^i_j) = s_j$.

The set of nodes of $XT$ identified by $p$ will be denoted as $p(XT)$. Moreover, we denote with $XT.p$ the answer of the path $p$ applied on $XT$, that is:

– if $p \in EPaths(D)$, then $XT.p = p(XT)$;
– if $p \in StrPaths(D)$, then $XT.p = \{\delta(x) \mid x \in p(XT)\}$.

Thus, the answer of a path $p$ applied on $XT$ is either a set of node identifiers, or a set of (string) values, depending on whether the last symbol $s_m$ in $p$ belongs to $\tau$ (i.e. $s_m$ is a tag name) or to $\alpha \cup \{S\}$ (i.e. $s_m$ is either an attribute name or the symbol $S$).

Example 5. Let $XT$ be the XML tree of Fig. 1. In the following table we report the answers of different paths (defined over the DTD associated to $XT$) applied on $XT$.
    path p                              XT.p
    bib.book.title                      {v12, v22}
    bib.book.title.S                    {"A First Course .", "Principles of Database ."}
    bib.book.written_by.author          {v4, v8, v18}
    bib.book.written_by.author.@ano     {"A1", "A2"}
    bib.book.year                       ∅
    bib.book.year.S                     ∅

The answers to both the paths bib.book.year and bib.book.year.S are empty sets, as there is no node in $XT$ associated to an element year. ✷

3 XML and Functional Dependencies

In this Section, we recall the notion of functional dependency in the XML setting proposed in [4,6] (see footnote 2). A functional dependency $A \to B$ in a relational database $D$ models the correspondence between $A$ and $B$ values in the tuples of $D$. However, there is no standard tuple concept for XML. Thus, before introducing functional dependencies for XML, we provide the concept of tree tuples, corresponding to the concept of tuples in relational databases.

Footnote 2: An alternative definition has been proposed in [13].

Informally, a tree tuple groups together nodes of the document which are semantically correlated, according to the structure of the tree. For instance, a tree tuple of the XML tree $XT$ of Fig. 1 consists of a sub-tree which contains information about a book. Observe that each book is possibly described by more than one tree tuple, as each tree tuple contains the information of only one author (see Example 6).

Definition 1 (Tree Tuple). Given an XML tree $XT$ conforming to the DTD $D$, a tree tuple $t$ of $XT$ is a maximal sub-tree of $XT$ such that, for every path $p \in paths(D)$, $t.p$ contains at most one element. ✷

Example 6. Consider the XML tree $XT$ of Fig. 1. The subtrees of $XT$ shown in Fig. 2(a) and Fig. 2(b) are tree tuples, whereas the subtrees in Fig. 3(a) and Fig. 3(b) are not tree tuples.

Fig. 2. Two tree tuples of the XML tree of Fig. 1 (panels (a) and (b))

Fig. 3. Two subtrees of the XML tree of Fig. 1 which are not tree tuples (panels (a) and (b))

The subtree of Fig. 3(a) is not a tree tuple as there are two distinct nodes (i.e.
v4 and v8) which correspond to the same path bib.book.written_by.author. This means that each book stored in $XT$ can correspond to more than one tree tuple: each tree tuple corresponds to one of the book authors.

The subtree of Fig. 3(b) is not a tree tuple as it is not maximal: it is a subtree of the tree tuple of Fig. 2(b). ✷

Given an XML tree $XT$, a pair of tree tuples $t_1$, $t_2$ of $XT$, and a set $S \subseteq paths(D)$, $t_1.S = t_2.S$ means that $t_1.p = t_2.p$ for each path $p \in S$. Moreover we say that $t_1.S = \emptyset$ if $t_1.p = \emptyset$ for each $p \in S$.

Definition 2 (Functional Dependency). Given a DTD $D$, a functional dependency on $D$ is an expression of the form $S \to p$, where $S$ is a finite non-empty subset of $paths(D)$ and $p$ is an element of $paths(D)$. ✷

Given an XML tree $XT$ conforming to a DTD $D$ and a functional dependency $F : S_1 \to S_2$, we say that $XT$ satisfies $F$ ($XT \models F$) iff for each pair of tree tuples $t_1, t_2$ of $XT$, $t_1.S_1 = t_2.S_1 \wedge t_1.S_1 \neq \emptyset \Rightarrow t_1.S_2 = t_2.S_2$. Given a set of functional dependencies $FD = \{F_1, \ldots, F_n\}$ over $D$, we say that $XT$ satisfies $FD$ if it satisfies $F_i$ for every $i \in \{1, \ldots, n\}$.

Example 7. Consider the XML tree $XT$ of Fig. 1. The constraint that the attribute @ano uniquely identifies the (value of the) name of every author can be expressed with the following functional dependency:

    bib.book.written_by.author.@ano → bib.book.written_by.author.name.S

To say that two distinct authors of the same book cannot have the same value of the attribute ano we can use the following FD:

    {bib.book, bib.book.written_by.author.@ano} → bib.book.written_by.author ✷

A set of functional dependencies $FD$ over a DTD $D$ is satisfiable if there exists an XML tree $XT$ conforming to $D$ such that $XT \models FD$.

4 Repairing and Querying Inconsistent XML Databases

In this Section we present an approach to the problem of repairing XML documents which are inconsistent w.r.t. a given set of functional dependencies.
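To make the notion of inconsistency concrete, the satisfaction condition of Section 3 can be sketched over tree tuples that are assumed to have been extracted already. The sketch below is not the authors' algorithm: each tree tuple is modelled, hypothetically, as a dictionary from a dotted path to the single node identifier or string it selects (a missing key plays the role of ∅), and the book identifiers `b1` and `b2` are made up for illustration.

```python
from itertools import combinations

BOOK = "bib.book"
AUTHOR = "bib.book.written_by.author"
ANO = "bib.book.written_by.author.@ano"
NAME_S = "bib.book.written_by.author.name.S"

# Hypothetical tree tuples for the document of Example 3: one per (book, author).
# Author node ids follow Example 5; "b1" and "b2" are invented book node ids.
TUPLES = [
    {BOOK: "b1", AUTHOR: "v4", ANO: "A1", NAME_S: "Ullman"},
    {BOOK: "b1", AUTHOR: "v8", ANO: "A2", NAME_S: "Widom"},
    {BOOK: "b2", AUTHOR: "v18", ANO: "A1", NAME_S: "Ullman"},
]


def project(t, lhs):
    """Values of tree tuple t on the paths in lhs (None where t.p is empty)."""
    return tuple(t.get(p) for p in lhs)


def satisfies(tuples, lhs, rhs):
    """Check S -> p: tuples agreeing on a non-empty S must agree on p."""
    for t1, t2 in combinations(tuples, 2):
        a, b = project(t1, lhs), project(t2, lhs)
        lhs_non_empty = any(v is not None for v in a)
        if a == b and lhs_non_empty and t1.get(rhs) != t2.get(rhs):
            return False
    return True
```

Under these assumptions, both dependencies of Example 7 hold on `TUPLES`, while the dependency from `ANO` alone to the `AUTHOR` node fails, because the authors v4 and v18 share the value "A1".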
A possibly inconsistent XML document can be repaired by taking two different kinds of actions: 1) by changing the value of an attribute or the content of an element, 2) by marking some of the attributes or elements of the document as "unreliable".

Example 8. Consider the following XML document, conforming to the DTD reported after it:

    <cars>
      <car cno="c1">
        <policy pno="p1"/>
        <garage>
          <name>Olympo</name>
          <city>Boston</city>
        </garage>
        <garage>
          <name>Johnson</name>
          <city>Cambridge</city>
        </garage>
      </car>
    </cars>

    <!ELEMENT cars (car+)>
    <!ELEMENT car (policy?, garage+)>
    <!ATTLIST car cno CDATA>
    <!ELEMENT policy EMPTY>
    <!ATTLIST policy pno CDATA>
    <!ELEMENT garage (name, city)>
    <!ELEMENT name PCDATA>
    <!ELEMENT city PCDATA>

and the functional dependency {cars.car.policy} → cars.car.garage, saying that, if a car has a policy, then it can be repaired by only one garage. Otherwise, if no policy is associated to the car, then it can be repaired in more than one garage. ✷

The above document does not satisfy the functional dependency, as the car with @cno = c1 has a policy, but is associated with two garages. This inconsistency may have one of the following causes: 1) the policy element is incorrect; 2) one of the two garage elements is incorrect. The above functional dependency involves only node identifiers, so that it is not possible to repair the document by changing some of its element values. A possible repair strategy consists of considering unreliable either the policy element or one of the garage elements.

We point out that marking a node as unreliable preserves more information than simply deleting it. Indeed, a simple deletion of a whole garage element would produce undesired side-effects. For instance, if we delete one of the two garage elements and then ask whether the car can be repaired in only one garage, the answer would be "yes".
On the contrary, by marking one of the two garage elements as "unreliable", we will consider the "yes" answer as not reliable.

Example 9. Consider the XML tree $XT$ of Fig. 4, conforming to the DTD $D$ of Example 3, and suppose that we are given the following functional dependency:

    {bib.book, bib.book.written_by.author.@ano} → bib.book.written_by.author

The XML tree $XT$ does not satisfy the above FD, as the two author elements, contained in the same book, have the same value of the attribute @ano, whereas the above FD requires that, for each book, there is only one author having a given @ano value. ✷

Fig. 4. An XML tree

The constraint in the above example may not be satisfied for two possible reasons: 1) one of the two @ano values is incorrect; 2) one of the two author elements is incorrect.

Therefore, two repairing strategies are possible. If we assume that the former of the two errors occurs, we are led to change the @ano value of one of the authors. That is, we can make $XT$ consistent w.r.t. the given FD by assigning a new value (denoted as $\bot_1$) to the attribute @ano of any of the author elements (see Fig. 5(a)).

Fig. 5. Two repairs of the XML tree of Fig. 4 (panels (a) and (b))

Otherwise, if we assume that the latter error occurs (i.e. one of the two author elements is incorrect), we choose to mark one of the two authors having the same @ano as unreliable (see Fig. 5(b), where the unreliable nodes are marked). However, the latter strategy changes a larger portion of the document, since it marks a whole author element as unreliable, whereas the first strategy only changes its @ano. Repair strategies performing smaller changes to the original document will be preferred, in the same way as in well-known approaches to relational database repairing [3,11].
Thus, we propose two different kinds of actions which can be performed for repairing inconsistent XML documents: 1) updating element values and 2) marking elements as unreliable. Observe that we prefer marking a node as unreliable rather than deleting it, since removing elements from an XML document leads to two undesired side effects: it causes incorrect answers to queries, as in Example 8, and it does not always suffice to remove inconsistency. In fact, deleting a node can lead to a new document not conforming to the given DTD.

4.1 R-XML Tree

Given an XML tree $XT$, the reliability of the nodes of $XT$ is given by providing a boolean function that assigns "true" to every reliable node and "false" to every unreliable node. More formally:

Definition 3 (R-XML tree). An R-XML tree is a triplet $RXT = \langle T, \delta, \rho \rangle$, where $\langle T, \delta \rangle$ is an XML tree and $\rho$ is a reliability function from $N_T$ to $\{true, false\}$, such that, for each pair of nodes $n_1, n_2 \in N_T$ with $n_2$ a descendant of $n_1$, it holds that $\rho(n_1) = false \Rightarrow \rho(n_2) = false$. ✷

An XML tree $XT$ can be seen as an R-XML tree whose function $\rho$ returns true for all nodes in $XT$. Thus, an R-XML tree can be thought of as an XML tree where each node is marked with a boolean value (true if the node is reliable, and false otherwise). We now introduce the concept of satisfiability of functional dependencies over R-XML trees.

Definition 4 (Weak satisfiability). Let $RXT = \langle T, \delta, \rho \rangle$ be an R-XML tree conforming to a DTD $D$, and $f : S \to p$ be a functional dependency. We say that $RXT$ weakly satisfies $f$ ($RXT \models_w f$) if one of the following conditions holds:

1. $\langle T, \delta \rangle \models f$;
2. for each pair of tuples $t_1, t_2$ of $RXT$ one of the following holds:
   a. there exists a path $p_i \in S$ such that $(\rho(p_i(t_1)) = false) \vee (\rho(p_i(t_2)) = false)$;
   b. $(\rho(p(t_1)) = false) \vee (\rho(p(t_2)) = false)$. ✷

It is worth noting that for XML trees weak satisfiability reduces to the standard notion of satisfiability.
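The constraint on the reliability function in Definition 3 can be checked edge by edge: if no child of an unreliable node is reliable, then by induction no descendant is. The following is a minimal sketch under stated assumptions; the tree shape in `CHILDREN` (an author node v4 with an attribute leaf v5, a name element v6 and its text leaf v7) is a hypothetical reading of the node identifiers, and the function names are illustrative.

```python
# Hypothetical subtree: author v4 with children v5 (@ano) and v6 (name),
# and v7 the text leaf below v6. The shape is assumed for illustration.
CHILDREN = {"v4": ["v5", "v6"], "v6": ["v7"]}


def parent_map(children):
    """Invert a parent -> children map into a child -> parent map."""
    return {c: p for p, cs in children.items() for c in cs}


def respects_definition_3(children, rho):
    """True iff every child of an unreliable node is itself unreliable."""
    return all(rho[p] or not rho[c] for c, p in parent_map(children).items())


def mark_unreliable(children, rho, node):
    """Return a copy of rho in which node and all its descendants are False."""
    new_rho = dict(rho)
    stack = [node]
    while stack:
        n = stack.pop()
        new_rho[n] = False
        stack.extend(children.get(n, []))
    return new_rho


RHO = {n: True for n in ("v4", "v5", "v6", "v7")}
MARKED = mark_unreliable(CHILDREN, RHO, "v4")
```

Propagating the mark downwards, as `mark_unreliable` does, is one simple way to keep a reliability function valid after a node is declared unreliable; it is a design choice of this sketch rather than something prescribed by the definition.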
Basically, weak satisfiability does not consider unsatisfied functional dependencies over paths containing unreliable nodes. Given a set of functional dependencies $FD = \{F_1, \ldots, F_n\}$ over $D$, we say that $RXT$ weakly satisfies $FD$ ($RXT \models_w FD$) if it weakly satisfies $F_i$ for every $i \in \{1, \ldots, n\}$.

Before presenting our repairing technique we need some preliminary notations. The composition of two reliability functions $\rho_1$ and $\rho_2$ is $(\rho_1 \cdot \rho_2)(n) = \min(\rho_1(n), \rho_2(n))$. The composition of two functions $\delta_1$ and $\delta_2$ associating values to leaf nodes is

$$(\delta_1 \cdot \delta_2)(n) = \begin{cases} \delta_1(n) & \text{if } \delta_1 \text{ is defined over } n, \\ \delta_2(n) & \text{otherwise (i.e. } \delta_1 \text{ is not defined over } n\text{).} \end{cases}$$

The composition of functions is useful to update node values (strings assigned to leaf nodes and reliability values). Moreover, by composing two reliability functions, the value of a node cannot be increased (i.e. reliable nodes can be made unreliable, but unreliable nodes cannot be made reliable).

In the following, for a given R-XML tree $RXT = \langle T, \delta_T, \rho_T \rangle$ and reliability function $\rho$ (resp. function assigning leaf values $\delta$), we denote with $\rho(RXT) = \langle T, \delta_T, \rho \cdot \rho_T \rangle$ (resp. $\delta(RXT) = \langle T, \delta \cdot \delta_T, \rho_T \rangle$) the application of $\rho$ (resp. $\delta$) to $RXT$.

Definition 5 (Weak repair). Let $RXT = \langle T, \delta, \rho \rangle$ be an R-XML tree conforming to a DTD $D$ and $FD$ a set of functional dependencies. A (weak) repair for $RXT$ is a pair of functions $\delta'$ and $\rho'$ such that $RXT' = \langle T, \delta' \cdot \delta, \rho' \cdot \rho \rangle$ weakly satisfies $FD$ ($RXT' \models_w FD$). ✷

Example 10. Consider the XML document of Example 3, graphically represented in Fig. 1, and the functional dependency bib.book.written_by.author.@ano → bib.book.written_by.author. The document is not consistent as there are two authors with the same value for the attribute @ano.
Possible repairs are: $R_1 = \langle \{\delta(v5) = \bot\}, \rho_{\{\}}(v) \rangle$, $R_2 = \langle \{\delta(v9) = \bot\}, \rho_{\{\}}(v) \rangle$, $R_3 = \langle \{\}, \rho_{\{v4,v5,v6,v7\}}(v) \rangle$ and $R_4 = \langle \{\}, \rho_{\{v8,v9,v10,v11\}}(v) \rangle$, where the function $\rho_S(v)$ states that every $v \in S$ is mapped to false and every $v \notin S$ is mapped to true by $\rho$. ✷

As we have assumed that the reliability value of a node cannot be greater than the reliability value of its ancestors, we often do not specify the reliability value of descendants of unreliable nodes. For instance, regarding the reliability function of the repair $R_3$, we shall denote $R_3$ as $\langle \{\}, \rho_{\{v4\}} \rangle$, as the nodes v5, v6 and v7 are descendants of the node v4.

The set of weak repairs for a possibly inconsistent R-XML tree $RXT$, with respect to a set of functional dependencies $FD$, will be denoted by $R(RXT, FD)$.

Given a set of labelled nodes $N$ and a reliability function $\rho$ defined on $N$, we denote with $True_\rho(N) = \{n \in N \mid \rho(n) = true\}$ and with $False_\rho(N) = \{n \in N \mid \rho(n) = false\}$. Analogously, we denote with $Updated_\delta(N)$ the set of (leaf) nodes on which $\delta$ is defined, i.e. the set of nodes modified by $\delta$. With a little abuse of notation we apply the functions $True_\rho$ (resp. $False_\rho$, $Updated_\delta$) to trees as well. When these functions are applied to an R-XML tree $RXT = \langle T, \delta, \rho \rangle$, their results consist of the subtree of $RXT$ only containing the nodes in $True_\rho(N_T)$ (resp. $False_\rho(N_T)$, $Updated_\delta(N_T)$).

Definition 6 (Minimal Repair). Let $XT = \langle T, \delta \rangle$ be an XML tree conforming to a DTD $D$, $FD$ a set of functional dependencies and $R_1 = \langle \delta_1, \rho_1 \rangle$, $R_2 = \langle \delta_2, \rho_2 \rangle$ two repairs for $XT$. We say that $R_1$ is smaller than $R_2$ ($R_1 \preceq R_2$) if $Updated_{\delta_1}(N_T) \cup False_{\rho_1}(N_T) \subseteq Updated_{\delta_2}(N_T) \cup False_{\rho_2}(N_T)$ and $False_{\rho_1}(N_T) \subseteq False_{\rho_2}(N_T)$. Moreover, we say that a repair $R$ is minimal if there is no repair $R' \neq R$ such that $R' \preceq R$. ✷

[...]
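The two compositions and the ordering of Definition 6 can be sketched as follows, using the four repairs of Example 10 as data. This is an illustrative sketch, not the authors' implementation: a repair is represented, hypothetically, only by the pair (set of updated leaves, set of nodes made false), functions are encoded as dictionaries, and nodes absent from a reliability dictionary are assumed to be reliable.

```python
def compose_rho(rho1, rho2):
    """(rho1 . rho2)(n) = min(rho1(n), rho2(n)); absent nodes count as True."""
    return {n: min(rho1.get(n, True), rho2.get(n, True))
            for n in set(rho1) | set(rho2)}


def compose_delta(delta1, delta2):
    """(delta1 . delta2)(n) = delta1(n) where defined, delta2(n) otherwise."""
    return {**delta2, **delta1}


def smaller(r1, r2):
    """R1 is smaller than R2 in the sense of Definition 6."""
    (updated1, false1), (updated2, false2) = r1, r2
    return (updated1 | false1) <= (updated2 | false2) and false1 <= false2


def minimal(repairs):
    """Repairs R for which no different repair R' is smaller than R."""
    return [r for r in repairs
            if not any(q != r and smaller(q, r) for q in repairs)]


# The repairs of Example 10 as (Updated, False) pairs of node sets.
R1 = (frozenset({"v5"}), frozenset())
R2 = (frozenset({"v9"}), frozenset())
R3 = (frozenset(), frozenset({"v4", "v5", "v6", "v7"}))
R4 = (frozenset(), frozenset({"v8", "v9", "v10", "v11"}))
```

With this encoding, `minimal([R1, R2, R3, R4])` keeps only the two repairs that update an @ano value, since each of them touches a subset of the nodes affected by the corresponding repair that marks a whole author subtree as unreliable.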
... root and b is not a key and c is a key and b ∩ c ∩ p = root and there exists b →→ d|e such that d is a key and e is a key and d ≥ q ∩ r and e ≥ q ∩ r; (G) q is not a key and r is not a key and there exists p →→ q|k and there exists p →→ k|r such that k ≥ q ∩ r; (H) p is a key, q is a key and r is not a key and q ∩ r = p and q ∩ r is not a strict prefix of p and there exists x →→ q|k such that x < p and ...

... relational database design and the development of a similar theory for XML will similarly lay the foundation for understanding how to design XML documents. In addition, the study of FDs and MVDs in XML is important because of the close connection between XML and relational databases. With current technology, the source of XML data is typically a relational database [1] and relational databases are also normally used to store XML ...

... 4NF in relational database design. Acta Informatica, 36:1–41, 1999.
13. M.W. Vincent and J. Liu. Strong functional dependencies and a redundancy free normal form for XML. Submitted to ACM Transactions on Database Systems, 2002.
14. M.W. Vincent and J. Liu. Functional dependencies for XML. In Fifth Asian Pacific Web Conference, 2003.
15. M.W. Vincent and J. Liu. Multivalued dependencies and a 4NF for XML. In 15th International ...

... security models for XML documents by leveraging on techniques developed for relational databases. More specifically, in our approach, (1) users make XML queries against the given XML view/schema, (2) access controls for XML data are also specified in the XML model, but (3) data are stored in relational databases, and (4) security check and query evaluation are also done in relational databases. Instead of ...

... data in relational databases are to (1) map XML authorization rules into the existing access control mechanism in relational databases; and (2) map XML documents into tables in relational databases. In the following, we discuss some of the issues.

1 Theoretical study of XML and Relational security models. To fully realize our vision, a thorough study on the expressive power of XML and relational security ...

... Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison Wesley, 1996.
3. P. Buneman, S. Davidson, W. Fan, and C. Hara. Reasoning about keys for XML. In International Workshop on Database Programming Languages, 2001.
4. P. Buneman, S. Davidson, W. Fan, C. Hara, and W. Tan. Keys for XML. Computer Networks, 39(5):473–487, 2002.

M.W. Vincent, J. Liu, and C. Liu

5. P. Buneman, W. Fan, J. Simeon, and S. Weinstein. Constraints ...

D. Lee, W.-C. Lee, and P. Liu

Table 1. The overview of XML and Relational security model supports

                XML                                      Relational
    Models      XML Security Models ([6], [2], etc.)     Relational Security Models ([13], etc.)
    Products    XML Databases (Xindice, Tamino, etc.)    Relational Databases (Oracle, DB2, SQL Server, etc.)

most XML database products currently do not have any support for access controls. Similarly, ...

... support XML security models by utilizing existing security support of relational security models or relational products. More specifically, we assume that

– XML documents are converted into and stored in relational databases
– Users are given an XML view/schema against which they issue XML queries
– Access controls are specified by security administrators in the XML schema and documents
– Security check and ...

... Σ be a set of XMVDs and key constraints. Then Σ is in XML fourth normal form (4XNF) if for every XMVD p →→ q|r ∈ Σ, at least one of the following conditions holds: (A) q and r are both keys; (B) p is a key and q ∩ r = p; (C) p is a key and q ∩ r is a strict prefix of p; (D) q ∩ r = root; (E) there exists an XMVD s →→ t|u ∈ Σ such that s ∩ p = root and t ≥ q ∩ r and t is a key and u ≥ q ∩ r and u is a key; (F) ...

... data and XML. ACM SIGMOD Record, 30(1):45–47, 2001.
6. P. Buneman, W. Fan, and S. Weinstein. Path constraints on structured and semistructured data. In Proc. ACM PODS Conference, pages 129–138, 1998.
7. W. Fan and J. Simeon. Integrity constraints for XML. In Proc. ACM PODS Conference, pages 23–34, 2000.
8. M. Levene and M.W. Vincent. Justification for inclusion dependency normal form. IEEE Transactions on Knowledge and Data ...

Uploaded: 24/12/2013, 03:15
