XML by Example - P6


After the opening tag, the parser sees the content of the name element: XML Training. It generates an event, passing the content to the application as a parameter. The next event indicates the closing tag for the name element. The parser has now completely parsed the name element. It has fired five events so far: three events for the name element, one event for the declaration, and one for the product opening tag.

The parser now moves to the first price element. It generates two events for each price element: one event for the opening tag and one event for the closing tag. Even though the closing tag is reduced to the / character in the opening tag, the parser generates an event for it. The parser passes the element’s attributes to the application in the event for the opening tag. There are four price elements, so the parser generates eight events as it parses them.

Finally, the parser meets product’s closing tag and it generates its last event.

As Figure 8.5 illustrates, taken together, the events describe the document tree to the application. An opening tag event means “going one level down in the tree,” whereas a closing tag event means “going one level up in the tree.”

Figure 8.5: How the parser builds the tree implicitly

An event-based interface is the most natural interface for a parser. Indeed, the parser simply has to report what it sees.

Note that the parser passes enough information to build the document tree of the XML document but, unlike an object-based parser, it does not explicitly build the tree.

NOTE
If needed, the application can build a DOM tree from the events it receives from the parser. In fact, several object-based parsers are built around an event-based parser. Internally, they use an event-based parser and they create objects in response to the events the parser generates.

Why Use Event-Based Interfaces?
Which type of interface should you use: an object-based or an event-based interface? Unfortunately, there is no clear-cut answer to this question. Neither of the two interfaces is intrinsically better; they serve different needs.

The main reason people prefer event-based interfaces is efficiency. Event-based interfaces are lower level than object-based interfaces. On the positive side, they give you more control over parsing and enable you to optimize your application. On the downside, it means more work for you.

As already discussed, an event-based interface consumes fewer resources than an object-based one, simply because it does not need to build the document tree.

Furthermore, with an event-based interface, the application can start processing the document as the parser is reading it. With an object-based interface, the application must wait until the document has been completely read.

Therefore, event-based interfaces are particularly popular with applications that process large files (which would take a lot of time to read and create a document tree) and with servers (which process many documents simultaneously).

The major limitation of event-based interfaces is that it is not possible to navigate through the document as you can with a DOM tree. Indeed, after firing an event, the parser forgets about it. As you will see, the application must explicitly buffer those events it is interested in. It might also have more work in managing the state.

Of course, whether it uses an event-based or an object-based interface, the parser does a lot of useful work: It reads the document, enforces the XML syntax, and resolves entities. When using a validating parser, it might validate the document against its DTD. So, there are many reasons to use a parser.
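The trade-off described above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not code from the book: it uses the SAX2 DefaultHandler and the JAXP SAXParserFactory that ship with current JDKs rather than the SAX1 HandlerBase and IBM parser used in this chapter, and the class name EventTrace and the inline sample document are hypothetical. The handler never builds a tree; it keeps only a depth counter and a list of the events it chose to buffer, which shows both the saving and the extra state management left to the application.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: records each element event with its tree depth. */
class EventTrace extends DefaultHandler {
    // The parser forgets each event after firing it, so the
    // application buffers what it wants to keep.
    private final List<String> events = new ArrayList<String>();
    private int depth = 0;     // state the application tracks itself
    private int maxDepth = 0;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        // An opening tag means going one level down in the tree.
        events.add("start " + qName + " (depth " + depth + ")");
        depth++;
        maxDepth = Math.max(maxDepth, depth);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // A closing tag means going one level up in the tree.
        depth--;
        events.add("end " + qName + " (depth " + depth + ")");
    }

    List<String> events() { return events; }

    int maxDepth() { return maxDepth; }

    /** Parses an XML string with the JDK's default SAX parser. */
    static EventTrace trace(String xml) {
        try {
            EventTrace handler = new EventTrace();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // Made-up document that only loosely imitates the price list.
        String xml = "<product><name>XML Training</name>"
                   + "<price price=\"999.00\" vendor=\"Playfield\"/>"
                   + "</product>";
        for (String event : trace(xml).events()) {
            System.out.println(event);
        }
    }
}
```

For the inline document, the trace contains a start and an end event for every element, including the empty price element, and no document tree is ever allocated.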
SAX: The Alternative API

By definition, the DOM recommendation does not apply to event-based parsers. The members of the XML-DEV mailing list have developed a standard API for event-based parsers called SAX, short for the Simple API for XML.

SAX is defined for the Java language. There are versions of SAX for Python and Perl but currently none for JavaScript or C++. Furthermore, SAX is not implemented in browsers; it is available only for standalone parsers.

Obviously, the examples in this chapter are written in Java. If you want to learn how to write Java applications, refer to Appendix A, “Crash Course on Java.”

SAX is edited by David Megginson and published at www.megginson.com/SAX. Unlike DOM, SAX is not endorsed by an official standardization body but it is widely used and is considered a de facto standard.

In particular, Sun has included SAX in ProjectX, an ongoing effort to add an XML parser to the Java platform. ProjectX also supports DOM so the parser offers both event-based and object-based interfaces. It is available from java.sun.com.

The IBM parser, XML for Java (available from www.alphaworks.ibm.com), and the DataChannel parser, XJParse (available from www.datachannel.com), are other parsers that support both the DOM and SAX interfaces. Microstar’s Ælfred (www.microstar.com) and James Clark’s XP (www.jclark.com) support only the SAX interface.

Getting Started with SAX

Listing 8.2 is a Java application that finds the cheapest price from the list of prices in Listing 8.1. The application prints the best price as well as the name of the vendor.

Listing 8.2: Simple SAX Application

    /*
     * XML By Example, chapter 8: SAX
     */

    package com.psol.xbe;

    import org.xml.sax.*;
    import org.xml.sax.helpers.ParserFactory;

    /**
     * SAX event handler to find the cheapest offering
     * in a list of prices.
     * @author bmarchal@pineapplesoft.com
     */
    public class Cheapest
        extends HandlerBase
    {
        /*
         * event handler
         */

        /**
         * properties we are collecting: cheapest price
         */
        protected double min = Double.MAX_VALUE;

        /**
         * properties we are collecting: cheapest vendor
         */
        protected String vendor = null;

        /**
         * startElement event: the price list is stored as price
         * elements with price and vendor attributes
         * @param name element's name
         * @param attributes element's attributes
         */
        public void startElement(String name,AttributeList attributes)
        {
            if(name.equals("price"))
            {
                String attribute = attributes.getValue("price");
                if(null != attribute)
                {
                    double price = toDouble(attribute);
                    if(min > price)
                    {
                        min = price;
                        vendor = attributes.getValue("vendor");
                    }
                }
            }
        }

        /**
         * helper method: turn a string in a double
         * @param string number as a string
         * @return the number as a double, or 0.0 if it cannot convert
         * the number
         */
        protected double toDouble(String string)
        {
            try
            {
                // parseDouble() throws on malformed input; it never
                // returns null, so the failure must be caught here
                return Double.parseDouble(string);
            }
            catch(NumberFormatException e)
            {
                return 0.0;
            }
        }

        /**
         * property accessor: vendor name
         * @return the vendor with the cheapest offer so far
         */
        public String getVendor()
        {
            return vendor;
        }

        /**
         * property accessor: best price
         * @return the best price so far
         */
        public double getMinimum()
        {
            return min;
        }

        /*
         * main() method and properties
         */

        /**
         * the parser class (IBM's XML for Java)
         */
        protected static final String PARSER_NAME =
            "com.ibm.xml.parsers.SAXParser";

        /**
         * main() method
         * decodes command-line parameters and invokes the parser
         * @param args command-line argument
         * @throws Exception catch-all for underlying exceptions
         */
        public static void main(String[] args)
            throws Exception
        {
            // command-line arguments
            if(args.length < 1)
            {
                System.out.println("java com.psol.xbe.Cheapest filename");
                return;
            }

            // creates the event handler
            Cheapest cheapest = new Cheapest();

            // creates the parser
            Parser parser = ParserFactory.makeParser(PARSER_NAME);
            parser.setDocumentHandler(cheapest);

            // invokes the parser against the price list
            parser.parse(args[0]);

            // prints the results
            System.out.println("The cheapest offer is " +
                               cheapest.getVendor() +
                               " ($" + cheapest.getMinimum() + ')');
        }
    }

Compiling the Example

To compile this application, you need a Java Development Kit (JDK) for your platform. For this example, the Java Runtime is not enough. You can download the JDK from java.sun.com. Furthermore, you have to download the IBM parser, XML for Java, from www.alphaworks.ibm.com.

As always, I will post updates on www.mcp.com. So, if you have problems downloading a component, visit www.mcp.com.

Save Listing 8.2 in a file called Cheapest.java. Go to the DOS prompt, change to the directory where you saved Cheapest.java, and create an empty directory called classes. The compiler will place the Java program in the classes directory.
Finally, compile the Java source with

    javac -classpath c:\xml4j\xml4j.jar -d classes Cheapest.java

This command assumes you have installed the IBM parser in c:\xml4j; you might have to adapt the classpath if you installed the parser in a different directory.

To run the application against the price list, issue the following command:

    java -classpath c:\xml4j\xml4j.jar;classes com.psol.xbe.Cheapest prices.xml

This command assumes that the XML price list from Listing 8.1 is in a file called prices.xml.

CAUTION
The programs in this chapter do essentially no error checking. The programs minimize errors; however, if you type parameters incorrectly, the programs can crash.

Running this program against the price list in Listing 8.1 gives the result:

    The cheapest offer is XMLi ($699.0)

Note that the classpath points to the parser and to the classes directory. The fully qualified name of the class is com.psol.xbe.Cheapest.

CAUTION
This example won’t work unless you have installed a Java Development Kit. If there is an error message similar to “Exception in thread "main" java.lang.NoClassDefFoundError”, it means that either the classpath is incorrect (be sure it points to the right directories) or that you typed an incorrect class name (com.psol.xbe.Cheapest).

SAX Interfaces and Objects

Events in SAX are defined as methods attached to specific Java interfaces. An application implements some of these methods and registers as an event handler with the parser.

Main SAX Events

SAX groups its events in a few interfaces:

• DocumentHandler defines events related to the document itself (such as opening and closing tags). Most applications register for these events.
• DTDHandler defines events related to the DTD. Few applications register for these events.
  Moreover, SAX does not define enough events to completely report on the DTD (SAX-validating parsers read and use the DTD but they cannot pass all the information to the application).
• EntityResolver defines events related to loading entities. Few applications register for these events. They are required to load entities from special sources such as a database.
• ErrorHandler defines error events. Applications register for these events if they need to report errors in a special way.

To simplify work, SAX provides a default implementation for all these interfaces in the HandlerBase class. It is easier to extend HandlerBase and override the methods that are relevant for the application rather than to implement an interface directly.

Parser

To register event handlers and to start parsing, the application uses the Parser interface. To start parsing, the application calls parse(), a method of Parser:

    parser.parse(args[0]);

Parser defines the following methods:

• parse() starts parsing an XML document. There are two versions of parse(): one accepts a filename or a URL, the other an InputSource object (see section “InputSource”).
• setDocumentHandler(), setDTDHandler(), setEntityResolver(), and setErrorHandler() allow the application to register event handlers.
• setLocale() requests error messages in a specific Locale.

ParserFactory

ParserFactory creates the parser object. It takes the class name for the parser. For XML for Java, it is com.ibm.xml.parsers.SAXParser. To switch to another parser, you can change one line and recompile:

    protected static final String PARSER_NAME =
        "com.ibm.xml.parsers.SAXParser";
    // ...
    Parser parser = ParserFactory.makeParser(PARSER_NAME);

For more flexibility, the application can read the class name from the command line or from a configuration file.
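One way to read the class name at run time might look like the sketch below. It is an assumption-laden illustration rather than the book's code: the class name ParserChoice, the second command-line argument, and the xbe.parser system property are all hypothetical. Only ParserFactory.makeParser(String) and the IBM class name come from the text, and actually creating the parser requires that class to be on the classpath.

```java
import org.xml.sax.Parser;
import org.xml.sax.helpers.ParserFactory;

/** Hypothetical helper: picks the SAX parser class at run time. */
class ParserChoice {
    // Fallback taken from Listing 8.2 (IBM's XML for Java).
    static final String DEFAULT_PARSER = "com.ibm.xml.parsers.SAXParser";

    /**
     * Resolution order (an assumption of this sketch):
     * 1. second command-line argument, if present and non-empty
     * 2. the hypothetical system property xbe.parser
     * 3. the compiled-in default
     */
    static String parserName(String[] args) {
        if (args.length > 1 && args[1].length() > 0) {
            return args[1];
        }
        return System.getProperty("xbe.parser", DEFAULT_PARSER);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("java ParserChoice filename [parser-class]");
            return;
        }
        // makeParser() loads the named class and casts it to Parser;
        // it throws if the class is missing from the classpath.
        Parser parser = ParserFactory.makeParser(parserName(args));
        System.out.println("Using parser " + parser.getClass().getName());
    }
}
```

With this arrangement the parser could be swapped by passing a different class name as the second argument, or with a -Dxbe.parser=... option on the java command line.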
In this case, it is even possible to change the parser without recompiling.

InputSource

InputSource controls how the parser reads files, including XML documents and entities.

In most cases, documents are loaded from the local file system or from a URL. The default implementation of InputSource knows how to load them. However, if an application has special needs, such as loading documents from a database, it can override InputSource.

The parse() method is available in two versions: one takes a string, the other an InputSource. The string version uses the default InputSource to load the document from a file or a URL.

DocumentHandler

Listing 8.2 is simple because it needs to handle only the startElement message. As the name implies, the message is sent when the parser sees the opening tag of an element.

The event is defined by the DocumentHandler interface. The application creates a new class, Cheapest, which overrides the startElement() method. The application registers Cheapest as an event handler with the parser.

    // creates the event handler
    Cheapest cheapest = new Cheapest();
    // ...
    parser.setDocumentHandler(cheapest);

DocumentHandler declares events related to the document. The following events are available:

• startDocument()/endDocument() notify the application of the document’s beginning or ending.
• startElement()/endElement() notify the application that an element starts or ends (which corresponds to the opening and closing tags of the element). Attributes are passed as an AttributeList; see the section “AttributeList” that follows. Empty elements (<img href="logo.gif"/>) generate both startElement and endElement events even though there is only one tag.
• characters()/ignorableWhitespace() notify the application when the parser finds content (text) in an element.
  The parser can break a piece of text in several events or pass it all at once as it sees fit. However, one event is always attached to a single element. The ignorableWhitespace event is used for ignorable spaces as defined by the XML specs.
• processingInstruction() notifies the application of processing instructions.
• setDocumentLocator() passes a Locator object to the application; see the section “Locator” that follows. Note that the SAX parser is not required to supply a Locator, but if it does, it must fire this event before any other event.

AttributeList

In the event, the application receives the element name and the list of attributes in an AttributeList.

In this example, the application waits until a price element is found. It then extracts the vendor name and the price from the list of attributes. Armed with this information, finding the cheapest product requires a simple comparison:

    public void startElement(String name,AttributeList attributes)
    {
        if(name.equals("price"))
        {
            String attribute = attributes.getValue("price");

[...]

... parameters: the filename and the longest delay

    java -classpath c:\xml4j\xml4j.jar;classes com.psol.xbe.BestDeal product.xml 60

returns

    The best deal is proposed by XMLi a XML Training at 699.0 delivered in 45 days

whereas

    java -classpath c:\xml4j\xml4j.jar;classes com.psol.xbe.BestDeal product.xml 3

returns

    The best deal is proposed by Emailaholic a XML Training at 1999.0 delivered in 2 days

A Layered Architecture...
are not errors as defined by the XML specification. For example, some parsers issue a warning when there is no XML declaration. It is not an error (because the declaration is optional), but it is worth noting.
• error() signals errors as defined by the XML specification.
• fatalError() signals fatal errors, as defined by the XML specification.

SAXException

Most methods defined by the SAX standard can throw...

it takes the urgency into consideration. Indeed, the cheapest vendor (XMLi) is also the slowest one to deliver. On the other hand, Emailaholic is expensive but it delivers in two days.

Listing 8.4: Improved Best Deal Looker

    /*
     * XML By Example, chapter 8: SAX
     */
    package com.psol.xbe;

    import java.util.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.ParserFactory;

    /**
     * Starting point class: initializes...

Chapter 9: Writing XML

In the last four chapters, you learned how to use XML documents in your applications. You studied style sheets and how to convert XML documents into HTML. You also learned how to read XML documents from JavaScript or Java applications with a parser.

This chapter looks at the mirror problem: how to write XML documents from an application. The mirror component...

modifying XML documents. Listing 9.1 is the XML price list used in Chapter 7.

✔ The example in the section “A DOM Application” in Chapter 7 (page 199) converted the prices into Euros and printed the result.

With small changes to the original application, you can record the new prices in the original document.

Listing 9.1: XML Price List

    <?xml version="1.0"?> XML Editor...

Figure 8.6: Price list structure

Listing 8.3: Price List with Delivery Information

    <?xml version="1.0"?> XML Training Playfield Training 999.00 899.00 XMLi 2999.00 1499.00 699.00...
parser is called a generator. Whereas the parser reads XML documents, the generator writes them.

In this chapter, you learn how to write documents

• through DOM, which is ideal for modifying XML documents
• through your own generator, which is more efficient

The Parser Mirror

In practice, some parsers integrate a generator. They can read and write XML documents. Consequently, the term parser is often used...

need packaged parsers: to shield the programmer from the XML syntax.
• The other school argues that writing XML documents is simple and can easily be done with ad hoc code.

As usual, I’m a pragmatist and I choose one option or the other depending on the needs of the application at hand. In general, however, it is dramatically easier to generate XML documents than to read them. This is because you control...

However, when writing the document, you decide. If your applications don’t need entities, don’t use them. If you are happy with ASCII, stick to it. Most applications need few of the features of XML besides the tagging mechanism.

Therefore, although a typical XML parser is a thousand lines of code, a simple but effective generator...

You’ll start by using a DOM parser to generate XML documents and then you’ll see how to write your own generator. Finally, you will see how to support different DTDs. The techniques are illustrated with JavaScript but port easily to Java.

Modifying a Document with DOM

In Chapter 7, “The Parser and DOM,” you saw how DOM parsers read documents. That is only one half of DOM. The other half is writing XML documents.

Date posted: 14/12/2013, 18:15
