Perl: The Complete Reference, Second Edition (part 6)


The GET method has a limited transfer size. Although there is officially no limit, most people try to keep GET method requests down to less than 1K (1,024 bytes). Also note that because the information is placed into an environment variable, your operating system might have limits on the size of either individual environment variables or the environment space as a whole.

The POST method has no such limitation. You can transfer as much information as you like within a POST request without fear of any truncation along the way. However, you cannot use a POST request to process an extended URL. For the POST method, the CONTENT_LENGTH environment variable contains the length of the query supplied, and it can be used to ensure that you read the right amount of information from the standard input.

Figure 18-1. The Book Bug Report form from www.mcwords.com

Extracting Form Data

No matter how the field data is transferred, the information arrives in a format that you need to be aware of before you can use it. The HTML form defines a number of fields, and the name and contents of each field are contained within the query string that is supplied. The information is supplied as name/value pairs, separated by ampersands (&). Within each pair, the name and the value are separated by an equal sign. For example, the following query string shows two fields, first and last:

    first=Martin&last=Brown

Splitting these fields up is easy within Perl. You can use split to do the hard work for you. One final note, though: many of the characters you may take for granted are encoded so that the URL is not misinterpreted. Imagine what would happen if my name contained an ampersand or equal sign! The encoding, like other elements, is very simple. It uses a percent sign, followed by a two-digit hex string that defines the ASCII character code for the character in question.
So the string "Martin Brown" would be translated into

    Martin%20Brown

where 20 is the hexadecimal code for ASCII character 32, the space. You may also find that spaces are encoded using a single + sign (the example that follows accounts for both formats).

Armed with all this information, you can use something like the init_cgi function, shown next, to access the information supplied by a browser. The function supports both GET and POST requests:

    sub init_cgi
    {
        my $query  = $ENV{QUERY_STRING};       # get the query string
        my $length = $ENV{CONTENT_LENGTH};     # get the content length
        my (@assign, %formlist);               # create some temporaries

        if (defined($query) and $query =~ /\w+/)   # Check if GET query contains data
        {
            @assign = split('&', $query);      # Extract the field/value pairs
        }
        elsif (defined($length) and $length > 0)   # GET is empty, POST instead
        {
            sysread(STDIN, $_, $length);       # Read in CONTENT_LENGTH bytes
            chomp;
            @assign = split('&');              # Extract the field/value pairs
        }

        foreach (@assign)                      # Now split field/value pairs to hash
        {
            my ($name, $value) = split /=/, $_, 2;   # split on the first = only
            $value = '' unless defined $value;       # field supplied without a value
            $value =~ tr/+/ /;
            $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
            if (defined($formlist{$name}))     # If the field exists, append data
            {
                $formlist{$name} .= ",$value";
            }
            else                               # Otherwise, create new hash key
            {
                $formlist{$name} = $value;
            }
        }
        return %formlist;                      # Return the hash to the caller
    }

The steps are straightforward, and they follow the description. First of all, you access the query string, either by getting the value of the QUERY_STRING environment variable or by reading input up to the length specified in CONTENT_LENGTH from standard input using the sysread function. Note that you must use this method rather than the <STDIN> operator because you want to ensure that you read in the entire contents, irrespective of any line termination. HTML forms provide multiline text entry fields, and using a line input operator could lead to unexpected results.
Also, it's possible to transfer binary information using a POST method, and any form of line processing might produce a garbled response. Finally, sysread acts as a security check, because it reads no more than the number of bytes you ask for. Many "denial of service" attacks (where too much information or too many requests are sent, therefore denying service to other users) prey on the fact that a script accepts an unlimited amount of information while also tricking the server into believing that the query length is small or even unspecified. If you arbitrarily imported all the information provided, you could easily lock up a small server.

Once you have obtained the query string, you split it by an ampersand into the @assign array and then process each field/value pair in turn. For convenience, you place the information into a hash. The keys of the hash become the field names, and the corresponding values become the values as supplied by the browser. The most important trick here is the line

    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

This uses the /e modifier, which evaluates the replacement as Perl code rather than as a string, to decode the %xx characters in the query into their correct values.

To encode the information back into the URL format within your script, the best solution is to use the URI::Escape module by Gisle Aas. This provides a function, uri_escape, for converting a string into its URL-escaped equivalent. You can also use uri_unescape to convert it back. See Appendix D for more information.
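The encoding and decoding steps described above can be sketched as a round trip. This is only an illustration that uses core Perl so it runs without extra modules; the names url_encode and url_decode are hypothetical, the set of characters left unescaped is an assumption, and the sketch assumes byte strings rather than wide characters. In real scripts URI::Escape remains the better choice.

```perl
use strict;
use warnings;

# Hypothetical helper: escape every character outside a conservative
# "safe" set (letters, digits, and - . _ ~) as %XX.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/eg;
    return $text;
}

# Hypothetical helper: reverse the encoding, treating + as a space first,
# exactly as init_cgi does for each field value.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $text;
}

my $encoded = url_encode("Martin Brown & Co=1");
print "$encoded\n";                 # Martin%20Brown%20%26%20Co%3D1
print url_decode($encoded), "\n";   # Martin Brown & Co=1
```

Note that the + translation has to happen before the %xx decoding; otherwise a literal plus sign sent as %2B would be turned into a space.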
Using the above function (init_cgi), you can write a simple Perl script that reports the information provided to it by either method (this uses the init_cgi function shown earlier, which is not repeated here for brevity):

    #!/usr/local/bin/perl -w
    print "Content-type: text/html\n\n";
    my %form = init_cgi();
    print("Form length is: ", scalar keys %form, "<br>\n");
    for my $key (sort keys %form)
    {
        print "Key $key = $form{$key}<br>\n";
    }

If you place this on a server and supply it a URL such as this:

    http://www.mcwords.com/cgi/test.cgi?first=Martin&last=Brown

the browser window reports this back:

    Form length is: 2
    Key first = Martin
    Key last = Brown

Success! Of course, most scripts do other things besides printing the information back. They format the data and send it on in an email, search a database, or perform a myriad of other tasks. What has been demonstrated here is how to extract the information supplied via either method into a suitable hash structure that you can use within Perl. How you use the information depends on what you are trying to achieve.

The process detailed here has been duplicated many times in a number of different modules. The best solution, though, is to use the facilities provided by the standard CGI module. This comes with the standard Perl distribution and should be your first port of call for developing web applications. We'll be taking a closer look at the CGI module in the next chapter.

Sending Information Back to the Browser

Communicating information back to the user is so simple, you'll be looking for ways to make it more complicated. In essence, you print information to STDOUT, and this is then sent back verbatim to the browser.

The actual method is more complex. When a web server responds with a static file, it returns an HTTP header that tells the browser about the file it is about to receive.
The header includes information such as the content length, encoding, and so on. It then sends the actual document back to the browser. The two elements, the header and the document, are separated by a single blank line. How the browser treats the document it receives depends on the information supplied by the HTTP header and the extension of the file it receives. This allows you to send back a binary file (such as an image) directly from a script by telling the browser what data format the file is encoded with.

When using a CGI application, the HTTP header is not automatically attached to the output generated, so you have to generate this information yourself. This is the reason for the

    print "Content-type: text/html\n\n";

lines in the previous examples. This indicates to the browser that it is accepting a file using text encoding in HTML format. There are other fields you can return in the HTTP header, which we'll look at now.

HTTP Headers

The HTTP header information is returned as follows:

    Field: data

HTTP treats the field name as case-insensitive, although the capitalization shown here is conventional, and you can use as much white space as you like between the colon and the field data. A sample list of HTTP header fields is shown in Table 18-2.

The only required field is Content-type, which defines the format of the file you are returning. If you do not specify anything, the browser assumes you are sending back preformatted raw text, not HTML. The file format is defined by a MIME string. MIME is an acronym for Multipurpose Internet Mail Extensions, and a MIME string is a slash-separated string that defines the raw format and a subformat within it. For example, text/html says the information returned is plain text, using HTML as a file format. Mac users will be familiar with the concept of file owners and types, and this is the basic model employed by MIME.
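As a minimal sketch of the "Field: data" layout and the blank line that separates header from document, the following builds a header block from a list of field/value pairs. The helper name http_header is hypothetical and is not part of any module discussed in the text; it follows the earlier examples in ending lines with "\n".

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of field => value pairs into a CGI
# response header, keeping the fields in the order they were supplied.
sub http_header {
    my (@fields) = @_;
    my $header = '';
    while (my ($field, $value) = splice(@fields, 0, 2)) {
        $header .= "$field: $value\n";
    }
    return $header . "\n";    # the blank line ends the header
}

print http_header('Content-type' => 'text/html');
print "<html><body>Hello from a CGI script</body></html>\n";
```

Keeping the header in one place like this makes it harder to forget the blank line, which is a common cause of a script that runs correctly from the command line but fails under the web server.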
Table 18-2. HTTP Header Fields

Field: Allow: list
    Meaning: A comma-delimited list of the HTTP request methods supported by the requested resource (script or program). Scripts generally support GET and POST; other methods include HEAD, PUT, DELETE, LINK, and UNLINK.

Field: Content-encoding: string
    Meaning: The encoding used in the message body. Currently the only supported formats are gzip and compress. If you want to encode data this way, make sure you check the value of HTTP_ACCEPT_ENCODING from the environment variables.

Field: Content-type: string
    Meaning: A MIME string defining the format of the file being returned.

Field: Content-length: string
    Meaning: The length, in bytes, of the data being returned. The browser uses this value to report the estimated download time for a file.

Field: Date: string
    Meaning: The date and time the message is sent. It should be in the format Thu, 01 Jan 1998 12:00:00 GMT. The time zone should be GMT for reference purposes; the browser can calculate the difference for its local time zone if it has to.

Field: Expires: string
    Meaning: The date the information becomes invalid. This should be used by the browser to decide when a page needs to be refreshed.

Field: Last-modified: string
    Meaning: The date of last modification of the resource.

Field: Location: string
    Meaning: The URL that should be returned instead of the URL requested.

Field: MIME-version: string
    Meaning: The version of the MIME protocol supported.

Field: Server: string/string
    Meaning: The web server application and version number.

Field: Title: string
    Meaning: The title of the resource.

Field: URI: string
    Meaning: The URI that should be returned instead of the requested one.

Other examples include application/pdf, which states that the file type is application (and therefore binary) and that the file's format is pdf, the Adobe Acrobat file format. Others you might be familiar with are image/gif, which states that the file is a GIF file, and application/zip, which is a compressed file using the Zip algorithm.
This MIME information is used by the browser to decide how to process the file. Most browsers will have a mapping that says they deal with files of type image/gif so that you can place graphical files within a page. They may also have an entry for application/pdf, which either calls an external application to open the received file or passes the file to a plug-in that optionally displays the file to the user. For example, here's an extract from the file supplied by default with the Apache web server:

    application/mac-binhex40 hqx
    application/mac-compactpro cpt
    application/macwriteii
    application/msword doc
    application/news-message-id
    application/news-transmission
    application/octet-stream bin dms lha lzh exe class
    application/oda oda
    application/pdf pdf
    application/postscript ai eps ps
    application/powerpoint ppt
    application/remote-printing
    application/rtf rtf
    application/slate
    application/wita
    application/wordperfect5.1
    application/x-bcpio bcpio
    application/x-cdlink vcd
    application/x-compress
    application/x-cpio cpio
    application/x-csh csh
    application/x-director dcr dir dxr

It's important to realize the significance of this one, seemingly innocent, field. Without it, your browser would not know how to process the information it receives. Normally the web server sends the MIME type back to the browser, and it uses a lookup table that maps MIME strings to file extensions. Thus, when a browser requests myphoto.gif, the server sends back a Content-type field value of image/gif. Since a script is executed by the server rather than sent back verbatim to the browser, it must supply this information itself.

Other fields in Table 18-2 are optional but also have useful applications. The Location field can be used to automatically redirect a user to an alternative page without using the normal RELOAD directive in an HTML file. The existence of the Location field automatically instructs the browser to load the URL contained in the field's value.
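The server-side lookup from file extension to MIME string mentioned above can be sketched in a few lines. This is an illustration only: the function names parse_mime_types and mime_type_for are hypothetical, the sample data is a handful of lines copied from the extract above, and falling back to application/octet-stream for unknown extensions is an assumption rather than something the text specifies.

```perl
use strict;
use warnings;

# Parse text laid out like the extract above: a MIME string followed by
# zero or more file extensions. Returns a hash of extension => MIME string.
sub parse_mime_types {
    my ($text) = @_;
    my %type_for;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(#|$)/;            # skip comments and blank lines
        my ($mime, @extensions) = split ' ', $line;
        $type_for{lc $_} = $mime for @extensions;
    }
    return %type_for;
}

my %type_for = parse_mime_types(<<'END');
application/msword doc
application/octet-stream bin dms lha lzh exe class
application/pdf pdf
application/postscript ai eps ps
application/slate
END

# Look up the Content-type value for a filename (assumed fallback:
# application/octet-stream when the extension is missing or unknown).
sub mime_type_for {
    my ($filename) = @_;
    my ($extension) = $filename =~ /\.([^.\/]+)$/;
    return 'application/octet-stream' unless defined $extension;
    return $type_for{lc $extension} // 'application/octet-stream';
}

print "Content-type: ", mime_type_for('report.pdf'), "\n\n";
```

A script that returns files of several kinds could use a lookup like this to choose its Content-type field instead of hard-coding text/html.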
Here's another script that uses the earlier init_cgi function and the Location HTTP field to point a user in a different direction:

    %form = init_cgi();
    respond("Error: No URL specified") unless(defined($form{url}));

    open(LOG,">>/usr/local/http/logs/jump.log")
        or respond("Error: A config error has occurred");
    print LOG (scalar(localtime(time)),
               " $ENV{REMOTE_ADDR} $form{url}\n");
    close(LOG)
        or respond("Error: A config error has occurred");

    print "Location: $form{url}\n\n";

    sub respond
    {
        my $message = shift;
        print "Content-type: text/html\n\n";
        show_debug();    # not defined in this extract
        print <<EOF;
    <head>
    <title>$message</title>
    </head>
    <body>
    $message
    </body>
    EOF
        exit;
    }

This is actually a version of a script used on a number of sites I have developed that allows you to keep a log of when a user clicks onto a foreign page. For example, you might have links on a page to another site, and you want to be able to record how many people visit this other site from your page. Instead of using a normal link within your HTML document, you could use the CGI script:

    <a href="/cgi/redirect.pl?url=http://www.mcwords.com">MCwords</a>

Every time users click on this link, they will still visit the new site, but you'll have a record of their leap off of your site.

Document Body

You already know that the document body should be in HTML. To send output, you just print to STDOUT, as you would with any other application. In an ideal world, you should consider using something like the CGI module to help you build the pages correctly. It will certainly remove a lot of clutter from your script, while also providing a higher level of reliability for the HTML you produce. Unfortunately, it doesn't solve any of the problems associated with a poor HTML implementation within a browser.

However, because you just print the information to standard output, you need to take care with errors and other information that might otherwise be sent to STDERR.
You can't use warn or die, because any message produced will not be displayed to the user. While this might be what you want as a web developer (the information is usually recorded in the error log), it is not very user friendly. The solution is to use something like the function shown in the previous redirection example to report an error back to the user. Again, this is an important thing to grasp. There is nothing worse from a user's point of view than this displayed in the browser:

    Internal Server Error
    The server encountered an internal error or misconfiguration and was unable to complete your request.
    Please contact the server administrator, webmaster@mchome.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.

Smarter Web Programming

Up until now, we have been specifically concentrating on the mechanics behind Perl CGI scripts. Although we've seen solutions for certain aspects of the process, there are easier ways of doing things. Since you already know how to obtain information supplied on a web form, we will instead concentrate on the semantics and process for the script contents. In particular, we'll examine the CGI module, web cookies, the debug process, and how to interface to other web-related languages.

The CGI Module

The CGI module started out as a separate module available from CPAN. It's now included as part of the standard distribution and provides a much easier interface to web programming with Perl. As well as providing a mechanism for extracting elements supplied on a form, it also provides an object-oriented interface to building web pages and, more usefully, web forms. You can use this interface either in its object-oriented format or with a simple functional interface.
Along with the standard CGI interface and the functions and object features supporting the production of "good" HTML, the module also supports some of the more advanced features of CGI scripting. These include the support for uploading files via HTTP and access to cookies, something we'll be taking a look at later in this chapter. For the designers among you, the CGI module also supports cascading style sheets and frames. Finally, it supports server push, a technology that allows a server to send new data to a client at periodic intervals. This is useful for pages, and especially images, that need to be updated. This has largely been superseded by the client-side RELOAD directive, but it still has its uses.

For example, you can build a single CGI script for converting Roman numerals into decimal integers using the following script. It not only builds and produces the HTML form, but also provides a method for processing the information supplied when the user fills in and submits the form.

    #!/usr/local/bin/perl -w
    use CGI qw/:standard/;

    print header,
          start_html('Roman Numerals Conversion'),
          h1('Roman Numeral Converter'),
          start_form,
          "What's the Roman Numeral number?",
          textfield('roman'), p,
          submit,
          end_form, p, hr, p;

    if (param())
    {
        print(h3('The value is ', parse_roman(uc(param('roman')))), p, hr);
    }

    sub parse_roman
    [...]

[...] can do. The XML::Parser module provides the basis for extracting XML data; all you need to do is work out what you want to do with those tags and the information they delimit.

[...] This is easy if the query data is simple, but what if the information needs to be escaped because of special characters? In this instance, the easiest thing is to grab a GET-based URL from the browser, ...
[...] line 1

The traditional way of enabling warnings was to use the -w argument on the command line:

    perl -w myscript.pl

You can also supply the option within the "shebang" line:

    #!/usr/local/bin/perl -w

[...]

• Deprecated functions, operators, and variables

[...] But be careful about using command line options on operating systems that restrict the length of the ...

[...] introduces the page title and sets the header and body style. The h1 function formats the supplied text in the header level-one style. The start_form function initiates an HTML form. By default, it assumes you are using the same script (this is an HTML/browser feature rather than a Perl CGI feature), and the textfield function inserts a simple text field. The argument supplied defines the name of the field ...

[...] with the report so that you can start to isolate the problem. In particular, consider stacking up the errors in an array by just using a simple push call, and then call a function right at the end of the script to dump out the date, time, and error log, along with the values of the environment variables. I've used a function similar to the one that follows to dump out the information at the end of the ...

[...] warnings and the strict pragma enabled at all times. This will help to ensure that your scripts are written to as tight a definition of the Perl language as possible, and as such we'll give these two systems extended attention in this chapter. The last part of the chapter deals with the other Perl pragmas. These change the way in which Perl operates, such as by adding additional library directories to the search ...
[...] the values from the fields specified. In the example, there is only one field, roman, which contains the Roman numeral string entered by the user. The parse_roman function then does all the work of parsing the string and translating the Roman numerals into integer values. I'll leave it up to the reader to determine how this function works. This concludes our brief look into the use of the CGI module for ...

    [...] join "\n",@errorlist;
        }
        select $old;
    }

Remember, as well, that any additional modules you need to load when the script initializes will add seconds to the time to start up the script: anything that can be avoided should be avoided. Alternatively, think about using the mod_perl Apache module. This provides an interface between Apache and Perl CGI scripts. One of its major benefits ...

[...] complicated. Perl is not a compiled language in the true sense like C/C++. There is a compilation stage, and before this there is also a parsing stage where the code is checked. All of this happens in the milliseconds before the code is actually executed. Perl also supports run-time errors. These are errors or potential problems that Perl identifies while the code is executing; they include simple warnings like undefined ...

[...] Table 19-2. The switches interact with the $^W variable and the new lexical warnings according to the following rules:

• If no command line switches are supplied, and neither the $^W variable nor the warnings pragma is in force, then default warnings will be enabled, and optional warnings disabled.
• The -w sets the $^W variable as normal.
• If a block makes use of the warnings pragma, both the $^W and ...
[...] executes them within an embedded Perl interpreter that is part of the Apache web server. Additional invocations of the script do not require reloading. They are already loaded, and the Perl interpreter does not need to be invoked for each CGI script. This helps both performance and memory management.

Security

The number of attacks on Internet sites is increasing. Whether this is due to the meteoric rise of the ...

[...] and other selections.

In either case, the creation of a cookie and how you access the information stored in a cookie are server-based requests, since it's the ...

Date posted: 13/08/2014, 22:21
