Perl: The Complete Reference, Second Edition (Part 3)

Chapter 8: Data Manipulation

Most software is written to work with and modify data in one format or another. Perl was originally designed as a system for processing logs and summarizing and reporting on the information. Because of this focus, a large proportion of the functions built into Perl are dedicated to the extraction and recombination of information. For example, Perl includes functions for splitting a line by a sequence of delimiters, and it can recombine the line later using a different set.

If you can't do what you want with the built-in functions, then Perl also provides a mechanism for regular expressions. We can use a regular expression to extract information, as an advanced search and replace tool, or as a transliteration tool for converting or stripping individual characters from a string.

In this chapter, we're going to concentrate on the data-manipulation features built into Perl, from the basics of numerical calculations through to basic string handling. We'll also look at the regular expression mechanism and how it works and integrates into the Perl language.

We'll also take the opportunity to look at the Unicode character system. Unicode is a standard for representing strings that supports not only the ASCII standard, which represents characters by a single byte, but also multibyte characters, including those with accents and those in non-Latin character sets such as Greek and kanji (as used in the Far East).

Working with Numbers

The core numerical ability of Perl is supported through the standard operators that you should be familiar with. For example, all of the following expressions return the sort of values you would expect:

    $result = 3+4;
    $ftoc = (212-32)*(5/9);
    $square = 16**2;

Beyond these basic operators, Perl also supports a number of functions that fill in the gaps. Most of these functions automatically use the value of $_ if you fail to specify a value on which to operate; the exceptions are atan2, which needs both of its arguments, and rand and srand, which have their own defaults.
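As a rough illustration of the operators and the $_ default described above, the following sketch evaluates two of the expressions and then calls sqrt with no argument inside a loop. The variable names and the choice of sqrt are only for illustration; nothing here is specific to the chapter's own examples.

```perl
use strict;
use warnings;

# Basic arithmetic with the standard operators.
my $result = 3 + 4;
my $ftoc   = (212 - 32) * (5 / 9);    # 212 degrees Fahrenheit in Celsius

# Called without an argument, sqrt operates on $_,
# which the for loop sets to each list element in turn.
my @roots;
for (4, 9, 16) {
    push @roots, sqrt;    # same as sqrt($_)
}

printf "%d %.1f %s\n", $result, $ftoc, join(',', @roots);
```

Writing sqrt($_) explicitly is equivalent and often easier to read; the implicit form is mainly a convenience in short loops and map or grep blocks.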
abs: the Absolute Value

When you are concerned only with magnitude (for example, when comparing the size of two objects), the designation of negative or positive is not required. You can use the abs function to return the absolute value of a number:

    print abs(-1.295476);

This should print a value of 1.295476. Supplying a positive value to abs returns the same value unchanged, since a positive value already carries an implied + sign.

int: Converting Floating Points to Integers

To convert a floating point number into an integer, you use the int function:

    print int abs(-1.295476);

This should print a value of 1. The only problem with the int function is that it strictly removes the fractional component of a number; no rounding of any sort is done. If you want to return a number that has been rounded to a number of decimal places, use the printf or sprintf function:

    printf("%.2f",abs(-1.295476));

This will round the number to two decimal places, giving a value of 1.30 in this example. Note that the 0 is appended in the output to show the two decimal places.

exp: Raising e to the Power

To perform a normal exponentiation operation on a number, you use the ** operator:

    $square = 4**2;

This returns 16, or 4 raised to the power of 2. If you want to raise the natural base number e to a power, you need to use the exp function:

    exp EXPR
    exp

If you do not supply an EXPR argument, exp uses the value of the $_ variable as the exponent. For example, to find the square of e:

    $square = exp(2);

sqrt: the Square Root

To get the square root of a number, use the built-in sqrt function:

    $var = sqrt(16384);

To calculate the nth root of a number, use the ** operator with a fractional number.
For example, the following line

    $var = 16384**(1/2);

is identical to

    $var = sqrt(16384);

To find the cube root of 16,777,216, you might use

    $var = 16777216**(1/3);

which should return a value of 256.

log: the Logarithm

To find the logarithm (base e) of a number, you need to use the log function:

    $log = log 1.43;

Trigonometric Functions

There are three built-in trigonometric functions, for calculating the arctangent of Y/X (atan2), the cosine (cos), and the sine (sin) of a value:

    atan2 Y,X
    cos EXPR
    sin EXPR

If you need access to the arcsine, arccosine, and tangent, then use the POSIX module, which supplies the corresponding asin, acos, and tan functions.

Unless you are doing trigonometric calculations, there is little use for these functions in everyday life. However, you can use the sin function to calculate your biorhythms using the simple script shown next, assuming you know the number of days you have been alive:

    use Math::Complex;    # supplies the pi constant

    my ($phys_step, $emot_step, $inte_step) = (23, 28, 33);

    print "Enter the number of days you have been alive:\n";
    my $alive = <STDIN>;
    chomp $alive;

    my $phys = int(sin(((pi*($alive%$phys_step))/($phys_step/2)))*100);
    my $emot = int(sin(((pi*($alive%$emot_step))/($emot_step/2)))*100);
    my $inte = int(sin(((pi*($alive%$inte_step))/($inte_step/2)))*100);
    print "Your Physical is $phys%, Emotional $emot%, Intellectual $inte%\n";

Conversion Between Bases

Perl provides automatic conversion to decimal for numerical literals specified in binary, octal, and hexadecimal. However, the translation is not automatic on values contained within strings, either those defined using string literals or strings imported from the outside world (files, user input, etc.). To convert a string-based literal, use the oct or hex functions. The hex function converts only hexadecimal numbers supplied with or without the 0x prefix.
For example, the decimal value of the hexadecimal string "ff47ace3" (4,282,887,395) can be displayed with either of the following statements:

    print hex("ff47ace3");
    print hex("0xff47ace3");

The hex function doesn't work with other number formats, so for strings that start with 0, 0b, or 0x, you are better off using the oct function. By default, the oct function interprets a string without a prefix as an octal string. So this

    print oct("755");

is valid, but this

    print oct("aef");

will not give a useful result: oct stops at the first character that is not an octal digit, so here it returns 0. If you supply a string using one of the literal formats that provides the necessary prefix, oct will convert it, so all of the following are valid:

    print oct("0755");
    print oct("0x7f");
    print oct("0b00100001");

Both oct and hex default to using the $_ variable if you fail to supply an argument. To print out a decimal value in binary, octal, or hexadecimal, use printf, or use sprintf to write the formatted number to a string:

    printf("%b %o %x", oct("0b00010001"), oct("0755"), oct("0x7f"));

See printf in Chapter 7 for more information.

Conversion Between Characters and Numbers

If you want to insert a specific character into a string by its numerical value, you can use the \0 or \x character escapes:

    print "\007";
    print "\x07";

These examples specify the character by its octal and hexadecimal values; in this case, the "bell" character. Often, though, it is useful to be able to specify a character by its decimal number and to convert the character back to its decimal equivalent in the ASCII table. The chr function returns the character matching the value of EXPR, or $_ if EXPR is not specified. The value is matched against the current character table for the operating system, so values of 128 or higher, which fall outside the standard ASCII range, may produce different characters on different platforms. This may or may not be useful. The ord function returns the numeric value of the first character of EXPR, or $_ if EXPR is not specified.
The value is returned according to the ASCII table and is always unsigned. Thus, using the two functions together,

    print chr(ord('b'));

we should get the character "b".

Random Numbers

Perl provides a built-in random number generator. Like any pseudorandom generator, it starts from a "seed" value, which the algorithm uses to produce its sequence of numbers. The format for the rand function is

    rand EXPR
    rand

The function returns a floating-point random number from 0 up to, but not including, EXPR, or from 0 up to, but not including, 1 if EXPR is not specified. If you want an integer random number, just use the int function to discard the fractional part, as in this example:

    print int(rand(16)),"\n";

You can use the srand function to seed the random number generator with a specific value:

    srand EXPR

The rand function automatically calls the srand function the first time rand is called, if you don't specifically seed the random number generator. In versions of Perl before 5.004, the default seed is simply the value returned by the time function, which returns the number of seconds from the epoch (usually January 1, 1970 UTC, although it's dependent on your platform). The problem is that this is not a good seed number because its value is predictable. Instead, you might want to try a calculation based on a combination of the current time, the current process ID, and perhaps the user ID, to seed the generator with an unpredictable value. I've used the following calculation as a good seed, although it's far from perfect:

    srand((time() ^ (time() % $])) ^ exp(length($0))**$$);

By mixing the unpredictable values of the current time and process ID with predictable values, such as the length of the current script name and the Perl version number, you should get a reasonable seed value.
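The relationship between the seed and the generated sequence can be sketched as follows. The seed value 42 and the array names are arbitrary choices for this illustration: seeding twice with the same value should give the same sequence, which is exactly why a predictable seed is a poor choice when the numbers need to be hard to guess.

```perl
use strict;
use warnings;

# Seeding with a fixed value makes the sequence repeatable,
# which is handy for testing but useless for anything secret.
srand(42);
my @first = map { int(rand(16)) } 1 .. 5;

srand(42);
my @second = map { int(rand(16)) } 1 .. 5;

print "first:  @first\n";
print "second: @second\n";
print "identical sequences\n" if "@first" eq "@second";

# rand(16) is always below 16, so int() yields values from 0 to 15.
my @out_of_range = grep { $_ < 0 || $_ > 15 } @first;
print "all values in range\n" unless @out_of_range;
```

The actual numbers printed depend on the Perl build, so none are shown here; only the fact that the two lists match is expected to hold.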
The following program calculates the number of random numbers generated before a duplicate value is returned:

    my %randres;
    my $counter = 1;
    srand((time() ^ (time() % $])) ^ exp(length($0))**$$);
    while (my $val = rand())
    {
        last if (defined($randres{$val}));
        print "Current count is $counter\n" if (($counter % 10000) == 0);
        $randres{$val} = 1;
        $counter++;
    }
    print "Out of $counter tries I encountered a duplicate random number\n";

Whatever seed value you choose, on platforms where the internal generator has limited precision it is unlikely to give you more than about 500 numbers before a duplicate appears. More importantly, its output can be predicted, which makes it unsuitable for secure purposes, where you need a random number that cannot otherwise be predicted. The Math::TrulyRandom module provides a more robust system for generating random numbers. If you insert the truly_random_value function in place of the rand function in the preceding program, you can see how long it takes before a random number reappears. I've attained 20,574 unique random numbers with this function using that test script, and this should be more than enough for most uses.

Working with Very Small Integers

Perl typically uses 32-bit integers (or 64-bit integers, depending on how it was built) for storing integers and for all of its integer-based math. Occasionally, however, it is necessary to store and handle integers that are smaller than the standard integer size. This is especially true in databases, where you may wish to store a block of Boolean values: even using a single character for each Boolean value will take up eight bits. A better solution is to use the vec function, which supports the storage of multiple integers as strings:

    vec EXPR, OFFSET, BITS

The EXPR is the scalar that will be used to store the information; the OFFSET and BITS arguments define the element of the integer string and the size of each element, respectively. The return value is the integer stored at OFFSET of size BITS from the string EXPR.
The function can also be assigned to, which modifies the value of the element you have specified. For example, using the preceding database example, you might use the following code to populate an "option" string:

    vec($optstring, 0, 1) = $print ? 1 : 0;
    vec($optstring, 1, 1) = $display ? 1 : 0;
    vec($optstring, 2, 1) = $delete ? 1 : 0;
    print length($optstring),"\n";

The print statement at the end of the code displays the length, in bytes, of the string. It should report a size of one byte. We have managed to store three Boolean values within less than one real byte of information.

The BITS argument also allows you to select larger elements: Perl supports values of 1, 2, 4, 8, 16, and 32 bits per element. You can therefore store four 2-bit integers (each holding a value from 0 to 3) in a single byte.

Obviously the vec function is not limited to storing and accessing your own bitstrings; it can be used to extract and update any string, providing you want to modify 1, 2, 4, 8, 16, or 32 bits at a time. Perl also guarantees that the first bit, accessed with

    vec($var, 0, 1);

will always be the first bit in the first character of a string, irrespective of whether your machine is little endian or big endian. Furthermore, this also implies that the first byte of a string can be accessed with

    vec($var, 0, 8);

The vec function is most often used with functions that require bitsets, such as the select function. You'll see examples of this in later chapters.

Little endian machines store the least significant byte of a word in the lower byte address, while big endian machines store the most significant byte at this position. This affects the byte ordering of strings, but doesn't affect the order of bits within those bytes.
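The vec behavior described above can be tried out with a short sketch like the one below. The flag meanings in the comments and the variable names are hypothetical, chosen only to mirror the option-string idea; the point is to show assignment, read-back, and how element width changes what an offset refers to.

```perl
use strict;
use warnings;

my $flags = '';

# Store three one-bit flags at offsets 0, 1, and 2.
vec($flags, 0, 1) = 1;    # e.g. a "print" option
vec($flags, 1, 1) = 0;    # e.g. a "display" option
vec($flags, 2, 1) = 1;    # e.g. a "delete" option

# Read the flags back, then view the same storage as one 8-bit element.
my @bits = map { vec($flags, $_, 1) } 0 .. 2;
my $byte = vec($flags, 0, 8);

printf "length=%d bits=%s byte=%d\n", length($flags), join('', @bits), $byte;

# Four 2-bit integers (values 0 to 3) also fit in a single byte.
my $packed = '';
vec($packed, $_, 2) = $_ for 0 .. 3;
my @values = map { vec($packed, $_, 2) } 0 .. 3;
print "packed length=", length($packed), " values=@values\n";
```

Reading the first byte with a width of 8 is a convenient way to inspect how the individual bits were packed; the string grows automatically whenever an assignment addresses an element beyond its current length.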
Working with Strings

Creating a new string scalar is as easy as assigning a quoted value to a variable:

    $string = "Come grow old along with me\n";

However, unlike C and some other languages, we can't access individual characters by supplying their index location within the string, so we need a function for that. This same limitation also means that we need some solutions for splitting, extracting, and finding characters within a given string.

String Concatenation

We have already seen in Chapter 3 the operators that can be used with strings. The most basic operator that you will need to use is the concatenation operator. This is a direct replacement for the C strcat() function. The problem with the strcat() function is that it is inefficient, and it requires constant concatenation of a single string to a single variable. Within Perl, you can concatenate any string, whether it has been derived from a static quoted string in the script itself or from a string returned by a function. This code fragment:

    $thetime = 'The time is ' . localtime() . "\n";

assigns the string, without interpolation; the time string, as returned by localtime; and the interpolated newline character to the $thetime variable. The concatenation operator is the single period between each element.

It is important to appreciate the difference between using concatenation and lists. This print statement:

    print 'The time is ' . localtime() . "\n";

produces the same result as

    print 'The time is ', scalar(localtime()), "\n";

However, in the first example, the string is concatenated before being printed; in the second, the print function is printing a list of arguments. Note the scalar in the second form: print supplies list context to its arguments, and without scalar, localtime would return its list of numeric time fields instead of the time string. You cannot use the list format to assign a compound string to a scalar; the following line will not work, because only the first string is assigned:

    $string = 'The time is ', localtime(), "\n";

Concatenation is also useful when you want to express a sequence of values as only a single argument to a function. For example:

    $string = join($suffix . ':' . $prefix, @strings);

String Length

The length function returns the length, in characters (rather than bytes), of the supplied string (see the "Unicode" section at the end of this chapter for details on the relationship between bytes and characters). The function accepts only a single argument (or it returns the length of the $_ variable if none is specified):

    print "Your name is ",length($name), " characters long\n";

Case Modifications

There are some simple modifications built into Perl as functions that may be more convenient and quicker than using the regular expressions we will cover later in this chapter. The four basic functions are lc, uc, lcfirst, and ucfirst. They convert a string to all lowercase, all uppercase, or only the first character of the string to lowercase or uppercase, respectively. For example:

    $string = "The Cat Sat on the Mat";
    print lc($string);       # Outputs 'the cat sat on the mat'
    print lcfirst($string);  # Outputs 'the Cat Sat on the Mat'
    print uc($string);       # Outputs 'THE CAT SAT ON THE MAT'
    print ucfirst($string);  # Outputs 'The Cat Sat on the Mat'

These functions can be useful for "normalizing" a string into an all uppercase or lowercase format, which helps when combining and de-duping lists using hashes.

End-of-Line Character Removal

When you read in data from a filehandle using a while or other loop and the <FH> operator, the trailing newline on each line remains in the string that you import. You will often find yourself processing the data contained within each line, and you will not want the newline character. The chop function can be used to strip the last character off any expression:

    while(<FH>)
    {
        chop;
    }

The only danger with the chop function is that it strips the last character from the line, irrespective of what the last character was. The chomp function works in combination with the $/ variable when reading from filehandles.
The $/ variable is the record separator that is attached to the records you read from a filehandle, and it is by default set to the newline character. The chomp function works by removing the last character from a string only if it matches the value of $/. To do a safe strip from a record of the record separator character, just use chomp in place of chop:

    while(<FH>)
    {
        chomp;
    }

This is a much safer option, as it guarantees that the data of a record will remain intact, irrespective of the last character type.

String Location

Within many programming languages, a string is stored as an array of characters. To access an individual character within a string, you need to determine the location of the character within the string and access that element of the array. Perl does not support this option, because often you are not working with the individual characters within the string, but the string as a whole. Two functions, index and rindex, can be used to find the position of a particular character or string of characters within another string:

    index STR, SUBSTR [, POSITION]
    rindex STR, SUBSTR [, POSITION]

The index function returns the first position of SUBSTR within the string STR, or it returns -1 if the string cannot be found. If the POSITION argument is specified, then the search skips that many characters from the start of the string and starts the search at the next character. The rindex function returns the opposite of the index function: the last occurrence [...]

[...] specify the individual elements. The groupings are the elements in standard parentheses, and each one will match (we hope) in sequence, returning a list that has been assigned to the hours, minutes, and seconds variables.

In a scalar context, the /g modifier performs a progressive match. For each execution of the match, Perl starts searching from the point in the search [...]
[...] modifies the statement on the left based on the regular expression on the right. The second operator, !~, is for matches only and is the exact opposite: it returns true only if the value on the left does not match the regular expression on the right. Although often used on their own in combination with the pattern binding operators, regular expressions also appear in two other locations within Perl. When [...]

[...] one-pass process, however. The substitution operation is not put into a loop. For example, in the following substitution we replace "o" with "oo": [...]

[...] The while loop will drop out as soon as the substitution fails to find a double space. The /e modifier causes Perl to evaluate the REPLACEMENT text as if it were a Perl expression, and then to use the value as the replacement string [...]

[...] during the execution of a script; otherwise you may end up with extraneous matches. The /x modifier enables you to introduce white space and comments into an expression for clarity. For example, the following match expression looks suspiciously like line noise: [...]

[...] Note the terminology here: we are matching the letters "f", "o", and "o" in that sequence, somewhere within the [...]

[...] new LIST, irrespective of the number of elements removed or replaced. The array will shrink or grow as [...]

[...] Like its cousin pop, if ARRAY is not specified, it shifts the first value from the @_ array within a subroutine, or the first command line argument stored in @ARGV otherwise. The opposite is unshift, which places new elements at the start of the array: [...]
[...] is passed, such that [...] for the match against the supplied list. Using grep with a regular expression is similar in principle to using a standard match within the confines of a loop. The statements on the right side of the two test and assignment operators must be regular expression operators. There are three regular expression operators within Perl: m// (match), s/// [...]

[...] string just past the last match. You can use this to progress through an array searching for the same string without having to remove or manually set the starting position of the search. The position of the last match can be used within a regular expression using the \G assertion. When /g fails to match, the position is reset to the start of the string. If you use the /c modifier as well, then the position [...]

[...] The PATTERN is the regular expression for the text that we are looking for. The REPLACEMENT is a specification for the [...] use to replace the found text with. For example, you may remember from the substr definition earlier in the chapter that you could replace a specific number of characters within a string by using assignment: [...]

    [...] s#(\d+)/(\d+)/(\d+)#$3$1$2#;

This example also demonstrates the fact that you can use delimiters other than the forward slash for substitutions too. Just like the match operator, the character used is the one immediately following the "s". Alternatively, if you specify a naturally paired delimiter, such as a brace, then the replacement expression can have its own pair of delimiters:

    $date =~ s{(\d+)/(\d+)/(\d+)}{$3$1$2}x;

Note that the return value [...]
