sed & awk, 2nd Edition


By Dale Dougherty & Arnold Robbins; ISBN 1-56592-225-5, 432 pages.
Second Edition, March 1997.

Table of Contents

Preface
Chapter 1: Power Tools for Editing
Chapter 2: Understanding Basic Operations
Chapter 3: Understanding Regular Expression Syntax
Chapter 4: Writing sed Scripts
Chapter 5: Basic sed Commands
Chapter 6: Advanced sed Commands
Chapter 7: Writing Scripts for awk
Chapter 8: Conditionals, Loops, and Arrays
Chapter 9: Functions
Chapter 10: The Bottom Drawer
Chapter 11: A Flock of awks
Chapter 12: Full-Featured Applications
Chapter 13: A Miscellany of Scripts
Appendix A: Quick Reference for sed
Appendix B: Quick Reference for awk
Appendix C: Supplement for Chapter 12

Copyright © 2000 O'Reilly & Associates. All Rights Reserved.

Preface

Contents:
Scope of This Handbook
Availability of sed and awk
Obtaining Example Source Code
Conventions Used in This Handbook
About the Second Edition
Acknowledgments from the First Edition
Comments and Questions

This book is about a set of oddly named UNIX utilities, sed and awk. These utilities have many things in common, including the use of regular expressions for pattern matching. Since pattern matching is such an important part of their use, this book explains UNIX regular expression syntax very thoroughly. Because there is a natural progression in learning from grep to sed to awk, we will be covering all three programs, although the focus is on sed and awk.

Sed and awk are tools used by users, programmers, and system administrators - anyone working with text files. Sed, so called because it is a stream editor, is perfect for applying a series of edits to a number of files. Awk, named after its developers Aho, Weinberger, and Kernighan, is a programming language that permits easy manipulation of structured data and the generation of formatted reports.

This book emphasizes the POSIX
definition of awk. In addition, the book briefly describes the original version of awk, before discussing three freely available versions of awk and two commercial ones, all of which implement POSIX awk.

The focus of this book is on writing scripts for sed and awk that quickly solve an assortment of problems for the user. Many of these scripts could be called "quick-fixes." In addition, we'll cover scripts that solve larger problems that require more careful design and development.

Scope of This Handbook

Chapter 1, Power Tools for Editing, is an overview of the features and capabilities of sed and awk.

Chapter 2, Understanding Basic Operations, demonstrates the basic operations of sed and awk, showing a progression in functionality from sed to awk. Both share a similar command-line syntax, accepting user instructions in the form of a script.

Chapter 3, Understanding Regular Expression Syntax, describes UNIX regular expression syntax in full detail. New users are often intimidated by these strange expressions, used for pattern matching. It is important to master regular expression syntax to get the most from sed and awk. The pattern-matching examples in this chapter largely rely on grep and egrep.

Chapter 4, Writing sed Scripts, begins a three-chapter section on sed. This chapter covers the basic elements of writing a sed script using only a few sed commands. It also presents a shell script that simplifies invoking sed scripts.

Chapter 5, Basic sed Commands, and Chapter 6, Advanced sed Commands, divide the sed command set into basic and advanced commands. The basic commands are commands that parallel manual editing actions, while the advanced commands introduce simple programming capabilities. Among the advanced commands are those that manipulate the hold space, a set-aside temporary buffer.

Chapter 7, Writing Scripts for awk, begins a five-chapter section on awk. This chapter presents the primary features of this scripting language. A number of scripts are explained, including one
that modifies the output of the ls command.

Chapter 8, Conditionals, Loops, and Arrays, describes how to use common programming constructs such as conditionals, loops, and arrays.

Chapter 9, Functions, describes how to use awk's built-in functions as well as how to write user-defined functions.

Chapter 10, The Bottom Drawer, covers a set of miscellaneous awk topics. It describes how to execute UNIX commands from an awk script and how to direct output to files and pipes. It then offers some (meager) advice on debugging awk scripts.

Chapter 11, A Flock of awks, describes the original V7 version of awk, the current Bell Labs awk, GNU awk (gawk) from the Free Software Foundation, and mawk, by Michael Brennan. The latter three all have freely available source code. This chapter also describes two commercial implementations, MKS awk and Thomson Automation awk (tawk), as well as VSAwk, which brings awk-like capabilities to the Visual Basic environment.

Chapter 12, Full-Featured Applications, presents two longer, more complex awk scripts that together demonstrate nearly all the features of the language. The first script is an interactive spelling checker. The second script processes and formats the index for a book or a master index for a set of books.

Chapter 13, A Miscellany of Scripts, presents a number of user-contributed scripts that show different styles and techniques of writing scripts for sed and awk.

Appendix A, Quick Reference for sed, is a quick reference describing sed's commands and command-line options.

Appendix B, Quick Reference for awk, is a quick reference to awk's command-line options and a full description of its scripting language.

Appendix C, Supplement for Chapter 12, presents the full listings for the spellcheck.awk script and the masterindex shell script described in Chapter 12.

Index: Symbols and Numbers

& (ampersand)
    && (logical AND) operator: 7.8 Relational and Boolean Operators
    in replacement text
        5.3 Substitution
        5.3.1 Replacement Metacharacters
* (asterisk)
    ** (exponentiation) operator: 7.6 Expressions
    **= (assignment) operator: 7.6 Expressions
    *= (assignment) operator: 7.6 Expressions
    as metacharacter
        3.1 That's an Expression
        3.2.5 Repeated Occurrences of a Character
    multiplication operator: 7.6 Expressions
\ (backslash)
    7.6 Expressions
    (see also escape sequences, awk)
    \<, \> escape sequences
        3.2.11 What's the Word? Part II
        11.2.3.4 Extended regular expressions
    \`, \' escape sequences: 11.2.3.4 Extended regular expressions
    character classes and: 3.2.4 Character Classes
    as metacharacter
        3.2 A Line-Up of Characters
        3.2.1 The Ubiquitous Backslash
    in replacement text
        5.3 Substitution
        5.3.1 Replacement Metacharacters
{} (braces)
    \{\} metacharacters
        3.2 A Line-Up of Characters
        3.2.8 A Span of Characters
    in awk
        2.1 Awk, by Sed and Grep, out of Ed
        2.4.1 Running awk
        8.1 Conditional Statements
    grouping sed commands in
        4.2.1 Grouping Commands
        5.1 About the Syntax of sed Commands
[] (brackets) metacharacters
    3.2 A Line-Up of Characters
    3.2.4 Character Classes
    [::] metacharacters: 3.2.4.3 POSIX character class additions
    [..] metacharacters: 3.2.4.3 POSIX character class additions
    [==] metacharacters: 3.2.4.3 POSIX character class additions
^ (circumflex)
    ^= (assignment) operator: 7.6 Expressions
    character classes and
        3.2 A Line-Up of Characters
        3.2.4.2 Excluding a class of characters
    exponentiation operator: 7.6 Expressions
    as metacharacter
        3.2 A Line-Up of Characters
        3.2.7 Positional Metacharacters
    in multiline pattern space: 6.1.1 Append Next Line
: (colon)
    for labels: 6.4 Advanced Flow Control Commands
$ (dollar sign)
    as end-of-line metacharacter
        3.2 A Line-Up of Characters
        3.2.7 Positional Metacharacters
    for last input line: 4.2 A Global Perspective on Addressing
    in multiline pattern space: 6.1.1 Append Next Line
    $0, $1, $2, ...
        2.4.1 Running awk
        7.5.1 Referencing and Separating Fields
. (dot) metacharacter
    3.1 That's an Expression
    3.2.2 A Wildcard
    3.2.5 Repeated Occurrences of a Character
= (equal sign)
    == (equal to) operator: 7.8 Relational and Boolean Operators
    for printing line numbers: 5.9 Print Line Number
! (exclamation point)
    4.2 A Global Perspective on Addressing
    A.2.1 Pattern Addressing
    != (not equal to) operator: 7.8 Relational and Boolean Operators
    !~ (does not match) operator
        7.5.1 Referencing and Separating Fields
        7.8 Relational and Boolean Operators
    branch command versus: 6.4.1 Branching
    csh and: 1.4 Four Hurdles to Mastering sed and awk
    logical NOT operator: 7.8 Relational and Boolean Operators
> (greater than sign)
    >= (greater than or equal to) operator: 7.8 Relational and Boolean Operators
    for redirection
        2.3.2.1 Saving output
        4.3 Testing and Saving Output
        10.5 Directing Output to Files and Pipes
    relational operator: 7.8 Relational and Boolean Operators
- (hyphen)
    -= (assignment) operator: 7.6 Expressions
    -- (decrement) operator: 7.6 Expressions
    character classes and: 3.2.4.1 A range of characters
    subtraction operator: 7.6 Expressions
< (less than sign)

C.2 Listing of masterindex Shell Script

                { print $0 ":" volume }
            ' volume=$romaNum $x >>/tmp/index$$
            sectNumber=`expr $sectNumber + 1`
        else
            awk '-F\t' '
                NR == 1 { split(namelist, names, ","); volname = names[volume] }
                NF == 1 { print $0 }
                NF > 1 { print $0 ":" volname }
            ' volume=$sectNumber namelist=$sectNames $x >>/tmp/index$$
            sectNumber=`expr $sectNumber + 1`
        fi
        done
        FILES="/tmp/index$$"
    fi
    # -page: list the entries page by page and stop.
    if [ "$PAGE" != "" ]; then
        $INDEXDIR/page.idx $FILES
        exit
    fi
    # Normal run: sort the entries and pass them through the index modules.
    $INDEXDIR/input.idx $FILES |
    sort -bdf -t: +0 -1 +1 -2 +3 -4 +2n -3n | uniq |
    $INDEXDIR/pagenums.idx |
    $INDEXDIR/combine.idx |
    $INDEXDIR/format.idx FMT=$FORMAT MACDIR=$INDEXMACDIR
    # Remove the temporary master-index file if one was written.
    if [ -s "/tmp/index$$" ]; then
        rm /tmp/index$$
    fi

Appendix C: Supplement for Chapter 12

C.3 Documentation for masterindex

This documentation, and the notes that follow, are by Dale Dougherty.

C.3.1 masterindex

indexing program for single and multivolume indexing

Synopsis

    masterindex [-master [volume]] [-page] [-screen] [filename ...]

Description

masterindex generates a formatted index based on structured index entries output by troff. Unless you redirect output, it comes to the screen.

Options

-m or -master
    indicates that you are compiling a multivolume index. The index entries for each volume should be in a single file and the filenames should be listed in sequence. If the first file is not the first volume, then specify the volume number as a separate argument. The volume number is converted to a roman numeral and prepended to all the page numbers of entries in that file.

-p or -page
    produces a listing of index entries for each page number. It can be used to proof the entries against hardcopy.

-s or -screen
    specifies that the unformatted index will be viewed on the "screen". The default is to prepare output that contains troff macros for formatting.

Files

    /work/bin/masterindex
    /work/bin/page.idx
    /work/bin/pagenums.idx
    /work/bin/combine.idx
    /work/bin/format.idx
    /work/bin/rotate.idx
    /work/bin/romanum
    /work/macros/current/indexmacs

See Also

Note that these programs require "nawk" (new awk): nawk (1), and sed (1V).

Bugs

The new index program is modular, invoking a series of smaller programs. This should allow me to connect different modules to implement new features as well as isolate and fix problems more easily.

Index entries should not contain any troff font changes. The program does not handle them.

Roman numerals greater than eight will not be sorted properly, thus imposing a limit of an eight-book index. (The sort program will sort the roman numerals 1-10 in the following order: I, II, III, IV, IX, V, VI, VII, VIII, X.)
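The sorting limitation noted under Bugs is easy to reproduce. The following is a minimal sketch and not part of masterindex: the function name `sort_romans` is hypothetical, and `LC_ALL=C` is an assumption made so that sort compares plain byte values on any system.

```shell
#!/bin/sh
# Hypothetical helper (not part of masterindex): sort roman numerals the
# way a plain lexical sort does, one numeral per input line.
sort_romans() {
    LC_ALL=C sort
}

# Feed in the numerals for volumes 1-10 in numeric order and print the
# sorted result on a single line.
printf '%s\n' I II III IV V VI VII VIII IX X | sort_romans | tr '\n' ' '
echo
```

Because the comparison is character by character, IX lands ahead of V, which is why a ninth volume would sort in the wrong place.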
C.3.2 Background Details

Tim O'Reilly recommends The Joy of Cooking (JofC) index as an ideal index. I examined the JofC index quite thoroughly and set out to write a new indexing program that duplicated its features. I did not wholly duplicate the JofC format, but this could be done fairly easily if desired. Please look at the JofC index yourself to examine its features.

I also tried to do a few other things to improve on the previous index program and provide more support for the person coding the index.

C.3.3 Coding Index Entries

This section describes the coding of index entries in the document file. We use the .XX macro for placing index entries in a file. The simplest case is:

    .XX "entry"

If the entry consists of primary and secondary sort keys, then we can code it as:

    .XX "primary, secondary"

A comma delimits the two keys. We also have a .XN macro for generating "See" references without a page number. It is specified as:

    .XN "entry (See anotherEntry)"

While these coding forms continue to work as they have, masterindex provides greater flexibility by allowing three levels of keys: primary, secondary, and tertiary. You'd specify the entry like so:

    .XX "primary: secondary; tertiary"

Note that the comma is not used as a delimiter. A colon delimits the primary and secondary entry; the semicolon delimits the secondary and tertiary entry. This means that commas can be a part of a key using this syntax. Don't worry, though, you can continue to use a comma to delimit the primary and secondary keys. (Be aware that the first comma in a line is converted to a colon, if no colon delimiter is found.)
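The comma fallback described in the parenthetical above can be sketched with a one-line sed script. This is an illustration only: the function name `normalize_entry` is hypothetical, and the code is not taken from the masterindex modules, whose handling of this step is not shown in this excerpt.

```shell
#!/bin/sh
# Hypothetical sketch of the comma fallback, not the masterindex code.
# On lines that contain no colon, turn the first comma into a colon;
# lines that already have a colon pass through unchanged.
normalize_entry() {
    sed '/:/!s/,/:/'
}

printf '%s\n' \
    'line justification, definition of' \
    'justification: lines, defined' | normalize_entry
```

The address `/:/!` restricts the substitution to lines without a colon, and the absence of the `g` flag limits it to the first comma, so any later commas remain part of the key.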
I'd recommend that new books be coded using the above syntax, even if you are only specifying a primary and secondary key.

Another feature is automatic rotation of primary and secondary keys if a tilde (~) is used as the delimiter. So the following entry:

    .XX "cat~command"

is equivalent to the following two entries:

    .XX "cat command"
    .XX "command: cat"

You can think of the secondary key as a classification (command, attribute, function, etc.) of the primary entry. Be careful not to reverse the two, as "command cat" does not make much sense. To use a tilde in an entry, enter "~~".

I added a new macro, .XB, that is the same as .XX except that the page number for this index entry will be output in bold to indicate that it is the most significant page number in a range. Here is an example:

    .XB "cat command"

When troff processes the index entries, it outputs the page number followed by an asterisk. This is how it appears when output is seen in screen format. When coded for troff formatting, the page number is surrounded by the bold font change escape sequences. (By the way, in the JofC index, I noticed that they allowed having the same page number in roman and in bold.)
Also, this page number will not be combined in a range of consecutive numbers.

One other feature of the JofC index is that the very first secondary key appears on the same line with the primary key. The old index program placed any secondary key on the next line. The one advantage of doing it the JofC way is that entries containing only one secondary key will be output on the same line and look much better. Thus, you'd have "line justification, definition of" rather than having "definition of" indented on the next line. The next secondary key would be indented. Note that if the primary key exists as a separate entry (it has page numbers associated with it), the page references for the primary key will be output on the same line and the first secondary entry will be output on the next line.

To reiterate, while the syntax of the three-level entries is different, this index entry is perfectly valid:

    .XX "line justification, definition of"

It also produces the same result as:

    .XX "line justification: definition of"

(The colon disappears in the output.)
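One way to see why the comma form and the colon form give the same result is to split each entry into its keys. The awk filter below is a sketch under stated assumptions, not the masterindex implementation: the function name `split_keys` and the tab-separated output format are both hypothetical choices made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch, not the masterindex code: split an index entry into
# primary, secondary, and tertiary keys and print them tab-separated.
split_keys() {
    awk '
    {
        entry = $0
        # Comma fallback: with no colon present, the first comma
        # delimits the primary and secondary keys.
        if (entry !~ /:/)
            sub(/,/, ":", entry)

        primary = entry; secondary = ""; tertiary = ""
        if ((i = index(entry, ":")) > 0) {
            primary = substr(entry, 1, i - 1)
            secondary = substr(entry, i + 1)
            # A semicolon separates the secondary and tertiary keys.
            if ((j = index(secondary, ";")) > 0) {
                tertiary = substr(secondary, j + 1)
                secondary = substr(secondary, 1, j - 1)
            }
        }
        # Trim blanks left around each key.
        gsub(/^ +| +$/, "", primary)
        gsub(/^ +| +$/, "", secondary)
        gsub(/^ +| +$/, "", tertiary)
        printf "%s\t%s\t%s\n", primary, secondary, tertiary
    }'
}

printf '%s\n' \
    'line justification, definition of' \
    'line justification: definition of' \
    'justification: lines; defined' | split_keys
```

The first two input lines produce identical key sets, and only the third yields a non-empty tertiary key.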
Similarly, you could write an entry, such as

    .XX "justification, lines, defined"

or

    .XX "justification: lines, defined"

where the comma between "lines" and "defined" does not serve as a delimiter but is part of the secondary key. The previous example could be written as an entry with three levels:

    .XX "justification: lines; defined"

where the semicolon delimits the tertiary key. The semicolon is output with the key, and multiple tertiary keys may follow immediately after the secondary key.

The main thing, though, is that page numbers are collected for all primary, secondary, and tertiary keys. Thus, you could have output such as:

    justification 4-9
        lines 4,6; defined,

C.3.4 Output Format

One thing I wanted to do that our previous program did not is generate an index without the troff codes. masterindex has three output modes: troff, screen, and page.

The default output is intended for processing by troff (via fmt). It contains macros that are defined in /work/macros/current/indexmacs. These macros should produce the same index format as before, which was largely done directly through troff requests. Here are a few lines off the top:

    $ masterindex ch01
    .so /work/macros/current/indexmacs
    .Se "" "Index"
    .XC
    .XF A "A"
    .XF 1 "applications, structure of 2; program 1"
    .XF 1 "attribute, WIN_CONSUME_KBD_EVENTS 13"
    .XF 2 "WIN_CONSUME_PICK_EVENTS 13"
    .XF 2 "WIN_NOTIFY_EVENT_PROC 13"
    .XF 2 "XV_ERROR_PROC 14"
    .XF 2 "XV_INIT_ARGC_PTR_ARGV 5,6"

The top two lines should be obvious. The .XC macro produces multicolumn output. (It will print out two columns for smaller books. It's not smart enough to take arguments specifying the width of columns, but that should be done.)
The .XF macro has three possible values for its first argument. An "A" indicates that the second argument is a letter of the alphabet that should be output as a divider. A "1" indicates that the second argument contains a primary entry. A "2" indicates that the entry begins with a secondary entry, which is indented.

When invoked with the -s argument, the program prepares the index for viewing on the screen (or printing as an ASCII file). Again, here are a few lines:

    $ masterindex -s ch01
    A
    applications, structure of 2; program 1
    attribute, WIN_CONSUME_KBD_EVENTS 13
        WIN_CONSUME_PICK_EVENTS 13
        WIN_NOTIFY_EVENT_PROC 13
        XV_ERROR_PROC 14
        XV_INIT_ARGC_PTR_ARGV 5,6
        XV_INIT_ARGS
        XV_USAGE_PROC

Obviously, this is useful for quickly proofing the index. The third type of format is also used for proofing the index. Invoked using -p, it provides a page-by-page listing of the index entries.

    $ masterindex -p ch01
    Page
        structure of XView applications
        applications, structure of; program
        XView applications
        XView applications, structure of
        XView interface
        compiling XView programs
        XView, compiling programs
    Page
        XView libraries

C.3.5 Compiling a Master Index

A multivolume master index is invoked by specifying the -m option. Each set of index entries for a particular volume must be placed in a separate file.

    $ masterindex -m -s book1 book2 book3
    xv_init() procedure II: 4; III:
    XV_INIT_ARGC_PTR_ARGV attribute II: 5,6
    XV_INIT_ARGS attribute I:

Files must be specified in consecutive order. If the first file is not Volume 1, you can specify the number as an argument.

    $ masterindex -m 4 -s book4 book5
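The volume-tagging step that the masterindex listing performs for a multivolume index can be tried in isolation. The sketch below is modeled on the awk rules in that listing but is not the script itself: the function name `tag_volume` is hypothetical, the sample input (an entry, a tab, and a page number) is an assumed format, and `-v` is used to pass the label where the listing uses a trailing `volume=` assignment.

```shell
#!/bin/sh
# Hypothetical sketch modeled on the awk step in the masterindex listing:
# append a volume label to every tab-separated line that has more than
# one field, and pass single-field lines through unchanged.
tag_volume() {
    # $1 is the volume label, for example a roman numeral.
    awk -F'\t' -v volume="$1" '
        NF == 1 { print $0 }
        NF > 1  { print $0 ":" volume }
    '
}

# Assumed input format: entry text, a tab, then a page number.
printf 'XView applications\t4\n' | tag_volume II
```

Tagging each line before the files are merged lets a single sort handle all volumes together, which is also why the sort order of the roman numerals matters.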

Date posted: 12/03/2019, 16:39

Contents

  • O'Reilly sed & awk, 2nd Edition

  • Preface

    • [Preface] Comments and Questions

    • [Preface] Acknowledgments from the First Edition

    • [Preface] About the Second Edition

    • [Preface] Conventions Used in This Handbook

    • [Preface] Obtaining Example Source Code

    • [Preface] Availability of sed and awk

    • Index

    • [Chapter 1] Power Tools for Editing

    • [Chapter 1] 1.2 A Stream Editor

    • [Chapter 1] 1.3 A Pattern-Matching Programming Language

    • [Chapter 1] 1.4 Four Hurdles to Mastering sed and awk

    • [Chapter 2] Understanding Basic Operations

    • [Chapter 2] 2.2 Command-Line Syntax

    • [Chapter 2] 2.3 Using sed

    • [Chapter 2] 2.4 Using awk

    • [Chapter 2] 2.5 Using sed and awk Together

    • [Chapter 3] Understanding Regular Expression Syntax

    • [Chapter 3] 3.2 A Line-Up of Characters

    • [Chapter 3] 3.3 I Never Metacharacter I Didn't Like
