aboutsummaryrefslogtreecommitdiffstats
path: root/gnu/usr.bin/awk
diff options
context:
space:
mode:
authorsvn2git <svn2git@FreeBSD.org>1994-05-01 00:00:00 -0800
committersvn2git <svn2git@FreeBSD.org>1994-05-01 00:00:00 -0800
commita16f65c7d117419bd266c28a1901ef129a337569 (patch)
tree2626602f66dc3551e7a7c7bc9ad763c3bc7ab40a /gnu/usr.bin/awk
parent8503f4f13f77abf7adc8f7e329c6f9c1d52b6a20 (diff)
downloadsrc-a16f65c7d117419bd266c28a1901ef129a337569.tar.gz
src-a16f65c7d117419bd266c28a1901ef129a337569.zip
Release FreeBSD 1.1release/1.1.0_cvs
This commit was manufactured to restore the state of the 1.1-RELEASE image. Releases prior to 5.3-RELEASE are omitting the secure/ and crypto/ subdirs.
Diffstat (limited to 'gnu/usr.bin/awk')
-rw-r--r--gnu/usr.bin/awk/ACKNOWLEDGMENT21
-rw-r--r--gnu/usr.bin/awk/COPYING340
-rw-r--r--gnu/usr.bin/awk/FUTURES120
-rw-r--r--gnu/usr.bin/awk/LIMITATIONS14
-rw-r--r--gnu/usr.bin/awk/Makefile13
-rw-r--r--gnu/usr.bin/awk/NEWS1295
-rw-r--r--gnu/usr.bin/awk/PORTS32
-rw-r--r--gnu/usr.bin/awk/POSIX95
-rw-r--r--gnu/usr.bin/awk/PROBLEMS6
-rw-r--r--gnu/usr.bin/awk/README116
-rw-r--r--gnu/usr.bin/awk/array.c293
-rw-r--r--gnu/usr.bin/awk/awk.11873
-rw-r--r--gnu/usr.bin/awk/awk.h763
-rw-r--r--gnu/usr.bin/awk/awk.y1804
-rw-r--r--gnu/usr.bin/awk/builtin.c1133
-rw-r--r--gnu/usr.bin/awk/config.h272
-rw-r--r--gnu/usr.bin/awk/dfa.c2291
-rw-r--r--gnu/usr.bin/awk/dfa.h543
-rw-r--r--gnu/usr.bin/awk/eval.c1225
-rw-r--r--gnu/usr.bin/awk/field.c645
-rw-r--r--gnu/usr.bin/awk/gawk.texi11270
-rw-r--r--gnu/usr.bin/awk/getopt.c662
-rw-r--r--gnu/usr.bin/awk/getopt.h128
-rw-r--r--gnu/usr.bin/awk/getopt1.c160
-rw-r--r--gnu/usr.bin/awk/io.c1207
-rw-r--r--gnu/usr.bin/awk/iop.c318
-rw-r--r--gnu/usr.bin/awk/main.c731
-rw-r--r--gnu/usr.bin/awk/msg.c106
-rw-r--r--gnu/usr.bin/awk/node.c429
-rw-r--r--gnu/usr.bin/awk/patchlevel.h1
-rw-r--r--gnu/usr.bin/awk/protos.h115
-rw-r--r--gnu/usr.bin/awk/re.c208
-rw-r--r--gnu/usr.bin/awk/regex.c2854
-rw-r--r--gnu/usr.bin/awk/regex.h260
-rw-r--r--gnu/usr.bin/awk/version.c46
35 files changed, 31389 insertions, 0 deletions
diff --git a/gnu/usr.bin/awk/ACKNOWLEDGMENT b/gnu/usr.bin/awk/ACKNOWLEDGMENT
new file mode 100644
index 000000000000..b6c3b0b0c692
--- /dev/null
+++ b/gnu/usr.bin/awk/ACKNOWLEDGMENT
@@ -0,0 +1,21 @@
+The current developers of Gawk would like to thank and acknowledge the
+many people who have contributed to the development through bug reports
+and fixes and suggestions. Unfortunately, we have not been organized
+enough to keep track of all the names -- for that we apologize.
+
+Another group of people have assisted even more by porting Gawk to new
+platforms and providing a great deal of feedback. They are:
+
+ Hal Peterson <hrp@pecan.cray.com> (Cray)
+ Pat Rankin <gawk.rankin@EQL.Caltech.Edu> (VMS)
+ Michal Jaegermann <NTOMCZAK@vm.ucs.UAlberta.CA> (Atari, NeXT, DEC 3100)
+ Mike Lijewski <mjlx@eagle.cnsf.cornell.edu> (IBM RS6000)
+ Scott Deifik <scottd@amgen.com> (MSDOS 2.14)
+ Kent Williams (MSDOS 2.11)
+ Conrad Kwok (MSDOS earlier versions)
+ Scott Garfinkle (MSDOS earlier versions)
+
+Last, but far from least, we would like to thank Brian Kernighan who
+has helped to clear up many dark corners of the language and provided a
+restraining touch when we have been overly tempted by "feeping
+creaturism".
diff --git a/gnu/usr.bin/awk/COPYING b/gnu/usr.bin/awk/COPYING
new file mode 100644
index 000000000000..3358a7be862a
--- /dev/null
+++ b/gnu/usr.bin/awk/COPYING
@@ -0,0 +1,340 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+ 675 Mass Ave, Cambridge, MA 02139, USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ GNU GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+ 1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+ a) You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b) You must cause any work that you distribute or publish, that in
+ whole or in part contains or is derived from the Program or any
+ part thereof, to be licensed as a whole at no charge to all third
+ parties under the terms of this License.
+
+ c) If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display an
+ announcement including an appropriate copyright notice and a
+ notice that there is no warranty (or else, saying that you provide
+ a warranty) and that users may redistribute the program under
+ these conditions, and telling the user how to view a copy of this
+ License. (Exception: if the Program itself is interactive but
+ does not normally print such an announcement, your work based on
+ the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+ a) Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of Sections
+ 1 and 2 above on a medium customarily used for software interchange; or,
+
+ b) Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a medium
+ customarily used for software interchange; or,
+
+ c) Accompany it with the information you received as to the offer
+ to distribute corresponding source code. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form with such
+ an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+ 9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+ 10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+ Appendix: How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) 19yy <name of author>
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) 19yy name of author
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+ <signature of Ty Coon>, 1 April 1989
+ Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Library General
+Public License instead of this License.
+
diff --git a/gnu/usr.bin/awk/FUTURES b/gnu/usr.bin/awk/FUTURES
new file mode 100644
index 000000000000..b09656046b27
--- /dev/null
+++ b/gnu/usr.bin/awk/FUTURES
@@ -0,0 +1,120 @@
+This file lists future projects and enhancements for gawk. Items are listed
+in roughly the order they will be done for a given release. This file is
+mainly for use by the developers to help keep themselves on track, please
+don't bug us too much about schedules or what all this really means.
+
+For 2.16
+========
+David:
+ Move to autoconf-based configure system.
+
+ Allow RS to be a regexp.
+
+ RT variable to hold text of record terminator
+
+ RECLEN variable for fixed length records
+
+ Feedback alloca.s changes to FSF
+
+ Extensible hashing in memory of awk arrays
+
+ Split() with null string as third arg to split up strings
+
+ Analogously, setting FS="" would split the input record into individual
+ characters.
+
+Arnold:
+ Generalize IGNORECASE
+ any value makes it work, not just numeric non-zero
+ make it apply to *all* string comparisons
+
+ Fix FILENAME to have an initial value of "", not "-"
+
+ Clean up code by isolating system-specific functions in separate files.
+
+ Undertake significant directory reorganization.
+
+ Extensive manual cleanup:
+ Use of texinfo 2.0 features
+ Lots more examples
+ Document all of the above.
+
+In 2.17
+=======
+David:
+
+ Incorporate newer dfa.c and regex.c (go to POSIX regexps)
+
+ Make regex + dfa less dependant on gawk header file includes
+
+ General sub functions:
+ edit(line, pat, sub) and gedit(line, pat, sub)
+ that return the substituted strings and allow \1 etc. in the sub string.
+
+Arnold:
+ DBM storage of awk arrays. Try to allow multiple dbm packages
+
+ ? Have strftime() pay attention to the value of ENVIRON["TZ"]
+
+ Additional manual features:
+ Document posix regexps
+ Document use of dbm arrays
+ ? Add an error messages section to the manual
+ ? A section on where gawk is bounded
+ regex
+ i/o
+ sun fp conversions
+
+For 2.18
+========
+
+Arnold:
+ Add chdir and stat built-in functions.
+
+ Add function pointers as valid variable types.
+
+ Add an `ftw' built-in function that takes a function pointer.
+
+David:
+
+ Do an optimization pass over parse tree?
+
+For 2.19 or later:
+==================
+Add variables similar to C's __FILE__ and __LINE__ for better diagnostics
+from within awk programs.
+
+Add an explicit concatenation operator and assignment version.
+
+? Add a switch statement
+
+Add the ability to seek on an open file and retrieve the current file position.
+
+Add lint checking everywhere, including check for use of builtin vars.
+only in new awk.
+
+"restart" keyword
+
+Add |&
+
+Make awk '/foo/' files... run at egrep speeds
+
+Do a reference card
+
+Allow OFMT to be other than a floating point format.
+
+Allow redefining of builtin functions?
+
+Make it faster and smaller.
+
+For 3.x:
+========
+
+Create a gawk compiler?
+
+Create a gawk-to-C translator? (or C++??)
+
+Provide awk profiling and debugging.
+
+
+
diff --git a/gnu/usr.bin/awk/LIMITATIONS b/gnu/usr.bin/awk/LIMITATIONS
new file mode 100644
index 000000000000..5877197aeb55
--- /dev/null
+++ b/gnu/usr.bin/awk/LIMITATIONS
@@ -0,0 +1,14 @@
+This file describes limits of gawk on a Unix system (although it
+is variable even then). Non-Unix systems may have other limits.
+
+# of fields in a record: MAX_INT
+Length of input record: MAX_INT
+Length of output record: unlimited
+Size of a field: MAX_INT
+Size of a printf string: MAX_INT
+Size of a literal string: MAX_INT
+Characters in a character class: 2^(# of bits per byte)
+# of file redirections: unlimited
+# of pipe redirections: min(# of processes per user, # of open files)
+double-precision floating point
+Length of source line: unlimited
diff --git a/gnu/usr.bin/awk/Makefile b/gnu/usr.bin/awk/Makefile
new file mode 100644
index 000000000000..3ef9894d175d
--- /dev/null
+++ b/gnu/usr.bin/awk/Makefile
@@ -0,0 +1,13 @@
+PROG= awk
+SRCS= main.c eval.c builtin.c msg.c iop.c io.c field.c array.c \
+ node.c version.c re.c awk.c regex.c dfa.c \
+ getopt.c getopt1.c
+CFLAGS+= -DGAWK
+LDADD= -lm
+DPADD= ${LIBM}
+CLEANFILES+= awk.c y.tab.h
+
+MAN1= awk.1
+
+.include <bsd.prog.mk>
+.include "../../usr.bin/Makefile.inc"
diff --git a/gnu/usr.bin/awk/NEWS b/gnu/usr.bin/awk/NEWS
new file mode 100644
index 000000000000..6711373d6ea5
--- /dev/null
+++ b/gnu/usr.bin/awk/NEWS
@@ -0,0 +1,1295 @@
+Changes from 2.15.1 to 2.15.2
+---------------------------
+
+Additions to the FUTURES file.
+
+Document undefined order of output when using both standard output
+ and /dev/stdout or any of the /dev output files that gawk emulates in
+ the absence of OS support.
+
+Clean up the distribution generation in Makefile.in: the info files are
+ now included, the distributed files are marked read-only and patched
+ distributions are now unpacked in a directory named with the patch level.
+
+
+Changes from 2.15 to 2.15.1
+---------------------------
+
+Close stdout and stderr before all redirections on program exit. This allows
+ detection of write errors and also fixes the messages test on Solaris 2.x.
+
+Removed YYMAXDEPTH define in awk.y which was limiting the parser stack depth.
+
+Changes to config/bsd44, Makefile.bsd44 and configure to bring it into line
+ with the BSD4.4 release.
+
+Changed Makefile to use prefix, exec_prefix, bindir etc.
+
+make install now installs info files.
+
+make install now sets permissions on installed files.
+
+Make targets added: uninstall, distclean, mostlyclean and realclean.
+
+Added config.h to cleaner and clobber make targets.
+
+Changes to config/{hpux8x,sysv3,sysv4,ultrix41} to deal with alloca().
+
+Change to getopt.h for portability.
+
+Added more special cases to the getpgrp() call.
+
+Added README.ibmrt-aos and config/ibmrt-aos.
+
+Changes from 2.14 to 2.15
+---------------------------
+
+Command-line source can now be mixed with library functions.
+
+ARGIND variable tracks index in ARGV of FILENAME.
+
+GNU style long options in addition to short options.
+
+Plan 9 style special files interpreted by gawk:
+ /dev/pid
+ /dev/ppid
+ /dev/pgrpid
+ /dev/user
+ $1 = getuid
+ $2 = geteuid
+ $3 = getgid
+ $4 = getegid
+ $5 ... $NF = getgroups if supported
+
+ERRNO variable contains error string if getline or close fails.
+
+Very old options -a and -e have gone away.
+
+Inftest has been removed from the default target in test/Makefile -- the
+ results were too machine specific and resulted in too many false alarms.
+
+A README.amiga has been added.
+
+The "too many arguments supplied for format string" warning message is only
+ in effect under the lint option.
+
+Code improvements in dfa.c.
+
+Fixed all reported bugs:
+
+ Writes are checked for failure (such as full filesystem).
+
+ Stopped (at least some) runaway error messages.
+
+ gsub(/^/, "x") does the right thing for $0 of 0, 1, or more length.
+
+ close() on a command being piped to a getline now works properly.
+
+ The input record will no longer be freed upon an explicit close()
+ of the input file.
+
+ A NUL character in FS now works.
+
+ In a substitute, \\& now means a literal backslash followed by what
+ was matched.
+
+ Integer overflow of substring length in substr() is caught.
+
+ An input record without a newline termination is handled properly.
+
+ In io.c, check is against only EMFILE so that system file table
+ is not filled.
+
+ Renamed all files with names longer than 14 characters.
+
+ Escaped characters in regular expressions were being lost when
+ IGNORECASE was used.
+
+ Long source lines were not being handled properly.
+
+ Sourcefiles that ended in a tab but no newline were bombing.
+
+ Patterns that could match zero characters in split() were not working
+ properly.
+
+ The parsedebug option was not working.
+
+ The grammar was being a bit too lenient, allowing some very dubious
+ programs to pass.
+
+ Compilation with DEBUG defined now works.
+
+ A variable read in with getline was not being treated as a potential
+ number.
+
+ Array subscripts were not always of string type.
+
+
+Changes from 2.13.2 to 2.14
+---------------------------
+
+Updated manual!
+
+Added "next file" to skip efficiently to the next input file.
+
+Fixed potential of overflowing buffer in do_sprintf().
+
+Plugged small memory leak in sub_common().
+
+EOF on a redirect is now "sticky" -- it can only be cleared by close()ing
+ the pipe or file.
+
+Now works if used via a #! /bin/gawk line at the top of an executable file
+ when that line ends with whitespace.
+
+Added some checks to the grammar to catch redefinition of builtin functions.
+ This could eventually be the basis for an extension to allow redefining
+ functions, but in the mean time it's a good error catching facility.
+
+Negative integer exponents now work.
+
+Modified do_system() to make sure it had a non-null string to be passed
+ to system(3). Thus, system("") will flush any pending output but not go
+ through the overhead of forking an un-needed shell.
+
+A fix to floating point comparisons so that NaNs compare right on IEEE systems.
+
+Added code to make sure we're not opening directories for reading and such.
+
+Added code to do better diagnoses of weird or null file names.
+
+Allow continue outside of a loop, unless in strict posix mode. Lint option
+ will issue warning.
+
+New missing/strftime.c. There has been one chage that affects gawk. Posix
+ now defines a %V conversion so the vms conversion has been changed to %v.
+ If this version is used with gawk -Wlint and they use %V in a call to
+ strftime, they'll get a warning.
+
+Error messages now conform to GNU standard (I hope).
+
+Changed comparisons to conform to the description found in the file POSIX.
+ This is inconsistent with the current POSIX draft, but that is broken.
+ Hopefully the final POSIX standard will conform to this version.
+ (Alas, this will have to wait for 1003.2b, which will be a revision to
+ the 1003.2 standard. That standard has been frozen with the broken
+ comparison rules.)
+
+The length of a string was a short and now is a size_t.
+
+Updated VMS help.
+
+Added quite a few new tests to the test suite and deleted many due to lack of
+ written releases. Test output is only removed if it is identical to the
+ "good" output.
+
+Fixed a couple of bugs for reference to $0 when $0 is "" -- particularly in
+ a BEGIN block.
+
+Fixed premature freeing in construct "$0 = $0".
+
+Removed the call to wait_any() in gawk_popen(), since on at least some systems,
+ if gawk's input was from a pipe, the predecssor process in the pipe was a
+ child of gawk and this caused a deadlock.
+
+Regexp can (once again) match a newline, if given explicitly.
+
+nextopen() makes sure file name is null terminated.
+
+Fixed VMS pipe simulation. Improved VMS I/O performance.
+
+Catch . used in variable names.
+
+Fixed bug in getline without redirect from a file -- it was quitting after the
+ first EOF, rather than trying the next file.
+
+Fixed bug in treatment of backslash at the end of a string -- it was bombing
+ rather than doing something sensible. It is not clear what this should mean,
+ but for now I issue a warning and take it as a literal backslash.
+
+Moved setting of regexp syntax to before the option parsing in main(), to
+ handle things like -v FS='[.,;]'
+
+Fixed bug when NF is set by user -- fields_arr must be expanded if necessary
+ and "new" fields must be initialized.
+
+Fixed several bugs in [g]sub() for no match found or the match is 0-length.
+
+Fixed bug where in gsub() a pattern anchorred at the beginning would still
+ substitute throughout the string.
+
+make test does not assume the . is in PATH.
+
+Fixed bug when a field beyond the end of the record was requested after
+ $0 was altered (directly or indirectly).
+
+Fixed bug for assignment to field beyond end of record -- the assigned value
+ was not found on subsequent reference to that field.
+
+Fixed bug for FS a regexp and it matches at the end of a record.
+
+Fixed memory leak for an array local to a function.
+
+Fixed hanging of pipe redirection to getline
+
+Fixed coredump on access to $0 inside BEGIN block.
+
+Fixed treatment of RS = "". It now parses the fields correctly and strips
+ leading whitspace from a record if FS is a space.
+
+Fixed faking of /dev/stdin.
+
+Fixed problem with x += x
+
+Use of scalar as array and vice versa is now detected.
+
+IGNORECASE now obeyed for FS (even if FS is a single alphabetic character).
+
+Switch to GPL version 2.
+
+Renamed awk.tab.c to awktab.c for MSDOS and VMS tar programs.
+
+Renamed this file (CHANGES) to NEWS.
+
+Use fmod() instead of modf() and provide FMOD_MISSING #define to undo
+ this change.
+
+Correct the volatile declarations in eval.c.
+
+Avoid errant closing of the file descriptors for stdin, stdout and stderr.
+
+Be more flexible about where semi-colons can occur in programs.
+
+Check for write errors on all output, not just on close().
+
+Eliminate the need for missing/{strtol.c,vprintf.c}.
+
+Use GNU getopt and eliminate missing/getopt.c.
+
+More "lint" checking.
+
+
+Changes from 2.13.1 to 2.13.2
+-----------------------------
+
+Toward conformity with GNU standards, configure is a link to mkconf, the latter
+ to disappear in the next major release.
+
+Update to config/bsd43.
+
+Added config/apollo, config/msc60, config/cray2-50, config/interactive2.2
+
+sgi33.cc added for compilation using cc ratther than gcc.
+
+Ultrix41 now propagates to config.h properly -- as part of a general
+ mechanism in configure for kludges -- #define anything from a config file
+ just gets tacked onto the end of config.h -- to be used sparingly.
+
+Got rid of an unnecessary and troublesome declaration of vprintf().
+
+Small improvement in locality of error messages.
+
+Try to diagnose use of array as scalar and vice versa -- to be improved in
+ the future.
+
+Fix for last bug fix for Cray division code--sigh.
+
+More changes to test suite to explicitly use sh. Also get rid of
+ a few generated files.
+
+Fixed off-by-one bug in string concatenation code.
+
+Fix for use of array that is passed in from a previous function parameter.
+ Addition to test suite for above.
+
+A number of changes associated with changing NF and access to fields
+ beyond the end of the current record.
+
+Change to missing/memcmp.c to avoid seg. fault on zero length input.
+
+Updates to test suite (including some inadvertently left out of the last patch)
+ to invoke sh explicitly (rather than rely on #!/bin/sh) and remove some
+ junk files. test/chem/good updated to correspond to bug fixes.
+
+Changes from 2.13.0 to 2.13.1
+-----------------------------
+
+More configs and PORTS.
+
+Fixed bug wherein a simple division produced an erroneous FPE, caused by
+ the Cray division workaround -- that code is now #ifdef'd only for
+ Cray *and* fixed.
+
+Fixed bug in modulus implementation -- it was very close to the above
+ code, so I noticed it.
+
+Fixed portability problem with limits.h in missing.c
+
+Fixed portability problem with tzname and daylight -- define TZNAME_MISSING
+ if strftime() is missing and tzname is also.
+
+Better support for Latin-1 character set.
+
+Fixed portability problem in test Makefile.
+
+Updated PROBLEMS file.
+
+=============================== gawk-2.13 released =========================
+Changes from 2.12.42 to 2.12.43
+-------------------------------
+
+Typo in awk.y
+
+Fixed up strftime.3 and added doc. for %V.
+
+Changes from 2.12.41 to 2.12.42
+-------------------------------
+
+Fixed bug in devopen() -- if you had write permission in /dev,
+ it would just create /dev/stdout etc.!!
+
+Final (?) VMS update.
+
+Make NeXT use GFMT_WORKAROUND
+
+Fixed bug in sub_common() for substitute on zero-length match. Improved the
+ code a bit while I was at it.
+
+Fixed grammar so that $i++ parses as ($i)++
+
+Put support/* back in the distribution (didn't I already do this?!)
+
+Changes from 2.12.40 to 2.12.41
+-------------------------------
+
+VMS workaround for broken %g format.
+
+Changes from 2.12.39 to 2.12.40
+-------------------------------
+
+Minor man page update.
+
+Fixed latent bug in redirect().
+
+Changes from 2.12.38 to 2.12.39
+-------------------------------
+
+Updates to test suite -- remove dependence on changing gawk.1 man page.
+
+Changes from 2.12.37 to 2.12.38
+-------------------------------
+
+Fixed bug in use of *= without whitespace following.
+
+VMS update.
+
+Updates to man page.
+
+Option handling updates in main.c
+
+test/manyfiles redone and added to bigtest.
+
+Fixed latent (on Sun) bug in handling of save_fs.
+
+Changes from 2.12.36 to 2.12.37
+-------------------------------
+
+Update REL in Makefile-dist. Incorporate test suite into main distribution.
+
+Minor fix in regtest.
+
+Changes from 2.12.35 to 2.12.36
+-------------------------------
+
+Release takes on dual personality -- 2.12.36 and 2.13.0 -- any further
+ patches before public release won't count for 2.13, although they will for
+ 2.12 -- be careful to avoid confusion! patchlevel.h will be the last thing
+ to change.
+
+Cray updates to deal with arithmetic problems.
+
+Minor test suite updates.
+
+Fixed latent bug in parser (freeing memory).
+
+Changes from 2.12.34 to 2.12.35
+-------------------------------
+
+VMS updates.
+
+Flush stdout at top of err() and stderr at bottom.
+
+Fixed bug in eval_condition() -- it wasn't testing for MAYBE_NUM and
+ doing the force_number().
+
+Included the missing manyfiles.awk and a new test to catch the above bug which
+ I am amazed wasn't already caught by the test suite -- it's pretty basic.
+
+Changes from 2.12.33 to 2.12.34
+-------------------------------
+
+Atari updates -- including bug fix.
+
+More VMS updates -- also nuke vms/version.com.
+
+Fixed bug in handling of large numbers of redirections -- it was probably never
+ tested before (blush!).
+
+Minor rearrangement of code in r_force_number().
+
+Made chem and regtest tests a bit more portable (Ultrix again).
+
+Added another test -- manyfiles -- not invoked under any other test -- very Unix
+ specific.
+
+Rough beginning of LIMITATIONS file -- need my AWK book to complete it.
+
+Changes from 2.12.32 to 2.12.33
+-------------------------------
+
+Expunge debug.? from various files.
+
+Remove vestiges of Floor and Ceil kludge.
+
+Special case integer division -- mainly for Cray, but maybe someone else
+ will benefit.
+
+Workaround for iop_close closing an output pipe descriptor on Cray --
+ not conditional since I think it may fix a bug on SGI as well and I don't
+ think it can hurt elsewhere.
+
+Fixed memory leak in assoc_lookup().
+
+Small cleanup in test suite.
+
+Changes from 2.12.31 to 2.12.32
+-------------------------------
+
+Nuked debug.c and debugging flag -- there are better ways.
+
+Nuked version.sh and version.c in subdirectories.
+
+Fixed bug in handling of IGNORECASE.
+
+Fixed bug when FIELDWIDTHS was set via -v option.
+
+Fixed (obscure) bug when $0 is assigned a numerical value.
+
+Fixed so that escape sequences in command-line assignments work (as it already
+ said in the comment).
+
+Added a few cases to test suite.
+
+Moved support/* back into distribution.
+
+VMS updates.
+
+Changes from 2.12.30 to 2.12.31
+-------------------------------
+
+Cosmetic manual page changes.
+
+Updated sunos3 config.
+
+Small changes in test suite including renaming files over 14 chars. in length.
+
+Changes from 2.12.29 to 2.12.30
+-------------------------------
+
+Bug fix for many string concatenations in a row.
+
+Changes from 2.12.28 to 2.12.29
+-------------------------------
+
+Minor cleanup in awk.y
+
+Minor VMS update.
+
+Minor atari update.
+
+Changes from 2.12.27 to 2.12.28
+-------------------------------
+
+Got rid of the debugging goop in eval.c -- there are better ways.
+
+Sequent port.
+
+VMS changes left out of the last patch -- sigh! config/vms.h renamed
+ to config/vms-conf.h.
+
+Fixed missing/tzset.c
+
+Removed use of gcvt() and GCVT_MISSING -- turns out it was no faster than
+ sprintf("%g") and caused all sorts of portability headaches.
+
+Tuned get_field() -- it was unnecessarily parsing the whole record on reference
+ to $0.
+
+Tuned interpret() a bit in the rule_node loop.
+
+In r_force_number(), worked around bug in Uglix strtod() and got rid of
+ ugly do{}while(0) at Michal's urging.
+
+Replaced do_deref() and deref with unref(node) -- much cleaner and a bit faster.
+
+Got rid of assign_number() -- contrary to comment, it was no faster than
+ just making a new node and freeing the old one.
+
+Replaced make_number() and tmp_number() with macros that call mk_number().
+
+Changed freenode() and newnode() into macros -- the latter is getnode()
+ which calls more_nodes() as necessary.
+
+Changes from 2.12.26 to 2.12.27
+-------------------------------
+
+Completion of Cray 2 port (includes a kludge for floor() and ceil()
+ that may go or be changed -- I think that it may just be working around
+ a bug in chem that is being tweaked on the Cray).
+
+More VMS updates.
+
+Moved kludge over yacc's insertion of malloc and realloc declarations
+ from protos.h to the Makefile.
+
+Added a lisp interpreter in awk to the test suite. (Invoked under
+ bigtest.)
+
+Cleanup in r_force_number() -- I had never gotten around to a thorough
+ profile of the cache code and it turns out to be not worth it.
+
+Performance boost -- do lazy force_number()'ing for fields etc. i.e.
+ flag them (MAYBE_NUM) and call force_number only as necessary.
+
+Changes from 2.12.25 to 2.12.26
+-------------------------------
+
+Rework of regexp stuff so that dynamic regexps have reasonable
+ performance -- string used for compiled regexp is stored and
+ compared to new string -- if same, no recompilation is necessary.
+ Also, very dynamic regexps cause dfa-based searching to be turned
+ off.
+
+Code in dev_open() is back to returning fileno(std*) rather than
+ dup()ing it. This will be documented. Sorry for the run-around
+ on this.
+
+Minor atari updates.
+
+Minor vms update.
+
+Missing file from MSDOS port.
+
+Added warning (under lint) if third arg. of [g]sub is a constant and
+ handle it properly in the code (i.e. return how many matches).
+
+Changes from 2.12.24 to 2.12.25
+-------------------------------
+
+MSDOS port.
+
+Non-consequential changes to regexp variables in preparation for
+ a more serious change to fix a serious performance problem.
+
+Changes from 2.12.23 to 2.12.24
+-------------------------------
+
+Fixed bug in output flushing introduced a few patches back. This caused
+ serious performance losses.
+
+Changes from 2.12.22 to 2.12.23
+-------------------------------
+
+Accidently left config/cray2-60 out of last patch.
+
+Added some missing dependencies to Makefile.
+
+Cleaned up mkconf a bit; made yacc the default parser (no alloca needed,
+ right?); added rs6000 hook for signed characters.
+
+Made regex.c with NO_ALLOCA undefined work.
+
+Fixed bug in dfa.c for systems where free(NULL) bombs.
+
+Deleted a few cant_happen()'s that *really* can't hapen.
+
+Changes from 2.12.21 to 2.12.22
+-------------------------------
+
+Added to config stuff the ability to choose YACC rather than bison.
+
+Fixed CHAR_UNSIGNED in config.h-dist.
+
+Second arg. of strtod() is char ** rather than const char **.
+
+stackb is now initially malloc()'ed since it may be realloc()'ed.
+
+VMS updates.
+
+Added SIZE_T_MISSING to config stuff and a default typedef to awk.h.
+ (Maybe it is not needed on any current systems??)
+
+re_compile_pattern()'s size is now size_t unconditionally.
+
+Changes from 2.12.20 to 2.12.21
+-------------------------------
+
+Corrected missing/gcvt.c.
+
+Got rid of use of dup2() and thus DUP_MISSING.
+
+Updated config/sgi33.
+
+Turned on (and fixed) in cmp_nodes() the behaviour that I *hope* will be in
+ POSIX 1003.2 for relational comparisons.
+
+Small updates to test suite.
+
+Changes from 2.12.19 to 2.12.20
+-------------------------------
+
+Sloppy, sloppy, sloppy!! I didn't even try to compile the last two
+ patches. This one fixes goofs in regex.c.
+
+Changes from 2.12.18 to 2.12.19
+-------------------------------
+
+Cleanup of last patch.
+
+Changes from 2.12.17 to 2.12.18
+-------------------------------
+
+Makefile renamed to Makefile-dist.
+
+Added alloca() configuration to mkconf. (A bit kludgey.) Just
+ add a single line containing ALLOCA_PW, ALLOCA_S or ALLOCA_C
+ to the appropriate config file to have Makefile-dist edited
+ accordingly.
+
+Reorganized output flushing to correspond with new semantics of
+ devopen() on "/dev/std*" etc.
+
+Fixed rest of last goof!!
+
+Save and restore errno in do_pathopen().
+
+Miscellaneous atari updates.
+
+Get rid of the trailing comma in the NODETYPE definition (Cray
+ compiler won't take it).
+
+Try to make the use of `const' consistent since Cray compiler is
+ fussy about that. See the changes to `basename' and `myname'.
+
+It turns out that, according to section 3.8.3 (Macro Replacement)
+ of the ANSI Standard: ``If there are sequences of preprocessing
+ tokens within the list of arguments that would otherwise act as
+ preprocessing directives, the behavior is undefined.'' That means
+ that you cannot count on the behavior of the declaration of
+ re_compile_pattern in awk.h, and indeed the Cray compiler chokes on it.
+
+Replaced alloca with malloc/realloc/free in regex.c. It was much simpler
+ than expected. (Inside NO_ALLOCA for now -- by default no alloca.)
+
+Added a configuration file, config/cray60, for Unicos-6.0.
+
+Changes from 2.12.16 to 2.12.17
+-------------------------------
+
+Ooops. Goofed signal use in last patch.
+
+Changes from 2.12.15 to 2.12.16
+-------------------------------
+
+RENAMED *_dir to just * (e.g. missing_dir).
+
+Numerous VMS changes.
+
+Proper inclusion of atari and vms files.
+
+Added experimental (ifdef'd out) RELAXED_CONTINUATION and DEFAULT_FILETYPE
+ -- please comment on these!
+
+Moved pathopen() to io.c (sigh).
+
+Put local directory ahead in default AWKPATH.
+
+Added facility in mkconf to echo comments on stdout: lines beginning
+ with "#echo " will have the remainder of the line echoed when mkconf is run.
+ Any lines starting with "#" will otherwise be treated as comments. The
+ intent is to be able to say:
+ "#echo Make sure you uncomment alloca.c in the Makefile"
+ or the like.
+
+Prototype fix for V.4
+
+Fixed version_string to not print leading @(#).
+
+Fixed FIELDWIDTHS to work with strict (turned out to be easy).
+
+Fixed conf for V.2.
+
+Changed semantics of /dev/fd/n to be like on real /dev/fd.
+
+Several configuration and updates in the makefile.
+
+Updated manpage.
+
+Include tzset.c and system.c from missing_dir that were accidently left out of
+ the last patch.
+
+Fixed bug in cmdline variable assignment -- arg was getting freed(!) in
+ call to variable.
+
+Backed out of parse-time constant folding for now, until I can figure out
+ how to do it right.
+
+Fixed devopen() so that getline <"-" works.
+
+Changes from 2.12.14 to 2.12.15
+-------------------------------
+
+Changed config/* to a condensed form that can be used with mkconf to generate
+ a config.h from config.h-dist -- much easier to maintain. Please chaeck
+ carefully against what you had before for a particular system and report
+ any problems. vms.h remains separate since the stuff at the bottom
+ didn't quite fit the mkconf model -- hopefully cleared up later.
+
+Fixed bug in grammar -- didn't allow function definition to be separated from
+ other rules by a semi-colon.
+
+VMS fix to #includes in missing.c -- should we just be including awk.h?
+
+Updated README for texinfo.tex version.
+
+Updating of copyright in all .[chy] files.
+
+Added but commented out Michal's fix to strftime.
+
+Added tzset() emulation based on Rick Adams' code. Added TZSET_MISSING to
+ config.h-dist.
+
+Added strftime.3 man page for missing_dir
+
+More posix: func, **, **= don't work in -W posix
+
+More lint: ^, ^= not in old awk
+
+gawk.1: removed ref to -DNO_DEV_FD, other minor updating.
+
+Style change: pushbak becomes pushback() in yylex().
+
+Changes from 2.12.13 to 2.12.14
+-------------------------------
+
+Better (?) organization of awk.h -- attempt to keep all system dependencies
+ near the top and move some of the non-general things out of the config.h
+ files.
+
+Change to handling of SYSTEM_MISSING.
+
+Small change to ultrix config.
+
+Do "/dev/fd/*" etc. checking at runtime.
+
+First pass at VMS port.
+
+Improvements to error handling (when lexeme spans buffers).
+
+Fixed backslash handling -- why didn't I notice this sooner?
+
+Added programs from book to test suite and new target "bigtest" to Makefile.
+
+Changes from 2.12.12 to 2.12.13
+-------------------------------
+
+Recognize OFS and ORS specially so that OFS = 9 works without efficiency hit.
+ Took advantage of opportunity to tune do_print*() for about 10% win on a
+ print with 5 args (i.e. small but significant).
+
+Somewhat pervasive changes to reconcile CONVFMT vs. OFMT.
+
+Better initialization of builtin vars.
+
+Make config/* consistent wrt STRTOL_MISSING.
+
+Small portability improvement to alloca.s
+
+Improvements to lint code in awk.y
+
+Replaced strtol() with a better one by Chris Torek.
+
+Changes from 2.12.11 to 2.12.12
+-------------------------------
+
+Added PORTS file to record successful ports.
+
+Added #define const to nothing if not STDC and added const to strtod() header.
+
+Added * to printf capabilities and partially implemented ' ' and '+' (has an
+ effect for %d only, silently ignored for other formats). I'm afraid that's
+ as far as I want to go before I look at a complete replacement for
+ do_sprintf().
+
+Added warning for /regexp/ on LHS of MATCHOP.
+
+Changes from 2.12.10 to 2.12.11
+-------------------------------
+
+Small Makefile improvements.
+
+Some remaining nits from the NeXT port.
+
+Got rid of bcopy() define in awk.h -- not needed anymore (??)
+
+Changed private in builtin.c -- it is special on Sequent.
+
+Added subset implementation of strtol() and STRTOL_MISSING.
+
+A little bit of cleanup in debug.c, dfa.c.
+
+Changes from 2.12.9 to 2.12.10
+------------------------------
+
+Redid compatability checking and checking for # of args.
+
+Removed all references to variables[] from outside awk.y, in preparation
+ for a more abstract interface to the symbol table.
+
+Got rid of a remaining use of bcopy() in regex.c.
+
+Changes from 2.12.8 to 2.12.9
+-----------------------------
+
+Portability improvements for atari, next and decstation.
+
+Bug fix in substr() -- wasn't handling 3rd arg. of -1 properly.
+
+Manpage updates.
+
+Moved support from src release to doc release.
+
+Updated FUTURES file.
+
+Added some "lint" warnings.
+
+Changes from 2.12.7 to 2.12.8
+-----------------------------
+
+Changed time() to systime().
+
+Changed warning() in snode() to fatal().
+
+strftime() now defaults second arg. to current time.
+
+Changes from 2.12.6 to 2.12.7
+-----------------------------
+
+Fixed bug in sub_common() involving inadequate allocation of a buffer.
+
+Added some missing files to the Makefile.
+
+Changes from 2.12.5 to 2.12.6
+-----------------------------
+
+Fixed bug wherein non-redirected getline could call iop_close() just
+ prior to a call from do_input().
+
+Fixed bug in handling of /dev/stdout and /dev/stderr.
+
+Changes from 2.12.4 to 2.12.5
+-----------------------------
+
+Updated README and support directory.
+
+Changes from 2.12.3 to 2.12.4
+-----------------------------
+
+Updated CHANGES and TODO (should have been done in previous 2 patches).
+
+Changes from 2.12.2 to 2.12.3
+-----------------------------
+
+Brought regex.c and alloca.s into line with current FSF versions.
+
+Changes from 2.12.1 to 2.12.2
+-----------------------------
+
+Portability improvements; mostly moving system prototypes out of awk.h
+
+Introduction of strftime.
+
+Use of CONVFMT.
+
+Changes from 2.12 to 2.12.1
+-----------------------------
+
+Consolidated treatment of command-line assignments (thus correcting the
+-v treatment).
+
+Rationalized builtin-variable handling into a table-driven process, thus
+simplifying variable() and eliminating spc_var().
+
+Fixed bug in handling of command-line source that ended in a newline.
+
+Simplified install() and lookup().
+
+Did away with double-mallocing of identifiers and now free second and later
+instances of a name, after the first gets installed into the symbol table.
+
+Treat IGNORECASE specially, simplifying a lot of code, and allowing
+checking against strict conformance only on setting it, rather than on each
+pattern match.
+
+Fixed regexp matching when IGNORECASE is non-zero (broken when dfa.c was
+added).
+
+Fixed bug where $0 was not being marked as valid, even after it was rebuilt.
+This caused mangling of $0.
+
+
+Changes from 2.11.1 to 2.12
+-----------------------------
+
+Makefile:
+
+Portability improvements in Makefile.
+Move configuration stuff into config.h
+
+FSF files:
+
+Synchronized alloca.[cs] and regex.[ch] with FSF.
+
+array.c:
+
+Rationalized hash routines into one with a different algorithm.
+delete() now works if the array is a local variable.
+Changed interface of assoc_next() and avoided dereferencing past the end of the
+ array.
+
+awk.h:
+
+Merged non-prototype and prototype declarations in awk.h.
+Expanded tree_eval #define to short-circuit more calls of r_tree_eval().
+
+awk.y:
+
+Delinted some of the code in the grammar.
+Fixed and improved some of the error message printing.
+Changed to accomodate unlimited length source lines.
+Line continuation now works as advertised.
+Source lines can be arbitrarily long.
+Refined grammar hacks so that /= assignment works. Regular expressions
+ starting with /= are recognized at the beginning of a line, after && or ||
+ and after ~ or !~. More contexts can be added if necessary.
+Fixed IGNORECASE (multiple scans for backslash).
+Condensed expression_lists in array references.
+Detect and warn for correct # args in builtin functions -- call most of them
+ with a fixed number (i.e. fill in defaults at parse-time rather than at
+ run-time).
+Load ENVIRON only if it is referenced (detected at parse-time).
+Treat NF, FS, RS, NR, FNR specially at parse time, to improve run time.
+Fold constant expressions at parse time.
+Do make_regexp() on third arg. of split() at parse tiem if it is a constant.
+
+builtin.c:
+
+srand() returns 0 the first time called.
+Replaced alloca() with malloc() in do_sprintf().
+Fixed setting of RSTART and RLENGTH in do_match().
+Got rid of get_{one,two,three} and allowance for variable # of args. at
+ run-time -- this is now done at parse-time.
+Fixed latent bug in [g]sub whereby changes to $0 would never get made.
+Rewrote much of sub_common() for simplicity and performance.
+Added ctime() and time() builtin functions (unless -DSTRICT). ctime() returns
+ a time string like the C function, given the number of seconds since the epoch
+ and time() returns the current time in seconds.
+do_sprintf() now checks for mismatch between format string and number of
+ arguments supplied.
+
+dfa.c
+
+This is borrowed (almost unmodified) from GNU grep to provide faster searches.
+
+eval.c
+
+Node_var, Node_var_array and Node_param_list handled from macro rather
+ than in r_tree_eval().
+Changed cmp_nodes() to not do a force_number() -- this, combined with a
+ force_number() on ARGV[] and ENVIRON[] brings it into line with other awks
+Greatly simplified cmp_nodes().
+Separated out Node_NF, Node_FS, Node_RS, Node_NR and Node_FNR in get_lhs().
+All adjacent string concatenations now done at once.
+
+field.c
+
+Added support for FIELDWIDTHS.
+Fixed bug in get_field() whereby changes to a field were not always
+ properly reflected in $0.
+Reordered tests in parse_field() so that reference off the end of the buffer
+ doesn't happen.
+set_FS() now sets *parse_field i.e. routine to call depending on type of FS.
+It also does make_regexp() for FS if needed. get_field() passes FS_regexp
+ to re_parse_field(), as does do_split().
+Changes to set_field() and set_record() to avoid malloc'ing and free'ing the
+ field nodes repeatedly. The fields now just point into $0 unless they are
+ assigned to another variable or changed. force_number() on the field is
+ *only* done when the field is needed.
+
+gawk.1
+
+Fixed troff formatting problem on .TP lines.
+
+io.c
+
+Moved some code out into iop.c.
+Output from pipes and system() calls is properly synchronized.
+Status from pipe close properly returned.
+Bug in getline with no redirect fixed.
+
+iop.c
+
+This file contains a totally revamped get_a_record and associated code.
+
+main.c
+
+Command line programs no longer use a temporary file.
+Therefore, tmpnam() no longer required.
+Deprecated -a and -e options -- they will go away in the next release,
+ but for now they cause a warning.
+Moved -C, -V, -c options to -W ala posix.
+Added -W posix option: throw out \x
+Added -W lint option.
+
+
+node.c
+
+force_number() now allows pure numerics to have leading whitespace.
+Added make_string facility to optimize case of adding an already malloc'd
+ string.
+Cleaned up and simplified do_deref().
+Fixed bug in handling of stref==255 in do_deref().
+
+re.c
+
+contains the interface to regexp code
+
+Changes from 2.11.1 to FSF version of same
+------------------------------------------
+Thu Jan 4 14:19:30 1990 Jim Kingdon (kingdon at albert)
+
+ * Makefile (YACC): Add -y to bison part.
+
+ * missing.c: Add #include <stdio.h>.
+
+Sun Dec 24 16:16:05 1989 David J. MacKenzie (djm at hobbes.ai.mit.edu)
+
+ * * Makefile: Add (commented out) default defines for Sony News.
+
+ * awk.h: Move declaration of vprintf so it will compile when
+ -DVPRINTF_MISSING is defined.
+
+Mon Nov 13 18:54:08 1989 Robert J. Chassell (bob at apple-gunkies.ai.mit.edu)
+
+ * gawk.texinfo: changed @-commands that are not part of the
+ standard, currently released texinfmt.el to those that are.
+ Otherwise, only people with the as-yet unreleased makeinfo.c can
+ format this file.
+
+Changes from 2.11beta to 2.11.1 (production)
+--------------------------------------------
+
+Went from "beta" to production status!!!
+
+Now flushes stdout before closing pipes or redirected files to
+synchonize output.
+
+MS-DOS changes added in.
+
+Signal handler return type parameterized in Makefile and awk.h and
+some lint removed. debug.c cleaned up.
+
+Fixed FS splitting to never match null strings, per book.
+
+Correction to the manual's description of FS.
+
+Some compilers break on char *foo = "string" + 4 so fixed version.sh and
+main.c.
+
+Changes from 2.10beta to 2.11beta
+---------------------------------
+
+This release fixes all reported bugs that we could reproduce. Probably
+some of the changes are not documented here.
+
+The next release will probably not be a beta release!
+
+The most important change is the addition of the -nostalgia option. :-)
+
+The documentation has been improved and brought up-to-date.
+
+There has been a lot of general cleaning up of the code that is not otherwise
+documented here. There has been a movement toward using standard-conforming
+library routines and providing them (in missing.d) for systems lacking them.
+Improved (hopefully) configuration through Makfile modifications and missing.c.
+In particular, straightened out confusion over vprintf #defines, declarations
+etc.
+
+Deleted RCS log comments from source, to reduce source size by about one third.
+Most of them were horribly out-of-date, anyway.
+
+Renamed source files to reflect (for the most part) their contents.
+
+More and improved error messages. Cleanup and fixes to yyerror().
+String constants are not altered in input buffer, so error messages come out
+better. Fixed usage message. Make use of ANSI C strerror() function
+(provided).
+
+Plugged many more memory leaks. The memory consumption is now quite
+reasonable over a wide range of programs.
+
+Uses volatile declaration if STDC > 0 to avoid problems due to longjmp.
+
+New -a and -e options to use awk or egrep style regexps, respectively,
+since POSIX says awk should use egrep regexps. Default is -a.
+
+Added -v option for setting variables before the first file is encountered.
+Version information now uses -V and copyleft uses -C.
+
+Added a patchlevel.h file and its use for -V and -C.
+
+Append_right() optimized for major improvement to programs with a *lot*
+of statements.
+
+Operator precedence has been corrected to match draft Posix.
+
+Tightened up grammar for builtin functions so that only length
+may be called without arguments or parentheses.
+
+/regex/ is now a normal expression that can appear in any expression
+context.
+
+Allow /= to begin a regexp. Allow ..[../..].. in a regexp.
+
+Allow empty compound statements ({}).
+
+Made return and next illegal outside a function and in BEGIN/END respectively.
+
+Division by zero is now illegal and causes a fatal error.
+
+Fixed exponentiation so that x ^ 0 and x ^= 0 both return 1.
+
+Fixed do_sqrt, do_log, and do_exp to do argument/return checking and
+print an error message, per the manual.
+
+Fixed main to catch SIGSEGV to get source and data file line numbers.
+
+Fixed yyerror to print the ^ at the beginning of the bad token, not the end.
+
+Fix to substr() builtin: it was failing if the arguments
+weren't already strings.
+
+Added new node value flag NUMERIC to indicate that a variable is
+purely a number as opposed to type NUM which indicates that
+the node's numeric value is valid. This is set in make_number(),
+tmp_number and r_force_number() when appropriate and used in
+cmp_nodes(). This fixed a bug in comparison of variables that had
+numeric prefixes. The new code uses strtod() and eliminates is_a_number().
+A simple strtod() is provided for systems lacking one. It does no
+overflow checking, so could be improved.
+
+Simplification and efficiency improvement in force_string.
+
+Added performance tweak in r_force_number().
+
+Fixed a bug with nested loops and break/continue in functions.
+
+Fixed inconsistency in handling of empty fields when $0 has to be rebuilt.
+Happens to simplify rebuild_record().
+
+Cleaned up the code associated with opening a pipe for reading. Gawk
+now has its own popen routine (gawk_popen) that allocates an IOBUF
+and keeps track of the pid of the child process. gawk_pclose
+marks the appropriate child as defunct in the right struct redirect.
+
+Cleaned up and fixed close_redir().
+
+Fixed an obscure bug to do with redirection. Intermingled ">" and ">>"
+redirects did not output in a predictable order.
+
+Improved handling of output bufferring: now all print[f]s redirected to a tty
+or pipe are flushed immediately and non-redirected output to a tty is flushed
+before the next input record is read.
+
+Fixed a bug in get_a_record() where bcopy() could have copied over
+a random pointer.
+
+Fixed a bug when RS="" and records separated by multiple blank lines.
+
+Got rid of SLOWIO code which was out-of-date anyway.
+
+Fix in get_field() for case where $0 is changed and then $(n) are
+changed and then $0 is used.
+
+Fixed infinite loop on failure to open file for reading from getline.
+Now handles redirect file open failures properly.
+
+Filenames such as /dev/stdin now allowed on the command line as well as
+in redirects.
+
+Fixed so that gawk '$1' where $1 is a zero tests false.
+
+Fixed parsing so that `RLENGTH -1' parses the same as `RLENGTH - 1',
+for example.
+
+The return from a user-defined function now defaults to the Null node.
+This fixes a core-dump-causing bug when the return value of a function
+is used and that function returns no value.
+
+Now catches floating point exceptions to avoid core dumps.
+
+Bug fix for deleting elements of an array -- under some conditions, it was
+deleting more than one element at a time.
+
+Fix in AWKPATH code for running off the end of the string.
+
+Fixed handling of precision in *printf calls. %0.2d now works properly,
+as does %c. [s]printf now recognizes %i and %X.
+
+Fixed a bug in printing of very large (>240) strings.
+
+Cleaned up erroneous behaviour for RS == "".
+
+Added IGNORECASE support to index().
+
+Simplified and fixed newnode/freenode.
+
+Fixed reference to $(anything) in a BEGIN block.
+
+Eliminated use of USG rand48().
+
+Bug fix in force_string for machines with 16-bit ints.
+
+Replaced use of mktemp() with tmpnam() and provided a partial implementation of
+the latter for systems that don't have it.
+
+Added a portability check for includes in io.c.
+
+Minor portability fix in alloc.c plus addition of xmalloc().
+
+Portability fix: on UMAX4.2, st_blksize is zero for a pipe, thus breaking
+iop_alloc() -- fixed.
+
+Workaround for compiler bug on Sun386i in do_sprintf.
+
+More and improved prototypes in awk.h.
+
+Consolidated C escape parsing code into one place.
+
+strict flag is now turned on only when invoked with compatability option.
+It now applies to fewer things.
+
+Changed cast of f._ptr in vprintf.c from (unsigned char *) to (char *).
+Hopefully this is right for the systems that use this code (I don't).
+
+Support for pipes under MSDOS added.
diff --git a/gnu/usr.bin/awk/PORTS b/gnu/usr.bin/awk/PORTS
new file mode 100644
index 000000000000..95e133f9dd03
--- /dev/null
+++ b/gnu/usr.bin/awk/PORTS
@@ -0,0 +1,32 @@
+A recent version of gawk has been successfully compiled and run "make test"
+on the following:
+
+Sun 4/490 running 4.1
+NeXT running 2.0
+DECstation 3100 running Ultrix 4.0 or Ultrix 3.1 (different config)
+AtariST (16-bit ints, gcc compiler, byacc, running under TOS)
+ESIX V.3.2 Rev D (== System V Release 3.2), the 386. compiler was gcc + bison
+IBM RS/6000 (see README.rs6000)
+486 running SVR4, using cc and bison
+SGI running IRIX 3.3 using gcc (fails with cc)
+Sequent Balance running Dynix V3.1
+Cray Y-MP8 running Unicos 6.0.11
+Cray 2 running Unicos 6.1 (modulo trailing zeroes in chem)
+VAX/VMS V5.x (should also work on 4.6 and 4.7)
+VMS POSIX V1.0, V1.1
+OpenVMS AXP V1.0
+MSDOS - Microsoft C 5.1, compiles and runs very simple testing
+BSD 4.4alpha
+
+From: ghazi@caip.rutgers.edu (Kaveh R. Ghazi):
+
+arch configured as:
+---- --------------
+Hpux 9.0 hpux8x
+NeXTStep 2.0 next20
+Sgi Irix 4.0.5 (/bin/cc) sgi405.cc (new file)
+Stardent Titan 1500 OSv2.5 sysv3
+Stardent Vistra (i860) SVR4 sysv4
+SunOS 4.1.2 sunos41
+Tektronix XD88 (UTekV 3.2e) sysv3
+Ultrix 4.2 ultrix41
diff --git a/gnu/usr.bin/awk/POSIX b/gnu/usr.bin/awk/POSIX
new file mode 100644
index 000000000000..f2405420aedf
--- /dev/null
+++ b/gnu/usr.bin/awk/POSIX
@@ -0,0 +1,95 @@
+Right now, the numeric vs. string comparisons are screwed up in draft
+11.2. What prompted me to check it out was the note in gnu.bug.utils
+which observed that gawk was doing the comparison $1 == "000"
+numerically. I think that we can agree that intuitively, this should
+be done as a string comparison. Version 2.13.2 of gawk follows the
+current POSIX draft. Following is how I (now) think this
+stuff should be done.
+
+1. A numeric literal or the result of a numeric operation has the NUMERIC
+ attribute.
+
+2. A string literal or the result of a string operation has the STRING
+ attribute.
+
+3. Fields, getline input, FILENAME, ARGV elements, ENVIRON elements and the
+ elements of an array created by split() that are numeric strings
+ have the STRNUM attribute. Otherwise, they have the STRING attribute.
+ Uninitialized variables also have the STRNUM attribute.
+
+4. Attributes propagate across assignments, but are not changed by
+ any use. (Although a use may cause the entity to acquire an additional
+ value such that it has both a numeric and string value -- this leaves the
+ attribute unchanged.)
+
+When two operands are compared, either string comparison or numeric comparison
+may be used, depending on the attributes of the operands, according to the
+following (symmetric) matrix:
+
+ +----------------------------------------------
+ | STRING NUMERIC STRNUM
+--------+----------------------------------------------
+ |
+STRING | string string string
+ |
+NUMERIC | string numeric numeric
+ |
+STRNUM | string numeric numeric
+--------+----------------------------------------------
+
+So, the following program should print all OKs.
+
+echo '0e2 0a 0 0b
+0e2 0a 0 0b' |
+$AWK '
+NR == 1 {
+ num = 0
+ str = "0e2"
+
+ print ++test ": " ( (str == "0e2") ? "OK" : "OOPS" )
+ print ++test ": " ( ("0e2" != 0) ? "OK" : "OOPS" )
+ print ++test ": " ( ("0" != $2) ? "OK" : "OOPS" )
+ print ++test ": " ( ("0e2" == $1) ? "OK" : "OOPS" )
+
+ print ++test ": " ( (0 == "0") ? "OK" : "OOPS" )
+ print ++test ": " ( (0 == num) ? "OK" : "OOPS" )
+ print ++test ": " ( (0 != $2) ? "OK" : "OOPS" )
+ print ++test ": " ( (0 == $1) ? "OK" : "OOPS" )
+
+ print ++test ": " ( ($1 != "0") ? "OK" : "OOPS" )
+ print ++test ": " ( ($1 == num) ? "OK" : "OOPS" )
+ print ++test ": " ( ($2 != 0) ? "OK" : "OOPS" )
+ print ++test ": " ( ($2 != $1) ? "OK" : "OOPS" )
+ print ++test ": " ( ($3 == 0) ? "OK" : "OOPS" )
+ print ++test ": " ( ($3 == $1) ? "OK" : "OOPS" )
+ print ++test ": " ( ($2 != $4) ? "OK" : "OOPS" ) # 15
+}
+{
+ a = "+2"
+ b = 2
+ if (NR % 2)
+ c = a + b
+ print ++test ": " ( (a != b) ? "OK" : "OOPS" ) # 16 and 22
+
+ d = "2a"
+ b = 2
+ if (NR % 2)
+ c = d + b
+ print ++test ": " ( (d != b) ? "OK" : "OOPS" )
+
+ print ++test ": " ( (d + 0 == b) ? "OK" : "OOPS" )
+
+ e = "2"
+ print ++test ": " ( (e == b "") ? "OK" : "OOPS" )
+
+ a = "2.13"
+ print ++test ": " ( (a == 2.13) ? "OK" : "OOPS" )
+
+ a = "2.130000"
+ print ++test ": " ( (a != 2.13) ? "OK" : "OOPS" )
+
+ if (NR == 2) {
+ CONVFMT = "%.6f"
+ print ++test ": " ( (a == 2.13) ? "OK" : "OOPS" )
+ }
+}'
diff --git a/gnu/usr.bin/awk/PROBLEMS b/gnu/usr.bin/awk/PROBLEMS
new file mode 100644
index 000000000000..3b7c5148bd8e
--- /dev/null
+++ b/gnu/usr.bin/awk/PROBLEMS
@@ -0,0 +1,6 @@
+This is a list of known problems in gawk 2.15.
+Hopefully they will all be fixed in the next major release of gawk.
+
+Please keep in mind that the code is still undergoing significant evolution.
+
+1. Gawk's printf is probably still not POSIX compliant.
diff --git a/gnu/usr.bin/awk/README b/gnu/usr.bin/awk/README
new file mode 100644
index 000000000000..f4bd3df806c8
--- /dev/null
+++ b/gnu/usr.bin/awk/README
@@ -0,0 +1,116 @@
+README:
+
+This is GNU Awk 2.15. It should be upwardly compatible with the
+System V Release 4 awk. It is almost completely compliant with draft 11.3
+of POSIX 1003.2.
+
+This release adds new features -- see NEWS for details.
+
+See the installation instructions, below.
+
+Known problems are given in the PROBLEMS file. Work to be done is
+described briefly in the FUTURES file. Verified ports are listed in
+the PORTS file. Changes in this version are summarized in the CHANGES file.
+Please read the LIMITATIONS and ACKNOWLEDGMENT files.
+
+Read the file POSIX for a discussion of how the standard says comparisons
+should be done vs. how they really should be done and how gawk does them.
+
+To format the documentation with TeX, you must use texinfo.tex 2.53
+or later. Otherwise footnotes look unacceptable.
+
+If you wish to remake the Info files, you should use makeinfo. The 2.15
+version of makeinfo works with no errors.
+
+The man page is up to date.
+
+INSTALLATION:
+
+Check whether there is a system-specific README file for your system.
+
+Makefile.in may need some tailoring. The only changes necessary should
+be to change installation targets or to change compiler flags.
+The changes to make in Makefile.in are commented and should be obvious.
+
+All other changes should be made in a config file. Samples for
+various systems are included in the config directory. Starting with
+2.11, our intent has been to make the code conform to standards (ANSI,
+POSIX, SVID, in that order) whenever possible, and to not penalize
+standard conforming systems. We have included substitute versions of
+routines not universally available. Simply add the appropriate define
+for the missing feature(s) on your system.
+
+If you have neither bison nor yacc, use the awktab.c file here. It was
+generated with bison, and should have no AT&T code in it. (Note that
+modifying awk.y without bison or yacc will be difficult, at best. You might
+want to get a copy of bison from the FSF too.)
+
+If no config file is included for your system, start by copying one
+for a similar system. One way of determining the defines needed is to
+try to load gawk with nothing defined and see what routines are
+unresolved by the loader. This should give you a good idea of how to
+proceed.
+
+The next release will use the FSF autoconfig program, so we are no longer
+soliciting new config files.
+
+If you have an MS-DOS system, use the stuff in the pc directory.
+For an Atari there is an atari directory and similarly one for VMS.
+
+Chapter 16 of The GAWK Manual discusses configuration in detail.
+
+After successful compilation, do 'make test' to run a small test
+suite. There should be no output from the 'cmp' invocations except in
+the cases where there are small differences in floating point values.
+If there are other differences, please investigate and report the
+problem.
+
+PRINTING THE MANUAL
+
+The 'support' directory contains texinfo.tex 2.65, which will be necessary
+for printing the manual, and the texindex.c program from the texinfo
+distribution which is also necessary. See the makefile for the steps needed
+to get a DVI file from the manual.
+
+CAVEATS
+
+The existence of a patchlevel.h file does *N*O*T* imply a commitment on
+our part to issue bug fixes or patches. It is there in case we should
+decide to do so.
+
+BUG REPORTS AND FIXES (Un*x systems):
+
+Please coordinate changes through David Trueman and/or Arnold Robbins.
+
+David Trueman
+Department of Mathematics, Statistics and Computing Science,
+Dalhousie University, Halifax, Nova Scotia, Canada
+
+UUCP: {uunet utai watmath}!dalcs!david
+INTERNET: david@cs.dal.ca
+
+Arnold Robbins
+1736 Reindeer Drive
+Atlanta, GA, 30329, USA
+
+INTERNET: arnold@skeeve.atl.ga.us
+UUCP: { gatech, emory, emoryu1 }!skeeve!arnold
+
+BUG REPORTS AND FIXES (non-Unix ports):
+
+MS-DOS:
+ Scott Deifik
+ AMGEN Inc.
+ Amgen Center, Bldg.17-Dept.393
+ Thousand Oaks, CA 91320-1789
+ Tel-805-499-5725 ext.4677
+ Fax-805-498-0358
+ scottd@amgen.com
+
+VMS:
+ Pat Rankin
+ rankin@eql.caltech.edu (e-mail only)
+
+Atari ST:
+ Michal Jaegermann
+ NTOMCZAK@vm.ucs.UAlberta.CA (e-mail only)
diff --git a/gnu/usr.bin/awk/array.c b/gnu/usr.bin/awk/array.c
new file mode 100644
index 000000000000..59be340c04df
--- /dev/null
+++ b/gnu/usr.bin/awk/array.c
@@ -0,0 +1,293 @@
+/*
+ * array.c - routines for associative arrays.
+ */
+
+/*
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ *
+ * This file is part of GAWK, the GNU implementation of the
+ * AWK Progamming Language.
+ *
+ * GAWK is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * GAWK is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with GAWK; see the file COPYING. If not, write to
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include "awk.h"
+
+static NODE *assoc_find P((NODE *symbol, NODE *subs, int hash1));
+
+NODE *
+concat_exp(tree)
+register NODE *tree;
+{
+ register NODE *r;
+ char *str;
+ char *s;
+ unsigned len;
+ int offset;
+ int subseplen;
+ char *subsep;
+
+ if (tree->type != Node_expression_list)
+ return force_string(tree_eval(tree));
+ r = force_string(tree_eval(tree->lnode));
+ if (tree->rnode == NULL)
+ return r;
+ subseplen = SUBSEP_node->lnode->stlen;
+ subsep = SUBSEP_node->lnode->stptr;
+ len = r->stlen + subseplen + 2;
+ emalloc(str, char *, len, "concat_exp");
+ memcpy(str, r->stptr, r->stlen+1);
+ s = str + r->stlen;
+ free_temp(r);
+ tree = tree->rnode;
+ while (tree) {
+ if (subseplen == 1)
+ *s++ = *subsep;
+ else {
+ memcpy(s, subsep, subseplen+1);
+ s += subseplen;
+ }
+ r = force_string(tree_eval(tree->lnode));
+ len += r->stlen + subseplen;
+ offset = s - str;
+ erealloc(str, char *, len, "concat_exp");
+ s = str + offset;
+ memcpy(s, r->stptr, r->stlen+1);
+ s += r->stlen;
+ free_temp(r);
+ tree = tree->rnode;
+ }
+ r = make_str_node(str, s - str, ALREADY_MALLOCED);
+ r->flags |= TEMP;
+ return r;
+}
+
+/* Flush all the values in symbol[] before doing a split() */
+void
+assoc_clear(symbol)
+NODE *symbol;
+{
+ int i;
+ NODE *bucket, *next;
+
+ if (symbol->var_array == 0)
+ return;
+ for (i = 0; i < HASHSIZE; i++) {
+ for (bucket = symbol->var_array[i]; bucket; bucket = next) {
+ next = bucket->ahnext;
+ unref(bucket->ahname);
+ unref(bucket->ahvalue);
+ freenode(bucket);
+ }
+ symbol->var_array[i] = 0;
+ }
+}
+
+/*
+ * calculate the hash function of the string in subs
+ */
+unsigned int
+hash(s, len)
+register char *s;
+register int len;
+{
+ register unsigned long h = 0, g;
+
+ while (len--) {
+ h = (h << 4) + *s++;
+ g = (h & 0xf0000000);
+ if (g) {
+ h = h ^ (g >> 24);
+ h = h ^ g;
+ }
+ }
+ if (h < HASHSIZE)
+ return h;
+ else
+ return h%HASHSIZE;
+}
+
+/*
+ * locate symbol[subs]
+ */
+static NODE * /* NULL if not found */
+assoc_find(symbol, subs, hash1)
+NODE *symbol;
+register NODE *subs;
+int hash1;
+{
+ register NODE *bucket, *prev = 0;
+
+ for (bucket = symbol->var_array[hash1]; bucket; bucket = bucket->ahnext) {
+ if (cmp_nodes(bucket->ahname, subs) == 0) {
+ if (prev) { /* move found to front of chain */
+ prev->ahnext = bucket->ahnext;
+ bucket->ahnext = symbol->var_array[hash1];
+ symbol->var_array[hash1] = bucket;
+ }
+ return bucket;
+ } else
+ prev = bucket; /* save previous list entry */
+ }
+ return NULL;
+}
+
+/*
+ * test whether the array element symbol[subs] exists or not
+ */
+int
+in_array(symbol, subs)
+NODE *symbol, *subs;
+{
+ register int hash1;
+
+ if (symbol->type == Node_param_list)
+ symbol = stack_ptr[symbol->param_cnt];
+ if (symbol->var_array == 0)
+ return 0;
+ subs = concat_exp(subs); /* concat_exp returns a string node */
+ hash1 = hash(subs->stptr, subs->stlen);
+ if (assoc_find(symbol, subs, hash1) == NULL) {
+ free_temp(subs);
+ return 0;
+ } else {
+ free_temp(subs);
+ return 1;
+ }
+}
+
+/*
+ * SYMBOL is the address of the node (or other pointer) being dereferenced.
+ * SUBS is a number or string used as the subscript.
+ *
+ * Find SYMBOL[SUBS] in the assoc array. Install it with value "" if it
+ * isn't there. Returns a pointer ala get_lhs to where its value is stored
+ */
+NODE **
+assoc_lookup(symbol, subs)
+NODE *symbol, *subs;
+{
+ register int hash1;
+ register NODE *bucket;
+
+ (void) force_string(subs);
+ hash1 = hash(subs->stptr, subs->stlen);
+
+ if (symbol->var_array == 0) { /* this table really should grow
+ * dynamically */
+ unsigned size;
+
+ size = sizeof(NODE *) * HASHSIZE;
+ emalloc(symbol->var_array, NODE **, size, "assoc_lookup");
+ memset((char *)symbol->var_array, 0, size);
+ symbol->type = Node_var_array;
+ } else {
+ bucket = assoc_find(symbol, subs, hash1);
+ if (bucket != NULL) {
+ free_temp(subs);
+ return &(bucket->ahvalue);
+ }
+ }
+
+ /* It's not there, install it. */
+ if (do_lint && subs->stlen == 0)
+ warning("subscript of array `%s' is null string",
+ symbol->vname);
+ getnode(bucket);
+ bucket->type = Node_ahash;
+ if (subs->flags & TEMP)
+ bucket->ahname = dupnode(subs);
+ else {
+ unsigned int saveflags = subs->flags;
+
+ subs->flags &= ~MALLOC;
+ bucket->ahname = dupnode(subs);
+ subs->flags = saveflags;
+ }
+ free_temp(subs);
+
+ /* array subscripts are strings */
+ bucket->ahname->flags &= ~NUMBER;
+ bucket->ahname->flags |= STRING;
+ bucket->ahvalue = Nnull_string;
+ bucket->ahnext = symbol->var_array[hash1];
+ symbol->var_array[hash1] = bucket;
+ return &(bucket->ahvalue);
+}
+
+void
+do_delete(symbol, tree)
+NODE *symbol, *tree;
+{
+ register int hash1;
+ register NODE *bucket, *last;
+ NODE *subs;
+
+ if (symbol->type == Node_param_list)
+ symbol = stack_ptr[symbol->param_cnt];
+ if (symbol->var_array == 0)
+ return;
+ subs = concat_exp(tree); /* concat_exp returns string node */
+ hash1 = hash(subs->stptr, subs->stlen);
+
+ last = NULL;
+ for (bucket = symbol->var_array[hash1]; bucket; last = bucket, bucket = bucket->ahnext)
+ if (cmp_nodes(bucket->ahname, subs) == 0)
+ break;
+ free_temp(subs);
+ if (bucket == NULL)
+ return;
+ if (last)
+ last->ahnext = bucket->ahnext;
+ else
+ symbol->var_array[hash1] = bucket->ahnext;
+ unref(bucket->ahname);
+ unref(bucket->ahvalue);
+ freenode(bucket);
+}
+
+void
+assoc_scan(symbol, lookat)
+NODE *symbol;
+struct search *lookat;
+{
+ if (!symbol->var_array) {
+ lookat->retval = NULL;
+ return;
+ }
+ lookat->arr_ptr = symbol->var_array;
+ lookat->arr_end = lookat->arr_ptr + HASHSIZE; /* added */
+ lookat->bucket = symbol->var_array[0];
+ assoc_next(lookat);
+}
+
+void
+assoc_next(lookat)
+struct search *lookat;
+{
+ while (lookat->arr_ptr < lookat->arr_end) {
+ if (lookat->bucket != 0) {
+ lookat->retval = lookat->bucket->ahname;
+ lookat->bucket = lookat->bucket->ahnext;
+ return;
+ }
+ lookat->arr_ptr++;
+ if (lookat->arr_ptr < lookat->arr_end)
+ lookat->bucket = *(lookat->arr_ptr);
+ else
+ lookat->retval = NULL;
+ }
+ return;
+}
diff --git a/gnu/usr.bin/awk/awk.1 b/gnu/usr.bin/awk/awk.1
new file mode 100644
index 000000000000..0338485e8db8
--- /dev/null
+++ b/gnu/usr.bin/awk/awk.1
@@ -0,0 +1,1873 @@
+.ds PX \s-1POSIX\s+1
+.ds UX \s-1UNIX\s+1
+.ds AN \s-1ANSI\s+1
+.TH GAWK 1 "Apr 15 1993" "Free Software Foundation" "Utility Commands"
+.SH NAME
+gawk \- pattern scanning and processing language
+.SH SYNOPSIS
+.B gawk
+[ POSIX or GNU style options ]
+.B \-f
+.I program-file
+[
+.B \-\^\-
+] file .\^.\^.
+.br
+.B gawk
+[ POSIX or GNU style options ]
+[
+.B \-\^\-
+]
+.I program-text
+file .\^.\^.
+.SH DESCRIPTION
+.I Gawk
+is the GNU Project's implementation of the AWK programming language.
+It conforms to the definition of the language in
+the \*(PX 1003.2 Command Language And Utilities Standard.
+This version in turn is based on the description in
+.IR "The AWK Programming Language" ,
+by Aho, Kernighan, and Weinberger,
+with the additional features defined in the System V Release 4 version
+of \*(UX
+.IR awk .
+.I Gawk
+also provides some GNU-specific extensions.
+.PP
+The command line consists of options to
+.I gawk
+itself, the AWK program text (if not supplied via the
+.B \-f
+or
+.B \-\^\-file
+options), and values to be made
+available in the
+.B ARGC
+and
+.B ARGV
+pre-defined AWK variables.
+.SH OPTIONS
+.PP
+.I Gawk
+options may be either the traditional \*(PX one letter options,
+or the GNU style long options. \*(PX style options start with a single ``\-'',
+while GNU long options start with ``\-\^\-''.
+GNU style long options are provided for both GNU-specific features and
+for \*(PX mandated features. Other implementations of the AWK language
+are likely to only accept the traditional one letter options.
+.PP
+Following the \*(PX standard,
+.IR gawk -specific
+options are supplied via arguments to the
+.B \-W
+option. Multiple
+.B \-W
+options may be supplied, or multiple arguments may be supplied together
+if they are separated by commas, or enclosed in quotes and separated
+by white space.
+Case is ignored in arguments to the
+.B \-W
+option.
+Each
+.B \-W
+option has a corresponding GNU style long option, as detailed below.
+.PP
+.I Gawk
+accepts the following options.
+.TP
+.PD 0
+.BI \-F " fs"
+.TP
+.PD
+.BI \-\^\-field-separator= fs
+Use
+.I fs
+for the input field separator (the value of the
+.B FS
+predefined
+variable).
+.TP
+.PD 0
+\fB\-v\fI var\fB\^=\^\fIval\fR
+.TP
+.PD
+\fB\-\^\-assign=\fIvar\fB\^=\^\fIval\fR
+Assign the value
+.IR val ,
+to the variable
+.IR var ,
+before execution of the program begins.
+Such variable values are available to the
+.B BEGIN
+block of an AWK program.
+.TP
+.PD 0
+.BI \-f " program-file"
+.TP
+.PD
+.BI \-\^\-file= program-file
+Read the AWK program source from the file
+.IR program-file ,
+instead of from the first command line argument.
+Multiple
+.B \-f
+(or
+.BR \-\^\-file )
+options may be used.
+.TP \w'\fB\-\^\-copyright\fR'u+1n
+.PD 0
+.B "\-W compat"
+.TP
+.PD
+.B \-\^\-compat
+Run in
+.I compatibility
+mode. In compatibility mode,
+.I gawk
+behaves identically to \*(UX
+.IR awk ;
+none of the GNU-specific extensions are recognized.
+See
+.BR "GNU EXTENSIONS" ,
+below, for more information.
+.TP
+.PD 0
+.B "\-W copyleft"
+.TP
+.PD 0
+.B "\-W copyright"
+.TP
+.PD 0
+.B \-\^\-copyleft
+.TP
+.PD
+.B \-\^\-copyright
+Print the short version of the GNU copyright information message on
+the error output.
+.TP
+.PD 0
+.B "\-W help"
+.TP
+.PD 0
+.B "\-W usage"
+.TP
+.PD 0
+.B \-\^\-help
+.TP
+.PD
+.B \-\^\-usage
+Print a relatively short summary of the available options on
+the error output.
+.TP
+.PD 0
+.B "\-W lint"
+.TP
+.PD 0
+.B \-\^\-lint
+Provide warnings about constructs that are
+dubious or non-portable to other AWK implementations.
+.ig
+.\" This option is left undocumented, on purpose.
+.TP
+.PD 0
+.B "\-W nostalgia"
+.TP
+.PD
+.B \-\^\-nostalgia
+Provide a moment of nostalgia for long time
+.I awk
+users.
+..
+.TP
+.PD 0
+.B "\-W posix"
+.TP
+.PD
+.B \-\^\-posix
+This turns on
+.I compatibility
+mode, with the following additional restrictions:
+.RS
+.TP \w'\(bu'u+1n
+\(bu
+.B \ex
+escape sequences are not recognized.
+.TP
+\(bu
+The synonym
+.B func
+for the keyword
+.B function
+is not recognized.
+.TP
+\(bu
+The operators
+.B **
+and
+.B **=
+cannot be used in place of
+.B ^
+and
+.BR ^= .
+.RE
+.TP
+.PD 0
+.BI "\-W source=" program-text
+.TP
+.PD
+.BI \-\^\-source= program-text
+Use
+.I program-text
+as AWK program source code.
+This option allows the easy intermixing of library functions (used via the
+.B \-f
+and
+.B \-\^\-file
+options) with source code entered on the command line.
+It is intended primarily for medium to large size AWK programs used
+in shell scripts.
+.sp .5
+The
+.B "\-W source="
+form of this option uses the rest of the command line argument for
+.IR program-text ;
+no other options to
+.B \-W
+will be recognized in the same argument.
+.TP
+.PD 0
+.B "\-W version"
+.TP
+.PD
+.B \-\^\-version
+Print version information for this particular copy of
+.I gawk
+on the error output.
+This is useful mainly for knowing if the current copy of
+.I gawk
+on your system
+is up to date with respect to whatever the Free Software Foundation
+is distributing.
+.TP
+.B \-\^\-
+Signal the end of options. This is useful to allow further arguments to the
+AWK program itself to start with a ``\-''.
+This is mainly for consistency with the argument parsing convention used
+by most other \*(PX programs.
+.PP
+Any other options are flagged as illegal, but are otherwise ignored.
+.SH AWK PROGRAM EXECUTION
+.PP
+An AWK program consists of a sequence of pattern-action statements
+and optional function definitions.
+.RS
+.PP
+\fIpattern\fB { \fIaction statements\fB }\fR
+.br
+\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements\fB }\fR
+.RE
+.PP
+.I Gawk
+first reads the program source from the
+.IR program-file (s)
+if specified, or from the first non-option argument on the command line.
+The
+.B \-f
+option may be used multiple times on the command line.
+.I Gawk
+will read the program text as if all the
+.IR program-file s
+had been concatenated together. This is useful for building libraries
+of AWK functions, without having to include them in each new AWK
+program that uses them. To use a library function in a file from a
+program typed in on the command line, specify
+.B /dev/tty
+as one of the
+.IR program-file s,
+type your program, and end it with a
+.B ^D
+(control-d).
+.PP
+The environment variable
+.B AWKPATH
+specifies a search path to use when finding source files named with
+the
+.B \-f
+option. If this variable does not exist, the default path is
+\fB".:/usr/lib/awk:/usr/local/lib/awk"\fR.
+If a file name given to the
+.B \-f
+option contains a ``/'' character, no path search is performed.
+.PP
+.I Gawk
+executes AWK programs in the following order.
+First,
+.I gawk
+compiles the program into an internal form.
+Next, all variable assignments specified via the
+.B \-v
+option are performed. Then,
+.I gawk
+executes the code in the
+.B BEGIN
+block(s) (if any),
+and then proceeds to read
+each file named in the
+.B ARGV
+array.
+If there are no files named on the command line,
+.I gawk
+reads the standard input.
+.PP
+If a filename on the command line has the form
+.IB var = val
+it is treated as a variable assignment. The variable
+.I var
+will be assigned the value
+.IR val .
+(This happens after any
+.B BEGIN
+block(s) have been run.)
+Command line variable assignment
+is most useful for dynamically assigning values to the variables
+AWK uses to control how input is broken into fields and records. It
+is also useful for controlling state if multiple passes are needed over
+a single data file.
+.PP
+If the value of a particular element of
+.B ARGV
+is empty (\fB""\fR),
+.I gawk
+skips over it.
+.PP
+For each line in the input,
+.I gawk
+tests to see if it matches any
+.I pattern
+in the AWK program.
+For each pattern that the line matches, the associated
+.I action
+is executed.
+The patterns are tested in the order they occur in the program.
+.PP
+Finally, after all the input is exhausted,
+.I gawk
+executes the code in the
+.B END
+block(s) (if any).
+.SH VARIABLES AND FIELDS
+AWK variables are dynamic; they come into existence when they are
+first used. Their values are either floating-point numbers or strings,
+or both,
+depending upon how they are used. AWK also has one dimension
+arrays; multiply dimensioned arrays may be simulated.
+Several pre-defined variables are set as a program
+runs; these will be described as needed and summarized below.
+.SS Fields
+.PP
+As each input line is read,
+.I gawk
+splits the line into
+.IR fields ,
+using the value of the
+.B FS
+variable as the field separator.
+If
+.B FS
+is a single character, fields are separated by that character.
+Otherwise,
+.B FS
+is expected to be a full regular expression.
+In the special case that
+.B FS
+is a single blank, fields are separated
+by runs of blanks and/or tabs.
+Note that the value of
+.B IGNORECASE
+(see below) will also affect how fields are split when
+.B FS
+is a regular expression.
+.PP
+If the
+.B FIELDWIDTHS
+variable is set to a space separated list of numbers, each field is
+expected to have fixed width, and
+.I gawk
+will split up the record using the specified widths. The value of
+.B FS
+is ignored.
+Assigning a new value to
+.B FS
+overrides the use of
+.BR FIELDWIDTHS ,
+and restores the default behavior.
+.PP
+Each field in the input line may be referenced by its position,
+.BR $1 ,
+.BR $2 ,
+and so on.
+.B $0
+is the whole line. The value of a field may be assigned to as well.
+Fields need not be referenced by constants:
+.RS
+.PP
+.ft B
+n = 5
+.br
+print $n
+.ft R
+.RE
+.PP
+prints the fifth field in the input line.
+The variable
+.B NF
+is set to the total number of fields in the input line.
+.PP
+References to non-existent fields (i.e. fields after
+.BR $NF )
+produce the null-string. However, assigning to a non-existent field
+(e.g.,
+.BR "$(NF+2) = 5" )
+will increase the value of
+.BR NF ,
+create any intervening fields with the null string as their value, and
+cause the value of
+.B $0
+to be recomputed, with the fields being separated by the value of
+.BR OFS .
+.SS Built-in Variables
+.PP
+AWK's built-in variables are:
+.PP
+.TP \w'\fBFIELDWIDTHS\fR'u+1n
+.B ARGC
+The number of command line arguments (does not include options to
+.IR gawk ,
+or the program source).
+.TP
+.B ARGIND
+The index in
+.B ARGV
+of the current file being processed.
+.TP
+.B ARGV
+Array of command line arguments. The array is indexed from
+0 to
+.B ARGC
+\- 1.
+Dynamically changing the contents of
+.B ARGV
+can control the files used for data.
+.TP
+.B CONVFMT
+The conversion format for numbers, \fB"%.6g"\fR, by default.
+.TP
+.B ENVIRON
+An array containing the values of the current environment.
+The array is indexed by the environment variables, each element being
+the value of that variable (e.g., \fBENVIRON["HOME"]\fP might be
+.BR /u/arnold ).
+Changing this array does not affect the environment seen by programs which
+.I gawk
+spawns via redirection or the
+.B system()
+function.
+(This may change in a future version of
+.IR gawk .)
+.\" but don't hold your breath...
+.TP
+.B ERRNO
+If a system error occurs either doing a redirection for
+.BR getline ,
+during a read for
+.BR getline ,
+or during a
+.BR close ,
+then
+.B ERRNO
+will contain
+a string describing the error.
+.TP
+.B FIELDWIDTHS
+A white-space separated list of fieldwidths. When set,
+.I gawk
+parses the input into fields of fixed width, instead of using the
+value of the
+.B FS
+variable as the field separator.
+The fixed field width facility is still experimental; expect the
+semantics to change as
+.I gawk
+evolves over time.
+.TP
+.B FILENAME
+The name of the current input file.
+If no files are specified on the command line, the value of
+.B FILENAME
+is ``\-''.
+.TP
+.B FNR
+The input record number in the current input file.
+.TP
+.B FS
+The input field separator, a blank by default.
+.TP
+.B IGNORECASE
+Controls the case-sensitivity of all regular expression operations. If
+.B IGNORECASE
+has a non-zero value, then pattern matching in rules,
+field splitting with
+.BR FS ,
+regular expression
+matching with
+.B ~
+and
+.BR !~ ,
+and the
+.BR gsub() ,
+.BR index() ,
+.BR match() ,
+.BR split() ,
+and
+.B sub()
+pre-defined functions will all ignore case when doing regular expression
+operations. Thus, if
+.B IGNORECASE
+is not equal to zero,
+.B /aB/
+matches all of the strings \fB"ab"\fP, \fB"aB"\fP, \fB"Ab"\fP,
+and \fB"AB"\fP.
+As with all AWK variables, the initial value of
+.B IGNORECASE
+is zero, so all regular expression operations are normally case-sensitive.
+.TP
+.B NF
+The number of fields in the current input record.
+.TP
+.B NR
+The total number of input records seen so far.
+.TP
+.B OFMT
+The output format for numbers, \fB"%.6g"\fR, by default.
+.TP
+.B OFS
+The output field separator, a blank by default.
+.TP
+.B ORS
+The output record separator, by default a newline.
+.TP
+.B RS
+The input record separator, by default a newline.
+.B RS
+is exceptional in that only the first character of its string
+value is used for separating records.
+(This will probably change in a future release of
+.IR gawk .)
+If
+.B RS
+is set to the null string, then records are separated by
+blank lines.
+When
+.B RS
+is set to the null string, then the newline character always acts as
+a field separator, in addition to whatever value
+.B FS
+may have.
+.TP
+.B RSTART
+The index of the first character matched by
+.BR match() ;
+0 if no match.
+.TP
+.B RLENGTH
+The length of the string matched by
+.BR match() ;
+\-1 if no match.
+.TP
+.B SUBSEP
+The character used to separate multiple subscripts in array
+elements, by default \fB"\e034"\fR.
+.SS Arrays
+.PP
+Arrays are subscripted with an expression between square brackets
+.RB ( [ " and " ] ).
+If the expression is an expression list
+.RI ( expr ", " expr " ...)"
+then the array subscript is a string consisting of the
+concatenation of the (string) value of each expression,
+separated by the value of the
+.B SUBSEP
+variable.
+This facility is used to simulate multiply dimensioned
+arrays. For example:
+.PP
+.RS
+.ft B
+i = "A" ;\^ j = "B" ;\^ k = "C"
+.br
+x[i, j, k] = "hello, world\en"
+.ft R
+.RE
+.PP
+assigns the string \fB"hello, world\en"\fR to the element of the array
+.B x
+which is indexed by the string \fB"A\e034B\e034C"\fR. All arrays in AWK
+are associative, i.e. indexed by string values.
+.PP
+The special operator
+.B in
+may be used in an
+.B if
+or
+.B while
+statement to see if an array has an index consisting of a particular
+value.
+.PP
+.RS
+.ft B
+.nf
+if (val in array)
+ print array[val]
+.fi
+.ft
+.RE
+.PP
+If the array has multiple subscripts, use
+.BR "(i, j) in array" .
+.PP
+The
+.B in
+construct may also be used in a
+.B for
+loop to iterate over all the elements of an array.
+.PP
+An element may be deleted from an array using the
+.B delete
+statement.
+.SS Variable Typing And Conversion
+.PP
+Variables and fields
+may be (floating point) numbers, or strings, or both. How the
+value of a variable is interpreted depends upon its context. If used in
+a numeric expression, it will be treated as a number, if used as a string
+it will be treated as a string.
+.PP
+To force a variable to be treated as a number, add 0 to it; to force it
+to be treated as a string, concatenate it with the null string.
+.PP
+When a string must be converted to a number, the conversion is accomplished
+using
+.IR atof (3).
+A number is converted to a string by using the value of
+.B CONVFMT
+as a format string for
+.IR sprintf (3),
+with the numeric value of the variable as the argument.
+However, even though all numbers in AWK are floating-point,
+integral values are
+.I always
+converted as integers. Thus, given
+.PP
+.RS
+.ft B
+.nf
+CONVFMT = "%2.2f"
+a = 12
+b = a ""
+.fi
+.ft R
+.RE
+.PP
+the variable
+.B b
+has a value of \fB"12"\fR and not \fB"12.00"\fR.
+.PP
+.I Gawk
+performs comparisons as follows:
+If two variables are numeric, they are compared numerically.
+If one value is numeric and the other has a string value that is a
+``numeric string,'' then comparisons are also done numerically.
+Otherwise, the numeric value is converted to a string and a string
+comparison is performed.
+Two strings are compared, of course, as strings.
+According to the \*(PX standard, even if two strings are
+numeric strings, a numeric comparison is performed. However, this is
+clearly incorrect, and
+.I gawk
+does not do this.
+.PP
+Uninitialized variables have the numeric value 0 and the string value ""
+(the null, or empty, string).
+.SH PATTERNS AND ACTIONS
+AWK is a line oriented language. The pattern comes first, and then the
+action. Action statements are enclosed in
+.B {
+and
+.BR } .
+Either the pattern may be missing, or the action may be missing, but,
+of course, not both. If the pattern is missing, the action will be
+executed for every single line of input.
+A missing action is equivalent to
+.RS
+.PP
+.B "{ print }"
+.RE
+.PP
+which prints the entire line.
+.PP
+Comments begin with the ``#'' character, and continue until the
+end of the line.
+Blank lines may be used to separate statements.
+Normally, a statement ends with a newline, however, this is not the
+case for lines ending in
+a ``,'', ``{'', ``?'', ``:'', ``&&'', or ``||''.
+Lines ending in
+.B do
+or
+.B else
+also have their statements automatically continued on the following line.
+In other cases, a line can be continued by ending it with a ``\e'',
+in which case the newline will be ignored.
+.PP
+Multiple statements may
+be put on one line by separating them with a ``;''.
+This applies to both the statements within the action part of a
+pattern-action pair (the usual case),
+and to the pattern-action statements themselves.
+.SS Patterns
+AWK patterns may be one of the following:
+.PP
+.RS
+.nf
+.B BEGIN
+.B END
+.BI / "regular expression" /
+.I "relational expression"
+.IB pattern " && " pattern
+.IB pattern " || " pattern
+.IB pattern " ? " pattern " : " pattern
+.BI ( pattern )
+.BI ! " pattern"
+.IB pattern1 ", " pattern2
+.fi
+.RE
+.PP
+.B BEGIN
+and
+.B END
+are two special kinds of patterns which are not tested against
+the input.
+The action parts of all
+.B BEGIN
+patterns are merged as if all the statements had
+been written in a single
+.B BEGIN
+block. They are executed before any
+of the input is read. Similarly, all the
+.B END
+blocks are merged,
+and executed when all the input is exhausted (or when an
+.B exit
+statement is executed).
+.B BEGIN
+and
+.B END
+patterns cannot be combined with other patterns in pattern expressions.
+.B BEGIN
+and
+.B END
+patterns cannot have missing action parts.
+.PP
+For
+.BI / "regular expression" /
+patterns, the associated statement is executed for each input line that matches
+the regular expression.
+Regular expressions are the same as those in
+.IR egrep (1),
+and are summarized below.
+.PP
+A
+.I "relational expression"
+may use any of the operators defined below in the section on actions.
+These generally test whether certain fields match certain regular expressions.
+.PP
+The
+.BR && ,
+.BR || ,
+and
+.B !
+operators are logical AND, logical OR, and logical NOT, respectively, as in C.
+They do short-circuit evaluation, also as in C, and are used for combining
+more primitive pattern expressions. As in most languages, parentheses
+may be used to change the order of evaluation.
+.PP
+The
+.B ?\^:
+operator is like the same operator in C. If the first pattern is true
+then the pattern used for testing is the second pattern, otherwise it is
+the third. Only one of the second and third patterns is evaluated.
+.PP
+The
+.IB pattern1 ", " pattern2
+form of an expression is called a range pattern.
+It matches all input records starting with a line that matches
+.IR pattern1 ,
+and continuing until a record that matches
+.IR pattern2 ,
+inclusive. It does not combine with any other sort of pattern expression.
+.SS Regular Expressions
+Regular expressions are the extended kind found in
+.IR egrep .
+They are composed of characters as follows:
+.TP \w'\fB[^\fIabc...\fB]\fR'u+2n
+.I c
+matches the non-metacharacter
+.IR c .
+.TP
+.I \ec
+matches the literal character
+.IR c .
+.TP
+.B .
+matches any character except newline.
+.TP
+.B ^
+matches the beginning of a line or a string.
+.TP
+.B $
+matches the end of a line or a string.
+.TP
+.BI [ abc... ]
+character class, matches any of the characters
+.IR abc... .
+.TP
+.BI [^ abc... ]
+negated character class, matches any character except
+.I abc...
+and newline.
+.TP
+.IB r1 | r2
+alternation: matches either
+.I r1
+or
+.IR r2 .
+.TP
+.I r1r2
+concatenation: matches
+.IR r1 ,
+and then
+.IR r2 .
+.TP
+.IB r +
+matches one or more
+.IR r 's.
+.TP
+.IB r *
+matches zero or more
+.IR r 's.
+.TP
+.IB r ?
+matches zero or one
+.IR r 's.
+.TP
+.BI ( r )
+grouping: matches
+.IR r .
+.PP
+The escape sequences that are valid in string constants (see below)
+are also legal in regular expressions.
+.SS Actions
+Action statements are enclosed in braces,
+.B {
+and
+.BR } .
+Action statements consist of the usual assignment, conditional, and looping
+statements found in most languages. The operators, control statements,
+and input/output statements
+available are patterned after those in C.
+.SS Operators
+.PP
+The operators in AWK, in order of increasing precedence, are
+.PP
+.TP "\w'\fB*= /= %= ^=\fR'u+1n"
+.PD 0
+.B "= += \-="
+.TP
+.PD
+.B "*= /= %= ^="
+Assignment. Both absolute assignment
+.BI ( var " = " value )
+and operator-assignment (the other forms) are supported.
+.TP
+.B ?:
+The C conditional expression. This has the form
+.IB expr1 " ? " expr2 " : " expr3\c
+\&. If
+.I expr1
+is true, the value of the expression is
+.IR expr2 ,
+otherwise it is
+.IR expr3 .
+Only one of
+.I expr2
+and
+.I expr3
+is evaluated.
+.TP
+.B ||
+Logical OR.
+.TP
+.B &&
+Logical AND.
+.TP
+.B "~ !~"
+Regular expression match, negated match.
+.B NOTE:
+Do not use a constant regular expression
+.RB ( /foo/ )
+on the left-hand side of a
+.B ~
+or
+.BR !~ .
+Only use one on the right-hand side. The expression
+.BI "/foo/ ~ " exp
+has the same meaning as \fB(($0 ~ /foo/) ~ \fIexp\fB)\fR.
+This is usually
+.I not
+what was intended.
+.TP
+.PD 0
+.B "< >"
+.TP
+.PD 0
+.B "<= >="
+.TP
+.PD
+.B "!= =="
+The regular relational operators.
+.TP
+.I blank
+String concatenation.
+.TP
+.B "+ \-"
+Addition and subtraction.
+.TP
+.B "* / %"
+Multiplication, division, and modulus.
+.TP
+.B "+ \- !"
+Unary plus, unary minus, and logical negation.
+.TP
+.B ^
+Exponentiation (\fB**\fR may also be used, and \fB**=\fR for
+the assignment operator).
+.TP
+.B "++ \-\^\-"
+Increment and decrement, both prefix and postfix.
+.TP
+.B $
+Field reference.
+.SS Control Statements
+.PP
+The control statements are
+as follows:
+.PP
+.RS
+.nf
+\fBif (\fIcondition\fB) \fIstatement\fR [ \fBelse\fI statement \fR]
+\fBwhile (\fIcondition\fB) \fIstatement \fR
+\fBdo \fIstatement \fBwhile (\fIcondition\fB)\fR
+\fBfor (\fIexpr1\fB; \fIexpr2\fB; \fIexpr3\fB) \fIstatement\fR
+\fBfor (\fIvar \fBin\fI array\fB) \fIstatement\fR
+\fBbreak\fR
+\fBcontinue\fR
+\fBdelete \fIarray\^\fB[\^\fIindex\^\fB]\fR
+\fBexit\fR [ \fIexpression\fR ]
+\fB{ \fIstatements \fB}
+.fi
+.RE
+.SS "I/O Statements"
+.PP
+The input/output statements are as follows:
+.PP
+.TP "\w'\fBprintf \fIfmt, expr-list\fR'u+1n"
+.BI close( filename )
+Close file (or pipe, see below).
+.TP
+.B getline
+Set
+.B $0
+from next input record; set
+.BR NF ,
+.BR NR ,
+.BR FNR .
+.TP
+.BI "getline <" file
+Set
+.B $0
+from next record of
+.IR file ;
+set
+.BR NF .
+.TP
+.BI getline " var"
+Set
+.I var
+from next input record; set
+.BR NF ,
+.BR FNR .
+.TP
+.BI getline " var" " <" file
+Set
+.I var
+from next record of
+.IR file .
+.TP
+.B next
+Stop processing the current input record. The next input record
+is read and processing starts over with the first pattern in the
+AWK program. If the end of the input data is reached, the
+.B END
+block(s), if any, are executed.
+.TP
+.B "next file"
+Stop processing the current input file. The next input record read
+comes from the next input file.
+.B FILENAME
+is updated,
+.B FNR
+is reset to 1, and processing starts over with the first pattern in the
+AWK program. If the end of the input data is reached, the
+.B END
+block(s), if any, are executed.
+.TP
+.B print
+Prints the current record.
+.TP
+.BI print " expr-list"
+Prints expressions.
+.TP
+.BI print " expr-list" " >" file
+Prints expressions on
+.IR file .
+.TP
+.BI printf " fmt, expr-list"
+Format and print.
+.TP
+.BI printf " fmt, expr-list" " >" file
+Format and print on
+.IR file .
+.TP
+.BI system( cmd-line )
+Execute the command
+.IR cmd-line ,
+and return the exit status.
+(This may not be available on non-\*(PX systems.)
+.PP
+Other input/output redirections are also allowed. For
+.B print
+and
+.BR printf ,
+.BI >> file
+appends output to the
+.IR file ,
+while
+.BI | " command"
+writes on a pipe.
+In a similar fashion,
+.IB command " | getline"
+pipes into
+.BR getline .
+.BR Getline
+will return 0 on end of file, and \-1 on an error.
+.SS The \fIprintf\fP\^ Statement
+.PP
+The AWK versions of the
+.B printf
+statement and
+.B sprintf()
+function
+(see below)
+accept the following conversion specification formats:
+.TP
+.B %c
+An \s-1ASCII\s+1 character.
+If the argument used for
+.B %c
+is numeric, it is treated as a character and printed.
+Otherwise, the argument is assumed to be a string, and the only first
+character of that string is printed.
+.TP
+.B %d
+A decimal number (the integer part).
+.TP
+.B %i
+Just like
+.BR %d .
+.TP
+.B %e
+A floating point number of the form
+.BR [\-]d.ddddddE[+\^\-]dd .
+.TP
+.B %f
+A floating point number of the form
+.BR [\-]ddd.dddddd .
+.TP
+.B %g
+Use
+.B e
+or
+.B f
+conversion, whichever is shorter, with nonsignificant zeros suppressed.
+.TP
+.B %o
+An unsigned octal number (again, an integer).
+.TP
+.B %s
+A character string.
+.TP
+.B %x
+An unsigned hexadecimal number (an integer).
+.TP
+.B %X
+Like
+.BR %x ,
+but using
+.B ABCDEF
+instead of
+.BR abcdef .
+.TP
+.B %%
+A single
+.B %
+character; no argument is converted.
+.PP
+There are optional, additional parameters that may lie between the
+.B %
+and the control letter:
+.TP
+.B \-
+The expression should be left-justified within its field.
+.TP
+.I width
+The field should be padded to this width. If the number has a leading
+zero, then the field will be padded with zeros.
+Otherwise it is padded with blanks.
+.TP
+.BI . prec
+A number indicating the maximum width of strings or digits to the right
+of the decimal point.
+.PP
+The dynamic
+.I width
+and
+.I prec
+capabilities of the \*(AN C
+.B printf()
+routines are supported.
+A
+.B *
+in place of either the
+.B width
+or
+.B prec
+specifications will cause their values to be taken from
+the argument list to
+.B printf
+or
+.BR sprintf() .
+.SS Special File Names
+.PP
+When doing I/O redirection from either
+.B print
+or
+.B printf
+into a file,
+or via
+.B getline
+from a file,
+.I gawk
+recognizes certain special filenames internally. These filenames
+allow access to open file descriptors inherited from
+.IR gawk 's
+parent process (usually the shell).
+Other special filenames provide access information about the running
+.B gawk
+process.
+The filenames are:
+.TP \w'\fB/dev/stdout\fR'u+1n
+.B /dev/pid
+Reading this file returns the process ID of the current process,
+in decimal, terminated with a newline.
+.TP
+.B /dev/ppid
+Reading this file returns the parent process ID of the current process,
+in decimal, terminated with a newline.
+.TP
+.B /dev/pgrpid
+Reading this file returns the process group ID of the current process,
+in decimal, terminated with a newline.
+.TP
+.B /dev/user
+Reading this file returns a single record terminated with a newline.
+The fields are separated with blanks.
+.B $1
+is the value of the
+.IR getuid (2)
+system call,
+.B $2
+is the value of the
+.IR geteuid (2)
+system call,
+.B $3
+is the value of the
+.IR getgid (2)
+system call, and
+.B $4
+is the value of the
+.IR getegid (2)
+system call.
+If there are any additional fields, they are the group IDs returned by
+.IR getgroups (2).
+(Multiple groups may not be supported on all systems.)
+.TP
+.B /dev/stdin
+The standard input.
+.TP
+.B /dev/stdout
+The standard output.
+.TP
+.B /dev/stderr
+The standard error output.
+.TP
+.BI /dev/fd/\^ n
+The file associated with the open file descriptor
+.IR n .
+.PP
+These are particularly useful for error messages. For example:
+.PP
+.RS
+.ft B
+print "You blew it!" > "/dev/stderr"
+.ft R
+.RE
+.PP
+whereas you would otherwise have to use
+.PP
+.RS
+.ft B
+print "You blew it!" | "cat 1>&2"
+.ft R
+.RE
+.PP
+These file names may also be used on the command line to name data files.
+.SS Numeric Functions
+.PP
+AWK has the following pre-defined arithmetic functions:
+.PP
+.TP \w'\fBsrand(\^\fIexpr\^\fB)\fR'u+1n
+.BI atan2( y , " x" )
+returns the arctangent of
+.I y/x
+in radians.
+.TP
+.BI cos( expr )
+returns the cosine in radians.
+.TP
+.BI exp( expr )
+the exponential function.
+.TP
+.BI int( expr )
+truncates to integer.
+.TP
+.BI log( expr )
+the natural logarithm function.
+.TP
+.B rand()
+returns a random number between 0 and 1.
+.TP
+.BI sin( expr )
+returns the sine in radians.
+.TP
+.BI sqrt( expr )
+the square root function.
+.TP
+.BI srand( expr )
+use
+.I expr
+as a new seed for the random number generator. If no
+.I expr
+is provided, the time of day will be used.
+The return value is the previous seed for the random
+number generator.
+.SS String Functions
+.PP
+AWK has the following pre-defined string functions:
+.PP
+.TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
+\fBgsub(\fIr\fB, \fIs\fB, \fIt\fB)\fR
+for each substring matching the regular expression
+.I r
+in the string
+.IR t ,
+substitute the string
+.IR s ,
+and return the number of substitutions.
+If
+.I t
+is not supplied, use
+.BR $0 .
+.TP
+.BI index( s , " t" )
+returns the index of the string
+.I t
+in the string
+.IR s ,
+or 0 if
+.I t
+is not present.
+.TP
+.BI length( s )
+returns the length of the string
+.IR s ,
+or the length of
+.B $0
+if
+.I s
+is not supplied.
+.TP
+.BI match( s , " r" )
+returns the position in
+.I s
+where the regular expression
+.I r
+occurs, or 0 if
+.I r
+is not present, and sets the values of
+.B RSTART
+and
+.BR RLENGTH .
+.TP
+\fBsplit(\fIs\fB, \fIa\fB, \fIr\fB)\fR
+splits the string
+.I s
+into the array
+.I a
+on the regular expression
+.IR r ,
+and returns the number of fields. If
+.I r
+is omitted,
+.B FS
+is used instead.
+.TP
+.BI sprintf( fmt , " expr-list" )
+prints
+.I expr-list
+according to
+.IR fmt ,
+and returns the resulting string.
+.TP
+\fBsub(\fIr\fB, \fIs\fB, \fIt\fB)\fR
+just like
+.BR gsub() ,
+but only the first matching substring is replaced.
+.TP
+\fBsubstr(\fIs\fB, \fIi\fB, \fIn\fB)\fR
+returns the
+.IR n -character
+substring of
+.I s
+starting at
+.IR i .
+If
+.I n
+is omitted, the rest of
+.I s
+is used.
+.TP
+.BI tolower( str )
+returns a copy of the string
+.IR str ,
+with all the upper-case characters in
+.I str
+translated to their corresponding lower-case counterparts.
+Non-alphabetic characters are left unchanged.
+.TP
+.BI toupper( str )
+returns a copy of the string
+.IR str ,
+with all the lower-case characters in
+.I str
+translated to their corresponding upper-case counterparts.
+Non-alphabetic characters are left unchanged.
+.SS Time Functions
+.PP
+Since one of the primary uses of AWK programs is processing log files
+that contain time stamp information,
+.I gawk
+provides the following two functions for obtaining time stamps and
+formatting them.
+.PP
+.TP "\w'\fBsystime()\fR'u+1n"
+.B systime()
+returns the current time of day as the number of seconds since the Epoch
+(Midnight UTC, January 1, 1970 on \*(PX systems).
+.TP
+\fBstrftime(\fIformat\fR, \fItimestamp\fB)\fR
+formats
+.I timestamp
+according to the specification in
+.IR format.
+The
+.I timestamp
+should be of the same form as returned by
+.BR systime() .
+If
+.I timestamp
+is missing, the current time of day is used.
+See the specification for the
+.B strftime()
+function in \*(AN C for the format conversions that are
+guaranteed to be available.
+A public-domain version of
+.IR strftime (3)
+and a man page for it are shipped with
+.IR gawk ;
+if that version was used to build
+.IR gawk ,
+then all of the conversions described in that man page are available to
+.IR gawk.
+.SS String Constants
+.PP
+String constants in AWK are sequences of characters enclosed
+between double quotes (\fB"\fR). Within strings, certain
+.I "escape sequences"
+are recognized, as in C. These are:
+.PP
+.TP \w'\fB\e\^\fIddd\fR'u+1n
+.B \e\e
+A literal backslash.
+.TP
+.B \ea
+The ``alert'' character; usually the \s-1ASCII\s+1 \s-1BEL\s+1 character.
+.TP
+.B \eb
+backspace.
+.TP
+.B \ef
+form-feed.
+.TP
+.B \en
+new line.
+.TP
+.B \er
+carriage return.
+.TP
+.B \et
+horizontal tab.
+.TP
+.B \ev
+vertical tab.
+.TP
+.BI \ex "\^hex digits"
+The character represented by the string of hexadecimal digits following
+the
+.BR \ex .
+As in \*(AN C, all following hexadecimal digits are considered part of
+the escape sequence.
+(This feature should tell us something about language design by committee.)
+E.g., "\ex1B" is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
+.TP
+.BI \e ddd
+The character represented by the 1-, 2-, or 3-digit sequence of octal
+digits. E.g. "\e033" is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
+.TP
+.BI \e c
+The literal character
+.IR c\^ .
+.PP
+The escape sequences may also be used inside constant regular expressions
+(e.g.,
+.B "/[\ \et\ef\en\er\ev]/"
+matches whitespace characters).
+.SH FUNCTIONS
+Functions in AWK are defined as follows:
+.PP
+.RS
+\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements \fB}\fR
+.RE
+.PP
+Functions are executed when called from within the action parts of regular
+pattern-action statements. Actual parameters supplied in the function
+call are used to instantiate the formal parameters declared in the function.
+Arrays are passed by reference, other variables are passed by value.
+.PP
+Since functions were not originally part of the AWK language, the provision
+for local variables is rather clumsy: They are declared as extra parameters
+in the parameter list. The convention is to separate local variables from
+real parameters by extra spaces in the parameter list. For example:
+.PP
+.RS
+.ft B
+.nf
+function f(p, q, a, b) { # a & b are local
+ ..... }
+
+/abc/ { ... ; f(1, 2) ; ... }
+.fi
+.ft R
+.RE
+.PP
+The left parenthesis in a function call is required
+to immediately follow the function name,
+without any intervening white space.
+This is to avoid a syntactic ambiguity with the concatenation operator.
+This restriction does not apply to the built-in functions listed above.
+.PP
+Functions may call each other and may be recursive.
+Function parameters used as local variables are initialized
+to the null string and the number zero upon function invocation.
+.PP
+The word
+.B func
+may be used in place of
+.BR function .
+.SH EXAMPLES
+.nf
+Print and sort the login names of all users:
+
+.ft B
+ BEGIN { FS = ":" }
+ { print $1 | "sort" }
+
+.ft R
+Count lines in a file:
+
+.ft B
+ { nlines++ }
+ END { print nlines }
+
+.ft R
+Precede each line by its number in the file:
+
+.ft B
+ { print FNR, $0 }
+
+.ft R
+Concatenate and line number (a variation on a theme):
+
+.ft B
+ { print NR, $0 }
+.ft R
+.fi
+.SH SEE ALSO
+.IR egrep (1)
+.PP
+.IR "The AWK Programming Language" ,
+Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger,
+Addison-Wesley, 1988. ISBN 0-201-07981-X.
+.PP
+.IR "The GAWK Manual" ,
+Edition 0.15, published by the Free Software Foundation, 1993.
+.SH POSIX COMPATIBILITY
+A primary goal for
+.I gawk
+is compatibility with the \*(PX standard, as well as with the
+latest version of \*(UX
+.IR awk .
+To this end,
+.I gawk
+incorporates the following user visible
+features which are not described in the AWK book,
+but are part of
+.I awk
+in System V Release 4, and are in the \*(PX standard.
+.PP
+The
+.B \-v
+option for assigning variables before program execution starts is new.
+The book indicates that command line variable assignment happens when
+.I awk
+would otherwise open the argument as a file, which is after the
+.B BEGIN
+block is executed. However, in earlier implementations, when such an
+assignment appeared before any file names, the assignment would happen
+.I before
+the
+.B BEGIN
+block was run. Applications came to depend on this ``feature.''
+When
+.I awk
+was changed to match its documentation, this option was added to
+accomodate applications that depended upon the old behavior.
+(This feature was agreed upon by both the AT&T and GNU developers.)
+.PP
+The
+.B \-W
+option for implementation specific features is from the \*(PX standard.
+.PP
+When processing arguments,
+.I gawk
+uses the special option ``\fB\-\^\-\fP'' to signal the end of
+arguments, and warns about, but otherwise ignores, undefined options.
+.PP
+The AWK book does not define the return value of
+.BR srand() .
+The System V Release 4 version of \*(UX
+.I awk
+(and the \*(PX standard)
+has it return the seed it was using, to allow keeping track
+of random number sequences. Therefore
+.B srand()
+in
+.I gawk
+also returns its current seed.
+.PP
+Other new features are:
+The use of multiple
+.B \-f
+options (from MKS
+.IR awk );
+the
+.B ENVIRON
+array; the
+.BR \ea ,
+and
+.BR \ev
+escape sequences (done originally in
+.I gawk
+and fed back into AT&T's); the
+.B tolower()
+and
+.B toupper()
+built-in functions (from AT&T); and the \*(AN C conversion specifications in
+.B printf
+(done first in AT&T's version).
+.SH GNU EXTENSIONS
+.I Gawk
+has some extensions to \*(PX
+.IR awk .
+They are described in this section. All the extensions described here
+can be disabled by
+invoking
+.I gawk
+with the
+.B "\-W compat"
+option.
+.PP
+The following features of
+.I gawk
+are not available in
+\*(PX
+.IR awk .
+.RS
+.TP \w'\(bu'u+1n
+\(bu
+The
+.B \ex
+escape sequence.
+.TP
+\(bu
+The
+.B systime()
+and
+.B strftime()
+functions.
+.TP
+\(bu
+The special file names available for I/O redirection are not recognized.
+.TP
+\(bu
+The
+.B ARGIND
+and
+.B ERRNO
+variables are not special.
+.TP
+\(bu
+The
+.B IGNORECASE
+variable and its side-effects are not available.
+.TP
+\(bu
+The
+.B FIELDWIDTHS
+variable and fixed width field splitting.
+.TP
+\(bu
+No path search is performed for files named via the
+.B \-f
+option. Therefore the
+.B AWKPATH
+environment variable is not special.
+.TP
+\(bu
+The use of
+.B "next file"
+to abandon processing of the current input file.
+.RE
+.PP
+The AWK book does not define the return value of the
+.B close()
+function.
+.IR Gawk\^ 's
+.B close()
+returns the value from
+.IR fclose (3),
+or
+.IR pclose (3),
+when closing a file or pipe, respectively.
+.PP
+When
+.I gawk
+is invoked with the
+.B "\-W compat"
+option,
+if the
+.I fs
+argument to the
+.B \-F
+option is ``t'', then
+.B FS
+will be set to the tab character.
+Since this is a rather ugly special case, it is not the default behavior.
+This behavior also does not occur if
+.B \-Wposix
+has been specified.
+.ig
+.PP
+If
+.I gawk
+was compiled for debugging, it will
+accept the following additional options:
+.TP
+.PD 0
+.B \-Wparsedebug
+.TP
+.PD
+.B \-\^\-parsedebug
+Turn on
+.IR yacc (1)
+or
+.IR bison (1)
+debugging output during program parsing.
+This option should only be of interest to the
+.I gawk
+maintainers, and may not even be compiled into
+.IR gawk .
+..
+.SH HISTORICAL FEATURES
+There are two features of historical AWK implementations that
+.I gawk
+supports.
+First, it is possible to call the
+.B length()
+built-in function not only with no argument, but even without parentheses!
+Thus,
+.RS
+.PP
+.ft B
+a = length
+.ft R
+.RE
+.PP
+is the same as either of
+.RS
+.PP
+.ft B
+a = length()
+.br
+a = length($0)
+.ft R
+.RE
+.PP
+This feature is marked as ``deprecated'' in the \*(PX standard, and
+.I gawk
+will issue a warning about its use if
+.B \-Wlint
+is specified on the command line.
+.PP
+The other feature is the use of the
+.B continue
+statement outside the body of a
+.BR while ,
+.BR for ,
+or
+.B do
+loop. Traditional AWK implementations have treated such usage as
+equivalent to the
+.B next
+statement.
+.I Gawk
+will support this usage if
+.B \-Wposix
+has not been specified.
+.SH BUGS
+The
+.B \-F
+option is not necessary given the command line variable assignment feature;
+it remains only for backwards compatibility.
+.PP
+If your system actually has support for
+.B /dev/fd
+and the associated
+.BR /dev/stdin ,
+.BR /dev/stdout ,
+and
+.B /dev/stderr
+files, you may get different output from
+.I gawk
+than you would get on a system without those files. When
+.I gawk
+interprets these files internally, it synchronizes output to the standard
+output with output to
+.BR /dev/stdout ,
+while on a system with those files, the output is actually to different
+open files.
+Caveat Emptor.
+.SH VERSION INFORMATION
+This man page documents
+.IR gawk ,
+version 2.15.
+.PP
+Starting with the 2.15 version of
+.IR gawk ,
+the
+.BR \-c ,
+.BR \-V ,
+.BR \-C ,
+.ig
+.BR \-D ,
+..
+.BR \-a ,
+and
+.B \-e
+options of the 2.11 version are no longer recognized.
+.SH AUTHORS
+The original version of \*(UX
+.I awk
+was designed and implemented by Alfred Aho,
+Peter Weinberger, and Brian Kernighan of AT&T Bell Labs. Brian Kernighan
+continues to maintain and enhance it.
+.PP
+Paul Rubin and Jay Fenlason,
+of the Free Software Foundation, wrote
+.IR gawk ,
+to be compatible with the original version of
+.I awk
+distributed in Seventh Edition \*(UX.
+John Woods contributed a number of bug fixes.
+David Trueman, with contributions
+from Arnold Robbins, made
+.I gawk
+compatible with the new version of \*(UX
+.IR awk .
+.PP
+The initial DOS port was done by Conrad Kwok and Scott Garfinkle.
+Scott Deifik is the current DOS maintainer. Pat Rankin did the
+port to VMS, and Michal Jaegermann did the port to the Atari ST.
+.SH ACKNOWLEDGEMENTS
+Brian Kernighan of Bell Labs
+provided valuable assistance during testing and debugging.
+We thank him.
diff --git a/gnu/usr.bin/awk/awk.h b/gnu/usr.bin/awk/awk.h
new file mode 100644
index 000000000000..ca3997f11d4b
--- /dev/null
+++ b/gnu/usr.bin/awk/awk.h
@@ -0,0 +1,763 @@
+/*
+ * awk.h -- Definitions for gawk.
+ */
+
+/*
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ *
+ * This file is part of GAWK, the GNU implementation of the
+ * AWK Progamming Language.
+ *
+ * GAWK is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * GAWK is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with GAWK; see the file COPYING. If not, write to
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+/* ------------------------------ Includes ------------------------------ */
+#include <stdio.h>
+#include <limits.h>
+#include <ctype.h>
+#include <setjmp.h>
+#include <varargs.h>
+#include <time.h>
+#include <errno.h>
+#if !defined(errno) && !defined(MSDOS)
+extern int errno;
+#endif
+#ifdef __GNU_LIBRARY__
+#ifndef linux
+#include <signum.h>
+#endif
+#endif
+
+/* ----------------- System dependencies (with more includes) -----------*/
+
+#if !defined(VMS) || (!defined(VAXC) && !defined(__DECC))
+#include <sys/types.h>
+#include <sys/stat.h>
+#else /* VMS w/ VAXC or DECC */
+#include <types.h>
+#include <stat.h>
+#include <file.h> /* avoid <fcntl.h> in io.c */
+#endif
+
+#include <signal.h>
+
+#include "config.h"
+
+#ifdef __STDC__
+#define P(s) s
+#define MALLOC_ARG_T size_t
+#else
+#define P(s) ()
+#define MALLOC_ARG_T unsigned
+#define volatile
+#define const
+#endif
+
+#ifndef SIGTYPE
+#define SIGTYPE void
+#endif
+
+#ifdef SIZE_T_MISSING
+typedef unsigned int size_t;
+#endif
+
+#ifndef SZTC
+#define SZTC
+#define INTC
+#endif
+
+#ifdef STDC_HEADERS
+#include <stdlib.h>
+#include <string.h>
+#ifdef NeXT
+#include <libc.h>
+#undef atof
+#else
+#if defined(atarist) || defined(VMS)
+#include <unixlib.h>
+#else /* atarist || VMS */
+#ifndef MSDOS
+#include <unistd.h>
+#endif /* MSDOS */
+#endif /* atarist || VMS */
+#endif /* Next */
+#else /* STDC_HEADERS */
+#include "protos.h"
+#endif /* STDC_HEADERS */
+
+#if defined(ultrix) && !defined(Ultrix41)
+extern char * getenv P((char *name));
+extern double atof P((char *s));
+#endif
+
+#ifndef __GNUC__
+#ifdef sparc
+/* nasty nasty SunOS-ism */
+#include <alloca.h>
+#ifdef lint
+extern char *alloca();
+#endif
+#else /* not sparc */
+#if !defined(alloca) && !defined(ALLOCA_PROTO)
+extern char *alloca();
+#endif
+#endif /* sparc */
+#endif /* __GNUC__ */
+
+#ifdef HAVE_UNDERSCORE_SETJMP
+/* nasty nasty berkelixm */
+#define setjmp _setjmp
+#define longjmp _longjmp
+#endif
+
+/*
+ * if you don't have vprintf, try this and cross your fingers.
+ */
+#if defined(VPRINTF_MISSING)
+#define vfprintf(fp,fmt,arg) _doprnt((fmt), (arg), (fp))
+#endif
+
+#ifdef VMS
+/* some macros to redirect to code in vms/vms_misc.c */
+#define exit vms_exit
+#define open vms_open
+#define strerror vms_strerror
+#define strdup vms_strdup
+extern void exit P((int));
+extern int open P((const char *,int,...));
+extern char *strerror P((int));
+extern char *strdup P((const char *str));
+extern int vms_devopen P((const char *,int));
+# ifndef NO_TTY_FWRITE
+#define fwrite tty_fwrite
+#define fclose tty_fclose
+extern size_t fwrite P((const void *,size_t,size_t,FILE *));
+extern int fclose P((FILE *));
+# endif
+extern FILE *popen P((const char *,const char *));
+extern int pclose P((FILE *));
+extern void vms_arg_fixup P((int *,char ***));
+/* some things not in STDC_HEADERS */
+extern int gnu_strftime P((char *,size_t,const char *,const struct tm *));
+extern int unlink P((const char *));
+extern int getopt P((int,char **,char *));
+extern int isatty P((int));
+#ifndef fileno
+extern int fileno P((FILE *));
+#endif
+extern int close(), dup(), dup2(), fstat(), read(), stat();
+#endif /*VMS*/
+
+#ifdef MSDOS
+#include <io.h>
+extern FILE *popen P((char *, char *));
+extern int pclose P((FILE *));
+#endif
+
+#define GNU_REGEX
+#ifdef GNU_REGEX
+#include "regex.h"
+#include "dfa.h"
+typedef struct Regexp {
+ struct re_pattern_buffer pat;
+ struct re_registers regs;
+ struct regexp dfareg;
+ int dfa;
+} Regexp;
+#define RESTART(rp,s) (rp)->regs.start[0]
+#define REEND(rp,s) (rp)->regs.end[0]
+#else /* GNU_REGEX */
+#endif /* GNU_REGEX */
+
+#ifdef atarist
+#define read _text_read /* we do not want all these CR's to mess our input */
+extern int _text_read (int, char *, int);
+#endif
+
+#ifndef DEFPATH
+#define DEFPATH ".:/usr/local/lib/awk:/usr/lib/awk"
+#endif
+
+#ifndef ENVSEP
+#define ENVSEP ':'
+#endif
+
+/* ------------------ Constants, Structures, Typedefs ------------------ */
+#define AWKNUM double
+
+typedef enum {
+ /* illegal entry == 0 */
+ Node_illegal,
+
+ /* binary operators lnode and rnode are the expressions to work on */
+ Node_times,
+ Node_quotient,
+ Node_mod,
+ Node_plus,
+ Node_minus,
+ Node_cond_pair, /* conditional pair (see Node_line_range) */
+ Node_subscript,
+ Node_concat,
+ Node_exp,
+
+ /* unary operators subnode is the expression to work on */
+/*10*/ Node_preincrement,
+ Node_predecrement,
+ Node_postincrement,
+ Node_postdecrement,
+ Node_unary_minus,
+ Node_field_spec,
+
+ /* assignments lnode is the var to assign to, rnode is the exp */
+ Node_assign,
+ Node_assign_times,
+ Node_assign_quotient,
+ Node_assign_mod,
+/*20*/ Node_assign_plus,
+ Node_assign_minus,
+ Node_assign_exp,
+
+ /* boolean binaries lnode and rnode are expressions */
+ Node_and,
+ Node_or,
+
+ /* binary relationals compares lnode and rnode */
+ Node_equal,
+ Node_notequal,
+ Node_less,
+ Node_greater,
+ Node_leq,
+/*30*/ Node_geq,
+ Node_match,
+ Node_nomatch,
+
+ /* unary relationals works on subnode */
+ Node_not,
+
+ /* program structures */
+ Node_rule_list, /* lnode is a rule, rnode is rest of list */
+ Node_rule_node, /* lnode is pattern, rnode is statement */
+ Node_statement_list, /* lnode is statement, rnode is more list */
+ Node_if_branches, /* lnode is to run on true, rnode on false */
+ Node_expression_list, /* lnode is an exp, rnode is more list */
+ Node_param_list, /* lnode is a variable, rnode is more list */
+
+ /* keywords */
+/*40*/ Node_K_if, /* lnode is conditonal, rnode is if_branches */
+ Node_K_while, /* lnode is condtional, rnode is stuff to run */
+ Node_K_for, /* lnode is for_struct, rnode is stuff to run */
+ Node_K_arrayfor, /* lnode is for_struct, rnode is stuff to run */
+ Node_K_break, /* no subs */
+ Node_K_continue, /* no stuff */
+ Node_K_print, /* lnode is exp_list, rnode is redirect */
+ Node_K_printf, /* lnode is exp_list, rnode is redirect */
+ Node_K_next, /* no subs */
+ Node_K_exit, /* subnode is return value, or NULL */
+/*50*/ Node_K_do, /* lnode is conditional, rnode stuff to run */
+ Node_K_return,
+ Node_K_delete,
+ Node_K_getline,
+ Node_K_function, /* lnode is statement list, rnode is params */
+
+ /* I/O redirection for print statements */
+ Node_redirect_output, /* subnode is where to redirect */
+ Node_redirect_append, /* subnode is where to redirect */
+ Node_redirect_pipe, /* subnode is where to redirect */
+ Node_redirect_pipein, /* subnode is where to redirect */
+ Node_redirect_input, /* subnode is where to redirect */
+
+ /* Variables */
+/*60*/ Node_var, /* rnode is value, lnode is array stuff */
+ Node_var_array, /* array is ptr to elements, asize num of
+ * eles */
+ Node_val, /* node is a value - type in flags */
+
+ /* Builtins subnode is explist to work on, proc is func to call */
+ Node_builtin,
+
+ /*
+ * pattern: conditional ',' conditional ; lnode of Node_line_range
+ * is the two conditionals (Node_cond_pair), other word (rnode place)
+ * is a flag indicating whether or not this range has been entered.
+ */
+ Node_line_range,
+
+ /*
+ * boolean test of membership in array lnode is string-valued
+ * expression rnode is array name
+ */
+ Node_in_array,
+
+ Node_func, /* lnode is param. list, rnode is body */
+ Node_func_call, /* lnode is name, rnode is argument list */
+
+ Node_cond_exp, /* lnode is conditonal, rnode is if_branches */
+ Node_regex,
+/*70*/ Node_hashnode,
+ Node_ahash,
+ Node_NF,
+ Node_NR,
+ Node_FNR,
+ Node_FS,
+ Node_RS,
+ Node_FIELDWIDTHS,
+ Node_IGNORECASE,
+ Node_OFS,
+ Node_ORS,
+ Node_OFMT,
+ Node_CONVFMT,
+ Node_K_nextfile
+} NODETYPE;
+
+/*
+ * NOTE - this struct is a rather kludgey -- it is packed to minimize
+ * space usage, at the expense of cleanliness. Alter at own risk.
+ */
+typedef struct exp_node {
+ union {
+ struct {
+ union {
+ struct exp_node *lptr;
+ char *param_name;
+ } l;
+ union {
+ struct exp_node *rptr;
+ struct exp_node *(*pptr) ();
+ Regexp *preg;
+ struct for_loop_header *hd;
+ struct exp_node **av;
+ int r_ent; /* range entered */
+ } r;
+ union {
+ char *name;
+ struct exp_node *extra;
+ } x;
+ short number;
+ unsigned char reflags;
+# define CASE 1
+# define CONST 2
+# define FS_DFLT 4
+ } nodep;
+ struct {
+ AWKNUM fltnum; /* this is here for optimal packing of
+ * the structure on many machines
+ */
+ char *sp;
+ size_t slen;
+ unsigned char sref;
+ char idx;
+ } val;
+ struct {
+ struct exp_node *next;
+ char *name;
+ int length;
+ struct exp_node *value;
+ } hash;
+#define hnext sub.hash.next
+#define hname sub.hash.name
+#define hlength sub.hash.length
+#define hvalue sub.hash.value
+ struct {
+ struct exp_node *next;
+ struct exp_node *name;
+ struct exp_node *value;
+ } ahash;
+#define ahnext sub.ahash.next
+#define ahname sub.ahash.name
+#define ahvalue sub.ahash.value
+ } sub;
+ NODETYPE type;
+ unsigned short flags;
+# define MALLOC 1 /* can be free'd */
+# define TEMP 2 /* should be free'd */
+# define PERM 4 /* can't be free'd */
+# define STRING 8 /* assigned as string */
+# define STR 16 /* string value is current */
+# define NUM 32 /* numeric value is current */
+# define NUMBER 64 /* assigned as number */
+# define MAYBE_NUM 128 /* user input: if NUMERIC then
+ * a NUMBER
+ */
+ char *vname; /* variable's name */
+} NODE;
+
+#define lnode sub.nodep.l.lptr
+#define nextp sub.nodep.l.lptr
+#define rnode sub.nodep.r.rptr
+#define source_file sub.nodep.x.name
+#define source_line sub.nodep.number
+#define param_cnt sub.nodep.number
+#define param sub.nodep.l.param_name
+
+#define subnode lnode
+#define proc sub.nodep.r.pptr
+
+#define re_reg sub.nodep.r.preg
+#define re_flags sub.nodep.reflags
+#define re_text lnode
+#define re_exp sub.nodep.x.extra
+#define re_cnt sub.nodep.number
+
+#define forsub lnode
+#define forloop rnode->sub.nodep.r.hd
+
+#define stptr sub.val.sp
+#define stlen sub.val.slen
+#define stref sub.val.sref
+#define stfmt sub.val.idx
+
+#define numbr sub.val.fltnum
+
+#define var_value lnode
+#define var_array sub.nodep.r.av
+
+#define condpair lnode
+#define triggered sub.nodep.r.r_ent
+
+#ifdef DONTDEF
+int primes[] = {31, 61, 127, 257, 509, 1021, 2053, 4099, 8191, 16381};
+#endif
+/* a quick profile suggests that the following is a good value */
+#define HASHSIZE 127
+
+typedef struct for_loop_header {
+ NODE *init;
+ NODE *cond;
+ NODE *incr;
+} FOR_LOOP_HEADER;
+
+/* for "for(iggy in foo) {" */
+struct search {
+ NODE **arr_ptr;
+ NODE **arr_end;
+ NODE *bucket;
+ NODE *retval;
+};
+
+/* for faster input, bypass stdio */
+typedef struct iobuf {
+ int fd;
+ char *buf;
+ char *off;
+ char *end;
+ size_t size; /* this will be determined by an fstat() call */
+ int cnt;
+ long secsiz;
+ int flag;
+# define IOP_IS_TTY 1
+# define IOP_IS_INTERNAL 2
+# define IOP_NO_FREE 4
+} IOBUF;
+
+typedef void (*Func_ptr)();
+
+/*
+ * structure used to dynamically maintain a linked-list of open files/pipes
+ */
+struct redirect {
+ unsigned int flag;
+# define RED_FILE 1
+# define RED_PIPE 2
+# define RED_READ 4
+# define RED_WRITE 8
+# define RED_APPEND 16
+# define RED_NOBUF 32
+# define RED_USED 64
+# define RED_EOF 128
+ char *value;
+ FILE *fp;
+ IOBUF *iop;
+ int pid;
+ int status;
+ struct redirect *prev;
+ struct redirect *next;
+};
+
+/* structure for our source, either a command line string or a source file */
+struct src {
+ enum srctype { CMDLINE = 1, SOURCEFILE } stype;
+ char *val;
+};
+
+/* longjmp return codes, must be nonzero */
+/* Continue means either for loop/while continue, or next input record */
+#define TAG_CONTINUE 1
+/* Break means either for/while break, or stop reading input */
+#define TAG_BREAK 2
+/* Return means return from a function call; leave value in ret_node */
+#define TAG_RETURN 3
+
+#define HUGE INT_MAX
+
+/* -------------------------- External variables -------------------------- */
+/* gawk builtin variables */
+extern int NF;
+extern int NR;
+extern int FNR;
+extern int IGNORECASE;
+extern char *RS;
+extern char *OFS;
+extern int OFSlen;
+extern char *ORS;
+extern int ORSlen;
+extern char *OFMT;
+extern char *CONVFMT;
+extern int CONVFMTidx;
+extern int OFMTidx;
+extern NODE *FS_node, *NF_node, *RS_node, *NR_node;
+extern NODE *FILENAME_node, *OFS_node, *ORS_node, *OFMT_node;
+extern NODE *CONVFMT_node;
+extern NODE *FNR_node, *RLENGTH_node, *RSTART_node, *SUBSEP_node;
+extern NODE *IGNORECASE_node;
+extern NODE *FIELDWIDTHS_node;
+
+extern NODE **stack_ptr;
+extern NODE *Nnull_string;
+extern NODE **fields_arr;
+extern int sourceline;
+extern char *source;
+extern NODE *expression_value;
+
+extern NODE *_t; /* used as temporary in tree_eval */
+
+extern const char *myname;
+
+extern NODE *nextfree;
+extern int field0_valid;
+extern int do_unix;
+extern int do_posix;
+extern int do_lint;
+extern int in_begin_rule;
+extern int in_end_rule;
+
+/* ------------------------- Pseudo-functions ------------------------- */
+
+#define is_identchar(c) (isalnum(c) || (c) == '_')
+
+
+#ifndef MPROF
+#define getnode(n) if (nextfree) n = nextfree, nextfree = nextfree->nextp;\
+ else n = more_nodes()
+#define freenode(n) ((n)->nextp = nextfree, nextfree = (n))
+#else
+#define getnode(n) emalloc(n, NODE *, sizeof(NODE), "getnode")
+#define freenode(n) free(n)
+#endif
+
+#ifdef DEBUG
+#define tree_eval(t) r_tree_eval(t)
+#else
+#define tree_eval(t) (_t = (t),(_t) == NULL ? Nnull_string : \
+ ((_t)->type == Node_val ? (_t) : \
+ ((_t)->type == Node_var ? (_t)->var_value : \
+ ((_t)->type == Node_param_list ? \
+ (stack_ptr[(_t)->param_cnt])->var_value : \
+ r_tree_eval((_t))))))
+#endif
+
+#define make_number(x) mk_number((x), (MALLOC|NUM|NUMBER))
+#define tmp_number(x) mk_number((x), (MALLOC|TEMP|NUM|NUMBER))
+
+#define free_temp(n) do {if ((n)->flags&TEMP) { unref(n); }} while (0)
+#define make_string(s,l) make_str_node((s), SZTC (l),0)
+#define SCAN 1
+#define ALREADY_MALLOCED 2
+
+#define cant_happen() fatal("internal error line %d, file: %s", \
+ __LINE__, __FILE__);
+
+#if defined(__STDC__) && !defined(NO_TOKEN_PASTING)
+#define emalloc(var,ty,x,str) (void)((var=(ty)malloc((MALLOC_ARG_T)(x))) ||\
+ (fatal("%s: %s: can't allocate memory (%s)",\
+ (str), #var, strerror(errno)),0))
+#define erealloc(var,ty,x,str) (void)((var=(ty)realloc((char *)var,\
+ (MALLOC_ARG_T)(x))) ||\
+ (fatal("%s: %s: can't allocate memory (%s)",\
+ (str), #var, strerror(errno)),0))
+#else /* __STDC__ */
+#define emalloc(var,ty,x,str) (void)((var=(ty)malloc((MALLOC_ARG_T)(x))) ||\
+ (fatal("%s: %s: can't allocate memory (%s)",\
+ (str), "var", strerror(errno)),0))
+#define erealloc(var,ty,x,str) (void)((var=(ty)realloc((char *)var,\
+ (MALLOC_ARG_T)(x))) ||\
+ (fatal("%s: %s: can't allocate memory (%s)",\
+ (str), "var", strerror(errno)),0))
+#endif /* __STDC__ */
+
+#ifdef DEBUG
+#define force_number r_force_number
+#define force_string r_force_string
+#else /* not DEBUG */
+#ifdef lint
+extern AWKNUM force_number();
+#endif
+#ifdef MSDOS
+extern double _msc51bug;
+#define force_number(n) (_msc51bug=(_t = (n),(_t->flags & NUM) ? _t->numbr : r_force_number(_t)))
+#else /* not MSDOS */
+#define force_number(n) (_t = (n),(_t->flags & NUM) ? _t->numbr : r_force_number(_t))
+#endif /* MSDOS */
+#define force_string(s) (_t = (s),(_t->flags & STR) ? _t : r_force_string(_t))
+#endif /* not DEBUG */
+
+#define STREQ(a,b) (*(a) == *(b) && strcmp((a), (b)) == 0)
+#define STREQN(a,b,n) ((n)&& *(a)== *(b) && strncmp((a), (b), SZTC (n)) == 0)
+
+/* ------------- Function prototypes or defs (as appropriate) ------------- */
+
+/* array.c */
+extern NODE *concat_exp P((NODE *tree));
+extern void assoc_clear P((NODE *symbol));
+extern unsigned int hash P((char *s, int len));
+extern int in_array P((NODE *symbol, NODE *subs));
+extern NODE **assoc_lookup P((NODE *symbol, NODE *subs));
+extern void do_delete P((NODE *symbol, NODE *tree));
+extern void assoc_scan P((NODE *symbol, struct search *lookat));
+extern void assoc_next P((struct search *lookat));
+/* awk.tab.c */
+extern char *tokexpand P((void));
+extern char nextc P((void));
+extern NODE *node P((NODE *left, NODETYPE op, NODE *right));
+extern NODE *install P((char *name, NODE *value));
+extern NODE *lookup P((char *name));
+extern NODE *variable P((char *name, int can_free));
+extern int yyparse P((void));
+/* builtin.c */
+extern NODE *do_exp P((NODE *tree));
+extern NODE *do_index P((NODE *tree));
+extern NODE *do_int P((NODE *tree));
+extern NODE *do_length P((NODE *tree));
+extern NODE *do_log P((NODE *tree));
+extern NODE *do_sprintf P((NODE *tree));
+extern void do_printf P((NODE *tree));
+extern void print_simple P((NODE *tree, FILE *fp));
+extern NODE *do_sqrt P((NODE *tree));
+extern NODE *do_substr P((NODE *tree));
+extern NODE *do_strftime P((NODE *tree));
+extern NODE *do_systime P((NODE *tree));
+extern NODE *do_system P((NODE *tree));
+extern void do_print P((NODE *tree));
+extern NODE *do_tolower P((NODE *tree));
+extern NODE *do_toupper P((NODE *tree));
+extern NODE *do_atan2 P((NODE *tree));
+extern NODE *do_sin P((NODE *tree));
+extern NODE *do_cos P((NODE *tree));
+extern NODE *do_rand P((NODE *tree));
+extern NODE *do_srand P((NODE *tree));
+extern NODE *do_match P((NODE *tree));
+extern NODE *do_gsub P((NODE *tree));
+extern NODE *do_sub P((NODE *tree));
+/* eval.c */
+extern int interpret P((NODE *volatile tree));
+extern NODE *r_tree_eval P((NODE *tree));
+extern int cmp_nodes P((NODE *t1, NODE *t2));
+extern NODE **get_lhs P((NODE *ptr, Func_ptr *assign));
+extern void set_IGNORECASE P((void));
+void set_OFS P((void));
+void set_ORS P((void));
+void set_OFMT P((void));
+void set_CONVFMT P((void));
+/* field.c */
+extern void init_fields P((void));
+extern void set_record P((char *buf, int cnt, int freeold));
+extern void reset_record P((void));
+extern void set_NF P((void));
+extern NODE **get_field P((int num, Func_ptr *assign));
+extern NODE *do_split P((NODE *tree));
+extern void set_FS P((void));
+extern void set_RS P((void));
+extern void set_FIELDWIDTHS P((void));
+/* io.c */
+extern void set_FNR P((void));
+extern void set_NR P((void));
+extern void do_input P((void));
+extern struct redirect *redirect P((NODE *tree, int *errflg));
+extern NODE *do_close P((NODE *tree));
+extern int flush_io P((void));
+extern int close_io P((void));
+extern int devopen P((char *name, char *mode));
+extern int pathopen P((char *file));
+extern NODE *do_getline P((NODE *tree));
+extern void do_nextfile P((void));
+/* iop.c */
+extern int optimal_bufsize P((int fd));
+extern IOBUF *iop_alloc P((int fd));
+extern int get_a_record P((char **out, IOBUF *iop, int rs, int *errcode));
+/* main.c */
+extern int main P((int argc, char **argv));
+extern Regexp *mk_re_parse P((char *s, int ignorecase));
+extern void load_environ P((void));
+extern char *arg_assign P((char *arg));
+extern SIGTYPE catchsig P((int sig, int code));
+/* msg.c */
+#ifdef MSDOS
+extern void err P((char *s, char *emsg, char *va_list, ...));
+extern void msg P((char *va_alist, ...));
+extern void warning P((char *va_alist, ...));
+extern void fatal P((char *va_alist, ...));
+#else
+extern void err ();
+extern void msg ();
+extern void warning ();
+extern void fatal ();
+#endif
+/* node.c */
+extern AWKNUM r_force_number P((NODE *n));
+extern NODE *r_force_string P((NODE *s));
+extern NODE *dupnode P((NODE *n));
+extern NODE *mk_number P((AWKNUM x, unsigned int flags));
+extern NODE *make_str_node P((char *s, size_t len, int scan ));
+extern NODE *tmp_string P((char *s, size_t len ));
+extern NODE *more_nodes P((void));
+#ifdef DEBUG
+extern void freenode P((NODE *it));
+#endif
+extern void unref P((NODE *tmp));
+extern int parse_escape P((char **string_ptr));
+/* re.c */
+extern Regexp *make_regexp P((char *s, int len, int ignorecase, int dfa));
+extern int research P((Regexp *rp, char *str, int start, int len, int need_start));
+extern void refree P((Regexp *rp));
+extern void reg_error P((const char *s));
+extern Regexp *re_update P((NODE *t));
+extern void resyntax P((int syntax));
+extern void resetup P((void));
+
+/* strcase.c */
+extern int strcasecmp P((const char *s1, const char *s2));
+extern int strncasecmp P((const char *s1, const char *s2, register size_t n));
+
+#ifdef atarist
+/* atari/tmpnam.c */
+extern char *tmpnam P((char *buf));
+extern char *tempnam P((const char *path, const char *base));
+#endif
+
+/* Figure out what '\a' really is. */
+#ifdef __STDC__
+#define BELL '\a' /* sure makes life easy, don't it? */
+#else
+# if 'z' - 'a' == 25 /* ascii */
+# if 'a' != 97 /* machine is dumb enough to use mark parity */
+# define BELL '\207'
+# else
+# define BELL '\07'
+# endif
+# else
+# define BELL '\057'
+# endif
+#endif
+
+extern char casetable[]; /* for case-independent regexp matching */
diff --git a/gnu/usr.bin/awk/awk.y b/gnu/usr.bin/awk/awk.y
new file mode 100644
index 000000000000..6e87f1c449cc
--- /dev/null
+++ b/gnu/usr.bin/awk/awk.y
@@ -0,0 +1,1804 @@
+/*
+ * awk.y --- yacc/bison parser
+ */
+
+/*
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ *
+ * This file is part of GAWK, the GNU implementation of the
+ * AWK Progamming Language.
+ *
+ * GAWK is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * GAWK is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with GAWK; see the file COPYING. If not, write to
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+%{
+#ifdef DEBUG
+#define YYDEBUG 12
+#endif
+
+#include "awk.h"
+
+static void yyerror (); /* va_alist */
+static char *get_src_buf P((void));
+static int yylex P((void));
+static NODE *node_common P((NODETYPE op));
+static NODE *snode P((NODE *subn, NODETYPE op, int sindex));
+static NODE *mkrangenode P((NODE *cpair));
+static NODE *make_for_loop P((NODE *init, NODE *cond, NODE *incr));
+static NODE *append_right P((NODE *list, NODE *new));
+static void func_install P((NODE *params, NODE *def));
+static void pop_var P((NODE *np, int freeit));
+static void pop_params P((NODE *params));
+static NODE *make_param P((char *name));
+static NODE *mk_rexp P((NODE *exp));
+
+static int want_assign; /* lexical scanning kludge */
+static int want_regexp; /* lexical scanning kludge */
+static int can_return; /* lexical scanning kludge */
+static int io_allowed = 1; /* lexical scanning kludge */
+static char *lexptr; /* pointer to next char during parsing */
+static char *lexend;
+static char *lexptr_begin; /* keep track of where we were for error msgs */
+static char *lexeme; /* beginning of lexeme for debugging */
+static char *thisline = NULL;
+#define YYDEBUG_LEXER_TEXT (lexeme)
+static int param_counter;
+static char *tokstart = NULL;
+static char *token = NULL;
+static char *tokend;
+
+NODE *variables[HASHSIZE];
+
+extern char *source;
+extern int sourceline;
+extern struct src *srcfiles;
+extern int numfiles;
+extern int errcount;
+extern NODE *begin_block;
+extern NODE *end_block;
+%}
+
+%union {
+ long lval;
+ AWKNUM fval;
+ NODE *nodeval;
+ NODETYPE nodetypeval;
+ char *sval;
+ NODE *(*ptrval)();
+}
+
+%type <nodeval> function_prologue function_body
+%type <nodeval> rexp exp start program rule simp_exp
+%type <nodeval> non_post_simp_exp
+%type <nodeval> pattern
+%type <nodeval> action variable param_list
+%type <nodeval> rexpression_list opt_rexpression_list
+%type <nodeval> expression_list opt_expression_list
+%type <nodeval> statements statement if_statement opt_param_list
+%type <nodeval> opt_exp opt_variable regexp
+%type <nodeval> input_redir output_redir
+%type <nodetypeval> print
+%type <sval> func_name
+%type <lval> lex_builtin
+
+%token <sval> FUNC_CALL NAME REGEXP
+%token <lval> ERROR
+%token <nodeval> YNUMBER YSTRING
+%token <nodetypeval> RELOP APPEND_OP
+%token <nodetypeval> ASSIGNOP MATCHOP NEWLINE CONCAT_OP
+%token <nodetypeval> LEX_BEGIN LEX_END LEX_IF LEX_ELSE LEX_RETURN LEX_DELETE
+%token <nodetypeval> LEX_WHILE LEX_DO LEX_FOR LEX_BREAK LEX_CONTINUE
+%token <nodetypeval> LEX_PRINT LEX_PRINTF LEX_NEXT LEX_EXIT LEX_FUNCTION
+%token <nodetypeval> LEX_GETLINE
+%token <nodetypeval> LEX_IN
+%token <lval> LEX_AND LEX_OR INCREMENT DECREMENT
+%token <lval> LEX_BUILTIN LEX_LENGTH
+
+/* these are just yylval numbers */
+
+/* Lowest to highest */
+%right ASSIGNOP
+%right '?' ':'
+%left LEX_OR
+%left LEX_AND
+%left LEX_GETLINE
+%nonassoc LEX_IN
+%left FUNC_CALL LEX_BUILTIN LEX_LENGTH
+%nonassoc MATCHOP
+%nonassoc RELOP '<' '>' '|' APPEND_OP
+%left CONCAT_OP
+%left YSTRING YNUMBER
+%left '+' '-'
+%left '*' '/' '%'
+%right '!' UNARY
+%right '^'
+%left INCREMENT DECREMENT
+%left '$'
+%left '(' ')'
+%%
+
+start
+ : opt_nls program opt_nls
+ { expression_value = $2; }
+ ;
+
+program
+ : rule
+ {
+ if ($1 != NULL)
+ $$ = $1;
+ else
+ $$ = NULL;
+ yyerrok;
+ }
+ | program rule
+ /* add the rule to the tail of list */
+ {
+ if ($2 == NULL)
+ $$ = $1;
+ else if ($1 == NULL)
+ $$ = $2;
+ else {
+ if ($1->type != Node_rule_list)
+ $1 = node($1, Node_rule_list,
+ (NODE*)NULL);
+ $$ = append_right ($1,
+ node($2, Node_rule_list,(NODE *) NULL));
+ }
+ yyerrok;
+ }
+ | error { $$ = NULL; }
+ | program error { $$ = NULL; }
+ ;
+
+rule
+ : LEX_BEGIN { io_allowed = 0; }
+ action
+ {
+ if (begin_block) {
+ if (begin_block->type != Node_rule_list)
+ begin_block = node(begin_block, Node_rule_list,
+ (NODE *)NULL);
+ (void) append_right (begin_block, node(
+ node((NODE *)NULL, Node_rule_node, $3),
+ Node_rule_list, (NODE *)NULL) );
+ } else
+ begin_block = node((NODE *)NULL, Node_rule_node, $3);
+ $$ = NULL;
+ io_allowed = 1;
+ yyerrok;
+ }
+ | LEX_END { io_allowed = 0; }
+ action
+ {
+ if (end_block) {
+ if (end_block->type != Node_rule_list)
+ end_block = node(end_block, Node_rule_list,
+ (NODE *)NULL);
+ (void) append_right (end_block, node(
+ node((NODE *)NULL, Node_rule_node, $3),
+ Node_rule_list, (NODE *)NULL));
+ } else
+ end_block = node((NODE *)NULL, Node_rule_node, $3);
+ $$ = NULL;
+ io_allowed = 1;
+ yyerrok;
+ }
+ | LEX_BEGIN statement_term
+ {
+ warning("BEGIN blocks must have an action part");
+ errcount++;
+ yyerrok;
+ }
+ | LEX_END statement_term
+ {
+ warning("END blocks must have an action part");
+ errcount++;
+ yyerrok;
+ }
+ | pattern action
+ { $$ = node ($1, Node_rule_node, $2); yyerrok; }
+ | action
+ { $$ = node ((NODE *)NULL, Node_rule_node, $1); yyerrok; }
+ | pattern statement_term
+ {
+ $$ = node ($1,
+ Node_rule_node,
+ node(node(node(make_number(0.0),
+ Node_field_spec,
+ (NODE *) NULL),
+ Node_expression_list,
+ (NODE *) NULL),
+ Node_K_print,
+ (NODE *) NULL));
+ yyerrok;
+ }
+ | function_prologue function_body
+ {
+ func_install($1, $2);
+ $$ = NULL;
+ yyerrok;
+ }
+ ;
+
+func_name
+ : NAME
+ { $$ = $1; }
+ | FUNC_CALL
+ { $$ = $1; }
+ | lex_builtin
+ {
+ yyerror("%s() is a built-in function, it cannot be redefined",
+ tokstart);
+ errcount++;
+ /* yyerrok; */
+ }
+ ;
+
+lex_builtin
+ : LEX_BUILTIN
+ | LEX_LENGTH
+ ;
+
+function_prologue
+ : LEX_FUNCTION
+ {
+ param_counter = 0;
+ }
+ func_name '(' opt_param_list r_paren opt_nls
+ {
+ $$ = append_right(make_param($3), $5);
+ can_return = 1;
+ }
+ ;
+
+function_body
+ : l_brace statements r_brace opt_semi
+ {
+ $$ = $2;
+ can_return = 0;
+ }
+ ;
+
+
+pattern
+ : exp
+ { $$ = $1; }
+ | exp comma exp
+ { $$ = mkrangenode ( node($1, Node_cond_pair, $3) ); }
+ ;
+
+regexp
+ /*
+ * In this rule, want_regexp tells yylex that the next thing
+ * is a regexp so it should read up to the closing slash.
+ */
+ : '/'
+ { ++want_regexp; }
+ REGEXP '/'
+ {
+ NODE *n;
+ int len;
+
+ getnode(n);
+ n->type = Node_regex;
+ len = strlen($3);
+ n->re_exp = make_string($3, len);
+ n->re_reg = make_regexp($3, len, 0, 1);
+ n->re_text = NULL;
+ n->re_flags = CONST;
+ n->re_cnt = 1;
+ $$ = n;
+ }
+ ;
+
+action
+ : l_brace statements r_brace opt_semi opt_nls
+ { $$ = $2 ; }
+ | l_brace r_brace opt_semi opt_nls
+ { $$ = NULL; }
+ ;
+
+statements
+ : statement
+ { $$ = $1; }
+ | statements statement
+ {
+ if ($1 == NULL || $1->type != Node_statement_list)
+ $1 = node($1, Node_statement_list,(NODE *)NULL);
+ $$ = append_right($1,
+ node( $2, Node_statement_list, (NODE *)NULL));
+ yyerrok;
+ }
+ | error
+ { $$ = NULL; }
+ | statements error
+ { $$ = NULL; }
+ ;
+
+statement_term
+ : nls
+ | semi opt_nls
+ ;
+
+statement
+ : semi opt_nls
+ { $$ = NULL; }
+ | l_brace r_brace
+ { $$ = NULL; }
+ | l_brace statements r_brace
+ { $$ = $2; }
+ | if_statement
+ { $$ = $1; }
+ | LEX_WHILE '(' exp r_paren opt_nls statement
+ { $$ = node ($3, Node_K_while, $6); }
+ | LEX_DO opt_nls statement LEX_WHILE '(' exp r_paren opt_nls
+ { $$ = node ($6, Node_K_do, $3); }
+ | LEX_FOR '(' NAME LEX_IN NAME r_paren opt_nls statement
+ {
+ $$ = node ($8, Node_K_arrayfor, make_for_loop(variable($3,1),
+ (NODE *)NULL, variable($5,1)));
+ }
+ | LEX_FOR '(' opt_exp semi exp semi opt_exp r_paren opt_nls statement
+ {
+ $$ = node($10, Node_K_for, (NODE *)make_for_loop($3, $5, $7));
+ }
+ | LEX_FOR '(' opt_exp semi semi opt_exp r_paren opt_nls statement
+ {
+ $$ = node ($9, Node_K_for,
+ (NODE *)make_for_loop($3, (NODE *)NULL, $6));
+ }
+ | LEX_BREAK statement_term
+ /* for break, maybe we'll have to remember where to break to */
+ { $$ = node ((NODE *)NULL, Node_K_break, (NODE *)NULL); }
+ | LEX_CONTINUE statement_term
+ /* similarly */
+ { $$ = node ((NODE *)NULL, Node_K_continue, (NODE *)NULL); }
+ | print '(' expression_list r_paren output_redir statement_term
+ { $$ = node ($3, $1, $5); }
+ | print opt_rexpression_list output_redir statement_term
+ {
+ if ($1 == Node_K_print && $2 == NULL)
+ $2 = node(node(make_number(0.0),
+ Node_field_spec,
+ (NODE *) NULL),
+ Node_expression_list,
+ (NODE *) NULL);
+
+ $$ = node ($2, $1, $3);
+ }
+ | LEX_NEXT opt_exp statement_term
+ { NODETYPE type;
+
+ if ($2 && $2 == lookup("file")) {
+ if (do_lint)
+ warning("`next file' is a gawk extension");
+ else if (do_unix || do_posix)
+ yyerror("`next file' is a gawk extension");
+ else if (! io_allowed)
+ yyerror("`next file' used in BEGIN or END action");
+ type = Node_K_nextfile;
+ } else {
+ if (! io_allowed)
+ yyerror("next used in BEGIN or END action");
+ type = Node_K_next;
+ }
+ $$ = node ((NODE *)NULL, type, (NODE *)NULL);
+ }
+ | LEX_EXIT opt_exp statement_term
+ { $$ = node ($2, Node_K_exit, (NODE *)NULL); }
+ | LEX_RETURN
+ { if (! can_return) yyerror("return used outside function context"); }
+ opt_exp statement_term
+ { $$ = node ($3, Node_K_return, (NODE *)NULL); }
+ | LEX_DELETE NAME '[' expression_list ']' statement_term
+ { $$ = node (variable($2,1), Node_K_delete, $4); }
+ | exp statement_term
+ { $$ = $1; }
+ ;
+
+print
+ : LEX_PRINT
+ { $$ = $1; }
+ | LEX_PRINTF
+ { $$ = $1; }
+ ;
+
+if_statement
+ : LEX_IF '(' exp r_paren opt_nls statement
+ {
+ $$ = node($3, Node_K_if,
+ node($6, Node_if_branches, (NODE *)NULL));
+ }
+ | LEX_IF '(' exp r_paren opt_nls statement
+ LEX_ELSE opt_nls statement
+ { $$ = node ($3, Node_K_if,
+ node ($6, Node_if_branches, $9)); }
+ ;
+
+nls
+ : NEWLINE
+ { want_assign = 0; }
+ | nls NEWLINE
+ ;
+
+opt_nls
+ : /* empty */
+ | nls
+ ;
+
+input_redir
+ : /* empty */
+ { $$ = NULL; }
+ | '<' simp_exp
+ { $$ = node ($2, Node_redirect_input, (NODE *)NULL); }
+ ;
+
+output_redir
+ : /* empty */
+ { $$ = NULL; }
+ | '>' exp
+ { $$ = node ($2, Node_redirect_output, (NODE *)NULL); }
+ | APPEND_OP exp
+ { $$ = node ($2, Node_redirect_append, (NODE *)NULL); }
+ | '|' exp
+ { $$ = node ($2, Node_redirect_pipe, (NODE *)NULL); }
+ ;
+
+opt_param_list
+ : /* empty */
+ { $$ = NULL; }
+ | param_list
+ { $$ = $1; }
+ ;
+
+param_list
+ : NAME
+ { $$ = make_param($1); }
+ | param_list comma NAME
+ { $$ = append_right($1, make_param($3)); yyerrok; }
+ | error
+ { $$ = NULL; }
+ | param_list error
+ { $$ = NULL; }
+ | param_list comma error
+ { $$ = NULL; }
+ ;
+
+/* optional expression, as in for loop */
+opt_exp
+ : /* empty */
+ { $$ = NULL; }
+ | exp
+ { $$ = $1; }
+ ;
+
+opt_rexpression_list
+ : /* empty */
+ { $$ = NULL; }
+ | rexpression_list
+ { $$ = $1; }
+ ;
+
+rexpression_list
+ : rexp
+ { $$ = node ($1, Node_expression_list, (NODE *)NULL); }
+ | rexpression_list comma rexp
+ {
+ $$ = append_right($1,
+ node( $3, Node_expression_list, (NODE *)NULL));
+ yyerrok;
+ }
+ | error
+ { $$ = NULL; }
+ | rexpression_list error
+ { $$ = NULL; }
+ | rexpression_list error rexp
+ { $$ = NULL; }
+ | rexpression_list comma error
+ { $$ = NULL; }
+ ;
+
+opt_expression_list
+ : /* empty */
+ { $$ = NULL; }
+ | expression_list
+ { $$ = $1; }
+ ;
+
+expression_list
+ : exp
+ { $$ = node ($1, Node_expression_list, (NODE *)NULL); }
+ | expression_list comma exp
+ {
+ $$ = append_right($1,
+ node( $3, Node_expression_list, (NODE *)NULL));
+ yyerrok;
+ }
+ | error
+ { $$ = NULL; }
+ | expression_list error
+ { $$ = NULL; }
+ | expression_list error exp
+ { $$ = NULL; }
+ | expression_list comma error
+ { $$ = NULL; }
+ ;
+
+/* Expressions, not including the comma operator. */
+exp : variable ASSIGNOP
+ { want_assign = 0; }
+ exp
+ {
+ if (do_lint && $4->type == Node_regex)
+ warning("Regular expression on left of assignment.");
+ $$ = node ($1, $2, $4);
+ }
+ | '(' expression_list r_paren LEX_IN NAME
+ { $$ = node (variable($5,1), Node_in_array, $2); }
+ | exp '|' LEX_GETLINE opt_variable
+ {
+ $$ = node ($4, Node_K_getline,
+ node ($1, Node_redirect_pipein, (NODE *)NULL));
+ }
+ | LEX_GETLINE opt_variable input_redir
+ {
+ if (do_lint && ! io_allowed && $3 == NULL)
+ warning("non-redirected getline undefined inside BEGIN or END action");
+ $$ = node ($2, Node_K_getline, $3);
+ }
+ | exp LEX_AND exp
+ { $$ = node ($1, Node_and, $3); }
+ | exp LEX_OR exp
+ { $$ = node ($1, Node_or, $3); }
+ | exp MATCHOP exp
+ {
+ if ($1->type == Node_regex)
+ warning("Regular expression on left of MATCH operator.");
+ $$ = node ($1, $2, mk_rexp($3));
+ }
+ | regexp
+ { $$ = $1; }
+ | '!' regexp %prec UNARY
+ {
+ $$ = node(node(make_number(0.0),
+ Node_field_spec,
+ (NODE *) NULL),
+ Node_nomatch,
+ $2);
+ }
+ | exp LEX_IN NAME
+ { $$ = node (variable($3,1), Node_in_array, $1); }
+ | exp RELOP exp
+ {
+ if (do_lint && $3->type == Node_regex)
+ warning("Regular expression on left of comparison.");
+ $$ = node ($1, $2, $3);
+ }
+ | exp '<' exp
+ { $$ = node ($1, Node_less, $3); }
+ | exp '>' exp
+ { $$ = node ($1, Node_greater, $3); }
+ | exp '?' exp ':' exp
+ { $$ = node($1, Node_cond_exp, node($3, Node_if_branches, $5));}
+ | simp_exp
+ { $$ = $1; }
+ | exp simp_exp %prec CONCAT_OP
+ { $$ = node ($1, Node_concat, $2); }
+ ;
+
+rexp
+ : variable ASSIGNOP
+ { want_assign = 0; }
+ rexp
+ { $$ = node ($1, $2, $4); }
+ | rexp LEX_AND rexp
+ { $$ = node ($1, Node_and, $3); }
+ | rexp LEX_OR rexp
+ { $$ = node ($1, Node_or, $3); }
+ | LEX_GETLINE opt_variable input_redir
+ {
+ if (do_lint && ! io_allowed && $3 == NULL)
+ warning("non-redirected getline undefined inside BEGIN or END action");
+ $$ = node ($2, Node_K_getline, $3);
+ }
+ | regexp
+ { $$ = $1; }
+ | '!' regexp %prec UNARY
+ { $$ = node((NODE *) NULL, Node_nomatch, $2); }
+ | rexp MATCHOP rexp
+ { $$ = node ($1, $2, mk_rexp($3)); }
+ | rexp LEX_IN NAME
+ { $$ = node (variable($3,1), Node_in_array, $1); }
+ | rexp RELOP rexp
+ { $$ = node ($1, $2, $3); }
+ | rexp '?' rexp ':' rexp
+ { $$ = node($1, Node_cond_exp, node($3, Node_if_branches, $5));}
+ | simp_exp
+ { $$ = $1; }
+ | rexp simp_exp %prec CONCAT_OP
+ { $$ = node ($1, Node_concat, $2); }
+ ;
+
+simp_exp
+ : non_post_simp_exp
+ /* Binary operators in order of decreasing precedence. */
+ | simp_exp '^' simp_exp
+ { $$ = node ($1, Node_exp, $3); }
+ | simp_exp '*' simp_exp
+ { $$ = node ($1, Node_times, $3); }
+ | simp_exp '/' simp_exp
+ { $$ = node ($1, Node_quotient, $3); }
+ | simp_exp '%' simp_exp
+ { $$ = node ($1, Node_mod, $3); }
+ | simp_exp '+' simp_exp
+ { $$ = node ($1, Node_plus, $3); }
+ | simp_exp '-' simp_exp
+ { $$ = node ($1, Node_minus, $3); }
+ | variable INCREMENT
+ { $$ = node ($1, Node_postincrement, (NODE *)NULL); }
+ | variable DECREMENT
+ { $$ = node ($1, Node_postdecrement, (NODE *)NULL); }
+ ;
+
+non_post_simp_exp
+ : '!' simp_exp %prec UNARY
+ { $$ = node ($2, Node_not,(NODE *) NULL); }
+ | '(' exp r_paren
+ { $$ = $2; }
+ | LEX_BUILTIN
+ '(' opt_expression_list r_paren
+ { $$ = snode ($3, Node_builtin, (int) $1); }
+ | LEX_LENGTH '(' opt_expression_list r_paren
+ { $$ = snode ($3, Node_builtin, (int) $1); }
+ | LEX_LENGTH
+ {
+ if (do_lint)
+ warning("call of `length' without parentheses is not portable");
+ $$ = snode ((NODE *)NULL, Node_builtin, (int) $1);
+ if (do_posix)
+ warning( "call of `length' without parentheses is deprecated by POSIX");
+ }
+ | FUNC_CALL '(' opt_expression_list r_paren
+ {
+ $$ = node ($3, Node_func_call, make_string($1, strlen($1)));
+ }
+ | variable
+ | INCREMENT variable
+ { $$ = node ($2, Node_preincrement, (NODE *)NULL); }
+ | DECREMENT variable
+ { $$ = node ($2, Node_predecrement, (NODE *)NULL); }
+ | YNUMBER
+ { $$ = $1; }
+ | YSTRING
+ { $$ = $1; }
+
+ | '-' simp_exp %prec UNARY
+ { if ($2->type == Node_val) {
+ $2->numbr = -(force_number($2));
+ $$ = $2;
+ } else
+ $$ = node ($2, Node_unary_minus, (NODE *)NULL);
+ }
+ | '+' simp_exp %prec UNARY
+ { $$ = $2; }
+ ;
+
+opt_variable
+ : /* empty */
+ { $$ = NULL; }
+ | variable
+ { $$ = $1; }
+ ;
+
+variable
+ : NAME
+ { $$ = variable($1,1); }
+ | NAME '[' expression_list ']'
+ {
+ if ($3->rnode == NULL) {
+ $$ = node (variable($1,1), Node_subscript, $3->lnode);
+ freenode($3);
+ } else
+ $$ = node (variable($1,1), Node_subscript, $3);
+ }
+ | '$' non_post_simp_exp
+ { $$ = node ($2, Node_field_spec, (NODE *)NULL); }
+ ;
+
+l_brace
+ : '{' opt_nls
+ ;
+
+r_brace
+ : '}' opt_nls { yyerrok; }
+ ;
+
+r_paren
+ : ')' { yyerrok; }
+ ;
+
+opt_semi
+ : /* empty */
+ | semi
+ ;
+
+semi
+ : ';' { yyerrok; want_assign = 0; }
+ ;
+
+comma : ',' opt_nls { yyerrok; }
+ ;
+
+%%
+
+struct token {
+ char *operator; /* text to match */
+ NODETYPE value; /* node type */
+ int class; /* lexical class */
+ unsigned flags; /* # of args. allowed and compatability */
+# define ARGS 0xFF /* 0, 1, 2, 3 args allowed (any combination */
+# define A(n) (1<<(n))
+# define VERSION 0xFF00 /* old awk is zero */
+# define NOT_OLD 0x0100 /* feature not in old awk */
+# define NOT_POSIX 0x0200 /* feature not in POSIX */
+# define GAWKX 0x0400 /* gawk extension */
+ NODE *(*ptr) (); /* function that implements this keyword */
+};
+
+extern NODE
+ *do_exp(), *do_getline(), *do_index(), *do_length(),
+ *do_sqrt(), *do_log(), *do_sprintf(), *do_substr(),
+ *do_split(), *do_system(), *do_int(), *do_close(),
+ *do_atan2(), *do_sin(), *do_cos(), *do_rand(),
+ *do_srand(), *do_match(), *do_tolower(), *do_toupper(),
+ *do_sub(), *do_gsub(), *do_strftime(), *do_systime();
+
+/* Tokentab is sorted ascii ascending order, so it can be binary searched. */
+
+static struct token tokentab[] = {
+{"BEGIN", Node_illegal, LEX_BEGIN, 0, 0},
+{"END", Node_illegal, LEX_END, 0, 0},
+{"atan2", Node_builtin, LEX_BUILTIN, NOT_OLD|A(2), do_atan2},
+{"break", Node_K_break, LEX_BREAK, 0, 0},
+{"close", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_close},
+{"continue", Node_K_continue, LEX_CONTINUE, 0, 0},
+{"cos", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_cos},
+{"delete", Node_K_delete, LEX_DELETE, NOT_OLD, 0},
+{"do", Node_K_do, LEX_DO, NOT_OLD, 0},
+{"else", Node_illegal, LEX_ELSE, 0, 0},
+{"exit", Node_K_exit, LEX_EXIT, 0, 0},
+{"exp", Node_builtin, LEX_BUILTIN, A(1), do_exp},
+{"for", Node_K_for, LEX_FOR, 0, 0},
+{"func", Node_K_function, LEX_FUNCTION, NOT_POSIX|NOT_OLD, 0},
+{"function", Node_K_function, LEX_FUNCTION, NOT_OLD, 0},
+{"getline", Node_K_getline, LEX_GETLINE, NOT_OLD, 0},
+{"gsub", Node_builtin, LEX_BUILTIN, NOT_OLD|A(2)|A(3), do_gsub},
+{"if", Node_K_if, LEX_IF, 0, 0},
+{"in", Node_illegal, LEX_IN, 0, 0},
+{"index", Node_builtin, LEX_BUILTIN, A(2), do_index},
+{"int", Node_builtin, LEX_BUILTIN, A(1), do_int},
+{"length", Node_builtin, LEX_LENGTH, A(0)|A(1), do_length},
+{"log", Node_builtin, LEX_BUILTIN, A(1), do_log},
+{"match", Node_builtin, LEX_BUILTIN, NOT_OLD|A(2), do_match},
+{"next", Node_K_next, LEX_NEXT, 0, 0},
+{"print", Node_K_print, LEX_PRINT, 0, 0},
+{"printf", Node_K_printf, LEX_PRINTF, 0, 0},
+{"rand", Node_builtin, LEX_BUILTIN, NOT_OLD|A(0), do_rand},
+{"return", Node_K_return, LEX_RETURN, NOT_OLD, 0},
+{"sin", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_sin},
+{"split", Node_builtin, LEX_BUILTIN, A(2)|A(3), do_split},
+{"sprintf", Node_builtin, LEX_BUILTIN, 0, do_sprintf},
+{"sqrt", Node_builtin, LEX_BUILTIN, A(1), do_sqrt},
+{"srand", Node_builtin, LEX_BUILTIN, NOT_OLD|A(0)|A(1), do_srand},
+{"strftime", Node_builtin, LEX_BUILTIN, GAWKX|A(1)|A(2), do_strftime},
+{"sub", Node_builtin, LEX_BUILTIN, NOT_OLD|A(2)|A(3), do_sub},
+{"substr", Node_builtin, LEX_BUILTIN, A(2)|A(3), do_substr},
+{"system", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_system},
+{"systime", Node_builtin, LEX_BUILTIN, GAWKX|A(0), do_systime},
+{"tolower", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_tolower},
+{"toupper", Node_builtin, LEX_BUILTIN, NOT_OLD|A(1), do_toupper},
+{"while", Node_K_while, LEX_WHILE, 0, 0},
+};
+
+/* VARARGS0 */
+static void
+yyerror(va_alist)
+va_dcl
+{
+ va_list args;
+ char *mesg = NULL;
+ register char *bp, *cp;
+ char *scan;
+ char buf[120];
+
+ errcount++;
+ /* Find the current line in the input file */
+ if (lexptr) {
+ if (!thisline) {
+ cp = lexeme;
+ if (*cp == '\n') {
+ cp--;
+ mesg = "unexpected newline";
+ }
+ for ( ; cp != lexptr_begin && *cp != '\n'; --cp)
+ ;
+ if (*cp == '\n')
+ cp++;
+ thisline = cp;
+ }
+ /* NL isn't guaranteed */
+ bp = lexeme;
+ while (bp < lexend && *bp && *bp != '\n')
+ bp++;
+ } else {
+ thisline = "(END OF FILE)";
+ bp = thisline + 13;
+ }
+ msg("%.*s", (int) (bp - thisline), thisline);
+ bp = buf;
+ cp = buf + sizeof(buf) - 24; /* 24 more than longest msg. input */
+ if (lexptr) {
+ scan = thisline;
+ while (bp < cp && scan < lexeme)
+ if (*scan++ == '\t')
+ *bp++ = '\t';
+ else
+ *bp++ = ' ';
+ *bp++ = '^';
+ *bp++ = ' ';
+ }
+ va_start(args);
+ if (mesg == NULL)
+ mesg = va_arg(args, char *);
+ strcpy(bp, mesg);
+ err("", buf, args);
+ va_end(args);
+ exit(2);
+}
+
+static char *
+get_src_buf()
+{
+ static int samefile = 0;
+ static int nextfile = 0;
+ static char *buf = NULL;
+ static int fd;
+ int n;
+ register char *scan;
+ static int len = 0;
+ static int did_newline = 0;
+# define SLOP 128 /* enough space to hold most source lines */
+
+ if (nextfile > numfiles)
+ return NULL;
+
+ if (srcfiles[nextfile].stype == CMDLINE) {
+ if (len == 0) {
+ len = strlen(srcfiles[nextfile].val);
+ sourceline = 1;
+ lexptr = lexptr_begin = srcfiles[nextfile].val;
+ lexend = lexptr + len;
+ } else if (!did_newline && *(lexptr-1) != '\n') {
+ /*
+ * The following goop is to ensure that the source
+ * ends with a newline and that the entire current
+ * line is available for error messages.
+ */
+ int offset;
+
+ did_newline = 1;
+ offset = lexptr - lexeme;
+ for (scan = lexeme; scan > lexptr_begin; scan--)
+ if (*scan == '\n') {
+ scan++;
+ break;
+ }
+ len = lexptr - scan;
+ emalloc(buf, char *, len+1, "get_src_buf");
+ memcpy(buf, scan, len);
+ thisline = buf;
+ lexptr = buf + len;
+ *lexptr = '\n';
+ lexeme = lexptr - offset;
+ lexptr_begin = buf;
+ lexend = lexptr + 1;
+ } else {
+ len = 0;
+ lexeme = lexptr = lexptr_begin = NULL;
+ }
+ if (lexptr == NULL && ++nextfile <= numfiles)
+ return get_src_buf();
+ return lexptr;
+ }
+ if (!samefile) {
+ source = srcfiles[nextfile].val;
+ if (source == NULL) {
+ if (buf) {
+ free(buf);
+ buf = NULL;
+ }
+ len = 0;
+ return lexeme = lexptr = lexptr_begin = NULL;
+ }
+ fd = pathopen(source);
+ if (fd == -1)
+ fatal("can't open source file \"%s\" for reading (%s)",
+ source, strerror(errno));
+ len = optimal_bufsize(fd);
+ if (buf)
+ free(buf);
+ emalloc(buf, char *, len + SLOP, "get_src_buf");
+ lexptr_begin = buf + SLOP;
+ samefile = 1;
+ sourceline = 1;
+ } else {
+ /*
+ * Here, we retain the current source line (up to length SLOP)
+ * in the beginning of the buffer that was overallocated above
+ */
+ int offset;
+ int linelen;
+
+ offset = lexptr - lexeme;
+ for (scan = lexeme; scan > lexptr_begin; scan--)
+ if (*scan == '\n') {
+ scan++;
+ break;
+ }
+ linelen = lexptr - scan;
+ if (linelen > SLOP)
+ linelen = SLOP;
+ thisline = buf + SLOP - linelen;
+ memcpy(thisline, scan, linelen);
+ lexeme = buf + SLOP - offset;
+ lexptr_begin = thisline;
+ }
+ n = read(fd, buf + SLOP, len);
+ if (n == -1)
+ fatal("can't read sourcefile \"%s\" (%s)",
+ source, strerror(errno));
+ if (n == 0) {
+ samefile = 0;
+ nextfile++;
+ len = 0;
+ return get_src_buf();
+ }
+ lexptr = buf + SLOP;
+ lexend = lexptr + n;
+ return buf;
+}
+
+#define tokadd(x) (*token++ = (x), token == tokend ? tokexpand() : token)
+
+char *
+tokexpand()
+{
+ static int toksize = 60;
+ int tokoffset;
+
+ tokoffset = token - tokstart;
+ toksize *= 2;
+ if (tokstart)
+ erealloc(tokstart, char *, toksize, "tokexpand");
+ else
+ emalloc(tokstart, char *, toksize, "tokexpand");
+ tokend = tokstart + toksize;
+ token = tokstart + tokoffset;
+ return token;
+}
+
+#if DEBUG
+char
+nextc() {
+ if (lexptr && lexptr < lexend)
+ return *lexptr++;
+ else if (get_src_buf())
+ return *lexptr++;
+ else
+ return '\0';
+}
+#else
+#define nextc() ((lexptr && lexptr < lexend) ? \
+ *lexptr++ : \
+ (get_src_buf() ? *lexptr++ : '\0') \
+ )
+#endif
+#define pushback() (lexptr && lexptr > lexptr_begin ? lexptr-- : lexptr)
+
+/*
+ * Read the input and turn it into tokens.
+ */
+
+static int
+yylex()
+{
+ register int c;
+ int seen_e = 0; /* These are for numbers */
+ int seen_point = 0;
+ int esc_seen; /* for literal strings */
+ int low, mid, high;
+ static int did_newline = 0;
+ char *tokkey;
+
+ if (!nextc())
+ return 0;
+ pushback();
+ lexeme = lexptr;
+ thisline = NULL;
+ if (want_regexp) {
+ int in_brack = 0;
+
+ want_regexp = 0;
+ token = tokstart;
+ while ((c = nextc()) != 0) {
+ switch (c) {
+ case '[':
+ in_brack = 1;
+ break;
+ case ']':
+ in_brack = 0;
+ break;
+ case '\\':
+ if ((c = nextc()) == '\0') {
+ yyerror("unterminated regexp ends with \\ at end of file");
+ } else if (c == '\n') {
+ sourceline++;
+ continue;
+ } else
+ tokadd('\\');
+ break;
+ case '/': /* end of the regexp */
+ if (in_brack)
+ break;
+
+ pushback();
+ tokadd('\0');
+ yylval.sval = tokstart;
+ return REGEXP;
+ case '\n':
+ pushback();
+ yyerror("unterminated regexp");
+ case '\0':
+ yyerror("unterminated regexp at end of file");
+ }
+ tokadd(c);
+ }
+ }
+retry:
+ while ((c = nextc()) == ' ' || c == '\t')
+ ;
+
+ lexeme = lexptr ? lexptr - 1 : lexptr;
+ thisline = NULL;
+ token = tokstart;
+ yylval.nodetypeval = Node_illegal;
+
+ switch (c) {
+ case 0:
+ return 0;
+
+ case '\n':
+ sourceline++;
+ return NEWLINE;
+
+ case '#': /* it's a comment */
+ while ((c = nextc()) != '\n') {
+ if (c == '\0')
+ return 0;
+ }
+ sourceline++;
+ return NEWLINE;
+
+ case '\\':
+#ifdef RELAXED_CONTINUATION
+ if (!do_unix) { /* strip trailing white-space and/or comment */
+ while ((c = nextc()) == ' ' || c == '\t') continue;
+ if (c == '#')
+ while ((c = nextc()) != '\n') if (!c) break;
+ pushback();
+ }
+#endif /*RELAXED_CONTINUATION*/
+ if (nextc() == '\n') {
+ sourceline++;
+ goto retry;
+ } else
+ yyerror("inappropriate use of backslash");
+ break;
+
+ case '$':
+ want_assign = 1;
+ return '$';
+
+ case ')':
+ case ']':
+ case '(':
+ case '[':
+ case ';':
+ case ':':
+ case '?':
+ case '{':
+ case ',':
+ return c;
+
+ case '*':
+ if ((c = nextc()) == '=') {
+ yylval.nodetypeval = Node_assign_times;
+ return ASSIGNOP;
+ } else if (do_posix) {
+ pushback();
+ return '*';
+ } else if (c == '*') {
+ /* make ** and **= aliases for ^ and ^= */
+ static int did_warn_op = 0, did_warn_assgn = 0;
+
+ if (nextc() == '=') {
+ if (do_lint && ! did_warn_assgn) {
+ did_warn_assgn = 1;
+ warning("**= is not allowed by POSIX");
+ }
+ yylval.nodetypeval = Node_assign_exp;
+ return ASSIGNOP;
+ } else {
+ pushback();
+ if (do_lint && ! did_warn_op) {
+ did_warn_op = 1;
+ warning("** is not allowed by POSIX");
+ }
+ return '^';
+ }
+ }
+ pushback();
+ return '*';
+
+ case '/':
+ if (want_assign) {
+ if (nextc() == '=') {
+ yylval.nodetypeval = Node_assign_quotient;
+ return ASSIGNOP;
+ }
+ pushback();
+ }
+ return '/';
+
+ case '%':
+ if (nextc() == '=') {
+ yylval.nodetypeval = Node_assign_mod;
+ return ASSIGNOP;
+ }
+ pushback();
+ return '%';
+
+ case '^':
+ {
+ static int did_warn_op = 0, did_warn_assgn = 0;
+
+ if (nextc() == '=') {
+
+ if (do_lint && ! did_warn_assgn) {
+ did_warn_assgn = 1;
+ warning("operator `^=' is not supported in old awk");
+ }
+ yylval.nodetypeval = Node_assign_exp;
+ return ASSIGNOP;
+ }
+ pushback();
+ if (do_lint && ! did_warn_op) {
+ did_warn_op = 1;
+ warning("operator `^' is not supported in old awk");
+ }
+ return '^';
+ }
+
+ case '+':
+ if ((c = nextc()) == '=') {
+ yylval.nodetypeval = Node_assign_plus;
+ return ASSIGNOP;
+ }
+ if (c == '+')
+ return INCREMENT;
+ pushback();
+ return '+';
+
+ case '!':
+ if ((c = nextc()) == '=') {
+ yylval.nodetypeval = Node_notequal;
+ return RELOP;
+ }
+ if (c == '~') {
+ yylval.nodetypeval = Node_nomatch;
+ want_assign = 0;
+ return MATCHOP;
+ }
+ pushback();
+ return '!';
+
+ case '<':
+ if (nextc() == '=') {
+ yylval.nodetypeval = Node_leq;
+ return RELOP;
+ }
+ yylval.nodetypeval = Node_less;
+ pushback();
+ return '<';
+
+ case '=':
+ if (nextc() == '=') {
+ yylval.nodetypeval = Node_equal;
+ return RELOP;
+ }
+ yylval.nodetypeval = Node_assign;
+ pushback();
+ return ASSIGNOP;
+
+ case '>':
+ if ((c = nextc()) == '=') {
+ yylval.nodetypeval = Node_geq;
+ return RELOP;
+ } else if (c == '>') {
+ yylval.nodetypeval = Node_redirect_append;
+ return APPEND_OP;
+ }
+ yylval.nodetypeval = Node_greater;
+ pushback();
+ return '>';
+
+ case '~':
+ yylval.nodetypeval = Node_match;
+ want_assign = 0;
+ return MATCHOP;
+
+ case '}':
+ /*
+ * Added did newline stuff. Easier than
+ * hacking the grammar
+ */
+ if (did_newline) {
+ did_newline = 0;
+ return c;
+ }
+ did_newline++;
+ --lexptr; /* pick up } next time */
+ return NEWLINE;
+
+ case '"':
+ esc_seen = 0;
+ while ((c = nextc()) != '"') {
+ if (c == '\n') {
+ pushback();
+ yyerror("unterminated string");
+ }
+ if (c == '\\') {
+ c = nextc();
+ if (c == '\n') {
+ sourceline++;
+ continue;
+ }
+ esc_seen = 1;
+ tokadd('\\');
+ }
+ if (c == '\0') {
+ pushback();
+ yyerror("unterminated string");
+ }
+ tokadd(c);
+ }
+ yylval.nodeval = make_str_node(tokstart,
+ token - tokstart, esc_seen ? SCAN : 0);
+ yylval.nodeval->flags |= PERM;
+ return YSTRING;
+
+ case '-':
+ if ((c = nextc()) == '=') {
+ yylval.nodetypeval = Node_assign_minus;
+ return ASSIGNOP;
+ }
+ if (c == '-')
+ return DECREMENT;
+ pushback();
+ return '-';
+
+ case '.':
+ c = nextc();
+ pushback();
+ if (!isdigit(c))
+ return '.';
+ else
+ c = '.'; /* FALL THROUGH */
+ case '0':
+ case '1':
+ case '2':
+ case '3':
+ case '4':
+ case '5':
+ case '6':
+ case '7':
+ case '8':
+ case '9':
+ /* It's a number */
+ for (;;) {
+ int gotnumber = 0;
+
+ tokadd(c);
+ switch (c) {
+ case '.':
+ if (seen_point) {
+ gotnumber++;
+ break;
+ }
+ ++seen_point;
+ break;
+ case 'e':
+ case 'E':
+ if (seen_e) {
+ gotnumber++;
+ break;
+ }
+ ++seen_e;
+ if ((c = nextc()) == '-' || c == '+')
+ tokadd(c);
+ else
+ pushback();
+ break;
+ case '0':
+ case '1':
+ case '2':
+ case '3':
+ case '4':
+ case '5':
+ case '6':
+ case '7':
+ case '8':
+ case '9':
+ break;
+ default:
+ gotnumber++;
+ }
+ if (gotnumber)
+ break;
+ c = nextc();
+ }
+ pushback();
+ yylval.nodeval = make_number(atof(tokstart));
+ yylval.nodeval->flags |= PERM;
+ return YNUMBER;
+
+ case '&':
+ if ((c = nextc()) == '&') {
+ yylval.nodetypeval = Node_and;
+ for (;;) {
+ c = nextc();
+ if (c == '\0')
+ break;
+ if (c == '#') {
+ while ((c = nextc()) != '\n' && c != '\0')
+ ;
+ if (c == '\0')
+ break;
+ }
+ if (c == '\n')
+ sourceline++;
+ if (! isspace(c)) {
+ pushback();
+ break;
+ }
+ }
+ want_assign = 0;
+ return LEX_AND;
+ }
+ pushback();
+ return '&';
+
+ case '|':
+ if ((c = nextc()) == '|') {
+ yylval.nodetypeval = Node_or;
+ for (;;) {
+ c = nextc();
+ if (c == '\0')
+ break;
+ if (c == '#') {
+ while ((c = nextc()) != '\n' && c != '\0')
+ ;
+ if (c == '\0')
+ break;
+ }
+ if (c == '\n')
+ sourceline++;
+ if (! isspace(c)) {
+ pushback();
+ break;
+ }
+ }
+ want_assign = 0;
+ return LEX_OR;
+ }
+ pushback();
+ return '|';
+ }
+
+ if (c != '_' && ! isalpha(c))
+ yyerror("Invalid char '%c' in expression\n", c);
+
+ /* it's some type of name-type-thing. Find its length */
+ token = tokstart;
+ while (is_identchar(c)) {
+ tokadd(c);
+ c = nextc();
+ }
+ tokadd('\0');
+ emalloc(tokkey, char *, token - tokstart, "yylex");
+ memcpy(tokkey, tokstart, token - tokstart);
+ pushback();
+
+ /* See if it is a special token. */
+ low = 0;
+ high = (sizeof (tokentab) / sizeof (tokentab[0])) - 1;
+ while (low <= high) {
+ int i/* , c */;
+
+ mid = (low + high) / 2;
+ c = *tokstart - tokentab[mid].operator[0];
+ i = c ? c : strcmp (tokstart, tokentab[mid].operator);
+
+ if (i < 0) { /* token < mid */
+ high = mid - 1;
+ } else if (i > 0) { /* token > mid */
+ low = mid + 1;
+ } else {
+ if (do_lint) {
+ if (tokentab[mid].flags & GAWKX)
+ warning("%s() is a gawk extension",
+ tokentab[mid].operator);
+ if (tokentab[mid].flags & NOT_POSIX)
+ warning("POSIX does not allow %s",
+ tokentab[mid].operator);
+ if (tokentab[mid].flags & NOT_OLD)
+ warning("%s is not supported in old awk",
+ tokentab[mid].operator);
+ }
+ if ((do_unix && (tokentab[mid].flags & GAWKX))
+ || (do_posix && (tokentab[mid].flags & NOT_POSIX)))
+ break;
+ if (tokentab[mid].class == LEX_BUILTIN
+ || tokentab[mid].class == LEX_LENGTH
+ )
+ yylval.lval = mid;
+ else
+ yylval.nodetypeval = tokentab[mid].value;
+
+ return tokentab[mid].class;
+ }
+ }
+
+ yylval.sval = tokkey;
+ if (*lexptr == '(')
+ return FUNC_CALL;
+ else {
+ want_assign = 1;
+ return NAME;
+ }
+}
+
+static NODE *
+node_common(op)
+NODETYPE op;
+{
+ register NODE *r;
+
+ getnode(r);
+ r->type = op;
+ r->flags = MALLOC;
+ /* if lookahead is NL, lineno is 1 too high */
+ if (lexeme && *lexeme == '\n')
+ r->source_line = sourceline - 1;
+ else
+ r->source_line = sourceline;
+ r->source_file = source;
+ return r;
+}
+
+/*
+ * This allocates a node with defined lnode and rnode.
+ */
+NODE *
+node(left, op, right)
+NODE *left, *right;
+NODETYPE op;
+{
+ register NODE *r;
+
+ r = node_common(op);
+ r->lnode = left;
+ r->rnode = right;
+ return r;
+}
+
+/*
+ * This allocates a node with defined subnode and proc for builtin functions
+ * Checks for arg. count and supplies defaults where possible.
+ */
+static NODE *
+snode(subn, op, idx)
+NODETYPE op;
+int idx;
+NODE *subn;
+{
+ register NODE *r;
+ register NODE *n;
+ int nexp = 0;
+ int args_allowed;
+
+ r = node_common(op);
+
+ /* traverse expression list to see how many args. given */
+ for (n= subn; n; n= n->rnode) {
+ nexp++;
+ if (nexp > 3)
+ break;
+ }
+
+ /* check against how many args. are allowed for this builtin */
+ args_allowed = tokentab[idx].flags & ARGS;
+ if (args_allowed && !(args_allowed & A(nexp)))
+ fatal("%s() cannot have %d argument%c",
+ tokentab[idx].operator, nexp, nexp == 1 ? ' ' : 's');
+
+ r->proc = tokentab[idx].ptr;
+
+ /* special case processing for a few builtins */
+ if (nexp == 0 && r->proc == do_length) {
+ subn = node(node(make_number(0.0),Node_field_spec,(NODE *)NULL),
+ Node_expression_list,
+ (NODE *) NULL);
+ } else if (r->proc == do_match) {
+ if (subn->rnode->lnode->type != Node_regex)
+ subn->rnode->lnode = mk_rexp(subn->rnode->lnode);
+ } else if (r->proc == do_sub || r->proc == do_gsub) {
+ if (subn->lnode->type != Node_regex)
+ subn->lnode = mk_rexp(subn->lnode);
+ if (nexp == 2)
+ append_right(subn, node(node(make_number(0.0),
+ Node_field_spec,
+ (NODE *) NULL),
+ Node_expression_list,
+ (NODE *) NULL));
+ else if (do_lint && subn->rnode->rnode->lnode->type == Node_val)
+ warning("string literal as last arg of substitute");
+ } else if (r->proc == do_split) {
+ if (nexp == 2)
+ append_right(subn,
+ node(FS_node, Node_expression_list, (NODE *) NULL));
+ n = subn->rnode->rnode->lnode;
+ if (n->type != Node_regex)
+ subn->rnode->rnode->lnode = mk_rexp(n);
+ if (nexp == 2)
+ subn->rnode->rnode->lnode->re_flags |= FS_DFLT;
+ }
+
+ r->subnode = subn;
+ return r;
+}
+
+/*
+ * This allocates a Node_line_range node with defined condpair and
+ * zeroes the trigger word to avoid the temptation of assuming that calling
+ * 'node( foo, Node_line_range, 0)' will properly initialize 'triggered'.
+ */
+/* Otherwise like node() */
+static NODE *
+mkrangenode(cpair)
+NODE *cpair;
+{
+ register NODE *r;
+
+ getnode(r);
+ r->type = Node_line_range;
+ r->condpair = cpair;
+ r->triggered = 0;
+ return r;
+}
+
+/* Build a for loop */
+static NODE *
+make_for_loop(init, cond, incr)
+NODE *init, *cond, *incr;
+{
+ register FOR_LOOP_HEADER *r;
+ NODE *n;
+
+ emalloc(r, FOR_LOOP_HEADER *, sizeof(FOR_LOOP_HEADER), "make_for_loop");
+ getnode(n);
+ n->type = Node_illegal;
+ r->init = init;
+ r->cond = cond;
+ r->incr = incr;
+ n->sub.nodep.r.hd = r;
+ return n;
+}
+
+/*
+ * Install a name in the symbol table, even if it is already there.
+ * Caller must check against redefinition if that is desired.
+ */
+NODE *
+install(name, value)
+char *name;
+NODE *value;
+{
+ register NODE *hp;
+ register int len, bucket;
+
+ len = strlen(name);
+ bucket = hash(name, len);
+ getnode(hp);
+ hp->type = Node_hashnode;
+ hp->hnext = variables[bucket];
+ variables[bucket] = hp;
+ hp->hlength = len;
+ hp->hvalue = value;
+ hp->hname = name;
+ hp->hvalue->vname = name;
+ return hp->hvalue;
+}
+
+/* find the most recent hash node for name installed by install */
+NODE *
+lookup(name)
+char *name;
+{
+ register NODE *bucket;
+ register int len;
+
+ len = strlen(name);
+ bucket = variables[hash(name, len)];
+ while (bucket) {
+ if (bucket->hlength == len && STREQN(bucket->hname, name, len))
+ return bucket->hvalue;
+ bucket = bucket->hnext;
+ }
+ return NULL;
+}
+
+/*
+ * Add new to the rightmost branch of LIST. This uses n^2 time, so we make
+ * a simple attempt at optimizing it.
+ */
+static NODE *
+append_right(list, new)
+NODE *list, *new;
+{
+ register NODE *oldlist;
+ static NODE *savefront = NULL, *savetail = NULL;
+
+ oldlist = list;
+ if (savefront == oldlist) {
+ savetail = savetail->rnode = new;
+ return oldlist;
+ } else
+ savefront = oldlist;
+ while (list->rnode != NULL)
+ list = list->rnode;
+ savetail = list->rnode = new;
+ return oldlist;
+}
+
+/*
+ * check if name is already installed; if so, it had better have Null value,
+ * in which case def is added as the value. Otherwise, install name with def
+ * as value.
+ */
+static void
+func_install(params, def)
+NODE *params;
+NODE *def;
+{
+ NODE *r;
+
+ pop_params(params->rnode);
+ pop_var(params, 0);
+ r = lookup(params->param);
+ if (r != NULL) {
+ fatal("function name `%s' previously defined", params->param);
+ } else
+ (void) install(params->param, node(params, Node_func, def));
+}
+
+static void
+pop_var(np, freeit)
+NODE *np;
+int freeit;
+{
+ register NODE *bucket, **save;
+ register int len;
+ char *name;
+
+ name = np->param;
+ len = strlen(name);
+ save = &(variables[hash(name, len)]);
+ for (bucket = *save; bucket; bucket = bucket->hnext) {
+ if (len == bucket->hlength && STREQN(bucket->hname, name, len)) {
+ *save = bucket->hnext;
+ freenode(bucket);
+ if (freeit)
+ free(np->param);
+ return;
+ }
+ save = &(bucket->hnext);
+ }
+}
+
+static void
+pop_params(params)
+NODE *params;
+{
+ register NODE *np;
+
+ for (np = params; np != NULL; np = np->rnode)
+ pop_var(np, 1);
+}
+
+static NODE *
+make_param(name)
+char *name;
+{
+ NODE *r;
+
+ getnode(r);
+ r->type = Node_param_list;
+ r->rnode = NULL;
+ r->param = name;
+ r->param_cnt = param_counter++;
+ return (install(name, r));
+}
+
+/* Name points to a variable name. Make sure its in the symbol table */
+NODE *
+variable(name, can_free)
+char *name;
+int can_free;
+{
+ register NODE *r;
+ static int env_loaded = 0;
+
+ if (!env_loaded && STREQ(name, "ENVIRON")) {
+ load_environ();
+ env_loaded = 1;
+ }
+ if ((r = lookup(name)) == NULL)
+ r = install(name, node(Nnull_string, Node_var, (NODE *) NULL));
+ else if (can_free)
+ free(name);
+ return r;
+}
+
+static NODE *
+mk_rexp(exp)
+NODE *exp;
+{
+ if (exp->type == Node_regex)
+ return exp;
+ else {
+ NODE *n;
+
+ getnode(n);
+ n->type = Node_regex;
+ n->re_exp = exp;
+ n->re_text = NULL;
+ n->re_reg = NULL;
+ n->re_flags = 0;
+ n->re_cnt = 1;
+ return n;
+ }
+}
diff --git a/gnu/usr.bin/awk/builtin.c b/gnu/usr.bin/awk/builtin.c
new file mode 100644
index 000000000000..9d5e3b302fde
--- /dev/null
+++ b/gnu/usr.bin/awk/builtin.c
@@ -0,0 +1,1133 @@
+/*
+ * builtin.c - Builtin functions and various utility procedures
+ */
+
+/*
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ *
+ * This file is part of GAWK, the GNU implementation of the
+ * AWK Progamming Language.
+ *
+ * GAWK is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * GAWK is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with GAWK; see the file COPYING. If not, write to
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+
+#include "awk.h"
+
+
+#ifndef SRANDOM_PROTO
+extern void srandom P((int seed));
+#endif
+#ifndef linux
+extern char *initstate P((unsigned seed, char *state, int n));
+extern char *setstate P((char *state));
+extern long random P((void));
+#endif
+
+extern NODE **fields_arr;
+extern int output_is_tty;
+
+static NODE *sub_common P((NODE *tree, int global));
+
+#ifdef GFMT_WORKAROUND
+char *gfmt P((double g, int prec, char *buf));
+#endif
+
+#ifdef _CRAY
+/* Work around a problem in conversion of doubles to exact integers. */
+#include <float.h>
+#define Floor(n) floor((n) * (1.0 + DBL_EPSILON))
+#define Ceil(n) ceil((n) * (1.0 + DBL_EPSILON))
+
+/* Force the standard C compiler to use the library math functions. */
+extern double exp(double);
+double (*Exp)() = exp;
+#define exp(x) (*Exp)(x)
+extern double log(double);
+double (*Log)() = log;
+#define log(x) (*Log)(x)
+#else
+#define Floor(n) floor(n)
+#define Ceil(n) ceil(n)
+#endif
+
+static void
+efwrite(ptr, size, count, fp, from, rp, flush)
+void *ptr;
+unsigned size, count;
+FILE *fp;
+char *from;
+struct redirect *rp;
+int flush;
+{
+ errno = 0;
+ if (fwrite(ptr, size, count, fp) != count)
+ goto wrerror;
+ if (flush
+ && ((fp == stdout && output_is_tty)
+ || (rp && (rp->flag & RED_NOBUF)))) {
+ fflush(fp);
+ if (ferror(fp))
+ goto wrerror;
+ }
+ return;
+
+ wrerror:
+ fatal("%s to \"%s\" failed (%s)", from,
+ rp ? rp->value : "standard output",
+ errno ? strerror(errno) : "reason unknown");
+}
+
+/* Builtin functions */
+NODE *
+do_exp(tree)
+NODE *tree;
+{
+ NODE *tmp;
+ double d, res;
+#ifndef exp
+ double exp P((double));
+#endif
+
+ tmp= tree_eval(tree->lnode);
+ d = force_number(tmp);
+ free_temp(tmp);
+ errno = 0;
+ res = exp(d);
+ if (errno == ERANGE)
+ warning("exp argument %g is out of range", d);
+ return tmp_number((AWKNUM) res);
+}
+
+NODE *
+do_index(tree)
+NODE *tree;
+{
+ NODE *s1, *s2;
+ register char *p1, *p2;
+ register int l1, l2;
+ long ret;
+
+
+ s1 = tree_eval(tree->lnode);
+ s2 = tree_eval(tree->rnode->lnode);
+ force_string(s1);
+ force_string(s2);
+ p1 = s1->stptr;
+ p2 = s2->stptr;
+ l1 = s1->stlen;
+ l2 = s2->stlen;
+ ret = 0;
+ if (IGNORECASE) {
+ while (l1) {
+ if (l2 > l1)
+ break;
+ if (casetable[(int)*p1] == casetable[(int)*p2]
+ && (l2 == 1 || strncasecmp(p1, p2, l2) == 0)) {
+ ret = 1 + s1->stlen - l1;
+ break;
+ }
+ l1--;
+ p1++;
+ }
+ } else {
+ while (l1) {
+ if (l2 > l1)
+ break;
+ if (*p1 == *p2
+ && (l2 == 1 || STREQN(p1, p2, l2))) {
+ ret = 1 + s1->stlen - l1;
+ break;
+ }
+ l1--;
+ p1++;
+ }
+ }
+ free_temp(s1);
+ free_temp(s2);
+ return tmp_number((AWKNUM) ret);
+}
+
+NODE *
+do_int(tree)
+NODE *tree;
+{
+ NODE *tmp;
+ double floor P((double));
+ double ceil P((double));
+ double d;
+
+ tmp = tree_eval(tree->lnode);
+ d = force_number(tmp);
+ if (d >= 0)
+ d = Floor(d);
+ else
+ d = Ceil(d);
+ free_temp(tmp);
+ return tmp_number((AWKNUM) d);
+}
+
+NODE *
+do_length(tree)
+NODE *tree;
+{
+ NODE *tmp;
+ int len;
+
+ tmp = tree_eval(tree->lnode);
+ len = force_string(tmp)->stlen;
+ free_temp(tmp);
+ return tmp_number((AWKNUM) len);
+}
+
+NODE *
+do_log(tree)
+NODE *tree;
+{
+ NODE *tmp;
+#ifndef log
+ double log P((double));
+#endif
+ double d, arg;
+
+ tmp = tree_eval(tree->lnode);
+ arg = (double) force_number(tmp);
+ if (arg < 0.0)
+ warning("log called with negative argument %g", arg);
+ d = log(arg);
+ free_temp(tmp);
+ return tmp_number((AWKNUM) d);
+}
+
+/* %e and %f formats are not properly implemented. Someone should fix them */
+/* Actually, this whole thing should be reimplemented. */
+
+NODE *
+do_sprintf(tree)
+NODE *tree;
+{
+#define bchunk(s,l) if(l) {\
+ while((l)>ofre) {\
+ erealloc(obuf, char *, osiz*2, "do_sprintf");\
+ ofre+=osiz;\
+ osiz*=2;\
+ }\
+ memcpy(obuf+olen,s,(l));\
+ olen+=(l);\
+ ofre-=(l);\
+ }
+
+ /* Is there space for something L big in the buffer? */
+#define chksize(l) if((l)>ofre) {\
+ erealloc(obuf, char *, osiz*2, "do_sprintf");\
+ ofre+=osiz;\
+ osiz*=2;\
+ }
+
+ /*
+ * Get the next arg to be formatted. If we've run out of args,
+ * return "" (Null string)
+ */
+#define parse_next_arg() {\
+ if(!carg) { toofew = 1; break; }\
+ else {\
+ arg=tree_eval(carg->lnode);\
+ carg=carg->rnode;\
+ }\
+ }
+
+ NODE *r;
+ int toofew = 0;
+ char *obuf;
+ int osiz, ofre, olen;
+ static char chbuf[] = "0123456789abcdef";
+ static char sp[] = " ";
+ char *s0, *s1;
+ int n0;
+ NODE *sfmt, *arg;
+ register NODE *carg;
+ long fw, prec, lj, alt, big;
+ long *cur;
+ long val;
+#ifdef sun386 /* Can't cast unsigned (int/long) from ptr->value */
+ long tmp_uval; /* on 386i 4.0.1 C compiler -- it just hangs */
+#endif
+ unsigned long uval;
+ int sgn;
+ int base;
+ char cpbuf[30]; /* if we have numbers bigger than 30 */
+ char *cend = &cpbuf[30];/* chars, we lose, but seems unlikely */
+ char *cp;
+ char *fill;
+ double tmpval;
+ char *pr_str;
+ int ucasehex = 0;
+ char signchar = 0;
+ int len;
+
+
+ emalloc(obuf, char *, 120, "do_sprintf");
+ osiz = 120;
+ ofre = osiz - 1;
+ olen = 0;
+ sfmt = tree_eval(tree->lnode);
+ sfmt = force_string(sfmt);
+ carg = tree->rnode;
+ for (s0 = s1 = sfmt->stptr, n0 = sfmt->stlen; n0-- > 0;) {
+ if (*s1 != '%') {
+ s1++;
+ continue;
+ }
+ bchunk(s0, s1 - s0);
+ s0 = s1;
+ cur = &fw;
+ fw = 0;
+ prec = 0;
+ lj = alt = big = 0;
+ fill = sp;
+ cp = cend;
+ s1++;
+
+retry:
+ --n0;
+ switch (*s1++) {
+ case '%':
+ bchunk("%", 1);
+ s0 = s1;
+ break;
+
+ case '0':
+ if (fill != sp || lj)
+ goto lose;
+ if (cur == &fw)
+ fill = "0"; /* FALL through */
+ case '1':
+ case '2':
+ case '3':
+ case '4':
+ case '5':
+ case '6':
+ case '7':
+ case '8':
+ case '9':
+ if (cur == 0)
+ goto lose;
+ *cur = s1[-1] - '0';
+ while (n0 > 0 && *s1 >= '0' && *s1 <= '9') {
+ --n0;
+ *cur = *cur * 10 + *s1++ - '0';
+ }
+ goto retry;
+ case '*':
+ if (cur == 0)
+ goto lose;
+ parse_next_arg();
+ *cur = force_number(arg);
+ free_temp(arg);
+ goto retry;
+ case ' ': /* print ' ' or '-' */
+ case '+': /* print '+' or '-' */
+ signchar = *(s1-1);
+ goto retry;
+ case '-':
+ if (lj || fill != sp)
+ goto lose;
+ lj++;
+ goto retry;
+ case '.':
+ if (cur != &fw)
+ goto lose;
+ cur = &prec;
+ goto retry;
+ case '#':
+ if (alt)
+ goto lose;
+ alt++;
+ goto retry;
+ case 'l':
+ if (big)
+ goto lose;
+ big++;
+ goto retry;
+ case 'c':
+ parse_next_arg();
+ if (arg->flags & NUMBER) {
+#ifdef sun386
+ tmp_uval = arg->numbr;
+ uval= (unsigned long) tmp_uval;
+#else
+ uval = (unsigned long) arg->numbr;
+#endif
+ cpbuf[0] = uval;
+ prec = 1;
+ pr_str = cpbuf;
+ goto dopr_string;
+ }
+ if (! prec)
+ prec = 1;
+ else if (prec > arg->stlen)
+ prec = arg->stlen;
+ pr_str = arg->stptr;
+ goto dopr_string;
+ case 's':
+ parse_next_arg();
+ arg = force_string(arg);
+ if (!prec || prec > arg->stlen)
+ prec = arg->stlen;
+ pr_str = arg->stptr;
+
+ dopr_string:
+ if (fw > prec && !lj) {
+ while (fw > prec) {
+ bchunk(sp, 1);
+ fw--;
+ }
+ }
+ bchunk(pr_str, (int) prec);
+ if (fw > prec) {
+ while (fw > prec) {
+ bchunk(sp, 1);
+ fw--;
+ }
+ }
+ s0 = s1;
+ free_temp(arg);
+ break;
+ case 'd':
+ case 'i':
+ parse_next_arg();
+ val = (long) force_number(arg);
+ free_temp(arg);
+ if (val < 0) {
+ sgn = 1;
+ val = -val;
+ } else
+ sgn = 0;
+ do {
+ *--cp = '0' + val % 10;
+ val /= 10;
+ } while (val);
+ if (sgn)
+ *--cp = '-';
+ else if (signchar)
+ *--cp = signchar;
+ if (prec > fw)
+ fw = prec;
+ prec = cend - cp;
+ if (fw > prec && !lj) {
+ if (fill != sp && (*cp == '-' || signchar)) {
+ bchunk(cp, 1);
+ cp++;
+ prec--;
+ fw--;
+ }
+ while (fw > prec) {
+ bchunk(fill, 1);
+ fw--;
+ }
+ }
+ bchunk(cp, (int) prec);
+ if (fw > prec) {
+ while (fw > prec) {
+ bchunk(fill, 1);
+ fw--;
+ }
+ }
+ s0 = s1;
+ break;
+ case 'u':
+ base = 10;
+ goto pr_unsigned;
+ case 'o':
+ base = 8;
+ goto pr_unsigned;
+ case 'X':
+ ucasehex = 1;
+ case 'x':
+ base = 16;
+ goto pr_unsigned;
+ pr_unsigned:
+ parse_next_arg();
+ uval = (unsigned long) force_number(arg);
+ free_temp(arg);
+ do {
+ *--cp = chbuf[uval % base];
+ if (ucasehex && isalpha(*cp))
+ *cp = toupper(*cp);
+ uval /= base;
+ } while (uval);
+ if (alt && (base == 8 || base == 16)) {
+ if (base == 16) {
+ if (ucasehex)
+ *--cp = 'X';
+ else
+ *--cp = 'x';
+ }
+ *--cp = '0';
+ }
+ prec = cend - cp;
+ if (fw > prec && !lj) {
+ while (fw > prec) {
+ bchunk(fill, 1);
+ fw--;
+ }
+ }
+ bchunk(cp, (int) prec);
+ if (fw > prec) {
+ while (fw > prec) {
+ bchunk(fill, 1);
+ fw--;
+ }
+ }
+ s0 = s1;
+ break;
+ case 'g':
+ parse_next_arg();
+ tmpval = force_number(arg);
+ free_temp(arg);
+ chksize(fw + prec + 9); /* 9==slop */
+
+ cp = cpbuf;
+ *cp++ = '%';
+ if (lj)
+ *cp++ = '-';
+ if (fill != sp)
+ *cp++ = '0';
+#ifndef GFMT_WORKAROUND
+ if (cur != &fw) {
+ (void) strcpy(cp, "*.*g");
+ (void) sprintf(obuf + olen, cpbuf, (int) fw, (int) prec, (double) tmpval);
+ } else {
+ (void) strcpy(cp, "*g");
+ (void) sprintf(obuf + olen, cpbuf, (int) fw, (double) tmpval);
+ }
+#else /* GFMT_WORKAROUND */
+ {
+ char *gptr, gbuf[120];
+#define DEFAULT_G_PRECISION 6
+ if (fw + prec + 9 > sizeof gbuf) { /* 9==slop */
+ emalloc(gptr, char *, fw+prec+9, "do_sprintf(gfmt)");
+ } else
+ gptr = gbuf;
+ (void) gfmt((double) tmpval, cur != &fw ?
+ (int) prec : DEFAULT_G_PRECISION, gptr);
+ *cp++ = '*', *cp++ = 's', *cp = '\0';
+ (void) sprintf(obuf + olen, cpbuf, (int) fw, gptr);
+ if (fill != sp && *gptr == ' ') {
+ char *p = gptr;
+ do { *p++ = '0'; } while (*p == ' ');
+ }
+ if (gptr != gbuf) free(gptr);
+ }
+#endif /* GFMT_WORKAROUND */
+ len = strlen(obuf + olen);
+ ofre -= len;
+ olen += len;
+ s0 = s1;
+ break;
+
+ case 'f':
+ parse_next_arg();
+ tmpval = force_number(arg);
+ free_temp(arg);
+ chksize(fw + prec + 9); /* 9==slop */
+
+ cp = cpbuf;
+ *cp++ = '%';
+ if (lj)
+ *cp++ = '-';
+ if (fill != sp)
+ *cp++ = '0';
+ if (cur != &fw) {
+ (void) strcpy(cp, "*.*f");
+ (void) sprintf(obuf + olen, cpbuf, (int) fw, (int) prec, (double) tmpval);
+ } else {
+ (void) strcpy(cp, "*f");
+ (void) sprintf(obuf + olen, cpbuf, (int) fw, (double) tmpval);
+ }
+ len = strlen(obuf + olen);
+ ofre -= len;
+ olen += len;
+ s0 = s1;
+ break;
+ case 'e':
+ parse_next_arg();
+ tmpval = force_number(arg);
+ free_temp(arg);
+ chksize(fw + prec + 9); /* 9==slop */
+ cp = cpbuf;
+ *cp++ = '%';
+ if (lj)
+ *cp++ = '-';
+ if (fill != sp)
+ *cp++ = '0';
+ if (cur != &fw) {
+ (void) strcpy(cp, "*.*e");
+ (void) sprintf(obuf + olen, cpbuf, (int) fw, (int) prec, (double) tmpval);
+ } else {
+ (void) strcpy(cp, "*e");
+ (void) sprintf(obuf + olen, cpbuf, (int) fw, (double) tmpval);
+ }
+ len = strlen(obuf + olen);
+ ofre -= len;
+ olen += len;
+ s0 = s1;
+ break;
+
+ default:
+ lose:
+ break;
+ }
+ if (toofew)
+ fatal("%s\n\t%s\n\t%*s%s",
+ "not enough arguments to satisfy format string",
+ sfmt->stptr, s1 - sfmt->stptr - 2, "",
+ "^ ran out for this one"
+ );
+ }
+ if (do_lint && carg != NULL)
+ warning("too many arguments supplied for format string");
+ bchunk(s0, s1 - s0);
+ free_temp(sfmt);
+ r = make_str_node(obuf, olen, ALREADY_MALLOCED);
+ r->flags |= TEMP;
+ return r;
+}
+
+void
+do_printf(tree)
+register NODE *tree;
+{
+ struct redirect *rp = NULL;
+ register FILE *fp;
+
+ if (tree->rnode) {
+ int errflg; /* not used, sigh */
+
+ rp = redirect(tree->rnode, &errflg);
+ if (rp) {
+ fp = rp->fp;
+ if (!fp)
+ return;
+ } else
+ return;
+ } else
+ fp = stdout;
+ tree = do_sprintf(tree->lnode);
+ efwrite(tree->stptr, sizeof(char), tree->stlen, fp, "printf", rp , 1);
+ free_temp(tree);
+}
+
+NODE *
+do_sqrt(tree)
+NODE *tree;
+{
+ NODE *tmp;
+ double arg;
+ extern double sqrt P((double));
+
+ tmp = tree_eval(tree->lnode);
+ arg = (double) force_number(tmp);
+ free_temp(tmp);
+ if (arg < 0.0)
+ warning("sqrt called with negative argument %g", arg);
+ return tmp_number((AWKNUM) sqrt(arg));
+}
+
+NODE *
+do_substr(tree)
+NODE *tree;
+{
+ NODE *t1, *t2, *t3;
+ NODE *r;
+ register int indx;
+ size_t length;
+
+ t1 = tree_eval(tree->lnode);
+ t2 = tree_eval(tree->rnode->lnode);
+ if (tree->rnode->rnode == NULL) /* third arg. missing */
+ length = t1->stlen;
+ else {
+ t3 = tree_eval(tree->rnode->rnode->lnode);
+ length = (size_t) force_number(t3);
+ free_temp(t3);
+ }
+ indx = (int) force_number(t2) - 1;
+ free_temp(t2);
+ t1 = force_string(t1);
+ if (indx < 0)
+ indx = 0;
+ if (indx >= t1->stlen || length <= 0) {
+ free_temp(t1);
+ return Nnull_string;
+ }
+ if (indx + length > t1->stlen || LONG_MAX - indx < length)
+ length = t1->stlen - indx;
+ r = tmp_string(t1->stptr + indx, length);
+ free_temp(t1);
+ return r;
+}
+
+NODE *
+do_strftime(tree)
+NODE *tree;
+{
+ NODE *t1, *t2;
+ struct tm *tm;
+ time_t fclock;
+ char buf[100];
+ int ret;
+
+ t1 = force_string(tree_eval(tree->lnode));
+
+ if (tree->rnode == NULL) /* second arg. missing, default */
+ (void) time(&fclock);
+ else {
+ t2 = tree_eval(tree->rnode->lnode);
+ fclock = (time_t) force_number(t2);
+ free_temp(t2);
+ }
+ tm = localtime(&fclock);
+
+ ret = strftime(buf, 100, t1->stptr, tm);
+
+ return tmp_string(buf, ret);
+}
+
+NODE *
+do_systime(tree)
+NODE *tree;
+{
+ time_t lclock;
+
+ (void) time(&lclock);
+ return tmp_number((AWKNUM) lclock);
+}
+
+NODE *
+do_system(tree)
+NODE *tree;
+{
+ NODE *tmp;
+ int ret = 0;
+ char *cmd;
+
+ (void) flush_io (); /* so output is synchronous with gawk's */
+ tmp = tree_eval(tree->lnode);
+ cmd = force_string(tmp)->stptr;
+ if (cmd && *cmd) {
+ ret = system(cmd);
+ ret = (ret >> 8) & 0xff;
+ }
+ free_temp(tmp);
+ return tmp_number((AWKNUM) ret);
+}
+
+void
+do_print(tree)
+register NODE *tree;
+{
+ register NODE *t1;
+ struct redirect *rp = NULL;
+ register FILE *fp;
+ register char *s;
+
+ if (tree->rnode) {
+ int errflg; /* not used, sigh */
+
+ rp = redirect(tree->rnode, &errflg);
+ if (rp) {
+ fp = rp->fp;
+ if (!fp)
+ return;
+ } else
+ return;
+ } else
+ fp = stdout;
+ tree = tree->lnode;
+ while (tree) {
+ t1 = tree_eval(tree->lnode);
+ if (t1->flags & NUMBER) {
+ if (OFMTidx == CONVFMTidx)
+ (void) force_string(t1);
+ else {
+ char buf[100];
+
+ sprintf(buf, OFMT, t1->numbr);
+ t1 = tmp_string(buf, strlen(buf));
+ }
+ }
+ efwrite(t1->stptr, sizeof(char), t1->stlen, fp, "print", rp, 0);
+ free_temp(t1);
+ tree = tree->rnode;
+ if (tree) {
+ s = OFS;
+ if (OFSlen)
+ efwrite(s, sizeof(char), OFSlen, fp, "print", rp, 0);
+ }
+ }
+ s = ORS;
+ if (ORSlen)
+ efwrite(s, sizeof(char), ORSlen, fp, "print", rp, 1);
+}
+
+NODE *
+do_tolower(tree)
+NODE *tree;
+{
+ NODE *t1, *t2;
+ register char *cp, *cp2;
+
+ t1 = tree_eval(tree->lnode);
+ t1 = force_string(t1);
+ t2 = tmp_string(t1->stptr, t1->stlen);
+ for (cp = t2->stptr, cp2 = t2->stptr + t2->stlen; cp < cp2; cp++)
+ if (isupper(*cp))
+ *cp = tolower(*cp);
+ free_temp(t1);
+ return t2;
+}
+
+NODE *
+do_toupper(tree)
+NODE *tree;
+{
+ NODE *t1, *t2;
+ register char *cp;
+
+ t1 = tree_eval(tree->lnode);
+ t1 = force_string(t1);
+ t2 = tmp_string(t1->stptr, t1->stlen);
+ for (cp = t2->stptr; cp < t2->stptr + t2->stlen; cp++)
+ if (islower(*cp))
+ *cp = toupper(*cp);
+ free_temp(t1);
+ return t2;
+}
+
+NODE *
+do_atan2(tree)
+NODE *tree;
+{
+ NODE *t1, *t2;
+ extern double atan2 P((double, double));
+ double d1, d2;
+
+ t1 = tree_eval(tree->lnode);
+ t2 = tree_eval(tree->rnode->lnode);
+ d1 = force_number(t1);
+ d2 = force_number(t2);
+ free_temp(t1);
+ free_temp(t2);
+ return tmp_number((AWKNUM) atan2(d1, d2));
+}
+
+NODE *
+do_sin(tree)
+NODE *tree;
+{
+ NODE *tmp;
+ extern double sin P((double));
+ double d;
+
+ tmp = tree_eval(tree->lnode);
+ d = sin((double)force_number(tmp));
+ free_temp(tmp);
+ return tmp_number((AWKNUM) d);
+}
+
+NODE *
+do_cos(tree)
+NODE *tree;
+{
+ NODE *tmp;
+ extern double cos P((double));
+ double d;
+
+ tmp = tree_eval(tree->lnode);
+ d = cos((double)force_number(tmp));
+ free_temp(tmp);
+ return tmp_number((AWKNUM) d);
+}
+
+static int firstrand = 1;
+static char state[256];
+
+/* ARGSUSED */
+NODE *
+do_rand(tree)
+NODE *tree;
+{
+ if (firstrand) {
+ (void) initstate((unsigned) 1, state, sizeof state);
+ srandom(1);
+ firstrand = 0;
+ }
+ return tmp_number((AWKNUM) random() / LONG_MAX);
+}
+
+NODE *
+do_srand(tree)
+NODE *tree;
+{
+ NODE *tmp;
+ static long save_seed = 0;
+ long ret = save_seed; /* SVR4 awk srand returns previous seed */
+
+ if (firstrand)
+ (void) initstate((unsigned) 1, state, sizeof state);
+ else
+ (void) setstate(state);
+
+ if (!tree)
+ srandom((int) (save_seed = (long) time((time_t *) 0)));
+ else {
+ tmp = tree_eval(tree->lnode);
+ srandom((int) (save_seed = (long) force_number(tmp)));
+ free_temp(tmp);
+ }
+ firstrand = 0;
+ return tmp_number((AWKNUM) ret);
+}
+
+NODE *
+do_match(tree)
+NODE *tree;
+{
+ NODE *t1;
+ int rstart;
+ AWKNUM rlength;
+ Regexp *rp;
+
+ t1 = force_string(tree_eval(tree->lnode));
+ tree = tree->rnode->lnode;
+ rp = re_update(tree);
+ rstart = research(rp, t1->stptr, 0, t1->stlen, 1);
+ if (rstart >= 0) { /* match succeded */
+ rstart++; /* 1-based indexing */
+ rlength = REEND(rp, t1->stptr) - RESTART(rp, t1->stptr);
+ } else { /* match failed */
+ rstart = 0;
+ rlength = -1.0;
+ }
+ free_temp(t1);
+ unref(RSTART_node->var_value);
+ RSTART_node->var_value = make_number((AWKNUM) rstart);
+ unref(RLENGTH_node->var_value);
+ RLENGTH_node->var_value = make_number(rlength);
+ return tmp_number((AWKNUM) rstart);
+}
+
+static NODE *
+sub_common(tree, global)
+NODE *tree;
+int global;
+{
+ register char *scan;
+ register char *bp, *cp;
+ char *buf;
+ int buflen;
+ register char *matchend;
+ register int len;
+ char *matchstart;
+ char *text;
+ int textlen;
+ char *repl;
+ char *replend;
+ int repllen;
+ int sofar;
+ int ampersands;
+ int matches = 0;
+ Regexp *rp;
+ NODE *s; /* subst. pattern */
+ NODE *t; /* string to make sub. in; $0 if none given */
+ NODE *tmp;
+ NODE **lhs = &tree; /* value not used -- just different from NULL */
+ int priv = 0;
+ Func_ptr after_assign = NULL;
+
+ tmp = tree->lnode;
+ rp = re_update(tmp);
+
+ tree = tree->rnode;
+ s = tree->lnode;
+
+ tree = tree->rnode;
+ tmp = tree->lnode;
+ t = force_string(tree_eval(tmp));
+
+ /* do the search early to avoid work on non-match */
+ if (research(rp, t->stptr, 0, t->stlen, 1) == -1 ||
+ (RESTART(rp, t->stptr) > t->stlen) && (matches = 1)) {
+ free_temp(t);
+ return tmp_number((AWKNUM) matches);
+ }
+
+ if (tmp->type == Node_val)
+ lhs = NULL;
+ else
+ lhs = get_lhs(tmp, &after_assign);
+ t->flags |= STRING;
+ /*
+ * create a private copy of the string
+ */
+ if (t->stref > 1 || (t->flags & PERM)) {
+ unsigned int saveflags;
+
+ saveflags = t->flags;
+ t->flags &= ~MALLOC;
+ tmp = dupnode(t);
+ t->flags = saveflags;
+ t = tmp;
+ priv = 1;
+ }
+ text = t->stptr;
+ textlen = t->stlen;
+ buflen = textlen + 2;
+
+ s = force_string(tree_eval(s));
+ repl = s->stptr;
+ replend = repl + s->stlen;
+ repllen = replend - repl;
+ emalloc(buf, char *, buflen, "do_sub");
+ ampersands = 0;
+ for (scan = repl; scan < replend; scan++) {
+ if (*scan == '&') {
+ repllen--;
+ ampersands++;
+ } else if (*scan == '\\' && (*(scan+1) == '&' || *(scan+1) == '\\')) {
+ repllen--;
+ scan++;
+ }
+ }
+
+ bp = buf;
+ for (;;) {
+ matches++;
+ matchstart = t->stptr + RESTART(rp, t->stptr);
+ matchend = t->stptr + REEND(rp, t->stptr);
+
+ /*
+ * create the result, copying in parts of the original
+ * string
+ */
+ len = matchstart - text + repllen
+ + ampersands * (matchend - matchstart);
+ sofar = bp - buf;
+ while (buflen - sofar - len - 1 < 0) {
+ buflen *= 2;
+ erealloc(buf, char *, buflen, "do_sub");
+ bp = buf + sofar;
+ }
+ for (scan = text; scan < matchstart; scan++)
+ *bp++ = *scan;
+ for (scan = repl; scan < replend; scan++)
+ if (*scan == '&')
+ for (cp = matchstart; cp < matchend; cp++)
+ *bp++ = *cp;
+ else if (*scan == '\\' && (*(scan+1) == '&' || *(scan+1) == '\\')) {
+ scan++;
+ *bp++ = *scan;
+ } else
+ *bp++ = *scan;
+ if (global && matchstart == matchend && matchend < text + textlen) {
+ *bp++ = *matchend;
+ matchend++;
+ }
+ textlen = text + textlen - matchend;
+ text = matchend;
+ if (!global || textlen <= 0 ||
+ research(rp, t->stptr, text-t->stptr, textlen, 1) == -1)
+ break;
+ }
+ sofar = bp - buf;
+ if (buflen - sofar - textlen - 1) {
+ buflen = sofar + textlen + 2;
+ erealloc(buf, char *, buflen, "do_sub");
+ bp = buf + sofar;
+ }
+ for (scan = matchend; scan < text + textlen; scan++)
+ *bp++ = *scan;
+ textlen = bp - buf;
+ free(t->stptr);
+ t->stptr = buf;
+ t->stlen = textlen;
+
+ free_temp(s);
+ if (matches > 0 && lhs) {
+ if (priv) {
+ unref(*lhs);
+ *lhs = t;
+ }
+ if (after_assign)
+ (*after_assign)();
+ t->flags &= ~(NUM|NUMBER);
+ }
+ return tmp_number((AWKNUM) matches);
+}
+
+NODE *
+do_gsub(tree)
+NODE *tree;
+{
+ return sub_common(tree, 1);
+}
+
+NODE *
+do_sub(tree)
+NODE *tree;
+{
+ return sub_common(tree, 0);
+}
+
+#ifdef GFMT_WORKAROUND
+ /*
+ * printf's %g format [can't rely on gcvt()]
+ * caveat: don't use as argument to *printf()!
+ */
+char *
+gfmt(g, prec, buf)
+double g; /* value to format */
+int prec; /* indicates desired significant digits, not decimal places */
+char *buf; /* return buffer; assumed big enough to hold result */
+{
+ if (g == 0.0) {
+ (void) strcpy(buf, "0"); /* easy special case */
+ } else {
+ register char *d, *e, *p;
+
+ /* start with 'e' format (it'll provide nice exponent) */
+ if (prec < 1) prec = 1; /* at least 1 significant digit */
+ (void) sprintf(buf, "%.*e", prec - 1, g);
+ if ((e = strchr(buf, 'e')) != 0) { /* find exponent */
+ int exp = atoi(e+1); /* fetch exponent */
+ if (exp >= -4 && exp < prec) { /* per K&R2, B1.2 */
+ /* switch to 'f' format and re-do */
+ prec -= (exp + 1); /* decimal precision */
+ (void) sprintf(buf, "%.*f", prec, g);
+ e = buf + strlen(buf);
+ }
+ if ((d = strchr(buf, '.')) != 0) {
+ /* remove trailing zeroes and decimal point */
+ for (p = e; p > d && *--p == '0'; ) continue;
+ if (*p == '.') --p;
+ if (++p < e) /* copy exponent and NUL */
+ while ((*p++ = *e++) != '\0') continue;
+ }
+ }
+ }
+ return buf;
+}
+#endif /* GFMT_WORKAROUND */
diff --git a/gnu/usr.bin/awk/config.h b/gnu/usr.bin/awk/config.h
new file mode 100644
index 000000000000..8c20953ed531
--- /dev/null
+++ b/gnu/usr.bin/awk/config.h
@@ -0,0 +1,272 @@
+/*
+ * config.h -- configuration definitions for gawk.
+ *
+ * For generic 4.4 alpha
+ */
+
+/*
+ * Copyright (C) 1991, 1992 the Free Software Foundation, Inc.
+ *
+ * This file is part of GAWK, the GNU implementation of the
+ * AWK Progamming Language.
+ *
+ * GAWK is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * GAWK is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with GAWK; see the file COPYING. If not, write to
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+/*
+ * This file isolates configuration dependencies for gnu awk.
+ * You should know something about your system, perhaps by having
+ * a manual handy, when you edit this file. You should copy config.h-dist
+ * to config.h, and edit config.h. Do not modify config.h-dist, so that
+ * it will be easy to apply any patches that may be distributed.
+ *
+ * The general idea is that systems conforming to the various standards
+ * should need to do the least amount of changing. Definining the various
+ * items in ths file usually means that your system is missing that
+ * particular feature.
+ *
+ * The order of preference in standard conformance is ANSI C, POSIX,
+ * and the SVID.
+ *
+ * If you have no clue as to what's going on with your system, try
+ * compiling gawk without editing this file and see what shows up
+ * missing in the link stage. From there, you can probably figure out
+ * which defines to turn on.
+ */
+
+/**************************/
+/* Miscellanious features */
+/**************************/
+
+/*
+ * BLKSIZE_MISSING
+ *
+ * Check your /usr/include/sys/stat.h file. If the stat structure
+ * does not have a member named st_blksize, define this. (This will
+ * most likely be the case on most System V systems prior to V.4.)
+ */
+/* #define BLKSIZE_MISSING 1 */
+
+/*
+ * SIGTYPE
+ *
+ * The return type of the routines passed to the signal function.
+ * Modern systems use `void', older systems use `int'.
+ * If left undefined, it will default to void.
+ */
+/* #define SIGTYPE int */
+
+/*
+ * SIZE_T_MISSING
+ *
+ * If your system has no typedef for size_t, define this to get a default
+ */
+/* #define SIZE_T_MISSING 1 */
+
+/*
+ * CHAR_UNSIGNED
+ *
+ * If your machine uses unsigned characters (IBM RT and RS/6000 and others)
+ * then define this for use in regex.c
+ */
+/* #define CHAR_UNSIGNED 1 */
+
+/*
+ * HAVE_UNDERSCORE_SETJMP
+ *
+ * Check in your /usr/include/setjmp.h file. If there are routines
+ * there named _setjmp and _longjmp, then you should define this.
+ * Typically only systems derived from Berkeley Unix have this.
+ */
+#define HAVE_UNDERSCORE_SETJMP 1
+
+/***********************************************/
+/* Missing library subroutines or system calls */
+/***********************************************/
+
+/*
+ * MEMCMP_MISSING
+ * MEMCPY_MISSING
+ * MEMSET_MISSING
+ *
+ * These three routines are for manipulating blocks of memory. Most
+ * likely they will either all three be present or all three be missing,
+ * so they're grouped together.
+ */
+/* #define MEMCMP_MISSING 1 */
+/* #define MEMCPY_MISSING 1 */
+/* #define MEMSET_MISSING 1 */
+
+/*
+ * RANDOM_MISSING
+ *
+ * Your system does not have the random(3) suite of random number
+ * generating routines. These are different than the old rand(3)
+ * routines!
+ */
+/* #define RANDOM_MISSING 1 */
+
+/*
+ * STRCASE_MISSING
+ *
+ * Your system does not have the strcasemp() and strncasecmp()
+ * routines that originated in Berkeley Unix.
+ */
+/* #define STRCASE_MISSING 1 */
+
+/*
+ * STRCHR_MISSING
+ *
+ * Your system does not have the strchr() and strrchr() functions.
+ */
+/* #define STRCHR_MISSING 1 */
+
+/*
+ * STRERROR_MISSING
+ *
+ * Your system lacks the ANSI C strerror() routine for returning the
+ * strings associated with errno values.
+ */
+/* #define STRERROR_MISSING 1 */
+
+/*
+ * STRTOD_MISSING
+ *
+ * Your system does not have the strtod() routine for converting
+ * strings to double precision floating point values.
+ */
+/* #define STRTOD_MISSING 1 */
+
+/*
+ * STRFTIME_MISSING
+ *
+ * Your system lacks the ANSI C strftime() routine for formatting
+ * broken down time values.
+ */
+/* #define STRFTIME_MISSING 1 */
+
+/*
+ * TZSET_MISSING
+ *
+ * If you have a 4.2 BSD vintage system, then the strftime() routine
+ * supplied in the missing directory won't be enough, because it relies on the
+ * tzset() routine from System V / Posix. Fortunately, there is an
+ * emulation for tzset() too that should do the trick. If you don't
+ * have tzset(), define this.
+ */
+/* #define TZSET_MISSING 1 */
+
+/*
+ * TZNAME_MISSING
+ *
+ * Some systems do not support the external variables tzname and daylight.
+ * If this is the case *and* strftime() is missing, define this.
+ */
+/* #define TZNAME_MISSING 1 */
+
+/*
+ * STDC_HEADERS
+ *
+ * If your system does have ANSI compliant header files that
+ * provide prototypes for library routines, then define this.
+ */
+#define STDC_HEADERS 1
+
+/*
+ * NO_TOKEN_PASTING
+ *
+ * If your compiler define's __STDC__ but does not support token
+ * pasting (tok##tok), then define this.
+ */
+/* #define NO_TOKEN_PASTING 1 */
+
+/*****************************************************************/
+/* Stuff related to the Standard I/O Library. */
+/*****************************************************************/
+/* Much of this is (still, unfortunately) black magic in nature. */
+/* You may have to use some or all of these together to get gawk */
+/* to work correctly. */
+/*****************************************************************/
+
+/*
+ * NON_STD_SPRINTF
+ *
+ * Look in your /usr/include/stdio.h file. If the return type of the
+ * sprintf() function is NOT `int', define this.
+ */
+/* #define NON_STD_SPRINTF 1 */
+
+/*
+ * VPRINTF_MISSING
+ *
+ * Define this if your system lacks vprintf() and the other routines
+ * that go with it. This will trigger an attempt to use _doprnt().
+ * If you don't have that, this attempt will fail and you are on your own.
+ */
+/* #define VPRINTF_MISSING 1 */
+
+/*
+ * Casts from size_t to int and back. These will become unnecessary
+ * at some point in the future, but for now are required where the
+ * two types are a different representation.
+ */
+/* #define SZTC */
+/* #define INTC */
+
+/*
+ * SYSTEM_MISSING
+ *
+ * Define this if your library does not provide a system function
+ * or you are not entirely happy with it and would rather use
+ * a provided replacement (atari only).
+ */
+/* #define SYSTEM_MISSING 1 */
+
+/*
+ * FMOD_MISSING
+ *
+ * Define this if your system lacks the fmod() function and modf() will
+ * be used instead.
+ */
+/* #define FMOD_MISSING 1 */
+
+
+/*******************************/
+/* Gawk configuration options. */
+/*******************************/
+
+/*
+ * DEFPATH
+ *
+ * The default search path for the -f option of gawk. It is used
+ * if the AWKPATH environment variable is undefined. The default
+ * definition is provided here. Most likely you should not change
+ * this.
+ */
+
+/* #define DEFPATH ".:/usr/lib/awk:/usr/local/lib/awk" */
+/* #define ENVSEP ':' */
+
+/*
+ * alloca already has a prototype defined - don't redefine it
+ */
+#define ALLOCA_PROTO 1
+
+/*
+ * srandom already has a prototype defined - don't redefine it
+ */
+#define SRANDOM_PROTO 1
+
+/* anything that follows is for system-specific short-term kludges */
diff --git a/gnu/usr.bin/awk/dfa.c b/gnu/usr.bin/awk/dfa.c
new file mode 100644
index 000000000000..5293c755871d
--- /dev/null
+++ b/gnu/usr.bin/awk/dfa.c
@@ -0,0 +1,2291 @@
+/* dfa.c - determinisitic extended regexp routines for GNU
+ Copyright (C) 1988 Free Software Foundation, Inc.
+ Written June, 1988 by Mike Haertel
+ Modified July, 1988 by Arthur David Olson
+ to assist BMG speedups
+
+ NO WARRANTY
+
+ BECAUSE THIS PROGRAM IS LICENSED FREE OF CHARGE, WE PROVIDE ABSOLUTELY
+NO WARRANTY, TO THE EXTENT PERMITTED BY APPLICABLE STATE LAW. EXCEPT
+WHEN OTHERWISE STATED IN WRITING, FREE SOFTWARE FOUNDATION, INC,
+RICHARD M. STALLMAN AND/OR OTHER PARTIES PROVIDE THIS PROGRAM "AS IS"
+WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING,
+BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
+FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY
+AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE
+DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR
+CORRECTION.
+
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW WILL RICHARD M.
+STALLMAN, THE FREE SOFTWARE FOUNDATION, INC., AND/OR ANY OTHER PARTY
+WHO MAY MODIFY AND REDISTRIBUTE THIS PROGRAM AS PERMITTED BELOW, BE
+LIABLE TO YOU FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR
+OTHER SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR
+DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD PARTIES OR
+A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS) THIS
+PROGRAM, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH
+DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY.
+
+ GENERAL PUBLIC LICENSE TO COPY
+
+ 1. You may copy and distribute verbatim copies of this source file
+as you receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy a valid copyright notice "Copyright
+ (C) 1988 Free Software Foundation, Inc."; and include following the
+copyright notice a verbatim copy of the above disclaimer of warranty
+and of this License. You may charge a distribution fee for the
+physical act of transferring a copy.
+
+ 2. You may modify your copy or copies of this source file or
+any portion of it, and copy and distribute such modifications under
+the terms of Paragraph 1 above, provided that you also do the following:
+
+ a) cause the modified files to carry prominent notices stating
+ that you changed the files and the date of any change; and
+
+ b) cause the whole of any work that you distribute or publish,
+ that in whole or in part contains or is a derivative of this
+ program or any part thereof, to be licensed at no charge to all
+ third parties on terms identical to those contained in this
+ License Agreement (except that you may choose to grant more extensive
+ warranty protection to some or all third parties, at your option).
+
+ c) You may charge a distribution fee for the physical act of
+ transferring a copy, and you may at your option offer warranty
+ protection in exchange for a fee.
+
+Mere aggregation of another unrelated program with this program (or its
+derivative) on a volume of a storage or distribution medium does not bring
+the other program under the scope of these terms.
+
+ 3. You may copy and distribute this program or any portion of it in
+compiled, executable or object code form under the terms of Paragraphs
+1 and 2 above provided that you do the following:
+
+ a) accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of
+ Paragraphs 1 and 2 above; or,
+
+ b) accompany it with a written offer, valid for at least three
+ years, to give any third party free (except for a nominal
+ shipping charge) a complete machine-readable copy of the
+ corresponding source code, to be distributed under the terms of
+ Paragraphs 1 and 2 above; or,
+
+ c) accompany it with the information you received as to where the
+ corresponding source code may be obtained. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form alone.)
+
+For an executable file, complete source code means all the source code for
+all modules it contains; but, as a special exception, it need not include
+source code for modules which are standard libraries that accompany the
+operating system on which the executable file runs.
+
+ 4. You may not copy, sublicense, distribute or transfer this program
+except as expressly provided under this License Agreement. Any attempt
+otherwise to copy, sublicense, distribute or transfer this program is void and
+your rights to use the program under this License agreement shall be
+automatically terminated. However, parties who have received computer
+software programs from you with this License Agreement will not have
+their licenses terminated so long as such parties remain in full compliance.
+
+ 5. If you wish to incorporate parts of this program into other free
+programs whose distribution conditions are different, write to the Free
+Software Foundation at 675 Mass Ave, Cambridge, MA 02139. We have not yet
+worked out a simple rule that can be stated here, but we will often permit
+this. We will be guided by the two goals of preserving the free status of
+all derivatives our free software and of promoting the sharing and reuse of
+software.
+
+
+In other words, you are welcome to use, share and improve this program.
+You are forbidden to forbid anyone else to use, share and improve
+what you give them. Help stamp out software-hoarding! */
+
+#include "awk.h"
+#include <assert.h>
+
+#ifdef setbit /* surprise - setbit and clrbit are macros on NeXT */
+#undef setbit
+#endif
+#ifdef clrbit
+#undef clrbit
+#endif
+
+#ifdef __STDC__
+typedef void *ptr_t;
+#else
+typedef char *ptr_t;
+#endif
+
+typedef struct {
+ char ** in;
+ char * left;
+ char * right;
+ char * is;
+} must;
+
+static ptr_t xcalloc P((int n, size_t s));
+static ptr_t xmalloc P((size_t n));
+static ptr_t xrealloc P((ptr_t p, size_t n));
+static int tstbit P((int b, _charset c));
+static void setbit P((int b, _charset c));
+static void clrbit P((int b, _charset c));
+static void copyset P((const _charset src, _charset dst));
+static void zeroset P((_charset s));
+static void notset P((_charset s));
+static int equal P((const _charset s1, const _charset s2));
+static int charset_index P((const _charset s));
+static _token lex P((void));
+static void addtok P((_token t));
+static void atom P((void));
+static void closure P((void));
+static void branch P((void));
+static void regexp P((void));
+static void copy P((const _position_set *src, _position_set *dst));
+static void insert P((_position p, _position_set *s));
+static void merge P((_position_set *s1, _position_set *s2, _position_set *m));
+static void delete P((_position p, _position_set *s));
+static int state_index P((struct regexp *r, _position_set *s,
+ int newline, int letter));
+static void epsclosure P((_position_set *s, struct regexp *r));
+static void build_state P((int s, struct regexp *r));
+static void build_state_zero P((struct regexp *r));
+static char *icatalloc P((char *old, const char *new));
+static char *icpyalloc P((const char *string));
+static char *istrstr P((char *lookin, char *lookfor));
+static void ifree P((char *cp));
+static void freelist P((char **cpp));
+static char **enlist P((char **cpp, char *new, size_t len));
+static char **comsubs P((char *left, char *right));
+static char **addlists P((char **old, char **new));
+static char **inboth P((char **left, char **right));
+static void resetmust P((must *mp));
+static void regmust P((struct regexp *r));
+
+#undef P
+
+static ptr_t
+xcalloc(n, s)
+ int n;
+ size_t s;
+{
+ ptr_t r = calloc(n, s);
+
+ if (NULL == r)
+ reg_error("Memory exhausted"); /* reg_error does not return */
+ return r;
+}
+
+static ptr_t
+xmalloc(n)
+ size_t n;
+{
+ ptr_t r = malloc(n);
+
+ assert(n != 0);
+ if (NULL == r)
+ reg_error("Memory exhausted");
+ return r;
+}
+
+static ptr_t
+xrealloc(p, n)
+ ptr_t p;
+ size_t n;
+{
+ ptr_t r = realloc(p, n);
+
+ assert(n != 0);
+ if (NULL == r)
+ reg_error("Memory exhausted");
+ return r;
+}
+
+#define CALLOC(p, t, n) ((p) = (t *) xcalloc((n), sizeof (t)))
+#undef MALLOC
+#define MALLOC(p, t, n) ((p) = (t *) xmalloc((n) * sizeof (t)))
+#define REALLOC(p, t, n) ((p) = (t *) xrealloc((ptr_t) (p), (n) * sizeof (t)))
+
+/* Reallocate an array of type t if nalloc is too small for index. */
+#define REALLOC_IF_NECESSARY(p, t, nalloc, index) \
+ if ((index) >= (nalloc)) \
+ { \
+ while ((index) >= (nalloc)) \
+ (nalloc) *= 2; \
+ REALLOC(p, t, nalloc); \
+ }
+
+/* Stuff pertaining to charsets. */
+
+static int
+tstbit(b, c)
+ int b;
+ _charset c;
+{
+ return c[b / INTBITS] & 1 << b % INTBITS;
+}
+
+static void
+setbit(b, c)
+ int b;
+ _charset c;
+{
+ c[b / INTBITS] |= 1 << b % INTBITS;
+}
+
+static void
+clrbit(b, c)
+ int b;
+ _charset c;
+{
+ c[b / INTBITS] &= ~(1 << b % INTBITS);
+}
+
+static void
+copyset(src, dst)
+ const _charset src;
+ _charset dst;
+{
+ int i;
+
+ for (i = 0; i < _CHARSET_INTS; ++i)
+ dst[i] = src[i];
+}
+
+static void
+zeroset(s)
+ _charset s;
+{
+ int i;
+
+ for (i = 0; i < _CHARSET_INTS; ++i)
+ s[i] = 0;
+}
+
+static void
+notset(s)
+ _charset s;
+{
+ int i;
+
+ for (i = 0; i < _CHARSET_INTS; ++i)
+ s[i] = ~s[i];
+}
+
+static int
+equal(s1, s2)
+ const _charset s1;
+ const _charset s2;
+{
+ int i;
+
+ for (i = 0; i < _CHARSET_INTS; ++i)
+ if (s1[i] != s2[i])
+ return 0;
+ return 1;
+}
+
+/* A pointer to the current regexp is kept here during parsing. */
+static struct regexp *reg;
+
+/* Find the index of charset s in reg->charsets, or allocate a new charset. */
+static int
+charset_index(s)
+ const _charset s;
+{
+ int i;
+
+ for (i = 0; i < reg->cindex; ++i)
+ if (equal(s, reg->charsets[i]))
+ return i;
+ REALLOC_IF_NECESSARY(reg->charsets, _charset, reg->calloc, reg->cindex);
+ ++reg->cindex;
+ copyset(s, reg->charsets[i]);
+ return i;
+}
+
+/* Syntax bits controlling the behavior of the lexical analyzer. */
+static syntax_bits, syntax_bits_set;
+
+/* Flag for case-folding letters into sets. */
+static case_fold;
+
+/* Entry point to set syntax options. */
+void
+regsyntax(bits, fold)
+ long bits;
+ int fold;
+{
+ syntax_bits_set = 1;
+ syntax_bits = bits;
+ case_fold = fold;
+}
+
+/* Lexical analyzer. */
+static const char *lexstart; /* Pointer to beginning of input string. */
+static const char *lexptr; /* Pointer to next input character. */
+static lexleft; /* Number of characters remaining. */
+static caret_allowed; /* True if backward context allows ^
+ (meaningful only if RE_CONTEXT_INDEP_OPS
+ is turned off). */
+static closure_allowed; /* True if backward context allows closures
+ (meaningful only if RE_CONTEXT_INDEP_OPS
+ is turned off). */
+
+/* Note that characters become unsigned here. */
+#define FETCH(c, eoferr) \
+ { \
+ if (! lexleft) \
+ if (eoferr != NULL) \
+ reg_error(eoferr); \
+ else \
+ return _END; \
+ (c) = (unsigned char) *lexptr++; \
+ --lexleft; \
+ }
+
+static _token
+lex()
+{
+ _token c, c2;
+ int invert;
+ _charset cset;
+
+ FETCH(c, (char *) 0);
+ switch (c)
+ {
+ case '^':
+ if (! (syntax_bits & RE_CONTEXT_INDEP_OPS)
+ && (!caret_allowed ||
+ ((syntax_bits & RE_TIGHT_VBAR) && lexptr - 1 != lexstart)))
+ goto normal_char;
+ caret_allowed = 0;
+ return syntax_bits & RE_TIGHT_VBAR ? _ALLBEGLINE : _BEGLINE;
+
+ case '$':
+ if (syntax_bits & RE_CONTEXT_INDEP_OPS || !lexleft
+ || (! (syntax_bits & RE_TIGHT_VBAR)
+ && ((syntax_bits & RE_NO_BK_PARENS
+ ? lexleft > 0 && *lexptr == ')'
+ : lexleft > 1 && *lexptr == '\\' && lexptr[1] == ')')
+ || (syntax_bits & RE_NO_BK_VBAR
+ ? lexleft > 0 && *lexptr == '|'
+ : lexleft > 1 && *lexptr == '\\' && lexptr[1] == '|'))))
+ return syntax_bits & RE_TIGHT_VBAR ? _ALLENDLINE : _ENDLINE;
+ goto normal_char;
+
+ case '\\':
+ FETCH(c, "Unfinished \\ quote");
+ switch (c)
+ {
+ case '1':
+ case '2':
+ case '3':
+ case '4':
+ case '5':
+ case '6':
+ case '7':
+ case '8':
+ case '9':
+ caret_allowed = 0;
+ closure_allowed = 1;
+ return _BACKREF;
+
+ case '<':
+ caret_allowed = 0;
+ return _BEGWORD;
+
+ case '>':
+ caret_allowed = 0;
+ return _ENDWORD;
+
+ case 'b':
+ caret_allowed = 0;
+ return _LIMWORD;
+
+ case 'B':
+ caret_allowed = 0;
+ return _NOTLIMWORD;
+
+ case 'w':
+ case 'W':
+ zeroset(cset);
+ for (c2 = 0; c2 < _NOTCHAR; ++c2)
+ if (ISALNUM(c2))
+ setbit(c2, cset);
+ if (c == 'W')
+ notset(cset);
+ caret_allowed = 0;
+ closure_allowed = 1;
+ return _SET + charset_index(cset);
+
+ case '?':
+ if (syntax_bits & RE_BK_PLUS_QM)
+ goto qmark;
+ goto normal_char;
+
+ case '+':
+ if (syntax_bits & RE_BK_PLUS_QM)
+ goto plus;
+ goto normal_char;
+
+ case '|':
+ if (! (syntax_bits & RE_NO_BK_VBAR))
+ goto or;
+ goto normal_char;
+
+ case '(':
+ if (! (syntax_bits & RE_NO_BK_PARENS))
+ goto lparen;
+ goto normal_char;
+
+ case ')':
+ if (! (syntax_bits & RE_NO_BK_PARENS))
+ goto rparen;
+ goto normal_char;
+
+ default:
+ goto normal_char;
+ }
+
+ case '?':
+ if (syntax_bits & RE_BK_PLUS_QM)
+ goto normal_char;
+ qmark:
+ if (! (syntax_bits & RE_CONTEXT_INDEP_OPS) && !closure_allowed)
+ goto normal_char;
+ return _QMARK;
+
+ case '*':
+ if (! (syntax_bits & RE_CONTEXT_INDEP_OPS) && !closure_allowed)
+ goto normal_char;
+ return _STAR;
+
+ case '+':
+ if (syntax_bits & RE_BK_PLUS_QM)
+ goto normal_char;
+ plus:
+ if (! (syntax_bits & RE_CONTEXT_INDEP_OPS) && !closure_allowed)
+ goto normal_char;
+ return _PLUS;
+
+ case '|':
+ if (! (syntax_bits & RE_NO_BK_VBAR))
+ goto normal_char;
+ or:
+ caret_allowed = 1;
+ closure_allowed = 0;
+ return _OR;
+
+ case '\n':
+ if (! (syntax_bits & RE_NEWLINE_OR))
+ goto normal_char;
+ goto or;
+
+ case '(':
+ if (! (syntax_bits & RE_NO_BK_PARENS))
+ goto normal_char;
+ lparen:
+ caret_allowed = 1;
+ closure_allowed = 0;
+ return _LPAREN;
+
+ case ')':
+ if (! (syntax_bits & RE_NO_BK_PARENS))
+ goto normal_char;
+ rparen:
+ caret_allowed = 0;
+ closure_allowed = 1;
+ return _RPAREN;
+
+ case '.':
+ zeroset(cset);
+ notset(cset);
+ clrbit('\n', cset);
+ caret_allowed = 0;
+ closure_allowed = 1;
+ return _SET + charset_index(cset);
+
+ case '[':
+ zeroset(cset);
+ FETCH(c, "Unbalanced [");
+ if (c == '^')
+ {
+ FETCH(c, "Unbalanced [");
+ invert = 1;
+ }
+ else
+ invert = 0;
+ do
+ {
+ FETCH(c2, "Unbalanced [");
+ if ((syntax_bits & RE_AWK_CLASS_HACK) && c == '\\')
+ {
+ c = c2;
+ FETCH(c2, "Unbalanced [");
+ }
+ if (c2 == '-')
+ {
+ FETCH(c2, "Unbalanced [");
+ if (c2 == ']' && (syntax_bits & RE_AWK_CLASS_HACK))
+ {
+ setbit(c, cset);
+ setbit('-', cset);
+ break;
+ }
+ while (c <= c2)
+ setbit(c++, cset);
+ FETCH(c, "Unbalanced [");
+ }
+ else
+ {
+ setbit(c, cset);
+ c = c2;
+ }
+ }
+ while (c != ']');
+ if (invert)
+ notset(cset);
+ caret_allowed = 0;
+ closure_allowed = 1;
+ return _SET + charset_index(cset);
+
+ default:
+ normal_char:
+ caret_allowed = 0;
+ closure_allowed = 1;
+ if (case_fold && ISALPHA(c))
+ {
+ zeroset(cset);
+ if (isupper(c))
+ c = tolower(c);
+ setbit(c, cset);
+ setbit(toupper(c), cset);
+ return _SET + charset_index(cset);
+ }
+ return c;
+ }
+}
+
+/* Recursive descent parser for regular expressions. */
+
+static _token tok; /* Lookahead token. */
+static depth; /* Current depth of a hypothetical stack
+ holding deferred productions. This is
+ used to determine the depth that will be
+ required of the real stack later on in
+ reganalyze(). */
+
+/* Add the given token to the parse tree, maintaining the depth count and
+ updating the maximum depth if necessary. */
+static void
+addtok(t)
+ _token t;
+{
+ REALLOC_IF_NECESSARY(reg->tokens, _token, reg->talloc, reg->tindex);
+ reg->tokens[reg->tindex++] = t;
+
+ switch (t)
+ {
+ case _QMARK:
+ case _STAR:
+ case _PLUS:
+ break;
+
+ case _CAT:
+ case _OR:
+ --depth;
+ break;
+
+ default:
+ ++reg->nleaves;
+ case _EMPTY:
+ ++depth;
+ break;
+ }
+ if (depth > reg->depth)
+ reg->depth = depth;
+}
+
+/* The grammar understood by the parser is as follows.
+
+ start:
+ regexp
+ _ALLBEGLINE regexp
+ regexp _ALLENDLINE
+ _ALLBEGLINE regexp _ALLENDLINE
+
+ regexp:
+ regexp _OR branch
+ branch
+
+ branch:
+ branch closure
+ closure
+
+ closure:
+ closure _QMARK
+ closure _STAR
+ closure _PLUS
+ atom
+
+ atom:
+ <normal character>
+ _SET
+ _BACKREF
+ _BEGLINE
+ _ENDLINE
+ _BEGWORD
+ _ENDWORD
+ _LIMWORD
+ _NOTLIMWORD
+ <empty>
+
+ The parser builds a parse tree in postfix form in an array of tokens. */
+
+#ifdef __STDC__
+static void regexp(void);
+#else
+static void regexp();
+#endif
+
+static void
+atom()
+{
+ if (tok >= 0 && (tok < _NOTCHAR || tok >= _SET || tok == _BACKREF
+ || tok == _BEGLINE || tok == _ENDLINE || tok == _BEGWORD
+ || tok == _ENDWORD || tok == _LIMWORD || tok == _NOTLIMWORD))
+ {
+ addtok(tok);
+ tok = lex();
+ }
+ else if (tok == _LPAREN)
+ {
+ tok = lex();
+ regexp();
+ if (tok != _RPAREN)
+ reg_error("Unbalanced (");
+ tok = lex();
+ }
+ else
+ addtok(_EMPTY);
+}
+
+static void
+closure()
+{
+ atom();
+ while (tok == _QMARK || tok == _STAR || tok == _PLUS)
+ {
+ addtok(tok);
+ tok = lex();
+ }
+}
+
+static void
+branch()
+{
+ closure();
+ while (tok != _RPAREN && tok != _OR && tok != _ALLENDLINE && tok >= 0)
+ {
+ closure();
+ addtok(_CAT);
+ }
+}
+
+static void
+regexp()
+{
+ branch();
+ while (tok == _OR)
+ {
+ tok = lex();
+ branch();
+ addtok(_OR);
+ }
+}
+
+/* Main entry point for the parser. S is a string to be parsed, len is the
+ length of the string, so s can include NUL characters. R is a pointer to
+ the struct regexp to parse into. */
+void
+regparse(s, len, r)
+ const char *s;
+ size_t len;
+ struct regexp *r;
+{
+ reg = r;
+ lexstart = lexptr = s;
+ lexleft = len;
+ caret_allowed = 1;
+ closure_allowed = 0;
+
+ if (! syntax_bits_set)
+ reg_error("No syntax specified");
+
+ tok = lex();
+ depth = r->depth;
+
+ if (tok == _ALLBEGLINE)
+ {
+ addtok(_BEGLINE);
+ tok = lex();
+ regexp();
+ addtok(_CAT);
+ }
+ else
+ regexp();
+
+ if (tok == _ALLENDLINE)
+ {
+ addtok(_ENDLINE);
+ addtok(_CAT);
+ tok = lex();
+ }
+
+ if (tok != _END)
+ reg_error("Unbalanced )");
+
+ addtok(_END - r->nregexps);
+ addtok(_CAT);
+
+ if (r->nregexps)
+ addtok(_OR);
+
+ ++r->nregexps;
+}
+
+/* Some primitives for operating on sets of positions. */
+
+/* Copy one set to another; the destination must be large enough. */
+static void
+copy(src, dst)
+ const _position_set *src;
+ _position_set *dst;
+{
+ int i;
+
+ for (i = 0; i < src->nelem; ++i)
+ dst->elems[i] = src->elems[i];
+ dst->nelem = src->nelem;
+}
+
+/* Insert a position in a set. Position sets are maintained in sorted
+ order according to index. If position already exists in the set with
+ the same index then their constraints are logically or'd together.
+ S->elems must point to an array large enough to hold the resulting set. */
+static void
+insert(p, s)
+ _position p;
+ _position_set *s;
+{
+ int i;
+ _position t1, t2;
+
+ for (i = 0; i < s->nelem && p.index < s->elems[i].index; ++i)
+ ;
+ if (i < s->nelem && p.index == s->elems[i].index)
+ s->elems[i].constraint |= p.constraint;
+ else
+ {
+ t1 = p;
+ ++s->nelem;
+ while (i < s->nelem)
+ {
+ t2 = s->elems[i];
+ s->elems[i++] = t1;
+ t1 = t2;
+ }
+ }
+}
+
+/* Merge two sets of positions into a third. The result is exactly as if
+ the positions of both sets were inserted into an initially empty set. */
+static void
+merge(s1, s2, m)
+ _position_set *s1;
+ _position_set *s2;
+ _position_set *m;
+{
+ int i = 0, j = 0;
+
+ m->nelem = 0;
+ while (i < s1->nelem && j < s2->nelem)
+ if (s1->elems[i].index > s2->elems[j].index)
+ m->elems[m->nelem++] = s1->elems[i++];
+ else if (s1->elems[i].index < s2->elems[j].index)
+ m->elems[m->nelem++] = s2->elems[j++];
+ else
+ {
+ m->elems[m->nelem] = s1->elems[i++];
+ m->elems[m->nelem++].constraint |= s2->elems[j++].constraint;
+ }
+ while (i < s1->nelem)
+ m->elems[m->nelem++] = s1->elems[i++];
+ while (j < s2->nelem)
+ m->elems[m->nelem++] = s2->elems[j++];
+}
+
+/* Delete a position from a set. */
+static void
+delete(p, s)
+ _position p;
+ _position_set *s;
+{
+ int i;
+
+ for (i = 0; i < s->nelem; ++i)
+ if (p.index == s->elems[i].index)
+ break;
+ if (i < s->nelem)
+ for (--s->nelem; i < s->nelem; ++i)
+ s->elems[i] = s->elems[i + 1];
+}
+
+/* Find the index of the state corresponding to the given position set with
+ the given preceding context, or create a new state if there is no such
+ state. Newline and letter tell whether we got here on a newline or
+ letter, respectively. */
+static int
+state_index(r, s, newline, letter)
+ struct regexp *r;
+ _position_set *s;
+ int newline;
+ int letter;
+{
+ int lhash = 0;
+ int constraint;
+ int i, j;
+
+ newline = newline ? 1 : 0;
+ letter = letter ? 1 : 0;
+
+ for (i = 0; i < s->nelem; ++i)
+ lhash ^= s->elems[i].index + s->elems[i].constraint;
+
+ /* Try to find a state that exactly matches the proposed one. */
+ for (i = 0; i < r->sindex; ++i)
+ {
+ if (lhash != r->states[i].hash || s->nelem != r->states[i].elems.nelem
+ || newline != r->states[i].newline || letter != r->states[i].letter)
+ continue;
+ for (j = 0; j < s->nelem; ++j)
+ if (s->elems[j].constraint
+ != r->states[i].elems.elems[j].constraint
+ || s->elems[j].index != r->states[i].elems.elems[j].index)
+ break;
+ if (j == s->nelem)
+ return i;
+ }
+
+ /* We'll have to create a new state. */
+ REALLOC_IF_NECESSARY(r->states, _dfa_state, r->salloc, r->sindex);
+ r->states[i].hash = lhash;
+ MALLOC(r->states[i].elems.elems, _position, s->nelem);
+ copy(s, &r->states[i].elems);
+ r->states[i].newline = newline;
+ r->states[i].letter = letter;
+ r->states[i].backref = 0;
+ r->states[i].constraint = 0;
+ r->states[i].first_end = 0;
+ for (j = 0; j < s->nelem; ++j)
+ if (r->tokens[s->elems[j].index] < 0)
+ {
+ constraint = s->elems[j].constraint;
+ if (_SUCCEEDS_IN_CONTEXT(constraint, newline, 0, letter, 0)
+ || _SUCCEEDS_IN_CONTEXT(constraint, newline, 0, letter, 1)
+ || _SUCCEEDS_IN_CONTEXT(constraint, newline, 1, letter, 0)
+ || _SUCCEEDS_IN_CONTEXT(constraint, newline, 1, letter, 1))
+ r->states[i].constraint |= constraint;
+ if (! r->states[i].first_end)
+ r->states[i].first_end = r->tokens[s->elems[j].index];
+ }
+ else if (r->tokens[s->elems[j].index] == _BACKREF)
+ {
+ r->states[i].constraint = _NO_CONSTRAINT;
+ r->states[i].backref = 1;
+ }
+
+ ++r->sindex;
+
+ return i;
+}
+
+/* Find the epsilon closure of a set of positions. If any position of the set
+ contains a symbol that matches the empty string in some context, replace
+ that position with the elements of its follow labeled with an appropriate
+ constraint. Repeat exhaustively until no funny positions are left.
+ S->elems must be large enough to hold the result. */
+static void
+epsclosure(s, r)
+ _position_set *s;
+ struct regexp *r;
+{
+ int i, j;
+ int *visited;
+ _position p, old;
+
+ MALLOC(visited, int, r->tindex);
+ for (i = 0; i < r->tindex; ++i)
+ visited[i] = 0;
+
+ for (i = 0; i < s->nelem; ++i)
+ if (r->tokens[s->elems[i].index] >= _NOTCHAR
+ && r->tokens[s->elems[i].index] != _BACKREF
+ && r->tokens[s->elems[i].index] < _SET)
+ {
+ old = s->elems[i];
+ p.constraint = old.constraint;
+ delete(s->elems[i], s);
+ if (visited[old.index])
+ {
+ --i;
+ continue;
+ }
+ visited[old.index] = 1;
+ switch (r->tokens[old.index])
+ {
+ case _BEGLINE:
+ p.constraint &= _BEGLINE_CONSTRAINT;
+ break;
+ case _ENDLINE:
+ p.constraint &= _ENDLINE_CONSTRAINT;
+ break;
+ case _BEGWORD:
+ p.constraint &= _BEGWORD_CONSTRAINT;
+ break;
+ case _ENDWORD:
+ p.constraint &= _ENDWORD_CONSTRAINT;
+ break;
+ case _LIMWORD:
+ p.constraint &= _ENDWORD_CONSTRAINT;
+ break;
+ case _NOTLIMWORD:
+ p.constraint &= _NOTLIMWORD_CONSTRAINT;
+ break;
+ default:
+ break;
+ }
+ for (j = 0; j < r->follows[old.index].nelem; ++j)
+ {
+ p.index = r->follows[old.index].elems[j].index;
+ insert(p, s);
+ }
+ /* Force rescan to start at the beginning. */
+ i = -1;
+ }
+
+ free(visited);
+}
+
+/* Perform bottom-up analysis on the parse tree, computing various functions.
+ Note that at this point, we're pretending constructs like \< are real
+ characters rather than constraints on what can follow them.
+
+ Nullable: A node is nullable if it is at the root of a regexp that can
+ match the empty string.
+ * _EMPTY leaves are nullable.
+ * No other leaf is nullable.
+ * A _QMARK or _STAR node is nullable.
+ * A _PLUS node is nullable if its argument is nullable.
+ * A _CAT node is nullable if both its arguments are nullable.
+ * An _OR node is nullable if either argument is nullable.
+
+ Firstpos: The firstpos of a node is the set of positions (nonempty leaves)
+ that could correspond to the first character of a string matching the
+ regexp rooted at the given node.
+ * _EMPTY leaves have empty firstpos.
+ * The firstpos of a nonempty leaf is that leaf itself.
+ * The firstpos of a _QMARK, _STAR, or _PLUS node is the firstpos of its
+ argument.
+ * The firstpos of a _CAT node is the firstpos of the left argument, union
+ the firstpos of the right if the left argument is nullable.
+ * The firstpos of an _OR node is the union of firstpos of each argument.
+
+ Lastpos: The lastpos of a node is the set of positions that could
+ correspond to the last character of a string matching the regexp at
+ the given node.
+ * _EMPTY leaves have empty lastpos.
+ * The lastpos of a nonempty leaf is that leaf itself.
+ * The lastpos of a _QMARK, _STAR, or _PLUS node is the lastpos of its
+ argument.
+ * The lastpos of a _CAT node is the lastpos of its right argument, union
+ the lastpos of the left if the right argument is nullable.
+ * The lastpos of an _OR node is the union of the lastpos of each argument.
+
+ Follow: The follow of a position is the set of positions that could
+ correspond to the character following a character matching the node in
+ a string matching the regexp. At this point we consider special symbols
+ that match the empty string in some context to be just normal characters.
+ Later, if we find that a special symbol is in a follow set, we will
+ replace it with the elements of its follow, labeled with an appropriate
+ constraint.
+ * Every node in the firstpos of the argument of a _STAR or _PLUS node is in
+ the follow of every node in the lastpos.
+ * Every node in the firstpos of the second argument of a _CAT node is in
+ the follow of every node in the lastpos of the first argument.
+
+ Because of the postfix representation of the parse tree, the depth-first
+ analysis is conveniently done by a linear scan with the aid of a stack.
+ Sets are stored as arrays of the elements, obeying a stack-like allocation
+ scheme; the number of elements in each set deeper in the stack can be
+ used to determine the address of a particular set's array. */
+void
+reganalyze(r, searchflag)
+ struct regexp *r;
+ int searchflag;
+{
+ int *nullable; /* Nullable stack. */
+ int *nfirstpos; /* Element count stack for firstpos sets. */
+ _position *firstpos; /* Array where firstpos elements are stored. */
+ int *nlastpos; /* Element count stack for lastpos sets. */
+ _position *lastpos; /* Array where lastpos elements are stored. */
+ int *nalloc; /* Sizes of arrays allocated to follow sets. */
+ _position_set tmp; /* Temporary set for merging sets. */
+ _position_set merged; /* Result of merging sets. */
+ int wants_newline; /* True if some position wants newline info. */
+ int *o_nullable;
+ int *o_nfirst, *o_nlast;
+ _position *o_firstpos, *o_lastpos;
+ int i, j;
+ _position *pos;
+
+ r->searchflag = searchflag;
+
+ MALLOC(nullable, int, r->depth);
+ o_nullable = nullable;
+ MALLOC(nfirstpos, int, r->depth);
+ o_nfirst = nfirstpos;
+ MALLOC(firstpos, _position, r->nleaves);
+ o_firstpos = firstpos, firstpos += r->nleaves;
+ MALLOC(nlastpos, int, r->depth);
+ o_nlast = nlastpos;
+ MALLOC(lastpos, _position, r->nleaves);
+ o_lastpos = lastpos, lastpos += r->nleaves;
+ MALLOC(nalloc, int, r->tindex);
+ for (i = 0; i < r->tindex; ++i)
+ nalloc[i] = 0;
+ MALLOC(merged.elems, _position, r->nleaves);
+
+ CALLOC(r->follows, _position_set, r->tindex);
+
+ for (i = 0; i < r->tindex; ++i)
+ switch (r->tokens[i])
+ {
+ case _EMPTY:
+ /* The empty set is nullable. */
+ *nullable++ = 1;
+
+ /* The firstpos and lastpos of the empty leaf are both empty. */
+ *nfirstpos++ = *nlastpos++ = 0;
+ break;
+
+ case _STAR:
+ case _PLUS:
+ /* Every element in the firstpos of the argument is in the follow
+ of every element in the lastpos. */
+ tmp.nelem = nfirstpos[-1];
+ tmp.elems = firstpos;
+ pos = lastpos;
+ for (j = 0; j < nlastpos[-1]; ++j)
+ {
+ merge(&tmp, &r->follows[pos[j].index], &merged);
+ REALLOC_IF_NECESSARY(r->follows[pos[j].index].elems, _position,
+ nalloc[pos[j].index], merged.nelem - 1);
+ copy(&merged, &r->follows[pos[j].index]);
+ }
+
+ case _QMARK:
+ /* A _QMARK or _STAR node is automatically nullable. */
+ if (r->tokens[i] != _PLUS)
+ nullable[-1] = 1;
+ break;
+
+ case _CAT:
+ /* Every element in the firstpos of the second argument is in the
+ follow of every element in the lastpos of the first argument. */
+ tmp.nelem = nfirstpos[-1];
+ tmp.elems = firstpos;
+ pos = lastpos + nlastpos[-1];
+ for (j = 0; j < nlastpos[-2]; ++j)
+ {
+ merge(&tmp, &r->follows[pos[j].index], &merged);
+ REALLOC_IF_NECESSARY(r->follows[pos[j].index].elems, _position,
+ nalloc[pos[j].index], merged.nelem - 1);
+ copy(&merged, &r->follows[pos[j].index]);
+ }
+
+ /* The firstpos of a _CAT node is the firstpos of the first argument,
+ union that of the second argument if the first is nullable. */
+ if (nullable[-2])
+ nfirstpos[-2] += nfirstpos[-1];
+ else
+ firstpos += nfirstpos[-1];
+ --nfirstpos;
+
+ /* The lastpos of a _CAT node is the lastpos of the second argument,
+ union that of the first argument if the second is nullable. */
+ if (nullable[-1])
+ nlastpos[-2] += nlastpos[-1];
+ else
+ {
+ pos = lastpos + nlastpos[-2];
+ for (j = nlastpos[-1] - 1; j >= 0; --j)
+ pos[j] = lastpos[j];
+ lastpos += nlastpos[-2];
+ nlastpos[-2] = nlastpos[-1];
+ }
+ --nlastpos;
+
+ /* A _CAT node is nullable if both arguments are nullable. */
+ nullable[-2] = nullable[-1] && nullable[-2];
+ --nullable;
+ break;
+
+ case _OR:
+ /* The firstpos is the union of the firstpos of each argument. */
+ nfirstpos[-2] += nfirstpos[-1];
+ --nfirstpos;
+
+ /* The lastpos is the union of the lastpos of each argument. */
+ nlastpos[-2] += nlastpos[-1];
+ --nlastpos;
+
+ /* An _OR node is nullable if either argument is nullable. */
+ nullable[-2] = nullable[-1] || nullable[-2];
+ --nullable;
+ break;
+
+ default:
+ /* Anything else is a nonempty position. (Note that special
+ constructs like \< are treated as nonempty strings here;
+ an "epsilon closure" effectively makes them nullable later.
+ Backreferences have to get a real position so we can detect
+ transitions on them later. But they are nullable. */
+ *nullable++ = r->tokens[i] == _BACKREF;
+
+ /* This position is in its own firstpos and lastpos. */
+ *nfirstpos++ = *nlastpos++ = 1;
+ --firstpos, --lastpos;
+ firstpos->index = lastpos->index = i;
+ firstpos->constraint = lastpos->constraint = _NO_CONSTRAINT;
+
+ /* Allocate the follow set for this position. */
+ nalloc[i] = 1;
+ MALLOC(r->follows[i].elems, _position, nalloc[i]);
+ break;
+ }
+
+ /* For each follow set that is the follow set of a real position, replace
+ it with its epsilon closure. */
+ for (i = 0; i < r->tindex; ++i)
+ if (r->tokens[i] < _NOTCHAR || r->tokens[i] == _BACKREF
+ || r->tokens[i] >= _SET)
+ {
+ copy(&r->follows[i], &merged);
+ epsclosure(&merged, r);
+ if (r->follows[i].nelem < merged.nelem)
+ REALLOC(r->follows[i].elems, _position, merged.nelem);
+ copy(&merged, &r->follows[i]);
+ }
+
+ /* Get the epsilon closure of the firstpos of the regexp. The result will
+ be the set of positions of state 0. */
+ merged.nelem = 0;
+ for (i = 0; i < nfirstpos[-1]; ++i)
+ insert(firstpos[i], &merged);
+ epsclosure(&merged, r);
+
+ /* Check if any of the positions of state 0 will want newline context. */
+ wants_newline = 0;
+ for (i = 0; i < merged.nelem; ++i)
+ if (_PREV_NEWLINE_DEPENDENT(merged.elems[i].constraint))
+ wants_newline = 1;
+
+ /* Build the initial state. */
+ r->salloc = 1;
+ r->sindex = 0;
+ MALLOC(r->states, _dfa_state, r->salloc);
+ state_index(r, &merged, wants_newline, 0);
+
+ free(o_nullable);
+ free(o_nfirst);
+ free(o_firstpos);
+ free(o_nlast);
+ free(o_lastpos);
+ free(nalloc);
+ free(merged.elems);
+}
+
+/* Find, for each character, the transition out of state s of r, and store
+ it in the appropriate slot of trans.
+
+ We divide the positions of s into groups (positions can appear in more
+ than one group). Each group is labeled with a set of characters that
+ every position in the group matches (taking into account, if necessary,
+ preceding context information of s). For each group, find the union
+ of the its elements' follows. This set is the set of positions of the
+ new state. For each character in the group's label, set the transition
+ on this character to be to a state corresponding to the set's positions,
+ and its associated backward context information, if necessary.
+
+ If we are building a searching matcher, we include the positions of state
+ 0 in every state.
+
+ The collection of groups is constructed by building an equivalence-class
+ partition of the positions of s.
+
+ For each position, find the set of characters C that it matches. Eliminate
+ any characters from C that fail on grounds of backward context.
+
+ Search through the groups, looking for a group whose label L has nonempty
+ intersection with C. If L - C is nonempty, create a new group labeled
+ L - C and having the same positions as the current group, and set L to
+ the intersection of L and C. Insert the position in this group, set
+ C = C - L, and resume scanning.
+
+ If after comparing with every group there are characters remaining in C,
+ create a new group labeled with the characters of C and insert this
+ position in that group. */
+void
+regstate(s, r, trans)
+ int s;
+ struct regexp *r;
+ int trans[];
+{
+ _position_set grps[_NOTCHAR]; /* As many as will ever be needed. */
+ _charset labels[_NOTCHAR]; /* Labels corresponding to the groups. */
+ int ngrps = 0; /* Number of groups actually used. */
+ _position pos; /* Current position being considered. */
+ _charset matches; /* Set of matching characters. */
+ int matchesf; /* True if matches is nonempty. */
+ _charset intersect; /* Intersection with some label set. */
+ int intersectf; /* True if intersect is nonempty. */
+ _charset leftovers; /* Stuff in the label that didn't match. */
+ int leftoversf; /* True if leftovers is nonempty. */
+ static _charset letters; /* Set of characters considered letters. */
+ static _charset newline; /* Set of characters that aren't newline. */
+ _position_set follows; /* Union of the follows of some group. */
+ _position_set tmp; /* Temporary space for merging sets. */
+ int state; /* New state. */
+ int wants_newline; /* New state wants to know newline context. */
+ int state_newline; /* New state on a newline transition. */
+ int wants_letter; /* New state wants to know letter context. */
+ int state_letter; /* New state on a letter transition. */
+ static initialized; /* Flag for static initialization. */
+ int i, j, k;
+
+ /* Initialize the set of letters, if necessary. */
+ if (! initialized)
+ {
+ initialized = 1;
+ for (i = 0; i < _NOTCHAR; ++i)
+ if (ISALNUM(i))
+ setbit(i, letters);
+ setbit('\n', newline);
+ }
+
+ zeroset(matches);
+
+ for (i = 0; i < r->states[s].elems.nelem; ++i)
+ {
+ pos = r->states[s].elems.elems[i];
+ if (r->tokens[pos.index] >= 0 && r->tokens[pos.index] < _NOTCHAR)
+ setbit(r->tokens[pos.index], matches);
+ else if (r->tokens[pos.index] >= _SET)
+ copyset(r->charsets[r->tokens[pos.index] - _SET], matches);
+ else
+ continue;
+
+ /* Some characters may need to be climinated from matches because
+ they fail in the current context. */
+ if (pos.constraint != 0xff)
+ {
+ if (! _MATCHES_NEWLINE_CONTEXT(pos.constraint,
+ r->states[s].newline, 1))
+ clrbit('\n', matches);
+ if (! _MATCHES_NEWLINE_CONTEXT(pos.constraint,
+ r->states[s].newline, 0))
+ for (j = 0; j < _CHARSET_INTS; ++j)
+ matches[j] &= newline[j];
+ if (! _MATCHES_LETTER_CONTEXT(pos.constraint,
+ r->states[s].letter, 1))
+ for (j = 0; j < _CHARSET_INTS; ++j)
+ matches[j] &= ~letters[j];
+ if (! _MATCHES_LETTER_CONTEXT(pos.constraint,
+ r->states[s].letter, 0))
+ for (j = 0; j < _CHARSET_INTS; ++j)
+ matches[j] &= letters[j];
+
+ /* If there are no characters left, there's no point in going on. */
+ for (j = 0; j < _CHARSET_INTS && !matches[j]; ++j)
+ ;
+ if (j == _CHARSET_INTS)
+ continue;
+ }
+
+ for (j = 0; j < ngrps; ++j)
+ {
+ /* If matches contains a single character only, and the current
+ group's label doesn't contain that character, go on to the
+ next group. */
+ if (r->tokens[pos.index] >= 0 && r->tokens[pos.index] < _NOTCHAR
+ && !tstbit(r->tokens[pos.index], labels[j]))
+ continue;
+
+ /* Check if this group's label has a nonempty intersection with
+ matches. */
+ intersectf = 0;
+ for (k = 0; k < _CHARSET_INTS; ++k)
+ (intersect[k] = matches[k] & labels[j][k]) ? intersectf = 1 : 0;
+ if (! intersectf)
+ continue;
+
+ /* It does; now find the set differences both ways. */
+ leftoversf = matchesf = 0;
+ for (k = 0; k < _CHARSET_INTS; ++k)
+ {
+ /* Even an optimizing compiler can't know this for sure. */
+ int match = matches[k], label = labels[j][k];
+
+ (leftovers[k] = ~match & label) ? leftoversf = 1 : 0;
+ (matches[k] = match & ~label) ? matchesf = 1 : 0;
+ }
+
+ /* If there were leftovers, create a new group labeled with them. */
+ if (leftoversf)
+ {
+ copyset(leftovers, labels[ngrps]);
+ copyset(intersect, labels[j]);
+ MALLOC(grps[ngrps].elems, _position, r->nleaves);
+ copy(&grps[j], &grps[ngrps]);
+ ++ngrps;
+ }
+
+ /* Put the position in the current group. Note that there is no
+ reason to call insert() here. */
+ grps[j].elems[grps[j].nelem++] = pos;
+
+ /* If every character matching the current position has been
+ accounted for, we're done. */
+ if (! matchesf)
+ break;
+ }
+
+ /* If we've passed the last group, and there are still characters
+ unaccounted for, then we'll have to create a new group. */
+ if (j == ngrps)
+ {
+ copyset(matches, labels[ngrps]);
+ zeroset(matches);
+ MALLOC(grps[ngrps].elems, _position, r->nleaves);
+ grps[ngrps].nelem = 1;
+ grps[ngrps].elems[0] = pos;
+ ++ngrps;
+ }
+ }
+
+ MALLOC(follows.elems, _position, r->nleaves);
+ MALLOC(tmp.elems, _position, r->nleaves);
+
+ /* If we are a searching matcher, the default transition is to a state
+ containing the positions of state 0, otherwise the default transition
+ is to fail miserably. */
+ if (r->searchflag)
+ {
+ wants_newline = 0;
+ wants_letter = 0;
+ for (i = 0; i < r->states[0].elems.nelem; ++i)
+ {
+ if (_PREV_NEWLINE_DEPENDENT(r->states[0].elems.elems[i].constraint))
+ wants_newline = 1;
+ if (_PREV_LETTER_DEPENDENT(r->states[0].elems.elems[i].constraint))
+ wants_letter = 1;
+ }
+ copy(&r->states[0].elems, &follows);
+ state = state_index(r, &follows, 0, 0);
+ if (wants_newline)
+ state_newline = state_index(r, &follows, 1, 0);
+ else
+ state_newline = state;
+ if (wants_letter)
+ state_letter = state_index(r, &follows, 0, 1);
+ else
+ state_letter = state;
+ for (i = 0; i < _NOTCHAR; ++i)
+ trans[i] = (ISALNUM(i)) ? state_letter : state ;
+ trans['\n'] = state_newline;
+ }
+ else
+ for (i = 0; i < _NOTCHAR; ++i)
+ trans[i] = -1;
+
+ for (i = 0; i < ngrps; ++i)
+ {
+ follows.nelem = 0;
+
+ /* Find the union of the follows of the positions of the group.
+ This is a hideously inefficient loop. Fix it someday. */
+ for (j = 0; j < grps[i].nelem; ++j)
+ for (k = 0; k < r->follows[grps[i].elems[j].index].nelem; ++k)
+ insert(r->follows[grps[i].elems[j].index].elems[k], &follows);
+
+ /* If we are building a searching matcher, throw in the positions
+ of state 0 as well. */
+ if (r->searchflag)
+ for (j = 0; j < r->states[0].elems.nelem; ++j)
+ insert(r->states[0].elems.elems[j], &follows);
+
+ /* Find out if the new state will want any context information. */
+ wants_newline = 0;
+ if (tstbit('\n', labels[i]))
+ for (j = 0; j < follows.nelem; ++j)
+ if (_PREV_NEWLINE_DEPENDENT(follows.elems[j].constraint))
+ wants_newline = 1;
+
+ wants_letter = 0;
+ for (j = 0; j < _CHARSET_INTS; ++j)
+ if (labels[i][j] & letters[j])
+ break;
+ if (j < _CHARSET_INTS)
+ for (j = 0; j < follows.nelem; ++j)
+ if (_PREV_LETTER_DEPENDENT(follows.elems[j].constraint))
+ wants_letter = 1;
+
+ /* Find the state(s) corresponding to the union of the follows. */
+ state = state_index(r, &follows, 0, 0);
+ if (wants_newline)
+ state_newline = state_index(r, &follows, 1, 0);
+ else
+ state_newline = state;
+ if (wants_letter)
+ state_letter = state_index(r, &follows, 0, 1);
+ else
+ state_letter = state;
+
+ /* Set the transitions for each character in the current label. */
+ for (j = 0; j < _CHARSET_INTS; ++j)
+ for (k = 0; k < INTBITS; ++k)
+ if (labels[i][j] & 1 << k)
+ {
+ int c = j * INTBITS + k;
+
+ if (c == '\n')
+ trans[c] = state_newline;
+ else if (ISALNUM(c))
+ trans[c] = state_letter;
+ else if (c < _NOTCHAR)
+ trans[c] = state;
+ }
+ }
+
+ for (i = 0; i < ngrps; ++i)
+ free(grps[i].elems);
+ free(follows.elems);
+ free(tmp.elems);
+}
+
+/* Some routines for manipulating a compiled regexp's transition tables.
+ Each state may or may not have a transition table; if it does, and it
+ is a non-accepting state, then r->trans[state] points to its table.
+ If it is an accepting state then r->fails[state] points to its table.
+ If it has no table at all, then r->trans[state] is NULL.
+ TODO: Improve this comment, get rid of the unnecessary redundancy. */
+
+static void
+build_state(s, r)
+ int s;
+ struct regexp *r;
+{
+ int *trans; /* The new transition table. */
+ int i;
+
+ /* Set an upper limit on the number of transition tables that will ever
+ exist at once. 1024 is arbitrary. The idea is that the frequently
+ used transition tables will be quickly rebuilt, whereas the ones that
+ were only needed once or twice will be cleared away. */
+ if (r->trcount >= 1024)
+ {
+ for (i = 0; i < r->tralloc; ++i)
+ if (r->trans[i])
+ {
+ free((ptr_t) r->trans[i]);
+ r->trans[i] = NULL;
+ }
+ else if (r->fails[i])
+ {
+ free((ptr_t) r->fails[i]);
+ r->fails[i] = NULL;
+ }
+ r->trcount = 0;
+ }
+
+ ++r->trcount;
+
+ /* Set up the success bits for this state. */
+ r->success[s] = 0;
+ if (ACCEPTS_IN_CONTEXT(r->states[s].newline, 1, r->states[s].letter, 0,
+ s, *r))
+ r->success[s] |= 4;
+ if (ACCEPTS_IN_CONTEXT(r->states[s].newline, 0, r->states[s].letter, 1,
+ s, *r))
+ r->success[s] |= 2;
+ if (ACCEPTS_IN_CONTEXT(r->states[s].newline, 0, r->states[s].letter, 0,
+ s, *r))
+ r->success[s] |= 1;
+
+ MALLOC(trans, int, _NOTCHAR);
+ regstate(s, r, trans);
+
+ /* Now go through the new transition table, and make sure that the trans
+ and fail arrays are allocated large enough to hold a pointer for the
+ largest state mentioned in the table. */
+ for (i = 0; i < _NOTCHAR; ++i)
+ if (trans[i] >= r->tralloc)
+ {
+ int oldalloc = r->tralloc;
+
+ while (trans[i] >= r->tralloc)
+ r->tralloc *= 2;
+ REALLOC(r->realtrans, int *, r->tralloc + 1);
+ r->trans = r->realtrans + 1;
+ REALLOC(r->fails, int *, r->tralloc);
+ REALLOC(r->success, int, r->tralloc);
+ REALLOC(r->newlines, int, r->tralloc);
+ while (oldalloc < r->tralloc)
+ {
+ r->trans[oldalloc] = NULL;
+ r->fails[oldalloc++] = NULL;
+ }
+ }
+
+ /* Keep the newline transition in a special place so we can use it as
+ a sentinel. */
+ r->newlines[s] = trans['\n'];
+ trans['\n'] = -1;
+
+ if (ACCEPTING(s, *r))
+ r->fails[s] = trans;
+ else
+ r->trans[s] = trans;
+}
+
+static void
+build_state_zero(r)
+ struct regexp *r;
+{
+ r->tralloc = 1;
+ r->trcount = 0;
+ CALLOC(r->realtrans, int *, r->tralloc + 1);
+ r->trans = r->realtrans + 1;
+ CALLOC(r->fails, int *, r->tralloc);
+ MALLOC(r->success, int, r->tralloc);
+ MALLOC(r->newlines, int, r->tralloc);
+ build_state(0, r);
+}
+
+/* Search through a buffer looking for a match to the given struct regexp.
+ Find the first occurrence of a string matching the regexp in the buffer,
+ and the shortest possible version thereof. Return a pointer to the first
+ character after the match, or NULL if none is found. Begin points to
+ the beginning of the buffer, and end points to the first character after
+ its end. We store a newline in *end to act as a sentinel, so end had
+ better point somewhere valid. Newline is a flag indicating whether to
+ allow newlines to be in the matching string. If count is non-
+ NULL it points to a place we're supposed to increment every time we
+ see a newline. Finally, if backref is non-NULL it points to a place
+ where we're supposed to store a 1 if backreferencing happened and the
+ match needs to be verified by a backtracking matcher. Otherwise
+ we store a 0 in *backref. */
+char *
+regexecute(r, begin, end, newline, count, backref)
+ struct regexp *r;
+ char *begin;
+ char *end;
+ int newline;
+ int *count;
+ int *backref;
+{
+ register s, s1, tmp; /* Current state. */
+ register unsigned char *p; /* Current input character. */
+ register **trans, *t; /* Copy of r->trans so it can be optimized
+ into a register. */
+ static sbit[_NOTCHAR]; /* Table for anding with r->success. */
+ static sbit_init;
+
+ if (! sbit_init)
+ {
+ int i;
+
+ sbit_init = 1;
+ for (i = 0; i < _NOTCHAR; ++i)
+ sbit[i] = (ISALNUM(i)) ? 2 : 1;
+ sbit['\n'] = 4;
+ }
+
+ if (! r->tralloc)
+ build_state_zero(r);
+
+ s = s1 = 0;
+ p = (unsigned char *) begin;
+ trans = r->trans;
+ *end = '\n';
+
+ for (;;)
+ {
+ while ((t = trans[s]) != 0) { /* hand-optimized loop */
+ s1 = t[*p++];
+ if ((t = trans[s1]) == 0) {
+ tmp = s ; s = s1 ; s1 = tmp ; /* swap */
+ break;
+ }
+ s = t[*p++];
+ }
+
+ if (s >= 0 && p <= (unsigned char *) end && r->fails[s])
+ {
+ if (r->success[s] & sbit[*p])
+ {
+ if (backref)
+ *backref = (r->states[s].backref != 0);
+ return (char *) p;
+ }
+
+ s1 = s;
+ s = r->fails[s][*p++];
+ continue;
+ }
+
+ /* If the previous character was a newline, count it. */
+ if (count && (char *) p <= end && p[-1] == '\n')
+ ++*count;
+
+ /* Check if we've run off the end of the buffer. */
+ if ((char *) p >= end)
+ return NULL;
+
+ if (s >= 0)
+ {
+ build_state(s, r);
+ trans = r->trans;
+ continue;
+ }
+
+ if (p[-1] == '\n' && newline)
+ {
+ s = r->newlines[s1];
+ continue;
+ }
+
+ s = 0;
+ }
+}
+
+/* Initialize the components of a regexp that the other routines don't
+ initialize for themselves. */
+void
+reginit(r)
+ struct regexp *r;
+{
+ r->calloc = 1;
+ MALLOC(r->charsets, _charset, r->calloc);
+ r->cindex = 0;
+
+ r->talloc = 1;
+ MALLOC(r->tokens, _token, r->talloc);
+ r->tindex = r->depth = r->nleaves = r->nregexps = 0;
+
+ r->searchflag = 0;
+ r->tralloc = 0;
+}
+
+/* Parse and analyze a single string of the given length. */
+void
+regcompile(s, len, r, searchflag)
+ const char *s;
+ size_t len;
+ struct regexp *r;
+ int searchflag;
+{
+ if (case_fold) /* dummy folding in service of regmust() */
+ {
+ char *regcopy;
+ int i;
+
+ regcopy = malloc(len);
+ if (!regcopy)
+ reg_error("out of memory");
+
+ /* This is a complete kludge and could potentially break
+ \<letter> escapes . . . */
+ case_fold = 0;
+ for (i = 0; i < len; ++i)
+ if (ISUPPER(s[i]))
+ regcopy[i] = tolower(s[i]);
+ else
+ regcopy[i] = s[i];
+
+ reginit(r);
+ r->mustn = 0;
+ r->must[0] = '\0';
+ regparse(regcopy, len, r);
+ free(regcopy);
+ regmust(r);
+ reganalyze(r, searchflag);
+ case_fold = 1;
+ reginit(r);
+ regparse(s, len, r);
+ reganalyze(r, searchflag);
+ }
+ else
+ {
+ reginit(r);
+ regparse(s, len, r);
+ regmust(r);
+ reganalyze(r, searchflag);
+ }
+}
+
+/* Free the storage held by the components of a regexp. */
+void
+reg_free(r)
+ struct regexp *r;
+{
+ int i;
+
+ free((ptr_t) r->charsets);
+ free((ptr_t) r->tokens);
+ for (i = 0; i < r->sindex; ++i)
+ free((ptr_t) r->states[i].elems.elems);
+ free((ptr_t) r->states);
+ for (i = 0; i < r->tindex; ++i)
+ if (r->follows[i].elems)
+ free((ptr_t) r->follows[i].elems);
+ free((ptr_t) r->follows);
+ for (i = 0; i < r->tralloc; ++i)
+ if (r->trans[i])
+ free((ptr_t) r->trans[i]);
+ else if (r->fails[i])
+ free((ptr_t) r->fails[i]);
+ if (r->realtrans)
+ free((ptr_t) r->realtrans);
+ if (r->fails)
+ free((ptr_t) r->fails);
+ if (r->newlines)
+ free((ptr_t) r->newlines);
+}
+
+/*
+Having found the postfix representation of the regular expression,
+try to find a long sequence of characters that must appear in any line
+containing the r.e.
+Finding a "longest" sequence is beyond the scope here;
+we take an easy way out and hope for the best.
+(Take "(ab|a)b"--please.)
+
+We do a bottom-up calculation of sequences of characters that must appear
+in matches of r.e.'s represented by trees rooted at the nodes of the postfix
+representation:
+ sequences that must appear at the left of the match ("left")
+ sequences that must appear at the right of the match ("right")
+ lists of sequences that must appear somewhere in the match ("in")
+ sequences that must constitute the match ("is")
+When we get to the root of the tree, we use one of the longest of its
+calculated "in" sequences as our answer. The sequence we find is returned in
+r->must (where "r" is the single argument passed to "regmust");
+the length of the sequence is returned in r->mustn.
+
+The sequences calculated for the various types of node (in pseudo ANSI c)
+are shown below. "p" is the operand of unary operators (and the left-hand
+operand of binary operators); "q" is the right-hand operand of binary operators
+.
+"ZERO" means "a zero-length sequence" below.
+
+Type left right is in
+---- ---- ----- -- --
+char c # c # c # c # c
+
+SET ZERO ZERO ZERO ZERO
+
+STAR ZERO ZERO ZERO ZERO
+
+QMARK ZERO ZERO ZERO ZERO
+
+PLUS p->left p->right ZERO p->in
+
+CAT (p->is==ZERO)? (q->is==ZERO)? (p->is!=ZERO && p->in plus
+ p->left : q->right : q->is!=ZERO) ? q->in plus
+ p->is##q->left p->right##q->is p->is##q->is : p->right##q->left
+ ZERO
+
+OR longest common longest common (do p->is and substrings common to
+ leading trailing q->is have same p->in and q->in
+ (sub)sequence (sub)sequence length and
+ of p->left of p->right content) ?
+ and q->left and q->right p->is : NULL
+
+If there's anything else we recognize in the tree, all four sequences get set
+to zero-length sequences. If there's something we don't recognize in the tree,
+we just return a zero-length sequence.
+
+Break ties in favor of infrequent letters (choosing 'zzz' in preference to
+'aaa')?
+
+And. . .is it here or someplace that we might ponder "optimizations" such as
+ egrep 'psi|epsilon' -> egrep 'psi'
+ egrep 'pepsi|epsilon' -> egrep 'epsi'
+ (Yes, we now find "epsi" as a "string
+ that must occur", but we might also
+ simplify the *entire* r.e. being sought
+)
+ grep '[c]' -> grep 'c'
+ grep '(ab|a)b' -> grep 'ab'
+ grep 'ab*' -> grep 'a'
+ grep 'a*b' -> grep 'b'
+There are several issues:
+ Is optimization easy (enough)?
+
+ Does optimization actually accomplish anything,
+ or is the automaton you get from "psi|epsilon" (for example)
+ the same as the one you get from "psi" (for example)?
+
+ Are optimizable r.e.'s likely to be used in real-life situations
+ (something like 'ab*' is probably unlikely; something like is
+ 'psi|epsilon' is likelier)?
+*/
+
+static char *
+icatalloc(old, new)
+char * old;
+const char * new;
+{
+ register char * result;
+ register int oldsize, newsize;
+
+ newsize = (new == NULL) ? 0 : strlen(new);
+ if (old == NULL)
+ oldsize = 0;
+ else if (newsize == 0)
+ return old;
+ else oldsize = strlen(old);
+ if (old == NULL)
+ result = (char *) malloc(newsize + 1);
+ else result = (char *) realloc((void *) old, oldsize + newsize + 1);
+ if (result != NULL && new != NULL)
+ (void) strcpy(result + oldsize, new);
+ return result;
+}
+
+static char *
+icpyalloc(string)
+const char * string;
+{
+ return icatalloc((char *) NULL, string);
+}
+
+static char *
+istrstr(lookin, lookfor)
+char * lookin;
+register char * lookfor;
+{
+ register char * cp;
+ register int len;
+
+ len = strlen(lookfor);
+ for (cp = lookin; *cp != '\0'; ++cp)
+ if (strncmp(cp, lookfor, len) == 0)
+ return cp;
+ return NULL;
+}
+
+static void
+ifree(cp)
+char * cp;
+{
+ if (cp != NULL)
+ free(cp);
+}
+
+static void
+freelist(cpp)
+register char ** cpp;
+{
+ register int i;
+
+ if (cpp == NULL)
+ return;
+ for (i = 0; cpp[i] != NULL; ++i) {
+ free(cpp[i]);
+ cpp[i] = NULL;
+ }
+}
+
+static char **
+enlist(cpp, new, len)
+register char ** cpp;
+register char * new;
+#ifdef __STDC__
+size_t len;
+#else
+int len;
+#endif
+{
+ register int i, j;
+
+ if (cpp == NULL)
+ return NULL;
+ if ((new = icpyalloc(new)) == NULL) {
+ freelist(cpp);
+ return NULL;
+ }
+ new[len] = '\0';
+ /*
+ ** Is there already something in the list that's new (or longer)?
+ */
+ for (i = 0; cpp[i] != NULL; ++i)
+ if (istrstr(cpp[i], new) != NULL) {
+ free(new);
+ return cpp;
+ }
+ /*
+ ** Eliminate any obsoleted strings.
+ */
+ j = 0;
+ while (cpp[j] != NULL)
+ if (istrstr(new, cpp[j]) == NULL)
+ ++j;
+ else {
+ free(cpp[j]);
+ if (--i == j)
+ break;
+ cpp[j] = cpp[i];
+ }
+ /*
+ ** Add the new string.
+ */
+ cpp = (char **) realloc((char *) cpp, (i + 2) * sizeof *cpp);
+ if (cpp == NULL)
+ return NULL;
+ cpp[i] = new;
+ cpp[i + 1] = NULL;
+ return cpp;
+}
+
+/*
+** Given pointers to two strings,
+** return a pointer to an allocated list of their distinct common substrings.
+** Return NULL if something seems wild.
+*/
+
+static char **
+comsubs(left, right)
+char * left;
+char * right;
+{
+ register char ** cpp;
+ register char * lcp;
+ register char * rcp;
+ register int i, len;
+
+ if (left == NULL || right == NULL)
+ return NULL;
+ cpp = (char **) malloc(sizeof *cpp);
+ if (cpp == NULL)
+ return NULL;
+ cpp[0] = NULL;
+ for (lcp = left; *lcp != '\0'; ++lcp) {
+ len = 0;
+ rcp = strchr(right, *lcp);
+ while (rcp != NULL) {
+ for (i = 1; lcp[i] != '\0' && lcp[i] == rcp[i]; ++i)
+ ;
+ if (i > len)
+ len = i;
+ rcp = strchr(rcp + 1, *lcp);
+ }
+ if (len == 0)
+ continue;
+#ifdef __STDC__
+ if ((cpp = enlist(cpp, lcp, (size_t)len)) == NULL)
+#else
+ if ((cpp = enlist(cpp, lcp, len)) == NULL)
+#endif
+ break;
+ }
+ return cpp;
+}
+
+static char **
+addlists(old, new)
+char ** old;
+char ** new;
+{
+ register int i;
+
+ if (old == NULL || new == NULL)
+ return NULL;
+ for (i = 0; new[i] != NULL; ++i) {
+ old = enlist(old, new[i], strlen(new[i]));
+ if (old == NULL)
+ break;
+ }
+ return old;
+}
+
+/*
+** Given two lists of substrings,
+** return a new list giving substrings common to both.
+*/
+
+static char **
+inboth(left, right)
+char ** left;
+char ** right;
+{
+ register char ** both;
+ register char ** temp;
+ register int lnum, rnum;
+
+ if (left == NULL || right == NULL)
+ return NULL;
+ both = (char **) malloc(sizeof *both);
+ if (both == NULL)
+ return NULL;
+ both[0] = NULL;
+ for (lnum = 0; left[lnum] != NULL; ++lnum) {
+ for (rnum = 0; right[rnum] != NULL; ++rnum) {
+ temp = comsubs(left[lnum], right[rnum]);
+ if (temp == NULL) {
+ freelist(both);
+ return NULL;
+ }
+ both = addlists(both, temp);
+ freelist(temp);
+ if (both == NULL)
+ return NULL;
+ }
+ }
+ return both;
+}
+
+/*
+typedef struct {
+ char ** in;
+ char * left;
+ char * right;
+ char * is;
+} must;
+ */
+static void
+resetmust(mp)
+register must * mp;
+{
+ mp->left[0] = mp->right[0] = mp->is[0] = '\0';
+ freelist(mp->in);
+}
+
+static void
+regmust(r)
+register struct regexp * r;
+{
+ register must * musts;
+ register must * mp;
+ register char * result = "";
+ register int ri;
+ register int i;
+ register _token t;
+ static must must0;
+
+ reg->mustn = 0;
+ reg->must[0] = '\0';
+ musts = (must *) malloc((reg->tindex + 1) * sizeof *musts);
+ if (musts == NULL)
+ return;
+ mp = musts;
+ for (i = 0; i <= reg->tindex; ++i)
+ mp[i] = must0;
+ for (i = 0; i <= reg->tindex; ++i) {
+ mp[i].in = (char **) malloc(sizeof *mp[i].in);
+ mp[i].left = malloc(2);
+ mp[i].right = malloc(2);
+ mp[i].is = malloc(2);
+ if (mp[i].in == NULL || mp[i].left == NULL ||
+ mp[i].right == NULL || mp[i].is == NULL)
+ goto done;
+ mp[i].left[0] = mp[i].right[0] = mp[i].is[0] = '\0';
+ mp[i].in[0] = NULL;
+ }
+ for (ri = 0; ri < reg->tindex; ++ri) {
+ switch (t = reg->tokens[ri]) {
+ case _ALLBEGLINE:
+ case _ALLENDLINE:
+ case _LPAREN:
+ case _RPAREN:
+ goto done; /* "cannot happen" */
+ case _EMPTY:
+ case _BEGLINE:
+ case _ENDLINE:
+ case _BEGWORD:
+ case _ENDWORD:
+ case _LIMWORD:
+ case _NOTLIMWORD:
+ case _BACKREF:
+ resetmust(mp);
+ break;
+ case _STAR:
+ case _QMARK:
+ if (mp <= musts)
+ goto done; /* "cannot happen" */
+ --mp;
+ resetmust(mp);
+ break;
+ case _OR:
+ if (mp < &musts[2])
+ goto done; /* "cannot happen" */
+ {
+ register char ** new;
+ register must * lmp;
+ register must * rmp;
+ register int j, ln, rn, n;
+
+ rmp = --mp;
+ lmp = --mp;
+ /* Guaranteed to be. Unlikely, but. . . */
+ if (strcmp(lmp->is, rmp->is) != 0)
+ lmp->is[0] = '\0';
+ /* Left side--easy */
+ i = 0;
+ while (lmp->left[i] != '\0' &&
+ lmp->left[i] == rmp->left[i])
+ ++i;
+ lmp->left[i] = '\0';
+ /* Right side */
+ ln = strlen(lmp->right);
+ rn = strlen(rmp->right);
+ n = ln;
+ if (n > rn)
+ n = rn;
+ for (i = 0; i < n; ++i)
+ if (lmp->right[ln - i - 1] !=
+ rmp->right[rn - i - 1])
+ break;
+ for (j = 0; j < i; ++j)
+ lmp->right[j] =
+ lmp->right[(ln - i) + j];
+ lmp->right[j] = '\0';
+ new = inboth(lmp->in, rmp->in);
+ if (new == NULL)
+ goto done;
+ freelist(lmp->in);
+ free((char *) lmp->in);
+ lmp->in = new;
+ }
+ break;
+ case _PLUS:
+ if (mp <= musts)
+ goto done; /* "cannot happen" */
+ --mp;
+ mp->is[0] = '\0';
+ break;
+ case _END:
+ if (mp != &musts[1])
+ goto done; /* "cannot happen" */
+ for (i = 0; musts[0].in[i] != NULL; ++i)
+ if (strlen(musts[0].in[i]) > strlen(result))
+ result = musts[0].in[i];
+ goto done;
+ case _CAT:
+ if (mp < &musts[2])
+ goto done; /* "cannot happen" */
+ {
+ register must * lmp;
+ register must * rmp;
+
+ rmp = --mp;
+ lmp = --mp;
+ /*
+ ** In. Everything in left, plus everything in
+ ** right, plus catenation of
+ ** left's right and right's left.
+ */
+ lmp->in = addlists(lmp->in, rmp->in);
+ if (lmp->in == NULL)
+ goto done;
+ if (lmp->right[0] != '\0' &&
+ rmp->left[0] != '\0') {
+ register char * tp;
+
+ tp = icpyalloc(lmp->right);
+ if (tp == NULL)
+ goto done;
+ tp = icatalloc(tp, rmp->left);
+ if (tp == NULL)
+ goto done;
+ lmp->in = enlist(lmp->in, tp,
+ strlen(tp));
+ free(tp);
+ if (lmp->in == NULL)
+ goto done;
+ }
+ /* Left-hand */
+ if (lmp->is[0] != '\0') {
+ lmp->left = icatalloc(lmp->left,
+ rmp->left);
+ if (lmp->left == NULL)
+ goto done;
+ }
+ /* Right-hand */
+ if (rmp->is[0] == '\0')
+ lmp->right[0] = '\0';
+ lmp->right = icatalloc(lmp->right, rmp->right);
+ if (lmp->right == NULL)
+ goto done;
+ /* Guaranteed to be */
+ if (lmp->is[0] != '\0' && rmp->is[0] != '\0') {
+ lmp->is = icatalloc(lmp->is, rmp->is);
+ if (lmp->is == NULL)
+ goto done;
+ }
+ }
+ break;
+ default:
+ if (t < _END) {
+ /* "cannot happen" */
+ goto done;
+ } else if (t == '\0') {
+ /* not on *my* shift */
+ goto done;
+ } else if (t >= _SET) {
+ /* easy enough */
+ resetmust(mp);
+ } else {
+ /* plain character */
+ resetmust(mp);
+ mp->is[0] = mp->left[0] = mp->right[0] = t;
+ mp->is[1] = mp->left[1] = mp->right[1] = '\0';
+ mp->in = enlist(mp->in, mp->is, 1);
+ if (mp->in == NULL)
+ goto done;
+ }
+ break;
+ }
+ ++mp;
+ }
+done:
+ (void) strncpy(reg->must, result, MUST_MAX - 1);
+ reg->must[MUST_MAX - 1] = '\0';
+ reg->mustn = strlen(reg->must);
+ mp = musts;
+ for (i = 0; i <= reg->tindex; ++i) {
+ freelist(mp[i].in);
+ ifree((char *) mp[i].in);
+ ifree(mp[i].left);
+ ifree(mp[i].right);
+ ifree(mp[i].is);
+ }
+ free((char *) mp);
+}
diff --git a/gnu/usr.bin/awk/dfa.h b/gnu/usr.bin/awk/dfa.h
new file mode 100644
index 000000000000..65fc49565a7c
--- /dev/null
+++ b/gnu/usr.bin/awk/dfa.h
@@ -0,0 +1,543 @@
+/* dfa.h - declarations for GNU deterministic regexp compiler
+ Copyright (C) 1988 Free Software Foundation, Inc.
+ Written June, 1988 by Mike Haertel
+
+ NO WARRANTY
+
+ BECAUSE THIS PROGRAM IS LICENSED FREE OF CHARGE, WE PROVIDE ABSOLUTELY
+NO WARRANTY, TO THE EXTENT PERMITTED BY APPLICABLE STATE LAW. EXCEPT
+WHEN OTHERWISE STATED IN WRITING, FREE SOFTWARE FOUNDATION, INC,
+RICHARD M. STALLMAN AND/OR OTHER PARTIES PROVIDE THIS PROGRAM "AS IS"
+WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING,
+BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
+FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY
+AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE
+DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR
+CORRECTION.
+
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW WILL RICHARD M.
+STALLMAN, THE FREE SOFTWARE FOUNDATION, INC., AND/OR ANY OTHER PARTY
+WHO MAY MODIFY AND REDISTRIBUTE THIS PROGRAM AS PERMITTED BELOW, BE
+LIABLE TO YOU FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR
+OTHER SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR
+DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD PARTIES OR
+A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS) THIS
+PROGRAM, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH
+DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY.
+
+ GENERAL PUBLIC LICENSE TO COPY
+
+ 1. You may copy and distribute verbatim copies of this source file
+as you receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy a valid copyright notice "Copyright
+ (C) 1988 Free Software Foundation, Inc."; and include following the
+copyright notice a verbatim copy of the above disclaimer of warranty
+and of this License. You may charge a distribution fee for the
+physical act of transferring a copy.
+
+ 2. You may modify your copy or copies of this source file or
+any portion of it, and copy and distribute such modifications under
+the terms of Paragraph 1 above, provided that you also do the following:
+
+ a) cause the modified files to carry prominent notices stating
+ that you changed the files and the date of any change; and
+
+ b) cause the whole of any work that you distribute or publish,
+ that in whole or in part contains or is a derivative of this
+ program or any part thereof, to be licensed at no charge to all
+ third parties on terms identical to those contained in this
+ License Agreement (except that you may choose to grant more extensive
+ warranty protection to some or all third parties, at your option).
+
+ c) You may charge a distribution fee for the physical act of
+ transferring a copy, and you may at your option offer warranty
+ protection in exchange for a fee.
+
+Mere aggregation of another unrelated program with this program (or its
+derivative) on a volume of a storage or distribution medium does not bring
+the other program under the scope of these terms.
+
+ 3. You may copy and distribute this program or any portion of it in
+compiled, executable or object code form under the terms of Paragraphs
+1 and 2 above provided that you do the following:
+
+ a) accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of
+ Paragraphs 1 and 2 above; or,
+
+ b) accompany it with a written offer, valid for at least three
+ years, to give any third party free (except for a nominal
+ shipping charge) a complete machine-readable copy of the
+ corresponding source code, to be distributed under the terms of
+ Paragraphs 1 and 2 above; or,
+
+ c) accompany it with the information you received as to where the
+ corresponding source code may be obtained. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form alone.)
+
+For an executable file, complete source code means all the source code for
+all modules it contains; but, as a special exception, it need not include
+source code for modules which are standard libraries that accompany the
+operating system on which the executable file runs.
+
+ 4. You may not copy, sublicense, distribute or transfer this program
+except as expressly provided under this License Agreement. Any attempt
+otherwise to copy, sublicense, distribute or transfer this program is void and
+your rights to use the program under this License agreement shall be
+automatically terminated. However, parties who have received computer
+software programs from you with this License Agreement will not have
+their licenses terminated so long as such parties remain in full compliance.
+
+ 5. If you wish to incorporate parts of this program into other free
+programs whose distribution conditions are different, write to the Free
+Software Foundation at 675 Mass Ave, Cambridge, MA 02139. We have not yet
+worked out a simple rule that can be stated here, but we will often permit
+this. We will be guided by the two goals of preserving the free status of
+all derivatives our free software and of promoting the sharing and reuse of
+software.
+
+
+In other words, you are welcome to use, share and improve this program.
+You are forbidden to forbid anyone else to use, share and improve
+what you give them. Help stamp out software-hoarding! */
+
+#ifdef __STDC__
+
+#ifdef SOMEDAY
+#define ISALNUM(c) isalnum(c)
+#define ISALPHA(c) isalpha(c)
+#define ISUPPER(c) isupper(c)
+#else
+#define ISALNUM(c) (isascii(c) && isalnum(c))
+#define ISALPHA(c) (isascii(c) && isalpha(c))
+#define ISUPPER(c) (isascii(c) && isupper(c))
+#endif
+
+#else /* ! __STDC__ */
+
+#define const
+
+#define ISALNUM(c) (isascii(c) && isalnum(c))
+#define ISALPHA(c) (isascii(c) && isalpha(c))
+#define ISUPPER(c) (isascii(c) && isupper(c))
+
+#endif /* ! __STDC__ */
+
+/* 1 means plain parentheses serve as grouping, and backslash
+ parentheses are needed for literal searching.
+ 0 means backslash-parentheses are grouping, and plain parentheses
+ are for literal searching. */
+#define RE_NO_BK_PARENS 1L
+
+/* 1 means plain | serves as the "or"-operator, and \| is a literal.
+ 0 means \| serves as the "or"-operator, and | is a literal. */
+#define RE_NO_BK_VBAR (1L << 1)
+
+/* 0 means plain + or ? serves as an operator, and \+, \? are literals.
+ 1 means \+, \? are operators and plain +, ? are literals. */
+#define RE_BK_PLUS_QM (1L << 2)
+
+/* 1 means | binds tighter than ^ or $.
+ 0 means the contrary. */
+#define RE_TIGHT_VBAR (1L << 3)
+
+/* 1 means treat \n as an _OR operator
+ 0 means treat it as a normal character */
+#define RE_NEWLINE_OR (1L << 4)
+
+/* 0 means that a special characters (such as *, ^, and $) always have
+ their special meaning regardless of the surrounding context.
+ 1 means that special characters may act as normal characters in some
+ contexts. Specifically, this applies to:
+ ^ - only special at the beginning, or after ( or |
+ $ - only special at the end, or before ) or |
+ *, +, ? - only special when not after the beginning, (, or | */
+#define RE_CONTEXT_INDEP_OPS (1L << 5)
+
+/* 1 means that \ in a character class escapes the next character (typically
+ a hyphen. It also is overloaded to mean that hyphen at the end of the range
+ is allowable and means that the hyphen is to be taken literally. */
+#define RE_AWK_CLASS_HACK (1L << 6)
+
+/* Now define combinations of bits for the standard possibilities. */
+#ifdef notdef
+#define RE_SYNTAX_AWK (RE_NO_BK_PARENS | RE_NO_BK_VBAR | RE_CONTEXT_INDEP_OPS)
+#define RE_SYNTAX_EGREP (RE_SYNTAX_AWK | RE_NEWLINE_OR)
+#define RE_SYNTAX_GREP (RE_BK_PLUS_QM | RE_NEWLINE_OR)
+#define RE_SYNTAX_EMACS 0
+#endif
+
+/* The NULL pointer. */
+#ifndef NULL
+#define NULL 0
+#endif
+
+/* Number of bits in an unsigned char. */
+#ifndef CHARBITS
+#define CHARBITS 8
+#endif
+
+/* First integer value that is greater than any character code. */
+#define _NOTCHAR (1 << CHARBITS)
+
+/* INTBITS need not be exact, just a lower bound. */
+#ifndef INTBITS
+#define INTBITS (CHARBITS * sizeof (int))
+#endif
+
+/* Number of ints required to hold a bit for every character. */
+#define _CHARSET_INTS ((_NOTCHAR + INTBITS - 1) / INTBITS)
+
+/* Sets of unsigned characters are stored as bit vectors in arrays of ints. */
+typedef int _charset[_CHARSET_INTS];
+
+/* The regexp is parsed into an array of tokens in postfix form. Some tokens
+ are operators and others are terminal symbols. Most (but not all) of these
+ codes are returned by the lexical analyzer. */
+#ifdef __STDC__
+
+typedef enum
+{
+ _END = -1, /* _END is a terminal symbol that matches the
+ end of input; any value of _END or less in
+ the parse tree is such a symbol. Accepting
+ states of the DFA are those that would have
+ a transition on _END. */
+
+ /* Ordinary character values are terminal symbols that match themselves. */
+
+ _EMPTY = _NOTCHAR, /* _EMPTY is a terminal symbol that matches
+ the empty string. */
+
+ _BACKREF, /* _BACKREF is generated by \<digit>; it
+ it not completely handled. If the scanner
+ detects a transition on backref, it returns
+ a kind of "semi-success" indicating that
+ the match will have to be verified with
+ a backtracking matcher. */
+
+ _BEGLINE, /* _BEGLINE is a terminal symbol that matches
+ the empty string if it is at the beginning
+ of a line. */
+
+ _ALLBEGLINE, /* _ALLBEGLINE is a terminal symbol that
+ matches the empty string if it is at the
+ beginning of a line; _ALLBEGLINE applies
+ to the entire regexp and can only occur
+ as the first token thereof. _ALLBEGLINE
+ never appears in the parse tree; a _BEGLINE
+ is prepended with _CAT to the entire
+ regexp instead. */
+
+ _ENDLINE, /* _ENDLINE is a terminal symbol that matches
+ the empty string if it is at the end of
+ a line. */
+
+ _ALLENDLINE, /* _ALLENDLINE is to _ENDLINE as _ALLBEGLINE
+ is to _BEGLINE. */
+
+ _BEGWORD, /* _BEGWORD is a terminal symbol that matches
+ the empty string if it is at the beginning
+ of a word. */
+
+ _ENDWORD, /* _ENDWORD is a terminal symbol that matches
+ the empty string if it is at the end of
+ a word. */
+
+ _LIMWORD, /* _LIMWORD is a terminal symbol that matches
+ the empty string if it is at the beginning
+ or the end of a word. */
+
+ _NOTLIMWORD, /* _NOTLIMWORD is a terminal symbol that
+ matches the empty string if it is not at
+ the beginning or end of a word. */
+
+ _QMARK, /* _QMARK is an operator of one argument that
+ matches zero or one occurences of its
+ argument. */
+
+ _STAR, /* _STAR is an operator of one argument that
+ matches the Kleene closure (zero or more
+ occurrences) of its argument. */
+
+ _PLUS, /* _PLUS is an operator of one argument that
+ matches the positive closure (one or more
+ occurrences) of its argument. */
+
+ _CAT, /* _CAT is an operator of two arguments that
+ matches the concatenation of its
+ arguments. _CAT is never returned by the
+ lexical analyzer. */
+
+ _OR, /* _OR is an operator of two arguments that
+ matches either of its arguments. */
+
+ _LPAREN, /* _LPAREN never appears in the parse tree,
+ it is only a lexeme. */
+
+ _RPAREN, /* _RPAREN never appears in the parse tree. */
+
+ _SET /* _SET and (and any value greater) is a
+ terminal symbol that matches any of a
+ class of characters. */
+} _token;
+
+#else /* ! __STDC__ */
+
+typedef short _token;
+
+#define _END -1
+#define _EMPTY _NOTCHAR
+#define _BACKREF (_EMPTY + 1)
+#define _BEGLINE (_EMPTY + 2)
+#define _ALLBEGLINE (_EMPTY + 3)
+#define _ENDLINE (_EMPTY + 4)
+#define _ALLENDLINE (_EMPTY + 5)
+#define _BEGWORD (_EMPTY + 6)
+#define _ENDWORD (_EMPTY + 7)
+#define _LIMWORD (_EMPTY + 8)
+#define _NOTLIMWORD (_EMPTY + 9)
+#define _QMARK (_EMPTY + 10)
+#define _STAR (_EMPTY + 11)
+#define _PLUS (_EMPTY + 12)
+#define _CAT (_EMPTY + 13)
+#define _OR (_EMPTY + 14)
+#define _LPAREN (_EMPTY + 15)
+#define _RPAREN (_EMPTY + 16)
+#define _SET (_EMPTY + 17)
+
+#endif /* ! __STDC__ */
+
+/* Sets are stored in an array in the compiled regexp; the index of the
+ array corresponding to a given set token is given by _SET_INDEX(t). */
+#define _SET_INDEX(t) ((t) - _SET)
+
+/* Sometimes characters can only be matched depending on the surrounding
+ context. Such context decisions depend on what the previous character
+ was, and the value of the current (lookahead) character. Context
+ dependent constraints are encoded as 8 bit integers. Each bit that
+ is set indicates that the constraint succeeds in the corresponding
+ context.
+
+ bit 7 - previous and current are newlines
+ bit 6 - previous was newline, current isn't
+ bit 5 - previous wasn't newline, current is
+ bit 4 - neither previous nor current is a newline
+ bit 3 - previous and current are word-constituents
+ bit 2 - previous was word-constituent, current isn't
+ bit 1 - previous wasn't word-constituent, current is
+ bit 0 - neither previous nor current is word-constituent
+
+ Word-constituent characters are those that satisfy isalnum().
+
+ The macro _SUCCEEDS_IN_CONTEXT determines whether a a given constraint
+ succeeds in a particular context. Prevn is true if the previous character
+ was a newline, currn is true if the lookahead character is a newline.
+ Prevl and currl similarly depend upon whether the previous and current
+ characters are word-constituent letters. */
+#define _MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
+ ((constraint) & (1 << (((prevn) ? 2 : 0) + ((currn) ? 1 : 0) + 4)))
+#define _MATCHES_LETTER_CONTEXT(constraint, prevl, currl) \
+ ((constraint) & (1 << (((prevl) ? 2 : 0) + ((currl) ? 1 : 0))))
+#define _SUCCEEDS_IN_CONTEXT(constraint, prevn, currn, prevl, currl) \
+ (_MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
+ && _MATCHES_LETTER_CONTEXT(constraint, prevl, currl))
+
+/* The following macros give information about what a constraint depends on. */
+#define _PREV_NEWLINE_DEPENDENT(constraint) \
+ (((constraint) & 0xc0) >> 2 != ((constraint) & 0x30))
+#define _PREV_LETTER_DEPENDENT(constraint) \
+ (((constraint) & 0x0c) >> 2 != ((constraint) & 0x03))
+
+/* Tokens that match the empty string subject to some constraint actually
+ work by applying that constraint to determine what may follow them,
+ taking into account what has gone before. The following values are
+ the constraints corresponding to the special tokens previously defined. */
+#define _NO_CONSTRAINT 0xff
+#define _BEGLINE_CONSTRAINT 0xcf
+#define _ENDLINE_CONSTRAINT 0xaf
+#define _BEGWORD_CONSTRAINT 0xf2
+#define _ENDWORD_CONSTRAINT 0xf4
+#define _LIMWORD_CONSTRAINT 0xf6
+#define _NOTLIMWORD_CONSTRAINT 0xf9
+
+/* States of the recognizer correspond to sets of positions in the parse
+ tree, together with the constraints under which they may be matched.
+ So a position is encoded as an index into the parse tree together with
+ a constraint. */
+typedef struct
+{
+ unsigned index; /* Index into the parse array. */
+ unsigned constraint; /* Constraint for matching this position. */
+} _position;
+
+/* Sets of positions are stored as arrays. */
+typedef struct
+{
+ _position *elems; /* Elements of this position set. */
+ int nelem; /* Number of elements in this set. */
+} _position_set;
+
+/* A state of the regexp consists of a set of positions, some flags,
+ and the token value of the lowest-numbered position of the state that
+ contains an _END token. */
+typedef struct
+{
+ int hash; /* Hash of the positions of this state. */
+ _position_set elems; /* Positions this state could match. */
+ char newline; /* True if previous state matched newline. */
+ char letter; /* True if previous state matched a letter. */
+ char backref; /* True if this state matches a \<digit>. */
+ unsigned char constraint; /* Constraint for this state to accept. */
+ int first_end; /* Token value of the first _END in elems. */
+} _dfa_state;
+
+/* If an r.e. is at most MUST_MAX characters long, we look for a string which
+ must appear in it; whatever's found is dropped into the struct reg. */
+
+#define MUST_MAX 50
+
+/* A compiled regular expression. */
+struct regexp
+{
+ /* Stuff built by the scanner. */
+ _charset *charsets; /* Array of character sets for _SET tokens. */
+ int cindex; /* Index for adding new charsets. */
+ int calloc; /* Number of charsets currently allocated. */
+
+ /* Stuff built by the parser. */
+ _token *tokens; /* Postfix parse array. */
+ int tindex; /* Index for adding new tokens. */
+ int talloc; /* Number of tokens currently allocated. */
+ int depth; /* Depth required of an evaluation stack
+ used for depth-first traversal of the
+ parse tree. */
+ int nleaves; /* Number of leaves on the parse tree. */
+ int nregexps; /* Count of parallel regexps being built
+ with regparse(). */
+
+ /* Stuff owned by the state builder. */
+ _dfa_state *states; /* States of the regexp. */
+ int sindex; /* Index for adding new states. */
+ int salloc; /* Number of states currently allocated. */
+
+ /* Stuff built by the structure analyzer. */
+ _position_set *follows; /* Array of follow sets, indexed by position
+ index. The follow of a position is the set
+ of positions containing characters that
+ could conceivably follow a character
+ matching the given position in a string
+ matching the regexp. Allocated to the
+ maximum possible position index. */
+ int searchflag; /* True if we are supposed to build a searching
+ as opposed to an exact matcher. A searching
+ matcher finds the first and shortest string
+ matching a regexp anywhere in the buffer,
+ whereas an exact matcher finds the longest
+ string matching, but anchored to the
+ beginning of the buffer. */
+
+ /* Stuff owned by the executor. */
+ int tralloc; /* Number of transition tables that have
+ slots so far. */
+ int trcount; /* Number of transition tables that have
+ actually been built. */
+ int **trans; /* Transition tables for states that can
+ never accept. If the transitions for a
+ state have not yet been computed, or the
+ state could possibly accept, its entry in
+ this table is NULL. */
+ int **realtrans; /* Trans always points to realtrans + 1; this
+ is so trans[-1] can contain NULL. */
+ int **fails; /* Transition tables after failing to accept
+ on a state that potentially could do so. */
+ int *success; /* Table of acceptance conditions used in
+ regexecute and computed in build_state. */
+ int *newlines; /* Transitions on newlines. The entry for a
+ newline in any transition table is always
+ -1 so we can count lines without wasting
+ too many cycles. The transition for a
+ newline is stored separately and handled
+ as a special case. Newline is also used
+ as a sentinel at the end of the buffer. */
+ char must[MUST_MAX];
+ int mustn;
+};
+
+/* Some macros for user access to regexp internals. */
+
+/* ACCEPTING returns true if s could possibly be an accepting state of r. */
+#define ACCEPTING(s, r) ((r).states[s].constraint)
+
+/* ACCEPTS_IN_CONTEXT returns true if the given state accepts in the
+ specified context. */
+#define ACCEPTS_IN_CONTEXT(prevn, currn, prevl, currl, state, reg) \
+ _SUCCEEDS_IN_CONTEXT((reg).states[state].constraint, \
+ prevn, currn, prevl, currl)
+
+/* FIRST_MATCHING_REGEXP returns the index number of the first of parallel
+ regexps that a given state could accept. Parallel regexps are numbered
+ starting at 1. */
+#define FIRST_MATCHING_REGEXP(state, reg) (-(reg).states[state].first_end)
+
+/* Entry points. */
+
+#ifdef __STDC__
+
+/* Regsyntax() takes two arguments; the first sets the syntax bits described
+ earlier in this file, and the second sets the case-folding flag. */
+extern void regsyntax(long, int);
+
+/* Compile the given string of the given length into the given struct regexp.
+ Final argument is a flag specifying whether to build a searching or an
+ exact matcher. */
+extern void regcompile(const char *, size_t, struct regexp *, int);
+
+/* Execute the given struct regexp on the buffer of characters. The
+ first char * points to the beginning, and the second points to the
+ first character after the end of the buffer, which must be a writable
+ place so a sentinel end-of-buffer marker can be stored there. The
+ second-to-last argument is a flag telling whether to allow newlines to
+ be part of a string matching the regexp. The next-to-last argument,
+ if non-NULL, points to a place to increment every time we see a
+ newline. The final argument, if non-NULL, points to a flag that will
+ be set if further examination by a backtracking matcher is needed in
+ order to verify backreferencing; otherwise the flag will be cleared.
+ Returns NULL if no match is found, or a pointer to the first
+ character after the first & shortest matching string in the buffer. */
+extern char *regexecute(struct regexp *, char *, char *, int, int *, int *);
+
+/* Free the storage held by the components of a struct regexp. */
+extern void reg_free(struct regexp *);
+
+/* Entry points for people who know what they're doing. */
+
+/* Initialize the components of a struct regexp. */
+extern void reginit(struct regexp *);
+
+/* Incrementally parse a string of given length into a struct regexp. */
+extern void regparse(const char *, size_t, struct regexp *);
+
+/* Analyze a parsed regexp; second argument tells whether to build a searching
+ or an exact matcher. */
+extern void reganalyze(struct regexp *, int);
+
+/* Compute, for each possible character, the transitions out of a given
+ state, storing them in an array of integers. */
+extern void regstate(int, struct regexp *, int []);
+
+/* Error handling. */
+
+/* Regerror() is called by the regexp routines whenever an error occurs. It
+ takes a single argument, a NUL-terminated string describing the error.
+ The default reg_error() prints the error message to stderr and exits.
+ The user can provide a different reg_free() if so desired. */
+extern void reg_error(const char *);
+
+#else /* ! __STDC__ */
+extern void regsyntax(), regcompile(), reg_free(), reginit(), regparse();
+extern void reganalyze(), regstate(), reg_error();
+extern char *regexecute();
+#endif
diff --git a/gnu/usr.bin/awk/eval.c b/gnu/usr.bin/awk/eval.c
new file mode 100644
index 000000000000..f640f3733ada
--- /dev/null
+++ b/gnu/usr.bin/awk/eval.c
@@ -0,0 +1,1225 @@
+/*
+ * eval.c - gawk parse tree interpreter
+ */
+
+/*
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ *
+ * This file is part of GAWK, the GNU implementation of the
+ * AWK Progamming Language.
+ *
+ * GAWK is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * GAWK is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with GAWK; see the file COPYING. If not, write to
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include "awk.h"
+
+extern double pow P((double x, double y));
+extern double modf P((double x, double *yp));
+extern double fmod P((double x, double y));
+
+static int eval_condition P((NODE *tree));
+static NODE *op_assign P((NODE *tree));
+static NODE *func_call P((NODE *name, NODE *arg_list));
+static NODE *match_op P((NODE *tree));
+
+NODE *_t; /* used as a temporary in macros */
+#ifdef MSDOS
+double _msc51bug; /* to get around a bug in MSC 5.1 */
+#endif
+NODE *ret_node;
+int OFSlen;
+int ORSlen;
+int OFMTidx;
+int CONVFMTidx;
+
+/* Macros and variables to save and restore function and loop bindings */
+/*
+ * the val variable allows return/continue/break-out-of-context to be
+ * caught and diagnosed
+ */
+#define PUSH_BINDING(stack, x, val) (memcpy ((char *)(stack), (char *)(x), sizeof (jmp_buf)), val++)
+#define RESTORE_BINDING(stack, x, val) (memcpy ((char *)(x), (char *)(stack), sizeof (jmp_buf)), val--)
+
+static jmp_buf loop_tag; /* always the current binding */
+static int loop_tag_valid = 0; /* nonzero when loop_tag valid */
+static int func_tag_valid = 0;
+static jmp_buf func_tag;
+extern int exiting, exit_val;
+
+/*
+ * This table is used by the regexp routines to do case independant
+ * matching. Basically, every ascii character maps to itself, except
+ * uppercase letters map to lower case ones. This table has 256
+ * entries, which may be overkill. Note also that if the system this
+ * is compiled on doesn't use 7-bit ascii, casetable[] should not be
+ * defined to the linker, so gawk should not load.
+ *
+ * Do NOT make this array static, it is used in several spots, not
+ * just in this file.
+ */
+#if 'a' == 97 /* it's ascii */
+char casetable[] = {
+ '\000', '\001', '\002', '\003', '\004', '\005', '\006', '\007',
+ '\010', '\011', '\012', '\013', '\014', '\015', '\016', '\017',
+ '\020', '\021', '\022', '\023', '\024', '\025', '\026', '\027',
+ '\030', '\031', '\032', '\033', '\034', '\035', '\036', '\037',
+ /* ' ' '!' '"' '#' '$' '%' '&' ''' */
+ '\040', '\041', '\042', '\043', '\044', '\045', '\046', '\047',
+ /* '(' ')' '*' '+' ',' '-' '.' '/' */
+ '\050', '\051', '\052', '\053', '\054', '\055', '\056', '\057',
+ /* '0' '1' '2' '3' '4' '5' '6' '7' */
+ '\060', '\061', '\062', '\063', '\064', '\065', '\066', '\067',
+ /* '8' '9' ':' ';' '<' '=' '>' '?' */
+ '\070', '\071', '\072', '\073', '\074', '\075', '\076', '\077',
+ /* '@' 'A' 'B' 'C' 'D' 'E' 'F' 'G' */
+ '\100', '\141', '\142', '\143', '\144', '\145', '\146', '\147',
+ /* 'H' 'I' 'J' 'K' 'L' 'M' 'N' 'O' */
+ '\150', '\151', '\152', '\153', '\154', '\155', '\156', '\157',
+ /* 'P' 'Q' 'R' 'S' 'T' 'U' 'V' 'W' */
+ '\160', '\161', '\162', '\163', '\164', '\165', '\166', '\167',
+ /* 'X' 'Y' 'Z' '[' '\' ']' '^' '_' */
+ '\170', '\171', '\172', '\133', '\134', '\135', '\136', '\137',
+ /* '`' 'a' 'b' 'c' 'd' 'e' 'f' 'g' */
+ '\140', '\141', '\142', '\143', '\144', '\145', '\146', '\147',
+ /* 'h' 'i' 'j' 'k' 'l' 'm' 'n' 'o' */
+ '\150', '\151', '\152', '\153', '\154', '\155', '\156', '\157',
+ /* 'p' 'q' 'r' 's' 't' 'u' 'v' 'w' */
+ '\160', '\161', '\162', '\163', '\164', '\165', '\166', '\167',
+ /* 'x' 'y' 'z' '{' '|' '}' '~' */
+ '\170', '\171', '\172', '\173', '\174', '\175', '\176', '\177',
+ '\200', '\201', '\202', '\203', '\204', '\205', '\206', '\207',
+ '\210', '\211', '\212', '\213', '\214', '\215', '\216', '\217',
+ '\220', '\221', '\222', '\223', '\224', '\225', '\226', '\227',
+ '\230', '\231', '\232', '\233', '\234', '\235', '\236', '\237',
+ '\240', '\241', '\242', '\243', '\244', '\245', '\246', '\247',
+ '\250', '\251', '\252', '\253', '\254', '\255', '\256', '\257',
+ '\260', '\261', '\262', '\263', '\264', '\265', '\266', '\267',
+ '\270', '\271', '\272', '\273', '\274', '\275', '\276', '\277',
+ '\300', '\301', '\302', '\303', '\304', '\305', '\306', '\307',
+ '\310', '\311', '\312', '\313', '\314', '\315', '\316', '\317',
+ '\320', '\321', '\322', '\323', '\324', '\325', '\326', '\327',
+ '\330', '\331', '\332', '\333', '\334', '\335', '\336', '\337',
+ '\340', '\341', '\342', '\343', '\344', '\345', '\346', '\347',
+ '\350', '\351', '\352', '\353', '\354', '\355', '\356', '\357',
+ '\360', '\361', '\362', '\363', '\364', '\365', '\366', '\367',
+ '\370', '\371', '\372', '\373', '\374', '\375', '\376', '\377',
+};
+#else
+#include "You lose. You will need a translation table for your character set."
+#endif
+
+/*
+ * Tree is a bunch of rules to run. Returns zero if it hit an exit()
+ * statement
+ */
+int
+interpret(tree)
+register NODE *volatile tree;
+{
+ jmp_buf volatile loop_tag_stack; /* shallow binding stack for loop_tag */
+ static jmp_buf rule_tag; /* tag the rule currently being run, for NEXT
+ * and EXIT statements. It is static because
+ * there are no nested rules */
+ register NODE *volatile t = NULL; /* temporary */
+ NODE **volatile lhs; /* lhs == Left Hand Side for assigns, etc */
+ NODE *volatile stable_tree;
+ int volatile traverse = 1; /* True => loop thru tree (Node_rule_list) */
+
+ if (tree == NULL)
+ return 1;
+ sourceline = tree->source_line;
+ source = tree->source_file;
+ switch (tree->type) {
+ case Node_rule_node:
+ traverse = 0; /* False => one for-loop iteration only */
+ /* FALL THROUGH */
+ case Node_rule_list:
+ for (t = tree; t != NULL; t = t->rnode) {
+ if (traverse)
+ tree = t->lnode;
+ sourceline = tree->source_line;
+ source = tree->source_file;
+ switch (setjmp(rule_tag)) {
+ case 0: /* normal non-jump */
+ /* test pattern, if any */
+ if (tree->lnode == NULL ||
+ eval_condition(tree->lnode))
+ (void) interpret(tree->rnode);
+ break;
+ case TAG_CONTINUE: /* NEXT statement */
+ return 1;
+ case TAG_BREAK:
+ return 0;
+ default:
+ cant_happen();
+ }
+ if (!traverse) /* case Node_rule_node */
+ break; /* don't loop */
+ }
+ break;
+
+ case Node_statement_list:
+ for (t = tree; t != NULL; t = t->rnode)
+ (void) interpret(t->lnode);
+ break;
+
+ case Node_K_if:
+ if (eval_condition(tree->lnode)) {
+ (void) interpret(tree->rnode->lnode);
+ } else {
+ (void) interpret(tree->rnode->rnode);
+ }
+ break;
+
+ case Node_K_while:
+ PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+
+ stable_tree = tree;
+ while (eval_condition(stable_tree->lnode)) {
+ switch (setjmp(loop_tag)) {
+ case 0: /* normal non-jump */
+ (void) interpret(stable_tree->rnode);
+ break;
+ case TAG_CONTINUE: /* continue statement */
+ break;
+ case TAG_BREAK: /* break statement */
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ return 1;
+ default:
+ cant_happen();
+ }
+ }
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ break;
+
+ case Node_K_do:
+ PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ stable_tree = tree;
+ do {
+ switch (setjmp(loop_tag)) {
+ case 0: /* normal non-jump */
+ (void) interpret(stable_tree->rnode);
+ break;
+ case TAG_CONTINUE: /* continue statement */
+ break;
+ case TAG_BREAK: /* break statement */
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ return 1;
+ default:
+ cant_happen();
+ }
+ } while (eval_condition(stable_tree->lnode));
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ break;
+
+ case Node_K_for:
+ PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ (void) interpret(tree->forloop->init);
+ stable_tree = tree;
+ while (eval_condition(stable_tree->forloop->cond)) {
+ switch (setjmp(loop_tag)) {
+ case 0: /* normal non-jump */
+ (void) interpret(stable_tree->lnode);
+ /* fall through */
+ case TAG_CONTINUE: /* continue statement */
+ (void) interpret(stable_tree->forloop->incr);
+ break;
+ case TAG_BREAK: /* break statement */
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ return 1;
+ default:
+ cant_happen();
+ }
+ }
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ break;
+
+ case Node_K_arrayfor:
+ {
+ volatile struct search l; /* For array_for */
+ Func_ptr after_assign = NULL;
+
+#define hakvar forloop->init
+#define arrvar forloop->incr
+ PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ lhs = get_lhs(tree->hakvar, &after_assign);
+ t = tree->arrvar;
+ if (t->type == Node_param_list)
+ t = stack_ptr[t->param_cnt];
+ stable_tree = tree;
+ for (assoc_scan(t, (struct search *)&l);
+ l.retval;
+ assoc_next((struct search *)&l)) {
+ unref(*((NODE **) lhs));
+ *lhs = dupnode(l.retval);
+ if (after_assign)
+ (*after_assign)();
+ switch (setjmp(loop_tag)) {
+ case 0:
+ (void) interpret(stable_tree->lnode);
+ case TAG_CONTINUE:
+ break;
+
+ case TAG_BREAK:
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ return 1;
+ default:
+ cant_happen();
+ }
+ }
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ break;
+ }
+
+ case Node_K_break:
+ if (loop_tag_valid == 0)
+ fatal("unexpected break");
+ longjmp(loop_tag, TAG_BREAK);
+ break;
+
+ case Node_K_continue:
+ if (loop_tag_valid == 0) {
+ /*
+ * AT&T nawk treats continue outside of loops like
+ * next. Allow it if not posix, and complain if
+ * lint.
+ */
+ static int warned = 0;
+
+ if (do_lint && ! warned) {
+ warning("use of `continue' outside of loop is not portable");
+ warned = 1;
+ }
+ if (do_posix)
+ fatal("use of `continue' outside of loop is not allowed");
+ longjmp(rule_tag, TAG_CONTINUE);
+ } else
+ longjmp(loop_tag, TAG_CONTINUE);
+ break;
+
+ case Node_K_print:
+ do_print(tree);
+ break;
+
+ case Node_K_printf:
+ do_printf(tree);
+ break;
+
+ case Node_K_delete:
+ do_delete(tree->lnode, tree->rnode);
+ break;
+
+ case Node_K_next:
+ longjmp(rule_tag, TAG_CONTINUE);
+ break;
+
+ case Node_K_nextfile:
+ do_nextfile();
+ break;
+
+ case Node_K_exit:
+ /*
+ * In A,K,&W, p. 49, it says that an exit statement "...
+ * causes the program to behave as if the end of input had
+ * occurred; no more input is read, and the END actions, if
+ * any are executed." This implies that the rest of the rules
+ * are not done. So we immediately break out of the main loop.
+ */
+ exiting = 1;
+ if (tree) {
+ t = tree_eval(tree->lnode);
+ exit_val = (int) force_number(t);
+ }
+ free_temp(t);
+ longjmp(rule_tag, TAG_BREAK);
+ break;
+
+ case Node_K_return:
+ t = tree_eval(tree->lnode);
+ ret_node = dupnode(t);
+ free_temp(t);
+ longjmp(func_tag, TAG_RETURN);
+ break;
+
+ default:
+ /*
+ * Appears to be an expression statement. Throw away the
+ * value.
+ */
+ if (do_lint && tree->type == Node_var)
+ warning("statement has no effect");
+ t = tree_eval(tree);
+ free_temp(t);
+ break;
+ }
+ return 1;
+}
+
+/* evaluate a subtree */
+
+NODE *
+r_tree_eval(tree)
+register NODE *tree;
+{
+ register NODE *r, *t1, *t2; /* return value & temporary subtrees */
+ register NODE **lhs;
+ register int di;
+ AWKNUM x, x1, x2;
+ long lx;
+#ifdef CRAY
+ long lx2;
+#endif
+
+#ifdef DEBUG
+ if (tree == NULL)
+ return Nnull_string;
+ if (tree->type == Node_val) {
+ if (tree->stref <= 0) cant_happen();
+ return tree;
+ }
+ if (tree->type == Node_var) {
+ if (tree->var_value->stref <= 0) cant_happen();
+ return tree->var_value;
+ }
+ if (tree->type == Node_param_list) {
+ if (stack_ptr[tree->param_cnt] == NULL)
+ return Nnull_string;
+ else
+ return stack_ptr[tree->param_cnt]->var_value;
+ }
+#endif
+ switch (tree->type) {
+ case Node_and:
+ return tmp_number((AWKNUM) (eval_condition(tree->lnode)
+ && eval_condition(tree->rnode)));
+
+ case Node_or:
+ return tmp_number((AWKNUM) (eval_condition(tree->lnode)
+ || eval_condition(tree->rnode)));
+
+ case Node_not:
+ return tmp_number((AWKNUM) ! eval_condition(tree->lnode));
+
+ /* Builtins */
+ case Node_builtin:
+ return ((*tree->proc) (tree->subnode));
+
+ case Node_K_getline:
+ return (do_getline(tree));
+
+ case Node_in_array:
+ return tmp_number((AWKNUM) in_array(tree->lnode, tree->rnode));
+
+ case Node_func_call:
+ return func_call(tree->rnode, tree->lnode);
+
+ /* unary operations */
+ case Node_NR:
+ case Node_FNR:
+ case Node_NF:
+ case Node_FIELDWIDTHS:
+ case Node_FS:
+ case Node_RS:
+ case Node_field_spec:
+ case Node_subscript:
+ case Node_IGNORECASE:
+ case Node_OFS:
+ case Node_ORS:
+ case Node_OFMT:
+ case Node_CONVFMT:
+ lhs = get_lhs(tree, (Func_ptr *)0);
+ return *lhs;
+
+ case Node_var_array:
+ fatal("attempt to use an array in a scalar context");
+
+ case Node_unary_minus:
+ t1 = tree_eval(tree->subnode);
+ x = -force_number(t1);
+ free_temp(t1);
+ return tmp_number(x);
+
+ case Node_cond_exp:
+ if (eval_condition(tree->lnode))
+ return tree_eval(tree->rnode->lnode);
+ return tree_eval(tree->rnode->rnode);
+
+ case Node_match:
+ case Node_nomatch:
+ case Node_regex:
+ return match_op(tree);
+
+ case Node_func:
+ fatal("function `%s' called with space between name and (,\n%s",
+ tree->lnode->param,
+ "or used in other expression context");
+
+ /* assignments */
+ case Node_assign:
+ {
+ Func_ptr after_assign = NULL;
+
+ r = tree_eval(tree->rnode);
+ lhs = get_lhs(tree->lnode, &after_assign);
+ if (r != *lhs) {
+ NODE *save;
+
+ save = *lhs;
+ *lhs = dupnode(r);
+ unref(save);
+ }
+ free_temp(r);
+ if (after_assign)
+ (*after_assign)();
+ return *lhs;
+ }
+
+ case Node_concat:
+ {
+#define STACKSIZE 10
+ NODE *stack[STACKSIZE];
+ register NODE **sp;
+ register int len;
+ char *str;
+ register char *dest;
+
+ sp = stack;
+ len = 0;
+ while (tree->type == Node_concat) {
+ *sp = force_string(tree_eval(tree->lnode));
+ tree = tree->rnode;
+ len += (*sp)->stlen;
+ if (++sp == &stack[STACKSIZE-2]) /* one more and NULL */
+ break;
+ }
+ *sp = force_string(tree_eval(tree));
+ len += (*sp)->stlen;
+ *++sp = NULL;
+ emalloc(str, char *, len+2, "tree_eval");
+ dest = str;
+ sp = stack;
+ while (*sp) {
+ memcpy(dest, (*sp)->stptr, (*sp)->stlen);
+ dest += (*sp)->stlen;
+ free_temp(*sp);
+ sp++;
+ }
+ r = make_str_node(str, len, ALREADY_MALLOCED);
+ r->flags |= TEMP;
+ }
+ return r;
+
+ /* other assignment types are easier because they are numeric */
+ case Node_preincrement:
+ case Node_predecrement:
+ case Node_postincrement:
+ case Node_postdecrement:
+ case Node_assign_exp:
+ case Node_assign_times:
+ case Node_assign_quotient:
+ case Node_assign_mod:
+ case Node_assign_plus:
+ case Node_assign_minus:
+ return op_assign(tree);
+ default:
+ break; /* handled below */
+ }
+
+ /* evaluate subtrees in order to do binary operation, then keep going */
+ t1 = tree_eval(tree->lnode);
+ t2 = tree_eval(tree->rnode);
+
+ switch (tree->type) {
+ case Node_geq:
+ case Node_leq:
+ case Node_greater:
+ case Node_less:
+ case Node_notequal:
+ case Node_equal:
+ di = cmp_nodes(t1, t2);
+ free_temp(t1);
+ free_temp(t2);
+ switch (tree->type) {
+ case Node_equal:
+ return tmp_number((AWKNUM) (di == 0));
+ case Node_notequal:
+ return tmp_number((AWKNUM) (di != 0));
+ case Node_less:
+ return tmp_number((AWKNUM) (di < 0));
+ case Node_greater:
+ return tmp_number((AWKNUM) (di > 0));
+ case Node_leq:
+ return tmp_number((AWKNUM) (di <= 0));
+ case Node_geq:
+ return tmp_number((AWKNUM) (di >= 0));
+ default:
+ cant_happen();
+ }
+ break;
+ default:
+ break; /* handled below */
+ }
+
+ x1 = force_number(t1);
+ free_temp(t1);
+ x2 = force_number(t2);
+ free_temp(t2);
+ switch (tree->type) {
+ case Node_exp:
+ if ((lx = x2) == x2 && lx >= 0) { /* integer exponent */
+ if (lx == 0)
+ x = 1;
+ else if (lx == 1)
+ x = x1;
+ else {
+ /* doing it this way should be more precise */
+ for (x = x1; --lx; )
+ x *= x1;
+ }
+ } else
+ x = pow((double) x1, (double) x2);
+ return tmp_number(x);
+
+ case Node_times:
+ return tmp_number(x1 * x2);
+
+ case Node_quotient:
+ if (x2 == 0)
+ fatal("division by zero attempted");
+#ifdef _CRAY
+ /*
+ * special case for integer division, put in for Cray
+ */
+ lx2 = x2;
+ if (lx2 == 0)
+ return tmp_number(x1 / x2);
+ lx = (long) x1 / lx2;
+ if (lx * x2 == x1)
+ return tmp_number((AWKNUM) lx);
+ else
+#endif
+ return tmp_number(x1 / x2);
+
+ case Node_mod:
+ if (x2 == 0)
+ fatal("division by zero attempted in mod");
+#ifndef FMOD_MISSING
+ return tmp_number(fmod (x1, x2));
+#else
+ (void) modf(x1 / x2, &x);
+ return tmp_number(x1 - x * x2);
+#endif
+
+ case Node_plus:
+ return tmp_number(x1 + x2);
+
+ case Node_minus:
+ return tmp_number(x1 - x2);
+
+ case Node_var_array:
+ fatal("attempt to use an array in a scalar context");
+
+ default:
+ fatal("illegal type (%d) in tree_eval", tree->type);
+ }
+ return 0;
+}
+
+/* Is TREE true or false? Returns 0==false, non-zero==true */
+static int
+eval_condition(tree)
+register NODE *tree;
+{
+ register NODE *t1;
+ register int ret;
+
+ if (tree == NULL) /* Null trees are the easiest kinds */
+ return 1;
+ if (tree->type == Node_line_range) {
+ /*
+ * Node_line_range is kind of like Node_match, EXCEPT: the
+ * lnode field (more properly, the condpair field) is a node
+ * of a Node_cond_pair; whether we evaluate the lnode of that
+ * node or the rnode depends on the triggered word. More
+ * precisely: if we are not yet triggered, we tree_eval the
+ * lnode; if that returns true, we set the triggered word.
+ * If we are triggered (not ELSE IF, note), we tree_eval the
+ * rnode, clear triggered if it succeeds, and perform our
+ * action (regardless of success or failure). We want to be
+ * able to begin and end on a single input record, so this
+ * isn't an ELSE IF, as noted above.
+ */
+ if (!tree->triggered)
+ if (!eval_condition(tree->condpair->lnode))
+ return 0;
+ else
+ tree->triggered = 1;
+ /* Else we are triggered */
+ if (eval_condition(tree->condpair->rnode))
+ tree->triggered = 0;
+ return 1;
+ }
+
+ /*
+ * Could just be J.random expression. in which case, null and 0 are
+ * false, anything else is true
+ */
+
+ t1 = tree_eval(tree);
+ if (t1->flags & MAYBE_NUM)
+ (void) force_number(t1);
+ if (t1->flags & NUMBER)
+ ret = t1->numbr != 0.0;
+ else
+ ret = t1->stlen != 0;
+ free_temp(t1);
+ return ret;
+}
+
+/*
+ * compare two nodes, returning negative, 0, positive
+ */
+int
+cmp_nodes(t1, t2)
+register NODE *t1, *t2;
+{
+ register int ret;
+ register int len1, len2;
+
+ if (t1 == t2)
+ return 0;
+ if (t1->flags & MAYBE_NUM)
+ (void) force_number(t1);
+ if (t2->flags & MAYBE_NUM)
+ (void) force_number(t2);
+ if ((t1->flags & NUMBER) && (t2->flags & NUMBER)) {
+ if (t1->numbr == t2->numbr) return 0;
+ else if (t1->numbr - t2->numbr < 0) return -1;
+ else return 1;
+ }
+ (void) force_string(t1);
+ (void) force_string(t2);
+ len1 = t1->stlen;
+ len2 = t2->stlen;
+ if (len1 == 0 || len2 == 0)
+ return len1 - len2;
+ ret = memcmp(t1->stptr, t2->stptr, len1 <= len2 ? len1 : len2);
+ return ret == 0 ? len1-len2 : ret;
+}
+
+static NODE *
+op_assign(tree)
+register NODE *tree;
+{
+ AWKNUM rval, lval;
+ NODE **lhs;
+ AWKNUM t1, t2;
+ long ltemp;
+ NODE *tmp;
+ Func_ptr after_assign = NULL;
+
+ lhs = get_lhs(tree->lnode, &after_assign);
+ lval = force_number(*lhs);
+
+ /*
+ * Can't unref *lhs until we know the type; doing so
+ * too early breaks x += x sorts of things.
+ */
+ switch(tree->type) {
+ case Node_preincrement:
+ case Node_predecrement:
+ unref(*lhs);
+ *lhs = make_number(lval +
+ (tree->type == Node_preincrement ? 1.0 : -1.0));
+ if (after_assign)
+ (*after_assign)();
+ return *lhs;
+
+ case Node_postincrement:
+ case Node_postdecrement:
+ unref(*lhs);
+ *lhs = make_number(lval +
+ (tree->type == Node_postincrement ? 1.0 : -1.0));
+ if (after_assign)
+ (*after_assign)();
+ return tmp_number(lval);
+ default:
+ break; /* handled below */
+ }
+
+ tmp = tree_eval(tree->rnode);
+ rval = force_number(tmp);
+ free_temp(tmp);
+ unref(*lhs);
+ switch(tree->type) {
+ case Node_assign_exp:
+ if ((ltemp = rval) == rval) { /* integer exponent */
+ if (ltemp == 0)
+ *lhs = make_number((AWKNUM) 1);
+ else if (ltemp == 1)
+ *lhs = make_number(lval);
+ else {
+ /* doing it this way should be more precise */
+ for (t1 = t2 = lval; --ltemp; )
+ t1 *= t2;
+ *lhs = make_number(t1);
+ }
+ } else
+ *lhs = make_number((AWKNUM) pow((double) lval, (double) rval));
+ break;
+
+ case Node_assign_times:
+ *lhs = make_number(lval * rval);
+ break;
+
+ case Node_assign_quotient:
+ if (rval == (AWKNUM) 0)
+ fatal("division by zero attempted in /=");
+#ifdef _CRAY
+ /*
+ * special case for integer division, put in for Cray
+ */
+ ltemp = rval;
+ if (ltemp == 0) {
+ *lhs = make_number(lval / rval);
+ break;
+ }
+ ltemp = (long) lval / ltemp;
+ if (ltemp * lval == rval)
+ *lhs = make_number((AWKNUM) ltemp);
+ else
+#endif
+ *lhs = make_number(lval / rval);
+ break;
+
+ case Node_assign_mod:
+ if (rval == (AWKNUM) 0)
+ fatal("division by zero attempted in %=");
+#ifndef FMOD_MISSING
+ *lhs = make_number(fmod(lval, rval));
+#else
+ (void) modf(lval / rval, &t1);
+ t2 = lval - rval * t1;
+ *lhs = make_number(t2);
+#endif
+ break;
+
+ case Node_assign_plus:
+ *lhs = make_number(lval + rval);
+ break;
+
+ case Node_assign_minus:
+ *lhs = make_number(lval - rval);
+ break;
+ default:
+ cant_happen();
+ }
+ if (after_assign)
+ (*after_assign)();
+ return *lhs;
+}
+
+NODE **stack_ptr;
+
+static NODE *
+func_call(name, arg_list)
+NODE *name; /* name is a Node_val giving function name */
+NODE *arg_list; /* Node_expression_list of calling args. */
+{
+ register NODE *arg, *argp, *r;
+ NODE *n, *f;
+ jmp_buf volatile func_tag_stack;
+ jmp_buf volatile loop_tag_stack;
+ int volatile save_loop_tag_valid = 0;
+ NODE **volatile save_stack, *save_ret_node;
+ NODE **volatile local_stack = NULL, **sp;
+ int count;
+ extern NODE *ret_node;
+
+ /*
+ * retrieve function definition node
+ */
+ f = lookup(name->stptr);
+ if (!f || f->type != Node_func)
+ fatal("function `%s' not defined", name->stptr);
+#ifdef FUNC_TRACE
+ fprintf(stderr, "function %s called\n", name->stptr);
+#endif
+ count = f->lnode->param_cnt;
+ if (count)
+ emalloc(local_stack, NODE **, count*sizeof(NODE *), "func_call");
+ sp = local_stack;
+
+ /*
+ * for each calling arg. add NODE * on stack
+ */
+ for (argp = arg_list; count && argp != NULL; argp = argp->rnode) {
+ arg = argp->lnode;
+ getnode(r);
+ r->type = Node_var;
+ /*
+ * call by reference for arrays; see below also
+ */
+ if (arg->type == Node_param_list)
+ arg = stack_ptr[arg->param_cnt];
+ if (arg->type == Node_var_array)
+ *r = *arg;
+ else {
+ n = tree_eval(arg);
+ r->lnode = dupnode(n);
+ r->rnode = (NODE *) NULL;
+ free_temp(n);
+ }
+ *sp++ = r;
+ count--;
+ }
+ if (argp != NULL) /* left over calling args. */
+ warning(
+ "function `%s' called with more arguments than declared",
+ name->stptr);
+ /*
+ * add remaining params. on stack with null value
+ */
+ while (count-- > 0) {
+ getnode(r);
+ r->type = Node_var;
+ r->lnode = Nnull_string;
+ r->rnode = (NODE *) NULL;
+ *sp++ = r;
+ }
+
+ /*
+ * Execute function body, saving context, as a return statement
+ * will longjmp back here.
+ *
+ * Have to save and restore the loop_tag stuff so that a return
+ * inside a loop in a function body doesn't scrog any loops going
+ * on in the main program. We save the necessary info in variables
+ * local to this function so that function nesting works OK.
+ * We also only bother to save the loop stuff if we're in a loop
+ * when the function is called.
+ */
+ if (loop_tag_valid) {
+ int junk = 0;
+
+ save_loop_tag_valid = (volatile int) loop_tag_valid;
+ PUSH_BINDING(loop_tag_stack, loop_tag, junk);
+ loop_tag_valid = 0;
+ }
+ save_stack = stack_ptr;
+ stack_ptr = local_stack;
+ PUSH_BINDING(func_tag_stack, func_tag, func_tag_valid);
+ save_ret_node = ret_node;
+ ret_node = Nnull_string; /* default return value */
+ if (setjmp(func_tag) == 0)
+ (void) interpret(f->rnode);
+
+ r = ret_node;
+ ret_node = (NODE *) save_ret_node;
+ RESTORE_BINDING(func_tag_stack, func_tag, func_tag_valid);
+ stack_ptr = (NODE **) save_stack;
+
+ /*
+ * here, we pop each parameter and check whether
+ * it was an array. If so, and if the arg. passed in was
+ * a simple variable, then the value should be copied back.
+ * This achieves "call-by-reference" for arrays.
+ */
+ sp = local_stack;
+ count = f->lnode->param_cnt;
+ for (argp = arg_list; count > 0 && argp != NULL; argp = argp->rnode) {
+ arg = argp->lnode;
+ if (arg->type == Node_param_list)
+ arg = stack_ptr[arg->param_cnt];
+ n = *sp++;
+ if (arg->type == Node_var && n->type == Node_var_array) {
+ /* should we free arg->var_value ? */
+ arg->var_array = n->var_array;
+ arg->type = Node_var_array;
+ }
+ unref(n->lnode);
+ freenode(n);
+ count--;
+ }
+ while (count-- > 0) {
+ n = *sp++;
+ /* if n is an (local) array, all the elements should be freed */
+ if (n->type == Node_var_array) {
+ assoc_clear(n);
+ free(n->var_array);
+ }
+ unref(n->lnode);
+ freenode(n);
+ }
+ if (local_stack)
+ free((char *) local_stack);
+
+ /* Restore the loop_tag stuff if necessary. */
+ if (save_loop_tag_valid) {
+ int junk = 0;
+
+ loop_tag_valid = (int) save_loop_tag_valid;
+ RESTORE_BINDING(loop_tag_stack, loop_tag, junk);
+ }
+
+ if (!(r->flags & PERM))
+ r->flags |= TEMP;
+ return r;
+}
+
+/*
+ * This returns a POINTER to a node pointer. get_lhs(ptr) is the current
+ * value of the var, or where to store the var's new value
+ */
+
+NODE **
+get_lhs(ptr, assign)
+register NODE *ptr;
+Func_ptr *assign;
+{
+ register NODE **aptr = NULL;
+ register NODE *n;
+
+ switch (ptr->type) {
+ case Node_var_array:
+ fatal("attempt to use an array in a scalar context");
+ case Node_var:
+ aptr = &(ptr->var_value);
+#ifdef DEBUG
+ if (ptr->var_value->stref <= 0)
+ cant_happen();
+#endif
+ break;
+
+ case Node_FIELDWIDTHS:
+ aptr = &(FIELDWIDTHS_node->var_value);
+ if (assign)
+ *assign = set_FIELDWIDTHS;
+ break;
+
+ case Node_RS:
+ aptr = &(RS_node->var_value);
+ if (assign)
+ *assign = set_RS;
+ break;
+
+ case Node_FS:
+ aptr = &(FS_node->var_value);
+ if (assign)
+ *assign = set_FS;
+ break;
+
+ case Node_FNR:
+ unref(FNR_node->var_value);
+ FNR_node->var_value = make_number((AWKNUM) FNR);
+ aptr = &(FNR_node->var_value);
+ if (assign)
+ *assign = set_FNR;
+ break;
+
+ case Node_NR:
+ unref(NR_node->var_value);
+ NR_node->var_value = make_number((AWKNUM) NR);
+ aptr = &(NR_node->var_value);
+ if (assign)
+ *assign = set_NR;
+ break;
+
+ case Node_NF:
+ if (NF == -1)
+ (void) get_field(HUGE-1, assign); /* parse record */
+ unref(NF_node->var_value);
+ NF_node->var_value = make_number((AWKNUM) NF);
+ aptr = &(NF_node->var_value);
+ if (assign)
+ *assign = set_NF;
+ break;
+
+ case Node_IGNORECASE:
+ unref(IGNORECASE_node->var_value);
+ IGNORECASE_node->var_value = make_number((AWKNUM) IGNORECASE);
+ aptr = &(IGNORECASE_node->var_value);
+ if (assign)
+ *assign = set_IGNORECASE;
+ break;
+
+ case Node_OFMT:
+ aptr = &(OFMT_node->var_value);
+ if (assign)
+ *assign = set_OFMT;
+ break;
+
+ case Node_CONVFMT:
+ aptr = &(CONVFMT_node->var_value);
+ if (assign)
+ *assign = set_CONVFMT;
+ break;
+
+ case Node_ORS:
+ aptr = &(ORS_node->var_value);
+ if (assign)
+ *assign = set_ORS;
+ break;
+
+ case Node_OFS:
+ aptr = &(OFS_node->var_value);
+ if (assign)
+ *assign = set_OFS;
+ break;
+
+ case Node_param_list:
+ aptr = &(stack_ptr[ptr->param_cnt]->var_value);
+ break;
+
+ case Node_field_spec:
+ {
+ int field_num;
+
+ n = tree_eval(ptr->lnode);
+ field_num = (int) force_number(n);
+ free_temp(n);
+ if (field_num < 0)
+ fatal("attempt to access field %d", field_num);
+ if (field_num == 0 && field0_valid) { /* short circuit */
+ aptr = &fields_arr[0];
+ if (assign)
+ *assign = reset_record;
+ break;
+ }
+ aptr = get_field(field_num, assign);
+ break;
+ }
+ case Node_subscript:
+ n = ptr->lnode;
+ if (n->type == Node_param_list)
+ n = stack_ptr[n->param_cnt];
+ aptr = assoc_lookup(n, concat_exp(ptr->rnode));
+ break;
+
+ case Node_func:
+ fatal ("`%s' is a function, assignment is not allowed",
+ ptr->lnode->param);
+ default:
+ cant_happen();
+ }
+ return aptr;
+}
+
+static NODE *
+match_op(tree)
+register NODE *tree;
+{
+ register NODE *t1;
+ register Regexp *rp;
+ int i;
+ int match = 1;
+
+ if (tree->type == Node_nomatch)
+ match = 0;
+ if (tree->type == Node_regex)
+ t1 = *get_field(0, (Func_ptr *) 0);
+ else {
+ t1 = force_string(tree_eval(tree->lnode));
+ tree = tree->rnode;
+ }
+ rp = re_update(tree);
+ i = research(rp, t1->stptr, 0, t1->stlen, 0);
+ i = (i == -1) ^ (match == 1);
+ free_temp(t1);
+ return tmp_number((AWKNUM) i);
+}
+
+void
+set_IGNORECASE()
+{
+ static int warned = 0;
+
+ if ((do_lint || do_unix) && ! warned) {
+ warned = 1;
+ warning("IGNORECASE not supported in compatibility mode");
+ }
+ IGNORECASE = (force_number(IGNORECASE_node->var_value) != 0.0);
+ set_FS();
+}
+
+void
+set_OFS()
+{
+ OFS = force_string(OFS_node->var_value)->stptr;
+ OFSlen = OFS_node->var_value->stlen;
+ OFS[OFSlen] = '\0';
+}
+
+void
+set_ORS()
+{
+ ORS = force_string(ORS_node->var_value)->stptr;
+ ORSlen = ORS_node->var_value->stlen;
+ ORS[ORSlen] = '\0';
+}
+
+static NODE **fmt_list = NULL;
+static int fmt_ok P((NODE *n));
+static int fmt_index P((NODE *n));
+
+static int
+fmt_ok(n)
+NODE *n;
+{
+ /* to be done later */
+ return 1;
+}
+
+static int
+fmt_index(n)
+NODE *n;
+{
+ register int ix = 0;
+ static int fmt_num = 4;
+ static int fmt_hiwater = 0;
+
+ if (fmt_list == NULL)
+ emalloc(fmt_list, NODE **, fmt_num*sizeof(*fmt_list), "fmt_index");
+ (void) force_string(n);
+ while (ix < fmt_hiwater) {
+ if (cmp_nodes(fmt_list[ix], n) == 0)
+ return ix;
+ ix++;
+ }
+ /* not found */
+ n->stptr[n->stlen] = '\0';
+ if (!fmt_ok(n))
+ warning("bad FMT specification");
+ if (fmt_hiwater >= fmt_num) {
+ fmt_num *= 2;
+ emalloc(fmt_list, NODE **, fmt_num, "fmt_index");
+ }
+ fmt_list[fmt_hiwater] = dupnode(n);
+ return fmt_hiwater++;
+}
+
+void
+set_OFMT()
+{
+ OFMTidx = fmt_index(OFMT_node->var_value);
+ OFMT = fmt_list[OFMTidx]->stptr;
+}
+
+void
+set_CONVFMT()
+{
+ CONVFMTidx = fmt_index(CONVFMT_node->var_value);
+ CONVFMT = fmt_list[CONVFMTidx]->stptr;
+}
diff --git a/gnu/usr.bin/awk/field.c b/gnu/usr.bin/awk/field.c
new file mode 100644
index 000000000000..d8f9a5455631
--- /dev/null
+++ b/gnu/usr.bin/awk/field.c
@@ -0,0 +1,645 @@
+/*
+ * field.c - routines for dealing with fields and record parsing
+ */
+
+/*
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ *
+ * This file is part of GAWK, the GNU implementation of the
+ * AWK Progamming Language.
+ *
+ * GAWK is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * GAWK is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with GAWK; see the file COPYING. If not, write to
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include "awk.h"
+
+static int (*parse_field) P((int, char **, int, NODE *,
+ Regexp *, void (*)(), NODE *));
+static void rebuild_record P((void));
+static int re_parse_field P((int, char **, int, NODE *,
+ Regexp *, void (*)(), NODE *));
+static int def_parse_field P((int, char **, int, NODE *,
+ Regexp *, void (*)(), NODE *));
+static int sc_parse_field P((int, char **, int, NODE *,
+ Regexp *, void (*)(), NODE *));
+static int fw_parse_field P((int, char **, int, NODE *,
+ Regexp *, void (*)(), NODE *));
+static void set_element P((int, char *, int, NODE *));
+static void grow_fields_arr P((int num));
+static void set_field P((int num, char *str, int len, NODE *dummy));
+
+
+static Regexp *FS_regexp = NULL;
+static char *parse_extent; /* marks where to restart parse of record */
+static int parse_high_water=0; /* field number that we have parsed so far */
+static int nf_high_water = 0; /* size of fields_arr */
+static int resave_fs;
+static NODE *save_FS; /* save current value of FS when line is read,
+ * to be used in deferred parsing
+ */
+
+NODE **fields_arr; /* array of pointers to the field nodes */
+int field0_valid; /* $(>0) has not been changed yet */
+int default_FS;
+static NODE **nodes; /* permanent repository of field nodes */
+static int *FIELDWIDTHS = NULL;
+
+void
+init_fields()
+{
+ NODE *n;
+
+ emalloc(fields_arr, NODE **, sizeof(NODE *), "init_fields");
+ emalloc(nodes, NODE **, sizeof(NODE *), "init_fields");
+ getnode(n);
+ *n = *Nnull_string;
+ fields_arr[0] = nodes[0] = n;
+ parse_extent = fields_arr[0]->stptr;
+ save_FS = dupnode(FS_node->var_value);
+ field0_valid = 1;
+}
+
+
+static void
+grow_fields_arr(num)
+int num;
+{
+ register int t;
+ register NODE *n;
+
+ erealloc(fields_arr, NODE **, (num + 1) * sizeof(NODE *), "set_field");
+ erealloc(nodes, NODE **, (num+1) * sizeof(NODE *), "set_field");
+ for (t = nf_high_water+1; t <= num; t++) {
+ getnode(n);
+ *n = *Nnull_string;
+ fields_arr[t] = nodes[t] = n;
+ }
+ nf_high_water = num;
+}
+
+/*ARGSUSED*/
+static void
+set_field(num, str, len, dummy)
+int num;
+char *str;
+int len;
+NODE *dummy; /* not used -- just to make interface same as set_element */
+{
+ register NODE *n;
+
+ if (num > nf_high_water)
+ grow_fields_arr(num);
+ n = nodes[num];
+ n->stptr = str;
+ n->stlen = len;
+ n->flags = (PERM|STR|STRING|MAYBE_NUM);
+ fields_arr[num] = n;
+}
+
+/* Someone assigned a value to $(something). Fix up $0 to be right */
+static void
+rebuild_record()
+{
+ register int tlen;
+ register NODE *tmp;
+ NODE *ofs;
+ char *ops;
+ register char *cops;
+ register NODE **ptr;
+ register int ofslen;
+
+ tlen = 0;
+ ofs = force_string(OFS_node->var_value);
+ ofslen = ofs->stlen;
+ ptr = &fields_arr[NF];
+ while (ptr > &fields_arr[0]) {
+ tmp = force_string(*ptr);
+ tlen += tmp->stlen;
+ ptr--;
+ }
+ tlen += (NF - 1) * ofslen;
+ if (tlen < 0)
+ tlen = 0;
+ emalloc(ops, char *, tlen + 2, "fix_fields");
+ cops = ops;
+ ops[0] = '\0';
+ for (ptr = &fields_arr[1]; ptr <= &fields_arr[NF]; ptr++) {
+ tmp = *ptr;
+ if (tmp->stlen == 1)
+ *cops++ = tmp->stptr[0];
+ else if (tmp->stlen != 0) {
+ memcpy(cops, tmp->stptr, tmp->stlen);
+ cops += tmp->stlen;
+ }
+ if (ptr != &fields_arr[NF]) {
+ if (ofslen == 1)
+ *cops++ = ofs->stptr[0];
+ else if (ofslen != 0) {
+ memcpy(cops, ofs->stptr, ofslen);
+ cops += ofslen;
+ }
+ }
+ }
+ tmp = make_str_node(ops, tlen, ALREADY_MALLOCED);
+ unref(fields_arr[0]);
+ fields_arr[0] = tmp;
+ field0_valid = 1;
+}
+
+/*
+ * setup $0, but defer parsing rest of line until reference is made to $(>0)
+ * or to NF. At that point, parse only as much as necessary.
+ */
+void
+set_record(buf, cnt, freeold)
+char *buf;
+int cnt;
+int freeold;
+{
+ register int i;
+
+ NF = -1;
+ for (i = 1; i <= parse_high_water; i++) {
+ unref(fields_arr[i]);
+ }
+ parse_high_water = 0;
+ if (freeold) {
+ unref(fields_arr[0]);
+ if (resave_fs) {
+ resave_fs = 0;
+ unref(save_FS);
+ save_FS = dupnode(FS_node->var_value);
+ }
+ nodes[0]->stptr = buf;
+ nodes[0]->stlen = cnt;
+ nodes[0]->stref = 1;
+ nodes[0]->flags = (STRING|STR|PERM|MAYBE_NUM);
+ fields_arr[0] = nodes[0];
+ }
+ fields_arr[0]->flags |= MAYBE_NUM;
+ field0_valid = 1;
+}
+
+void
+reset_record()
+{
+ (void) force_string(fields_arr[0]);
+ set_record(fields_arr[0]->stptr, fields_arr[0]->stlen, 0);
+}
+
+void
+set_NF()
+{
+ register int i;
+
+ NF = (int) force_number(NF_node->var_value);
+ if (NF > nf_high_water)
+ grow_fields_arr(NF);
+ for (i = parse_high_water + 1; i <= NF; i++) {
+ unref(fields_arr[i]);
+ fields_arr[i] = Nnull_string;
+ }
+ field0_valid = 0;
+}
+
+/*
+ * this is called both from get_field() and from do_split()
+ * via (*parse_field)(). This variation is for when FS is a regular
+ * expression -- either user-defined or because RS=="" and FS==" "
+ */
+static int
+re_parse_field(up_to, buf, len, fs, rp, set, n)
+int up_to; /* parse only up to this field number */
+char **buf; /* on input: string to parse; on output: point to start next */
+int len;
+NODE *fs;
+Regexp *rp;
+void (*set) (); /* routine to set the value of the parsed field */
+NODE *n;
+{
+ register char *scan = *buf;
+ register int nf = parse_high_water;
+ register char *field;
+ register char *end = scan + len;
+
+ if (up_to == HUGE)
+ nf = 0;
+ if (len == 0)
+ return nf;
+
+ if (*RS == 0 && default_FS)
+ while (scan < end && isspace(*scan))
+ scan++;
+ field = scan;
+ while (scan < end
+ && research(rp, scan, 0, (int)(end - scan), 1) != -1
+ && nf < up_to) {
+ if (REEND(rp, scan) == RESTART(rp, scan)) { /* null match */
+ scan++;
+ if (scan == end) {
+ (*set)(++nf, field, scan - field, n);
+ up_to = nf;
+ break;
+ }
+ continue;
+ }
+ (*set)(++nf, field, scan + RESTART(rp, scan) - field, n);
+ scan += REEND(rp, scan);
+ field = scan;
+ if (scan == end) /* FS at end of record */
+ (*set)(++nf, field, 0, n);
+ }
+ if (nf != up_to && scan < end) {
+ (*set)(++nf, scan, (int)(end - scan), n);
+ scan = end;
+ }
+ *buf = scan;
+ return (nf);
+}
+
+/*
+ * this is called both from get_field() and from do_split()
+ * via (*parse_field)(). This variation is for when FS is a single space
+ * character.
+ */
+static int
+def_parse_field(up_to, buf, len, fs, rp, set, n)
+int up_to; /* parse only up to this field number */
+char **buf; /* on input: string to parse; on output: point to start next */
+int len;
+NODE *fs;
+Regexp *rp;
+void (*set) (); /* routine to set the value of the parsed field */
+NODE *n;
+{
+ register char *scan = *buf;
+ register int nf = parse_high_water;
+ register char *field;
+ register char *end = scan + len;
+ char sav;
+
+ if (up_to == HUGE)
+ nf = 0;
+ if (len == 0)
+ return nf;
+
+ /* before doing anything save the char at *end */
+ sav = *end;
+ /* because it will be destroyed now: */
+
+ *end = ' '; /* sentinel character */
+ for (; nf < up_to; scan++) {
+ /*
+ * special case: fs is single space, strip leading whitespace
+ */
+ while (scan < end && (*scan == ' ' || *scan == '\t'))
+ scan++;
+ if (scan >= end)
+ break;
+ field = scan;
+ while (*scan != ' ' && *scan != '\t')
+ scan++;
+ (*set)(++nf, field, (int)(scan - field), n);
+ if (scan == end)
+ break;
+ }
+
+ /* everything done, restore original char at *end */
+ *end = sav;
+
+ *buf = scan;
+ return nf;
+}
+
+/*
+ * this is called both from get_field() and from do_split()
+ * via (*parse_field)(). This variation is for when FS is a single character
+ * other than space.
+ */
+static int
+sc_parse_field(up_to, buf, len, fs, rp, set, n)
+int up_to; /* parse only up to this field number */
+char **buf; /* on input: string to parse; on output: point to start next */
+int len;
+NODE *fs;
+Regexp *rp;
+void (*set) (); /* routine to set the value of the parsed field */
+NODE *n;
+{
+ register char *scan = *buf;
+ register char fschar;
+ register int nf = parse_high_water;
+ register char *field;
+ register char *end = scan + len;
+ char sav;
+
+ if (up_to == HUGE)
+ nf = 0;
+ if (len == 0)
+ return nf;
+
+ if (*RS == 0 && fs->stlen == 0)
+ fschar = '\n';
+ else
+ fschar = fs->stptr[0];
+
+ /* before doing anything save the char at *end */
+ sav = *end;
+ /* because it will be destroyed now: */
+ *end = fschar; /* sentinel character */
+
+ for (; nf < up_to; scan++) {
+ field = scan;
+ while (*scan++ != fschar)
+ ;
+ scan--;
+ (*set)(++nf, field, (int)(scan - field), n);
+ if (scan == end)
+ break;
+ }
+
+ /* everything done, restore original char at *end */
+ *end = sav;
+
+ *buf = scan;
+ return nf;
+}
+
+/*
+ * this is called both from get_field() and from do_split()
+ * via (*parse_field)(). This variation is for fields are fixed widths.
+ */
+static int
+fw_parse_field(up_to, buf, len, fs, rp, set, n)
+int up_to; /* parse only up to this field number */
+char **buf; /* on input: string to parse; on output: point to start next */
+int len;
+NODE *fs;
+Regexp *rp;
+void (*set) (); /* routine to set the value of the parsed field */
+NODE *n;
+{
+ register char *scan = *buf;
+ register int nf = parse_high_water;
+ register char *end = scan + len;
+
+ if (up_to == HUGE)
+ nf = 0;
+ if (len == 0)
+ return nf;
+ for (; nf < up_to && (len = FIELDWIDTHS[nf+1]) != -1; ) {
+ if (len > end - scan)
+ len = end - scan;
+ (*set)(++nf, scan, len, n);
+ scan += len;
+ }
+ if (len == -1)
+ *buf = end;
+ else
+ *buf = scan;
+ return nf;
+}
+
+NODE **
+get_field(requested, assign)
+register int requested;
+Func_ptr *assign; /* this field is on the LHS of an assign */
+{
+ /*
+ * if requesting whole line but some other field has been altered,
+ * then the whole line must be rebuilt
+ */
+ if (requested == 0) {
+ if (!field0_valid) {
+ /* first, parse remainder of input record */
+ if (NF == -1) {
+ NF = (*parse_field)(HUGE-1, &parse_extent,
+ fields_arr[0]->stlen -
+ (parse_extent - fields_arr[0]->stptr),
+ save_FS, FS_regexp, set_field,
+ (NODE *)NULL);
+ parse_high_water = NF;
+ }
+ rebuild_record();
+ }
+ if (assign)
+ *assign = reset_record;
+ return &fields_arr[0];
+ }
+
+ /* assert(requested > 0); */
+
+ if (assign)
+ field0_valid = 0; /* $0 needs reconstruction */
+
+ if (requested <= parse_high_water) /* already parsed this field */
+ return &fields_arr[requested];
+
+ if (NF == -1) { /* have not yet parsed to end of record */
+ /*
+ * parse up to requested fields, calling set_field() for each,
+ * saving in parse_extent the point where the parse left off
+ */
+ if (parse_high_water == 0) /* starting at the beginning */
+ parse_extent = fields_arr[0]->stptr;
+ parse_high_water = (*parse_field)(requested, &parse_extent,
+ fields_arr[0]->stlen - (parse_extent-fields_arr[0]->stptr),
+ save_FS, FS_regexp, set_field, (NODE *)NULL);
+
+ /*
+ * if we reached the end of the record, set NF to the number of
+ * fields so far. Note that requested might actually refer to
+ * a field that is beyond the end of the record, but we won't
+ * set NF to that value at this point, since this is only a
+ * reference to the field and NF only gets set if the field
+ * is assigned to -- this case is handled below
+ */
+ if (parse_extent == fields_arr[0]->stptr + fields_arr[0]->stlen)
+ NF = parse_high_water;
+ if (requested == HUGE-1) /* HUGE-1 means set NF */
+ requested = parse_high_water;
+ }
+ if (parse_high_water < requested) { /* requested beyond end of record */
+ if (assign) { /* expand record */
+ register int i;
+
+ if (requested > nf_high_water)
+ grow_fields_arr(requested);
+
+ /* fill in fields that don't exist */
+ for (i = parse_high_water + 1; i <= requested; i++)
+ fields_arr[i] = Nnull_string;
+
+ NF = requested;
+ parse_high_water = requested;
+ } else
+ return &Nnull_string;
+ }
+
+ return &fields_arr[requested];
+}
+
+static void
+set_element(num, s, len, n)
+int num;
+char *s;
+int len;
+NODE *n;
+{
+ register NODE *it;
+
+ it = make_string(s, len);
+ it->flags |= MAYBE_NUM;
+ *assoc_lookup(n, tmp_number((AWKNUM) (num))) = it;
+}
+
+NODE *
+do_split(tree)
+NODE *tree;
+{
+ NODE *t1, *t2, *t3, *tmp;
+ NODE *fs;
+ char *s;
+ int (*parseit)P((int, char **, int, NODE *,
+ Regexp *, void (*)(), NODE *));
+ Regexp *rp = NULL;
+
+ t1 = tree_eval(tree->lnode);
+ t2 = tree->rnode->lnode;
+ t3 = tree->rnode->rnode->lnode;
+
+ (void) force_string(t1);
+
+ if (t2->type == Node_param_list)
+ t2 = stack_ptr[t2->param_cnt];
+ if (t2->type != Node_var && t2->type != Node_var_array)
+ fatal("second argument of split is not a variable");
+ assoc_clear(t2);
+
+ if (t3->re_flags & FS_DFLT) {
+ parseit = parse_field;
+ fs = force_string(FS_node->var_value);
+ rp = FS_regexp;
+ } else {
+ tmp = force_string(tree_eval(t3->re_exp));
+ if (tmp->stlen == 1) {
+ if (tmp->stptr[0] == ' ')
+ parseit = def_parse_field;
+ else
+ parseit = sc_parse_field;
+ } else {
+ parseit = re_parse_field;
+ rp = re_update(t3);
+ }
+ fs = tmp;
+ }
+
+ s = t1->stptr;
+ tmp = tmp_number((AWKNUM) (*parseit)(HUGE, &s, (int)t1->stlen,
+ fs, rp, set_element, t2));
+ free_temp(t1);
+ free_temp(t3);
+ return tmp;
+}
+
+void
+set_FS()
+{
+ NODE *tmp = NULL;
+ char buf[10];
+ NODE *fs;
+
+ buf[0] = '\0';
+ default_FS = 0;
+ if (FS_regexp) {
+ refree(FS_regexp);
+ FS_regexp = NULL;
+ }
+ fs = force_string(FS_node->var_value);
+ if (fs->stlen > 1)
+ parse_field = re_parse_field;
+ else if (*RS == 0) {
+ parse_field = sc_parse_field;
+ if (fs->stlen == 1) {
+ if (fs->stptr[0] == ' ') {
+ default_FS = 1;
+ strcpy(buf, "[ \t\n]+");
+ } else if (fs->stptr[0] != '\n')
+ sprintf(buf, "[%c\n]", fs->stptr[0]);
+ }
+ } else {
+ parse_field = def_parse_field;
+ if (fs->stptr[0] == ' ' && fs->stlen == 1)
+ default_FS = 1;
+ else if (fs->stptr[0] != ' ' && fs->stlen == 1) {
+ if (IGNORECASE == 0)
+ parse_field = sc_parse_field;
+ else
+ sprintf(buf, "[%c]", fs->stptr[0]);
+ }
+ }
+ if (buf[0]) {
+ FS_regexp = make_regexp(buf, strlen(buf), IGNORECASE, 1);
+ parse_field = re_parse_field;
+ } else if (parse_field == re_parse_field) {
+ FS_regexp = make_regexp(fs->stptr, fs->stlen, IGNORECASE, 1);
+ } else
+ FS_regexp = NULL;
+ resave_fs = 1;
+}
+
+void
+set_RS()
+{
+ (void) force_string(RS_node->var_value);
+ RS = RS_node->var_value->stptr;
+ set_FS();
+}
+
+void
+set_FIELDWIDTHS()
+{
+ register char *scan;
+ char *end;
+ register int i;
+ static int fw_alloc = 1;
+ static int warned = 0;
+ extern double strtod();
+
+ if (do_lint && ! warned) {
+ warned = 1;
+ warning("use of FIELDWIDTHS is a gawk extension");
+ }
+ if (do_unix) /* quick and dirty, does the trick */
+ return;
+
+ parse_field = fw_parse_field;
+ scan = force_string(FIELDWIDTHS_node->var_value)->stptr;
+ end = scan + 1;
+ if (FIELDWIDTHS == NULL)
+ emalloc(FIELDWIDTHS, int *, fw_alloc * sizeof(int), "set_FIELDWIDTHS");
+ FIELDWIDTHS[0] = 0;
+ for (i = 1; ; i++) {
+ if (i >= fw_alloc) {
+ fw_alloc *= 2;
+ erealloc(FIELDWIDTHS, int *, fw_alloc * sizeof(int), "set_FIELDWIDTHS");
+ }
+ FIELDWIDTHS[i] = (int) strtod(scan, &end);
+ if (end == scan)
+ break;
+ scan = end;
+ }
+ FIELDWIDTHS[i] = -1;
+}
diff --git a/gnu/usr.bin/awk/gawk.texi b/gnu/usr.bin/awk/gawk.texi
new file mode 100644
index 000000000000..b2802623136d
--- /dev/null
+++ b/gnu/usr.bin/awk/gawk.texi
@@ -0,0 +1,11270 @@
+\input texinfo @c -*-texinfo-*-
+@c %**start of header (This is for running Texinfo on a region.)
+@setfilename gawk.info
+@settitle The GAWK Manual
+@c @smallbook
+@c %**end of header (This is for running Texinfo on a region.)
+
+@ifinfo
+@synindex fn cp
+@synindex vr cp
+@end ifinfo
+@iftex
+@syncodeindex fn cp
+@syncodeindex vr cp
+@end iftex
+
+@c If "finalout" is commented out, the printed output will show
+@c black boxes that mark lines that are too long. Thus, it is
+@c unwise to comment it out when running a master in case there are
+@c overfulls which are deemed okay.
+
+@iftex
+@finalout
+@end iftex
+
+@c ===> NOTE! <==
+@c Determine the edition number in *four* places by hand:
+@c 1. First ifinfo section 2. title page 3. copyright page 4. top node
+@c To find the locations, search for !!set
+
+@ifinfo
+This file documents @code{awk}, a program that you can use to select
+particular records in a file and perform operations upon them.
+
+This is Edition 0.15 of @cite{The GAWK Manual}, @*
+for the 2.15 version of the GNU implementation @*
+of AWK.
+
+Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc.
+
+Permission is granted to make and distribute verbatim copies of
+this manual provided the copyright notice and this permission notice
+are preserved on all copies.
+
+@ignore
+Permission is granted to process this file through TeX and print the
+results, provided the printed document carries copying permission
+notice identical to this one except for the removal of this paragraph
+(this paragraph not being relevant to the printed manual).
+
+@end ignore
+Permission is granted to copy and distribute modified versions of this
+manual under the conditions for verbatim copying, provided that the entire
+resulting derived work is distributed under the terms of a permission
+notice identical to this one.
+
+Permission is granted to copy and distribute translations of this manual
+into another language, under the above conditions for modified versions,
+except that this permission notice may be stated in a translation approved
+by the Foundation.
+@end ifinfo
+
+@setchapternewpage odd
+
+@c !!set edition, date, version
+@titlepage
+@title The GAWK Manual
+@subtitle Edition 0.15
+@subtitle April 1993
+@author Diane Barlow Close
+@author Arnold D. Robbins
+@author Paul H. Rubin
+@author Richard Stallman
+
+@c Include the Distribution inside the titlepage environment so
+@c that headings are turned off. Headings on and off do not work.
+
+@page
+@vskip 0pt plus 1filll
+Copyright @copyright{} 1989, 1991, 1992, 1993 Free Software Foundation, Inc.
+@sp 2
+
+@c !!set edition, date, version
+This is Edition 0.15 of @cite{The GAWK Manual}, @*
+for the 2.15 version of the GNU implementation @*
+of AWK.
+
+@sp 2
+Published by the Free Software Foundation @*
+675 Massachusetts Avenue @*
+Cambridge, MA 02139 USA @*
+Printed copies are available for $20 each.
+
+Permission is granted to make and distribute verbatim copies of
+this manual provided the copyright notice and this permission notice
+are preserved on all copies.
+
+Permission is granted to copy and distribute modified versions of this
+manual under the conditions for verbatim copying, provided that the entire
+resulting derived work is distributed under the terms of a permission
+notice identical to this one.
+
+Permission is granted to copy and distribute translations of this manual
+into another language, under the above conditions for modified versions,
+except that this permission notice may be stated in a translation approved
+by the Foundation.
+@end titlepage
+
+@ifinfo
+@node Top, Preface, (dir), (dir)
+@comment node-name, next, previous, up
+@top General Introduction
+@c Preface or Licensing nodes should come right after the Top
+@c node, in `unnumbered' sections, then the chapter, `What is gawk'.
+
+This file documents @code{awk}, a program that you can use to select
+particular records in a file and perform operations upon them.
+
+@c !!set edition, date, version
+This is Edition 0.15 of @cite{The GAWK Manual}, @*
+for the 2.15 version of the GNU implementation @*
+of AWK.
+
+@end ifinfo
+
+@menu
+* Preface:: What you can do with @code{awk}; brief history
+ and acknowledgements.
+* Copying:: Your right to copy and distribute @code{gawk}.
+* This Manual:: Using this manual.
+ Includes sample input files that you can use.
+* Getting Started:: A basic introduction to using @code{awk}.
+ How to run an @code{awk} program.
+ Command line syntax.
+* Reading Files:: How to read files and manipulate fields.
+* Printing:: How to print using @code{awk}. Describes the
+ @code{print} and @code{printf} statements.
+ Also describes redirection of output.
+* One-liners:: Short, sample @code{awk} programs.
+* Patterns:: The various types of patterns
+ explained in detail.
+* Actions:: The various types of actions are
+ introduced here. Describes
+ expressions and the various operators in
+ detail. Also describes comparison expressions.
+* Expressions:: Expressions are the basic building
+ blocks of statements.
+* Statements:: The various control statements are
+ described in detail.
+* Arrays:: The description and use of arrays.
+ Also includes array-oriented control
+ statements.
+* Built-in:: The built-in functions are summarized here.
+* User-defined:: User-defined functions are described in detail.
+* Built-in Variables:: Built-in Variables
+* Command Line:: How to run @code{gawk}.
+* Language History:: The evolution of the @code{awk} language.
+* Installation:: Installing @code{gawk} under
+ various operating systems.
+* Gawk Summary:: @code{gawk} Options and Language Summary.
+* Sample Program:: A sample @code{awk} program with a
+ complete explanation.
+* Bugs:: Reporting Problems and Bugs.
+* Notes:: Something about the
+ implementation of @code{gawk}.
+* Glossary:: An explanation of some unfamiliar terms.
+* Index::
+@end menu
+
+@node Preface, Copying, Top, Top
+@comment node-name, next, previous, up
+@unnumbered Preface
+
+@iftex
+@cindex what is @code{awk}
+@end iftex
+If you are like many computer users, you would frequently like to make
+changes in various text files wherever certain patterns appear, or
+extract data from parts of certain lines while discarding the rest. To
+write a program to do this in a language such as C or Pascal is a
+time-consuming inconvenience that may take many lines of code. The job
+may be easier with @code{awk}.
+
+The @code{awk} utility interprets a special-purpose programming language
+that makes it possible to handle simple data-reformatting jobs easily
+with just a few lines of code.
+
+The GNU implementation of @code{awk} is called @code{gawk}; it is fully
+upward compatible with the System V Release 4 version of
+@code{awk}. @code{gawk} is also upward compatible with the @sc{posix}
+(draft) specification of the @code{awk} language. This means that all
+properly written @code{awk} programs should work with @code{gawk}.
+Thus, we usually don't distinguish between @code{gawk} and other @code{awk}
+implementations in this manual.@refill
+
+@cindex uses of @code{awk}
+This manual teaches you what @code{awk} does and how you can use
+@code{awk} effectively. You should already be familiar with basic
+system commands such as @code{ls}. Using @code{awk} you can: @refill
+
+@itemize @bullet
+@item
+manage small, personal databases
+
+@item
+generate reports
+
+@item
+validate data
+@item
+produce indexes, and perform other document preparation tasks
+
+@item
+even experiment with algorithms that can be adapted later to other computer
+languages
+@end itemize
+
+@iftex
+This manual has the difficult task of being both tutorial and reference.
+If you are a novice, feel free to skip over details that seem too complex.
+You should also ignore the many cross references; they are for the
+expert user, and for the on-line Info version of the manual.
+@end iftex
+
+@menu
+* History:: The history of @code{gawk} and
+ @code{awk}. Acknowledgements.
+@end menu
+
+@node History, , Preface, Preface
+@comment node-name, next, previous, up
+@unnumberedsec History of @code{awk} and @code{gawk}
+
+@cindex acronym
+@cindex history of @code{awk}
+The name @code{awk} comes from the initials of its designers: Alfred V.
+Aho, Peter J. Weinberger, and Brian W. Kernighan. The original version of
+@code{awk} was written in 1977. In 1985 a new version made the programming
+language more powerful, introducing user-defined functions, multiple input
+streams, and computed regular expressions.
+This new version became generally available with System V Release 3.1.
+The version in System V Release 4 added some new features and also cleaned
+up the behavior in some of the ``dark corners'' of the language.
+The specification for @code{awk} in the @sc{posix} Command Language
+and Utilities standard further clarified the language based on feedback
+from both the @code{gawk} designers, and the original @code{awk}
+designers.@refill
+
+The GNU implementation, @code{gawk}, was written in 1986 by Paul Rubin
+and Jay Fenlason, with advice from Richard Stallman. John Woods
+contributed parts of the code as well. In 1988 and 1989, David Trueman, with
+help from Arnold Robbins, thoroughly reworked @code{gawk} for compatibility
+with the newer @code{awk}. Current development (1992) focuses on bug fixes,
+performance improvements, and standards compliance.
+
+We need to thank many people for their assistance in producing this
+manual. Jay Fenlason contributed many ideas and sample programs. Richard
+Mlynarik and Robert J. Chassell gave helpful comments on early drafts of this
+manual. The paper @cite{A Supplemental Document for @code{awk}} by John W.
+Pierce of the Chemistry Department at UC San Diego, pinpointed several
+issues relevant both to @code{awk} implementation and to this manual, that
+would otherwise have escaped us. David Trueman, Pat Rankin, and Michal
+Jaegermann also contributed sections of the manual.@refill
+
+The following people provided many helpful comments on this edition of
+the manual: Rick Adams, Michael Brennan, Rich Burridge, Diane Close,
+Christopher (``Topher'') Eliot, Michael Lijewski, Pat Rankin, Miriam Robbins,
+and Michal Jaegermann. Robert J. Chassell provided much valuable advice on
+the use of Texinfo.
+
+Finally, we would like to thank Brian Kernighan of Bell Labs for invaluable
+assistance during the testing and debugging of @code{gawk}, and for
+help in clarifying numerous points about the language.@refill
+
+@node Copying, This Manual, Preface, Top
+@unnumbered GNU GENERAL PUBLIC LICENSE
+@center Version 2, June 1991
+
+@display
+Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc.
+675 Mass Ave, Cambridge, MA 02139, USA
+
+Everyone is permitted to copy and distribute verbatim copies
+of this license document, but changing it is not allowed.
+@end display
+
+@c fakenode --- for prepinfo
+@unnumberedsec Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software---to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+@iftex
+@c fakenode --- for prepinfo
+@unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+@end iftex
+@ifinfo
+@center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+@end ifinfo
+
+@enumerate
+@item
+This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The ``Program'', below,
+refers to any such program or work, and a ``work based on the Program''
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term ``modification''.) Each licensee is addressed as ``you''.
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+@item
+You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+@item
+You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+@enumerate a
+@item
+You must cause the modified files to carry prominent notices
+stating that you changed the files and the date of any change.
+
+@item
+You must cause any work that you distribute or publish, that in
+whole or in part contains or is derived from the Program or any
+part thereof, to be licensed as a whole at no charge to all third
+parties under the terms of this License.
+
+@item
+If the modified program normally reads commands interactively
+when run, you must cause it, when started running for such
+interactive use in the most ordinary way, to print or display an
+announcement including an appropriate copyright notice and a
+notice that there is no warranty (or else, saying that you provide
+a warranty) and that users may redistribute the program under
+these conditions, and telling the user how to view a copy of this
+License. (Exception: if the Program itself is interactive but
+does not normally print such an announcement, your work based on
+the Program is not required to print an announcement.)
+@end enumerate
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+@item
+You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+@enumerate a
+@item
+Accompany it with the complete corresponding machine-readable
+source code, which must be distributed under the terms of Sections
+1 and 2 above on a medium customarily used for software interchange; or,
+
+@item
+Accompany it with a written offer, valid for at least three
+years, to give any third party, for a charge no more than your
+cost of physically performing source distribution, a complete
+machine-readable copy of the corresponding source code, to be
+distributed under the terms of Sections 1 and 2 above on a medium
+customarily used for software interchange; or,
+
+@item
+Accompany it with the information you received as to the offer
+to distribute corresponding source code. (This alternative is
+allowed only for noncommercial distribution and only if you
+received the program in object code or executable form with such
+an offer, in accord with Subsection b above.)
+@end enumerate
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+@item
+You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+@item
+You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+@item
+Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+@item
+If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+@item
+If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+@item
+The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and ``any
+later version'', you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+@item
+If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+@iftex
+@c fakenode --- for prepinfo
+@heading NO WARRANTY
+@end iftex
+@ifinfo
+@center NO WARRANTY
+@end ifinfo
+
+@item
+BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+@item
+IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+@end enumerate
+
+@iftex
+@c fakenode --- for prepinfo
+@heading END OF TERMS AND CONDITIONS
+@end iftex
+@ifinfo
+@center END OF TERMS AND CONDITIONS
+@end ifinfo
+
+@page
+@c fakenode --- for prepinfo
+@unnumberedsec How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the ``copyright'' line and a pointer to where the full notice is found.
+
+@smallexample
+@var{one line to give the program's name and a brief idea of what it does.}
+Copyright (C) 19@var{yy} @var{name of author}
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+@end smallexample
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+@smallexample
+Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
+Gnomovision comes with ABSOLUTELY NO WARRANTY; for details
+type `show w'.
+This is free software, and you are welcome to redistribute it
+under certain conditions; type `show c' for details.
+@end smallexample
+
+The hypothetical commands @samp{show w} and @samp{show c} should show
+the appropriate parts of the General Public License. Of course, the
+commands you use may be called something other than @samp{show w} and
+@samp{show c}; they could even be mouse-clicks or menu items---whatever
+suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a ``copyright disclaimer'' for the program, if
+necessary. Here is a sample; alter the names:
+
+@smallexample
+Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+`Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+@var{signature of Ty Coon}, 1 April 1989
+Ty Coon, President of Vice
+@end smallexample
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Library General
+Public License instead of this License.
+
+@node This Manual, Getting Started, Copying, Top
+@chapter Using this Manual
+@cindex manual, using this
+@cindex using this manual
+@cindex language, @code{awk}
+@cindex program, @code{awk}
+@cindex @code{awk} language
+@cindex @code{awk} program
+
+The term @code{awk} refers to a particular program, and to the language you
+use to tell this program what to do. When we need to be careful, we call
+the program ``the @code{awk} utility'' and the language ``the @code{awk}
+language.'' The term @code{gawk} refers to a version of @code{awk} developed
+as part the GNU project. The purpose of this manual is to explain
+both the
+@code{awk} language and how to run the @code{awk} utility.@refill
+
+While concentrating on the features of @code{gawk}, the manual will also
+attempt to describe important differences between @code{gawk} and other
+@code{awk} implementations. In particular, any features that are not
+in the @sc{posix} standard for @code{awk} will be noted. @refill
+
+The term @dfn{@code{awk} program} refers to a program written by you in
+the @code{awk} programming language.@refill
+
+@xref{Getting Started, ,Getting Started with @code{awk}}, for the bare
+essentials you need to know to start using @code{awk}.
+
+Some useful ``one-liners'' are included to give you a feel for the
+@code{awk} language (@pxref{One-liners, ,Useful ``One-liners''}).
+
+@ignore
+@strong{I deleted four paragraphs here because they would confuse the
+beginner more than help him. They mention terms such as ``field,''
+``pattern,'' ``action,'' ``built-in function'' which the beginner
+doesn't know.}
+
+@strong{If you can find a way to introduce several of these concepts here,
+enough to give the reader a map of what is to follow, that might
+be useful. I'm not sure that can be done without taking up more
+space than ought to be used here. There may be no way to win.}
+
+@strong{ADR: I'd like to tackle this in phase 2 of my editing.}
+@end ignore
+
+A sample @code{awk} program has been provided for you
+(@pxref{Sample Program}).@refill
+
+If you find terms that you aren't familiar with, try looking them
+up in the glossary (@pxref{Glossary}).@refill
+
+The entire @code{awk} language is summarized for quick reference in
+@ref{Gawk Summary, ,@code{gawk} Summary}. Look there if you just need
+to refresh your memory about a particular feature.@refill
+
+Most of the time complete @code{awk} programs are used as examples, but in
+some of the more advanced sections, only the part of the @code{awk} program
+that illustrates the concept being described is shown.@refill
+
+@menu
+* Sample Data Files:: Sample data files for use in the @code{awk}
+ programs illustrated in this manual.
+@end menu
+
+@node Sample Data Files, , This Manual, This Manual
+@section Data Files for the Examples
+
+@cindex input file, sample
+@cindex sample input file
+@cindex @file{BBS-list} file
+Many of the examples in this manual take their input from two sample
+data files. The first, called @file{BBS-list}, represents a list of
+computer bulletin board systems together with information about those systems.
+The second data file, called @file{inventory-shipped}, contains
+information about shipments on a monthly basis. Each line of these
+files is one @dfn{record}.
+
+In the file @file{BBS-list}, each record contains the name of a computer
+bulletin board, its phone number, the board's baud rate, and a code for
+the number of hours it is operational. An @samp{A} in the last column
+means the board operates 24 hours a day. A @samp{B} in the last
+column means the board operates evening and weekend hours, only. A
+@samp{C} means the board operates only on weekends.
+
+@example
+aardvark 555-5553 1200/300 B
+alpo-net 555-3412 2400/1200/300 A
+barfly 555-7685 1200/300 A
+bites 555-1675 2400/1200/300 A
+camelot 555-0542 300 C
+core 555-2912 1200/300 C
+fooey 555-1234 2400/1200/300 B
+foot 555-6699 1200/300 B
+macfoo 555-6480 1200/300 A
+sdace 555-3430 2400/1200/300 A
+sabafoo 555-2127 1200/300 C
+@end example
+
+@cindex @file{inventory-shipped} file
+The second data file, called @file{inventory-shipped}, represents
+information about shipments during the year.
+Each record contains the month of the year, the number
+of green crates shipped, the number of red boxes shipped, the number of
+orange bags shipped, and the number of blue packages shipped,
+respectively. There are 16 entries, covering the 12 months of one year
+and 4 months of the next year.@refill
+
+@example
+Jan 13 25 15 115
+Feb 15 32 24 226
+Mar 15 24 34 228
+Apr 31 52 63 420
+May 16 34 29 208
+Jun 31 42 75 492
+Jul 24 34 67 436
+Aug 15 34 47 316
+Sep 13 55 37 277
+Oct 29 54 68 525
+Nov 20 87 82 577
+Dec 17 35 61 401
+
+Jan 21 36 64 620
+Feb 26 58 80 652
+Mar 24 75 70 495
+Apr 21 70 74 514
+@end example
+
+@ifinfo
+If you are reading this in GNU Emacs using Info, you can copy the regions
+of text showing these sample files into your own test files. This way you
+can try out the examples shown in the remainder of this document. You do
+this by using the command @kbd{M-x write-region} to copy text from the Info
+file into a file for use with @code{awk}
+(@xref{Misc File Ops, , , emacs, GNU Emacs Manual},
+for more information). Using this information, create your own
+@file{BBS-list} and @file{inventory-shipped} files, and practice what you
+learn in this manual.
+@end ifinfo
+
+@node Getting Started, Reading Files, This Manual, Top
+@chapter Getting Started with @code{awk}
+@cindex script, definition of
+@cindex rule, definition of
+@cindex program, definition of
+@cindex basic function of @code{gawk}
+
+The basic function of @code{awk} is to search files for lines (or other
+units of text) that contain certain patterns. When a line matches one
+of the patterns, @code{awk} performs specified actions on that line.
+@code{awk} keeps processing input lines in this way until the end of the
+input file is reached.@refill
+
+When you run @code{awk}, you specify an @code{awk} @dfn{program} which
+tells @code{awk} what to do. The program consists of a series of
+@dfn{rules}. (It may also contain @dfn{function definitions}, but that
+is an advanced feature, so we will ignore it for now.
+@xref{User-defined, ,User-defined Functions}.) Each rule specifies one
+pattern to search for, and one action to perform when that pattern is found.
+
+Syntactically, a rule consists of a pattern followed by an action. The
+action is enclosed in curly braces to separate it from the pattern.
+Rules are usually separated by newlines. Therefore, an @code{awk}
+program looks like this:
+
+@example
+@var{pattern} @{ @var{action} @}
+@var{pattern} @{ @var{action} @}
+@dots{}
+@end example
+
+@menu
+* Very Simple:: A very simple example.
+* Two Rules:: A less simple one-line example with two rules.
+* More Complex:: A more complex example.
+* Running gawk:: How to run @code{gawk} programs;
+ includes command line syntax.
+* Comments:: Adding documentation to @code{gawk} programs.
+* Statements/Lines:: Subdividing or combining statements into lines.
+* When:: When to use @code{gawk} and
+ when to use other things.
+@end menu
+
+@node Very Simple, Two Rules, Getting Started, Getting Started
+@section A Very Simple Example
+
+@cindex @samp{print $0}
+The following command runs a simple @code{awk} program that searches the
+input file @file{BBS-list} for the string of characters: @samp{foo}. (A
+string of characters is usually called, a @dfn{string}.
+The term @dfn{string} is perhaps based on similar usage in English, such
+as ``a string of pearls,'' or, ``a string of cars in a train.'')
+
+@example
+awk '/foo/ @{ print $0 @}' BBS-list
+@end example
+
+@noindent
+When lines containing @samp{foo} are found, they are printed, because
+@w{@samp{print $0}} means print the current line. (Just @samp{print} by
+itself means the same thing, so we could have written that
+instead.)
+
+You will notice that slashes, @samp{/}, surround the string @samp{foo}
+in the actual @code{awk} program. The slashes indicate that @samp{foo}
+is a pattern to search for. This type of pattern is called a
+@dfn{regular expression}, and is covered in more detail later
+(@pxref{Regexp, ,Regular Expressions as Patterns}). There are
+single-quotes around the @code{awk} program so that the shell won't
+interpret any of it as special shell characters.@refill
+
+Here is what this program prints:
+
+@example
+@group
+fooey 555-1234 2400/1200/300 B
+foot 555-6699 1200/300 B
+macfoo 555-6480 1200/300 A
+sabafoo 555-2127 1200/300 C
+@end group
+@end example
+
+@cindex action, default
+@cindex pattern, default
+@cindex default action
+@cindex default pattern
+In an @code{awk} rule, either the pattern or the action can be omitted,
+but not both. If the pattern is omitted, then the action is performed
+for @emph{every} input line. If the action is omitted, the default
+action is to print all lines that match the pattern.
+
+Thus, we could leave out the action (the @code{print} statement and the curly
+braces) in the above example, and the result would be the same: all
+lines matching the pattern @samp{foo} would be printed. By comparison,
+omitting the @code{print} statement but retaining the curly braces makes an
+empty action that does nothing; then no lines would be printed.
+
+@node Two Rules, More Complex, Very Simple, Getting Started
+@section An Example with Two Rules
+@cindex how @code{awk} works
+
+The @code{awk} utility reads the input files one line at a
+time. For each line, @code{awk} tries the patterns of each of the rules.
+If several patterns match then several actions are run, in the order in
+which they appear in the @code{awk} program. If no patterns match, then
+no actions are run.
+
+After processing all the rules (perhaps none) that match the line,
+@code{awk} reads the next line (however,
+@pxref{Next Statement, ,The @code{next} Statement}). This continues
+until the end of the file is reached.@refill
+
+For example, the @code{awk} program:
+
+@example
+/12/ @{ print $0 @}
+/21/ @{ print $0 @}
+@end example
+
+@noindent
+contains two rules. The first rule has the string @samp{12} as the
+pattern and @samp{print $0} as the action. The second rule has the
+string @samp{21} as the pattern and also has @samp{print $0} as the
+action. Each rule's action is enclosed in its own pair of braces.
+
+This @code{awk} program prints every line that contains the string
+@samp{12} @emph{or} the string @samp{21}. If a line contains both
+strings, it is printed twice, once by each rule.
+
+If we run this program on our two sample data files, @file{BBS-list} and
+@file{inventory-shipped}, as shown here:
+
+@example
+awk '/12/ @{ print $0 @}
+ /21/ @{ print $0 @}' BBS-list inventory-shipped
+@end example
+
+@noindent
+we get the following output:
+
+@example
+aardvark 555-5553 1200/300 B
+alpo-net 555-3412 2400/1200/300 A
+barfly 555-7685 1200/300 A
+bites 555-1675 2400/1200/300 A
+core 555-2912 1200/300 C
+fooey 555-1234 2400/1200/300 B
+foot 555-6699 1200/300 B
+macfoo 555-6480 1200/300 A
+sdace 555-3430 2400/1200/300 A
+sabafoo 555-2127 1200/300 C
+sabafoo 555-2127 1200/300 C
+Jan 21 36 64 620
+Apr 21 70 74 514
+@end example
+
+@noindent
+Note how the line in @file{BBS-list} beginning with @samp{sabafoo}
+was printed twice, once for each rule.
+
+@node More Complex, Running gawk, Two Rules, Getting Started
+@comment node-name, next, previous, up
+@section A More Complex Example
+
+Here is an example to give you an idea of what typical @code{awk}
+programs do. This example shows how @code{awk} can be used to
+summarize, select, and rearrange the output of another utility. It uses
+features that haven't been covered yet, so don't worry if you don't
+understand all the details.
+
+@example
+ls -l | awk '$5 == "Nov" @{ sum += $4 @}
+ END @{ print sum @}'
+@end example
+
+This command prints the total number of bytes in all the files in the
+current directory that were last modified in November (of any year).
+(In the C shell you would need to type a semicolon and then a backslash
+at the end of the first line; in a @sc{posix}-compliant shell, such as the
+Bourne shell or the Bourne-Again shell, you can type the example as shown.)
+
+The @w{@samp{ls -l}} part of this example is a command that gives you a
+listing of the files in a directory, including file size and date.
+Its output looks like this:@refill
+
+@example
+-rw-r--r-- 1 close 1933 Nov 7 13:05 Makefile
+-rw-r--r-- 1 close 10809 Nov 7 13:03 gawk.h
+-rw-r--r-- 1 close 983 Apr 13 12:14 gawk.tab.h
+-rw-r--r-- 1 close 31869 Jun 15 12:20 gawk.y
+-rw-r--r-- 1 close 22414 Nov 7 13:03 gawk1.c
+-rw-r--r-- 1 close 37455 Nov 7 13:03 gawk2.c
+-rw-r--r-- 1 close 27511 Dec 9 13:07 gawk3.c
+-rw-r--r-- 1 close 7989 Nov 7 13:03 gawk4.c
+@end example
+
+@noindent
+The first field contains read-write permissions, the second field contains
+the number of links to the file, and the third field identifies the owner of
+the file. The fourth field contains the size of the file in bytes. The
+fifth, sixth, and seventh fields contain the month, day, and time,
+respectively, that the file was last modified. Finally, the eighth field
+contains the name of the file.
+
+The @code{$5 == "Nov"} in our @code{awk} program is an expression that
+tests whether the fifth field of the output from @w{@samp{ls -l}}
+matches the string @samp{Nov}. Each time a line has the string
+@samp{Nov} in its fifth field, the action @samp{@{ sum += $4 @}} is
+performed. This adds the fourth field (the file size) to the variable
+@code{sum}. As a result, when @code{awk} has finished reading all the
+input lines, @code{sum} is the sum of the sizes of files whose
+lines matched the pattern. (This works because @code{awk} variables
+are automatically initialized to zero.)@refill
+
+After the last line of output from @code{ls} has been processed, the
+@code{END} rule is executed, and the value of @code{sum} is
+printed. In this example, the value of @code{sum} would be 80600.@refill
+
+These more advanced @code{awk} techniques are covered in later sections
+(@pxref{Actions, ,Overview of Actions}). Before you can move on to more
+advanced @code{awk} programming, you have to know how @code{awk} interprets
+your input and displays your output. By manipulating fields and using
+@code{print} statements, you can produce some very useful and spectacular
+looking reports.@refill
+
+@node Running gawk, Comments, More Complex, Getting Started
+@section How to Run @code{awk} Programs
+
+@ignore
+Date: Mon, 26 Aug 91 09:48:10 +0200
+From: gatech!vsoc07.cern.ch!matheys (Jean-Pol Matheys (CERN - ECP Division))
+To: uunet.UU.NET!skeeve!arnold
+Subject: RE: status check
+
+The introduction of Chapter 2 (i.e. before 2.1) should include
+the whole of section 2.4 - it's better to tell people how to run awk programs
+before giving any examples
+
+ADR --- he's right. but for now, don't do this because the rest of the
+chapter would need some rewriting.
+@end ignore
+
+@cindex command line formats
+@cindex running @code{awk} programs
+There are several ways to run an @code{awk} program. If the program is
+short, it is easiest to include it in the command that runs @code{awk},
+like this:
+
+@example
+awk '@var{program}' @var{input-file1} @var{input-file2} @dots{}
+@end example
+
+@noindent
+where @var{program} consists of a series of patterns and actions, as
+described earlier.
+
+When the program is long, it is usually more convenient to put it in a file
+and run it with a command like this:
+
+@example
+awk -f @var{program-file} @var{input-file1} @var{input-file2} @dots{}
+@end example
+
+@menu
+* One-shot:: Running a short throw-away @code{awk} program.
+* Read Terminal:: Using no input files (input from
+ terminal instead).
+* Long:: Putting permanent @code{awk} programs in files.
+* Executable Scripts:: Making self-contained @code{awk} programs.
+@end menu
+
+@node One-shot, Read Terminal, Running gawk, Running gawk
+@subsection One-shot Throw-away @code{awk} Programs
+
+Once you are familiar with @code{awk}, you will often type simple
+programs at the moment you want to use them. Then you can write the
+program as the first argument of the @code{awk} command, like this:
+
+@example
+awk '@var{program}' @var{input-file1} @var{input-file2} @dots{}
+@end example
+
+@noindent
+where @var{program} consists of a series of @var{patterns} and
+@var{actions}, as described earlier.
+
+@cindex single quotes, why needed
+This command format instructs the shell to start @code{awk} and use the
+@var{program} to process records in the input file(s). There are single
+quotes around @var{program} so that the shell doesn't interpret any
+@code{awk} characters as special shell characters. They also cause the
+shell to treat all of @var{program} as a single argument for
+@code{awk} and allow @var{program} to be more than one line long.@refill
+
+This format is also useful for running short or medium-sized @code{awk}
+programs from shell scripts, because it avoids the need for a separate
+file for the @code{awk} program. A self-contained shell script is more
+reliable since there are no other files to misplace.
+
+@node Read Terminal, Long, One-shot, Running gawk
+@subsection Running @code{awk} without Input Files
+
+@cindex standard input
+@cindex input, standard
+You can also run @code{awk} without any input files. If you type the
+command line:@refill
+
+@example
+awk '@var{program}'
+@end example
+
+@noindent
+then @code{awk} applies the @var{program} to the @dfn{standard input},
+which usually means whatever you type on the terminal. This continues
+until you indicate end-of-file by typing @kbd{Control-d}.
+
+For example, if you execute this command:
+
+@example
+awk '/th/'
+@end example
+
+@noindent
+whatever you type next is taken as data for that @code{awk}
+program. If you go on to type the following data:
+
+@example
+Kathy
+Ben
+Tom
+Beth
+Seth
+Karen
+Thomas
+@kbd{Control-d}
+@end example
+
+@noindent
+then @code{awk} prints this output:
+
+@example
+Kathy
+Beth
+Seth
+@end example
+
+@noindent
+@cindex case sensitivity
+@cindex pattern, case sensitive
+as matching the pattern @samp{th}. Notice that it did not recognize
+@samp{Thomas} as matching the pattern. The @code{awk} language is
+@dfn{case sensitive}, and matches patterns exactly. (However, you can
+override this with the variable @code{IGNORECASE}.
+@xref{Case-sensitivity, ,Case-sensitivity in Matching}.)
+
+@node Long, Executable Scripts, Read Terminal, Running gawk
+@subsection Running Long Programs
+
+@cindex running long programs
+@cindex @samp{-f} option
+@cindex program file
+@cindex file, @code{awk} program
+Sometimes your @code{awk} programs can be very long. In this case it is
+more convenient to put the program into a separate file. To tell
+@code{awk} to use that file for its program, you type:@refill
+
+@example
+awk -f @var{source-file} @var{input-file1} @var{input-file2} @dots{}
+@end example
+
+The @samp{-f} instructs the @code{awk} utility to get the @code{awk} program
+from the file @var{source-file}. Any file name can be used for
+@var{source-file}. For example, you could put the program:@refill
+
+@example
+/th/
+@end example
+
+@noindent
+into the file @file{th-prog}. Then this command:
+
+@example
+awk -f th-prog
+@end example
+
+@noindent
+does the same thing as this one:
+
+@example
+awk '/th/'
+@end example
+
+@noindent
+which was explained earlier (@pxref{Read Terminal, ,Running @code{awk} without Input Files}).
+Note that you don't usually need single quotes around the file name that you
+specify with @samp{-f}, because most file names don't contain any of the shell's
+special characters. Notice that in @file{th-prog}, the @code{awk}
+program did not have single quotes around it. The quotes are only needed
+for programs that are provided on the @code{awk} command line.
+
+If you want to identify your @code{awk} program files clearly as such,
+you can add the extension @file{.awk} to the file name. This doesn't
+affect the execution of the @code{awk} program, but it does make
+``housekeeping'' easier.
+
+@node Executable Scripts, , Long, Running gawk
+@c node-name, next, previous, up
+@subsection Executable @code{awk} Programs
+@cindex executable scripts
+@cindex scripts, executable
+@cindex self contained programs
+@cindex program, self contained
+@cindex @samp{#!}
+
+Once you have learned @code{awk}, you may want to write self-contained
+@code{awk} scripts, using the @samp{#!} script mechanism. You can do
+this on many Unix systems @footnote{The @samp{#!} mechanism works on
+Unix systems derived from Berkeley Unix, System V Release 4, and some System
+V Release 3 systems.} (and someday on GNU).@refill
+
+For example, you could create a text file named @file{hello}, containing
+the following (where @samp{BEGIN} is a feature we have not yet
+discussed):
+
+@example
+#! /bin/awk -f
+
+# a sample awk program
+BEGIN @{ print "hello, world" @}
+@end example
+
+@noindent
+After making this file executable (with the @code{chmod} command), you
+can simply type:
+
+@example
+hello
+@end example
+
+@noindent
+at the shell, and the system will arrange to run @code{awk} @footnote{The
+line beginning with @samp{#!} lists the full pathname of an interpreter
+to be run, and an optional initial command line argument to pass to that
+interpreter. The operating system then runs the interpreter with the given
+argument and the full argument list of the executed program. The first argument
+in the list is the full pathname of the @code{awk} program. The rest of the
+argument list will either be options to @code{awk}, or data files,
+or both.} as if you had typed:@refill
+
+@example
+awk -f hello
+@end example
+
+@noindent
+Self-contained @code{awk} scripts are useful when you want to write a
+program which users can invoke without knowing that the program is
+written in @code{awk}.
+
+@cindex shell scripts
+@cindex scripts, shell
+If your system does not support the @samp{#!} mechanism, you can get a
+similar effect using a regular shell script. It would look something
+like this:
+
+@example
+: The colon makes sure this script is executed by the Bourne shell.
+awk '@var{program}' "$@@"
+@end example
+
+Using this technique, it is @emph{vital} to enclose the @var{program} in
+single quotes to protect it from interpretation by the shell. If you
+omit the quotes, only a shell wizard can predict the results.
+
+The @samp{"$@@"} causes the shell to forward all the command line
+arguments to the @code{awk} program, without interpretation. The first
+line, which starts with a colon, is used so that this shell script will
+work even if invoked by a user who uses the C shell.
+@c Someday: (See @cite{The Bourne Again Shell}, by ??.)
+
+@node Comments, Statements/Lines, Running gawk, Getting Started
+@section Comments in @code{awk} Programs
+@cindex @samp{#}
+@cindex comments
+@cindex use of comments
+@cindex documenting @code{awk} programs
+@cindex programs, documenting
+
+A @dfn{comment} is some text that is included in a program for the sake
+of human readers, and that is not really part of the program. Comments
+can explain what the program does, and how it works. Nearly all
+programming languages have provisions for comments, because programs are
+typically hard to understand without their extra help.
+
+In the @code{awk} language, a comment starts with the sharp sign
+character, @samp{#}, and continues to the end of the line. The
+@code{awk} language ignores the rest of a line following a sharp sign.
+For example, we could have put the following into @file{th-prog}:@refill
+
+@smallexample
+# This program finds records containing the pattern @samp{th}. This is how
+# you continue comments on additional lines.
+/th/
+@end smallexample
+
+You can put comment lines into keyboard-composed throw-away @code{awk}
+programs also, but this usually isn't very useful; the purpose of a
+comment is to help you or another person understand the program at
+a later time.@refill
+
+@node Statements/Lines, When, Comments, Getting Started
+@section @code{awk} Statements versus Lines
+
+Most often, each line in an @code{awk} program is a separate statement or
+separate rule, like this:
+
+@example
+awk '/12/ @{ print $0 @}
+ /21/ @{ print $0 @}' BBS-list inventory-shipped
+@end example
+
+But sometimes statements can be more than one line, and lines can
+contain several statements. You can split a statement into multiple
+lines by inserting a newline after any of the following:@refill
+
+@example
+, @{ ? : || && do else
+@end example
+
+@noindent
+A newline at any other point is considered the end of the statement.
+(Splitting lines after @samp{?} and @samp{:} is a minor @code{gawk}
+extension. The @samp{?} and @samp{:} referred to here is the
+three operand conditional expression described in
+@ref{Conditional Exp, ,Conditional Expressions}.)@refill
+
+@cindex backslash continuation
+@cindex continuation of lines
+If you would like to split a single statement into two lines at a point
+where a newline would terminate it, you can @dfn{continue} it by ending the
+first line with a backslash character, @samp{\}. This is allowed
+absolutely anywhere in the statement, even in the middle of a string or
+regular expression. For example:
+
+@example
+awk '/This program is too long, so continue it\
+ on the next line/ @{ print $1 @}'
+@end example
+
+@noindent
+We have generally not used backslash continuation in the sample programs in
+this manual. Since in @code{gawk} there is no limit on the length of a line,
+it is never strictly necessary; it just makes programs prettier. We have
+preferred to make them even more pretty by keeping the statements short.
+Backslash continuation is most useful when your @code{awk} program is in a
+separate source file, instead of typed in on the command line. You should
+also note that many @code{awk} implementations are more picky about where
+you may use backslash continuation. For maximal portability of your @code{awk}
+programs, it is best not to split your lines in the middle of a regular
+expression or a string.@refill
+
+@strong{Warning: backslash continuation does not work as described above
+with the C shell.} Continuation with backslash works for @code{awk}
+programs in files, and also for one-shot programs @emph{provided} you
+are using a @sc{posix}-compliant shell, such as the Bourne shell or the
+Bourne-again shell. But the C shell used on Berkeley Unix behaves
+differently! There, you must use two backslashes in a row, followed by
+a newline.@refill
+
+@cindex multiple statements on one line
+When @code{awk} statements within one rule are short, you might want to put
+more than one of them on a line. You do this by separating the statements
+with a semicolon, @samp{;}.
+This also applies to the rules themselves.
+Thus, the previous program could have been written:@refill
+
+@example
+/12/ @{ print $0 @} ; /21/ @{ print $0 @}
+@end example
+
+@noindent
+@strong{Note:} the requirement that rules on the same line must be
+separated with a semicolon is a recent change in the @code{awk}
+language; it was done for consistency with the treatment of statements
+within an action.
+
+@node When, , Statements/Lines, Getting Started
+@section When to Use @code{awk}
+
+@cindex when to use @code{awk}
+@cindex applications of @code{awk}
+You might wonder how @code{awk} might be useful for you. Using additional
+utility programs, more advanced patterns, field separators, arithmetic
+statements, and other selection criteria, you can produce much more
+complex output. The @code{awk} language is very useful for producing
+reports from large amounts of raw data, such as summarizing information
+from the output of other utility programs like @code{ls}.
+(@xref{More Complex, ,A More Complex Example}.)
+
+Programs written with @code{awk} are usually much smaller than they would
+be in other languages. This makes @code{awk} programs easy to compose and
+use. Often @code{awk} programs can be quickly composed at your terminal,
+used once, and thrown away. Since @code{awk} programs are interpreted, you
+can avoid the usually lengthy edit-compile-test-debug cycle of software
+development.
+
+Complex programs have been written in @code{awk}, including a complete
+retargetable assembler for 8-bit microprocessors (@pxref{Glossary}, for
+more information) and a microcode assembler for a special purpose Prolog
+computer. However, @code{awk}'s capabilities are strained by tasks of
+such complexity.
+
+If you find yourself writing @code{awk} scripts of more than, say, a few
+hundred lines, you might consider using a different programming
+language. Emacs Lisp is a good choice if you need sophisticated string
+or pattern matching capabilities. The shell is also good at string and
+pattern matching; in addition, it allows powerful use of the system
+utilities. More conventional languages, such as C, C++, and Lisp, offer
+better facilities for system programming and for managing the complexity
+of large programs. Programs in these languages may require more lines
+of source code than the equivalent @code{awk} programs, but they are
+easier to maintain and usually run more efficiently.@refill
+
+@node Reading Files, Printing, Getting Started, Top
+@chapter Reading Input Files
+
+@cindex reading files
+@cindex input
+@cindex standard input
+@vindex FILENAME
+In the typical @code{awk} program, all input is read either from the
+standard input (by default the keyboard, but often a pipe from another
+command) or from files whose names you specify on the @code{awk} command
+line. If you specify input files, @code{awk} reads them in order, reading
+all the data from one before going on to the next. The name of the current
+input file can be found in the built-in variable @code{FILENAME}
+(@pxref{Built-in Variables}).@refill
+
+The input is read in units called records, and processed by the
+rules one record at a time. By default, each record is one line. Each
+record is split automatically into fields, to make it more
+convenient for a rule to work on its parts.
+
+On rare occasions you will need to use the @code{getline} command,
+which can do explicit input from any number of files
+(@pxref{Getline, ,Explicit Input with @code{getline}}).@refill
+
+@menu
+* Records:: Controlling how data is split into records.
+* Fields:: An introduction to fields.
+* Non-Constant Fields:: Non-constant Field Numbers.
+* Changing Fields:: Changing the Contents of a Field.
+* Field Separators:: The field separator and how to change it.
+* Constant Size:: Reading constant width data.
+* Multiple Line:: Reading multi-line records.
+* Getline:: Reading files under explicit program control
+ using the @code{getline} function.
+* Close Input:: Closing an input file (so you can read from
+ the beginning once more).
+@end menu
+
+@node Records, Fields, Reading Files, Reading Files
+@section How Input is Split into Records
+
+@cindex record separator
+The @code{awk} language divides its input into records and fields.
+Records are separated by a character called the @dfn{record separator}.
+By default, the record separator is the newline character, defining
+a record to be a single line of text.@refill
+
+@iftex
+@cindex changing the record separator
+@end iftex
+@vindex RS
+Sometimes you may want to use a different character to separate your
+records. You can use a different character by changing the built-in
+variable @code{RS}. The value of @code{RS} is a string that says how
+to separate records; the default value is @code{"\n"}, the string containing
+just a newline character. This is why records are, by default, single lines.
+
+@code{RS} can have any string as its value, but only the first character
+of the string is used as the record separator. The other characters are
+ignored. @code{RS} is exceptional in this regard; @code{awk} uses the
+full value of all its other built-in variables.@refill
+
+@ignore
+Someday this should be true!
+
+The value of @code{RS} is not limited to a one-character string. It can
+be any regular expression (@pxref{Regexp, ,Regular Expressions as Patterns}).
+In general, each record
+ends at the next string that matches the regular expression; the next
+record starts at the end of the matching string. This general rule is
+actually at work in the usual case, where @code{RS} contains just a
+newline: a record ends at the beginning of the next matching string (the
+next newline in the input) and the following record starts just after
+the end of this string (at the first character of the following line).
+The newline, since it matches @code{RS}, is not part of either record.@refill
+@end ignore
+
+You can change the value of @code{RS} in the @code{awk} program with the
+assignment operator, @samp{=} (@pxref{Assignment Ops, ,Assignment Expressions}).
+The new record-separator character should be enclosed in quotation marks to make
+a string constant. Often the right time to do this is at the beginning
+of execution, before any input has been processed, so that the very
+first record will be read with the proper separator. To do this, use
+the special @code{BEGIN} pattern
+(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}). For
+example:@refill
+
+@example
+awk 'BEGIN @{ RS = "/" @} ; @{ print $0 @}' BBS-list
+@end example
+
+@noindent
+changes the value of @code{RS} to @code{"/"}, before reading any input.
+This is a string whose first character is a slash; as a result, records
+are separated by slashes. Then the input file is read, and the second
+rule in the @code{awk} program (the action with no pattern) prints each
+record. Since each @code{print} statement adds a newline at the end of
+its output, the effect of this @code{awk} program is to copy the input
+with each slash changed to a newline.
+
+Another way to change the record separator is on the command line,
+using the variable-assignment feature
+(@pxref{Command Line, ,Invoking @code{awk}}).@refill
+
+@example
+awk '@{ print $0 @}' RS="/" BBS-list
+@end example
+
+@noindent
+This sets @code{RS} to @samp{/} before processing @file{BBS-list}.
+
+Reaching the end of an input file terminates the current input record,
+even if the last character in the file is not the character in @code{RS}.
+
+@ignore
+@c merge the preceding paragraph and this stuff into one paragraph
+@c and put it in an `expert info' section.
+This produces correct behavior in the vast majority of cases, although
+the following (extreme) pipeline prints a surprising @samp{1}. (There
+is one field, consisting of a newline.)
+
+@example
+echo | awk 'BEGIN @{ RS = "a" @} ; @{ print NF @}'
+@end example
+
+@end ignore
+
+The empty string, @code{""} (a string of no characters), has a special meaning
+as the value of @code{RS}: it means that records are separated only
+by blank lines. @xref{Multiple Line, ,Multiple-Line Records}, for more details.
+
+@cindex number of records, @code{NR} or @code{FNR}
+@vindex NR
+@vindex FNR
+The @code{awk} utility keeps track of the number of records that have
+been read so far from the current input file. This value is stored in a
+built-in variable called @code{FNR}. It is reset to zero when a new
+file is started. Another built-in variable, @code{NR}, is the total
+number of input records read so far from all files. It starts at zero
+but is never automatically reset to zero.
+
+If you change the value of @code{RS} in the middle of an @code{awk} run,
+the new value is used to delimit subsequent records, but the record
+currently being processed (and records already processed) are not
+affected.
+
+@node Fields, Non-Constant Fields, Records, Reading Files
+@section Examining Fields
+
+@cindex examining fields
+@cindex fields
+@cindex accessing fields
+When @code{awk} reads an input record, the record is
+automatically separated or @dfn{parsed} by the interpreter into chunks
+called @dfn{fields}. By default, fields are separated by whitespace,
+like words in a line.
+Whitespace in @code{awk} means any string of one or more spaces and/or
+tabs; other characters such as newline, formfeed, and so on, that are
+considered whitespace by other languages are @emph{not} considered
+whitespace by @code{awk}.@refill
+
+The purpose of fields is to make it more convenient for you to refer to
+these pieces of the record. You don't have to use them---you can
+operate on the whole record if you wish---but fields are what make
+simple @code{awk} programs so powerful.
+
+@cindex @code{$} (field operator)
+@cindex operators, @code{$}
+To refer to a field in an @code{awk} program, you use a dollar-sign,
+@samp{$}, followed by the number of the field you want. Thus, @code{$1}
+refers to the first field, @code{$2} to the second, and so on. For
+example, suppose the following is a line of input:@refill
+
+@example
+This seems like a pretty nice example.
+@end example
+
+@noindent
+Here the first field, or @code{$1}, is @samp{This}; the second field, or
+@code{$2}, is @samp{seems}; and so on. Note that the last field,
+@code{$7}, is @samp{example.}. Because there is no space between the
+@samp{e} and the @samp{.}, the period is considered part of the seventh
+field.@refill
+
+No matter how many fields there are, the last field in a record can be
+represented by @code{$NF}. So, in the example above, @code{$NF} would
+be the same as @code{$7}, which is @samp{example.}. Why this works is
+explained below (@pxref{Non-Constant Fields, ,Non-constant Field Numbers}).
+If you try to refer to a field beyond the last one, such as @code{$8}
+when the record has only 7 fields, you get the empty string.@refill
+
+@vindex NF
+@cindex number of fields, @code{NF}
+Plain @code{NF}, with no @samp{$}, is a built-in variable whose value
+is the number of fields in the current record.
+
+@code{$0}, which looks like an attempt to refer to the zeroth field, is
+a special case: it represents the whole input record. This is what you
+would use if you weren't interested in fields.
+
+Here are some more examples:
+
+@example
+awk '$1 ~ /foo/ @{ print $0 @}' BBS-list
+@end example
+
+@noindent
+This example prints each record in the file @file{BBS-list} whose first
+field contains the string @samp{foo}. The operator @samp{~} is called a
+@dfn{matching operator} (@pxref{Comparison Ops, ,Comparison Expressions});
+it tests whether a string (here, the field @code{$1}) matches a given regular
+expression.@refill
+
+By contrast, the following example:
+
+@example
+awk '/foo/ @{ print $1, $NF @}' BBS-list
+@end example
+
+@noindent
+looks for @samp{foo} in @emph{the entire record} and prints the first
+field and the last field for each input record containing a
+match.@refill
+
+@node Non-Constant Fields, Changing Fields, Fields, Reading Files
+@section Non-constant Field Numbers
+
+The number of a field does not need to be a constant. Any expression in
+the @code{awk} language can be used after a @samp{$} to refer to a
+field. The value of the expression specifies the field number. If the
+value is a string, rather than a number, it is converted to a number.
+Consider this example:@refill
+
+@example
+awk '@{ print $NR @}'
+@end example
+
+@noindent
+Recall that @code{NR} is the number of records read so far: 1 in the
+first record, 2 in the second, etc. So this example prints the first
+field of the first record, the second field of the second record, and so
+on. For the twentieth record, field number 20 is printed; most likely,
+the record has fewer than 20 fields, so this prints a blank line.
+
+Here is another example of using expressions as field numbers:
+
+@example
+awk '@{ print $(2*2) @}' BBS-list
+@end example
+
+The @code{awk} language must evaluate the expression @code{(2*2)} and use
+its value as the number of the field to print. The @samp{*} sign
+represents multiplication, so the expression @code{2*2} evaluates to 4.
+The parentheses are used so that the multiplication is done before the
+@samp{$} operation; they are necessary whenever there is a binary
+operator in the field-number expression. This example, then, prints the
+hours of operation (the fourth field) for every line of the file
+@file{BBS-list}.@refill
+
+If the field number you compute is zero, you get the entire record.
+Thus, @code{$(2-2)} has the same value as @code{$0}. Negative field
+numbers are not allowed.
+
+The number of fields in the current record is stored in the built-in
+variable @code{NF} (@pxref{Built-in Variables}). The expression
+@code{$NF} is not a special feature: it is the direct consequence of
+evaluating @code{NF} and using its value as a field number.
+
+@node Changing Fields, Field Separators, Non-Constant Fields, Reading Files
+@section Changing the Contents of a Field
+
+@cindex field, changing contents of
+@cindex changing contents of a field
+@cindex assignment to fields
+You can change the contents of a field as seen by @code{awk} within an
+@code{awk} program; this changes what @code{awk} perceives as the
+current input record. (The actual input is untouched: @code{awk} never
+modifies the input file.)
+
+Consider this example:
+
+@smallexample
+awk '@{ $3 = $2 - 10; print $2, $3 @}' inventory-shipped
+@end smallexample
+
+@noindent
+The @samp{-} sign represents subtraction, so this program reassigns
+field three, @code{$3}, to be the value of field two minus ten,
+@code{$2 - 10}. (@xref{Arithmetic Ops, ,Arithmetic Operators}.)
+Then field two, and the new value for field three, are printed.
+
+In order for this to work, the text in field @code{$2} must make sense
+as a number; the string of characters must be converted to a number in
+order for the computer to do arithmetic on it. The number resulting
+from the subtraction is converted back to a string of characters which
+then becomes field three.
+@xref{Conversion, ,Conversion of Strings and Numbers}.@refill
+
+When you change the value of a field (as perceived by @code{awk}), the
+text of the input record is recalculated to contain the new field where
+the old one was. Therefore, @code{$0} changes to reflect the altered
+field. Thus,
+
+@smallexample
+awk '@{ $2 = $2 - 10; print $0 @}' inventory-shipped
+@end smallexample
+
+@noindent
+prints a copy of the input file, with 10 subtracted from the second
+field of each line.
+
+You can also assign contents to fields that are out of range. For
+example:
+
+@smallexample
+awk '@{ $6 = ($5 + $4 + $3 + $2) ; print $6 @}' inventory-shipped
+@end smallexample
+
+@noindent
+We've just created @code{$6}, whose value is the sum of fields
+@code{$2}, @code{$3}, @code{$4}, and @code{$5}. The @samp{+} sign
+represents addition. For the file @file{inventory-shipped}, @code{$6}
+represents the total number of parcels shipped for a particular month.
+
+Creating a new field changes the internal @code{awk} copy of the current
+input record---the value of @code{$0}. Thus, if you do @samp{print $0}
+after adding a field, the record printed includes the new field, with
+the appropriate number of field separators between it and the previously
+existing fields.
+
+This recomputation affects and is affected by several features not yet
+discussed, in particular, the @dfn{output field separator}, @code{OFS},
+which is used to separate the fields (@pxref{Output Separators}), and
+@code{NF} (the number of fields; @pxref{Fields, ,Examining Fields}).
+For example, the value of @code{NF} is set to the number of the highest
+field you create.@refill
+
+Note, however, that merely @emph{referencing} an out-of-range field
+does @emph{not} change the value of either @code{$0} or @code{NF}.
+Referencing an out-of-range field merely produces a null string. For
+example:@refill
+
+@smallexample
+if ($(NF+1) != "")
+ print "can't happen"
+else
+ print "everything is normal"
+@end smallexample
+
+@noindent
+should print @samp{everything is normal}, because @code{NF+1} is certain
+to be out of range. (@xref{If Statement, ,The @code{if} Statement},
+for more information about @code{awk}'s @code{if-else} statements.)@refill
+
+It is important to note that assigning to a field will change the
+value of @code{$0}, but will not change the value of @code{NF},
+even when you assign the null string to a field. For example:
+
+@smallexample
+echo a b c d | awk '@{ OFS = ":"; $2 = "" ; print ; print NF @}'
+@end smallexample
+
+@noindent
+prints
+
+@smallexample
+a::c:d
+4
+@end smallexample
+
+@noindent
+The field is still there, it just has an empty value. You can tell
+because there are two colons in a row.
+
+@node Field Separators, Constant Size, Changing Fields, Reading Files
+@section Specifying how Fields are Separated
+@vindex FS
+@cindex fields, separating
+@cindex field separator, @code{FS}
+@cindex @samp{-F} option
+
+(This section is rather long; it describes one of the most fundamental
+operations in @code{awk}. If you are a novice with @code{awk}, we
+recommend that you re-read this section after you have studied the
+section on regular expressions, @ref{Regexp, ,Regular Expressions as Patterns}.)
+
+The way @code{awk} splits an input record into fields is controlled by
+the @dfn{field separator}, which is a single character or a regular
+expression. @code{awk} scans the input record for matches for the
+separator; the fields themselves are the text between the matches. For
+example, if the field separator is @samp{oo}, then the following line:
+
+@smallexample
+moo goo gai pan
+@end smallexample
+
+@noindent
+would be split into three fields: @samp{m}, @samp{@ g} and @samp{@ gai@
+pan}.
+
+The field separator is represented by the built-in variable @code{FS}.
+Shell programmers take note! @code{awk} does not use the name @code{IFS}
+which is used by the shell.@refill
+
+You can change the value of @code{FS} in the @code{awk} program with the
+assignment operator, @samp{=} (@pxref{Assignment Ops, ,Assignment Expressions}).
+Often the right time to do this is at the beginning of execution,
+before any input has been processed, so that the very first record
+will be read with the proper separator. To do this, use the special
+@code{BEGIN} pattern
+(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}).
+For example, here we set the value of @code{FS} to the string
+@code{","}:@refill
+
+@smallexample
+awk 'BEGIN @{ FS = "," @} ; @{ print $2 @}'
+@end smallexample
+
+@noindent
+Given the input line,
+
+@smallexample
+John Q. Smith, 29 Oak St., Walamazoo, MI 42139
+@end smallexample
+
+@noindent
+this @code{awk} program extracts the string @samp{@ 29 Oak St.}.
+
+@cindex field separator, choice of
+@cindex regular expressions as field separators
+Sometimes your input data will contain separator characters that don't
+separate fields the way you thought they would. For instance, the
+person's name in the example we've been using might have a title or
+suffix attached, such as @samp{John Q. Smith, LXIX}. From input
+containing such a name:
+
+@smallexample
+John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
+@end smallexample
+
+@noindent
+the previous sample program would extract @samp{@ LXIX}, instead of
+@samp{@ 29 Oak St.}. If you were expecting the program to print the
+address, you would be surprised. So choose your data layout and
+separator characters carefully to prevent such problems.
+
+As you know, by default, fields are separated by whitespace sequences
+(spaces and tabs), not by single spaces: two spaces in a row do not
+delimit an empty field. The default value of the field separator is a
+string @w{@code{" "}} containing a single space. If this value were
+interpreted in the usual way, each space character would separate
+fields, so two spaces in a row would make an empty field between them.
+The reason this does not happen is that a single space as the value of
+@code{FS} is a special case: it is taken to specify the default manner
+of delimiting fields.
+
+If @code{FS} is any other single character, such as @code{","}, then
+each occurrence of that character separates two fields. Two consecutive
+occurrences delimit an empty field. If the character occurs at the
+beginning or the end of the line, that too delimits an empty field. The
+space character is the only single character which does not follow these
+rules.
+
+More generally, the value of @code{FS} may be a string containing any
+regular expression. Then each match in the record for the regular
+expression separates fields. For example, the assignment:@refill
+
+@smallexample
+FS = ", \t"
+@end smallexample
+
+@noindent
+makes every area of an input line that consists of a comma followed by a
+space and a tab, into a field separator. (@samp{\t} stands for a
+tab.)@refill
+
+For a less trivial example of a regular expression, suppose you want
+single spaces to separate fields the way single commas were used above.
+You can set @code{FS} to @w{@code{"[@ ]"}}. This regular expression
+matches a single space and nothing else.
+
+@c the following index entry is an overfull hbox. --mew 30jan1992
+@cindex field separator: on command line
+@cindex command line, setting @code{FS} on
+@code{FS} can be set on the command line. You use the @samp{-F} argument to
+do so. For example:
+
+@smallexample
+awk -F, '@var{program}' @var{input-files}
+@end smallexample
+
+@noindent
+sets @code{FS} to be the @samp{,} character. Notice that the argument uses
+a capital @samp{F}. Contrast this with @samp{-f}, which specifies a file
+containing an @code{awk} program. Case is significant in command options:
+the @samp{-F} and @samp{-f} options have nothing to do with each other.
+You can use both options at the same time to set the @code{FS} argument
+@emph{and} get an @code{awk} program from a file.@refill
+
+@c begin expert info
+The value used for the argument to @samp{-F} is processed in exactly the
+same way as assignments to the built-in variable @code{FS}. This means that
+if the field separator contains special characters, they must be escaped
+appropriately. For example, to use a @samp{\} as the field separator, you
+would have to type:
+
+@smallexample
+# same as FS = "\\"
+awk -F\\\\ '@dots{}' files @dots{}
+@end smallexample
+
+@noindent
+Since @samp{\} is used for quoting in the shell, @code{awk} will see
+@samp{-F\\}. Then @code{awk} processes the @samp{\\} for escape
+characters (@pxref{Constants, ,Constant Expressions}), finally yielding
+a single @samp{\} to be used for the field separator.
+@c end expert info
+
+As a special case, in compatibility mode
+(@pxref{Command Line, ,Invoking @code{awk}}), if the
+argument to @samp{-F} is @samp{t}, then @code{FS} is set to the tab
+character. (This is because if you type @samp{-F\t}, without the quotes,
+at the shell, the @samp{\} gets deleted, so @code{awk} figures that you
+really want your fields to be separated with tabs, and not @samp{t}s.
+Use @samp{-v FS="t"} on the command line if you really do want to separate
+your fields with @samp{t}s.)@refill
+
+For example, let's use an @code{awk} program file called @file{baud.awk}
+that contains the pattern @code{/300/}, and the action @samp{print $1}.
+Here is the program:
+
+@smallexample
+/300/ @{ print $1 @}
+@end smallexample
+
+Let's also set @code{FS} to be the @samp{-} character, and run the
+program on the file @file{BBS-list}. The following command prints a
+list of the names of the bulletin boards that operate at 300 baud and
+the first three digits of their phone numbers:@refill
+
+@smallexample
+awk -F- -f baud.awk BBS-list
+@end smallexample
+
+@noindent
+It produces this output:
+
+@smallexample
+aardvark 555
+alpo
+barfly 555
+bites 555
+camelot 555
+core 555
+fooey 555
+foot 555
+macfoo 555
+sdace 555
+sabafoo 555
+@end smallexample
+
+@noindent
+Note the second line of output. If you check the original file, you will
+see that the second line looked like this:
+
+@smallexample
+alpo-net 555-3412 2400/1200/300 A
+@end smallexample
+
+The @samp{-} as part of the system's name was used as the field
+separator, instead of the @samp{-} in the phone number that was
+originally intended. This demonstrates why you have to be careful in
+choosing your field and record separators.
+
+The following program searches the system password file, and prints
+the entries for users who have no password:
+
+@smallexample
+awk -F: '$2 == ""' /etc/passwd
+@end smallexample
+
+@noindent
+Here we use the @samp{-F} option on the command line to set the field
+separator. Note that fields in @file{/etc/passwd} are separated by
+colons. The second field represents a user's encrypted password, but if
+the field is empty, that user has no password.
+
+@c begin expert info
+According to the @sc{posix} standard, @code{awk} is supposed to behave
+as if each record is split into fields at the time that it is read.
+In particular, this means that you can change the value of @code{FS}
+after a record is read, but before any of the fields are referenced.
+The value of the fields (i.e. how they were split) should reflect the
+old value of @code{FS}, not the new one.
+
+However, many implementations of @code{awk} do not do this. Instead,
+they defer splitting the fields until a field reference actually happens,
+using the @emph{current} value of @code{FS}! This behavior can be difficult
+to diagnose. The following example illustrates the results of the two methods.
+(The @code{sed} command prints just the first line of @file{/etc/passwd}.)
+
+@smallexample
+sed 1q /etc/passwd | awk '@{ FS = ":" ; print $1 @}'
+@end smallexample
+
+@noindent
+will usually print
+
+@smallexample
+root
+@end smallexample
+
+@noindent
+on an incorrect implementation of @code{awk}, while @code{gawk}
+will print something like
+
+@smallexample
+root:nSijPlPhZZwgE:0:0:Root:/:
+@end smallexample
+@c end expert info
+
+@c begin expert info
+There is an important difference between the two cases of @samp{FS = @w{" "}}
+(a single blank) and @samp{FS = @w{"[ \t]+"}} (which is a regular expression
+matching one or more blanks or tabs). For both values of @code{FS}, fields
+are separated by runs of blanks and/or tabs. However, when the value of
+@code{FS} is @code{" "}, @code{awk} will strip leading and trailing whitespace
+from the record, and then decide where the fields are.
+
+For example, the following expression prints @samp{b}:
+
+@smallexample
+echo ' a b c d ' | awk '@{ print $2 @}'
+@end smallexample
+
+@noindent
+However, the following prints @samp{a}:
+
+@smallexample
+echo ' a b c d ' | awk 'BEGIN @{ FS = "[ \t]+" @} ; @{ print $2 @}'
+@end smallexample
+
+@noindent
+In this case, the first field is null.
+
+The stripping of leading and trailing whitespace also comes into
+play whenever @code{$0} is recomputed. For instance, this pipeline
+
+@smallexample
+echo ' a b c d' | awk '@{ print; $2 = $2; print @}'
+@end smallexample
+
+@noindent
+produces this output:
+
+@smallexample
+ a b c d
+a b c d
+@end smallexample
+
+@noindent
+The first @code{print} statement prints the record as it was read,
+with leading whitespace intact. The assignment to @code{$2} rebuilds
+@code{$0} by concatenating @code{$1} through @code{$NF} together,
+separated by the value of @code{OFS}. Since the leading whitespace
+was ignored when finding @code{$1}, it is not part of the new @code{$0}.
+Finally, the last @code{print} statement prints the new @code{$0}.
+@c end expert info
+
+The following table summarizes how fields are split, based on the
+value of @code{FS}.
+
+@table @code
+@item FS == " "
+Fields are separated by runs of whitespace. Leading and trailing
+whitespace are ignored. This is the default.
+
+@item FS == @var{any single character}
+Fields are separated by each occurrence of the character. Multiple
+successive occurrences delimit empty fields, as do leading and
+trailing occurrences.
+
+@item FS == @var{regexp}
+Fields are separated by occurrences of characters that match @var{regexp}.
+Leading and trailing matches of @var{regexp} delimit empty fields.
+@end table
+
+@node Constant Size, Multiple Line, Field Separators, Reading Files
+@section Reading Fixed-width Data
+
+(This section discusses an advanced, experimental feature. If you are
+a novice @code{awk} user, you may wish to skip it on the first reading.)
+
+@code{gawk} 2.13 introduced a new facility for dealing with fixed-width fields
+with no distinctive field separator. Data of this nature arises typically
+in one of at least two ways: the input for old FORTRAN programs where
+numbers are run together, and the output of programs that did not anticipate
+the use of their output as input for other programs.
+
+An example of the latter is a table where all the columns are lined up by
+the use of a variable number of spaces and @emph{empty fields are just
+spaces}. Clearly, @code{awk}'s normal field splitting based on @code{FS}
+will not work well in this case. (Although a portable @code{awk} program
+can use a series of @code{substr} calls on @code{$0}, this is awkward and
+inefficient for a large number of fields.)@refill
+
+The splitting of an input record into fixed-width fields is specified by
+assigning a string containing space-separated numbers to the built-in
+variable @code{FIELDWIDTHS}. Each number specifies the width of the field
+@emph{including} columns between fields. If you want to ignore the columns
+between fields, you can specify the width as a separate field that is
+subsequently ignored.
+
+The following data is the output of the @code{w} utility. It is useful
+to illustrate the use of @code{FIELDWIDTHS}.
+
+@smallexample
+ 10:06pm up 21 days, 14:04, 23 users
+User tty login@ idle JCPU PCPU what
+hzuo ttyV0 8:58pm 9 5 vi p24.tex
+hzang ttyV3 6:37pm 50 -csh
+eklye ttyV5 9:53pm 7 1 em thes.tex
+dportein ttyV6 8:17pm 1:47 -csh
+gierd ttyD3 10:00pm 1 elm
+dave ttyD4 9:47pm 4 4 w
+brent ttyp0 26Jun91 4:46 26:46 4:41 bash
+dave ttyq4 26Jun9115days 46 46 wnewmail
+@end smallexample
+
+The following program takes the above input, converts the idle time to
+number of seconds and prints out the first two fields and the calculated
+idle time. (This program uses a number of @code{awk} features that
+haven't been introduced yet.)@refill
+
+@smallexample
+BEGIN @{ FIELDWIDTHS = "9 6 10 6 7 7 35" @}
+NR > 2 @{
+ idle = $4
+ sub(/^ */, "", idle) # strip leading spaces
+ if (idle == "") idle = 0
+ if (idle ~ /:/) @{ split(idle, t, ":"); idle = t[1] * 60 + t[2] @}
+ if (idle ~ /days/) @{ idle *= 24 * 60 * 60 @}
+
+ print $1, $2, idle
+@}
+@end smallexample
+
+Here is the result of running the program on the data:
+
+@smallexample
+hzuo ttyV0 0
+hzang ttyV3 50
+eklye ttyV5 0
+dportein ttyV6 107
+gierd ttyD3 1
+dave ttyD4 0
+brent ttyp0 286
+dave ttyq4 1296000
+@end smallexample
+
+Another (possibly more practical) example of fixed-width input data
+would be the input from a deck of balloting cards. In some parts of
+the United States, voters make their choices by punching holes in computer
+cards. These cards are then processed to count the votes for any particular
+candidate or on any particular issue. Since a voter may choose not to
+vote on some issue, any column on the card may be empty. An @code{awk}
+program for processing such data could use the @code{FIELDWIDTHS} feature
+to simplify reading the data.@refill
+
+@c of course, getting gawk to run on a system with card readers is
+@c another story!
+
+This feature is still experimental, and will likely evolve over time.
+
+@node Multiple Line, Getline, Constant Size, Reading Files
+@section Multiple-Line Records
+
+@cindex multiple line records
+@cindex input, multiple line records
+@cindex reading files, multiple line records
+@cindex records, multiple line
+In some data bases, a single line cannot conveniently hold all the
+information in one entry. In such cases, you can use multi-line
+records.
+
+The first step in doing this is to choose your data format: when records
+are not defined as single lines, how do you want to define them?
+What should separate records?
+
+One technique is to use an unusual character or string to separate
+records. For example, you could use the formfeed character (written
+@code{\f} in @code{awk}, as in C) to separate them, making each record
+a page of the file. To do this, just set the variable @code{RS} to
+@code{"\f"} (a string containing the formfeed character). Any
+other character could equally well be used, as long as it won't be part
+of the data in a record.@refill
+
+@ignore
+Another technique is to have blank lines separate records. The string
+@code{"^\n+"} is a regular expression that matches any sequence of
+newlines starting at the beginning of a line---in other words, it
+matches a sequence of blank lines. If you set @code{RS} to this string,
+a record always ends at the first blank line encountered. In
+addition, a regular expression always matches the longest possible
+sequence when there is a choice. So the next record doesn't start until
+the first nonblank line that follows---no matter how many blank lines
+appear in a row, they are considered one record-separator.
+@end ignore
+
+Another technique is to have blank lines separate records. By a special
+dispensation, a null string as the value of @code{RS} indicates that
+records are separated by one or more blank lines. If you set @code{RS}
+to the null string, a record always ends at the first blank line
+encountered. And the next record doesn't start until the first nonblank
+line that follows---no matter how many blank lines appear in a row, they
+are considered one record-separator. (End of file is also considered
+a record separator.)@refill
+@c !!! This use of `end of file' is confusing. Needs to be clarified.
+
+The second step is to separate the fields in the record. One way to do
+this is to put each field on a separate line: to do this, just set the
+variable @code{FS} to the string @code{"\n"}. (This simple regular
+expression matches a single newline.)
+
+Another way to separate fields is to divide each of the lines into fields
+in the normal manner. This happens by default as a result of a special
+feature: when @code{RS} is set to the null string, the newline character
+@emph{always} acts as a field separator. This is in addition to whatever
+field separations result from @code{FS}.
+
+The original motivation for this special exception was probably so that
+you get useful behavior in the default case (i.e., @w{@code{FS == " "}}).
+This feature can be a problem if you really don't want the
+newline character to separate fields, since there is no way to
+prevent it. However, you can work around this by using the @code{split}
+function to break up the record manually
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).@refill
+
+@ignore
+Here are two ways to use records separated by blank lines and break each
+line into fields normally:
+
+@example
+awk 'BEGIN @{ RS = ""; FS = "[ \t\n]+" @} @{ print $1 @}' BBS-list
+
+@exdent @r{or}
+
+awk 'BEGIN @{ RS = "^\n+"; FS = "[ \t\n]+" @} @{ print $1 @}' BBS-list
+@end example
+@end ignore
+
+@ignore
+Here is how to use records separated by blank lines and break each
+line into fields normally:
+
+@example
+awk 'BEGIN @{ RS = ""; FS = "[ \t\n]+" @} ; @{ print $1 @}' BBS-list
+@end example
+@end ignore
+
+@node Getline, Close Input, Multiple Line, Reading Files
+@section Explicit Input with @code{getline}
+
+@findex getline
+@cindex input, explicit
+@cindex explicit input
+@cindex input, @code{getline} command
+@cindex reading files, @code{getline} command
+So far we have been getting our input files from @code{awk}'s main
+input stream---either the standard input (usually your terminal) or the
+files specified on the command line. The @code{awk} language has a
+special built-in command called @code{getline} that
+can be used to read input under your explicit control.@refill
+
+This command is quite complex and should @emph{not} be used by
+beginners. It is covered here because this is the chapter on input.
+The examples that follow the explanation of the @code{getline} command
+include material that has not been covered yet. Therefore, come back
+and study the @code{getline} command @emph{after} you have reviewed the
+rest of this manual and have a good knowledge of how @code{awk} works.
+
+@vindex ERRNO
+@cindex differences: @code{gawk} and @code{awk}
+@code{getline} returns 1 if it finds a record, and 0 if the end of the
+file is encountered. If there is some error in getting a record, such
+as a file that cannot be opened, then @code{getline} returns @minus{}1.
+In this case, @code{gawk} sets the variable @code{ERRNO} to a string
+describing the error that occurred.
+
+In the following examples, @var{command} stands for a string value that
+represents a shell command.
+
+@table @code
+@item getline
+The @code{getline} command can be used without arguments to read input
+from the current input file. All it does in this case is read the next
+input record and split it up into fields. This is useful if you've
+finished processing the current record, but you want to do some special
+processing @emph{right now} on the next record. Here's an
+example:@refill
+
+@example
+awk '@{
+ if (t = index($0, "/*")) @{
+ if (t > 1)
+ tmp = substr($0, 1, t - 1)
+ else
+ tmp = ""
+ u = index(substr($0, t + 2), "*/")
+ while (u == 0) @{
+ getline
+ t = -1
+ u = index($0, "*/")
+ @}
+ if (u <= length($0) - 2)
+ $0 = tmp substr($0, t + u + 3)
+ else
+ $0 = tmp
+ @}
+ print $0
+@}'
+@end example
+
+This @code{awk} program deletes all C-style comments, @samp{/* @dots{}
+*/}, from the input. By replacing the @samp{print $0} with other
+statements, you could perform more complicated processing on the
+decommented input, like searching for matches of a regular
+expression. (This program has a subtle problem---can you spot it?)
+
+@c the program to remove comments doesn't work if one
+@c comment ends and another begins on the same line. (Your
+@c idea for restart would be useful here). --- brennan@boeing.com
+
+This form of the @code{getline} command sets @code{NF} (the number of
+fields; @pxref{Fields, ,Examining Fields}), @code{NR} (the number of
+records read so far; @pxref{Records, ,How Input is Split into Records}),
+@code{FNR} (the number of records read from this input file), and the
+value of @code{$0}.
+
+@strong{Note:} the new value of @code{$0} is used in testing
+the patterns of any subsequent rules. The original value
+of @code{$0} that triggered the rule which executed @code{getline}
+is lost. By contrast, the @code{next} statement reads a new record
+but immediately begins processing it normally, starting with the first
+rule in the program. @xref{Next Statement, ,The @code{next} Statement}.
+
+@item getline @var{var}
+This form of @code{getline} reads a record into the variable @var{var}.
+This is useful when you want your program to read the next record from
+the current input file, but you don't want to subject the record to the
+normal input processing.
+
+For example, suppose the next line is a comment, or a special string,
+and you want to read it, but you must make certain that it won't trigger
+any rules. This version of @code{getline} allows you to read that line
+and store it in a variable so that the main
+read-a-line-and-check-each-rule loop of @code{awk} never sees it.
+
+The following example swaps every two lines of input. For example, given:
+
+@example
+wan
+tew
+free
+phore
+@end example
+
+@noindent
+it outputs:
+
+@example
+tew
+wan
+phore
+free
+@end example
+
+@noindent
+Here's the program:
+
+@example
+@group
+awk '@{
+ if ((getline tmp) > 0) @{
+ print tmp
+ print $0
+ @} else
+ print $0
+@}'
+@end group
+@end example
+
+The @code{getline} function used in this way sets only the variables
+@code{NR} and @code{FNR} (and of course, @var{var}). The record is not
+split into fields, so the values of the fields (including @code{$0}) and
+the value of @code{NF} do not change.@refill
+
+@item getline < @var{file}
+@cindex input redirection
+@cindex redirection of input
+This form of the @code{getline} function takes its input from the file
+@var{file}. Here @var{file} is a string-valued expression that
+specifies the file name. @samp{< @var{file}} is called a @dfn{redirection}
+since it directs input to come from a different place.
+
+This form is useful if you want to read your input from a particular
+file, instead of from the main input stream. For example, the following
+program reads its input record from the file @file{foo.input} when it
+encounters a first field with a value equal to 10 in the current input
+file.@refill
+
+@example
+awk '@{
+ if ($1 == 10) @{
+ getline < "foo.input"
+ print
+ @} else
+ print
+@}'
+@end example
+
+Since the main input stream is not used, the values of @code{NR} and
+@code{FNR} are not changed. But the record read is split into fields in
+the normal manner, so the values of @code{$0} and other fields are
+changed. So is the value of @code{NF}.
+
+This does not cause the record to be tested against all the patterns
+in the @code{awk} program, in the way that would happen if the record
+were read normally by the main processing loop of @code{awk}. However
+the new record is tested against any subsequent rules, just as when
+@code{getline} is used without a redirection.
+
+@item getline @var{var} < @var{file}
+This form of the @code{getline} function takes its input from the file
+@var{file} and puts it in the variable @var{var}. As above, @var{file}
+is a string-valued expression that specifies the file from which to read.
+
+In this version of @code{getline}, none of the built-in variables are
+changed, and the record is not split into fields. The only variable
+changed is @var{var}.
+
+For example, the following program copies all the input files to the
+output, except for records that say @w{@samp{@@include @var{filename}}}.
+Such a record is replaced by the contents of the file
+@var{filename}.@refill
+
+@example
+awk '@{
+ if (NF == 2 && $1 == "@@include") @{
+ while ((getline line < $2) > 0)
+ print line
+ close($2)
+ @} else
+ print
+@}'
+@end example
+
+Note here how the name of the extra input file is not built into
+the program; it is taken from the data, from the second field on
+the @samp{@@include} line.@refill
+
+The @code{close} function is called to ensure that if two identical
+@samp{@@include} lines appear in the input, the entire specified file is
+included twice. @xref{Close Input, ,Closing Input Files and Pipes}.@refill
+
+One deficiency of this program is that it does not process nested
+@samp{@@include} statements the way a true macro preprocessor would.
+
+@item @var{command} | getline
+You can @dfn{pipe} the output of a command into @code{getline}. A pipe is
+simply a way to link the output of one program to the input of another. In
+this case, the string @var{command} is run as a shell command and its output
+is piped into @code{awk} to be used as input. This form of @code{getline}
+reads one record from the pipe.
+
+For example, the following program copies input to output, except for lines
+that begin with @samp{@@execute}, which are replaced by the output produced by
+running the rest of the line as a shell command:
+
+@example
+awk '@{
+ if ($1 == "@@execute") @{
+ tmp = substr($0, 10)
+ while ((tmp | getline) > 0)
+ print
+ close(tmp)
+ @} else
+ print
+@}'
+@end example
+
+@noindent
+The @code{close} function is called to ensure that if two identical
+@samp{@@execute} lines appear in the input, the command is run for
+each one. @xref{Close Input, ,Closing Input Files and Pipes}.
+
+Given the input:
+
+@example
+foo
+bar
+baz
+@@execute who
+bletch
+@end example
+
+@noindent
+the program might produce:
+
+@example
+foo
+bar
+baz
+hack ttyv0 Jul 13 14:22
+hack ttyp0 Jul 13 14:23 (gnu:0)
+hack ttyp1 Jul 13 14:23 (gnu:0)
+hack ttyp2 Jul 13 14:23 (gnu:0)
+hack ttyp3 Jul 13 14:23 (gnu:0)
+bletch
+@end example
+
+@noindent
+Notice that this program ran the command @code{who} and printed the result.
+(If you try this program yourself, you will get different results, showing
+you who is logged in on your system.)
+
+This variation of @code{getline} splits the record into fields, sets the
+value of @code{NF} and recomputes the value of @code{$0}. The values of
+@code{NR} and @code{FNR} are not changed.
+
+@item @var{command} | getline @var{var}
+The output of the command @var{command} is sent through a pipe to
+@code{getline} and into the variable @var{var}. For example, the
+following program reads the current date and time into the variable
+@code{current_time}, using the @code{date} utility, and then
+prints it.@refill
+
+@example
+awk 'BEGIN @{
+ "date" | getline current_time
+ close("date")
+ print "Report printed on " current_time
+@}'
+@end example
+
+In this version of @code{getline}, none of the built-in variables are
+changed, and the record is not split into fields.
+@end table
+
+@node Close Input, , Getline, Reading Files
+@section Closing Input Files and Pipes
+@cindex closing input files and pipes
+@findex close
+
+If the same file name or the same shell command is used with
+@code{getline} more than once during the execution of an @code{awk}
+program, the file is opened (or the command is executed) only the first time.
+At that time, the first record of input is read from that file or command.
+The next time the same file or command is used in @code{getline}, another
+record is read from it, and so on.
+
+This implies that if you want to start reading the same file again from
+the beginning, or if you want to rerun a shell command (rather than
+reading more output from the command), you must take special steps.
+What you must do is use the @code{close} function, as follows:
+
+@example
+close(@var{filename})
+@end example
+
+@noindent
+or
+
+@example
+close(@var{command})
+@end example
+
+The argument @var{filename} or @var{command} can be any expression. Its
+value must exactly equal the string that was used to open the file or
+start the command---for example, if you open a pipe with this:
+
+@example
+"sort -r names" | getline foo
+@end example
+
+@noindent
+then you must close it with this:
+
+@example
+close("sort -r names")
+@end example
+
+Once this function call is executed, the next @code{getline} from that
+file or command will reopen the file or rerun the command.
+
+@iftex
+@vindex ERRNO
+@cindex differences: @code{gawk} and @code{awk}
+@end iftex
+@code{close} returns a value of zero if the close succeeded.
+Otherwise, the value will be non-zero.
+In this case, @code{gawk} sets the variable @code{ERRNO} to a string
+describing the error that occurred.
+
+@node Printing, One-liners, Reading Files, Top
+@chapter Printing Output
+
+@cindex printing
+@cindex output
+One of the most common things that actions do is to output or @dfn{print}
+some or all of the input. For simple output, use the @code{print}
+statement. For fancier formatting use the @code{printf} statement.
+Both are described in this chapter.
+
+@menu
+* Print:: The @code{print} statement.
+* Print Examples:: Simple examples of @code{print} statements.
+* Output Separators:: The output separators and how to change them.
+* OFMT:: Controlling Numeric Output With @code{print}.
+* Printf:: The @code{printf} statement.
+* Redirection:: How to redirect output to multiple
+ files and pipes.
+* Special Files:: File name interpretation in @code{gawk}.
+ @code{gawk} allows access to
+ inherited file descriptors.
+@end menu
+
+@node Print, Print Examples, Printing, Printing
+@section The @code{print} Statement
+@cindex @code{print} statement
+
+The @code{print} statement does output with simple, standardized
+formatting. You specify only the strings or numbers to be printed, in a
+list separated by commas. They are output, separated by single spaces,
+followed by a newline. The statement looks like this:
+
+@example
+print @var{item1}, @var{item2}, @dots{}
+@end example
+
+@noindent
+The entire list of items may optionally be enclosed in parentheses. The
+parentheses are necessary if any of the item expressions uses a
+relational operator; otherwise it could be confused with a redirection
+(@pxref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}).
+The relational operators are @samp{==},
+@samp{!=}, @samp{<}, @samp{>}, @samp{>=}, @samp{<=}, @samp{~} and
+@samp{!~} (@pxref{Comparison Ops, ,Comparison Expressions}).@refill
+
+The items printed can be constant strings or numbers, fields of the
+current record (such as @code{$1}), variables, or any @code{awk}
+expressions. The @code{print} statement is completely general for
+computing @emph{what} values to print. With two exceptions,
+you cannot specify @emph{how} to print them---how many
+columns, whether to use exponential notation or not, and so on.
+(@xref{Output Separators}, and
+@ref{OFMT, ,Controlling Numeric Output with @code{print}}.)
+For that, you need the @code{printf} statement
+(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}).@refill
+
+The simple statement @samp{print} with no items is equivalent to
+@samp{print $0}: it prints the entire current record. To print a blank
+line, use @samp{print ""}, where @code{""} is the null, or empty,
+string.
+
+To print a fixed piece of text, use a string constant such as
+@w{@code{"Hello there"}} as one item. If you forget to use the
+double-quote characters, your text will be taken as an @code{awk}
+expression, and you will probably get an error. Keep in mind that a
+space is printed between any two items.
+
+Most often, each @code{print} statement makes one line of output. But it
+isn't limited to one line. If an item value is a string that contains a
+newline, the newline is output along with the rest of the string. A
+single @code{print} can make any number of lines this way.
+
+@node Print Examples, Output Separators, Print, Printing
+@section Examples of @code{print} Statements
+
+Here is an example of printing a string that contains embedded newlines:
+
+@example
+awk 'BEGIN @{ print "line one\nline two\nline three" @}'
+@end example
+
+@noindent
+produces output like this:
+
+@example
+line one
+line two
+line three
+@end example
+
+Here is an example that prints the first two fields of each input record,
+with a space between them:
+
+@example
+awk '@{ print $1, $2 @}' inventory-shipped
+@end example
+
+@noindent
+Its output looks like this:
+
+@example
+Jan 13
+Feb 15
+Mar 15
+@dots{}
+@end example
+
+A common mistake in using the @code{print} statement is to omit the comma
+between two items. This often has the effect of making the items run
+together in the output, with no space. The reason for this is that
+juxtaposing two string expressions in @code{awk} means to concatenate
+them. For example, without the comma:
+
+@example
+awk '@{ print $1 $2 @}' inventory-shipped
+@end example
+
+@noindent
+prints:
+
+@example
+@group
+Jan13
+Feb15
+Mar15
+@dots{}
+@end group
+@end example
+
+Neither example's output makes much sense to someone unfamiliar with the
+file @file{inventory-shipped}. A heading line at the beginning would make
+it clearer. Let's add some headings to our table of months (@code{$1}) and
+green crates shipped (@code{$2}). We do this using the @code{BEGIN} pattern
+(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}) to force the headings to be printed only once:
+
+@example
+awk 'BEGIN @{ print "Month Crates"
+ print "----- ------" @}
+ @{ print $1, $2 @}' inventory-shipped
+@end example
+
+@noindent
+Did you already guess what happens? This program prints the following:
+
+@example
+@group
+Month Crates
+----- ------
+Jan 13
+Feb 15
+Mar 15
+@dots{}
+@end group
+@end example
+
+@noindent
+The headings and the table data don't line up! We can fix this by printing
+some spaces between the two fields:
+
+@example
+awk 'BEGIN @{ print "Month Crates"
+ print "----- ------" @}
+ @{ print $1, " ", $2 @}' inventory-shipped
+@end example
+
+You can imagine that this way of lining up columns can get pretty
+complicated when you have many columns to fix. Counting spaces for two
+or three columns can be simple, but more than this and you can get
+``lost'' quite easily. This is why the @code{printf} statement was
+created (@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing});
+one of its specialties is lining up columns of data.@refill
+
+@node Output Separators, OFMT, Print Examples, Printing
+@section Output Separators
+
+@cindex output field separator, @code{OFS}
+@vindex OFS
+@vindex ORS
+@cindex output record separator, @code{ORS}
+As mentioned previously, a @code{print} statement contains a list
+of items, separated by commas. In the output, the items are normally
+separated by single spaces. But they do not have to be spaces; a
+single space is only the default. You can specify any string of
+characters to use as the @dfn{output field separator} by setting the
+built-in variable @code{OFS}. The initial value of this variable
+is the string @w{@code{" "}}, that is, just a single space.@refill
+
+The output from an entire @code{print} statement is called an
+@dfn{output record}. Each @code{print} statement outputs one output
+record and then outputs a string called the @dfn{output record separator}.
+The built-in variable @code{ORS} specifies this string. The initial
+value of the variable is the string @code{"\n"} containing a newline
+character; thus, normally each @code{print} statement makes a separate line.
+
+You can change how output fields and records are separated by assigning
+new values to the variables @code{OFS} and/or @code{ORS}. The usual
+place to do this is in the @code{BEGIN} rule
+(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}), so
+that it happens before any input is processed. You may also do this
+with assignments on the command line, before the names of your input
+files.@refill
+
+The following example prints the first and second fields of each input
+record separated by a semicolon, with a blank line added after each
+line:@refill
+
+@example
+@group
+awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @}
+ @{ print $1, $2 @}' BBS-list
+@end group
+@end example
+
+If the value of @code{ORS} does not contain a newline, all your output
+will be run together on a single line, unless you output newlines some
+other way.
+
+@node OFMT, Printf, Output Separators, Printing
+@section Controlling Numeric Output with @code{print}
+@vindex OFMT
+When you use the @code{print} statement to print numeric values,
+@code{awk} internally converts the number to a string of characters,
+and prints that string. @code{awk} uses the @code{sprintf} function
+to do this conversion. For now, it suffices to say that the @code{sprintf}
+function accepts a @dfn{format specification} that tells it how to format
+numbers (or strings), and that there are a number of different ways that
+numbers can be formatted. The different format specifications are discussed
+more fully in
+@ref{Printf, ,Using @code{printf} Statements for Fancier Printing}.@refill
+
+The built-in variable @code{OFMT} contains the default format specification
+that @code{print} uses with @code{sprintf} when it wants to convert a
+number to a string for printing. By supplying different format specifications
+as the value of @code{OFMT}, you can change how @code{print} will print
+your numbers. As a brief example:
+
+@example
+@group
+awk 'BEGIN @{ OFMT = "%d" # print numbers as integers
+ print 17.23 @}'
+@end group
+@end example
+
+@noindent
+will print @samp{17}.
+
+@node Printf, Redirection, OFMT, Printing
+@section Using @code{printf} Statements for Fancier Printing
+@cindex formatted output
+@cindex output, formatted
+
+If you want more precise control over the output format than
+@code{print} gives you, use @code{printf}. With @code{printf} you can
+specify the width to use for each item, and you can specify various
+stylistic choices for numbers (such as what radix to use, whether to
+print an exponent, whether to print a sign, and how many digits to print
+after the decimal point). You do this by specifying a string, called
+the @dfn{format string}, which controls how and where to print the other
+arguments.
+
+@menu
+* Basic Printf:: Syntax of the @code{printf} statement.
+* Control Letters:: Format-control letters.
+* Format Modifiers:: Format-specification modifiers.
+* Printf Examples:: Several examples.
+@end menu
+
+@node Basic Printf, Control Letters, Printf, Printf
+@subsection Introduction to the @code{printf} Statement
+
+@cindex @code{printf} statement, syntax of
+The @code{printf} statement looks like this:@refill
+
+@example
+printf @var{format}, @var{item1}, @var{item2}, @dots{}
+@end example
+
+@noindent
+The entire list of arguments may optionally be enclosed in parentheses. The
+parentheses are necessary if any of the item expressions uses a
+relational operator; otherwise it could be confused with a redirection
+(@pxref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}).
+The relational operators are @samp{==},
+@samp{!=}, @samp{<}, @samp{>}, @samp{>=}, @samp{<=}, @samp{~} and
+@samp{!~} (@pxref{Comparison Ops, ,Comparison Expressions}).@refill
+
+@cindex format string
+The difference between @code{printf} and @code{print} is the argument
+@var{format}. This is an expression whose value is taken as a string; it
+specifies how to output each of the other arguments. It is called
+the @dfn{format string}.
+
+The format string is the same as in the @sc{ansi} C library function
+@code{printf}. Most of @var{format} is text to be output verbatim.
+Scattered among this text are @dfn{format specifiers}, one per item.
+Each format specifier says to output the next item at that place in the
+format.@refill
+
+The @code{printf} statement does not automatically append a newline to its
+output. It outputs only what the format specifies. So if you want
+a newline, you must include one in the format. The output separator
+variables @code{OFS} and @code{ORS} have no effect on @code{printf}
+statements.@refill
+
+@node Control Letters, Format Modifiers, Basic Printf, Printf
+@subsection Format-Control Letters
+@cindex @code{printf}, format-control characters
+@cindex format specifier
+
+A format specifier starts with the character @samp{%} and ends with a
+@dfn{format-control letter}; it tells the @code{printf} statement how
+to output one item. (If you actually want to output a @samp{%}, write
+@samp{%%}.) The format-control letter specifies what kind of value to
+print. The rest of the format specifier is made up of optional
+@dfn{modifiers} which are parameters such as the field width to use.@refill
+
+Here is a list of the format-control letters:
+
+@table @samp
+@item c
+This prints a number as an ASCII character. Thus, @samp{printf "%c",
+65} outputs the letter @samp{A}. The output for a string value is
+the first character of the string.
+
+@item d
+This prints a decimal integer.
+
+@item i
+This also prints a decimal integer.
+
+@item e
+This prints a number in scientific (exponential) notation.
+For example,
+
+@example
+printf "%4.3e", 1950
+@end example
+
+@noindent
+prints @samp{1.950e+03}, with a total of four significant figures of
+which three follow the decimal point. The @samp{4.3} are @dfn{modifiers},
+discussed below.
+
+@item f
+This prints a number in floating point notation.
+
+@item g
+This prints a number in either scientific notation or floating point
+notation, whichever uses fewer characters.
+@ignore
+From: gatech!ames!elroy!cit-vax!EQL.Caltech.Edu!rankin (Pat Rankin)
+
+In the description of printf formats (p.43), the information for %g
+is incorrect (mainly, it's too much of an oversimplification). It's
+wrong in the AWK book too, and in the gawk man page. I suggested to
+David Trueman before 2.13 was released that the latter be revised, so
+that it matched gawk's behavior (rather than trying to change gawk to
+match the docs ;-). The documented description is nice and simple, but
+it doesn't match the actual underlying behavior of %g in the various C
+run-time libraries that gawk relies on. The precision value for g format
+is different than for f and e formats, so it's inaccurate to say 'g' is
+the shorter of 'e' or 'f'. For 'g', precision represents the number of
+significant digits rather than the number of decimal places, and it has
+special rules about how to format numbers with range between 10E-1 and
+10E-4. All in all, it's pretty messy, and I had to add that clumsy
+GFMT_WORKAROUND code because the VMS run-time library doesn't conform to
+the ANSI-C specifications.
+@end ignore
+
+@item o
+This prints an unsigned octal integer.
+
+@item s
+This prints a string.
+
+@item x
+This prints an unsigned hexadecimal integer.
+
+@item X
+This prints an unsigned hexadecimal integer. However, for the values 10
+through 15, it uses the letters @samp{A} through @samp{F} instead of
+@samp{a} through @samp{f}.
+
+@item %
+This isn't really a format-control letter, but it does have a meaning
+when used after a @samp{%}: the sequence @samp{%%} outputs one
+@samp{%}. It does not consume an argument.
+@end table
+
+@node Format Modifiers, Printf Examples, Control Letters, Printf
+@subsection Modifiers for @code{printf} Formats
+
+@cindex @code{printf}, modifiers
+@cindex modifiers (in format specifiers)
+A format specification can also include @dfn{modifiers} that can control
+how much of the item's value is printed and how much space it gets. The
+modifiers come between the @samp{%} and the format-control letter. Here
+are the possible modifiers, in the order in which they may appear:
+
+@table @samp
+@item -
+The minus sign, used before the width modifier, says to left-justify
+the argument within its specified width. Normally the argument
+is printed right-justified in the specified width. Thus,
+
+@example
+printf "%-4s", "foo"
+@end example
+
+@noindent
+prints @samp{foo }.
+
+@item @var{width}
+This is a number representing the desired width of a field. Inserting any
+number between the @samp{%} sign and the format control character forces the
+field to be expanded to this width. The default way to do this is to
+pad with spaces on the left. For example,
+
+@example
+printf "%4s", "foo"
+@end example
+
+@noindent
+prints @samp{ foo}.
+
+The value of @var{width} is a minimum width, not a maximum. If the item
+value requires more than @var{width} characters, it can be as wide as
+necessary. Thus,
+
+@example
+printf "%4s", "foobar"
+@end example
+
+@noindent
+prints @samp{foobar}.
+
+Preceding the @var{width} with a minus sign causes the output to be
+padded with spaces on the right, instead of on the left.
+
+@item .@var{prec}
+This is a number that specifies the precision to use when printing.
+This specifies the number of digits you want printed to the right of the
+decimal point. For a string, it specifies the maximum number of
+characters from the string that should be printed.
+@end table
+
+The C library @code{printf}'s dynamic @var{width} and @var{prec}
+capability (for example, @code{"%*.*s"}) is supported. Instead of
+supplying explicit @var{width} and/or @var{prec} values in the format
+string, you pass them in the argument list. For example:@refill
+
+@example
+w = 5
+p = 3
+s = "abcdefg"
+printf "<%*.*s>\n", w, p, s
+@end example
+
+@noindent
+is exactly equivalent to
+
+@example
+s = "abcdefg"
+printf "<%5.3s>\n", s
+@end example
+
+@noindent
+Both programs output @samp{@w{<@bullet{}@bullet{}abc>}}. (We have
+used the bullet symbol ``@bullet{}'' to represent a space, to clearly
+show you that there are two spaces in the output.)@refill
+
+Earlier versions of @code{awk} did not support this capability. You may
+simulate it by using concatenation to build up the format string,
+like so:@refill
+
+@example
+w = 5
+p = 3
+s = "abcdefg"
+printf "<%" w "." p "s>\n", s
+@end example
+
+@noindent
+This is not particularly easy to read, however.
+
+@node Printf Examples, , Format Modifiers, Printf
+@subsection Examples of Using @code{printf}
+
+Here is how to use @code{printf} to make an aligned table:
+
+@example
+awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list
+@end example
+
+@noindent
+prints the names of bulletin boards (@code{$1}) of the file
+@file{BBS-list} as a string of 10 characters, left justified. It also
+prints the phone numbers (@code{$2}) afterward on the line. This
+produces an aligned two-column table of names and phone numbers:@refill
+
+@example
+@group
+aardvark 555-5553
+alpo-net 555-3412
+barfly 555-7685
+bites 555-1675
+camelot 555-0542
+core 555-2912
+fooey 555-1234
+foot 555-6699
+macfoo 555-6480
+sdace 555-3430
+sabafoo 555-2127
+@end group
+@end example
+
+Did you notice that we did not specify that the phone numbers be printed
+as numbers? They had to be printed as strings because the numbers are
+separated by a dash. This dash would be interpreted as a minus sign if
+we had tried to print the phone numbers as numbers. This would have led
+to some pretty confusing results.
+
+We did not specify a width for the phone numbers because they are the
+last things on their lines. We don't need to put spaces after them.
+
+We could make our table look even nicer by adding headings to the tops
+of the columns. To do this, use the @code{BEGIN} pattern
+(@pxref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns})
+to force the header to be printed only once, at the beginning of
+the @code{awk} program:@refill
+
+@example
+@group
+awk 'BEGIN @{ print "Name Number"
+ print "---- ------" @}
+ @{ printf "%-10s %s\n", $1, $2 @}' BBS-list
+@end group
+@end example
+
+Did you notice that we mixed @code{print} and @code{printf} statements in
+the above example? We could have used just @code{printf} statements to get
+the same results:
+
+@example
+@group
+awk 'BEGIN @{ printf "%-10s %s\n", "Name", "Number"
+ printf "%-10s %s\n", "----", "------" @}
+ @{ printf "%-10s %s\n", $1, $2 @}' BBS-list
+@end group
+@end example
+
+@noindent
+By outputting each column heading with the same format specification
+used for the elements of the column, we have made sure that the headings
+are aligned just like the columns.
+
+The fact that the same format specification is used three times can be
+emphasized by storing it in a variable, like this:
+
+@example
+awk 'BEGIN @{ format = "%-10s %s\n"
+ printf format, "Name", "Number"
+ printf format, "----", "------" @}
+ @{ printf format, $1, $2 @}' BBS-list
+@end example
+
+See if you can use the @code{printf} statement to line up the headings and
+table data for our @file{inventory-shipped} example covered earlier in the
+section on the @code{print} statement
+(@pxref{Print, ,The @code{print} Statement}).@refill
+
+@node Redirection, Special Files, Printf, Printing
+@section Redirecting Output of @code{print} and @code{printf}
+
+@cindex output redirection
+@cindex redirection of output
+So far we have been dealing only with output that prints to the standard
+output, usually your terminal. Both @code{print} and @code{printf} can
+also send their output to other places.
+This is called @dfn{redirection}.@refill
+
+A redirection appears after the @code{print} or @code{printf} statement.
+Redirections in @code{awk} are written just like redirections in shell
+commands, except that they are written inside the @code{awk} program.
+
+@menu
+* File/Pipe Redirection:: Redirecting Output to Files and Pipes.
+* Close Output:: How to close output files and pipes.
+@end menu
+
+@node File/Pipe Redirection, Close Output, Redirection, Redirection
+@subsection Redirecting Output to Files and Pipes
+
+Here are the three forms of output redirection. They are all shown for
+the @code{print} statement, but they work identically for @code{printf}
+also.@refill
+
+@table @code
+@item print @var{items} > @var{output-file}
+This type of redirection prints the items onto the output file
+@var{output-file}. The file name @var{output-file} can be any
+expression. Its value is changed to a string and then used as a
+file name (@pxref{Expressions, ,Expressions as Action Statements}).@refill
+
+When this type of redirection is used, the @var{output-file} is erased
+before the first output is written to it. Subsequent writes do not
+erase @var{output-file}, but append to it. If @var{output-file} does
+not exist, then it is created.@refill
+
+For example, here is how one @code{awk} program can write a list of
+BBS names to a file @file{name-list} and a list of phone numbers to a
+file @file{phone-list}. Each output file contains one name or number
+per line.
+
+@smallexample
+awk '@{ print $2 > "phone-list"
+ print $1 > "name-list" @}' BBS-list
+@end smallexample
+
+@item print @var{items} >> @var{output-file}
+This type of redirection prints the items onto the output file
+@var{output-file}. The difference between this and the
+single-@samp{>} redirection is that the old contents (if any) of
+@var{output-file} are not erased. Instead, the @code{awk} output is
+appended to the file.
+
+@cindex pipes for output
+@cindex output, piping
+@item print @var{items} | @var{command}
+It is also possible to send output through a @dfn{pipe} instead of into a
+file. This type of redirection opens a pipe to @var{command} and writes
+the values of @var{items} through this pipe, to another process created
+to execute @var{command}.@refill
+
+The redirection argument @var{command} is actually an @code{awk}
+expression. Its value is converted to a string, whose contents give the
+shell command to be run.
+
+For example, this produces two files, one unsorted list of BBS names
+and one list sorted in reverse alphabetical order:
+
+@smallexample
+awk '@{ print $1 > "names.unsorted"
+ print $1 | "sort -r > names.sorted" @}' BBS-list
+@end smallexample
+
+Here the unsorted list is written with an ordinary redirection while
+the sorted list is written by piping through the @code{sort} utility.
+
+Here is an example that uses redirection to mail a message to a mailing
+list @samp{bug-system}. This might be useful when trouble is encountered
+in an @code{awk} script run periodically for system maintenance.
+
+@smallexample
+report = "mail bug-system"
+print "Awk script failed:", $0 | report
+print "at record number", FNR, "of", FILENAME | report
+close(report)
+@end smallexample
+
+We call the @code{close} function here because it's a good idea to close
+the pipe as soon as all the intended output has been sent to it.
+@xref{Close Output, ,Closing Output Files and Pipes}, for more information
+on this. This example also illustrates the use of a variable to represent
+a @var{file} or @var{command}: it is not necessary to always
+use a string constant. Using a variable is generally a good idea,
+since @code{awk} requires you to spell the string value identically
+every time.
+@end table
+
+Redirecting output using @samp{>}, @samp{>>}, or @samp{|} asks the system
+to open a file or pipe only if the particular @var{file} or @var{command}
+you've specified has not already been written to by your program, or if
+it has been closed since it was last written to.@refill
+
+@node Close Output, , File/Pipe Redirection, Redirection
+@subsection Closing Output Files and Pipes
+@cindex closing output files and pipes
+@findex close
+
+When a file or pipe is opened, the file name or command associated with
+it is remembered by @code{awk} and subsequent writes to the same file or
+command are appended to the previous writes. The file or pipe stays
+open until @code{awk} exits. This is usually convenient.
+
+Sometimes there is a reason to close an output file or pipe earlier
+than that. To do this, use the @code{close} function, as follows:
+
+@example
+close(@var{filename})
+@end example
+
+@noindent
+or
+
+@example
+close(@var{command})
+@end example
+
+The argument @var{filename} or @var{command} can be any expression.
+Its value must exactly equal the string used to open the file or pipe
+to begin with---for example, if you open a pipe with this:
+
+@example
+print $1 | "sort -r > names.sorted"
+@end example
+
+@noindent
+then you must close it with this:
+
+@example
+close("sort -r > names.sorted")
+@end example
+
+Here are some reasons why you might need to close an output file:
+
+@itemize @bullet
+@item
+To write a file and read it back later on in the same @code{awk}
+program. Close the file when you are finished writing it; then
+you can start reading it with @code{getline}
+(@pxref{Getline, ,Explicit Input with @code{getline}}).@refill
+
+@item
+To write numerous files, successively, in the same @code{awk}
+program. If you don't close the files, eventually you may exceed a
+system limit on the number of open files in one process. So close
+each one when you are finished writing it.
+
+@item
+To make a command finish. When you redirect output through a pipe,
+the command reading the pipe normally continues to try to read input
+as long as the pipe is open. Often this means the command cannot
+really do its work until the pipe is closed. For example, if you
+redirect output to the @code{mail} program, the message is not
+actually sent until the pipe is closed.
+
+@item
+To run the same program a second time, with the same arguments.
+This is not the same thing as giving more input to the first run!
+
+For example, suppose you pipe output to the @code{mail} program. If you
+output several lines redirected to this pipe without closing it, they make
+a single message of several lines. By contrast, if you close the pipe
+after each line of output, then each line makes a separate message.
+@end itemize
+
+@iftex
+@vindex ERRNO
+@cindex differences: @code{gawk} and @code{awk}
+@end iftex
+@code{close} returns a value of zero if the close succeeded.
+Otherwise, the value will be non-zero.
+In this case, @code{gawk} sets the variable @code{ERRNO} to a string
+describing the error that occurred.
+
+@node Special Files, , Redirection, Printing
+@section Standard I/O Streams
+@cindex standard input
+@cindex standard output
+@cindex standard error output
+@cindex file descriptors
+
+Running programs conventionally have three input and output streams
+already available to them for reading and writing. These are known as
+the @dfn{standard input}, @dfn{standard output}, and @dfn{standard error
+output}. These streams are, by default, terminal input and output, but
+they are often redirected with the shell, via the @samp{<}, @samp{<<},
+@samp{>}, @samp{>>}, @samp{>&} and @samp{|} operators. Standard error
+is used only for writing error messages; the reason we have two separate
+streams, standard output and standard error, is so that they can be
+redirected separately.
+
+@iftex
+@cindex differences: @code{gawk} and @code{awk}
+@end iftex
+In other implementations of @code{awk}, the only way to write an error
+message to standard error in an @code{awk} program is as follows:
+
+@smallexample
+print "Serious error detected!\n" | "cat 1>&2"
+@end smallexample
+
+@noindent
+This works by opening a pipeline to a shell command which can access the
+standard error stream which it inherits from the @code{awk} process.
+This is far from elegant, and is also inefficient, since it requires a
+separate process. So people writing @code{awk} programs have often
+neglected to do this. Instead, they have sent the error messages to the
+terminal, like this:
+
+@smallexample
+@group
+NF != 4 @{
+ printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/tty"
+@}
+@end group
+@end smallexample
+
+@noindent
+This has the same effect most of the time, but not always: although the
+standard error stream is usually the terminal, it can be redirected, and
+when that happens, writing to the terminal is not correct. In fact, if
+@code{awk} is run from a background job, it may not have a terminal at all.
+Then opening @file{/dev/tty} will fail.
+
+@code{gawk} provides special file names for accessing the three standard
+streams. When you redirect input or output in @code{gawk}, if the file name
+matches one of these special names, then @code{gawk} directly uses the
+stream it stands for.
+
+@cindex @file{/dev/stdin}
+@cindex @file{/dev/stdout}
+@cindex @file{/dev/stderr}
+@cindex @file{/dev/fd/}
+@table @file
+@item /dev/stdin
+The standard input (file descriptor 0).
+
+@item /dev/stdout
+The standard output (file descriptor 1).
+
+@item /dev/stderr
+The standard error output (file descriptor 2).
+
+@item /dev/fd/@var{N}
+The file associated with file descriptor @var{N}. Such a file must have
+been opened by the program initiating the @code{awk} execution (typically
+the shell). Unless you take special pains, only descriptors 0, 1 and 2
+are available.
+@end table
+
+The file names @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
+are aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and @file{/dev/fd/2},
+respectively, but they are more self-explanatory.
+
+The proper way to write an error message in a @code{gawk} program
+is to use @file{/dev/stderr}, like this:
+
+@smallexample
+NF != 4 @{
+ printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr"
+@}
+@end smallexample
+
+@code{gawk} also provides special file names that give access to information
+about the running @code{gawk} process. Each of these ``files'' provides
+a single record of information. To read them more than once, you must
+first close them with the @code{close} function
+(@pxref{Close Input, ,Closing Input Files and Pipes}).
+The filenames are:
+
+@cindex @file{/dev/pid}
+@cindex @file{/dev/pgrpid}
+@cindex @file{/dev/ppid}
+@cindex @file{/dev/user}
+@table @file
+@item /dev/pid
+Reading this file returns the process ID of the current process,
+in decimal, terminated with a newline.
+
+@item /dev/ppid
+Reading this file returns the parent process ID of the current process,
+in decimal, terminated with a newline.
+
+@item /dev/pgrpid
+Reading this file returns the process group ID of the current process,
+in decimal, terminated with a newline.
+
+@item /dev/user
+Reading this file returns a single record terminated with a newline.
+The fields are separated with blanks. The fields represent the
+following information:
+
+@table @code
+@item $1
+The value of the @code{getuid} system call.
+
+@item $2
+The value of the @code{geteuid} system call.
+
+@item $3
+The value of the @code{getgid} system call.
+
+@item $4
+The value of the @code{getegid} system call.
+@end table
+
+If there are any additional fields, they are the group IDs returned by
+@code{getgroups} system call.
+(Multiple groups may not be supported on all systems.)@refill
+@end table
+
+These special file names may be used on the command line as data
+files, as well as for I/O redirections within an @code{awk} program.
+They may not be used as source files with the @samp{-f} option.
+
+Recognition of these special file names is disabled if @code{gawk} is in
+compatibility mode (@pxref{Command Line, ,Invoking @code{awk}}).
+
+@quotation
+@strong{Caution}: Unless your system actually has a @file{/dev/fd} directory
+(or any of the other above listed special files),
+the interpretation of these file names is done by @code{gawk} itself.
+For example, using @samp{/dev/fd/4} for output will actually write on
+file descriptor 4, and not on a new file descriptor that was @code{dup}'ed
+from file descriptor 4. Most of the time this does not matter; however, it
+is important to @emph{not} close any of the files related to file descriptors
+0, 1, and 2. If you do close one of these files, unpredictable behavior
+will result.
+@end quotation
+
+@node One-liners, Patterns, Printing, Top
+@chapter Useful ``One-liners''
+
+@cindex one-liners
+Useful @code{awk} programs are often short, just a line or two. Here is a
+collection of useful, short programs to get you started. Some of these
+programs contain constructs that haven't been covered yet. The description
+of the program will give you a good idea of what is going on, but please
+read the rest of the manual to become an @code{awk} expert!
+
+@c Per suggestions from Michal Jaegermann
+@ifinfo
+Since you are reading this in Info, each line of the example code is
+enclosed in quotes, to represent text that you would type literally.
+The examples themselves represent shell commands that use single quotes
+to keep the shell from interpreting the contents of the program.
+When reading the examples, focus on the text between the open and close
+quotes.
+@end ifinfo
+
+@table @code
+@item awk '@{ if (NF > max) max = NF @}
+@itemx @ @ @ @ @ END @{ print max @}'
+This program prints the maximum number of fields on any input line.
+
+@item awk 'length($0) > 80'
+This program prints every line longer than 80 characters. The sole
+rule has a relational expression as its pattern, and has no action (so the
+default action, printing the record, is used).
+
+@item awk 'NF > 0'
+This program prints every line that has at least one field. This is an
+easy way to delete blank lines from a file (or rather, to create a new
+file similar to the old file but from which the blank lines have been
+deleted).
+
+@item awk '@{ if (NF > 0) print @}'
+This program also prints every line that has at least one field. Here we
+allow the rule to match every line, then decide in the action whether
+to print.
+
+@item awk@ 'BEGIN@ @{@ for (i = 1; i <= 7; i++)
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ print int(101 * rand()) @}'
+This program prints 7 random numbers from 0 to 100, inclusive.
+
+@item ls -l @var{files} | awk '@{ x += $4 @} ; END @{ print "total bytes: " x @}'
+This program prints the total number of bytes used by @var{files}.
+
+@item expand@ @var{file}@ |@ awk@ '@{ if (x < length()) x = length() @}
+@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ END @{ print "maximum line length is " x @}'
+This program prints the maximum line length of @var{file}. The input
+is piped through the @code{expand} program to change tabs into spaces,
+so the widths compared are actually the right-margin columns.
+
+@item awk 'BEGIN @{ FS = ":" @}
+@itemx @ @ @ @ @ @{ print $1 | "sort" @}' /etc/passwd
+This program prints a sorted list of the login names of all users.
+
+@item awk '@{ nlines++ @}
+@itemx @ @ @ @ @ END@ @{ print nlines @}'
+This programs counts lines in a file.
+
+@item awk 'END @{ print NR @}'
+This program also counts lines in a file, but lets @code{awk} do the work.
+
+@item awk '@{ print NR, $0 @}'
+This program adds line numbers to all its input files,
+similar to @samp{cat -n}.
+@end table
+
+@node Patterns, Actions, One-liners, Top
+@chapter Patterns
+@cindex pattern, definition of
+
+Patterns in @code{awk} control the execution of rules: a rule is
+executed when its pattern matches the current input record. This
+chapter tells all about how to write patterns.
+
+@menu
+* Kinds of Patterns:: A list of all kinds of patterns.
+ The following subsections describe
+ them in detail.
+* Regexp:: Regular expressions such as @samp{/foo/}.
+* Comparison Patterns:: Comparison expressions such as @code{$1 > 10}.
+* Boolean Patterns:: Combining comparison expressions.
+* Expression Patterns:: Any expression can be used as a pattern.
+* Ranges:: Pairs of patterns specify record ranges.
+* BEGIN/END:: Specifying initialization and cleanup rules.
+* Empty:: The empty pattern, which matches every record.
+@end menu
+
+@node Kinds of Patterns, Regexp, Patterns, Patterns
+@section Kinds of Patterns
+@cindex patterns, types of
+
+Here is a summary of the types of patterns supported in @code{awk}.
+@c At the next rewrite, check to see that this order matches the
+@c order in the text. It might not matter to a reader, but it's good
+@c style. Also, it might be nice to mention all the topics of sections
+@c that follow in this list; that way people can scan and know when to
+@c expect a specific topic. Specifically please also make an entry
+@c for Boolean operators as patterns in the right place. --mew
+
+@table @code
+@item /@var{regular expression}/
+A regular expression as a pattern. It matches when the text of the
+input record fits the regular expression.
+(@xref{Regexp, ,Regular Expressions as Patterns}.)@refill
+
+@item @var{expression}
+A single expression. It matches when its value, converted to a number,
+is nonzero (if a number) or nonnull (if a string).
+(@xref{Expression Patterns, ,Expressions as Patterns}.)@refill
+
+@item @var{pat1}, @var{pat2}
+A pair of patterns separated by a comma, specifying a range of records.
+(@xref{Ranges, ,Specifying Record Ranges with Patterns}.)
+
+@item BEGIN
+@itemx END
+Special patterns to supply start-up or clean-up information to
+@code{awk}. (@xref{BEGIN/END, ,@code{BEGIN} and @code{END} Special Patterns}.)
+
+@item @var{null}
+The empty pattern matches every input record.
+(@xref{Empty, ,The Empty Pattern}.)@refill
+@end table
+
+
+@node Regexp, Comparison Patterns, Kinds of Patterns, Patterns
+@section Regular Expressions as Patterns
+@cindex pattern, regular expressions
+@cindex regexp
+@cindex regular expressions as patterns
+
+A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a
+class of strings. A regular expression enclosed in slashes (@samp{/})
+is an @code{awk} pattern that matches every input record whose text
+belongs to that class.
+
+The simplest regular expression is a sequence of letters, numbers, or
+both. Such a regexp matches any string that contains that sequence.
+Thus, the regexp @samp{foo} matches any string containing @samp{foo}.
+Therefore, the pattern @code{/foo/} matches any input record containing
+@samp{foo}. Other kinds of regexps let you specify more complicated
+classes of strings.
+
+@menu
+* Regexp Usage:: How to Use Regular Expressions
+* Regexp Operators:: Regular Expression Operators
+* Case-sensitivity:: How to do case-insensitive matching.
+@end menu
+
+@node Regexp Usage, Regexp Operators, Regexp, Regexp
+@subsection How to Use Regular Expressions
+
+A regular expression can be used as a pattern by enclosing it in
+slashes. Then the regular expression is matched against the
+entire text of each record. (Normally, it only needs
+to match some part of the text in order to succeed.) For example, this
+prints the second field of each record that contains @samp{foo} anywhere:
+
+@example
+awk '/foo/ @{ print $2 @}' BBS-list
+@end example
+
+@cindex regular expression matching operators
+@cindex string-matching operators
+@cindex operators, string-matching
+@cindex operators, regexp matching
+@cindex regexp search operators
+Regular expressions can also be used in comparison expressions. Then
+you can specify the string to match against; it need not be the entire
+current input record. These comparison expressions can be used as
+patterns or in @code{if}, @code{while}, @code{for}, and @code{do} statements.
+
+@table @code
+@item @var{exp} ~ /@var{regexp}/
+This is true if the expression @var{exp} (taken as a character string)
+is matched by @var{regexp}. The following example matches, or selects,
+all input records with the upper-case letter @samp{J} somewhere in the
+first field:@refill
+
+@example
+awk '$1 ~ /J/' inventory-shipped
+@end example
+
+So does this:
+
+@example
+awk '@{ if ($1 ~ /J/) print @}' inventory-shipped
+@end example
+
+@item @var{exp} !~ /@var{regexp}/
+This is true if the expression @var{exp} (taken as a character string)
+is @emph{not} matched by @var{regexp}. The following example matches,
+or selects, all input records whose first field @emph{does not} contain
+the upper-case letter @samp{J}:@refill
+
+@example
+awk '$1 !~ /J/' inventory-shipped
+@end example
+@end table
+
+@cindex computed regular expressions
+@cindex regular expressions, computed
+@cindex dynamic regular expressions
+The right hand side of a @samp{~} or @samp{!~} operator need not be a
+constant regexp (i.e., a string of characters between slashes). It may
+be any expression. The expression is evaluated, and converted if
+necessary to a string; the contents of the string are used as the
+regexp. A regexp that is computed in this way is called a @dfn{dynamic
+regexp}. For example:
+
+@example
+identifier_regexp = "[A-Za-z_][A-Za-z_0-9]+"
+$0 ~ identifier_regexp
+@end example
+
+@noindent
+sets @code{identifier_regexp} to a regexp that describes @code{awk}
+variable names, and tests if the input record matches this regexp.
+
+@node Regexp Operators, Case-sensitivity, Regexp Usage, Regexp
+@subsection Regular Expression Operators
+@cindex metacharacters
+@cindex regular expression metacharacters
+
+You can combine regular expressions with the following characters,
+called @dfn{regular expression operators}, or @dfn{metacharacters}, to
+increase the power and versatility of regular expressions.
+
+Here is a table of metacharacters. All characters not listed in the
+table stand for themselves.
+
+@table @code
+@item ^
+This matches the beginning of the string or the beginning of a line
+within the string. For example:
+
+@example
+^@@chapter
+@end example
+
+@noindent
+matches the @samp{@@chapter} at the beginning of a string, and can be used
+to identify chapter beginnings in Texinfo source files.
+
+@item $
+This is similar to @samp{^}, but it matches only at the end of a string
+or the end of a line within the string. For example:
+
+@example
+p$
+@end example
+
+@noindent
+matches a record that ends with a @samp{p}.
+
+@item .
+This matches any single character except a newline. For example:
+
+@example
+.P
+@end example
+
+@noindent
+matches any single character followed by a @samp{P} in a string. Using
+concatenation we can make regular expressions like @samp{U.A}, which
+matches any three-character sequence that begins with @samp{U} and ends
+with @samp{A}.
+
+@item [@dots{}]
+This is called a @dfn{character set}. It matches any one of the
+characters that are enclosed in the square brackets. For example:
+
+@example
+[MVX]
+@end example
+
+@noindent
+matches any one of the characters @samp{M}, @samp{V}, or @samp{X} in a
+string.@refill
+
+Ranges of characters are indicated by using a hyphen between the beginning
+and ending characters, and enclosing the whole thing in brackets. For
+example:@refill
+
+@example
+[0-9]
+@end example
+
+@noindent
+matches any digit.
+
+To include the character @samp{\}, @samp{]}, @samp{-} or @samp{^} in a
+character set, put a @samp{\} in front of it. For example:
+
+@example
+[d\]]
+@end example
+
+@noindent
+matches either @samp{d}, or @samp{]}.@refill
+
+This treatment of @samp{\} is compatible with other @code{awk}
+implementations, and is also mandated by the @sc{posix} Command Language
+and Utilities standard. The regular expressions in @code{awk} are a superset
+of the @sc{posix} specification for Extended Regular Expressions (EREs).
+@sc{posix} EREs are based on the regular expressions accepted by the
+traditional @code{egrep} utility.
+
+In @code{egrep} syntax, backslash is not syntactically special within
+square brackets. This means that special tricks have to be used to
+represent the characters @samp{]}, @samp{-} and @samp{^} as members of a
+character set.
+
+In @code{egrep} syntax, to match @samp{-}, write it as @samp{---},
+which is a range containing only @w{@samp{-}.} You may also give @samp{-}
+as the first or last character in the set. To match @samp{^}, put it
+anywhere except as the first character of a set. To match a @samp{]},
+make it the first character in the set. For example:@refill
+
+@example
+[]d^]
+@end example
+
+@noindent
+matches either @samp{]}, @samp{d} or @samp{^}.@refill
+
+@item [^ @dots{}]
+This is a @dfn{complemented character set}. The first character after
+the @samp{[} @emph{must} be a @samp{^}. It matches any characters
+@emph{except} those in the square brackets (or newline). For example:
+
+@example
+[^0-9]
+@end example
+
+@noindent
+matches any character that is not a digit.
+
+@item |
+This is the @dfn{alternation operator} and it is used to specify
+alternatives. For example:
+
+@example
+^P|[0-9]
+@end example
+
+@noindent
+matches any string that matches either @samp{^P} or @samp{[0-9]}. This
+means it matches any string that contains a digit or starts with @samp{P}.
+
+The alternation applies to the largest possible regexps on either side.
+@item (@dots{})
+Parentheses are used for grouping in regular expressions as in
+arithmetic. They can be used to concatenate regular expressions
+containing the alternation operator, @samp{|}.
+
+@item *
+This symbol means that the preceding regular expression is to be
+repeated as many times as possible to find a match. For example:
+
+@example
+ph*
+@end example
+
+@noindent
+applies the @samp{*} symbol to the preceding @samp{h} and looks for matches
+to one @samp{p} followed by any number of @samp{h}s. This will also match
+just @samp{p} if no @samp{h}s are present.
+
+The @samp{*} repeats the @emph{smallest} possible preceding expression.
+(Use parentheses if you wish to repeat a larger expression.) It finds
+as many repetitions as possible. For example:
+
+@example
+awk '/\(c[ad][ad]*r x\)/ @{ print @}' sample
+@end example
+
+@noindent
+prints every record in the input containing a string of the form
+@samp{(car x)}, @samp{(cdr x)}, @samp{(cadr x)}, and so on.@refill
+
+@item +
+This symbol is similar to @samp{*}, but the preceding expression must be
+matched at least once. This means that:
+
+@example
+wh+y
+@end example
+
+@noindent
+would match @samp{why} and @samp{whhy} but not @samp{wy}, whereas
+@samp{wh*y} would match all three of these strings. This is a simpler
+way of writing the last @samp{*} example:
+
+@example
+awk '/\(c[ad]+r x\)/ @{ print @}' sample
+@end example
+
+@item ?
+This symbol is similar to @samp{*}, but the preceding expression can be
+matched once or not at all. For example:
+
+@example
+fe?d
+@end example
+
+@noindent
+will match @samp{fed} and @samp{fd}, but nothing else.@refill
+
+@item \
+This is used to suppress the special meaning of a character when
+matching. For example:
+
+@example
+\$
+@end example
+
+@noindent
+matches the character @samp{$}.
+
+The escape sequences used for string constants
+(@pxref{Constants, ,Constant Expressions}) are
+valid in regular expressions as well; they are also introduced by a
+@samp{\}.@refill
+@end table
+
+In regular expressions, the @samp{*}, @samp{+}, and @samp{?} operators have
+the highest precedence, followed by concatenation, and finally by @samp{|}.
+As in arithmetic, parentheses can change how operators are grouped.@refill
+
+@node Case-sensitivity, , Regexp Operators, Regexp
+@subsection Case-sensitivity in Matching
+
+Case is normally significant in regular expressions, both when matching
+ordinary characters (i.e., not metacharacters), and inside character
+sets. Thus a @samp{w} in a regular expression matches only a lower case
+@samp{w} and not an upper case @samp{W}.
+
+The simplest way to do a case-independent match is to use a character
+set: @samp{[Ww]}. However, this can be cumbersome if you need to use it
+often; and it can make the regular expressions harder for humans to
+read. There are two other alternatives that you might prefer.
+
+One way to do a case-insensitive match at a particular point in the
+program is to convert the data to a single case, using the
+@code{tolower} or @code{toupper} built-in string functions (which we
+haven't discussed yet;
+@pxref{String Functions, ,Built-in Functions for String Manipulation}).
+For example:@refill
+
+@example
+tolower($1) ~ /foo/ @{ @dots{} @}
+@end example
+
+@noindent
+converts the first field to lower case before matching against it.
+
+Another method is to set the variable @code{IGNORECASE} to a nonzero
+value (@pxref{Built-in Variables}). When @code{IGNORECASE} is not zero,
+@emph{all} regexp operations ignore case. Changing the value of
+@code{IGNORECASE} dynamically controls the case sensitivity of your
+program as it runs. Case is significant by default because
+@code{IGNORECASE} (like most variables) is initialized to zero.
+
+@example
+x = "aB"
+if (x ~ /ab/) @dots{} # this test will fail
+
+IGNORECASE = 1
+if (x ~ /ab/) @dots{} # now it will succeed
+@end example
+
+In general, you cannot use @code{IGNORECASE} to make certain rules
+case-insensitive and other rules case-sensitive, because there is no way
+to set @code{IGNORECASE} just for the pattern of a particular rule. To
+do this, you must use character sets or @code{tolower}. However, one
+thing you can do only with @code{IGNORECASE} is turn case-sensitivity on
+or off dynamically for all the rules at once.@refill
+
+@code{IGNORECASE} can be set on the command line, or in a @code{BEGIN}
+rule. Setting @code{IGNORECASE} from the command line is a way to make
+a program case-insensitive without having to edit it.
+
+The value of @code{IGNORECASE} has no effect if @code{gawk} is in
+compatibility mode (@pxref{Command Line, ,Invoking @code{awk}}).
+Case is always significant in compatibility mode.@refill
+
+@node Comparison Patterns, Boolean Patterns, Regexp, Patterns
+@section Comparison Expressions as Patterns
+@cindex comparison expressions as patterns
+@cindex pattern, comparison expressions
+@cindex relational operators
+@cindex operators, relational
+
+@dfn{Comparison patterns} test relationships such as equality between
+two strings or numbers. They are a special case of expression patterns
+(@pxref{Expression Patterns, ,Expressions as Patterns}). They are written
+with @dfn{relational operators}, which are a superset of those in C.
+Here is a table of them:@refill
+
+@table @code
+@item @var{x} < @var{y}
+True if @var{x} is less than @var{y}.
+
+@item @var{x} <= @var{y}
+True if @var{x} is less than or equal to @var{y}.
+
+@item @var{x} > @var{y}
+True if @var{x} is greater than @var{y}.
+
+@item @var{x} >= @var{y}
+True if @var{x} is greater than or equal to @var{y}.
+
+@item @var{x} == @var{y}
+True if @var{x} is equal to @var{y}.
+
+@item @var{x} != @var{y}
+True if @var{x} is not equal to @var{y}.
+
+@item @var{x} ~ @var{y}
+True if @var{x} matches the regular expression described by @var{y}.
+
+@item @var{x} !~ @var{y}
+True if @var{x} does not match the regular expression described by @var{y}.
+@end table
+
+The operands of a relational operator are compared as numbers if they
+are both numbers. Otherwise they are converted to, and compared as,
+strings (@pxref{Conversion, ,Conversion of Strings and Numbers},
+for the detailed rules). Strings are compared by comparing the first
+character of each, then the second character of each,
+and so on, until there is a difference. If the two strings are equal until
+the shorter one runs out, the shorter one is considered to be less than the
+longer one. Thus, @code{"10"} is less than @code{"9"}, and @code{"abc"}
+is less than @code{"abcd"}.@refill
+
+The left operand of the @samp{~} and @samp{!~} operators is a string.
+The right operand is either a constant regular expression enclosed in
+slashes (@code{/@var{regexp}/}), or any expression, whose string value
+is used as a dynamic regular expression
+(@pxref{Regexp Usage, ,How to Use Regular Expressions}).@refill
+
+The following example prints the second field of each input record
+whose first field is precisely @samp{foo}.
+
+@example
+awk '$1 == "foo" @{ print $2 @}' BBS-list
+@end example
+
+@noindent
+Contrast this with the following regular expression match, which would
+accept any record with a first field that contains @samp{foo}:
+
+@example
+awk '$1 ~ "foo" @{ print $2 @}' BBS-list
+@end example
+
+@noindent
+or, equivalently, this one:
+
+@example
+awk '$1 ~ /foo/ @{ print $2 @}' BBS-list
+@end example
+
+@node Boolean Patterns, Expression Patterns, Comparison Patterns, Patterns
+@section Boolean Operators and Patterns
+@cindex patterns, boolean
+@cindex boolean patterns
+
+A @dfn{boolean pattern} is an expression which combines other patterns
+using the @dfn{boolean operators} ``or'' (@samp{||}), ``and''
+(@samp{&&}), and ``not'' (@samp{!}). Whether the boolean pattern
+matches an input record depends on whether its subpatterns match.
+
+For example, the following command prints all records in the input file
+@file{BBS-list} that contain both @samp{2400} and @samp{foo}.@refill
+
+@example
+awk '/2400/ && /foo/' BBS-list
+@end example
+
+The following command prints all records in the input file
+@file{BBS-list} that contain @emph{either} @samp{2400} or @samp{foo}, or
+both.@refill
+
+@example
+awk '/2400/ || /foo/' BBS-list
+@end example
+
+The following command prints all records in the input file
+@file{BBS-list} that do @emph{not} contain the string @samp{foo}.
+
+@example
+awk '! /foo/' BBS-list
+@end example
+
+Note that boolean patterns are a special case of expression patterns
+(@pxref{Expression Patterns, ,Expressions as Patterns}); they are
+expressions that use the boolean operators.
+@xref{Boolean Ops, ,Boolean Expressions}, for complete information
+on the boolean operators.@refill
+
+The subpatterns of a boolean pattern can be constant regular
+expressions, comparisons, or any other @code{awk} expressions. Range
+patterns are not expressions, so they cannot appear inside boolean
+patterns. Likewise, the special patterns @code{BEGIN} and @code{END},
+which never match any input record, are not expressions and cannot
+appear inside boolean patterns.
+
+@node Expression Patterns, Ranges, Boolean Patterns, Patterns
+@section Expressions as Patterns
+
+Any @code{awk} expression is also valid as an @code{awk} pattern.
+Then the pattern ``matches'' if the expression's value is nonzero (if a
+number) or nonnull (if a string).
+
+The expression is reevaluated each time the rule is tested against a new
+input record. If the expression uses fields such as @code{$1}, the
+value depends directly on the new input record's text; otherwise, it
+depends only on what has happened so far in the execution of the
+@code{awk} program, but that may still be useful.
+
+Comparison patterns are actually a special case of this. For
+example, the expression @code{$5 == "foo"} has the value 1 when the
+value of @code{$5} equals @code{"foo"}, and 0 otherwise; therefore, this
+expression as a pattern matches when the two values are equal.
+
+Boolean patterns are also special cases of expression patterns.
+
+A constant regexp as a pattern is also a special case of an expression
+pattern. @code{/foo/} as an expression has the value 1 if @samp{foo}
+appears in the current input record; thus, as a pattern, @code{/foo/}
+matches any record containing @samp{foo}.
+
+Other implementations of @code{awk} that are not yet @sc{posix} compliant
+are less general than @code{gawk}: they allow comparison expressions, and
+boolean combinations thereof (optionally with parentheses), but not
+necessarily other kinds of expressions.
+
+@node Ranges, BEGIN/END, Expression Patterns, Patterns
+@section Specifying Record Ranges with Patterns
+
+@cindex range pattern
+@cindex patterns, range
+A @dfn{range pattern} is made of two patterns separated by a comma, of
+the form @code{@var{begpat}, @var{endpat}}. It matches ranges of
+consecutive input records. The first pattern @var{begpat} controls
+where the range begins, and the second one @var{endpat} controls where
+it ends. For example,@refill
+
+@example
+awk '$1 == "on", $1 == "off"'
+@end example
+
+@noindent
+prints every record between @samp{on}/@samp{off} pairs, inclusive.
+
+A range pattern starts out by matching @var{begpat}
+against every input record; when a record matches @var{begpat}, the
+range pattern becomes @dfn{turned on}. The range pattern matches this
+record. As long as it stays turned on, it automatically matches every
+input record read. It also matches @var{endpat} against
+every input record; when that succeeds, the range pattern is turned
+off again for the following record. Now it goes back to checking
+@var{begpat} against each record.
+
+The record that turns on the range pattern and the one that turns it
+off both match the range pattern. If you don't want to operate on
+these records, you can write @code{if} statements in the rule's action
+to distinguish them.
+
+It is possible for a pattern to be turned both on and off by the same
+record, if both conditions are satisfied by that record. Then the action is
+executed for just that record.
+
+@node BEGIN/END, Empty, Ranges, Patterns
+@section @code{BEGIN} and @code{END} Special Patterns
+
+@cindex @code{BEGIN} special pattern
+@cindex patterns, @code{BEGIN}
+@cindex @code{END} special pattern
+@cindex patterns, @code{END}
+@code{BEGIN} and @code{END} are special patterns. They are not used to
+match input records. Rather, they are used for supplying start-up or
+clean-up information to your @code{awk} script. A @code{BEGIN} rule is
+executed, once, before the first input record has been read. An @code{END}
+rule is executed, once, after all the input has been read. For
+example:@refill
+
+@example
+awk 'BEGIN @{ print "Analysis of `foo'" @}
+ /foo/ @{ ++foobar @}
+ END @{ print "`foo' appears " foobar " times." @}' BBS-list
+@end example
+
+This program finds the number of records in the input file @file{BBS-list}
+that contain the string @samp{foo}. The @code{BEGIN} rule prints a title
+for the report. There is no need to use the @code{BEGIN} rule to
+initialize the counter @code{foobar} to zero, as @code{awk} does this
+for us automatically (@pxref{Variables}).
+
+The second rule increments the variable @code{foobar} every time a
+record containing the pattern @samp{foo} is read. The @code{END} rule
+prints the value of @code{foobar} at the end of the run.@refill
+
+The special patterns @code{BEGIN} and @code{END} cannot be used in ranges
+or with boolean operators (indeed, they cannot be used with any operators).
+
+An @code{awk} program may have multiple @code{BEGIN} and/or @code{END}
+rules. They are executed in the order they appear, all the @code{BEGIN}
+rules at start-up and all the @code{END} rules at termination.
+
+Multiple @code{BEGIN} and @code{END} sections are useful for writing
+library functions, since each library can have its own @code{BEGIN} or
+@code{END} rule to do its own initialization and/or cleanup. Note that
+the order in which library functions are named on the command line
+controls the order in which their @code{BEGIN} and @code{END} rules are
+executed. Therefore you have to be careful to write such rules in
+library files so that the order in which they are executed doesn't matter.
+@xref{Command Line, ,Invoking @code{awk}}, for more information on
+using library functions.
+
+If an @code{awk} program only has a @code{BEGIN} rule, and no other
+rules, then the program exits after the @code{BEGIN} rule has been run.
+(Older versions of @code{awk} used to keep reading and ignoring input
+until end of file was seen.) However, if an @code{END} rule exists as
+well, then the input will be read, even if there are no other rules in
+the program. This is necessary in case the @code{END} rule checks the
+@code{NR} variable.
+
+@code{BEGIN} and @code{END} rules must have actions; there is no default
+action for these rules since there is no current record when they run.
+
+@node Empty, , BEGIN/END, Patterns
+@comment node-name, next, previous, up
+@section The Empty Pattern
+
+@cindex empty pattern
+@cindex pattern, empty
+An empty pattern is considered to match @emph{every} input record. For
+example, the program:@refill
+
+@example
+awk '@{ print $1 @}' BBS-list
+@end example
+
+@noindent
+prints the first field of every record.
+
+@node Actions, Expressions, Patterns, Top
+@chapter Overview of Actions
+@cindex action, definition of
+@cindex curly braces
+@cindex action, curly braces
+@cindex action, separating statements
+
+An @code{awk} program or script consists of a series of
+rules and function definitions, interspersed. (Functions are
+described later. @xref{User-defined, ,User-defined Functions}.)
+
+A rule contains a pattern and an action, either of which may be
+omitted. The purpose of the @dfn{action} is to tell @code{awk} what to do
+once a match for the pattern is found. Thus, the entire program
+looks somewhat like this:
+
+@example
+@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]}
+@r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]}
+@dots{}
+function @var{name} (@var{args}) @{ @dots{} @}
+@dots{}
+@end example
+
+An action consists of one or more @code{awk} @dfn{statements}, enclosed
+in curly braces (@samp{@{} and @samp{@}}). Each statement specifies one
+thing to be done. The statements are separated by newlines or
+semicolons.
+
+The curly braces around an action must be used even if the action
+contains only one statement, or even if it contains no statements at
+all. However, if you omit the action entirely, omit the curly braces as
+well. (An omitted action is equivalent to @samp{@{ print $0 @}}.)
+
+Here are the kinds of statements supported in @code{awk}:
+
+@itemize @bullet
+@item
+Expressions, which can call functions or assign values to variables
+(@pxref{Expressions, ,Expressions as Action Statements}). Executing
+this kind of statement simply computes the value of the expression and
+then ignores it. This is useful when the expression has side effects
+(@pxref{Assignment Ops, ,Assignment Expressions}).@refill
+
+@item
+Control statements, which specify the control flow of @code{awk}
+programs. The @code{awk} language gives you C-like constructs
+(@code{if}, @code{for}, @code{while}, and so on) as well as a few
+special ones (@pxref{Statements, ,Control Statements in Actions}).@refill
+
+@item
+Compound statements, which consist of one or more statements enclosed in
+curly braces. A compound statement is used in order to put several
+statements together in the body of an @code{if}, @code{while}, @code{do}
+or @code{for} statement.
+
+@item
+Input control, using the @code{getline} command
+(@pxref{Getline, ,Explicit Input with @code{getline}}), and the @code{next}
+statement (@pxref{Next Statement, ,The @code{next} Statement}).
+
+@item
+Output statements, @code{print} and @code{printf}.
+@xref{Printing, ,Printing Output}.@refill
+
+@item
+Deletion statements, for deleting array elements.
+@xref{Delete, ,The @code{delete} Statement}.@refill
+@end itemize
+
+@iftex
+The next two chapters cover in detail expressions and control
+statements, respectively. We go on to treat arrays and built-in
+functions, both of which are used in expressions. Then we proceed
+to discuss how to define your own functions.
+@end iftex
+
+@node Expressions, Statements, Actions, Top
+@chapter Expressions as Action Statements
+@cindex expression
+
+Expressions are the basic building block of @code{awk} actions. An
+expression evaluates to a value, which you can print, test, store in a
+variable or pass to a function. But beyond that, an expression can assign a new value to a variable
+or a field, with an assignment operator.
+
+An expression can serve as a statement on its own. Most other kinds of
+statements contain one or more expressions which specify data to be
+operated on. As in other languages, expressions in @code{awk} include
+variables, array references, constants, and function calls, as well as
+combinations of these with various operators.
+
+@menu
+* Constants:: String, numeric, and regexp constants.
+* Variables:: Variables give names to values for later use.
+* Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-}, etc.)
+* Concatenation:: Concatenating strings.
+* Comparison Ops:: Comparison of numbers and strings
+ with @samp{<}, etc.
+* Boolean Ops:: Combining comparison expressions
+ using boolean operators
+ @samp{||} (``or''), @samp{&&} (``and'') and @samp{!} (``not'').
+
+* Assignment Ops:: Changing the value of a variable or a field.
+* Increment Ops:: Incrementing the numeric value of a variable.
+
+* Conversion:: The conversion of strings to numbers
+ and vice versa.
+* Values:: The whole truth about numbers and strings.
+* Conditional Exp:: Conditional expressions select
+ between two subexpressions under control
+ of a third subexpression.
+* Function Calls:: A function call is an expression.
+* Precedence:: How various operators nest.
+@end menu
+
+@node Constants, Variables, Expressions, Expressions
+@section Constant Expressions
+@cindex constants, types of
+@cindex string constants
+
+The simplest type of expression is the @dfn{constant}, which always has
+the same value. There are three types of constants: numeric constants,
+string constants, and regular expression constants.
+
+@cindex numeric constant
+@cindex numeric value
+A @dfn{numeric constant} stands for a number. This number can be an
+integer, a decimal fraction, or a number in scientific (exponential)
+notation. Note that all numeric values are represented within
+@code{awk} in double-precision floating point. Here are some examples
+of numeric constants, which all have the same value:
+
+@example
+105
+1.05e+2
+1050e-1
+@end example
+
+A string constant consists of a sequence of characters enclosed in
+double-quote marks. For example:
+
+@example
+"parrot"
+@end example
+
+@noindent
+@iftex
+@cindex differences between @code{gawk} and @code{awk}
+@end iftex
+represents the string whose contents are @samp{parrot}. Strings in
+@code{gawk} can be of any length and they can contain all the possible
+8-bit ASCII characters including ASCII NUL. Other @code{awk}
+implementations may have difficulty with some character codes.@refill
+
+@cindex escape sequence notation
+Some characters cannot be included literally in a string constant. You
+represent them instead with @dfn{escape sequences}, which are character
+sequences beginning with a backslash (@samp{\}).
+
+One use of an escape sequence is to include a double-quote character in
+a string constant. Since a plain double-quote would end the string, you
+must use @samp{\"} to represent a single double-quote character as a
+part of the string.
+The
+backslash character itself is another character that cannot be
+included normally; you write @samp{\\} to put one backslash in the
+string. Thus, the string whose contents are the two characters
+@samp{"\} must be written @code{"\"\\"}.
+
+Another use of backslash is to represent unprintable characters
+such as newline. While there is nothing to stop you from writing most
+of these characters directly in a string constant, they may look ugly.
+
+Here is a table of all the escape sequences used in @code{awk}:
+
+@table @code
+@item \\
+Represents a literal backslash, @samp{\}.
+
+@item \a
+Represents the ``alert'' character, control-g, ASCII code 7.
+
+@item \b
+Represents a backspace, control-h, ASCII code 8.
+
+@item \f
+Represents a formfeed, control-l, ASCII code 12.
+
+@item \n
+Represents a newline, control-j, ASCII code 10.
+
+@item \r
+Represents a carriage return, control-m, ASCII code 13.
+
+@item \t
+Represents a horizontal tab, control-i, ASCII code 9.
+
+@item \v
+Represents a vertical tab, control-k, ASCII code 11.
+
+@item \@var{nnn}
+Represents the octal value @var{nnn}, where @var{nnn} are one to three
+digits between 0 and 7. For example, the code for the ASCII ESC
+(escape) character is @samp{\033}.@refill
+
+@item \x@var{hh}@dots{}
+Represents the hexadecimal value @var{hh}, where @var{hh} are hexadecimal
+digits (@samp{0} through @samp{9} and either @samp{A} through @samp{F} or
+@samp{a} through @samp{f}). Like the same construct in @sc{ansi} C, the escape
+sequence continues until the first non-hexadecimal digit is seen. However,
+using more than two hexadecimal digits produces undefined results. (The
+@samp{\x} escape sequence is not allowed in @sc{posix} @code{awk}.)@refill
+@end table
+
+A @dfn{constant regexp} is a regular expression description enclosed in
+slashes, such as @code{/^beginning and end$/}. Most regexps used in
+@code{awk} programs are constant, but the @samp{~} and @samp{!~}
+operators can also match computed or ``dynamic'' regexps
+(@pxref{Regexp Usage, ,How to Use Regular Expressions}).@refill
+
+Constant regexps may be used like simple expressions. When a
+constant regexp is not on the right hand side of the @samp{~} or
+@samp{!~} operators, it has the same meaning as if it appeared
+in a pattern, i.e. @samp{($0 ~ /foo/)}
+(@pxref{Expression Patterns, ,Expressions as Patterns}).
+This means that the two code segments,@refill
+
+@example
+if ($0 ~ /barfly/ || $0 ~ /camelot/)
+ print "found"
+@end example
+
+@noindent
+and
+
+@example
+if (/barfly/ || /camelot/)
+ print "found"
+@end example
+
+@noindent
+are exactly equivalent. One rather bizarre consequence of this rule is
+that the following boolean expression is legal, but does not do what the user
+intended:@refill
+
+@example
+if (/foo/ ~ $1) print "found foo"
+@end example
+
+This code is ``obviously'' testing @code{$1} for a match against the regexp
+@code{/foo/}. But in fact, the expression @code{(/foo/ ~ $1)} actually means
+@code{(($0 ~ /foo/) ~ $1)}. In other words, first match the input record
+against the regexp @code{/foo/}. The result will be either a 0 or a 1,
+depending upon the success or failure of the match. Then match that result
+against the first field in the record.@refill
+
+Since it is unlikely that you would ever really wish to make this kind of
+test, @code{gawk} will issue a warning when it sees this construct in
+a program.@refill
+
+Another consequence of this rule is that the assignment statement
+
+@example
+matches = /foo/
+@end example
+
+@noindent
+will assign either 0 or 1 to the variable @code{matches}, depending
+upon the contents of the current input record.
+
+Constant regular expressions are also used as the first argument for
+the @code{sub} and @code{gsub} functions
+(@pxref{String Functions, ,Built-in Functions for String Manipulation}).@refill
+
+This feature of the language was never well documented until the
+@sc{posix} specification.
+
+You may be wondering, when is
+
+@example
+$1 ~ /foo/ @{ @dots{} @}
+@end example
+
+@noindent
+preferable to
+
+@example
+$1 ~ "foo" @{ @dots{} @}
+@end example
+
+Since the right-hand sides of both @samp{~} operators are constants,
+it is more efficient to use the @samp{/foo/} form: @code{awk} can note
+that you have supplied a regexp and store it internally in a form that
+makes pattern matching more efficient. In the second form, @code{awk}
+must first convert the string into this internal form, and then perform
+the pattern matching. The first form is also better style; it shows
+clearly that you intend a regexp match.
+
+@node Variables, Arithmetic Ops, Constants, Expressions
+@section Variables
+@cindex variables, user-defined
+@cindex user-defined variables
+@c there should be more than one subsection, ideally. Not a big deal.
+@c But usually there are supposed to be at least two. One way to get
+@c around this is to write the info in the subsection as the info in the
+@c section itself and not have any subsections.. --mew
+
+Variables let you give names to values and refer to them later. You have
+already seen variables in many of the examples. The name of a variable
+must be a sequence of letters, digits and underscores, but it may not begin
+with a digit. Case is significant in variable names; @code{a} and @code{A}
+are distinct variables.
+
+A variable name is a valid expression by itself; it represents the
+variable's current value. Variables are given new values with
+@dfn{assignment operators} and @dfn{increment operators}.
+@xref{Assignment Ops, ,Assignment Expressions}.
+
+A few variables have special built-in meanings, such as @code{FS}, the
+field separator, and @code{NF}, the number of fields in the current
+input record. @xref{Built-in Variables}, for a list of them. These
+built-in variables can be used and assigned just like all other
+variables, but their values are also used or changed automatically by
+@code{awk}. Each built-in variable's name is made entirely of upper case
+letters.
+
+Variables in @code{awk} can be assigned either numeric or string
+values. By default, variables are initialized to the null string, which
+is effectively zero if converted to a number. There is no need to
+``initialize'' each variable explicitly in @code{awk}, the way you would in C or most other traditional languages.
+
+@menu
+* Assignment Options:: Setting variables on the command line
+ and a summary of command line syntax.
+ This is an advanced method of input.
+@end menu
+
+@node Assignment Options, , Variables, Variables
+@subsection Assigning Variables on the Command Line
+
+You can set any @code{awk} variable by including a @dfn{variable assignment}
+among the arguments on the command line when you invoke @code{awk}
+(@pxref{Command Line, ,Invoking @code{awk}}). Such an assignment has
+this form:@refill
+
+@example
+@var{variable}=@var{text}
+@end example
+
+@noindent
+With it, you can set a variable either at the beginning of the
+@code{awk} run or in between input files.
+
+If you precede the assignment with the @samp{-v} option, like this:
+
+@example
+-v @var{variable}=@var{text}
+@end example
+
+@noindent
+then the variable is set at the very beginning, before even the
+@code{BEGIN} rules are run. The @samp{-v} option and its assignment
+must precede all the file name arguments, as well as the program text.
+
+Otherwise, the variable assignment is performed at a time determined by
+its position among the input file arguments: after the processing of the
+preceding input file argument. For example:
+
+@example
+awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list
+@end example
+
+@noindent
+prints the value of field number @code{n} for all input records. Before
+the first file is read, the command line sets the variable @code{n}
+equal to 4. This causes the fourth field to be printed in lines from
+the file @file{inventory-shipped}. After the first file has finished,
+but before the second file is started, @code{n} is set to 2, so that the
+second field is printed in lines from @file{BBS-list}.
+
+Command line arguments are made available for explicit examination by
+the @code{awk} program in an array named @code{ARGV}
+(@pxref{Built-in Variables}).@refill
+
+@code{awk} processes the values of command line assignments for escape
+sequences (@pxref{Constants, ,Constant Expressions}).
+
+@node Arithmetic Ops, Concatenation, Variables, Expressions
+@section Arithmetic Operators
+@cindex arithmetic operators
+@cindex operators, arithmetic
+@cindex addition
+@cindex subtraction
+@cindex multiplication
+@cindex division
+@cindex remainder
+@cindex quotient
+@cindex exponentiation
+
+The @code{awk} language uses the common arithmetic operators when
+evaluating expressions. All of these arithmetic operators follow normal
+precedence rules, and work as you would expect them to. This example
+divides field three by field four, adds field two, stores the result
+into field one, and prints the resulting altered input record:
+
+@example
+awk '@{ $1 = $2 + $3 / $4; print @}' inventory-shipped
+@end example
+
+The arithmetic operators in @code{awk} are:
+
+@table @code
+@item @var{x} + @var{y}
+Addition.
+
+@item @var{x} - @var{y}
+Subtraction.
+
+@item - @var{x}
+Negation.
+
+@item + @var{x}
+Unary plus. No real effect on the expression.
+
+@item @var{x} * @var{y}
+Multiplication.
+
+@item @var{x} / @var{y}
+Division. Since all numbers in @code{awk} are double-precision
+floating point, the result is not rounded to an integer: @code{3 / 4}
+has the value 0.75.
+
+@item @var{x} % @var{y}
+@iftex
+@cindex differences between @code{gawk} and @code{awk}
+@end iftex
+Remainder. The quotient is rounded toward zero to an integer,
+multiplied by @var{y} and this result is subtracted from @var{x}.
+This operation is sometimes known as ``trunc-mod.'' The following
+relation always holds:
+
+@example
+b * int(a / b) + (a % b) == a
+@end example
+
+One possibly undesirable effect of this definition of remainder is that
+@code{@var{x} % @var{y}} is negative if @var{x} is negative. Thus,
+
+@example
+-17 % 8 = -1
+@end example
+
+In other @code{awk} implementations, the signedness of the remainder
+may be machine dependent.
+
+@item @var{x} ^ @var{y}
+@itemx @var{x} ** @var{y}
+Exponentiation: @var{x} raised to the @var{y} power. @code{2 ^ 3} has
+the value 8. The character sequence @samp{**} is equivalent to
+@samp{^}. (The @sc{posix} standard only specifies the use of @samp{^}
+for exponentiation.)
+@end table
+
+@node Concatenation, Comparison Ops, Arithmetic Ops, Expressions
+@section String Concatenation
+
+@cindex string operators
+@cindex operators, string
+@cindex concatenation
+There is only one string operation: concatenation. It does not have a
+specific operator to represent it. Instead, concatenation is performed by
+writing expressions next to one another, with no operator. For example:
+
+@example
+awk '@{ print "Field number one: " $1 @}' BBS-list
+@end example
+
+@noindent
+produces, for the first record in @file{BBS-list}:
+
+@example
+Field number one: aardvark
+@end example
+
+Without the space in the string constant after the @samp{:}, the line
+would run together. For example:
+
+@example
+awk '@{ print "Field number one:" $1 @}' BBS-list
+@end example
+
+@noindent
+produces, for the first record in @file{BBS-list}:
+
+@example
+Field number one:aardvark
+@end example
+
+Since string concatenation does not have an explicit operator, it is
+often necessary to insure that it happens where you want it to by
+enclosing the items to be concatenated in parentheses. For example, the
+following code fragment does not concatenate @code{file} and @code{name}
+as you might expect:
+
+@example
+file = "file"
+name = "name"
+print "something meaningful" > file name
+@end example
+
+@noindent
+It is necessary to use the following:
+
+@example
+print "something meaningful" > (file name)
+@end example
+
+We recommend you use parentheses around concatenation in all but the
+most common contexts (such as in the right-hand operand of @samp{=}).
+
+@ignore
+@code{gawk} actually now allows a concatenation on the right hand
+side of a @code{>} redirection, but other @code{awk}s don't. So for
+now we won't mention that fact.
+@end ignore
+
+@node Comparison Ops, Boolean Ops, Concatenation, Expressions
+@section Comparison Expressions
+@cindex comparison expressions
+@cindex expressions, comparison
+@cindex relational operators
+@cindex operators, relational
+@cindex regexp operators
+
+@dfn{Comparison expressions} compare strings or numbers for
+relationships such as equality. They are written using @dfn{relational
+operators}, which are a superset of those in C. Here is a table of
+them:
+
+@table @code
+@item @var{x} < @var{y}
+True if @var{x} is less than @var{y}.
+
+@item @var{x} <= @var{y}
+True if @var{x} is less than or equal to @var{y}.
+
+@item @var{x} > @var{y}
+True if @var{x} is greater than @var{y}.
+
+@item @var{x} >= @var{y}
+True if @var{x} is greater than or equal to @var{y}.
+
+@item @var{x} == @var{y}
+True if @var{x} is equal to @var{y}.
+
+@item @var{x} != @var{y}
+True if @var{x} is not equal to @var{y}.
+
+@item @var{x} ~ @var{y}
+True if the string @var{x} matches the regexp denoted by @var{y}.
+
+@item @var{x} !~ @var{y}
+True if the string @var{x} does not match the regexp denoted by @var{y}.
+
+@item @var{subscript} in @var{array}
+True if array @var{array} has an element with the subscript @var{subscript}.
+@end table
+
+Comparison expressions have the value 1 if true and 0 if false.
+
+The rules @code{gawk} uses for performing comparisons are based on those
+in draft 11.2 of the @sc{posix} standard. The @sc{posix} standard introduced
+the concept of a @dfn{numeric string}, which is simply a string that looks
+like a number, for example, @code{@w{" +2"}}.
+
+@vindex CONVFMT
+When performing a relational operation, @code{gawk} considers the type of an
+operand to be the type it received on its last @emph{assignment}, rather
+than the type of its last @emph{use}
+(@pxref{Values, ,Numeric and String Values}).
+This type is @emph{unknown} when the operand is from an ``external'' source:
+field variables, command line arguments, array elements resulting from a
+@code{split} operation, and the value of an @code{ENVIRON} element.
+In this case only, if the operand is a numeric string, then it is
+considered to be of both string type and numeric type. If at least one
+operand of a comparison is of string type only, then a string
+comparison is performed. Any numeric operand will be converted to a
+string using the value of @code{CONVFMT}
+(@pxref{Conversion, ,Conversion of Strings and Numbers}).
+If one operand of a comparison is numeric, and the other operand is
+either numeric or both numeric and string, then @code{gawk} does a
+numeric comparison. If both operands have both types, then the
+comparison is numeric. Strings are compared
+by comparing the first character of each, then the second character of each,
+and so on. Thus @code{"10"} is less than @code{"9"}. If there are two
+strings where one is a prefix of the other, the shorter string is less than
+the longer one. Thus @code{"abc"} is less than @code{"abcd"}.@refill
+
+Here are some sample expressions, how @code{gawk} compares them, and what
+the result of the comparison is.
+
+@table @code
+@item 1.5 <= 2.0
+numeric comparison (true)
+
+@item "abc" >= "xyz"
+string comparison (false)
+
+@item 1.5 != " +2"
+string comparison (true)
+
+@item "1e2" < "3"
+string comparison (true)
+
+@item a = 2; b = "2"
+@itemx a == b
+string comparison (true)
+@end table
+
+@example
+echo 1e2 3 | awk '@{ print ($1 < $2) ? "true" : "false" @}'
+@end example
+
+@noindent
+prints @samp{false} since both @code{$1} and @code{$2} are numeric
+strings and thus have both string and numeric types, thus dictating
+a numeric comparison.
+
+The purpose of the comparison rules and the use of numeric strings is
+to attempt to produce the behavior that is ``least surprising,'' while
+still ``doing the right thing.''
+
+String comparisons and regular expression comparisons are very different.
+For example,
+
+@example
+$1 == "foo"
+@end example
+
+@noindent
+has the value of 1, or is true, if the first field of the current input
+record is precisely @samp{foo}. By contrast,
+
+@example
+$1 ~ /foo/
+@end example
+
+@noindent
+has the value 1 if the first field contains @samp{foo}, such as @samp{foobar}.
+
+The right hand operand of the @samp{~} and @samp{!~} operators may be
+either a constant regexp (@code{/@dots{}/}), or it may be an ordinary
+expression, in which case the value of the expression as a string is a
+dynamic regexp (@pxref{Regexp Usage, ,How to Use Regular Expressions}).
+
+@cindex regexp as expression
+In very recent implementations of @code{awk}, a constant regular
+expression in slashes by itself is also an expression. The regexp
+@code{/@var{regexp}/} is an abbreviation for this comparison expression:
+
+@example
+$0 ~ /@var{regexp}/
+@end example
+
+In some contexts it may be necessary to write parentheses around the
+regexp to avoid confusing the @code{gawk} parser. For example,
+@code{(/x/ - /y/) > threshold} is not allowed, but @code{((/x/) - (/y/))
+> threshold} parses properly.
+
+One special place where @code{/foo/} is @emph{not} an abbreviation for
+@code{$0 ~ /foo/} is when it is the right-hand operand of @samp{~} or
+@samp{!~}! @xref{Constants, ,Constant Expressions}, where this is
+discussed in more detail.
+
+@node Boolean Ops, Assignment Ops, Comparison Ops, Expressions
+@section Boolean Expressions
+@cindex expressions, boolean
+@cindex boolean expressions
+@cindex operators, boolean
+@cindex boolean operators
+@cindex logical operations
+@cindex and operator
+@cindex or operator
+@cindex not operator
+
+A @dfn{boolean expression} is a combination of comparison expressions or
+matching expressions, using the boolean operators ``or''
+(@samp{||}), ``and'' (@samp{&&}), and ``not'' (@samp{!}), along with
+parentheses to control nesting. The truth of the boolean expression is
+computed by combining the truth values of the component expressions.
+
+Boolean expressions can be used wherever comparison and matching
+expressions can be used. They can be used in @code{if}, @code{while}
+@code{do} and @code{for} statements. They have numeric values (1 if true,
+0 if false), which come into play if the result of the boolean expression
+is stored in a variable, or used in arithmetic.@refill
+
+In addition, every boolean expression is also a valid boolean pattern, so
+you can use it as a pattern to control the execution of rules.
+
+Here are descriptions of the three boolean operators, with an example of
+each. It may be instructive to compare these examples with the
+analogous examples of boolean patterns
+(@pxref{Boolean Patterns, ,Boolean Operators and Patterns}), which
+use the same boolean operators in patterns instead of expressions.@refill
+
+@table @code
+@item @var{boolean1} && @var{boolean2}
+True if both @var{boolean1} and @var{boolean2} are true. For example,
+the following statement prints the current input record if it contains
+both @samp{2400} and @samp{foo}.@refill
+
+@smallexample
+if ($0 ~ /2400/ && $0 ~ /foo/) print
+@end smallexample
+
+The subexpression @var{boolean2} is evaluated only if @var{boolean1}
+is true. This can make a difference when @var{boolean2} contains
+expressions that have side effects: in the case of @code{$0 ~ /foo/ &&
+($2 == bar++)}, the variable @code{bar} is not incremented if there is
+no @samp{foo} in the record.
+
+@item @var{boolean1} || @var{boolean2}
+True if at least one of @var{boolean1} or @var{boolean2} is true.
+For example, the following command prints all records in the input
+file @file{BBS-list} that contain @emph{either} @samp{2400} or
+@samp{foo}, or both.@refill
+
+@smallexample
+awk '@{ if ($0 ~ /2400/ || $0 ~ /foo/) print @}' BBS-list
+@end smallexample
+
+The subexpression @var{boolean2} is evaluated only if @var{boolean1}
+is false. This can make a difference when @var{boolean2} contains
+expressions that have side effects.
+
+@item !@var{boolean}
+True if @var{boolean} is false. For example, the following program prints
+all records in the input file @file{BBS-list} that do @emph{not} contain the
+string @samp{foo}.
+
+@smallexample
+awk '@{ if (! ($0 ~ /foo/)) print @}' BBS-list
+@end smallexample
+@end table
+
+@node Assignment Ops, Increment Ops, Boolean Ops, Expressions
+@section Assignment Expressions
+@cindex assignment operators
+@cindex operators, assignment
+@cindex expressions, assignment
+
+An @dfn{assignment} is an expression that stores a new value into a
+variable. For example, let's assign the value 1 to the variable
+@code{z}:@refill
+
+@example
+z = 1
+@end example
+
+After this expression is executed, the variable @code{z} has the value 1.
+Whatever old value @code{z} had before the assignment is forgotten.
+
+Assignments can store string values also. For example, this would store
+the value @code{"this food is good"} in the variable @code{message}:
+
+@example
+thing = "food"
+predicate = "good"
+message = "this " thing " is " predicate
+@end example
+
+@noindent
+(This also illustrates concatenation of strings.)
+
+The @samp{=} sign is called an @dfn{assignment operator}. It is the
+simplest assignment operator because the value of the right-hand
+operand is stored unchanged.
+
+@cindex side effect
+Most operators (addition, concatenation, and so on) have no effect
+except to compute a value. If you ignore the value, you might as well
+not use the operator. An assignment operator is different; it does
+produce a value, but even if you ignore the value, the assignment still
+makes itself felt through the alteration of the variable. We call this
+a @dfn{side effect}.
+
+@cindex lvalue
+The left-hand operand of an assignment need not be a variable
+(@pxref{Variables}); it can also be a field
+(@pxref{Changing Fields, ,Changing the Contents of a Field}) or
+an array element (@pxref{Arrays, ,Arrays in @code{awk}}).
+These are all called @dfn{lvalues},
+which means they can appear on the left-hand side of an assignment operator.
+The right-hand operand may be any expression; it produces the new value
+which the assignment stores in the specified variable, field or array
+element.@refill
+
+It is important to note that variables do @emph{not} have permanent types.
+The type of a variable is simply the type of whatever value it happens
+to hold at the moment. In the following program fragment, the variable
+@code{foo} has a numeric value at first, and a string value later on:
+
+@example
+foo = 1
+print foo
+foo = "bar"
+print foo
+@end example
+
+@noindent
+When the second assignment gives @code{foo} a string value, the fact that
+it previously had a numeric value is forgotten.
+
+An assignment is an expression, so it has a value: the same value that
+is assigned. Thus, @code{z = 1} as an expression has the value 1.
+One consequence of this is that you can write multiple assignments together:
+
+@example
+x = y = z = 0
+@end example
+
+@noindent
+stores the value 0 in all three variables. It does this because the
+value of @code{z = 0}, which is 0, is stored into @code{y}, and then
+the value of @code{y = z = 0}, which is 0, is stored into @code{x}.
+
+You can use an assignment anywhere an expression is called for. For
+example, it is valid to write @code{x != (y = 1)} to set @code{y} to 1
+and then test whether @code{x} equals 1. But this style tends to make
+programs hard to read; except in a one-shot program, you should
+rewrite it to get rid of such nesting of assignments. This is never very
+hard.
+
+Aside from @samp{=}, there are several other assignment operators that
+do arithmetic with the old value of the variable. For example, the
+operator @samp{+=} computes a new value by adding the right-hand value
+to the old value of the variable. Thus, the following assignment adds
+5 to the value of @code{foo}:
+
+@example
+foo += 5
+@end example
+
+@noindent
+This is precisely equivalent to the following:
+
+@example
+foo = foo + 5
+@end example
+
+@noindent
+Use whichever one makes the meaning of your program clearer.
+
+Here is a table of the arithmetic assignment operators. In each
+case, the right-hand operand is an expression whose value is converted
+to a number.
+
+@table @code
+@item @var{lvalue} += @var{increment}
+Adds @var{increment} to the value of @var{lvalue} to make the new value
+of @var{lvalue}.
+
+@item @var{lvalue} -= @var{decrement}
+Subtracts @var{decrement} from the value of @var{lvalue}.
+
+@item @var{lvalue} *= @var{coefficient}
+Multiplies the value of @var{lvalue} by @var{coefficient}.
+
+@item @var{lvalue} /= @var{quotient}
+Divides the value of @var{lvalue} by @var{quotient}.
+
+@item @var{lvalue} %= @var{modulus}
+Sets @var{lvalue} to its remainder by @var{modulus}.
+
+@item @var{lvalue} ^= @var{power}
+@itemx @var{lvalue} **= @var{power}
+Raises @var{lvalue} to the power @var{power}.
+(Only the @code{^=} operator is specified by @sc{posix}.)
+@end table
+
+@ignore
+From: gatech!ames!elroy!cit-vax!EQL.Caltech.Edu!rankin (Pat Rankin)
+ In the discussion of assignment operators, it states that
+``foo += 5'' "is precisely equivalent to" ``foo = foo + 5'' (p.77). That
+may be true for simple variables, but it's not true for expressions with
+side effects, like array references. For proof, try
+ BEGIN {
+ foo[rand()] += 5; for (x in foo) print x, foo[x]
+ bar[rand()] = bar[rand()] + 5; for (x in bar) print x, bar[x]
+ }
+I suspect that the original statement is simply untrue--that '+=' is more
+efficient in all cases.
+
+ADR --- Try to add something about this here for the next go 'round.
+@end ignore
+
+@node Increment Ops, Conversion, Assignment Ops, Expressions
+@section Increment Operators
+
+@cindex increment operators
+@cindex operators, increment
+@dfn{Increment operators} increase or decrease the value of a variable
+by 1. You could do the same thing with an assignment operator, so
+the increment operators add no power to the @code{awk} language; but they
+are convenient abbreviations for something very common.
+
+The operator to add 1 is written @samp{++}. It can be used to increment
+a variable either before or after taking its value.
+
+To pre-increment a variable @var{v}, write @code{++@var{v}}. This adds
+1 to the value of @var{v} and that new value is also the value of this
+expression. The assignment expression @code{@var{v} += 1} is completely
+equivalent.
+
+Writing the @samp{++} after the variable specifies post-increment. This
+increments the variable value just the same; the difference is that the
+value of the increment expression itself is the variable's @emph{old}
+value. Thus, if @code{foo} has the value 4, then the expression @code{foo++}
+has the value 4, but it changes the value of @code{foo} to 5.
+
+The post-increment @code{foo++} is nearly equivalent to writing @code{(foo
++= 1) - 1}. It is not perfectly equivalent because all numbers in
+@code{awk} are floating point: in floating point, @code{foo + 1 - 1} does
+not necessarily equal @code{foo}. But the difference is minute as
+long as you stick to numbers that are fairly small (less than a trillion).
+
+Any lvalue can be incremented. Fields and array elements are incremented
+just like variables. (Use @samp{$(i++)} when you wish to do a field reference
+and a variable increment at the same time. The parentheses are necessary
+because of the precedence of the field reference operator, @samp{$}.)
+@c expert information in the last parenthetical remark
+
+The decrement operator @samp{--} works just like @samp{++} except that
+it subtracts 1 instead of adding. Like @samp{++}, it can be used before
+the lvalue to pre-decrement or after it to post-decrement.
+
+Here is a summary of increment and decrement expressions.
+
+@table @code
+@item ++@var{lvalue}
+This expression increments @var{lvalue} and the new value becomes the
+value of this expression.
+
+@item @var{lvalue}++
+This expression causes the contents of @var{lvalue} to be incremented.
+The value of the expression is the @emph{old} value of @var{lvalue}.
+
+@item --@var{lvalue}
+Like @code{++@var{lvalue}}, but instead of adding, it subtracts. It
+decrements @var{lvalue} and delivers the value that results.
+
+@item @var{lvalue}--
+Like @code{@var{lvalue}++}, but instead of adding, it subtracts. It
+decrements @var{lvalue}. The value of the expression is the @emph{old}
+value of @var{lvalue}.
+@end table
+
+@node Conversion, Values, Increment Ops, Expressions
+@section Conversion of Strings and Numbers
+
+@cindex conversion of strings and numbers
+Strings are converted to numbers, and numbers to strings, if the context
+of the @code{awk} program demands it. For example, if the value of
+either @code{foo} or @code{bar} in the expression @code{foo + bar}
+happens to be a string, it is converted to a number before the addition
+is performed. If numeric values appear in string concatenation, they
+are converted to strings. Consider this:@refill
+
+@example
+two = 2; three = 3
+print (two three) + 4
+@end example
+
+@noindent
+This eventually prints the (numeric) value 27. The numeric values of
+the variables @code{two} and @code{three} are converted to strings and
+concatenated together, and the resulting string is converted back to the
+number 23, to which 4 is then added.
+
+If, for some reason, you need to force a number to be converted to a
+string, concatenate the null string with that number. To force a string
+to be converted to a number, add zero to that string.
+
+A string is converted to a number by interpreting a numeric prefix
+of the string as numerals:
+@code{"2.5"} converts to 2.5, @code{"1e3"} converts to 1000, and @code{"25fix"}
+has a numeric value of 25.
+Strings that can't be interpreted as valid numbers are converted to
+zero.
+
+@vindex CONVFMT
+The exact manner in which numbers are converted into strings is controlled
+by the @code{awk} built-in variable @code{CONVFMT} (@pxref{Built-in Variables}).
+Numbers are converted using a special version of the @code{sprintf} function
+(@pxref{Built-in, ,Built-in Functions}) with @code{CONVFMT} as the format
+specifier.@refill
+
+@code{CONVFMT}'s default value is @code{"%.6g"}, which prints a value with
+at least six significant digits. For some applications you will want to
+change it to specify more precision. Double precision on most modern
+machines gives you 16 or 17 decimal digits of precision.
+
+Strange results can happen if you set @code{CONVFMT} to a string that doesn't
+tell @code{sprintf} how to format floating point numbers in a useful way.
+For example, if you forget the @samp{%} in the format, all numbers will be
+converted to the same constant string.@refill
+
+As a special case, if a number is an integer, then the result of converting
+it to a string is @emph{always} an integer, no matter what the value of
+@code{CONVFMT} may be. Given the following code fragment:
+
+@example
+CONVFMT = "%2.2f"
+a = 12
+b = a ""
+@end example
+
+@noindent
+@code{b} has the value @code{"12"}, not @code{"12.00"}.
+
+@ignore
+For the 2.14 version, describe the ``stickyness'' of conversions. Right now
+the manual assumes everywhere that variables are either numbers or strings;
+in fact both kinds of values may be valid. If both happen to be valid, a
+conversion isn't necessary and isn't done. Revising the manual to be
+consistent with this, though, is too big a job to tackle at the moment.
+
+7/92: This has sort of been done, only the section isn't completely right!
+ What to do?
+7/92: Pretty much fixed, at least for the short term, thanks to text
+ from David.
+@end ignore
+
+@vindex OFMT
+Prior to the @sc{posix} standard, @code{awk} specified that the value
+of @code{OFMT} was used for converting numbers to strings. @code{OFMT}
+specifies the output format to use when printing numbers with @code{print}.
+@code{CONVFMT} was introduced in order to separate the semantics of
+conversions from the semantics of printing. Both @code{CONVFMT} and
+@code{OFMT} have the same default value: @code{"%.6g"}. In the vast majority
+of cases, old @code{awk} programs will not change their behavior.
+However, this use of @code{OFMT} is something to keep in mind if you must
+port your program to other implementations of @code{awk}; we recommend
+that instead of changing your programs, you just port @code{gawk} itself!@refill
+
+@node Values, Conditional Exp, Conversion, Expressions
+@section Numeric and String Values
+@cindex conversion of strings and numbers
+
+Through most of this manual, we present @code{awk} values (such as constants,
+fields, or variables) as @emph{either} numbers @emph{or} strings. This is
+a convenient way to think about them, since typically they are used in only
+one way, or the other.
+
+In truth though, @code{awk} values can be @emph{both} string and
+numeric, at the same time. Internally, @code{awk} represents values
+with a string, a (floating point) number, and an indication that one,
+the other, or both representations of the value are valid.
+
+Keeping track of both kinds of values is important for execution
+efficiency: a variable can acquire a string value the first time it
+is used as a string, and then that string value can be used until the
+variable is assigned a new value. Thus, if a variable with only a numeric
+value is used in several concatenations in a row, it only has to be given
+a string representation once. The numeric value remains valid, so that
+no conversion back to a number is necessary if the variable is later used
+in an arithmetic expression.
+
+Tracking both kinds of values is also important for precise numerical
+calculations. Consider the following:
+
+@smallexample
+a = 123.321
+CONVFMT = "%3.1f"
+b = a " is a number"
+c = a + 1.654
+@end smallexample
+
+@noindent
+The variable @code{a} receives a string value in the concatenation and
+assignment to @code{b}. The string value of @code{a} is @code{"123.3"}.
+If the numeric value was lost when it was converted to a string, then the
+numeric use of @code{a} in the last statement would lose information.
+@code{c} would be assigned the value 124.954 instead of 124.975.
+Such errors accumulate rapidly, and very adversely affect numeric
+computations.@refill
+
+Once a numeric value acquires a corresponding string value, it stays valid
+until a new assignment is made. If @code{CONVFMT}
+(@pxref{Conversion, ,Conversion of Strings and Numbers}) changes in the
+meantime, the old string value will still be used. For example:@refill
+
+@smallexample
+BEGIN @{
+ CONVFMT = "%2.2f"
+ a = 123.456
+ b = a "" # force `a' to have string value too
+ printf "a = %s\n", a
+ CONVFMT = "%.6g"
+ printf "a = %s\n", a
+ a += 0 # make `a' numeric only again
+ printf "a = %s\n", a # use `a' as string
+@}
+@end smallexample
+
+@noindent
+This program prints @samp{a = 123.46} twice, and then prints
+@samp{a = 123.456}.
+
+@xref{Conversion, ,Conversion of Strings and Numbers}, for the rules that
+specify how string values are made from numeric values.
+
+@node Conditional Exp, Function Calls, Values, Expressions
+@section Conditional Expressions
+@cindex conditional expression
+@cindex expression, conditional
+
+A @dfn{conditional expression} is a special kind of expression with
+three operands. It allows you to use one expression's value to select
+one of two other expressions.
+
+The conditional expression looks the same as in the C language:
+
+@example
+@var{selector} ? @var{if-true-exp} : @var{if-false-exp}
+@end example
+
+@noindent
+There are three subexpressions. The first, @var{selector}, is always
+computed first. If it is ``true'' (not zero and not null) then
+@var{if-true-exp} is computed next and its value becomes the value of
+the whole expression. Otherwise, @var{if-false-exp} is computed next
+and its value becomes the value of the whole expression.@refill
+
+For example, this expression produces the absolute value of @code{x}:
+
+@example
+x > 0 ? x : -x
+@end example
+
+Each time the conditional expression is computed, exactly one of
+@var{if-true-exp} and @var{if-false-exp} is computed; the other is ignored.
+This is important when the expressions contain side effects. For example,
+this conditional expression examines element @code{i} of either array
+@code{a} or array @code{b}, and increments @code{i}.
+
+@example
+x == y ? a[i++] : b[i++]
+@end example
+
+@noindent
+This is guaranteed to increment @code{i} exactly once, because each time
+one or the other of the two increment expressions is executed,
+and the other is not.
+
+@node Function Calls, Precedence, Conditional Exp, Expressions
+@section Function Calls
+@cindex function call
+@cindex calling a function
+
+A @dfn{function} is a name for a particular calculation. Because it has
+a name, you can ask for it by name at any point in the program. For
+example, the function @code{sqrt} computes the square root of a number.
+
+A fixed set of functions are @dfn{built-in}, which means they are
+available in every @code{awk} program. The @code{sqrt} function is one
+of these. @xref{Built-in, ,Built-in Functions}, for a list of built-in
+functions and their descriptions. In addition, you can define your own
+functions in the program for use elsewhere in the same program.
+@xref{User-defined, ,User-defined Functions}, for how to do this.@refill
+
+@cindex arguments in function call
+The way to use a function is with a @dfn{function call} expression,
+which consists of the function name followed by a list of
+@dfn{arguments} in parentheses. The arguments are expressions which
+give the raw materials for the calculation that the function will do.
+When there is more than one argument, they are separated by commas. If
+there are no arguments, write just @samp{()} after the function name.
+Here are some examples:
+
+@example
+sqrt(x^2 + y^2) # @r{One argument}
+atan2(y, x) # @r{Two arguments}
+rand() # @r{No arguments}
+@end example
+
+@strong{Do not put any space between the function name and the
+open-parenthesis!} A user-defined function name looks just like the name of
+a variable, and space would make the expression look like concatenation
+of a variable with an expression inside parentheses. Space before the
+parenthesis is harmless with built-in functions, but it is best not to get
+into the habit of using space to avoid mistakes with user-defined
+functions.
+
+Each function expects a particular number of arguments. For example, the
+@code{sqrt} function must be called with a single argument, the number
+to take the square root of:
+
+@example
+sqrt(@var{argument})
+@end example
+
+Some of the built-in functions allow you to omit the final argument.
+If you do so, they use a reasonable default.
+@xref{Built-in, ,Built-in Functions}, for full details. If arguments
+are omitted in calls to user-defined functions, then those arguments are
+treated as local variables, initialized to the null string
+(@pxref{User-defined, ,User-defined Functions}).@refill
+
+Like every other expression, the function call has a value, which is
+computed by the function based on the arguments you give it. In this
+example, the value of @code{sqrt(@var{argument})} is the square root of the
+argument. A function can also have side effects, such as assigning the
+values of certain variables or doing I/O.
+
+Here is a command to read numbers, one number per line, and print the
+square root of each one:
+
+@example
+awk '@{ print "The square root of", $1, "is", sqrt($1) @}'
+@end example
+
+@node Precedence, , Function Calls, Expressions
+@section Operator Precedence (How Operators Nest)
+@cindex precedence
+@cindex operator precedence
+
+@dfn{Operator precedence} determines how operators are grouped, when
+different operators appear close by in one expression. For example,
+@samp{*} has higher precedence than @samp{+}; thus, @code{a + b * c}
+means to multiply @code{b} and @code{c}, and then add @code{a} to the
+product (i.e., @code{a + (b * c)}).
+
+You can overrule the precedence of the operators by using parentheses.
+You can think of the precedence rules as saying where the
+parentheses are assumed if you do not write parentheses yourself. In
+fact, it is wise to always use parentheses whenever you have an unusual
+combination of operators, because other people who read the program may
+not remember what the precedence is in this case. You might forget,
+too; then you could make a mistake. Explicit parentheses will help prevent
+any such mistake.
+
+When operators of equal precedence are used together, the leftmost
+operator groups first, except for the assignment, conditional and
+exponentiation operators, which group in the opposite order.
+Thus, @code{a - b + c} groups as @code{(a - b) + c};
+@code{a = b = c} groups as @code{a = (b = c)}.@refill
+
+The precedence of prefix unary operators does not matter as long as only
+unary operators are involved, because there is only one way to parse
+them---innermost first. Thus, @code{$++i} means @code{$(++i)} and
+@code{++$x} means @code{++($x)}. However, when another operator follows
+the operand, then the precedence of the unary operators can matter.
+Thus, @code{$x^2} means @code{($x)^2}, but @code{-x^2} means
+@code{-(x^2)}, because @samp{-} has lower precedence than @samp{^}
+while @samp{$} has higher precedence.
+
+Here is a table of the operators of @code{awk}, in order of increasing
+precedence:
+
+@table @asis
+@item assignment
+@samp{=}, @samp{+=}, @samp{-=}, @samp{*=}, @samp{/=}, @samp{%=},
+@samp{^=}, @samp{**=}. These operators group right-to-left.
+(The @samp{**=} operator is not specified by @sc{posix}.)
+
+@item conditional
+@samp{?:}. This operator groups right-to-left.
+
+@item logical ``or''.
+@samp{||}.
+
+@item logical ``and''.
+@samp{&&}.
+
+@item array membership
+@samp{in}.
+
+@item matching
+@samp{~}, @samp{!~}.
+
+@item relational, and redirection
+The relational operators and the redirections have the same precedence
+level. Characters such as @samp{>} serve both as relationals and as
+redirections; the context distinguishes between the two meanings.
+
+The relational operators are @samp{<}, @samp{<=}, @samp{==}, @samp{!=},
+@samp{>=} and @samp{>}.
+
+The I/O redirection operators are @samp{<}, @samp{>}, @samp{>>} and
+@samp{|}.
+
+Note that I/O redirection operators in @code{print} and @code{printf}
+statements belong to the statement level, not to expressions. The
+redirection does not produce an expression which could be the operand of
+another operator. As a result, it does not make sense to use a
+redirection operator near another operator of lower precedence, without
+parentheses. Such combinations, for example @samp{print foo > a ? b :
+c}, result in syntax errors.
+
+@item concatenation
+No special token is used to indicate concatenation.
+The operands are simply written side by side.
+
+@item add, subtract
+@samp{+}, @samp{-}.
+
+@item multiply, divide, mod
+@samp{*}, @samp{/}, @samp{%}.
+
+@item unary plus, minus, ``not''
+@samp{+}, @samp{-}, @samp{!}.
+
+@item exponentiation
+@samp{^}, @samp{**}. These operators group right-to-left.
+(The @samp{**} operator is not specified by @sc{posix}.)
+
+@item increment, decrement
+@samp{++}, @samp{--}.
+
+@item field
+@samp{$}.
+@end table
+
+@node Statements, Arrays, Expressions, Top
+@chapter Control Statements in Actions
+@cindex control statement
+
+@dfn{Control statements} such as @code{if}, @code{while}, and so on
+control the flow of execution in @code{awk} programs. Most of the
+control statements in @code{awk} are patterned on similar statements in
+C.
+
+All the control statements start with special keywords such as @code{if}
+and @code{while}, to distinguish them from simple expressions.
+
+Many control statements contain other statements; for example, the
+@code{if} statement contains another statement which may or may not be
+executed. The contained statement is called the @dfn{body}. If you
+want to include more than one statement in the body, group them into a
+single compound statement with curly braces, separating them with
+newlines or semicolons.
+
+@menu
+* If Statement:: Conditionally execute
+ some @code{awk} statements.
+* While Statement:: Loop until some condition is satisfied.
+* Do Statement:: Do specified action while looping until some
+ condition is satisfied.
+* For Statement:: Another looping statement, that provides
+ initialization and increment clauses.
+* Break Statement:: Immediately exit the innermost enclosing loop.
+* Continue Statement:: Skip to the end of the innermost
+ enclosing loop.
+* Next Statement:: Stop processing the current input record.
+* Next File Statement:: Stop processing the current file.
+* Exit Statement:: Stop execution of @code{awk}.
+@end menu
+
+@node If Statement, While Statement, Statements, Statements
+@section The @code{if} Statement
+
+@cindex @code{if} statement
+The @code{if}-@code{else} statement is @code{awk}'s decision-making
+statement. It looks like this:@refill
+
+@example
+if (@var{condition}) @var{then-body} @r{[}else @var{else-body}@r{]}
+@end example
+
+@noindent
+@var{condition} is an expression that controls what the rest of the
+statement will do. If @var{condition} is true, @var{then-body} is
+executed; otherwise, @var{else-body} is executed (assuming that the
+@code{else} clause is present). The @code{else} part of the statement is
+optional. The condition is considered false if its value is zero or
+the null string, and true otherwise.@refill
+
+Here is an example:
+
+@example
+if (x % 2 == 0)
+ print "x is even"
+else
+ print "x is odd"
+@end example
+
+In this example, if the expression @code{x % 2 == 0} is true (that is,
+the value of @code{x} is divisible by 2), then the first @code{print}
+statement is executed, otherwise the second @code{print} statement is
+performed.@refill
+
+If the @code{else} appears on the same line as @var{then-body}, and
+@var{then-body} is not a compound statement (i.e., not surrounded by
+curly braces), then a semicolon must separate @var{then-body} from
+@code{else}. To illustrate this, let's rewrite the previous example:
+
+@example
+awk '@{ if (x % 2 == 0) print "x is even"; else
+ print "x is odd" @}'
+@end example
+
+@noindent
+If you forget the @samp{;}, @code{awk} won't be able to parse the
+statement, and you will get a syntax error.
+
+We would not actually write this example this way, because a human
+reader might fail to see the @code{else} if it were not the first thing
+on its line.
+
+@node While Statement, Do Statement, If Statement, Statements
+@section The @code{while} Statement
+@cindex @code{while} statement
+@cindex loop
+@cindex body of a loop
+
+In programming, a @dfn{loop} means a part of a program that is (or at least can
+be) executed two or more times in succession.
+
+The @code{while} statement is the simplest looping statement in
+@code{awk}. It repeatedly executes a statement as long as a condition is
+true. It looks like this:
+
+@example
+while (@var{condition})
+ @var{body}
+@end example
+
+@noindent
+Here @var{body} is a statement that we call the @dfn{body} of the loop,
+and @var{condition} is an expression that controls how long the loop
+keeps running.
+
+The first thing the @code{while} statement does is test @var{condition}.
+If @var{condition} is true, it executes the statement @var{body}.
+(@var{condition} is true when the value
+is not zero and not a null string.) After @var{body} has been executed,
+@var{condition} is tested again, and if it is still true, @var{body} is
+executed again. This process repeats until @var{condition} is no longer
+true. If @var{condition} is initially false, the body of the loop is
+never executed.@refill
+
+This example prints the first three fields of each record, one per line.
+
+@example
+awk '@{ i = 1
+ while (i <= 3) @{
+ print $i
+ i++
+ @}
+@}'
+@end example
+
+@noindent
+Here the body of the loop is a compound statement enclosed in braces,
+containing two statements.
+
+The loop works like this: first, the value of @code{i} is set to 1.
+Then, the @code{while} tests whether @code{i} is less than or equal to
+three. This is the case when @code{i} equals one, so the @code{i}-th
+field is printed. Then the @code{i++} increments the value of @code{i}
+and the loop repeats. The loop terminates when @code{i} reaches 4.
+
+As you can see, a newline is not required between the condition and the
+body; but using one makes the program clearer unless the body is a
+compound statement or is very simple. The newline after the open-brace
+that begins the compound statement is not required either, but the
+program would be hard to read without it.
+
+@node Do Statement, For Statement, While Statement, Statements
+@section The @code{do}-@code{while} Statement
+
+The @code{do} loop is a variation of the @code{while} looping statement.
+The @code{do} loop executes the @var{body} once, then repeats @var{body}
+as long as @var{condition} is true. It looks like this:
+
+@example
+do
+ @var{body}
+while (@var{condition})
+@end example
+
+Even if @var{condition} is false at the start, @var{body} is executed at
+least once (and only once, unless executing @var{body} makes
+@var{condition} true). Contrast this with the corresponding
+@code{while} statement:
+
+@example
+while (@var{condition})
+ @var{body}
+@end example
+
+@noindent
+This statement does not execute @var{body} even once if @var{condition}
+is false to begin with.
+
+Here is an example of a @code{do} statement:
+
+@example
+awk '@{ i = 1
+ do @{
+ print $0
+ i++
+ @} while (i <= 10)
+@}'
+@end example
+
+@noindent
+prints each input record ten times. It isn't a very realistic example,
+since in this case an ordinary @code{while} would do just as well. But
+this reflects actual experience; there is only occasionally a real use
+for a @code{do} statement.@refill
+
+@node For Statement, Break Statement, Do Statement, Statements
+@section The @code{for} Statement
+@cindex @code{for} statement
+
+The @code{for} statement makes it more convenient to count iterations of a
+loop. The general form of the @code{for} statement looks like this:@refill
+
+@example
+for (@var{initialization}; @var{condition}; @var{increment})
+ @var{body}
+@end example
+
+@noindent
+This statement starts by executing @var{initialization}. Then, as long
+as @var{condition} is true, it repeatedly executes @var{body} and then
+@var{increment}. Typically @var{initialization} sets a variable to
+either zero or one, @var{increment} adds 1 to it, and @var{condition}
+compares it against the desired number of iterations.
+
+Here is an example of a @code{for} statement:
+
+@example
+@group
+awk '@{ for (i = 1; i <= 3; i++)
+ print $i
+@}'
+@end group
+@end example
+
+@noindent
+This prints the first three fields of each input record, one field per
+line.
+
+In the @code{for} statement, @var{body} stands for any statement, but
+@var{initialization}, @var{condition} and @var{increment} are just
+expressions. You cannot set more than one variable in the
+@var{initialization} part unless you use a multiple assignment statement
+such as @code{x = y = 0}, which is possible only if all the initial values
+are equal. (But you can initialize additional variables by writing
+their assignments as separate statements preceding the @code{for} loop.)
+
+The same is true of the @var{increment} part; to increment additional
+variables, you must write separate statements at the end of the loop.
+The C compound expression, using C's comma operator, would be useful in
+this context, but it is not supported in @code{awk}.
+
+Most often, @var{increment} is an increment expression, as in the
+example above. But this is not required; it can be any expression
+whatever. For example, this statement prints all the powers of 2
+between 1 and 100:
+
+@example
+for (i = 1; i <= 100; i *= 2)
+ print i
+@end example
+
+Any of the three expressions in the parentheses following the @code{for} may
+be omitted if there is nothing to be done there. Thus, @w{@samp{for (;x
+> 0;)}} is equivalent to @w{@samp{while (x > 0)}}. If the
+@var{condition} is omitted, it is treated as @var{true}, effectively
+yielding an @dfn{infinite loop} (i.e., a loop that will never
+terminate).@refill
+
+In most cases, a @code{for} loop is an abbreviation for a @code{while}
+loop, as shown here:
+
+@example
+@var{initialization}
+while (@var{condition}) @{
+ @var{body}
+ @var{increment}
+@}
+@end example
+
+@noindent
+The only exception is when the @code{continue} statement
+(@pxref{Continue Statement, ,The @code{continue} Statement}) is used
+inside the loop; changing a @code{for} statement to a @code{while}
+statement in this way can change the effect of the @code{continue}
+statement inside the loop.@refill
+
+There is an alternate version of the @code{for} loop, for iterating over
+all the indices of an array:
+
+@example
+for (i in array)
+ @var{do something with} array[i]
+@end example
+
+@noindent
+@xref{Arrays, ,Arrays in @code{awk}}, for more information on this
+version of the @code{for} loop.
+
+The @code{awk} language has a @code{for} statement in addition to a
+@code{while} statement because often a @code{for} loop is both less work to
+type and more natural to think of. Counting the number of iterations is
+very common in loops. It can be easier to think of this counting as part
+of looping rather than as something to do inside the loop.
+
+The next section has more complicated examples of @code{for} loops.
+
+@node Break Statement, Continue Statement, For Statement, Statements
+@section The @code{break} Statement
+@cindex @code{break} statement
+@cindex loops, exiting
+
+The @code{break} statement jumps out of the innermost @code{for},
+@code{while}, or @code{do}-@code{while} loop that encloses it. The
+following example finds the smallest divisor of any integer, and also
+identifies prime numbers:@refill
+
+@smallexample
+awk '# find smallest divisor of num
+ @{ num = $1
+ for (div = 2; div*div <= num; div++)
+ if (num % div == 0)
+ break
+ if (num % div == 0)
+ printf "Smallest divisor of %d is %d\n", num, div
+ else
+ printf "%d is prime\n", num @}'
+@end smallexample
+
+When the remainder is zero in the first @code{if} statement, @code{awk}
+immediately @dfn{breaks out} of the containing @code{for} loop. This means
+that @code{awk} proceeds immediately to the statement following the loop
+and continues proc