aboutsummaryrefslogtreecommitdiffstats
path: root/gnu/usr.bin/awk
diff options
context:
space:
mode:
Diffstat (limited to 'gnu/usr.bin/awk')
-rw-r--r--gnu/usr.bin/awk/ACKNOWLEDGMENT8
-rw-r--r--gnu/usr.bin/awk/FUTURES69
-rw-r--r--gnu/usr.bin/awk/LIMITATIONS2
-rw-r--r--gnu/usr.bin/awk/Makefile20
-rw-r--r--gnu/usr.bin/awk/NEWS207
-rw-r--r--gnu/usr.bin/awk/PORTS9
-rw-r--r--gnu/usr.bin/awk/PROBLEMS6
-rw-r--r--gnu/usr.bin/awk/README25
-rw-r--r--gnu/usr.bin/awk/array.c300
-rw-r--r--gnu/usr.bin/awk/awk.1158
-rw-r--r--gnu/usr.bin/awk/awk.h115
-rw-r--r--gnu/usr.bin/awk/awk.y144
-rw-r--r--gnu/usr.bin/awk/builtin.c670
-rw-r--r--gnu/usr.bin/awk/config.h38
-rw-r--r--gnu/usr.bin/awk/dfa.c2837
-rw-r--r--gnu/usr.bin/awk/dfa.h425
-rw-r--r--gnu/usr.bin/awk/eval.c113
-rw-r--r--gnu/usr.bin/awk/field.c119
-rw-r--r--gnu/usr.bin/awk/getopt.c203
-rw-r--r--gnu/usr.bin/awk/getopt.h31
-rw-r--r--gnu/usr.bin/awk/getopt1.c81
-rw-r--r--gnu/usr.bin/awk/io.c221
-rw-r--r--gnu/usr.bin/awk/iop.c27
-rw-r--r--gnu/usr.bin/awk/main.c178
-rw-r--r--gnu/usr.bin/awk/msg.c11
-rw-r--r--gnu/usr.bin/awk/node.c68
-rw-r--r--gnu/usr.bin/awk/patchlevel.h2
-rw-r--r--gnu/usr.bin/awk/protos.h45
-rw-r--r--gnu/usr.bin/awk/re.c46
-rw-r--r--gnu/usr.bin/awk/regex.c6330
-rw-r--r--gnu/usr.bin/awk/regex.h675
-rw-r--r--gnu/usr.bin/awk/version.c3
32 files changed, 8458 insertions, 4728 deletions
diff --git a/gnu/usr.bin/awk/ACKNOWLEDGMENT b/gnu/usr.bin/awk/ACKNOWLEDGMENT
index b6c3b0b0c692..cb4021f7d633 100644
--- a/gnu/usr.bin/awk/ACKNOWLEDGMENT
+++ b/gnu/usr.bin/awk/ACKNOWLEDGMENT
@@ -8,12 +8,16 @@ platforms and providing a great deal of feedback. They are:
Hal Peterson <hrp@pecan.cray.com> (Cray)
Pat Rankin <gawk.rankin@EQL.Caltech.Edu> (VMS)
- Michal Jaegermann <NTOMCZAK@vm.ucs.UAlberta.CA> (Atari, NeXT, DEC 3100)
+ Michal Jaegermann <michal@gortel.phys.UAlberta.CA> (Atari, NeXT, DEC 3100)
Mike Lijewski <mjlx@eagle.cnsf.cornell.edu> (IBM RS6000)
- Scott Deifik <scottd@amgen.com> (MSDOS 2.14)
+ Scott Deifik <scottd@amgen.com> (MSDOS 2.14 and 2.15)
Kent Williams (MSDOS 2.11)
Conrad Kwok (MSDOS earlier versions)
Scott Garfinkle (MSDOS earlier versions)
+ Kai Uwe Rommel <rommel@ars.muc.de> (OS/2)
+ Darrel Hankerson <hankedr@mail.auburn.edu> (OS/2)
+ Mark Moraes <Mark-Moraes@deshaw.com> (Code Center, Purify)
+ Kaveh Ghazi <ghazi@noc.rutgers.edu> (Lots of Unix variants)
Last, but far from least, we would like to thank Brian Kernighan who
has helped to clear up many dark corners of the language and provided a
diff --git a/gnu/usr.bin/awk/FUTURES b/gnu/usr.bin/awk/FUTURES
index b09656046b27..6250584fdfd5 100644
--- a/gnu/usr.bin/awk/FUTURES
+++ b/gnu/usr.bin/awk/FUTURES
@@ -1,14 +1,26 @@
This file lists future projects and enhancements for gawk. Items are listed
in roughly the order they will be done for a given release. This file is
-mainly for use by the developers to help keep themselves on track, please
+mainly for use by the developer(s) to help keep themselves on track, please
don't bug us too much about schedules or what all this really means.
+(An `x' indicates that some progress has been made, but that the feature is
+not complete yet.)
+
For 2.16
========
-David:
- Move to autoconf-based configure system.
+x Move to autoconf-based configure system.
+
+x Research awk `fflush' function.
+
+x Generalize IGNORECASE
+ any value makes it work, not just numeric non-zero
+ make it apply to *all* string comparisons
+
+x Fix FILENAME to have an initial value of "", not "-"
- Allow RS to be a regexp.
+In 2.17
+=======
+x Allow RS to be a regexp.
RT variable to hold text of record terminator
@@ -16,46 +28,24 @@ David:
Feedback alloca.s changes to FSF
- Extensible hashing in memory of awk arrays
+x Split() with null string as third arg to split up strings
- Split() with null string as third arg to split up strings
-
- Analogously, setting FS="" would split the input record into individual
+x Analogously, setting FS="" would split the input record into individual
characters.
-Arnold:
- Generalize IGNORECASE
- any value makes it work, not just numeric non-zero
- make it apply to *all* string comparisons
-
- Fix FILENAME to have an initial value of "", not "-"
-
- Clean up code by isolating system-specific functions in separate files.
+x Clean up code by isolating system-specific functions in separate files.
Undertake significant directory reorganization.
- Extensive manual cleanup:
+x Extensive manual cleanup:
Use of texinfo 2.0 features
Lots more examples
Document all of the above.
-In 2.17
-=======
-David:
-
- Incorporate newer dfa.c and regex.c (go to POSIX regexps)
+x Go to POSIX regexps
Make regex + dfa less dependant on gawk header file includes
- General sub functions:
- edit(line, pat, sub) and gedit(line, pat, sub)
- that return the substituted strings and allow \1 etc. in the sub string.
-
-Arnold:
- DBM storage of awk arrays. Try to allow multiple dbm packages
-
- ? Have strftime() pay attention to the value of ENVIRON["TZ"]
-
Additional manual features:
Document posix regexps
Document use of dbm arrays
@@ -67,19 +57,26 @@ Arnold:
For 2.18
========
+ DBM storage of awk arrays. Try to allow multiple dbm packages
-Arnold:
+ General sub functions:
+ edit(line, pat, sub) and gedit(line, pat, sub)
+ that return the substituted strings and allow \1 etc. in the sub
+ string.
+
+ ? Have strftime() pay attention to the value of ENVIRON["TZ"]
+
+For 2.19
+========
Add chdir and stat built-in functions.
Add function pointers as valid variable types.
Add an `ftw' built-in function that takes a function pointer.
-David:
-
Do an optimization pass over parse tree?
-For 2.19 or later:
+For 2.20 or later:
==================
Add variables similar to C's __FILE__ and __LINE__ for better diagnostics
from within awk programs.
@@ -101,7 +98,7 @@ Make awk '/foo/' files... run at egrep speeds
Do a reference card
-Allow OFMT to be other than a floating point format.
+Allow OFMT and CONVFMT to be other than a floating point format.
Allow redefining of builtin functions?
diff --git a/gnu/usr.bin/awk/LIMITATIONS b/gnu/usr.bin/awk/LIMITATIONS
index 5877197aeb55..64eab85f164b 100644
--- a/gnu/usr.bin/awk/LIMITATIONS
+++ b/gnu/usr.bin/awk/LIMITATIONS
@@ -12,3 +12,5 @@ Characters in a character class: 2^(# of bits per byte)
# of pipe redirections: min(# of processes per user, # of open files)
double-precision floating point
Length of source line: unlimited
+Number of input records in one file: MAX_LONG
+Number of input records total: MAX_LONG
diff --git a/gnu/usr.bin/awk/Makefile b/gnu/usr.bin/awk/Makefile
index 3ef9894d175d..dad9f5772be3 100644
--- a/gnu/usr.bin/awk/Makefile
+++ b/gnu/usr.bin/awk/Makefile
@@ -1,13 +1,13 @@
-PROG= awk
-SRCS= main.c eval.c builtin.c msg.c iop.c io.c field.c array.c \
- node.c version.c re.c awk.c regex.c dfa.c \
- getopt.c getopt1.c
-CFLAGS+= -DGAWK
-LDADD= -lm
-DPADD= ${LIBM}
-CLEANFILES+= awk.c y.tab.h
+PROG= awk
+SRCS= main.c eval.c builtin.c msg.c iop.c io.c field.c getopt1.c \
+ getopt.c array.c \
+ node.c version.c re.c awk.c regex.c dfa.c
+DPADD= ${LIBM}
+LDADD= -lm
+CFLAGS+=-I${.CURDIR} -DGAWK
+CLEANFILES+=awk.c y.tab.h
-MAN1= awk.1
+MAN1= awk.1
.include <bsd.prog.mk>
-.include "../../usr.bin/Makefile.inc"
+#.include "../../usr.bin/Makefile.inc"
diff --git a/gnu/usr.bin/awk/NEWS b/gnu/usr.bin/awk/NEWS
index 6711373d6ea5..4df69e7d0bf8 100644
--- a/gnu/usr.bin/awk/NEWS
+++ b/gnu/usr.bin/awk/NEWS
@@ -1,5 +1,191 @@
+Changes from 2.15.4 to 2.15.5
+-----------------------------
+
+FUTURES file updated and re-arranged some with more rational schedule.
+
+Many prototypes handled better for ANSI C in protos.h.
+
+getopt.c updated somewhat.
+
+test/Makefile now removes junk directory, `bardargtest' renamed `badargs.'
+
+Bug fix in iop.c for RS = "". Eat trailing newlines off of record separator.
+
+Bug fix in Makefile.bsd44, use leading tab in actions.
+
+Fix in field.c:set_FS for FS == "\\" and IGNORECASE != 0.
+
+Config files updated or added:
+ cray60, DEC OSF/1 2.0, Utek, sgi405, next21, next30, atari/config.h,
+ sco.
+
+Fix in io.c for ENFILE as well as EMFILE, update decl of groupset to
+include OSF/1.
+
+Rationalized printing as integers if numbers are outside the range of a long.
+Changes to node.c:force_string and builtin.c.
+
+Made internal NF, NR, and FNR variables longs instead of ints.
+
+Add LIMITS_H_MISSING stuff to config.in and awk.h, and default defs for
+INT_MAX and LONG_MAX, if no limits.h file. Add a standard decl of
+the time() function for __STDC__. From ghazi@noc.rutgers.edu.
+
+Fix tree_eval in awk.h and r_tree_eval in eval.c to deal better with
+function parameters, particularly ones that are arrays.
+
+Fix eval.c to print out array names of arrays used in scalar contexts.
+
+Fix eval.c in interpret to zero out source and sourceline initially. This
+does a better job of providing source file and line number information.
+
+Fix to re_parse_field in field.c to not use isspace when RS = "", but rather
+to explicitly look for blank and tab.
+
+Fix to sc_parse_field in field.c to catch the case of the FS character at the
+end of a record.
+
+Lots of miscellanious bug fixes for memory leaks, courtesy Mark Moraes,
+also fixes for arrays.
+
+io.c fixed to warn about lack of explicit closes if --lint.
+
+Updated missing/strftime.c to match posted strftime 6.2.
+
+Bug fix in builtin.c, in case of non-match in sub_common.
+
+Updated constant used for division in builtin.c:do_rand for DEC Alpha
+and CRAY Y-MP.
+
+POSIXLY_CORRECT in the environment turns on --posix (fixed in main.c).
+
+Updated srandom prototype and calls in builtin.c.
+
+Fix awk.y to enforce posix semantics of unary +: result is numeric.
+
+Fix array.c to not rearrange the hash chain upon finding an index in
+the array. This messed things up in cases like:
+ for (index1 in array) {
+ blah
+ if (index2 in array) # blew away the for
+ stuff
+ }
+
+Fixed spelling errors in the man page.
+
+Fixes in awk.y so that
+ gawk '' /path/to/file
+will work without core dumping or finding parse errors.
+
+Fix main.c so that --lint will fuss about an empty program.
+Yet another fix for argument parsing in the case of unrecognized options.
+
+Bug fix in dfa.c to not attempt to free null pointers.
+
+Bug fix in builtin.c to only use DEFAULT_G_PRECISION for %g or %G.
+
+Bug fix in field.c to achieve call by value semantics for split.
+
+Changes from 2.15.3 to 2.15.4
+-----------------------------
+
+Lots of lint fixes, and do_sprintf made mostly ANSI C compatible.
+
+Man page updated and edited.
+
+Copyrights updated.
+
+Arrays now grow dynamically, initially scaling up by an order of magnitude
+ and then doubling, up to ~ 64K. This should keep gawk's performance
+ graceful under heavy load.
+
+New `delete array' feature added. Only documented in the man page.
+
+Switched to dfa and regex suites from grep-2.0. These offer the ability to
+ move to POSIX regexps in the next release.
+
+Disabled GNU regex ops.
+
+Research awk -m option now recognized. It does nothing in gawk, since gawk
+ has no static limits. Only documented in the man page.
+
+New bionic (faster, better, stronger than before) hashing function.
+
+Bug fix in argument handling. `gawk -X' now notices there was no program.
+ Additional bug fixes to make --compat and --lint work again.
+
+Many changes for systems where sizeof(int) != sizeof(void *).
+
+Add explicit alloca(0) in io.c to recover space from C alloca.
+
+Fixed file descriptor leak in io.c.
+
+The --version option now follows the GNU coding standards and exits.
+
+Fixed several prototypes in protos.h.
+
+Several tests updated. On Solaris, warn that the out? tests will fail.
+
+Configuration files for SunOS with cc and Solaris 2.x added.
+
+Improved error messages in awk.y on gawk extensions if do_unix or do_compat.
+
+INSTALL file added.
+
+Fixed Atari Makefile and several VMS specific changes.
+
+Better conversion of numbers to strings on systems with broken sprintfs.
+
+Changes from 2.15.2 to 2.15.3
+-----------------------------
+
+Increased HASHSIZE to a decent number, 127 was way too small.
+
+FILENAME is now the null string in a BEGIN rule.
+
+Argument processing fixed for invalid options and missing arguments.
+
+This version will build on VMS. This included a fix to close all files
+ and pipes opened with redirections before closing stdout and stderr.
+
+More getpgrp() defines.
+
+Changes for BSD44: <sys/param.h> in io.c and Makefile.bsd44.
+
+All directories in the distribution are now writable.
+
+Separated LDFLAGS and CFLAGS in Makefile. CFLAGS can now be overridden by
+ user.
+
+Make dist now builds compressed archives ending in .gz and runs doschk.
+
+Amiga port.
+
+New getopt.c fixes Alpha OSF/1 problem.
+
+Make clean now removes possible test output.
+
+Improved algorithm for multiple adjacent string concatenations leads to
+ performance improvements.
+
+Fix nasty bug whereby command-line assignments, both with -v and at run time,
+ could create variables with syntactically illegal names.
+
+Fix obscure bug in printf with %0 flag and filling.
+
+Add a lint check for substr if provided length exceeds remaining characters
+ in string.
+
+Update atari support.
+
+PC support enhanced to include support for both DOS and OS/2. (Lots more
+ #ifdefs. Sigh.)
+
+Config files for Hitachi Unix and OSF/1, courtesy of Yoko Morishita
+ (morisita@sra.co.jp)
+
Changes from 2.15.1 to 2.15.2
----------------------------
+-----------------------------
Additions to the FUTURES file.
@@ -11,7 +197,6 @@ Clean up the distribution generation in Makefile.in: the info files are
now included, the distributed files are marked read-only and patched
distributions are now unpacked in a directory named with the patch level.
-
Changes from 2.15 to 2.15.1
---------------------------
@@ -161,7 +346,7 @@ Added code to do better diagnoses of weird or null file names.
Allow continue outside of a loop, unless in strict posix mode. Lint option
will issue warning.
-New missing/strftime.c. There has been one chage that affects gawk. Posix
+New missing/strftime.c. There has been one change that affects gawk. Posix
now defines a %V conversion so the vms conversion has been changed to %v.
If this version is used with gawk -Wlint and they use %V in a call to
strftime, they'll get a warning.
@@ -189,7 +374,7 @@ Fixed a couple of bugs for reference to $0 when $0 is "" -- particularly in
Fixed premature freeing in construct "$0 = $0".
Removed the call to wait_any() in gawk_popen(), since on at least some systems,
- if gawk's input was from a pipe, the predecssor process in the pipe was a
+ if gawk's input was from a pipe, the predecessor process in the pipe was a
child of gawk and this caused a deadlock.
Regexp can (once again) match a newline, if given explicitly.
@@ -215,7 +400,7 @@ Fixed bug when NF is set by user -- fields_arr must be expanded if necessary
Fixed several bugs in [g]sub() for no match found or the match is 0-length.
-Fixed bug where in gsub() a pattern anchorred at the beginning would still
+Fixed bug where in gsub() a pattern anchored at the beginning would still
substitute throughout the string.
make test does not assume the . is in PATH.
@@ -235,7 +420,7 @@ Fixed hanging of pipe redirection to getline
Fixed coredump on access to $0 inside BEGIN block.
Fixed treatment of RS = "". It now parses the fields correctly and strips
- leading whitspace from a record if FS is a space.
+ leading whitespace from a record if FS is a space.
Fixed faking of /dev/stdin.
@@ -279,7 +464,7 @@ Update to config/bsd43.
Added config/apollo, config/msc60, config/cray2-50, config/interactive2.2
-sgi33.cc added for compilation using cc ratther than gcc.
+sgi33.cc added for compilation using cc rather than gcc.
Ultrix41 now propagates to config.h properly -- as part of a general
mechanism in configure for kludges -- #define anything from a config file
@@ -600,7 +785,7 @@ Fixed bug in output flushing introduced a few patches back. This caused
Changes from 2.12.22 to 2.12.23
-------------------------------
-Accidently left config/cray2-60 out of last patch.
+Accidentally left config/cray2-60 out of last patch.
Added some missing dependencies to Makefile.
@@ -750,7 +935,7 @@ Changes from 2.12.14 to 2.12.15
-------------------------------
Changed config/* to a condensed form that can be used with mkconf to generate
- a config.h from config.h-dist -- much easier to maintain. Please chaeck
+ a config.h from config.h-dist -- much easier to maintain. Please check
carefully against what you had before for a particular system and report
any problems. vms.h remains separate since the stuff at the bottom
didn't quite fit the mkconf model -- hopefully cleared up later.
@@ -1102,7 +1287,7 @@ Changes from 2.11beta to 2.11.1 (production)
Went from "beta" to production status!!!
Now flushes stdout before closing pipes or redirected files to
-synchonize output.
+synchronize output.
MS-DOS changes added in.
@@ -1217,7 +1402,7 @@ Cleaned up and fixed close_redir().
Fixed an obscure bug to do with redirection. Intermingled ">" and ">>"
redirects did not output in a predictable order.
-Improved handling of output bufferring: now all print[f]s redirected to a tty
+Improved handling of output buffering: now all print[f]s redirected to a tty
or pipe are flushed immediately and non-redirected output to a tty is flushed
before the next input record is read.
diff --git a/gnu/usr.bin/awk/PORTS b/gnu/usr.bin/awk/PORTS
index 95e133f9dd03..5087a4301af9 100644
--- a/gnu/usr.bin/awk/PORTS
+++ b/gnu/usr.bin/awk/PORTS
@@ -18,15 +18,18 @@ OpenVMS AXP V1.0
MSDOS - Microsoft C 5.1, compiles and runs very simple testing
BSD 4.4alpha
-From: ghazi@caip.rutgers.edu (Kaveh R. Ghazi):
+From: ghazi@noc.rutgers.edu (Kaveh R. Ghazi):
arch configured as:
---- --------------
+Dec Alpha OSF 1.3 osf1
Hpux 9.0 hpux8x
NeXTStep 2.0 next20
-Sgi Irix 4.0.5 (/bin/cc) sgi405.cc (new file)
+Sgi Irix 4.0.5 (/bin/cc) sgi405.cc
Stardent Titan 1500 OSv2.5 sysv3
Stardent Vistra (i860) SVR4 sysv4
-SunOS 4.1.2 sunos41
+Solaris 2.3 solaris2.cc
+SunOS 4.1.3 sunos41
Tektronix XD88 (UTekV 3.2e) sysv3
+Tektronix 4300 (UTek 4.0) utek
Ultrix 4.2 ultrix41
diff --git a/gnu/usr.bin/awk/PROBLEMS b/gnu/usr.bin/awk/PROBLEMS
index 3b7c5148bd8e..a43618002119 100644
--- a/gnu/usr.bin/awk/PROBLEMS
+++ b/gnu/usr.bin/awk/PROBLEMS
@@ -3,4 +3,8 @@ Hopefully they will all be fixed in the next major release of gawk.
Please keep in mind that the code is still undergoing significant evolution.
-1. Gawk's printf is probably still not POSIX compliant.
+1. The interactions with the lexer and yyerror need reworking. It is possible
+ to get line numbers that are one line off if --compat or --posix is
+ true and either `next file' or `delete array' are used.
+
+ Really the whole lexical analysis stuff needs reworking.
diff --git a/gnu/usr.bin/awk/README b/gnu/usr.bin/awk/README
index f4bd3df806c8..90ed9c29f63a 100644
--- a/gnu/usr.bin/awk/README
+++ b/gnu/usr.bin/awk/README
@@ -1,8 +1,7 @@
README:
-This is GNU Awk 2.15. It should be upwardly compatible with the
-System V Release 4 awk. It is almost completely compliant with draft 11.3
-of POSIX 1003.2.
+This is GNU Awk 2.15. It should be upwardly compatible with the System
+V Release 4 awk. It is almost completely compliant with POSIX 1003.2.
This release adds new features -- see NEWS for details.
@@ -10,7 +9,7 @@ See the installation instructions, below.
Known problems are given in the PROBLEMS file. Work to be done is
described briefly in the FUTURES file. Verified ports are listed in
-the PORTS file. Changes in this version are summarized in the CHANGES file.
+the PORTS file. Changes in this version are summarized in the NEWS file.
Please read the LIMITATIONS and ACKNOWLEDGMENT files.
Read the file POSIX for a discussion of how the standard says comparisons
@@ -28,6 +27,8 @@ INSTALLATION:
Check whether there is a system-specific README file for your system.
+A quick overview of the installation process is in the file INSTALL.
+
Makefile.in may need some tailoring. The only changes necessary should
be to change installation targets or to change compiler flags.
The changes to make in Makefile.in are commented and should be obvious.
@@ -54,10 +55,12 @@ proceed.
The next release will use the FSF autoconfig program, so we are no longer
soliciting new config files.
-If you have an MS-DOS system, use the stuff in the pc directory.
+If you have an MS-DOS or OS/2 system, use the stuff in the pc directory.
For an Atari there is an atari directory and similarly one for VMS.
Chapter 16 of The GAWK Manual discusses configuration in detail.
+(However, it does not discuss OS/2 configuration, see README.pc for
+the details. The manual is being massively revised for 2.16.)
After successful compilation, do 'make test' to run a small test
suite. There should be no output from the 'cmp' invocations except in
@@ -67,7 +70,7 @@ problem.
PRINTING THE MANUAL
-The 'support' directory contains texinfo.tex 2.65, which will be necessary
+The 'support' directory contains texinfo.tex 2.115, which will be necessary
for printing the manual, and the texindex.c program from the texinfo
distribution which is also necessary. See the makefile for the steps needed
to get a DVI file from the manual.
@@ -91,7 +94,7 @@ INTERNET: david@cs.dal.ca
Arnold Robbins
1736 Reindeer Drive
-Atlanta, GA, 30329, USA
+Atlanta, GA, 30329-3528, USA
INTERNET: arnold@skeeve.atl.ga.us
UUCP: { gatech, emory, emoryu1 }!skeeve!arnold
@@ -113,4 +116,10 @@ VMS:
Atari ST:
Michal Jaegermann
- NTOMCZAK@vm.ucs.UAlberta.CA (e-mail only)
+ michal@gortel.phys.ualberta.ca (e-mail only)
+
+OS/2:
+ Kai Uwe Rommel
+ rommel@ars.muc.de (e-mail only)
+ Darrel Hankerson
+ hankedr@mail.auburn.edu (e-mail only)
diff --git a/gnu/usr.bin/awk/array.c b/gnu/usr.bin/awk/array.c
index 59be340c04df..d42f9a6c5847 100644
--- a/gnu/usr.bin/awk/array.c
+++ b/gnu/usr.bin/awk/array.c
@@ -3,7 +3,7 @@
*/
/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
*
* This file is part of GAWK, the GNU implementation of the
* AWK Progamming Language.
@@ -23,9 +23,24 @@
* the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
+/*
+ * Tree walks (``for (iggy in foo)'') and array deletions use expensive
+ * linear searching. So what we do is start out with small arrays and
+ * grow them as needed, so that our arrays are hopefully small enough,
+ * most of the time, that they're pretty full and we're not looking at
+ * wasted space.
+ *
+ * The decision is made to grow the array if the average chain length is
+ * ``too big''. This is defined as the total number of entries in the table
+ * divided by the size of the array being greater than some constant.
+ */
+
+#define AVG_CHAIN_MAX 10 /* don't want to linear search more than this */
+
#include "awk.h"
static NODE *assoc_find P((NODE *symbol, NODE *subs, int hash1));
+static void grow_table P((NODE *symbol));
NODE *
concat_exp(tree)
@@ -34,9 +49,9 @@ register NODE *tree;
register NODE *r;
char *str;
char *s;
- unsigned len;
+ size_t len;
int offset;
- int subseplen;
+ size_t subseplen;
char *subsep;
if (tree->type != Node_expression_list)
@@ -84,7 +99,7 @@ NODE *symbol;
if (symbol->var_array == 0)
return;
- for (i = 0; i < HASHSIZE; i++) {
+ for (i = 0; i < symbol->array_size; i++) {
for (bucket = symbol->var_array[i]; bucket; bucket = next) {
next = bucket->ahnext;
unref(bucket->ahname);
@@ -93,17 +108,26 @@ NODE *symbol;
}
symbol->var_array[i] = 0;
}
+ free(symbol->var_array);
+ symbol->var_array = NULL;
+ symbol->array_size = symbol->table_size = 0;
+ symbol->flags &= ~ARRAYMAXED;
}
/*
* calculate the hash function of the string in subs
*/
unsigned int
-hash(s, len)
-register char *s;
-register int len;
+hash(s, len, hsize)
+register const char *s;
+register size_t len;
+unsigned long hsize;
{
- register unsigned long h = 0, g;
+ register unsigned long h = 0;
+
+#ifdef this_is_really_slow
+
+ register unsigned long g;
while (len--) {
h = (h << 4) + *s++;
@@ -113,10 +137,84 @@ register int len;
h = h ^ g;
}
}
- if (h < HASHSIZE)
- return h;
- else
- return h%HASHSIZE;
+
+#else /* this_is_really_slow */
+/*
+ * This is INCREDIBLY ugly, but fast. We break the string up into 8 byte
+ * units. On the first time through the loop we get the "leftover bytes"
+ * (strlen % 8). On every other iteration, we perform 8 HASHC's so we handle
+ * all 8 bytes. Essentially, this saves us 7 cmp & branch instructions. If
+ * this routine is heavily used enough, it's worth the ugly coding.
+ *
+ * OZ's original sdbm hash, copied from Margo Seltzers db package.
+ *
+ */
+
+/* Even more speed: */
+/* #define HASHC h = *s++ + 65599 * h */
+/* Because 65599 = pow(2,6) + pow(2,16) - 1 we multiply by shifts */
+#define HASHC htmp = (h << 6); \
+ h = *s++ + htmp + (htmp << 10) - h
+
+ unsigned long htmp;
+
+ h = 0;
+
+#if defined(VAXC)
+/*
+ * [This was an implementation of "Duff's Device", but it has been
+ * redone, separating the switch for extra iterations from the loop.
+ * This is necessary because the DEC VAX-C compiler is STOOPID.]
+ */
+ switch (len & (8 - 1)) {
+ case 7: HASHC;
+ case 6: HASHC;
+ case 5: HASHC;
+ case 4: HASHC;
+ case 3: HASHC;
+ case 2: HASHC;
+ case 1: HASHC;
+ default: break;
+ }
+
+ if (len > (8 - 1)) {
+ register size_t loop = len >> 3;
+ do {
+ HASHC;
+ HASHC;
+ HASHC;
+ HASHC;
+ HASHC;
+ HASHC;
+ HASHC;
+ HASHC;
+ } while (--loop);
+ }
+#else /* !VAXC */
+ /* "Duff's Device" for those who can handle it */
+ if (len > 0) {
+ register size_t loop = (len + 8 - 1) >> 3;
+
+ switch (len & (8 - 1)) {
+ case 0:
+ do { /* All fall throughs */
+ HASHC;
+ case 7: HASHC;
+ case 6: HASHC;
+ case 5: HASHC;
+ case 4: HASHC;
+ case 3: HASHC;
+ case 2: HASHC;
+ case 1: HASHC;
+ } while (--loop);
+ }
+ }
+#endif /* !VAXC */
+#endif /* this_is_really_slow - not */
+
+ if (h >= hsize)
+ h %= hsize;
+ return h;
}
/*
@@ -132,11 +230,19 @@ int hash1;
for (bucket = symbol->var_array[hash1]; bucket; bucket = bucket->ahnext) {
if (cmp_nodes(bucket->ahname, subs) == 0) {
+#if 0
+ /*
+ * Disable this code for now. It screws things up if we have
+ * a ``for (iggy in foo)'' in progress. Interestingly enough,
+ * this was not a problem in 2.15.3, only in 2.15.4. I'm not
+ * sure why it works in 2.15.3.
+ */
if (prev) { /* move found to front of chain */
prev->ahnext = bucket->ahnext;
bucket->ahnext = symbol->var_array[hash1];
symbol->var_array[hash1] = bucket;
}
+#endif
return bucket;
} else
prev = bucket; /* save previous list entry */
@@ -158,7 +264,7 @@ NODE *symbol, *subs;
if (symbol->var_array == 0)
return 0;
subs = concat_exp(subs); /* concat_exp returns a string node */
- hash1 = hash(subs->stptr, subs->stlen);
+ hash1 = hash(subs->stptr, subs->stlen, (unsigned long) symbol->array_size);
if (assoc_find(symbol, subs, hash1) == NULL) {
free_temp(subs);
return 0;
@@ -183,17 +289,17 @@ NODE *symbol, *subs;
register NODE *bucket;
(void) force_string(subs);
- hash1 = hash(subs->stptr, subs->stlen);
-
- if (symbol->var_array == 0) { /* this table really should grow
- * dynamically */
- unsigned size;
- size = sizeof(NODE *) * HASHSIZE;
- emalloc(symbol->var_array, NODE **, size, "assoc_lookup");
- memset((char *)symbol->var_array, 0, size);
+ if (symbol->var_array == 0) {
symbol->type = Node_var_array;
+ symbol->array_size = symbol->table_size = 0; /* sanity */
+ symbol->flags &= ~ARRAYMAXED;
+ grow_table(symbol);
+ hash1 = hash(subs->stptr, subs->stlen,
+ (unsigned long) symbol->array_size);
} else {
+ hash1 = hash(subs->stptr, subs->stlen,
+ (unsigned long) symbol->array_size);
bucket = assoc_find(symbol, subs, hash1);
if (bucket != NULL) {
free_temp(subs);
@@ -205,6 +311,17 @@ NODE *symbol, *subs;
if (do_lint && subs->stlen == 0)
warning("subscript of array `%s' is null string",
symbol->vname);
+
+ /* first see if we would need to grow the array, before installing */
+ symbol->table_size++;
+ if ((symbol->flags & ARRAYMAXED) == 0
+ && symbol->table_size/symbol->array_size > AVG_CHAIN_MAX) {
+ grow_table(symbol);
+ /* have to recompute hash value for new size */
+ hash1 = hash(subs->stptr, subs->stlen,
+ (unsigned long) symbol->array_size);
+ }
+
getnode(bucket);
bucket->type = Node_ahash;
if (subs->flags & TEMP)
@@ -240,7 +357,7 @@ NODE *symbol, *tree;
if (symbol->var_array == 0)
return;
subs = concat_exp(tree); /* concat_exp returns string node */
- hash1 = hash(subs->stptr, subs->stlen);
+ hash1 = hash(subs->stptr, subs->stlen, (unsigned long) symbol->array_size);
last = NULL;
for (bucket = symbol->var_array[hash1]; bucket; last = bucket, bucket = bucket->ahnext)
@@ -256,6 +373,15 @@ NODE *symbol, *tree;
unref(bucket->ahname);
unref(bucket->ahvalue);
freenode(bucket);
+ symbol->table_size--;
+ if (symbol->table_size <= 0) {
+ memset(symbol->var_array, '\0',
+ sizeof(NODE *) * symbol->array_size);
+ symbol->table_size = symbol->array_size = 0;
+ symbol->flags &= ~ARRAYMAXED;
+ free((char *) symbol->var_array);
+ symbol->var_array = NULL;
+ }
}
void
@@ -263,31 +389,127 @@ assoc_scan(symbol, lookat)
NODE *symbol;
struct search *lookat;
{
- if (!symbol->var_array) {
- lookat->retval = NULL;
- return;
- }
- lookat->arr_ptr = symbol->var_array;
- lookat->arr_end = lookat->arr_ptr + HASHSIZE; /* added */
- lookat->bucket = symbol->var_array[0];
- assoc_next(lookat);
+ lookat->sym = symbol;
+ lookat->idx = 0;
+ lookat->bucket = NULL;
+ lookat->retval = NULL;
+ if (symbol->var_array != NULL)
+ assoc_next(lookat);
}
void
assoc_next(lookat)
struct search *lookat;
{
- while (lookat->arr_ptr < lookat->arr_end) {
- if (lookat->bucket != 0) {
- lookat->retval = lookat->bucket->ahname;
- lookat->bucket = lookat->bucket->ahnext;
+ register NODE *symbol = lookat->sym;
+
+ if (symbol == NULL)
+ fatal("null symbol in assoc_next");
+ if (symbol->var_array == NULL || lookat->idx > symbol->array_size) {
+ lookat->retval = NULL;
+ return;
+ }
+ /*
+ * This is theoretically unsafe. The element bucket might have
+ * been freed if the body of the scan did a delete on the next
+ * element of the bucket. The only way to do that is by array
+ * reference, which is unlikely. Basically, if the user is doing
+ * anything other than an operation on the current element of an
+ * assoc array while walking through it sequentially, all bets are
+ * off. (The safe way is to register all search structs on an
+ * array with the array, and update all of them on a delete or
+ * insert)
+ */
+ if (lookat->bucket != NULL) {
+ lookat->retval = lookat->bucket->ahname;
+ lookat->bucket = lookat->bucket->ahnext;
+ return;
+ }
+ for (; lookat->idx < symbol->array_size; lookat->idx++) {
+ NODE *bucket;
+
+ if ((bucket = symbol->var_array[lookat->idx]) != NULL) {
+ lookat->retval = bucket->ahname;
+ lookat->bucket = bucket->ahnext;
+ lookat->idx++;
return;
}
- lookat->arr_ptr++;
- if (lookat->arr_ptr < lookat->arr_end)
- lookat->bucket = *(lookat->arr_ptr);
- else
- lookat->retval = NULL;
}
+ lookat->retval = NULL;
+ lookat->bucket = NULL;
return;
}
+
+/* grow_table --- grow a hash table */
+
+static void
+grow_table(symbol)
+NODE *symbol;
+{
+ NODE **old, **new, *chain, *next;
+ int i, j;
+ unsigned long hash1;
+ unsigned long oldsize, newsize;
+ /*
+ * This is an array of primes. We grow the table by an order of
+ * magnitude each time (not just doubling) so that growing is a
+ * rare operation. We expect, on average, that it won't happen
+ * more than twice. The final size is also chosen to be small
+ * enough so that MS-DOG mallocs can handle it. When things are
+ * very large (> 8K), we just double more or less, instead of
+ * just jumping from 8K to 64K.
+ */
+ static long sizes[] = { 13, 127, 1021, 8191, 16381, 32749, 65497 };
+
+ /* find next biggest hash size */
+ oldsize = symbol->array_size;
+ newsize = 0;
+ for (i = 0, j = sizeof(sizes)/sizeof(sizes[0]); i < j; i++) {
+ if (oldsize < sizes[i]) {
+ newsize = sizes[i];
+ break;
+ }
+ }
+
+ if (newsize == oldsize) { /* table already at max (!) */
+ symbol->flags |= ARRAYMAXED;
+ return;
+ }
+
+ /* allocate new table */
+ emalloc(new, NODE **, newsize * sizeof(NODE *), "grow_table");
+ memset(new, '\0', newsize * sizeof(NODE *));
+
+ /* brand new hash table, set things up and return */
+ if (symbol->var_array == NULL) {
+ symbol->table_size = 0;
+ goto done;
+ }
+
+ /* old hash table there, move stuff to new, free old */
+ old = symbol->var_array;
+ for (i = 0; i < oldsize; i++) {
+ if (old[i] == NULL)
+ continue;
+
+ for (chain = old[i]; chain != NULL; chain = next) {
+ next = chain->ahnext;
+ hash1 = hash(chain->ahname->stptr,
+ chain->ahname->stlen, newsize);
+
+ /* remove from old list, add to new */
+ chain->ahnext = new[hash1];
+ new[hash1] = chain;
+
+ }
+ }
+ free(old);
+
+done:
+ /*
+ * note that symbol->table_size does not change if an old array,
+ * and is explicitly set to 0 if a new one.
+ */
+ symbol->var_array = new;
+ symbol->array_size = newsize;
+}
diff --git a/gnu/usr.bin/awk/awk.1 b/gnu/usr.bin/awk/awk.1
index 0338485e8db8..a98c99d1608b 100644
--- a/gnu/usr.bin/awk/awk.1
+++ b/gnu/usr.bin/awk/awk.1
@@ -1,7 +1,7 @@
.ds PX \s-1POSIX\s+1
.ds UX \s-1UNIX\s+1
.ds AN \s-1ANSI\s+1
-.TH GAWK 1 "Apr 15 1993" "Free Software Foundation" "Utility Commands"
+.TH GAWK 1 "Apr 18 1994" "Free Software Foundation" "Utility Commands"
.SH NAME
gawk \- pattern scanning and processing language
.SH SYNOPSIS
@@ -71,6 +71,11 @@ option.
Each
.B \-W
option has a corresponding GNU style long option, as detailed below.
+Arguments to GNU style long options are either joined with the option
+by an
+.B =
+sign, with no intervening spaces, or they may be provided in the
+next command line argument.
.PP
.I Gawk
accepts the following options.
@@ -114,6 +119,26 @@ Multiple
(or
.BR \-\^\-file )
options may be used.
+.TP
+.PD 0
+.BI \-mf= NNN
+.TP
+.BI \-mr= NNN
+Set various memory limits to the value
+.IR NNN .
+The
+.B f
+flag sets the maximum number of fields, and the
+.B r
+flag sets the maximum record size. These two flags and the
+.B \-m
+option are from the AT&T Bell Labs research version of \*(UX
+.IR awk .
+They are ignored by
+.IR gawk ,
+since
+.I gawk
+has no pre-defined limits.
.TP \w'\fB\-\^\-copyright\fR'u+1n
.PD 0
.B "\-W compat"
@@ -158,6 +183,8 @@ the error output.
.B \-\^\-usage
Print a relatively short summary of the available options on
the error output.
+Per the GNU Coding Standards, these options cause an immediate,
+successful exit.
.TP
.PD 0
.B "\-W lint"
@@ -248,6 +275,8 @@ This is useful mainly for knowing if the current copy of
on your system
is up to date with respect to whatever the Free Software Foundation
is distributing.
+Per the GNU Coding Standards, these options cause an immediate,
+successful exit.
.TP
.B \-\^\-
Signal the end of options. This is useful to allow further arguments to the
@@ -255,7 +284,13 @@ AWK program itself to start with a ``\-''.
This is mainly for consistency with the argument parsing convention used
by most other \*(PX programs.
.PP
-Any other options are flagged as illegal, but are otherwise ignored.
+In compatibility mode,
+any other options are flagged as illegal, but are otherwise ignored.
+In normal operation, as long as program text has been supplied, unknown
+options are passed on to the AWK program in the
+.B ARGV
+array for processing. This is particularly useful for running AWK
+programs via the ``#!'' executable interpreter mechanism.
.SH AWK PROGRAM EXECUTION
.PP
An AWK program consists of a sequence of pattern-action statements
@@ -270,23 +305,23 @@ and optional function definitions.
.I Gawk
first reads the program source from the
.IR program-file (s)
-if specified, or from the first non-option argument on the command line.
+if specified,
+from arguments to
+.BR "\-W source=" ,
+or from the first non-option argument on the command line.
The
.B \-f
-option may be used multiple times on the command line.
+and
+.B "\-W source="
+options may be used multiple times on the command line.
.I Gawk
will read the program text as if all the
.IR program-file s
+and command line source texts
had been concatenated together. This is useful for building libraries
of AWK functions, without having to include them in each new AWK
-program that uses them. To use a library function in a file from a
-program typed in on the command line, specify
-.B /dev/tty
-as one of the
-.IR program-file s,
-type your program, and end it with a
-.B ^D
-(control-d).
+program that uses them. It also provides the ability to mix library
+functions with command line programs.
.PP
The environment variable
.B AWKPATH
@@ -302,11 +337,13 @@ option contains a ``/'' character, no path search is performed.
.I Gawk
executes AWK programs in the following order.
First,
+all variable assignments specified via the
+.B \-v
+option are performed.
+Next,
.I gawk
compiles the program into an internal form.
-Next, all variable assignments specified via the
-.B \-v
-option are performed. Then,
+Then,
.I gawk
executes the code in the
.B BEGIN
@@ -359,8 +396,8 @@ block(s) (if any).
AWK variables are dynamic; they come into existence when they are
first used. Their values are either floating-point numbers or strings,
or both,
-depending upon how they are used. AWK also has one dimension
-arrays; multiply dimensioned arrays may be simulated.
+depending upon how they are used. AWK also has one dimensional
+arrays; arrays with multiple dimensions may be simulated.
Several pre-defined variables are set as a program
runs; these will be described as needed and summarized below.
.SS Fields
@@ -435,6 +472,7 @@ cause the value of
.B $0
to be recomputed, with the fields being separated by the value of
.BR OFS .
+References to negative numbered fields cause a fatal error.
.SS Built-in Variables
.PP
AWK's built-in variables are:
@@ -482,7 +520,7 @@ If a system error occurs either doing a redirection for
during a read for
.BR getline ,
or during a
-.BR close ,
+.BR close() ,
then
.B ERRNO
will contain
@@ -505,6 +543,11 @@ The name of the current input file.
If no files are specified on the command line, the value of
.B FILENAME
is ``\-''.
+However,
+.B FILENAME
+is undefined inside the
+.B BEGIN
+block.
.TP
.B FNR
The input record number in the current input file.
@@ -644,6 +687,9 @@ loop to iterate over all the elements of an array.
An element may be deleted from an array using the
.B delete
statement.
+The
+.B delete
+statement may also be used to delete the entire contents of an array.
.SS Variable Typing And Conversion
.PP
Variables and fields
@@ -680,7 +726,7 @@ b = a ""
.PP
the variable
.B b
-has a value of \fB"12"\fR and not \fB"12.00"\fR.
+has a string value of \fB"12"\fR and not \fB"12.00"\fR.
.PP
.I Gawk
performs comparisons as follows:
@@ -809,7 +855,8 @@ the third. Only one of the second and third patterns is evaluated.
.PP
The
.IB pattern1 ", " pattern2
-form of an expression is called a range pattern.
+form of an expression is called a
+.IR "range pattern" .
It matches all input records starting with a line that matches
.IR pattern1 ,
and continuing until a record that matches
@@ -982,6 +1029,7 @@ as follows:
\fBbreak\fR
\fBcontinue\fR
\fBdelete \fIarray\^\fB[\^\fIindex\^\fB]\fR
+\fBdelete \fIarray\^\fR
\fBexit\fR [ \fIexpression\fR ]
\fB{ \fIstatements \fB}
.fi
@@ -1046,10 +1094,20 @@ Prints the current record.
.TP
.BI print " expr-list"
Prints expressions.
+Each expression is separated by the value of the
+.B OFS
+variable. The output record is terminated with the value of the
+.B ORS
+variable.
.TP
.BI print " expr-list" " >" file
Prints expressions on
.IR file .
+Each expression is separated by the value of the
+.B OFS
+variable. The output record is terminated with the value of the
+.B ORS
+variable.
.TP
.BI printf " fmt, expr-list"
Format and print.
@@ -1078,8 +1136,9 @@ In a similar fashion,
.IB command " | getline"
pipes into
.BR getline .
-.BR Getline
-will return 0 on end of file, and \-1 on an error.
+The
+.BR getline
+command will return 0 on end of file, and \-1 on an error.
.SS The \fIprintf\fP\^ Statement
.PP
The AWK versions of the
@@ -1153,6 +1212,7 @@ The expression should be left-justified within its field.
The field should be padded to this width. If the number has a leading
zero, then the field will be padded with zeros.
Otherwise it is padded with blanks.
+This applies even to the non-numeric output formats.
.TP
.BI . prec
A number indicating the maximum width of strings or digits to the right
@@ -1229,7 +1289,7 @@ is the value of the
system call.
If there are any additional fields, they are the group IDs returned by
.IR getgroups (2).
-(Multiple groups may not be supported on all systems.)
+Multiple groups may not be supported on all systems.
.TP
.B /dev/stdin
The standard input.
@@ -1360,6 +1420,9 @@ and returns the number of fields. If
is omitted,
.B FS
is used instead.
+The array
+.I a
+is cleared first.
.TP
.BI sprintf( fmt , " expr-list" )
prints
@@ -1477,11 +1540,11 @@ the
As in \*(AN C, all following hexadecimal digits are considered part of
the escape sequence.
(This feature should tell us something about language design by committee.)
-E.g., "\ex1B" is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
+E.g., \fB"\ex1B"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
.TP
.BI \e ddd
The character represented by the 1-, 2-, or 3-digit sequence of octal
-digits. E.g. "\e033" is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
+digits. E.g. \fB"\e033"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
.TP
.BI \e c
The literal character
@@ -1562,7 +1625,15 @@ Concatenate and line number (a variation on a theme):
.ft R
.fi
.SH SEE ALSO
-.IR egrep (1)
+.IR egrep (1),
+.IR getpid (2),
+.IR getppid (2),
+.IR getpgrp (2),
+.IR getuid (2),
+.IR geteuid (2),
+.IR getgid (2),
+.IR getegid (2),
+.IR getgroups (2)
.PP
.IR "The AWK Programming Language" ,
Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger,
@@ -1600,7 +1671,7 @@ block was run. Applications came to depend on this ``feature.''
When
.I awk
was changed to match its documentation, this option was added to
-accomodate applications that depended upon the old behavior.
+accommodate applications that depended upon the old behavior.
(This feature was agreed upon by both the AT&T and GNU developers.)
.PP
The
@@ -1610,7 +1681,11 @@ option for implementation specific features is from the \*(PX standard.
When processing arguments,
.I gawk
uses the special option ``\fB\-\^\-\fP'' to signal the end of
-arguments, and warns about, but otherwise ignores, undefined options.
+arguments.
+In compatibility mode, it will warn about, but otherwise ignore,
+undefined options.
+In normal operation, such arguments are passed on to the AWK program for
+it to process.
.PP
The AWK book does not define the return value of
.BR srand() .
@@ -1706,6 +1781,11 @@ environment variable is not special.
The use of
.B "next file"
to abandon processing of the current input file.
+.TP
+\(bu
+The use of
+.BI delete " array"
+to delete the entire contents of an array.
.RE
.PP
The AWK book does not define the return value of the
@@ -1733,7 +1813,7 @@ option is ``t'', then
will be set to the tab character.
Since this is a rather ugly special case, it is not the default behavior.
This behavior also does not occur if
-.B \-Wposix
+.B "\-W posix"
has been specified.
.ig
.PP
@@ -1785,7 +1865,7 @@ a = length($0)
This feature is marked as ``deprecated'' in the \*(PX standard, and
.I gawk
will issue a warning about its use if
-.B \-Wlint
+.B "\-W lint"
is specified on the command line.
.PP
The other feature is the use of the
@@ -1801,8 +1881,21 @@ equivalent to the
statement.
.I Gawk
will support this usage if
-.B \-Wposix
+.B "\-W posix"
has not been specified.
+.SH ENVIRONMENT VARIABLES
+If
+.B POSIXLY_CORRECT
+exists in the environment, then
+.I gawk
+behaves exactly as if
+.B \-\-posix
+had been specified on the command line.
+If
+.B \-\-lint
+has been specified,
+.I gawk
+will issue a warning message to this effect.
.SH BUGS
The
.B \-F
@@ -1844,6 +1937,7 @@ the
and
.B \-e
options of the 2.11 version are no longer recognized.
+This fact will not even be documented in the manual page for version 2.16.
.SH AUTHORS
The original version of \*(UX
.I awk
@@ -1867,6 +1961,8 @@ compatible with the new version of \*(UX
The initial DOS port was done by Conrad Kwok and Scott Garfinkle.
Scott Deifik is the current DOS maintainer. Pat Rankin did the
port to VMS, and Michal Jaegermann did the port to the Atari ST.
+The port to OS/2 was done by Kai Uwe Rommel, with contributions and
+help from Darrel Hankerson.
.SH ACKNOWLEDGEMENTS
Brian Kernighan of Bell Labs
provided valuable assistance during testing and debugging.
diff --git a/gnu/usr.bin/awk/awk.h b/gnu/usr.bin/awk/awk.h
index ca3997f11d4b..066bf444f9f3 100644
--- a/gnu/usr.bin/awk/awk.h
+++ b/gnu/usr.bin/awk/awk.h
@@ -3,7 +3,7 @@
*/
/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
*
* This file is part of GAWK, the GNU implementation of the
* AWK Progamming Language.
@@ -24,14 +24,18 @@
*/
/* ------------------------------ Includes ------------------------------ */
+#include "config.h"
+
#include <stdio.h>
+#ifndef LIMITS_H_MISSING
#include <limits.h>
+#endif
#include <ctype.h>
#include <setjmp.h>
#include <varargs.h>
#include <time.h>
#include <errno.h>
-#if !defined(errno) && !defined(MSDOS)
+#if !defined(errno) && !defined(MSDOS) && !defined(OS2)
extern int errno;
#endif
#ifdef __GNU_LIBRARY__
@@ -42,6 +46,10 @@ extern int errno;
/* ----------------- System dependencies (with more includes) -----------*/
+#if defined(__FreeBSD__)
+# include <floatingpoint.h>
+#endif
+
#if !defined(VMS) || (!defined(VAXC) && !defined(__DECC))
#include <sys/types.h>
#include <sys/stat.h>
@@ -53,8 +61,6 @@ extern int errno;
#include <signal.h>
-#include "config.h"
-
#ifdef __STDC__
#define P(s) s
#define MALLOC_ARG_T size_t
@@ -88,7 +94,7 @@ typedef unsigned int size_t;
#if defined(atarist) || defined(VMS)
#include <unixlib.h>
#else /* atarist || VMS */
-#ifndef MSDOS
+#if !defined(MSDOS) && !defined(_MSC_VER)
#include <unistd.h>
#endif /* MSDOS */
#endif /* atarist || VMS */
@@ -111,7 +117,11 @@ extern char *alloca();
#endif
#else /* not sparc */
#if !defined(alloca) && !defined(ALLOCA_PROTO)
+#if defined(_MSC_VER)
+#include <malloc.h>
+#else
extern char *alloca();
+#endif /* _MSC_VER */
#endif
#endif /* sparc */
#endif /* __GNUC__ */
@@ -150,7 +160,7 @@ extern FILE *popen P((const char *,const char *));
extern int pclose P((FILE *));
extern void vms_arg_fixup P((int *,char ***));
/* some things not in STDC_HEADERS */
-extern int gnu_strftime P((char *,size_t,const char *,const struct tm *));
+extern size_t gnu_strftime P((char *,size_t,const char *,const struct tm *));
extern int unlink P((const char *));
extern int getopt P((int,char **,char *));
extern int isatty P((int));
@@ -158,14 +168,9 @@ extern int isatty P((int));
extern int fileno P((FILE *));
#endif
extern int close(), dup(), dup2(), fstat(), read(), stat();
+extern int getpgrp P((void));
#endif /*VMS*/
-#ifdef MSDOS
-#include <io.h>
-extern FILE *popen P((char *, char *));
-extern int pclose P((FILE *));
-#endif
-
#define GNU_REGEX
#ifdef GNU_REGEX
#include "regex.h"
@@ -173,7 +178,7 @@ extern int pclose P((FILE *));
typedef struct Regexp {
struct re_pattern_buffer pat;
struct re_registers regs;
- struct regexp dfareg;
+ struct dfa dfareg;
int dfa;
} Regexp;
#define RESTART(rp,s) (rp)->regs.start[0]
@@ -184,6 +189,9 @@ typedef struct Regexp {
#ifdef atarist
#define read _text_read /* we do not want all these CR's to mess our input */
extern int _text_read (int, char *, int);
+#ifndef __MINT__
+#undef NGROUPS_MAX
+#endif /* __MINT__ */
#endif
#ifndef DEFPATH
@@ -194,6 +202,8 @@ extern int _text_read (int, char *, int);
#define ENVSEP ':'
#endif
+extern double double_to_int P((double d));
+
/* ------------------ Constants, Structures, Typedefs ------------------ */
#define AWKNUM double
@@ -331,6 +341,7 @@ typedef struct exp_node {
union {
struct exp_node *lptr;
char *param_name;
+ long ll;
} l;
union {
struct exp_node *rptr;
@@ -343,6 +354,7 @@ typedef struct exp_node {
union {
char *name;
struct exp_node *extra;
+ long xl;
} x;
short number;
unsigned char reflags;
@@ -362,7 +374,7 @@ typedef struct exp_node {
struct {
struct exp_node *next;
char *name;
- int length;
+ size_t length;
struct exp_node *value;
} hash;
#define hnext sub.hash.next
@@ -388,8 +400,8 @@ typedef struct exp_node {
# define NUM 32 /* numeric value is current */
# define NUMBER 64 /* assigned as number */
# define MAYBE_NUM 128 /* user input: if NUMERIC then
- * a NUMBER
- */
+ * a NUMBER */
+# define ARRAYMAXED 256 /* array is at max size */
char *vname; /* variable's name */
} NODE;
@@ -422,6 +434,8 @@ typedef struct exp_node {
#define var_value lnode
#define var_array sub.nodep.r.av
+#define array_size sub.nodep.l.ll
+#define table_size sub.nodep.x.xl
#define condpair lnode
#define triggered sub.nodep.r.r_ent
@@ -429,8 +443,6 @@ typedef struct exp_node {
#ifdef DONTDEF
int primes[] = {31, 61, 127, 257, 509, 1021, 2053, 4099, 8191, 16381};
#endif
-/* a quick profile suggests that the following is a good value */
-#define HASHSIZE 127
typedef struct for_loop_header {
NODE *init;
@@ -440,8 +452,8 @@ typedef struct for_loop_header {
/* for "for(iggy in foo) {" */
struct search {
- NODE **arr_ptr;
- NODE **arr_end;
+ NODE *sym;
+ size_t idx;
NODE *bucket;
NODE *retval;
};
@@ -499,13 +511,25 @@ struct src {
/* Return means return from a function call; leave value in ret_node */
#define TAG_RETURN 3
+#ifndef INT_MAX
+#define INT_MAX (~(1 << (sizeof (int) * 8 - 1)))
+#endif
+#ifndef LONG_MAX
+#define LONG_MAX (~(1 << (sizeof (long) * 8 - 1)))
+#endif
+#ifndef ULONG_MAX
+#define ULONG_MAX (~(unsigned long)0)
+#endif
+#ifndef LONG_MIN
+#define LONG_MIN (-LONG_MAX - 1)
+#endif
#define HUGE INT_MAX
/* -------------------------- External variables -------------------------- */
/* gawk builtin variables */
-extern int NF;
-extern int NR;
-extern int FNR;
+extern long NF;
+extern long NR;
+extern long FNR;
extern int IGNORECASE;
extern char *RS;
extern char *OFS;
@@ -558,17 +582,20 @@ extern int in_end_rule;
#ifdef DEBUG
#define tree_eval(t) r_tree_eval(t)
+#define get_lhs(p, a) r_get_lhs((p), (a))
+#undef freenode
#else
-#define tree_eval(t) (_t = (t),(_t) == NULL ? Nnull_string : \
- ((_t)->type == Node_val ? (_t) : \
- ((_t)->type == Node_var ? (_t)->var_value : \
- ((_t)->type == Node_param_list ? \
- (stack_ptr[(_t)->param_cnt])->var_value : \
- r_tree_eval((_t))))))
+#define get_lhs(p, a) ((p)->type == Node_var ? (&(p)->var_value) : \
+ r_get_lhs((p), (a)))
+#define tree_eval(t) (_t = (t),_t == NULL ? Nnull_string : \
+ (_t->type == Node_param_list ? r_tree_eval(_t) : \
+ (_t->type == Node_val ? _t : \
+ (_t->type == Node_var ? _t->var_value : \
+ r_tree_eval(_t)))))
#endif
-#define make_number(x) mk_number((x), (MALLOC|NUM|NUMBER))
-#define tmp_number(x) mk_number((x), (MALLOC|TEMP|NUM|NUMBER))
+#define make_number(x) mk_number((x), (unsigned int)(MALLOC|NUM|NUMBER))
+#define tmp_number(x) mk_number((x), (unsigned int)(MALLOC|TEMP|NUM|NUMBER))
#define free_temp(n) do {if ((n)->flags&TEMP) { unref(n); }} while (0)
#define make_string(s,l) make_str_node((s), SZTC (l),0)
@@ -620,7 +647,7 @@ extern double _msc51bug;
/* array.c */
extern NODE *concat_exp P((NODE *tree));
extern void assoc_clear P((NODE *symbol));
-extern unsigned int hash P((char *s, int len));
+extern unsigned int hash P((const char *s, size_t len, unsigned long hsize));
extern int in_array P((NODE *symbol, NODE *subs));
extern NODE **assoc_lookup P((NODE *symbol, NODE *subs));
extern void do_delete P((NODE *symbol, NODE *tree));
@@ -631,7 +658,7 @@ extern char *tokexpand P((void));
extern char nextc P((void));
extern NODE *node P((NODE *left, NODETYPE op, NODE *right));
extern NODE *install P((char *name, NODE *value));
-extern NODE *lookup P((char *name));
+extern NODE *lookup P((const char *name));
extern NODE *variable P((char *name, int can_free));
extern int yyparse P((void));
/* builtin.c */
@@ -663,7 +690,7 @@ extern NODE *do_sub P((NODE *tree));
extern int interpret P((NODE *volatile tree));
extern NODE *r_tree_eval P((NODE *tree));
extern int cmp_nodes P((NODE *t1, NODE *t2));
-extern NODE **get_lhs P((NODE *ptr, Func_ptr *assign));
+extern NODE **r_get_lhs P((NODE *ptr, Func_ptr *assign));
extern void set_IGNORECASE P((void));
void set_OFS P((void));
void set_ORS P((void));
@@ -687,8 +714,8 @@ extern struct redirect *redirect P((NODE *tree, int *errflg));
extern NODE *do_close P((NODE *tree));
extern int flush_io P((void));
extern int close_io P((void));
-extern int devopen P((char *name, char *mode));
-extern int pathopen P((char *file));
+extern int devopen P((const char *name, const char *mode));
+extern int pathopen P((const char *file));
extern NODE *do_getline P((NODE *tree));
extern void do_nextfile P((void));
/* iop.c */
@@ -702,13 +729,12 @@ extern void load_environ P((void));
extern char *arg_assign P((char *arg));
extern SIGTYPE catchsig P((int sig, int code));
/* msg.c */
-#ifdef MSDOS
-extern void err P((char *s, char *emsg, char *va_list, ...));
-extern void msg P((char *va_alist, ...));
-extern void warning P((char *va_alist, ...));
-extern void fatal P((char *va_alist, ...));
+extern void err P((const char *s, const char *emsg, va_list argp));
+#if _MSC_VER == 510
+extern void msg P((va_list va_alist, ...));
+extern void warning P((va_list va_alist, ...));
+extern void fatal P((va_list va_alist, ...));
#else
-extern void err ();
extern void msg ();
extern void warning ();
extern void fatal ();
@@ -727,8 +753,9 @@ extern void freenode P((NODE *it));
extern void unref P((NODE *tmp));
extern int parse_escape P((char **string_ptr));
/* re.c */
-extern Regexp *make_regexp P((char *s, int len, int ignorecase, int dfa));
-extern int research P((Regexp *rp, char *str, int start, int len, int need_start));
+extern Regexp *make_regexp P((char *s, size_t len, int ignorecase, int dfa));
+extern int research P((Regexp *rp, char *str, int start,
+ size_t len, int need_start));
extern void refree P((Regexp *rp));
extern void reg_error P((const char *s));
extern Regexp *re_update P((NODE *t));
diff --git a/gnu/usr.bin/awk/awk.y b/gnu/usr.bin/awk/awk.y
index 6e87f1c449cc..175cea90a8cd 100644
--- a/gnu/usr.bin/awk/awk.y
+++ b/gnu/usr.bin/awk/awk.y
@@ -3,7 +3,7 @@
*/
/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
*
* This file is part of GAWK, the GNU implementation of the
* AWK Progamming Language.
@@ -56,9 +56,10 @@ static char *thisline = NULL;
#define YYDEBUG_LEXER_TEXT (lexeme)
static int param_counter;
static char *tokstart = NULL;
-static char *token = NULL;
+static char *tok = NULL;
static char *tokend;
+#define HASHSIZE 1021 /* this constant only used here */
NODE *variables[HASHSIZE];
extern char *source;
@@ -116,6 +117,7 @@ extern NODE *end_block;
%left LEX_GETLINE
%nonassoc LEX_IN
%left FUNC_CALL LEX_BUILTIN LEX_LENGTH
+%nonassoc ','
%nonassoc MATCHOP
%nonassoc RELOP '<' '>' '|' APPEND_OP
%left CONCAT_OP
@@ -161,6 +163,7 @@ program
}
| error { $$ = NULL; }
| program error { $$ = NULL; }
+ | /* empty */ { $$ = NULL; }
;
rule
@@ -276,7 +279,7 @@ function_body
pattern
: exp
{ $$ = $1; }
- | exp comma exp
+ | exp ',' exp
{ $$ = mkrangenode ( node($1, Node_cond_pair, $3) ); }
;
@@ -290,7 +293,7 @@ regexp
REGEXP '/'
{
NODE *n;
- int len;
+ size_t len;
getnode(n);
n->type = Node_regex;
@@ -385,10 +388,19 @@ statement
if ($2 && $2 == lookup("file")) {
if (do_lint)
warning("`next file' is a gawk extension");
- else if (do_unix || do_posix)
- yyerror("`next file' is a gawk extension");
- else if (! io_allowed)
- yyerror("`next file' used in BEGIN or END action");
+ if (do_unix || do_posix) {
+ /*
+ * can't use yyerror, since may have overshot
+ * the source line
+ */
+ errcount++;
+ msg("`next file' is a gawk extension");
+ }
+ if (! io_allowed) {
+ /* same thing */
+ errcount++;
+ msg("`next file' used in BEGIN or END action");
+ }
type = Node_K_nextfile;
} else {
if (! io_allowed)
@@ -405,6 +417,20 @@ statement
{ $$ = node ($3, Node_K_return, (NODE *)NULL); }
| LEX_DELETE NAME '[' expression_list ']' statement_term
{ $$ = node (variable($2,1), Node_K_delete, $4); }
+ | LEX_DELETE NAME statement_term
+ {
+ if (do_lint)
+ warning("`delete array' is a gawk extension");
+ if (do_unix || do_posix) {
+ /*
+ * can't use yyerror, since may have overshot
+ * the source line
+ */
+ errcount++;
+ msg("`delete array' is a gawk extension");
+ }
+ $$ = node (variable($2,1), Node_K_delete, (NODE *) NULL);
+ }
| exp statement_term
{ $$ = $1; }
;
@@ -693,7 +719,11 @@ non_post_simp_exp
$$ = node ($2, Node_unary_minus, (NODE *)NULL);
}
| '+' simp_exp %prec UNARY
- { $$ = $2; }
+ {
+ /* was: $$ = $2 */
+ /* POSIX semantics: force a conversion to numeric type */
+ $$ = node (make_number(0.0), Node_plus, $2);
+ }
;
opt_variable
@@ -745,7 +775,7 @@ comma : ',' opt_nls { yyerrok; }
%%
struct token {
- char *operator; /* text to match */
+ const char *operator; /* text to match */
NODETYPE value; /* node type */
int class; /* lexical class */
unsigned flags; /* # of args. allowed and compatability */
@@ -819,14 +849,15 @@ yyerror(va_alist)
va_dcl
{
va_list args;
- char *mesg = NULL;
+ const char *mesg = NULL;
register char *bp, *cp;
char *scan;
char buf[120];
+ static char end_of_file_line[] = "(END OF FILE)";
errcount++;
/* Find the current line in the input file */
- if (lexptr) {
+ if (lexptr && lexeme) {
if (!thisline) {
cp = lexeme;
if (*cp == '\n') {
@@ -834,7 +865,7 @@ va_dcl
mesg = "unexpected newline";
}
for ( ; cp != lexptr_begin && *cp != '\n'; --cp)
- ;
+ continue;
if (*cp == '\n')
cp++;
thisline = cp;
@@ -844,8 +875,8 @@ va_dcl
while (bp < lexend && *bp && *bp != '\n')
bp++;
} else {
- thisline = "(END OF FILE)";
- bp = thisline + 13;
+ thisline = end_of_file_line;
+ bp = thisline + strlen(thisline);
}
msg("%.*s", (int) (bp - thisline), thisline);
bp = buf;
@@ -882,12 +913,22 @@ get_src_buf()
static int did_newline = 0;
# define SLOP 128 /* enough space to hold most source lines */
+again:
if (nextfile > numfiles)
return NULL;
if (srcfiles[nextfile].stype == CMDLINE) {
if (len == 0) {
len = strlen(srcfiles[nextfile].val);
+ if (len == 0) {
+ /*
+ * Yet Another Special case:
+ * gawk '' /path/name
+ * Sigh.
+ */
+ ++nextfile;
+ goto again;
+ }
sourceline = 1;
lexptr = lexptr_begin = srcfiles[nextfile].val;
lexend = lexptr + len;
@@ -973,6 +1014,7 @@ get_src_buf()
if (n == 0) {
samefile = 0;
nextfile++;
+ *lexeme = '\0';
len = 0;
return get_src_buf();
}
@@ -981,7 +1023,7 @@ get_src_buf()
return buf;
}
-#define tokadd(x) (*token++ = (x), token == tokend ? tokexpand() : token)
+#define tokadd(x) (*tok++ = (x), tok == tokend ? tokexpand() : tok)
char *
tokexpand()
@@ -989,15 +1031,15 @@ tokexpand()
static int toksize = 60;
int tokoffset;
- tokoffset = token - tokstart;
+ tokoffset = tok - tokstart;
toksize *= 2;
if (tokstart)
erealloc(tokstart, char *, toksize, "tokexpand");
else
emalloc(tokstart, char *, toksize, "tokexpand");
tokend = tokstart + toksize;
- token = tokstart + tokoffset;
- return token;
+ tok = tokstart + tokoffset;
+ return tok;
}
#if DEBUG
@@ -1036,13 +1078,23 @@ yylex()
if (!nextc())
return 0;
pushback();
+#ifdef OS2
+ /*
+ * added for OS/2's extproc feature of cmd.exe
+ * (like #! in BSD sh)
+ */
+ if (strncasecmp(lexptr, "extproc ", 8) == 0) {
+ while (*lexptr && *lexptr != '\n')
+ lexptr++;
+ }
+#endif
lexeme = lexptr;
thisline = NULL;
if (want_regexp) {
int in_brack = 0;
want_regexp = 0;
- token = tokstart;
+ tok = tokstart;
while ((c = nextc()) != 0) {
switch (c) {
case '[':
@@ -1079,11 +1131,11 @@ yylex()
}
retry:
while ((c = nextc()) == ' ' || c == '\t')
- ;
+ continue;
lexeme = lexptr ? lexptr - 1 : lexptr;
thisline = NULL;
- token = tokstart;
+ tok = tokstart;
yylval.nodetypeval = Node_illegal;
switch (c) {
@@ -1104,18 +1156,28 @@ retry:
case '\\':
#ifdef RELAXED_CONTINUATION
- if (!do_unix) { /* strip trailing white-space and/or comment */
- while ((c = nextc()) == ' ' || c == '\t') continue;
+ /*
+ * This code puports to allow comments and/or whitespace
+ * after the `\' at the end of a line used for continuation.
+ * Use it at your own risk. We think it's a bad idea, which
+ * is why it's not on by default.
+ */
+ if (!do_unix) {
+ /* strip trailing white-space and/or comment */
+ while ((c = nextc()) == ' ' || c == '\t')
+ continue;
if (c == '#')
- while ((c = nextc()) != '\n') if (!c) break;
+ while ((c = nextc()) != '\n')
+ if (c == '\0')
+ break;
pushback();
}
-#endif /*RELAXED_CONTINUATION*/
+#endif /* RELAXED_CONTINUATION */
if (nextc() == '\n') {
sourceline++;
goto retry;
} else
- yyerror("inappropriate use of backslash");
+ yyerror("backslash not last character on line");
break;
case '$':
@@ -1296,7 +1358,7 @@ retry:
tokadd(c);
}
yylval.nodeval = make_str_node(tokstart,
- token - tokstart, esc_seen ? SCAN : 0);
+ tok - tokstart, esc_seen ? SCAN : 0);
yylval.nodeval->flags |= PERM;
return YSTRING;
@@ -1384,7 +1446,7 @@ retry:
break;
if (c == '#') {
while ((c = nextc()) != '\n' && c != '\0')
- ;
+ continue;
if (c == '\0')
break;
}
@@ -1410,7 +1472,7 @@ retry:
break;
if (c == '#') {
while ((c = nextc()) != '\n' && c != '\0')
- ;
+ continue;
if (c == '\0')
break;
}
@@ -1432,14 +1494,14 @@ retry:
yyerror("Invalid char '%c' in expression\n", c);
/* it's some type of name-type-thing. Find its length */
- token = tokstart;
+ tok = tokstart;
while (is_identchar(c)) {
tokadd(c);
c = nextc();
}
tokadd('\0');
- emalloc(tokkey, char *, token - tokstart, "yylex");
- memcpy(tokkey, tokstart, token - tokstart);
+ emalloc(tokkey, char *, tok - tokstart, "yylex");
+ memcpy(tokkey, tokstart, tok - tokstart);
pushback();
/* See if it is a special token. */
@@ -1478,6 +1540,7 @@ retry:
else
yylval.nodetypeval = tokentab[mid].value;
+ free(tokkey);
return tokentab[mid].class;
}
}
@@ -1638,10 +1701,11 @@ char *name;
NODE *value;
{
register NODE *hp;
- register int len, bucket;
+ register size_t len;
+ register int bucket;
len = strlen(name);
- bucket = hash(name, len);
+ bucket = hash(name, len, (unsigned long) HASHSIZE);
getnode(hp);
hp->type = Node_hashnode;
hp->hnext = variables[bucket];
@@ -1656,13 +1720,13 @@ NODE *value;
/* find the most recent hash node for name installed by install */
NODE *
lookup(name)
-char *name;
+const char *name;
{
register NODE *bucket;
- register int len;
+ register size_t len;
len = strlen(name);
- bucket = variables[hash(name, len)];
+ bucket = variables[hash(name, len, (unsigned long) HASHSIZE)];
while (bucket) {
if (bucket->hlength == len && STREQN(bucket->hname, name, len))
return bucket->hvalue;
@@ -1721,12 +1785,12 @@ NODE *np;
int freeit;
{
register NODE *bucket, **save;
- register int len;
+ register size_t len;
char *name;
name = np->param;
len = strlen(name);
- save = &(variables[hash(name, len)]);
+ save = &(variables[hash(name, len, (unsigned long) HASHSIZE)]);
for (bucket = *save; bucket; bucket = bucket->hnext) {
if (len == bucket->hlength && STREQN(bucket->hname, name, len)) {
*save = bucket->hnext;
diff --git a/gnu/usr.bin/awk/builtin.c b/gnu/usr.bin/awk/builtin.c
index 9d5e3b302fde..96411de70df4 100644
--- a/gnu/usr.bin/awk/builtin.c
+++ b/gnu/usr.bin/awk/builtin.c
@@ -3,7 +3,7 @@
*/
/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
*
* This file is part of GAWK, the GNU implementation of the
* AWK Progamming Language.
@@ -26,9 +26,8 @@
#include "awk.h"
-
#ifndef SRANDOM_PROTO
-extern void srandom P((int seed));
+extern void srandom P((unsigned int seed));
#endif
#ifndef linux
extern char *initstate P((unsigned seed, char *state, int n));
@@ -40,10 +39,7 @@ extern NODE **fields_arr;
extern int output_is_tty;
static NODE *sub_common P((NODE *tree, int global));
-
-#ifdef GFMT_WORKAROUND
-char *gfmt P((double g, int prec, char *buf));
-#endif
+NODE *format_tree P((const char *, int, NODE *));
#ifdef _CRAY
/* Work around a problem in conversion of doubles to exact integers. */
@@ -63,12 +59,35 @@ double (*Log)() = log;
#define Ceil(n) ceil(n)
#endif
+#define DEFAULT_G_PRECISION 6
+
+#ifdef GFMT_WORKAROUND
+/* semi-temporary hack, mostly to gracefully handle VMS */
+static void sgfmt P((char *buf, const char *format, int alt,
+ int fwidth, int precision, double value));
+#endif /* GFMT_WORKAROUND */
+
+/*
+ * On the alpha, LONG_MAX is too big for doing rand().
+ * On the Cray (Y-MP, anyway), ints and longs are 64 bits, but
+ * random() does things in terms of 32 bits. So we have to chop
+ * LONG_MAX down.
+ */
+#if (defined(__alpha) && defined(__osf__)) || defined(_CRAY)
+#define GAWK_RANDOM_MAX (LONG_MAX & 0x7fffffff)
+#else
+#define GAWK_RANDOM_MAX LONG_MAX
+#endif
+
+static void efwrite P((const void *ptr, size_t size, size_t count, FILE *fp,
+ const char *from, struct redirect *rp,int flush));
+
static void
efwrite(ptr, size, count, fp, from, rp, flush)
-void *ptr;
-unsigned size, count;
+const void *ptr;
+size_t size, count;
FILE *fp;
-char *from;
+const char *from;
struct redirect *rp;
int flush;
{
@@ -117,7 +136,7 @@ NODE *tree;
{
NODE *s1, *s2;
register char *p1, *p2;
- register int l1, l2;
+ register size_t l1, l2;
long ret;
@@ -160,21 +179,30 @@ NODE *tree;
return tmp_number((AWKNUM) ret);
}
+double
+double_to_int(d)
+double d;
+{
+ double floor P((double));
+ double ceil P((double));
+
+ if (d >= 0)
+ d = Floor(d);
+ else
+ d = Ceil(d);
+ return d;
+}
+
NODE *
do_int(tree)
NODE *tree;
{
NODE *tmp;
- double floor P((double));
- double ceil P((double));
double d;
tmp = tree_eval(tree->lnode);
d = force_number(tmp);
- if (d >= 0)
- d = Floor(d);
- else
- d = Ceil(d);
+ d = double_to_int(d);
free_temp(tmp);
return tmp_number((AWKNUM) d);
}
@@ -184,7 +212,7 @@ do_length(tree)
NODE *tree;
{
NODE *tmp;
- int len;
+ size_t len;
tmp = tree_eval(tree->lnode);
len = force_string(tmp)->stlen;
@@ -211,27 +239,54 @@ NODE *tree;
return tmp_number((AWKNUM) d);
}
-/* %e and %f formats are not properly implemented. Someone should fix them */
-/* Actually, this whole thing should be reimplemented. */
+/*
+ * format_tree() formats nodes of a tree, starting with a left node,
+ * and accordingly to a fmt_string providing a format like in
+ * printf family from C library. Returns a string node which value
+ * is a formatted string. Called by sprintf function.
+ *
+ * It is one of the uglier parts of gawk. Thanks to Michal Jaegermann
+ * for taming this beast and making it compatible with ANSI C.
+ */
NODE *
-do_sprintf(tree)
-NODE *tree;
+format_tree(fmt_string, n0, carg)
+const char *fmt_string;
+int n0;
+register NODE *carg;
{
+/* copy 'l' bytes from 's' to 'obufout' checking for space in the process */
+/* difference of pointers should be of ptrdiff_t type, but let us be kind */
#define bchunk(s,l) if(l) {\
while((l)>ofre) {\
- erealloc(obuf, char *, osiz*2, "do_sprintf");\
+ long olen = obufout - obuf;\
+ erealloc(obuf, char *, osiz*2, "format_tree");\
ofre+=osiz;\
osiz*=2;\
+ obufout = obuf + olen;\
}\
- memcpy(obuf+olen,s,(l));\
- olen+=(l);\
+ memcpy(obufout,s,(size_t)(l));\
+ obufout+=(l);\
ofre-=(l);\
}
+/* copy one byte from 's' to 'obufout' checking for space in the process */
+#define bchunk_one(s) {\
+ if(ofre <= 0) {\
+ long olen = obufout - obuf;\
+ erealloc(obuf, char *, osiz*2, "format_tree");\
+ ofre+=osiz;\
+ osiz*=2;\
+ obufout = obuf + olen;\
+ }\
+ *obufout++ = *s;\
+ --ofre;\
+ }
/* Is there space for something L big in the buffer? */
#define chksize(l) if((l)>ofre) {\
- erealloc(obuf, char *, osiz*2, "do_sprintf");\
+ long olen = obufout - obuf;\
+ erealloc(obuf, char *, osiz*2, "format_tree");\
+ obufout = obuf + olen;\
ofre+=osiz;\
osiz*=2;\
}
@@ -250,15 +305,14 @@ NODE *tree;
NODE *r;
int toofew = 0;
- char *obuf;
- int osiz, ofre, olen;
- static char chbuf[] = "0123456789abcdef";
- static char sp[] = " ";
- char *s0, *s1;
- int n0;
- NODE *sfmt, *arg;
- register NODE *carg;
- long fw, prec, lj, alt, big;
+ char *obuf, *obufout;
+ size_t osiz, ofre;
+ char *chbuf;
+ const char *s0, *s1;
+ int cs1;
+ NODE *arg;
+ long fw, prec;
+ int lj, alt, big, have_prec;
long *cur;
long val;
#ifdef sun386 /* Can't cast unsigned (int/long) from ptr->value */
@@ -266,26 +320,26 @@ NODE *tree;
#endif
unsigned long uval;
int sgn;
- int base;
+ int base = 0;
char cpbuf[30]; /* if we have numbers bigger than 30 */
char *cend = &cpbuf[30];/* chars, we lose, but seems unlikely */
char *cp;
char *fill;
double tmpval;
- char *pr_str;
- int ucasehex = 0;
char signchar = 0;
- int len;
-
+ size_t len;
+ static char sp[] = " ";
+ static char zero_string[] = "0";
+ static char lchbuf[] = "0123456789abcdef";
+ static char Uchbuf[] = "0123456789ABCDEF";
- emalloc(obuf, char *, 120, "do_sprintf");
+ emalloc(obuf, char *, 120, "format_tree");
+ obufout = obuf;
osiz = 120;
ofre = osiz - 1;
- olen = 0;
- sfmt = tree_eval(tree->lnode);
- sfmt = force_string(sfmt);
- carg = tree->rnode;
- for (s0 = s1 = sfmt->stptr, n0 = sfmt->stlen; n0-- > 0;) {
+
+ s0 = s1 = fmt_string;
+ while (n0-- > 0) {
if (*s1 != '%') {
s1++;
continue;
@@ -295,24 +349,31 @@ NODE *tree;
cur = &fw;
fw = 0;
prec = 0;
+ have_prec = 0;
lj = alt = big = 0;
fill = sp;
cp = cend;
+ chbuf = lchbuf;
s1++;
retry:
--n0;
- switch (*s1++) {
+ switch (cs1 = *s1++) {
+ case (-1): /* dummy case to allow for checking */
+check_pos:
+ if (cur != &fw)
+ break; /* reject as a valid format */
+ goto retry;
case '%':
- bchunk("%", 1);
+ bchunk_one("%");
s0 = s1;
break;
case '0':
- if (fill != sp || lj)
- goto lose;
+ if (lj)
+ goto retry;
if (cur == &fw)
- fill = "0"; /* FALL through */
+ fill = zero_string; /* FALL through */
case '1':
case '2':
case '3':
@@ -323,44 +384,65 @@ retry:
case '8':
case '9':
if (cur == 0)
- goto lose;
- *cur = s1[-1] - '0';
+ /* goto lose; */
+ break;
+ if (prec >= 0)
+ *cur = cs1 - '0';
+ /* with a negative precision *cur is already set */
+ /* to -1, so it will remain negative, but we have */
+ /* to "eat" precision digits in any case */
while (n0 > 0 && *s1 >= '0' && *s1 <= '9') {
--n0;
*cur = *cur * 10 + *s1++ - '0';
}
+ if (prec < 0) /* negative precision is discarded */
+ have_prec = 0;
+ if (cur == &prec)
+ cur = 0;
goto retry;
case '*':
if (cur == 0)
- goto lose;
+ /* goto lose; */
+ break;
parse_next_arg();
*cur = force_number(arg);
free_temp(arg);
+ if (cur == &prec)
+ cur = 0;
goto retry;
case ' ': /* print ' ' or '-' */
+ /* 'space' flag is ignored */
+ /* if '+' already present */
+ if (signchar != 0)
+ goto check_pos;
+ /* FALL THROUGH */
case '+': /* print '+' or '-' */
- signchar = *(s1-1);
- goto retry;
+ signchar = cs1;
+ goto check_pos;
case '-':
- if (lj || fill != sp)
- goto lose;
- lj++;
- goto retry;
+ if (prec < 0)
+ break;
+ if (cur == &prec) {
+ prec = -1;
+ goto retry;
+ }
+ fill = sp; /* if left justified then other */
+ lj++; /* filling is ignored */
+ goto check_pos;
case '.':
if (cur != &fw)
- goto lose;
+ break;
cur = &prec;
+ have_prec++;
goto retry;
case '#':
- if (alt)
- goto lose;
alt++;
- goto retry;
+ goto check_pos;
case 'l':
if (big)
- goto lose;
+ break;
big++;
- goto retry;
+ goto check_pos;
case 'c':
parse_next_arg();
if (arg->flags & NUMBER) {
@@ -372,240 +454,189 @@ retry:
#endif
cpbuf[0] = uval;
prec = 1;
- pr_str = cpbuf;
- goto dopr_string;
+ cp = cpbuf;
+ goto pr_tail;
}
- if (! prec)
+ if (have_prec == 0)
prec = 1;
else if (prec > arg->stlen)
prec = arg->stlen;
- pr_str = arg->stptr;
- goto dopr_string;
+ cp = arg->stptr;
+ goto pr_tail;
case 's':
parse_next_arg();
arg = force_string(arg);
- if (!prec || prec > arg->stlen)
+ if (have_prec == 0 || prec > arg->stlen)
prec = arg->stlen;
- pr_str = arg->stptr;
-
- dopr_string:
- if (fw > prec && !lj) {
- while (fw > prec) {
- bchunk(sp, 1);
- fw--;
- }
- }
- bchunk(pr_str, (int) prec);
- if (fw > prec) {
- while (fw > prec) {
- bchunk(sp, 1);
- fw--;
- }
- }
- s0 = s1;
- free_temp(arg);
- break;
+ cp = arg->stptr;
+ goto pr_tail;
case 'd':
case 'i':
parse_next_arg();
- val = (long) force_number(arg);
- free_temp(arg);
+ tmpval = force_number(arg);
+ if (tmpval > LONG_MAX || tmpval < LONG_MIN) {
+ /* out of range - emergency use of %g format */
+ cs1 = 'g';
+ goto format_float;
+ }
+ val = (long) tmpval;
+
if (val < 0) {
sgn = 1;
- val = -val;
- } else
+ if (val > LONG_MIN)
+ uval = (unsigned long) -val;
+ else
+ uval = (unsigned long)(-(LONG_MIN + 1))
+ + (unsigned long)1;
+ } else {
sgn = 0;
+ uval = (unsigned long) val;
+ }
do {
- *--cp = '0' + val % 10;
- val /= 10;
- } while (val);
+ *--cp = (char) ('0' + uval % 10);
+ uval /= 10;
+ } while (uval);
if (sgn)
*--cp = '-';
else if (signchar)
*--cp = signchar;
+ if (have_prec != 0) /* ignore '0' flag if */
+ fill = sp; /* precision given */
if (prec > fw)
fw = prec;
prec = cend - cp;
- if (fw > prec && !lj) {
- if (fill != sp && (*cp == '-' || signchar)) {
- bchunk(cp, 1);
- cp++;
- prec--;
- fw--;
- }
- while (fw > prec) {
- bchunk(fill, 1);
- fw--;
- }
+ if (fw > prec && ! lj && fill != sp
+ && (*cp == '-' || signchar)) {
+ bchunk_one(cp);
+ cp++;
+ prec--;
+ fw--;
}
- bchunk(cp, (int) prec);
- if (fw > prec) {
- while (fw > prec) {
- bchunk(fill, 1);
- fw--;
- }
- }
- s0 = s1;
- break;
- case 'u':
- base = 10;
- goto pr_unsigned;
- case 'o':
- base = 8;
- goto pr_unsigned;
+ goto pr_tail;
case 'X':
- ucasehex = 1;
+ chbuf = Uchbuf; /* FALL THROUGH */
case 'x':
- base = 16;
- goto pr_unsigned;
- pr_unsigned:
+ base += 6; /* FALL THROUGH */
+ case 'u':
+ base += 2; /* FALL THROUGH */
+ case 'o':
+ base += 8;
parse_next_arg();
- uval = (unsigned long) force_number(arg);
- free_temp(arg);
+ tmpval = force_number(arg);
+ if (tmpval > ULONG_MAX || tmpval < LONG_MIN) {
+ /* out of range - emergency use of %g format */
+ cs1 = 'g';
+ goto format_float;
+ }
+ uval = (unsigned long)tmpval;
+ if (have_prec != 0) /* ignore '0' flag if */
+ fill = sp; /* precision given */
do {
*--cp = chbuf[uval % base];
- if (ucasehex && isalpha(*cp))
- *cp = toupper(*cp);
uval /= base;
} while (uval);
- if (alt && (base == 8 || base == 16)) {
+ if (alt) {
if (base == 16) {
- if (ucasehex)
- *--cp = 'X';
- else
- *--cp = 'x';
- }
- *--cp = '0';
+ *--cp = cs1;
+ *--cp = '0';
+ if (fill != sp) {
+ bchunk(cp, 2);
+ cp += 2;
+ fw -= 2;
+ }
+ } else if (base == 8)
+ *--cp = '0';
}
+ base = 0;
prec = cend - cp;
- if (fw > prec && !lj) {
+ pr_tail:
+ if (! lj) {
while (fw > prec) {
- bchunk(fill, 1);
+ bchunk_one(fill);
fw--;
}
}
bchunk(cp, (int) prec);
- if (fw > prec) {
- while (fw > prec) {
- bchunk(fill, 1);
- fw--;
- }
+ while (fw > prec) {
+ bchunk_one(fill);
+ fw--;
}
s0 = s1;
+ free_temp(arg);
break;
case 'g':
+ case 'G':
+ case 'e':
+ case 'f':
+ case 'E':
parse_next_arg();
tmpval = force_number(arg);
+ format_float:
free_temp(arg);
+ if (have_prec == 0)
+ prec = DEFAULT_G_PRECISION;
chksize(fw + prec + 9); /* 9==slop */
cp = cpbuf;
*cp++ = '%';
if (lj)
*cp++ = '-';
+ if (signchar)
+ *cp++ = signchar;
+ if (alt)
+ *cp++ = '#';
if (fill != sp)
*cp++ = '0';
+ cp = strcpy(cp, "*.*") + 3;
+ *cp++ = cs1;
+ *cp = '\0';
#ifndef GFMT_WORKAROUND
- if (cur != &fw) {
- (void) strcpy(cp, "*.*g");
- (void) sprintf(obuf + olen, cpbuf, (int) fw, (int) prec, (double) tmpval);
- } else {
- (void) strcpy(cp, "*g");
- (void) sprintf(obuf + olen, cpbuf, (int) fw, (double) tmpval);
- }
+ (void) sprintf(obufout, cpbuf,
+ (int) fw, (int) prec, (double) tmpval);
#else /* GFMT_WORKAROUND */
- {
- char *gptr, gbuf[120];
-#define DEFAULT_G_PRECISION 6
- if (fw + prec + 9 > sizeof gbuf) { /* 9==slop */
- emalloc(gptr, char *, fw+prec+9, "do_sprintf(gfmt)");
- } else
- gptr = gbuf;
- (void) gfmt((double) tmpval, cur != &fw ?
- (int) prec : DEFAULT_G_PRECISION, gptr);
- *cp++ = '*', *cp++ = 's', *cp = '\0';
- (void) sprintf(obuf + olen, cpbuf, (int) fw, gptr);
- if (fill != sp && *gptr == ' ') {
- char *p = gptr;
- do { *p++ = '0'; } while (*p == ' ');
- }
- if (gptr != gbuf) free(gptr);
- }
+ if (cs1 == 'g' || cs1 == 'G')
+ sgfmt(obufout, cpbuf, (int) alt,
+ (int) fw, (int) prec, (double) tmpval);
+ else
+ (void) sprintf(obufout, cpbuf,
+ (int) fw, (int) prec, (double) tmpval);
#endif /* GFMT_WORKAROUND */
- len = strlen(obuf + olen);
- ofre -= len;
- olen += len;
- s0 = s1;
- break;
-
- case 'f':
- parse_next_arg();
- tmpval = force_number(arg);
- free_temp(arg);
- chksize(fw + prec + 9); /* 9==slop */
-
- cp = cpbuf;
- *cp++ = '%';
- if (lj)
- *cp++ = '-';
- if (fill != sp)
- *cp++ = '0';
- if (cur != &fw) {
- (void) strcpy(cp, "*.*f");
- (void) sprintf(obuf + olen, cpbuf, (int) fw, (int) prec, (double) tmpval);
- } else {
- (void) strcpy(cp, "*f");
- (void) sprintf(obuf + olen, cpbuf, (int) fw, (double) tmpval);
- }
- len = strlen(obuf + olen);
- ofre -= len;
- olen += len;
- s0 = s1;
- break;
- case 'e':
- parse_next_arg();
- tmpval = force_number(arg);
- free_temp(arg);
- chksize(fw + prec + 9); /* 9==slop */
- cp = cpbuf;
- *cp++ = '%';
- if (lj)
- *cp++ = '-';
- if (fill != sp)
- *cp++ = '0';
- if (cur != &fw) {
- (void) strcpy(cp, "*.*e");
- (void) sprintf(obuf + olen, cpbuf, (int) fw, (int) prec, (double) tmpval);
- } else {
- (void) strcpy(cp, "*e");
- (void) sprintf(obuf + olen, cpbuf, (int) fw, (double) tmpval);
- }
- len = strlen(obuf + olen);
+ len = strlen(obufout);
ofre -= len;
- olen += len;
+ obufout += len;
s0 = s1;
break;
-
default:
- lose:
break;
}
if (toofew)
fatal("%s\n\t%s\n\t%*s%s",
"not enough arguments to satisfy format string",
- sfmt->stptr, s1 - sfmt->stptr - 2, "",
+ fmt_string, s1 - fmt_string - 2, "",
"^ ran out for this one"
);
}
if (do_lint && carg != NULL)
warning("too many arguments supplied for format string");
bchunk(s0, s1 - s0);
- free_temp(sfmt);
- r = make_str_node(obuf, olen, ALREADY_MALLOCED);
+ r = make_str_node(obuf, obufout - obuf, ALREADY_MALLOCED);
r->flags |= TEMP;
return r;
}
+NODE *
+do_sprintf(tree)
+NODE *tree;
+{
+ NODE *r;
+ NODE *sfmt = force_string(tree_eval(tree->lnode));
+
+ r = format_tree(sfmt->stptr, sfmt->stlen, tree->rnode);
+ free_temp(sfmt);
+ return r;
+}
+
+
void
do_printf(tree)
register NODE *tree;
@@ -654,6 +685,7 @@ NODE *tree;
NODE *r;
register int indx;
size_t length;
+ int is_long;
t1 = tree_eval(tree->lnode);
t2 = tree_eval(tree->rnode->lnode);
@@ -669,12 +701,16 @@ NODE *tree;
t1 = force_string(t1);
if (indx < 0)
indx = 0;
- if (indx >= t1->stlen || length <= 0) {
+ if (indx >= t1->stlen || (long) length <= 0) {
free_temp(t1);
return Nnull_string;
}
- if (indx + length > t1->stlen || LONG_MAX - indx < length)
+ if ((is_long = (indx + length > t1->stlen)) || LONG_MAX - indx < length) {
length = t1->stlen - indx;
+ if (do_lint && is_long)
+ warning("substr: length %d at position %d exceeds length of first argument",
+ length, indx+1);
+ }
r = tmp_string(t1->stptr + indx, length);
free_temp(t1);
return r;
@@ -688,7 +724,6 @@ NODE *tree;
struct tm *tm;
time_t fclock;
char buf[100];
- int ret;
t1 = force_string(tree_eval(tree->lnode));
@@ -701,9 +736,7 @@ NODE *tree;
}
tm = localtime(&fclock);
- ret = strftime(buf, 100, t1->stptr, tm);
-
- return tmp_string(buf, ret);
+ return tmp_string(buf, strftime(buf, 100, t1->stptr, tm));
}
NODE *
@@ -723,18 +756,43 @@ NODE *tree;
NODE *tmp;
int ret = 0;
char *cmd;
+ char save;
- (void) flush_io (); /* so output is synchronous with gawk's */
+ (void) flush_io (); /* so output is synchronous with gawk's */
tmp = tree_eval(tree->lnode);
cmd = force_string(tmp)->stptr;
+
if (cmd && *cmd) {
+ /* insure arg to system is zero-terminated */
+
+ /*
+ * From: David Trueman <emory!cs.dal.ca!david>
+ * To: arnold@cc.gatech.edu (Arnold Robbins)
+ * Date: Wed, 3 Nov 1993 12:49:41 -0400
+ *
+ * It may not be necessary to save the character, but
+ * I'm not sure. It would normally be the field
+ * separator. If the parse has not yet gone beyond
+ * that, it could mess up (although I doubt it). If
+ * FIELDWIDTHS is being used, it might be the first
+ * character of the next field. Unless someone wants
+ * to check it out exhaustively, I suggest saving it
+ * for now...
+ */
+ save = cmd[tmp->stlen];
+ cmd[tmp->stlen] = '\0';
+
ret = system(cmd);
ret = (ret >> 8) & 0xff;
+
+ cmd[tmp->stlen] = save;
}
free_temp(tmp);
return tmp_number((AWKNUM) ret);
}
+extern NODE **fmt_list; /* declared in eval.c */
+
void
do_print(tree)
register NODE *tree;
@@ -763,10 +821,18 @@ register NODE *tree;
if (OFMTidx == CONVFMTidx)
(void) force_string(t1);
else {
+#ifndef GFMT_WORKAROUND
char buf[100];
- sprintf(buf, OFMT, t1->numbr);
+ (void) sprintf(buf, OFMT, t1->numbr);
+ free_temp(t1);
t1 = tmp_string(buf, strlen(buf));
+#else /* GFMT_WORKAROUND */
+ free_temp(t1);
+ t1 = format_tree(OFMT,
+ fmt_list[OFMTidx]->stlen,
+ tree);
+#endif /* GFMT_WORKAROUND */
}
}
efwrite(t1->stptr, sizeof(char), t1->stlen, fp, "print", rp, 0);
@@ -775,12 +841,13 @@ register NODE *tree;
if (tree) {
s = OFS;
if (OFSlen)
- efwrite(s, sizeof(char), OFSlen, fp, "print", rp, 0);
+ efwrite(s, sizeof(char), (size_t)OFSlen,
+ fp, "print", rp, 0);
}
}
s = ORS;
if (ORSlen)
- efwrite(s, sizeof(char), ORSlen, fp, "print", rp, 1);
+ efwrite(s, sizeof(char), (size_t)ORSlen, fp, "print", rp, 1);
}
NODE *
@@ -863,7 +930,7 @@ NODE *tree;
}
static int firstrand = 1;
-static char state[256];
+static char state[512];
/* ARGSUSED */
NODE *
@@ -875,7 +942,7 @@ NODE *tree;
srandom(1);
firstrand = 0;
}
- return tmp_number((AWKNUM) random() / LONG_MAX);
+ return tmp_number((AWKNUM) random() / GAWK_RANDOM_MAX);
}
NODE *
@@ -892,10 +959,10 @@ NODE *tree;
(void) setstate(state);
if (!tree)
- srandom((int) (save_seed = (long) time((time_t *) 0)));
+ srandom((unsigned int) (save_seed = (long) time((time_t *) 0)));
else {
tmp = tree_eval(tree->lnode);
- srandom((int) (save_seed = (long) force_number(tmp)));
+ srandom((unsigned int) (save_seed = (long) force_number(tmp)));
free_temp(tmp);
}
firstrand = 0;
@@ -938,15 +1005,15 @@ int global;
register char *scan;
register char *bp, *cp;
char *buf;
- int buflen;
+ size_t buflen;
register char *matchend;
- register int len;
+ register size_t len;
char *matchstart;
char *text;
- int textlen;
+ size_t textlen;
char *repl;
char *replend;
- int repllen;
+ size_t repllen;
int sofar;
int ampersands;
int matches = 0;
@@ -970,9 +1037,9 @@ int global;
/* do the search early to avoid work on non-match */
if (research(rp, t->stptr, 0, t->stlen, 1) == -1 ||
- (RESTART(rp, t->stptr) > t->stlen) && (matches = 1)) {
+ RESTART(rp, t->stptr) > t->stlen) {
free_temp(t);
- return tmp_number((AWKNUM) matches);
+ return tmp_number((AWKNUM) 0.0);
}
if (tmp->type == Node_val)
@@ -1001,13 +1068,15 @@ int global;
repl = s->stptr;
replend = repl + s->stlen;
repllen = replend - repl;
- emalloc(buf, char *, buflen, "do_sub");
+ emalloc(buf, char *, buflen + 2, "do_sub");
+ buf[buflen] = '\0';
+ buf[buflen + 1] = '\0';
ampersands = 0;
for (scan = repl; scan < replend; scan++) {
if (*scan == '&') {
repllen--;
ampersands++;
- } else if (*scan == '\\' && (*(scan+1) == '&' || *(scan+1) == '\\')) {
+ } else if (*scan == '\\' && *(scan+1) == '&') {
repllen--;
scan++;
}
@@ -1026,7 +1095,7 @@ int global;
len = matchstart - text + repllen
+ ampersands * (matchend - matchstart);
sofar = bp - buf;
- while (buflen - sofar - len - 1 < 0) {
+ while ((long)(buflen - sofar - len - 1) < 0) {
buflen *= 2;
erealloc(buf, char *, buflen, "do_sub");
bp = buf + sofar;
@@ -1037,18 +1106,20 @@ int global;
if (*scan == '&')
for (cp = matchstart; cp < matchend; cp++)
*bp++ = *cp;
- else if (*scan == '\\' && (*(scan+1) == '&' || *(scan+1) == '\\')) {
+ else if (*scan == '\\' && *(scan+1) == '&') {
scan++;
*bp++ = *scan;
} else
*bp++ = *scan;
+
+ /* catch the case of gsub(//, "blah", whatever), i.e. empty regexp */
if (global && matchstart == matchend && matchend < text + textlen) {
*bp++ = *matchend;
matchend++;
}
textlen = text + textlen - matchend;
text = matchend;
- if (!global || textlen <= 0 ||
+ if (!global || (long)textlen <= 0 ||
research(rp, t->stptr, text-t->stptr, textlen, 1) == -1)
break;
}
@@ -1060,6 +1131,7 @@ int global;
}
for (scan = matchend; scan < text + textlen; scan++)
*bp++ = *scan;
+ *bp = '\0';
textlen = bp - buf;
free(t->stptr);
t->stptr = buf;
@@ -1093,41 +1165,75 @@ NODE *tree;
}
#ifdef GFMT_WORKAROUND
- /*
- * printf's %g format [can't rely on gcvt()]
- * caveat: don't use as argument to *printf()!
- */
-char *
-gfmt(g, prec, buf)
-double g; /* value to format */
-int prec; /* indicates desired significant digits, not decimal places */
+/*
+ * printf's %g format [can't rely on gcvt()]
+ * caveat: don't use as argument to *printf()!
+ * 'format' string HAS to be of "<flags>*.*g" kind, or we bomb!
+ */
+static void
+sgfmt(buf, format, alt, fwidth, prec, g)
char *buf; /* return buffer; assumed big enough to hold result */
+const char *format;
+int alt; /* use alternate form flag */
+int fwidth; /* field width in a format */
+int prec; /* indicates desired significant digits, not decimal places */
+double g; /* value to format */
{
- if (g == 0.0) {
- (void) strcpy(buf, "0"); /* easy special case */
- } else {
- register char *d, *e, *p;
-
- /* start with 'e' format (it'll provide nice exponent) */
- if (prec < 1) prec = 1; /* at least 1 significant digit */
- (void) sprintf(buf, "%.*e", prec - 1, g);
- if ((e = strchr(buf, 'e')) != 0) { /* find exponent */
- int exp = atoi(e+1); /* fetch exponent */
- if (exp >= -4 && exp < prec) { /* per K&R2, B1.2 */
- /* switch to 'f' format and re-do */
- prec -= (exp + 1); /* decimal precision */
- (void) sprintf(buf, "%.*f", prec, g);
- e = buf + strlen(buf);
- }
- if ((d = strchr(buf, '.')) != 0) {
- /* remove trailing zeroes and decimal point */
- for (p = e; p > d && *--p == '0'; ) continue;
- if (*p == '.') --p;
- if (++p < e) /* copy exponent and NUL */
- while ((*p++ = *e++) != '\0') continue;
- }
+ char dform[40];
+ register char *gpos;
+ register char *d, *e, *p;
+ int again = 0;
+
+ strncpy(dform, format, sizeof dform - 1);
+ dform[sizeof dform - 1] = '\0';
+ gpos = strrchr(dform, '.');
+
+ if (g == 0.0 && alt == 0) { /* easy special case */
+ *gpos++ = 'd';
+ *gpos = '\0';
+ (void) sprintf(buf, dform, fwidth, 0);
+ return;
+ }
+ gpos += 2; /* advance to location of 'g' in the format */
+
+ if (prec <= 0) /* negative precision is ignored */
+ prec = (prec < 0 ? DEFAULT_G_PRECISION : 1);
+
+ if (*gpos == 'G')
+ again = 1;
+ /* start with 'e' format (it'll provide nice exponent) */
+ *gpos = 'e';
+ prec -= 1;
+ (void) sprintf(buf, dform, fwidth, prec, g);
+ if ((e = strrchr(buf, 'e')) != NULL) { /* find exponent */
+ int exp = atoi(e+1); /* fetch exponent */
+ if (exp >= -4 && exp <= prec) { /* per K&R2, B1.2 */
+ /* switch to 'f' format and re-do */
+ *gpos = 'f';
+ prec -= exp; /* decimal precision */
+ (void) sprintf(buf, dform, fwidth, prec, g);
+ e = buf + strlen(buf);
+ while (*--e == ' ')
+ continue;
+ e += 1;
+ }
+ else if (again != 0)
+ *gpos = 'E';
+
+ /* if 'alt' in force, then trailing zeros are not removed */
+ if (alt == 0 && (d = strrchr(buf, '.')) != NULL) {
+ /* throw away an excess of precision */
+ for (p = e; p > d && *--p == '0'; )
+ prec -= 1;
+ if (d == p)
+ prec -= 1;
+ if (prec < 0)
+ prec = 0;
+ /* and do that once again */
+ again = 1;
}
+ if (again != 0)
+ (void) sprintf(buf, dform, fwidth, prec, g);
}
- return buf;
}
#endif /* GFMT_WORKAROUND */
diff --git a/gnu/usr.bin/awk/config.h b/gnu/usr.bin/awk/config.h
index 8c20953ed531..0b3cca1d8299 100644
--- a/gnu/usr.bin/awk/config.h
+++ b/gnu/usr.bin/awk/config.h
@@ -5,7 +5,7 @@
*/
/*
- * Copyright (C) 1991, 1992 the Free Software Foundation, Inc.
+ * Copyright (C) 1991, 1992, 1993 the Free Software Foundation, Inc.
*
* This file is part of GAWK, the GNU implementation of the
* AWK Progamming Language.
@@ -92,6 +92,13 @@
*/
#define HAVE_UNDERSCORE_SETJMP 1
+/*
+ * LIMITS_H_MISSING
+ *
+ * You don't have a <limits.h> include file.
+ */
+/* #define LIMITS_H_MISSING 1 */
+
/***********************************************/
/* Missing library subroutines or system calls */
/***********************************************/
@@ -147,7 +154,7 @@
* Your system does not have the strtod() routine for converting
* strings to double precision floating point values.
*/
-/* #define STRTOD_MISSING 1 */
+/* #define STRTOD_MISSING 1 */
/*
* STRFTIME_MISSING
@@ -177,6 +184,15 @@
/* #define TZNAME_MISSING 1 */
/*
+ * TM_ZONE_MISSING
+ *
+ * Your "struct tm" is missing the tm_zone field.
+ * If this is the case *and* strftime() is missing *and* tzname is missing,
+ * define this.
+ */
+/* #define TM_ZONE_MISSING 1 */
+
+/*
* STDC_HEADERS
*
* If your system does have ANSI compliant header files that
@@ -269,4 +285,22 @@
*/
#define SRANDOM_PROTO 1
+/*
+ * getpgrp() in sysvr4 and POSIX takes no argument
+ */
+/* #define GETPGRP_NOARG 0 */
+
+/*
+ * define const to nothing if not __STDC__
+ */
+#ifndef __STDC__
+#define const
+#endif
+
+/* If svr4 and not gcc */
+/* #define SVR4 0 */
+#ifdef SVR4
+#define __svr4__ 1
+#endif
+
/* anything that follows is for system-specific short-term kludges */
diff --git a/gnu/usr.bin/awk/dfa.c b/gnu/usr.bin/awk/dfa.c
index 5293c755871d..47ad35e9cc2e 100644
--- a/gnu/usr.bin/awk/dfa.c
+++ b/gnu/usr.bin/awk/dfa.c
@@ -1,182 +1,146 @@
-/* dfa.c - determinisitic extended regexp routines for GNU
+/* dfa.c - deterministic extended regexp routines for GNU
Copyright (C) 1988 Free Software Foundation, Inc.
- Written June, 1988 by Mike Haertel
- Modified July, 1988 by Arthur David Olson
- to assist BMG speedups
-
- NO WARRANTY
-
- BECAUSE THIS PROGRAM IS LICENSED FREE OF CHARGE, WE PROVIDE ABSOLUTELY
-NO WARRANTY, TO THE EXTENT PERMITTED BY APPLICABLE STATE LAW. EXCEPT
-WHEN OTHERWISE STATED IN WRITING, FREE SOFTWARE FOUNDATION, INC,
-RICHARD M. STALLMAN AND/OR OTHER PARTIES PROVIDE THIS PROGRAM "AS IS"
-WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING,
-BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
-FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY
-AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE
-DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR
-CORRECTION.
-
- IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW WILL RICHARD M.
-STALLMAN, THE FREE SOFTWARE FOUNDATION, INC., AND/OR ANY OTHER PARTY
-WHO MAY MODIFY AND REDISTRIBUTE THIS PROGRAM AS PERMITTED BELOW, BE
-LIABLE TO YOU FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR
-OTHER SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
-USE OR INABILITY TO USE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR
-DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD PARTIES OR
-A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS) THIS
-PROGRAM, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH
-DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY.
-
- GENERAL PUBLIC LICENSE TO COPY
-
- 1. You may copy and distribute verbatim copies of this source file
-as you receive it, in any medium, provided that you conspicuously and
-appropriately publish on each copy a valid copyright notice "Copyright
- (C) 1988 Free Software Foundation, Inc."; and include following the
-copyright notice a verbatim copy of the above disclaimer of warranty
-and of this License. You may charge a distribution fee for the
-physical act of transferring a copy.
-
- 2. You may modify your copy or copies of this source file or
-any portion of it, and copy and distribute such modifications under
-the terms of Paragraph 1 above, provided that you also do the following:
-
- a) cause the modified files to carry prominent notices stating
- that you changed the files and the date of any change; and
-
- b) cause the whole of any work that you distribute or publish,
- that in whole or in part contains or is a derivative of this
- program or any part thereof, to be licensed at no charge to all
- third parties on terms identical to those contained in this
- License Agreement (except that you may choose to grant more extensive
- warranty protection to some or all third parties, at your option).
-
- c) You may charge a distribution fee for the physical act of
- transferring a copy, and you may at your option offer warranty
- protection in exchange for a fee.
-
-Mere aggregation of another unrelated program with this program (or its
-derivative) on a volume of a storage or distribution medium does not bring
-the other program under the scope of these terms.
-
- 3. You may copy and distribute this program or any portion of it in
-compiled, executable or object code form under the terms of Paragraphs
-1 and 2 above provided that you do the following:
-
- a) accompany it with the complete corresponding machine-readable
- source code, which must be distributed under the terms of
- Paragraphs 1 and 2 above; or,
-
- b) accompany it with a written offer, valid for at least three
- years, to give any third party free (except for a nominal
- shipping charge) a complete machine-readable copy of the
- corresponding source code, to be distributed under the terms of
- Paragraphs 1 and 2 above; or,
-
- c) accompany it with the information you received as to where the
- corresponding source code may be obtained. (This alternative is
- allowed only for noncommercial distribution and only if you
- received the program in object code or executable form alone.)
-
-For an executable file, complete source code means all the source code for
-all modules it contains; but, as a special exception, it need not include
-source code for modules which are standard libraries that accompany the
-operating system on which the executable file runs.
-
- 4. You may not copy, sublicense, distribute or transfer this program
-except as expressly provided under this License Agreement. Any attempt
-otherwise to copy, sublicense, distribute or transfer this program is void and
-your rights to use the program under this License agreement shall be
-automatically terminated. However, parties who have received computer
-software programs from you with this License Agreement will not have
-their licenses terminated so long as such parties remain in full compliance.
-
- 5. If you wish to incorporate parts of this program into other free
-programs whose distribution conditions are different, write to the Free
-Software Foundation at 675 Mass Ave, Cambridge, MA 02139. We have not yet
-worked out a simple rule that can be stated here, but we will often permit
-this. We will be guided by the two goals of preserving the free status of
-all derivatives our free software and of promoting the sharing and reuse of
-software.
-
-
-In other words, you are welcome to use, share and improve this program.
-You are forbidden to forbid anyone else to use, share and improve
-what you give them. Help stamp out software-hoarding! */
-
-#include "awk.h"
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2, or (at your option)
+ any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
+
+/* Written June, 1988 by Mike Haertel
+ Modified July, 1988 by Arthur David Olson to assist BMG speedups */
+
#include <assert.h>
+#include <ctype.h>
+#include <stdio.h>
+
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+
+#ifdef STDC_HEADERS
+#include <stdlib.h>
+#else
+#include <sys/types.h>
+extern char *calloc(), *malloc(), *realloc();
+extern void free();
+#endif
-#ifdef setbit /* surprise - setbit and clrbit are macros on NeXT */
-#undef setbit
+#if defined(HAVE_STRING_H) || defined(STDC_HEADERS)
+#include <string.h>
+#undef index
+#define index strchr
+#else
+#include <strings.h>
#endif
-#ifdef clrbit
-#undef clrbit
+
+#ifndef DEBUG /* use the same approach as regex.c */
+#undef assert
+#define assert(e)
+#endif /* DEBUG */
+
+#ifndef isgraph
+#define isgraph(C) (isprint(C) && !isspace(C))
#endif
+#ifdef isascii
+#define ISALPHA(C) (isascii(C) && isalpha(C))
+#define ISUPPER(C) (isascii(C) && isupper(C))
+#define ISLOWER(C) (isascii(C) && islower(C))
+#define ISDIGIT(C) (isascii(C) && isdigit(C))
+#define ISXDIGIT(C) (isascii(C) && isxdigit(C))
+#define ISSPACE(C) (isascii(C) && isspace(C))
+#define ISPUNCT(C) (isascii(C) && ispunct(C))
+#define ISALNUM(C) (isascii(C) && isalnum(C))
+#define ISPRINT(C) (isascii(C) && isprint(C))
+#define ISGRAPH(C) (isascii(C) && isgraph(C))
+#define ISCNTRL(C) (isascii(C) && iscntrl(C))
+#else
+#define ISALPHA(C) isalpha(C)
+#define ISUPPER(C) isupper(C)
+#define ISLOWER(C) islower(C)
+#define ISDIGIT(C) isdigit(C)
+#define ISXDIGIT(C) isxdigit(C)
+#define ISSPACE(C) isspace(C)
+#define ISPUNCT(C) ispunct(C)
+#define ISALNUM(C) isalnum(C)
+#define ISPRINT(C) isprint(C)
+#define ISGRAPH(C) isgraph(C)
+#define ISCNTRL(C) iscntrl(C)
+#endif
+
+#include "regex.h"
+#include "dfa.h"
+
#ifdef __STDC__
typedef void *ptr_t;
#else
typedef char *ptr_t;
+#ifndef const
+#define const
+#endif
#endif
-typedef struct {
- char ** in;
- char * left;
- char * right;
- char * is;
-} must;
+static void dfamust _RE_ARGS((struct dfa *dfa));
-static ptr_t xcalloc P((int n, size_t s));
-static ptr_t xmalloc P((size_t n));
-static ptr_t xrealloc P((ptr_t p, size_t n));
-static int tstbit P((int b, _charset c));
-static void setbit P((int b, _charset c));
-static void clrbit P((int b, _charset c));
-static void copyset P((const _charset src, _charset dst));
-static void zeroset P((_charset s));
-static void notset P((_charset s));
-static int equal P((const _charset s1, const _charset s2));
-static int charset_index P((const _charset s));
-static _token lex P((void));
-static void addtok P((_token t));
-static void atom P((void));
-static void closure P((void));
-static void branch P((void));
-static void regexp P((void));
-static void copy P((const _position_set *src, _position_set *dst));
-static void insert P((_position p, _position_set *s));
-static void merge P((_position_set *s1, _position_set *s2, _position_set *m));
-static void delete P((_position p, _position_set *s));
-static int state_index P((struct regexp *r, _position_set *s,
+static ptr_t xcalloc _RE_ARGS((size_t n, size_t s));
+static ptr_t xmalloc _RE_ARGS((size_t n));
+static ptr_t xrealloc _RE_ARGS((ptr_t p, size_t n));
+#ifdef DEBUG
+static void prtok _RE_ARGS((token t));
+#endif
+static int tstbit _RE_ARGS((int b, charclass c));
+static void setbit _RE_ARGS((int b, charclass c));
+static void clrbit _RE_ARGS((int b, charclass c));
+static void copyset _RE_ARGS((charclass src, charclass dst));
+static void zeroset _RE_ARGS((charclass s));
+static void notset _RE_ARGS((charclass s));
+static int equal _RE_ARGS((charclass s1, charclass s2));
+static int charclass_index _RE_ARGS((charclass s));
+static int looking_at _RE_ARGS((const char *s));
+static token lex _RE_ARGS((void));
+static void addtok _RE_ARGS((token t));
+static void atom _RE_ARGS((void));
+static int nsubtoks _RE_ARGS((int tindex));
+static void copytoks _RE_ARGS((int tindex, int ntokens));
+static void closure _RE_ARGS((void));
+static void branch _RE_ARGS((void));
+static void regexp _RE_ARGS((int toplevel));
+static void copy _RE_ARGS((position_set *src, position_set *dst));
+static void insert _RE_ARGS((position p, position_set *s));
+static void merge _RE_ARGS((position_set *s1, position_set *s2, position_set *m));
+static void delete _RE_ARGS((position p, position_set *s));
+static int state_index _RE_ARGS((struct dfa *d, position_set *s,
int newline, int letter));
-static void epsclosure P((_position_set *s, struct regexp *r));
-static void build_state P((int s, struct regexp *r));
-static void build_state_zero P((struct regexp *r));
-static char *icatalloc P((char *old, const char *new));
-static char *icpyalloc P((const char *string));
-static char *istrstr P((char *lookin, char *lookfor));
-static void ifree P((char *cp));
-static void freelist P((char **cpp));
-static char **enlist P((char **cpp, char *new, size_t len));
-static char **comsubs P((char *left, char *right));
-static char **addlists P((char **old, char **new));
-static char **inboth P((char **left, char **right));
-static void resetmust P((must *mp));
-static void regmust P((struct regexp *r));
-
-#undef P
+static void build_state _RE_ARGS((int s, struct dfa *d));
+static void build_state_zero _RE_ARGS((struct dfa *d));
+static char *icatalloc _RE_ARGS((char *old, char *new));
+static char *icpyalloc _RE_ARGS((char *string));
+static char *istrstr _RE_ARGS((char *lookin, char *lookfor));
+static void ifree _RE_ARGS((char *cp));
+static void freelist _RE_ARGS((char **cpp));
+static char **enlist _RE_ARGS((char **cpp, char *new, size_t len));
+static char **comsubs _RE_ARGS((char *left, char *right));
+static char **addlists _RE_ARGS((char **old, char **new));
+static char **inboth _RE_ARGS((char **left, char **right));
static ptr_t
xcalloc(n, s)
- int n;
+ size_t n;
size_t s;
{
ptr_t r = calloc(n, s);
- if (NULL == r)
- reg_error("Memory exhausted"); /* reg_error does not return */
+ if (!r)
+ dfaerror("Memory exhausted");
return r;
}
@@ -187,8 +151,8 @@ xmalloc(n)
ptr_t r = malloc(n);
assert(n != 0);
- if (NULL == r)
- reg_error("Memory exhausted");
+ if (!r)
+ dfaerror("Memory exhausted");
return r;
}
@@ -200,13 +164,12 @@ xrealloc(p, n)
ptr_t r = realloc(p, n);
assert(n != 0);
- if (NULL == r)
- reg_error("Memory exhausted");
+ if (!r)
+ dfaerror("Memory exhausted");
return r;
}
-#define CALLOC(p, t, n) ((p) = (t *) xcalloc((n), sizeof (t)))
-#undef MALLOC
+#define CALLOC(p, t, n) ((p) = (t *) xcalloc((size_t)(n), sizeof (t)))
#define MALLOC(p, t, n) ((p) = (t *) xmalloc((n) * sizeof (t)))
#define REALLOC(p, t, n) ((p) = (t *) xrealloc((ptr_t) (p), (n) * sizeof (t)))
@@ -218,13 +181,52 @@ xrealloc(p, n)
(nalloc) *= 2; \
REALLOC(p, t, nalloc); \
}
-
-/* Stuff pertaining to charsets. */
+
+#ifdef DEBUG
+
+static void
+prtok(t)
+ token t;
+{
+ char *s;
+
+ if (t < 0)
+ fprintf(stderr, "END");
+ else if (t < NOTCHAR)
+ fprintf(stderr, "%c", t);
+ else
+ {
+ switch (t)
+ {
+ case EMPTY: s = "EMPTY"; break;
+ case BACKREF: s = "BACKREF"; break;
+ case BEGLINE: s = "BEGLINE"; break;
+ case ENDLINE: s = "ENDLINE"; break;
+ case BEGWORD: s = "BEGWORD"; break;
+ case ENDWORD: s = "ENDWORD"; break;
+ case LIMWORD: s = "LIMWORD"; break;
+ case NOTLIMWORD: s = "NOTLIMWORD"; break;
+ case QMARK: s = "QMARK"; break;
+ case STAR: s = "STAR"; break;
+ case PLUS: s = "PLUS"; break;
+ case CAT: s = "CAT"; break;
+ case OR: s = "OR"; break;
+ case ORTOP: s = "ORTOP"; break;
+ case LPAREN: s = "LPAREN"; break;
+ case RPAREN: s = "RPAREN"; break;
+ default: s = "CSET"; break;
+ }
+ fprintf(stderr, "%s", s);
+ }
+}
+#endif /* DEBUG */
+
+/* Stuff pertaining to charclasses. */
static int
tstbit(b, c)
int b;
- _charset c;
+ charclass c;
{
return c[b / INTBITS] & 1 << b % INTBITS;
}
@@ -232,7 +234,7 @@ tstbit(b, c)
static void
setbit(b, c)
int b;
- _charset c;
+ charclass c;
{
c[b / INTBITS] |= 1 << b % INTBITS;
}
@@ -240,84 +242,84 @@ setbit(b, c)
static void
clrbit(b, c)
int b;
- _charset c;
+ charclass c;
{
c[b / INTBITS] &= ~(1 << b % INTBITS);
}
static void
copyset(src, dst)
- const _charset src;
- _charset dst;
+ charclass src;
+ charclass dst;
{
int i;
- for (i = 0; i < _CHARSET_INTS; ++i)
+ for (i = 0; i < CHARCLASS_INTS; ++i)
dst[i] = src[i];
}
static void
zeroset(s)
- _charset s;
+ charclass s;
{
int i;
- for (i = 0; i < _CHARSET_INTS; ++i)
+ for (i = 0; i < CHARCLASS_INTS; ++i)
s[i] = 0;
}
static void
notset(s)
- _charset s;
+ charclass s;
{
int i;
- for (i = 0; i < _CHARSET_INTS; ++i)
+ for (i = 0; i < CHARCLASS_INTS; ++i)
s[i] = ~s[i];
}
static int
equal(s1, s2)
- const _charset s1;
- const _charset s2;
+ charclass s1;
+ charclass s2;
{
int i;
- for (i = 0; i < _CHARSET_INTS; ++i)
+ for (i = 0; i < CHARCLASS_INTS; ++i)
if (s1[i] != s2[i])
return 0;
return 1;
}
-
-/* A pointer to the current regexp is kept here during parsing. */
-static struct regexp *reg;
-/* Find the index of charset s in reg->charsets, or allocate a new charset. */
+/* A pointer to the current dfa is kept here during parsing. */
+static struct dfa *dfa;
+
+/* Find the index of charclass s in dfa->charclasses, or allocate a new charclass. */
static int
-charset_index(s)
- const _charset s;
+charclass_index(s)
+ charclass s;
{
int i;
- for (i = 0; i < reg->cindex; ++i)
- if (equal(s, reg->charsets[i]))
+ for (i = 0; i < dfa->cindex; ++i)
+ if (equal(s, dfa->charclasses[i]))
return i;
- REALLOC_IF_NECESSARY(reg->charsets, _charset, reg->calloc, reg->cindex);
- ++reg->cindex;
- copyset(s, reg->charsets[i]);
+ REALLOC_IF_NECESSARY(dfa->charclasses, charclass, dfa->calloc, dfa->cindex);
+ ++dfa->cindex;
+ copyset(s, dfa->charclasses[i]);
return i;
}
/* Syntax bits controlling the behavior of the lexical analyzer. */
-static syntax_bits, syntax_bits_set;
+static reg_syntax_t syntax_bits, syntax_bits_set;
/* Flag for case-folding letters into sets. */
-static case_fold;
+static int case_fold;
/* Entry point to set syntax options. */
void
-regsyntax(bits, fold)
- long bits;
+dfasyntax(bits, fold)
+ reg_syntax_t bits;
int fold;
{
syntax_bits_set = 1;
@@ -325,63 +327,136 @@ regsyntax(bits, fold)
case_fold = fold;
}
-/* Lexical analyzer. */
-static const char *lexstart; /* Pointer to beginning of input string. */
-static const char *lexptr; /* Pointer to next input character. */
+/* Lexical analyzer. All the dross that deals with the obnoxious
+ GNU Regex syntax bits is located here. The poor, suffering
+ reader is referred to the GNU Regex documentation for the
+ meaning of the @#%!@#%^!@ syntax bits. */
+
+static char *lexstart; /* Pointer to beginning of input string. */
+static char *lexptr; /* Pointer to next input character. */
static lexleft; /* Number of characters remaining. */
-static caret_allowed; /* True if backward context allows ^
- (meaningful only if RE_CONTEXT_INDEP_OPS
- is turned off). */
-static closure_allowed; /* True if backward context allows closures
- (meaningful only if RE_CONTEXT_INDEP_OPS
- is turned off). */
+static token lasttok; /* Previous token returned; initially END. */
+static int laststart; /* True if we're separated from beginning or (, |
+ only by zero-width characters. */
+static int parens; /* Count of outstanding left parens. */
+static int minrep, maxrep; /* Repeat counts for {m,n}. */
/* Note that characters become unsigned here. */
#define FETCH(c, eoferr) \
{ \
if (! lexleft) \
- if (eoferr != NULL) \
- reg_error(eoferr); \
+ if (eoferr != 0) \
+ dfaerror(eoferr); \
else \
- return _END; \
+ return lasttok = END; \
(c) = (unsigned char) *lexptr++; \
--lexleft; \
}
-static _token
+#ifdef __STDC__
+#define FUNC(F, P) static int F(int c) { return P(c); }
+#else
+#define FUNC(F, P) static int F(c) int c; { return P(c); }
+#endif
+
+FUNC(is_alpha, ISALPHA)
+FUNC(is_upper, ISUPPER)
+FUNC(is_lower, ISLOWER)
+FUNC(is_digit, ISDIGIT)
+FUNC(is_xdigit, ISXDIGIT)
+FUNC(is_space, ISSPACE)
+FUNC(is_punct, ISPUNCT)
+FUNC(is_alnum, ISALNUM)
+FUNC(is_print, ISPRINT)
+FUNC(is_graph, ISGRAPH)
+FUNC(is_cntrl, ISCNTRL)
+
+/* The following list maps the names of the Posix named character classes
+ to predicate functions that determine whether a given character is in
+ the class. The leading [ has already been eaten by the lexical analyzer. */
+static struct {
+ const char *name;
+ int (*pred) _RE_ARGS((int));
+} prednames[] = {
+ { ":alpha:]", is_alpha },
+ { ":upper:]", is_upper },
+ { ":lower:]", is_lower },
+ { ":digit:]", is_digit },
+ { ":xdigit:]", is_xdigit },
+ { ":space:]", is_space },
+ { ":punct:]", is_punct },
+ { ":alnum:]", is_alnum },
+ { ":print:]", is_print },
+ { ":graph:]", is_graph },
+ { ":cntrl:]", is_cntrl },
+ { 0 }
+};
+
+static int
+looking_at(s)
+ const char *s;
+{
+ size_t len;
+
+ len = strlen(s);
+ if (lexleft < len)
+ return 0;
+ return strncmp(s, lexptr, len) == 0;
+}
+
+static token
lex()
{
- _token c, c2;
- int invert;
- _charset cset;
+ token c, c1, c2;
+ int backslash = 0, invert;
+ charclass ccl;
+ int i;
- FETCH(c, (char *) 0);
- switch (c)
+ /* Basic plan: We fetch a character. If it's a backslash,
+ we set the backslash flag and go through the loop again.
+ On the plus side, this avoids having a duplicate of the
+ main switch inside the backslash case. On the minus side,
+ it means that just about every case begins with
+ "if (backslash) ...". */
+ for (i = 0; i < 2; ++i)
{
- case '^':
- if (! (syntax_bits & RE_CONTEXT_INDEP_OPS)
- && (!caret_allowed ||
- ((syntax_bits & RE_TIGHT_VBAR) && lexptr - 1 != lexstart)))
- goto normal_char;
- caret_allowed = 0;
- return syntax_bits & RE_TIGHT_VBAR ? _ALLBEGLINE : _BEGLINE;
-
- case '$':
- if (syntax_bits & RE_CONTEXT_INDEP_OPS || !lexleft
- || (! (syntax_bits & RE_TIGHT_VBAR)
- && ((syntax_bits & RE_NO_BK_PARENS
- ? lexleft > 0 && *lexptr == ')'
- : lexleft > 1 && *lexptr == '\\' && lexptr[1] == ')')
- || (syntax_bits & RE_NO_BK_VBAR
- ? lexleft > 0 && *lexptr == '|'
- : lexleft > 1 && *lexptr == '\\' && lexptr[1] == '|'))))
- return syntax_bits & RE_TIGHT_VBAR ? _ALLENDLINE : _ENDLINE;
- goto normal_char;
-
- case '\\':
- FETCH(c, "Unfinished \\ quote");
+ FETCH(c, 0);
switch (c)
{
+ case '\\':
+ if (backslash)
+ goto normal_char;
+ if (lexleft == 0)
+ dfaerror("Unfinished \\ escape");
+ backslash = 1;
+ break;
+
+ case '^':
+ if (backslash)
+ goto normal_char;
+ if (syntax_bits & RE_CONTEXT_INDEP_ANCHORS
+ || lasttok == END
+ || lasttok == LPAREN
+ || lasttok == OR)
+ return lasttok = BEGLINE;
+ goto normal_char;
+
+ case '$':
+ if (backslash)
+ goto normal_char;
+ if (syntax_bits & RE_CONTEXT_INDEP_ANCHORS
+ || lexleft == 0
+ || (syntax_bits & RE_NO_BK_PARENS
+ ? lexleft > 0 && *lexptr == ')'
+ : lexleft > 1 && lexptr[0] == '\\' && lexptr[1] == ')')
+ || (syntax_bits & RE_NO_BK_VBAR
+ ? lexleft > 0 && *lexptr == '|'
+ : lexleft > 1 && lexptr[0] == '\\' && lexptr[1] == '|')
+ || ((syntax_bits & RE_NEWLINE_ALT)
+ && lexleft > 0 && *lexptr == '\n'))
+ return lasttok = ENDLINE;
+ goto normal_char;
+
case '1':
case '2':
case '3':
@@ -391,236 +466,315 @@ lex()
case '7':
case '8':
case '9':
- caret_allowed = 0;
- closure_allowed = 1;
- return _BACKREF;
+ if (backslash && !(syntax_bits & RE_NO_BK_REFS))
+ {
+ laststart = 0;
+ return lasttok = BACKREF;
+ }
+ goto normal_char;
case '<':
- caret_allowed = 0;
- return _BEGWORD;
+ if (syntax_bits & RE_NO_GNU_OPS)
+ goto normal_char;
+ if (backslash)
+ return lasttok = BEGWORD;
+ goto normal_char;
case '>':
- caret_allowed = 0;
- return _ENDWORD;
+ if (syntax_bits & RE_NO_GNU_OPS)
+ goto normal_char;
+ if (backslash)
+ return lasttok = ENDWORD;
+ goto normal_char;
case 'b':
- caret_allowed = 0;
- return _LIMWORD;
+ if (syntax_bits & RE_NO_GNU_OPS)
+ goto normal_char;
+ if (backslash)
+ return lasttok = LIMWORD;
+ goto normal_char;
case 'B':
- caret_allowed = 0;
- return _NOTLIMWORD;
-
- case 'w':
- case 'W':
- zeroset(cset);
- for (c2 = 0; c2 < _NOTCHAR; ++c2)
- if (ISALNUM(c2))
- setbit(c2, cset);
- if (c == 'W')
- notset(cset);
- caret_allowed = 0;
- closure_allowed = 1;
- return _SET + charset_index(cset);
+ if (syntax_bits & RE_NO_GNU_OPS)
+ goto normal_char;
+ if (backslash)
+ return lasttok = NOTLIMWORD;
+ goto normal_char;
case '?':
- if (syntax_bits & RE_BK_PLUS_QM)
- goto qmark;
- goto normal_char;
+ if (syntax_bits & RE_LIMITED_OPS)
+ goto normal_char;
+ if (backslash != ((syntax_bits & RE_BK_PLUS_QM) != 0))
+ goto normal_char;
+ if (!(syntax_bits & RE_CONTEXT_INDEP_OPS) && laststart)
+ goto normal_char;
+ return lasttok = QMARK;
+
+ case '*':
+ if (backslash)
+ goto normal_char;
+ if (!(syntax_bits & RE_CONTEXT_INDEP_OPS) && laststart)
+ goto normal_char;
+ return lasttok = STAR;
case '+':
- if (syntax_bits & RE_BK_PLUS_QM)
- goto plus;
- goto normal_char;
+ if (syntax_bits & RE_LIMITED_OPS)
+ goto normal_char;
+ if (backslash != ((syntax_bits & RE_BK_PLUS_QM) != 0))
+ goto normal_char;
+ if (!(syntax_bits & RE_CONTEXT_INDEP_OPS) && laststart)
+ goto normal_char;
+ return lasttok = PLUS;
+
+ case '{':
+ if (!(syntax_bits & RE_INTERVALS))
+ goto normal_char;
+ if (backslash != ((syntax_bits & RE_NO_BK_BRACES) == 0))
+ goto normal_char;
+ minrep = maxrep = 0;
+ /* Cases:
+ {M} - exact count
+ {M,} - minimum count, maximum is infinity
+ {,M} - 0 through M
+ {M,N} - M through N */
+ FETCH(c, "unfinished repeat count");
+ if (ISDIGIT(c))
+ {
+ minrep = c - '0';
+ for (;;)
+ {
+ FETCH(c, "unfinished repeat count");
+ if (!ISDIGIT(c))
+ break;
+ minrep = 10 * minrep + c - '0';
+ }
+ }
+ else if (c != ',')
+ dfaerror("malformed repeat count");
+ if (c == ',')
+ for (;;)
+ {
+ FETCH(c, "unfinished repeat count");
+ if (!ISDIGIT(c))
+ break;
+ maxrep = 10 * maxrep + c - '0';
+ }
+ else
+ maxrep = minrep;
+ if (!(syntax_bits & RE_NO_BK_BRACES))
+ {
+ if (c != '\\')
+ dfaerror("malformed repeat count");
+ FETCH(c, "unfinished repeat count");
+ }
+ if (c != '}')
+ dfaerror("malformed repeat count");
+ laststart = 0;
+ return lasttok = REPMN;
case '|':
- if (! (syntax_bits & RE_NO_BK_VBAR))
- goto or;
- goto normal_char;
+ if (syntax_bits & RE_LIMITED_OPS)
+ goto normal_char;
+ if (backslash != ((syntax_bits & RE_NO_BK_VBAR) == 0))
+ goto normal_char;
+ laststart = 1;
+ return lasttok = OR;
+
+ case '\n':
+ if (syntax_bits & RE_LIMITED_OPS
+ || backslash
+ || !(syntax_bits & RE_NEWLINE_ALT))
+ goto normal_char;
+ laststart = 1;
+ return lasttok = OR;
case '(':
- if (! (syntax_bits & RE_NO_BK_PARENS))
- goto lparen;
- goto normal_char;
+ if (backslash != ((syntax_bits & RE_NO_BK_PARENS) == 0))
+ goto normal_char;
+ ++parens;
+ laststart = 1;
+ return lasttok = LPAREN;
case ')':
- if (! (syntax_bits & RE_NO_BK_PARENS))
- goto rparen;
- goto normal_char;
-
- default:
- goto normal_char;
- }
+ if (backslash != ((syntax_bits & RE_NO_BK_PARENS) == 0))
+ goto normal_char;
+ if (parens == 0 && syntax_bits & RE_UNMATCHED_RIGHT_PAREN_ORD)
+ goto normal_char;
+ --parens;
+ laststart = 0;
+ return lasttok = RPAREN;
+
+ case '.':
+ if (backslash)
+ goto normal_char;
+ zeroset(ccl);
+ notset(ccl);
+ if (!(syntax_bits & RE_DOT_NEWLINE))
+ clrbit('\n', ccl);
+ if (syntax_bits & RE_DOT_NOT_NULL)
+ clrbit('\0', ccl);
+ laststart = 0;
+ return lasttok = CSET + charclass_index(ccl);
- case '?':
- if (syntax_bits & RE_BK_PLUS_QM)
- goto normal_char;
- qmark:
- if (! (syntax_bits & RE_CONTEXT_INDEP_OPS) && !closure_allowed)
- goto normal_char;
- return _QMARK;
-
- case '*':
- if (! (syntax_bits & RE_CONTEXT_INDEP_OPS) && !closure_allowed)
- goto normal_char;
- return _STAR;
-
- case '+':
- if (syntax_bits & RE_BK_PLUS_QM)
- goto normal_char;
- plus:
- if (! (syntax_bits & RE_CONTEXT_INDEP_OPS) && !closure_allowed)
- goto normal_char;
- return _PLUS;
-
- case '|':
- if (! (syntax_bits & RE_NO_BK_VBAR))
- goto normal_char;
- or:
- caret_allowed = 1;
- closure_allowed = 0;
- return _OR;
-
- case '\n':
- if (! (syntax_bits & RE_NEWLINE_OR))
- goto normal_char;
- goto or;
-
- case '(':
- if (! (syntax_bits & RE_NO_BK_PARENS))
- goto normal_char;
- lparen:
- caret_allowed = 1;
- closure_allowed = 0;
- return _LPAREN;
-
- case ')':
- if (! (syntax_bits & RE_NO_BK_PARENS))
- goto normal_char;
- rparen:
- caret_allowed = 0;
- closure_allowed = 1;
- return _RPAREN;
-
- case '.':
- zeroset(cset);
- notset(cset);
- clrbit('\n', cset);
- caret_allowed = 0;
- closure_allowed = 1;
- return _SET + charset_index(cset);
-
- case '[':
- zeroset(cset);
- FETCH(c, "Unbalanced [");
- if (c == '^')
- {
+ case 'w':
+ case 'W':
+ if (!backslash || (syntax_bits & RE_NO_GNU_OPS))
+ goto normal_char;
+ zeroset(ccl);
+ for (c2 = 0; c2 < NOTCHAR; ++c2)
+ if (ISALNUM(c2))
+ setbit(c2, ccl);
+ if (c == 'W')
+ notset(ccl);
+ laststart = 0;
+ return lasttok = CSET + charclass_index(ccl);
+
+ case '[':
+ if (backslash)
+ goto normal_char;
+ zeroset(ccl);
FETCH(c, "Unbalanced [");
- invert = 1;
- }
- else
- invert = 0;
- do
- {
- FETCH(c2, "Unbalanced [");
- if ((syntax_bits & RE_AWK_CLASS_HACK) && c == '\\')
+ if (c == '^')
{
- c = c2;
- FETCH(c2, "Unbalanced [");
+ FETCH(c, "Unbalanced [");
+ invert = 1;
}
- if (c2 == '-')
+ else
+ invert = 0;
+ do
{
- FETCH(c2, "Unbalanced [");
- if (c2 == ']' && (syntax_bits & RE_AWK_CLASS_HACK))
+ /* Nobody ever said this had to be fast. :-)
+ Note that if we're looking at some other [:...:]
+ construct, we just treat it as a bunch of ordinary
+ characters. We can do this because we assume
+ regex has checked for syntax errors before
+ dfa is ever called. */
+ if (c == '[' && (syntax_bits & RE_CHAR_CLASSES))
+ for (c1 = 0; prednames[c1].name; ++c1)
+ if (looking_at(prednames[c1].name))
+ {
+ for (c2 = 0; c2 < NOTCHAR; ++c2)
+ if ((*prednames[c1].pred)(c2))
+ setbit(c2, ccl);
+ lexptr += strlen(prednames[c1].name);
+ lexleft -= strlen(prednames[c1].name);
+ FETCH(c1, "Unbalanced [");
+ goto skip;
+ }
+ if (c == '\\' && (syntax_bits & RE_BACKSLASH_ESCAPE_IN_LISTS))
+ FETCH(c, "Unbalanced [");
+ FETCH(c1, "Unbalanced [");
+ if (c1 == '-')
{
- setbit(c, cset);
- setbit('-', cset);
- break;
- }
+ FETCH(c2, "Unbalanced [");
+ if (c2 == ']')
+ {
+ /* In the case [x-], the - is an ordinary hyphen,
+ which is left in c1, the lookahead character. */
+ --lexptr;
+ ++lexleft;
+ c2 = c;
+ }
+ else
+ {
+ if (c2 == '\\'
+ && (syntax_bits & RE_BACKSLASH_ESCAPE_IN_LISTS))
+ FETCH(c2, "Unbalanced [");
+ FETCH(c1, "Unbalanced [");
+ }
+ }
+ else
+ c2 = c;
while (c <= c2)
- setbit(c++, cset);
- FETCH(c, "Unbalanced [");
+ {
+ setbit(c, ccl);
+ if (case_fold)
+ if (ISUPPER(c))
+ setbit(tolower(c), ccl);
+ else if (ISLOWER(c))
+ setbit(toupper(c), ccl);
+ ++c;
+ }
+ skip:
+ ;
}
- else
+ while ((c = c1) != ']');
+ if (invert)
{
- setbit(c, cset);
- c = c2;
+ notset(ccl);
+ if (syntax_bits & RE_HAT_LISTS_NOT_NEWLINE)
+ clrbit('\n', ccl);
}
- }
- while (c != ']');
- if (invert)
- notset(cset);
- caret_allowed = 0;
- closure_allowed = 1;
- return _SET + charset_index(cset);
+ laststart = 0;
+ return lasttok = CSET + charclass_index(ccl);
- default:
- normal_char:
- caret_allowed = 0;
- closure_allowed = 1;
- if (case_fold && ISALPHA(c))
- {
- zeroset(cset);
- if (isupper(c))
- c = tolower(c);
- setbit(c, cset);
- setbit(toupper(c), cset);
- return _SET + charset_index(cset);
+ default:
+ normal_char:
+ laststart = 0;
+ if (case_fold && ISALPHA(c))
+ {
+ zeroset(ccl);
+ setbit(c, ccl);
+ if (isupper(c))
+ setbit(tolower(c), ccl);
+ else
+ setbit(toupper(c), ccl);
+ return lasttok = CSET + charclass_index(ccl);
+ }
+ return c;
}
- return c;
}
+
+ /* The above loop should consume at most a backslash
+ and some other character. */
+ abort();
}
-
+
/* Recursive descent parser for regular expressions. */
-static _token tok; /* Lookahead token. */
+static token tok; /* Lookahead token. */
static depth; /* Current depth of a hypothetical stack
holding deferred productions. This is
used to determine the depth that will be
required of the real stack later on in
- reganalyze(). */
+ dfaanalyze(). */
/* Add the given token to the parse tree, maintaining the depth count and
updating the maximum depth if necessary. */
static void
addtok(t)
- _token t;
+ token t;
{
- REALLOC_IF_NECESSARY(reg->tokens, _token, reg->talloc, reg->tindex);
- reg->tokens[reg->tindex++] = t;
+ REALLOC_IF_NECESSARY(dfa->tokens, token, dfa->talloc, dfa->tindex);
+ dfa->tokens[dfa->tindex++] = t;
switch (t)
{
- case _QMARK:
- case _STAR:
- case _PLUS:
+ case QMARK:
+ case STAR:
+ case PLUS:
break;
- case _CAT:
- case _OR:
+ case CAT:
+ case OR:
+ case ORTOP:
--depth;
break;
default:
- ++reg->nleaves;
- case _EMPTY:
+ ++dfa->nleaves;
+ case EMPTY:
++depth;
break;
}
- if (depth > reg->depth)
- reg->depth = depth;
+ if (depth > dfa->depth)
+ dfa->depth = depth;
}
/* The grammar understood by the parser is as follows.
- start:
- regexp
- _ALLBEGLINE regexp
- regexp _ALLENDLINE
- _ALLBEGLINE regexp _ALLENDLINE
-
regexp:
- regexp _OR branch
+ regexp OR branch
branch
branch:
@@ -628,144 +782,187 @@ addtok(t)
closure
closure:
- closure _QMARK
- closure _STAR
- closure _PLUS
+ closure QMARK
+ closure STAR
+ closure PLUS
atom
atom:
<normal character>
- _SET
- _BACKREF
- _BEGLINE
- _ENDLINE
- _BEGWORD
- _ENDWORD
- _LIMWORD
- _NOTLIMWORD
+ CSET
+ BACKREF
+ BEGLINE
+ ENDLINE
+ BEGWORD
+ ENDWORD
+ LIMWORD
+ NOTLIMWORD
<empty>
The parser builds a parse tree in postfix form in an array of tokens. */
-#ifdef __STDC__
-static void regexp(void);
-#else
-static void regexp();
-#endif
-
static void
atom()
{
- if (tok >= 0 && (tok < _NOTCHAR || tok >= _SET || tok == _BACKREF
- || tok == _BEGLINE || tok == _ENDLINE || tok == _BEGWORD
- || tok == _ENDWORD || tok == _LIMWORD || tok == _NOTLIMWORD))
+ if ((tok >= 0 && tok < NOTCHAR) || tok >= CSET || tok == BACKREF
+ || tok == BEGLINE || tok == ENDLINE || tok == BEGWORD
+ || tok == ENDWORD || tok == LIMWORD || tok == NOTLIMWORD)
{
addtok(tok);
tok = lex();
}
- else if (tok == _LPAREN)
+ else if (tok == LPAREN)
{
tok = lex();
- regexp();
- if (tok != _RPAREN)
- reg_error("Unbalanced (");
+ regexp(0);
+ if (tok != RPAREN)
+ dfaerror("Unbalanced (");
tok = lex();
}
else
- addtok(_EMPTY);
+ addtok(EMPTY);
+}
+
+/* Return the number of tokens in the given subexpression. */
+static int
+nsubtoks(tindex)
+int tindex;
+{
+ int ntoks1;
+
+ switch (dfa->tokens[tindex - 1])
+ {
+ default:
+ return 1;
+ case QMARK:
+ case STAR:
+ case PLUS:
+ return 1 + nsubtoks(tindex - 1);
+ case CAT:
+ case OR:
+ case ORTOP:
+ ntoks1 = nsubtoks(tindex - 1);
+ return 1 + ntoks1 + nsubtoks(tindex - 1 - ntoks1);
+ }
+}
+
+/* Copy the given subexpression to the top of the tree. */
+static void
+copytoks(tindex, ntokens)
+ int tindex, ntokens;
+{
+ int i;
+
+ for (i = 0; i < ntokens; ++i)
+ addtok(dfa->tokens[tindex + i]);
}
static void
closure()
{
+ int tindex, ntokens, i;
+
atom();
- while (tok == _QMARK || tok == _STAR || tok == _PLUS)
- {
- addtok(tok);
- tok = lex();
- }
+ while (tok == QMARK || tok == STAR || tok == PLUS || tok == REPMN)
+ if (tok == REPMN)
+ {
+ ntokens = nsubtoks(dfa->tindex);
+ tindex = dfa->tindex - ntokens;
+ if (maxrep == 0)
+ addtok(PLUS);
+ if (minrep == 0)
+ addtok(QMARK);
+ for (i = 1; i < minrep; ++i)
+ {
+ copytoks(tindex, ntokens);
+ addtok(CAT);
+ }
+ for (; i < maxrep; ++i)
+ {
+ copytoks(tindex, ntokens);
+ addtok(QMARK);
+ addtok(CAT);
+ }
+ tok = lex();
+ }
+ else
+ {
+ addtok(tok);
+ tok = lex();
+ }
}
static void
branch()
{
closure();
- while (tok != _RPAREN && tok != _OR && tok != _ALLENDLINE && tok >= 0)
+ while (tok != RPAREN && tok != OR && tok >= 0)
{
closure();
- addtok(_CAT);
+ addtok(CAT);
}
}
static void
-regexp()
+regexp(toplevel)
+ int toplevel;
{
branch();
- while (tok == _OR)
+ while (tok == OR)
{
tok = lex();
branch();
- addtok(_OR);
+ if (toplevel)
+ addtok(ORTOP);
+ else
+ addtok(OR);
}
}
/* Main entry point for the parser. S is a string to be parsed, len is the
- length of the string, so s can include NUL characters. R is a pointer to
- the struct regexp to parse into. */
+ length of the string, so s can include NUL characters. D is a pointer to
+ the struct dfa to parse into. */
void
-regparse(s, len, r)
- const char *s;
+dfaparse(s, len, d)
+ char *s;
size_t len;
- struct regexp *r;
+ struct dfa *d;
+
{
- reg = r;
+ dfa = d;
lexstart = lexptr = s;
lexleft = len;
- caret_allowed = 1;
- closure_allowed = 0;
+ lasttok = END;
+ laststart = 1;
+ parens = 0;
if (! syntax_bits_set)
- reg_error("No syntax specified");
+ dfaerror("No syntax specified");
tok = lex();
- depth = r->depth;
+ depth = d->depth;
- if (tok == _ALLBEGLINE)
- {
- addtok(_BEGLINE);
- tok = lex();
- regexp();
- addtok(_CAT);
- }
- else
- regexp();
+ regexp(1);
- if (tok == _ALLENDLINE)
- {
- addtok(_ENDLINE);
- addtok(_CAT);
- tok = lex();
- }
+ if (tok != END)
+ dfaerror("Unbalanced )");
- if (tok != _END)
- reg_error("Unbalanced )");
+ addtok(END - d->nregexps);
+ addtok(CAT);
- addtok(_END - r->nregexps);
- addtok(_CAT);
+ if (d->nregexps)
+ addtok(ORTOP);
- if (r->nregexps)
- addtok(_OR);
-
- ++r->nregexps;
+ ++d->nregexps;
}
-
+
/* Some primitives for operating on sets of positions. */
/* Copy one set to another; the destination must be large enough. */
static void
copy(src, dst)
- const _position_set *src;
- _position_set *dst;
+ position_set *src;
+ position_set *dst;
{
int i;
@@ -780,14 +977,14 @@ copy(src, dst)
S->elems must point to an array large enough to hold the resulting set. */
static void
insert(p, s)
- _position p;
- _position_set *s;
+ position p;
+ position_set *s;
{
int i;
- _position t1, t2;
+ position t1, t2;
for (i = 0; i < s->nelem && p.index < s->elems[i].index; ++i)
- ;
+ continue;
if (i < s->nelem && p.index == s->elems[i].index)
s->elems[i].constraint |= p.constraint;
else
@@ -807,9 +1004,9 @@ insert(p, s)
the positions of both sets were inserted into an initially empty set. */
static void
merge(s1, s2, m)
- _position_set *s1;
- _position_set *s2;
- _position_set *m;
+ position_set *s1;
+ position_set *s2;
+ position_set *m;
{
int i = 0, j = 0;
@@ -833,8 +1030,8 @@ merge(s1, s2, m)
/* Delete a position from a set. */
static void
delete(p, s)
- _position p;
- _position_set *s;
+ position p;
+ position_set *s;
{
int i;
@@ -845,19 +1042,19 @@ delete(p, s)
for (--s->nelem; i < s->nelem; ++i)
s->elems[i] = s->elems[i + 1];
}
-
+
/* Find the index of the state corresponding to the given position set with
the given preceding context, or create a new state if there is no such
state. Newline and letter tell whether we got here on a newline or
letter, respectively. */
static int
-state_index(r, s, newline, letter)
- struct regexp *r;
- _position_set *s;
+state_index(d, s, newline, letter)
+ struct dfa *d;
+ position_set *s;
int newline;
int letter;
{
- int lhash = 0;
+ int hash = 0;
int constraint;
int i, j;
@@ -865,78 +1062,80 @@ state_index(r, s, newline, letter)
letter = letter ? 1 : 0;
for (i = 0; i < s->nelem; ++i)
- lhash ^= s->elems[i].index + s->elems[i].constraint;
+ hash ^= s->elems[i].index + s->elems[i].constraint;
/* Try to find a state that exactly matches the proposed one. */
- for (i = 0; i < r->sindex; ++i)
+ for (i = 0; i < d->sindex; ++i)
{
- if (lhash != r->states[i].hash || s->nelem != r->states[i].elems.nelem
- || newline != r->states[i].newline || letter != r->states[i].letter)
+ if (hash != d->states[i].hash || s->nelem != d->states[i].elems.nelem
+ || newline != d->states[i].newline || letter != d->states[i].letter)
continue;
for (j = 0; j < s->nelem; ++j)
if (s->elems[j].constraint
- != r->states[i].elems.elems[j].constraint
- || s->elems[j].index != r->states[i].elems.elems[j].index)
+ != d->states[i].elems.elems[j].constraint
+ || s->elems[j].index != d->states[i].elems.elems[j].index)
break;
if (j == s->nelem)
return i;
}
/* We'll have to create a new state. */
- REALLOC_IF_NECESSARY(r->states, _dfa_state, r->salloc, r->sindex);
- r->states[i].hash = lhash;
- MALLOC(r->states[i].elems.elems, _position, s->nelem);
- copy(s, &r->states[i].elems);
- r->states[i].newline = newline;
- r->states[i].letter = letter;
- r->states[i].backref = 0;
- r->states[i].constraint = 0;
- r->states[i].first_end = 0;
+ REALLOC_IF_NECESSARY(d->states, dfa_state, d->salloc, d->sindex);
+ d->states[i].hash = hash;
+ MALLOC(d->states[i].elems.elems, position, s->nelem);
+ copy(s, &d->states[i].elems);
+ d->states[i].newline = newline;
+ d->states[i].letter = letter;
+ d->states[i].backref = 0;
+ d->states[i].constraint = 0;
+ d->states[i].first_end = 0;
for (j = 0; j < s->nelem; ++j)
- if (r->tokens[s->elems[j].index] < 0)
+ if (d->tokens[s->elems[j].index] < 0)
{
constraint = s->elems[j].constraint;
- if (_SUCCEEDS_IN_CONTEXT(constraint, newline, 0, letter, 0)
- || _SUCCEEDS_IN_CONTEXT(constraint, newline, 0, letter, 1)
- || _SUCCEEDS_IN_CONTEXT(constraint, newline, 1, letter, 0)
- || _SUCCEEDS_IN_CONTEXT(constraint, newline, 1, letter, 1))
- r->states[i].constraint |= constraint;
- if (! r->states[i].first_end)
- r->states[i].first_end = r->tokens[s->elems[j].index];
+ if (SUCCEEDS_IN_CONTEXT(constraint, newline, 0, letter, 0)
+ || SUCCEEDS_IN_CONTEXT(constraint, newline, 0, letter, 1)
+ || SUCCEEDS_IN_CONTEXT(constraint, newline, 1, letter, 0)
+ || SUCCEEDS_IN_CONTEXT(constraint, newline, 1, letter, 1))
+ d->states[i].constraint |= constraint;
+ if (! d->states[i].first_end)
+ d->states[i].first_end = d->tokens[s->elems[j].index];
}
- else if (r->tokens[s->elems[j].index] == _BACKREF)
+ else if (d->tokens[s->elems[j].index] == BACKREF)
{
- r->states[i].constraint = _NO_CONSTRAINT;
- r->states[i].backref = 1;
+ d->states[i].constraint = NO_CONSTRAINT;
+ d->states[i].backref = 1;
}
- ++r->sindex;
+ ++d->sindex;
return i;
}
-
+
/* Find the epsilon closure of a set of positions. If any position of the set
contains a symbol that matches the empty string in some context, replace
that position with the elements of its follow labeled with an appropriate
constraint. Repeat exhaustively until no funny positions are left.
S->elems must be large enough to hold the result. */
+static void epsclosure _RE_ARGS((position_set *s, struct dfa *d));
+
static void
-epsclosure(s, r)
- _position_set *s;
- struct regexp *r;
+epsclosure(s, d)
+ position_set *s;
+ struct dfa *d;
{
int i, j;
int *visited;
- _position p, old;
+ position p, old;
- MALLOC(visited, int, r->tindex);
- for (i = 0; i < r->tindex; ++i)
+ MALLOC(visited, int, d->tindex);
+ for (i = 0; i < d->tindex; ++i)
visited[i] = 0;
for (i = 0; i < s->nelem; ++i)
- if (r->tokens[s->elems[i].index] >= _NOTCHAR
- && r->tokens[s->elems[i].index] != _BACKREF
- && r->tokens[s->elems[i].index] < _SET)
+ if (d->tokens[s->elems[i].index] >= NOTCHAR
+ && d->tokens[s->elems[i].index] != BACKREF
+ && d->tokens[s->elems[i].index] < CSET)
{
old = s->elems[i];
p.constraint = old.constraint;
@@ -947,32 +1146,32 @@ epsclosure(s, r)
continue;
}
visited[old.index] = 1;
- switch (r->tokens[old.index])
+ switch (d->tokens[old.index])
{
- case _BEGLINE:
- p.constraint &= _BEGLINE_CONSTRAINT;
+ case BEGLINE:
+ p.constraint &= BEGLINE_CONSTRAINT;
break;
- case _ENDLINE:
- p.constraint &= _ENDLINE_CONSTRAINT;
+ case ENDLINE:
+ p.constraint &= ENDLINE_CONSTRAINT;
break;
- case _BEGWORD:
- p.constraint &= _BEGWORD_CONSTRAINT;
+ case BEGWORD:
+ p.constraint &= BEGWORD_CONSTRAINT;
break;
- case _ENDWORD:
- p.constraint &= _ENDWORD_CONSTRAINT;
+ case ENDWORD:
+ p.constraint &= ENDWORD_CONSTRAINT;
break;
- case _LIMWORD:
- p.constraint &= _ENDWORD_CONSTRAINT;
+ case LIMWORD:
+ p.constraint &= LIMWORD_CONSTRAINT;
break;
- case _NOTLIMWORD:
- p.constraint &= _NOTLIMWORD_CONSTRAINT;
+ case NOTLIMWORD:
+ p.constraint &= NOTLIMWORD_CONSTRAINT;
break;
default:
break;
}
- for (j = 0; j < r->follows[old.index].nelem; ++j)
+ for (j = 0; j < d->follows[old.index].nelem; ++j)
{
- p.index = r->follows[old.index].elems[j].index;
+ p.index = d->follows[old.index].elems[j].index;
insert(p, s);
}
/* Force rescan to start at the beginning. */
@@ -981,41 +1180,41 @@ epsclosure(s, r)
free(visited);
}
-
+
/* Perform bottom-up analysis on the parse tree, computing various functions.
Note that at this point, we're pretending constructs like \< are real
characters rather than constraints on what can follow them.
Nullable: A node is nullable if it is at the root of a regexp that can
match the empty string.
- * _EMPTY leaves are nullable.
+ * EMPTY leaves are nullable.
* No other leaf is nullable.
- * A _QMARK or _STAR node is nullable.
- * A _PLUS node is nullable if its argument is nullable.
- * A _CAT node is nullable if both its arguments are nullable.
- * An _OR node is nullable if either argument is nullable.
+ * A QMARK or STAR node is nullable.
+ * A PLUS node is nullable if its argument is nullable.
+ * A CAT node is nullable if both its arguments are nullable.
+ * An OR node is nullable if either argument is nullable.
Firstpos: The firstpos of a node is the set of positions (nonempty leaves)
that could correspond to the first character of a string matching the
regexp rooted at the given node.
- * _EMPTY leaves have empty firstpos.
+ * EMPTY leaves have empty firstpos.
* The firstpos of a nonempty leaf is that leaf itself.
- * The firstpos of a _QMARK, _STAR, or _PLUS node is the firstpos of its
+ * The firstpos of a QMARK, STAR, or PLUS node is the firstpos of its
argument.
- * The firstpos of a _CAT node is the firstpos of the left argument, union
+ * The firstpos of a CAT node is the firstpos of the left argument, union
the firstpos of the right if the left argument is nullable.
- * The firstpos of an _OR node is the union of firstpos of each argument.
+ * The firstpos of an OR node is the union of firstpos of each argument.
Lastpos: The lastpos of a node is the set of positions that could
correspond to the last character of a string matching the regexp at
the given node.
- * _EMPTY leaves have empty lastpos.
+ * EMPTY leaves have empty lastpos.
* The lastpos of a nonempty leaf is that leaf itself.
- * The lastpos of a _QMARK, _STAR, or _PLUS node is the lastpos of its
+ * The lastpos of a QMARK, STAR, or PLUS node is the lastpos of its
argument.
- * The lastpos of a _CAT node is the lastpos of its right argument, union
+ * The lastpos of a CAT node is the lastpos of its right argument, union
the lastpos of the left if the right argument is nullable.
- * The lastpos of an _OR node is the union of the lastpos of each argument.
+ * The lastpos of an OR node is the union of the lastpos of each argument.
Follow: The follow of a position is the set of positions that could
correspond to the character following a character matching the node in
@@ -1024,9 +1223,9 @@ epsclosure(s, r)
Later, if we find that a special symbol is in a follow set, we will
replace it with the elements of its follow, labeled with an appropriate
constraint.
- * Every node in the firstpos of the argument of a _STAR or _PLUS node is in
+ * Every node in the firstpos of the argument of a STAR or PLUS node is in
the follow of every node in the lastpos.
- * Every node in the firstpos of the second argument of a _CAT node is in
+ * Every node in the firstpos of the second argument of a CAT node is in
the follow of every node in the lastpos of the first argument.
Because of the postfix representation of the parse tree, the depth-first
@@ -1035,48 +1234,61 @@ epsclosure(s, r)
scheme; the number of elements in each set deeper in the stack can be
used to determine the address of a particular set's array. */
void
-reganalyze(r, searchflag)
- struct regexp *r;
+dfaanalyze(d, searchflag)
+ struct dfa *d;
int searchflag;
{
int *nullable; /* Nullable stack. */
int *nfirstpos; /* Element count stack for firstpos sets. */
- _position *firstpos; /* Array where firstpos elements are stored. */
+ position *firstpos; /* Array where firstpos elements are stored. */
int *nlastpos; /* Element count stack for lastpos sets. */
- _position *lastpos; /* Array where lastpos elements are stored. */
+ position *lastpos; /* Array where lastpos elements are stored. */
int *nalloc; /* Sizes of arrays allocated to follow sets. */
- _position_set tmp; /* Temporary set for merging sets. */
- _position_set merged; /* Result of merging sets. */
+ position_set tmp; /* Temporary set for merging sets. */
+ position_set merged; /* Result of merging sets. */
int wants_newline; /* True if some position wants newline info. */
int *o_nullable;
int *o_nfirst, *o_nlast;
- _position *o_firstpos, *o_lastpos;
+ position *o_firstpos, *o_lastpos;
int i, j;
- _position *pos;
+ position *pos;
- r->searchflag = searchflag;
+#ifdef DEBUG
+ fprintf(stderr, "dfaanalyze:\n");
+ for (i = 0; i < d->tindex; ++i)
+ {
+ fprintf(stderr, " %d:", i);
+ prtok(d->tokens[i]);
+ }
+ putc('\n', stderr);
+#endif
- MALLOC(nullable, int, r->depth);
+ d->searchflag = searchflag;
+
+ MALLOC(nullable, int, d->depth);
o_nullable = nullable;
- MALLOC(nfirstpos, int, r->depth);
+ MALLOC(nfirstpos, int, d->depth);
o_nfirst = nfirstpos;
- MALLOC(firstpos, _position, r->nleaves);
- o_firstpos = firstpos, firstpos += r->nleaves;
- MALLOC(nlastpos, int, r->depth);
+ MALLOC(firstpos, position, d->nleaves);
+ o_firstpos = firstpos, firstpos += d->nleaves;
+ MALLOC(nlastpos, int, d->depth);
o_nlast = nlastpos;
- MALLOC(lastpos, _position, r->nleaves);
- o_lastpos = lastpos, lastpos += r->nleaves;
- MALLOC(nalloc, int, r->tindex);
- for (i = 0; i < r->tindex; ++i)
+ MALLOC(lastpos, position, d->nleaves);
+ o_lastpos = lastpos, lastpos += d->nleaves;
+ MALLOC(nalloc, int, d->tindex);
+ for (i = 0; i < d->tindex; ++i)
nalloc[i] = 0;
- MALLOC(merged.elems, _position, r->nleaves);
+ MALLOC(merged.elems, position, d->nleaves);
- CALLOC(r->follows, _position_set, r->tindex);
+ CALLOC(d->follows, position_set, d->tindex);
- for (i = 0; i < r->tindex; ++i)
- switch (r->tokens[i])
+ for (i = 0; i < d->tindex; ++i)
+#ifdef DEBUG
+ { /* Nonsyntactic #ifdef goo... */
+#endif
+ switch (d->tokens[i])
{
- case _EMPTY:
+ case EMPTY:
/* The empty set is nullable. */
*nullable++ = 1;
@@ -1084,8 +1296,8 @@ reganalyze(r, searchflag)
*nfirstpos++ = *nlastpos++ = 0;
break;
- case _STAR:
- case _PLUS:
+ case STAR:
+ case PLUS:
/* Every element in the firstpos of the argument is in the follow
of every element in the lastpos. */
tmp.nelem = nfirstpos[-1];
@@ -1093,19 +1305,19 @@ reganalyze(r, searchflag)
pos = lastpos;
for (j = 0; j < nlastpos[-1]; ++j)
{
- merge(&tmp, &r->follows[pos[j].index], &merged);
- REALLOC_IF_NECESSARY(r->follows[pos[j].index].elems, _position,
+ merge(&tmp, &d->follows[pos[j].index], &merged);
+ REALLOC_IF_NECESSARY(d->follows[pos[j].index].elems, position,
nalloc[pos[j].index], merged.nelem - 1);
- copy(&merged, &r->follows[pos[j].index]);
+ copy(&merged, &d->follows[pos[j].index]);
}
- case _QMARK:
- /* A _QMARK or _STAR node is automatically nullable. */
- if (r->tokens[i] != _PLUS)
+ case QMARK:
+ /* A QMARK or STAR node is automatically nullable. */
+ if (d->tokens[i] != PLUS)
nullable[-1] = 1;
break;
- case _CAT:
+ case CAT:
/* Every element in the firstpos of the second argument is in the
follow of every element in the lastpos of the first argument. */
tmp.nelem = nfirstpos[-1];
@@ -1113,13 +1325,13 @@ reganalyze(r, searchflag)
pos = lastpos + nlastpos[-1];
for (j = 0; j < nlastpos[-2]; ++j)
{
- merge(&tmp, &r->follows[pos[j].index], &merged);
- REALLOC_IF_NECESSARY(r->follows[pos[j].index].elems, _position,
+ merge(&tmp, &d->follows[pos[j].index], &merged);
+ REALLOC_IF_NECESSARY(d->follows[pos[j].index].elems, position,
nalloc[pos[j].index], merged.nelem - 1);
- copy(&merged, &r->follows[pos[j].index]);
+ copy(&merged, &d->follows[pos[j].index]);
}
- /* The firstpos of a _CAT node is the firstpos of the first argument,
+ /* The firstpos of a CAT node is the firstpos of the first argument,
union that of the second argument if the first is nullable. */
if (nullable[-2])
nfirstpos[-2] += nfirstpos[-1];
@@ -1127,7 +1339,7 @@ reganalyze(r, searchflag)
firstpos += nfirstpos[-1];
--nfirstpos;
- /* The lastpos of a _CAT node is the lastpos of the second argument,
+ /* The lastpos of a CAT node is the lastpos of the second argument,
union that of the first argument if the second is nullable. */
if (nullable[-1])
nlastpos[-2] += nlastpos[-1];
@@ -1141,12 +1353,13 @@ reganalyze(r, searchflag)
}
--nlastpos;
- /* A _CAT node is nullable if both arguments are nullable. */
+ /* A CAT node is nullable if both arguments are nullable. */
nullable[-2] = nullable[-1] && nullable[-2];
--nullable;
break;
- case _OR:
+ case OR:
+ case ORTOP:
/* The firstpos is the union of the firstpos of each argument. */
nfirstpos[-2] += nfirstpos[-1];
--nfirstpos;
@@ -1155,7 +1368,7 @@ reganalyze(r, searchflag)
nlastpos[-2] += nlastpos[-1];
--nlastpos;
- /* An _OR node is nullable if either argument is nullable. */
+ /* An OR node is nullable if either argument is nullable. */
nullable[-2] = nullable[-1] || nullable[-2];
--nullable;
break;
@@ -1166,31 +1379,63 @@ reganalyze(r, searchflag)
an "epsilon closure" effectively makes them nullable later.
Backreferences have to get a real position so we can detect
transitions on them later. But they are nullable. */
- *nullable++ = r->tokens[i] == _BACKREF;
+ *nullable++ = d->tokens[i] == BACKREF;
/* This position is in its own firstpos and lastpos. */
*nfirstpos++ = *nlastpos++ = 1;
--firstpos, --lastpos;
firstpos->index = lastpos->index = i;
- firstpos->constraint = lastpos->constraint = _NO_CONSTRAINT;
+ firstpos->constraint = lastpos->constraint = NO_CONSTRAINT;
/* Allocate the follow set for this position. */
nalloc[i] = 1;
- MALLOC(r->follows[i].elems, _position, nalloc[i]);
+ MALLOC(d->follows[i].elems, position, nalloc[i]);
break;
}
+#ifdef DEBUG
+ /* ... balance the above nonsyntactic #ifdef goo... */
+ fprintf(stderr, "node %d:", i);
+ prtok(d->tokens[i]);
+ putc('\n', stderr);
+ fprintf(stderr, nullable[-1] ? " nullable: yes\n" : " nullable: no\n");
+ fprintf(stderr, " firstpos:");
+ for (j = nfirstpos[-1] - 1; j >= 0; --j)
+ {
+ fprintf(stderr, " %d:", firstpos[j].index);
+ prtok(d->tokens[firstpos[j].index]);
+ }
+ fprintf(stderr, "\n lastpos:");
+ for (j = nlastpos[-1] - 1; j >= 0; --j)
+ {
+ fprintf(stderr, " %d:", lastpos[j].index);
+ prtok(d->tokens[lastpos[j].index]);
+ }
+ putc('\n', stderr);
+ }
+#endif
/* For each follow set that is the follow set of a real position, replace
it with its epsilon closure. */
- for (i = 0; i < r->tindex; ++i)
- if (r->tokens[i] < _NOTCHAR || r->tokens[i] == _BACKREF
- || r->tokens[i] >= _SET)
+ for (i = 0; i < d->tindex; ++i)
+ if (d->tokens[i] < NOTCHAR || d->tokens[i] == BACKREF
+ || d->tokens[i] >= CSET)
{
- copy(&r->follows[i], &merged);
- epsclosure(&merged, r);
- if (r->follows[i].nelem < merged.nelem)
- REALLOC(r->follows[i].elems, _position, merged.nelem);
- copy(&merged, &r->follows[i]);
+#ifdef DEBUG
+ fprintf(stderr, "follows(%d:", i);
+ prtok(d->tokens[i]);
+ fprintf(stderr, "):");
+ for (j = d->follows[i].nelem - 1; j >= 0; --j)
+ {
+ fprintf(stderr, " %d:", d->follows[i].elems[j].index);
+ prtok(d->tokens[d->follows[i].elems[j].index]);
+ }
+ putc('\n', stderr);
+#endif
+ copy(&d->follows[i], &merged);
+ epsclosure(&merged, d);
+ if (d->follows[i].nelem < merged.nelem)
+ REALLOC(d->follows[i].elems, position, merged.nelem);
+ copy(&merged, &d->follows[i]);
}
/* Get the epsilon closure of the firstpos of the regexp. The result will
@@ -1198,19 +1443,19 @@ reganalyze(r, searchflag)
merged.nelem = 0;
for (i = 0; i < nfirstpos[-1]; ++i)
insert(firstpos[i], &merged);
- epsclosure(&merged, r);
+ epsclosure(&merged, d);
/* Check if any of the positions of state 0 will want newline context. */
wants_newline = 0;
for (i = 0; i < merged.nelem; ++i)
- if (_PREV_NEWLINE_DEPENDENT(merged.elems[i].constraint))
+ if (PREV_NEWLINE_DEPENDENT(merged.elems[i].constraint))
wants_newline = 1;
/* Build the initial state. */
- r->salloc = 1;
- r->sindex = 0;
- MALLOC(r->states, _dfa_state, r->salloc);
- state_index(r, &merged, wants_newline, 0);
+ d->salloc = 1;
+ d->sindex = 0;
+ MALLOC(d->states, dfa_state, d->salloc);
+ state_index(d, &merged, wants_newline, 0);
free(o_nullable);
free(o_nfirst);
@@ -1220,8 +1465,8 @@ reganalyze(r, searchflag)
free(nalloc);
free(merged.elems);
}
-
-/* Find, for each character, the transition out of state s of r, and store
+
+/* Find, for each character, the transition out of state s of d, and store
it in the appropriate slot of trans.
We divide the positions of s into groups (positions can appear in more
@@ -1252,25 +1497,25 @@ reganalyze(r, searchflag)
create a new group labeled with the characters of C and insert this
position in that group. */
void
-regstate(s, r, trans)
+dfastate(s, d, trans)
int s;
- struct regexp *r;
+ struct dfa *d;
int trans[];
{
- _position_set grps[_NOTCHAR]; /* As many as will ever be needed. */
- _charset labels[_NOTCHAR]; /* Labels corresponding to the groups. */
+ position_set grps[NOTCHAR]; /* As many as will ever be needed. */
+ charclass labels[NOTCHAR]; /* Labels corresponding to the groups. */
int ngrps = 0; /* Number of groups actually used. */
- _position pos; /* Current position being considered. */
- _charset matches; /* Set of matching characters. */
+ position pos; /* Current position being considered. */
+ charclass matches; /* Set of matching characters. */
int matchesf; /* True if matches is nonempty. */
- _charset intersect; /* Intersection with some label set. */
+ charclass intersect; /* Intersection with some label set. */
int intersectf; /* True if intersect is nonempty. */
- _charset leftovers; /* Stuff in the label that didn't match. */
+ charclass leftovers; /* Stuff in the label that didn't match. */
int leftoversf; /* True if leftovers is nonempty. */
- static _charset letters; /* Set of characters considered letters. */
- static _charset newline; /* Set of characters that aren't newline. */
- _position_set follows; /* Union of the follows of some group. */
- _position_set tmp; /* Temporary space for merging sets. */
+ static charclass letters; /* Set of characters considered letters. */
+ static charclass newline; /* Set of characters that aren't newline. */
+ position_set follows; /* Union of the follows of some group. */
+ position_set tmp; /* Temporary space for merging sets. */
int state; /* New state. */
int wants_newline; /* New state wants to know newline context. */
int state_newline; /* New state on a newline transition. */
@@ -1283,7 +1528,7 @@ regstate(s, r, trans)
if (! initialized)
{
initialized = 1;
- for (i = 0; i < _NOTCHAR; ++i)
+ for (i = 0; i < NOTCHAR; ++i)
if (ISALNUM(i))
setbit(i, letters);
setbit('\n', newline);
@@ -1291,40 +1536,40 @@ regstate(s, r, trans)
zeroset(matches);
- for (i = 0; i < r->states[s].elems.nelem; ++i)
+ for (i = 0; i < d->states[s].elems.nelem; ++i)
{
- pos = r->states[s].elems.elems[i];
- if (r->tokens[pos.index] >= 0 && r->tokens[pos.index] < _NOTCHAR)
- setbit(r->tokens[pos.index], matches);
- else if (r->tokens[pos.index] >= _SET)
- copyset(r->charsets[r->tokens[pos.index] - _SET], matches);
+ pos = d->states[s].elems.elems[i];
+ if (d->tokens[pos.index] >= 0 && d->tokens[pos.index] < NOTCHAR)
+ setbit(d->tokens[pos.index], matches);
+ else if (d->tokens[pos.index] >= CSET)
+ copyset(d->charclasses[d->tokens[pos.index] - CSET], matches);
else
continue;
- /* Some characters may need to be climinated from matches because
+ /* Some characters may need to be eliminated from matches because
they fail in the current context. */
- if (pos.constraint != 0xff)
+ if (pos.constraint != 0xFF)
{
- if (! _MATCHES_NEWLINE_CONTEXT(pos.constraint,
- r->states[s].newline, 1))
+ if (! MATCHES_NEWLINE_CONTEXT(pos.constraint,
+ d->states[s].newline, 1))
clrbit('\n', matches);
- if (! _MATCHES_NEWLINE_CONTEXT(pos.constraint,
- r->states[s].newline, 0))
- for (j = 0; j < _CHARSET_INTS; ++j)
+ if (! MATCHES_NEWLINE_CONTEXT(pos.constraint,
+ d->states[s].newline, 0))
+ for (j = 0; j < CHARCLASS_INTS; ++j)
matches[j] &= newline[j];
- if (! _MATCHES_LETTER_CONTEXT(pos.constraint,
- r->states[s].letter, 1))
- for (j = 0; j < _CHARSET_INTS; ++j)
+ if (! MATCHES_LETTER_CONTEXT(pos.constraint,
+ d->states[s].letter, 1))
+ for (j = 0; j < CHARCLASS_INTS; ++j)
matches[j] &= ~letters[j];
- if (! _MATCHES_LETTER_CONTEXT(pos.constraint,
- r->states[s].letter, 0))
- for (j = 0; j < _CHARSET_INTS; ++j)
+ if (! MATCHES_LETTER_CONTEXT(pos.constraint,
+ d->states[s].letter, 0))
+ for (j = 0; j < CHARCLASS_INTS; ++j)
matches[j] &= letters[j];
/* If there are no characters left, there's no point in going on. */
- for (j = 0; j < _CHARSET_INTS && !matches[j]; ++j)
- ;
- if (j == _CHARSET_INTS)
+ for (j = 0; j < CHARCLASS_INTS && !matches[j]; ++j)
+ continue;
+ if (j == CHARCLASS_INTS)
continue;
}
@@ -1333,27 +1578,27 @@ regstate(s, r, trans)
/* If matches contains a single character only, and the current
group's label doesn't contain that character, go on to the
next group. */
- if (r->tokens[pos.index] >= 0 && r->tokens[pos.index] < _NOTCHAR
- && !tstbit(r->tokens[pos.index], labels[j]))
+ if (d->tokens[pos.index] >= 0 && d->tokens[pos.index] < NOTCHAR
+ && !tstbit(d->tokens[pos.index], labels[j]))
continue;
/* Check if this group's label has a nonempty intersection with
matches. */
intersectf = 0;
- for (k = 0; k < _CHARSET_INTS; ++k)
- (intersect[k] = matches[k] & labels[j][k]) ? intersectf = 1 : 0;
+ for (k = 0; k < CHARCLASS_INTS; ++k)
+ (intersect[k] = matches[k] & labels[j][k]) ? (intersectf = 1) : 0;
if (! intersectf)
continue;
/* It does; now find the set differences both ways. */
leftoversf = matchesf = 0;
- for (k = 0; k < _CHARSET_INTS; ++k)
+ for (k = 0; k < CHARCLASS_INTS; ++k)
{
/* Even an optimizing compiler can't know this for sure. */
int match = matches[k], label = labels[j][k];
- (leftovers[k] = ~match & label) ? leftoversf = 1 : 0;
- (matches[k] = match & ~label) ? matchesf = 1 : 0;
+ (leftovers[k] = ~match & label) ? (leftoversf = 1) : 0;
+ (matches[k] = match & ~label) ? (matchesf = 1) : 0;
}
/* If there were leftovers, create a new group labeled with them. */
@@ -1361,7 +1606,7 @@ regstate(s, r, trans)
{
copyset(leftovers, labels[ngrps]);
copyset(intersect, labels[j]);
- MALLOC(grps[ngrps].elems, _position, r->nleaves);
+ MALLOC(grps[ngrps].elems, position, d->nleaves);
copy(&grps[j], &grps[ngrps]);
++ngrps;
}
@@ -1382,46 +1627,50 @@ regstate(s, r, trans)
{
copyset(matches, labels[ngrps]);
zeroset(matches);
- MALLOC(grps[ngrps].elems, _position, r->nleaves);
+ MALLOC(grps[ngrps].elems, position, d->nleaves);
grps[ngrps].nelem = 1;
grps[ngrps].elems[0] = pos;
++ngrps;
}
}
- MALLOC(follows.elems, _position, r->nleaves);
- MALLOC(tmp.elems, _position, r->nleaves);
+ MALLOC(follows.elems, position, d->nleaves);
+ MALLOC(tmp.elems, position, d->nleaves);
/* If we are a searching matcher, the default transition is to a state
containing the positions of state 0, otherwise the default transition
is to fail miserably. */
- if (r->searchflag)
+ if (d->searchflag)
{
wants_newline = 0;
wants_letter = 0;
- for (i = 0; i < r->states[0].elems.nelem; ++i)
+ for (i = 0; i < d->states[0].elems.nelem; ++i)
{
- if (_PREV_NEWLINE_DEPENDENT(r->states[0].elems.elems[i].constraint))
+ if (PREV_NEWLINE_DEPENDENT(d->states[0].elems.elems[i].constraint))
wants_newline = 1;
- if (_PREV_LETTER_DEPENDENT(r->states[0].elems.elems[i].constraint))
+ if (PREV_LETTER_DEPENDENT(d->states[0].elems.elems[i].constraint))
wants_letter = 1;
}
- copy(&r->states[0].elems, &follows);
- state = state_index(r, &follows, 0, 0);
+ copy(&d->states[0].elems, &follows);
+ state = state_index(d, &follows, 0, 0);
if (wants_newline)
- state_newline = state_index(r, &follows, 1, 0);
+ state_newline = state_index(d, &follows, 1, 0);
else
state_newline = state;
if (wants_letter)
- state_letter = state_index(r, &follows, 0, 1);
+ state_letter = state_index(d, &follows, 0, 1);
else
state_letter = state;
- for (i = 0; i < _NOTCHAR; ++i)
- trans[i] = (ISALNUM(i)) ? state_letter : state ;
- trans['\n'] = state_newline;
+ for (i = 0; i < NOTCHAR; ++i)
+ if (i == '\n')
+ trans[i] = state_newline;
+ else if (ISALNUM(i))
+ trans[i] = state_letter;
+ else
+ trans[i] = state;
}
else
- for (i = 0; i < _NOTCHAR; ++i)
+ for (i = 0; i < NOTCHAR; ++i)
trans[i] = -1;
for (i = 0; i < ngrps; ++i)
@@ -1431,44 +1680,44 @@ regstate(s, r, trans)
/* Find the union of the follows of the positions of the group.
This is a hideously inefficient loop. Fix it someday. */
for (j = 0; j < grps[i].nelem; ++j)
- for (k = 0; k < r->follows[grps[i].elems[j].index].nelem; ++k)
- insert(r->follows[grps[i].elems[j].index].elems[k], &follows);
+ for (k = 0; k < d->follows[grps[i].elems[j].index].nelem; ++k)
+ insert(d->follows[grps[i].elems[j].index].elems[k], &follows);
/* If we are building a searching matcher, throw in the positions
of state 0 as well. */
- if (r->searchflag)
- for (j = 0; j < r->states[0].elems.nelem; ++j)
- insert(r->states[0].elems.elems[j], &follows);
+ if (d->searchflag)
+ for (j = 0; j < d->states[0].elems.nelem; ++j)
+ insert(d->states[0].elems.elems[j], &follows);
/* Find out if the new state will want any context information. */
wants_newline = 0;
if (tstbit('\n', labels[i]))
for (j = 0; j < follows.nelem; ++j)
- if (_PREV_NEWLINE_DEPENDENT(follows.elems[j].constraint))
+ if (PREV_NEWLINE_DEPENDENT(follows.elems[j].constraint))
wants_newline = 1;
wants_letter = 0;
- for (j = 0; j < _CHARSET_INTS; ++j)
+ for (j = 0; j < CHARCLASS_INTS; ++j)
if (labels[i][j] & letters[j])
break;
- if (j < _CHARSET_INTS)
+ if (j < CHARCLASS_INTS)
for (j = 0; j < follows.nelem; ++j)
- if (_PREV_LETTER_DEPENDENT(follows.elems[j].constraint))
+ if (PREV_LETTER_DEPENDENT(follows.elems[j].constraint))
wants_letter = 1;
/* Find the state(s) corresponding to the union of the follows. */
- state = state_index(r, &follows, 0, 0);
+ state = state_index(d, &follows, 0, 0);
if (wants_newline)
- state_newline = state_index(r, &follows, 1, 0);
+ state_newline = state_index(d, &follows, 1, 0);
else
state_newline = state;
if (wants_letter)
- state_letter = state_index(r, &follows, 0, 1);
+ state_letter = state_index(d, &follows, 0, 1);
else
state_letter = state;
/* Set the transitions for each character in the current label. */
- for (j = 0; j < _CHARSET_INTS; ++j)
+ for (j = 0; j < CHARCLASS_INTS; ++j)
for (k = 0; k < INTBITS; ++k)
if (labels[i][j] & 1 << k)
{
@@ -1478,7 +1727,7 @@ regstate(s, r, trans)
trans[c] = state_newline;
else if (ISALNUM(c))
trans[c] = state_letter;
- else if (c < _NOTCHAR)
+ else if (c < NOTCHAR)
trans[c] = state;
}
}
@@ -1488,18 +1737,18 @@ regstate(s, r, trans)
free(follows.elems);
free(tmp.elems);
}
-
-/* Some routines for manipulating a compiled regexp's transition tables.
+
+/* Some routines for manipulating a compiled dfa's transition tables.
Each state may or may not have a transition table; if it does, and it
- is a non-accepting state, then r->trans[state] points to its table.
- If it is an accepting state then r->fails[state] points to its table.
- If it has no table at all, then r->trans[state] is NULL.
+ is a non-accepting state, then d->trans[state] points to its table.
+ If it is an accepting state then d->fails[state] points to its table.
+ If it has no table at all, then d->trans[state] is NULL.
TODO: Improve this comment, get rid of the unnecessary redundancy. */
static void
-build_state(s, r)
+build_state(s, d)
int s;
- struct regexp *r;
+ struct dfa *d;
{
int *trans; /* The new transition table. */
int i;
@@ -1508,87 +1757,87 @@ build_state(s, r)
exist at once. 1024 is arbitrary. The idea is that the frequently
used transition tables will be quickly rebuilt, whereas the ones that
were only needed once or twice will be cleared away. */
- if (r->trcount >= 1024)
+ if (d->trcount >= 1024)
{
- for (i = 0; i < r->tralloc; ++i)
- if (r->trans[i])
+ for (i = 0; i < d->tralloc; ++i)
+ if (d->trans[i])
{
- free((ptr_t) r->trans[i]);
- r->trans[i] = NULL;
+ free((ptr_t) d->trans[i]);
+ d->trans[i] = NULL;
}
- else if (r->fails[i])
+ else if (d->fails[i])
{
- free((ptr_t) r->fails[i]);
- r->fails[i] = NULL;
+ free((ptr_t) d->fails[i]);
+ d->fails[i] = NULL;
}
- r->trcount = 0;
+ d->trcount = 0;
}
- ++r->trcount;
+ ++d->trcount;
/* Set up the success bits for this state. */
- r->success[s] = 0;
- if (ACCEPTS_IN_CONTEXT(r->states[s].newline, 1, r->states[s].letter, 0,
- s, *r))
- r->success[s] |= 4;
- if (ACCEPTS_IN_CONTEXT(r->states[s].newline, 0, r->states[s].letter, 1,
- s, *r))
- r->success[s] |= 2;
- if (ACCEPTS_IN_CONTEXT(r->states[s].newline, 0, r->states[s].letter, 0,
- s, *r))
- r->success[s] |= 1;
-
- MALLOC(trans, int, _NOTCHAR);
- regstate(s, r, trans);
+ d->success[s] = 0;
+ if (ACCEPTS_IN_CONTEXT(d->states[s].newline, 1, d->states[s].letter, 0,
+ s, *d))
+ d->success[s] |= 4;
+ if (ACCEPTS_IN_CONTEXT(d->states[s].newline, 0, d->states[s].letter, 1,
+ s, *d))
+ d->success[s] |= 2;
+ if (ACCEPTS_IN_CONTEXT(d->states[s].newline, 0, d->states[s].letter, 0,
+ s, *d))
+ d->success[s] |= 1;
+
+ MALLOC(trans, int, NOTCHAR);
+ dfastate(s, d, trans);
/* Now go through the new transition table, and make sure that the trans
and fail arrays are allocated large enough to hold a pointer for the
largest state mentioned in the table. */
- for (i = 0; i < _NOTCHAR; ++i)
- if (trans[i] >= r->tralloc)
+ for (i = 0; i < NOTCHAR; ++i)
+ if (trans[i] >= d->tralloc)
{
- int oldalloc = r->tralloc;
-
- while (trans[i] >= r->tralloc)
- r->tralloc *= 2;
- REALLOC(r->realtrans, int *, r->tralloc + 1);
- r->trans = r->realtrans + 1;
- REALLOC(r->fails, int *, r->tralloc);
- REALLOC(r->success, int, r->tralloc);
- REALLOC(r->newlines, int, r->tralloc);
- while (oldalloc < r->tralloc)
+ int oldalloc = d->tralloc;
+
+ while (trans[i] >= d->tralloc)
+ d->tralloc *= 2;
+ REALLOC(d->realtrans, int *, d->tralloc + 1);
+ d->trans = d->realtrans + 1;
+ REALLOC(d->fails, int *, d->tralloc);
+ REALLOC(d->success, int, d->tralloc);
+ REALLOC(d->newlines, int, d->tralloc);
+ while (oldalloc < d->tralloc)
{
- r->trans[oldalloc] = NULL;
- r->fails[oldalloc++] = NULL;
+ d->trans[oldalloc] = NULL;
+ d->fails[oldalloc++] = NULL;
}
}
/* Keep the newline transition in a special place so we can use it as
a sentinel. */
- r->newlines[s] = trans['\n'];
+ d->newlines[s] = trans['\n'];
trans['\n'] = -1;
- if (ACCEPTING(s, *r))
- r->fails[s] = trans;
+ if (ACCEPTING(s, *d))
+ d->fails[s] = trans;
else
- r->trans[s] = trans;
+ d->trans[s] = trans;
}
static void
-build_state_zero(r)
- struct regexp *r;
+build_state_zero(d)
+ struct dfa *d;
{
- r->tralloc = 1;
- r->trcount = 0;
- CALLOC(r->realtrans, int *, r->tralloc + 1);
- r->trans = r->realtrans + 1;
- CALLOC(r->fails, int *, r->tralloc);
- MALLOC(r->success, int, r->tralloc);
- MALLOC(r->newlines, int, r->tralloc);
- build_state(0, r);
+ d->tralloc = 1;
+ d->trcount = 0;
+ CALLOC(d->realtrans, int *, d->tralloc + 1);
+ d->trans = d->realtrans + 1;
+ CALLOC(d->fails, int *, d->tralloc);
+ MALLOC(d->success, int, d->tralloc);
+ MALLOC(d->newlines, int, d->tralloc);
+ build_state(0, d);
}
-
-/* Search through a buffer looking for a match to the given struct regexp.
+
+/* Search through a buffer looking for a match to the given struct dfa.
Find the first occurrence of a string matching the regexp in the buffer,
and the shortest possible version thereof. Return a pointer to the first
character after the match, or NULL if none is found. Begin points to
@@ -1602,8 +1851,8 @@ build_state_zero(r)
match needs to be verified by a backtracking matcher. Otherwise
we store a 0 in *backref. */
char *
-regexecute(r, begin, end, newline, count, backref)
- struct regexp *r;
+dfaexec(d, begin, end, newline, count, backref)
+ struct dfa *d;
char *begin;
char *end;
int newline;
@@ -1612,9 +1861,9 @@ regexecute(r, begin, end, newline, count, backref)
{
register s, s1, tmp; /* Current state. */
register unsigned char *p; /* Current input character. */
- register **trans, *t; /* Copy of r->trans so it can be optimized
+ register **trans, *t; /* Copy of d->trans so it can be optimized
into a register. */
- static sbit[_NOTCHAR]; /* Table for anding with r->success. */
+ static sbit[NOTCHAR]; /* Table for anding with d->success. */
static sbit_init;
if (! sbit_init)
@@ -1622,41 +1871,54 @@ regexecute(r, begin, end, newline, count, backref)
int i;
sbit_init = 1;
- for (i = 0; i < _NOTCHAR; ++i)
- sbit[i] = (ISALNUM(i)) ? 2 : 1;
- sbit['\n'] = 4;
+ for (i = 0; i < NOTCHAR; ++i)
+ if (i == '\n')
+ sbit[i] = 4;
+ else if (ISALNUM(i))
+ sbit[i] = 2;
+ else
+ sbit[i] = 1;
}
- if (! r->tralloc)
- build_state_zero(r);
+ if (! d->tralloc)
+ build_state_zero(d);
s = s1 = 0;
p = (unsigned char *) begin;
- trans = r->trans;
+ trans = d->trans;
*end = '\n';
for (;;)
{
- while ((t = trans[s]) != 0) { /* hand-optimized loop */
- s1 = t[*p++];
- if ((t = trans[s1]) == 0) {
- tmp = s ; s = s1 ; s1 = tmp ; /* swap */
- break;
- }
- s = t[*p++];
- }
+ /* The dreaded inner loop. */
+ if ((t = trans[s]) != 0)
+ do
+ {
+ s1 = t[*p++];
+ if (! (t = trans[s1]))
+ goto last_was_s;
+ s = t[*p++];
+ }
+ while ((t = trans[s]) != 0);
+ goto last_was_s1;
+ last_was_s:
+ tmp = s, s = s1, s1 = tmp;
+ last_was_s1:
- if (s >= 0 && p <= (unsigned char *) end && r->fails[s])
+ if (s >= 0 && p <= (unsigned char *) end && d->fails[s])
{
- if (r->success[s] & sbit[*p])
+ if (d->success[s] & sbit[*p])
{
if (backref)
- *backref = (r->states[s].backref != 0);
+ if (d->states[s].backref)
+ *backref = 1;
+ else
+ *backref = 0;
return (char *) p;
}
s1 = s;
- s = r->fails[s][*p++];
+ s = d->fails[s][*p++];
continue;
}
@@ -1665,627 +1927,662 @@ regexecute(r, begin, end, newline, count, backref)
++*count;
/* Check if we've run off the end of the buffer. */
- if ((char *) p >= end)
+ if ((char *) p > end)
return NULL;
if (s >= 0)
{
- build_state(s, r);
- trans = r->trans;
+ build_state(s, d);
+ trans = d->trans;
continue;
}
if (p[-1] == '\n' && newline)
{
- s = r->newlines[s1];
+ s = d->newlines[s1];
continue;
}
s = 0;
}
}
-
-/* Initialize the components of a regexp that the other routines don't
+
+/* Initialize the components of a dfa that the other routines don't
initialize for themselves. */
void
-reginit(r)
- struct regexp *r;
+dfainit(d)
+ struct dfa *d;
{
- r->calloc = 1;
- MALLOC(r->charsets, _charset, r->calloc);
- r->cindex = 0;
+ d->calloc = 1;
+ MALLOC(d->charclasses, charclass, d->calloc);
+ d->cindex = 0;
+
+ d->talloc = 1;
+ MALLOC(d->tokens, token, d->talloc);
+ d->tindex = d->depth = d->nleaves = d->nregexps = 0;
- r->talloc = 1;
- MALLOC(r->tokens, _token, r->talloc);
- r->tindex = r->depth = r->nleaves = r->nregexps = 0;
+ d->searchflag = 0;
+ d->tralloc = 0;
- r->searchflag = 0;
- r->tralloc = 0;
+ d->musts = 0;
}
/* Parse and analyze a single string of the given length. */
void
-regcompile(s, len, r, searchflag)
- const char *s;
+dfacomp(s, len, d, searchflag)
+ char *s;
size_t len;
- struct regexp *r;
+ struct dfa *d;
int searchflag;
{
- if (case_fold) /* dummy folding in service of regmust() */
+ if (case_fold) /* dummy folding in service of dfamust() */
{
- char *regcopy;
+ char *lcopy;
int i;
- regcopy = malloc(len);
- if (!regcopy)
- reg_error("out of memory");
+ lcopy = malloc(len);
+ if (!lcopy)
+ dfaerror("out of memory");
- /* This is a complete kludge and could potentially break
- \<letter> escapes . . . */
+ /* This is a kludge. */
case_fold = 0;
for (i = 0; i < len; ++i)
if (ISUPPER(s[i]))
- regcopy[i] = tolower(s[i]);
+ lcopy[i] = tolower(s[i]);
else
- regcopy[i] = s[i];
-
- reginit(r);
- r->mustn = 0;
- r->must[0] = '\0';
- regparse(regcopy, len, r);
- free(regcopy);
- regmust(r);
- reganalyze(r, searchflag);
+ lcopy[i] = s[i];
+
+ dfainit(d);
+ dfaparse(lcopy, len, d);
+ free(lcopy);
+ dfamust(d);
+ d->cindex = d->tindex = d->depth = d->nleaves = d->nregexps = 0;
case_fold = 1;
- reginit(r);
- regparse(s, len, r);
- reganalyze(r, searchflag);
+ dfaparse(s, len, d);
+ dfaanalyze(d, searchflag);
}
else
{
- reginit(r);
- regparse(s, len, r);
- regmust(r);
- reganalyze(r, searchflag);
+ dfainit(d);
+ dfaparse(s, len, d);
+ dfamust(d);
+ dfaanalyze(d, searchflag);
}
}
-/* Free the storage held by the components of a regexp. */
+/* Free the storage held by the components of a dfa. */
void
-reg_free(r)
- struct regexp *r;
+dfafree(d)
+ struct dfa *d;
{
int i;
-
- free((ptr_t) r->charsets);
- free((ptr_t) r->tokens);
- for (i = 0; i < r->sindex; ++i)
- free((ptr_t) r->states[i].elems.elems);
- free((ptr_t) r->states);
- for (i = 0; i < r->tindex; ++i)
- if (r->follows[i].elems)
- free((ptr_t) r->follows[i].elems);
- free((ptr_t) r->follows);
- for (i = 0; i < r->tralloc; ++i)
- if (r->trans[i])
- free((ptr_t) r->trans[i]);
- else if (r->fails[i])
- free((ptr_t) r->fails[i]);
- if (r->realtrans)
- free((ptr_t) r->realtrans);
- if (r->fails)
- free((ptr_t) r->fails);
- if (r->newlines)
- free((ptr_t) r->newlines);
+ struct dfamust *dm, *ndm;
+
+ free((ptr_t) d->charclasses);
+ free((ptr_t) d->tokens);
+ for (i = 0; i < d->sindex; ++i)
+ free((ptr_t) d->states[i].elems.elems);
+ free((ptr_t) d->states);
+ for (i = 0; i < d->tindex; ++i)
+ if (d->follows[i].elems)
+ free((ptr_t) d->follows[i].elems);
+ free((ptr_t) d->follows);
+ for (i = 0; i < d->tralloc; ++i)
+ if (d->trans[i])
+ free((ptr_t) d->trans[i]);
+ else if (d->fails[i])
+ free((ptr_t) d->fails[i]);
+ if (d->realtrans) free((ptr_t) d->realtrans);
+ if (d->fails) free((ptr_t) d->fails);
+ if (d->newlines) free((ptr_t) d->newlines);
+ for (dm = d->musts; dm; dm = ndm)
+ {
+ ndm = dm->next;
+ free(dm->must);
+ free((ptr_t) dm);
+ }
}
-/*
-Having found the postfix representation of the regular expression,
-try to find a long sequence of characters that must appear in any line
-containing the r.e.
-Finding a "longest" sequence is beyond the scope here;
-we take an easy way out and hope for the best.
-(Take "(ab|a)b"--please.)
-
-We do a bottom-up calculation of sequences of characters that must appear
-in matches of r.e.'s represented by trees rooted at the nodes of the postfix
-representation:
+/* Having found the postfix representation of the regular expression,
+ try to find a long sequence of characters that must appear in any line
+ containing the r.e.
+ Finding a "longest" sequence is beyond the scope here;
+ we take an easy way out and hope for the best.
+ (Take "(ab|a)b"--please.)
+
+ We do a bottom-up calculation of sequences of characters that must appear
+ in matches of r.e.'s represented by trees rooted at the nodes of the postfix
+ representation:
sequences that must appear at the left of the match ("left")
sequences that must appear at the right of the match ("right")
lists of sequences that must appear somewhere in the match ("in")
sequences that must constitute the match ("is")
-When we get to the root of the tree, we use one of the longest of its
-calculated "in" sequences as our answer. The sequence we find is returned in
-r->must (where "r" is the single argument passed to "regmust");
-the length of the sequence is returned in r->mustn.
-
-The sequences calculated for the various types of node (in pseudo ANSI c)
-are shown below. "p" is the operand of unary operators (and the left-hand
-operand of binary operators); "q" is the right-hand operand of binary operators
-.
-"ZERO" means "a zero-length sequence" below.
-
-Type left right is in
----- ---- ----- -- --
-char c # c # c # c # c
-
-SET ZERO ZERO ZERO ZERO
-
-STAR ZERO ZERO ZERO ZERO
-
-QMARK ZERO ZERO ZERO ZERO
-
-PLUS p->left p->right ZERO p->in
-
-CAT (p->is==ZERO)? (q->is==ZERO)? (p->is!=ZERO && p->in plus
- p->left : q->right : q->is!=ZERO) ? q->in plus
- p->is##q->left p->right##q->is p->is##q->is : p->right##q->left
- ZERO
-
-OR longest common longest common (do p->is and substrings common to
- leading trailing q->is have same p->in and q->in
- (sub)sequence (sub)sequence length and
- of p->left of p->right content) ?
- and q->left and q->right p->is : NULL
-
-If there's anything else we recognize in the tree, all four sequences get set
-to zero-length sequences. If there's something we don't recognize in the tree,
-we just return a zero-length sequence.
-
-Break ties in favor of infrequent letters (choosing 'zzz' in preference to
-'aaa')?
-And. . .is it here or someplace that we might ponder "optimizations" such as
+ When we get to the root of the tree, we use one of the longest of its
+ calculated "in" sequences as our answer. The sequence we find is returned in
+ d->must (where "d" is the single argument passed to "dfamust");
+ the length of the sequence is returned in d->mustn.
+
+ The sequences calculated for the various types of node (in pseudo ANSI c)
+ are shown below. "p" is the operand of unary operators (and the left-hand
+ operand of binary operators); "q" is the right-hand operand of binary
+ operators.
+
+ "ZERO" means "a zero-length sequence" below.
+
+ Type left right is in
+ ---- ---- ----- -- --
+ char c # c # c # c # c
+
+ CSET ZERO ZERO ZERO ZERO
+
+ STAR ZERO ZERO ZERO ZERO
+
+ QMARK ZERO ZERO ZERO ZERO
+
+ PLUS p->left p->right ZERO p->in
+
+ CAT (p->is==ZERO)? (q->is==ZERO)? (p->is!=ZERO && p->in plus
+ p->left : q->right : q->is!=ZERO) ? q->in plus
+ p->is##q->left p->right##q->is p->is##q->is : p->right##q->left
+ ZERO
+
+ OR longest common longest common (do p->is and substrings common to
+ leading trailing q->is have same p->in and q->in
+ (sub)sequence (sub)sequence length and
+ of p->left of p->right content) ?
+ and q->left and q->right p->is : NULL
+
+ If there's anything else we recognize in the tree, all four sequences get set
+ to zero-length sequences. If there's something we don't recognize in the tree,
+ we just return a zero-length sequence.
+
+ Break ties in favor of infrequent letters (choosing 'zzz' in preference to
+ 'aaa')?
+
+ And. . .is it here or someplace that we might ponder "optimizations" such as
egrep 'psi|epsilon' -> egrep 'psi'
egrep 'pepsi|epsilon' -> egrep 'epsi'
(Yes, we now find "epsi" as a "string
that must occur", but we might also
- simplify the *entire* r.e. being sought
-)
+ simplify the *entire* r.e. being sought)
grep '[c]' -> grep 'c'
grep '(ab|a)b' -> grep 'ab'
grep 'ab*' -> grep 'a'
grep 'a*b' -> grep 'b'
-There are several issues:
- Is optimization easy (enough)?
- Does optimization actually accomplish anything,
- or is the automaton you get from "psi|epsilon" (for example)
- the same as the one you get from "psi" (for example)?
+ There are several issues:
- Are optimizable r.e.'s likely to be used in real-life situations
- (something like 'ab*' is probably unlikely; something like is
- 'psi|epsilon' is likelier)?
-*/
+ Is optimization easy (enough)?
+
+ Does optimization actually accomplish anything,
+ or is the automaton you get from "psi|epsilon" (for example)
+ the same as the one you get from "psi" (for example)?
+
+ Are optimizable r.e.'s likely to be used in real-life situations
+ (something like 'ab*' is probably unlikely; something like is
+ 'psi|epsilon' is likelier)? */
static char *
icatalloc(old, new)
-char * old;
-const char * new;
+ char *old;
+ char *new;
{
- register char * result;
- register int oldsize, newsize;
-
- newsize = (new == NULL) ? 0 : strlen(new);
- if (old == NULL)
- oldsize = 0;
- else if (newsize == 0)
- return old;
- else oldsize = strlen(old);
- if (old == NULL)
- result = (char *) malloc(newsize + 1);
- else result = (char *) realloc((void *) old, oldsize + newsize + 1);
- if (result != NULL && new != NULL)
- (void) strcpy(result + oldsize, new);
- return result;
+ char *result;
+ size_t oldsize, newsize;
+
+ newsize = (new == NULL) ? 0 : strlen(new);
+ if (old == NULL)
+ oldsize = 0;
+ else if (newsize == 0)
+ return old;
+ else oldsize = strlen(old);
+ if (old == NULL)
+ result = (char *) malloc(newsize + 1);
+ else
+ result = (char *) realloc((void *) old, oldsize + newsize + 1);
+ if (result != NULL && new != NULL)
+ (void) strcpy(result + oldsize, new);
+ return result;
}
static char *
icpyalloc(string)
-const char * string;
+ char *string;
{
- return icatalloc((char *) NULL, string);
+ return icatalloc((char *) NULL, string);
}
static char *
istrstr(lookin, lookfor)
-char * lookin;
-register char * lookfor;
+ char *lookin;
+ char *lookfor;
{
- register char * cp;
- register int len;
-
- len = strlen(lookfor);
- for (cp = lookin; *cp != '\0'; ++cp)
- if (strncmp(cp, lookfor, len) == 0)
- return cp;
- return NULL;
+ char *cp;
+ size_t len;
+
+ len = strlen(lookfor);
+ for (cp = lookin; *cp != '\0'; ++cp)
+ if (strncmp(cp, lookfor, len) == 0)
+ return cp;
+ return NULL;
}
static void
ifree(cp)
-char * cp;
+ char *cp;
{
- if (cp != NULL)
- free(cp);
+ if (cp != NULL)
+ free(cp);
}
static void
freelist(cpp)
-register char ** cpp;
+ char **cpp;
{
- register int i;
+ int i;
- if (cpp == NULL)
- return;
- for (i = 0; cpp[i] != NULL; ++i) {
- free(cpp[i]);
- cpp[i] = NULL;
- }
+ if (cpp == NULL)
+ return;
+ for (i = 0; cpp[i] != NULL; ++i)
+ {
+ free(cpp[i]);
+ cpp[i] = NULL;
+ }
}
static char **
enlist(cpp, new, len)
-register char ** cpp;
-register char * new;
-#ifdef __STDC__
-size_t len;
-#else
-int len;
-#endif
+ char **cpp;
+ char *new;
+ size_t len;
{
- register int i, j;
+ int i, j;
- if (cpp == NULL)
- return NULL;
- if ((new = icpyalloc(new)) == NULL) {
- freelist(cpp);
- return NULL;
- }
- new[len] = '\0';
- /*
- ** Is there already something in the list that's new (or longer)?
- */
- for (i = 0; cpp[i] != NULL; ++i)
- if (istrstr(cpp[i], new) != NULL) {
- free(new);
- return cpp;
- }
- /*
- ** Eliminate any obsoleted strings.
- */
- j = 0;
- while (cpp[j] != NULL)
- if (istrstr(new, cpp[j]) == NULL)
- ++j;
- else {
- free(cpp[j]);
- if (--i == j)
- break;
- cpp[j] = cpp[i];
- }
- /*
- ** Add the new string.
- */
- cpp = (char **) realloc((char *) cpp, (i + 2) * sizeof *cpp);
- if (cpp == NULL)
- return NULL;
- cpp[i] = new;
- cpp[i + 1] = NULL;
+ if (cpp == NULL)
+ return NULL;
+ if ((new = icpyalloc(new)) == NULL)
+ {
+ freelist(cpp);
+ return NULL;
+ }
+ new[len] = '\0';
+ /* Is there already something in the list that's new (or longer)? */
+ for (i = 0; cpp[i] != NULL; ++i)
+ if (istrstr(cpp[i], new) != NULL)
+ {
+ free(new);
return cpp;
+ }
+ /* Eliminate any obsoleted strings. */
+ j = 0;
+ while (cpp[j] != NULL)
+ if (istrstr(new, cpp[j]) == NULL)
+ ++j;
+ else
+ {
+ free(cpp[j]);
+ if (--i == j)
+ break;
+ cpp[j] = cpp[i];
+ cpp[i] = NULL;
+ }
+ /* Add the new string. */
+ cpp = (char **) realloc((char *) cpp, (i + 2) * sizeof *cpp);
+ if (cpp == NULL)
+ return NULL;
+ cpp[i] = new;
+ cpp[i + 1] = NULL;
+ return cpp;
}
-/*
-** Given pointers to two strings,
-** return a pointer to an allocated list of their distinct common substrings.
-** Return NULL if something seems wild.
-*/
-
+/* Given pointers to two strings, return a pointer to an allocated
+ list of their distinct common substrings. Return NULL if something
+ seems wild. */
static char **
comsubs(left, right)
-char * left;
-char * right;
+ char *left;
+ char *right;
{
- register char ** cpp;
- register char * lcp;
- register char * rcp;
- register int i, len;
-
- if (left == NULL || right == NULL)
- return NULL;
- cpp = (char **) malloc(sizeof *cpp);
- if (cpp == NULL)
- return NULL;
- cpp[0] = NULL;
- for (lcp = left; *lcp != '\0'; ++lcp) {
- len = 0;
- rcp = strchr(right, *lcp);
- while (rcp != NULL) {
- for (i = 1; lcp[i] != '\0' && lcp[i] == rcp[i]; ++i)
- ;
- if (i > len)
- len = i;
- rcp = strchr(rcp + 1, *lcp);
- }
- if (len == 0)
- continue;
-#ifdef __STDC__
- if ((cpp = enlist(cpp, lcp, (size_t)len)) == NULL)
-#else
- if ((cpp = enlist(cpp, lcp, len)) == NULL)
-#endif
- break;
+ char **cpp;
+ char *lcp;
+ char *rcp;
+ size_t i, len;
+
+ if (left == NULL || right == NULL)
+ return NULL;
+ cpp = (char **) malloc(sizeof *cpp);
+ if (cpp == NULL)
+ return NULL;
+ cpp[0] = NULL;
+ for (lcp = left; *lcp != '\0'; ++lcp)
+ {
+ len = 0;
+ rcp = index(right, *lcp);
+ while (rcp != NULL)
+ {
+ for (i = 1; lcp[i] != '\0' && lcp[i] == rcp[i]; ++i)
+ continue;
+ if (i > len)
+ len = i;
+ rcp = index(rcp + 1, *lcp);
}
- return cpp;
+ if (len == 0)
+ continue;
+ if ((cpp = enlist(cpp, lcp, len)) == NULL)
+ break;
+ }
+ return cpp;
}
static char **
addlists(old, new)
-char ** old;
-char ** new;
+char **old;
+char **new;
{
- register int i;
-
- if (old == NULL || new == NULL)
- return NULL;
- for (i = 0; new[i] != NULL; ++i) {
- old = enlist(old, new[i], strlen(new[i]));
- if (old == NULL)
- break;
- }
- return old;
-}
+ int i;
-/*
-** Given two lists of substrings,
-** return a new list giving substrings common to both.
-*/
+ if (old == NULL || new == NULL)
+ return NULL;
+ for (i = 0; new[i] != NULL; ++i)
+ {
+ old = enlist(old, new[i], strlen(new[i]));
+ if (old == NULL)
+ break;
+ }
+ return old;
+}
+/* Given two lists of substrings, return a new list giving substrings
+ common to both. */
static char **
inboth(left, right)
-char ** left;
-char ** right;
+ char **left;
+ char **right;
{
- register char ** both;
- register char ** temp;
- register int lnum, rnum;
-
- if (left == NULL || right == NULL)
- return NULL;
- both = (char **) malloc(sizeof *both);
- if (both == NULL)
- return NULL;
- both[0] = NULL;
- for (lnum = 0; left[lnum] != NULL; ++lnum) {
- for (rnum = 0; right[rnum] != NULL; ++rnum) {
- temp = comsubs(left[lnum], right[rnum]);
- if (temp == NULL) {
- freelist(both);
- return NULL;
- }
- both = addlists(both, temp);
- freelist(temp);
- if (both == NULL)
- return NULL;
- }
+ char **both;
+ char **temp;
+ int lnum, rnum;
+
+ if (left == NULL || right == NULL)
+ return NULL;
+ both = (char **) malloc(sizeof *both);
+ if (both == NULL)
+ return NULL;
+ both[0] = NULL;
+ for (lnum = 0; left[lnum] != NULL; ++lnum)
+ {
+ for (rnum = 0; right[rnum] != NULL; ++rnum)
+ {
+ temp = comsubs(left[lnum], right[rnum]);
+ if (temp == NULL)
+ {
+ freelist(both);
+ return NULL;
+ }
+ both = addlists(both, temp);
+ freelist(temp);
+ if (both == NULL)
+ return NULL;
}
- return both;
+ }
+ return both;
}
-/*
-typedef struct {
- char ** in;
- char * left;
- char * right;
- char * is;
+typedef struct
+{
+ char **in;
+ char *left;
+ char *right;
+ char *is;
} must;
- */
+
static void
resetmust(mp)
-register must * mp;
+must *mp;
{
- mp->left[0] = mp->right[0] = mp->is[0] = '\0';
- freelist(mp->in);
+ mp->left[0] = mp->right[0] = mp->is[0] = '\0';
+ freelist(mp->in);
}
static void
-regmust(r)
-register struct regexp * r;
+dfamust(dfa)
+struct dfa *dfa;
{
- register must * musts;
- register must * mp;
- register char * result = "";
- register int ri;
- register int i;
- register _token t;
- static must must0;
-
- reg->mustn = 0;
- reg->must[0] = '\0';
- musts = (must *) malloc((reg->tindex + 1) * sizeof *musts);
- if (musts == NULL)
- return;
- mp = musts;
- for (i = 0; i <= reg->tindex; ++i)
- mp[i] = must0;
- for (i = 0; i <= reg->tindex; ++i) {
- mp[i].in = (char **) malloc(sizeof *mp[i].in);
- mp[i].left = malloc(2);
- mp[i].right = malloc(2);
- mp[i].is = malloc(2);
- if (mp[i].in == NULL || mp[i].left == NULL ||
- mp[i].right == NULL || mp[i].is == NULL)
- goto done;
- mp[i].left[0] = mp[i].right[0] = mp[i].is[0] = '\0';
- mp[i].in[0] = NULL;
- }
- for (ri = 0; ri < reg->tindex; ++ri) {
- switch (t = reg->tokens[ri]) {
- case _ALLBEGLINE:
- case _ALLENDLINE:
- case _LPAREN:
- case _RPAREN:
- goto done; /* "cannot happen" */
- case _EMPTY:
- case _BEGLINE:
- case _ENDLINE:
- case _BEGWORD:
- case _ENDWORD:
- case _LIMWORD:
- case _NOTLIMWORD:
- case _BACKREF:
- resetmust(mp);
- break;
- case _STAR:
- case _QMARK:
- if (mp <= musts)
- goto done; /* "cannot happen" */
- --mp;
- resetmust(mp);
- break;
- case _OR:
- if (mp < &musts[2])
- goto done; /* "cannot happen" */
- {
- register char ** new;
- register must * lmp;
- register must * rmp;
- register int j, ln, rn, n;
-
- rmp = --mp;
- lmp = --mp;
- /* Guaranteed to be. Unlikely, but. . . */
- if (strcmp(lmp->is, rmp->is) != 0)
- lmp->is[0] = '\0';
- /* Left side--easy */
- i = 0;
- while (lmp->left[i] != '\0' &&
- lmp->left[i] == rmp->left[i])
- ++i;
- lmp->left[i] = '\0';
- /* Right side */
- ln = strlen(lmp->right);
- rn = strlen(rmp->right);
- n = ln;
- if (n > rn)
- n = rn;
- for (i = 0; i < n; ++i)
- if (lmp->right[ln - i - 1] !=
- rmp->right[rn - i - 1])
- break;
- for (j = 0; j < i; ++j)
- lmp->right[j] =
- lmp->right[(ln - i) + j];
- lmp->right[j] = '\0';
- new = inboth(lmp->in, rmp->in);
- if (new == NULL)
- goto done;
- freelist(lmp->in);
- free((char *) lmp->in);
- lmp->in = new;
- }
- break;
- case _PLUS:
- if (mp <= musts)
- goto done; /* "cannot happen" */
- --mp;
- mp->is[0] = '\0';
- break;
- case _END:
- if (mp != &musts[1])
- goto done; /* "cannot happen" */
- for (i = 0; musts[0].in[i] != NULL; ++i)
- if (strlen(musts[0].in[i]) > strlen(result))
- result = musts[0].in[i];
- goto done;
- case _CAT:
- if (mp < &musts[2])
- goto done; /* "cannot happen" */
- {
- register must * lmp;
- register must * rmp;
-
- rmp = --mp;
- lmp = --mp;
- /*
- ** In. Everything in left, plus everything in
- ** right, plus catenation of
- ** left's right and right's left.
- */
- lmp->in = addlists(lmp->in, rmp->in);
- if (lmp->in == NULL)
- goto done;
- if (lmp->right[0] != '\0' &&
- rmp->left[0] != '\0') {
- register char * tp;
-
- tp = icpyalloc(lmp->right);
- if (tp == NULL)
- goto done;
- tp = icatalloc(tp, rmp->left);
- if (tp == NULL)
- goto done;
- lmp->in = enlist(lmp->in, tp,
- strlen(tp));
- free(tp);
- if (lmp->in == NULL)
- goto done;
- }
- /* Left-hand */
- if (lmp->is[0] != '\0') {
- lmp->left = icatalloc(lmp->left,
- rmp->left);
- if (lmp->left == NULL)
- goto done;
- }
- /* Right-hand */
- if (rmp->is[0] == '\0')
- lmp->right[0] = '\0';
- lmp->right = icatalloc(lmp->right, rmp->right);
- if (lmp->right == NULL)
- goto done;
- /* Guaranteed to be */
- if (lmp->is[0] != '\0' && rmp->is[0] != '\0') {
- lmp->is = icatalloc(lmp->is, rmp->is);
- if (lmp->is == NULL)
- goto done;
- }
- }
- break;
- default:
- if (t < _END) {
- /* "cannot happen" */
- goto done;
- } else if (t == '\0') {
- /* not on *my* shift */
- goto done;
- } else if (t >= _SET) {
- /* easy enough */
- resetmust(mp);
- } else {
- /* plain character */
- resetmust(mp);
- mp->is[0] = mp->left[0] = mp->right[0] = t;
- mp->is[1] = mp->left[1] = mp->right[1] = '\0';
- mp->in = enlist(mp->in, mp->is, 1);
- if (mp->in == NULL)
- goto done;
- }
- break;
- }
- ++mp;
- }
-done:
- (void) strncpy(reg->must, result, MUST_MAX - 1);
- reg->must[MUST_MAX - 1] = '\0';
- reg->mustn = strlen(reg->must);
- mp = musts;
- for (i = 0; i <= reg->tindex; ++i) {
- freelist(mp[i].in);
- ifree((char *) mp[i].in);
- ifree(mp[i].left);
- ifree(mp[i].right);
- ifree(mp[i].is);
+ must *musts;
+ must *mp;
+ char *result;
+ int ri;
+ int i;
+ int exact;
+ token t;
+ static must must0;
+ struct dfamust *dm;
+ static char empty_string[] = "";
+
+ result = empty_string;
+ exact = 0;
+ musts = (must *) malloc((dfa->tindex + 1) * sizeof *musts);
+ if (musts == NULL)
+ return;
+ mp = musts;
+ for (i = 0; i <= dfa->tindex; ++i)
+ mp[i] = must0;
+ for (i = 0; i <= dfa->tindex; ++i)
+ {
+ mp[i].in = (char **) malloc(sizeof *mp[i].in);
+ mp[i].left = malloc(2);
+ mp[i].right = malloc(2);
+ mp[i].is = malloc(2);
+ if (mp[i].in == NULL || mp[i].left == NULL ||
+ mp[i].right == NULL || mp[i].is == NULL)
+ goto done;
+ mp[i].left[0] = mp[i].right[0] = mp[i].is[0] = '\0';
+ mp[i].in[0] = NULL;
+ }
+#ifdef DEBUG
+ fprintf(stderr, "dfamust:\n");
+ for (i = 0; i < dfa->tindex; ++i)
+ {
+ fprintf(stderr, " %d:", i);
+ prtok(dfa->tokens[i]);
+ }
+ putc('\n', stderr);
+#endif
+ for (ri = 0; ri < dfa->tindex; ++ri)
+ {
+ switch (t = dfa->tokens[ri])
+ {
+ case LPAREN:
+ case RPAREN:
+ goto done; /* "cannot happen" */
+ case EMPTY:
+ case BEGLINE:
+ case ENDLINE:
+ case BEGWORD:
+ case ENDWORD:
+ case LIMWORD:
+ case NOTLIMWORD:
+ case BACKREF:
+ resetmust(mp);
+ break;
+ case STAR:
+ case QMARK:
+ if (mp <= musts)
+ goto done; /* "cannot happen" */
+ --mp;
+ resetmust(mp);
+ break;
+ case OR:
+ case ORTOP:
+ if (mp < &musts[2])
+ goto done; /* "cannot happen" */
+ {
+ char **new;
+ must *lmp;
+ must *rmp;
+ int j, ln, rn, n;
+
+ rmp = --mp;
+ lmp = --mp;
+ /* Guaranteed to be. Unlikely, but. . . */
+ if (strcmp(lmp->is, rmp->is) != 0)
+ lmp->is[0] = '\0';
+ /* Left side--easy */
+ i = 0;
+ while (lmp->left[i] != '\0' && lmp->left[i] == rmp->left[i])
+ ++i;
+ lmp->left[i] = '\0';
+ /* Right side */
+ ln = strlen(lmp->right);
+ rn = strlen(rmp->right);
+ n = ln;
+ if (n > rn)
+ n = rn;
+ for (i = 0; i < n; ++i)
+ if (lmp->right[ln - i - 1] != rmp->right[rn - i - 1])
+ break;
+ for (j = 0; j < i; ++j)
+ lmp->right[j] = lmp->right[(ln - i) + j];
+ lmp->right[j] = '\0';
+ new = inboth(lmp->in, rmp->in);
+ if (new == NULL)
+ goto done;
+ freelist(lmp->in);
+ free((char *) lmp->in);
+ lmp->in = new;
+ }
+ break;
+ case PLUS:
+ if (mp <= musts)
+ goto done; /* "cannot happen" */
+ --mp;
+ mp->is[0] = '\0';
+ break;
+ case END:
+ if (mp != &musts[1])
+ goto done; /* "cannot happen" */
+ for (i = 0; musts[0].in[i] != NULL; ++i)
+ if (strlen(musts[0].in[i]) > strlen(result))
+ result = musts[0].in[i];
+ if (strcmp(result, musts[0].is) == 0)
+ exact = 1;
+ goto done;
+ case CAT:
+ if (mp < &musts[2])
+ goto done; /* "cannot happen" */
+ {
+ must *lmp;
+ must *rmp;
+
+ rmp = --mp;
+ lmp = --mp;
+ /* In. Everything in left, plus everything in
+ right, plus catenation of
+ left's right and right's left. */
+ lmp->in = addlists(lmp->in, rmp->in);
+ if (lmp->in == NULL)
+ goto done;
+ if (lmp->right[0] != '\0' &&
+ rmp->left[0] != '\0')
+ {
+ char *tp;
+
+ tp = icpyalloc(lmp->right);
+ if (tp == NULL)
+ goto done;
+ tp = icatalloc(tp, rmp->left);
+ if (tp == NULL)
+ goto done;
+ lmp->in = enlist(lmp->in, tp,
+ strlen(tp));
+ free(tp);
+ if (lmp->in == NULL)
+ goto done;
+ }
+ /* Left-hand */
+ if (lmp->is[0] != '\0')
+ {
+ lmp->left = icatalloc(lmp->left,
+ rmp->left);
+ if (lmp->left == NULL)
+ goto done;
+ }
+ /* Right-hand */
+ if (rmp->is[0] == '\0')
+ lmp->right[0] = '\0';
+ lmp->right = icatalloc(lmp->right, rmp->right);
+ if (lmp->right == NULL)
+ goto done;
+ /* Guaranteed to be */
+ if (lmp->is[0] != '\0' && rmp->is[0] != '\0')
+ {
+ lmp->is = icatalloc(lmp->is, rmp->is);
+ if (lmp->is == NULL)
+ goto done;
+ }
+ else
+ lmp->is[0] = '\0';
+ }
+ break;
+ default:
+ if (t < END)
+ {
+ /* "cannot happen" */
+ goto done;
+ }
+ else if (t == '\0')
+ {
+ /* not on *my* shift */
+ goto done;
+ }
+ else if (t >= CSET)
+ {
+ /* easy enough */
+ resetmust(mp);
+ }
+ else
+ {
+ /* plain character */
+ resetmust(mp);
+ mp->is[0] = mp->left[0] = mp->right[0] = t;
+ mp->is[1] = mp->left[1] = mp->right[1] = '\0';
+ mp->in = enlist(mp->in, mp->is, (size_t)1);
+ if (mp->in == NULL)
+ goto done;
+ }
+ break;
}
- free((char *) mp);
+#ifdef DEBUG
+ fprintf(stderr, " node: %d:", ri);
+ prtok(dfa->tokens[ri]);
+ fprintf(stderr, "\n in:");
+ for (i = 0; mp->in[i]; ++i)
+ fprintf(stderr, " \"%s\"", mp->in[i]);
+ fprintf(stderr, "\n is: \"%s\"\n", mp->is);
+ fprintf(stderr, " left: \"%s\"\n", mp->left);
+ fprintf(stderr, " right: \"%s\"\n", mp->right);
+#endif
+ ++mp;
+ }
+ done:
+ if (strlen(result))
+ {
+ dm = (struct dfamust *) malloc(sizeof (struct dfamust));
+ dm->exact = exact;
+ dm->must = malloc(strlen(result) + 1);
+ strcpy(dm->must, result);
+ dm->next = dfa->musts;
+ dfa->musts = dm;
+ }
+ mp = musts;
+ for (i = 0; i <= dfa->tindex; ++i)
+ {
+ freelist(mp[i].in);
+ ifree((char *) mp[i].in);
+ ifree(mp[i].left);
+ ifree(mp[i].right);
+ ifree(mp[i].is);
+ }
+ free((char *) mp);
}
diff --git a/gnu/usr.bin/awk/dfa.h b/gnu/usr.bin/awk/dfa.h
index 65fc49565a7c..cc27d7a6a964 100644
--- a/gnu/usr.bin/awk/dfa.h
+++ b/gnu/usr.bin/awk/dfa.h
@@ -1,318 +1,130 @@
/* dfa.h - declarations for GNU deterministic regexp compiler
Copyright (C) 1988 Free Software Foundation, Inc.
- Written June, 1988 by Mike Haertel
-
- NO WARRANTY
-
- BECAUSE THIS PROGRAM IS LICENSED FREE OF CHARGE, WE PROVIDE ABSOLUTELY
-NO WARRANTY, TO THE EXTENT PERMITTED BY APPLICABLE STATE LAW. EXCEPT
-WHEN OTHERWISE STATED IN WRITING, FREE SOFTWARE FOUNDATION, INC,
-RICHARD M. STALLMAN AND/OR OTHER PARTIES PROVIDE THIS PROGRAM "AS IS"
-WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING,
-BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
-FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY
-AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE
-DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR
-CORRECTION.
-
- IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW WILL RICHARD M.
-STALLMAN, THE FREE SOFTWARE FOUNDATION, INC., AND/OR ANY OTHER PARTY
-WHO MAY MODIFY AND REDISTRIBUTE THIS PROGRAM AS PERMITTED BELOW, BE
-LIABLE TO YOU FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR
-OTHER SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
-USE OR INABILITY TO USE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR
-DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD PARTIES OR
-A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS) THIS
-PROGRAM, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH
-DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY.
-
- GENERAL PUBLIC LICENSE TO COPY
-
- 1. You may copy and distribute verbatim copies of this source file
-as you receive it, in any medium, provided that you conspicuously and
-appropriately publish on each copy a valid copyright notice "Copyright
- (C) 1988 Free Software Foundation, Inc."; and include following the
-copyright notice a verbatim copy of the above disclaimer of warranty
-and of this License. You may charge a distribution fee for the
-physical act of transferring a copy.
-
- 2. You may modify your copy or copies of this source file or
-any portion of it, and copy and distribute such modifications under
-the terms of Paragraph 1 above, provided that you also do the following:
-
- a) cause the modified files to carry prominent notices stating
- that you changed the files and the date of any change; and
-
- b) cause the whole of any work that you distribute or publish,
- that in whole or in part contains or is a derivative of this
- program or any part thereof, to be licensed at no charge to all
- third parties on terms identical to those contained in this
- License Agreement (except that you may choose to grant more extensive
- warranty protection to some or all third parties, at your option).
-
- c) You may charge a distribution fee for the physical act of
- transferring a copy, and you may at your option offer warranty
- protection in exchange for a fee.
-
-Mere aggregation of another unrelated program with this program (or its
-derivative) on a volume of a storage or distribution medium does not bring
-the other program under the scope of these terms.
-
- 3. You may copy and distribute this program or any portion of it in
-compiled, executable or object code form under the terms of Paragraphs
-1 and 2 above provided that you do the following:
-
- a) accompany it with the complete corresponding machine-readable
- source code, which must be distributed under the terms of
- Paragraphs 1 and 2 above; or,
-
- b) accompany it with a written offer, valid for at least three
- years, to give any third party free (except for a nominal
- shipping charge) a complete machine-readable copy of the
- corresponding source code, to be distributed under the terms of
- Paragraphs 1 and 2 above; or,
-
- c) accompany it with the information you received as to where the
- corresponding source code may be obtained. (This alternative is
- allowed only for noncommercial distribution and only if you
- received the program in object code or executable form alone.)
-
-For an executable file, complete source code means all the source code for
-all modules it contains; but, as a special exception, it need not include
-source code for modules which are standard libraries that accompany the
-operating system on which the executable file runs.
-
- 4. You may not copy, sublicense, distribute or transfer this program
-except as expressly provided under this License Agreement. Any attempt
-otherwise to copy, sublicense, distribute or transfer this program is void and
-your rights to use the program under this License agreement shall be
-automatically terminated. However, parties who have received computer
-software programs from you with this License Agreement will not have
-their licenses terminated so long as such parties remain in full compliance.
-
- 5. If you wish to incorporate parts of this program into other free
-programs whose distribution conditions are different, write to the Free
-Software Foundation at 675 Mass Ave, Cambridge, MA 02139. We have not yet
-worked out a simple rule that can be stated here, but we will often permit
-this. We will be guided by the two goals of preserving the free status of
-all derivatives our free software and of promoting the sharing and reuse of
-software.
-
-
-In other words, you are welcome to use, share and improve this program.
-You are forbidden to forbid anyone else to use, share and improve
-what you give them. Help stamp out software-hoarding! */
-
-#ifdef __STDC__
-#ifdef SOMEDAY
-#define ISALNUM(c) isalnum(c)
-#define ISALPHA(c) isalpha(c)
-#define ISUPPER(c) isupper(c)
-#else
-#define ISALNUM(c) (isascii(c) && isalnum(c))
-#define ISALPHA(c) (isascii(c) && isalpha(c))
-#define ISUPPER(c) (isascii(c) && isupper(c))
-#endif
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2, or (at your option)
+ any later version.
-#else /* ! __STDC__ */
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
-#define const
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
-#define ISALNUM(c) (isascii(c) && isalnum(c))
-#define ISALPHA(c) (isascii(c) && isalpha(c))
-#define ISUPPER(c) (isascii(c) && isupper(c))
+/* Written June, 1988 by Mike Haertel */
-#endif /* ! __STDC__ */
-
-/* 1 means plain parentheses serve as grouping, and backslash
- parentheses are needed for literal searching.
- 0 means backslash-parentheses are grouping, and plain parentheses
- are for literal searching. */
-#define RE_NO_BK_PARENS 1L
-
-/* 1 means plain | serves as the "or"-operator, and \| is a literal.
- 0 means \| serves as the "or"-operator, and | is a literal. */
-#define RE_NO_BK_VBAR (1L << 1)
-
-/* 0 means plain + or ? serves as an operator, and \+, \? are literals.
- 1 means \+, \? are operators and plain +, ? are literals. */
-#define RE_BK_PLUS_QM (1L << 2)
-
-/* 1 means | binds tighter than ^ or $.
- 0 means the contrary. */
-#define RE_TIGHT_VBAR (1L << 3)
-
-/* 1 means treat \n as an _OR operator
- 0 means treat it as a normal character */
-#define RE_NEWLINE_OR (1L << 4)
-
-/* 0 means that a special characters (such as *, ^, and $) always have
- their special meaning regardless of the surrounding context.
- 1 means that special characters may act as normal characters in some
- contexts. Specifically, this applies to:
- ^ - only special at the beginning, or after ( or |
- $ - only special at the end, or before ) or |
- *, +, ? - only special when not after the beginning, (, or | */
-#define RE_CONTEXT_INDEP_OPS (1L << 5)
-
-/* 1 means that \ in a character class escapes the next character (typically
- a hyphen. It also is overloaded to mean that hyphen at the end of the range
- is allowable and means that the hyphen is to be taken literally. */
-#define RE_AWK_CLASS_HACK (1L << 6)
-
-/* Now define combinations of bits for the standard possibilities. */
-#ifdef notdef
-#define RE_SYNTAX_AWK (RE_NO_BK_PARENS | RE_NO_BK_VBAR | RE_CONTEXT_INDEP_OPS)
-#define RE_SYNTAX_EGREP (RE_SYNTAX_AWK | RE_NEWLINE_OR)
-#define RE_SYNTAX_GREP (RE_BK_PLUS_QM | RE_NEWLINE_OR)
-#define RE_SYNTAX_EMACS 0
-#endif
-
-/* The NULL pointer. */
-#ifndef NULL
-#define NULL 0
-#endif
+/* FIXME:
+ 2. We should not export so much of the DFA internals.
+ In addition to clobbering modularity, we eat up valuable
+ name space. */
/* Number of bits in an unsigned char. */
-#ifndef CHARBITS
#define CHARBITS 8
-#endif
/* First integer value that is greater than any character code. */
-#define _NOTCHAR (1 << CHARBITS)
+#define NOTCHAR (1 << CHARBITS)
/* INTBITS need not be exact, just a lower bound. */
-#ifndef INTBITS
#define INTBITS (CHARBITS * sizeof (int))
-#endif
/* Number of ints required to hold a bit for every character. */
-#define _CHARSET_INTS ((_NOTCHAR + INTBITS - 1) / INTBITS)
+#define CHARCLASS_INTS ((NOTCHAR + INTBITS - 1) / INTBITS)
/* Sets of unsigned characters are stored as bit vectors in arrays of ints. */
-typedef int _charset[_CHARSET_INTS];
+typedef int charclass[CHARCLASS_INTS];
/* The regexp is parsed into an array of tokens in postfix form. Some tokens
are operators and others are terminal symbols. Most (but not all) of these
codes are returned by the lexical analyzer. */
-#ifdef __STDC__
typedef enum
{
- _END = -1, /* _END is a terminal symbol that matches the
- end of input; any value of _END or less in
+ END = -1, /* END is a terminal symbol that matches the
+ end of input; any value of END or less in
the parse tree is such a symbol. Accepting
states of the DFA are those that would have
- a transition on _END. */
+ a transition on END. */
/* Ordinary character values are terminal symbols that match themselves. */
- _EMPTY = _NOTCHAR, /* _EMPTY is a terminal symbol that matches
+ EMPTY = NOTCHAR, /* EMPTY is a terminal symbol that matches
the empty string. */
- _BACKREF, /* _BACKREF is generated by \<digit>; it
+ BACKREF, /* BACKREF is generated by \<digit>; it
it not completely handled. If the scanner
detects a transition on backref, it returns
a kind of "semi-success" indicating that
the match will have to be verified with
a backtracking matcher. */
- _BEGLINE, /* _BEGLINE is a terminal symbol that matches
+ BEGLINE, /* BEGLINE is a terminal symbol that matches
the empty string if it is at the beginning
of a line. */
- _ALLBEGLINE, /* _ALLBEGLINE is a terminal symbol that
- matches the empty string if it is at the
- beginning of a line; _ALLBEGLINE applies
- to the entire regexp and can only occur
- as the first token thereof. _ALLBEGLINE
- never appears in the parse tree; a _BEGLINE
- is prepended with _CAT to the entire
- regexp instead. */
-
- _ENDLINE, /* _ENDLINE is a terminal symbol that matches
+ ENDLINE, /* ENDLINE is a terminal symbol that matches
the empty string if it is at the end of
a line. */
- _ALLENDLINE, /* _ALLENDLINE is to _ENDLINE as _ALLBEGLINE
- is to _BEGLINE. */
-
- _BEGWORD, /* _BEGWORD is a terminal symbol that matches
+ BEGWORD, /* BEGWORD is a terminal symbol that matches
the empty string if it is at the beginning
of a word. */
- _ENDWORD, /* _ENDWORD is a terminal symbol that matches
+ ENDWORD, /* ENDWORD is a terminal symbol that matches
the empty string if it is at the end of
a word. */
- _LIMWORD, /* _LIMWORD is a terminal symbol that matches
+ LIMWORD, /* LIMWORD is a terminal symbol that matches
the empty string if it is at the beginning
or the end of a word. */
- _NOTLIMWORD, /* _NOTLIMWORD is a terminal symbol that
+ NOTLIMWORD, /* NOTLIMWORD is a terminal symbol that
matches the empty string if it is not at
the beginning or end of a word. */
- _QMARK, /* _QMARK is an operator of one argument that
+ QMARK, /* QMARK is an operator of one argument that
matches zero or one occurences of its
argument. */
- _STAR, /* _STAR is an operator of one argument that
+ STAR, /* STAR is an operator of one argument that
matches the Kleene closure (zero or more
occurrences) of its argument. */
- _PLUS, /* _PLUS is an operator of one argument that
+ PLUS, /* PLUS is an operator of one argument that
matches the positive closure (one or more
occurrences) of its argument. */
- _CAT, /* _CAT is an operator of two arguments that
+ REPMN, /* REPMN is a lexical token corresponding
+ to the {m,n} construct. REPMN never
+ appears in the compiled token vector. */
+
+ CAT, /* CAT is an operator of two arguments that
matches the concatenation of its
- arguments. _CAT is never returned by the
+ arguments. CAT is never returned by the
lexical analyzer. */
- _OR, /* _OR is an operator of two arguments that
+ OR, /* OR is an operator of two arguments that
matches either of its arguments. */
- _LPAREN, /* _LPAREN never appears in the parse tree,
+ ORTOP, /* OR at the toplevel in the parse tree.
+ This is used for a boyer-moore heuristic. */
+
+ LPAREN, /* LPAREN never appears in the parse tree,
it is only a lexeme. */
- _RPAREN, /* _RPAREN never appears in the parse tree. */
+ RPAREN, /* RPAREN never appears in the parse tree. */
- _SET /* _SET and (and any value greater) is a
+ CSET /* CSET and (and any value greater) is a
terminal symbol that matches any of a
class of characters. */
-} _token;
+} token;
-#else /* ! __STDC__ */
-
-typedef short _token;
-
-#define _END -1
-#define _EMPTY _NOTCHAR
-#define _BACKREF (_EMPTY + 1)
-#define _BEGLINE (_EMPTY + 2)
-#define _ALLBEGLINE (_EMPTY + 3)
-#define _ENDLINE (_EMPTY + 4)
-#define _ALLENDLINE (_EMPTY + 5)
-#define _BEGWORD (_EMPTY + 6)
-#define _ENDWORD (_EMPTY + 7)
-#define _LIMWORD (_EMPTY + 8)
-#define _NOTLIMWORD (_EMPTY + 9)
-#define _QMARK (_EMPTY + 10)
-#define _STAR (_EMPTY + 11)
-#define _PLUS (_EMPTY + 12)
-#define _CAT (_EMPTY + 13)
-#define _OR (_EMPTY + 14)
-#define _LPAREN (_EMPTY + 15)
-#define _RPAREN (_EMPTY + 16)
-#define _SET (_EMPTY + 17)
-
-#endif /* ! __STDC__ */
-
-/* Sets are stored in an array in the compiled regexp; the index of the
- array corresponding to a given set token is given by _SET_INDEX(t). */
-#define _SET_INDEX(t) ((t) - _SET)
+/* Sets are stored in an array in the compiled dfa; the index of the
+ array corresponding to a given set token is given by SET_INDEX(t). */
+#define SET_INDEX(t) ((t) - CSET)
/* Sometimes characters can only be matched depending on the surrounding
context. Such context decisions depend on what the previous character
@@ -332,36 +144,36 @@ typedef short _token;
Word-constituent characters are those that satisfy isalnum().
- The macro _SUCCEEDS_IN_CONTEXT determines whether a a given constraint
+ The macro SUCCEEDS_IN_CONTEXT determines whether a a given constraint
succeeds in a particular context. Prevn is true if the previous character
was a newline, currn is true if the lookahead character is a newline.
Prevl and currl similarly depend upon whether the previous and current
characters are word-constituent letters. */
-#define _MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
- ((constraint) & (1 << (((prevn) ? 2 : 0) + ((currn) ? 1 : 0) + 4)))
-#define _MATCHES_LETTER_CONTEXT(constraint, prevl, currl) \
- ((constraint) & (1 << (((prevl) ? 2 : 0) + ((currl) ? 1 : 0))))
-#define _SUCCEEDS_IN_CONTEXT(constraint, prevn, currn, prevl, currl) \
- (_MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
- && _MATCHES_LETTER_CONTEXT(constraint, prevl, currl))
+#define MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
+ ((constraint) & 1 << (((prevn) ? 2 : 0) + ((currn) ? 1 : 0) + 4))
+#define MATCHES_LETTER_CONTEXT(constraint, prevl, currl) \
+ ((constraint) & 1 << (((prevl) ? 2 : 0) + ((currl) ? 1 : 0)))
+#define SUCCEEDS_IN_CONTEXT(constraint, prevn, currn, prevl, currl) \
+ (MATCHES_NEWLINE_CONTEXT(constraint, prevn, currn) \
+ && MATCHES_LETTER_CONTEXT(constraint, prevl, currl))
/* The following macros give information about what a constraint depends on. */
-#define _PREV_NEWLINE_DEPENDENT(constraint) \
+#define PREV_NEWLINE_DEPENDENT(constraint) \
(((constraint) & 0xc0) >> 2 != ((constraint) & 0x30))
-#define _PREV_LETTER_DEPENDENT(constraint) \
+#define PREV_LETTER_DEPENDENT(constraint) \
(((constraint) & 0x0c) >> 2 != ((constraint) & 0x03))
/* Tokens that match the empty string subject to some constraint actually
work by applying that constraint to determine what may follow them,
taking into account what has gone before. The following values are
the constraints corresponding to the special tokens previously defined. */
-#define _NO_CONSTRAINT 0xff
-#define _BEGLINE_CONSTRAINT 0xcf
-#define _ENDLINE_CONSTRAINT 0xaf
-#define _BEGWORD_CONSTRAINT 0xf2
-#define _ENDWORD_CONSTRAINT 0xf4
-#define _LIMWORD_CONSTRAINT 0xf6
-#define _NOTLIMWORD_CONSTRAINT 0xf9
+#define NO_CONSTRAINT 0xff
+#define BEGLINE_CONSTRAINT 0xcf
+#define ENDLINE_CONSTRAINT 0xaf
+#define BEGWORD_CONSTRAINT 0xf2
+#define ENDWORD_CONSTRAINT 0xf4
+#define LIMWORD_CONSTRAINT 0xf6
+#define NOTLIMWORD_CONSTRAINT 0xf9
/* States of the recognizer correspond to sets of positions in the parse
tree, together with the constraints under which they may be matched.
@@ -371,44 +183,48 @@ typedef struct
{
unsigned index; /* Index into the parse array. */
unsigned constraint; /* Constraint for matching this position. */
-} _position;
+} position;
/* Sets of positions are stored as arrays. */
typedef struct
{
- _position *elems; /* Elements of this position set. */
+ position *elems; /* Elements of this position set. */
int nelem; /* Number of elements in this set. */
-} _position_set;
+} position_set;
-/* A state of the regexp consists of a set of positions, some flags,
+/* A state of the dfa consists of a set of positions, some flags,
and the token value of the lowest-numbered position of the state that
- contains an _END token. */
+ contains an END token. */
typedef struct
{
int hash; /* Hash of the positions of this state. */
- _position_set elems; /* Positions this state could match. */
+ position_set elems; /* Positions this state could match. */
char newline; /* True if previous state matched newline. */
char letter; /* True if previous state matched a letter. */
char backref; /* True if this state matches a \<digit>. */
unsigned char constraint; /* Constraint for this state to accept. */
- int first_end; /* Token value of the first _END in elems. */
-} _dfa_state;
+ int first_end; /* Token value of the first END in elems. */
+} dfa_state;
-/* If an r.e. is at most MUST_MAX characters long, we look for a string which
- must appear in it; whatever's found is dropped into the struct reg. */
-
-#define MUST_MAX 50
+/* Element of a list of strings, at least one of which is known to
+ appear in any R.E. matching the DFA. */
+struct dfamust
+{
+ int exact;
+ char *must;
+ struct dfamust *next;
+};
/* A compiled regular expression. */
-struct regexp
+struct dfa
{
/* Stuff built by the scanner. */
- _charset *charsets; /* Array of character sets for _SET tokens. */
- int cindex; /* Index for adding new charsets. */
- int calloc; /* Number of charsets currently allocated. */
+ charclass *charclasses; /* Array of character sets for CSET tokens. */
+ int cindex; /* Index for adding new charclasses. */
+ int calloc; /* Number of charclasses currently allocated. */
/* Stuff built by the parser. */
- _token *tokens; /* Postfix parse array. */
+ token *tokens; /* Postfix parse array. */
int tindex; /* Index for adding new tokens. */
int talloc; /* Number of tokens currently allocated. */
int depth; /* Depth required of an evaluation stack
@@ -416,15 +232,15 @@ struct regexp
parse tree. */
int nleaves; /* Number of leaves on the parse tree. */
int nregexps; /* Count of parallel regexps being built
- with regparse(). */
+ with dfaparse(). */
/* Stuff owned by the state builder. */
- _dfa_state *states; /* States of the regexp. */
+ dfa_state *states; /* States of the dfa. */
int sindex; /* Index for adding new states. */
int salloc; /* Number of states currently allocated. */
/* Stuff built by the structure analyzer. */
- _position_set *follows; /* Array of follow sets, indexed by position
+ position_set *follows; /* Array of follow sets, indexed by position
index. The follow of a position is the set
of positions containing characters that
could conceivably follow a character
@@ -454,7 +270,7 @@ struct regexp
int **fails; /* Transition tables after failing to accept
on a state that potentially could do so. */
int *success; /* Table of acceptance conditions used in
- regexecute and computed in build_state. */
+ dfaexec and computed in build_state. */
int *newlines; /* Transitions on newlines. The entry for a
newline in any transition table is always
-1 so we can count lines without wasting
@@ -462,40 +278,41 @@ struct regexp
newline is stored separately and handled
as a special case. Newline is also used
as a sentinel at the end of the buffer. */
- char must[MUST_MAX];
- int mustn;
+ struct dfamust *musts; /* List of strings, at least one of which
+ is known to appear in any r.e. matching
+ the dfa. */
};
-/* Some macros for user access to regexp internals. */
+/* Some macros for user access to dfa internals. */
/* ACCEPTING returns true if s could possibly be an accepting state of r. */
#define ACCEPTING(s, r) ((r).states[s].constraint)
/* ACCEPTS_IN_CONTEXT returns true if the given state accepts in the
specified context. */
-#define ACCEPTS_IN_CONTEXT(prevn, currn, prevl, currl, state, reg) \
- _SUCCEEDS_IN_CONTEXT((reg).states[state].constraint, \
+#define ACCEPTS_IN_CONTEXT(prevn, currn, prevl, currl, state, dfa) \
+ SUCCEEDS_IN_CONTEXT((dfa).states[state].constraint, \
prevn, currn, prevl, currl)
/* FIRST_MATCHING_REGEXP returns the index number of the first of parallel
regexps that a given state could accept. Parallel regexps are numbered
starting at 1. */
-#define FIRST_MATCHING_REGEXP(state, reg) (-(reg).states[state].first_end)
+#define FIRST_MATCHING_REGEXP(state, dfa) (-(dfa).states[state].first_end)
/* Entry points. */
#ifdef __STDC__
-/* Regsyntax() takes two arguments; the first sets the syntax bits described
+/* dfasyntax() takes two arguments; the first sets the syntax bits described
earlier in this file, and the second sets the case-folding flag. */
-extern void regsyntax(long, int);
+extern void dfasyntax(reg_syntax_t, int);
-/* Compile the given string of the given length into the given struct regexp.
+/* Compile the given string of the given length into the given struct dfa.
Final argument is a flag specifying whether to build a searching or an
exact matcher. */
-extern void regcompile(const char *, size_t, struct regexp *, int);
+extern void dfacomp(char *, size_t, struct dfa *, int);
-/* Execute the given struct regexp on the buffer of characters. The
+/* Execute the given struct dfa on the buffer of characters. The
first char * points to the beginning, and the second points to the
first character after the end of the buffer, which must be a writable
place so a sentinel end-of-buffer marker can be stored there. The
@@ -507,37 +324,37 @@ extern void regcompile(const char *, size_t, struct regexp *, int);
order to verify backreferencing; otherwise the flag will be cleared.
Returns NULL if no match is found, or a pointer to the first
character after the first & shortest matching string in the buffer. */
-extern char *regexecute(struct regexp *, char *, char *, int, int *, int *);
+extern char *dfaexec(struct dfa *, char *, char *, int, int *, int *);
-/* Free the storage held by the components of a struct regexp. */
-extern void reg_free(struct regexp *);
+/* Free the storage held by the components of a struct dfa. */
+extern void dfafree(struct dfa *);
/* Entry points for people who know what they're doing. */
-/* Initialize the components of a struct regexp. */
-extern void reginit(struct regexp *);
+/* Initialize the components of a struct dfa. */
+extern void dfainit(struct dfa *);
-/* Incrementally parse a string of given length into a struct regexp. */
-extern void regparse(const char *, size_t, struct regexp *);
+/* Incrementally parse a string of given length into a struct dfa. */
+extern void dfaparse(char *, size_t, struct dfa *);
/* Analyze a parsed regexp; second argument tells whether to build a searching
or an exact matcher. */
-extern void reganalyze(struct regexp *, int);
+extern void dfaanalyze(struct dfa *, int);
/* Compute, for each possible character, the transitions out of a given
state, storing them in an array of integers. */
-extern void regstate(int, struct regexp *, int []);
+extern void dfastate(int, struct dfa *, int []);
/* Error handling. */
-/* Regerror() is called by the regexp routines whenever an error occurs. It
+/* dfaerror() is called by the regexp routines whenever an error occurs. It
takes a single argument, a NUL-terminated string describing the error.
- The default reg_error() prints the error message to stderr and exits.
- The user can provide a different reg_free() if so desired. */
-extern void reg_error(const char *);
+ The default dfaerror() prints the error message to stderr and exits.
+ The user can provide a different dfafree() if so desired. */
+extern void dfaerror(const char *);
#else /* ! __STDC__ */
-extern void regsyntax(), regcompile(), reg_free(), reginit(), regparse();
-extern void reganalyze(), regstate(), reg_error();
-extern char *regexecute();
-#endif
+extern void dfasyntax(), dfacomp(), dfafree(), dfainit(), dfaparse();
+extern void dfaanalyze(), dfastate(), dfaerror();
+extern char *dfaexec();
+#endif /* ! __STDC__ */
diff --git a/gnu/usr.bin/awk/eval.c b/gnu/usr.bin/awk/eval.c
index f640f3733ada..18f67fd2b592 100644
--- a/gnu/usr.bin/awk/eval.c
+++ b/gnu/usr.bin/awk/eval.c
@@ -3,7 +3,7 @@
*/
/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
*
* This file is part of GAWK, the GNU implementation of the
* AWK Progamming Language.
@@ -137,6 +137,10 @@ register NODE *volatile tree;
NODE *volatile stable_tree;
int volatile traverse = 1; /* True => loop thru tree (Node_rule_list) */
+ /* avoid false source indications */
+ source = NULL;
+ sourceline = 0;
+
if (tree == NULL)
return 1;
sourceline = tree->source_line;
@@ -318,7 +322,10 @@ register NODE *volatile tree;
break;
case Node_K_delete:
- do_delete(tree->lnode, tree->rnode);
+ if (tree->rnode != NULL)
+ do_delete(tree->lnode, tree->rnode);
+ else
+ assoc_clear(tree->lnode);
break;
case Node_K_next:
@@ -378,7 +385,7 @@ register NODE *tree;
register int di;
AWKNUM x, x1, x2;
long lx;
-#ifdef CRAY
+#ifdef _CRAY
long lx2;
#endif
@@ -386,21 +393,25 @@ register NODE *tree;
if (tree == NULL)
return Nnull_string;
if (tree->type == Node_val) {
- if (tree->stref <= 0) cant_happen();
+ if ((char)tree->stref <= 0) cant_happen();
return tree;
}
if (tree->type == Node_var) {
- if (tree->var_value->stref <= 0) cant_happen();
+ if ((char)tree->var_value->stref <= 0) cant_happen();
return tree->var_value;
}
+#endif
+
if (tree->type == Node_param_list) {
- if (stack_ptr[tree->param_cnt] == NULL)
+ tree = stack_ptr[tree->param_cnt];
+ if (tree == NULL)
return Nnull_string;
- else
- return stack_ptr[tree->param_cnt]->var_value;
}
-#endif
+
switch (tree->type) {
+ case Node_var:
+ return tree->var_value;
+
case Node_and:
return tmp_number((AWKNUM) (eval_condition(tree->lnode)
&& eval_condition(tree->rnode)));
@@ -443,7 +454,7 @@ register NODE *tree;
return *lhs;
case Node_var_array:
- fatal("attempt to use an array in a scalar context");
+ fatal("attempt to use array `%s' in a scalar context", tree->vname);
case Node_unary_minus:
t1 = tree_eval(tree->subnode);
@@ -489,32 +500,52 @@ register NODE *tree;
case Node_concat:
{
#define STACKSIZE 10
- NODE *stack[STACKSIZE];
- register NODE **sp;
- register int len;
+ NODE *treelist[STACKSIZE+1];
+ NODE *strlist[STACKSIZE+1];
+ register NODE **treep;
+ register NODE **strp;
+ register size_t len;
char *str;
register char *dest;
- sp = stack;
- len = 0;
+ /*
+ * This is an efficiency hack for multiple adjacent string
+ * concatenations, to avoid recursion and string copies.
+ *
+ * Node_concat trees grow downward to the left, so
+ * descend to lowest (first) node, accumulating nodes
+ * to evaluate to strings as we go.
+ */
+ treep = treelist;
while (tree->type == Node_concat) {
- *sp = force_string(tree_eval(tree->lnode));
- tree = tree->rnode;
- len += (*sp)->stlen;
- if (++sp == &stack[STACKSIZE-2]) /* one more and NULL */
+ *treep++ = tree->rnode;
+ tree = tree->lnode;
+ if (treep == &treelist[STACKSIZE])
break;
}
- *sp = force_string(tree_eval(tree));
- len += (*sp)->stlen;
- *++sp = NULL;
+ *treep = tree;
+ /*
+ * Now, evaluate to strings in LIFO order, accumulating
+ * the string length, so we can do a single malloc at the
+ * end.
+ */
+ strp = strlist;
+ len = 0;
+ while (treep >= treelist) {
+ *strp = force_string(tree_eval(*treep--));
+ len += (*strp)->stlen;
+ strp++;
+ }
+ *strp = NULL;
emalloc(str, char *, len+2, "tree_eval");
+ str[len] = str[len+1] = '\0'; /* for good measure */
dest = str;
- sp = stack;
- while (*sp) {
- memcpy(dest, (*sp)->stptr, (*sp)->stlen);
- dest += (*sp)->stlen;
- free_temp(*sp);
- sp++;
+ strp = strlist;
+ while (*strp) {
+ memcpy(dest, (*strp)->stptr, (*strp)->stlen);
+ dest += (*strp)->stlen;
+ free_temp(*strp);
+ strp++;
}
r = make_str_node(str, len, ALREADY_MALLOCED);
r->flags |= TEMP;
@@ -629,7 +660,7 @@ register NODE *tree;
return tmp_number(x1 - x2);
case Node_var_array:
- fatal("attempt to use an array in a scalar context");
+ fatal("attempt to use array `%s' in a scalar context", tree->vname);
default:
fatal("illegal type (%d) in tree_eval", tree->type);
@@ -696,7 +727,7 @@ cmp_nodes(t1, t2)
register NODE *t1, *t2;
{
register int ret;
- register int len1, len2;
+ register size_t len1, len2;
if (t1 == t2)
return 0;
@@ -944,22 +975,26 @@ NODE *arg_list; /* Node_expression_list of calling args. */
if (arg->type == Node_param_list)
arg = stack_ptr[arg->param_cnt];
n = *sp++;
- if (arg->type == Node_var && n->type == Node_var_array) {
+ if ((arg->type == Node_var || arg->type == Node_var_array)
+ && n->type == Node_var_array) {
/* should we free arg->var_value ? */
arg->var_array = n->var_array;
arg->type = Node_var_array;
+ arg->array_size = n->array_size;
+ arg->table_size = n->table_size;
+ arg->flags = n->flags;
}
- unref(n->lnode);
+ /* n->lnode overlays the array size, don't unref it if array */
+ if (n->type != Node_var_array)
+ unref(n->lnode);
freenode(n);
count--;
}
while (count-- > 0) {
n = *sp++;
/* if n is an (local) array, all the elements should be freed */
- if (n->type == Node_var_array) {
+ if (n->type == Node_var_array)
assoc_clear(n);
- free(n->var_array);
- }
unref(n->lnode);
freenode(n);
}
@@ -985,7 +1020,7 @@ NODE *arg_list; /* Node_expression_list of calling args. */
*/
NODE **
-get_lhs(ptr, assign)
+r_get_lhs(ptr, assign)
register NODE *ptr;
Func_ptr *assign;
{
@@ -994,11 +1029,11 @@ Func_ptr *assign;
switch (ptr->type) {
case Node_var_array:
- fatal("attempt to use an array in a scalar context");
+ fatal("attempt to use array `%s' in a scalar context", ptr->vname);
case Node_var:
aptr = &(ptr->var_value);
#ifdef DEBUG
- if (ptr->var_value->stref <= 0)
+ if ((char)ptr->var_value->stref <= 0)
cant_happen();
#endif
break;
@@ -1170,7 +1205,7 @@ set_ORS()
ORS[ORSlen] = '\0';
}
-static NODE **fmt_list = NULL;
+NODE **fmt_list = NULL;
static int fmt_ok P((NODE *n));
static int fmt_index P((NODE *n));
diff --git a/gnu/usr.bin/awk/field.c b/gnu/usr.bin/awk/field.c
index d8f9a5455631..17dce9b684e5 100644
--- a/gnu/usr.bin/awk/field.c
+++ b/gnu/usr.bin/awk/field.c
@@ -3,7 +3,7 @@
*/
/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
*
* This file is part of GAWK, the GNU implementation of the
* AWK Progamming Language.
@@ -25,26 +25,28 @@
#include "awk.h"
-static int (*parse_field) P((int, char **, int, NODE *,
- Regexp *, void (*)(), NODE *));
+typedef void (* Setfunc) P((int, char*, int, NODE *));
+
+static long (*parse_field) P((int, char **, int, NODE *,
+ Regexp *, Setfunc, NODE *));
static void rebuild_record P((void));
-static int re_parse_field P((int, char **, int, NODE *,
- Regexp *, void (*)(), NODE *));
-static int def_parse_field P((int, char **, int, NODE *,
- Regexp *, void (*)(), NODE *));
-static int sc_parse_field P((int, char **, int, NODE *,
- Regexp *, void (*)(), NODE *));
-static int fw_parse_field P((int, char **, int, NODE *,
- Regexp *, void (*)(), NODE *));
+static long re_parse_field P((int, char **, int, NODE *,
+ Regexp *, Setfunc, NODE *));
+static long def_parse_field P((int, char **, int, NODE *,
+ Regexp *, Setfunc, NODE *));
+static long sc_parse_field P((int, char **, int, NODE *,
+ Regexp *, Setfunc, NODE *));
+static long fw_parse_field P((int, char **, int, NODE *,
+ Regexp *, Setfunc, NODE *));
static void set_element P((int, char *, int, NODE *));
-static void grow_fields_arr P((int num));
+static void grow_fields_arr P((long num));
static void set_field P((int num, char *str, int len, NODE *dummy));
static Regexp *FS_regexp = NULL;
static char *parse_extent; /* marks where to restart parse of record */
-static int parse_high_water=0; /* field number that we have parsed so far */
-static int nf_high_water = 0; /* size of fields_arr */
+static long parse_high_water=0; /* field number that we have parsed so far */
+static long nf_high_water = 0; /* size of fields_arr */
static int resave_fs;
static NODE *save_FS; /* save current value of FS when line is read,
* to be used in deferred parsing
@@ -74,7 +76,7 @@ init_fields()
static void
grow_fields_arr(num)
-int num;
+long num;
{
register int t;
register NODE *n;
@@ -112,13 +114,13 @@ NODE *dummy; /* not used -- just to make interface same as set_element */
static void
rebuild_record()
{
- register int tlen;
+ register size_t tlen;
register NODE *tmp;
NODE *ofs;
char *ops;
register char *cops;
register NODE **ptr;
- register int ofslen;
+ register size_t ofslen;
tlen = 0;
ofs = force_string(OFS_node->var_value);
@@ -130,9 +132,9 @@ rebuild_record()
ptr--;
}
tlen += (NF - 1) * ofslen;
- if (tlen < 0)
+ if ((long)tlen < 0)
tlen = 0;
- emalloc(ops, char *, tlen + 2, "fix_fields");
+ emalloc(ops, char *, tlen + 2, "rebuild_record");
cops = ops;
ops[0] = '\0';
for (ptr = &fields_arr[1]; ptr <= &fields_arr[NF]; ptr++) {
@@ -204,7 +206,7 @@ set_NF()
{
register int i;
- NF = (int) force_number(NF_node->var_value);
+ NF = (long) force_number(NF_node->var_value);
if (NF > nf_high_water)
grow_fields_arr(NF);
for (i = parse_high_water + 1; i <= NF; i++) {
@@ -219,14 +221,14 @@ set_NF()
* via (*parse_field)(). This variation is for when FS is a regular
* expression -- either user-defined or because RS=="" and FS==" "
*/
-static int
+static long
re_parse_field(up_to, buf, len, fs, rp, set, n)
int up_to; /* parse only up to this field number */
char **buf; /* on input: string to parse; on output: point to start next */
int len;
NODE *fs;
Regexp *rp;
-void (*set) (); /* routine to set the value of the parsed field */
+Setfunc set; /* routine to set the value of the parsed field */
NODE *n;
{
register char *scan = *buf;
@@ -240,22 +242,23 @@ NODE *n;
return nf;
if (*RS == 0 && default_FS)
- while (scan < end && isspace(*scan))
+ while (scan < end && (*scan == ' ' || *scan == '\t' || *scan == '\n'))
scan++;
field = scan;
while (scan < end
- && research(rp, scan, 0, (int)(end - scan), 1) != -1
+ && research(rp, scan, 0, (end - scan), 1) != -1
&& nf < up_to) {
- if (REEND(rp, scan) == RESTART(rp, scan)) { /* null match */
+ if (REEND(rp, scan) == RESTART(rp, scan)) { /* null match */
scan++;
if (scan == end) {
- (*set)(++nf, field, scan - field, n);
+ (*set)(++nf, field, (int)(scan - field), n);
up_to = nf;
break;
}
continue;
}
- (*set)(++nf, field, scan + RESTART(rp, scan) - field, n);
+ (*set)(++nf, field,
+ (int)(scan + RESTART(rp, scan) - field), n);
scan += REEND(rp, scan);
field = scan;
if (scan == end) /* FS at end of record */
@@ -274,14 +277,14 @@ NODE *n;
* via (*parse_field)(). This variation is for when FS is a single space
* character.
*/
-static int
+static long
def_parse_field(up_to, buf, len, fs, rp, set, n)
int up_to; /* parse only up to this field number */
char **buf; /* on input: string to parse; on output: point to start next */
int len;
NODE *fs;
Regexp *rp;
-void (*set) (); /* routine to set the value of the parsed field */
+Setfunc set; /* routine to set the value of the parsed field */
NODE *n;
{
register char *scan = *buf;
@@ -328,14 +331,14 @@ NODE *n;
* via (*parse_field)(). This variation is for when FS is a single character
* other than space.
*/
-static int
+static long
sc_parse_field(up_to, buf, len, fs, rp, set, n)
int up_to; /* parse only up to this field number */
char **buf; /* on input: string to parse; on output: point to start next */
int len;
NODE *fs;
Regexp *rp;
-void (*set) (); /* routine to set the value of the parsed field */
+Setfunc set; /* routine to set the value of the parsed field */
NODE *n;
{
register char *scan = *buf;
@@ -360,14 +363,18 @@ NODE *n;
/* because it will be destroyed now: */
*end = fschar; /* sentinel character */
- for (; nf < up_to; scan++) {
+ for (; nf < up_to;) {
field = scan;
- while (*scan++ != fschar)
- ;
- scan--;
+ while (*scan != fschar)
+ scan++;
(*set)(++nf, field, (int)(scan - field), n);
if (scan == end)
break;
+ scan++;
+ if (scan == end) { /* FS at end of record */
+ (*set)(++nf, field, 0, n);
+ break;
+ }
}
/* everything done, restore original char at *end */
@@ -381,18 +388,18 @@ NODE *n;
* this is called both from get_field() and from do_split()
* via (*parse_field)(). This variation is for fields are fixed widths.
*/
-static int
+static long
fw_parse_field(up_to, buf, len, fs, rp, set, n)
int up_to; /* parse only up to this field number */
char **buf; /* on input: string to parse; on output: point to start next */
int len;
NODE *fs;
Regexp *rp;
-void (*set) (); /* routine to set the value of the parsed field */
+Setfunc set; /* routine to set the value of the parsed field */
NODE *n;
{
register char *scan = *buf;
- register int nf = parse_high_water;
+ register long nf = parse_high_water;
register char *end = scan + len;
if (up_to == HUGE)
@@ -512,11 +519,21 @@ NODE *tree;
NODE *t1, *t2, *t3, *tmp;
NODE *fs;
char *s;
- int (*parseit)P((int, char **, int, NODE *,
- Regexp *, void (*)(), NODE *));
+ long (*parseit)P((int, char **, int, NODE *,
+ Regexp *, Setfunc, NODE *));
Regexp *rp = NULL;
- t1 = tree_eval(tree->lnode);
+
+ /*
+ * do dupnode(), to avoid problems like
+ * x = split(a[1], a, "blah")
+ * since we assoc_clear the array. gack.
+ * this also gives up complete call by value semantics.
+ */
+ tmp = tree_eval(tree->lnode);
+ t1 = dupnode(tmp);
+ free_temp(tmp);
+
t2 = tree->rnode->lnode;
t3 = tree->rnode->rnode->lnode;
@@ -549,7 +566,7 @@ NODE *tree;
s = t1->stptr;
tmp = tmp_number((AWKNUM) (*parseit)(HUGE, &s, (int)t1->stlen,
fs, rp, set_element, t2));
- free_temp(t1);
+ unref(t1);
free_temp(t3);
return tmp;
}
@@ -557,10 +574,16 @@ NODE *tree;
void
set_FS()
{
- NODE *tmp = NULL;
char buf[10];
NODE *fs;
+ /*
+ * If changing the way fields are split, obey least-suprise
+ * semantics, and force $0 to be split totally.
+ */
+ if (fields_arr != NULL)
+ (void) get_field(HUGE - 1, 0);
+
buf[0] = '\0';
default_FS = 0;
if (FS_regexp) {
@@ -586,6 +609,9 @@ set_FS()
else if (fs->stptr[0] != ' ' && fs->stlen == 1) {
if (IGNORECASE == 0)
parse_field = sc_parse_field;
+ else if (fs->stptr[0] == '\\')
+ /* yet another special case */
+ strcpy(buf, "[\\\\]");
else
sprintf(buf, "[%c]", fs->stptr[0]);
}
@@ -625,6 +651,13 @@ set_FIELDWIDTHS()
if (do_unix) /* quick and dirty, does the trick */
return;
+ /*
+ * If changing the way fields are split, obey least-suprise
+ * semantics, and force $0 to be split totally.
+ */
+ if (fields_arr != NULL)
+ (void) get_field(HUGE - 1, 0);
+
parse_field = fw_parse_field;
scan = force_string(FIELDWIDTHS_node->var_value)->stptr;
end = scan + 1;
diff --git a/gnu/usr.bin/awk/getopt.c b/gnu/usr.bin/awk/getopt.c
index bbf345c33ca2..fd142f5a207a 100644
--- a/gnu/usr.bin/awk/getopt.c
+++ b/gnu/usr.bin/awk/getopt.c
@@ -3,44 +3,74 @@
"Keep this file name-space clean" means, talk to roland@gnu.ai.mit.edu
before changing it!
- Copyright (C) 1987, 88, 89, 90, 91, 1992 Free Software Foundation, Inc.
+ Copyright (C) 1987, 88, 89, 90, 91, 92, 93, 1994
+ Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify it
- under the terms of the GNU Library General Public License as published
- by the Free Software Foundation; either version 2, or (at your option)
- any later version.
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 2, or (at your option) any
+ later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
- You should have received a copy of the GNU Library General Public
- License along with this program; if not, write to the Free Software
- Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */
-#ifdef GAWK
+#ifdef HAVE_CONFIG_H
+#if defined (emacs) || defined (CONFIG_BROKETS)
+/* We use <config.h> instead of "config.h" so that a compilation
+ using -I. -I$srcdir will use ./config.h rather than $srcdir/config.h
+ (which it would do because it found this file in $srcdir). */
+#include <config.h>
+#else
#include "config.h"
#endif
+#endif
+
+#ifndef __STDC__
+/* This is a separate conditional since some stdc systems
+ reject `defined (const)'. */
+#ifndef const
+#define const
+#endif
+#endif
+
+/* This tells Alpha OSF/1 not to define a getopt prototype in <stdio.h>. */
+#ifndef _NO_PROTO
+#define _NO_PROTO
+#endif
#include <stdio.h>
+/* Comment out all this code if we are using the GNU C Library, and are not
+ actually compiling the library itself. This code is part of the GNU C
+ Library, but also included in many other GNU distributions. Compiling
+ and linking in this code is a waste when using the GNU C library
+ (especially if it is a shared library). Rather than having every GNU
+ program understand `configure --with-gnu-libc' and omit the object files,
+ it is simpler to just do this in the source for each such file. */
+
+#if defined (_LIBC) || !defined (__GNU_LIBRARY__)
+
+
/* This needs to come after some library #include
to get __GNU_LIBRARY__ defined. */
-#ifdef __GNU_LIBRARY__
+#if defined(__GNU_LIBRARY__) || defined(STDC_HEADERS)
+/* Don't include stdlib.h for non-GNU C libraries because some of them
+ contain conflicting prototypes for getopt. */
#include <stdlib.h>
-#include <string.h>
-#endif /* GNU C library. */
-
-
-#ifndef __STDC__
-#define const
-#endif
+#else
+extern char *getenv ();
+#endif /* __GNU_LIBRARY || STDC_HEADERS */
/* If GETOPT_COMPAT is defined, `+' as well as `--' can introduce a
long-named option. Because this is not POSIX.2 compliant, it is
- being phased out. */
-#define GETOPT_COMPAT
+ being phased out. */
+/* #define GETOPT_COMPAT */
/* This version of `getopt' appears to the caller like standard Unix `getopt'
but it behaves differently for the user, since it allows the user
@@ -78,6 +108,7 @@ char *optarg = 0;
Otherwise, `optind' communicates from one call to the next
how much of ARGV has been scanned so far. */
+/* XXX 1003.2 says this must be 1 before any call. */
int optind = 0;
/* The next char to be scanned in the option-element
@@ -94,6 +125,12 @@ static char *nextchar;
int opterr = 1;
+/* Set to an option character which was unrecognized.
+ This must be initialized on some systems to avoid linking in the
+ system's own getopt implementation. */
+
+int optopt = '?';
+
/* Describe how to deal with options that follow non-option ARGV-elements.
If the caller did not specify anything,
@@ -128,41 +165,46 @@ static enum
REQUIRE_ORDER, PERMUTE, RETURN_IN_ORDER
} ordering;
-#ifdef __GNU_LIBRARY__
+#if defined(__GNU_LIBRARY__) || defined(STDC_HEADERS)
+/* We want to avoid inclusion of string.h with non-GNU libraries
+ because there are many ways it can cause trouble.
+ On some systems, it contains special magic macros that don't work
+ in GCC. */
#include <string.h>
#define my_index strchr
-#define my_bcopy(src, dst, n) memcpy ((dst), (src), (n))
#else
/* Avoid depending on library functions or files
whose names are inconsistent. */
-char *getenv ();
-
static char *
-my_index (string, chr)
- char *string;
+my_index (str, chr)
+ const char *str;
int chr;
{
- while (*string)
+ while (*str)
{
- if (*string == chr)
- return string;
- string++;
+ if (*str == chr)
+ return (char *) str;
+ str++;
}
return 0;
}
-static void
-my_bcopy (from, to, size)
- char *from, *to;
- int size;
-{
- int i;
- for (i = 0; i < size; i++)
- to[i] = from[i];
-}
-#endif /* GNU C library. */
+/* If using GCC, we can safely declare strlen this way.
+ If not using GCC, it is ok not to declare it.
+ (Supposedly there are some machines where it might get a warning,
+ but changing this conditional to __STDC__ is too risky.) */
+#ifdef __GNUC__
+#ifdef IN_GCC
+#include "gstddef.h"
+#else
+#include <stddef.h>
+#endif
+extern size_t strlen (const char *);
+#endif
+
+#endif /* __GNU_LIBRARY__ || STDC_HEADERS */
/* Handle permutation of arguments. */
@@ -186,17 +228,51 @@ static void
exchange (argv)
char **argv;
{
- int nonopts_size = (last_nonopt - first_nonopt) * sizeof (char *);
- char **temp = (char **) malloc (nonopts_size);
+ int bottom = first_nonopt;
+ int middle = last_nonopt;
+ int top = optind;
+ char *tem;
- /* Interchange the two blocks of data in ARGV. */
+ /* Exchange the shorter segment with the far end of the longer segment.
+ That puts the shorter segment into the right place.
+ It leaves the longer segment in the right place overall,
+ but it consists of two parts that need to be swapped next. */
- my_bcopy (&argv[first_nonopt], temp, nonopts_size);
- my_bcopy (&argv[last_nonopt], &argv[first_nonopt],
- (optind - last_nonopt) * sizeof (char *));
- my_bcopy (temp, &argv[first_nonopt + optind - last_nonopt], nonopts_size);
+ while (top > middle && middle > bottom)
+ {
+ if (top - middle > middle - bottom)
+ {
+ /* Bottom segment is the short one. */
+ int len = middle - bottom;
+ register int i;
- free(temp);
+ /* Swap it with the top part of the top segment. */
+ for (i = 0; i < len; i++)
+ {
+ tem = argv[bottom + i];
+ argv[bottom + i] = argv[top - (middle - bottom) + i];
+ argv[top - (middle - bottom) + i] = tem;
+ }
+ /* Exclude the moved bottom segment from further swapping. */
+ top -= len;
+ }
+ else
+ {
+ /* Top segment is the short one. */
+ int len = top - middle;
+ register int i;
+
+ /* Swap it with the bottom part of the bottom segment. */
+ for (i = 0; i < len; i++)
+ {
+ tem = argv[bottom + i];
+ argv[bottom + i] = argv[middle + i];
+ argv[middle + i] = tem;
+ }
+ /* Exclude the moved top segment from further swapping. */
+ bottom += len;
+ }
+ }
/* Update records for the slots the non-options now occupy. */
@@ -394,8 +470,7 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
int exact = 0;
int ambig = 0;
const struct option *pfound = NULL;
- int indfound = 0;
- extern int strncmp();
+ int indfound;
while (*s && *s != '=')
s++;
@@ -441,7 +516,7 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
if (*s)
{
/* Don't test has_arg with >, because some C compilers don't
- allow it to be used on enums. */
+ allow it to be used on enums. */
if (pfound->has_arg)
optarg = s + 1;
else
@@ -473,7 +548,7 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
fprintf (stderr, "%s: option `%s' requires an argument\n",
argv[0], argv[optind - 1]);
nextchar += strlen (nextchar);
- return '?';
+ return optstring[0] == ':' ? ':' : '?';
}
}
nextchar += strlen (nextchar);
@@ -489,7 +564,7 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
/* Can't find it as a long option. If this is not getopt_long_only,
or the option starts with '--' or is not a valid short
option, then it's an error.
- Otherwise interpret it as a short option. */
+ Otherwise interpret it as a short option. */
if (!long_only || argv[optind][1] == '-'
#ifdef GETOPT_COMPAT
|| argv[optind][0] == '+'
@@ -527,12 +602,18 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
{
if (opterr)
{
+#if 0
if (c < 040 || c >= 0177)
fprintf (stderr, "%s: unrecognized option, character code 0%o\n",
argv[0], c);
else
fprintf (stderr, "%s: unrecognized option `-%c'\n", argv[0], c);
+#else
+ /* 1003.2 specifies the format of this message. */
+ fprintf (stderr, "%s: illegal option -- %c\n", argv[0], c);
+#endif
}
+ optopt = c;
return '?';
}
if (temp[1] == ':')
@@ -562,9 +643,21 @@ _getopt_internal (argc, argv, optstring, longopts, longind, long_only)
else if (optind == argc)
{
if (opterr)
- fprintf (stderr, "%s: option `-%c' requires an argument\n",
- argv[0], c);
- c = '?';
+ {
+#if 0
+ fprintf (stderr, "%s: option `-%c' requires an argument\n",
+ argv[0], c);
+#else
+ /* 1003.2 specifies the format of this message. */
+ fprintf (stderr, "%s: option requires an argument -- %c\n",
+ argv[0], c);
+#endif
+ }
+ optopt = c;
+ if (optstring[0] == ':')
+ c = ':';
+ else
+ c = '?';
}
else
/* We already incremented `optind' once;
@@ -588,6 +681,8 @@ getopt (argc, argv, optstring)
(int *) 0,
0);
}
+
+#endif /* _LIBC or not __GNU_LIBRARY__. */
#ifdef TEST
diff --git a/gnu/usr.bin/awk/getopt.h b/gnu/usr.bin/awk/getopt.h
index de027434f7cb..b0fc4ffb6d76 100644
--- a/gnu/usr.bin/awk/getopt.h
+++ b/gnu/usr.bin/awk/getopt.h
@@ -1,19 +1,19 @@
/* Declarations for getopt.
- Copyright (C) 1989, 1990, 1991, 1992 Free Software Foundation, Inc.
+ Copyright (C) 1989, 1990, 1991, 1992, 1993 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify it
- under the terms of the GNU Library General Public License as published
- by the Free Software Foundation; either version 2, or (at your option)
- any later version.
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 2, or (at your option) any
+ later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
- You should have received a copy of the GNU Library General Public
- License along with this program; if not, write to the Free Software
- Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */
#ifndef _GETOPT_H
#define _GETOPT_H 1
@@ -49,6 +49,10 @@ extern int optind;
extern int opterr;
+/* Set to an option character which was unrecognized. */
+
+extern int optopt;
+
/* Describe the long-named options requested by the application.
The LONG_OPTIONS argument to getopt_long or getopt_long_only is a vector
of `struct option' terminated by an element containing a name which is
@@ -72,7 +76,7 @@ extern int opterr;
struct option
{
-#if __STDC__
+#ifdef __STDC__
const char *name;
#else
char *name;
@@ -86,14 +90,11 @@ struct option
/* Names for the values of the `has_arg' field of `struct option'. */
-enum _argtype
-{
- no_argument,
- required_argument,
- optional_argument
-};
+#define no_argument 0
+#define required_argument 1
+#define optional_argument 2
-#if __STDC__
+#ifdef __STDC__
#if defined(__GNU_LIBRARY__)
/* Many other libraries have conflicting prototypes for getopt, with
differences in the consts, in stdlib.h. To avoid compilation
diff --git a/gnu/usr.bin/awk/getopt1.c b/gnu/usr.bin/awk/getopt1.c
index e2127cd58d42..7739b51215f9 100644
--- a/gnu/usr.bin/awk/getopt1.c
+++ b/gnu/usr.bin/awk/getopt1.c
@@ -1,40 +1,64 @@
-/* Getopt for GNU.
- Copyright (C) 1987, 88, 89, 90, 91, 1992 Free Software Foundation, Inc.
-
-This file is part of the libiberty library.
-Libiberty is free software; you can redistribute it and/or
-modify it under the terms of the GNU Library General Public
-License as published by the Free Software Foundation; either
-version 2 of the License, or (at your option) any later version.
-
-Libiberty is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-Library General Public License for more details.
-
-You should have received a copy of the GNU Library General Public
-License along with libiberty; see the file COPYING.LIB. If
-not, write to the Free Software Foundation, Inc., 675 Mass Ave,
-Cambridge, MA 02139, USA. */
+/* getopt_long and getopt_long_only entry points for GNU getopt.
+ Copyright (C) 1987, 88, 89, 90, 91, 92, 1993
+ Free Software Foundation, Inc.
+
+ This program is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the
+ Free Software Foundation; either version 2, or (at your option) any
+ later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */
-#ifdef LIBC
-/* For when compiled as part of the GNU C library. */
-#include <ansidecl.h>
+#ifdef HAVE_CONFIG_H
+#if defined (emacs) || defined (CONFIG_BROKETS)
+/* We use <config.h> instead of "config.h" so that a compilation
+ using -I. -I$srcdir will use ./config.h rather than $srcdir/config.h
+ (which it would do because it found this file in $srcdir). */
+#include <config.h>
+#else
+#include "config.h"
+#endif
#endif
#include "getopt.h"
#ifndef __STDC__
+/* This is a separate conditional since some stdc systems
+ reject `defined (const)'. */
+#ifndef const
#define const
#endif
+#endif
+
+#include <stdio.h>
+
+/* Comment out all this code if we are using the GNU C Library, and are not
+ actually compiling the library itself. This code is part of the GNU C
+ Library, but also included in many other GNU distributions. Compiling
+ and linking in this code is a waste when using the GNU C library
+ (especially if it is a shared library). Rather than having every GNU
+ program understand `configure --with-gnu-libc' and omit the object files,
+ it is simpler to just do this in the source for each such file. */
+
+#if defined (_LIBC) || !defined (__GNU_LIBRARY__)
-#if defined(STDC_HEADERS) || defined(__GNU_LIBRARY__) || defined (LIBC)
+
+/* This needs to come after some library #include
+ to get __GNU_LIBRARY__ defined. */
+#if defined(__GNU_LIBRARY__) || defined(OS2) || defined(MSDOS) || defined(atarist)
#include <stdlib.h>
-#else /* STDC_HEADERS or __GNU_LIBRARY__ */
+#else
char *getenv ();
-#endif /* STDC_HEADERS or __GNU_LIBRARY__ */
+#endif
-#if !defined (NULL)
+#ifndef NULL
#define NULL 0
#endif
@@ -52,9 +76,9 @@ getopt_long (argc, argv, options, long_options, opt_index)
/* Like getopt_long, but '-' as well as '--' can indicate a long option.
If an option that starts with '-' (not '--') doesn't match a long option,
but does match a short option, it is parsed as a short option
- instead. */
+ instead. */
-int
+int
getopt_long_only (argc, argv, options, long_options, opt_index)
int argc;
char *const *argv;
@@ -64,6 +88,9 @@ getopt_long_only (argc, argv, options, long_options, opt_index)
{
return _getopt_internal (argc, argv, options, long_options, opt_index, 1);
}
+
+
+#endif /* _LIBC or not __GNU_LIBRARY__. */
#ifdef TEST
diff --git a/gnu/usr.bin/awk/io.c b/gnu/usr.bin/awk/io.c
index 7004aedd519d..7fe21ecf016f 100644
--- a/gnu/usr.bin/awk/io.c
+++ b/gnu/usr.bin/awk/io.c
@@ -3,7 +3,7 @@
*/
/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
*
* This file is part of GAWK, the GNU implementation of the
* AWK Progamming Language.
@@ -23,6 +23,9 @@
* the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
+#if !defined(VMS) && !defined(VMS_POSIX) && !defined(_MSC_VER)
+#include <sys/param.h>
+#endif
#include "awk.h"
#ifndef O_RDONLY
@@ -33,13 +36,17 @@
#define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR)
#endif
+#ifndef ENFILE
+#define ENFILE EMFILE
+#endif
+
#ifndef atarist
#define INVALID_HANDLE (-1)
#else
#define INVALID_HANDLE (__SMALLEST_VALID_HANDLE - 1)
#endif
-#if defined(MSDOS) || defined(atarist)
+#if defined(MSDOS) || defined(OS2) || defined(atarist)
#define PIPES_SIMULATED
#endif
@@ -48,17 +55,34 @@ static int inrec P((IOBUF *iop));
static int iop_close P((IOBUF *iop));
struct redirect *redirect P((NODE *tree, int *errflg));
static void close_one P((void));
-static int close_redir P((struct redirect *rp));
+static int close_redir P((struct redirect *rp, int exitwarn));
#ifndef PIPES_SIMULATED
static int wait_any P((int interesting));
#endif
static IOBUF *gawk_popen P((char *cmd, struct redirect *rp));
-static IOBUF *iop_open P((char *file, char *how));
+static IOBUF *iop_open P((const char *file, const char *how));
static int gawk_pclose P((struct redirect *rp));
-static int do_pathopen P((char *file));
+static int do_pathopen P((const char *file));
+static int str2mode P((const char *mode));
+static void spec_setup P((IOBUF *iop, int len, int allocate));
+static int specfdopen P((IOBUF *iop, const char *name, const char *mode));
+static int pidopen P((IOBUF *iop, const char *name, const char *mode));
+static int useropen P((IOBUF *iop, const char *name, const char *mode));
extern FILE *fdopen();
+
+#if defined (MSDOS)
+#include "popen.h"
+#define popen(c,m) os_popen(c,m)
+#define pclose(f) os_pclose(f)
+#elif defined (OS2) /* OS/2, but not family mode */
+#if defined (_MSC_VER)
+#define popen(c,m) _popen(c,m)
+#define pclose(f) _pclose(f)
+#endif
+#else
extern FILE *popen();
+#endif
static struct redirect *red_head = NULL;
@@ -87,7 +111,6 @@ int skipping;
static int i = 1;
static int files = 0;
NODE *arg;
- int fd = INVALID_HANDLE;
static IOBUF *curfile = NULL;
if (skipping) {
@@ -121,8 +144,7 @@ int skipping;
/* NOTREACHED */
/* This is a kludge. */
unref(FILENAME_node->var_value);
- FILENAME_node->var_value =
- dupnode(arg);
+ FILENAME_node->var_value = dupnode(arg);
FNR = 0;
i++;
break;
@@ -131,8 +153,8 @@ int skipping;
if (files == 0) {
files++;
/* no args. -- use stdin */
- /* FILENAME is init'ed to "-" */
/* FNR is init'ed to 0 */
+ FILENAME_node->var_value = make_string("-", 1);
curfile = iop_alloc(fileno(stdin));
}
return curfile;
@@ -141,13 +163,13 @@ int skipping;
void
set_FNR()
{
- FNR = (int) FNR_node->var_value->numbr;
+ FNR = (long) FNR_node->var_value->numbr;
}
void
set_NR()
{
- NR = (int) NR_node->var_value->numbr;
+ NR = (long) NR_node->var_value->numbr;
}
/*
@@ -238,12 +260,15 @@ do_input()
IOBUF *iop;
extern int exiting;
- if (setjmp(filebuf) != 0) {
- }
+ (void) setjmp(filebuf);
+
while ((iop = nextfile(0)) != NULL) {
if (inrec(iop) == 0)
while (interpret(expression_value) && inrec(iop) == 0)
- ;
+ continue;
+ /* recover any space from C based alloca */
+ (void) alloca(0);
+
if (exiting)
break;
}
@@ -260,10 +285,10 @@ int *errflg;
register char *str;
int tflag = 0;
int outflag = 0;
- char *direction = "to";
- char *mode;
+ const char *direction = "to";
+ const char *mode;
int fd;
- char *what = NULL;
+ const char *what = NULL;
switch (tree->type) {
case Node_redirect_append:
@@ -376,15 +401,19 @@ int *errflg;
rp->fp = stdout;
else if (fd == fileno(stderr))
rp->fp = stderr;
- else
- rp->fp = fdopen(fd, mode);
- if (isatty(fd))
+ else {
+ rp->fp = fdopen(fd, (char *) mode);
+ /* don't leak file descriptors */
+ if (rp->fp == NULL)
+ close(fd);
+ }
+ if (rp->fp != NULL && isatty(fd))
rp->flag |= RED_NOBUF;
}
}
if (rp->fp == NULL && rp->iop == NULL) {
/* too many files open -- close one and try again */
- if (errno == EMFILE)
+ if (errno == EMFILE || errno == ENFILE)
close_one();
else {
/*
@@ -454,16 +483,18 @@ NODE *tree;
if (rp == NULL) /* no match */
return tmp_number((AWKNUM) 0.0);
fflush(stdout); /* synchronize regular output */
- tmp = tmp_number((AWKNUM)close_redir(rp));
+ tmp = tmp_number((AWKNUM)close_redir(rp, 0));
rp = NULL;
return tmp;
}
static int
-close_redir(rp)
+close_redir(rp, exitwarn)
register struct redirect *rp;
+int exitwarn;
{
int status = 0;
+ char *what;
if (rp == NULL)
return 0;
@@ -482,14 +513,19 @@ register struct redirect *rp;
rp->iop = NULL;
}
}
+
+ what = (rp->flag & RED_PIPE) ? "pipe" : "file";
+
+ if (exitwarn)
+ warning("no explicit close of %s \"%s\" provided",
+ what, rp->value);
+
/* SVR4 awk checks and warns about status of close */
if (status) {
char *s = strerror(errno);
- warning("failure status (%d) on %s close of \"%s\" (%s).",
- status,
- (rp->flag & RED_PIPE) ? "pipe" :
- "file", rp->value, s);
+ warning("failure status (%d) on %s close of \"%s\" (%s)",
+ status, what, rp->value, s);
if (! do_unix) {
/* set ERRNO too so that program can get at it */
@@ -544,20 +580,27 @@ close_io ()
int status = 0;
errno = 0;
- if (fclose(stdout)) {
+ for (rp = red_head; rp != NULL; rp = next) {
+ next = rp->next;
+ /* close_redir() will print a message if needed */
+ /* if do_lint, warn about lack of explicit close */
+ if (close_redir(rp, do_lint))
+ status++;
+ rp = NULL;
+ }
+ /*
+ * Some of the non-Unix os's have problems doing an fclose
+ * on stdout and stderr. Since we don't really need to close
+ * them, we just flush them, and do that across the board.
+ */
+ if (fflush(stdout)) {
warning("error writing standard output (%s).", strerror(errno));
status++;
}
- if (fclose(stderr)) {
+ if (fflush(stderr)) {
warning("error writing standard error (%s).", strerror(errno));
status++;
}
- for (rp = red_head; rp != NULL; rp = next) {
- next = rp->next;
- if (close_redir(rp))
- status++;
- rp = NULL;
- }
return status;
}
@@ -565,7 +608,7 @@ close_io ()
static int
str2mode(mode)
-char *mode;
+const char *mode;
{
int ret;
@@ -581,7 +624,9 @@ char *mode;
case 'a':
ret = O_WRONLY|O_APPEND|O_CREAT;
break;
+
default:
+ ret = 0; /* lint */
cant_happen();
}
return ret;
@@ -598,10 +643,10 @@ char *mode;
int
devopen(name, mode)
-char *name, *mode;
+const char *name, *mode;
{
int openfd = INVALID_HANDLE;
- char *cp, *ptr;
+ const char *cp, *ptr;
int flag = 0;
struct stat buf;
extern double strtod();
@@ -618,7 +663,7 @@ char *name, *mode;
if (STREQ(name, "-"))
openfd = fileno(stdin);
- else if (STREQN(name, "/dev/", 5) && stat(name, &buf) == -1) {
+ else if (STREQN(name, "/dev/", 5) && stat((char *) name, &buf) == -1) {
cp = name + 5;
if (STREQ(cp, "stdin") && (flag & O_RDONLY) == O_RDONLY)
@@ -647,7 +692,7 @@ strictopen:
/* spec_setup --- setup an IOBUF for a special internal file */
-void
+static void
spec_setup(iop, len, allocate)
IOBUF *iop;
int len;
@@ -674,10 +719,10 @@ int allocate;
/* specfdopen --- open a fd special file */
-int
+static int
specfdopen(iop, name, mode)
IOBUF *iop;
-char *name, *mode;
+const char *name, *mode;
{
int fd;
IOBUF *tp;
@@ -694,23 +739,43 @@ char *name, *mode;
return 0;
}
+/*
+ * Following mess will improve in 2.16; this is written to avoid
+ * long lines, avoid splitting #if with backslash, and avoid #elif
+ * to maximize portability.
+ */
+#ifndef GETPGRP_NOARG
+#if defined(__svr4__) || defined(BSD4_4) || defined(_POSIX_SOURCE)
+#define GETPGRP_NOARG
+#else
+#if defined(i860) || defined(_AIX) || defined(hpux) || defined(VMS)
+#define GETPGRP_NOARG
+#else
+#if defined(OS2) || defined(MSDOS) || defined(AMIGA) || defined(atarist)
+#define GETPGRP_NOARG
+#endif
+#endif
+#endif
+#endif
+
+#ifdef GETPGRP_NOARG
+#define getpgrp_ARG /* nothing */
+#else
+#define getpgrp_ARG getpid()
+#endif
+
/* pidopen --- "open" /dev/pid, /dev/ppid, and /dev/pgrpid */
-int
+static int
pidopen(iop, name, mode)
IOBUF *iop;
-char *name, *mode;
+const char *name, *mode;
{
char tbuf[BUFSIZ];
int i;
if (name[6] == 'g')
-/* following #if will improve in 2.16 */
-#if defined(__svr4__) || defined(i860) || defined(_AIX) || defined(BSD4_4) || defined(__386BSD__)
- sprintf(tbuf, "%d\n", getpgrp());
-#else
- sprintf(tbuf, "%d\n", getpgrp(getpid()));
-#endif
+ sprintf(tbuf, "%d\n", getpgrp( getpgrp_ARG ));
else if (name[6] == 'i')
sprintf(tbuf, "%d\n", getpid());
else
@@ -733,15 +798,19 @@ char *name, *mode;
* supplementary group set.
*/
-int
+static int
useropen(iop, name, mode)
IOBUF *iop;
-char *name, *mode;
+const char *name, *mode;
{
char tbuf[BUFSIZ], *cp;
int i;
#if defined(NGROUPS_MAX) && NGROUPS_MAX > 0
+#if defined(atarist) || defined(__svr4__) || defined(__osf__)
+ gid_t groupset[NGROUPS_MAX];
+#else
int groupset[NGROUPS_MAX];
+#endif
int ngroups;
#endif
@@ -755,7 +824,7 @@ char *name, *mode;
for (i = 0; i < ngroups; i++) {
*cp++ = ' ';
- sprintf(cp, "%d", groupset[i]);
+ sprintf(cp, "%d", (int)groupset[i]);
cp += strlen(cp);
}
#endif
@@ -773,18 +842,16 @@ char *name, *mode;
static IOBUF *
iop_open(name, mode)
-char *name, *mode;
+const char *name, *mode;
{
int openfd = INVALID_HANDLE;
- char *cp, *ptr;
int flag = 0;
- int i;
struct stat buf;
IOBUF *iop;
static struct internal {
- char *name;
+ const char *name;
int compare;
- int (*fp)();
+ int (*fp) P((IOBUF*,const char *,const char *));
IOBUF iob;
} table[] = {
{ "/dev/fd/", 8, specfdopen },
@@ -805,12 +872,12 @@ char *name, *mode;
if (STREQ(name, "-"))
openfd = fileno(stdin);
- else if (STREQN(name, "/dev/", 5) && stat(name, &buf) == -1) {
+ else if (STREQN(name, "/dev/", 5) && stat((char *) name, &buf) == -1) {
int i;
for (i = 0; i < devcount; i++) {
if (STREQN(name, table[i].name, table[i].compare)) {
- IOBUF *iop = & table[i].iob;
+ iop = & table[i].iob;
if (iop->buf != NULL) {
spec_setup(iop, 0, 0);
@@ -939,8 +1006,9 @@ struct redirect *rp;
#else /* PIPES_SIMULATED */
/* use temporary file rather than pipe */
+ /* except if popen() provides real pipes too */
-#ifdef VMS
+#if defined(VMS) || defined(OS2) || defined (MSDOS)
static IOBUF *
gawk_popen(cmd, rp)
char *cmd;
@@ -958,7 +1026,7 @@ gawk_pclose(rp)
struct redirect *rp;
{
int rval, aval, fd = rp->iop->fd;
- FILE *kludge = fdopen(fd, "r"); /* pclose needs FILE* w/ right fileno */
+ FILE *kludge = fdopen(fd, (char *) "r"); /* pclose needs FILE* w/ right fileno */
rp->iop->fd = dup(fd); /* kludge to allow close() + pclose() */
rval = iop_close(rp->iop);
@@ -966,7 +1034,7 @@ struct redirect *rp;
aval = pclose(kludge);
return (rval < 0 ? rval : aval);
}
-#else /* VMS */
+#else /* VMS || OS2 || MSDOS */
static
struct {
@@ -1016,7 +1084,7 @@ struct redirect *rp;
free(pipes[cur].command);
return rval;
}
-#endif /* VMS */
+#endif /* VMS || OS2 || MSDOS */
#endif /* PIPES_SIMULATED */
@@ -1041,7 +1109,7 @@ NODE *tree;
rp = redirect(tree->rnode, &redir_error);
if (rp == NULL && redir_error) { /* failed redirect */
if (! do_unix) {
- char *s = strerror(redir_error);
+ s = strerror(redir_error);
unref(ERRNO_node->var_value);
ERRNO_node->var_value =
@@ -1056,7 +1124,7 @@ NODE *tree;
errcode = 0;
cnt = get_a_record(&s, iop, *RS, & errcode);
if (! do_unix && errcode != 0) {
- char *s = strerror(errcode);
+ s = strerror(errcode);
unref(ERRNO_node->var_value);
ERRNO_node->var_value = make_string(s, strlen(s));
@@ -1102,7 +1170,7 @@ NODE *tree;
int
pathopen (file)
-char *file;
+const char *file;
{
int fd = do_pathopen(file);
@@ -1134,12 +1202,12 @@ char *file;
static int
do_pathopen (file)
-char *file;
+const char *file;
{
- static char *savepath = DEFPATH; /* defined in config.h */
+ static const char *savepath = DEFPATH; /* defined in config.h */
static int first = 1;
- char *awkpath, *cp;
- char trypath[BUFSIZ];
+ const char *awkpath;
+ char *cp, trypath[BUFSIZ];
int fd;
if (STREQ(file, "-"))
@@ -1160,7 +1228,7 @@ char *file;
if (strchr(file, ':') != strchr(file, ']')
|| strchr(file, '>') != strchr(file, '/'))
#else /*!VMS*/
-#ifdef MSDOS
+#if defined(MSDOS) || defined(OS2)
if (strchr(file, '/') != strchr(file, '\\')
|| strchr(file, ':') != NULL)
#else
@@ -1169,6 +1237,12 @@ char *file;
#endif /*VMS*/
return (devopen(file, "r"));
+#if defined(MSDOS) || defined(OS2)
+ _searchenv(file, "AWKPATH", trypath);
+ if (trypath[0] == '\0')
+ _searchenv(file, "PATH", trypath);
+ return (trypath[0] == '\0') ? 0 : devopen(trypath, "r");
+#else
do {
trypath[0] = '\0';
/* this should take into account limits on size of trypath */
@@ -1180,7 +1254,7 @@ char *file;
#ifdef VMS
if (strchr(":]>/", *(cp-1)) == NULL)
#else
-#ifdef MSDOS
+#if defined(MSDOS) || defined(OS2)
if (strchr(":\\/", *(cp-1)) == NULL)
#else
if (*(cp-1) != '/')
@@ -1204,4 +1278,5 @@ char *file;
* Therefore try to open the file in the current directory.
*/
return (devopen(file, "r"));
+#endif
}
diff --git a/gnu/usr.bin/awk/iop.c b/gnu/usr.bin/awk/iop.c
index 0d7af1213db6..897daefbeb7c 100644
--- a/gnu/usr.bin/awk/iop.c
+++ b/gnu/usr.bin/awk/iop.c
@@ -3,7 +3,7 @@
*/
/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
*
* This file is part of GAWK, the GNU implementation of the
* AWK Progamming Language.
@@ -62,7 +62,7 @@ int fd;
else if (fstat(fd, &stb) < 0)
return 8*512; /* conservative in case of DECnet access */
else
- return 24*512;
+ return 32*512;
#else
/*
@@ -146,17 +146,16 @@ int *errcode;
register char *bp = iop->off;
char *bufend;
char *start = iop->off; /* beginning of record */
- int saw_newline;
char rs;
- int eat_whitespace;
+ int saw_newline = 0, eat_whitespace = 0; /* used iff grRS==0 */
- if (iop->cnt == EOF) /* previous read hit EOF */
+ if (iop->cnt == EOF) { /* previous read hit EOF */
+ *out = NULL;
return EOF;
+ }
if (grRS == 0) { /* special case: grRS == "" */
rs = '\n';
- eat_whitespace = 0;
- saw_newline = 0;
} else
rs = (char) grRS;
@@ -181,9 +180,6 @@ int *errcode;
char *oldsplit = iop->buf + iop->secsiz;
long len; /* record length so far */
- if ((iop->flag & IOP_IS_INTERNAL) != 0)
- cant_happen();
-
len = bp - start;
if (len > iop->secsiz) {
/* expand secondary buffer */
@@ -245,7 +241,8 @@ int *errcode;
extern int default_FS;
if (default_FS && (bp == start || eat_whitespace)) {
- while (bp < iop->end && isspace(*bp))
+ while (bp < iop->end
+ && (*bp == ' ' || *bp == '\t' || *bp == '\n'))
bp++;
if (bp == iop->end) {
eat_whitespace = 1;
@@ -275,8 +272,10 @@ int *errcode;
iop->cnt = bp - start;
}
if (iop->cnt == EOF
- && (((iop->flag & IOP_IS_INTERNAL) != 0) || start == bp))
+ && (((iop->flag & IOP_IS_INTERNAL) != 0) || start == bp)) {
+ *out = NULL;
return EOF;
+ }
iop->off = bp;
bp--;
@@ -284,6 +283,10 @@ int *errcode;
bp++;
*bp = '\0';
if (grRS == 0) {
+ /* there could be more newlines left, clean 'em out now */
+ while (*(iop->off) == rs && iop->off <= iop->end)
+ (iop->off)++;
+
if (*--bp == rs)
*bp = '\0';
else
diff --git a/gnu/usr.bin/awk/main.c b/gnu/usr.bin/awk/main.c
index 77d0bf74e143..a14efffe15cf 100644
--- a/gnu/usr.bin/awk/main.c
+++ b/gnu/usr.bin/awk/main.c
@@ -3,7 +3,7 @@
*/
/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
*
* This file is part of GAWK, the GNU implementation of the
* AWK Progamming Language.
@@ -55,9 +55,9 @@ NODE *ENVIRON_node, *IGNORECASE_node;
NODE *ARGC_node, *ARGV_node, *ARGIND_node;
NODE *FIELDWIDTHS_node;
-int NF;
-int NR;
-int FNR;
+long NF;
+long NR;
+long FNR;
int IGNORECASE;
char *RS;
char *OFS;
@@ -134,10 +134,19 @@ char **argv;
{
int c;
char *scan;
+ /* the + on the front tells GNU getopt not to rearrange argv */
+ const char *optlist = "+F:f:v:W:m:";
+ int stopped_early = 0;
+ int old_optind;
extern int optind;
extern int opterr;
extern char *optarg;
- int i;
+
+#ifdef __EMX__
+ _response(&argc, &argv);
+ _wildcard(&argc, &argv);
+ setvbuf(stdout, NULL, _IOLBF, BUFSIZ);
+#endif
(void) signal(SIGFPE, (SIGTYPE (*) P((int))) catchsig);
(void) signal(SIGSEGV, (SIGTYPE (*) P((int))) catchsig);
@@ -165,7 +174,6 @@ char **argv;
Nnull_string->flags = (PERM|STR|STRING|NUM|NUMBER);
/* Set up the special variables */
-
/*
* Note that this must be done BEFORE arg parsing else -F
* breaks horribly
@@ -179,11 +187,16 @@ char **argv;
/* Tell the regex routines how they should work. . . */
resetup();
+#ifdef fpsetmask
+ fpsetmask(~0xff);
+#endif
/* we do error messages ourselves on invalid options */
opterr = 0;
- /* the + on the front tells GNU getopt not to rearrange argv */
- while ((c = getopt_long(argc, argv, "+F:f:v:W:", optab, NULL)) != EOF) {
+ /* option processing. ready, set, go! */
+ for (optopt = 0, old_optind = 1;
+ (c = getopt_long(argc, argv, optlist, optab, NULL)) != EOF;
+ optopt = 0, old_optind = optind) {
if (do_posix)
opterr = 1;
switch (c) {
@@ -215,6 +228,19 @@ char **argv;
pre_assign(optarg);
break;
+ case 'm':
+ /*
+ * Research awk extension.
+ * -mf=nnn set # fields, gawk ignores
+ * -mr=nnn set record length, ditto
+ */
+ if (do_lint)
+ warning("-m[fr] option irrelevant");
+ if ((optarg[0] != 'r' && optarg[0] != 'f')
+ || optarg[1] != '=')
+ warning("-m option usage: -m[fn]=nnn");
+ break;
+
case 'W': /* gawk specific options */
gawk_option(optarg);
break;
@@ -233,7 +259,7 @@ char **argv;
break;
case 's':
- if (strlen(optarg) == 0)
+ if (optarg[0] == '\0')
warning("empty argument to --source ignored");
else {
srcfiles[++numfiles].stype = CMDLINE;
@@ -247,6 +273,14 @@ char **argv;
break;
#endif
+ case 0:
+ /*
+ * getopt_long found an option that sets a variable
+ * instead of returning a letter. Do nothing, just
+ * cycle around for the next one.
+ */
+ break;
+
case '?':
default:
/*
@@ -254,9 +288,27 @@ char **argv;
* option stops argument processing so that it can
* go into ARGV for the awk program to see. This
* makes use of ``#! /bin/gawk -f'' easier.
+ *
+ * However, it's never simple. If optopt is set,
+ * an option that requires an argument didn't get the
+ * argument. We care because if opterr is 0, then
+ * getopt_long won't print the error message for us.
*/
- if (! do_posix)
+ if (! do_posix
+ && (optopt == 0 || strchr(optlist, optopt) == NULL)) {
+ /*
+ * can't just do optind--. In case of an
+ * option with >=2 letters, getopt_long
+ * won't have incremented optind.
+ */
+ optind = old_optind;
+ stopped_early = 1;
goto out;
+ } else if (optopt)
+ /* Use 1003.2 required message format */
+ fprintf (stderr,
+ "%s: option requires an argument -- %c\n",
+ myname, optopt);
/* else
let getopt print error message for us */
break;
@@ -267,6 +319,14 @@ out:
if (do_nostalgia)
nostalgia();
+ /* check for POSIXLY_CORRECT environment variable */
+ if (! do_posix && getenv("POSIXLY_CORRECT") != NULL) {
+ do_posix = 1;
+ if (do_lint)
+ warning(
+ "environment variable `POSIXLY_CORRECT' set: turning on --posix");
+ }
+
/* POSIX compliance also implies no Unix extensions either */
if (do_posix)
do_unix = 1;
@@ -278,7 +338,7 @@ out:
output_is_tty = 1;
/* No -f or --source options, use next arg */
if (numfiles == -1) {
- if (optind > argc - 1) /* no args left */
+ if (optind > argc - 1 || stopped_early) /* no args left or no program */
usage(1);
srcfiles[++numfiles].stype = CMDLINE;
srcfiles[numfiles].val = argv[optind];
@@ -294,6 +354,10 @@ out:
/* Set up the field variables */
init_fields();
+ if (do_lint && begin_block == NULL && expression_value == NULL
+ && end_block == NULL)
+ warning("no program");
+
if (begin_block) {
in_begin_rule = 1;
(void) interpret(begin_block);
@@ -318,25 +382,29 @@ static void
usage(exitval)
int exitval;
{
- char *opt1 = " -f progfile [--]";
- char *opt2 = " [--] 'program'";
- char *regops = " [POSIX or GNU style options]";
+ const char *opt1 = " -f progfile [--]";
+#if defined(MSDOS) || defined(OS2) || defined(VMS)
+ const char *opt2 = " [--] \"program\"";
+#else
+ const char *opt2 = " [--] 'program'";
+#endif
+ const char *regops = " [POSIX or GNU style options]";
- version();
- fprintf(stderr, "usage: %s%s%s file ...\n %s%s%s file ...\n",
+ fprintf(stderr, "Usage:\t%s%s%s file ...\n\t%s%s%s file ...\n",
myname, regops, opt1, myname, regops, opt2);
/* GNU long options info. Gack. */
- fputs("\nPOSIX options:\t\tGNU long options:\n", stderr);
+ fputs("POSIX options:\t\tGNU long options:\n", stderr);
fputs("\t-f progfile\t\t--file=progfile\n", stderr);
fputs("\t-F fs\t\t\t--field-separator=fs\n", stderr);
fputs("\t-v var=val\t\t--assign=var=val\n", stderr);
+ fputs("\t-m[fr]=val\n", stderr);
fputs("\t-W compat\t\t--compat\n", stderr);
fputs("\t-W copyleft\t\t--copyleft\n", stderr);
fputs("\t-W copyright\t\t--copyright\n", stderr);
fputs("\t-W help\t\t\t--help\n", stderr);
fputs("\t-W lint\t\t\t--lint\n", stderr);
-#if 0
+#ifdef NOSTALGIA
fputs("\t-W nostalgia\t\t--nostalgia\n", stderr);
#endif
#ifdef DEBUG
@@ -371,7 +439,6 @@ GNU General Public License for more details.\n\
along with this program; if not, write to the Free Software\n\
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.\n";
- version();
fputs(blurb_part1, stderr);
fputs(blurb_part2, stderr);
fputs(blurb_part3, stderr);
@@ -383,7 +450,8 @@ cmdline_fs(str)
char *str;
{
register NODE **tmp;
- int len = strlen(str);
+ /* int len = strlen(str); *//* don't do that - we want to
+ avoid mismatched types */
tmp = get_lhs(FS_node, (Func_ptr *) 0);
unref(*tmp);
@@ -400,7 +468,7 @@ char *str;
if (do_unix && ! do_posix)
str[0] = '\t';
}
- *tmp = make_str_node(str, len, SCAN); /* do process escapes */
+ *tmp = make_str_node(str, strlen(str), SCAN); /* do process escapes */
set_FS();
}
@@ -432,9 +500,9 @@ char **argv;
*/
struct varinit {
NODE **spec;
- char *name;
+ const char *name;
NODETYPE type;
- char *strval;
+ const char *strval;
AWKNUM numval;
Func_ptr assign;
};
@@ -446,7 +514,7 @@ static struct varinit varinit[] = {
{&FS_node, "FS", Node_FS, " ", 0, 0 },
{&RS_node, "RS", Node_RS, "\n", 0, set_RS },
{&IGNORECASE_node, "IGNORECASE", Node_IGNORECASE, 0, 0, set_IGNORECASE },
-{&FILENAME_node, "FILENAME", Node_var, "-", 0, 0 },
+{&FILENAME_node, "FILENAME", Node_var, "", 0, 0 },
{&OFS_node, "OFS", Node_OFS, " ", 0, set_OFS },
{&ORS_node, "ORS", Node_ORS, "\n", 0, set_ORS },
{&OFMT_node, "OFMT", Node_OFMT, "%.6g", 0, set_OFMT },
@@ -465,9 +533,10 @@ init_vars()
register struct varinit *vp;
for (vp = varinit; vp->name; vp++) {
- *(vp->spec) = install(vp->name,
+ *(vp->spec) = install((char *) vp->name,
node(vp->strval == 0 ? make_number(vp->numval)
- : make_string(vp->strval, strlen(vp->strval)),
+ : make_string((char *) vp->strval,
+ strlen(vp->strval)),
vp->type, (NODE *) NULL));
if (vp->assign)
(*(vp->assign))();
@@ -477,7 +546,7 @@ init_vars()
void
load_environ()
{
-#if !defined(MSDOS) && !(defined(VMS) && defined(__DECC))
+#if !defined(MSDOS) && !defined(OS2) && !(defined(VMS) && defined(__DECC))
extern char **environ;
#endif
register char *var, *val;
@@ -510,7 +579,8 @@ char *
arg_assign(arg)
char *arg;
{
- char *cp;
+ char *cp, *cp2;
+ int badvar;
Func_ptr after_assign = NULL;
NODE *var;
NODE *it;
@@ -519,6 +589,19 @@ char *arg;
cp = strchr(arg, '=');
if (cp != NULL) {
*cp++ = '\0';
+ /* first check that the variable name has valid syntax */
+ badvar = 0;
+ if (! isalpha(arg[0]) && arg[0] != '_')
+ badvar = 1;
+ else
+ for (cp2 = arg+1; *cp2; cp2++)
+ if (! isalnum(*cp2) && *cp2 != '_') {
+ badvar = 1;
+ break;
+ }
+ if (badvar)
+ fatal("illegal name `%s' in variable assignment", arg);
+
/*
* Recent versions of nawk expand escapes inside assignments.
* This makes sense, so we do it too.
@@ -657,7 +740,7 @@ char *optstr;
if (strncasecmp(cp, "source=", 7) != 0)
goto unknown;
cp += 7;
- if (strlen(cp) == 0)
+ if (cp[0] == '\0')
warning("empty argument to -Wsource ignored");
else {
srcfiles[++numfiles].stype = CMDLINE;
@@ -689,40 +772,43 @@ static void
version()
{
fprintf(stderr, "%s, patchlevel %d\n", version_string, PATCHLEVEL);
+ /* per GNU coding standards, exit successfully, do nothing else */
+ exit(0);
}
-/* static */
+/* this mess will improve in 2.16 */
char *
gawk_name(filespec)
char *filespec;
{
- char *p;
-
+ char *p;
+
#ifdef VMS /* "device:[root.][directory.subdir]GAWK.EXE;n" -> "GAWK" */
- char *q;
+ char *q;
- p = strrchr(filespec, ']'); /* directory punctuation */
- q = strrchr(filespec, '>'); /* alternate <international> punct */
+ p = strrchr(filespec, ']'); /* directory punctuation */
+ q = strrchr(filespec, '>'); /* alternate <international> punct */
- if (p == NULL || q > p) p = q;
+ if (p == NULL || q > p) p = q;
p = strdup(p == NULL ? filespec : (p + 1));
if ((q = strrchr(p, '.')) != NULL) *q = '\0'; /* strip .typ;vers */
return p;
#endif /*VMS*/
-#if defined(MSDOS) || defined(atarist)
- char *q;
+#if defined(MSDOS) || defined(OS2) || defined(atarist)
+ char *q;
- p = filespec;
-
- if (q = strrchr(p, '\\'))
- p = q + 1;
- if (q = strchr(p, '.'))
- *q = '\0';
- strlwr(p);
+ for (p = filespec; (p = strchr(p, '\\')); *p = '/')
+ ;
+ p = filespec;
+ if ((q = strrchr(p, '/')))
+ p = q + 1;
+ if ((q = strchr(p, '.')))
+ *q = '\0';
+ strlwr(p);
- return (p == NULL ? filespec : p);
+ return (p == NULL ? filespec : p);
#endif /* MSDOS || atarist */
/* "path/name" -> "name" */
diff --git a/gnu/usr.bin/awk/msg.c b/gnu/usr.bin/awk/msg.c
index b60fe9d1e5e9..4bd9f901b559 100644
--- a/gnu/usr.bin/awk/msg.c
+++ b/gnu/usr.bin/awk/msg.c
@@ -3,7 +3,7 @@
*/
/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
*
* This file is part of GAWK, the GNU implementation of the
* AWK Progamming Language.
@@ -31,8 +31,8 @@ char *source = NULL;
/* VARARGS2 */
void
err(s, emsg, argp)
-char *s;
-char *emsg;
+const char *s;
+const char *emsg;
va_list argp;
{
char *file;
@@ -49,9 +49,10 @@ va_list argp;
}
if (FNR) {
file = FILENAME_node->var_value->stptr;
+ (void) putc('(', stderr);
if (file)
- (void) fprintf(stderr, "(FILENAME=%s ", file);
- (void) fprintf(stderr, "FNR=%d) ", FNR);
+ (void) fprintf(stderr, "FILENAME=%s ", file);
+ (void) fprintf(stderr, "FNR=%ld) ", FNR);
}
(void) fprintf(stderr, s);
vfprintf(stderr, emsg, argp);
diff --git a/gnu/usr.bin/awk/node.c b/gnu/usr.bin/awk/node.c
index 65ecb0ed1723..dca4ad191b9f 100644
--- a/gnu/usr.bin/awk/node.c
+++ b/gnu/usr.bin/awk/node.c
@@ -3,7 +3,7 @@
*/
/*
- * Copyright (C) 1986, 1988, 1989, 1991, 1992 the Free Software Foundation, Inc.
+ * Copyright (C) 1986, 1988, 1989, 1991, 1992, 1993 the Free Software Foundation, Inc.
*
* This file is part of GAWK, the GNU implementation of the
* AWK Progamming Language.
@@ -35,7 +35,7 @@ register NODE *n;
register char *cpend;
char save;
char *ptr;
- unsigned int newflags = 0;
+ unsigned int newflags;
#ifdef DEBUG
if (n == NULL)
@@ -69,7 +69,8 @@ register NODE *n;
if (n->flags & MAYBE_NUM) {
newflags = NUMBER;
n->flags &= ~MAYBE_NUM;
- }
+ } else
+ newflags = 0;
if (cpend - cp == 1) {
if (isdigit(*cp)) {
n->numbr = (AWKNUM)(*cp - '0');
@@ -101,7 +102,7 @@ register NODE *n;
* (more complicated) variations on this theme didn't seem to pay off, but
* systematic testing might be in order at some point
*/
-static char *values[] = {
+static const char *values[] = {
"0",
"1",
"2",
@@ -121,7 +122,7 @@ register NODE *s;
{
char buf[128];
register char *sp = buf;
- register long num = 0;
+ double val;
#ifdef DEBUG
if (s == NULL) cant_happen();
@@ -131,26 +132,56 @@ register NODE *s;
if (s->stref != 0) ; /*cant_happen();*/
#endif
- /* avoids floating point exception in DOS*/
- if ( s->numbr <= LONG_MAX && s->numbr >= -LONG_MAX)
- num = (long)s->numbr;
- if ((AWKNUM) num == s->numbr) { /* integral value */
+ /* not an integral value, or out of range */
+ if ((val = double_to_int(s->numbr)) != s->numbr
+ || val < LONG_MIN || val > LONG_MAX) {
+#ifdef GFMT_WORKAROUND
+ NODE *dummy, *r;
+ unsigned short oflags;
+ extern NODE *format_tree P((const char *, int, NODE *));
+ extern NODE **fmt_list; /* declared in eval.c */
+
+ /* create dummy node for a sole use of format_tree */
+ getnode(dummy);
+ dummy->lnode = s;
+ dummy->rnode = NULL;
+ oflags = s->flags;
+ s->flags |= PERM; /* prevent from freeing by format_tree() */
+ r = format_tree(CONVFMT, fmt_list[CONVFMTidx]->stlen, dummy);
+ s->flags = oflags;
+ s->stfmt = (char)CONVFMTidx;
+ s->stlen = r->stlen;
+ s->stptr = r->stptr;
+ freenode(r); /* Do not free_temp(r)! We want */
+ freenode(dummy); /* to keep s->stptr == r->stpr. */
+
+ goto no_malloc;
+#else
+ /*
+ * no need for a "replacement" formatting by gawk,
+ * just use sprintf
+ */
+ sprintf(sp, CONVFMT, s->numbr);
+ s->stlen = strlen(sp);
+ s->stfmt = (char)CONVFMTidx;
+#endif /* GFMT_WORKAROUND */
+ } else {
+ /* integral value */
+ /* force conversion to long only once */
+ register long num = (long) val;
if (num < NVAL && num >= 0) {
- sp = values[num];
+ sp = (char *) values[num];
s->stlen = 1;
} else {
(void) sprintf(sp, "%ld", num);
s->stlen = strlen(sp);
}
s->stfmt = -1;
- } else {
- (void) sprintf(sp, CONVFMT, s->numbr);
- s->stlen = strlen(sp);
- s->stfmt = (char)CONVFMTidx;
}
- s->stref = 1;
emalloc(s->stptr, char *, s->stlen + 2, "force_string");
memcpy(s->stptr, sp, s->stlen+1);
+no_malloc:
+ s->stref = 1;
s->flags |= STR;
return s;
}
@@ -182,7 +213,8 @@ NODE *n;
if (n->type == Node_val && (n->flags & STR)) {
r->stref = 1;
emalloc(r->stptr, char *, r->stlen + 2, "dupnode");
- memcpy(r->stptr, n->stptr, r->stlen+1);
+ memcpy(r->stptr, n->stptr, r->stlen);
+ r->stptr[r->stlen] = '\0';
}
return r;
}
@@ -285,8 +317,10 @@ more_nodes()
/* get more nodes and initialize list */
emalloc(nextfree, NODE *, NODECHUNK * sizeof(NODE), "newnode");
- for (np = nextfree; np < &nextfree[NODECHUNK - 1]; np++)
+ for (np = nextfree; np < &nextfree[NODECHUNK - 1]; np++) {
+ np->flags = 0;
np->nextp = np + 1;
+ }
np->nextp = NULL;
np = nextfree;
nextfree = nextfree->nextp;
diff --git a/gnu/usr.bin/awk/patchlevel.h b/gnu/usr.bin/awk/patchlevel.h
index c6161a1f274c..c80ca154ec62 100644
--- a/gnu/usr.bin/awk/patchlevel.h
+++ b/gnu/usr.bin/awk/patchlevel.h
@@ -1 +1 @@
-#define PATCHLEVEL 2
+#define PATCHLEVEL 5
diff --git a/gnu/usr.bin/awk/protos.h b/gnu/usr.bin/awk/protos.h
index 25af32165b02..1d4ac9987e1c 100644
--- a/gnu/usr.bin/awk/protos.h
+++ b/gnu/usr.bin/awk/protos.h
@@ -3,7 +3,7 @@
*/
/*
- * Copyright (C) 1991, 1992, the Free Software Foundation, Inc.
+ * Copyright (C) 1991, 1992, 1993 the Free Software Foundation, Inc.
*
* This file is part of GAWK, the GNU implementation of the
* AWK Progamming Language.
@@ -32,14 +32,16 @@ extern aptr_t malloc P((MALLOC_ARG_T));
extern aptr_t realloc P((aptr_t, MALLOC_ARG_T));
extern aptr_t calloc P((MALLOC_ARG_T, MALLOC_ARG_T));
+#if !defined(sun) && !defined(__sun__)
extern void free P((aptr_t));
-extern char *getenv P((char *));
+#endif
+extern char *getenv P((const char *));
extern char *strcpy P((char *, const char *));
extern char *strcat P((char *, const char *));
-extern char *strncpy P((char *, const char *, int));
extern int strcmp P((const char *, const char *));
-extern int strncmp P((const char *, const char *, int));
+extern char *strncpy P((char *, const char *, size_t));
+extern int strncmp P((const char *, const char *, size_t));
#ifndef VMS
extern char *strerror P((int));
#else
@@ -48,22 +50,29 @@ extern char *strerror P((int,...));
extern char *strchr P((const char *, int));
extern char *strrchr P((const char *, int));
extern char *strstr P((const char *s1, const char *s2));
-extern int strlen P((const char *));
+extern size_t strlen P((const char *));
extern long strtol P((const char *, char **, int));
#if !defined(_MSC_VER) && !defined(__GNU_LIBRARY__)
-extern int strftime P((char *, int, const char *, const struct tm *));
+extern size_t strftime P((char *, size_t, const char *, const struct tm *));
#endif
+#ifdef __STDC__
extern time_t time P((time_t *));
+#else
+extern long time();
+#endif
extern aptr_t memset P((aptr_t, int, size_t));
extern aptr_t memcpy P((aptr_t, const aptr_t, size_t));
extern aptr_t memmove P((aptr_t, const aptr_t, size_t));
extern aptr_t memchr P((const aptr_t, int, size_t));
extern int memcmp P((const aptr_t, const aptr_t, size_t));
-/* extern int fprintf P((FILE *, char *, ...)); */
-extern int fprintf P(());
+extern int fprintf P((FILE *, const char *, ...));
#if !defined(MSDOS) && !defined(__GNU_LIBRARY__)
-extern int fwrite P((const char *, int, int, FILE *));
+#ifdef __STDC__
+extern size_t fwrite P((const aptr_t, size_t, size_t, FILE *));
+#else
+extern int fwrite();
+#endif
extern int fputs P((const char *, FILE *));
extern int unlink P((const char *));
#endif
@@ -75,7 +84,7 @@ extern void abort P(());
extern int isatty P((int));
extern void exit P((int));
extern int system P((const char *));
-extern int sscanf P((/* char *, char *, ... */));
+extern int sscanf P((const char *, const char *, ...));
#ifndef toupper
extern int toupper P((int));
#endif
@@ -84,32 +93,30 @@ extern int tolower P((int));
#endif
extern double pow P((double x, double y));
-extern double atof P((char *));
+extern double atof P((const char *));
extern double strtod P((const char *, char **));
extern int fstat P((int, struct stat *));
extern int stat P((const char *, struct stat *));
extern off_t lseek P((int, off_t, int));
extern int fseek P((FILE *, long, int));
extern int close P((int));
-extern int creat P(());
-extern int open P(());
+extern int creat P((const char *, mode_t));
+extern int open P((const char *, int, ...));
extern int pipe P((int *));
extern int dup P((int));
extern int dup2 P((int,int));
extern int fork P(());
extern int execl P((/* char *, char *, ... */));
+#ifndef __STDC__
extern int read P((int, char *, int));
+#endif
extern int wait P((int *));
extern void _exit P((int));
-#ifndef __STDC__
-extern long time P((long *));
-#endif
-
#ifdef NON_STD_SPRINTF
-extern char *sprintf();
+extern char *sprintf P((char *, const char*, ...));
#else
-extern int sprintf();
+extern int sprintf P((char *, const char*, ...));
#endif /* SPRINTF_INT */
#undef aptr_t
diff --git a/gnu/usr.bin/awk/re.c b/gnu/usr.bin/awk/re.c
index 495b0963cadb..4ea22c232685 100644
--- a/gnu/usr.bin/awk/re.c
+++ b/gnu/usr.bin/awk/re.c
@@ -3,7 +3,7 @@
*/
/*
- * Copyright (C) 1991, 1992 the Free Software Foundation, Inc.
+ * Copyright (C) 1991, 1992, 1993 the Free Software Foundation, Inc.
*
* This file is part of GAWK, the GNU implementation of the
* AWK Progamming Language.
@@ -30,12 +30,12 @@
Regexp *
make_regexp(s, len, ignorecase, dfa)
char *s;
-int len;
+size_t len;
int ignorecase;
int dfa;
{
Regexp *rp;
- char *err;
+ const char *rerr;
char *src = s;
char *temp;
char *end = s + len;
@@ -90,7 +90,7 @@ int dfa;
*dest = '\0' ; /* Only necessary if we print dest ? */
emalloc(rp, Regexp *, sizeof(*rp), "make_regexp");
memset((char *) rp, 0, sizeof(*rp));
- emalloc(rp->pat.buffer, char *, 16, "make_regexp");
+ emalloc(rp->pat.buffer, unsigned char *, 16, "make_regexp");
rp->pat.allocated = 16;
emalloc(rp->pat.fastmap, char *, 256, "make_regexp");
@@ -99,13 +99,14 @@ int dfa;
else
rp->pat.translate = NULL;
len = dest - temp;
- if ((err = re_compile_pattern(temp, (size_t) len, &(rp->pat))) != NULL)
- fatal("%s: /%s/", err, temp);
+ if ((rerr = re_compile_pattern(temp, len, &(rp->pat))) != NULL)
+ fatal("%s: /%s/", rerr, temp);
if (dfa && !ignorecase) {
- regcompile(temp, len, &(rp->dfareg), 1);
+ dfacomp(temp, len, &(rp->dfareg), 1);
rp->dfa = 1;
} else
rp->dfa = 0;
+
free(temp);
return rp;
}
@@ -115,24 +116,24 @@ research(rp, str, start, len, need_start)
Regexp *rp;
register char *str;
int start;
-register int len;
+register size_t len;
int need_start;
{
char *ret = str;
if (rp->dfa) {
- char save1;
- char save2;
+ char save;
int count = 0;
int try_backref;
- save1 = str[start+len];
- str[start+len] = '\n';
- save2 = str[start+len+1];
- ret = regexecute(&(rp->dfareg), str+start, str+start+len+1, 1,
+ /*
+ * dfa likes to stick a '\n' right after the matched
+ * text. So we just save and restore the character.
+ */
+ save = str[start+len];
+ ret = dfaexec(&(rp->dfareg), str+start, str+start+len, 1,
&count, &try_backref);
- str[start+len] = save1;
- str[start+len+1] = save2;
+ str[start+len] = save;
}
if (ret) {
if (need_start || rp->dfa == 0)
@@ -151,12 +152,12 @@ Regexp *rp;
free(rp->pat.buffer);
free(rp->pat.fastmap);
if (rp->dfa)
- reg_free(&(rp->dfareg));
+ dfafree(&(rp->dfareg));
free(rp);
}
void
-reg_error(s)
+dfaerror(s)
const char *s;
{
fatal(s);
@@ -194,7 +195,8 @@ NODE *t;
t->re_text = dupnode(t1);
free_temp(t1);
}
- t->re_reg = make_regexp(t->re_text->stptr, t->re_text->stlen, IGNORECASE, t->re_cnt);
+ t->re_reg = make_regexp(t->re_text->stptr, t->re_text->stlen,
+ IGNORECASE, t->re_cnt);
t->re_flags &= ~CASE;
t->re_flags |= IGNORECASE;
return t->re_reg;
@@ -203,6 +205,8 @@ NODE *t;
void
resetup()
{
- (void) re_set_syntax(RE_SYNTAX_AWK);
- regsyntax(RE_SYNTAX_AWK, 0);
+ reg_syntax_t syn = RE_SYNTAX_AWK;
+
+ (void) re_set_syntax(syn);
+ dfasyntax(syn, 0);
}
diff --git a/gnu/usr.bin/awk/regex.c b/gnu/usr.bin/awk/regex.c
index f4dd4c2cd24d..6a36f3de8178 100644
--- a/gnu/usr.bin/awk/regex.c
+++ b/gnu/usr.bin/awk/regex.c
@@ -1,9 +1,13 @@
-/* Extended regular expression matching and search library.
- Copyright (C) 1985, 1989-90 Free Software Foundation, Inc.
+/* Extended regular expression matching and search library,
+ version 0.12.
+ (Implements POSIX draft P10003.2/D11.2, except for
+ internationalization features.)
+
+ Copyright (C) 1993 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 1, or (at your option)
+ the Free Software Foundation; either version 2, or (at your option)
any later version.
This program is distributed in the hope that it will be useful,
@@ -15,75 +19,63 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
+/* AIX requires this to be the first thing in the file. */
+#if defined (_AIX) && !defined (REGEX_MALLOC)
+ #pragma alloca
+#endif
-/* To test, compile with -Dtest. This Dtestable feature turns this into
- a self-contained program which reads a pattern, describes how it
- compiles, then reads a string and searches for it.
-
- On the other hand, if you compile with both -Dtest and -Dcanned you
- can run some tests we've already thought of. */
+#define _GNU_SOURCE
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
-#ifdef emacs
+#if defined(STDC_HEADERS) && !defined(emacs)
+#include <stddef.h>
+#else
+/* We need this for `regex.h', and perhaps for the Emacs include files. */
+#include <sys/types.h>
+#endif
-/* The `emacs' switch turns on certain special matching commands
- that make sense only in emacs. */
+/* The `emacs' switch turns on certain matching commands
+ that make sense only in Emacs. */
+#ifdef emacs
#include "lisp.h"
#include "buffer.h"
#include "syntax.h"
-/* We write fatal error messages on standard error. */
-#include <stdio.h>
-
-/* isalpha(3) etc. are used for the character classes. */
-#include <ctype.h>
-
-#else /* not emacs */
+/* Emacs uses `NULL' as a predicate. */
+#undef NULL
-#include "awk.h"
+#else /* not emacs */
-#define NO_ALLOCA /* try it out for now */
-#ifndef NO_ALLOCA
-/* Make alloca work the best possible way. */
-#ifdef __GNUC__
-#ifndef atarist
-#ifndef alloca
-#define alloca __builtin_alloca
+/* We used to test for `BSTRING' here, but only GCC and Emacs define
+ `BSTRING', as far as I know, and neither of them use this code. */
+#if HAVE_STRING_H || STDC_HEADERS
+#include <string.h>
+#ifndef bcmp
+#define bcmp(s1, s2, n) memcmp ((s1), (s2), (n))
+#endif
+#ifndef bcopy
+#define bcopy(s, d, n) memcpy ((d), (s), (n))
+#endif
+#ifndef bzero
+#define bzero(s, n) memset ((s), 0, (n))
#endif
-#endif /* atarist */
#else
-#if defined(sparc) && !defined(__GNUC__)
-#include <alloca.h>
+#include <strings.h>
+#endif
+
+#ifdef STDC_HEADERS
+#include <stdlib.h>
#else
-char *alloca ();
+char *malloc ();
+char *realloc ();
#endif
-#endif /* __GNUC__ */
-
-#define FREE_AND_RETURN_VOID(stackb) return
-#define FREE_AND_RETURN(stackb,val) return(val)
-#define DOUBLE_STACK(stackx,stackb,len) \
- (stackx = (unsigned char **) alloca (2 * len \
- * sizeof (unsigned char *)),\
- /* Only copy what is in use. */ \
- (unsigned char **) memcpy (stackx, stackb, len * sizeof (char *)))
-#else /* NO_ALLOCA defined */
-#define FREE_AND_RETURN_VOID(stackb) free(stackb);return
-#define FREE_AND_RETURN(stackb,val) free(stackb);return(val)
-#define DOUBLE_STACK(stackx,stackb,len) \
- (unsigned char **) realloc (stackb, 2 * len * sizeof (unsigned char *))
-#endif /* NO_ALLOCA */
-
-static void store_jump P((char *, int, char *));
-static void insert_jump P((int, char *, char *, char *));
-static void store_jump_n P((char *, int, char *, unsigned));
-static void insert_jump_n P((int, char *, char *, char *, unsigned));
-static void insert_op_2 P((int, char *, char *, int, int ));
-static int memcmp_translate P((unsigned char *, unsigned char *,
- int, unsigned char *));
-long re_set_syntax P((long));
-
-/* Define the syntax stuff, so we can do the \<, \>, etc. */
+
+
+/* Define the syntax stuff for \<, \>, etc. */
/* This must be nonzero for the wordchar and notwordchar pattern
commands in re_match_2. */
@@ -91,18 +83,16 @@ long re_set_syntax P((long));
#define Sword 1
#endif
-#define SYNTAX(c) re_syntax_table[c]
-
-
#ifdef SYNTAX_TABLE
-char *re_syntax_table;
+extern char *re_syntax_table;
#else /* not SYNTAX_TABLE */
-static char re_syntax_table[256];
-static void init_syntax_once P((void));
+/* How many characters in the character set. */
+#define CHAR_SET_SIZE 256
+static char re_syntax_table[CHAR_SET_SIZE];
static void
init_syntax_once ()
@@ -113,7 +103,7 @@ init_syntax_once ()
if (done)
return;
- memset (re_syntax_table, 0, sizeof re_syntax_table);
+ bzero (re_syntax_table, sizeof re_syntax_table);
for (c = 'a'; c <= 'z'; c++)
re_syntax_table[c] = Sword;
@@ -123,1504 +113,2900 @@ init_syntax_once ()
for (c = '0'; c <= '9'; c++)
re_syntax_table[c] = Sword;
-
- /* Add specific syntax for ISO Latin-1. */
- for (c = 0300; c <= 0377; c++)
- re_syntax_table[c] = Sword;
- re_syntax_table[0327] = 0;
- re_syntax_table[0367] = 0;
+
+ re_syntax_table['_'] = Sword;
done = 1;
}
-#endif /* SYNTAX_TABLE */
-#undef P
-#endif /* emacs */
+#endif /* not SYNTAX_TABLE */
+#define SYNTAX(c) re_syntax_table[c]
-/* Sequents are missing isgraph. */
-#ifndef isgraph
-#define isgraph(c) (isprint((c)) && !isspace((c)))
-#endif
-
+#endif /* not emacs */
+
/* Get the interface, including the syntax bits. */
#include "regex.h"
+/* isalpha etc. are used for the character classes. */
+#include <ctype.h>
-/* These are the command codes that appear in compiled regular
- expressions, one per byte. Some command codes are followed by
- argument bytes. A command code can specify any interpretation
- whatsoever for its arguments. Zero-bytes may appear in the compiled
- regular expression.
+/* Jim Meyering writes:
+
+ "... Some ctype macros are valid only for character codes that
+ isascii says are ASCII (SGI's IRIX-4.0.5 is one such system --when
+ using /bin/cc or gcc but without giving an ansi option). So, all
+ ctype uses should be through macros like ISPRINT... If
+ STDC_HEADERS is defined, then autoconf has verified that the ctype
+ macros don't need to be guarded with references to isascii. ...
+ Defining isascii to 1 should let any compiler worth its salt
+ eliminate the && through constant folding." */
+#if ! defined (isascii) || defined (STDC_HEADERS)
+#undef isascii
+#define isascii(c) 1
+#endif
+
+#ifdef isblank
+#define ISBLANK(c) (isascii (c) && isblank (c))
+#else
+#define ISBLANK(c) ((c) == ' ' || (c) == '\t')
+#endif
+#ifdef isgraph
+#define ISGRAPH(c) (isascii (c) && isgraph (c))
+#else
+#define ISGRAPH(c) (isascii (c) && isprint (c) && !isspace (c))
+#endif
+
+#define ISPRINT(c) (isascii (c) && isprint (c))
+#define ISDIGIT(c) (isascii (c) && isdigit (c))
+#define ISALNUM(c) (isascii (c) && isalnum (c))
+#define ISALPHA(c) (isascii (c) && isalpha (c))
+#define ISCNTRL(c) (isascii (c) && iscntrl (c))
+#define ISLOWER(c) (isascii (c) && islower (c))
+#define ISPUNCT(c) (isascii (c) && ispunct (c))
+#define ISSPACE(c) (isascii (c) && isspace (c))
+#define ISUPPER(c) (isascii (c) && isupper (c))
+#define ISXDIGIT(c) (isascii (c) && isxdigit (c))
+
+#ifndef NULL
+#define NULL 0
+#endif
+
+/* We remove any previous definition of `SIGN_EXTEND_CHAR',
+ since ours (we hope) works properly with all combinations of
+ machines, compilers, `char' and `unsigned char' argument types.
+ (Per Bothner suggested the basic approach.) */
+#undef SIGN_EXTEND_CHAR
+#if __STDC__
+#define SIGN_EXTEND_CHAR(c) ((signed char) (c))
+#else /* not __STDC__ */
+/* As in Harbison and Steele. */
+#define SIGN_EXTEND_CHAR(c) ((((unsigned char) (c)) ^ 128) - 128)
+#endif
+
+/* Should we use malloc or alloca? If REGEX_MALLOC is not defined, we
+ use `alloca' instead of `malloc'. This is because using malloc in
+ re_search* or re_match* could cause memory leaks when C-g is used in
+ Emacs; also, malloc is slower and causes storage fragmentation. On
+ the other hand, malloc is more portable, and easier to debug.
- The value of `exactn' is needed in search.c (search_buffer) in emacs.
+ Because we sometimes use alloca, some routines have to be macros,
+ not functions -- `alloca'-allocated space disappears at the end of the
+ function it is called in. */
+
+#ifdef REGEX_MALLOC
+
+#define REGEX_ALLOCATE malloc
+#define REGEX_REALLOCATE(source, osize, nsize) realloc (source, nsize)
+
+#else /* not REGEX_MALLOC */
+
+/* Emacs already defines alloca, sometimes. */
+#ifndef alloca
+
+/* Make alloca work the best possible way. */
+#ifdef __GNUC__
+#define alloca __builtin_alloca
+#else /* not __GNUC__ */
+#if HAVE_ALLOCA_H
+#include <alloca.h>
+#else /* not __GNUC__ or HAVE_ALLOCA_H */
+#ifndef _AIX /* Already did AIX, up at the top. */
+char *alloca ();
+#endif /* not _AIX */
+#endif /* not HAVE_ALLOCA_H */
+#endif /* not __GNUC__ */
+
+#endif /* not alloca */
+
+#define REGEX_ALLOCATE alloca
+
+/* Assumes a `char *destination' variable. */
+#define REGEX_REALLOCATE(source, osize, nsize) \
+ (destination = (char *) alloca (nsize), \
+ bcopy (source, destination, osize), \
+ destination)
+
+#endif /* not REGEX_MALLOC */
+
+
+/* True if `size1' is non-NULL and PTR is pointing anywhere inside
+ `string1' or just past its end. This works if PTR is NULL, which is
+ a good thing. */
+#define FIRST_STRING_P(ptr) \
+ (size1 && string1 <= (ptr) && (ptr) <= string1 + size1)
+
+/* (Re)Allocate N items of type T using malloc, or fail. */
+#define TALLOC(n, t) ((t *) malloc ((n) * sizeof (t)))
+#define RETALLOC(addr, n, t) ((addr) = (t *) realloc (addr, (n) * sizeof (t)))
+#define REGEX_TALLOC(n, t) ((t *) REGEX_ALLOCATE ((n) * sizeof (t)))
+
+#define BYTEWIDTH 8 /* In bits. */
+
+#define STREQ(s1, s2) ((strcmp (s1, s2) == 0))
+
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+#define MIN(a, b) ((a) < (b) ? (a) : (b))
+
+typedef char boolean;
+#define false 0
+#define true 1
+
+/* These are the command codes that appear in compiled regular
+ expressions. Some opcodes are followed by argument bytes. A
+ command code can specify any interpretation whatsoever for its
+ arguments. Zero bytes may appear in the compiled regular expression.
+
+ The value of `exactn' is needed in search.c (search_buffer) in Emacs.
So regex.h defines a symbol `RE_EXACTN_VALUE' to be 1; the value of
`exactn' we use here must also be 1. */
-enum regexpcode
- {
- unused=0,
- exactn=1, /* Followed by one byte giving n, then by n literal bytes. */
- begline, /* Fail unless at beginning of line. */
- endline, /* Fail unless at end of line. */
- jump, /* Followed by two bytes giving relative address to jump to. */
- on_failure_jump, /* Followed by two bytes giving relative address of
- place to resume at in case of failure. */
- finalize_jump, /* Throw away latest failure point and then jump to
- address. */
- maybe_finalize_jump, /* Like jump but finalize if safe to do so.
- This is used to jump back to the beginning
- of a repeat. If the command that follows
- this jump is clearly incompatible with the
- one at the beginning of the repeat, such that
- we can be sure that there is no use backtracking
- out of repetitions already completed,
- then we finalize. */
- dummy_failure_jump, /* Jump, and push a dummy failure point. This
- failure point will be thrown away if an attempt
- is made to use it for a failure. A + construct
- makes this before the first repeat. Also
- use it as an intermediary kind of jump when
- compiling an or construct. */
- succeed_n, /* Used like on_failure_jump except has to succeed n times;
- then gets turned into an on_failure_jump. The relative
- address following it is useless until then. The
- address is followed by two bytes containing n. */
- jump_n, /* Similar to jump, but jump n times only; also the relative
- address following is in turn followed by yet two more bytes
- containing n. */
- set_number_at, /* Set the following relative location to the
- subsequent number. */
- anychar, /* Matches any (more or less) one character. */
- charset, /* Matches any one char belonging to specified set.
- First following byte is number of bitmap bytes.
- Then come bytes for a bitmap saying which chars are in.
- Bits in each byte are ordered low-bit-first.
- A character is in the set if its bit is 1.
- A character too large to have a bit in the map
- is automatically not in the set. */
- charset_not, /* Same parameters as charset, but match any character
- that is not one of those specified. */
- start_memory, /* Start remembering the text that is matched, for
- storing in a memory register. Followed by one
- byte containing the register number. Register numbers
- must be in the range 0 through RE_NREGS. */
- stop_memory, /* Stop remembering the text that is matched
- and store it in a memory register. Followed by
- one byte containing the register number. Register
- numbers must be in the range 0 through RE_NREGS. */
- duplicate, /* Match a duplicate of something remembered.
- Followed by one byte containing the index of the memory
- register. */
- before_dot, /* Succeeds if before point. */
- at_dot, /* Succeeds if at point. */
- after_dot, /* Succeeds if after point. */
- begbuf, /* Succeeds if at beginning of buffer. */
- endbuf, /* Succeeds if at end of buffer. */
- wordchar, /* Matches any word-constituent character. */
- notwordchar, /* Matches any char that is not a word-constituent. */
- wordbeg, /* Succeeds if at word beginning. */
- wordend, /* Succeeds if at word end. */
- wordbound, /* Succeeds if at a word boundary. */
- notwordbound,/* Succeeds if not at a word boundary. */
- syntaxspec, /* Matches any character whose syntax is specified.
- followed by a byte which contains a syntax code,
- e.g., Sword. */
- notsyntaxspec /* Matches any character whose syntax differs from
- that specified. */
- };
-
+typedef enum
+{
+ no_op = 0,
+
+ /* Followed by one byte giving n, then by n literal bytes. */
+ exactn = 1,
+
+ /* Matches any (more or less) character. */
+ anychar,
+
+ /* Matches any one char belonging to specified set. First
+ following byte is number of bitmap bytes. Then come bytes
+ for a bitmap saying which chars are in. Bits in each byte
+ are ordered low-bit-first. A character is in the set if its
+ bit is 1. A character too large to have a bit in the map is
+ automatically not in the set. */
+ charset,
+
+ /* Same parameters as charset, but match any character that is
+ not one of those specified. */
+ charset_not,
+
+ /* Start remembering the text that is matched, for storing in a
+ register. Followed by one byte with the register number, in
+ the range 0 to one less than the pattern buffer's re_nsub
+ field. Then followed by one byte with the number of groups
+ inner to this one. (This last has to be part of the
+ start_memory only because we need it in the on_failure_jump
+ of re_match_2.) */
+ start_memory,
+
+ /* Stop remembering the text that is matched and store it in a
+ memory register. Followed by one byte with the register
+ number, in the range 0 to one less than `re_nsub' in the
+ pattern buffer, and one byte with the number of inner groups,
+ just like `start_memory'. (We need the number of inner
+ groups here because we don't have any easy way of finding the
+ corresponding start_memory when we're at a stop_memory.) */
+ stop_memory,
+
+ /* Match a duplicate of something remembered. Followed by one
+ byte containing the register number. */
+ duplicate,
+
+ /* Fail unless at beginning of line. */
+ begline,
+
+ /* Fail unless at end of line. */
+ endline,
+
+ /* Succeeds if at beginning of buffer (if emacs) or at beginning
+ of string to be matched (if not). */
+ begbuf,
+
+ /* Analogously, for end of buffer/string. */
+ endbuf,
-/* Number of failure points to allocate space for initially,
- when matching. If this number is exceeded, more space is allocated,
- so it is not a hard limit. */
+ /* Followed by two byte relative address to which to jump. */
+ jump,
+
+ /* Same as jump, but marks the end of an alternative. */
+ jump_past_alt,
+
+ /* Followed by two-byte relative address of place to resume at
+ in case of failure. */
+ on_failure_jump,
+
+ /* Like on_failure_jump, but pushes a placeholder instead of the
+ current string position when executed. */
+ on_failure_keep_string_jump,
+
+ /* Throw away latest failure point and then jump to following
+ two-byte relative address. */
+ pop_failure_jump,
+
+ /* Change to pop_failure_jump if know won't have to backtrack to
+ match; otherwise change to jump. This is used to jump
+ back to the beginning of a repeat. If what follows this jump
+ clearly won't match what the repeat does, such that we can be
+ sure that there is no use backtracking out of repetitions
+ already matched, then we change it to a pop_failure_jump.
+ Followed by two-byte address. */
+ maybe_pop_jump,
+
+ /* Jump to following two-byte address, and push a dummy failure
+ point. This failure point will be thrown away if an attempt
+ is made to use it for a failure. A `+' construct makes this
+ before the first repeat. Also used as an intermediary kind
+ of jump when compiling an alternative. */
+ dummy_failure_jump,
+
+ /* Push a dummy failure point and continue. Used at the end of
+ alternatives. */
+ push_dummy_failure,
+
+ /* Followed by two-byte relative address and two-byte number n.
+ After matching N times, jump to the address upon failure. */
+ succeed_n,
+
+ /* Followed by two-byte relative address, and two-byte number n.
+ Jump to the address N times, then fail. */
+ jump_n,
+
+ /* Set the following two-byte relative address to the
+ subsequent two-byte number. The address *includes* the two
+ bytes of number. */
+ set_number_at,
+
+ wordchar, /* Matches any word-constituent character. */
+ notwordchar, /* Matches any char that is not a word-constituent. */
+
+ wordbeg, /* Succeeds if at word beginning. */
+ wordend, /* Succeeds if at word end. */
+
+ wordbound, /* Succeeds if at a word boundary. */
+ notwordbound /* Succeeds if not at a word boundary. */
-#ifndef NFAILURES
-#define NFAILURES 80
-#endif
+#ifdef emacs
+ ,before_dot, /* Succeeds if before point. */
+ at_dot, /* Succeeds if at point. */
+ after_dot, /* Succeeds if after point. */
-#ifdef CHAR_UNSIGNED
-#define SIGN_EXTEND_CHAR(c) ((c)>(char)127?(c)-256:(c)) /* for IBM RT */
-#endif
-#ifndef SIGN_EXTEND_CHAR
-#define SIGN_EXTEND_CHAR(x) (x)
-#endif
-
+ /* Matches any character whose syntax is specified. Followed by
+ a byte which contains a syntax code, e.g., Sword. */
+ syntaxspec,
+
+ /* Matches any character whose syntax is not that specified. */
+ notsyntaxspec
+#endif /* emacs */
+} re_opcode_t;
+
+/* Common operations on the compiled pattern. */
/* Store NUMBER in two contiguous bytes starting at DESTINATION. */
+
#define STORE_NUMBER(destination, number) \
- { (destination)[0] = (number) & 0377; \
- (destination)[1] = (number) >> 8; }
-
-/* Same as STORE_NUMBER, except increment the destination pointer to
- the byte after where the number is stored. Watch out that values for
- DESTINATION such as p + 1 won't work, whereas p will. */
-#define STORE_NUMBER_AND_INCR(destination, number) \
- { STORE_NUMBER(destination, number); \
- (destination) += 2; }
+ do { \
+ (destination)[0] = (number) & 0377; \
+ (destination)[1] = (number) >> 8; \
+ } while (0)
+
+/* Same as STORE_NUMBER, except increment DESTINATION to
+ the byte after where the number is stored. Therefore, DESTINATION
+ must be an lvalue. */
+#define STORE_NUMBER_AND_INCR(destination, number) \
+ do { \
+ STORE_NUMBER (destination, number); \
+ (destination) += 2; \
+ } while (0)
-/* Put into DESTINATION a number stored in two contingous bytes starting
+/* Put into DESTINATION a number stored in two contiguous bytes starting
at SOURCE. */
+
#define EXTRACT_NUMBER(destination, source) \
- { (destination) = *(source) & 0377; \
- (destination) += SIGN_EXTEND_CHAR (*(char *)((source) + 1)) << 8; }
+ do { \
+ (destination) = *(source) & 0377; \
+ (destination) += SIGN_EXTEND_CHAR (*((source) + 1)) << 8; \
+ } while (0)
+
+#ifdef DEBUG
+static void extract_number _RE_ARGS((int *dest, unsigned char *source));
+static void
+extract_number (dest, source)
+ int *dest;
+ unsigned char *source;
+{
+ int temp = SIGN_EXTEND_CHAR (*(source + 1));
+ *dest = *source & 0377;
+ *dest += temp << 8;
+}
+
+#ifndef EXTRACT_MACROS /* To debug the macros. */
+#undef EXTRACT_NUMBER
+#define EXTRACT_NUMBER(dest, src) extract_number (&dest, src)
+#endif /* not EXTRACT_MACROS */
+
+#endif /* DEBUG */
+
+/* Same as EXTRACT_NUMBER, except increment SOURCE to after the number.
+ SOURCE must be an lvalue. */
-/* Same as EXTRACT_NUMBER, except increment the pointer for source to
- point to second byte of SOURCE. Note that SOURCE has to be a value
- such as p, not, e.g., p + 1. */
#define EXTRACT_NUMBER_AND_INCR(destination, source) \
- { EXTRACT_NUMBER (destination, source); \
- (source) += 2; }
+ do { \
+ EXTRACT_NUMBER (destination, source); \
+ (source) += 2; \
+ } while (0)
+
+#ifdef DEBUG
+static void extract_number_and_incr _RE_ARGS((int *destination,
+ unsigned char **source));
+static void
+extract_number_and_incr (destination, source)
+ int *destination;
+ unsigned char **source;
+{
+ extract_number (destination, *source);
+ *source += 2;
+}
+#ifndef EXTRACT_MACROS
+#undef EXTRACT_NUMBER_AND_INCR
+#define EXTRACT_NUMBER_AND_INCR(dest, src) \
+ extract_number_and_incr (&dest, &src)
+#endif /* not EXTRACT_MACROS */
-/* Specify the precise syntax of regexps for compilation. This provides
- for compatibility for various utilities which historically have
- different, incompatible syntaxes.
-
- The argument SYNTAX is a bit-mask comprised of the various bits
- defined in regex.h. */
+#endif /* DEBUG */
+
+/* If DEBUG is defined, Regex prints many voluminous messages about what
+ it is doing (if the variable `debug' is nonzero). If linked with the
+ main program in `iregex.c', you can enter patterns and strings
+ interactively. And if linked with the main program in `main.c' and
+ the other test files, you can run the already-written tests. */
-long
-re_set_syntax (syntax)
- long syntax;
+#ifdef DEBUG
+
+/* We use standard I/O for debugging. */
+#include <stdio.h>
+
+/* It is useful to test things that ``must'' be true when debugging. */
+#include <assert.h>
+
+static int debug = 0;
+
+#define DEBUG_STATEMENT(e) e
+#define DEBUG_PRINT1(x) if (debug) printf (x)
+#define DEBUG_PRINT2(x1, x2) if (debug) printf (x1, x2)
+#define DEBUG_PRINT3(x1, x2, x3) if (debug) printf (x1, x2, x3)
+#define DEBUG_PRINT4(x1, x2, x3, x4) if (debug) printf (x1, x2, x3, x4)
+#define DEBUG_PRINT_COMPILED_PATTERN(p, s, e) \
+ if (debug) print_partial_compiled_pattern (s, e)
+#define DEBUG_PRINT_DOUBLE_STRING(w, s1, sz1, s2, sz2) \
+ if (debug) print_double_string (w, s1, sz1, s2, sz2)
+
+
+extern void printchar ();
+
+/* Print the fastmap in human-readable form. */
+
+void
+print_fastmap (fastmap)
+ char *fastmap;
{
- long ret;
+ unsigned was_a_range = 0;
+ unsigned i = 0;
+
+ while (i < (1 << BYTEWIDTH))
+ {
+ if (fastmap[i++])
+ {
+ was_a_range = 0;
+ printchar (i - 1);
+ while (i < (1 << BYTEWIDTH) && fastmap[i])
+ {
+ was_a_range = 1;
+ i++;
+ }
+ if (was_a_range)
+ {
+ printf ("-");
+ printchar (i - 1);
+ }
+ }
+ }
+ putchar ('\n');
+}
- ret = obscure_syntax;
- obscure_syntax = syntax;
- return ret;
+
+/* Print a compiled pattern string in human-readable form, starting at
+ the START pointer into it and ending just before the pointer END. */
+
+void
+print_partial_compiled_pattern (start, end)
+ unsigned char *start;
+ unsigned char *end;
+{
+ int mcnt, mcnt2;
+ unsigned char *p = start;
+ unsigned char *pend = end;
+
+ if (start == NULL)
+ {
+ printf ("(null)\n");
+ return;
+ }
+
+ /* Loop over pattern commands. */
+ while (p < pend)
+ {
+ printf ("%d:\t", p - start);
+
+ switch ((re_opcode_t) *p++)
+ {
+ case no_op:
+ printf ("/no_op");
+ break;
+
+ case exactn:
+ mcnt = *p++;
+ printf ("/exactn/%d", mcnt);
+ do
+ {
+ putchar ('/');
+ printchar (*p++);
+ }
+ while (--mcnt);
+ break;
+
+ case start_memory:
+ mcnt = *p++;
+ printf ("/start_memory/%d/%d", mcnt, *p++);
+ break;
+
+ case stop_memory:
+ mcnt = *p++;
+ printf ("/stop_memory/%d/%d", mcnt, *p++);
+ break;
+
+ case duplicate:
+ printf ("/duplicate/%d", *p++);
+ break;
+
+ case anychar:
+ printf ("/anychar");
+ break;
+
+ case charset:
+ case charset_not:
+ {
+ register int c, last = -100;
+ register int in_range = 0;
+
+ printf ("/charset [%s",
+ (re_opcode_t) *(p - 1) == charset_not ? "^" : "");
+
+ assert (p + *p < pend);
+
+ for (c = 0; c < 256; c++)
+ if (c / 8 < *p
+ && (p[1 + (c/8)] & (1 << (c % 8))))
+ {
+ /* Are we starting a range? */
+ if (last + 1 == c && ! in_range)
+ {
+ putchar ('-');
+ in_range = 1;
+ }
+ /* Have we broken a range? */
+ else if (last + 1 != c && in_range)
+ {
+ printchar (last);
+ in_range = 0;
+ }
+
+ if (! in_range)
+ printchar (c);
+
+ last = c;
+ }
+
+ if (in_range)
+ printchar (last);
+
+ putchar (']');
+
+ p += 1 + *p;
+ }
+ break;
+
+ case begline:
+ printf ("/begline");
+ break;
+
+ case endline:
+ printf ("/endline");
+ break;
+
+ case on_failure_jump:
+ extract_number_and_incr (&mcnt, &p);
+ printf ("/on_failure_jump to %d", p + mcnt - start);
+ break;
+
+ case on_failure_keep_string_jump:
+ extract_number_and_incr (&mcnt, &p);
+ printf ("/on_failure_keep_string_jump to %d", p + mcnt - start);
+ break;
+
+ case dummy_failure_jump:
+ extract_number_and_incr (&mcnt, &p);
+ printf ("/dummy_failure_jump to %d", p + mcnt - start);
+ break;
+
+ case push_dummy_failure:
+ printf ("/push_dummy_failure");
+ break;
+
+ case maybe_pop_jump:
+ extract_number_and_incr (&mcnt, &p);
+ printf ("/maybe_pop_jump to %d", p + mcnt - start);
+ break;
+
+ case pop_failure_jump:
+ extract_number_and_incr (&mcnt, &p);
+ printf ("/pop_failure_jump to %d", p + mcnt - start);
+ break;
+
+ case jump_past_alt:
+ extract_number_and_incr (&mcnt, &p);
+ printf ("/jump_past_alt to %d", p + mcnt - start);
+ break;
+
+ case jump:
+ extract_number_and_incr (&mcnt, &p);
+ printf ("/jump to %d", p + mcnt - start);
+ break;
+
+ case succeed_n:
+ extract_number_and_incr (&mcnt, &p);
+ extract_number_and_incr (&mcnt2, &p);
+ printf ("/succeed_n to %d, %d times", p + mcnt - start, mcnt2);
+ break;
+
+ case jump_n:
+ extract_number_and_incr (&mcnt, &p);
+ extract_number_and_incr (&mcnt2, &p);
+ printf ("/jump_n to %d, %d times", p + mcnt - start, mcnt2);
+ break;
+
+ case set_number_at:
+ extract_number_and_incr (&mcnt, &p);
+ extract_number_and_incr (&mcnt2, &p);
+ printf ("/set_number_at location %d to %d", p + mcnt - start, mcnt2);
+ break;
+
+ case wordbound:
+ printf ("/wordbound");
+ break;
+
+ case notwordbound:
+ printf ("/notwordbound");
+ break;
+
+ case wordbeg:
+ printf ("/wordbeg");
+ break;
+
+ case wordend:
+ printf ("/wordend");
+
+#ifdef emacs
+ case before_dot:
+ printf ("/before_dot");
+ break;
+
+ case at_dot:
+ printf ("/at_dot");
+ break;
+
+ case after_dot:
+ printf ("/after_dot");
+ break;
+
+ case syntaxspec:
+ printf ("/syntaxspec");
+ mcnt = *p++;
+ printf ("/%d", mcnt);
+ break;
+
+ case notsyntaxspec:
+ printf ("/notsyntaxspec");
+ mcnt = *p++;
+ printf ("/%d", mcnt);
+ break;
+#endif /* emacs */
+
+ case wordchar:
+ printf ("/wordchar");
+ break;
+
+ case notwordchar:
+ printf ("/notwordchar");
+ break;
+
+ case begbuf:
+ printf ("/begbuf");
+ break;
+
+ case endbuf:
+ printf ("/endbuf");
+ break;
+
+ default:
+ printf ("?%d", *(p-1));
+ }
+
+ putchar ('\n');
+ }
+
+ printf ("%d:\tend of pattern.\n", p - start);
+}
+
+
+void
+print_compiled_pattern (bufp)
+ struct re_pattern_buffer *bufp;
+{
+ unsigned char *buffer = bufp->buffer;
+
+ print_partial_compiled_pattern (buffer, buffer + bufp->used);
+ printf ("%d bytes used/%d bytes allocated.\n", bufp->used, bufp->allocated);
+
+ if (bufp->fastmap_accurate && bufp->fastmap)
+ {
+ printf ("fastmap: ");
+ print_fastmap (bufp->fastmap);
+ }
+
+ printf ("re_nsub: %d\t", bufp->re_nsub);
+ printf ("regs_alloc: %d\t", bufp->regs_allocated);
+ printf ("can_be_null: %d\t", bufp->can_be_null);
+ printf ("newline_anchor: %d\n", bufp->newline_anchor);
+ printf ("no_sub: %d\t", bufp->no_sub);
+ printf ("not_bol: %d\t", bufp->not_bol);
+ printf ("not_eol: %d\t", bufp->not_eol);
+ printf ("syntax: %d\n", bufp->syntax);
+ /* Perhaps we should print the translate table? */
+}
+
+
+void
+print_double_string (where, string1, size1, string2, size2)
+ const char *where;
+ const char *string1;
+ const char *string2;
+ int size1;
+ int size2;
+{
+ unsigned this_char;
+
+ if (where == NULL)
+ printf ("(null)");
+ else
+ {
+ if (FIRST_STRING_P (where))
+ {
+ for (this_char = where - string1; this_char < size1; this_char++)
+ printchar (string1[this_char]);
+
+ where = string2;
+ }
+
+ for (this_char = where - string2; this_char < size2; this_char++)
+ printchar (string2[this_char]);
+ }
}
-/* Set by re_set_syntax to the current regexp syntax to recognize. */
-long obscure_syntax = 0;
+#else /* not DEBUG */
+
+#undef assert
+#define assert(e)
+#define DEBUG_STATEMENT(e)
+#define DEBUG_PRINT1(x)
+#define DEBUG_PRINT2(x1, x2)
+#define DEBUG_PRINT3(x1, x2, x3)
+#define DEBUG_PRINT4(x1, x2, x3, x4)
+#define DEBUG_PRINT_COMPILED_PATTERN(p, s, e)
+#define DEBUG_PRINT_DOUBLE_STRING(w, s1, sz1, s2, sz2)
+#endif /* not DEBUG */
-/* Macros for re_compile_pattern, which is found below these definitions. */
+/* Set by `re_set_syntax' to the current regexp syntax to recognize. Can
+ also be assigned to arbitrarily: each pattern buffer stores its own
+ syntax, so it can be changed between regex compilations. */
+reg_syntax_t re_syntax_options = RE_SYNTAX_EMACS;
-#define CHAR_CLASS_MAX_LENGTH 6
-/* Fetch the next character in the uncompiled pattern, translating it if
- necessary. */
+/* Specify the precise syntax of regexps for compilation. This provides
+ for compatibility for various utilities which historically have
+ different, incompatible syntaxes.
+
+ The argument SYNTAX is a bit mask comprised of the various bits
+ defined in regex.h. We return the old syntax. */
+
+reg_syntax_t
+re_set_syntax (syntax)
+ reg_syntax_t syntax;
+{
+ reg_syntax_t ret = re_syntax_options;
+
+ re_syntax_options = syntax;
+ return ret;
+}
+
+/* This table gives an error message for each of the error codes listed
+ in regex.h. Obviously the order here has to be same as there. */
+
+static const char *re_error_msg[] =
+ { NULL, /* REG_NOERROR */
+ "No match", /* REG_NOMATCH */
+ "Invalid regular expression", /* REG_BADPAT */
+ "Invalid collation character", /* REG_ECOLLATE */
+ "Invalid character class name", /* REG_ECTYPE */
+ "Trailing backslash", /* REG_EESCAPE */
+ "Invalid back reference", /* REG_ESUBREG */
+ "Unmatched [ or [^", /* REG_EBRACK */
+ "Unmatched ( or \\(", /* REG_EPAREN */
+ "Unmatched \\{", /* REG_EBRACE */
+ "Invalid content of \\{\\}", /* REG_BADBR */
+ "Invalid range end", /* REG_ERANGE */
+ "Memory exhausted", /* REG_ESPACE */
+ "Invalid preceding regular expression", /* REG_BADRPT */
+ "Premature end of regular expression", /* REG_EEND */
+ "Regular expression too big", /* REG_ESIZE */
+ "Unmatched ) or \\)", /* REG_ERPAREN */
+ };
+
+/* Subroutine declarations and macros for regex_compile. */
+
+static reg_errcode_t regex_compile _RE_ARGS((const char *pattern, size_t size,
+ reg_syntax_t syntax,
+ struct re_pattern_buffer *bufp));
+static void store_op1 _RE_ARGS((re_opcode_t op, unsigned char *loc, int arg));
+static void store_op2 _RE_ARGS((re_opcode_t op, unsigned char *loc,
+ int arg1, int arg2));
+static void insert_op1 _RE_ARGS((re_opcode_t op, unsigned char *loc,
+ int arg, unsigned char *end));
+static void insert_op2 _RE_ARGS((re_opcode_t op, unsigned char *loc,
+ int arg1, int arg2, unsigned char *end));
+static boolean at_begline_loc_p _RE_ARGS((const char *pattern, const char *p,
+ reg_syntax_t syntax));
+static boolean at_endline_loc_p _RE_ARGS((const char *p, const char *pend,
+ reg_syntax_t syntax));
+static reg_errcode_t compile_range _RE_ARGS((const char **p_ptr,
+ const char *pend,
+ char *translate,
+ reg_syntax_t syntax,
+ unsigned char *b));
+
+/* Fetch the next character in the uncompiled pattern---translating it
+ if necessary. Also cast from a signed character in the constant
+ string passed to us by the user to an unsigned char that we can use
+ as an array index (in, e.g., `translate'). */
#define PATFETCH(c) \
- {if (p == pend) goto end_of_pattern; \
- c = * (unsigned char *) p++; \
- if (translate) c = translate[c]; }
+ do {if (p == pend) return REG_EEND; \
+ c = (unsigned char) *p++; \
+ if (translate) c = translate[c]; \
+ } while (0)
/* Fetch the next character in the uncompiled pattern, with no
translation. */
#define PATFETCH_RAW(c) \
- {if (p == pend) goto end_of_pattern; \
- c = * (unsigned char *) p++; }
+ do {if (p == pend) return REG_EEND; \
+ c = (unsigned char) *p++; \
+ } while (0)
+/* Go backwards one character in the pattern. */
#define PATUNFETCH p--
+/* If `translate' is non-null, return translate[D], else just D. We
+ cast the subscript to translate because some data is declared as
+ `char *', to avoid warnings when a string constant is passed. But
+ when we use a character as a subscript we must make it unsigned. */
+#define TRANSLATE(d) (translate ? translate[(unsigned char) (d)] : (d))
+
+
+/* Macros for outputting the compiled pattern into `buffer'. */
+
/* If the buffer isn't allocated when it comes in, use this. */
-#define INIT_BUF_SIZE 28
+#define INIT_BUF_SIZE 32
/* Make sure we have at least N more bytes of space in buffer. */
#define GET_BUFFER_SPACE(n) \
- { \
- while (b - bufp->buffer + (n) >= bufp->allocated) \
- EXTEND_BUFFER; \
- }
+ while (b - bufp->buffer + (n) > bufp->allocated) \
+ EXTEND_BUFFER ()
-/* Make sure we have one more byte of buffer space and then add CH to it. */
-#define BUFPUSH(ch) \
- { \
+/* Make sure we have one more byte of buffer space and then add C to it. */
+#define BUF_PUSH(c) \
+ do { \
GET_BUFFER_SPACE (1); \
- *b++ = (char) (ch); \
- }
-
-/* Extend the buffer by twice its current size via reallociation and
- reset the pointers that pointed into the old allocation to point to
- the correct places in the new allocation. If extending the buffer
- results in it being larger than 1 << 16, then flag memory exhausted. */
-#define EXTEND_BUFFER \
- { char *old_buffer = bufp->buffer; \
- if (bufp->allocated == (1L<<16)) goto too_big; \
- bufp->allocated *= 2; \
- if (bufp->allocated > (1L<<16)) bufp->allocated = (1L<<16); \
- bufp->buffer = (char *) realloc (bufp->buffer, bufp->allocated); \
- if (bufp->buffer == 0) \
- goto memory_exhausted; \
- b = (b - old_buffer) + bufp->buffer; \
- if (fixup_jump) \
- fixup_jump = (fixup_jump - old_buffer) + bufp->buffer; \
- if (laststart) \
- laststart = (laststart - old_buffer) + bufp->buffer; \
- begalt = (begalt - old_buffer) + bufp->buffer; \
- if (pending_exact) \
- pending_exact = (pending_exact - old_buffer) + bufp->buffer; \
- }
+ *b++ = (unsigned char) (c); \
+ } while (0)
+
+
+/* Ensure we have two more bytes of buffer space and then append C1 and C2. */
+#define BUF_PUSH_2(c1, c2) \
+ do { \
+ GET_BUFFER_SPACE (2); \
+ *b++ = (unsigned char) (c1); \
+ *b++ = (unsigned char) (c2); \
+ } while (0)
+
+
+/* As with BUF_PUSH_2, except for three bytes. */
+#define BUF_PUSH_3(c1, c2, c3) \
+ do { \
+ GET_BUFFER_SPACE (3); \
+ *b++ = (unsigned char) (c1); \
+ *b++ = (unsigned char) (c2); \
+ *b++ = (unsigned char) (c3); \
+ } while (0)
+
+
+/* Store a jump with opcode OP at LOC to location TO. We store a
+ relative address offset by the three bytes the jump itself occupies. */
+#define STORE_JUMP(op, loc, to) \
+ store_op1 (op, loc, (int)((to) - (loc) - 3))
+
+/* Likewise, for a two-argument jump. */
+#define STORE_JUMP2(op, loc, to, arg) \
+ store_op2 (op, loc, (int)((to) - (loc) - 3), arg)
+
+/* Like `STORE_JUMP', but for inserting. Assume `b' is the buffer end. */
+#define INSERT_JUMP(op, loc, to) \
+ insert_op1 (op, loc, (int)((to) - (loc) - 3), b)
+
+/* Like `STORE_JUMP2', but for inserting. Assume `b' is the buffer end. */
+#define INSERT_JUMP2(op, loc, to, arg) \
+ insert_op2 (op, loc, (int)((to) - (loc) - 3), arg, b)
+
+
+/* This is not an arbitrary limit: the arguments which represent offsets
+ into the pattern are two bytes long. So if 2^16 bytes turns out to
+ be too small, many things would have to change. */
+/* Any other compiler which, like MSC, has allocation limit below 2^16
+ bytes will have to use approach similar to what was done below for
+ MSC and drop MAX_BUF_SIZE a bit. Otherwise you may end up
+ reallocating to 0 bytes. Such thing is not going to work too well.
+ You have been warned!! */
+#ifdef _MSC_VER
+/* Microsoft C 16-bit versions limit malloc to approx 65512 bytes.
+ The REALLOC define eliminates a flurry of conversion warnings,
+ but is not required. */
+#define MAX_BUF_SIZE 65500L
+#define REALLOC(p,s) realloc((p), (size_t) (s))
+#else
+#define MAX_BUF_SIZE (1L << 16)
+#define REALLOC realloc
+#endif
-/* Set the bit for character C in a character set list. */
-#define SET_LIST_BIT(c) (b[(c) / BYTEWIDTH] |= 1 << ((c) % BYTEWIDTH))
+/* Extend the buffer by twice its current size via realloc and
+ reset the pointers that pointed into the old block to point to the
+ correct places in the new one. If extending the buffer results in it
+ being larger than MAX_BUF_SIZE, then flag memory exhausted. */
+#define EXTEND_BUFFER() \
+ do { \
+ unsigned char *old_buffer = bufp->buffer; \
+ if (bufp->allocated == MAX_BUF_SIZE) \
+ return REG_ESIZE; \
+ bufp->allocated <<= 1; \
+ if (bufp->allocated > MAX_BUF_SIZE) \
+ bufp->allocated = MAX_BUF_SIZE; \
+ bufp->buffer = (unsigned char *) REALLOC(bufp->buffer, bufp->allocated);\
+ if (bufp->buffer == NULL) \
+ return REG_ESPACE; \
+ /* If the buffer moved, move all the pointers into it. */ \
+ if (old_buffer != bufp->buffer) \
+ { \
+ b = (b - old_buffer) + bufp->buffer; \
+ begalt = (begalt - old_buffer) + bufp->buffer; \
+ if (fixup_alt_jump) \
+ fixup_alt_jump = (fixup_alt_jump - old_buffer) + bufp->buffer;\
+ if (laststart) \
+ laststart = (laststart - old_buffer) + bufp->buffer; \
+ if (pending_exact) \
+ pending_exact = (pending_exact - old_buffer) + bufp->buffer; \
+ } \
+ } while (0)
-/* Get the next unsigned number in the uncompiled pattern. */
-#define GET_UNSIGNED_NUMBER(num) \
- { if (p != pend) \
- { \
- PATFETCH (c); \
- while (isdigit (c)) \
- { \
- if (num < 0) \
- num = 0; \
- num = num * 10 + c - '0'; \
- if (p == pend) \
- break; \
- PATFETCH (c); \
- } \
- } \
- }
-/* Subroutines for re_compile_pattern. */
-/* static void store_jump (), insert_jump (), store_jump_n (),
- insert_jump_n (), insert_op_2 (); */
+/* Since we have one byte reserved for the register number argument to
+ {start,stop}_memory, the maximum number of groups we can report
+ things about is what fits in that byte. */
+#define MAX_REGNUM 255
+/* But patterns can have more than `MAX_REGNUM' registers. We just
+ ignore the excess. */
+typedef unsigned regnum_t;
-/* re_compile_pattern takes a regular-expression string
- and converts it into a buffer full of byte commands for matching.
- PATTERN is the address of the pattern string
- SIZE is the length of it.
- BUFP is a struct re_pattern_buffer * which points to the info
- on where to store the byte commands.
- This structure contains a char * which points to the
- actual space, which should have been obtained with malloc.
- re_compile_pattern may use realloc to grow the buffer space.
+/* Macros for the compile stack. */
- The number of bytes of commands can be found out by looking in
- the `struct re_pattern_buffer' that bufp pointed to, after
- re_compile_pattern returns. */
+/* Since offsets can go either forwards or backwards, this type needs to
+ be able to hold values from -(MAX_BUF_SIZE - 1) to MAX_BUF_SIZE - 1. */
+/* int may be not enough when sizeof(int) == 2 */
+typedef long pattern_offset_t;
-char *
-re_compile_pattern (pattern, size, bufp)
- char *pattern;
- size_t size;
- struct re_pattern_buffer *bufp;
+typedef struct
{
- register char *b = bufp->buffer;
- register char *p = pattern;
- char *pend = pattern + size;
- register unsigned c, c1;
- char *p0;
- unsigned char *translate = (unsigned char *) bufp->translate;
-
- /* Address of the count-byte of the most recently inserted `exactn'
- command. This makes it possible to tell whether a new exact-match
- character can be added to that command or requires a new `exactn'
- command. */
-
- char *pending_exact = 0;
+ pattern_offset_t begalt_offset;
+ pattern_offset_t fixup_alt_jump;
+ pattern_offset_t inner_group_offset;
+ pattern_offset_t laststart_offset;
+ regnum_t regnum;
+} compile_stack_elt_t;
- /* Address of the place where a forward-jump should go to the end of
- the containing expression. Each alternative of an `or', except the
- last, ends with a forward-jump of this sort. */
- char *fixup_jump = 0;
+typedef struct
+{
+ compile_stack_elt_t *stack;
+ unsigned size;
+ unsigned avail; /* Offset of next open position. */
+} compile_stack_type;
- /* Address of start of the most recently finished expression.
- This tells postfix * where to find the start of its operand. */
- char *laststart = 0;
+#define INIT_COMPILE_STACK_SIZE 32
- /* In processing a repeat, 1 means zero matches is allowed. */
+#define COMPILE_STACK_EMPTY (compile_stack.avail == 0)
+#define COMPILE_STACK_FULL (compile_stack.avail == compile_stack.size)
- char zero_times_ok;
+/* The next available element. */
+#define COMPILE_STACK_TOP (compile_stack.stack[compile_stack.avail])
- /* In processing a repeat, 1 means many matches is allowed. */
- char many_times_ok;
+/* Set the bit for character C in a list. */
+#define SET_LIST_BIT(c) \
+ (b[((unsigned char) (c)) / BYTEWIDTH] \
+ |= 1 << (((unsigned char) c) % BYTEWIDTH))
- /* Address of beginning of regexp, or inside of last \(. */
- char *begalt = b;
+/* Get the next unsigned number in the uncompiled pattern. */
+#define GET_UNSIGNED_NUMBER(num) \
+ { if (p != pend) \
+ { \
+ PATFETCH (c); \
+ while (ISDIGIT (c)) \
+ { \
+ if (num < 0) \
+ num = 0; \
+ num = num * 10 + c - '0'; \
+ if (p == pend) \
+ break; \
+ PATFETCH (c); \
+ } \
+ } \
+ }
+
+#define CHAR_CLASS_MAX_LENGTH 6 /* Namely, `xdigit'. */
+
+#define IS_CHAR_CLASS(string) \
+ (STREQ (string, "alpha") || STREQ (string, "upper") \
+ || STREQ (string, "lower") || STREQ (string, "digit") \
+ || STREQ (string, "alnum") || STREQ (string, "xdigit") \
+ || STREQ (string, "space") || STREQ (string, "print") \
+ || STREQ (string, "punct") || STREQ (string, "graph") \
+ || STREQ (string, "cntrl") || STREQ (string, "blank"))
+
+static boolean group_in_compile_stack _RE_ARGS((compile_stack_type
+ compile_stack,
+ regnum_t regnum));
+
+/* `regex_compile' compiles PATTERN (of length SIZE) according to SYNTAX.
+ Returns one of error codes defined in `regex.h', or zero for success.
+
+ Assumes the `allocated' (and perhaps `buffer') and `translate'
+ fields are set in BUFP on entry.
+
+ If it succeeds, results are put in BUFP (if it returns an error, the
+ contents of BUFP are undefined):
+ `buffer' is the compiled pattern;
+ `syntax' is set to SYNTAX;
+ `used' is set to the length of the compiled pattern;
+ `fastmap_accurate' is zero;
+ `re_nsub' is the number of subexpressions in PATTERN;
+ `not_bol' and `not_eol' are zero;
+
+ The `fastmap' and `newline_anchor' fields are neither
+ examined nor set. */
- /* In processing an interval, at least this many matches must be made. */
- int lower_bound;
+static reg_errcode_t
+regex_compile (pattern, size, syntax, bufp)
+ const char *pattern;
+ size_t size;
+ reg_syntax_t syntax;
+ struct re_pattern_buffer *bufp;
+{
+ /* We fetch characters from PATTERN here. Even though PATTERN is
+ `char *' (i.e., signed), we declare these variables as unsigned, so
+ they can be reliably used as array indices. */
+ register unsigned char c, c1;
+
+ /* A random tempory spot in PATTERN. */
+ const char *p1;
- /* In processing an interval, at most this many matches can be made. */
- int upper_bound;
+ /* Points to the end of the buffer, where we should append. */
+ register unsigned char *b;
+
+ /* Keeps track of unclosed groups. */
+ compile_stack_type compile_stack;
- /* Place in pattern (i.e., the {) to which to go back if the interval
- is invalid. */
- char *beg_interval = 0;
+ /* Points to the current (ending) position in the pattern. */
+ const char *p = pattern;
+ const char *pend = pattern + size;
- /* Stack of information saved by \( and restored by \).
- Four stack elements are pushed by each \(:
- First, the value of b.
- Second, the value of fixup_jump.
- Third, the value of regnum.
- Fourth, the value of begalt. */
+ /* How to translate the characters in the pattern. */
+ char *translate = bufp->translate;
- int stackb[40];
- int *stackp = stackb;
- int *stacke = stackb + 40;
- int *stackt;
+ /* Address of the count-byte of the most recently inserted `exactn'
+ command. This makes it possible to tell if a new exact-match
+ character can be added to that command or if the character requires
+ a new `exactn' command. */
+ unsigned char *pending_exact = 0;
- /* Counts \('s as they are encountered. Remembered for the matching \),
- where it becomes the register number to put in the stop_memory
- command. */
+ /* Address of start of the most recently finished expression.
+ This tells, e.g., postfix * where to find the start of its
+ operand. Reset at the beginning of groups and alternatives. */
+ unsigned char *laststart = 0;
- int regnum = 1;
+ /* Address of beginning of regexp, or inside of last group. */
+ unsigned char *begalt;
+
+ /* Place in the uncompiled pattern (i.e., the {) to
+ which to go back if the interval is invalid. */
+ const char *beg_interval;
+
+ /* Address of the place where a forward jump should go to the end of
+ the containing expression. Each alternative of an `or' -- except the
+ last -- ends with a forward jump of this sort. */
+ unsigned char *fixup_alt_jump = 0;
+
+ /* Counts open-groups as they are encountered. Remembered for the
+ matching close-group on the compile stack, so the same register
+ number is put in the stop_memory as the start_memory. */
+ regnum_t regnum = 0;
+
+#ifdef DEBUG
+ DEBUG_PRINT1 ("\nCompiling pattern: ");
+ if (debug)
+ {
+ unsigned debug_count;
+
+ for (debug_count = 0; debug_count < size; debug_count++)
+ printchar (pattern[debug_count]);
+ putchar ('\n');
+ }
+#endif /* DEBUG */
+ /* Initialize the compile stack. */
+ compile_stack.stack = TALLOC (INIT_COMPILE_STACK_SIZE, compile_stack_elt_t);
+ if (compile_stack.stack == NULL)
+ return REG_ESPACE;
+
+ compile_stack.size = INIT_COMPILE_STACK_SIZE;
+ compile_stack.avail = 0;
+
+ /* Initialize the pattern buffer. */
+ bufp->syntax = syntax;
bufp->fastmap_accurate = 0;
+ bufp->not_bol = bufp->not_eol = 0;
-#ifndef emacs
-#ifndef SYNTAX_TABLE
+ /* Set `used' to zero, so that if we return an error, the pattern
+ printer (for debugging) will think there's no pattern. We reset it
+ at the end. */
+ bufp->used = 0;
+
+ /* Always count groups, whether or not bufp->no_sub is set. */
+ bufp->re_nsub = 0;
+
+#if !defined (emacs) && !defined (SYNTAX_TABLE)
/* Initialize the syntax table. */
- init_syntax_once();
-#endif
+ init_syntax_once ();
#endif
if (bufp->allocated == 0)
{
- bufp->allocated = INIT_BUF_SIZE;
if (bufp->buffer)
- /* EXTEND_BUFFER loses when bufp->allocated is 0. */
- bufp->buffer = (char *) realloc (bufp->buffer, INIT_BUF_SIZE);
+ { /* If zero allocated, but buffer is non-null, try to realloc
+ enough space. This loses if buffer's address is bogus, but
+ that is the user's responsibility. */
+ RETALLOC (bufp->buffer, INIT_BUF_SIZE, unsigned char);
+ }
else
- /* Caller did not allocate a buffer. Do it for them. */
- bufp->buffer = (char *) malloc (INIT_BUF_SIZE);
- if (!bufp->buffer) goto memory_exhausted;
- begalt = b = bufp->buffer;
+ { /* Caller did not allocate a buffer. Do it for them. */
+ bufp->buffer = TALLOC (INIT_BUF_SIZE, unsigned char);
+ }
+ if (!bufp->buffer) return REG_ESPACE;
+
+ bufp->allocated = INIT_BUF_SIZE;
}
+ begalt = b = bufp->buffer;
+
+ /* Loop through the uncompiled pattern until we're at the end. */
while (p != pend)
{
PATFETCH (c);
switch (c)
- {
- case '$':
- {
- char *p1 = p;
- /* When testing what follows the $,
- look past the \-constructs that don't consume anything. */
- if (! (obscure_syntax & RE_CONTEXT_INDEP_OPS))
- while (p1 != pend)
- {
- if (*p1 == '\\' && p1 + 1 != pend
- && (p1[1] == '<' || p1[1] == '>'
- || p1[1] == '`' || p1[1] == '\''
-#ifdef emacs
- || p1[1] == '='
-#endif
- || p1[1] == 'b' || p1[1] == 'B'))
- p1 += 2;
- else
- break;
- }
- if (obscure_syntax & RE_TIGHT_VBAR)
- {
- if (! (obscure_syntax & RE_CONTEXT_INDEP_OPS) && p1 != pend)
- goto normal_char;
- /* Make operand of last vbar end before this `$'. */
- if (fixup_jump)
- store_jump (fixup_jump, jump, b);
- fixup_jump = 0;
- BUFPUSH (endline);
- break;
- }
- /* $ means succeed if at end of line, but only in special contexts.
- If validly in the middle of a pattern, it is a normal character. */
-
- if ((obscure_syntax & RE_CONTEXTUAL_INVALID_OPS) && p1 != pend)
- goto invalid_pattern;
- if (p1 == pend || *p1 == '\n'
- || (obscure_syntax & RE_CONTEXT_INDEP_OPS)
- || (obscure_syntax & RE_NO_BK_PARENS
- ? *p1 == ')'
- : *p1 == '\\' && p1[1] == ')')
- || (obscure_syntax & RE_NO_BK_VBAR
- ? *p1 == '|'
- : *p1 == '\\' && p1[1] == '|'))
- {
- BUFPUSH (endline);
- break;
- }
- goto normal_char;
+ {
+ case '^':
+ {
+ if ( /* If at start of pattern, it's an operator. */
+ p == pattern + 1
+ /* If context independent, it's an operator. */
+ || syntax & RE_CONTEXT_INDEP_ANCHORS
+ /* Otherwise, depends on what's come before. */
+ || at_begline_loc_p (pattern, p, syntax))
+ BUF_PUSH (begline);
+ else
+ goto normal_char;
}
- case '^':
- /* ^ means succeed if at beg of line, but only if no preceding
- pattern. */
-
- if ((obscure_syntax & RE_CONTEXTUAL_INVALID_OPS) && laststart)
- goto invalid_pattern;
- if (laststart && p - 2 >= pattern && p[-2] != '\n'
- && !(obscure_syntax & RE_CONTEXT_INDEP_OPS))
- goto normal_char;
- if (obscure_syntax & RE_TIGHT_VBAR)
- {
- if (p != pattern + 1
- && ! (obscure_syntax & RE_CONTEXT_INDEP_OPS))
- goto normal_char;
- BUFPUSH (begline);
- begalt = b;
- }
- else
- BUFPUSH (begline);
- break;
+ break;
+
+
+ case '$':
+ {
+ if ( /* If at end of pattern, it's an operator. */
+ p == pend
+ /* If context independent, it's an operator. */
+ || syntax & RE_CONTEXT_INDEP_ANCHORS
+ /* Otherwise, depends on what's next. */
+ || at_endline_loc_p (p, pend, syntax))
+ BUF_PUSH (endline);
+ else
+ goto normal_char;
+ }
+ break;
+
case '+':
- case '?':
- if ((obscure_syntax & RE_BK_PLUS_QM)
- || (obscure_syntax & RE_LIMITED_OPS))
- goto normal_char;
- handle_plus:
- case '*':
- /* If there is no previous pattern, char not special. */
- if (!laststart)
+ case '?':
+ if ((syntax & RE_BK_PLUS_QM)
+ || (syntax & RE_LIMITED_OPS))
+ goto normal_char;
+ handle_plus:
+ case '*':
+ /* If there is no previous pattern... */
+ if (!laststart)
{
- if (obscure_syntax & RE_CONTEXTUAL_INVALID_OPS)
- goto invalid_pattern;
- else if (! (obscure_syntax & RE_CONTEXT_INDEP_OPS))
- goto normal_char;
+ if (syntax & RE_CONTEXT_INVALID_OPS)
+ return REG_BADRPT;
+ else if (!(syntax & RE_CONTEXT_INDEP_OPS))
+ goto normal_char;
}
- /* If there is a sequence of repetition chars,
- collapse it down to just one. */
- zero_times_ok = 0;
- many_times_ok = 0;
- while (1)
- {
- zero_times_ok |= c != '+';
- many_times_ok |= c != '?';
- if (p == pend)
- break;
- PATFETCH (c);
- if (c == '*')
- ;
- else if (!(obscure_syntax & RE_BK_PLUS_QM)
- && (c == '+' || c == '?'))
- ;
- else if ((obscure_syntax & RE_BK_PLUS_QM)
- && c == '\\')
- {
- /* int c1; */
- PATFETCH (c1);
- if (!(c1 == '+' || c1 == '?'))
- {
- PATUNFETCH;
- PATUNFETCH;
- break;
- }
- c = c1;
- }
- else
- {
- PATUNFETCH;
- break;
- }
- }
- /* Star, etc. applied to an empty pattern is equivalent
- to an empty pattern. */
- if (!laststart)
- break;
+ {
+ /* Are we optimizing this jump? */
+ boolean keep_string_p = false;
+
+ /* 1 means zero (many) matches is allowed. */
+ char zero_times_ok = 0, many_times_ok = 0;
+
+ /* If there is a sequence of repetition chars, collapse it
+ down to just one (the right one). We can't combine
+ interval operators with these because of, e.g., `a{2}*',
+ which should only match an even number of `a's. */
+
+ for (;;)
+ {
+ zero_times_ok |= c != '+';
+ many_times_ok |= c != '?';
+
+ if (p == pend)
+ break;
+
+ PATFETCH (c);
+
+ if (c == '*'
+ || (!(syntax & RE_BK_PLUS_QM) && (c == '+' || c == '?')))
+ ;
+
+ else if (syntax & RE_BK_PLUS_QM && c == '\\')
+ {
+ if (p == pend) return REG_EESCAPE;
+
+ PATFETCH (c1);
+ if (!(c1 == '+' || c1 == '?'))
+ {
+ PATUNFETCH;
+ PATUNFETCH;
+ break;
+ }
+
+ c = c1;
+ }
+ else
+ {
+ PATUNFETCH;
+ break;
+ }
- /* Now we know whether or not zero matches is allowed
- and also whether or not two or more matches is allowed. */
- if (many_times_ok)
- {
- /* If more than one repetition is allowed, put in at the
- end a backward relative jump from b to before the next
- jump we're going to put in below (which jumps from
- laststart to after this jump). */
- GET_BUFFER_SPACE (3);
- store_jump (b, maybe_finalize_jump, laststart - 3);
- b += 3; /* Because store_jump put stuff here. */
- }
- /* On failure, jump from laststart to b + 3, which will be the
- end of the buffer after this jump is inserted. */
- GET_BUFFER_SPACE (3);
- insert_jump (on_failure_jump, laststart, b + 3, b);
- pending_exact = 0;
- b += 3;
- if (!zero_times_ok)
- {
- /* At least one repetition is required, so insert a
- dummy-failure before the initial on-failure-jump
- instruction of the loop. This effects a skip over that
- instruction the first time we hit that loop. */
- GET_BUFFER_SPACE (6);
- insert_jump (dummy_failure_jump, laststart, laststart + 6, b);
- b += 3;
- }
+ /* If we get here, we found another repeat character. */
+ }
+
+ /* Star, etc. applied to an empty pattern is equivalent
+ to an empty pattern. */
+ if (!laststart)
+ break;
+
+ /* Now we know whether or not zero matches is allowed
+ and also whether or not two or more matches is allowed. */
+ if (many_times_ok)
+ { /* More than one repetition is allowed, so put in at the
+ end a backward relative jump from `b' to before the next
+ jump we're going to put in below (which jumps from
+ laststart to after this jump).
+
+ But if we are at the `*' in the exact sequence `.*\n',
+ insert an unconditional jump backwards to the .,
+ instead of the beginning of the loop. This way we only
+ push a failure point once, instead of every time
+ through the loop. */
+ assert (p - 1 > pattern);
+
+ /* Allocate the space for the jump. */
+ GET_BUFFER_SPACE (3);
+
+ /* We know we are not at the first character of the pattern,
+ because laststart was nonzero. And we've already
+ incremented `p', by the way, to be the character after
+ the `*'. Do we have to do something analogous here
+ for null bytes, because of RE_DOT_NOT_NULL? */
+ if (TRANSLATE (*(p - 2)) == TRANSLATE ('.')
+ && zero_times_ok
+ && p < pend && TRANSLATE (*p) == TRANSLATE ('\n')
+ && !(syntax & RE_DOT_NEWLINE))
+ { /* We have .*\n. */
+ STORE_JUMP (jump, b, laststart);
+ keep_string_p = true;
+ }
+ else
+ /* Anything else. */
+ STORE_JUMP (maybe_pop_jump, b, laststart - 3);
+
+ /* We've added more stuff to the buffer. */
+ b += 3;
+ }
+
+ /* On failure, jump from laststart to b + 3, which will be the
+ end of the buffer after this jump is inserted. */
+ GET_BUFFER_SPACE (3);
+ INSERT_JUMP (keep_string_p ? on_failure_keep_string_jump
+ : on_failure_jump,
+ laststart, b + 3);
+ pending_exact = 0;
+ b += 3;
+
+ if (!zero_times_ok)
+ {
+ /* At least one repetition is required, so insert a
+ `dummy_failure_jump' before the initial
+ `on_failure_jump' instruction of the loop. This
+ effects a skip over that instruction the first time
+ we hit that loop. */
+ GET_BUFFER_SPACE (3);
+ INSERT_JUMP (dummy_failure_jump, laststart, laststart + 6);
+ b += 3;
+ }
+ }
break;
+
case '.':
- laststart = b;
- BUFPUSH (anychar);
- break;
+ laststart = b;
+ BUF_PUSH (anychar);
+ break;
+
case '[':
- if (p == pend)
- goto invalid_pattern;
- while (b - bufp->buffer
- > bufp->allocated - 3 - (1 << BYTEWIDTH) / BYTEWIDTH)
- EXTEND_BUFFER;
-
- laststart = b;
- if (*p == '^')
- {
- BUFPUSH (charset_not);
- p++;
- }
- else
- BUFPUSH (charset);
- p0 = p;
+ {
+ boolean had_char_class = false;
- BUFPUSH ((1 << BYTEWIDTH) / BYTEWIDTH);
- /* Clear the whole map */
- memset (b, 0, (1 << BYTEWIDTH) / BYTEWIDTH);
-
- if ((obscure_syntax & RE_HAT_NOT_NEWLINE) && b[-2] == charset_not)
- SET_LIST_BIT ('\n');
+ if (p == pend) return REG_EBRACK;
+ /* Ensure that we have enough space to push a charset: the
+ opcode, the length count, and the bitset; 34 bytes in all. */
+ GET_BUFFER_SPACE (34);
- /* Read in characters and ranges, setting map bits. */
- while (1)
- {
- /* Don't translate while fetching, in case it's a range bound.
- When we set the bit for the character, we translate it. */
- PATFETCH_RAW (c);
-
- /* If set, \ escapes characters when inside [...]. */
- if ((obscure_syntax & RE_AWK_CLASS_HACK) && c == '\\')
- {
- PATFETCH(c1);
- SET_LIST_BIT (c1);
- continue;
- }
- if (c == ']')
- {
- if (p == p0 + 1)
- {
- /* If this is an empty bracket expression. */
- if ((obscure_syntax & RE_NO_EMPTY_BRACKETS)
- && p == pend)
- goto invalid_pattern;
- }
- else
- /* Stop if this isn't merely a ] inside a bracket
- expression, but rather the end of a bracket
- expression. */
- break;
- }
- /* Get a range. */
- if (p[0] == '-' && p[1] != ']')
- {
- PATFETCH (c1);
- /* Don't translate the range bounds while fetching them. */
- PATFETCH_RAW (c1);
-
- if ((obscure_syntax & RE_NO_EMPTY_RANGES) && c > c1)
- goto invalid_pattern;
-
- if ((obscure_syntax & RE_NO_HYPHEN_RANGE_END)
- && c1 == '-' && *p != ']')
- goto invalid_pattern;
-
- while (c <= c1)
- {
- /* Translate each char that's in the range. */
- if (translate)
- SET_LIST_BIT (translate[c]);
- else
- SET_LIST_BIT (c);
- c++;
- }
- }
- else if ((obscure_syntax & RE_CHAR_CLASSES)
- && c == '[' && p[0] == ':')
- {
- /* Longest valid character class word has six characters. */
- char str[CHAR_CLASS_MAX_LENGTH];
- PATFETCH (c);
- c1 = 0;
- /* If no ] at end. */
- if (p == pend)
- goto invalid_pattern;
- while (1)
- {
- /* Don't translate the ``character class'' characters. */
- PATFETCH_RAW (c);
- if (c == ':' || c == ']' || p == pend
- || c1 == CHAR_CLASS_MAX_LENGTH)
- break;
- str[c1++] = c;
- }
- str[c1] = '\0';
- if (p == pend
- || c == ']' /* End of the bracket expression. */
- || p[0] != ']'
- || p + 1 == pend
- || (strcmp (str, "alpha") != 0
- && strcmp (str, "upper") != 0
- && strcmp (str, "lower") != 0
- && strcmp (str, "digit") != 0
- && strcmp (str, "alnum") != 0
- && strcmp (str, "xdigit") != 0
- && strcmp (str, "space") != 0
- && strcmp (str, "print") != 0
- && strcmp (str, "punct") != 0
- && strcmp (str, "graph") != 0
- && strcmp (str, "cntrl") != 0))
- {
- /* Undo the ending character, the letters, and leave
- the leading : and [ (but set bits for them). */
- c1++;
- while (c1--)
- PATUNFETCH;
- SET_LIST_BIT ('[');
- SET_LIST_BIT (':');
- }
- else
- {
- /* The ] at the end of the character class. */
- PATFETCH (c);
- if (c != ']')
- goto invalid_pattern;
- for (c = 0; c < (1 << BYTEWIDTH); c++)
- {
- if ((strcmp (str, "alpha") == 0 && isalpha (c))
- || (strcmp (str, "upper") == 0 && isupper (c))
- || (strcmp (str, "lower") == 0 && islower (c))
- || (strcmp (str, "digit") == 0 && isdigit (c))
- || (strcmp (str, "alnum") == 0 && isalnum (c))
- || (strcmp (str, "xdigit") == 0 && isxdigit (c))
- || (strcmp (str, "space") == 0 && isspace (c))
- || (strcmp (str, "print") == 0 && isprint (c))
- || (strcmp (str, "punct") == 0 && ispunct (c))
- || (strcmp (str, "graph") == 0 && isgraph (c))
- || (strcmp (str, "cntrl") == 0 && iscntrl (c)))
- SET_LIST_BIT (c);
- }
- }
- }
- else if (translate)
- SET_LIST_BIT (translate[c]);
- else
- SET_LIST_BIT (c);
- }
+ laststart = b;
+
+ /* We test `*p == '^' twice, instead of using an if
+ statement, so we only need one BUF_PUSH. */
+ BUF_PUSH (*p == '^' ? charset_not : charset);
+ if (*p == '^')
+ p++;
- /* Discard any character set/class bitmap bytes that are all
- 0 at the end of the map. Decrement the map-length byte too. */
- while ((int) b[-1] > 0 && b[b[-1] - 1] == 0)
- b[-1]--;
- b += b[-1];
+ /* Remember the first position in the bracket expression. */
+ p1 = p;
+
+ /* Push the number of bytes in the bitmap. */
+ BUF_PUSH ((1 << BYTEWIDTH) / BYTEWIDTH);
+
+ /* Clear the whole map. */
+ bzero (b, (1 << BYTEWIDTH) / BYTEWIDTH);
+
+ /* charset_not matches newline according to a syntax bit. */
+ if ((re_opcode_t) b[-2] == charset_not
+ && (syntax & RE_HAT_LISTS_NOT_NEWLINE))
+ SET_LIST_BIT ('\n');
+
+ /* Read in characters and ranges, setting map bits. */
+ for (;;)
+ {
+ if (p == pend) return REG_EBRACK;
+
+ PATFETCH (c);
+
+ /* \ might escape characters inside [...] and [^...]. */
+ if ((syntax & RE_BACKSLASH_ESCAPE_IN_LISTS) && c == '\\')
+ {
+ if (p == pend) return REG_EESCAPE;
+
+ PATFETCH (c1);
+ SET_LIST_BIT (c1);
+ continue;
+ }
+
+ /* Could be the end of the bracket expression. If it's
+ not (i.e., when the bracket expression is `[]' so
+ far), the ']' character bit gets set way below. */
+ if (c == ']' && p != p1 + 1)
+ break;
+
+ /* Look ahead to see if it's a range when the last thing
+ was a character class. */
+ if (had_char_class && c == '-' && *p != ']')
+ return REG_ERANGE;
+
+ /* Look ahead to see if it's a range when the last thing
+ was a character: if this is a hyphen not at the
+ beginning or the end of a list, then it's the range
+ operator. */
+ if (c == '-'
+ && !(p - 2 >= pattern && p[-2] == '[')
+ && !(p - 3 >= pattern && p[-3] == '[' && p[-2] == '^')
+ && *p != ']')
+ {
+ reg_errcode_t ret
+ = compile_range (&p, pend, translate, syntax, b);
+ if (ret != REG_NOERROR) return ret;
+ }
+
+ else if (p[0] == '-' && p[1] != ']')
+ { /* This handles ranges made up of characters only. */
+ reg_errcode_t ret;
+
+ /* Move past the `-'. */
+ PATFETCH (c1);
+
+ ret = compile_range (&p, pend, translate, syntax, b);
+ if (ret != REG_NOERROR) return ret;
+ }
+
+ /* See if we're at the beginning of a possible character
+ class. */
+
+ else if (syntax & RE_CHAR_CLASSES && c == '[' && *p == ':')
+ { /* Leave room for the null. */
+ char str[CHAR_CLASS_MAX_LENGTH + 1];
+
+ PATFETCH (c);
+ c1 = 0;
+
+ /* If pattern is `[[:'. */
+ if (p == pend) return REG_EBRACK;
+
+ for (;;)
+ {
+ PATFETCH (c);
+ if (c == ':' || c == ']' || p == pend
+ || c1 == CHAR_CLASS_MAX_LENGTH)
+ break;
+ str[c1++] = c;
+ }
+ str[c1] = '\0';
+
+ /* If isn't a word bracketed by `[:' and:`]':
+ undo the ending character, the letters, and leave
+ the leading `:' and `[' (but set bits for them). */
+ if (c == ':' && *p == ']')
+ {
+ int ch;
+ boolean is_alnum = STREQ (str, "alnum");
+ boolean is_alpha = STREQ (str, "alpha");
+ boolean is_blank = STREQ (str, "blank");
+ boolean is_cntrl = STREQ (str, "cntrl");
+ boolean is_digit = STREQ (str, "digit");
+ boolean is_graph = STREQ (str, "graph");
+ boolean is_lower = STREQ (str, "lower");
+ boolean is_print = STREQ (str, "print");
+ boolean is_punct = STREQ (str, "punct");
+ boolean is_space = STREQ (str, "space");
+ boolean is_upper = STREQ (str, "upper");
+ boolean is_xdigit = STREQ (str, "xdigit");
+
+ if (!IS_CHAR_CLASS (str)) return REG_ECTYPE;
+
+ /* Throw away the ] at the end of the character
+ class. */
+ PATFETCH (c);
+
+ if (p == pend) return REG_EBRACK;
+
+ for (ch = 0; ch < 1 << BYTEWIDTH; ch++)
+ {
+ if ( (is_alnum && ISALNUM (ch))
+ || (is_alpha && ISALPHA (ch))
+ || (is_blank && ISBLANK (ch))
+ || (is_cntrl && ISCNTRL (ch))
+ || (is_digit && ISDIGIT (ch))
+ || (is_graph && ISGRAPH (ch))
+ || (is_lower && ISLOWER (ch))
+ || (is_print && ISPRINT (ch))
+ || (is_punct && ISPUNCT (ch))
+ || (is_space && ISSPACE (ch))
+ || (is_upper && ISUPPER (ch))
+ || (is_xdigit && ISXDIGIT (ch)))
+ SET_LIST_BIT (ch);
+ }
+ had_char_class = true;
+ }
+ else
+ {
+ c1++;
+ while (c1--)
+ PATUNFETCH;
+ SET_LIST_BIT ('[');
+ SET_LIST_BIT (':');
+ had_char_class = false;
+ }
+ }
+ else
+ {
+ had_char_class = false;
+ SET_LIST_BIT (c);
+ }
+ }
+
+ /* Discard any (non)matching list bytes that are all 0 at the
+ end of the map. Decrease the map-length byte too. */
+ while ((int) b[-1] > 0 && b[b[-1] - 1] == 0)
+ b[-1]--;
+ b += b[-1];
+ }
break;
+
case '(':
- if (! (obscure_syntax & RE_NO_BK_PARENS))
- goto normal_char;
- else
- goto handle_open;
+ if (syntax & RE_NO_BK_PARENS)
+ goto handle_open;
+ else
+ goto normal_char;
+
+
+ case ')':
+ if (syntax & RE_NO_BK_PARENS)
+ goto handle_close;
+ else
+ goto normal_char;
- case ')':
- if (! (obscure_syntax & RE_NO_BK_PARENS))
- goto normal_char;
- else
- goto handle_close;
case '\n':
- if (! (obscure_syntax & RE_NEWLINE_OR))
- goto normal_char;
- else
- goto handle_bar;
+ if (syntax & RE_NEWLINE_ALT)
+ goto handle_alt;
+ else
+ goto normal_char;
+
case '|':
- if ((obscure_syntax & RE_CONTEXTUAL_INVALID_OPS)
- && (! laststart || p == pend))
- goto invalid_pattern;
- else if (! (obscure_syntax & RE_NO_BK_VBAR))
- goto normal_char;
- else
- goto handle_bar;
+ if (syntax & RE_NO_BK_VBAR)
+ goto handle_alt;
+ else
+ goto normal_char;
- case '{':
- if (! ((obscure_syntax & RE_NO_BK_CURLY_BRACES)
- && (obscure_syntax & RE_INTERVALS)))
- goto normal_char;
- else
+
+ case '{':
+ if (syntax & RE_INTERVALS && syntax & RE_NO_BK_BRACES)
goto handle_interval;
-
+ else
+ goto normal_char;
+
+
case '\\':
- if (p == pend) goto invalid_pattern;
- PATFETCH_RAW (c);
- switch (c)
- {
- case '(':
- if (obscure_syntax & RE_NO_BK_PARENS)
- goto normal_backsl;
- handle_open:
- if (stackp == stacke) goto nesting_too_deep;
-
- /* Laststart should point to the start_memory that we are about
- to push (unless the pattern has RE_NREGS or more ('s). */
- *stackp++ = b - bufp->buffer;
- if (regnum < RE_NREGS)
- {
- BUFPUSH (start_memory);
- BUFPUSH (regnum);
- }
- *stackp++ = fixup_jump ? fixup_jump - bufp->buffer + 1 : 0;
- *stackp++ = regnum++;
- *stackp++ = begalt - bufp->buffer;
- fixup_jump = 0;
- laststart = 0;
- begalt = b;
- break;
-
- case ')':
- if (obscure_syntax & RE_NO_BK_PARENS)
- goto normal_backsl;
- handle_close:
- if (stackp == stackb) goto unmatched_close;
- begalt = *--stackp + bufp->buffer;
- if (fixup_jump)
- store_jump (fixup_jump, jump, b);
- if (stackp[-1] < RE_NREGS)
- {
- BUFPUSH (stop_memory);
- BUFPUSH (stackp[-1]);
- }
- stackp -= 2;
- fixup_jump = *stackp ? *stackp + bufp->buffer - 1 : 0;
- laststart = *--stackp + bufp->buffer;
- break;
-
- case '|':
- if ((obscure_syntax & RE_LIMITED_OPS)
- || (obscure_syntax & RE_NO_BK_VBAR))
- goto normal_backsl;
- handle_bar:
- if (obscure_syntax & RE_LIMITED_OPS)
- goto normal_char;
- /* Insert before the previous alternative a jump which
- jumps to this alternative if the former fails. */
- GET_BUFFER_SPACE (6);
- insert_jump (on_failure_jump, begalt, b + 6, b);
- pending_exact = 0;
- b += 3;
- /* The alternative before the previous alternative has a
- jump after it which gets executed if it gets matched.
- Adjust that jump so it will jump to the previous
- alternative's analogous jump (put in below, which in
- turn will jump to the next (if any) alternative's such
- jump, etc.). The last such jump jumps to the correct
- final destination. */
- if (fixup_jump)
- store_jump (fixup_jump, jump, b);
+ if (p == pend) return REG_EESCAPE;
+
+ /* Do not translate the character after the \, so that we can
+ distinguish, e.g., \B from \b, even if we normally would
+ translate, e.g., B to b. */
+ PATFETCH_RAW (c);
+
+ switch (c)
+ {
+ case '(':
+ if (syntax & RE_NO_BK_PARENS)
+ goto normal_backslash;
+
+ handle_open:
+ bufp->re_nsub++;
+ regnum++;
+
+ if (COMPILE_STACK_FULL)
+ {
+ RETALLOC (compile_stack.stack, compile_stack.size << 1,
+ compile_stack_elt_t);
+ if (compile_stack.stack == NULL) return REG_ESPACE;
+
+ compile_stack.size <<= 1;
+ }
+
+ /* These are the values to restore when we hit end of this
+ group. They are all relative offsets, so that if the
+ whole pattern moves because of realloc, they will still
+ be valid. */
+ COMPILE_STACK_TOP.begalt_offset = begalt - bufp->buffer;
+ COMPILE_STACK_TOP.fixup_alt_jump
+ = fixup_alt_jump ? fixup_alt_jump - bufp->buffer + 1 : 0;
+ COMPILE_STACK_TOP.laststart_offset = b - bufp->buffer;
+ COMPILE_STACK_TOP.regnum = regnum;
+
+ /* We will eventually replace the 0 with the number of
+ groups inner to this one. But do not push a
+ start_memory for groups beyond the last one we can
+ represent in the compiled pattern. */
+ if (regnum <= MAX_REGNUM)
+ {
+ COMPILE_STACK_TOP.inner_group_offset = b - bufp->buffer + 2;
+ BUF_PUSH_3 (start_memory, regnum, 0);
+ }
- /* Leave space for a jump after previous alternative---to be
- filled in later. */
- fixup_jump = b;
- b += 3;
+ compile_stack.avail++;
+ fixup_alt_jump = 0;
laststart = 0;
- begalt = b;
- break;
+ begalt = b;
+ /* If we've reached MAX_REGNUM groups, then this open
+ won't actually generate any code, so we'll have to
+ clear pending_exact explicitly. */
+ pending_exact = 0;
+ break;
- case '{':
- if (! (obscure_syntax & RE_INTERVALS)
- /* Let \{ be a literal. */
- || ((obscure_syntax & RE_INTERVALS)
- && (obscure_syntax & RE_NO_BK_CURLY_BRACES))
- /* If it's the string "\{". */
- || (p - 2 == pattern && p == pend))
- goto normal_backsl;
- handle_interval:
- beg_interval = p - 1; /* The {. */
- /* If there is no previous pattern, this isn't an interval. */
- if (!laststart)
- {
- if (obscure_syntax & RE_CONTEXTUAL_INVALID_OPS)
- goto invalid_pattern;
- else
- goto normal_backsl;
+
+ case ')':
+ if (syntax & RE_NO_BK_PARENS) goto normal_backslash;
+
+ if (COMPILE_STACK_EMPTY)
+ if (syntax & RE_UNMATCHED_RIGHT_PAREN_ORD)
+ goto normal_backslash;
+ else
+ return REG_ERPAREN;
+
+ handle_close:
+ if (fixup_alt_jump)
+ { /* Push a dummy failure point at the end of the
+ alternative for a possible future
+ `pop_failure_jump' to pop. See comments at
+ `push_dummy_failure' in `re_match_2'. */
+ BUF_PUSH (push_dummy_failure);
+
+ /* We allocated space for this jump when we assigned
+ to `fixup_alt_jump', in the `handle_alt' case below. */
+ STORE_JUMP (jump_past_alt, fixup_alt_jump, b - 1);
}
- /* It also isn't an interval if not preceded by an re
- matching a single character or subexpression, or if
- the current type of intervals can't handle back
- references and the previous thing is a back reference. */
- if (! (*laststart == anychar
- || *laststart == charset
- || *laststart == charset_not
- || *laststart == start_memory
- || (*laststart == exactn && laststart[1] == 1)
- || (! (obscure_syntax & RE_NO_BK_REFS)
- && *laststart == duplicate)))
- {
- if (obscure_syntax & RE_NO_BK_CURLY_BRACES)
- goto normal_char;
-
- /* Posix extended syntax is handled in previous
- statement; this is for Posix basic syntax. */
- if (obscure_syntax & RE_INTERVALS)
- goto invalid_pattern;
+
+ /* See similar code for backslashed left paren above. */
+ if (COMPILE_STACK_EMPTY)
+ if (syntax & RE_UNMATCHED_RIGHT_PAREN_ORD)
+ goto normal_char;
+ else
+ return REG_ERPAREN;
+
+ /* Since we just checked for an empty stack above, this
+ ``can't happen''. */
+ assert (compile_stack.avail != 0);
+ {
+ /* We don't just want to restore into `regnum', because
+ later groups should continue to be numbered higher,
+ as in `(ab)c(de)' -- the second group is #2. */
+ regnum_t this_group_regnum;
+
+ compile_stack.avail--;
+ begalt = bufp->buffer + COMPILE_STACK_TOP.begalt_offset;
+ fixup_alt_jump
+ = COMPILE_STACK_TOP.fixup_alt_jump
+ ? bufp->buffer + COMPILE_STACK_TOP.fixup_alt_jump - 1
+ : 0;
+ laststart = bufp->buffer + COMPILE_STACK_TOP.laststart_offset;
+ this_group_regnum = COMPILE_STACK_TOP.regnum;
+ /* If we've reached MAX_REGNUM groups, then this open
+ won't actually generate any code, so we'll have to
+ clear pending_exact explicitly. */
+ pending_exact = 0;
+
+ /* We're at the end of the group, so now we know how many
+ groups were inside this one. */
+ if (this_group_regnum <= MAX_REGNUM)
+ {
+ unsigned char *inner_group_loc
+ = bufp->buffer + COMPILE_STACK_TOP.inner_group_offset;
- goto normal_backsl;
- }
- lower_bound = -1; /* So can see if are set. */
- upper_bound = -1;
- GET_UNSIGNED_NUMBER (lower_bound);
- if (c == ',')
- {
- GET_UNSIGNED_NUMBER (upper_bound);
- if (upper_bound < 0)
- upper_bound = RE_DUP_MAX;
- }
- if (upper_bound < 0)
- upper_bound = lower_bound;
- if (! (obscure_syntax & RE_NO_BK_CURLY_BRACES))
- {
- if (c != '\\')
- goto invalid_pattern;
- PATFETCH (c);
- }
- if (c != '}' || lower_bound < 0 || upper_bound > RE_DUP_MAX
- || lower_bound > upper_bound
- || ((obscure_syntax & RE_NO_BK_CURLY_BRACES)
- && p != pend && *p == '{'))
- {
- if (obscure_syntax & RE_NO_BK_CURLY_BRACES)
- goto unfetch_interval;
- else
- goto invalid_pattern;
- }
+ *inner_group_loc = regnum - this_group_regnum;
+ BUF_PUSH_3 (stop_memory, this_group_regnum,
+ regnum - this_group_regnum);
+ }
+ }
+ break;
- /* If upper_bound is zero, don't want to succeed at all;
- jump from laststart to b + 3, which will be the end of
- the buffer after this jump is inserted. */
-
- if (upper_bound == 0)
- {
- GET_BUFFER_SPACE (3);
- insert_jump (jump, laststart, b + 3, b);
- b += 3;
- }
- /* Otherwise, after lower_bound number of succeeds, jump
- to after the jump_n which will be inserted at the end
- of the buffer, and insert that jump_n. */
- else
- { /* Set to 5 if only one repetition is allowed and
- hence no jump_n is inserted at the current end of
- the buffer; then only space for the succeed_n is
- needed. Otherwise, need space for both the
- succeed_n and the jump_n. */
-
- unsigned slots_needed = upper_bound == 1 ? 5 : 10;
-
- GET_BUFFER_SPACE (slots_needed);
- /* Initialize the succeed_n to n, even though it will
- be set by its attendant set_number_at, because
- re_compile_fastmap will need to know it. Jump to
- what the end of buffer will be after inserting
- this succeed_n and possibly appending a jump_n. */
- insert_jump_n (succeed_n, laststart, b + slots_needed,
- b, lower_bound);
- b += 5; /* Just increment for the succeed_n here. */
-
- /* More than one repetition is allowed, so put in at
- the end of the buffer a backward jump from b to the
- succeed_n we put in above. By the time we've gotten
- to this jump when matching, we'll have matched once
- already, so jump back only upper_bound - 1 times. */
-
- if (upper_bound > 1)
- {
- store_jump_n (b, jump_n, laststart, upper_bound - 1);
- b += 5;
- /* When hit this when matching, reset the
- preceding jump_n's n to upper_bound - 1. */
- BUFPUSH (set_number_at);
- GET_BUFFER_SPACE (2);
- STORE_NUMBER_AND_INCR (b, -5);
- STORE_NUMBER_AND_INCR (b, upper_bound - 1);
- }
- /* When hit this when matching, set the succeed_n's n. */
- GET_BUFFER_SPACE (5);
- insert_op_2 (set_number_at, laststart, b, 5, lower_bound);
- b += 5;
- }
+ case '|': /* `\|'. */
+ if (syntax & RE_LIMITED_OPS || syntax & RE_NO_BK_VBAR)
+ goto normal_backslash;
+ handle_alt:
+ if (syntax & RE_LIMITED_OPS)
+ goto normal_char;
+
+ /* Insert before the previous alternative a jump which
+ jumps to this alternative if the former fails. */
+ GET_BUFFER_SPACE (3);
+ INSERT_JUMP (on_failure_jump, begalt, b + 6);
pending_exact = 0;
- beg_interval = 0;
+ b += 3;
+
+ /* The alternative before this one has a jump after it
+ which gets executed if it gets matched. Adjust that
+ jump so it will jump to this alternative's analogous
+ jump (put in below, which in turn will jump to the next
+ (if any) alternative's such jump, etc.). The last such
+ jump jumps to the correct final destination. A picture:
+ _____ _____
+ | | | |
+ | v | v
+ a | b | c
+
+ If we are at `b', then fixup_alt_jump right now points to a
+ three-byte space after `a'. We'll put in the jump, set
+ fixup_alt_jump to right after `b', and leave behind three
+ bytes which we'll fill in when we get to after `c'. */
+
+ if (fixup_alt_jump)
+ STORE_JUMP (jump_past_alt, fixup_alt_jump, b);
+
+ /* Mark and leave space for a jump after this alternative,
+ to be filled in later either by next alternative or
+ when know we're at the end of a series of alternatives. */
+ fixup_alt_jump = b;
+ GET_BUFFER_SPACE (3);
+ b += 3;
+
+ laststart = 0;
+ begalt = b;
break;
+ case '{':
+ /* If \{ is a literal. */
+ if (!(syntax & RE_INTERVALS)
+ /* If we're at `\{' and it's not the open-interval
+ operator. */
+ || ((syntax & RE_INTERVALS) && (syntax & RE_NO_BK_BRACES))
+ || (p - 2 == pattern && p == pend))
+ goto normal_backslash;
+
+ handle_interval:
+ {
+ /* If got here, then the syntax allows intervals. */
+
+ /* At least (most) this many matches must be made. */
+ int lower_bound = -1, upper_bound = -1;
+
+ beg_interval = p - 1;
+
+ if (p == pend)
+ {
+ if (syntax & RE_NO_BK_BRACES)
+ goto unfetch_interval;
+ else
+ return REG_EBRACE;
+ }
+
+ GET_UNSIGNED_NUMBER (lower_bound);
+
+ if (c == ',')
+ {
+ GET_UNSIGNED_NUMBER (upper_bound);
+ if (upper_bound < 0) upper_bound = RE_DUP_MAX;
+ }
+ else
+ /* Interval such as `{1}' => match exactly once. */
+ upper_bound = lower_bound;
+
+ if (lower_bound < 0 || upper_bound > RE_DUP_MAX
+ || lower_bound > upper_bound)
+ {
+ if (syntax & RE_NO_BK_BRACES)
+ goto unfetch_interval;
+ else
+ return REG_BADBR;
+ }
+
+ if (!(syntax & RE_NO_BK_BRACES))
+ {
+ if (c != '\\') return REG_EBRACE;
+
+ PATFETCH (c);
+ }
+
+ if (c != '}')
+ {
+ if (syntax & RE_NO_BK_BRACES)
+ goto unfetch_interval;
+ else
+ return REG_BADBR;
+ }
+
+ /* We just parsed a valid interval. */
+
+ /* If it's invalid to have no preceding re. */
+ if (!laststart)
+ {
+ if (syntax & RE_CONTEXT_INVALID_OPS)
+ return REG_BADRPT;
+ else if (syntax & RE_CONTEXT_INDEP_OPS)
+ laststart = b;
+ else
+ goto unfetch_interval;
+ }
+
+ /* If the upper bound is zero, don't want to succeed at
+ all; jump from `laststart' to `b + 3', which will be
+ the end of the buffer after we insert the jump. */
+ if (upper_bound == 0)
+ {
+ GET_BUFFER_SPACE (3);
+ INSERT_JUMP (jump, laststart, b + 3);
+ b += 3;
+ }
+
+ /* Otherwise, we have a nontrivial interval. When
+ we're all done, the pattern will look like:
+ set_number_at <jump count> <upper bound>
+ set_number_at <succeed_n count> <lower bound>
+ succeed_n <after jump addr> <succed_n count>
+ <body of loop>
+ jump_n <succeed_n addr> <jump count>
+ (The upper bound and `jump_n' are omitted if
+ `upper_bound' is 1, though.) */
+ else
+ { /* If the upper bound is > 1, we need to insert
+ more at the end of the loop. */
+ unsigned nbytes = 10 + (upper_bound > 1) * 10;
+
+ GET_BUFFER_SPACE (nbytes);
+
+ /* Initialize lower bound of the `succeed_n', even
+ though it will be set during matching by its
+ attendant `set_number_at' (inserted next),
+ because `re_compile_fastmap' needs to know.
+ Jump to the `jump_n' we might insert below. */
+ INSERT_JUMP2 (succeed_n, laststart,
+ b + 5 + (upper_bound > 1) * 5,
+ lower_bound);
+ b += 5;
+
+ /* Code to initialize the lower bound. Insert
+ before the `succeed_n'. The `5' is the last two
+ bytes of this `set_number_at', plus 3 bytes of
+ the following `succeed_n'. */
+ insert_op2 (set_number_at, laststart, 5, lower_bound, b);
+ b += 5;
+
+ if (upper_bound > 1)
+ { /* More than one repetition is allowed, so
+ append a backward jump to the `succeed_n'
+ that starts this interval.
+
+ When we've reached this during matching,
+ we'll have matched the interval once, so
+ jump back only `upper_bound - 1' times. */
+ STORE_JUMP2 (jump_n, b, laststart + 5,
+ upper_bound - 1);
+ b += 5;
+
+ /* The location we want to set is the second
+ parameter of the `jump_n'; that is `b-2' as
+ an absolute address. `laststart' will be
+ the `set_number_at' we're about to insert;
+ `laststart+3' the number to set, the source
+ for the relative address. But we are
+ inserting into the middle of the pattern --
+ so everything is getting moved up by 5.
+ Conclusion: (b - 2) - (laststart + 3) + 5,
+ i.e., b - laststart.
+
+ We insert this at the beginning of the loop
+ so that if we fail during matching, we'll
+ reinitialize the bounds. */
+ insert_op2 (set_number_at, laststart, b - laststart,
+ upper_bound - 1, b);
+ b += 5;
+ }
+ }
+ pending_exact = 0;
+ beg_interval = NULL;
+ }
+ break;
+
unfetch_interval:
- /* If an invalid interval, match the characters as literals. */
- if (beg_interval)
- p = beg_interval;
- else
+ /* If an invalid interval, match the characters as literals. */
+ assert (beg_interval);
+ p = beg_interval;
+ beg_interval = NULL;
+
+ /* normal_char and normal_backslash need `c'. */
+ PATFETCH (c);
+
+ if (!(syntax & RE_NO_BK_BRACES))
{
- fprintf (stderr,
- "regex: no interval beginning to which to backtrack.\n");
- exit (1);
+ if (p > pattern && p[-1] == '\\')
+ goto normal_backslash;
}
-
- beg_interval = 0;
- PATFETCH (c); /* normal_char expects char in `c'. */
- goto normal_char;
- break;
+ goto normal_char;
#ifdef emacs
- case '=':
- BUFPUSH (at_dot);
- break;
-
- case 's':
- laststart = b;
- BUFPUSH (syntaxspec);
- PATFETCH (c);
- BUFPUSH (syntax_spec_code[c]);
- break;
-
- case 'S':
- laststart = b;
- BUFPUSH (notsyntaxspec);
- PATFETCH (c);
- BUFPUSH (syntax_spec_code[c]);
- break;
+ /* There is no way to specify the before_dot and after_dot
+ operators. rms says this is ok. --karl */
+ case '=':
+ BUF_PUSH (at_dot);
+ break;
+
+ case 's':
+ laststart = b;
+ PATFETCH (c);
+ BUF_PUSH_2 (syntaxspec, syntax_spec_code[c]);
+ break;
+
+ case 'S':
+ laststart = b;
+ PATFETCH (c);
+ BUF_PUSH_2 (notsyntaxspec, syntax_spec_code[c]);
+ break;
#endif /* emacs */
- case 'w':
- laststart = b;
- BUFPUSH (wordchar);
- break;
-
- case 'W':
- laststart = b;
- BUFPUSH (notwordchar);
- break;
-
- case '<':
- BUFPUSH (wordbeg);
- break;
-
- case '>':
- BUFPUSH (wordend);
- break;
-
- case 'b':
- BUFPUSH (wordbound);
- break;
-
- case 'B':
- BUFPUSH (notwordbound);
- break;
-
- case '`':
- BUFPUSH (begbuf);
- break;
-
- case '\'':
- BUFPUSH (endbuf);
- break;
-
- case '1':
- case '2':
- case '3':
- case '4':
- case '5':
- case '6':
- case '7':
- case '8':
- case '9':
- if (obscure_syntax & RE_NO_BK_REFS)
+
+ case 'w':
+ if (re_syntax_options & RE_NO_GNU_OPS)
+ goto normal_char;
+ laststart = b;
+ BUF_PUSH (wordchar);
+ break;
+
+
+ case 'W':
+ if (re_syntax_options & RE_NO_GNU_OPS)
+ goto normal_char;
+ laststart = b;
+ BUF_PUSH (notwordchar);
+ break;
+
+
+ case '<':
+ if (re_syntax_options & RE_NO_GNU_OPS)
+ goto normal_char;
+ BUF_PUSH (wordbeg);
+ break;
+
+ case '>':
+ if (re_syntax_options & RE_NO_GNU_OPS)
+ goto normal_char;
+ BUF_PUSH (wordend);
+ break;
+
+ case 'b':
+ if (re_syntax_options & RE_NO_GNU_OPS)
+ goto normal_char;
+ BUF_PUSH (wordbound);
+ break;
+
+ case 'B':
+ if (re_syntax_options & RE_NO_GNU_OPS)
+ goto normal_char;
+ BUF_PUSH (notwordbound);
+ break;
+
+ case '`':
+ if (re_syntax_options & RE_NO_GNU_OPS)
+ goto normal_char;
+ BUF_PUSH (begbuf);
+ break;
+
+ case '\'':
+ if (re_syntax_options & RE_NO_GNU_OPS)
+ goto normal_char;
+ BUF_PUSH (endbuf);
+ break;
+
+ case '1': case '2': case '3': case '4': case '5':
+ case '6': case '7': case '8': case '9':
+ if (syntax & RE_NO_BK_REFS)
goto normal_char;
+
c1 = c - '0';
- if (c1 >= regnum)
- {
- if (obscure_syntax & RE_NO_EMPTY_BK_REF)
- goto invalid_pattern;
- else
- goto normal_char;
- }
+
+ if (c1 > regnum)
+ return REG_ESUBREG;
+
/* Can't back reference to a subexpression if inside of it. */
- for (stackt = stackp - 2; stackt > stackb; stackt -= 4)
- if (*stackt == c1)
- goto normal_char;
- laststart = b;
- BUFPUSH (duplicate);
- BUFPUSH (c1);
- break;
-
- case '+':
- case '?':
- if (obscure_syntax & RE_BK_PLUS_QM)
- goto handle_plus;
- else
- goto normal_backsl;
+ if (group_in_compile_stack (compile_stack, (regnum_t)c1))
+ goto normal_char;
+
+ laststart = b;
+ BUF_PUSH_2 (duplicate, c1);
break;
+
+ case '+':
+ case '?':
+ if (syntax & RE_BK_PLUS_QM)
+ goto handle_plus;
+ else
+ goto normal_backslash;
+
default:
- normal_backsl:
- /* You might think it would be useful for \ to mean
- not to translate; but if we don't translate it
- it will never match anything. */
- if (translate) c = translate[c];
- goto normal_char;
- }
- break;
+ normal_backslash:
+ /* You might think it would be useful for \ to mean
+ not to translate; but if we don't translate it
+ it will never match anything. */
+ c = TRANSLATE (c);
+ goto normal_char;
+ }
+ break;
+
default:
- normal_char: /* Expects the character in `c'. */
- if (!pending_exact || pending_exact + *pending_exact + 1 != b
- || *pending_exact == 0177 || *p == '*' || *p == '^'
- || ((obscure_syntax & RE_BK_PLUS_QM)
+ /* Expects the character in `c'. */
+ normal_char:
+ /* If no exactn currently being built. */
+ if (!pending_exact
+
+ /* If last exactn not at current position. */
+ || pending_exact + *pending_exact + 1 != b
+
+ /* We have only one byte following the exactn for the count. */
+ || *pending_exact == (1 << BYTEWIDTH) - 1
+
+ /* If followed by a repetition operator. */
+ || *p == '*' || *p == '^'
+ || ((syntax & RE_BK_PLUS_QM)
? *p == '\\' && (p[1] == '+' || p[1] == '?')
: (*p == '+' || *p == '?'))
- || ((obscure_syntax & RE_INTERVALS)
- && ((obscure_syntax & RE_NO_BK_CURLY_BRACES)
+ || ((syntax & RE_INTERVALS)
+ && ((syntax & RE_NO_BK_BRACES)
? *p == '{'
: (p[0] == '\\' && p[1] == '{'))))
{
- laststart = b;
- BUFPUSH (exactn);
- pending_exact = b;
- BUFPUSH (0);
- }
- BUFPUSH (c);
- (*pending_exact)++;
- }
- }
+ /* Start building a new exactn. */
+
+ laststart = b;
+
+ BUF_PUSH_2 (exactn, 0);
+ pending_exact = b - 1;
+ }
+
+ BUF_PUSH (c);
+ (*pending_exact)++;
+ break;
+ } /* switch (c) */
+ } /* while p != pend */
+
+
+ /* Through the pattern now. */
+
+ if (fixup_alt_jump)
+ STORE_JUMP (jump_past_alt, fixup_alt_jump, b);
- if (fixup_jump)
- store_jump (fixup_jump, jump, b);
+ if (!COMPILE_STACK_EMPTY)
+ return REG_EPAREN;
- if (stackp != stackb) goto unmatched_open;
+ free (compile_stack.stack);
+ /* We have succeeded; set the length of the buffer. */
bufp->used = b - bufp->buffer;
- return 0;
- invalid_pattern:
- return "Invalid regular expression";
+#ifdef DEBUG
+ if (debug)
+ {
+ DEBUG_PRINT1 ("\nCompiled pattern: \n");
+ print_compiled_pattern (bufp);
+ }
+#endif /* DEBUG */
- unmatched_open:
- return "Unmatched \\(";
+ return REG_NOERROR;
+} /* regex_compile */
+
+/* Subroutines for `regex_compile'. */
- unmatched_close:
- return "Unmatched \\)";
+/* Store OP at LOC followed by two-byte integer parameter ARG. */
- end_of_pattern:
- return "Premature end of regular expression";
+static void
+store_op1 (op, loc, arg)
+ re_opcode_t op;
+ unsigned char *loc;
+ int arg;
+{
+ *loc = (unsigned char) op;
+ STORE_NUMBER (loc + 1, arg);
+}
- nesting_too_deep:
- return "Nesting too deep";
- too_big:
- return "Regular expression too big";
+/* Like `store_op1', but for two two-byte parameters ARG1 and ARG2. */
- memory_exhausted:
- return "Memory exhausted";
+static void
+store_op2 (op, loc, arg1, arg2)
+ re_opcode_t op;
+ unsigned char *loc;
+ int arg1, arg2;
+{
+ *loc = (unsigned char) op;
+ STORE_NUMBER (loc + 1, arg1);
+ STORE_NUMBER (loc + 3, arg2);
}
-/* Store a jump of the form <OPCODE> <relative address>.
- Store in the location FROM a jump operation to jump to relative
- address FROM - TO. OPCODE is the opcode to store. */
+/* Copy the bytes from LOC to END to open up three bytes of space at LOC
+ for OP followed by two-byte integer parameter ARG. */
static void
-store_jump (from, opcode, to)
- char *from, *to;
- int opcode;
+insert_op1 (op, loc, arg, end)
+ re_opcode_t op;
+ unsigned char *loc;
+ int arg;
+ unsigned char *end;
{
- from[0] = (char)opcode;
- STORE_NUMBER(from + 1, to - (from + 3));
-}
+ register unsigned char *pfrom = end;
+ register unsigned char *pto = end + 3;
+ while (pfrom != loc)
+ *--pto = *--pfrom;
+
+ store_op1 (op, loc, arg);
+}
-/* Open up space before char FROM, and insert there a jump to TO.
- CURRENT_END gives the end of the storage not in use, so we know
- how much data to copy up. OP is the opcode of the jump to insert.
- If you call this function, you must zero out pending_exact. */
+/* Like `insert_op1', but for two two-byte parameters ARG1 and ARG2. */
static void
-insert_jump (op, from, to, current_end)
- int op;
- char *from, *to, *current_end;
+insert_op2 (op, loc, arg1, arg2, end)
+ re_opcode_t op;
+ unsigned char *loc;
+ int arg1, arg2;
+ unsigned char *end;
{
- register char *pfrom = current_end; /* Copy from here... */
- register char *pto = current_end + 3; /* ...to here. */
+ register unsigned char *pfrom = end;
+ register unsigned char *pto = end + 5;
- while (pfrom != from)
+ while (pfrom != loc)
*--pto = *--pfrom;
- store_jump (from, op, to);
+
+ store_op2 (op, loc, arg1, arg2);
}
-/* Store a jump of the form <opcode> <relative address> <n> .
+/* P points to just after a ^ in PATTERN. Return true if that ^ comes
+ after an alternative or a begin-subexpression. We assume there is at
+ least one character before the ^. */
- Store in the location FROM a jump operation to jump to relative
- address FROM - TO. OPCODE is the opcode to store, N is a number the
- jump uses, say, to decide how many times to jump.
-
- If you call this function, you must zero out pending_exact. */
+static boolean
+at_begline_loc_p (pattern, p, syntax)
+ const char *pattern, *p;
+ reg_syntax_t syntax;
+{
+ const char *prev = p - 2;
+ boolean prev_prev_backslash = prev > pattern && prev[-1] == '\\';
+
+ return
+ /* After a subexpression? */
+ (*prev == '(' && (syntax & RE_NO_BK_PARENS || prev_prev_backslash))
+ /* After an alternative? */
+ || (*prev == '|' && (syntax & RE_NO_BK_VBAR || prev_prev_backslash));
+}
-static void
-store_jump_n (from, opcode, to, n)
- char *from, *to;
- int opcode;
- unsigned n;
+
+/* The dual of at_begline_loc_p. This one is for $. We assume there is
+ at least one character after the $, i.e., `P < PEND'. */
+
+static boolean
+at_endline_loc_p (p, pend, syntax)
+ const char *p, *pend;
+ reg_syntax_t syntax;
{
- from[0] = (char)opcode;
- STORE_NUMBER (from + 1, to - (from + 3));
- STORE_NUMBER (from + 3, n);
+ const char *next = p;
+ boolean next_backslash = *next == '\\';
+ const char *next_next = p + 1 < pend ? p + 1 : NULL;
+
+ return
+ /* Before a subexpression? */
+ (syntax & RE_NO_BK_PARENS ? *next == ')'
+ : next_backslash && next_next && *next_next == ')')
+ /* Before an alternative? */
+ || (syntax & RE_NO_BK_VBAR ? *next == '|'
+ : next_backslash && next_next && *next_next == '|');
}
-/* Similar to insert_jump, but handles a jump which needs an extra
- number to handle minimum and maximum cases. Open up space at
- location FROM, and insert there a jump to TO. CURRENT_END gives the
- end of the storage in use, so we know how much data to copy up. OP is
- the opcode of the jump to insert.
+/* Returns true if REGNUM is in one of COMPILE_STACK's elements and
+ false if it's not. */
- If you call this function, you must zero out pending_exact. */
+static boolean
+group_in_compile_stack (compile_stack, regnum)
+ compile_stack_type compile_stack;
+ regnum_t regnum;
+{
+ int this_element;
-static void
-insert_jump_n (op, from, to, current_end, n)
- int op;
- char *from, *to, *current_end;
- unsigned n;
+ for (this_element = compile_stack.avail - 1;
+ this_element >= 0;
+ this_element--)
+ if (compile_stack.stack[this_element].regnum == regnum)
+ return true;
+
+ return false;
+}
+
+
+/* Read the ending character of a range (in a bracket expression) from the
+ uncompiled pattern *P_PTR (which ends at PEND). We assume the
+ starting character is in `P[-2]'. (`P[-1]' is the character `-'.)
+ Then we set the translation of all bits between the starting and
+ ending characters (inclusive) in the compiled pattern B.
+
+ Return an error code.
+
+ We use these short variable names so we can use the same macros as
+ `regex_compile' itself. */
+
+static reg_errcode_t
+compile_range (p_ptr, pend, translate, syntax, b)
+ const char **p_ptr, *pend;
+ char *translate;
+ reg_syntax_t syntax;
+ unsigned char *b;
{
- register char *pfrom = current_end; /* Copy from here... */
- register char *pto = current_end + 5; /* ...to here. */
+ unsigned this_char;
- while (pfrom != from)
- *--pto = *--pfrom;
- store_jump_n (from, op, to, n);
+ const char *p = *p_ptr;
+ int range_start, range_end;
+
+ if (p == pend)
+ return REG_ERANGE;
+
+ /* Even though the pattern is a signed `char *', we need to fetch
+ with unsigned char *'s; if the high bit of the pattern character
+ is set, the range endpoints will be negative if we fetch using a
+ signed char *.
+
+ We also want to fetch the endpoints without translating them; the
+ appropriate translation is done in the bit-setting loop below. */
+ range_start = ((unsigned char *) p)[-2];
+ range_end = ((unsigned char *) p)[0];
+
+ /* Have to increment the pointer into the pattern string, so the
+ caller isn't still at the ending character. */
+ (*p_ptr)++;
+
+ /* If the start is after the end, the range is empty. */
+ if (range_start > range_end)
+ return syntax & RE_NO_EMPTY_RANGES ? REG_ERANGE : REG_NOERROR;
+
+ /* Here we see why `this_char' has to be larger than an `unsigned
+ char' -- the range is inclusive, so if `range_end' == 0xff
+ (assuming 8-bit characters), we would otherwise go into an infinite
+ loop, since all characters <= 0xff. */
+ for (this_char = range_start; this_char <= range_end; this_char++)
+ {
+ SET_LIST_BIT (TRANSLATE (this_char));
+ }
+
+ return REG_NOERROR;
}
+
+/* Failure stack declarations and macros; both re_compile_fastmap and
+ re_match_2 use a failure stack. These have to be macros because of
+ REGEX_ALLOCATE. */
+
+/* Number of failure points for which to initially allocate space
+ when matching. If this number is exceeded, we allocate more
+ space, so it is not a hard limit. */
+#ifndef INIT_FAILURE_ALLOC
+#define INIT_FAILURE_ALLOC 5
+#endif
-/* Open up space at location THERE, and insert operation OP followed by
- NUM_1 and NUM_2. CURRENT_END gives the end of the storage in use, so
- we know how much data to copy up.
+/* Roughly the maximum number of failure points on the stack. Would be
+ exactly that if always used MAX_FAILURE_SPACE each time we failed.
+ This is a variable only so users of regex can assign to it; we never
+ change it ourselves. */
+int re_max_failures = 2000;
- If you call this function, you must zero out pending_exact. */
+typedef const unsigned char *fail_stack_elt_t;
-static void
-insert_op_2 (op, there, current_end, num_1, num_2)
- int op;
- char *there, *current_end;
- int num_1, num_2;
+typedef struct
{
- register char *pfrom = current_end; /* Copy from here... */
- register char *pto = current_end + 5; /* ...to here. */
+ fail_stack_elt_t *stack;
+ unsigned size;
+ unsigned avail; /* Offset of next open position. */
+} fail_stack_type;
- while (pfrom != there)
- *--pto = *--pfrom;
-
- there[0] = (char)op;
- STORE_NUMBER (there + 1, num_1);
- STORE_NUMBER (there + 3, num_2);
-}
+#define FAIL_STACK_EMPTY() (fail_stack.avail == 0)
+#define FAIL_STACK_PTR_EMPTY() (fail_stack_ptr->avail == 0)
+#define FAIL_STACK_FULL() (fail_stack.avail == fail_stack.size)
+#define FAIL_STACK_TOP() (fail_stack.stack[fail_stack.avail])
+
+
+/* Initialize `fail_stack'. Do `return -2' if the alloc fails. */
+
+#define INIT_FAIL_STACK() \
+ do { \
+ fail_stack.stack = (fail_stack_elt_t *) \
+ REGEX_ALLOCATE (INIT_FAILURE_ALLOC * sizeof (fail_stack_elt_t)); \
+ \
+ if (fail_stack.stack == NULL) \
+ return -2; \
+ \
+ fail_stack.size = INIT_FAILURE_ALLOC; \
+ fail_stack.avail = 0; \
+ } while (0)
+
+
+/* Double the size of FAIL_STACK, up to approximately `re_max_failures' items.
+
+ Return 1 if succeeds, and 0 if either ran out of memory
+ allocating space for it or it was already too large.
+
+ REGEX_REALLOCATE requires `destination' be declared. */
+
+#define DOUBLE_FAIL_STACK(fail_stack) \
+ ((fail_stack).size > re_max_failures * MAX_FAILURE_ITEMS \
+ ? 0 \
+ : ((fail_stack).stack = (fail_stack_elt_t *) \
+ REGEX_REALLOCATE ((fail_stack).stack, \
+ (fail_stack).size * sizeof (fail_stack_elt_t), \
+ ((fail_stack).size << 1) * sizeof (fail_stack_elt_t)), \
+ \
+ (fail_stack).stack == NULL \
+ ? 0 \
+ : ((fail_stack).size <<= 1, \
+ 1)))
+
+
+/* Push PATTERN_OP on FAIL_STACK.
+
+ Return 1 if was able to do so and 0 if ran out of memory allocating
+ space to do so. */
+#define PUSH_PATTERN_OP(pattern_op, fail_stack) \
+ ((FAIL_STACK_FULL () \
+ && !DOUBLE_FAIL_STACK (fail_stack)) \
+ ? 0 \
+ : ((fail_stack).stack[(fail_stack).avail++] = pattern_op, \
+ 1))
+
+/* This pushes an item onto the failure stack. Must be a four-byte
+ value. Assumes the variable `fail_stack'. Probably should only
+ be called from within `PUSH_FAILURE_POINT'. */
+#define PUSH_FAILURE_ITEM(item) \
+ fail_stack.stack[fail_stack.avail++] = (fail_stack_elt_t) item
+
+/* The complement operation. Assumes `fail_stack' is nonempty. */
+#define POP_FAILURE_ITEM() fail_stack.stack[--fail_stack.avail]
+
+/* Used to omit pushing failure point id's when we're not debugging. */
+#ifdef DEBUG
+#define DEBUG_PUSH PUSH_FAILURE_ITEM
+#define DEBUG_POP(item_addr) *(item_addr) = POP_FAILURE_ITEM ()
+#else
+#define DEBUG_PUSH(item)
+#define DEBUG_POP(item_addr)
+#endif
+
+
+/* Push the information about the state we will need
+ if we ever fail back to it.
+
+ Requires variables fail_stack, regstart, regend, reg_info, and
+ num_regs be declared. DOUBLE_FAIL_STACK requires `destination' be
+ declared.
+
+ Does `return FAILURE_CODE' if runs out of memory. */
+
+#define PUSH_FAILURE_POINT(pattern_place, string_place, failure_code) \
+ do { \
+ char *destination; \
+ /* Must be int, so when we don't save any registers, the arithmetic \
+ of 0 + -1 isn't done as unsigned. */ \
+ /* Can't be int, since there is not a shred of a guarantee that int \
+ is wide enough to hold a value of something to which pointer can \
+ be assigned */ \
+ s_reg_t this_reg; \
+ \
+ DEBUG_STATEMENT (failure_id++); \
+ DEBUG_STATEMENT (nfailure_points_pushed++); \
+ DEBUG_PRINT2 ("\nPUSH_FAILURE_POINT #%u:\n", failure_id); \
+ DEBUG_PRINT2 (" Before push, next avail: %d\n", (fail_stack).avail);\
+ DEBUG_PRINT2 (" size: %d\n", (fail_stack).size);\
+ \
+ DEBUG_PRINT2 (" slots needed: %d\n", NUM_FAILURE_ITEMS); \
+ DEBUG_PRINT2 (" available: %d\n", REMAINING_AVAIL_SLOTS); \
+ \
+ /* Ensure we have enough space allocated for what we will push. */ \
+ while (REMAINING_AVAIL_SLOTS < NUM_FAILURE_ITEMS) \
+ { \
+ if (!DOUBLE_FAIL_STACK (fail_stack)) \
+ return failure_code; \
+ \
+ DEBUG_PRINT2 ("\n Doubled stack; size now: %d\n", \
+ (fail_stack).size); \
+ DEBUG_PRINT2 (" slots available: %d\n", REMAINING_AVAIL_SLOTS);\
+ }
+
+#define PUSH_FAILURE_POINT2(pattern_place, string_place, failure_code) \
+ /* Push the info, starting with the registers. */ \
+ DEBUG_PRINT1 ("\n"); \
+ \
+ PUSH_FAILURE_POINT_LOOP (); \
+ \
+ DEBUG_PRINT2 (" Pushing low active reg: %d\n", lowest_active_reg);\
+ PUSH_FAILURE_ITEM (lowest_active_reg); \
+ \
+ DEBUG_PRINT2 (" Pushing high active reg: %d\n", highest_active_reg);\
+ PUSH_FAILURE_ITEM (highest_active_reg); \
+ \
+ DEBUG_PRINT2 (" Pushing pattern 0x%x: ", pattern_place); \
+ DEBUG_PRINT_COMPILED_PATTERN (bufp, pattern_place, pend); \
+ PUSH_FAILURE_ITEM (pattern_place); \
+ \
+ DEBUG_PRINT2 (" Pushing string 0x%x: `", string_place); \
+ DEBUG_PRINT_DOUBLE_STRING (string_place, string1, size1, string2, \
+ size2); \
+ DEBUG_PRINT1 ("'\n"); \
+ PUSH_FAILURE_ITEM (string_place); \
+ \
+ DEBUG_PRINT2 (" Pushing failure id: %u\n", failure_id); \
+ DEBUG_PUSH (failure_id); \
+ } while (0)
+
+/* Pulled out of PUSH_FAILURE_POINT() to shorten the definition
+ of that macro. (for VAX C) */
+#define PUSH_FAILURE_POINT_LOOP() \
+ for (this_reg = lowest_active_reg; this_reg <= highest_active_reg; \
+ this_reg++) \
+ { \
+ DEBUG_PRINT2 (" Pushing reg: %d\n", this_reg); \
+ DEBUG_STATEMENT (num_regs_pushed++); \
+ \
+ DEBUG_PRINT2 (" start: 0x%x\n", regstart[this_reg]); \
+ PUSH_FAILURE_ITEM (regstart[this_reg]); \
+ \
+ DEBUG_PRINT2 (" end: 0x%x\n", regend[this_reg]); \
+ PUSH_FAILURE_ITEM (regend[this_reg]); \
+ \
+ DEBUG_PRINT2 (" info: 0x%x\n ", reg_info[this_reg]); \
+ DEBUG_PRINT2 (" match_null=%d", \
+ REG_MATCH_NULL_STRING_P (reg_info[this_reg])); \
+ DEBUG_PRINT2 (" active=%d", IS_ACTIVE (reg_info[this_reg])); \
+ DEBUG_PRINT2 (" matched_something=%d", \
+ MATCHED_SOMETHING (reg_info[this_reg])); \
+ DEBUG_PRINT2 (" ever_matched=%d", \
+ EVER_MATCHED_SOMETHING (reg_info[this_reg])); \
+ DEBUG_PRINT1 ("\n"); \
+ PUSH_FAILURE_ITEM (reg_info[this_reg].word); \
+ }
+
+/* This is the number of items that are pushed and popped on the stack
+ for each register. */
+#define NUM_REG_ITEMS 3
+
+/* Individual items aside from the registers. */
+#ifdef DEBUG
+#define NUM_NONREG_ITEMS 5 /* Includes failure point id. */
+#else
+#define NUM_NONREG_ITEMS 4
+#endif
+
+/* We push at most this many items on the stack. */
+#define MAX_FAILURE_ITEMS ((num_regs - 1) * NUM_REG_ITEMS + NUM_NONREG_ITEMS)
+
+/* We actually push this many items. */
+#define NUM_FAILURE_ITEMS \
+ ((highest_active_reg - lowest_active_reg + 1) * NUM_REG_ITEMS \
+ + NUM_NONREG_ITEMS)
+
+/* How many items can still be added to the stack without overflowing it. */
+#define REMAINING_AVAIL_SLOTS ((fail_stack).size - (fail_stack).avail)
+
+
+/* Pops what PUSH_FAIL_STACK pushes.
+
+ We restore into the parameters, all of which should be lvalues:
+ STR -- the saved data position.
+ PAT -- the saved pattern position.
+ LOW_REG, HIGH_REG -- the highest and lowest active registers.
+ REGSTART, REGEND -- arrays of string positions.
+ REG_INFO -- array of information about each subexpression.
+
+ Also assumes the variables `fail_stack' and (if debugging), `bufp',
+ `pend', `string1', `size1', `string2', and `size2'. */
+
+#define POP_FAILURE_POINT(str, pat, low_reg, high_reg, regstart, regend, reg_info)\
+{ \
+ DEBUG_STATEMENT (fail_stack_elt_t failure_id;) \
+ s_reg_t this_reg; \
+ const unsigned char *string_temp; \
+ \
+ assert (!FAIL_STACK_EMPTY ()); \
+ \
+ /* Remove failure points and point to how many regs pushed. */ \
+ DEBUG_PRINT1 ("POP_FAILURE_POINT:\n"); \
+ DEBUG_PRINT2 (" Before pop, next avail: %d\n", fail_stack.avail); \
+ DEBUG_PRINT2 (" size: %d\n", fail_stack.size); \
+ \
+ assert (fail_stack.avail >= NUM_NONREG_ITEMS); \
+ \
+ DEBUG_POP (&failure_id); \
+ DEBUG_PRINT2 (" Popping failure id: %u\n", failure_id); \
+ \
+ /* If the saved string location is NULL, it came from an \
+ on_failure_keep_string_jump opcode, and we want to throw away the \
+ saved NULL, thus retaining our current position in the string. */ \
+ string_temp = POP_FAILURE_ITEM (); \
+ if (string_temp != NULL) \
+ str = (const char *) string_temp; \
+ \
+ DEBUG_PRINT2 (" Popping string 0x%x: `", str); \
+ DEBUG_PRINT_DOUBLE_STRING (str, string1, size1, string2, size2); \
+ DEBUG_PRINT1 ("'\n"); \
+ \
+ pat = (unsigned char *) POP_FAILURE_ITEM (); \
+ DEBUG_PRINT2 (" Popping pattern 0x%x: ", pat); \
+ DEBUG_PRINT_COMPILED_PATTERN (bufp, pat, pend); \
+ \
+ POP_FAILURE_POINT2 (low_reg, high_reg, regstart, regend, reg_info);
+/* Pulled out of POP_FAILURE_POINT() to shorten the definition
+ of that macro. (for MSC 5.1) */
+#define POP_FAILURE_POINT2(low_reg, high_reg, regstart, regend, reg_info) \
+ \
+ /* Restore register info. */ \
+ high_reg = (active_reg_t) POP_FAILURE_ITEM (); \
+ DEBUG_PRINT2 (" Popping high active reg: %d\n", high_reg); \
+ \
+ low_reg = (active_reg_t) POP_FAILURE_ITEM (); \
+ DEBUG_PRINT2 (" Popping low active reg: %d\n", low_reg); \
+ \
+ for (this_reg = high_reg; this_reg >= low_reg; this_reg--) \
+ { \
+ DEBUG_PRINT2 (" Popping reg: %d\n", this_reg); \
+ \
+ reg_info[this_reg].word = POP_FAILURE_ITEM (); \
+ DEBUG_PRINT2 (" info: 0x%x\n", reg_info[this_reg]); \
+ \
+ regend[this_reg] = (const char *) POP_FAILURE_ITEM (); \
+ DEBUG_PRINT2 (" end: 0x%x\n", regend[this_reg]); \
+ \
+ regstart[this_reg] = (const char *) POP_FAILURE_ITEM (); \
+ DEBUG_PRINT2 (" start: 0x%x\n", regstart[this_reg]); \
+ } \
+ \
+ DEBUG_STATEMENT (nfailure_points_popped++); \
+} /* POP_FAILURE_POINT */
-/* Given a pattern, compute a fastmap from it. The fastmap records
- which of the (1 << BYTEWIDTH) possible characters can start a string
- that matches the pattern. This fastmap is used by re_search to skip
- quickly over totally implausible text.
+/* re_compile_fastmap computes a ``fastmap'' for the compiled pattern in
+ BUFP. A fastmap records which of the (1 << BYTEWIDTH) possible
+ characters can start a string that matches the pattern. This fastmap
+ is used by re_search to skip quickly over impossible starting points.
- The caller must supply the address of a (1 << BYTEWIDTH)-byte data
- area as bufp->fastmap.
- The other components of bufp describe the pattern to be used. */
+ The caller must supply the address of a (1 << BYTEWIDTH)-byte data
+ area as BUFP->fastmap.
+
+ We set the `fastmap', `fastmap_accurate', and `can_be_null' fields in
+ the pattern buffer.
-void
+ Returns 0 if we succeed, -2 if an internal error. */
+
+int
re_compile_fastmap (bufp)
struct re_pattern_buffer *bufp;
{
- unsigned char *pattern = (unsigned char *) bufp->buffer;
- int size = bufp->used;
+ int j, k;
+ fail_stack_type fail_stack;
+#ifndef REGEX_MALLOC
+ char *destination;
+#endif
+ /* We don't push any register information onto the failure stack. */
+ unsigned num_regs = 0;
+
register char *fastmap = bufp->fastmap;
- register unsigned char *p = pattern;
- register unsigned char *pend = pattern + size;
- register int j, k;
- unsigned char *translate = (unsigned char *) bufp->translate;
- unsigned is_a_succeed_n;
+ unsigned char *pattern = bufp->buffer;
+ const unsigned char *p = pattern;
+ register unsigned char *pend = pattern + bufp->used;
-#ifndef NO_ALLOCA
- unsigned char *stackb[NFAILURES];
- unsigned char **stackp = stackb;
+ /* Assume that each path through the pattern can be null until
+ proven otherwise. We set this false at the bottom of switch
+ statement, to which we get only if a particular path doesn't
+ match the empty string. */
+ boolean path_can_be_null = true;
-#else
- unsigned char **stackb;
- unsigned char **stackp;
- stackb = (unsigned char **) malloc (NFAILURES * sizeof (unsigned char *));
- stackp = stackb;
-
-#endif /* NO_ALLOCA */
- memset (fastmap, 0, (1 << BYTEWIDTH));
- bufp->fastmap_accurate = 1;
+ /* We aren't doing a `succeed_n' to begin with. */
+ boolean succeed_n_p = false;
+
+ assert (fastmap != NULL && p != NULL);
+
+ INIT_FAIL_STACK ();
+ bzero (fastmap, 1 << BYTEWIDTH); /* Assume nothing's valid. */
+ bufp->fastmap_accurate = 1; /* It will be when we're done. */
bufp->can_be_null = 0;
- while (p)
+ while (p != pend || !FAIL_STACK_EMPTY ())
{
- is_a_succeed_n = 0;
if (p == pend)
- {
- bufp->can_be_null = 1;
- break;
+ {
+ bufp->can_be_null |= path_can_be_null;
+
+ /* Reset for next path. */
+ path_can_be_null = true;
+
+ p = fail_stack.stack[--fail_stack.avail];
}
+
+ /* We should never be about to go beyond the end of the pattern. */
+ assert (p < pend);
+
#ifdef SWITCH_ENUM_BUG
- switch ((int) ((enum regexpcode) *p++))
+ switch ((int) ((re_opcode_t) *p++))
#else
- switch ((enum regexpcode) *p++)
+ switch ((re_opcode_t) *p++)
#endif
{
+
+ /* I guess the idea here is to simply not bother with a fastmap
+ if a backreference is used, since it's too hard to figure out
+ the fastmap for the corresponding group. Setting
+ `can_be_null' stops `re_search_2' from using the fastmap, so
+ that is all we do. */
+ case duplicate:
+ bufp->can_be_null = 1;
+ return 0;
+
+
+ /* Following are the cases which match a character. These end
+ with `break'. */
+
case exactn:
- if (translate)
- fastmap[translate[p[1]]] = 1;
- else
- fastmap[p[1]] = 1;
+ fastmap[p[1]] = 1;
break;
- case begline:
- case before_dot:
+
+ case charset:
+ for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
+ if (p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH)))
+ fastmap[j] = 1;
+ break;
+
+
+ case charset_not:
+ /* Chars beyond end of map must be allowed. */
+ for (j = *p * BYTEWIDTH; j < (1 << BYTEWIDTH); j++)
+ fastmap[j] = 1;
+
+ for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
+ if (!(p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH))))
+ fastmap[j] = 1;
+ break;
+
+
+ case wordchar:
+ for (j = 0; j < (1 << BYTEWIDTH); j++)
+ if (SYNTAX (j) == Sword)
+ fastmap[j] = 1;
+ break;
+
+
+ case notwordchar:
+ for (j = 0; j < (1 << BYTEWIDTH); j++)
+ if (SYNTAX (j) != Sword)
+ fastmap[j] = 1;
+ break;
+
+
+ case anychar:
+ /* `.' matches anything ... */
+ for (j = 0; j < (1 << BYTEWIDTH); j++)
+ fastmap[j] = 1;
+
+ /* ... except perhaps newline. */
+ if (!(bufp->syntax & RE_DOT_NEWLINE))
+ fastmap['\n'] = 0;
+
+ /* Return if we have already set `can_be_null'; if we have,
+ then the fastmap is irrelevant. Something's wrong here. */
+ else if (bufp->can_be_null)
+ return 0;
+
+ /* Otherwise, have to check alternative paths. */
+ break;
+
+
+#ifdef emacs
+ case syntaxspec:
+ k = *p++;
+ for (j = 0; j < (1 << BYTEWIDTH); j++)
+ if (SYNTAX (j) == (enum syntaxcode) k)
+ fastmap[j] = 1;
+ break;
+
+
+ case notsyntaxspec:
+ k = *p++;
+ for (j = 0; j < (1 << BYTEWIDTH); j++)
+ if (SYNTAX (j) != (enum syntaxcode) k)
+ fastmap[j] = 1;
+ break;
+
+
+ /* All cases after this match the empty string. These end with
+ `continue'. */
+
+
+ case before_dot:
case at_dot:
case after_dot:
+ continue;
+#endif /* not emacs */
+
+
+ case no_op:
+ case begline:
+ case endline:
case begbuf:
case endbuf:
case wordbound:
case notwordbound:
case wordbeg:
case wordend:
+ case push_dummy_failure:
continue;
- case endline:
- if (translate)
- fastmap[translate['\n']] = 1;
- else
- fastmap['\n'] = 1;
-
- if (bufp->can_be_null != 1)
- bufp->can_be_null = 2;
- break;
case jump_n:
- case finalize_jump:
- case maybe_finalize_jump:
+ case pop_failure_jump:
+ case maybe_pop_jump:
case jump:
+ case jump_past_alt:
case dummy_failure_jump:
EXTRACT_NUMBER_AND_INCR (j, p);
p += j;
if (j > 0)
continue;
- /* Jump backward reached implies we just went through
- the body of a loop and matched nothing.
- Opcode jumped to should be an on_failure_jump.
- Just treat it like an ordinary jump.
- For a * loop, it has pushed its failure point already;
- If so, discard that as redundant. */
-
- if ((enum regexpcode) *p != on_failure_jump
- && (enum regexpcode) *p != succeed_n)
+
+ /* Jump backward implies we just went through the body of a
+ loop and matched nothing. Opcode jumped to should be
+ `on_failure_jump' or `succeed_n'. Just treat it like an
+ ordinary jump. For a * loop, it has pushed its failure
+ point already; if so, discard that as redundant. */
+ if ((re_opcode_t) *p != on_failure_jump
+ && (re_opcode_t) *p != succeed_n)
continue;
+
p++;
EXTRACT_NUMBER_AND_INCR (j, p);
p += j;
- if (stackp != stackb && *stackp == p)
- stackp--;
- continue;
+ /* If what's on the stack is where we are now, pop it. */
+ if (!FAIL_STACK_EMPTY ()
+ && fail_stack.stack[fail_stack.avail - 1] == p)
+ fail_stack.avail--;
+
+ continue;
+
+
case on_failure_jump:
+ case on_failure_keep_string_jump:
handle_on_failure_jump:
EXTRACT_NUMBER_AND_INCR (j, p);
- *++stackp = p + j;
- if (is_a_succeed_n)
- EXTRACT_NUMBER_AND_INCR (k, p); /* Skip the n. */
- continue;
+
+ /* For some patterns, e.g., `(a?)?', `p+j' here points to the
+ end of the pattern. We don't want to push such a point,
+ since when we restore it above, entering the switch will
+ increment `p' past the end of the pattern. We don't need
+ to push such a point since we obviously won't find any more
+ fastmap entries beyond `pend'. Such a pattern can match
+ the null string, though. */
+ if (p + j < pend)
+ {
+ if (!PUSH_PATTERN_OP (p + j, fail_stack))
+ return -2;
+ }
+ else
+ bufp->can_be_null = 1;
+
+ if (succeed_n_p)
+ {
+ EXTRACT_NUMBER_AND_INCR (k, p); /* Skip the n. */
+ succeed_n_p = false;
+ }
+
+ continue;
+
case succeed_n:
- is_a_succeed_n = 1;
/* Get to the number of times to succeed. */
p += 2;
- /* Increment p past the n for when k != 0. */
+
+ /* Increment p past the n for when k != 0. */
EXTRACT_NUMBER_AND_INCR (k, p);
if (k == 0)
{
p -= 4;
+ succeed_n_p = true; /* Spaghetti code alert. */
goto handle_on_failure_jump;
}
continue;
-
+
+
case set_number_at:
p += 4;
continue;
- case start_memory:
- case stop_memory:
- p++;
- continue;
-
- case duplicate:
- bufp->can_be_null = 1;
- fastmap['\n'] = 1;
- case anychar:
- for (j = 0; j < (1 << BYTEWIDTH); j++)
- if (j != '\n')
- fastmap[j] = 1;
- if (bufp->can_be_null)
- {
- FREE_AND_RETURN_VOID(stackb);
- }
- /* Don't return; check the alternative paths
- so we can set can_be_null if appropriate. */
- break;
-
- case wordchar:
- for (j = 0; j < (1 << BYTEWIDTH); j++)
- if (SYNTAX (j) == Sword)
- fastmap[j] = 1;
- break;
-
- case notwordchar:
- for (j = 0; j < (1 << BYTEWIDTH); j++)
- if (SYNTAX (j) != Sword)
- fastmap[j] = 1;
- break;
-#ifdef emacs
- case syntaxspec:
- k = *p++;
- for (j = 0; j < (1 << BYTEWIDTH); j++)
- if (SYNTAX (j) == (enum syntaxcode) k)
- fastmap[j] = 1;
- break;
-
- case notsyntaxspec:
- k = *p++;
- for (j = 0; j < (1 << BYTEWIDTH); j++)
- if (SYNTAX (j) != (enum syntaxcode) k)
- fastmap[j] = 1;
- break;
-
-#else /* not emacs */
- case syntaxspec:
- case notsyntaxspec:
- break;
-#endif /* not emacs */
+ case start_memory:
+ case stop_memory:
+ p += 2;
+ continue;
- case charset:
- for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
- if (p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH)))
- {
- if (translate)
- fastmap[translate[j]] = 1;
- else
- fastmap[j] = 1;
- }
- break;
- case charset_not:
- /* Chars beyond end of map must be allowed */
- for (j = *p * BYTEWIDTH; j < (1 << BYTEWIDTH); j++)
- if (translate)
- fastmap[translate[j]] = 1;
- else
- fastmap[j] = 1;
+ default:
+ abort (); /* We have listed all the cases. */
+ } /* switch *p++ */
+
+ /* Getting here means we have found the possible starting
+ characters for one path of the pattern -- and that the empty
+ string does not match. We need not follow this path further.
+ Instead, look at the next alternative (remembered on the
+ stack), or quit if no more. The test at the top of the loop
+ does these things. */
+ path_can_be_null = false;
+ p = pend;
+ } /* while p */
+
+ /* Set `can_be_null' for the last path (also the first path, if the
+ pattern is empty). */
+ bufp->can_be_null |= path_can_be_null;
+ return 0;
+} /* re_compile_fastmap */
+
+/* Set REGS to hold NUM_REGS registers, storing them in STARTS and
+ ENDS. Subsequent matches using PATTERN_BUFFER and REGS will use
+ this memory for recording register information. STARTS and ENDS
+ must be allocated using the malloc library routine, and must each
+ be at least NUM_REGS * sizeof (regoff_t) bytes long.
- for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
- if (!(p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH))))
- {
- if (translate)
- fastmap[translate[j]] = 1;
- else
- fastmap[j] = 1;
- }
- break;
+ If NUM_REGS == 0, then subsequent matches should allocate their own
+ register data.
- case unused: /* pacify gcc -Wall */
- break;
- }
+ Unless this function is called, the first search or match using
+ PATTERN_BUFFER will allocate its own register data, without
+ freeing the old data. */
- /* Get here means we have successfully found the possible starting
- characters of one path of the pattern. We need not follow this
- path any farther. Instead, look at the next alternative
- remembered in the stack. */
- if (stackp != stackb)
- p = *stackp--;
- else
- break;
+void
+re_set_registers (bufp, regs, num_regs, starts, ends)
+ struct re_pattern_buffer *bufp;
+ struct re_registers *regs;
+ unsigned num_regs;
+ regoff_t *starts, *ends;
+{
+ if (num_regs)
+ {
+ bufp->regs_allocated = REGS_REALLOCATE;
+ regs->num_regs = num_regs;
+ regs->start = starts;
+ regs->end = ends;
+ }
+ else
+ {
+ bufp->regs_allocated = REGS_UNALLOCATED;
+ regs->num_regs = 0;
+ regs->start = regs->end = 0;
}
- FREE_AND_RETURN_VOID(stackb);
}
-
-
+/* Searching routines. */
+
/* Like re_search_2, below, but only one string is specified, and
doesn't let you say where to stop matching. */
int
-re_search (pbufp, string, size, startpos, range, regs)
- struct re_pattern_buffer *pbufp;
- char *string;
+re_search (bufp, string, size, startpos, range, regs)
+ struct re_pattern_buffer *bufp;
+ const char *string;
int size, startpos, range;
struct re_registers *regs;
{
- return re_search_2 (pbufp, (char *) 0, 0, string, size, startpos, range,
+ return re_search_2 (bufp, NULL, 0, string, size, startpos, range,
regs, size);
}
-/* Using the compiled pattern in PBUFP->buffer, first tries to match the
+/* Using the compiled pattern in BUFP->buffer, first tries to match the
virtual concatenation of STRING1 and STRING2, starting first at index
- STARTPOS, then at STARTPOS + 1, and so on. RANGE is the number of
- places to try before giving up. If RANGE is negative, it searches
- backwards, i.e., the starting positions tried are STARTPOS, STARTPOS
- - 1, etc. STRING1 and STRING2 are of SIZE1 and SIZE2, respectively.
+ STARTPOS, then at STARTPOS + 1, and so on.
+
+ STRING1 and STRING2 have length SIZE1 and SIZE2, respectively.
+
+ RANGE is how far to scan while trying to match. RANGE = 0 means try
+ only at STARTPOS; in general, the last start tried is STARTPOS +
+ RANGE.
+
In REGS, return the indices of the virtual concatenation of STRING1
- and STRING2 that matched the entire PBUFP->buffer and its contained
- subexpressions. Do not consider matching one past the index MSTOP in
- the virtual concatenation of STRING1 and STRING2.
+ and STRING2 that matched the entire BUFP->buffer and its contained
+ subexpressions.
+
+ Do not consider matching one past the index STOP in the virtual
+ concatenation of STRING1 and STRING2.
- The value returned is the position in the strings at which the match
- was found, or -1 if no match was found, or -2 if error (such as
- failure stack overflow). */
+ We return either the position in the strings at which the match was
+ found, -1 if no match, or -2 if error (such as failure
+ stack overflow). */
int
-re_search_2 (pbufp, string1, size1, string2, size2, startpos, range,
- regs, mstop)
- struct re_pattern_buffer *pbufp;
- char *string1, *string2;
+re_search_2 (bufp, string1, size1, string2, size2, startpos, range, regs, stop)
+ struct re_pattern_buffer *bufp;
+ const char *string1, *string2;
int size1, size2;
int startpos;
- register int range;
+ int range;
struct re_registers *regs;
- int mstop;
+ int stop;
{
- register char *fastmap = pbufp->fastmap;
- register unsigned char *translate = (unsigned char *) pbufp->translate;
+ int val;
+ register char *fastmap = bufp->fastmap;
+ register char *translate = bufp->translate;
int total_size = size1 + size2;
int endpos = startpos + range;
- int val;
- /* Check for out-of-range starting position. */
- if (startpos < 0 || startpos > total_size)
+ /* Check for out-of-range STARTPOS. */
+ if (startpos < 0 || startpos > total_size)
return -1;
- /* Fix up range if it would eventually take startpos outside of the
- virtual concatenation of string1 and string2. */
+ /* Fix up RANGE if it might eventually take us outside
+ the virtual concatenation of STRING1 and STRING2. */
if (endpos < -1)
range = -1 - startpos;
else if (endpos > total_size)
range = total_size - startpos;
- /* Update the fastmap now if not correct already. */
- if (fastmap && !pbufp->fastmap_accurate)
- re_compile_fastmap (pbufp);
-
/* If the search isn't to be a backwards one, don't waste time in a
- long search for a pattern that says it is anchored. */
- if (pbufp->used > 0 && (enum regexpcode) pbufp->buffer[0] == begbuf
- && range > 0)
+ search for a pattern that must be anchored. */
+ if (bufp->used > 0 && (re_opcode_t) bufp->buffer[0] == begbuf && range > 0)
{
if (startpos > 0)
return -1;
@@ -1628,65 +3014,68 @@ re_search_2 (pbufp, string1, size1, string2, size2, startpos, range,
range = 1;
}
- while (1)
+ /* Update the fastmap now if not correct already. */
+ if (fastmap && !bufp->fastmap_accurate)
+ if (re_compile_fastmap (bufp) == -2)
+ return -2;
+
+ /* Loop through the string, looking for a place to start matching. */
+ for (;;)
{
/* If a fastmap is supplied, skip quickly over characters that
- cannot possibly be the start of a match. Note, however, that
- if the pattern can possibly match the null string, we must
- test it at each starting point so that we take the first null
- string we get. */
-
- if (fastmap && startpos < total_size && pbufp->can_be_null != 1)
+ cannot be the start of a match. If the pattern can match the
+ null string, however, we don't need to skip characters; we want
+ the first null string. */
+ if (fastmap && startpos < total_size && !bufp->can_be_null)
{
if (range > 0) /* Searching forwards. */
{
+ register const char *d;
register int lim = 0;
- register unsigned char *p;
int irange = range;
- if (startpos < size1 && startpos + range >= size1)
- lim = range - (size1 - startpos);
- p = ((unsigned char *)
- &(startpos >= size1 ? string2 - size1 : string1)[startpos]);
+ if (startpos < size1 && startpos + range >= size1)
+ lim = range - (size1 - startpos);
+
+ d = (startpos >= size1 ? string2 - size1 : string1) + startpos;
+
+ /* Written out as an if-else to avoid testing `translate'
+ inside the loop. */
+ if (translate)
+ while (range > lim
+ && !fastmap[(unsigned char)
+ translate[(unsigned char) *d++]])
+ range--;
+ else
+ while (range > lim && !fastmap[(unsigned char) *d++])
+ range--;
- while (range > lim && !fastmap[translate
- ? translate[*p++]
- : *p++])
- range--;
startpos += irange - range;
}
else /* Searching backwards. */
{
- register unsigned char c;
-
- if (string1 == 0 || startpos >= size1)
- c = string2[startpos - size1];
- else
- c = string1[startpos];
+ register char c = (size1 == 0 || startpos >= size1
+ ? string2[startpos - size1]
+ : string1[startpos]);
- c &= 0xff;
- if (translate ? !fastmap[translate[c]] : !fastmap[c])
+ if (!fastmap[(unsigned char) TRANSLATE (c)])
goto advance;
}
}
- if (range >= 0 && startpos == total_size
- && fastmap && pbufp->can_be_null == 0)
+ /* If can't match the null string, and that's all we have left, fail. */
+ if (range >= 0 && startpos == total_size && fastmap
+ && !bufp->can_be_null)
return -1;
- val = re_match_2 (pbufp, string1, size1, string2, size2, startpos,
- regs, mstop);
+ val = re_match_2 (bufp, string1, size1, string2, size2,
+ startpos, regs, stop);
if (val >= 0)
return startpos;
+
if (val == -2)
return -2;
-#ifndef NO_ALLOCA
-#ifdef C_ALLOCA
- alloca (0);
-#endif /* C_ALLOCA */
-
-#endif /* NO_ALLOCA */
advance:
if (!range)
break;
@@ -1702,245 +3091,236 @@ re_search_2 (pbufp, string1, size1, string2, size2, startpos, range,
}
}
return -1;
-}
-
-
+} /* re_search_2 */
-#ifndef emacs /* emacs never uses this. */
-int
-re_match (pbufp, string, size, pos, regs)
- struct re_pattern_buffer *pbufp;
- char *string;
- int size, pos;
- struct re_registers *regs;
-{
- return re_match_2 (pbufp, (char *) 0, 0, string, size, pos, regs, size);
-}
-#endif /* not emacs */
-
-
-/* The following are used for re_match_2, defined below: */
-
-/* Roughly the maximum number of failure points on the stack. Would be
- exactly that if always pushed MAX_NUM_FAILURE_ITEMS each time we failed. */
+/* Structure for per-register (a.k.a. per-group) information.
+ This must not be longer than one word, because we push this value
+ onto the failure stack. Other register information, such as the
+ starting and ending positions (which are addresses), and the list of
+ inner groups (which is a bits list) are maintained in separate
+ variables.
-int re_max_failures = 2000;
-
-/* Routine used by re_match_2. */
-/* static int memcmp_translate (); *//* already declared */
-
+ We are making a (strictly speaking) nonportable assumption here: that
+ the compiler will pack our bit fields into something that fits into
+ the type of `word', i.e., is something that fits into one item on the
+ failure stack. */
-/* Structure and accessing macros used in re_match_2: */
+/* Declarations and macros for re_match_2. */
-struct register_info
+typedef union
{
- unsigned is_active : 1;
- unsigned matched_something : 1;
-};
-
-#define IS_ACTIVE(R) ((R).is_active)
-#define MATCHED_SOMETHING(R) ((R).matched_something)
-
-
-/* Macros used by re_match_2: */
-
-
-/* I.e., regstart, regend, and reg_info. */
-
-#define NUM_REG_ITEMS 3
-
-/* We push at most this many things on the stack whenever we
- fail. The `+ 2' refers to PATTERN_PLACE and STRING_PLACE, which are
- arguments to the PUSH_FAILURE_POINT macro. */
-
-#define MAX_NUM_FAILURE_ITEMS (RE_NREGS * NUM_REG_ITEMS + 2)
-
-
-/* We push this many things on the stack whenever we fail. */
-
-#define NUM_FAILURE_ITEMS (last_used_reg * NUM_REG_ITEMS + 2)
-
-
-/* This pushes most of the information about the current state we will want
- if we ever fail back to it. */
-
-#define PUSH_FAILURE_POINT(pattern_place, string_place) \
- { \
- long last_used_reg, this_reg; \
- \
- /* Find out how many registers are active or have been matched. \
- (Aside from register zero, which is only set at the end.) */ \
- for (last_used_reg = RE_NREGS - 1; last_used_reg > 0; last_used_reg--)\
- if (regstart[last_used_reg] != (unsigned char *)(-1L)) \
- break; \
- \
- if (stacke - stackp < NUM_FAILURE_ITEMS) \
- { \
- unsigned char **stackx; \
- unsigned int len = stacke - stackb; \
- if (len > re_max_failures * MAX_NUM_FAILURE_ITEMS) \
- { \
- FREE_AND_RETURN(stackb,(-2)); \
- } \
- \
- /* Roughly double the size of the stack. */ \
- stackx = DOUBLE_STACK(stackx,stackb,len); \
- /* Rearrange the pointers. */ \
- stackp = stackx + (stackp - stackb); \
- stackb = stackx; \
- stacke = stackb + 2 * len; \
- } \
- \
- /* Now push the info for each of those registers. */ \
- for (this_reg = 1; this_reg <= last_used_reg; this_reg++) \
- { \
- *stackp++ = regstart[this_reg]; \
- *stackp++ = regend[this_reg]; \
- *stackp++ = (unsigned char *) &reg_info[this_reg]; \
- } \
- \
- /* Push how many registers we saved. */ \
- *stackp++ = (unsigned char *) last_used_reg; \
- \
- *stackp++ = pattern_place; \
- *stackp++ = string_place; \
- }
-
-
-/* This pops what PUSH_FAILURE_POINT pushes. */
-
-#define POP_FAILURE_POINT() \
- { \
- int temp; \
- stackp -= 2; /* Remove failure points. */ \
- temp = (int) *--stackp; /* How many regs pushed. */ \
- temp *= NUM_REG_ITEMS; /* How much to take off the stack. */ \
- stackp -= temp; /* Remove the register info. */ \
- }
-
+ fail_stack_elt_t word;
+ struct
+ {
+ /* This field is one if this group can match the empty string,
+ zero if not. If not yet determined, `MATCH_NULL_UNSET_VALUE'. */
+#define MATCH_NULL_UNSET_VALUE 3
+ unsigned match_null_string_p : 2;
+ unsigned is_active : 1;
+ unsigned matched_something : 1;
+ unsigned ever_matched_something : 1;
+ } bits;
+} register_info_type;
+
+#define REG_MATCH_NULL_STRING_P(R) ((R).bits.match_null_string_p)
+#define IS_ACTIVE(R) ((R).bits.is_active)
+#define MATCHED_SOMETHING(R) ((R).bits.matched_something)
+#define EVER_MATCHED_SOMETHING(R) ((R).bits.ever_matched_something)
+
+static boolean group_match_null_string_p _RE_ARGS((unsigned char **p,
+ unsigned char *end,
+ register_info_type *reg_info));
+static boolean alt_match_null_string_p _RE_ARGS((unsigned char *p,
+ unsigned char *end,
+ register_info_type *reg_info));
+static boolean common_op_match_null_string_p _RE_ARGS((unsigned char **p,
+ unsigned char *end,
+ register_info_type *reg_info));
+static int bcmp_translate _RE_ARGS((const char *s1, const char *s2,
+ int len, char *translate));
+
+/* Call this when have matched a real character; it sets `matched' flags
+ for the subexpressions which we are currently inside. Also records
+ that those subexprs have matched. */
+#define SET_REGS_MATCHED() \
+ do \
+ { \
+ active_reg_t r; \
+ for (r = lowest_active_reg; r <= highest_active_reg; r++) \
+ { \
+ MATCHED_SOMETHING (reg_info[r]) \
+ = EVER_MATCHED_SOMETHING (reg_info[r]) \
+ = 1; \
+ } \
+ } \
+ while (0)
+
+
+/* This converts PTR, a pointer into one of the search strings `string1'
+ and `string2' into an offset from the beginning of that string. */
+#define POINTER_TO_OFFSET(ptr) \
+ (FIRST_STRING_P (ptr) ? (ptr) - string1 : (ptr) - string2 + size1)
+
+/* Registers are set to a sentinel when they haven't yet matched. */
+#define REG_UNSET_VALUE ((char *) -1)
+#define REG_UNSET(e) ((e) == REG_UNSET_VALUE)
+
+
+/* Macros for dealing with the split strings in re_match_2. */
#define MATCHING_IN_FIRST_STRING (dend == end_match_1)
-/* Is true if there is a first string and if PTR is pointing anywhere
- inside it or just past the end. */
-
-#define IS_IN_FIRST_STRING(ptr) \
- (size1 && string1 <= (ptr) && (ptr) <= string1 + size1)
-
/* Call before fetching a character with *d. This switches over to
string2 if necessary. */
+#define PREFETCH() \
+ while (d == dend) \
+ { \
+ /* End of string2 => fail. */ \
+ if (dend == end_match_2) \
+ goto fail; \
+ /* End of string1 => advance to string2. */ \
+ d = string2; \
+ dend = end_match_2; \
+ }
-#define PREFETCH \
- while (d == dend) \
- { \
- /* end of string2 => fail. */ \
- if (dend == end_match_2) \
- goto fail; \
- /* end of string1 => advance to string2. */ \
- d = string2; \
- dend = end_match_2; \
- }
-
-
-/* Call this when have matched something; it sets `matched' flags for the
- registers corresponding to the subexpressions of which we currently
- are inside. */
-#define SET_REGS_MATCHED \
- { unsigned this_reg; \
- for (this_reg = 0; this_reg < RE_NREGS; this_reg++) \
- { \
- if (IS_ACTIVE(reg_info[this_reg])) \
- MATCHED_SOMETHING(reg_info[this_reg]) = 1; \
- else \
- MATCHED_SOMETHING(reg_info[this_reg]) = 0; \
- } \
- }
/* Test if at very beginning or at very end of the virtual concatenation
- of string1 and string2. If there is only one string, we've put it in
- string2. */
-
-#define AT_STRINGS_BEG (d == (size1 ? string1 : string2) || !size2)
-#define AT_STRINGS_END (d == end2)
-
-#define AT_WORD_BOUNDARY \
- (AT_STRINGS_BEG || AT_STRINGS_END || IS_A_LETTER (d - 1) != IS_A_LETTER (d))
-
-/* We have two special cases to check for:
- 1) if we're past the end of string1, we have to look at the first
- character in string2;
- 2) if we're before the beginning of string2, we have to look at the
- last character in string1; we assume there is a string1, so use
- this in conjunction with AT_STRINGS_BEG. */
-#define IS_A_LETTER(d) \
- (SYNTAX ((d) == end1 ? *string2 : (d) == string2 - 1 ? *(end1 - 1) : *(d))\
+ of `string1' and `string2'. If only one string, it's `string2'. */
+#define AT_STRINGS_BEG(d) ((d) == (size1 ? string1 : string2) || !size2)
+#define AT_STRINGS_END(d) ((d) == end2)
+
+
+/* Test if D points to a character which is word-constituent. We have
+ two special cases to check for: if past the end of string1, look at
+ the first character in string2; and if before the beginning of
+ string2, look at the last character in string1. */
+#define WORDCHAR_P(d) \
+ (SYNTAX ((d) == end1 ? *string2 \
+ : (d) == string2 - 1 ? *(end1 - 1) : *(d)) \
== Sword)
+/* Test if the character before D and the one at D differ with respect
+ to being word-constituent. */
+#define AT_WORD_BOUNDARY(d) \
+ (AT_STRINGS_BEG (d) || AT_STRINGS_END (d) \
+ || WORDCHAR_P (d - 1) != WORDCHAR_P (d))
+
+
+/* Free everything we malloc. */
+#ifdef REGEX_MALLOC
+#define FREE_VAR(var) if (var) free (var); var = NULL
+#define FREE_VARIABLES() \
+ do { \
+ FREE_VAR (fail_stack.stack); \
+ FREE_VAR (regstart); \
+ FREE_VAR (regend); \
+ FREE_VAR (old_regstart); \
+ FREE_VAR (old_regend); \
+ FREE_VAR (best_regstart); \
+ FREE_VAR (best_regend); \
+ FREE_VAR (reg_info); \
+ FREE_VAR (reg_dummy); \
+ FREE_VAR (reg_info_dummy); \
+ } while (0)
+#else /* not REGEX_MALLOC */
+/* Some MIPS systems (at least) want this to free alloca'd storage. */
+#define FREE_VARIABLES() alloca (0)
+#endif /* not REGEX_MALLOC */
+
+
+/* These values must meet several constraints. They must not be valid
+ register values; since we have a limit of 255 registers (because
+ we use only one byte in the pattern for the register number), we can
+ use numbers larger than 255. They must differ by 1, because of
+ NUM_FAILURE_ITEMS above. And the value for the lowest register must
+ be larger than the value for the highest register, so we do not try
+ to actually save any registers when none are active. */
+#define NO_HIGHEST_ACTIVE_REG (1 << BYTEWIDTH)
+#define NO_LOWEST_ACTIVE_REG (NO_HIGHEST_ACTIVE_REG + 1)
+
+/* Matching routines. */
-/* Match the pattern described by PBUFP against the virtual
- concatenation of STRING1 and STRING2, which are of SIZE1 and SIZE2,
- respectively. Start the match at index POS in the virtual
- concatenation of STRING1 and STRING2. In REGS, return the indices of
- the virtual concatenation of STRING1 and STRING2 that matched the
- entire PBUFP->buffer and its contained subexpressions. Do not
- consider matching one past the index MSTOP in the virtual
- concatenation of STRING1 and STRING2.
+#ifndef emacs /* Emacs never uses this. */
+/* re_match is like re_match_2 except it takes only a single string. */
- If pbufp->fastmap is nonzero, then it had better be up to date.
+int
+re_match (bufp, string, size, pos, regs)
+ struct re_pattern_buffer *bufp;
+ const char *string;
+ int size, pos;
+ struct re_registers *regs;
+ {
+ return re_match_2 (bufp, NULL, 0, string, size, pos, regs, size);
+}
+#endif /* not emacs */
- The reason that the data to match are specified as two components
- which are to be regarded as concatenated is so this function can be
- used directly on the contents of an Emacs buffer.
- -1 is returned if there is no match. -2 is returned if there is an
- error (such as match stack overflow). Otherwise the value is the
- length of the substring which was matched. */
+/* re_match_2 matches the compiled pattern in BUFP against the
+ the (virtual) concatenation of STRING1 and STRING2 (of length SIZE1
+ and SIZE2, respectively). We start matching at POS, and stop
+ matching at STOP.
+
+ If REGS is non-null and the `no_sub' field of BUFP is nonzero, we
+ store offsets for the substring each group matched in REGS. See the
+ documentation for exactly how many groups we fill.
+
+ We return -1 if no match, -2 if an internal error (such as the
+ failure stack overflowing). Otherwise, we return the length of the
+ matched substring. */
int
-re_match_2 (pbufp, string1_arg, size1, string2_arg, size2, pos, regs, mstop)
- struct re_pattern_buffer *pbufp;
- char *string1_arg, *string2_arg;
+re_match_2 (bufp, string1, size1, string2, size2, pos, regs, stop)
+ struct re_pattern_buffer *bufp;
+ const char *string1, *string2;
int size1, size2;
int pos;
struct re_registers *regs;
- int mstop;
+ int stop;
{
- register unsigned char *p = (unsigned char *) pbufp->buffer;
-
- /* Pointer to beyond end of buffer. */
- register unsigned char *pend = p + pbufp->used;
+ /* General temporaries. */
+ int mcnt;
+ unsigned char *p1;
- unsigned char *string1 = (unsigned char *) string1_arg;
- unsigned char *string2 = (unsigned char *) string2_arg;
- unsigned char *end1; /* Just past end of first string. */
- unsigned char *end2; /* Just past end of second string. */
+ /* Just past the end of the corresponding string. */
+ const char *end1, *end2;
/* Pointers into string1 and string2, just past the last characters in
each to consider matching. */
- unsigned char *end_match_1, *end_match_2;
-
- register unsigned char *d, *dend;
- register int mcnt; /* Multipurpose. */
- unsigned char *translate = (unsigned char *) pbufp->translate;
- unsigned is_a_jump_n = 0;
-
- /* Failure point stack. Each place that can handle a failure further
- down the line pushes a failure point on this stack. It consists of
- restart, regend, and reg_info for all registers corresponding to the
- subexpressions we're currently inside, plus the number of such
- registers, and, finally, two char *'s. The first char * is where to
- resume scanning the pattern; the second one is where to resume
- scanning the strings. If the latter is zero, the failure point is a
- ``dummy''; if a failure happens and the failure point is a dummy, it
- gets discarded and the next next one is tried. */
-
-#ifndef NO_ALLOCA
- unsigned char *initial_stack[MAX_NUM_FAILURE_ITEMS * NFAILURES];
+ const char *end_match_1, *end_match_2;
+
+ /* Where we are in the data, and the end of the current string. */
+ const char *d, *dend;
+
+ /* Where we are in the pattern, and the end of the pattern. */
+ unsigned char *p = bufp->buffer;
+ register unsigned char *pend = p + bufp->used;
+
+ /* We use this to map every character in the string. */
+ char *translate = bufp->translate;
+
+ /* Failure point stack. Each place that can handle a failure further
+ down the line pushes a failure point on this stack. It consists of
+ restart, regend, and reg_info for all registers corresponding to
+ the subexpressions we're currently inside, plus the number of such
+ registers, and, finally, two char *'s. The first char * is where
+ to resume scanning the pattern; the second one is where to resume
+ scanning the strings. If the latter is zero, the failure point is
+ a ``dummy''; if a failure happens and the failure point is a dummy,
+ it gets discarded and the next next one is tried. */
+ fail_stack_type fail_stack;
+#ifdef DEBUG
+ static unsigned failure_id = 0;
+ unsigned nfailure_points_pushed = 0, nfailure_points_popped = 0;
#endif
- unsigned char **stackb;
- unsigned char **stackp;
- unsigned char **stacke;
+ /* We fill all the registers internally, independent of what we
+ return, for use in backreferences. The number here includes
+ an element for register zero. */
+ size_t num_regs = bufp->re_nsub + 1;
+
+ /* The currently active registers. */
+ active_reg_t lowest_active_reg = NO_LOWEST_ACTIVE_REG;
+ active_reg_t highest_active_reg = NO_HIGHEST_ACTIVE_REG;
/* Information on the contents of registers. These are pointers into
the input strings; they record just what was matched (on this
@@ -1949,9 +3329,14 @@ re_match_2 (pbufp, string1_arg, size1, string2_arg, size2, pos, regs, mstop)
matching and the regnum-th regend points to right after where we
stopped matching the regnum-th subexpression. (The zeroth register
keeps track of what the whole pattern matches.) */
-
- unsigned char *regstart[RE_NREGS];
- unsigned char *regend[RE_NREGS];
+ const char **regstart = 0, **regend = 0;
+
+ /* If a group that's operated upon by a repetition operator fails to
+ match anything, then the register for its start will need to be
+ restored because it will have been set to wherever in the string we
+ are when we last see its open-group operator. Similarly for a
+ register's end. */
+ const char **old_regstart = 0, **old_regend = 0;
/* The is_active field of reg_info helps us keep track of which (possibly
nested) subexpressions we are currently in. The matched_something
@@ -1959,50 +3344,97 @@ re_match_2 (pbufp, string1_arg, size1, string2_arg, size2, pos, regs, mstop)
matched any of the pattern so far this time through the reg_num-th
subexpression. These two fields get reset each time through any
loop their register is in. */
-
- struct register_info reg_info[RE_NREGS];
-
+ register_info_type *reg_info = 0;
/* The following record the register info as found in the above
variables when we find a match better than any we've seen before.
This happens as we backtrack through the failure points, which in
- turn happens only if we have not yet matched the entire string. */
-
- unsigned best_regs_set = 0;
- unsigned char *best_regstart[RE_NREGS];
- unsigned char *best_regend[RE_NREGS];
-
- /* Initialize the stack. */
-#ifdef NO_ALLOCA
- stackb = (unsigned char **) malloc (MAX_NUM_FAILURE_ITEMS * NFAILURES * sizeof (char *));
-#else
- stackb = initial_stack;
+ turn happens only if we have not yet matched the entire string. */
+ unsigned best_regs_set = false;
+ const char **best_regstart = 0, **best_regend = 0;
+
+ /* Logically, this is `best_regend[0]'. But we don't want to have to
+ allocate space for that if we're not allocating space for anything
+ else (see below). Also, we never need info about register 0 for
+ any of the other register vectors, and it seems rather a kludge to
+ treat `best_regend' differently than the rest. So we keep track of
+ the end of the best match so far in a separate variable. We
+ initialize this to NULL so that when we backtrack the first time
+ and need to test it, it's not garbage. */
+ const char *match_end = NULL;
+
+ /* Used when we pop values we don't care about. */
+ const char **reg_dummy = 0;
+ register_info_type *reg_info_dummy = 0;
+
+#ifdef DEBUG
+ /* Counts the total number of registers pushed. */
+ unsigned num_regs_pushed = 0;
#endif
- stackp = stackb;
- stacke = &stackb[MAX_NUM_FAILURE_ITEMS * NFAILURES];
-#ifdef DEBUG_REGEX
- fprintf (stderr, "Entering re_match_2(%s%s)\n", string1_arg, string2_arg);
-#endif
+ DEBUG_PRINT1 ("\n\nEntering re_match_2.\n");
+
+ INIT_FAIL_STACK ();
+
+ /* Do not bother to initialize all the register variables if there are
+ no groups in the pattern, as it takes a fair amount of time. If
+ there are groups, we include space for register 0 (the whole
+ pattern), even though we never use it, since it simplifies the
+ array indexing. We should fix this. */
+ if (bufp->re_nsub)
+ {
+ regstart = REGEX_TALLOC (num_regs, const char *);
+ regend = REGEX_TALLOC (num_regs, const char *);
+ old_regstart = REGEX_TALLOC (num_regs, const char *);
+ old_regend = REGEX_TALLOC (num_regs, const char *);
+ best_regstart = REGEX_TALLOC (num_regs, const char *);
+ best_regend = REGEX_TALLOC (num_regs, const char *);
+ reg_info = REGEX_TALLOC (num_regs, register_info_type);
+ reg_dummy = REGEX_TALLOC (num_regs, const char *);
+ reg_info_dummy = REGEX_TALLOC (num_regs, register_info_type);
+
+ if (!(regstart && regend && old_regstart && old_regend && reg_info
+ && best_regstart && best_regend && reg_dummy && reg_info_dummy))
+ {
+ FREE_VARIABLES ();
+ return -2;
+ }
+ }
+#ifdef REGEX_MALLOC
+ else
+ {
+ /* We must initialize all our variables to NULL, so that
+ `FREE_VARIABLES' doesn't try to free them. */
+ regstart = regend = old_regstart = old_regend = best_regstart
+ = best_regend = reg_dummy = NULL;
+ reg_info = reg_info_dummy = (register_info_type *) NULL;
+ }
+#endif /* REGEX_MALLOC */
+ /* The starting position is bogus. */
+ if (pos < 0 || pos > size1 + size2)
+ {
+ FREE_VARIABLES ();
+ return -1;
+ }
+
/* Initialize subexpression text positions to -1 to mark ones that no
- \( or ( and \) or ) has been seen for. Also set all registers to
- inactive and mark them as not having matched anything or ever
- failed. */
- for (mcnt = 0; mcnt < RE_NREGS; mcnt++)
+ start_memory/stop_memory has been seen for. Also initialize the
+ register information struct. */
+ for (mcnt = 1; mcnt < num_regs; mcnt++)
{
- regstart[mcnt] = regend[mcnt] = (unsigned char *) (-1L);
+ regstart[mcnt] = regend[mcnt]
+ = old_regstart[mcnt] = old_regend[mcnt] = REG_UNSET_VALUE;
+
+ REG_MATCH_NULL_STRING_P (reg_info[mcnt]) = MATCH_NULL_UNSET_VALUE;
IS_ACTIVE (reg_info[mcnt]) = 0;
MATCHED_SOMETHING (reg_info[mcnt]) = 0;
+ EVER_MATCHED_SOMETHING (reg_info[mcnt]) = 0;
}
- if (regs)
- for (mcnt = 0; mcnt < RE_NREGS; mcnt++)
- regs->start[mcnt] = regs->end[mcnt] = -1;
-
- /* Set up pointers to ends of strings.
- Don't allow the second string to be empty unless both are empty. */
- if (size2 == 0)
+ /* We move `string1' into `string2' if the latter's empty -- but not if
+ `string1' is null. */
+ if (size2 == 0 && string1 != NULL)
{
string2 = string1;
size2 = size1;
@@ -2013,66 +3445,73 @@ re_match_2 (pbufp, string1_arg, size1, string2_arg, size2, pos, regs, mstop)
end2 = string2 + size2;
/* Compute where to stop matching, within the two strings. */
- if (mstop <= size1)
+ if (stop <= size1)
{
- end_match_1 = string1 + mstop;
+ end_match_1 = string1 + stop;
end_match_2 = string2;
}
else
{
end_match_1 = end1;
- end_match_2 = string2 + mstop - size1;
+ end_match_2 = string2 + stop - size1;
}
- /* `p' scans through the pattern as `d' scans through the data. `dend'
- is the end of the input string that `d' points within. `d' is
- advanced into the following input string whenever necessary, but
+ /* `p' scans through the pattern as `d' scans through the data.
+ `dend' is the end of the input string that `d' points within. `d'
+ is advanced into the following input string whenever necessary, but
this happens before fetching; therefore, at the beginning of the
loop, `d' can be pointing at the end of a string, but it cannot
- equal string2. */
-
- if (size1 != 0 && pos <= size1)
- d = string1 + pos, dend = end_match_1;
+ equal `string2'. */
+ if (size1 > 0 && pos <= size1)
+ {
+ d = string1 + pos;
+ dend = end_match_1;
+ }
else
- d = string2 + pos - size1, dend = end_match_2;
-
+ {
+ d = string2 + pos - size1;
+ dend = end_match_2;
+ }
+ DEBUG_PRINT1 ("The compiled pattern is: ");
+ DEBUG_PRINT_COMPILED_PATTERN (bufp, p, pend);
+ DEBUG_PRINT1 ("The string to match is: `");
+ DEBUG_PRINT_DOUBLE_STRING (d, string1, size1, string2, size2);
+ DEBUG_PRINT1 ("'\n");
+
/* This loops over pattern commands. It exits by returning from the
- function if match is complete, or it drops through if match fails
- at this starting point in the input data. */
-
- while (1)
+ function if the match is complete, or it drops through if the match
+ fails at this starting point in the input data. */
+ for (;;)
{
-#ifdef DEBUG_REGEX
- fprintf (stderr,
- "regex loop(%d): matching 0x%02d\n",
- p - (unsigned char *) pbufp->buffer,
- *p);
-#endif
- is_a_jump_n = 0;
- /* End of pattern means we might have succeeded. */
+ DEBUG_PRINT2 ("\n0x%x: ", p);
+
if (p == pend)
- {
- /* If not end of string, try backtracking. Otherwise done. */
+ { /* End of pattern means we might have succeeded. */
+ DEBUG_PRINT1 ("end of pattern ... ");
+
+ /* If we haven't matched the entire string, and we want the
+ longest match, try backtracking. */
if (d != end_match_2)
{
- if (stackp != stackb)
- {
- /* More failure points to try. */
-
- unsigned in_same_string =
- IS_IN_FIRST_STRING (best_regend[0])
- == MATCHING_IN_FIRST_STRING;
+ DEBUG_PRINT1 ("backtracking.\n");
+
+ if (!FAIL_STACK_EMPTY ())
+ { /* More failure points to try. */
+ boolean same_str_p = (FIRST_STRING_P (match_end)
+ == MATCHING_IN_FIRST_STRING);
/* If exceeds best match so far, save it. */
- if (! best_regs_set
- || (in_same_string && d > best_regend[0])
- || (! in_same_string && ! MATCHING_IN_FIRST_STRING))
+ if (!best_regs_set
+ || (same_str_p && d > match_end)
+ || (!same_str_p && !MATCHING_IN_FIRST_STRING))
{
- best_regs_set = 1;
- best_regend[0] = d; /* Never use regstart[0]. */
+ best_regs_set = true;
+ match_end = d;
- for (mcnt = 1; mcnt < RE_NREGS; mcnt++)
+ DEBUG_PRINT1 ("\nSAVING match as best so far.\n");
+
+ for (mcnt = 1; mcnt < num_regs; mcnt++)
{
best_regstart[mcnt] = regstart[mcnt];
best_regend[mcnt] = regend[mcnt];
@@ -2080,123 +3519,395 @@ re_match_2 (pbufp, string1_arg, size1, string2_arg, size2, pos, regs, mstop)
}
goto fail;
}
+
/* If no failure points, don't restore garbage. */
else if (best_regs_set)
{
- restore_best_regs:
- /* Restore best match. */
- d = best_regend[0];
+ restore_best_regs:
+ /* Restore best match. It may happen that `dend ==
+ end_match_1' while the restored d is in string2.
+ For example, the pattern `x.*y.*z' against the
+ strings `x-' and `y-z-', if the two strings are
+ not consecutive in memory. */
+ DEBUG_PRINT1 ("Restoring best registers.\n");
- for (mcnt = 0; mcnt < RE_NREGS; mcnt++)
+ d = match_end;
+ dend = ((d >= string1 && d <= end1)
+ ? end_match_1 : end_match_2);
+
+ for (mcnt = 1; mcnt < num_regs; mcnt++)
{
regstart[mcnt] = best_regstart[mcnt];
regend[mcnt] = best_regend[mcnt];
}
}
- }
+ } /* d != end_match_2 */
+
+ DEBUG_PRINT1 ("Accepting match.\n");
- /* If caller wants register contents data back, convert it
- to indices. */
- if (regs)
+ /* If caller wants register contents data back, do it. */
+ if (regs && !bufp->no_sub)
{
- regs->start[0] = pos;
- if (MATCHING_IN_FIRST_STRING)
- regs->end[0] = d - string1;
- else
- regs->end[0] = d - string2 + size1;
- for (mcnt = 1; mcnt < RE_NREGS; mcnt++)
+ /* Have the register data arrays been allocated? */
+ if (bufp->regs_allocated == REGS_UNALLOCATED)
+ { /* No. So allocate them with malloc. We need one
+ extra element beyond `num_regs' for the `-1' marker
+ GNU code uses. */
+ regs->num_regs = MAX (RE_NREGS, num_regs + 1);
+ regs->start = TALLOC (regs->num_regs, regoff_t);
+ regs->end = TALLOC (regs->num_regs, regoff_t);
+ if (regs->start == NULL || regs->end == NULL)
+ return -2;
+ bufp->regs_allocated = REGS_REALLOCATE;
+ }
+ else if (bufp->regs_allocated == REGS_REALLOCATE)
+ { /* Yes. If we need more elements than were already
+ allocated, reallocate them. If we need fewer, just
+ leave it alone. */
+ if (regs->num_regs < num_regs + 1)
+ {
+ regs->num_regs = num_regs + 1;
+ RETALLOC (regs->start, regs->num_regs, regoff_t);
+ RETALLOC (regs->end, regs->num_regs, regoff_t);
+ if (regs->start == NULL || regs->end == NULL)
+ return -2;
+ }
+ }
+ else
{
- if (regend[mcnt] == (unsigned char *)(-1L))
- {
- regs->start[mcnt] = -1;
- regs->end[mcnt] = -1;
- continue;
- }
- if (IS_IN_FIRST_STRING (regstart[mcnt]))
- regs->start[mcnt] = regstart[mcnt] - string1;
- else
- regs->start[mcnt] = regstart[mcnt] - string2 + size1;
-
- if (IS_IN_FIRST_STRING (regend[mcnt]))
- regs->end[mcnt] = regend[mcnt] - string1;
- else
- regs->end[mcnt] = regend[mcnt] - string2 + size1;
+ /* These braces fend off a "empty body in an else-statement"
+ warning under GCC when assert expands to nothing. */
+ assert (bufp->regs_allocated == REGS_FIXED);
}
- }
- FREE_AND_RETURN(stackb,
- (d - pos - (MATCHING_IN_FIRST_STRING ?
- string1 :
- string2 - size1)));
+
+ /* Convert the pointer data in `regstart' and `regend' to
+ indices. Register zero has to be set differently,
+ since we haven't kept track of any info for it. */
+ if (regs->num_regs > 0)
+ {
+ regs->start[0] = pos;
+ regs->end[0] = (MATCHING_IN_FIRST_STRING ? d - string1
+ : d - string2 + size1);
+ }
+
+ /* Go through the first `min (num_regs, regs->num_regs)'
+ registers, since that is all we initialized. */
+ for (mcnt = 1; mcnt < MIN (num_regs, regs->num_regs); mcnt++)
+ {
+ if (REG_UNSET (regstart[mcnt]) || REG_UNSET (regend[mcnt]))
+ regs->start[mcnt] = regs->end[mcnt] = -1;
+ else
+ {
+ regs->start[mcnt] = POINTER_TO_OFFSET (regstart[mcnt]);
+ regs->end[mcnt] = POINTER_TO_OFFSET (regend[mcnt]);
+ }
+ }
+
+ /* If the regs structure we return has more elements than
+ were in the pattern, set the extra elements to -1. If
+ we (re)allocated the registers, this is the case,
+ because we always allocate enough to have at least one
+ -1 at the end. */
+ for (mcnt = num_regs; mcnt < regs->num_regs; mcnt++)
+ regs->start[mcnt] = regs->end[mcnt] = -1;
+ } /* regs && !bufp->no_sub */
+
+ FREE_VARIABLES ();
+ DEBUG_PRINT4 ("%u failure points pushed, %u popped (%u remain).\n",
+ nfailure_points_pushed, nfailure_points_popped,
+ nfailure_points_pushed - nfailure_points_popped);
+ DEBUG_PRINT2 ("%u registers pushed.\n", num_regs_pushed);
+
+ mcnt = d - pos - (MATCHING_IN_FIRST_STRING
+ ? string1
+ : string2 - size1);
+
+ DEBUG_PRINT2 ("Returning %d from re_match_2.\n", mcnt);
+
+ return mcnt;
}
/* Otherwise match next pattern command. */
#ifdef SWITCH_ENUM_BUG
- switch ((int) ((enum regexpcode) *p++))
+ switch ((int) ((re_opcode_t) *p++))
#else
- switch ((enum regexpcode) *p++)
+ switch ((re_opcode_t) *p++)
#endif
{
+ /* Ignore these. Used to ignore the n of succeed_n's which
+ currently have n == 0. */
+ case no_op:
+ DEBUG_PRINT1 ("EXECUTING no_op.\n");
+ break;
+
+
+ /* Match the next n pattern characters exactly. The following
+ byte in the pattern defines n, and the n bytes after that
+ are the characters to match. */
+ case exactn:
+ mcnt = *p++;
+ DEBUG_PRINT2 ("EXECUTING exactn %d.\n", mcnt);
+
+ /* This is written out as an if-else so we don't waste time
+ testing `translate' inside the loop. */
+ if (translate)
+ {
+ do
+ {
+ PREFETCH ();
+ if (translate[(unsigned char) *d++] != (char) *p++)
+ goto fail;
+ }
+ while (--mcnt);
+ }
+ else
+ {
+ do
+ {
+ PREFETCH ();
+ if (*d++ != (char) *p++) goto fail;
+ }
+ while (--mcnt);
+ }
+ SET_REGS_MATCHED ();
+ break;
+
+
+ /* Match any character except possibly a newline or a null. */
+ case anychar:
+ DEBUG_PRINT1 ("EXECUTING anychar.\n");
+
+ PREFETCH ();
+
+ if ((!(bufp->syntax & RE_DOT_NEWLINE) && TRANSLATE (*d) == '\n')
+ || (bufp->syntax & RE_DOT_NOT_NULL && TRANSLATE (*d) == '\000'))
+ goto fail;
+
+ SET_REGS_MATCHED ();
+ DEBUG_PRINT2 (" Matched `%d'.\n", *d);
+ d++;
+ break;
+
+
+ case charset:
+ case charset_not:
+ {
+ register unsigned char c;
+ boolean not = (re_opcode_t) *(p - 1) == charset_not;
+
+ DEBUG_PRINT2 ("EXECUTING charset%s.\n", not ? "_not" : "");
+
+ PREFETCH ();
+ c = TRANSLATE (*d); /* The character to match. */
+
+ /* Cast to `unsigned' instead of `unsigned char' in case the
+ bit list is a full 32 bytes long. */
+ if (c < (unsigned) (*p * BYTEWIDTH)
+ && p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
+ not = !not;
+
+ p += 1 + *p;
+
+ if (!not) goto fail;
+
+ SET_REGS_MATCHED ();
+ d++;
+ break;
+ }
+
+
+ /* The beginning of a group is represented by start_memory.
+ The arguments are the register number in the next byte, and the
+ number of groups inner to this one in the next. The text
+ matched within the group is recorded (in the internal
+ registers data structure) under the register number. */
+ case start_memory:
+ DEBUG_PRINT3 ("EXECUTING start_memory %d (%d):\n", *p, p[1]);
+
+ /* Find out if this group can match the empty string. */
+ p1 = p; /* To send to group_match_null_string_p. */
+
+ if (REG_MATCH_NULL_STRING_P (reg_info[*p]) == MATCH_NULL_UNSET_VALUE)
+ REG_MATCH_NULL_STRING_P (reg_info[*p])
+ = group_match_null_string_p (&p1, pend, reg_info);
+
+ /* Save the position in the string where we were the last time
+ we were at this open-group operator in case the group is
+ operated upon by a repetition operator, e.g., with `(a*)*b'
+ against `ab'; then we want to ignore where we are now in
+ the string in case this attempt to match fails. */
+ old_regstart[*p] = REG_MATCH_NULL_STRING_P (reg_info[*p])
+ ? REG_UNSET (regstart[*p]) ? d : regstart[*p]
+ : regstart[*p];
+ DEBUG_PRINT2 (" old_regstart: %d\n",
+ POINTER_TO_OFFSET (old_regstart[*p]));
- /* \( [or `(', as appropriate] is represented by start_memory,
- \) by stop_memory. Both of those commands are followed by
- a register number in the next byte. The text matched
- within the \( and \) is recorded under that number. */
- case start_memory:
regstart[*p] = d;
+ DEBUG_PRINT2 (" regstart: %d\n", POINTER_TO_OFFSET (regstart[*p]));
+
IS_ACTIVE (reg_info[*p]) = 1;
MATCHED_SOMETHING (reg_info[*p]) = 0;
- p++;
+
+ /* This is the new highest active register. */
+ highest_active_reg = *p;
+
+ /* If nothing was active before, this is the new lowest active
+ register. */
+ if (lowest_active_reg == NO_LOWEST_ACTIVE_REG)
+ lowest_active_reg = *p;
+
+ /* Move past the register number and inner group count. */
+ p += 2;
break;
+
+ /* The stop_memory opcode represents the end of a group. Its
+ arguments are the same as start_memory's: the register
+ number, and the number of inner groups. */
case stop_memory:
+ DEBUG_PRINT3 ("EXECUTING stop_memory %d (%d):\n", *p, p[1]);
+
+ /* We need to save the string position the last time we were at
+ this close-group operator in case the group is operated
+ upon by a repetition operator, e.g., with `((a*)*(b*)*)*'
+ against `aba'; then we want to ignore where we are now in
+ the string in case this attempt to match fails. */
+ old_regend[*p] = REG_MATCH_NULL_STRING_P (reg_info[*p])
+ ? REG_UNSET (regend[*p]) ? d : regend[*p]
+ : regend[*p];
+ DEBUG_PRINT2 (" old_regend: %d\n",
+ POINTER_TO_OFFSET (old_regend[*p]));
+
regend[*p] = d;
- IS_ACTIVE (reg_info[*p]) = 0;
+ DEBUG_PRINT2 (" regend: %d\n", POINTER_TO_OFFSET (regend[*p]));
- /* If just failed to match something this time around with a sub-
- expression that's in a loop, try to force exit from the loop. */
- if ((! MATCHED_SOMETHING (reg_info[*p])
- || (enum regexpcode) p[-3] == start_memory)
- && (p + 1) != pend)
+ /* This register isn't active anymore. */
+ IS_ACTIVE (reg_info[*p]) = 0;
+
+ /* If this was the only register active, nothing is active
+ anymore. */
+ if (lowest_active_reg == highest_active_reg)
{
- register unsigned char *p2 = p + 1;
+ lowest_active_reg = NO_LOWEST_ACTIVE_REG;
+ highest_active_reg = NO_HIGHEST_ACTIVE_REG;
+ }
+ else
+ { /* We must scan for the new highest active register, since
+ it isn't necessarily one less than now: consider
+ (a(b)c(d(e)f)g). When group 3 ends, after the f), the
+ new highest active register is 1. */
+ unsigned char r = *p - 1;
+ while (r > 0 && !IS_ACTIVE (reg_info[r]))
+ r--;
+
+ /* If we end up at register zero, that means that we saved
+ the registers as the result of an `on_failure_jump', not
+ a `start_memory', and we jumped to past the innermost
+ `stop_memory'. For example, in ((.)*) we save
+ registers 1 and 2 as a result of the *, but when we pop
+ back to the second ), we are at the stop_memory 1.
+ Thus, nothing is active. */
+ if (r == 0)
+ {
+ lowest_active_reg = NO_LOWEST_ACTIVE_REG;
+ highest_active_reg = NO_HIGHEST_ACTIVE_REG;
+ }
+ else
+ highest_active_reg = r;
+ }
+
+ /* If just failed to match something this time around with a
+ group that's operated on by a repetition operator, try to
+ force exit from the ``loop'', and restore the register
+ information for this group that we had before trying this
+ last match. */
+ if ((!MATCHED_SOMETHING (reg_info[*p])
+ || (re_opcode_t) p[-3] == start_memory)
+ && (p + 2) < pend)
+ {
+ boolean is_a_jump_n = false;
+
+ p1 = p + 2;
mcnt = 0;
- switch (*p2++)
+ switch ((re_opcode_t) *p1++)
{
case jump_n:
- is_a_jump_n = 1;
- case finalize_jump:
- case maybe_finalize_jump:
+ is_a_jump_n = true;
+ case pop_failure_jump:
+ case maybe_pop_jump:
case jump:
case dummy_failure_jump:
- EXTRACT_NUMBER_AND_INCR (mcnt, p2);
+ EXTRACT_NUMBER_AND_INCR (mcnt, p1);
if (is_a_jump_n)
- p2 += 2;
+ p1 += 2;
break;
+
+ default:
+ /* do nothing */ ;
}
- p2 += mcnt;
+ p1 += mcnt;
/* If the next operation is a jump backwards in the pattern
- to an on_failure_jump, exit from the loop by forcing a
- failure after pushing on the stack the on_failure_jump's
- jump in the pattern, and d. */
- if (mcnt < 0 && (enum regexpcode) *p2++ == on_failure_jump)
+ to an on_failure_jump right before the start_memory
+ corresponding to this stop_memory, exit from the loop
+ by forcing a failure after pushing on the stack the
+ on_failure_jump's jump in the pattern, and d. */
+ if (mcnt < 0 && (re_opcode_t) *p1 == on_failure_jump
+ && (re_opcode_t) p1[3] == start_memory && p1[4] == *p)
{
- EXTRACT_NUMBER_AND_INCR (mcnt, p2);
- PUSH_FAILURE_POINT (p2 + mcnt, d);
+ /* If this group ever matched anything, then restore
+ what its registers were before trying this last
+ failed match, e.g., with `(a*)*b' against `ab' for
+ regstart[1], and, e.g., with `((a*)*(b*)*)*'
+ against `aba' for regend[3].
+
+ Also restore the registers for inner groups for,
+ e.g., `((a*)(b*))*' against `aba' (register 3 would
+ otherwise get trashed). */
+
+ if (EVER_MATCHED_SOMETHING (reg_info[*p]))
+ {
+ unsigned r;
+
+ EVER_MATCHED_SOMETHING (reg_info[*p]) = 0;
+
+ /* Restore this and inner groups' (if any) registers. */
+ for (r = *p; r < *p + *(p + 1); r++)
+ {
+ regstart[r] = old_regstart[r];
+
+ /* xx why this test? */
+ if ((s_reg_t) old_regend[r] >= (s_reg_t) regstart[r])
+ regend[r] = old_regend[r];
+ }
+ }
+ p1++;
+ EXTRACT_NUMBER_AND_INCR (mcnt, p1);
+ PUSH_FAILURE_POINT (p1 + mcnt, d, -2);
+ PUSH_FAILURE_POINT2(p1 + mcnt, d, -2);
+
goto fail;
}
}
- p++;
+
+ /* Move past the register number and the inner group count. */
+ p += 2;
break;
+
/* \<digit> has been turned into a `duplicate' command which is
followed by the numeric value of <digit> as the register number. */
case duplicate:
{
- int regno = *p++; /* Get which register to match against */
- register unsigned char *d2, *dend2;
+ register const char *d2, *dend2;
+ int regno = *p++; /* Get which register to match against. */
+ DEBUG_PRINT2 ("EXECUTING duplicate %d.\n", regno);
- /* Where in input to try to start matching. */
+ /* Can't back reference a group which we've never matched. */
+ if (REG_UNSET (regstart[regno]) || REG_UNSET (regend[regno]))
+ goto fail;
+
+ /* Where in input to try to start matching. */
d2 = regstart[regno];
/* Where to stop matching; if both the place to start and
@@ -2204,10 +3915,10 @@ re_match_2 (pbufp, string1_arg, size1, string2_arg, size2, pos, regs, mstop)
set to the place to stop, otherwise, for now have to use
the end of the first string. */
- dend2 = ((IS_IN_FIRST_STRING (regstart[regno])
- == IS_IN_FIRST_STRING (regend[regno]))
+ dend2 = ((FIRST_STRING_P (regstart[regno])
+ == FIRST_STRING_P (regend[regno]))
? regend[regno] : end_match_1);
- while (1)
+ for (;;)
{
/* If necessary, advance to next segment in register
contents. */
@@ -2215,13 +3926,16 @@ re_match_2 (pbufp, string1_arg, size1, string2_arg, size2, pos, regs, mstop)
{
if (dend2 == end_match_2) break;
if (dend2 == regend[regno]) break;
- d2 = string2, dend2 = regend[regno]; /* end of string1 => advance to string2. */
+
+ /* End of string1 => advance to string2. */
+ d2 = string2;
+ dend2 = regend[regno];
}
/* At end of register contents => success */
if (d2 == dend2) break;
/* If necessary, advance to next segment in data. */
- PREFETCH;
+ PREFETCH ();
/* How many characters left in this segment to match. */
mcnt = dend - d;
@@ -2234,196 +3948,335 @@ re_match_2 (pbufp, string1_arg, size1, string2_arg, size2, pos, regs, mstop)
/* Compare that many; failure if mismatch, else move
past them. */
if (translate
- ? memcmp_translate (d, d2, mcnt, translate)
- : memcmp ((char *)d, (char *)d2, mcnt))
+ ? bcmp_translate (d, d2, mcnt, translate)
+ : bcmp (d, d2, mcnt))
goto fail;
d += mcnt, d2 += mcnt;
}
}
break;
- case anychar:
- PREFETCH; /* Fetch a data character. */
- /* Match anything but a newline, maybe even a null. */
- if ((translate ? translate[*d] : *d) == '\n'
- || ((obscure_syntax & RE_DOT_NOT_NULL)
- && (translate ? translate[*d] : *d) == '\000'))
- goto fail;
- SET_REGS_MATCHED;
- d++;
- break;
- case charset:
- case charset_not:
- {
- int not = 0; /* Nonzero for charset_not. */
- register int c;
- if (*(p - 1) == (unsigned char) charset_not)
- not = 1;
-
- PREFETCH; /* Fetch a data character. */
+ /* begline matches the empty string at the beginning of the string
+ (unless `not_bol' is set in `bufp'), and, if
+ `newline_anchor' is set, after newlines. */
+ case begline:
+ DEBUG_PRINT1 ("EXECUTING begline.\n");
+
+ if (AT_STRINGS_BEG (d))
+ {
+ if (!bufp->not_bol) break;
+ }
+ else if (d[-1] == '\n' && bufp->newline_anchor)
+ {
+ break;
+ }
+ /* In all other cases, we fail. */
+ goto fail;
- if (translate)
- c = translate[*d];
- else
- c = *d;
- if (c < *p * BYTEWIDTH
- && p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
- not = !not;
+ /* endline is the dual of begline. */
+ case endline:
+ DEBUG_PRINT1 ("EXECUTING endline.\n");
- p += 1 + *p;
+ if (AT_STRINGS_END (d))
+ {
+ if (!bufp->not_eol) break;
+ }
+
+ /* We have to ``prefetch'' the next character. */
+ else if ((d == end1 ? *string2 : *d) == '\n'
+ && bufp->newline_anchor)
+ {
+ break;
+ }
+ goto fail;
- if (!not) goto fail;
- SET_REGS_MATCHED;
- d++;
- break;
- }
- case begline:
- if ((size1 != 0 && d == string1)
- || (size1 == 0 && size2 != 0 && d == string2)
- || (d && d[-1] == '\n')
- || (size1 == 0 && size2 == 0))
+ /* Match at the very beginning of the data. */
+ case begbuf:
+ DEBUG_PRINT1 ("EXECUTING begbuf.\n");
+ if (AT_STRINGS_BEG (d))
break;
- else
- goto fail;
-
- case endline:
- if (d == end2
- || (d == end1 ? (size2 == 0 || *string2 == '\n') : *d == '\n'))
+ goto fail;
+
+
+ /* Match at the very end of the data. */
+ case endbuf:
+ DEBUG_PRINT1 ("EXECUTING endbuf.\n");
+ if (AT_STRINGS_END (d))
break;
- goto fail;
+ goto fail;
- /* `or' constructs are handled by starting each alternative with
- an on_failure_jump that points to the start of the next
- alternative. Each alternative except the last ends with a
- jump to the joining point. (Actually, each jump except for
- the last one really jumps to the following jump, because
- tensioning the jumps is a hassle.) */
- /* The start of a stupid repeat has an on_failure_jump that points
- past the end of the repeat text. This makes a failure point so
- that on failure to match a repetition, matching restarts past
- as many repetitions have been found with no way to fail and
- look for another one. */
+ /* on_failure_keep_string_jump is used to optimize `.*\n'. It
+ pushes NULL as the value for the string on the stack. Then
+ `pop_failure_point' will keep the current value for the
+ string, instead of restoring it. To see why, consider
+ matching `foo\nbar' against `.*\n'. The .* matches the foo;
+ then the . fails against the \n. But the next thing we want
+ to do is match the \n against the \n; if we restored the
+ string value, we would be back at the foo.
+
+ Because this is used only in specific cases, we don't need to
+ check all the things that `on_failure_jump' does, to make
+ sure the right things get saved on the stack. Hence we don't
+ share its code. The only reason to push anything on the
+ stack at all is that otherwise we would have to change
+ `anychar's code to do something besides goto fail in this
+ case; that seems worse than this. */
+ case on_failure_keep_string_jump:
+ DEBUG_PRINT1 ("EXECUTING on_failure_keep_string_jump");
+
+ EXTRACT_NUMBER_AND_INCR (mcnt, p);
+ DEBUG_PRINT3 (" %d (to 0x%x):\n", mcnt, p + mcnt);
+
+ PUSH_FAILURE_POINT (p + mcnt, NULL, -2);
+ PUSH_FAILURE_POINT2(p + mcnt, NULL, -2);
+ break;
- /* A smart repeat is similar but loops back to the on_failure_jump
- so that each repetition makes another failure point. */
+ /* Uses of on_failure_jump:
+
+ Each alternative starts with an on_failure_jump that points
+ to the beginning of the next alternative. Each alternative
+ except the last ends with a jump that in effect jumps past
+ the rest of the alternatives. (They really jump to the
+ ending jump of the following alternative, because tensioning
+ these jumps is a hassle.)
+
+ Repeats start with an on_failure_jump that points past both
+ the repetition text and either the following jump or
+ pop_failure_jump back to this on_failure_jump. */
case on_failure_jump:
on_failure:
+ DEBUG_PRINT1 ("EXECUTING on_failure_jump");
+
EXTRACT_NUMBER_AND_INCR (mcnt, p);
- PUSH_FAILURE_POINT (p + mcnt, d);
+ DEBUG_PRINT3 (" %d (to 0x%x)", mcnt, p + mcnt);
+
+ /* If this on_failure_jump comes right before a group (i.e.,
+ the original * applied to a group), save the information
+ for that group and all inner ones, so that if we fail back
+ to this point, the group's information will be correct.
+ For example, in \(a*\)*\1, we need the preceding group,
+ and in \(\(a*\)b*\)\2, we need the inner group. */
+
+ /* We can't use `p' to check ahead because we push
+ a failure point to `p + mcnt' after we do this. */
+ p1 = p;
+
+ /* We need to skip no_op's before we look for the
+ start_memory in case this on_failure_jump is happening as
+ the result of a completed succeed_n, as in \(a\)\{1,3\}b\1
+ against aba. */
+ while (p1 < pend && (re_opcode_t) *p1 == no_op)
+ p1++;
+
+ if (p1 < pend && (re_opcode_t) *p1 == start_memory)
+ {
+ /* We have a new highest active register now. This will
+ get reset at the start_memory we are about to get to,
+ but we will have saved all the registers relevant to
+ this repetition op, as described above. */
+ highest_active_reg = *(p1 + 1) + *(p1 + 2);
+ if (lowest_active_reg == NO_LOWEST_ACTIVE_REG)
+ lowest_active_reg = *(p1 + 1);
+ }
+
+ DEBUG_PRINT1 (":\n");
+ PUSH_FAILURE_POINT (p + mcnt, d, -2);
+ PUSH_FAILURE_POINT2(p + mcnt, d, -2);
break;
- /* The end of a smart repeat has a maybe_finalize_jump back.
- Change it either to a finalize_jump or an ordinary jump. */
- case maybe_finalize_jump:
+
+ /* A smart repeat ends with `maybe_pop_jump'.
+ We change it to either `pop_failure_jump' or `jump'. */
+ case maybe_pop_jump:
EXTRACT_NUMBER_AND_INCR (mcnt, p);
- {
+ DEBUG_PRINT2 ("EXECUTING maybe_pop_jump %d.\n", mcnt);
+ {
register unsigned char *p2 = p;
- /* Compare what follows with the beginning of the repeat.
- If we can establish that there is nothing that they would
- both match, we can change to finalize_jump. */
- while (p2 + 1 != pend
- && (*p2 == (unsigned char) stop_memory
- || *p2 == (unsigned char) start_memory))
- p2 += 2; /* Skip over reg number. */
- if (p2 == pend)
- p[-3] = (unsigned char) finalize_jump;
- else if (*p2 == (unsigned char) exactn
- || *p2 == (unsigned char) endline)
+
+ /* Compare the beginning of the repeat with what in the
+ pattern follows its end. If we can establish that there
+ is nothing that they would both match, i.e., that we
+ would have to backtrack because of (as in, e.g., `a*a')
+ then we can change to pop_failure_jump, because we'll
+ never have to backtrack.
+
+ This is not true in the case of alternatives: in
+ `(a|ab)*' we do need to backtrack to the `ab' alternative
+ (e.g., if the string was `ab'). But instead of trying to
+ detect that here, the alternative has put on a dummy
+ failure point which is what we will end up popping. */
+
+ /* Skip over open/close-group commands. */
+ while (p2 + 2 < pend
+ && ((re_opcode_t) *p2 == stop_memory
+ || (re_opcode_t) *p2 == start_memory))
+ p2 += 3; /* Skip over args, too. */
+
+ /* If we're at the end of the pattern, we can change. */
+ if (p2 == pend)
+ {
+ /* Consider what happens when matching ":\(.*\)"
+ against ":/". I don't really understand this code
+ yet. */
+ p[-3] = (unsigned char) pop_failure_jump;
+ DEBUG_PRINT1
+ (" End of pattern: change to `pop_failure_jump'.\n");
+ }
+
+ else if ((re_opcode_t) *p2 == exactn
+ || (bufp->newline_anchor && (re_opcode_t) *p2 == endline))
{
- register int c = *p2 == (unsigned char) endline ? '\n' : p2[2];
- register unsigned char *p1 = p + mcnt;
- /* p1[0] ... p1[2] are an on_failure_jump.
- Examine what follows that. */
- if (p1[3] == (unsigned char) exactn && p1[5] != c)
- p[-3] = (unsigned char) finalize_jump;
- else if (p1[3] == (unsigned char) charset
- || p1[3] == (unsigned char) charset_not)
+ register unsigned char c
+ = *p2 == (unsigned char) endline ? '\n' : p2[2];
+ p1 = p + mcnt;
+
+ /* p1[0] ... p1[2] are the `on_failure_jump' corresponding
+ to the `maybe_finalize_jump' of this case. Examine what
+ follows. */
+ if ((re_opcode_t) p1[3] == exactn && p1[5] != c)
+ {
+ p[-3] = (unsigned char) pop_failure_jump;
+ DEBUG_PRINT3 (" %c != %c => pop_failure_jump.\n",
+ c, p1[5]);
+ }
+
+ else if ((re_opcode_t) p1[3] == charset
+ || (re_opcode_t) p1[3] == charset_not)
{
- int not = p1[3] == (unsigned char) charset_not;
- if (c < p1[4] * BYTEWIDTH
+ int not = (re_opcode_t) p1[3] == charset_not;
+
+ if (c < (unsigned char) (p1[4] * BYTEWIDTH)
&& p1[5 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
not = !not;
- /* `not' is 1 if c would match. */
- /* That means it is not safe to finalize. */
+
+ /* `not' is equal to 1 if c would match, which means
+ that we can't change to pop_failure_jump. */
if (!not)
- p[-3] = (unsigned char) finalize_jump;
+ {
+ p[-3] = (unsigned char) pop_failure_jump;
+ DEBUG_PRINT1 (" No match => pop_failure_jump.\n");
+ }
}
}
}
p -= 2; /* Point at relative address again. */
- if (p[-1] != (unsigned char) finalize_jump)
+ if ((re_opcode_t) p[-1] != pop_failure_jump)
{
- p[-1] = (unsigned char) jump;
- goto nofinalize;
+ p[-1] = (unsigned char) jump;
+ DEBUG_PRINT1 (" Match => jump.\n");
+ goto unconditional_jump;
}
/* Note fall through. */
- /* The end of a stupid repeat has a finalize_jump back to the
- start, where another failure point will be made which will
- point to after all the repetitions found so far. */
- /* Take off failure points put on by matching on_failure_jump
- because didn't fail. Also remove the register information
- put on by the on_failure_jump. */
- case finalize_jump:
- POP_FAILURE_POINT ();
- /* Note fall through. */
-
- /* Jump without taking off any failure points. */
+ /* The end of a simple repeat has a pop_failure_jump back to
+ its matching on_failure_jump, where the latter will push a
+ failure point. The pop_failure_jump takes off failure
+ points put on by this pop_failure_jump's matching
+ on_failure_jump; we got through the pattern to here from the
+ matching on_failure_jump, so didn't fail. */
+ case pop_failure_jump:
+ {
+ /* We need to pass separate storage for the lowest and
+ highest registers, even though we don't care about the
+ actual values. Otherwise, we will restore only one
+ register from the stack, since lowest will == highest in
+ `pop_failure_point'. */
+ active_reg_t dummy_low_reg, dummy_high_reg;
+ unsigned char *pdummy;
+ const char *sdummy;
+
+ DEBUG_PRINT1 ("EXECUTING pop_failure_jump.\n");
+ POP_FAILURE_POINT (sdummy, pdummy,
+ dummy_low_reg, dummy_high_reg,
+ reg_dummy, reg_dummy, reg_info_dummy);
+ }
+ /* Note fall through. */
+
+
+ /* Unconditionally jump (without popping any failure points). */
case jump:
- nofinalize:
- EXTRACT_NUMBER_AND_INCR (mcnt, p);
- p += mcnt;
+ unconditional_jump:
+ EXTRACT_NUMBER_AND_INCR (mcnt, p); /* Get the amount to jump. */
+ DEBUG_PRINT2 ("EXECUTING jump %d ", mcnt);
+ p += mcnt; /* Do the jump. */
+ DEBUG_PRINT2 ("(to 0x%x).\n", p);
break;
- case dummy_failure_jump:
- /* Normally, the on_failure_jump pushes a failure point, which
- then gets popped at finalize_jump. We will end up at
- finalize_jump, also, and with a pattern of, say, `a+', we
- are skipping over the on_failure_jump, so we have to push
- something meaningless for finalize_jump to pop. */
- PUSH_FAILURE_POINT (0, 0);
- goto nofinalize;
+
+ /* We need this opcode so we can detect where alternatives end
+ in `group_match_null_string_p' et al. */
+ case jump_past_alt:
+ DEBUG_PRINT1 ("EXECUTING jump_past_alt.\n");
+ goto unconditional_jump;
+
+ /* Normally, the on_failure_jump pushes a failure point, which
+ then gets popped at pop_failure_jump. We will end up at
+ pop_failure_jump, also, and with a pattern of, say, `a+', we
+ are skipping over the on_failure_jump, so we have to push
+ something meaningless for pop_failure_jump to pop. */
+ case dummy_failure_jump:
+ DEBUG_PRINT1 ("EXECUTING dummy_failure_jump.\n");
+ /* It doesn't matter what we push for the string here. What
+ the code at `fail' tests is the value for the pattern. */
+ PUSH_FAILURE_POINT (0, 0, -2);
+ PUSH_FAILURE_POINT2(0, 0, -2);
+ goto unconditional_jump;
+
+
+ /* At the end of an alternative, we need to push a dummy failure
+ point in case we are followed by a `pop_failure_jump', because
+ we don't want the failure point for the alternative to be
+ popped. For example, matching `(a|ab)*' against `aab'
+ requires that we match the `ab' alternative. */
+ case push_dummy_failure:
+ DEBUG_PRINT1 ("EXECUTING push_dummy_failure.\n");
+ /* See comments just above at `dummy_failure_jump' about the
+ two zeroes. */
+ PUSH_FAILURE_POINT (0, 0, -2);
+ PUSH_FAILURE_POINT2(0, 0, -2);
+ break;
- /* Have to succeed matching what follows at least n times. Then
- just handle like an on_failure_jump. */
+ /* Have to succeed matching what follows at least n times.
+ After that, handle like `on_failure_jump'. */
case succeed_n:
EXTRACT_NUMBER (mcnt, p + 2);
+ DEBUG_PRINT2 ("EXECUTING succeed_n %d.\n", mcnt);
+
+ assert (mcnt >= 0);
/* Originally, this is how many times we HAVE to succeed. */
- if (mcnt)
+ if (mcnt > 0)
{
mcnt--;
p += 2;
STORE_NUMBER_AND_INCR (p, mcnt);
+ DEBUG_PRINT3 (" Setting 0x%x to %d.\n", p, mcnt);
}
else if (mcnt == 0)
{
- p[2] = unused;
- p[3] = unused;
+ DEBUG_PRINT2 (" Setting two bytes from 0x%x to no_op.\n", p+2);
+ p[2] = (unsigned char) no_op;
+ p[3] = (unsigned char) no_op;
goto on_failure;
}
- else
- {
- fprintf (stderr, "regex: the succeed_n's n is not set.\n");
- exit (1);
- }
break;
case jump_n:
EXTRACT_NUMBER (mcnt, p + 2);
+ DEBUG_PRINT2 ("EXECUTING jump_n %d.\n", mcnt);
+
/* Originally, this is how many times we CAN jump. */
if (mcnt)
{
mcnt--;
- STORE_NUMBER(p + 2, mcnt);
- goto nofinalize; /* Do the jump without taking off
- any failure points. */
+ STORE_NUMBER (p + 2, mcnt);
+ goto unconditional_jump;
}
/* If don't have to jump any more, skip over the rest of command. */
else
@@ -2432,223 +4285,494 @@ re_match_2 (pbufp, string1_arg, size1, string2_arg, size2, pos, regs, mstop)
case set_number_at:
{
- register unsigned char *p1;
+ DEBUG_PRINT1 ("EXECUTING set_number_at.\n");
EXTRACT_NUMBER_AND_INCR (mcnt, p);
p1 = p + mcnt;
EXTRACT_NUMBER_AND_INCR (mcnt, p);
+ DEBUG_PRINT3 (" Setting 0x%x to %d.\n", p1, mcnt);
STORE_NUMBER (p1, mcnt);
break;
}
- /* Ignore these. Used to ignore the n of succeed_n's which
- currently have n == 0. */
- case unused:
- break;
-
case wordbound:
- if (AT_WORD_BOUNDARY)
+ DEBUG_PRINT1 ("EXECUTING wordbound.\n");
+ if (AT_WORD_BOUNDARY (d))
break;
- goto fail;
+ goto fail;
case notwordbound:
- if (AT_WORD_BOUNDARY)
+ DEBUG_PRINT1 ("EXECUTING notwordbound.\n");
+ if (AT_WORD_BOUNDARY (d))
goto fail;
- break;
+ break;
case wordbeg:
- if (IS_A_LETTER (d) && (!IS_A_LETTER (d - 1) || AT_STRINGS_BEG))
+ DEBUG_PRINT1 ("EXECUTING wordbeg.\n");
+ if (WORDCHAR_P (d) && (AT_STRINGS_BEG (d) || !WORDCHAR_P (d - 1)))
break;
- goto fail;
+ goto fail;
case wordend:
- /* Have to check if AT_STRINGS_BEG before looking at d - 1. */
- if (!AT_STRINGS_BEG && IS_A_LETTER (d - 1)
- && (!IS_A_LETTER (d) || AT_STRINGS_END))
+ DEBUG_PRINT1 ("EXECUTING wordend.\n");
+ if (!AT_STRINGS_BEG (d) && WORDCHAR_P (d - 1)
+ && (!WORDCHAR_P (d) || AT_STRINGS_END (d)))
break;
- goto fail;
+ goto fail;
#ifdef emacs
- case before_dot:
- if (PTR_CHAR_POS (d) >= point)
- goto fail;
- break;
-
+#ifdef emacs19
+ case before_dot:
+ DEBUG_PRINT1 ("EXECUTING before_dot.\n");
+ if (PTR_CHAR_POS ((unsigned char *) d) >= point)
+ goto fail;
+ break;
+
+ case at_dot:
+ DEBUG_PRINT1 ("EXECUTING at_dot.\n");
+ if (PTR_CHAR_POS ((unsigned char *) d) != point)
+ goto fail;
+ break;
+
+ case after_dot:
+ DEBUG_PRINT1 ("EXECUTING after_dot.\n");
+ if (PTR_CHAR_POS ((unsigned char *) d) <= point)
+ goto fail;
+ break;
+#else /* not emacs19 */
case at_dot:
- if (PTR_CHAR_POS (d) != point)
+ DEBUG_PRINT1 ("EXECUTING at_dot.\n");
+ if (PTR_CHAR_POS ((unsigned char *) d) + 1 != point)
goto fail;
break;
-
- case after_dot:
- if (PTR_CHAR_POS (d) <= point)
- goto fail;
- break;
-
- case wordchar:
- mcnt = (int) Sword;
- goto matchsyntax;
+#endif /* not emacs19 */
case syntaxspec:
+ DEBUG_PRINT2 ("EXECUTING syntaxspec %d.\n", mcnt);
mcnt = *p++;
- matchsyntax:
- PREFETCH;
- if (SYNTAX (*d++) != (enum syntaxcode) mcnt) goto fail;
- SET_REGS_MATCHED;
- break;
-
- case notwordchar:
+ goto matchsyntax;
+
+ case wordchar:
+ DEBUG_PRINT1 ("EXECUTING Emacs wordchar.\n");
mcnt = (int) Sword;
- goto matchnotsyntax;
+ matchsyntax:
+ PREFETCH ();
+ if (SYNTAX (*d++) != (enum syntaxcode) mcnt)
+ goto fail;
+ SET_REGS_MATCHED ();
+ break;
case notsyntaxspec:
+ DEBUG_PRINT2 ("EXECUTING notsyntaxspec %d.\n", mcnt);
mcnt = *p++;
- matchnotsyntax:
- PREFETCH;
- if (SYNTAX (*d++) == (enum syntaxcode) mcnt) goto fail;
- SET_REGS_MATCHED;
+ goto matchnotsyntax;
+
+ case notwordchar:
+ DEBUG_PRINT1 ("EXECUTING Emacs notwordchar.\n");
+ mcnt = (int) Sword;
+ matchnotsyntax:
+ PREFETCH ();
+ if (SYNTAX (*d++) == (enum syntaxcode) mcnt)
+ goto fail;
+ SET_REGS_MATCHED ();
break;
#else /* not emacs */
-
case wordchar:
- PREFETCH;
- if (!IS_A_LETTER (d))
+ DEBUG_PRINT1 ("EXECUTING non-Emacs wordchar.\n");
+ PREFETCH ();
+ if (!WORDCHAR_P (d))
goto fail;
- SET_REGS_MATCHED;
+ SET_REGS_MATCHED ();
+ d++;
break;
case notwordchar:
- PREFETCH;
- if (IS_A_LETTER (d))
+ DEBUG_PRINT1 ("EXECUTING non-Emacs notwordchar.\n");
+ PREFETCH ();
+ if (WORDCHAR_P (d))
goto fail;
- SET_REGS_MATCHED;
- break;
-
- case before_dot:
- case at_dot:
- case after_dot:
- case syntaxspec:
- case notsyntaxspec:
+ SET_REGS_MATCHED ();
+ d++;
break;
-
#endif /* not emacs */
-
- case begbuf:
- if (AT_STRINGS_BEG)
- break;
- goto fail;
-
- case endbuf:
- if (AT_STRINGS_END)
- break;
- goto fail;
-
- case exactn:
- /* Match the next few pattern characters exactly.
- mcnt is how many characters to match. */
- mcnt = *p++;
- /* This is written out as an if-else so we don't waste time
- testing `translate' inside the loop. */
- if (translate)
- {
- do
- {
- PREFETCH;
- if (translate[*d++] != *p++) goto fail;
- }
- while (--mcnt);
- }
- else
- {
- do
- {
- PREFETCH;
- if (*d++ != *p++) goto fail;
- }
- while (--mcnt);
- }
- SET_REGS_MATCHED;
- break;
+
+ default:
+ abort ();
}
continue; /* Successfully executed one pattern command; keep going. */
- /* Jump here if any matching operation fails. */
+
+ /* We goto here if a matching operation fails. */
fail:
- if (stackp != stackb)
- /* A restart point is known. Restart there and pop it. */
- {
- short last_used_reg, this_reg;
-
- /* If this failure point is from a dummy_failure_point, just
- skip it. */
- if (!stackp[-2])
+ if (!FAIL_STACK_EMPTY ())
+ { /* A restart point is known. Restore to that state. */
+ DEBUG_PRINT1 ("\nFAIL:\n");
+ POP_FAILURE_POINT (d, p,
+ lowest_active_reg, highest_active_reg,
+ regstart, regend, reg_info);
+
+ /* If this failure point is a dummy, try the next one. */
+ if (!p)
+ goto fail;
+
+ /* If we failed to the end of the pattern, don't examine *p. */
+ assert (p <= pend);
+ if (p < pend)
{
- POP_FAILURE_POINT ();
- goto fail;
+ boolean is_a_jump_n = false;
+
+ /* If failed to a backwards jump that's part of a repetition
+ loop, need to pop this failure point and use the next one. */
+ switch ((re_opcode_t) *p)
+ {
+ case jump_n:
+ is_a_jump_n = true;
+ case maybe_pop_jump:
+ case pop_failure_jump:
+ case jump:
+ p1 = p + 1;
+ EXTRACT_NUMBER_AND_INCR (mcnt, p1);
+ p1 += mcnt;
+
+ if ((is_a_jump_n && (re_opcode_t) *p1 == succeed_n)
+ || (!is_a_jump_n
+ && (re_opcode_t) *p1 == on_failure_jump))
+ goto fail;
+ break;
+ default:
+ /* do nothing */ ;
+ }
}
- d = *--stackp;
- p = *--stackp;
if (d >= string1 && d <= end1)
dend = end_match_1;
- /* Restore register info. */
- last_used_reg = (long) *--stackp;
-
- /* Make the ones that weren't saved -1 or 0 again. */
- for (this_reg = RE_NREGS - 1; this_reg > last_used_reg; this_reg--)
- {
- regend[this_reg] = (unsigned char *) (-1L);
- regstart[this_reg] = (unsigned char *) (-1L);
- IS_ACTIVE (reg_info[this_reg]) = 0;
- MATCHED_SOMETHING (reg_info[this_reg]) = 0;
- }
-
- /* And restore the rest from the stack. */
- for ( ; this_reg > 0; this_reg--)
- {
- reg_info[this_reg] = *(struct register_info *) *--stackp;
- regend[this_reg] = *--stackp;
- regstart[this_reg] = *--stackp;
- }
- }
+ }
else
break; /* Matching at this starting point really fails. */
- }
+ } /* for (;;) */
if (best_regs_set)
goto restore_best_regs;
- FREE_AND_RETURN(stackb,(-1)); /* Failure to match. */
-}
+ FREE_VARIABLES ();
+
+ return -1; /* Failure to match. */
+} /* re_match_2 */
+
+/* Subroutine definitions for re_match_2. */
+
+
+/* We are passed P pointing to a register number after a start_memory.
+
+ Return true if the pattern up to the corresponding stop_memory can
+ match the empty string, and false otherwise.
+
+ If we find the matching stop_memory, sets P to point to one past its number.
+ Otherwise, sets P to an undefined byte less than or equal to END.
+
+ We don't handle duplicates properly (yet). */
+
+static boolean
+group_match_null_string_p (p, end, reg_info)
+ unsigned char **p, *end;
+ register_info_type *reg_info;
+{
+ int mcnt;
+ /* Point to after the args to the start_memory. */
+ unsigned char *p1 = *p + 2;
+
+ while (p1 < end)
+ {
+ /* Skip over opcodes that can match nothing, and return true or
+ false, as appropriate, when we get to one that can't, or to the
+ matching stop_memory. */
+
+ switch ((re_opcode_t) *p1)
+ {
+ /* Could be either a loop or a series of alternatives. */
+ case on_failure_jump:
+ p1++;
+ EXTRACT_NUMBER_AND_INCR (mcnt, p1);
+
+ /* If the next operation is not a jump backwards in the
+ pattern. */
+
+ if (mcnt >= 0)
+ {
+ /* Go through the on_failure_jumps of the alternatives,
+ seeing if any of the alternatives cannot match nothing.
+ The last alternative starts with only a jump,
+ whereas the rest start with on_failure_jump and end
+ with a jump, e.g., here is the pattern for `a|b|c':
+
+ /on_failure_jump/0/6/exactn/1/a/jump_past_alt/0/6
+ /on_failure_jump/0/6/exactn/1/b/jump_past_alt/0/3
+ /exactn/1/c
+
+ So, we have to first go through the first (n-1)
+ alternatives and then deal with the last one separately. */
+
+
+ /* Deal with the first (n-1) alternatives, which start
+ with an on_failure_jump (see above) that jumps to right
+ past a jump_past_alt. */
+
+ while ((re_opcode_t) p1[mcnt-3] == jump_past_alt)
+ {
+ /* `mcnt' holds how many bytes long the alternative
+ is, including the ending `jump_past_alt' and
+ its number. */
+
+ if (!alt_match_null_string_p (p1, p1 + mcnt - 3,
+ reg_info))
+ return false;
+
+ /* Move to right after this alternative, including the
+ jump_past_alt. */
+ p1 += mcnt;
+
+ /* Break if it's the beginning of an n-th alternative
+ that doesn't begin with an on_failure_jump. */
+ if ((re_opcode_t) *p1 != on_failure_jump)
+ break;
+
+ /* Still have to check that it's not an n-th
+ alternative that starts with an on_failure_jump. */
+ p1++;
+ EXTRACT_NUMBER_AND_INCR (mcnt, p1);
+ if ((re_opcode_t) p1[mcnt-3] != jump_past_alt)
+ {
+ /* Get to the beginning of the n-th alternative. */
+ p1 -= 3;
+ break;
+ }
+ }
+
+ /* Deal with the last alternative: go back and get number
+ of the `jump_past_alt' just before it. `mcnt' contains
+ the length of the alternative. */
+ EXTRACT_NUMBER (mcnt, p1 - 2);
+
+ if (!alt_match_null_string_p (p1, p1 + mcnt, reg_info))
+ return false;
+
+ p1 += mcnt; /* Get past the n-th alternative. */
+ } /* if mcnt > 0 */
+ break;
+
+
+ case stop_memory:
+ assert (p1[1] == **p);
+ *p = p1 + 2;
+ return true;
+
+
+ default:
+ if (!common_op_match_null_string_p (&p1, end, reg_info))
+ return false;
+ }
+ } /* while p1 < end */
+
+ return false;
+} /* group_match_null_string_p */
+
+
+/* Similar to group_match_null_string_p, but doesn't deal with alternatives:
+ It expects P to be the first byte of a single alternative and END one
+ byte past the last. The alternative can contain groups. */
+
+static boolean
+alt_match_null_string_p (p, end, reg_info)
+ unsigned char *p, *end;
+ register_info_type *reg_info;
+{
+ int mcnt;
+ unsigned char *p1 = p;
+
+ while (p1 < end)
+ {
+ /* Skip over opcodes that can match nothing, and break when we get
+ to one that can't. */
+
+ switch ((re_opcode_t) *p1)
+ {
+ /* It's a loop. */
+ case on_failure_jump:
+ p1++;
+ EXTRACT_NUMBER_AND_INCR (mcnt, p1);
+ p1 += mcnt;
+ break;
+
+ default:
+ if (!common_op_match_null_string_p (&p1, end, reg_info))
+ return false;
+ }
+ } /* while p1 < end */
+
+ return true;
+} /* alt_match_null_string_p */
+
+
+/* Deals with the ops common to group_match_null_string_p and
+ alt_match_null_string_p.
+
+ Sets P to one after the op and its arguments, if any. */
+
+static boolean
+common_op_match_null_string_p (p, end, reg_info)
+ unsigned char **p, *end;
+ register_info_type *reg_info;
+{
+ int mcnt;
+ boolean ret;
+ int reg_no;
+ unsigned char *p1 = *p;
+
+ switch ((re_opcode_t) *p1++)
+ {
+ case no_op:
+ case begline:
+ case endline:
+ case begbuf:
+ case endbuf:
+ case wordbeg:
+ case wordend:
+ case wordbound:
+ case notwordbound:
+#ifdef emacs
+ case before_dot:
+ case at_dot:
+ case after_dot:
+#endif
+ break;
+
+ case start_memory:
+ reg_no = *p1;
+ assert (reg_no > 0 && reg_no <= MAX_REGNUM);
+ ret = group_match_null_string_p (&p1, end, reg_info);
+
+ /* Have to set this here in case we're checking a group which
+ contains a group and a back reference to it. */
+
+ if (REG_MATCH_NULL_STRING_P (reg_info[reg_no]) == MATCH_NULL_UNSET_VALUE)
+ REG_MATCH_NULL_STRING_P (reg_info[reg_no]) = ret;
+
+ if (!ret)
+ return false;
+ break;
+
+ /* If this is an optimized succeed_n for zero times, make the jump. */
+ case jump:
+ EXTRACT_NUMBER_AND_INCR (mcnt, p1);
+ if (mcnt >= 0)
+ p1 += mcnt;
+ else
+ return false;
+ break;
+
+ case succeed_n:
+ /* Get to the number of times to succeed. */
+ p1 += 2;
+ EXTRACT_NUMBER_AND_INCR (mcnt, p1);
+
+ if (mcnt == 0)
+ {
+ p1 -= 4;
+ EXTRACT_NUMBER_AND_INCR (mcnt, p1);
+ p1 += mcnt;
+ }
+ else
+ return false;
+ break;
+
+ case duplicate:
+ if (!REG_MATCH_NULL_STRING_P (reg_info[*p1]))
+ return false;
+ break;
+
+ case set_number_at:
+ p1 += 4;
+
+ default:
+ /* All other opcodes mean we cannot match the empty string. */
+ return false;
+ }
+
+ *p = p1;
+ return true;
+} /* common_op_match_null_string_p */
+/* Return zero if TRANSLATE[S1] and TRANSLATE[S2] are identical for LEN
+ bytes; nonzero otherwise. */
+
static int
-memcmp_translate (s1, s2, len, translate)
- unsigned char *s1, *s2;
+bcmp_translate (s1, s2, len, translate)
+ const char *s1, *s2;
register int len;
- unsigned char *translate;
+ char *translate;
{
- register unsigned char *p1 = s1, *p2 = s2;
+ register const unsigned char *p1 = (const unsigned char *) s1,
+ *p2 = (const unsigned char *) s2;
while (len)
{
- if (translate [*p1++] != translate [*p2++]) return 1;
+ if (translate[*p1++] != translate[*p2++]) return 1;
len--;
}
return 0;
}
+
+/* Entry points for GNU code. */
+/* re_compile_pattern is the GNU regular expression compiler: it
+ compiles PATTERN (of length SIZE) and puts the result in BUFP.
+ Returns 0 if the pattern was valid, otherwise an error string.
+
+ Assumes the `allocated' (and perhaps `buffer') and `translate' fields
+ are set in BUFP on entry.
+
+ We call regex_compile to do the actual compilation. */
+
+const char *
+re_compile_pattern (pattern, length, bufp)
+ const char *pattern;
+ size_t length;
+ struct re_pattern_buffer *bufp;
+{
+ reg_errcode_t ret;
+
+ /* GNU code is written to assume at least RE_NREGS registers will be set
+ (and at least one extra will be -1). */
+ bufp->regs_allocated = REGS_UNALLOCATED;
+
+ /* And GNU code determines whether or not to get register information
+ by passing null for the REGS argument to re_match, etc., not by
+ setting no_sub. */
+ bufp->no_sub = 0;
+
+ /* Match anchors at newline. */
+ bufp->newline_anchor = 1;
+
+ ret = regex_compile (pattern, length, re_syntax_options, bufp);
+ return re_error_msg[(int) ret];
+}
-/* Entry points compatible with 4.2 BSD regex library. */
+/* Entry points compatible with 4.2 BSD regex library. We don't define
+ them if this is an Emacs or POSIX compilation. */
-#if !defined(emacs) && !defined(GAWK)
+#if !defined (emacs) && !defined (_POSIX_SOURCE)
+/* BSD has one and only one pattern buffer. */
static struct re_pattern_buffer re_comp_buf;
char *
re_comp (s)
- char *s;
+ const char *s;
{
+ reg_errcode_t ret;
+
if (!s)
{
if (!re_comp_buf.buffer)
@@ -2658,197 +4782,289 @@ re_comp (s)
if (!re_comp_buf.buffer)
{
- if (!(re_comp_buf.buffer = (char *) malloc (200)))
- return "Memory exhausted";
+ re_comp_buf.buffer = (unsigned char *) malloc (200);
+ if (re_comp_buf.buffer == NULL)
+ return "Memory exhausted";
re_comp_buf.allocated = 200;
- if (!(re_comp_buf.fastmap = (char *) malloc (1 << BYTEWIDTH)))
+
+ re_comp_buf.fastmap = (char *) malloc (1 << BYTEWIDTH);
+ if (re_comp_buf.fastmap == NULL)
return "Memory exhausted";
}
- return re_compile_pattern (s, strlen (s), &re_comp_buf);
+
+ /* Since `re_exec' always passes NULL for the `regs' argument, we
+ don't need to initialize the pattern buffer fields which affect it. */
+
+ /* Match anchors at newlines. */
+ re_comp_buf.newline_anchor = 1;
+
+ ret = regex_compile (s, strlen (s), re_syntax_options, &re_comp_buf);
+
+ /* Yes, we're discarding `const' here. */
+ return (char *) re_error_msg[(int) ret];
}
+
int
re_exec (s)
- char *s;
+ const char *s;
{
- int len = strlen (s);
- return 0 <= re_search (&re_comp_buf, s, len, 0, len,
- (struct re_registers *) 0);
+ const int len = strlen (s);
+ return
+ 0 <= re_search (&re_comp_buf, s, len, 0, len, (struct re_registers *) 0);
}
-#endif /* not emacs && not GAWK */
+#endif /* not emacs and not _POSIX_SOURCE */
+
+/* POSIX.2 functions. Don't define these for Emacs. */
+#ifndef emacs
-
-#ifdef test
+/* regcomp takes a regular expression as a string and compiles it.
-#ifdef atarist
-long _stksize = 2L; /* reserve memory for stack */
-#endif
-#include <stdio.h>
+ PREG is a regex_t *. We do not expect any fields to be initialized,
+ since POSIX says we shouldn't. Thus, we set
-/* Indexed by a character, gives the upper case equivalent of the
- character. */
-
-char upcase[0400] =
- { 000, 001, 002, 003, 004, 005, 006, 007,
- 010, 011, 012, 013, 014, 015, 016, 017,
- 020, 021, 022, 023, 024, 025, 026, 027,
- 030, 031, 032, 033, 034, 035, 036, 037,
- 040, 041, 042, 043, 044, 045, 046, 047,
- 050, 051, 052, 053, 054, 055, 056, 057,
- 060, 061, 062, 063, 064, 065, 066, 067,
- 070, 071, 072, 073, 074, 075, 076, 077,
- 0100, 0101, 0102, 0103, 0104, 0105, 0106, 0107,
- 0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117,
- 0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127,
- 0130, 0131, 0132, 0133, 0134, 0135, 0136, 0137,
- 0140, 0101, 0102, 0103, 0104, 0105, 0106, 0107,
- 0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117,
- 0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127,
- 0130, 0131, 0132, 0173, 0174, 0175, 0176, 0177,
- 0200, 0201, 0202, 0203, 0204, 0205, 0206, 0207,
- 0210, 0211, 0212, 0213, 0214, 0215, 0216, 0217,
- 0220, 0221, 0222, 0223, 0224, 0225, 0226, 0227,
- 0230, 0231, 0232, 0233, 0234, 0235, 0236, 0237,
- 0240, 0241, 0242, 0243, 0244, 0245, 0246, 0247,
- 0250, 0251, 0252, 0253, 0254, 0255, 0256, 0257,
- 0260, 0261, 0262, 0263, 0264, 0265, 0266, 0267,
- 0270, 0271, 0272, 0273, 0274, 0275, 0276, 0277,
- 0300, 0301, 0302, 0303, 0304, 0305, 0306, 0307,
- 0310, 0311, 0312, 0313, 0314, 0315, 0316, 0317,
- 0320, 0321, 0322, 0323, 0324, 0325, 0326, 0327,
- 0330, 0331, 0332, 0333, 0334, 0335, 0336, 0337,
- 0340, 0341, 0342, 0343, 0344, 0345, 0346, 0347,
- 0350, 0351, 0352, 0353, 0354, 0355, 0356, 0357,
- 0360, 0361, 0362, 0363, 0364, 0365, 0366, 0367,
- 0370, 0371, 0372, 0373, 0374, 0375, 0376, 0377
- };
+ `buffer' to the compiled pattern;
+ `used' to the length of the compiled pattern;
+ `syntax' to RE_SYNTAX_POSIX_EXTENDED if the
+ REG_EXTENDED bit in CFLAGS is set; otherwise, to
+ RE_SYNTAX_POSIX_BASIC;
+ `newline_anchor' to REG_NEWLINE being set in CFLAGS;
+ `fastmap' and `fastmap_accurate' to zero;
+ `re_nsub' to the number of subexpressions in PATTERN.
-#ifdef canned
+ PATTERN is the address of the pattern string.
-#include "tests.h"
+ CFLAGS is a series of bits which affect compilation.
-typedef enum { extended_test, basic_test } test_type;
+ If REG_EXTENDED is set, we use POSIX extended syntax; otherwise, we
+ use POSIX basic syntax.
-/* Use this to run the tests we've thought of. */
+ If REG_NEWLINE is set, then . and [^...] don't match newline.
+ Also, regexec will try a match beginning after every newline.
-void
-main ()
-{
- test_type t = extended_test;
+ If REG_ICASE is set, then we considers upper- and lowercase
+ versions of letters to be equivalent when matching.
+
+ If REG_NOSUB is set, then when PREG is passed to regexec, that
+ routine will report only success or failure, and nothing about the
+ registers.
+
+ It returns 0 if it succeeds, nonzero if it doesn't. (See regex.h for
+ the return codes and their meanings.) */
- if (t == basic_test)
+int
+regcomp (preg, pattern, cflags)
+ regex_t *preg;
+ const char *pattern;
+ int cflags;
+{
+ reg_errcode_t ret;
+ reg_syntax_t syntax
+ = (cflags & REG_EXTENDED) ?
+ RE_SYNTAX_POSIX_EXTENDED : RE_SYNTAX_POSIX_BASIC;
+
+ /* regex_compile will allocate the space for the compiled pattern. */
+ preg->buffer = 0;
+ preg->allocated = 0;
+ preg->used = 0;
+
+ /* Don't bother to use a fastmap when searching. This simplifies the
+ REG_NEWLINE case: if we used a fastmap, we'd have to put all the
+ characters after newlines into the fastmap. This way, we just try
+ every character. */
+ preg->fastmap = 0;
+
+ if (cflags & REG_ICASE)
{
- printf ("Running basic tests:\n\n");
- test_posix_basic ();
+ unsigned i;
+
+ preg->translate = (char *) malloc (CHAR_SET_SIZE);
+ if (preg->translate == NULL)
+ return (int) REG_ESPACE;
+
+ /* Map uppercase characters to corresponding lowercase ones. */
+ for (i = 0; i < CHAR_SET_SIZE; i++)
+ preg->translate[i] = ISUPPER (i) ? tolower (i) : i;
}
- else if (t == extended_test)
- {
- printf ("Running extended tests:\n\n");
- test_posix_extended ();
+ else
+ preg->translate = NULL;
+
+ /* If REG_NEWLINE is set, newlines are treated differently. */
+ if (cflags & REG_NEWLINE)
+ { /* REG_NEWLINE implies neither . nor [^...] match newline. */
+ syntax &= ~RE_DOT_NEWLINE;
+ syntax |= RE_HAT_LISTS_NOT_NEWLINE;
+ /* It also changes the matching behavior. */
+ preg->newline_anchor = 1;
}
+ else
+ preg->newline_anchor = 0;
+
+ preg->no_sub = !!(cflags & REG_NOSUB);
+
+ /* POSIX says a null character in the pattern terminates it, so we
+ can use strlen here in compiling the pattern. */
+ ret = regex_compile (pattern, strlen (pattern), syntax, preg);
+
+ /* POSIX doesn't distinguish between an unmatched open-group and an
+ unmatched close-group: both are REG_EPAREN. */
+ if (ret == REG_ERPAREN) ret = REG_EPAREN;
+
+ return (int) ret;
}
-#else /* not canned */
-/* Use this to run interactive tests. */
+/* regexec searches for a given pattern, specified by PREG, in the
+ string STRING.
+
+ If NMATCH is zero or REG_NOSUB was set in the cflags argument to
+ `regcomp', we ignore PMATCH. Otherwise, we assume PMATCH has at
+ least NMATCH elements, and we set them to the offsets of the
+ corresponding matched substrings.
+
+ EFLAGS specifies `execution flags' which affect matching: if
+ REG_NOTBOL is set, then ^ does not match at the beginning of the
+ string; if REG_NOTEOL is set, then $ does not match at the end.
+
+ We return 0 if we find a match and REG_NOMATCH if not. */
-void
-main (argc, argv)
- int argc;
- char **argv;
+int
+regexec (preg, string, nmatch, pmatch, eflags)
+ const regex_t *preg;
+ const char *string;
+ size_t nmatch;
+ regmatch_t pmatch[];
+ int eflags;
{
- char pat[80];
- struct re_pattern_buffer buf;
- int i;
- char c;
- char fastmap[(1 << BYTEWIDTH)];
-
- /* Allow a command argument to specify the style of syntax. */
- if (argc > 1)
- obscure_syntax = atol (argv[1]);
-
- buf.allocated = 40;
- buf.buffer = (char *) malloc (buf.allocated);
- buf.fastmap = fastmap;
- buf.translate = upcase;
-
- while (1)
+ int ret;
+ struct re_registers regs;
+ regex_t private_preg;
+ int len = strlen (string);
+ boolean want_reg_info = !preg->no_sub && nmatch > 0;
+
+ private_preg = *preg;
+
+ private_preg.not_bol = !!(eflags & REG_NOTBOL);
+ private_preg.not_eol = !!(eflags & REG_NOTEOL);
+
+ /* The user has told us exactly how many registers to return
+ information about, via `nmatch'. We have to pass that on to the
+ matching routines. */
+ private_preg.regs_allocated = REGS_FIXED;
+
+ if (want_reg_info)
{
- gets (pat);
+ regs.num_regs = nmatch;
+ regs.start = TALLOC (nmatch, regoff_t);
+ regs.end = TALLOC (nmatch, regoff_t);
+ if (regs.start == NULL || regs.end == NULL)
+ return (int) REG_NOMATCH;
+ }
- if (*pat)
- {
- re_compile_pattern (pat, strlen(pat), &buf);
+ /* Perform the searching operation. */
+ ret = re_search (&private_preg, string, len,
+ /* start: */ 0, /* range: */ len,
+ want_reg_info ? &regs : (struct re_registers *) 0);
+
+ /* Copy the register information to the POSIX structure. */
+ if (want_reg_info)
+ {
+ if (ret >= 0)
+ {
+ unsigned r;
- for (i = 0; i < buf.used; i++)
- printchar (buf.buffer[i]);
+ for (r = 0; r < nmatch; r++)
+ {
+ pmatch[r].rm_so = regs.start[r];
+ pmatch[r].rm_eo = regs.end[r];
+ }
+ }
- putchar ('\n');
+ /* If we needed the temporary register info, free the space now. */
+ free (regs.start);
+ free (regs.end);
+ }
- printf ("%d allocated, %d used.\n", buf.allocated, buf.used);
+ /* We want zero return to mean success, unlike `re_search'. */
+ return ret >= 0 ? (int) REG_NOERROR : (int) REG_NOMATCH;
+}
- re_compile_fastmap (&buf);
- printf ("Allowed by fastmap: ");
- for (i = 0; i < (1 << BYTEWIDTH); i++)
- if (fastmap[i]) printchar (i);
- putchar ('\n');
- }
- gets (pat); /* Now read the string to match against */
+/* Returns a message corresponding to an error code, ERRCODE, returned
+ from either regcomp or regexec. We don't use PREG here. */
- i = re_match (&buf, pat, strlen (pat), 0, 0);
- printf ("Match value %d.\n", i);
- }
-}
+size_t
+regerror (errcode, preg, errbuf, errbuf_size)
+ int errcode;
+ const regex_t *preg;
+ char *errbuf;
+ size_t errbuf_size;
+{
+ const char *msg;
+ size_t msg_size;
-#endif
+ if (errcode < 0
+ || errcode >= (sizeof (re_error_msg) / sizeof (re_error_msg[0])))
+ /* Only error codes returned by the rest of the code should be passed
+ to this routine. If we are given anything else, or if other regex
+ code generates an invalid error code, then the program has a bug.
+ Dump core so we can fix it. */
+ abort ();
+ msg = re_error_msg[errcode];
-#ifdef NOTDEF
-print_buf (bufp)
- struct re_pattern_buffer *bufp;
-{
- int i;
+ /* POSIX doesn't require that we do anything in this case, but why
+ not be nice. */
+ if (! msg)
+ msg = "Success";
- printf ("buf is :\n----------------\n");
- for (i = 0; i < bufp->used; i++)
- printchar (bufp->buffer[i]);
-
- printf ("\n%d allocated, %d used.\n", bufp->allocated, bufp->used);
+ msg_size = strlen (msg) + 1; /* Includes the null. */
- printf ("Allowed by fastmap: ");
- for (i = 0; i < (1 << BYTEWIDTH); i++)
- if (bufp->fastmap[i])
- printchar (i);
- printf ("\nAllowed by translate: ");
- if (bufp->translate)
- for (i = 0; i < (1 << BYTEWIDTH); i++)
- if (bufp->translate[i])
- printchar (i);
- printf ("\nfastmap is%s accurate\n", bufp->fastmap_accurate ? "" : "n't");
- printf ("can %s be null\n----------", bufp->can_be_null ? "" : "not");
-}
-#endif /* NOTDEF */
-
-printchar (c)
- char c;
-{
- if (c < 040 || c >= 0177)
+ if (errbuf_size != 0)
{
- putchar ('\\');
- putchar (((c >> 6) & 3) + '0');
- putchar (((c >> 3) & 7) + '0');
- putchar ((c & 7) + '0');
+ if (msg_size > errbuf_size)
+ {
+ strncpy (errbuf, msg, errbuf_size - 1);
+ errbuf[errbuf_size - 1] = 0;
+ }
+ else
+ strcpy (errbuf, msg);
}
- else
- putchar (c);
+
+ return msg_size;
}
-error (string)
- char *string;
+
+/* Free dynamically allocated space used by PREG. */
+
+void
+regfree (preg)
+ regex_t *preg;
{
- puts (string);
- exit (1);
+ if (preg->buffer != NULL)
+ free (preg->buffer);
+ preg->buffer = NULL;
+
+ preg->allocated = 0;
+ preg->used = 0;
+
+ if (preg->fastmap != NULL)
+ free (preg->fastmap);
+ preg->fastmap = NULL;
+ preg->fastmap_accurate = 0;
+
+ if (preg->translate != NULL)
+ free (preg->translate);
+ preg->translate = NULL;
}
-#endif /* test */
+
+#endif /* not emacs */
+
+/*
+Local variables:
+make-backup-files: t
+version-control: t
+trim-versions-without-asking: nil
+End:
+*/
diff --git a/gnu/usr.bin/awk/regex.h b/gnu/usr.bin/awk/regex.h
index fce11c3a97dd..757dbacf7a28 100644
--- a/gnu/usr.bin/awk/regex.h
+++ b/gnu/usr.bin/awk/regex.h
@@ -1,10 +1,11 @@
-/* Definitions for data structures callers pass the regex library.
+/* Definitions for data structures and routines for the regular
+ expression library, version 0.12.
- Copyright (C) 1985, 1989-90 Free Software Foundation, Inc.
+ Copyright (C) 1985, 1989, 1990, 1991, 1992, 1993 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 1, or (at your option)
+ the Free Software Foundation; either version 2, or (at your option)
any later version.
This program is distributed in the hope that it will be useful,
@@ -16,245 +17,489 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
+#ifndef __REGEXP_LIBRARY_H__
+#define __REGEXP_LIBRARY_H__
-#ifndef __REGEXP_LIBRARY
-#define __REGEXP_LIBRARY
+/* POSIX says that <sys/types.h> must be included (by the caller) before
+ <regex.h>. */
-/* Define number of parens for which we record the beginnings and ends.
- This affects how much space the `struct re_registers' type takes up. */
-#ifndef RE_NREGS
-#define RE_NREGS 10
+#ifdef VMS
+/* VMS doesn't have `size_t' in <sys/types.h>, even though POSIX says it
+ should be there. */
+#include <stddef.h>
#endif
-#define BYTEWIDTH 8
+/* The following two types have to be signed and unsigned integer type
+ wide enough to hold a value of a pointer. For most ANSI compilers
+ ptrdiff_t and size_t should be likely OK. Still size of these two
+ types is 2 for Microsoft C. Ugh... */
+typedef long s_reg_t;
+typedef unsigned long active_reg_t;
-/* Maximum number of duplicates an interval can allow. */
-#ifndef RE_DUP_MAX
-#define RE_DUP_MAX ((1 << 15) - 1)
-#endif
+/* The following bits are used to determine the regexp syntax we
+ recognize. The set/not-set meanings are chosen so that Emacs syntax
+ remains the value 0. The bits are given in alphabetical order, and
+ the definitions shifted by one from the previous bit; thus, when we
+ add or remove a bit, only one other definition need change. */
+typedef unsigned long reg_syntax_t;
+
+/* If this bit is not set, then \ inside a bracket expression is literal.
+ If set, then such a \ quotes the following character. */
+#define RE_BACKSLASH_ESCAPE_IN_LISTS (1L)
+/* If this bit is not set, then + and ? are operators, and \+ and \? are
+ literals.
+ If set, then \+ and \? are operators and + and ? are literals. */
+#define RE_BK_PLUS_QM (RE_BACKSLASH_ESCAPE_IN_LISTS << 1)
-/* This defines the various regexp syntaxes. */
-extern long obscure_syntax;
-
-
-/* The following bits are used in the obscure_syntax variable to choose among
- alternative regexp syntaxes. */
-
-/* If this bit is set, plain parentheses serve as grouping, and backslash
- parentheses are needed for literal searching.
- If not set, backslash-parentheses are grouping, and plain parentheses
- are for literal searching. */
-#define RE_NO_BK_PARENS 1L
-
-/* If this bit is set, plain | serves as the `or'-operator, and \| is a
- literal.
- If not set, \| serves as the `or'-operator, and | is a literal. */
-#define RE_NO_BK_VBAR (1L << 1)
-
-/* If this bit is not set, plain + or ? serves as an operator, and \+, \? are
- literals.
- If set, \+, \? are operators and plain +, ? are literals. */
-#define RE_BK_PLUS_QM (1L << 2)
-
-/* If this bit is set, | binds tighter than ^ or $.
- If not set, the contrary. */
-#define RE_TIGHT_VBAR (1L << 3)
-
-/* If this bit is set, then treat newline as an OR operator.
- If not set, treat it as a normal character. */
-#define RE_NEWLINE_OR (1L << 4)
-
-/* If this bit is set, then special characters may act as normal
- characters in some contexts. Specifically, this applies to:
- ^ -- only special at the beginning, or after ( or |;
- $ -- only special at the end, or before ) or |;
- *, +, ? -- only special when not after the beginning, (, or |.
- If this bit is not set, special characters (such as *, ^, and $)
- always have their special meaning regardless of the surrounding
- context. */
-#define RE_CONTEXT_INDEP_OPS (1L << 5)
-
-/* If this bit is not set, then \ before anything inside [ and ] is taken as
- a real \.
- If set, then such a \ escapes the following character. This is a
- special case for awk. */
-#define RE_AWK_CLASS_HACK (1L << 6)
-
-/* If this bit is set, then \{ and \} or { and } serve as interval operators.
- If not set, then \{ and \} and { and } are treated as literals. */
-#define RE_INTERVALS (1L << 7)
-
-/* If this bit is not set, then \{ and \} serve as interval operators and
- { and } are literals.
- If set, then { and } serve as interval operators and \{ and \} are
- literals. */
-#define RE_NO_BK_CURLY_BRACES (1L << 8)
-
-/* If this bit is set, then character classes are supported; they are:
- [:alpha:], [:upper:], [:lower:], [:digit:], [:alnum:], [:xdigit:],
+/* If this bit is set, then character classes are supported. They are:
+ [:alpha:], [:upper:], [:lower:], [:digit:], [:alnum:], [:xdigit:],
[:space:], [:print:], [:punct:], [:graph:], and [:cntrl:].
If not set, then character classes are not supported. */
-#define RE_CHAR_CLASSES (1L << 9)
-
-/* If this bit is set, then the dot re doesn't match a null byte.
- If not set, it does. */
-#define RE_DOT_NOT_NULL (1L << 10)
-
-/* If this bit is set, then [^...] doesn't match a newline.
- If not set, it does. */
-#define RE_HAT_NOT_NEWLINE (1L << 11)
-
-/* If this bit is set, back references are recognized.
- If not set, they aren't. */
-#define RE_NO_BK_REFS (1L << 12)
-
-/* If this bit is set, back references must refer to a preceding
- subexpression. If not set, a back reference to a nonexistent
- subexpression is treated as literal characters. */
-#define RE_NO_EMPTY_BK_REF (1L << 13)
-
-/* If this bit is set, bracket expressions can't be empty.
- If it is set, they can be empty. */
-#define RE_NO_EMPTY_BRACKETS (1L << 14)
-
-/* If this bit is set, then *, +, ? and { cannot be first in an re or
- immediately after a |, or a (. Furthermore, a | cannot be first or
- last in an re, or immediately follow another | or a (. Also, a ^
- cannot appear in a nonleading position and a $ cannot appear in a
- nontrailing position (outside of bracket expressions, that is). */
-#define RE_CONTEXTUAL_INVALID_OPS (1L << 15)
-
-/* If this bit is set, then +, ? and | aren't recognized as operators.
- If it's not, they are. */
-#define RE_LIMITED_OPS (1L << 16)
-
-/* If this bit is set, then an ending range point has to collate higher
- or equal to the starting range point.
- If it's not set, then when the ending range point collates higher
- than the starting range point, the range is just considered empty. */
-#define RE_NO_EMPTY_RANGES (1L << 17)
-
-/* If this bit is set, then a hyphen (-) can't be an ending range point.
- If it isn't, then it can. */
-#define RE_NO_HYPHEN_RANGE_END (1L << 18)
-
-
-/* Define combinations of bits for the standard possibilities. */
-#define RE_SYNTAX_POSIX_AWK (RE_NO_BK_PARENS | RE_NO_BK_VBAR \
- | RE_CONTEXT_INDEP_OPS)
-#define RE_SYNTAX_AWK (RE_NO_BK_PARENS | RE_NO_BK_VBAR | RE_AWK_CLASS_HACK)
-#define RE_SYNTAX_EGREP (RE_NO_BK_PARENS | RE_NO_BK_VBAR \
- | RE_CONTEXT_INDEP_OPS | RE_NEWLINE_OR)
-#define RE_SYNTAX_GREP (RE_BK_PLUS_QM | RE_NEWLINE_OR)
+#define RE_CHAR_CLASSES (RE_BK_PLUS_QM << 1)
+
+/* If this bit is set, then ^ and $ are always anchors (outside bracket
+ expressions, of course).
+ If this bit is not set, then it depends:
+ ^ is an anchor if it is at the beginning of a regular
+ expression or after an open-group or an alternation operator;
+ $ is an anchor if it is at the end of a regular expression, or
+ before a close-group or an alternation operator.
+
+ This bit could be (re)combined with RE_CONTEXT_INDEP_OPS, because
+ POSIX draft 11.2 says that * etc. in leading positions is undefined.
+ We already implemented a previous draft which made those constructs
+ invalid, though, so we haven't changed the code back. */
+#define RE_CONTEXT_INDEP_ANCHORS (RE_CHAR_CLASSES << 1)
+
+/* If this bit is set, then special characters are always special
+ regardless of where they are in the pattern.
+ If this bit is not set, then special characters are special only in
+ some contexts; otherwise they are ordinary. Specifically,
+ * + ? and intervals are only special when not after the beginning,
+ open-group, or alternation operator. */
+#define RE_CONTEXT_INDEP_OPS (RE_CONTEXT_INDEP_ANCHORS << 1)
+
+/* If this bit is set, then *, +, ?, and { cannot be first in an re or
+ immediately after an alternation or begin-group operator. */
+#define RE_CONTEXT_INVALID_OPS (RE_CONTEXT_INDEP_OPS << 1)
+
+/* If this bit is set, then . matches newline.
+ If not set, then it doesn't. */
+#define RE_DOT_NEWLINE (RE_CONTEXT_INVALID_OPS << 1)
+
+/* If this bit is set, then . doesn't match NUL.
+ If not set, then it does. */
+#define RE_DOT_NOT_NULL (RE_DOT_NEWLINE << 1)
+
+/* If this bit is set, nonmatching lists [^...] do not match newline.
+ If not set, they do. */
+#define RE_HAT_LISTS_NOT_NEWLINE (RE_DOT_NOT_NULL << 1)
+
+/* If this bit is set, either \{...\} or {...} defines an
+ interval, depending on RE_NO_BK_BRACES.
+ If not set, \{, \}, {, and } are literals. */
+#define RE_INTERVALS (RE_HAT_LISTS_NOT_NEWLINE << 1)
+
+/* If this bit is set, +, ? and | aren't recognized as operators.
+ If not set, they are. */
+#define RE_LIMITED_OPS (RE_INTERVALS << 1)
+
+/* If this bit is set, newline is an alternation operator.
+ If not set, newline is literal. */
+#define RE_NEWLINE_ALT (RE_LIMITED_OPS << 1)
+
+/* If this bit is set, then `{...}' defines an interval, and \{ and \}
+ are literals.
+ If not set, then `\{...\}' defines an interval. */
+#define RE_NO_BK_BRACES (RE_NEWLINE_ALT << 1)
+
+/* If this bit is set, (...) defines a group, and \( and \) are literals.
+ If not set, \(...\) defines a group, and ( and ) are literals. */
+#define RE_NO_BK_PARENS (RE_NO_BK_BRACES << 1)
+
+/* If this bit is set, then \<digit> matches <digit>.
+ If not set, then \<digit> is a back-reference. */
+#define RE_NO_BK_REFS (RE_NO_BK_PARENS << 1)
+
+/* If this bit is set, then | is an alternation operator, and \| is literal.
+ If not set, then \| is an alternation operator, and | is literal. */
+#define RE_NO_BK_VBAR (RE_NO_BK_REFS << 1)
+
+/* If this bit is set, then an ending range point collating higher
+ than the starting range point, as in [z-a], is invalid.
+ If not set, then when ending range point collates higher than the
+ starting range point, the range is ignored. */
+#define RE_NO_EMPTY_RANGES (RE_NO_BK_VBAR << 1)
+
+/* If this bit is set, then an unmatched ) is ordinary.
+ If not set, then an unmatched ) is invalid. */
+#define RE_UNMATCHED_RIGHT_PAREN_ORD (RE_NO_EMPTY_RANGES << 1)
+
+/* If this bit is set, do not process the GNU regex operators.
+ IF not set, then the GNU regex operators are recognized. */
+#define RE_NO_GNU_OPS (RE_UNMATCHED_RIGHT_PAREN_ORD << 1)
+
+/* This global variable defines the particular regexp syntax to use (for
+ some interfaces). When a regexp is compiled, the syntax used is
+ stored in the pattern buffer, so changing this does not affect
+ already-compiled regexps. */
+extern reg_syntax_t re_syntax_options;
+
+/* Define combinations of the above bits for the standard possibilities.
+ (The [[[ comments delimit what gets put into the Texinfo file, so
+ don't delete them!) */
+/* [[[begin syntaxes]]] */
#define RE_SYNTAX_EMACS 0
-#define RE_SYNTAX_POSIX_BASIC (RE_INTERVALS | RE_BK_PLUS_QM \
- | RE_CHAR_CLASSES | RE_DOT_NOT_NULL \
- | RE_HAT_NOT_NEWLINE | RE_NO_EMPTY_BK_REF \
- | RE_NO_EMPTY_BRACKETS | RE_LIMITED_OPS \
- | RE_NO_EMPTY_RANGES | RE_NO_HYPHEN_RANGE_END)
-
-#define RE_SYNTAX_POSIX_EXTENDED (RE_INTERVALS | RE_NO_BK_CURLY_BRACES \
- | RE_NO_BK_VBAR | RE_NO_BK_PARENS \
- | RE_HAT_NOT_NEWLINE | RE_CHAR_CLASSES \
- | RE_NO_EMPTY_BRACKETS | RE_CONTEXTUAL_INVALID_OPS \
- | RE_NO_BK_REFS | RE_NO_EMPTY_RANGES \
- | RE_NO_HYPHEN_RANGE_END)
-
-
-/* This data structure is used to represent a compiled pattern. */
+
+#define RE_SYNTAX_AWK \
+ (RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL \
+ | RE_NO_BK_PARENS | RE_NO_BK_REFS \
+ | RE_NO_BK_VBAR | RE_NO_EMPTY_RANGES \
+ | RE_UNMATCHED_RIGHT_PAREN_ORD | RE_NO_GNU_OPS)
+
+#define RE_SYNTAX_GNU_AWK \
+ (RE_SYNTAX_POSIX_EXTENDED | RE_BACKSLASH_ESCAPE_IN_LISTS)
+
+#define RE_SYNTAX_POSIX_AWK \
+ (RE_SYNTAX_GNU_AWK | RE_NO_GNU_OPS)
+
+#define RE_SYNTAX_GREP \
+ (RE_BK_PLUS_QM | RE_CHAR_CLASSES \
+ | RE_HAT_LISTS_NOT_NEWLINE | RE_INTERVALS \
+ | RE_NEWLINE_ALT)
+
+#define RE_SYNTAX_EGREP \
+ (RE_CHAR_CLASSES | RE_CONTEXT_INDEP_ANCHORS \
+ | RE_CONTEXT_INDEP_OPS | RE_HAT_LISTS_NOT_NEWLINE \
+ | RE_NEWLINE_ALT | RE_NO_BK_PARENS \
+ | RE_NO_BK_VBAR)
+
+#define RE_SYNTAX_POSIX_EGREP \
+ (RE_SYNTAX_EGREP | RE_INTERVALS | RE_NO_BK_BRACES)
+
+/* P1003.2/D11.2, section 4.20.7.1, lines 5078ff. */
+#define RE_SYNTAX_ED RE_SYNTAX_POSIX_BASIC
+
+#define RE_SYNTAX_SED RE_SYNTAX_POSIX_BASIC
+
+/* Syntax bits common to both basic and extended POSIX regex syntax. */
+#define _RE_SYNTAX_POSIX_COMMON \
+ (RE_CHAR_CLASSES | RE_DOT_NEWLINE | RE_DOT_NOT_NULL \
+ | RE_INTERVALS | RE_NO_EMPTY_RANGES)
+
+#define RE_SYNTAX_POSIX_BASIC \
+ (_RE_SYNTAX_POSIX_COMMON | RE_BK_PLUS_QM)
+
+/* Differs from ..._POSIX_BASIC only in that RE_BK_PLUS_QM becomes
+ RE_LIMITED_OPS, i.e., \? \+ \| are not recognized. Actually, this
+ isn't minimal, since other operators, such as \`, aren't disabled. */
+#define RE_SYNTAX_POSIX_MINIMAL_BASIC \
+ (_RE_SYNTAX_POSIX_COMMON | RE_LIMITED_OPS)
+
+#define RE_SYNTAX_POSIX_EXTENDED \
+ (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
+ | RE_CONTEXT_INDEP_OPS | RE_NO_BK_BRACES \
+ | RE_NO_BK_PARENS | RE_NO_BK_VBAR \
+ | RE_UNMATCHED_RIGHT_PAREN_ORD)
+
+/* Differs from ..._POSIX_EXTENDED in that RE_CONTEXT_INVALID_OPS
+ replaces RE_CONTEXT_INDEP_OPS and RE_NO_BK_REFS is added. */
+#define RE_SYNTAX_POSIX_MINIMAL_EXTENDED \
+ (_RE_SYNTAX_POSIX_COMMON | RE_CONTEXT_INDEP_ANCHORS \
+ | RE_CONTEXT_INVALID_OPS | RE_NO_BK_BRACES \
+ | RE_NO_BK_PARENS | RE_NO_BK_REFS \
+ | RE_NO_BK_VBAR | RE_UNMATCHED_RIGHT_PAREN_ORD)
+/* [[[end syntaxes]]] */
+
+/* Maximum number of duplicates an interval can allow. Some systems
+ (erroneously) define this in other header files, but we want our
+ value, so remove any previous define. */
+#ifdef RE_DUP_MAX
+#undef RE_DUP_MAX
+#endif
+/* if sizeof(int) == 2, then ((1 << 15) - 1) overflows */
+#define RE_DUP_MAX (0x7fff)
+
+
+/* POSIX `cflags' bits (i.e., information for `regcomp'). */
+
+/* If this bit is set, then use extended regular expression syntax.
+ If not set, then use basic regular expression syntax. */
+#define REG_EXTENDED 1
+
+/* If this bit is set, then ignore case when matching.
+ If not set, then case is significant. */
+#define REG_ICASE (REG_EXTENDED << 1)
+
+/* If this bit is set, then anchors do not match at newline
+ characters in the string.
+ If not set, then anchors do match at newlines. */
+#define REG_NEWLINE (REG_ICASE << 1)
+
+/* If this bit is set, then report only success or fail in regexec.
+ If not set, then returns differ between not matching and errors. */
+#define REG_NOSUB (REG_NEWLINE << 1)
+
+
+/* POSIX `eflags' bits (i.e., information for regexec). */
+
+/* If this bit is set, then the beginning-of-line operator doesn't match
+ the beginning of the string (presumably because it's not the
+ beginning of a line).
+ If not set, then the beginning-of-line operator does match the
+ beginning of the string. */
+#define REG_NOTBOL 1
+
+/* Like REG_NOTBOL, except for the end-of-line. */
+#define REG_NOTEOL (1 << 1)
+
+
+/* If any error codes are removed, changed, or added, update the
+ `re_error_msg' table in regex.c. */
+typedef enum
+{
+ REG_NOERROR = 0, /* Success. */
+ REG_NOMATCH, /* Didn't find a match (for regexec). */
+
+ /* POSIX regcomp return error codes. (In the order listed in the
+ standard.) */
+ REG_BADPAT, /* Invalid pattern. */
+ REG_ECOLLATE, /* Not implemented. */
+ REG_ECTYPE, /* Invalid character class name. */
+ REG_EESCAPE, /* Trailing backslash. */
+ REG_ESUBREG, /* Invalid back reference. */
+ REG_EBRACK, /* Unmatched left bracket. */
+ REG_EPAREN, /* Parenthesis imbalance. */
+ REG_EBRACE, /* Unmatched \{. */
+ REG_BADBR, /* Invalid contents of \{\}. */
+ REG_ERANGE, /* Invalid range end. */
+ REG_ESPACE, /* Ran out of memory. */
+ REG_BADRPT, /* No preceding re for repetition op. */
+
+ /* Error codes we've added. */
+ REG_EEND, /* Premature end. */
+ REG_ESIZE, /* Compiled pattern bigger than 2^16 bytes. */
+ REG_ERPAREN /* Unmatched ) or \); not returned from regcomp. */
+} reg_errcode_t;
+
+/* This data structure represents a compiled pattern. Before calling
+ the pattern compiler, the fields `buffer', `allocated', `fastmap',
+ `translate', and `no_sub' can be set. After the pattern has been
+ compiled, the `re_nsub' field is available. All other fields are
+ private to the regex routines. */
struct re_pattern_buffer
- {
- char *buffer; /* Space holding the compiled pattern commands. */
- long allocated; /* Size of space that `buffer' points to. */
- long used; /* Length of portion of buffer actually occupied */
- char *fastmap; /* Pointer to fastmap, if any, or zero if none. */
- /* re_search uses the fastmap, if there is one,
- to skip over totally implausible characters. */
- char *translate; /* Translate table to apply to all characters before
- comparing, or zero for no translation.
- The translation is applied to a pattern when it is
- compiled and to data when it is matched. */
- char fastmap_accurate;
- /* Set to zero when a new pattern is stored,
- set to one when the fastmap is updated from it. */
- char can_be_null; /* Set to one by compiling fastmap
- if this pattern might match the null string.
- It does not necessarily match the null string
- in that case, but if this is zero, it cannot.
- 2 as value means can match null string
- but at end of range or before a character
- listed in the fastmap. */
- };
-
-
-/* search.c (search_buffer) needs this one value. It is defined both in
- regex.c and here. */
+{
+/* [[[begin pattern_buffer]]] */
+ /* Space that holds the compiled pattern. It is declared as
+ `unsigned char *' because its elements are
+ sometimes used as array indexes. */
+ unsigned char *buffer;
+
+ /* Number of bytes to which `buffer' points. */
+ unsigned long allocated;
+
+ /* Number of bytes actually used in `buffer'. */
+ unsigned long used;
+
+ /* Syntax setting with which the pattern was compiled. */
+ reg_syntax_t syntax;
+
+ /* Pointer to a fastmap, if any, otherwise zero. re_search uses
+ the fastmap, if there is one, to skip over impossible
+ starting points for matches. */
+ char *fastmap;
+
+ /* Either a translate table to apply to all characters before
+ comparing them, or zero for no translation. The translation
+ is applied to a pattern when it is compiled and to a string
+ when it is matched. */
+ char *translate;
+
+ /* Number of subexpressions found by the compiler. */
+ size_t re_nsub;
+
+ /* Zero if this pattern cannot match the empty string, one else.
+ Well, in truth it's used only in `re_search_2', to see
+ whether or not we should use the fastmap, so we don't set
+ this absolutely perfectly; see `re_compile_fastmap' (the
+ `duplicate' case). */
+ unsigned can_be_null : 1;
+
+ /* If REGS_UNALLOCATED, allocate space in the `regs' structure
+ for `max (RE_NREGS, re_nsub + 1)' groups.
+ If REGS_REALLOCATE, reallocate space if necessary.
+ If REGS_FIXED, use what's there. */
+#define REGS_UNALLOCATED 0
+#define REGS_REALLOCATE 1
+#define REGS_FIXED 2
+ unsigned regs_allocated : 2;
+
+ /* Set to zero when `regex_compile' compiles a pattern; set to one
+ by `re_compile_fastmap' if it updates the fastmap. */
+ unsigned fastmap_accurate : 1;
+
+ /* If set, `re_match_2' does not return information about
+ subexpressions. */
+ unsigned no_sub : 1;
+
+ /* If set, a beginning-of-line anchor doesn't match at the
+ beginning of the string. */
+ unsigned not_bol : 1;
+
+ /* Similarly for an end-of-line anchor. */
+ unsigned not_eol : 1;
+
+ /* If true, an anchor at a newline matches. */
+ unsigned newline_anchor : 1;
+
+/* [[[end pattern_buffer]]] */
+};
+
+typedef struct re_pattern_buffer regex_t;
+
+
+/* search.c (search_buffer) in Emacs needs this one opcode value. It is
+ defined both in `regex.c' and here. */
#define RE_EXACTN_VALUE 1
+
+/* Type for byte offsets within the string. POSIX mandates this. */
+typedef int regoff_t;
-/* Structure to store register contents data in.
-
- Pass the address of such a structure as an argument to re_match, etc.,
- if you want this information back.
+/* This is the structure we store register match data in. See
+ regex.texinfo for a full description of what registers match. */
+struct re_registers
+{
+ unsigned num_regs;
+ regoff_t *start;
+ regoff_t *end;
+};
- For i from 1 to RE_NREGS - 1, start[i] records the starting index in
- the string of where the ith subexpression matched, and end[i] records
- one after the ending index. start[0] and end[0] are analogous, for
- the entire pattern. */
-struct re_registers
- {
- int start[RE_NREGS];
- int end[RE_NREGS];
- };
+/* If `regs_allocated' is REGS_UNALLOCATED in the pattern buffer,
+ `re_match_2' returns information about at least this many registers
+ the first time a `regs' structure is passed. */
+#ifndef RE_NREGS
+#define RE_NREGS 30
+#endif
+/* POSIX specification for registers. Aside from the different names than
+ `re_registers', POSIX uses an array of structures, instead of a
+ structure of arrays. */
+typedef struct
+{
+ regoff_t rm_so; /* Byte offset from string's start to substring's start. */
+ regoff_t rm_eo; /* Byte offset from string's start to substring's end. */
+} regmatch_t;
+/* Declarations for routines. */
+
+/* To avoid duplicating every routine declaration -- once with a
+ prototype (if we are ANSI), and once without (if we aren't) -- we
+ use the following macro to declare argument types. This
+ unfortunately clutters up the declarations a bit, but I think it's
+ worth it. */
+
#ifdef __STDC__
-extern char *re_compile_pattern (char *, size_t, struct re_pattern_buffer *);
-/* Is this really advertised? */
-extern void re_compile_fastmap (struct re_pattern_buffer *);
-extern int re_search (struct re_pattern_buffer *, char*, int, int, int,
- struct re_registers *);
-extern int re_search_2 (struct re_pattern_buffer *, char *, int,
- char *, int, int, int,
- struct re_registers *, int);
-extern int re_match (struct re_pattern_buffer *, char *, int, int,
- struct re_registers *);
-extern int re_match_2 (struct re_pattern_buffer *, char *, int,
- char *, int, int, struct re_registers *, int);
-extern long re_set_syntax (long syntax);
-
-#ifndef GAWK
-/* 4.2 bsd compatibility. */
-extern char *re_comp (char *);
-extern int re_exec (char *);
-#endif
+#define _RE_ARGS(args) args
-#else /* !__STDC__ */
+#else /* not __STDC__ */
-extern char *re_compile_pattern ();
-/* Is this really advertised? */
-extern void re_compile_fastmap ();
-extern int re_search (), re_search_2 ();
-extern int re_match (), re_match_2 ();
-extern long re_set_syntax();
+#define _RE_ARGS(args) ()
-#ifndef GAWK
-/* 4.2 bsd compatibility. */
-extern char *re_comp ();
-extern int re_exec ();
-#endif
+#endif /* not __STDC__ */
-#endif /* __STDC__ */
+/* Sets the current default syntax to SYNTAX, and return the old syntax.
+ You can also simply assign to the `re_syntax_options' variable. */
+extern reg_syntax_t re_set_syntax _RE_ARGS ((reg_syntax_t syntax));
+/* Compile the regular expression PATTERN, with length LENGTH
+ and syntax given by the global `re_syntax_options', into the buffer
+ BUFFER. Return NULL if successful, and an error string if not. */
+extern const char *re_compile_pattern
+ _RE_ARGS ((const char *pattern, size_t length,
+ struct re_pattern_buffer *buffer));
+
+
+/* Compile a fastmap for the compiled pattern in BUFFER; used to
+ accelerate searches. Return 0 if successful and -2 if was an
+ internal error. */
+extern int re_compile_fastmap _RE_ARGS ((struct re_pattern_buffer *buffer));
+
+
+/* Search in the string STRING (with length LENGTH) for the pattern
+ compiled into BUFFER. Start searching at position START, for RANGE
+ characters. Return the starting position of the match, -1 for no
+ match, or -2 for an internal error. Also return register
+ information in REGS (if REGS and BUFFER->no_sub are nonzero). */
+extern int re_search
+ _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string,
+ int length, int start, int range, struct re_registers *regs));
+
+
+/* Like `re_search', but search in the concatenation of STRING1 and
+ STRING2. Also, stop searching at index START + STOP. */
+extern int re_search_2
+ _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string1,
+ int length1, const char *string2, int length2,
+ int start, int range, struct re_registers *regs, int stop));
-#ifdef SYNTAX_TABLE
-extern char *re_syntax_table;
-#endif
-#endif /* !__REGEXP_LIBRARY */
+/* Like `re_search', but return how many characters in STRING the regexp
+ in BUFFER matched, starting at position START. */
+extern int re_match
+ _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string,
+ int length, int start, struct re_registers *regs));
+
+
+/* Relates to `re_match' as `re_search_2' relates to `re_search'. */
+extern int re_match_2
+ _RE_ARGS ((struct re_pattern_buffer *buffer, const char *string1,
+ int length1, const char *string2, int length2,
+ int start, struct re_registers *regs, int stop));
+
+
+/* Set REGS to hold NUM_REGS registers, storing them in STARTS and
+ ENDS. Subsequent matches using BUFFER and REGS will use this memory
+ for recording register information. STARTS and ENDS must be
+ allocated with malloc, and must each be at least `NUM_REGS * sizeof
+ (regoff_t)' bytes long.
+
+ If NUM_REGS == 0, then subsequent matches should allocate their own
+ register data.
+
+ Unless this function is called, the first search or match using
+ PATTERN_BUFFER will allocate its own register data, without
+ freeing the old data. */
+extern void re_set_registers
+ _RE_ARGS ((struct re_pattern_buffer *buffer, struct re_registers *regs,
+ unsigned num_regs, regoff_t *starts, regoff_t *ends));
+
+/* 4.2 bsd compatibility. */
+extern char *re_comp _RE_ARGS ((const char *));
+extern int re_exec _RE_ARGS ((const char *));
+
+/* POSIX compatibility. */
+extern int regcomp _RE_ARGS ((regex_t *preg, const char *pattern, int cflags));
+extern int regexec
+ _RE_ARGS ((const regex_t *preg, const char *string, size_t nmatch,
+ regmatch_t pmatch[], int eflags));
+extern size_t regerror
+ _RE_ARGS ((int errcode, const regex_t *preg, char *errbuf,
+ size_t errbuf_size));
+extern void regfree _RE_ARGS ((regex_t *preg));
+
+#endif /* not __REGEXP_LIBRARY_H__ */
+
+/*
+Local variables:
+make-backup-files: t
+version-control: t
+trim-versions-without-asking: nil
+End:
+*/
diff --git a/gnu/usr.bin/awk/version.c b/gnu/usr.bin/awk/version.c
index adea5fafacfb..89b6cc055fb0 100644
--- a/gnu/usr.bin/awk/version.c
+++ b/gnu/usr.bin/awk/version.c
@@ -42,5 +42,6 @@ char *version_string = "@(#)Gnu Awk (gawk) 2.15";
/* 2.14 Mostly bug fixes. */
/* 2.15 Bug fixes plus intermixing of command-line source and files,
- GNU long options, ARGIND, ERRNO and Plan 9 style /dev/ files. */
+ GNU long options, ARGIND, ERRNO and Plan 9 style /dev/ files.
+ `delete array'. OS/2 port added. */