aboutsummaryrefslogtreecommitdiffstats
path: root/README.BTYACC
blob: 705481f82685ad6999b35e57f67edba6031d9950 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
-- $Id: README.BTYACC,v 1.1 2014/03/25 19:21:31 Tom.Shields Exp $

The original README from btyacc is below.

The backtracking enhancements to byacc have been merged into Thomas Dickey's
byacc baseline.

The %include and %define/%ifdef enhancements described below are not currently
incorporated.

-------------------------------------------------------------------------------
	     BTYACC -- backtracking yacc
	     ===========================

BTYACC was created by Chris Dodd using ideas from many
places and lots of code from the Berkeley Yacc
distribution, which is a public domain yacc clone put
together by the good folks at Berkeley.  This code is
distributed with NO WARRANTY and is public domain.
It is certain to contain bugs, which you should
report to: chrisd@collins.com.

Vadim Maslov of Siber Systems <vadik@siber.com>
considerably modified BTYACC to make it suitable
for production environment.

Several people have suggested bug fixes that
were incorporated into BtYacc.

See the README.BYACC files for more about
Berkeley Yacc and other sources of info.

http://www.siber.com/btyacc/ is the current home of BtYacc.
It is provided courtesy of Siber Systems http://www.siber.com/.


		Version 3.0 changes
		-------------------
		  by Vadim Maslov

Changes mostly occurred in btyaccpa.ske file that
contains the parsing shift/reduce/backtrack algorithm.

Version 3.0 innovations focus on:
- text position computation and propagation,
- industrial-strength error processing and recovery.


** Added mechanism for computing and propagating
text position of tokens and non-terminals.

Compilers often need to build AST trees such that every node
in a tree can relate to the parsed program source it came from.
The following applications are very likely to need this:
- debuggers that show actual source of the debugged program,
- source-to-source translators that want
  unchanged parts of the tree to generate the unchanged code.

The new YYPOSN mechanism added in this version of BtYacc
helps you in automating the text position computation
and in assigning the computed text positions to the AST.
This mechanism is successfully used in commercial
parsers and source-to-source translators.

In standard Yaccs every token and every non-terminal
has an YYSTYPE semantic value attached to it.
In this new version every token and every non-terminal
also has an YYPOSN text position attached to it.
YYPOSN is a user-defined type that can be anything and
that has a meaning of text position attached to
token or non-terminal.

In addition to semantic value stack BtYacc now maintains
text position stack. Behavior of the text position stack
is similar to the behavior of the semantic value stack.

If using text position mechanism,
you need to define the following:

YYPOSN	Preprocessor variable that contains C/C++ type of
	the text position attached to
	every token and non-terminal.

yyposn  Global variable of type YYPOSN.
        The lexer must assign text position of
	the returned token to yyposn, just like it assigns
	semantic value of the returned token to yylval.

YYREDUCEPOSNFUNC
	Preprocessor variable that points to function that
	is called after the grammar rule reduction
	to reduce text positions located on the stack.

        This function is called by BtYacc to reduce text
	positions. The function is called immediately after
	the regular rule reduction occurs.

	The function has the following prototype:
	void ReducePosn(YYPOSN  &ret,
			YYPOSN  *terms,
			YYSTYPE *term_vals,
			int      term_no,
			int      stk_pos,
			int      yychar,
			YYPOSN  &yyposn,
			UserType extra);

        The function arguments are:
        - ret
	  Reference to the text position returned by
          the rule. The function must write the computed
          text position returned by the rule to ret.
          This is analogue of the $$ semantic value.

        - term_posns
          Array of the right-hand side rule components
	  YYPOSN text positions.  These are analogues of
	  $1, $2, ..., $N in the text position world.

        - term_vals
	  Array of the right-hand side (RHS) rule components
	  YYSTYPE values. These are the $1,...,$N themselves.

        - term_no
          Number of the components in RHS of the reduced rule.
          Equal to size of arrays term_posns and term_vals.
          Also equal to N in $1,...,$N in the reduced rule.

        - stk_pos
          YYSTYPE/YYPOSN stack position before the reduction.

        - yychar
          Lookahead token that immediately follows
 	  the reduced RHS components.

        - yyposn
          YYPOSN of the token that immediately follows
	  the reduced RHS components.

        - extra
          User-defined extra argument passed to ReducePosn.

        Typically this function extracts text positions from
	the right-hand side rule components and either
	assigns them to the returned $$ structure/tree or
        if no $$ value is returned, puts them into
	the ret text position from where
        it will be picked up by the later reduced rules.

YYREDUCEPOSNFUNCARG
	Extra user-defined argument passed to
	the ReducePosn function. This argument can use
	any variables defined in btyaccpa.ske.


** Added code to btyaccpa.ske that automatically cleans up
semantic semantic values and text positions of tokens
and non-terminals that are discarded and deleted as
a result of error processing.

In the previous versions the discarded token and non-terminal
semantic values were not cleaned that caused quite severe
leaks.  The only way to fix it was to add garbage collection
to YYSTYPE class.

Now BtYacc skeleton calls delete functions for semantic
values and positions of the discarded tokens and
non-terminals.

You need to define the following functions that BtYacc
calls when it needs to delete semantic value or text position.

YYDELETEVAL
	User-defined function that is called by BtYacc
	to delete semantic value of the token or non-terminal.

	The user-defined function must have the prototype:
	void DeleteYYval(YYSTYPE v, int type);
	v    is semantic value to delete,
	type is one of the following:
	0 	discarding token
	1       discarding state
	2       cleaning up stack when aborting

YYDELETEPOSN
	User-defined function that is called by BtYacc
	to delete text position of the token or non-terminal.

	The user-defined function must have the prototype:
	void DeleteYYposn(YYPOSN p, int type);
	v    is semantic value to delete,
	type is one of the following:
	0 	discarding token
	1       discarding state
	2       cleaning up stack when aborting


** User can define "detailed" syntax error processing
function that reports an *exact* position of
the token that caused the error.

If you define preprocessor variable YYERROR_DETAILED in
your grammar then you need define the following
error processing function:

void yyerror_detailed(char    *text,
		      int      errt,
		      YYSTYPE &errt_value,
		      YYPOSN  &errt_posn);

It receives the following arguments:
text		Error message.
errt		Code of the token that caused the error.
errt_value	Value of the token that caused the error.
errt_posn	Text position of token that caused error.


** Dropped compatibility with C.

Compatibility with C became increasingly difficult
to maintain as new features were added to btyaccpa.ske.
So we dropped it. If anybody wants to make the new version
compatible with C, we would gladly accept the changes.

Meanwhile we expect that you use C++ to write grammar
actions and everything else in grammar files.
Since C is (in a sense) subset of C++, your C-based
grammar may work if you use C++ compiler to compile it.

		Version 3.0 bugs fixed
		----------------------

Matthias Meixner <meixner@mes.th-darmstadt.de> fixed a bug:
BtYacc does not correctly handle typenames, if one typename
is a prefix of another one and if this type is used after
the longer one. In this case BTYacc produces invalid code.


		Version 2.1 changes
		-------------------
		  by Vadim Maslov

** Added preprocessor statements to BtYacc that are similar
in function and behavior to C/C++ preprocessor statements.

These statements are used to:

- Introduce modularity into a grammar by breaking it
  into several *.y files and assembling different
  grammars from the *.y modules using %include and %ifdef.

- Have several versions of the same grammar
  by using %ifdef and $endif.

- To include automatically generated grammar fragment.
  For instance, we use %include to include
  automatically generated list of tokens.

Preprocessor statements are:

%define <var-name>
	Define preprocessor variable named <var-name>.

%ifdef <var-name>
	If preprocessor variable named <var-name>
	is defined by %define, then process the text from
	this %ifdef to the closing %endif.

%endif
	Closing bracket for %ifdef preprocessor statement.
	Only one nesting level of %ifdef-%endif is allowed.

%include <file-name>
	Process contents of the file named <file-name>.
	If <file-name> is a relative name, it is looked up
        in a directory in which btyacc was started.
	Only one nesting level of %include is allowed.


		Version 2.0 changes
		-------------------
		  by Vadim Maslov


** Changed 16-bit short numbers to 32-bit int numbers in
grammar tables, so that huge grammar tables (tables that
are larger than 32768 elements) resulting from huge
grammars (Cobol grammar, for instance) can work correctly.
You need to have 32-bit integer to index table bigger than
32768 elements, 16-bit integer is not enough.

The original BtYacc just generated non-working tables
larger than 32768 elements without even notifying about
the table overflow.


** Make error recovery work correctly when error happens
while processing nested conflicts. Original BtYacc could
infinitely cycle in certain situations that involved error
recovery while in nested conflict.

More detailed explanation: when we have nested conflicts
(conflict that happens while trial-processing another
conflict), it leads btyacc into NP-complete searching of
conflict tree. The ultimate goal is YYVALID operator that
selects a particular branch of that tree as a valid one.

If no YYVALID is found on the tree, then error recovery
takes over.  The problem with this is that error recovery
is started in the same state context that exists on the
last surveyed branch of the conflict tree.  Sometimes this
last branch may be of zero length and it results in
recovering to exactly the same state as existed before
entering the conflict. BtYacc cycles then.

We solved this problem by memorizing the longest path in
the conflict tree while browsing it. If we ever get into
error recovery, we restore state that existed on the
longest path.  Effectively we say: if we have an error,
let us move forward as far as we possibly could while we
were browsing the conflict tree.


** Introduce YYVALID_NESTED operation in addition to
simply YYVALID.  When we have a nested conflict (conflict
while processing in trial mode for another conflict), we
want to relate YYVALID to a particular level of conflict
being in trial.

Since we mostly anticipate only 2-level nested conflicts
YYVALID_NESTED tells the parser to satisfy only the
internal conflict.  Therefore, in 1-level conflict
situation YYVALID_NESTED acts like a regular YYVALID, but
in 2-level conflict it is a no-op and the other YYVALID
for outer conflict will be searched for.


** Improved handling of situation where /tmp directory is
missing.  Original btyacc just died quietly when /tmp
directory was missing.  We added code that states the
problem explicitly. While on UNIX /tmp directory is always
present, it may be missing on WIN32 systems, therefore
diagnosing this situation is important.


	Version 1.0 changes: BackTracking
	=================================
		by Chris Dodd

BTYACC is a modified version of yacc that supports
automatic backtracking and semantic disambiguation to
parse ambiguous grammars, as well as syntactic sugar for
inherited attributes (which tend to introduce conflicts).
Whenever a btyacc generated parser runs into a
shift-reduce or reduce-reduce error in the parse table, it
remembers the current parse point (yacc stack and input
stream state), and goes into trial parse mode.  It then
continues parsing, ignoring most rule actions.  If it runs
into an error (either through the parse table or through
an action calling YYERROR), it backtracks to the most
recent conflict point and tries a different alternative.
If it finds a successful parse (reaches the end of the
input or an action calls YYVALID), it backtracks to the
point where it first entered trial parse mode, and
continues with a full parse (executing all actions),
following the path of the successful trial.

Actions in btyacc come in two flavors -- {}-actions, which
are only executed when not in trial mode, and []-actions
which are executed regardless of mode.  There are also
inherited attributes, which look like arguments (they are
enclosed in "()") and act like []-actions.

What this buys you:

* No more lexer feedback hack.  In yacc grammars for C, a
standard hack, know as the "lexer feedback hack" is used
to find typedef names.  The lexer uses semantic
information to decide if any given identifier is a
typedef-name or not and returns a special token.  With
btyacc, you no longer need to do this; the lexer should
just always return an identifier.  The btyacc grammar then
needs a rule of the form:

typename: ID [ if (!IsTypeName(LookupId($1))) YYERROR; ]

While the hack works adequately well for parsing C, it
becomes a nightmare when you try to parse something like
C++, where treating an ID as a typedef becomes heavily
dependent on context.

* Easy disambiguation via simple ordering.  Btyacc runs
its trials via the rule "try shifting first, then try
reducing by the order that the conflicting rules appear in
the input file".  This means you can deal with semantic a
disambiguation rule like:
    [1] If it looks like a declaration it is, otherwise
    [2] If it looks like an expression it is, otherwise
    [3] it is a syntax error
	[Ellis&Stroustrup, Annotated C++ Reference Manual, p93]

To deal with this, you need only put all the rules for
declarations before the rules for expressions in the
grammar file.

* No extra cost if you do not use it.  Backtracking is
only triggered when the parse hits a shift/reduce or
reduce/reduce conflict in the table.  If you have no
conflicts in your grammar, there is no extra cost, other
than some extra code which will never be invoked.

* C++ and ANSI C compatible parsers.  The parsers produced
by btyacc can be compiled with C++ correctly.  If you
"#define" YYSTYPE to be some C++ type with constructor and
destructor, everything will work fine.  My favorite is
"#define YYSTYPE SmartPointer", where SmartPointer is a
smart pointer type that does garbage collection on the
pointed to objects.

BTYACC was originally written to make it easy to write a
C++ parser (my goal was to be able to use the grammar out
of the back of the ARM with as few modifications as
possible).  Anyone who has ever looked at Jim Roskind
public domain C++ yacc grammar, or the yacc-based grammar
used in g++ knows how difficult this is.  BTYACC is very
useful for parsing any ambiguous grammar, particularly
ones that come from trying to merge two (or more) complete
grammars.

Limitations of the backtracking: Currently, the generated
parser does NO pruning of alternate parsing paths.  To
avoid an exponential explosion of possible paths (and
parsing time), you need to manually tell the parser when
it can throw away saved paths using YYVALID.  In practice,
this turns out to be fairly easy to do.  A C++ parser (for
example) can just put a [YYVALID;] after every complete
declaration and statement rule, corresponding to pruning
the backtracking state after seeing a ';' or '}' -- there
will never be a situation in which it is useful to
backtrack past either of these.

Inherited attributes in btyacc:

Inherited attributes look a lot like function arguments to
non-terminals, which is what they end up being in a
recursive descent parser, but NOT how they are implemented
in btyacc.  Basically they are just syntactic sugar for
embedded semantic actions and $0, $-1, ... in normal yacc.
btyacc gives you two big advantages besides just the
syntax:
    1. it does type checking on the inherited attributes,
       so you do not have to specify $<type>0 and makes sure
       you give the correct number of arguments (inherited
       attributes) to every use of a non-terminal.
    2. It "collapses" identical actions from that are produced
       from inherited attributes.  This eliminates many
       potential reduce-reduce conflicts arising from
       the inherited attributes.

You use inherited attributes by declaring the types of the
attributes in the preamble with a type declaration and
declaring names of the attributes on the lhs of the yacc
rule.  You can of course have more than one rule with the
same lhs, and you can even give them different names in
each, but the type and number must be the same.

Here is a small example:
           /* lhs takes 2 inherited attributes */
%type <t1> lhs(<t1>, <t2>)
	   stuff(<t1>, <t2>)
%%
lhs($i1, $i2) : { $$ = $i1 }
	      | lhs($i1, $i2) stuff($1,$i2) { $$ = $2; }

This is roughly equivalent to the following yacc code:
lhs :
      { $$ = $<t1>-1; }
    | lhs [ $<t1>$ = $-1; ] [ $<t2>$ = $<t2>0; ] stuff
      { $$ = $4; }
    ;

See the file "test/t2.y" for a longer and more complete
example.  At the current time, the start symbol cannot
have any arguments.

Variant parsers:

Btyacc supports the -S flag to use a different parser
skeleton, changing the way that the parser is called and
used.  The skeleton "push.skel" is included to produce a
"passive" parser that you feed tokens to (rather than
having the parser call a separate yylex routine).  With
push.skel, yyparse is defined as follows:

int yyparse(int token, YYSTYPE yylval)

You should call yyparse repeatedly with successive tokens
of input.  It returns 0 if more input is needed, 1 for a
successful parse, and -1 for an unrecoverable parse error.


	Miscellaneous Features in ver. 1.0
	----------------------------------
		by Chris Dodd

     The -r option has been implemented.  The -r option tells
Yacc to put the read-only tables in y.tab.c and the code and
variables in y.code.c.  Keith Bostic asked for this option so
that :yyfix could be eliminated.

     The -l and -t options have been implemented.  The -l
option tells Yacc not to include #line directives in the code
it produces.  The -t option causes debugging code to be
included in the compiled parser.

     The code for error recovery has been changed to
implement the same algorithm as AT&T Yacc.  There will still
be differences in the way error recovery works because AT&T
Yacc uses more default reductions than Berkeley Yacc.

     The environment variable TMPDIR determines the directory
where temporary files will be created.  If TMPDIR is defined,
temporary files will be created in the directory whose
pathname is the value of TMPDIR.  By default, temporary files
are created in /tmp.

     The keywords are now case-insensitive.  For example,
%nonassoc, %NONASSOC, %NonAssoc, and %nOnAsSoC are
all equivalent.

     Commas and semicolons that are not part of C code are
treated as commentary.

     Line-end comments, as in BCPL, are permitted.  Line-end
comments begin with // and end at the next end-of-line.
Line-end comments are permitted in C code; they are converted
to C comments on output.

     The form of y.output files has been changed to look more
like those produced by AT&T Yacc.

     A new kind of declaration has been added.
The form of the declaration is

	  %ident string

where string is a sequence of characters beginning with a
double quote and ending with either a double quote or the
next end-of-line, whichever comes first.  The declaration
will cause a #ident directive to be written near the start
of the output file.

     If a parser has been compiled with debugging code, that
code can be enabled by setting an environment variable.
If the environment variable YYDEBUG is set to 0, debugging
output is suppressed.  If it is set to 1, debugging output
is written to standard output.


		Building BtYacc
		---------------
	by Chris Dodd and Vadim Maslov

We used GCC and GNU make to compile BtYacc both on UNIX and
WIN32 paltforms.  You are welcome to try different
combinations of makes and compilers.  Most likely it will
work, but it may require Makefile changes.

There is no config script.
Just type "make" and it should compile.

AWK. If you want to change file btyaccpa.ske (backtracking
parser skeleton), you will need awk to compile it into
skeleton.c file. We used GNU AWK (gawk) version 3.0.

It is known that using older versions of gawk
may create problems in compilation, because older awks
have problems with backslashes at the end of a line.

For MSDOS, there a "makefile.dos" that should do the trick.
Note: makefile.dos was not tested for a long time.

The result of compilation should be a single executable called
"btyacc" which you can install anywhere you like;
it does not require any other files in the distribution to run.


	       Legal Stuff
	       -----------
	by Chris Dodd and Vadim Maslov

In English: BtYacc is freeware. BtYacc is distributed with
no warranty whatsoever. The author and any other contributors
take no responsibility for any and all consequences of its use.

In Legalese: LIMITATION OF LIABILITY. NEITHER SIBER SYSTEMS
NOR ANY OF ITS LICENSORS NOR ANY BTYACC CONTRIBUTOR SHALL BE
LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL OR CONSEQUENTIAL
DAMAGES, OR DAMAGES FOR LOSS OF PROFITS, REVENUE, DATA OR
DATA USE, CAUSED BY BTYACC AND INCURRED BY CUSTOMER OR ANY
THIRD PARTY, WHETHER IN AN ACTION IN CONTRACT OR TORT, EVEN
IF SIBER SYSTEMS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.