# Parsing Techniques: A Practical Guide (Monographs in Computer Science)

## Dick Grune and Ceriel J.H. Jacobs

Language: English

Pages: 662

ISBN: 038720248X

Format: PDF / Kindle (mobi) / ePub

This second edition of Grune and Jacobs’ brilliant work presents new developments and discoveries that have been made in the field. Parsing, also referred to as syntax analysis, has been and continues to be an essential part of computer science and linguistics. Parsing techniques have grown considerably in importance, both in computer science, where advanced compilers often use general CF parsers, and in computational linguistics, where such parsers are the only option. They are used in a variety of software products, including Web browsers, interpreters in computer devices, and data compression programs, and they are used extensively in linguistics.

--> bbcc

Still, we cannot apply this rule, since normally the Qs are to the right of the c. This can be remedied by allowing a Q to hop left over a c:

3. cQ --> Qc

We can now finish our derivation:

```
aaabcQQ      (3 times rule 1)
aaabQcQ      (rule 3)
aaabbccQ     (rule 2)
aaabbcQc     (rule 3)
aaabbQcc     (rule 3)
aaabbbccc    (rule 2)
```

It should be noted that the above derivation only shows that the grammar will produce the right strings, and the reader will still have to convince himself that it will not generate
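The derivation can be replayed mechanically. The sketch below assumes rule 1 is S -> abc | aSQ, as in the standard grammar for a^n b^n c^n; rules 2 and 3 are the ones quoted in the text.

```python
def apply(rule, s):
    """Apply a rewrite rule (lhs, rhs) at the leftmost match."""
    lhs, rhs = rule
    return s.replace(lhs, rhs, 1)

R1a = ("S", "aSQ")     # rule 1, recursive alternative (assumed from context)
R1b = ("S", "abc")     # rule 1, terminating alternative (assumed from context)
R2  = ("bQc", "bbcc")  # rule 2, as quoted in the text
R3  = ("cQ", "Qc")     # rule 3, letting a Q hop left over a c

s = "S"
for step in (R1a, R1a, R1b,        # 3 times rule 1 -> aaabcQQ
             R3, R2, R3, R3, R2):  # the remaining steps of the derivation
    s = apply(step, s)
print(s)  # aaabbbccc
```

Running the rules in the order given by the derivation indeed ends in aaabbbccc; as the text warns, this shows only that the grammar produces the right strings, not that it produces no wrong ones.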

the operands of the * and ? operators. Regular expressions exist for all Type 3 grammars. Note that the * and the + work on what precedes them. To distinguish them from the normal multiplication and addition operators, they are often printed above the line of the text in print; in computer input they are in line with the rest of the text, and other means must be used to distinguish them.

### 2.3.4 Type 4 Grammars

The last restriction we shall apply to what is allowed in a production rule is a pretty final
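The point that * and + bind to what immediately precedes them can be illustrated with Python's `re` module (a choice of notation for this sketch, not one the text makes):

```python
import re

# '*' and '+' apply to the item immediately before them: 'ab*' means
# one 'a' followed by zero or more 'b's, not zero or more copies of 'ab'.
# Parentheses are needed to make a larger operand.
assert re.fullmatch(r"ab*", "a")           # zero b's is allowed
assert re.fullmatch(r"ab*", "abbb")        # several b's
assert re.fullmatch(r"ab+", "a") is None   # '+' demands at least one b
assert re.fullmatch(r"(ab)+", "ababab")    # '+' applied to the group 'ab'
print("ok")
```

In computer input the operators sit on the line, so grouping parentheses take over the role that typography plays in print.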

There is an additional reason for shunning ε-rules: they make both proofs and parsers more complicated, sometimes much more complicated; see, for example, Section 9.5.4. So the question arises why we should bother with ε-rules at all; the answer is that they are very convenient for the grammar writer and user. If we have a language that is described by a CF grammar with ε-rules and we want to describe it by a grammar without ε-rules,
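The standard construction for removing ε-rules can be sketched briefly: first find the nullable nonterminals, then add a variant of each rule for every subset of nullable symbols left out. The grammar below is illustrative, not the book's example.

```python
from itertools import combinations

# Grammar as {nonterminal: [right-hand sides]}; "" is epsilon.
# Illustrative grammar: S -> a M b,  M -> M M | epsilon
grammar = {"S": ["aMb"], "M": ["MM", ""]}

# 1. Compute the nullable nonterminals (those that can derive epsilon).
nullable = set()
changed = True
while changed:
    changed = False
    for nt, rhss in grammar.items():
        if nt not in nullable and any(all(c in nullable for c in rhs)
                                      for rhs in rhss):
            nullable.add(nt)
            changed = True

# 2. For every rule, add variants with any subset of nullable symbols dropped,
#    and discard the epsilon rules themselves.
new_grammar = {}
for nt, rhss in grammar.items():
    out = set()
    for rhs in rhss:
        spots = [i for i, c in enumerate(rhs) if c in nullable]
        for k in range(len(spots) + 1):
            for drop in combinations(spots, k):
                variant = "".join(c for i, c in enumerate(rhs) if i not in drop)
                if variant:
                    out.add(variant)
    new_grammar[nt] = sorted(out)

print(new_grammar)  # {'S': ['aMb', 'ab'], 'M': ['M', 'MM']}
```

The result still contains the unit rule M -> M; a full clean-up would remove unit rules as well, which hints at why ε-free grammars are less convenient to write by hand.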

Method classifications:

- depth-first or breadth-first
- deterministic directional methods: breadth-first search with breadth restricted to 1, with bounded or unbounded look-ahead
- deterministic with postponed ("non-canonical") node identification
- generalized deterministic: maximally restricted breadth-first search

Top-down:

- Unger parser
- the predict/match automaton
- recursive descent
- DCG (Definite Clause Grammars)
- cancellation parsing
- LL(k)

Bottom-up:

- CYK parser
- the shift/reduce automaton
- breadth-first, top-down restricted (Earley)
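Of the methods in this overview, recursive descent is compact enough to sketch in a few lines. A minimal recognizer, assuming the illustrative grammar S -> 'a' S 'b' | ε (not an example taken from the book):

```python
# Minimal recursive-descent recognizer for S -> 'a' S 'b' | epsilon.
def parse_S(s, pos=0):
    """Try S -> 'a' S 'b' first, fall back to epsilon; return the new position."""
    if pos < len(s) and s[pos] == "a":
        end = parse_S(s, pos + 1)
        if end < len(s) and s[end] == "b":
            return end + 1
    return pos  # the epsilon alternative always succeeds, consuming nothing

def recognizes(s):
    return parse_S(s) == len(s)

print(recognizes("aabb"), recognizes("aab"))  # True False
```

Each nonterminal becomes one procedure, and the ε alternative serves as the fall-through, so for this particular grammar no explicit backtracking machinery is needed.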

depth of nesting is severely limited. In English, a sentence containing a subclause containing a subclause containing a subclause will baffle the reader, and even in German and Dutch nestings over, say, five deep are frowned upon. We replicate the grammar the desired number of times and remove the possibility of further recursion from the deepest level. Then the deepest level is regular, which makes the other levels regular in turn. The resulting grammar will be huge but regular and will be able to
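The replication trick can be made concrete on a small self-embedding grammar, say N -> '(' N ')' | 'x' (an illustrative stand-in for the subclause grammar, not the book's example). Unrolling the recursion a fixed number of times leaves a plain regular expression:

```python
import re

def unroll(depth):
    """Regex for N -> '(' N ')' | 'x' with nesting limited to `depth` levels."""
    r = "x"                       # deepest copy: the recursion is removed
    for _ in range(depth):
        r = f"(?:\\({r}\\)|x)"    # one replicated copy of the grammar
    return r

limited = re.compile(unroll(3))
print(bool(limited.fullmatch("((x))")))      # True: nesting 2 <= 3
print(bool(limited.fullmatch("((((x))))")))  # False: nesting 4 > 3
```

Each replication is one level of the "huge but regular" grammar: the pattern grows linearly with the permitted depth, but it never needs to recurse.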