Add a functionality to import external PEG files
arithy committed Apr 21, 2024
1 parent 35b71a7 commit 11a1b2b
Showing 13 changed files with 1,897 additions and 117 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
PackCC: a packrat parser generator for C.

Copyright (c) 2014, 2019-2022 Arihiro Yoshida. All rights reserved.
Copyright (c) 2014, 2019-2024 Arihiro Yoshida. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
121 changes: 88 additions & 33 deletions README.md
@@ -1,6 +1,6 @@
# PackCC #
# PackCC

## Overview ##
## Overview

**PackCC** is a parser generator for C.
Its main features are as follows:
@@ -41,14 +41,14 @@ This feature is irrelevant to common users, but helpful for PackCC developers to

PackCC itself is under MIT license, but you can distribute your generated code under any license you like.

## Installation ##
## Installation

You can obtain the executable `packcc` by compiling [`src/packcc.c`](src/packcc.c) using your favorite C compiler.
For convenience, the build environments using GCC, Clang, and Microsoft Visual Studio are prepared under [`build`](build) directory.

### Using GCC ###
### Using GCC

#### Other than MinGW ####
#### Other than MinGW

`packcc` will be built in both directories `build/gcc/debug/bin` and `build/gcc/release/bin` using `gcc` by executing the following commands:

@@ -60,7 +60,7 @@ make check # bats-core and uncrustify are required (see tests/README.md)

`packcc` in the directory `build/gcc/release/bin` is suitable for practical use.

#### MinGW ####
#### MinGW

`packcc` will be built in both directories `build/mingw-gcc/debug/bin` and `build/mingw-gcc/release/bin` using `gcc` by executing the following commands:

@@ -72,9 +72,9 @@ make check # bats-core and uncrustify are required (see tests/README.md)

`packcc` in the directory `build/mingw-gcc/release/bin` is suitable for practical use.

### Using Clang ###
### Using Clang

#### Other than MinGW ####
#### Other than MinGW

`packcc` will be built in both directories `build/clang/debug/bin` and `build/clang/release/bin` using `clang` by executing the following commands:

@@ -86,7 +86,7 @@ make check # bats-core and uncrustify are required (see tests/README.md)

`packcc` in the directory `build/clang/release/bin` is suitable for practical use.

#### MinGW ####
#### MinGW

`packcc` will be built in both directories `build/mingw-clang/debug/bin` and `build/mingw-clang/release/bin` using `clang` by executing the following commands:

@@ -98,10 +98,11 @@ make check # bats-core and uncrustify are required (see tests/README.md)

`packcc` in the directory `build/mingw-clang/release/bin` is suitable for practical use.

### Using Microsoft Visual Studio ###
### Using Microsoft Visual Studio

You have to install Microsoft Visual Studio 2019 in advance.
After that, you can build `packcc.exe` by the following instructions:

- Open the solution file `build\msvc\msvc.sln`,
- Select a preferred solution configuration (*Debug* or *Release*) and a preferred solution platform (*x64* or *x86*),
- Invoke the *Build Solution* menu item.
@@ -110,20 +111,21 @@ After that, you can build `packcc.exe` by the following instructions:
Here, `XXX` is `x64` or `x86`, and `YYY` is `Debug` or `Release`.
`packcc.exe` in the directory `build\msvc\XXX\Release` is suitable for practical use.

## Usage ##
## Usage

### Command ###
### Command

You must prepare a PEG source file (see the following section).
Let the file name `example.peg` for example.
You must prepare a PEG source file in advance.
For details of the PEG source syntax, see the section "Syntax".
Here, assume that the file is named `example.peg`.

```
packcc example.peg
```

By running this, the parser source `example.h` and `example.c` are generated.

If no PEG file name is specified, the PEG source is read from the standard input, and `-.h` and `-.c` are generated.
If no PEG file name is specified, the PEG source is read from the standard input, and `-.h` and `-.c` will be generated.

The base name of the parser source files can be changed by `-o` option.

@@ -132,6 +134,19 @@ packcc -o parser example.peg
```

By running this, the parser source `parser.h` and `parser.c` are generated.
This option can be specified only once.

A directory to search for import files can be added with the `-I` option (version 2.0.0 or later).
This option can be specified as many times as needed.
The directories are searched in the order in which they are specified on the command line.

```
packcc -I foo -I bar/baz example.peg
```

By running this, the directory `foo` is searched first, and the directory `bar/baz` is searched next.
The directories specified by this option have higher priority than those specified in the environment variable `PCC_IMPORT_PATH` and the default directories.
For more details of import, see the explanation of `%import` written in the section "Syntax".

If you want to disable UTF-8 support, specify the command line option `-a` or `--ascii` (version 1.4.0 or later).

@@ -144,7 +159,7 @@ If you want to confirm the version of the `packcc` command, execute the below.
packcc -v
```

### Syntax ###
### Syntax

A grammar consists of a set of named rules.
A rule definition can be split into multiple lines.
@@ -317,37 +332,37 @@ All matched actions are guaranteed to be executed only once.

In the action, the C source code can use the predefined variables below.

- **`$$`**
- **`$$`** :
The output variable, to which the result of the rule is stored.
The data type is the one specified by `%value`.
The default data type is `int`.
- **`auxil`**
- **`auxil`** :
The user-defined data that has been given via the API function `pcc_create()`.
The data type is the one specified by `%auxil`.
The default data type is `void *`.
- _variable_
- _variable_ :
The result of another rule that has already been evaluated.
If the rule has not been evaluated, it is ensured that the value is zero-cleared (version 1.7.1 or later).
The data type is the one specified by `%value`.
The default data type is `int`.
- **`$`**_n_
- **`$`**_n_ :
The string of the captured text.
The _n_ is the positive integer that corresponds to the order of capturing.
The variable `$1` holds the string of the first captured text.
- **`$`**_n_**`s`**
- **`$`**_n_**`s`** :
The start position in the input of the captured text, inclusive.
The _n_ is the positive integer that corresponds to the order of capturing.
The variable `$1s` holds the start position of the first captured text.
- **`$`**_n_**`e`**
- **`$`**_n_**`e`** :
The end position in the input of the captured text, exclusive.
The _n_ is the positive integer that corresponds to the order of capturing.
The variable `$1e` holds the end position of the first captured text.
- **`$0`**
- **`$0`** :
The string of the text between the start position in the input at which the rule pattern begins to match
and the current position in the input at which the element immediately before the action ends to match.
- **`$0s`**
- **`$0s`** :
The start position in the input at which the rule pattern begins to match.
- **`$0e`**
- **`$0e`** :
The current position in the input at which the element immediately before the action ends to match.

An example is shown below.
@@ -390,17 +405,20 @@ rule2 <- (e1 e2 e3) ~{ error("one of e[123] has failed"); }
The specified C source code is copied verbatim to the C header file before the generated parser API function declarations.
Any braces in the C source code must be properly nested.
Note that braces in directive lines and in comments (`/*`...`*/` and `//`...) are appropriately ignored.
When `%header` is used multiple times, the respective C source codes are copied in order of their appearance.

**`%source` `{` _c source code_ `}`**

The specified C source code is copied verbatim to the C source file before the generated parser implementation code.
Any braces in the C source code must be properly nested.
Note that braces in directive lines and in comments (`/*`...`*/` and `//`...) are appropriately ignored.
When `%source` is used multiple times, the respective C source codes are copied in order of their appearance.

**`%common` `{` _c source code_ `}`**

The specified C source code is copied verbatim to both of the C header file and the C source file
before the generated parser API function declarations and the implementation code respectively.
This has the same effect as `%header {` _c source code_ `} %source {` _c source code_ `}`.
Any braces in the C source code must be properly nested.
Note that braces in directive lines and in comments (`/*`...`*/` and `//`...) are appropriately ignored.

@@ -419,15 +437,42 @@ This can be useful for example when it is necessary to modify behavior of standa

The type of output data, which is output as `$$` in each action and can be retrieved from the parser API function `pcc_parse()`,
is changed to the specified one from the default `int`.
This can be used only once and cannot be used in imported files.

**`%auxil` `"`_user-defined data type_`"`**

The type of user-defined data, which is passed to the parser API function `pcc_create()`,
is changed to the specified one from the default `void *`.
This can be used only once and cannot be used in imported files.

**`%prefix` `"`_prefix_`"`**

The prefix of the parser API functions is changed to the specified one from the default `pcc`.
This can be used only once and cannot be used in imported files.

**`%import` `"`_import file name_`"`**

The content of the specified import file is expanded at the text location of `%import` (version 2.0.0 or later).
This can be used multiple times, at any location, and can also be used in imported files.
The _import file name_ can be a relative path to the current directory or an absolute path.
If it is a relative path, the directories listed below are searched for the import file in the listed order.

1. the directory where the file that imports the import file is located
2. the directories specified with `-I` options
- They are prioritized in order of their appearance in the command line.
3. the directories specified by the environment variable `PCC_IMPORT_PATH`
- They are prioritized in order of their appearance in the value of this variable.
- The character used as a delimiter between directory names is the colon `':'` if PackCC is built for a Unix-like platform such as Linux, macOS, and MinGW.
The character is the semicolon `';'` if PackCC is built as a native Windows executable.
(This is the same convention as the one used for the environment variable `PATH`.)
4. the per-user default directory
- This is the subdirectory `.packcc/import` in the home directory if PackCC is built for a Unix-like platform,
and in the user profile directory, "`C:\Users\`_username_" for example, if PackCC is built as a native Windows executable.
5. the system-wide default directory
- This is the directory `/usr/share/packcc/import` if PackCC is built for a Unix-like platform,
and the subdirectory `packcc/import` in the common application data directory, "`C:\ProgramData`" for example, if PackCC is built as a native Windows executable.
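
The search order above can be sketched in C as follows. This is only an illustration of the documented priority for steps 1 to 3, not PackCC's actual implementation: the names `add_dir`, `build_search_list`, `MAX_DIRS`, and `MAX_LEN` are hypothetical, the default directories of steps 4 and 5 are left out, and skipping empty entries in the path list is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DIRS 16   /* hypothetical upper bound on the number of candidates */
#define MAX_LEN 256   /* hypothetical upper bound on a directory name length */

/* Appends one directory name to the ordered candidate list.
 * Empty or oversized entries are skipped (an assumption of this sketch). */
int add_dir(char dirs[][MAX_LEN], int n, const char *dir, size_t len) {
    assert(dirs != NULL && dir != NULL);
    if (n >= MAX_DIRS || len == 0 || len >= MAX_LEN) return n;
    memcpy(dirs[n], dir, len);
    dirs[n][len] = '\0';
    return n + 1;
}

/* Fills dirs in priority order and returns the number of entries:
 *   1. the directory of the importing file,
 *   2. the -I directories in command-line order,
 *   3. the entries of env_path (the value of PCC_IMPORT_PATH), split at delim.
 * delim would be ':' on a Unix-like build and ';' on a native Windows build. */
int build_search_list(char dirs[][MAX_LEN], const char *importer_dir,
                      const char *const *opt_dirs, int opt_count,
                      const char *env_path, char delim) {
    int n = 0;
    n = add_dir(dirs, n, importer_dir, strlen(importer_dir));
    for (int i = 0; i < opt_count; i++) {
        n = add_dir(dirs, n, opt_dirs[i], strlen(opt_dirs[i]));
    }
    if (env_path != NULL) {
        const char *p = env_path;
        for (;;) {
            const char *q = strchr(p, delim);
            size_t len = (q != NULL) ? (size_t)(q - p) : strlen(p);
            n = add_dir(dirs, n, p, len);
            if (q == NULL) break;
            p = q + 1;
        }
    }
    return n;
}
```

Under these assumptions, calling `build_search_list` for the command line `packcc -I foo -I bar/baz example.peg` would place the directory of `example.peg` first, followed by `foo`, `bar/baz`, and then the entries of `PCC_IMPORT_PATH`.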

Note that a file that has already been imported is silently ignored when an attempt is made to import it again.
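
The import-once behavior can be pictured with a small bookkeeping structure. The sketch below is hypothetical and is not taken from the PackCC source: `import_set_t`, `import_once`, and `MAX_IMPORTED` are made-up names, and identifying a file by comparing path strings is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_IMPORTED 64 /* hypothetical capacity of the bookkeeping table */

typedef struct {
    const char *paths[MAX_IMPORTED]; /* borrowed pointers; the caller keeps the strings alive */
    int count;
} import_set_t;

/* Returns 1 if path is seen for the first time (the file should be expanded),
 * 0 if it was already imported (the import is silently ignored),
 * and -1 if the table is full. */
int import_once(import_set_t *set, const char *path) {
    assert(set != NULL && path != NULL);
    for (int i = 0; i < set->count; i++) {
        if (strcmp(set->paths[i], path) == 0) return 0;
    }
    if (set->count >= MAX_IMPORTED) return -1;
    set->paths[set->count++] = path;
    return 1;
}
```

With this kind of check, two files that each import a common helper file would cause the helper's rules to be expanded only once.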

**`#`_comment_**

@@ -440,7 +485,16 @@ All text following `%%` is copied verbatim to the C source file after the genera

<small>(The specification is determined by referring to [peg/leg](http://piumarta.com/software/peg/) developed by Ian Piumarta.)</small>

### Macros ###
### Import Files

The following import files are currently bundled.

- [`import/char/ascii_character_group.peg`](import/char/ascii_character_group.peg) :
This contains various rules to match an ASCII character belonging to a specific character group.
- [`import/char/unicode_general_category.peg`](import/char/unicode_general_category.peg) :
This contains various rules to match a Unicode character belonging to a specific [general category](https://unicode.org/reports/tr44/#General_Category_Values).

### Macros

Some macros are prepared to customize the parser.
The macro definition should be in <u>`%source` section</u> in the PEG source.
@@ -560,9 +614,10 @@ For other events, `buffer` and `length` indicate a part of the currently loaded
The user-defined data passed to the API function `pcc_create()` can be retrieved from this argument.
There are currently three supported events:
- `PCC_DBG_EVALUATE` (= 0) - called when the parser starts to evaluate `rule`
- `PCC_DBG_MATCH` (= 1) - called when `rule` is matched, at which point buffer holds entire matched string
- `PCC_DBG_NOMATCH` (= 2) - called when the parser determines that the input does not match currently evaluated `rule`
- `PCC_DBG_EVALUATE` (= 0) - called when the parser starts to evaluate `rule`
- `PCC_DBG_MATCH` (= 1) - called when `rule` is matched, at which point buffer holds entire matched string
- `PCC_DBG_NOMATCH` (= 2) - called when the parser determines that the input does not match currently evaluated `rule`
A very simple implementation could look like this:
@@ -590,7 +645,7 @@ The initial size (the number of elements) of the internal arrays other than the
The arrays are expanded as needed.
The default is `2`.
### API ###
### API
The parser API has only 3 simple functions below.
@@ -653,9 +708,9 @@ while (pcc_parse(ctx, &ret));
pcc_destroy(ctx);
```

## Examples ##
## Examples

### Desktop calculator ###
### Desktop calculator

A simple example which provides interactive four arithmetic operations of integers is shown here.
Note that **left-recursive** grammar rules are defined in this example.
@@ -700,7 +755,7 @@ int main() {
}
```

### AST builder for Tiny-C ###
### AST builder for Tiny-C

You can find the more practical example in the directory [`examples/ast-tinyc`](examples/ast-tinyc).
It builds an AST (abstract syntax tree) from an input source file
35 changes: 35 additions & 0 deletions import/char/ascii_character_group.peg
@@ -0,0 +1,35 @@
# This file is hereby placed in the public domain.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHORS AS IS AND ANY EXPRESS
# OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
# BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
# OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
# EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

ASCII_Printable_Character <- ASCII_Special_Character / ASCII_Number / ASCII_Letter
ASCII_Letter <- ASCII_Uppercase_Letter / ASCII_Lowercase_Letter

ASCII_Control_Character <- [\x00-\x1f\x7f]
ASCII_Special_Character <- [\x20-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]
ASCII_Number <- [0-9]
ASCII_Uppercase_Letter <- [A-Z]
ASCII_Lowercase_Letter <- [a-z]

ASCII_C_alnum <- [0-9A-Za-z]
ASCII_C_alpha <- [A-Za-z]
ASCII_C_blank <- [ \t]
ASCII_C_cntrl <- [\x00-\x1f\x7f]
ASCII_C_digit <- [0-9]
ASCII_C_graph <- [\x21-\x7e]
ASCII_C_lower <- [a-z]
ASCII_C_print <- [\x20-\x7e]
ASCII_C_punct <- [\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]
ASCII_C_space <- [ \t\n\v\f\r]
ASCII_C_upper <- [A-Z]
ASCII_C_xdigit <- [0-9A-Fa-f]
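
The `ASCII_C_*` rule names suggest that they mirror the `<ctype.h>` classification functions. As a rough cross-check, the C sketch below transcribes four of the character classes above into range tests and compares them with `ispunct`, `isspace`, `iscntrl`, and `isgraph` over the ASCII range. The helper names are hypothetical, and the comparison assumes the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>

/* Range tests transcribed from the ASCII_C_* rules above. */
int ascii_c_punct(int c) {
    return (c >= 0x21 && c <= 0x2f) || (c >= 0x3a && c <= 0x40) ||
           (c >= 0x5b && c <= 0x60) || (c >= 0x7b && c <= 0x7e);
}

int ascii_c_space(int c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

int ascii_c_cntrl(int c) {
    return (c >= 0x00 && c <= 0x1f) || c == 0x7f;
}

int ascii_c_graph(int c) {
    return c >= 0x21 && c <= 0x7e;
}

/* Returns 1 if all four range tests agree with <ctype.h> for the code point c. */
int agrees_with_ctype(int c) {
    assert(c >= 0 && c <= 0x7f); /* only ASCII code points are compared */
    return (ispunct(c) != 0) == ascii_c_punct(c)
        && (isspace(c) != 0) == ascii_c_space(c)
        && (iscntrl(c) != 0) == ascii_c_cntrl(c)
        && (isgraph(c) != 0) == ascii_c_graph(c);
}

/* Counts the ASCII code points for which some class disagrees with <ctype.h>. */
int count_mismatches(void) {
    int mismatches = 0;
    for (int c = 0; c <= 0x7f; c++) {
        if (!agrees_with_ctype(c)) mismatches++;
    }
    return mismatches;
}
```

A program that calls `setlocale` with a different locale may see other results from the `<ctype.h>` functions, whereas the PEG rules always match the fixed ranges.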