Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements Scanner type for tokenizing nginx configs #80

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Commits on Jul 5, 2024

  1. Implements Scanner type for tokenizing nginx configs

    Implemented `crossplane.Scanner` that follows the example of other
    "scanner" types implemented in the Go stdlib. The existing `Lex` uses
    concurrency to make tokens available to the caller while managing
    "state". I think this design queue was taken from Rob Pike's 2011 talk
    on [Lexical Scanning in Go](https://go.dev/talks/2011/lex.slide). If you
    look at examples from the Go stdlib-- such as `bufio.Scanner` that `Lex`
    depends on-- you'd find that this isn't the strategy being employed and
    instead there is a struct that manages the state of the scanner and a
    method that used by the caller to advance the scanner to obtain tokens.
    
    After a bit of Internet archeology, I found [this](https://groups.google.com/g/golang-nuts/c/q--5t2cxv78/m/Vkr9bNuhP5sJ)
    post on `golang-nuts` from Rob Pike himself:
    
    > That talk was about a lexer, but the deeper purpose was to demonstrate
    > how concurrency can make programs nice even without obvious parallelism
    > in the problem. And like many such uses of concurrency, the code is
    > pretty but not necessarily fast.
    >
    > I think it's a fine approach to a lexer if you don't care about
    > performance. It is significantly slower than some other approaches but
    > is very easy to adapt. I used it in ivy, for example, but just so you
    > know, I'm probably going to replace the one in ivy with a more
    > traditional model to avoid some issues with the lexer accessing global
    > state. You don't care about that for your application, I'm sure.
    
    > So: It's pretty and nice to work on, but you'd probably not choose that
    > approach for a production compiler.
    
    An implementation of a "scanner" using the more "traditional" model--
    much of the logic is the same or very close to `Lex`-- seems to support
    the above statement.
    
    ```
    go test -benchmem -run=^$ -bench "^BenchmarkScan|BenchmarkLex$" github.com/nginxinc/nginx-go-crossplane -count=1 -v
    goos: darwin
    goarch: arm64
    pkg: github.com/nginxinc/nginx-go-crossplane
    BenchmarkLex
    BenchmarkLex/simple
    BenchmarkLex/simple-10             70982             16581 ns/op          102857 B/op         37 allocs/op
    BenchmarkLex/with-comments
    BenchmarkLex/with-comments-10      64125             18366 ns/op          102921 B/op         43 allocs/op
    BenchmarkLex/messy
    BenchmarkLex/messy-10              28171             42697 ns/op          104208 B/op        166 allocs/op
    BenchmarkLex/quote-behavior
    BenchmarkLex/quote-behavior-10     83667             14154 ns/op          102768 B/op         24 allocs/op
    BenchmarkLex/quoted-right-brace
    BenchmarkLex/quoted-right-brace-10                 48022             24799 ns/op          103369 B/op         52 allocs/op
    BenchmarkScan
    BenchmarkScan/simple
    BenchmarkScan/simple-10                           179712              6660 ns/op            4544 B/op         34 allocs/op
    BenchmarkScan/with-comments
    BenchmarkScan/with-comments-10                    133178              7628 ns/op            4608 B/op         40 allocs/op
    BenchmarkScan/messy
    BenchmarkScan/messy-10                             49251             24106 ns/op            5896 B/op        163 allocs/op
    BenchmarkScan/quote-behavior
    BenchmarkScan/quote-behavior-10                   240026              4854 ns/op            4456 B/op         21 allocs/op
    BenchmarkScan/quoted-right-brace
    BenchmarkScan/quoted-right-brace-10                87468             13534 ns/op            5056 B/op         49 allocs/op
    PASS
    ok      github.com/nginxinc/nginx-go-crossplane 13.676s
    ```
    
    This alternative to `Lex` is probably a micro-optimization for many use
    cases. As the size and number of NGINX configurations that need to be
    analyzed grows, optimization can be a good thing as well as an API that
    feels familiar to Go developers who might use this tool for their own
    purposes.
    
    Next steps:
    
    - Use `Scanner` to "parse" NGINX configurations. I think this should be
      done in place so that the existing API works as is, but we should also
      expose a way to allow the caller to provide the scanner.
    - Deprecate `Lex` in favor of `Scanner`. If we leave `Lex` in place then
      I don't think we would need a `v2` of the crossplane package (yet).
    ornj committed Jul 5, 2024
    Configuration menu
    Copy the full SHA
    636840a View commit details
    Browse the repository at this point in the history
  2. Adds Err() method to scanner and checks error in Scan()

    Stores the first error encountered by Scan() and checks it to make sure
    scanning stops unrecoverably. The Err() method can use used to fetch the
    last non-EOF error.
    ornj committed Jul 5, 2024
    Configuration menu
    Copy the full SHA
    c07b1be View commit details
    Browse the repository at this point in the history
  3. Fixes issue parsing comments in args with quote

    Fixed bug where the quoted token did not have `IsQuoted` set to `true`.
    I added an additional lex fixture which shows both the existing lexer
    and new scanner handle the case correctly.
    ornj committed Jul 5, 2024
    Configuration menu
    Copy the full SHA
    cc657b1 View commit details
    Browse the repository at this point in the history
  4. Updates scanner to support Lua extension

    Fixed up the Scanner logic to mirror changes made to support Lua
    extension in Lex. Added a compat layer so that the existing Lua type can
    be used with `Scanner` vs trying to refactor the implementation to
    remove the channel. Doing so I think would result in further gains.
    
    Benchmarks:
    
    ```
    ❯ go test -benchmem -run=^$ -bench "^(BenchmarkLex|BenchmarkLexWithLua|BenchmarkScanner|BenchmarkScannerWithLua)$" github.com/nginxinc/nginx-go-crossplane -count=1
    goos: darwin
    goarch: arm64
    pkg: github.com/nginxinc/nginx-go-crossplane
    BenchmarkLex/simple-10             57963             17756 ns/op          103049 B/op         39 allocs/op
    BenchmarkLex/with-comments-10      60025             20067 ns/op          103112 B/op         45 allocs/op
    BenchmarkLex/messy-10              26170             47822 ns/op          104400 B/op        168 allocs/op
    BenchmarkLex/quote-behavior-10             74510             17693 ns/op          102961 B/op         26 allocs/op
    BenchmarkLex/quoted-right-brace-10         43134             27752 ns/op          103560 B/op         54 allocs/op
    BenchmarkLex/comments-between-args-10      78271             14866 ns/op          102937 B/op         27 allocs/op
    BenchmarkLexWithLua/lua-basic-10           46273             26012 ns/op          105499 B/op         53 allocs/op
    BenchmarkLexWithLua/lua-block-simple-10                    22514             54149 ns/op          108556 B/op        143 allocs/op
    BenchmarkLexWithLua/lua-block-larger-10                    25983             46605 ns/op          108403 B/op         59 allocs/op
    BenchmarkLexWithLua/lua-block-tricky-10                    33756             35067 ns/op          106684 B/op         66 allocs/op
    BenchmarkScanner/simple-10                                163138              7084 ns/op            4648 B/op         36 allocs/op
    BenchmarkScanner/with-comments-10                         144558              8100 ns/op            4712 B/op         42 allocs/op
    BenchmarkScanner/messy-10                                  47570             25026 ns/op            6000 B/op        165 allocs/op
    BenchmarkScanner/quote-behavior-10                        222280              5083 ns/op            4560 B/op         23 allocs/op
    BenchmarkScanner/quoted-right-brace-10                     82656             14281 ns/op            5160 B/op         51 allocs/op
    BenchmarkScanner/comments-between-args-10                 225475              4872 ns/op            4536 B/op         24 allocs/op
    BenchmarkScannerWithLua/lua-basic-10                       93081             12833 ns/op            7866 B/op         66 allocs/op
    BenchmarkScannerWithLua/lua-block-simple-10                31426             37989 ns/op           10924 B/op        156 allocs/op
    BenchmarkScannerWithLua/lua-block-larger-10                37148             30723 ns/op           10770 B/op         72 allocs/op
    BenchmarkScannerWithLua/lua-block-tricky-10                54890             22383 ns/op            9050 B/op         79 allocs/op
    PASS
    ok      github.com/nginxinc/nginx-go-crossplane 29.969s
    ```
    ornj committed Jul 5, 2024
    Configuration menu
    Copy the full SHA
    fe04b93 View commit details
    Browse the repository at this point in the history