- 
                Notifications
    You must be signed in to change notification settings 
- Fork 20
Implements Scanner type for tokenizing nginx configs #80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
          
     Draft
      
      
            ornj
  wants to merge
  4
  commits into
  nginxinc:main
  
    
      
        
          
  
    
      Choose a base branch
      
     
    
      
        
      
      
        
          
          
        
        
          
            
              
              
              
  
           
        
        
          
            
              
              
           
        
       
     
  
        
          
            
          
            
          
        
       
    
      
from
ornj:scanner
  
      
      
   
  
    
  
  
  
 
  
      
    base: main
Could not load branches
            
              
  
    Branch not found: {{ refName }}
  
            
                
      Loading
              
            Could not load tags
            
            
              Nothing to show
            
              
  
            
                
      Loading
              
            Are you sure you want to change the base?
            Some commits from the old base branch may be removed from the timeline,
            and old review comments may become outdated.
          
          Conversation
  
    
      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
      Learn more about bidirectional Unicode characters
    
  
  
    
    7f9daf9    to
    a103f17      
    Compare
  
    
              
                    nginx-nickc
  
              
              approved these changes
              
                  
                    Jan 30, 2024 
                  
              
              
            
            
Implemented `crossplane.Scanner` that follows the example of other "scanner" types implemented in the Go stdlib. The existing `Lex` uses concurrency to make tokens available to the caller while managing "state". I think this design queue was taken from Rob Pike's 2011 talk on [Lexical Scanning in Go](https://go.dev/talks/2011/lex.slide). If you look at examples from the Go stdlib-- such as `bufio.Scanner` that `Lex` depends on-- you'd find that this isn't the strategy being employed and instead there is a struct that manages the state of the scanner and a method that used by the caller to advance the scanner to obtain tokens. After a bit of Internet archeology, I found [this](https://groups.google.com/g/golang-nuts/c/q--5t2cxv78/m/Vkr9bNuhP5sJ) post on `golang-nuts` from Rob Pike himself: > That talk was about a lexer, but the deeper purpose was to demonstrate > how concurrency can make programs nice even without obvious parallelism > in the problem. And like many such uses of concurrency, the code is > pretty but not necessarily fast. > > I think it's a fine approach to a lexer if you don't care about > performance. It is significantly slower than some other approaches but > is very easy to adapt. I used it in ivy, for example, but just so you > know, I'm probably going to replace the one in ivy with a more > traditional model to avoid some issues with the lexer accessing global > state. You don't care about that for your application, I'm sure. > So: It's pretty and nice to work on, but you'd probably not choose that > approach for a production compiler. An implementation of a "scanner" using the more "traditional" model-- much of the logic is the same or very close to `Lex`-- seems to support the above statement. ``` go test -benchmem -run=^$ -bench "^BenchmarkScan|BenchmarkLex$" github.com/nginxinc/nginx-go-crossplane -count=1 -v goos: darwin goarch: arm64 pkg: github.com/nginxinc/nginx-go-crossplane BenchmarkLex BenchmarkLex/simple BenchmarkLex/simple-10 70982 16581 ns/op 102857 B/op 37 allocs/op BenchmarkLex/with-comments BenchmarkLex/with-comments-10 64125 18366 ns/op 102921 B/op 43 allocs/op BenchmarkLex/messy BenchmarkLex/messy-10 28171 42697 ns/op 104208 B/op 166 allocs/op BenchmarkLex/quote-behavior BenchmarkLex/quote-behavior-10 83667 14154 ns/op 102768 B/op 24 allocs/op BenchmarkLex/quoted-right-brace BenchmarkLex/quoted-right-brace-10 48022 24799 ns/op 103369 B/op 52 allocs/op BenchmarkScan BenchmarkScan/simple BenchmarkScan/simple-10 179712 6660 ns/op 4544 B/op 34 allocs/op BenchmarkScan/with-comments BenchmarkScan/with-comments-10 133178 7628 ns/op 4608 B/op 40 allocs/op BenchmarkScan/messy BenchmarkScan/messy-10 49251 24106 ns/op 5896 B/op 163 allocs/op BenchmarkScan/quote-behavior BenchmarkScan/quote-behavior-10 240026 4854 ns/op 4456 B/op 21 allocs/op BenchmarkScan/quoted-right-brace BenchmarkScan/quoted-right-brace-10 87468 13534 ns/op 5056 B/op 49 allocs/op PASS ok github.com/nginxinc/nginx-go-crossplane 13.676s ``` This alternative to `Lex` is probably a micro-optimization for many use cases. As the size and number of NGINX configurations that need to be analyzed grows, optimization can be a good thing as well as an API that feels familiar to Go developers who might use this tool for their own purposes. Next steps: - Use `Scanner` to "parse" NGINX configurations. I think this should be done in place so that the existing API works as is, but we should also expose a way to allow the caller to provide the scanner. - Deprecate `Lex` in favor of `Scanner`. If we leave `Lex` in place then I don't think we would need a `v2` of the crossplane package (yet).
Stores the first error encountered by Scan() and checks it to make sure scanning stops unrecoverably. The Err() method can use used to fetch the last non-EOF error.
Fixed bug where the quoted token did not have `IsQuoted` set to `true`. I added an additional lex fixture which shows both the existing lexer and new scanner handle the case correctly.
| Now supports @xynicole's changes to enable tokenizing Lua. Benchmarks: ❯ go test -benchmem -run=^$ -bench "^(BenchmarkLex|BenchmarkLexWithLua|BenchmarkScanner|BenchmarkScannerWithLua)$" github.com/nginxinc/nginx-go-crossplane -count=1
goos: darwin
goarch: arm64
pkg: github.com/nginxinc/nginx-go-crossplane
BenchmarkLex/simple-10             57963             17756 ns/op          103049 B/op         39 allocs/op
BenchmarkLex/with-comments-10      60025             20067 ns/op          103112 B/op         45 allocs/op
BenchmarkLex/messy-10              26170             47822 ns/op          104400 B/op        168 allocs/op
BenchmarkLex/quote-behavior-10             74510             17693 ns/op          102961 B/op         26 allocs/op
BenchmarkLex/quoted-right-brace-10         43134             27752 ns/op          103560 B/op         54 allocs/op
BenchmarkLex/comments-between-args-10      78271             14866 ns/op          102937 B/op         27 allocs/op
BenchmarkLexWithLua/lua-basic-10           46273             26012 ns/op          105499 B/op         53 allocs/op
BenchmarkLexWithLua/lua-block-simple-10                    22514             54149 ns/op          108556 B/op        143 allocs/op
BenchmarkLexWithLua/lua-block-larger-10                    25983             46605 ns/op          108403 B/op         59 allocs/op
BenchmarkLexWithLua/lua-block-tricky-10                    33756             35067 ns/op          106684 B/op         66 allocs/op
BenchmarkScanner/simple-10                                163138              7084 ns/op            4648 B/op         36 allocs/op
BenchmarkScanner/with-comments-10                         144558              8100 ns/op            4712 B/op         42 allocs/op
BenchmarkScanner/messy-10                                  47570             25026 ns/op            6000 B/op        165 allocs/op
BenchmarkScanner/quote-behavior-10                        222280              5083 ns/op            4560 B/op         23 allocs/op
BenchmarkScanner/quoted-right-brace-10                     82656             14281 ns/op            5160 B/op         51 allocs/op
BenchmarkScanner/comments-between-args-10                 225475              4872 ns/op            4536 B/op         24 allocs/op
BenchmarkScannerWithLua/lua-basic-10                       93081             12833 ns/op            7866 B/op         66 allocs/op
BenchmarkScannerWithLua/lua-block-simple-10                31426             37989 ns/op           10924 B/op        156 allocs/op
BenchmarkScannerWithLua/lua-block-larger-10                37148             30723 ns/op           10770 B/op         72 allocs/op
BenchmarkScannerWithLua/lua-block-tricky-10                54890             22383 ns/op            9050 B/op         79 allocs/op
PASS
ok      github.com/nginxinc/nginx-go-crossplane 29.969s | 
Fixed up the Scanner logic to mirror changes made to support Lua extension in Lex. Added a compat layer so that the existing Lua type can be used with `Scanner` vs trying to refactor the implementation to remove the channel. Doing so I think would result in further gains. Benchmarks: ``` ❯ go test -benchmem -run=^$ -bench "^(BenchmarkLex|BenchmarkLexWithLua|BenchmarkScanner|BenchmarkScannerWithLua)$" github.com/nginxinc/nginx-go-crossplane -count=1 goos: darwin goarch: arm64 pkg: github.com/nginxinc/nginx-go-crossplane BenchmarkLex/simple-10 57963 17756 ns/op 103049 B/op 39 allocs/op BenchmarkLex/with-comments-10 60025 20067 ns/op 103112 B/op 45 allocs/op BenchmarkLex/messy-10 26170 47822 ns/op 104400 B/op 168 allocs/op BenchmarkLex/quote-behavior-10 74510 17693 ns/op 102961 B/op 26 allocs/op BenchmarkLex/quoted-right-brace-10 43134 27752 ns/op 103560 B/op 54 allocs/op BenchmarkLex/comments-between-args-10 78271 14866 ns/op 102937 B/op 27 allocs/op BenchmarkLexWithLua/lua-basic-10 46273 26012 ns/op 105499 B/op 53 allocs/op BenchmarkLexWithLua/lua-block-simple-10 22514 54149 ns/op 108556 B/op 143 allocs/op BenchmarkLexWithLua/lua-block-larger-10 25983 46605 ns/op 108403 B/op 59 allocs/op BenchmarkLexWithLua/lua-block-tricky-10 33756 35067 ns/op 106684 B/op 66 allocs/op BenchmarkScanner/simple-10 163138 7084 ns/op 4648 B/op 36 allocs/op BenchmarkScanner/with-comments-10 144558 8100 ns/op 4712 B/op 42 allocs/op BenchmarkScanner/messy-10 47570 25026 ns/op 6000 B/op 165 allocs/op BenchmarkScanner/quote-behavior-10 222280 5083 ns/op 4560 B/op 23 allocs/op BenchmarkScanner/quoted-right-brace-10 82656 14281 ns/op 5160 B/op 51 allocs/op BenchmarkScanner/comments-between-args-10 225475 4872 ns/op 4536 B/op 24 allocs/op BenchmarkScannerWithLua/lua-basic-10 93081 12833 ns/op 7866 B/op 66 allocs/op BenchmarkScannerWithLua/lua-block-simple-10 31426 37989 ns/op 10924 B/op 156 allocs/op BenchmarkScannerWithLua/lua-block-larger-10 37148 30723 ns/op 10770 B/op 72 allocs/op BenchmarkScannerWithLua/lua-block-tricky-10 54890 22383 ns/op 9050 B/op 79 allocs/op PASS ok github.com/nginxinc/nginx-go-crossplane 29.969s ```
  
    Sign up for free
    to join this conversation on GitHub.
    Already have an account?
    Sign in to comment
  
      
  Add this suggestion to a batch that can be applied as a single commit.
  This suggestion is invalid because no changes were made to the code.
  Suggestions cannot be applied while the pull request is closed.
  Suggestions cannot be applied while viewing a subset of changes.
  Only one suggestion per line can be applied in a batch.
  Add this suggestion to a batch that can be applied as a single commit.
  Applying suggestions on deleted lines is not supported.
  You must change the existing code in this line in order to create a valid suggestion.
  Outdated suggestions cannot be applied.
  This suggestion has been applied or marked resolved.
  Suggestions cannot be applied from pending reviews.
  Suggestions cannot be applied on multi-line comments.
  Suggestions cannot be applied while the pull request is queued to merge.
  Suggestion cannot be applied right now. Please check back later.
  
    
  
    
Proposed changes
Implemented
crossplane.Scannerthat follows the example of other "scanner" types implemented in the Go stdlib. The existingLexuses concurrency to make tokens available to the caller while managing "state". I think this design queue was taken from Rob Pike's 2011 talk on Lexical Scanning in Go. If you look at examples from the Go stdlib-- such asbufio.ScannerthatLexdepends on-- you'd find that this isn't the strategy being employed and instead there is a struct that manages the state of the scanner and a method that used by the caller to advance the scanner to obtain tokens.After a bit of Internet archeology, I found this post on
golang-nutsfrom Rob Pike himself:An implementation of a "scanner" using the more "traditional" model-- much of the logic is the same or very close to
Lex-- seems to support the above statement.This alternative to
Lexis probably a micro-optimization for many use cases. As the size and number of NGINX configurations that need to be analyzed grows, optimization can be a good thing as well as an API that feels familiar to Go developers who might use this tool for their own purposes.Next steps:
Scannerto "parse" NGINX configurations. I think this should be done in place so that the existing API works as is, but we should also expose a way to allow the caller to provide the scanner.Lexin favor ofScanner. If we leaveLexin place then I don't think we would need av2of the crossplane package (yet).Checklist
Before creating a PR, run through this checklist and mark each as complete.
CONTRIBUTINGdocumentREADME.md)