Compress parser codepaths for repetitive code & Update Algo docs #65

Ed94 · 2024-12-15T23:09:49Z

The parser needs a review, both to update the algo's documentation and to reduce repetitive code.

As of 868b93c much of the parser is littered with manual look-ahead loops to resolve contextual ambiguities:

Lines 2488 to 2524 in 868b93c

    
// Check three tokens ahead to make sure that we're not dealing with a constructor initialization...
//                  (         350.0f    ,         <---  Could be the scenario
// Example : <Capture_Start> <Value> <Comma>
//                 idx         +1      +2
bool detected_comma = _ctx->parser.Tokens.Arr[ _ctx->parser.Tokens.Idx + 2 ].Type == Tok_Comma;
b32   detected_non_varadic_unpaired_param = detected_comma && nexttok.Type != Tok_Varadic_Argument;
if (! detected_non_varadic_unpaired_param && nexttok.Type ==  Tok_Preprocess_Macro_Expr) for( s32 break_scope = 0; break_scope == 0; ++ break_scope)
{
	Macro* macro = lookup_macro( nexttok.Text );
	if (macro == nullptr || ! macro_is_functional(* macro))
		break;
	// (   <Macro_Expr> (
	// Idx      +1     +2
	s32  idx    = _ctx->parser.Tokens.Idx + 1;
	s32  level = 0;
	// Find end of the token expression
	for ( ; idx < array_num(_ctx->parser.Tokens.Arr); idx++ )
	{
		Token tok = _ctx->parser.Tokens.Arr[ idx ];
		if ( tok.Type == Tok_Capture_Start )
			level++;
		else if ( tok.Type == Tok_Capture_End && level > 0 )
			level--;
		if (level == 0 && tok.Type == Tok_Capture_End)
			break;
	}
	++ idx; // Will increment to possible comma position
	if ( _ctx->parser.Tokens.Arr[ idx ].Type != Tok_Comma )
		break;
	detected_non_varadic_unpaired_param = true;
}

For example, the snippet above iterates over the raw lexed tokens to determine whether a comma follows the macro's argument list.

We need to set up a slice over the set of tokens to look ahead, which will behave as a sub-slice of the full lexed slice:

struct LexSlice
{
	Token* Ptr;
	s32      Len;
	s32      Idx;
};

As with the regular tokens array, it needs a simple interface for navigating it (we could probably just recycle the current TokArray interface and change it to take a slice instead).
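A minimal sketch of what such a navigation interface might look like. Everything besides the `LexSlice` layout is an assumption for illustration: the `slice_*` function names are hypothetical (they are not the existing TokArray interface), and `Token`/`TokType` are reduced stand-ins for the real types.

```cpp
#include <cassert>
#include <cstdint>

typedef int32_t s32;

// Reduced stand-ins for the parser's real token types (illustrative subset only).
enum TokType : s32 { Tok_Invalid, Tok_Capture_Start, Tok_Capture_End, Tok_Comma, Tok_Identifier };
struct Token { TokType Type; };

struct LexSlice
{
	Token* Ptr;
	s32    Len;
	s32    Idx;
};

// Hypothetical helpers; names are not taken from the library.

// Number of tokens from the cursor to the end of the slice.
inline s32 slice_left( LexSlice const& s ) { return s.Len - s.Idx; }

// Token under the cursor; the caller must ensure slice_left(s) > 0.
inline Token slice_current( LexSlice const& s )
{
	assert( s.Idx >= 0 && s.Idx < s.Len );
	return s.Ptr[ s.Idx ];
}

// Bounds-checked peek relative to the cursor; yields Tok_Invalid when out of range.
inline Token slice_peek( LexSlice const& s, s32 offset )
{
	s32 at = s.Idx + offset;
	if ( at < 0 || at >= s.Len )
		return Token{ Tok_Invalid };
	return s.Ptr[ at ];
}

// Moves the cursor forward by one token; returns false at the end of the slice.
inline bool slice_advance( LexSlice& s )
{
	if ( s.Idx >= s.Len )
		return false;
	++ s.Idx;
	return true;
}

// View of at most `count` tokens starting at the cursor. It shares storage with
// the parent slice but has its own cursor, so scanning it never moves the parent.
inline LexSlice slice_sub( LexSlice const& s, s32 count )
{
	s32 left = slice_left( s );
	if ( count > left ) count = left;
	if ( count < 0 )    count = 0;
	return LexSlice{ s.Ptr + s.Idx, count, 0 };
}
```

With something like this, the `Tokens.Arr[ Tokens.Idx + 2 ]` style of access becomes `slice_peek( slice, 2 )`, and an out-of-range look-ahead returns `Tok_Invalid` instead of reading past the array.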

This sort of iteration is used throughout the parser for aggregating tokens the parser cannot parse:

eat( Tok_Capture_Start );

s32 level = 0;
while ( left && ( currtok.Type != Tok_Capture_End || level > 0 ) )
{
	if ( currtok.Type == Tok_Capture_Start )
		level++;
	else if ( currtok.Type == Tok_Capture_End && level > 0 )
		level--;

	eat( currtok.Type );
}
eat( Tok_Capture_End );

It can be generalized for both consumption and look-ahead.
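One way the generalization could look, as a sketch only: a single routine that skips a balanced group on a `LexSlice`. Passing the slice by reference consumes the tokens, while running it on a copy gives a look-ahead. `skip_balanced` and `comma_follows_group` are hypothetical names, and `Token`/`TokType` are reduced stand-ins for the real types.

```cpp
#include <cassert>
#include <cstdint>

typedef int32_t s32;

// Reduced stand-ins for the parser's real token types (illustrative subset only).
enum TokType : s32 { Tok_Capture_Start, Tok_Capture_End, Tok_Comma, Tok_Identifier };
struct Token { TokType Type; };
struct LexSlice { Token* Ptr; s32 Len; s32 Idx; };

// Hypothetical helper: skips one balanced open/close group.
// The cursor must be on an `open` token. On success the cursor is left one past
// the matching `close` and true is returned. On failure (cursor not on `open`,
// or the group never closes) the slice is left unchanged and false is returned.
inline bool skip_balanced( LexSlice& s, TokType open, TokType close )
{
	if ( s.Idx >= s.Len || s.Ptr[ s.Idx ].Type != open )
		return false;

	s32 level = 0;
	for ( s32 idx = s.Idx; idx < s.Len; ++ idx )
	{
		TokType type = s.Ptr[ idx ].Type;
		if ( type == open )
			++ level;
		else if ( type == close && -- level == 0 )
		{
			s.Idx = idx + 1;
			return true;
		}
	}
	return false;
}

// Look-ahead: the slice is taken by value, so the caller's cursor is untouched.
// Answers "is there a comma right after the parenthesized group at the cursor?"
inline bool comma_follows_group( LexSlice s )
{
	return skip_balanced( s, Tok_Capture_Start, Tok_Capture_End )
		&& s.Idx < s.Len
		&& s.Ptr[ s.Idx ].Type == Tok_Comma;
}
```

For the macro case quoted earlier, the look-ahead copy would first be advanced past the macro token so that its cursor sits on the opening capture. The consuming form would still need to be tied into `eat` (or its slice equivalent) so that the parser's usual error reporting is preserved.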


Ed94 added the simplification Simplifying the library label Dec 15, 2024

Ed94 added this to the C++ Parser support complete milestone Dec 15, 2024

Ed94 added this to gencpp roadmap Dec 15, 2024
