Compress parser codepaths for repetitive code & Update Algo docs #65

Open
Ed94 opened this issue Dec 15, 2024 · 0 comments
Labels
simplification Simplifying the library

Comments

Ed94 commented Dec 15, 2024

The parser needs a review, both to update the algorithm documentation and to reduce repetitive code.

As of 868b93c, much of the parser is littered with manual look-ahead loops that resolve contextual ambiguities:

// Check three tokens ahead to make sure that we're not dealing with a constructor initialization...
// ( 350.0f , <--- Could be the scenario
// Example : <Capture_Start> <Value> <Comma>
//                 idx         +1      +2
bool detected_comma = _ctx->parser.Tokens.Arr[ _ctx->parser.Tokens.Idx + 2 ].Type == Tok_Comma;
b32  detected_non_varadic_unpaired_param = detected_comma && nexttok.Type != Tok_Varadic_Argument;
// The single-iteration for loop exists only so that `break` can exit the block early.
if (! detected_non_varadic_unpaired_param && nexttok.Type == Tok_Preprocess_Macro_Expr) for( s32 break_scope = 0; break_scope == 0; ++ break_scope)
{
	Macro* macro = lookup_macro( nexttok.Text );
	if (macro == nullptr || ! macro_is_functional(* macro))
		break;
	// (   <Macro_Expr>   (
	// Idx     +1         +2
	s32 idx   = _ctx->parser.Tokens.Idx + 1;
	s32 level = 0;
	// Find end of the token expression
	for ( ; idx < array_num(_ctx->parser.Tokens.Arr); idx++ )
	{
		Token tok = _ctx->parser.Tokens.Arr[ idx ];
		if ( tok.Type == Tok_Capture_Start )
			level++;
		else if ( tok.Type == Tok_Capture_End && level > 0 )
			level--;
		if (level == 0 && tok.Type == Tok_Capture_End)
			break;
	}
	++ idx; // Will increment to possible comma position
	if ( _ctx->parser.Tokens.Arr[ idx ].Type != Tok_Comma )
		break;
	detected_non_varadic_unpaired_param = true;
}

For example, the above iterates over the raw lexed tokens to determine whether a comma follows the macro's argument list.

We need to set up a slice type for look-ahead: a window over a set of tokens that behaves as a sub-slice of the full lexed token slice:

struct LexSlice
{
	Token* Ptr;
	s32      Len;
	s32      Idx;
};

As with the regular token array, it needs a simple interface for navigation (we could probably recycle the current TokArray interface and change it to take a slice instead).
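A minimal sketch of what such a navigation interface might look like. The helper names (`lex_left`, `lex_peek`, `lex_advance`, `lex_sub`) and the reduced `TokType` and `Token` definitions are hypothetical stand-ins for illustration, not the library's actual API:

```cpp
#include <cassert>
#include <cstdint>

typedef int32_t s32;

// Reduced stand-ins for the library's token types (assumption: illustration only).
enum TokType : s32 { Tok_Invalid, Tok_Capture_Start, Tok_Capture_End, Tok_Comma, Tok_Identifier };
struct Token { TokType Type; };

struct LexSlice
{
	Token* Ptr;
	s32    Len;
	s32    Idx;
};

// Number of tokens remaining from the cursor to the end of the slice.
inline s32 lex_left( LexSlice const& slice ) { return slice.Len - slice.Idx; }

// Peek `offset` tokens past the cursor without moving it.
// Returns nullptr when the requested position is outside the slice.
inline Token* lex_peek( LexSlice const& slice, s32 offset = 0 )
{
	s32 at = slice.Idx + offset;
	return ( at >= 0 && at < slice.Len ) ? slice.Ptr + at : nullptr;
}

// Return the token under the cursor and move the cursor forward by one.
// Returns nullptr (and leaves the cursor alone) once the slice is exhausted.
inline Token* lex_advance( LexSlice& slice )
{
	if ( slice.Idx >= slice.Len )
		return nullptr;
	return slice.Ptr + slice.Idx++;
}

// Make a sub-slice covering [begin, end) of the parent, with its own cursor at 0.
// The parent's cursor is untouched, so the sub-slice can be scanned freely.
inline LexSlice lex_sub( LexSlice const& parent, s32 begin, s32 end )
{
	assert( 0 <= begin && begin <= end && end <= parent.Len );
	return LexSlice { parent.Ptr + begin, end - begin, 0 };
}
```

Because a sub-slice carries its own `Idx`, look-ahead code can walk it without disturbing the parser's position in the full token stream.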

The same sort of iteration is used throughout the parser to aggregate tokens it cannot parse:

eat( Tok_Capture_Start );

s32 level = 0;
while ( left && ( currtok.Type != Tok_Capture_End || level > 0 ) )
{
	if ( currtok.Type == Tok_Capture_Start )
		level++;
	else if ( currtok.Type == Tok_Capture_End && level > 0 )
		level--;

	eat( currtok.Type );
}
eat( Tok_Capture_End );

It can be generalized for both consumption and look-ahead.
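One possible shape for that generalization, as a rough sketch: a single nesting-aware scan that both the look-ahead and the consuming paths call. The names `find_balanced_end`, `comma_follows_group`, and `consume_group` are hypothetical, the token types are reduced stand-ins, and the scan assumes it starts on the opening token, which differs slightly from the macro look-ahead shown earlier:

```cpp
#include <cassert>
#include <cstdint>

typedef int32_t s32;

// Reduced stand-ins for the library's token types (assumption: illustration only).
enum TokType : s32 { Tok_Invalid, Tok_Capture_Start, Tok_Capture_End, Tok_Comma, Tok_Identifier };
struct Token { TokType Type; };

// Shared scan: `start` must index an `open` token.
// Returns the index of the matching `close` token, or -1 if the group never closes.
inline s32 find_balanced_end( Token const* toks, s32 num, s32 start, TokType open, TokType close )
{
	assert( start >= 0 && start < num && toks[ start ].Type == open );
	s32 level = 0;
	for ( s32 idx = start; idx < num; ++idx )
	{
		if ( toks[ idx ].Type == open )
			++level;
		else if ( toks[ idx ].Type == close && --level == 0 )
			return idx;
	}
	return -1;
}

// Look-ahead use: does a comma directly follow the balanced group? No cursor is moved.
inline bool comma_follows_group( Token const* toks, s32 num, s32 start )
{
	s32 end = find_balanced_end( toks, num, start, Tok_Capture_Start, Tok_Capture_End );
	return end >= 0 && end + 1 < num && toks[ end + 1 ].Type == Tok_Comma;
}

// Consumption use: move the cursor just past the balanced group.
// On an unbalanced group, returns false and leaves the cursor where it was.
inline bool consume_group( Token const* toks, s32 num, s32& cursor )
{
	s32 end = find_balanced_end( toks, num, cursor, Tok_Capture_Start, Tok_Capture_End );
	if ( end < 0 )
		return false;
	cursor = end + 1;
	return true;
}
```

Passing the open and close token types as parameters would let the same scan cover other paired delimiters, if the parser needs that.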

@Ed94 Ed94 added the simplification Simplifying the library label Dec 15, 2024
@Ed94 Ed94 added this to the C++ Parser support complete milestone Dec 15, 2024