Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vcc: Teach HEADER symbols to accept constant strings #3768

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

dridi
Copy link
Member

@dridi dridi commented Jan 5, 2022

Static strings are unfortunately copied into workspace simply because
they don't belong to it. Since it is safe to reference a static string,
or at the very least a string that is guaranteed to outlive a VCL
transaction since in this case it has the same scope as the VCL shared
object, we can bypass the regular STRANDS assignment.

When a transaction runs out of workspace, fails and goes to vcl_synth,
we don't want to fail again because of a static VCL assignment of the
content-type and retry-after headers from the built-in VCL.

Refs #3765

dridi added 7 commits January 5, 2022 09:51
This tracking was very approximate, leading to many constant expressions
being considered variable. Instead of marking all expressions variable
right away, we can keep track of expressions that are edited on top of
a constant expression.

If a variable expression is simply the edition of a constant expression,
for example a constant STRING edited into a STRANDS, the variable one
keeps track of its constant origins.
If the expression was successfully parsed, it is possible to define how
to edit it, whether it is variable or constant. This allows vcc_Expr()
call sites to pass edit strings instead of making calls to Fb() and
intertwine them with INDENT adjustments.
Go one step further in the reuse of the expression edit syntax and have
a proper "\v2" sequence to refer to the symbol on the RHS in the assign
table.
Now that assignments parse expressions with their own edit string, there
is no longer a problematic boundary in the middle of an LBODY_* enum.
Static strings are unfortunately copied into workspace simply because
they don't belong to it. Since it is safe to reference a static string,
or at the very least a string that is guaranteed to outlive a VCL
transaction since in this case it has the same scope as the VCL shared
object, we can bypass the regular STRANDS assignment.

When a transaction runs out of workspace, fails and goes to vcl_synth,
we don't want to fail again because of a static VCL assignment of the
content-type and retry-after headers from the built-in VCL.

Refs varnishcache#3765
And get rid of a trailing space in the generated C code.
@dridi
Copy link
Member Author

dridi commented Jan 5, 2022

I added cef6f00 to show that vcc_ExprEdit() can be useful for more than just constant HEADER assignments, it can also help simplify libvcc.

The general pattern is:

-       Fb(tl, 1, "BEFORE\n");
-       tl->indent += INDENT;
-       vcc_Expr(tl, FOO);
-       tl->indent -= INDENT;
+       vcc_ExprEdit(tl, FOO, "BEFORE\v+\n\v1\v-AFTER\n", NULL);
        ERRCHK(tl);
-       Fb(tl, 1, "AFTER\n");

There may be other types of symbols that could benefit from a constant assignment, but HEADER symbols seem to be the only obvious ones with on one hand one less STRANDS detour and on the other hand one less workspace allocation.

@nigoroll
Copy link
Member

nigoroll commented Feb 14, 2025

I came back here after we found agreement on #4269. I am not sure if I expressed my thoughts on this ticket clearly enough, so here is an attempt to make good on that.

My understanding of the premise of this ticket is that when we have set x.http.a = "foo", we can can generate a header string "a: foo" and basically pass this as is (via some intermediate functions) to http_SetH(). This is certainly a good idea.

The additional idea I mentioned is that we can also optimize set x.http.a = y.http.a for the same reason, because y.http already has the a: <whatever> header, we just don't have the API to return it.

My hope was that, if we taught VCC to handle the "name: value" string instead of HEADER and "value", this would more or less fall out as a byproduct, but looking at this PR again, it now seems to me that the implementation is indeed less general than I initially thought when I commented on it in #4269.

So now my question is: Is it worth it? I can see that the optimization for VCL constants makes sense, in particular for avoiding out-of-workspace conditions on error paths, but the set x.http.a = y.http.a case is probably an exotic edge case.

The other case which we have optimized with #4269 is "value assignments" like set resp.reason = resp.http.foo, but that did not need any VCC changes.


(This section might be going to far, please do not take it as a concrete proposal...)

Another question is, if we are going to touch this area anyway, are there better options, still?

One obvious idea would be to represent headers as

struct hdnv {
	unsigned namelen, valuelen;	// 2, 3
	const char *name;		// "a: "
	const char *value;		// "foo"
}

Compared to our txt, we would increase storage for header pointers and the number of I/O vectors needed to send HTTP/1 headers by 50% 1, but we could easily avoid copying in "all" cases, because we could set name and value independently.

Footnotes

  1. Currently we send txt + "\r\n", so 50% would be the worst case addition because we could detect the common name + namelen + 2 == value and coalesce, but probably kernels do this already anyway?

@dridi
Copy link
Member Author

dridi commented Feb 17, 2025

it now seems to me that the implementation is indeed less general than I initially thought when I commented on it in #4269.

The main generic aspect of the change is the refinement of constant expressions across "proxy" expressions. For example a "static literal" turned into a single-component STRANDS would track its constant parent expression and should in theory still be considered constant once turned back into a STRING.

Any symbol can register expression edits for constant expressions (with the help of the assignment facility moving to actual expression edits) and at this point we are still in generic territory.

It just so happens that only HEADER learns how to assign constant expressions.

The additional idea I mentioned is that we can also optimize set x.http.a = y.http.a for the same reason, because y.http already has the a: <whatever> header, we just don't have the API to return it.

I'm not too worried about this. The assignment facility can only know how to deal with set HEADER = STRANDS actions, because right now it merely states the desired type. To go one step further for set actions we'd need to parse both LHS and RHS upfront to know the effective type of the RHS and gain the ability to favor HEADER expressions, falling back to STRANDS. The same would apply for STRING falling back to STRANDS (for the '=' operator at least).

The other case which we have optimized with #4269 is "value assignments" like set resp.reason = resp.http.foo, but that did not need any VCC changes.

I didn't realize that, I thought it was only for the obj2resp setup. I need to have another look at #4269.

Another question is, if we are going to touch this area anyway, are there better options, still?

I'm not up for touching how we use this format to pack headers in storage.

@nigoroll
Copy link
Member

Just for completeness, again, I do not want to push this topic:

I'm not up for touching how we use this format to pack headers in storage.

The storage representation would not change, it would still be "header: value". What would change is how our API looks at it. At the moment, it has a txt double pointer, the idea mentioned here is instead one "name" and one "value" pointer, both with a length. For headers from storage, the "value" pointer would point to the same string as "name".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants