Add internal URI handling API #19073

Open. Wants to merge 7 commits into master (base branch).

Conversation

kocsismate
Member

No description provided.

Member

@TimWolla left a comment

Some first remarks. Did not yet look at everything.

Comment on lines +459 to +453
static zend_string *parse_url_uri_to_string(void *uri, uri_recomposition_mode_t recomposition_mode, bool exclude_fragment)
{
	ZEND_UNREACHABLE();
}
Member

Might be better to simply NULL the pointer in the uri_handler_t struct instead.

Member Author

I got the same comment from @DanielEScherzer in the original PR. I told him that I would like to avoid making the handlers optional if possible, because then the existence of a handler doesn't have to be checked before it is used, which helps both maintainability and performance.

The parse_url-based implementation is special because it's not directly exposed to userland: it's just an internal URI "backend" for BC, and these handlers aren't necessarily needed for now. We could of course expose the to_string handlers later for third-party extensions if we want to, at which point this code should probably be changed to something else.

Member

A function that triggers undefined behavior when called (this is what ZEND_UNREACHABLE implies for production builds) and not having a function (i.e. dereferencing a NULL pointer when trying to call the function) are functionally the same. In both cases the PHP binary will do something bad (ideally just crash).

Thus it seems to be preferable to clearly indicate that the handler is not available by using NULL rather than pretending there is a handler when calling it is unsafe.

Member Author

OK, I see what you mean now. I can't say which approach is preferable, but intentionally passing NULL instead of a handler function, while callers of the handlers never expect NULL, also seems wrong. Normally, static analyzers would emit an error in this case (in PHP for sure; I don't know about C), which is why I didn't even consider this solution.

TBH, the code that uses ZEND_UNREACHABLE() is indeed unreachable if one uses the internal API: currently, no exposed function makes use of the relevant handlers.

Member

To give an example of what compilers do with this right now: They'll simply not emit any instructions at all, leading to the CPU executing random stuff from the binary! This is super unsafe.

see: https://godbolt.org/z/Gve8dGfhT
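
The NULL-handler option discussed in this thread could look roughly like the following. This is only a minimal sketch under assumptions: `demo_uri_handler`, `demo_uri_to_string`, and the two handler instances are hypothetical names made up for illustration and are not the real `uri_handler_t` layout or any function from this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a handler table (not the real uri_handler_t). */
typedef struct demo_uri_handler {
	const char *name;
	/* Optional: NULL means "this backend cannot recompose a URI". */
	const char *(*to_string)(const void *uri);
} demo_uri_handler;

/* Toy implementation: treats the "URI" as an already recomposed C string. */
static const char *demo_rfc3986_to_string(const void *uri)
{
	return (const char *) uri;
}

/* A backend that provides the handler (the name is an assumption). */
static const demo_uri_handler demo_rfc3986_handler = { "rfc3986", demo_rfc3986_to_string };

/* The legacy backend states explicitly that it has no to_string handler. */
static const demo_uri_handler demo_parse_url_handler = { "parse_url", NULL };

/* Central call site: the NULL check lives in one place and fails in a defined way. */
static const char *demo_uri_to_string(const demo_uri_handler *handler, const void *uri)
{
	assert(handler != NULL);
	if (handler->to_string == NULL) {
		return NULL; /* caller must handle "not supported" */
	}
	return handler->to_string(uri);
}
```

With this shape, calling `demo_uri_to_string(&demo_parse_url_handler, uri)` yields NULL rather than undefined behavior, at the cost of one pointer comparison per call, which is the trade-off debated above.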

Comment on lines 111 to 114
if (uri_handler_name == NULL) {
	return uri_handler_by_name("parse_url", sizeof("parse_url") - 1);
}

Member

Defaulting to parse_url in a new API is probably not a good idea. Instead the “legacy” users should just pass "parse_url" explicitly.

Member Author

Defaulting to parse_url works here because it is indeed the default wherever php_uri_get_handler() is called; the other "backends" can only be used if the config is explicitly passed (not null).

The other reason why I opted for this approach is that it would be inconvenient to create and free a new zend_string when the legacy implementation is needed, and I wanted to avoid adding a known string just for this purpose, or exposing the C string based uri_handler_by_name function instead.
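
For comparison, a lookup that takes a pointer-length pair lets legacy callers name the backend explicitly without allocating a string. The following is a rough sketch under assumptions: `demo_handler`, `demo_handlers`, and `demo_handler_by_name` are hypothetical, and the backend names other than "parse_url" are placeholders, not taken from this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry; the real handler struct carries function pointers too. */
typedef struct demo_handler {
	const char *name;
	size_t name_len;
} demo_handler;

#define DEMO_HANDLER(s) { s, sizeof(s) - 1 }

/* Only "parse_url" appears in the discussion; the other names are assumptions. */
static const demo_handler demo_handlers[] = {
	DEMO_HANDLER("parse_url"),
	DEMO_HANDLER("rfc3986"),
	DEMO_HANDLER("whatwg"),
};

/* Looks up a handler by name; returns NULL if no handler matches exactly. */
static const demo_handler *demo_handler_by_name(const char *name, size_t name_len)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof(demo_handlers) / sizeof(demo_handlers[0]); i++) {
		const demo_handler *h = &demo_handlers[i];
		if (h->name_len == name_len && memcmp(h->name, name, name_len) == 0) {
			return h;
		}
	}
	return NULL;
}
```

A legacy caller would then write `demo_handler_by_name("parse_url", sizeof("parse_url") - 1)`, which makes the choice of backend visible at the call site and needs no heap allocation.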

Member

@TimWolla left a comment

I've looked at this again and I must say that I'm having trouble meaningfully reviewing this. It adds a large amount of code with unclear purpose and confusing (to me) naming.

@kocsismate marked this pull request as ready for review on July 21, 2025 08:09
Member

@nielsdos left a comment

Preliminary review round

Member

@nielsdos left a comment

The switch from zend_string to the pointer-length pair seems to have been a good idea.

@TimWolla self-requested a review on July 22, 2025 20:06
@TimWolla
Member

I'll try to take another look tomorrow.

@kocsismate
Member Author

@nielsdos Do you see anything that I should fix before merging it? I'd like to implement some of the cleanups that we discussed.

Member

@nielsdos left a comment

Looks mostly good. FWIW, I agree with Tim on UNREACHABLE.

if (resource->pass) {
php_url_decode(ZSTR_VAL(resource->pass), ZSTR_LEN(resource->pass));
strcat(scratch, ZSTR_VAL(resource->pass));
if (resource->password) {
Member

A few things I'm thinking about:

  • This code block calls php_url_decode, which should result in the correct output string because you're using URI_COMPONENT_READ_RAW above, right?
  • The decoding happens in place, which means it overwrites resource->password. Is this always fine? If so, can you document a guarantee that the strings returned by php_uri_parse_to_struct are new strings?
  • Pre-existing: If the string has NUL bytes (didn't look up whether that's valid), then strcat will go wrong.
  • Pre-existing: The string length of resource->password is not updated after decoding.
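
The two pre-existing points can be illustrated with a length-aware variant. This is a sketch under assumptions, not the code from this PR: `demo_url_decode` is a simplified hypothetical decoder (it only handles %XX and does not replicate everything php_url_decode does), and `demo_append` is a hypothetical replacement for the strcat call.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns the value of a hex digit, or -1 if c is not a hex digit. */
static int demo_hex_value(unsigned char c)
{
	if (c >= '0' && c <= '9') {
		return c - '0';
	}
	c = (unsigned char) tolower(c);
	if (c >= 'a' && c <= 'f') {
		return c - 'a' + 10;
	}
	return -1;
}

/* Decodes %XX sequences in place and returns the new length in bytes.
 * The caller must store the returned length; the old length is stale. */
static size_t demo_url_decode(char *str, size_t len)
{
	size_t in = 0, out = 0;
	while (in < len) {
		if (str[in] == '%' && in + 2 < len) {
			int hi = demo_hex_value((unsigned char) str[in + 1]);
			int lo = demo_hex_value((unsigned char) str[in + 2]);
			if (hi >= 0 && lo >= 0) {
				str[out++] = (char) (hi * 16 + lo);
				in += 3;
				continue;
			}
		}
		str[out++] = str[in++];
	}
	return out;
}

/* Appends src_len bytes to dst without relying on NUL termination.
 * Returns 0 on success, -1 if the bytes would not fit. */
static int demo_append(char *dst, size_t dst_cap, size_t *dst_len, const char *src, size_t src_len)
{
	assert(*dst_len <= dst_cap);
	if (src_len > dst_cap - *dst_len) {
		return -1;
	}
	memcpy(dst + *dst_len, src, src_len);
	*dst_len += src_len;
	return 0;
}
```

Because the decoded length is carried explicitly, a decoded NUL byte (from %00) neither truncates the appended data nor leaves a stale length behind, which is where a strcat-based approach goes wrong.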

3 participants