Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Portable UTF-8 #38

Open
wants to merge 25 commits into
base: refactoring
Choose a base branch
from
Open

Portable UTF-8 #38

wants to merge 25 commits into from

Conversation

nibra
Copy link
Contributor

@nibra nibra commented Sep 22, 2021

Pull Request for Issue #29

This pull request replaces the abandoned library phputf8 with Portable UTF-8.

Portable UTF-8 is written in PHP and can work without mbstring, iconv or any other extra encoding php-extension on the server. It is easy to use and provides much more functionality than the former implementation. This library will
auto-detect the server environment and will use the installed php-extensions if they are available,
so the best possible performance is provided. As a fallback, Symfony Polyfills are used, if needed.

(Freely quoted from Portable UTF-8)

This PR is kept separate from the Test Consolidation #37 for now, so the tests can be updated and used for both new and old implementation, if issues arise.

Summary of Changes

  • Replaced phputf8 with Portable UTF-8
  • Revised inline documentation
  • Added tests to verify that signature consolidation does not break existing code

Testing Instructions

  • Code review. Check especially that
    • signatures did not change in a breaking way
    • behaviour does not change in an unexpected way

Documentation Changes Required

Yes, but should be generated from source. Inline documentation has been revised.

Outstanding Decisions

  1. Do we consider the utf8_* functions to be part of this package's API, as Micheal suggests?
  2. Do we want to encourage the use of an external library (thus deprecate proxy methods calling that library) or do we want to keep the whole scope of services and maintain it ourselves?

@nibra nibra marked this pull request as draft September 22, 2021 21:50
@nibra nibra changed the title Portable utf8 Portable UTF-8 Sep 22, 2021
@nibra nibra marked this pull request as ready for review September 23, 2021 18:45
@nibra
Copy link
Contributor Author

nibra commented Sep 23, 2021

Finally managed to provide Drone CI with the locales needed to conduct the string comparision tests.

@HLeithner
Copy link

please provide a PR against our docker-images repo and include the locales when this PR is merged and remove it from here.

@HLeithner
Copy link

I see you are changing some function signatures which I would say is not allowed because of b/c, Also you deprecated the utf8 functions say as alternative to use UTF8::XXX (I think you mean voku\helper\UTF8) which means the cms has to load the dependency directly? If you replace this lib again with something new and remove the dependency here the cms would fail right?

@mbabker
Copy link
Contributor

mbabker commented Sep 23, 2021

This pull request, as is, necessitates a major release. Why do I care? Because my company still uses this product and the last thing I need is to work around broken releases with major compatibility breaks or explain to our customers why we are freezing code updates for their websites.

  • Files that are provided by this package are removed, like it or not the third party API has become a de facto part of this package's API and must be treated as first-party code regarding backward (and forward) compatibility; I'm sure few will remember this, but inclusions of the phputf8 files for extensions were broken in the CMS 3.4 release because the old files were removed, hence those file paths being restored as proxies for backward compatibility
  • A number of methods have narrowed parameter types with no compatibility layer for the dropped types (there are instances where the type is changed from integer|boolean to integer (which, there are also undocumented nulls supported in these cases based on the code, a documentation bug in its own right), however, there is no handling to ensure passing a boolean still produces the expected result, you are essentially lucking out that PHP's type coercion might work more often than not)
  • Debatably, widening the allowed return types can introduce B/C issues for downstream consumers not expecting the new types (i.e. a method which previously only returns a string and can now return an array or string can cause type issues if the return value is passed into something else expecting only a string)

Additional nitpicks:

  • IDE inspection annotations do not belong in committed code like this, especially the ones used which apply to a single product; these are the types of inspections that should be covered with a proper static analysis tool where either the code is annotated (to the benefit of the downstream user who might actually care about strong static analysis in their applications) or issues ignored with a proper exclusions list, but JetBrains specific annotations should not be used

@nibra
Copy link
Contributor Author

nibra commented Sep 23, 2021

I see you are changing some function signatures which I would say is not allowed because of b/c,

Signatures below src/ should only have changed in a compatible way; if there are breaking changes, it needs to be corrected.

Also you deprecated the utf8 functions say as alternative to use UTF8::XXX (I think you mean voku\helper\UTF8) which means the cms has to load the dependency directly?

AFAIK, the Portable UTF-8 library is already part of the CMS indirectly (see issue comment).

If you replace this lib again with something new and remove the dependency here the cms would fail right?

That should not happen, hence the deprecation. But, of course, it can and should be discussed

@nibra
Copy link
Contributor Author

nibra commented Sep 23, 2021

This pull request, as is, necessitates a major release. Why do I care? Because my company still uses this product and the last thing I need is to work around broken releases with major compatibility breaks or explain to our customers why we are freezing code updates for their websites.

Thank you for your feedback, which I appreciate very much!

Files that are provided by this package are removed, like it or not the third party API has become a de facto part of this package's API and must be treated as first-party code regarding backward (and forward) compatibility; I'm sure few will remember this, but inclusions of the phputf8 files for extensions were broken in the CMS 3.4 release because the old files were removed, hence those file paths being restored as proxies for backward compatibility

I neither knew nor expected that the underlying library's API became part of the package's API. The whole point with providing a facade is to avoid this. So that's something we need to discuss thoroughly.

A number of methods have narrowed parameter types with no compatibility layer for the dropped types (there are instances where the type is changed from integer|boolean to integer

Those changes should be restricted to the docblocks, not the signature itself. If a signature is affected in code, it needs to be corrected.

(which, there are also undocumented nulls supported in these cases based on the code, a documentation bug in its own right), however, there is no handling to ensure passing a boolean still produces the expected result, ...)

This should be covered by the tests; if something's missing, the tests are separated from this PR, so they can be applied to both old and new code.

Debatably, widening the allowed return types can introduce B/C issues for downstream consumers not expecting the new types (i.e. a method which previously only returns a string and can now return an array or string can cause type issues if the return value is passed into something else expecting only a string)

The method in question has no type check and always accepted both strings and arrays of strings. Only thing changed here is the docblock, i.e., documentation.

IDE inspection annotations do not belong in committed code like this, especially the ones used which apply to a single product; these are the types of inspections that should be covered with a proper static analysis tool where either the code is annotated (to the benefit of the downstream user who might actually care about strong static analysis in their applications) or issues ignored with a proper exclusions list, but JetBrains specific annotations should not be used

Agree - those are overlooked residues.

Docs - Add change of default values to inline documentation
# Conflicts:
#	src/StringHelper.php
Refactor - Set default to 0 in strpos and strrpos
@mbabker
Copy link
Contributor

mbabker commented Sep 24, 2021

Files that are provided by this package are removed, like it or not the third party API has become a de facto part of this package's API and must be treated as first-party code regarding backward (and forward) compatibility; I'm sure few will remember this, but inclusions of the phputf8 files for extensions were broken in the CMS 3.4 release because the old files were removed, hence those file paths being restored as proxies for backward compatibility

I neither knew nor expected that the underlying library's API became part of the package's API. The whole point with providing a facade is to avoid this. So that's something we need to discuss thoroughly.

Answered in the discussion, but the TL;DR is that StringHelper can be refactored without issue but rm -rf src/phputf8 has B/C implications.

A number of methods have narrowed parameter types with no compatibility layer for the dropped types (there are instances where the type is changed from integer|boolean to integer

Those changes should be restricted to the docblocks, not the signature itself. If a signature is affected in code, it needs to be corrected.

When the API lacks typed signatures, the doc block is the contract for the signature. So, even removing a type from the doc blocks can be considered a B/C break as you are declaring that a previously supported type is no longer supported.

(which, there are also undocumented nulls supported in these cases based on the code, a documentation bug in its own right), however, there is no handling to ensure passing a boolean still produces the expected result, ...)

This should be covered by the tests; if something's missing, the tests are separated from this PR, so they can be applied to both old and new code.

So, there are two things in this comment:

  1. A lack of <foo>|null in the doc blocks. There are a lot of params that have a type in the doc block then have a default null value, which implicitly means that null is a supported type; this should turn into explicit by adding the null type

  2. The type narrowing changes. Because of type coercion in PHP and the lack of declare strict_types or typed signatures in the Joomla API, dropping integer|boolean to integer probably won't break anything from a practical perspective. But, my main point here is that the supported types are narrowed and there is no communication to the consumer beyond reviewing the doc blocks between each release of this change.

Debatably, widening the allowed return types can introduce B/C issues for downstream consumers not expecting the new types (i.e. a method which previously only returns a string and can now return an array or string can cause type issues if the return value is passed into something else expecting only a string)

The method in question has no type check and always accepted both strings and arrays of strings. Only thing changed here is the docblock, i.e., documentation.

This was the return types, not the parameter types. Again, lacking types in the PHP code itself, the doc block acts as the contract. An example is with StringHelper::str_ireplace(), which this PR shows has the return type changing from string to string[]|string, i.e. type widening. These types of subtle changes, while maybe not B/C breaks in the strictest sense of the term, can have practical B/C concerns to consumers (i.e. if someone were calling trim(StringHelper::str_ireplace('foot', 'foo', ' football ')), the added array return would break that call because trim() expects a string). Now, if the method (or any of them where the return types have changed in this PR) actually did return arrays previously, then it is a valid documentation fix and the concern here is a moot point. But, in general, return type widening does need to be handled with care.

@nibra
Copy link
Contributor Author

nibra commented Sep 24, 2021

When the API lacks typed signatures, the doc block is the contract for the signature.

I get your point, but I'm not sure if I agree. Since the PHP interpreter does not look at the doc block, the doc block IMHO only serves documentation purposes. If I want the consumer to either omit an argument or provide an integer, I tell them in the doc block that the argument is an optional integer. The fact, that the code might need a null value to detect omission, is an internal (implementation) problem, but should not "pollute" the signature. Am I wrong?

An example is with StringHelper::str_ireplace(), which this PR shows has the return type changing from string to string[]|string, i.e. type widening.

StringHelper::str_ireplace() returns an array only, if you provide an array as subject, it will always return a string, if subject is a string. Thus in this case, yes, the type is widened, but it should not affect existing code.

@mbabker
Copy link
Contributor

mbabker commented Sep 24, 2021

When the API lacks typed signatures, the doc block is the contract for the signature.

I get your point, but I'm not sure if I agree. Since the PHP interpreter does not look at the doc block, the doc block IMHO only serves documentation purposes.

Then by that argument, any untyped parameter must accept all types without generating a runtime error. If I pass an array into a parameter documented as only supporting a string or an integer, am I wrong for violating the types in the documentation, or is the code wrong for not correctly handling all potential types since a type is not enforced at the language level? This is why a doc block should be considered part of the API contract.

If I want the consumer to either omit an argument or provide an integer, I tell them in the doc block that the argument is an optional integer. The fact, that the code might need a null value to detect omission, is an internal (implementation) problem, but should not "pollute" the signature. Am I wrong?

IMO, yes.

<?php declare(strict_types=1);

function add_up_to_three_numbers(int|float $number1, int|float $number2, int|float $number3 = null): int|float
{
    $return = $number1 + $number2;

    if ($number3 !== null) {
        $return += $number3;
    }

    return $return;
}

While this actually is legal on PHP 8.0, to me this is an invalid function signature as it reads that each parameter MUST be an integer or float with no nullability. The way PHP interprets things, null is allowed even though the parameter's type is not declared nullable. With an explicit nullable declaration (i.e. int|float|null $number3 = null), it is conveyed to everyone that the parameter is expected to be nullable and that any developers coming in later should account for a potentially null value. Explicit communication of intent is always going to be better in the long run than implicitly relying on implementation details.

Until PHP 8, you have to pass all arguments if you want to set a value on any optional argument, so something like json_decode($data, true, 512, \JSON_THROW_ON_ERROR) is pretty common if you want PHP to throw exceptions on JSON decoding errors. That's a problem because I have to know the implementation details of that function to set a flag since I can't skip that third (depth) argument. If an optional argument on a function is documented as requiring a string, I generally would pass an empty string as my way of skipping that argument. But, the function's implementation details state that the optional argument supports a null value; am I wrong for passing an empty string (as the doc block documents) to signal "I don't want to set a value for this argument" or is the function wrong for not documenting that null is acceptable?

An example is with StringHelper::str_ireplace(), which this PR shows has the return type changing from string to string[]|string, i.e. type widening.

StringHelper::str_ireplace() returns an array only, if you provide an array as subject, it will always return a string, if subject is a string. Thus in this case, yes, the type is widened, but it should not affect existing code.

That entire example might have been a poor one to use, but hopefully, it conveyed my point; widening the potential return values absolutely can have B/C concerns for downstream users and needs to be treated with care.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants