Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lexically Require Module #36

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Lexically Require Module #36

wants to merge 4 commits into from

Conversation

Ovid
Copy link

@Ovid Ovid commented Jul 23, 2023

Perl a P5P discussion, this PPC proposes a syntax for lexically requiring modules to avoid fragile transitive dependencies.

@Grinnz
Copy link

Grinnz commented Jul 23, 2023

I find this cognitively difficult, at the implementation and logical levels. I apologize if I'm misunderstanding things or this has been covered in the discussion.

If it's meant to affect only the keyword parser (which could actually be done lexically) then it would not do much of anything because it would not be able to affect code outside of its scope which would just treat Some::Module as a string and defer the lookup to runtime.

If it's meant to affect it at runtime then the effect would be dynamic and not lexical, and this seems incredibly complicated (tearing down the whole package namespace?)

From the user point of view, it seems confusing as it's called "lexically require" yet requiring a module runs arbitrary import methods (and the whole body of the module) which almost always have non-lexical effects.

@demerphq
Copy link

If it's meant to affect it at runtime then the effect would be dynamic and not lexical, and this seems incredibly complicated (tearing down the whole package namespace?)

Speaking abstractly and independently from the actual text of ovids proposal I don't see why it would that difficult to maintain a hash of permission data that models "from this package you are allowed to access that package". Every time we did a method lookup we would check the current package, and see if the permission hash allows access to the other package, if it didnt we would throw an exception. Obviously it would slow down such calls, maybe significantly, but perhaps it could be implemented in a way that one need not pay this price in production code.

@Grinnz
Copy link

Grinnz commented Jul 23, 2023

If it's meant to affect it at runtime then the effect would be dynamic and not lexical, and this seems incredibly complicated (tearing down the whole package namespace?)

Speaking abstractly and independently from the actual text of ovids proposal I don't see why it would that difficult to maintain a hash of permission data that models "from this package you are allowed to access that package". Every time we did a method lookup we would check the current package, and see if the permission hash allows access to the other package, if it didnt we would throw an exception.

Performance concerns aside, the fundamental problem here is that the design of this feature is to change behavior everywhere except in its scope, i.e. all existing code would no longer be able to call methods by default. The only way I could see that working is if there was another feature that applied such a restriction in that scope.

@demerphq
Copy link

change behavior everywhere except in its scope

Do you mean what I said, or what Ovid wrote? I have only skimmed Ovids text so maybe I missed something. As far as what i wrote I would implement it in such a way that it would only apply if the current package was registered in the top level of the hash. Something like this code:

if (my $allowed = $namespace_permission{$from_pack}) {
  unless ($allowed->{$to_pack}) { 
    die "You cannot call a method in $to_pack from $from_pack without requiring $to_pack explicitly" 
  }
}

So when the user enabled the feature in a given package we would add a subhash under the key $from_pack. When they required a given package we would basically do $namespace_permission{$from_pack}{$to_pack}++.

Am I misunderstanding your point?

@Grinnz
Copy link

Grinnz commented Jul 23, 2023

I'm referring to the feature proposed, which only involves a feature declaration in the scope where the module is required.

@demerphq
Copy link

I'm referring to the feature proposed, which only involves a feature declaration in the scope where the module is required.

Which as far as I understand it could be covered by the code I described, and would not impact other namespaces.

use Some::Module;
```

With the above, code outside of the above scope cannot see `Some::Module` unless

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @Grinnz is quite fairly calling out this as confusing or ambiguous. It should probably be worded:

Within the lexcial scope of the 'lexical_require' feature code cannot call methods against class names that have not been explicitly required within the current package, and doing so would throw an exception.  Methods would be allowed against any object (blessed reference), but not against a class.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@demerphq suggested:

Within the lexical scope of the 'lexical_require' feature, code cannot call methods against class names that have not been explicitly required within the current package, and doing so would throw an exception. Methods would be allowed against any object (blessed reference), but not against a class.

I read that as changing the meaning, but I'm unsure if it was meant to. Rewriting it:

Within a given lexical scope, ß, if the 'lexical_require' feature is used, code outside of scope ß cannot call methods against class names that have not been explicitly required within the current package, and doing so would throw an exception. Methods would be allowed against any object (blessed reference), but not against a class.

However, that doesn't mean the transitive dependencies aren't available. If scope ß uses lexical_require and Hash::Ordered, but scope uses Hash::Ordered but doesn't use lexical_require, then Hash::Ordered is still available to everyone as a transitive dependency.

Thus, this doesn't offer perfect protection, but it does mean that I can write my own code in such a way that people will be less likely to rely on my internals.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within a given lexical scope, ß, if the 'lexical_require' feature is used, code outside of scope ß cannot call methods against class names that have not been explicitly required within the current package, and doing so would throw an exception. Methods would be allowed against any object (blessed reference), but not against a class.

I think the problem with this is what @Grinnz was concerned about: use of a lexical feature in one scope is affecting code in another totally unrelated scope. In simple terms that is a no-no.

Its fine to say "within THIS scope I cant do something", but the way you have worded it becomes action at a distance, and a practical problem to implement to boot. IOW, if you want those semantics you probably cant have the feature. If you are willing to adopt my version on the other hand it is doable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, now I know I'm understanding you correctly. I'll rewrite the PPC as soon as I practically can. Makes total sense.

@demerphq
Copy link

@Grinnz I think i get where you are coming but I think that @Ovid was a touch ambiguous in his wording and worded it such that it might be interpreted as meaning the effect would be broader than i think he intended. If I have package X, which requires Y, and packages Z and W which do not require Y, and Z does not use this feature then it should be allowed to access methods in Y via the Y classname even in W did use this feature and was forbidden from accessing Y because it had not been explicitly required. IOW, the only package affected by this feature should be the package it is used within.

FWIW, I could almost see us saying "this shouldn't be lexically controlled like most features, and we should actually introduce a new keyword like 'namespace' which functioned like package, but had more restrictive semantics. EG, i think it would be awkward if

package Z;
Y->foo(); # legal
use feature `lexical_require`;
Y->foo(); # throws exception

If we had a 'namespace' keyword then it would be clear that this feature affected the entirety of code compiled within the Z namespace. (We might forbid the user of package Z if someone had previously used namespace Z).

@wchristian
Copy link

wchristian commented Jul 24, 2023

Code outside of that scope cannot use the required module unless it explicitly uses it

I suspect a lot of people would interpret this as follows:

use strict;
use warnings;
package Foo { use feature 'lexical_require'; use Meep 'marp'; marp() }
print $Meep::bar;
Meep::foo();
__END__
Name "Meep::bar" used only once: possible typo at -e line 1.
Use of uninitialized value $Meep::bar in print at -e line 1.
Undefined subroutine &Meep::foo called at -e line 1.

Besides documenting the stance on that (whatever it be), the PPC should explain why the stance is decided as it is.

@tonycoz
Copy link

tonycoz commented Jul 25, 2023

One issue here is that Name->foo() can resolve Name as a file handle, and that is resolved at runtime, so except in the case of no feature "bareword_filehandles" (assuming Perl/perl5#19426 receives a review) any bareword name could be resolved as a handle, making compile-time checking difficult.

The examples all use barewords, what happens with:

my $class = "SomeClass";
$class->somemethod();

if SomeClass isn't visible within the scope.

Of course, $class may have been passed in from elsewhere that has done use SomeClass;.

@shadowcat-mst
Copy link

To expand on the case @tonycoz mentions, an exported sub can also be invoked - things like MooseX::Types and Type::Tiny have to provide a certain amount of cleverness to make a class type for DateTime work since e.g. DateTime->new still needs to work when DateTime is a constant sub export that returns a type object.

Note that this extends to colon separated names as well since Foo::Bar->new will call a Foo::Bar() subroutine if one is present, although I don't recall that case being something I've -yet- had to take into account.

@shadowcat-mst
Copy link

There's an elephant in the room here - %INC

Testing the contents of %INC is used for more than one purpose, notably:

  1. Checking to see if a module is already available (used by e.g. on-demand require() code)

  2. Checking to see if a module is being used at all (used by e.g. App::FatPacker and JSON::MaybeXS)

If %INC outside of the lexical-require-using code shows the module then we'll end up thinking a package is available when it isn't and break case 1

If %INC outside of the code -doesn't- show the module then we break case 2

There's also things like type constraints (and other utility code such as is_class_loaded from Class::Load) that test to see if a class name is loaded/valid on behalf of their caller, I can't remember the actual name already on cpan but think

has foo => (is => 'ro', required => 1, isa => AlreadyLoadedClassName);

(and note that variations on this theme will also test $name->can('can') or ->can('isa') so it's not just %INC that we need to worry about for that one)

The fun part here is that for that to work you'd want the loadedness visible dynamically at the very least ... but of course having it dynamically visible in code that invokes user-supplied callbacks would then see the dependency as loaded and maybe accidentally work in the way this PPC is trying to avoid happening, or maybe result in something -thinking- a module is available and recording that in a variable somewhere but then when a different piece of code tries to use that information the program explodes.

With the above, code outside of the above scope cannot see `Some::Module` unless
it explicitly requires it.
Within a given lexical scope, **ß**, if the 'lexical_require' feature is used,
code outside of scope **ß** cannot call methods against class names that have

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ovid note this needs to be changed. You cant have a feature used in a given scope affect code in a totally different unrelated scope. (As discussed elsewhere.)

@demerphq
Copy link

NOTE all of the following comments are written assuming that @Ovid will change the PPC so that this feature only affects code WITHIN the scope it is used. IMO having a feature affect code outside of its scope is a complete non-starter, and we really don't need to discuss all the myriad reasons why. ("Action at a distance is bad" seems a sufficient reason.)

@tonycoz wrote:

One issue here is that Name->foo() can resolve Name as a file handle,

I think this might be one of those "well don't do that then". IOW, under this feature we would assume that any bareword was a class name, even if it wasn't necessarily. Sure that might mean that the code in question couldn't use bareword filehandles, and it couldn't use functions that look like barewords, but IMO who cares? The feature would be opt-in, so if someone wanted to use constructs like that then they would just not use the feature.

@shadowcat-mst wrote:

DateTime->new
...
this extends to colon separated names as well since Foo::Bar->new will call a Foo::Bar() subroutine if one is present

Same thing for these scenarios. When using this feature DateTime would be assumed to be a bareword, same thing for Foo::Bar. If you wanted to access DateTime() or Foo::Bar() you would be expected to write the parens. Again, this is an opt-in feature, so it is not like it is going to break existing code unless a developer explicitly asks for the feature.

@shadowcat-mst wrote:

If %INC outside of the lexical-require-using code shows the module then we'll end up thinking a package is available when it isn't and break case 1

Same thing here. The feature is opt-in, if code is messing about with %INC then it simply wouldn't use the feature.

@hvds
Copy link

hvds commented Jul 25, 2023 via email

@Ovid
Copy link
Author

Ovid commented Jul 25, 2023

If someone thinks this is interesting and wants to write a better PPC, please go for it. My wallet just got stolen and I have a ton of paperwork to go through. French bureaucracy is performance art. Fortunately, I still have my passport this time.

Edit: In other words, I might not get back to this for a while.

@demerphq
Copy link

@Ovid sorry to hear that. Ill edit the text as I suggested and you agreed with.

@tonycoz
Copy link

tonycoz commented Jul 25, 2023

To expand on the case @tonycoz mentions, an exported sub can also be invoked - things like MooseX::Types and Type::Tiny have to provide a certain amount of cleverness to make a class type for DateTime work since e.g. DateTime->new still needs to work when DateTime is a constant sub export that returns a type object.

I don't think this is a problem, since that resolution to a sub name is already done at compile-time, depending on the visibility of the name, compare:

$ ./perl -Ilib -E 'sub Foo() { "Test" } package Foo { sub bar { say "InFoo" } } package Test { sub bar { say "InTest" } } Foo->bar()'
InTest
$ ./perl -Ilib -E 'package Foo { sub bar { say "InFoo" } } package Test { sub bar { say "InTest" } } Foo->bar(); sub Foo { "Test" }'
InFoo

So this type of change could be compatible with the current compile-time selection of sub-name vs bareword-as-string, any checks for an imported class name could be done at compile-time after

@shadowcat-mst
Copy link

@demerphq the %INC question applies to anything -using- code that uses this feature, so "oh, just don't do that" would make it impossible to use the feature in a CPAN module.

@demerphq
Copy link

@shadowcat-mst please explain the problem you are thinking of in more detail. Assuming this is implemented the way I expect it should be I dont see how it could "make it impossible to use the feature in a CPAN module." Yes that could happen if ovids "action at a distance" model was applied, but IMO that is a totally non-starter anyway. As long as this behavior is restricted to code that uses the feature there should be no action at a distance and no affect on code that does not use the feature.

@shadowcat-mst
Copy link

@tonycoz the * prototype (and moral equivalents in built-ins) to take a symbol name (which is how bareword filehandles work AFAICR) also involve compile-time triggering, hence my considering the cases to be pretty similar.

I do agree that it's no longer a problem if you're careful to write the implementation in a way that avoids said problem - my goal was to raise that that would be required.

@shadowcat-mst
Copy link

@demerphq Since you directly quoted "If %INC outside of the lexical-require-using code" I mistakenly presumed you'd noticed my use of the word "outside" and were responding based on the model that affects things outside.

Obviously, if we go a route that -doesn't- affect anything outside then it isn't going to be an issue to code outside, but if that's what you meant then I'm not sure what you were trying to say.

@wchristian
Copy link

Quick question here, since i can't yet go by changed PPC text: When you rewrite the text, @demerphq, will you include the capability to warn/die on My::UnRequired::->meep but without requiring the ::?

@book
Copy link
Contributor

book commented Nov 22, 2024

@Ovid The discussion mentions a rewrite, and then suggest that someone else might "write a better PPC" if interested.

Which leads to the question: is this PPC PR still relevant? Or should we close this PR and wait for the idea to be championed again at a later time?

The Perl global namespace and the visibility of transitive dependencies are definitely interesting problem that are worth addressing.

@Ovid
Copy link
Author

Ovid commented Nov 24, 2024

@book Sorry I didn't respond sooner. Very busy right now. Here are some quick thoughts.

  1. Lexical vs Dynamic Scope Confusion

The biggest concern raised is that having a lexical feature affect code outside its scope violates lexical scoping principles.

So the suggested change would have this affect:

package Foo {
    # Without lexical_require, any class method calls allowed
    Some::Class->new;  # works if *anyone* has loaded Some::Class
    
    {
        use feature 'lexical_require';
        # Within this scope, must explicitly require modules
        use Some::Class;
        Some::Class->new;  # works
        Other::Class->new; # fails - not required in this scope
    }
    
    # Outside lexical scope, back to normal behavior
    Some::Class->new;  # works
}
  1. Package Variables

I don't think we should address this right now unless someone has a good suggestion. Some modules might very well declare package variables not matching their package names.

  1. %INC Implications

shadowcat-mst raised important points about %INC interaction affecting module loading detection. I think this feature would operates independently of %INC to avoid interfering with:

  • Module loading detection
  • Package existence checks
  • Dynamic loading systems

The restrictions apply only to method calls within the lexical scope, not to module loading state or symbol table manipulation.

I also suspect this would need to be a runtime check, not compile time, but that introduces overhead we may not want.

Global Symbol Table

All of the above feels kind of like a nasty, incomplete hack. One obstacle we've faced several times is that we have one global symbol table. This has shot down other ideas in the past. I would think the above might be easier to do if the hard work of making local symbol tables was implemented at some point. Perhaps using a namespace keyword.

namespace My::Namespace;
# at this point, we have a private namespace *distinct* from
# the global namespace.

use Foo;
use Bar::Baz;

my $foo   = Foo->new;         # good
my $drink = Bar::Keep->pour;  # bad

There would be no checking %INC. There would be no checking of namespace permissions.

This, incidentally, might lead us to solving the pernicious problem we have with Corinna wanting "trusted" methods which cannot be called unless:

  • It's called inside the class
  • Or optionally a subclass
  • Or some other modules sharing the same private namespace

Having a group of modules sharing a private namespace would get us closer to code isolation we don't have today, but I'm unclear how we'd stop someone else from injecting their code into the namespace.

However, that means this:

namespace Foo { # named private namespace, but anonymous allowed too?
    package My::Package;
    use Some::Other::Module;
    ...
}

package Calling::Package;
use My::Package;
say Some::Other::Module->get_stuff; # fails

In the above, Some::Other::Module would be loaded into the private namespace, not the global one, so this would effect code outside the Foo scope, something raised as a concern earlier. I think this solution is more sane, but probably a hell of a lore more work.

@ap ap force-pushed the main branch 2 times, most recently from f7e9443 to e4ba29b Compare November 24, 2024 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants