fl tag - Supporting lazy version of f-strings #6

Closed
jimbaker opened this issue May 8, 2022 · 31 comments

Comments

@jimbaker
Owner

jimbaker commented May 8, 2022

Building on @gvanrossum's issue #1, I have implemented a lazy version of f-strings. This demo code implements the fl tag such that it has the same user behavior as described in python/cpython#77135

Note that while fltag.py in the gist implements memoization on the value, it has not been optimized in any other way. (This was an important part of the discussion in the CPython issue above.) My assumption here is that an implementation of the fl tag could do some memoization on the raw string decode, as well as other optimizations. TBD.

This can be used as follows, as seen in the demo function in the gist:

    logging.info(fl'{expensive_fn()}')     # nothing logged, expensive_fn is not called
    logging.warning(fl'{expensive_fn()}')  # but this is

https://gist.github.com/jimbaker/bb27803755ce890ecbcae29927cb776e
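
For readers without the patched interpreter, the deferred evaluation and memoization described above can be approximated in plain Python. This is only a minimal sketch and not the code in the gist: the LazyFString class here is a hypothetical stand-in that takes literal strings and zero-argument callables (standing in for the interpolated expressions), and ListHandler, calls, and the logger name "fl_demo" are invented for the demo.

```python
import logging


class LazyFString:
    """Hypothetical stand-in for what an fl tag might return."""

    def __init__(self, *args):
        self.args = args      # literal strings and zero-argument callables
        self._value = None    # memoized result of the first str() call

    def __str__(self):
        if self._value is None:
            self._value = ''.join(
                format(arg(), '') if callable(arg) else arg
                for arg in self.args
            )
        return self._value


class ListHandler(logging.Handler):
    """Collects formatted messages so the demo can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


calls = []


def expensive_fn():
    calls.append("called")
    return 42


logger = logging.getLogger("fl_demo")
logger.setLevel(logging.WARNING)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# Below the threshold: no record is created, so str() is never called.
logger.info(LazyFString("value: ", expensive_fn))
calls_after_info = len(calls)

# At the threshold: the record is formatted, which calls expensive_fn once.
logger.warning(LazyFString("value: ", expensive_fn))
```

The laziness comes from logging itself: it only calls str() on the message object when a record is actually emitted.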

@gvanrossum
Collaborator

gvanrossum commented May 8, 2022

Awesome! I see you discovered that format(x, '') is equivalent to str(x). Nice! (I had forgotten.)
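
A quick check of that equivalence, as a sketch: the Point class is a made-up example type, and the claim only holds for types that do not override __format__ to treat an empty format spec specially.

```python
class Point:
    """Made-up example type with a custom __str__ and no __format__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return f"({self.x}, {self.y})"


values = [42, 3.5, "text", None, Point(1, 2)]

# format(v, '') is what an f-string does for a bare {v} replacement field.
all_equal = all(format(v, '') == str(v) for v in values)
```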

You also discovered the need for decoding "raw" strings. I'm not sure that your approach covers everything. The escapes I know of include:

  • \n and other classic one-letter escapes
  • \ooo octal
  • \xhh hex
  • \uhhhh 16-bit unicode
  • \Uhhhhhhhh 32-bit unicode (really 21-bit)
  • \N{unicode_name}
  • What did I miss?

@gvanrossum
Collaborator

Maybe we can similarly implement PEP 501 style "i-strings" using tag strings?

@jimbaker
Owner Author

jimbaker commented May 9, 2022

You also discovered the need for decoding "raw" strings. I'm not sure that your approach covers everything. The escapes I know of include:

  • \n and other classic one-letter escapes
  • \ooo octal
  • \xhh hex
  • \uhhhh 16-bit unicode
  • \Uhhhhhhhh 32-bit unicode (really 21-bit)
  • \N{unicode_name}
  • What did I miss?

We should be good here - nothing missed. The function _PyUnicode_DecodeUnicodeEscapeInternal in unicodeobject.c is used by both the parser (directly) and the codec supporting 'unicode-escape', and it implements the above escape logic.
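
The codec route can be tried in any stock Python. In this sketch, decode_raw is a hypothetical helper name, and it assumes the raw text is ASCII-only, because the 'unicode_escape' codec decodes bytes as Latin-1 and would mangle other characters.

```python
def decode_raw(raw: str) -> str:
    """Decode backslash escapes in a raw string (ASCII-only input assumed)."""
    return raw.encode("ascii").decode("unicode_escape")


samples = {
    "one-letter": decode_raw(r"\n"),
    "octal": decode_raw(r"\101"),
    "hex": decode_raw(r"\x41"),
    "16-bit": decode_raw(r"\u0041"),
    "32-bit": decode_raw(r"\U0001F600"),
    "named": decode_raw(r"\N{GRINNING FACE}"),
}
```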

So trying it out with the fl tag, along with using standard raw f-strings:

>>> fr'\N{{GRINNING FACE}}'
'\\N{GRINNING FACE}'
>>> fl'\N{{GRINNING FACE}}'
LazyFString(args=('\\N{GRINNING FACE}',))
>>> str(_)
'😀'

The only possible gotcha here is that in regular f-strings we don't double up braces for \N{unicode_name} (they are not parsed as being part of an expression). In raw f-strings and tag strings the braces are parsed as an expression unless they are doubled. That gives a fairly obvious SyntaxError with GRINNING FACE, but a less obvious result when the contents happen to be a valid name. For comparison, the regular f-string:

>>> f'\N{GRINNING FACE}'
'😀'
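
The raw f-string side of this can be reproduced without the tag string branch, since fr'...' parses braces the same way. A small sketch, where compiles is a hypothetical helper and x is an arbitrary variable chosen to show the "valid name" case:

```python
def compiles(source: str) -> bool:
    """Return True if source is a syntactically valid expression."""
    try:
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False


cooked = f'\N{GRINNING FACE}'          # escape is decoded, no interpolation
raw_doubled = fr'\N{{GRINNING FACE}}'  # doubled braces stay literal

# Single braces in a raw f-string are an expression: "GRINNING FACE" is not one.
raw_single_is_valid = compiles(r"fr'\N{GRINNING FACE}'")

# With a valid name inside, the braces silently interpolate instead.
x = 1
raw_valid_name = fr'\N{x}'
```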

@arcivanov

@jimbaker this is wonderful progress!
Perhaps I'm missing something but this code isn't compiling, obviously, on 3.10.4. Are you using a patched version of CPython?

@gvanrossum
Collaborator

@arcivanov
See #1. There's a branch of 3.11 involved, this is just a prototype. :)

@jimbaker
Ah, cool. I keep forgetting new things. :-)

The issue with braces in tag"\N{unicode name}" is unfortunate, I guess it would be gone if we didn't support raw strings. Why do we need those again? Maybe we don't?

@ericsnowcurrently
Collaborator

Raw strings are especially useful for regular expressions. I know of at least one case where tag strings would be useful for regular expressions: fixing indentation in a verbose pattern where there are interpolations that "include" subpatterns defined elsewhere.

@gvanrossum
Collaborator

Raw strings are especially useful for regular expressions. I know of at least one case where tag strings would be useful for regular expressions: fixing indentation in a verbose pattern where there are interpolations that "include" subpatterns defined elsewhere.

Could you elaborate on that example? IIUC in verbose re patterns indentation doesn't matter. So why would you need to fix it?

@ericvsmith
Collaborator

I think that if our choice is only "raw" or "cooked", we should go with raw. If we could come up with a clever way to say "this is a raw tagged string", then that would be ideal. But I don't see a good way of doing that. Maybe "fl-r", for a raw "fl" string? But it seems too ugly.

@gvanrossum
Collaborator

I think that if our choice is only "raw" or "cooked", we should go with raw. If we could come up with a clever way to say "this is a raw tagged string", then that would be ideal. But I don't see a good way of doing that. Maybe "fl-r", for a raw "fl" string? But it seems too ugly.

Definitely too ugly. :-)

I guess the tag"\N{blah blah}" issue can be solved in the f-string parser if there's enough motivation -- it "just" has to recognize \N{...} and not turn it into an interpolation. (This requires knowing whether the \ is itself escaped -- but it should already be keeping track of that in order to know when \" ends the string.)

@ericsnowcurrently
Collaborator

Could you elaborate on that example? IIUC in verbose re patterns indentation doesn't matter. So why would you need to fix it?

It really helps when debugging a large pattern. If the indentation of "included" sub-patterns isn't fixed, then the resulting pattern is harder to follow when you print it out. I've had to deal with this on occasion.
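
To make the indentation problem concrete, here is a sketch using today's raw f-strings rather than a tag. The include helper and the YEAR, MONTH, and DATE patterns are invented for illustration; a tag string could presumably apply the same re-indentation automatically at each interpolation.

```python
import re
import textwrap


def include(subpattern: str, indent: str) -> str:
    """Re-indent a multi-line subpattern to line up at its include site."""
    lines = textwrap.dedent(subpattern).strip("\n").splitlines()
    # The first line inherits the indentation already present in the template.
    return ("\n" + indent).join(lines)


YEAR = r"""
(?P<year>
    \d{4}    # four digits
)
"""

MONTH = r"""
(?P<month>
    0[1-9]   # 01-09
  | 1[0-2]   # 10-12
)
"""

DATE = rf"""
^
    {include(YEAR, '    ')}
    -
    {include(MONTH, '    ')}
$
"""

date_re = re.compile(DATE, re.VERBOSE)
```

Printing DATE shows both subpatterns nested one level under the anchors, which is the readability gain being described; the compiled pattern matches the same strings either way.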

@jimbaker
Owner Author

jimbaker commented May 9, 2022

\LaTeX support 😁 - it would be nice to generate Latex with Python templates. This has been a use case for me, and likely a future one.

@gvanrossum
Collaborator

Okay, raw mode seems useful enough to support.

@gvanrossum
Collaborator

But wait. Aren’t curlies just as prevalent in Latex as backslashes? So what would you gain?

@jimbaker
Owner Author

But wait. Aren’t curlies just as prevalent in Latex as backslashes? So what would you gain?

There are a lot of metacharacters in Latex. But there's a difference between working with something balanced like {{...}} and doubling \\. Or worse doubling separators with \\\\. Also in practice - or at least what I have done - there are more symbols that are specified simply by \sym than being parameterized. Again, this is for generated Latex, not writing it in general.

There is this workaround for Jinja. I don't think the fact that it can be customized actually helps here: http://eosrei.net/articles/2015/11/latex-templates-python-and-jinja2-generate-pdfs
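
The trade-off between doubling braces and doubling backslashes can be seen with ordinary raw and non-raw f-strings. This is an illustrative sketch only; the function names are made up.

```python
def frac_raw(num, den):
    # Raw: backslashes are literal, only the braces are doubled.
    return rf"\frac{{{num}}}{{{den}}}"


def frac_cooked(num, den):
    # Cooked: braces are doubled and every backslash is doubled too.
    return f"\\frac{{{num}}}{{{den}}}"


def row_raw(a, b):
    # A LaTeX row separator is two backslashes; raw keeps it as written.
    return rf"{a} & {b} \\"


def row_cooked(a, b):
    # Without raw mode the separator needs four backslashes in the source.
    return f"{a} & {b} \\\\"
```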

@rmorshea
Collaborator

rmorshea commented Jun 2, 2022

This reminds me of something I made because I wanted f-string style templates.

@jimbaker
Owner Author

jimbaker commented Jun 2, 2022

@rmorshea The need to do frame inspection (as in https://github.com/rmorshea/fstr/blob/master/fstr/fstr.py#L33) is a common requirement we see in other templating approaches where we want to avoid repeating oneself and have direct access to expressions - which is perhaps why f-strings have become so popular. See for example

(I'm sure there are many, many more examples out there!)

The sharp edge here is that sys._getframe (however it is wrapped) provides access to the dynamic scope of a given name. This can certainly be useful, but lexical scope is what f-string uses - and tag strings more generally.
In particular, the lack of lexical scope is why more complex - but still very much popular - usage patterns fail, as seen in this issue: jviide/htm.py#11
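
The difference between the two kinds of scope can be shown with a few lines of frame inspection. This is a sketch of the general technique, not the code of any library mentioned here: fstr_eval, helper, outer, and closure_case are hypothetical names, and eval is used only because the templates are trusted literals.

```python
import sys


def fstr_eval(template: str) -> str:
    """Evaluate template as an f-string in the *caller's* frame."""
    frame = sys._getframe(1)
    return eval(f"f{template!r}", frame.f_globals, frame.f_locals)


def helper(template):
    name = "helper"
    # The caller of fstr_eval is helper, so this 'name' wins.
    return fstr_eval(template)


def outer():
    name = "outer"
    direct = fstr_eval("{name}")   # resolved in outer's frame
    via_helper = helper("{name}")  # resolved in helper's frame instead
    return direct, via_helper


def closure_case():
    label = "x"

    def inner():
        # 'label' appears only inside the string, so the compiler never
        # captures it in inner's closure and the frame lookup cannot see it.
        return fstr_eval("{label}")

    return inner()
```

A real f-string written inside inner would capture label lexically; the frame-based lookup raises NameError instead.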

@jimbaker
Owner Author

jimbaker commented Jun 2, 2022

I added fl.py to the examples in the repo - this is a cleaned up version of the original gist.

@rmorshea
Collaborator

rmorshea commented Jun 2, 2022

I think if I were doing this now I'd drop the .evaluate() method for the reason you mention, in addition to the fact that it's a bit too "magical". Rather, I'd just want .format(**variables) as in the "Use f-string syntax instead of str.format" example.

A reworked version of the second example might be:

from fformat import fformat

common_error_message = fformat'function {function.__name__!r} failed because {error}'

def add(x, y):
    try:
        return x + y
    except Exception as e:
        msg = common_error_message.format(function=add, error=e)
        print(msg)

def sub(x, y):
    try:
        return x - y
    except Exception as e:
        msg = common_error_message.format(function=sub, error=e)
        print(msg)

add(1, "2")
sub("5", 3)

If this seems compelling I can take a crack at implementing it.

@jimbaker
Owner Author

jimbaker commented Jun 5, 2022

@rmorshea I've been thinking about the example with common_error_message, which also resembles the approach discussed in #2

First, we can do something like what you propose, given that a tag string when evaluated simply returns some object, which could support a format method (or possibly, __format__).

My feeling, however, is that we might just want to wrap this in a function, much like we do with some of the nested tag strings, such as what we see in the html example. So this could work:

from fformat import fformat

def common_error_message(function, error):
    return fformat'function {function.__name__!r} failed because {error}'

def add(x, y):
    try:
        return x + y
    except Exception as e:
        msg = common_error_message(add, e)
        print(msg)

@gvanrossum
Collaborator

Why couldn't that use f-strings?

@jimbaker
Owner Author

jimbaker commented Jun 5, 2022

@gvanrossum I'm assuming - perhaps wrongly! - that fformat is doing something that's not quite what f-strings would do. Otherwise, use the f-string or the equivalent format string.

@rmorshea Am I completely off with respect to the intent of the example?

@rmorshea
Collaborator

rmorshea commented Jun 6, 2022

The intention was simply to have a way to use f-string syntax in a re-usable template. So to @gvanrossum's question, the answer is yes and no. Yes, you can do this with an f-string in a function as @jimbaker showed above, but no, you can't do it only with f-strings. If you were to do this just with f-strings you'd need to copy-paste the f-string and substitute in the appropriate variables (e.g. switching out add for sub as the function in the msg):

def add(x, y):
    try:
        return x + y
    except Exception as e:
        msg = f'function {add.__name__!r} failed because {e}'
        print(msg)

def sub(x, y):
    try:
        return x - y
    except Exception as e:
        msg = f'function {sub.__name__!r} failed because {e}'
        print(msg)

@gvanrossum
Collaborator

That's sort of what I figured. But if you have to write code that is essentially template.format(function=..., error=...) where template is a global (template = fformat"function {function.__name__!r} failed because {error}") then this really does beg the question of whether it isn't better to just use a function (to be called using the same signature as template.format())? If we're trying to sell the idea that tag strings enable use cases that we couldn't have before, this particular example makes for a pretty weak argument.
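
For what it's worth, both spellings already work in stock Python, which illustrates the point. In this sketch TEMPLATE is an ordinary str.format template (not a tag string), and the names are borrowed from the example above:

```python
# Plain str.format template: attribute access and !r are both supported.
TEMPLATE = "function {function.__name__!r} failed because {error}"


def common_error_message(function, error):
    # The same message as an ordinary f-string wrapped in a function.
    return f"function {function.__name__!r} failed because {error}"


def add(x, y):
    return x + y


try:
    add(1, "2")
except TypeError as e:
    via_function = common_error_message(add, e)
    via_format = TEMPLATE.format(function=add, error=e)
```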

@pomponchik

I also wrote my own implementation of lazy f-strings. It's even simpler to use:

import f

number = 33
f('{number} kittens drink milk')

The string is actually computed, and then cached, on first access. This happens transparently to the user and works, for example, for logging.

My argument for why this feature should still be built into the interpreter is speed. I don't see any way to achieve a speed comparable to the original f-strings. It is especially expensive to extract variables from closures - it cannot be done as efficiently as the interpreter does inside itself.

@jimbaker
Owner Author

@pomponchik right, it's quite possible to dynamically look up variables in a number of ways from Python's frames. One can also generalize to expressions. The numexpr package, which builds on NumPy, does something similar to your implementation; see https://github.com/pydata/numexpr/blob/master/numexpr/necompiler.py#L725 (The difference is that inspect.stack as used in fazy calls sys._getframe.)

One difference in this proposal, besides being faster (or should be), is that the lookups of any variables in such expressions are lexically scoped. This is a well-known problem when composing with respect to nested functions, or implicitly with list comprehensions/generator expressions; see @pauleveritt's comment on a similar library jviide/htm.py#11

So that's why we need language support, or at least a transpiler, as @rmorshea has worked on in #20

@pomponchik

pomponchik commented Apr 20, 2023

@jimbaker My implementation takes into account lexical nesting and takes these variables too, you can check. For generator expressions, my library also works perfectly:

>>> list(f('{x}') for x in range(10))
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In general, I don't see anything that I fundamentally couldn't implement as a library in this case. The one thing I can't reliably do as a library is make it fast.

@jimbaker
Owner Author

@pomponchik I tried it on a modified version of Paul's example, and it didn't blow up:

import f

todos = ['breakfast', 'lunch', 'dinner']
def Todo(label, prefix):
    return f('<li>{prefix}{label}</li>')


def TodoList(prefix, todos):
    return f('<ul>{[Todo(label, prefix) for label in todos]}</ul>')

print(TodoList('item - ', todos))

So my first reaction is that Python maintains enough lexical information at runtime to make it possible to recover the lexically scoped lookup, albeit with rather complex (and, as you note, slow-running) code. I would have to see if I can find a counterexample, but the test cases you linked certainly cover the obvious one.

Obviously if this is true, then it is possible to implement an arbitrary tag scheme with ordinary functions, similar to what was done in https://github.com/jviide/tagged, but respecting lexical scope.

Also I'm rather impressed with your code here. It does go deep into Python's internals!

@gvanrossum
Collaborator

How would you enforce that f() isn't called with a variable argument? That would be an attack vector.

@pomponchik

pomponchik commented Apr 21, 2023

@gvanrossum I would do this by extracting the code object from the stack and analyzing its AST. The AST node must be a constant. This improvement had not occurred to me until now, but in principle it looks like I could do it. I should note, though, that it would make things even slower.
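
The constant check itself is straightforward once source text is available. The sketch below makes a simplifying assumption: it parses a source string directly rather than recovering it from a code object on the stack, and non_literal_f_calls is a hypothetical helper, not part of any library discussed here.

```python
import ast


def non_literal_f_calls(source: str, func_name: str = "f") -> list:
    """Return line numbers of func_name(...) calls whose argument
    is not a single string literal."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == func_name):
            is_literal = (
                len(node.args) == 1
                and not node.keywords
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)
            )
            if not is_literal:
                offenders.append(node.lineno)
    return sorted(offenders)


example = (
    "f('{x} kittens')\n"   # literal: fine
    "f(user_input)\n"      # variable: flagged
    "f('a' + suffix)\n"    # computed: flagged
)
flagged = non_literal_f_calls(example)
```

A check like this only sees calls spelled with the bare name, so an alias such as g = f would slip past it.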

@gvanrossum
Collaborator

FWIW I don't want to keep discussing your f(...) implementation, it's irrelevant to the idea of tag strings (which are by design a syntactic feature).

@jimbaker
Owner Author

We have a working example of the fl tag, so closing out this issue.

7 participants