Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

starlark: add support for addressing #101

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

alandonovan
Copy link
Contributor

This change adds support for addressable variables and the unary
prefix operators * and &, which are needed to correctly support
Stargo (see wip-stargo branch), an upcoming extension that enables
Starlark bindings for Go types and variables.

Start reading at the starlark.Variable interface for background.

All new features are behind the -addressing flag.

@alandonovan alandonovan force-pushed the addressing branch 2 times, most recently from 940f76b to 5f85d6c Compare January 4, 2019 17:00
This change adds support for addressable variables and the unary
prefix operators * and &, which are needed to correctly support
Stargo (see wip-stargo branch), an upcoming extension that enables
Starlark bindings for Go types and variables.

Start reading at the starlark.Variable interface for background.

All new features are behind the -addressing flag.

Change-Id: I0258f5d38f85c0d03bb08dc98165b0335a88a98a
@natefinch
Copy link
Collaborator

I am looking at this but it'll take a bit of time to get up to speed. So far, I like the way it looks.

@alandonovan
Copy link
Contributor Author

Thanks. If something is not clear, do ask; that's the whole point. :)

BTW, I think a small extension to the mechanism in this CL could support all the delve team's needsa statement such as x = 1, where x is a predeclared starlark.Variable, could update a Go variable. But Stargo doesn't need it, so it can wait for a follow-up.

Copy link
Collaborator

@jayconrod jayconrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the glacial review speed. Took me a while to get through this one. :)

I have some misgivings about this change. Even though all this is behind a dialect flag, I feel like this is specializing the language too much toward a specific application. The change doesn't make sense without the context of stargo. If introduced addressing as a general language feature, I'd expect it to work with regular lists and objects and even local variables, not just external types that satisfy Variable.

I'm a little nervous about the language change itself as well. This is a significant departure from Starlark being mostly a subset of Python in a way that the other dialect flags are not. This is a step toward becoming a new, separate language with its own features.

All that being said, this seems very useful for Stargo. I'd love to suggest a way to make the language extensible by embedders, but this seems like a deep change. I'm not sure I can suggest a better approach. If only this were common Lisp and we had read macros...

@@ -749,7 +749,7 @@ func (p *parser) parseArgs() []Expr {
// | '[' ... // list literal or comprehension
// | '{' ... // dict literal or comprehension
// | '(' ... // tuple or parenthesized expression
// | ('-'|'+'|'~') primary_with_suffix
// | ('-'|'+'|'~'|'&') primary_with_suffix
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also '*'


default:
fcomp.expr(e)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: delete blank line here.

// of a Go package such as http.DefaultServeMux are always accessed
// using a dot expression.
//
// A concrete type that satisfies Variable could be represented by a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, there aren't any types that satisfy this interface in the starlark package, correct?

fcomp.expr(lhs.Y)
fcomp.emit(DUP2)
fcomp.setPos(lhs.Lbrack)
fcomp.emit(INDEX)
if resolve.AllowAddressing {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems error-prone to emit different bytecode depending on dialect flags. An application might compile and cache code for a library that doesn't depend on this flag. The library could be loaded by different future invocations of that library in different modes.

One solution would be to include the resolve flag in the serialized header. But I think I'd prefer the simplicity of emitting these opcodes unconditionally. They're always no-ops if AllowAddressing is false, right? How severe is the overhead?

//
// In Addressing mode, all nonrecursive calls to addr() (a chain of ATTR
// and INDEX ops) must be followed by one of ADDRESS, SETFIELD,
// SETINDEX, or VALUE:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really confused me until I read the interpreter's code for VALUE a second time: it's a no-op for values that don't implement Variable. I think that should be mentioned here, too.

@@ -688,7 +720,7 @@ func (r *resolver) expr(e syntax.Expr) {
r.errorf(pos, "multiple *args not allowed")
}
seenVarargs = true
r.expr(arg)
r.expr(unop.X)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, f(*x) is interpreted as an expansion of arguments, not as a dereference. Makes sense, but the spec should be clear on how this ambiguity is resolved.

@alandonovan
Copy link
Contributor Author

alandonovan commented Jan 8, 2019

Sorry for the glacial review speed. Took me a while to get through this one. :)

Thanks, there's no rush. :)

I have some misgivings about this change. Even though all this is behind a dialect flag, I feel like this is specializing the language too much toward a specific application. The change doesn't make sense without the context of stargo. If introduced addressing as a general language feature, I'd expect it to work with regular lists and objects and even local variables, not just external types that satisfy Variable.

I'm a little nervous about the language change itself as well. This is a significant departure from Starlark being mostly a subset of Python in a way that the other dialect flags are not. This is a step toward becoming a new, separate language with its own features.

All that being said, this seems very useful for Stargo. I'd love to suggest a way to make the language extensible by embedders, but this seems like a deep change. I'm not sure I can suggest a better approach. If only this were common Lisp and we had read macros...

I'm glad you expressed these concerns, because you are absolutely right and it tells me you are paying close attention. :)

This change indeed makes major concessions towards Stargo, and a follow-up change I'm considering would do the same for the Delve REPL. Both changes are entirely Go-centric and would be tricky to implement (and likely unwanted) in another implementation of Starlark. Essentially I think the value of this proposed change to the Go ecosystem as a whole is worth the considerable costs you identify, as it would provide a familiar and high-quality scripting or interaction system to any Go application that wants it, while continuing to serve the needs of Bazel-like clients (of which I am one).

I'll look over your comments on the code later.

@laurentlb
Copy link
Contributor

Ideally, all dialects/embeddings of Starlark should be able to share tooling (IDE support, formatter, refactoring, linter...). Each time you extend the language, you create burden for the tooling and this can fragment the ecosystem.

When a change looks reasonable and can be useful to lots of users, it can be added to the spec.
But this change looks too specific, adds complexity to the language (I'm sure many people will be confused by the syntax), and is outside the Python-syntax subset.

@alandonovan
Copy link
Contributor Author

Ideally, all dialects/embeddings of Starlark should be able to share tooling (IDE support, formatter, refactoring, linter...). Each time you extend the language, you create burden for the tooling and this can fragment the ecosystem. [T]his change ... is outside the Python-syntax subset.

These too are all good points.

@natefinch
Copy link
Collaborator

So I guess my question is - do we need special syntax to make m.f = 1 "work" ?

From the stargo point of view, if m is a pointer to a struct that got wrapped by stargo.. that can just work. It works in my starlight package. Please note, I only dabble in python, so there may well be obvious things about the way python works normally that I don't notice.

When does starlark code need to care about pointers versus not? In normal go code, aside from type signatures and at value creation time, & and * are pretty rare. Sure maybe sometimes a function produces a pointer and some other function takes a non-pointer, but it's just not very often, in my experience. Starlark doesn't have type signatures, nor value creation of go types (AFAIK?). If it's sufficiently rare for stargo to actually need to care, maybe they could be built-in functions instead, which would keep the syntax pythonic and consistent with other dialects, if a bit verbose.

@alandonovan
Copy link
Contributor Author

do we need special syntax to make m.f = 1 "work" ?

Obviously we don't need special syntax to make m.f =1 work.

When does starlark code need to care about pointers versus not? In normal go code, aside from type signatures and at value creation time, & and * are pretty rare.

The *ptr operations could relatively easily be replaced by function calls:

y = *ptr           -- to dereference a pointer. This could be done with a function call, go.deref(ptr).    
*ptr = y           -- to update the variable pointed to by a pointer. This could be go.update(ptr, value).

The * operator as applied to types (e.g.*ast.ReturnStmt) is also syntactic sugar:
https://github.com/google/starlark-go/blob/wip-stargo/staranise/iterator_leak.star#L24
It could be replaced by go.ptr_to(ast.ReturnStmt).

The really crucial one is &:

ptr = &x.f[i]     -- to obtain a pointer to an element of a Go variable of kind struct or array.

About 3% of non-comment non-blank lines in this repo use the & operator, so it's unusual but not obscure.

If x is a struct variable and f is an array, then x.f[i] is a variable, so y = x.f[i] yields its value and x.f[i] = y updates it. All this can be supported without new syntax. You can even call methods on its address, x.f[i].M(), because Stargo does the implicit-& operation on variables just like the Go compiler. The problem comes when you need to get the address of x.f[i] and pass it to another function. I don't really see how one can do this without additional syntax (or intrinsics) because the compiler needs to generate different code for it.

You could argue that variables of this kind are not so common, and would be especially uncommon in the kinds of APIs users choose to expose to Stargo---users learn to avoid things that just don't work---but that seems like bad design. Also, I believe the addressing features in this PR, with one small tweak, would enable the Delve debugger team to implement what they need; a debugger surely can't be picky about which programs it looks at.

@natefinch
Copy link
Collaborator

Obviously we don't need special syntax to make m.f =1 work.

Well, I wouldn't have asked if it were obvious to me. :)

I definitely find this line to be very confusing:
assign_stmt = *ast.AssignStmt

But then, at least part of that is not realizing that you can assign a type to a variable in stargo.

So it sounds like, if we desired to avoid new operators, we'd need four intrinsics:

val = go.ptr_to(pkg.Foo) // val is a type value that is a *pkg.Foo
go.update(ptr, val) // equivalent to *ptr = val
val = go.deref(ptr) // equivalent to val = *ptr
ptr = go.addr(val)  // equivalent to ptr = &val

I actually think that's mostly ok. Not as elegant as operators, but more explicit, and easy for most people to understand. I wish update had a better name, though.

@alandonovan
Copy link
Contributor Author

Well, I wouldn't have asked if it were obvious to me. :)

Sorry, what I meant was that although m.f = 1 does need special support when m is a Go variable, that support is in the compiler; the scanner and parser already do the necessary things.

I definitely find this line to be very confusing:
assign_stmt = *ast.AssignStmt
But then, at least part of that is not realizing that you can assign a type to a variable in stargo.

ast is a Starlark value representing a Go package. ast.AssignStmt is a Stargo value representing a Go reflect.Type. That value supports the unary prefix * operator, and its effect is to return a pointer type. Just as in Go, *T and *ptr mean different things but use the same syntax.

So it sounds like, if we desired to avoid new operators, we'd need four intrinsics:
val = go.ptr_to(pkg.Foo) // val is a type value that is a *pkg.Foo

We have this one already.

ptr = go.addr(val) // equivalent to ptr = &val

No, this doesn't work. Passing val as an argument to go.addr reads the content of the variable, which is a plain r-value. It has no address. The compiler must treat expressions of the form &x.f and &a[i] specially.

@natefinch
Copy link
Collaborator

ok, thanks for clarifying. So need a special operator for getting the address of a variable, since that concept just doesn't exist in python, and there's no way a built-in can do that, because at best it gets passed a normal r-value. That makes sense.

I'm not an expert here, but in theory could we make a built-in keyword that isn't off the go namespace? So just maybe addr(val) ? Then starlark-go's compiler could treat that basically the same as &val, but other tools could just treat it like it's a normal function call.

I'm just spitballing here, trying to come up with fixes that work best for everyone.

Maybe * and & would fine, I don't know. I do think it could be unfortunate to lose out on tools that understand the language... however, I kind of wonder how many of those might actually exist if other implementations are much more limited than stargo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants