Skip to content
This repository has been archived by the owner on Feb 3, 2019. It is now read-only.

Latest commit

 

History

History
540 lines (458 loc) · 22.2 KB

TODO.org

File metadata and controls

540 lines (458 loc) · 22.2 KB

GSoC

all time

HeadlineTime
Total time2d 11:27
GSoC2d 11:27
\__ Time tracking & TODOs1d 22:26
\__ Notes13:01

today

HeadlineTime
Total time12:11
GSoC12:11
\__ Time tracking & TODOs12:11

Time tracking & TODOs

[#A] Implement a reference Clojure compatibility mode

[#A] lein-oxcart

Create an interface for using oxcart either standalone or via leiningen to compile Clojure source to bytecode

[#A] Precondition coverage

is woeful at best, go back and add appropriate coverage

[#A] Namespace level emitter

  • Is one feasable?
  • Depends on Called-by-what analysis pass for fn to method reduction correctness
  • Depends on Multi-arity to single arity reduction for fn to method reduction correctness
  • Depends on Constant inliner pass to get the most out of inlining, as this will enable more functions to be demoted to methods from IFns.
  • An issue is that namespaces are not really “modules” due to the implementation of the (ns) macro which simply alters the state of the var table mapping which constitutes a “namespace” through the emission of require and use forms which create var table side-effects. Since the idea with Oxcart is that all this var table garbage is garbage ideally I want to be able to escape the var table altogether for the overwhelming majority of vars to the point that it may make sense for each load() to do its own explicit var table manipulation. The information required for doing this could probably be derived from (ns-map) but we’ll see how that goes.

[#A] Whole program level emitter

  • Is one feasable?

[#A] Improve the whole program AST to a mutable object tree via transients for improved update semantics and performance

This is probably going to be a fun exercise in building a Cortext style CES out of TANAL trees, identifying duplicated forms (vars, locals) reducing them, returning the result for rewriting and then providing an interface to render it back to a TANAL tree.

see https://www.refheap.com/86571 for the original cut of the code in question, which could bear some major tightening up.

^:static reading

tbaldridge said Rich has previously implemented a lot of this stuff, and that it has been reverted out for dubious reasons. Tracking down what this was and when it happened would probably be a good start to determining whether or not I can reasonably make the requisite changes as part of GSoC.

662b38415e15edcbd720628c0c07a8f8817c96b4 This commit added ^:static annotations all over the core and cleaned up the core compiler support for it.

ca737838aa65970775c58cd3a72fea4221a67bda This commit added ^:static compiler support in the first place. Looks like all this did/does is create invokeStatic methods on static tagged classes, allowing the compiler to escape vars entirely for some subset of a program.

http://stuartsierra.com/2013/03/29/perils-of-dynamic-scope “In Clojure versions 1.2 and earlier, all Vars had dynamic scope by default, but this meant that there was a performance cost to look up the current dynamic binding of a Var on every function call. Leading up to 1.3, Rich Hickey experimented with allowing Vars to be declared ^:static, before settling on static by default with ^:dynamic as an option. You can still find ^:static declarations littered through the Clojure source code. Maybe someday they’ll be useful again.”

http://dev.clojure.org/display/design/%27Lean%27+Runtime http://dev.clojure.org/display/design/Build+Profiles

Read up on InvokedPrim

<Bronsa> arrdem: btw it looks like an approach like invokeStatic (assumed I understood this correctly) should be more robust for removing the var indirection rather than using ^:const because it should work fine with fn with closed-overs too <hiredman> Bronsa: well, :static generated static methods, so closed overs wouldn’t work <Bronsa> hiredman: oh right <Bronsa> arrdem: I derped. ^ <hiredman> Bronsa: you could just get rid of manual :static flagging, do λ lifting and implement any function that doesn’t close over anything and is never used as a value as a static method <hiredman> (you know, for the clojure compiler you are working on)

Reach set of classes

It would be very useful to be able to compute the reach set of a given expression in terms of Java classes accessed. This would allow me to simply ban clojure.lang.Var.alter and soforth as Java methods rather than having to do the unsupported symbol warning at the Clojure level. I’m not sure if t.a.jvm supports this but it’s worth asking. This is possible for static typing of classes but is not generally possible. Thanks to type hints indicating var type it may be possible to infer the banning of most of the clojure core var manipulation code.

[#B] Constant inliner pass

  • Rewrite ((partial f a b ..) g h) → (f a b .. g h)
  • Rewrite (partial f) → f
  • General invoke inlining potentially.
    <Bronsa> arrdem: btw, RE: automatically adding :inline  [19:39]
    <arrdem> yar
    <Bronsa> do you realize that :inline points to a function, not a boolean?
    <arrdem> yeah which is unfortunate.
    <Bronsa> I mean, I guess you could just walk the macroexpanded fn body
             wrapping all the args with clojure.core/unquote  [19:40]
    <Bronsa> but I'm not too sure how easy that would be to do
    <arrdem> well so the "real" fix is that you take the def'd fn, and then
             partially evaluate it in the invoke context using the provided
             arguments.
    <arrdem> so (f x) becomes ((fn [x] (+ 1 x)) x) which you can then do term
             rewriting on  [19:41]
    <Bronsa> right
    <arrdem> which is entirely general so long as you know what code is pure and
             can be rewritten...
    <arrdem> but we don't have purity tags either...
    <arrdem> but that's really just a bound on the partial evaluator not your
             ability to replace {:invoke} with {:do}  [19:42]
    <arrdem> just create a {:op :let} with the appropriate bindings and let it go
                                            [19:43]
    <*arrdem> adds this to the todo list
        

    See the GHC paper on Haskell inlining for some awesome background

  • Can apply ever be rewritten? (no)
  • Are there non-function values that make sense to try and inline, reduce or fold?

[#B] Build Oxcart with itself

This is probably pretty low priority, but I think it’s something that I’ll get for free since Oxcart doesn’t leverage anything (yet) beyond the static subset of Clojure which I’m trying to compile.

[#B] Hierarchy precomputation

Hierarchies are something else that could likely be precomputed and stored, but maybe not since it’s entirely meaningful to build a hierarchy at runtime. Whatever, they’re a weird and uncommon language feature anyway.

[#B] TCO

No really. Basic TCO isn’t that hard to do. More interesting multiple function to state machine transforms could also be valuable, but only if supported by bytecode generation. Multiple fns get concatinated with appropriate state transition code rewrites, but the state transition is a jump and the stack configuration should be the same in a jump to a body as a call, so this may be something I can get for free.

 method a(args) = apply _gen a-id-const args
 method b(args) = apply _gen b-id-const args
 method c(args) = apply _gen c-id-const args

 method _gen(id-const . args) =
    case id-const:
       a-id-const:
	    .label a
	    < a implementation >
	  b-id-const:
	    .label b
	    < b implementation >
	  c-id-const:
	    .label c
	    < c implementation >

Note that since all args are on the stack “jumping” from a to b or whatever is entirely correct since the only consumed argument stack value is the entry point ID. This means the argument stack must be in function call order, which means generating naive bytecode is correct. Similarly replacing tail calls in the block with bytecode gotos is then correct because the stack must be in call order otherwise the call would be invalid.

[#C] Partial evaluation of map

Creating partially evaluated specializations of map and other clojure core functions that are specialized on a function

[#C] InvokeDynamic

  • The spec has changed
  • The performance has changed
  • It may or may not be worthwhile

[#C] Multimethod precomputation

Statically identify multimethods as defs and cache final their fully computed dispatch tables as static values discarding all other manipulating operations.

This will also require hoisting defmethod forms to defn forms and eliminating the generated method instalation code.

Note that internally multimethods use a table of dispatch values to IFns, so doing this will require computing an inverse mapping from IFns to Vars, and then writing a new root function which holds the dispatch table as a static value.

[#C] Multimethod static dispatch

Given that I’m already proposing to precompute multimethods dispatch tables it’d be cool to try and look through the resulting table and try to precompute the dispatch target of a given multimethod use.

[#C] Implement compilation configurations & profiles

  • Indicate preserved vars
  • Preserve all vars
  • Default of optimize everything
  • Default of optimize nothing

[#C] Extend compilation profiles with symbol level annotations

[#C] Typechecking

Add `clojure.core.typed` annotations to all Oxcart code

[#D] Compiler introduced transients

Apply pointer analysis to structural sharing and attempt compiler introduction of transients

[#D] core.typed integration

Interface with `clojure.core.typed` to provide compiler introduction of `core.typed` derived records and runtime typechecking

Reading into the JVM

Patching TANAL/TEJVM

TEJVM classfile emitter

Bookkeeping

[#C] Emit argument specializations

When type hint permits kick out a specialized invoke and dispatch to it.

Notes

AST structure

https://github.com/halgari/data-all-the-asts/blob/master/src/data_all_the_asts_talk/core.clj#L181 https://www.youtube.com/watch?v=KhRQmT22SSg

So there are several parts to a AST-as-data

:op ;; allows for polymorphic dispatch on the node type :form ;; original un-analized form :children ;; child nodes in “execution order” :env ;; a grab-bag of data about the context of the node

Whole program AST

I need to be able to say that this var maps to that ast with that reach set. Then I can say “collect all reach sets” into a single sum reach set and then do var emission on that basis.

{#’clojure.core/conj -> (ast conj) … }

To compute this whole program AST we’re gonna have to do some weirdness with clojure.core/load and clojure.core/eval to interact with tools.analyzer. tools.analyzer.jvm isn’t far behind in all this but I don’t think I need it yet.

The reality of the matter is that “lein uberjar” is how 99.9% of Clojure applications get packaged and consequently whatever I wind up with for an optimizing compier that faces the user is gonna have to have a lein plugin of some sort which will run my/Nicola’s compiler over the input fileset as determined by the same logic that lein uses to determine what gets uberjared.

Clojure core symbol blacklist

  • clojure.lang.Var.alterRoot()
  • clojure.lang.Var.alter()
  • clojure.lang.Var.set()
  • clojure.core/alter-var-root
  • clojure.core/set! (compiler special form)

The form (clojure.lang.Var/set <X>) analyzes down to

     (defn clear-env [ast]
	(clojure.tools.analyzer.ast/prewalk
	 ast #(dissoc %1 :env)))

     (use 'clojure.pprint)

     (->> (clojure.tools.analyzer.jvm.core-test/ast1
	    (fn [x]
	      (clojure.lang.Var/set x (Long. 1))))
	   clear-env
	   pprint)

     ;; {:top-level true,
     ;;  :children [:methods],
     ;;  :op :fn,
     ;;  :form (fn* ([x] (clojure.lang.Var/set x (Long. 1)))),
     ;;  :variadic? false,
     ;;  :max-fixed-arity 1,
     ;;  :methods
     ;;  [{:children [:params :body],
     ;;    :loop-id loop_7404,
     ;;    :params
     ;;    [{:form x,
     ;;      :name x,
     ;;      :variadic? false,
     ;;      :op :binding,
     ;;      :arg-id 0,
     ;;      :local :arg}],
     ;;    :fixed-arity 1,
     ;;    :op :fn-method,
     ;;    :variadic? false,
     ;;    :form ([x] (clojure.lang.Var/set x (Long. 1))),
     ;;    :body
     ;;    {:body? true,
     ;;     :op :do,
     ;;     :form (do (clojure.lang.Var/set x (Long. 1))),
     ;;     :statements [],
     ;;     :ret
     ;;     {:children [:target :args],
     ;;      :args
     ;;      [{:children [],
     ;;        :assignable? false,
     ;;        :form x,
     ;;        :name x,
     ;;        :variadic? false,
     ;;        :op :local,
     ;;        :arg-id 0,
     ;;        :local :arg}
     ;;       {:op :new,
     ;;        :form (new Long 1),
     ;;        :class Long,
     ;;        :args
     ;;        [{:op :const, :type :number, :literal? true, :val 1, :form 1}],
     ;;        :children [:args]}],
     ;;      :method set,
     ;;      :op :host-call,
     ;;      :form (. clojure.lang.Var (set x (Long. 1))),
     ;;      :target
     ;;      {:op :const,
     ;;       :type :class,
     ;;       :literal? true,
     ;;       :val clojure.lang.Var,
     ;;       :form clojure.lang.Var}},
     ;;     :children [:statements :ret]}}],
     ;;  :once false}

So to find this host interop expression we’re looking for this structure in the analysis…

  {:op       :host-call
   :method   set               ;; or any of #{set alter alterRoot}
   :children [:target :args]
   :args     <anything>        ;; no really we don't care
   :target   {:op   :const
		:type :class
		:val  clojure.lang.Var}}

Alternatively if we analyze the source of clojure.core/alter-var-root we see…

     (defn alter-var-root
	"Atomically alters the root binding of var v by applying f to its
	current value plus any args"
	{:added "1.0"
	 :static true}
	[^clojure.lang.Var v f & args] (.alterRoot v f args))

which analyzes out to this key form… (. v (alterRoot f args)) or the t.a.jvm tree

clojure.tools.emitter.jvm> (->> (clojure.tools.analyzer.jvm.core-test/ast1
				       (defn alter-var-root
					 [^clojure.lang.Var v f & args]
					 (.alterRoot v f args)))
				      :init :methods first :body :ret :instance :tag)
clojure.lang.Var

Okay, so we have two cases here for ways that you can access this blacklisted value, now we need a way to figure out what blacklisted functions are.

Compiler.java .class files

Compiler.Expr.compile seems to do conditional classfile generation, however most of the classfile generation is done by Compiler.compile1 and Compiler.compile. The control flow path seems to be that Compiler.compile invokes Compiler.compile1 which invokes Expr.compile. Expr.compile then emits per-expression classfiles (nested fn types etc.), Compiler.compile1 then emits a top-level class for each form in the ns, and Compiler.compile emits the loader class which links all of the above together.

The trick with Compiler.java is that Expr.compile is mutually recursive with Compiler.compile1. What this means is that per-emitted expression classfile generation is done in the Expr.compile/Compiler.compile1 cycle.

The structure is that Compiler.compile seems to be that Compiler.compile is the intended entry point, and it will generate the *__init.class file. The load() method of __init.class is populated by invoking Compiler.compile1 on each top level form of the namespace being built. Compiler.compile1 for side effects invokes the .emit() operation on every read sub-expression, which writes to a shared mutable classloader passed through from Compiler.compile. Compiler.compile invokes Compiler.compile1 after beginning the visit of init(). This means that all bytecode written by .emit() members is appended implicitly to the bytecode of *__init.class/load.

For tejvm/-emit-loader it looks like I need to chase in-place writes to the generator adapter parameter of Expr.emit(), with the possibility that Expr.emit() somehow delays computation until it’s evaluated by Expr.eval().

Expr.emit is the key to all of this, since it handles the creation of the loading & instantiation code. The expressions themselves (via ObjExpr.emitLocal and ObjExpr.emitUnboxedLocal)

Emit tree

     (defmulti -emit-loader :op)

     (defmethod -emit-loader :AssignExpr
	 ;; defers to either
	 ;; - :InstanceFieldExpr
	 ;; - :LocalBindingExpr
	 ;; - :StaticFieldExpr
	 ;; - :VarExpr
	 )

     (defmethod -emit-loader :HostExpr)

     (defmethod -emit-loader :FieldExpr)
     (defmethod -emit-loader :InstanceFieldExpr)
     (defmethod -emit-loader :StaticFieldExpr)

     (defmethod -emit-loader :MethodExpr)

     (defmethod -emit-loader :IfExpr)
     (defmethod -emit-loader :ImportExpr)
     (defmethod -emit-loader :InstanceOfExpr)
     (defmethod -emit-loader :InvokeExpr)
     (defmethod -emit-loader :KeywordInvokeExpr)
     (defmethod -emit-loader :LetExpr)
     (defmethod -emit-loader :LetFnExpr)
     (defmethod -emit-loader :ListExpr)

     (defmethod -emit-loader :LiteralExpr)
     (defmethod -emit-loader :BooleanExpr)
     (defmethod -emit-loader :ConstantExpr)
     (defmethod -emit-loader :KeywordExpr)
     (defmethod -emit-loader :NilExpr)
     (defmethod -emit-loader :NumberExpr)
     (defmethod -emit-loader :StringExpr)

     (defmethod -emit-loader :LocalBindingExpr)
     (defmethod -emit-loader :MapExpr)
     (defmethod -emit-loader :MetaExpr)
     (defmethod -emit-loader :MethodParamExpr)
     (defmethod -emit-loader :NewExpr)
     (defmethod -emit-loader :ObjExpr)
     (defmethod -emit-loader :RecurExpr)
     (defmethod -emit-loader :SetExpr)
     (defmethod -emit-loader :StaticInvokeExpr)
     (defmethod -emit-loader :TheVarExpr)

     (defmethod -emit-loader :UntypedExpr)
     (defmethod -emit-loader :MonitorEnterExpr)
     (defmethod -emit-loader :MonitorExitExpr)
     (defmethod -emit-loader :ThrowExpr)

     (defmethod -emit-loader :VarExpr)
     (defmethod -emit-loader :VectorExpr)