Graph/types #86

Draft
wants to merge 16 commits into base: development
Conversation

ChrisCummins
Owner

github.com//issues/82

ChrisCummins and others added 10 commits August 12, 2020 00:00
`pip install 'tensorflow==1.14.0'` is no longer found in pip channels.
Instead, specify any version >= 1.14.0. Newer versions seem to have
addressed the Bazel file issue, so remove the workaround that requires
users to install TensorFlow themselves.

github.com//issues/76
Similar to the previous commit, the version of Torch is no longer
available. Permit newer versions.

github.com//issues/76
Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
This makes the outline of images with white backgrounds visible against
the white background of a GitHub-rendered Markdown README.
Replace the transparent background with a white one so that it is still legible
when dark-mode browser extensions are used, and add the LLVM dragon logo to
emphasize that it is LLVM doing the lowering.
github.com//issues/82

Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
This changes the format of the LLVM-IR program graphs to store a list
of unique strings, rather than an LLVM-IR string in each node. We use a
graph-level "strings" feature to store a list of the original LLVM-IR
strings corresponding to the graph nodes. This allows us to refer
to the same string from multiple nodes without duplication.

This breaks compatibility with the inst2vec encoder on program graphs
generated prior to this commit.
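
As a rough illustration of the deduplication described above, the following
Python sketch interns node strings into one graph-level list and gives each
node an index into it. The function name `intern_strings` and the returned
layout are hypothetical and are not taken from the ProGraML code base.

```python
from typing import Dict, List, Tuple


def intern_strings(node_texts: List[str]) -> Tuple[List[str], List[int]]:
    """Return (unique strings, per-node index into that list).

    Hypothetical helper: the real graph format may store indices differently.
    """
    strings: List[str] = []          # graph-level list of unique strings
    index_of: Dict[str, int] = {}    # string -> position in `strings`
    node_indices: List[int] = []     # one entry per node
    for text in node_texts:
        if text not in index_of:
            index_of[text] = len(strings)
            strings.append(text)
        node_indices.append(index_of[text])
    return strings, node_indices


# Two nodes share the same LLVM-IR string, so it is stored only once.
strings, indices = intern_strings(
    ["%1 = add i32 %0, 1", "ret i32 %1", "%1 = add i32 %0, 1"]
)
```

Here the first and third nodes resolve to the same entry of `strings`.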

Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
This adds a fourth node type and a fourth edge flow, both called
"type". The idea is to represent types as first-class elements in the
graph representation. This allows greater compositionality by breaking
up composite types into subcomponents, and decreases the vocabulary
size required to achieve a given coverage.

Background
----------

Currently, type information is stored in the "text" field of nodes for
constants and variables, e.g.:

    node {
      type: VARIABLE
      text: "i8"
    }

There are two issues with this:

 * Composite types end up with long textual representations,
   e.g. "struct foo { i32 a; i32 b; ... }". Since there is an
   unbounded number of possible structs, this prevents 100% vocabulary
   coverage on any IR with structs (or other composite types).

 * In the future, we will want to encode different information on data
   nodes, such as embedding literal values. Moving the type information
   out of the data node "frees up" space for something else.
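
To make the vocabulary argument concrete, here is a small Python sketch. The
type strings are invented for illustration, and the tokenizing regular
expression is an assumption about how a composite type could be split into
parts; it is not how ProGraML builds its vocabulary.

```python
import re

# Hypothetical composite type strings, one per data node "text" field.
composite_texts = [
    "{i32, i32}",
    "{i32, i8}",
    "{i8, i32}",
    "{i8, i8*}",
    "{i32, {i8, i8*}}",
]

# Vocabulary when each composite type is a single token: one entry per struct.
whole_vocab = set(composite_texts)

# Vocabulary when types are split into struct, pointer, and primitive parts.
part_vocab = set()
for text in composite_texts:
    for token in re.findall(r"i\d+|\*|\{", text):
        part_vocab.add("struct" if token == "{" else token)
```

Every new struct layout adds an entry to `whole_vocab`, while `part_vocab`
only grows when a new primitive appears.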

Overview
--------

This change represents types as first-class elements in the graph.
A "type" node represents a type using its "text" field, and a new
"type" edge connects this type to variables or constants of that
type, e.g.:

    node {
      type: VARIABLE
      text: "var"
    }
    node {
      type: TYPE
      text: "i8"
    }
    edge {
      flow: TYPE
      source: 1
    }
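
The same graph can be sketched in Python. The `Node`, `Edge`, and `Graph`
classes and the `types_of` helper below are hypothetical stand-ins for the
protocol buffer messages, assuming that omitted numeric fields such as
`target` take the protobuf default of zero.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    type: str
    text: str


@dataclass
class Edge:
    flow: str
    source: int = 0    # defaults mirror unset protobuf integer fields
    target: int = 0
    position: int = 0


@dataclass
class Graph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, type: str, text: str) -> int:
        """Append a node and return its index."""
        self.nodes.append(Node(type, text))
        return len(self.nodes) - 1


def types_of(graph: Graph, node_index: int) -> List[str]:
    """Text of every node with a TYPE edge into `node_index`."""
    return [
        graph.nodes[e.source].text
        for e in graph.edges
        if e.flow == "TYPE" and e.target == node_index
    ]


graph = Graph()
var = graph.add_node("VARIABLE", "var")   # node 0
i8 = graph.add_node("TYPE", "i8")         # node 1
graph.edges.append(Edge(flow="TYPE", source=i8))  # target defaults to 0
```

Calling `types_of(graph, var)` then recovers the type that used to live in
the variable's own "text" field.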

Composite types
---------------

Types may be composed by connecting multiple type nodes using type
edges. This allows you to break down complex types into a graph of
primitive parts. The meaning of composite types will depend on the
type of IR; the remainder describes the process for LLVM-IR.

Pointer types
-------------

A pointer is a composite of two types:

    [pointer] <- [pointed-type]

For example:

    int32_t* instance;

Would be represented as:

    node {
      type: TYPE
      text: "i32"
    }
    node {
      type: TYPE
      text: "*"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      target: 1
    }
    edge {
      flow: TYPE
      source: 1
      target: 2
    }

Where variables/constants of this type receive an incoming type edge
from the [pointer] node, which in turn receives an incoming type edge
from the [pointed-type] node.

One [pointer] node is generated for each unique pointer type. If a
graph contains multiple pointer types, there will be multiple
[pointer] nodes.
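
A minimal sketch of the "one [pointer] node per unique pointer type" rule,
written in Python. The `TypeGraph` class is hypothetical, and keying the cache
on the textual type with a trailing `*` is an assumption made for brevity, not
the way the LLVM-based graph builder identifies types.

```python
from typing import Dict, List, Tuple


class TypeGraph:
    """Hypothetical builder, not the ProGraML API."""

    def __init__(self) -> None:
        self.nodes: List[Tuple[str, str]] = []        # (type, text)
        self.edges: List[Tuple[str, int, int]] = []   # (flow, source, target)
        self._type_nodes: Dict[str, int] = {}         # type string -> node index

    def _add_node(self, type_: str, text: str) -> int:
        self.nodes.append((type_, text))
        return len(self.nodes) - 1

    def type_node(self, llvm_type: str) -> int:
        """Return the node for a type, creating it on first use."""
        if llvm_type in self._type_nodes:
            return self._type_nodes[llvm_type]
        if llvm_type.endswith("*"):
            # [pointer] <- [pointed-type]
            pointee = self.type_node(llvm_type[:-1])
            index = self._add_node("TYPE", "*")
            self.edges.append(("TYPE", pointee, index))
        else:
            index = self._add_node("TYPE", llvm_type)
        self._type_nodes[llvm_type] = index
        return index

    def add_variable(self, llvm_type: str) -> int:
        """Add a variable node with an incoming type edge."""
        index = self._add_node("VARIABLE", "var")
        self.edges.append(("TYPE", self.type_node(llvm_type), index))
        return index


graph = TypeGraph()
graph.add_variable("i32*")
graph.add_variable("i32*")   # reuses the existing i32 pointer node
graph.add_variable("i8*")    # a different pointer type gets its own node
pointer_nodes = [n for n in graph.nodes if n == ("TYPE", "*")]
```

Three variables of two distinct pointer types yield two [pointer] nodes.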

Struct types
------------

A struct is a composite type where each member is a type node which
points to the parent node. Variable/constant instances of a struct
receive an incoming type edge from the root struct node. Note that
the graph of type nodes representing a composite struct type may be
cyclical, since a struct can contain a pointer to the same type (think
of a binary tree implementation).

The type edges from member nodes to the parent struct are
positional. The position indicates the element number. E.g. for a
struct with three elements, the incoming type edges to the struct node
will have positions 0, 1, and 2.

This example struct:

    struct s {
      int8_t a;
      int8_t b;
      struct s* c;
    };

    struct s instance;

Would be represented as:

    node {
      type: TYPE
      text: "struct"
    }
    node {
      type: TYPE
      text: "i8"
    }
    node {
      type: TYPE
      text: "*"
    }
    node {
      type: VARIABLE
      text: "var"
    }
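    edge {
      flow: TYPE  # i8 -> struct, member a (position 0)
      source: 1
    }
    edge {
      flow: TYPE  # i8 -> struct, member b
      source: 1
      position: 1
    }
    edge {
      flow: TYPE  # pointer -> struct, member c
      source: 2
      position: 2
    }
    edge {
      flow: TYPE  # struct -> pointer (pointed-type)
      target: 2
    }
    edge {
      flow: TYPE  # struct -> variable instance
      target: 3
    }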
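
The positional edges and the cycle can be checked with a short Python sketch.
The edge list follows the prose description (members point at the struct, the
struct points at the pointer and at the variable); the tuple layout and the
helper names `members` and `has_type_cycle` are hypothetical.

```python
from typing import List, Tuple

nodes: List[Tuple[str, str]] = [
    ("TYPE", "struct"),    # 0
    ("TYPE", "i8"),        # 1
    ("TYPE", "*"),         # 2
    ("VARIABLE", "var"),   # 3
]

# (flow, source, target, position)
edges: List[Tuple[str, int, int, int]] = [
    ("TYPE", 1, 0, 0),  # member a: i8 -> struct
    ("TYPE", 1, 0, 1),  # member b: i8 -> struct
    ("TYPE", 2, 0, 2),  # member c: pointer -> struct
    ("TYPE", 0, 2, 0),  # pointed-type: struct -> pointer
    ("TYPE", 0, 3, 0),  # instance: struct -> variable
]


def members(struct_index: int) -> List[str]:
    """Member type texts of a struct node, ordered by edge position."""
    incoming = [e for e in edges if e[0] == "TYPE" and e[2] == struct_index]
    return [nodes[e[1]][1] for e in sorted(incoming, key=lambda e: e[3])]


def has_type_cycle(start: int) -> bool:
    """True if `start` can reach itself through TYPE edges between type nodes."""
    stack, seen = [start], set()
    while stack:
        current = stack.pop()
        for flow, source, target, _ in edges:
            if flow != "TYPE" or source != current or nodes[target][0] != "TYPE":
                continue
            if target == start:
                return True
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return False
```

For this graph, `members(0)` lists the three members in declaration order and
`has_type_cycle(0)` reports the self-reference through the pointer node.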

Array types
-----------

An array is a composite type [array] <- [element-type]. For example,
the array:

    int a[10];

Would be represented as:

    node {
      type: TYPE
      text: "i32"
    }
    node {
      type: TYPE
      text: "[]"
    }
    node {
      type: VARIABLE
      text: "var"
    }
    edge {
      flow: TYPE
      target: 1
    }
    edge {
      flow: TYPE
      source: 1
      target: 2
    }
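
As a sketch of how a consumer might read such a chain back, the Python snippet
below walks incoming type edges from the variable to the element type. It
assumes a linear chain (as for arrays and pointers) and stops at nodes with
zero or several incoming type edges; `type_chain` is a hypothetical helper.

```python
from typing import List, Tuple

nodes: List[Tuple[str, str]] = [
    ("TYPE", "i32"),       # 0
    ("TYPE", "[]"),        # 1
    ("VARIABLE", "var"),   # 2
]

# (flow, source, target)
edges: List[Tuple[str, int, int]] = [
    ("TYPE", 0, 1),  # element type -> array
    ("TYPE", 1, 2),  # array -> variable
]


def type_chain(index: int) -> List[str]:
    """Follow incoming TYPE edges from a node, outermost type first."""
    chain: List[str] = []
    visited = {index}
    while True:
        sources = [s for flow, s, t in edges if flow == "TYPE" and t == index]
        # Stop at primitives, at branching nodes (e.g. structs), or on a cycle.
        if len(sources) != 1 or sources[0] in visited:
            break
        index = sources[0]
        visited.add(index)
        chain.append(nodes[index][1])
    return chain


chain = type_chain(2)
```

Note that the "[]" node in this representation carries no length, so the
bound of `a[10]` is not recoverable from the chain.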

github.com//issues/82

Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
@ChrisCummins ChrisCummins changed the base branch from master to development August 20, 2020 15:24