Graph/types #86

Draft
wants to merge 16 commits into base: development

Commits on Aug 11, 2020

  1. b029a50
  2. Permit TensorFlow versions newer than 1.14.0.

    `pip install 'tensorflow==1.14.0'` is no longer found in pip channels.
    Instead, specify any version >= 1.14.0. Newer versions seem to have
    addressed the bazel file issue, so remove the workaround that requires
    users to install TensorFlow themselves.
    
    github.com//issues/76
    ChrisCummins committed Aug 11, 2020
    51de72f
  3. Permit Torch versions beyond 1.3.0.

    Similar to the previous commit, the pinned version of Torch (1.3.0) is no
    longer available. Permit newer versions.
    
    github.com//issues/76
    ChrisCummins committed Aug 11, 2020
    2022063

Commits on Aug 14, 2020

  1. Add a script to generate the Devmap dataset.

    Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
    ChrisCummins committed Aug 14, 2020
    1257300
  2. Add drop shadows to white-on-white README images.

    This makes the outline of images with white backgrounds visible against
    the white background of a GitHub-rendered Markdown README.
    ChrisCummins committed Aug 14, 2020
    ebe3138
  3. Improve the compiler IR documentation asset.

    Replace the transparent background with a white one so that it is still legible
    when dark-mode browser extensions are used, and add the LLVM dragon logo to
    emphasize that it is LLVM doing the lowering.
    ChrisCummins committed Aug 14, 2020
    3a36a46
  4. Documentation: Add a render of the type representation.

    github.com//issues/82
    
    Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
    ChrisCummins committed Aug 14, 2020
    8943d98

Commits on Aug 16, 2020

  1. llvm: Deduplicate LLVM-IR strings.

    This changes the format of the LLVM-IR program graphs to store a list
    of unique strings, rather than an LLVM-IR string in each node. We use a
    graph-level "strings" feature to store the list of original LLVM-IR
    strings corresponding to the graph nodes. This allows us to refer
    to the same string from multiple nodes without duplication.
    
    This breaks compatibility with the inst2vec encoder on program graphs
    generated prior to this commit.
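
    The deduplication described above can be sketched as follows. This is
    a minimal illustration, not the code from this commit: the function
    name and the use of an integer index per node are assumptions made
    for the example.

```python
from typing import Dict, List, Tuple


def deduplicate_strings(node_strings: List[str]) -> Tuple[List[str], List[int]]:
    """Return (unique_strings, indices) such that
    unique_strings[indices[i]] == node_strings[i] for every node i."""
    unique_strings: List[str] = []
    index_of: Dict[str, int] = {}  # string -> position in unique_strings
    indices: List[int] = []
    for text in node_strings:
        if text not in index_of:
            index_of[text] = len(unique_strings)
            unique_strings.append(text)
        indices.append(index_of[text])
    return unique_strings, indices


# Hypothetical per-node LLVM-IR strings; the first and third are identical.
node_strings = ["%1 = add i32 %a, %b", "ret i32 %1", "%1 = add i32 %a, %b"]
strings, indices = deduplicate_strings(node_strings)
```

    Here `strings` plays the role of the graph-level list, and each node
    keeps only a small integer instead of a full copy of its string.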
    
    Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
    ChrisCummins committed Aug 16, 2020
    3e24ddc
  2. Add types to the graph.

    This adds a fourth node type, and a fourth edge flow, both called
    "type". The idea is to represent types as first-class elements in the
    graph representation. This allows greater compositionality by breaking
    up composite types into subcomponents, and decreases the vocabulary
    size required to achieve a given coverage.
    
    Background
    ----------
    
    Currently, type information is stored in the "text" field of nodes for
    constants and variables, e.g.:
    
        node {
          type: VARIABLE
          text: "i8"
        }
    
    There are two issues with this:
    
     * Composite types end up with long textual representations,
       e.g. "struct foo { i32 a; i32 b; ... }". Since there is an
       unbounded number of possible structs, this prevents 100% vocabulary
       coverage on any IR with structs (or other composite types).
    
     * In the future, we will want to encode different information on data
       nodes, such as embedding literal values. Moving the type information
       out of the data node "frees up" space for something else.
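
    The vocabulary argument in the first point can be illustrated with a
    toy experiment. This is a sketch under stated assumptions: the type
    strings, the `decompose` tokenizer, and the "struct" token are made
    up for the example and are not the project's vocabulary code.

```python
import re
from typing import List


def decompose(type_text: str) -> List[str]:
    """Split a composite type string into primitive parts (toy tokenizer)."""
    tokens = re.findall(r"[{*]|\w+", type_text)
    # An opening brace marks a struct; map it to a single "struct" token.
    return ["struct" if token == "{" else token for token in tokens]


# Hypothetical type strings seen while building the vocabulary.
corpus = ["{ i32, i8 }", "{ i8, i32 }", "{ i32, i32 }", "i32*", "{ i8, i8* }"]

whole_vocab = set(corpus)  # one entry per distinct composite type
part_vocab = {token for text in corpus for token in decompose(text)}

# A struct layout that never appeared in the corpus.
unseen = "{ i8, i8, i32* }"
covered_whole = unseen in whole_vocab
covered_parts = set(decompose(unseen)) <= part_vocab
```

    With whole-string entries, every new struct layout is out of
    vocabulary; with decomposed parts, the unseen struct is covered by
    four tokens that were already known.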
    
    Overview
    --------
    
    This changes the representation to represent types as first-class
    elements in the graph. A "type" node represents a type using its
    "text" field, and a new "type" edge connects this type to variables or
    constants of that type, e.g.:
    
        node {
          type: VARIABLE
          text: "var"
        }
        node {
          type: TYPE
          text: "i8"
        }
        edge {
          flow: TYPE
          source: 1
        }
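
    As a rough in-memory mirror of the listing above, the following
    sketch models nodes and edges with dataclasses. The class and method
    names are hypothetical, and unset fields are assumed to default to 0,
    as omitted scalar fields do in protobuf.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    type: str
    text: str


@dataclass
class Edge:
    flow: str
    source: int = 0  # index into Graph.node
    target: int = 0  # index into Graph.node
    position: int = 0


@dataclass
class Graph:
    node: List[Node] = field(default_factory=list)
    edge: List[Edge] = field(default_factory=list)

    def types_of(self, node_index: int) -> List[str]:
        """Text of every type node with a type edge into node_index."""
        return [
            self.node[e.source].text
            for e in self.edge
            if e.flow == "TYPE" and e.target == node_index
        ]


graph = Graph(
    node=[Node("VARIABLE", "var"), Node("TYPE", "i8")],
    edge=[Edge(flow="TYPE", source=1)],  # target defaults to 0, the variable
)
```

    Looking up the type of the variable is then a matter of following
    incoming type edges, e.g. `graph.types_of(0)`.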
    
    Composite types
    ---------------
    
    Types may be composed by connecting multiple type nodes using type
    edges. This allows you to break down complex types into a graph of
    primitive parts. The meaning of composite types will depend on the
    type of IR; the remainder describes the process for LLVM-IR.
    
    Pointer types
    -------------
    
    A pointer is a composite of two types:
    
        [pointer] <- [pointed-type]
    
    For example:
    
        int32_t* instance;
    
    Would be represented as:
    
        node {
          type: TYPE
          text: "i32"
        }
        node {
          type: TYPE
          text: "*"
        }
        node {
          type: VARIABLE
          text: "var"
        }
        edge {
          flow: TYPE
          target: 1
        }
        edge {
          flow: TYPE
          source: 1
          target: 2
        }
    
    Where variables/constants of this type receive an incoming type edge
    from the [pointer] node, which in turn receives an incoming type edge
    from the [pointed-type] node.
    
    One [pointer] node is generated for each unique pointer type. If a
    graph contains multiple pointer types, there will be multiple
    [pointer] nodes.
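
    The "one [pointer] node per unique pointer type" rule can be sketched
    with a small cache keyed by the pointed-to node. The `TypeGraph`
    class and its method names are hypothetical and exist only for this
    illustration.

```python
from typing import Dict, List, Tuple


class TypeGraph:
    def __init__(self) -> None:
        self.nodes: List[str] = []
        self.edges: List[Tuple[int, int]] = []  # (source, target) type edges
        self._primitive: Dict[str, int] = {}
        self._pointer_to: Dict[int, int] = {}  # pointee index -> pointer index

    def _add(self, text: str) -> int:
        self.nodes.append(text)
        return len(self.nodes) - 1

    def primitive(self, text: str) -> int:
        """Return the node for a primitive type, creating it on first use."""
        if text not in self._primitive:
            self._primitive[text] = self._add(text)
        return self._primitive[text]

    def pointer_to(self, pointee: int) -> int:
        """Return the [pointer] node for this pointee, creating it once."""
        if pointee not in self._pointer_to:
            pointer = self._add("*")
            self.edges.append((pointee, pointer))  # [pointer] <- [pointed-type]
            self._pointer_to[pointee] = pointer
        return self._pointer_to[pointee]


g = TypeGraph()
i32_ptr = g.pointer_to(g.primitive("i32"))
i8_ptr = g.pointer_to(g.primitive("i8"))
i32_ptr_again = g.pointer_to(g.primitive("i32"))
```

    Requesting `i32*` twice yields the same node, while `i8*` gets its
    own [pointer] node, so the graph ends up with two "*" nodes.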
    
    Struct types
    ------------
    
    A struct is a composite type where each member is a type node which
    points to the parent struct node. Variable/constant instances of a
    struct receive an incoming type edge from the root struct node. Note
    that the graph of type nodes representing a composite struct type may
    be cyclical, since a struct can contain a pointer of the same type
    (think of a binary tree implementation).
    
    The type edges from member nodes to the parent struct are
    positional. The position indicates the element number. E.g. for a
    struct with three elements, the incoming type edges to the struct node
    will have positions 0, 1, and 2.
    
    This example struct:
    
        struct s {
          int8_t a;
          int8_t b;
          struct s* c;
        };
    
        struct s instance;
    
    Would be represented as:
    
        node {
          type: TYPE
          text: "struct"
        }
        node {
          type: TYPE
          text: "i8"
        }
        node {
          type: TYPE
          text: "*"
        }
        node {
          type: VARIABLE
          text: "var"
        }
        edge {
          flow: TYPE
          source: 1
        }
        edge {
          flow: TYPE
          source: 1
          position: 1
        }
        edge {
          flow: TYPE
          source: 2
          position: 2
        }
        edge {
          flow: TYPE
          target: 2
        }
        edge {
          flow: TYPE
          target: 3
        }
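
    The positional member edges and the resulting cycle can be checked
    with a short sketch. It follows the prose description above (member
    nodes point to the struct node, and the struct node points to the
    pointer node and to the variable); the helper names are hypothetical.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    source: int
    target: int
    position: int = 0


nodes = ["struct", "i8", "*", "var"]
STRUCT, I8, POINTER, VAR = range(4)


def struct_edges(struct: int, members: List[int]) -> List[Edge]:
    """One positional type edge per member, in declaration order."""
    return [
        Edge(source=member, target=struct, position=i)
        for i, member in enumerate(members)
    ]


def reaches_itself(start: int, edges: List[Edge]) -> bool:
    """True if following type edges from start leads back to start."""
    stack = [e.target for e in edges if e.source == start]
    seen = set()
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(e.target for e in edges if e.source == node)
    return False


edges = struct_edges(STRUCT, [I8, I8, POINTER])   # members a, b, c
edges.append(Edge(source=STRUCT, target=POINTER))  # c points to struct s
edges.append(Edge(source=STRUCT, target=VAR))      # instance is a struct s
```

    The struct node lies on a cycle through the pointer node, whereas
    the i8 node does not, so any traversal of type edges needs a visited
    set like the one used in `reaches_itself`.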
    
    Array types
    -----------
    
    An array is a composite type [array] <- [element-type]. For example,
    the array:
    
        int a[10];
    
    Would be represented as:
    
        node {
          type: TYPE
          text: "i32"
        }
        node {
          type: TYPE
          text: "[]"
        }
        node {
          type: VARIABLE
          text: "var"
        }
        edge {
          flow: TYPE
          target: 1
        }
        edge {
          flow: TYPE
          source: 1
          target: 2
        }
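
    Arrays and pointers share the same [composite] <- [element-type]
    shape, so both can be lowered by one recursive helper. The sketch
    below is an illustration only; the `TypeSpec` encoding and `add_type`
    are hypothetical, and type nodes are not deduplicated here.

```python
from typing import List, Tuple, Union

# A type is either a primitive name, or a (constructor, element type) pair
# where the constructor is "[]" for arrays or "*" for pointers.
TypeSpec = Union[str, Tuple[str, "TypeSpec"]]


def add_type(spec: TypeSpec, nodes: List[str], edges: List[Tuple[int, int]]) -> int:
    """Append type nodes for spec and return the index of its root node."""
    if isinstance(spec, str):
        nodes.append(spec)
        return len(nodes) - 1
    constructor, element = spec
    element_index = add_type(element, nodes, edges)
    nodes.append(constructor)
    index = len(nodes) - 1
    edges.append((element_index, index))  # [composite] <- [element-type]
    return index


nodes: List[str] = []
edges: List[Tuple[int, int]] = []

# int a[10];
array_index = add_type(("[]", "i32"), nodes, edges)
nodes.append("var")
edges.append((array_index, len(nodes) - 1))
```

    For this input the helper produces the same three nodes and two
    (source, target) edges as the listing above; a nested spec such as
    `("*", ("[]", "i8"))` would add one more level to the chain.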
    
    github.com//issues/82
    
    Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
    ChrisCummins committed Aug 16, 2020
    c36b0b4
  3. s/root/external

    Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
    ChrisCummins committed Aug 16, 2020
    d2271b6

Commits on Aug 17, 2020

  1. WIP: End-to-end dataflow task tests.

    Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
    ChrisCummins committed Aug 17, 2020
    48a774e
  2. WIP: Bump all requirements.txt versions.

    Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
    ChrisCummins committed Aug 17, 2020
    15b4e1a
  3. WIP: Pip dependencies for Tensorflow 2.x compatibility.

    Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
    ChrisCummins committed Aug 17, 2020
    d129ba9
  4. WIP: TensorFlow 2.x API updates.

    Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
    ChrisCummins committed Aug 17, 2020
    b69d036
  5. XXX: TF deps

    Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
    ChrisCummins committed Aug 17, 2020
    724763c

Commits on Aug 18, 2020

  1. WIP: Rewrite LSTM in PyTorch

    Signed-off-by: format 2020.06.15 <github.com/ChrisCummins/format>
    ChrisCummins committed Aug 18, 2020
    1f3403d