
Kyrix-S API Reference

Wenbo Tao edited this page Dec 29, 2020 · 32 revisions

Kyrix-S is an extension to the core Kyrix system, providing a simple declarative grammar for authoring large-scale zooming-based scatterplots, which we call Scalable Scatterplot Visualizations (or SSV). Kyrix-S's declarative grammar is a high-level concise grammar built on top of the lower-level Kyrix grammar, which enables authoring of a complex SSV in tens of lines of JSON.

An Example

The above GIF shows an SSV of NBA basketball games in the 2017-2018 season. The horizontal and vertical axes show the scores of the home and away teams, respectively. Each circle represents a cluster of games, and the number inside it is the cluster size. As you zoom in, each circle splits into several smaller circles. When you hover over a circle, you see the three games between the highest-ranked teams in that cluster, as well as a polygon indicating the boundary of the cluster.

To author this SSV using Kyrix-S, you only need to write the following JSON specification:

{
    data: {  
        db: "nba",  
        query: "SELECT * FROM games"
    },  
    layout: {  
        x: {  
            field: "home_score",  
            extent: [69, 149]  
        },  
        y: {  
            field: "away_score",  
            extent: [69, 148]  
        },  
        z: {  
            field: "agg_rank",  
            order: "asc"  
        }  
    },  
    marks: {  
        cluster: {  
            mode: "circle"
        },  
        hover: {  
            rankList: {  
                mode: "tabular",  
                fields: ["home_team", "away_team", "home_score", "away_score"],  
                topk: 3  
            },  
            boundary: "convexhull"  
        }
    },  
    config: {  
        axis: true  
    }  
};

After the Docker containers have started, run the following commands from the root folder to bring up this application:

> cd compiler/examples/template-api-examples
> cp ../../../docker-scripts/compile.sh compile.sh
> chmod +x compile.sh
> sudo ./compile.sh SSV_circle.js

More examples can be found here.

Authoring an SSV with Kyrix-S

As an extension, Kyrix-S interoperates with Kyrix through the Project.addSSV call. By passing a JSON specification of an SSV into Project.addSSV, you can add one SSV into an encompassing Kyrix application, either as a new set of canvases, or a set of new layers of existing canvases.

Layers and Data Bindings

Project.addSSV creates two new layers on an existing canvas, or a new canvas. These two layers are respectively:

  • Layer 0. The scatterplot layer, which consists of objects. Each object is bound to the columns specified in data.query.
  • Layer 1. A static legend layer with no data bindings.

A Declarative Grammar for SSVs

The JSON specification of an SSV is documented below. The complete JSON schema is here.

  • data: defines the data being visualized, required.

    • query: a SQL query to fetch data for the SSV, required. Each record in the query result should correspond to one base object in the scatterplot, regardless of whether the SSV shows aggregated information. For example, if the SSV shows aggregated circles of NBA games, the query should select NBA games. Do not worry about the aggregation, which is handled by Kyrix-S. The query should not contain LIMIT.

    • db: the database in which data.query should be run, required.

    • columnNames: an optional array specifying the field names for the query results. These names are used when specifying layout-related information, aggregation or tooltips. If not specified, Kyrix-S will use the column names returned by the database.

  • layout: controls the placement of the marks in the multi-scale zooming space, required.

    • x: defines the horizontal axis of the SSV, required.
      • field: a quantitative field in the query result that maps to the horizontal axis of the SSV, required. This should be one of data.columnNames (if specified), or one of the column names returned in the query results by the database.
      • extent: an optional two-number array [a, b] indicating the visible range of the field. a can be larger than b. If not specified, min/max value of layout.x.field will be used as the visible range. This field is required when config.axis is true. This field should not be present when layout.geo is present.
    • y: defines the vertical axis of the SSV.
      • field: a quantitative field in the query result that maps to the vertical axis of the SSV. This should be one of data.columnNames (if specified), or one of the column names returned in the query results by the database.
      • extent: an optional two-number array [a, b] indicating the visible range of the field. a can be larger than b. If not specified, min/max value of layout.y.field will be used as the visible range. This field is required when config.axis is true. This field should not be present when layout.geo is present.
    • z: defines how objects are distributed across zoom levels, required.
      • field: a quantitative field in the query result which indicates that distribution of objects across zoom levels is based on ranking of this field, required.
      • order: either asc or desc, required. With asc, the smaller an object's field value is, the more likely the object is to appear on the top zoom levels; with desc, the larger the value is, the more likely the object is to appear there.
    • overlap: a number between 0 and 1 indicating how much overlap between objects is desired, with 0 meaning arbitrary overlap is allowed and 1 meaning no overlap is allowed. Note that this only sets the upper bound on the amount of overlap. Kyrix-S will space the objects out more if visual density becomes too high in some regions. This field is optional, defaults to 0 when marks.cluster.mode is heatmap, and defaults to 1 otherwise.
    • geo: defines the initial viewport location for geographic dimensions. If this field is present, Kyrix-S assumes that x is longitude and y is latitude, and adds two columns, kyrix_geo_x and kyrix_geo_y, to the database table specified in data.query. These columns hold the screen coordinates on the top zoom level. Therefore, if you'd like Kyrix-S to transform lat/lon into screen coordinates for you, make sure the query has only one table in the FROM clause. Lastly, for the transformation to succeed, either data.db must equal kyrix, or you must run the following command to install d3 in your database: sudo docker exec -it kyrix_db_1 su - postgres -c "./install-d3.sh [dbname]"
      • center: a two-number array specifying the center of the initial geo viewport, required when layout.geo is present. The first number is latitude. The second number is longitude.
      • level: an integer between 0 and 19 specifying the zoom level of the initial geo viewport, required when layout.geo is present. On zoom level 0, the entire world is projected onto a 256*256 tile. To specify a US-centric viewport, use layout.geo.center=[39.5, -98.5] and layout.geo.level=5.

    Note that null values in layout.x.field, layout.y.field and layout.z.field will be regarded as 0. So make sure the missing values in the data are properly imputed.
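As an illustration of the layout.geo rules above, here is a minimal sketch of a geographic layout component. The column names (lon, lat, population) are hypothetical, and the helper worldSizeAtLevel is not part of Kyrix-S; it only restates the "256*256 tile on zoom level 0" statement, assuming that the world's side length doubles with each geo zoom level, as in common web map tiling schemes.

```javascript
// Hypothetical column names; only the shape of the object follows the grammar.
const geoLayout = {
    x: { field: "lon" }, // longitude; no extent because layout.geo is present
    y: { field: "lat" }, // latitude; no extent because layout.geo is present
    z: { field: "population", order: "desc" }, // larger values favor top levels
    geo: {
        center: [39.5, -98.5], // [latitude, longitude], a US-centric viewport
        level: 5 // an integer between 0 and 19
    }
};

// Side length in pixels of the whole world at a geo zoom level, assuming
// level 0 is one 256*256 tile and every level doubles the side length.
function worldSizeAtLevel(level) {
    return 256 * Math.pow(2, level);
}

console.log(worldSizeAtLevel(geoLayout.geo.level)); // prints 8192
```

Note that center lists latitude first, while layout.x maps to longitude; the two orders are easy to mix up.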

  • marks: defines the visual representation of one or more objects, and consists of two components, cluster and hover, required.

    • cluster: cluster marks are static marks rendering one or a cluster of objects, required.

      • mode: defines the type of visual marks, required. There are six modes in total: circle, heatmap, radar, pie, dot and custom. The custom mode requires a custom renderer (see marks.cluster.custom) and the maximum width/height of an object (see marks.cluster.config.bboxW and marks.cluster.config.bboxH).

      • aggregate: defines the aggregation information needed to render a cluster of objects. It consists of measures and dimensions, which together form a SQL aggregation query.

        • measures: defines which aggregation statistics are calculated and on which fields, optional. If not specified, Kyrix-S computes count(*) for each cluster of objects by default. If specified, it should be an array in which each element is an object with the following fields:

          • field: name of the field on which this aggregation statistic is calculated, which should be either * when specifying count, or a quantitative field from the query results, required.
          • function: the aggregation statistic to be calculated, one of count, sum, avg, min, max and sqrsum, required.
          • extent: an optional two-number array specifying the range of the calculated aggregation statistic, required for radar.

          In the case where you want to specify the same function for many fields, you can instead specify this component as an object, with fields being an array of field names (required), function being the aggregation statistic (required), and extent being the range for all measures. See here for an example. For modes circle and heatmap at most one measure can be specified. For mode pie there needs to be exactly one measure specified. For mode dot no measure is allowed.

        • dimensions: defines how objects are grouped when calculating aggregation statistics, and is optional. If not specified, no grouping is performed. If specified, it should be an array with each element being an object with the following fields:

          • field: name of the field of a grouping column, which should be a categorical field from the query results, required.
          • domain: an array of strings indicating all possible values of field, required.

          For modes circle, heatmap, radar and dot, grouping is not supported. So you do not need to specify dimensions for those modes.

      • custom: a rendering function f(svg, data, args) for the custom mode which converts a set of data items data to visual marks and attaches them to svg, required when marks.cluster.mode is custom. Each data item in data is the representative object of a cluster of objects, with an additional field clusterAgg containing aggregation statistics of this cluster. To access the size of the cluster, you can write d.clusterAgg["count(*)"] where d is the data item. If there is grouping, you can write d.clusterAgg["medical_male_avg(salary)"], which is the average salary of male employees in the medical department in this cluster. args is a dictionary containing lots of useful information about the encompassing Kyrix application, similar to the input of a Kyrix layer renderer. An example.

      • config: a set of optional parameters for customizing the looks of the cluster marks.

        • bboxW: the width of the bounding box of all cluster marks. You need to specify this if and only if you are using the custom mode.
        • bboxH: the height of the bounding box of all cluster marks. You need to specify this if and only if you are using the custom mode.
        • circleMinSize: the minimum size of the circles in the circle mode. Defaults to 30.
        • circleMaxSize: the maximum size of the circles in the circle mode. Defaults to 70.
        • heatmapRadius: the radius of an object in the heatmap mode. Defaults to 80.
        • heatmapOpacity: the opacity of heatmaps in the heatmap mode, between 0 and 1. Defaults to 1.
        • radarRadius: the radius of a radar in the radar mode. Defaults to 80.
        • radarTicks: the number of ticks on an axis of a radar in the radar mode. Defaults to 5.
        • pieInnerRadius: the inner radius of a pie in the pie mode. Defaults to 1 (pixel).
        • pieOuterRadius: the outer radius of a pie in the pie mode. Defaults to 80.
        • pieCornerRadius: the corner radius of a pie in the pie mode. Defaults to 5.
        • pieLegendTitle: title of the legend for pie-chart based SSVs. Defaults to "legend".
        • pieLegendDomain: domain of the legends for pie-chart based SSVs. Should be specified as an array of strings. If not specified, Kyrix-S will use all distinct combinations of domains as the domain for the legends.
        • padAngle: the amount of padding between pies in the pie mode. Defaults to 0.05 (radians).
        • dotSizeColumn: a string which is the name of the field in the data that maps to the size of the dots in the dot mode. If not specified, all dots have the same size (dotMaxSize).
        • dotSizeDomain: a two-number array which indicates the range of dotSizeColumn. Must be present when dotSizeColumn is present.
        • dotSizeLegendTitle: the title of the size legend for the dot mode. Defaults to Point Size.
        • dotMaxSize: the maximum radius of the dots in the dot mode. Defaults to 15 pixels.
        • dotColorColumn: a string which is the name of the field in the data that maps to the color of the dots in the dot mode. If not specified, all dots have the same color (#38c2e0).
        • dotColorDomain: an array of values which indicates the distinct values of dotColorColumn. Must be present when dotColorColumn is present.
        • dotColorLegendTitle: the title of the color legend for the dot mode. Defaults to Point Color.
    • hover: hover marks are shown when the user mouses over a cluster mark. This component is optional.

      • rankList: hover marks that show representative objects from a cluster. The ranking of objects is defined in layout.z. Cannot be specified together with marks.hover.tooltip.

        • mode: either tabular which displays representative objects in a table, or custom, which is used to customize how objects are rendered, required. For custom, bboxW and bboxH must be specified in marks.hover.rankList.config indicating the size of the bounding box of an object.
        • topk: an integer greater than 0, indicating how many representative objects are displayed upon hovering. This field is optional and defaults to 1.
        • fields: an array of fields that will be displayed in the tabular mode, required when marks.hover.rankList.mode is tabular.
        • custom: the custom renderer for the custom mode, required when marks.hover.rankList.mode is custom.
        • orientation: the direction in which representative objects are positioned, could be either horizontal or vertical. This field is optional and defaults to vertical.
        • config: a set of optional parameters for customizing the looks of the hover marks.
          • bboxW: the width of the bounding box of a custom hover mark. Required for the custom mode.
          • bboxH: the height of the bounding box of a custom hover mark. Required for the custom mode.
      • tooltip: shows simple tooltips about a cluster, instead of a ranked list of objects. Cannot be specified together with marks.hover.rankList.

      • boundary: hover marks that show the boundary of clusters. Can be either bbox, which shows the boundary as the bounding box, or convexhull, which shows a polygonal enclosure of the cluster.

      • selector: a D3/CSS selector string which helps identify which elements rendered by the custom cluster renderer are hoverable. Note that to enable hovering for the custom cluster mode, your custom cluster renderer should add a g variable which is an SVG <g> element and render everything as its direct children. What Kyrix-S does behind the scenes involves calling something like g.selectAll("YOUR_SELECTOR").on("mouseover" ...). This field is optional and defaults to *.
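Several of the rules for marks stated above can be restated as code. The following is a rough sketch only: checkMarks and expandMeasures are hypothetical helpers, not part of Kyrix-S, which validates specifications against its own JSON schema. The sketch covers the measure-count and grouping rules per cluster mode, the object shorthand for measures, and the hover constraints.

```javascript
// Hypothetical helpers; they restate rules from this page and nothing more.
const CLUSTER_MODES = ["circle", "heatmap", "radar", "pie", "dot", "custom"];
const NO_GROUPING = ["circle", "heatmap", "radar", "dot"];

// Expand the object shorthand {fields, function, extent} into the array form.
function expandMeasures(measures) {
    if (measures === undefined) return [];
    if (Array.isArray(measures)) return measures;
    return measures.fields.map(function (field) {
        return { field: field, function: measures.function, extent: measures.extent };
    });
}

// Return a list of human-readable problems; an empty list means no rule
// checked here is violated.
function checkMarks(marks) {
    const problems = [];
    const cluster = marks.cluster || {};
    const mode = cluster.mode;
    const agg = cluster.aggregate || {};
    const numMeasures = expandMeasures(agg.measures).length;
    const numDimensions = (agg.dimensions || []).length;

    if (!CLUSTER_MODES.includes(mode)) problems.push("unknown cluster mode");
    if ((mode === "circle" || mode === "heatmap") && numMeasures > 1)
        problems.push(mode + " allows at most one measure");
    if (mode === "pie" && numMeasures !== 1)
        problems.push("pie needs exactly one measure");
    if (mode === "dot" && numMeasures > 0)
        problems.push("dot allows no measure");
    if (NO_GROUPING.includes(mode) && numDimensions > 0)
        problems.push(mode + " does not support dimensions");

    const hover = marks.hover || {};
    const rankList = hover.rankList;
    if (rankList && hover.tooltip)
        problems.push("rankList and tooltip cannot be combined");
    if (rankList && rankList.mode === "tabular" && !Array.isArray(rankList.fields))
        problems.push("tabular rankList requires fields");
    if (rankList && rankList.mode === "custom") {
        const cfg = rankList.config || {};
        if (typeof rankList.custom !== "function" ||
            cfg.bboxW === undefined || cfg.bboxH === undefined)
            problems.push("custom rankList requires custom, config.bboxW and config.bboxH");
    }
    if (hover.boundary !== undefined && !["bbox", "convexhull"].includes(hover.boundary))
        problems.push("boundary must be bbox or convexhull");
    return problems;
}

// The marks component of the NBA example passes all checks.
const nbaMarks = {
    cluster: { mode: "circle" },
    hover: {
        rankList: {
            mode: "tabular",
            fields: ["home_team", "away_team", "home_score", "away_score"],
            topk: 3
        },
        boundary: "convexhull"
    }
};
console.log(checkMarks(nbaMarks)); // prints []
```

A check like this could run before compiling a project so that mistakes surface early, but the JSON schema remains the authoritative definition.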

  • config: a set of optional global parameters for customizing the SSV.

    • axis: a boolean representing whether axes are displayed. Defaults to false.
    • xAxisTitle: the title of the x axis. Defaults to layout.x.field.
    • yAxisTitle: the title of the y axis. Defaults to layout.y.field.
    • numLevels: number of zoom levels in the SSV. Defaults to 10.
    • topLevelWidth: width of the top level. Defaults to 1000.
    • topLevelHeight: height of the top level. Defaults to 1000.
    • zoomFactor: zoom factor between adjacent levels. Defaults to 2.
    • map: a boolean indicating whether an OpenStreetMap background is rendered. Defaults to true if layout.geo is present and false otherwise.
    • numberFormat: a D3 format specifier controlling how numbers are displayed in the SSV. Defaults to ~s (decimal notation with an SI prefix, rounded to significant digits, trimming trailing zeros).
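To get a feel for how numLevels, topLevelWidth, topLevelHeight and zoomFactor interact, the sketch below lists a canvas size per zoom level. It assumes that level 0 is the top level and that each level's width and height are those of the level above multiplied by zoomFactor; this is one reading of "zoom factor between adjacent levels", not a documented formula. levelSizes is a hypothetical helper, not a Kyrix-S function.

```javascript
// Hypothetical helper: canvas size per level under the stated assumption.
function levelSizes(config) {
    // Fill in the documented defaults for any parameter left unspecified.
    const c = Object.assign(
        { numLevels: 10, topLevelWidth: 1000, topLevelHeight: 1000, zoomFactor: 2 },
        config
    );
    const sizes = [];
    for (let level = 0; level < c.numLevels; level++) {
        const scale = Math.pow(c.zoomFactor, level);
        sizes.push({
            level: level,
            width: c.topLevelWidth * scale,
            height: c.topLevelHeight * scale
        });
    }
    return sizes;
}

const sizes = levelSizes({});
console.log(sizes[sizes.length - 1].width); // prints 512000
```

Under this assumption, the default settings give a bottom level that is 512 times wider than the top level.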

A Note on Memory Consumption

In the current release, Kyrix-S only works on a single node with enough main memory to hold all the data. To allocate memory to the kyrix container, run the following:

> sudo ./run-kyrix.sh --mavenopts -Xmx700m      # allocate 700MB memory to the kyrix container

If not specified, the default memory allocated is 512MB. As a rule of thumb, if the size of the raw data is X, you'll need to allocate 10X memory to the kyrix container.
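The rule of thumb above amounts to simple arithmetic. The following sketch turns a raw data size into the corresponding option string for run-kyrix.sh; suggestedMavenOpts is a hypothetical helper, and keeping the 512MB default as a lower bound is a choice made here, not something Kyrix-S prescribes.

```javascript
// Hypothetical helper: apply the 10X rule of thumb to a raw data size in MB.
function suggestedMavenOpts(rawDataMB) {
    // Never suggest less than the 512MB default.
    const heapMB = Math.max(512, Math.ceil(rawDataMB * 10));
    return "--mavenopts -Xmx" + heapMB + "m";
}

console.log(suggestedMavenOpts(70)); // prints --mavenopts -Xmx700m
```

For roughly 70MB of raw data this reproduces the 700MB allocation used in the command above.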

We do have a multi-node Kyrix-S that can scale to billions of objects. We are testing it more thoroughly and plan to include it in a future release. Stay tuned!