Releases: bullet-db/bullet-core

Supports extended field extraction for Filters and Aggregations

21 Nov 22:21

This release lets us use the new extended extraction of fields supported by bullet-record-0.2.2.

Filters and Aggregations can now use the extended dot notation to access Maps, Maps of Maps or Lists and Lists of Maps.
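As a rough illustration of what extended dot notation means, here is a minimal sketch that walks nested Maps and Lists one dot-separated segment at a time. The class and method names are hypothetical and this is not the bullet-record API. It also assumes that a numeric segment (as in L.0.c) indexes into a List.

```java
import java.util.List;
import java.util.Map;

final class DotNotationSketch {
    // Hypothetical helper (not the bullet-record API): resolves an identifier
    // such as "B.c" or "L.0.c" against a record modelled as nested Maps and Lists.
    static Object extract(Map<String, Object> record, String identifier) {
        Object current = record;
        for (String segment : identifier.split("\\.")) {
            if (current instanceof Map) {
                // A Map segment is treated as a key lookup
                current = ((Map<?, ?>) current).get(segment);
            } else if (current instanceof List) {
                // Assumption: a List segment is a zero-based index
                List<?> list = (List<?>) current;
                int index;
                try {
                    index = Integer.parseInt(segment);
                } catch (NumberFormatException e) {
                    return null;
                }
                current = (index >= 0 && index < list.size()) ? list.get(index) : null;
            } else {
                // Null or a primitive value cannot be navigated any further
                return null;
            }
        }
        return current;
    }
}
```

Any identifier that cannot be resolved yields null in this sketch, rather than an error.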

Partially addresses #47.

QueryManager#size and hasQuery(String)

19 Nov 23:44

A minor release adding two new helpers: int size() and boolean hasQuery(String) to the QueryManager.

Another category in QueryCategorizer

16 Nov 19:56

This release adds a new category for Queriers in QueryCategorizer. After categorizing queries (with or without a BulletRecord), you can use QueryCategorizer#getHasData to get the queries that have new data. This is in addition to the getDone, getRateLimited and getClosed queriers.

Query Partitioner and other changes

07 Nov 02:18

This release:

  1. Adds a Partitioner concept to bullet-core and an implementation: SimpleEqualityPartitioner
  2. Adds a few new validator helper methods to com.yahoo.bullet.common.Validator and a couple more helpers to BulletConfig
  3. Moves the FilterOperations, ProjectionOperations, AggregationOperations, WindowingOperations and PostAggregationOperations to a nested package com.yahoo.bullet.querying.operations
  4. Rewrites StringFilterClause instances to ObjectFilterClause on Query#configure(BulletConfig)
  5. Removes the AutoCloseable interface from com.yahoo.bullet.pubsub.PubSub
  6. Adds a hasNull abstract method to com.yahoo.bullet.parsing.Clause and makes the RELATIONALS and LOGICALS member variables a Set instead of a List

Partitioner

In the filtering stage, current Bullet implementations take all the queries (Querier instances) and feed every BulletRecord to each one. With partitioning, the goal is to minimize the number of queries that need to see a given BulletRecord (ideally, only the queries that need to see a record see it). This release adds an interface, com.yahoo.bullet.querying.partitioning.Partitioner, and provides a com.yahoo.bullet.querying.partitioning.SimpleEqualityPartitioner implementation that ONLY works for queries with equality filters (==) connected by AND operations on singular values.

For example, suppose queries to your instance of Bullet commonly have equality filters ANDed on the fields A, B.c and D, such as the following query:

A == foo AND B.c == bar AND D == null

Partitioning on the fields [A, B.c, D] makes sure that records with the values foo, bar and null for those fields are seen only by the queries that have those filters (or subsets of them), including queries with no filters on these fields and queries with no filters at all.

SimpleEqualityPartitioner#getKeys(Query) returns a set of size 1, so queries need not be duplicated after partitioning. However, SimpleEqualityPartitioner#getKeys(BulletRecord) will return a set of keys representing the queries that this record needs to be presented to. The size of this set can be up to 2^(# of fields), where the keys cover all combinations of null and the actual value in the record for each field. If the fields are [A, B.c, D] and a record has these values: [A: foo, B.c: bar, D: baz, ...], the keys will be the concatenation of the items in each of the following tuples using the user-configured delimiter (not necessarily in this order):

[foo, bar, baz]
[foo, null, baz]
[foo, bar, null]
[foo, null, null]
[null, bar, baz]
[null, bar, null]
[null, null, baz]
[null, null, null]

Using these keys and presenting the record to all the queries with the same key will ensure that the record is seen only by the queries that need to see it. This can be a huge saving, depending on the cardinality of these fields and the number of queries that get removed from the overall pool of queries.
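To make the key enumeration concrete, here is a minimal sketch of how the up to 2^(# of fields) record keys could be generated. It is not the SimpleEqualityPartitioner source: the class and method names are hypothetical, and it assumes the literal string "null" stands in for a null value when the key is joined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class PartitionKeySketch {
    // Hypothetical helper: for n field values, builds every combination in which
    // each position holds either the record's value or null, joined by the delimiter.
    static Set<String> keysFor(List<String> values, String delimiter) {
        Set<String> keys = new LinkedHashSet<>();
        int n = values.size();
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> parts = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                // A set bit in the mask replaces the value at position i with null
                boolean keepValue = (mask & (1 << i)) == 0;
                parts.add(keepValue ? String.valueOf(values.get(i)) : "null");
            }
            // A Set drops duplicate keys, which occur when the record itself has nulls
            keys.add(String.join(delimiter, parts));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Values of the fields [A, B.c, D] from the example record
        keysFor(Arrays.asList("foo", "bar", "baz"), "|").forEach(System.out::println);
    }
}
```

Because the result is a set, a record that already has null for some of the fields produces fewer than 2^(# of fields) keys.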

Note: If you use the partitioner when your workload of queries doesn't have these kinds of filters, nothing bad will happen, but you will do unnecessary work to compute that your queries don't fit the partitioner and place most, if not all, of them in the default partition.

To configure a Partitioner, these new settings with their defaults have been added:

bullet.query.partitioner.enable: false
bullet.query.partitioner.class.name: "com.yahoo.bullet.querying.partitioning.SimpleEqualityPartitioner"
bullet.query.partitioner.equality.fields: null
bullet.query.partitioner.equality.delimiter: "|"

See the settings for a full explanation.

AutoCloseable Pubsub Components, HttpClient 4.3.6

21 Oct 22:38

This release is a small improvement to the PubSub interfaces and it makes the com.yahoo.bullet.pubsub.{PubSub,Publisher,Subscriber} classes implement the AutoCloseable interface.

It also bumps the org.apache.httpcomponents.HttpClient dependency to 4.3.6 for a security bugfix.

Better Order By, Smaller Serializations, Transient Fields

26 Sep 00:32

This release changes the Order By post aggregation: you can now specify only one Order By in the list of post aggregations. Instead of a list of string field names and a single ascending or descending direction, you now specify a list of field objects, each with a name and a direction.

Order By changes

Before:

{
  "filters": {},
  "aggregation": {},
  "postAggregations": [ 
    { "type": "ORDERBY", "fields": ["A", "B"], "direction": "DESC" }
 ]
}

This only let you sort a set of fields in one direction. It also let you specify other order bys in the post aggregations that would destroy the ordering of the first.

Now:

{
  "filters": {},
  "aggregation": {},
  "postAggregations": [ 
    { "type": "ORDERBY", "fields": [ { "field": "A", "direction": "ASC" }, { "field": "B", "direction": "DESC" } ] }
 ]
}

This lets you sort by multiple fields in different directions, using the later fields to break ties. It is now an error to have multiple order bys in the list of post aggregations. This closes #54.
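The tie-breaking behaviour can be sketched as a chain of per-field comparators. This is only an illustration with records modelled as Maps. The SortItem class and helper names are hypothetical and not part of bullet-core, and sorting null values first is an assumption of the sketch.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrderBySketch {
    // Hypothetical stand-in for one { "field": ..., "direction": ... } entry
    static final class SortItem {
        final String field;
        final boolean descending;

        SortItem(String field, boolean descending) {
            this.field = field;
            this.descending = descending;
        }
    }

    // Assumption of this sketch: null values sort before non-null values
    @SuppressWarnings("unchecked")
    static int compareValues(Object x, Object y) {
        if (x == null || y == null) {
            return x == y ? 0 : (x == null ? -1 : 1);
        }
        return ((Comparable<Object>) x).compareTo(y);
    }

    // Builds one comparator where each later field only breaks ties left by earlier ones
    static Comparator<Map<String, Object>> comparatorFor(List<SortItem> items) {
        Comparator<Map<String, Object>> result = (a, b) -> 0;
        for (SortItem item : items) {
            Comparator<Map<String, Object>> byField =
                (a, b) -> compareValues(a.get(item.field), b.get(item.field));
            result = result.thenComparing(item.descending ? byField.reversed() : byField);
        }
        return result;
    }

    // Convenience for building a two-field example record
    static Map<String, Object> row(Object a, Object b) {
        Map<String, Object> record = new HashMap<>();
        record.put("A", a);
        record.put("B", b);
        return record;
    }
}
```

Sorting a list of records with comparatorFor for A ascending and B descending orders by A first and only consults B when two records have equal A.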

Projection

Previously, if the projection field in a Query was null or the fields in a projection were null or empty, all the fields would be projected regardless of whether a computation post aggregation was added to create new fields. It is now possible to just get your post aggregation fields by adding a projection in this manner:

{
  "projection": { "fields": { } },
  "postAggregations": [{ 
    "type": "COMPUTATION", 
    "expression": { 
      "left": { "value": { "kind": "FIELD", "value": "a" } }, 
      "right":  { "value": { "kind": "VALUE", "value": "5" } },
      "operation": "+"
    }
  }]
}

This will produce just the value of a + 5 in your result.

In the future, we will use #57 to support pre-aggregation computations that might express this in a more natural manner.

Other changes

This release also fixes #53 and drops the size of the serialized Querier object. This should reduce some overhead in state management in https://github.com/bullet-db/bullet-spark.

There is also an internal change to support transient fields, which are needed for post aggregations.

Post Aggregations!

15 Sep 02:29

This release adds post aggregations to Bullet queries! It also adds casting support to both filters and post aggregations.

Casting

Filters allowed you to specify a list of values with the following format:

  "value": {
    "kind": "FIELD | VALUE",
    "value": "foo.bar"
  }

Now a new key, type, has been added (in addition to kind and value), which you can use to type cast the constant or the extracted field. Note that if the casting fails, this filter will be ignored.

Post Aggregations

Post aggregations are specified by adding a new top-level field to the query: postAggregations, which is a list of the various post aggregations. The order of the various post aggregations in this list determines how they are evaluated. Post aggregations can refer to previous results of post aggregations in the list to chain them.

{
  "filters": [], 
  "aggregation": {}, 
  "postAggregations": []
}

We start with two kinds of post aggregations:

1. ORDER BY

This orders result records based on given fields (in ascending order by default). To sort the records in descending order, use the DESC direction. You can specify any fields in each record or from previous post aggregations. Note that the ordering is fully typed, so the types of the fields will be used. If multiple fields are specified, ties will be broken from the list of fields from left to right.

{
  "type": "ORDERBY",
  "fields": ["A", "B"],
  "direction": "DESC"
}

2. COMPUTATION

This lets you perform arithmetic on the results in a fully nested way. We currently support +, -, * and / as operations. The format for this is:

{
  "type": "COMPUTATION",
  "expression": {}
}

2.1 Expressions

For future extensibility, the expression in this post aggregation is free form. Currently, we support the binary arithmetic operations that can be nested (implying parentheses). This forms a tree of expressions. The leaves of this tree resolve atomic values such as fields or constants. So, there are two kinds of expressions.

A) Binary Expressions
{
  "operation": "+",
  "left": {},
  "right": {},
  "type": "INTEGER | FLOAT | BOOLEAN | DOUBLE | LONG | STRING"
}

where left and right are themselves expressions and type is used to force cast the result to the given type.

B) Unary Expressions
{
  "value": {
    "kind": "FIELD | VALUE",
    "value": "foo.bar",
    "type": "INTEGER | FLOAT | BOOLEAN | DOUBLE | LONG | STRING"
  }
}

This is the same value definition used for filtering mentioned above. It can be used to extract fields from the record as your chosen type or to use constants as your chosen type.

If casting fails in any of the expressions, the expression is ignored.
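The tree of expressions can be sketched as follows. This is a simplified illustration and not the bullet-core implementation: all names are hypothetical, every value is evaluated as a double rather than honoring the type key, and a null result stands in for an expression that is ignored because a value could not be cast.

```java
import java.util.Map;

final class ExpressionSketch {
    // Hypothetical expression node: evaluates against one record, null means "ignored"
    interface Expression {
        Double evaluate(Map<String, Object> record);
    }

    // Leaf with kind FIELD: extracts the named field and casts it to a double
    static Expression field(String name) {
        return record -> toDouble(record.get(name));
    }

    // Leaf with kind VALUE: a constant given as a string, as in the JSON format
    static Expression constant(String value) {
        return record -> toDouble(value);
    }

    // Binary node: left and right are themselves expressions, so nesting implies parentheses
    static Expression binary(char operation, Expression left, Expression right) {
        return record -> {
            Double l = left.evaluate(record);
            Double r = right.evaluate(record);
            if (l == null || r == null) {
                return null;
            }
            switch (operation) {
                case '+': return l + r;
                case '-': return l - r;
                case '*': return l * r;
                case '/': return l / r;
                default: return null;
            }
        };
    }

    // Simplification: one cast target (double); a failed cast yields null
    static Double toDouble(Object value) {
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        if (value == null) {
            return null;
        }
        try {
            return Double.valueOf(value.toString());
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

For instance, binary('+', field("a"), binary('/', constant("1.2"), constant("1"))) models a + (1.2 / 1), and a record without a field a makes the whole expression evaluate to null in this sketch.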

Example

Putting all these together, here is a complete example of post aggregation. Stay tuned for bullet-bql-0.2.0 for the much more concise BQL version of these. This first computes, with forced casts, a new field C: CAST(foo.bar, LONG) + CAST((CAST(1.2, DOUBLE) / CAST(1, INTEGER)), FLOAT), or C: foo.bar + (1.2 / 1) without the casts, for each record in the result window and then orders the result by foo.baz first and then by the new field C.

{
   "postAggregations":[
      {
         "type":"COMPUTATION",
         "expression":{
            "operation":"+",
            "left":{
               "value":{
                  "kind":"FIELD",
                  "value":"foo.bar",
                  "type":"LONG"
               }
            },
            "right":{
               "operation":"/",
               "left":{
                  "value":{
                     "kind":"VALUE",
                     "value":"1.2",
                     "type":"DOUBLE"
                  }
               },
               "right":{
                  "value":{
                     "kind":"VALUE",
                     "value":"1",
                     "type":"INTEGER"
                  }
               },
               "type":"FLOAT"
            },
            "newName":"C"
         }
      },
      {
         "type":"ORDERBY",
         "fields":[
            "foo.baz", "C"
         ],
         "direction":"ASC"
      }
   ]
}

Sliding window, filtering against other fields, SIZEIS, CONTAINSKEY, and CONTAINSVALUE

06 Sep 01:30

Sliding Window

This release lets you specify an Emit Type Record window with Include Type Record and Last set to more than 1. This is the general case of the Reactive window - a Sliding window. As before, we only support this currently for Raw aggregations.
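The sliding behaviour can be sketched with a bounded queue. This is an illustration only and not bullet-core's windowing code: the class name is hypothetical, and it assumes the window emits on every record and includes the last N records, with N = 1 giving the Reactive case.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class SlidingWindowSketch<T> {
    private final int last;
    private final Deque<T> window = new ArrayDeque<>();

    // last is the number of most recent records to include in each emitted window
    SlidingWindowSketch(int last) {
        if (last < 1) {
            throw new IllegalArgumentException("last must be at least 1");
        }
        this.last = last;
    }

    // Consumes one record and returns the window to emit: the most recent
    // `last` records, oldest first. Assumption: one emit per consumed record.
    List<T> consume(T record) {
        window.addLast(record);
        if (window.size() > last) {
            window.removeFirst();
        }
        return new ArrayList<>(window);
    }
}
```

Until `last` records have arrived, the emitted window in this sketch is simply shorter.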

New Relational operations

Addresses #36. We have added support for three new relational filter operations:

  1. SIZEIS - lets you check if a complex field (MAP, LIST) has a given size. You can nest this in a NOT logical operator to get SIZE NOT IS. If you apply this to a STRING field, the length of the string is used for the comparison.

  2. CONTAINSKEY - lets you check if a MAP field contains the given key. This MAP field can be a top level MAP field or a MAP field inside a LIST of MAPs.

  3. CONTAINSVALUE - lets you check if a MAP field or a LIST field contains the given value. In the case of a LIST: if the LIST contains primitive values, they are used to compare against the given value and if it is a LIST of MAPs, the values of the MAPs are checked.
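The semantics described for the three operations can be sketched as plain Java predicates. This is an approximation for illustration, not the bullet-core filter code: the class and method names are hypothetical, and the sketch ignores casting of the given value.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class RelationalSketch {
    // SIZEIS: size of a MAP or LIST, or the length of a STRING
    static boolean sizeIs(Object field, int size) {
        if (field instanceof Map) {
            return ((Map<?, ?>) field).size() == size;
        }
        if (field instanceof Collection) {
            return ((Collection<?>) field).size() == size;
        }
        if (field instanceof String) {
            return ((String) field).length() == size;
        }
        return false;
    }

    // CONTAINSKEY: a top level MAP, or any MAP inside a LIST of MAPs
    static boolean containsKey(Object field, String key) {
        if (field instanceof Map) {
            return ((Map<?, ?>) field).containsKey(key);
        }
        if (field instanceof List) {
            for (Object item : (List<?>) field) {
                if (item instanceof Map && ((Map<?, ?>) item).containsKey(key)) {
                    return true;
                }
            }
        }
        return false;
    }

    // CONTAINSVALUE: values of a MAP, primitives in a LIST, or values of MAPs in a LIST
    static boolean containsValue(Object field, Object value) {
        if (field instanceof Map) {
            return ((Map<?, ?>) field).containsValue(value);
        }
        if (field instanceof List) {
            for (Object item : (List<?>) field) {
                boolean found = item instanceof Map
                    ? ((Map<?, ?>) item).containsValue(value)
                    : Objects.equals(item, value);
                if (found) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Negating sizeIs corresponds to nesting SIZEIS in a NOT logical operator.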

Filter against other fields

You can now filter against other fields, not just constants!

Relational filters (not Logical) have been extended to support a new format:

Old Relational Filter
{
   "operation": "SIZEIS",
   "field": "foo",
   "values": ["1"] 
}
New Relational Filter
{
   "operation": "SIZEIS",
   "field": "foo",
   "values": [{ "kind": "VALUE", "value": "1"}] 
}

The old format is still accepted. However, you may not mix and match both styles in the same values list.

To compare to another field, simply use

{
   "kind": "FIELD",
   "value": "foo.bar"
}

This feature will let us support casting (#37).

[BUG] - RESTPubSub Closes All HTTP Connections

27 Jun 00:35

This release fixes a bug in the RESTPubSub to ensure that the RESTPublisher and RESTSubscriber close all HTTP connections when they are finished posting or getting data from the PubSub REST endpoints.

It also changes a default setting to:

bullet.query.rate.limit.max.emit.count: 50

So now, by default, the backend will return a rate-limit error at 50 messages (or more) every 100 milliseconds.

Added RESTPublisher HTTP Timeout Setting

23 Jun 01:02

This release adds a RESTPubSub setting to configure an HTTP timeout for the RESTPublisher. The previous timeout setting only applied to the RESTSubscriber.

The two settings to configure the HTTP timeout for RESTSubscriber and RESTPublisher are now:

bullet.pubsub.rest.subscriber.connect.timeout.ms
bullet.pubsub.rest.publisher.connect.timeout.ms

Both will default to 5 seconds if not included in your settings file.

The old bullet.pubsub.rest.connect.timeout.ms setting should be removed from your settings file.