All provided filters use FilterBase as their foundation. It is a base class meant to be extended, and it provides common facilities used by filters.

FilterBase is based on Transform. It operates in object mode, editing token streams produced by a parser or another filter.

API

This document describes the user-facing interface only. If you want to build your own filter, feel free to inspect the code to gain more insights.

Internally, FilterBase keeps track of objects by building a stack. Items of the stack can be:

  • Number. In this case, a corresponding object is an array, and the number is the current index.
  • String. In this case, a corresponding object is an object, and the string is the current property key.
  • null. In this case, a corresponding object is an object, but keys are not tracked. FilterBase keeps track of keys only if the previous stream produces packed keys (keyValue items). When it does not, a filter assumes that only the object's shape will be used for filtering.

The stack is used to perform filtering.
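
For example, assuming the upstream stream packs keys (produces keyValue items), the stack evolves like this:

const obj = {a: {b: [10, 20]}, c: true};

// while the token for 10 is processed   -> stack: ['a', 'b', 0]
// while the token for 20 is processed   -> stack: ['a', 'b', 1]
// while the token for true is processed -> stack: ['c']

// If keys are not packed, string items are replaced with null,
// e.g. [null, null, 0] instead of ['a', 'b', 0].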

constructor(options)

options is an optional object described in detail in Node.js' Stream documentation. Additionally, the following optional custom properties are recognized:

  • pathSeparator is a string that separates stack values when the stack is converted to a string. The algorithm is straightforward: stack.join(pathSeparator). The default: '.'.
    const obj = [{a: 1}, {b: 2}];
    
    // stack when filtering 1: [0, 'a']
    // converted to a string: '0.a'
    
    // stack when filtering 2: [1, 'b']
    // converted to a string: '1.b'
  • filter is a way to accept or reject a data item. The interpretation of its returned value is up to concrete filter objects. Its value can be one of the following types:
    • String. The stack is converted to a string using pathSeparator; the result should either be equal to the filter value, or be longer and start with the filter value immediately followed by pathSeparator.
      const obj = {a: [1, 2], ab: null};
      
      const filter = 'a';
      // it fits ['a'], ['a', 0], and ['a', 1], but not ['ab']
    • RegExp. The stack is converted to a string using pathSeparator, then the filter is applied using filter.test(path).
      const obj = {a: [1, 2], ab: null};
      
      const filter = /^a\b/;
      // it fits ['a'], ['a', 0], and ['a', 1], but not ['ab']
      
      const filter = /^a/;
      // it fits ['a'], ['a', 0], ['a', 1], and ['ab']
    • Function. The filter is applied as filter(stack, chunk), where chunk is the data item being filtered. The function is called in the context of the current filter object. It should return a truthy/falsy value.
    • The default: () => true.
  • once is a flag. When it is truthy, a filter object will make a selection (depending on its definition of selection) only once. Otherwise, all selections are included. The default: false.
    • It can be used as an optimization feature when we know that our stream contains exactly one object we want to do our action on.
  • replacement is what should be used instead of skipped objects. Not all filters use this option. Its value can be one of the following types:
    • Function. The replacement is produced by calling replacement(stack, chunk), where chunk is the data item being filtered. The function is called in the context of the current filter object. It should return an array of semantically valid data items.
    • Otherwise, it is assumed to be a static array of semantically valid data items.
    • The default: a null data item — [{name: 'nullValue', value: null}].
  • allowEmptyReplacement is a flag. It explicitly allows or disallows replacing removed values with an empty array.
    • The problem is that when streaming an object, a key may already have been streamed by the time a filter decides to remove the corresponding value (replace it with an empty array), which would produce an invalid JSON stream. To avoid this, when allowEmptyReplacement is falsy, a filter checks the length of the replacement array and substitutes the default (usually the null data item) if it is empty.
    • If a source stream packs keys, the problem can be avoided by delaying streaming keys. When allowEmptyReplacement is true, a filter will use this algorithm to stream keys.
  • Streaming flags. They are used only when a filter streams delayed keys (allowEmptyReplacement is true). Both of them are here for compatibility with Parser. See details in Parser's options.
    • streamValues is assigned first.
    • streamKeys is assigned next. When its effective value is falsy, no startKey, stringChunk, nor endKey are produced. Only keyValue is issued.
    • The default: true.
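
Here is a minimal usage sketch. It assumes one of the concrete filters built on FilterBase (Filter from stream-json), stream-chain for piping, and a hypothetical input file sample.json:

const fs = require('fs');
const {chain} = require('stream-chain');
const {parser} = require('stream-json');
const {filter} = require('stream-json/filters/Filter');

// Keep only the part of the input under the top-level 'data' key:
// the string filter 'data' matches ['data'], ['data', 0], ['data', 'x'], and so on.
const pipeline = chain([
  fs.createReadStream('sample.json'),
  parser(),
  filter({filter: 'data'})
]);

pipeline.on('data', item => console.log(item.name, item.value));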

Important details

Stack and path

When using a string or a regular expression as a filter, the stack is converted to a path string before the filter can be applied. It should be noted that when a source stream does not produce keyValue data items, the stack uses null to denote an undefined property key, which is converted to an empty string in the path:

[].join('.')
// produces: ''

[null].join('.')
// produces: ''

[null, null].join('.')
// produces: '.'

[null, 1, null].join('.')
// produces: '.1.'

[1, null, null, null, 2, null].join('.')
// produces: '1....2.'

Be aware of this behavior when crafting filters.

Property keys can be arbitrary strings, which can sometimes mess up paths and textual filters. To avoid this, you can choose a different pathSeparator. It can be any string you like; just make sure it works with your filters.

const filter = Filter({pathSeparator: '->'});
// it will produce paths like that:
// [1, 'a'] => '1->a'
// [1, 0, 'ab', 0] => '1->0->ab->0'

Replacement hazards

Filters do not check whether an array of replacement items is valid. Malformed arrays will produce invalid substreams, which can break the rest of the data pipeline. Be extra careful with the replacement and allowEmptyReplacement options.
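
For example, a well-formed replacement is a complete, balanced substream of data items (a sketch; token names follow the parser's output):

// Valid: a balanced substream representing an empty object.
const emptyObject = [
  {name: 'startObject'},
  {name: 'endObject'}
];

// Invalid: startArray without a matching endArray, so everything
// downstream will be misinterpreted.
const broken = [
  {name: 'startArray'}
];

// e.g., passed as {replacement: emptyObject} to a filter that supports it.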
