Skip to content

Latest commit

 

History

History
137 lines (94 loc) · 3.97 KB

README.md

File metadata and controls

137 lines (94 loc) · 3.97 KB

Spliterator 🎀

Spliterator is a TypeScript library for streaming delimited content such as CSV, TSV and JSONL.

Let's say you have a huge newline-delimited JSON file that can't fit into memory:

{"name": "Jessie", "age": 30}
{"name": "Kelly", "age": 40}
{"name": "Loren", "age": 50}
// Several hundred thousand more lines...

Spliterator can help you read it line-by-line without loading the entire file into memory:

import { JSONSpliterator } from "spliterator"

interface Person {
	name: string
	age: number
}

const reader = JSONSpliterator.fromAsync("example.jsonl")

for await (const line of reader) {
	console.log(line) // {"name": "Alice", "age": 30}, etc.
}

NPM Version NPM License

Installation

yarn add spliterator
# or
npm install spliterator

Usage

Character-delimited files

While Spliterator supports any delimited byte stream, it's particularly useful for character-delimited content such as comma-separated values (CSV), tab-separated values (TSV) – or any other delimiter you can think of.

Full Name, Occupation, Age
Morgan, Developer, 30
Nataly, Designer, 40
Orlando, Manager, 50
import { CSVSpliterator } from "spliterator"

const reader = CSVSpliterator.fromAsync("people.csv")

for await (const columns of reader) {
	console.log(columns) // ["Full Name", "Occupation", "Age"], ["Morgan", "Developer", 30], etc.
}

CSV files can also be emitted as objects with headers as keys, with some quality-of-life features, such as normalizing property keys:

import { CSVSpliterator } from "spliterator"

interface Person {
	full_name: string
	occupation: string
	age: number
}

const reader = CSVSpliterator.fromAsync<Person>("people.csv", { mode: "object" })

for await (const columns of reader) {
	console.log(columns) // { full_name: "Morgan", occupation: "Developer", age: 30 }, etc.
}

CLI Usage

Spliterator also includes a CLI tool that can be used to stream delimited content from the command line, transform it, filter it, and more.

spliterator csv people.csv people.jsonl

The CLI also supports reading from standard input:

cat people.csv | spliterator csv people.jsonl

For information on all available commands, run spliterator --help.

Advanced Usage

Spliterator includes a collection of low-level classes and interfaces that can be used to create custom generators for any kind of delimited content.

For more advanced usage, check out our tests in the test directory, or our fully-annotated source code.

Reading from a stream

All included Spliterators implement the Generator and AsyncGenerator interfaces, so you can use them in for...of and for await...of loops, as well the web-native ReadableStreams, so you can use them in for await...of loops, as well as piping them through transformations to avoid nested and partially materialized streams.

import { JSONSpliterator } from "spliterator"

const people = [
	{ name: "Alice", age: 30 },
	{ name: "Bob", age: 40 },
	{ name: "Charlie", age: 50 },
]

const generator = JSONSpliterator.from(people.map(JSON.stringify).join("\n"))
const stream = ReadableStream.from(generator)

for await (const line of stream) {
	console.log(line) // {"name": "Alice", "age": 30}, etc.
}

Custom generators

While Spliterator includes premade exports for most use-cases, custom generators can be created via Spliterator and AsyncSpliterator. This class is a low-level interface that allows you to create your own generators for any kind of delimited content.

License

Spliterator is licensed under the AGPL-3.0 license. Generally, this means that you can use the software for free, but you must share any modifications you make to the software.

For more information on commercial usage licensing, please contact us at [email protected]