JavaScript-only data library providing functionality like the DataFrame in Pandas or R.

This project is at the discussion stage - please leave comments and suggestions.

Key Features

See also #3

  • Access rows and fields within rows quickly and easily using a convenient syntax e.g. dataset[rowid], dataset[rowid][fieldid] -- preferably both dictionary and index style access
  • Query data and "freeze" it to a new DataFrame
  • Import data into a DataFrame from common sources and especially Tabular Data Packages ... (if possible do this without re-inventing the wheel by leveraging work elsewhere)
  • Be able to cope with large amounts of data e.g. 100k or 1m row CSV (see issue #6 for discussion)

var d = DataFrame(dataSourceOrRawData);
// first row of data
d[0];
// data fields / columns
d.fields;
// query the data
d.query(querySpec);
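
To make the proposed syntax more concrete, here is a minimal sketch of how index and dictionary style access, `fields`, and a "frozen" query result might fit together. Everything in it is an assumption for discussion rather than an agreed API: the `{ fields, rows }` input shape, the `length` property, and the `querySpec` shape (`filter` and `limit`) are all hypothetical.

```javascript
// Hypothetical sketch: none of these names are a published API.
function DataFrame(data) {
  // Allow calling without `new`, matching `var d = DataFrame(...)`.
  if (!(this instanceof DataFrame)) {
    return new DataFrame(data);
  }
  var self = this;
  var rows = data.rows || [];
  self.fields = data.fields.slice();
  self.length = rows.length;

  rows.forEach(function (values, rowIndex) {
    // Each row is an array (index access) with named properties
    // added on top (dictionary access). Field names that collide with
    // array properties such as `length` would need special handling.
    var row = values.slice();
    self.fields.forEach(function (name, fieldIndex) {
      row[name] = values[fieldIndex];
    });
    self[rowIndex] = Object.freeze(row);
  });
}

// Assumed querySpec shape: { filter: function (row), limit: number }.
DataFrame.prototype.query = function (querySpec) {
  var matched = [];
  for (var i = 0; i < this.length; i++) {
    if (querySpec.limit !== undefined && matched.length >= querySpec.limit) {
      break;
    }
    if (!querySpec.filter || querySpec.filter(this[i])) {
      // Copy only the indexed cells; the constructor re-adds the names.
      matched.push(Array.prototype.slice.call(this[i]));
    }
  }
  // The result is a new, frozen DataFrame independent of the original.
  return Object.freeze(new DataFrame({ fields: this.fields, rows: matched }));
};

// Usage with made-up data.
var d = DataFrame({
  fields: ['id', 'name', 'score'],
  rows: [[1, 'alice', 90], [2, 'bob', 55], [3, 'carol', 72]]
});
var firstName = d[0].name;   // dictionary style
var sameName = d[0][1];      // index style
var top = d.query({ filter: function (row) { return row.score > 60; } });
```

Copying every row into the object keeps the sketch short, but it is unlikely to be the right storage choice for the 100k or 1m row case mentioned above.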

Context

I built something like this before with Recline and its Dataset object. There's also the work in Miso, which did a very nice job on the Dataset object. However, neither Recline nor Miso got it quite right. It's time to look at this again.

There's also a connection with Tabular Data Packages and JSON Table Schema and associated tooling such as the various Data Package JavaScript libraries. Manipulating the tabular data from a Tabular Data Package in JavaScript will need some kind of library and object, and a DataFrame library could provide this.
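
As one illustration of that connection, a JSON Table Schema descriptor lists `fields`, each with a `name` and a `type`, and a DataFrame library could use it to turn raw string cells into typed values. The sketch below assumes this; `castRows` and `casters` are hypothetical names, only four of the specification's types are handled, and the boolean handling is deliberately simplified.

```javascript
// Hypothetical helper: cast string cells using a JSON Table Schema-style
// descriptor. Unknown types fall back to leaving the string untouched.
var casters = {
  string: function (value) { return value; },
  integer: function (value) { return parseInt(value, 10); },
  number: function (value) { return parseFloat(value); },
  // Simplified: only the literal string 'true' counts as true.
  boolean: function (value) { return value === 'true'; }
};

function castRows(schema, rows) {
  return rows.map(function (row) {
    return schema.fields.map(function (field, index) {
      var cast = casters[field.type] || casters.string;
      return cast(row[index]);
    });
  });
}

// Usage with a made-up schema and rows.
var schema = {
  fields: [
    { name: 'id', type: 'integer' },
    { name: 'price', type: 'number' },
    { name: 'inStock', type: 'boolean' }
  ]
};
var typedRows = castRows(schema, [['1', '9.5', 'true'], ['2', '12', 'false']]);
// typedRows[0] is [1, 9.5, true]
```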

Research

Suggestions welcome: please open pull requests or issues.

Existing Libraries in JavaScript

See this issue: #5

What is the Full Stack for Data?

DataFrame only covers a part of the "data stack":

  • DataFrame object for holding "rows" of data
    • Might add a Dataset (or DataPackage) object as a way to represent an overall Dataset with metadata and possibly multiple DataFrames (plus other info)
  • DataQuery - querying data efficiently, storing and reifying queries (not sure whether this is needed separately from e.g. DataFrame)
  • Connectors - data import / export from other sources ranging from CSV to RDBMS and more. Again, these are probably not part of DataFrame but separate libraries. This is a particular area where the connection with Tabular Data Package and JSON Table Schema is strong.
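
To illustrate the idea of a connector living outside DataFrame, here is a rough sketch of a CSV importer that produces a plain `{ fields, rows }` object. The function name `parseSimpleCsv` and the output shape are assumptions made for this example, and the parsing is deliberately naive: a real connector would more likely reuse an existing CSV library.

```javascript
// Hypothetical connector: turn CSV text into { fields, rows }.
// Naive by design: it splits on commas and newlines, so quoted fields
// containing commas or line breaks are not handled.
function parseSimpleCsv(text) {
  var lines = text.split(/\r?\n/).filter(function (line) {
    return line.length > 0;
  });
  // The first non-empty line is treated as the header row.
  var fields = lines.length > 0 ? lines[0].split(',') : [];
  var rows = lines.slice(1).map(function (line) {
    return line.split(',');
  });
  return { fields: fields, rows: rows };
}

// Usage with made-up data; all cells come back as strings.
var parsed = parseSimpleCsv('id,name\n1,alice\n2,bob\n');
// parsed.fields is ['id', 'name']
// parsed.rows is [['1', 'alice'], ['2', 'bob']]
```

Keeping the connector's output as plain data means the same importer could feed a DataFrame, a Dataset object, or anything else that accepts fields and rows.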

There is, of course, a lot more in the "data stack", such as the following:

  • Views / Visualization - data presentation (grids, graphs etc). Largely handled by third party libraries.
  • Validation
  • Analytics
  • ETL
  • etc

These might use DataFrame, but they are not part of DataFrame itself.

Contributing

We use mocha for testing, with expect.js for assertions. Install them as follows:

npm install -g expect.js
npm install -g mocha

Then to run tests:

mocha