Spec for model/cube #855

pwalsh · 2015-12-30T10:00:46Z

Fiscal Data Package has a mapping object. This is very handy for building a logical model out of the physical data sources when appropriate. This logical model can in turn be used to automate visualisations and data loaders, for example.

Actually, there is nothing particularly "Fiscal" about this mapping: it is simply an OLAP cube implementation with measures and dimensions. I think we could extract out the generic pattern and expose it as a spec for declaring a model/cube mapping for any tabular data package.
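To make the generic pattern concrete, here is a rough Python sketch. It is not the FDP mapping format itself: the `model` dict, the column names, and the `aggregate` helper are all made up for illustration. The idea is only that a declaration of dimensions and measures is enough to drive a generic aggregation over any tabular resource.

```python
from collections import defaultdict

# Hypothetical logical model over a tabular resource: which physical
# columns act as dimensions (group-by keys) and which as measures.
model = {
    "dimensions": ["year", "department"],
    "measures": ["amount"],
}

# Illustrative rows only; not taken from any real dataset.
rows = [
    {"year": 2015, "department": "health", "amount": 100.0},
    {"year": 2015, "department": "health", "amount": 50.0},
    {"year": 2015, "department": "education", "amount": 70.0},
    {"year": 2016, "department": "health", "amount": 30.0},
]


def aggregate(rows, model, by):
    """Sum every declared measure, grouped by a chosen subset of dimensions."""
    unknown = [d for d in by if d not in model["dimensions"]]
    if unknown:
        raise ValueError(f"not declared as dimensions: {unknown}")
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        key = tuple(row[d] for d in by)
        for measure in model["measures"]:
            totals[key][measure] += row[measure]
    return {key: dict(values) for key, values in totals.items()}


by_year = aggregate(rows, model, by=["year"])
by_year_and_department = aggregate(rows, model, by=["year", "department"])
```

A loader or visualisation tool could be written once against a declaration like `model` rather than against each dataset's physical column layout.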

danfowler · 2016-01-04T23:02:52Z

+1 I was thinking in a similar direction (that the mapping/model of FDP should be made generic) when I posted this comment.

s-celles · 2016-02-29T16:01:39Z

+1 also see https://discuss.okfn.org/t/datapackage-for-3-dimensional-arrays-and-maybe-more/2107/2

pwalsh (Author) · 2016-07-12T09:00:10Z

@rgrp and all

I'm into this idea of course, but I'd rather see how this plays out with a views spec. Happy to leave this open for a while though while views gets worked on.

rufuspollock (Maintainer) · 2016-08-11T14:34:35Z

WONTFIX. I'm going to close as wontfix for now and we can re-open if there is interest / need.

danfowler · 2017-02-07T21:28:37Z

I know this is closed, but I just came across this which seems relevant:

https://json-stat.org/

> The JSON-stat format is a simple lightweight JSON format for data dissemination. It is based in a cube model that arises from the evidence that the most common form of data dissemination is the tabular form. In this cube model, datasets are organized in dimensions. Dimensions are organized in categories.

rufuspollock (Maintainer) · 2017-02-08T00:44:20Z

@danfowler thanks - and I am aware of them (I think this started out as a simple version of SDMX).

rufuspollock (Maintainer) · 2017-02-08T00:45:42Z

Re-opening. @pwalsh and I have discussed this recently; there is clear interest here and we'd like to start something in the nearish future.

/cc @ericbusboom

ericbusboom · 2017-02-13T19:30:54Z

I've been going through the JSON-stat website, and so far, I'm pretty sure that I don't understand it at all, and that none of my analysts would be able to create a JSON-stat file by hand. I can tell that the format depends on having several array properties all have the same length, basically breaking a conceptual object into separate fields, which seems like a maintenance nightmare. There is plenty to learn from here, but I don't think it is a good model for a design.

For my users, my top requirement is that it is easy to create and read the specifications. I want data creators to be able to annotate measures and dimensions from memory, with very little training. Data users must be able to understand the annotations with no training.

I have a strong preference for embedding the measure and dimension classifications into the schema, because it's easier to create and read. This can be as simple as:

1. Defining names in a taxonomy for the types of measures and dimensions
2. Attaching the names to columns in the existing schema

I imagine the names being mostly common terms like "dollars" or "yen" or "weight" or "sex".

I'd further propose that the names have a hierarchical structure to them, to allow for specification and extension. For instance 'weight/lbs' vs 'weight/kg' to distinguish units, or 'race/omb' vs 'race/census' to distinguish between different systems of standards for race.

But, it should also be possible for the user to annotate a column with just "weight." That's not ideal, but I've learned that getting 20% is better than getting 0%.
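As a rough illustration of what this could look like, here is a small Python sketch. It assumes a Table Schema-like descriptor in which each field may carry an extra `category` property; that property name, the field names, and both helpers are hypothetical and not part of any existing spec. It shows that a bare name like "weight" and a qualified one like "weight/lbs" can be handled by the same code.

```python
# Hypothetical descriptor: the "category" property is an assumption made
# for this sketch, not something defined by Table Schema.
schema = {
    "fields": [
        {"name": "wt", "type": "number", "category": "weight/lbs"},
        {"name": "race", "type": "string", "category": "race/omb"},
        {"name": "mass", "type": "number", "category": "weight"},
        {"name": "id", "type": "integer"},  # not annotated at all
    ]
}


def parse_category(name):
    """Split 'weight/lbs' into ('weight', 'lbs'); bare 'weight' gives ('weight', None)."""
    base, _, qualifier = name.partition("/")
    return base, (qualifier or None)


def categories(schema):
    """Map each annotated field name to its (base, qualifier) pair."""
    return {
        field["name"]: parse_category(field["category"])
        for field in schema["fields"]
        if "category" in field
    }


annotated = categories(schema)
```

Unannotated fields are simply skipped, so a partially annotated schema still yields whatever information the creator did provide.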

I'd further propose that the names be linked to JSON definitions that can be inlined or well-known. So "race/omb" may have an associated JSON file, possibly similar to the existing JSON-stat or Fiscal Data Package forms. Then, perhaps, users could also define their own term 'race/orgname' and include their own definitions in the package.
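One possible lookup order, sketched in Python under stated assumptions: definitions inlined in the package win over a registry of well-known ones, and an unknown qualified name falls back to its parent term. The `WELL_KNOWN` registry, the shape of each definition, and the fallback rule are all invented for this sketch; the definition contents are placeholders.

```python
# Hypothetical registry of well-known definitions. Contents are placeholders.
WELL_KNOWN = {
    "weight": {"role": "measure"},
    "weight/kg": {"role": "measure", "unit": "kilogram"},
    "race/omb": {"role": "dimension"},
}


def resolve(name, inline=None):
    """Find a definition for `name`.

    Checks the exact name first, then (assumption of this sketch) the
    parent term before the first '/'. For each candidate, definitions
    inlined in the package take priority over the well-known registry.
    Returns None when nothing matches.
    """
    inline = inline or {}
    candidates = [name]
    if "/" in name:
        candidates.append(name.split("/", 1)[0])
    for candidate in candidates:
        for source in (inline, WELL_KNOWN):
            if candidate in source:
                return source[candidate]
    return None


# A package defining its own term alongside the well-known ones.
package_definitions = {"race/orgname": {"role": "dimension", "source": "package"}}
custom = resolve("race/orgname", inline=package_definitions)
fallback = resolve("weight/stone")  # unknown unit, falls back to "weight"
```

Whether falling back to the parent term is desirable is a design question; it fits the "20% is better than 0%" idea, but a stricter resolver could return nothing instead.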

I don't (currently) have strong opinions about the structure of the definitions for the names -- the Fiscal Data Package definitions seem suitably extensible and generalizable -- since the definitions would mostly be created by experts.

However, I am strongly opinionated that the typical user should be able to annotate the dataset with nothing more than applying a measure/dimension name to a column in the existing schema, and those names should be familiar and easy to memorize.

For reference, here are the inputs and outputs of the annotation system I'd produced before. This one has a rich datatype field (rather than a separate field for the measure/dimension annotation) and a parent connection to link columns. The measure/dimension classification is inherent in the rich datatype; "count" is always a measure, "raceeth" is always a dimension. Here is a schema file:

http://test.docker1.civicknowledge.com/bundles/d04p006/file/schema.csv

And here is what the file looks like when rendered for the web:

http://test.docker1.civicknowledge.com/partitions/p04p00f006

As with Tableau, dimensions are green and measures are blue. Errors and uncertainties are grey. Indentation represents parent/child relationships.

pwalsh (Author) · 2017-02-13T20:17:03Z

so, now actually reopening, and also ref. frictionlessdata/datapackage#343

pwalsh (Author) · 2017-02-15T06:37:50Z

@ericbusboom

I am quite sure I completely get what you want here, and it is very much in line with where I think we need to go to generalise this out of our previous work on FDP.

One question: you say

> But, it should also be possible for the user to annotate a column with just "weight." That's not ideal, but I've learned that getting 20% is better than getting 0%.

Which user? Someone who edits a descriptor file directly, so, someone comfortable with text editing a JSON file?

I ask because I want to distinguish between a canonical representation of something on the descriptor, and an ideal user experience for "end users" who might generate a descriptor via a series of actions.

OpenSpending currently supports a customisation to FDP (unspecified as yet) which does such annotations per field.

ericbusboom · 2017-02-15T20:10:54Z

Ah, good question. I half-thought "user" was the wrong word when I wrote that... I should have said Creator and Wrangler, as described in this analysis model. So, it's the people who are creating the dataset and the data dictionary, not the people who are defining what "weight" means.

> I ask because I want to distinguish between a canonical representation of something on the descriptor, and an ideal user experience for "end users" who might generate a descriptor via a series of actions.

Yes, absolutely. The definition of what "weight" is could be (probably should be) JSON.

I've updated one of my older specifications into a proposal for a semantic datatype category taxonomy. This is basically the system I've linked to previously, used in Ambry.
