Compare with Wikibase tabular data model #1059
Good comparison. Cf. the CSVW guide: https://datapackage.org/guides/csvw-data-package/ I would write it a bit more from the viewpoint of someone who is familiar with MediaWiki Tabular Data (and is discovering Data Package). Tables could then have 3 columns:
- MediaWiki property I'm familiar with
- Is it supported in Data Package
- Details on how it is supported in Data Package
We could also do it differently (and update CSVW), but I would prefer to align how we write comparison guides.
MediaWiki Tabular data has three required and two optional top-level fields. Most of these fields map to corresponding fields of a Data Resource:
Field | Data Resource | Tabular data |
https://datapackage.org/guides/csvw-data-package/#columns compares properties for users who know CSVW and want to know whether each one is supported in Data Package. I wonder if the same approach would make sense here? It would allow combining the table with the differences.
For example (see source for markdown):
MediaWiki Tabular Data property | Data Package support | Details |
---|---|---|
name (implied from page)* | Yes | As resource.name |
description | Yes | As resource.description, but without support for localized strings |
data* | Yes | MediaWiki always requires an array of arrays, while Data Package supports other formats as well as data files referred to by path |
license | Yes | As resource.licenses with a name (e.g. CC0-1.0 ) |
schema* | Yes | As resource.schema, see further |
sources | Yes | As resource.sources, but without support for Wiki markup |
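To make the mapping in this table concrete, here is a minimal Python sketch, not a definitive converter. The function name `tabular_to_resource` and the `lang` parameter are hypothetical. It assumes that inline row arrays in a Data Resource start with a header row, that a localized description is flattened by picking one language, and that the schema is passed through unchanged (field-level differences are not handled here).

```python
def tabular_to_resource(page_name, tabular, lang="en"):
    """Map a MediaWiki Tabular Data object to a Data Resource-like dict (sketch)."""
    field_names = [field["name"] for field in tabular["schema"]["fields"]]
    resource = {
        # MediaWiki implies the name from the page; here it is passed in.
        "name": page_name,
        # Assumption: inline row arrays carry the header as the first row.
        "data": [field_names] + tabular["data"],
        # Passed through as-is; field-level differences are out of scope.
        "schema": tabular["schema"],
    }
    if "license" in tabular:
        # A single license identifier becomes one entry in resource.licenses.
        resource["licenses"] = [{"name": tabular["license"]}]
    description = tabular.get("description")
    if description:
        # Localized object -> plain string; falling back to any language
        # is an assumption of this sketch.
        resource["description"] = description.get(lang) or next(iter(description.values()))
    if "sources" in tabular:
        # Wiki markup is kept verbatim because Data Package does not interpret it.
        resource["sources"] = [{"title": tabular["sources"]}]
    return resource
```

The localized description is the lossy part of this mapping: only one language survives the conversion.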
If possible, I would also link the MediaWiki Tabular Data property
### Data types

Tabular data supports four data types that overlap with [Table Schema data types](/standard/table-schema/#field-types):
Similar to the suggestion above, we could express this as a table, cf. https://datapackage.org/guides/csvw-data-package/#data-types
MediaWiki Tabular Data data type | Data Package support | Details |
---|---|---|
number | Yes | As number field type, which has more extensive support |
boolean | Yes | As boolean field type |
string | Yes | As string field type, but without the 400 character limitation and can include \n and \t |
localized | No |
Missing values in MediaWiki Tabular Data are expressed as `null`, while in Data Package you need to explicitly list values that should be considered missing in `schema.missingValues`.
I'd keep it as a list:
- `number`: subset of Table Schema number (no NaN, INF, or -INF)
- `boolean`: same as Table Schema boolean
- `string`: subset of Table Schema string (limited to 400 characters at most and must not include `\n` or `\t`)
- `localized`: refers to an object that maps language codes to strings with the same limitations as the `string` type. This type is not supported in Table Schema.

Individual values in a MediaWiki Tabular Data table can always be `null`, while in Table Schema you need to explicitly list values that should be considered missing in `schema.missingValues`.
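The constraints in this list can be sketched as a small Python check. This is only an illustration of the rules as stated in this thread, not MediaWiki's actual validator; the function names are hypothetical, and the 400 limit is counted in Python characters, which is an assumption.

```python
import math

MAX_STRING_LENGTH = 400  # limit as stated in the list above


def _is_short_string(value):
    """String rule: at most 400 characters, no newline or tab."""
    return (
        isinstance(value, str)
        and len(value) <= MAX_STRING_LENGTH
        and "\n" not in value
        and "\t" not in value
    )


def is_valid_tabular_value(value, type_name):
    """Check one cell against the four MediaWiki Tabular Data types (sketch)."""
    if value is None:
        return True  # individual values can always be null
    if type_name == "number":
        if isinstance(value, bool):
            return False  # bool is a subclass of int in Python
        if isinstance(value, int):
            return True
        # Floats must be finite: no NaN, INF, or -INF.
        return isinstance(value, float) and math.isfinite(value)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "string":
        return _is_short_string(value)
    if type_name == "localized":
        # Object mapping language codes to strings with the string limits.
        return isinstance(value, dict) and all(
            isinstance(code, str) and _is_short_string(text)
            for code, text in value.items()
        )
    return False  # unknown type name
```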
The `schema` field of MediaWiki tabular contains an object with field `fields` just like [Table Schema](/standard/table-schema/) but no other fields are allowed. The definition of individual schema fields also differs as follows:
Field | Table Schema | Tabular data |
Cf. suggestions above
MediaWiki Tabular Data field property | Data Package support | Details |
---|---|---|
name* | Yes | As field.name |
type* | Yes | As field.type |
title | Yes | As field.title, but without support for localized strings |
changed to:
MediaWiki Tabular Data Property | Data Package Table Schema |
---|---|
name (required) must be a string matching ^[a-zA-Z_][a-zA-Z_0-9]* | name (required) can be any string |
type (required) is one of the Data Types above | type (optional) with different data types |
title (optional) is a localized string | title (optional) is a plain string |
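As a rough illustration of the field-level differences in this table, the following Python sketch checks a MediaWiki field name and flattens a localized title. The helper names and the `lang` parameter are hypothetical. It assumes the pattern must match the whole name (the pattern as quoted has no trailing `$`), and it rejects the `localized` type rather than guessing a Table Schema equivalent.

```python
import re

# Pattern quoted in the table above; fullmatch() is used on the assumption
# that the entire name has to match.
NAME_PATTERN = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")


def is_tabular_field_name(name):
    """True if name satisfies the MediaWiki Tabular Data name rule."""
    return isinstance(name, str) and NAME_PATTERN.fullmatch(name) is not None


def to_table_schema_field(field, lang="en"):
    """Convert one MediaWiki field definition to a Table Schema field (sketch)."""
    if field["type"] == "localized":
        # No Table Schema counterpart; the caller has to decide what to do.
        raise ValueError("type 'localized' has no Table Schema equivalent")
    converted = {"name": field["name"], "type": field["type"]}
    title = field.get("title")
    if title:
        # Localized title -> plain string; the fallback to any available
        # language is an assumption of this sketch.
        converted["title"] = title.get(lang) or next(iter(title.values()))
    return converted
```

Going the other way is stricter: a Table Schema name can be any string, so it would need to pass `is_tabular_field_name` before it could be used in MediaWiki.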
Thanks, I'll adjust the commit but try to make it usable for readers familiar with either MediaWiki or Data Package, so there is no "supported yes or no".
972b437 to 272afc1
fixes #990