Compare with Wikibase tabular data model #1059

Open

@nichtich nichtich commented Dec 9, 2024

fixes #990

@peterdesmet peterdesmet left a comment


Good comparison. Cf. the CSVW guide (https://datapackage.org/guides/csvw-data-package/). I would write it a bit more from the viewpoint of someone who is familiar with MediaWiki Tabular Data (and is discovering Data Package). Tables could then have 3 columns:

  • MediaWiki property I'm familiar with
  • Is it supported in Data Package
  • Details on how it is supported in Data Package

We could also do it differently (and update CSVW), but I would prefer to align how we write comparison guides.

content/docs/guides/mediawiki-tabular-data.md

MediaWiki Tabular data has three required and two optional top-level fields. Most of these fields map to corresponding fields of a Data Resource:

Field | Data Resource | Tabular data

https://datapackage.org/guides/csvw-data-package/#columns compares properties for users who know CSVW and want to know whether they are supported in Data Package. I wonder if the same approach would make sense here? It would allow combining the table with the differences.

For example (see source for markdown):

| MediaWiki Tabular Data property | Data Package support | Details |
| --- | --- | --- |
| `name` (implied from page)* | Yes | As `resource.name` |
| `description` | Yes | As `resource.description`, but without support for localized strings |
| `data`* | Yes | MediaWiki always requires an array of arrays, while Data Package supports other formats as well as data files referred to by `path` |
| `license` | Yes | As `resource.licenses` with a `name` (e.g. `CC0-1.0`) |
| `schema`* | Yes | As `resource.schema`, see further |
| `sources` | Yes | As `resource.sources`, but without support for Wiki markup |
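
To make the mapping in this table concrete, here is a minimal Python sketch that turns a MediaWiki Tabular Data object into a Data Resource descriptor. It is only an illustration under several assumptions: the function name `tabular_to_resource`, the `lang` parameter used to pick one language from localized strings, the choice to put the Wiki markup of `sources` into a single source `title`, and the header row prepended to the inline data are all hypothetical and not taken from the guide. Columns of type `localized` are rejected because they have no Table Schema counterpart.

```python
# Types that exist under the same name in both models.
TYPE_MAP = {"number": "number", "boolean": "boolean", "string": "string"}


def tabular_to_resource(name, tabular, lang="en"):
    """Convert a MediaWiki Tabular Data dict to a Data Resource dict (sketch)."""
    fields = []
    for field in tabular["schema"]["fields"]:
        if field["type"] not in TYPE_MAP:
            # e.g. "localized", which Table Schema does not support
            raise ValueError(f"unsupported type: {field['type']}")
        converted = {"name": field["name"], "type": TYPE_MAP[field["type"]]}
        # Titles are localized in MediaWiki; keep one language only (assumption).
        title = field.get("title", {}).get(lang)
        if title is not None:
            converted["title"] = title
        fields.append(converted)

    # Assumption: inline row arrays are read with a header row, so add one.
    header = [f["name"] for f in fields]
    resource = {
        "name": name,  # MediaWiki implies the name from the page
        "data": [header] + tabular["data"],
        "schema": {"fields": fields},
        "licenses": [{"name": tabular["license"]}],
    }
    description = tabular.get("description", {}).get(lang)
    if description is not None:
        resource["description"] = description
    if "sources" in tabular:
        # Wiki markup is not interpreted; it is stored as plain text (assumption).
        resource["sources"] = [{"title": tabular["sources"]}]
    return resource


# Placeholder input, made up purely for illustration.
example = {
    "license": "CC0-1.0",
    "description": {"en": "Example table"},
    "sources": "Made up for illustration",
    "schema": {"fields": [
        {"name": "label", "type": "string", "title": {"en": "Label"}},
        {"name": "value", "type": "number"},
    ]},
    "data": [["a", 1], ["b", 2]],
}
resource = tabular_to_resource("example-table", example)
```

Going the other way would additionally require loading data referenced by `path` and dropping any resource properties that MediaWiki does not allow.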


If possible, I would also link the MediaWiki Tabular Data property


### Data types

Tabular data supports four data types that overlap with [Table Schema data types](/standard/table-schema/#field-types):

Similar to the suggestion above, we could express this as a table, cf. https://datapackage.org/guides/csvw-data-package/#data-types


| MediaWiki Tabular Data data type | Data Package support | Details |
| --- | --- | --- |
| `number` | Yes | As `number` field type, which has more extensive support |
| `boolean` | Yes | As `boolean` field type |
| `string` | Yes | As `string` field type, but without the 400 character limitation and can include `\n` and `\t` |
| `localized` | No | |

Missing values in MediaWiki Tabular Data are expressed as null, while in Data Package you need to explicitly list values that should be considered missing in schema.missingValues.

I'd keep it as a list:

  • number subset of Table Schema number (no NaN, INF, or -INF)
  • boolean same as Table Schema boolean
  • string subset of Table Schema string (limited to 400 characters at most and must not include \n or \t)
  • localized refers to an object that maps language codes to strings with same limitations as string type.
    This type is not supported in Table Schema.

Individual values in a MediaWiki Tabular Data table can always be null, while in Table Schema you need to explicitly list the values that should be considered missing in schema.missingValues.
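
The constraints in this list could be checked along the following lines. This is a rough Python sketch, not part of the guide: the function names and the `CHECKS` table are hypothetical, and counting the 400 character limit with `len()` (code points) is an assumption about how MediaWiki measures string length.

```python
import math


def is_tabular_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    # Floats must be finite: no NaN, INF, or -INF.
    return isinstance(value, float) and math.isfinite(value)


def is_tabular_boolean(value):
    return isinstance(value, bool)


def is_tabular_string(value):
    return (
        isinstance(value, str)
        and len(value) <= 400  # assumption: limit counted in code points
        and "\n" not in value
        and "\t" not in value
    )


def is_tabular_localized(value):
    # An object mapping language codes to strings with the string limits.
    return isinstance(value, dict) and all(
        isinstance(code, str) and is_tabular_string(text)
        for code, text in value.items()
    )


CHECKS = {
    "number": is_tabular_number,
    "boolean": is_tabular_boolean,
    "string": is_tabular_string,
    "localized": is_tabular_localized,
}


def is_valid_value(value, type_name):
    """Return True if value is allowed in a column of the given type."""
    if value is None:
        return True  # individual values can always be null
    return CHECKS[type_name](value)
```

A Table Schema value that passes these checks should be usable in MediaWiki Tabular Data as well; the reverse direction needs no check for the three shared types.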

The `schema` field of MediaWiki Tabular Data contains an object with field `fields` just like [Table Schema](/standard/table-schema/), but no other fields are allowed. The definition of individual schema fields also differs as follows:

Field | Table Schema | Tabular data

Cf. suggestions above

| MediaWiki Tabular Data field property | Data Package support | Details |
| --- | --- | --- |
| `name`* | Yes | As `field.name` |
| `type`* | Yes | As `field.type` |
| `title` | Yes | As `field.title`, but without support for localized strings |

changed to:

| MediaWiki Tabular Data Property | Data Package Table Schema |
| --- | --- |
| `name` (required) must be a string matching `^[a-zA-Z_][a-zA-Z_0-9]*` | `name` (required) can be any string |
| `type` (required) is one of the Data Types above | `type` (optional) with different data types |
| `title` (optional) is a localized string | `title` (optional) is a plain string |
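
As a quick illustration of the `name` restriction, the following Python sketch tests whether a Table Schema field name would also be accepted as a MediaWiki Tabular Data field name. The helper name `is_tabular_field_name` is hypothetical, and requiring the pattern to cover the whole string (via `fullmatch`) is an assumption, since the pattern as quoted has no end anchor.

```python
import re

# Pattern quoted in the comparison table above.
FIELD_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z_0-9]*")


def is_tabular_field_name(name):
    """Return True if name is usable as a MediaWiki Tabular Data field name."""
    # fullmatch: assume the entire name must match, not just a prefix.
    return isinstance(name, str) and FIELD_NAME.fullmatch(name) is not None
```

Names such as `population_2024` pass this check, while names that start with a digit or contain spaces would need to be renamed before export.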

@nichtich

Thanks, I'll adjust the commit, but I'll try to make it usable for readers familiar with either MediaWiki or Data Package, so there is no "supported yes or no".

@nichtich nichtich force-pushed the main branch 2 times, most recently from 972b437 to 272afc1 Compare December 13, 2024 06:59