How to convert a v1 to a v2 package #264
Comments
Oof, yeah, the remote references really throw a wrench in things. I suppose downloading and verbosely including is the way to go, in the same way that if we add a new field to a remote schema it must be downloaded and verbosely included.

I see now how preserving remote references without modification makes a good argument for mixing v1 and v2 descriptors within the same package, but I'm still very concerned about the complexity it creates. In general remote references work ok for read-only datapackages, but behavior gets complicated / murky when any mutability AND round-tripping is desired.

I think we'd both really like the ability to round-trip descriptors (incl. references to remote descriptors) when no modifications occur, but when we add remote references to the picture it means we need to track the modified state of our descriptors. If a property inside a remotely referenced descriptor is changed, the descriptor needs to decide whether that modification should trigger the descriptor being downloaded & modified. I think properly managing this state will become a big headache pretty quickly... so that brings us back to just downloading & verbosely including when upgrading. Either that or we need to sit down and figure out an architecture that helps us track / manage remote descriptor modification state & handle round-tripping without it exploding in our faces.

I think the design I describe in #252 can potentially help with this. We can have different classes for local vs remote descriptors which can have different serialization behaviors via The only thing I have a confident answer on is
Thanks for thinking along @khusmann! I've been contemplating this a bit more and I came up with this:
flowchart TB
classDef function color:#fff,fill:#5E8CD8,stroke:#5E8CD8,stroke-width:2px;
classDef error color:#D86C5D,fill:#fff,stroke:#D86C5D,stroke-width:2px;
v1_a["v1 datapackage.json"]
v1_b["v1"]
v1_c["v1"]
v1_d["v1 datapackage.json"]
v1_read_package("read_package()"):::function
v1_create_package("create_package(v1)"):::function
v1_add_resource("add_resource()"):::function
v1_write_package("write_package()"):::function
v1_schema["v1 schema"]
v2_a["v2 datapackage.json"]
v2_b["v2"]
v2_c["v2"]
v2_d["v2 datapackage.json"]
v2_read_package("read_package()"):::function
v2_create_package("create_package(v2)"):::function
v2_add_resource("add_resource()"):::function
v2_write_package("write_package()"):::function
v2_schema["v2 schema"]
b_upgrade_package("upgrade_package()"):::function
c_upgrade_package("upgrade_package()"):::function
subgraph v2_workflow
v2_create_package --> v2_b
v2_a ==> v2_read_package ==> v2_b ==> v2_add_resource ==> v2_c ==> v2_write_package ==> v2_d
v2_schema -. optionally provided by user .-> v2_add_resource
v2_add_resource -- user provided v1 schema --> v2_error["error, use upgrade_schema()"]:::error
end
v1_b --> b_upgrade_package --> v2_b
v1_c --> c_upgrade_package --> v2_c
%% v1_workflow --> b_upgrade_package --> v2_workflow
subgraph v1_workflow
v1_create_package --> v1_b
v1_a ==> v1_read_package ==> v1_b ==> v1_add_resource ==> v1_c ==> v1_write_package ==> v1_d
v1_schema -. optionally provided by user .-> v1_add_resource
v1_add_resource -- user provided v2 schema --> v1_error["error, use upgrade_package()"]:::error
end
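The version guard in the flowchart can be sketched roughly as follows. This is a minimal Python illustration, not the package's actual code: the `Package` and `Schema` classes, the explicit `version` tag, and the bodies of `add_resource()` and `upgrade_package()` are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    version: str  # "v1" or "v2" (hypothetical explicit tag)
    fields: list = field(default_factory=list)


@dataclass
class Package:
    version: str  # "v1" or "v2" (hypothetical explicit tag)
    resources: list = field(default_factory=list)


def add_resource(package, name, schema=None):
    """Add a resource, refusing a user-provided schema of the other version."""
    if schema is not None and schema.version != package.version:
        # Error hints follow the two error nodes in the flowchart.
        hint = "upgrade_schema()" if package.version == "v2" else "upgrade_package()"
        raise ValueError(
            f"{schema.version} schema cannot be added to a "
            f"{package.version} package; use {hint}"
        )
    package.resources.append({"name": name, "schema": schema})
    return package


def upgrade_package(package):
    """Return a v2 copy of a v1 package; a v2 package is returned unchanged."""
    if package.version == "v2":
        return package
    resources = []
    for res in package.resources:
        schema = res["schema"]
        if schema is not None:
            # Placeholder for the real schema conversion steps.
            schema = Schema("v2", list(schema.fields))
        resources.append({"name": res["name"], "schema": schema})
    return Package("v2", resources)
```

Keeping each workflow single-version and making `upgrade_package()` the only bridge means no function has to handle mixed v1/v2 state.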
I wanted to assess the complexity of converting a v1 to a v2 Data Package. Below are the steps that need to be taken. For version detection, see #262. @khusmann could you review these? There are a couple of items I'm unsure about.
Package

Add package.$schema, remove package.profile

Use package.profile, then remove it.
- NULL => https://datapackage.org/profiles/2.0/datapackage.json
- data-package (registered id) => https://datapackage.org/profiles/2.0/datapackage.json
- tabular-data-package (registered id) => https://datapackage.org/profiles/2.0/datapackage.json. This also removes the deprecated tabular-data-package.
- fiscal-data-package (registered id) => Unsure, should we use the 1.0 URL for fiscal-data-package?

Add package.contributors.roles
- roles (array) based on role (string). Remove role.
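
The package-level steps above could look roughly like this on a plain descriptor dict. This is a sketch in Python, not the package's implementation: the function name `upgrade_package_properties()` is hypothetical, and leaving `fiscal-data-package` (or any other unlisted profile) untouched is an assumption, since that case is still an open question.

```python
import copy

V2_PACKAGE_PROFILE = "https://datapackage.org/profiles/2.0/datapackage.json"

# Profiles with a mapping listed above; None stands for a missing profile.
MAPPED_PACKAGE_PROFILES = (None, "data-package", "tabular-data-package")


def upgrade_package_properties(package):
    """Return a copy of a v1 package descriptor with v2 package properties."""
    upgraded = copy.deepcopy(package)

    profile = upgraded.get("profile")
    if profile in MAPPED_PACKAGE_PROFILES:
        upgraded.pop("profile", None)
        upgraded["$schema"] = V2_PACKAGE_PROFILE
    # Assumption: any other profile (e.g. fiscal-data-package) is left as is.

    for contributor in upgraded.get("contributors", []):
        if "role" in contributor:
            role = contributor.pop("role")
            # Wrap the single v1 role (string) into the v2 roles (array).
            contributor.setdefault("roles", [role])

    return upgraded
```

Working on a deep copy keeps the v1 input intact, which makes it easier to compare the descriptor before and after the upgrade.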

Other changes
- title, givenName and familyName.

Each resource

Add resource.$schema, remove resource.profile

Use resource.profile, then remove it.
- NULL => https://datapackage.org/profiles/2.0/dataresource.json
- data-resource (registered id) => https://datapackage.org/profiles/2.0/dataresource.json
- tabular-data-resource (registered id) => https://datapackage.org/profiles/2.0/dataresource.json (but see resource.type)
- $schema is already present (i.e. a v1 package with a v2 resource) => Unsure, should the present resource.$schema be left as is then?

Add resource.type

Use resource.profile:
- NULL => don't set
- tabular-data-resource => table
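
The resource-level rules above can be sketched the same way. This is a Python sketch on plain dicts with a hypothetical function name; keeping an already present `resource.$schema` is only one possible answer to the open question above and is marked as an assumption in the code.

```python
import copy

V2_RESOURCE_PROFILE = "https://datapackage.org/profiles/2.0/dataresource.json"

# Profiles with a mapping listed above; None stands for a missing profile.
MAPPED_RESOURCE_PROFILES = (None, "data-resource", "tabular-data-resource")


def upgrade_resource_properties(resource):
    """Return a copy of a v1 resource descriptor with v2 resource properties."""
    upgraded = copy.deepcopy(resource)
    profile = upgraded.get("profile")

    # resource.type is derived from the v1 profile before the profile is removed.
    if profile == "tabular-data-resource":
        upgraded["type"] = "table"

    if profile in MAPPED_RESOURCE_PROFILES:
        upgraded.pop("profile", None)
        # Assumption: an existing $schema (v2 resource in a v1 package) is kept.
        upgraded.setdefault("$schema", V2_RESOURCE_PROFILE)

    return upgraded
```

For example, `upgrade_resource_properties({"name": "obs", "profile": "tabular-data-resource"})` yields a resource with `type` set to `table`, the 2.0 `$schema`, and no `profile`.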

Other changes

For each dialect

Note that upconverting a dialect requires a remote one to be downloaded and verbosely included.

Add dialect.$schema
- dialect.caseSensitiveHeader is present => https://datapackage.org/profiles/1.0/tabledialect.json
- dialect.csvddfVersion is present => https://datapackage.org/profiles/1.0/tabledialect.json
- https://datapackage.org/profiles/2.0/tabledialect.json
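
The dialect rule can be expressed as a small lookup. This Python sketch assumes the 2.0 URL is the fallback when neither v1-only property is present (the list does not state that condition explicitly), and the function name is hypothetical.

```python
V1_DIALECT_PROFILE = "https://datapackage.org/profiles/1.0/tabledialect.json"
V2_DIALECT_PROFILE = "https://datapackage.org/profiles/2.0/tabledialect.json"

# Properties named above as signalling a v1 dialect.
V1_ONLY_DIALECT_PROPERTIES = ("caseSensitiveHeader", "csvddfVersion")


def dialect_schema_url(dialect):
    """Pick the $schema URL for a dialect dict; None means the dialect is absent."""
    dialect = dialect or {}
    if any(key in dialect for key in V1_ONLY_DIALECT_PROPERTIES):
        return V1_DIALECT_PROFILE
    # Assumption: everything else, including an absent dialect, maps to 2.0.
    return V2_DIALECT_PROFILE
```

Note that under this rule an absent dialect still maps to the 2.0 URL, so the upgrade would have to create a dialect that holds nothing but `$schema`.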

Unsure about this though. For example, if a dialect was absent (very often the case), one will be added with just the $schema property. The alternative is to leave all dialects as v1 (assuming a $schema that defaults to https://datapackage.org/profiles/1.0/tabledialect.json). That would also mean that remote dialects can stay remote.

Other changes

For each schema

Note that upconverting a schema requires a remote one to be downloaded and verbosely included.

Add schema.$schema
- https://datapackage.org/profiles/2.0/tableschema.json because we will update the schema to that version.

Update schema.primaryKey

Update schema.foreignKeys
- schema.foreignKeys.fields from string to array
- schema.foreignKeys.reference[x].fields from string to array
- schema.foreignKeys.reference[x].resource = resource name => remove property

No action required
- exact for all v1, but that is also the default for this field, so no need to set it

For each field

Other changes
- enum should be converted to a field with categories.
- groupChar is a new property, no action required
- type = any, potentially provide opt-in #168
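
The schema-level steps listed under "For each schema" could be sketched as below, in Python on plain dicts. Several points are assumptions for illustration: the function name `upgrade_schema()`, converting `primaryKey` from string to array (the list only says "Update"), and reading "resource = resource name" as "the reference points at the resource that owns the schema".

```python
import copy

V2_SCHEMA_PROFILE = "https://datapackage.org/profiles/2.0/tableschema.json"


def as_list(value):
    """Wrap a single string in a list; leave arrays untouched."""
    return [value] if isinstance(value, str) else value


def upgrade_schema(schema, resource_name):
    """Return a copy of a v1 Table Schema dict upgraded to v2 conventions."""
    upgraded = copy.deepcopy(schema)
    upgraded["$schema"] = V2_SCHEMA_PROFILE

    # Assumption: primaryKey gets the same string-to-array treatment.
    if "primaryKey" in upgraded:
        upgraded["primaryKey"] = as_list(upgraded["primaryKey"])

    for foreign_key in upgraded.get("foreignKeys", []):
        if "fields" in foreign_key:
            foreign_key["fields"] = as_list(foreign_key["fields"])
        reference = foreign_key.get("reference", {})
        if "fields" in reference:
            reference["fields"] = as_list(reference["fields"])
        # Assumption: a reference to the owning resource drops `resource`.
        if reference.get("resource") == resource_name:
            del reference["resource"]

    return upgraded
```

The remote-schema case is not handled here: as noted above, a remote schema would first need to be downloaded and included verbosely before a function like this could modify it.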