Macro write_datasetjson.sas inconsistent dataType "integer" definition vs "decimal" or "float" in actual data #54

Open · mhungria opened this issue Nov 21, 2024 · 3 comments
Labels: enhancement (New feature or request)

mhungria commented Nov 21, 2024

The write_datasetjson.sas macro does not appear to detect an inconsistency between a dataType "integer" definition and "decimal" or "float" values in the actual data when creating a JSON file that references a Define-XML.
Note: depending on the software used to read the JSON file, this may not affect round-trip validation against the source data. However, with some software it may cause data truncation.
I know that appropriate validation of inconsistencies between Define-XML and the data is expected to take place before the data files are considered valid; however, it would be convenient if the write_datasetjson.sas macro detected and reported the inconsistency.
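A minimal sketch of such a check, written in Python rather than SAS and not part of write_datasetjson.sas. It assumes the Dataset-JSON v1.1 layout: a top-level `columns` array whose entries carry `name` and `dataType`, and a `rows` array of value lists in column order. `find_integer_mismatches` is a hypothetical helper, and the sample data is made up for illustration.

```python
from __future__ import annotations

import json
from typing import Any


def find_integer_mismatches(dataset: dict[str, Any]) -> dict[str, list[Any]]:
    """Return, per column declared "integer", the values that are not whole numbers.

    Assumes a Dataset-JSON v1.1-style dict with "columns" (each having
    "name" and "dataType") and "rows" (lists of values in column order).
    """
    columns = dataset.get("columns", [])
    # Positions of the columns whose metadata claims "integer".
    int_positions = [
        (i, col["name"]) for i, col in enumerate(columns)
        if col.get("dataType") == "integer"
    ]
    mismatches: dict[str, list[Any]] = {}
    for row in dataset.get("rows", []):
        for i, name in int_positions:
            value = row[i]
            if value is None:
                continue  # a missing value is not a type conflict
            # bool is a subclass of int in Python, so exclude it explicitly;
            # strings and other non-numbers are reported as mismatches too.
            is_whole = (
                isinstance(value, (int, float))
                and not isinstance(value, bool)
                and float(value).is_integer()
            )
            if not is_whole:
                mismatches.setdefault(name, []).append(value)
    return mismatches


# Hypothetical example: AVAL is declared "integer" but holds 12.5.
example = json.loads(
    '{"columns": [{"name": "USUBJID", "dataType": "string"},'
    ' {"name": "AVAL", "dataType": "integer"}],'
    ' "rows": [["S1", 10], ["S1", 12.5], ["S2", null], ["S2", 7.0]]}'
)
print(find_integer_mismatches(example))  # {'AVAL': [12.5]}
```

Note that 7.0 is accepted because it has no fractional part; a stricter check could also flag values written with a decimal point.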

Refer to examples:
https://github.com/cdisc-org/DataExchange-DatasetJson/blob/master/examples/adam/adadas.json
https://github.com/cdisc-org/DataExchange-DatasetJson/blob/master/examples/adam/adnpix.json

Here is a view of adadas.json in the Dicore Group's Dataset-JSON v1.1 Viewer, showing the AVAL definition and some decimal values:
[Screenshots: AVAL column definition and decimal AVAL values in adadas.json]

List of all columns with the indicated inconsistent definitions in the examples released with Dataset-JSON v1.1:
[Screenshots: list of columns with inconsistent definitions]
Note: I found these exceptions by using my own apps to create the JSON files; they assign "decimal" as the json_datatype based on the data.

mhungria changed the title (Nov 21, 2024)
from: Macro write_datasetjson.sas inconsistent dataType "integer" vs "decimal" in actual data
to: Macro write_datasetjson.sas inconsistent dataType "integer" definition vs "decimal" in actual data
mhungria changed the title (Nov 21, 2024)
from: Macro write_datasetjson.sas inconsistent dataType "integer" definition vs "decimal" in actual data
to: Macro write_datasetjson.sas inconsistent dataType "integer" definition vs "decimal" or "float" in actual data
lexjansen (Owner)

Hi Marcelina,
I was aware of some of these issues in the ADaM Define-XML file.
As you said, it really is a metadata problem. JSON does not even have a concept of "decimal", "float", or even "integer"; there is just "number". JSON Schema, however, does know "integer".
I do some limited checks on the consistency between the data types in the metadata and the variable types taken from the datasets.
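To illustrate the point about JSON having only "number", here is a small Python sketch. The int/float split shown is made by Python's `json` module, not by JSON itself. `is_schema_integer` is a hypothetical helper that mimics JSON Schema's "integer" type (draft-06 and later), which accepts any number with a zero fractional part, so 10.0 counts as an integer there.

```python
import json

# The JSON text has a single "number" type; Python's json module happens to
# return int for "10" and float for "10.0" and "10.5".
values = json.loads("[10, 10.0, 10.5]")
print([type(v).__name__ for v in values])  # ['int', 'float', 'float']


def is_schema_integer(value) -> bool:
    """Mimic JSON Schema's "integer" type: a number with zero fractional part."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return float(value).is_integer()


print([is_schema_integer(v) for v in values])  # [True, True, False]
```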

I see the value of the check, but I feel it should be a separate step from writing (or reading) the JSON, since you also cannot assume consistency between data and metadata when reading.
The check should probably also verify that there are no incomplete datetime, date, and time values when reading Dataset-JSON.
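A standalone completeness check along these lines could look like the following Python sketch. It assumes that "complete" means ISO 8601 extended format (`YYYY-MM-DD`, `hh:mm:ss`, `YYYY-MM-DDThh:mm:ss`) with optional fractional seconds; time zone designators are not handled, and the patterns check format only, not ranges (a month of 13 would pass). `find_incomplete` is a hypothetical helper.

```python
import re

# Assumed completeness patterns (ISO 8601 extended format). Optional
# fractional seconds are allowed; time zone designators are not handled.
_COMPLETE = {
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "time": re.compile(r"\d{2}:\d{2}:\d{2}(\.\d+)?"),
    "datetime": re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?"),
}


def find_incomplete(values, data_type):
    """Return the non-missing values that do not fully match the pattern for data_type."""
    pattern = _COMPLETE[data_type]
    return [v for v in values if v is not None and not pattern.fullmatch(v)]


print(find_incomplete(["2024-11-21", "2024-11", "2024", None], "date"))
# ['2024-11', '2024']
print(find_incomplete(["2024-11-21T10:30:00", "2024-11-21T10:30"], "datetime"))
# ['2024-11-21T10:30']
```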

Btw, based on the data, how do you make a distinction between "decimal", "float", and "double"? Some of the variables above should actually be "float" or "double".

lexjansen added the enhancement (New feature or request) label Nov 21, 2024
mhungria (Author)

Hi Lex, thanks for responding so quickly!

To answer your question: for now, I provide an overall execution parameter for the distinction, based on the actual data and on the Define-XML SignificantDigits for float definitions. However, as we know, the Define-XML may be wrong :). The corresponding SMEs would need to provide the input for it.
Note: the revised examples available via the DIcore Dataset-JSON Viewer were created with an overall execution parameter of >8 decimal digits to differentiate between float numbers and the string/decimal representation. A different distinction mechanism may be used depending on the use case.
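One possible reading of the ">8 decimal digits" parameter, as a Python sketch: count the digits after the decimal point in each value and suggest "float" when a column exceeds the threshold, otherwise "decimal", and "integer" when every value is whole. The function names, the direction of the rule, and the default of 8 are assumptions; the sketch does not consult Define-XML SignificantDigits and does not separate "float" from "double".

```python
from __future__ import annotations

from decimal import Decimal


def decimal_places(value: float | int) -> int:
    """Count digits after the decimal point in Python's shortest repr of value."""
    exponent = Decimal(repr(value)).as_tuple().exponent
    return max(0, -exponent)


def suggest_data_type(values, max_decimal_places: int = 8) -> str:
    """Suggest "integer", "decimal", or "float" for a numeric column.

    Hypothetical rule: more than max_decimal_places digits after the point
    suggests a computed binary floating-point value ("float"); otherwise
    "decimal". An all-missing column falls through to "integer".
    """
    present = [v for v in values if v is not None]
    if all(float(v).is_integer() for v in present):
        return "integer"
    places = max(decimal_places(v) for v in present)
    return "float" if places > max_decimal_places else "decimal"


print(suggest_data_type([10, 12, None]))  # integer
print(suggest_data_type([1.5, 2.25]))     # decimal
print(suggest_data_type([1 / 3, 2.5]))    # float (repr of 1/3 has 16 places)
print(suggest_data_type([1.5, 2.25], max_decimal_places=1))  # float
```

Counting digits from the shortest repr treats 0.1 as one decimal place even though the stored binary value is not exactly 0.1, which is usually what a data-driven rule like this wants.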

lexjansen (Owner)

It will be interesting to see in the future how, and if, the more granular numeric definitions have any impact.
