diff --git a/Datasheet.md b/Datasheet.md
index 18c9f0b..db53430 100644
--- a/Datasheet.md
+++ b/Datasheet.md
@@ -19,7 +19,7 @@
 
 > Q. Do tables express heterogeneous data, or must data be homogenized?
 
-> Q. Do tables capture missing data and, if so, how?
+> Q. Do tables capture missing data and, if so, how? Do missing values affect the output constraints of any operations, for example `groupBy`?
 
 > Q. Are mutable tables supported? Are there any limitations?
 
diff --git a/README.md b/README.md
index b36989e..7083ada 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 The Brown Benchmark for Table Types
 
-Version 1.1
+Version 1.2
 
 ## Context
 
diff --git a/TableAPI.md b/TableAPI.md
index 4012d47..b28bb85 100644
--- a/TableAPI.md
+++ b/TableAPI.md
@@ -1032,7 +1032,7 @@ Sorts the rows of a `Table` in ascending order by using a sequence of specified
 - `header(t2)` is equal to `["value", "count"]`
 - `schema(t2)["value"]` is equal to `schema(t1)[c]`
 - `schema(t2)["count"]` is equal to `Number`
-- `nrows(t2)` is equal to `length(removeDuplicates(getColumn(t1, c)))`
+- `nrows(t2)` is equal to `length(removeDuplicates(getColumn(t1, c)))` Note that if there are missing values in the input, this constraint requires one row for missing values in the output.
 
 #### Description
 
@@ -1150,6 +1150,8 @@ Partitions rows into groups and summarize each group with the functions in `agg`
 - `schema(t2)` is equal to `schema(r3)`
 - `nrows(t2)` is equal to `length(removeDuplicates(ks))`, where `ks` is the results of applying `key` to each row of `t1`. `ks` can be defined with `select` and `getColumn`.
 
+Note that these constraints assume a first class representation for missing values.
+
 #### Description
 
 Groups the rows of a table according to a specified key selector function and creates a result value from each group and its key. The rows of each group are projected by using a specified function.
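
Reviewer note: the `nrows(t2)` constraint amended in `TableAPI.md` can be illustrated with a minimal Python sketch. It assumes a table is a plain list of dict rows and that `None` serves as the first-class missing value; `get_column`, `remove_duplicates`, and `count` are hypothetical stand-ins written for this note, not the benchmark's reference implementation.

```python
from collections import Counter

# Assumption: a table is a list of dict rows, and None is the missing value.

def get_column(table, c):
    """Return the values of column c, in row order."""
    return [row[c] for row in table]

def remove_duplicates(values):
    """Keep the first occurrence of each value; None counts as a value."""
    seen = []
    for v in values:
        if v not in seen:
            seen.append(v)
    return seen

def count(table, c):
    """Tally each distinct value of column c into a value/count table."""
    tally = Counter(get_column(table, c))
    return [{"value": v, "count": n} for v, n in tally.items()]

t1 = [
    {"name": "Alice", "age": 17},
    {"name": "Bob", "age": None},
    {"name": "Eve", "age": 17},
    {"name": "Dan", "age": None},
]
t2 = count(t1, "age")

# The constraint from the diff: one output row per distinct input value,
# which includes exactly one row for the missing value.
assert len(t2) == len(remove_duplicates(get_column(t1, "age")))
```

Because `None` is treated as an ordinary hashable value here, both missing ages collapse into a single output row. An implementation that drops missing values before tallying would produce one row fewer and violate the constraint as written.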