-
Some introductory points, starting from an example I know:
In the "CSV file requirements" of Tabular Data Resource you have that CSV files MUST follow RFC 4180 with the following important exceptions allowed:
The great goodtables and datapackage (thank you very much for these tools) read properly the CSV separator. To add to the default output of CSV inferencing also the To add to the standard output:
Thank you |
Beta Was this translation helpful? Give feedback.
Replies: 5 comments 6 replies
-
@aborruso do you know of any existing tooling for guessing the delimiter. I believe csvkit does this e.g. wireservice/csvkit#1011 /cc @jpmckinney |
Beta Was this translation helpful? Give feedback.
-
CSVKit just uses Python's sniffing: https://docs.python.org/3/library/csv.html#csv.Sniffer |
Beta Was this translation helpful? Give feedback.
-
@rufuspollock goodtables and datapackage already do it, but they do not write it in the output. If you have this
and run And you can use a python onliner {
"profile": "tabular-data-package",
"resources": [
{
"path": "input.csv",
"profile": "tabular-data-resource",
"name": "input",
"format": "csv",
"mediatype": "text/csv",
"encoding": "utf-8",
"schema": {
"fields": [
{
"name": "field1",
"type": "integer",
"format": "default"
},
{
"name": "field2",
"type": "string",
"format": "default"
}
],
"missingValues": [
""
]
}
}
]
} |
Beta Was this translation helpful? Give feedback.
-
@aborruso is the fix here to improve |
Beta Was this translation helpful? Give feedback.
-
Frictionless does it:
$ frictionless describe data/delimiter.csv --yaml dialect:
delimiter: ;
encoding: utf-8
format: csv
hashing: md5
name: delimiter
path: data/delimiter.csv
profile: tabular-data-resource
schema:
fields:
- name: id
type: integer
- name: name
type: string
scheme: file $ frictionless validate data/delimiter.csv --yaml errors: []
stats:
errors: 0
tasks: 1
tasks:
- errors: []
partial: false
resource:
dialect:
delimiter: ;
encoding: utf-8
format: csv
hashing: md5
name: delimiter
path: data/delimiter.csv
profile: tabular-data-resource
schema:
fields:
- name: id
type: integer
- name: name
type: string
scheme: file
stats:
bytes: 30
fields: 2
hash: e5ee50a4b8d3a38f777599d70bebee96
rows: 2
scope:
- hash-count-error
- byte-count-error
- field-count-error
- row-count-error
- blank-header
- extra-label
- missing-label
- blank-label
- duplicate-label
- incorrect-label
- blank-row
- primary-key-error
- foreign-key-error
- extra-cell
- missing-cell
- type-error
- constraint-error
- unique-error
stats:
errors: 0
time: 0.02
valid: true
time: 0.02
valid: true
version: 4.5.1 |
Beta Was this translation helpful? Give feedback.
Frictionless does it: