Characters in text files are valid according to declared encoding #450

jeanetteclark · 2024-04-23T22:47:53Z

Status : ⌛ Not Started

Description

Check that text values fall within the valid range for the file's declared encoding.

e.g., ASCII files contain only bytes in the range \x00 to \x7F
e.g., Unicode-encoded text files contain only well-formed sequences for the declared encoding (e.g., valid UTF-8 byte sequences)
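
A minimal sketch of this check in Python, offered as an illustration rather than the planned implementation. It assumes the file has already been read as raw bytes and that the declared encoding is a name Python's codec registry recognizes; the function name `invalid_byte_offset` is hypothetical.

```python
from typing import Optional


def invalid_byte_offset(data: bytes, encoding: str) -> Optional[int]:
    """Return the offset of the first byte that cannot be decoded
    with `encoding`, or None if the whole buffer decodes cleanly.

    Hypothetical helper for illustration only.
    """
    try:
        # Strict decoding raises on any byte sequence that is not
        # valid in the given encoding.
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError as err:
        return err.start
    return None


# "café" encoded as UTF-8 is valid UTF-8 but not valid ASCII.
utf8_bytes = b"caf\xc3\xa9"
print(invalid_byte_offset(utf8_bytes, "utf-8"))
print(invalid_byte_offset(utf8_bytes, "ascii"))
```

An encoding name that Python does not recognize raises `LookupError`, which a real check would need to report separately from invalid content.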

Priority

Data Quality: Required

Issues

Most files don't have a declared encoding, so I'm not sure how we would check this other than assuming that most files we see are UTF-8 (or maybe ASCII?) unless declared otherwise. Thoughts, @mbjones?
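
One way the fallback could look, sketched in Python under the assumption floated above: use the declared encoding when there is one, otherwise try ASCII and then UTF-8. The function name `effective_encoding` and the candidate order are assumptions, not an agreed design.

```python
from typing import Optional


def effective_encoding(data: bytes, declared: Optional[str] = None) -> Optional[str]:
    """Return the first encoding under which `data` decodes cleanly.

    If `declared` is given, only that encoding is tried. Otherwise
    ASCII is tried first, then UTF-8 (assumed default order).
    Returns None when nothing fits. Hypothetical helper.
    """
    candidates = [declared] if declared else ["ascii", "utf-8"]
    for name in candidates:
        try:
            data.decode(name)  # strict error handling is the default
        except UnicodeDecodeError:
            continue
        return name
    return None


print(effective_encoding(b"plain text"))
print(effective_encoding(b"caf\xc3\xa9"))
print(effective_encoding(b"caf\xe9", declared="ascii"))
```

ASCII is tried first because every ASCII file is also valid UTF-8, so the narrower encoding has to be tested before the wider one for the result to be informative.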

Procedure

In R, we could use validUTF8() from base R to flag strings that are not valid UTF-8.

jeanetteclark added the Data Quality Suite label Apr 23, 2024

jeanetteclark added this to the v0.6.0 milestone Apr 23, 2024

jeanetteclark added the Priority: High label Aug 23, 2024

jeanetteclark added this to Data Quality Suite Aug 23, 2024

jeanetteclark moved this to Ready in Data Quality Suite Aug 23, 2024

jeanetteclark added this to Metadig Data Quality Oct 2, 2024

jeanetteclark moved this to Ready in Metadig Data Quality Oct 2, 2024

jeanetteclark moved this from Ready to Backlog in Metadig Data Quality Oct 2, 2024
