You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In database building procedure, R dataframes are output into CSV files, and the CSV files are loaded into a PostgreSQL database using COPY FROM SQL command.
The R output CSV format is not completely compatible with PostgreSQL COPY FROM command.
The following case is known to be incompatible:
\.
COPY FROM: "Because backslash is not a special character in the CSV format, ., the end-of-data marker, could also appear as a data value."
readr::write_csv does not quote \. in output CSV.
The following cases need to be further investigated for compatibility:
NULL/NA/missing values
COPY FROM: "The CSV format has no standard way to distinguish a NULL value from an empty string. PostgreSQL's COPY handles this by quoting. A NULL is output as the NULL parameter string and is not quoted, while a non-NULL value matching the NULL parameter string is quoted. For example, with the default settings, a NULL is written as an unquoted empty string, while an empty string data value is written with double quotes (""). Reading values follows similar rules. You can use FORCE_NOT_NULL to prevent NULL input comparisons for specific columns."
readr::write_csv probably writes both NA and empty string as blank in output CSV, with na = "".
Values that are not delimiter/quote/CR/LF/NULL/NA/missing but contain them:
COPY FROM: "If the value contains the delimiter character, the QUOTE character, the NULL string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the QUOTE character, and any occurrence within the value of a QUOTE character or the ESCAPE character is preceded by the escape character."
readr::write_csv probably quotes such values in CSV output, but edge cases need to be evaluated.
Currently, PostgreSQL COPY FROM compatible CSV files are output by the following function.
The values in ${BULK_EXP_SCHEMA}_${BULK_EXP_TPM_HISTOLOGY_TBL}.csv, which is the only CSV file as of 09/17/2021, are all compatible with PostgreSQL COPY FROM. However, CSV files that will be added in #37 may contain incompatible values.
The text was updated successfully, but these errors were encountered:
logstar
changed the title
Investigate and fix CSV format incompatibilities with PostgreSQL COPY FROM SQL command
Investigate and fix CSV format incompatibilities between readr::write_csv and PostgreSQL COPY FROMSep 17, 2021
In database building procedure, R dataframes are output into CSV files, and the CSV files are loaded into a PostgreSQL database using
COPY FROM
SQL command.The R output CSV format is not completely compatible with PostgreSQL
COPY FROM
command.The following case is known to be incompatible:
\.
COPY FROM
: "Because backslash is not a special character in the CSV format, ., the end-of-data marker, could also appear as a data value."readr::write_csv
does not quote\.
in output CSV.The following cases need to be further investigated for compatibility:
NULL
/NA
/missing valuesCOPY FROM
: "The CSV format has no standard way to distinguish a NULL value from an empty string. PostgreSQL's COPY handles this by quoting. A NULL is output as the NULL parameter string and is not quoted, while a non-NULL value matching the NULL parameter string is quoted. For example, with the default settings, a NULL is written as an unquoted empty string, while an empty string data value is written with double quotes (""). Reading values follows similar rules. You can use FORCE_NOT_NULL to prevent NULL input comparisons for specific columns."readr::write_csv
probably writes bothNA
and empty string as blank in output CSV, withna = ""
.NULL
/NA
/missing but contain them:COPY FROM
: "If the value contains the delimiter character, the QUOTE character, the NULL string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the QUOTE character, and any occurrence within the value of a QUOTE character or the ESCAPE character is preceded by the escape character."readr::write_csv
probably quotes such values in CSV output, but edge cases need to be evaluated.Currently, PostgreSQL
COPY FROM
compatible CSV files are output by the following function.OpenPedCan-api/db/build_tools/build_db.R
Lines 153 to 157 in c165d95
The values in
${BULK_EXP_SCHEMA}_${BULK_EXP_TPM_HISTOLOGY_TBL}.csv
, which is the only CSV file as of 09/17/2021, are all compatible with PostgreSQLCOPY FROM
. However, CSV files that will be added in #37 may contain incompatible values.Refs:
COPY FROM
specifications: https://www.postgresql.org/docs/11/sql-copy.htmlreadr::write_csv
latest version specifications: https://readr.tidyverse.org/reference/write_delim.html. The local database building Docker image usesreadr
version 2.0.0.The text was updated successfully, but these errors were encountered: