Newline characters cause rows in PostgreSQL table to be broken inadvertently. #31

YPCrumble · 2020-06-17T02:06:23Z

This issue replaces #30.

The issue is that when user-entered data includes any of these Unicode line-break characters:

\u2028 (LINE SEPARATOR)
\u2029 (PARAGRAPH SEPARATOR)
\x85 (NEXT LINE)

the dump parser treats a single row as if it were split across several lines. As a result, the dump raises:

ValueError("Mismatch between column names and values.")
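
For illustration, here is a minimal sketch of how a row can end up split. It assumes the parser breaks the dump into lines with something like `str.splitlines()`, which this thread does not confirm; the sample row is made up.

```python
# Hypothetical single data row containing a U+2028 in a text column.
row = "1\tfirst part\u2028second part\tdone"

# str.splitlines() treats U+2028, U+2029 and U+0085 as line boundaries,
# so one logical row comes back as two pieces.
pieces_splitlines = row.splitlines()

# Splitting on "\n" only leaves the row intact.
pieces_newline = row.split("\n")

print(len(pieces_splitlines), len(pieces_newline))
```

If the parser does work this way, each piece has fewer values than there are columns, which would be consistent with the mismatch error above.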

To work around it, I piped the `pg_dump` output through three `sed` processes that replace each of these characters with a space:

    process = subprocess.Popen(
        (
            "pg_dump",
            # Force output to be UTF-8 encoded.
            "--encoding=utf-8",
            # Quote all table and column names, just in case.
            "--quote-all-identifiers",
            # Luckily `pg_dump` supports DB URLs, so we can just pass it the
            # URL as argument to the command.
            "--dbname",
            url.geturl().replace('postgis://', 'postgresql://'),
        ) + tuple(extra_params),
        stdout=subprocess.PIPE,
    )

    # Remove newline characters.
    process = subprocess.Popen(
        "sed $'s/\u2028/ /g'",
        shell=True,
        stdin=process.stdout,
        stdout=subprocess.PIPE)
    process = subprocess.Popen(
        "sed $'s/\u2029/ /g'",
        shell=True,
        stdin=process.stdout,
        stdout=subprocess.PIPE)
    process = subprocess.Popen(
        "sed $'s/\x85/ /g'",
        shell=True,
        stdin=process.stdout,
        stdout=subprocess.PIPE)

I'd be happy to submit this as a PR if it's helpful. Or is there a better way to handle the issue?
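
One possible alternative, sketched here rather than tested against the project: do the replacement in Python on the raw bytes coming out of `pg_dump`, instead of spawning extra `sed` processes. The helper name `clean_lines` and the `io.BytesIO` stand-in for `process.stdout` are assumptions made for this example.

```python
import io

# UTF-8 byte sequences for U+2028, U+2029 and U+0085.
UNICODE_NEWLINES = tuple(
    ch.encode("utf-8") for ch in ("\u2028", "\u2029", "\x85")
)


def clean_lines(stream):
    """Yield lines from a binary stream, replacing Unicode newlines with spaces."""
    # Iterating a binary stream splits on b"\n" only, so a multi-byte
    # sequence is never cut in half between two yielded lines.
    for line in stream:
        for seq in UNICODE_NEWLINES:
            line = line.replace(seq, b" ")
        yield line


# Stand-in for process.stdout; real use would pass the pg_dump pipe instead.
fake_stdout = io.BytesIO("1\tfoo\u2028bar\n2\tbaz\x85qux\n".encode("utf-8"))
cleaned = list(clean_lines(fake_stdout))
print(cleaned)
```

Matching the full encoded sequences (for example `b"\xc2\x85"` for U+0085) rather than single bytes keeps other multi-byte characters untouched.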

azin634 · 2021-12-10T10:08:00Z

I had a similar issue in MySQL. See if the fix in #29 works for you.

YPCrumble · 2022-12-05T17:56:06Z

@azin634 This seems to help with the first two types of newline, but not all of them. I'm now getting this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 1: invalid continuation byte
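
For reference, Python reports this kind of error when a UTF-8 lead byte such as 0xd4 is followed by a byte that is not a continuation byte. One way that can happen (an assumption, not something confirmed in this thread) is a byte-level replacement of 0x85, which is also the second byte of characters such as U+0505. A minimal sketch:

```python
text = "a\u0505b"
raw = text.encode("utf-8")  # U+0505 is encoded as the two bytes D4 85

# Replacing the single byte 0x85 cuts the two-byte character in half.
damaged = raw.replace(b"\x85", b" ")

error = None
try:
    damaged.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is cleared after the except block

print(error)

# Replacing the full UTF-8 encoding of U+0085 (C2 85) leaves U+0505 alone.
safe = raw.replace("\x85".encode("utf-8"), b" ")
print(safe.decode("utf-8") == text)
```

Whether this is what produces the error above depends on how the replacement is applied to the dump, so treat it as one candidate explanation only.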
