Handle line breaks encapsulated in XML tags #46

andreasnoack · 2018-02-23T14:09:53Z

Admittedly, this is a pretty exotic feature request, but I happen to have some CSVs where the last column contains mixed text, including XML, and the text inside the XML tags can contain newline characters. Those newlines shouldn't be interpreted as record separators when parsing the file. Two such examples:

<PAGE_AUTHORS>&#xD;\n&#xD;\n&#xD;\n&#xD;\n&#xD;\nHACKETT;Ark. &#xC3;&#xA2;&#xC2;&#x80;&#xC2;&#x94; A sheriff;admin;About the Author</PAGE_AUTHORS>

and

<PAGE_AUTHORS>K G Rana;\nMax Planck Institute of Microstructure Physics;Weinberg 2;D-06120 Halle;Germany;\nMax Planck Institute for Chemical Physics of Solids;N&#xC3;&#xB6;thnitzer Str. 40;D-01187 Dresden;O Meshcheriakova;J K&#xC3;&#xBC;bler;\nInstitut f&#xC3;&#xBC;r Festk&#xC3;&#xB6;rperphysik;Technische Universit&#xC3;&#xA4;t Darmstadt;D-64289 Darmstadt;B Ernst;J Karel;R Hillebrand;E Pippel;P Werner;A K Nayak;C Felser;S S P Parkin</PAGE_AUTHORS>

The first example is taken from the file 20160810171500.gkg.csv in the GDELT2 dataset.
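To make the request concrete, here is a minimal sketch of a record splitter that ignores newlines falling between an opening and closing tag. It assumes a single known tag (`PAGE_AUTHORS` by default) that never nests. `split_records` is a hypothetical helper written for illustration. It is not part of any CSV parser's API.

```python
from typing import List

def split_records(text: str, tag: str = "PAGE_AUTHORS") -> List[str]:
    """Split `text` into records at newlines, except newlines that occur
    between <tag> and </tag>. Hypothetical helper; assumes the tag never nests."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    records: List[str] = []
    current: List[str] = []
    inside = False  # True while scanning between open_tag and close_tag
    i = 0
    while i < len(text):
        if text.startswith(open_tag, i):
            inside = True
            current.append(open_tag)
            i += len(open_tag)
        elif text.startswith(close_tag, i):
            inside = False
            current.append(close_tag)
            i += len(close_tag)
        elif text[i] == "\n" and not inside:
            # A newline outside the tag terminates the current record
            records.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    if current:  # last record had no trailing newline
        records.append("".join(current))
    return records

data = "1,<PAGE_AUTHORS>K G Rana;\nMax Planck</PAGE_AUTHORS>\n2,plain\n"
print(split_records(data))
```

Here `data` yields two records, and the first record keeps its embedded newline. This kind of tag awareness is arguably better handled as a preprocessing step than inside a general-purpose CSV parser.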


davidanthoff · 2019-03-17T22:34:46Z

Are these columns surrounded by quotation marks? If not, we would have to add XML support to handle this, which doesn't seem like a good idea :) Or am I misunderstanding something here?
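The point about quotation marks can be illustrated with Python's standard `csv` module. It is used here only as a stand-in for any CSV parser that follows conventional quoting rules, not as the parser this issue is about. When the XML field is quoted, the embedded newline stays inside the field. When it is unquoted, the record is split at the newline.

```python
import csv
import io

def parse(data: str):
    # newline="" disables newline translation so csv.reader sees the raw line endings
    return list(csv.reader(io.StringIO(data, newline="")))

quoted = 'id,authors\n1,"<PAGE_AUTHORS>K G Rana;\nMax Planck</PAGE_AUTHORS>"\n'
unquoted = 'id,authors\n1,<PAGE_AUTHORS>K G Rana;\nMax Planck</PAGE_AUTHORS>\n'

print(parse(quoted))    # header plus one data row; the newline stays in the field
print(parse(unquoted))  # header plus two broken rows; the record is split at the newline
```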

davidanthoff added the bug label Jan 6, 2019
