Importing a line containing \" throws RuntimeException #8

Open
qiemem opened this issue Sep 5, 2015 · 7 comments

@qiemem
Member

qiemem commented Sep 5, 2015

Reported by @dougedmunds in NetLogo/NetLogo#845. That bug has a more extensive example, but the following is sufficient to reproduce:

csv:from-row "\"\\\" \\\"\""

That is, attempting to parse the string "\" \"" results in:

java.lang.RuntimeException: java.io.IOException: (line 1) invalid char between encapsulated token and delimiter

@dougedmunds

The same error also occurs with csv:from-string:

csv:from-string "\"\\\" \\\"\""

@qiemem
Member Author

qiemem commented Sep 5, 2015

The problem here is that the CSV parser is seeing the quote in \" as the closing quote for the cell. So in the entry:

"plot count scouts with [task-string = \"watching-dance\"]"

the parser matches the first \" with the opening quote, so that it thinks "plot count scouts with [task-string = \" is the cell. But then it sees a bunch of other stuff before it sees a delimiter, so it chokes.

One of the tough parts about CSV is that there isn't really an agreed-upon standard. People use all sorts of different delimiters and cell quotation practices, and how special characters are escaped varies as well. Unlike in many formats, escaping things like newlines and tab characters is not necessary, as you can just put them in a quoted field. Quotation marks do need to be escaped, however. As I understand it, most software that uses CSV escapes quote marks by putting two quote marks in a row (Excel is a notable example). So the above line would be written:

"plot count scouts with [task-string = ""watching-dance""]"

So here's the dilemma. I could specify \ as an escape character, but then that messes up files coming from programs that use it as a regular character (such as Excel). Alternatively, I could add another optional argument to the procedures in this extension to specify the escape character, but that complicates the API significantly. I'm not sure what the best solution is.
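
The trade-off can be seen in any CSV parser that supports both conventions. The following is a minimal sketch using Python's standard csv module, not the parser behind this extension, so it only illustrates the two dialects. The helper name `parse` and the sample rows are made up for illustration.

```python
import csv

def parse(line, **dialect):
    # Parse a single CSV line with the given dialect options; return its cells.
    return next(csv.reader([line], **dialect))

doubled = '"plot count scouts with [task-string = ""watching-dance""]"'
backslashed = '"plot count scouts with [task-string = \\"watching-dance\\"]"'

# Doubled quotes are the default convention (the one Excel writes).
cells_doubled = parse(doubled)

# Declaring backslash as the escape character handles the second form.
cells_backslashed = parse(backslashed, escapechar="\\", doublequote=False)

# Without an escape character, a strict parser rejects \" because the
# quote is taken as the end of the cell.
try:
    parse(backslashed, strict=True)
    strict_rejects = False
except csv.Error:
    strict_rejects = True

# The downside: once backslash is an escape character, a backslash that
# was meant literally is swallowed.
windows_path = parse('"C:\\temp",1', escapechar="\\")
```

In this sketch both quoting styles yield the same single cell, but `windows_path` loses its backslash, which is the kind of breakage described above for files that use \ as a regular character.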

As a workaround, you can replace instances of \" in your strings with "". The string extension's (https://github.com/NetLogo/String-Extension/) rex-replace-all procedure would be particularly useful for this. Let me know if you'd like help with this.
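
A minimal sketch of that workaround, written in Python with its standard csv module rather than in NetLogo, might look like the following. The function name `double_escaped_quotes` and the sample row are hypothetical, and the sketch assumes a backslash never appears immediately before a cell's real closing quote.

```python
import csv

def double_escaped_quotes(line):
    # Rewrite each backslash-escaped quote as a doubled quote, the form
    # most CSV parsers accept inside a quoted cell.
    return line.replace('\\"', '""')

raw = '"plot count scouts with [task-string = \\"watching-dance\\"]","plot 1"'
cells = next(csv.reader([double_escaped_quotes(raw)]))
```

Here `cells` holds two entries, and the first contains plain double quotes around watching-dance.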

@dougedmunds

Hello Bryan,

I just posted about a tool I wrote on StackOverflow.com that shows the
code behind widgets.
See
http://stackoverflow.com/questions/32410338/how-can-i-see-the-code-behind-all-the-widgets-on-a-netlogo-interface

I made a note in the readme file about the csv issue. As far as I'm
concerned, moving the code out of the widget is the best solution for now.

You can see where the embedded double quotes show up if you look
at some of the models in the models library that have plots.
For example, in "Disease Solo.nlogo" the Number Sick plot has this
code inside it:
"create-temporary-plot-pen word \"run \" run-number\nset-plot-pen-color item (run-number mod 5)\n [blue red green orange violet]" "plot num-sick"

That is actually code for two parts of the interface, plot setup and
plot update.
That's one of the two areas where I use csv:from-row to separate the two
strings.

The other area is in Pens, where NetLogo uses one line to store 7
values (strings and numbers).

I found using csv was way easier than trying to otherwise separate the
parts of the line. If you look at the code in nivi.nlogo, you'll see
that the while loop in "parse-file" simply reads the file line by line
using "set mydata file-read-line". file-read-line generates a string,
which is fed to csv:from-row in those two places.

If you can think of a way to break the strings into parts without using
csv:from-row and avoid the runtime error, let me know.

-- Doug Edmunds

@qiemem
Member Author

qiemem commented Sep 7, 2015

Hi Doug,

You can get pretty far using file-read (http://ccl.northwestern.edu/netlogo/docs/dictionary.html#file-read) to parse the plot and pen lines in question. You should be able to parse line-by-line like you're doing, except when you know you're about to hit a plot setup/update line or pen line. At that point, you invoke file-read enough times to read in every entry in the line, and then switch back to reading line-by-line.

However, this fails when dealing with multiple pens: you don't know how many pens there are, so you don't know when to switch back to file-read-line. There are a couple of solutions I can think of, of varying levels of dirtiness. Probably the best is to just read the entire file in line-by-line to count the number of pens in each plot, and then read it in again to actually import the information. Dirty, but you don't have to write your own parser.

I'll keep thinking about it though. Sorry there isn't a simpler solution.

@dougedmunds

My dirty solution to how many pens is that there is a blank line after
the last pen. If my loop finds "PENS" it loops through them until it
gets to the blank line.
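
The blank-line approach can be sketched as follows. This is an illustrative Python version, not the NetLogo code in nivi.nlogo; the function name `collect_pen_blocks` is hypothetical and the sample lines are made up and shortened rather than taken from a real model file.

```python
def collect_pen_blocks(lines):
    # Return one list of pen lines for every "PENS" marker found.
    # Each block runs from the line after the marker up to (but not
    # including) the next blank line or the end of the input.
    blocks = []
    it = iter(lines)
    for line in it:
        if line.strip() != "PENS":
            continue
        pens = []
        for pen_line in it:  # keeps consuming the same iterator
            if not pen_line.strip():
                break
            pens.append(pen_line)
        blocks.append(pens)
    return blocks

sample = [
    "PLOT",
    "PENS",
    '"sick" 1.0 0',
    '"healthy" 1.0 0',
    "",
    "BUTTON",
    "PENS",
    '"only" 1.0 0',
]
pen_blocks = collect_pen_blocks(sample)
```

Because the inner loop stops at the blank line, the number of pens never has to be known in advance.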

@qiemem
Member Author

qiemem commented Sep 8, 2015

That works when reading everything with file-read-line, but doesn't work with file-read since it skips over newlines.

@dougedmunds

I developed some code similar to your suggestion of using the string extension's rex-replace-all procedure. I want the model to work 'straight out of the box', without requiring any extensions not already included with NetLogo 5.2.

After running file-read-line, it now looks for a backslash followed by a double quote (\") in the string. If found, it substitutes @@ for each occurrence. Then it runs csv:from-row. Finally, it substitutes \" back for any @@ in the results. To cover the bases, it first tests for @@ in the original string; if found, it just reports the original string without using csv:from-row.

This avoids the runtime error problem, afaik.
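
For readers who want to see the shape of this placeholder approach, here is a minimal sketch in Python using its standard csv module. It is not the NetLogo code from the model. The names `split_line`, `PLACEHOLDER`, and `ESCAPED_QUOTE` are hypothetical, the comma delimiter is an assumption, and returning the untouched line as a one-element list is a choice made for this sketch.

```python
import csv

PLACEHOLDER = "@@"
ESCAPED_QUOTE = '\\"'  # a backslash followed by a double quote

def split_line(line, delimiter=","):
    # If the placeholder already occurs in the input, substituting it
    # back would be ambiguous, so give the line back unparsed.
    if PLACEHOLDER in line:
        return [line]
    # Hide every \" so the CSV parser never sees a stray quote.
    protected = line.replace(ESCAPED_QUOTE, PLACEHOLDER)
    cells = next(csv.reader([protected], delimiter=delimiter))
    # Put the original \" sequences back into each parsed cell.
    return [cell.replace(PLACEHOLDER, ESCAPED_QUOTE) for cell in cells]

row = split_line(
    '"plot count scouts with [task-string = \\"watching-dance\\"]","plot 1"'
)
untouched = split_line("a@@b,c")
```

Unlike replacing \" with "", this keeps the backslashes in the parsed cells, so the code strings come out exactly as they were stored.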
