Parse trailing comments from buses data in v30 files #78

nickrobinson251 · 2022-07-14T17:15:09Z

WIP closing Support extracting end-of-line comments? #27

ideally we'd extract just the text inside the `/*[ ... ] */"` but i've no got that working yet (using https://github.com/JuliaData/Parsers.jl/pull/117), so for now this PR just extracts the whole thing:
julia> parse_network("test/testfiles/eolcomments.raw").buses.comment 2-element Vector{Union{Missing, String31}}: "/* [NOBO 1 ] */" "/* [NAU_E2 1 ] */"

Using Parsers#main, this now produces:

julia> parse_network("test/testfiles/eolcomments.raw").buses.comment
2-element Vector{Union{Missing, InlineStrings.String31}}:
 "NOBO     1"
 "NAU_E2   1"

cc @BSnelling @raphaelsaavedra

nickrobinson251 · 2022-07-14T17:33:27Z

various things to do:

check the performance of this
make this non-breaking?
decide if changing this to strip comment chars in future would be breaking (i'm leaning towards merge this with functionality as-is and maybe improve in future...)
comment the code
pray we never have to handle the case mentioned in Support extracting end-of-line comments? #27 where there's a comma (not a space) separating the last "real " column and the "comments" ?!

nickrobinson251 · 2022-07-14T17:34:35Z

src/parsing.jl

+@generated function parse_row!(rec::R, bytes, pos, len, options) where {R <: Buses30}
+    block = Expr(:block)
+    n = fieldcount(R)
+    for col in 1:(n - 2)


we need to handle the last two entires differently, becase the second-last entry (the last "real" column) we need to be sure to stop parsing before the comment

Sorry, I'm unfamiliar with the package, so I have some potentially basic questions.

I see that this function dispatches over Buses30. But it's not always the case that a PSSE v30 will have the comment. How do we deal with both situations? Also, it's not just the buses that have augmented names, other components have it too; are they being accounted for?

it's not always the case that a PSSE v30 will have the comment. How do we deal with both situations?

we define Buses30 as having 12 columns, with the 12th being the comments. In this function we parse the first (12 - 2) = 10 columns as usual. We then parse the 11th (and potentially last column present, if no comments) with whitespace as the delimiter (i.e. stop parsing when we hit a space or newline). If when parsing the 11th column we hit the newline character then there are no comments and we set the 12th column as missing, else if we hit whitespace we then parse the rest of the line as the 12th and final column. See the if newline in the code.

Also, it's not just the buses that have augmented names, other components have it too; are they being accounted for?

As far as i understood only the Buses comments ("augmented names") were needed, so this PR only parses comments from the Buses section (and only for v30 files). I doesn't extract comments present in any other section. Do we need to?

These are good question and i'm very happy to answer as many questions as you have!

Yeah, we do use augmented names for other components, namely generators, loads, and branches (incl. transformers)

so we need to functionality not just for buses?

oh, wait, for other components, these comments can be reconstructed from columns already in the data, no?

In the case of loads and generators the augmented names are constructed from other columns. But for branches (including transformers) we use the end of line comments.

transformers are the biggest pain 😂

nickrobinson251 · 2022-07-14T17:36:33Z

src/parsing.jl

+    push!(block.args, quote
+        # @show (rec, pos, code)
+        pos = checkdelim!(bytes, pos, len, OPTIONS_SPACE)
+        (rec, pos, code) = parse_value!(rec, $m, $Tm, bytes, pos, len, OPTIONS_SPACE)


we need to set whitespace as the delim in order to not parse the comment when parsing the second-last entry (hence OPTIONS_SPACE) but if we parsed with comma as the delim til now then we may be at a whitespace so we use checkdelim! to move to the next non-whitespace char before trying to parse this second-last entry

src/parsing.jl

nickrobinson251 · 2022-07-14T17:38:09Z

test/runtests.jl

@@ -497,6 +497,22 @@ using Test
        @test net_space.branches.j == net_space_manual.branches.j
    end

+    @testset "End-of-line comments" begin
+        net_eol = parse_network("testfiles/eolcomments.raw")


todo: add comment here about only Buses30 having comment column

raphaelsaavedra · 2022-07-14T17:55:12Z

pray we never have to handle the case mentioned in #27 where there's a comma (not a space) separating the last "real " column and the "comments"

IIRC this was the case for CAISO data. So we might have to deal with this in the (not so near) future.

nickrobinson251 · 2022-07-15T07:53:42Z

pray we never have to handle the case mentioned in #27 where there's a comma (not a space) separating the last "real " column and the "comments"

IIRC this was the case for CAISO data. So we might have to deal with this in the (not so near) future.

Yeah... i'm not sure how we'd handle that right now, since we hardcode "space is the deliimiter". I suppose we'd have to eithre manually peak ahead in search of the delimiter to "look" if there's a comma, then parse with comma-as-delimiter OR perhaps easier to parse with space-as-delimiter, check the resulting code and if parsing was invalid ( or perhaps just invaliddelimiter?) then go back and re-parse with comma-as-delimiter

I prefer to kick this can down the road until this feature is requested/needded

BSnelling · 2022-07-25T16:47:24Z

src/parsing.jl

+@generated function parse_row!(rec::R, bytes, pos, len, options) where {R <: Buses30}
+    block = Expr(:block)
+    n = fieldcount(R)
+    for col in 1:(n - 2)


In the case of loads and generators the augmented names are constructed from other columns. But for branches (including transformers) we use the end of line comments.

BSnelling · 2022-07-26T08:24:41Z

src/parsing.jl

+#  111,'STBC      ',161.00,1,    0.00,    0.00,227,   1,1.09814,  -8.327,  1 /* [STBC   1   ] */
+# And does _not_ handle data with a comma before the comment like:
+#  111,'STBC      ',161.00,1,    0.00,    0.00,227,   1,1.09814,  -8.327,  1, /* [STBC   1   ] */
+@generated function parse_row!(rec::R, bytes, pos, len, options) where {R <: Buses30}


Out of interest, why does this need to be a generated function?

unfortunately thiis is the necessary to avoid dynamic dispatch (#22)

whenever we call:

fo col in 1:ncols T = eltype(fieldtype(R, col)) parse_value!(record, col, T, bytes, pos, len, options)

the type of T is not known til runtime, meaaning the parse_value! call has to look-up decide at runtime which method to call, which adds a big performance cost, since we have one parse_value! call per value/element in the file

so we use a generated function to unroll the loop and generate code like

parse_value!(record, 1, Int64, bytes, pos, len, options) parse_value!(record, 2, Float64, bytes, pos, len, options) ... parse_value!(record, 12, Bool, bytes, pos, len, options)

I think the other way we might have been able to avoid the dynamic dispatch is by manually doing what "union splitting" does e.g. like is done in ODBC.jl

CSV.jl uses a mix of these techniques e.g. splitting the big union of expected types and generating code for custom types

BSnelling · 2022-07-26T09:07:51Z

test/runtests.jl

+        @test length(loads) == 1
+        @test loads.i == [10010]  # first col
+        @test loads.owner == [1]  # last non-missing col
+        @test isequal(loads.intrpt, [missing]) # last col


I'm wondering what exactly this test is checking for. Is it checking that the end of line comment has not been interpreted as intrpt? In that case should there also be a test that loads.scale is missing? scale is the 13th column so would be where the end of line comment could mistakenly end up.

it's just checking that loads parse as expected (but checking only select columns as a proxy, rather than every column)

scale is the 13th column so would be where the end of line comment could mistakenly end up.

i don't follow this - can you explain?

Anyway it sounds like if we want something like this PR then it does need to parse commnts from more than just Buses data, so this'd need updating

scale is the 13th column so would be where the end of line comment could mistakenly end up.

I thought this test might be intended to check that the end of line comment hadn't been parsed and placed into the last column by mistake. In the test file for this test, loads has 12 columns, so I thought checking the 13th column was missing would be informative.

nickrobinson251 added 4 commits July 14, 2022 16:23

WIP

5794808

WIP2

935a243

WIP3

2f2eb66

WIP don't strip comment markers

b471eda

nickrobinson251 commented Jul 14, 2022

View reviewed changes

Add comments

8348816

BSnelling reviewed Jul 26, 2022

View reviewed changes

Treat inline-comment-chars as quotes to extract text only

a1beb1c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parse trailing comments from buses data in v30 files #78

Parse trailing comments from buses data in v30 files #78

nickrobinson251 commented Jul 14, 2022 •

edited

Loading

nickrobinson251 commented Jul 14, 2022

nickrobinson251 Jul 14, 2022

raphaelsaavedra Jul 14, 2022

nickrobinson251 Jul 15, 2022

raphaelsaavedra Jul 15, 2022

nickrobinson251 Jul 19, 2022

nickrobinson251 Jul 19, 2022

BSnelling Jul 25, 2022

nickrobinson251 Aug 4, 2022

nickrobinson251 Jul 14, 2022

nickrobinson251 Jul 14, 2022

raphaelsaavedra commented Jul 14, 2022

nickrobinson251 commented Jul 15, 2022

BSnelling Jul 25, 2022

BSnelling Jul 26, 2022

nickrobinson251 Aug 4, 2022

BSnelling Jul 26, 2022

nickrobinson251 Aug 4, 2022

BSnelling Aug 4, 2022

Parse trailing comments from buses data in v30 files #78

Are you sure you want to change the base?

Parse trailing comments from buses data in v30 files #78

Conversation

nickrobinson251 commented Jul 14, 2022 • edited Loading

nickrobinson251 commented Jul 14, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

raphaelsaavedra commented Jul 14, 2022

nickrobinson251 commented Jul 15, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nickrobinson251 commented Jul 14, 2022 •

edited

Loading