Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse trailing comments from buses data in v30 files #78

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

nickrobinson251
Copy link
Owner

@nickrobinson251 nickrobinson251 commented Jul 14, 2022

ideally we'd extract just the text inside the `/*[ ... ] */"` but i've no got that working yet (using https://github.com/JuliaData/Parsers.jl/pull/117), so for now this PR just extracts the whole thing:
julia> parse_network("test/testfiles/eolcomments.raw").buses.comment
2-element Vector{Union{Missing, String31}}:
 "/* [NOBO     1   ] */"
 "/* [NAU_E2   1   ] */"

Using Parsers#main, this now produces:

julia> parse_network("test/testfiles/eolcomments.raw").buses.comment
2-element Vector{Union{Missing, InlineStrings.String31}}:
 "NOBO     1"
 "NAU_E2   1"

cc @BSnelling @raphaelsaavedra

@nickrobinson251
Copy link
Owner Author

various things to do:

  • check the performance of this
  • make this non-breaking?
  • decide if changing this to strip comment chars in future would be breaking (i'm leaning towards merge this with functionality as-is and maybe improve in future...)
  • comment the code
  • pray we never have to handle the case mentioned in Support extracting end-of-line comments? #27 where there's a comma (not a space) separating the last "real " column and the "comments" ?!

@generated function parse_row!(rec::R, bytes, pos, len, options) where {R <: Buses30}
block = Expr(:block)
n = fieldcount(R)
for col in 1:(n - 2)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to handle the last two entires differently, becase the second-last entry (the last "real" column) we need to be sure to stop parsing before the comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm unfamiliar with the package, so I have some potentially basic questions.

I see that this function dispatches over Buses30. But it's not always the case that a PSSE v30 will have the comment. How do we deal with both situations? Also, it's not just the buses that have augmented names, other components have it too; are they being accounted for?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not always the case that a PSSE v30 will have the comment. How do we deal with both situations?

we define Buses30 as having 12 columns, with the 12th being the comments. In this function we parse the first (12 - 2) = 10 columns as usual. We then parse the 11th (and potentially last column present, if no comments) with whitespace as the delimiter (i.e. stop parsing when we hit a space or newline). If when parsing the 11th column we hit the newline character then there are no comments and we set the 12th column as missing, else if we hit whitespace we then parse the rest of the line as the 12th and final column. See the if newline in the code.

Also, it's not just the buses that have augmented names, other components have it too; are they being accounted for?

As far as i understood only the Buses comments ("augmented names") were needed, so this PR only parses comments from the Buses section (and only for v30 files). I doesn't extract comments present in any other section. Do we need to?

These are good question and i'm very happy to answer as many questions as you have!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we do use augmented names for other components, namely generators, loads, and branches (incl. transformers)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we need to functionality not just for buses?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, wait, for other components, these comments can be reconstructed from columns already in the data, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of loads and generators the augmented names are constructed from other columns. But for branches (including transformers) we use the end of line comments.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transformers are the biggest pain 😂

push!(block.args, quote
# @show (rec, pos, code)
pos = checkdelim!(bytes, pos, len, OPTIONS_SPACE)
(rec, pos, code) = parse_value!(rec, $m, $Tm, bytes, pos, len, OPTIONS_SPACE)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to set whitespace as the delim in order to not parse the comment when parsing the second-last entry (hence OPTIONS_SPACE) but if we parsed with comma as the delim til now then we may be at a whitespace so we use checkdelim! to move to the next non-whitespace char before trying to parse this second-last entry

src/parsing.jl Outdated Show resolved Hide resolved
@@ -497,6 +497,22 @@ using Test
@test net_space.branches.j == net_space_manual.branches.j
end

@testset "End-of-line comments" begin
net_eol = parse_network("testfiles/eolcomments.raw")
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: add comment here about only Buses30 having comment column

@raphaelsaavedra
Copy link

pray we never have to handle the case mentioned in #27 where there's a comma (not a space) separating the last "real " column and the "comments"

IIRC this was the case for CAISO data. So we might have to deal with this in the (not so near) future.

@nickrobinson251
Copy link
Owner Author

pray we never have to handle the case mentioned in #27 where there's a comma (not a space) separating the last "real " column and the "comments"

IIRC this was the case for CAISO data. So we might have to deal with this in the (not so near) future.

Yeah... i'm not sure how we'd handle that right now, since we hardcode "space is the deliimiter". I suppose we'd have to eithre manually peak ahead in search of the delimiter to "look" if there's a comma, then parse with comma-as-delimiter OR perhaps easier to parse with space-as-delimiter, check the resulting code and if parsing was invalid ( or perhaps just invaliddelimiter?) then go back and re-parse with comma-as-delimiter

I prefer to kick this can down the road until this feature is requested/needded

@generated function parse_row!(rec::R, bytes, pos, len, options) where {R <: Buses30}
block = Expr(:block)
n = fieldcount(R)
for col in 1:(n - 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of loads and generators the augmented names are constructed from other columns. But for branches (including transformers) we use the end of line comments.

# 111,'STBC ',161.00,1, 0.00, 0.00,227, 1,1.09814, -8.327, 1 /* [STBC 1 ] */
# And does _not_ handle data with a comma before the comment like:
# 111,'STBC ',161.00,1, 0.00, 0.00,227, 1,1.09814, -8.327, 1, /* [STBC 1 ] */
@generated function parse_row!(rec::R, bytes, pos, len, options) where {R <: Buses30}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of interest, why does this need to be a generated function?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately thiis is the necessary to avoid dynamic dispatch (#22)

whenever we call:

fo col in 1:ncols
  T = eltype(fieldtype(R, col))
  parse_value!(record, col, T, bytes, pos, len, options)

the type of T is not known til runtime, meaaning the parse_value! call has to look-up decide at runtime which method to call, which adds a big performance cost, since we have one parse_value! call per value/element in the file

so we use a generated function to unroll the loop and generate code like

parse_value!(record, 1, Int64, bytes, pos, len, options)
parse_value!(record, 2, Float64, bytes, pos, len, options)
...
parse_value!(record, 12, Bool, bytes, pos, len, options)

I think the other way we might have been able to avoid the dynamic dispatch is by manually doing what "union splitting" does e.g. like is done in ODBC.jl

CSV.jl uses a mix of these techniques e.g. splitting the big union of expected types and generating code for custom types

@test length(loads) == 1
@test loads.i == [10010] # first col
@test loads.owner == [1] # last non-missing col
@test isequal(loads.intrpt, [missing]) # last col
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering what exactly this test is checking for. Is it checking that the end of line comment has not been interpreted as intrpt? In that case should there also be a test that loads.scale is missing? scale is the 13th column so would be where the end of line comment could mistakenly end up.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just checking that loads parse as expected (but checking only select columns as a proxy, rather than every column)

scale is the 13th column so would be where the end of line comment could mistakenly end up.

i don't follow this - can you explain?

Anyway it sounds like if we want something like this PR then it does need to parse commnts from more than just Buses data, so this'd need updating

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scale is the 13th column so would be where the end of line comment could mistakenly end up.

I thought this test might be intended to check that the end of line comment hadn't been parsed and placed into the last column by mistake. In the test file for this test, loads has 12 columns, so I thought checking the 13th column was missing would be informative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants