-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parse trailing comments from buses data in v30 files #78
base: main
Are you sure you want to change the base?
Conversation
various things to do:
|
@generated function parse_row!(rec::R, bytes, pos, len, options) where {R <: Buses30} | ||
block = Expr(:block) | ||
n = fieldcount(R) | ||
for col in 1:(n - 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to handle the last two entires differently, becase the second-last entry (the last "real" column) we need to be sure to stop parsing before the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm unfamiliar with the package, so I have some potentially basic questions.
I see that this function dispatches over Buses30
. But it's not always the case that a PSSE v30 will have the comment. How do we deal with both situations? Also, it's not just the buses that have augmented names, other components have it too; are they being accounted for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not always the case that a PSSE v30 will have the comment. How do we deal with both situations?
we define Buses30
as having 12 columns, with the 12th being the comments. In this function we parse the first (12 - 2) = 10 columns as usual. We then parse the 11th (and potentially last column present, if no comments) with whitespace as the delimiter (i.e. stop parsing when we hit a space or newline). If when parsing the 11th column we hit the newline character then there are no comments and we set the 12th column as missing
, else if we hit whitespace we then parse the rest of the line as the 12th and final column. See the if newline
in the code.
Also, it's not just the buses that have augmented names, other components have it too; are they being accounted for?
As far as i understood only the Buses comments ("augmented names") were needed, so this PR only parses comments from the Buses section (and only for v30 files). I doesn't extract comments present in any other section. Do we need to?
These are good question and i'm very happy to answer as many questions as you have!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we do use augmented names for other components, namely generators, loads, and branches (incl. transformers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we need to functionality not just for buses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, wait, for other components, these comments can be reconstructed from columns already in the data, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of loads and generators the augmented names are constructed from other columns. But for branches (including transformers) we use the end of line comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
transformers are the biggest pain 😂
push!(block.args, quote | ||
# @show (rec, pos, code) | ||
pos = checkdelim!(bytes, pos, len, OPTIONS_SPACE) | ||
(rec, pos, code) = parse_value!(rec, $m, $Tm, bytes, pos, len, OPTIONS_SPACE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to set whitespace as the delim in order to not parse the comment when parsing the second-last entry (hence OPTIONS_SPACE
) but if we parsed with comma as the delim til now then we may be at a whitespace so we use checkdelim!
to move to the next non-whitespace char before trying to parse this second-last entry
@@ -497,6 +497,22 @@ using Test | |||
@test net_space.branches.j == net_space_manual.branches.j | |||
end | |||
|
|||
@testset "End-of-line comments" begin | |||
net_eol = parse_network("testfiles/eolcomments.raw") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: add comment here about only Buses30
having comment
column
IIRC this was the case for CAISO data. So we might have to deal with this in the (not so near) future. |
Yeah... i'm not sure how we'd handle that right now, since we hardcode "space is the deliimiter". I suppose we'd have to eithre manually peak ahead in search of the delimiter to "look" if there's a comma, then parse with comma-as-delimiter OR perhaps easier to parse with space-as-delimiter, check the resulting code and if parsing was I prefer to kick this can down the road until this feature is requested/needded |
@generated function parse_row!(rec::R, bytes, pos, len, options) where {R <: Buses30} | ||
block = Expr(:block) | ||
n = fieldcount(R) | ||
for col in 1:(n - 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of loads and generators the augmented names are constructed from other columns. But for branches (including transformers) we use the end of line comments.
# 111,'STBC ',161.00,1, 0.00, 0.00,227, 1,1.09814, -8.327, 1 /* [STBC 1 ] */ | ||
# And does _not_ handle data with a comma before the comment like: | ||
# 111,'STBC ',161.00,1, 0.00, 0.00,227, 1,1.09814, -8.327, 1, /* [STBC 1 ] */ | ||
@generated function parse_row!(rec::R, bytes, pos, len, options) where {R <: Buses30} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of interest, why does this need to be a generated function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately thiis is the necessary to avoid dynamic dispatch (#22)
whenever we call:
fo col in 1:ncols
T = eltype(fieldtype(R, col))
parse_value!(record, col, T, bytes, pos, len, options)
the type of T
is not known til runtime, meaaning the parse_value!
call has to look-up decide at runtime which method to call, which adds a big performance cost, since we have one parse_value!
call per value/element in the file
so we use a generated function to unroll the loop and generate code like
parse_value!(record, 1, Int64, bytes, pos, len, options)
parse_value!(record, 2, Float64, bytes, pos, len, options)
...
parse_value!(record, 12, Bool, bytes, pos, len, options)
I think the other way we might have been able to avoid the dynamic dispatch is by manually doing what "union splitting" does e.g. like is done in ODBC.jl
CSV.jl uses a mix of these techniques e.g. splitting the big union of expected types and generating code for custom types
@test length(loads) == 1 | ||
@test loads.i == [10010] # first col | ||
@test loads.owner == [1] # last non-missing col | ||
@test isequal(loads.intrpt, [missing]) # last col |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering what exactly this test is checking for. Is it checking that the end of line comment has not been interpreted as intrpt
? In that case should there also be a test that loads.scale
is missing
? scale
is the 13th column so would be where the end of line comment could mistakenly end up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's just checking that loads
parse as expected (but checking only select columns as a proxy, rather than every column)
scale is the 13th column so would be where the end of line comment could mistakenly end up.
i don't follow this - can you explain?
Anyway it sounds like if we want something like this PR then it does need to parse commnts from more than just Buses data, so this'd need updating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scale is the 13th column so would be where the end of line comment could mistakenly end up.
I thought this test might be intended to check that the end of line comment hadn't been parsed and placed into the last column by mistake. In the test file for this test, loads
has 12 columns, so I thought checking the 13th column was missing
would be informative.
ideally we'd extract just the text inside the `/*[ ... ] */"` but i've no got that working yet (using https://github.com/JuliaData/Parsers.jl/pull/117), so for now this PR just extracts the whole thing:
Using
Parsers#main
, this now produces:cc @BSnelling @raphaelsaavedra