strict mode? #101
Comments
Something like this might be helpful:

function sprint_full_df(df)
    # Render the whole table as plain text, without row/column elision
    return sprint() do io
        show(io, MIME"text/plain"(), df; allrows=true, allcols=true, truncate=1000)
    end
end
"""
check_for_plan_errors(plan; throw=true)
Checks an EDF conversion plan for a variety of potential issues:
- errors
- unmatched sensor types
- missing channels
- unmatched units
When there are no issues, `nothing` is returned. When there are problems, the behavior depends on the keyword argument `throw`. If `throw=true` (the default), an error is thrown with an informative message; otherwise, a list of issues (as strings) is returned.
"""
function check_for_plan_errors(plan; throw=true)
    plan = DataFrame(plan)
    issues = String[]

    # If there are error(s), plan needs manual inspection
    error_rows = subset(plan, :error => ByRow(e -> e !== nothing && !ismissing(e)))
    if !isempty(error_rows)
        push!(issues,
              """
              Errors:
              $(sprint_full_df(error_rows[:, [:label, :error]]))
              """)
    end

    # Rows whose sensor type could not be matched
    missing_sensor_type = subset(plan, :sensor_type => ByRow(ismissing))
    if !isempty(missing_sensor_type)
        push!(issues,
              """
              Unmatched sensor types:
              $(sprint_full_df(missing_sensor_type[:, [:label, :transducer_type, :channel]]))
              """)
    end

    # Rows whose channel could not be matched
    missing_channel = subset(plan, :channel => ByRow(ismissing))
    if !isempty(missing_channel)
        push!(issues,
              """
              Missing channels:
              $(sprint_full_df(missing_channel[:, [:label, :transducer_type, :sensor_type, :channel]]))
              """)
    end

    bad_units = subset(plan,
                       [:physical_dimension, :sample_unit] => ByRow() do p, u
                           # no unit in the EDF, nothing we can do
                           p == "" && return false
                           return u == "unknown"
                       end)
    if !isempty(bad_units)
        push!(issues,
              """
              Channels with missing `sample_unit` even though EDF has non-empty `physical_dimension`:
              $(sprint_full_df(bad_units[:, [:label, :transducer_type, :sensor_type, :physical_dimension, :sample_unit]]))
              """)
    end

    isempty(issues) && return nothing

    if throw
        # Build one numbered report covering every detected problem
        msg = sprint() do io
            println(io, "Found $(length(issues)) problems in EDF conversion plan.")
            for (idx, problem) in enumerate(issues)
                println(io, "$idx. $problem")
            end
        end
        error(msg)
    end
    return issues
end
I think there should be some kind of strict mode where a plan can only be executed if everything is explicitly marked ok, like:
errors
etc. I think some of the current defaults are around ingesting large datasets where it may be OK to miss a channel here and there. But there are use-cases in which you want to verify everything is 100% checked.
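To make the idea concrete, here is a rough sketch of what a strict gate could look like. It is only an illustration: `is_row_ok`, `strict_violations`, and `assert_strict` are hypothetical names, not existing functions, and the plan is modelled as a plain vector of named tuples rather than a real plan table so that the snippet runs without any packages. The column names and the "unknown" unit convention are assumed to match the plan columns used elsewhere in this thread.

```julia
# Hypothetical sketch of a strict gate; none of these names exist in the package.

# A row passes only if nothing about it is left unresolved.
function is_row_ok(row)
    no_error = row.error === nothing || ismissing(row.error)
    # An "unknown" unit is tolerated only when the EDF itself carries no unit.
    unit_ok = row.physical_dimension == "" || row.sample_unit != "unknown"
    return no_error &&
           !ismissing(row.sensor_type) &&
           !ismissing(row.channel) &&
           unit_ok
end

# Labels of all rows that fail the strict check.
strict_violations(rows) = [row.label for row in rows if !is_row_ok(row)]

# Refuse to proceed unless every row passes.
function assert_strict(rows)
    bad = strict_violations(rows)
    isempty(bad) || error("strict mode: $(length(bad)) unresolved row(s): $(join(bad, ", "))")
    return nothing
end

# Toy plan (made-up values, for illustration only)
rows = [
    (label="EEG C3-A2", error=nothing, sensor_type="eeg", channel="c3-a2",
     physical_dimension="uV", sample_unit="microvolt"),
    (label="Mystery", error=nothing, sensor_type=missing, channel=missing,
     physical_dimension="uV", sample_unit="unknown"),
]

violations = strict_violations(rows)
```

With a gate like this, execution would call `assert_strict` first and only continue when it returns `nothing`; the lenient behavior stays the default for bulk ingestion.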