strict mode? #101

ericphanson · 2024-10-10T15:52:01Z

I think there should be some kind of strict mode where a plan can only be executed if everything is explicitly marked ok, like:

- all channels are being imported, or there's an explicit list of skipped channels
- all channels have units
- the plan does not have errors

etc. I think some of the current defaults are around ingesting large datasets where it may be OK to miss a channel here and there. But there are use-cases in which you want to verify everything is 100% checked.
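
A minimal sketch of what such a strict check could look like, assuming each plan row exposes `label`, `channel`, `sample_unit`, and `error` fields. The function name `strict_plan_issues` and the `skipped_labels` keyword are hypothetical and only illustrate the idea; they are not an existing API.

```julia
# Hypothetical sketch: `strict_plan_issues` and `skipped_labels` are
# illustrative names, not part of any existing package.
function strict_plan_issues(rows; skipped_labels=String[])
    issues = String[]
    for row in rows
        # Channels on the explicit skip list are exempt from every check.
        row.label in skipped_labels && continue
        # The plan row must not carry an error.
        if row.error !== nothing && !ismissing(row.error)
            push!(issues, "$(row.label): plan error: $(row.error)")
        end
        # Every channel must be imported unless it was explicitly skipped.
        if ismissing(row.channel)
            push!(issues, "$(row.label): not imported and not in `skipped_labels`")
        end
        # Every imported channel must have a known unit.
        if ismissing(row.sample_unit) || row.sample_unit == "unknown"
            push!(issues, "$(row.label): no unit")
        end
    end
    return issues
end

# Toy rows standing in for a conversion plan (made-up values).
rows = [(label="EEG C3", channel="c3", sample_unit="microvolt", error=nothing),
        (label="Aux 1", channel=missing, sample_unit="unknown", error=nothing)]

strict_plan_issues(rows)                            # two issues, both for "Aux 1"
strict_plan_issues(rows; skipped_labels=["Aux 1"])  # no issues
```

In strict mode, execution would then be refused unless the returned list is empty, so a channel can only be dropped by naming it explicitly.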


ericphanson · 2024-12-11T12:43:49Z

Something like this might be helpful:

using DataFrames

# Render a DataFrame to a string without eliding any rows or columns.
function sprint_full_df(df)
    return sprint() do io
        show(io, MIME"text/plain"(), df; allrows=true, allcols=true, truncate=1000)
    end
end

"""
    check_for_plan_errors(plan; throw=true)

Checks an EDF conversion plan for a variety of potential issues:

- errors
- unmatched sensor types
- missing channels
- unmatched units

When there are no issues, `nothing` is returned. When there are problems, the behavior depends on the keyword argument `throw`. If `throw=true` (the default), an error is thrown with an informative message; otherwise, a list of issues (as strings) is returned.
"""
function check_for_plan_errors(plan; throw=true)
    plan = DataFrame(plan)
    issues = String[]
    # If there are error(s), plan needs manual inspection
    error_rows = subset(plan, :error => ByRow(e -> e !== nothing && !ismissing(e)))
    if !isempty(error_rows)
        push!(issues,
              """
              Errors:

              $(sprint_full_df(error_rows[:, [:label, :error]]))
              """)
    end
    missing_sensor_type = subset(plan, :sensor_type => ByRow(ismissing))
    if !isempty(missing_sensor_type)
        push!(issues,
              """
              Unmatched sensor types:

              $(sprint_full_df(missing_sensor_type[:, [:label, :transducer_type, :channel]]))
              """)
    end
    # Rows where no channel was matched (checks `:channel`, not `:sensor_type`)
    missing_channel = subset(plan, :channel => ByRow(ismissing))
    if !isempty(missing_channel)
        push!(issues,
              """
              Missing channels:

              $(sprint_full_df(missing_channel[:, [:label, :transducer_type, :sensor_type, :channel]]))
              """)
    end

    bad_units = subset(plan,
                       [:physical_dimension, :sample_unit] => ByRow() do p, u
                           # no unit in the EDF, nothing we can do
                           p == "" && return false
                           return u == "unknown"
                       end)

    if !isempty(bad_units)
        push!(issues,
              """
              Channels with missing `sample_unit` even though EDF has non-empty `physical_dimension`:

              $(sprint_full_df(bad_units[:, [:label, :transducer_type, :sensor_type, :physical_dimension, :sample_unit]]))
              """)
    end
    isempty(issues) && return nothing
    if throw
        msg = sprint() do io
            println(io, "Found $(length(issues)) problems in EDF conversion plan.")
            for (idx, problem) in enumerate(issues)
                println(io, "$idx. $problem")
            end
        end
        error(msg)
    end
    return issues
end
