strict mode? #101

Open
ericphanson opened this issue Oct 10, 2024 · 1 comment

Comments

@ericphanson
Member

I think there should be some kind of strict mode where a plan can only be executed if everything is explicitly marked ok, like:

  • all channels are being imported, or there's an explicit list of skipped channels
  • all channels have units
  • plan does not have errors

etc. I think some of the current defaults are around ingesting large datasets where it may be OK to miss a channel here and there. But there are use-cases in which you want to verify everything is 100% checked.
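The checks listed above could be sketched roughly as follows. This is only an illustration, not an existing API: the function name `strict_plan_violations`, the `skipped` keyword, and the row fields (`label`, `channel`, `sample_unit`, `error`) are all hypothetical, loosely modeled on plan columns, and rows are plain named tuples so the sketch needs no packages.

```julia
# Hypothetical strict-mode check: returns a list of violations,
# where an empty list means the plan may be executed.
function strict_plan_violations(rows; skipped=String[])
    violations = String[]
    for row in rows
        # Assumption: labels on the explicit skip list are exempt from all checks.
        row.label in skipped && continue
        # The plan must not contain errors.
        if row.error !== nothing && !ismissing(row.error)
            push!(violations, "$(row.label): plan error: $(row.error)")
        end
        # Every channel must be imported unless it was explicitly skipped.
        if ismissing(row.channel)
            push!(violations, "$(row.label): not imported and not explicitly skipped")
        end
        # Every channel must have a unit.
        if ismissing(row.sample_unit) || row.sample_unit == "unknown"
            push!(violations, "$(row.label): no unit")
        end
    end
    return violations
end

rows = [
    (label="EEG C3-M2", channel="c3-m2", sample_unit="microvolt", error=nothing),
    (label="Junk", channel=missing, sample_unit="unknown", error=nothing),
]

lenient = strict_plan_violations(rows)                  # "Junk" is reported
strict_ok = strict_plan_violations(rows; skipped=["Junk"])  # "Junk" is explicitly skipped
```

Exempting skipped labels from every check is one possible design; a stricter variant might still report errors on skipped rows.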

@ericphanson
Member Author

ericphanson commented Dec 11, 2024

Something like this might be helpful:

using DataFrames

# Render a DataFrame as text without eliding any rows or columns.
function sprint_full_df(df)
    return sprint() do io
        show(io, MIME"text/plain"(), df; allrows=true, allcols=true, truncate=1000)
    end
end

"""
    check_for_plan_errors(plan; throw=true)

Checks an EDF conversion plan for a variety of potential issues:

- errors
- unmatched sensor types
- missing channels
- unmatched units

When there are no issues, `nothing` is returned. When there are problems, the behavior depends on the keyword argument `throw`. If `throw=true` (the default), an error is thrown with an informative message; otherwise, a list of issues (as strings) is returned.
"""
function check_for_plan_errors(plan; throw=true)
    plan = DataFrame(plan)
    issues = String[]
    # If there are error(s), plan needs manual inspection
    error_rows = subset(plan, :error => ByRow(e -> e !== nothing && !ismissing(e)))
    if !isempty(error_rows)
        push!(issues,
              """
              Errors:

              $(sprint_full_df(error_rows[:, [:label, :error]]))
              """)
    end
    missing_sensor_type = subset(plan, :sensor_type => ByRow(ismissing))
    if !isempty(missing_sensor_type)
        push!(issues,
              """
              Unmatched sensor types:

              $(sprint_full_df(missing_sensor_type[:, [:label, :transducer_type, :channel]]))
              """)
    end
    # Rows whose label was not matched to any channel
    missing_channel = subset(plan, :channel => ByRow(ismissing))
    if !isempty(missing_channel)
        push!(issues,
              """
              Missing channels:

              $(sprint_full_df(missing_channel[:, [:label, :transducer_type, :sensor_type, :channel]]))
              """)
    end

    bad_units = subset(plan,
                       [:physical_dimension, :sample_unit] => ByRow() do p, u
                           # no unit in the EDF, nothing we can do
                           p == "" && return false
                           return u == "unknown"
                       end)

    if !isempty(bad_units)
        push!(issues,
              """
              Channels with missing `sample_unit` even though EDF has non-empty `physical_dimension`:

              $(sprint_full_df(bad_units[:, [:label, :transducer_type, :sensor_type, :physical_dimension, :sample_unit]]))
              """)
    end
    isempty(issues) && return nothing
    if throw
        msg = sprint() do io
            println(io, "Found $(length(issues)) problems in EDF conversion plan.")
            for (idx, problem) in enumerate(issues)
                println(io, "$idx. $problem")
            end
        end
        error(msg)
    end
    return issues
end
