Rate of zeros in each feature #335

Open
sigmafelix opened this issue Jun 3, 2024 · 1 comment

Comments

@sigmafelix
Collaborator

We briefly discussed how to handle features that still contain an excessive share of true zeros after postprocessing and imputation. The fraction of zeros per feature gives us a reference point for deciding which measures to take.

  • Before imputation, the calculation results contain 3858 columns including site_id, time, and event flag. Five lagged features will be added after imputation.
  • The table below summarizes the number of features with zeros up to a certain percentile:
| Percentile         | 50   | 60   | 70   | 80   | 90   | 100  |
|--------------------|------|------|------|------|------|------|
| Number of features | 2218 | 2174 | 2100 | 2031 | 1881 | 1247 |
  • Note that zero-only features were excluded before imputation.
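
For reference, a minimal sketch of how such a summary could be produced. It assumes one reading of the table: a feature is counted at a given percentile when at least that share of its values is exactly zero. The function names, the `ID_COLUMNS` set, and the toy data are hypothetical and are not taken from the project's pipeline; Python is used only for illustration.

```python
from typing import Dict, Optional, Sequence

# Hypothetical identifier columns; adjust to the real table layout.
ID_COLUMNS = {"site_id", "time", "event"}


def zero_fraction(
    columns: Dict[str, Sequence[Optional[float]]]
) -> Dict[str, float]:
    """Return the share of exact zeros in each non-identifier column."""
    fractions: Dict[str, float] = {}
    for name, values in columns.items():
        if name in ID_COLUMNS:
            continue
        # Missing values (None) are skipped, not counted as zeros.
        observed = [v for v in values if v is not None]
        if not observed:
            continue
        zeros = sum(1 for v in observed if v == 0)
        fractions[name] = zeros / len(observed)
    return fractions


def count_at_or_above(
    fractions: Dict[str, float], thresholds: Sequence[float]
) -> Dict[float, int]:
    """Count features whose zero fraction is at least each threshold."""
    return {
        t: sum(1 for f in fractions.values() if f >= t) for t in thresholds
    }


# Toy data for illustration only, not the project's results.
toy = {
    "site_id": [1, 2, 3, 4],
    "feat_a": [0.0, 0.0, 0.0, 1.5],
    "feat_b": [0.0, 2.0, 3.0, 4.0],
    "feat_c": [0.0, 0.0, None, 0.0],
}
fractions = zero_fraction(toy)
counts = count_at_or_above(fractions, [0.5, 0.75, 1.0])
```

If the table was built differently (for example, from the value of each feature at a given quantile), the counting rule in `count_at_or_above` would need to change accordingly.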

@kyle-messier

@sigmafelix
Collaborator Author

The current imputation procedure includes every feature except those with zero variance. We could easily adjust the base function to change that.
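
As a rough sketch of what such an adjustment might look like: the base function itself is not shown in this thread, so `select_features` and its `max_zero_fraction` parameter are hypothetical names, and Python is used only for illustration. With the default setting it mirrors the zero-variance rule described above; passing a threshold additionally drops features dominated by zeros.

```python
from statistics import pvariance
from typing import Dict, List, Optional, Sequence


def select_features(
    columns: Dict[str, Sequence[Optional[float]]],
    max_zero_fraction: Optional[float] = None,
) -> List[str]:
    """Return the names of features to pass on to imputation.

    Zero-variance features are always dropped. If max_zero_fraction
    is given (an assumed, illustrative option), features whose share
    of exact zeros exceeds it are dropped as well.
    """
    kept: List[str] = []
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        # Fewer than two observed values cannot show any variance.
        if len(observed) < 2 or pvariance(observed) == 0:
            continue
        zero_share = sum(1 for v in observed if v == 0) / len(observed)
        if max_zero_fraction is not None and zero_share > max_zero_fraction:
            continue
        kept.append(name)
    return kept


# Toy data for illustration only.
toy = {
    "constant": [2.0, 2.0, 2.0, 2.0],
    "mostly_zero": [0.0, 0.0, 0.0, 1.5],
    "dense": [0.0, 2.0, 3.0, 4.0],
}
default_selection = select_features(toy)
strict_selection = select_features(toy, max_zero_fraction=0.5)
```

Keeping the threshold optional means the current behavior stays the default, and the cutoff can be chosen later from the zero-fraction summary.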
