Heatmap of observations #202

Merged
merged 11 commits into from
Sep 3, 2024

Conversation

bpbond
Member

@bpbond bpbond commented Aug 2, 2024

@stephpenn1 something like this?

heatmap

@bpbond bpbond requested a review from stephpenn1 August 2, 2024 16:53
@bpbond
Member Author

bpbond commented Aug 2, 2024

Plotting the data by quarter seems better

heatmap
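
For reference, binning monthly counts into quarters could look roughly like the following. This is a minimal base R sketch, not the code behind the plot: the `counts` data frame, its column names, and its values are invented for illustration.

```r
# Hypothetical monthly observation counts (illustrative values only)
counts <- data.frame(
    Year  = rep(2023, 12),
    Month = 1:12,
    n     = c(10, 0, 5, 8, 8, 8, 0, 0, 3, 12, 12, 12)
)

# Map month 1-12 to quarter 1-4
counts$Quarter <- (counts$Month - 1) %/% 3 + 1

# Sum the monthly counts within each year and quarter
quarterly <- aggregate(n ~ Year + Quarter, data = counts, FUN = sum)
quarterly <- quarterly[order(quarterly$Year, quarterly$Quarter), ]
print(quarterly)
```

Quarterly bins give four tiles per year instead of twelve, which keeps the x axis readable over many years at the cost of hiding single-month gaps.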

@stephpenn1
Member

Yes I like this - leaning towards quarterly but am going to do some exploring on my own with tick marks and will come back with thoughts!

@stephpenn1
Member

If we go the monthly route (also, using my Google Drive download, I can see very clearly that it did split the download into multiple zips):
image

Does the graph above add enough clarity to justify monthly plotting? If it's still too confusing, I say we go with quarterly with more tick marks.

Now complicating it a step further, it would be really cool to have a waffle chart with data availability colored by the Instrument column :)

@bpbond
Member Author

bpbond commented Aug 2, 2024

Whoa, cool graph you made, although why all the missing data? 😕

> it would be really cool to have a waffle chart with data availability colored by the Instrument column

Oh!

@stephpenn1
Member

The missing data is because I only used one of the zip files that Google Drive downloaded (I thought the others were duplicates)
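
One way to avoid this in future is to extract every archive in the download folder rather than picking one. A rough base R sketch, where `extract_all_zips()` and both directory arguments are hypothetical names, not part of the data package tooling:

```r
# Extract every zip archive found in `zip_dir` into `out_dir`.
# Both directory arguments are placeholders chosen for this example.
extract_all_zips <- function(zip_dir, out_dir) {
    zips <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)
    dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
    # unzip() returns the paths of the files it extracted
    extracted <- lapply(zips, function(z) unzip(z, exdir = out_dir))
    message(length(zips), " archive(s) extracted")
    # unlist() of an empty list is NULL, so coerce to a character vector
    as.character(unlist(extracted))
}
```

The message with the archive count makes it obvious when a download was split into more pieces than expected.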

@bpbond
Member Author

bpbond commented Aug 2, 2024

Oh got it, thanks.

Are you going to try the waffle chart? Is that like what you did for the COSORE paper?

@stephpenn1
Member

I'm trying it out but if it becomes too complicated we'll go with your existing code.

Currently, does this only count # of rows and not "look" into the files?
results$rows[i] <- length(readLines(fls[i])) - 1

@bpbond
Member Author

bpbond commented Aug 2, 2024

Exactly right!
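
To make the distinction concrete, here is a small sketch comparing the two approaches on a throwaway temp file. The file contents and instrument names are invented for the example; only the `readLines()` line mirrors the code quoted above.

```r
# Write a tiny example CSV to a temporary file (contents are made up)
tmp <- tempfile(fileext = ".csv")
writeLines(c("TIMESTAMP,Site,Instrument,Value",
             "2023-01-15 00:00:00,MSM,Sapflow,1.2",
             "2023-01-15 00:15:00,MSM,TEROS12,0.3",
             "2023-02-01 00:00:00,MSM,Sapflow,1.4"), tmp)

# Approach 1: count lines and subtract the header; fast, but blind to contents
n_rows <- length(readLines(tmp)) - 1

# Approach 2: parse the file, which allows counting per instrument
dat <- read.csv(tmp, stringsAsFactors = FALSE)
per_instrument <- table(dat$Instrument)

print(n_rows)
print(per_instrument)
```

Line counting is much cheaper on large files, so it suits a simple row-count heatmap; anything broken down by a column such as Instrument needs the files to be parsed.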

@stephpenn1
Member

# list.files() takes a regular expression, not a glob, so match a literal ".csv" suffix
fls <- list.files("~/Documents/data package/v1-1 beta/", pattern = "\\.csv$", full.names = TRUE, recursive = TRUE)

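library(tibble)
library(dplyr)      # %>%, group_by(), summarise(), bind_rows(), etc.
library(lubridate)  # year(), month()
library(ggplot2)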
results <- list()

for(i in seq_along(fls)) {
    message(basename(fls[i]))
    
    # results$rows[i] <- length(readLines(fls[i])) - 1
    # Read each file and count observations by site, instrument, year, and month
    results[[basename(fls[i])]] <- readr::read_csv(fls[i]) %>% group_by(Site, Instrument, year(TIMESTAMP), month(TIMESTAMP)) %>% summarise(n = n(), .groups = "drop")
}

bind_rows(results) %>% 
    rename(Year = `year(TIMESTAMP)`, Month = `month(TIMESTAMP)`) %>% 
    arrange(Site, Year, Month) -> r

r %>% group_by(Site, Instrument, Year, Month) %>% 
    summarise(n = sum(n)) %>% 
    arrange(Month, Instrument) %>% 
    mutate(data_present = ifelse(n > 0, "Yes", "No"), Month = month.abb[Month]) %>% 
    filter(Site == "MSM") %>% select(-n) %>% ggplot() + 
    geom_tile(aes(x = factor(Month, levels = month.abb), y = Instrument, fill = data_present), linewidth = 1) + 
    facet_wrap(~Year) + 
    theme_minimal() + 
    theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=1)) + 
    labs(x = "Month") + 
    scale_fill_manual(values = "palegreen3")

image

@bpbond
Member Author

bpbond commented Aug 2, 2024

This will be useful for users but also for you/us. It seems like a great way to look for unexpected missing data streams, etc.
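
One caveat for that use: in the summary code above, a month and instrument combination with no rows never appears in the table at all, so `data_present` can only ever be "Yes" and gaps show up as blank tiles. If explicit "No" rows are wanted, for example to list the gaps, the counts can be joined onto a complete grid. A minimal base R sketch with invented counts and instrument names:

```r
# Hypothetical monthly counts for one site; months with no rows are simply absent
counts <- data.frame(
    Instrument = c("Sapflow", "Sapflow", "TEROS12"),
    Year       = c(2023, 2023, 2023),
    Month      = c(1L, 2L, 1L),
    n          = c(120, 95, 80),
    stringsAsFactors = FALSE
)

# Build every Instrument x Year x Month combination that could exist
grid <- expand.grid(
    Instrument = unique(counts$Instrument),
    Year       = unique(counts$Year),
    Month      = 1:12,
    stringsAsFactors = FALSE
)

# Left-join the observed counts onto the full grid; gaps come back as NA
full <- merge(grid, counts, by = c("Instrument", "Year", "Month"), all.x = TRUE)
full$n[is.na(full$n)] <- 0
full$data_present <- ifelse(full$n > 0, "Yes", "No")

# Combinations with no data are now explicit rows
missing <- full[full$data_present == "No", ]
print(nrow(missing))
```

With the gaps as real rows, `scale_fill_manual()` would need two colors, and the `missing` table can be scanned directly for unexpected holes.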

@stephpenn1
Member

Good to merge once checks pass

synoptic_avail
synoptic_avail_GCW

@stephpenn1 stephpenn1 merged commit f7418f0 into main Sep 3, 2024
1 check passed
@stephpenn1 stephpenn1 deleted the heatmap branch September 3, 2024 17:49