2024-11-13: Missing terms are hard to find. #753
Labels: question (further information is requested)
Comments
Somewhat related: the checklist in #239 is quite outdated. A lot of the unchecked terms have now been defined.
I wrote R code to do this:

library(yaml)
library(dplyr)
library(tidyr)

# Load the YAML file
yaml_data <- yaml.load_file(
  "https://github.com/HeidiSeibold/glosario/raw/refs/heads/HeidiSeibold-glosario-1/glossary.yml",
  as.named.list = TRUE
)

# Collect one (slug, language) row for every translation that exists
terms_data <- list()

# Loop through each term entry in the YAML data
for (entry in yaml_data) {
  slug <- entry$slug
  # Every key other than 'slug' and 'ref' is a language code
  for (lang in setdiff(names(entry), c("slug", "ref"))) {
    terms_data <- append(terms_data, list(data.frame(
      slug = slug,
      language = lang
    )))
  }
}

# Combine all individual data frames into one
terms_df <- do.call(rbind, terms_data)

# Convert to wide format with 1 for presence and 0 for absence
df_wide <- terms_df %>%
  mutate(present = 1) %>%             # Add a column to indicate presence
  pivot_wider(
    names_from = language,            # Make language codes the columns
    values_from = present,            # Use presence indicator
    values_fill = list(present = 0)   # Fill missing combinations with 0
  )

# View the final table
print(df_wide)

Rows: slugs
TODO:
@HeidiSeibold thank you so much for this - we are also working on the issue here: #752
It is hard to find which terms are missing and in which languages they are missing.
Ideas on how to solve:
Any other ideas?
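One possible approach, as a rough sketch only: list the missing (slug, language) pairs directly instead of reading them off a presence table. The example below uses base R and a small made-up `glossary` list in place of the real `glossary.yml`; the slugs, language codes, and the assumption that every key other than `slug` and `ref` is a language code are illustrative, not taken from the project.

```r
# Hypothetical stand-in for the parsed glossary.yml: each entry has a slug,
# an optional ref, and one sub-list per language code.
glossary <- list(
  list(slug = "alpha",
       en = list(term = "alpha", def = "..."),
       es = list(term = "alfa", def = "...")),
  list(slug = "beta", ref = list("alpha"),
       en = list(term = "beta", def = "...")),
  list(slug = "gamma",
       fr = list(term = "gamma", def = "..."))
)

# Keys treated as metadata rather than language codes (an assumption).
meta_keys <- c("slug", "ref")

# Languages present in each entry, named by slug.
langs_by_slug <- lapply(glossary, function(entry) setdiff(names(entry), meta_keys))
names(langs_by_slug) <- vapply(glossary, function(entry) entry$slug, character(1))

# Every language that appears anywhere in the glossary.
all_langs <- sort(unique(unlist(langs_by_slug)))

# For each slug, the languages that have no entry yet.
missing_by_slug <- lapply(langs_by_slug, function(present) setdiff(all_langs, present))

# Long table of missing (slug, language) pairs, easy to filter by language.
missing_df <- data.frame(
  slug = rep(names(missing_by_slug), lengths(missing_by_slug)),
  language = unlist(missing_by_slug, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(missing_df)
```

A translator could then filter `missing_df` on a single language code to get a to-do list for that language. To run this against the real file, `glossary` would be replaced by the result of loading `glossary.yml`.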