Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting 'number stored as text' error in mixed columns #93

Open
cat-hand opened this issue Apr 19, 2023 · 6 comments
Open

Getting 'number stored as text' error in mixed columns #93

cat-hand opened this issue Apr 19, 2023 · 6 comments
Labels
bug Something isn't working must MoSCoW priority
Milestone

Comments

@cat-hand
Copy link

I'm exporting rounded and suppressed numbers and am getting the 'number stored as text' error in all of my numeric columns, even for columns where there are no rounded or suppressed numbers. I tried using numFmt = "#,##0" in the openxlsx createStyle function and it hasn't helped.

It looks like it might be an openxlsx issue and seems like it's solved in openxlsx2.

Can anyone help?

@matt-dray
Copy link
Collaborator

Hi Cat, thanks for highlighting this. Are you able to provide a reproducible example of the issue?

I've created a small example below that generates a workbook with a table (in sheet 'Table_1') that has two columns: one column (mpg) that's numeric but has a suppressed value ('[c]') and one column (cyl) that is fully numeric. In the attached screenshot, you can see (as expected) the 'number stored as text' warning is in column mpg, but there's no warning in cyl. Do you get the same result?

(Sidenote: as it happens, I may be converting {a11ytables} to use {openxlsx2} instead of {openxlsx} in future (see #70). This issue may be another reason to make the change.)

# Prepare supporting tables
support_tables <- list(
  cover_df = tibble::tribble(
    ~subsection_title, ~subsection_content,
    "Purpose", "Example results for something.",
    "Workbook properties", "Some placeholder information.",
    "Contact", "Placeholder email"
  ),
  contents_df = tibble::tribble(
    ~"Sheet name", ~"Sheet title",
    "Notes", "Notes",
    "Table_1", "Example sheet title"
  ),
  notes_df = tibble::tribble(
    ~"Note number", ~"Note text",
    "[note 1]", "Placeholder note.",
    "[note 2]", "Placeholder note."
  )
)

# Prepare example dataset
table_df <- mtcars
table_df[["car [note 1]"]] <- row.names(mtcars)  # row names to named column
row.names(table_df) <- NULL  # remove row names
table_df <- table_df[1:5, c("car [note 1]", "mpg", "cyl")]  # add note to header
table_df[2, "mpg"] <- "[c]"  # make a value confidential in the mpg column
table_df["Notes"] <- c("[note 2]", rep(NA_character_, 4))  # include note column

# Create new a11ytable
my_a11ytable <-
  a11ytables::create_a11ytable(
    tab_titles = c(
      "Cover",
      "Contents",
      "Notes",
      "Table_1"
    ),
    sheet_types = c(
      "cover",
      "contents",
      "notes",
      "tables"
    ),
    sheet_titles = c(
      "Cover title (example)",
      "Contents",
      "Notes",
      "Example sheet title"
    ),
    blank_cells = c(
      NA_character_,
      NA_character_,
      NA_character_,
      "Blank cells mean that a row does not have a note."
    ),
    sources = c(
      NA_character_,
      NA_character_,
      NA_character_,
      "Example source."
    ),
    tables = list(
      support_tables$cover_df,
      support_tables$contents_df,
      support_tables$notes_df,
      table_df
    )
  )

# Generate workbook from a11ytable
my_wb <- a11ytables::generate_workbook(my_a11ytable)

# Preview output
openxlsx::openXL(my_wb)

image

@cat-hand
Copy link
Author

Thanks for the quick reply Matt! I think the problem is that all my 'numeric' columns are actually text by the time I create my a11ytable - I have a function which rounds and suppresses all of the figures, so they are all stored as strings after using that function. I read that openxlx bases the excel column type on the column type in the input data, so I'm assuming that this is what is causing the error.

Was hoping there was some kind of workaround but I guess not haha. I have loads of tables in my publication so getting rid of the error manually isn't awesome. openxlsx2 sounds very promising though!

For info, this is essentially how that function works:

library(dplyr)
table_df <- mtcars %>%
  mutate(across(where(is.numeric), ~case_when(
    # in this example, figures less than 5 are suppressed
    .x < 5 ~ "[c]",
    # figures less than 6 are 'unreliable' so are shown but marked with 'u'
    .x < 6 ~ paste(janitor::round_half_up(.x, 1), "[u]"),
    TRUE ~ as.character(janitor::round_half_up(.x, 0)))))

@matt-dray
Copy link
Collaborator

Okey dokey, I think I understand: columns get converted to character class even if they remain unsuppressed. I think there's a few ways to convert numbers-only columns back to numeric en masse. Below is some demo code that uses type.convert() to do this (see how numeric column y gets converted to character in table_supp, but then ends up as numeric at the end). Let me know if I haven't quite understood.

suppressPackageStartupMessages(library(tidyverse))
library(janitor, warn.conflicts = FALSE)

set.seed(1337)

tbl <- tibble(
  id = paste0("id_", 1:10),
  x  = runif(10, 0, 10),  # will contain values to be suppressed
  y  = runif(10, 6, 10)   # won't need any suppression
)

tbl
#> # A tibble: 10 × 3
#>    id        x     y
#>    <chr> <dbl> <dbl>
#>  1 id_1  5.76   9.92
#>  2 id_2  5.65   9.97
#>  3 id_3  0.740  9.31
#>  4 id_4  4.54   6.78
#>  5 id_5  3.73   9.93
#>  6 id_6  3.31   6.10
#>  7 id_7  9.48   9.89
#>  8 id_8  2.81   9.70
#>  9 id_9  2.45   7.36
#> 10 id_10 1.46   6.99

tbl_supp <- tbl |> 
  mutate(
    across(
      where(is.numeric),
      \(value) case_when(
        value < 5 ~ "[c]",                                  # suppress
        value < 6 ~ paste(round_half_up(value, 1), "[u]"),  # 'unreliable' flag
        TRUE      ~ as.character(round_half_up(value, 0))   # otherwise round
      )
    )
  )

tbl_supp  # note 'y' is character but contains only numbers
#> # A tibble: 10 × 3
#>    id    x       y    
#>    <chr> <chr>   <chr>
#>  1 id_1  5.8 [u] 10   
#>  2 id_2  5.6 [u] 10   
#>  3 id_3  [c]     9    
#>  4 id_4  [c]     7    
#>  5 id_5  [c]     10   
#>  6 id_6  [c]     6    
#>  7 id_7  9       10   
#>  8 id_8  [c]     10   
#>  9 id_9  [c]     7    
#> 10 id_10 [c]     7

type.convert(tbl_supp, as.is = TRUE)  # note 'y' is now numeric
#> # A tibble: 10 × 3
#>    id    x           y
#>    <chr> <chr>   <int>
#>  1 id_1  5.8 [u]    10
#>  2 id_2  5.6 [u]    10
#>  3 id_3  [c]         9
#>  4 id_4  [c]         7
#>  5 id_5  [c]        10
#>  6 id_6  [c]         6
#>  7 id_7  9          10
#>  8 id_8  [c]        10
#>  9 id_9  [c]         7
#> 10 id_10 [c]         7

@cat-hand
Copy link
Author

Thank Matt, I didn't know about that function 😃 will give that a go

@matt-dray matt-dray added the bug Something isn't working label Aug 29, 2023
@matt-dray matt-dray added the must MoSCoW priority label Nov 5, 2023
@matt-dray matt-dray added this to the v0.4 milestone Jan 4, 2024
@jake-pitchers
Copy link

Hi there,

Have been using this package for a while now. I was wondering whether there was any plans to have it so columns with both strings and numbers could be stored as numbers where appropriate. Alternatively is there is a neat way to do it with openxslx that would be great to see :)

Thanks,

Jake

@cat-hand
Copy link
Author

cat-hand commented Jul 2, 2024

@jake-pitchers I have a temporary fix for this, stole it from the rapid.spreadsheets package:

#' Fixes column where numbers are stored as text
#'
#' Overwrites selected columns and rows that contain both numeric and character
#' elements. Once this is run, you should add a new style to the modified
#' columns and rows so that they are displayed correctly in the output file. Use
#' [openxlsx::addStyle()] and [get_cell_style()] to do this.
#'
#' @importFrom openxlsx writeData
#' @importFrom dplyr pull if_else
#' @importFrom stringr str_remove_all str_detect
#'
#' @param excel_wb Openxlsx workbook name
#' @param sheet Worksheet (either name as string, or location as numeric)
#' @param cols Vector of column numbers to be overwritten
#' @param rows Vector of row numbers to be overwritten
#' @param df Data frame containing the data from the relevant worksheet
#'
#' @return Updated workbook with modified columns
#'
#' @export

overwrite_num_cols <- function(excel_wb, sheet, cols, rows, df) {

  lapply(seq_along(cols), \(col) {

    full_col <- pull(df[cols], col)
    # Only convert numbers to numeric if they aren't marked with [u]
    full_col_num <- str_remove_all(full_col, "(,|%)(?!.*[u])")

    lapply(seq_along(rows), \(row) {

      # If the cell contains a character (e.g. [c]), return the character value
      if (is.na(suppressWarnings(as.numeric(full_col_num[[row]])))) {

        new_value <- full_col_num[[row]]

      } else {

        # If the cell contains a number, return the numeric value (if it's a
        # percentage, divide the value by 100)
        new_value <- if_else(any(str_detect(full_col, "%")),
                             as.numeric(full_col_num[[row]]) / 100,
                             as.numeric(full_col_num[[row]]))

      }

      openxlsx::writeData(excel_wb, sheet, new_value,
                          startCol = cols[col],
                          startRow = (row - 1) + rows[1])

    })
  })
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working must MoSCoW priority
Projects
None yet
Development

No branches or pull requests

3 participants