UTF-8/odd character handling causes header headaches #188

SimpleAddress4390 · 2023-07-31T20:13:06Z

Hi. Love dfSummary. I am processing large numbers of dataframes where some fields have Hebrew characters. I've been able to isolate an example where the original column text causes the dfSummary headers to become RMarkdown headings.

By using stringr to remove everything but alpha/numeric and punctuation, it works, but that approach of course assumes I know which fields to process before passing to dfSummary.

Is this just a known limitation, or a bug, or ...

I've attached a reproducible example: an .Rmd file plus HTML output for the case where it fails and the case where it works.

dfSummary-issue-20230731.zip

Thanks for any insight.

dcomtois · 2023-08-20T11:03:36Z

Hello,

I notice you pass the method = argument directly in the dfSummary() call. If you take a closer look at the vignette (https://cran.r-project.org/web/packages/summarytools/vignettes/rmarkdown.html), you'll see that it needs to go to print() instead, i.e.:

print(dfSummary(...), method = 'render')

You'll see a big difference in the rendering... Hope this resolves the issue, and sorry for the delay (I suggest you try StackOverflow to get a quicker response)

SimpleAddress4390 · 2023-08-21T11:20:18Z

Thanks for the reply. I'll look over at StackOverflow. I did know about print() and had tested that (but removed it to simplify the interactions). It fails with print() as well. I have traced the issue to something about the characters. Applying a UTF-8 normalization routine seems to fix it, but the fix has to be applied to the data, per field, rather than through the routine's options.

These two strings do not match using "==":

str(a) #chr "בית אל, ניידת - 7"
str(b) #chr "בית אל, ניידת - 7"
charToRaw(a) #[1] d7 91 d7 99 d7 aa c2 a0 d7 90 d7 9c 2c c2 a0 d7 a0 d7 99 d7 99 d7 93 d7 aa c2 a0 2d c2 a0 37
charToRaw(b) #[1] d7 91 d7 99 d7 aa 20 d7 90 d7 9c 2c 20 d7 a0 d7 99 d7 99 d7 93 d7 aa 20 2d 20 37

These two DO match after performing:

mutate(fixedString = utf8_normalize(badString, map_case = TRUE, map_compat = TRUE, map_quote = TRUE, remove_ignorable = TRUE))

Again, thanks!
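For anyone following along: the only difference between the two byte dumps above is that `a` has `c2 a0` (U+00A0, no-break space) wherever `b` has `20` (ordinary space). Below is a minimal base-R sketch, assuming the no-break space is what matters here (that is not confirmed as the root cause in dfSummary). The helper name `replace_nbsp` is hypothetical, and it is much narrower than `utf8_normalize()`: it only swaps U+00A0 for a plain space, but it does so across every character column, so you do not need to know in advance which fields are affected.

```r
# Rebuild the two strings from the byte dumps above using \u escapes.
# `b` has ordinary spaces (byte 20); `a` has no-break spaces (bytes c2 a0).
b <- "\u05d1\u05d9\u05ea \u05d0\u05dc, \u05e0\u05d9\u05d9\u05d3\u05ea - 7"
a <- gsub(" ", "\u00a0", b, fixed = TRUE)

a == b  # FALSE: the strings look alike but differ in their space characters

# Hypothetical helper (not part of summarytools): replace U+00A0 with a plain
# space in every character column of a data frame, leaving other columns alone.
replace_nbsp <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) {
    gsub("\u00a0", " ", col, fixed = TRUE)
  })
  df
}

dat <- data.frame(place = a, n = 1, stringsAsFactors = FALSE)
cleaned <- replace_nbsp(dat)

cleaned$place == b  # TRUE once the no-break spaces are replaced

# The cleaned data frame can then go to dfSummary() as usual, e.g.
# print(dfSummary(cleaned), method = 'render')
```

Factor columns are left untouched in this sketch; their levels would need the same treatment if they carry the affected text.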

dcomtois · 2023-08-21T16:46:43Z

OK, I'll try to look into it in more detail. In the meantime, feel free to share new insights here! Thx
