NAs are not allowed in subscripted assignments #15

Open
quzhouxiachuan opened this issue Mar 4, 2019 · 9 comments · May be fixed by #17

Comments


quzhouxiachuan commented Mar 4, 2019

Hi
I am using strip_rtf() to convert RTF to plain text. I encountered the following error:

strip_rtf(x[8])
Error in out[table_flg] <- paste(row_start, out[table_flg], row_end, sep = "") :
  NAs are not allowed in subscripted assignments

I checked the content of x[8], and it is not NA.

Could you please help with that?

Owner

kota7 commented Mar 4, 2019

@quzhouxiachuan I would love to fix it. Can you give me a reproducible example of the error?


strazto commented Jan 9, 2020

I'm having the same problem, though the debugger is more illuminating about the cause: it is not related to the input being NA, but rather to an NA value in the logical vector used to index out.

Unfortunately, I'm having a really hard time producing a reprex that doesn't violate medical ethics (trying to manually edit raw RTF was a lesson in patience, but mainly a lesson in humility and pain). RTF is the ugliest markup I've ever seen.

I will say, however, that the easy fix is probably just to convert those NAs in table_flg to FALSE and accept that it might slightly break some table formatting.
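To make the failure mode and the proposed workaround concrete, here is a minimal sketch in plain R. It does not call strip_rtf(): out, table_flg, row_start and row_end are toy stand-ins named after the variables in the error message, and their values are assumptions for illustration only. On an English-locale R session, the first step should report the same "NAs are not allowed in subscripted assignments" message.

```r
# Toy stand-ins for the variables named in the error message. The real
# values inside strip_rtf() are not shown in this thread (assumption).
out       <- c("cell 1", "plain text", "cell 2")
table_flg <- c(TRUE, NA, TRUE)   # an NA has crept into the logical index
row_start <- "<row>"             # hypothetical delimiter
row_end   <- "</row>"            # hypothetical delimiter

# Assigning a multi-element value through an index containing NA fails.
res <- tryCatch(
  {
    out[table_flg] <- paste(row_start, out[table_flg], row_end, sep = "")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(res)

# Possible workaround: treat NA flags as FALSE before indexing.
table_flg[is.na(table_flg)] <- FALSE
out[table_flg] <- paste(row_start, out[table_flg], row_end, sep = "")
print(out)
```

The trade-off is the one mentioned above: any element whose flag was NA is left unwrapped, so a row that really belonged to a table would lose its row delimiters.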


strazto commented Jan 9, 2020

I'll put in a PR for the fix when I'm home

Owner

kota7 commented Jan 9, 2020

@mstr3336 Thanks, I look forward to seeing that. Do you think you can share a test case too?


strazto commented Jan 9, 2020 via email

Owner

kota7 commented Jan 9, 2020

@mstr3336 I understand that. No problem.


strazto commented Jan 10, 2020

I was able to identify the character that caused the NA:

\'e6 2 NQ\'d9\'81\'84\'c8m4\'cd\'cd1!p\'82 

gives

[46] " 2 NQ"
[47] NA
[48] "m4"
[49] "췍"
                                                                                                                                                                  
[50] "1!p"
[51] "\u0082"

This suggests that the \'d9\'81\'84 section is what's giving us grief.

In the "parsed" matrix, we see:

parsed$intcode
[[47]]
[1] 55681 33992

If I perform the following:

naughty_pair <- c(55681, 33992)

naughty_char <- intToUtf8(naughty_pair)

I get the following:

naughty_char
[1] NA
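
A possible explanation, offered as an assumption rather than something confirmed in this thread: 55681 is 0xD981, which lies in the UTF-16 surrogate range (0xD800 to 0xDFFF). By default, base R's intToUtf8() treats surrogate code points as invalid, so the conversion yields NA. A minimal sketch that checks this:

```r
# Values taken from parsed$intcode[[47]] above
naughty_pair <- c(55681, 33992)

# Hex view of the two code points: "D981" "84C8"
hex <- sprintf("%X", as.integer(naughty_pair))

# Flag code points that fall inside the UTF-16 surrogate range
is_surrogate <- naughty_pair >= 0xD800 & naughty_pair <= 0xDFFF

# The first value is a lone surrogate, so the conversion returns NA
result <- intToUtf8(naughty_pair)

print(hex)
print(is_surrogate)
print(result)
```

If that reading is right, the parser has paired the bytes d9 and 81 into a single code point, which may itself be worth a look separately from the NA handling.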

This takes place here:

out <- lapply(parsed$intcode, intToUtf8) %>% unlist()

This is where (at least my) NA was introduced.

The question, then, is:

Do I handle the NA (e.g., convert it to "") here?

Or do I handle the NA in the following block?

striprtf/R/striprtf.R

Lines 172 to 178 in 649c245

# remove empty table sections
emp_tbl <- (nchar(out) == 0) & table_flg
out <- out[!emp_tbl]
table_flg <- table_flg[!emp_tbl]
# row start and end indicators added
out[table_flg] <- paste(row_start, out[table_flg], row_end, sep = "")
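
To illustrate how a single NA in out reaches the subscripted assignment, here is a small reproduction of the block above. The input vectors and the row_start / row_end strings are hypothetical stand-ins, not values taken from the package.

```r
# Hypothetical stand-ins for the state inside strip_rtf()
out <- c("cell 1", NA, "plain text", "cell 2")
table_flg <- c(TRUE, TRUE, FALSE, TRUE)
row_start <- "*| "   # assumed marker, for illustration only
row_end <- " |"      # assumed marker, for illustration only

# nchar(NA_character_) is NA, so emp_tbl picks up an NA
emp_tbl <- (nchar(out) == 0) & table_flg

# Indexing with a logical NA keeps the NA in both vectors
out <- out[!emp_tbl]
table_flg <- table_flg[!emp_tbl]

# Assigning several values through a subscript that contains NA is an error
res <- tryCatch({
  out[table_flg] <- paste(row_start, out[table_flg], row_end, sep = "")
  NULL
}, error = function(e) e)

print(table_flg)
if (inherits(res, "error")) print(conditionMessage(res))
```

Note that R only raises this error when the replacement has more than one element, which may be why documents with a single table cell do not trigger it.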

My suspicion is that the following line is intended to provide similar functionality:

striprtf/R/striprtf.R

Lines 173 to 174 in 649c245

emp_tbl <- (nchar(out) == 0) & table_flg
out <- out[!emp_tbl]

However, it's probably best to handle the NAs as soon as they are introduced, so they don't propagate into the logical vectors, since line 173 also introduces NAs into emp_tbl.
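
A minimal sketch of handling the NAs at the point of introduction, assuming that replacing them with "" is acceptable. The intcode list and the row markers are hypothetical, and this is not necessarily the exact change made in the linked pull request. The magrittr pipe is dropped here so the snippet runs in base R.

```r
# Hypothetical stand-in for parsed$intcode: "Hi", an invalid pair, and "!"
intcode <- list(c(72, 105), c(55681, 33992), 33)
table_flg <- c(TRUE, TRUE, FALSE)

out <- unlist(lapply(intcode, intToUtf8))

# Replace NAs immediately so they never reach the logical vectors
out[is.na(out)] <- ""

# The existing empty-table filter now removes the former NA entry
emp_tbl <- (nchar(out) == 0) & table_flg
out <- out[!emp_tbl]
table_flg <- table_flg[!emp_tbl]

# The subscript is NA-free, so the assignment succeeds
out[table_flg] <- paste("*| ", out[table_flg], " |", sep = "")
print(out)
```

The trade-off is that the unconvertible characters are silently dropped from the output; emitting a warning at the replacement step would be one way to make that visible.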

strazto added a commit to strazto/striprtf that referenced this issue Jan 10, 2020
Intended to solve kota7#15

The NAs introduced by this also started tracking into the logical vectors used to subscript assignments / filter empty elements.

Replacing them with "" mitigates this
@strazto strazto linked a pull request Jan 10, 2020 that will close this issue
@strazto

strazto commented Jan 10, 2020

Thanks for considering my PR, @kota7.

It's now working for the problem document.

@kota7
Owner

kota7 commented Jan 10, 2020

@mstr3336 Thanks for your analysis. If possible, can you share an RTF file that contains that problematic string?


3 participants