Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use cpp11 #405

Open
wants to merge 27 commits into
base: main
Choose a base branch
from
Open

Use cpp11 #405

wants to merge 27 commits into from

Conversation

mgirlich
Copy link
Contributor

@mgirlich mgirlich commented Sep 4, 2023

const xmlChar* uri = (xmlChar*) Rf_translateCharUTF8(STRING_ELT(x_sxp, i));
SET_STRING_ELT(out, i, Xml2String(xmlBuildURI(uri, base_uri)).asRString());
const xmlChar* uri = (xmlChar*) Rf_translateCharUTF8(x_sxp[i]);
out[i] = Xml2String(xmlBuildURI(uri, base_uri)).asRString();
Copy link
Contributor Author

@mgirlich mgirlich Sep 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the slow down due to cpp11 protection probably doesn't matter so much here (https://cpp11.r-lib.org/articles/FAQ.html#ok-but-i-really-want-to-call-cpp11unwind_protect-manually).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea if it isn't critical then I'd probably leave as is


for (int i = 0; i < n; ++i) {
const xmlChar* uri = (xmlChar*) Rf_translateCharUTF8(STRING_ELT(x_sxp, i));
SET_STRING_ELT(out, i, Xml2String(xmlBuildURI(uri, base_uri)).asRString());
const xmlChar* uri = (xmlChar*) Rf_translateCharUTF8(x_sxp[i]);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we can somehow leave away the Rf_translateCharUTF8() due to using cpp11.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think cpp11 attempts to convert to UTF8 in this case

It looks like if we went through an intermediate std::string it would translate, but what you currently have seems ok
https://github.com/r-lib/cpp11/blob/500f642b4ea132ec8c168fc70a28e81e9510ece3/inst/include/cpp11/r_string.hpp#L29

src/xml2_url.cpp Outdated
if (Rf_xlength(base_sxp) > 1) {
Rf_error("Base URL must be length 1");
const xmlChar* to_xml_chr(cpp11::strings x, const char* arg) {
if (x.size() > 1) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might make more sense to check this in R.

@@ -8,21 +10,7 @@
#include <cstring>

SEXP read_bin(SEXP con, size_t bytes = 64 * 1024);
SEXP write_bin(SEXP data, SEXP con);

inline SEXP R_GetConnection(SEXP con) { return con; }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed R_GetConnection() as it seemed unnecessary.


inline SEXP R_GetConnection(SEXP con) { return con; }

inline size_t R_ReadConnection(SEXP con, void* buf, size_t n) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R_ReadConnection() wasn't used anywhere.

src/connection.h Outdated

return Rf_xlength(res);
}
cpp11::sexp write_bin(cpp11::sexp data, cpp11::sexp con);

inline size_t R_WriteConnection(SEXP con, void* buf, size_t n) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses memcpy() and I wasn't sure how to do this properly with cpp11.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is an alternative to memcpy(), but you could allocate a cpp11 raws and then do payload.sexp() to get access to the SEXP you can call RAW() on. I don't think that is too awful, and at least lets cpp11 manage the memory

src/connection.h Outdated

return Rf_xlength(res);
}
cpp11::sexp write_bin(cpp11::sexp data, cpp11::sexp con);

inline size_t R_WriteConnection(SEXP con, void* buf, size_t n) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is an alternative to memcpy(), but you could allocate a cpp11 raws and then do payload.sexp() to get access to the SEXP you can call RAW() on. I don't think that is too awful, and at least lets cpp11 manage the memory

// [[export]]
extern "C" SEXP unique_ns(SEXP ns) {
[[cpp11::register]]
cpp11::sexp unique_ns(SEXP ns) {
BEGIN_CPP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in the other places below. Do we need this macro at all anymore? I assume if you go through cpp11 everywhere it will wrap it in try/catch for you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DavisVaughan If you could merge #400 then I could also finish xml2_node.cpp and I think then it should be safe to remove the macros.

src/xml2_output.cpp Outdated Show resolved Hide resolved
src/xml2_url.cpp Outdated Show resolved Hide resolved

for (int i = 0; i < n; ++i) {
const xmlChar* uri = (xmlChar*) Rf_translateCharUTF8(STRING_ELT(x_sxp, i));
SET_STRING_ELT(out, i, Xml2String(xmlBuildURI(uri, base_uri)).asRString());
const xmlChar* uri = (xmlChar*) Rf_translateCharUTF8(x_sxp[i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think cpp11 attempts to convert to UTF8 in this case

It looks like if we went through an intermediate std::string it would translate, but what you currently have seems ok
https://github.com/r-lib/cpp11/blob/500f642b4ea132ec8c168fc70a28e81e9510ece3/inst/include/cpp11/r_string.hpp#L29

const xmlChar* uri = (xmlChar*) Rf_translateCharUTF8(STRING_ELT(x_sxp, i));
SET_STRING_ELT(out, i, Xml2String(xmlBuildURI(uri, base_uri)).asRString());
const xmlChar* uri = (xmlChar*) Rf_translateCharUTF8(x_sxp[i]);
out[i] = Xml2String(xmlBuildURI(uri, base_uri)).asRString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea if it isn't critical then I'd probably leave as is

src/xml2_url.cpp Outdated Show resolved Hide resolved
Comment on lines -68 to +73
SET_STRING_ELT(scheme, i, Rf_mkChar(uri->scheme == NULL ? "" : uri->scheme));
SET_STRING_ELT(server, i, Rf_mkChar(uri->server == NULL ? "" : uri->server));
INTEGER(port)[i] = uri->port == 0 ? NA_INTEGER : uri->port;
SET_STRING_ELT(user, i, Rf_mkChar(uri->user == NULL ? "" : uri->user));
SET_STRING_ELT(path, i, Rf_mkChar(uri->path == NULL ? "" : uri->path));
SET_STRING_ELT(fragment, i, Rf_mkChar(uri->fragment == NULL ? "" : uri->fragment));
scheme[i] = uri->scheme == NULL ? "" : uri->scheme;
server[i] = uri->server == NULL ? "" : uri->server;
port[i] = uri->port == 0 ? NA_INTEGER : uri->port;
user[i] = uri->user == NULL ? "" : uri->user;
path[i] = uri->path == NULL ? "" : uri->path;
fragment[i] = uri->fragment == NULL ? "" : uri->fragment;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one might get significantly slower, but again it doesn't seem like it is used all that often? I couldn't really say

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think it shouldn't be used that often.

@mgirlich
Copy link
Contributor Author

mgirlich commented Nov 3, 2023

@DavisVaughan I now also migrated xml2_node.cpp to cpp11. But the performance hit from string handling really bites us now. It is nearly as slow as before replacing the S3 dispatch.

@DavisVaughan
Copy link
Member

Is it one or two functions in particular?

@mgirlich
Copy link
Contributor Author

mgirlich commented Nov 3, 2023

I haven't looked into this in more detail but only checked xml_name() (see the benchmark in #400). The easiest would probably be to check and carefully revert some of the changes and then wrap things with cpp11::unwind_protect().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants