
Commit

Deploying to gh-pages from @ 1ae810d 🚀
andrie committed Jul 28, 2024
1 parent fa3b664 commit eb5deef
Showing 9 changed files with 9 additions and 9 deletions.
2 changes: 1 addition & 1 deletion r-admin/site_libs/bootstrap/bootstrap.min.css

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion r-data/site_libs/bootstrap/bootstrap.min.css

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion r-exts/System-and-foreign-language-interfaces.html
@@ -1886,7 +1886,7 @@ <h2 class="section anchored" data-number="5.15" data-anchor-id="character-encodi
<p>can be used to detect whether the internal representation of a given <code>CHARSXP</code> accessed via <code>CHAR</code> is UTF-8 (including ASCII). This function is rarely needed and specifically is not needed with <code>translateCharUTF8</code>, because such check is already included. However, when needed, it is better to use it in preference of <code>getCharCE</code>, as it is safer against future changes in the semantics of encoding marks and covers strings internally represented in the native encoding. Note that <code>charIsUTF8()</code> is not equivalent to <code>getCharCE() == CE_UTF8</code>.</p>
<p>Similarly, function</p>
<div class="sourceCode" id="cb133"><pre class="sourceCode c code-with-copy"><code class="sourceCode c"><span id="cb133-1"><a href="#cb133-1" aria-hidden="true" tabindex="-1"></a>Rboolean charIsLatin1<span class="op">(</span>SEXP<span class="op">);</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>can be used to detect whether the internal representation of a given <code>CHARSXP</code> accessed via <code>CHAR</code> is latin1 (including ASCII). It is not equivalent to to <code>getCharCE() == CE_LATIN1</code>.</p>
<p>can be used to detect whether the internal representation of a given <code>CHARSXP</code> accessed via <code>CHAR</code> is latin1 (including ASCII). It is not equivalent to <code>getCharCE() == CE_LATIN1</code>.</p>
<p>Function</p>
<div class="sourceCode" id="cb134"><pre class="sourceCode c code-with-copy"><code class="sourceCode c"><span id="cb134-1"><a href="#cb134-1" aria-hidden="true" tabindex="-1"></a><span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>reEnc<span class="op">(</span><span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>x<span class="op">,</span> cetype_t ce_in<span class="op">,</span> cetype_t ce_out<span class="op">,</span></span>
<span id="cb134-2"><a href="#cb134-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">int</span> subst<span class="op">);</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
2 changes: 1 addition & 1 deletion r-exts/search.json
@@ -564,7 +564,7 @@
"href": "System-and-foreign-language-interfaces.html#character-encoding-issues",
"title": "5  System and foreign language interfaces",
"section": "5.15 Character encoding issues",
"text": "5.15 Character encoding issues\nCHARSXPs can be marked as coming from a known encoding (Latin-1 or UTF-8). This is mainly intended for human-readable output, and most packages can just treat such CHARSXPs as a whole. However, if they need to be interpreted as characters or output at C level then it would normally be correct to ensure that they are converted to the encoding of the current locale: this can be done by accessing the data in the CHARSXP by translateChar rather than by CHAR. If re-encoding is needed this allocates memory with R_alloc which thus persists to the end of the .Call/.External call unless vmaxset is used (see Transient storage allocation).\nThere is a similar function translateCharUTF8 which converts to UTF-8: this has the advantage that a faithful translation is almost always possible (whereas only a few languages can be represented in the encoding of the current locale unless that is UTF-8).\nBoth translateChar and translateCharUTF8 will translate any input, using escapes such as &lt;A9&gt; and &lt;U+0093&gt; to represent untranslatable parts of the input.\nThere is a public interface to the encoding marked on CHARSXPs via\ntypedef enum {CE_NATIVE, CE_UTF8, CE_LATIN1, CE_BYTES, CE_SYMBOL, CE_ANY} cetype_t;\ncetype_t getCharCE(SEXP);\nSEXP mkCharCE(const char *, cetype_t);\nOnly CE_UTF8 and CE_LATIN1 are marked on CHARSXPs (and so Rf_getCharCE will only return one of the first three), and these should only be used on non-ASCII strings. Value CE_BYTES is used to make CHARSXPs which should be regarded as a set of bytes and not translated. Value CE_SYMBOL is used internally to indicate Adobe Symbol encoding. Value CE_ANY is used to indicate a character string that will not need re-encoding – this is used for character strings known to be in ASCII, and can also be used as an input parameter where the intention is that the string is treated as a series of bytes. 
(See the comments under mkChar about the length of input allowed.)\nFunction\nRboolean charIsASCII(SEXP);\ncan be used to detect whether a given CHARSXP represents an ASCII string. The implementation is equivalent to checking individual characters, but may be faster.\nFunction\nRboolean charIsUTF8(SEXP);\ncan be used to detect whether the internal representation of a given CHARSXP accessed via CHAR is UTF-8 (including ASCII). This function is rarely needed and specifically is not needed with translateCharUTF8, because such check is already included. However, when needed, it is better to use it in preference of getCharCE, as it is safer against future changes in the semantics of encoding marks and covers strings internally represented in the native encoding. Note that charIsUTF8() is not equivalent to getCharCE() == CE_UTF8.\nSimilarly, function\nRboolean charIsLatin1(SEXP);\ncan be used to detect whether the internal representation of a given CHARSXP accessed via CHAR is latin1 (including ASCII). It is not equivalent to to getCharCE() == CE_LATIN1.\nFunction\nconst char *reEnc(const char *x, cetype_t ce_in, cetype_t ce_out,\n int subst);\ncan be used to re-encode character strings: like translateChar it returns a string allocated by R_alloc. This can translate from CE_SYMBOL to CE_UTF8, but not conversely. Argument subst controls what to do with untranslatable characters or invalid input: this is done byte-by-byte with 1 indicates to output hex of the form &lt;a0&gt;, and 2 to replace by ., with any other value causing the byte to produce no output.\nThere is also\nSEXP mkCharLenCE(const char *, int, cetype_t);\nto create marked character strings of a given length.",
"text": "5.15 Character encoding issues\nCHARSXPs can be marked as coming from a known encoding (Latin-1 or UTF-8). This is mainly intended for human-readable output, and most packages can just treat such CHARSXPs as a whole. However, if they need to be interpreted as characters or output at C level then it would normally be correct to ensure that they are converted to the encoding of the current locale: this can be done by accessing the data in the CHARSXP by translateChar rather than by CHAR. If re-encoding is needed this allocates memory with R_alloc which thus persists to the end of the .Call/.External call unless vmaxset is used (see Transient storage allocation).\nThere is a similar function translateCharUTF8 which converts to UTF-8: this has the advantage that a faithful translation is almost always possible (whereas only a few languages can be represented in the encoding of the current locale unless that is UTF-8).\nBoth translateChar and translateCharUTF8 will translate any input, using escapes such as &lt;A9&gt; and &lt;U+0093&gt; to represent untranslatable parts of the input.\nThere is a public interface to the encoding marked on CHARSXPs via\ntypedef enum {CE_NATIVE, CE_UTF8, CE_LATIN1, CE_BYTES, CE_SYMBOL, CE_ANY} cetype_t;\ncetype_t getCharCE(SEXP);\nSEXP mkCharCE(const char *, cetype_t);\nOnly CE_UTF8 and CE_LATIN1 are marked on CHARSXPs (and so Rf_getCharCE will only return one of the first three), and these should only be used on non-ASCII strings. Value CE_BYTES is used to make CHARSXPs which should be regarded as a set of bytes and not translated. Value CE_SYMBOL is used internally to indicate Adobe Symbol encoding. Value CE_ANY is used to indicate a character string that will not need re-encoding – this is used for character strings known to be in ASCII, and can also be used as an input parameter where the intention is that the string is treated as a series of bytes. 
(See the comments under mkChar about the length of input allowed.)\nFunction\nRboolean charIsASCII(SEXP);\ncan be used to detect whether a given CHARSXP represents an ASCII string. The implementation is equivalent to checking individual characters, but may be faster.\nFunction\nRboolean charIsUTF8(SEXP);\ncan be used to detect whether the internal representation of a given CHARSXP accessed via CHAR is UTF-8 (including ASCII). This function is rarely needed and specifically is not needed with translateCharUTF8, because such check is already included. However, when needed, it is better to use it in preference of getCharCE, as it is safer against future changes in the semantics of encoding marks and covers strings internally represented in the native encoding. Note that charIsUTF8() is not equivalent to getCharCE() == CE_UTF8.\nSimilarly, function\nRboolean charIsLatin1(SEXP);\ncan be used to detect whether the internal representation of a given CHARSXP accessed via CHAR is latin1 (including ASCII). It is not equivalent to getCharCE() == CE_LATIN1.\nFunction\nconst char *reEnc(const char *x, cetype_t ce_in, cetype_t ce_out,\n int subst);\ncan be used to re-encode character strings: like translateChar it returns a string allocated by R_alloc. This can translate from CE_SYMBOL to CE_UTF8, but not conversely. Argument subst controls what to do with untranslatable characters or invalid input: this is done byte-by-byte with 1 indicates to output hex of the form &lt;a0&gt;, and 2 to replace by ., with any other value causing the byte to produce no output.\nThere is also\nSEXP mkCharLenCE(const char *, int, cetype_t);\nto create marked character strings of a given length.",
"crumbs": [
"<span class='chapter-number'>5</span>  <span class='chapter-title'>System and foreign language interfaces</span>"
]
2 changes: 1 addition & 1 deletion r-exts/site_libs/bootstrap/bootstrap.min.css

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion r-intro/site_libs/bootstrap/bootstrap.min.css

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion r-ints/site_libs/bootstrap/bootstrap.min.css

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion r-lang/site_libs/bootstrap/bootstrap.min.css

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion site_libs/bootstrap/bootstrap.min.css

Large diffs are not rendered by default.

