Skip to content

Commit

Permalink
apacheGH-41402: [CI][R] Update our backwards compatibility CI any oth…
Browse files Browse the repository at this point in the history
…er R 4.4 cleanups (apache#41403)

### Rationale for this change

Keep up with the state of the world, ensure we are maintaining backwards compatibility.

Resolves apache#41402

### What changes are included in this PR?

* Bump to 4.4 as the release
* Remove old 3.6 jobs now that we no longer support that; clean up code where we hardcode things fro 3.6 and below
* Move many of our CI jobs to [rhub's new containers](https://github.com/r-hub/containers). We were accidentally running stale R devel (from December 2023) because the other rhub images stopped being updated. (One exception to be done as a follow on: apache#41416)
* Resolve a number of extended test failures

With this PR R extended tests should be all green with the exceptions of:

* Two sanitizer jobs (test-fedora-r-clang-sanitizer, test-ubuntu-r-sanitizer) — which are being investigated / fixed in apache#41421
* Valgrind — I'm running one last run with a new suppression file. 
* Binary jobs — these work but fail at upload, see apache#41403 (comment)
* Windows R Release — failing on main, apache#41398

### Are these changes tested?

By definition.

### Are there any user-facing changes?

No.

* GitHub Issue: apache#41402

Lead-authored-by: Jonathan Keane <[email protected]>
Co-authored-by: Jacob Wujciak-Jens <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>
  • Loading branch information
jonkeane and assignUser authored Apr 29, 2024
1 parent ff67979 commit 6eb0b37
Show file tree
Hide file tree
Showing 31 changed files with 139 additions and 164 deletions.
6 changes: 3 additions & 3 deletions .env
Original file line number Diff line number Diff line change
Expand Up @@ -71,12 +71,12 @@ NUMBA=latest
NUMPY=latest
PANDAS=latest
PYTHON=3.8
R=4.2
R=4.4
SPARK=master
TURBODBC=latest

# These correspond to images on Docker Hub that contain R, e.g. rhub/ubuntu-gcc-release:latest
R_IMAGE=ubuntu-gcc-release
# These correspond to images on Docker Hub that contain R, e.g. rhub/ubuntu-release:latest
R_IMAGE=ubuntu-release
R_ORG=rhub
R_TAG=latest

Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/r.yml
Original file line number Diff line number Diff line change
Expand Up @@ -121,7 +121,7 @@ jobs:
strategy:
fail-fast: false
matrix:
r: ["4.3"]
r: ["4.4"]
ubuntu: [20.04]
force-tests: ["true"]
env:
Expand Down Expand Up @@ -192,7 +192,7 @@ jobs:
fail-fast: false
matrix:
config:
- { org: "rhub", image: "debian-gcc-devel", tag: "latest", devtoolset: "" }
- { org: "rhub", image: "ubuntu-gcc12", tag: "latest", devtoolset: "" }
env:
R_ORG: ${{ matrix.config.org }}
R_IMAGE: ${{ matrix.config.image }}
Expand Down
2 changes: 1 addition & 1 deletion ci/docker/linux-apt-docs.dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@
ARG base
FROM ${base}

ARG r=4.2
ARG r=4.4
ARG jdk=8

# See R install instructions at https://cloud.r-project.org/bin/linux/ubuntu/
Expand Down
2 changes: 1 addition & 1 deletion ci/docker/linux-apt-lint.dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ RUN apt-get update && \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

ARG r=4.2
ARG r=4.4
RUN wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | \
tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc && \
# NOTE: Only R >= 4.0 is available in this repo
Expand Down
2 changes: 1 addition & 1 deletion ci/docker/linux-apt-r.dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ ENV LANG=C.UTF-8
# Build R
# [1] https://www.digitalocean.com/community/tutorials/how-to-install-r-on-ubuntu-18-04
# [2] https://linuxize.com/post/how-to-install-r-on-ubuntu-18-04/#installing-r-packages-from-cran
ARG r=3.6
ARG r=4.4
RUN apt-get update -y && \
apt-get install -y \
dirmngr \
Expand Down
20 changes: 19 additions & 1 deletion ci/etc/valgrind-cran.supp
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
# under the License.

{
# `testthat::skip()`s cause a valgrind error that does not show up on CRAN.
# `testthat::skip()`s cause a valgrind error that does not show up on CRAN.
<testthat_skip_error>
Memcheck:Cond
fun:gregexpr_Regexc
Expand All @@ -32,3 +32,21 @@
fun:getvar
fun:bcEval
}
{
# This also doesn't seem to cause issues on CRAN, so suppress it.
<Memcheck_possible_leak>
Memcheck:Leak
match-leak-kinds: possible
fun:malloc
fun:libdeflate_alloc_compressor
fun:do_memCompress
fun:bcEval_loop
fun:bcEval
fun:Rf_eval
fun:R_execClosure
fun:applyClosure_core
fun:Rf_applyClosure
fun:Rf_eval
fun:do_set
fun:Rf_eval
}
4 changes: 2 additions & 2 deletions ci/scripts/r_sanitize.sh
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ export UBSAN_OPTIONS="print_stacktrace=1,suppressions=/arrow/r/tools/ubsan.supp"

# run tests
pushd tests
${R_BIN} < testthat.R > testthat.out 2>&1 || { cat testthat.out; exit 1; }
${R_BIN} --no-save < testthat.R > testthat.out 2>&1 || { cat testthat.out; exit 1; }

cat testthat.out
if grep -q "runtime error" testthat.out; then
Expand All @@ -58,7 +58,7 @@ fi

# run examples
popd
${R_BIN} -e 'library(arrow); testthat::test_examples(".")' >> examples.out 2>&1 || { cat examples.out; exit 1; }
${R_BIN} --no-save -e 'library(arrow); testthat::test_examples(".")' >> examples.out 2>&1 || { cat examples.out; exit 1; }

cat examples.out
if grep -q "runtime error" examples.out; then
Expand Down
7 changes: 6 additions & 1 deletion ci/scripts/r_test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,12 @@ if [ "$ARROW_USE_PKG_CONFIG" != "false" ]; then
export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${LD_LIBRARY_PATH}
export R_LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
fi
export _R_CHECK_COMPILATION_FLAGS_KNOWN_=${ARROW_R_CXXFLAGS}

export _R_CHECK_COMPILATION_FLAGS_KNOWN_="${_R_CHECK_COMPILATION_FLAGS_KNOWN_} ${ARROW_R_CXXFLAGS}"
# These should generally be picked up, but are slightly wrong in rhub's containers it appears
# https://github.com/r-hub/containers/pull/63
export _R_CHECK_COMPILATION_FLAGS_KNOWN_="${_R_CHECK_COMPILATION_FLAGS_KNOWN_} -Wno-parentheses -Werror=format-security -Wp,-D_FORTIFY_SOURCE=3"

if [ "$ARROW_R_DEV" = "TRUE" ]; then
# These are sometimes used in the Arrow C++ build and are not a problem
export _R_CHECK_COMPILATION_FLAGS_KNOWN_="${_R_CHECK_COMPILATION_FLAGS_KNOWN_} -Wno-attributes -msse4.2 -Wno-noexcept-type -Wno-subobject-linkage"
Expand Down
2 changes: 1 addition & 1 deletion ci/scripts/r_valgrind.sh
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ ${R_BIN} CMD INSTALL ${INSTALL_ARGS} arrow*.tar.gz
pushd tests

# to generate suppression files run:
# ${R_BIN} --vanilla -d "valgrind --tool=memcheck --leak-check=full --track-origins=yes --gen-suppressions=all --log-file=memcheck.log" -f testthat.supp
# ${R_BIN} --vanilla -d "valgrind --tool=memcheck --leak-check=full --track-origins=yes --gen-suppressions=all --log-file=memcheck.log" -f testthat.R
${R_BIN} --vanilla -d "valgrind --tool=memcheck --leak-check=full --track-origins=yes --suppressions=/${1}/ci/etc/valgrind-cran.supp" -f testthat.R |& tee testthat.out

# valgrind --error-exitcode=1 should return an erroring exit code that we can catch,
Expand Down
2 changes: 2 additions & 0 deletions dev/tasks/r/github.linux.arrow.version.back.compat.yml
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,8 @@ jobs:
config:
# We use the R version that was released at the time of the arrow release in order
# to make sure we can download binaries from RSPM.
- { old_arrow_version: '14.0.2.1', r: '4.3' }
- { old_arrow_version: '13.0.0.1', r: '4.3' }
- { old_arrow_version: '12.0.1.1', r: '4.3' }
- { old_arrow_version: '11.0.0.3', r: '4.2' }
- { old_arrow_version: '10.0.1', r: '4.2' }
Expand Down
2 changes: 1 addition & 1 deletion dev/tasks/r/github.linux.offline.build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ jobs:
{{ macros.github_set_sccache_envvars()|indent(8)}}
run: |
cd arrow/r
R CMD INSTALL --install-tests --no-test-load --no-docs --no-help --no-byte-compile arrow_with_deps.tar.gz
R CMD INSTALL --install-tests --no-test-load --no-byte-compile arrow_with_deps.tar.gz
- name: Run the tests
run: R -e 'if(tools::testInstalledPackage("arrow") != 0L) stop("There was a test failure.")'
- name: Dump test logs
Expand Down
2 changes: 1 addition & 1 deletion dev/tasks/r/github.linux.versions.yml
Original file line number Diff line number Diff line change
Expand Up @@ -30,9 +30,9 @@ jobs:
r_version:
# We test devel, release, and oldrel in regular CI.
# This is for older versions
- "3.6"
- "4.0"
- "4.1"
- "4.2"
env:
R_ORG: "rstudio"
R_IMAGE: "r-base"
Expand Down
10 changes: 4 additions & 6 deletions dev/tasks/r/github.packages.yml
Original file line number Diff line number Diff line change
Expand Up @@ -227,7 +227,7 @@ jobs:
working-directory: 'arrow'
extra-packages: cpp11
- name: Set CRAN like openssl
if: contains(matrix.platform.runs_on, 'arm64')
if: contains(matrix.platform.name, 'arm64')
run: |
# The arm64 runners contain openssl 1.1.1t in this path that is always included first so we need to override the
# default setting of the brew --prefix as root dir to avoid version conflicts.
Expand Down Expand Up @@ -300,16 +300,14 @@ jobs:
# an OS that is not in the allowlist, so we have to opt-in to use the
# binary. Other env vars used in r_docker_configure.sh can be added
# here (like devtoolset) and wired up in the later steps.
- {image: "rhub/debian-clang-devel", libarrow_binary: "TRUE"}
- {image: "rhub/ubuntu-clang", libarrow_binary: "TRUE"}
# fedora-clang-devel cannot use binaries bc of libc++ (uncomment to see the error)
# - {image: "rhub/fedora-clang-devel", libarrow_binary: "TRUE"}
- {image: "rhub/ubuntu-gcc-release"} # currently ubuntu-20.04 (focal)
- {image: "rocker/r-ubuntu:22.04"} # openssl3
- {image: "rocker/r-ver"} # whatever is latest ubuntu LTS
- {image: "rhub/ubuntu-release"} # currently ubuntu-22.04
- {image: "rocker/r-ver:4.0.0"} # ubuntu-20.04
- {image: "rocker/r-ver:3.6.3", libarrow_binary: "TRUE"} # debian:buster (10)
- {image: "rstudio/r-base:4.1-focal"} # ubuntu-20.04
- {image: "rstudio/r-base:4.2-centos7", devtoolset: "8"}
- {image: "rstudio/r-base:4.3-noble"}
steps:
# Get the arrow checkout just for the docker config scripts
# Don't need submodules for this (hence false arg to macro): they fail on
Expand Down
12 changes: 6 additions & 6 deletions dev/tasks/tasks.yml
Original file line number Diff line number Diff line change
Expand Up @@ -888,12 +888,12 @@ tasks:
- r-lib__libarrow__bin__darwin-arm64-openssl-3.0__arrow-{no_rc_r_version}\.zip
- r-lib__libarrow__bin__darwin-x86_64-openssl-1.1__arrow-{no_rc_r_version}\.zip
- r-lib__libarrow__bin__darwin-x86_64-openssl-3.0__arrow-{no_rc_r_version}\.zip
- r-pkg__bin__windows__contrib__4.4__arrow_{no_rc_r_version}\.zip
- r-pkg__bin__windows__contrib__4.3__arrow_{no_rc_r_version}\.zip
- r-pkg__bin__windows__contrib__4.2__arrow_{no_rc_r_version}\.zip
- r-pkg__bin__macosx__big-sur-x86_64__contrib__4.4__arrow_{no_rc_r_version}\.tgz
- r-pkg__bin__macosx__big-sur-x86_64__contrib__4.3__arrow_{no_rc_r_version}\.tgz
- r-pkg__bin__macosx__contrib__4.2__arrow_{no_rc_r_version}\.tgz
- r-pkg__bin__macosx__big-sur-arm64__contrib__4.4__arrow_{no_rc_r_version}\.tgz
- r-pkg__bin__macosx__big-sur-arm64__contrib__4.3__arrow_{no_rc_r_version}\.tgz
- r-pkg__bin__macosx__big-sur-arm64__contrib__4.2__arrow_{no_rc_r_version}\.tgz
- r-pkg__src__contrib__arrow_{no_rc_r_version}\.tar\.gz


Expand Down Expand Up @@ -1356,7 +1356,7 @@ tasks:
r_tag: latest
r_custom_ccache: true

{% for r_org, r_image, r_tag in [("rhub", "ubuntu-gcc-release", "latest"),
{% for r_org, r_image, r_tag in [("rhub", "ubuntu-release", "latest"),
("rocker", "r-ver", "latest"),
("rstudio", "r-base", "4.2-focal"),
("rstudio", "r-base", "4.1-opensuse153")] %}
Expand All @@ -1377,9 +1377,9 @@ tasks:
template: r/azure.linux.yml
params:
r_org: rhub
r_image: debian-gcc-devel-lto
r_image: gcc13
r_tag: latest
flags: '-e NOT_CRAN=false -e INSTALL_ARGS=--use-LTO'
flags: '-e INSTALL_ARGS=--use-LTO'

# This one has -flto=auto
test-r-ubuntu-22.04:
Expand Down
5 changes: 3 additions & 2 deletions docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -1459,8 +1459,8 @@ services:
# (including building the C++ library) on any Docker image that contains R
#
# Usage:
# R_ORG=rhub R_IMAGE=ubuntu-gcc-release R_TAG=latest docker-compose build r
# R_ORG=rhub R_IMAGE=ubuntu-gcc-release R_TAG=latest docker-compose run r
# R_ORG=rhub R_IMAGE=ubuntu-release R_TAG=latest docker-compose build r
# R_ORG=rhub R_IMAGE=ubuntu-release R_TAG=latest docker-compose run r
image: ${REPO}:r-${R_ORG}-${R_IMAGE}-${R_TAG}
build:
context: .
Expand Down Expand Up @@ -1523,6 +1523,7 @@ services:
cache_from:
- ${REPO}:r-rhub-fedora-clang-devel-latest
args:
# TODO: change this to rhub/clang-asan
base: rhub/fedora-clang-devel-san
r_dev: ${ARROW_R_DEV}
devtoolset_version: ${DEVTOOLSET_VERSION}
Expand Down
2 changes: 1 addition & 1 deletion r/DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ Description: 'Apache' 'Arrow' <https://arrow.apache.org/> is a cross-language
language-independent columnar memory format for flat and hierarchical data,
organized for efficient analytic operations on modern hardware. This
package provides an interface to the 'Arrow C++' library.
Depends: R (>= 3.4)
Depends: R (>= 4.0)
License: Apache License (>= 2.0)
URL: https://github.com/apache/arrow/, https://arrow.apache.org/docs/r/
BugReports: https://github.com/apache/arrow/issues
Expand Down
2 changes: 1 addition & 1 deletion r/R/dplyr-funcs-type.R
Original file line number Diff line number Diff line change
Expand Up @@ -140,7 +140,7 @@ register_bindings_type_cast <- function() {
fix.empty.names = TRUE,
stringsAsFactors = FALSE) {
# we need a specific value of stringsAsFactors because the default was
# TRUE in R <= 3.6
# TRUE in R <= 3.6 and folks might still be cargoculting to stay in the past.
if (!identical(stringsAsFactors, FALSE)) {
arrow_not_supported("stringsAsFactors = TRUE")
}
Expand Down
14 changes: 0 additions & 14 deletions r/R/util.R
Original file line number Diff line number Diff line change
Expand Up @@ -15,20 +15,6 @@
# specific language governing permissions and limitations
# under the License.

# for compatibility with R versions earlier than 4.0.0
if (!exists("deparse1")) {
deparse1 <- function(expr, collapse = " ", width.cutoff = 500L, ...) {
paste(deparse(expr, width.cutoff, ...), collapse = collapse)
}
}

# for compatibility with R versions earlier than 3.6.0
if (!exists("str2lang")) {
str2lang <- function(s) {
parse(text = s, keep.source = FALSE)[[1]]
}
}

oxford_paste <- function(x,
conjunction = "and",
quote = TRUE,
Expand Down
5 changes: 0 additions & 5 deletions r/tests/testthat/test-Array.R
Original file line number Diff line number Diff line change
Expand Up @@ -818,11 +818,6 @@ test_that("Handling string data with embedded nuls", {
)
array_with_nul <- arrow_array(raws)$cast(utf8())

# The behavior of the warnings/errors is slightly different with and without
# altrep. Without it (i.e. 3.5.0 and below, the error would trigger immediately
# on `as.vector()` where as with it, the error only happens on materialization)
skip_on_r_older_than("3.6")

# no error on conversion, because altrep laziness
v <- expect_error(as.vector(array_with_nul), NA)

Expand Down
16 changes: 4 additions & 12 deletions r/tests/testthat/test-RecordBatch.R
Original file line number Diff line number Diff line change
Expand Up @@ -595,14 +595,10 @@ test_that("RecordBatch supports cbind", {
)

# Rejects Table and ChunkedArray arguments
if (getRversion() >= "4.0.0") {
# R 3.6 cbind dispatch rules cause cbind to fall back to default impl if
# there are multiple arguments with distinct cbind implementations
expect_error(
cbind(record_batch(a = 1:2), arrow_table(b = 3:4)),
regexp = "Cannot cbind a RecordBatch with Tables or ChunkedArrays"
)
}
expect_error(
cbind(record_batch(a = 1:2), arrow_table(b = 3:4)),
regexp = "Cannot cbind a RecordBatch with Tables or ChunkedArrays"
)
expect_error(
cbind(record_batch(a = 1:2), b = chunked_array(1, 2)),
regexp = "Cannot cbind a RecordBatch with Tables or ChunkedArrays"
Expand All @@ -622,10 +618,6 @@ test_that("Handling string data with embedded nuls", {
batch_with_nul <- record_batch(a = 1:5, b = raws)
batch_with_nul$b <- batch_with_nul$b$cast(utf8())

# The behavior of the warnings/errors is slightly different with and without
# altrep. Without it (i.e. 3.5.0 and below, the error would trigger immediately
# on `as.vector()` where as with it, the error only happens on materialization)
skip_on_r_older_than("3.6")
df <- as.data.frame(batch_with_nul)

expect_error(
Expand Down
4 changes: 0 additions & 4 deletions r/tests/testthat/test-Table.R
Original file line number Diff line number Diff line change
Expand Up @@ -581,10 +581,6 @@ test_that("Table supports cbind", {
})

test_that("cbind.Table handles record batches and tables", {
# R 3.6 cbind dispatch rules cause cbind to fall back to default impl if
# there are multiple arguments with distinct cbind implementations
skip_if(getRversion() < "4.0.0", "R 3.6 cbind dispatch rules prevent this behavior")

expect_equal(
cbind(arrow_table(a = 1L:2L), record_batch(b = 4:5)),
arrow_table(a = 1L:2L, b = 4:5)
Expand Down
7 changes: 5 additions & 2 deletions r/tests/testthat/test-altrep.R
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,6 @@
# specific language governing permissions and limitations
# under the License.

skip_on_r_older_than("3.6")

test_that("altrep test functions do not include base altrep", {
expect_false(is_arrow_altrep(1:10))
expect_identical(test_arrow_altrep_is_materialized(1:10), NA)
Expand Down Expand Up @@ -373,6 +371,11 @@ test_that("altrep min/max/sum identical to R versions for double", {
expect_altrep_roundtrip(x, max)
expect_altrep_roundtrip(x, sum)

# On valgrind the NA_real_ is sometimes transformed to NaN
# https://stat.ethz.ch/pipermail/r-devel/2021-April/080683.html
# so we skip these there to avoid complicated NA == NaN logic,
# and they are tested on a number of other platforms / conditions
skip_on_linux_devel()
x <- c(1, 2, NA_real_)
expect_altrep_roundtrip(x, min, na.rm = TRUE)
expect_altrep_roundtrip(x, max, na.rm = TRUE)
Expand Down
5 changes: 0 additions & 5 deletions r/tests/testthat/test-chunked-array.R
Original file line number Diff line number Diff line change
Expand Up @@ -475,11 +475,6 @@ test_that("Handling string data with embedded nuls", {
)
chunked_array_with_nul <- ChunkedArray$create(raws)$cast(utf8())

# The behavior of the warnings/errors is slightly different with and without
# altrep. Without it (i.e. 3.5.0 and below, the error would trigger immediately
# on `as.vector()` where as with it, the error only happens on materialization)
skip_on_r_older_than("3.6")

v <- expect_error(as.vector(chunked_array_with_nul), NA)

expect_error(
Expand Down
10 changes: 0 additions & 10 deletions r/tests/testthat/test-dplyr-collapse.R
Original file line number Diff line number Diff line change
Expand Up @@ -185,16 +185,6 @@ See $.data for the source Arrow object",
fixed = TRUE
)

skip_if(getRversion() < "3.6.0", "TODO investigate why these aren't equal")
# On older R versions:
# ── Failure (test-dplyr-collapse.R:172:3): Properties of collapsed query ────────
# head(q, 1) %>% collect() not equal to tibble::tibble(lgl = FALSE, total = 8L, extra = 40).
# Component "total": Mean relative difference: 0.3846154
# Component "extra": Mean relative difference: 0.3846154
# ── Failure (test-dplyr-collapse.R:176:3): Properties of collapsed query ────────
# tail(q, 1) %>% collect() not equal to tibble::tibble(lgl = NA, total = 25L, extra = 125).
# Component "total": Mean relative difference: 0.9230769
# Component "extra": Mean relative difference: 0.9230769
expect_equal(
q %>%
arrange(lgl) %>%
Expand Down
Loading

0 comments on commit 6eb0b37

Please sign in to comment.