- Updated Rust Polars to 0.44.2 (#1271).
- Minimum supported Rust version (MSRV) is now 1.82.0.
- $reshape()'s nested_type argument is removed. $approx_n_unique() no longer works on the Categorical type. <Series>$compare() is removed (#1272).
- Passing a single data.frame to pl$DataFrame() or pl$LazyFrame() to convert it to a polars DataFrame or LazyFrame is deprecated and now shows a warning. Use as_polars_df() or as_polars_lf() instead (#1275).
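Here is a minimal sketch of the migration for the deprecation above. It assumes polars 0.21.0 is installed; mtcars is used only because it is a convenient built-in data.frame.

```r
library(polars)

# Deprecated in 0.21.0 (shows a warning): pl$DataFrame(mtcars)
df <- as_polars_df(mtcars)

# The lazy counterpart; nothing is computed until $collect() is called
lf <- as_polars_lf(mtcars)
collected <- lf$collect()
```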
- Maintain level order when converting Enums to factors (#1252, @andyquinterom).
- Updated Rust Polars to 0.43.1 (#1230).
- In pl$scan_ipc() and pl$read_ipc(), the argument memory_map is removed (#1230).
- In $serialize(), in the field schema, the field inner is renamed fields, and the fields output_schema and filter are removed (#1230).
- New method $cast() for DataFrame and LazyFrame (#1219).
- New argument strict in $drop() to determine whether unknown column names should trigger an error (#1220).
- New method $to_dummies() for DataFrame (#1225).
- New argument include_file_paths in pl$scan_csv() and pl$read_csv() (#1235).
- New method $join_where() for DataFrame and LazyFrame to perform inequality joins (#1237).
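This sketch shows the new strict argument of $drop(). It assumes polars 0.20.0 or later, and it assumes the default is strict = TRUE, mirroring Python Polars. The name not_a_column is a placeholder chosen to be absent from the frame.

```r
library(polars)

df <- pl$DataFrame(a = 1, b = 2)

# strict = FALSE silently skips names that are not in the frame
trimmed <- df$drop("a", "not_a_column", strict = FALSE)

# With the (assumed) default strict = TRUE, an unknown name should raise an error
err <- tryCatch(df$drop("not_a_column"), error = function(e) e)
```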
- Converting data of datatype Null to R no longer errors. It now creates a column filled with NA (#1217).
- This is a maintenance release. No user facing changes.
- Updated Rust Polars to unreleased 2024-08-20, after 0.42.0 (#1183).
- $describe_plan() and $describe_optimized_plan() are removed. Use $explain(optimized = FALSE) and $explain() respectively instead (#1182).
- The parameter inherit_optimization is removed from all functions that had it (#1183).
- In $write_parquet() and $sink_parquet(), the parameter data_pagesize_limit is renamed data_page_size (#1183).
- The LazyFrame method $get_optimization_toggle() is removed, and $set_optimization_toggle() is renamed $optimization_toggle() (#1183).
- In $unpivot(), the parameter streamable is removed (#1183).
- Some functions have a parameter future that determines the compatibility level when exporting Polars' internal data structures. This parameter is renamed compat_level, which takes FALSE for the oldest flavor (more compatible) and TRUE for the newest one (less compatible). It can also take an integer determining a specific compatibility level when more are added in the future. For now, future = FALSE can be replaced by compat_level = FALSE (#1183).
- In $scan_parquet() and $read_parquet(), the default value of hive_partitioning is now NULL (#1189).
- In $dt$epoch(), the argument tu is renamed to time_unit (#1196).
- In $fill_nan() for DataFrame, LazyFrame and Expr, the argument is renamed value (#1198).
- $shift_and_fill() is removed and replaced by a new argument fill_value in $shift(). $shift_and_fill(fill_value, periods) can be replaced by $shift(n, fill_value) (#1201).
- In $shift() for various Expr, the argument periods is renamed n (#1201).
- In $clip(), arguments min and max are renamed lower_bound and upper_bound (#1203).
- $clip_min() and $clip_max() are removed. Use $clip() with only lower_bound or upper_bound instead (#1203).
- In $write_csv() and $sink_csv(), the argument quote is renamed quote_char (#1206).
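This sketch rewrites the removed $shift_and_fill(), $clip_min() and renamed $clip() arguments in their new forms. It assumes polars 0.19.0 or later. Integer literals (0L, 2L, 3L) are used so the Int32 column keeps its type.

```r
library(polars)

df <- pl$DataFrame(a = 1:4)

out <- df$select(
  # was: pl$col("a")$shift_and_fill(0L, 1)
  shifted = pl$col("a")$shift(1, fill_value = 0L),
  # was: pl$col("a")$clip(min = 2L, max = 3L)
  clipped = pl$col("a")$clip(lower_bound = 2L, upper_bound = 3L),
  # was: pl$col("a")$clip_min(2L)
  floored = pl$col("a")$clip(lower_bound = 2L)
)
res <- as.data.frame(out)
```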
- New method $str$extract_many() (#1163).
- Converting a nanoarrow_array with zero rows to an RPolarsDataFrame via as_polars_df() now keeps the original schema (#1177).
- $write_parquet() has two new arguments partition_by and partition_chunk_size_bytes to write a DataFrame to a hive-partitioned directory (#1183).
- New method $bin$size() (#1183).
- In $scan_parquet() and $read_parquet(), the parallel argument can take the new value "prefiltered" (#1183).
- $scan_parquet(), $scan_ipc() and $read_parquet() have a new argument include_file_paths to automatically add a column containing the path to the source file(s) (#1183).
- $scan_ipc() can read a hive-partitioned directory with its new arguments hive_partitioning, hive_schema, and try_parse_hive_dates (#1183).
- $scan_parquet() and $read_parquet() gain two new arguments for more control on importing hive partitions: hive_schema and try_parse_hive_dates (#1189).
- New method $gather_every() for LazyFrame and DataFrame (#1199).
- $glimpse() for DataFrame has two new arguments max_items_per_column and max_colname_length (#1200).
- New method $list$sample() (#1204).
- New argument coalesce in $join_asof() (#1205).
- New argument maintain_order in $list$unique() (#1207).
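This is a quick sketch of the new $gather_every() method, assuming polars 0.19.0 or later. It keeps every n-th row, starting from the first one.

```r
library(polars)

df <- pl$DataFrame(x = 1:5)

# Keep rows 1, 3 and 5
every_other <- df$gather_every(2)
res <- as.data.frame(every_other)
```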
- In $unnest() for DataFrame and LazyFrame, the names argument is removed and replaced by the dots argument (...). This doesn't change the previous behavior, e.g. df$unnest(names = c("a", "b")) still works (#1170).
- Updated Rust Polars to 0.41.3 (#1147, #1156).
- In $n_chunks(), the default value of strategy is now "first" (#1137).
- $sample() for Expr and DataFrame (#1136):
  - the argument frac is renamed fraction;
  - all the arguments except n must be named;
  - for the Expr method only, the first argument is now n (this was already the case for the DataFrame method);
  - for the Expr method only, the default value for with_replacement is now FALSE (this was already the case for the DataFrame method).
- $melt() had several changes (#1147):
  - $melt() is renamed $unpivot().
  - Some arguments were renamed: id_vars is now index, value_vars is now on.
  - The order of arguments has changed: on is now first, then index. The order of the other arguments hasn't changed. Note that on can be unnamed but all the other arguments must be named.
- $pivot() had several changes (#1147):
  - The argument columns is renamed on.
  - The order of arguments has changed: on is now first, then index and values. The order of the other arguments hasn't changed. Note that on can be unnamed but all the other arguments must be named.
- In $write_parquet() and $sink_parquet(), the default value of argument statistics is now TRUE and it can take values other than TRUE/FALSE (#1147).
- In $dt$truncate() and $dt$round(), the argument offset has been removed. Use $dt$offset_by() after those functions instead (#1147).
- In $top_k() and $bottom_k() for Expr, the arguments nulls_last, maintain_order and multithreaded have been removed. If any null values are in the top/bottom k values, they will always be positioned last (#1147).
- $replace() has been split into two functions depending on the desired behaviour (#1147):
  - $replace() recodes some values in the column, leaving all other values unchanged. Compared to the previous version, it no longer uses the arguments default and return_dtype.
  - $replace_strict() replaces all values by different values. If a value doesn't have a specific mapping, it is replaced by the default value.
- $str$concat() is deprecated; use $str$join() (with the same arguments) instead (#1147).
- In pl$date_range() and pl$date_ranges(), the arguments time_unit and time_zone have been removed. They were deprecated in previous versions (#1147).
- In $join(), when how = "cross", on, left_on and right_on must be NULL (#1147).
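This sketch shows the renamed reshaping methods and the two recoding methods. It assumes polars 0.18.0 or later. The old calls in the comments only restate the renamed arguments listed above. Numeric codes are used for $replace() so that no value can be mistaken for a column name.

```r
library(polars)

wide <- pl$DataFrame(id = c("x", "y"), a = 1:2, b = 3:4)

# was: wide$melt(id_vars = "id", value_vars = c("a", "b"))
long <- wide$unpivot(c("a", "b"), index = "id")

# was: long$pivot(columns = "variable", index = "id", values = "value")
back <- long$pivot("variable", index = "id", values = "value")

codes <- pl$DataFrame(x = c(1, 2, 3))
recoded <- codes$select(
  # only 1 is recoded; 2 and 3 are kept unchanged
  partial = pl$col("x")$replace(1, 10),
  # every value is mapped; values without a mapping get `default`
  strict = pl$col("x")$replace_strict(c(1, 2), c(10, 20), default = 0)
)
res <- as.data.frame(recoded)
```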
- New method $has_nulls() (#1133).
- New method $list$explode() (#1139).
- $over() gains a new argument order_by to specify the order of values within each group. This is useful when the operation depends on the order of values, such as $shift() (#1147).
- $value_counts() gains an argument normalize to give relative frequencies of unique values instead of their count (#1147).
- Updated Rust Polars to unreleased version (> 0.40.0) (#1104, #1110, #1117, #1124):
  - In $join(), there is a new argument coalesce, and the how options now accept "full" instead of "outer" and "outer_coalesce".
  - $top_k() and $bottom_k() gain three arguments nulls_last, maintain_order and multithreaded.
  - All $rolling_*() functions lose the arguments by, closed and warn_if_unsorted. Rolling computations based on by must be made via the corresponding rolling_*_by(), e.g. rolling_mean_by() instead of rolling_mean(by =) (#1115).
  - pl$scan_parquet() and pl$read_parquet() gain an argument glob which defaults to TRUE. Set it to FALSE to avoid treating * as a globbing pattern.
  - $is_not_nan() on a null value (NA in R) now returns null. Previously, it returned TRUE.
  - In $reshape(), argument dims is renamed dimensions and there is a new argument nested_type specifying if the output should be of type List or Array.
  - In $value_counts(), all arguments must be named and there is a new argument name to specify the name of the output.
  - In all functions accepting optimization parameters (such as projection_pushdown), there is a new parameter cluster_with_columns to combine sequential independent calls to $with_columns().
  - $str$explode() is removed.
  - The check_sorted argument is removed from $rolling() and $group_by_dynamic(). Sortedness is now verified quickly, so this argument is no longer needed (pola-rs/polars#16494).
  - $name$map() hangs on Linux, so this method is deprecated and its documentation is removed. Please use other methods like <LazyFrame>$rename(<function>) instead (#1123).
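This sketch covers two of the changes above, the "full" join option with coalesce and the move from rolling_mean(by =) to rolling_mean_by(). It assumes polars 0.17.0 or later. The window is assumed to follow the Python Polars default closed = "right", covering (t - 2d, t].

```r
library(polars)

left <- pl$DataFrame(key = c(1, 2), x = c("a", "b"))
right <- pl$DataFrame(key = c(2, 3), y = c("c", "d"))

# was: how = "outer_coalesce"; the two key columns are merged into one
full_coalesced <- left$join(right, on = "key", how = "full", coalesce = TRUE)
# was: how = "outer"; the right key is kept as a separate column
full_separate <- left$join(right, on = "key", how = "full", coalesce = FALSE)

ts <- pl$DataFrame(
  date = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03")),
  value = c(1, 2, 3)
)
# was: pl$col("value")$rolling_mean(window_size = "2d", by = "date")
rolled <- ts$with_columns(
  roll = pl$col("value")$rolling_mean_by("date", window_size = "2d")
)
res <- as.data.frame(rolled)
```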
- As warned in v0.16.0, the order of arguments in pl$Series() is changed (#1071). The first argument is now name, and the second argument is values.
- $to_struct() on an Expr is removed. This method is now only available for Series, DataFrame, and in the $list and $arr subnamespaces. For example, pl$col("a", "b", "c")$to_struct() should be replaced with pl$struct(c("a", "b", "c")) (#1092).
- pl$Struct() now only accepts named inputs and objects of class RPolarsField. For example, pl$Struct(pl$Boolean) doesn't work anymore and should be named like pl$Struct(a = pl$Boolean) (#1053).
- In $all() and $any(), the argument drop_nulls is renamed ignore_nulls, and this argument must be named (#1050).
- New method $struct$with_fields() (#1109) and new function pl$field() to be used in expressions in $struct$with_fields() (#1113).
- New methods for RPolarsDataType: $is_enum(), $is_categorical(), $is_known(), $is_string(), $contains_views(), $contains_categorical() (#1112).
- In $dt$combine(), the arguments tm and tu are renamed time and time_unit (#1116).
- The default value of the rechunk argument of pl$concat() is changed from TRUE to FALSE (#1125).
- In $rename() for LazyFrame and DataFrame, key-value pairs of names are changed to old_name = "new_name" instead of new_name = "old_name" (#1129).
- In $rename() for LazyFrame and DataFrame, calling it without any argument is no longer allowed (#1129).
- In all $rolling_*() functions, the arguments center and ddof must be named (#1115).
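This sketch shows the new argument order of pl$Series() and the reversed key-value convention of $rename(). It assumes polars 0.17.0 or later.

```r
library(polars)

# name first, then values
s <- pl$Series("foo", 1:3)

df <- pl$DataFrame(a = 1, b = 2)
# old_name = "new_name": column "a" becomes "alpha"
renamed <- df$rename(a = "alpha")
```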
- Allow specifying a function in $rename() for LazyFrame and DataFrame. This is equivalent to polars.LazyFrame.rename(mapping: Callable[[str], str]) or polars.DataFrame.rename(mapping: Callable[[str], str]) in Python Polars (#1122, #1129).
- pl$read_ipc() can read a raw vector of Apache Arrow IPC file (#1072).
- New method <DataFrame>$to_raw_ipc() to serialize a DataFrame to a raw vector of Apache Arrow IPC file format (#1072).
- New method <LazyFrame>$serialize() to serialize a LazyFrame to a character vector of JSON representation (#1073).
- New function pl$deserialize_lf() to deserialize a LazyFrame from a character vector of JSON representation (#1073).
- New methods $str$head() and $str$tail() (#1074).
- New S3 methods nanoarrow::as_nanoarrow_array_stream() and nanoarrow::infer_nanoarrow_schema() for RPolarsSeries (#1076).
- New method $dt$is_leap_year() (#1077).
- as_polars_df() and as_polars_series() support arrow::RecordBatchReader (#1078).
- New argument experimental for as_polars_df(<ArrowTabular>), as_polars_df(<RecordBatchReader>), as_polars_series(<nanoarrow_array_stream>), and as_polars_df(<nanoarrow_array_stream>) (#1078). If experimental = TRUE, these functions switch to using the Arrow C stream interface internally. At this point, performance is degraded in the expected use cases, so the default is experimental = FALSE.
- New method <SQLContext>$register_globals() (#1064).
- New experimental method $sql() for DataFrame and LazyFrame (#1065).
- The API documentation website has moved to a new location (#1067, #1068). The old website redirects to the top page of the new website.
  - Old URL: https://rpolars.github.io/
  - New URL: https://pola-rs.github.io/r-polars/
- New methods $cut() and $qcut() to bin continuous values into discrete categories (#1057).
- pl$scan_parquet() and pl$read_parquet() can read data from the internet by specifying a URL as the first argument (#1056, @andyquinterom).
- pl$scan_parquet() and pl$read_parquet() gain an argument storage_options to scan/read data via cloud storage providers (GCP, AWS, Azure). Note that this support is experimental (#1056, @andyquinterom).
- Add support for the Enum datatype via pl$Enum() (#1061).
- In some read/scan functions, downloading files could fail if the URL was too long. This is now fixed (#1049, @DyfanJones).
This is a small hot-fix release to update the dependent Rust Polars to 0.39.1 (#1042). It also includes a few other updates.
- $len() now correctly includes null values in the count (#1044).
- $arr$max() and $arr$min() work without the nightly feature (#1042).
- Rust Polars is updated to 0.39.0 (#937, #1034).
- R objects inside an R list are now converted to Polars data types via as_polars_series() (#1021, #1022, #1023). For example, up to polars 0.15.1, a list containing a data.frame with a column of {clock} naive-time class was converted to a nested List type of Float64:
  ```r
  data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))

  pl$select(
    nested_data = pl$lit(list(data))
  )
  #> shape: (1, 1)
  #> ┌──────────────────────────┐
  #> │ nested_data              │
  #> │ ---                      │
  #> │ list[list[list[f64]]]    │
  #> ╞══════════════════════════╡
  #> │ [[[2.1475e9], [7305.0]]] │
  #> └──────────────────────────┘
  ```
  From 0.16.0, nested types are correctly converted, so the result is a List type of Struct type containing a Datetime type:
  ```r
  data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))

  pl$select(
    nested_data = pl$lit(list(data))
  )
  #> shape: (1, 1)
  #> ┌─────────────────────────┐
  #> │ nested_data             │
  #> │ ---                     │
  #> │ list[struct[1]]         │
  #> ╞═════════════════════════╡
  #> │ [{1990-01-01 00:00:00}] │
  #> └─────────────────────────┘
  ```
- Several functions have been rewritten to match the behavior of Python Polars. There are four types of changes: a) change in argument names, b) change in the way arguments are passed (named or by position), c) arguments are removed, and d) change in the default and accepted values. Those are addressed separately below.
- Change in argument names:
  - In $reshape(), the dims argument is renamed to dimensions (#1019).
  - In pl$read_* and pl$scan_* functions, the first argument is now source (#935).
  - In pl$Series(), the argument x is renamed values (#933).
  - In <DataFrame>$write_* functions, the first argument is now file (#935).
  - In <LazyFrame>$sink_* functions, the first argument is now path (#935).
  - In <LazyFrame>$sink_ipc(), the argument memmap is renamed to memory_map (#1032).
  - In <DataFrame>$rolling(), <LazyFrame>$rolling(), <DataFrame>$group_by_dynamic() and <LazyFrame>$group_by_dynamic(), the by argument is renamed to group_by (#983).
  - In $dt$convert_time_zone() and $dt$replace_time_zone(), the tz argument is renamed to time_zone (#944).
  - In $str$strptime(), the argument datatype is renamed to dtype (#939).
  - In $str$to_integer() (renamed from $str$parse_int()), argument radix is renamed to base (#1038).
- Change in the way arguments are passed:
  - In all input/output functions, all arguments except the first argument must be named arguments (#935).
  - In <DataFrame>$rolling() and <DataFrame>$group_by_dynamic(), all arguments except index_column must be named arguments (#983).
  - In $unique() for DataFrame and LazyFrame, arguments keep and maintain_order must be named (#953).
  - In $bin$decode(), the strict argument must be a named argument (#980).
  - In $dt$replace_time_zone(), all arguments except time_zone must be named arguments (#944).
  - In $str$contains(), the arguments literal and strict must be named (#982).
  - In $str$contains_any(), the ascii_case_insensitive argument must be named (#986).
  - In $str$count_matches(), $str$replace() and $str$replace_all(), the literal argument must be named (#987).
  - In $str$strptime(), $str$to_date(), $str$to_datetime(), and $str$to_time(), all arguments (except the first one) must be named (#939).
  - In $str$to_integer() (renamed from $str$parse_int()), all arguments must be named (#1038).
  - In pl$date_range(), the arguments closed, time_unit, and time_zone must be named (#950).
  - In $set_sorted() and $sort_by(), argument descending must be named (#1034).
  - In pl$Series(), using positional arguments throws a warning, since the argument positions will be changed in the future (#966).
    ```r
    # polars 0.15.1 or earlier
    # The first argument is `x`, the second argument is `name`.
    pl$Series(1:3, "foo")

    # The code above will warn in 0.16.0
    # Use named arguments to silence the warning.
    pl$Series(values = 1:3, name = "foo")
    pl$Series(name = "foo", values = 1:3)

    # polars 0.17.0 or later (future version)
    # The first argument is `name`, the second argument is `values`.
    pl$Series("foo", 1:3)
    ```
    This warning can also be silenced by replacing pl$Series(<values>, <name>) with as_polars_series(<values>, <name>).
- Arguments removed:
  - The argument columns in $drop() is removed. $drop() now accepts several character scalars, such as $drop("a", "b", "c") (#912).
  - In pl$col(), the name argument is removed, and the ... argument no longer accepts a list of characters and RPolarsSeries class objects (#923).
  - In pl$date_range(), the unused argument explode (not working in recent versions) is removed (#950).
- Change in arguments default and accepted values:
  - In pl$Series(), the argument values has a new default value NULL (#966).
  - In $unique() for DataFrame and LazyFrame, argument keep has a new default value "any" (#953).
  - In rolling aggregation functions (such as $rolling_mean()), the default value of argument closed is now NULL. Using closed with a fixed window_size now throws an error (#937).
  - In pl$date_range(), the argument end must be specified and the default value of interval is changed to "1d". The arguments start and end no longer accept numeric values (#950).
  - In pl$scan_parquet(), the default value of the argument rechunk is changed from TRUE to FALSE (#1033).
  - In pl$scan_parquet() and pl$read_parquet(), the argument parallel only accepts "auto", "columns", "row_groups", and "none". Previously, it also accepted the upper-case notation of "auto", "columns", "none", and "RowGroups" instead of "row_groups" (#1033).
  - In $str$to_integer() (renamed from $str$parse_int()), the default value of base is changed from 2 to 10 (#1038).
- The usage of pl$date_range() to create a range of Datetime data type is deprecated. pl$date_range() will always create a range of Date data type in the future. Use pl$datetime_range() if you want to create a range of Datetime instead (#950).
- <DataFrame>$get_columns() now returns an unnamed list instead of a named list (#991).
- Removed $argsort() which was an old alias for $arg_sort() (#930).
- Removed pl$expr_to_r() which was an alias for $to_r() (#938).
- <Series>$to_r_list() is renamed <Series>$to_list() (#938).
- Removed <Series>$to_r_vector() which was an old alias for <Series>$to_vector() (#938).
- Removed <Expr>$rep_extend(), which was an experimental method created at the early stage of this package and does not exist in other language APIs (#1028).
- The following deprecated functions are now removed: pl$threadpool_size(), <DataFrame>$with_row_count(), <LazyFrame>$with_row_count() (#965).
- In $group_by_dynamic(), the first datapoint is always preserved (#1034).
- $str$parse_int() is renamed to $str$to_integer() (#1038).
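This sketch shows how the pl$date_range() / pl$datetime_range() split is expected to look, assuming polars 0.16.0 or later. Date inputs produce a Date column, and a Datetime range goes through pl$datetime_range(). The "12h" interval is an arbitrary example value.

```r
library(polars)

# Date range with the new default interval "1d": 3 dates
dates <- pl$select(
  d = pl$date_range(as.Date("2024-01-01"), as.Date("2024-01-03"))
)

# Datetime range: use pl$datetime_range() instead of pl$date_range()
datetimes <- pl$select(
  dt = pl$datetime_range(
    as.POSIXct("2024-01-01", tz = "UTC"),
    as.POSIXct("2024-01-02", tz = "UTC"),
    interval = "12h"
  )
)
```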
- New functions:
  - pl$arg_sort_by() (#929).
  - pl$arg_where() to get the indices that match a condition (#922).
  - pl$datetime(), pl$date(), and pl$time() to easily create Expr of class datetime, date, and time via columns and literals (#918).
  - pl$datetime_range(), pl$date_ranges() and pl$datetime_ranges() (#950, #962).
  - pl$int_range() and pl$int_ranges() (#968).
  - pl$mean_horizontal() (#959).
  - pl$read_ipc() (#1033).
  - is_polars_dtype() (#927).
- New methods:
  - <LazyFrame>$to_dot() to print the query plan of a LazyFrame with graphviz dot syntax (#928).
  - $clear() for DataFrame, LazyFrame, and Series (#1004).
  - $item() for DataFrame and Series (#992).
  - $select_seq() and $with_columns_seq() for DataFrame and LazyFrame (#1003).
  - $arr$to_list() (#1018).
  - $str$extract_groups() (#979).
  - $str$find() (#985).
  - <DataFrame>$write_ipc() (#1032).
  - RPolarsDataType gains several methods to check the datatype, such as $is_integer(), $is_null() or $is_list() (#1036).
- New arguments or argument values:
  - ambiguous can now take the value "null" to convert ambiguous datetimes to null values (#937).
  - n in $str$replace() (#987).
  - non_existent in $dt$replace_time_zone() to specify what should happen when a datetime doesn't exist.
  - mapping_strategy in $over() (#984, #988).
  - raise_if_undetermined in $meta$output_name() (#961).
  - null_on_oob in $arr$get() and $list$get() to determine what happens when the index is out of bounds (#1034).
  - nulls_last, multithreaded, and maintain_order in $sort_by() (#1034).
- Other:
  - pl$Series() now calls as_polars_series() internally, so it can convert more classes to Series properly (#1015).
  - Export the Duration datatype (#955).
  - New active binding <Series>$struct$fields (#1002).
  - All $write_*() and $sink_*() functions now invisibly return the input data (#1039).
- The join_nulls and validate arguments of <DataFrame>$join() now work correctly (#945).
- We said in the changelog of 0.14.0 that all row_count_* args in I/O functions were renamed row_index_*, but this change was not made for CSV and IPC functions. This renaming is now made (#964).
- Evaluating Series methods from Expr inside functions now works correctly (#973). Thanks @Yunuuuu for the report.
- The dependent crate extendr-api is updated to an unreleased version from 2024-03-31 (#995). This resolves the issue where the R session crashed when a panic occurred on the Rust side. Thanks @CGMossa for the upstream fix.
- The parallel argument of pl$scan_parquet() and pl$read_parquet() now works correctly (#1033). Previously, any correct value was treated as "auto".
- Rust Polars is updated to 0.38.2 (#907).
- Minimum supported Rust version (MSRV) is now 1.76.0.
- as_polars_df(<nanoarrow_array>) is added (#893).
- It is now possible to create an empty DataFrame with a specific schema with pl$DataFrame(schema = my_schema) (#901).
- New arguments dtype and nan_to_null for pl$Series() (#902).
- New method <DataFrame>$partition_by() (#898).
- The default value of the format argument of $str$strptime() is now correctly set (#892).
- Performance of as_polars_df(<nanoarrow_array_stream>) is improved (#896).
- Rust Polars is updated to 0.38.1 (#865, #872).
- In $pivot(), arguments aggregate_function, maintain_order, sort_columns and separator must be named. Values that are passed by position are ignored.
- In $describe(), the name of the first column changed from "describe" to "statistic".
- $mod() methods and %% work correctly to guarantee x == (x %% y) + y * (x %/% y).
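This sketch checks the guaranteed identity with mixed signs, assuming polars 0.15.0 or later. It also assumes %% and %/% on Expr dispatch to the modulo and floor-division methods. The results are compared with base R's own %% and %/%, which satisfy the same identity.

```r
library(polars)

df <- pl$DataFrame(x = c(-7, 7, 7), y = c(3, -3, 3))

out <- df$with_columns(
  mod = pl$col("x") %% pl$col("y"),
  div = pl$col("x") %/% pl$col("y")
)
res <- as.data.frame(out)
```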
- Removed as.list() for class RPolarsExpr as it is a simple wrapper around list() (#843).
- Several functions have been rewritten to match the behavior of Python Polars.
  - pl$col(...) requires at least one argument (#852).
  - pl$head(), pl$tail(), pl$count(), pl$first(), pl$last(), pl$max(), pl$min(), pl$mean(), pl$median(), pl$std(), pl$sum(), pl$var(), pl$n_unique(), and pl$approx_n_unique() are syntactic sugar for pl$col(...)$<method()>. The argument ... now only accepts character values that are either column names or regular expressions (#852).
  - There is no argument for pl$len(). If you want to measure the length of specific columns, you should use pl$count(...) (#852).
  - The default value of the delimiter argument of <Expr>$str$concat() is changed from "-" to "" (#853).
  - The ignore_nulls argument of <Expr>$str$concat() must be a named argument (#853).
  - pl$Datetime()'s arguments are renamed: tu to time_unit, and tz to time_zone (#887).
- pl$Categorical() has been improved to allow specifying the ordering type (either lexical or physical). This also means that calling pl$Categorical doesn't create a DataType anymore. All calls to pl$Categorical must be replaced by pl$Categorical() (#860).
- <Series>$rem() is removed. Use <Series>$mod() instead (#886).
- The conversion strategy between the POSIXct type without a time zone attribute and Polars datetime has been changed (#878). POSIXct class vectors without a time zone attribute store UTC time internally and are displayed based on the system's time zone. Previous versions of polars only considered the internal value and interpreted it as UTC time, so the time displayed as POSIXct and in Polars was different.
  ```r
  # polars 0.14.1
  Sys.setenv(TZ = "Europe/Paris")
  datetime = as.POSIXct("1900-01-01")
  datetime
  #> [1] "1900-01-01 PMT"

  s = polars::as_polars_series(datetime)
  s
  #> polars Series: shape: (1,)
  #> Series: '' [datetime[ms]]
  #> [
  #> 	1899-12-31 23:50:39
  #> ]

  as.vector(s)
  #> [1] "1900-01-01 PMT"
  ```
  Now the internal value is updated to match the displayed value.
  ```r
  # polars 0.15.0
  Sys.setenv(TZ = "Europe/Paris")
  datetime = as.POSIXct("1900-01-01")
  datetime
  #> [1] "1900-01-01 PMT"

  s = polars::as_polars_series(datetime)
  s
  #> polars Series: shape: (1,)
  #> Series: '' [datetime[ms]]
  #> [
  #> 	1900-01-01 00:00:00
  #> ]

  as.vector(s)
  #> [1] "1900-01-01 PMT"
  ```
  This update may cause errors when converting from Polars to POSIXct for non-existent or ambiguous times. It is recommended to explicitly add a time zone before converting from Polars to R.
  ```r
  Sys.setenv(TZ = "America/New_York")
  ambiguous_time = as.POSIXct("2020-11-01 01:00:00")
  ambiguous_time
  #> [1] "2020-11-01 01:00:00 EDT"

  pls = polars::as_polars_series(ambiguous_time)
  pls
  #> polars Series: shape: (1,)
  #> Series: '' [datetime[ms]]
  #> [
  #> 	2020-11-01 01:00:00
  #> ]

  ## This will be error!
  # pls |> as.vector()

  pls$dt$replace_time_zone("UTC") |> as.vector()
  #> [1] "2020-11-01 01:00:00 UTC"
  ```
- Removed argument eager in pl$date_range() and pl$struct() for more consistency of output. It is possible to replace eager = TRUE by calling $to_series() (#882).
- In the when-then-otherwise expressions, the last $otherwise() is now optional, as in Python Polars. If $otherwise() is not specified, rows that don't meet the condition set in $when() will be filled with null (#836).
- <DataFrame>$head() and <DataFrame>$tail() methods now support negative row numbers (#840).
- $group_by() now works with named expressions (#846).
- New methods for the arr subnamespace: $median(), $var(), $std(), $shift(), $to_struct() (#867).
- $min() and $max() now work on categorical variables (#868).
- New methods for the list subnamespace: $n_unique(), $gather_every() (#869).
- clock_time_point and clock_zoned_time objects from the {clock} package are now converted to the Polars datetime type (#861).
- New methods for the name subnamespace: $prefix_fields() and $suffix_fields() (#873).
- pl$Datetime()'s time_zone argument now accepts "*" to match any time zone (#887).
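Here is a small sketch of a when-then chain without $otherwise(), assuming polars 0.15.0 or later. The string is wrapped in pl$lit() so it is treated as a value rather than a column name.

```r
library(polars)

df <- pl$DataFrame(x = 1:3)

# No $otherwise(): rows where x <= 1 become null (NA in R)
out <- df$with_columns(
  size = pl$when(pl$col("x") > 1)$then(pl$lit("big"))
)
res <- as.data.frame(out)
```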
- R no longer crashes when calling an invalid Polars object that points to a null pointer (#874). This occurred, for example, when a Polars object was saved in an RDS file and loaded in another session.
- Since most of the methods of Expr are now available for Series, the experimental <Series>$expr subnamespace is removed (#831). Use <Series>$<method> instead of <Series>$expr$<method>.
- New active binding $flags for DataFrame to show the flags used internally for each column. The output of $flags for Series was also improved and now contains FAST_EXPLODE for Series of type list and array (#809).
- Most Expr methods are also available for Series (#819, #828, #831).
- as_polars_df() for data.frame is more memory-efficient, and new arguments schema and schema_overrides are added (#817).
- Use polars_code_completion_activate() to enable code suggestions and autocompletion after $ on polars objects. This is an experimental feature that is disabled by default. For now, it is only supported in the native R terminal and in RStudio (#597).
- <Series>$list subnamespace methods now correctly return a Series class object (#819).
- Rust Polars is updated to 0.37.0 (#776).
- Minimum supported Rust version (MSRV) is now 1.74.1.
- $with_row_count() for DataFrame and LazyFrame is deprecated and will be removed in 0.15.0. It is replaced by $with_row_index().
- pl$count() is deprecated and will be removed in 0.15.0. It is replaced by pl$len().
- $explode() for DataFrame and LazyFrame doesn't work anymore on string columns.
- $list$join() and pl$concat_str() gain an argument ignore_nulls. The current behavior is to return a null if the row contains any null. Setting ignore_nulls = TRUE changes that.
- All row_count_* args in reading/scanning functions are renamed row_index_*.
- $sort() for Series gains an argument nulls_last.
- $str$extract() and $str$zfill() now accept an Expr and parse strings as column names. Use pl$lit() to recover the old behavior.
- $cum_count() now starts from 1 instead of 0.
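This sketch covers two of the entries above, $with_row_index() (the replacement for $with_row_count()) and the new ignore_nulls argument of pl$concat_str(). It assumes polars 0.14.0 or later, and that string arguments to pl$concat_str() are parsed as column names. The index column name "idx" is arbitrary.

```r
library(polars)

df <- pl$DataFrame(a = c("x", NA), b = c("y", "z"))

# was: df$with_row_count("idx"); the index still starts at 0
indexed <- df$with_row_index("idx")

joined <- df$select(
  # default: a null in any input makes the whole result null
  keep_null = pl$concat_str("a", "b", separator = "-"),
  # ignore_nulls = TRUE concatenates only the non-null parts
  skip_null = pl$concat_str("a", "b", separator = "-", ignore_nulls = TRUE)
)
res <- as.data.frame(joined)
```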
- The simd feature of the Rust library is removed in favor of the new nightly feature (#800). If you specified simd via the LIBR_POLARS_FEATURES environment variable during source installations, please use nightly instead; there is no change if you specified full_features because it now contains nightly instead of simd.
- The following functions were deprecated in 0.13.0 and are now removed (#783):
  - $list$lengths() -> $list$len()
  - pl$from_arrow() -> as_polars_df() or as_polars_series()
  - pl$set_options() and pl$reset_options() -> polars_options()
- $is_between() had several changes (#788):
  - arguments start and end are renamed lower_bound and upper_bound. Their behaviour doesn't change.
  - include_bounds is renamed closed and must be one of "left", "right", "both", or "none".
- polars_info() returns a slightly changed list.
  - $threadpool_size, which is the number of threads used by Polars, is changed to $thread_pool_size (#784).
  - $version, which indicates the version of this package, is changed to $versions$r_package (#791).
  - $rust_polars, which indicates the version of the dependent Rust Polars, is changed to $versions$rust_crate (#791).
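This sketch uses the renamed $is_between() arguments, assuming polars 0.14.0 or later. It compares the default closed = "both" with closed = "left", which excludes the upper bound.

```r
library(polars)

df <- pl$DataFrame(x = 1:4)

out <- df$select(
  # was: $is_between(start = 2, end = 3, include_bounds = ...)
  both = pl$col("x")$is_between(2, 3),
  left = pl$col("x")$is_between(lower_bound = 2, upper_bound = 3, closed = "left")
)
res <- as.data.frame(out)
```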
- New behavior when creating a DataFrame with a single list-variable. pl$DataFrame(x = list(1:2, 3:4)) used to create a DataFrame with two columns named "new_column" and "new_column_1", which was unexpected. It now produces a DataFrame with a single list variable. This also applies to list-columns created in $with_columns() and $select() (#794).
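This is a minimal check of the new list-variable behavior, assuming polars 0.14.0 or later. A named R list becomes one list column with one row per element.

```r
library(polars)

# One list column "x" with two rows, instead of two separate columns
df <- pl$DataFrame(x = list(1:2, 3:4))
```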
- pl$threadpool_size() is deprecated and will be removed in 0.15.0. Use pl$thread_pool_size() instead (#784).
- Implementation of the subnamespace $arr for expressions on array-type columns. An array column is similar to a list column, but is stricter as each sub-array must have the same number of elements (#790).
- The sql feature is included in the default features (#800). This means that functionality related to the RPolarsSQLContext class is now always included in the binary package.
- New method $write_parquet() for DataFrame (#758).
- S3 methods of as.data.frame() for RPolarsDataFrame and RPolarsLazyFrame accept more arguments of as_polars_df() and <DataFrame>$to_data_frame() (#762).
- S3 methods of arrow::as_arrow_table() and arrow::as_record_batch_reader() for RPolarsDataFrame no longer need the {nanoarrow} package (#754).
- Some S3 methods for the {nanoarrow} package are added (#730).
  - as_polars_df(<nanoarrow_array_stream>)
  - as_polars_series(<nanoarrow_array>)
  - as_polars_series(<nanoarrow_array_stream>)
- $sort() no longer panics when descending = NULL (#748).
- downlit::autolink() now recognizes the reference pages of this package (#739).
- <Expr>$where() is removed. Use <Expr>$filter() instead (#718).
- Deprecated functions from 0.12.x are removed (#714).
  - <Expr>$apply() and <Expr>$map(): use $map_elements() and $map_batches() instead.
  - pl$polars_info(): use polars_info() instead.
- The environment variables used when building the library have been changed (#693). This only affects selecting the feature flag and selecting profiles during source installation.
  - RPOLARS_PROFILE is renamed to LIBR_POLARS_PROFILE.
  - RPOLARS_FULL_FEATURES is removed and LIBR_POLARS_FEATURES is added. To select the full_features, set LIBR_POLARS_FEATURES="full_features".
  - RPOLARS_RUST_SOURCE, which was used for development, has been removed. If you want to use library binaries located elsewhere, use LIBR_POLARS_PATH instead.
- Remove the `eager` argument of `<SQLContext>$execute()`. Use the `$collect()` method after `$execute()` or `as_polars_df()` to get the result as a `DataFrame` (#719).
- The argument `name_generator` of `$list$to_struct()` is renamed `fields` (#724).
- The S3 method `[` for the `$list` subnamespace is removed (#724).
- The option `polars.df_print` has been renamed `polars.df_knitr_print` (#726).
- `$list$lengths()` is deprecated and will be removed in 0.14.0. Use `$list$len()` instead (#724).
- `pl$from_arrow()` is deprecated and will be removed in 0.14.0. Use `as_polars_df()` or `as_polars_series()` instead (#728).
- `pl$set_options()` and `pl$reset_options()` are deprecated and will be removed in 0.14.0. See `?polars_options` for details (#726).
- For compatibility with CRAN, the number of threads used by Polars is automatically set to 2 if the environment variable `POLARS_MAX_THREADS` is not set (#720). To disable this behavior and let Polars use the maximum number of threads automatically, use one of the following:
  - Build the Rust library with the `disable_limit_max_threads` feature.
  - Set the `polars.limit_max_threads` option to `FALSE` with the `options()` function before loading the package.
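Here is a minimal sketch of the option-based approach. It assumes a polars build where the limit applies, meaning one built without the `disable_limit_max_threads` feature. The commented-out line shows the alternative of setting `POLARS_MAX_THREADS` yourself. Its value `"4"` is an arbitrary illustration.

```r
# Alternative: pick an explicit thread count yourself. This must also be
# done before polars is loaded; "4" is an arbitrary example value.
# Sys.setenv(POLARS_MAX_THREADS = "4")

# Disable the automatic limit of 2 threads. The option has to be set
# before the package is loaded, i.e. before library(polars).
options(polars.limit_max_threads = FALSE)
library(polars)
```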
- New method `$rolling()` for `DataFrame` and `LazyFrame`. When this is applied, it creates an object of class `RPolarsRollingGroupBy` (#682, #694).
- New method `$group_by_dynamic()` for `DataFrame` and `LazyFrame`. When this is applied, it creates an object of class `RPolarsDynamicGroupBy` (#691).
- New method `$sink_ndjson()` for LazyFrame (#681).
- New function `pl$duration()` to create a duration by components (week, day, hour, etc.) and use it with date(time) variables (#692).
- New methods `$list$any()` and `$list$all()` (#709).
- New function `pl$from_epoch()` to convert a Unix timestamp to a date(time) variable (#708).
- New methods for the `list` subnamespace: `$set_union()`, `$set_intersection()`, `$set_difference()`, `$set_symmetric_difference()` (#712).
- New option `int64_conversion` to specify how Int64 columns (which have no equivalent in base R) should be converted. This option can either be set globally with `pl$set_options()` or on a case-by-case basis, e.g. with `$to_data_frame(int64_conversion =)` (#706).
- Several changes in `$join()` for `DataFrame` and `LazyFrame` (#716):
  - `<LazyFrame>$join()` now errors if `other` is not a `LazyFrame`, and `<DataFrame>$join()` errors if `other` is not a `DataFrame`.
  - Some arguments have been reordered (e.g. `how` now comes before `left_on`). This can lead to bugs if the user didn't use argument names.
  - Argument `how` now accepts `"outer_coalesce"` to coalesce the join keys automatically after joining.
  - New argument `validate` to perform some checks on join keys (e.g. ensure that there is a one-to-one matching between join keys).
  - New argument `join_nulls` to consider `null` values as a valid key.
- `<DataFrame>$describe()` now works with all datatypes. It also gains an `interpolation` argument that is used for quantile computation (#717).
- `as_polars_df()` and `as_polars_series()` for the `arrow` package classes have been rewritten and work better (#727).
- Options handling has been rewritten to match the standard option handling in R (#726):
  - Options are now passed via `options()`. The option names don't change but they must be prefixed with `"polars."`. For example, we can now pass `options(polars.strictly_immutable = FALSE)`.
  - Options can be accessed with `polars_options()`, which returns a named list (this is the replacement of `pl$options`).
  - Options can be reset with `polars_options_reset()` (this is the replacement of `pl$reset_options()`).
- New function `polars_envvars()` to print the list of environment variables related to polars (#735).
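Here is a minimal sketch of the option workflow described above, assuming polars 0.13.0 or later is installed. It uses the `int64_conversion` option, first globally through `options()` and then per call through `$to_data_frame()`. The value `"character"` is an illustrative choice. The check after the reset assumes the default value is `"double"`.

```r
library(polars)

# A small frame with an Int64 column, which has no base R equivalent
df <- pl$DataFrame(x = 1:3)$with_columns(pl$col("x")$cast(pl$Int64))

# Set the option globally: it is a regular R option with the "polars." prefix
options(polars.int64_conversion = "character")
global_out <- df$to_data_frame()

# polars_options() returns the current values as a named list
current <- polars_options()$int64_conversion

# Restore every polars option to its default
polars_options_reset()

# Override the conversion for a single call instead
local_out <- df$to_data_frame(int64_conversion = "character")
```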
This is a small release including a few documentation improvements and internal updates.
This version includes a few additional features and a large amount of documentation improvements.
- `pl$polars_info()` is moved to `polars_info()`. `pl$polars_info()` is deprecated and will be removed in 0.13.0 (#662).
- Rust Polars is updated to 0.36.2 (#659). Most of the changes from 0.35.x to 0.36.2 were covered in R polars 0.12.0. The main change is that `pl$Utf8` is replaced by `pl$String`. `pl$Utf8` is an alias and will keep working, but `pl$String` is now preferred in the documentation and in new code.
- New methods `$str$reverse()`, `$str$contains_any()`, and `$str$replace_many()` (#641).
- New methods `$rle()` and `$rle_id()` (#648).
- New functions `is_polars_df()`, `is_polars_lf()`, `is_polars_series()` (#658).
- `$gather()` now accepts negative indexing (#659).
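The new predicate functions can be used to check object types before dispatching. Here is a short sketch, assuming polars is installed. The column name `a` is arbitrary.

```r
library(polars)

df <- pl$DataFrame(a = 1:3)
lf <- df$lazy()            # LazyFrame built from the DataFrame
s <- df$get_column("a")    # Series extracted from the DataFrame

# Each predicate returns a single TRUE/FALSE
c(is_polars_df(df), is_polars_lf(lf), is_polars_series(s))

# A LazyFrame is not a DataFrame, and neither is a base R data.frame
is_polars_df(lf)
is_polars_df(data.frame(a = 1))
```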
- Remove the `Makefile` in favor of `Taskfile.yml`. Please use `task` instead of `make` as a task runner in development (#654).
- Rust Polars is updated to 2023-12-25 unreleased version (#601, #622). This is the same version as Python Polars package 0.20.2, so please check the upgrade guide for details too.
  - `pl$scan_csv()` and `pl$read_csv()`'s `comment_char` argument is renamed `comment_prefix`.
  - `<DataFrame>$frame_equal()` and `<Series>$series_equal()` are renamed to `<DataFrame>$equals()` and `<Series>$equals()`.
  - `<Expr>$rolling_*` functions gained an argument `warn_if_unsorted`.
  - `<Expr>$str$json_extract()` is renamed to `<Expr>$str$json_decode()`.
  - Change default join behavior with regard to `null` values.
  - Preserve left and right join keys in outer joins.
  - `count` now ignores null values.
  - `NaN` values are now considered equal.
  - `$gather_every()` gained an argument `offset`.
- `$apply()` on an Expr or a Series is renamed `$map_elements()`, and `$map()` is renamed `$map_batches()`. `$map()` and `$apply()` will be removed in 0.13.0 (#534).
- Removed `$days()`, `$hours()`, `$minutes()`, `$seconds()`, `$milliseconds()`, `$microseconds()`, `$nanoseconds()`. Those were deprecated in 0.11.0 (#550).
- `pl$concat_list()`: elements that are strings are now interpreted as column names. Use `pl$lit()` to concatenate with a string.
- `<RPolarsExpr>$lit_to_s()` is renamed to `<RPolarsExpr>$to_series()` (#582).
- `<RPolarsExpr>$lit_to_df()` is removed (#582).
- Change class names and function names associated with class names.
  - The class name of all objects created by polars (`DataFrame`, `LazyFrame`, `Expr`, `Series`, etc.) has changed. They now start with `RPolars`, for example `RPolarsDataFrame`. This will only break your code if you directly use those class names, such as in S3 methods (#554, #585).
  - Private methods have been unified so that they do not have the `RPolars` prefix (#584).
- The Extract function (`[`) for DataFrame can use columns not included in the result for filtering (#547).
- The Extract function (`[`) for LazyFrame can filter rows with Expressions (#547).
- `as_polars_df()` for `data.frame` has a new argument `rownames` to convert the row.names attribute to a column. This option is inspired by the `tibble::as_tibble()` function (#561).
- `as_polars_df()` for `data.frame` has a new argument `make_names_unique` (#561).
- New methods `$str$to_date()`, `$str$to_time()`, `$str$to_datetime()` as alternatives to `$str$strptime()` (#558).
- The `dim()` function for DataFrame and LazyFrame correctly returns integer instead of double (#577).
- The conversion of R's `POSIXct` class to Polars datetime now works correctly with millisecond precision (#589).
- `<LazyFrame>$filter()`, `<DataFrame>$filter()`, and `pl$when()` now allow multiple conditions to be separated by commas, like `lf$filter(pl$col("foo") == 1, pl$col("bar") != 2)` (#598).
- New method `$replace()` for expressions (#601).
- Better error messages for trailing argument commas such as `pl$DataFrame()$select("a",)` (#607).
- New function `pl$threadpool_size()` to get the number of threads used by Polars (#620). Thread pool size is also included in the output of `pl$polars_info()`.
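Two of the entries above, `rownames` in `as_polars_df()` and comma-separated conditions in `$filter()`, are illustrated in the sketch below. It assumes polars is installed. The column names `model`, `foo` and `bar` are illustrative. The comma-separated conditions are combined with a logical AND.

```r
library(polars)

# Keep the row names of a data.frame as a regular column named "model"
cars <- as_polars_df(mtcars, rownames = "model")
cars$columns

# Several conditions separated by commas: only rows satisfying all of them remain
df <- pl$DataFrame(foo = c(1, 1, 2), bar = c(2, 3, 3))
res <- df$filter(pl$col("foo") == 1, pl$col("bar") != 2)
res$to_data_frame()
```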
- Rust Polars is updated to 0.35.0 (2023-11-17) (#515)
  - Changes in `$write_csv()` and `$sink_csv()`: `has_header` is renamed `include_header` and there's a new argument `include_bom`.
  - `pl$cov()` gains a `ddof` argument.
  - `$cumsum()`, `$cumprod()`, `$cummin()`, `$cummax()`, `$cumcount()` are renamed `$cum_sum()`, `$cum_prod()`, `$cum_min()`, `$cum_max()`, `$cum_count()`.
  - `$take()` and `$take_every()` are renamed `$gather()` and `$gather_every()`.
  - `$shift()` and `$shift_and_fill()` now accept Expr as input.
  - When `reverse = TRUE`, `$arg_sort()` now places null values in the first positions.
  - Removed argument `ambiguous` in `$dt$truncate()` and `$dt$round()`.
  - `$str$concat()` gains an argument `ignore_nulls`.
- The rowwise computation when several columns are passed to `pl$min()`, `pl$max()`, and `pl$sum()` is deprecated and will be removed in 0.12.0. Passing several columns to these functions will now compute the min/max/sum in each column separately. Use `pl$min_horizontal()`, `pl$max_horizontal()`, and `pl$sum_horizontal()` instead for rowwise computation (#508).
- `$is_not()` is deprecated and will be removed in 0.12.0. Use `$not()` instead (#511, #531).
- `$is_first()` is deprecated and will be removed in 0.12.0. Use `$is_first_distinct()` instead (#531).
- In `pl$concat()`, the argument `to_supertypes` is removed. Use the suffix `"_relaxed"` in the `how` argument to cast columns to their shared supertypes (#523).
- All duration methods (`days()`, `hours()`, `minutes()`, `seconds()`, `milliseconds()`, `microseconds()`, `nanoseconds()`) are renamed, for example from `$dt$days()` to `$dt$total_days()`. The old usage is deprecated and will be removed in 0.12.0 (#530).
- DataFrame method `$as_data_frame()` is removed in favor of `$to_data_frame()` (#533).
- GroupBy methods `$as_data_frame()` and `$to_data_frame()`, which were used to convert GroupBy objects to R data frames, are removed. Use the `$ungroup()` method and the `as.data.frame()` function instead (#533).
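The `"_relaxed"` suffix mentioned above can be illustrated with two frames whose column `a` has different numeric types. This is a minimal sketch assuming polars is installed. The data values are arbitrary. With `"vertical_relaxed"`, both columns are cast to their shared supertype (Float64 here) before they are stacked.

```r
library(polars)

df_int <- pl$DataFrame(a = 1:2)   # column a is Int32
df_dbl <- pl$DataFrame(a = 3.5)   # column a is Float64

# Plain "vertical" expects matching column types; the relaxed variant
# casts to the common supertype first
out <- pl$concat(df_int, df_dbl, how = "vertical_relaxed")
out$schema
```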
- Fix the installation issue on Ubuntu 20.04 (#528, thanks @brownag).
- New methods `$write_json()` and `$write_ndjson()` for DataFrame (#502).
- Removed argument `name` in `pl$date_range()`, which was deprecated for a while (#503).
- New private method `.pr$DataFrame$drop_all_in_place(df)` to drop a `DataFrame` in place, releasing memory without invoking `gc()`. However, if there are other strong references to any of the underlying Series or arrow arrays, that memory will not be released. This method is aimed at r-polars extensions, and will be kept as stable as possible (#504).
- New functions `pl$min_horizontal()`, `pl$max_horizontal()`, `pl$sum_horizontal()`, `pl$all_horizontal()`, `pl$any_horizontal()` (#508).
- New generic functions `as_polars_df()` and `as_polars_lf()` to create polars DataFrames and LazyFrames (#519).
- New method `$ungroup()` for `GroupBy` and `LazyGroupBy` (#522).
- New method `$rolling()` to apply an Expr over a rolling window based on date/datetime/numeric indices (#470).
- New methods `$name$to_lowercase()` and `$name$to_uppercase()` to transform variable names (#529).
- New method `$is_last_distinct()` (#531).
- New methods of the Expressions class, `$floor_div()`, `$mod()`, `$eq_missing()` and `$neq_missing()`. The base R operators `%/%` and `%%` for Expressions are now translated to `$floor_div()` and `$mod()` (#523).
  - Note that `$mod()` of Polars differs from the R operator `%%`: with `$mod()`, the identity `x == (x %% y) + y * (x %/% y)` is not guaranteed to hold. Please check the upstream issue pola-rs/polars#10570.
- The extract function (`[`) for polars objects now behaves more like it does for base R objects (#543).
- The argument `quote_style` in `$write_csv()` and `$sink_csv()` can now take the value `"never"` (#483).
- `pl$DataFrame()` now errors if the variables specified in `schema` do not exist in the data (#486).
- S3 methods for base R functions are well documented (#494).
- Fixed a bug where `pl$SQLContext()$register()` failed when called without loading the package (#496).
- Rust Polars is updated to 2023-10-25 unreleased version (#442)
  - Minimum supported Rust version (MSRV) is now 1.73.
  - New subnamespace `"name"` that contains methods `$prefix()`, `$suffix()`, `$keep()` (renamed from `$keep_name()`) and `$map()` (renamed from `$map_alias()`).
  - `$dt$round()` gains an argument `ambiguous`.
  - The following methods now accept an `Expr` as input: `$top_k()`, `$bottom_k()`, `$list$join()`, `$str$strip_chars()`, `$str$strip_chars_start()`, `$str$strip_chars_end()`, `$str$split_exact()`.
  - The following methods were renamed:
    - `$str$n_chars()` -> `$str$len_chars()`
    - `$str$lengths()` -> `$str$len_bytes()`
    - `$str$ljust()` -> `$str$pad_end()`
    - `$str$rjust()` -> `$str$pad_start()`
  - `$concat()` with `how = "diagonal"` now accepts an argument `to_supertypes` to automatically convert concatenated columns to the same type.
  - `pl$enable_string_cache()` doesn't take any argument anymore. The string cache can now be disabled with `pl$disable_string_cache()`.
  - `$scan_parquet()` gains an argument `hive_partitioning`.
  - `$meta$tree_format()` has a better formatted output.
- `$scan_csv()` and `$read_csv()` now match the Python-Polars API more closely (#455):
  - `sep` is renamed `separator`, `overwrite_dtypes` is renamed `dtypes`, `parse_dates` is renamed `try_parse_dates`.
  - New arguments `rechunk`, `eol_char`, `raise_if_empty`, `truncate_ragged_lines`.
  - `path` can now be a vector of characters indicating several paths to CSV files. This only works if all CSV files have the same schema.
- New class `RPolarsSQLContext` and its methods to perform SQL queries on DataFrame-like objects. To use this feature, the Rust library must be built with full features (#457).
- New methods `$peak_min()` and `$peak_max()` to find local minima and maxima in an Expr (#462).
- New methods `$read_ndjson()` and `$scan_ndjson()` (#471).
- New method `$with_context()` for `LazyFrame` to have access to columns from other Data/LazyFrames during the computation (#475).
- Rust Polars is updated to 0.33.2 (#417)
  - In all date-time related methods, the argument `use_earliest` is replaced by `ambiguous`.
  - In `$sample()` and `$shuffle()`, the argument `fixed_seed` is removed.
  - In `$value_counts()`, the arguments `multithreaded` and `sort` (sometimes called `sorted`) have been swapped and renamed `sort` and `parallel`.
  - `$str$count_match()` gains a `literal` argument.
  - `$arg_min()` doesn't consider `NA` as the minimum anymore (this was already the behavior of `$min()`).
  - Using `$is_in()` with `NA` on both sides now returns `NA` and not `TRUE` anymore.
  - Argument `pattern` of `$str$count_matches()` can now use expressions.
  - Needs Rust toolchain `nightly-2023-08-26` to build with full features.
- Rename R functions to match Rust Polars
  - `$str$count_match()` -> `$str$count_matches()` (#417)
  - `$str$strip()` -> `$str$strip_chars()` (#417)
  - `$str$lstrip()` -> `$str$strip_chars_start()` (#417)
  - `$str$rstrip()` -> `$str$strip_chars_end()` (#417)
  - `$groupby()` is renamed `$group_by()` (#427)
- Remove some deprecated methods.
  - Method `$with_column()` has been removed (it was deprecated since 0.8.0). Use `$with_columns()` instead (#402).
  - Subnamespace `$arr` has been removed (it was deprecated since 0.8.1). Use `$list` instead (#402).
- Setting and getting polars options is now done with `pl$options`, `pl$set_options()` and `pl$reset_options()` (#384).
- Bump supported R version to 4.2 or later (#435).
- `pl$concat()` now also supports `Series`, `Expr` and `LazyFrame` (#407).
- New method `$unnest()` for `LazyFrame` (#397).
- New method `$sample()` for `DataFrame` (#399).
- New method `$meta$tree_format()` to display an `Expr` as a tree (#401).
- New argument `schema` in `pl$DataFrame()` and `pl$LazyFrame()` to override the automatic type detection (#385).
- Fix bug when calling R from polars via e.g. `$map()` where a query would not complete in one edge case (#409).
- New method `$cat$get_categories()` to list unique values of categorical variables (#412).
- New methods `$fold()` and `$reduce()` to apply an R function rowwise (#403).
- New function `pl$raw_list` and class `rpolars_raw_list`, a list of R raw vectors where missing values are encoded as `NULL`, to aid conversion to polars binary Series. Support back and forth conversion from polars binary literal and Series to R raw (#417).
- New method `$write_csv()` for `DataFrame` (#414).
- New method `$sink_csv()` for `LazyFrame` (#432).
- New method `$dt$time()` to extract the time from a `datetime` variable (#428).
- Method `$profile()` gains optimization arguments and plot-related arguments (#429).
- New method `pl$read_parquet()` that is a shortcut for `pl$scan_parquet()$collect()` (#434).
- Rename `$str$str_explode()` to `$str$explode()` (#436).
- New method `$transpose()` for `DataFrame` (#440).
- New argument `eager` of `LazyFrame$set_optimization_toggle()` (#439).
- `{polars}` can now be installed with "R source package with Rust library binary", by a mechanism copied from the prqlr package.

  ```r
  Sys.setenv(NOT_CRAN = "true")
  install.packages("polars", repos = "https://rpolars.r-universe.dev")
  ```

  The URL and SHA256 hash of the available binaries are recorded in `tools/lib-sums.tsv` (#435, #448, #450, #451).
- New string method `to_titlecase()` (#371).
- Although stated in the news for PR (#334), `strip = true` was not actually set for the "release-optimized" compilation profile. Now it is, but the binary sizes seem unchanged (#377).
- New vignette on best practices to improve `polars` performance (#188).
- Subnamespace name "arr" as in `<Expr>$arr$` & `<Series>$arr$` is deprecated in favor of "list". The subnamespace "arr" will be removed in polars 0.9.0 (#375).
Rust Polars was updated to 0.32.0, which comes with many breaking changes and new features. Unrelated breaking changes and new features are put in separate sections (#334):
- Update of Rust toolchain: nightly bumped to nightly-2023-07-27 and MSRV is now >=1.70.
- Param `common_subplan_elimination = TRUE` in `<LazyFrame>` methods `$collect()`, `$sink_ipc()` and `$sink_parquet()` is renamed and split into `comm_subplan_elim = TRUE` and `comm_subexpr_elim = TRUE`.
- `Series_is_sorted`: the `nulls_last` argument is dropped.
- `when-then-otherwise` classes are renamed to `When`, `Then`, `ChainedWhen` and `ChainedThen`. The syntactically illegal methods have been removed, e.g. chaining `$when()` twice.
- Github release + R-universe is compiled with `profile=release-optimized`, which now includes `strip=false`, `lto=fat` & `codegen-units=1`. This should make the binary a bit smaller and faster. See also the `FULL_FEATURES=true` env flag to enable simd with nightly Rust. For development or faster compilation, use `profile=release` instead.
- `fmt` arg is renamed `format` in `pl$Ptimes` and `<Expr>$str$strptime`.
- `<Expr>$approx_unique()` changed name to `<Expr>$approx_n_unique()`.
- `<Expr>$str$json_extract` arg `pat` changed to `dtype` and has a new argument `infer_schema_length = 100`.
- Some arguments in `pl$date_range()` have changed: `low` -> `start`, `high` -> `end`, `lazy = TRUE` -> `eager = FALSE`. Args `time_zone` and `time_unit` can no longer be used to implicitly cast time types. These two args can only be used to annotate a naive time unit. Mixing `time_zone` and `time_unit` for `start` and `end` is not allowed anymore.
- `<Expr>$is_in()` operation is no longer supported for dtype `null`.
- Various subtle changes:
  - `(pl$lit(NA_real_) == pl$lit(NA_real_))$lit_to_s()` now renders to `null`, not `true`.
  - `pl$lit(NA_real_)$is_in(pl$lit(NULL))$lit_to_s()` now renders to `false` (previously `true`).
  - `pl$lit(numeric(0))$sum()$lit_to_s()` now yields `0f64` and not `null`.
- `<Expr>$all()` and `<Expr>$any()` have a new arg `drop_nulls = TRUE`.
- `<Expr>$sample()` and `<Expr>$shuffle()` have a new arg `fix_seed`.
- `<DataFrame>$sort()` and `<LazyFrame>$sort()` have a new arg `maintain_order = FALSE`.
- `$rpow()` is removed. It should never have been translated. Use `^` and `$pow()` instead (#346).
- `<LazyFrame>$collect_background()` is renamed `<LazyFrame>$collect_in_background()` and reworked. Likewise `PolarsBackgroundHandle` is reworked and renamed to `RThreadHandle` (#311).
- `pl$scan_arrow_ipc` is now called `pl$scan_ipc` (#343).
- Stream query to file with `pl$sink_ipc()` and `pl$sink_parquet()` (#343).
- New method `$explode()` for `DataFrame` and `LazyFrame` (#314).
- New method `$clone()` for `LazyFrame` (#347).
- New method `$fetch()` for `LazyFrame` (#319).
- New methods `$optimization_toggle()` and `$profile()` for `LazyFrame` (#323).
- `$with_column()` is now deprecated (following upstream `polars`). It will be removed in 0.9.0. It should be replaced with `$with_columns()` (#313).
- New lazy function translated: `concat_str()` to concatenate several columns into one (#349).
- New stat functions `pl$cov()`, `pl$rolling_cov()`, `pl$corr()`, `pl$rolling_corr()` (#351).
- Add functions `pl$set_global_rpool_cap()`, `pl$get_global_rpool_cap()`, class `RThreadHandle` and `in_background = FALSE` param to `<Expr>$map()` and `$apply()`. It is now possible to run R code with `<LazyFrame>$collect_in_background()` and/or let polars parallelize R code in a pool of R processes. See `RThreadHandle-class` in reference docs for more info (#311).
- Internal IPC/shared-mem channel to serialize and send R objects / polars DataFrame across R processes (#311).
- The compile environment flag `RPOLARS_ALL_FEATURES` is renamed `RPOLARS_FULL_FEATURES`. If set to `'true'`, it triggers something like `cargo build --features "full_features"`, which is not exactly the same as `cargo build --all-features`. Some dev features are not included in "full_features" (#311).
- Fix bug to allow using polars without `library(polars)` (#355).
- New methods `<LazyFrame>$optimization_toggle()` + `$profile()` and enable Rust Polars feature CSE: "Activate common subplan elimination optimization" (#323).
- Named expressions, e.g. `pl$select(newname = pl$lit(2))`, are no longer experimental and are allowed by default (#357).
- Added methods `pl$enable_string_cache()`, `pl$with_string_cache()` and `pl$using_string_cache()` for joining/comparing Categorical series/columns (#361).
- Added an S3 generic `as_polars_series()` where users or developers of extensions can define a custom way to convert their format to Polars format. This generic must return a Polars series. See #368 for an example (#369).
- Private API support for Arrow Stream import/export of DataFrames between two R packages that use Rust Polars. See R package example here (#326).
- Replace the argument `reverse` by `descending` in all sorting functions. This is for consistency with the upstream Polars (#291, #293).
- Bump Rust Polars from 2023-04-20 unreleased version to version 0.30.0 released on 2023-05-30 (#289).
  - Rename `concat_lst` to `concat_list`.
  - Rename `$str$explode` to `$str$str_explode`.
  - Remove `tz_aware` and `utc` arguments from `str_parse`.
  - In `$date_range`, the `lazy` argument is now `TRUE` by default.
- The functions to read CSV have been renamed `scan_csv` and `read_csv` for consistency with the upstream Polars. `scan_xxx` and `read_xxx` functions are now accessed via `pl`, e.g. `pl$scan_csv()` (#305).
- New method `$rename()` for `LazyFrame` and `DataFrame` (#239).
- `<DataFrame>$unique()` and `<LazyFrame>$unique()` gain a `maintain_order` argument (#238).
- New `pl$LazyFrame()` to quickly create a `LazyFrame`, mostly in examples or for demonstration purposes (#240).
- Polars is internally moving away from string errors to a new error type called `RPolarsErr` both on the Rust and R sides. Final error messages should look very similar (#233).
- `$columns()`, `$schema()`, `$dtypes()` for `LazyFrame` implemented (#250).
- Improvements to internal `RPolarsErr`. Also `RPolarsErr` will now print each context of the error on a separate line (#250).
- Fix memory leak on error bug. Fix printing of `%` bug. Prepare for renaming of polars classes (#252).
- Add helpful reference landing page at `polars.github.io/reference_home` (#223, #264).
- Supports Rust 1.65 (#262, #280)
  - Rust Polars' `simd` feature is now disabled by default. To enable it, set the environment variable `RPOLARS_ALL_FEATURES` to `true` when building r-polars (#262).
  - `opt-level` of `argminmax` is now set to `1` in the `release` profile to support Rust < 1.66. The profile can be changed by setting the environment variable `RPOLARS_PROFILE` (when set to `release-optimized`, `opt-level` of `argminmax` is set to `3`).
- A new function `pl$polars_info()` will tell which features are enabled (#271, #285, #305).
- `select()` now accepts lists of expressions. For example, `<DataFrame>$select(l_expr)` works with `l_expr = list(pl$col("a"))` (#265).
- LazyFrame gets some new S3 methods: `[`, `dim()`, `dimnames()`, `length()`, `names()` (#301).
- `<DataFrame>$glimpse()` is a fast `str()`-like view of a `DataFrame` (#277).
- `$over()` now accepts a vector of column names (#287).
- New method `<DataFrame>$describe()` (#268).
- Cross joining is now possible with `how = "cross"` in `$join()` (#310).
- Add license info of all Rust crates to `LICENSE.note` (#309).
- With CRAN 0.7.0 release candidate (#308).
  - New author accredited, SHIMA Tatsuya (@eitsupi).
  - DESCRIPTION revised.
- Use `pl$set_polars_options(debug_polars = TRUE)` to profile/debug method calls of a polars query (#193).
- Add `<DataFrame>$melt()`, `<DataFrame>$pivot()` + `<LazyFrame>$melt()` methods (#232).
- Lazy functions translated: `pl$implode`, `pl$explode`, `pl$unique`, `pl$approx_unique`, `pl$head`, `pl$tail` (#196).
- `pl$list` is deprecated, use `pl$implode` instead (#196).
- Docs improvements (#210, #213).
- Update nix flake. (#227)
- Bump Rust Polars from 2023-02-17 unreleased version to 2023-04-20 unreleased version (#183).
  - `top_k`'s `reverse` option is removed. Use the new `bottom_k` method instead.
  - The name of the `fmt` argument of some methods (e.g. `parse_date`) has been changed to `format`.
- `DataFrame` objects can be subsetted using brackets like standard R data frames: `pl$DataFrame(mtcars)[2:4, c("mpg", "hp")]` (#140 @vincentarelbundock)
- An experimental `knit_print()` method has been added to DataFrame that outputs HTML tables (similar to py-polars' HTML output) (#125 @eitsupi)
- `Series` gains new methods: `$mean`, `$median`, `$std`, `$var` (#170 @vincentarelbundock)
- A new option `use_earliest` of `replace_time_zone` (#183).
- A new option `strict` of `parse_int` (#183).
- Perform joins on nearest keys with method `join_asof` (#172).
- The package name was changed from `rpolars` to `polars` (#84).
- Several new methods for DataFrame, LazyFrame & GroupBy translated (#103, #105 @vincentarelbundock)
- Doc fixes (#102, #109 @etiennebacher)
- Experimental opt-in auto completion (#96 @sorhawell)
- Base R functions work on DataFrame and LazyFrame objects via S3 methods: as.data.frame, as.matrix, dim, head, length, max, mean, median, min, na.omit, names, sum, tail, unique, ncol, nrow (#107 @vincentarelbundock).
- @etiennebacher made their first contribution in #102
- @vincentarelbundock made their first contribution in #103
Release date: 2023-04-16. Full changelog: v0.4.6...v0.5.0