Breaking changes
These are all minor breaking changes resulting from enhancements and are not expected to affect the vast majority of users.
- A new ... argument was added to row_to_names(), preceding the remove_row argument, as part of the new find_header() functionality. If code previously used remove_row as an unnamed argument, it will now error. If code previously used the unsupported behavior of passing anything other than TRUE or FALSE to remove_row, unexpected results may occur.
- Microsoft Excel incorrectly has a leap day on 29 February 1900 (see https://docs.microsoft.com/en-us/office/troubleshoot/excel/wrongly-assumes-1900-is-leap-year). excel_numeric_to_date() did not account for this error, and now it does. Dates returned from excel_numeric_to_date() that precede 1 March 1900 will now be one day later compared to previous versions (i.e., what was 1 Feb 1900 is now 2 Feb 1900), and dates that Excel presents as 29 Feb 1900 will become as.POSIXct(NA). (#423, thanks @billdenney for fixing)
- The time zone is now always set for excel_numeric_to_date() and convert_date(). The default time zone is Sys.timezone(); previously it was an empty string (""). (#422, thanks @billdenney for fixing)
- get_dupes() results are now sorted first by descending order of dupe_count, then alphabetically by sorting variables. (#493)
- There are several minor breaking changes resulting from enhancements to adorn_ns():
  - The addition of the new argument format_func means that previous calls relying on ,,, as shorthand to get to the ... column selection argument may now require an extra comma.
  - adorn_ns() now defaults to displaying numbers of >3 digits with big.mark = ",", as part of the default value of the new format_func argument. E.g., 1234 is now 1,234.
  - adorn_ns() no longer prints leading whitespace when position = "front". This is not a visible change in the printed result, and it would be rare for it to affect any code.
- When the first column of the data.frame input to adorn_totals() is a factor and a totals row is added to the bottom, that column now remains a factor, with "Total" or other user-specified totals name added to its factor levels (#494).
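
To make the Excel leap-day change concrete, here is a minimal sketch, assuming janitor 2.2.0 or later is installed. The serial numbers are illustrative values chosen around the nonexistent 29 Feb 1900 (Excel serial 60); they are not taken from the release notes.

```r
library(janitor)

# Excel serial numbers on either side of the phantom leap day (serial 60).
serials <- c(1, 59, 61)
dates <- excel_numeric_to_date(serials)
print(dates)

# Serial 60 is what Excel displays as 29 Feb 1900, a date that does not exist.
# The conversion is expected to warn, so the warning is suppressed here.
phantom <- suppressWarnings(excel_numeric_to_date(60))
print(is.na(phantom))
```

Code that compared pre-March-1900 results against values produced by an earlier janitor version would see the one-day shift described above.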
New features
- row_to_names() now has a new helper function, find_header(), to help find the row that contains the names. It can be used by passing row_number = "find_header". See the documentation of row_to_names() and find_header() for more examples. (fix #429)
- remove_empty() has a new argument, cutoff, which allows rows or columns to be removed if at least the cutoff fraction of the data are missing. (fix #446, thanks to @jzadra for suggesting the feature and @billdenney for fixing)
- A new function sas_numeric_to_date() has been added to convert SAS dates, times, and datetimes to R objects. (fix #475, thanks to @billdenney for suggesting and implementing)
- A new function single_value() has been added to ensure that only a single value or missing values are present in a vector. (fix #428)
- A new function get_one_to_one() has been added to find columns that map 1:1 to each other, even if the values within the columns differ. (fix #291, @billdenney)
- adorn_ns() contains a new format_func argument so that the user can format the Ns to their liking, e.g., changing the big.mark character. (#444)
- clean_names() can now be called on a database connection in a dbplyr code pipeline. (#467)
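
The following sketch shows how find_header() and single_value() might be used, assuming janitor 2.2.0 or later. The data frame raw and its contents are made up for illustration. With no extra arguments, find_header() is assumed to pick the first row that has no missing values.

```r
library(janitor)

# Illustrative input: row 1 is incomplete, row 2 holds the real column names.
raw <- data.frame(
  X_1 = c(NA, "ID", "1", "2", "3"),
  X_2 = c("exported", "Value", "4", "5", "6"),
  stringsAsFactors = FALSE
)

# Locate the header row explicitly, or let row_to_names() do it.
header_row <- find_header(raw)
named <- row_to_names(raw, row_number = "find_header")
print(named)

# remove_row now sits after `...`, so it has to be passed by name.
kept <- row_to_names(raw, row_number = 2, remove_row = FALSE)

# single_value() returns the one non-missing value and errors if there are several.
batch <- single_value(c(NA, "batch-7", "batch-7"))
print(batch)
```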
Minor features
- make_clean_names() (and therefore clean_names()) issues a warning if the mu or micro symbol is in the names and it is not or may not be handled by a replace argument value. (#448, thanks @IndrajeetPatil for reporting and @billdenney for fixing) The rationale is that standard transliteration would convert "[mu]g" to "mg" when it would more typically be converted to "ug" for use as a unit. A new, unexported constant (janitor:::mu_to_u) was added to help with mu to "u" replacements.
- excel_numeric_to_date() now warns when times are converted to NA due to hours that do not exist because of daylight saving time (fix #420, thanks @Geomorph2 for reporting and @billdenney for fixing). It also warns when inputs are not positive, since Excel only supports values down to 1 (#423).
- If a tabyl() or similar data.frame is sorted (e.g., with dplyr::arrange()), then has adorn_totals() and/or adorn_percentages() called on it, followed by adorn_ns(), the Ns will be sorted correctly to match the tabyl they're being adorned on. (fix #407)
- clean_names() now supports all object types that have either names or dimnames (#481, @DanChaltiel).
- adorn_pct_formatting() uses the locale-dependent value of decimal.mark as a decimal separator, e.g., in locales where getOption("OutDec") is "," it will print percentages in the format "12,34%". This character can also be set manually with options(OutDec = ","). (#451)
- adorn_totals(where = "row") now preserves factor class and levels of the first column of the input data.frame (#494).
- make_clean_names() now allows duplicate names to be returned when the new allow_dupes argument is set to TRUE (#495, @JasonAizkalns).
- Some warning messages now have classes so that they can be specifically suppressed with suppressWarnings(..., classes = "the_class_to_suppress"). To find the class of a warning you typically must look at the code where the warning is raised. (#452, thanks to @mgacc0 for suggesting and @billdenney for fixing)
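
A short sketch of two of the items above, assuming janitor 2.2.0 or later. The input names are illustrative. The warning class demo_warning_class and the function noisy() are hypothetical, defined here only to show how class-based suppression works in base R; they are not classes or functions that janitor provides.

```r
library(janitor)

# Two of these names collapse to the same clean name.
raw_names <- c("Sample ID", "sample id", "Result")

deduped <- make_clean_names(raw_names)                      # duplicates get a numeric suffix
with_dupes <- make_clean_names(raw_names, allow_dupes = TRUE)
print(deduped)
print(with_dupes)

# Class-based suppression with base R. The class name is hypothetical.
noisy <- function() {
  warning(warningCondition("demo warning", class = "demo_warning_class"))
  "done"
}
result <- suppressWarnings(noisy(), classes = "demo_warning_class")
print(result)
```

Suppressing by class leaves any other warnings raised by the same call visible, which is the main reason to prefer it over a blanket suppressWarnings().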
Bug fixes
- adorn_percentages() was refactored for compatibility with dplyr package versions >= 1.1.0 (#490)
- When a numeric variable is supplied as the 2nd variable (column) or 3rd variable (list) of a tabyl, the resulting columns or list are now sorted in numeric order, not alphabetic. (#438, thanks @daaronr for reporting and @mattroumaya for fixing)
- tabyl() now succeeds when the second variable is named "n" (#445).
- adorn_ns() can act on a single-column data.frame input with custom Ns supplied if the variable to adorn is specified with ... (#456).
- adorn_totals() on a one_way tabyl preserves the tabyl_type attribute so that a subsequent call to adorn_pct_formatting() works correctly on one-way tabyls (#523).
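
The last fix can be exercised with a sketch like the following, assuming janitor 2.2.0 or later and using the built-in mtcars data set as an illustrative input.

```r
library(janitor)

# One-way tabyl of the cyl column (three distinct values in mtcars).
one_way <- tabyl(mtcars, cyl)

# Add a totals row; the tabyl_type attribute should survive this step.
with_totals <- adorn_totals(one_way, where = "row")
print(attr(with_totals, "tabyl_type"))

# Formatting percentages afterwards relies on that attribute.
formatted <- adorn_pct_formatting(with_totals)
print(formatted)
```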