gtools-1.10.1
Release update. New commands and functions, several enhancements, and various bug fixes. Remember to run gtools, upgrade to keep up to date between major updates.
Features
-
New function gstats transform (weights, by allowed): Applies a transformation to a variable; that is, y_i = f(x_i), with y the target and x the source. For example, gstats transform (demean) y = x, by(group) gives

    n_j  = sum_i 1{group_i = j}
    s_j  = sum_i 1{group_i = j} * x_i
    y_ij = x_i - s_j / n_j

Available transformations:

    normalize, standardize: f(x) = (x - mean(x)) / sd(x)
    demean:   f(x) = (x - mean(x))
    demedian: f(x) = (x - median(x))
    cumsum:   f(x_i) = sum_{l = 1}^i x_l
    shift:    f(x_i) = x_{i - lag} or x_{i + lead}
    rank:     Similar to egen rank; see docs.
    moving:   Moving statistics; see docs.
    range:    Similar to rangestat; see docs.
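The by-group formulas above can be sketched in plain Python to make the arithmetic concrete. This is only a reference computation, not gtools code: the helper names (transform_by, demean, demedian, normalize) and the sample data are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def transform_by(x, group, f):
    """Apply f to the values of x within each group, preserving row order."""
    rows = defaultdict(list)
    for i, g in enumerate(group):
        rows[g].append(i)
    y = [None] * len(x)
    for idx in rows.values():
        transformed = f([x[i] for i in idx])
        for i, v in zip(idx, transformed):
            y[i] = v
    return y


def demean(v):
    # f(x) = x - mean(x)
    m = mean(v)
    return [xi - m for xi in v]


def demedian(v):
    # f(x) = x - median(x)
    m = median(v)
    return [xi - m for xi in v]


def normalize(v):
    # f(x) = (x - mean(x)) / sd(x); sample sd is an assumption here
    m, s = mean(v), stdev(v)
    return [(xi - m) / s for xi in v]


x = [1.0, 2.0, 3.0, 10.0, 20.0]
group = ["a", "a", "a", "b", "b"]
print(transform_by(x, group, demean))  # [-1.0, 0.0, 1.0, -5.0, 5.0]
```

Each group is transformed using only its own mean, which matches y_ij = x_i - s_j / n_j for the demean case.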
-
gstats range: Alias for gstats transform (range); see below.
-
gstats moving: Alias for gstats transform (moving); see below.
-
gstats hdfe (alias gstats residualize): Residualize variable by absorbing high-dimensional fixed effects.
-
Currently in beta! Use with care; see docs for details.
-
Methods cg (Conjugate Gradient), squarem (SQUAREM), it (Irons and Tuck), map (Method of Alternating Projections).
-
Parallel execution of select functions can be enabled at compile time via GTOOLSOMP.
-
-
gstats transform:
-
gstats transform (demean) ...
-
gstats transform (demedian) ...
-
gstats transform (normalize) ...
-
gstats transform (cumsum [+/- [varname]]) ...: Sums in the current order by default. The user can request the cumulative sum in ascending or descending order; the order can also be determined by another variable.
-
gstats transform (rank) ...: Option ties() specifies how to break ties (field, track, unique, stableunique).
-
gstats transform (shift [+/-]#) ...: Leads (default; e.g. shift 1 or shift +3) and lags (e.g. shift -2).
-
gstats transform (moving stat lower upper) ...: Moving statistic stat from current observation + lower until current observation + upper; see docs for details.
-
gstats transform (range stat lower upper varname) ...: Moving statistic stat for values of varname in the range varname[_n] + lower to varname[_n] + upper. The bounds can also be given as multiples of a statistic, e.g. range sd -1.0sd 1.0sd varname, to get all values within one standard deviation of varname[_n]. See docs for details.
-
gstats transform, auto[()] allows automagically naming targets based on the source variable's name and the statistic requested. Default is #source#_#stat#.
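The positional transforms above (cumsum, shift, moving) can be illustrated with a short Python sketch. It is not gtools code: the function names and data are hypothetical, out-of-range shifts are represented by None as a stand-in for Stata missing values, and truncating the moving window at the edges of the data is an assumption (see the gtools docs for the actual boundary behavior).

```python
from itertools import accumulate
from statistics import mean


def cumsum(x):
    # f(x_i) = sum_{l = 1}^i x_l, in the current order
    return list(accumulate(x))


def shift(x, k):
    # k > 0 is a lead (x_{i + k}); k < 0 is a lag (x_{i - |k|})
    n = len(x)
    return [x[i + k] if 0 <= i + k < n else None for i in range(n)]


def moving(x, stat, lower, upper):
    # stat over observations i + lower .. i + upper (inclusive);
    # assumption: the window is truncated at the data boundaries
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i + lower), min(n - 1, i + upper)
        window = x[lo:hi + 1]
        out.append(stat(window) if window else None)
    return out


x = [1, 2, 3, 4, 5]
print(cumsum(x))               # [1, 3, 6, 10, 15]
print(shift(x, 1))             # [2, 3, 4, 5, None]
print(shift(x, -2))            # [None, None, 1, 2, 3]
print(moving(x, mean, -1, 1))  # [1.5, 2, 3, 4, 4.5]
```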
-
-
greshape
-
Adds option dropmiss to drop missing rows (case-wise) when reshaping long (via long or gather).
-
Closes #58; allows uselabels[(varlist, [exclude])] to optionally specify which variables to use labels for (default is all variables). With the exclude option, the varlist instead specifies which variables not to use labels for.
-
Closes #63: greshape wide/gather allows prefix(...) for custom output names.
-
Closes #69. greshape wide/spread now allows labelformat() for custom variable labels (only when a single variable is passed to key()/j()). The default is #keyvalue# #stublabel#. Available placeholders are #stubname#, #stublabel#, #keyname#, #keylabel#, #keyvalue#, and #keyvaluelabel#.
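As a rough illustration of how a labelformat() template expands, the following Python sketch substitutes each placeholder with a value. It is not how greshape is implemented; the function name format_label and the sample stub, key, and label values are all hypothetical.

```python
def format_label(template, values):
    """Replace each #name# placeholder in template with its value."""
    out = template
    for name, value in values.items():
        out = out.replace(f"#{name}#", str(value))
    return out


# Hypothetical metadata for one reshaped variable (e.g. inc1990)
values = {
    "stubname": "inc",
    "stublabel": "Income",
    "keyname": "year",
    "keylabel": "Year",
    "keyvalue": 1990,
    "keyvaluelabel": "1990",
}

print(format_label("#keyvalue# #stublabel#", values))  # 1990 Income
print(format_label("#stubname# (#keyname# = #keyvalue#)", values))  # inc (year = 1990)
```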
-
-
gegen:
-
winsor, winsorize call gstats winsor
-
standardize, normalize, demean, demedian call gstats transform
-
Fixes #67; adds gegen x = rank(varname) [wgt], by(varlist) ties(type) via gstats transform (rank) [wgt], by() ties(). Weights are optional.
-
gegen x = moving_stat(y), window(lower upper) calls gstats transform
-
gegen x = range_stat(y), interval(lower[stat] upper[stat] varname) calls gstats transform
-
-
gcollapse, gegen, gstats tab new functions:
-
geomean for geometric mean.
-
gini, gini dropneg, gini keepneg for Gini coefficient (optionally drop or keep negative values).
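For reference, the two statistics can be written out in a few lines of Python. This uses the textbook definitions: the geometric mean as exp(mean(log x)) and the Gini coefficient from the sorted-values formula. Whether gtools uses exactly this Gini estimator, and how dropneg and keepneg map onto the drop_negative flag below, are assumptions; the function names are hypothetical.

```python
import math


def geomean(x):
    """Geometric mean of strictly positive values: exp(mean(log(x)))."""
    return math.exp(sum(math.log(v) for v in x) / len(x))


def gini(x, drop_negative=False):
    """Gini coefficient via G = 2 * sum_i(i * x_(i)) / (n * sum(x)) - (n + 1) / n,
    where x_(i) are the values sorted in ascending order (i = 1..n)."""
    vals = sorted(v for v in x if not (drop_negative and v < 0))
    n, total = len(vals), sum(vals)
    if n == 0 or total == 0:
        return None  # undefined for empty input or zero total
    weighted = sum(rank * v for rank, v in enumerate(vals, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([1, 1, 1, 1]))  # 0.0  (perfect equality)
print(gini([0, 0, 0, 1]))  # 0.75 (one unit holds everything)
```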
-
-
noinit option for gcollapse, merge, gegen, gstats (selected), gregress (and co.) to prevent targets from being emptied out with replace. Prints warning!
Beta
- Regression models are in beta and not recommended at the moment;
see docs for details.
Enhancements
-
User must now specify global GTOOLS_BETA to use beta features.
-
Typed (direct/non-hashed) radix sort in API internals
-
Allows the user to specify the temporary directory for files via
global GTOOLS_TEMPDIR
-
gunique, detail now uses gstats sum, detail
-
Modularized the code base so that aliases are assigned to internal functions instead of the copy/paste if/else branching statements.
-
Categorize documentation into "Data manipulation", "Statistics", and "Regression models".
-
Move plugin compilation to GitHub rather than Travis.
-
gtop prints the number of levels in Other and Missing rows by default. (With missing, it only does so if there's more than one type of missing value.)
-
greshape tries to detect repeated stubs and suggests this possibility to the user when a stub matches multiple variables.
-
Faster excludeself mean and sum without specified range in gstats transform.
Bug Fixes
-
gstats winsor exits with an error if replace and if/in are both passed (the way it's set up, it'd be a bit of a hassle to allow init/noinit).
-
gstats transform, gstats hdfe, gregress (and co.) all now initialize their targets to be empty (missing values) with if/in and replace.
-
gtop no longer incorrectly replaces the display value if the numerical variable has a value label and no missing values. If there was a single value this would result in an error: gtop would think there was always at least one missing value to replace.
-
gcollapse no longer fails when trying to label the collapsed output if the source labels are blank (this can happen, for example, with data converted to .dta from other formats or programs).
-
gcollapse no longer gives an incorrect missing variables list when part of that list is called with varlist notation (e.g. x* y is passed and x* exists but y does not).
-
gunique no longer ignores if/in with gen and replace
-
Fixed gegen nunique with multiple inputs
-
Fixed bug where the prefix in gstats was stat_ instead of stat|
-
In gquantiles, data was read incorrectly with by() and weights if xtile was not requested. In particular, the data was copied as if the target had only one column, but since weights need to be included, the target has two columns. This was fixed.
-
Fixed bug where a by variable being used as a source but not a target got renamed to the target and was no longer available as a by variable. Now a new variable is created and the by variable remains unchanged.
-
Fixed memory leak where the C by variables were not cleared from memory if st_into->output was allocated, because the free code was upgraded from 6 or 7 to 9. The conditional logic in place said that by variables should not be cleared if the free code was greater than 7, but that was only meant to skip free code 8 and free code 9 in some scripts, not all. Code 8 logic was deprecated and now by variables are allocated with code 8, so they are always cleared if the free code is 6 or higher.
-
by: gegen now generates variables using the by prefix. Previously, this would give incorrect answers if the expression inside egen assumed that it would be generated with by. For example, by var: gegen x = mean(max(y, y[1]))
-
Closes #64: Removes head command from greshape tests (done a few commits ago but someone noticed before the merge).
-
Closes #68. gegen now allows the by: prefix when calling a gstats transform function (this is only allowed because these calls already require single-variable input, so the by: prefix should not present an issue when calling the function).
-
Closes #72: Warning for gegen expressions without by group
-
Closes #74: gstats transform parses abbreviated targets
-
Closes #75: gunique returns 0s in r() when there are no obs
-
Closes #78: if now passed raw/in double-quotes throughout the pipeline
-
Closes #79: Adds disclaimer to benchmarks.
-
Closes #82: cw in gcollapse now working.
-
Closes #85: Bug in gegen warning message caused errors in some function calls.
-
Closes #87: For OSX, make now compiles x86_64 and arm64 separately then combines via lipo.
-
Various fixes to the docs.