Jump to: Readme | ppmlhdfe Paper | Separation Paper | Help File | Separation Primer | Separation Benchmarks | Undocumented Options

Sections: reghdfe internals | Mata internals (Initialization - IRLS - Simplex - ReLU - Mu) | Display | Margins and esttab

This guide describes advanced and internal `ppmlhdfe` options.
## reghdfe internals

- `itol(1e-9)`: reghdfe tolerance, used in three situations (see the usage sketch after this list). Recall that `tolerance(1e-8)` is the tolerance used to determine convergence within the IRLS step, not within reghdfe.
  - When running collinearity tests.
  - Within the `guess(ols)` option.
  - Within the IRLS step. Here it will be the "desired" reghdfe tolerance, and will differ from the actual reghdfe tolerance for two reasons:
    - For speed purposes, the initial IRLS tolerance depends on `start_inner_tol(1e-4)`.
    - To ensure convergence and avoid numerical precision issues, the final reghdfe tolerance needs to be at least `0.1 * irls_tolerance` and at most `1e-12`.
- `accel(str)`: denotes the type of acceleration used when partialling out; only relevant with two or more sets of fixed effects. Valid options are `cg` (conjugate gradient, the default), `sd` (steepest descent), `a` (Aitken), `lsmr`, and `none`.
- `transf(str)`: denotes the type of alternating projection used; only relevant with two or more sets of fixed effects. Valid options are `sym` (symmetric Kaczmarz, the default), `kac` (Kaczmarz), and `cim` (Cimmino). This option is unused with `lsmr` acceleration.
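
For instance, a minimal usage sketch with illustrative values (the specific tolerance and solver choices below are our own picks, not recommendations):

```stata
* Illustrative only: tighter inner tolerance, then alternative
* acceleration and transform choices for partialling out.
sysuse auto, clear
ppmlhdfe price weight, a(turn trunk) itol(1e-11)
ppmlhdfe price weight, a(turn trunk) accel(lsmr)
ppmlhdfe price weight, a(turn trunk) accel(sd) transf(kac)
```

Each run should produce the same point estimates; only the speed of the partialling-out step differs.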
## Mata internals

`ppmlhdfe` allows you to change any of the internal options defined in the GLM Mata class.

For instance, if you don't want to standardize the variables before running the computations, you can add the `standardize_data(0)` option to ppmlhdfe:

```stata
sysuse auto, clear
ppmlhdfe price gear, a(turn trunk) standardize_data(0)
```

In this case, notice how turning off standardization gives the exact same results, although convergence takes a little longer (see Marquardt (1980) for a discussion of the numerical benefits of standardizing or "scaling" the data).

Also note that all of these Mata options must be set in the form `option(value)`: you cannot write `standardize_data` alone, but must write `standardize_data(0)` or `standardize_data(1)`.
### Initialization options

- `remove_collinear_variables(1)`: whether to initially check for collinear variables. Setting it to zero saves a `reghdfe` regression, but might make detecting separation a bit more difficult.
- `standardize_data(1)`: whether to standardize each variable (divide it by its standard deviation). Doing so improves the numerical stability of the computations, at a small speed cost.
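
A minimal sketch of skipping the collinearity check (our own example; use with care, since separation may become harder to detect):

```stata
* Illustrative: trade one reghdfe call for weaker separation diagnostics
sysuse auto, clear
ppmlhdfe price weight length, a(turn) remove_collinear_variables(0)
```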
### IRLS options

- `tolerance(1e-8)`: we achieve convergence once `epsilon < tolerance`, where `epsilon = ΔDeviance / Deviance` (with some nuances for corner cases). Thus, a `tolerance` closer to zero leads to slower but more accurate results, while a higher `tolerance` can be used for quick-and-dirty regressions.
- `use_exact_solver(0)`: whether to always use an exact least-squares solver (instead of accelerating when `itol` is within 10 of `tolerance`).
- `use_exact_partial(0)`: every IRLS iteration partials out (z, X) instead of the equivalent-but-faster alternatives (described in the paper).
- `target_inner_tol(1e-9)`: you can actually set this directly with the `itol()` option; see above.
- `start_inner_tol(1e-4)`: initial tolerance when partialling out.
- `min_ok(1)`: by default we only stop when we observe `epsilon < tolerance` two times. Using `min_ok(2)` or higher forces a few extra iterations. This is useful in corner cases where the deviance has converged but the rightmost digits of the estimates might still be converging.
- `maxiter(1000)`: maximum number of iterations, after which an error is raised.
- `use_heuristic_tol(1)`: when this option is on, we try to anticipate if the next iteration is likely to converge. If that's the case, we preemptively increase the inner tolerance and use an exact solver, so we can stop there.
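
For instance, a hedged sketch of a slower but more careful run (the values are illustrative):

```stata
* Illustrative: tighter convergence criterion, extra confirmation
* iterations, and a larger iteration budget.
sysuse auto, clear
ppmlhdfe price weight, a(turn) tolerance(1e-10) min_ok(3) maxiter(5000)
```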
### Simplex options

- `simplex_tol(1e-12)`: internal tolerance used within the simplex step. In particular, this is used to round to zero variables that are within `simplex_tol` of zero.
- `simplex_maxiter(1000)`: maximum number of iterations used within the simplex step.
### ReLU options

- `relu_tol(1e-4)`: used to set internal tolerances. For instance, calls to reghdfe will be set with a tolerance equal to `relu_tol/10`.
- `relu_maxiter(100)`: maximum number of iterations in the ReLU step. If exceeded, the step will stop but an error will not be raised.
- `relu_strict(0)`: if set to 1, will raise an error if `relu_maxiter` is exceeded.
- `relu_accelerate(0)`: if set to 1, will slightly increase the weights of observations that are persistently negative, usually leading to faster convergence. This is disabled by default, as it tends to slow down the other acceleration trick used (explained in the paper; see the line `utilde = u + utilde - u_last`).
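
A minimal sketch combining the simplex and ReLU caps (illustrative values; these caps only bind when separation is actually detected):

```stata
* Illustrative: larger iteration caps for the simplex and ReLU steps,
* raising an error if the ReLU cap is still exceeded.
sysuse auto, clear
ppmlhdfe price weight, a(turn) simplex_maxiter(2000) relu_maxiter(500) relu_strict(1)
```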
A "certificate of separation" is a variable z that can be used to certify that some observations are separated, and what regressors are causing the separation. To do so, it must satisfy two properties: a) z≥0
, and b) a least-squares regression between z
and all regressors should have an R2 of 1.
There are three undocumented options that allow you to construct and use a certificate of separation. Example:
```stata
clear
set obs 10
gen y = _n * (_n>4)
gen x = 2 * (_n==1) + 1 * (_n==2)
ppmlhdfe y x, tagsep(separated) zvar(z) r2
list
```
- `tagsep(..)`: saves an indicator variable flagging the observations identified as separated by the ReLU step.
- `zvar(..)`: saves the certificate of separation.
- `r2`: use it to run a least-squares regression between the certificate of separation `z` and all regressors.
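
You can also verify the two properties of the certificate by hand. A minimal sketch continuing the example above (the `assert` and `regress` calls are our own checks, not ppmlhdfe features):

```stata
* Property (a): the certificate is non-negative
assert z >= 0 if !missing(z)
* Property (b): regressing z on all regressors should give an R2 of
* (numerically) 1
regress z x
display e(r2)
```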
If you run `ppmlhdfe` with the `tagsep()` option as described above, or with `verbose(1)` or higher, you will see a table describing each iteration of the ReLU algorithm:
### Mu options

- `mu_tol(1e-6)`: criterion for when to tag an observation as separated. This will happen for all observations where `y = 0` and `μ < mu_tol`. Note that `μ = 1e-6` corresponds to `η = -13.82`.
  - Because some datasets are very skewed and have very low (but positive) y values, we do an extra adjustment. If `min(η | y > 0)` is below -5, we will make the tolerance more conservative by that amount. For instance, if `min(η | y > 0) = -8`, then we will have `log(mu_tol) = log(1e-6) + (-8 - (-5)) = log(1e-6) - 3 = -16.82`.
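
As a worked illustration of this adjustment rule, a small sketch of the arithmetic (our own code, not part of ppmlhdfe):

```stata
* Reproduce the example: min(eta | y > 0) = -8 is 3 below the -5 cutoff,
* so the log threshold moves from log(1e-6) = -13.82 to -16.82.
scalar min_eta_pos = -8
scalar log_mu_tol  = log(1e-6)
if (min_eta_pos < -5) {
    scalar log_mu_tol = log_mu_tol + (min_eta_pos + 5)
}
display log_mu_tol    // displays -16.815...
```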
## Display

You can use several estimation options to customize the regression output. For instance:

```stata
sysuse auto, clear
gen w = weight
ppmlhdfe price weight w i.rep, level(90) cformat(%6.3fc) noci baselevels noomitted vsquish
```

Produces:
You can also hide the iteration output, as well as warning messages, by using the `verbose(-1)` option. You go from:
To:
## Margins and esttab

To produce journal-style regression tables, you can do:

```stata
cls
estimates clear
sysuse auto, clear

qui ppmlhdfe price weight, a(turn) d
qui estpost margins, dydx(_all)
qui eststo

qui ppmlhdfe price weight length, a(turn trunk) d
qui estpost margins, dydx(_all)
qui eststo

estfe *, labels(turn "Turn FE" trunk "Trunk FE")
esttab, indicate(`r(indicate_fe)', labels("yes" "")) stat(N ll, fmt(%9.0fc %10.1fc)) se starlevels(* 0.1 ** 0.05 *** 0.01) label compress
```
Output:
For more information, see the `ppmlhdfe` article, as well as the esttab and margins manuals.
## Simplex solver

`ppmlhdfe` has a `simplex()` option that allows us to directly call the simplex solver, and even verify that it gave the correct answer. For instance, in the example below we pass a matrix of four observations and three regressors.

Note that `X1 + X2 + X3 = (-1, -1, -1, 0)`, so observations 1, 2, and 3 are separated.

```stata
ppmlhdfe, simplex( (1, -1, -1, 0 \ -1, 1, -1, 0 \ -1, -1, 1, 0)' ) ans(0, 0, 0, 1)
```

Partial output: