M-estimation in Julia
A goal of mine is to get up to speed with Catlab.jl, a Julia library for doing applied Category Theory. Since it’s written in Julia, I spent a bit of time getting hands-on Julia experience by toying around with something I know: M-estimation.
In this post, I code a stripped-down Julia version of geex, an R package I created to give developers a straightforward way to create new M-estimators. Neither this Julia code nor geex is designed to be computationally optimized. It is designed so that programmers write a single function, corresponding to \(\psi\) for their “\(\psi\)-type” M-estimator; i.e. the solutions to a set of estimating equations:
\(\sum_i \psi_i(\theta) = \sum_i \psi(O_i; \theta) = 0\)
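For instance (my own illustration, not from the geex documentation), the sample mean is the \(\psi\)-type M-estimator with

\[
\psi(O_i; \theta) = O_i - \theta,
\]

since \(\sum_i (O_i - \theta) = 0\) is solved by \(\hat{\theta} = \frac{1}{n}\sum_i O_i\).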
My hope is that geex makes statisticians’ lives easier by letting them skip the tedious task of programming M-estimators (particularly variance estimators) by hand.
My attempt at implementing a version of geex in Julia is:
using ForwardDiff, IntervalRootFinding, IntervalArithmetic, StaticArrays

make_m_estimator(f) =
    function(data::Array{Float64, 1}; rootBox)
        ψs = map(f, data)
        crush(g) = mapreduce(g, +, ψs)
        ψ(θ) = crush(ψᵢ -> ψᵢ(θ))
        A(θ) = crush(ψᵢ -> -ForwardDiff.jacobian(ψᵢ, θ))
        B(θ) = crush(ψᵢ -> ψᵢ(θ) .* ψᵢ(θ)')
        Σ(θ) = inv(A(θ)) * B(θ) * inv(A(θ))'
        θhat = mid.(interval.(roots(ψ, rootBox)))[1]
        Σhat = Σ(θhat)
        return (θhat, Σhat)
    end
That’s it. First of all, this version is less general than geex, as the estimating function only takes a 1D array of floats as data. Essentially, this limits the application to univariate data where each element of the input is a “unit”. To be useful in practice, it would need to handle multivariate data where the size of the units may be uneven. Nonetheless, the core ideas are there.
To use make_m_estimator, you supply a function. In this case, I’m replicating example 3 in the geex documentation.
myψ = x::Number ->
( (μ, σ², σ, lσ² ), ) ->
SVector( x - μ,
(x - μ)^2 - σ²,
sqrt(σ²) - σ,
log(σ²) - lσ²)
m = make_m_estimator(myψ)
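As a quick sanity check (my own toy evaluation, not part of the original example), a single ψᵢ can be called directly at a candidate θ to see the stacked estimating equations it encodes:

```julia
using StaticArrays

# Same ψ as above: stacked estimating equations for the mean, variance,
# standard deviation, and log-variance of a univariate sample.
myψ = x::Number ->
      ( (μ, σ², σ, lσ²), ) ->
      SVector( x - μ,
               (x - μ)^2 - σ²,
               sqrt(σ²) - σ,
               log(σ²) - lσ²)

# Evaluate the unit-level ψᵢ for the observation x = 0.5 at θ = (0, 1, 1, 0).
ψᵢ = myψ(0.5)
ψᵢ(SVector(0.0, 1.0, 1.0, 0.0))   # SVector(0.5, -0.75, 0.0, 0.0)
```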
Then to get estimates of \(\theta\) and its variance:
z = rand(1000)
Xμ = -1..1
Xσ = 0.001..0.25
r = m(z; rootBox = Xμ × Xσ × sqrt(Xσ) × log(Xσ))
The value of r[1] is:
[0.5144286972530171, 0.08643690499929671, 0.2940015391104215, -2.448340553175395]
and
using Statistics
(mean(z), var(z), sqrt(var(z)), log(var(z))) =
(0.5144286972530172, 0.08652342842772438, 0.29414865022250974, -2.447340052841812)
So you can see the approximations are pretty good for this example. (The small discrepancy in the variance-related terms arises because the estimating equation yields the variance with an \(n\) denominator, while Statistics.var uses \(n - 1\).)
Comparison to geex
There are a couple of implementation details worth noting:
- The root finding algorithm in the Julia version uses IntervalRootFinding.jl’s roots function, while geex (by default; the user can specify a root finding function) uses the rootSolve::multiroot function. I don’t know enough about the algorithms behind the functions to say much, but I’d like to understand the details and performance characteristics better.
- To find the derivative of \(\psi\), geex uses numDeriv::jacobian (again, by default) to numerically find the derivatives, while the Julia version uses automatic differentiation via the ForwardDiff.jl package.
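To illustrate the automatic differentiation side, here is ForwardDiff.jacobian applied to a toy two-equation ψ (my own example, not from either package): the derivatives are exact up to floating point rather than finite-difference approximations.

```julia
using ForwardDiff, StaticArrays

# Toy ψ(θ) = (θ₁² - θ₂, θ₁θ₂); its Jacobian is [2θ₁ -1; θ₂ θ₁].
ψ = θ -> SVector(θ[1]^2 - θ[2], θ[1] * θ[2])

J = ForwardDiff.jacobian(ψ, [1.0, 2.0])
# J == [2.0 -1.0; 2.0 1.0]
```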
These details are mostly hidden from the user, however, and the actual interface is quite similar (minus the cool factor of being able to use unicode characters in Julia):
library(geex)
mypsi <- function(data){
x <- data$x
function(theta){
c(x - theta[1],
(x - theta[1])^2 - theta[2],
sqrt(theta[2]) - theta[3],
log(theta[2]) - theta[4])
}
}
To put mypsi in action:
dt <- data.frame(id = 1:1000, x = rnorm(1000))
r <- m_estimate(estFUN = mypsi,
data = dt,
units = "id",
root_control = setup_root_control(start = c(0, 1, 1, 0)))
coef(r)
## [1] 0.003024682 1.053540090 1.026421010 0.052156008
Summary
- I was able to get up and going with Julia rather quickly. Overall, my experience was good.
- In the future I hope to look into ways of computationally optimizing the approach to M-estimation implemented above. Julia has a reputation for being fast, so perhaps I may dig around in the Julia ecosystem further…
- …it looks like there already is an MEstimation.jl package. Definitely seems worth checking out!