
Forum: open-discussion

RE: how should results compare with Stata mvprobit
By: Arne Henningsen on 2012-01-28 21:13
[forum:5497]
BTW: I once compared the estimates obtained from R/mvProbit with the estimates that a co-author obtained from LIMDEP/NLOGIT: while the estimated coefficients were very similar, some of the estimated marginal effects were very different (even in sign). Finally, we chose the marginal effects obtained by R/mvProbit, because they were more consistent with the (signs of the) estimated coefficients and we could verify the code of R/mvProbit but not the code of LIMDEP/NLOGIT. Does STATA return marginal effects? Are they similar to the marginal effects returned by R/mvProbit? Please note that R/mvProbit can return different types of marginal effects (unconditional and conditional, with different assumptions about the values of the other dependent variables).
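For reference, the different types can be requested roughly like this (a hedged sketch: 'fit' stands for a fitted mvProbit model, and the argument names 'cond' and 'othDepVar' are those I recall from the package documentation — please check ?margEff before relying on them):

```r
## unconditional marginal effects (the default)
margEff( fit )
## marginal effects conditional on the other dependent variables
margEff( fit, cond = TRUE )
## conditional, with the other dependent variables fixed at one
margEff( fit, cond = TRUE, othDepVar = 1 )
```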

RE: how should results compare with Stata mvprobit
By: Arne Henningsen on 2012-01-28 20:55
[forum:5496]
The correlation coefficients are estimated in the same way as the coefficients of the explanatory variables: the (log-)likelihood function depends on all parameters (correlation coefficients + coefficients of explanatory variables) and it is maximised with respect to all of them. In my experience, the log-likelihood function can be very flat around some correlation coefficients, and the effect of the correlation coefficients on the log-likelihood function can depend considerably on the precision of the numerical integration. Therefore, I strongly recommend re-estimating the model with a higher precision of the numerical integration (both in R/mvProbit and STATA). Furthermore, you might try out the other things that I suggested in my previous post.

As I have neither STATA nor LIMDEP/NLOGIT, I am very interested in your comparisons.

RE: how should results compare with Stata mvprobit
By: Paul Johnson on 2012-01-28 18:36
[forum:5495]
Thanks, Arne.

I'm not exactly asking for a statistics lesson, but it may seem that way. In mvProbit, I need to understand how the correlation coefficients are identified so I can figure out whether something is wrong, because the estimates from the Stata module are so different. They differ in sign, not just magnitude.

The estimates on the slope coefficients are mostly consistent.

I suppose I should construct a simulated data set in which I know what the estimates are supposed to be, so I can compare across mvProbit and the Stata module. Have you tried to do something like that? If not, I will, and I can share it back to you. If you did, save me the trouble and post it up.
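Such a simulation could be sketched in base R like this (the true coefficients and correlations below are made up purely for illustration; the mvProbit call follows the package's cbind() formula interface):

```r
## Sketch: simulate trivariate probit data with known parameters
set.seed( 123 )
n <- 5000
x1 <- rnorm( n )
x2 <- rnorm( n )
X  <- cbind( 1, x1, x2 )

## true coefficients, one column per equation (made-up values)
beta <- cbind( c(  0.5,  1.0, -0.5 ),
               c( -0.2,  0.8,  0.3 ),
               c(  0.1, -0.6,  0.9 ) )

## true error correlation matrix (rho12 = 0.4, rho13 = -0.3, rho23 = 0.2)
Sigma <- matrix( c(  1.0, 0.4, -0.3,
                     0.4, 1.0,  0.2,
                    -0.3, 0.2,  1.0 ), nrow = 3 )

## correlated standard normal errors via the Cholesky factor
eps <- matrix( rnorm( 3 * n ), ncol = 3 ) %*% chol( Sigma )

## observed binary outcomes: 1 if the latent index is positive
y <- ( X %*% beta + eps ) > 0
dat <- data.frame( y1 = y[ , 1 ], y2 = y[ , 2 ], y3 = y[ , 3 ],
                   x1 = x1, x2 = x2 )

## estimates should recover beta and Sigma up to simulation noise:
## library( mvProbit )
## fit <- mvProbit( cbind( y1, y2, y3 ) ~ x1 + x2, data = dat )
## summary( fit )
```

The same simulated data frame can be exported and fed to Stata's mvprobit, so both estimators are judged against the known truth rather than against each other.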

I do not have a copy of Greene's estimator for mv probit models, but I bet I can find somebody who does have it, so we could get a comparison case from that as well. I gather he's branched from LimDep to a new thing called NLogit or some such.

RE: how should results compare with Stata mvprobit
By: Arne Henningsen on 2012-01-27 06:02
[forum:5487]
Dear Paul

Thanks a lot for starting this interesting discussion. You are right: the calculation of the likelihood function of multivariate probit models requires numerical integration. Different algorithms are available for the numerical integration of the multivariate normal distribution. The R function mvProbit() allows the user to choose among four different algorithms, where the GHK (Geweke-Hajivassiliou-Keane) simulator is used by default. Another very suitable option is the algorithm developed by Genz (1992, 1993) and Genz and Bretz (2002), which can be chosen by setting argument 'algorithm' to 'GenzBretz()'. All these algorithms only return approximations of the true integrals, where the user can choose the precision. However, increasing the precision also increases the execution time. Therefore, I suggest using a moderate precision (e.g. the default precision) for an exploratory analysis and a higher precision for estimating the final model(s). This can be done, e.g., by

mvProbit( ..., nGHK = 5000 )

or

mvProbit( ..., algorithm = GenzBretz( maxpts = 1e6, abseps = 1e-5 ) )

Particularly when using higher precisions, it might be useful to set argument 'print.level' to a value larger than zero, e.g. '3', so that you can follow the progress of the convergence.

Finally, it might be a good idea to try different optimisation methods, particularly the BFGS method.
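Putting these suggestions together, a call could look like this (a sketch only: the data set and variable names are placeholders, and 'method' and 'print.level' are passed on to the optimiser as I recall from the package documentation):

```r
## higher GHK precision, BFGS optimisation, verbose progress output
fit <- mvProbit( cbind( y1, y2, y3 ) ~ x1 + x2, data = dat,
                 nGHK = 5000, method = "BFGS", print.level = 3 )
```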

When comparing the estimates from different software packages (or different integration algorithms), it is often useful to use the estimates of one software package as starting values in the other software package, and vice versa.

Different software packages (or different integration algorithms or different precisions of the numerical integration) usually result in different log-likelihood values even for identical estimates and identical data. Therefore, when comparing the log-likelihood values of different estimates, the same software, the same integration algorithm, and the same precision of the numerical integration should be used.

Please note that often the estimates from software package (or integration algorithm or precision) A have a higher log-likelihood value than the estimates from software package B when both log-likelihood values are computed with software package A, but a lower log-likelihood value when both are computed with software package B. This is not really surprising, as each software package maximises the log-likelihood value using its own integration algorithm and precision. Increasing the precision of the integration algorithms in both software packages usually solves this problem.
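A hedged sketch of that cross-check ('startStata' would hold the coefficients and correlation parameters exported from Stata's mvprobit; the 'start' argument name and the data/variable names follow the mvProbit documentation as I recall it):

```r
## estimate in R, and re-estimate from Stata's solution, at high precision
fitR     <- mvProbit( cbind( y1, y2, y3 ) ~ x1 + x2, data = dat,
                      nGHK = 5000 )
fitStata <- mvProbit( cbind( y1, y2, y3 ) ~ x1 + x2, data = dat,
                      start = startStata, nGHK = 5000 )

## compare log-likelihoods computed with the SAME software,
## integration algorithm, and precision
logLik( fitR )
logLik( fitStata )
```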

how should results compare with Stata mvprobit
By: Paul Johnson on 2012-01-25 04:20
[forum:5468]
Greetings.

Arne suggested I ask this question here so that there would be a public record on it.

I was excited to see the mvProbit package because I get a lot of
questions from students who have data of that type. We analyze a lot of survey data, which is always categorical.

Recently, I had a student who needed to estimate a probit model with 3 outcomes. He had used the Stata module mvprobit. I ran that same data through the R mvProbit estimator and the coefficient estimates were mostly on the same scale--the same things were big and small, statistically significant, and so forth. I'm no expert on mv models, but from the Stata module article, I see they approximate the integral calculations with MC simulation, so there should always be some small differences. Right?

However, the estimated correlations among the errors were quite
different. After noting the difference, I gave up on learning the
details of your estimator, and told him to use Stata. But now I'm
curious to understand this a little bit. When I fit his data with
Stata, the correlations between the errors are negative, negative, and positive, while with your mvProbit, there is one negative and two positives.

Are these differences "supposed" to happen because of different
identification assumptions? I understand the identification
assumptions that go into a univariate probit model's slope
coefficients, but have no idea how the correlation estimates get
traction. Do you know what I mean?

--
Paul E. Johnson
Professor, Political Science
University of Kansas
