Forum: open-discussion
Monitor Forum | Start New ThreadRE: Compare mvProbit (R), mvprobit (Stata), Mplus output [ Reply ] By: Arne Henningsen on 2015-02-18 19:54 | [forum:41925] |
Dear Paul Thanks a lot for sharing your very interesting observations! You may try to use the estimates of one software as starting values in another software and check how this affects the estimates. Best wishes, Arne |
Compare mvProbit (R), mvprobit (Stata), Mplus output [ Reply ] By: Paul Johnson on 2015-02-18 16:52 | [forum:41922] |
I've got a side-by-side comparison to share. If you want to repeat, run the R thing first, it will manufacture data (into workingdata folder), and the Stata and Mplus scripts will use same. I put the working folder here http://pj.freefaculty.org/scraps/mvprobit That has the full code for all 3 as well as the output, I'm just pasting in some highlights below. The big conclusion so far is that all of the methods return the same coefficient estimates on the predictors, and there are differences for the estimates of the random effects. The puzzle for me right now is that Mplus is super fast, giving estimates quite similar to R mvProbit, and the error correlation estimates from Stata are systematically different. I'm not enough of an expert on numerical methods to diagnose these differences, but I'm learning as I go. The data & model is based on a simple example in mvProbit's help, with 3 dichotomous outcomes and 2 predictors. Output summary R: Coefficients: Estimate Std. error t value Pr(> t) b_1_0 0.89680 0.14243 6.297 3.04e-10 *** b_1_1 1.26491 0.24651 5.131 2.88e-07 *** b_1_2 -0.96857 0.15724 -6.160 7.29e-10 *** b_2_0 -0.55607 0.14234 -3.907 9.36e-05 *** b_2_1 1.14664 0.22613 5.071 3.96e-07 *** b_2_2 -1.71012 0.20103 -8.507 < 2e-16 *** b_3_0 0.60155 0.12544 4.796 1.62e-06 *** b_3_1 -0.88220 0.18228 -4.840 1.30e-06 *** b_3_2 1.02642 0.12082 8.495 < 2e-16 *** R_1_2 0.06633 0.19285 0.344 0.7309 R_1_3 0.44016 0.17237 2.554 0.0107 * R_2_3 -0.22936 0.12369 -1.854 0.0637 . Compare the R_* estimates to rho in the Mplus output and the With values in Mplus. Stata (1000 draws ghk): I don't know what /atrho is, as opposed to rho. Log likelihood = -329.11704 -------------------------------------- | Coef. Std. Err. -------------+------------------------ y1 | x1 | 1.260093 .2474644 x2 | -.9638219 .1479759 _cons | .896007 .1386929 -------------+------------------------ y2 | x1 | 1.14732 .2129341 x2 | -1.70338 .1815299 _cons | -.5584081 .14178 -------------+------------------------ y3 | x1 | -.8754827 .1774318 x2 | 1.02252 .1176464 _cons | .6052505 .1254032 -------------+------------------------ /atrho21 | .105835 .1530473 /atrho31 | .2967765 .1620035 /atrho32 | -.2037864 .1221816 -------------+------------------------ rho21 | .1054416 .1513457 rho31 | .2883599 .1485327 rho32 | -.2010115 .1172448 -------------------------------------- Mplus: Note Mplus estimates thresholds instead of intercepts, so signs are reversed. The correlation estimates Estimate S.E. Y1 ON X1 1.234 0.228 X2 -0.968 0.154 Y2 ON X1 1.122 0.213 X2 -1.688 0.182 Y3 ON X1 -0.899 0.179 X2 1.036 0.116 Y2 WITH Y1 0.076 0.177 Y3 WITH Y1 0.449 0.158 Y2 -0.235 0.107 Thresholds Y1$1 -0.900 0.141 Y2$1 0.550 0.140 Y3$1 -0.624 0.123 R mvProbit Stata mvprobit Mplus R_1_2 0.06633 .1054416 0.076 R_1_3 0.44016 .2883599 0.449 R_2_3 -0.22936 -.2010115 -0.235 I realize from this list's history that I have been at this point before and have completely forgotten it. Arne asked if the estimated marginal effects match up, and I do not yet have an answer. I am trying to understand Greene's comments on that and at a certain level my age & math limitations cause me trouble. I think I see ways to make R mvProbit go faster, there's a for loop that does repeated calculations and there is a way to avoid some repetition and the ghkvec call can be vectorized, I think (hope). While speeding up mvProbit is an objective, it is important to note now that I do not think mvProbit is currently slower than Stata mvprobit. At first it appeared R mvProbit was grossly slower than Stata mvprobit, however, I think that was due to different settings on the number of ghk draws. The default nGHK in R mvProbit is 1000, and if I tune the number of draws in Stata to 5. If we set draws = 1000 in Stata, it takes much much longer than mvProbit in R. I can't say for sure if the tolerance critieria are comparable, however. Anyway, R mvProbit is not grossly slower than Stata mvprobit. If you look in the Stata output directory, find the log file, you see 2 fits, one for a lower number of draws, one for 1000 ghk draws. The parameter estimates for predictors are similar, the correlation estimates are different. Mplus, using some code that my GRAs say is "tricking" Mplus into estimating a comparable model, is lightning fast. It returns so fast that I suspect it is using a look-up table for the normal approximations. Since we are "tricking" Mplus to make this work, you might be cautious. However the coefficient estimates are similar. So far, it is comforting that all of these programs are in the same "ballpark" on the error correlations, but also frustrating that they differ and I'm having to wade through a lot more details than I want to. |