Forum: open-discussion

RE: mvProbit: How to use weight? What is the base for category variables?
By: Arne Henningsen on 2013-06-13 09:29

[forum:39664]

Dear Arnob and Paul

@Paul: Thanks a lot for answering Arnob's question. I totally agree with your answer.

Arnob's problem is definitely a shortcoming of the "mvProbit" package that should be addressed in the future. If somebody adds a feature request at R-Forge, it is less likely that I will forget it ;-)

Until somebody has fixed this issue, Arnob and other users who have categorical ("factor") explanatory variables in mvProbit could manually create dummy variables for all but one level of each factor variable.
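A minimal sketch of that workaround, assuming a data frame with a character column `age` holding Arnob's three categories; the names `farm`, `age50to65` and `ageAbove65` are made up for illustration. The level that gets no dummy ("less than 50" here) becomes the base category.

```r
# Hypothetical example data: 'age' coded with Arnob's three categories
age  <- c("less than 50", "50-65", "above 65", "above 65", "less than 50")
farm <- data.frame(age = age, stringsAsFactors = FALSE)

# One 0/1 dummy for every level except the chosen base ("less than 50")
farm$age50to65  <- as.integer(farm$age == "50-65")
farm$ageAbove65 <- as.integer(farm$age == "above 65")

print(farm)
```

The model formula would then list the dummies (e.g. `age50to65 + ageAbove65`) instead of `age`, so the base category is fixed by construction rather than by R's level ordering.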

@Arnob: What kind of weights do you want to set?

Best regards,
Arne

RE: mvProbit: How to use weight? What is the base for category variables?
By: Paul Johnson on 2013-05-28 19:26

[forum:39618]

Hello Arnob:

I'm not Arne, but I can help with the categorical predictors. There is no single hard-and-fast answer to "which level was omitted", because R has several built-in contrast methods, and depending on how your data are "read into" R, you can get different answers. The default coding is set by the "contrasts" option, which you can inspect with the options() function.

> options("contrasts")
$contrasts
        unordered           ordered
"contr.treatment"      "contr.poly"

The default is treatment contrasts, which means the first level of a factor has no parameter associated with it. But that is not always what you expect, and I believe you were right to be cautious because mvProbit does not label the estimates.
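Under treatment contrasts, the base category is simply the first element of `levels()`, and `relevel()` (from the stats package) moves a chosen level to the front. A short sketch using Arnob's age categories; picking "less than 50" as the base is just an example:

```r
x  <- c("less than 50", "50-65", "above 65", "above 65")
xf <- factor(x)

# With contr.treatment, the first level is the base (no parameter)
levels(xf)[1]

# Move "less than 50" to the front so that it becomes the base category
xf2 <- relevel(xf, ref = "less than 50")
levels(xf2)

# Contrast columns (i.e. estimated parameters) exist only for the other levels
colnames(contrasts(xf2))
```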

If you have not used R much (or at all), I would say that starting with mvProbit is a little like learning to swim on the day of your first Olympic swimming match. Also note that the mvProbit instructions still say it is a test version, and, as you noted, its output is still of the "bare metal" variety (see the warnings output below).

Now, about categories. This is not an mvProbit question, so it is off topic for this list, but... You can find out how R sees your factors with commands like the ones below, which I ran on an example factor I created. By default, R sorts the levels of a new factor alphabetically, which puts labels beginning with digits before labels beginning with letters. So unless you have made a special effort, I'm pretty sure your variable's base category is not what you think it is.

> x <- c("less than 50", "50-65", "above 65", "above 65")
> xf <- factor(x)
> levels(xf)
[1] "50-65"        "above 65"     "less than 50"
> contrasts(xf)
             above 65 less than 50
50-65               0            0
above 65            1            0
less than 50        0            1

Here's where your caution is rewarded: the levels are not labeled in the mvProbit output, so, as you noticed, you cannot be sure which is which from that output alone.
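The "special effort" is usually to state the level order when creating the factor, so the base category is the one you intend rather than the one alphabetical sorting produces. A minimal sketch, again with the age example:

```r
x <- c("less than 50", "50-65", "above 65", "above 65")

# State the level order explicitly instead of relying on alphabetical sorting
xf <- factor(x, levels = c("less than 50", "50-65", "above 65"))

# A value missing from 'levels' would silently become NA, so check for that
stopifnot(!any(is.na(xf)))

levels(xf)
contrasts(xf)   # the "less than 50" row is all zeros: it is the base
```

Note that `factor(..., ordered = TRUE)` would switch to the `contr.poly` default for ordered factors shown by options("contrasts"), so it is not what you want if you need a base category.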

I'd suggest getting some familiarity with R more generally. You should probably first run your model the old way, one dependent variable at a time, with glm() using the correct family and link (binomial with a probit link), and after that graduate to mvProbit or something similar.
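A sketch of the one-equation-at-a-time approach on simulated data (the variables `adopt1`, `adopt2` and `age` and the data frame `d` are hypothetical). `glm()` with `family = binomial(link = "probit")` fits a univariate probit, and its coefficients carry names such as `age50-65`, so the base category is visible immediately:

```r
set.seed(1)
n <- 200

# Hypothetical simulated data: one factor predictor, two binary outcomes
d <- data.frame(
  age = factor(sample(c("less than 50", "50-65", "above 65"), n, replace = TRUE),
               levels = c("less than 50", "50-65", "above 65"))
)
d$adopt1 <- rbinom(n, 1, 0.4)
d$adopt2 <- rbinom(n, 1, 0.6)

# Separate univariate probit models, one per dependent variable
fit1 <- glm(adopt1 ~ age, family = binomial(link = "probit"), data = d)
fit2 <- glm(adopt2 ~ age, family = binomial(link = "probit"), data = d)

# glm labels every coefficient, so the omitted (base) level is obvious
coef(fit1)
coef(fit2)
```

Separate univariate probits ignore the correlation between the equations' error terms that mvProbit estimates, so they are a check on the data coding, not a substitute for the multivariate model.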

Now, when you come back to mvProbit, you can see for yourself what mvProbit is doing. Type "mvProbit" and hit return (no parentheses). That shows the code it is running. Scan down to the part that creates xMat; you'll see it is using R's model.matrix function (see ?model.matrix and example("model.matrix") and you'll get the idea). If you took your model formula and data frame, you could run model.matrix() yourself, and then you would know exactly what data mvProbit is using to estimate your model. No guessing needed there.
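A sketch of running `model.matrix()` yourself on a small hypothetical data frame (`age` and `area` are made-up variables). The column names show exactly which dummies enter the model. The last step builds a lookup table under an assumption that is not confirmed here: that mvProbit's `b_j_k` numbering runs over these columns in order, starting at 0 for the intercept, as the `b_2_0, b_2_1, ...` names in Arnob's output suggest. Check that against the mvProbit code before relying on it.

```r
# Hypothetical data frame with a factor predictor and a numeric one
d <- data.frame(
  age  = factor(c("less than 50", "50-65", "above 65", "above 65", "50-65"),
                levels = c("less than 50", "50-65", "above 65")),
  area = c(1.5, 2.0, 0.8, 3.1, 2.2)
)

# Same right-hand side you would give mvProbit
X <- model.matrix(~ age + area, data = d)
colnames(X)
X

# ASSUMPTION: b_<equation>_<k> uses k = 0, 1, ... for the columns of X in order
lookup <- data.frame(coef   = paste0("b_1_", seq_len(ncol(X)) - 1),
                     column = colnames(X),
                     stringsAsFactors = FALSE)
lookup
```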

I had not tested mvProbit since R 3.0 was released, so today, with R 3.0.1, I installed mvProbit and ran its example (example(mvProbit)), which results in a lot of warnings that a well-educated user ought to track down.

There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: glm.fit: fitted probabilities numerically 0 or 1 occurred
2: In pmvnormWrap(upper = xBetaTmp, sigma = sigmaTmp, algorithm = algorithm, ... :
  the correlation matrix is not positive definite
3: In pmvnormWrap(upper = xBetaTmp, sigma = sigmaTmp, algorithm = algorithm, ... :
  the correlation matrix is not positive definite

....

I'm sorry to say I don't know of a "turnkey" finished R project that will get this done for you in a tidy way. If you find one, please let me know.

Paul Johnson <pauljohn at ku.edu>
University of Kansas

mvProbit: How to use weight? What is the base for category variables?
By: Arnob Hoque on 2013-02-27 06:37

[forum:39617]

Hi Arne
I am a new user of R.
I am using a multivariate probit model to model farmers' simultaneous crop adoption decisions. Most of my independent variables are categorical variables.

I wonder what mvProbit in R treats as the base category for categorical variables. For example, if "age" is a categorical variable in my model with 3 categories, "less than 50", "50-65" and "above 65", which one would the package consider the base category? My understanding is that it is not using the lowest category as the base category. How can I check in R which category of each variable has been used as the base category?

Since mvProbit reports the estimation results without labels (they are given as b_2_0, b_2_1, b_2_2, ..., b_3_0, b_3_1, ...), I cannot identify which categories are omitted as base categories. Is there any option I can set so that the results show the variable labels instead?

In addition, could you please tell me how to use weights in mvProbit directly?

Sorry for asking too many questions in one thread. Thanks in advance.