Forum: help

RE: Beginner Question
By: Arne Henningsen on 2015-05-16 08:31

[forum:42247]

Dear Max

I do not fully understand how you want to do the pooling. Sorry! It seems to me that you are looking for something like the "oheckman" procedure in Stata [1]. Unfortunately, I am not aware of an implementation of this estimator in R.

[1] http://www.stata-journal.com/sjpdf.html?articlenum=st0123

Best regards,
Arne

RE: Beginner Question
By: Max Thomasberger on 2015-04-08 09:36

[forum:42141]

Dear Arne,

Thank you very much for your answer; you helped me and my thesis a lot! :)

Since I ultimately want to do an Oaxaca/Blinder decomposition between the genders, I will have to correct for sample selection in the full-time sample. So I'll have to pool the non-workers with the part-time workers (participation dummy = 0) and treat the full-time workers as the selected group (participation dummy = 1), correct?

I like the idea of an ordered probit because I feel that it is a more realistic approach. Do you have a suggestion for a textbook or paper that uses ordered probits for sample selection?

Thanks again very much for your help!

Max

RE: Beginner Question
By: Arne Henningsen on 2015-04-07 05:47

[forum:42137]

Dear Max

I am not sure whether I understood your question correctly. Do you want to separate the observations into three categories (no employment, part-time employment, and full-time employment), using an arbitrarily chosen threshold for weekly working hours to distinguish part-time from full-time employment, and then estimate the wage equations separately for the two types of employment? In this case, it is probably best to model the selection decision with an ordered probit model, which is not implemented in the sampleSelection package.

In some empirical applications, it could perhaps be suitable to combine the non-employed and the part-time employed observations and then use a standard Heckman sample selection model to estimate the wage equation for the full-time employed observations. However, I think that it is never suitable to combine the non-employed and the full-time employed observations and then use a standard Heckman sample selection model to estimate the wage equation for the part-time employed observations, because the non-employed and the full-time employed observations are very different.
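As a rough illustration of the pooling idea (non-employed and part-time observations combined, full-time observations selected), the following sketch uses the Mroz87 data from the sampleSelection package. The cut-off of 1750 hours and the names `fulltime_cutoff`, `fulltime`, and `result_fulltime` are assumptions made only for this example. The cut-off also assumes that `Mroz87$hours` records hours worked over the whole year, so 1750 stands in for roughly 35 hours per week; it is not a recommended definition of full-time work.

```r
library(sampleSelection)

data(Mroz87)
Mroz87$kids <- (Mroz87$kids5 + Mroz87$kids618 > 0)

# Hypothetical threshold (assumption): hours is taken to be annual hours,
# so 1750 corresponds to roughly 35 hours per week over 50 weeks.
fulltime_cutoff <- 1750

# Selection dummy: 1 = full-time, 0 = non-employed or part-time (pooled)
Mroz87$fulltime <- as.numeric(Mroz87$lfp == 1 &
                              Mroz87$hours >= fulltime_cutoff)

# Standard two-step Heckman model; the wage equation is estimated
# only for the observations with fulltime == 1.
result_fulltime <- heckit(fulltime ~ age + I(age^2) + faminc + kids + educ,
                          wage ~ exper + I(exper^2) + educ + city,
                          data = Mroz87)
summary(result_fulltime)
```

Whether this pooling is defensible depends on the application; the sketch only shows the mechanics of redefining the selection dummy.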

Best regards,
Arne

RE: Beginner Question
By: Max Thomasberger on 2015-03-30 09:41

[forum:42118]

Dear Arne,

I'd like to pick your brain one more time. I know that this forum might not be the best place to ask these questions, but since you are a specialist on sample selection (in fact the only one I "know"), the answer might be very easy for you.

The textbook example of a Heckman two-step sample selection model is the choice between labour force participation and non-participation (either a zero log hourly wage or a log hourly wage above zero). But there could also be sample selection between full-time and part-time work, and here the difference between the log hourly wages is not as clear as in the first case.

Could one use the Heckman two-step model for this case as well, or are the basic assumptions of the Heckman model violated in this case?

I am sorry if I overstepped or wasted your time.

All the best,
Max

RE: Beginner Question
By: Max Thomasberger on 2015-03-29 12:06

[forum:42116]

Dear Arne,

Thanks for your quick and helpful reply. As with many good answers, the solution seems very obvious :)

All the best,
Max

RE: Beginner Question
By: Arne Henningsen on 2015-03-28 21:18

[forum:42115]

Dear Max

The second step of the two-step procedure only includes observations of labour market participants:

result_ols <- lm(wage ~ exper + I(exper^2) + educ + city + IMR,
                 data = Mroz87, subset = Mroz87$lfp == 1)

This command returns the same coefficient estimates as the heckit() command. Please note that the standard errors differ, because heckit() takes into account that IMR is a generated explanatory variable that is based on the first-step estimation, while lm() does not take this into account.
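For readers who want to see both steps spelled out, here is a minimal sketch that computes the inverse Mills ratio by hand from a probit fitted with glm() instead of extracting it from the heckit() result. The object names `probit_step1`, `xb`, `IMR_manual`, `ols_step2`, and `heck` are chosen only for this illustration. The point estimates should agree with heckit() up to numerical tolerance, while the lm() standard errors remain uncorrected as described above.

```r
library(sampleSelection)

data(Mroz87)
Mroz87$kids <- (Mroz87$kids5 + Mroz87$kids618 > 0)

# Step 1: probit for labour force participation, using ALL observations
probit_step1 <- glm(lfp ~ age + I(age^2) + faminc + kids + educ,
                    family = binomial(link = "probit"), data = Mroz87)

# Inverse Mills ratio: dnorm(xb) / pnorm(xb) at the probit linear predictor
xb <- predict(probit_step1, type = "link")
Mroz87$IMR_manual <- dnorm(xb) / pnorm(xb)

# Step 2: OLS with the IMR as an additional regressor,
# using labour market participants ONLY
ols_step2 <- lm(wage ~ exper + I(exper^2) + educ + city + IMR_manual,
                data = Mroz87, subset = lfp == 1)

# Reference: the two-step estimator from sampleSelection
heck <- heckit(lfp ~ age + I(age^2) + faminc + kids + educ,
               wage ~ exper + I(exper^2) + educ + city,
               data = Mroz87)

# Compare the point estimates of the manual second step with heckit()
print(coef(ols_step2))
print(coef(heck))
```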

Best regards,
Arne

Beginner Question
By: Max Thomasberger on 2015-03-28 13:20

[forum:42114]

Hi,

I am learning sample selection theory for my master's thesis (I want to use the Oaxaca/Blinder decomposition corrected for sample selection, as in Reimer 1983) and want to replicate the first example from

http://www.inside-r.org/packages/cran/sampleSelection/docs/heckit

(Greene (2003), Example 22.8, page 786)

If I run the code, extract the calculated inverse Mills ratio, and run an OLS regression on the same equation, I get different results, which puzzles me...

Can anybody give me a hint how this is possible? I obviously didn't understand something very important. I thought the second step in the two-step procedure simply uses the inverse Mills ratio as an additional regressor in OLS, but the results of the two approaches are very different.

Thank you very much in advance!
Max

Code:

rm(list = ls())
library(texreg)
library(sampleSelection)  # package name is case-sensitive
library(stargazer)

data(Mroz87)
Mroz87$kids <- (Mroz87$kids5 + Mroz87$kids618 > 0)
# Two-step Heckman model: selection equation first, outcome equation second
result_selection <- heckit(lfp ~ age + I(age^2) + faminc + kids + educ,
                           wage ~ exper + I(exper^2) + educ + city, Mroz87)
# Extract the inverse Mills ratio and add it as a regressor in OLS
Mroz87$IMR <- result_selection$invMillsRatio
result_ols <- lm(wage ~ exper + I(exper^2) + educ + city + IMR, data = Mroz87)

stargazer(result_selection, result_ols, type="text")

> stargazer(result_selection, result_ols, type="text")

==========================================================
                            Dependent variable:
                    --------------------------------------
                                    wage
                         Heckman              OLS
                        selection
                           (1)                (2)
----------------------------------------------------------
exper                     0.021             0.170***
                         (0.062)            (0.039)

I(exper2)                 0.0001            -0.003**
                         (0.002)            (0.001)

educ                     0.417***           0.264***
                         (0.100)            (0.073)

city                      0.444              0.064
                         (0.316)            (0.228)

IMR                                        -2.337***
                                            (0.840)

Constant                  -0.971             -0.620
                         (2.059)            (1.419)

----------------------------------------------------------
Observations               753                753
R2                        0.126              0.171
Adjusted R2               0.116              0.165
rho                       -0.343
Inverse Mills Ratio   -1.098 (1.266)
Residual Std. Error                     2.962 (df = 747)
F Statistic                      30.742*** (df = 5; 747)
==========================================================
Note:                          *p<0.1; **p<0.05; ***p<0.01