
Forum: developers

RE: gradient attribute and optim-based optimizers [ Reply ]
By: Arne Henningsen on 2010-07-10 10:54
[forum:3109]
Yes, I agree with an all-upper-case name.

I have sent an email to Yves (cc to you) and asked him if he has a good idea for the name of the BFGSYC method and how we should give credit to him (for providing the code).

RE: gradient attribute and optim-based optimizers [ Reply ]
By: Ott Toomet on 2010-07-09 12:52
[forum:3108]
I am pretty liberal with the naming. Perhaps BFGS1 or BFGS2? I would keep this in upper case, though (we have all the maxXXX functions in upper case, right?). Actually, I would like something with a dash, but that doesn't work.

We might ask YC whether it was him who actually implemented the function. Perhaps he borrowed it from somewhere as well..

RE: gradient attribute and optim-based optimizers [ Reply ]
By: Arne Henningsen on 2010-07-08 16:15
[forum:3106]
Hi Ott,

I can agree with your suggestion. However, I am not sure if maxBFGSYC() is really a good name for this function. What do you think about maxBFGSR(), maxBFGSr(), or maxRBFGS() (as the BFGS algorithm in optim() is, as far as I know, not written in R)? Do you have a better idea?

I think that there is a better place to give credit to Yves Croissant than a function name. Should we add his name in the DESCRIPTION file together with Spencer Graves under "with contributions from" and mention his name in the documentation of maxBFGSYC()?

RE: gradient attribute and optim-based optimizers [ Reply ]
By: Ott Toomet on 2010-07-08 15:36
[forum:3105]
Hi,

I would perhaps go for two distinct functions, as they are now (maxBFGS & maxBFGSYC or something like that).

* This seems more consistent to me -- you just change the function name to change the algorithm.

* This avoids another argument which is essentially the same as "method".

* I feel parameter names get more easily overcrowded than function names.

What do you think?

RE: gradient attribute and optim-based optimizers [ Reply ]
By: Arne Henningsen on 2010-07-07 16:00
[forum:3103]
a) Done.

b) Great idea! I have implemented this as you suggested.

d) Good points. So what would be the most user-friendly user interface?
Suggestion:
maxBFGS( ..., implementation = "optim" )
equal to
maxLik( ..., method = "BFGS", implementation = "optim" )
equal to
maxLik( ..., method = "BFGSoptim"

and
maxBFGS( ..., implementation = "YC" )
equal to
maxLik( ..., method = "BFGS", implementation = "YC" )
equal to
maxLik( ..., method = "BFGSYC"

Maybe we should abbreviate "implementation" and "optim" or find other shorter names.
Which implementation should be the default?
What do you think?

RE: gradient attribute and optim-based optimizers [ Reply ]
By: Ott Toomet on 2010-06-28 11:41
[forum:3057]
a) Valid point.

b) OK, I later had a related idea. We might do something like this:

grad <- function( par, ... ) {
   ## return the cached gradient if the parameters are unchanged
   if( identical( par, oldpar ) ) {
      return( oldgrad )
   } else {
      ## otherwise re-evaluate the log-likelihood and extract its "gradient" attribute
      g <- attributes( logLik( par, ... ) )$gradient
      return( g )
   }
}
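One way to give oldpar and oldgrad a home would be a closure around the user's log-likelihood function -- a minimal sketch (not existing maxLik code), assuming logLik() returns its value with a "gradient" attribute:

makeGrad <- function( logLik ) {
   oldpar <- NULL
   oldgrad <- NULL
   function( par, ... ) {
      ## reuse the cached gradient if the parameters have not changed
      if( !is.null( oldpar ) && identical( par, oldpar ) ) {
         return( oldgrad )
      }
      ## otherwise evaluate the log-likelihood and cache the result
      g <- attributes( logLik( par, ... ) )$gradient
      oldpar <<- par
      oldgrad <<- g
      g
   }
}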

d) Well... How would you distinguish between these two flavors of BFGS on the command line? maxLik( ..., method="bfgs", flavor="YC" ) seems a little too clunky. maxBFGS( ..., method="YC" ) might work, although then the methods are not that easily swappable.

RE: gradient attribute and optim-based optimizers [ Reply ]
By: Arne Henningsen on 2010-06-28 11:33
[forum:3056]
a) If both argument "grad" and attribute "gradient" are supplied, using the attribute and ignoring "grad" is computationally more efficient than using "grad", because the gradients are computed twice in the latter case. Of course, it is easier to add/remove argument "grad" than to change the objective function. However, if both argument "grad" and attribute "gradient" are supplied, then removing "grad" does not really have an effect, because gradients are still used. So I still prefer to use attribute "gradient" if both argument "grad" and attribute "gradient" are supplied. Or am I missing important circumstances where it would be preferable to use argument "grad"?
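For illustration, the priority rule could look roughly like this (a hypothetical helper, not existing maxLik code; llValue stands for the already evaluated log-likelihood):

pickGradient <- function( llValue, grad = NULL, par, ... ) {
   gAttr <- attr( llValue, "gradient" )
   if( !is.null( gAttr ) ) {
      if( !is.null( grad ) ) {
         warning( "both argument 'grad' and attribute 'gradient' are supplied: ",
            "using the attribute and ignoring 'grad'" )
      }
      return( gAttr )
   }
   if( !is.null( grad ) ) {
      return( grad( par, ... ) )
   }
   NULL
}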

b) It seems that my comment was a little unclear. As we cannot be sure that the gradient function is always called after the objective function with exactly the same parameters, I did not suggest storing attribute "gradient" but rather providing a wrapper function for optim()'s argument "gr", e.g.
grad <- function( par, ... ) {
   g <- attributes( logLik( par, ... ) )$gradient
   return( g )
}
Hence, logLik() is called (again) each time a gradient is to be computed. Of course, this is computationally inefficient, but I do not see a better solution.
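For illustration, such a wrapper would then be passed to optim()'s argument "gr" roughly like this (start is a placeholder for the starting values):

res <- optim( par = start, fn = logLik, gr = grad, method = "BFGS",
   control = list( fnscale = -1 ) )   # fnscale = -1: maximise instead of minimise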

d) It seems that my comment was (again) a little unclear. I did not mean that we should internally ("under the hood") merge maxBFGS() and maxBFGSYC(), but I think it would be user-friendly if there were only one user-visible function for the BFGS algorithm, i.e. maxBFGS(). This function would then either do what it currently does (i.e. call maxOptim()) or call maxBFGSYC(). In the future, maxBFGSYC() could make use of maxNR(), similar to maxBHHH(). What do you think?

RE: gradient attribute and optim-based optimizers [ Reply ]
By: Ott Toomet on 2010-06-26 13:50
[forum:3041]
a) Implemented, but no warning. My suggestion is to take the grad function if both grad and the "gradient" attribute are supplied. This is because it is easier to add/remove an argument than to change the objective function.

b) Yes. We probably already have something like that (logLikGradient & friends). However, it only works if the gradient is _always called after_ the function with _exactly the same_ parameter value. We don't know if this is the case (although it probably is).

c) They use the gradient for the final Hessian. That part is implemented, except for the priority/warning.

d) I would keep the optim version as the default initially, for testing. Yes, it may be called like that, although bfgs-yc is essentially a Newton-Raphson clone. So I was considering merging it into maxNR -- the only difference is essentially how the inverse Hessian is calculated.
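To illustrate the difference mentioned here (a sketch of the textbook formulas, not code from either implementation): both methods step along -H %*% g, but Newton-Raphson computes H by inverting the actual Hessian, while BFGS updates an approximation of the inverse Hessian from successive steps and gradient differences:

## H: current approximation of the inverse Hessian,
## s: the last parameter step, y: the corresponding change of the gradient
bfgsUpdate <- function( H, s, y ) {
   rho <- 1 / as.numeric( crossprod( y, s ) )
   I <- diag( length( s ) )
   ( I - rho * s %*% t( y ) ) %*% H %*% ( I - rho * y %*% t( s ) ) +
      rho * s %*% t( s )
}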


RE: gradient attribute and optim-based optimizers [ Reply ]
By: Arne Henningsen on 2010-06-25 15:20
[forum:3040]
As Yves pointed out, in some cases it is much faster to calculate the function value and the gradient together (in one function) than to calculate them separately (in two functions). Therefore, I find it very desirable that as many of the algorithms used by maxLik() as possible can use this feature.
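For concreteness, a minimal example (mine, not from the package) of a log-likelihood that returns its gradient as an attribute, so that both share the same intermediate computations:

## log-likelihood of an i.i.d. normal sample; param = c( mu, sigma )
loglikNormal <- function( param, x ) {
   mu <- param[ 1 ]
   sigma <- param[ 2 ]
   resid <- x - mu
   ll <- sum( dnorm( x, mean = mu, sd = sigma, log = TRUE ) )
   ## analytic gradient w.r.t. mu and sigma, reusing 'resid'
   attr( ll, "gradient" ) <- c( sum( resid ) / sigma^2,
      sum( resid^2 ) / sigma^3 - length( x ) / sigma )
   ll
}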

I suggest the following:

a) If the log-likelihood function (provided by the user) returns an attribute "gradient", it is used for the estimation in maxNR(), maxBHHH(), and maxBFGSYC(). (This has already been implemented, right?) I think that maxNR(), maxBHHH(), and maxBFGSYC() should issue a warning if they use attribute "gradient" while argument "grad" is provided by the user (and ignored by maxNR(), maxBHHH(), and maxBFGSYC()). What do you think?

b) If the log-likelihood function returns an attribute "gradient" and maxBFGS() is used, we could write a simple wrapper function that extracts attribute "gradient". This function can then be used as argument "gr" of optim(). Of course, this is not very efficient, but probably better than using no gradients at all. This has the advantage that users do not have to change their log-likelihood function when switching from maxNR(), maxBHHH(), or maxBFGSYC() to maxBFGS(). What do you think?

c) maxNM() and maxSANN() do not need gradients, so gradients (no matter whether provided as argument "grad" or as attribute "gradient") can be ignored there.

d) maxBFGS() could be the user interface both for optim( method = "BFGS" ) and for maxBFGSYC(), where the user decides whether optim() or YC's code is used. Maybe maxBFGS() should call YC's code by default if the log-likelihood function returns attribute "gradient", and should call optim() by default if the log-likelihood function does not return this attribute. What do you think?
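The default choice in d) could be sketched like this (a hypothetical helper: evaluate the log-likelihood once at the starting values and inspect the attribute):

chooseImplementation <- function( logLik, start, ... ) {
   if( !is.null( attr( logLik( start, ... ), "gradient" ) ) ) {
      "YC"      # gradient attribute available: use YC's code
   } else {
      "optim"   # otherwise fall back to optim( method = "BFGS" )
   }
}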

gradient attribute and optim-based optimizers [ Reply ]
By: Ott Toomet on 2010-06-24 13:23
[forum:3037]
Should we use attribute "gradient" of the function value?

It probably always works. However, it only works if the gradient is only used after a call to the function with exactly the same argument. optim() does not state that this is the case, so you never know...
