Comments on Deeply Trivial: Statistical Sins: Stepwise Regression
9 comments, feed last updated 2024-02-12

2018-06-24 10:42
How about the (possibly?) related issue of Predictive Analysis (by various names) wherein the practitioners disclaim interest in interpretive meaning.

From page 4 of "Applied Predictive Modeling" - Kuhn/Johnson:
"Furthermore the foremost objective of these examples is not to understand why something will (or will not) occur. Instead, we are primarily interested in accurately predicting the chances that something will (or will not) happen."
Posted by Robert Young

2017-10-05 07:27
All subsets doesn't really solve the problem. It may find the best model, but it's still subject to the problem of capitalizing on the idiosyncrasies of the sample. Approaches like penalized ML and lasso are far better alternatives. See, for example, http://www.philender.com/courses/linearmodels/notes4/swprobs.html
Posted by Mike Babyak

2017-10-04 17:43
If we think of stepwise regression as a tool, it is never a good choice because with today's computing power there is a better tool: all subsets regression. Stepwise regression, contrary to the above comments, does both forward selection and backward deletion.
After several variables are added, it considers whether any can be dropped. Its branching algorithm is not optimal, so it is not guaranteed to find the best model for a given number of predictors. All subsets regression is guaranteed to find the best model. So if one wants to turn an algorithm loose on the data, the dominating choice is all subsets. The only value of stepwise regression was in the old days, when computers were slow and all subsets regression was not an option. It is an obsolete tool that should never be used. Whether any statistical tool should be used in an atheoretical manner like this is another matter. But if one wants to use a tool to do that, don't use stepwise.
Posted by mtnMan47

2017-10-04 13:46
All excellent points. I suppose it's important to keep in mind that statistics are tools. Tools aren't inherently good or bad; it's in how they're used. A hammer can build a house (good) or smash your finger (bad), depending on how the tool is used. Stepwise regression could be perfectly justifiable if used in a certain way. My experience has been that it's used as a way to weed through dozens of predictor variables, some of which were collected for dubious or completely unrelated purposes. (E.g., a researcher at a center adds a measure of X to another person's study because the researcher studies X and wants more data, or because the granting agencies are interested in X, not because X even makes sense in the context of the study.
I've seen this happen a lot.)
Posted by Sara

2017-10-04 13:35
The main problem is misuse of stepwise regression, not stepwise regression in itself. The classic misuse is to compute inferential statistics on the regression coefficients as if the final selected variables were selected a priori by the researcher, as opposed to being selected on the basis of model fit. This inflates the type 1 error rate. Nonetheless, as an exploratory data analysis technique, you could do worse than stepwise regression.
Posted by Anonymous

2017-10-04 13:22
Sara, your concerns are real, but the real problem is atheoretical use of any statistic (the "kitchen sink" approach), which is really catching on in non-academic circles. I use backward stepwise regression when predicting a final model from variables that are linked by theory to the outcome. I like backward because the process compares the change in R² at each step and no variable is guaranteed a place in the equation, something forward stepwise doesn't do: once a variable is in the model, it stays there regardless of the impact of other variables down the line.
I've used both approaches to test models and had some interesting outcomes.
Posted by Anonymous

2017-10-04 12:44
Seems that with p-hacking going on in fields like social psych, mechanical model selection might be reasonable to reduce researcher degrees of freedom, etc. What about newer techniques like LAR? Wouldn't they have similar problems? No doubt mechanical model selection has some issues in terms of how predictors are related, but in the absence of a theoretically informed model, these mechanical selection techniques seem refreshingly transparent. That ought to be worth something.
Posted by Rob

2017-10-04 09:48
What about stepwise regression using AIC/BIC? From what I've learned, this results in models that also include non-significant predictors.
Posted by Tim

2017-10-04 09:31
Again, your insights are so helpful.
Thanks!
Posted by Anonymous
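
Editor's illustration: the point raised in the thread, that testing a data-selected predictor as if it had been chosen a priori inflates the type 1 error rate, can be checked with a small simulation. This is only a rough sketch under stated assumptions, not anyone's actual analysis: every predictor is pure noise, the "selection" is just the first step of forward selection (pick the predictor most correlated with the outcome), and significance uses the Fisher z approximation rather than an exact t test. The function names (`pearson_r`, `is_significant`, `simulate`) and all parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, alpha=0.05):
    """Two-sided test of r = 0 via the Fisher z approximation (an assumption
    made here to stay within the standard library)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def simulate(n_sims=1000, n=50, k=10, seed=0):
    """Return (a priori rate, selected rate) of false positives when the
    outcome and all k predictors are independent noise."""
    rng = random.Random(seed)
    hits_fixed = 0
    hits_selected = 0
    for _ in range(n_sims):
        y = [rng.gauss(0, 1) for _ in range(n)]
        xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
        rs = [pearson_r(x, y) for x in xs]
        # A priori choice: always test the first predictor.
        hits_fixed += is_significant(rs[0], n)
        # Data-driven choice: test whichever predictor fits best.
        hits_selected += is_significant(max(rs, key=abs), n)
    return hits_fixed / n_sims, hits_selected / n_sims


fixed_rate, selected_rate = simulate()
print(f"a priori predictor:  {fixed_rate:.3f}")
print(f"selected predictor:  {selected_rate:.3f}")
```

With these settings the a priori rate should land near the nominal 0.05, while the rate for the selected predictor should be several times larger, on the order of 1 - 0.95**k for independent predictors. Full stepwise procedures repeat this kind of selection at every step, so the sketch understates rather than overstates what the commenter describes.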