#delimit ;
clear;
set trace off ;
set more off ;
window manage forward results;
noi di _n ;
* do vsu-pr.do ;
noi di in whi _col(12) "The bootstrap methods" _n
    _col(18) "Stanislav Kolenikov" _n
    _col(24) "skolenik@recep.glasnet.ru" _n
    _d(48) "-" _n(2) ;
* noi vsutut ;
noi di in gre _n(2)
    "This tutorial can be discontinued at any time by pressing " in wh "Break" in gr " (Ctrl-Break on" _n
    "the keyboard or the Break icon on the toolbar). Press " in whi "Enter" in gre " or " in whi "Space" in gre " when" _n
    "you see the " in blu "--more--" in gre " message." _n(4) ;
* run vsu-prel.do ;
cap pro drop More ;
pro def More ;
    set more on ;
    more ;
    set more off ;
end ;
global SEPAR `" _n in whi _d(79) "-" _n "';
noi di in gre
    "This tutorial overviews the main uses of the bootstrap procedures in " _n
    "econometric practice. The Stata commands to be discussed are:" _n(2)
    in whi _col(15) "bs bstrap bstat bsample" _n(2)
    in gre "You can get more detailed information on each of them by typing" _n
    in whi "help bs " in gre "at the Stata prompt after the tutorial session." ;
More;
noi di _n(3) $SEPAR in gre
    "The bootstrap is a computationally intensive method to obtain an approximation" _n
    "of the distribution of sample statistics such as the median, a regression" _n
    "slope coefficient, or some nonlinear function of the data, whose distribution" _n
    "cannot be derived analytically or is too difficult to derive." _n
    "This is done by substituting the sample distribution for the population" _n
    "distribution and then drawing repeated samples (i.e., bootstrap samples)." _n
    "From those bootstrap samples, a series of values of the statistic of" _n
    "interest can be obtained." _n(2)
    "In other words, the bootstrap methods exploit the analogy (or, rather," _n
    "relation) between the population and the sample. To estimate some population" _n
    "characteristics, one would need to draw a number of observations and calculate" _n
    "the sample statistics based on these samples. 
The former is done via" _n
    "generating the observations by using the underlying distribution function" _n
    "(known in Monte Carlo studies, unknown for an empirical data-generating" _n
    "process or a repeated experiment)." _n(2)
    "In a similar way, the bootstrap uses the sample distribution as the only" _n
    "possible proxy for the true one, and draws its samples by using the" _n
    "empirical distribution function." _n ;
More;
noi di _n in gre "Now, let us see how this works. Pretend we have the following data set: " $SEPAR _n ;
set obs 12 ;
set seed 880239 ;
g int x=int(-16*log(1-uniform()))+1 ;
sort x;
lab var x "Some data" ;
noi di in whi ". list x" ;
push list x ;
noi list x ;
More;
noi di _n in whi ". summarize, detail";
push summarize, detail;
noi summarize, detail;
More;
noi di _n(2) $SEPAR in gre
    "Let us try to obtain (the bootstrap estimates of) the confidence intervals for" _n
    "the median of a sample of fixed size. We shall have a look at several" _n
    "iterations (or, rather, resamples) which the bootstrap makes on this data." _n
    "For illustrative purposes, we shall use the low-level Stata bootstrapping" _n
    "command " in whi "bsample" in gre " that lies at the core of all bootstrapping procedures." _n
    "It replaces the data in memory with a bootstrap resample." $SEPAR _n;
/* noi di in whi ". bstrap showbs, reps(3) noisily args(More)" ;
   push bstrap showbs, reps(3) noisily args(More);
   noi bstrap showbs, reps(3) noisily args(More); */
More ;
noi di in whi ". preserve" ;
push preserve ;
noi preserve ;
noi di _n in whi ". bsample" ;
push bsample ;
noi bsample ;
noi di _n in whi ". list" ;
push list ;
noi list ;
noi di in whi _n ". summarize, detail" ;
push summarize, d ;
noi summarize, d ;
More;
noi di _n $SEPAR in gre
    "Now we see that some observations were omitted, while others were repeated two" _n
    "or more times. In the output, the median is the 50th percentile. 
Stata saves" _n
    "it in the returned result r(p50); see " in whi "help return" in gre " and " in whi "help summarize" in gre "." _n
    "Let us have a look at a couple more bootstrap samples. Note that we" _n
    "added " in white "set seed" in gre " to make results easily reproducible. It always makes sense" _n
    "when you are dealing with some (pseudo)randomized processes like the bootstrap." _n
    "Also, note the use of " in whi "preserve" in gre " on this occasion in order to be able to " in whi _n
    "restore" in gre " the original data in memory." $SEPAR _n;
More;
noi di in whi ". set seed 800923";
push set seed 800923;
noi set seed 800923;
noi di in whi _n ". restore, preserve" ;
push restore, preserve ;
noi restore, preserve ;
noi di in whi _n ". bsample" ;
push bsample ;
noi bsample ;
noi di in whi _n ". list" ;
push list ;
noi list ;
noi di in whi _n ". summarize, detail" ;
push summarize, d ;
noi summarize, d ;
More;
noi di in whi _n ". restore, preserve" ;
push restore, preserve ;
noi restore, preserve ;
noi di in whi _n ". bsample" ;
push bsample ;
noi bsample ;
noi di in whi _n ". list" ;
push list ;
noi list ;
noi di in whi _n ". summarize, detail" ;
push summarize, d ;
noi summarize, d ;
More;
noi di $SEPAR in gre
    "Of course, one cannot take seriously inference based on three resamples," _n
    "so after looking at those illustrations of resampling, let us turn to a" _n
    "more appropriate elaboration. To do this, we need to invoke other" _n
    "Stata bootstrap-related commands. The easiest one is " in whi "bs" in gre ", and it just" _n
    "records the specified returned values of the specified command. It works" _n
    "like this:" $SEPAR _n;
More;
noi di in whi ". restore" ;
push restore ;
noi restore ;
noi di in whi _n ". set seed 71842";
push set seed 71842;
noi set seed 71842;
noi di _n in whi `". 
bs "summarize x, detail" "r(p50)", reps(100) dots saving(bs100) replace"' ;
push bs "summarize x, detail" "r(p50)", reps(100) dots saving(bs100) replace ;
noi bs "summarize x, detail" "r(p50)", reps(100) dots saving(bs100) replace ;
More;
noi di $SEPAR in gre
    "The first argument is the Stata command to be executed on the data, the second" _n
    "one is the statistic to be retrieved, followed by the number of bootstrap" _n
    "samples we need, a progress indicator, and file saving options." _n(2)
    "In the presented table, Stata reports what the actually observed value was," _n
    "how many bootstrap samples were taken, and a number of bootstrap" _n
    "statistics." _n(2)
    "First, bias is the difference between the average across the bootstrapped" _n
    "statistics and the actual value. The bias should be a concern for heavily" _n
    "skewed distributions. An indicator of such a concern might be that the bias" _n
    "is more than a quarter of the estimated standard deviation." _n ;
More;
noi di in gre
    "Second, standard deviation is the usual square root of the variance of the" _n
    "bootstrapped values of the statistic. Finally, three different versions" _n
    "of confidence intervals are presented: normal (assuming that the statistic" _n
    "of interest is normally distributed), percentile (i.e., naive percentiles of" _n
    "the bootstrap distribution), and bias-corrected (see the literature on" _n
    "the bootstrap)." _n;
More;
noi di in gre
    "We can retrieve the bootstrap information to have a closer look at it." _n
    "Recall the option " in whi "saving(bs100) replace" in gre " we specified in our " in whi "bs" in gre " command." _n
    "Here, bs100 is the name of the file that holds the 100 bootstrapped values" _n
    "of the statistic along with the actually observed value." $SEPAR _n;
noi di in whi ". use bs100, clear" ;
push use bs100, clear ;
noi use bs100, clear ;
noi di in whi ". 
describe" ;
push describe ;
noi describe ;
More;
noi di _n $SEPAR in gre "We can redisplay the bootstrap statistics at any moment by typing " in whi "bstat" in gre ":" $SEPAR _n;
noi di in whi ". bstat" ;
push bstat ;
noi bstat ;
More;
noi di _n $SEPAR in gre
    "We can also play around with the bootstrap statistic by graphing its" _n
    "distribution (normal density and the actually observed value are" _n
    "superimposed)..." $SEPAR;
noi di in whi ". graph bs1, bin(20) xlabel xline(11.5) norm ylabel(0(0.1)0.3)";
push graph bs1, bin(20) xlabel xline(11.5) norm ylabel(0(0.1)0.3);
noi graph bs1, bin(20) xlabel xline(11.5) norm ylabel(0(0.1)0.3);
More;
window manage forward results;
noi di _n $SEPAR in gre "...tabulating the unique values..." $SEPAR;
noi di in whi ". tabulate bs1" ;
push tabulate bs1 ;
noi tabulate bs1 ;
More;
noi di $SEPAR in gre
    "... or testing for normality. The null hypothesis of normality should be" _n
    "rejected in our case." $SEPAR ;
noi di in whi ". sktest bs1" ;
push sktest bs1 ;
noi sktest bs1 ;
More;
noi di _n(7) $SEPAR in gre
    "Now, let us finally consider the most typical use of the bootstrap." _n
    "Often the calculation of the statistic of interest is cumbersome," _n
    "and no ready-to-go Stata command may be available. Thus, you may" _n
    "need to do some programming to get your bootstrap estimates." _n(2)
    "As an example, we shall use the famous Stata automobile data." $SEPAR _n;
local stdir : sysdir STATA ;
noi di in whi `". use "`stdir'\auto.dta", clear"';
push use `"`stdir'\auto.dta"', clear;
noi use `"`stdir'\auto.dta"', clear;
More;
noi di in whi ". describe";
push describe;
noi describe;
noi di _n $SEPAR in gre
    "We shall study the determinants of the car price, and we shall pose" _n
    "the following problem: are robust regression estimates better than OLS ones?" _n
    "Of course, Stata does report standard errors, but the confidence intervals" _n
    "based on these standard errors may have the wrong coverage. 
So we shall use" _n
    "the bootstrap confidence intervals to compare the two." $SEPAR _n;
More;
noi di _n $SEPAR in gre "First of all, let us see what we have as a starting point." $SEPAR _n ;
noi di in whi ". regress price weight foreign mpg";
push regress price weight foreign mpg;
noi regress price weight foreign mpg;
More;
noi di _n(2) in whi ". rreg price weight foreign mpg";
push rreg price weight foreign mpg;
noi rreg price weight foreign mpg;
More;
noi di _n(2) $SEPAR in gre
    "The two sets of estimates differ, and the confidence intervals of the " _n
    "significant variables do not overlap much. So, which estimates should we" _n
    "trust? That is the point to study with the bootstrap." _n(2)
    "Now, the programming point. The most advanced of the bootstrap" _n
    "commands is " in whi "bstrap" in gre ", but it also requires a bit of programming." ;
More;
local a "` " ;
local b 1 ;
local c " '" ;
noi di in gre _n
    "If you look up " in whi "help bstrap" in gre " (or Read The Fine Manual), you will find that" _n
    "the program you write must follow a special convention:" _n(2)
    in whi " program define" in gre " your program name" _n
    in whi " version 6" _n
    `" if ""' substr("`a'",1,1) "`b'" substr("`c'",2,1) `""=="?" { "' _n
    `" global S_1 ""' in gre "variable names" in whi`"""' _n
    " exit " _n
    " }" _n
    in gre " your calculations on the data in memory" _n
    in whi " post " substr("`a'",1,1) "`b'" substr("`c'",2,1) in gre " results of calculation in the earlier specified order" _n
    in whi " end" _n ;
More;
noi di _n(2) in gre
    "The key points are line 3 with " in whi "if" in gre " and the last-but-one line" _n
    "with" in whi " post" in gre ". The former tells the " in whi "bstrap" in gre " command what you wish your" _n
    "results to be called, and the latter saves the results in a separate" _n
    "Stata file. 
The program must be made known to Stata; the standard" _n
    "way to do it is to save it as a separate " in whi ".ado" in gre " file visible to Stata." _n
    "Another option would be to define it right away. It is not at all convenient" _n
    "to type all the program statements interactively, but it can be relatively" _n
    "easily done when you are writing a do-file. Here in the tutorial we shall" _n
    "define it -- watch our steps and their compatibility with the above outline!" $SEPAR _n ;
More;
* noi di in whi ". type autobs.ado";
* push type autobs.ado;
* noi type autobs.ado;
noi di in whi _n(2)
    "capture program drop autobs" _n
    "program define autobs" _n
    " version 6" _n
    `" if ""' substr("`a'",1,1) "`b'" substr("`c'",2,1) `""=="?" { "' _n
    `" global S_1 "oweight oforeign ompg rweight rforeign rmpg" "' _n
    " exit" _n
    " }" _n
    " regress price weight foreign mpg" _n
    " local ow=_b[weight]" _n
    " local of=_b[foreign]" _n
    " local om=_b[mpg]" _n
    " rreg price weight foreign mpg" _n
    " local rw=_b[weight]" _n
    " local rf=_b[foreign]" _n
    " local rm=_b[mpg]" _n
    " post " substr("`a'",1,1) "1" substr("`c'",2,1)
        " (" substr("`a'",1,1) "ow" substr("`c'",2,1)
        ") (" substr("`a'",1,1) "of" substr("`c'",2,1)
        ") (" substr("`a'",1,1) "om" substr("`c'",2,1)
        ") (" substr("`a'",1,1) "rw" substr("`c'",2,1)
        ") (" substr("`a'",1,1) "rf" substr("`c'",2,1)
        ") (" substr("`a'",1,1) "rm" substr("`c'",2,1) ") " _n
    "end" _n(2);
push capture program drop autobs ;
push program define autobs ;
push version 6 ;
push if "`1'"=="?" { ;
push global S_1 "oweight oforeign ompg rweight rforeign rmpg" ;
push exit ;
push } ;
push regress price weight foreign mpg ;
push local ow=_b[weight] ;
push local of=_b[foreign] ;
push local om=_b[mpg] ;
push rreg price weight foreign mpg ;
push local rw=_b[weight] ;
push local rf=_b[foreign] ;
push local rm=_b[mpg] ;
push post `1' (`ow') (`of') (`om') (`rw') (`rf') (`rm') ;
push end ;
capture program drop autobs ;
program define autobs ;
    version 6 ;
    if "`1'"=="?" 
{ ;
        global S_1 "oweight oforeign ompg rweight rforeign rmpg" ;
        exit ;
    } ;
    regress price weight foreign mpg ;
    local ow=_b[weight] ;
    local of=_b[foreign] ;
    local om=_b[mpg] ;
    rreg price weight foreign mpg ;
    local rw=_b[weight] ;
    local rf=_b[foreign] ;
    local rm=_b[mpg] ;
    post `1' (`ow') (`of') (`om') (`rw') (`rf') (`rm') ;
end ;
#delimit ;
More;
noi di _n $SEPAR in gre
    "We would like to draw your attention to several points." _n(2)
    "First, we made sure we do not have a program called " in whi "autobs" in gre " in Stata memory." _n(2)
    "Second, we used " in whi "=" in gre " (equal sign) when assigning our macros. If we had not," _n
    "Stata would just remember the string " in whi "_b[weight]" in gre " etc., and substitute it" _n
    "when parsing the " in whi "post" in gre " command. This would result in substituting the results" _n
    "of the last estimation only, instead of both sets of estimates." _n ;
More;
noi di in gre
    "Third, we used parentheses in the " in whi "post" in gre " arguments to prevent the Stata parser" _n
    "from trying to calculate whatever is in the argument string. Imagine we have" _n
    in whi ". post filename 2 6 -3 4 5 1" in gre _n
    "Then Stata would calculate " in whi "6-3=3" in gre " when parsing this string, thus" _n
    "spoiling everything except the first argument." _n ;
More;
noi di in gre "These are some tricks with the " in whi "bstrap" in gre " command. Now we shall run our program." $SEPAR _n(2) ;
noi di in whi ". set seed 81672" _n;
push set seed 81672;
noi set seed 81672;
noi di in whi ". bstrap autobs, reps(200) dots saving(regbs) replace every(20) double";
push bstrap autobs, reps(200) dots saving(regbs) replace every(20) double;
noi bstrap autobs, reps(200) dots saving(regbs) replace every(20) double;
More;
noi di _n(2) $SEPAR in gre
    "Now, we see that OLS confidence intervals in the basic regression seem" _n
    "to be too optimistic. 
The point is even stronger with robust estimates," _n
    "which should not be too surprising given the robust algorithm" _n
    "implemented in the" in whi " rreg " in gre "command. It downweights and even neglects outlying" _n
    "observations, and the bootstrap resampling may clone them, thus worsening the" _n
    "working environment of robust regression, or, conversely, exclude outliers" _n
    "and improve conditions. Also, robust estimates are somewhat shifted away" _n
    "from OLS estimates." $SEPAR _n ;
More;
noi di _n(3) $SEPAR in gre
    "Now, some final remarks. It is commonly assumed that 100 bootstrap" _n
    "resamples are enough to calculate the bootstrap variance of the estimate." _n
    "But if you want more precise information about quantiles, you would probably" _n
    "need to specify 1000 resamples. In our example, we have seen that the normal" _n
    "approximation to the bootstrap distribution gave an insignificant coefficient" _n
    "on car weight in the robust regression, while more precise methods show" _n
    "that it is still significant." _n(2)
    "And a computational point. Since the bootstrap resamples the whole" _n
    "data set in Stata memory, the process may be considerably sped up if" _n
    "you " in whi "drop" in gre " all the variables and observations you do not need. Be sure to" _n
    in whi "save" in gre " or at least " in whi "preserve " in gre "your data beforehand." _n(2)
    "There are still some subtle statistical issues with the bootstrap. For example," _n
    "a straightforward bootstrap fails completely to construct confidence" _n
    "intervals that cover the true value for such a statistic as the population" _n
    "maximum. In fact, the sample maximum is no greater than the population one;" _n
    "hence, the bootstrapped values would be no greater than the sample maximum." _n(2)
    "Another subtle issue is that the bootstrap in the form presented here" _n
    "is only applicable in the i.i.d. case. 
If you have dependencies in the data" _n
    "such as time series or stratified samples, some specific resampling" _n
    "techniques should be used that account for those dependencies." _n(2)
    "So the message is:" _n;
More;
noi di in blue _n " Think carefully about what you are bootstrapping for!" _n $SEPAR _n(3);
More;
noi di in whi "Demonstration ends" _n "------------------" in gre _n(2)
    "That concludes our short demonstration, but there's much more. We now return" _n
    "control to you. Some suggestions:" _n(2)
    "- you can have a look at some other Stata tutorials; type " in whi _n
    " tutorial contents" in gre " to see what the Stata developers have prepared for you." _n(2)
    "- you can look at the bootstrap commands in more detail; type " in whi _n
    " help bs" in gre " or " in whi "whelp bs" in gre " to see the short description,"
    " or consult the manual" in whi _n
    " [R] bstrap" in gre "." _n(2)
    " HAPPY BOOTSTRAPPING!" _n ;
#delimit cr
global SEPAR
exit