#delimit ;
clear;
set trace off ;
set more off ;
window manage forward results;
noi di _n ;
* do vsu-pr.do ;
noi di in whi
_col(12) "The bootstrap methods" _n
_col(18) "Stanislav Kolenikov" _n
_col(24) "skolenik@recep.glasnet.ru" _n
_d(48) "-" _n(2) ;
* noi vsutut ;
noi di in gre _n(2)
"This tutorial can be interrupted at any time by pressing "
in wh "Break" in gr " (Ctrl-Break on" _n
"the keyboard, or the Break icon on the toolbar). Press "
in whi "Enter" in gre " or " in whi "Space" in gre _n
"when you see the " in blu "--more--" in gre " message." _n(4)
;
* run vsu-prel.do ;
cap pro drop More ;
pro def More ;
set more on ;
more ;
set more off ;
end ;
global SEPAR `" _n in whi _d(79) "-" _n "';
noi di in gre
"This tutorial overviews the main uses of the bootstrap procedures in " _n
"econometric practice. The Stata commands to be discussed are:"
_n(2) in whi _col(15) "bs bstrap bstat bsample"
_n(2) in gre
"You can get more detailed information on each of them by typing"
_n in whi "help bs " in gre "at the Stata prompt after the tutorial session." ;
More;
noi di _n(3) $SEPAR in gre
"The bootstrap is a computationally intensive method of approximating the" _n
"distribution of a sample statistic, such as a median, a regression slope" _n
"coefficient, or some nonlinear function of the data, when that distribution" _n
"cannot be derived analytically or the derivation is too difficult." _n
"This is done by substituting the sample distribution for the population" _n
"distribution and then drawing repeated samples (i.e., bootstrap samples)." _n
"From those bootstrap samples, a series of values of the statistic of" _n
"interest is obtained." _n(2)
"In other words, the bootstrap methods exploit the analogy (or, rather," _n
"relation) between the population and the sample. To estimate some population" _n
"characteristic, one draws a number of observations and calculates the sample" _n
"statistic from them. The observations are generated from the underlying" _n
"distribution function, which is known in a Monte Carlo study and unknown for" _n
"an empirical data generating process or a repeated experiment." _n(2)
"In a similar way, the bootstrap uses the sample distribution as the only" _n
"available proxy for the true one, and draws its samples from the" _n
"empirical distribution function." _n
;
More;
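/*
  A minimal sketch, not run by the tutorial, of what drawing from the
  empirical distribution function means: every resampled observation is one
  of the original observations, picked with equal probability and with
  replacement. The variable names id, pick and idstar are illustrative only.
  The commands can be pasted into a separate do-file that starts with
  #delimit ; and an empty data set.

set obs 12 ;
* label the observations 1 to 12 ;
generate int id = _n ;
* draw an index between 1 and _N for every observation, with replacement ;
generate int pick = int(_N*uniform()) + 1 ;
* the resampled value is the original value found at that index ;
generate int idstar = id[pick] ;
list id pick idstar ;
drop _all ;
*/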
noi di _n in gre
"Now, let us see how this works. Pretend we have the following data set: "
$SEPAR _n
;
set obs 12 ;
set seed 880239 ;
g int x=int(-16*log(1-uniform()))+1 ;
sort x;
lab var x "Some data" ;
noi di in whi ". list x" ;
push list x ;
noi list x ;
More;
noi di _n in whi ". summarize, detail";
push summarize, detail;
noi summarize, detail;
More;
noi di _n(2) $SEPAR in gre
"Let us try to obtain bootstrap estimates of the confidence intervals for" _n
"the median of a sample of fixed size. We shall have a look at several" _n
"iterations (or, rather, resamples) that the bootstrap makes on these data." _n
"For illustrative purposes, we shall use the low-level Stata bootstrapping" _n
"command " in whi
"bsample" in gre
", which lies at the core of all bootstrapping procedures." _n
"It replaces the data in memory with a bootstrap resample."
$SEPAR _n;
/*
noi di in whi ". bstrap showbs, reps(3) noisily args(More)" ;
push bstrap showbs, reps(3) noisily args(More);
noi bstrap showbs, reps(3) noisily args(More);
*/
More ;
noi di in whi ". preserve" ;
push preserve ;
noi preserve ;
noi di _n in whi ". bsample" ;
push bsample ;
noi bsample ;
noi di _n in whi ". list" ;
push list ;
noi list ;
noi di in whi _n ". summarize, detail" ;
push summarize, d ;
noi summarize, d ;
More;
noi di _n $SEPAR in gre
"Now, we see that some observations were omitted, while others were repeated" _n
"two or more times. In the output, the median is the 50th percentile. Stata" _n
"saves it in the returned result r(p50); see " in whi "help return" in gre " and "
in whi "help summarize" in gre "." _n
"Let us have a look at another couple of bootstrap samples. Note that we" _n
"added " in white
"set seed" in gre
" to make results easily reproducible. It always makes sense" _n
"when you are dealing with some (pseudo)randomized processes like the bootstrap." _n
"Also, note the use of " in whi
"preserve" in gre
" on this occasion in order to be able to " in whi _n
"restore" in gre " the original data in memory."
$SEPAR _n;
More;
noi di in whi ". set seed 800923";
push set seed 800923;
noi set seed 800923;
noi di in whi _n ". restore, preserve" ;
push restore, preserve ;
noi restore, preserve ;
noi di in whi _n ". bsample" ;
push bsample ;
noi bsample ;
noi di in whi _n ". list" ;
push list ;
noi list ;
noi di in whi _n ". summarize, detail" ;
push summarize, d ;
noi summarize, d ;
More;
noi di in whi _n ". restore, preserve" ;
push restore, preserve ;
noi restore, preserve ;
noi di in whi _n ". bsample" ;
push bsample ;
noi bsample ;
noi di in whi _n ". list" ;
push list ;
noi list ;
noi di in whi _n ". summarize, detail" ;
push summarize, d ;
noi summarize, d ;
More;
noi di $SEPAR in gre
"Of course, one cannot take seriously inference based on three resamples," _n
"so after looking at those illustrations of resampling, let us turn to a" _n
"more appropriate elaboration. To do this, we need to invoke other" _n
"Stata bootstrap-related commands. The simplest one is " in whi
"bs" in gre
"; it just" _n
"records the specified returned results of the specified command. It works" _n
"like this:" $SEPAR _n;
More;
noi di in whi ". restore" ;
push restore ;
noi restore ;
noi di in whi _n ". set seed 71842";
push set seed 71842;
noi set seed 71842;
noi di _n in whi `". bs "summarize x, detail" "r(p50)", reps(100) dots saving(bs100) replace"' ;
push bs "summarize x, detail" "r(p50)", reps(100) dots saving(bs100) replace ;
noi bs "summarize x, detail" "r(p50)", reps(100) dots saving(bs100) replace ;
More;
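/*
  A sketch, not run by the tutorial, of roughly what bs automates: resample,
  rerun the command, record the returned result, and repeat. It assumes the
  12-observation data set with x is in memory. The file name mybs, the
  variable name med and the macros sim and i are illustrative only. The loop
  uses while so that it stays within the version 6 syntax used in this
  tutorial. The commands can be pasted into a do-file that starts with
  #delimit ;

tempname sim ;
postfile `sim' med using mybs, replace ;
preserve ;
local i = 1 ;
while `i' <= 100 { ;
    * replace the data in memory with a resample ;
    bsample ;
    quietly summarize x, detail ;
    * record the median of this resample ;
    post `sim' (r(p50)) ;
    * bring back the original data and keep the preserved copy ;
    restore, preserve ;
    local i = `i' + 1 ;
} ;
restore ;
postclose `sim' ;
*/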
noi di $SEPAR in gre
"The first argument is the Stata command to be executed on the data, the second" _n
"one is the statistic to be retrieved; then come the number of bootstrap" _n
"samples we need, the progress indicator, and the file saving options." _n(2)
"In the presented table, Stata reports what the actually observed value was," _n
"how many bootstrap samples were taken, and a bunch of the bootstrap" _n
"statistics." _n(2)
"First, bias is the difference between the average across the bootstrapped" _n
"statistics and the actual value. The bias should be a concern for heavily" _n
"skewed distributions. The indicator for such a concern might be that bias" _n
"is more than a quarter of the estimated standard deviation." _n
;
More;
noi di in gre
"Second, standard deviation is the usual square root of the variance of the" _n
"bootstrapped values of the statistic. Finally, three different versions" _n
"of confidence intervals are presented: normal (assuming that the statistic" _n
"of interest is normally distributed), percentile (i.e. naive percentiles of" _n
"the bootstrap distribution), and bias-corrected (see special literature on" _n
"the bootstrap)." _n;
More;
noi di in gre
"We can retrieve the bootstrap information to have a closer look at it." _n
"Recall the option " in whi
"saving(bs100) replace" in gre
" we specified in our " in whi
"bs" in gre
" command." _n
"Here, bs100 is the name of the file that holds the 100 bootstrap replicates" _n
"of the statistic along with the actually observed value."
$SEPAR _n;
noi di in whi ". use bs100, clear" ;
push use bs100, clear ;
noi use bs100, clear ;
noi di in whi ". describe" ;
push describe ;
noi describe ;
More;
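/*
  A sketch, not run by the tutorial, of how the reported bootstrap
  statistics relate to the replicates now in memory. It assumes that the
  replicates are held in bs1, that the observed median is 11.5 (the value
  marked by xline(11.5) in the graph command of this tutorial), and that the
  default 95% level applies. The percentiles that bstat reports may be
  computed slightly differently.

quietly summarize bs1 ;
* bias: mean of the replicates minus the observed value ;
display "bias          = " r(mean)-11.5 ;
* standard deviation of the replicates ;
display "std. dev.     = " sqrt(r(Var)) ;
* naive percentile interval: 2.5th and 97.5th percentiles ;
_pctile bs1, percentiles(2.5 97.5) ;
display "percentile CI = " r(r1) " to " r(r2) ;
*/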
noi di _n $SEPAR in gre
"We can redisplay the bootstrap statistics any moment we like by typing " in whi
"bstat" in gre ":" $SEPAR _n;
noi di in whi ". bstat" ;
push bstat ;
noi bstat ;
More;
noi di _n $SEPAR in gre
"We can also play around with the bootstrap statistic by graphing its" _n
"distribution (normal density and the actually observed value are" _n
"superimposed)..." $SEPAR;
noi di in whi ". graph bs1, bin(20) xlabel xline(11.5) norm ylabel(0(0.1)0.3)";
push graph bs1, bin(20) xlabel xline(11.5) norm ylabel(0(0.1)0.3);
noi graph bs1, bin(20) xlabel xline(11.5) norm ylabel(0(0.1)0.3);
More;
window manage forward results;
noi di _n $SEPAR in gre "...tabulating the unique values..." $SEPAR;
noi di in whi ". tabulate bs1" ;
push tabulate bs1 ;
noi tabulate bs1 ;
More;
noi di $SEPAR in gre
"... or testing for normality. The null hypothesis of normality should be" _n
"rejected in our case." $SEPAR ;
noi di in whi ". sktest bs1" ;
push sktest bs1 ;
noi sktest bs1 ;
More;
noi di _n(7) $SEPAR in gre
"Now, let us finally consider the most typical use of the bootstrap." _n
"Often the calculation of the statistic of interest is cumbersome," _n
"and no ready-to-go Stata command is available. Thus, you may" _n
"need to do some programming to get your bootstrap estimates." _n(2)
"As an example, we shall use the famous Stata automobile data." $SEPAR _n;
local stdir : sysdir STATA ;
noi di in whi `". use "`stdir'\auto.dta", clear"';
push use `"`stdir'\auto.dta"', clear;
noi use `"`stdir'\auto.dta"', clear;
More;
noi di in whi ". describe";
push describe;
noi describe;
noi di _n $SEPAR in gre
"We shall study the determinants of the car price, and we shall pose" _n
"the following problem: are robust regression estimates better than OLS ones?" _n
"Of course, Stata does report standard errors, but the confidence intervals" _n
"based on these standard errors may have the wrong coverage. So we shall use" _n
"the bootstrap confidence intervals to compare the two." $SEPAR _n;
More;
noi di _n $SEPAR in gre
"First of all, let us see what we have as a starting point." $SEPAR _n
;
noi di in whi ". regress price weight foreign mpg";
push regress price weight foreign mpg;
noi regress price weight foreign mpg;
More;
noi di _n(2) in whi ". rreg price weight foreign mpg";
push rreg price weight foreign mpg;
noi rreg price weight foreign mpg;
More;
noi di _n(2) $SEPAR in gre
"The two sets of estimates differ, and the confidence intervals of the " _n
"significant variables do not overlap much. So, which estimates should we" _n
"trust? That is the point to study with the bootstrap." _n(2)
"Now, the programming point. The most advanced of the bootstrap" _n
"commands is " in whi
"bstrap" in gre
", but it requires a bit of programming."
;
More;
local a "` " ; local b 1 ; local c " '" ;
noi di in gre _n
"If you look up " in whi
"help bstrap" in gre
" (or Read The Fine Manual), you will find that" _n
"the program you write must comply with a special convention:"
_n(2) in whi
" program define" in gre " your program name" _n in whi
" version 6" _n
`" if ""' substr("`a'",1,1) "`b'" substr("`c'",2,1) `""=="?" { "' _n
`" global S_1 ""' in gre "variable names" in whi`"""' _n
" exit " _n
" }" _n in gre
" your calculations on the data in memory" _n in whi
" post " substr("`a'",1,1) "`b'" substr("`c'",2,1) in gre
" results of calculation in the earlier specified order" _n
in whi
" end" _n
;
More;
noi di _n(2) in gre
"The key points are line 3 with " in whi
"if" in gre
" and the last but one line" _n
"with" in whi
" post" in gre
". The former tells the " in whi
"bstrap" in gre
" command what you wish your" _n
"results to be called, and the latter saves the results in a separate" _n
"Stata file. The program must be made known to Stata; the standard" _n
"way to do it is to save it as a separate " in whi
".ado" in gre
" file visible to Stata." _n
"Another option is to define it right away. It is not at all convenient" _n
"to type all the program statements interactively, but it is relatively" _n
"easy when you are writing a do-file. Here in the tutorial we shall" _n
"define it -- watch our steps and their compatibility with the above outline!"
$SEPAR _n
;
More;
* noi di in whi ". type autobs.ado";
* push type autobs.ado;
* noi type autobs.ado;
noi di in whi _n(2)
"capture program drop autobs" _n
"program define autobs" _n
" version 6" _n
`" if ""' substr("`a'",1,1) "`b'" substr("`c'",2,1) `""=="?" { "' _n
`" global S_1 "oweight oforeign ompg rweight rforeign rmpg" "' _n
" exit" _n
" }" _n
" regress price weight foreign mpg" _n
" local ow=_b[weight]" _n
" local of=_b[foreign]" _n
" local om=_b[mpg]" _n
" rreg price weight foreign mpg" _n
" local rw=_b[weight]" _n
" local rf=_b[foreign]" _n
" local rm=_b[mpg]" _n
" post " substr("`a'",1,1) "1" substr("`c'",2,1) " (" substr("`a'",1,1) "ow" substr("`c'",2,1) ") (" substr("`a'",1,1) "of" substr("`c'",2,1) ") (" substr("`a'",1,1) "om" substr("`c'",2,1) ") (" substr("`a'",1,1) "rw" substr("`c'",2,1) ") (" substr("`a'",1,1) "rf" substr("`c'",2,1) ") (" substr("`a'",1,1) "rm" substr("`c'",2,1) ") "_n
"end" _n(2);
push capture program drop autobs ;
push program define autobs ;
push version 6 ;
push if "`1'"=="?" { ;
push global S_1 "oweight oforeign ompg rweight rforeign rmpg" ;
push exit ;
push } ;
push regress price weight foreign mpg ;
push local ow=_b[weight] ;
push local of=_b[foreign] ;
push local om=_b[mpg] ;
push rreg price weight foreign mpg ;
push local rw=_b[weight] ;
push local rf=_b[foreign] ;
push local rm=_b[mpg] ;
push post `1' (`ow') (`of') (`om') (`rw') (`rf') (`rm') ;
push end ;
capture program drop autobs ;
program define autobs ;
version 6 ;
if "`1'"=="?" { ;
global S_1 "oweight oforeign ompg rweight rforeign rmpg" ;
exit ;
} ;
regress price weight foreign mpg ;
local ow=_b[weight] ;
local of=_b[foreign] ;
local om=_b[mpg] ;
rreg price weight foreign mpg ;
local rw=_b[weight] ;
local rf=_b[foreign] ;
local rm=_b[mpg] ;
post `1' (`ow') (`of') (`om') (`rw') (`rf') (`rm') ;
end ;
#delimit ;
More;
noi di _n $SEPAR in gre
"We would like to draw your attention to several points." _n(2)
"First, we made sure we do not have a program called " in whi "autobs" in gre " in Stata memory."
_n(2) "Second, we used " in whi
"=" in gre
" (equal sign) when assigning our macros. If we had not," _n
"Stata would just remember the string " in whi
"_b[weight]" in gre
" etc., and substitute it" _n
"when parsing the " in whi
"post" in gre
" command. This would post the results of the last" _n
"estimation twice, instead of both sets of estimates." _n
;
More;
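/*
  A sketch, not run by the tutorial, of the difference the equal sign makes.
  It assumes the automobile data are in memory. The macro names text and
  value are illustrative only.

quietly regress price weight foreign mpg ;
* without the equal sign the macro holds the characters _b[weight] ;
local text _b[weight] ;
* with the equal sign the macro holds the current numeric value ;
local value = _b[weight] ;
quietly rreg price weight foreign mpg ;
* evaluated only at this point, so this shows the rreg coefficient ;
display `text' ;
* still shows the OLS coefficient captured above ;
display `value' ;
*/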
noi di in gre
"Third, we used parentheses in the " in whi
"post" in gre
" arguments to prevent the Stata parser" _n
"from combining adjacent arguments into one expression. Imagine we have" _n
in whi ". post filename 2 6 -3 4 5 1" in gre _n
"Then Stata would calculate " in whi
"6-3=3" in gre
" when parsing this string, thus" _n
"spoiling everything except the first argument." _n
;
More;
noi di in gre
"These are some tricks with the " in whi
"bstrap" in gre
" command. Now we shall run our program."
$SEPAR _n(2)
;
noi di in whi ". set seed 81672" _n;
push set seed 81672;
noi set seed 81672;
noi di in whi ". bstrap autobs, reps(200) dots saving(regbs) replace every(20) double";
push bstrap autobs, reps(200) dots saving(regbs) replace every(20) double;
noi bstrap autobs, reps(200) dots saving(regbs) replace every(20) double;
More;
noi di _n(2) $SEPAR in gre
"Now, we see that the OLS confidence intervals in the basic regression seem" _n
"to be too optimistic. The point is even stronger with the robust estimates," _n
"which should not be too surprising given the robust algorithm" _n
"implemented in the" in whi
" rreg " in gre
"command. It downweights and even neglects outlying" _n
"observations, and the bootstrap resampling may clone them, thus worsening the" _n
"working environment of robust regression, or, conversely, exclude outliers" _n
"and improve conditions. Also, the robust estimates are somewhat shifted away" _n
"from the OLS estimates."
$SEPAR _n ; More;
noi di _n(3) $SEPAR in gre
"Now, some final remarks. It is commonly assumed that 100 bootstrap" _n
"resamples are enough to calculate the bootstrap variance of the estimate." _n
"But if you want more precise information about quantiles, you may want to" _n
"specify 1000 resamples. In our example, we have seen that the normal" _n
"approximation to the bootstrap distribution gave an insignificant coefficient" _n
"of the car weight in the robust regression, while more precise methods show" _n
"that it is still significant." _n(2)
"And a computational point. Since the bootstrap resamples the whole data set" _n
"in Stata memory, the process may be sped up considerably if" _n
"you " in whi
"drop" in gre
" all the variables and observations you do not need. Be sure to" _n
in whi "save" in gre " or at least " in whi "preserve " in gre
"your data beforehand." _n(2)
"There are still some subtle statistical issues with the bootstrap. For example," _n
"a straightforward bootstrap fails completely to construct confidence" _n
"intervals that cover the true value for a statistic such as the population" _n
"maximum. Indeed, the sample maximum is no greater than the population one," _n
"and the bootstrap values are, in turn, no greater than the sample maximum." _n(2)
"Another subtle issue is that the bootstrap in the form presented here" _n
"is only applicable in the i.i.d. case. If you have dependencies in the data," _n
"such as time series or stratified samples, some specific resampling" _n
"techniques should be used that account for those dependencies." _n(2)
"So the message is:" _n;
More;
noi di in blue _n " Think carefully about what you are bootstrapping for!"
_n $SEPAR _n(3);
More;
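/*
  A sketch, not run by the tutorial, of the remark on the maximum made in the
  final remarks. It reuses the bs syntax shown earlier and assumes the
  automobile data are in memory. The file name bsmax is illustrative only.
  Since every resample is drawn from the observed prices, no replicate of
  the maximum can exceed the sample maximum. Note that the use command
  replaces the automobile data in memory.

summarize price ;
bs "summarize price" "r(max)", reps(100) dots saving(bsmax) replace ;
use bsmax, clear ;
* the largest replicate is at most the maximum reported above ;
summarize bs1 ;
*/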
noi di in whi "Demonstration ends" _n "------------------" in gre _n(2)
"That concludes our short demonstration, but there's much more. We now return" _n
"control to you. Some suggestions:" _n(2)
"- you can have a look at some other Stata tutorials; type " in whi _n
" tutorial contents" in gre " to see what Stata developers prepared for you." _n(2)
"- you can look at the bootstrap commands in more detail; type " in whi _n
" help bs" in gre " or " in whi "whelp bs" in gre " to see the short description,"
" or consult the manual" in whi _n " [R] bstrap" in gre "." _n(2)
" HAPPY BOOTSTRAPPING!" _n
;
#delimit cr
global SEPAR
exit