

Sourcebook of Criminal Justice Statistics

Appendix 8

National Household Survey on Drug Abuse

Survey methodology

Note: The following information was excerpted from U.S. Department of Health and Human Services, Substance Abuse and Mental Health Services Administration, National Household Survey on Drug Abuse: Population Estimates 1997 (Rockville, MD: U.S. Department of Health and Human Services, 1998), pp. 1-13; and Preliminary Results from the 1997 National Household Survey on Drug Abuse (Washington, DC: U.S. Department of Health and Human Services, 1998), pp. 5-8, Appendix 1, and Appendix 2. Non-substantive editorial adaptations have been made.

The National Household Survey on Drug Abuse (NHSDA) is a series of annual national surveys measuring the prevalence of drug, alcohol, and tobacco product use among the American household population age 12 and older. Estimates of drug use prevalence for the civilian, noninstitutionalized population of the United States are presented.

The NHSDA is based on a stratified, multi-stage area probability sample. For 1997, 123 primary sampling units (PSUs) were selected as the first stage of sampling. Within each PSU, area segments were selected with unequal probability proportional to a composite size measure designed to overrepresent concentrated Hispanic and black neighborhoods. Dwelling units were selected from each sample segment. The target population included all civilian residents, 12 years of age and older, of households (including civilians residing on military installations) and noninstitutional group quarters (e.g., college dormitories, homeless shelters, rooming houses). Persons excluded from the universe include military personnel on active duty, transient populations (such as homeless people who do not reside in shelters), and residents of institutional group quarters (e.g., jails and hospitals). Data collection was continuous over the calendar year, with approximately one-fourth of the sample allocated to each quarter.

Survey data were collected through personal visits to each selected residence. Before the interviewer's visit, an introductory letter explaining the survey was mailed to each residence. Upon arrival, field representatives conducted a short voluntary screening procedure with any resident of the household 18 years of age or older who was capable of providing information on the age, race/ethnicity, sex, and marital status of each resident 12 years of age and older. This information was used in a random selection procedure that determined whether any residents were eligible for an in-depth interview (none, one, or two individuals were selected). The interviewer had no control over the selection procedure. The 1997 within-household person selection probabilities were based on the race/ethnicity of the head of household and the age of each household member. Selected individuals were then asked if they would complete a voluntary interview. NHSDA field representatives conducted the interviews using a paper-and-pencil questionnaire that included both interviewer-administered questions and self-administered answer sheets (for collection of sensitive information). All screening and interview responses are kept confidential.

In 1997, a total of 31,290 eligible dwelling unit members were selected for an interview; of these, a total of 24,505 interviews were completed. Response rates for screening and interviewing were 92.7% and 78.3%, respectively.
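The interviewing response rate follows directly from the two counts just given. The short sketch below reproduces that arithmetic. The combined rate at the end is an assumption added for illustration (the product of the two stage rates); the source does not report such a figure, and the variable names are hypothetical.

```python
# Counts reported in the text for the 1997 survey.
selected_persons = 31_290      # eligible dwelling unit members selected
completed_interviews = 24_505  # interviews completed

# Interviewing response rate: completed interviews over selected persons.
interview_rate = completed_interviews / selected_persons

# Screening response rate as reported in the text
# (the counts behind it are not given in the source).
screening_rate = 0.927

# ASSUMPTION: a combined rate taken as the product of the two stages.
# The source does not report this figure.
combined_rate = screening_rate * interview_rate

print(f"interviewing response rate: {interview_rate:.1%}")
print(f"combined rate (assumed product): {combined_rate:.1%}")
```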

Age and race/ethnicity were the two primary correlates of drug use on which the samples were stratified. The sample design ensured adequate sample sizes for four age groups (12 to 17, 18 to 25, 26 to 34, and 35 and older) and three race/ethnicity groups. This oversampling allowed certain subgroups to be large enough to support estimation. Based on the respondents' self-classifications the race/ethnicity groups were classified as: (1) Hispanic in origin, regardless of race; (2) white, not of Hispanic origin; and (3) black, not of Hispanic origin. As defined, these groups are mutually exclusive. Those who did not identify themselves as Hispanic, non-Hispanic white, or non-Hispanic black were included in the category "other." This includes American Indians, Alaska Natives, Pacific Islanders, Asians, and other groups. Separate estimates are not provided for this category because the sample size is too small.

The NHSDA has used essentially the same multistage area probability sample design since the 1988 survey. This design uses a composite size measure methodology and a specially designed within-dwelling selection procedure to ensure that desired sample sizes are achieved for subpopulations defined by age and race/ethnicity. In some survey years, oversampling was used to meet specified precision constraints for these subpopulations. The 1993 through 1997 NHSDAs oversampled Hispanics in areas of high Hispanic concentration to reduce survey costs. In addition, the 1993 through 1995 NHSDAs oversampled cigarette smokers ages 18 to 34.

During the 1997 study, data from a special supplemental sample were collected beginning with the second quarter of data collection. This supplemental sample was designed to increase the number of respondents who reside in California and Arizona in order to measure the impact of voter initiatives to legalize certain drugs for medical purposes. Fifteen of the 123 PSUs were located in California and 18 PSUs were located in Arizona. Of the 24,505 interviews completed for 1997, 5,223 interviews were conducted in California and 4,577 were conducted in Arizona. The final sample weights for NHSDA respondents were appropriately adjusted to account for this supplemental sample, thereby eliminating any potential bias in estimates that might otherwise exist.

A revised questionnaire and editing procedure were introduced beginning with the 1994 NHSDA. Data for 1994 through 1997 presented in SOURCEBOOK are based on the new questionnaire; data for years prior to 1994 presented in SOURCEBOOK have been adjusted by the Source in order to facilitate trend presentations (see "Adjustment procedures for trend data" below).

In addition, beginning in 1991, the survey differs from previous years in two ways: Alaska and Hawaii were included in the sample and some individuals living in group quarters (e.g., civilians living on military installations, individuals living in college dormitories, or individuals living in homeless shelters) were included.

Development of weights

An analysis weight was calculated for each completed interview to reflect selection probabilities and to compensate for nonresponse and undercoverage. Poststratification adjustments are made to force the respondent weight totals to equal U.S. Bureau of the Census projections for the civilian, noninstitutionalized population according to age group, sex, race, and Hispanic origin. Each weight can be viewed as the number of U.S. population members that the responding sample member represents.
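The poststratification step can be sketched as a simple rescaling within cells. The example below is a minimal illustration only: the cell labels, weights, and control totals are made up, the function name `poststratify` is hypothetical, and the actual NHSDA weighting involves additional adjustments that the source does not detail.

```python
from collections import defaultdict

def poststratify(records, control_totals):
    """Scale weights so each cell's weight total matches its control total.

    records: list of dicts with hypothetical keys "cell" and "weight".
    control_totals: dict mapping cell -> population projection for that cell.
    """
    # Sum the current (pre-adjustment) weights within each cell.
    weight_sums = defaultdict(float)
    for rec in records:
        weight_sums[rec["cell"]] += rec["weight"]

    # Multiply every weight by (control total / current total) for its cell.
    adjusted = []
    for rec in records:
        factor = control_totals[rec["cell"]] / weight_sums[rec["cell"]]
        adjusted.append({**rec, "weight": rec["weight"] * factor})
    return adjusted

# Made-up respondents and control totals, for illustration only.
respondents = [
    {"cell": "male_12_17", "weight": 900.0},
    {"cell": "male_12_17", "weight": 1100.0},
    {"cell": "female_12_17", "weight": 1500.0},
]
controls = {"male_12_17": 3000.0, "female_12_17": 1800.0}

final = poststratify(respondents, controls)
for rec in final:
    print(rec["cell"], round(rec["weight"], 1))
```

After the adjustment, the weights in each cell sum to that cell's control total, which is the property the text describes.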

Table 1. 1997 NHSDA sample size and U.S. population represented, by demographic characteristics

                        Sample       Population
Total                   24,505      216,206,090
Sex
  Male                  10,836      104,017,297
  Female                13,669      112,188,792
Race, ethnicity
  White                 12,443      161,168,830
  Black                  4,639       24,405,541
  Hispanic               6,259       21,577,831
  Other                  1,164        9,053,889
Age
  12 to 17               7,844       22,547,092
  18 to 25               6,239       27,690,997
  26 to 34               4,387       35,245,502
  35 years and older     6,035      130,722,499
Region
  Northeast              2,905       40,341,528
  North Central          3,255       48,355,568
  South                  7,554       81,231,438
  West                  10,791       46,277,555

Note: Sample size is the unweighted number of respondents in the 1997 National Household Survey on Drug Abuse. These 1997 population projections are based on the 1990 U.S. Bureau of the Census counts.
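Dividing the population column by the sample column in Table 1 gives the average number of people each respondent represents, which makes the effect of oversampling visible. The sketch below does this for the race/ethnicity rows; the figures are from Table 1, while the variable names are illustrative.

```python
# (sample size, population represented) taken from Table 1.
table1 = {
    "White": (12_443, 161_168_830),
    "Black": (4_639, 24_405_541),
    "Hispanic": (6_259, 21_577_831),
    "Other": (1_164, 9_053_889),
}

# Average number of people represented by one respondent in each group.
avg_weight = {group: pop / n for group, (n, pop) in table1.items()}

for group, w in avg_weight.items():
    print(f"{group:<9}{w:>10,.0f}")
```

Respondents in the oversampled groups carry smaller average weights, so each represents fewer members of the population.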

Adjusting for nonresponse through imputation

The prevalence estimates are based on the total sample or all cases in a subgroup, including some cases for which missing data for some recency-of-use and frequency-of-use variables were replaced with logically or statistically imputed (replaced) values. Prior to determining the completeness of a case, an editing procedure is implemented to check for inconsistencies and to determine if missing information is retrievable by using other information in the questionnaire. Logical imputation also is employed to replace inconsistent, missing, or invalid data. Determination of completeness of a case is then made. To be classified as a minimally complete interview, and therefore included in the database, data on the recency of use of alcohol, marijuana, and cocaine had to have been provided by the respondent or logically imputed from other answers supplied by the respondent.

For some key variables that still had missing values after the application of logical imputation, statistical imputation was used to replace the missing data with appropriate valid response codes. Data still missing for recency-of-use questions (for drugs other than alcohol, cocaine, and marijuana) were statistically imputed using a technique known as "hot deck imputation." The first step in this procedure involves sorting the data file progressively using data on recency-of-use of alcohol, marijuana, and cocaine; age; sex; Hispanic origin; and race. The hot deck imputation procedure replaces a missing item on a particular record by the last encountered nonmissing response for that item (from a previous record) on the sorted database. This procedure is appropriate for recency-of-use variables because the level of item nonresponse is low. Missing data for the frequency-of-use-in-the-past-12-months variables are statistically imputed using a logistic regression-based method of imputation. The potential for bias due to item nonresponse or imputation is minimal because item nonresponse is less than 2% for drug use recency questions.
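The sequential hot deck described above can be sketched in a few lines. This is a simplified illustration, not the survey's production procedure: the record layout, the field names, and the recency codes are hypothetical, and missing values are represented by `None`.

```python
def sequential_hot_deck(records, sort_keys, item):
    """Fill missing `item` values with the last nonmissing value seen
    earlier in the sorted file. Missing values are represented by None."""
    # Sort so that adjacent records are similar on the sort variables.
    ordered = sorted(records, key=lambda r: tuple(r[k] for k in sort_keys))

    last_seen = None
    imputed = []
    for rec in ordered:
        rec = dict(rec)  # copy so the input is not modified
        if rec[item] is None:
            # Use the donor value if one has been encountered; a record
            # with no earlier donor stays missing in this sketch.
            if last_seen is not None:
                rec[item] = last_seen
                rec["imputed"] = True
        else:
            last_seen = rec[item]
        imputed.append(rec)
    return imputed

# Made-up records; the recency codes are illustrative only.
records = [
    {"id": 4, "alcohol": 4, "age": 47, "sex": "M", "hallucinogen": None},
    {"id": 1, "alcohol": 1, "age": 19, "sex": "F", "hallucinogen": 3},
    {"id": 3, "alcohol": 4, "age": 45, "sex": "M", "hallucinogen": 4},
    {"id": 2, "alcohol": 1, "age": 20, "sex": "F", "hallucinogen": None},
]

result = sequential_hot_deck(records, ["alcohol", "age", "sex"], "hallucinogen")
for rec in result:
    print(rec["id"], rec["hallucinogen"], rec.get("imputed", False))
```

Because the file is sorted on related variables first, the donor for each missing item is a respondent with a similar profile.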

Sampling error and confidence intervals

In the National Household Survey on Drug Abuse, as in every sample survey, there is some degree of statistical uncertainty or error. The estimates provided are subject to uncertainties of two types: nonsampling and sampling errors. Some sources of nonsampling error are recording and coding errors, nonresponse, computer processing errors, differences in respondents' interpretations of questions, and purposely false answers. Nonsampling errors cannot be quantified; however, rigorous attempts are made to minimize their occurrence through pretesting, interviewer training and evaluation, interview verification, coder training, coding verification, and other quality control measures.

Sampling errors denote the random fluctuations that occur in estimates when a sample of the population is drawn rather than conducting a complete census. Different samples drawn using the same procedures from the same population would be expected to result in different estimates. Many of these observed estimates would differ to some degree from the "true" population value and these differences are due to sampling error. Sampling errors are quantified by way of confidence intervals. Asymmetrical 95% confidence intervals are calculated for all estimated proportions and corresponding population estimates.
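The text does not say how the asymmetrical intervals are computed. One common way to obtain an asymmetrical interval for a proportion is to build a symmetric interval on the logit scale and transform it back; the sketch below assumes that approach. The prevalence and standard error are made-up inputs, the function name is hypothetical, and the population total is the Table 1 figure.

```python
import math

def logit_confidence_interval(p, se, z=1.96):
    """Approximate 95% interval for a proportion via the logit transform.

    ASSUMPTION: the logit method is used here for illustration; the source
    does not specify how its intervals are calculated.
    """
    logit = math.log(p / (1.0 - p))
    # Delta-method standard error of the logit of p.
    se_logit = se / (p * (1.0 - p))
    lower = 1.0 / (1.0 + math.exp(-(logit - z * se_logit)))
    upper = 1.0 / (1.0 + math.exp(-(logit + z * se_logit)))
    return lower, upper

# Hypothetical estimate: 6% prevalence with a standard error of 0.4 points.
p_hat, se_hat = 0.06, 0.004
low, high = logit_confidence_interval(p_hat, se_hat)

# Corresponding population estimates, using the Table 1 total.
population = 216_206_090
print(f"proportion: {low:.4f} to {high:.4f}")
print(f"persons: {low * population:,.0f} to {high * population:,.0f}")
```

For a proportion below 50%, the upper bound lies farther from the estimate than the lower bound, which is what makes the interval asymmetrical.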


North Central--Includes the East North Central States--Illinois, Indiana, Michigan, Ohio, and Wisconsin; and the West North Central States--Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, and South Dakota.

Northeast--Includes the New England States--Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont; and the Middle Atlantic States--New Jersey, New York, Pennsylvania.

South--Includes the South Atlantic States--Delaware, District of Columbia, Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, and West Virginia; the East South Central States--Alabama, Kentucky, Mississippi, and Tennessee; and the West South Central States--Arkansas, Louisiana, Texas, and Oklahoma.

West--Includes the Mountain States--Arizona, Colorado, Idaho, Montana, Nevada, New Mexico, Utah, and Wyoming; and the Pacific States--Alaska, California, Hawaii, Oregon, and Washington.

Adjustment procedures for trend data

In 1994, the NHSDA began using an improved questionnaire and estimation procedure based on a series of studies and consultations with drug survey experts and data users. When the new questionnaire was introduced in 1994, a supplemental sample was selected for use with the old methodology (i.e., a questionnaire identical to that of previous years). This provided the capability to assess the impact of the new questionnaire and to measure the effects of the change in methodology. Because the new methodology produces estimates that are not directly comparable to previous estimates, the 1985-93 NHSDA estimates presented in tables 3.82-3.84 were adjusted to account for the methodology introduced in 1994. The substance use prevalence estimates, for nearly all of the substances presented, were adjusted using a simple ratio correction factor. The simple ratio correction factor measured the effect of the new methodology, relative to the old methodology, using data from the 1993 and 1994 NHSDAs. For the remaining substances, the prevalence estimates were adjusted by using a model-based method. Similar to the ratio adjustment, this method of adjusting previous estimates models the combined effect of all measurement error differences between the new and old methodologies.
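The source does not give the formula for the simple ratio correction factor. One plausible reading, sketched below, is the ratio of the new-methodology estimate to the old-methodology estimate for the same period, applied multiplicatively to earlier years. All numbers and function names here are hypothetical, and the actual procedure may differ.

```python
def ratio_correction_factor(new_method_estimate, old_method_estimate):
    """Effect of the new methodology relative to the old one.

    ASSUMPTION: a plain ratio of two estimates of the same quantity.
    """
    return new_method_estimate / old_method_estimate

def adjust_series(old_series, factor):
    """Apply the correction factor to each pre-1994 estimate."""
    return {year: estimate * factor for year, estimate in old_series.items()}

# Made-up prevalence estimates (percent), for illustration only.
factor = ratio_correction_factor(new_method_estimate=5.0, old_method_estimate=4.0)
unadjusted = {1991: 4.4, 1992: 4.2, 1993: 4.0}
adjusted = adjust_series(unadjusted, factor)

for year in sorted(adjusted):
    print(year, round(adjusted[year], 2))
```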

Estimation procedures for heavy drug use

Beginning with the 1997 survey data, a new estimation procedure was designed to produce improved estimates of heavy drug use. Using external counts of drug users from other data sources, a ratio estimation procedure was developed to provide a partial adjustment for undercoverage of hard-to-reach populations and underreporting of drug use by survey respondents. This adjustment has resulted in 40-80% higher estimates of past year and past month heroin use and 20-40% higher estimates of frequent cocaine use.
