Journal of Interactive Advertising, Volume 3, Number 1, Fall 2002
The Internet’s potential for quantitative data collection has been debated by researchers for many years. For advertising academics and practitioners, the Internet allows for the assessment of consumer opinions and attitudes toward a range of topics. However, the accessibility of online populations and the generalizability of data collected online are uncertain. The author discusses a range of online techniques and what we still need to learn about such techniques in order to harness the potential of the Internet for quantitative research.
The author wishes to acknowledge the insights provided by Mariea Hoy, Angela Mak, and Tad O’Dell during the preparation of this manuscript.
The dynamic nature of the Internet is of interest to both advertising practitioners and academics. Both groups are searching for ways to understand online consumer behavior and to measure attitudes and perceptions of persuasive online messages. Determining a way to observe such behaviors and to assess attitudes and perceptions, though, has been a challenge to researchers. While data collection using the Internet has been undertaken since 1987, a relatively small body of literature discussing online techniques and tactics currently exists.
My first email survey, using a university community as a sample, was conducted with Mariea Hoy in 1997; a subsequent national survey of Internet users began that same year. Given the lack of information on online methodological techniques, specifically in regard to sampling issues, we developed a sampling methodology to generate as close to a random sample as we felt was feasible at the time. At the conclusion of both research projects, we assessed the potential of the Internet as a research tool and believed at that time that the Internet held great promise for researchers (Sheehan and Hoy 1999). Now, five years after that first email survey, the editors of this journal have asked me to address the status of online research today. To do so, I will look at the past, the present, and the future of online research in order to assess the promises previously identified and to examine the potential of the Internet for quantitative research.
Five years ago, using the Internet for quantitative research held several promises for the future. These included:
Strong reach/penetration. In 1997, about 30% of the United States population was online. Email was used by the vast majority of Internet users, and was seen as the most appropriate way to survey the online universe.
Fast response speed. The Internet allowed data to be collected very quickly, with the majority of responses to email surveys returned in a 24 to 48 hour time period.
Low cost. Delivering surveys online meant researchers did not incur either postage or printing costs. In 1997, email addresses were available for free via online search engines. The Internet presented a cost-efficient channel for survey dissemination (especially appealing to poor doctoral students).
Response flexibility. When using the Internet to deliver the survey instrument, individuals could reply in a number of ways. For email surveys, for example, individuals could respond via email or could print out the survey, complete it, and mail or fax it to the researcher. Today, a link in an email message could also take the respondent directly to a web survey. This flexibility addresses a range of user comfort levels with technology, which may have a positive effect on response rates.
Control of Anonymity. Anonymity and confidentiality are important to many survey respondents, and control over anonymity affects response rates. It is possible for respondents to remain anonymous online, especially if web-based surveys without password controls are used. Respondents to email surveys can be anonymous if they use non-Internet methods of returning the survey (such as printing out the survey, removing identifiable information, and returning the completed survey via postal mail or fax).
Minimized interviewer error. When respondents typed in their answers to email surveys, there was a decreased chance of misinterpretation of answers, particularly in comparison to telephone or in person interviews. With web surveys, data entered by respondents can go directly into a spreadsheet for analysis.
Minimized interviewer bias. The interviewer’s electronic presence appears to rarely influence individuals’ answers.
We also found several limitations to using the Internet to collect data. These included:
Generalizability. The Internet had not reached a ‘critical mass’ of the United States population in 1997, and the dominant user demographic was men under the age of 35. Thus, while responses could be generalized to the Internet population as a whole, it was unlikely that results from an Internet-based sample could be generalized to the United States population as a whole.
Accessibility of Names. Email addresses for the national sample were limited to those individuals who agreed to be part of published online directories, since the sample was generated using a search engine that used the domain names of online directories to generate addresses. Additionally, directories selected for inclusion in the sample had to be listed at a central ‘directory of directories’. It was impossible to know which directories (and their email addresses) were included and which were excluded from this sample. At the time, there was no global or national directory of email addresses to use for sample generation; this would affect the overall generalizability of the results.
Multiple responses. Technological mechanisms were not available to reduce the chances of individuals responding more than once to online surveys, thus the possibility existed that single individuals would respond multiple times to the survey.
Response rate issues. Web surveys in 1997 were primarily conducted using non-probability samples; basically, web-based surveys were available for anyone to answer. Thus, it was impossible to assess the actual survey response rate. Therefore, the power of survey results from Web-based surveys was questionable.
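The multiple-response problem noted above is easier to mitigate today: a survey back end can flag repeat submissions by hashing a respondent identifier before storing it. This is a minimal sketch, not any particular survey package's method; the identifiers and the choice of an email address as the key are hypothetical:

```python
import hashlib

def is_duplicate(identifier, seen):
    """Flag a repeat submission by hashing a respondent identifier
    (e.g. an email address or session cookie), so the raw value
    need not be stored alongside the survey answers."""
    digest = hashlib.sha256(identifier.strip().lower().encode()).hexdigest()
    if digest in seen:
        return True
    seen.add(digest)
    return False

# Hypothetical submission stream: the third entry repeats the first.
seen = set()
submissions = ["ann@example.com", "bob@example.com", "Ann@Example.com"]
flags = [is_duplicate(s, seen) for s in submissions]
print(flags)  # [False, False, True]
```

Normalizing case and whitespace before hashing catches casual repeats, though a determined respondent with several email addresses would still evade this check.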
The conclusion that we made at the end of our two email studies was that strong potential existed for using the Internet for online data collection. Today, many of the promises we identified from our 1997 work continue to make the Internet a promising vehicle for data collection. The greatest of these promises is the Internet’s ability to reach large numbers of people. Today, the Internet reaches almost 60% of Americans (NUA 2002) and the profile of the Internet population is very similar to the United States population as a whole. Therefore, the potential exists to generalize results not only to the Internet population, but also to the United States population as a whole.
The growth in online usage and subsequent advancements in technology have resulted in a number of different quantitative techniques to survey consumers online. Couper (2000) developed a typology of the options available for quantitative researchers to use online. Quantitative methodologies involving the Internet include both probability-based and non-probability based methods (Couper 2000). Probability-based methods include:
Intercept surveys: similar to an exit poll, these surveys target web site visitors and invite every nth visitor to participate in a poll. The sampling frame is defined as visitors to the web site.
List-based samples: the sampling frame is a pre-established list of those with web access. Invitations to participate, and possibly the surveys themselves, are sent to randomly selected members of the list. Examples of lists include faculty, staff, and student directories, and association member lists (such as the AEJMC member list).
Web and email options in mixed-mode surveys: respondents are mailed letters or contacted via phone to disseminate surveys, and have the option to answer via web or via email (which would be requested by respondents). The sampling frame would be the same population as the telephone or mail survey; the technique is designed to minimize respondent burden and cost.
Pre-recruited panels of Internet users: panels of individuals with web access are recruited via random digit dialing (RDD) and are invited to respond to email and/or web surveys. Examples include the panel used by the Pew Center for Internet Research.
Pre-recruited panels of the full United States population: panels of individuals are recruited and non-web users are provided with equipment to be able to answer surveys online. Couper (2000) saw this method as the only method that has a potential for obtaining a probability sample of the full population. One such company is Intersurvey, owned by Knowledge Networks.
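The intercept approach in the typology above amounts to systematic sampling of the visitor stream. A minimal sketch, assuming the site can count visitors in arrival order; the interval of 50 and the visitor counts are hypothetical:

```python
import random

def intercept_sampler(interval, start=None):
    """Yield True for every nth visitor, False otherwise.

    A random offset within the first interval avoids always
    selecting the same positions in the visitor stream."""
    if start is None:
        start = random.randrange(interval)  # offset in [0, interval)
    count = 0
    while True:
        yield (count % interval) == start
        count += 1

# Invite every 50th visitor to the site to take the poll.
sampler = intercept_sampler(50)
invited = [v for v, pick in zip(range(1000), sampler) if pick]
print(len(invited))  # 20 invitations out of 1,000 visitors
```

The sampling frame, as Couper notes, is visitors to the site during the fielding period, so results generalize only to that population.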
Non-probability based methods include:
Entertainment polls: unscientific polls appearing on any manner of web sites representing the collective opinions of people taking the poll. An example of this could be CNN’s "Quick vote", which asks a daily question that site visitors can answer with a point and a click.
Unrestricted self-selected surveys: a web survey publicized via open invitations on portals, frequently visited web sites, or dedicated survey sites. These often have no access restrictions. Examples of this type of poll would be the National Geographic Society’s Survey 2000 and Georgia Institute of Technology’s GVU Survey.
Volunteer opt-in panels: panelists are recruited via banner ads at web sites and portals. Panelists are then invited to participate in either web or email surveys. Examples include panels available through NFO and Greenfield online.
A review of four journals that focus on advertising in general or advertising in new media in particular (Journal of Advertising, Journal of Advertising Research, Journal of Interactive Marketing and Journal of Interactive Advertising) showed that in the past two years the Internet was rarely used to collect quantitative survey data. Only three studies used online surveys, two with non-probability samples (unrestricted self-selected surveys and volunteer opt-in panels), and one using a probability sample.
Has the rapidly changing nature of technology affected the promises of the Internet for survey research, or increased the challenges? To answer this question, an assessment of the current state of the Internet vis-a-vis previously identified promises and challenges was undertaken.
Accessibility and cost challenges
The low cost of online research was seen as a promise, while sampling frame accessibility was seen as a potential limitation. Today, the Internet is no longer guaranteed to be a low-cost way to survey a probability sample. No national or global directories of email addresses exist, and directories such as those at yahoo.com will not generate lists of email addresses without the inclusion of one piece of information, such as a first or last name. Because of privacy concerns, many individuals opt out of being included in directories, which reduces the value of online directories to that of a white pages phone book.
It is possible to survey large numbers of online users via pre-recruited panels, yet accessing such panels comes at a relatively high cost. I was recently quoted $.50 per name for an email survey of online users 18 and older who had ‘opted in’ to receive unsolicited emails. Other pre-recruited panels (such as those available through the Pew Center for Internet and American Life) are of limited use to researchers, since only a few questions from a single researcher are included in any study. Additionally, researchers must submit questions to a review board, which then decides whether the questions will be included: there is no guarantee that the panel will be accessible to researchers.
More success may be achieved with a more specific population in mind, since this may result in better access to a reasonable sampling frame. For example, lists of individuals who belong to various associations are available for rental. Researchers who wish to examine attitudes and opinions of students, academics, physicians, and other targeted groups may find appropriate lists available to them. Other low cost ways exist to develop a sampling frame. For example, if one wished to study top management at advertising agencies, the Standard Directory of Advertising Agencies could be used to randomly select agencies for inclusion in the sampling frame, and the directory may even include email addresses of top executives. Obviously, this would require considerable labor to establish the list. For such studies, the survey topic must be appropriate and directly relevant to members of the sampling frame, since generalizability will be limited to the population of interest, not the population as a whole.
Response Rate and Generalizability Challenges
Regardless of the type of sampling frame, response rates to Internet-based surveys have been declining over the past ten years. An analysis of thirty-one studies published from 1986-2000 showed that email surveys generated an average response rate of 36.83%. Looking only at studies from 1997-2000, though, the average response rate was 29.42% (Sheehan 2001). A meta-analysis of 68 email and web studies showed an overall response rate of 39.6% (Cook, Heath, and Thompson 2000). A recent conversation with an administrator of an opt-in survey panel revealed average response rates of 8-10%. These rates are all lower than the average response rate to paper surveys, which Baruch (1999) estimated to be 55.6%. Both Sheehan (2001) and Cook, Heath, and Thompson (2000) found that follow-up contacts or reminders increased response rates, and Cook, Heath, and Thompson's (2000) analysis found that pre-notification also increased response rates.
To what can we attribute low response rates? Certainly, response rates to all manner of surveys have been decreasing in the United States over the past several decades. High response rates to early online surveys may be attributed to their novelty, and now that such surveys are no longer a novelty, response rates are low. Online users today receive seven times as many unsolicited emails as they did two years ago (Simpson 2002), and user concerns with unsolicited email and online privacy have reached a high level (Pastore 2002). Additionally, online users are constantly moving targets, changing email addresses much more frequently than they change postal addresses. Shannon and Bradshaw (2002) indicated that there is a much higher undeliverable rate for email surveys than for postal mail surveys, and this undeliverable rate must be assessed in the overall response patterns of the survey.
Low response rates may affect the generalizability of online studies. To compensate for low response rates, many researchers collect large numbers of responses in an attempt to obtain a more representative sample. Cook, Heath, and Thompson (2000) suggested that representativeness of the sample is much more important than the response rate a survey attains. Studies using non-probability samples rest on the rationale that the larger the actual number of responses, the more representative the responses.
However, a large number of responses does not guarantee a stable and generalizable sample. Tierney (2000) compared demographic information provided by visitors to a tourism web site with the demographic information available about the site's actual population. He found that large sample sizes do not compensate for low response rates: survey respondents were not representative of average visitors to the web site. This finding was based on the discrepancy between the percentage of respondents who ordered a state visitors guide (45.9%) and the web site traffic statistics on the number of visitors who clicked on the brochure order form (1.5%). This suggests that it does not matter how large a sample an Internet-based survey acquires if respondents differ significantly from nonrespondents. Tierney (2000) recommends that assessments be done to gauge the representativeness of samples collected with non-probability techniques or with low response rates.
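Tierney's representativeness check reduces to comparing a behavior's rate among survey respondents against its rate in the full visitor population taken from server logs. A sketch using the figures he reported; the function name is an illustrative label, not his terminology:

```python
def representativeness_gap(respondent_rate, population_rate):
    """Ratio of a behavior's rate among survey respondents to its
    rate in the full visitor population (e.g. from server logs);
    values far from 1.0 signal that respondents differ from
    nonrespondents."""
    return respondent_rate / population_rate

# Tierney (2000): 45.9% of respondents ordered a visitors guide,
# versus 1.5% of all site visitors clicking the order form.
gap = representativeness_gap(0.459, 0.015)
print(round(gap, 1))  # respondents were roughly 30 times as likely to order
```

Any behavior logged for all visitors (a click-through, a download, an order) can serve as the benchmark, making this a cheap diagnostic for self-selected web samples.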
In contrast, Witte, Amoroso, and Howard (2000) support the use of non-probability samples. They suggest that "randomness does not guarantee representativeness: rather, it provides the means to quantify the level of confidence with which one can say that the sample represents the population" (p. 188). They describe sample designs, such as those used for Nielsen television ratings, that increase the probability of responses from different groups (e.g. minorities), and argue that non-probability samples can use external benchmarks to construct adjustment factors that allow researchers to treat the sample as if it were a random sample. For example, a web-based survey done on behalf of the National Geographic Society not only compared the demographic characteristics of respondents to United States demographics as a whole, but also gauged similarities between online respondents and the population as a whole by asking a series of questions about music preferences that were comparable to an offline study of the same topic. Comparisons in music preferences were seen as providing analytical leverage to construct plausible adjustment weights that would make the online survey results generalizable. While the authors admit this approach is imperfect, they suggest that it affords the means to generalize further from the web-based survey to the population at large (Witte, Amoroso, and Howard 2000).
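The adjustment-factor idea Witte, Amoroso, and Howard describe can be illustrated with simple post-stratification weighting: each group's weight is its benchmark population share divided by its share of the sample. A minimal sketch, not their actual procedure; the gender counts and benchmark shares below are hypothetical:

```python
def poststratification_weights(sample_counts, population_shares):
    """Compute a weight per stratum so the weighted sample matches
    known population shares (the external benchmark):

        weight_g = population_share_g / sample_share_g
    """
    total = sum(sample_counts.values())
    weights = {}
    for group, count in sample_counts.items():
        sample_share = count / total
        weights[group] = population_shares[group] / sample_share
    return weights

# Hypothetical: men are overrepresented in an online sample
# relative to a census benchmark.
sample = {"men": 700, "women": 300}
benchmark = {"men": 0.49, "women": 0.51}
w = poststratification_weights(sample, benchmark)
print(w)  # men downweighted (~0.7), women upweighted (~1.7)
```

Weighting corrects only for the variables in the benchmark; if respondents differ from nonrespondents on unmeasured attitudes, the adjusted estimates remain biased, which is the imperfection the authors acknowledge.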
The range of technologies and programs used to develop web and email interfaces has expanded greatly in the past several years. However, initial studies of design issues show that design can affect survey participation. Couper (2000) states that we must use survey design features judiciously to maximize data quality and minimize error. Several studies have identified systematic effects of design on respondent behavior, particularly in web surveys. A study by Couper, Traugott, and Lamias (2001) found that changing the length of an entry box for a series of otherwise identical items changed the distribution of forced-choice responses. Complex graphics and other design choices intended to make surveys more appealing may increase download time and subsequently lower survey completion rates. Other factors that may contribute to non-completion include open-ended questions and questions arranged in tables (Bosnjak and Tuten 2001). A lack of familiarity with web ‘forms’ in general (such as pull-down boxes) may also contribute to increased drop-out rates. Evidently, perceptions of ‘ease of use’ of web surveys are in the eye of the beholder, and better understanding user comfort levels with web forms is important to developing appropriate instruments for data collection.
Before the Internet can fulfill its promise for online research, several areas must be examined further in order to address concerns that appear to limit the research potential of the Internet. First, it will be important to investigate the usefulness of non-probability samples, specifically in terms of the generalizability of such samples. One of the most efficient ways to generate a large number of responses to a survey continues to be by using a Web survey with announcements pointing potential respondents to the site at different places online (such as through banner ads, listserv postings, and the like). In order to better understand non-probability samples, comparison studies between results of a probability and non-probability sample could be undertaken to understand similarities and differences in the respondent pools, the overall survey results, and the surveys’ completion rates. This will help to answer questions regarding the generalizability of non-probability samples.
We should also more closely examine opt-in panels to determine the types of studies that might be appropriate for such non-probability samples. While it has been suggested that these panels are no better than convenience (e.g. student) samples, arguably such panels serve the function of eliminating those individuals who are highly unlikely to respond to the survey, and may result in a sample comparable to probability samples. Additionally, the panels may be as robust as student samples and could be used for certain types of research for which student samples are commonly used. For example, a split opt-in panel experiment could test how different type sizes on a banner advertisement affect awareness, recall, and click-through.
We also need more study of the use of multiple survey modes for a single survey, especially when the target population is a large public (Yun and Trumbo 2000). Multiple modes may be highly effective in generating generalizable responses; Yun and Trumbo (2000) found no significant influence of survey mode in substantive analyses of data collected via three different modes (paper and pencil, email, web). The researchers suggest that multi-mode survey techniques will improve the representativeness of the sample without biasing other results. Studying results from multiple-mode samples will also shed some light on the design concerns discussed earlier.
One ‘promise’ of the Internet was fast response time. Are responses still returned quickly, and if so, does it really matter? An assessment of email surveys showed that very few studies published after 1999 reported response speed (Sheehan 2001). It could be that speed of response is taken for granted today. Another reason could be that speed is not that important for academic studies, given the long time it takes to get a study completed and published. Immediate responses are beneficial only if early respondents represent the respondent group as a whole, and studies have shown that this may not be true (Shannon and Bradshaw 2002).
As the amount of unsolicited email appearing in mailboxes increases, it seems reasonable to assume that the volume of unsolicited email correlates negatively with intentions to participate in email surveys. Concerns with online privacy in general, and information collection in particular, may also affect propensities to respond to web and email surveys. Studies investigating this relationship can help shed light on why online survey responses may differ from those of the general population. Studies could also investigate whether there are ways to improve the image of online research among prospective respondents.
Finally, we need to ascertain what types of resources and opportunities are available to investigate these questions, as well as to provide a method for advertising researchers to collect data online. Government agencies like the National Science Foundation are starting to realize the importance of the Internet to academic researchers. For example, the NSF has recently implemented the TESS (Time-sharing Experiments for the Social Sciences) program. This project is designed to provide social scientists across the country with opportunities for original data collection, doing so in a way that increases the speed and efficiency with which advances in social scientific theory and analyses can be applied to critical social problems. To accomplish its goals, TESS uses two large-scale, cooperative data collection instruments: an ongoing national telephone survey and a study using random samples of the population interviewed via the Internet and WebTV. Investigators interested in mode-of-interview effects can use both data collection platforms simultaneously. While TESS is currently used only for experiments, a similar program for survey research would be an attractive (and probably very popular) option.
Opportunities to learn more about multi-mode potential may exist at many on-campus survey centers. Such survey groups are equipped to handle random digit dial telephone surveys and could potentially be used to recruit panelists for email surveys. While these facilities are limited in that they tend to survey only a small region of the country, using a randomly generated sample to investigate some of these issues would at least be a start.
I still believe in the promise of the Internet for quantitative research and quantitative researchers. One of the most intriguing and frustrating aspects of the Internet is that it is difficult to use what has happened in the past to predict what will happen in the future. As researchers, though, that should encourage us to be creative, to try something new and take research risks to advance our understanding of this dynamic medium.
Baruch, Yehuda (1999), "Response rates in academic studies–a comparative analysis," Human Relations, 52 (4), 421-434.
Bosnjak, Michael and Tracy L. Tuten (2001), "Classifying Response Behaviors in Web-based Surveys," Journal of Computer-Mediated Communication, 6 (3), <http://www.ascusc.org/jcmc/vol6/issue3/boznjak.html>.
Cook, Colleen, Fred Heath, and Russell Thompson (2000), "A Meta-Analysis of Response Rates in Web- Or Internet-Based Surveys," Educational and Psychological Measurement, 60 (6), 821-836.
Couper, Mick P. (2000), "Web Surveys, A Review of Issues and Approaches," Public Opinion Quarterly, 64 (4), 464-494.
Couper, Mick P., Michael W. Traugott, and Mark J. Lamias (2001), "Web Survey Design and Administration," Public Opinion Quarterly, 65 (2), 230-253.
NUA (2002), "How Many Online?," <http://www.nua.com/surveys/how_many_online/index.html>.
Pastore, Michael (2002), "Email Continues Dominance of Net Apps," <http://www.cyberatlas.com>.
Shannon, David M. and Carol C. Bradshaw (2002), "A Comparison of Response Rate, Response Time and Costs of Mail and Electronic Surveys," The Journal of Experimental Education, 70 (2), 179-192.
Sheehan, Kim Bartel (2001), "Email Survey Response Rates: A Review," Journal of Computer Mediated Communication, 6 (2) <http://www.ascusc.org/jcmc/vol6/issue2/sheehan.html>.
Sheehan, Kim Bartel and Mariea Grubbs Hoy (1999), "Using E-mail to Survey Internet Users in the United States: Methodology and Assessment," Journal of Computer Mediated Communication, 4 (3) <http://www.ascusc.org/jcmc/vol4/issue3/sheehan.html>.
Simpson, Pete (2002), "Spam, Spam, Spam" <http://www.mimesweepter.com/news/inthenews/items/101.asp>.
Tierney, Patrick (2000), "Internet-Based Evaluation of Tourism Web Site Effectiveness," Journal of Travel Research, 39 (2), 212-219.
Witte, James C., Lisa M. Amoroso, and Philip E. N. Howard (2000), "Method and Representation in Internet-Based Survey Tools-Mobility, Community and Cultural Identity in Survey2000," Social Science Computer Review, 18 (2), 179-195.
Yun, Gi Woong and Craig Trumbo (2000), "Comparative Responses to a Survey Executed by Post, Email and Web Form," Journal of Computer-Mediated Communication, 6 (1), <http://www.ascusc.org/jcmc/vol16/issue1/yun.html>.
Kim Bartel Sheehan (Ph.D., University of Tennessee-Knoxville) is an Assistant Professor of Advertising at The University of Oregon. Her research interests focus on new technology, specifically privacy concerns, information overload, and using technology for research.
* This is an invited article
Copyright © 2002 Journal of Interactive Advertising