Journal of Interactive Advertising, Volume 1, Number 2, Spring 2001
The objective of this paper is to provide some background for those interested in the Internet audience measurement industry, lay out the alternative methods employed, and provide greater detail behind the methodology used by the leading Internet audience measurement service. Alternative measurement methods are discussed. Each of the active commercial audience measurement firms is constantly striving to refine and improve its methodologies to match the needs of a very rapidly changing industry, and some of these improvements are discussed. This paper concludes that considerable innovation will occur in the five years ahead, just as radical changes have occurred in the five years past.
In 1994 the Internet was a largely unknown technology to most consumers, although the buzz was beginning in advertising circles. The closest experience most consumers had was the commercial online services: CompuServe, Prodigy, and the rapidly growing America Online. A smattering of private bulletin board systems was also available to consumers, but few provided consumer-ready access to the Internet. For that matter, even Internet-addressable email was only just becoming available on these systems, and even then only in the latter part of 1994.
The Prodigy service was the first of the “big three” online services to provide a workable Internet interface for its subscribers. At the then-typical connection speed of 2400 to 9600 baud, the experience for most consumers was a mix of wonderment and boredom. Seeing a page served from an exotic, faraway place such as Argentina was fantastic. The fantasy quickly paled after realizing that each link would load another page that would take another three to five minutes to download. The Internet was not well suited to consumer use at the time owing to the very narrow bandwidth and the lack of organized content. The commercial online services were far better at working within the bandwidth limits, and provided a great deal of organization to the content they offered. Without question, users of the commercial online services far outnumbered Internet users in the mid-1990s.
The commercial online services derived most of their economic value through customer subscriptions to the services. Customers paid for content access and connectivity. Prodigy was the innovator in driving advertising revenue, but relied heavily on subscription dollars to keep things moving forward.
External audience measurement of the commercial online services was not in demand at the time. The online services had (and still do!) an excellent subscriber counting mechanism in the form of subscription accounts with which to keep track of their performance. Standard accounting and internal metrics met most of their measurement needs. A simple periodic survey was sufficient to gauge relative market share, and none but the curious were otherwise interested in the size and composition of each firm’s audience, principally because no significant third-party decisions were being made based on the estimates.
In the summer of 1994, less than 15% of U.S. households had a working computer and modem in the home. Further, only one-third of those had a subscription to one of the commercial online services. There simply was not yet a sufficient audience to consider the online media viable advertising alternatives.
By 1995, however, Prodigy had launched its web interface, as had CompuServe and America Online. PC ownership and commercial online service subscriptions were rising rapidly, in large part due to AOL’s easy-to-use, Windows-based interface.
The advertising industry was frantic to learn as much as possible about the emerging medium, as significant growth now appeared imminent. The words “Internet”, “World Wide Web” and “dot.com” were on everyone’s lips – especially among those in the media. The hype in so many of the discussions about the Internet was nearly deafening. Previously sensible business people were making imprudent decisions out of great fear. Would the Internet spell the death of the print industry? Would the Internet cause erosion of television prime time audiences? Would the Internet change the very social fabric of our country? Anyone not engaging in these conversations with enthusiasm was labeled as someone who “doesn’t get it.” While there was great excitement about the possibilities, there was even greater fear of the changes that the Internet might bring.
Technical advances were providing more and more people with easier access to the Internet. Investment began flowing into Internet companies, particularly in Silicon Valley, which fueled further growth and media coverage.
The convergence of these factors, significant growth in the audiences and intense interest in and concern about the Internet from the media, created the demand for a world-class audience measurement system for the Internet.
Today, Internet audience measurement is used for three main purposes. The first may be called self-promotion. It is important for organizations to be able to make claims about the size and growth of their audiences or technologies. While internal records are very valuable, they are often not audited for external use and it is difficult to compare the results with those of competitors. Readers of the claims need the context of competitive comparison, and also need to be assured that the results are not improperly biased in favor of the claimant.
The second purpose, which was the driver behind the author’s efforts in launching Media Metrix, is to support advertising planning, buying, selling and posting. Organizations offering Internet media opportunities to advertisers or their agencies use audience measurement data to help position and sell the inventory. This is the same role that television ratings, radio ratings, and magazine audience estimates play for their respective media. The planner uses ratings data to sort through the many different options available, to identify those that are better values for the target audience. Two media options may be available at the same CPM (cost per 1,000 impressions), for example. However, 75% of option A’s inventory is consumed by women 18+, while only 30% of option B’s inventory is consumed by women 18+. If the advertiser is selling a female-oriented product, option A is significantly more valuable than option B at the same price, since more of the ad impressions will be delivered to a relevant demographic group. These calculations are sometimes incorporated into the buying/selling process as prices are being negotiated. The last stage in the cycle, as yet underdeveloped on the Internet, is the posting process. After a buy has been made and executed, Internet audience measurement can be used to determine the effectiveness of the campaign in terms of reaching the target group, measured by reach or penetration (how many different people were reached?) and average frequency (how many times did each person reached see one of the ads?).
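To make the arithmetic behind this comparison concrete, the short sketch below (not part of any measurement service; the figures simply restate the example above) computes the effective cost per 1,000 target impressions for the two options.

```python
# Illustrative sketch: comparing two media options priced at the same CPM but
# with different target-audience composition, as in the women 18+ example above.

def cost_per_thousand_target(cpm, target_composition):
    """Effective cost per 1,000 impressions delivered to the target group.

    cpm                -- price per 1,000 total impressions
    target_composition -- share of impressions consumed by the target group (0-1)
    """
    return cpm / target_composition

option_a = cost_per_thousand_target(cpm=10.0, target_composition=0.75)
option_b = cost_per_thousand_target(cpm=10.0, target_composition=0.30)

print(f"Option A: ${option_a:.2f} per 1,000 women 18+ impressions")  # $13.33
print(f"Option B: ${option_b:.2f} per 1,000 women 18+ impressions")  # $33.33
```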
The third application of Internet audience measurement data is in strategic planning. The data are a treasure trove of information once properly mined. Knowing the patterns of consumer behavior, how consumers interact with a particular site or group of sites, can help site managers make decisions that greatly improve the site’s traffic flow and its achievement of its objectives.
There are three primary methods of Internet audience measurement in use today, with each having several variations. These are:
- Measurement from a sample of users who are metered (electronic measurement)
- Measurement from a sample of users who are surveyed (recall measurement)
- Measurement from analysis of server log files or their equivalents
Each of these measurement approaches has strengths and weaknesses, and they can generally be viewed as working in somewhat complementary ways, although each is more appropriate in certain situations than the others.
Log file analysis entails a study of the server log files. Each time a file is requested from a website, its server records the request and its subsequent actions in a log file. Skillful and careful analysis of the log files can reveal tremendous insights into the nature of the audience for that site. The key strength of this method is that the information source is practically a census of the activity on the site. That offers the analyst the ability to study very narrow activity levels, looking at page-by-page counts of traffic. The first risk, however, is that not all activity is logged on the server, due to local, proxy, ISP, and regional caching. Thus, what was perceived as a census is missing some amount of the actual traffic.
The second challenge is that little is known about the entity requesting the file. The industry has progressed significantly from the early days on this front, however, by making good use of IP addresses, cookies, and page-embedded measurement markers. At best, though, machines are being tracked, not the people behind them. Geography has emerged, surprisingly, as one of the most difficult challenges. Resolving all IP addresses to an actual country of origin is quite “costly” in the analysis, and is usually best accomplished by making assumptions about the non-resolved addresses (up to 40% of traffic in some cases). The third challenge to log file analysis is sorting out actual users from robots and spiders. Many organizations have created computer programs, called robots or spiders, that automatically surf the Internet to gather information on various sites. These robots generate significant log file traffic and are often disguised in order to gather the information covertly. A good example is that of e-commerce site A, which has set up a system of robots that go to e-commerce site B 24 hours a day, shopping for a standard basket of goods to determine when site B has changed prices. The last challenge facing log file analysis is that comparisons across sites are difficult. There is no agreed-upon standard for performing the analysis that is implemented by all the players. Steps in the right direction have been taken here, but most of the larger sites have developed custom log file solutions to meet the needs of their own businesses. The results of log file analysis are generally regarded as company confidential information, and are therefore not broadly shared – making comparisons across competitors difficult.
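As a rough illustration of the kind of processing involved, the generic sketch below (not any particular vendor’s tool; it assumes logs in the common/combined log format and uses an invented robot list) counts page requests while discarding requests from openly declared robots. Disguised robots, caching, and IP/cookie resolution remain the harder problems described above.

```python
# Illustrative log file analysis: count page requests per URL from a web server
# log in combined log format, skipping requests whose user-agent identifies a
# known robot or spider.
import re
from collections import Counter

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)
KNOWN_ROBOT_AGENTS = ("bot", "spider", "crawler")  # crude, illustrative list

def page_counts(log_lines):
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        agent = (match.group("agent") or "").lower()
        if any(marker in agent for marker in KNOWN_ROBOT_AGENTS):
            continue  # drop declared robots; disguised ones still slip through
        if match.group("url").endswith((".html", "/")):  # count pages, not images
            counts[match.group("url")] += 1
    return counts
```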
Nonetheless, log file analysis plays a very important role in the industry. It can provide immediate and exacting detail as to what is happening on the site for the managers of the site. The data can be used in promotion and advertising sales, but comparisons to other sites are difficult to make.
These studies draw a sample of Internet users and then query the respondents through standard survey methods. This could be done through telephone, in-person, mailed, or web-based interviews. The advantage of this approach is that pertinent, definitive detail about individual users can be captured, such as age, sex, income, geography, and so on. Moreover, other useful advertising data can be collected, such as attitudes and awareness about certain technologies, hobbies, lifestyles, and intent to purchase key products (such as automobiles) in the near term. The attitudinal, lifestyle, and demographic data combine to form powerful targeting tools.
To finish the job, the surveys must also inquire about usage of specific sites or media properties on the Internet. The survey method for measuring a specific site’s audience is frustrated by three factors.
First, overclaiming is a significant problem for very well known properties, as sites with very high brand awareness are often claimed when no actual usage took place. Additionally, underclaiming is likely for sites with generally low awareness. Depending on the data collection instrument, it can be very difficult to unambiguously describe a site to the point where a respondent would know whether or not he or she had actually visited it. Clearly, sites with good branding have a significant “advantage” in the results.
Second, social desirability (or undesirability) can have a powerful influence on claimed usage. Visitation to adult content sites, for example, will naturally be under-reported, especially when a live interviewer is involved. Similarly, sites with a perceived attractiveness can be overclaimed, resulting from a respondent’s natural tendency to please the interviewer.
Third, since usage estimates will be based on recall, naturally occurring errors in memory will affect the results. Along those lines, telescoping of time is a common problem in survey research. It is difficult for respondents to answer with precision whether or not they have visited a site “in the past 30 days”, especially if the most recent visit actually occurred between 30 and 45 days ago.
At the same time, these large-scale surveys are very valuable devices for understanding the composition of audiences at certain kinds of sites. Good targeting objectives can be set up (e.g., high-income professionals contemplating purchasing a new luxury car in the next 12 months). These surveys are often the best solutions for inter-media comparisons as well, helping planners understand how various Internet media properties fill in a campaign built from traditional media.
These large-scale studies recruit a sample of Internet users, install a software meter to electronically and passively record usage of the computer and the Internet, and automatically transmit the data back to a central office for tabulation and projection. Properly executed, this technique identifies the specific individuals using the machine, to ensure that the measurements taken are attributable to individuals across machines within a household, rather than to the machines themselves. This is an essential characteristic for media measurement, since it is people who consume the medium, not machines. This is the method employed by Media Metrix, and it is the focus of the remainder of this paper.
This method of audience measurement has the advantage of accurately measuring actual usage of websites URL by URL, as well as digital media applications, such as America Online, instant messenger products, etc., keystroke by keystroke. Measurement of non-consumer-branded media, such as exposure to the Doubleclick advertising network, can be gauged through observation, rather than being forced to rely on a consumer’s ability to recall and report seeing an ad served by the Doubleclick network (virtually impossible).
Given that all sites are measured by the same yardstick, inter-site comparisons are easily made, allowing for reliable rankings. Demographic details are known about each of the users in the sample, allowing for audience composition calculations. Click-by-click, keystroke-by-keystroke data are available for each user in the sample, allowing great flexibility in analysis.
The disadvantage of this method, versus the others, is that it is based on a limited sample of individuals, although the commercially available services today have built extremely large samples. Current solutions can measure audiences reliably down to approximately 50,000 unique visitors per month. Additionally, the representativeness of the sample must be managed very carefully. Solid, proven methods of statistical sampling and sample maintenance are necessary to ensure that the sample is representative of the universe.
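To give a feel for why a reliability floor of this kind exists, the sketch below computes the approximate relative standard error of an audience estimate under a simple random sampling assumption. The panel and universe sizes are invented for illustration; they are not Media Metrix figures.

```python
# Illustrative only: approximate relative standard error of a unique-audience
# estimate from a simple random panel. All sizes below are assumptions.
import math

def relative_standard_error(site_visitors, universe_size, panel_size):
    p = site_visitors / universe_size       # proportion of the universe visiting the site
    expected_in_panel = p * panel_size      # panel members expected to visit
    se_of_p = math.sqrt(p * (1 - p) / panel_size)
    return se_of_p / p, expected_in_panel

rse, n_in_panel = relative_standard_error(
    site_visitors=50_000, universe_size=80_000_000, panel_size=60_000)
print(f"Expected panel visitors: {n_in_panel:.0f}, relative standard error: {rse:.0%}")
```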
While log files measure all activity around the world, samples measure only those geographies that are specifically sampled. Unfortunately, it is not always economically viable to recruit and install samples for all geographies of possible interest. The major Internet markets around the world are presently being measured. Media Metrix and its affiliates, for example, are measuring Internet and digital media usage in the United States, Canada, the U.K., France, Germany, Sweden and the Nordic countries, Switzerland, Spain, Italy, Brazil, Argentina, Mexico, Japan, and Australia, with others under consideration.
There are five steps to measuring Internet audiences using a sample of individuals. The first is to define the universe of individuals and their circumstances, and to be very clear on which behaviors are to be measured. This is a very important step, as it serves as the foundation on which all measurement follows. Once a universe is defined, it must be measured in its own right.
Next, a representative sample of universe members must be recruited. The behavior of this sample will be projected to the previously described measured universe. The recruited sample must be interviewed to capture personal and household-level demographic characteristics for subsequent analysis. The sample must also be installed with the electronic measurement system, typically the meter. Once meters are installed, a series of edit rules must be developed and applied to the data that are returned to the central office. These edit rules determine which respondents are reliably returning usage data. The last step is to weight the sample to correct any demographic biases in the installed, in-tab sample versus the universe estimates, and then project the weighted sample to the universe.
The usage behavior of the weighted and projected sample is then a representative portrayal of the behavior of the defined universe. Behavioral patterns observed in the weighted, projected sample are assumed to reflect behavioral patterns in the universe.
This method is used by most of the commercial Internet audience measurement companies, as well as by the audience measurement firms in television, radio, print, newspaper, and other media. Naturally, each firm will employ somewhat different techniques that differentiate the offerings, but the general principle is applied fairly universally.
This is not as simple as it seems. The defined universe must be measurable both by means of standard survey methods and by means of a recruited, metered sample. The first consideration is to define whether the universe will be actual users in a specific time period, or those who have access (that is, those who could be users). A definition of actual users (Media Metrix employs a “past 30 days usage” definition) has reduced ambiguity, in that respondents are able to understand definitively and uniformly the question, “Did you, personally, use the Internet in the past 30 days or not?”
A universe definition tied to access, rather than recent usage, does have certain advantages, however. First, the population’s access to the Internet is less volatile than actual usage. From research conducted by Media Metrix, we know that 15% of each month’s Internet audience will not use the Internet in the next month, replaced by a separate set of users who had not used it in the previous month. Therefore, it would be advantageous to project a sample of persons with access to a stable universe of those with access, and then observe in the weighted, projected sample the number of actual “past 30 day users.” The challenge, however, is to define “access” in an unambiguous and uniform way that can be understood by all survey respondents, and from which a sample of individuals with “access” could be recruited. The real challenge is that virtually everyone in the United States has “access” to the Internet if he or she were to make getting on the Internet an imperative. An operable definition must be laid out that can be mirrored in the universe estimate study and in the installed sample.
Usage in a specified time frame, or access, is only the first step. Next, limits on the usage need to be defined. As a practical matter, metered samples are difficult to set up in academic environments, in public usage areas (such as Internet cafés or kiosks), and in work environments. The researcher must define which locations of usage are to be included in the universe, and then map the universe estimate study to the sample. Since 1997, Media Metrix has operated its sample to reflect usage of the Internet At Home and At Work. This means there are three components to the total sample – those who are installed with the meter only at home, those who are installed only at work, and those who are installed both at home and at work. Obviously, managing the samples by location is critical, given the volume of Internet usage occurring at work.
The third component of defining the universe is to choose which behaviors are to be included in the universe definition. Internet usage includes usage of the World Wide Web, email, streaming media, FTP or other file transfers (such as those made recently famous by Napster), commercial online services, instant messaging products, multi-player games, and other client-server applications.
Some of the commercial providers of Internet audience measurement define their measured universes to be strictly usage of the World Wide Web. This is unfortunate, especially in the United States, since America Online’s proprietary service represents the largest segment of Internet users and would be specifically excluded from such a definition. Instead, the broadest definition possible is best, so that all of these behaviors can be measured and understood. The challenge facing the commercial providers is to keep up with the rapidly changing technologies. Media Metrix began its audience measurement efforts by including and reporting all software applications, which, while including America Online, CompuServe, Prodigy and others, also included software titles such as Microsoft Word, Excel, and other applications. Today, we are uniquely continuing to evolve a definition of “Digital Media”, which includes most applications that make use of the Internet as a means for connecting media and consumer, with an eye towards advertising support.
In summary, the universe definition used by Media Metrix is all those persons aged 2+ who have personally used the Internet in the past 30 days at home or at work, where the Internet is defined as any of the World Wide Web, the commercial online services, instant messaging products, and certain advertising-supported email applications. This definition is under constant review by the organization, as the industry continues to evolve.
Given a workable definition of the universe, the next step is to develop an estimate of the size of that universe. This is typically done through monthly enumeration studies, conducted via RDD telephone surveys. (RDD, an acronym for Random Digit Dialing, is the most common high-quality method of conducting enumeration studies; it is a method of random sample selection from the universe of telephone households in the United States.)
The role of the Enumeration Study is to provide the researcher with the monthly (or periodic) universe estimate to which the installed, metered sample will be projected. Thus, this is an independent sample. Most organizations conduct effectively continuous interviews from replicates of random telephone numbers, and then tabulate the results monthly to develop estimates for the universe, as defined by the research organization. These surveys tend to be very complex, in that they must precisely define the universe to the respondent so that accurate answers can be gathered. A clear definition of the medium, the location of use, the recency of usage by location, and other factors all go into the questionnaire design.
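A stylized example of how an enumeration study’s results translate into a universe estimate is sketched below. All of the figures are invented, and the real questionnaire and projection procedures are considerably more involved.

```python
# Illustrative sketch of turning enumeration-study results into a universe
# estimate. All figures below are invented for the example.

def universe_estimate(respondents, qualified, population_base):
    """Project the share of survey respondents meeting the universe definition
    (e.g., 'used the Internet in the past 30 days at home or at work')
    onto a known population base."""
    incidence = qualified / respondents
    return incidence * population_base

estimate = universe_estimate(
    respondents=5_000,            # completed RDD interviews this month
    qualified=2_100,              # met the past-30-day usage definition
    population_base=265_000_000,  # persons 2+ in U.S. telephone households (assumed)
)
print(f"Estimated universe: {estimate:,.0f} persons")
```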
The core of the research lies in the sample that is installed with the data collection instrument, or meter. To provide reliable estimates for clients of the service, the sample must be representative of the universe it is aiming to measure. This means that a good random sample must be generated, biases in the sample must be minimized at every possible turn, and the highest manageable cooperation rates must be achieved.
RDD is again at the foundation of the sampling process for the leading Internet audience measurement organizations. Other methods are also in use, which will be discussed later. The method employed by Media Metrix is described below.
The RDD method begins with a very high quality list of randomly generated telephone numbers from working telephone exchanges. Those telephone numbers are then passed against reverse telephone directories to match names and addresses to the phone numbers where possible. Typically, about 50% of the phone numbers match. Keep in mind that the phone numbers are randomly generated from working exchanges – they are explicitly not exclusively “listed” phone numbers, as that would introduce a bias. The list is then split in two: the mailables and the non-mailables.
The mailable sample is sent a recruitment package via mail. The recruitment package outlines the program of audience measurement, explains the benefits and incentives of becoming a member of the panel, and urges the recipient to go to the recruitment website to register (record demographics) and download and install the meter. The non-mailables, and those mailables who fail to self-install, are sent to the telephone center, where they are called and recruited by a live telephone operator. The recruiter explains the program, invites the user to visit the website, and also offers to mail recruitment literature. This approach maximizes the yield of the list: those persons who refuse to take solicitous phone calls are still reached with the mail method, and those who automatically discard mail solicitations are given a second chance via the phone.
Other organizations have used alternative methods for metered sample recruitment. A very inexpensive method is to use web banner advertisements to entice users into a program. This has been conducted with varying degrees of success by several organizations. The risks with this method are two-fold. First, the probability of seeing any banner, or banner campaign, is directly proportional to the user’s total amount of Internet usage. A person who spends 20 hours per month on the Internet is 20 times as likely to see a particular banner campaign as one who spends only one hour per month online. Thus, banner recruitment introduces an obvious and correlated bias into the sample toward heavy Internet usage that must be addressed in some way, if possible. The second bias introduced is even more difficult to overcome. If all the recruitment banners were presented on, say, the portal Yahoo!, all of the recruited respondents would be known to have visited Yahoo!. The behavior of those Yahoo! visitors (with a bias toward heavy Yahoo! visitors) is certainly not representative of Internet users in general. Thus, a bias not only toward heaviness of usage, but toward or away from specific sites, is also introduced with this methodology. The allure of this method is its low cost of respondent acquisition versus the RDD method. Larger samples are appealing, but the quality tradeoffs in the form of specific, known biases are substantial, and bring into question whether the true cost is actually lower.
One solution proposed for this is to spread the banners across many different sites, and to supplement banner recruitment with other kinds of recruitment. While this may help disguise some of these biases, one cannot rely strictly on luck or hope to wash out other less obvious, but no less dangerous, biases that will undoubtedly be introduced. This is a rich area for future research, in which a measurement and an understanding of those biases could be developed, and scientific methods to overcome those biases might be found.
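The toy simulation below illustrates the usage-proportional bias described above, under the simplifying (and hypothetical) assumption that the chance of being recruited is directly proportional to hours spent online. The figures are invented.

```python
# Small simulation of banner-recruitment bias: if the chance of seeing (and
# responding to) a recruitment banner is proportional to hours online, heavy
# users are over-represented in the resulting panel.
import random

random.seed(1)
population = [random.choice([1, 5, 20]) for _ in range(100_000)]  # hours online/month
true_mean = sum(population) / len(population)

# Probability of recruitment proportional to usage.
recruited = [hours for hours in population if random.random() < hours / 200.0]
panel_mean = sum(recruited) / len(recruited)

print(f"True mean hours online:  {true_mean:.1f}")
print(f"Banner-panel mean hours: {panel_mean:.1f}")   # noticeably higher
```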
Once the sample is recruited and installed, its ongoing maintenance is the next concern. To be of greatest value, it is important to keep recruited panelists in the program for an extended period of time, in order to study behavioral changes individual by individual. Thus, it is desirable to have limited voluntary turnover in the sample. However, with the rapid growth of the Internet audience, it is important to ensure a steady flow of new respondents into the sample, so that “newbies” are properly represented. Additionally, as respondents acquire new PCs, or gain or lose access to the Internet at work, it is important to follow up to ensure that all measurable locations are indeed measured.
Media Metrix installed the first meter, then called the “PC Meter”, into a consumer sample in June 1995. Since then, many other solutions have been proposed for metered measurement, and some of those have even been deployed in large-scale samples.
While there are different solutions to capturing usage behavior, the commercialized methods fall into three categories: those based on the Operating System, those based on a Local (client-based) Proxy, and those based on a Central Office Proxy. Each is described briefly, in reverse order.
The Central Office Proxy method sets up the respondent’s browser to proxy through a controlled proxy service at a central office. All file requests to the Internet are passed through the proxy server, where they can be recorded and attributed back to the respondent’s PC. This is the equivalent of an ISP recording all file requests by its subscribers. Unless a more developed solution is utilized, the Central Office Proxy method cannot determine which individual is at the PC.
The Local Proxy method sets up a virtual proxy server on the respondent’s PC, where all file requests from the browser are passed first to the local proxy, which records each request and then passes it on out to the Internet. This method captures all of the file requests, but must be incorporated into each browser version on the panelist’s PC. When new versions of browsers are introduced, new setup and installation software must be created and distributed to the sample. As new browser versions have been introduced with some regularity over the years, and we can only expect greater innovation in the months and years to come, this method requires a high degree of diligence on the part of the panel operator to ensure that the sample stays fully installed. The key shortcoming of this method, however, is that only web surfing file requests are captured. Other uses of the Internet are missed. Usage of the America Online proprietary service, for example, is not observed through a proxy cache. Instant messaging products, such as Yahoo! Instant Messenger, generate considerable activity, but are simply not observed by a Local Proxy meter.
The Operating System meter is the method employed by Media Metrix. This meter binds to the operating system of the respondent’s computer. Since it is in the system at such a deep layer, it is able to keep track of the activity of all applications on the PC. Whenever any application renders a web page, whether it is a browser or some other application such as Real Player, the meter is able to see that a page is being rendered and it quickly records the associated URL.
When the user first boots up the computer, the meter presents its User Interface screen, which asks the current user to identify himself from the list of registered persons for the machine. A simple mouse click selects the user, and the meter drops into the background to do its work. Behind each user’s name is the date of birth and gender. All subsequent usage of the machine is attributed to that user within that household.
This screen presents itself at machine boot-up, and also after 30 minutes of machine idle time. While the meter is in the background, it remains an active and visible icon in the system tray at the bottom of the Windows screen. If one user should replace another at the keyboard, a click on the system tray recalls the meter’s User Interface so that the new user can be identified and credited with the subsequent usage. Properly identifying individuals, especially within the home environment, is important, as Media Metrix has observed an average of 1.8 different web-using individuals within each web-using household in the United States.
While in the background, the MMXI Meter records five channels of information. The first is the machine state, meaning whether the machine is in active use, is idle, or has “gone to sleep”. The second is a continuous record of the software application in focus. Inside the Windows operating system, for example, only one software application has the focus of the machine at a time. This is the application that can “see” the user’s typing, is responsive to mouse clicks, and typically is the application that has the blue bar across the top. As a user shifts from one application to another, the meter records the date and time of each such shift to its log file of information.
The third channel of information is, for certain pre-determined applications, the capture of the contents of the window title bar. The contents of the window title are useful in determining what is happening inside certain pre-selected applications. The commercial online service America Online, for example, provides a fairly clear and steady record of the user’s activity through its window titles.
The fourth channel of information is captured from any application that is presenting a web page. The URLs of all pages presented are captured as the pages are rendered. The URLs are updated as the user moves from page to page, including when the user navigates by means of the browser’s back button. This method allows us to know exactly what the user is seeing as he or she sees it. As the URL changes, the meter records the new URL and the date and time it was presented on screen. This provides a rich clickstream of page-by-page viewing of the World Wide Web.
The fifth channel of information is a record of all files requested over the TCP/IP connection to the Internet. This provides a date/time-stamped list of all files requested, whether they were HTML pages, graphic images, streaming media content, or other data files requested over the Internet by the user’s PC. Importantly, all files requested are captured and recorded regardless of which application is making the request. It is not, therefore, necessary for the meter to be calibrated to each version of browser or Internet-enabled application.
All data captured are immediately written to a usage log file queue. Whenever there is an active Internet connection, the meter’s data transmission subsystem bleeds data off the usage log queue, sending the data in packets back to central data collection servers located at the company’s data collection headquarters. If a live Internet connection is not available, as is often the case for those users who use dial-up Internet access, the data are stored on the user’s hard disk until a connection is made. Since the meter records all computer usage, not just Internet usage, a certain amount of information is kept locally until it can be transmitted.
When the data arrive, the change-line data undergo preliminary processing. The meter records changes that occur on the user’s computer. That is, the meter reports the date and time that a particular user switched from, for example, page A to page B and then to page C. These data are converted into a normalized form which we call state data. State data represent the complete fact of usage in a single record. In the previous example, the state data would be that a particular user, on a particular date, at a particular time, viewed page B for exactly 20 seconds.
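A minimal sketch of this change-to-state conversion is shown below. The record layout is invented for illustration and is not the actual meter format; the idea is simply that pairing each change with the next one yields a duration.

```python
# Illustrative conversion of date/time-stamped change records into "state"
# records carrying the complete fact of usage, including duration.
from datetime import datetime

# (timestamp, user, page) -- each row records a change to a new page
changes = [
    (datetime(2001, 3, 1, 20, 15, 0), "hh42-p1", "page A"),
    (datetime(2001, 3, 1, 20, 15, 40), "hh42-p1", "page B"),
    (datetime(2001, 3, 1, 20, 16, 0), "hh42-p1", "page C"),
]

def to_state_data(change_records):
    """Pair each change with the next one to compute how long each page was viewed."""
    states = []
    for current, following in zip(change_records, change_records[1:]):
        start, user, page = current
        duration = (following[0] - start).total_seconds()
        states.append({"user": user, "page": page, "start": start,
                       "seconds_viewed": duration})
    return states   # the final change needs a session-end event to be closed out

for state in to_state_data(changes):
    print(state)    # e.g. page B viewed for exactly 20 seconds
```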
These data are then matched to dictionaries of the Internet, which organize the millions of different URLs viewed by the sample into digital media properties, reflecting how the Internet companies operate their businesses. This is an important development in the maturing of the industry. In the beginning, a simple list of file servers was adequate to describe the sites. Today, it is far more complex, outlining ownership and control across sites and digital media properties.
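The sketch below illustrates the dictionary idea in its simplest form, rolling individual URLs up to owning properties. The domains and property names are invented, and the real dictionaries also encode ownership and control relationships that a flat lookup cannot.

```python
# Illustrative "dictionary" step: map each viewed URL to a digital media
# property. The mapping below is invented for the example.
from urllib.parse import urlparse

PROPERTY_DICTIONARY = {
    "www.example-portal.com": "Example Portal Network",
    "mail.example-portal.com": "Example Portal Network",
    "shop.example-store.com": "Example Store Inc.",
}

def property_for(url):
    host = urlparse(url).netloc.lower()
    return PROPERTY_DICTIONARY.get(host, "Unassigned")

print(property_for("http://mail.example-portal.com/inbox"))  # Example Portal Network
```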
At the end of the month, each respondent is evaluated to determine whether or not he or she is in the Universe and “in-tab” (that is, included in the reports). To be in the Universe, given the current definition employed, the user must have used Digital Media during the reporting period. To be in-tab, the user must meet certain reporting criteria designed to ensure that we have received a complete record of the user’s usage and non-usage during the entire month.
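As a hypothetical illustration of how such edit rules might be applied (the 25-day reporting threshold below is an assumption for the example, not a published Media Metrix criterion):

```python
# Hypothetical month-end edit rules: a respondent is "in the Universe" if any
# Digital Media usage was observed, and "in-tab" only if the meter returned a
# sufficiently complete record for the month.
def evaluate_respondent(days_meter_reported, days_in_month, used_digital_media,
                        min_reporting_days=25):
    in_universe = used_digital_media
    in_tab = in_universe and days_meter_reported >= min(min_reporting_days, days_in_month)
    return in_universe, in_tab

print(evaluate_respondent(days_meter_reported=30, days_in_month=31,
                          used_digital_media=True))   # (True, True)
print(evaluate_respondent(days_meter_reported=12, days_in_month=31,
                          used_digital_media=True))   # (True, False)
```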
All those individuals who are determined to be in-tab are then selected for weighting and projection. The in-tab sample is weighted on key demographic characteristics to ensure that the sample reflects the demographic composition of the Universe, as measured by the universe estimate study. The weighting variables are mainly personal and household characteristics, presently including age, sex, household size, household composition, geographic region, household income, education of householder, and others. Once the sample is weighted, it is projected to the universe estimate. This is accomplished by applying a constant projection factor to each individual’s weight following the weighting procedure. The result is a weighted, projected sample from which detailed computer and Internet usage data are available.
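The simplified sketch below illustrates weighting and projection on a single variable (sex). Actual production weighting uses many variables simultaneously, and every figure shown is invented.

```python
# Simplified, cell-based weighting and projection. All figures are invented.
in_tab_counts = {"female": 24_000, "male": 36_000}             # in-tab sample
universe_counts = {"female": 45_000_000, "male": 45_000_000}   # from enumeration study

sample_total = sum(in_tab_counts.values())
universe_total = sum(universe_counts.values())

# Weight each cell so the weighted sample matches the universe's composition...
weights = {
    cell: (universe_counts[cell] / universe_total) / (in_tab_counts[cell] / sample_total)
    for cell in in_tab_counts
}
# ...then apply a constant projection factor so weighted persons sum to the universe.
projection_factor = universe_total / sample_total

for cell, weight in weights.items():
    projected = in_tab_counts[cell] * weight * projection_factor
    print(f"{cell}: weight {weight:.2f}, projects to {projected:,.0f} persons")
```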
From these data sets, reports are generated which provide estimates of the number of unique visitors to each of some 5,000 different reportable websites each month. The “30 day cume audience” has become the standard measure of comparison across sites over the past five years of measurement. However, other measurements are also reported which may have even more value to advertisers. These include the frequency of usage, the average number of pages viewed, the average number of minutes spent, and so on.
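A small, invented example of how such measures are tabulated from usage records follows; the real reports are, of course, computed from the weighted, projected sample rather than raw rows like these.

```python
# Illustrative tabulation of unique visitors, visits per visitor, and pages per
# visitor from (projected) usage records. Data are invented.
from collections import defaultdict

# (user, site, pages_viewed) -- one row per visit during the month
visits = [
    ("p1", "site-a", 4), ("p1", "site-a", 2), ("p2", "site-a", 6),
    ("p2", "site-b", 1), ("p3", "site-b", 3),
]

per_site = defaultdict(lambda: {"visitors": set(), "visits": 0, "pages": 0})
for user, site, pages in visits:
    stats = per_site[site]
    stats["visitors"].add(user)
    stats["visits"] += 1
    stats["pages"] += pages

for site, stats in per_site.items():
    uniques = len(stats["visitors"])
    print(f"{site}: {uniques} unique visitors, "
          f"{stats['visits'] / uniques:.1f} visits per visitor, "
          f"{stats['pages'] / uniques:.1f} pages per visitor")
```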
The objective of this paper was to provide some background to the Internet audience measurement industry, lay out the alternative methods employed, and provide greater detail behind the methodology used by the leading Internet audience measurement service. There are alternative methods, and each of the active commercial audience measurement firms is constantly striving to refine and improve its methodologies to match the needs of a very rapidly changing industry. Without question, considerable innovation will occur in the five years ahead of us, just as radical changes have occurred in the five years past.
Steve Coffey is Executive Vice President of Media Metrix, responsible for the core day-to-day business operations of the company. e-mail: [email protected]
URL: jiad.org/vol1/no2/coffey
Copyright © 2001 Journal of Interactive Advertising