Smart Computing-Editorial

Reference Series


How Computers Work, Part I
August 2001 • Vol. 5 Issue 3, Page(s) 180-187 in print issue
Inside The Internet How Information Travels Online

You first entered Web life at college. Soon after going online, you received all kinds of ads for student discounts and cheap spring break fares. You left college, and the ads disappeared. When you got a dog, your online interests turned to things canine, and soon your favorite portal sites displayed pet food advertisements. Who’s spying on you?

Most Web sites have systems in place for tracking the usage of their files, and some use any personal information you submit to tailor the content to your interests. A few sites even try to guess your interests based on your online activities. The user usually is an anonymous number from the Web site server’s perspective, but even the accumulated data from tracking nameless computers proves to be a powerful resource. Most site administrators keep different Web tracking logs to improve their sites, and at the same time, advertisers study audience numbers to determine which sites offer the hottest spots for their ads.

Gathering Information. Understanding how a Web site can keep track of its visitors starts with understanding the basics of how the World Wide Web works. As a Web user, you begin your travels at your own PC. You type a URL (uniform resource locator), a Web page address, into your browser or click an on-screen link. Your browser then sends the request to your ISP (Internet service provider), a local firm or national company, such as AT&T WorldNet or Earthlink, that serves as the middleman between you and the Internet.

The ISP relays the request through the vast system of connected computers that make up the Internet. When it reaches the server, the device that manages Web site resources, the request is for a single URL, but the server breaks it into pieces. It registers one request per item on the page, each of which counts as a Web hit. The heart of the page is an HTML (Hypertext Markup Language) file that will tell your browser how to assemble the page for viewing. The HTML file also lists all the auxiliary files, such as photographs and Web-based applications, that fill out the Web page. All these items travel to your PC separately, where your browser assembles them into the full Web page you requested.

The server then compiles its information to send to the IP (Internet Protocol) address, a unique number ISPs assign to each of their clients. Some ISPs provide different addresses every time a computer logs in; others assign clients permanent addresses. Static IP addresses are how a Web server keeps track of a particular computer’s activity within the server’s realm of Web sites.

As the server processes the request, it asks the server application for special instructions on how to send the information. This application is a program on the server that can differentiate among visitors. It can verify a username and password the user entered to access a page with restricted access, or it can place a cookie on the hard drive. A cookie is a text file a server puts onto a visitor’s hard drive for future access. The next time the user requests that URL, the browser sends the cookie back with the request, and the server reads its contents to learn about the user.

When the ISP makes its request to the Web server, it offers the server some extra information such as which ISP it is and what computer (identified only by its IP address) made the initial request. The server application keeps track of this, notes the time, and notes what information was requested in a log file, a list of Web activity. Webmasters view these log files to see what has been happening on their Web sites.

The Not-So-Quiet Type. There are two types of Web users, and most of us act as both at various times. The first type of user only requests data, offering no more personal information than the minimum that is revealed simply by requesting a Web page. If all you ever do on the Internet is request data by entering URLs, the most a Web site can track is the pages your computer requests and how often it visits.

The second type enters information in addition to viewing sites. Every time you identify yourself by typing a username and password or filling in a blank on a Web page, you give the Web site information to process. The server passes the data to a server application, which checks for previous information from you, analyzes your demographic profile, and sends appropriately tailored information.

If you have a Web-based e-mail account, for example, the service’s site probably asks for a password before it will display your e-mail messages. The server application matches the password with your file, which includes both e-mail messages and the information you provided when you originally signed up for the account. At the top of the page, you may see an advertisement specific to what the Web server knows about your interests. Although your messages are confidential, your interests are far from private. The information you provide about yourself is the price of information or a service on the Internet.

Hard Drive Cookies. When the site server delivers the Web page, it often drops a small file called a cookie onto your hard drive. This file contains a unique username the Web site assigned to you. When you visit that site in the future, your browser returns the cookie with your request so the server can recognize your computer. The server can read only its own cookie; it cannot retrieve another server’s cookie or other files on your computer. However, some sites do form alliances that allow a site to read cookies placed on your computer by every other site in the group. The server reads the cookie after it receives your request and before it delivers the requested content.
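
The cookie exchange can be sketched in a few lines of Python using the standard http.cookies module. This is only an illustration under stated assumptions: the cookie name visitor_id, the one-year lifetime, and both helper functions are hypothetical, not taken from any particular site. The first function builds the header a server sends to store the cookie; the second reads the ID from the Cookie header the browser returns on a later visit.

```python
from http.cookies import SimpleCookie
import uuid


def make_set_cookie_header(visitor_id: str) -> str:
    """Build the Set-Cookie header value a server sends with its response."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = visitor_id                      # hypothetical cookie name
    cookie["visitor_id"]["path"] = "/"
    cookie["visitor_id"]["max-age"] = 60 * 60 * 24 * 365   # assumed: keep about a year
    return cookie["visitor_id"].OutputString()


def read_visitor_id(cookie_header: str):
    """Return the visitor ID found in a request's Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor_id")
    return morsel.value if morsel is not None else None


# First visit: no cookie arrives, so the server assigns a new ID.
new_id = uuid.uuid4().hex
set_cookie = make_set_cookie_header(new_id)

# Later visit: the browser returns the cookie and the server recognizes it.
returned = read_visitor_id(f"visitor_id={new_id}")
```

Because the ID is a random number rather than a name, the server recognizes the computer without knowing who is sitting at it.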

This NextCard Visa Web page is surrounded by the various advertising links a user could follow to arrive at the site. NextCard Visa's Web server application notes which users clicked on which advertisement to access its Web page.

Cookie files assist the Web site in a number of ways. For example, an automobile site promoting a new car by giving one away in a drawing might assign you a cookie so you cannot enter more than once. Also, a site might rotate ads for users so they do not see the same advertisement twice. Cookies benefit you by eliminating some repetitive data entry. With a cookie on your hard drive, for example, you can request not to have to enter your password each time you visit a particular site. The site recognizes your computer when you request the page and reads the password from the computer’s cookie.

Analyzing The Information. The real power of a server’s tracking abilities appears when the information is compiled into a report for the site administrators. Companies access the data by installing a Web tracking program on their servers. The information the trackers receive varies according to which tracking program they use and what extra tools they install along with the program.

Usage information appears in a customized log file that is a complex spreadsheet full of data about the visitors and their Web hits. A Webmaster can program a server application to produce a report containing the following types of information about a specific computer’s use of the site:

The computer’s IP address.
The number the site’s server assigns to a specific username or password.
The date and time of requests the computer makes. Most log files are chronological.
The method of the request. This will be GET if the server sent information to the user, POST if the visitor sent personal data to the server, or HEAD if the visitor is an automated system checking whether anything on the site has changed.
The URL or graphic the user requested.
The status of the request, or whether the visitor successfully received the requested page.
The amount of information, which is measured in bytes, the user sends to the server application and the server sends to the user.
The referring URL or the link a user clicked to arrive at the site. If a user types the address rather than linking directly from another site, the log file may show its own Web address.
The user’s browser or ISP.
The computer’s OS (operating system). This lets Web servers know in which format to send you the information.
The user’s cookie file name.
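
Many of the fields listed above appear in a single line of a server log. The following Python sketch parses one line in the widely used Common Log Format, with the optional referrer and browser fields that extended formats append. The sample line, its addresses, and the function name parse_log_line are made up for illustration; cookie names and other custom fields depend on how a given server is configured and are not handled here.

```python
import re
from datetime import datetime

# One entry: IP, identity, user, [time], "request", status, bytes,
# optionally followed by "referrer" and "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


# A made-up sample line using reserved documentation addresses.
sample = (
    '192.0.2.10 - - [12/Apr/2001:10:15:32 -0500] '
    '"GET /index.html HTTP/1.0" 200 2326 '
    '"http://www.example.com/links.html" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"'
)
entry = parse_log_line(sample)
```

The browser string in the last field is where a server typically learns the visitor’s browser and operating system.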

Clearly, Webmasters are interested in much more than simply Web hits, which can provide inaccurate representations of a Web site’s audience. Unless a programmer knows how many hits make up certain pages, the actual number of pages requested is unavailable. Therefore, companies combine hits into page views, each of which groups the successful hits that add up to one Web page for the user.
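
One rough way to turn hits into page views is to count only the successful requests for the HTML files themselves. The Python sketch below assumes that pages end in .html or .htm or are directory addresses ending in a slash, and that a status code in the 200s means success; real tracking programs may use different rules, and the sample hits are invented.

```python
from urllib.parse import urlsplit

PAGE_EXTENSIONS = (".html", ".htm")  # assumed: what counts as a "page" on this site


def is_page_view(url, status):
    """True if this hit is a successful request for a page rather than a graphic."""
    path = urlsplit(url).path
    is_page = path.endswith("/") or path.lower().endswith(PAGE_EXTENSIONS)
    return is_page and 200 <= status < 300


# Invented (URL, status) pairs standing in for log entries.
hits = [
    ("/index.html", 200),
    ("/images/logo.gif", 200),
    ("/images/banner.gif", 200),
    ("/news/", 200),
    ("/news/photo.jpg", 200),
    ("/missing.html", 404),
]

page_views = sum(1 for url, status in hits if is_page_view(url, status))
```

Here six hits shrink to two page views: the graphics are folded into the pages that contain them, and the failed request is dropped.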

Webmasters also want to know the number of visitors their sites attract. Some site servers consider each request for a specific Web page a new visitor, even if the same visitor requests the page multiple times.
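
The difference between the two counting methods is easy to see in a small Python sketch. It assumes each logged request carries an IP address and, when available, a cookie ID; the field names and sample data are hypothetical. The cookie is preferred as the visitor key, with the IP address as a fallback.

```python
# Invented requests for the same page.
requests = [
    {"ip": "192.0.2.10", "cookie": "a1", "url": "/index.html"},
    {"ip": "192.0.2.10", "cookie": "a1", "url": "/index.html"},   # same visitor again
    {"ip": "192.0.2.10", "cookie": "b2", "url": "/index.html"},   # same PC, other cookie
    {"ip": "198.51.100.7", "cookie": None, "url": "/index.html"}, # no cookie set
]


def visitor_key(request):
    """Identify a visitor by cookie ID when present, otherwise by IP address."""
    return request["cookie"] or request["ip"]


# Counting every request as a new visitor overstates the audience.
naive_visitors = len(requests)

# Counting distinct keys treats repeat requests as one visitor.
unique_visitors = len({visitor_key(r) for r in requests})
```

With this data the naive count is four visitors, while the distinct count is three.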

Most site administrators track how long users spend on the Web site. Request durations measure how long a user spends on one page before requesting another. Visit durations measure the amount of time between a user’s first and last requests.
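
Both measurements come straight from the timestamps in the log. A minimal Python sketch, with made-up request times for one visitor, follows. Note that the time spent on the final page cannot be measured this way, because no later request marks its end.

```python
from datetime import datetime


def request_durations(timestamps):
    """Time spent on each page: the gap between one request and the next."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def visit_duration(timestamps):
    """Time between a visitor's first and last requests."""
    return max(timestamps) - min(timestamps)


# Invented request times for a single visitor.
visit = [
    datetime(2001, 4, 12, 10, 15, 0),
    datetime(2001, 4, 12, 10, 17, 30),
    datetime(2001, 4, 12, 10, 18, 0),
]

per_page = request_durations(visit)   # 2 min 30 sec, then 30 sec
total = visit_duration(visit)         # 3 minutes
```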

Even if you misspell what you're searching for, MSN Search can direct you to what you really wanted.

As a Web surfer, you can easily track your own page views, request durations, and visits, but it’s difficult to track the number of hits you make. You can get a rough idea of how many hits you made at a specific Web page by counting the number of non-textual items on the page and adding one more for the HTML file itself.

This is just a guess, however, because the average user won’t recognize all of the elements. You can also click your browser’s View menu and select Source to see the HTML code behind the Web page you’re viewing. Within the source code, select Search from the menu bar, click Find, and search for IMG tags and for GIF, JPG, and other graphics file extensions. Each graphics file you find is another hit. Unfortunately, unless you know all the possible graphical file extensions, you may miss one or two. Without the log files to that Web site, you can sleuth no more.
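
The same hunt through the source code can be automated. The Python sketch below uses the standard html.parser module to collect the files a page refers to and adds one for the HTML file itself. The list of tags it checks is an assumption and is certainly incomplete, and it counts a file named twice only once on the assumption that the browser fetches it a single time, so the result remains an estimate.

```python
from html.parser import HTMLParser


class ResourceCounter(HTMLParser):
    """Collect the auxiliary files a page asks the browser to fetch."""

    # Assumed, incomplete mapping of tag name to the attribute naming a file.
    RESOURCE_ATTRS = {
        "img": "src",
        "script": "src",
        "embed": "src",
        "frame": "src",
        "applet": "code",
    }

    def __init__(self):
        super().__init__()
        self.resources = set()

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.resources.add(value)


def estimate_hits(html):
    """One hit for the HTML file plus one per distinct auxiliary file."""
    counter = ResourceCounter()
    counter.feed(html)
    return 1 + len(counter.resources)


# An invented page: two distinct graphics (one used twice) and one script.
page = """<html><body>
<img src="logo.gif"><img src="photo.jpg">
<img src="logo.gif">
<script src="menu.js"></script>
<p>Welcome</p>
</body></html>"""

estimated = estimate_hits(page)
```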

Why Web Tracking Matters. Web tracking is far more than simply a way for Web site administrators to spy on their users. It’s even much more than a simple way to measure the size of a site’s audience. Companies use tracking information to provide advertisers with detailed pictures of the site’s audience and provide users themselves with targeted advertisements, time-saving services, and better query results.

Advertisements. Through cookies and log files, Web site servers can record which advertisements you’ve seen and decide which ones to show you in the future. If you use a portal site to link to other sites on a specific topic, the portal may begin showing you ads based on that topic. Using a portal to link to classical music sites, for example, might produce ads for record clubs during your next visit to the portal.

Online advertisers use referral data to see how many users clicked specific advertisements to arrive at their sites. This detailed analysis makes it easy to spot both the successful and ineffective ads. More importantly, most sites don’t care who you are as much as they want to know from where you came.

However, this is the only information that the referring site offers advertisers about their users. For example, Barnes & Noble can distinguish if someone arrived at its site through an ad on Lycos, but Lycos tells Barnes & Noble nothing else about its users. It doesn’t really have to, since Barnes & Noble begins collecting its own information about the users as soon as they arrive at a Barnes & Noble Web site.
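
A referral tally of this kind takes only a few lines of Python. The sketch below groups arrivals by the host name of the referring URL; the URLs are invented placeholders, and the label used for visitors who arrive without a referrer is an arbitrary choice.

```python
from collections import Counter
from urllib.parse import urlsplit


def referring_site(referrer):
    """Reduce a referring URL to its host name; '-' or empty means none was sent."""
    if not referrer or referrer == "-":
        return "(direct or unknown)"
    return urlsplit(referrer).hostname or "(direct or unknown)"


# Invented referrer fields taken from a log.
referrers = [
    "http://portal.example.com/books/ad?id=17",
    "http://portal.example.com/home",
    "http://search.example.net/results?q=novels",
    "-",
]

arrivals = Counter(referring_site(r) for r in referrers)
top_site, top_count = arrivals.most_common(1)[0]
```

Keeping the full URL rather than just the host name would let an advertiser compare individual ads on the same referring site.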

Demographics. Webmasters also use Web tracking information to segment their audience into demographic groups. You may have an e-mail account on a Web site owned by a company that owns many Web sites. The ZIP code you provided while registering for the e-mail account went into the same user profile the server accesses for other sites of theirs that you visit. For example, if someone from Denver registers with a network of sites that offer geographically specialized services, such as local news or weather, information specific to Denver appears on-screen whenever the user goes to these sites.

Some tracking software, such as Ultramatch, which was originally designed for the Go Network (http://www.go.com), tracks each user’s activity across the company’s network of Web sites and guesses the visitor’s age, gender, and interests to assemble a demographic user profile. Users in the same profile will receive similar information when they reach a Web site on the network.

For example, if a user clicks Boy Scouts of America links and the schedule for the New York Knicks and searches for various college home pages in New England, the network of sites may profile that user as a college-bound high school male in New York. This is clearly stereotyping, but it’s a best guess about the user’s demographics. However, if a computer has more than one user, such as with a family of four, the network attempts to lump them together in one profile based on that PC’s single IP address. This is another reason why Web sites want their visitors to become registered users. If every member of the family has a separate username, the tracking software does not have to try to compile one profile of that computer.

Query results. Finally, Webmasters analyze their logs to uncover the shortcomings of their sites. They want to know when a user is not able to get the information she requested. For example, MSN (Microsoft Network) analysts look at query (information request) failures to find common spelling errors.

Once MSN finds that a certain phrase is being misspelled continually, it can adjust the search results to make up for its users’ lack of spelling accuracy. If Amtrak is searched for as Amtrack enough times, the MSN crew adjusts the search results to bring up the information for Amtrak even if the searcher doesn’t know the proper name.
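
The idea can be illustrated with a small Python sketch, though it is not MSN’s actual method, which the article does not describe in detail. It counts failed queries, keeps those that recur at least a set number of times, and pairs each with the closest known term using the standard difflib module. The term list, the sample queries, and both thresholds are assumptions.

```python
from collections import Counter
import difflib

KNOWN_TERMS = ["amtrak", "greyhound", "airlines"]   # hypothetical index terms

# Invented queries that returned no results.
failed_queries = ["amtrack", "amtrack", "amtrack", "amtrac", "greyhond"]


def build_corrections(failed, known_terms, min_count=3, cutoff=0.8):
    """Map frequently failed queries to the closest known term, if one is close enough."""
    corrections = {}
    for query, count in Counter(failed).items():
        if count < min_count:
            continue  # too rare to be worth adjusting the results for
        matches = difflib.get_close_matches(query, known_terms, n=1, cutoff=cutoff)
        if matches:
            corrections[query] = matches[0]
    return corrections


corrections = build_corrections(failed_queries, KNOWN_TERMS)
```

Only the misspelling that shows up repeatedly earns a correction; one-off typos are ignored until they become common.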

(As of April 2001)

The Web Tracking Players. A broad selection of tracking programs and services fills the needs of various site administrators. Many companies use Microsoft’s Site Server, which provides Web site usage information to Webmasters. Other companies design their own or have outside companies design a product for them. Almost all World Wide Web analyzers, whether they are commercial products or ones designed internally, compile information on their log files through the CLF (Common Log Format) or ECLF (Extended Common Log Format).

A Web site server can also send its logs to an outside company to sift through and organize into log files. For example, some servers send their information to I/PRO’s NetLine, a program that organizes Web tracking data for individual companies. NetLine software sits on the Web site server, retrieves log files, and sends them to I/PRO. The Web tracking company then organizes the information into a report and returns that information to the server’s Webmaster.

The Future Of Web Tracking. The art of analyzing Internet usage data is still in its developmental years. A major goal for the future is finding an effective way for Web sites to exchange information on user activity without betraying the personal information of their loyal users. They want to share data about an individual PC’s activity without revealing that John Doe who lives on Ash St. in Tulsa and makes between $35,000 and $50,000 a year likes to visit sites about New Zealand tourism.

Outside analysts, such as Engage, provide one way to exchange usage information. Engage spent the last few years building Engaged Knowledge, a global database of user profiles that now has more than 2,000 Web sites contributing information about their users. Engage has 88 million anonymous user profiles to offer these Web sites.

That information, according to Debbie Hynes of Engage, is shared with third parties, but not in its entirety. While Engage does not share the “raw profile” information with third parties, Hynes says the company does share “recommendations” on a cookie-specific basis with third parties. “For example, Angara uses our recommendations to target product offers on the Web sites of e-commerce companies,” says Hynes.

As programs that analyze how users make use of their online time mature, many companies are de-emphasizing hit counts and focusing on unique users, a more accurate audience measure that counts the number of different visitors to a site per month. Media Metrix has become a leader in Internet audience tracking by installing more than 100,000 tracking devices into personal computers in homes and businesses worldwide. These computers represent a sample of the online community, giving Media Metrix estimates of how many unique users visit different sites each month.

Web sites increasingly use their own server tracking information for demographic profiling and monitoring of a site’s performance. Advertisers, however, look more and more to Web profilers, such as Media Metrix, to show them which Web sites are hot and which ones are not. When Web sites brag about unique users, they’re using data from a Web profiler, such as Media Metrix, not their own.

Perhaps for the first time ever, a media outlet can track with almost absolute precision your attention to its information. The site doesn’t know who you are, but it can remember your PC. That still might sound like someone’s spying on you, but remember the benefits of Web tracking. Now that the Internet knows which puzzle piece you are, you’ll have less irrelevant information and fewer irrelevant advertisements to skim past. We’ll see whether newspapers and the evening news can keep up with such technology.

by Michelle Nelson

What The Server Sees

What the server knows about first-time visitors:

your IP address
your type of computer and browser
your ISP’s (Internet service provider’s) name
that it has no cookies on your system
your page requests

What the server knows about registered visitors:

the contents of its cookie on your hard drive
that you’re a registered user
that the application server may need to request and check a password for certain applications
its guess of your demographic profile
