Dick Thornburgh and Herbert S. Lin, Editors

Committee to Study Tools and Strategies for Protecting Kids from Pornography and Their Applicability to Other Inappropriate Internet Content

Computer Science and Telecommunications Board

National Research Council



2. Technology



Suppose that a student is assigned to do a report for school on animals that build things, and he selects beavers as his primary topic. Connecting to the Internet through a computer at home, he goes to an online search engine, where he tries to search the Internet for information about "adult beavers." The search engine returns links to a large number of Web pages. When he clicks on a certain link, he is surprised when he finds a sexually oriented Web site intended for adult use.

This scenario--or one similar to it--is one of the most common that underlies parental concerns about children using the Internet. This chapter addresses the technological dimensions of this "reference scenario" and some of the things that can be done to protect against it.


2.1. AN ORIENTATION TO CYBERSPACE AND THE INTERNET


2.1.1. Characteristics of Digital Information

In the reference scenario, the student is seeking information (content) on beavers--a kind of animal. All information on the Internet is represented in bits--electronic strings of 1's and 0's that are later interpreted according to some algorithm to produce a representation that is meaningful to human beings. Digital information has properties very different from those of the information that a student might retrieve in a book. For purposes of this report, the salient aspects of this digital representation of information are the following:1

  • Reproducible. Unlike a physical book or photograph or analog audio recording, a digital information object can be copied infinitely many times, often without losing any fidelity or quality.
  • Easily shared. Because information is easily copied, it is also easy to distribute at low cost. Digital information can be shared more easily than any type of analog information in the past. In the physical world, broadcasting information to groups has serious costs and hence requires a certain wherewithal and commitment. Technologies such as e-mail and Web sites allow broadcasting to many people at the touch of a single button.
  • Flexible. A variety of different types of information can be represented digitally: images, movies, text, sound. Digital information can even be used to control movement in the physical world through digitally controlled actuators.
  • Easily modified. Digital representations of information can be easily manipulated. It is trivial to modify an image--say, changing hair color from blond to red, adding a few notes to a musical score, or deleting and adding text to a document. So, for example, a naked body can be affixed to a head of a child, words modified from their original intent and music "borrowed" freely, and even virtual "people" created, all without leaving a visible trace of these manipulations.
  • Difficult to intercept. Because no physical object is necessarily associated with a digital information object, interdiction of digital information is much more difficult than interdiction of a physical object carrying information. In other words, there is no book, no magazine, no photo that can be intercepted by physical means.
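The "reproducible" and "easily modified" properties listed above can be made concrete with a short sketch. The Python below is illustrative only: the sample sentence and the variable names are assumptions introduced here, not material from the report. It shows text held as a string of bits, a copy that is identical to the original bit for bit, and a modification that yields simply another bit string.

```python
import hashlib

# Hypothetical sample content; any text would serve.
original = "Beavers build dams.".encode("utf-8")

# The same content viewed as a string of 1's and 0's (8 bits per byte).
bits = "".join(f"{byte:08b}" for byte in original)

# Copying yields an object identical to the original, bit for bit;
# matching digests confirm that no fidelity was lost.
copy = bytes(original)
same_digest = hashlib.sha256(copy).hexdigest() == hashlib.sha256(original).hexdigest()

# Modifying is just as easy: the result is simply another bit string,
# with nothing in the data itself recording that a change was made.
modified = original.replace(b"dams", b"lodges")

print(len(bits))
print(same_digest)
print(modified.decode("utf-8"))
```

Nothing in the modified bytes indicates that they were derived from the original, which is the sense in which such manipulations leave no visible trace.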


2.1.2. The Nature of the Internet Medium and a Comparison to Other Media Types

In the reference scenario, the student relies on the Internet. The preceding discussion about digital information is important, but the nature of the Internet itself also makes it quite unlike other more traditional media such as television, film, print, and the telephone. Thus, it is useful to describe certain key features of the Internet medium and to compare it to some other, more traditional media.

  • The Internet supports many-to-many connectivity. A single user can receive information and content from a large number of different sources, and can also transmit his or her content to a large number of recipients (one-to-many). A single user can engage with others in a one-to-one mode (one-to-one). Multiple users can engage with many others (many-to-many). Broadcast media such as television and radio as well as print are one-to-many media--one broadcast station or publisher sends to many recipients. Telephony is inherently one-to-one, although party lines and conference calling change this characterization of telephones to some extent.
  • The Internet supports a high degree of interactivity. Thus, when the user is searching for content (and the search strategy is a good one), the content that he or she receives can be more explicitly customized to his or her own needs.2 In this regard, the Internet is similar to a library in which the user can make an information request that results in the production of books and other media relevant to that request. By contrast, user choices with respect to television and film are largely limited to the binary choice of "accept or do not accept a channel," and all a user has to do to receive content is to turn on the television. The telephone is an inherently interactive medium, but one without the many-to-many connectivity of the Internet.
  • The Internet is highly decentralized. Indeed, the basic design philosophy underlying the Internet has been to push management decisions to as decentralized a level as possible. Thus, if one imagines the Internet as a number of communicating users with infrastructure in the middle facilitating that communication, management authority rests mostly (but not exclusively) with the users rather than the infrastructure--which is simply a bunch of pipes that carry whatever traffic the users wish to send and receive. (How long this decentralization will last is an open question.3) By contrast, television and the telephone operate under a highly centralized authority and facilities. Furthermore, the international nature of the Internet makes it difficult for one governing board to gain the consensus necessary to impose policy, although a variety of transnational organizations are seeking to address issues of Internet governance globally.
  • The Internet is intrinsically a highly anonymous medium. That is, nothing about the way in which messages and information are passed through the Internet requires identification of the party doing the sending.4 One important consequence of the Internet's anonymity is that it is quite difficult to differentiate between adult and minor users of the Internet, a point whose significance is addressed in greater detail in Chapter 4. A second consequence is that technological approaches that seek to differentiate between adults and minors (discussed in Chapter 13) generally entail some loss of privacy for adults who are legitimate customers of certain sexually explicit materials to which minors do not have legitimate access.
  • The capital costs of becoming an Internet publisher are relatively low, and thus anyone can establish a global Web presence at the cost of a few hundred dollars (as long as it conforms to the terms of service of the Web host). Further, for the cost of a subscription to an Internet service provider (ISP), one can interact with others through instant messages and e-mail without having to establish a Web presence at all. The costs of reaching a large, geographically dispersed audience may be about the same as those of reaching a small, geographically limited audience, and in any event they do not rise proportionately with the size of the audience.
  • Because nearly anyone can put information onto the Internet, the appropriateness, utility, and even veracity of information on the Internet are generally uncertified and hence unverified. With important exceptions (generally associated with institutions that have reputations to maintain), the Internet is a "buyer beware" information marketplace, and the unwary user can be misinformed, tricked, and seduced or led astray when he or she encounters information publishers that are not reputable.
  • The Internet is a highly convenient medium, and is becoming more so. Given the vast information resources that it offers and coupled with search capabilities for finding many things quickly, it is no wonder that for many people the Internet is the information resource of first resort.


2.1.3. Internet Access Devices

In the reference scenario, the student uses a computer to access the Internet. While today a personal computer is the most common way to connect to the Internet, devices for accessing the Internet are proliferating. Entire businesses have begun to spring up in order to ready content and delivery of information for a host of other devices. These devices include:

  • Handheld organizers like Palm and Handspring--typically these devices contain built-in wireless modems and use services like OmniSky;
  • Cell phones with built-in Web access;
  • WebTV™ and Internet access devices that are used on TV sets, are customized to MSN and AOL, and began deployment in 2001;
  • Blackberry RIM and wireless paging devices;
  • Standalone Internet machines like the Compaq iPAQ and mailstations;
  • Kiosks designed for surfing the Internet and typically used in public spaces;
  • Game machines like Sega, Nintendo, and Microsoft's Xbox. Today's gaming technology (e.g., Sony's PlayStation) increasingly uses the Internet to provide users with multi-player communities in which a user can compete against and/or cooperate with other like-minded individuals. Software is generally available on CD-ROMs, and the widespread availability of CD-ROM writers makes the possibility of non-vendor-produced games and activities a realistic one. Game-playing applications are also increasingly available for use on various Web sites, sometimes for free. Note that such games often contain violent material.

In addition, many commercial establishments frequented by children, including coffee shops, department stores, and fast food restaurants, will have customer-usable Internet access points. Broadband Internet access--needed for efficient transmission of images and movies--will also grow in the future, though with some uncertainty about how fast it will be deployed. Specialized Web access devices will cost much less than today's computers (a few hundred dollars each rather than several hundred or thousand dollars). Wireless Internet access is also expected to grow in popularity, though the feasibility of transmitting high-quality images through wireless links remains an open question.

These devices and business trends suggest increasingly ubiquitous access to the Internet. Note also an important social point--wireless access and access "anywhere" enable users, including children, to escape many forms of local supervision (e.g., someone looking over his or her shoulder), and individuals will not be as dependent on school, libraries, and work to provide Internet access. Consequently, approaches to Internet protection and safety for children that depend on actions whose effect is limited to a single venue will be increasingly ineffective.


2.1.4. Connecting to the Internet

In the reference scenario, the student connects to the Internet. In general, access to cyberspace is provided by one or more Internet service providers (ISPs). For children, Internet connections are available via:

  • Personal Internet service. In this case, a party subscribes to a consumer-oriented ISP, and gains access to the Internet through as many places as the provider can provide access ports. Such services are generally responsible for home access. There are many variations in the service offerings from ISPs and many different fee structures as well. Note that an individual child may be using a family account, a personal account associated with a family account, or a friend's personal Internet service.
  • School and/or library Internet service. A student (or faculty member or staff person) or a library patron uses school or library facilities to obtain Internet access. In general, schools and libraries obtain Internet service for their students and patrons through business-oriented ISPs, and a whole host of classroom ISPs have been brought to the market.
  • Public terminals. An individual pays "by the minute" for Internet access at a public terminal, which may be located in a coffee shop or an airport, or through a wireless service.

In addition to Internet connections, some ISPs offer other services designed to enhance the user's experience. Proprietary services (including parental controls to help manage the online experience of children) and content are offered by a number of online service providers. These services and content are available only to those who subscribe to those online service providers. In other cases, services are available to some non-subscribers (for example, the instant message (IM) services of some ISPs can provide IM service to those who do not subscribe to those ISPs).

Moreover, various online service providers develop--and seek to develop--reputations about the kinds of content that they may offer. For example, a service provider may bill itself as being "family-friendly" and thus provide access only to Web sites that it regards as appropriate. The denial of access to all Web sites not on the provider's "family-friendly" list is a proprietary service that the online provider offers that is unavailable to others who do not subscribe to it.

ISPs offer dial-up or broadband access to the Internet. The majority of at-home access is today achieved through dial-up connections--a user's computer dials an ISP phone number and connects to the ISP through an ordinary modem. However, broadband access, generally through "DSL" (digital subscriber lines) from phone companies or cable modems from cable TV companies, is growing because of the higher-bandwidth connections offered. Higher bandwidth is relevant because some kinds of material contain many more bits than others. Text, for example, typically contains many fewer bits than do images, and images contain many fewer bits than movies have. Thus, viewing of graphics-intensive material online through a low-bandwidth connection is often very tedious and tries the patience of all but the most dedicated users.
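The effect of bandwidth on the three kinds of material mentioned above can be shown with simple arithmetic. The sketch below is a rough illustration under stated assumptions: the file sizes and link speeds are hypothetical round numbers chosen for the example, not figures from the report, and the calculation ignores protocol overhead and compression.

```python
# Illustrative figures only: these sizes and link speeds are assumptions.
SIZES_BYTES = {
    "page of text": 5_000,
    "photograph": 200_000,
    "short movie clip": 10_000_000,
}
LINKS_BITS_PER_SEC = {
    "dial-up modem (56 kbit/s)": 56_000,
    "broadband (1 Mbit/s)": 1_000_000,
}


def transfer_seconds(size_bytes: int, bits_per_sec: int) -> float:
    """Idealized transfer time: total bits divided by link speed."""
    return size_bytes * 8 / bits_per_sec


for item, size in SIZES_BYTES.items():
    for link, speed in LINKS_BITS_PER_SEC.items():
        seconds = transfer_seconds(size, speed)
        print(f"{item:>16} over {link}: {seconds:8.1f} s")
```

Under these assumed numbers, the movie clip takes roughly 24 minutes over the dial-up link and 80 seconds over the broadband link, while the page of text arrives in under a second either way.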

ISPs also require their subscribers to abide by certain terms of service, violation of which is grounds for termination of the service contract with a subscriber. An individual subscriber to an ISP is bound directly by the terms of service of that ISP. An individual who obtains Internet service through an intermediary is bound by the terms of service imposed by the intermediary, which may (or may not) be stricter than those that bind the ISP and the intermediary. Note also that ISPs vary across a wide range in the extent to which they enforce their terms of service. A typical provision in the terms of service of many ISPs might forbid a user from posting sexually explicit material under most conditions.

ISPs make decisions about content that they will carry. In particular, many ISPs do not allow access to every Usenet newsgroup (e.g., they may not carry newsgroups that carry a large volume of child pornography).5 For subscribers to these ISPs, the newsgroups that are not carried can be difficult to find and are for many practical purposes non-existent.6

Finally, ISPs are funded by subscription and/or by advertising. Subscription entails periodic payment by the user to the ISP for access privileges. Advertising entails payments by advertisers to the ISP for the privilege of displaying ads, and thus the user must be willing to accept the presence of ads in return for access privileges.


2.1.5. Identifying Devices on the Internet: The Role of Addressing

Every computer or other device connected to the Internet is identified by a series of numbers called an IP address.7 The domain name system is a naming system that maps human-readable domain names to these computer-readable IP addresses. Thus, a domain name is a name that identifies one or more IP addresses. A typical domain name has the form "example.com."
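The correspondence between names and addresses can be pictured as a lookup table. The sketch below is a toy stand-in for the domain name system, not a description of how it is actually implemented: the table, the `resolve` helper, and the addresses (taken from ranges reserved for documentation) are all assumptions made for illustration.

```python
# A toy lookup table standing in for the domain name system.
# The names and addresses are hypothetical; the addresses come from
# ranges reserved for use in documentation.
TOY_DNS = {
    "example.com": ["192.0.2.10", "192.0.2.11"],  # one name, two addresses
    "example.org": ["198.51.100.7"],
}


def resolve(name: str) -> list[str]:
    """Return the addresses recorded for a name, or an empty list.

    Domain names are compared without regard to case, and a trailing
    dot is ignored.
    """
    return TOY_DNS.get(name.lower().rstrip("."), [])


print(resolve("Example.COM"))
print(resolve("unknown.example"))
```

On a networked machine, Python's standard `socket.getaddrinfo` function performs a real lookup. The table here only illustrates that one name may correspond to more than one address.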

Every domain name has a suffix corresponding to a top-level domain (TLD), in this example .com. Until October 1, 2001, the most common top-level domains allowed for Internet use were .net, .org, .com, .edu, .gov, and .mil. In addition, a number of two-letter country suffixes have been recognized. As this report goes to press, a number of other top-level domains have been approved: .biz, .info, .pro, .coop, .aero, .museum, and .name. (How many other TLDs will eventually be available is an open question, and the issue of the number and type of TLDs is highly charged politically and economically.) As a rule of thumb, the non-country suffixes indicate something about the nature of the party with which the site is affiliated. For example, example.museum is likely operated by a museum, and example.gov is operated by a government agency.

The domain name is a key element of routing traffic across the Internet. For example, a typical e-mail address is of the form "John.Doe@example.com." The address of a typical Web site has the form "www.example.com." The Web site address is generally part (or all) of a uniform resource locator (URL) that identifies a particular Web page that can be found on a Web site. Thus, www.example.com/page1 might refer to a page on the example.com Web site.
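The way a URL or an e-mail address embeds a domain name can be illustrated by taking the examples above apart. The following is a minimal sketch using Python's standard `urllib.parse` module. The `describe` helper and the `TLD_HINTS` table are hypothetical names introduced for this example, and the table reflects only the rule of thumb about suffixes, not an authoritative registry.

```python
from urllib.parse import urlsplit

# Rule-of-thumb labels for a few suffixes; this table is an illustration,
# not an authoritative registry.
TLD_HINTS = {
    "com": "commercial",
    "edu": "educational",
    "gov": "government",
    "mil": "military",
    "org": "organization",
    "net": "network",
    "museum": "museum",
}


def describe(url: str) -> dict:
    """Split a URL into scheme, host, path, and top-level domain."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    return {
        "scheme": parts.scheme,
        "host": host,
        "path": parts.path,
        "tld": tld,
        "hint": TLD_HINTS.get(tld, "unknown"),
    }


print(describe("http://www.example.com/page1"))

# An e-mail address carries its domain name after the "@" sign.
local_part, _, mail_domain = "John.Doe@example.com".partition("@")
print(local_part, mail_domain)
```

The suffix gives only a hint about who operates a site; nothing in the address itself verifies that hint.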


2.1.6. Functionality of the Internet

In the reference scenario, the student used a search engine to search the World Wide Web for information about beavers. Search engines are only one aspect of the functionality that the Internet offers, and as the Internet matures, new functions based on new applications and technologies are constantly being introduced. Some of the more important functions of the Internet are described below and are summarized in Box 2.2.

  • The World Wide Web (WWW) refers to the set of all the information resources that can be accessed via the hypertext transfer protocol (HTTP). Loosely speaking, it is the set of all Web pages that can be addressed by a request of the form "http: URL."8 Today, the publicly accessible World Wide Web consists of over 2 billion Web pages,9 though there is a great deal of uncertainty in any estimate of Web size. Web pages are associated with particular hosts (though not every host has a Web page), and many Web pages themselves include links to other Web pages. The Web is based on a client-server model--a user (client) specifically requests a Web page from a host (server).
  • Search engines help to organize, classify, and return information based on a query, and those who surf the Web typically rely on various types of search engines to find the information they are seeking. Search engines rely on technologies of information retrieval, as discussed in Section 2.2. Given the enormous volume of information on the Web, users in general do not know where to find the information they seek. To cope with this situation, search engines have been developed to help users find the addresses of information residing on the Web. While no data have been collected on this point, it is probably fair to say that search engines enable the finding of most information that people access on the Internet.
  • E-mail refers to messages that are sent electronically from one user to another (or to many others) and read at a time of the recipient's choosing. E-mail can carry attachments that can be other information objects, such as images, movies, audio recordings, and so on. E-mail can also be used as a direct marketing tool ("spam") analogous to third-class postal mail (also known as junk mail). The use of e-mail requires knowledge of a recipient's e-mail address.
  • File sharing refers to a process in which devices controlled by end users (i.e., "peers") interact directly with each other to transfer files between them, rather than interacting through a central server. In some file-sharing networks, a central server holds a publicly accessible index to the files available from end users (but not the files themselves). End users then transfer the files between themselves.10 Other peer-to-peer file-sharing networks eliminate even the centralized server index function. Users of these systems are connected to a network of other parties (rather than to a centralized index), and a query from one user goes to an immediate circle of possible respondents. If not satisfied, the query then goes from those respondents to other respondents. Furthermore, such queries are highly anonymous, though file transfers between end users are not. Although peer-to-peer interaction is most often performed in a user-to-user mode, there is no reason that in principle a single user could not establish peer-to-peer connections to a large number of other users and thus function in a "server-like" mode for those users.

 

  • Usenet newsgroups are a broadcast medium in which anyone anywhere with a computer can be a transmitter. Typically groups form around shared social interests. Thus, the Usenet becomes the place for discussion among self-selecting groups interested in specific topics. The volume carried by Usenet newsgroups is substantial (over 50 gigabytes per day on more than 10,000 newsgroups).11 Anyone can "post" a message on any Usenet newsgroup (perhaps anonymously), governed only by his or her own judgment in ascertaining the relevance of the message to the nominal topic of that newsgroup.
  • Internet relay chat (IRC) and chat rooms. These are popular real-time interactive services on the Internet that function as the equivalent of CB radio, where one person talks on a channel and anyone listening on that channel can hear and respond. IRC and chat rooms allow users to exchange text-only messages in real time with other people all over the world. IRC "channels" and chat rooms can be public (so that they can be found by others wishing to join the conversation) or private (so that they are invisible to the general public and special knowledge of the channel's or chat room's name is needed to join). IRC and chat rooms require a user to take active (initiating) steps to join an ongoing conversation. In addition, some chat rooms or channels on the Internet are monitored by employees or volunteers for language and content and behaviors, but most are not. (These monitors sometimes have the ability to force particular users out of a conversation.)

There are many variants of chat rooms. Chat rooms can be based on interests--movies, sports, hobbies, or can be just a place to meet people. Some of the latest technologies are found in the online gaming community where people assume digital visual representations called avatars. Avatars can then interact with each other in cyberspace. The chat then has a visual animated component. MUDs and MOOs are complex online games relying mainly on text interactions while relatively new games like Microsoft's Age of Empires and Electronic Arts' The Sims Online utilize visual representations to create fantastic communities for role playing.

  • Instant messaging services allow a two-way, real-time, private dialog between two users. These services include such well-known entities as AOL's Instant Messenger and Yahoo's Messenger. A user initiating a message sends an invitation to talk to another (specific) user who is online at the same time. Unlike IRC, no channel-seeking initiating step is required on the part of the recipient to become part of such a conversation.12 Instant messaging also allows someone to carry on multiple private conversations simultaneously. Instant messaging is very popular today for both professional and personal use, because unlike in chat rooms, one tends to talk to people whom one already knows. Note also that IMs are often used in conjunction with chat rooms or other online activities--a user in a chat room can send an IM to someone else in the chat room (because he or she sees the other party's screen name or "handle"), thus establishing a private communication. Once limited to text-only interactions, IM services are increasingly sophisticated. For example, some IMs can support direct voice interactions and exchanges of music or image files. Other IM services allow a user to block selected other parties from contacting him or her, thus increasing the difficulty of harassment.
  • Videoconferencing applications are growing. Web cameras and streaming media depend on the increasing availability of broadband Internet connections to allow the high-quality real-time transmission of audio and video content. Today's Internet videoconferencing suffers from many of the same problems as Internet telephony, most notably poor quality (low resolution as well as "jitter" in the moving images). A popular consumer videoconferencing application is CU-SEE-ME, a very inexpensive videoconferencing tool originally developed at Cornell University for educational applications and now used to support a wide variety of video applications. Chat rooms are often forums in which Web cameras are used to send pictures in real time.

 

  • Streaming media (video and audio) allow people to watch movies and broadcast-style programming over the Web as well. A movie that is now available through pay-per-view cable TV may readily become available through the Internet (a phenomenon known as digital convergence), perhaps augmented by the availability of an online chat room for discussion of that content with one's friends and/or an electronic commerce site where one can purchase products or services illustrated in the movie.
  • Internet telephony allows two-way real-time voice communication to be established without records of such communications appearing on family telephone bills. A variety of standards now in place facilitate the interoperability of Internet telephony products, which would otherwise be hampered by proprietary specifications and protocols. However, because the Internet was not designed to support real-time operations, the quality of such connections remains an issue, though progress is being made in this area. Internet telephony products enable Internet users to establish real-time voice contact without the need for a telephone, and even today, voice connections (of somewhat low fidelity) can be established through certain types of instant message and in some chat rooms.
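The query propagation described in the file-sharing item above, for networks with no central index, can be sketched as a search that widens one circle of peers at a time. This is a simplified illustration, not the protocol of any particular file-sharing system: the peer names, the connection map, the shared files, and the `max_hops` limit are all assumptions introduced for the example.

```python
# Hypothetical network: the peers each peer is directly connected to,
# and the files each peer is sharing.
NEIGHBORS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
SHARED = {
    "A": set(),
    "B": {"song.mp3"},
    "C": set(),
    "D": set(),
    "E": {"report.txt"},
}


def search(start: str, filename: str, max_hops: int) -> list[str]:
    """Send a query outward from `start`, one circle of peers at a time.

    Each round, the query goes to peers not yet asked. If any of them
    holds the file, the search stops; otherwise those peers pass the
    query on. `max_hops` (an assumed limit) bounds how far it travels.
    """
    seen = {start}
    circle = [start]
    for _ in range(max_hops):
        next_circle = []
        for peer in circle:
            for neighbor in NEIGHBORS[peer]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_circle.append(neighbor)
        holders = [peer for peer in next_circle if filename in SHARED[peer]]
        if holders:
            return holders
        circle = next_circle
    return []


print(search("A", "song.mp3", max_hops=1))
print(search("A", "report.txt", max_hops=2))
print(search("A", "report.txt", max_hops=3))
```

In this sketch the searching peer learns only which peers hold the file. The file itself would then be transferred directly between the two end users.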

In addition to these functions, there are a variety of Internet applications for facilitating Web activity. The use of these applications is often free, and they are important because they reduce the costs and difficulty of establishing a (non-commercial) Web presence and of generating communities of shared interest--in sports, in science, and in trading of sexually explicit materials.

Finally, a variety of peripheral devices are also relevant to a discussion of Internet functionality. The availability of devices to convert sound into digital form, to digitize existing images, and to record still and video imagery enables individuals to generate digital content inexpensively and in private. Digital cameras, Web cameras, and camcorders are dropping in price and the pictures they take are increasing in quality, and virtually anyone can publish videos to the Web or can participate in or set up videoconferences at very low cost.13 Thus, while one might have had difficulty in the past in taking a picture of a couple having sex (because of the difficulty in having the film developed), today a digital camera enables one to do the same in complete privacy.


2.1.7. Cost and Economics of the Internet

On the Internet, the cost of handling information is rapidly decreasing. From a message sender's point of view, electronic messages cost next to nothing to create, exactly nothing to duplicate, and virtually nothing to send, and--given the anonymity of the Internet--inexpensive bandwidth imposes none of the costs normally associated with responsibility, prudence, or probity, leading to problems such as unsolicited commercial e-mail (also known as spam). Bandwidth is inexpensive enough that most ISPs and services recognize that it is cheaper to "send everything" through their pipes than to determine if a message or information is inappropriate, unwanted, or unrequested by the receiver.

Furthermore, because digital information can be so freely reproduced, it is essentially impossible to rely on mechanical difficulty or expense of reproduction to curtail the availability of anything to anyone. Once released onto the Internet, content is next to impossible to ban--whether that content involves a political manifesto, sensitive classified information, company trade secrets, one's medical records, or child pornography.14

Finally, the Internet contains an enormous volume of material that changes rapidly. The sheer mass of this material means that it is economically prohibitive to review every publicly accessible item for its inappropriateness or lack thereof.

The economics described above suggest that if it costs virtually nothing to provide content to everyone, then an entirely free market will seek to make all possible content available to everyone. The implications of such economics are further discussed in Chapter 3.


2.1.8. A Global Internet

The Internet transcends the physical boundaries of local communities and national borders alike, thus expanding the universe from which content of various kinds can be drawn. Of particular relevance is that many other nations have different views about visual depictions of sexuality and the human body. For example, images of frontal nudity are found in mainstream print media in many parts of Europe, and publication or broadcast of such images raises little concern or outcry there. Thus, material not seen as "pornographic" by those providing it (e.g., content providers in Europe) may be perceived as such by those viewing it in a different cultural context (e.g., by some viewers in the United States).

A further consequence of the Internet's international nature is that only with great difficulty (and many would argue that it is impossible) can laws passed in one jurisdiction affect the behavior of parties in other jurisdictions that are not generally subject to such laws.15 Thus, to the extent that sexually explicit material of any kind--or any other type of material, for that matter--is available from overseas sources, laws that seek to restrict U.S. content providers from making such material available to U.S. citizens will fail to restrict it in practice.16


2.1.9. The Relative Newness of the Internet

Amidst all of the attention given to the Internet and dot-com phenomena, it is helpful to recall that the Internet has been a part of the national consciousness for less than a decade (since the mid-1990s). Ten years is an enormously long time compared to the time scale of technology change, but it is quite short on the time scale of social, economic, and legal change. Given that the array of pre-Internet social, economic, and legal and regulatory practices to balance competing societal interests developed over a time scale of many decades (and in some cases, centuries), it is not surprising that the Internet has offered something of a vacuum into which many parties seeking quick advantage have moved.

For example, the practice of adult-oriented Web sites using addresses that are based on common words or that are similar to those of non-adult businesses draws many people to sites that they had not intended to visit. Branding histories have not been established that allow users to differentiate between reliable and unreliable information. Certain practices that are acceptable in the real world--such as direct marketing--may cross over into the unacceptable in cyberspace because they are increasingly voluminous and often seen as more intrusive as well.

Perhaps the most important consequence of the relative newness of the Internet is the generation gap in knowledge between parent and child. It may be that as today's children become parents themselves, their familiarity with rapid rates of technological change will reduce the knowledge gap between them and their children, and mitigate to some extent the consequences of the gap that remains.


2.2.

 

TECHNOLOGIES OF INFORMATION RETRIEVAL

As suggested in the reference scenario in which a student seeks information on adult beavers, information retrieval is an important part of what people do on the Internet. By virtue of its vast scope, the Internet is a route for obtaining a range and variety of material to which one would most likely not otherwise have easy access--such materials include history, science, entertainment, games, medical information, and religious information, as well as materials that adults deem inappropriate for children. If children are treated as adults on the Internet, children may come across such materials.

Searching for information on the Internet is different from searching for information in, for example, a library in the physical world. Typically, an individual might search for information using an Internet search engine. A common initial search strategy--used by many inexperienced individuals--is to type one or two keywords and then to examine the sites that are returned. For a word such as "sex," a search engine might return information on sex education sites, a set of biology notes on sex, and adult-oriented Web sites. By contrast, a user of a physical library might rely on the content labeling in various classification systems, such as those of the Library of Congress and the Dewey Decimal systems. On the Internet, this absence of reliable content labeling confounds specificity in searching. Further, the scale of a "Web catalog" (i.e., the volume of information accessible through popular search engines) is much larger than that of most library catalogs of holdings, and Web search engines often do not provide adequate categorization of Web pages contained in their databases. Finally, the most important distinction between the physical library and the Internet is the fact that all physical libraries exercise some editorial discretion in acquiring materials, whereas the Internet is a venue in which the publications of any party are available and retrievable without editorial restriction.

Information retrieval systems support people in finding information in large databases of information objects (whether in the form of text, images, video, or other media) that is relevant to their problems or situations. Internet search engines, where the database is the Web, are a typical example of such systems, as are libraries, where the database is the collection. To accomplish their goals, information retrieval systems must:

  • Represent the content of the information objects (what the objects are about), through a process called content representation or content analysis;
  • Represent the person's information "need," through a process called problem or user representation;
  • Match the representations of information objects and information problem, to retrieve those objects that are most likely to be useful to the searcher (search techniques); and
  • Provide an interface between the user and the other components of the system to support the user's interaction with those components and with the information objects.
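As an illustration only, the four functions listed above can be sketched in a few lines of Python. The document collection, the names `represent` and `search`, and the overlap-counting match are all assumptions made for this sketch; real search engines use far richer representations and ranking.

```python
import re

def represent(text):
    """Content representation: reduce text to a set of lowercase word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def search(query, documents):
    """Interface: take a query string, return matching document names.

    The query is represented the same way as the documents (user
    representation), and matching ranks documents by the number of shared
    terms. Documents sharing no term with the query are omitted.
    """
    need = represent(query)
    scored = []
    for name, text in documents.items():
        overlap = len(need & represent(text))
        if overlap:
            scored.append((-overlap, name))
    return [name for _, name in sorted(scored)]

# Hypothetical three-document "Web".
DOCS = {
    "beaver-dams": "Adult beavers build dams and lodges from branches and mud.",
    "adult-site": "Adult content for adult visitors only.",
    "bird-nests": "Many birds build nests from twigs.",
}
```

With this toy collection, `search("adult beavers", DOCS)` ranks the page about beavers first but also returns the adult-oriented page, because both share the word "adult" with the query. This mirrors the reference scenario.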

Filtering systems, discussed at greater length in Sections 2.3.1 and 12.1, work like information retrieval systems in reverse; that is, they are concerned not with retrieving desirable information, but rather with making sure that undesirable information is not retrieved. However, their essential operations remain the same: they must represent the content of the information objects; they must represent relevant characteristics of the user; they must match object representations with user representations to eliminate undesirable objects; and they must provide a means for users to specify or otherwise indicate what is not desired.

The essential problem with information retrieval (and filtering) is that all of these processes are inherently uncertain. With respect to content analysis, what an information object is about can be many things for many people. The problem is intrinsically difficult, even for humans: one person may think a picture shows a starry sky; another may interpret it as a symptom of mental ill-health; and a third is interested only in the brush technique. Similarly, one user may find a particular page of text obscene; to a second it is merely embarrassing; and to the third, it contains important health-care information. Also, even representing what text is about is fraught with uncertainty. Most words mean many things (polysemy); most concepts can be expressed in many ways (synonymy).

Images are a particularly difficult recognition challenge for computers. Computers seek to recognize an image by analyzing the relationship of the pixels in it (color tone, contrast, and so on). While it is often possible to tell whether a picture has nearly naked people in it, images of the California desert and apple pies are also sometimes identified as pictures with naked people by today's image recognition software.17 Image recognition technology is for the most part incapable of distinguishing minors from adults (and hence cannot identify child pornography with any reliability). At the same time, using words that may be found alongside images provides additional information that can help identify sexually explicit images properly.

With respect to representing the user (or what the user desires or desires not to see) the problems are similarly difficult. Users are in general unable to specify precisely that which they do not know but may be searching for, nor are they (or a computer algorithm, or another person) able to specify precisely the characteristics of that which they should not see. The matching process is thus itself inevitably uncertain, since the representations on which it depends cannot be complete and certain.

Because information retrieval and information filtering are probabilistic, any search engine will find material that is irrelevant to the user's needs and fail to find material that is relevant. Similarly, any filter will inevitably allow the passing of some undesirable material, and will filter out some desirable material. Any attempt to avoid errors of the first type will lead to an increase in errors of the second type, and vice versa.
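The trade-off can be made concrete with a small sketch. It assumes, purely for illustration, that a filter assigns each item a score between 0 and 1 and blocks everything at or above a threshold; the scores and labels below are invented.

```python
def error_counts(scored_items, threshold):
    """Count both error types when items scoring >= threshold are blocked.

    scored_items: list of (score, is_undesirable) pairs.
    Returns (desirable_items_blocked, undesirable_items_passed).
    """
    wrongly_blocked = sum(
        1 for score, bad in scored_items if score >= threshold and not bad
    )
    wrongly_passed = sum(
        1 for score, bad in scored_items if score < threshold and bad
    )
    return wrongly_blocked, wrongly_passed

# Invented scores: True marks an item that really is undesirable.
ITEMS = [
    (0.1, False), (0.3, False), (0.4, True),
    (0.6, False), (0.7, True), (0.9, True),
]
```

On these data a threshold of 0.2 gives (2, 0), a threshold of 0.5 gives (1, 1), and a threshold of 0.8 gives (0, 2): each move that removes errors of one type adds errors of the other.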

These points are discussed in greater detail in Section and in Appendix C.


2.3.

 

TECHNOLOGIES RELATED TO ACCESS CONTROL AND POLICY ENFORCEMENT

As more people--and children--connect to the Internet, problems such as exposure to inappropriate material and experiences assume a higher profile. One logical conclusion might be that if technology helped to create these problems, technology can help to solve them. While the committee does not believe that technology is yet the foundation of good solutions to these problems (and may never be), technologies nevertheless do have useful roles to play. Below is a brief discussion of technologies that may be relevant.


2.3.1.

 

Filtering Technologies

Filtering technologies allow Internet material or activities that are deemed inappropriate to be blocked, so that the individual using that filtered computer cannot gain access to that material or participate in those activities. Typically, material is determined to be inappropriate on the basis of its source, its content, or the labels that have been associated with it. Determination of inappropriate content can be accomplished by computer-based methods, by a combination of computer-based methods and human judgment, or by human judgment alone. This section addresses automatic methods and combined human-plus-automatic methods, since the size of the Internet effectively prevents use of human judgment alone. (In the combined case, a human rater examines Web sites that a preliminary machine-performed analysis has identified for human examination, makes a judgment call about whether the site is inappropriate, and, if so, determines the objectionable category into which the page falls.)

Filtering technologies can be applied in several ways. One is by the establishment of so-called "black lists," which are lists of sources that have been deemed to be inappropriate, and that the user is prevented from accessing. Another is by the establishment of "white lists," which are lists of sources that have been deemed appropriate, and thus are the only sources that the user is allowed to access. These two methods require a priori identification of the bad (good) sites, which are then incorporated into the filtering software, which stands between the user's Internet access tool and the Internet itself. Bad sites for black lists can be identified through any of the technologies described below. Also, in a priori determinations of inappropriate content, the categorization judgment is usually made days, weeks, or even months in advance of the user's request for the Web site--a point that is significant in light of the fact that the content of Web sites typically changes over time.
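A minimal sketch of the two list-based approaches follows. The function names and the host lists are hypothetical, and matching is done on the host name only, which is one simplifying assumption among many possible ones.

```python
from urllib.parse import urlsplit

def host_of(url):
    """Extract the lowercase host name from a URL that includes a scheme."""
    return (urlsplit(url).hostname or "").lower()

def allowed_by_black_list(url, black_list):
    """Black list: everything is allowed except the listed hosts."""
    return host_of(url) not in black_list

def allowed_by_white_list(url, white_list):
    """White list: nothing is allowed except the listed hosts."""
    return host_of(url) in white_list

# Hypothetical lists prepared in advance by a filter vendor.
BLACK_LIST = {"www.porn-company.com"}
WHITE_LIST = {"www.safe-for-kids.com"}
```

A site that appears on neither list, such as `http://www.example.com/`, is allowed by the black list and refused by the white list, which is the essential difference between the two approaches.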

A third means of applying filtering technologies occurs in real time, that is, at the time that the user is actually interacting on the Internet and when the information in question is flowing directly to the user. In this case, there may be no a priori blocking of specific sites or sources; rather, the content or other characteristics of retrieved items are analyzed prior to display, and on this basis it is determined whether they should be displayed to the user. This real-time method can also be used in reverse; that is, it can be used to analyze the user's request, and on this basis decide whether the request should be allowed or disallowed. Although conducted in real time, this method nevertheless requires a priori specification of the content indicators that mark a source as having inappropriate content. Finally, only real-time content monitoring is useful for monitoring and selective blocking of outgoing information, such as blocking certain text from appearing in e-mail (e.g., a phone number).
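The outgoing case can be sketched as follows. The pattern is an assumption chosen for illustration: it recognizes only ten-digit numbers written as three groups separated by hyphens, periods, or spaces, and a real product would need many more formats.

```python
import re

# Assumed pattern: 3 digits, 3 digits, 4 digits, separated by "-", "." or " ".
PHONE_PATTERN = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")

def redact_outgoing(message):
    """Replace anything that looks like a phone number before the text is sent."""
    return PHONE_PATTERN.sub("[blocked]", message)
```

For example, `redact_outgoing("Call me at 555-123-4567 tonight")` returns `"Call me at [blocked] tonight"`, while a message containing no such number passes through unchanged.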

Note that if a requested Web site is determined to be inappropriate, there are several options for how much material from that site should be blocked. For example, all material on that site might be blocked (everything on www.example.com). Or only a certain directory might be blocked (www.example.com/directory1 might be blocked, while www.example.com/directory2 might not be blocked). Or a particular page within a directory might be blocked (e.g., www.example.com/directory1/picture1.jpg).
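These three levels of granularity can be sketched as rules of different scope. The rule format, a list of `(scope, value)` pairs, is an assumption made for this example and does not describe any particular product.

```python
from urllib.parse import urlsplit

def is_blocked(url, rules):
    """Check a URL against site-, directory-, and page-level blocking rules.

    rules: list of (scope, value) pairs, where scope is "site",
    "directory", or "page" and value is a host or host-plus-path string.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    target = host + parts.path
    for scope, value in rules:
        if scope == "site" and host == value:
            return True
        if scope == "directory" and target.startswith(value.rstrip("/") + "/"):
            return True
        if scope == "page" and target == value:
            return True
    return False

SITE_RULE = [("site", "www.example.com")]
DIRECTORY_RULE = [("directory", "www.example.com/directory1")]
PAGE_RULE = [("page", "www.example.com/directory1/picture1.jpg")]
```

Under `DIRECTORY_RULE`, `http://www.example.com/directory1/picture1.jpg` is blocked and `http://www.example.com/directory2/index.html` is not; under `PAGE_RULE`, only the single picture is blocked.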

Filtering by Internet Domain Names and Addresses

Filtering by Internet domain names and addresses is typically accomplished by examination of the name of the Web site that is requested by the user or returned to the user, in the case of real-time filtering. The name of a Web site (or page on a Web site) is specified by a uniform resource locator (URL). A given URL, for example, http://www.example.com/directory1/picture1.jpg, can be checked in a number of ways.

In the case of a priori filtering, the URL is checked against a preexisting list of inappropriate names generated by the filter vendor. All parts of the URL are compared to a list of words or terms that have been previously found to be associated with sites containing inappropriate material, or that are believed likely to be associated with inappropriate material. For example, www.hotmama.com is likely to refer to an adult Web site.18 The .xxx domain (discussed in Section 13.1) is based on this notion.

This method can be used to permit access, as well as to prevent access. For instance, a site in the .gov domain would in general be considered highly unlikely to contain inappropriate material, as would a site with the name of a museum. In the case of real-time filtering, access would be denied (allowed) based on the comparison; in the case of a priori filtering, the URL would probably be forwarded to a human evaluator, who would determine whether it should be placed on the black list.
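A rough sketch of URL-based screening is given below. The term list and the trusted suffix are hypothetical stand-ins for a vendor's lists, and the three-way result ("allow", "deny", "unknown") is a design choice made for this example.

```python
import re
from urllib.parse import urlsplit

SUSPECT_TERMS = {"hotmama", "xxx"}   # hypothetical vendor term list
TRUSTED_SUFFIXES = (".gov",)         # hypothetical list of trusted domains

def classify_url(url):
    """Return "allow", "deny", or "unknown" from the URL text alone."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.endswith(TRUSTED_SUFFIXES):
        return "allow"
    # Break host and path into alphanumeric pieces and compare each piece.
    pieces = set(re.split(r"[^a-z0-9]+", (host + parts.path).lower()))
    return "deny" if pieces & SUSPECT_TERMS else "unknown"
```

Here `classify_url("http://www.hotmama.com/")` yields "deny", a .gov address yields "allow", and `http://www.example.com/directory1/picture1.jpg` yields "unknown". A URL in the last category would have to be judged by other means.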

A related method is to examine the links that are made from a site and to a site. Because many adult Web sites are linked to each other, a referral to a known adult site A that is present on Web site B provides reason to assume that B is also an adult site.

A second method is to check the IP address of the Web site--in this (made-up) case, 203.12.34.12. If this address is on a list of inappropriate IP addresses, access is blocked. This approach is helpful when a Web site has only an IP address and no domain name associated with it.

A complication in this analysis of page names is that different hosts can share the same IP address through a process known as virtual hosting (more precisely, name-based virtual hosting), which is a way of assigning multiple domain names to the same IP address. Virtual hosting of this kind is made possible by the fact that an HTTP request passes the requested domain name (in HTTP/1.1, in the Host header) to the server at the given IP address, and the software at that IP address maps the domain name to the appropriate portion of the server. Thus, an entry in the domain name server need not point to a unique address, and a given IP address does not specify a Web site unambiguously. For example, www.porn-company.com and www.safe-for-kids.com might share the same IP address (e.g., 204.1.23.3), even though each of these names, when entered into a browser, would reach the correct site. A list that designated 204.1.23.3 as containing inappropriate material would block both domain names.
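The overblocking that results from shared addresses can be shown with a toy lookup table. The dictionary below stands in for the domain name system and reuses the made-up names and addresses from the text; nothing here performs a real network lookup.

```python
# Toy stand-in for DNS: host name -> IP address (all values are made up).
DNS = {
    "www.porn-company.com": "204.1.23.3",
    "www.safe-for-kids.com": "204.1.23.3",
    "www.example.com": "203.12.34.12",
}
BLOCKED_IPS = {"204.1.23.3"}

def blocked_by_ip(hostname, dns=DNS, blocked_ips=BLOCKED_IPS):
    """True if the host resolves to an address on the IP black list."""
    return dns.get(hostname) in blocked_ips

def names_sharing_ip(ip, dns=DNS):
    """List every host name in the table that resolves to the given address."""
    return sorted(name for name, address in dns.items() if address == ip)
```

Both `www.porn-company.com` and `www.safe-for-kids.com` are blocked by the single entry 204.1.23.3, while `www.example.com` is unaffected.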

Filtering by Textual Analysis

Filtering by textual analysis makes use of information retrieval representation technologies discussed in Section 2.2 and Appendix C. The basic concept is to examine all of the text that is on the site or page that is being considered (or in the search request), and to determine whether that text is indicative of inappropriate content.

The most naïve method of doing this is to compare the individual words of the text or request to a list of words that are strongly associated with inappropriate content. For example, the site might be deemed inappropriate if any of a number of keywords is found (e.g., "orgy," "cum," "bomb," "gun," "marijuana," and so on). When such words are found, access is blocked, or the site is flagged for possible inclusion on a black list.

However, many words have more than one meaning (for instance, "beaver" can have both sexual and nonsexual meanings); furthermore, the context in which words appear has a great effect on their appropriateness (for instance, the word "breast" can appear in a cancer information site, as opposed to an adult-oriented, sexually explicit site). More sophisticated text analysis techniques that are available to address these problems can, for instance, identify phrases (e.g. "beaver dams" or "breast cancer") in order to determine appropriateness more precisely.
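The difference between naive keyword matching and a phrase-aware check can be sketched as follows. The word list and the list of benign phrases are tiny, illustrative assumptions; the phrase check looks only at the word that immediately follows a flagged word.

```python
import re

BLOCKED_WORDS = {"beaver", "breast"}                          # illustrative only
BENIGN_PHRASES = {("beaver", "dams"), ("breast", "cancer")}   # illustrative only

def tokens(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())

def naive_block(text):
    """Block if any listed word appears anywhere in the text."""
    return any(word in BLOCKED_WORDS for word in tokens(text))

def phrase_aware_block(text):
    """Block only if a listed word appears outside a known benign phrase."""
    words = tokens(text)
    for i, word in enumerate(words):
        if word in BLOCKED_WORDS:
            following = words[i + 1] if i + 1 < len(words) else None
            if (word, following) not in BENIGN_PHRASES:
                return True
    return False
```

The sentence "Early detection of breast cancer saves lives" is blocked by `naive_block` but passed by `phrase_aware_block`. Neither function understands context beyond adjacent words, so both remain error-prone.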

Another method of textual analysis that is used for filtering is text classification or categorization (see Appendix C). This technique analyzes the text as a whole, taking account of such characteristics as frequency of occurrence of various words, co-occurrence of pairs or other combinations of words, and other statistical parameters of the text. Text classification is first applied to a so-called training collection of texts that are already known to be either appropriate or inappropriate, in order to discover regularities in the statistical properties of appropriate texts and inappropriate texts. The same technique is then applied to texts retrieved from the Web, and their statistical characteristics are used to classify them as either appropriate or inappropriate.
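One simple statistical technique of this kind is a naive Bayes classifier over word frequencies, sketched below. The choice of naive Bayes, the add-one smoothing, and the four-text training collection are assumptions made for illustration; the report does not say which classifier any given filter uses.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def train(labeled_texts):
    """Learn per-label word counts from (text, label) training pairs."""
    word_counts, doc_counts = {}, Counter()
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary

def classify(text, model):
    """Return the label with the highest smoothed log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokens(text):
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log(
                    (counts[word] + 1) / (total_words + len(vocabulary))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training collection.
TRAINING = [
    ("beavers build dams in rivers", "appropriate"),
    ("breast cancer screening and health information", "appropriate"),
    ("explicit adult pictures hot adult videos", "inappropriate"),
    ("hot explicit adult content", "inappropriate"),
]
MODEL = train(TRAINING)
```

With this training collection, `classify("adult beavers build dams", MODEL)` returns "appropriate" because the other three words outweigh "adult", whereas `classify("hot adult videos", MODEL)` returns "inappropriate". A realistic training collection would contain many thousands of texts.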

Filtering by Image Analysis

Almost all sexually explicit material on the Internet is associated with images. As indicated in Section and Appendix C, analysis of images to determine if they are inappropriate is a very hard problem, if it is to be done accurately. Nevertheless, there are some techniques that can provide clues to the potential inappropriateness of an image.19 For instance, it is possible to identify large expanses of what is likely to be flesh in an image, and it is also possible to determine whether an image is likely to be of one or more people. Also, it is possible to have a set of canonical or usual inappropriate images, against which images on a Web site can be compared. However, all of these techniques are highly error-prone and therefore are most often used in combination with other indicators of potential inappropriateness as described below.
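The flesh-tone idea, and its weakness, can be sketched with a crude color test. The RGB thresholds below are a rule of thumb assumed for this illustration, not a method taken from the report, and the "images" are simply lists of (red, green, blue) pixel values.

```python
def looks_like_flesh(pixel):
    """Crude, assumed rule of thumb for flesh-like colors in RGB space."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(pixel) - min(pixel) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )

def flesh_fraction(pixels):
    """Fraction of pixels that the rule classifies as flesh-like."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(looks_like_flesh(p) for p in pixels) / len(pixels)

# Invented "desert" image: 90 sand-colored pixels under 10 sky-colored ones.
DESERT = [(210, 180, 140)] * 90 + [(135, 206, 235)] * 10
```

For this invented desert scene, `flesh_fraction(DESERT)` is 0.9, so a filter that blocks images with a large flesh-colored area would block it. This is the kind of error that makes color analysis unreliable on its own.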

Filtering by Labels

All Web pages have associated with them information that describes various characteristics of the page and that is typically hidden from the user. For example, HTML or XML tags within the body of a page can encode various rules that determine how information is structured on the page. This low-level information can be used to compare the page's structure against a set of structures commonly associated with inappropriate pages. At a somewhat higher level, Web sites have associated with them information about the site or page as a whole. Such metadata can be used to determine the appropriateness (or not) of a site. Metadata is not directly viewable by the user, a feature that has been exploited by many inappropriate (and even some appropriate) sites in order to bias search results toward themselves.

For instance, due to the nature of search engines, the more times a word that is used in a query appears in a site, the higher up in retrieval rankings that site will be placed. Thus, extended repetition of commonly used search terms in the metadata, which have no relationship to the actual content of the site itself, will result in that site's being retrieved and placed highly in the results when those terms are used.

This methodology can, however, also be used for filtering purposes, in the following ways. The terms in the metadata can be compared to the words in the text of the page, and if those in the metadata are markedly dissimilar from those in the page, that page is suspect. Also, the fact of unusual repetition of words in the metadata can be used as a clue for filtering.
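Both clues can be computed with simple word counts, as in the sketch below. The function names and the two thresholds are hypothetical; suitable values would have to be found empirically.

```python
import re
from collections import Counter

def words(text):
    return re.findall(r"[a-z]+", text.lower())

def metadata_signals(meta_keywords, body_text):
    """Return (share of distinct metadata terms found in the body,
    highest repetition count of any single metadata term)."""
    meta = words(meta_keywords)
    body = set(words(body_text))
    distinct = set(meta)
    overlap = len(distinct & body) / len(distinct) if distinct else 1.0
    max_repeat = max(Counter(meta).values(), default=0)
    return overlap, max_repeat

def is_suspect(meta_keywords, body_text, min_overlap=0.5, max_allowed_repeat=3):
    """Flag a page whose metadata is unlike its text or is heavily repeated."""
    overlap, max_repeat = metadata_signals(meta_keywords, body_text)
    return overlap < min_overlap or max_repeat > max_allowed_repeat
```

A page whose metadata reads "beavers beavers beavers beavers beavers dams" but whose visible text is "Hot adult pictures" is flagged on both counts, while a page with metadata "beavers dams" and text "Beavers build dams" is not.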

The most straightforward method of labeling for filtering is labeling to indicate the nature of the content of the Web page or site. This can be accomplished either by third parties who label sites according to some established set of categories that indicate their content, or by the producer of the site. This is, in effect, the human version of the statistically based automatic text classification described above. The filter then works by establishing which categories of sites are allowed to be presented, reading the appropriate label in the metadata, and refusing all sites that are either on a black list of categories, or not on a white list.
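The label-checking step can be sketched as a set comparison. The category names are invented, and the treatment of unlabeled pages (passed by a black list, refused by a white list) is an assumption that follows from the description above rather than from any labeling standard.

```python
def permitted(page_labels, black_list=None, white_list=None):
    """Decide whether a page may be shown, given its category labels.

    black_list: categories that are refused.
    white_list: the only categories that are accepted; a page with no
    labels, or with any label outside the list, is refused.
    """
    labels = set(page_labels)
    if black_list is not None and labels & set(black_list):
        return False
    if white_list is not None and not (labels and labels <= set(white_list)):
        return False
    return True
```

For instance, `permitted({"adult"}, black_list={"adult"})` is False, and `permitted(set(), white_list={"education"})` is also False because an unlabeled page is not on the white list.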

A common framework for labeling is the Platform for Internet Content Selection (PICS). In the domain of television, the V-chip is a filter that is based on labeling. (Movies and video games also have labels (i.e., ratings) that often appear before a program is televised or a game is played, but these are not machine-readable. Further, these labels are intended to provide advice to consumers rather than to enable technological denial.)

Filtering Using Combinations of Methods

All of the technologies of filtering that are discussed above have inherent uncertainties associated with them, which lead them to make errors both of commission (misinterpreting a site as inappropriate) and of omission (not identifying an inappropriate site). However, the sources of error in each of the techniques are different. Thus, by combining the various techniques, the level of error can be reduced. For example, if image analysis indicates the high probability of a naked person but textual analysis does not indicate any of the words usually associated with adult-oriented material, analysis of the associated URL finds the domain .gov, and the metadata indicates that the owner of the site is the National Gallery of Art, the filter would be justified in predicting that the site should not be regarded as containing adult-oriented, sexually explicit material, despite the evidence from image analysis. Such methods show promise in improving filter performance.
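One simple way to combine indicators is a weighted average of per-technique scores, sketched below. The scoring scale (0 for clearly innocuous, 1 for clearly adult-oriented), the equal weights, and the numbers for the gallery example are all invented for illustration; actual filters may combine evidence quite differently.

```python
def combined_score(signals, weights):
    """Weighted average of per-technique scores, each between 0 and 1."""
    total_weight = sum(weights[name] for name in signals)
    weighted_sum = sum(weights[name] * value for name, value in signals.items())
    return weighted_sum / total_weight

WEIGHTS = {"image": 1.0, "text": 1.0, "url": 1.0, "metadata": 1.0}

# Invented scores for the art gallery example in the text.
GALLERY = {"image": 0.9, "text": 0.1, "url": 0.05, "metadata": 0.05}
```

Judged on image analysis alone, the gallery page scores 0.9 and would be blocked at a threshold of 0.5. The combined score is about 0.275, so the page would be allowed.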

Trade-offs in Filtering

As mentioned above, filtering is subject to two kinds of error: errors of commission, referred to in this volume as Type I errors or overblocking, and errors of omission, referred to as Type II errors or underblocking. In the information retrieval literature (see Appendix C), these kinds of errors are associated, respectively, with the performance measures of precision and recall. The first type of error--overblocking--occurs when a site that is appropriate is filtered, i.e., is deemed inappropriate and therefore denied to the user. The second type of error--underblocking--occurs when a site that is in fact inappropriate is deemed appropriate, and therefore permitted to the user.

Due to the nature of filtering, these two types of errors are inevitable. It is possible to adjust the method of filtering such that the occurrence of one type of error is reduced; however, reducing one type of error will always result in increasing the other type of error. For instance, one can reduce underblocking by setting the standard for what is inappropriate at a very low level (e.g., denying access to all sites or refusing all queries that contain the word "adult" or the word "sex"). This might result in many sexually explicit sites being successfully filtered, but it will clearly also result in a concomitant increase in overblocking, since many obviously appropriate sites will also be filtered.20 In some settings (e.g., in doing research), it is desirable to minimize overblocking. In other settings (e.g., in households that are highly risk-averse), it is desirable to minimize underblocking. But it is not possible to minimize both simultaneously. Note also that even a low rate of overblocking will still cause a large number of pages to be blocked, simply because most of the content on the Web is innocuous.

Quantitatively estimating the rates of these two types of errors, or the rate of success in blocking and not blocking, depends on knowledge (or estimation) of four numerical parameters, as indicated in .
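One common way to organize such an estimate is a two-by-two table of counts: appropriate pages blocked, appropriate pages passed, inappropriate pages blocked, and inappropriate pages passed. Whether these are exactly the four parameters the report has in mind is an assumption, and the counts used below are invented.

```python
def blocking_rates(appropriate_blocked, appropriate_passed,
                   inappropriate_blocked, inappropriate_passed):
    """Return (overblocking rate, underblocking rate,
    share of all blocked pages that were actually appropriate)."""
    overblocking = appropriate_blocked / (appropriate_blocked + appropriate_passed)
    underblocking = inappropriate_passed / (
        inappropriate_blocked + inappropriate_passed
    )
    blocked = appropriate_blocked + inappropriate_blocked
    innocuous_share_of_blocked = appropriate_blocked / blocked
    return overblocking, underblocking, innocuous_share_of_blocked

# Invented example: 990,000 appropriate pages and 10,000 inappropriate ones.
EXAMPLE = blocking_rates(9_900, 980_100, 9_000, 1_000)
```

In this invented example the overblocking rate is only 1 percent and the underblocking rate is 10 percent, yet roughly 52 percent of all blocked pages are innocuous, because innocuous pages so greatly outnumber inappropriate ones.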

Placement of Filters

Filters can be installed in a variety of places. Some ISPs use filters to screen the content they pass on to their subscribers. The major Internet browsers (Internet Explorer and Netscape) support label-based filtering. Some search engines provide users with the option to perform filtered searches. Third-party commercial software vendors sell stand-alone filters that can be installed on a personal computer or into a local area network serving an organization (e.g., a school or a library system). See Section 12.1.1 for a more detailed discussion of this issue.


2.3.2.

 

Technologies for Authentication and Age Verification

The process of authentication involves assessing the validity of an assertion about the identity of a user.21 (Note that a separate issue relates to the identification of a specific piece of software or hardware being used. When only a specific individual is using that software or hardware, the authentication problem is reduced to that of identifying the specific software or hardware in use. But in general, multiple users of a given software or hardware system must be assumed.22)

In the physical world, the authentication process is conceptually straightforward because of face-to-face interactions. When an individual buying beer presents a driver's license to a liquor store clerk, the clerk can compare the picture on the license to the individual in front of him. Of course, the license could be phony, but the face-to-face nature of the interaction helps to ensure that the subject being compared to the credential is real.23

Such assurance is not available when a face-to-face interaction is not possible, as in the automated authentication of a user to a computer system.24 Automated authentication depends on the prospective system user sharing with the authentication device something the person knows (such as a secret password), something the person has (such as a "smart card" belonging to the appropriate individual), or something the person is (a biometric signature such as the individual's voice, a fingerprint, or a retinal pattern).

Authentication is only one dimension of keeping children away from age-inappropriate materials. The second key element is that of ensuring that a user is older than some specified age (e.g., older than 17). While authentication involves assessing the validity of an assertion about the identity of a user, it does not speak directly to the issue of age verification. Assurance about age must, in general, be provided by reference to a document that provides information about it, and today's infrastructures needed to support online authentication of identity generally do not include such documents.

In the physical world, age verification can be provided as a part of the credential being presented--a driver's license generally has a date of birth recorded on it. However, a driver's license would be just as good an authenticator of identity if it did not have the date of birth on it.

In an online environment, age verification is much more difficult because a pervasive nationally available infrastructure for this purpose is not available. One method is based on the fact that many adults (but not very many children) have credit cards--presentation of a valid credit card number is presumed to be an indicator that the presenter is an adult. Taken in the large, this is not a bad assumption--the vast majority of credit cards are in fact owned by adults, and the vast majority of minors do not own or have legitimate access to credit cards. Thus, an adult-oriented Web site that uses credit cards as its medium of exchange presumes that the presentation of a valid credit card also verifies that the card user is of legal age. (Even when children are issued credit cards, it is likely that their parents are reviewing the statements.) Entering a valid credit card number grants access to the inside of the site.25

Many online adult verification services (AVSs), which provide a verification of adult status to other adult Web sites, also use credit cards.26 Because the credit card is generally the user's method of payment for the service, the AVS relies on the credit card to verify the adult status of the user.27

Another approach to age verification is to couple databases of public information to an authentication process. For example, an individual wishing to gain access to an adults-only service sends an online request to an age verification service (along with a credit-card number to effect payment) for a certification of age. He or she also provides appropriate biographical information, and the adult verification service checks that information against public records, such as state driver's license and voter registration records, that contain or imply age information. When adult status is confirmed, a credential that certifies one's adult status and can be used online is mailed (via postal service) to the address of record on those public records. (In this context, the postal address serves as an authenticating device that ensures the adult credential is sent to the right person.) The individual can then use this credential as a special key to obtain access to adults-only services that recognize it.

A third approach is to use age verification scripts. An online script can guide a user through a questionnaire that asks, among other things, the user's age, and it can reject users that are underage. To help deal with the problem of lying about one's age, some scripts are written to accept only one attempt at entering age, and so a user who enters "15" at first, is rejected for being underage, and then tries to enter "20," is unsuccessful. In such cases, he or she may have to try again from another computer.
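The single-attempt behavior can be sketched as follows. The class name, the use of a session identifier to stand for "the same computer," and the minimum age of 18 are assumptions for this example; a real script would have to decide how to recognize a returning user.

```python
class AgeGate:
    """Toy age-verification script that accepts one attempt per session."""

    def __init__(self, minimum_age=18):
        self.minimum_age = minimum_age
        self._rejected_sessions = set()

    def submit(self, session_id, stated_age):
        """Return True if access is granted for this session."""
        if session_id in self._rejected_sessions:
            return False              # an earlier underage answer is remembered
        if stated_age < self.minimum_age:
            self._rejected_sessions.add(session_id)
            return False
        return True
```

A user who enters 15 and then 20 from the same session is refused both times, but the same user entering 20 from a different session is admitted. The script checks consistency, not truth.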

Note that each of these methods imposes a cost in convenience of use, and the magnitude of this cost rises as the confidence in age verification increases. Age verification scripts are very convenient for the legitimate adult user, who must simply tell the truth about his or her age. But they are also susceptible to being fooled by a savvy adolescent who knows that the correct age must be entered. A credit card is less convenient for the legitimate adult user, because he or she must be willing to incur the expense of a subscription (or the hassle of canceling one). However, since most credit cards are owned by adults, the use of a credit card provides additional confidence that it is truly an adult who is seeking to use it. At the same time, some minors do own credit cards or prepaid cards that function as credit cards, while other minors are willing to use credit cards borrowed with or without permission from their parents. Using public databases to verify adult status provides the greatest confidence of all that the alleged adult is truly an adult, but because the user must wait for the processing and mailing of the adult credential, it is the least convenient of them all.

Claims have been made that certain "biometric" signatures can differentiate between adults and children. While human physiology does indeed dictate that certain changes in one's body occur as one grows from child to adult, the precise trajectory of these changes varies from individual to individual. However, one's legal status--as being entitled to privileges as an adult that are not enjoyed as a child--is fixed by laws that specify, for example, that individuals even one day over 18 are considered adults and one day under 18 are considered unemancipated minors. No technology today or on the horizon can hope to make such fine distinctions in the case of individuals.28 For this reason, biometric technologies as a method for age verification are not considered here.

Age verification technologies as integrated into functional systems are discussed in greater detail in Chapter 13.


2.3.3.

 

Encryption (and End-to-End Opacity)

Encryption is used to hide information from all but specific authorized parties. In the most general encryption process, an originator (the first party) creates a message intended for a recipient (the second party), protects (encrypts) it by a cryptographic process, and transmits it as ciphertext. The receiving party decrypts the received ciphertext message to reveal its true content, the plaintext. Anyone else (a third party) who wishes undetected and unauthorized access to the message must penetrate (by cryptanalysis) the protection afforded by the cryptographic process or obtain the relevant decryption key by some other means (such as bribing someone to reveal it).

Encryption also has relevance to the protection of digitized intellectual property, such as proprietary images. Because encryption restricts the access of unauthorized parties, encryption can be used to help prevent the dissemination of unauthorized reproductions of digital objects. Encryption is thus the fundamental technology underlying digital rights management systems (discussed in greater detail in Chapter 13). The use of encryption may increase dramatically in the coming years.

In the context of this study, the significance of encryption is that if content, whether acceptable or inappropriate, is encrypted properly, it cannot be identified by third parties. Thus, while it is possible to interdict all information flows that are encrypted, it is impossible to interdict specific transmissions on the basis of content--a point with obvious relevance to filtering systems intended to block specific content. In effect, encryption allows transmission and reception of information to occur with essentially no outside scrutiny possible.
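The effect on a content filter can be shown with a toy cipher. The sketch below uses a one-time pad (XOR with a random key as long as the message) only because it fits in a few lines of standard-library Python; it is an assumption standing in for a properly implemented cipher and is not how deployed systems encrypt traffic. The keyword filter is likewise hypothetical.

```python
import secrets

def xor_bytes(data, key):
    """Encrypt or decrypt by XOR with a key of the same length as the data."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must be as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

def new_key(length):
    """Generate a fresh random key; it must never be reused."""
    return secrets.token_bytes(length)

def keyword_filter_flags(payload, keywords=(b"adult",)):
    """Hypothetical content filter: flag a payload containing any keyword."""
    return any(keyword in payload for keyword in keywords)
```

A filter in the middle sees only the ciphertext. With the fixed demonstration key `b"\x01\x02\x03\x04\x05"`, the plaintext `b"adult"` becomes `` b"`fvhq" ``, which the keyword filter does not flag, while the intended recipient recovers the original by applying the same key again.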


2.5.

 

Anonymizers

As noted in Section , the technology of the Internet itself does not generally require any party to authenticate its identity. Thus, users and online identities (e.g., a screen name or an e-mail address) are bound together through administrative procedures, usually those of an ISP, that are associated with gaining access to the Internet. Through such bindings, any interaction of an individual with an Internet-related service--whether visiting a Web page, sending an e-mail, posting a message, setting up a Web page, or participating in a chat room--is tied to a specific identity that can, in principle, be traced administratively back to that specific individual.

Anonymizers break this binding and decouple an individual from a specific online identity. The anonymizer provides what amounts to an identity that is randomly generated. This identity is then used for posting messages, sending e-mail, participating in chats, and accessing Web pages. (Some anonymizers enable return paths when necessary; for example, the recipient of an anonymous e-mail may wish to reply to the (anonymous) sender.) However, anyone seeking to trace the anonymized identity back to the original user will find a number of barriers that make it very difficult to recover the identity of the original user. One example of an anonymizer useful to publishing information on the Web is described in .
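The decoupling described above can be sketched in a few lines. This is a toy model under stated assumptions, not a description of any real anonymizer: the class `ToyAnonymizer`, its methods, and the example address are all hypothetical, and real systems add further barriers (such as chaining several relays) that are omitted here.

```python
import secrets
from typing import Dict, Optional


class ToyAnonymizer:
    """Hypothetical relay that replaces a real identity with a random pseudonym."""

    def __init__(self) -> None:
        # Private table held only by the relay. It is the sole link between
        # a pseudonym and the real identity behind it.
        self._return_paths: Dict[str, str] = {}

    def anonymize(self, real_identity: str, keep_return_path: bool = False) -> str:
        """Issue a randomly generated identity, optionally remembering who owns it."""
        pseudonym = "anon-" + secrets.token_hex(8)
        if keep_return_path:
            self._return_paths[pseudonym] = real_identity
        return pseudonym

    def deliver_reply(self, pseudonym: str) -> Optional[str]:
        """Return the real address for a reply, or None if no return path was kept."""
        return self._return_paths.get(pseudonym)


relay = ToyAnonymizer()
one_way = relay.anonymize("student@example.com")
two_way = relay.anonymize("student@example.com", keep_return_path=True)

print(relay.deliver_reply(one_way))  # no return path was stored
print(relay.deliver_reply(two_way))  # the relay can forward a reply
```

In this sketch, anyone tracing a pseudonym must obtain the relay's private table. When no return path is kept, even the relay has nothing to disclose.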

Anonymizers are significant because they enable individuals to undertake activities without fear of retribution. For an individual living in a totalitarian state, an anonymizer enables him or her to post an anti-government message in safety or to browse forbidden Web sites. In the United States, an anonymizer enables someone to freely post a message expressing unpopular political views or to browse Web sites in privacy. Commercial enterprises--which need to have a way to accept money--do not have much use for anonymizers, even if they are posting materials that may be controversial. But those with non-commercial interests can use the same technology to anonymously post child pornography or harass or stalk an individual online. When anonymizers are used, tracing the identity of online criminal perpetrators becomes difficult.


2.6.

 

Location Verification

The legal regimes of today are ones in which jurisdiction is based largely on geographical borders. For example, as noted in Chapter 4, "community standards" are an important factor in determining whether a given image is obscene. However, the Internet is designed and structured in such a way that geographical borders and the physical location of a user have no significance for the functionality he or she expects from the Internet or any resources to which he or she is connected. This fact raises the question of the extent to which a user's location can in fact be established.

One way to establish location is simply to ask the user where he or she is located upon logging in. Thus, the first screen seen by the user might ask for his or her present zip code (or state, or country). But in the event that the user chooses to be deceptive (e.g., to avoid restrictions on Internet service based on his or her location), the problem shifts to one of determining location through technological means.

Under some circumstances, it can be virtually impossible to determine the precise physical location of an Internet user. Consider, for example, the case of an individual connecting to the Internet through a dial-up modem. It is not an unreasonable assumption that the user is most likely in the region in which calls to the dial-up number are local, simply because it would be unnecessary for most people to incur long-distance calling costs for such connections. However, nothing prevents a user from using a long-distance telephone call (e.g., from Tennessee) to access a modem in California.

In practice, recovering location information is a complex and time-consuming process.29 As a rule, the information needed to ascertain the geographic location of an IP address associated with a fixed (wired) Internet access point at a given time is known collectively by a number of administrative entities, and could be aggregated automatically. But there is no protocol in place to pass this information to relevant parties, and thus such aggregation is not done today.
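As a rough illustration of how allocation records could be aggregated into a location guess, the following sketch maps an IP address to the country of the entity that holds its address block. The table is entirely hypothetical: the address blocks are ones reserved for documentation, and the ISP names and countries are invented. No such shared table or lookup protocol is described in this report.

```python
import ipaddress
from typing import Optional

# Hypothetical allocation records. The blocks are reserved for documentation
# (RFC 5737); the ISP names and countries are invented for this example.
ALLOCATIONS = [
    (ipaddress.ip_network("192.0.2.0/24"), "Example ISP France", "FR"),
    (ipaddress.ip_network("198.51.100.0/24"), "Example ISP California", "US"),
]


def likely_country(ip: str) -> Optional[str]:
    """Return the country of the ISP holding this address, or None if unknown."""
    address = ipaddress.ip_address(ip)
    for network, _isp, country in ALLOCATIONS:
        if address in network:
            return country
    return None


print(likely_country("192.0.2.17"))   # falls inside the first block
print(likely_country("203.0.113.5"))  # not covered by any record in the table
```

The result is only an inference about where the ISP operates. A subscriber who dials in from Tennessee to a California modem would still be reported under the California ISP's block.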

The bottom line is that determining the physical location of most Internet users is a challenging task today, though this task is likely to be easier in the future. Appendix C provides additional discussion.


2.5.

 

WHAT THE FUTURE MAY BRING

The hardest part of this report to calibrate is how the future will change the technologies that today scope both the problem and any putative solutions. As of this writing, the World Wide Web is not even a decade old, while the creation and adoption rates for new technologies show generally accelerating deployments of these technologies.

The rapid changes of capability in the hardware underlying information technologies will lead to computing that is 100 times more cost-effective, storage 1,000 times more cost-effective, and bandwidth 10,000 times more cost-effective 10 years hence, and it is highly likely that many applications will emerge to take advantage of such increased capability, as has occurred in the past. What follows below is admittedly speculative, but even if any given speculation is far from the mark, taken together these notions paint a portrait of a very different technological milieu in which the age-old problem of "protecting children on the Internet" will play out in the future.

  • Mechanisms for financial transactions will change significantly over the course of a decade. Financial transactions are likely to become increasingly less private, as the various forms of payment embody different features to enable traceability. Even cash may become more traceable in the future. This development will favor parents who wish to monitor the expenditures of their children, but will have no impact on those children who borrow electronic wallets at home or who access those sources of sexually explicit material that do not charge.
  • Voice interaction with computers will become increasingly common, and the capability of computer-generated voices to sound like real people, or even parties known to an individual, will increase. Today, a 55-year-old man can pretend to be a 13-year-old girl using e-mail and instant messages; tomorrow, a 55-year-old man may be able to sound just like a 13-year-old girl over the telephone. It may even be possible for the same 55-year-old man to sound like the girl's mother. In short, technology will offer greater deceptive capabilities, and those who are most at risk from the existence of such capabilities are likely to be children who lack the experience to identify deception.
  • Voice interaction will allow younger children, who would find typing difficult, to speak a Web site address to their computer.
  • Peer-to-peer interactions will be increasingly common, as the technology will largely eliminate the need for large-scale servers, thus eliminating them as principal points of leverage for any control strategy. It is already becoming more expensive to selectively delete content than to keep it all, and this economic fact will dominate the future with implications for privacy, digital rights management, and the steady accumulation of data that is best described as digital detritus.
  • Virtual reality advances will soon defeat the ability of even experts to distinguish pictures that are real from those that are synthetic. Haptic devices (i.e., touch-, motion-, and pressure-sensitive devices) may become more common as a way to interface with computers. Whether then a person, an action, or an event is real or not may soon be irrelevant to many consumers. Action, especially "action" in the sexual and violent sub-meanings of those words, will be as realistic as the audience is willing to pay for, and the prices of such offerings will inevitably drop.
  • Locations from which access to the Internet is possible will proliferate wildly. And, with an expansion in the types of information resources that are accessible (e.g., new virtual reality resources), policies that give permission to view, access, modify, or delete any information resource will present an enormously complex problem simply as a result of scale. Even today, fine-grained access control driven by policy is, or soon will be, beyond the scope of human management and may be beyond the scope of mechanistic alternatives. If access control policies are impossible to formulate, the only alternative is an approach that depends on users to exercise self-control. Monitoring of user actions in order to ensure appropriately self-controlled users then becomes the only technical alternative to access control. This is not a statement about the desirability of this outcome, only that it is a possible one if access control policies become impractical.

Although the notions described above are not necessarily desirable from a societal or personal standpoint, they are extrapolations of certain phenomena today, and there are at least some paths from today that could result in their coming true. On the other hand, they may not come true, a point that emphasizes a vast range of uncertainty about the technological future.

What has been true over the years is that those who produce and consume sexual content--both for commercial and non-commercial purposes--have stayed on the leading edge of new technologies.30 Thus, whatever the technological future is like in detail, it seems safe to predict with reasonably high confidence that sexual content will be disproportionately present in the initial stages of adoption of any new technology. Because technology changes rapidly, no final technological solutions are possible. It is for this reason, among others, that the committee in later chapters emphasizes social and educational strategies for protecting children from inappropriate sexually explicit material.

Finally, many of the issues associated with protecting children from inappropriate material and experiences on the Internet relate to the architecture of the Internet as it exists today, a state of existence that reflects policy and engineering decisions made decades ago. These are not immutable, though major changes that might facilitate control of content delivery could be made only at very considerable cost and at the potential expense of other societal interests.


Boxes

Box 2.1 Interactivity of the Internet
Box 2.2 Modes of Exposure to Inappropriate Sexually Explicit Material and Potentially Dangerous Experiences
Box 2.3 How Search Engines Work
Box 2.4 Characteristics of Usenet Newsgroups
Box 2.5 Internet Applications for Facilitating Web Activity
Box 2.6 Human Scrutiny of Every Site to Be Blocked?
Box 2.7 The Platform for Internet Content Selection (PICS)
Box 2.8 Appropriate and Inappropriate Blocking
Box 2.9 Specific Identification of Hardware and Software
Box 2.10 Publius: A System for Publishing Anonymously on the World Wide Web


Notes

1 More discussion can be found in Computer Science and Telecommunications Board, National Research Council, 2000, The Digital Dilemma: Intellectual Property in the Information Age, National Academy Press, Washington, D.C.

2 Customization happens explicitly when a user undertakes a search for particular kinds of information, but it can happen in a less overt manner because customized content can be delivered to a user based, for example, on his or her previous requests for information.

3 Marjory S. Blumenthal and David D. Clark, 2000, "Rethinking the Design of the Internet: The End to End Arguments vs. the Brave New World," in Communications Policy in Transition: The Internet and Beyond, 2001, B. Compaine and S. Greenstein, eds., MIT Press, Cambridge, Mass.

4 It is true that access to the Internet may require an individual to log into a computer or even to an Internet service provider. But for the most part, the identity of the user--once captured for purposes of accessing the Internet--is not a part of information that is automatically passed on to an applications provider (e.g., a Web site owner). More importantly, many applications providers--for entirely understandable business reasons--choose not to require authentication. (Strong authentication in general requires an infrastructure that is capable of providing a trusted verification of identity--and in the absence of such an infrastructure, strong authentication is an expensive and inconvenient proposition for the user. This point is discussed at greater length in Section ).

5 Usenet is a worldwide distributed discussion system, consisting of a set of newsgroups with names that are classified hierarchically by subject. "Articles" or "messages" are "posted" to these newsgroups by people on computers with the appropriate software--these articles are then broadcast to other interconnected computer systems via a wide variety of networks. Some newsgroups are "moderated"; in these newsgroups, the articles are first sent to a moderator for approval before appearing in the newsgroup. For more information, see Chip Salzenberg, "What is Usenet?," available online at <http://www.faqs.org/faqs/usenet/what-is/part1/>.

6 There are Web sites through which one can read Usenet newsgroups even if the ISP has decided not to carry certain newsgroups, thus circumventing the ISP's selection policy.

7 The IP address of a device provides a unique address to which and from which messages can be routed. A typical IP address has the form a.b.c.d, where a, b, c, and d are numbers from zero to 255. The mapping between domain name and IP address is managed by devices known as domain name servers. See Computer Science and Telecommunications Board, National Research Council, Domain Name Systems, National Academy Press, Washington, D.C., in preparation, for more information. Note also that IP addresses may be mapped dynamically to devices, so today, a user's computer would have one IP address and tomorrow it might have a different one.

8 Most browsers handle addresses without a preceding "http:" as though it were present. Also, some Web pages are accessible only through the "https:" protocol.

9 For example, as of November 2001 the Google search engine had indexed 1.6 billion Web pages. As of April 2002, it had indexed 2.1 billion Web pages.

10 This mode of file sharing first gained widespread publicity with the Napster network, an online service that facilitated the sharing of digital music files among users. The files themselves--the information content of interest to end users--always remained on client systems and never passed through a centralized server (such as one that would host a Web page). Instead, the server gave end users the ability to search for particular files of interest and to initiate a peer-to-peer transfer between the users willing to share (and receive) files without the payment of a fee, even when the files constituted legally protected intellectual property. Napster is important for this discussion because there is no particular reason that the files in question must be digital music files--and indeed, extensions of the Napster protocol can handle other types of files.

11 Personal communication, Dan Geer, president of USENIX.

12 Buddy lists are an important element of IM services. A buddy list contains the online names of "buddies" of a given user and indicates when one or more come online. When the user knows that "sue123" is online, she can send "sue123" an instant message and start a conversation. Thus, buddy lists facilitate online real-time communication among people who know each other's online names. Most IM services also offer a blocking option that enables a user who receives an IM from someone to block it. This option is used when a user receives an IM from someone with whom the user does not want to communicate (e.g., a stranger, or a friend with whom one is on the "outs").

13 A 2001 video advertisement from Sony Europe for its Vaio line of notebook computers (which can have a Webcam built into them) depicts a man working at home on his Vaio notebook computer (with the Webcam). An adult female whom he obviously knows enters the room, greets him, strips to her underwear in another room, and starts behaving with him in a very sexually aggressive manner. The advertisement closes with several businessmen on the other end of a video conference looking at their screen in surprise seeing the woman on top of the man. The advertisement is sexually suggestive but depicts no overt sexual activity or nudity.

14 This is not to say that all content on the Internet remains accessible, but in practice attempts to ban certain information content result in efforts by those interested in such information to copy and distribute it. Thus, while the personal medical records of John Doe may not be of particular interest, and if posted today may disappear without a trace tomorrow, the reason is that no one except John Doe is likely to be interested in such records. However, if the personal medical records of the President of the United States were posted on the Internet, it would be virtually impossible for the most determined efforts of the White House to erase them and to eliminate all access to them.

15 Of course, such a claim is valid only to the extent that content providers and ISPs are numerous and dispersed internationally. If the number of ISPs is small enough (as could happen through attrition or mergers and acquisitions in one jurisdiction), they become likely targets for regulation, as regulatory efforts can be concentrated rather than dispersed.

16 On the other hand, sources that appear to be foreign may in fact be under the jurisdiction of U.S. law. For example, the mere fact that a domain name has a country suffix such as .ru or .jp does not necessarily mean that its owner is located in Russia or Japan. Indeed, in this hypothetical example, such parties may well reside in California or Iowa.

17 D.A. Forsyth and M.M. Fleck. 1999. "Automatic Detection of Human Nudes," International Journal of Computer Vision 32(1): 63-77.

18 As of October 26, 2001, this Web site presented a blank page. But it may not be blank in the future.

19 A brief summary concerning the technology of screening for sexually explicit images can be found in James Ze Wang, Jia Li, Gio Wiederhold, and Oscar Firschein, 1998, "System for Screening Objectionable Images," Computer Communications Journal 21(15): 1355-1360, and papers referenced therein.

20 This is a real example from a filtering system that was encountered at one of the site visits.

21 In this report, the term "identity" is used in its colloquial sense, namely the biological life form – the human being – in question. Security specialists often refer to identity more generally as a collection of information about an individual. For more discussion, see the Computer Science and Telecommunications Board's forthcoming study on authentication technologies. Available online at <http://www.cstb.org/web/projects/authentication>.

22 For more discussion of authentication technologies, see Computer Science and Telecommunications Board, National Research Council, 1991, Computers at Risk; Cryptography's Role in Securing the Information Society, 1996, Kenneth W. Dam and Herbert S. Lin, eds.; Trust in Cyberspace, 1999; and Realizing the Potential of C4I: Fundamental Challenges, 1999, all published by National Academy Press, Washington, D.C. CSTB's forthcoming study on authentication will address these technologies comprehensively (see footnote ).

23 Indeed, in the physical world, someone who presents a fake ID that is recognized as such by the clerk is subject to arrest.

24 In principle, age verification could occur through the use of streaming video and audio. In this scenario, a Web camera and microphone located on the user's access point would be used to transmit a high-fidelity voice and video image to a human being working on behalf of the adult content provider. The human being (who might be called a cyberspace "bouncer") would ascertain the adult status from viewing the image and listening to the voice, and if there were any doubt, the bouncer would demand to see a driver's license that the alleged adult could hold up to the camera. Even through voice alone, a trained human verifier can often determine whether the person on the other end is in fact an adult, though this may not always work for very young adults. The human verifier asks questions, and then listens for tone of voice, composure, presence, stuttering, and other things that are not reflected in a typed textual interaction. Because adults tend to have more confidence and self-assurance than children, such voice interactions provide valuable distinguishing information. These scenarios are technically feasible even today, but are likely not to be economically attractive. The reason is that one of the major advantages of Internet commerce is the ability to drastically reduce the extent to which human beings are involved. Given that many adult-oriented Web sites operate on very thin margins, the use of such a mechanism would likely be prohibitive.

25 Determining with certainty whether a submitted credit card number corresponds to an account in good standing requires an online transaction between the site operator and the credit card company. That is, the site operator transmits the number to the credit card company and the company checks to see if the number refers to an account in good standing. However, many incorrectly formed credit card numbers (i.e., numbers that do not correspond to an existing credit card account) can be detected without such a transaction if the site operator applies a formula (the Luhn formula; see <http://www.webopedia.com/TERM/L/Luhn_formula.html>) to the alleged number. Although the application of the Luhn formula will detect 90 percent of randomly generated credit card numbers, the random generation of a few dozen numbers is virtually certain to result in at least one number that passes the Luhn test. (In addition, a number that passes the Luhn test can be circulated among interested parties with ease.) Note also that the Child Online Protection Act (COPA), discussed in Chapter 4, specifies that the use of age verification technologies such as a credit card is an affirmative defense to the charge that a commercial Web site made available to minors material that is "harmful to minors" (i.e., obscene with respect to minors). This defense protects the Web site whether or not a minor has stolen a credit card or generated a false credit card number that passes the Luhn formula.

26 Such services also accept applications via 1-900 phone numbers (which children are not supposed to use without parental permission) that charge phone bills automatically and via U.S. mail. Mail applications are supposed to include proof of age.

27 The "typical" adult verification service provides the user with a special code number. Adult Web sites contract with the service (of which many exist). A user wishing access to one of these adult Web sites enters the code number. The adult Web site then contacts the AVS to confirm that the number is valid, and if it is, grants the user access. (The adult Web site usually pays the AVS a commission for users that are verified in this manner.)

28 See, for example, testimony of John Woodward, senior policy analyst, RAND, to the COPA Commission on June 9, 2000. Available online at <http://www.copacommission.org/meetings/hearing1/woodward.test.pdf>.

29 While location information is not provided automatically from the IP addresses that an administrative entity allocates, some location information can be inferred. For example, if the administrative entity is an ISP, and the ISP is, for example, a French ISP, it is likely--though not certain--that most of the subscribers to a French ISP are located in France. Of course, a large French company using this ISP might well have branch offices in London, so the geographical correspondence between French ISP and Internet user will not always be valid for this case, though as a rule of thumb, it is not a bad working assumption.

30 For example, the video cassette recorder, inexpensive video cameras, and CD-ROM technologies found some of their first applications in the production and viewing of sexually explicit "adult" movies and interactive sexual games and entertainment. For one perspective on this point, see Jonathan Coopersmith, 2000, "Pornography, Videotape, and the Internet," IEEE Technology and Society 19(1): 27-34.

31 Steve Lawrence and Lee Giles. 1999. "Accessibility of Information on the Web," Nature 400: 107-109.

32 Note that free services often have characteristics that may make them less desirable than for-pay services. For example, a business owner who is concerned about high reliability and availability of resources is likely to want a service that provides frequent file backup, a feature that is unlikely to be available from a free storage provider.

33 Seth Finkelstein and Lee Tien, 2000, Blacklisting Bytes, white paper submitted to the Committee on Tools and Strategies for Protecting Kids from Pornography and Their Applicability to Other Inappropriate Internet Content. Available online at <http://www.itasnrc.org> and at <http://www.eff.org/Censorship/Censorware/20010306_eff_nrc_paper1.html>.

34 Finkelstein and Tien. 2001. "Blacklisting Bytes," footnote above.

35 For example, according to the Intel Corporation, a processor chip newer than the Pentium II processor (i.e., Pentium III and above) electronically embeds a processor serial number, which serves as an identifier for the processor, and, by association, the system of which it is a part. Though the default setting for the processor serial number is "off," it can be turned on through the use of software. Intel believes that system identification can enable certain benefits, such as authenticating participants in a secure chat room or ensuring security in e-commerce situations. For business users, processor serial number identification will allow information technology departments to provide better information management or improved management of corporate PC assets. See <http://support.intel.com/support/processors/pentiumiii/psu.htm> and <http://support.intel.com/support/processors/pentiumiii/psqa.htm#2>.

36 See, for example, Greg Lefevre, 1999, "Microsoft's GUID Sparks Fears of Privacy Invasion," CNN, March 8. Available online at <http://www.cnn.com/TECH/computing/9903/08/microsoft.privacy.02/>.
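The Luhn check mentioned in note 25 can be sketched as follows. This is a minimal illustration of the standard checksum, not code from any site operator; the function name `passes_luhn` and the example prefix are assumptions chosen for the example, and the prefix is not a real account number.

```python
def passes_luhn(number: str) -> bool:
    """Return True if the digit string satisfies the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


# For a fixed 15-digit prefix, try all ten possible final digits.
prefix = "4" + "1" * 14  # arbitrary illustrative prefix, not a real account
passing = [d for d in "0123456789" if passes_luhn(prefix + d)]
print(passing)

# Chance that at least one of n independent random numbers passes the check.
n = 36
print(1 - 0.9 ** n)
```

Because the final digit is added to the sum undoubled, exactly one of the ten candidates passes for any prefix, which is where the 90 percent detection figure comes from. The last line shows why a few dozen random attempts are very likely to yield a passing number: with 36 attempts the chance is about 0.98.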











Copyright 2002 by the National Academy of Sciences