Title: The Invisible Web. By: Clyde, Anne, Teacher Librarian, 14811782, Apr2002, Vol. 29, Issue 4
Database: Academic Search Complete
THE INVISIBLE WEB
Section: Info Tech
In October 2001, Google claimed to have the largest database of any Web search engine -- it was searching a database with links to more than 1,600 million Web pages. Enormous as the Google database is, though, other search engines were indexing some Web pages that were not represented in Google's database. Further, the new "bow tie" theory of the Internet (developed by Chris Sherman) suggests that there is not as much overlap among the major search engines and directory databases as had previously been thought. In other words, the Internet as represented in the search engine databases may be much larger than we have assumed.
Even so, it seems that Web search engines together are indexing no more than about 40 to 50 percent of the Web (estimates range from as low as 20 percent to as high as 50 percent). This means that a very large part of the Web cannot be searched using search engines such as Google, AltaVista, HotBot, or Excite. USA Today reported on a research paper estimating that the World Wide Web is 500 times larger than what is represented in the databases of these search engines. Terms used for the part of the Web's content that is not indexed by the major search engines include "the invisible Web," "the deep Web" and "the hidden Web." It is not that this portion of the Web really is invisible; it is simply not available to users of conventional Web-based search engines. Locating information on the "invisible" or "deep" Web is often referred to as "mining." It may require special skills, special access conditions (such as passwords) or even special software.
Material available only to people with passwords represents one of the most important parts of the "invisible" Web. The required password might be paid for, come via membership in an association or be free, but it nevertheless serves to protect information from users who do not have authorized access. Web sites of some professional organizations have pages that are available only to members. Some universities and colleges make course materials available only to their own registered students. Subscription-based commercial online information services such as DIALOG, OCLC, SIRS or InfoTrac are also "hidden" from Internet users who have not paid the fee or who are not users of a school library or university library that has paid the fee. Subscription-based online encyclopedias are another example.
Another component of the "invisible" Web is material made publicly available through real-time "streaming" services -- services that may send a news message in a running "ticker" across the computer screen, or provide audio or video content in a stream to a window on the screen. Once the streamed message is gone, it cannot be recalled by the user through conventional means (unless it is captured to disk when first broadcast) -- yet it remains stored on the service that sent it out. Continuously updated news, financial information, weather information and sports scores all fall into this category. Much of this information may also have historical value in the future.
Still another major component of the "invisible" Web is information stored in databases on database-driven Web sites (Wiseman, 1999/2000). BrightPlanet/Complete Planet claims to have links to more than 90,000 searchable databases through its Web site. An example of a database-driven Web site is Amazon, the electronic bookstore. At Amazon, much of the information about books, authors, reviews and book rankings is stored in databases and is only displayed on a Web page when someone does a search. Such Web pages, generated automatically in response to a search query, are called "dynamic pages." The next time someone does a search using the same search terms, the page of results might look different because new items have been added to the database or because Amazon has stored information about the user's preferences. Even if a search engine does penetrate part of a dynamic Web site, the dynamically created Web pages will have changing URLs in response to each new search, so storing a URL is pointless. Other important database-driven Web sites include:
• the Universal Currency Converter, which updates financial/currency rates once a minute,
• FedStats, which provides access to statistics maintained by United States government agencies,
• most statistical data from the United States Census Bureau (and the equivalent bureaus of other countries),
• ERIC, the enormous database of education resources maintained by the United States federal Department of Education,
• PubMed, the Web version of the important Medline medical database, and
• library catalogs such as the catalog of the Library of Congress (which includes more than 12 million documents) and even the catalogs of local libraries.
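The idea of a "dynamic page" described above can be illustrated with a minimal sketch. It is not how Amazon or any of the listed sites actually works; the table name, the sample titles and the function names are all hypothetical, chosen only to show a results page being built from a database at the moment of a search, and changing when the database changes.

```python
import sqlite3
from html import escape


def build_catalog():
    """Create a tiny in-memory book database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
    conn.executemany(
        "INSERT INTO books VALUES (?, ?)",
        [
            ("Web Searching Basics", "A. Author"),
            ("Library Catalogs Online", "B. Writer"),
        ],
    )
    return conn


def render_results_page(conn, query):
    """Build an HTML results page on demand for one search query.

    The page does not exist anywhere until this function is called,
    which is why a crawler that only follows stored links never sees it.
    """
    rows = conn.execute(
        "SELECT title, author FROM books WHERE title LIKE ? ORDER BY title",
        (f"%{query}%",),
    ).fetchall()
    items = "".join(
        f"<li>{escape(title)} by {escape(author)}</li>" for title, author in rows
    )
    return (
        f"<html><body><h1>Results for {escape(query)}</h1>"
        f"<ul>{items}</ul></body></html>"
    )


conn = build_catalog()
first_page = render_results_page(conn, "Web")

# A new item is added to the database between two identical searches.
conn.execute(
    "INSERT INTO books VALUES (?, ?)", ("The Deep Web Explained", "C. Scribe")
)
second_page = render_results_page(conn, "Web")

# The same search terms now produce a different page.
print(first_page != second_page)
```

Because each page is assembled only when a search is submitted, there is no fixed document for a conventional search engine to index, and the same query can return different content from one day to the next.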
There is no single easy answer to the inevitable question, "How do I find information on the invisible Web?" While directories exist, their coverage varies and is far from complete. The commercial online information services vary a great deal in terms of the search strategies supported, and searchers are most likely to be successful when they understand the content of the databases and the way in which the content is indexed. This can be true even of the apparently simple services like online encyclopedias. Generally speaking, the complex scientific and business databases (whether free or for-fee) each require search skills specific to the particular database, so moving from database to database demands considerable expertise.
Direct Search (compiled by Gary Price, co-author of The invisible Web: Uncovering information sources search engines can't see) is a large searchable directory of "links to the search interfaces of resources that contain data not easily or entirely searchable/accessible from general search tools" -- in other words, the "invisible" Web. It includes archive collections, library catalogs, collections of electronic books, United States government databases, international databases, news sources, bibliographies, journal or magazine databases, legal services, standard reference works online and subject-specific resources. Another, shorter listing can be found on Robert Lockie's Web page, "Those dark hiding places: The invisible Web revealed," while the previously mentioned BrightPlanet is a major source.
Commercial online information services (generally speaking, services that charge a fee) make available databases that are not normally available to Internet users. The DIALOG online information service has more than 600 databases accessible via its Dialog Web and Dialog Classic services; OCLC, LEXIS-NEXIS and Emerald, among others, also provide access to commercial databases and services. While the search interface may take time to master, these services enable the user to get authoritative information quickly and to download it in a format that is appropriate for a particular application. The indexing is logical, the database structures are coherent and the information is verified. While many of the online information services are focused on research or business databases, there are some, such as SIRS, InfoTrac and the encyclopedia databases, that are aimed at public and school libraries. The major newspaper databases also have the school and public library audience in mind, though they may be targeting their services at researchers as well. Because commercial databases differ in terms of content and search interface, users will need some level of information awareness and search skills in order to use them.
The last words go to Karen Diaz (2000, p. 134): "The dramatic term 'invisible' is meant to underscore the importance of realizing there is more to the Web than even Metacrawler search engine might reveal. In reality, the resources discussed here are not invisible, but merely out of the mainstream of accepted search strategies." Further, she says, "The more we learn about the compartments on the Web and the variety of tools that exist to access these compartments, the less invisible the resources become, and the richer and more productive is our experience of using the Web for research."