Information Professionals & Information Retrieval: The Web as an Information Retrieval System

4.05.2008

The Web as an Information Retrieval System

In traditional database systems, the retrieval of information depends in large part on the organization of the database used for the search and on the description of the data it contains. This structured environment makes retrieval much easier. However, the Web was not designed as a system for the retrieval of organized information. Instead, it evolved into a dynamic, unstructured, and largely uncontrolled archive of the world’s digital documents (Singhal, 2001). Although some have described the Web as a hypertext database (Shneiderman & Kearsley, 1989), others disagree, arguing that a true hypertext database has a conceptual model behind it that provides organization and consistency. This is hardly true of the Web, or even of individual documents on the Web (Baeza-Yates & Ribeiro-Neto, 1999).

If the goal of an information system is to retrieve all and only the relevant documents in a collection for a particular query, how does it work when the collection is all documents available on the Web? We cannot evaluate our success based upon recall and precision alone, because recall depends on the total number of relevant documents in the collection, and on the Web there is no way of knowing how many relevant documents are out there.
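To make the two measures concrete, here is a minimal sketch in Python. The document IDs and the small closed collection are hypothetical, invented only for illustration; the function names are mine, not taken from any particular library.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Fraction of all relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical closed collection in which every relevant document is known.
relevant_docs = {"d1", "d2", "d3", "d4"}
retrieved_docs = ["d1", "d2", "d7", "d8", "d9"]

p = precision(retrieved_docs, relevant_docs)  # 2 of the 5 retrieved are relevant
r = recall(retrieved_docs, relevant_docs)     # 2 of the 4 relevant were retrieved
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

In a closed collection both numbers can be computed. On the Web, precision can still be estimated by judging the results that come back, but the full set of relevant documents (`relevant_docs` here) cannot be listed, so the denominator for recall is unknown.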

Baeza-Yates & Ribeiro-Neto (1999) identified the following problems that also affect information retrieval on the Web:

Distributed data: documents spread over millions of different web servers.
Heterogeneous data: multiple media types (images, video), multiple languages, even different alphabets.
Volatile data: documents can change or disappear rapidly.
Unstructured and redundant data: no uniform structure, nearly 30% duplicate documents.
Quality of data: no editorial control, inaccurate information, poor quality writing, typos.
Large volume: billions of separate documents.

Information professionals have always needed to evaluate any information they retrieved. Now, in spite of the remarkable progress made in developing sophisticated search tools on the Web, the need to assess the reliability of retrieved information is more important than ever.

References

Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. New York: Association for Computing Machinery (ACM) Press.

Shneiderman, B., & Kearsley, G. (1989). Hypertext hands-on: An introduction to a new way of organizing and accessing information. Boston: Addison-Wesley.

Singhal, A. (2001). Modern information retrieval: A brief overview. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 24(4), 35-43. Retrieved March 8, 2008, from http://singhal.info/ieee2001.pdf

1 comment:

Ken said (4/13/2008 1:18 PM): In a related vein, it also has to be tagged and described! I'll pass on that task - I'm rooting for del.icio.us.

Introduction

Retrieving information is a key area of library and information science. Librarians and other information professionals have long been experts in knowing where to find the best information and how to access it. Our skills include understanding how information is stored in various systems, how it is searched and evaluated, and how people use it. Effective information search and retrieval is closely tied to the organization and description of the stored information.

Information professionals today have a vast array of resources at their command as they connect people with information. On this blog I will discuss some of those resources from the viewpoint of a professional looking for information, as well as highlight some interesting issues and trends.

About Me: Carol Winfield
