SEARCH ENGINES
For an Internet researcher the first great tool is the general,
Internet-wide search engine. The largest ones encompass huge amounts of data
(Google at last count indexed over four billion pages, but even that isn't the
entire Web or even, it is speculated, the majority of the Web). They're also
your ticket to finding more general search engines and useful information
collections.
So the first thing you must get to know as an Internet researcher is
the general search engine. There are two that I consider major and a bunch of
minor ones. They can be divided into two broad categories: full-text search
engines and searchable subject indexes.
Full-Text Engines
Full-text engines are those search engines that try to index
the entire content of a Web page. (They don't always do it because many search
engines limit how much of a page they'll index. Google, for example, will only
index the first 101K of a page no matter how large the page is.) That includes
the title, URL, and page content. Google and Teoma are examples of full-text
engines.
Searchable Subject Indexes
Searchable subject indexes make no attempt to index the content
of a site. Instead, the name and URL of a site—and usually some kind of brief
description—are included in a set of categories.
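To make the difference concrete, here is a minimal sketch in Python of the two kinds of data a search engine might keep. Everything in it is an assumption made for illustration: the sample pages, the example.com URLs, the category names, and the function names are all made up, and no real engine is claimed to work this way. The full-text index maps every word in a page's title, URL, and content to the pages that contain it, while the subject index holds only a name, a URL, and a brief description filed under a category.

```python
import re
from collections import defaultdict

# Hypothetical sample pages; the URLs and text are invented for illustration.
PAGES = {
    "http://example.com/nyc": {
        "title": "New York City Guide",
        "content": "Where to find the best bagels near Union Square",
    },
    "http://example.com/gw": {
        "title": "George Washington",
        "content": "Biography of the first president and his Mount Vernon farm",
    },
}

# Hypothetical subject index: categories holding name, URL, and description only.
SUBJECT_INDEX = {
    "Regional/New York": [
        {
            "name": "New York City Guide",
            "url": "http://example.com/nyc",
            "description": "Visitor guide to the city",
        },
    ],
    "History/Presidents": [
        {
            "name": "George Washington",
            "url": "http://example.com/gw",
            "description": "Life of the first U.S. president",
        },
    ],
}


def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_full_text_index(pages):
    """Map each word found in a page's URL, title, or content to its URLs."""
    index = defaultdict(set)
    for url, page in pages.items():
        for word in tokenize(url + " " + page["title"] + " " + page["content"]):
            index[word].add(url)
    return index


def search_subject_index(subject_index, word):
    """Return URLs whose category, name, or description contains the word."""
    word = word.lower()
    hits = []
    for category, entries in subject_index.items():
        for entry in entries:
            searchable = tokenize(
                category + " " + entry["name"] + " " + entry["description"]
            )
            if word in searchable:
                hits.append(entry["url"])
    return hits
```

In this toy setup a search for bagels finds the New York page through the full-text index but finds nothing in the subject index, because that word appears only in the page content and not in the category, name, or description.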
Mixing It Up
Now, here's the tricky part. Google, a full-text search engine,
has a searchable subject component called Google Directory. Yahoo, a searchable
subject index, has the option to search a full-text engine. (Yahoo's directory
results are from their searchable subject index, while their full-text search
matches are called Web results and come from their own full-text search engine.)
But primarily, Google is known as a full-text engine and Yahoo is known as a
searchable subject index.
Why Have Two Kinds?
Why have two kinds of search engines anyway? What is each one good for?
Full-text search engines are good when you're searching for very distinct types of information—for example, quotes, song lyrics, addresses, less-famous people, lesser-known places, or complicated queries. Searchable subject indexes do not contain enough information about Web pages to answer these kinds of queries.
On the other hand, the limitations of searchable subject indexes make them very useful for more general searching—when you're trying to find information on New York, for example. Or George Washington. Or other general topics. Sometimes going through a searchable subject index finds you enough material that you can then get more specific information from a full-text engine. The two types of search engines work harmoniously together—provided you know which one to use first.
What They All Have in Common: Search Defaults
Despite the fact that they're searching very different things, both types of search engines have one thing in common: their search default.
This is important, so pay attention.
When you enter a multiple-word query into a search engine and don't enter any search modifiers (like AND or NOT), the search engine has to decide how to treat your query. Broadly speaking, the search engine can do one of two things. It can decide that all of your search words must be included in any result, in which case it's defaulting to AND. Or it can decide that a document needs to contain only one of your search words to appear in the results, in which case it's defaulting to OR.
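If it helps to see the two defaults side by side, here is a small Python sketch. The three sample documents and the search function are hypothetical, made up purely to illustrate the idea; real engines are far more involved than this.

```python
# Hypothetical mini-collection of documents, invented for illustration.
DOCUMENTS = {
    "doc1": "elderberry jam recipe",
    "doc2": "chiropractic clinic hours",
    "doc3": "elderberry chiropractic newsletter",
}


def search(documents, query, default="AND"):
    """Return the IDs of documents matching the query under the given default.

    default="AND": every query word must appear in the document.
    default="OR":  at least one query word must appear in the document.
    """
    if default not in ("AND", "OR"):
        raise ValueError("default must be 'AND' or 'OR'")
    terms = query.lower().split()
    if not terms:
        return []
    combine = all if default == "AND" else any
    matches = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        if combine(term in words for term in terms):
            matches.append(doc_id)
    return matches
```

With the query elderberry chiropractic, the AND default matches only doc3, the one document containing both words, while the OR default matches all three.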
The most important thing to know about a search engine is whether it's a full-text engine or a searchable subject index. The second most important thing to know is whether the search engine defaults to AND or OR. If it defaults to AND, you should be more thoughtful about your query words, because every query word you choose must appear in a Web page for that page to show up in your results. If it defaults to OR, you should be sure to put + signs in front of terms that must be included in your search. You can also try searching for phrases more often.
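As a tiny illustration of the + sign habit, here is a hypothetical Python helper that marks the must-have words in a query. The function name is my own invention, and whether a particular engine honors the + sign is something you would need to check for that engine.

```python
def mark_required(query, required):
    """Put a + sign in front of each query word listed as required."""
    required_words = {word.lower() for word in required}
    marked = []
    for term in query.split():
        if term.lower() in required_words:
            marked.append("+" + term)
        else:
            marked.append(term)
    return " ".join(marked)
```

For example, calling mark_required("elderberry jam recipe", ["elderberry"]) gives the query +elderberry jam recipe, which tells an OR-defaulting engine that elderberry has to be present.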
How can you tell if something defaults to AND or OR? Do a search with a very odd set of words—say elderberry chiropractic snowblower brick. If you get no results (or just a few results), you're searching an engine that defaults to AND. If you get lots of results, you've found an engine that defaults to OR.
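The odd-words test can be written down as a rough rule of thumb. In the Python sketch below, count_results stands in for whatever way you have of getting a result count from an engine; it is a hypothetical callable, not a real API, and the cutoff of 10 for "just a few results" is an assumption of mine rather than a fixed rule.

```python
PROBE_QUERY = "elderberry chiropractic snowblower brick"


def guess_default(count_results, probe=PROBE_QUERY, few=10):
    """Guess whether an engine defaults to AND or OR.

    count_results: hypothetical callable that takes a query string and
                   returns the number of results the engine reports.
    few:           assumed cutoff for "no results or just a few".
    """
    if count_results(probe) <= few:
        return "AND"
    return "OR"


# Stand-in engines for demonstration; real counts would come from a real search.
def strict_engine(query):
    return 0


def loose_engine(query):
    return 250000
```

Here guess_default(strict_engine) returns "AND" and guess_default(loose_engine) returns "OR". Treat the answer as a hint, since a very large index could hold a handful of pages that really do contain all four words.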