SEARCH ENGINES

For an Internet researcher, the first great tool is the general, Internet-wide search engine. The largest ones encompass huge amounts of data (at last count Google indexed over four billion pages, but even that isn't the entire Web or even, it is speculated, the majority of it). They're also your ticket to finding more general search engines and useful information collections. So the first thing you must know as an Internet researcher is the general search engine. There are two that I consider major, and a bunch of minor ones. They can be divided into two broad categories: full-text search engines and searchable subject indexes.

Full-Text Engines

Full-text engines try to index the entire content of a Web page, including the title, URL, and page text. (They don't always succeed, because many search engines limit how much of a page they'll index. Google, for example, indexes only the first 101K of a page, no matter how large the page is.) Google and Teoma are examples of full-text engines.

Searchable Subject Indexes

Searchable subject indexes make no attempt to index the content of a site. Instead, each site's name and URL, usually with some kind of brief description, are filed in a set of categories.

Mixing It Up

Now here's the tricky part. Google, a full-text search engine, has a searchable subject component called Google Directory. Yahoo, a searchable subject index, also lets you search a full-text engine. (Yahoo's directory results come from its searchable subject index, while its full-text matches, called Web results, come from its own full-text search engine.) Primarily, though, Google is known as a full-text engine and Yahoo as a searchable subject index.

Why Have Two Kinds?

Why have two kinds of search engines anyway? What is each one good for?
Full-text search engines are good when you're searching for very specific kinds of information: quotes, song lyrics, addresses, less-famous people, lesser-known places, or complicated queries. Searchable subject indexes don't hold enough information about Web pages to answer these kinds of queries. On the other hand, the limitations of searchable subject indexes make them very useful for more general searching, when you're trying to find information on New York, for example, or George Washington, or other general topics. Sometimes browsing a searchable subject index turns up enough material that you can then get more specific information from a full-text engine. The two types of search engines work well together, provided you know which one to use first.

What They All Have in Common: Search Defaults

Although they search very different things, both types of search engines have one thing in common: a search default. This is important, so pay attention. When you enter a multiple-word query without any search modifiers (like AND or NOT), the search engine has to decide how to treat it. Broadly speaking, it can do one of two things. It can require that all of your search words appear in every result; in this case it's defaulting to AND. Or it can return any document in which at least one of your search words appears; in that case it's defaulting to OR.
The most important thing to know about a search engine is whether it's a full-text engine or a searchable subject index. The second most important thing is whether it defaults to AND or OR. If it defaults to AND, be more thoughtful about your query words, because every word you choose must appear in a Web page before that page shows up in your results.
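To make the two defaults concrete, here is a minimal Python sketch of a toy full-text engine. It assumes an in-memory inverted index (a map from each word to the set of pages containing it) and a naive whitespace tokenizer; the page URLs and text are made up, and real engines add ranking, stemming, and much more. Under these assumptions an AND default is simply the intersection of each word's page set, and an OR default is their union.

```python
from collections import defaultdict

# Hypothetical pages standing in for a full-text engine's crawl.
PAGES = {
    "washington.example": "george washington crossed the delaware",
    "nyc.example": "new york city travel guide",
    "lyrics.example": "song lyrics about new york in the snow",
}


def build_index(pages):
    """Inverted index: map each word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():  # naive whitespace tokenizer (assumption)
            index[word].add(url)
    return index


def search(index, query, default="AND"):
    """Evaluate a query with no modifiers using the engine's default operator."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    if not postings:
        return set()
    if default == "AND":
        return set.intersection(*postings)  # every query word must appear
    if default == "OR":
        return set.union(*postings)  # any one query word is enough
    raise ValueError(f"unknown default: {default!r}")


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "new york snow", default="AND")))  # ['lyrics.example']
print(sorted(search(INDEX, "new york snow", default="OR")))   # ['lyrics.example', 'nyc.example']
```

The same three words narrow the results under AND and widen them under OR, which is why the choice of query words matters so much more on an AND engine.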
If it defaults to OR, be sure to put a + sign in front of any term that must be included in your results. You can also rely more on phrase searches.
How can you tell whether an engine defaults to AND or OR? Search for a very odd set of words, say elderberry chiropractic snowblower brick. If you get no results (or just a few), you're searching an engine that defaults to AND. If you get lots of results, you've found an engine that defaults to OR.
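The + prefix and the odd-word test can be sketched the same way. This hypothetical Python example assumes a simplified syntax in which a leading + marks a required term. Under an OR default, every + term must appear and the remaining plain terms do not narrow the results (a real engine would use them for ranking, which this sketch omits); with no + terms, any plain term matches. The `guess_default` helper and its cutoff of one hit are illustrative assumptions, not a standard API, and each made-up page mentions exactly one of the odd test words.

```python
from collections import defaultdict
from functools import partial

# Hypothetical pages: each one mentions exactly one of the odd test words.
PAGES = {
    "a.example": "elderberry jam recipe",
    "b.example": "find a chiropractic clinic near you",
    "c.example": "snowblower repair tips",
    "d.example": "brick oven pizza recipe",
}


def build_index(pages):
    """Inverted index: map each word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query, default="OR"):
    """Run a query in which a leading '+' marks a required term (simplified syntax)."""
    if default not in ("AND", "OR"):
        raise ValueError(f"unknown default: {default!r}")
    required, plain = [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        else:
            plain.append(token)

    def hits(words):
        return [index.get(word, set()) for word in words]

    if default == "AND":
        # Every term is already required, so '+' changes nothing.
        terms = hits(required + plain)
        return set.intersection(*terms) if terms else set()
    # OR default: all '+' terms must appear; plain terms would only affect
    # ranking in a real engine, which this sketch does not model.
    if required:
        return set.intersection(*hits(required))
    terms = hits(plain)
    return set.union(*terms) if terms else set()


def guess_default(run_query, odd_query="elderberry chiropractic snowblower brick", few=1):
    """Odd-word test: no (or few) hits suggests AND; lots of hits suggests OR.
    The cutoff `few` is an arbitrary choice for this toy index."""
    return "AND" if len(run_query(odd_query)) <= few else "OR"


INDEX = build_index(PAGES)
and_engine = partial(search, INDEX, default="AND")
or_engine = partial(search, INDEX, default="OR")

print(guess_default(and_engine))               # AND
print(guess_default(or_engine))                # OR
print(sorted(or_engine("elderberry pizza")))   # ['a.example', 'd.example']
print(sorted(or_engine("+elderberry pizza")))  # ['a.example']
```

Because no page contains all four odd words, the AND engine returns nothing, while the OR engine returns every page that mentions any of them; adding + to a term is what pulls an OR engine's results back toward AND-like precision.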