Web Search Engines

Module W12b


Contents

About this document...
Audience and Objectives


 
MOST OF THE WEB'S WIDE WORLD IS OFF THE MAP
A study by the NEC Research Institute has found that the vast majority of the 600 million Web pages open to the public are not indexed by search engines. The report -- which can be found at http://www.wwwmetrics.com/ -- says the top search engines in terms of coverage are Northern Light (covering 16% of the Web), Snap (15.5%), AltaVista (15.5%), and Hotbot (11.3%). Other search engine coverage figures are: Microsoft (8.5%), Infoseek (8%), Google (7.8%), Yahoo (7.4%), Excite (5.6%), Lycos (2.5%), and Euroseek (2.2%). The search engines seem to be biased toward sites that receive the most traffic, and use a site's popularity to decide whether it should be indexed. The study also says that 83% of the Web now contains commercial content, 6% offers information of scientific or educational value, and 1.5% is focused on pornography. (abstracted from The New York Times 8 Jul 99 http://www.nytimes.com/library/tech/99/07/circuits/articles/08geek.html in NewsScan Daily, 8 July 1999.)
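To get a feel for what these percentages mean, you can convert them to approximate page counts. The short Python sketch below does that arithmetic using only the figures quoted in the abstract. It assumes each percentage applies to the 600 million page total; the function name estimated_pages is just an illustration.

```python
# Coverage figures as quoted in the abstract above (percent of the Web indexed).
TOTAL_PAGES = 600_000_000  # total quoted in the abstract

COVERAGE_PERCENT = {
    "Northern Light": 16.0,
    "Snap": 15.5,
    "AltaVista": 15.5,
    "HotBot": 11.3,
    "Microsoft": 8.5,
    "Infoseek": 8.0,
    "Google": 7.8,
    "Yahoo": 7.4,
    "Excite": 5.6,
    "Lycos": 2.5,
    "Euroseek": 2.2,
}

def estimated_pages(percent, total=TOTAL_PAGES):
    """Return the approximate number of pages a coverage percentage represents."""
    return round(total * percent / 100)

for engine, percent in COVERAGE_PERCENT.items():
    print(f"{engine:15s} {percent:5.1f}%  ~{estimated_pages(percent):,} pages")
```

Note that the individual figures should not simply be added together, since different engines may index many of the same pages.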

Individual Search Engines

Listed in alphabetical order. For background on searching, see module W12c.


Alta Vista
http://www.altavista.com/
Details:
http://www.altavista.com/av/content/help.htm
Searches:
Web, Usenet
Features:
Browsing by Category; 
Keyword search: must, must not, phrase, boolean, refine, organize results. You can restrict a search by language, date range, and location within a Web page.

Excite
http://www.excite.com/
Details:
http://www.excite.com/Info/searching.html?a-tip-t
Searches:
Web, Usenet, news feed
Features:
Browsing by Category; 
Keyword search: must, must not, phrase, boolean, concept-based, view by Web site, refine, editorial selection

FedWorld

 http://www.fedworld.gov/
Details:
Coming soon!
Searches:
U.S. Government Web sites
Features:
Coming soon!

Google!

http://google.stanford.edu/
Details:
General info + tips at http://google.stanford.edu/about.html
Searches:
Web
Features:
Keyword search: must

GovBot

 http://eden.cs.umass.edu/Govbot/
Details:
Coming soon!
Searches:
U.S. Government Web sites
Features:
Coming soon!

HotBot
http://www.hotbot.com/
Details:
http://help.hotbot.com/
Searches:
Web, Usenet
Features:
Browsing by Category; 
Keyword search: The Web, Usenet, news site, classifieds, domain names, stocks, discussion groups, shareware, businesses, people, email addresses. You can restrict a search by language.

InfoSeek
http://www.infoseek.com/
Details:
http://www.infoseek.com/Help?pg=HomeHelp.html
Searches:
Web, stocks, news feed, people, and UPS tracking. 
Features:
Browsing by Category; 
Keyword search: must, must not, phrase

Lycos
http://www.lycos.com/
Details:
http://www.lycos.com/help/
Searches:
Web, Lycos Pictures & Sounds, Lycos database of TOP 5% reviews, personal homepages, UPS tracking, books, stocks
Features:
Browsing by Category; 
Keyword search: must, must not, phrase, natural language query, boolean, organize results

PlanetSearch
http://www.planetsearch.com/
Details:
http://www.planetsearch.com/?a=9&h=32
Searches:
Web
Features:
Browsing by Category - mainly entertainment; 
Keyword search: must, must not, simple boolean, color-coded display of results

WebCrawler
http://www.webcrawler.com/
Details:
http://www.webcrawler.com/Help/Help.html
Searches:
Web
Features:
Browsing by Category; 
Keyword search: organize results, boolean, subject-oriented edited list

Yahoo!
http://www.yahoo.com/
Details:
http://search.yahoo.com/search/help?
Searches:
Web, Usenet, Yellow Pages, People (including email), Maps, Classifieds, Personals, News, Sports, Weather, Stock Quotes
Features:
Browsing by Category; 
Keyword search: must, must not, document section restrictions, all, any, phrase, wildcard, person's name, limit by date
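Many of the entries above list "must", "must not", and "phrase" among their keyword features. The Python sketch below illustrates how such a filter might behave. It assumes one common notation: +word for a term that must appear, -word for a term that must not appear, and "quoted words" for a phrase. Each engine documents its own syntax on the Details page listed for it, so treat the notation and the function names (parse_query, contains, matches) as illustrative only.

```python
import re

def parse_query(query):
    """Split a query into required, excluded, and optional terms or phrases."""
    must, must_not, optional = [], [], []
    # Each item is an optional +/- sign followed by a quoted phrase or a word.
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]*)"|(\S+))', query):
        text = (phrase or word).lower()
        if not text:
            continue
        target = must if sign == "+" else must_not if sign == "-" else optional
        target.append(text)
    return must, must_not, optional

def contains(document, text):
    """True if the word or phrase occurs in the document on word boundaries."""
    return re.search(r"\b" + re.escape(text) + r"\b", document.lower()) is not None

def matches(document, query):
    """Apply the must / must not / optional rules to one document."""
    must, must_not, optional = parse_query(query)
    if any(not contains(document, t) for t in must):
        return False
    if any(contains(document, t) for t in must_not):
        return False
    # With no required terms, at least one optional term has to appear.
    return bool(must) or not optional or any(contains(document, t) for t in optional)

pages = [
    "A guide to Web search engines and indexes",
    "Search for cheap flights",
]
for page in pages:
    print(matches(page, '+"search engines" -flights'), "-", page)
```

Real engines also rank the matching pages; this sketch only decides whether a page qualifies at all.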

Multiple Search Sites

These sites provide parallel searches using several engines at once. When the information comes back, it is analyzed, organized,  and presented in various ways.
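The general idea can be sketched in a few lines of Python. This is only an illustration: engine_a and engine_b are hypothetical stand-ins that return fixed results, where a real multiple search site would send the query to each engine over the network. The ranking rule used here (pages found by more engines come first) is likewise an assumption, not the method of any site listed below.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real engines: each takes a query and returns
# a ranked list of (url, title) pairs.
def engine_a(query):
    return [("http://example.com/a", "A page"),
            ("http://example.com/shared", "Shared page")]

def engine_b(query):
    return [("http://example.com/shared", "Shared page"),
            ("http://example.com/b", "B page")]

ENGINES = {"EngineA": engine_a, "EngineB": engine_b}

def metasearch(query, engines=ENGINES):
    """Query every engine in parallel, then merge and order the results."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        results = {name: fut.result() for name, fut in futures.items()}

    merged = {}  # url -> title, engines that found it, best rank seen
    for name, hits in results.items():
        for rank, (url, title) in enumerate(hits, start=1):
            entry = merged.setdefault(
                url, {"title": title, "engines": [], "best_rank": rank})
            entry["engines"].append(name)
            entry["best_rank"] = min(entry["best_rank"], rank)

    # Pages found by more engines come first; ties go to the better rank.
    return sorted(merged.items(),
                  key=lambda item: (-len(item[1]["engines"]), item[1]["best_rank"]))

for url, info in metasearch("search engines"):
    print(url, info["engines"])
```

Duplicate hits are merged by URL, which is one simple way to "organize" results returned by several engines.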
 
Dogpile
http://www.dogpile.com/
Details:
http://www.dogpile.com/notes.html
Uses:
WWW: Yahoo, Lycos' A2Z, Go2.com, Excite Guide Search, WWW Yellow Pages, Thunderstone, What U Seek, Lycos, PlanetSearch, Magellan, WebCrawler, InfoSeek, AltaVista, Excite & HotBot. 
Usenet: Reference.com, Dejanews, HotBot News, Altavista and Dejanews' old Database. 
FTP: Filez and FTP Search. 
News Wires: Yahoo News Headlines, Excite News, Infoseek News Wires. 
Features:
must, must not, boolean as supported by the engines used

Inference Find
http://www.infind.com/
Details:
Search strategy: http://www.infind.com/about.html
Syntax: http://www.infind.com/boolean.html
Uses:
WebCrawler, Yahoo, Lycos, Alta Vista, InfoSeek, and Excite
Features:
Searches Web only. Provides clustering of results by type. Boolean and other search syntax features are not supported by all engines. 

MetaCrawler
http://www.metacrawler.com/
Details:
http://www.metacrawler.com/customize.html
Uses:
AltaVista,  Excite,  Infoseek,  Lycos,  Webcrawler,  Yahoo! 
Features:
Customization for search domain, timeout, interface, results format.  Search the Web, usenet, FTP, computer products. 

Metafind
http://www.metafind.com/
Details:
http://www.metafind.com/syntax.html
Uses:
AltaVista, Excite, Infoseek, Planetsearch and Webcrawler. 
Features:
You can use AND, OR, NEAR, NOT, ( ), and "" in your search (no need to capitalize). AND is the default connector. Organizes the results. 
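To see what "AND is the default connector" and "no need to capitalize" mean in practice, here is a small Python sketch that rewrites a query into an explicit form. It is not Metafind's own code; the function name normalize_query and the exact rules are assumptions made for illustration. Operators are recognized in any letter case, quoted phrases are kept whole, and AND is inserted wherever two terms sit side by side with no operator between them.

```python
import re

OPERATORS = {"AND", "OR", "NEAR", "NOT"}

# A token is a quoted phrase, a parenthesis, or a run of other characters.
TOKEN_RE = re.compile(r'"[^"]*"|[()]|[^\s()"]+')

def normalize_query(query):
    """Upper-case operators and insert the default AND between adjacent terms."""
    tokens = []
    for raw in TOKEN_RE.findall(query):
        token = raw.upper() if raw.upper() in OPERATORS else raw
        if tokens:
            prev = tokens[-1]
            # The previous token ends a term if it is a word, phrase, or ")".
            prev_ends_term = prev == ")" or (prev not in OPERATORS and prev != "(")
            # The current token starts a term if it is a word, phrase, or "(".
            starts_term = token == "(" or (token not in OPERATORS and token != ")")
            if prev_ends_term and starts_term:
                tokens.append("AND")
        tokens.append(token)
    return " ".join(tokens)

print(normalize_query("web search"))
print(normalize_query('"search engine" or (index near crawler) not spam'))
```

The first call shows the default connector being filled in; the second shows lower-case operators being recognized while the quoted phrase is left untouched.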

Desktop Search Organizer(s)

Mata Hari

Not a search engine: this is a search organizer 

http://www.theWebtools.com/

Details:
Mata Hari is a program you run on your own computer to organize searches using various search engines.
Searches:
Web and local files
Features:
Simple and boolean searches; also allows multiple questions and lets you do other things while it queries the search engines. You can customize the organization of results, and easily save the "hit list" it returns.

Audience:

This is for people familiar with the World Wide Web who want to make better use of search engines for research.

Objectives:

This module is a reference guide to help you choose a search engine, and to appreciate the differences between them.

About this document...

Module W12b: Web Search Engines

This document is part of a modular instruction series in Computer Information Systems.
Closely related: For more information, see the overview or the list of modules in this series, W- (World Wide Web).
This document has been used in the following classes: CIS 100, CIS 101, CIS 160
Author:
Laurence J. Krieg
Institution:
Department of Computer Information Systems, Washtenaw Community College
History:
Original: 16 Nov 1997
This version posted: Monday, 31-Aug-2009 11:48:00 EDT
Copyright:
Copyright © 1999, Laurence J. Krieg.

Instructors: You may point to this file in your Web-based materials.
Students: You may make a copy for your personal use.
All other uses: contact the author, Laurence J. Krieg for permission.