Teach Time Encyclopedia - Learn About Our World

Monday, October 13, 2008

Grub distributed web-crawling project

Grub is a search engine, pioneered by LookSmart, that is built on distributed computing. Users can download the grubclient software and let it run while their computers are idle. The client indexes URLs and sends the results back to the main Grub server in highly compressed form. The collective cache can then be searched on the Grub website. Grub can build a large cache quickly by asking thousands of clients each to cache a small portion of the web.
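The division of labour described above, where many clients each take a small slice of the web and upload compressed results, can be sketched as follows. This is a minimal illustration under stated assumptions: URLs are assigned to clients by hashing, and results are serialized as JSON and compressed with zlib. The function names (`assign_client`, `partition`, `pack_results`, `unpack_results`) are hypothetical, and the sketch does not describe Grub's actual client protocol.

```python
import hashlib
import json
import zlib


def assign_client(url: str, num_clients: int) -> int:
    """Map a URL to one of num_clients workers by hashing it.

    Hypothetical scheme: the same URL always lands on the same client,
    and URLs spread roughly evenly across clients.
    """
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_clients


def partition(urls, num_clients):
    """Split a list of URLs into one work batch per client."""
    batches = {i: [] for i in range(num_clients)}
    for url in urls:
        batches[assign_client(url, num_clients)].append(url)
    return batches


def pack_results(results: dict) -> bytes:
    """Client side: serialize {url: page_text} and compress it for upload."""
    payload = json.dumps(results, sort_keys=True).encode("utf-8")
    return zlib.compress(payload, 9)


def unpack_results(blob: bytes) -> dict:
    """Server side: decompress an upload back into {url: page_text}."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    urls = [f"http://example.com/page{i}" for i in range(10)]
    for client, batch in partition(urls, 3).items():
        print(client, batch)
    blob = pack_results({"http://example.com/": "hello " * 100})
    print(len(blob), "compressed bytes")
```

Because the hash is deterministic, each URL maps to exactly one client for a fixed number of clients, so no two clients are handed the same page in one round. A real system would also have to reassign work from clients that go offline mid-task.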

Although many admire Grub's novel distributed computing system, the search engine has its share of critics. Many argue that the strength of a good search engine lies not in the size of its cache but in its ability to deliver accurate, precise results. Loyal Google users say they prefer that search engine for its targeted results and would not switch to Grub unless its search technology proved superior to Google's.

Quite a few webmasters also oppose Grub because it apparently ignores sites' robots.txt files, which tell robots not to crawl certain areas of a site. Grub's developers say that Grub caches robots.txt itself, so recent changes to the file may not be detected. Webmasters counter that Grub also fails to honor long-standing robots.txt files that block all crawlers. According to Wikipedia's own webmasters, the /w/ directory, which holds the scripts for page editing and similar functions and is blocked to robots by robots.txt, has been cached by Grub and by no other search engine. Wikipedia's webmasters also complain that Grub's distributed architecture overloads servers by keeping a large number of TCP connections open, with effects essentially the same as those of a typical distributed denial-of-service attack.
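The staleness problem described above, where a crawler keeps obeying an old copy of robots.txt, is commonly handled by re-fetching the file after a fixed interval (the later robots.txt standard, RFC 9309, recommends not relying on a cached copy for more than 24 hours). The sketch below assumes a hypothetical `CachedRobots` class with a one-day default expiry and an injected `fetch` function, so it runs without network access; the rules themselves are evaluated with Python's standard `urllib.robotparser`. It is an illustration only, not a description of how Grub handled robots.txt.

```python
import time
from urllib.robotparser import RobotFileParser


class CachedRobots:
    """Hypothetical per-host robots.txt cache that expires after max_age seconds."""

    def __init__(self, fetch, max_age=24 * 3600, clock=time.time):
        self._fetch = fetch      # assumed callable: host -> robots.txt text
        self._max_age = max_age  # seconds before a cached copy is considered stale
        self._clock = clock      # injectable clock, so expiry can be tested
        self._cache = {}         # host -> (fetched_at, RobotFileParser)

    def _parser_for(self, host):
        now = self._clock()
        entry = self._cache.get(host)
        # Re-fetch if we have never seen this host or the copy is too old.
        if entry is None or now - entry[0] > self._max_age:
            parser = RobotFileParser()
            parser.parse(self._fetch(host).splitlines())
            entry = (now, parser)
            self._cache[host] = entry
        return entry[1]

    def allowed(self, agent, host, url):
        """Return True if `agent` may crawl `url` under host's current rules."""
        return self._parser_for(host).can_fetch(agent, url)


if __name__ == "__main__":
    rules = {"en.wikipedia.org": "User-agent: *\nDisallow: /w/\n"}
    robots = CachedRobots(lambda host: rules[host])
    print(robots.allowed("example-bot", "en.wikipedia.org",
                         "http://en.wikipedia.org/w/index.php?title=X&action=edit"))
    print(robots.allowed("example-bot", "en.wikipedia.org",
                         "http://en.wikipedia.org/wiki/Main_Page"))
```

With `Disallow: /w/` in force, an edit URL under `/w/` is refused while ordinary `/wiki/` pages stay crawlable; a change to robots.txt takes effect only once the cached copy expires, which is exactly the window webmasters complained about.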

References

Two posts by Brion Vibber, one of Wikipedia's developers, to the Wikitech-L mailing list.

Brought to you by NoChildLeftBehind.com and the Beaches and Towns Network, LLC.