web crawler
A spider is a program that browses (crawls) web sites, extracting information for search engine databases. Spiders can be summoned to a site through search engine registration, or they will eventually find your site by following links from other sites (assuming other sites link to yours).
spider tips
- Spiders do not read pages as browsers do. They generally cannot execute JavaScript, so links generated by scripting, and sometimes links inside frames, may not be followed.
- While spiders explore your site by following hyperlinks, they typically go only so many levels deep, so a visiting spider may not index your entire site. You may need to register multiple pages with a search engine.
- If you have sensitive information you do not want indexed by a search engine, you can use meta tags or a special instruction file (robots.txt) to block spiders from certain pages (see the examples after this list).
- Your web server logs can tell you when a spider visited and which pages it requested (a sample log search follows the examples below).
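For example, a robots.txt file placed at the root of your site tells well-behaved spiders which paths to skip; the directory names below are illustrative:

    User-agent: *
    Disallow: /private/
    Disallow: /drafts/

A single page can also opt out on its own by including a robots meta tag in its head section:

    <meta name="robots" content="noindex, nofollow">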
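To spot spider visits in your logs, you can search the access log for a crawler's user-agent string. This is a minimal sketch assuming an Apache-style access log at a typical path; adjust the path and the spider name for your server:

    grep -i "googlebot" /var/log/apache2/access.log

In the common combined log format, each matching line records the time of the visit, the page requested, and the spider's user-agent string.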