Basic Web Crawler

This would be a web crawler that gathers information in an automated fashion and enters it into a database for other uses. While I have a few ideas for how this information could be used, I do not yet want to describe those plans fully.
Information gathered would include the total number of displayed characters, the dimensions of displayed images, the number of images, and the links contained on the page, including whether each link stays within the site or points to an external destination.
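As a minimal sketch of gathering these per-page figures, the following Python uses only the standard library's `html.parser`. Several choices are assumptions rather than part of the plan above: "displayed characters" is approximated as non-whitespace text outside `<script>` and `<style>`, image dimensions come only from the declared `width`/`height` attributes (the images themselves are not downloaded), and a link counts as internal when its host matches the page's host exactly. The class name `PageStatsParser` is hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageStatsParser(HTMLParser):
    """Hypothetical helper: collects rough statistics from one HTML page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.site_host = urlparse(page_url).netloc.lower()
        self.display_chars = 0   # approximate: non-whitespace visible text
        self.images = []         # (width, height) strings as declared, or None
        self.internal_links = []
        self.external_links = []
        self._skip_depth = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            # Assumption: trust declared attributes; missing ones become None.
            self.images.append((attrs.get("width"), attrs.get("height")))
        elif tag == "a" and attrs.get("href"):
            absolute = urljoin(self.page_url, attrs["href"])
            parsed = urlparse(absolute)
            if parsed.scheme not in ("http", "https"):
                return  # ignore mailto:, javascript:, and similar links
            if parsed.netloc.lower() == self.site_host:
                self.internal_links.append(absolute)
            else:
                self.external_links.append(absolute)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.display_chars += sum(1 for ch in data if not ch.isspace())


# Example usage on an inline page (example.com / example.org are placeholders).
html = """<html><head><style>p {color: red}</style></head>
<body><p>Hello world</p>
<img src="a.gif" width="100" height="50"><img src="b.gif">
<a href="/about.html">About</a>
<a href="http://example.org/">Elsewhere</a>
<a href="mailto:me@example.com">Mail</a>
</body></html>"""

parser = PageStatsParser("http://example.com/index.html")
parser.feed(html)
parser.close()
print(parser.display_chars, len(parser.images))
print(parser.internal_links, parser.external_links)
```

Relative links are resolved against the page URL before classification, so `/about.html` is recorded as an internal absolute URL.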
I envision that the process would be recursive and controllable in three ways: how many links deep the crawler goes, whether it may leave the initial target site, and whether it gathers information in concentric rings (breadth-first) or follows each branch of the tree to its end before moving to the next (depth-first).
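The crawl controls described above could be sketched roughly as follows, assuming a hypothetical `get_links(url)` callback that returns the absolute URLs found on a page (a real crawler would fetch and parse the page at that point). A first-in, first-out queue produces the concentric-ring (breadth-first) order, and a last-in, first-out stack produces the branch-by-branch (depth-first) order. The names `crawl`, `max_depth`, `stay_on_site`, and `strategy` are illustrative, not from an existing library.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, get_links, max_depth=2, stay_on_site=True, strategy="bfs"):
    """Return URLs in visit order, going at most max_depth links deep.

    get_links(url) is a hypothetical callback returning absolute URLs on a page.
    strategy: "bfs" visits in concentric rings, "dfs" follows each branch first.
    """
    if strategy not in ("bfs", "dfs"):
        raise ValueError("strategy must be 'bfs' or 'dfs'")
    start_host = urlparse(start_url).netloc.lower()
    frontier = deque([(start_url, 0)])
    seen = set()
    visited = []
    while frontier:
        # popleft() treats the deque as a queue (BFS); pop() as a stack (DFS).
        url, depth = frontier.popleft() if strategy == "bfs" else frontier.pop()
        if url in seen:
            continue  # already visited along another path
        seen.add(url)
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: record the page, don't expand it
        links = get_links(url)
        if strategy == "dfs":
            links = list(reversed(links))  # so the first link is popped first
        for link in links:
            if link in seen:
                continue
            if stay_on_site and urlparse(link).netloc.lower() != start_host:
                continue  # skip links that leave the initial target site
            frontier.append((link, depth + 1))
    return visited


# Example: an in-memory "site" stands in for real fetching (placeholder URLs).
site = {
    "http://a.test/": ["http://a.test/1", "http://a.test/2", "http://other.test/"],
    "http://a.test/1": ["http://a.test/1a"],
    "http://a.test/2": ["http://a.test/2a"],
}


def get_links(url):
    return site.get(url, [])


print(crawl("http://a.test/", get_links, strategy="bfs"))
print(crawl("http://a.test/", get_links, strategy="dfs"))
```

Because pages are marked as seen globally, a page first reached along a deep branch is not revisited from a shallower one, so depth-first mode with a depth limit can miss some pages that breadth-first mode would reach.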

About this Entry

This page contains a single entry by tmichael published on June 20, 2001 4:21 PM.
