Plucking Out Various Content En Masse from Web-sites
I’ve been doing some research recently about how to grab various content off multi-page web-sites, and I’ve found a tool which is perfect for the job!
Recently, I’ve been looking to create a business directory listing companies along with their name, contact number, postal address, and web-site. Setting up the software to house these listings is the easy bit, as there are numerous open-source business directories written in PHP that are more than adequate for the job. Actually collecting the data to fill the web-site, however, was the hard bit… until now!
Yell.com (similar web-sites are available!) lists hundreds of thousands of businesses and is a great resource for finding what you need. Getting hold of that data in bulk has always been tricky, but I have come across a piece of software called Web Content Extractor which does it all for you!
All you have to do is load a page of the Yell web-site in the software’s mini-browser and highlight the areas of the page you’d like the data collected from. Then you simply tell it which hyperlinks to follow, so that it repeats the routine on the subsequent pages. In this case, I just tell the software to follow hyperlinks that say “Next >”, and it goes off and grabs the listings for every single company in the category I have selected!
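For the curious, the “follow the Next > link” routine can be sketched in a few lines of Python. This is only a rough illustration of the general idea, not how Web Content Extractor works internally. The page markup (a `listing` block containing `name`, `phone`, `address` and `website` fields), the example.com URLs and the `crawl`/`fetch` names are all made up for the example, and the pages are served from an in-memory dictionary so it runs without touching a real site; in practice `fetch` might wrap `urllib.request.urlopen`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingPageParser(HTMLParser):
    """Collects listing fields and the "Next >" link from one page.

    Assumes HYPOTHETICAL markup where each company sits in an element
    with class="listing" and each field in an element whose class is
    one of FIELDS, e.g. <span class="name">.
    """

    FIELDS = ("name", "phone", "address", "website")

    def __init__(self):
        super().__init__()
        self.listings = []     # one dict per company found on the page
        self.next_href = None  # href of the "Next >" link, if present
        self._field = None     # field whose text is currently being read
        self._link = None      # href of the <a> element currently open
        self._link_text = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class")
        if cls == "listing":
            self.listings.append({})
        elif cls in self.FIELDS:
            self._field = cls
        if tag == "a":
            self._link = attrs.get("href")
            self._link_text = ""

    def handle_endtag(self, tag):
        # Tidy up whitespace once a field element closes.
        if self._field and self.listings:
            rec = self.listings[-1]
            rec[self._field] = rec.get(self._field, "").strip()
        self._field = None
        if tag == "a" and self._link is not None:
            if self._link_text.strip() == "Next >":
                self.next_href = self._link
            self._link = None

    def handle_data(self, data):
        if self._link is not None:
            self._link_text += data
        if self._field and self.listings:
            rec = self.listings[-1]
            rec[self._field] = rec.get(self._field, "") + data


def crawl(start_url, fetch, max_pages=100):
    """Follow "Next >" links from start_url, collecting every listing.

    `fetch` is any callable mapping a URL to its HTML text; max_pages and
    the `seen` set guard against endless pagination loops.
    """
    results, url, seen = [], start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = ListingPageParser()
        parser.feed(fetch(url))
        parser.close()
        results.extend(parser.listings)
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return results


# Two hypothetical result pages, the first linking to the second.
PAGES = {
    "https://example.com/plumbers?page=1": """
      <div class="listing"><span class="name">Acme Plumbing</span>
        <span class="phone">01234 567890</span></div>
      <a href="?page=2">Next &gt;</a>""",
    "https://example.com/plumbers?page=2": """
      <div class="listing"><span class="name">Best Pipes Ltd</span>
        <span class="phone">01234 111222</span></div>""",
}

if __name__ == "__main__":
    for row in crawl("https://example.com/plumbers?page=1", PAGES.__getitem__):
        print(row)
```

The crawl stops on its own when a page has no “Next >” link, which is exactly the behaviour you want at the end of a category.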
The data is then placed into a structured format, from which it can be exported in several ways, such as Excel, CSV or SQL. Now that I’ve got it in the format I want, I can easily insert it into my web-site’s database, so that my site is as well populated as Yell’s! Best of all, this process can be fully automated. Can’t be bad! I really recommend you check it out…
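The export-and-insert step is similarly simple in principle. Here’s a hedged Python sketch using the standard `csv` and `sqlite3` modules as stand-ins for whatever export format and database your directory software actually uses (a PHP directory would more likely sit on MySQL). The `companies` table, its columns and the function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical field names, matching the listing dicts from a scrape.
FIELDS = ("name", "phone", "address", "website")


def to_csv(rows):
    """Render listing dicts as CSV text, leaving missing fields blank."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def load_into_db(rows, conn):
    """Insert listings into a hypothetical `companies` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS companies "
        "(name TEXT, phone TEXT, address TEXT, website TEXT)"
    )
    # Parameterised placeholders keep quotes in scraped text from
    # breaking (or injecting into) the SQL statement.
    conn.executemany(
        "INSERT INTO companies VALUES (?, ?, ?, ?)",
        [tuple(row.get(f, "") for f in FIELDS) for row in rows],
    )
    conn.commit()


if __name__ == "__main__":
    rows = [{"name": "Acme Plumbing", "phone": "01234 567890"}]
    print(to_csv(rows))
    conn = sqlite3.connect(":memory:")
    load_into_db(rows, conn)
    print(conn.execute("SELECT COUNT(*) FROM companies").fetchone()[0])  # 1
```

Using `?` placeholders rather than pasting scraped text straight into the SQL string means a company called O’Brien’s won’t break the insert.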
Extract lots of data from multi-page websites using Web Content Extractor
about the author
This article was written by Sam Davis on September 25, 2006.
Computing over a glass of Grenache Shiraz... again! Sam is the Editor of Blasted Thing.