Plucking Out Various Content En Masse from Web-sites

I’ve been doing some research recently about how to grab various content off multi-page web-sites, and I’ve found a tool which is perfect for the job!

Just recently, I’ve been looking to create a business directory which would contain a list of companies, including their company name, contact number, postal address, and web-site. Setting up the software to house these listings is the easy bit, as there are numerous open-source business directories written in PHP which are more than adequate for the job. The actual collection of data to fill the web-site was the hard bit… until now!

Yell.com (similar web-sites are available!) lists hundreds of thousands of businesses on its site, and it is such a good resource for finding what you need. Getting hold of the data has always been tricky, though. However, I have come across a piece of software called Web Content Extractor, which does it all for you!

All you have to do is load the relevant page of the Yell web-site within the software’s mini-browser and highlight the areas of the page from which you’d like the data collected. Then you simply tell it to follow specific hyperlinks and repeat the routine on the subsequent pages. In this case, I merely tell the software to follow hyperlinks that say “Next >”, and it’ll go and grab the listings for every single company in the category that I have selected!
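For anyone curious what that routine looks like when scripted by hand, here is a rough Python sketch of the same idea: pull the highlighted fields out of a page, then follow the “Next >” link and repeat. It is not how Web Content Extractor works internally. The class names (`listing`, `name`, `phone` and so on), the `example.test` URLs and the sample pages are all made up for illustration, and the `fetch` function is passed in so the demo can run against in-memory pages rather than a live site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingParser(HTMLParser):
    """Collects listing fields and the href of the "Next >" link from one page."""

    FIELDS = {"name", "phone", "address", "website"}  # hypothetical class names

    def __init__(self):
        super().__init__()
        self.listings = []     # one dict per company found on the page
        self.next_href = None  # href of the "Next >" link, if present
        self._field = None     # field whose text is currently being read
        self._href = None      # href of the <a> tag currently open
        self._link_text = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "listing" in classes:
            self.listings.append({})
        elif tag == "span" and self.listings:
            self._field = next((c for c in classes if c in self.FIELDS), None)
        elif tag == "a":
            self._href = attrs.get("href")
            self._link_text = ""

    def handle_data(self, data):
        if self._field:
            self.listings[-1][self._field] = data.strip()
        if self._href is not None:
            self._link_text += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "a":
            if self._href and self._link_text.strip() == "Next >":
                self.next_href = self._href
            self._href = None


def crawl(start_url, fetch, max_pages=50):
    """Follow "Next >" links from start_url and return every listing found."""
    results, url, seen = [], start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = ListingParser()
        parser.feed(fetch(url))
        parser.close()
        results.extend(parser.listings)
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return results


# Made-up pages standing in for a real site, keyed by URL.
PAGES = {
    "http://example.test/plumbers/page1.html": (
        '<div class="listing"><span class="name">Acme Plumbing</span>'
        '<span class="phone">01632 960001</span></div>'
        '<div class="listing"><span class="name">Best Boilers</span>'
        '<span class="phone">01632 960002</span></div>'
        '<a href="page2.html">Next &gt;</a>'
    ),
    "http://example.test/plumbers/page2.html": (
        '<div class="listing"><span class="name">Clearflow Drains</span>'
        '<span class="phone">01632 960003</span></div>'
        '<a href="page1.html">&lt; Previous</a>'
    ),
}

listings = crawl("http://example.test/plumbers/page1.html", PAGES.__getitem__)
for row in listings:
    print(row)
```

To point this at a real site you would swap `PAGES.__getitem__` for a function that downloads a URL and returns its HTML as text, and adjust the class names to match the page’s actual markup. The `seen` set and `max_pages` limit are only there to stop the loop running forever if the links go round in circles.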

The data is then placed into a tidy format, from which it can be exported in different ways: Excel, CSV, SQL and so on. Now that I’ve got hold of it in the form I want, I can easily insert this data into my web-site database, so that my site is as well populated as Yell’s! Best of all, this process can be fully automated. Can’t be bad! I really recommend you check it out…
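As a rough illustration of that last step, here is a small Python sketch that takes a list of collected listings, writes them out as CSV, and inserts them into a database. Everything named here is an assumption made for the example: the `companies` table, its four columns, the sample rows, and the use of SQLite (a real directory script would more likely sit on MySQL, with its own schema).

```python
import csv
import io
import sqlite3

FIELDS = ["name", "phone", "address", "website"]  # hypothetical column names

# Made-up sample rows standing in for the extracted data.
SAMPLE = [
    {"name": "Acme Plumbing", "phone": "01632 960001",
     "address": "1 High Street, Leeds", "website": "http://acme.example.test"},
    {"name": "Best Boilers", "phone": "01632 960002"},  # some fields missing
]


def to_csv(listings):
    """Return the listings as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(listings)  # missing fields are written as empty strings
    return buf.getvalue()


def load_into_db(conn, listings):
    """Insert the listings into a (hypothetical) companies table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS companies "
        "(name TEXT, phone TEXT, address TEXT, website TEXT)"
    )
    conn.executemany(
        "INSERT INTO companies (name, phone, address, website) "
        "VALUES (:name, :phone, :address, :website)",
        [{f: row.get(f, "") for f in FIELDS} for row in listings],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, just for the demo
load_into_db(conn, SAMPLE)
print(to_csv(SAMPLE))
print(conn.execute("SELECT name, phone FROM companies").fetchall())
```

The insert uses named placeholders rather than building SQL strings by hand, so addresses containing commas or quotes go in safely.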

Extract lots of data from multi-page websites using Web Content Extractor

about the author

    This article was written by Sam Davis on September 25, 2006.
    Computing over a glass of Grenache Shiraz... again!
    Sam is the Editor of Blasted Thing.
