
Plucking Out Various Content En Masse from Web-sites


I’ve been doing some research recently about how to grab various content off multi-page web-sites, and I’ve found a tool which is perfect for the job!

Just recently, I’ve been looking to create a business directory which would contain a list of companies, including each company’s name, contact number, postal address, and web-site. Setting up the software to house these listings is the easy bit, as there are numerous open-source business directories written in PHP which are more than adequate for the job. The actual collection of data to fill the web-site was the hard bit… until now!

Yell.com (similar web-sites are available!) lists hundreds of thousands of businesses, and it is a really good resource for finding what you need. Getting hold of the data has always been tricky, but I have come across a piece of software called Web Content Extractor, which does it all for you!

All you have to do is load the relevant page of the Yell web-site within the software’s mini-browser, and highlight the areas of the page you’d like the data collected from. Then you simply tell it which hyperlinks to follow, and it repeats the routine on each subsequent page. In this case, I merely tell the software to follow hyperlinks that say “Next >”, and then it’ll go and grab the listings for every single company in the category that I have selected!
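For anyone curious what that routine looks like in code, here is a minimal Python sketch of the same idea: pull the marked-up fields out of a page, look for a link whose text is “Next >”, and repeat. This is not how Web Content Extractor works internally (I don’t know that), and the markup is entirely made up: the `listing`, `name` and `phone` class names, the `example.test` URLs and the in-memory `PAGES` dictionary are all assumptions for illustration. A real site’s HTML will differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical CSS class names; a real site will use its own markup.
FIELDS = ("name", "phone", "address", "website")


class ListingParser(HTMLParser):
    """Collects listing fields and the href of the 'Next >' link on one page."""

    def __init__(self):
        super().__init__()
        self.listings = []
        self.next_href = None
        self._field = None      # field currently being read, if any
        self._link_href = None  # href of the <a> currently open, if any
        self._link_text = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class") or ""
        if tag == "div" and css_class == "listing":
            self.listings.append({})
        elif tag == "span" and css_class in FIELDS and self.listings:
            self._field = css_class
            self.listings[-1][css_class] = ""
        elif tag == "a":
            self._link_href = attrs.get("href")
            self._link_text = ""

    def handle_data(self, data):
        if self._field is not None:
            self.listings[-1][self._field] += data
        if self._link_href is not None:
            self._link_text += data

    def handle_endtag(self, tag):
        if tag == "span" and self._field is not None:
            value = self.listings[-1][self._field]
            self.listings[-1][self._field] = value.strip()
            self._field = None
        elif tag == "a":
            if self._link_href and self._link_text.strip() == "Next >":
                self.next_href = self._link_href
            self._link_href = None


def crawl(start_url, fetch, max_pages=50):
    """Follow 'Next >' links from start_url and return every listing found.

    fetch is any callable that takes a URL and returns that page's HTML.
    """
    results, url, seen = [], start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = ListingParser()
        parser.feed(fetch(url))
        parser.close()
        results.extend(parser.listings)
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return results


# Two made-up pages standing in for a multi-page directory.
PAGES = {
    "http://example.test/plumbers/page1.html": """
        <div class="listing"><span class="name">Acme Plumbing</span>
          <span class="phone">01234 567890</span></div>
        <div class="listing"><span class="name">Bob's Boilers</span>
          <span class="phone">01234 098765</span></div>
        <a href="page2.html">Next &gt;</a>
    """,
    "http://example.test/plumbers/page2.html": """
        <div class="listing"><span class="name">Cara's Heating</span>
          <span class="phone">01234 555000</span></div>
        <a href="page1.html">&lt; Previous</a>
    """,
}

listings = crawl("http://example.test/plumbers/page1.html", PAGES.__getitem__)
```

Passing `fetch` in as a parameter keeps the sketch runnable without touching the network. Against a live site you would swap in something that actually downloads the page, for example a small wrapper around `urllib.request.urlopen`.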

The data is then placed into a tidy format, from which it can be exported in different ways: Excel, CSV, SQL and so on. Now that I’ve got hold of it in the form I want, I can easily insert this data into my web-site database, so that my site is as populated as Yell’s is! Best of all, this process can be fully automated. Can’t be bad! I really recommend you check it out…
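As a rough idea of what the export-and-import step amounts to, here is a small Python sketch that writes a handful of listings out as CSV and loads the same rows into a database. It is only an illustration under assumptions: the `companies` table, its four columns and the sample records are invented, and SQLite stands in for whatever database your directory software actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical column layout for the directory.
COLUMNS = ("name", "phone", "address", "website")


def to_csv(listings):
    """Return the listings as CSV text, with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(listings)  # missing fields are written as empty strings
    return buffer.getvalue()


def load_into_db(conn, listings):
    """Insert the listings into a 'companies' table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS companies "
        "(name TEXT, phone TEXT, address TEXT, website TEXT)"
    )
    rows = [tuple(item.get(col, "") for col in COLUMNS) for item in listings]
    # Placeholders (?) let the driver handle quoting of names like "Bob's".
    conn.executemany("INSERT INTO companies VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Made-up sample records.
sample = [
    {
        "name": "Acme Plumbing",
        "phone": "01234 567890",
        "address": "1 High Street, Exampletown",
        "website": "http://acme.example",
    },
    {"name": "Bob's Boilers", "phone": "01234 098765"},
]

csv_text = to_csv(sample)
conn = sqlite3.connect(":memory:")
inserted = load_into_db(conn, sample)
```

The CSV writer takes care of quoting the address that contains a comma, and the parameterised insert does the same for the apostrophe in the company name, which are the two things most likely to trip up a hand-rolled import.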

Extract lots of data from multi-page websites using Web Content Extractor


about the author

    This article was written by Sam Davis on September 25, 2006.
    Computing over a glass of Grenache Shiraz... again!
    Sam is the Editor of Blasted Thing.

