WHAT IS A WEBSITE SCRAPER?






Introduction


A website scraper is a piece of software that visits your website and scrapes (collects) whatever data its operator wants, for whatever purpose they have in mind.



What data can be obtained through scraping ?


Websites present data in a very structured way. Because of this, any of your website data can be gathered, analysed and stored by scraper software, including, for example, all text, pictures, links, comments, prices, addresses and phone numbers.
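To illustrate how that structure helps a scraper, here is a minimal sketch using only Python's standard library. The sample page, the class name PageData and the function name extract are made up for illustration; a real scraper would be pointed at live pages and would usually look for more specific items.

```python
from html.parser import HTMLParser


class PageData(HTMLParser):
    """Collects links, image sources and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_data(self, data):
        # Keep only text that is not pure whitespace.
        if data.strip():
            self.text.append(data.strip())


# Hypothetical page content, invented for this example.
SAMPLE_HTML = """
<html><body>
  <h1>Widgets Ltd</h1>
  <p class="price">$9.99</p>
  <a href="/contact.html">Contact us</a>
  <img src="/logo.png" alt="logo">
</body></html>
"""


def extract(html):
    """Parse one page and return the collected data."""
    parser = PageData()
    parser.feed(html)
    parser.close()
    return parser


if __name__ == "__main__":
    page = extract(SAMPLE_HTML)
    print(page.links, page.images, page.text)
```

Because every link sits in an a tag and every picture in an img tag, a few lines of code are enough to sort the page into separate piles of data.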



How do they do this ?


It isn't science fiction, black magic or particularly difficult to scrape a website. Here's how it works: scraper software requests your web pages just as a browser does and downloads a text copy of the HTML (the markup language that your site is written in). This creates a local copy of each page on the scraper's computer in the form of a text file.

Any scripts that run on your server have already done their work by the time the HTML is sent, so the copy contains whatever they produced. Scripts that run in the browser are not executed by a simple download, but more capable scrapers load the page in an automated browser and run them too. In this way the scraper sees what a visitor sees.

And scrapers don't download just the index or home page. They download everything: page by page, link by link, as much as they can in order to harvest the data. As HTML files are so structured, it is relatively easy for the scraper software to then pick and choose what data gets further analysed and stored.
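The page-by-page, link-by-link download described above can be sketched roughly as follows. This is an illustrative outline and not any particular scraper's code: the names crawl, fetch and LinkParser and the max_pages limit are assumptions made for the example, and it uses only Python's standard library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def fetch(url):
    """Download the HTML of one page as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Follow same-site links breadth first; returns {url: html}."""
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in pages:
            continue
        html = fetch_page(url)
        pages[url] = html  # the local text copy of this page
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            link = urljoin(url, href).split("#")[0]  # absolute URL, no fragment
            if urlparse(link).netloc == site and link not in pages:
                queue.append(link)
    return pages
```

Calling crawl with a site's home page URL would return a dictionary holding the HTML text of each page it reached, ready for the picking and choosing step. The fetch_page parameter exists only so the sketch can be tried against canned pages without touching a real site.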



What information can be obtained ?


Scrapers can get any information your site publishes: pricing information for competitors, contact information for marketeers, personnel information for recruiters. And, of course, email addresses. Email addresses are one of the most commonly scraped items as they are easy to find (having a very regular format) and they have a value. See the HTML EMAIL FORMAT article for more information on how email addresses appear within a website's HTML code.
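To show why email addresses are such easy pickings, here is a small sketch that pulls address-like strings out of a page's HTML with one regular expression. The pattern is a deliberately simple approximation rather than a full email grammar, and the function name find_emails is made up for this example.

```python
import re
from html import unescape

# Simplified pattern: local part, "@", then a dotted domain name.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(html):
    """Return the unique email-like strings in an HTML source, in order found."""
    text = unescape(html)  # decodes entity tricks such as &#64; for "@"
    found = []
    for match in EMAIL_RE.findall(text):
        address = match.lower()
        if address not in found:
            found.append(address)
    return found


if __name__ == "__main__":
    sample = '<a href="mailto:Sales@example.com">Email us</a> <p>jobs&#64;example.com</p>'
    print(find_emails(sample))
```

Note that swapping the @ sign for an HTML entity does not help here, because decoding entities takes the scraper a single function call.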



How does scraping software work ?


Scraper software is written in powerful programming languages that can open a website URL and copy the data in an instant. There is very little you can do to prevent this. Common languages used are Perl, Python, C++ and Java, and all are available free. These languages come with many ready-made developer libraries that can open a site and search even large sites in fractions of a second. So even a high-school student can develop quite effective scrapers. And as scraping is not in itself illegal in many places, scraper software can also be bought from several well-known online retailers.



How does WebEmailProtector help stop your email address being scraped ?


Our service prevents scraping because your email address is no longer contained within the HTML code (or any other code such as JavaScript) on your site. We hold the address on our server (once you have registered it) and release it only once we are sure a bona fide visitor is accessing it.
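The general idea can be pictured with the rough sketch below. It is not the WebEmailProtector implementation: the key format, the data-contact-key attribute, the ADDRESS_STORE dictionary and the release_address function are all hypothetical, and how a visitor is verified is left out entirely and represented by a simple flag.

```python
import re

# Simplified email pattern of the kind a scraper might use.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Hypothetical server-side store: the real address never appears in the page.
ADDRESS_STORE = {"key-1234": "sales@example.com"}

# Hypothetical page markup: it carries only a key, not the address.
PAGE_HTML = '<a href="#" data-contact-key="key-1234">Email us</a>'


def release_address(key, visitor_verified):
    """Return the stored address only for a verified visitor, otherwise None."""
    if not visitor_verified:
        return None
    return ADDRESS_STORE.get(key)


if __name__ == "__main__":
    print(EMAIL_RE.findall(PAGE_HTML))          # nothing for a scraper to find
    print(release_address("key-1234", False))   # unverified request gets nothing
    print(release_address("key-1234", True))    # verified visitor gets the address
```

The point of the sketch is simply that a scraper searching the downloaded page has no address to match, because the address lives on the server and not in the markup.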



Enjoy !

Get an Email Address Encryption key for the WEBEMAILPROTECTOR service and secure your website email addresses on the GET-A-KEY page.


