If you want to scrape content from the web, don’t make it so obvious.

When you’re cURL-ing content with PHP, it’s not uncommon for site owners to get upset, because you’re either being clever and avoiding paying for something or you’re just flat-out stealing someone’s content.

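The easiest way for a site to spot and block a scraper is by checking the user agent, and that’s your biggest enemy. PHP’s defaults give you away: requests made through PHP’s stream functions such as file_get_contents() send the user_agent value from php.ini (the stock php.ini suggests ‘PHP’), and cURL sends no user agent at all unless you set one. Both are obvious and easy to block. If you’ve got visitors on your own domain identifying as PHP, someone is trying to steal your stuff.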

Fortunately there are several ways round this: you can modify your .htaccess file, set a PHP variable or set the agent using cURL. Below are examples of how to identify your requests as a Mozilla browser:

If PHP runs as an Apache module, add the following line to your .htaccess file and that should do the trick. The value contains spaces, so it needs to be wrapped in quotes:

php_value user_agent "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20071025 Firefox/"

PHP set
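You could also use ini_set to define the user agent. Just place this PHP line at the top of the script making the request. Bear in mind that this setting, like the .htaccess one, is read by PHP’s stream functions such as file_get_contents(); cURL ignores it, so for cURL use the option in the next section: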

ini_set('user_agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20071025 Firefox/');
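
To check that the setting has taken effect, you can read it back with ini_get. This is only a rough sketch: the user agent string and the example.com URL are placeholders I've made up for illustration, so swap in whatever you actually want to send.

```php
<?php
// Placeholder value for illustration; use the browser string you want to present.
$userAgent = 'Mozilla/5.0 (compatible; ExampleBrowser/1.0)';

// Set the user agent used by PHP's HTTP stream wrapper for the rest of this script.
ini_set('user_agent', $userAgent);

// ini_get() reads the value back, a quick way to confirm the change was accepted.
$activeUserAgent = ini_get('user_agent');
echo $activeUserAgent, PHP_EOL;

// Stream-based requests made after this point send that header.
// https://example.com/ is a stand-in URL, so the call is left commented out.
// $html = file_get_contents('https://example.com/');
```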

Set it in Curl
You can also set the parameter in the cURL script itself, meaning that only this request is identified as Mozilla. Just add the following line to the cURL script in your PHP (not forgetting to change the $curl variable to whatever you’re using):

curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20071025 Firefox/');
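
For context, here is roughly how that line might sit inside a complete request. It's a minimal sketch, not the only way to do it: the build_curl_options and fetch_page helpers, the timeout, the example.com URL and the user agent in the usage comment are all my own placeholders rather than anything required by cURL.

```php
<?php
// Hypothetical helper (not part of cURL): collects the request options in one place.
function build_curl_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent, // the header this post is about
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow redirects
        CURLOPT_TIMEOUT        => 10,         // assumed limit: give up after 10 seconds
    ];
}

// Hypothetical helper: fetches a page and returns the body, or null on failure.
function fetch_page(string $url, string $userAgent): ?string
{
    $curl = curl_init();
    curl_setopt_array($curl, build_curl_options($url, $userAgent));
    $body = curl_exec($curl);

    return $body === false ? null : $body;
}

// Usage (placeholder URL and user agent):
// $html = fetch_page('https://example.com/', 'Mozilla/5.0 (compatible; ExampleBrowser/1.0)');
```

Keeping the options in an array means the user agent is set in exactly one place, so it only affects requests made through this helper and nothing else in your PHP.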