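If you do not see it, you need to configure PHP and enable the library. On Windows this is simple: edit your php.ini file, find the php_curl.dll line, and remove the leading semicolon that comments it out, as shown below:
; Uncomment the line below
extension=php_curl.dll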
On Linux, you need to recompile PHP. When compiling, add the "--with-curl" option to the configure command.
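To confirm that the extension is actually loaded after changing the configuration, a quick check along these lines can help. This is only a minimal sketch; the helper name `curl_is_available` is made up for illustration.

```php
<?php
// Hypothetical helper: reports whether the cURL extension is usable.
function curl_is_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_available()) {
    $info = curl_version(); // array describing the linked libcurl
    echo 'cURL is enabled, libcurl ', $info['version'], PHP_EOL;
} else {
    echo 'cURL is not enabled; check php.ini or the configure flags', PHP_EOL;
}
```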
A small example
Once everything is ready, here is a short sample program:
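<?php
// Initialize a new cURL handle
$curl = curl_init();

// URL to fetch
curl_setopt($curl, CURLOPT_URL, 'http://phpfensi.com');

// Include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 1);

// Return the result as a string instead of printing it
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

// Run the request
$data = curl_exec($curl);

// Close the handle
curl_close($curl);

// Show what was fetched
var_dump($data);
?>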
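Because CURLOPT_HEADER is switched on, the string returned by curl_exec() contains the response headers followed by the body. One common way to separate them is to ask curl_getinfo() for CURLINFO_HEADER_SIZE. The sketch below assumes that approach; `split_response` and `fetch_with_headers` are hypothetical helper names, not part of the cURL extension.

```php
<?php
// Hypothetical helper: split a raw response (headers + body) at the given offset.
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Hypothetical helper: fetch a URL and return headers/body, or null on failure.
function fetch_with_headers(string $url): ?array
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HEADER, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    $raw = curl_exec($curl);
    if ($raw === false) {
        // curl_exec() returns false on failure; curl_error() explains why
        error_log('cURL error: ' . curl_error($curl));
        curl_close($curl);
        return null;
    }

    $headerSize = curl_getinfo($curl, CURLINFO_HEADER_SIZE);
    curl_close($curl);

    return split_response($raw, $headerSize);
}
```

A call such as `fetch_with_headers('http://phpfensi.com')` would then return an array with `headers` and `body` keys, or null if the request failed.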
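How to POST data
The code above fetches a web page. The code below POSTs data to a page. Suppose we have a form-handling URL, http://www.example.com/sendsms.php, that accepts two form fields: a phone number and a message body. The code is as follows: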
<?php
$phonenumber = '13912345678';
$message = 'this message was generated by curl and php';
// Build the URL-encoded form body
$curlpost = 'pnumber=' . urlencode($phonenumber) . '&message=' . urlencode($message) . '&submit=send';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/sendsms.php');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlpost);
// curl_exec() must be given the handle
$data = curl_exec($ch);
curl_close($ch);
?>
As the program above shows, CURLOPT_POST switches the request to the HTTP POST method instead of GET, and CURLOPT_POSTFIELDS supplies the POST data.
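Rather than concatenating and encoding each field by hand, the body can also be built with PHP's http_build_query(). The sketch below reuses the field names from the SMS example; `build_post_body` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: build an application/x-www-form-urlencoded body.
function build_post_body(array $fields): string
{
    // PHP_QUERY_RFC1738 encodes spaces as "+", matching urlencode().
    return http_build_query($fields, '', '&', PHP_QUERY_RFC1738);
}

$body = build_post_body([
    'pnumber' => '13912345678',
    'message' => 'this message was generated by curl and php',
    'submit'  => 'send',
]);
echo $body, PHP_EOL;
```

The resulting string can be passed to CURLOPT_POSTFIELDS as-is. Note that passing an array to CURLOPT_POSTFIELDS instead of a string makes cURL send the data as multipart/form-data, which may not be what the receiving form expects.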
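About proxy servers
Below is an example of using a proxy server. Note the three proxy-related options (CURLOPT_HTTPPROXYTUNNEL, CURLOPT_PROXY, and CURLOPT_PROXYUSERPWD). The code is simple enough that it needs little explanation: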
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.phpfensi.com');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Proxy settings: tunnel through the proxy, its address, and its credentials
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_PROXY, 'fakeproxy.com:1080');
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password');
// curl_exec() must be given the handle
$data = curl_exec($ch);
curl_close($ch);
?>
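If the same proxy settings are needed in several places, one option is to keep them in a single array and apply them with curl_setopt_array(). This is a sketch only; `proxy_options` is a hypothetical helper, and the proxy address and credentials are the placeholder values from the example above. No request is sent here.

```php
<?php
// Hypothetical helper: collect the proxy-related options in one array.
function proxy_options(string $proxy, string $credentials): array
{
    return [
        CURLOPT_HTTPPROXYTUNNEL => true,         // tunnel through the HTTP proxy
        CURLOPT_PROXY           => $proxy,       // "host:port"
        CURLOPT_PROXYUSERPWD    => $credentials, // "user:password"
    ];
}

$ch = curl_init('http://www.phpfensi.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// curl_setopt_array() returns false if any option could not be set.
$ok = curl_setopt_array($ch, proxy_options('fakeproxy.com:1080', 'user:password'));
var_dump($ok);
curl_close($ch);
```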
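About SSL and cookies
For SSL, that is, the HTTPS protocol, you only need to change http:// to https:// in the CURLOPT_URL value. There is also an option, CURLOPT_SSL_VERIFYHOST, which controls whether the host name in the server's certificate is checked against the site being requested.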
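A minimal sketch of an HTTPS request with verification switched on might look like the following. The helper names `https_options` and `https_get` are hypothetical. CURLOPT_SSL_VERIFYPEER is not mentioned in the text above; it is included here on the assumption that certificate-chain verification is wanted alongside the host-name check.

```php
<?php
// Hypothetical helper: options for an HTTPS request with verification enabled.
function https_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_SSL_VERIFYPEER => true, // verify the certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,    // check the certificate matches the host name
    ];
}

// Hypothetical helper: returns the response body, or false on failure.
function https_get(string $url)
{
    $ch = curl_init();
    curl_setopt_array($ch, https_options($url));
    $data = curl_exec($ch);
    if ($data === false) {
        error_log('HTTPS request failed: ' . curl_error($ch));
    }
    curl_close($ch);
    return $data;
}
```

Calling `https_get('https://www.example.com/')` would perform the request. A failed certificate check shows up as a false return value with the reason available from curl_error().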
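For cookies, you need to know the following three options:
CURLOPT_COOKIE: sets a cookie for the current session.
CURLOPT_COOKIEJAR: the file in which cookies are saved when the session ends.
CURLOPT_COOKIEFILE: the file from which cookies are read.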
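The three cookie options can be combined as sketched below. Both helpers (`cookie_string`, `cookie_options`) are hypothetical names, and URL-encoding the cookie values is an assumption made for this example rather than something cURL requires.

```php
<?php
// Hypothetical helper: format name/value pairs for CURLOPT_COOKIE.
function cookie_string(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . rawurlencode((string) $value);
    }
    return implode('; ', $pairs);
}

// Hypothetical helper: send explicit cookies and persist server cookies in a file.
function cookie_options(string $cookieFile, array $cookies): array
{
    return [
        CURLOPT_COOKIE     => cookie_string($cookies), // cookies for this request
        CURLOPT_COOKIEFILE => $cookieFile,             // read stored cookies from here
        CURLOPT_COOKIEJAR  => $cookieFile,             // write cookies here when the handle closes
    ];
}

echo cookie_string(['lang' => 'en', 'theme' => 'dark']), PHP_EOL;
```

The array returned by `cookie_options()` can be applied to a handle with curl_setopt_array() before calling curl_exec().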
HTTP server authentication
Finally, let's look at HTTP server authentication. The code is as follows:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.phpfensi.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Use HTTP Basic authentication
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
// Credentials in "username:password" form
curl_setopt($ch, CURLOPT_USERPWD, '[username]:[password]');
$data = curl_exec($ch);
curl_close($ch);
?>
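An example: fetching a 163 mailbox address list with cURL
Put plainly, cURL imitates what a browser does in order to fetch pages or submit forms, and this makes many interesting features possible. The code is as follows: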
<?php
error_reporting(0);

// Account credentials (test values)
$user = 'papatata_test';
$pass = '000000';

// Step 1: log in and store the session cookies
$url = 'http://reg.163.com/logins.jsp?type=1&url=http://entry.mail.163.com/coremail/fcg/ntesdoor2?lightweight%3d1%26verifycookie%3d1%26language%3d-1%26style%3d-1';
$ch = curl_init($url);

$cookie = tempnam('.', '~'); // temporary cookie file
$referer_login = 'http://mail.163.com';

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_REFERER, $referer_login);
$fields_post = array(
    'username'     => $user,
    'password'     => $pass,
    'verifycookie' => 1,
    'style'        => -1,
    'product'      => 'mail163',
    'seltype'      => -1,
    'secure'       => 'on'
);
// CURLOPT_HTTPHEADER expects a list of "Name: value" strings
$headers_login = array(
    'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/2008052906 Firefox/3.0',
    'Referer: http://www.163.com'
);
// Build the URL-encoded POST body
$fields_string = http_build_query($fields_post);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers_login);
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
$result = curl_exec($ch);
curl_close($ch);

// Step 2: request the mail entry page using the saved cookies
$url = 'http://entry.mail.163.com/coremail/fcg/ntesdoor2?lightweight=1&verifycookie=1&language=-1&style=-1&username=loki_wuxi';
$ch = curl_init($url);
$headers = array(
    'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/2008052906 Firefox/3.0'
);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
$result = curl_exec($ch);
curl_close($ch);

// Extract the session id (sid) from the response
preg_match('/sid=[^"].*/', $result, $location);
$sid = substr($location[0], 4, -1);

// Step 3: fetch the address list for this session
$url = 'http://g4a30.mail.163.com/jy3/address/addrlist.jsp?sid=' . $sid . '&gid=all';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers); // same User-Agent as in step 2
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
$result = curl_exec($ch);
curl_close($ch);

// Remove the temporary cookie file
unlink($cookie);

// "#" is used as the delimiter so that "</a>" needs no escaping
preg_match_all('#<td class="ibx_td_addrname"><a[^>]*>(.*?)</a></td><td class="ibx_td_addremail"><a[^>]*>(.*?)</a></td>#i', $result, $infos, PREG_SET_ORDER);

print_r($infos);
?>
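The final extraction step can be tried offline on a sample fragment. The sketch below assumes the markup uses the same class names as the pattern above. The sample HTML is made up for illustration and is not real 163.com output, and `extract_contacts` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: extract name/email pairs from address-list HTML.
function extract_contacts(string $html): array
{
    $pattern = '#<td class="ibx_td_addrname"><a[^>]*>(.*?)</a></td>'
             . '<td class="ibx_td_addremail"><a[^>]*>(.*?)</a></td>#i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $contacts = [];
    foreach ($matches as $m) {
        // $m[1] is the first capture group (name), $m[2] the second (email)
        $contacts[] = ['name' => $m[1], 'email' => $m[2]];
    }
    return $contacts;
}

// Made-up sample markup, not real 163.com output.
$sample = '<tr><td class="ibx_td_addrname"><a href="#">Alice</a></td>'
        . '<td class="ibx_td_addremail"><a href="#">alice@example.com</a></td></tr>';
print_r(extract_contacts($sample));
```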