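For simple scraping in PHP we usually reach for something like file_get_contents, but a plain file_get_contents call does not carry a login session, so it is no help when the data sits behind a login. The cURL functions can handle this case: they let us simulate the login and then scrape the data.
One thing to note first: PHP's cURL extension is often not enabled by default, so you have to turn it on yourself. In php.ini, remove the leading ";" from the line ;extension=php_curl.dll and restart the web server.
Here is the program I wrote last night. It did not work in the end, but I still learned a few things from it. The code is as follows: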
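Before going further it can help to confirm that the extension is actually loaded. The following is a minimal sketch; the helper name curl_is_available() is made up for illustration. Note also that on PHP 7.2 and later the php.ini line is usually written as extension=curl rather than extension=php_curl.dll.

```php
<?php
// Hypothetical helper: report whether the cURL extension is usable here.
function curl_is_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_available()) {
    $info = curl_version(); // array describing the linked libcurl
    echo 'cURL enabled, libcurl version ' . $info['version'] . PHP_EOL;
} else {
    // php_ini_loaded_file() returns false when no php.ini was loaded
    $ini = php_ini_loaded_file() ?: 'php.ini';
    echo 'cURL is not enabled; check the extension line in ' . $ini . PHP_EOL;
}
```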
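- $login_url = "http://www.phpfensi.com/index.php?action=login";
- $post_file = "user=××&pw=××";
- $cookie_file = tempnam('./temp', 'cookie');
-
- // Initialize the handle with the login URL
- $ch = curl_init($login_url);
- // Do not include the response headers in the output
- curl_setopt($ch, CURLOPT_HEADER, 0);
- // Return the response as a string instead of printing it
- curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
- // Send the request as a POST
- curl_setopt($ch, CURLOPT_POST, 1);
- // The form fields to submit
- curl_setopt($ch, CURLOPT_POSTFIELDS, $post_file);
- // Save the cookies returned by the server to $cookie_file
- curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
- curl_exec($ch);
- curl_close($ch);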
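The login code above ignores the return value of curl_exec(), so a failed request goes unnoticed. A minimal sketch of the same steps with error checking might look like this; build_login_options() and run_login() are hypothetical helper names, not part of the original program, and treating any HTTP status of 400 or above as a failure is an assumption.

```php
<?php
// Hypothetical helper: collect the login options from the example in one array.
function build_login_options(string $postBody, string $cookieFile): array
{
    return [
        CURLOPT_HEADER         => false,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $postBody,
        CURLOPT_COOKIEJAR      => $cookieFile,
    ];
}

// Hypothetical helper: run the login request and throw on failure.
function run_login(string $loginUrl, string $postBody, string $cookieFile): string
{
    $ch = curl_init($loginUrl);
    curl_setopt_array($ch, build_login_options($postBody, $cookieFile));

    $body = curl_exec($ch);
    if ($body === false) {
        // Transport-level failure (DNS, connection, TLS, ...)
        $message = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('Login request failed: ' . $message);
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // libcurl writes the cookie jar when the handle is closed or freed
    curl_close($ch);

    if ($status >= 400) {
        throw new RuntimeException('Login returned HTTP ' . $status);
    }
    return $body;
}
```

With the variables from the example, the call would be run_login($login_url, $post_file, $cookie_file), wrapped in a try/catch block.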
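The code above completes the simulated login. The next step is to request a page that requires authorization. Remember that you are now logged in and that the login credential (the cookie) has been saved to a file, so the second request only has to send it back. The code is as follows: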
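- $url = "http://www.phpfensi.com/admin/××";
- $ch = curl_init($url);
- curl_setopt($ch, CURLOPT_HEADER, 0);
- // Return the page as a string so that it ends up in $contents
- curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
- // Send the cookies that were saved during login
- curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
- $contents = curl_exec($ch);
- curl_close($ch);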
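When the protected page still comes back as a login form, a useful first check is whether the login request stored any cookie at all. libcurl writes its jar in the tab-separated Netscape cookie file format, so the file can be inspected with a few lines of PHP. This is a rough sketch; read_cookie_jar() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: read a Netscape-format cookie jar into name => value pairs.
function read_cookie_jar(string $path): array
{
    $cookies = [];
    $lines = @file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return $cookies; // file missing or unreadable
    }
    foreach ($lines as $line) {
        $line = rtrim($line, "\r");
        // HttpOnly cookies carry a "#HttpOnly_" prefix; other "#" lines are comments.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue;
        }
        // Fields: domain, subdomain flag, path, secure, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}
```

Calling var_dump(read_cookie_jar($cookie_file)) right after the login request shows what was saved. An empty array suggests that the login itself failed or that the site sets its session cookie on a different request.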
Example 2. The code is as follows:
- <?php
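- // Directory in which the cookie jar file is created
- $cookie_path = './';
-
- // Form fields to submit
- $vars['username'] = '张三';
- $vars['pwd'] = '123';
-
- // Whether to send the request as a POST
- $method_post = true;
-
- // Login URL
- $url = 'http://****.com/login';
-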
- $ch = curl_init();
- $params[CURLOPT_URL] = $url;
- $params[CURLOPT_HEADER] = true;
- $params[CURLOPT_RETURNTRANSFER] = true;
- $params[CURLOPT_FOLLOWLOCATION] = true;
- $params[CURLOPT_USERAGENT] = 'Mozilla/5.0 (Windows NT 5.1; rv:9.0.1) Gecko/20100101 Firefox/9.0.1';
-
- // Build the URL-encoded POST body; http_build_query() avoids the trailing '&'
- $postfields = http_build_query($vars);
-
- if ($method_post) {
- $params[CURLOPT_POST] = true;
- $params[CURLOPT_POSTFIELDS] = $postfields;
- }
-
- // Reuse the cookie jar from an earlier request if the browser cookie names an existing file
- if (isset($_COOKIE['cookie_jar']) && is_file($_COOKIE['cookie_jar']))
- {
- $params[CURLOPT_COOKIEFILE] = $_COOKIE['cookie_jar'];
- }
- else
- {
- // Otherwise create a new jar and remember its path in a browser cookie
- $cookie_jar = tempnam($cookie_path, 'cookie');
- $params[CURLOPT_COOKIEJAR] = $cookie_jar;
- setcookie('cookie_jar', $cookie_jar);
- }
- curl_setopt_array($ch, $params);
- $content = curl_exec($ch);
-
-
- echo "\n";
- echo $content;
- /*
-
- echo "\n" . str_repeat('-', 80) . "\n";
- $nexturl = 'http://****.com/test';
- $params[CURLOPT_URL] = $nexturl;
- $params[CURLOPT_POSTFIELDS] = '';
- curl_setopt_array($ch, $params);
- $content = curl_exec($ch);
- echo $content;
-
- */
- curl_close($ch);
-
- ?>
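Because Example 2 sets CURLOPT_HEADER to true, $content holds the response headers followed by the page body. If only the body is wanted for scraping, the two parts can be separated using the header size that cURL reports. The following is a small sketch; split_response() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: split a raw response fetched with CURLOPT_HEADER enabled.
// $headerSize is the value of curl_getinfo($ch, CURLINFO_HEADER_SIZE).
function split_response(string $raw, int $headerSize): array
{
    return [
        // The casts guard against substr() returning false on older PHP versions
        'headers' => (string) substr($raw, 0, $headerSize),
        'body'    => (string) substr($raw, $headerSize),
    ];
}

// Usage inside Example 2, before curl_close($ch):
// $parts = split_response($content, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
// echo $parts['body'];
```

With CURLOPT_FOLLOWLOCATION enabled, the reported header size covers the headers of every response in the redirect chain, so the 'headers' part may contain several header blocks.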
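Note: if requests to an https site fail, the cause may be that the certificate or the host name cannot be verified. Adding the following two options before the curl_setopt_array call turns both checks off and lets the request go through. The code is as follows: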
$params[CURLOPT_SSL_VERIFYPEER] = false;
$params[CURLOPT_SSL_VERIFYHOST] = false;
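Another option, when the failure comes from PHP not finding a CA bundle, is to leave verification on and tell cURL where the bundle is. A minimal sketch follows; with_ca_bundle() is a hypothetical helper name and the path './cacert.pem' is only a placeholder for wherever a bundle file has been saved.

```php
<?php
// Hypothetical helper: keep certificate checks on and point cURL at a CA bundle file.
function with_ca_bundle(array $params, string $caBundlePath): array
{
    $params[CURLOPT_SSL_VERIFYPEER] = true;
    $params[CURLOPT_SSL_VERIFYHOST] = 2; // 2 = certificate must match the host name
    $params[CURLOPT_CAINFO] = $caBundlePath;
    return $params;
}

// Usage before curl_setopt_array($ch, $params); the path is an assumption:
// $params = with_ca_bundle($params, './cacert.pem');
```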