Fetching web pages with file_get_contents in PHP

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 00:20

1. Retrieving the response headers when requesting an http URL with file_get_contents
file_get_contents('http://www.baidu.com/');
var_dump($http_response_header);
Output:
array(9) {
[0]=>
string(15) "HTTP/1.1 200 OK"
[1]=>
string(35) "Date: Sun, 13 Dec 2009 10:46:32 GMT"
[2]=>
string(15) "Server: BWS/1.0"
[3]=>
string(20) "Content-Length: 3520"
[4]=>
string(38) "Content-Type: text/html;charset=gb2312"
[5]=>
string(22) "Cache-Control: private"
[6]=>
string(38) "Expires: Sun, 13 Dec 2009 10:46:32 GMT"
[7]=>
string(121) "Set-Cookie: BAIDUID=0D6BDA29200E9DC5B9F4674B6884C9D1:FG=1; expires=Sun, 13-Dec-39 10:46:32 GMT; path=/; domain=.baidu.com"
[8]=>
string(39) "P3P: CP=" OTI DSP COR IVA OUR IND COM ""
}

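When file_get_contents requests an http URL, the stream wrapper stores the response headers in an array variable named $http_response_header in the current scope. The original post linked to further details at this point.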
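Because the array holds raw header lines, it is often convenient to split it into a status code and a name/value map. The following is only a minimal sketch: parse_response_headers is a hypothetical helper name that does not appear in the original post, and the sample array is a shortened copy of the output above, so the snippet runs without a network connection.

```php
<?php
// Hypothetical helper (not from the original post): turns an
// $http_response_header-style array into a status code and a header map.
function parse_response_headers(array $lines)
{
    $status  = null;
    $headers = array();
    foreach ($lines as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // A status line starts a new response (e.g. after a redirect),
            // so keep only the headers of the most recent one.
            $status  = (int) $m[1];
            $headers = array();
            continue;
        }
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            // Header names are case-insensitive; repeated headers such as
            // Set-Cookie overwrite each other in this simple sketch.
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return array('status' => $status, 'headers' => $headers);
}

// Shortened sample taken from the output shown above.
$sample = array(
    'HTTP/1.1 200 OK',
    'Server: BWS/1.0',
    'Content-Type: text/html;charset=gb2312',
);
$parsed = parse_response_headers($sample);
echo $parsed['status'], "\n";
echo $parsed['headers']['content-type'], "\n";
```

In a real script you would call parse_response_headers($http_response_header) right after file_get_contents, in the same scope as the call.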

2. Passing parameters (request headers and a timeout) to the requested URL
test.php
<?php
$opts = array(
'http'=>array(
    'timeout'=>10,
    'header'=>"User-Agent: php\r\n" .
              "Cookie: foo=bar\r\n"
)
);
$context = stream_context_create($opts);
echo file_get_contents('http://localhost/research/test1.php', false, $context);

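// echo file_get_contents('test1.php', false, $context); does not work here:
// a relative path is read from the local filesystem, so no HTTP request is made
// and the 'http' options are ignored. The first argument must be a URL.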

?>

test1.php
<?php
echo 'i am test1.php';
print("<pre>");
var_dump($_SERVER);
print("</pre>");
?>

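The output is shown below. The two headers set before the request (User-Agent and Cookie) appear in $_SERVER as HTTP_USER_AGENT and HTTP_COOKIE, which shows that the options passed through the context took effect.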
i am test1.php
array(26) {
["HTTP_HOST"]=>
string(9) "localhost"
["HTTP_USER_AGENT"]=>
string(3) "php"
["HTTP_COOKIE"]=>
string(7) "foo=bar"
["PATH"]=>
string(132) "C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program Files\Common Files\Thunder Network\KanKan\Codecs;E:\www\mysql\bin"
["SystemRoot"]=>
string(10) "C:\WINDOWS"
["COMSPEC"]=>
string(27) "C:\WINDOWS\system32\cmd.exe"
["PATHEXT"]=>
string(48) ".COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH"
["WINDIR"]=>
string(10) "C:\WINDOWS"
["SERVER_SIGNATURE"]=>
string(0) ""
["SERVER_SOFTWARE"]=>
string(31) "Apache/2.2.11 (Win32) PHP/5.2.8"
["SERVER_NAME"]=>
string(9) "localhost"
["SERVER_ADDR"]=>
string(9) "127.0.0.1"
["SERVER_PORT"]=>
string(2) "80"
["REMOTE_ADDR"]=>
string(9) "127.0.0.1"
["DOCUMENT_ROOT"]=>
string(8) "E:/myphp"
["SERVER_ADMIN"]=>
string(26) "admin@caihuafeng-PC.domain"
["SCRIPT_FILENAME"]=>
string(27) "E:/myphp/research/test1.php"
["REMOTE_PORT"]=>
string(4) "4617"
["GATEWAY_INTERFACE"]=>
string(7) "CGI/1.1"
["SERVER_PROTOCOL"]=>
string(8) "HTTP/1.0"
["REQUEST_METHOD"]=>
string(3) "GET"
["QUERY_STRING"]=>
string(0) ""
["REQUEST_URI"]=>
string(19) "/research/test1.php"
["SCRIPT_NAME"]=>
string(19) "/research/test1.php"
["PHP_SELF"]=>
string(19) "/research/test1.php"
["REQUEST_TIME"]=>
int(1260702914)
}
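The key names in the dump follow the CGI naming convention for request headers: the header name is upper-cased, hyphens become underscores, and HTTP_ is prepended. A small sketch of that mapping, where header_to_server_key is a hypothetical helper introduced only for illustration:

```php
<?php
// Hypothetical helper (not from the original post): maps a request header
// name to the $_SERVER key used for it under the CGI convention.
// Content-Type and Content-Length are exceptions and get no HTTP_ prefix,
// so they are not handled by this sketch.
function header_to_server_key($name)
{
    return 'HTTP_' . strtoupper(str_replace('-', '_', $name));
}

foreach (array('User-Agent', 'Cookie', 'Host') as $name) {
    echo $name, ' => ', header_to_server_key($name), "\n";
}
```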

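Timeout handling and POST requests with file_get_contents were covered in a post linked from the original at this point, so I will not go into them here.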
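For orientation, a POST request with a timeout can be expressed through the same 'http' context options used in test.php. This is only a sketch under assumptions: build_post_context is a hypothetical helper, the form fields are made up, and the request itself is left commented out so the snippet runs offline.

```php
<?php
// Hypothetical helper (not from the original post): builds a stream context
// for a form-encoded POST with a read timeout in seconds.
function build_post_context(array $fields, $timeoutSeconds = 10)
{
    // Explicit '&' separator so the result does not depend on php.ini.
    $body = http_build_query($fields, '', '&');
    return stream_context_create(array(
        'http' => array(
            'method'  => 'POST',
            'timeout' => $timeoutSeconds,
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => $body,
        ),
    ));
}

$context = build_post_context(array('foo' => 'bar', 'n' => 1), 5);

// Inspect the options instead of sending anything over the network.
$options = stream_context_get_options($context);
echo $options['http']['method'], ' ', $options['http']['content'], "\n";

// Sending the request (URL reused from test.php above):
// $result = @file_get_contents('http://localhost/research/test1.php', false, $context);
// if ($result === false) {
//     // The request failed, for example because of a timeout.
// }
```

file_get_contents returns false on failure, so comparing the result with === false is the usual way to detect a timeout or connection error.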

References:

http://iamcaihuafeng.blog.sohu.com/139517344.html
http://syre.blogbus.com/logs/41894450.html
http://a138s.blog.163.com/blog/static/31474077200911672043103/

0 0