Using PhantomJS to simulate JavaScript execution

Source: Internet  Editor: 程序博客网  Date: 2024/06/04 00:42

Introduction

  • PhantomJS (phantomjs.org) is a headless WebKit scriptable with JavaScript. The latest stable release is version 2.1.
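    – from https://github.com/ariya/phantomjs
  • PhantomJS is a headless, scriptable WebKit browser engine. It natively supports a range of web standards: DOM manipulation, CSS selectors, JSON, Canvas, and SVG.
  • Q: What if a page is rendered by JavaScript? A: PhantomJS executes the page's scripts, so the rendered content can still be scraped.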

Installation

The instructions below come from the official documentation: http://phantomjs.org/download.html. As an aside, on Ubuntu you can install phantomjs directly:

sudo apt-get install phantomjs

Windows

Download phantomjs-2.1.1-windows.zip (17.4 MB) and extract (unzip) the content.

The executable phantomjs.exe is ready to use.

Note: For this static build, the binary is self-contained with no external dependency. It will run on a fresh install of Windows Vista or later versions. There is no requirement to install Qt, WebKit, or any other libraries.

Mac OS X

Download phantomjs-2.1.1-macosx.zip (16.4 MB) and extract (unzip) the content.

Note: For this static build, the binary is self-contained with no external dependency. It will run on a fresh install of OS X 10.7 (Lion) or later versions. There is no requirement to install Qt or any other libraries.

Linux 64-bit

Download phantomjs-2.1.1-linux-x86_64.tar.bz2 (22.3 MB) and extract the content.

Note: For this static build, the binary is self-contained. There is no requirement to install Qt, WebKit, or any other libraries. It however still relies on Fontconfig (the package fontconfig or libfontconfig, depending on the distribution). The system must have GLIBCXX_3.4.9 and GLIBC_2.7.

Linux 32-bit

Download phantomjs-2.1.1-linux-i686.tar.bz2 (23.0 MB) and extract the content.

Note: For this static build, the binary is self-contained. There is no requirement to install Qt, WebKit, or any other libraries. It however still relies on Fontconfig (the package fontconfig or libfontconfig, depending on the distribution). The system must have GLIBCXX_3.4.9 and GLIBC_2.7.

FreeBSD

Binary packages are available via pkg:

$ sudo pkg install phantomjs

Source Code

Check the official git repository github.com/ariya/phantomjs.

Related methods

helloworld

Create a file named hello.js with the following content:

console.log('Hello, world!');
phantom.exit(); // required at the end; without it the process never terminates

Run it with:

phantomjs hello.js

The output is:

Hello, world!

Page Loading

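A web page can be loaded, analyzed, and rendered by creating a webpage object.

The following script shows the simplest use of a page object. It loads example.com and saves it as an image, example.png, in the directory the script is run from.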

var page = require('webpage').create();
page.open('http://example.com', function(status) {
  console.log("Status: " + status);
  if(status === "success") {
    page.render('example.png');
  }
  phantom.exit();
});

Because of its rendering capability, PhantomJS can be used to capture web pages, essentially taking a screenshot of their content.
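As a small extension of the capture script, the sketch below derives the image file name from the URL and sets a viewport before rendering. `screenshotName` is a hypothetical helper introduced here for illustration, not part of PhantomJS; `page.viewportSize` and `page.render` come from the webpage module. The `typeof phantom` guard is an assumption of this sketch so that the helper can also be loaded outside PhantomJS.

```javascript
// Hypothetical helper: turn a URL into a file-system-friendly PNG name.
function screenshotName(url) {
  var name = url.replace(/^[a-z]+:\/\//i, '')   // drop the scheme
                .replace(/[^a-z0-9]+/gi, '_')   // replace unsafe character runs
                .replace(/^_+|_+$/g, '');       // trim leading/trailing underscores
  return (name || 'page') + '.png';
}

// Only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var target = 'http://example.com';
  page.viewportSize = { width: 1024, height: 768 };
  page.open(target, function (status) {
    if (status === 'success') {
      page.render(screenshotName(target));
    }
    phantom.exit();
  });
}
```

The viewport size is an arbitrary choice; adjust it to the layout you want to capture.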

The following loadspeed.js script loads the specified URL (do not forget the http:// protocol prefix) and measures how long the load takes.

var page = require('webpage').create(),
  system = require('system'),
  t, address;

if (system.args.length === 1) {
  console.log('Usage: loadspeed.js <some URL>');
  phantom.exit();
}

t = Date.now();
address = system.args[1];
page.open(address, function(status) {
  if (status !== 'success') {
    console.log('FAIL to load the address');
  } else {
    t = Date.now() - t;
    console.log('Loading ' + system.args[1]);
    console.log('Loading time ' + t + ' msec');
  }
  phantom.exit();
});

Run the script with the following command:

phantomjs loadspeed.js http://www.google.com

It prints something similar to:

Loading http://www.google.com
Loading time 719 msec
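Since a missing protocol prefix is an easy mistake to make, here is a minimal sketch of a variant that adds one when the argument lacks it. `ensureScheme` is a hypothetical helper, and defaulting to `http://` is an assumption of this sketch rather than PhantomJS behavior. The `typeof phantom` guard only lets the helper be loaded outside PhantomJS as well.

```javascript
// Hypothetical helper: prepend http:// when the argument has no scheme.
function ensureScheme(address) {
  return /^[a-z][a-z0-9+.-]*:\/\//i.test(address) ? address : 'http://' + address;
}

// Only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  if (system.args.length < 2) {
    console.log('Usage: loadspeed.js <some URL>');
    phantom.exit(1);
  } else {
    var page = require('webpage').create();
    var address = ensureScheme(system.args[1]);
    var start = Date.now();
    page.open(address, function (status) {
      if (status !== 'success') {
        console.log('FAIL to load ' + address);
      } else {
        console.log('Loading time ' + (Date.now() - start) + ' msec');
      }
      phantom.exit();
    });
  }
}
```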

Code Evaluation

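To evaluate JavaScript code in the context of the web page, use the evaluate() function. Execution is sandboxed: the code cannot access any JavaScript objects or variables outside its own page context. An object can be returned from evaluate(), but it is limited to simple objects and cannot contain functions or closures.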
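One rough way to picture the "simple objects only" restriction is a JSON round trip: plain data survives, functions do not. The sketch below is an analogy that runs in any JavaScript engine; it is not PhantomJS's actual serialization code, and `simulateEvaluateResult` is a hypothetical name.

```javascript
// Approximates what survives the trip out of evaluate(): plain data only.
function simulateEvaluateResult(value) {
  return JSON.parse(JSON.stringify(value));
}

var result = simulateEvaluateResult({
  title: 'Example Domain',
  links: ['http://example.com/a', 'http://example.com/b'],
  onClick: function () { return 'lost'; }   // functions are dropped
});

console.log(Object.keys(result).join(','));   // title,links
console.log(typeof result.onClick);           // undefined
```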

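Here is an example that prints the title of a web page:

// url must hold the address to open before this runs
var page = require('webpage').create();
page.open(url, function(status) {
  var title = page.evaluate(function() {
    return document.title;
  });
  console.log('Page title is ' + title);
  phantom.exit();
});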

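Console messages from the web page, including those from code inside evaluate(), are not displayed by default. To override this behavior, use the onConsoleMessage callback. The previous example can be rewritten as: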

var page = require('webpage').create();
page.onConsoleMessage = function(msg) {
  console.log('Page title is ' + msg);
};
page.open(url, function(status) {
  page.evaluate(function() {
    console.log(document.title);
  });
  phantom.exit();
});

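Because the script runs inside a real web browser engine, standard DOM scripting and CSS selectors work as expected. This makes PhantomJS suitable for all kinds of page automation tasks.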
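As an illustration of DOM scripting through evaluate(), the sketch below collects the text of every element matching a CSS selector. The selector is passed as an extra argument to `page.evaluate`, because the evaluated function cannot see variables from the PhantomJS script. `collectText` and its optional `doc` parameter are hypothetical; `doc` exists only so the function can be tried against a stub outside a browser.

```javascript
// Runs in the page context. Input arrives as an argument, not via closure.
// `doc` is optional and falls back to the page's global `document`.
function collectText(selector, doc) {
  var nodes = (doc || document).querySelectorAll(selector);
  var out = [];
  for (var i = 0; i < nodes.length; i++) {
    out.push(nodes[i].textContent.trim());
  }
  return out;   // an array of strings is a simple, returnable value
}

// Only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.open('http://example.com', function (status) {
    if (status === 'success') {
      var headings = page.evaluate(collectText, 'h1');
      console.log(headings.join('\n'));
    }
    phantom.exit();
  });
}
```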

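Monitoring network requests and responses

When a page requests resources from a remote server, both the request and the response can be tracked through the onResourceRequested and onResourceReceived callbacks. This is demonstrated in the example netlog.js: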

// url must hold the address to open before this runs
var page = require('webpage').create();
page.onResourceRequested = function(request) {
  console.log('Request ' + JSON.stringify(request, undefined, 4));
};
page.onResourceReceived = function(response) {
  console.log('Receive ' + JSON.stringify(response, undefined, 4));
};
page.open(url);
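Dumping every object as JSON gets noisy quickly. A minimal sketch of a quieter log follows, assuming the response objects carry `stage`, `status`, and `url` fields as described in the webpage module documentation. `summarizeResponse` is a hypothetical helper, and reporting only the `'end'` stage is a choice made for this sketch.

```javascript
// Hypothetical formatter: one line per finished response, null otherwise.
function summarizeResponse(response) {
  if (response.stage !== 'end') {
    return null;   // skip the earlier 'start' notification for the same resource
  }
  return response.status + ' ' + response.url;
}

// Only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.onResourceReceived = function (response) {
    var line = summarizeResponse(response);
    if (line) {
      console.log(line);
    }
  };
  page.open('http://example.com', function () {
    phantom.exit();
  });
}
```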

Simulating request headers

var page = require('webpage').create();
console.log('The default user agent is ' + page.settings.userAgent);
page.settings.userAgent = 'SpecialAgent';
page.open('http://www.httpuseragent.org', function(status) {
  if (status !== 'success') {
    console.log('Unable to access network');
  } else {
    var ua = page.evaluate(function() {
      return document.getElementById('myagent').textContent;
    });
    console.log(ua);
  }
  phantom.exit();
});

Output of a run:

The default user agent is Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1
null
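The user agent is only one header. For other headers, the webpage module exposes `page.customHeaders`, which is sent with every request the page makes. The sketch below is a minimal illustration under that assumption; `buildHeaders`, the default `Accept-Language` value, and the `Referer` value are all hypothetical choices made for this example.

```javascript
// Hypothetical helper: merge extra headers over a small set of defaults.
function buildHeaders(extra) {
  var headers = { 'Accept-Language': 'en-US,en;q=0.8' };
  for (var name in extra) {
    if (Object.prototype.hasOwnProperty.call(extra, name)) {
      headers[name] = extra[name];
    }
  }
  return headers;
}

// Only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.settings.userAgent = 'SpecialAgent';
  page.customHeaders = buildHeaders({ 'Referer': 'http://example.com/' });
  page.open('http://example.com', function (status) {
    console.log('Status: ' + status);
    phantom.exit();
  });
}
```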

More

http://phantomjs.org/quick-start.html Quick start guide
http://phantomjs.org/examples/ Official examples
http://phantomjs.org/page-automation.html Other methods of the webpage module
https://github.com/ariya/phantomjs PhantomJS source code, hosted on GitHub