Node.js Study Notes 003: A Simple Web Crawler with superagent and cheerio

Source: Internet | Published by: 独立游戏开发者知乎 | Editor: 程序博客网 | Date: 2024/05/16 14:26

superagent issues HTTP requests such as GET, POST, and DELETE.
cheerio parses the response body; its selector and traversal API is almost identical to jQuery's.
superagent website: http://visionmedia.github.io/superagent/
cheerio website: https://github.com/cheeriojs/cheerio

1. Install superagent and cheerio

npm install superagent cheerio --save

2. Implementation

var express = require("express");
var superagent = require("superagent");
var cheerio = require("cheerio");

var app = express();

app.get("/", function (req, resp, next) {
    // Fetch the cnodejs.org home page
    superagent.get("https://cnodejs.org/").end(function (error, data) {
        if (error) {
            console.log("error exception occurred!");
            // next must be declared in the handler's parameters to be usable here
            return next(error);
        }
        // Note: pass data.text (the response body as a string), not data itself
        var $ = cheerio.load(data.text);
        var arr = [];
        // Collect the title and link of every topic in the list
        $("#topic_list .topic_title").each(function (idx, element) {
            var $element = $(element);
            arr.push({
                "title": $element.attr("title"),
                "href": $element.attr("href")
            });
        });
        resp.send(arr);
    });
});

// The listen callback receives no request/response arguments
app.listen(3000, function () {
    console.log("server is running ......");
});
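The `href` attributes are returned exactly as they appear in the page. If they turn out to be site-relative paths (an assumption; the original does not say), they can be resolved against the site address with Node's built-in `URL` class. A minimal sketch follows; `toAbsolute` and the sample entries are hypothetical and only serve as illustration.

```javascript
// Base address taken from the request in the example above.
const BASE_URL = "https://cnodejs.org/";

// toAbsolute is an illustrative helper name, not part of any library.
function toAbsolute(items, base) {
  return items.map(function (item) {
    return {
      title: item.title,
      // new URL(relative, base) resolves a relative path against the base;
      // an href that is already absolute is kept unchanged.
      href: new URL(item.href, base).href,
    };
  });
}

// Made-up entries in the same shape as the array built by the crawler.
const sample = [
  { title: "Example A", href: "/topic/abc" },
  { title: "Example B", href: "https://example.com/x" },
];

const resolved = toAbsolute(sample, BASE_URL);
console.log(resolved);
```

In the route handler this would amount to calling `resp.send(toAbsolute(arr, "https://cnodejs.org/"))` instead of `resp.send(arr)`.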

3. Testing

http://localhost:3000

0 0