[乐意黎] Garbled text when scraping WeChat Official Account articles with phpQuery

Source: Internet  Editor: 程序博客网  Time: 2024/05/16 14:06

I finally found a solution, which is worth celebrating.

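It turns out that WeChat inserts anti-scraping code into the page source (an HTML comment beginning with <!--headTrap). Removing that snippet from the article source before parsing fixes the garbled output.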
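As a minimal sketch of that idea, the stand-alone helper below removes the trap comment from a sample string. The function name `strip_head_trap` and the sample HTML are assumptions for illustration and do not come from the original post. The regular expression is a little more tolerant than an exact string match in case the whitespace inside the comment differs; the original does not say whether WeChat ever varies it.

```php
<?php
// Hypothetical helper: remove WeChat's "headTrap" comment from fetched HTML.
function strip_head_trap(string $html): string
{
    // Match <!--headTrap ... --> non-greedily, including across newlines.
    $clean = preg_replace('/<!--\s*headTrap.*?-->/s', '', $html);
    // preg_replace() returns null on a regex error; fall back to the input.
    return $clean === null ? $html : $clean;
}

// Made-up sample page containing the trap comment.
$sample = '<html><!--headTrap<body></body><head></head><html></html>-->'
        . '<head><meta charset="utf-8"><title>Demo</title></head>'
        . '<body>hi</body></html>';

echo strip_head_trap($sample), "\n";
```

Input that does not contain the comment is returned unchanged, so the helper can be applied to any fetched page before it is handed to the parser.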

The code is as follows:

public function getCon(){
        header('Content-type: text/html; charset=utf-8');
        import('Vendor.QL.QueryList');
        $w_url=$_POST['wurl'];   // article URL received from the form
//    Test article URL
//        $w_url='http://mp.weixin.qq.com/s?__biz=MzA5NzQ5OTMxMA==&mid=2650621512&idx=1&sn=2059946e820805c0d62a450aa3af62be&chksm=88960789bfe18e9f47417eb45cd8efe458af9e93fea3e8e4e242ea2376fd3e4c69f5218293cb&scene=0#wechat_redirect';
//        echo "<script>alert('".$w_url."');</script>";
        $html = file_get_contents($w_url);  // fetch the article source into a variable
        if ($html === false) {
            die('Failed to fetch the article');  // file_get_contents() returns false on failure
        }
//        echo "<script>alert('".$html."');</script>";
        $html = str_replace("<!--headTrap<body></body><head></head><html></html>-->", "", $html);  // strip WeChat's anti-scraping trap comment
//        die($w_url);

//        var_dump($html);
        $data = \QueryList::Query($html,array(
            // Scraping rules
            // 'rule name' => array('jQuery selector','attribute to collect'),
            'titleTag' => array('title','text'),
//            'title' => array('#activity-name','text'),
            'content' => array('body','text'),
//            'image' => array('img','src'),
            // WeChat-specific rules
            'contentWx' => array('#js_content','text'),
//            'imageWx' => array('img','data-src'),
//            'conText' => array('.rich_media_content>p','text'),
        ))->data;
        foreach ($data as $k => $v) {
            // The imageWx rule above is commented out, so only trim the
            // query string when that key is actually present.
            if (isset($v['imageWx'])) {
                $data[$k]['imageWx'] = $this->cut_str($v['imageWx'],'?',0);
            }
        }
// Print the result
//        print_r($data);
        $this->assign('conD',$data);
        $this->display();

    }
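The method above calls `$this->cut_str(...)`, which the original post does not show. One plausible reading, and it is only an assumption, is that it splits a string on a delimiter and returns the piece at the given index, so that `cut_str($url, '?', 0)` drops the query string from an image URL. A stand-alone sketch under that assumption (the name `cut_str_sketch` and the example URL are hypothetical):

```php
<?php
// Hypothetical stand-in for the author's cut_str(): split $str on $delimiter
// and return the piece at position $index, or '' if that piece does not exist.
function cut_str_sketch(?string $str, string $delimiter, int $index): string
{
    // explode() rejects an empty delimiter, so guard against it here.
    if ($str === null || $str === '' || $delimiter === '') {
        return '';
    }
    $parts = explode($delimiter, $str);
    return $parts[$index] ?? '';
}

// Example: keep only the part of an image URL before the query string.
echo cut_str_sketch('http://example.com/pic.jpg?wx_fmt=jpeg', '?', 0), "\n";
```

Returning an empty string for a missing piece keeps the template code simple, at the cost of not distinguishing "no image" from "malformed URL".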

Settle down and write code with care; no skill can be mastered without years of steady practice.

Original article: http://www.cnblogs.com/lens85/p/6007111.html

0 0