Go in Action: jQuery in golang (PuerkitoBio/goquery, extracting links from HTML)
Source: Internet  Editor: 程序博客网  Time: 2024/04/28 03:44
Life goes on, so keep go go go!
jQuery is practically a household name.
jQuery is a fast, small, and feature-rich JavaScript library. It makes things like HTML document traversal and manipulation, event handling, animation, and Ajax much simpler with an easy-to-use API that works across a multitude of browsers. With a combination of versatility and extensibility, jQuery has changed the way that millions of people write JavaScript.
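jQuery is a JavaScript library.
The jQuery library includes the following features:
HTML element selection
HTML element manipulation
CSS manipulation
HTML event functions
JavaScript effects and animation
HTML DOM traversal and modification
AJAX
Utilities
In the golang world,
the github.com/PuerkitoBio/goquery library implements jQuery-like functionality, which makes it convenient to manipulate HTML documents in Go.
Remember: if you do any crawler work in golang, you will probably end up using goquery!
Reference: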
http://blog.studygolang.com/2015/04/go-jquery-goquery/
PuerkitoBio/goquery
GitHub repository:
https://github.com/PuerkitoBio/goquery
Star: 4833
Description:
A little like that j-thing, only in Go.
Install:
go get github.com/PuerkitoBio/goquery
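Creating a Document object
goquery exposes two main structs: Document and Selection.
Document represents an HTML document, while Selection is used for jQuery-style operations and supports chained calls. goquery needs an HTML document to be specified before any further operation can be performed.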
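Finding specific nodes
Selection has a set of jQuery-like methods, and because the Document struct embeds *Selection, these methods can be called on a Document directly as well. The main method is Selection.Find(selector string): pass in a selector and it returns a new *Selection containing the matches, which is why calls can be chained.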
Attribute operations
You often need to get a tag's content and some of its attribute values; goquery makes this easy.
Official example
    package main

    import (
        "fmt"
        "log"

        "github.com/PuerkitoBio/goquery"
    )

    func ExampleScrape() {
        doc, err := goquery.NewDocument("http://metalsucks.net")
        if err != nil {
            log.Fatal(err)
        }

        // Find the review items
        doc.Find(".sidebar-reviews article .content-block").Each(func(i int, s *goquery.Selection) {
            // For each item found, get the band and title
            band := s.Find("a").Text()
            title := s.Find("i").Text()
            fmt.Printf("Review %d: %s - %s\n", i, band, title)
        })
    }

    func main() {
        ExampleScrape()
    }
Output:
    Review 0: Cavalera Conspiracy - Psychosis
    Review 1: Cannibal Corpse - Red Before Black
    Review 2: All Pigs Must Die - Hostage Animal
    Review 3: Electric Wizard - Wizard Bloody Wizard
    Review 4: Trivium - The Sin and the Sentence
Getting all links
    package main

    import (
        "fmt"
        "log"

        "github.com/PuerkitoBio/goquery"
    )

    func linkScrape() {
        doc, err := goquery.NewDocument("http://jonathanmh.com")
        if err != nil {
            log.Fatal(err)
        }

        // Visit every anchor inside <body> and print its text and href
        doc.Find("body a").Each(func(index int, item *goquery.Selection) {
            linkTag := item
            link, _ := linkTag.Attr("href")
            linkText := linkTag.Text()
            fmt.Printf("Link #%d: '%s' - '%s'\n", index, linkText, link)
        })
    }

    func main() {
        linkScrape()
    }
Output:
    Link #0: 'Skip to content' - '#content'
    Link #1: 'JonathanMH' - 'https://jonathanmh.com/'
    Link #2: 'Blog' - 'https://jonathanmh.com/category/blog/'
    Link #3: 'Hire Me' - 'https://jonathanmh.com/hire-me/'
    Link #4: 'About' - 'https://jonathanmh.com/about/'
    Link #5: 'twitter' - 'https://twitter.com/JonathanMH_com'
    Link #6: 'rss feed' - 'http://jonathanmh.com/feed/'
    Link #7: 'github' - 'https://github.com/JonathanMH'
    Link #8: 'stackoverflow' - 'http://stackoverflow.com/users/896285/jonathan-m-hethey'
    Link #9: 'instagram' - 'http://instagram.com/jonathanmh'
    Link #10: 'facebook' - 'https://www.facebook.com/pages/JonathanMH/159526834122370'
    Link #11: 'linkedin' - 'http://www.linkedin.com/in/jonathanmh'
    Link #12: 'hire me' - '/hire-me'
    Link #13: '' - 'https://twitter.com/JonathanMH_com'
    Link #14: '' - 'https://www.facebook.com/JonathanMH-159526834122370/'
    Link #15: '' - 'https://www.instagram.com/jonathanmh/'
    Link #16: '' - 'https://github.com/jonathanmh/'
    Link #17: 'Work every day like you just got fired' - 'https://jonathanmh.com/work-every-day-like-just-got-fired/'
    Link #18: 'Vue.js API Client / Single Page App (SPA) Tutorial' - 'https://jonathanmh.com/vue-js-api-client-single-page-app-spa-tutorial/'
    Link #19: 'Building a Simple Searchable API with Express (Backend)' - 'https://jonathanmh.com/building-a-simple-searchable-api-with-express-backend/'
    Link #20: 'Music Monday: Doom Soundtrack' - 'https://jonathanmh.com/music-monday-doom-soundtrack/'
    Link #21: 'Brick by Brick' - 'https://jonathanmh.com/brick-by-brick/'
    Link #22: 'Taking Screenshots with Headless, The Chrome Debuggping Protocol (CDP) and Golang' - 'https://jonathanmh.com/taking-screenshots-headless-chrome-debuggping-protocol-cdp-golang/'
    Link #23: 'Firefox has re-joined the Browser Wars' - 'https://jonathanmh.com/firefox-re-joined-browser-wars/'
    Link #24: 'A Mastodon Review, is it the next Twitter / Facebook by the People?' - 'https://jonathanmh.com/mastodon-review-next-twitter-facebook-people/'
    Link #25: 'Testing Coin Hive Crowd Source Monero Mining' - 'https://jonathanmh.com/testing-coin-hive-crowd-source-monero-mining/'
    Link #26: 'Glass Half' - 'https://jonathanmh.com/glass-half/'
    Link #27: 'read older posts' - '/blog/page/2/'
    Link #28: 'twitter' - 'https://twitter.com/JonathanMH_com'
    Link #29: 'rss feed' - 'http://jonathanmh.com/feed/'
    Link #30: 'github' - 'https://github.com/JonathanMH'
    Link #31: 'stackoverflow' - 'http://stackoverflow.com/users/896285/jonathan-m-hethey'
    Link #32: 'instagram' - 'http://instagram.com/jonathanmh'
    Link #33: 'facebook' - 'https://www.facebook.com/pages/JonathanMH/159526834122370'
    Link #34: 'linkedin' - 'http://www.linkedin.com/in/jonathanmh'
    Link #35: '.htaccess' - 'https://jonathanmh.com/tag/htaccess/'
    Link #36: 'Adobe' - 'https://jonathanmh.com/tag/adobe/'
    Link #37: 'Android' - 'https://jonathanmh.com/tag/android/'
    Link #38: 'Arch Linux' - 'https://jonathanmh.com/tag/arch-linux/'
    Link #39: 'atom' - 'https://jonathanmh.com/tag/atom/'
    Link #40: 'bash' - 'https://jonathanmh.com/tag/bash/'
    Link #41: 'blogging' - 'https://jonathanmh.com/tag/blogging/'
    Link #42: 'Brackets' - 'https://jonathanmh.com/tag/brackets/'
    Link #43: 'cigtrack' - 'https://jonathanmh.com/tag/cigtrack/'
    Link #44: 'CodeIgniter' - 'https://jonathanmh.com/tag/codeigniter/'
    Link #45: 'CSS' - 'https://jonathanmh.com/tag/css/'
    Link #46: 'Digital Ocean' - 'https://jonathanmh.com/tag/digital-ocean/'
    Link #47: 'express.js' - 'https://jonathanmh.com/tag/express-js/'
    Link #48: 'facebook' - 'https://jonathanmh.com/tag/facebook/'
    Link #49: 'ghost' - 'https://jonathanmh.com/tag/ghost/'
    Link #50: 'git' - 'https://jonathanmh.com/tag/git/'
    Link #51: 'github' - 'https://jonathanmh.com/tag/github/'
    Link #52: 'gitlab' - 'https://jonathanmh.com/tag/gitlab/'
    Link #53: 'go' - 'https://jonathanmh.com/tag/go/'
    Link #54: 'golang' - 'https://jonathanmh.com/tag/golang/'
    Link #55: 'Google' - 'https://jonathanmh.com/tag/google/'
    Link #56: 'Gulp' - 'https://jonathanmh.com/tag/gulp/'
    Link #57: 'gvim' - 'https://jonathanmh.com/tag/gvim/'
    Link #58: 'JavaScript' - 'https://jonathanmh.com/tag/javascript/'
    Link #59: 'kickstarter' - 'https://jonathanmh.com/tag/kickstarter/'
    Link #60: 'Linux' - 'https://jonathanmh.com/tag/linux/'
    Link #61: 'markdown' - 'https://jonathanmh.com/tag/markdown/'
    Link #62: 'mindset' - 'https://jonathanmh.com/tag/mindset/'
    Link #63: 'MVC' - 'https://jonathanmh.com/tag/mvc/'
    Link #64: 'Nginx' - 'https://jonathanmh.com/tag/nginx/'
    Link #65: 'node.js' - 'https://jonathanmh.com/tag/node-js/'
    Link #66: 'npm' - 'https://jonathanmh.com/tag/npm/'
    Link #67: 'PHP' - 'https://jonathanmh.com/tag/php/'
    Link #68: 'plugin' - 'https://jonathanmh.com/tag/plugin/'
    Link #69: 'Raspberry PI' - 'https://jonathanmh.com/tag/raspberry-pi/'
    Link #70: 'SCSS' - 'https://jonathanmh.com/tag/scss/'
    Link #71: 'social media' - 'https://jonathanmh.com/tag/social-media/'
    Link #72: 'ssh' - 'https://jonathanmh.com/tag/ssh/'
    Link #73: 'Terminal' - 'https://jonathanmh.com/tag/terminal/'
    Link #74: 'toolbox' - 'https://jonathanmh.com/tag/toolbox/'
    Link #75: 'UberWriter' - 'https://jonathanmh.com/tag/uberwriter/'
    Link #76: 'Ubuntu' - 'https://jonathanmh.com/tag/ubuntu/'
    Link #77: 'vim' - 'https://jonathanmh.com/tag/vim/'
    Link #78: 'web crawling' - 'https://jonathanmh.com/tag/web-crawling/'
    Link #79: 'WordPress' - 'https://jonathanmh.com/tag/wordpress/'
    Link #80: 'Blog' - 'https://jonathanmh.com/category/blog/'
    Link #81: 'Hire Me' - 'https://jonathanmh.com/hire-me/'
    Link #82: 'About' - 'https://jonathanmh.com/about/'
    Link #83: 'twitter' - 'https://twitter.com/JonathanMH_com'
    Link #84: 'rss feed' - 'http://jonathanmh.com/feed/'
    Link #85: 'github' - 'https://github.com/JonathanMH'
    Link #86: 'stackoverflow' - 'http://stackoverflow.com/users/896285/jonathan-m-hethey'
    Link #87: 'instagram' - 'http://instagram.com/jonathanmh'
    Link #88: 'facebook' - 'https://www.facebook.com/pages/JonathanMH/159526834122370'
    Link #89: 'linkedin' - 'http://www.linkedin.com/in/jonathanmh'
    Link #90: 'JonathanMH' - 'https://jonathanmh.com/'
    Link #91: 'Proudly powered by WordPress' - 'https://wordpress.org/'
Convert All HTML Links to reStructuredText via goquery
    package main

    import (
        "os"
        "strings"
        "text/template"

        "github.com/PuerkitoBio/goquery"
    )

    const rstLink = "`{{.Text}} <{{.Href}}>`_\n"

    type htmlLink struct {
        Text string
        Href string
    }

    func main() {
        url := "https://www.baidu.com"
        doc, err := goquery.NewDocument(url)
        if err != nil {
            panic(err)
        }

        tmpl := template.Must(template.New("test").Parse(rstLink))
        doc.Find("a").Each(func(_ int, link *goquery.Selection) {
            text := strings.TrimSpace(link.Text())
            href, ok := link.Attr("href")
            if ok {
                tmpl.Execute(os.Stdout, &htmlLink{text, href})
            }
        })
    }
Output:
    ` </>`_
    `手写 <javascript:;>`_
    `拼音 <javascript:;>`_
    `关闭 <javascript:;>`_
    `百度首页 </>`_
    `设置 <javascript:;>`_
    `登录 <https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F>`_
    `新闻 <http://news.baidu.com>`_
    `hao123 <http://www.hao123.com>`_
    `地图 <http://map.baidu.com>`_
    `视频 <http://v.baidu.com>`_
    `贴吧 <http://tieba.baidu.com>`_
    `学术 <http://xueshu.baidu.com>`_
    `登录 <https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F>`_
    `设置 <http://www.baidu.com/gaoji/preferences.html>`_
    `更多产品 <http://www.baidu.com/more/>`_
    `新闻 <http://news.baidu.com/ns?cl=2&rn=20&tn=news&word=>`_
    `贴吧 <http://tieba.baidu.com/f?kw=&fr=wwwt>`_
    `知道 <http://zhidao.baidu.com/q?ct=17&pn=0&tn=ikaslist&rn=10&word=&fr=wwwt>`_
    `音乐 <http://music.baidu.com/search?fr=ps&ie=utf-8&key=>`_
    `图片 <http://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=>`_
    `视频 <http://v.baidu.com/v?ct=301989888&rn=20&pn=0&db=0&s=25&ie=utf-8&word=>`_
    `地图 <http://map.baidu.com/m?word=&fr=ps01000>`_
    `文库 <http://wenku.baidu.com/search?word=&lm=0&od=0&ie=utf-8>`_
    `更多» <//www.baidu.com/more/>`_
    `把百度设为主页 <//www.baidu.com/cache/sethelp/help.html>`_
    `关于百度 <http://home.baidu.com>`_
    `About Baidu <http://ir.baidu.com>`_
    `百度推广 <http://e.baidu.com/?refer=888>`_
    `使用百度前必读 <http://www.baidu.com/duty/>`_
    `意见反馈 <http://jianyi.baidu.com/>`_
    `京公网安备11000002000001号 <http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000001>`_