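Crawling Liqi (liqi.io) to find the most popular tools
Chen Hao posted on 18 Feb 2017

I have been following the Liqi website (liqi.io) for a while, and one day I decided to write a crawler to scrape the tools recommended there and analyze them. I got it done over a weekend, and the result is this MVP (minimum viable product). The crawler is built with Python's powerful scrapy package, and the code is on GitHub. This is only a preliminary analysis: much of it needs improvement, and there are many more interesting things to try, which I will add here over time.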
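## Load the data; the file "liqi_data_tidy.csv" can be found in the GitHub repository linked above
library(dplyr)
library(knitr)      # provides kable(), used for the tables below
library(wordcloud)  # also attaches RColorBrewer, which provides brewer.pal()
library(ggplot2)
liqi_data <- read.csv(file = "liqi_data_tidy.csv", header = TRUE, row.names = 1, stringsAsFactors = FALSE)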
As of 18 February 2017, the crawler had collected 180 tool sharers and a total of 3,450 shared tools.
1. Top sharers
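The post does not show how these totals were derived, nor whether the tool total counts rows or distinct tool names. A minimal sketch of both counts, assuming the data has one row per recommendation with `author` and `tools` columns (the column names appear in the code below; the `toy` data frame here is hypothetical stand-in data):

```r
# Hypothetical stand-in for liqi_data: one row per (author, tool) recommendation.
toy <- data.frame(
  author = c("A", "A", "B", "C", "C", "C"),
  tools  = c("Evernote", "Sketch", "Evernote", "Kindle", "Evernote", "Slack"),
  stringsAsFactors = FALSE
)

n_sharers <- length(unique(toy$author))  # number of distinct sharers
n_recs    <- nrow(toy)                   # total recommendations (rows)
n_tools   <- length(unique(toy$tools))   # number of distinct tool names

cat(n_sharers, "sharers,", n_recs, "recommendations,", n_tools, "distinct tools\n")
```

On the real data, the same three expressions applied to `liqi_data` would show which of the two tool counts matches the figure quoted above.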
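# Count the tools recommended by each author, most prolific first.
# unique(source) assumes each author has exactly one source page.
fxdr <- liqi_data %>%
  group_by(author) %>%
  summarise(toolCounts = n(), source = unique(source)) %>%
  arrange(desc(toolCounts))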
Top 20 sharers
Ranking rule: the more tools an author recommends, the higher the rank.
kable(fxdr[1:20, ])
author | toolCounts | source |
---|---|---|
刘斌 | 116 | http://liqi.io/liubin/ |
有才 | 81 | http://liqi.io/youcai/ |
柳东原 | 81 | http://liqi.io/liudongyuan/ |
李大锤 | 74 | http://liqi.io/lidachui/ |
Shrugged | 73 | http://liqi.io/shrugged/ |
韩金乌 | 72 | http://liqi.io/hanjinwu/ |
w | 66 | http://liqi.io/w-wanqu/ |
Allen | 63 | http://liqi.io/allen/ |
曹舒旻 | 62 | http://liqi.io/caoshumin/ |
Chris Xia | 58 | http://liqi.io/chris-xia/ |
可可苏玛 | 58 | http://liqi.io/cocosuma/ |
Odding | 57 | http://liqi.io/odding/ |
擎天 | 57 | http://liqi.io/qingtian/ |
月野耕 | 56 | http://liqi.io/yueyegeng/ |
濛子 | 55 | http://liqi.io/mengzi/ |
吴涛 | 54 | http://liqi.io/wutao/ |
零力 | 52 | http://liqi.io/lingli/ |
任宁 | 49 | http://liqi.io/renning/ |
大狗熊 | 49 | http://liqi.io/bearbig/ |
江宏 | 49 | http://liqi.io/jianghong/ |
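The ranking rule above (more recommendations, higher rank) leaves tied authors in whatever order the sort happens to produce, as with the two authors at 81 and the three at 49. A minimal base R sketch of the same counting rule with an explicit tie-aware rank; the `toy` data is hypothetical, and base R is used only so the example runs without extra packages:

```r
# Hypothetical data: one row per recommendation.
toy <- data.frame(
  author = c("A", "B", "B", "C", "C", "C", "D", "D"),
  tools  = c("t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8"),
  stringsAsFactors = FALSE
)

# Count rows per author (the base R counterpart of group_by() + n()).
counts <- as.data.frame(table(author = toy$author),
                        responseName = "toolCounts",
                        stringsAsFactors = FALSE)

# Sort by count, descending; break ties alphabetically so the order is stable.
ranked <- counts[order(-counts$toolCounts, counts$author), ]

# Tied authors share the same rank; the next rank skips ahead.
ranked$rank <- rank(-ranked$toolCounts, ties.method = "min")
print(ranked)
```

Here B and D both recommend two tools, so they share rank 2 and A is ranked 4th.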
Word cloud of sharers
par(family = "STHeiti")
wordcloud(fxdr$author, fxdr$toolCounts, random.order = FALSE, colors=brewer.pal(8, "Dark2"))
2. Tool ranking
# Count how often each tool is recommended; keep the first link and source seen for it
lqph <- liqi_data %>%
  group_by(tools) %>%
  summarise(toolCounts = n(), link = links[1], source = source[1]) %>%
  arrange(desc(toolCounts))
Top 20 recommended tools
Ranking rule: the more often a tool is recommended, the higher the rank.
As a bar chart:
ggplot(lqph[1:20, ], aes(x = factor(tools, levels = lqph$tools[1:20]), y = toolCounts)) +
  geom_bar(stat = "identity", fill = "lightgreen") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, face = "bold")) +
  geom_text(stat = "identity", aes(label = toolCounts), vjust = -0.1) +
  xlab("") +
  ylab("sharetimes")
Details:
kable(lqph[1:20, ])
tools | toolCounts | link | source |
---|---|---|---|
MacBook Pro | 51 | http://www.apple.com/macbook-pro/ | http://liqi.io/zhangmiao/ |
Evernote | 41 | https://evernote.com/ | http://liqi.io/yueyegeng/ |
iPhone 6 | 28 | http://www.apple.com/shop/buy-iphone/iphone6 | http://liqi.io/ivanzhao/ |
Sketch | 27 | https://www.sketchapp.com/ | http://liqi.io/xingmei/ |
Apple Watch | 24 | http://www.apple.com/watch/ | http://liqi.io/zhangmiao/ |
Dropbox | 24 | https://www.dropbox.com/ | http://liqi.io/liguoyu/ |
Chrome | 23 | https://www.google.com/chrome/browser/desktop/index.html | http://liqi.io/wutao/ |
MacBook Air | 23 | http://www.apple.com/cn/macbook-air/ | http://liqi.io/liumengxi/ |
Slack | 23 | https://slack.com/ | http://liqi.io/ivanzhao/ |
Photoshop | 22 | http://www.adobe.com/cn/products/cs6/photoshop.html | http://liqi.io/haoxiaohao/ |
Trello | 22 | https://trello.com/ | http://liqi.io/yueyegeng/ |
Kindle | 21 | https://www.amazon.cn/Kindle%E5%95%86%E5%BA%97/b?node=116087071 | http://liqi.io/yueyegeng/ |
iPhone 6s | 19 | http://www.apple.com/shop/buy-iphone/iphone6s | http://liqi.io/zhangmiao/ |
Ulysses | 18 | https://itunes.apple.com/cn/app/ulysses/id623795237?mt=12 | http://liqi.io/zhangmiao/ |
1Password | 17 | https://1password.com/ | http://liqi.io/liguoyu/ |
Alfred | 17 | https://www.alfredapp.com/ | http://liqi.io/liguoyu/ |
iPhone | 17 | http://www.apple.com/iphone/ | http://liqi.io/xingmei/ |
Keynote | 17 | http://www.apple.com/keynote/ | http://liqi.io/liguoyu/ |
Xcode | 17 | https://developer.apple.com/xcode/cn/ | http://liqi.io/ouluhai/ |
iPad | 16 | http://www.apple.com/ipad/ | http://liqi.io/songshaopeng/ |
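Tool word cloud
A total of 229 tools were recommended more than twice; they are shown in the word cloud below (the code plots tools with at least three recommendations, min.freq = 3):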
par(family = "STHeiti")
wordcloud(lqph$tools, lqph$toolCounts, min.freq=3, random.order = FALSE, colors=brewer.pal(8, "Dark2"))
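The `min.freq = 3` argument means that words with a frequency below 3 are not plotted. A minimal sketch of how the number of tools above such a threshold could be counted, using a hypothetical named vector of counts in place of `lqph$toolCounts`:

```r
# Hypothetical recommendation counts per tool (stand-in for lqph$toolCounts).
tool_counts <- c("MacBook Pro" = 5L, "Evernote" = 3L, "Sketch" = 2L,
                 "Hue" = 1L, "Ballpark" = 1L)

n_shown <- sum(tool_counts >= 3)  # tools that pass min.freq = 3
n_twice <- sum(tool_counts == 2)  # recommended exactly twice, not plotted
n_once  <- sum(tool_counts == 1)  # recommended once, not plotted

cat(n_shown, "plotted,", n_twice, "recommended twice,", n_once, "recommended once\n")
```

On the real data, `sum(lqph$toolCounts >= 3)` would presumably be the expression behind the count quoted for the word cloud, though the post does not show that step.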
A look at the less popular tools
A total of 2,996 tools were recommended only once. Here are 20 picked at random:
set.seed(20)
lmlq <- lqph[lqph$toolCounts == 1, ]
lmlq_count <- nrow(lmlq)
kable(lmlq[sample(1:lmlq_count, 20), c(1,2,4)])
tools | toolCounts | source |
---|---|---|
樱木A4 透写台 | 1 | http://liqi.io/shine/ |
单行道 | 1 | http://liqi.io/miyu/ |
GoPro Hero 4 Silver | 1 | http://liqi.io/bearbig/ |
PowerBook G4 667 | 1 | http://liqi.io/gongchen/ |
论中国 | 1 | http://liqi.io/wangtao/ |
钟颖 | 1 | http://liqi.io/zhongying/ |
Aquamacs | 1 | http://liqi.io/yedingding/ |
AirPort Time Capsule | 1 | http://liqi.io/guling/ |
Hue | 1 | http://liqi.io/liuchengyin/ |
JPEGmini | 1 | http://liqi.io/tangshenggang/ |
X500 | 1 | http://liqi.io/lvyesu/ |
养猫 | 1 | http://liqi.io/sunqi/ |
《Bakuman》 | 1 | http://liqi.io/lvmunan/ |
三星 WD80J7260GX/SC 洗烘一体机 | 1 | http://liqi.io/guotingting/ |
Dell UltraSharp U2414H 23.8 inch Widescreen IPS LCD Monitor | 1 | http://liqi.io/maobojue/ |
Moto X | 1 | http://liqi.io/cuiqiwen/ |
http://www.mobile-patterns.com/ | 1 | http://liqi.io/duanxianzhou/ |
Ballpark | 1 | http://liqi.io/duxiao/ |
G胖 | 1 | http://liqi.io/haimaoluohewu/ |
小米智能家居套装 | 1 | http://liqi.io/cuiqiwen/ |
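The random selection above relies on `set.seed()` to make the draw repeatable. A minimal sketch of that behavior, using a hypothetical pool of 100 names in place of the rows of `lmlq`:

```r
# Hypothetical pool of items standing in for the once-recommended tools.
pool <- paste0("tool_", 1:100)

# Fixing the seed before each draw gives the same sample both times.
set.seed(20)
first_draw <- sample(pool, 20)

set.seed(20)
second_draw <- sample(pool, 20)

same_draw <- identical(first_draw, second_draw)
print(same_draw)
```

One caveat: R 3.6.0 changed the default algorithm behind `sample()`, so the same seed on a current R version will probably not reproduce the exact 20 tools listed in this 2017 post unless the older behavior is requested with `RNGkind(sample.kind = "Rounding")`.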