Scraping the Bing Homepage Background Image with Python
/**author: insun
title: Scraping the Bing Homepage Background Image with Python
blog:http://yxmhero1989.blog.163.com/blog/static/112157956201311743439712/
**/
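Bing has never really counted among the search giants, but I recently read an article saying its homepage background images are pretty nice.
They really aren't bad, so I decided to use them as Python practice and scrape a few.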
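The page source contains g_img={url: and the URL that follows is the image address. Clicking the previous/next buttons in the bottom-right corner switches the image.
Firebug in Firefox didn't show me the exact request path, so I captured the traffic with HttpFox instead.
It turns out an Ajax request loads a chunk of JSON that carries a JPEG URL and related information: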
http://cn.bing.com/HPImageArchive.aspx?format=js&idx=0&n=1&nc=1361089515117&FORM=HYLH1
It returns JSON in this format:
{"images":[{"startdate":"20130216","fullstartdate":"201302161600","enddate":"20130217", "url":"http://s.cn.bing.net/az/hprichbg/rb/LongJi_ZH-CN8658435963_1366x768.jpg", "urlbase":"/az/hprichbg/rb/LongJi_ZH-CN8658435963", "copyright":"桂林龙脊梯田 (© Yoshinori Kuwahara/Flickr/Getty Images)", "copyrightlink":"http://cn.bing.com/search?q=%E9%BE%99%E8%84%8A%E6%A2%AF%E7%94%B0&go=&qs=bs&form=hpcapt", "wp":false,"hsh":"e688c3f17a0b57306642188adcbf2187","drk":1,"top":1,"bot":1, "hs":[{"desc":"童话故事中,莴苣姑娘可以放下长发作为绳索让王子爬入城堡,", "link":"http://cn.bing.com/search?q=%E9%BB%84%E6%B4%9B%E7%91%B6%E5%AF%A8+%E9%95%BF%E5%8F%91%E6%9D%91&go=&qs=bs&form=hphot1", "query":"而在龙脊梯田景区的黄洛瑶寨中,处处可见“莴苣姑娘”!","locx":11,"locy":41},{"desc":"这块土地上洒下了壮民和瑶民祖祖辈辈的血汗与生命,", "link":"http://cn.bing.com/images/search?q=%e9%be%99%e8%84%8a%e6%a2%af%e7%94%b0&FORM=hphot2","query":"而如今,它变成了妩媚潇洒的曲线世界——龙脊梯田。","locx":46,"locy":49},{"desc":"层层叠叠,色彩斑斓,规模宏大,气势磅礴,","link":"http://cn.bing.com/search?q=%E6%9E%81%E7%BE%8E%E4%BB%99%E5%A2%83+%E4%B8%AD%E5%9B%BD%E4%B8%83%E5%A4%A7%E6%A2%AF%E7%94%B0&go=&qs=bs&form=hphot3","query":"美若仙境的梯田,中国不只七座。","locx":60,"locy":42},{"desc":"“七星伴月”是龙脊梯田的精华,由一块月亮田和七块大小山包所组成,关于它形成的缘由,","link":"http://cn.bing.com/search?q=%E9%BE%99%E8%84%8A%E6%A2%AF%E7%94%B0+%E4%B8%83%E6%98%9F%E4%BC%B4%E6%9C%88%E7%9A%84%E4%BC%A0%E8%AF%B4&go=&qs=bs&form=hphot4","query":"流传着一个凄美的爱情故事……","locx":77,"locy":40}],"msg":[{"title":"今日图片故事","link":"http://cn.bing.com/search?q=%E9%BE%99%E8%84%8A%E6%A2%AF%E7%94%B0&go=&qs=bs&form=pgbar1","text":"生机盎然的龙脊梯田"},{"title":"看图片,学英语","link":"http://cn.bing.com/dict/search?q=%E6%A2%AF%E7%94%B0&go=&qs=n&form=pgbar2","text":"用英语说梯田"}]}], "tooltips":{"loading":"正在加载...","previous":"上一页","next":"下一页","walle":"此图片不能下载用作壁纸。","walls":"下载此图片。与 Facebook 连接以发挥必应 Bing 的最大功能。图片只能用作壁纸。"}}
My first version fetched the http://cn.bing.com/ page itself, which only gets the current day's image:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# -*- author:insun -*-
# Scrape the Bing homepage background image with Python (Python 2)
import re
import urllib


def get_bing_backphoto():
    url = 'http://cn.bing.com/'
    html = urllib.urlopen(url).read()
    if not html:
        print 'open & read bing error!'
        return -1
    # The page source embeds the image address as ;g_img={url:'...'
    reg = re.compile(r";g_img={url:'(.*?)'", re.S)
    # e.g. http://s.cn.bing.net/az/hprichbg/rb/LongJi_ZH-CN8658435963_1366x768.jpg
    for imgurl in reg.findall(html):
        # Save under the part of the URL after the last '/'
        savepath = imgurl[imgurl.rindex('/') + 1:]
        urllib.urlretrieve(imgurl, savepath)


get_bing_backphoto()
For the version above, see also: http://www.isayme.org/python-get-bing-day-pic.html
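Now I've changed the approach: request that Ajax URL instead and fetch earlier images by setting idx to 0 through N. The n parameter in the URL has to be 1; when I passed anything else, it just kept returning today's data. Anyone who has written this sort of program will know the feeling.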
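The URL pattern just described can be sketched as a small helper. This is only an illustration, written for Python 3 rather than the Python 2 used by the scripts in this post. build_archive_url and ARCHIVE_ENDPOINT are hypothetical names, and the nc and FORM values are simply copied from the captured request, since the post doesn't establish what they mean.

```python
from urllib.parse import urlencode

# Endpoint taken from the captured request; the constant name is made up here.
ARCHIVE_ENDPOINT = "http://cn.bing.com/HPImageArchive.aspx"


def build_archive_url(idx):
    """Return the archive URL for image number idx (0 is the current image)."""
    params = [
        ("format", "js"),         # ask for a JSON response
        ("idx", idx),             # which image to fetch, counting back from 0
        ("n", 1),                 # the post found that only n=1 behaves as expected
        ("nc", "1361089515117"),  # copied verbatim from the captured request
        ("FORM", "HYLH1"),        # copied verbatim from the captured request
    ]
    return ARCHIVE_ENDPOINT + "?" + urlencode(params)


print(build_archive_url(3))
```

Keeping the parameters in a list preserves their order, so the result matches the captured URL apart from the idx value.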
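There is no need to run the response through Python's json module: what read() returns is already a str, so a regular expression can pull the URL straight out of it. Call type() on it if you don't believe me.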
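If you would rather not depend on the exact text layout of the response, the standard json module is an alternative to a regular expression. A minimal sketch, assuming Python 3 and a response shaped like the sample earlier in this post; extract_image_urls is a hypothetical helper name, and the sample string is a trimmed copy of that response.

```python
import json


def extract_image_urls(body):
    """Return the "url" field of every entry under "images" in the response text."""
    data = json.loads(body)
    if not data:
        # A body consisting of the literal text null parses to None.
        return []
    return [image["url"] for image in data.get("images", [])]


# Trimmed copy of the sample response; only the fields used here are kept.
sample = (
    '{"images":[{"startdate":"20130216",'
    '"url":"http://s.cn.bing.net/az/hprichbg/rb/LongJi_ZH-CN8658435963_1366x768.jpg",'
    '"urlbase":"/az/hprichbg/rb/LongJi_ZH-CN8658435963"}]}'
)
print(extract_image_urls(sample))
```

The trade-off is a little more code in exchange for not caring about whitespace or field order in the response.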
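So the code ends up like this. You could also grab the date from the response and add it to the image name to record which day each image belongs to.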
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# -*- author:insun -*-
# Scrape all Bing homepage background images with Python (Python 2)
import os
import re
import sys
import urllib


def get_bing_backphoto():
    if not os.path.exists('photos'):
        os.mkdir('photos')

    for i in range(0, 1000):
        url = ('http://cn.bing.com/HPImageArchive.aspx?format=js&idx=' + str(i) +
               '&n=1&nc=1361089515117&FORM=HYLH1')
        html = urllib.urlopen(url).read()
        # The endpoint answers with the literal text null when it has no data for idx
        if html == 'null':
            print 'open & read bing error!'
            sys.exit(-1)
        # The sample response shows a space after the comma, so allow optional whitespace
        reg = re.compile(r'"url":"(.*?)",\s*"urlbase"', re.S)
        # e.g. http://s.cn.bing.net/az/hprichbg/rb/LongJi_ZH-CN8658435963_1366x768.jpg
        for imgurl in reg.findall(html):
            # File name is the part of the URL after the last '/'
            name = imgurl[imgurl.rindex('/') + 1:]
            savepath = 'photos/' + name
            urllib.urlretrieve(imgurl, savepath)
            print name + ' save success!'


get_bing_backphoto()
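The idea of recording which day an image belongs to could look roughly like this. It is a sketch under assumptions: Python 3, the date taken from the startdate field of the sample response, and the date placed in front of the name so that files sort chronologically (putting it after the name works just as well). dated_filename is a hypothetical helper name.

```python
import posixpath
from urllib.parse import urlsplit


def dated_filename(startdate, image_url):
    """Prefix the image's base name with its start date."""
    # Take only the path part of the URL, then its last component.
    name = posixpath.basename(urlsplit(image_url).path)
    return startdate + "_" + name


print(dated_filename(
    "20130216",
    "http://s.cn.bing.net/az/hprichbg/rb/LongJi_ZH-CN8658435963_1366x768.jpg",
))
# prints 20130216_LongJi_ZH-CN8658435963_1366x768.jpg
```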
I later found that the JSON comes back as null once idx reaches 21, so looping i up to 1000 was both needless worry and wishful thinking.
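Given that observation, the loop could simply stop at the first null instead of relying on a fixed upper bound. A minimal sketch, assuming Python 3; iter_archive_bodies is a hypothetical name, and fake_fetch stands in for the real HTTP request so the example runs offline.

```python
def iter_archive_bodies(fetch, max_idx=1000):
    """Yield (idx, body) for idx = 0, 1, 2, ... until the server answers null."""
    for idx in range(max_idx):
        body = fetch(idx)
        if body.strip() == "null":
            break  # nothing at this idx: stop quietly rather than report an error
        yield idx, body


def fake_fetch(idx):
    """Stand-in for the HTTP request: pretends only idx 0..2 have data."""
    return '{"images":[]}' if idx < 3 else "null"


for idx, body in iter_archive_bodies(fake_fetch):
    print(idx, body)
```

In a real run, fetch would request the archive URL for the given idx and return the decoded response text; max_idx then only serves as a safety cap.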