
Chinaunix

Title: Scraping web page data with BeautifulSoup and writing it to a txt file fails

Author: maple412    Time: 2015-12-11 22:08
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import urllib2
import re

link=''
path=r'E:/pydownload/1.txt'
f=open(path,'wb+')
for i in range(1,150):
    print "it is download the %d page:" % i
    new=link + str(i) + '.'+'shtml'
    req=urllib2.Request(new)
    fd=urllib2.urlopen(req)
    soup=BeautifulSoup(fd.read(),from_encoding="utf-8")
    ret=soup.find(id="content_1")
    for r in ret.stripped_strings:
        f.write(r)

f.close()
The page content is in Chinese. The script fails with the following error:
Traceback (most recent call last):
  File "E:\py_prj\test1.py", line 17, in <module>
    f.write(r)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
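
The error can be reproduced without any network access. In Python 2, calling f.write(r) with a unicode string on a file opened in binary mode implicitly encodes it with the default 'ascii' codec. The sketch below makes that step explicit. It is only an illustration: the sample string and the try_encode helper are hypothetical and not from the original post, and the code is written to run on Python 3 as well.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample standing in for one scraped string (8 Chinese characters).
sample = u"\u7f51\u9875\u7684\u5185\u5bb9\u662f\u4e2d\u6587"


def try_encode(text, codec):
    """Return the encoded bytes, or the UnicodeEncodeError that was raised."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return exc


# ASCII has no Chinese characters, so this yields the exception object.
ascii_result = try_encode(sample, "ascii")
# UTF-8 can represent every character, so this yields bytes.
utf8_result = try_encode(sample, "utf-8")

print(repr(ascii_result))
print(len(utf8_result))
```

Encoding with an explicit codec that covers Chinese, such as UTF-8, avoids the implicit ASCII step.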
Author: ma__    Time: 2015-12-12 00:53
import sys
reload(sys)
sys.setdefaultencoding( "utf-8" )

Author: maple412    Time: 2015-12-14 16:40
Reply to 2# ma__
That did not help; the result is the same.
Author: ma__    Time: 2015-12-14 20:40
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import urllib2
import re,sys
reload(sys)
sys.setdefaultencoding( "utf-8" )

link='http://movie.douban.com/'
path=r'1.txt'
f=open(path,'wb+')
# print "it is download the %d page:" % i
# new=link + str(i) + '.'+'shtml'
req=urllib2.Request(link)
fd=urllib2.urlopen(req)
soup=BeautifulSoup(fd.read(),from_encoding="utf-8")
ret=soup.find(id="top-nav-appintro")
for r in ret.stripped_strings:
    f.write(r)
    print r
# f.close()
To get rid of this warning, change this:

BeautifulSoup([your markup])

to this:

BeautifulSoup([your markup], "lxml")

豆瓣
3.0
和有趣的人做有趣的事
掃碼直接下載
iPhone
·
Android
Why does it work for me?
Author: ma__    Time: 2015-12-14 20:53
Last edited by ma__ on 2015-12-14 20:55

r is a unicode object.
So add this line before writing it:
r=r.encode('utf-8','ignore')
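
An alternative to encoding each string by hand is to open the file in text mode with an explicit encoding and let the file object do the conversion. This is a minimal sketch, assuming io.open (available in Python 2.6+ and Python 3). The strings list is a placeholder for ret.stripped_strings and out_path is a hypothetical temporary location, not the poster's path.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Placeholder for the unicode strings yielded by ret.stripped_strings.
strings = [u"\u8c46\u74e3", u"3.0", u"iPhone"]

# Hypothetical output location; the original post writes to E:/pydownload/1.txt.
out_path = os.path.join(tempfile.mkdtemp(), "1.txt")

# Text mode with an explicit encoding: unicode goes in, UTF-8 bytes land on disk.
with io.open(out_path, "w", encoding="utf-8") as f:
    for s in strings:
        f.write(s + u"\n")

# Read the file back to check that nothing was lost.
with io.open(out_path, "r", encoding="utf-8") as f:
    round_trip = f.read().splitlines()

print(round_trip == strings)
```

With this approach the write loop needs no per-string encode call, and the with block also closes the file, which the earlier scripts leave to a manual f.close().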




