Chinaunix
Title:
A URL opens fine in the browser, but Python cannot scrape it
Author:
predatorymh
Time:
2011-08-17 11:18
As the title says:
http://www.bioonjob.com/hospital ... 0-8135-12A2C52548E3
The page opens normally in a browser, but Python cannot read its source.
My code:
import urllib2

# url holds the address given above
req = urllib2.Request(url)
req.add_header('User-Agent', "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)")
response = urllib2.urlopen(req)
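Author:
ubuntu_mark
Time:
2011-08-17 15:02
First, this URL redirects to another page. Second, when I send a request to it I seem to get a 500. Does the page require a login? Try again and check.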
Author:
poper168
Time:
2011-08-17 17:35
res = opener.open(req)
File "C:\software\Python25\lib\urllib2.py", line 387, in open
response = meth(req, response)
File "C:\software\Python25\lib\urllib2.py", line 498, in http_response
'http', request, response, code, msg, hdrs)
File "C:\software\Python25\lib\urllib2.py", line 425, in error
return self._call_chain(*args)
File "C:\software\Python25\lib\urllib2.py", line 360, in _call_chain
result = func(*args)
File "C:\software\Python25\lib\urllib2.py", line 506, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
Quite strange: I also get a 500 when I fetch it, yet it opens normally in the browser.
Author:
renxiao2003
Time:
2011-08-17 22:03
Is there a proxy involved?
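To check the proxy question, one option is to look at which proxy settings Python picks up and to build an opener that bypasses them. This is a minimal sketch in Python 3 (the thread itself uses Python 2, where `urllib.getproxies()` and `urllib2.ProxyHandler` play the same roles); `target_url` is a hypothetical placeholder, not something from the thread.

```python
import urllib.request

# Proxy settings that urllib picks up from the environment
# (for example the http_proxy / https_proxy variables); may be empty.
detected = urllib.request.getproxies()
print("proxies in effect:", detected)

# An opener built with an empty ProxyHandler ignores those settings.
direct_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

# Hypothetical usage; target_url stands for the address being scraped:
# html = direct_opener.open(target_url, timeout=10).read()
```

If the result differs between the default opener and `direct_opener`, a proxy is likely involved; if both behave the same, the cause is probably elsewhere.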
Author:
hipeace86
Time:
2011-08-19 09:37
This post was last edited by hipeace86 at 2011-08-19 09:59
import StringIO

import pycurl

# Collect the response body in an in-memory buffer
b = StringIO.StringIO()

c = pycurl.Curl()
c.setopt(pycurl.URL, "http://www.bioonjob.com/hospital/show.asp?id=5544112F-6542-4DD0-8135-12A2C52548E3")
c.setopt(pycurl.WRITEFUNCTION, b.write)
c.setopt(pycurl.FOLLOWLOCATION, 1)  # follow redirects
c.setopt(pycurl.MAXREDIRS, 5)       # but at most 5 of them
c.perform()
c.close()

# Decode the body as GB2312 before printing
print b.getvalue().decode('gb2312')
This way it can be scraped.
Author:
predatorymh
Time:
2011-08-23 11:34
To the expert above: why does it work with pycurl?
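Author:
newcnad
Time:
2011-08-23 12:33
Actually, it is because the page itself returns 500 Internal Server Error. When you visit it in a browser, the returned status is also 500 (although some content comes back with it).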
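Following this explanation, the body of a 500 response can still be read from the exception that the urllib family raises. Below is a minimal sketch in Python 3 (`urllib.request` / `urllib.error`; in Python 2, `urllib2.HTTPError` likewise offers `read()`). Since the original site cannot be assumed reachable, the sketch uses a hypothetical local server, `Always500Handler`, that always answers 500 with a small HTML body; `fetch` and `demo` are also illustrative names.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Always500Handler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a page that answers 500 but still sends HTML."""

    def do_GET(self):
        body = b"<html><body>partial content</body></html>"
        self.send_response(500)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def fetch(url):
    """Return (status, body) even when the server answers with an error status."""
    # An empty ProxyHandler keeps the local demo request away from any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError is also a response object: the body sent along
        # with the error status is still available through read().
        return err.code, err.read()


def demo():
    server = HTTPServer(("127.0.0.1", 0), Always500Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("http://127.0.0.1:%d/" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()


status, body = demo()
print(status, body)
```

The design point is that the exception is treated as a response rather than as a failure. pycurl does not raise on error statuses by default, which would explain why it returned the page while urllib2 raised.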
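Author:
predatorymh
Time:
2011-08-23 13:53
I never expected that a URL urllib2 could not fetch no matter what I tried would be so easy to fetch with pycurl. I normally use urllib; it looks like I should study pycurl more from now on.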
Author:
descusr
Time:
2011-11-18 15:53
www.samsung.com
This site cannot be scraped even with pycurl:
import StringIO

import pycurl

# url holds the address to fetch (www.samsung.com in this post)
html = StringIO.StringIO()
c = pycurl.Curl()
# Encode unicode URLs to UTF-8 bytes before handing them to pycurl
c.setopt(pycurl.URL, url.encode('utf-8') if type(url) is unicode else url)
c.setopt(pycurl.WRITEFUNCTION, html.write)
c.setopt(pycurl.NOBODY, 0)
# Redirects are not followed here, so MAXREDIRS below has no effect
c.setopt(pycurl.FOLLOWLOCATION, 0)
c.setopt(pycurl.MAXREDIRS, 5)
c.setopt(pycurl.CONNECTTIMEOUT, 60)
c.setopt(pycurl.TIMEOUT, 300)
#c.setopt(pycurl.USERAGENT, "Mozilla/5.0 (Windows; U;compatible; MSIE 8.0; Windows NT 6.1; SV1; .NET CLR 1.1.4322)")
USER_AGENT = 'Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.0.5) Gecko/2008121622 Ubuntu/8.10 (intrepid) Firefox/3.0.5'
c.setopt(pycurl.USERAGENT, USER_AGENT)
c.perform()
ret = html.getvalue()
The resulting ret is None!!! So frustrating~~
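One possible explanation, not verified against www.samsung.com: with `FOLLOWLOCATION` set to 0, a redirect response is returned as-is, and such responses often carry no body, so the buffer stays empty. Note also that `getvalue()` on an untouched buffer returns an empty string rather than `None`. The sketch below illustrates both points in Python 3 using only the standard library; `RedirectHandler`, `get_without_following`, and the `/home` path are hypothetical names for a local demo server.

```python
import http.client
import io
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical server: '/' redirects to '/home', which holds the real page."""

    def do_GET(self):
        if self.path == "/home":
            body = b"<html>real page</html>"
            self.send_response(200)
        else:
            body = b""
            self.send_response(302)
            self.send_header("Location", "/home")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def get_without_following(port, path):
    """One GET with http.client, which never follows redirects by itself."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location"), resp.read()
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    # First request hits the redirect and gets an empty body.
    first = get_without_following(server.server_port, "/")
    # Following the Location header by hand reaches the real page.
    second = get_without_following(server.server_port, first[1])
finally:
    server.shutdown()
    server.server_close()

empty_buffer = io.BytesIO().getvalue()  # an untouched buffer yields b"", not None
print(first, second, repr(empty_buffer))
```

If this is what happens with the site in question, setting `FOLLOWLOCATION` to 1 in the pycurl code, as in the earlier reply, would be the first thing to try; checking the response status code would confirm whether a redirect is being returned.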