A question about using scrapy in Python

Posted 2016-08-22 22:58
I'm trying to use scrapy to automatically follow and download a site's novel chapter links. The code is as follows:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.selector import Selector
from teizi.items import TeiziItem
from scrapy import log

class XunduSpider(CrawlSpider):
    name = "teizi"
    download_delay = 1
    allowed_domains = ['http://www.xunread.com/']
    start_urls = ["http://www.xunread.com/article/8c39f5a0-ca54-44d7-86cc-148eee4d6615/index.shtml"]
    rules = [Rule(LinkExtractor(allow=('\d\.shtml')), callback='parse_item', follow=True)]

    def parse_item(self, response):
        log.msg("parse_item", level='INFO')
        item = TeiziItem
        sel = Selector(response)
        script_content = sel.xpath('//div[@id="content"]/script/div/text()').extract()
        script_title = sel.xpath('//div[@id="title"]/script/div/text()').extract()
        item['content'] = [n.encode('utf-8') for n in script_content]
        item['title'] = [n.encode('utf-8') for n in script_title]
        yield item

The output of the run is as follows:

C:\Users\Administrator\teizi>scrapy crawl teizi
C:\Users\Administrator\teizi\teizi\spiders\tiezi_spider.py:5: ScrapyDeprecationWarning: Module `scrapy.log` has been deprecated, Scrapy now relies on the builtin Python library for logging. Read the updated logging entry in the documentation to learn more.
  from scrapy import log
2016-08-22 22:43:09 [scrapy] INFO: Scrapy 1.1.0 started (bot: teizi)
2016-08-22 22:43:09 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'teizi.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['teizi.spiders'], 'BOT_NAME': 'teizi', 'COOKIES_ENABLED': False, 'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; rv:38.0) Gecko/20100101 Firefox/38.0'}
2016-08-22 22:43:10 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2016-08-22 22:43:10 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-08-22 22:43:11 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-08-22 22:43:11 [scrapy] INFO: Enabled item pipelines:
['teizi.pipelines.TeiziPipeline']
2016-08-22 22:43:11 [scrapy] INFO: Spider opened
2016-08-22 22:43:11 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-08-22 22:43:11 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-08-22 22:43:12 [scrapy] DEBUG: Crawled (200) <GET http://www.xunread.com/robots.txt> (referer: None)
2016-08-22 22:43:14 [scrapy] DEBUG: Crawled (200) <GET http://www.xunread.com/article/8c39f5a0-ca54-44d7-86cc-148eee4d6615/index.shtml> (referer: None)
2016-08-22 22:43:14 [scrapy] DEBUG: Filtered offsite request to 'www.xunread.com': <GET http://www.xunread.com/article/8 ... c-148eee4d6615/1.shtml>
2016-08-22 22:43:14 [scrapy] INFO: Closing spider (finished)
2016-08-22 22:43:14 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 556,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 44647,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 8, 22, 14, 43, 14, 353000),
 'log_count/DEBUG': 4,
 'log_count/INFO': 7,
 'offsite/domains': 1,
 'offsite/filtered': 657,
 'request_depth_max': 1,
 'response_received_count': 2,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2016, 8, 22, 14, 43, 11, 360000)}
2016-08-22 22:43:14 [scrapy] INFO: Spider closed (finished)

Judging from the log, the crawl stops right after the first page is downloaded, and parse_item apparently never runs. Could anyone take a look and tell me what the cause is? Thanks.
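For what it's worth, the log line `Filtered offsite request to 'www.xunread.com'` together with `'offsite/filtered': 657` in the stats suggests the OffsiteMiddleware is dropping every extracted chapter link before `parse_item` can run. `allowed_domains` is expected to contain bare domain names such as `'www.xunread.com'`, not full URLs like `'http://www.xunread.com/'`. A simplified re-implementation of the comparison (a sketch, not Scrapy's actual code) shows why a URL-shaped entry can never match a hostname:

```python
from urllib.parse import urlparse

def is_offsite(url, allowed_domains):
    # The offsite check compares the request's hostname against each
    # allowed_domains entry (exact match or subdomain match). An entry
    # that is a full URL with scheme and slashes can never match.
    host = urlparse(url).hostname or ""
    return not any(host == d or host.endswith("." + d) for d in allowed_domains)

url = "http://www.xunread.com/article/x/1.shtml"
print(is_offsite(url, ["http://www.xunread.com/"]))  # True: request filtered
print(is_offsite(url, ["www.xunread.com"]))          # False: request allowed
```

If that diagnosis is right, changing the entry to `allowed_domains = ['www.xunread.com']` should let the `\d\.shtml` links through to the callback. Note also that `item = TeiziItem` assigns the class itself; it presumably should be `item = TeiziItem()` so that the field assignments work.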
您需要登錄后才可以回帖 登錄 | 注冊(cè)

本版積分規(guī)則 發(fā)表回復(fù)

  

北京盛拓優(yōu)訊信息技術(shù)有限公司. 版權(quán)所有 京ICP備16024965號(hào)-6 北京市公安局海淀分局網(wǎng)監(jiān)中心備案編號(hào):11010802020122 niuxiaotong@pcpop.com 17352615567
未成年舉報(bào)專區(qū)
中國互聯(lián)網(wǎng)協(xié)會(huì)會(huì)員  聯(lián)系我們:huangweiwei@itpub.net
感謝所有關(guān)心和支持過ChinaUnix的朋友們 轉(zhuǎn)載本站內(nèi)容請(qǐng)注明原作者名及出處

清除 Cookies - ChinaUnix - Archiver - WAP - TOP