Logging

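scrapy.log and its functions have been deprecated in favor of explicit calls to Python's standard logging module. Keep reading to learn more about the new logging system.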

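Scrapy uses Python's built-in logging system for event logging. We provide some simple examples to get you started, but for more advanced use cases it is strongly recommended to read its documentation thoroughly.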

Logging works out of the box, and can be configured to some extent with the Scrapy settings listed in Logging settings.

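When running commands, Scrapy calls scrapy.utils.log.configure_logging() to set some reasonable defaults and to handle the settings listed in Logging settings. If you run Scrapy from a script, as described in Run Scrapy from a script, it is recommended to call this function manually.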

Log levels

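Python's built-in logging defines 5 different levels to indicate the severity of a given log message. Here are the standard ones, listed in decreasing order: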

  1. logging.CRITICAL - for critical errors (highest severity)
  2. logging.ERROR - for regular errors
  3. logging.WARNING - for warning messages
  4. logging.INFO - for informational messages
  5. logging.DEBUG - for debugging messages (lowest severity)
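
The ordering above follows the integer value that the standard library assigns to each level. A minimal sketch, using only the logging module, that prints the levels from highest to lowest severity (the LEVELS name is introduced here purely for illustration):

```python
import logging

# The five standard levels, from highest to lowest severity.
LEVELS = [
    logging.CRITICAL,
    logging.ERROR,
    logging.WARNING,
    logging.INFO,
    logging.DEBUG,
]

for level in LEVELS:
    # getLevelName maps the numeric value back to its textual name.
    print(level, logging.getLevelName(level))
```

Because levels are plain integers, "more severe" simply means "numerically larger", which is what level-based filtering compares against.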

How to log messages

Here is a quick example of how to log a message using the logging.WARNING level:

import logging
logging.warning("This is a warning")

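There are shortcuts for issuing log messages at any of the 5 standard levels, and there is also a general logging.log method that takes the level as an argument. If needed, the last example could be rewritten as: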

import logging
logging.log(logging.WARNING, "This is a warning")

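On top of that, you can create different "loggers" to encapsulate messages (for example, a common practice is to create a different logger for every module). These loggers can be configured independently, and they can be arranged hierarchically.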

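The examples above use the root logger behind the scenes, which is a top-level logger to which all messages are propagated (unless otherwise specified). The module-level logging helpers are merely shortcuts for getting the root logger explicitly, so the following is equivalent to the snippets above: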

import logging
logger = logging.getLogger()
logger.warning("This is a warning")

You can use a different logger simply by passing its name to the logging.getLogger function:

import logging
logger = logging.getLogger('mycustomlogger')
logger.warning("This is a warning")

Finally, you can make sure every module you work on has its own logger by using the __name__ variable, which is populated with the current module's path:

import logging
logger = logging.getLogger(__name__)
logger.warning("This is a warning")
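
The hierarchical arrangement of loggers mentioned earlier can be sketched with the standard library alone. In this illustration the logger names "myproject" and "myproject.spiders" are hypothetical, and an in-memory buffer stands in for a real destination:

```python
import io
import logging

# Attach a handler to the parent logger only ("myproject" is a made-up name).
stream = io.StringIO()
parent = logging.getLogger("myproject")
parent.setLevel(logging.INFO)
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("[%(name)s] %(levelname)s: %(message)s"))
parent.addHandler(handler)

# Dotted names create children. This logger has no handler of its own,
# so its records propagate up to "myproject".
child = logging.getLogger("myproject.spiders")
child.info("Handled by the parent's handler")

print(stream.getvalue(), end="")
```

This is why naming loggers after __name__ tends to work well: module paths are dotted, so the logger hierarchy mirrors the package layout.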

See also

Module logging, HowTo
Basic Logging Tutorial
Module logging, Loggers
Further documentation on loggers

Logging from Spiders

Scrapy provides a logger within each Spider instance, which can be accessed and used like this:

import scrapy

class MySpider(scrapy.Spider):

    name = 'myspider'
    start_urls = ['http://scrapinghub.com']

    def parse(self, response):
        self.logger.info('Parse function called on %s', response.url)

That logger is created using the Spider's name, but you can use any custom Python logger you want. For example:

import logging
import scrapy

logger = logging.getLogger('mycustomlogger')

class MySpider(scrapy.Spider):

    name = 'myspider'
    start_urls = ['http://scrapinghub.com']

    def parse(self, response):
        logger.info('Parse function called on %s', response.url)

Logging configuration

Loggers on their own don’t manage how messages sent through them are displayed. For this task, different “handlers” can be attached to any logger instance and they will redirect those messages to appropriate destinations, such as the standard output, files, emails, etc.
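
As a rough illustration of handlers, the sketch below attaches two of them to one logger using only the standard library. The logger name "handler_demo" is hypothetical, and an in-memory buffer stands in for a file or other destination:

```python
import io
import logging
import sys

logger = logging.getLogger("handler_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep this demo isolated from the root logger

# One handler writes to standard error...
stderr_handler = logging.StreamHandler(sys.stderr)
# ...and another to an in-memory buffer standing in for a file.
buffer = io.StringIO()
buffer_handler = logging.StreamHandler(buffer)

logger.addHandler(stderr_handler)
logger.addHandler(buffer_handler)

# The same record is delivered to every attached handler.
logger.warning("Sent to both destinations")
```

The logger decides whether a message is emitted at all, while each handler decides where it ends up.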

By default, Scrapy sets and configures a handler for the root logger, based on the settings below.

Logging settings

These settings can be used to configure the logging: LOG_FILE, LOG_ENABLED, LOG_ENCODING, LOG_LEVEL, LOG_FORMAT and LOG_DATEFORMAT.

The first few settings define a destination for log messages. If LOG_FILE is set, messages sent through the root logger will be redirected to a file named LOG_FILE with encoding LOG_ENCODING. If it is unset and LOG_ENABLED is True, log messages will be displayed on standard error. Lastly, if LOG_ENABLED is False, there won't be any visible log output.
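
The decision described above can be sketched as a small function. This is an illustration of the logic only, not Scrapy's actual implementation: build_handler is a hypothetical helper, the settings are passed as a plain dict, and the fallback values ("utf-8" and True) are assumptions made for the example.

```python
import logging
import sys


def build_handler(settings):
    """Pick a handler from a plain dict of settings (hypothetical helper)."""
    log_file = settings.get("LOG_FILE")
    if log_file:
        # delay=True postpones opening the file until the first record.
        return logging.FileHandler(
            log_file, encoding=settings.get("LOG_ENCODING", "utf-8"), delay=True
        )
    if settings.get("LOG_ENABLED", True):
        # Messages go to standard error when no file is configured.
        return logging.StreamHandler(sys.stderr)
    # NullHandler discards every record, so nothing is shown.
    return logging.NullHandler()


print(type(build_handler({"LOG_FILE": "scrapy.log"})).__name__)
print(type(build_handler({"LOG_ENABLED": True})).__name__)
print(type(build_handler({"LOG_ENABLED": False})).__name__)
```

Note that LOG_FILE takes precedence: when it is set, the value of LOG_ENABLED is not consulted in this sketch.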

LOG_LEVEL determines the minimum level of severity to display; messages with lower severity will be filtered out. It accepts any of the levels listed in Log levels.
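
The effect of a minimum severity can be demonstrated with the standard library. The sketch below sets the threshold on a logger with a hypothetical name ("level_demo") and writes to an in-memory buffer; it shows the filtering behaviour in general and makes no claim about where Scrapy applies LOG_LEVEL internally.

```python
import io
import logging

buffer = io.StringIO()
logger = logging.getLogger("level_demo")  # hypothetical logger name
logger.propagate = False  # keep this demo isolated from the root logger
logger.addHandler(logging.StreamHandler(buffer))

# Equivalent in spirit to a minimum level of WARNING.
logger.setLevel(logging.WARNING)

logger.debug("dropped")     # below the threshold
logger.info("dropped")      # below the threshold
logger.warning("kept")      # at the threshold
logger.error("kept too")    # above the threshold

print(buffer.getvalue(), end="")
```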

LOG_FORMAT and LOG_DATEFORMAT specify formatting strings used as layouts for all messages. Those strings can contain any placeholders listed in logging's LogRecord attributes docs and datetime's strftime and strptime directives, respectively.
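
To make the two kinds of placeholders concrete, the sketch below feeds a pair of format strings to logging.Formatter. The values shown are example layouts chosen for illustration; check the settings reference of your Scrapy version for the actual defaults. The record is built by hand so the example does not depend on any configured handler.

```python
import logging

# Example layouts (assumed values, not necessarily your project's settings).
LOG_FORMAT = "%(asctime)s [%(name)s] %(levelname)s: %(message)s"
LOG_DATEFORMAT = "%Y-%m-%d %H:%M:%S"

formatter = logging.Formatter(fmt=LOG_FORMAT, datefmt=LOG_DATEFORMAT)

# Build a record manually: name, level, pathname, lineno, msg, args, exc_info.
record = logging.LogRecord(
    "myspider",
    logging.INFO,
    "example.py",
    1,
    "Parse function called on %s",
    ("http://scrapinghub.com",),
    None,
)

# %(asctime)s is rendered with LOG_DATEFORMAT; the rest come from the record.
line = formatter.format(record)
print(line)
```

The printed line starts with a timestamp in the LOG_DATEFORMAT layout, followed by the logger name in brackets, the level name, and the interpolated message.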

Command-line options

There are command-line arguments, available for all commands, that you can use to override some of the Scrapy settings regarding logging.

See also

Module logging.handlers
Further documentation on available handlers

scrapy.utils.log module