Update Apify log formatter to contain logger name (#116)
## Description

Since Actors can contain many different loggers it could be valuable to have a logger name in the log (at the beginning).

## Example (Scrapy Actor)

### Before

```
$ apify run --purge
Info: All default local stores were purged.
Run: /home/vdusek/Apify/actor-templates/templates/python-scrapy/.venv/bin/python3 -m src
INFO Initializing actor...
INFO System info ({"apify_sdk_version": "1.1.4", "apify_client_version": "1.4.1", "python_version": "3.11.5", "os": "linux"})
INFO Actor is being executed...
INFO Scrapy 2.11.0 started (bot: titlebot)
INFO Versions: lxml 4.9.3.0, libxml2 2.10.3, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.11.5 (main, Aug 28 2023, 00:00:00) [GCC 13.2.1 20230728 (Red Hat 13.2.1-1)], pyOpenSSL 23.2.0 (OpenSSL 3.1.2 1 Aug 2023), cryptography 41.0.3, Platform Linux-6.5.5-200.fc38.x86_64-x86_64-with-glibc2.37
INFO Enabled addons: [] ({"crawler": "<scrapy.crawler.Crawler object at 0x7fc405aaf110>"})
INFO Telnet Password: 1d21357fcef1a014
INFO Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.logstats.LogStats'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7fc405aaf110>"})
INFO Overridden settings: {'BOT_NAME': 'titlebot', 'DEPTH_LIMIT': 1, 'NEWSPIDER_MODULE': 'src.spiders', 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7', 'ROBOTSTXT_OBEY': True, 'SCHEDULER': 'src.apify.scheduler.ApifyScheduler', 'SPIDER_MODULES': ['src.spiders'], 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
INFO Enabled downloader middlewares: ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats', 'src.apify.middlewares.ApifyRetryMiddleware'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7fc405aaf110>"})
INFO Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7fc405aaf110>"})
INFO Enabled item pipelines: ['src.apify.pipelines.ActorDatasetPushPipeline', 'src.pipelines.TitleItemPipeline'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7fc405aaf110>"})
INFO Spider opened ({"spider": "<TitleSpider 'title_spider' at 0x7fc403a0c290>"})
INFO Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) ({"spider": "<TitleSpider 'title_spider' at 0x7fc403a0c290>"})
INFO TelnetConsole starting on 6023
INFO Telnet console listening on 127.0.0.1:6023 ({"crawler": "<scrapy.crawler.Crawler object at 0x7fc405aaf110>"})
INFO TitleSpider is parsing <200 https://apify.com>...
INFO TitleSpider is parsing <200 https://apify.com/templates>...
INFO TitleSpider is parsing <200 https://apify.com/>...
INFO TitleSpider is parsing <200 https://apify.com/enterprise>...
INFO TitleSpider is parsing <200 https://crawlee.dev>...
INFO TitleSpider is parsing <200 https://apify.com/store>...
INFO TitleSpider is parsing <200 https://apify.com/actors>...
INFO TitleSpider is parsing <200 https://docs.apify.com/>...
INFO TitleSpider is parsing <200 https://apify.com/storage>...
INFO TitleSpider is parsing <200 https://apify.com/proxy>...
INFO TitleSpider is parsing <200 https://apify.com/integrations>...
INFO TitleSpider is parsing <200 https://apify.com/data-for-generative-ai?ref=top_nav>...
INFO TitleSpider is parsing <200 https://blog.apify.com/>...
INFO TitleSpider is parsing <200 https://apify.com/partners>...
INFO TitleSpider is parsing <200 https://apify.com/about>...
INFO TitleSpider is parsing <200 https://apify.com/ideas>...
INFO TitleSpider is parsing <200 https://apify.com/pricing>...
INFO TitleSpider is parsing <200 https://docs.apify.com>...
INFO TitleSpider is parsing <200 https://docs.apify.com/academy/web-scraping-for-beginners>...
INFO TitleSpider is parsing <200 https://apify.com/>...
INFO TitleSpider is parsing <200 https://apify.com/>...
INFO TitleSpider is parsing <200 https://apify.com/>...
INFO TitleSpider is parsing <200 https://apify.com/>...
INFO TitleSpider is parsing <200 https://docs.apify.com/academy/apify-platform>...
INFO TitleSpider is parsing <200 https://apify.com/>...
INFO TitleSpider is parsing <200 https://apify.com/>...
INFO TitleSpider is parsing <200 https://apify.com/partners/actor-developers>...
INFO TitleSpider is parsing <200 https://apify.com/use-cases>...
INFO TitleSpider is parsing <200 https://apify.com/data-for-generative-ai>...
INFO TitleSpider is parsing <200 https://discord.com/invite/jyEM2PRvMU>...
INFO TitleSpider is parsing <200 https://apify.com/product-matching-ai>...
INFO Ignoring response <403 https://www.g2.com/products/apify/reviews>: HTTP status code is not handled or not allowed ({"spider": "<TitleSpider 'title_spider' at 0x7fc403a0c290>"})
INFO TitleSpider is parsing <200 https://apify.com/success-stories>...
INFO TitleSpider is parsing <200 https://console.apify.com/sign-in>...
INFO TitleSpider is parsing <200 https://apify.com/store/scrapers/universal-web-scrapers>...
INFO TitleSpider is parsing <200 https://apify.com/>...
INFO TitleSpider is parsing <200 https://apify.com/>...
INFO TitleSpider is parsing <200 https://apify.com/enterprise>...
INFO TitleSpider is parsing <200 https://apify.com/>...
INFO TitleSpider is parsing <200 https://apify.com/streamers/youtube-scraper>...
INFO Ignoring response <403 https://www.trustradius.com/products/apify/reviews>: HTTP status code is not handled or not allowed ({"spider": "<TitleSpider 'title_spider' at 0x7fc403a0c290>"})
INFO TitleSpider is parsing <200 https://console.apify.com>...
INFO Ignoring response <403 https://crozdesk.com/it/platform-as-a-service-paas/apify>: HTTP status code is not handled or not allowed ({"spider": "<TitleSpider 'title_spider' at 0x7fc403a0c290>"})
INFO TitleSpider is parsing <200 https://console.apify.com/sign-up>...
INFO TitleSpider is parsing <200 https://apify.com/terms-of-use>...
INFO TitleSpider is parsing <200 https://apify.com/privacy-policy>...
INFO TitleSpider is parsing <200 https://apify.com/quacker/twitter-scraper>...
INFO TitleSpider is parsing <200 https://apify.com/cookie-policy>...
INFO TitleSpider is parsing <200 https://apify.com/apify/cheerio-scraper>...
INFO Ignoring response <403 https://www.capterra.com/reviews/150854/Apify>: HTTP status code is not handled or not allowed ({"spider": "<TitleSpider 'title_spider' at 0x7fc403a0c290>"})
INFO TitleSpider is parsing <200 https://apify.com/apify/web-scraper>...
INFO TitleSpider is parsing <200 https://docs.apify.com/cli/>...
INFO TitleSpider is parsing <200 https://apify.com/compass/crawler-google-places>...
INFO TitleSpider is parsing <200 https://consent.youtube.com/ml?continue=https://www.youtube.com/apify?cbrd%3D1&gl=CZ&hl=en&cm=2&pc=yt&src=1>...
INFO TitleSpider is parsing <200 https://apify.com/apify/puppeteer-scraper>...
INFO TitleSpider is parsing <200 https://github.com/apify>...
INFO TitleSpider is parsing <200 https://apify.com/junglee/amazon-crawler>...
INFO TitleSpider is parsing <200 https://apify.com/voyager/booking-scraper>...
INFO TitleSpider is parsing <200 https://help.apify.com/en/>...
INFO TitleSpider is parsing <200 https://stackoverflow.com/questions/tagged/apify>...
INFO Closing spider (finished) ({"spider": "<TitleSpider 'title_spider' at 0x7fc403a0c290>"})
INFO Dumping Scrapy stats: {'downloader/exception_count': 2, 'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 2, 'downloader/request_bytes': 21499, 'downloader/request_count': 84, 'downloader/request_method_count/GET': 84, 'downloader/response_bytes': 2156228, 'downloader/response_count': 84, 'downloader/response_status_count/200': 70, 'downloader/response_status_count/302': 4, 'downloader/response_status_count/308': 3, 'downloader/response_status_count/403': 6, 'downloader/response_status_count/404': 1, 'elapsed_time_seconds': 3.797489, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2023, 10, 2, 18, 8, 25, 118412, tzinfo=datetime.timezone.utc), 'httpcompression/response_bytes': 10134095, 'httpcompression/response_count': 73, 'httperror/response_ignored_count': 4, 'httperror/response_ignored_status_count/403': 4, 'item_scraped_count': 56, 'log_count/INFO': 71, 'memusage/max': 75239424, 'memusage/startup': 75239424, 'request_depth_max': 1, 'response_received_count': 77, 'robotstxt/forbidden': 2, 'robotstxt/request_count': 17, 'robotstxt/response_count': 17, 'robotstxt/response_status_count/200': 14, 'robotstxt/response_status_count/403': 2, 'robotstxt/response_status_count/404': 1, 'start_time': datetime.datetime(2023, 10, 2, 18, 8, 21, 320923, tzinfo=datetime.timezone.utc)} ({"spider": "<TitleSpider 'title_spider' at 0x7fc403a0c290>"})
INFO Spider closed (finished) ({"spider": "<TitleSpider 'title_spider' at 0x7fc403a0c290>"})
INFO (TCP Port 6023 Closed)
INFO Exiting actor ({"exit_code": 0})
```

### After

```
$ apify run --purge
Info: All default local stores were purged.
Run: /home/vdusek/Apify/actor-templates/templates/python-scrapy/.venv/bin/python3 -m src
[apify] INFO Initializing actor...
[apify] INFO System info ({"apify_sdk_version": "1.1.4", "apify_client_version": "1.4.1", "python_version": "3.11.5", "os": "linux"})
[apify] INFO Actor is being executed...
[scrapy.utils.log] INFO Scrapy 2.11.0 started (bot: titlebot)
[scrapy.utils.log] INFO Versions: lxml 4.9.3.0, libxml2 2.10.3, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.11.5 (main, Aug 28 2023, 00:00:00) [GCC 13.2.1 20230728 (Red Hat 13.2.1-1)], pyOpenSSL 23.2.0 (OpenSSL 3.1.2 1 Aug 2023), cryptography 41.0.3, Platform Linux-6.5.5-200.fc38.x86_64-x86_64-with-glibc2.37
[scrapy.addons] INFO Enabled addons: [] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f9c770fac10>"})
[scrapy.extensions.telnet] INFO Telnet Password: 565a012bc27d0fc0
[scrapy.middleware] INFO Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.logstats.LogStats'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f9c770fac10>"})
[scrapy.crawler] INFO Overridden settings: {'BOT_NAME': 'titlebot', 'DEPTH_LIMIT': 1, 'NEWSPIDER_MODULE': 'src.spiders', 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7', 'ROBOTSTXT_OBEY': True, 'SCHEDULER': 'src.apify.scheduler.ApifyScheduler', 'SPIDER_MODULES': ['src.spiders'], 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
[scrapy.middleware] INFO Enabled downloader middlewares: ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats', 'src.apify.middlewares.ApifyRetryMiddleware'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f9c770fac10>"})
[scrapy.middleware] INFO Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f9c770fac10>"})
[scrapy.middleware] INFO Enabled item pipelines: ['src.apify.pipelines.ActorDatasetPushPipeline', 'src.pipelines.TitleItemPipeline'] ({"crawler": "<scrapy.crawler.Crawler object at 0x7f9c770fac10>"})
[scrapy.core.engine] INFO Spider opened ({"spider": "<TitleSpider 'title_spider' at 0x7f9c76fe43d0>"})
[scrapy.extensions.logstats] INFO Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) ({"spider": "<TitleSpider 'title_spider' at 0x7f9c76fe43d0>"})
[twisted] INFO TelnetConsole starting on 6023
[scrapy.extensions.telnet] INFO Telnet console listening on 127.0.0.1:6023 ({"crawler": "<scrapy.crawler.Crawler object at 0x7f9c770fac10>"})
[apify] INFO TitleSpider is parsing <200 https://apify.com>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/data-for-generative-ai?ref=top_nav>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/templates>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/enterprise>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/integrations>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/storage>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/actors>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/proxy>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/partners>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/partners/actor-developers>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/data-for-generative-ai>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/product-matching-ai>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/use-cases>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/about>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/success-stories>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/ideas>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/pricing>...
[apify] INFO TitleSpider is parsing <200 https://docs.apify.com/academy/web-scraping-for-beginners>...
[apify] INFO TitleSpider is parsing <200 https://docs.apify.com/>...
[apify] INFO TitleSpider is parsing <200 https://docs.apify.com/academy/apify-platform>...
[apify] INFO TitleSpider is parsing <200 https://docs.apify.com>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/>...
[apify] INFO TitleSpider is parsing <200 https://blog.apify.com/>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/streamers/youtube-scraper>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/apify/web-scraper>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/compass/crawler-google-places>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/quacker/twitter-scraper>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/apify/cheerio-scraper>...
[apify] INFO TitleSpider is parsing <200 https://crawlee.dev>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/apify/puppeteer-scraper>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/store>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/enterprise>...
[apify] INFO TitleSpider is parsing <200 https://discord.com/invite/jyEM2PRvMU>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/>...
[scrapy.spidermiddlewares.httperror] INFO Ignoring response <403 https://crozdesk.com/it/platform-as-a-service-paas/apify>: HTTP status code is not handled or not allowed ({"spider": "<TitleSpider 'title_spider' at 0x7f9c76fe43d0>"})
[scrapy.spidermiddlewares.httperror] INFO Ignoring response <403 https://www.trustradius.com/products/apify/reviews>: HTTP status code is not handled or not allowed ({"spider": "<TitleSpider 'title_spider' at 0x7f9c76fe43d0>"})
[scrapy.spidermiddlewares.httperror] INFO Ignoring response <403 https://www.g2.com/products/apify/reviews>: HTTP status code is not handled or not allowed ({"spider": "<TitleSpider 'title_spider' at 0x7f9c76fe43d0>"})
[apify] INFO TitleSpider is parsing <200 https://apify.com/terms-of-use>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/privacy-policy>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/cookie-policy>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/junglee/amazon-crawler>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/voyager/booking-scraper>...
[apify] INFO TitleSpider is parsing <200 https://console.apify.com/sign-in>...
[apify] INFO TitleSpider is parsing <200 https://docs.apify.com/cli/>...
[apify] INFO TitleSpider is parsing <200 https://consent.youtube.com/ml?continue=https://www.youtube.com/apify?cbrd%3D1&gl=CZ&hl=en&cm=2&pc=yt&src=1>...
[scrapy.spidermiddlewares.httperror] INFO Ignoring response <403 https://www.capterra.com/reviews/150854/Apify>: HTTP status code is not handled or not allowed ({"spider": "<TitleSpider 'title_spider' at 0x7f9c76fe43d0>"})
[apify] INFO TitleSpider is parsing <200 https://github.com/apify>...
[apify] INFO TitleSpider is parsing <200 https://console.apify.com>...
[apify] INFO TitleSpider is parsing <200 https://console.apify.com/sign-up>...
[apify] INFO TitleSpider is parsing <200 https://help.apify.com/en/>...
[apify] INFO TitleSpider is parsing <200 https://stackoverflow.com/questions/tagged/apify>...
[apify] INFO TitleSpider is parsing <200 https://apify.com/store/scrapers/universal-web-scrapers>...
[scrapy.core.engine] INFO Closing spider (finished) ({"spider": "<TitleSpider 'title_spider' at 0x7f9c76fe43d0>"})
[scrapy.statscollectors] INFO Dumping Scrapy stats: {'downloader/exception_count': 2, 'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 2, 'downloader/request_bytes': 21563, 'downloader/request_count': 84, 'downloader/request_method_count/GET': 84, 'downloader/response_bytes': 2156572, 'downloader/response_count': 84, 'downloader/response_status_count/200': 70, 'downloader/response_status_count/302': 4, 'downloader/response_status_count/308': 3, 'downloader/response_status_count/403': 6, 'downloader/response_status_count/404': 1, 'elapsed_time_seconds': 2.714584, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2023, 10, 2, 17, 56, 23, 977190, tzinfo=datetime.timezone.utc), 'httpcompression/response_bytes': 10134231, 'httpcompression/response_count': 73, 'httperror/response_ignored_count': 4, 'httperror/response_ignored_status_count/403': 4, 'item_scraped_count': 56, 'log_count/INFO': 71, 'memusage/max': 75722752, 'memusage/startup': 75722752, 'request_depth_max': 1, 'response_received_count': 77, 'robotstxt/forbidden': 2, 'robotstxt/request_count': 17, 'robotstxt/response_count': 17, 'robotstxt/response_status_count/200': 14, 'robotstxt/response_status_count/403': 2, 'robotstxt/response_status_count/404': 1, 'start_time': datetime.datetime(2023, 10, 2, 17, 56, 21, 262606, tzinfo=datetime.timezone.utc)} ({"spider": "<TitleSpider 'title_spider' at 0x7f9c76fe43d0>"})
[scrapy.core.engine] INFO Spider closed (finished) ({"spider": "<TitleSpider 'title_spider' at 0x7f9c76fe43d0>"})
[twisted] INFO (TCP Port 6023 Closed)
[apify] INFO Exiting actor ({"exit_code": 0})
```

### After (screenshot with colored output)

![image](https://github.com/apify/apify-sdk-python/assets/25082181/686ffc2a-df44-44d0-aacb-6d2dd39229c8)
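
For readers who want to reproduce the `[logger name] LEVEL message` layout outside the SDK, here is a minimal sketch using only the standard `logging` module. It is not the SDK's actual formatter: the class name `NamePrefixFormatter` and the helpers `format_example` and `configure_root_logger` are hypothetical, and the sketch leaves out the coloring and the trailing JSON "extra" fields visible in the output above.

```python
import logging


class NamePrefixFormatter(logging.Formatter):
    """Hypothetical formatter that puts the logger name first on each line."""

    def format(self, record: logging.LogRecord) -> str:
        # record.name is the name passed to logging.getLogger(...);
        # getMessage() merges the message with its %-style arguments.
        return f"[{record.name}] {record.levelname} {record.getMessage()}"


def format_example(logger_name: str, message: str) -> str:
    """Format one INFO record by hand, without touching global logging state."""
    record = logging.LogRecord(
        name=logger_name,
        level=logging.INFO,
        pathname="example.py",  # placeholder path, only used for illustration
        lineno=0,
        msg=message,
        args=None,
        exc_info=None,
    )
    return NamePrefixFormatter().format(record)


def configure_root_logger() -> None:
    """Attach the formatter to the root logger so every library logger uses it."""
    handler = logging.StreamHandler()
    handler.setFormatter(NamePrefixFormatter())
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)


if __name__ == "__main__":
    configure_root_logger()
    logging.getLogger("apify").info("Initializing actor...")
    logging.getLogger("scrapy.core.engine").info("Spider opened")
```

Because the handler sits on the root logger, records from any named logger propagate to it and pick up their own name as the prefix, which is what makes mixed output from several libraries easy to tell apart.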