The ScrapeOps Python Requests SDK is an extension for your Python Requests based scrapers that gives you the scraping monitoring, statistics, alerting, and data validation you need straight out of the box.

Just import it into your Python project and start using our requests wrapper; the SDK will automatically monitor your scrapers and send your logs to your scraping dashboard.
Full documentation can be found here: ScrapeOps Documentation
**Features**
**Job Stats & Visualisation**
- 📈 Individual Job Progress Stats
- 📊 Compare Jobs versus Historical Jobs
- 💯 Job Stats Tracked
- ✅ Pages Scraped & Missed
- ✅ Items Parsed & Missed
- ✅ Item Field Coverage
- ✅ Runtimes
- ✅ Response Status Codes
- ✅ Success Rates & Average Latencies
- ✅ Errors & Warnings
- ✅ Bandwidth
**Health Checks & Alerts**
- 🕵️♂️ Custom Spider & Job Health Checks
- 📦 Out of the Box Alerts - Slack (More coming soon!)
- 📑 Daily Scraping Reports
**Proxy Monitoring (Coming Soon)**
- 📈 Monitor Your Proxy Account Usage
- 📉 Track Your Proxy Provider's Performance
- 📊 Compare Proxy Performance Versus Other Providers
You can get the ScrapeOps monitoring suite up and running in 3 easy steps.

```
pip install scrapeops-python-requests
```
Import and initialize the ScrapeOps logger, adding your API key:
```python
from scrapeops_python_requests.scrapeops_requests import ScrapeOpsRequests

scrapeops_logger = ScrapeOpsRequests(
    scrapeops_api_key='API_KEY_HERE',
    spider_name='SPIDER_NAME_HERE',
    job_name='JOB_NAME_HERE',
)
```
The last step is to override the standard Python requests module with our requests wrapper.

Our wrapper uses the standard Python Requests library under the hood; it simply provides a way for us to monitor the requests as they happen.

Please initialize the requests wrapper only once, near the top of your code.
```python
requests = scrapeops_logger.RequestsWrapper()
```
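Conceptually, the wrapper acts as a thin proxy around the normal request calls, recording stats for each request before handing back the response. The following is a minimal sketch of that pattern only, not the SDK's actual implementation; the `MonitoredRequests` class and `fake_get` stand-in are hypothetical, and a real wrapper would delegate to `requests.get` and report stats to the dashboard:

```python
class MonitoredRequests:
    """Toy proxy that counts calls and status codes around a request function."""

    def __init__(self, fetch):
        self._fetch = fetch  # the real request function being wrapped
        self.stats = {"requests": 0, "status_codes": {}}

    def get(self, url, **kwargs):
        # Record the request, then delegate to the wrapped function as usual.
        self.stats["requests"] += 1
        response = self._fetch(url, **kwargs)
        code = response["status_code"]
        self.stats["status_codes"][code] = self.stats["status_codes"].get(code, 0) + 1
        return response


# Stand-in for requests.get so the sketch runs without a network call.
def fake_get(url, **kwargs):
    return {"status_code": 200, "url": url}


requests = MonitoredRequests(fake_get)
for url in ["http://quotes.toscrape.com/page/1/",
            "http://quotes.toscrape.com/page/2/"]:
    requests.get(url)

print(requests.stats)  # {'requests': 2, 'status_codes': {200: 2}}
```

Because the proxy exposes the same `get` interface, existing scraping code keeps working unchanged after the swap.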
Here is a simple example showing how you can add the SDK to an existing project.
```python
from scrapeops_python_requests.scrapeops_requests import ScrapeOpsRequests

## Initialize the ScrapeOps Logger
scrapeops_logger = ScrapeOpsRequests(
    scrapeops_api_key='API_KEY_HERE',
    spider_name='DemoSpider',
    job_name='Test1',
)

## Initialize the ScrapeOps Python Requests Wrapper
requests = scrapeops_logger.RequestsWrapper()

urls = [
    'http://quotes.toscrape.com/page/1/',
    'http://quotes.toscrape.com/page/2/',
    'http://quotes.toscrape.com/page/3/',
    'http://quotes.toscrape.com/page/4/',
    'http://quotes.toscrape.com/page/5/',
]

for url in urls:
    response = requests.get(url)

    item = {'test': 'hello'}

    ## Log Scraped Item
    scrapeops_logger.item_scraped(
        response=response,
        item=item
    )
```
That's all. From here, the ScrapeOps SDK will automatically monitor your scraping jobs, collect statistics from them, and display everything in your ScrapeOps dashboard.