Please add a new example to the Crawlee Python examples page for users to follow, or somewhere else if there is a better spot.
This example would address complex, real-world scenarios where users need to combine multiple crawling techniques and technologies. By providing a fully functional, extensive example, users can copy-paste it and adapt it to their specific needs, saving them the effort of figuring out how to connect all the pieces for complicated use cases.
Proposed Workflow for the Example:
Login with Playwright: Use Playwright to log in to a site and establish a session (e.g., handling cookies, tokens, or authentication).
Crawl JavaScript-Heavy Pages: Use Playwright to navigate and crawl dynamic, JavaScript-heavy pages using the established session.
Crawl Static Pages: Leverage the session to crawl static pages using a lightweight HTTP crawler for increased speed and efficiency.
Mimic Requests: Use the session to make authenticated requests (e.g., mimicking API calls) and download JSON files.
Use RESTful API: Demonstrate how to use the established session to interact with a REST API to fetch more JSON data.
Use GraphQL: Extend the example further by including authenticated requests to a GraphQL API to fetch additional JSON data.
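To make the last three steps concrete, here is a minimal sketch of what "authenticated requests" could look like once a session cookie is available. It uses only the Python standard library rather than Crawlee's own HTTP client, and the helper names, URLs, and the `sid=abc` cookie value are all hypothetical placeholders, not part of any real API.

```python
import json
import urllib.request


def rest_request(url: str, cookie_header: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries the session cookie."""
    return urllib.request.Request(
        url,
        headers={"Cookie": cookie_header, "Accept": "application/json"},
    )


def graphql_request(
    endpoint: str, query: str, variables: dict, cookie_header: str
) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST request with the same cookie."""
    # GraphQL over HTTP conventionally sends a JSON body with "query" and "variables".
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Cookie": cookie_header,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# Hypothetical usage: the cookie value would come from the earlier login step.
req = graphql_request(
    "https://example.com/graphql", "query { me { id } }", {}, "sid=abc"
)
# urllib.request.urlopen(req) would actually send it.
```

In a real example the same requests would presumably go through Crawlee's HTTP crawler so that retries and concurrency are handled by the library; the sketch only shows the shape of the requests.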
Value to Users:
Efficiency: Users can copy-paste the example and simply delete the sections they don’t need, instead of piecing together solutions from scratch.
Real-World Applicability: Many web scraping tasks involve a mix of JavaScript-heavy crawling, lightweight static scraping, and direct API requests. A comprehensive example would address these common yet complex scenarios.
Ease of Learning: Beginners can see how different technologies (Playwright, HTTP crawlers, RESTful APIs, GraphQL) work together in a single project, fostering a better understanding of Crawlee's full capabilities.
Customizability: The modular nature of the example makes it adaptable to a wide range of use cases, from crawling e-commerce sites to accessing complex data sources.
Demonstrate session persistence and sharing between the different tools. For instance, how to share cookies between Playwright and the HTTP crawler.
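As a rough illustration of the cookie-sharing part, the sketch below turns cookies in the dict format returned by Playwright's `BrowserContext.cookies()` (keys such as `name`, `value`, `domain`, `path`, `secure`) into a `Cookie` header that a plain HTTP client could send. The function name `cookie_header_for` is hypothetical, and the domain and path matching is deliberately simplified compared with real browser rules.

```python
from urllib.parse import urlsplit


def cookie_header_for(url: str, browser_cookies: list[dict]) -> str:
    """Build a Cookie header value for `url` from Playwright-style cookie dicts."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    pairs = []
    for cookie in browser_cookies:
        domain = cookie.get("domain", "").lstrip(".")
        if not domain:
            continue
        # Simplified domain match: exact host or any subdomain of the cookie domain.
        if host != domain and not host.endswith("." + domain):
            continue
        # Simplified path match: plain prefix comparison.
        if not path.startswith(cookie.get("path", "/")):
            continue
        # Secure cookies are only sent over HTTPS.
        if cookie.get("secure") and parts.scheme != "https":
            continue
        pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)


# Hypothetical cookies, shaped like the output of `await context.cookies()`.
cookies = [
    {"name": "sid", "value": "abc", "domain": ".example.com", "path": "/", "secure": True},
    {"name": "other", "value": "x", "domain": "other.org", "path": "/"},
]
header = cookie_header_for("https://shop.example.com/items", cookies)
```

The resulting string could then be attached as a request header in the HTTP crawler. Whether Crawlee offers a more direct way to hand a session from one crawler to another is exactly what the requested example would need to show.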
Hi @matecsaj, thanks for your interest in Crawlee! Have you checked out our Introduction guide? I believe it addresses most of what you are asking for. That said, I am aware we are missing a login example; I'll open a new issue to cover that (#870).