Please add a comprehensive example to the documentation #862

matecsaj · 2025-01-05T17:55:07Z

Please add a new example to the Crawlee Python examples page for users to follow, or somewhere else if there is a better spot.

This example would address complex, real-world scenarios where users need to combine multiple crawling techniques and technologies. By providing a fully functional, extensive example, users can copy-paste it and adapt it to their specific needs, saving them the effort of figuring out how to connect all the pieces for complicated use cases.

Proposed Workflow for the Example:

Login with Playwright: Use Playwright to log in to a site and establish a session (e.g., handling cookies, tokens, or authentication).
Crawl JavaScript-Heavy Pages: Use Playwright to navigate and crawl dynamic, JavaScript-heavy pages using the established session.
Crawl Static Pages: Leverage the session to crawl static pages using a lightweight HTTP crawler for increased speed and efficiency.
Mimic Requests: Use the session to make authenticated requests (e.g., mimicking API calls) and download JSON files.
Use RESTful API: Demonstrate how to use the established session to interact with a REST API to fetch more JSON data.
Use GraphQL: Extend the example further by including authenticated requests to a GraphQL API to fetch additional JSON data.
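
As a rough illustration of the last two steps, here is a minimal sketch of building an authenticated GraphQL request with only the Python standard library. It does not use Crawlee's API. The helper name `build_graphql_request`, the endpoint URL, and the query in the usage note are hypothetical, and it assumes the session is carried by a `Cookie` header obtained during login.

```python
import json
import urllib.request


def build_graphql_request(endpoint, query, variables, cookie_header):
    """Build (but do not send) a POST request carrying a GraphQL query.

    Hypothetical helper: `cookie_header` is assumed to be a ready-made
    Cookie header value such as "sid=abc" taken from a logged-in session.
    """
    # GraphQL over HTTP conventionally sends a JSON body with
    # "query" and "variables" keys.
    payload = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Cookie": cookie_header,
        },
        method="POST",
    )
```

A request built this way, for example `build_graphql_request("https://example.com/graphql", "query { viewer { id } }", {}, "sid=abc")`, could then be sent with `urllib.request.urlopen`. A plain REST call would look the same minus the GraphQL body.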

Value to Users:

Efficiency: Users can copy-paste the example and simply delete the sections they don’t need, instead of piecing together solutions from scratch.
Real-World Applicability: Many web scraping tasks involve a mix of JavaScript-heavy crawling, lightweight static scraping, and direct API requests. A comprehensive example would address these common, yet complex scenarios.
Ease of Learning: Beginners can see how different technologies (Playwright, HTTP crawlers, RESTful APIs, GraphQL) work together in a single project, fostering a better understanding of Crawlee's full capabilities.
Customizability: The modular nature of the example makes it adaptable to a wide range of use cases, from crawling e-commerce sites to accessing complex data sources.
Session Sharing: Demonstrate session persistence and sharing between the different tools, for instance, how to share cookies between Playwright and the HTTP crawler.
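
To make the cookie-sharing point concrete, here is a minimal sketch of turning Playwright-style cookies into a `Cookie` header that a plain HTTP client could reuse. It assumes the cookies are a list of dicts with `name`, `value`, and `domain` keys, which is the shape Playwright's `BrowserContext.cookies()` returns. The function name `cookie_header_for` is hypothetical, and the matching is simplified: it ignores path, expiry, and the `secure` flag.

```python
def cookie_header_for(cookies, host):
    """Return a Cookie header value with the cookies that apply to `host`.

    Simplified sketch: only the domain is checked. Path, expiry, and the
    secure flag are deliberately ignored.
    """
    pairs = []
    for cookie in cookies:
        domain = cookie["domain"]
        if domain.startswith("."):
            # Domain cookie: applies to the domain itself and its subdomains.
            matches = host == domain[1:] or host.endswith(domain)
        else:
            # Host-only cookie: applies to the exact host only.
            matches = host == domain
        if matches:
            pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)
```

The resulting string could be passed as the `Cookie` header of any later HTTP request to the same site. A real project would more likely rely on the crawler's own session handling than on a hand-rolled helper like this one.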

vdusek · 2025-01-06T11:33:48Z

Hi @matecsaj, thanks for your interest in Crawlee! Have you checked out our Introduction guide? I believe it addresses most of what you are asking for. That said, I am aware we are missing a login example, so I'll open a new issue to cover that (#870).

github-actions bot added the t-tooling label Jan 5, 2025

vdusek changed the title ~~Please add a comprehensive example to the documentation.~~ Please add a comprehensive example to the documentation Jan 6, 2025

vdusek mentioned this issue Jan 6, 2025

Add a login example to the documentation #870

Open

vdusek closed this as completed Jan 6, 2025
