Releases: apify/crawlee-python
0.5.2
0.5.1
0.5.1 (2025-01-07)
🐛 Bug Fixes
- Make result of RequestList.is_empty independent of fetch_next_request calls (#876) (d50249e) by @janbuchar
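The fix above concerns a request list whose emptiness check should not depend on whether a request has already been fetched. A minimal sketch of one way to get that behaviour, assuming a lazily consumed iterator with a one-item look-ahead buffer, is shown below. `PeekableRequestList` is a hypothetical, synchronous stand-in written for illustration; it is not the library's `RequestList` implementation.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


class PeekableRequestList:
    """Hypothetical stand-in for a request list backed by a lazy iterator."""

    def __init__(self, urls: Iterable[str]) -> None:
        self._iterator: Iterator[str] = iter(urls)
        # Holds at most one look-ahead item, so emptiness can be checked
        # without handing the item out to a caller.
        self._lookahead: list[str] = []

    def _fill_lookahead(self) -> None:
        # Pull one item from the iterator only if nothing is buffered yet.
        if not self._lookahead:
            try:
                self._lookahead.append(next(self._iterator))
            except StopIteration:
                pass

    def is_empty(self) -> bool:
        # The answer is the same before and after any number of fetches,
        # because the check peeks instead of relying on fetch state.
        self._fill_lookahead()
        return not self._lookahead

    def fetch_next_request(self) -> str | None:
        self._fill_lookahead()
        return self._lookahead.pop() if self._lookahead else None
```

Calling `is_empty()` repeatedly before the first `fetch_next_request()` returns the same answer each time, and no item is lost by peeking.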
0.5.0
0.5.0 (2025-01-02)
🚀 Features
- Add possibility to use None as no proxy in tiered proxies (#760) (0fbd017) by @Pijukatel
- Add `use_state` context method (#682) (868b41e) by @Mantisus
- Add pre-navigation hooks router to AbstractHttpCrawler (#791) (0f23205) by @Pijukatel
- Add example of how to integrate Camoufox into PlaywrightCrawler (#789) (246cfc4) by @Pijukatel
- Expose event types, improve on/emit signature, allow parameterless listeners (#800) (c102c4c) by @janbuchar
- Add stop method to BasicCrawler (#807) (6d01af4) by @Pijukatel
- Add `html_to_text` helper function (#792) (2b9d970) by @Pijukatel
- [breaking] Implement `RequestManagerTandem`, remove `add_request` from `RequestList`, accept any iterable in `RequestList` constructor (#777) (4172652) by @janbuchar
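To illustrate what "None as no proxy in tiered proxies" can mean in practice, here is a small sketch under stated assumptions: tiers are plain lists of proxy URLs, a `None` entry stands for a direct connection, and URLs within a tier are rotated round-robin. The function `pick_proxy`, its parameters, and the example proxy URLs are hypothetical and do not reflect the library's internal selection logic.

```python
from __future__ import annotations


def pick_proxy(
    tiers: list[list[str | None]],
    tier_index: int,
    request_number: int,
) -> str | None:
    """Return the proxy URL for a request, or None for a direct connection.

    Hypothetical helper: tier selection and rotation are simplified.
    """
    # Clamp to the last tier so an escalated index never goes out of range.
    tier = tiers[min(tier_index, len(tiers) - 1)]
    # Rotate through the URLs of the chosen tier.
    return tier[request_number % len(tier)]


# Assumed example configuration: the first tier means "no proxy",
# the second tier holds two placeholder proxy URLs.
EXAMPLE_TIERS: list[list[str | None]] = [
    [None],
    ["http://proxy-a.example:8000", "http://proxy-b.example:8000"],
]
```

With this layout, requests start without a proxy and only move to the proxied tier when the caller raises `tier_index`.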
🐛 Bug Fixes
- Fix circular import in `KeyValueStore` (#805) (8bdf49d) by @Mantisus
- [breaking] Refactor service usage to rely on `service_locator` (#691) (1d31c6c) by @vdusek
- Pass `verify` in httpx client (#802) (074d083) by @Mantisus
- Fix `page_options` for `PlaywrightBrowserPlugin` (#796) (bd3bdd4) by @Mantisus
- Fix event migrating handler in `RequestQueue` (#825) (fd6663f) by @Mantisus
- Respect user configuration for work with status codes (#812) (8daf4bd) by @Mantisus
- `abort-on-error` for successive runs (#834) (0cea673) by @Mantisus
- Relax ServiceLocator restrictions (#837) (aa3667f) by @janbuchar
- Fix typo in exports (#841) (8fa6ac9) by @janbuchar
Refactor
- [breaking] Refactor HttpCrawler, BeautifulSoupCrawler, ParselCrawler inheritance (#746) (9d3c269) by @Pijukatel
- [breaking] Remove `json_` and `order_no` from `Request` (#788) (5381d13) by @Mantisus
- [breaking] Rename PwPreNavContext to PwPreNavCrawlingContext (#827) (84b61a3) by @vdusek
- [breaking] Rename PlaywrightCrawler kwargs: browser_options, page_options (#831) (ffc6048) by @Pijukatel
- [breaking] Update the crawlers & storage clients structure (#828) (0ba04d1) by @vdusek
0.4.5
0.4.5 (2024-12-06)
🚀 Features
- Improve project bootstrapping (#538) (367899c) by @janbuchar
🐛 Bug Fixes
- Add upper bound of HTTPX version (#775) (b59e34d) by @vdusek
- Fix incorrect use of desired concurrency ratio (#780) (d1f8bfb) by @Pijukatel
- Remove pydantic constraint <2.10.0 and update timedelta validator, serializer type hints (#757) (c0050c0) by @Pijukatel
0.4.4
0.4.3
0.4.2
0.4.2 (2024-11-20)
🐛 Bug Fixes
- Respect custom HTTP headers in `PlaywrightCrawler` (#685) (a84125f) by @Mantisus
- Fix serialization payload in Request. Fix Docs for Post Request (#683) (e8b4d2d) by @Mantisus
- Accept string payload in the Request constructor (#697) (19f5add) by @vdusek
- Fix snapshots handling (#692) (4016c0d) by @Pijukatel
0.4.1
0.4.1 (2024-11-11)
🚀 Features
- Add `max_crawl_depth` option to `BasicCrawler` (#637) (77deaa9) by @Prathamesh010
- Add BeautifulSoupParser type alias (#674) (b2cf88f) by @Pijukatel
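A depth limit such as `max_crawl_depth` can be pictured as a breadth-first traversal that stops following links once a page's depth reaches the limit. The sketch below assumes the start URL has depth 0 and that links found on a page at the maximum depth are not enqueued. The `crawl` function and the in-memory `links` graph are hypothetical and stand in for real fetching; they are not the crawler's actual code.

```python
from collections import deque


def crawl(start: str, links: dict[str, list[str]], max_crawl_depth: int) -> list[str]:
    """Return URLs in visit order, following links up to max_crawl_depth."""
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])  # (url, depth) pairs; the start URL has depth 0

    while queue:
        url, depth = queue.popleft()
        # Pages already at the limit are processed but their links are ignored.
        if depth >= max_crawl_depth:
            continue
        for next_url in links.get(url, []):
            if next_url not in seen:
                seen.add(next_url)
                visited.append(next_url)
                queue.append((next_url, depth + 1))

    return visited
```

For a graph where `a` links to `b` and `c`, and `b` links to `d`, a limit of 1 visits `a`, `b`, and `c` but never reaches `d`.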
🐛 Bug Fixes
- Fix total_size usage in memory size monitoring (#661) (c2a3239) by @janbuchar
- Add HttpHeaders to module exports (#664) (f0c5ca7) by @vdusek
- Fix unhandled ValueError in request handler result processing (#666) (0a99d7f) by @janbuchar
- Fix BaseDatasetClient.iter_items type hints (#680) (a968b1b) by @Pijukatel
0.4.0
0.4.0 (2024-11-01)
🚀 Features
- Add headers in unique key computation (#609) (6c4746f) by @Prathamesh010
- Add `pre_navigation_hooks` to `PlaywrightCrawler` (#631) (5dd5b60) by @Prathamesh010
- Add `always_enqueue` option to bypass URL deduplication (#621) (4e59fa4) by @Rutam21
- Split and add extra configuration to export_data method (#580) (6751635) by @deshansh
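Two of the features above touch request deduplication: headers taking part in the unique key, and an `always_enqueue` option that bypasses deduplication. A rough sketch of how such a key could be derived follows. The function `compute_unique_key`, its hashing scheme, and the header normalization (lowercased names, stripped whitespace) are assumptions made for illustration, not the library's actual algorithm.

```python
from __future__ import annotations

import hashlib
import uuid


def compute_unique_key(
    url: str,
    method: str = "GET",
    headers: dict[str, str] | None = None,
    *,
    always_enqueue: bool = False,
) -> str:
    """Build a deduplication key from the method, URL, and headers (hypothetical)."""
    # Normalize headers so case and surrounding whitespace do not change the key.
    normalized = sorted(
        (name.strip().lower(), value.strip()) for name, value in (headers or {}).items()
    )
    raw = "|".join([method.upper(), url, repr(normalized)])
    key = hashlib.sha256(raw.encode("utf-8")).hexdigest()

    # A random suffix makes every key distinct, so the request is never
    # treated as a duplicate of an earlier one.
    if always_enqueue:
        key = f"{key}_{uuid.uuid4().hex}"
    return key
```

Two requests to the same URL with different headers get different keys, while `always_enqueue=True` yields a fresh key on every call.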
🐛 Bug Fixes
- Use strip in headers normalization (#614) (a15b21e) by @vdusek
- Merge payload and data fields of Request (#542) (d06fcef) by @vdusek
- Default ProxyInfo port if httpx.URL port is None (#619) (8107a6f) by @steffansafey
Chore
0.3.9
0.3.9 (2024-10-23)
🚀 Features
- Key-value store context helpers (#584) (fc15622) by @janbuchar
- Added get_public_url method to KeyValueStore (#572, closes #514) (3a4ba8f) by @akshay11298
🐛 Bug Fixes
- Workaround for JSON value typing problems (#581, closes #563) (403496a) by @janbuchar