docs: added guide for result storages (Dataset, KeyValueStore) #587

Open
wants to merge 7 commits into
base: master

Conversation

Manish-k723

Description

This PR adds documentation on Crawlee's result storage types, specifically the Key-Value Store and Dataset, providing usage examples and file structures for efficient data management.

Collaborator

@vdusek vdusek left a comment

It should be a guide for Crawlee Python, but your code samples are in JS.

If you want to continue working on that, update the code samples to Python. Also, please use the same structure as we have in other guides (docs/guides) - code samples are in separate files, we use links to API docs, ...

@Manish-k723
Author

Manish-k723 commented Oct 16, 2024

Hi @vdusek, I've updated the PR as requested:

  1. Rewrote the guide for Crawlee for Python, which I had missed earlier.
  2. Restructured the code samples to match the other guide docs.

Please let me know if you see any other issues in this PR. Closes #479

@Manish-k723
Author

Manish-k723 commented Oct 19, 2024

Hi @vdusek, please check this PR and share your feedback. I am keen to keep working on this and to contribute more to Crawlee.

@vdusek
Collaborator

vdusek commented Oct 21, 2024

Hey @Manish-k723, CI checks are not passing.

@Manish-k723
Author

Manish-k723 commented Oct 21, 2024

Hi @vdusek, I've resolved the CI errors. Please let me know if anything is still incorrect.

docs/guides/result_storage.mdx: 4 resolved review threads (1 outdated)

Every Crawlee project run is linked to a default dataset, which is generally used to store the results specific to that crawler execution. Utilizing this dataset is optional.

In Crawlee, datasets are represented by the Dataset class. To facilitate writing to the default dataset, Crawlee provides the `Dataset.pushData()` function.
Collaborator

Identifiers are wrong. Probably just blindly copy-pasted from JS...
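
To illustrate the mismatch: the JS version of Crawlee uses camelCase method names, whereas Python libraries conventionally use snake_case. The sketch below uses only the standard library to show the mechanical conversion. `to_snake_case` is a hypothetical helper, not part of Crawlee, and the converted names should still be confirmed against the Crawlee for Python API docs before they go into the guide.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase identifier such as 'pushData' to 'push_data'."""
    # Insert an underscore before every uppercase letter that follows
    # a lowercase letter or digit, then lowercase the whole name.
    return re.sub(r'(?<=[a-z0-9])([A-Z])', r'_\1', name).lower()


# JS-style method names that appear in the guide text under review.
js_methods = ['pushData', 'getValue', 'setValue']

for js_name in js_methods:
    print(f'{js_name} -> {to_snake_case(js_name)}')
```

Class names such as `Dataset` and `KeyValueStore` are PascalCase in both languages, so only the method names need attention here.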


Every Crawlee project run is tied to a default key-value store. By convention, the project’s input and output are saved in this default key-value store under the keys `INPUT` and `OUTPUT`, respectively. Typically, both input and output are in JSON format, though other formats are also acceptable.

In Crawlee, the key-value store is represented by the KeyValueStore class. To facilitate easy access to the default key-value store, Crawlee provides the functions `KeyValueStore.getValue()` and `KeyValueStore.setValue()`.
Collaborator

Identifiers are wrong. Probably just blindly copy-pasted from JS...
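
A check along these lines could catch JS-style identifiers in a Python guide before review. This is only a sketch under stated assumptions: `find_camel_case_calls` and the regular expressions are illustrative, not part of Crawlee or its CI, and it only looks at inline-code calls written as `` `Name.method()` ``.

```python
import re

# Matches inline-code calls such as `KeyValueStore.getValue()` in Markdown/MDX text.
INLINE_CALL = re.compile(r'`([A-Za-z_][\w.]*)\(\)`')

# A method name that starts lowercase and later contains an uppercase letter.
CAMEL_CASE = re.compile(r'[a-z][a-z0-9]*[A-Z]')


def find_camel_case_calls(text: str) -> list[str]:
    """Return inline-code calls whose method name looks like JS-style camelCase."""
    hits = []
    for match in INLINE_CALL.finditer(text):
        qualified = match.group(1)
        # Only inspect the last segment; PascalCase class names are fine in Python.
        method = qualified.rsplit('.', 1)[-1]
        if CAMEL_CASE.match(method):
            hits.append(qualified)
    return hits


sample = (
    'Crawlee provides the functions `KeyValueStore.getValue()` '
    'and `KeyValueStore.setValue()`.'
)
print(find_camel_case_calls(sample))
```

Run against the paragraph quoted above, it flags both calls; a snake_case call such as `` `Dataset.push_data()` `` passes cleanly.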

docs/guides/result_storage.mdx: 2 resolved review threads
Development

Successfully merging this pull request may close these issues.

Create a new guide for result storages (Dataset, KeyValueStore)
2 participants