
Improve API docs of public components #304

Open
6 of 21 tasks
vdusek opened this issue Jul 15, 2024 · 8 comments · May be fixed by #613
Labels
documentation Improvements or additions to documentation. hacktoberfest t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@vdusek
Collaborator

vdusek commented Jul 15, 2024

Improve API docs of the public components, mainly:

  • BasicCrawler
  • HttpCrawler
  • BeautifulSoupCrawler
  • ParselCrawler
  • PlaywrightCrawler
  • Dataset
  • KeyValueStore
  • RequestQueue
  • MemoryStorageClient
  • HttpxHttpClient
  • CurlImpersonateHttpClient
  • Configuration
  • EventManager
  • LocalEventManager
  • Request
  • Session
  • SessionPool
  • BrowserPool
  • PlaywrightBrowserController
  • PlaywrightBrowserPlugin
  • Statistics
@vdusek added the documentation and t-tooling labels Jul 15, 2024
@vdusek vdusek added this to the 95th sprint - Tooling team milestone Jul 30, 2024
vdusek added a commit that referenced this issue Aug 27, 2024
### Description

- Improve docstrings of storage classes.
- I also changed the list of main classes to at least roughly reflect
the current public interface.

### Issues

- Relates: #304

### Testing

- Website was rendered locally.

### Checklist

- [x] CI passed
vdusek added a commit that referenced this issue Sep 18, 2024
@vdusek vdusek removed their assignment Sep 30, 2024
@belloibrahv

Hi @vdusek,

I’d like to work on this issue to improve the API docs for the listed public components. Before starting, I have a few quick questions:

  1. Are there any specific guidelines or formats I should follow for the documentation?
  2. Should the focus be on adding descriptions and examples, or should I also cover parameter types, return values, etc.?

Looking forward to your guidance!

@vdusek
Collaborator Author

vdusek commented Oct 2, 2024

Hi @belloibrahv, thanks for your interest in Crawlee.

> Are there any specific guidelines or formats I should follow for the documentation?

> Should the focus be on adding descriptions and examples, or should I also cover parameter types, return values, etc.?

@belloibrahv

Hi @vdusek,

I'd like to contribute to improving the API docs for the public components as described in this issue. After reviewing the provided references, guidelines, and the list of components, here's how I plan to approach this task:

  1. Focus on improving class docstrings for the unchecked components, starting with:
     • BasicCrawler
     • HttpCrawler
     • BeautifulSoupCrawler
  2. Follow the Google style guide for docstring formatting, ensuring clarity and consistency.
  3. For each class, I'll provide:
     • a brief one-line summary of the class's purpose
     • a more detailed description of its functionality and use cases
     • any important notes or caveats about usage
  4. Where appropriate, I'll include a simple example of how to use the class.
  5. I'll ensure the formatting is correct for proper rendering in the API reference.
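For concreteness, a Google-style class docstring of the shape outlined in this plan might look like the sketch below. `ExampleCrawler` and its `max_requests` parameter are hypothetical, used only to illustrate the layout, not part of Crawlee:

```python
from __future__ import annotations

import inspect


class ExampleCrawler:
    """Crawl a list of URLs and store the extracted records.

    A longer description goes here: what the class is for, how it relates
    to other components, and any caveats about its usage.

    Args:
        max_requests: Maximum number of requests to process, or None for no limit.

    Example:
        >>> crawler = ExampleCrawler(max_requests=10)
        >>> crawler.max_requests
        10
    """

    def __init__(self, max_requests: int | None = None) -> None:
        self.max_requests = max_requests


# The one-line summary is the first line of the cleaned docstring; API
# reference generators typically render it as the short description.
summary = inspect.getdoc(ExampleCrawler).splitlines()[0]
print(summary)
```

The `Args:` and `Example:` sections are the standard Google-style headings that docstring renderers recognize.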

I plan to start with the BasicCrawler class and submit a pull request for review. This will allow for early feedback before proceeding with the other classes.

Is this approach aligned with your expectations? Do you have any additional guidance or specific areas you'd like me to focus on within these unchecked components?

Thank you for the opportunity to contribute to Crawlee!

Best regards,
@belloibrahv

@vdusek
Collaborator Author

vdusek commented Oct 2, 2024

@belloibrahv That would be great, thanks.

@belloibrahv

Hi @vdusek,

I hope you're doing well! I've completed the changes for the BasicCrawler docstring updates as discussed in issue #304. The PR is now ready, and I'd greatly appreciate it if you could review it and provide any feedback or suggestions for improvement.

Looking forward to your thoughts!

@belloibrahv

@vdusek, thank you for your feedback. I've reviewed the JS API class you mentioned (https://github.com/apify/crawlee/blob/master/packages/basic-crawler/src/internals/basic-crawler.ts) and understand your concerns. For issue #304, would it still be appropriate to proceed if I ensure strict adherence to the guidelines and requirements this time?

Additionally, since you mentioned that BasicCrawler might not be the best starting point (in your early review), could you suggest a more suitable class or component to begin with? Any further guidance on how I can improve my approach to make the PR smoother to review would be greatly appreciated.

Thank you for your help and patience.

@vdusek
Collaborator Author

vdusek commented Oct 21, 2024

Hi @belloibrahv, if you're still interested in working on this, I can provide further guidelines on what we're expecting.

I recommend focusing on one of the more "high-level" HTTP-based crawlers - HttpCrawler, BeautifulSoupCrawler, or ParselCrawler, as they should be more easily understood.

For inspiration, take a look at some of the already completed classes in the checklist above, such as Dataset, KeyValueStore, RequestQueue, HttpxHttpClient, CurlImpersonateHttpClient, or Request.

Please make only relevant changes. Stay within the scope of this issue/PR, and ensure that the pull request has a clear, single objective. For this PR, that means updating the class docstring. You can also modify the method docstrings if you're confident, but avoid making changes outside the intended scope.

You should now be familiar with the Google-style docstrings, as we discussed in your previous PR.

If you choose one of the HTTP-based crawlers, describe how it inherits from BasicCrawler, meaning it includes all of its features. Also explain which components it uses and how (its HTTP client and HTML parser). You can provide a short code example as well (but be sure to execute it first 🙂).
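As a rough sketch of the kind of class docstring this guidance describes: the empty `BasicCrawler` below is a stand-in so the snippet is self-contained, and the wording is an illustration, not Crawlee's actual documentation.

```python
class BasicCrawler:
    """Stand-in for crawlee's BasicCrawler, left empty for illustration."""


class BeautifulSoupCrawler(BasicCrawler):
    """A crawler that fetches pages over plain HTTP and parses them with BeautifulSoup.

    It inherits all the features of `BasicCrawler` (request queue, retries,
    session pool, statistics) and adds HTML parsing on top: each response
    body is parsed with the `beautifulsoup4` library and exposed to the
    request handler as `context.soup`. Since no browser is involved,
    client-side JavaScript is not executed; for JavaScript-heavy sites, a
    browser-based crawler such as `PlaywrightCrawler` is the better fit.
    """


# The first docstring line is what the API reference shows as the summary.
print(BeautifulSoupCrawler.__doc__.splitlines()[0])
```

The inheritance relationship, the parser it delegates to, and the pointer to an alternative crawler are exactly the three things the guidance above asks the docstring to cover.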

It should not be just about using ChatGPT or any other LLM. You can use them, of course, but first you need to understand the code and the issue you're trying to solve, so that the tool can give you meaningful output.

Choose one of the crawlers I suggested, try to understand the code, and write something meaningful.

Thank you, and good luck!

@belloibrahv

Hi, could you please review my new PR #613 when you have a moment? Thank you!

@apify apify deleted a comment from belloibrahv Oct 23, 2024