
Improve API docs of public components #304

Open
6 of 21 tasks
vdusek opened this issue Jul 15, 2024 · 8 comments · May be fixed by #613
Labels
documentation Improvements or additions to documentation. hacktoberfest t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@vdusek
Collaborator

vdusek commented Jul 15, 2024

Improve API docs of the public components, mainly:

  • BasicCrawler
  • HttpCrawler
  • BeautifulSoupCrawler
  • ParselCrawler
  • PlaywrightCrawler
  • Dataset
  • KeyValueStore
  • RequestQueue
  • MemoryStorageClient
  • HttpxHttpClient
  • CurlImpersonateHttpClient
  • Configuration
  • EventManager
  • LocalEventManager
  • Request
  • Session
  • SessionPool
  • BrowserPool
  • PlaywrightBrowserController
  • PlaywrightBrowserPlugin
  • Statistics
@vdusek added the documentation and t-tooling labels Jul 15, 2024
@vdusek vdusek added this to the 95th sprint - Tooling team milestone Jul 30, 2024
vdusek added a commit that referenced this issue Aug 27, 2024
### Description

- Improve docstrings of storage classes.
- I also changed the list of main classes to at least roughly reflect
the current public interface.

### Issues

- Relates: #304

### Testing

- Website was rendered locally.

### Checklist

- [x] CI passed
vdusek added a commit that referenced this issue Sep 18, 2024
@vdusek vdusek removed their assignment Sep 30, 2024
@belloibrahv

Hi @vdusek,

I’d like to work on this issue to improve the API docs for the listed public components. Before starting, I have a few quick questions:

  1. Are there any specific guidelines or formats I should follow for the documentation?
  2. Should the focus be on adding descriptions and examples, or should I also cover parameter types, return values, etc.?

Looking forward to your guidance!

@vdusek
Collaborator Author

vdusek commented Oct 2, 2024

Hi @belloibrahv, thanks for your interest in Crawlee.

> Are there any specific guidelines or formats I should follow for the documentation?

> Should the focus be on adding descriptions and examples, or should I also cover parameter types, return values, etc.?

@belloibrahv

Hi @vdusek,

I'd like to contribute to improving the API docs for the public components as described in this issue. After reviewing the provided references, guidelines, and the list of components, here's how I plan to approach this task:

  1. Focus on improving class docstrings for the unchecked components, starting with:
     • BasicCrawler
     • HttpCrawler
     • BeautifulSoupCrawler
  2. Follow the Google style guide for docstring formatting, ensuring clarity and consistency.
  3. For each class, I'll provide:
     • a brief one-line summary of the class's purpose
     • a more detailed description of its functionality and use cases
     • any important notes or caveats about usage
  4. Where appropriate, I'll include a simple example of how to use the class.
  5. I'll ensure the formatting is correct for proper rendering in the API reference.
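For concreteness, a Google-style class docstring of the shape outlined in this plan might look like the sketch below. `ExampleCrawler` and its `max_requests` parameter are hypothetical, used only to illustrate the layout, not part of Crawlee:

```python
from __future__ import annotations

import inspect


class ExampleCrawler:
    """Crawl a list of URLs and store the extracted records.

    A longer description goes here: what the class is for, how it relates
    to other components, and any caveats about its usage.

    Args:
        max_requests: Maximum number of requests to process, or None for no limit.

    Example:
        >>> crawler = ExampleCrawler(max_requests=10)
        >>> crawler.max_requests
        10
    """

    def __init__(self, max_requests: int | None = None) -> None:
        self.max_requests = max_requests


# The one-line summary is the first line of the cleaned docstring; API
# reference generators typically render it as the short description.
summary = inspect.getdoc(ExampleCrawler).splitlines()[0]
print(summary)
```

The `Args:` and `Example:` sections are the standard Google-style headings that docstring renderers recognize.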

I plan to start with the BasicCrawler class and submit a pull request for review. This will allow for early feedback before proceeding with the other classes.

Is this approach aligned with your expectations? Do you have any additional guidance or specific areas you'd like me to focus on within these unchecked components?

Thank you for the opportunity to contribute to Crawlee!

Best regards,
@belloibrahv

@vdusek
Collaborator Author

vdusek commented Oct 2, 2024

@belloibrahv That would be great, thanks.

@belloibrahv

Hi @vdusek,

I hope you're doing well! I've completed the changes for the BasicCrawler docstring updates as discussed in issue #304. The PR is now ready, and I'd greatly appreciate it if you could review it and provide any feedback or suggestions for improvement.

Looking forward to your thoughts!

@belloibrahv

@vdusek, thank you for your feedback. I've reviewed the JS API class you mentioned (https://github.com/apify/crawlee/blob/master/packages/basic-crawler/src/internals/basic-crawler.ts) and understand your concerns. For issue #304, would it still be appropriate to proceed if I ensure strict adherence to the guidelines and requirements this time?

Additionally, since you mentioned that BasicCrawler might not be the best starting point (in your early review), could you suggest a more suitable class or component to begin with? Any further guidance on how I can improve my approach to make the PR smoother to review would be greatly appreciated.

Thank you for your help and patience.

@vdusek
Collaborator Author

vdusek commented Oct 21, 2024

Hi @belloibrahv, if you're still interested in working on this, I can provide further guidelines on what we're expecting.

I recommend focusing on one of the more "high-level" HTTP-based crawlers - HttpCrawler, BeautifulSoupCrawler, or ParselCrawler, as they should be more easily understood.

For inspiration, take a look at some of the already completed classes in the checklist above, such as Dataset, KeyValueStore, RequestQueue, HttpxHttpClient, CurlImpersonateHttpClient, or Request.

Please make only relevant changes. Stay within the scope of this issue/PR, and ensure that the pull request has a clear, single objective. For this PR, that means updating the class docstring. You can also modify the method docstrings if you're confident, but avoid making changes outside the intended scope.

You should now be familiar with the Google-style docstrings, as we discussed in your previous PR.

If you choose one of the HTTP-based crawlers, describe how it inherits from BasicCrawler, meaning it includes all of its features. Also explain which components it uses and how (its HTTP client and HTML parser). You can provide a short code example as well (but be sure to execute it first 🙂).
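As a rough sketch of the kind of class docstring this guidance describes: the empty `BasicCrawler` below is a stand-in so the snippet is self-contained, and the wording is an illustration, not Crawlee's actual documentation.

```python
class BasicCrawler:
    """Stand-in for crawlee's BasicCrawler, left empty for illustration."""


class BeautifulSoupCrawler(BasicCrawler):
    """A crawler that fetches pages over plain HTTP and parses them with BeautifulSoup.

    It inherits all the features of `BasicCrawler` (request queue, retries,
    session pool, statistics) and adds HTML parsing on top: each response
    body is parsed with the `beautifulsoup4` library and exposed to the
    request handler as `context.soup`. Since no browser is involved,
    client-side JavaScript is not executed; for JavaScript-heavy sites, a
    browser-based crawler such as `PlaywrightCrawler` is the better fit.
    """


# The first docstring line is what the API reference shows as the summary.
print(BeautifulSoupCrawler.__doc__.splitlines()[0])
```

The inheritance relationship, the parser it delegates to, and the pointer to an alternative crawler are exactly the three things the guidance above asks the docstring to cover.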

It should not be just about using ChatGPT or any other LLM. You can use them, of course, but first you need to understand the code and the issue you're trying to solve, so that the tool can give you meaningful output.

Choose one of the crawlers I suggested, try to understand the code, and write something meaningful.

Thank you, and good luck!

@belloibrahv

Hi, could you please review my new PR #613 when you have a moment? Thank you!

@apify apify deleted a comment from belloibrahv Oct 23, 2024