-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Doc Quality Transform: update readme and add sample notebook #790
Conversation
Signed-off-by: Daiki Tsuzuku <[email protected]>
1747aeb
to
1a70530
Compare
Thanks, @dtsuzuku-ibm. I am ok with most of the decisions you have made (like not including the information we can get from github about the Author and Date. However, @agoyal26, who created the template, may insist that we still provide such information in the README file. |
I have made some suggestions to the read me. We can skip adding date of revisions but I think adding the list of contributors with their email would be good. |
Where are your suggestions, @agoyal26? (other than adding the list of contributors and their emails) |
I added them in readme.md itself as comments- are they visible? @shahrokhDaijavad |
They are probably on your local copy of the file, @agoyal26. I don't see them. Do you want to just copy and paste your local README file here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see some comments on readme to improve readability
Thanks, @agoyal26 Now, I see your comments! |
Signed-off-by: Daiki Tsuzuku <[email protected]>
9547de6
to
ecb87b0
Compare
I'm ok with that. Should I change the name of |
oh my apologies - I get it now. Let it be description of the column. Also please add contributors and authors name and email id to readme. Then looks like we are good to go. Thank you |
Signed-off-by: Daiki Tsuzuku <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@shahrokhDaijavad Could you assign any autorized user as reviewer? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you @dtsuzuku-ibm and @agoyal26
@dtsuzuku-ibm I think this is ready to merge. I will ask @touma-I to do this. |
@shahrokhDaijavad I'm afraid I'm not authorized to merge this PR... Could you merge this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shahrokhDaijavad @dtsuzuku-ibm This shows how to run the example script once we have cloned the repo. I wonder if we should also add a section to it that would explain how to use it in a notebook or a python script without cloning the repo but only with pip install . i.e.:
!pip install data-prep-toolkit
!pip install data-prep-toolkit-transform[doc_quality]
from doc_quality_transform python import ...
params = {
...
}
...
laucher.launch()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add example. with a few lines of code on how one can do pip install, setup the parameter block for input/output folder and then invoke the runtime launcher using the default parameters . Thanks
We can do all of these in jupyter/collab notebook which will be added in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dtsuzuku-ibm . that is great. Once you have the notebook, you can reference it. For now, I don't think this is complete until we have one or the other or both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add example. with a few lines of code on how one can do pip install, setup the parameter block for input/output folder and then invoke the runtime launcher using the default parameters . Thanks
@dtsuzuku-ibm Since we don't have the notebook example yet, @touma-I is suggesting adding these few lines of code as an example of how we will be using all transforms in the future using pip install and not cloning the repo. Once we have the notebook example, we may remove these lines and just refer to the notebook. |
very good point @touma-I . I agree too- as we are promoting that DPK transforms are easy to use via pip install - we should include these steps. |
@touma-I @shahrokhDaijavad @agoyal26 |
Hi @dtsuzuku-ibm I understand the desire not to copy/paste and hopefully we will get to the point where we can automate some of this stuff maybe via CI/CD. But for now, it is not clear to me what is the minimum that the user needs to do to use this transform. I think having it explained in 5-6 lines of code will help get us to that point. |
Signed-off-by: SHAHROKH DAIJAVAD <[email protected]>
@dtsuzuku-ibm and @touma-I, I added a minimal version of the notebook for this transform based on what we have decided as a notebook template for all transforms. I tried to mimic what I learned from Maroun yesterday, but this being my first time, I may have made mistakes. Please check and modify as needed. |
Signed-off-by: SHAHROKH DAIJAVAD <[email protected]>
Signed-off-by: Daiki Tsuzuku <[email protected]>
@shahrokhDaijavad @touma-I I've added link to jupyter notebook. Could you review? |
@dtsuzuku-ibm The link is fine. However, the output of the notebook you have run shows an error (ModuleNotFoundError: No module named 'doc_quality_transform'). I assume you were able to fix this error and run the notebook to the end, because the final output looks good. Am I right? It would be good to have a version of the notebook without output errors in the repo. |
Signed-off-by: Daiki Tsuzuku <[email protected]>
@shahrokhDaijavad I've updated notebook. It seems my latest local change was not synced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you, @dtsuzuku-ibm.
Question to @touma-I: Although pip installs are commented out in this notebook, we don't need a separate: pip install dpk-doc-quality-transform-python when we use: pip install data-prep-toolkit-transforms[all]>=0.2.2.dev3, right?
@shahrokhDaijavad That is correct. @dtsuzuku-ibm Please remove the pip install dpk-doc-quality-transform-python . We no longer publish individual transforms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove reference to dpk-doc-quality-transform-python . We no longer package individual transforms. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove references to pip install dpk-doc-quality-transform-python
(as we no longer publish the individual transforms) and to pip install data-prep-connector
(as this notebook does not seem to have a dependency on the web crawler).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove references to pip install dpk-doc-quality-transform-python
Without dpk-doc-quality-transform-python
, we cannot import doc_quality_transform module. Could you tell me what kind of dependency we need to install instead?
pip install data-prep-connector
Got it 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dtsuzuku-ibm I think if you do: pip install ‘data-prep-toolkit-transforms[all]>=0.2.2.dev3', it has doc_quality_transform included. @touma-I can confirm this. Please see the comment at the top of the notebook that says: "These pip installs need to be adapted to use the appropriate release level."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @dtsuzuku-ibm.
Signed-off-by: Daiki Tsuzuku <[email protected]>
b05f61b
to
cf13388
Compare
Signed-off-by: Daiki Tsuzuku <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@touma-I Could you review this PR? I need your review to clear the requested change you made. |
following template #753 (comment)
What I didn't include
Why are these changes needed?
Related issue number (if any).
#753