Issue 3052 #3053
Add bio for Megan S. Kane
Create corpus-analysis-with-spacy.md
Upload additional assets
Upload images directory
Upload original avatar
Upload gallery avatar
- Update links to the lesson's `.ipynb` (to be rendered with nbviewer)
- Slightly adjust wording at lines 80, 82 and 84.
Correct link, line 47
Delete image to remove transparent background.
(without transparent background)
- Correct typing errors, lines 72 and 194
- Adjust formatting of Research Questions to remove headers
Delete image to remove transparent background.
(without transparent background)
Delete to replace with cropped image.
(cropped image)
@anisa-hawes is this ready for/waiting on my review?
Hello @hawc2, Yes. Please do read through, and let me know if you spot anything that needs adjustment.

As explained via Slack, I've gone through the process of re-cropping some of the images which appeared to be surrounded by 'transparent' background space. I remain puzzled by the fact that several of the Figures are aligned to the left margin while all the others are centred (most glaringly Figures 9, 10, 25 and 26, although these do not feature that extra 'transparent' space).

I also have some concerns about the accessibility of the figures in general. This is a broader question to tackle across all journals, but I think we should be aiming to avoid any screenshots of tabular data. Rather, we should be replacing them with data tables formatted in Markdown (the ones I'd suggest replacing are, for example, Figures 3, 4, 5 and 7). I also think we could significantly improve the accessibility of this lesson by providing the excerpts from spaCy's outputs written as code (for example, Figures 9, 10, 13, 14, 16 and 17). I think something like this could work:
This doesn't display as it would on our website, where I think it would display as a grey-shaded box (our 'notes' boxes) with a narrow-line frame around it.

Depending on what you think of these suggestions, we can agree a solution for making these adjustments. Do you/Megan have the raw output excerpts at hand? If so, you could share them and Charlotte and I would be happy to implement the changes and make the necessary adjustments to the figure number sequence. Transforming the tabular data might be a bit more cumbersome, but I think the accessibility benefits would be significant. Charlotte and I can help with this, or take the task on if you/Megan don't have the capacity.
@mkane968 can you provide the original spreadsheet data for the spreadsheet screenshots?
Hi @hawc2 and @anisa-hawes, Here is the tabular data for the specified images:
Figure 2:
Figure 3:
Figure 4:
Figure 5:
Figure 7:
Figure 18:
Figure 19:
Figure 21:
Figure 23:
And here is the raw output for the following figures:
Figure 6:
Figure 8:
Figure 9:
Figure 10:
Figure 13:
Figure 14:
Figure 16:
Figure 17:
Figure 25:
Figure 26:
Is this all you need? If there is a different/better format for you to revise the figures, I'm happy to provide that instead. Additionally, when re-running the code, I realized that a cell needs to be added at the top of the Part of Speech Analysis section to create a new dataframe to use for the section:
A couple of the code blocks after this have to be tweaked to use this new dataframe rather than `final_paper_df`; if they are not changed, the code will break at the start of the named entity analysis section. Here is the Colab notebook with the revised change: Can I still edit the markdown file to make this change? Sorry, I'm not sure how this slipped through earlier! Thanks, Megan
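For readers following the thread, the kind of cell described above might look roughly like the sketch below. The actual cell from the notebook is not reproduced in this conversation, so everything here is an assumption apart from the name `final_paper_df`: the sample rows, the column names and the name `pos_analysis_df` are hypothetical, and using a copy to keep the original dataframe unchanged is only one plausible reading of the fix.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's final_paper_df; columns are illustrative.
final_paper_df = pd.DataFrame(
    {"Filename": ["a.txt", "b.txt"], "Text": ["Plants grow.", "Hamlet speaks."]}
)

# Work on an independent copy so that columns added during part-of-speech
# analysis do not alter final_paper_df for later sections.
pos_analysis_df = final_paper_df.copy()
pos_analysis_df["Token_Count"] = pos_analysis_df["Text"].str.split().str.len()

print(list(final_paper_df.columns))
print(list(pos_analysis_df.columns))
```

With this pattern, later sections that still read `final_paper_df` see the same columns they did before the part-of-speech section ran.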
@mkane968 thanks so much, I think this is just what we needed. @anisa-hawes let me know if I can help with anything else in preparing the lesson for publication; your plan sounds good to me. Slack me if I can help debug the images that are cropping oddly.
- Replace figures 2, 3, 4, 5, 7, 18, 19, 21, 23 with tabular data
- Adjust/add text to introduce each table and explain what it provides
- Replace figures 6, 8, 9, 10, 13, 14, 16, 17, 25, 26 with raw output
- Adjust/add text to introduce each output and explain what it provides
- Make some small typographical corrections
Add `table-wrapper` to make wide tables scrollable width-ways.
Add hard returns to follow `<div class="table-wrapper" markdown="block">`
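For anyone repeating this conversion on another lesson, turning spreadsheet rows into a scrollable Markdown table can be sketched roughly as follows. The helper name `rows_to_wrapped_table` and the sample row are hypothetical; only the opening `<div class="table-wrapper" markdown="block">` line and the blank line after it come from the commits above, and the closing `</div>` is assumed.

```python
def rows_to_wrapped_table(header, rows):
    """Render rows as a Markdown pipe table inside a table-wrapper div."""

    def fmt(cells):
        # Escape literal pipes so they do not split a cell.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    lines = [
        '<div class="table-wrapper" markdown="block">',
        "",  # hard return after the opening div, as in the commit above
        fmt(header),
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines.extend(fmt(row) for row in rows)
    lines.extend(["", "</div>"])
    return "\n".join(lines)


# Hypothetical sample data, not taken from the lesson.
sample = rows_to_wrapped_table(
    ["Filename", "Discipline"], [["text_001.txt", "Biology"]]
)
print(sample)
```

The blank line after the opening tag matters because a Markdown processor may otherwise treat the table as raw HTML content and leave it unrendered.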
Deleting directory (to replace with updated figure set)
Upload updated figure set.
- Update image filenames
- Update figure numbers
Small adjustments to tables
To be replaced with updated notebook.
Hello @mkane968, Many thanks for providing the data tables formatted in Markdown + the excerpts from spaCy's outputs written as code. I think these adjustments will significantly improve the accessibility and readability of this lesson, and I really appreciate your collaboration.
You can review the Netlify Preview to see the changes as staged. I've renumbered the remaining figures (and their filenames), and adjusted the captions accordingly. As noted above, I've also made some small adjustments to the text, so that the tables and output are introduced and make sense within the lesson. You can review the rich-diff of the changes I made (d4bbbe6), and let me know if anything is incorrect or not as you want it.
Additionally, I'd like to raise a couple of queries which I noted in the course of making these adjustments:
however I noticed that Figure 10 (as was) included a different extract (I've included a screenshot below) so I typed this output myself. This means your hypothesis (line 454) that the text output is likely to be from a Biology paper remains true (although, as I explain above, I think that sentence needs adjustment). Let me know how you want to handle this/if you want us to replace it with the New York list?
Hi @anisa-hawes, The changes look good! Just one minor edit:
Revisions related to the part-of-speech dataframe:
In response to your other notes:
Listing the nouns in each text can help us ascertain the texts' subjects. Let's list the nouns in two different texts, the text located in row 3 of the DataFrame and the text located in row 163.
The first text in the list includes botany and astronomy concepts; this is likely to have been written for a biology course.
In contrast, the second text appears to be an analysis of Shakespeare plays and movie adaptations, likely written for an English course.
Along with assisting content analyses, extracting nouns has been shown to help build more efficient topic models[^9].
Thanks! Megan
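As a rough illustration of the noun-listing step quoted in the comment above, the filter can be sketched without loading a spaCy model. The `Token` stand-in and the hand-tagged sentence are hypothetical; with spaCy the tokens would come from a processed `Doc`, whose tokens expose `.text` and `.pos_` in the same way.

```python
from collections import namedtuple

# Hypothetical stand-in for spaCy tokens; a real spaCy Token exposes .text and .pos_.
Token = namedtuple("Token", ["text", "pos_"])


def list_nouns(doc):
    """Return the text of every token tagged as a common noun."""
    return [token.text for token in doc if token.pos_ == "NOUN"]


# Hand-tagged example; with spaCy the doc would come from nlp(text).
doc = [
    Token("The", "DET"),
    Token("plant", "NOUN"),
    Token("absorbs", "VERB"),
    Token("light", "NOUN"),
]
nouns = list_nouns(doc)
print(nouns)
```

Proper nouns carry the separate tag `PROPN`, so names of people or places would need an extra condition if they should be included.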
Integrate Megan's edits.
Delete notebook asset to replace with updated version.
Thank you, @mkane968.
Thanks @anisa-hawes! The changes look good, just a few notes from a final read-through:
In response to your other questions:
Thank you for these clarifications, @mkane968.
Integrate Megan's edits.
Replace perma.cc link with live link. (Perma.cc cannot archive that URL).
This looks great to me. Thank you @mkane968 and @anisa-hawes for your careful attention to details and meticulous corrections/improvements to this lesson. It's ready for publication!
- Adjust capitalisation of 'spaCy' in the lesson title
- Update `date:`
Hello @hawc2. Sorry to trouble you for a re-review. I made one tiny change, which was to adjust the capitalisation of 'spaCy' in the title so that it's consistent with the lesson. This is aligned with how we've titled Installing Python Modules with pip, for example.
Preparing files for publication on behalf of AWC.