-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flags at the beginning and end of the text are not rendered correctly #51
Comments
@nidhi2509's work on #47 is probably relevant here. She identified 482699b as the last good commit. The next commit after that is af9014e. |
Actually, I am seeing this regression with the previous commits too. |
So not a regression but a plain old bug! |
A good next step would be to make a new branch and write some "red" tests for these particular bugs. Does that make sense? |
@kaliloua7 has started writing test cases. He wrote a "green" test case for a flag in the middle of the text. We need "red" test cases for flags at the beginning and end, which will be "green" when we fix the bug(s). @elsayeaa has a theory about why the "undefined" is appearing at the end. This bug would benefit from some pair programming. |
After working on the bug with @dylanjwu, we concluded that the bug arises from the getBlurbs function. What's happening is that the text is split into an array of elements with an undefined object at the end, since the splitting method is using "" (empty string) as a delimiter. Then, the text is modified, joined, and then split into another array of flags using the delimiter "[!]". There are two current fixes, with each have a disadvantage: The first fixsimply adding an empty space at the end of the text before splitting it, which will still create an undefined object in the area (so, the elements would be the n elements included + empty space + undefined; instead of n elements included + undefined). However, this is considered as a Kludge since it's not actually fixing the problem. The second fixslicing the TextArray before joining it to remove the undefined element at the end and then splitting it again using the delimiter "[!]" to create the flags. Disadvantage, the last character in the entire text is not rendered, which is the "." Suggestion:Completely reworking the getBlrubs() method to avoid creating an unwanted undefined object. Let me know what you think @ProfJanetDavis and @j6k4m8, so whether to make this is as a new issue or implement one of the suggsted fixes. |
Thanks for this analysis. The proposed fixes seem to deal with the "undefined" at the end of the text, but not the unrendered flag at the beginning of the text. Is that right or am I missing something? Of the two suggested fixes, the second one sounds more appropriate. The undefined value should never be rendered as text. Do you know why the last character is not rendered? Regarding your suggestion, do you have an idea for an alternate algorithm for |
@dylanjwu has proposed a solution for the flag at the beginning of the sentence, and it is currently functional. @dylanjwu Can you elaborate more?
it shows like this: |
The bug that arose for the flag at the start of the text was because the "dummy flag" starting and ending at index 0 (to ensure that text at the start that does not contain a blurb is rendered) was overwriting any actual flags that also start at 0. I fixed this by checking if a flag is contained at the start of the text, and if so, setting the flags array to be empty. See sol'n here:
|
That makes sense!
…On Sun, Sep 6, 2020 at 9:20 PM Dylan Wu ***@***.***> wrote:
The bug that arose for the flag at the start of the text was because the
"dummy flag" starting and ending at index 0 (to ensure that text at the
start that does not contain a blurb is rendered) was overwriting any actual
flags that also start at 0. I fixed this by checking if a flag is contained
at the start of the text, and if so, setting the flags array to be empty.
See the code here:
`
getFlags(issues){
let flags = [
{
start: 0,
end: 0,
category: "",
problem: "",
suggestions: "",
bias: ""
}
];
let firstFlag = issues[0].flags;
if(firstFlag.length > 0 && firstFlag[0][0] === 0){
flags = [];
}`
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#51 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AEXJIU3ZATKS2VBPRB72XT3SERNP3ANCNFSM4P3U737Q>
.
--
Janet Davis (she/they)
Department Chair, Computer Science
Associate Professor and Microsoft Chair of Computer Science
Whitman College, Walla Walla, WA 99362
206-383-8798 (cell), http://cs.whitman.edu/~davisj/
Schedule an appointment: http://meetme.so/JanetDavis
|
So, after walking through the code again and giving it another look. I am suggesting that we should use another delimiters for separating and joining the text. That is, use another delimiter that's not existing at the end of the text. The following are some of the suggested fixes that came to my mind, and I am not quite sure which one to be better implemented.
What do you think? |
@elsayeaa I'm not sure. I've been trying to understand the example you gave earlier, with I think you might need to screenshare and walk me through it during our next meeting on Thursday. |
I was able to finally fix the problem!
However, I am putting this as a prototype and was thinking to make it more generalized and instead of OR statements for each punctuation mark, we look into a list or a datatype that have them collectively. This solves the problem of the extra spaces between the flagged words and their punctuation. Finally, for the last flag, to solve the undefined problem, we do the following code:
this code basically checks if the last element is undefined and removes it if it exists, if not it leaves the text as it is. These fixes are good for the front-end, but I strongly suggest modifying the tockenize function in the backend and make it include the punctuation. I think Tockenize was used to detect words, but not display the text as it is, so it fails to achieve what's required on the website scale. Let me know what you think. @dylanjwu can you make a branch of your fix to the flag at the beginning, so I can merge both? |
However, I am putting this as a prototype and was thinking to make it more
generalized and instead of OR statements for each punctuation mark, we look
into a list or a datatype that have them collectively.
The built-in `String includes()` method may be useful here. You can make a
string of punctuation characters and check if `textArray[flag.end]` is in
that string. But wouldn't it be simpler to instead check if the next
character is *not* a space or newline? Due to the tokenization, I don't
think a flag will ever end mid-word.
Also, wouldn't the suggested code error if `flag.end` is at the end of
`textArray`?
Finally, I would hesitate to recommend changes to the backend code without
a much deeper understanding of that code.
…On Sat, Sep 12, 2020 at 6:15 PM elsayeaa ***@***.***> wrote:
I was able to finally fix the problem!
So, after going through the code manifold times and the back-end, I
discovered that the main issue was a backend issue since using
nltk.tockenize only takes the word without spaces and punctuation, which
causes the loss of punctuation and spaces. That said, if we look at
previous code of splitting and separating the text into flags and
non-flags. The separating method takes the flag and adds the delimiter to
the end of the flag (the flag that lost its punctuation). So the following
code does the fix.
for (const [i, flag] of flags.entries()) {
textArray[flag.end] = textArray[flag.end] === '.' || textArray[flag.end] === ','
? textArray[flag.end] + "[!]||||"
: "[!]||||" + textArray[flag.end];
textArray[flag.start] = `[!]||${i}||` + textArray[flag.start];
}
However, I am putting this as a prototype and was thinking to make it more
generalized and instead of OR statements for each punctuation mark, we look
into a list or a datatype that have them collectively. This solves the
problem of the extra spaces between the flagged words and their
punctuation. Finally, for the last flag, to solve the undefined problem, we
do the following code:
textArray = textArray.join("").split("[!]");
if (textArray[-1] === '||||undefined'){
textArray.pop();
}
this code basically checks if the last element is undefined and removes it
if it exists, if not it leaves the text as it is.
These fixes are good for the front-end, but I strongly suggest modifying
the tockenize function in the backend and make it include the punctuation.
I think Tockenize was used to detect words, but not display the text as it
is, so it fails to achieve what's required on the website scale. Let me
know what you think. @dylanjwu <https://github.com/dylanjwu> can you make
a branch of your fix to the flag at the beginning, so I can merge both?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#51 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AEXJIU52W5RPUUN62NR5EI3SFQMM5ANCNFSM4P3U737Q>
.
--
Janet Davis (she/they)
Department Chair, Computer Science
Associate Professor and Microsoft Chair of Computer Science
Whitman College, Walla Walla, WA 99362
206-383-8798 (cell), http://cs.whitman.edu/~davisj/
Schedule an appointment: http://meetme.so/JanetDavis
|
I thought about the String Include method.. I can show you an illustration of before-and-after my modifications. There is no error if the flag is at the end of the sentence, middle of the sentence, middle and end at the same time, beginning, beginning and middle and end, beginning and end, and so on with different permutations. I made sure to account for these variations. Checking if it is not space or a new line is a beautiful idea and easier to implement than checking punctuation symbols. I think this fix solves it at the frontend level, but it makes the code fragile and prone to unaccounted errors, so although we need a deeper understanding of the code (which I tried to do over this weekend), I strongly recommend that we change it to accommodate the needs of the front end. In the end, it is a matter of choice. @ProfJanetDavis Should I make a pull request after merging @dylanjwu solution to the first-flag issue to this fix? |
Yes, I think this will be ready for a pull request.
…On Sun, Sep 13, 2020 at 4:04 PM elsayeaa ***@***.***> wrote:
I thought about the String Include method but I was just asking for any
suggestions here. I can show you an illustration of before-and-after my
modifications. There is no error if the flag is at the end of the sentence,
middle of the sentence, middle and end at the same time, beginning,
beginning and middle and end, beginning and end, and so on with different
permutations. I made sure to account for these variations. Checking if it
is not space or a new line is a beautiful idea and easier to implement than
checking punctuation symbols. I think this fix solves it at the backend,
but it makes the code fragile and prone to unaccounted errors, so although
we need a deeper understanding of the code (which I tried to do over this
weekend), I strongly recommend that we change it to accommodate the needs
of the front end. In the end, it is a matter of choice. @ProfJanetDavis
<https://github.com/ProfJanetDavis> Should I make a pull request after
merging @dylanjwu <https://github.com/dylanjwu> to this fix?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#51 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AEXJIU75QZLIRWC76GVBC33SFVFXLANCNFSM4P3U737Q>
.
--
Janet Davis (she/they)
Department Chair, Computer Science
Associate Professor and Microsoft Chair of Computer Science
Whitman College, Walla Walla, WA 99362
206-383-8798 (cell), http://cs.whitman.edu/~davisj/
Schedule an appointment: http://meetme.so/JanetDavis
|
Describe the bug
There are problems with rendering flags at the beginning and end of the text.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Both instances of "She is willing" should be flagged. There should not be "undefined" at the end.
Additional context
We discovered this while investigating #47 .
The text was updated successfully, but these errors were encountered: