Support for md ref links, others #6
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,47 +1,60 @@ | ||
import scrapeLinks from "./scrapeLinks"; | ||
|
||
const testMarkdownString = ` | ||
This is a Markdown example with [a link to google](https://www.google.com) and [one with a subdirectory](https://www.google.com/nested/page.html) | ||
const markdownString = ` | ||
Markdown example with a [link to Google](https://www.google.com), one [with a URL path](https://www.google.com/nested/page.html), and others: | ||
|
||
and [another to reddit](www.reddit.com) and [a third to Twitter](facebook.com) | ||
- One [to reddit](www.reddit.com) | ||
- A fourth [to Facebook](facebook.com) (incomplete URLs) | ||
- Finally a few [ref] [links][links] [here][link-here] | ||
|
||
as well as some blank lines | ||
[ref]: https://www.ref.com | ||
[links]: | ||
www.links.in/newline | ||
[link-here]: | ||
/just/a/path | ||
|
||
There's also some blank lines, misc. text, and <span>HTML</span> code. | ||
`; | ||
|
||
const plaintextString = ` | ||
This string is plaintext, with links like https://www.google.com and https://www.google.com/nested/page.html | ||
|
||
I can scrape "https://reddit.com/r/subreddit" and (https://facebook.com) as well! | ||
I can scrape "https://reddit.com/r/subreddit" and (https://facebook.com) as well! The new regex can pull www.youtube.com too!? | ||
|
||
The new regex can pull www.youtube.com too!? unfortunately, gmail.com is just too vague. | ||
TODO: Unfortunately, gmail.com is just too vague. | ||
TODO: Ending in a period won't work well either, e.g. www.something.com. | ||
`; | ||
|
||
const plaintextTestResult = [ | ||
const markdownTestResult = [ | ||
"https://www.google.com", | ||
"https://www.google.com/nested/page.html", | ||
"https://reddit.com/r/subreddit", | ||
"https://facebook.com", | ||
"www.youtube.com", | ||
"www.reddit.com", | ||
"facebook.com", | ||
"https://www.ref.com", | ||
"www.links.in/newline", | ||
"/just/a/path", | ||
Comment on lines +34 to +36

Vs. this

Hmmm BTW

But if I remove these 2 lines then it fails with

🤔 Yelp

I'm pretty sure you're looking at two different tests, one that splits the string into newlines and one that doesn't. I just removed the newline split one and changed the inner workings so we don't get string arrays as content (by joining them before returning as content). That should solve this weirdness and allow us to pick up links that take multiple lines like this.

Right, I thought it had something to do with the new line even though I put

TBH I'm not 100% sure what the changes to the other files do, but the tests are passing, so thanks! Do you approve this PR now? 🙂

All just internal stuff. Basically, this project has two ways of gathering links: file and GH diff. The GH diff method used to return an array of strings, since I had assumed no links span multiple lines, but that's clearly not the case. Now that parser joins the array into a string before returning it, similarly to the filesystem-based parser.

Sorry for the delay! I'm trying this PR out manually by running it on the CLI, and noticed some things:

I test it out manually by running

Patch is in! I can deploy tomorrow when I'm a bit less tired; I don't want to rush and mess it up. However, merging in the source repo won't break anything in prod, so I can approve/merge. Thanks @jorgeorpinel!

Hey, thank you so much @rogermparent! I see you did some more tweaks to address some of the things you mentioned. Let's see what it does on dvc.org once deployed 😬
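To illustrate the point made in this thread about joining the diff lines before scraping, here is a minimal sketch. It is not the project's actual code: `toSingleString`, `scrapeRefTargets`, and the simplified regex are hypothetical names and assumptions made for this example, and the real `scrapeLinks` internals may differ.

```typescript
// Hypothetical sketch: these helpers are NOT taken from the PR.
type Content = string | string[];

// Join an array of lines back into one string so a pattern can span line breaks.
function toSingleString(content: Content): string {
  return Array.isArray(content) ? content.join("\n") : content;
}

// Collect the targets of Markdown reference definitions such as
// "[ref]: https://www.ref.com". The target may sit on the line after the label.
// Simplified on purpose: it ignores titles and other edge cases.
function scrapeRefTargets(content: Content): string[] {
  const text = toSingleString(content);
  const refDefinition = /^\[([^\]]+)\]:\s*(\S+)/gm;
  const targets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = refDefinition.exec(text)) !== null) {
    targets.push(match[2]);
  }
  return targets;
}

// Usage: an array of lines, as the GH diff parser used to return.
const diffLines = ["[ref]: https://www.ref.com", "[links]:", "www.links.in/newline"];
console.log(scrapeRefTargets(diffLines));
```

Scanned one line at a time, `[links]:` has no target on its own line, so nothing would be matched for it. After joining, `\s*` can cross the newline and reach `www.links.in/newline`.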
||
]; | ||
|
||
const markdownTestResult = [ | ||
const plaintextTestResult = [ | ||
"https://www.google.com", | ||
"https://www.google.com/nested/page.html", | ||
"www.reddit.com", | ||
"facebook.com", | ||
"https://reddit.com/r/subreddit", | ||
"https://facebook.com", | ||
"www.youtube.com", | ||
"www.something.com.", | ||
]; | ||
|
||
test("It scrapes from the markdown test string", () => { | ||
expect( | ||
scrapeLinks({ | ||
filePath: "test.md", | ||
content: testMarkdownString, | ||
content: markdownString, | ||
}) | ||
).toEqual(markdownTestResult); | ||
}); | ||
|
||
test("It scrapes from the markdown test split by newlines", () => { | ||
const splitTest = testMarkdownString.split("\n"); | ||
const splitTest = markdownString.split("\n"); | ||
expect( | ||
scrapeLinks({ | ||
filePath: "test.md", | ||
|
Test this...
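Regarding the TODO in the plaintext fixture about links that end in a period (the expected result currently keeps `"www.something.com."` as is), one possible follow-up is to trim trailing punctuation after matching. This is only a sketch under that assumption: `trimTrailingPunctuation` is a hypothetical helper and is not part of this PR. Adopting it would also mean updating `plaintextTestResult`.

```typescript
// Hypothetical post-processing step, NOT part of the PR.
// Strips sentence punctuation left at the end of a scraped link.
function trimTrailingPunctuation(link: string): string {
  return link.replace(/[.,;:!?]+$/, "");
}

// Usage with values from the plaintext fixture.
console.log(trimTrailingPunctuation("www.something.com.")); // "www.something.com"
console.log(trimTrailingPunctuation("www.youtube.com")); // unchanged
```

The trade-off is that a URL which legitimately ends in one of these characters would be shortened, so this may be better applied only to plaintext sources.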