Summarizer still tries to process video links and picture galleries #27
Comments
Pretty sure this is fixed now. Can you find some broken links?
Environment and 2 weeks is an example. How did the old API handle it? Even returning 'cannot summarize article' would be an improvement.
The old API looked at all the text inside a div with a specific id/class. The current one takes a URL and looks for the largest body of text within the page. Checking for the id before loading the page would take too long, so the only thing I can think of is to use the old method for all URLs that contain theguardian.com/video/.
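A rough sketch of that routing idea, in case it helps the discussion. The function names and the "old"/"new" labels are made up for illustration and are not taken from the project's code; the check only assumes that Guardian video pages have `/video/` somewhere in the URL path.

```python
from urllib.parse import urlparse


def is_guardian_video(url: str) -> bool:
    """Return True if the URL is on theguardian.com and has /video/ in its path."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    on_guardian = host == "theguardian.com" or host.endswith(".theguardian.com")
    return on_guardian and "/video/" in parsed.path


def choose_extractor(url: str) -> str:
    """Pick which (hypothetical) extraction method to use for a URL."""
    # "old" = read the text inside the known div, "new" = largest body of text.
    return "old" if is_guardian_video(url) else "new"
```

Because the check uses only the URL, it does not need the page to be loaded first.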
Good idea, what did the old method return? If it still doesn't make sense then returning an error would be a better option.
Pushed a quick fix to master. It will now error if the summary starts talking about autoplay. '/video/' is floating in the middle of the URL, and can't be parsed nicely without affecting future links.
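For readers following along, the quick fix described above might look roughly like this. It is a guess at the shape of the check, not the code that was pushed; `CannotSummarize` and `check_summary` are hypothetical names.

```python
class CannotSummarize(Exception):
    """Raised when the extracted text does not look like article content."""


def check_summary(summary: str) -> str:
    """Return the summary unchanged, or raise if it mentions autoplay."""
    # Assumption: video pages leak player text such as "autoplay" into the summary.
    if "autoplay" in summary.lower():
        raise CannotSummarize("cannot summarize article")
    return summary
```

A check like this looks at the output rather than the URL, so it would also reject a genuine article that happens to mention autoplay.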
Can't a regular expression sort that?
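One possible way a regular expression could handle this, offered as a sketch only. It matches `video` or `gallery` as a whole path segment wherever it sits in the path, so it does not depend on the segment's position. Treating `/gallery/` as the marker for picture galleries is an assumption, and `is_media_link` is a hypothetical name.

```python
import re
from urllib.parse import urlparse

# Match "video" or "gallery" only as a complete path segment.
MEDIA_SEGMENT = re.compile(r"/(?:video|gallery)(?:/|$)")


def is_media_link(url: str) -> bool:
    """Return True if the URL path contains a video or gallery segment."""
    return MEDIA_SEGMENT.search(urlparse(url).path) is not None
```

Searching only the path keeps query strings and hostnames out of the match, and the segment boundaries stop it from firing on paths such as `/videogames/`.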