Error "Applying Character set: utf-8" for theguardian.com #157

Closed
wfriesen opened this issue Aug 16, 2020 · 4 comments

@wfriesen
Contributor

I'm seeing this error in the System Log when running for theguardian.com:

Applying Character set: utf-8
1. plugins.local/feediron/bin/fi_logger.php(33): trigger_error(Applying Character set: utf-8, 1024)
2. plugins.local/feediron/bin/fi_helper.php(59): log(1, Applying Character set: utf-8)
3. plugins.local/feediron/filters/fi_mod_xpath/init.php(8): getDOM(
<!DOCTYPE html>
<html id="js-context" class="js-off is-not-modern id--signed-out" lang="en" data-page-path="/australia-news/2020/aug/16/coalition-must-ensure-australia-wont-be-at-end-of-queue-for-coronavirus-vaccine-labor-says">
<head>

(full log truncated)

This occurs while pulling feeds during the usual update, and also while running in the FeedIron Testing tab.

This seems to occur regardless of the config, but the simplest one I've used to cause this is:

{
    "type": "xpath",
    "xpath": [
        "article"
    ]
}

It seems to happen against any URLs appearing in the guardian feed, but one example is: https://www.theguardian.com/australia-news/2020/aug/16/coalition-must-ensure-australia-wont-be-at-end-of-queue-for-coronavirus-vaccine-labor-says

I'm running the latest versions:
tt-rss: v20.08-5497a137d
feediron: latest master (51b3446)

@dugite-code
Contributor

dugite-code commented Aug 17, 2020

This is normal behavior and not a real error: it's how the debug logging writes to the TT-RSS system log, by triggering an E_USER_NOTICE error. "debug": false should be the default, but you can add it to your main config to be safe. When running in the testing tab, this output will always show.
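To make this concrete, here is a minimal Python sketch of the gating described in this comment. It is not FeedIron's actual PHP code: notices always appear in the testing tab, and otherwise only when "debug" is true. The name `should_emit_notice` is hypothetical, and treating a missing "debug" key as false is an assumption based on "should be the default". The constant 1024 is PHP's `E_USER_NOTICE`, which matches the `trigger_error(..., 1024)` call in the trace above.

```python
import json

# PHP's E_USER_NOTICE constant; the trace shows trigger_error(..., 1024).
E_USER_NOTICE = 1024


def should_emit_notice(config: dict, in_testing_tab: bool) -> bool:
    """Hypothetical sketch: decide whether a debug message would be logged."""
    # Per the comment, the testing tab always shows debug output.
    if in_testing_tab:
        return True
    # Assumption: a missing "debug" key behaves like "debug": false.
    return bool(config.get("debug", False))


main_config = json.loads(
    '{"theguardian.com": {"type": "xpath", "xpath": ["article"]}, "debug": false}'
)
print(should_emit_notice(main_config, in_testing_tab=False))  # False
print(should_emit_notice(main_config, in_testing_tab=True))   # True
```

In other words, with "debug": false in the main config, the System Log stays quiet during normal updates, while the testing tab still shows the notices.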

Related enhancement issue #122

@wfriesen
Contributor Author

Thanks, that clears up the logging issue, but now it appears that feediron is not being applied when the updater runs.

I started from scratch with a blank database, added the plugin and my settings, then imported my feeds. After the update ran, nothing was filtered by feediron. If I go to the Feed Debugger, check Force Refresh, and then Continue, I can see lines like:

[07:00:42/43] hash differs, applying plugin filters:
[07:00:42/43] ... Feediron
[07:00:42/43] === 0.1244 (sec)
[07:00:42/43] plugin data: feediron,

and the relevant feed items are filtered according to the config.
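One way to confirm from the Feed Debugger output that FeedIron actually ran is to look for the `plugin data:` line. The Python sketch below assumes the output looks like the four lines quoted above (a `plugin data:` line followed by comma-separated plugin names). `feediron_applied` is a hypothetical helper, not part of TT-RSS or FeedIron.

```python
import re

# Matches e.g. "[07:00:42/43] plugin data: feediron,"
PLUGIN_DATA = re.compile(r"plugin data:\s*(.*)$")


def feediron_applied(debug_output: str) -> bool:
    """Return True if any 'plugin data:' line lists feediron."""
    for line in debug_output.splitlines():
        match = PLUGIN_DATA.search(line)
        if not match:
            continue
        # Split the comma-separated list, dropping empty entries such as the trailing one.
        plugins = [p.strip().lower() for p in match.group(1).split(",") if p.strip()]
        if "feediron" in plugins:
            return True
    return False


sample = """[07:00:42/43] hash differs, applying plugin filters:
[07:00:42/43] ... Feediron
[07:00:42/43] === 0.1244 (sec)
[07:00:42/43] plugin data: feediron,"""
print(feediron_applied(sample))  # True
```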

Is there a reason this would work from both the testing tab and when forcing a refresh, but not during the regular feed updates?

@dugite-code
Contributor

dugite-code commented Aug 17, 2020

Feediron always runs for all feeds, since it checks whether the article URL matches a URL in the config. The only reason it won't replace the feed body is if it can't match the article's link against a URL in the config. I'm using very simple text matching for this, so keep the config URL as simple as possible, i.e. don't use https://www.theguardian.com, use theguardian.com.

Your main config should look something like this.

{
    "theguardian.com":{
        "type": "xpath",
        "xpath": [
            "article"
        ]
    },
    "debug": false
}
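The comment only says the matching is "very simple text matching", so this Python sketch assumes a plain substring test of each config key against the article URL. `find_site_config` is a hypothetical name, and this is not FeedIron's actual code. It shows why a bare domain such as `theguardian.com` is the safer key: it matches regardless of scheme or `www.` prefix, while a full `https://www.theguardian.com` key fails on any URL variant.

```python
import json
from typing import Optional, Tuple

# The main config from the comment above.
CONFIG = json.loads("""
{
    "theguardian.com": {
        "type": "xpath",
        "xpath": ["article"]
    },
    "debug": false
}
""")


def find_site_config(config: dict, article_url: str) -> Optional[Tuple[str, dict]]:
    """Return (key, site_config) for the first key found inside article_url."""
    for key, value in config.items():
        # Skip global options such as "debug"; only dict values are site configs.
        if not isinstance(value, dict):
            continue
        # Assumption: plain substring matching, per "very simple text matching".
        if key in article_url:
            return key, value
    return None


url = "https://www.theguardian.com/australia-news/2020/aug/16/coalition-must-ensure-australia-wont-be-at-end-of-queue-for-coronavirus-vaccine-labor-says"
print(find_site_config(CONFIG, url)[0])  # theguardian.com

# A stricter key misses a hypothetical URL variant without https/www.
strict = {"https://www.theguardian.com": {"type": "xpath", "xpath": ["article"]}}
print(find_site_config(strict, "http://theguardian.com/world"))  # None
```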

The only time I've encountered issues is when the site uses a service like feedburner that obfuscates the original article URL. The af_unburn plugin solves that, however.

As a side note: I didn't know the feed debugger even existed... well, that's embarrassing, and useful.

Edit: checking a feed I use Feediron for, the average processing time is around 0.3 - 0.5 (sec), whereas a feed it's not running on takes around 0.0007 (sec).

@wfriesen
Contributor Author

This ended up being due to incorrect permissions, I think. I'm running ttrss in docker containers on AWS, and had mounted the plugin directory as a volume from the host. After updating my scripts to clone feediron the same way as the ttrss code itself, everything is working perfectly.
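The thread doesn't say exactly what the permission problem was, but when the plugin directory is mounted as a volume, a quick sanity check is to confirm that every file under it is readable by the user the PHP process runs as. The Python sketch below is a hypothetical helper (`unreadable_paths`). It assumes you run it inside the container as that same user, since `os.access` reports everything as readable when run as root.

```python
import os


def unreadable_paths(root: str) -> list:
    """List directories and files under root that the current user cannot read."""
    problems = []

    def record(err: OSError) -> None:
        # os.walk calls this when it cannot list a directory (or root is missing).
        problems.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=record):
        # Directories need read and execute (search) permission to be traversed.
        if not os.access(dirpath, os.R_OK | os.X_OK):
            problems.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK):
                problems.append(path)
    return problems
```

For example, `unreadable_paths("plugins.local/feediron")` (path taken from the trace, adjust to your container's layout) should return an empty list when permissions are fine.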
