How should we deal with invalid characters on 1.9? #13

Open
eric opened this issue Sep 6, 2011 · 8 comments

Comments

@eric
Contributor

eric commented Sep 6, 2011

We ran into issues on 1.9 with a file that is supposed to be UTF-8 having invalid characters in it.

A fix was suggested for remote_syslog that should clearly go directly into eventmachine-tail, but I haven't been able to figure out exactly how I would want to fix it.

Here is the discussion that we've had so far:
papertrail/remote_syslog#13

Any thoughts would be welcome for the best way to solve this.

@jordansissel
Owner

Since em-tail doesn't display or do any calculations on characters, really, I don't think it should care what encoding the data has that it is reading - so if it's breaking on some input, I think it's a bug in em-tail.

Can you publish a sample file with some bad data? Otherwise I'll try to reproduce and hack on a fix.

@eric
Contributor Author

eric commented Sep 6, 2011

I was just playing around with this: https://gist.github.com/1169737

@mblair

mblair commented Sep 12, 2011

I've hit this too. I've tried the iconv workaround in the remote_syslog pull request, as well as something like:

data = data.encode!( 'UTF-8', invalid: :replace, undef: :replace )

And I'm still getting the following error:

/usr/lib/ruby/gems/1.9.1/gems/eventmachine-0.12.10/lib/em/buftok.rb:66:in `split': invalid byte sequence in UTF-8 (ArgumentError)

Any ideas?
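One possible explanation, offered as an assumption rather than a confirmed diagnosis: the 1.9-era `String#encode` documentation describes converting a string to the encoding it already has as a no-op, so the `encode!` call above may never have looked at the invalid bytes. A minimal sketch of a workaround is to round-trip through a different encoding so the transcoder has to inspect every byte. The sample data and variable names here are made up for illustration.

```ruby
# Bytes tagged as UTF-8 that include 0xFF, which never occurs in valid UTF-8.
data = "first line\nbad \xFF byte\n".dup.force_encoding(Encoding::UTF_8)

# Confirm the string really is malformed before cleaning it.
was_valid = data.valid_encoding?

# Round-trip through UTF-16BE so invalid sequences are actually replaced,
# then convert back to UTF-8.
cleaned = data.encode(Encoding::UTF_16BE, invalid: :replace, undef: :replace, replace: "?")
              .encode(Encoding::UTF_8)

# Splitting the cleaned string no longer trips over the bad byte.
lines = cleaned.split("\n")
```

This replaces the bad byte with `?`, so it loses information. Whether that is acceptable depends on what the consumer does with the lines.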

@vihai

vihai commented Mar 13, 2012

Any news about this issue?

@jordansissel
Owner

Probably should just read into a buffer that is set explicitly to binary mode, and let the consumer of the data care about the encoding.

I'll get to fixing this eventually if nobody else does.
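A rough sketch of what the binary-buffer idea could look like, assuming the splitting happens on a plain Ruby string. `split_raw_lines` is a hypothetical helper written for this example, not something that exists in eventmachine-tail or EventMachine.

```ruby
# Hypothetical helper for illustration; not part of eventmachine-tail's API.
def split_raw_lines(chunk, delimiter = "\n")
  # Tag a copy of the bytes as binary so no UTF-8 validation is attempted.
  raw = chunk.dup.force_encoding(Encoding::BINARY)
  raw.split(delimiter)
end

# Sample chunk containing 0xFF, which is never valid in UTF-8.
chunk = "ok line\nbad \xFF byte\nlast".dup.force_encoding(Encoding::UTF_8)
raw_lines = split_raw_lines(chunk)

# The consumer decides how to interpret the bytes, here by tagging a copy
# as UTF-8 and scrubbing invalid sequences (String#scrub needs Ruby 2.1+).
text_lines = raw_lines.map { |line| line.dup.force_encoding(Encoding::UTF_8).scrub("?") }
```

The library side never has to know the encoding, and the bad byte survives untouched in `raw_lines` for any consumer that wants the original data.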

@rb2k

rb2k commented Sep 4, 2014

Resurrecting this after a few years :)
We just ran into this as well.

For us, the cause was launching the app with start-stop-daemon without setting the LC_ALL variable to a UTF-8 locale.

--> Ruby falls back to POSIX/ASCII and blows up as soon as it has to touch a UTF-8 char
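To illustrate the effect without changing the process locale, here is a small sketch that reads the same bytes with an explicit US-ASCII external encoding (standing in for what a POSIX/C locale would give as `Encoding.default_external`) and then in binary mode. The file name prefix and contents are invented for the example.

```ruby
require "tempfile"

# Write a short UTF-8 sample to a temporary file as raw bytes.
sample = Tempfile.create("em-tail-sample")
sample.binmode
sample.write("caf\u00E9\n")
sample.close

# Reading as US-ASCII mimics a process started under a POSIX/C locale.
ascii_read = File.read(sample.path, encoding: "US-ASCII")

# Reading in binary mode makes no claim about the encoding at all.
binary_read = File.binread(sample.path)

File.unlink(sample.path)

# The bytes are identical; only the encoding tag differs.
ascii_ok = ascii_read.valid_encoding?
as_utf8 = binary_read.dup.force_encoding(Encoding::UTF_8)
```

If setting LC_ALL in the init script is not an option, assigning `Encoding.default_external = Encoding::UTF_8` early in the program is another way to get the same default, assuming the files really are UTF-8.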

@jordansissel
Owner

This project has been replaced by the filewatch library. Last I knew, EventMachine was abandoned as a project (the most recent release was 1.5 years ago), so I recommend not using em-tail.

Sorry for the bugs, but this project is probably not worth resurrecting.

I'd recommend checking out the filewatch library instead, maybe?


@rb2k

rb2k commented Sep 6, 2014

Sure, probably a good choice :)

Although I don't see an integrated way of actually tailing a file, rather than just being notified that something changed? But maybe it's just too early in the morning ;)
