Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unicode issues in the logging #25

Closed
hakavlad opened this issue Mar 15, 2019 · 22 comments
Closed

unicode issues in the logging #25

hakavlad opened this issue Mar 15, 2019 · 22 comments

Comments

@hakavlad
Copy link

hakavlad commented Mar 15, 2019

thrash-protect freezed my browser and crashed:

ERROR:root:red alert!  unacceptable time delta observed! interval: 0.5 cooldown_counter: 1 expected delay: 0 delta: 0.0579028129578 time: 1552623249.83 frozen pids: [(28299,)]
ERROR:root:red alert!  unacceptable time delta observed! interval: 0.5 cooldown_counter: 3 expected delay: 0 delta: 0.0423080921173 time: 1552623249.87 frozen pids: [(28299,), (28928,)]
ERROR:root:red alert!  unacceptable time delta observed! interval: 0.5 cooldown_counter: 5 expected delay: 0 delta: 0.0360288619995 time: 1552623250.0 frozen pids: [(28299,), (28928,), (29415,)]
Traceback (most recent call last):
  File "./thrash-protect.py", line 560, in <module>
    main()
  File "./thrash-protect.py", line 556, in main
    thrash_protect(args)
  File "./thrash-protect.py", line 531, in thrash_protect
    current.unfrozen_pid = unfreeze_something()
  File "./thrash-protect.py", line 505, in unfreeze_something
    log_unfrozen(pid_to_unfreeze)
  File "./thrash-protect.py", line 435, in log_unfrozen
    logfile.write("%s - unfrozen   pid %5s - %s - list: %s\n" % (get_date_string(), str(pid), get_process_info(pid), frozen_pids))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 107-113: ordinal not in range(128)
@tobixen
Copy link
Owner

tobixen commented Mar 15, 2019

Oups, that's a bad one

@tobixen tobixen changed the title crash unicode issues in the logging Mar 15, 2019
@hakavlad
Copy link
Author

see also rfjakob/earlyoom#110

Handle line with unicode:

with open('/proc/' + pid + '/stat', 'rb') as f:
    line = f.read().decode('utf-8', 'ignore')

@tobixen
Copy link
Owner

tobixen commented Mar 15, 2019

We're already doing decode there ... at least in master branch):

        ## TODO: we should fetch this information from /proc filesystem instead of using ps
        info = check_output("ps -p %d uf" % pid, shell = True).decode('utf-8')

@tobixen
Copy link
Owner

tobixen commented Mar 15, 2019

The decode is there in the v0.12.1 revision as well, though it was not there in the original pull-request. It was apparently introduced in pull request #6, 2016. So unless you're running a very old version of thrash-protect, the issue must be somewhere else. I should try and see if I can reproduce the issue.

@hakavlad
Copy link
Author

hakavlad commented Mar 15, 2019

We're already doing decode

...without 'ignore'.

Please use decode('utf-8', 'ignore') instead of decode('utf-8')

you're running a very old version

I used the latest version. I cloned it today.
thrash-protect.py 0.12.1

@tobixen
Copy link
Owner

tobixen commented Mar 15, 2019

I've fixed the ignore, but I don't believe that will help.

Is there any difference if you run it under python2 or python3?

@hakavlad
Copy link
Author

I added 'ignore' and it doesn't help.

Traceback (most recent call last):
  File "./thrash-protect.py", line 560, in <module>
    main()
  File "./thrash-protect.py", line 556, in main
    thrash_protect(args)
  File "./thrash-protect.py", line 528, in thrash_protect
    freeze_something()
  File "./thrash-protect.py", line 451, in freeze_something
    pids_to_freeze = pids_to_freeze or global_process_selector.scan()
  File "./thrash-protect.py", line 389, in scan
    ret = self.collection[self.scan_method_count % len(self.collection)].scan()
  File "./thrash-protect.py", line 337, in scan
    stats = self.readStat(fn)
  File "./thrash-protect.py", line 204, in readStat
    stats_tx = stat_file.readline()
  File "/usr/local/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 21: invalid continuation byte

It needs multiple fixes.

@hakavlad
Copy link
Author

Instead of
stats_tx = stat_file.readline()
maybe need to try
stats_tx = stat_file.read().decode('utf-8', 'ignore').split('\n')

@hakavlad
Copy link
Author

Bad idea: AttributeError: 'str' object has no attribute 'decode'.

@tobixen
Copy link
Owner

tobixen commented Mar 15, 2019

The character set should be given when opening the file. However, as I read the error above it assumes UTF-8, but carps because the file is not valid UTF8.

I will need to try to reproduce this charset problems somehow.

@tobixen
Copy link
Owner

tobixen commented Mar 15, 2019

How does a typical non-ascii /proc/*/stat file look for you?

@tobixen
Copy link
Owner

tobixen commented Mar 15, 2019

the only thing there that is non-numeric is the process name. I'll see if I can spawn up a process locally that has a non-ascii name.

@hakavlad
Copy link
Author

hakavlad commented Mar 15, 2019

it works with python and python3

        with open(sfn, 'rb') as stat_file:
            stats=[]
            stats_tx = stat_file.read().decode('utf-8', 'ignore')

@hakavlad
Copy link
Author

reproduce this charset problems

Rename memory hog to ПАГПАВОНАВЧАОНАЛ and execute it.

@hakavlad
Copy link
Author

reproduce this charset problems

cd /tmp
ln -s /usr/bin/tail ./ЙЦУКЕНГШЩЗХ
./ЙЦУКЕНГШЩЗХ /dev/zero

@hakavlad
Copy link
Author

need to handle next

Traceback (most recent call last):
  File "./thrash-protect.py", line 560, in <module>
    main()
  File "./thrash-protect.py", line 556, in main
    thrash_protect(args)
  File "./thrash-protect.py", line 531, in thrash_protect
    current.unfrozen_pid = unfreeze_something()
  File "./thrash-protect.py", line 505, in unfreeze_something
    log_unfrozen(pid_to_unfreeze)
  File "./thrash-protect.py", line 435, in log_unfrozen
    logfile.write("%s - unfrozen   pid %5s - %s - list: %s\n" % (get_date_string(), str(pid), get_process_info(pid), frozen_pids))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 90-100: ordinal not in range(128)

@tobixen
Copy link
Owner

tobixen commented Mar 15, 2019

I guess the problem is a bit further up ...

with open("/var/log/thrash-protect.log", 'a') as logfile:

tobixen added a commit that referenced this issue Mar 15, 2019
@tobixen
Copy link
Owner

tobixen commented Mar 15, 2019

I believe it should work now. Tested at my end at least, without problems.

@tobixen tobixen closed this as completed Mar 15, 2019
@tobixen
Copy link
Owner

tobixen commented Mar 15, 2019

oh, except now things are breaking under python2

@tobixen
Copy link
Owner

tobixen commented Mar 15, 2019

and now there are test breakages

@tobixen
Copy link
Owner

tobixen commented Mar 16, 2019

unit tests mended, ref c56ba65

@tobixen tobixen closed this as completed Mar 16, 2019
@tobixen
Copy link
Owner

tobixen commented Mar 17, 2019

I've been digging a bit more into this - I'd say it could almost be classified as a kernel bug. In the stat file, long command names are truncated - and in this case, the truncation happens right in a UTF-8 character. The Ш-character is two bytes long in UTF-8, d0a8 - but the truncation happens between d0 and a8. Hence the decode('utf-8') without the ignore-flag will indeed break.

I had forgotten we actually had unit tests in place, I'll make sure to include this process name in the test code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants