id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	blockedby	blocking	branch_state	votes
3843	mcedit, 8-bit terminal, encoding=UTF-8: characters between 0x80 and 0xff are broken	lzsiga	zaytsev	"Hi, this problem is present in mc-4.8.19 (reproduced on Debian/Linux and AIX), it affects the editor.

Problem description:

I use an 8-bit terminal emulator with LC_CTYPE=hu_HU.ISO-8859-2 (for Hungarian), and it works properly. In mcedit's 'Choose encoding' dialog I select 'UTF-8' (as I want to create a file in UTF-8).

Then I type some accented letters in the editor, such as á é ő ű. All four are in ISO-8859-2 (codes e1, e9, f5, fb), but only the first two are also in ISO-8859-1. Their Unicode code points are U+00E1, U+00E9, U+0151, U+0171.

The problem is that only the latter two characters are displayed and stored correctly. For the former two, the editor displays dots, and when saving it writes single bytes (the ISO-8859-2 codes e1, e9) instead of the UTF-8 sequences (c3 a1, c3 a9).
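
For reference, here is a minimal sketch of how a code point in the range U+0080..U+07FF maps to a two-byte UTF-8 sequence. The helper name utf8_encode_2byte is hypothetical and not part of mc; it only illustrates the bytes I would expect in the saved file (U+00E1 becomes c3 a1, U+00E9 becomes c3 a9).

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of mc: encode a code point in the
 * range U+0080..U+07FF as a two-byte UTF-8 sequence.
 */
static void
utf8_encode_2byte (unsigned int cp, unsigned char out[2])
{
    assert (cp >= 0x80 && cp <= 0x7FF);
    out[0] = (unsigned char) (0xC0 | (cp >> 6));    /* 110xxxxx: upper 5 bits */
    out[1] = (unsigned char) (0x80 | (cp & 0x3F));  /* 10xxxxxx: lower 6 bits */
}
```

The same mapping gives c5 91 for U+0151 and c5 b1 for U+0171, the two characters that are already handled correctly.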

I think I found the source of the problem in src/editor/edit.c, line 3559:
{{{
       if (char_for_insertion > 255 && !mc_global.utf8_display)
}}}
This condition skips characters between 128 and 255 even when 'UTF-8' is selected as the file encoding (mc_global.source_codepage == 12 in my case).
The change I suggest is this:
{{{
        if ((char_for_insertion > 255 ||
            (char_for_insertion > 127 && str_isutf8 (get_codepage_id (mc_global.source_codepage)))) &&
            !mc_global.utf8_display)
        {
}}}
I tested it on Linux and AIX in different contexts (8-bit emulator vs. Unicode emulator; 8-bit file encoding vs. Unicode), and it seemed to work in all cases.
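
To make the difference between the two conditions easier to see, here is a standalone sketch of both as plain predicates. The function names are hypothetical, and the parameter source_is_utf8 is an assumed stand-in for the result of str_isutf8 (get_codepage_id (mc_global.source_codepage)); utf8_display stands in for mc_global.utf8_display. It is not the actual code from edit.c.

```c
#include <stdbool.h>

/* Original check: only code points above 255 take the guarded branch. */
static bool
old_condition (int ch, bool utf8_display)
{
    return ch > 255 && !utf8_display;
}

/*
 * Suggested check: code points above 127 also take the guarded branch
 * when the file encoding is UTF-8 (source_is_utf8 is an assumed input).
 */
static bool
new_condition (int ch, bool source_is_utf8, bool utf8_display)
{
    return (ch > 255 || (ch > 127 && source_is_utf8)) && !utf8_display;
}
```

With an 8-bit terminal and a UTF-8 file, old_condition (0xE1, false) is false while new_condition (0xE1, true, false) is true. For an 8-bit file encoding the two checks agree.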

(I admit, the method of checking whether mc_global.source_codepage is UTF-8 or not is a bit clumsy, but I couldn't find a simpler method.)"	defect	closed	major	4.8.20	mcedit	master	fixed	mcedit utf8	egmont			merged	committed-master