Ticket #1629 (closed defect: fixed)
[Patch] Problems displaying UTF-8 manual pages
| Reported by: | egmont | Owned by: | slavazanko | 
|---|---|---|---|
| Priority: | major | Milestone: | 4.8.0-pre1 | 
| Component: | mcview | Version: | 4.7.0-pre3 | 
| Keywords: | Cc: | ||
| Blocked By: | Blocking: | ||
| Branch state: | no branch | Votes for changeset: | commited-master commited-stable | 
Description
Formatted manual pages using UTF-8 charset are handled incorrectly by mc-4.7-pre2.
Problem 1: Accented characters in bold (yellow) text are displayed incorrectly: the letter, followed by a dot, then the letter once again, all in white.
Problem 2: Search does not find text that are bold or underlined (yellow or red) and contain accented letters. Exception: it finds if only the first character is accented, but highlights the match incorrectly.
I attach such a formatted manpage (emitted by groff-utf8) for your convenience - you might have to press F9 in mcview.
Attachments
Change History
Changed 16 years ago by egmont
- Attachment chsh-man-hu-formatted.txt added
Changed 16 years ago by egmont
- Attachment chsh-man-hu-formatted.zip added
Same in .zip so that you can download it
comment:2 Changed 16 years ago by egmont
I attach a proof of concept patch. This patch fixes displaying UTF-8 nroff pages, but probably break non-utf8 ones. It doesn't address the issues with search either.
So this patch is not to be applied in its current form, it needs more work. It just shows where the bugs are: "from - 2" obviously assumes single byte locales, so does mcview_get_byte to get c_next, as well as g_ascii_isprint (there's a fixme entry about it somewhere else in that file).
Changed 16 years ago by egmont
- Attachment mc-4.7-pre2-utf8-nroff-proof-of-concept.patch added
proof of concept only - not a ready patch
comment:3 Changed 16 years ago by slavazanko
- Status changed from new to accepted
- Owner set to slavazanko
- severity changed from no branch to on rework
- Milestone changed from 4.7 to 4.7.0-pre4
created branch 1629_nroff_utf8 (parent: master)
changeset:7adcdc570a07353ad64bc017352eb245dff3753b
Please, test it. I'm tested on:
*) with option --enable-charset in utf-8 codepage
*) with option --enable-charset in one-byte codepage
*) with option --disable-charset in utf-8 codepage (old behavior like no patch)
*) with option --disable-charset one-byte codepage (old behavior like no patch)
With '--disable-charset' option mans looks as no patch present.
Keep here your comments about this, please.
comment:4 Changed 16 years ago by egmont
Haven't tested, but the patch looks good. One question, though:
What's the point in reverting c_prev's assignment for the "no charset" case? I think my version is cleaner here: the source is shorter (you save one mcview_get_byte() call and one #ifdef), plus, the charset and the no charset code becomes much more similar to each other. I think having two different code paths for setting c_prev makes the code harder to understand and maintain.
comment:5 Changed 16 years ago by slavazanko
- Version changed from 4.7.0-pre2 to 4.7.0-pre3
- severity changed from on rework to on review
Branch rebased. Also, fixed c_prev's assignment
initial changeset:95311b5e5a0cc8d0c4b79a93dabe906319fe533c
Review, Please.
comment:6 Changed 16 years ago by egmont
I cannot compile&run right now, but thoroughly looking at the patch it looks good. If you've tested, please go ahead and apply, thanks!
Please don't forget that it only addresses the first issue in this bug, not the second one. That one's still open.
comment:7 Changed 16 years ago by slavazanko
- severity changed from on review to on rework
yep, i forgot about second part of bugreport :)
comment:9 Changed 16 years ago by egmont
Friendly ping... What happened to the changesets created by slavazanko? They are no longer accessible.
For your convenience I attach an updated version of my patch (rebased on 4.7.0.2).
Even though this patch only fixes the first part of the bug, I'd appreciate if you went ahead and submitted the fix. Later we can build on top of this and fix searching in manpages too.
comment:10 Changed 16 years ago by egmont
- Summary changed from Problems displaying UTF-8 manual pages to [Patch] Problems displaying UTF-8 manual pages
Ping...
You nearly submitted the patch 6 months ago, and since then nothing happened. Moreover, your changesets are no longer available - why?
I'd be happy to work on the second part of the story, but only if the first part is already submitted - I don't have time to waste for creating patches if they are ignored by the developers. Thanks for your understanding.
comment:11 Changed 14 years ago by slavazanko
- severity changed from on rework to on review
- Milestone changed from 4.7 to 4.8.0-pre1
Recreated branch changeset:1629_nroff_utf8.
Initial changeset:f5055c78684694a7a81575c30ee9815927b81d4c
Branch have resolutions of two issues: 
- show UTF-8 symbols in nroff'ed text and
- UTF-8 search over nroff'ed text.
Review, please
comment:15 Changed 14 years ago by andrew_b
- Votes for changeset changed from angel_il to angel_il andrew_b
- severity changed from on review to approved
comment:16 Changed 14 years ago by slavazanko
- Status changed from accepted to testing
- Votes for changeset changed from angel_il andrew_b to commited-master
- Resolution set to fixed
- severity changed from approved to merged
- Blocking 265 removed
Merge changeset:d7fd84507c063adb79b02b544d07203b73518831
Commits:
git log --pretty=oneline 7d52669..79660a7
comment:17 Changed 14 years ago by slavazanko
- Status changed from testing to closed
- Keywords stable-candidate removed
- Votes for changeset changed from commited-master to commited-master commited-stable
- Branch state set to no branch
cherry-picked in stable branch:
git log --pretty=oneline 84cb81e5..52047442


Formatted UTF-8 manual page