lib/vfs testsuite failures on illumos #3972

mc-butler · 2019-02-23T17:31:48Z

Important

This issue was migrated from Trac:

Origin	https://midnight-commander.org/ticket/3972
Reporter	`mnowak` (@Mno-hime)

Running test suite from v4.8.22 on OpenIndiana 2018.10 (illumos kernel) with changeset from #3971 comment 1 I get following failures in the lib/vfs testsuite:

PASS: canonicalize_pathname
PASS: current_dir
FAIL: path_cmp
FAIL: path_len
FAIL: path_manipulations
FAIL: path_serialize
PASS: relative_cd
PASS: tempdir
PASS: vfs_adjust_stat
PASS: vfs_parse_ls_lga
PASS: vfs_path_from_str_flags
FAIL: vfs_path_string_convert
PASS: vfs_prefix_to_class
PASS: vfs_setup_cwd
PASS: vfs_split
PASS: vfs_s_get_path
PASS: path_recode
PASS: vfs_get_encoding
============================================================================
Testsuite summary for /lib/vfs
============================================================================
# TOTAL: 18
# PASS:  13
# SKIP:  0
# XFAIL: 0
# FAIL:  5
# XPASS: 0
# ERROR: 0
============================================================================
See tests/lib/vfs/test-suite.log
============================================================================

test-suite.log file is attached.

Let me know should you need information about the environment.

Note

Original attachments:

test-suite.log (raw) by mnowak (@Mno-hime) on Feb 23, 2019 at 17:33 UTC
test-suite.2.log (raw) by andypost (apostnikov@….com) on Jan 23, 2020 at 16:03 UTC
vfs_path_string_convert.log (raw) by andypost (apostnikov@….com) on Jan 23, 2020 at 16:03 UTC
openindiana.patch (raw) by zaytsev (@zyv) on Sep 4, 2024 at 18:36 UTC
Screenshot 2024-09-13 at 12.33.52.png (raw) by zaytsev (@zyv) on Sep 13, 2024 at 10:53 UTC
Screenshot 2024-09-13 at 12.37.54.png (raw) by zaytsev (@zyv) on Sep 13, 2024 at 10:54 UTC
Screenshot 2024-09-13 at 12.37.29.png (raw) by zaytsev (@zyv) on Sep 13, 2024 at 10:54 UTC
Screenshot 2024-09-13 at 19.53.13.png (raw) by zaytsev (@zyv) on Sep 13, 2024 at 18:05 UTC
edit_printable.patch (raw) by andrew_b (@aborodin) on Sep 14, 2024 at 12:45 UTC
edit_printable_2.patch (raw) by andrew_b (@aborodin) on Sep 14, 2024 at 13:35 UTC
aliases.patch (raw) by zaytsev (@zyv) on Sep 15, 2024 at 9:53 UTC
0001-buildsys-fix-bootstrapped-build-on-Solaris-and-syste.patch (raw) by zaytsev (@zyv) on Oct 17, 2024 at 7:17 UTC

The text was updated successfully, but these errors were encountered:

mc-butler · 2019-02-23T17:33:47Z

Changed by mnowak (@Mno-hime) on Feb 23, 2019 at 17:33 UTC

Attachment test-suite.log (raw) added

mc-butler · 2019-02-24T10:37:44Z

Changed by andrew_b (@aborodin) on Feb 24, 2019 at 10:37 UTC (comment 1)

Well, I can see three kinds of test result in the attached log:
1) passed
2) failed
3) aborted

2) and 3) are charset related. What is your locale?

First of all I'm interesting aborted tests. Since I'm not an OpenIndiana user I'm unable to reproduce these errors.

mc-butler · 2020-01-23T16:03:07Z

Changed by andypost (apostnikov@….com) on Jan 23, 2020 at 16:03 UTC

Attachment test-suite.2.log (raw) added

mc-butler · 2020-01-23T16:03:31Z

Changed by andypost (apostnikov@….com) on Jan 23, 2020 at 16:03 UTC

Attachment vfs_path_string_convert.log (raw) added

mc-butler · 2020-01-23T16:06:47Z

Changed by andypost (apostnikov@….com) on Jan 23, 2020 at 16:06 UTC (comment 2)

I'm trying to enable tests on Alpinelinux and getting 2 failures in vfs_path_string_convert.c

Ref https://gitlab.alpinelinux.org/alpine/aports/merge_requests/3243

FAIL: path_len
FAIL: vfs_path_string_convert

mc-butler · 2024-09-03T08:41:25Z

Changed by zaytsev (@zyv) on Sep 3, 2024 at 8:41 UTC (comment 3)

Owner set to zaytsev
Status changed from new to accepted

Mentioned Alpine problems are due to musl iconv: #4495 .

Unfortunately, Cfarm doesn't have an Illumos / OpenIndiana host. I see that mc 4.8.31 is packaged. What is the current situation there, do the tests pass? Is there a development shell host one could use? Any build/test logs? What is the libc like, who provides iconv? Not sure if these failures are due to (to be fixed) mocking things, or iconv.

Maybe a good approximation is to check what is the situation on Solaris.

mc-butler · 2024-09-03T08:49:43Z

Changed by mnowak (@Mno-hime) on Sep 3, 2024 at 8:49 UTC (comment 4)

I am not involved with the project anymore, but you can get a Vagrant Box from https://app.vagrantup.com/openindiana/boxes/hipster.

mc-butler · 2024-09-03T09:05:01Z

Changed by zaytsev (@zyv) on Sep 3, 2024 at 9:05 UTC (comment 5)

Thanks for the tip, do you happen to know if it's possible to download this somehow and import into UTM instead of setting up Vagrant? UTM is an interface for QEMU, so I'd need a qcow2 image and information about the machine configuration. Apparently recently a plugin for Vagrant was developed to use UTM behind the scenes, but if I don't have to install lots of stuff, that would be great...

mc-butler · 2024-09-03T09:21:10Z

Changed by mnowak (@Mno-hime) on Sep 3, 2024 at 9:21 UTC (comment 6)

The libvirt provider of the Box might be a QCOW2 underneath, but likely won't boot as such if extracted and imported.

There are also installation images https://openindiana.org/downloads/; even the minimal image has a small runtime environment with shell, but if you need to install a package to debug things, you'd rather install the whole OS and go from there.

mc-butler · 2024-09-03T09:29:21Z

Changed by zaytsev (@zyv) on Sep 3, 2024 at 9:29 UTC (comment 7)

Thanks again! That looks good so far. Is there any way to know how the machine is configured? Vargantfile has only the following inside:

{"architecture":"amd64","disks":[{"format":"qcow2","path":"box_0.img"}],"provider":"libvirt"}

Like memory, CPU, devices...

mc-butler · 2024-09-03T09:44:39Z

Changed by mnowak (@Mno-hime) on Sep 3, 2024 at 9:44 UTC (comment 8)

The configuration is likely sourced from https://github.com/OpenIndiana/oi-packer.

mc-butler · 2024-09-03T10:44:41Z

Changed by zaytsev (@zyv) on Sep 3, 2024 at 10:44 UTC (comment 9)

Perfect, I've got it to work - the configuration is here:

https://github.com/OpenIndiana/oi-packer/blob/master/hipster.pkr.hcl

Default PC, 4G RAM, AMD64.

Sadly, everything is extremely slow on ARM, but apparently OI doesn't support ARM natively, so emulation is the only option.

But how do I get in on the console? jack/jack & root/openindiana are not working!

mc-butler · 2024-09-03T10:59:21Z

Changed by mnowak (@Mno-hime) on Sep 3, 2024 at 10:59 UTC (comment 10)

Try, root/vagrant https://github.com/OpenIndiana/oi-packer/blob/master/hipster.pkr.hcl#L16?

mc-butler · 2024-09-04T18:35:58Z

Changed by zaytsev (@zyv) on Sep 4, 2024 at 18:35 UTC (comment 11)

Hey, not sure how I have missed it. You were of great help, thank you! I guess I should add some notes on how to test on OpenIndiana on our wiki. I was still able to reproduce test failures. There are two types of problems:

Relates to our vfs scripts and test suite not compatible with ksh, because of the local keyword - see attached patch - seems uncomplicated to fix
Everything that has to do with charset conversion fails - with --disable-charset all test pass (surprisingly!) - which seems to be the same issue as with Alpine / musl.

The charset stuff is annoying, because to my understanding we are actually not using it directly, but pulling it from glib, so either we do something wrong, or glib iconv is broken on these platforms.

I guess I need to look into small reproducer. In the worst case we should include it in autoconf tests and disable charsets if this test fails, because then mc is not going to work correctly.

mc-butler · 2024-09-04T18:36:41Z

Changed by zaytsev (@zyv) on Sep 4, 2024 at 18:36 UTC

Attachment openindiana.patch (raw) added

mc-butler · 2024-09-04T19:04:09Z

Changed by mnowak (@Mno-hime) on Sep 4, 2024 at 19:04 UTC (comment 12)

You are welcome. iconv might be different on illumos distributions like OpenIndiana (source from https://github.com/illumos/illumos-gate/tree/master/usr/src/cmd/iconv) and might do a few things differently to more common GNU libinconv.

mc-butler · 2024-09-05T06:37:28Z

Changed by zaytsev (@zyv) on Sep 5, 2024 at 6:37 UTC (comment 13)

I have tested iconv on the command line and it seems to convert between UTF-8 and KOI8-R used in the tests just fine. All aliases also available. I was even able to read the file in translation mode with mcview coming from the shipped mc! So at least for that the conversions are working correctly.

So I started reading path_len.c and following the functions with HAVE_CHARSET, and I actually don't quite get how it works. Apparently the idea was that the elements of the path are converted from terminal encoding into whatever #enc says they should be encoded in memory, and the encoding is also set to what enc says:

https://github.com/MidnightCommander/mc/blob/master/lib/vfs/vfs.c#L170

But then apparently it was supposed to stay UTF-8 (or rather terminal encoding) in vpath->str when all elements are joined again?

For me on macOS this conversion doesn't do anything though (WHAT?!), it seems that the element stays encoded as UTF-8. But apparently as there is a bug in the conversion back, which also doesn't happen, so the end result is correct and the test pass.

It seems that on OpenIndiana/Illumos and Alpine/musl it actually DOES the conversion to KOI8-R as intended, but due to missing conversion back the test fails.

Andrew, can you help, at least tell me how it was supposed to work before I debug further? This is very confusing.

mc-butler · 2024-09-05T06:39:10Z

Changed by zaytsev (@zyv) on Sep 5, 2024 at 6:39 UTC (comment 14)

Blocking set to #4495

mc-butler · 2024-09-05T09:45:59Z

Changed by zaytsev (@zyv) on Sep 5, 2024 at 9:45 UTC (comment 15)

On Illumos: term_encoding: 646 (!?)

$ iconv -l
The following are all supported code set names.  All combinations
of those names are not necessarily available for the pair of the
fromcode-tocode.  Some of those code set names have aliases, which
are case-insensitive and described in parentheses following the
canonical name:

    5601,
    646 (ASCII, US-ASCII, US_ASCII, USASCII),

With the following patch:

diff --git a/tests/lib/vfs/path_len.c b/tests/lib/vfs/path_len.c
index 6bab6f551..24beca8ac 100644
--- a/tests/lib/vfs/path_len.c
+++ b/tests/lib/vfs/path_len.c
@@ -42,7 +42,7 @@
 static void
 setup (void)
 {
-    str_init_strings (NULL);
+    str_init_strings ("UTF-8");
 
     vfs_init ();
     vfs_init_localfs ();

The tests work on Illumos. But for any single bit (KOI8-R or ASCII) seems to break.

On macOS: term_encoding: US-ASCII, but also forcing "UTF-8" works.

mc-butler · 2024-09-05T14:18:05Z

Changed by andrew_b (@aborodin) on Sep 5, 2024 at 14:18 UTC (comment 13.16)

Replying to zaytsev:

Andrew, can you help, at least tell me how it was supposed to work before I debug further? This is very confusing.

Sorry :-(

mc-butler · 2024-09-05T15:57:11Z

Changed by zaytsev (@zyv) on Sep 5, 2024 at 15:57 UTC (comment 17)

Cc set to slavazanko

I see, so the whole VFS magic including recoding was Slava's work, and you don't know how this #enc: stuff was actually supposed to work? Slava, are you alive :) ? Can you say something?

mc-butler · 2024-09-07T07:20:25Z

Changed by zaytsev (@zyv) on Sep 7, 2024 at 7:20 UTC (comment 18)

So I traced the functions again on macOS, and I was initially wrong about conversion not happening there which confused me. Everything is working as expected (?):

vpath = vfs_path_from_str_uri_parser (path);
- Here, deep inside, the conversion from UTF-8 to KOI8-R in-memory representation is happening in _vfs_translate_path
vpath->str = vfs_path_to_str_flags (vpath, 0, flags);
- Here the opposite conversion from KOI8-R memory representation to UTF-8 is happening using str_vfs_convert_from

It seems that there is some problem with vfs_path_to_str_flags on Illumos, which apparently doesn't have to do with iconv, but with detecting terminal encoding and the need to convert back. Or maybe the conversion fails and this is not handled correctly. In any case, the problem is that the conversion from UTF-8 to KOI8-R happens correctly in vfs_path_from_str_uri_parser, but the conversion back to UTF-8 doesn't happen.

If I force terminal encoding to UTF-8 with str_init_strings ("UTF-8") as shown above, then both conversions are happening correctly like on macOS.

I wonder if I can get any further with a debugger...

mc-butler · 2024-09-07T10:43:03Z

Changed by zaytsev (@zyv) on Sep 7, 2024 at 10:43 UTC (comment 19)

gdbserver is not supported on Solaris, and gdbgui needed fixing:

Fix address resolution on Solaris by specifying the protocol miguelgrinberg/Flask-SocketIO#2088
Fix setting echo attribute in ptylib on Unix98 systems cs01/gdbgui#491

And best of all, the tests pass under graphical debugger - probably because it's run inside Python and stuff is set to UTF there :(

I have the feeling that g_iconv_open (codeset, from_enc) where codeset="646" and from_enc="KOI8-R" in strutil.c:269 returns something that doesn't do anything, which is weird, because the conversion in the forward direction works.

mc-butler · 2024-09-11T08:48:56Z

Changed by zaytsev (@zyv) on Sep 11, 2024 at 8:48 UTC (comment 20)

I think I've figured out what the problem is with Illumos and Alpine. The issues are not exactly the same, but the root cause is the same.

In the VFS library we have conversions that go in two directions:

From terminal encoding to the desired encoding
- Handled by the vfs_translate_path function
From desired encoding to terminal encoding
- Handled by str_vfs_convert_from function

Both functions return estr_t, which can be ESTR_SUCCESS, ESTR_PROBLEM (some characters could not be translated, so it's not possible to reverse the operation) and ESTR_FAILURE (conversion impossible due to wrong / unsupported encoding).

The result of the forward conversion vfs_translate_path is somehow checked and returns a NULL pointer on failure, which is supposed to be checked (hopefully?), but on problems it doesn't do anything and just continues.

The result of the backward conversion (str_vfs_convert_from) is simply ignored!

`path_len` test

This test is started with str_init_strings (NULL), which means US-ASCII on macOS and equivalents elsewhere. The HAVE_CHARSET part contains a raw UTF-8 string that should be recoded as KOI8-R for the file system.

Linux and macOS

iconv just interprets the UTF-8 string as bytes of US-ASCII and "converts" them from bytes of KOI8-R without changing anything.

The problem happens during the conversion from KOI8-R to US-ASCII and is signaled as ESTR_PROBLEM, but it's not a hard error, just some characters can't be translated. And what iconv does is just leave the string as it is.

Since the result of the backward conversion is ignored, everything matches.

Solaris

Here iconv actually decides to do a forward conversion because it can and it's valid, even though it probably shouldn't. So the path element is now a valid KOI8-R string.

The backward conversion from KOI8-R to US-ASCII fails, but iconv leaves the string as it is and ignores the result, so the KOI8-R string is returned.

The length obviously doesn't match and the test fails.

Alpine

Similar to Solaris, but instead of returning the string unchanged, iconv returns an empty string, so path ends up being /#enc:KOI8-R and the assertion fails:

Assertion 'actual_length == data->expected_length' failed: actual_length == 12, data->expected_length == 38

Conclusion

This particular test is simply broken and only works on Linux and macOS by (bad) luck. It should be initialized with str_init_strings ("UTF-8") and this would make the test work correctly on all systems.

I haven't analyzed other tests, but probably they can be fixed somehow.

My question is, what do we do about error handling? It doesn't sound like a good idea to just ignore problems. At the very least, it leads to buggy tests working by accident without anyone noticing. But there are also segfaults on Alpine.

I think maybe it was designed that way because the whole recoding story was for like Linux (KOI8-R) systems accessing Windows FTP (CP1251) with wrong encoding returned or something, so in the worst case the errors mean the file list is garbage or something, and there was actually no intention to propagate errors.

I think we should at least not ignore ESTR_FAILURE on reverse conversion and return a string of ??? of the same length as the input one, so nothing segfaults.

I'm also not sure about returning a NULL pointer on forward conversion. I see some NULL checks, but many seem to be missing. Maybe also return a string of ??? instead?

We can then test for invalid conversions.

Andrew, what do you think?

mc-butler · 2024-09-11T10:16:46Z

Changed by andrew_b (@aborodin) on Sep 11, 2024 at 10:16 UTC (comment 20.21)

Replying to zaytsev:

This particular test is simply broken and only works on Linux and macOS by (bad) luck. It should be initialized with str_init_strings ("UTF-8")

Because test strings are in UTF-8?

and this would make the test work correctly on all systems.

My question is, what do we do about error handling? It doesn't sound like a good idea to just ignore problems.

Definitely, an error handling should be done.

I think we should at least not ignore ESTR_FAILURE on reverse conversion and return a string of ??? of the same length as the input one, so nothing segfaults.

I'm also not sure about returning a NULL pointer on forward conversion. I see some NULL checks, but many seem to be missing. Maybe also return a string of ??? instead?

Probably, in cases of ESTR_FAILURE return of an input string is a workable solution. But if it hides a conversion problem from the caller, it's better to return NULL and a caller should decide to do then.

We can then test for invalid conversions.

mc-butler · 2024-09-11T18:01:14Z

Changed by zaytsev (@zyv) on Sep 11, 2024 at 18:01 UTC (comment 22)

Because test strings are in UTF-8?

Yes.

Suppose you want to test the correctness of the conversions from the terminal encoding to the desired encoding and back at a very superficial level. You need to provide a path that will be encoded differently than the terminal encoding, but the conversion should be lossless.

For example, you can provide the test string in UTF-8 (which will be your terminal encoding) and say that a part of it should be converted to KOI8-R: /#enc:KOI8-R/тестовый/путь. You can now check that after the roundtrip the path has the correct length in bytes (38). You should also check that /тестовый/путь takes 14 bytes in KOI8-R and not 26 bytes. But this was never checked, and so no one has found out that the test is wrong.

The thing is, you have to do str_init_strings ("UTF-8") for the code to work correctly (UTF-8 -> KOI8-R -> UTF-8). If you do str_init_strings (NULL), it will assume that the terminal encoding is ASCII and your string is in ASCII, but it's actually in UTF-8. What happens then depends on how exactly iconv fails, but because we don't do proper error handling, the roundtrip actually works on Linux and macOS as no conversion happens and the raw bytes are just preserved. On Solaris and musl, different things happen, but at least one of the two conversions fails.

So we can either fix the test by adding a missing assertion and setting the encoding to UTF-8, which it is, and then it will work everywhere. Or we can choose a different pair of encodings.

For example, hardcode bytes to CP1251 and do a CP1251 -> KOI8-R -> CP1251 conversion. But this might not be so good for length checks, because both are singlebyte encodings. Unfortunately, the only multibyte encoding we support is UTF-8, because the whole path handling depends on ASCII symbols like / being encoded as single-byte ASCII. So I think it's fine to keep UTF-8. It's actually a good test, and single byte conversions should be tested elsewhere, like `vfs_path_string_convert'.

I have started a branch to fix the tests. Unfortunately, the following tests still need to be investigated (some of them may fail due to broken mocks, see #4584):

FAIL: path_cmp
FAIL: path_manipulations
FAIL: path_serialize
FAIL: vfs_path_string_convert
FAIL: edit_complete_word_cmd

mc-butler · 2024-09-12T09:05:33Z

Changed by zaytsev (@zyv) on Sep 12, 2024 at 9:05 UTC (comment 23)

path_cmp - fixed in the same way as path_len.

path_manipulations & path_serialize have the same problem: for some reason I don't understand yet, #enc:KOI8-R is dropped when manipulating the path (in vfs_path_append_vpath_new and so vfs_path_to_str_flags).

Setting terminal encoding to UTF-8 or a valid Cyrillic encoding like CP1251 fixes the tests.

mc-butler · 2024-09-12T18:09:36Z

Changed by zaytsev (@zyv) on Sep 12, 2024 at 18:09 UTC (comment 24)

Ok, I think I know why this happens. On Solaris, g_iconv_open ("ASCII", "KOI8-R") fails because this conversion is considered invalid. On macOS and apparently Linux, this converter can be created, and as long as the compatible subset is given as strings, it even works.

The problem is the lack of error handling in the str_crt_conv_from function in the VFS. At other call sites there are some cases where errors are handled. But I don't know how to add error handling there. My idea is to keep fixing tests and then make a list of places where error handling is missing, so this went unnoticed.

mc-butler · 2024-09-13T08:43:33Z

Changed by zaytsev (@zyv) on Sep 13, 2024 at 8:43 UTC (comment 25)

vfs_path_string_convert segfaults, because the NULL pointer returned in vfs_translate_path_n is simply ignored and later explodes as element->path is NULL:

$ pstack core 
core 'core' of 10170:   ./vfs_path_string_convert
 0000000000427cd0 vfs_path_to_str_flags () + 15f2
 00000000004280b5 vfs_path_from_str_flags () + b2
 000000000042b05d test_from_to_string_fn () + 3d
 00007fffac43945c srunner_run_tagged () + 55c
 0000000000424808 mctest_run_all () + 5f
 000000000042b510 main () + b3
 0000000000424687 _start_crt () + 87
 00000000004245e8 _start () + 18

edit_complete_word_cmd segfaults due to missing error handling:

$ pstack core 
core 'core' of 19717:   ./edit_complete_word_cmd
 00007fffaf379c05 iconv_close (ffffffffffffffff) + 15
 00000000004f1af9 str_close_conv () + 28
 00000000004ea422 str_nconvert_to_display () + 90
 00000000004656a9 edit_collect_completion_from_one_buffer () + 2cd
 0000000000465972 edit_collect_completions () + 1c4
 0000000000466104 edit_complete_word_cmd () + 2d5
 0000000000455319 test_autocomplete_fn () + c1
 00007fffac43945c srunner_run_tagged () + 55c
 00000000004545e8 mctest_run_all () + 5f
 000000000045583e main () + 8a
 0000000000454487 _start_crt () + 87
 00000000004543e8 _start () + 18

mc-butler · 2024-09-13T09:32:23Z

Changed by andrew_b (@aborodin) on Sep 13, 2024 at 9:32 UTC (comment 25.26)

Replying to zaytsev:

edit_complete_word_cmd segfaults due to missing error handling:
 00000000004ea422 str_nconvert_to_display () + 90

I've added a commit with fix of that.

mc-butler · 2024-09-15T09:55:45Z

Changed by zaytsev (@zyv) on Sep 15, 2024 at 9:55 UTC (comment 39)

I looked a bit more into implementing aliases support. As I suggested, there are two ways to do it:

Parse file into existing data structure (id & name)
Parse file into new data structure (id, aliases & name)

I tried the second option, which I think is the only one that makes things better, but it just seems too crazy:

My understanding is that we need to check all usages of get_codepage_index and get_codepage_id and the variables where the results are stored. Every place where some comparisons are done needs to be replaced with a new function like codepage_equals which takes the aliases into account. I think it's about ~100 places to check.

So I tried the first option just to see how it feels and came up with a patch (I know it's dirty, just to show what I mean!). I'm not sure if it makes things any better compared to manual entries in the file. You need some kind of way to disambiguate the descriptions for aliases, so I added the id in parentheses. None of this feels like it's worth it... except maybe if you like my gio / glib conversion instead of the usual C-style manual parsing.

P.S. I also threw away the default thing, which I think was never used.

mc-butler · 2024-09-15T14:02:20Z

Changed by andrew_b (@aborodin) on Sep 15, 2024 at 14:02 UTC (comment 40)

Aliases mean that we will have several codepages assigned to one menu item. We don't want several menu items with same text, are we? If one menu item will cover several codepages we should somehow try them one by one to found workable convertor.

mc-butler · 2024-09-15T15:40:15Z

Changed by zaytsev (@zyv) on Sep 15, 2024 at 15:40 UTC (comment 41)

Aliases mean that we will have several codepages assigned to one menu item.

Yes, this is the option (2) described in comment:39. But how to do this? Do you have any idea how to avoid going through all the call sites? I will not manage to rewrite this in several months. Parsing turned out to be the "easy" part...

Maybe we can do more hacks like the Solaris one, even though it's disgusting? I think not many code pages are affected. I have started to make a list of differences between platforms.

mc-butler · 2024-09-24T10:09:40Z

Changed by zaytsev (@zyv) on Sep 24, 2024 at 10:09 UTC

Blocking changed from #4495 to #4495, #4591

mc-butler · 2024-09-24T10:21:21Z

Changed by zaytsev (@zyv) on Sep 24, 2024 at 10:21 UTC (comment 43)

Branch state changed from no branch to on review
Milestone changed from Future Releases to 4.8.33
Cc slavazanko deleted

Branch: 3972_fix_illumos
Initial [c4a29da290dd6ebf6fb88071f9e6456820e92f8c]

I have rebased everything, cleaned up the patches and went for the encoding name detection. The other options are just an insane amount of work with the current code base. Follow-up ticket is #4591. Maybe one day someone could have a look.

mc-butler · 2024-09-30T08:03:54Z

Changed by zaytsev (@zyv) on Sep 30, 2024 at 8:03 UTC (comment 44)

It seems that there is still one test, that now fails on Linux :( The paths are dropped due to missing error handling as explained in #4591.

FAIL: vfs_path_string_convert

mc-butler · 2024-09-30T09:16:17Z

Changed by zaytsev (@zyv) on Sep 30, 2024 at 9:16 UTC (comment 45)

The test passes on Linux if I add CP866 in addition to IBM866 to mc.charsets for tests.

IBM866   CP 866
CP866    CP 866

What is this evil magic again :( I thought that iconv translations were independent of the display stuff, but somehow there is still a relationship.

mc-butler · 2024-10-01T07:04:57Z

Changed by andrew_b (@aborodin) on Oct 1, 2024 at 7:04 UTC (comment 45.46)

Replying to zaytsev:

The test passes on Linux if I add CP866 in addition to IBM866 to mc.charsets for tests.
IBM866   CP 866
CP866    CP 866
What is this evil magic again :( I thought that iconv translations were independent of the display stuff, but somehow there is still a relationship.

The test is passed if encoding in mc.charsets is the same as in vfs_path_string_convert.c.

sed -i -e 's/CP866/IBM866/' vfs_path_string_convert.c

makes the test passed with IBM866 in mc.charsets.

mc-butler · 2024-10-01T08:24:36Z

Changed by zaytsev (@zyv) on Oct 1, 2024 at 8:24 UTC (comment 47)

The test is passed if encoding in mc.charsets is the same as in vfs_path_string_convert.c.

Yes, I understand that. My question is WHY?

I found the answer for initialisation (codeset must match), and figured it's impossible for me to fix without rewriting mc. But I was hoping that conversions would be independent of this list of encodings: The iconv function works with both CP866 and IBM866.

The only suspicious place I've found is is_supported_encoding, but when I set it to return true it still doesn't work. There seems to be some other hidden connection.

mc-butler · 2024-10-03T09:40:39Z

Changed by zaytsev (@zyv) on Oct 3, 2024 at 9:40 UTC (comment 48)

It is the is_supported_encoding function! Apparently the source files were not rebuilt properly the last time I checked, so this confused me. This function determines if encoding is supported by looking at the charsets table, which I think is wrong. My idea would be to try creating a converter with iconv instead, and if it works - great, if not, the encoding is not supported.

diff --git a/lib/charsets.c b/lib/charsets.c
index ccaf4f6ae..67553fa62 100644
--- a/lib/charsets.c
+++ b/lib/charsets.c
@@ -267,17 +267,10 @@ get_codepage_index (const char *id)
 gboolean
 is_supported_encoding (const char *encoding)
 {
-    gboolean result = FALSE;
-    guint t;
-
-    for (t = 0; t < codepages->len; t++)
-    {
-        const char *id;
-
-        id = ((codepage_desc *) g_ptr_array_index (codepages, t))->id;
-        result |= (g_ascii_strncasecmp (encoding, id, strlen (id)) == 0);
-    }
-
+    GIConv coder = str_crt_conv_from (encoding);
+    gboolean result = (coder != INVALID_CONV) ? TRUE : FALSE;
+    if (result)
+        str_close_conv (coder);
     return result;
 }

But I have the next problem: canonicalize_pathname test fails :(

/b/#enc:utf-8/../c != /c
/b/c/#enc:utf-8/.. != /b

Other test cases pass...

mc-butler · 2024-10-03T09:52:36Z

Changed by zaytsev (@zyv) on Oct 3, 2024 at 9:52 UTC (comment 49)

This seems to be a bug in the code: the old version of is_supported_encoding would ignore everything in the encoding name if a partial match was found. The new version, of course, wants a full match. The code in these test cases passes utf-8/../c as the encoding name, and the new function returns false.

I think a partial match is a bad idea. It could allow old code to accept completely invalid encoding names like koi8-rus. It wouldn't be removed from the path even though it's invalid and unsupported.

mc-butler · 2024-10-03T11:32:11Z

Changed by zaytsev (@zyv) on Oct 3, 2024 at 11:32 UTC (comment 50)

Suggested fix in [3ef535f8b48e850f7ef465c2dee36e9cde3f55b6] . Please review.

mc-butler · 2024-10-03T18:25:12Z

Changed by andrew_b (@aborodin) on Oct 3, 2024 at 18:25 UTC (comment 50.51)

Replying to zaytsev:

Suggested fix in [3ef535f8b48e850f7ef465c2dee36e9cde3f55b6] . Please review.

Fixes:

coding style;
memory leaks in canonicalize_pathname_custom() due to usage of vfs_get_encoding();

vfs_get_encoding() was move to the "public functions" section of file.

mc-butler · 2024-10-03T19:02:30Z

Changed by zaytsev (@zyv) on Oct 3, 2024 at 19:02 UTC (comment 52)

Thank you very much! I didn't realise that this function allocates a new string that needs to be freed :(

What about the rest? I think we fixed some bugs, but the aliases and error handling stuff are more like design issues. I have created a follow up if anyone wants to address this one day. As far as I'm concerned, this one is done.

mc-butler · 2024-10-05T08:26:03Z

Changed by andrew_b (@aborodin) on Oct 5, 2024 at 8:26 UTC (comment 53)

Branch state changed from on review to approved
Votes set to andrew_b

tests: vfs_path_string_convert - use UTF-8 to prevent creation of invalid converters

tests: path_manipulations / path_serialize - use UTF-8 to prevent creation of invalid converters

can be squashed.

mc-butler · 2024-10-05T09:51:17Z

Changed by zaytsev (@zyv) on Oct 5, 2024 at 9:51 UTC (comment 54)

Resolution set to fixed
Votes changed from andrew_b to committed-master
Branch state changed from approved to merged
Status changed from accepted to testing

Merged as [a24d45f] . Wow!

mc-butler · 2024-10-08T06:54:35Z

Changed by andrew_b (@aborodin) on Oct 8, 2024 at 6:54 UTC (comment 55)

An aftershok.

if ! xgettext -h 2>&1 | grep -e '--keyword=' >/dev/null ; then

This line in autogen.sh is not quite correct. The help of program is translated and may not matched the pattern '--keyword='. Particularly, in Russian

$ xgettext -h 2>&1 | grep -e '--keyword='
$

and po-related files are not created. Because

$ xgettext -h 2>&1 | grep -e '--keyword'
  -k, --keyword[=СЛОВО]       дополнительное ключевое слово для поиска
  -k, --keyword               не использовать ключевые слова по умолчанию

mc-butler · 2024-10-08T07:11:47Z

Changed by zaytsev (@zyv) on Oct 8, 2024 at 7:11 UTC (comment 56)

Oh, this is annoying, sorry.

What do you think is the best way to fix it? I wasn't happy about it anyway, because of course the text might go unnoticed, but I think the situation before, with random errors, was even worse. Do you have a better idea?

grep -e '--keyword' without =
LANG=C
Print it in red using ANSI sequences?
Print to stderr?
Go to the beginning of the script and exit 1?

mc-butler · 2024-10-08T07:37:12Z

Changed by andrew_b (@aborodin) on Oct 8, 2024 at 7:37 UTC (comment 56.57)

LANG=C xgettext --help: output is in Russian.
LC_MESSAGES=C xgettext --help: output is in English

mc-butler · 2024-10-17T07:17:09Z

Changed by zaytsev (@zyv) on Oct 17, 2024 at 7:17 UTC

Attachment 0001-buildsys-fix-bootstrapped-build-on-Solaris-and-syste.patch (raw) added

mc-butler · 2024-10-17T07:17:38Z

Changed by zaytsev (@zyv) on Oct 17, 2024 at 7:17 UTC (comment 58)

Unfortunately, I managed to also break a bootstrapped build on Solaris. Ok to commit to master?

mc-butler · 2024-10-17T09:53:21Z

Changed by andrew_b (@aborodin) on Oct 17, 2024 at 9:53 UTC (comment 59)

Ok.

mc-butler · 2024-10-17T11:06:25Z

Changed by zaytsev (@zyv) on Oct 17, 2024 at 11:06 UTC (comment 60)

Done, thanks!

mc-butler · 2024-10-18T19:11:39Z

Changed by zaytsev (@zyv) on Oct 18, 2024 at 19:11 UTC (comment 61)

Status changed from testing to closed

mc-butler closed this as completed Oct 18, 2024

mc-butler assigned zyv Feb 28, 2025

mc-butler mentioned this issue Sep 24, 2024

Add error handling for iconv conversions in VFS and editor code #4591

Open

lib/vfs testsuite failures on illumos #3972

lib/vfs testsuite failures on illumos #3972

Comments

mc-butler commented Feb 23, 2019

mc-butler commented Feb 23, 2019

mc-butler commented Feb 24, 2019

mc-butler commented Jan 23, 2020

mc-butler commented Jan 23, 2020

mc-butler commented Jan 23, 2020

mc-butler commented Sep 3, 2024

mc-butler commented Sep 3, 2024

mc-butler commented Sep 3, 2024

mc-butler commented Sep 3, 2024

mc-butler commented Sep 3, 2024

mc-butler commented Sep 3, 2024

mc-butler commented Sep 3, 2024

mc-butler commented Sep 3, 2024

mc-butler commented Sep 4, 2024

mc-butler commented Sep 4, 2024

mc-butler commented Sep 4, 2024

mc-butler commented Sep 5, 2024

mc-butler commented Sep 5, 2024

mc-butler commented Sep 5, 2024

mc-butler commented Sep 5, 2024

mc-butler commented Sep 5, 2024

mc-butler commented Sep 7, 2024

mc-butler commented Sep 7, 2024

mc-butler commented Sep 11, 2024

path_len test

Linux and macOS

Solaris

Alpine

Conclusion

mc-butler commented Sep 11, 2024

mc-butler commented Sep 11, 2024

mc-butler commented Sep 12, 2024

mc-butler commented Sep 12, 2024

mc-butler commented Sep 13, 2024

mc-butler commented Sep 13, 2024

mc-butler commented Sep 15, 2024

mc-butler commented Sep 15, 2024

mc-butler commented Sep 15, 2024

mc-butler commented Sep 24, 2024

mc-butler commented Sep 24, 2024

mc-butler commented Sep 30, 2024

mc-butler commented Sep 30, 2024

mc-butler commented Oct 1, 2024

mc-butler commented Oct 1, 2024

mc-butler commented Oct 3, 2024

mc-butler commented Oct 3, 2024

mc-butler commented Oct 3, 2024

mc-butler commented Oct 3, 2024

mc-butler commented Oct 3, 2024

mc-butler commented Oct 5, 2024

mc-butler commented Oct 5, 2024

mc-butler commented Oct 8, 2024

mc-butler commented Oct 8, 2024

mc-butler commented Oct 8, 2024

mc-butler commented Oct 17, 2024

mc-butler commented Oct 17, 2024

mc-butler commented Oct 17, 2024

mc-butler commented Oct 17, 2024

mc-butler commented Oct 18, 2024

`path_len` test