- Sponsor
-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib/vfs testsuite failures on illumos #3972
Comments
|
Well, I can see three kinds of test result in the attached log:
2) and 3) are charset related. What is your locale?
First of all I'm interesting aborted tests. Since I'm not an OpenIndiana user I'm unable to reproduce these errors. |
|
|
I'm trying to enable tests on Alpinelinux and getting 2 failures in vfs_path_string_convert.c
Ref https://gitlab.alpinelinux.org/alpine/aports/merge_requests/3243
|
Mentioned Alpine problems are due to musl iconv: #4495 .
Unfortunately, Cfarm doesn't have an Illumos / OpenIndiana host. I see that mc 4.8.31 is packaged. What is the current situation there, do the tests pass? Is there a development shell host one could use? Any build/test logs? What is the libc like, who provides iconv? Not sure if these failures are due to (to be fixed) mocking things, or iconv.
Maybe a good approximation is to check what is the situation on Solaris. |
I am not involved with the project anymore, but you can get a Vagrant Box from https://app.vagrantup.com/openindiana/boxes/hipster. |
Thanks for the tip, do you happen to know if it's possible to download this somehow and import into UTM instead of setting up Vagrant? UTM is an interface for QEMU, so I'd need a qcow2 image and information about the machine configuration. Apparently recently a plugin for Vagrant was developed to use UTM behind the scenes, but if I don't have to install lots of stuff, that would be great... |
The libvirt provider of the Box might be a QCOW2 underneath, but likely won't boot as such if extracted and imported.
There are also installation images https://openindiana.org/downloads/; even the minimal image has a small runtime environment with shell, but if you need to install a package to debug things, you'd rather install the whole OS and go from there. |
Thanks again! That looks good so far. Is there any way to know how the machine is configured? Vargantfile has only the following inside:
Like memory, CPU, devices... |
The configuration is likely sourced from https://github.com/OpenIndiana/oi-packer. |
Perfect, I've got it to work - the configuration is here:
https://github.com/OpenIndiana/oi-packer/blob/master/hipster.pkr.hcl
Default PC, 4G RAM, AMD64.
Sadly, everything is extremely slow on ARM, but apparently OI doesn't support ARM natively, so emulation is the only option.
But how do I get in on the console? jack/jack & root/openindiana are not working! |
Try, root/vagrant https://github.com/OpenIndiana/oi-packer/blob/master/hipster.pkr.hcl#L16? |
Hey, not sure how I have missed it. You were of great help, thank you! I guess I should add some notes on how to test on OpenIndiana on our wiki. I was still able to reproduce test failures. There are two types of problems:
The charset stuff is annoying, because to my understanding we are actually not using it directly, but pulling it from glib, so either we do something wrong, or glib iconv is broken on these platforms.
I guess I need to look into small reproducer. In the worst case we should include it in autoconf tests and disable charsets if this test fails, because then mc is not going to work correctly. |
|
You are welcome. iconv might be different on illumos distributions like OpenIndiana (source from https://github.com/illumos/illumos-gate/tree/master/usr/src/cmd/iconv) and might do a few things differently to more common GNU libinconv. |
I have tested iconv on the command line and it seems to convert between UTF-8 and KOI8-R used in the tests just fine. All aliases also available. I was even able to read the file in translation mode with mcview coming from the shipped mc! So at least for that the conversions are working correctly.
So I started reading path_len.c and following the functions with HAVE_CHARSET, and I actually don't quite get how it works. Apparently the idea was that the elements of the path are converted from terminal encoding into whatever #enc says they should be encoded in memory, and the encoding is also set to what enc says:
https://github.com/MidnightCommander/mc/blob/master/lib/vfs/vfs.c#L170
But then apparently it was supposed to stay UTF-8 (or rather terminal encoding) in vpath->str when all elements are joined again?
For me on macOS this conversion doesn't do anything though (WHAT?!), it seems that the element stays encoded as UTF-8. But apparently as there is a bug in the conversion back, which also doesn't happen, so the end result is correct and the test pass.
It seems that on OpenIndiana/Illumos and Alpine/musl it actually DOES the conversion to KOI8-R as intended, but due to missing conversion back the test fails.
Andrew, can you help, at least tell me how it was supposed to work before I debug further? This is very confusing. |
On Illumos: term_encoding: 646 (!?)
With the following patch:
The tests work on Illumos. But for any single bit (KOI8-R or ASCII) seems to break.
On macOS: term_encoding: US-ASCII, but also forcing "UTF-8" works. |
Replying to zaytsev:
Sorry :-( |
I see, so the whole VFS magic including recoding was Slava's work, and you don't know how this #enc: stuff was actually supposed to work? Slava, are you alive :) ? Can you say something? |
So I traced the functions again on macOS, and I was initially wrong about conversion not happening there which confused me. Everything is working as expected (?):
It seems that there is some problem with vfs_path_to_str_flags on Illumos, which apparently doesn't have to do with iconv, but with detecting terminal encoding and the need to convert back. Or maybe the conversion fails and this is not handled correctly. In any case, the problem is that the conversion from UTF-8 to KOI8-R happens correctly in vfs_path_from_str_uri_parser, but the conversion back to UTF-8 doesn't happen.
If I force terminal encoding to UTF-8 with str_init_strings ("UTF-8") as shown above, then both conversions are happening correctly like on macOS.
I wonder if I can get any further with a debugger... |
gdbserver is not supported on Solaris, and gdbgui needed fixing:
And best of all, the tests pass under graphical debugger - probably because it's run inside Python and stuff is set to UTF there :(
I have the feeling that g_iconv_open (codeset, from_enc) where codeset="646" and from_enc="KOI8-R" in strutil.c:269 returns something that doesn't do anything, which is weird, because the conversion in the forward direction works. |
I think I've figured out what the problem is with Illumos and Alpine. The issues are not exactly the same, but the root cause is the same.
In the VFS library we have conversions that go in two directions:
Both functions return estr_t, which can be ESTR_SUCCESS, ESTR_PROBLEM (some characters could not be translated, so it's not possible to reverse the operation) and ESTR_FAILURE (conversion impossible due to wrong / unsupported encoding).
The result of the forward conversion vfs_translate_path is somehow checked and returns a NULL pointer on failure, which is supposed to be checked (hopefully?), but on problems it doesn't do anything and just continues.
The result of the backward conversion (str_vfs_convert_from) is simply ignored! path_len test
This test is started with str_init_strings (NULL), which means US-ASCII on macOS and equivalents elsewhere. The HAVE_CHARSET part contains a raw UTF-8 string that should be recoded as KOI8-R for the file system. Linux and macOS
iconv just interprets the UTF-8 string as bytes of US-ASCII and "converts" them from bytes of KOI8-R without changing anything.
The problem happens during the conversion from KOI8-R to US-ASCII and is signaled as ESTR_PROBLEM, but it's not a hard error, just some characters can't be translated. And what iconv does is just leave the string as it is.
Since the result of the backward conversion is ignored, everything matches. Solaris
Here iconv actually decides to do a forward conversion because it can and it's valid, even though it probably shouldn't. So the path element is now a valid KOI8-R string.
The backward conversion from KOI8-R to US-ASCII fails, but iconv leaves the string as it is and ignores the result, so the KOI8-R string is returned.
The length obviously doesn't match and the test fails. Alpine
Similar to Solaris, but instead of returning the string unchanged, iconv returns an empty string, so path ends up being /#enc:KOI8-R and the assertion fails:
Conclusion
This particular test is simply broken and only works on Linux and macOS by (bad) luck. It should be initialized with str_init_strings ("UTF-8") and this would make the test work correctly on all systems.
I haven't analyzed other tests, but probably they can be fixed somehow.
My question is, what do we do about error handling? It doesn't sound like a good idea to just ignore problems. At the very least, it leads to buggy tests working by accident without anyone noticing. But there are also segfaults on Alpine.
I think maybe it was designed that way because the whole recoding story was for like Linux (KOI8-R) systems accessing Windows FTP (CP1251) with wrong encoding returned or something, so in the worst case the errors mean the file list is garbage or something, and there was actually no intention to propagate errors.
I think we should at least not ignore ESTR_FAILURE on reverse conversion and return a string of ??? of the same length as the input one, so nothing segfaults.
I'm also not sure about returning a NULL pointer on forward conversion. I see some NULL checks, but many seem to be missing. Maybe also return a string of ??? instead?
We can then test for invalid conversions.
Andrew, what do you think? |
Replying to zaytsev:
Because test strings are in UTF-8?
Definitely, an error handling should be done.
Probably, in cases of ESTR_FAILURE return of an input string is a workable solution. But if it hides a conversion problem from the caller, it's better to return NULL and a caller should decide to do then.
|
Yes.
Suppose you want to test the correctness of the conversions from the terminal encoding to the desired encoding and back at a very superficial level. You need to provide a path that will be encoded differently than the terminal encoding, but the conversion should be lossless.
For example, you can provide the test string in UTF-8 (which will be your terminal encoding) and say that a part of it should be converted to KOI8-R: /#enc:KOI8-R/тестовый/путь. You can now check that after the roundtrip the path has the correct length in bytes (38). You should also check that /тестовый/путь takes 14 bytes in KOI8-R and not 26 bytes. But this was never checked, and so no one has found out that the test is wrong.
The thing is, you have to do str_init_strings ("UTF-8") for the code to work correctly (UTF-8 -> KOI8-R -> UTF-8). If you do str_init_strings (NULL), it will assume that the terminal encoding is ASCII and your string is in ASCII, but it's actually in UTF-8. What happens then depends on how exactly iconv fails, but because we don't do proper error handling, the roundtrip actually works on Linux and macOS as no conversion happens and the raw bytes are just preserved. On Solaris and musl, different things happen, but at least one of the two conversions fails.
So we can either fix the test by adding a missing assertion and setting the encoding to UTF-8, which it is, and then it will work everywhere. Or we can choose a different pair of encodings.
For example, hardcode bytes to CP1251 and do a CP1251 -> KOI8-R -> CP1251 conversion. But this might not be so good for length checks, because both are singlebyte encodings. Unfortunately, the only multibyte encoding we support is UTF-8, because the whole path handling depends on ASCII symbols like / being encoded as single-byte ASCII. So I think it's fine to keep UTF-8. It's actually a good test, and single byte conversions should be tested elsewhere, like `vfs_path_string_convert'.
I have started a branch to fix the tests. Unfortunately, the following tests still need to be investigated (some of them may fail due to broken mocks, see #4584):
|
path_cmp - fixed in the same way as path_len.
path_manipulations & path_serialize have the same problem: for some reason I don't understand yet, #enc:KOI8-R is dropped when manipulating the path (in vfs_path_append_vpath_new and so vfs_path_to_str_flags).
Setting terminal encoding to UTF-8 or a valid Cyrillic encoding like CP1251 fixes the tests. |
Ok, I think I know why this happens. On Solaris, g_iconv_open ("ASCII", "KOI8-R") fails because this conversion is considered invalid. On macOS and apparently Linux, this converter can be created, and as long as the compatible subset is given as strings, it even works.
The problem is the lack of error handling in the str_crt_conv_from function in the VFS. At other call sites there are some cases where errors are handled. But I don't know how to add error handling there. My idea is to keep fixing tests and then make a list of places where error handling is missing, so this went unnoticed. |
vfs_path_string_convert segfaults, because the NULL pointer returned in vfs_translate_path_n is simply ignored and later explodes as element->path is NULL:
edit_complete_word_cmd segfaults due to missing error handling:
|
Replying to zaytsev:
I've added a commit with fix of that. |
I looked a bit more into implementing aliases support. As I suggested, there are two ways to do it:
I tried the second option, which I think is the only one that makes things better, but it just seems too crazy:
My understanding is that we need to check all usages of get_codepage_index and get_codepage_id and the variables where the results are stored. Every place where some comparisons are done needs to be replaced with a new function like codepage_equals which takes the aliases into account. I think it's about ~100 places to check.
So I tried the first option just to see how it feels and came up with a patch (I know it's dirty, just to show what I mean!). I'm not sure if it makes things any better compared to manual entries in the file. You need some kind of way to disambiguate the descriptions for aliases, so I added the id in parentheses. None of this feels like it's worth it... except maybe if you like my gio / glib conversion instead of the usual C-style manual parsing.
P.S. I also threw away the default thing, which I think was never used. |
Aliases mean that we will have several codepages assigned to one menu item. We don't want several menu items with same text, are we? If one menu item will cover several codepages we should somehow try them one by one to found workable convertor. |
Yes, this is the option (2) described in comment:39. But how to do this? Do you have any idea how to avoid going through all the call sites? I will not manage to rewrite this in several months. Parsing turned out to be the "easy" part...
Maybe we can do more hacks like the Solaris one, even though it's disgusting? I think not many code pages are affected. I have started to make a list of differences between platforms. |
Branch: 3972_fix_illumos
I have rebased everything, cleaned up the patches and went for the encoding name detection. The other options are just an insane amount of work with the current code base. Follow-up ticket is #4591. Maybe one day someone could have a look. |
The test passes on Linux if I add CP866 in addition to IBM866 to mc.charsets for tests.
What is this evil magic again :( I thought that iconv translations were independent of the display stuff, but somehow there is still a relationship. |
Replying to zaytsev:
The test is passed if encoding in mc.charsets is the same as in vfs_path_string_convert.c.
makes the test passed with IBM866 in mc.charsets. |
Yes, I understand that. My question is WHY?
I found the answer for initialisation (codeset must match), and figured it's impossible for me to fix without rewriting mc. But I was hoping that conversions would be independent of this list of encodings: The iconv function works with both CP866 and IBM866.
The only suspicious place I've found is is_supported_encoding, but when I set it to return true it still doesn't work. There seems to be some other hidden connection. |
It is the is_supported_encoding function! Apparently the source files were not rebuilt properly the last time I checked, so this confused me. This function determines if encoding is supported by looking at the charsets table, which I think is wrong. My idea would be to try creating a converter with iconv instead, and if it works - great, if not, the encoding is not supported.
But I have the next problem: canonicalize_pathname test fails :(
Other test cases pass... |
This seems to be a bug in the code: the old version of is_supported_encoding would ignore everything in the encoding name if a partial match was found. The new version, of course, wants a full match. The code in these test cases passes utf-8/../c as the encoding name, and the new function returns false.
I think a partial match is a bad idea. It could allow old code to accept completely invalid encoding names like koi8-rus. It wouldn't be removed from the path even though it's invalid and unsupported. |
Suggested fix in [3ef535f8b48e850f7ef465c2dee36e9cde3f55b6] . Please review. |
Replying to zaytsev:
Fixes:
vfs_get_encoding() was move to the "public functions" section of file. |
Thank you very much! I didn't realise that this function allocates a new string that needs to be freed :(
What about the rest? I think we fixed some bugs, but the aliases and error handling stuff are more like design issues. I have created a follow up if anyone wants to address this one day. As far as I'm concerned, this one is done. |
tests: vfs_path_string_convert - use UTF-8 to prevent creation of invalid converters
can be squashed. |
An aftershok.
This line in autogen.sh is not quite correct. The help of program is translated and may not matched the pattern '--keyword='. Particularly, in Russian
and po-related files are not created. Because
|
Oh, this is annoying, sorry.
What do you think is the best way to fix it? I wasn't happy about it anyway, because of course the text might go unnoticed, but I think the situation before, with random errors, was even worse. Do you have a better idea?
|
LANG=C xgettext --help: output is in Russian. |
|
Unfortunately, I managed to also break a bootstrapped build on Solaris. Ok to commit to master? |
Ok. |
Done, thanks! |
|
Important
This issue was migrated from Trac:
mnowak
(@Mno-hime)Running test suite from v4.8.22 on OpenIndiana 2018.10 (illumos kernel) with changeset from #3971 comment 1 I get following failures in the lib/vfs testsuite:
test-suite.log file is attached.
Let me know should you need information about the environment.
Note
Original attachments:
mnowak
(@Mno-hime) onFeb 23, 2019 at 17:33 UTC
andypost
(apostnikov@….com) onJan 23, 2020 at 16:03 UTC
andypost
(apostnikov@….com) onJan 23, 2020 at 16:03 UTC
zaytsev
(@zyv) onSep 4, 2024 at 18:36 UTC
zaytsev
(@zyv) onSep 13, 2024 at 10:53 UTC
zaytsev
(@zyv) onSep 13, 2024 at 10:54 UTC
zaytsev
(@zyv) onSep 13, 2024 at 10:54 UTC
zaytsev
(@zyv) onSep 13, 2024 at 18:05 UTC
andrew_b
(@aborodin) onSep 14, 2024 at 12:45 UTC
andrew_b
(@aborodin) onSep 14, 2024 at 13:35 UTC
zaytsev
(@zyv) onSep 15, 2024 at 9:53 UTC
zaytsev
(@zyv) onOct 17, 2024 at 7:17 UTC
The text was updated successfully, but these errors were encountered: