Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve handling of compressed content in mc.ext #4128

Closed
mc-butler opened this issue Oct 8, 2020 · 22 comments
Closed

improve handling of compressed content in mc.ext #4128

mc-butler opened this issue Oct 8, 2020 · 22 comments
Assignees
Labels
area: core Issues not related to a specific subsystem prio: medium Has the potential to affect progress

Comments

@mc-butler
Copy link

Important

This issue was migrated from Trac:

Origin https://midnight-commander.org/ticket/4128
Reporter ossi (@ossilator)
Mentions netch@….kiev.ua

this may be controversial - there is a related thread at https://mail.gnome.org/archives/mc/2005-June/thread.html#00022, though one would expect it to be outdated by now. i've been using this code for a very long time.

Note

Original attachments:

@mc-butler
Copy link
Author

Changed by ossi (@ossilator) on Oct 8, 2020 at 22:40 UTC

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 12, 2020 at 5:36 UTC (comment 1)

  • Milestone changed from Future Releases to 4.8.26
  • Status changed from new to accepted
  • Owner set to andrew_b

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 12, 2020 at 5:38 UTC (comment 2)

  • Branch state changed from no branch to approved
  • Votes set to andrew_b

Branch: 4128_mc.ext_compressed_content
[8857423]

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 12, 2020 at 5:39 UTC (comment 3)

  • Branch state changed from approved to merged
  • Status changed from accepted to testing
  • Resolution set to fixed
  • Votes changed from andrew_b to committed-master

Thanks!

Merged to master: [eeb14d3].

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 12, 2020 at 5:41 UTC (comment 4)

  • Status changed from testing to closed

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Nov 1, 2020 at 23:47 UTC (comment 5)

It would be awesome if someone did something to ts - right now it is always regarded as a video file. This might have been reasonable before the advent of TypeScript, but now most of TS-files are actually code.

Unfortunately file is not perfect:

zaytsev@parallels:~/src/app/src$ file main.ts 
main.ts: Java source, ASCII text

zaytsev@parallels:~/src/app/src$ file typings.d.ts 
typings.d.ts: ASCII text

zaytsev@parallels:~/src/app/src$ file test.ts 
test.ts: Java source, UTF-8 Unicode text

Maybe match on text before video?

@mc-butler
Copy link
Author

Changed by ossi (@ossilator) on Nov 2, 2020 at 11:02 UTC (comment 6)

that's one hell of an overloaded extension:

> file qt/translations/qt_ru.ts
qt/translations/qt_ru.ts: XML 1.0 document, UTF-8 Unicode text, with very long lines

(this extension predates typescript by over a decade.)

we can keep the order, but need to switch to content detection:

...
# does anyone have a sample file to verify that?
type/MPEG
    Include=video
...
type/XML
    Open=linguist %f
...
include/editor
...

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Nov 2, 2020 at 11:32 UTC (comment 7)

zaytsev@parallels:~/src$ file sample_640x360.ts 
sample_640x360.ts: data

zaytsev@parallels:~/src$ file ed24p_10.ts 
ed24p_10.ts: data

@mc-butler
Copy link
Author

Changed by ossi (@ossilator) on Nov 2, 2020 at 11:56 UTC (comment 8)

heh, FAIL.
then we need some coding, to enable compound conditions like this:

shell/i/.ts
type/^data$
    Open=...

(so basically AND conditions by stacking.)

i'd suggest to create a new ticket (and possibly delete the kinda off-topic comments here).

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Nov 2, 2020 at 12:29 UTC (comment 9)

Done in #4141.

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Nov 2, 2020 at 12:30 UTC

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Feb 6, 2025 at 18:33 UTC (comment 11)

Feedback from Ubuntu user:

https://bugs.launchpad.net/ubuntu/+source/mc/+bug/2097560

I have a zip file where the first enclosed file is PDF. For a reason (I strictly prefer okular over evince and was lazy to find another place to set the priority), I changed the mc extension file (mc.ext) entry to the following:

type/^PDF
        Open=okular %f & disown

Unexpectedly, this caused misopening of some zip files. Checking what is happening by strace showed that mc calls file with arguments to detect the enclosed contents:

1368162 19:40:43.017932 execve("/usr/bin/file", ["file", "-z", "-S", "-L", "/home/netch/Downloads/sql_12.zip"], 0x6017f1d3d398 /* 68 vars */ <unfinished ...>

and this file helper returns:

1368162 19:40:43.040724 write(1, "PDF document, version 1.3, 90 pages (Zip archive data, at least v2.0 to extract, compression method=deflate)\n", 109) = 109

The rule in the extension file detects, according to "^PDF", this is PDF and runs okular instead of ark for this file. Okular deliberately rejects opening zip.

With the default extension file contents:

    Open=/usr/lib/mc/ext.d/doc.sh open pdf

the bug still has place but is masked because passing through doc.sh and then xdg-open rechecks the file type using file (technically, with kioslave5) without -z that allows proper detection.

So using simple file/ rules in mc extension file is incompatible with using file -z for deeper analysis (if this deeper analysis has sense in general).

Iʼm uncertain for security impact but it could have taken place in a more complicated scenario.

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Feb 6, 2025 at 18:40 UTC (comment 12)

I'm not sure what the best way is to solve this problem universally. I think our Type= rules could/should exclude compressed? ossi, can you do something about this?

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Feb 6, 2025 at 18:42 UTC (comment 13)

  • Status changed from closed to reopened
  • Resolution fixed deleted

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Feb 6, 2025 at 18:43 UTC (comment 14)

  • Milestone changed from 4.8.26 to Future Releases
  • Branch state changed from merged to no branch
  • Votes committed-master deleted

@mc-butler
Copy link
Author

Changed by ossi (@ossilator) on Feb 6, 2025 at 20:13 UTC (comment 15)

dunno, depends on how stupid file really is.

hmm, apparently very - file -z on an .odt file bugs out a gzip error. this is doubly bogus, as these are zip and not gzip files.
i was attempting to figure out what it would do for higher-level file formats that are inherently archives. so much for that.

that it aggressively peeks contents of multi-file archives is also extremely questionable. it doesn't do that for tar files.

maybe that release of file is just really botched? this is file-5.45 from recent debian unstable.

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Feb 7, 2025 at 7:32 UTC (comment 16)

Some experiments on my machine (macOS):

% file --version
file-5.41
magic file from /usr/share/file/magic

% file -z -b -L -S test.pdf.zip
ERROR:[gzip: Unknown compression format] (Zip archive data, at least v2.0 to extract, compression method=deflate)

% file -b -L -S test.pdf.zip       
Zip archive data, at least v2.0 to extract, compression method=deflate

% file -z -b -L -S test.pdf.gz                                    
PDF document, version 1.3 (zip deflate encoded) (gzip compressed data, was "test.pdf", last modified: Mon Jul  8 06:39:30 2024, from Unix)

% file -b -L -S test.pdf.gz                                        
gzip compressed data, was "test.pdf", last modified: Mon Jul  8 06:39:30 2024, from Unix, original size modulo 2^32 1264794

% file -z -b -L -S test.pdf.bz2                                    
PDF document, version 1.3 (zip deflate encoded) (bzip2 compressed data, block size = 900k)

% file -b -L -S test.pdf.bz2                                       
bzip2 compressed data, block size = 900k

It would really help if you could please explain what did you want to achieve and how it was supposed to work? To me it seems that -z, which is documented as "Try to look inside compressed files" does exactly what it says.

My guess is that you wanted to simplify / make the detection of man stuff (or, generally, compressed data for which we have special handling) more reliable.

In this case, maybe the solution is to make sure that we put the archives first, unless we have special support for compressed data. If we do, then these entries should come first. Can you look into this and make a patch?

@mc-butler
Copy link
Author

Changed by ossi (@ossilator) on Feb 7, 2025 at 14:28 UTC (comment 17)

ok, i had a closer look.

In this case, maybe the solution is to make sure that we put the archives first, unless we have special support for compressed data.

yes, this seems like what we should do.
but then, there is the mess of c3848a6 and a4cccdf.
so this clearly needs some rather careful examination.
i recommend to re-close this issue and create a followup.

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Feb 7, 2025 at 15:29 UTC (comment 18)

  • Milestone Future Releases deleted
  • Status changed from reopened to closed
  • Resolution set to fixed

Created new ticket as suggested - #4649. I would appreciate any help - editing the ticket itself, and of course working on the rules.

I'm now trying to focus my time on researching a proper GitHub migration, as we don't want an improper one, and a proper one requires tons of work for which there are no other takers :(

@mc-butler
Copy link
Author

Changed by netch (netch@….kiev.ua) on Feb 7, 2025 at 15:41 UTC (comment 19)

  • Cc set to netch@….kiev.ua

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Feb 22, 2025 at 11:46 UTC

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Feb 22, 2025 at 11:47 UTC (comment 21)

  • Blocked by set to #4649

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: core Issues not related to a specific subsystem prio: medium Has the potential to affect progress
Development

No branches or pull requests

2 participants