
Make MC faster at copying files within one HDD: add a large buffer option #2193

Closed
mc-butler opened this issue May 13, 2010 · 21 comments
Assignees
Labels
area: core Issues not related to a specific subsystem prio: medium Has the potential to affect progress
Milestone

Comments

Important

This issue was migrated from Trac:

Origin https://midnight-commander.org/ticket/2193
Reporter birdie (aros@….com)
Mentions gotar@….pl, powerman-asdf@….ru

Currently MC uses the same small buffer (64K) for all copy operations, regardless of their source and destination.

This causes the following problem: when you copy a small file within one physical HDD, the HDD spends a good chunk of time repositioning its heads to read each tiny portion of data.

I propose to implement a new Copy File(s) dialog option:

[X] Use large buffers

where copy_large_buffer can be defined as an option of the mc.ini file, with a default value of 64MB (it's quite sane for modern PCs).
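To make the proposal concrete, here is a minimal sketch of such a copy loop (Python, for illustration only — mc itself is C, and both the option name and the 64MB value are the reporter's suggestion, not existing mc settings):

```python
# Illustrative sketch of the proposed option: the same read/write loop mc
# already uses, but with a user-selectable buffer size. Constants mirror
# the values discussed in this ticket; nothing here is actual mc code.
DEFAULT_BUFFER = 64 * 1024          # mc's current 64K buffer
LARGE_BUFFER = 64 * 1024 * 1024     # the proposed 64MB "large buffer"

def copy_file(src, dst, use_large_buffers=False):
    bufsize = LARGE_BUFFER if use_large_buffers else DEFAULT_BUFFER
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(bufsize)
            if not chunk:
                break
            fout.write(chunk)
```

With a large buffer, the drive reads a long contiguous run from the source before seeking to the destination, instead of seeking back and forth every 64K.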


Changed by birdie (aros@….com) on May 13, 2010 at 10:51 UTC (comment 1)

PS This option also applies to the "Move file(s)" operation when the destination is the same HDD but a different partition.


Changed by ossi (@ossilator) on May 13, 2010 at 12:08 UTC (comment 2)

this should not be a visible option, as it bothers the user with internal stuff.

mc should feel free to allocate as much buffer memory as it wants as long as it is not an excessive amount of the system's total physical memory (exact formula to be determined). sane allocators will actually return such big allocations to the system when they are freed, so the huge peak memory usage is of no concern.

one concern of huge buffers is abortability and accurate progress information. therefore the algorithm should start with some conservative chunk size (default determined by the media type) and adaptively adjust it so that processing each chunk takes about 200ms or so. for media with wildly differing bandwidths (e.g., hdd vs. ftp over dsl), the chunk sizes for reading and writing could also differ significantly. note that the finer chunking does not imply that each read is followed by one write - to achieve the optimization suggested by birdie, one would simply accumulate, say, sixteen 4mb chunks before writing. determining whether the source and destination live on the same physical media, and thus whether a higher interleaving factor should be used, is a bit of a challenge, though.
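The adaptive sizing described above can be sketched as follows (illustration only, assuming invented names and limits; the 200ms target is the one from the comment):

```python
# Sketch of adaptive chunk sizing: scale the chunk so the next read/write
# should take roughly TARGET_SECONDS, clamped to sane bounds. All names,
# bounds, and the scaling rule are illustrative, not mc code.
MIN_CHUNK = 64 * 1024            # conservative starting point
MAX_CHUNK = 64 * 1024 * 1024     # upper bound to keep memory use bounded
TARGET_SECONDS = 0.2             # ~200ms per chunk, per the comment above

def next_chunk_size(current, elapsed):
    """Given how long the last chunk took, pick the next chunk size."""
    if elapsed <= 0:
        return min(current * 2, MAX_CHUNK)
    scaled = int(current * (TARGET_SECONDS / elapsed))
    return max(MIN_CHUNK, min(scaled, MAX_CHUNK))
```

Keeping each chunk near the time target preserves abortability and smooth progress reporting even when the underlying media speed varies wildly.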


Changed by birdie (aros@….com) on May 13, 2010 at 13:11 UTC (comment 2.3)

Replying to ossi:

determining whether the source and destination live on the same physical media and thus whether a higher interleaving factor should be used is a bit of a challenge, though.

That's why this time a user-selectable option seems like a good way to go :) I'm not sure whether the POSIX API allows determining if the source and destination reside on the same media.


Changed by ossi (@ossilator) on May 14, 2010 at 12:42 UTC (comment 3.4)

Replying to birdie:

That's why this time a user selectable option seems like a good way to go :)

"oh, it could be hard. let's do some user-unfriendly crap instead."

I'm not sure whether the POSIX API allows determining if the source and destination reside on the same media.

first off, let's assume we already have a real file system path (i.e., mcvfs needs to give us one).
then it gets tricky. posix as such will indeed not be enough. i think the most promising approach is querying the mount table (just calling mount and parsing the output) and recursively resolving the mount points to obtain the volumes the files live on. next, one would stat() the two devices and compare the major device ids returned in st_dev. caveats: a) the device id stuff is system-specific, i.e., it means googling for lots of man pages. b) even on linux, FUSE may mess up the detection of the real device. that's a minor problem, though: it's unlikely that huge files which need the above optimization live on a FUSE mount.


Changed by birdie (aros@….com) on May 14, 2010 at 14:20 UTC (comment 5)

  • OK, first of all, Total Commander has this option. :)
  • Secondly, the long-forgotten Dos Navigator had this option (buried very deeply, but that's not what really matters).
  • Third of all,
# cd /tmp; mkdir loop; mount -o loop,ro geexbox-1.2.4-en.i386.glibc.iso loop;

# mount | grep loop
/dev/loop0 on /tmp/loop type iso9660 (ro)

Now try to work out which physical device the files under /tmp/loop belong to.

  • Fourthly, using a precalculated RAM size might be extremely dangerous. Say we want to use 10% of free RAM; it may turn out that there is no real free RAM, because what looks free is in fact cache (e.g. for shared libraries, which Linux usually reports as cached/free RAM), so eating that much RAM may lead to swapping or even an OOM situation.
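One conservative way around the free-vs-cached ambiguity raised in the last point is to cap the buffer by a fraction of *total* physical RAM rather than guessing at "free" RAM. A hedged sketch (the 1/16 fraction and function name are arbitrary examples, not a proposed mc default):

```python
import os

# Illustrative guard against over-allocation: clamp the requested copy
# buffer to a fixed fraction of total physical memory. Using total RAM
# sidesteps the "is cached RAM really free?" question entirely.
def clamp_buffer(requested, fraction=16):
    page = os.sysconf("SC_PAGE_SIZE")
    pages = os.sysconf("SC_PHYS_PAGES")
    limit = max(64 * 1024, (page * pages) // fraction)
    return min(requested, limit)
```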


Changed by slyfox (@trofi) on May 14, 2010 at 18:11 UTC (comment 6)

I propose to implement a new Copy File(s) dialog option:

[X] Use large buffers

where copy_large_buffer can be defined as an option of the mc.ini file, with a default value of 64MB (it's quite sane for modern PCs).

My experiments didn't show any timing changes with buffers larger than 64KB on most workloads. Just copying 64K bytes is already relatively significant CPU work — probably more than the syscall overhead. Where did you get the '64MB' figure? Do you use the 'noop' scheduler on an IDE/SATA disk?

If the device/filesystem operates on larger data chunks (256KB SSD blocks, ~1MB flash blocks), it caches the data in the block layer, so firing one more syscall to fetch cached data wouldn't matter.

I suggest you write a benchmark that disproves my expectations :]
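A minimal harness for such a benchmark might look like this (Python sketch, illustration only; a fair run would also drop the page cache between passes, e.g. via /proc/sys/vm/drop_caches, which this sketch does not do):

```python
import os
import time

# Time one copy of src -> dst with a given buffer size, fsync'ing at the
# end so the write actually reaches the device before the clock stops.
def time_copy(src, dst, bufsize):
    start = time.monotonic()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(bufsize):
            fout.write(chunk)
        fout.flush()
        os.fsync(fout.fileno())
    return time.monotonic() - start
```

Running this over a large file with bufsize of 64K, 1M, and 64M on both HDD and SSD would settle the question either way.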


Changed by gotar (gotar@….pl) on May 26, 2010 at 20:15 UTC (comment 6.7)

  • Cc set to gotar@….pl

Replying to slyfox:

I propose you to write benchmark, which disproves my expectations :]

time cp linux-2.6.33.2.tar.bz2 /mnt/
0.00s user 0.31s system 2% cpu 11.619 total

echo 1 > /proc/sys/vm/drop_caches
time cat linux-2.6.33.2.tar.bz2 > /dev/null
0.01s user 0.11s system 1% cpu 10.512 total
time cp linux-2.6.33.2.tar.bz2 /mnt/
0.01s user 0.24s system 69% cpu 0.365 total

Similar results with dd - reading the entire file first gives about a 5% performance improvement. IMHO it's worth doing for 500M+ files regardless of I/O schedulers and the rest.


Changed by angel_il (@ilia-maslakov) on Jul 5, 2010 at 20:29 UTC (comment 8)

  • Milestone changed from 4.7.3 to 4.7


Changed by birdie (aros@….com) on Mar 19, 2011 at 8:35 UTC (comment 9)

  • Milestone changed from 4.7 to 4.8

I've now copied a large file (4.6GB) from one partition to another, using a

64K buffer:
real    2m20.418s
user    0m0.087s
sys     0m9.309s

and using 64M buffer:
real    1m54.316s
user    0m0.040s
sys     0m10.503s

So, using a larger buffer for copying files within one physical HDD makes sense (it doesn't apply to SSDs because their seek time is close to zero).


Changed by krokous (krokous@….cz) on Apr 11, 2012 at 12:56 UTC (comment 9.10)

  • Branch state set to no branch

64K buffer:
real 2m20.418s

and using 64M buffer:
real 1m54.316s

64K may be small, but isn't 64M overkill? What about, for example, a 1M buffer?
It could be large enough to have negligible overhead compared to a 64M buffer, but it will eat much less memory.

Perhaps the buffer size could be specified somewhere in the advanced config, with some reasonable, though still rather small (512K?), default.

I guess more benchmarking (on both SSD and HDD) should be done before changing the default.
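Reading such a setting from an ini-style config with a safe fallback is straightforward; a hedged sketch ("copy_buffer_size" and the section name are invented for illustration, not actual mc keys):

```python
import configparser

# Fall back to a modest default when the option is unset or malformed.
# 512K is krokous's suggested default from the comment above.
DEFAULT_COPY_BUFFER = 512 * 1024

def read_copy_buffer(ini_text):
    cp = configparser.ConfigParser()
    cp.read_string(ini_text)
    try:
        return cp.getint("Midnight-Commander", "copy_buffer_size")
    except (configparser.Error, ValueError):
        return DEFAULT_COPY_BUFFER
```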


Changed by powerman (powerman-asdf@….ru) on May 28, 2012 at 19:44 UTC (comment 11)

Another use case for this is the 'sync' mount option for USB flash drives (to make it possible to eject the drive right after the copy dialog closes, without needing to umount first).

While 'sync' is too slow (and thus unusable) on most filesystems, it works really well on ext4. On my Corsair, the usual cp speed without 'sync' is 11MB/sec; with 'sync', cp speed is 4.5MB/sec, but mc's speed is only 1.5MB/sec. At the same time, dd bs=2M gets 11.5MB/sec (i.e., even faster than cp without 'sync'!).

So, large buffers (1-64MB) for copying files in mc are a must-have feature!

And keeping in mind that this bug has already been open for 2 years, I'd really prefer to see this feature implemented with an [X] large buffer checkbox in the UI soon, rather than wait 3 more years until someone finally figures out a formula that grows the buffer size without checkboxes. :-)


Changed by powerman (powerman-asdf@….ru) on May 28, 2012 at 19:45 UTC (comment 12)

  • Cc changed from gotar@….pl to gotar@….pl, powerman-asdf@….ru


Changed by powerman (powerman-asdf@….ru) on May 28, 2012 at 19:47 UTC (comment 13)

Actually, I can even live with a patch that simply uses a constant, larger buffer size, if someone provides it.


Changed by andrew_b (@aborodin) on Jun 18, 2015 at 18:28 UTC (comment 14)

  • Milestone changed from 4.8 to Future Releases


Changed by andrew_b (@aborodin) on Mar 25, 2016 at 7:53 UTC (comment 15)

Ticket #3624 has been marked as a duplicate of this ticket.


Changed by andrew_b (@aborodin) on Apr 6, 2016 at 11:40 UTC (comment 16)

  • Branch state changed from no branch to on review
  • Owner set to andrew_b
  • Status changed from new to accepted

Branch: 2193_copy_buffer_size
Initial [d63f6da04d315703e3ffced79431d6dcde2019bd]

The Coreutils way is used: the buffer size is based on the block size of the destination file system.
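The idea behind that approach can be sketched as follows (Python, for illustration; the 128K floor mirrors the minimum coreutils has historically used for cp, and the function name is invented — the actual mc change is in the branch above):

```python
import os

# Derive the copy buffer from the destination filesystem's reported
# block size, with a floor so a tiny f_bsize doesn't cripple throughput.
def pick_buffer_size(dst_dir, floor=128 * 1024):
    bsize = os.statvfs(dst_dir).f_bsize
    return max(floor, bsize)
```

This adapts per destination without any user-visible option, which matches the objection raised earlier in the thread.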


Changed by zaytsev (@zyv) on Apr 6, 2016 at 20:24 UTC (comment 17)

Oh wow, very cool, I'll try to have a look!


Changed by birdie (aros@….com) on Apr 7, 2016 at 7:51 UTC (comment 18)

In a perfect world, MC should use at least three threads for copying/moving files:

One thread to read the source into a ring buffer;
One thread to write from the ring buffer to the destination;
One thread to show progress every X seconds (for instance, every 0.3 seconds).

Right now MC can be slow at copying for a different reason: it spends too much time updating the screen.
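The reader/writer half of that pipeline can be sketched with a bounded queue standing in for the ring buffer (Python sketch, illustration only; mc would do this in C, and the progress thread is omitted):

```python
import queue
import threading

# One thread fills a bounded queue (the "ring buffer") from the source;
# the main thread drains it to the destination. Names are illustrative.
def threaded_copy(src, dst, bufsize=1024 * 1024, depth=16):
    ring = queue.Queue(maxsize=depth)

    def reader():
        with open(src, "rb") as f:
            while chunk := f.read(bufsize):
                ring.put(chunk)   # blocks when the ring is full
        ring.put(None)            # end-of-stream marker

    t = threading.Thread(target=reader)
    t.start()
    with open(dst, "wb") as f:
        while (chunk := ring.get()) is not None:
            f.write(chunk)
    t.join()
```

Decoupling reads from writes this way lets the source keep streaming while the destination flushes, which is exactly the interleaving discussed earlier in the thread.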


Changed by andrew_b (@aborodin) on Apr 25, 2016 at 10:32 UTC (comment 19)

  • Branch state changed from on review to approved
  • Milestone changed from Future Releases to 4.8.17
  • Votes set to andrew_b


Changed by andrew_b (@aborodin) on Apr 25, 2016 at 10:34 UTC (comment 20)

  • Votes changed from andrew_b to committed-master
  • Branch state changed from approved to merged
  • Status changed from accepted to testing
  • Resolution set to fixed

Merged to master: [7b928e6].

git log --pretty=oneline 5ba9789..7b928e6


Changed by andrew_b (@aborodin) on Apr 25, 2016 at 10:35 UTC (comment 21)

  • Status changed from testing to closed

@mc-butler mc-butler marked this as a duplicate of #3624 Feb 28, 2025