Old 02-22-2012, 10:30 PM   #11 (permalink)
Jeff Higgins
Default O.T. optimising file placement

On 02/22/2012 05:52 PM, Gene Wirchenko wrote:
> On Wed, 22 Feb 2012 16:03:41 -0500, Jeff Higgins
> <jeff@invalid.invalid> wrote:
>
>> On 02/22/2012 04:00 PM, Jeff Higgins wrote:
>>> On 02/22/2012 02:15 PM, Roedy Green wrote:
>>>>
>>>> Back before the Internet, I was pushing for what I call "Marthaing"
>>>> drives. We might get them any year now.
>>> Who is Martha? Back before the Internet I was advocating for squeezable
>>> catsup bottles. We have'em, but I haven't got a dime for'em.

>
>> <http://uncyclopedia.wikia.com/wiki/Ketchup_v._Catsup>

>
> I noticed that the tone is not as academic as Wikipedia.
>

Yep
"Tomato ketchup is a pseudoplastic — or "shear thinning" substance —
which can make it difficult to pour from a glass bottle."

Old 02-23-2012, 12:30 AM   #12 (permalink)
Daniel Pitts
Default O.T. optimising file placement

On 2/22/12 3:24 PM, Jeff Higgins wrote:
> On 02/22/2012 05:52 PM, Gene Wirchenko wrote:
>> On Wed, 22 Feb 2012 16:03:41 -0500, Jeff Higgins
>> <jeff@invalid.invalid> wrote:
>>
>>> On 02/22/2012 04:00 PM, Jeff Higgins wrote:
>>>> On 02/22/2012 02:15 PM, Roedy Green wrote:
>>>>>
>>>>> Back before the Internet, I was pushing for what I call "Marthaing"
>>>>> drives. We might get them any year now.
>>>> Who is Martha? Back before the Internet I was advocating for squeezable
>>>> catsup bottles. We have'em, but I haven't got a dime for'em.

>>
>>> <http://uncyclopedia.wikia.com/wiki/Ketchup_v._Catsup>

>>
>> I noticed that the tone is not as academic as Wikipedia.
>>

> Yep
> "Tomato ketchup is a pseudoplastic — or "shear thinning" substance —
> which can make it difficult to pour from a glass bottle."
>

Edible non-Newtonian fluids FTW
Old 02-23-2012, 01:30 AM   #13 (permalink)
Martin Gregorie
Default O.T. optimising file placement

On Wed, 22 Feb 2012 14:27:17 -0800, Lew wrote:

> Modern hard drives, pretty much all of them, have a buffer and
> microprocessor as part of the hardware. We're not going to get any
> "Marthaing" as you describe it (wherever the heck /that/ term came from)
> because what they're already doing is already so effective.
>
> What they mostly do is collect read and write requests and combine them
> in elevator-seek order, along with full-track readahead. This optimizes
> disk access for single sweeps of the drive heads.
>

Agreed, and a mainframe OS I was using in the early '70s (ICL's George 3)
was doing it back then and very effective it is too for speeding up disk
access. Back in the day it pushed the speed of the 2800 rpm, 60 MB
washing-machine sized disk drives up from around 8 accesses/sec to
something like 20-30 per sec.
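
As a rough illustration of the elevator-seek ordering described above, here is a minimal Java sketch. It is not how any particular drive or George 3 implemented it: the class name ElevatorSketch, the cylinder numbers and the starting head position are all made up for the example, and it models the common variant where the head reverses at the last pending request rather than at the edge of the disk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only: orders pending cylinder requests for one upward sweep, then one downward sweep. */
final class ElevatorSketch {

    /** Returns the requests in elevator order starting from headPos, moving upward first. */
    static List<Integer> elevatorOrder(List<Integer> pending, int headPos) {
        List<Integer> up = new ArrayList<>();
        List<Integer> down = new ArrayList<>();
        for (int cyl : pending) {
            if (cyl >= headPos) {
                up.add(cyl);
            } else {
                down.add(cyl);
            }
        }
        Collections.sort(up);                               // ascending on the way up
        Collections.sort(down, Collections.reverseOrder()); // descending on the way back
        List<Integer> order = new ArrayList<>(up);
        order.addAll(down);
        return order;
    }

    /** Total head travel, in cylinders, when servicing requests in the given order. */
    static int headTravel(List<Integer> order, int headPos) {
        int total = 0;
        int pos = headPos;
        for (int cyl : order) {
            total += Math.abs(cyl - pos);
            pos = cyl;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical queue of pending requests (cylinder numbers) and head position.
        List<Integer> queue = List.of(98, 183, 37, 122, 14, 124, 65, 67);
        int head = 53;
        System.out.println("FIFO travel:     " + headTravel(queue, head));
        System.out.println("Elevator travel: " + headTravel(elevatorOrder(queue, head), head));
    }
}
```

With this made-up queue the arrival-order travel is 640 cylinders and the elevator-order travel is 299. With a single pending request the two orders are identical, so there is nothing to gain.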

However, it's ineffective unless there are many active processes
simultaneously requesting disk i/o. If all the requests come from one
single-threaded process then it can't optimize head movement because
there's never more than one pending request at a time. I know this is
reductio ad absurdum, but it does make the point that a small active
process population is unlikely to be optimised as well as a large one.
This is relevant today for almost all single-user workstations
regardless of whether they are running Windows, Linux or OS X. Since the
majority of applications run on these machines are single-threaded, about
the only time you have more than one process accessing the disk is when
the user is hammering away at a task, be it word processing, spreadsheet,
browser or IDE, and the mail reader, sitting in the background, finds some
mail waiting.

> The on-drive buffer
> also holds enough data for most reads and writes, overtaking any
> advantage that any (perforce extremely slow) physical re-ordering of the
> tracks could accomplish.
>

Yep, the on-drive buffer will almost always be capable of holding several
physical tracks and, in addition, on a *NIX system anyway, all RAM not
occupied by running processes and their data will contain disk buffers.


--
martin@ | Martin Gregorie
gregorie. | Es***, UK
org |
Old 02-23-2012, 02:30 AM   #14 (permalink)
Arne Vajhøj
Default O.T. optimising file placement

On 2/22/2012 8:31 PM, Martin Gregorie wrote:
> On Wed, 22 Feb 2012 14:27:17 -0800, Lew wrote:
>> Modern hard drives, pretty much all of them, have a buffer and
>> microprocessor as part of the hardware. We're not going to get any
>> "Marthaing" as you describe it (wherever the heck /that/ term came from)
>> because what they're already doing is already so effective.
>>
>> What they mostly do is collect read and write requests and combine them
>> in elevator-seek order, along with full-track readahead. This optimizes
>> disk access for single sweeps of the drive heads.
>>

> Agreed, and a mainframe OS I was using in the early '70s (ICL's George 3)
> was doing it back then and very effective it is too for speeding up disk
> access. Back in the day it pushed the speed of the 2800 rpm, 60 MB
> washing-machine sized disk drives up from around 8 accesses/sec to
> something like 20-30 per sec.
>
> However, it's ineffective unless there are many active processes
> simultaneously requesting disk i/o. If all the requests come from one
> single-threaded process then it can't optimize head movement because
> there's never more than one pending request at a time. I know this is
> reductio ad absurdum, but it does make the point that a small active
> process population is unlikely to be optimised as well as a large one.
> This is relevant today for almost all single-user workstations
> regardless of whether they are running Windows, Linux or OS X. Since the
> majority of applications run on these machines are single-threaded, about
> the only time you have more than one process accessing the disk is when
> the user is hammering away at a task, be it word processing, spreadsheet,
> browser or IDE, and the mail reader, sitting in the background, finds some
> mail waiting.


Most OS'es support async IO.

Arne
Old 02-23-2012, 03:30 AM   #15 (permalink)
Gene Wirchenko
Default O.T. optimising file placement

On Wed, 22 Feb 2012 18:24:06 -0500, Jeff Higgins
<jeff@invalid.invalid> wrote:

>On 02/22/2012 05:52 PM, Gene Wirchenko wrote:
>> On Wed, 22 Feb 2012 16:03:41 -0500, Jeff Higgins
>> <jeff@invalid.invalid> wrote:
>>
>>> On 02/22/2012 04:00 PM, Jeff Higgins wrote:
>>>> On 02/22/2012 02:15 PM, Roedy Green wrote:
>>>>>
>>>>> Back before the Internet, I was pushing for what I call "Marthaing"
>>>>> drives. We might get them any year now.
>>>> Who is Martha? Back before the Internet I was advocating for squeezable
>>>> catsup bottles. We have'em, but I haven't got a dime for'em.

>>
>>> <http://uncyclopedia.wikia.com/wiki/Ketchup_v._Catsup>

>>
>> I noticed that the tone is not as academic as Wikipedia.
>>

>Yep
>"Tomato ketchup is a pseudoplastic — or "shear thinning" substance —
>which can make it difficult to pour from a glass bottle."


"Would you like fries with that?"

Sincerely,

Gene Wirchenko
Old 02-23-2012, 10:30 PM   #16 (permalink)
Martin Gregorie
Default O.T. optimising file placement

On Wed, 22 Feb 2012 21:40:19 -0500, Arne Vajhøj wrote:

>
> Most OS'es support async IO.
>

Yes, I know, but it's not relevant to a single-threaded process since its
logic generally requires it to wait for a read or write to complete
before it continues[1]. Hence my comment that this prevents head movement
being optimized unless a lot of processes are active because there's only
one outstanding IOP per process.

[1] unless you're deliberately doing async i/o using poll() or
select() (in C) or nio (in Java), in which case the process is often
best regarded as a half-way house between single and multi-threaded
logic.
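
For what the footnote's exception can look like in Java, here is a hedged sketch using AsynchronousFileChannel from NIO.2 (Java 7), which is a different mechanism from the poll()/select() style mentioned above. One thread issues several reads before waiting on any of them, so more than one request is pending at once. The class name OutstandingReadsSketch, the offsets and the block size are made up for illustration, and on some platforms the JDK may service these requests with background threads rather than kernel-level async I/O, so how many requests actually reach the drive's queue together is platform dependent.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Illustrative only: one caller thread with several reads in flight at once. */
final class OutstandingReadsSketch {

    /** Issues one read per offset before waiting on any, then returns the bytes read per request. */
    static List<Integer> readAll(Path file, long[] offsets, int blockSize)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            List<Future<Integer>> inFlight = new ArrayList<>();
            for (long offset : offsets) {
                // read() returns at once; the request is now pending.
                inFlight.add(ch.read(ByteBuffer.allocate(blockSize), offset));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> f : inFlight) {
                counts.add(f.get()); // block only now, after every request has been issued
            }
            return counts;
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("reads", ".bin");
        try {
            Files.write(tmp, new byte[64 * 1024]);
            // Deliberately non-sequential offsets, as a scattered access pattern would produce.
            System.out.println(readAll(tmp, new long[] {0, 32768, 16384, 49152}, 4096));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```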


--
martin@ | Martin Gregorie
gregorie. | Es***, UK
org |
Old 02-24-2012, 05:00 AM   #17 (permalink)
Patricia Shanahan
Default O.T. optimising file placement

On 2/23/2012 3:16 PM, Martin Gregorie wrote:
> On Wed, 22 Feb 2012 21:40:19 -0500, Arne Vajhøj wrote:
>
>>
>> Most OS'es support async IO.
>>

> Yes, I know, but it's not relevant to a single-threaded process since its
> logic generally requires it to wait for a read or write to complete
> before it continues[1]. Hence my comment that this prevents head movement
> being optimized unless a lot of processes are active because there's only
> one outstanding IOP per process.
>
> [1] unless you're deliberately doing async i/o using poll() or
> select() (in C) or nio (in Java), in which case the process is often
> best regarded as a half-way house between single and multi-threaded
> logic.
>
>


There are some exceptions to this. For example, if you are reading a
file sequentially, the OS may prefetch blocks you have not yet
requested, and have multiple reads outstanding as a result.

Depending on the OS and how the IO is being handled, a write may appear
to be complete from the program's point of view once the data has been
copied to a kernel buffer. The OS may be writing out modified blocks,
including swap space blocks, at any time.
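
A small Java sketch of that distinction, under the assumption that the program wants the data pushed to the device before it carries on: write() may return once the bytes sit in an OS buffer, while FileChannel.force(true) asks the OS to write content and metadata out to the storage device. The class name WriteThenForceSketch and the file contents are made up for the example, and whether the drive's own cache is also flushed depends on the OS and hardware.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative only: a write that may be buffered, followed by an explicit flush request. */
final class WriteThenForceSketch {

    /** Writes text, then asks the OS to push it to the storage device before returning. */
    static void writeDurably(Path file, String text) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf); // may return as soon as the data is in an OS buffer
            }
            ch.force(true);    // request that content and metadata reach the device
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("durable", ".txt");
        try {
            writeDurably(tmp, "hello");
            System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```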

Patricia
Old 02-24-2012, 06:30 PM   #18 (permalink)
Lew
Default O.T. optimising file placement


Martin Gregorie wrote:
> Unlike some, I take a good deal of interest in what my machines are up
> to, so I was quoting what I see using top on my Linux systems. During
> normal operation there is very little activity on my laptop except from
> the programs I'm actively using unless, as you say, logwatch/smartd/
> rkhunter/updatedb get run by atd, but on a reasonably quick machine they
> don't run for long.
>
> Of course, the house server is a different case, since it has several
> 24/7 services on it, but again its only heavy, continuous disk activity
> is overnight when it runs backups/logwatch/smartd/updatedb. Apart from
> that, requests that wake up Postfix/Spamassassin/Apache or ftpd/sshd are
> pretty sporadic and the disk LED flashes are best described as
> intermittent.


Sounds like disk optimizations would help that system.

> The longest continuously busy time on either machine is during backups
> and even there there's precious little contention, since with rsync or
> tar+gzip the only stuff being written to the disk it's reading from is
> backup logs. Same applies to software update sessions. To the best of my
> knowledge (and watching top) none of yum, rpm, tar, gzip or rsync are
> multi-threaded: rsync is probably using poll() based async i/o but from
> top and observed behaviour none of the others seem to do that. In fact
> the only long-running programs on my systems that I know to be
> multi-threaded are Apache, Postgres, SA and Postfix.


Now /that/ is objective evidence.

In your particular case you have no need of optimization of your disk
processes. You don't mention it but by omission I will grant you that virtual
memory on your system does not seriously contend for disk either. But a
typical consumer scenario is to listen to a stream while surfing the web on
Windows with several chat windows open, causing multiple disk IO ops on a
constant basis of themselves and also putting pressure on virtual memory. Even
such a single-user system can benefit from elevator seeking and on-disk buffers.

Consider also that burstiness of demand does not argue against the need for
optimization, really. During bursts the optimization helps, and a user might
complain if their disks got weird once an hour.

Regardless, if you don't need optimization why worry? It's like the Pope
comparing brands of condoms.

Again, we don't excoriate the value of optimizations by citing examples where
optimization isn't needed. We evaluate optimizations by how useful they are
when they are needed.

--
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedi.../c/cf/Friz.jpg
Old 02-24-2012, 07:30 PM   #19 (permalink)
Martin Gregorie
Default O.T. optimising file placement

On Thu, 23 Feb 2012 21:45:33 -0800, Patricia Shanahan wrote:

> On 2/23/2012 3:16 PM, Martin Gregorie wrote:
>> On Wed, 22 Feb 2012 21:40:19 -0500, Arne Vajhøj wrote:
>>
>>
>>> Most OS'es support async IO.
>>>

>> Yes, I know, but it's not relevant to a single-threaded process since
>> its logic generally requires it to wait for a read or write to complete
>> before it continues[1]. Hence my comment that this prevents head
>> movement being optimized unless a lot of processes are active because
>> there's only one outstanding IOP per process.
>>
>> [1] unless you're deliberately doing async i/o using poll() or
>> select() (in C) or nio (in Java), in which case the process is
>> often best regarded as a half-way house between single and
>> multi-threaded logic.
>>
>>
>>

> There are some exceptions to this. For example, if you are reading a
> file sequentially, the OS may prefetch blocks you have not yet
> requested, and have multiple reads outstanding as a result.
>

Fair point, and I've seen blinding speed from reads where the disk
drivers used track reads, but it still doesn't affect my point that
there's only one I/O request in the queue per active single-threaded
process. Head movement optimisation is simply sidestepped in
this case.

> Depending on the OS and how the IO is being handled, a write may appear
> to be complete from the program's point of view once the data has been
> copied to a kernel buffer. The OS may be writing out modified blocks,
> including swap space blocks, at any time.
>

Again agreed: it's fair to regard a write as complete from the program's
POV as soon as it can reread the block/record - something that many
indexed sequential access schemes need to do to re-establish a 'current
record' pointer.


--
martin@ | Martin Gregorie
gregorie. | Es***, UK
org |
Old 02-24-2012, 11:30 PM   #20 (permalink)
Martin Gregorie
Default O.T. optimising file placement

On Fri, 24 Feb 2012 10:50:39 -0800, Lew wrote:

> Martin Gregorie wrote:
>> Unlike some, I take a good deal of interest in what my machines are up
>> to, so I was quoting what I see using top on my Linux systems. During
>> normal operation there is very little activity on my laptop except from
>> the programs I'm actively using unless, as you say, logwatch/smartd/
>> rkhunter/updatedb get run by atd, but on a reasonably quick machine
>> they don't run for long.
>>
>> Of course, the house server is a different case, since it has several
>> 24/7 services on it, but again its only heavy, continuous disk activity
>> is overnight when it runs backups/logwatch/smartd/updatedb. Apart from
>> that, requests that wake up Postfix/Spamassassin/Apache or ftpd/sshd are
>> pretty sporadic and the disk LED flashes are best described as
>> intermittent.

>
> Sounds like disk optimizations would help that system.
>

Probably not - they are all cron jobs and hence get run sequentially.

>> The longest continuously busy time on either machine is during backups
>> and even there there's precious little contention, since with rsync or
>> tar+gzip the only stuff being written to the disk it's reading from is
>> backup logs. Same applies to software update sessions. To the best of
>> my knowledge (and watching top) none of yum, rpm, tar, gzip or rsync
>> are multi-threaded: rsync is probably using poll() based async i/o but
>> from top and observed behaviour none of the others seem to do that. In
>> fact the only long-running programs on my systems that I know to be
>> multi-threaded are Apache, Postgres, SA and Postfix.

>
> In your particular case you have no need of optimization of your disk
> processes. You don't mention it but by omission I will grant you that
> virtual memory on your system does not seriously contend for disk
> either.
>

Well spotted. My type of load almost never swaps. That was the case with
the old 512 MB RAM box and is doubly true with its replacement (4 GB
RAM), but that still doesn't stop me setting swap space at twice RAM.

In fact the only program I have that does use gobs of RAM is a JavaMail +
Postgres app and I'm not sure if it's a problem due to JavaMail's queueing
or if I've got overly long-lived Object instances. Tracking this down is
on my to-do list. All I know at present is that the same program using
the same JVM uses gobs more RAM on the new machine (which is 6 times
faster as well as having 8x more RAM), so it might simply be a case of
persuading the GC to run more often.

> But a typical consumer scenario is to listen to a stream while
> surfing the web on Windows with several chat windows open, causing
> multiple disk IO ops on a constant basis of themselves and also putting
> pressure on virtual memory. Even such a single-user system can benefit
> from elevator seeking and on-disk buffers.
>

I'm not saying head movement optimisation is a bad thing, just that it
can be difficult to get enough queued requests for it to work without a
large population of active processes that all do a lot of disk accesses.

You may well be right about the typical consumer setup: I lack any
experience of that: all I understand is the pattern that my own use
generates. However, I would point out that streamed music or video may
never touch the disk (though of course a torrent will). The amount of
disk i/o due to chat/IM/Twitter/web browsers may be less than we'd expect
because (a) it's very bursty and (b) disk i/o time is vastly outweighed by
human reading and typing time.

> Consider also that burstiness of demand does not argue against the need
> for optimization, really. During bursts the optimization helps, and a
> user might complain if their disks got weird once an hour.
>

Sure, but the user's activity is typically a scan and the resulting
interaction with one program at a time, which may well be single-threaded,
for a few minutes before switching to another. This tends to produce
widely separated bursts of i/o from one or two processes.

> Regardless, if you don't need optimization why worry? It's like the Pope
> comparing brands of condoms.
>

Like it!

> Again, we don't excoriate the value of optimizations by citing examples
> where optimization isn't needed. We evaluate optimizations by how useful
> they are when they are needed.
>

I wasn't intending to do that, having seen just how well head scheduling
works. I merely intended to point out that there are corner cases where
such algorithms don't help - but are not a hindrance either.


--
martin@ | Martin Gregorie
gregorie. | Es***, UK
org |
All times are GMT.