Greg Chesson writes:
...
> Linux has proven to be worthwhile as a node controller in an MPP
> architecture - that's what a Beowulf is. But that does not make it
> ready for SMP nodes that scale to large numbers. It seems wasteful to
> program a large scale ccNUMA machine the same way as a Beowulf
> cluster: you'd be throwing away most of the capabilities of the
> hardware. That's why I don't think it is interesting or particularly
> useful... unless a massive amount of work went into rewriting the io
> and memory management subsystems, not to mention scheduling,
> administration, etc.
...
I expect that much of the work will gradually happen in linux, once
the remaining small-CPU-count issues are resolved. Right now, there
is no shortage of interesting problems to attack. :-)

One possible way to approach the large-CPU-count space with linux
is indeed to run multiple linux kernels, one per node in a ccNUMA
machine, and add a distributed OS layer at a fairly high level. If it
is not already underway somewhere, I would expect someone to take it
up as a graduate school project. Some systems of this sort
have been built or attempted, with varying degrees of success. Given
the linux bias toward small and simple, a linux-based distributed OS
might actually work.

One important ingredient in such a system, which would be
valuable immediately for clusters, would be an efficient distributed
volume manager and file system.

In any case, trying to port linux straight to a large ccNUMA
(or even a large SMP) system would be a lot of effort for limited
return at present.