I think this is the closest I've come to writing a Cthulhu Mythos
document: if you don't know what's going on, it's harmless, but with a
little knowledge it can be mildly disturbing…
So I was in a bad mood and wanted a programming problem. I've
never implemented quicksort before (I tend to use languages that have
sorting functions built in). And I was already thinking backwards. So
I implemented this in PostScript from the algorithm description on
Wikipedia, to see how much work it would be. Answer, really not very
much at all.
The public entry point: define the dict entry that serves as an anchor
for the array (I suspect there's a better way of doing this), and kick
off the top-level sort.
/quicksort { % [ a c b ] -> [ a b c ]
1 dict begin
/arr exch def
0 arr length 1 sub quicksort.main
arr
end
} bind def
The main routine: make sure there's something there to sort, generate
a partition, and quicksort each subpart. (It's dead handy that full
stops are valid characters in a PostScript name, so I can put all my
specialised routines into their own namespace. For full OO I'd shove
them into their own dict, but I don't think that really generates any
major advantages – and it would need an equivalent of self
or this
which would probably have to be based on where
plus an object-local
name.)
/quicksort.main { % lo hi -> (null)
3 dict begin
/hi exch def
/lo exch def
/xit false def
lo 0 lt {
/xit true def
} if
hi 0 lt {
/xit true def
} if
lo hi ge {
/xit true def
} if
xit not {
/p quicksort.partition def
lo p quicksort.main
p 1 add hi quicksort.main
} if
end
} bind def
The partitioning routine (this is Hoare's original version): pick the
middle value in the part of the array we're looking at, then walk
inwards from both ends, swapping entries if they're out of relative
order.
/quicksort.partition {
3 dict begin
/pivot arr hi lo add 2 idiv get def
/i lo 1 sub def
/j hi 1 add def
{
{
/i i 1 add def
arr i get pivot ge {
exit
} if
} loop
{
/j j 1 sub def
arr j get pivot le {
exit
} if
} loop
i j ge {
j
exit
} if
i j quicksort.swap
} loop
end
} bind def
Finally, swap two array entries by index.
/quicksort.swap {
2 dict begin
/bi exch def
/ai exch def
arr ai get
arr bi get
arr exch ai exch put
arr exch bi exch put
end
} bind def
Probably this could be done with fewer variables, but it works nicely
on GhostScript. Beware: PostScript arrays can be heterogeneous, and if
the things in the array you ask to sort aren't comparable (e.g. a
number versus a string) it will fail.
If this is of actual use to you… well, it probably shouldn't be.
- Posted by Owen Smith at
06:51pm on
30 January 2022
These days all sort algorithms are sub optimal. Things like quicksort and anything else recommended in textbooks from the 1980s are all designed on the assumption that the comparisons are the expensive bit, which they were then. But these says comparisons in fast cached CPUs are virtually free, and the expensive bit now is accessing external RAM which takes several thousand cycles. Keeping things in cache matters, but there has been no new research on how to do better sorting and instead we just throw faster hardware at it. In the absence of better research, doing everything as an external merge sort wouldn't be a bad start since that assumes all data is read from and written to tape.
- Posted by Random Visitor at
12:07am on
31 January 2022
"Mathematical Illustrations: A Manual of Geometry and PostScript" warns against opening a dictionary in your recursing routine and recursing while it's still open, as you're prone to run out of dictstack.
- Posted by RogerBW at
10:46am on
31 January 2022
Welcome, Mx Visitor, or may I call you Random?
I agree that that's a valid consideration, but I think Casselman's priority is not necessarily a universal one; it's twenty-five years later, and we don't always need to save every byte. Which is a justification for "I was too lazy to do it".
To remove the per-function dicts, which is only really worth doing in quicksort.main
, would mean moving more to the stack. (Casselman's approach is to use a dict but stuff its values onto the stack and end
it before entering the recursive parts, and I may well do that in a future diversion.)
- Posted by Random Visitor at
05:41pm on
02 February 2022
I really do hate "Mx". The default in English is still masculine if unknown, and making like snowflakes where everything just has to be special... we can't all ride the short bus, you know.
Owen makes a valid point in that there could be more research into relative costs beyond the rather crude abstraction we're using for algorithm complexity. I recall Poul-Henning Kamp mentioning that reshuffling the storage pattern helped more tree-traversal access stay in cache, yielding a nice speed-up for Varnish.
I wonder whether merge sort makes a good starting point, though presumably with suitable (implicit or explicit) prefaulting you could do a lot. No idea how that would translate to a PostScript interpreter. But key-based access is bound to be slower and bulkier than stack-based access so even if dictstacks have become bigger, relying on stacks of dicts rather than the parameter stack likely is still a bit of a drag. So, next PostScript-nerdery session might have to see some benchmarking done. I presume you did test the implementation. What's the size of the largest list you sorted?
Of course, a toy implementation written for the benefit of the programmer rather than for solving problems need not consider efficiency. But just saying "well it's been 25 years" and let the abundance of hardware compensate just as much means that you're expecting less performance, relatively speaking, out of current hardware than out of old hardware.
You'd be in good company, though. I get about the same (un)usability experience out of a webmodern browser on a modern OS on modern-ish hardware as I got out of 25 year old hardware with a 20 year old OS and matching webbrowser. Yes, it can do things now it couldn't do back then, but it doesn't really add much in usability, getting shit done, or any other metric beyond "gee this thing is almost as nice as a TV to look at". Well, I didn't get into computing to cultivate my inner couch potato.
There's also that PostScript is ment to run on printers, and they tend to lag a little behind the state of the art in computing resources, when not trying very hard to go backward in terms of capabilities.
- Posted by RogerBW at
06:12pm on
02 February 2022
That's nice. I "really do hate" people who use fake names and email addresses so that I can't add them to an auto-pass list but have to approve their comments by hand each time.
Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.