Thursday, November 12, 2015

Timsort

So quicksort is pretty cool. Everybody knows "how" it works, meaning we've all seen a visualization of quicksort picking a pivot, going through and finding values less than or greater than and moving them around the pivot. Unfortunately very few people correctly implement the algorithm correctly on their first try, or if they did, they've never implemented it since. 

Less common that quicksort is the man behind the curtain (edit: I made this post when I was pretty exhausted, is "man behind the curtain" even a phrase? Does it make sense here?)... Timsort. Timsort is based on Tim Peter's list sort for python described here. It even runs O(n lg n) in the worst case, and often faster on real world data! As an adaptive sorting routine, Timsort uses a binary sort when it recurses deeply enough and combines sets of data that were ordered to begin with. (More edits... Tim sort is also a stable sorting algorithm, meaning that if you want to compare array vs array like with Strings, you can compare each piece of that array. This means objects with equal keys will not be sorted arbitrarily.)

Aside from the coolness of having beaten quicksort Timsort has its downsides, for example it does also use n/2 extra space where qiucksort does sorting in-place. In most applications I would have, the extra space would be no problem, however for big data analysis n/2 extra space could be GBs or TBs.

No comments:

Post a Comment