XP file sorting rant

History

For as long as computers have had file systems, file names have been text strings: some variable number of characters, subject to various limits on length and on which characters are valid, and sometimes subdivided into parts. Early file systems had no provision for sorting the names. They appeared in the order in which they were allocated in the file system's directory, which was sometimes the order in which the files were created. But if files were deleted and others created, gaps were left and later filled, and the order appeared to be quite random (though it was generally quite deterministic, if only one knew the algorithm and the history of file creation and deletion operations).

It was a great step forward when directory listing commands (often called "dir", sometimes called "ls") gained options to sort the output in various ways: by name, by date, by size, and by other attributes.

Sorts by name were always by character code of each character, beginning at the first and proceeding to the last, either for the filename as a whole, or for each syntactical unit of the filename.

Long filenames have been available in most desktop operating systems for quite a few years now, and most software handles them comfortably at this point. That has greatly reduced the incentive to create encoding schemes that squeeze unique, meaningful data into 8 or 14 character filenames... however, people still do embed a variety of numeric information into filenames. And there are still embedded systems running old operating systems with limited length filenames.

A problem

One problem with text string sorting of filenames containing sequence numbers arises when the sequence numbers are composed of a variable number of digits (think more than 9, or more than 99, etc.). If one does not plan ahead, the text-sorted order of the filenames might include things like...

file198
file199
file2
file20
file200
file201

For a sequence of files numbered, say, from 1 to 500, this ordering is not intuitive. One would want the files to be ordered by the value of the digit-substring taken as a whole, and that can easily be achieved by using leading zero digits to make the numeric portion of the name have a fixed number of digits, resulting in...

file002
...
file020
...
file198
file199
file200
file201
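
The leading-zero fix can be sketched in a few lines of Python. This is only an illustration: the helper name padded_name and the width of 3 are assumptions chosen to match the example names above, not anything a particular operating system provides.

```python
def padded_name(prefix, number, width=3):
    """Build a name such as 'file002' with a fixed-width, zero-padded number.

    Hypothetical helper; the width of 3 is an assumption that fits 1 to 500.
    """
    return f"{prefix}{number:0{width}d}"

numbers = [198, 199, 2, 20, 200, 201]

# Plain character-code sort of names with variable-length numbers.
unpadded = sorted(f"file{n}" for n in numbers)

# The same plain sort, applied to names with fixed-width numbers.
padded = sorted(padded_name("file", n) for n in numbers)

print(unpadded)
print(padded)
```

With the padded names, the ordinary character-by-character sort already puts the files in numeric order, with no special support from the sorting code.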

A Microsoft style improvement

In XP, Microsoft decided to solve the above problem in a different way. Instead of treating the filename as a plain text string, sorting it left to right, and letting users supply simple solutions like the above to the simple problems they might encounter, XP's Windows Explorer now splits each filename into a sequence of fields, and applies text sorting rules to the text fields and numeric sorting rules to the numeric fields. The fields are variable length: a numeric field is a run of ASCII digits from 0 through 9, and a text field is a run of anything else. This "solves" the above problem, resulting in a sort order for the above filenames of

file2
...
file20
...
file198
file199
file200
file201
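
The field-by-field rule described above can be approximated with a sort key. This is a sketch of the behavior as described in this article, not Microsoft's actual implementation. The name field_sort_key, the choice to put numeric fields ahead of text fields, and the lowercasing of text fields are all assumptions made for illustration.

```python
import re

# Split on runs of ASCII digits only. [0-9] is used instead of \d because
# \d in Python also matches non-ASCII Unicode digits.
FIELD_RE = re.compile(r"[0-9]+|[^0-9]+")

def field_sort_key(name):
    """Return a key that compares digit runs by value and other runs as text."""
    key = []
    for field in FIELD_RE.findall(name):
        if field[0] in "0123456789":
            # Numeric field: compare by integer value, so 2 < 20 < 198.
            key.append((0, int(field), ""))
        else:
            # Text field: compare as text (lowercasing is an assumption).
            key.append((1, 0, field.lower()))
    return key

names = ["file198", "file199", "file2", "file20", "file200", "file201"]

print(sorted(names))                      # plain character-code order
print(sorted(names, key=field_sort_key))  # field-by-field order
```

The second line of output matches the XP-style list above; the first matches the original text-sorted list.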

Clearly the enhancement to XP is an improvement, right?

Let us consider other uses of digit strings in file names besides simple ordered sequences. Another common type of number to embed in filenames is the version of a distributed product. Version numbers are often variable in length, and often contain punctuation. Here are some typical version numbers...

v1
v1.5
v2
v2.1.7
v2.1.8
v2.2

Often, the version is encoded in a filename without the punctuation characters... resulting, on most operating systems, in

product_v1
product_v15
product_v2
product_v217
product_v218
product_v22

This works great, and even sorts right... except now, in XP, it sorts as

product_v1
product_v2
product_v15
product_v22
product_v217
product_v218

This is not nearly as useful as it was.
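
To make the comparison concrete, here is a small Python sketch that orders the sample names three ways. The helpers version_tuple and squeezed are hypothetical names invented for this illustration, and the agreement between text order and version order relies on the sample versions having single-digit components.

```python
versions = ["v2.2", "v1", "v2.1.8", "v1.5", "v2", "v2.1.7"]

def version_tuple(version):
    """Turn 'v2.1.7' into (2, 1, 7) so versions compare component by component."""
    return tuple(int(part) for part in version[1:].split("."))

def squeezed(version):
    """Encode a version in a filename without punctuation: 'v2.1.7' -> 'product_v217'."""
    return "product_" + version.replace(".", "")

# The order a person reading version numbers would expect.
by_version = [squeezed(v) for v in sorted(versions, key=version_tuple)]

# Plain character-code sort of the squeezed filenames.
by_text = sorted(squeezed(v) for v in versions)

# Sort that treats the trailing digit run as one number, as XP does.
by_number = sorted((squeezed(v) for v in versions),
                   key=lambda name: int(name.rsplit("v", 1)[1]))

print(by_version == by_text)  # text order agrees with version order here
print(by_number)
```

For these names the plain text sort reproduces the version order exactly, while the numeric-field sort produces the second list above.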

In some applications, hexadecimal numbers with a fixed number of digits are placed into filenames. Because hexadecimal numbers may contain the letters A, B, C, D, E, and F in any digit position, XP may split a single hexadecimal number into several numeric and text fields. What would appear in numerical order on most operating systems,

file0001
file0399
file03e6
file0a53
file0a7c

appears on XP as

file0a7c
file0a53
file0001
file03e6
file0399

What's more, if you add 7000 (hex) to these numbers, the order changes again, to

file7a7c
file7a53
file73e6
file7001
file7399

Knowing how to count in hexadecimal isn't going to help anyone find a file in the "ordered" list XP's Windows Explorer presents!
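
A short Python sketch shows where the orderings above come from, by printing each name next to its hexadecimal value and the fields an ASCII-digit splitter would produce. The fields helper is a hypothetical illustration of the splitting rule as described in this article, not XP's own code.

```python
import re

def fields(name):
    """Split a name into text fields (str) and ASCII-digit fields (int)."""
    return [int(f) if f[0] in "0123456789" else f
            for f in re.findall(r"[0-9]+|[^0-9]+", name)]

for name in ["file0001", "file0399", "file03e6", "file0a53", "file0a7c"]:
    # int(..., 16) reads the last four characters as one hexadecimal number.
    print(name, int(name[-4:], 16), fields(name))
```

For example, file0a7c splits into 'file', 0, 'a', 7, 'c', so its first numeric field is 0, while file0399 splits into 'file', 399. Comparing 0 against 399 puts file0a7c first, even though its hexadecimal value is the largest of the five.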

And Windows XP supports Unicode in filenames... but the numeric digits for Thai, for Chinese, and other numeration systems, all available in Unicode, are not used to delineate numeric subfields for filenames. So this solution only works (to the extent that you consider the above working), for people using the ASCII digits for their numeric subfields.
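
The distinction can be seen in Python, which does know about non-ASCII decimal digits. This sketch does not test XP itself. It only shows that a splitter restricted to the ASCII digits 0 through 9, as described above, sees a run of Thai digits as ordinary text. The filename used here is made up for the example.

```python
import re

thai = "\u0e51\u0e52\u0e53"  # Thai digits one, two, three

# Unicode classifies these characters as decimal digits...
print(thai.isdecimal())
print(int(thai))

# ...but none of them is an ASCII digit.
print(any(c in "0123456789" for c in thai))

# So an ASCII-only splitter leaves the whole name as a single text field.
ascii_fields = re.findall(r"[0-9]+|[^0-9]+", "file" + thai)
print(len(ascii_fields))
```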

And Windows XP still has a command prompt, and the command prompt still supports the "dir" command... which still sorts the way every other operating system sorts filenames... leading to further confusion and inconsistency.

And why, oh why, didn't Windows XP figure out that I want the following list of files sorted in the order given below?

zero
XCIV
417
seven hundred sixty eight
MMCCLXIII

The final word?

It is not clear that the final word on this topic has been written. But in SP1, Microsoft quietly added a new policy. Adding the following value under one or both of these registry keys will undo the filename sorting feature introduced in XP:

[HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer]
"NoStrCmpLogical"=dword:00000001

[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer]
"NoStrCmpLogical"=dword:00000001

Use the first one if you wish to undo it for all users of the machine, or the second one if you wish to undo it just for the currently logged in user.

It appears that some customers must have spoken, and that they didn't like the feature, and that they were important customers (since Microsoft listened).