07-06-2020, 01:10 AM
I started registering my collection of records, cassettes and tape recordings on the computer sometime in 1978. Until then I had it all written out on paper.
As soon as the data was in the computer I encountered the issue of sorting the artists.
My last major redesign of the database and software was in 2001. This is the software I still use to this day.
The issue to be solved is how to sort arbitrary artists (people names, band names). I tried a number of heuristic algorithms but they all had their flaws. From the name alone it is impossible to distinguish a band name like Pink Floyd from a person name like Eddy Floyd. Not to mention Jethro Tull (the person) and Jethro Tull (the band).
My first attempt was to indicate the sort key in the display name with an asterisk: *Pink Floyd versus Eddy *Floyd. (For convenience a leading asterisk could be left out. No asterisk meant sorting 'as is'). This had certain limitations so I quickly switched to a format with two fields: the sort key and the rest: "Pink Floyd; Pink Floyd" and "Floyd, Eddy; Eddy Floyd".
From there it was a trivial step to "Pink Floyd" (no asterisk → sort 'as is') and "Floyd; Eddy *".
Transformation from sortname to display name was a single regexp replacement: s/^(.*?);\s(.*)\*(.*)/$2$1$3/ (in Perl).
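For those who do not read Perl, here is a minimal sketch of the same substitution in Python. The function name display_name and the sample list are made up for illustration; only the pattern itself comes from the Perl one-liner above.

```python
import re

# Python port of the Perl substitution s/^(.*?);\s(.*)\*(.*)/$2$1$3/
_SORTNAME = re.compile(r"^(.*?);\s(.*)\*(.*)")


def display_name(stored: str) -> str:
    """Turn a stored sort name into its display form.

    Names that do not follow the "key; rest *" pattern are returned unchanged.
    """
    return _SORTNAME.sub(r"\2\1\3", stored)


# Hypothetical sample data, kept in the stored (sortable) form.
artists = [
    "Pink Floyd",
    "Vaughan Williams; Ralph *",
    "Floyd; Eddy *",
    "Bach; Johann Sebastian *",
]

# Sorting the stored strings directly gives the intended order;
# the display name is derived only when it is needed.
for stored in sorted(artists, key=str.casefold):
    print(f"{stored!r:30} -> {display_name(stored)}")
```

The asterisk marks where the sort key is put back, so "Floyd; Eddy *" becomes "Eddy Floyd", while "Pink Floyd" has no match and passes through untouched.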
Since there is a deterministic way to transform sortname to display name it is not necessary to have both in the database. It is also upward compatible in the sense that it is easy to augment an existing implementation with only display names to deal with sort names -- without affecting current data and behaviour.
This is basically the situation with MSPro. It only has artist names that are display names. By using the above transformation algorithm (or something similar) a sorting order can be obtained without the need for a separate sortnames item. Also, as Mike already noticed, if you had multiple artists and sortartists, how would you pair them?
In my initial design I used commas to separate the sort part from the rest, which turned out to be non-optimal (think "Crosby, Stills, Nash & Young"). There are much better choices.
So my personal advice to Mike would be to not go down the road of heuristics but to adopt something similar to what is described above. Heuristics will let you down.
And yes, it is "Bach, Johann Sebastian" and "Vaughan Williams, Ralph".
Johan
johanvromans.nl — hetgeluidvanseptember.nl — mojore.nl -- howsagoin.nl
Samsung Galaxy Note S7FE (T733) 12.4", Android 13.0, AirTurn Duo & Digit (Gigs).
Samsung Galaxy Note S4 (T830) 10.5", Android 10.0 (maintenance and backup).
Samsung A3 (A320FL), Android 8.0.0 (emergency).