The seventh item on my vaguebooking list was “07. Seven new topics”. These are new “subject areas” that I want to write about on my blog.
Pop culture is likely one of them, although it might be narrower than that, maybe “pop culture intersecting with the news”. I didn’t comment on the Jian Ghomeshi or Bill Cosby news items when they hit, but I loved watching people post and take sides, often looking like internet trolls in comment forums, except they were posting the same comments on their own social media feeds. My take is a bit different and is primarily about the law, and the court of public opinion vs. the court of justice. I may yet blog about it.
Equally, I love the law. So much so that I couldn’t become a lawyer. I’d like to take a subject area and blog about that, but I haven’t yet found my niche. It may very well harken back to my days at law school when I was working for the Ministry of Education in B.C. and focus on the law, schools, education, and children. I haven’t quite decided yet. But there’s an itch there that I’d like to scratch again.
In the realm of writing, I have three areas that are of interest to me. First and foremost is the changing nature of the business model of publishing. I’m very much in the world where “everyone must choose their own path”, and I may turn my attention again to the world of disrupted publishing. Second, I think there is a lot of general information out there about marketing books in the modern age, but not a lot that gives a comprehensive list of “here’s everything you COULD do, choose wisely”. I started work on this at one time and would like to go back to it. Finally, I also think there is a ripe area for a different slant on books and publishing, and that’s measuring the performance of libraries. I did some research and even some preliminary writing about three years ago but never brought anything to fruition. I think libraries are going to come under increased fire in the digital age, and while they have a strong role to play, I don’t think many of them are telling the right story or using the right yardsticks. When they tell their story initially, they present themselves as a community centre; when their funding is threatened, the rhetoric shifts to burning books and the destruction of literacy if the library goes the way of the dodo. The balance is off, and maybe I can find something I can contribute to the conversation.
In a similar vein, I’m wondering if I have something to say about charities. I feel that much of the rhetoric out there is a bit one-sided or, at times, two-sided but diametrically opposed. I know, for example, that there is not much out there giving people insights into different types of charities. I also have some questions for myself that I want answered on local basic human needs programming and the most effective means of contributing donor dollars.
Finally, I do reviews for books, movies, TV and music, or at least my website says I do. I’ve been a slacker-doodle for my reviews, and I want to get back into them. I am not yet ready to commit to exactly what the other six categories will look like when I’m done, but I know this one pretty well, so I commit to getting back to it.
So I need help with a statistical question. It starts off relatively easy, and then I complicate it with two aspects that result in my having no idea how to handle it at all. Let’s start with the easy part. Let’s assume there are two ranked lists, and in the first instance I’ll just do five things in the list:
List One: A, B, C, D, E
List Two: D, E, C, A, B
What I want to know is how much the rankings in list one differ from list two. An easy way to do that (Solution A) is to compare the differences:
A (List One to List Two): three spots lower, i.e. -3
B: three spots lower, i.e. -3
C: same spot, i.e. 0 change
D: three spots higher, i.e. +3
E: three spots higher, i.e. +3
Net result is 0, as it should be: for every downward displacement from list one to list two, there is a corresponding upward displacement of another item, so the signed changes always net out to zero.
So, the proper statistical technique (Solution B) would be to use nominal (i.e. absolute) values, ignoring the +/-, ending up with 4 changes of 3 spots and 1 change of 0, for a total of 12 spots of difference over 5 items in the list, or an average difference of 2.4. So I could argue that the rankings in list one and list two differ by about 2.5 spots on average. I’m okay up to that point. Not completely sure what that number tells me, but it’s a number. I almost think I’m looking at two separate samples from a pool and calculating their degree of deviation from each other, but not quite: each list covers the full population (there are only five items in the example), not a “sample”, so I can’t use sampling methodology to see how different it is from some generic population.
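The arithmetic for Solutions A and B is easy to sanity-check in a few lines of Python. (One hedge on terminology: if I understand it right, the total of the absolute differences is what statisticians call the Spearman footrule distance between two rankings, but don’t quote me on that.)

```python
# Sanity-check Solutions A and B on the five-item example.
list_one = ["A", "B", "C", "D", "E"]
list_two = ["D", "E", "C", "A", "B"]

# Position of each item in each list, counting from 1.
rank1 = {item: i + 1 for i, item in enumerate(list_one)}
rank2 = {item: i + 1 for i, item in enumerate(list_two)}

signed = sum(rank1[x] - rank2[x] for x in list_one)      # Solution A
total = sum(abs(rank1[x] - rank2[x]) for x in list_one)  # Solution B
mean = total / len(list_one)

print(signed, total, mean)   # 0 12 2.4
```

The signed differences always net to zero because both lists contain the same five items, so only the absolute version carries any information.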
So we come to the two complications. The first (call it C1) is one of scale: my lists aren’t five items long, they are 100 items long. I don’t think that complicates things much; it’s a matter of scope more than anything.
The second complication (C2) is much more insidious: the first list is fully ordered, #1-100. The second list, however, is grouped into five unequally sized tiers. I’ll use an example smaller than 100, just 10 items, to make it plain, and I’ll reverse the order so it’s obvious the lists are different. I’ll also tuck in a third list that is, for all intents and purposes, identical to List One, just grouped differently:
List One (fully ordered): A, B, C, D, E, F, G, H, I, J
List Two (five uneven tiers): [I, J], [F, G, H], [D, E], [C], [A, B]
List Three (same order as List One, grouped into five uneven tiers): [A, B], [C, D, E], [F, G], [H], [I, J]
The obvious choice would be to convert List One or List Two to “match” each other. I could, for example, rank I vs. J in List Two to get a #1 and #2 slot, then F vs. G vs. H to get #3, 4, 5 (Solution C). However, that would require a lot of subjectivity on my part, which isn’t very workable: in my List Two example, I and J are basically “tied”, with no way to differentiate them further.
I could however decide that, like in a sports competition:
I & J share rank “1”;
F,G,H share rank “3”;
D,E share rank “6”;
C would have rank “8”; and,
A & B would have rank “10”.
Seems like a good solution (Solution D), right? It’s the way tournaments do it. The problem is that if I apply this technique to List Three, which is virtually identical to List One, just grouped into 5 tiers instead of 10 ranks, the numbers don’t show that (the tiers being 1: A,B; 2: C,D,E; 3: F,G; 4: H; 5: I,J). The shared ranks come out as A,B = 1; C,D,E = 3; F,G = 6; H = 8; I,J = 9. Comparing those against List One’s ranks gives differences of A=0, B=1, C=0, D=1, E=2, F=0, G=1, H=0, I=0, J=1, for a total of 6 over 10 items, or 0.6, even though the lists are basically identical.
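A quick Python sketch of Solution D, using the standard competition (“1-3-6”) ranking from the sports example above, reproduces the 0.6 figure for List One vs. List Three:

```python
# Solution D: "sports tournament" ranking for tiered lists.
list_one = list("ABCDEFGHIJ")
tiers_three = [["A", "B"], ["C", "D", "E"], ["F", "G"], ["H"], ["I", "J"]]

def competition_ranks(tiers):
    """Everyone in a tier shares the rank of the tier's first slot (1, 3, 6, ...)."""
    ranks, position = {}, 1
    for tier in tiers:
        for item in tier:
            ranks[item] = position
        position += len(tier)
    return ranks

rank1 = {item: i + 1 for i, item in enumerate(list_one)}
rank3 = competition_ranks(tiers_three)

total = sum(abs(rank1[x] - rank3[x]) for x in list_one)
print(total, total / len(list_one))   # 6 0.6
```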
A second alternative (Solution E) to converting List Two/Three to List One format is to use “average” rankings: from List Three, A and B wouldn’t both be in position “1”, they’d be split between 1 and 2, so I would give them both the average of 1.5; C, D, E would average out at #4 (i.e. spots 3, 4, and 5, averaging out to spot 4), etc. This works in the sense that the signed differences net out to zero, as they should, but I would still be left with a difference that measures not the rankings themselves but the methodology of ranking.
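Solution E is, I believe, the “midrank” method used for breaking ties in rank statistics: each tied item gets the average of the slots its tier covers. A sketch shows the problem in that even though List Three has the same underlying order as List One, the average absolute difference comes out at 0.5 per item, which is pure methodology, not ranking:

```python
# Solution E: midrank ("average") ranking for tiered lists.
list_one = list("ABCDEFGHIJ")
tiers_three = [["A", "B"], ["C", "D", "E"], ["F", "G"], ["H"], ["I", "J"]]

def midranks(tiers):
    """Everyone in a tier gets the average of the slots the tier covers."""
    ranks, position = {}, 1
    for tier in tiers:
        slots = range(position, position + len(tier))
        average = sum(slots) / len(tier)
        for item in tier:
            ranks[item] = average
        position += len(tier)
    return ranks

rank1 = {item: i + 1 for i, item in enumerate(list_one)}
rank3 = midranks(tiers_three)

signed = sum(rank1[x] - rank3[x] for x in list_one)
absolute = sum(abs(rank1[x] - rank3[x]) for x in list_one)
print(signed)                      # 0.0 -- the signed differences net out
print(absolute / len(list_one))   # 0.5 -- yet the lists are "identical"
```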
Soooo, I think I need to find a way to convert List One into List Two/Three format. Since List Three shows me whether or not my methodology “works”, I’m going to compare List One and List Three for the next part. One way to convert L1 to L3 format is to just divide L1 into equal chunks (Solution F):
[A, B], [C, D], [E, F], [G, H], [I, J]
This maintains the list format, divides it into equal chunks (so it doesn’t import any bias from List Three’s groupings), and preserves the ranking order. But if I then compare this “new” List One with List Three, I get A=0, B=0, C=0, D=0, E=1, F=0, G=1, H=0, I=0, J=0, for a net difference of 2 spots out of 10 items. It would show the lists as “slightly” different, but not radically so, and the difference would essentially reflect the difference in methodology in this “pure” example. Even if I bump it up to 100 items, those differences should stay relatively minor, but again, they’d primarily be measuring methodological differences.
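A sketch of Solution F, comparing equal chunk numbers against List Three’s tier numbers, reproduces the 2-spots-out-of-10 figure:

```python
# Solution F: chop the fully ordered list into equal chunks,
# then compare chunk numbers against List Three's tier numbers.
list_one = list("ABCDEFGHIJ")
tiers_three = [["A", "B"], ["C", "D", "E"], ["F", "G"], ["H"], ["I", "J"]]

chunk_size = len(list_one) // len(tiers_three)   # 10 // 5 = 2
chunk_of = {item: i // chunk_size + 1 for i, item in enumerate(list_one)}
tier_of = {item: t for t, tier in enumerate(tiers_three, start=1) for item in tier}

total = sum(abs(chunk_of[x] - tier_of[x]) for x in list_one)
print(total, total / len(list_one))   # 2 0.2
```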
Lastly, I have Solution G: convert List One into five levels, same as for List Three, but make them unequal sizes, matching the sizes of the groups in List Three. If I do this for List One, it will look identical to List Three, and comparing them would give me “net change = 0” and “nominal change = 0”. Which sounds good, but it means I am “weighting” the results of List One to match the secondary lists’ ranking approach. Perhaps the original weighting would have put 9 items in Level 1 and 1 item in Level 5; I wouldn’t know. Instead, I’m imposing the rankings/weightings of List Two/Three’s methodology onto the pre-established List One.
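And a sketch of Solution G, sizing List One’s chunks to copy List Three’s tier sizes (2, 3, 2, 1, 2), gives the difference of zero described above:

```python
# Solution G: regroup List One into unequal chunks that copy
# List Three's tier sizes, then compare tier numbers.
list_one = list("ABCDEFGHIJ")
tiers_three = [["A", "B"], ["C", "D", "E"], ["F", "G"], ["H"], ["I", "J"]]

tier_of_one, i = {}, 0
for t, tier in enumerate(tiers_three, start=1):
    for item in list_one[i:i + len(tier)]:
        tier_of_one[item] = t
    i += len(tier)

tier_of_three = {item: t for t, tier in enumerate(tiers_three, start=1)
                 for item in tier}

total = sum(abs(tier_of_one[x] - tier_of_three[x]) for x in list_one)
print(total)   # 0 -- but only because List Three's grouping was imposed
```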
Summary
Solution A (Net changes, matching lists) — doesn’t work: the signed changes always net out to zero, and my applied lists aren’t in matching formats anyway;
Solution B (Nominal changes, matching lists) — doesn’t work: my applied lists aren’t in matching formats;
Solution C (Re-rank List Two) — doesn’t work: there’s no way to differentiate items within List Two’s tiers;
Solution D (Sports tournament) — doesn’t work: on nearly identical lists it adds a methodological problem on top of the ranking comparison;
Solution E (Average rankings) — doesn’t work: it eliminates the second methodological problem but still measures the difference between ranking approaches;
Solution F (Equal chunks) — semi-works, but it still measures differences in methodology as well as ranking; and,
Solution G (Weighted chunks) — semi-works, as it reflects a nominal change of 0 for matching lists, but imports the bias of the second ranking approach.
The only other thought I had was to combine the results of Solutions D, E, F, and G and take an average of the four approaches. Not sure if that helps or if I’m just compounding my methodological and ranking problems.
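For what it’s worth, here is what averaging the four approaches would look like on the List One vs. List Three example. All four methods are re-derived in one place, so treat this as a sketch of the idea rather than an endorsement of it:

```python
# Average Solutions D, E, F, G on the List One vs. List Three example.
list_one = list("ABCDEFGHIJ")
tiers = [["A", "B"], ["C", "D", "E"], ["F", "G"], ["H"], ["I", "J"]]
true_rank = {item: i + 1 for i, item in enumerate(list_one)}

def per_item(ranks):
    """Mean absolute difference between a rank map and List One's ranks."""
    return sum(abs(true_rank[x] - ranks[x]) for x in list_one) / len(list_one)

# Solution D: shared "sports" ranks (1, 3, 6, ...).
d_ranks, pos = {}, 1
for tier in tiers:
    d_ranks.update({x: pos for x in tier})
    pos += len(tier)

# Solution E: midranks (average of the slots each tier covers).
e_ranks, pos = {}, 1
for tier in tiers:
    avg = sum(range(pos, pos + len(tier))) / len(tier)
    e_ranks.update({x: avg for x in tier})
    pos += len(tier)

# Tier numbers for List Three (used by Solutions F and G).
tier_no = {x: t for t, tier in enumerate(tiers, start=1) for x in tier}

# Solution F: equal chunks of List One vs. tier numbers.
f_one = {x: i // 2 + 1 for i, x in enumerate(list_one)}
f_score = sum(abs(f_one[x] - tier_no[x]) for x in list_one) / len(list_one)

# Solution G: chunks of List One sized to match the tiers.
g_one, i = {}, 0
for t, tier in enumerate(tiers, start=1):
    g_one.update({x: t for x in list_one[i:i + len(tier)]})
    i += len(tier)
g_score = sum(abs(g_one[x] - tier_no[x]) for x in list_one) / len(list_one)

scores = [per_item(d_ranks), per_item(e_ranks), f_score, g_score]
print(scores)                           # [0.6, 0.5, 0.2, 0.0]
print(round(sum(scores) / 4, 3))        # 0.325
```

Whether 0.325 means anything is exactly the open question: it blends four measurements that each mix methodology differences in with ranking differences.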
Would love some thoughts if anyone has any to share. FYI, this is for personal use, not a work issue, so it doesn’t have to be entirely statistically pure, but I would like a little more comfort with an approach than I currently have with Solution G.