ELO ratings all the way! Though not always practical (e.g. can't use with restaurants, as you can only compare places you've been), they are criminally underused.
Wrote about it here (though the site I built is down ATM - went more viral than expected...) https://logos.substack.com/p/cellar-door
Don't ELO ratings depend structurally upon the distribution of the things being compared? For example, if you have Chess players A, B, and C, with ratings of 1000, 1200, and 1400, then the ELO rating is telling you not just about their relative ranking, but also about the chance of A beating B, B beating C, and A beating C.
For those ratings to be truly accurate and stable, that means the underlying mathematical structure of the rating has to match the structure of how players at different levels of skill compare against each other. Otherwise, you may have a system that accurately predicts how often A will beat B while 200 points lower rating, and accurately predicts how often B will beat C while 200 points lower rating, but fails to accurately predict how often A will beat C while 400 points lower rating.
The consequence of the ELO model not matching the structure of the thing being rated is that ratings become unstable. The correct rating for player A playing against player B (1200 rating) won't be the same as the correct rating for player A playing against player C (1400 rating). In that situation, in order to correctly interpret what a rating means, you would have to know what the ratings were that the player was being rated against (and who those opponents had been rated against, etc., ad nauseam).
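To make the structural assumption concrete, here is a minimal sketch of the standard ELO expected-score curve applied to the A/B/C example. The function names are mine, the 400-point scale is the usual chess convention, and draws are ignored, so treat it as an illustration rather than any site's actual implementation.

```python
def expected_score(rating, opponent_rating, scale=400.0):
    """Standard Elo expected score: a base-10 logistic curve in the rating gap."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / scale))


def odds(p):
    """Convert a win probability into odds in favour."""
    return p / (1.0 - p)


# Ratings from the example above.
ratings = {"A": 1000, "B": 1200, "C": 1400}

p_ab = expected_score(ratings["A"], ratings["B"])  # A against B, 200 points down
p_bc = expected_score(ratings["B"], ratings["C"])  # B against C, 200 points down
p_ac = expected_score(ratings["A"], ratings["C"])  # A against C, 400 points down

print(f"A beats B: {p_ab:.3f}")
print(f"B beats C: {p_bc:.3f}")
print(f"A beats C: {p_ac:.3f}")

# The built-in assumption: odds multiply along the chain A -> B -> C.
print(f"odds(A,B) * odds(B,C) = {odds(p_ab) * odds(p_bc):.4f}")
print(f"odds(A,C)             = {odds(p_ac):.4f}")
```

The last two printed values agree by construction. That is the part the thing being rated has to match for the ratings to mean the same thing against every opponent.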
Yes, ELO assumes the transitive property, but that's true and implicit in star ratings too, no?
The transitive property isn't the issue here; as you say, it is assumed in ELO, and built into the structure of fixed star ratings.
My concern is that ELO assumes a particular probability structure; for the standard ELO, a 200 point difference represents a 76% chance of victory, while a 400 point difference represents a 90% chance of victory. This means that if player B has a 76% chance of beating you, and player C has a 76% chance of beating them, then player C has a 90% chance of beating you. When that probability structure is correct, then it doesn't matter if you play player B or player C; either way, your expected rating will converge to the same equilibrium value.
However, if that structure is not correct (and in general, why would it be?), then your rating will have a different equilibrium value depending on who you are competing against.
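A rough sketch of that equilibrium point, assuming the usual update rule (rating moves by K times actual score minus expected score), so the rating stops drifting on average once the expected score equals your true win rate. The 15% figure against the 1400 player is a made-up number chosen only to break the ELO curve; nothing in the thread supplies it.

```python
import math


def expected_score(rating, opponent_rating, scale=400.0):
    """Standard Elo expected score for `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / scale))


def equilibrium_rating(opponent_rating, true_win_prob, scale=400.0):
    """Rating at which Elo's expected score equals the true win probability.

    At this rating the average update is zero, so the rating settles here
    if you only ever play this one opponent.
    """
    return opponent_rating + scale * math.log10(true_win_prob / (1.0 - true_win_prob))


# Case 1: reality follows the Elo curve (you are "really" a 1000 player).
p_vs_b = expected_score(1000, 1200)      # about 0.24
p_vs_c_elo = expected_score(1000, 1400)  # about 0.09

# Case 2 (hypothetical): you actually beat the 1400 player 15% of the time.
p_vs_c_hypothetical = 0.15

eq_b = equilibrium_rating(1200, p_vs_b)
eq_c_match = equilibrium_rating(1400, p_vs_c_elo)
eq_c_mismatch = equilibrium_rating(1400, p_vs_c_hypothetical)

print(f"playing only B:                  {eq_b:.1f}")
print(f"playing only C (Elo holds):      {eq_c_match:.1f}")
print(f"playing only C (15% win rate):   {eq_c_mismatch:.1f}")
```

When the curve holds, both opponents pull you to the same number. With the hypothetical 15% win rate, playing only C settles roughly 100 points higher than playing only B, which is the opponent-dependence described above.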
re: 5 stars, while the AirBnB magnet is held as a bad example, it does one thing right: it makes it explicit what the ratings mean to both the reviewer and reviewee. Obviously it also needs to be explicit and understood by all the other users of AirBnB, but it's a necessary first step.
Instead of having stars - which work well for art, as you say with movie reviews - you just need the rating system to have those words. No stars, no numbers, no smiley faces - just words:
- excellent beyond expectations
- good, met reasonable expectations
- acceptable, but there were some non-critical issues
- bad, with critical issues that make this a thing you should not buy
- RED ALERT, injury or death may ensue, criminal venture, please shut these people down and/or arrest them
Whatever words, so long as they are always the SAME words in the same industry/context/website, and there's at least 1 not-the-highest level/rating generally understood to be "they made me review this, it was fine, whatever" and represents "the employees did their job and you, management, who is reading this, have no reason to complain."
Edit: another reference point would be those pain scales on the doctor's office walls. Yes, they have the color shaded smiley faces and too many numbers, but they outline that some levels are "fine, don't worry" and others are "JESUS CHRIST I AM DYING"
I was laughing over the exact same sticker (from the post) at an Airbnb this weekend. It does make sense to have {defined scale} | {broken averages}, but it’s still frustrating given that I would rather not have broken averages in the first place. Give me the Japanese system! We’re wasting so much room with a scale this way.
Re eating out, a non-regulatory problem is that restaurants are incentivized to prioritize taste over health to a degree that people never would in their home kitchen. And this is hard to avoid because of equilibrium considerations, whereas few people can bring themselves to add the levels of fat, sugar, and salt to their own food that the restaurant would.
Amazing accidental typo: glamorize vs. glomarize
"So won’t this mean you’ll simply have to add a second earlier date for ‘peak quality,’ and some people will then throw out anything past that date too? Also, isn’t ‘peak quality’ almost always ‘the day or even minute we made this? Who is going to buy things that are past ‘peak quality’ but not expired? Are stores going to have to start discounting such items?"
In the UK we already have a sell-by date and a use-by date. I understand the sell-by date to mean "This is when it's no longer as fresh as we like to advertise our products as being, but still fine." Yes, many supermarkets do discount items that are past their sell-by date but haven't hit their use-by date; no, people don't throw out things that reach that phase while sitting in their fridge. I think it's a good system!
Agreed, the new California law actually makes a ton of sense (assuming it's written well enough to not backfire somehow). Right now, food items here are inconsistently labelled with either of the 2 dates:
1. "sell by" / "best by" / "best if used by" (I think these all mean the same thing, the "peak quality" date)
2. "use by" (what people would normally call expiration date, but it's not actually a legal requirement for this to mean anything related to food safety so depending on the manufacturer, it may really mean the same thing as 1)
2 seems like the more useful one but 1 seems to be more common. I'm guessing this is because 1 is a much weaker statement so it's less liability for the manufacturer if the date is inaccurate?
Many people don't realize there's a difference and tend to throw away perfectly safe food after the "best by" date, hence all the preventable food waste.
I'm not sure if the situation is any less confusing in other states, but a quick Google search tells me it's probably the same in New York. Check your fridge and see how many items have actual expiration dates vs "best by" dates!
And yes, grocery stores here routinely discount items that are close to whatever date is printed on the package. I've never seen a store sell an item after its "best by" date (discounted or not) even though it would be safe and legal, so there's probably some more preventable food waste there.
"It’s still a statement about regulatory costs and requirements, essentially, that it is often also cheaper. In a sane world, cooking at home would be a luxury."
I would tell my high school econ students that "Do it yourself" is market failure. If you build your own deck for your house, the market is distorted in some way (and don't tell me that you love building decks, because you don't do it often). The biggest distortion is that if you build your own deck, the labor is not taxed. Same with cooking your own food.
This takes it a little to the absurd. If doing basic things for yourself, or any kind of independence, is a "market failure", and especially if the failure is me being taxed for not doing... I guess, the one job that is my comparative advantage and nothing else, then the problem is in the model and not me.
Why? I think the issue here is you're assigning moral weight to 'market failure'.
I pick my own nose: market failure
You should get a Nose Frieda or have a professional caregiver do it for you, but these professionals shouldn't require any certifications.
It seems like taxation is much smaller of an issue than principal/agent problems, information asymmetries, etc.
It's often a market failure, but sometimes just represents information transfer costs. If you like something that isn't the precise standard thing, it can be hard to convey that to others. A friend just had a contractor come in and take down an 18-foot-wide swath of trees leading up to a cabin, to make it easier to use some machinery on another part of the project, but the friend only asked for 10 feet of trees to be cleared. The missing trees are going to be years' worth of reminder of the risks of outsourcing.
It’s funny to watch people come up with sophisticated explanations. Most people don’t know how to cook the sort of food they find in restaurants or how to cut trees or how to build a deck. Or they don’t have the time or don’t want to. That’s not a market failure. Haven’t you all read Adam Smith?
Second, if it is a market failure, which maybe it is, then so much the worse for free market capitalism, hail DIY anarchism! Take the argument where it leads: markets are often not a good solution to people’s problems. Maybe that’s a more general feature than you’re willing to accept.
Third, getting back to cooking: I don’t know, but things like rent, furniture, staff, energy, quality ingredients, and so on, are costs diners expect to see reflected in the price of restaurant food. You don’t count them as part of cooking at home. This whole puzzle seems completely fake and smug to me.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/monthly-roundup-23-october-2024
The chicken registration thing in the UK seems much less onerous / arbitrary than the linked tweet implied. See for example:
https://tinyurl.com/5hdytaak
My understanding is that it is literally true that all chickens must be registered and not doing so is a criminal offense. From your link: "[f]rom 1 October, a new legal requirement for all bird keepers in England and Wales to register... regardless of the size of their flock." Another page on the same website is more explicit: "[t]he threshold for mandatory registration will be reduced from 50 birds to 1, which means all poultry and captive bird keepers will be legally required to register their birds." https://www.nfuonline.com/updates-and-information/defra-announces-changes-to-the-gb-poultry-register/
Reddit confirms it is a criminal offense, but while jail time is theoretically possible, violators are likely only to receive a fine: https://old.reddit.com/r/LegalAdviceUK/comments/1cdqz4d/unregistered_chickens_what_can_happen/l1etps4/
If by "much less onerous / arbitrary" you mean that filling out a single form is not much of a burden, that not filling it out would only result in a small fine, that there is even "no penalty for registering after 1 October" on account of technical issues with the government portal ("a high volume of applications..." who could've seen that coming?) and that the requirement is only an extension of preexisting laws and not a radical power-grab, then those are true statements, but it's an issue of values, not facts.
What confused me at first was precisely the claim that "all chickens must be registered." Taken literally, this is false: the law requires all *poultry keepers* to be registered, not all chickens.
This makes a substantive difference in my view.
A law that treated chickens like dogs in many US states and cities -- each one is registered individually with the government -- would have been absolutely bonkers. (One of my eggs hatched and now I have to spend 20 minutes on a government portal registering it as a chicken?!) This was my initial interpretation based on a literal reading of the statement, perhaps colored by my lived experience that the UK really does pass laws this insane from time to time.
But a law that requires poultry keepers to register with a stated goal of helping to manage infectious disease outbreaks sounds like it could potentially pass the cost / benefit test. I'm not saying it does, but it's also not a "point and sneer" example of government run amok.
You're right. It's good that the law isn't as insane as I thought it was.
For the record, I must point out that I was only pointing and sneering at the fact that the portal went down.
That's definitely worth pointing and sneering at!
I've accepted that 5 stars means that expectations were met, but I refuse to participate. My solution is to only rate when service exceeds expectations.
Unfortunately, it is worse than described. I ordered something from Etsy and had truly horrible service and left an (appropriate, in my view) 2-star review. It was deleted and the item/store still has a perfect 5-star average.
Restaurant food should be compared to cooking at home daily from scratch with beautiful plating and someone else cleaning up for you, following hygiene rules as prescribed by inspectors. (Note: there may well still be privately funded voluntary health inspections for restaurants so you don't lose this cost entirely in a free market.) And count the kitchen space separately, as if it's not included in the living space.
When I cook at home I vary the quality a lot based on the time and convenience, eat day-old leftovers, don't bother with plating and can dispense with hair nets and rubber gloves. Also nobody cleans up for me. Maybe you can factor that into my time cost?
> I don’t doubt it helps but also companies can simply not require such agreements at this point?
This is a commons / prisoner's dilemma problem. Every company in SV benefits from hiring alumni from other companies with collective knowledge.
But in either equilibrium, your company would be better off if it could enforce a non-compete.
That is far from obvious to me - I should be willing to take an otherwise substantially worse deal in order to avoid a noncompete.
Ah, that's a good point. I've only been close to recruiting in SV new grads, where I wouldn't expect new hires to consider that (and NCs weren't enforceable).
But a wise new hire should factor that in. Perhaps optimism bias puts leaving for a competitor beyond the planning horizon.
"facial ticks" -- two gentle corrections. First, the phrase is "facial tic". Second, it doesn't mean what you think it means. A tic is a repeated muscle spasm. The research focused on facial features which are things like relative size, symmetry, stuff like that.
I'd like to think it was a tragedy of the J-schools, not the commons, because you'd have an obvious point where you could fix it. But thinking back on my own interactions with journalists back in the late 70's and 80's, before the J-schools had come to dominate journalism, I do recall it being the case that news reports of anything you had personal knowledge of were typically wrong in serious ways.
Sometimes it was just casual inaccuracy. Other times it was wrong in ways that were really hard to believe weren't deliberate, frankly. Even then, upholding the narrative was often more important than accuracy. Like reporting a mixed-race political rally I'd been at as lily-white, and carefully not photographing anybody but whites. That one shocked me pretty deeply, back when I was less cynical than today.
The biggest difference is that, with the collapse of the newspaper industry, you typically don't have battling outlets trying to uphold contradictory narratives, which could be checked against each other. That really kept the worst impulses of the journalistic community in check.
Michael Crichton coined the term Gell-Mann Amnesia in 2002 or so, so it’s been at least 20 years. I don’t doubt at all that it goes a lot further back. Orwell wrote 1984 as a reaction to British newspaper reporting of the Spanish Civil War, noticing how every outlet was lying to further their narrative.
Re: Japan, an American man would describe his wife as 'amazing' (5*), anything else would require explanation; a Japanese man would describe his wife as 'normal' (3*), anything else would require explanation. Dodger S. Ohtani earlier this year introduced his new wife as a "normal Japanese woman."
I would like an official tier system for commander decks. Having a rule zero conversation works fine when you're playing with your friends. But for commander night at my local game store, there are always people who either don't really know what's in their deck, are too lazy to discuss it much, or are sneakily sandbagging. So we get a lot of mismatched games.
For NYC:
Yelp has the most accurate ratings, but the app is buggy, way worse than it was a decade ago. Example: every time you move the map or start your search over, it removes your filter options. It also automatically zooms to sponsored listings for a few seconds after a search. So distracting.
Google Maps is not buggy, but the reviews suck: Americans just spam five stars whenever a place is above mediocre, so you get a ton of places that are 4.5 that should be 3.5.
So what I do is use Google Maps for discovery, then cross-compare with Yelp. It's unwieldy when you're on the go, but necessary.