## Metrics Mania and the Countless Counting Problem

mobkarma writes "Einstein once said, 'Not everything that can be counted counts, and not everything that counts can be counted.' A New York Times article suggests that unless we know how things are counted, we don't know if it's wise to count on the numbers. The problem isn't with statistical tests themselves, but with what we do before and after we run them. If a person starts drinking day in and day out after a cancer diagnosis and dies from acute cirrhosis, did he kill himself? The answers to such questions significantly affect the count."

• #### Technically (Score:4, Funny)

on Thursday May 20, 2010 @02:26PM (#32282868) Journal

Cancer itself could be considered a form of killing yourself.

• #### Re: (Score:2)

Cancer itself could be considered a form of killing yourself.

I believe your literal interpretation is misplaced. The term "killing yourself" strongly implies an explicit and voluntary act that results in your death. Merely having your body mutate without doing something to cause it (like jumping into a toxic waste dump) isn't a form of killing yourself.

• #### Re: (Score:2)

Well, killing another human being doesn't have to be intentional, does it? Quite a few accidents happen.

It's not as if anything other than your own genes determined your cancerous state, unless, as you say, you were put in a situation where you were exposed to dangerous radiation levels.

But that usually isn't the case. Either way, it's not intentional. I was just eluding to the whole "Having your own cells mutate and attack you" is still pretty much you, killing yourself, as unintentional as it may b

• #### Re: (Score:3, Insightful)

I was just eluding to the whole "Having your own cells mutate and attack you"

"Eluding", definition:
1. Evading or escaping from, as by daring, cleverness, or skill
2. Escaping the understanding or grasp of

"Alluding", definition:
Making an indirect reference

Yes, I'm a spelling nazi today....

• #### Re: (Score:1, Interesting)

by Anonymous Coward

Especially since the FP was relying on being "technically" correct (which, in all fairness, is the BEST kind of correctness)...

• #### Re: (Score:2)

Go smoke your grass somewhere else, ya damn grammar hippie freak!

• #### Re: (Score:2)

It's an allusion!

("Final countdown" starts playing in the background)

• #### Re: (Score:2)

Yes, I'm a spelling nazi today....

I think the distinction between spelling and meaning/usage has alluded you ;-)

• #### Re: (Score:1)

Knowing that excessive alcohol and tobacco use greatly increases your risk of heart problems and cancer, and doing it anyway I believe IS a form of slow suicide.
• #### Re: (Score:2)

I do not smoke. I do not drink.

Knowing that excessive alcohol and tobacco use greatly increases your risk of heart problems and cancer, and doing it anyway I believe IS a form of slow suicide.

No, it is not. It is, though, a risk. It is a choice to take the risk.

• #### Re: (Score:1)

OK. I just know people who are trying to destroy themselves with alcohol. It is their own choice.
• #### I thought the joke went the other way? (Score:2, Funny)

by Anonymous Coward

Q: What did the clinically depressed alcoholic man with acute cirrhosis get for Christmas?
A: Cancer

• #### Deep down (Score:2)

We're all just afraid of uncertainty. It is the shadow from which anything potentially could arise. Our brains are just hardwired to be much more fearful than hopeful (for obvious evolutionary reasons).
• #### Re: (Score:2)

We're all just afraid of uncertainty. It is the shadow from which anything potentially could arise. Our brains are just hardwired to be much more fearful than hopeful (for obvious evolutionary reasons).

It really depends on the context. For some things we're overly fearful and for some we're overly hopeful. One of the most common errors in reasoning is wishful thinking. Some forms of wishful thinking are very blatant, with people explicitly believing in something because they'd rather have it be true than not.

• #### cirrhosis (Score:3, Insightful)

on Thursday May 20, 2010 @02:41PM (#32283068)
You don't die of cirrhosis by drinking heavily for a short time. You may die of alcohol poisoning.
• #### Re:cirrhosis (Score:5, Funny)

on Thursday May 20, 2010 @02:45PM (#32283128) Homepage
I believe you just counted something that didn't need to be counted.
• #### Insightful (Score:1, Redundant)

This comment is insightful, not funny.
• #### Re: (Score:2)

Cancer can last years; you can easily destroy your liver before then if you spend all your time drinking.

Poisoning requires a very large amount of alcohol in your blood at one time, well above the level that liver damage starts. If you maintain a constant blood-alcohol level above what your liver can handle you will be actively destroying your liver.

It's certainly possible for a depressed non-drinker to turn to drinking as a form of self-medication and destroy their liver before the cancer does. Poison le

• #### Goodhart's law (Score:3, Insightful)

on Thursday May 20, 2010 @02:44PM (#32283122)

Sounds like a restatement of the simultaneously-discovered Goodhart's Law [lesswrong.com], Lucas critique [wikipedia.org], and Campbell's Law [wikipedia.org].

Basically, once you start measuring something as a proxy for what you really want to know, people start to take the proxy into account when making decisions, to the point where it becomes useless as a measure for whatever it was intended.

Here, people take these cancer tests as a measure of their probability of cancer. But once they start to treat them as reliable, they start doing more self-destructive things, destroying the correlation between the proxy (the cancer test) and the actual probability of cancer.
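A toy simulation makes the mechanism concrete. This is a minimal sketch under made-up assumptions, not a model of any real cancer test: each unit has a hidden `quality`, the proxy starts out as quality plus honest measurement noise, and once the proxy becomes a target each unit adds its own quality-independent "gaming" (drawn from an exponential distribution, an arbitrary choice). `simulate`, `pearson`, `noise` and `mean_gaming` are hypothetical names.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n=5000, noise=0.5, mean_gaming=3.0, seed=1):
    """Hypothetical Goodhart toy: correlation of proxy with quality, before and after gaming."""
    rng = random.Random(seed)
    quality = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Before the proxy is a target: it tracks quality plus honest noise.
    proxy_before = [q + rng.gauss(0.0, noise) for q in quality]
    # After it becomes a target: each unit inflates the proxy by its own amount,
    # which raises the number without changing the underlying quality.
    proxy_after = [p + rng.expovariate(1.0 / mean_gaming) for p in proxy_before]
    return pearson(quality, proxy_before), pearson(quality, proxy_after)


before, after = simulate()
print(f"correlation before gaming: {before:.2f}, after: {after:.2f}")
```

With these made-up parameters the correlation should fall from roughly 0.9 to roughly 0.3. Nothing about `quality` changed; the proxy simply stopped tracking it.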

• #### Lies, damned lies, and statistics (Score:5, Insightful)

on Thursday May 20, 2010 @03:10PM (#32283464)

Many years ago, I had an in-depth discussion about gathering statistics on heart disease with a woman on the board of the American Heart Association. This was a big deal. Serious ethical issues were in play and there was a great deal of infighting going on.

I asked her how you make a definitive decision that someone has heart disease. I was trying to figure out what to measure. Her answer surprised me. She said "You wait till they die. Then you cut out their heart and have a look." She then went on to patiently explain to me that the only thing that could be measured and evaluated were "markers" of heart disease. Those markers, as revealed by various diagnostic tests, could be mighty reliable. But you never know if someone is going to die of heart disease until they...you know...actually *die*.

Thus informed, I came to realize that what we measure is almost never what we really want to know. Measuring the right stuff is simply too hard to do. No matter where you look, this is almost universally true. In my job, for example, we fix computer problems. Thus, we measure how many incidents get closed and how much time it took. If you quickly close an incident, then surely you've provided good service, right? Most slashdotters should realize that's not true. In fact, my job is actually to get other, more important workers back to work asap. The only way to measure that would be to interview my customers and their bosses. We'd have to pry for an hour into their effectiveness to find out if I properly completed a job that took me five minutes. That's too much trouble, so we look for markers. Closed incidents. Timeliness of closures.
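A small sketch of the gap between the marker and the goal. The teams, the tickets and the `user_minutes_lost` field are all hypothetical; that field is exactly the number the parent says is too expensive to collect, so here it is simply assumed to be known.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    minutes_to_close: int   # what the ticketing system records (the marker)
    user_minutes_lost: int  # what we actually care about (assumed known here)


# Made-up tickets for two hypothetical teams.
fast_closers = [Ticket(5, 120), Ticket(5, 90), Ticket(5, 150)]
thorough = [Ticket(30, 35), Ticket(40, 45), Ticket(25, 30)]


def mean_time_to_close(tickets):
    return sum(t.minutes_to_close for t in tickets) / len(tickets)


def total_user_minutes_lost(tickets):
    return sum(t.user_minutes_lost for t in tickets)


for name, team in (("fast closers", fast_closers), ("thorough", thorough)):
    print(f"{name}: mean close {mean_time_to_close(team):.1f} min, "
          f"users lost {total_user_minutes_lost(team)} min")
```

On the marker the fast closers win easily; on the thing the job is actually for, they lose.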

Measures are inadequate so often that I pretty much don't trust anything that contains them. After years of training in Quality Improvement Processes, I came to realize that the amount of time needed to understand a process and perfectly spec out what needs to be measured is 452% of the expected life cycle of the project, plus or minus a 17.5% margin of error. (Aside - How much do you trust those statistics?)

Almost no one can devote the time required to do the job (no matter what "the job" is) right. We just hope people do their best and trust to good intentions.

As a computer guy who wants things to be either "yes" or "no", unambiguously, I found this state of affairs very difficult to accept. But it's just part of being human.

• #### Re: (Score:1)

Another example is processor clock frequency. People took the frequency as an indication of processor speed, and Intel reacted by making the Pentium 4 do less per clock cycle, so they could increase the number for the same actual speed.
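As a first-order rule of thumb, instruction throughput is roughly instructions per cycle (IPC) times clock rate, so a higher clock with a lower IPC can land in exactly the same place. The two chips below are hypothetical round numbers, not measurements of any real Pentium.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    clock_mhz: float  # the headline number buyers compared
    ipc: float        # average instructions per cycle (workload-dependent)

    def mips(self) -> float:
        # Millions of instructions per second = IPC x clock in MHz.
        return self.ipc * self.clock_mhz


wide = Cpu("wider core, slower clock", clock_mhz=1000, ipc=1.5)
deep = Cpu("deeper pipeline, faster clock", clock_mhz=1500, ipc=1.0)
for cpu in (wide, deep):
    print(f"{cpu.name}: {cpu.clock_mhz:.0f} MHz, {cpu.mips():.0f} MIPS")
```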

I'd also guess that measuring programmer productivity in lines of code actually discourages reusing code (after all, if you write basically the same functionality again, you get more lines of code than if you just reuse existing code).
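A naive lines-of-code counter shows the incentive. `count_loc` and both snippets are hypothetical illustrations: the same two averages, written once by copy-paste and once by reusing the standard library's `statistics.mean`.

```python
def count_loc(source: str) -> int:
    """Naive LOC metric: count non-blank lines that aren't '#' comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


DUPLICATED = '''
def average_scores(scores):
    total = 0
    for s in scores:
        total += s
    return total / len(scores)

def average_prices(prices):
    total = 0
    for p in prices:
        total += p
    return total / len(prices)
'''

REUSED = '''
from statistics import mean

def average_scores(scores):
    return mean(scores)

def average_prices(prices):
    return mean(prices)
'''

print("copy-paste version:", count_loc(DUPLICATED), "LOC")
print("reuse version:     ", count_loc(REUSED), "LOC")
```

Same behavior, twice the "productivity" for the copy-paste version.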

• #### Re: (Score:2)

Indeed! Identifying what proxies (http://en.wikipedia.org/wiki/Proxy_%28statistics%29) to use is one of the trickier aspects in the soft sciences and statistics. If you read the Economist, you'd see proxies for just about everything (e.g. http://www.economist.com/markets/bigmac/ [economist.com]), and a lot of research is required just to show what a given proxy measures.
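For the Big Mac example, the usual arithmetic (as I understand the Economist's published method) is: implied PPP exchange rate = local burger price / US burger price, and over/undervaluation = implied rate / market rate - 1. The prices and rate below are hypothetical, and `big_mac_valuation` is an illustrative helper.

```python
def big_mac_valuation(local_price: float, us_price: float, market_rate: float):
    """Return (implied PPP rate, over/undervaluation as a fraction).

    market_rate is units of local currency per US dollar.
    """
    implied_ppp = local_price / us_price
    valuation = implied_ppp / market_rate - 1.0
    return implied_ppp, valuation


# Hypothetical: 12.00 local units vs $4.00 in the US, market rate 4.0 per dollar.
implied, val = big_mac_valuation(12.0, 4.0, 4.0)
print(f"implied PPP rate {implied:.2f}, currency {val:+.0%} vs the dollar")
```

A negative result reads as "undervalued"; whether a burger is a good proxy for purchasing power is, of course, the whole question.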

• #### Re: (Score:3, Interesting)

by Anonymous Coward

In fact, my job is actually to get other, more important workers back to work asap.

A refreshing point of view. A surprising number of IT weenies seem to think that what they do is the most important thing in the entire company, and that the rest of the organization needs to bow down to their whims.

I remember having to explain to an IT worker that if they weren't going to change the schedule of the forced anti-virus full scan from 10:30am, I was going to delete the software since it was keeping me from doing m

• #### Re: (Score:2)

Where I work, the weekly check-it-all AV run is scheduled for Sunday nights. That takes care of all the desktops in the office. Laptops run it as soon as they're put back on the network on Monday. Generally, people don't mind, especially since our AV software runs in the background and doesn't slow anybody down enough to be worth complaining about. Their machines are a bit sluggish on Monday morning but, then again, so are most of the workers.

The folks who find that the AV scan slows them unacceptably

• #### Re: (Score:2)

He didn't seem to understand that... having almost every machine be unusable for three hours in the middle of the work day cost a lot of productive time.

Oh... You had Symantec's AV, huh?

• #### Re: (Score:2)

As a computer guy who wants things to be either "yes" or "no", unambiguously, I found this state of affairs very difficult to accept. But it's just part of being human.

I wish my supervisor would accept this. But then, his supervisor would need to accept it, and on and on to the top. I feel like I spend more time at my job trying to quantify my work than actually doing it. And the resulting numbers are always meaningless.

• #### Re: (Score:2)

Interesting. Before we had mandatory ticketing software, deskside IT support folks in my organization had assigned user populations. I had 350 (or so) officers to keep happy. That was my job. Screw tickets, screw counting anything. If the officers that depended on me were happy, I was happy. And so was my boss.

Then we started measuring things and the quality of my worklife took a big hit. I'll never forget a crusty old sysadmin who spoke out during a training session to an HQ analyst. Quote: "I can

• #### Re: (Score:2)

A lot of it has to do with the way we do business today. Everyone knows who the bad employees are, but you aren't really allowed to say anything or call them out. Even if it were socially acceptable, it is practically illegal to fire someone for incompetence, unless you can prove they are incompetent. That's where the metrics come in. Once you need to fire someone, you have numbers to back it up. It even eliminates the awkwardness of having to confront someone about their poor work ethic or the low qu
• #### Re: (Score:2)

What you call "the way we do business today", I prefer to call "crappy management". A bit shorter and more to the point.

• #### Re: (Score:2)

<sarcasm>But a manager can't fire someone just because they think they are a poor employee! Surely someone's livelihood should not be undone because of the opinion of their manager, right?</sarcasm>
• #### Re: (Score:2)

Not sure exactly if it's related, but this reminded me of an experiment where a downtown area had all kinds of traffic lights and such removed. The result was counterintuitive: the actual rate of accidents went down.

But then, at any publicly traded corporation, the job of management is not to sell products or services but to make the corporation look good on the trading floor.

• #### Re: (Score:2, Insightful)

When you quantify your work, don't forget to quantify the work you invest in quantifying your work.

• #### Re: (Score:2)

The most annoying thing about the bean-counting mentality is the creation of beans to count. For example, in the NHS it is really difficult to do stats on patient care as each case is different so the beancounting management impose additional paperwork or data entry tasks to create countable beans. Hooray! Suddenly more 'data' for layers of management to fight amongst themselves with, at only the cost of reduced patient care due to reduced medical (as opposed to clerical) time available to the medical staff

• #### Re: (Score:2)

Heck, excessive bean counting may well have been what drove the Soviet Union to collapse.

• #### Re: (Score:2)

The only way to measure that would be to interview my customers and their bosses. We'd have to pry for an hour into their effectiveness to find out if I properly completed a job that took me five minutes.

It's not really as hard as that. There is an interim marker you can use that is nearly as good as anything you'll get from an interview. All you really need to know is if the customers are happy with your service. If your service eats up a lot of their time, i.e. doesn't keep them working as long as possible, they won't be happy.

There are still limits to that - it doesn't work if your IT department is servicing 3,000 users. However, if it's 300 users with a single point of contact, you'll know pretty qu

• #### Re: (Score:2)

...it doesn't work if your IT department is servicing 3,000 users. However, if it's 300 users with a single point of contact, you'll know pretty quickly how well you are performing...

My IT department services about 100,000 users.

I still think that just asking my customers how well I helped them with their problems is the best way to gauge my performance. I argued along those lines to management for a while but to no avail.

Now, we're in the process of removing nearly all "deskside" support and forcing em

• #### Re: (Score:2)

Sounds like a restatement of the simultaneously-discovered Goodhart's Law [lesswrong.com], Lucas critique [wikipedia.org], and Campbell's Law [wikipedia.org].

Basically, once you start measuring something as a proxy for what you really want to know, people start to take the proxy into account when making decisions, to the point where it becomes useless as a measure for whatever it was intended.

A few years back I was working for a major corporation that was pushing Six Sigma [wikipedia.org] as the holy grail for all problems, and I was forced to attend some seminars. (Afterwards I christened the program Six Sigmoidoscopies [wikipedia.org] , which may have even underestimated the pain involved.) One of the presenters talked about the difficulty of applying hard statistical quality analysis to something as abstract as software development, but more or less proceeded to say that the solution was to find whatever metrics could be

• #### Re: (Score:2)

Btw, I've had a flexible sigmoidoscopy, and it's not painful. They give you something through an IV to knock you out, so it's over before you know it and you don't experience any pain, unless you count hearing yourself fart a lot afterward.

• #### Re: (Score:2)

Basically, once you start measuring something as a proxy for what you really want to know, people start to take the proxy into account when making decisions, to the point where it becomes useless as a measure for whatever it was intended.

Sounds like drunk driving on so many levels. Once, it was about impairment. Then they got definitive numbers for how much alcohol was in your system, so the level of impairment became irrelevant: they just counted the count and made the count itself illegal.

Or the numbe
• #### No counting problem that I can see (Score:2)

So the problem isn't one of having too much data, but rather of unreliably correlating that data to draw conclusions. What exactly is new here?

• #### Re:No counting problem that I can see (Score:4, Informative)

on Thursday May 20, 2010 @03:09PM (#32283434) Journal

Actually the "counting" problem they mentioned is a categorization problem. Depending how you define your categories, you get different counts. But that's because those are really different categories (they are defined differently). So the question is not really one of counting, but one of the "correct" definition of the category.
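The parent's point fits in a few lines: same records, two category definitions, two different counts. The records, fields and both rules are hypothetical, loosely modeled on the drinking-after-diagnosis example from the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Death:
    cause: str                # recorded medical cause of death
    explicit_self_harm: bool  # a deliberate, explicit act
    drinking_after_dx: bool   # started heavy drinking after a cancer diagnosis


# Made-up records, for illustration only.
RECORDS = [
    Death("overdose", explicit_self_harm=True, drinking_after_dx=False),
    Death("acute cirrhosis", explicit_self_harm=False, drinking_after_dx=True),
    Death("acute cirrhosis", explicit_self_harm=False, drinking_after_dx=False),
    Death("lung cancer", explicit_self_harm=False, drinking_after_dx=False),
    Death("alcohol poisoning", explicit_self_harm=False, drinking_after_dx=True),
]


def strict_suicide(d: Death) -> bool:
    # Only an explicit, voluntary act counts.
    return d.explicit_self_harm


def broad_suicide(d: Death) -> bool:
    # Also counts self-destructive drinking that began after a diagnosis.
    return d.explicit_self_harm or d.drinking_after_dx


def count(records, rule) -> int:
    return sum(1 for d in records if rule(d))


print("strict definition:", count(RECORDS, strict_suicide))
print("broad definition: ", count(RECORDS, broad_suicide))
```

Neither count is "wrong"; they answer differently defined questions.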

• #### Re: (Score:2)

There's another problem, and that problem is the edge cases that don't fit the statistics. Statistically speaking, smoking causes cancer and will kill you, and usually does. Despite this, there was a woman (now dead) who was, at the time, the world's oldest human. She had a cigarette every day after lunch until she died at age 112.

My own great-uncle started smoking at age 12, and stopped seventy years later when a lip cancer scared him. Ten years later HE died of old age at 92, long past the age most of us

• #### Re: (Score:1)

There's another problem, and that problem is the edge cases that don't fit the statistics. Statistically speaking, smoking causes cancer and will kill you, and usually does. Despite this, there was a woman (now dead) who was, at the time, the world's oldest human. She had a cigarette every day after lunch until she died at age 112.

That reminds me of the joke where a reporter talks with a 100-year-old. The reporter asks: "Why do you think you got that old?" - "I don't drink, I don't smoke, and I do

• #### Re: (Score:2)

The version I heard was a bit longer (and I think funnier).

A reporter is interviewing a 100-year-old man, and asks what he attributes his longevity to. The old man says, "Well, in the first place, I don't drink. I go to church every Sunday, and I never let a drop of alcohol touch my lips. I don't smoke and I don't drink. I eat healthy foods, and I never drink. I get plenty of exercise, and I never drink." At that point a loud crash comes from the other room. Startled, the reporter exclaims "What was that?!?

• #### Re: (Score:2)

The problem is having too much data that means too little, or not enough data that means too much.

Take the summary's example:

If a person starts drinking day in and day out after a cancer diagnosis and dies from acute cirrhosis, did he kill himself?

If your goal is to find deaths caused by cancer, and your statistic is "Within six months 40% of people with cancer die", is it the cancer that is killing them? How many of them are dying in automobile accidents in that period of time? How many fell off a roof? Should a guy who drinks himself to death because he has cancer count as a death caused by cancer? What about a guy who wa
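The gap between the headline figure and a cause-specific one is easy to see with toy numbers. The ten outcomes below are invented, chosen only so the all-cause figure matches the comment's hypothetical 40%; `six_month_rates` is an illustrative helper.

```python
from collections import Counter

# Made-up six-month outcomes for ten people diagnosed with cancer.
outcomes = ["alive"] * 6 + ["cancer", "cancer", "car accident", "fell off roof"]


def six_month_rates(outcomes):
    counts = Counter(outcomes)
    n = len(outcomes)
    all_cause = (n - counts["alive"]) / n   # the headline "X% die within six months"
    cancer_specific = counts["cancer"] / n  # deaths attributed to the cancer itself
    return all_cause, cancer_specific


all_cause, cancer_specific = six_month_rates(outcomes)
print(f"all-cause: {all_cause:.0%}, attributed to cancer: {cancer_specific:.0%}")
```

And the hard cases (the drinker, say) are exactly the ones where someone has to decide which label goes into that list.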

• #### Figures (Score:2, Funny)

I finally learn how to count and now they tell me it's useless. What's next, I learn how to type and I find out nobody is reading what I write?

tl;dr

• #### I totally agree (Score:2)

Counting things counts for 23% less than it did last millennium.

• #### Re: (Score:2)

I don't believe that since 26% of statistics are made-up on the spot.

• #### Re: (Score:2)

Well, back in my day, statistics were made up on the spot 42% of the time. Times, they are a changing.

• #### Count on it! (Score:2)

You can count on metrics being a problem.

• #### Re: (Score:2)

> You can count on metrics being a problem.

Right. Give me good old American feet and pounds any day.

• #### The Improving Economy (Score:2)

I have heard this issue raised regarding reports on the health of the economy. Retail sales are shown to be up, but only because stores that go out of business are dropped from the count. If they were still there, counting as big fat goose eggs, the average would show that the economy is in fact contracting.
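
The effect is easy to reproduce: compare total sales over stores that are still open (roughly what a same-store comparison does) with totals that count closed stores as zero. The four stores and their figures are made up.

```python
def pct_change(before: float, after: float) -> float:
    return (after - before) / before * 100.0


# Made-up annual sales (last year, this year); None means the store closed.
stores = {
    "A": (100.0, 105.0),
    "B": (100.0, 104.0),
    "C": (100.0, None),
    "D": (100.0, None),
}

# Only stores still open this year: closures silently drop out.
open_now = [(b, a) for b, a in stores.values() if a is not None]
survivors_change = pct_change(sum(b for b, _ in open_now), sum(a for _, a in open_now))

# Every store, with a closed store contributing zero sales this year.
all_change = pct_change(
    sum(b for b, _ in stores.values()),
    sum(a if a is not None else 0.0 for _, a in stores.values()),
)

print(f"open stores only: {survivors_change:+.1f}%")
print(f"all stores:       {all_change:+.1f}%")
```

Same data, one figure says growth and the other says collapse; the difference is entirely in who gets counted.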
• #### Re: (Score:2)

The problem is worse there, because ALL the figures that the government uses to measure the economy have been systematically tinkered with, for decades at least, to make the current economy look better. That makes time-series comparisons impossible. (They keep changing the definitions of what any particular figure measures.)

Try to find out what the current money supply is, for example. Which measure do you use, and what does it actually measure?

The current administration is always under pressure to make the economy lo

• #### The problem IS with statistical tests (Score:1)

Scientists, etc. use statistical tests to get information about something they can measure, and how well that measurable quantity can be predicted from their data set. They form a hypothesis that variable X can be predicted from the data. They test their hypothesis, and calculate the probability that knowledge of the data set will lead to a correct prediction of X. If they get something like 68%, 99.9%, etc., they're happy and they write it up. Perfect, but suppose in some parallel universe (that some str
• #### Re: (Score:1)

All else being equal, 60% of the time, statistical tests work every time.
• #### Re: (Score:2)

You raise an interesting point about how sometimes measuring less might mean more.

but Suppose in some parallel universe .. similar scientists had been more diligent, and conducted statistical analyses on not just X, but on 10^10 other variables. .. Same as in the other universe, they find that 99.9% of the variance in X can be predicted by the data set, but since they tested so many variables, they can't claim significance. By random chance, a lot of other variables did even better than X.

However, I thought the way science is supposed to work is that you make an a priori hypothesis about how X should behave and then try to validate it by experiment or measurement. A posteriori correlations discovered after an experiment or measurement (no matter how few or many variables measured) usually count for naught--as in the familiar adage "Correlation is not causation."
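The parallel-universe argument is the multiple-comparisons problem, and it is easy to simulate. A minimal sketch, assuming every variable tested is pure noise: for a continuous test statistic, a p-value under a true null hypothesis is uniform on [0, 1], so p-values can be drawn directly instead of running real tests. The Bonferroni correction shown (divide the threshold by the number of tests) is one standard fix that the comments don't name; `false_positives` is a hypothetical helper.

```python
import random


def false_positives(m: int, alpha: float = 0.05, seed: int = 0):
    """Count 'significant' results among m tests of pure noise."""
    rng = random.Random(seed)
    # Under a true null (continuous statistic), p-values are uniform on [0, 1].
    p_values = [rng.random() for _ in range(m)]
    naive = sum(p < alpha for p in p_values)           # test each one at alpha
    bonferroni = sum(p < alpha / m for p in p_values)  # threshold shrinks with m
    return naive, bonferroni


naive, corrected = false_positives(m=1000)
print(f"naive hits: {naive}, after Bonferroni: {corrected}")
```

With 1,000 noise variables you should expect around 50 "significant" hits at 0.05 and, usually, none after the correction.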

• #### Real Numbers (Score:1)

I always interpreted that quote as a comment on the existence of the real numbers.
• #### This is why we have so-called 'soft' sciences (Score:1)

...'unless we know how things are counted, we don't know if it's wise to count on the numbers'.... Which is why we still spend money on public health research and other so-called 'soft' social sciences! GIGO...
