Improperly Anonymized Logs Reveal Details of NYC Cab Trips

Posted by Unknown Lamer on Monday June 23, 2014 @07:55PM from the check-your-proof dept.

mpicpp (3454017) writes with news that a dump of fare logs from NYC cabs resulted in trip details being leaked thanks to using an MD5 hash on input data with a very small key space and regular format. From the article: City officials released the data in response to a public records request and specifically obscured the drivers' hack license numbers and medallion numbers. ... Presumably, officials used the hashes to preserve the privacy of individual drivers since the records provide a detailed view of their locations and work performance over an extended period of time.

It turns out there's a significant flaw in the approach. Because both the medallion and hack numbers are structured in predictable patterns, it was trivial to run all possible iterations through the same MD5 algorithm and then compare the output to the data contained in the 20GB file. Software developer Vijay Pandurangan did just that, and in less than two hours he had completely de-anonymized all 173 million entries.
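The attack described above can be sketched in a few lines. This is a minimal illustration, not Pandurangan's actual code: the single ID pattern used here (digit, letter, digit, digit, e.g. 5X55) is an assumption chosen to keep the example small, and the names `candidate_ids` and `build_lookup` are hypothetical.

```python
import hashlib
import itertools
import string


def md5_hex(text: str) -> str:
    """Return the hex MD5 digest of an ASCII string."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def candidate_ids():
    """Yield every ID of the ASSUMED form digit-letter-digit-digit (e.g. 5X55)."""
    digits = string.digits
    letters = string.ascii_uppercase
    for d1, letter, d2, d3 in itertools.product(digits, letters, digits, digits):
        yield f"{d1}{letter}{d2}{d3}"


def build_lookup() -> dict:
    """Hash the whole (tiny) keyspace once and map digest -> original ID."""
    return {md5_hex(candidate): candidate for candidate in candidate_ids()}


lookup = build_lookup()

# Values as they might appear in a released, "anonymized" file.
released = [md5_hex("7B42"), md5_hex("1A00")]

# Reversing the hash is just a dictionary lookup.
recovered = [lookup.get(digest) for digest in released]
print(len(lookup), recovered)
```

With only 10 x 26 x 10 x 10 candidates under this assumed pattern, the whole table is built almost instantly; the point is that the hash function adds no protection when every possible input can be enumerated.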

Oops. (Score:2)

by mythosaz ( 572040 ) writes:

"Oops"
-New York
- Cue the DMCA. (Score:2)
  
  by MickLinux ( 579158 ) writes:
  
  Oops.
  - Cue the DMCA. (Score:2, Insightful)
    
    by Anonymous Coward writes:
    
    In other news, the credentials for their plug-n-play coffee machine are 'admin' 'admin', and their gym locker combo is 1234. Someone made a half-assed attempt to obfuscate some data that nobody cares about (unless your husband's a cheating cabbie, I guess) and someone cracked it. News?
- - - - Re:Oops. (Score:5, Insightful)
        
        by philip.paradis ( 2580427 ) writes: on Tuesday June 24, 2014 @04:35AM (#47304281)
        
        The United States dollar [wikipedia.org] is the currency preferred by drug dealers, whose trade is in fact made more profitable by the failed "War on Drugs" [wikipedia.org].
        
        
        Re:Oops. (Score:4, Insightful)
        
        by philip.paradis ( 2580427 ) writes: on Tuesday June 24, 2014 @07:37AM (#47304727)
        
        The War on Drugs is a massively successful enterprise if your definition of success is the ability to extract billions of USD worth of funding from taxpayers, with a disproportionate amount of said funding going to the overt militarization of police forces in the USA at the expense of civil liberties and human rights. However, if your indicators of success are tied to social, medical, or economic improvement for the citizens of the United States of America, the entire affair is indeed a massive failure.
        For reference, this is coming from someone who consumes nothing more than nicotine (vaping these days, gave up cigarettes after 20 years) and whiskey, and once wore an actual military uniform for a living.
        
Data Security Officer (Score:5, Insightful)

by FlyHelicopters ( 1540845 ) writes: on Monday June 23, 2014 @08:05PM (#47301949)

Too many governments and corporations still fail to understand that data security requires putting experts who actually know what they are doing in charge.
This doesn't mean you contract it out to the lowest bidder or hire the cheapest CS degree you can find.
It means you hire knowledge and experience, you hire expert skills, and those cost money.

- Re: (Score:3, Insightful)
  
  by fuzzyfuzzyfungus ( 1223518 ) writes:
  
  In this case, it sounds like whoever got handed the job just couldn't, didn't care to, or was overruled about, thinking like an attacker.
  
  There are probably subtler methods of de-anonymizing the data that would require nontrivial skill to think of and counter; but it's a bit surprising to see somebody who knows enough about manipulating data to pull 20GB of records and hash a single field in each one without hurting himself or munging the result; but doesn't think "Medallion numbers are written on cabs. S
  - Re: Data Security Officer (Score:2)
    
    by MalleusEBHC ( 597600 ) writes:
    
    Adding a salt is a trivial way of fixing this.
    - Re: (Score:3)
      
      by WaffleMonster ( 969671 ) writes:
      
      Adding a salt is a trivial way of fixing this.
      No it ain't.
      - Re: Data Security Officer (Score:4, Informative)
        
        by Anonymous Coward writes: on Tuesday June 24, 2014 @04:22AM (#47304255)
        
        A naive use of salt would mean that you might as well omit the data. The aim of including the values in hashed form is to be able to say: This is the same driver as this. So same numbers have to hash to same numbers, which means you can't hash individual lines with different salts or you lose that information. In order to keep that information, you have to hash same numbers with the same salt each time. That basically gives you a random number with which to replace each number. So that works, but it removes the reason for using a hash, which is to have a local operation which creates a global irreversible one-to-one mapping. If you have to create one salt per unique number, you might as well use the salt as irreversible identifier.
        
    - Re: (Score:3)
      
      by m.dillon ( 147925 ) writes:
      
      Except you can decode the salt trivially if you took a cab ride that happens to be in the data set and you recorded the license and medallion number. At which point the salt is useless.
      -Matt
      - Re: (Score:2)
        
        by fuzzyfuzzyfungus ( 1223518 ) writes:
        
        It does make your table 'o handy precomputed hashes unhelpful; but on such a computationally trivial keyspace that barely matters.
        
        I wonder if the choice of hashing, rather than substituting a UUID, was based on not thinking through the weakness of a hash under the circumstances, or based on the extra difficulty of making sure that the same UUID is substituted for the same hack and medallion number in all instances? It's not a whole lot of additional difficulty; but the tipping point has to live somewhere
        
        Re: (Score:2)
        
        by cheater512 ( 783349 ) writes:
        
        What part of the story used ANY precomputed rainbow tables? None.
        salt + "1234": if you know the "1234", then it's a tiny brute force to get the salt.
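The known-plaintext point can be illustrated as follows. This is only a sketch under a strong assumption: the salt is short (at most three lowercase letters here, a made-up limit). A long random salt would not fall to this search. The names `salted_md5` and `recover_salt` are hypothetical.

```python
import hashlib
import itertools
import string


def salted_md5(salt: str, value: str) -> str:
    """MD5(salt + value), the construction discussed in the thread."""
    return hashlib.md5((salt + value).encode("ascii")).hexdigest()


def recover_salt(known_value, observed_hash,
                 alphabet=string.ascii_lowercase, max_len=3):
    """Try every salt up to max_len characters (ASSUMED to be short)."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            salt = "".join(chars)
            if salted_md5(salt, known_value) == observed_hash:
                return salt
    return None  # salt is outside the searched space


# A rider who noted the medallion "1234" finds the matching row in the dump.
secret_salt = "nyc"  # hypothetical short salt
observed = salted_md5(secret_salt, "1234")

found = recover_salt("1234", observed)
print(found)
```

Once the salt is known, the attacker is back to hashing the small ID keyspace exactly as in the unsalted case.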
    - Re: (Score:2)
      
      by msauve ( 701917 ) writes:
      
      Using a one time pad is even easier.
      - Re: (Score:2)
        
        by ColdWetDog ( 752185 ) writes:
        
        For taxi cabs?
        
        Re: Data Security Officer (Score:4, Informative)
        
        by msauve ( 701917 ) writes: on Monday June 23, 2014 @10:28PM (#47302889)
        
        Sure. I'm assuming there's a requirement to have a unique transformation of medallion numbers (otherwise, you wouldn't have to include even a hashed version)...
        
        Instead of applying some hash to the medallion number, just do something like:
        Change all appearances of the first number in the list to "1". Change all appearances of the next unique medallion number in the list to "2." Etc.
        
        The result is in essence an OTP. Unless records of the process are kept, it's irreversible (lacking external info, such as medallion number x picked up a fare at location y at time z and correlated info is in the info provided).
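The renumbering described above might look like this. It is a minimal sketch assuming the records are dictionaries with a `medallion` field; the field name and the function `pseudonymize` are illustrative, not taken from the actual dataset.

```python
def pseudonymize(records, field):
    """Replace each distinct value of `field` with "1", "2", ... in order of first appearance."""
    mapping = {}
    output = []
    for record in records:
        original = record[field]
        if original not in mapping:
            # First time this ID is seen: hand out the next counter value.
            mapping[original] = str(len(mapping) + 1)
        output.append({**record, field: mapping[original]})
    return output, mapping


trips = [
    {"medallion": "7B42", "fare": 12.5},
    {"medallion": "1A00", "fare": 8.0},
    {"medallion": "7B42", "fare": 30.0},
]

anonymized, mapping = pseudonymize(trips, "medallion")
print(anonymized)
```

The same cab still maps to the same label, so trips remain linkable, but the label carries no information about the original number. The `mapping` dictionary is the record of the process mentioned above and would have to be discarded, and the caveat about outside information (a known cab at a known place and time) still applies.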
        
        
        Re: (Score:2, Insightful)
        
        by philip.paradis ( 2580427 ) writes:
        
        I'm appalled that your post has been modded "informative." Please do us all a favor and abstain from any future posts on cryptography. Instead, I recommend you spend your time with resources like Applied Cryptography [schneier.com]. Seriously, please put down the shovel, and if you're doing anything involving crypto for a living, please do the world a favor and resign today.
        
        Re: (Score:3)
        
        by complete loony ( 663508 ) writes:
        
        Anonymising the data just requires replacing each key with something unrecognisable. The GP's suggestion passes the smell test, though I would suggest randomising the list instead of assigning id values sequentially.
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        You clearly have no idea whatsoever what a one-time pad [wikipedia.org] is. Reference my other comments in this thread for additional hints as to why msauve's error is particularly egregious in this context. Alternately, stay ignorant. Your choice.
        
        Re: (Score:2)
        
        by N1AK ( 864906 ) writes:
        
        It is informative. Unless you knew that a particular record in the dataset was for a specific medallion/plate combo then what he's suggesting is sufficient to obscure the driver. If you did know that then you couldn't obfuscate the data without making it impossible to tell which records relate to the same (known) vehicle. If you're happy to do that then you could just not include any reference to either medallion or plates in any format in the data.
        
        I'm not remotely surprised that someone on the internet
        
        Re: (Score:3)
        
        by msauve ( 701917 ) writes:
        
        philip.paradis is simply being an assholish troll.
        
        The original medallion and license(?) numbers need to be transformed into unique but consistent identifiers in the output, so one can still follow an individual cab/driver, but not be able to identify them in the real world.
        
        Assuming the dataset is ordered in some way (such as by date and time, which seems logical), even changing each cab/driver number to a unique, truly random number wouldn't be any more secure than the sequential assignment I gave as an ex
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        You're still completely wrong. I'm willing to spend my own money to ship you hardcopy references that will help you better yourself, in the hope that you will stop dispensing the sort of horrid advice you're continuing to regurgitate here. Why aren't you willing to take me up on this offer? Are you unable to provide a shipping address of any sort?
        
        Re: (Score:2)
        
        by msauve ( 701917 ) writes:
        
        Repeating an incorrect statement doesn't make it correct. You're really not very good at trolling, or much of anything it seems.
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        Why won't you accept a shipment of formal reference materials?
        
        Re: Data Security Officer (Score:5, Funny)
        
        by msauve ( 701917 ) writes: on Tuesday June 24, 2014 @07:32AM (#47304707)
        
        Do you always dig in so forcefully when you're demonstrably wrong?
        
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        To be clear, advice similar to the sort you administered in the post I originally replied to is an apt explanation for why we have the number of massive failures in cryptographic functionality in software these days. You have absolutely no business even beginning to comment on this subject. May I please ship you a few hardcopy references?
        
        Re: (Score:2)
        
        by msauve ( 701917 ) writes:
        
        Somehow, I don't think I'll get an apology. [slashdot.org]
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        Are you being completely serious and saying that you don't recognize how the process described in the original post [slashdot.org] is anything but a one time pad?
        
        Re: (Score:2)
        
        by nabsltd ( 1313397 ) writes:
        
        Please demonstrate how I'm wrong. You just became a personal project of mine.
        The problem was that the data released in the FOIA response had personally-identifiable information (the medallion number) replaced with something that could be used to re-generate the PII without any information that isn't public.
        The GP's scheme was to replace the PII with a number that cannot be used to re-generate the PII with just the FOIA response. The PII could be re-generated if you had some kind of extra knowledge (e.g., the mapping used, or knowledge of when a particular cab was at a particular l
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        Try again. [wikipedia.org]
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        Completely absent additional information, I'll give you another hint on why deterministic assignment is a very bad choice here, representing a practice in total opposition to OTP: curve fitting [wikipedia.org]. Is this starting to make a little more sense now?
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        Dude, msauve's proposed methodology is indeed tragically flawed, and you clearly haven't read the balance of the posts in this thread. Why are you so resistant to refutation of bad crypto advice? Are you positioned to benefit from deterministic systems which are advertised as cryptographically sound?
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        You're clearly a fan of "get one key, get 'em all." Who signs your paychecks these days?
        
        Re: (Score:2)
        
        by ultranova ( 717540 ) writes:
        
        I'm appalled that your post has been modded "informative."
        
        I'm appalled that yours has been modded "Insightful" despite having no content beyond a verbose "you suck".
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        You must have missed the motherfucking literary reference I linked. Read the fucking book (and hopefully a few more), you fucking retard.
        
        Re: (Score:2)
        
        by ultranova ( 717540 ) writes:
        
        You must have missed the motherfucking literary reference I linked. Read the fucking book (and hopefully a few more), you fucking retard.
        
        The book you linked to is about cryptography, not incest pornography as you seem to be implying. Neither of these seems relevant to anonymising - as opposed to encrypting - records.
        Good luck with treating your Tourette's, BTW. Or hangover. Whichever is relevant here.
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        Are you confirming shipment of the book (along with a couple of other volumes) to Delft University of Technology [slashdot.org] in your care? I found it odd that even an undergraduate at such an institution would not already have access to such material, but perhaps all university copies are already on loan to other students. As an aside, you appear to be lacking the capacity to distinguish emphasis borne of extreme frustration from certain pathological afflictions. You should work on that.
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        By the way, thanks for the added laughs per your attempt to reframe this discussion as "anonymising" versus "encrypting." You'd get a few charity points for sophomoric debate tactics if the subject matter were a bit less serious in nature, but that particular bit of commentary is indeed nothing more than a juvenile attempt at diverting attention from the matters at hand. Try again.
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        Look, seriously, provide an address and I'll ship you a fucking copy of the book. Your choice.
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        Thank you. I'll dispatch the shipment in a few hours in the care of "ultranova", provided I get a response back under that user account indicating confirmation of the destination address. I'll provide a post tracking reference here once the shipment is confirmed to be in transit.
        
        Re: (Score:2)
        
        by philip.paradis ( 2580427 ) writes:
        
        Please educate yourself. [wikipedia.org]
    - Re: (Score:2)
      
      by fulldecent ( 598482 ) writes:
      
      This is correct. MD5(salt + data). Salt is same for EVERY MD5 operation. Create the file and then delete the salt, done. This is called keying.
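A sketch of the keying idea follows, with two substitutions that are assumptions of this example rather than what the comment specifies: it uses HMAC-SHA256 from the Python standard library instead of a bare MD5(salt + data), and a 32-byte random key. The helper name `make_pseudonymizer` is hypothetical.

```python
import hashlib
import hmac
import secrets


def make_pseudonymizer():
    """Return a function that maps IDs to keyed digests under a fresh random key."""
    key = secrets.token_bytes(32)  # long random key, never written to the output

    def pseudonym(value: str) -> str:
        return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

    return pseudonym


pseudonym = make_pseudonymizer()

a1 = pseudonym("7B42")
a2 = pseudonym("7B42")
b = pseudonym("1A00")

# A different run gets a different key, so its output cannot be matched up.
other = make_pseudonymizer()("7B42")
print(a1 == a2, a1 == b, a1 == other)
```

The key exists only inside the closure; once the export is finished and the function is dropped, nothing remains to rebuild the mapping. This only helps if the key is long and random. A short or guessable key is open to the known-plaintext search raised elsewhere in the thread.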
  - Re: (Score:2)
    
    by AmiMoJo ( 196126 ) * writes:
    
    It was probably just overconfidence. Someone googled the solution, thought it didn't look hard, and told their boss they could take care of it and save $$$ in the process.
- Re:Data Security Officer (Score:5, Interesting)
  
  by Opportunist ( 166417 ) writes: on Monday June 23, 2014 @09:33PM (#47302541)
  
  You can contract it out to the lowest bidder without a problem. There only have to be 2 clauses in the contract:
  1) You have a GOOD ITSEC company audit the shit out of it before it goes live.
  2) If the audit reveals that the company taking the contract doesn't know jack about security, THEY will pay for the audit and THEY will improve the software until they think it's finally good enough.
  1 and 2 are repeated until 1 turns out good.
  I worked for a very long time in government. And I learned one thing: You are not supposed to know shit. You are supposed to buy knowledge.
  
  - Re:Data Security Officer (Score:4, Insightful)
    
    by chriscappuccio ( 80696 ) writes: on Monday June 23, 2014 @11:44PM (#47303249) Homepage
    
    Sorry but unless you define "GOOD ITSEC company audit the shit out of it" in tangible terms that can actually hold someone liable for failure in a real way, this is just baloney. And if you define it with teeth, the price will increase. Basically, to define it properly, you'd be able to do it yourself. Oops.
    
  - Re: (Score:2)
    
    by SeaFox ( 739806 ) writes:
    
    I worked for a very long time in government. And I learned one thing: You are not supposed to know shit. You are supposed to buy knowledge.
    Isn't that how the entire job market works? That's why we have the education loan bubble we have -- employers don't believe you know anything without a piece of paper showing you spent thousands of dollars to learn it.
  - - Re: (Score:2)
      
      by Opportunist ( 166417 ) writes:
      
      I don't know about your government; in mine, there's a process and prescribed procedure for everything. I'm fairly sure there's even a defined procedure for how to correctly pass gas.
      And hence there is of course a procedure for hiring. You'd actually be surprised how efficient bureaucracy can be at inventing ways to make itself indispensable. If you don't know who to hire, hire a guy to tell you who to hire.
      I am not kidding.
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  It is not a surprise when you consider where else they mess up spectacularly. It is like there is no active intelligence to be had in these organizations.
- Re:Data Security Officer (Score:5, Interesting)
  
  by penix1 ( 722987 ) writes: on Monday June 23, 2014 @10:21PM (#47302855) Homepage
  
  From TFS...
  City officials released the data in response to a public records request and specifically obscured the drivers' hack license numbers and medallion numbers...
  How many of you here have had to deal with a Freedom Of Information Act (FOIA) request which is what a "public records request" is? I have had the pleasure over a dozen times. You have 10 days to respond to that request in my state. Some states it is even less. Failure to do so can result in stiff penalties. 10 days is hardly enough time to contract out to someone and have the job "done right".
  It means you hire knowledge and experience, you hire expert skills, and those cost money.
  And you are happy to have your taxes raised to pay those fees? Riiiight!
  
  - Re: (Score:2)
    
    by chromaexcursion ( 2047080 ) writes:
    
    Small problem.
    Taxi Hack numbers are available in a publicly accessible database.
    A determined individual probably could find license numbers, they may be publicly accessible.
    Failure to understand the vulnerability is the design failure.
    A simple solution would have been to order the hashes numerically and re-number them cardinally, i.e. 1, 2, 3 ...
    Would take less than a minute, for someone who knew how.
    Perhaps a few hours if the right person had to be tracked down.
    Never release source data.
  - Re:Data Security Officer (Score:5, Informative)
    
    by sexybomber ( 740588 ) writes: on Monday June 23, 2014 @11:34PM (#47303197)
    
    Your State may be different, but New York's Freedom of Information Law (or FOIL, we like to be different) works like this:
    The agency has to respond within five business days, but that response can read something like:
    Dear Sexybomber:
    We have received your request for public records pursuant to FOIL. Due to the complexity of the records you have requested, it may not be possible to produce them within the standard 20-day statutory period. We anticipate that we will be able to produce the records you have requested within 40 days. If you have questions or concerns, please direct them in writing to the address above.
    If they run into a snag, they have to inform you of this and produce the records within a "reasonable period".
    So it's not like NYC was under a five-day time crunch here. They could easily have responded and said it would take 40 or 60 days, being as there were several million records requested. That's definitely long enough to bring in a consultant (or even one of the more technically-literate staff members) to properly secure the data.
    
  - Re: (Score:2)
    
    by philip.paradis ( 2580427 ) writes:
    
    Hint: 30 seconds of my time leads me to believe this applies to you: Pennsylvania’s New Right to Know Law [state.pa.us]. If I'm in error on the state in question, please let me know, and I'll be more than glad to guide you to the appropriate legislation for your jurisdiction.
That was a dumb thing to do. (Score:2)

by K. S. Kyosuke ( 729550 ) writes:

Cue a CFAA trial and a long stay in a cozy federal PMITA penitentiary.
- Re: (Score:2)
  
  by Opportunist ( 166417 ) writes:
  
  And the crime would be? Exposing government stupidity?
Prediction: de-anonymization considered "hacking" (Score:5, Insightful)

by rsborg ( 111459 ) writes: on Monday June 23, 2014 @08:12PM (#47301983) Homepage

Large organizations will consistently fail to hire/staff competent people for data security related issues, and will push back on fines or punitive findings by criminalizing publicizing their incompetence.
Thus sending all such talent straight to criminals who'll be happy to reward them with hard cash.
It's like these guys _want_ a dystopian future.

- - Re:Prediction: de-anonymization considered "hacking" (Score:5, Interesting)
    
    by Opportunist ( 166417 ) writes: on Monday June 23, 2014 @09:49PM (#47302639)
    
    True that.
    I am in the fortunate situation of having near unlimited funds. I was joking that I need a rubber stamp labeled "for security reasons", because whenever I want something, these three magic words will brush aside nearly all objections (ok, within reason, but anything 5 digits or less is nearly certainly mine if I "rubber stamp" it that way).
    The most recent draft of the security procedures I did I peppered liberally with "insanity" as I call it. It's a political thing. You demand stuff that you don't really want but is so terribly obstructive to everyone else that they'll agree with what you actually want just to get the insane levels of "security" (read: obstruction and red tape) out of the way. To my unending horror (and slight amusement) they signed it off without changing a comma. Now find out how to argue why you want your own requirements out of the crap...
    The reason isn't that our board suddenly found out how much they love security or how important the confidentiality of the (considerably sensitive, I should add) private data we hold here is. What changed is simply that our government upped the fines and punishment for data breeches considerably, up to and including jail time for board members if negligence can somehow be tacked to them. In a nutshell, unless you can show that you tried to stay on top of security when holding highly sensitive data, you should prepare to take a longer vacation, all expenses paid, in a holiday resort of your government's choice.
    I guess when your ass is on the line, you get very willing to spend money.
    
    - Re: (Score:2)
      
      by chromaexcursion ( 2047080 ) writes:
      
      You've elegantly described why stiff federal penalties are needed.
      
      Interesting that when a direct line to someone's pocketbook is defined everyone gets on board, but when it's just a chance someone's drinking water would be tainted with cancer-causing chemicals most can't find the connection.
      Corporate malfeasance comes in all forms.
      - Re:Prediction: de-anonymization considered "hacking" (Score:5, Interesting)
        
        by Opportunist ( 166417 ) writes: on Tuesday June 24, 2014 @12:04AM (#47303327)
        
        Fines in a corporate world are a matter of risk management: How likely is it that it happens, what's the fine if it happens and how much do we save by not giving a damn? If this unholy trinity comes up with the "don't give a damn" on top, you don't give a damn and the fine becomes part of the operating cost. The more I get to play with C-Levels, the more I get the nagging feeling that I'm the only one weighed down by a conscience.
        Actually, I think it's more insidious. It's a blame shifting game where everyone can claim he's doing it for the "greater good", because "being bad" is actually "being good". Take the scenario where some people have to be laid off. The floor manager knows them personally. He knows every single one of them, he knows their personal life, their family situation and it really breaks his heart to let one of them go, but he knows he has to. Either he fires one of them or he might have to fire them all because they won't be profitable anymore with the new requirements, and that could lead to the shutdown of the entire branch. His superior may not know the people anymore, but he has to do it because he himself doesn't make that decision, that's been decided further up. He can't simply ignore an order from C-Level. The C's don't need to be psychopaths (though it sure helps, it seems...), they can even be compassionate, but they know that the investors will only keep their money in the company if they perform well and if the cash flow is to their liking. A C-level can easily brush any troubles with his conscience aside when he fires a few people now, since if he didn't, the quarterly figures wouldn't look nice, the stock would plummet and investors would jump ship, and then he'd have to lay off even more people. But you can't even blame the investment bankers. Because they have to pick the best performing stocks, it's not their money, it's money from investors, money they put aside for their retirement, the investors have a responsibility towards the people that entrust them with their money (ok, recent history shows that most don't give a shit, but let's assume we find an investment banker with a conscience... it's just a thought experiment, remember). The people investing money don't even know WHAT they invest in, they just toss money at their investor with the order to "make more of it". And they're not "evil" either, they just want to prepare for their retirement. Those people could well be the same ones that get fired now for the sake of more profit. Essentially, they're firing themselves without knowing it.
        But I ramble.
        What this is supposed to show is that in the corporate world it's easy to play the blame shifting game and use the "but I have to!" excuse. It's sad but it seems the only escape from that game is to actually grab them at the nuts and tell them that they won't be shifting the blame anywhere. And behold, it works.
        Of course that also means that I have to watch my back or it's going to be my ass that's going to jail. But fortunately all I have to do is heed the laws. And that's easy enough, surprisingly.
        
        
        Re: (Score:3)
        
        by skovnymfe ( 1671822 ) writes:
        
        A new car built by my company leaves somewhere traveling at 60 mph. The rear differential locks up. The car crashes and burns with everyone trapped inside. Now, should we initiate a recall? Take the number of vehicles in the field, A, multiply by the probable rate of failure, B, multiply by the average out-of-court settlement, C. A times B times C equals X. If X is less than the cost of a recall, we don't do one.
        
        Re: (Score:2)
        
        by wonkey_monkey ( 2592601 ) writes:
        
        A new car built by my company [...] car crashes and burns with everyone trapped inside. Now, should we initiate a recall?
        No, you just need to stop making such shitty cars.
        
        Re: (Score:2)
        
        by Opportunist ( 166417 ) writes:
        
        Why? As long as people buy them, there is no pressing need provided that the profit outmatches the potential fines. That's corporate logic.
        What? Oh, people die, yes. That's where the potential fines come into play.
        
        Re: (Score:2)
        
        by cellocgw ( 617879 ) writes:
        
        No, you just need to stop making such shitty cars.
        Seems a lot of people got whooshed by the original post, so:
        I have changed your automobile safety design. Pray I do not change it further -- T. Durden
        
        Re: (Score:3)
        
        by bluegutang ( 2814641 ) writes:
        
        This is not a new phenomenon. And not an easy one to solve. From The Grapes of Wrath by John Steinbeck:
        "I built [this house] with my hands. Straightened old nails to put the sheathing on. Rafters are wired to the stringers with baling wire. It's mine. I built it. You bump it down—I'll be in the window with a rifle. You even come too close and I'll pot you like a rabbit."
        "It's not me. There's nothing I can do. I'll lose my job if I don't do it. And look—suppose you kill me? They'll just hang you,
    - Re: (Score:2)
      
      by superdana ( 1211758 ) writes:
      
      data breeches
      bring me my computing pants!
  - Re:Prediction: de-anonymization considered "hacking" (Score:5, Informative)
    
    by Anonymous Coward writes: on Tuesday June 24, 2014 @01:18AM (#47303621)
    
    > Target's breach cost them 50% of their revenue for a year.
    No it did not. Not even close. [cbsnews.com] At worst their profits for the subsequent quarter were down 50% or in terms of revenue, that's less than a 6% drop compared to a year ago.
    
Oops, indeed (Score:5, Funny)

by Krishnoid ( 984597 ) writes: on Monday June 23, 2014 @08:21PM (#47302041) Journal

Software developer Vijay Pandurangan did just that, and in less than two hours he had completely de-anonymized all 173 million entries.
Having thereby run afoul of the circumvention of copyright protection mechanisms clause of the Digital Millennium Copyright Act, he was then subjected to the NYPD's controversial new program [theonion.com], and subsequently incarcerated.

Error so popular it was enshrined in PCI DSS (Score:5, Insightful)

by WaffleMonster ( 969671 ) writes: on Monday June 23, 2014 @08:54PM (#47302281)

I've always assumed that anywhere the term "anonymized data" is used, it is more likely than not companies and governments paying lip service to their customers... where the data could easily be reversed into an identifiable form by either taking advantage of insufficient entropy or cross-referencing datasets.
There is after all no cost for violating privacy or unnecessary risk exposure associated with disclosure.
One of my favorite examples of the dangers of insufficient entropy stems from a PCI DSS requirement written by "experts" who should know better.
3.4 Render PAN unreadable anywhere it is stored (including on portable digital media, backup media, and in logs) by using any of the following approaches:
One-way hashes based on strong cryptography, (hash must be of the entire PAN) ...
Search space of typical 16-digit card numbers is no match for a modern CPU once you have taken check digit, card type, issuer and issuer specific numbering into account... "strong cryptography" can't fix stupid.
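
To make the search-space point concrete, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the prefix `411111`, the use of unsalted SHA-256, the helper names, and the narrow account range (chosen only so the demo finishes quickly). Nothing here comes from the PCI DSS text itself.

```python
import hashlib

def luhn_check_digit(partial: str) -> str:
    """Return the Luhn check digit for a digit string that lacks its final digit."""
    total = 0
    # Walk right to left; the rightmost digit of `partial` is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def candidates(bin_prefix: str, account_range: range):
    """Yield valid 16-digit numbers for one 6-digit issuer prefix (hypothetical helper)."""
    for account in account_range:
        partial = f"{bin_prefix}{account:09d}"
        yield partial + luhn_check_digit(partial)

def reverse_hash(target_hex: str, bin_prefix: str, account_range: range):
    """Brute-force an unsalted SHA-256 hash over the candidate space."""
    for pan in candidates(bin_prefix, account_range):
        if hashlib.sha256(pan.encode()).hexdigest() == target_hex:
            return pan
    return None

# With a 6-digit prefix known and the check digit computed, only 9 digits
# are free: 10**9 candidates per prefix rather than 10**16.
secret_pan = "4111111111111111"  # widely used test number, not a real card
leaked = hashlib.sha256(secret_pan.encode()).hexdigest()

# Search a small slice of the space so the demo runs in well under a second.
found = reverse_hash(leaked, "411111", range(111_000_000, 111_200_000))
print(found)
```

The strength of the hash function never enters into it: the loop only needs to evaluate the hash forward, once per candidate.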

- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Indeed. Any reversible transformation for a small-entropy source set is insecure. Anybody that actually understands crypto knows that. Seems this mess is just one more indicator that some people hire far too cheap when it gets to IT.
- Re: (Score:2)
  
  by swillden ( 191260 ) writes:
  
  I've always assumed that anywhere the term "anonymized data" is used, it is more likely than not companies and governments paying lip service to their customers... where the data could easily be reversed into an identifiable form by either taking advantage of insufficient entropy or cross-referencing datasets.
  It's worth mentioning that one possible solution in this sort of situation is to use a keyed hash. Assuming a good base hash (which MD5 really isn't, any more, but HMAC MD5 would likely have been fine) and a well-secured key with sufficient entropy, it is infeasible to reverse the hash. Cross-referencing may still be an issue, though straight brute force reversing of the hashing isn't. To eliminate the possibility of cross-referencing it's necessary to use a different hash key for each database.
  Of course,
  - - Re: (Score:2)
      
      by swillden ( 191260 ) writes:
      
      Thereby introducing a known plaintext into a cryptographic construct--something not to be taken lightly.
      You don't know what you're talking about.
      First, with any decent cipher or keyed hash, known plaintext by itself poses no risk to security. If it does, then by definition your cryptographic construct is pre-broken and you should get another one that works. Encryption should nearly always be randomized not because known plaintext is a problem, but to avoid replay attacks. That's not relevant here, in fact "replay" is a desired feature since the whole point is to produce IDs which can be correlated within a
- Re: (Score:3)
  
  by Wrath0fb0b ( 302444 ) writes:
  
  Um, the standard is fine. The phrase "One-way hashes based on strong cryptography" means (to any professional in the business) that one must salt [wikipedia.org] the hash with sufficient entropy to make brute-forcing the input space impossible. So a 16-digit CC has little entropy, but add a 16-byte salt and you've got somewhere.
  So yeah, "strong cryptography" can't fix stupid, but those that know how to use it are plenty fine.
  - Re: (Score:2)
    
    by WaffleMonster ( 969671 ) writes:
    
    Um, the standard is fine. The phrase "One-way hashes based on strong cryptography" means (to any professional in the business) that one must salt the hash with sufficient entropy to make brute-forcing the input space impossible. So a 16-digit CC has little entropy, but add a 16-byte salt and you've got somewhere.
    This is the second time 'use salts' has been mentioned. Salts are not secret keys and only provide protection against creation of lookup tables to accelerate brute force of multiple items... they in no way address the underlying problem of insufficient entropy.
    I don't know the exact figure, but last I looked into this, the space of every possible credit card that can be issued across all currently known issuers is well under a trillion, most likely in the tens to hundreds of billions range... practically free by tod
    - Re: (Score:3, Interesting)
      
      by Buzer ( 809214 ) writes:
      
      Salts do provide protection against that. Salts are secret if you want them to be (you can protect the plain text salt same way as you do protect your plain text keys for encryption), you only need to share them when other party has to be able to hash their original data.
      Here are some sha1 hashes:
      
      4c2199828f355281e0f6eccb76d9df609f99ed0e salt+"123"
      458183225b77f6baff7c4c439b0ed3a5e7278e8a salt+"456"
      ed974fc96c530639cccc9b18315396789d93a697 salt+"789"
      f87a2fa039a20d01032f19b5852868343f3d06b9 salt+"???"
      So, how a
      - Re: (Score:2)
        
        by swillden ( 191260 ) writes:
        
        Salts are secret if you want them to be
        If you keep them secret then they're not salts, they're keys. The definition of "salt" in the cryptographic world includes the notion that it need not be kept secret, just as "IV" is a value which need not be secret but must not be predictable, and "nonce" is a value which need not be secret or predictable (indeed a salt is technically a form of or application of a nonce).
        Another characteristic of salts is that you use a different salt for each entry. That's counterproductive in the case being discussed,
MD5 is not the problem (Score:2)

by gweihir ( 88907 ) writes:

For this application, MD5 did not make a difference. SHA512 would have been just as insecure. For some applications, MD5 is perfectly secure if used competently. This example is one and the original story does not claim any culpability on the part of MD5. As always, there is no substitute for knowing what you are doing.
I de-anonymized this comment (Score:2)

by ewg ( 158266 ) writes:

I de-anonymized this comment by signing in.
Using a published hash - FAIL (Score:2)

by chromaexcursion ( 2047080 ) writes:

Using any public hash exposes you to dictionary attacks. Especially when you publish which one you've used.
The quality of the encryption is irrelevant.
Security through obscurity, using a custom algorithm, is the only way.
Taking MD5, it's published, and tweaking a few points (though whoever did this needs to be very competent) would have been sufficient.

Some manager probably said any work for additional security wasn't worth the cost. Ooops!
- Re: (Score:3)
  
  by PPH ( 736903 ) writes:
  
  Security through obscurity, using a custom algorithm, is the only way.
  Not necessarily. I imagine the reason the hashed field was included in the published logs was to provide a key to group results by driver. Even if that driver was to remain anonymous. So all the city would have had to do is issue a system generated UID for each medallion/license number combination and populate the published data with that.
  Nobody knows who driver 1, 2, 3, .., 736903, ... etc. are. But one can still analyze per-driver data.
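
A minimal Python sketch of that lookup-table idea follows. The medallion strings and trip tuples are made-up sample data and `assign_pseudonyms` is a hypothetical helper, so treat it as an illustration under those assumptions rather than what the city should have run.

```python
import secrets

def assign_pseudonyms(raw_ids):
    """Map each distinct raw ID to a number that is unrelated to its value."""
    distinct = sorted(set(raw_ids))
    # Shuffle with the OS CSPRNG so the numbering leaks nothing about sort order.
    secrets.SystemRandom().shuffle(distinct)
    return {raw: n for n, raw in enumerate(distinct, start=1)}

# Made-up sample rows: (medallion, trip detail).
trips = [("5X55", "pickup A"), ("7B12", "pickup B"), ("5X55", "pickup C")]

table = assign_pseudonyms(medallion for medallion, _ in trips)

# Publish only the assigned number; the table itself stays private or is destroyed.
published = [(table[medallion], detail) for medallion, detail in trips]
print(published)
```

Rows for the same medallion still share an ID, so per-driver analysis keeps working, but there is nothing to brute-force because the ID is not computed from the medallion at all.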
  - Re: (Score:2)
    
    by chromaexcursion ( 2047080 ) writes:
    
    nope, it has to do with the key. given a tag # and license # you can dictionary attack the hash. especially since the source data is known, easy to break.
    
    they didn't pre-anonymize the keys
    - Re: (Score:3)
      
      by swillden ( 191260 ) writes:
      
      nope, it has to do with the key. given a tag # and license # you can dictionary attack the hash. especially since the source data is known, easy to break.
      If they'd used a keyed hash of tag # and license #, it wouldn't have been breakable. Even HMAC-MD5 would have been fine, given sufficient entropy in the key, though I'd have used HMAC-SHA256 just as a matter of good crypto hygiene.
      And a custom algorithm is wrong, wrong, wrong. That's just begging for weakness in the solution. Use the proper standard algorithm for the job.
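
For reference, a keyed hash along these lines takes only a few lines with Python's standard `hmac` module. This is a sketch under assumptions: the field values are invented, `make_pseudonymizer` is a hypothetical helper, and the length-prefixed encoding of the two fields is just one reasonable way to keep them unambiguous.

```python
import hashlib
import hmac
import secrets

def encode_fields(*fields: str) -> bytes:
    """Length-prefix each field so ("ab", "c") and ("a", "bc") encode differently."""
    out = b""
    for field in fields:
        raw = field.encode("utf-8")
        out += len(raw).to_bytes(2, "big") + raw
    return out

def make_pseudonymizer(key: bytes):
    """Return a function that maps (medallion, hack license) to an HMAC-SHA256 tag."""
    def pseudonym(medallion: str, hack_license: str) -> str:
        msg = encode_fields(medallion, hack_license)
        return hmac.new(key, msg, hashlib.sha256).hexdigest()
    return pseudonym

# 256 bits from the OS CSPRNG. Keep it secret, or discard it after the export
# if the IDs never need to be regenerated.
key = secrets.token_bytes(32)
pseudonym = make_pseudonymizer(key)

print(pseudonym("5X55", "123456"))  # invented sample values
```

The same inputs always give the same tag under one key, so records still group by driver. Using a fresh key for each released dataset keeps tags from lining up across releases.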
- Re:Using a published hash - FAIL (Score:4, Interesting)
  
  by Vellmont ( 569020 ) writes: on Monday June 23, 2014 @11:16PM (#47303101) Homepage
  
  Taking MD5, it's published, and tweaking a few points (though whoever did this needs to be very competent) would have been sufficient.
  
  No, that would have been stupid. It's unlikely someone would have reverse engineered your hacked md5 algorithm, but it's also possible you could screw it up.
  The solution is VERY simple. Generate a random 256 bit string. Hash random-string+data, and use the output as the identifier. Throw away the random 256 bit string.
  
  Some manager probably said any work for additional security wasn't worth the cost. Ooops!
  
  No, some developer didn't know what the hell they were doing. You'd be surprised (but shouldn't be) how little most developers know about security, especially encryption.
  
  - Re: (Score:2)
    
    by chromaexcursion ( 2047080 ) writes:
    
    well, you just described a way to tweak an algorithm.
    wouldn't even have to go to a 256 bit key. Doing that into MD5 would probably foil anything less than a concerted financial attack.
    No media outlet could afford the computing power to attack that.
    I used the same approach, with some further tweaks to secure financial communications a decade ago.
    
    Lack of understanding security doesn't surprise me. I'm an engineer who does. I designed and wrote a suite that passed a 3rd party, hostile, security audit.
    - Re: (Score:2)
      
      by Vellmont ( 569020 ) writes:
      
      No, that's not a tweak to an algorithm, it's a random input to an algorithm. The algorithm is the same, the input is different.
    - Re: (Score:2)
      
      by swillden ( 191260 ) writes:
      
      I designed and wrote a suite that passed a 3rd party, hostile, security audit.
      I don't normally play the credential game, but if that's what you want to do...
      Me too. Many times. Including once an audit by the NSA (back when they actually tried to strengthen security). I've also been a security consultant for dozens of fortune 500 companies, and similarly-sized international corporations around the world. I've consulted for the US and Israeli militaries. I'm currently a crypto security engineer at Google, and the lead maintainer of a popular open source crypto library. I'm not a real
  - Re: (Score:2)
    
    by swillden ( 191260 ) writes:
    
    The solution is VERY simple. Generate a random 256 bit string. Hash random-string+data, and use the output as the identifier.
    This. Except rather than hashing the key with the data, use a proper keyed hash construction. HMAC is a good choice.
  - Re: (Score:2)
    
    by nabsltd ( 1313397 ) writes:
    
    The solution is VERY simple. Generate a random 256 bit string. Hash random-string+data, and use the output as the identifier. Throw away the random 256 bit string.
    How is this any more secure than assigning the random 256 bit string as the identifier (with collision prevention, of course)?
    Next, how would a random sort of the original keys (SELECT DISTINCT medallion_number FROM the_table ORDER BY RANDOM) followed by assigning 1..number_of_medallions to use as the identifier be less secure?
    As others have stated, you could even just assign the new identifier sequentially if the source table isn't sorted by the key you are trying to obscure.
  - - Re: (Score:2)
      
      by swillden ( 191260 ) writes:
      
      You can simply use the 256 bits of garbage if all you need is an identifier.
      Yes, but you need to get the same 256 bits of garbage each time you encounter a given driver ID. This means adding a lookup table. Much simpler and faster to use a keyed hash as your lookup table.
      - Re: (Score:2)
        
        by nabsltd ( 1313397 ) writes:
        
        CREATE TABLE id_link (
          new_id INT AUTO_INCREMENT PRIMARY KEY,  -- MySQL requires an AUTO_INCREMENT column to be a key
          old_id CHAR(50)
        );
        INSERT INTO id_link (old_id)
          SELECT DISTINCT old_id FROM old_table ORDER BY RAND();
        SELECT new_id, other_field_1_from_old_table, other_field_2_from_old_table
          FROM old_table, id_link
          WHERE old_table.old_id = id_link.old_id;
        How hard was that?
- Re:What's the issue here? (Score:5, Insightful)
  
  by gweihir ( 88907 ) writes: on Monday June 23, 2014 @09:32PM (#47302529)
  
  You are naive. The problem starts to crop up when you start correlating things. Then you can find all sorts of things, like patterns of visiting a mistress, people meeting in secret (which is perfectly legal, but the government fears it), etc.
  
  - Re:What's the issue here? (Score:5, Insightful)
    
    by chriscappuccio ( 80696 ) writes: on Tuesday June 24, 2014 @12:43AM (#47303477) Homepage
    
    The government has the info already, they handed it out!
    
    - Re: (Score:2)
      
      by gweihir ( 88907 ) writes:
      
      And the Government is the only party that does data-correlation?
- Re:What's the issue here? (Score:5, Insightful)
  
  by Opportunist ( 166417 ) writes: on Monday June 23, 2014 @10:07PM (#47302747)
  
  Actually the movement of a cab is a wealth of information. Not by itself, but it's very good at connecting dots. If you want to follow someone around, these things tend to be invaluable. You can, essentially, follow someone around without following them around, even retroactively. People rarely go from place to place randomly. They have destinations. If someone takes a cab from the airport and doesn't live in the area where he landed, it is likely that his destination is the place that he will stay in. After a flight, especially a long one, people want to get rid of their heavy baggage, take a shower, put on new clothing. So you can easily find out where someone stayed. Which becomes twice as interesting if the destination is not a hotel, because now you got another person to screen.
  This information by itself is not much. But as part of a bigger network it is something we'd have killed for back when I was still doing profiling.
  
  - Re: (Score:3)
    
    by AHuxley ( 892839 ) writes:
    
    Very insightful, Opportunist.
    With more nations trying to count passports in and out, a wealth of information about each person entering some countries is now being stored.
    From face recognition, gait analysis, 'free' wifi, a new/old phone being set up for cheaper local use, the random risk of a laptop being examined and cloned on entry and exit.
    If you want to rent a car you face a complex 'chat down' by the friendly on site rental staff.
    So you take the next random taxi.
    In the past along a long airport roa
  - - Re: (Score:3)
      
      by AHuxley ( 892839 ) writes:
      
      Has Joe Sixpack been seen near any anti war protests? Written to the press at a city, state or federal level? Given charitable contributions to a faith based group now under investigation? Have a security clearance? Have a family member with a new or old security clearance? Does Joe Sixpack travel outside the USA a lot?
      It's not just about being "much easier", it's about getting it all, having domestic staff feel ok about storing and sorting domestic details per person, being able to legally collect mor
    - Re: (Score:3)
      
      by Opportunist ( 166417 ) writes:
      
      The point is that you can't follow every Joe Random around all the time. But occasionally some Joe Random becomes a Joe Someone and you just wish you had the information that you could have if you just followed him.
      Scenario.
      You find out that there is someone you deem a nuisance to the powers that be. You finally caught him. But he doesn't talk. Imagine you're an entity that has access to a lot of information, either directly (because you have it) or indirectly (because you can request it). Using the CC inf
- Re: (Score:2)
  
  by dcw3 ( 649211 ) writes:
  
  I would love to have a glimpse at this. I bet we'd be able to find some hacks who frequently take extended routes to bump up their fares.
- Re: (Score:2)
  
  by msauve ( 701917 ) writes:
  
  Are his initials "NSA?"
  - Re: (Score:3)
    
    by viperidaenz ( 2515578 ) writes:
    
    After he discombobulated Agent Smith from the inside, Neo changed his name to incorporate all 3 identities.
    Neo Smith Anderson.
    - Re: (Score:2)
      
      by SpzToid ( 869795 ) writes:
      
      Brilliant. Of course. This just makes so much sense now.
- Re: (Score:2)
  
  by wiredlogic ( 135348 ) writes:
  
  Hacking? This man is obviously a terrist fer'ner. Get him to Gitmo in a rendition wagon ASAP.
