Open Data Needs Open Source Tools

Posted by Soulskill on Tuesday March 09, 2010 @01:29PM from the stop-trying-to-fork-reality dept.

macslocum writes "Nat Torkington begins sketching out an open data process that borrows liberally from open source tools: 'Open source discourages laziness (because everyone can see the corners you've cut), it can get bugs fixed or at least identified much faster (many eyes), it promotes collaboration, and it's a great training ground for skills development. I see no reason why open data shouldn't bring the same opportunities to data projects. And a lot of data projects need these things. From talking to government folks and scientists, it's become obvious that serious problems exist in some datasets. Sometimes corners were cut in gathering the data, or there's a poor chain of provenance for the data so it's impossible to figure out what's trustworthy and what's not. Sometimes the dataset is delivered as a tarball, then immediately forks as all the users add their new records to their own copy and don't share the additions. Sometimes the dataset is delivered as a tarball but nobody has provided a way for users to collaborate even if they want to. So lately I've been asking myself: What if we applied the best thinking and practices from open source to open data? What if we ran an open data project like an open source project? What would this look like?'"




Standards by Domain needed. (Score:4, Interesting)

by headkase ( 533448 ) writes: on Tuesday March 09, 2010 @02:03PM (#31416562)

High-level: Save your differences from day to day, BitTorrent those differences to others, and merge in the differences you get from others. Low-level: OMG, we used different table names.
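A minimal sketch of the day-to-day diff-and-merge loop described above, assuming each dataset is a dict keyed by a stable record ID. `normalize`, `diff`, and `apply_diff` are hypothetical helpers, the BitTorrent transport is left out entirely, and the field-rename map is one simple way to paper over the "different table names" problem before comparing records.

```python
from typing import Any, Dict

Record = Dict[str, Any]
Dataset = Dict[str, Record]  # stable record ID -> record fields


def normalize(data: Dataset, renames: Dict[str, str]) -> Dataset:
    """Rename fields so peers with different schemas can compare records."""
    return {rid: {renames.get(f, f): v for f, v in rec.items()}
            for rid, rec in data.items()}


def diff(old: Dataset, new: Dataset) -> Dict[str, Any]:
    """Changes that turn `old` into `new`: new/changed records plus deletions."""
    return {
        "upsert": {rid: rec for rid, rec in new.items() if old.get(rid) != rec},
        "delete": sorted(rid for rid in old if rid not in new),
    }


def apply_diff(base: Dataset, changes: Dict[str, Any]) -> Dataset:
    """Apply a peer's diff to a copy of `base`; `base` itself is not modified."""
    merged = dict(base)
    merged.update(changes["upsert"])
    for rid in changes["delete"]:
        merged.pop(rid, None)
    return merged
```

Only the diff travels between peers, which keeps the daily exchange small; anyone holding yesterday's snapshot can rebuild today's with `apply_diff(yesterday, diff(yesterday, today))`.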

Re:Already being done (Score:4, Interesting)

by Hurricane78 ( 562437 ) writes: on Tuesday March 09, 2010 @02:50PM (#31417172)

I've said this a thousand times before: Make Wikipedia a P2P project with no single point of control, and build a cascading network of trust relationships on top of it (think CSS rules, but applied to articles instead of elements, with one CSS file per user, perhaps including those of others). That solves every problem caused by central authorities, which would then no longer exist, and with them censorship.
The only caveat: people have to learn again whom to trust and whom not to. (Example of where this fails: political parties and other groups with advanced social-engineering, rhetoric, or mass-psychology skills, such as marketing companies.)
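The "one CSS file per user, perhaps including those of others" idea can be sketched as a cascading lookup. This is a hypothetical illustration, not an existing system: `TrustFile`, `resolve`, and the precedence rule (a user's own pick beats anything included, and includes are checked in the order listed) are assumptions made for the example, and real CSS specificity and ordering rules work differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class TrustFile:
    # article title -> ID of the revision this user vouches for
    picks: Dict[str, str] = field(default_factory=dict)
    # other users whose trust files this one imports, highest priority first
    includes: List[str] = field(default_factory=list)


def resolve(user: str, article: str, files: Dict[str, TrustFile],
            seen: Optional[Set[str]] = None) -> Optional[str]:
    """Return the revision `user` would see for `article`, or None if no one vouches."""
    seen = set() if seen is None else seen
    if user in seen or user not in files:
        return None  # cycle or unknown user: contributes nothing
    seen.add(user)
    tf = files[user]
    if article in tf.picks:
        return tf.picks[article]  # a user's own rule always wins
    for other in tf.includes:  # fall through the cascade in priority order
        pick = resolve(other, article, files, seen)
        if pick is not None:
            return pick
    return None
```

The `seen` set matters because trust files can include each other in a loop; without it the lookup would recurse forever. Whom you include is exactly the "learning whom to trust" the caveat is about.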

Re:Already being done (Score:3, Interesting)

by lennier ( 44736 ) writes: on Tuesday March 09, 2010 @08:25PM (#31421434)

I've said this a thousand times before: Make Wikipedia a P2P project with no single point of control, and build a cascading network of trust relationships on top of it (think CSS rules, but applied to articles instead of elements, with one CSS file per user, perhaps including those of others). That solves every problem caused by central authorities, which would then no longer exist, and with them censorship.
I agree wholeheartedly. If I understand correctly, this is very much like what David Gelernter [edge.org] is saying with his datasphere/lifestreams concept: a fully distributed system with no centre, where any node can absorb and retransmit its own view of the data universe. Twitter and its 'retweets' are a lame, struggling, misbegotten attempt to shamble towards this idea.
What would happen, I think, is that such a distributed Wikipedia would converge on a few 'trusted super-editors' who produced their own authorised versions (like Linux kernel forks or distributions), since the pressure to join a 'good enough' peer group would limit forking to where it's really necessary. And yes, separate political factions would probably emerge: a Mainstream Wikipedia, a Citizendium, a Conservapedia, an Encyclopedia Dramatica, a UFOpedia, a Treknopedia, each with its own idea of which subjects are 'noteworthy' and which sources are well attested. But that's fine, we have that already. What we'd win in a truly distributed system is not the ability to fork (which the GPL already gives us) but the ability to easily remerge, which is currently a real pain.
There's no reason, for instance, why Citizendium, TVTropes, Encyclopedia Dramatica, C2, MeatballWiki, etc. couldn't all share the same technical base and content, link to, import from, and export to each other, and just provide different editorial policies or views. And I think we'd all win hugely if we could bring that about.
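The "easy remerge" asked for here is essentially a three-way merge against a common ancestor, the same idea version-control tools use for source code. A minimal sketch, assuming each fork is a dict of record IDs to values and that the shared `base` version is known; `three_way_merge` is a hypothetical helper, and keeping our version on conflict (while flagging it) is an arbitrary policy chosen for illustration.

```python
from typing import Any, Dict, List, Tuple

Dataset = Dict[str, Any]  # record ID -> record value
_MISSING = object()       # sentinel: record absent on this side


def three_way_merge(base: Dataset, ours: Dataset,
                    theirs: Dataset) -> Tuple[Dataset, List[str]]:
    """Merge two forks of `base`; return (merged data, conflicting record IDs)."""
    merged: Dataset = {}
    conflicts: List[str] = []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:        # both forks agree (including both deleting it)
            result = o
        elif o == b:      # only their fork changed or deleted this record
            result = t
        elif t == b:      # only our fork changed or deleted this record
            result = o
        else:             # both changed it differently: keep ours, flag for review
            result = o
            conflicts.append(key)
        if result is not _MISSING:
            merged[key] = result
    return merged, conflicts
```

Records touched by only one fork, whether added, edited, or deleted, merge automatically; only records that both forks changed in different ways need a human to decide.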
