Biohackathon 103
wjv writes: "Open source Bioinformatics hackers from around the world are meeting in the
first ever Biohackathon to hack, eat, hack, sleep, hack... The South African
Business Day has the scoop, or see our weblog. The
event is co-sponsored by my employer
and O'Reilly. I'm typing this from the
hackathon, and you wouldn't believe the buzz... or the scenic venue!"
Re:South Effrica (Score:2, Insightful)
For what it's worth, I've been to South Africa and met lots of nice people there :-)
not the first! (Score:2, Informative)
This is a follow up to the first which was held
in Arizona.
Re:not the first! (Score:4, Informative)
The original intention was (I believe) that part one would be "talking" and part two "hacking". But as it happens, a lot more got done in Tucson than most attendees anticipated.
Only posted for 3 minutes... (Score:1, Funny)
Bah (Score:5, Funny)
Let them hack AI (Score:1, Troll)
If Biohackathon, then Technological Singularity. [caltech.edu]
A greater hack than this, no man hacks, except that he shall lay down his alife coding skills for the Immortal Artificial Mind. [scn.org]
All else is trivial. You have here and now on Slashdot received the call from History: Hack the Artificial Mind, [scn.org] or forever rue the lost opportunity.
Re:Let them hack AI (Score:2, Funny)
hacking (Score:2, Funny)
Re:hacking (Score:2)
FBI Agent #2: "Hey there, young man! Put down that equipment! Unauthorized cloning violates the DMCA!"
Ha! (Score:2, Funny)
Venue (Score:4, Interesting)
Re:Venue (Score:2)
-Chris Dag
Re:Venue (Score:1)
Racist or not, BRING YOUR OWN BANDWIDTH (Score:2)
All visitors will probably feel bandwidth deprived. Now you've
got 1 of 2 options:
1) Petition the SA Government on how they are depriving you
of your constitutional (very important word) right to bandwidth.
2) Bring your own pipe.
Unless your parents give you a bribery allowance, 2 will be
by far the most painless.
And for the last time: "NO! Zimbabwe isn't a province of
South Africa. And NO, Africa doesn't consist of 2 countries, North
and South Africa!"
the end.
Predators and Prey (Score:1)
So that's what they call it these days eh ? Biohacking ? Is that the english translation of Bukkake ?
(tasteless, I know.. bah!)
Re:Predators and Prey (Score:2, Insightful)
these twenty year olds are THE EXPERTS in
bioinformatics. This is overly pretentious to
say the least. I have worked for a bioinformatics
company for the last 10 years and would not say
that the world's top bioinformatics programmers
would actually be there.
This meeting is nothing but hype and crap!
Re:Predators and Prey (Score:4, Insightful)
As to being 'hype', would you prefer I3C?
Real artists ship.
http://bio.perl.org
'nuff said
Re:Predators and Prey (Score:1)
The problem behind the problem (Score:4, Informative)
The Biomedical Information Science and Technology Initiative [nih.gov], for the National Institutes of Health, says: Today the disciplines of computer science and biology are often too far apart to help one another. A computer-science student often stops studying other sciences after freshman biology or chemistry; a biology student, even one knowledgeable about computers, may not ever have had formal computer-science classes. Biomedical computing needs a better -- and more attractive -- meld of those disciplines. Today computer-science students have little incentive to learn about biomedicine. The barrier is not just the rigorous demands of computer science, it is also the relative rewards: The $50,000 to $80,000 a year that professional programmers earn makes the compensation associated with many research positions in biology laughable. This situation is even more risible when one includes the reality that staff positions on NIH research grants are guaranteed for no longer than the grant award.
This is a problem in every field of scientific computing but it is particularly acute in biology because of the bizarre and heterogeneous data set. Ultimately, the question is whether it is more efficient to teach a computer science student biology or teach programming to a biology student.
People who go into computer science typically do so because of fascination with the tools and techniques, not because they are interested so much in the data. The scientific mindset of the biologist might transfer to computer science much more easily than the mindset of the programmer transfers to biology.
The computer has the same fundamental status in biology as the microscope. Computer science in the form of bioinformatics should perhaps be as basic to the study of biology as organic chemistry.
Re:The problem behind the problem (Score:2, Interesting)
Bzzz. Too narrow. The real question is, how do you find smart, tech-savvy people and turn them on to the questions at hand? The class of people you want are the ones that never let school get in the way of their education. I'll argue that you even want to pull people who have a background in neither bio nor CS. Perhaps finance, physics or fluid mechanics, for example. Why? Finance folks do some wicked nasty statistics and modelling -- they eat and sleep stochastic calc. Physicists make their livings, fundamentally, by measuring and interpreting complex systems. Fluid mechanics deals with lots of non-linear systems
Re:The problem behind the problem (Score:2, Interesting)
My plan is to get a masters in bioinformatics, then perhaps look for a 50-80k job. In the meantime, i'll be happy paying for school, assuming i can find a good program, because it's something i want to study.
So, OT question for those who know: where are the best bioinformatics graduate programs? (my particular interest is in proteomics) And what should i consider while considering schools?
Re:The problem behind the problem (Score:2, Insightful)
I think the excerpt was saying that it's hard to tempt programmers making from 50-80k into being somebody's post-doc researcher, making maybe $20k a year, or less.
If you work professionally in bioinformatics, you will do much better, probably on par with being a programmer professionally. This guy was just pointing out that it's much harder to convince bright and well trained people to slave for nothing in the academic world, since their skills are still rare and in high demand. Since everyone's working for private corporations, nothing gets published, so the body of open research in bioinformatics increases only very slowly.
You can find a list of bioinformatics programs here: http://www.ib3.gmu.edu/courses/bioinfogradprgm.html [gmu.edu]
Re:The problem behind the problem (Score:1)
I find it telling that computer science is being equated with programming rather than problem solving. Many people go into computer science because they are fascinated with solving new problems, and devising new techniques, not just with the existing ones. It's these people that you want; people who can be creative about what can be done with your data, and what can be extracted from it.
Better yet, set up academic programs in a way that enables (and encourages) people with interests in both areas, and with minds inclined both for scientific discovery and for problem solving, to bring the disciplines together. The "ultimate" question
as to which set of people should be cross-trained in the other discipline is an old question that needs to be transcended by truly interdisciplinary programs. The NIH initiative is right that we have to think outside of traditional discipline boundaries (and potentially outside of the traditional lab organization as well, since these people will need to be treated according to the skills they can offer).
Teaching biologists how to program is not new. It's very useful for them to be able to automate basic techniques they would otherwise do by hand. But if you want them to be able to create the new computer techniques that are needed, they need to know a lot more about computer science than just how to program, and they need a more creative mindset to seek out and solve new problems. The efficiency of training is not the issue. Obtaining the needed results is.
Re:The problem behind the problem (Score:3, Interesting)
As much as bioinformatics tries to combine biology, computer science, and mathematics (which no one has mentioned yet but which has as much importance as the other 2 disciplines), they do stay quite separate with regard to the actual programs written. Imagine a biologist running a bioinformatics lab. He may come up with a problem that computers would be well suited to solving. So, does this biologist write the program himself? No. He tells the computer scientist who either works for him or is in collaboration with him what he wants, and the programmer programs it. Perhaps he has a mathematician there somewhere too to help out with the algorithms, but in the end he does no 'real' work himself except to come up with the idea.
Computer scientists, as you say, don't really care about the data and, per their training, are not able to think about biological processes with the same expertise a biologist has. Vice versa with the biologist. So, at some point you still need experts in each individual field, as opposed to trying to merge 3 disciplines into one.
I say this as a Ph.D. student in bioinformatics with a BS in biology and a very good computer science background. To be honest, my cs background is of much more use to me than my biology degree, since the biology we work in is specific (and thus easy to learn), as with most bioinformatics laboratories. Many people can write scripts to get the data they need, but where a good cs background comes in is the difference between a program running 3 weeks or 3 hours.
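To make the 3-weeks-versus-3-hours point concrete, here is a minimal Python sketch. The function names and the toy data are invented for illustration and do not come from any real lab code. Both functions find the reads that also occur in a reference list; the first compares every pair, the second hashes the reference list once and then does lookups.

```python
def shared_reads_quadratic(reads, reference):
    """Compare every read against every reference entry.

    Cost grows with len(reads) * len(reference).
    """
    shared = []
    for read in reads:
        for ref in reference:
            if read == ref:
                shared.append(read)
                break  # stop at the first match for this read
    return shared


def shared_reads_hashed(reads, reference):
    """Build a set once, then do one hash lookup per read.

    Cost grows roughly with len(reads) + len(reference).
    """
    reference_set = set(reference)
    return [read for read in reads if read in reference_set]


# Toy data, invented for this example.
reads = ["ACGT", "TTGA", "GGCC", "ACGT"]
reference = ["GGCC", "ACGT", "AAAA"]

# Same answer either way; only the running time differs as the inputs grow.
assert shared_reads_quadratic(reads, reference) == shared_reads_hashed(reads, reference)
```

On four reads nobody would notice the difference, but with millions of reads against millions of reference entries the pairwise version is the kind of script that runs for weeks.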
Re:The problem behind the problem (Score:1)
So in your opinion, coming up with the idea isn't real work? I could not disagree more. Biology domain knowledge drives bioinformatics. Your project may be "specific", but I'll bet it is specific only because of a comprehensive understanding of the problem by at least one biologist.
As far as using CS expertise, that depends on the problem and the applicable skills of the biologist. I have a doctorate in biology, but no formal training in CS and yet I have been able to write my own code and/or modify open source tools which run in a high throughput, clustered linux environment constructed by
Do I value CS, math, and statistics professionals? Of course! I rely on open source tools and the linux movement as much as the biology community. I don't pretend that I invented domain knowledge in biology, computer science, or math. I suppose my best "skill" is to put my pride aside and ask questions, read, and learn from multiple communities. Depending on the community, I ask as a newbie, novice, or expert. I believe in the value of sharing to such an extent that I started a bioinformatics interest group [cvbig.org].
I say this as a bioinformatician working in industry who spent two winters of his career stuck in a remote part of Switzerland with nothing but a laptop and an early release of slackware.
Re:The problem behind the problem (Score:1)
Re:The problem behind the problem (Score:1)
I recommend that you:
Re:The problem behind the problem (Score:1)
That's why my current bioinformatics grant application contains a position each for a biology postdoc and a research programmer, plus myself, a biologist with a decade of computing under my belt. The postdoc will explore data analysis, prototype new applications and remain focussed on the biological questions, the programmer will generalize and componentize the prototyped applications and write new ones from scratch, plus s/he'll make sure that we store and treat the data correctly. Myself, I'll bridge the gap between bench scientists and our team, try and keep our sight on the forest and not on the trees and align our efforts with other similar teams.
Efforts such as these require multidisciplinary teams. There's simply no way that an individual can cover all aspects adequately, even if we try hard. We need to make sure, however, that all team members are on the same plane, understand what is going on and are working toward the same goal.
Re:The problem behind the problem (Score:2, Insightful)
As a student pursuing Bioinformatics at UCLA, I must agree with a previous reply saying that your statement is a bit too narrow. However, I'll go so far as to say that we need to do both at the same time. I believe it was either Dr. Fox or Dr. Eisenberg who said that no one can know every part of the problem at the same time. One *MAJOR* problem I've run into is that biologists don't state many of their assumptions about a biological system when speaking to a non-biologist, while mathematicians and CS people don't know enough about biology to understand that any solution to a bioinformatics problem needs to have "biological relevance." From what I can see, this means it needs to conform to all the unspoken presuppositions that biologists and biochemists take for granted. This is not insignificant.
As my boss Parag Mallick pointed out, either we (math/CS people) come up with a solution that's formally correct but takes longer than the biologist's hacks that approximate the system; or we find the solution to a problem that no one cares about.
Another major barrier is the vocabulary. There's a reason that mathematicians and chemists are often at each other's throats. The math they use is the same, but the vocabulary and notations are very different, so it's like working in a foreign language. Take this to the nth degree when you're a mathematician listening to a biologist give a presentation. The breadth of knowledge is huge, even though the depth to any one part of biology can be rather shallow (relatively).
By comparison, it's brutal to watch the biologist tackle a probability/statistics class. Don't even bother with a topology/measure theory class. The breadth of knowledge isn't that big, but the depth of the theorems is massive. I've noticed that this leads to an "if it doesn't look like a tool I already know, then I'm not going to use it" attitude among many biologists. After remembering my bouts with algebraic topology, combinatorics, and advanced linear algebra (and listening to myself when I try and explain my Newest Idea(tm)), I can't blame them.
So we need education in both directions. The problem is way too big to be tackled by any one mindset.
Just my 2 pesos.
Re:What kind of... (Score:1)
From what I have read and inferred, they wanted to get together in an awesome atmosphere, hack, brainstorm and have fun. To see what they can accomplish.
Formal methods are great, but it needs to be understood why they are being used. With large amounts of people, it starts getting harder to do things without formal methods. Anarchy does not work well on a large scale.
And maybe as their projects start getting more defined, formal methods are going to be applied to help create a better product, which I think makes sense. But they just want something to play with now. And have some fun in the process.
Some people consider working to exhaustion with a group of friends towards a common goal fun. I consider it more fun than sitting in a cube coding from 8 to 5. But that's me. =)
I agree the software might be better if a more sane design and schedule were used and I believe that will come eventually.
But why not let them have a little fun in the process?
Re:What kind of... (Score:1)
The object with bioinformatics is to get answers, not necessarily to create tools (unlike many more traditional applications).
Some history from someone on the periphery.
In the beginning was data and a bunch of programmers in different places with different skills asking different questions.
Toolkits were developed with different aims in mind:
BioPerl is a programmer's toolkit in Perl with strong links to the Ensembl project (heck, it IS the Ensembl project).
EMBOSS (not represented at the biohackathon unfortunately) is an application oriented toolkit in C.
Likewise biojava, biopython etc.
Each of these projects has its own strengths and weaknesses but has reached an appropriate level of maturity where it has become apparent that working together is essential for the best future development of all the projects.
The aim of the biohackathon is to bring together sufficient coders for sufficient time to allow a lot of the minor (and most are minor) differences and interoperability issues with each particular toolset to be ironed out so that they can each leverage each other's strengths.
With the dispersed nature of the coders (many, many continents) it is fantastic that O'Reilly and Electric Genetics have sponsored this get together. Those of us who have to answer new questions on a day to day basis are extremely grateful for the toolkits provided, and the ability to hack them to suit the task in hand.
There is a lot of iterative engineering going on. If we were to sit down and design to the nth degree then we would still be waiting for a standard model of DNA in a computer..
Instead we write poor code that gets some results. Others pick it up and make it better because they need to. I can think of thousands of lines of code I didn't need to write because of these projects. This equals results faster equals new therapies faster.
We are not writing wordprocessors here.
..d
Re:What kind of... (Score:4, Interesting)
You wouldn't believe the lack of anarchy among these people. They sound young, but there is a lot of personal discipline in that room.
The best product is the one that is tested and evolves with that experience - and this is working code, used in anger by the human genome project.
Hey, check out http://www.ensembl.org and see what you think.
That's one of the best user interfaces i've seen.. (Score:2)
I'd throw into the mix the very important software component design book, "Design Patterns: Elements of Reusable Object-Oriented Software" [hillside.net] [ISBN 0-201-63361-2]
Wanted: Biohacker Help (Score:4, Interesting)
My main area of concern however is the lack of good tools to take the raw data from sequencing machines and produce genotypes. Most software available is vendor specific, closed source, not very robust and extremely expensive. The closed source vendor specific solutions which are available lock up the data in proprietary databases, making it difficult to migrate to equipment from other vendors in the future and to get the data organized for many projects. All the software (including the few open source projects that exist) that I have evaluated has the same basic flaw, it starts with the premise that the lab will use them to screen against an existing database of genotypes (for disease or pedigree). These are fine medical applications (for which they were developed) but does not address the needs of the basic research laboratory which is working to discover the genotypes to begin with.
I would like to build an open source application that gives the user the freedom to choose the data collection platform, the freedom to move the data from one application to another and the freedom to improve and expand the application itself. I face two challenges: 1) Administration that says "open source, why would we want to use shareware". This one I'm addressing by building the information infrastructure using Linux. 2) Finding qualified programmers that would like to work on the project. (I'm a pretty good admin, but am not a programmer).
The need for this work is great. In talking with other people in my field, I've found that the key thing they want to know is what software you are using to do the raw analysis. No one is satisfied with the current situation, but most of these are old school and don't know anything about open source software. I've shown them that we can use existing open source software to run the lab. I'd like to show them that we can develop our own software to do some of the basic work. Any volunteers?
Why not NCSA? (Score:2)
I'm a manager with a large company that used to be an NCSA partner. One of the things they loved to demo was a biology workbench, so they do open collaborative stuff. Now, I'm a mathematician by training, not a biologist, so I don't know if this was just demo-ware or not, or even connected to what you do.
Anyway, I worked on projects with NCSA a few years ago and they were building a lot of great tools for fundamental research. Perhaps this is no longer the case. But it's surprising that your administration can't get with the CS administration and learn that open source is, in fact, a good thing.
I'd volunteer, naturally, but I've been to central Illinois in the winter before and nearly froze to death
Re:Why not NCSA? (Score:1)
I'd volunteer, naturally, but I've been to central Illinois in the winter before and nearly froze to death
Hey! It snowed here yesterday for the first time in YEARS! It's blowing around a bit now, but I don't think the weather is all that bad here.
Bioinformatics.org.... (Score:3, Informative)
Re:Wanted: Biohacker Help (Score:1)
Additionally, I would like to be able to write a "Setting Up a Genetic Marker Lab Howto" using only open source software. If I can do this, other research facilities (particularly those who have little funding) will be able to replicate the setup and do this work in their own countries, giving them more control of their food supply.
Maybe I'm missing something but... (Score:2, Interesting)
The front-end can be locked and proprietary and you can point it at any database you need. I would be skeptical that the software you use doesn't allow even this (although I know bastard companies like this exist =). It seems trivial to program a frontend that does the number crunching based on queries from a relational database...I would suspect that organizing the data would be the hard part. Maybe I should finish my Perl for Bioinformatics book before I oversimplify. =)
Jayson
Re:Maybe I'm missing something but... (Score:2, Interesting)
You are right in that we do want the genotype data stored in a relational database. The problem is getting to the data we want to put in the database.
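For the database half of that, a minimal sketch of what a genotype store could look like, using Python's built-in sqlite3 module. The table names, column names and sample values are all hypothetical, invented for this example; they are not taken from any existing lab system, and the hard part described above (getting clean calls out of the raw sequencer output) is not addressed here.

```python
import sqlite3

# In-memory database for the sketch; a real lab would use a file or a server.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per (sample, marker) pair with two allele calls.
conn.executescript("""
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE marker (
    marker_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE genotype (
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    marker_id INTEGER NOT NULL REFERENCES marker(marker_id),
    allele_1  TEXT NOT NULL,
    allele_2  TEXT NOT NULL,
    PRIMARY KEY (sample_id, marker_id)
);
""")

# Made-up example rows.
conn.execute("INSERT INTO sample (sample_id, name) VALUES (1, 'plant_001')")
conn.execute("INSERT INTO marker (marker_id, name) VALUES (1, 'marker_A')")
conn.execute("INSERT INTO genotype VALUES (1, 1, '152', '158')")
conn.commit()

# Read the genotypes back with human-readable sample and marker names.
rows = conn.execute("""
    SELECT s.name, m.name, g.allele_1, g.allele_2
    FROM genotype g
    JOIN sample s ON s.sample_id = g.sample_id
    JOIN marker m ON m.marker_id = g.marker_id
""").fetchall()
```

Because the data sits in plain tables rather than a vendor format, any front end that speaks SQL can read it, which is the freedom to move data between applications asked for in the original post.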
Re:Maybe I'm missing something but... (Score:1)
Okay, so the goal is to take raw data, which (from the best info I could gather) is sketchy, then use imperfect techniques up to and including human intervention to dump the data into a relational database. Sounds like the primary qualifications in the programmer that you will need are equal quantities of patience and altruism. =)
Jayson
Re:Wanted: Biohacker Help (Score:2)
Zoot
Re:Wanted: Biohacker Help (Score:1)
Hmm. What exactly are you needing? Something that speaks a particular proprietary protocol with the device? Is the bus some standard? I've done a lot of RS422, RS232 and Current loop stuff with various types of actuators and sensors. And protocols from PLCs to Opto22 and Sutron Data Language RTUs as well as bizarre Microcontrollers of various flavors. Is this protocol documented? Typically debugging the driver will take access to the hardware and some sort of appropriate monitor to watch the communication. The rest is pretty trivial.
Or is this something higher level than a device driver?
Or both.
I may not be able to spare enough time for this. I'm sure that if I can't get to this somebody will if the answers to those questions are clear.
Links to additional press coverage (Score:2)
To date we were the Feature National story in Business Day: LINK [businessday.co.za]
Computer Week online - top story on their home page today: LINK [computerweek.co.za]
sa.internet.com picked up the press release and covered it: LINK [internet.com]
Some additional media interviews were given today so there will likely be additional coverage. It's nice to see the press get most of the details correct :)
Long trip (Score:2)
Re:Biohacking Conference-not 31i+3 (Score:1)
Learning Bioinformatics (Score:2)
Like most of us here, i've got plenty of
computer programming skills. Plus i've also
got a degree in Physics, but what i like is
much Biology or Biochemistry apart from the
basics like DNA, the base pairs and amino acids, what do i need to learn to become useful in
bioinformatics?
Re:Learning Bioinformatics (Score:2, Interesting)
i guess it depends on what it is you really want to do in the field..._very_ basically, there are two areas in bioinformatics: 1. the programmer who creates (possibly enterprise-level) tools as directed by the needs of the scientists and, 2. the bioinformatics researcher/scientist who also develops tools at need, but also analyzes the data and makes conclusions and uses those conclusions/interpretations to guide wet lab work. and then, the results from the wet lab work come back to the bioinfo scientist who then incorporates the data to refine their ideas or to develop new ones which then go back into the lab. it's a very nice positive feedback loop when it works.
i fall in the latter category, which i like to call "genome hacking." the programming focus is to get the data and process it rather than making a tool that looks pretty, is user friendly, etc.
what i have found most useful in this regard is an extensive background in molecular/cellular biology (i have ~10 years of wet lab experience interspersed with my bioinformatics work (i've been full time bioinfo since '95/'96)). since molecular/cellular biology data is inherently noisy, i find that experience actually working with it and interpreting it has a profound impact on how i do my computational research as not only do i know what the wet lab is capable of doing, but i am also able to analyze wet lab data and make informed decisions based upon it...many times, this noise i spoke of has a story to tell...and sometimes it does not. it is experience that allows one to make the differentiation.
as to the type 1 bioinfo type, i always think that it is a good idea to have a working knowledge of the type of data you're processing--not just the form the data take (ie this is a text file, this is an image, etc.), but rather "this is a DNA sequence that may have errors in it and i need to be aware of that and know the types of errors that can occur so that i can include provisions for that." of course, it's more complicated than that, but i think you get the idea. of course, the best way to learn this is by doing...reading some basic molecular biology texts wouldn't hurt either. ;)
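a rough python sketch of the "be aware that the sequence may have errors" idea. the function name and the test string are made up for illustration; the only fixed part is the standard IUPAC nucleotide alphabet, where letters like N stand for an uncertain base call. real pipelines deal with far more than this (quality scores, indels, contamination), so treat it as a starting point only.

```python
# IUPAC nucleotide codes: the four bases plus ambiguity symbols such as N (any base).
UNAMBIGUOUS = set("ACGT")
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")


def check_dna(sequence):
    """Return (ambiguous_positions, invalid_positions) as lists of 0-based indexes."""
    ambiguous, invalid = [], []
    for index, base in enumerate(sequence.upper()):
        if base in UNAMBIGUOUS:
            continue
        if base in IUPAC_AMBIGUOUS:
            ambiguous.append(index)  # legal, but the base call is uncertain
        else:
            invalid.append(index)    # not a nucleotide code at all
    return ambiguous, invalid


# Made-up read with two uncertain calls (N) and one junk character (X).
ambiguous, invalid = check_dna("ACGTNNACXT")
```

a tool that silently drops or rewrites those positions hides exactly the noise that sometimes has a story to tell, so reporting them and letting the scientist decide is usually the safer default.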
james
Re:Learning Bioinformatics (Score:2, Insightful)
I'm a computer science student, and i am enrolled in a coop program. Having basically no knowledge in biology, I have been able, within 3 months of work, to learn everything i needed and to produce enough results/data to create 3+ M.Sc./Ph.D wet lab projects.
Re:Learning Bioinformatics (Score:1)
Re:Learning Bioinformatics (Score:2, Insightful)
as i said, it depends on what you want to do and what you want to get out of it. anyone with a narrow view and some programming abilities can generate data. it's the interpretation of the data that gets the discoveries...
I'm a computer science student, and i am enrolled in a coop program. Having basically no knowledge in biology, I have been able, within 3 months of work, to learn everything i needed and to produce enough results/data to create 3+ M.Sc./Ph.D wet lab projects.
good for you (i really mean that!). but can you analyze the data you generated and make useful conclusions that further the understanding of that particular project/field?
what you describe falls into the type 1 category i mentioned...and don't get me wrong--it's a very important category. it's just not what excites me...i'm more about having a relevant biological question (what is the function of gene x? is gene y involved in disease z? etc.) and then figuring out what tools i need (and create them if they're not already extant) and what data sources i need to mine. i can pump out gigs of data, but, in my mind, it's all about what useful information one can glean from it, not about how much you can produce. (and i'm not saying that that is what you were saying)
the more projects you become involved in, the more your knowledge base grows...be sure to have the scientists you're working with explain the _entire_ project to you so that you can see the bigger picture of what they are trying to accomplish...and maybe you'll be able to bring a different way of thinking to the project...it's not all about 1's and 0's.
james
Re:Learning Bioinformatics (Score:1)
Julien
Re:Learning Bioinformatics (Score:1)
well, that's great, then! the more and varied projects you get to work on, the greater your knowledge base will become and the more connections you'll be able to make in the data. good luck!
james
Re:Learning Bioinformatics (Score:2, Informative)
A great resource is the National Center for Biotechnology Information's website at http://www.ncbi.nlm.nih.gov/ [nih.gov].
It houses genomic/protein data, tools, and pubs related to the field.
Biohackathon? (Score:5, Funny)
*groan*
Biohackathon?? (Score:3, Funny)
Bioinformatics as a student (Score:2)
I'm pursuing a CS and Biochemistry double undergrad degree right now. I might not actually be able to graduate with both degrees acknowledged by my university, but I will have taken all their courses.
In two years I will graduate, and move on to the next thing. I am still unsure as to whether I should go on to graduate school, or try to find employment in the bioinformatics field after undergrad.
I have two questions for people who have been working in this field. Should I go to graduate school? Where? And is there anything that I can play around with software-wise, etc, that will give me some practical experience in this area. // I just looked over at bioperl today, haven't installed it yet.
Thanks, rofgile
Re:Bioinformatics as a student (Score:1)
2) Play around with multivariate clustering analysis techniques. The software is lab/project dependent.
Re:Bioinformatics as a student (Score:1)
bah! can you tell i disagree?
james
Re:Bioinformatics as a student (Score:1)
see http://bioinfo.mshri.on.ca/tkcourse for a tutorial
The toolkit is what much of the NCBI site is built on, including stuff like Genbank, BLAST and Cn3D.