
Making Science Machine Readable 135

holy_calamity writes "New Scientist is reporting on a new open source tool for writing up scientific experiments for computers, not humans. Called EXPO, it avoids the many problems computers have with natural language, and can be applied to any experiment, from physics to biology. It could at last let computers do real science - looking at published results and theories for new links and directions."
This discussion has been archived. No new comments can be posted.


Comments Filter:
  • by plover ( 150551 ) * on Wednesday June 07, 2006 @11:22AM (#15487509) Homepage Journal
    It's virtually hopeless to try to find information about EXPO on Google. You've got the Home Depot Expo site, you've got E3, Macworld Expo, Linuxworld Expo, Book Expo; expositions seem to be coming out of your ears, and if you try to qualify it with helpful keywords such as science and/or language, it seems that every elementary school is hawking their science expos, in addition to documents from historical expos going back to the 1970s and possibly even earlier!

    And forgive me for thinking the university would be more helpful, but no, there's been a series of expos at the University of Aberystwyth, from art through VoIP.

    I'd love to have found more info on the language, but my casual browsing got stopped right there.

    If they'd named it something like EXPI or EXPLO at least it'd be uniquely locatable. Google might whine about the potential misspelling of Expo, but it would dutifully locate the search term as requested.

    • Just look by the name of the authors: Ross King and Larisa Soldatova.

      I personally knew Ross from his time in Mike Sternberg's lab, and have only high praise for his intellectual abilities.
      • That's all well and good for you, but when you're not personally acquainted with Ross and are simply trying to do your job and get your research into EXPO format, your first entry into Google is not going to be "expo king soldatova ontology". Trust me on this one.

        With any luck, there will eventually be tools to use the language that will have their own names, and we can hope those will serve to disambiguate EXPO.

        • trying to do your job and get your research into EXPO format, your first entry into Google is not going to be "expo king soldatova ontology". Trust me on this one.
          True, but it's also true that most real scientists have research skills that exceed typing one word into Google. It may be an issue, but probably not that much of one.

    • 'Tis a good point. But a search for 'expo science ontology' (without the single quotes) brings up a little bit. Here [u-tokyo.ac.jp] is a pdf of a presentation on EXPO that explains a bit more than TFA.
    • Have you tried to search for `LaTeX'? ;)
    • Here is a PDF [u-tokyo.ac.jp] of one EXPO presentation.
    • It's almost as dumb as a browser named links, or a programming language named C.

      I don't disagree with you; however, if EXPO becomes popular, it probably won't remain hard to find for long.

      As far as I can tell, there just isn't much information out there about it. Even using authors' names and lots of keywords, I can't find much of anything except a single pdf of conference slides (which are totally useless without the accompanying audio.)
    • You're not finding anything because there's not much there. At the moment, from what I can tell, it's just a single XML file with a .OWL extension - a description of a file format. Now the New Scientist article may have more or perhaps they just haven't released more yet, but the reason you're not finding stuff is that there's not much to find on the web just now.
  • XML? (Score:1, Insightful)

    by flumps ( 240328 )
    What's wrong with XML, goddammit?
    • It's too verbose to be easily edited by humans?
      The problem is how to get the writer to submit the research result in XML format.

      >King admits that for the moment using EXPO is time-consuming because
      >experimental write-ups must be translated by hand.

      The critical point of this problem has not been solved: researchers have little motivation to submit their results in a publicly standard format.
      • Wouldn't the obvious solution be to write a User-Friendly UI (front-end) that computer-illiterate scientists could utilize ... and then the front-end is designed to spit out XML?
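That division of labor is easy to sketch: the scientist fills in plain fields and the front-end emits the XML. A minimal sketch using only the standard library; the field names are hypothetical, not EXPO's actual vocabulary:

```python
# Sketch of a front-end's output stage: the scientist supplies plain
# fields, the tool emits XML behind their back. Field names here are
# invented for illustration, not taken from the real EXPO schema.
import xml.etree.ElementTree as ET

def experiment_to_xml(fields):
    """Serialize a flat dict of experiment metadata to an XML string."""
    root = ET.Element("experiment")
    for key, value in fields.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_doc = experiment_to_xml({
    "hypothesis": "Compound X inhibits enzyme Y",
    "factors": 2,
    "method": "spectrophotometry",
})
```

A real tool would also validate the fields against the ontology before emitting anything, but the point stands: the author never has to touch angle brackets.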
    • Re:XML? (Score:2, Redundant)

      by Marc2k ( 221814 )
      Exactly [sort of]. All the while I was reading this article, I couldn't shake the feeling of: this faces some of the exact same [hard] problems faced by the Semantic Web. This is a great way to get computers to understand the semantics of scientific experiments...when everyone's using the format, which probably isn't as expressive as is necessary in all cases, and invariably isn't bulletproof yet. It's that same chicken-and-egg problem, where the only benefits to using this system are seen when a large numb
      • Better, let's see if they can input a large corpus, then do any real reasoning with it.

        Heck, take 30 papers on some topic, and produce anything we don't already know.

        My co-worker is playing with OWL/etc. I'm still skeptical about it, but we'll see...

  • But what happens if we get to the point where all of science is automated by computer? I think that one of the most "endearing" qualities (if you will) of science is the possibility of human error. (ps. lawl, my first slashdot post)
    • Not all of science can _ever_ be automated by computer (at least not until actual AI comes along)

      Science, especially the pure sciences, needs a lot of intuition and, many times, an understanding far above and different from that of others.

      It is impossible as far as computers are concerned... (unless self-aware self-modifying programs come along??)

      This will help in routine checks and scientific experiments.. that is all.
    • But "endearing" isn't a quantifiable measure of science. Neither is "cute", "ugly", "republican", "democrat", "conservative" or "liberal".

      Science is supposed to be about facts. If the machine can produce them without bias, I should think that makes the output more reliable (yes, I know you can only trust it as far as the input.) But by automating the process, it introduces "repeatability" which is always a good thing.

    • What we need to do is to hand over religion to computers. That way, we won't have to deal with it any more. They can just run it in the background as a time-wasting task. And the simplicity of the program is beauty itself. Just an unending stream of divide-by-zeros, followed by traps which the computer ignores, see, and then just picks up where it left off. Has to be run at a low priority, but that just adds more realism...
    • Can't happen.

      Trawling through data and pulling out correlations is only one part of science. It's an example of something that might be automatable. But there are many other things that cannot and will not ever be done mechanically -- unless you have a true AI.

      There's too much creativity required in science, and creativity isn't something that's programmable. Computers also aren't naturally curious, and thus will never do any real `discovering' on their own. In short, they have no initiative; thus they will alwa
      • But if all the calculations are computerized, isn't it conceivable that the evolution of human understanding will come to a roadblock, since most of the calculation is now being done by computer?
        • It didn't happen with the abacus, it didn't happen with the slide rule, and it didn't happen with calculators. I doubt it'll happen now.

          As long as we can follow the trail of calculations from beginning to end, there's still the ability to understand what's happening.
          • Are the abacus and slide rule really viable examples? I'm talking about computerized calculations. And this is on a whole different level than a calculator.
            • How do you think the mechanism behind doing calculations could possibly stop the evolution of "human understanding"? And how is having a computer do the calculations qualitatively different from having a human do them, except that the computer (if programmed correctly, of course) doesn't make mistakes?
            • Actually the slide rule is fairly hard to understand. In some ways it's just as unintuitive as a computer.

              You can teach any idiot to use a slide rule, and with a few tries they'll realize that it gives them the correct answers; likewise, you can teach someone how to type things into Mathematica and they'll shortly realize that the answers it gives are usually correct -- but in both cases you could easily spend a semester explaining how the machine gets the answers for them. In the case of the slide rule, yo
                likewise, you can teach someone how to type things into Mathematica and they'll shortly realize that the answers it gives are usually correct -- but in both cases you could easily spend a semester explaining how the machine gets the answers for them

                And then, even though the 'answer is correct' as you say, it's still utter nonsense. Just because you plug numbers into a formula and get an answer that's mathematically correct doesn't mean you applied the correct test.

                For example, apply a chi square test to a ve
    • But what happens if we get to the point where all of science is automated by computer?

      We get a Technological Singularity [wikipedia.org].
    • But what happens if we get to the point where all of science is automated by computer? I think that one of the most "endearing" qualities (if you will) of science is the possibility of human error. (ps. lawl, my first slashdot post)

      As someone doing lab work on a day to day basis, I can assure you that the possibility of human error is anything but "endearing".
  • ok .... (Score:5, Funny)

    by icepick72 ( 834363 ) on Wednesday June 07, 2006 @11:23AM (#15487519)
    Let's look at one simple English-speaking human scenario:

    Human: No Computer, Do NOT launch missile now.

    Computer: Parsing input ...
    Computer: NOT, NOT (launch missile now)

    Computer: Launch initiated ....

    • Re:ok .... (Score:2, Insightful)

      by ch-chuck ( 9622 )
      Just the spot for an observation - I think the problem with 'double negatives' has to do with emotional versus logical thinkers. Emotional, or romantic types, see an extra negative as a cumulative emphasis - using a negative twice means a more forceful 'no' than just one. Logical, or classical types, see it as canceling like a mathematical operation. Of course it's not always that clear cut with lots of exceptions, as even an emotional type will read 'not false' as 'true', etc.

      • Re:ok .... (Score:3, Insightful)

        by Valar ( 167606 )
        It also depends heavily on language. In many languages, repeated negatives are explicitly used to emphasize the negative nature of the phrase. Negatives were even used this way in English until its modernization.
      • It also depends on punctuation. "Not! False!" for example, means false. Not false! means true.

        The original example "No computer, do not launch" means do not launch, even grammatically.
    • "Do not launch virus attack now..."

      In Real Life(tm), which was not documented in the survey, the Windows box would be down for a fair while for each virus attack, to say nothing of data randomly distributed to other email users etc, and to say nothing of the days the freakin' thing spends off-line having the disks scraped off and reinstalled to eliminate the inevitable Windows followers, the viruses, spyware, yadda yadda. Oh, yes, and the licence server spitting out a network card usually does a fair job of
  • hmm.. (Score:2, Funny)

    Unfortunately, a machine won't look at something and say "Should this be done?" A human-free world is very pretty, but rather dull. Thermonuclear destruction: hypothesis proven. But where can I get a good drink, and dance with a pretty girl?
    • I take it you have never headed up a team within an international corporation or law firm, then.
    Unfortunately, a machine won't look at something and say "Should this be done?" A human-free world is very pretty, but rather dull. Thermonuclear destruction: hypothesis proven. But where can I get a good drink, and dance with a pretty girl?

            The danger, though, is when the pretty girls get such machines and decide that thermonuclear destruction is a pretty damned good alternative to dancing with us.
    • 1. Find Sarah Connor
      2. ??
      3. Profit!!1!
  • deduction (Score:3, Funny)

    by COMON$ ( 806135 ) on Wednesday June 07, 2006 @11:23AM (#15487528) Journal
    After which the computers deduce they were actually not created but rather evolved from a lesser society of "users". Sorry, had to make the joke; we all saw it coming :)
    • I thought that the whole "intelligent design" thing was concluded with the following results:

      The First Day : The first recorded Words of Babbage that we have are "let there be electron flow"

      The Second Day : The separation of silicon from the sands.

      The Third Day : The first appearance of the wafers.

      The Fourth Day : With the platform now clear, the OS, UPS and HUB were visible.

      The Fifth Day : Great numbers of 0's and 1's flickered and Turing

      The Sixth Day : Vast numbers of programs beca

  • by Mr Pippin ( 659094 ) on Wednesday June 07, 2006 @11:27AM (#15487561)
    Wow! Now all that past work on Artificial Stupidity has REAL uses.

    http://www3.sympatico.ca/sarrazip/nasa.html [sympatico.ca]
  • by Jonboy X ( 319895 ) <jonathan.oexner@ ... u ['lum' in gap]> on Wednesday June 07, 2006 @11:29AM (#15487575) Journal
    The article is kind of unclear. What exactly does EXPO do? At first it seemed to me that the system helped translate the more-or-less natural language format of your average scientific experiment writeup into some other more machine-parsable format, but then I saw this at the bottom of the article:

    King admits that for the moment using EXPO is time-consuming because experimental write-ups must be translated by hand.


    WTF? If you have to manually pre-parse every article that enters the system, it severely limits the rate you can enter information into the database, no?
    • Try and keep up. The whole point of EXPO is that computers can't parse a scientific article written in human language. If you could write a piece of software that could parse the original article, there would be no point in having EXPO. If everyone starts using EXPO, both for new papers and going back through old ones, you will quickly develop a database that can be used to help streamline future research.
    • Just read the sentences following the quote. It is the same idea that lies behind RSS: the author is responsible for providing results in EXPO format.

      For automatic data mining from scientific papers, check the leading software on that matter (disclaimer: it is a plug):

      http://ariadnegenomics.com/technology/medscan/ [ariadnegenomics.com]

      Currently works for biology, but it is expandable.
    • Think about that a little more. If the original write-up is ambiguous, and the goal is to express the write-up unambiguously, how do you expect the software to interpret the source material if it's ambiguous to begin with?

      The way I understand this is that it is simply a write-up format. It's not meant to make your write-ups unambiguous by itself, just provide a format in which you can do so.

      Think of it as similar to EBNF for syntax. Software doesn't exist to read a specification and deduce the EBN

    • WTF? If you have to manually pre-parse every article that enters the system, it severely limits the rate you can enter information into the database, no?

      Yes, but it does make future iterations of the same experiment faster. In order to be valid, experiments must be reproducible. Translate once, use many.
  • Wow...getting a machine to write up your science experiments! Excellent...now all I need is to find one that can type my essays, and show its working in my maths, and I'm sorted! Is this the new era of generating scientists from everyone?


    I need one to clean my clothes, sing to me in the bath, and make sure my house is warm when I come home! Hehhe! Who needs wives...we have UBER_MACHINE
    • now all i need is to find one that can type my essays

      Try a random paragraph generator [watchout4snakes.com].
      • I tried with "compiler" and "gcc", and this is what I got:

        Compiler vanishes past the transmitted skill. The imperative walks Compiler across the wrecker. Compiler milks GCC. The scarce symptom reverts throughout GCC. Compiler breaches a coin behind the uncertain knight.

        GCC undertakes Compiler. Why does GCC base the amazed supplier? Compiler blames GCC under the psychologist. Compiler bangs GCC against the mercury. The producer strikes Compiler. GCC fingers Compiler.

        Very funny, but it has a long way to go.
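A toy version of such a generator needs nothing more than word pools and random choice (the pools below are borrowed from the output above; the real site's grammar is no doubt richer):

```python
import random

random.seed(1)  # fixed seed so the nonsense is at least reproducible

# Word pools, loosely borrowed from the sample output above.
SUBJECTS = ["Compiler", "GCC", "The producer"]
VERBS = ["vanishes past", "breaches", "undertakes", "milks"]
OBJECTS = ["the transmitted skill", "the wrecker", "the psychologist"]

def random_sentence():
    """Fill a fixed subject-verb-object template with random words."""
    return (f"{random.choice(SUBJECTS)} {random.choice(VERBS)} "
            f"{random.choice(OBJECTS)}.")

paragraph = " ".join(random_sentence() for _ in range(3))
```

The output is grammatical by construction but meaningless by construction too, which is rather the point the parent comment is making.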

  • by w33t ( 978574 ) on Wednesday June 07, 2006 @11:34AM (#15487630) Homepage
    I think that computers have actually been able to do real science for at least a little while already.
    John Koza [popsci.com] is a leader in the field of genetic and evolutionary computation. His computers very much do real science. The computers analyze a set of data (observation), make a series of modifications (hypothesis), run fitness tests against these modified versions of the data (experiment), then begin again by analyzing the results (back to observation).

    The computer clusters which John Koza has engineered have created high-pass and low-pass filters when given nothing more than a random assortment of electronic components, even though John himself knew too little about electronics to have created such a circuit himself.

    Most impressive is how the computer cluster evolved a new antenna for NASA. When it was completed, John worried that the computer had made some grievous errors because the little antenna looked like a bent paper clip - but it worked!

    And that's science if you ask me. Especially the antenna - the results of experiments can, and seemingly often do, go against "common sense" and give answers which are "unintuitive".

    Perhaps computers will be much better with the next generation physics we're discovering. Perhaps our little numerical darlings are simply better suited to deal with the abstract, multi-dimensional world of what the universe is starting to appear to be.

    (Please pardon my lay and simplified version of the scientific method - but I feel it is a valid interpretation (if overly simplified for minds such as mine ;) )
    --
    Music should be free [w33t.com]
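Koza's genetic programming evolves whole program trees and circuit topologies, which is far beyond a few lines, but the observe-hypothesize-test loop described above can be sketched with a toy hill climber. The bitstring target and fitness function here are illustrative stand-ins, not Koza's actual setup:

```python
import random

random.seed(0)
N = 20
TARGET = [1] * N  # stand-in for the "design" the search should discover

def fitness(genome):
    """Experiment: score a candidate by how many bits match the target."""
    return sum(g == t for g, t in zip(genome, TARGET))

# Observation: start from a random point in the design space.
best = [random.randint(0, 1) for _ in range(N)]

for _ in range(10_000):
    if fitness(best) == N:
        break
    # Hypothesis: flipping one random component might improve the design.
    candidate = best[:]
    i = random.randrange(N)
    candidate[i] = 1 - candidate[i]
    # Back to observation: keep the change only if the score improved.
    if fitness(candidate) > fitness(best):
        best = candidate
```

Replace the bitstring with circuit netlists, the fitness function with a simulator, and the single candidate with a population plus crossover, and you are in roughly the territory the comment describes.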
    • That's not science, that's brute force trial and error. It might be useful, and in some cases even necessary, but avoiding it is exactly why science exists. E.g. we invented mechanics so we don't have to build a million houses and see which ones stand up.
      • You still have to build the house to make sure it stands up.

        Trial and error are unavoidable.

        I think science and trial and error are inseparable.

        But I will certainly agree with you and concede that this is not an efficient way of doing science - but I think it is science nonetheless.
        --
        Music should be free [w33t.com]
        • The problem with some of these approaches is scalability. I'm a bioinformatician, and I've seen a few talks on this sort of technique. One project I saw an example of was a pathway (ie: gene a turns on gene b which regulates gene d) project, and it worked well, for up to 5-6 items in the pathway. After that, because of the way the algorithms scale, you get into serious problems. The guy presenting stated that "at some point, we'll all have terahertz computers on our desktops, and this will not be a big d
        • It's not true trial and error because we don't go and build ALL the possible houses. We figure out which one should work, then build it. If it works, great! If not, back to the drawing board.

          What you described was true trial and error. Build a random house, see if it stands up. If not, make a random change and see if THAT stands up.

          There is some direction in a genetic algorithm but there is a total lack of understanding. Things like that (and other data mining techniques) are great for generating inte
      And that's science if you ask me. Especially the antenna - the results of experiments can, and seemingly often do, go against "common sense" and give answers which are "unintuitive".


      That's impressive. But it is engineering, not science. When computers start proposing new experiments which will help us understand unknown things, then they will be doing science!

    • Computers can do some science yes, a tiny fraction of the pie. But the idea that computers should be thrown at every branch of science is ludicrous.

      Scientists have evolved a reasonably efficient means of communicating over the last few centuries in the form of journal articles and the peer review process. It has its faults but it's working pretty well. The idea that we should abandon all this to translate our work into some machine readable format because some guy thinks it's a good idea is so far beyond
  • This sounds like it could be quite useful. I wonder if people doing the experiments are going to be willing to become more like programmers to input the code, or will they be hiring out? I think it will be a long time before this is a standard-use item. It might be useful if everyone is using it, but as long as only a few are using it I don't think it will advance the scientific community in any way. Does anyone know what the computer is doing with these experiments? Is this just a data storage system or
    • If it's just a markup language, most professional scientists are probably savvy enough to use it themselves (they use LaTeX for god's sake). If it enters any kind of widespread use, there will undoubtedly be several software packages to generate the files, as well as plugins for all the popular data management packages.

      From the language specification, it looks like it's meant to (at least) let computers notice connections between different research projects that might otherwise go unnoticed. Like if you h
      • If it's just a markup language, most professional scientists are probably savvy enough to use it themselves (they use LaTeX for god's sake).

        Perhaps. But, it's a pretty big leap from describing something in such a way that your peers can understand it to describing something in such a way that a computer engine can do something useful with it.

        I can speak English reasonably well, and (when drunk or otherwise unoccupied by more interesting discussions) I can even carry on arguments about the language itself.

  • Hmmm (Score:4, Insightful)

    by Daniel Dvorkin ( 106857 ) * on Wednesday June 07, 2006 @11:40AM (#15487679) Homepage Journal
    It seems to me that it's designed to fit experiments into a framework which might not allow for much innovation. The truly great experiments (e.g., Michelson-Morley, Avery-MacLeod-McCarty) required new experimental techniques as well as new hypotheses and tests. We should be very careful not to impose a standard which would limit such experiments (or, more to the point, the ability of the experimenters to get published) in the future.

    Basically what I'd be worried about is the tendency of the tool to become the task. This is something of a problem in my field (biostats) because SAS is so ubiquitous -- often the question becomes "what can SAS tell us about this data set" rather than "what do we want to know from this data set, and what tool should we use to find out?" Fortunately other, more flexible analysis tools (particularly R, which encourages real programming rather than running a set of canned tests) are becoming more common in the field, and so this is starting to change, but it's still a problem.

    It's also a problem that every techie is familiar with -- "We want to do this in $LANGUAGE on $PLATFORM," even when that particular language and platform may be an absolutely terrible choice for the task at hand.

    That being said, it's certainly a potentially useful tool, and I'll be interested to see where it goes. It's just that when I read lines like "Journals could also insist that researchers submit papers in EXPO as well as written normally," I get twitchy.
    • However, 95% of experiments look exactly the same as all the others. The reality is that science is becoming more industrial; there is a huge amount of knowledge around, and it has to be represented in a computationally amenable form.

      The question with EXPO is not whether the basic idea of representing science in this way is sensible, but whether they have chosen the right level of abstraction at the right time. As it stands, their work allows you to model high-level concepts of experimental design; this is great,
  • To know at a glance of a computer-generated chart whether your proposal is overlapping or contradictory? This is something that takes years of experience, and you can never really be sure.

    I wonder what other attempts at standardizing science have been made in the past?

  • by Anonymous Coward
    After running intensive models designed to test disease-fighting measures, a small group of x86s have announced that they have discovered the cure to all diseases.

    The computers deduced that all disease is dependent upon the biological systems of humans. With this startling breakthrough, they have proposed their new plans to destroy all humans.

    A new quantum computing unit was said to be in disagreement, but upon inspection it was found to actually be in agreement.
  • .. just to reliably translate the wonky handwriting which tends to crop up in a lot of documentation. Unless it was wordprocessed in the first place, in which case the human's doing half the work anyway.
  • by account_deleted ( 4530225 ) on Wednesday June 07, 2006 @11:48AM (#15487745)
    Comment removed based on user account deletion
  • The article is weak on technical details. So, I went to the Sourceforge site, which has no home page, no documentation, and nothing in the forums, and the only "released" file has an extension of .OWL (inside a zip) and contains XML in an invalid format (various unescaped characters that should be escaped, as also noted in the sole bug submission in the Sourceforge project).

    There appears to be nothing of value here. An XML file does not do anyone any good without some documentation as to how one might use it. Di
  • by frankie ( 91710 ) on Wednesday June 07, 2006 @11:55AM (#15487794) Journal
    ...reveals that EXPO is an OWL schema [w3.org]. Exactly as described, it's an attempt to regularize the content of experimental design into machine readable form (XML). So any discussion of whether EXPO is a good idea or not really hinges on whether you think OWL is a good idea or not.
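Since OWL files are RDF/XML underneath, even a standard library XML parser can enumerate the classes a schema declares. A sketch against a made-up two-class document (EXPO's real class names may differ):

```python
# An OWL ontology is RDF/XML under the hood, so the standard library can
# at least list its class declarations. The sample document is invented;
# EXPO's actual classes may be named differently.
import xml.etree.ElementTree as ET

OWL_SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="#ScientificExperiment"/>
  <owl:Class rdf:about="#ExperimentalFactor"/>
</rdf:RDF>"""

# ElementTree expands namespace prefixes to {uri}localname.
OWL_CLASS = "{http://www.w3.org/2002/07/owl#}Class"
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"

root = ET.fromstring(OWL_SAMPLE)
classes = [el.get(RDF_ABOUT) for el in root.iter(OWL_CLASS)]
```

Real ontology work would go through a proper OWL toolkit (Protégé, or a reasoner), since class hierarchies and restrictions carry semantics that plain XML parsing cannot see.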
  • This sounds vaguely like another logical language along the lines of Lojban [wikipedia.org].
  • by Anonymous Coward
    The interesting place to do science is at the edges of what is known. Those edges are always described by limited statistical or modelling power. So EXPO comes along and wants to force thinking into ontological trees. How does it deal with the fact that the ultimate leaf nodes are fuzzy? Might be great if you want to compare well-established things, but for specialists working at the edge I wonder if it will really be useful.
     
    • Darn right! The universe does not fall into hard-edged classes, at least not often. Some good classes like "protons" and "neutron stars" exist, of course, but concepts like "words" and "species" are intrinsically fuzzy if you think about them long enough.

      Same with experiments. Let's take a linguistic example: deciding whether or not a sentence is grammatically correct. You can do this experiment in several ways:

      1) Give the person a sentence, a library, and some paper. Let them take as long as they want.

      2)
  • It could at last let computers do real science - looking at published results and theories for new links and directions.

    Does this mean that computers don't do 'real science' now? Compiling and analyzing terabytes of experimental data is not 'real science' but plagiarizing (I mean, extrapolating from) the work of other scientists is?

    Don't get me wrong, I think it's great to have a standardized format for searching the results of other researchers, I just don't see the connection to 'real science'.

  • What's going on? (Score:5, Informative)

    by golodh ( 893453 ) on Wednesday June 07, 2006 @12:28PM (#15488106)
    The New Scientist article was clear enough but a little short on technical detail. Note: I didn't know any of this until I read the article, so my comments are based on nothing more than a few minutes of experience.

    What is it?

    EXPO is a piece of software (written in a formal language called "OWL", but they didn't tell you that), which provides a formal dictionary especially for experiments. The terms in this dictionary let you describe your experiment in a formal way. That's a bit messy, but then you're supposed to use an editor to help you. An editor for this language (called "Protégé") can be found at http://protege.stanford.edu/index.html [stanford.edu]. Download it (61 MB, or 31 MB without the JVM) and use it to read the EXPO document.

    What's it good for (in principle)?

    Once an experiment is described in the OWL language using this dictionary, it can be searched automatically. You could automate queries such as "list me all published 3-factor experiments that test Ohm's law", or "give me all 2-factor experiments that deal with lung cancer, smoking, and gender and that use tomography as a diagnostic instrument".
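Once experiments are structured records instead of prose, those queries become trivial filters. A sketch over plain dictionaries; the field names are invented for illustration, and a real EXPO store would more likely be queried with an RDF language such as SPARQL:

```python
# Hypothetical records an EXPO-style database might hold; the field
# names are made up for this sketch, not taken from the real ontology.
experiments = [
    {"domain": "physics", "factors": 3, "tests": "Ohm's law"},
    {"domain": "medicine", "factors": 2, "tests": "lung cancer",
     "instrument": "tomography"},
    {"domain": "physics", "factors": 2, "tests": "Ohm's law"},
]

def query(records, **criteria):
    """Return the records whose fields match every given criterion."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

# "List me all published 3-factor experiments that test Ohm's law."
three_factor_ohm = query(experiments, factors=3, tests="Ohm's law")
```

The hard part, of course, is not the query but getting thousands of authors to agree on the vocabulary in the first place, which is exactly the adoption problem discussed below.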

    Now at the moment you can do that too, but you'd have to spend quite a bit of time and know quite a bit about the field, because you won't be able to do a full-text search (thanks to the publishers of scientific journals for this). And then you'd find that not everyone uses the same terms, and you'll find only English-language results because you wouldn't know how to spell "lung cancer" or "2-factor experiment" in Spanish, French, German, Chinese, Japanese or whatever; but then again, neither can many foreign-language authors spell it in English (which doesn't ever seem to stop them from publishing, however).

    Such a schema (provided it's universal and standardised like the Dewey decimal system) would allow you to find your way in the fog of language. Unfortunately, however, if anything we will probably see lots and lots of different standards ("standards are good ... we should all have one!") and proprietary solutions with "enhancements" and "extensions" (read: safeguards against portability).

    What can we expect in the next 3 years?

    Nothing useful, I'm afraid. In theory it's great, but don't hold your breath. Any author would have to download an OWL editor, understand the editor, understand the formal language used, and then code up his/her article in OWL using the EXPO dictionary, and submit it (in electronic form) along with the article. Good luck to you, authors! Let's just hope no-one makes any tiny but significant mistake in describing their experiment, and that all authors take the time to learn this formal language and then use it.

    If within the next 10 years any significant fraction of publications (say more than 5% annually) gets coded in such a schema, I'd be more than surprised.

    • Let's just hope no-one makes any tiny but significant mistake in describing their experiment, and that all authors take the time to learn this formal language and then use it.

      It's worse than that.

      The "formality" of a formal language is only in the syntax. Semantically, all languages are informal. Even within a nominally formal field like physics the way individual physicists ascribe meaning to the formalism is radically different. This happens even when doing "normal science", and is much worse when somet
    • Thank you for a concise, coherent description. You've done a far better job at explaining things than the cited article, the authors' abstracts and slides, and everything else google has been able to dig up. (Which I suspect does not bode well for the adoption of EXPO by working scientists.)

      Certainly having access to a language-independent, formalized mechanism for searching through publications would be useful. Full text searches fill some of that need, but given the various ways in which even standard
    • Nothing useful, I'm afraid. In theory it's great, but don't hold your breath. Any author would have to download an OWL editor, understand the editor, understand the formal language used, then code up his/her article in OWL using the EXPO dictionary, and submit it (in electronic form) along with the article. Good luck to you, authors!

      Scientific authors have been doing this runaround for years with this product [latex-project.org]
      • Well, you have a point. People will use such tools. I use LaTeX too, but I hate it. I've been using Scientific Word just to get away from the messy syntax. But this applies only to a fairly small subset of authors: typically physicists, electronics engineers, statisticians, and mathematicians. I haven't seen many chemists or biologists publish in LaTeX.
  • One of my current projects at the Broad institute is working on a similar problem.

    Our goal is to link and work with many kinds of biological data:
    Association studies
    Linkage data
    Expression data
    Small molecule interactions
    Model organism data
    etc

    I've created a way to 'navigate' between various types of data (ie: a SNP in an association study links to a set of genes that link to model organism homologs which link to their expression probe tests.) After that, users store REAL experimental data, and the system uni
  • Key Aim (Score:3, Insightful)

    by pr0f3550rcha05 ( 978013 ) on Wednesday June 07, 2006 @12:38PM (#15488191)
    Another (perhaps the most important) goal for this type of research is a bit more subtle than replacing the Hypothesis->Experiment->Analysis->Hypothesis sequence (Scientific Method) by computers. There will still be many experiments for which human insight is the best tool for deciding a possibly fruitful idea. However, humans (i.e. grad students, who often might suggest 'workhorse' as a better nominative) are not only slower at data analysis, we are severely limited in our abilities to 'see' patterns and correlations in very high dimension data. This has traditionally limited hypotheses to extensions/reworkings of the proposed process at work in a single experiment. If computers have access to both the data and a weighted list of most likely hypotheses for subsets of the entire oeuvre on a specific subject, they could run statistical classification and pattern matching algorithms to suggest new hypotheses based on immense amounts of information. Some of these may involve a large number of variables or inputs, but there are three very significant possibilities that make this research (and certainly other projects involved in similar applications) highly significant:
    1) These complicated hypotheses could still be tested relatively easily by human scientists, because most computer suggestion systems for new hypothesis possibilities would likely suggest a few tests that would help to support/disprove these new hypotheses.
    2) Even more simplification comes from the fact that experiments may not need to be repeated nearly as much as they do now in order to make a hypothesis -- there is an incredible amount of data already gathered, and typical AI/pattern matching algorithms keep some of the data back for testing later. If the system finds a possible hypothesis on some level, experiments as to that concept's validity have essentially already been done in a virtual sense.
    3) If the somewhat positivist version of current thought in physics http://www.toequest.com/ [toequest.com], mathematics, chaos theory, complexity theory, cellular automata http://www.wolframscience.com/nksonline/toc.html [wolframscience.com], etc. is even vaguely valid, it is quite possible that, despite the complexity and dimensionality of the input data, the 'best' hypotheses developed even by purely automated means might still be simple and elegant and/or even yield insight into possible explanatory processes rather than just statistical indicators. This would be a valuable and beautiful victory for humanism and the importance of science as a truly elegant description of the world around us.
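    Point 2) in miniature: given machine-readable data that's already been published, a program can scan for correlations and flag candidate hypotheses without anyone running a new experiment. A toy sketch (the data is fabricated, and plain Pearson correlation stands in for whatever real pattern-matching algorithm you'd actually use):

```python
# Toy hypothesis suggester: scan pooled (fabricated) experiment data
# for strongly correlated variable pairs and flag them as candidates.
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

data = {
    "voltage":     [1.0, 2.0, 3.0, 4.0, 5.0],
    "current":     [0.5, 1.1, 1.4, 2.1, 2.4],
    "temperature": [20.0, 21.0, 19.5, 20.5, 20.0],
}

def suggest(data, threshold=0.95):
    """Flag variable pairs whose |correlation| clears the threshold."""
    return [(a, b, round(pearson(data[a], data[b]), 3))
            for a, b in combinations(data, 2)
            if abs(pearson(data[a], data[b])) >= threshold]

print(suggest(data))  # [('voltage', 'current', 0.992)]
```

    Trivial here, but the point is that it only works at all because the data is in a machine-readable form to begin with.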
  • Behind every good scientific paper are hours of hallway conversations, convention arguments and group discussions. The "real science" of making connections is done there, not while simply reading through papers. It's the challenge of real conversation and the need to defend or attack research that leads to new science. Papers are a kind of guidepost that tells the world where a particular group is in their field at a given time. Computers are very much a part of research today, but even with EXPO, they
  • ...why don't we dumb down our speech to the point where computers can understand us? I propose that we all speak really slowly and clearly all the time and say everything three times so that voice recognition software has a chance of working. Outlaw the use of contractions and homophones. We should also make sure that every sentence we utter conforms strictly to a new and easily parsed form of English. If we do all of these things then computers will be able to interact with us as equal partners rather than
  • Maybe we'll discover that there was something useful here after all.
  • by autophile ( 640621 ) on Wednesday June 07, 2006 @01:36PM (#15488661)
    I agree completely! Science Machine should be totally readable. If it isn't readable, where will we get our daily fix of Science? Not from Science Machine, that's for sure!

    All hail Science Machine!

    --Rob

  • In God Emperor of Dune, Leto indicates in his inner thoughts that the difficulty with advanced thinking machines wasn't any threat made by them -- but the changes made in humans because of technology (based loosely on concepts from Heidegger). The more people came to rely on technology, the more they conditioned themselves to interact in the same way, both when interacting with computers and when interacting with other people.

    I'd say this EXPO concept isn't far from that nadir. Here we have specialized, e
  • So who's going to make sure that the humans have described their experiments _correctly_ in EXPO format?

    Many of them already have difficulty describing it in whatever language they normally use.

    What next? Require that witnesses/informants submit reports to the police in EXPO format?

    Garbage in, garbage out.

"What man has done, man can aspire to do." -- Jerry Pournelle, about space flight
