NASA Requests Help With Von Braun's Notes (148 comments)

Posted by CmdrTaco on Monday June 29, @09:16AM
from the yes-actually-it-is-rocket-science dept.
nasa
space
media
science
DynaSoar writes "NASA is soliciting ideas from the public on how best to catalog and digitize the collected notes of Wernher von Braun. 'We're looking for creative ways to get it out to the public,' said project manager Jason Crusan. 'We don't always do the best with putting out large sets of data like this.' The PDF notes are those of rocket scientist Wernher von Braun, the first director of NASA's Marshall Spaceflight Center in Huntsville, Alabama and are typed with copious handwritten notes in the margin. According to the official request for information, NASA needs ideas on what format to use (PDF), how to index the notes, and how to create a useful database. The unique nature and historical value of the data, literally discovered in boxes six months ago, is what motivated NASA to ask the public for ideas."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • NASA (Score:5, Insightful)

    by dov_0 (1438253) on Monday June 29, @09:18AM (#28513657)
    Seems to have a habit of just dumping things in warehouses and forgetting about them.
    • Re:NASA (Score:5, Funny)

      by HalifaxRage (640242) on Monday June 29, @09:22AM (#28513709) Journal
      Next week: What to do with this big golden box thing? We tried opening it and some guy's face melted.
      • Re:NASA (Score:4, Funny)

        by cayenne8 (626475) on Monday June 29, @09:36AM (#28513895) Homepage Journal
        "...those of rocket scientist Wernher von Braun, the fist director of NASA's Marshall Spaceflight Center..."

        Wow...I didn't know they had that position?!?!

        I'm not sure I'd WANT to be fist director....sounds like more of a strange pr0n thing than a NASA office.

      • Next week: What to do with this big golden box thing? We tried opening it and some guy's face melted.

        Guy 1: It's the Ark of the Covenant!

        Guy 2: No, it's a spare reactor core. Same effect.

      • NASA: We already have top men on that.
        Slashdot: But wh--
        NASA: Top. Men.

        (My favorite line. Uttered by the actor who played Porkins, IIRC.)

    • I assure you that they have top men working on it right now.

    • Re: (Score:3, Interesting)

      Not sure if I can really blame them.

      This past weekend I had a garage sale and, as I was clearing stuff, realized how much junk paperwork I had stashed in the garage. There were books, manuals, class notes, lecture notes (from those I attended and those I gave), meeting notebooks, documentation on long obsolete processes (Token Ring MAU reset procedures, Novell Netware rebuild procedures). I had notebooks of stories, embarrassing journal entries from college ("DH has the most beautiful eyes!!"), and all sort

  • by mwilliamson (672411) on Monday June 29, @09:19AM (#28513667) Homepage Journal
    group-iv tiff + ASCII, key-value metadata descriptor in XML. Keep it generic.
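
A minimal sketch of what such a generic key-value XML descriptor could look like, one per scanned page. The element names (`page`, `meta`), the file names, and the metadata keys are all hypothetical; nothing here comes from NASA's RFI.

```python
import xml.etree.ElementTree as ET

def build_descriptor(image_file, text_file, metadata):
    """Return an XML string describing one scanned page as key-value pairs."""
    # The page element points at the TIFF scan and its plain-text transcription.
    page = ET.Element("page", {"image": image_file, "text": text_file})
    for key, value in metadata.items():
        entry = ET.SubElement(page, "meta", {"key": key})
        entry.text = value
    return ET.tostring(page, encoding="unicode")

# Hypothetical file names and keys, for illustration only.
descriptor = build_descriptor(
    "page-0001.tif",
    "page-0001.txt",
    {"author": "Wernher von Braun", "annotations": "handwritten margin notes"},
)
print(descriptor)
```

Keeping every field as a plain key-value pair means new keys can be added later without changing the schema, which is roughly what "keep it generic" asks for.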
    • obviously, bittorrent to distribute the resulting set far and wide.
      • obviously, bittorrent to distribute the resulting set far and wide.

        ... with the files labeled as "Porn_Video_Michael_Jackson_And_Bubbles_Beat_It.rar" ...

        Might as well get MediaSentry and the RIAA in on the act ...

  • They got that million dollar touchless scanner that can digitize the papers with ease, then put them into either Open Source or PDF formats.

    • Isn't the PDF format open source?

      • yes it is. but many whiners here will argue against it.

        The thing is, don't half ass the pdf by simply encapsulating images. they need to do a real OCR on it and separate things out to images that are not typewritten.

        then donate the boxes to the Smithsonian.

        the MOST IMPORTANT aspect of the documents is that it is easily searched. which means all text must be text and not images. Yes that includes his handwriting.

        • The thing is, don't half ass the pdf by simply encapsulating images. they need to do a real OCR on it and separate things out to images that are not typewritten. ...the MOST IMPORTANT aspect of the documents is that it is easily searched. which means all text must be text and not images. Yes that includes his handwriting.

          I agree, but the second most important aspect is that the images of the original get preserved too. The ideal way to do it is to have the image be displayed, but with the OCR'd text linked t

        • Let me fix that for you:

          the SECOND MOST IMPORTANT aspect of the documents is that it is easily searched.

          The FIRST is of course making a high fidelity digital copy of the original pages, that will serve as the authority on all questions of possible ambiguity in the handwriting, or whether a figure in the margin is a thumbnail sketch or a mere doodle.

          A 600 or 1200 dpi .png image of each page in full color would do as the master digital archive. The .png format is an excellent choice since it is open, well understood, and going to be around for a long, long time. Its accuracy is more than adequate for this work. That it supports lossless compression is a bonus: images of pages usually compress very well. Copies of the master digital library should be kept at various institutions and made available on request to anyone.
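
For a rough sense of scale, here is a back-of-the-envelope sketch of the raw size of one page at those resolutions. It assumes a US Letter page (8.5 x 11 inches) and 24-bit color; both are assumptions, since the actual page size of the notes is not stated here.

```python
def uncompressed_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Raw size of one scanned page before PNG's lossless compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel

for dpi in (600, 1200):
    size = uncompressed_bytes(8.5, 11, dpi)  # assumed US Letter page
    print(f"{dpi} dpi: {size / 1_000_000:.1f} MB uncompressed")
```

Under those assumptions a page is about 101 MB raw at 600 dpi and about 404 MB at 1200 dpi, so the lossless compression mentioned above matters a great deal for a large archive.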

          Then for public and research use, convert each page to HTML 4.01 Strict (since it is universally available, will be around for a long, long time, and Google, etc, can do the indexing for us). UTF of course, especially since Wernher used some German and Greek glyphs in his handwriting.

          Suggest using OCR to handle conversion of the typed notes, and volunteers or cheap student labor to transcribe the handwritten material (use consensus of several transcribers to assure accuracy). These can be incorporated into the main pages as divs and spans inserted into the correct place in the flow (use classes like "leftmargin" and "rightmargin"). CSS can use absolute positioning to make them marginal accordions (expand from the margin on mouseover), etc.

          Treat sketches like the handwriting: put an img of the sketch into a div or span at the right place in the flow, then also add a searchable text description of the sketch in that div.

          A simple script can process the final HTML fragment of each page and insert id="unique" attributes on each paragraph, etc, and <a name="unique"> targets where these would be useful.
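
A minimal sketch of such a script, assuming the page fragments are reasonably regular HTML so that a regular expression is enough. The `p<page>-<n>` id scheme and the function name are hypothetical; a real pipeline would more likely use a proper HTML parser.

```python
import re

def add_paragraph_ids(html_fragment, page_number):
    """Insert a unique id attribute on every <p> tag that lacks one."""
    counter = 0

    def tag_with_id(match):
        nonlocal counter
        attrs = match.group(1)
        # Leave paragraphs that already carry an id untouched.
        if re.search(r'\sid\s*=', attrs, flags=re.IGNORECASE):
            return match.group(0)
        counter += 1
        return f'<p id="p{page_number}-{counter}"{attrs}>'

    # \b keeps tags such as <pre> or <param> from matching.
    return re.sub(r'<p\b([^>]*)>', tag_with_id, html_fragment, flags=re.IGNORECASE)

fragment = (
    '<p>First note.</p>\n'
    '<p class="rightmargin">Margin note.</p>\n'
    '<p id="keep">Old.</p>'
)
tagged = add_paragraph_ids(fragment, 12)
print(tagged)
```

Including the page number in each id keeps the ids unique across the whole collection, so a link to one paragraph stays valid when pages are composed together.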

          The finished NASA product should be a simple online database using server side scripting to compose and serve out pages on request. It should be built with cooperation from Google and other search platforms so that spiders will have good access to the body of the work without causing excessive bandwidth problems. It should be possible for any researcher to develop his own custom search engine. Ideally, it will support not just the notes, but also concordances, wiki discussions, etc.

          I once did a lot of this kind of work in moving sermons and such that were circulated by mimeograph in the 1960s and 1970s to web pages. I digitized the pages with a Minolta Z1 camera on a reverse tripod using indirect lighting, and converted to OCR with OmniScan (IIRC). The OCR came out in Word 97 format, and I used Perl scripts to transcribe to HTML. If the technical quality of the originals is good, this can go pretty fast and is highly accurate, even as a basement project. If the original notes use consistent formatting, which I would expect of Wernher, then scripting with good use of regular expressions can do the bulk of the HTML markup.
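
The "consensus of several transcribers" idea mentioned above could be sketched as a simple majority vote. This is an illustration under assumptions, not a description of any existing tool: the function name and the `min_agreement` threshold are hypothetical, and the sample strings are invented misreadings.

```python
from collections import Counter

def consensus(transcriptions, min_agreement=2):
    """Return the most common transcription if enough transcribers agree, else None."""
    # Collapse runs of whitespace so spacing differences do not split the vote.
    normalized = [" ".join(t.split()) for t in transcriptions]
    if not normalized:
        return None
    text, votes = Counter(normalized).most_common(1)[0]
    return text if votes >= min_agreement else None

# Two of three transcribers agree once whitespace is normalized.
agreed = consensus(["SNAP-10A  reactor", "SNAP-10A reactor", "SNAP-1OA reactor"])
# No two readings match, so the passage should go back for review.
disputed = consensus(["n.mi", "n.m1", "n mi"])
```

Passages that return `None` are the ones worth routing to a more experienced reader instead of guessing.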

            • Re: (Score:3, Interesting)

              For the right persons, transcribing the handwritten notes and sketches would be very rewarding. Wernher von Braun was a pivotal technologist whose work for the Nazis either posed one of the greatest threats to England during WWII or, through high level monkeywrenching, managed to keep that threat from becoming a reality. He was definitely a very complex character who succeeded in doing a helluva good balancing act on dangerously high political high wires.

              So access to his notes in exchange for doing the drudg

      • No. There is no such thing as an open source format. Open source is a term that can only apply to an implementation of a standard, not to the standard itself. Things like xpdf/Poppler are open source implementations of the PDF standard. The term 'open standard' applies to formats but is badly defined. The common definitions of an open format are:
        1. Can be licensed under nondiscriminatory conditions (e.g. MPEG formats).
        2. Freely available specification, can be implemented by anyone (e.g. PDF).
        3. Future versions of the standard controlled by a standards committee (e.g. HTML).

        PDF, since its creation, has been an open standard according to definition 2. Some people don't like it because it doesn't meet definition 3 (Adobe are the only ones who can create new versions of the PDF spec).

  • by Anonymous Coward on Monday June 29, @09:22AM (#28513719)

    Gather round while I sing you of Wernher von Braun
    A man whose allegiance is ruled by expedience
    Call him a Nazi, he won't even frown
    "Ha, Nazi schmazi," says Wernher von Braun

    Don't say that he's hypocritical
    Say rather that he's apolitical
    "Once the rockets are up, who cares where they come down
    That's not my department," says Wernher von Braun

    Some have harsh words for this man of renown
    But some think our attitude should be one of gratitude
    Like the widows and cripples in old London town
    Who owe their large pensions to Wernher von Braun

    You too may be a big hero
    Once you've learned to count backwards to zero
    "In German oder English I know how to count down
    Und I'm learning Chinese," says Wernher von Braun

  • "Wernher von Braun, the fist director of NASA's Marshall Spaceflight Center"

    Nasty..

  • by codeButcher (223668) on Monday June 29, @09:25AM (#28513767)

    On the next thing that goes up to space (or even just a suborbital flight), crank down the window at about 20km up and throw the stuff out (or have some automated thingy with an explosive bolt that distributes it into the atmosphere). Now THAT would be a "creative way to get it out to the public".

    Then again, maybe that would be TOO creative.

  • Scan it at high resolution, OCR what you can, and load it into Distributed Proofreaders [pgdp.net]. Or if the material is too technical for the layperson, ask for a copy of the web-based software and set up your own private site. Let bored grad students work on it in exchange for some kind of minor credit on the final digitized work. (I believe that the bored grad students phenomenon produces half of the highly-technical articles on Wikipedia.)

    • Captchas.

      There are projects that use captchas to digitize old texts, NASA could put those parts which don't lend themselves to OCR as captchas on their webpage.

      • Re: (Score:2, Insightful)

        by Anonymous Coward

        Unfortunately, the notes are full of non-words, like (RTG), SNAP-10A, B70, n.mi
        At least, that's what I'm assuming they say, because some of them are rather unreadable. Now, slashdotters may recognise some, but many people won't see the "words"

  • > the fist director of NASA's Marshall Spaceflight Center in Huntsville, Alabama

    Boy do I not want to work for that particular department.

  • TIFF FTW (Score:5, Interesting)

    by alta (1263) on Monday June 29, @09:31AM (#28513835) Homepage Journal

    Let's go with a format almost anyone can read. As soon as they're all scanned in as high res TIFFs THEN you can begin to OCR them and create hybrid PDFs which CAN be indexed. From there we have a good start with high quality originals and searchable derivatives. Then people can start rolling whatever custom solutions they want to.

    Yes, I know that OCR is going to be very crude, especially for anything hand written. But what it will do is get us a very good starting point. I'd like to see a wiki set up with the OCR'd text as the beginning text, a link to the document and then the public can begin to go in and correct the OCR mistakes, and fill in what just flat out couldn't be OCRd.

  • Well, considering they host over 6,000 pdfs [google.com] and the RFI is in PDF with the title of the document being "Microsoft Word - WvB RFI 6-24-09.doc" by Jason Crusan who used Acrobat Distiller 7.0.5(Windows), I think we know what everyone uses at NASA. Fine. I'm not going to bitch about that. Instead I'm going to point out that if you're already dependent on Adobe Acrobat Reader & Microsoft Word being around until the end of time supporting your old doctypes, you might as well release these in PDF from DOC sources too.

    But, if I were doing this: Assuming these are all in images, put the images in whatever format you want and make a generic wiki page for each of them. Then let users log in (NASA fans should pour in) and translate the pages to annotated wiki pages with the footnotes (normally references) being all the side notes that were penciled in. They can categorize them by related missions and maybe even tag them ... you will need at least one or two people on your staff to administrate. Diagrams and drawings will probably need to be cropped and retained as images. Keep those in a lossless format but distribute whatever saves you bandwidth.

    Once that's done, ideally you'd put it in some XML standards based format (ODF or OOXML, yeah, that's another argument to be had) that you will always be able to read even if you have to build your own viewer/converter. Keep these sources indexed and provide for people the rendered PDF/PS/PNG/whocares and then you could probably build scripts to rebuild all from sources if you want. New technology comes out or people want to view them in HTML 5--no problem, just build a neat little XSLT for them.

    As for indexing them, I can tell you one way not to do it. Don't do the thing that curators of classical music did [stason.org]. Man, that's like speaking another language to me. Arrange the notes by mission or date if you can and any natural titles that arise for the favorites, add to it as an alias.
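
A small sketch of that arrangement: group the notes by mission, order each group by date, and keep nicknames as aliases pointing at note ids. The record fields, ids, missions, and dates below are placeholders, not real catalog data.

```python
from collections import defaultdict
from datetime import date

def build_index(notes):
    """Group note ids by mission, with each group sorted by date."""
    by_mission = defaultdict(list)
    for note in notes:
        by_mission[note["mission"]].append(note)
    return {
        mission: [n["id"] for n in sorted(group, key=lambda n: n["date"])]
        for mission, group in sorted(by_mission.items())
    }

# Placeholder records; real entries would come from the scanned notes.
notes = [
    {"id": "note-003", "mission": "mission-b", "date": date(1970, 1, 3)},
    {"id": "note-001", "mission": "mission-a", "date": date(1970, 1, 2)},
    {"id": "note-002", "mission": "mission-a", "date": date(1970, 1, 1)},
]
index = build_index(notes)

# A natural title that arises for a favorite becomes an alias for its id.
aliases = {"favorite sketch": "note-002"}
```

Because aliases map to ids and never replace them, a nickname can be added or dropped without disturbing the mission and date ordering.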
  • Why don't they release it in the open standard PDF, with annotations for the handwritten notes, which I believe are in the standard? (I might be wrong.)
  • by g34rs (1583313) on Monday June 29, @09:39AM (#28513923)
    Thanks NASA for making me feel like my opinion is valued and useful. Kind of like that, oh what was it called? The vote for the name of that satellite thingy? When really you're just passing the buck because your budget didn't include "digitizing old notes."
    • by Anonymous Coward

      instead of forcing people to pay taxes on some project of dubious desirability, they are trying to see if the public has any support for their idea, before they thrust headlong into it.

      government workers should ask the opinion of the taxpayers more often, we are after all, their bosses. i have a lot of respect for the government employees that remember this, and nothing but contempt for those who want to 'play social engineer and tax waster' without regard for what the public thinks.

    • Re: (Score:3, Insightful)

      Even if NASA did do it itself, "society" would be paying for it anyway...

      Actually, this should be better in two important ways: not only could crowd-sourcing accomplish the task much more efficiently than $50-grand-space-pen-NASA could to begin with, but also the cost would be distributed across the entire Internet, rather than being shouldered only by American taxpayers! It's a win-win-win* situation, I'd say.

      (* for NASA, and for space geeks, and for taxpayers)

  • Anonymous Coward (Score:2, Interesting)

    by Anonymous Coward

    You guys clearly do not read enough electronic media. PDF and Djvu are the more widespread and relatively ubiquitous modern electronic book formats. Djvu tends to be vastly superior to PDF in terms of file size though.

    Read all about it here:
    http://en.wikipedia.org/wiki/Djvu

    Discuss.

  • Zoom! (Score:3, Informative)

    by Quiet_Desperation (858215) on Monday June 29, @09:43AM (#28513979)

    We're looking for creative ways to get it out to the public

    By rocket mail!

    http://en.wikipedia.org/wiki/Rocket_mail [wikipedia.org]

  • How about take a page from the Talmud? [wikimedia.org] Seems a perfect format, and there's been thousands of years of indexing of that document.
  • Who cares where they come down.
    That's not my department, says Wernher von Braun.

  • personally, i'd love a printed hard copy on my book shelf. right there with my Goddard books.

  • by nbauman (624611) on Monday June 29, @10:32AM (#28514617) Homepage Journal
    How about something like this? http://tobaccodocuments.org/ [tobaccodocuments.org]