
NASA Requests Help With Von Braun's Notes

Posted by CmdrTaco
from the yes-actually-it-is-rocket-science dept.
DynaSoar writes "NASA is soliciting ideas from the public on how best to catalog and digitize the collected notes of Wernher von Braun. 'We're looking for creative ways to get it out to the public,' said project manager Jason Crusan. 'We don't always do the best with putting out large sets of data like this.' The PDF notes are those of rocket scientist Wernher von Braun, the first director of NASA's Marshall Space Flight Center in Huntsville, Alabama, and are typed with copious handwritten notes in the margin. According to the official request for information, NASA needs ideas on what format to use (PDF), how to index the notes, and how to create a useful database. The unique nature and historical value of the data, literally discovered in boxes six months ago, is what motivated NASA to ask the public for ideas."

  • by TheHawke (237817) <rchapin&pelicancoast,net> on Monday June 29, 2009 @10:22AM (#28513695)

    They got that million dollar touchless scanner that can digitize the papers with ease, then put them into either Open Source or PDF formats.

  • by Anonymous Coward on Monday June 29, 2009 @10:27AM (#28513787)
    Here he is performing it live. [youtube.com]
  • Zoom! (Score:3, Informative)

    by Quiet_Desperation (858215) on Monday June 29, 2009 @10:43AM (#28513979)

    We're looking for creative ways to get it out to the public

    By rocket mail!

    http://en.wikipedia.org/wiki/Rocket_mail [wikipedia.org]

  • by TheRaven64 (641858) on Monday June 29, 2009 @12:29PM (#28515405) Journal
    No. There is no such thing as an open source format. Open source is a term that can only apply to an implementation of a standard, not to the standard itself. Things like xpdf/Poppler are open source implementations of the PDF standard. The term 'open standard' applies to formats but is badly defined. The common definitions of an open format are:
    1. Can be licensed under nondiscriminatory conditions (e.g. MPEG formats).
    2. Freely available specification, can be implemented by anyone (e.g. PDF).
    3. Future versions of the standard controlled by a standards committee (e.g. HTML).

    PDF, since its creation, has been an open standard according to definition 2. Some people don't like it because it doesn't meet definition 3 (Adobe are the only ones who can create new versions of the PDF spec).

  • Let me fix that for you:

    the SECOND MOST IMPORTANT aspect of the documents is that they are easily searched.

    The FIRST is of course making a high fidelity digital copy of the original pages, that will serve as the authority on all questions of possible ambiguity in the handwriting, or whether a figure in the margin is a thumbnail sketch or a mere doodle.

    A 600 or 1200 dpi .png image of each page in full color would do as the master digital archive. The .png format is an excellent choice since it is open, well understood, and going to be around for a long, long time. Its accuracy is more than adequate for this work. That it supports lossless compression is a bonus: images of pages usually compress very well. Copies of the master digital library should be kept at various institutions and made available on request to anyone.

    Then for public and research use, convert each page to HTML 4.01 Strict (since it is universally available, will be around for a long, long time, and Google, etc., can do the indexing for us). UTF of course, especially since Wernher used some German and Greek glyphs in his handwriting.

    Suggest using OCR to handle conversion of the typed notes, and volunteers or cheap student labor to transcribe the handwritten material (use consensus of several transcribers to assure accuracy). These can be incorporated into the main pages as divs and spans inserted into the correct place in the flow (use classes like "leftmargin" and "rightmargin"). CSS can use absolute positioning to make them marginal accordions (expand from the margin on mouseover), etc.
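
    The consensus step could be as simple as a majority vote per handwritten note. A minimal sketch, assuming each transcriber submits one plain-text reading per note; the function name and the min_agree threshold are made up for illustration:

```python
from collections import Counter
from typing import List, Optional


def consensus(transcriptions: List[str], min_agree: int = 2) -> Optional[str]:
    """Return the reading most transcribers agree on, or None if no
    reading is shared by at least `min_agree` of them."""
    # Collapse runs of whitespace so spacing differences do not split the vote.
    normalized = [" ".join(t.split()) for t in transcriptions]
    if not normalized:
        return None
    reading, votes = Counter(normalized).most_common(1)[0]
    return reading if votes >= min_agree else None
```

    A note that comes back as None has no agreed reading and would go to a human reviewer, with the page image as the authority.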

    Treat sketches like the handwriting: put an img of the sketch into a div or span at the right place in the flow, then also add a searchable text description of the sketch in that div.

    A simple script can process the final HTML fragment of each page and insert id="unique" attributes on each paragraph, etc, and <a name="unique"> targets where these would be useful.
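
    Something along these lines would do for the paragraph ids. It is only a sketch: the "pgN-pM" id scheme and the function name are assumptions, and a regex is only safe here because the fragments are machine-generated and regular.

```python
import re
from itertools import count

# Opening <p> tag with optional attributes; does not match <pre>, <param>, etc.
P_TAG = re.compile(r"<p(\s[^>]*)?>", re.IGNORECASE)
# An id attribute, but not something like data-id.
HAS_ID = re.compile(r"(?<![\w-])id\s*=", re.IGNORECASE)


def add_paragraph_ids(fragment: str, page: int) -> str:
    """Give every <p> in the fragment that lacks an id one like 'pg12-p3'."""
    counter = count(1)

    def tag_with_id(match):
        attrs = match.group(1) or ""
        n = next(counter)  # number paragraphs by position on the page
        if HAS_ID.search(attrs):
            return match.group(0)  # keep an id that is already there
        return '<p id="pg{}-p{}"{}>'.format(page, n, attrs)

    return P_TAG.sub(tag_with_id, fragment)
```

    Numbering by page and position keeps the ids stable as long as the transcription of a page does not gain or lose paragraphs.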

    The finished NASA product should be a simple online database using server side scripting to compose and serve out pages on request. It should be built with cooperation from Google and other search platforms so that spiders will have good access to the body of the work without causing excessive bandwidth problems. It should be possible for any researcher to develop his own custom search engine. Ideally, it will support not just the notes, but also concordances, wiki discussions, etc.

    I once did a lot of this kind of work in moving sermons and such that were circulated by mimeograph in the 1960s and 1970s to web pages. I digitized the pages with a Minolta Z1 camera on a reverse tripod using indirect lighting, and converted them to text by OCR with OmniScan (IIRC). The OCR came out in Word 97 format, and I used Perl scripts to convert it to HTML. If the technical quality of the originals is good, this can go pretty fast and is highly accurate, even as a basement project. If the original notes use consistent formatting, which I would expect of Wernher, then scripting with good use of regular expressions can do the bulk of the HTML markup.
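
    To give an idea of the regex approach (in Python here, not the Perl I used back then), this sketch turns plain OCR text into paragraphs. It assumes paragraphs are separated by blank lines, which may or may not hold for these notes; the function name is hypothetical.

```python
import html
import re


def text_to_html(ocr_text: str) -> str:
    """Wrap each blank-line-separated block of OCR text in a <p> element."""
    blocks = re.split(r"\n\s*\n", ocr_text.strip())
    paragraphs = []
    for block in blocks:
        # Rejoin lines that the OCR broke at the page's right margin.
        joined = " ".join(block.split())
        if joined:
            # Escape &, < and > so the text cannot break the markup.
            paragraphs.append("<p>{}</p>".format(html.escape(joined)))
    return "\n".join(paragraphs)
```

    Headings, dates, and the like would each need their own pattern once the actual layout of the typed pages is known.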

  • by CNeb96 (60366) on Monday June 29, 2009 @10:51PM (#28523597)

    I just assumed that by PDF they mean PDF/A. Isn't that controlled by ISO?

    Yep

    "On January 29, 2007, Adobe announced its intent to release the full Portable Document Format (PDF) 1.7 specification to AIIM, the Enterprise Content Management Association, for the purpose of publication by the International Organization for Standardization (ISO). During 2007 and into early 2008 that intent was turned into a reality. ISO published the approved ISO 32000-1 standard based upon PDF 1.7 in July 2008. ISO will also produce future versions of the PDF Specification."

    http://www.adobe.com/devnet/pdf/pdf_reference.html [adobe.com]

    Except for the "extra" features Adobe has added and documented since the release of the standard.
