Cleanup Instructions

From WikiMarion

To take a file from raw OCR-ed text to a usable article, follow these steps. If you don't know how to edit a page and add the markup needed for headings, italics, etc., read the Editing Help page as well.

  1. Delete the page numbers, if present. They will usually also include the author's name (e.g. Musselman 3).
  2. Read through the paper and:
    1. Add headings and subheadings as appropriate. Try to use a new heading or subheading at least every few paragraphs.
    2. Clean up typos made by the student or conversion errors made by the scanner and OCR software.
      1. For example, the software often confuses 'l', 'i', and '1', or mixes up 'm' and 'rn'. It also makes mistakes on superscripts, such as the ones word processors apply to ordinal numbers like '2nd'.
      2. If you need a copy of the original scanned text, email me at rmlucas03 AT yahoo DOT com.
      3. Tip: The Firefox browser (version 2.0 or later) includes a spell-checker, so editing in Firefox can be easier.
    3. Edit the text as necessary. Some papers are already well-organized, while others are not. You may want to scan the paper first to see whether you have the time and interest to make the necessary revisions.
  3. Convert the Works Cited section to the appropriate format: make the references into a bulleted list by adding a star character (*) in front of each line. Delete any empty lines between the entries, as well as any spaces the student has inserted. (MediaWiki renders tabs as spaces, so you can usually leave them in. Once you have saved, though, double-check that the section looks right.)
  4. Convert the heading at the top of the paper into a 'credits' section at the bottom. Check a few other articles to see the basic format we use. A little variation is OK.
  5. Add photos, if possible. There may be a note on your page letting you know that appropriate photos have already been uploaded. You can also upload your own, or search the database for existing photos on your topic. You're not expected to go out and take your own photos, but if you're willing and able, it would be nice.
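If you are comfortable running a small script, the purely mechanical parts of the checklist above (deleting running page headers, spotting likely '1'/'l' OCR swaps, and bulleting the Works Cited entries) can be roughed out before you start reading. The Python sketch below is not one of this wiki's tools; the function names are made up for illustration. It assumes each page header sits alone on its own line (like "Musselman 3") and that each Works Cited entry is on a single line. It cannot catch 'rn'/'m' mix-ups or typos that happen to be real words, so you still need to read through the paper yourself.

```python
import re


def strip_page_numbers(text: str, author: str) -> str:
    """Remove lines made up only of the author's name and a page number,
    e.g. "Musselman 3". Assumes each header is on a line by itself."""
    # Name, whitespace, digits, and nothing else on the line.
    header = re.compile(rf"^\s*{re.escape(author)}\s+\d+\s*$")
    kept = [line for line in text.splitlines() if not header.match(line)]
    return "\n".join(kept)


def find_ocr_suspects(text: str) -> list[str]:
    """List words that mix letters with the digit 1 (e.g. "wi1l"), a common
    sign of OCR confusing 'l', 'i', and '1'. Ordinals like "1st" are skipped.
    This is a rough filter: it will also flag things like "1990s"."""
    suspects = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        has_letter = any(ch.isalpha() for ch in word)
        is_ordinal = re.fullmatch(r"\d+(st|nd|rd|th)", word) is not None
        if has_letter and "1" in word and not is_ordinal:
            suspects.append(word)
    return suspects


def bullet_works_cited(entries: str) -> str:
    """Turn Works Cited entries (one per line) into a MediaWiki bulleted
    list: strip surrounding spaces, drop empty lines, prefix each with '*'."""
    lines = [line.strip() for line in entries.splitlines()]
    return "\n".join("* " + line for line in lines if line)


if __name__ == "__main__":
    raw = "First paragraph.\nMusselman 3\nSecond paragraph."
    print(strip_page_numbers(raw, "Musselman"))
    print(find_ocr_suspects("The 1st student wi1l read l1ne 21."))
```

Pass only the Works Cited entries to `bullet_works_cited`; running it on the whole paper would put a star in front of every paragraph.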


  • Make sure no line begins with a space followed by text. MediaWiki displays such a line as preformatted text, inside a box with a dashed border.
  • Leave a blank line between paragraphs. If you simply press Enter once to start a new line, MediaWiki will run the two lines together into one paragraph.
  • Don't paste your text into Microsoft Word for editing; it will change the formatting when you copy the text back out. If you really need to edit the text outside the browser, use a plain-text editor such as Notepad.
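The first two formatting rules in this list can also be checked mechanically before you save. Below is a rough Python sketch (the function names are hypothetical, not part of MediaWiki) that reports lines starting with a space and places where two text lines touch with no blank line between them. It treats lines starting with *, #, = or : as list items, headings, or indents, which MediaWiki handles on their own. A flagged "no blank line" spot may just be a line the OCR wrapped in mid-paragraph, which MediaWiki joins harmlessly, so check each one by eye.

```python
def is_plain_text(line: str) -> bool:
    """True for a non-blank line that is not a list item, heading,
    indented line, or line starting with a space."""
    return bool(line.strip()) and not line.startswith((" ", "*", "#", "=", ":"))


def find_formatting_problems(wikitext: str) -> list[str]:
    """Report lines that begin with a space, and text lines that directly
    follow another text line (MediaWiki merges them into one paragraph)."""
    problems = []
    lines = wikitext.splitlines()
    for number, line in enumerate(lines, start=1):
        # A leading space followed by text makes MediaWiki show a preformatted box.
        if line.startswith(" ") and line.strip():
            problems.append(f"line {number}: begins with a space")
        # Two adjacent plain-text lines end up in the same paragraph.
        if number > 1 and is_plain_text(line) and is_plain_text(lines[number - 2]):
            problems.append(f"line {number}: no blank line before it")
    return problems


if __name__ == "__main__":
    sample = "First paragraph.\nSecond paragraph.\n\n indented line\n* bullet\n* bullet"
    for problem in find_formatting_problems(sample):
        print(problem)
```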