Scholars' Commons Digitization Equipment

Instructions for the digitization equipment in the Wells Library Scholars' Commons

How to Use ABBYY FineReader for OCR

OCR stands for Optical Character Recognition; it is what makes PDFs searchable and editable. It is especially useful for projects that draw on many books or documents, and for projects that require data analysis you would not want to do by hand.

  1. Make sure you have PDF or Word copies of the documents you want to OCR, or are prepared to digitize them yourself.
  2. Open ABBYY FineReader from the desktop.
  3. Choose "Open PDF/Word/Excel/Etc" if you want to make the text of a PDF/Word document editable, or choose "Convert to PDF/Word/Excel/Etc" if you want to convert an image or PDF into one of these other formats.
  4. When your image, PDF, or Word doc opens in ABBYY, the program should automatically try to "recognize" or read your document. This may take a moment depending on the size of the document. Your pages will be on the left side of the screen, and the text output will be on the right.
  5. The program will try to highlight what it thinks is text in green boxes, and what it thinks are images (e.g. pictures on an otherwise text page) in red boxes.
    1. If it incorrectly recognizes part of the page, you can click on the colored box and a small box will appear at the top of the colored box. From there you can change the selected area to text/image/table/etc.
    2. You can use the "Delete" tool at the top of the window to delete boxes you do not want in your raw text version - either images or blocks of text you don't need.
    3. The "Hand" tool moves the boxes around, and "Recognition Area" allows you to draw your own boxes around the areas you want recognized, for example if you only want a single paragraph, or if your items are in columns, which ABBYY has a harder time recognizing.
    4. When you draw new boxes, you have to specify that they are text using the little box that pops up above each drawn box, and then choose "Recognize" at the top of the window.
  6. If ABBYY doesn't automatically read your pages, choose "Recognize" at the top of the window.
    1. ABBYY will run OCR on the document, letting you choose the language to detect and whether to detect pages and images.
  7. Alternatively, choose "Recognize and Verify in OCR Editor."
  8. You can copy and paste the raw text into a Word document, or choose "File > Save As" and pick an output format. This will save the text version shown at the right of the window.
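
Once the text is saved, it can be searched and counted by a script instead of by hand. The following is a minimal Python sketch of that idea, not part of ABBYY FineReader itself. It assumes you saved the recognized text as a plain text file; the function names and the sample text are made up for illustration, and the file encoding is an assumption that may need adjusting for your export.

```python
import re
from collections import Counter
from pathlib import Path


def load_text(path):
    """Read an exported text file (assumed UTF-8; undecodable bytes are replaced)."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


def count_terms(text, terms):
    """Count case-insensitive whole-word occurrences of each term."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    return {term: words[term.lower()] for term in terms}


def lines_containing(text, term):
    """Return (line_number, line) pairs for lines that mention the term."""
    term = term.lower()
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if term in line.lower()
    ]


# Illustrative sample standing in for text exported from the OCR step.
sample = "The scanner reads each page.\nEach page becomes text.\nText can be searched."

print(count_terms(sample, ["page", "text"]))
print(lines_containing(sample, "page"))
```

To run this against your own export, replace the sample with something like `load_text("my_scan.txt")`, where `my_scan.txt` is a hypothetical file name. OCR output often contains misread characters, so treat the counts as approximate until you have checked the text.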