The Real Reason Scanned PDFs Are So Large
When you scan a physical document, the scanner captures it as a photograph: a high-resolution raster image. That image is then embedded inside a PDF container. A letter-size page scanned at 300 DPI produces an image of roughly 2550x3300 pixels, which is about 25MB of uncompressed color data. Scanners compress each image (typically as JPEG) before saving it, but at 300 DPI color each page can still be 2-5MB before any PDF overhead. A 10-page document contains 10 of these images, and that is where the 20-50MB total comes from.
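The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, a US Letter page (8.5 x 11 inches) is assumed, and the 2-5MB per-page range is the article's estimate rather than something the code measures.

```python
def scan_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_size_mb(width_px, height_px, bytes_per_pixel=3):
    """Uncompressed image size in MB (1 MB = 1,000,000 bytes).

    bytes_per_pixel=3 corresponds to 24-bit color (red, green, blue).
    """
    return width_px * height_px * bytes_per_pixel / 1_000_000


# Assumed page size: US Letter, 8.5 x 11 inches.
width_px, height_px = scan_pixels(8.5, 11, 300)
print(width_px, height_px)                          # 2550 3300
print(round(raw_size_mb(width_px, height_px), 1))   # 25.2

# Ten pages at the article's estimate of 2-5MB per compressed page.
pages = 10
print(pages * 2, pages * 5)                         # 20 50
```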
Color Scanning vs. Grayscale vs. Black and White
Color scans are the largest. A color image stores red, green, and blue values for every pixel. Grayscale stores only brightness, one value per pixel instead of three. Black and white stores only on or off, a single bit per pixel. For most text documents (contracts, forms, handwritten notes), color scanning adds no useful information. Switching to grayscale reduces file size by roughly 65%. Switching to black and white for pure text reduces it by 80-90%.
- Color: up to 5MB per page (unnecessary for most text documents)
- Grayscale: 1.5-2MB per page
- Black and white: 0.3-0.8MB per page
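A rough way to see where these savings come from is to compare how many bits each mode stores per pixel. The sketch below assumes the common depths of 24 bits for color, 8 for grayscale, and 1 for black and white, and it looks only at uncompressed data. Real files are compressed, so the actual percentages (like the ones listed above) will differ from these raw ratios.

```python
# Assumed bit depths; scanners may offer others (for example 16-bit grayscale).
BITS_PER_PIXEL = {"color": 24, "grayscale": 8, "black_and_white": 1}


def raw_reduction_percent(mode, baseline="color"):
    """Percent reduction in uncompressed data relative to the baseline mode."""
    ratio = BITS_PER_PIXEL[mode] / BITS_PER_PIXEL[baseline]
    return (1 - ratio) * 100


for mode in ("grayscale", "black_and_white"):
    print(f"{mode}: {raw_reduction_percent(mode):.1f}% less raw data than color")
```

The grayscale ratio (one third of the data) lines up with the roughly 65% figure. The black and white ratio is larger than the 80-90% seen in practice, which is plausible because color and grayscale images already compress well.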
How to Fix an Already Large Scanned PDF
If the PDF has already been scanned at high resolution and in color, use the Compress PDF tool. Upload the file and apply High compression. For scanned text documents, High compression is generally safe: a scan holds far more image data than text needs to stay legible, so the redundant data can be discarded without hurting readability. A 30MB 10-page scanned contract typically compresses to under 2MB at High compression with fully readable text.
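For readers who prefer a command-line route, a similar result can be approximated with Ghostscript. This is a different technique from the Compress PDF tool, not a description of how that tool works. The sketch below only builds the command; the helper name and file names are hypothetical, and it assumes the `gs` executable is installed. The `/ebook` preset downsamples images to about 150 DPI, and `/screen` is more aggressive.

```python
import shlex

# Standard Ghostscript -dPDFSETTINGS presets, from smallest to largest output.
PRESETS = ("screen", "ebook", "printer", "prepress")


def ghostscript_compress_cmd(src, dst, preset="ebook"):
    """Build a Ghostscript command that rewrites src as a smaller PDF at dst."""
    if preset not in PRESETS:
        raise ValueError(f"preset must be one of {PRESETS}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        src,
    ]


# Hypothetical file names, for illustration only.
cmd = ghostscript_compress_cmd("scan.pdf", "scan-small.pdf")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Always check the output for legibility before deleting the original, since the right preset depends on the document.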
Preventing Large Scans in the Future
Change your scanner settings before scanning. For text documents like contracts, letters, and forms: scan at 150-200 DPI in grayscale. This produces sharp, readable text at a fraction of the size of a color 300 DPI scan. For documents with photos or color diagrams that must be preserved accurately: use 200-300 DPI color. For archival copies that must be searchable: 300 DPI grayscale with OCR is the professional standard.
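Resolution matters as much as color mode because pixel count grows with the square of the DPI. The sketch below estimates how much raw (uncompressed) data a scan holds relative to a 300 DPI color scan of the same page. The function name is illustrative, and the results describe raw data only, so compressed file sizes will not match them exactly.

```python
# Assumed bytes per pixel: 24-bit color and 8-bit grayscale.
BYTES_PER_PIXEL = {"color": 3, "grayscale": 1}


def relative_raw_size(dpi, mode, base_dpi=300, base_mode="color"):
    """Raw data as a fraction of a base scan of the same page."""
    pixel_ratio = (dpi / base_dpi) ** 2   # both width and height scale with DPI
    depth_ratio = BYTES_PER_PIXEL[mode] / BYTES_PER_PIXEL[base_mode]
    return pixel_ratio * depth_ratio


for dpi in (150, 200, 300):
    share = relative_raw_size(dpi, "grayscale")
    print(f"{dpi} DPI grayscale: {share:.1%} of a 300 DPI color scan")
```

Under these assumptions, a 150 DPI grayscale scan carries about one twelfth of the raw data of a 300 DPI color scan.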
Why OCR Makes Scanned PDFs Both Smaller and Searchable
Running OCR on a scanned PDF recognizes the characters in each page image and stores them as a text layer. OCR alone does not shrink the file; what happens to the images decides the size. Some OCR tools replace the image pages with the recognized text, which dramatically reduces file size but discards the original appearance of the page. Others add searchable text without removing the images. Use the PDF to Text tool to extract searchable text from scanned PDFs. For documents that must remain as PDF but be searchable, professional OCR tools create an invisible text layer while keeping the visual appearance.
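As one example of the invisible-text-layer approach, the open-source OCRmyPDF command-line tool keeps the page images and adds searchable text to them. It is named here as an illustration, not as the tool this article refers to. The sketch below only builds the command; the helper name and file names are hypothetical, and it assumes `ocrmypdf` and its OCR language data are installed.

```python
import shlex


def ocrmypdf_cmd(src, dst, language="eng", optimize=1):
    """Build an OCRmyPDF command that adds a searchable text layer to src."""
    if optimize not in (0, 1, 2, 3):
        raise ValueError("optimize must be 0, 1, 2, or 3")
    return [
        "ocrmypdf",
        "-l", language,                # OCR language
        "--optimize", str(optimize),   # 0 disables image optimization
        "--skip-text",                 # leave pages that already contain text
        src,
        dst,
    ]


# Hypothetical file names, for illustration only.
cmd = ocrmypdf_cmd("scan.pdf", "scan-searchable.pdf")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```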