Going paperless? Get a Canon P-215 ($285) or Canon P-208 ($170), PDFScanner for the Mac ($15), and get going.

We have a great little Swedish-made birch wood filing cabinet that we use to store our paperwork: bank statements, legal documents, that stuff. However, the filing cabinet is almost full (the USA likes paperwork), and I’ve refused to get another one, because more paper means more space and more waste. So, I decided to go paperless: scan everything, store it as PDFs, and shred the paper after it’s scanned.

There are some services (such as Outbox in the USA) that will do this for you. Call me oldskool, but they’re just not my thing: I don’t like the idea of other people reading my mail, any more than I like the idea of other people reading my email. I’d also prefer not to pay a subscription every year, though honestly, at ~$10/month, these services aren’t that expensive if you decide that you really hate doing this yourself. So I decided to see if I could find a solution that did what I wanted:

  • scan stuff fast: 5+ pages per minute, with double-sided (a.k.a. duplex) scanning;
  • scan stuff in bulk: at least 10 pages at a time, and save out selected pages to different documents;
  • OCR everything: save as text-searchable PDF (searchable by Spotlight on OS X);
  • and, the biggest problem with scanners: have software that wasn’t a pain in the arse to use.

Doesn’t seem that onerous, does it? Well, I’ve actually been looking for a solution for a long time now…

The Fujitsu ScanSnap

I’ve had the legendary Fujitsu ScanSnap S300M for years. The ScanSnap series are the kings of document scanners: duplex, fast, sheet feeder; pick three. Unfortunately, they suffered from one large problem: you had to use Fujitsu’s own software to do the scanning, because they didn’t have drivers for TWAIN, the standard scanner driver interface on OS X and Windows. This meant that on a Mac, you couldn’t use Image Capture, Preview, or Photoshop to scan things, and were limited by what Fujitsu’s software could do. Double unfortunately, Fujitsu’s software is what you’d expect from a company that makes printers and scanners: not great. The more expensive ScanSnaps support OCR, but the S300M didn’t, and I couldn’t cobble together a satisfactory workflow that used the Fujitsu software and integrated it with an OCR program. I tried hacking together Automator scripts, custom AppleScripts, shell and Python stuff, and DevonThink Pro Office (which has native support for the ScanSnap), but couldn’t find anything satisfactory.

Taking a different route inspired by Linux geekdom, I did try TWAIN SANE for OS X, which takes the hundreds of drivers available for the Linux SANE scanning framework, and presents them as TWAIN devices for OS X. Unfortunately, the driver for the S300M didn’t work: sane-find-scanner would find it just once, and never find it again, and scanimage never worked. I would’ve loved to debug it and fix it, but that “life” thing keeps getting in the way. So, death to all scanners with a proprietary interface. What other options are there?

The Canon P-215

Thankfully, Canon’s entered the market with their very silly-named, yet totally awesome Canon imageFORMULA P-215 Scan-tini Personal Document Scanner. It’s like the Fujitsu ScanSnap S300M, but better in every way. It scans 15ppm instead of the ScanSnap’s 8ppm; has a 20-page sheet feeder instead of 10; can be powered from a single USB port; and most importantly, it’s TWAIN-compliant. The P-215 is $285 on Amazon. The Wirecutter, a fantastic gadget review site, agrees that the P-215 is the most awesomest portable scanner around.

There’s also the Canon P-208 for about $100 less, which is basically the P-215 lite: smaller and slower (about the ScanSnap’s speed), but still TWAIN-compliant. I use the P-215, but see no reason why the P-208 would be significantly worse for what I do. I probably would’ve got the P-208 if it were $170: at the time I bought mine, the P-215 was $270 and the P-208 was $230, and I thought the $40 was worth the extra features. $170 is a much better price than $270, though, so look into the P-208 seriously if you’re considering this.

The scanner is designed so that it presents itself as both a scanner and a USB drive when you plug it in, with the USB drive containing the scanning software, so you’re never without it. Clever. The software that comes with the Canon is actually “not that bad” as far as scanning software goes… which still means it’s pretty craptastic. The Wirecutter does a good job of reviewing the software, so I won’t review it here, except to say that my standards for quality software are probably higher than the Wirecutter’s. However, since the P-215 is TWAIN-compliant, you can use any software you want to scan stuff. So, what’s some good scanning software?

PDFScanner

Thankfully, one coder from Germany was sick of all the crappy scanning software out there, and wrote his Own Damn Scanning Software, creatively named PDFScanner. PDFScanner is just plain excellent. If you have a scanner at all, just go buy it. It’s a measly $15, and I guarantee you that it’s orders of magnitude better than the tosspot scanning software that you got with your scanner.

  • It does OCR.
  • It does deskewing.
  • It detects blank pages and removes them from the scan.
  • You can re-order and remove pages that you scanned. (OMG! Wow!)
  • It’s multithreaded so it OCRs and deskews multiple pages at once, and puts all those cores in your Mac to work. Oooer.
  • It has a “fake duplex” mode, so that if your scanner doesn’t support duplex scanning, you can scan the first side of all your pages, then the second side of all your pages. Cool.
  • You can select some pages out of the 20 you scanned, save just those pages as a single PDF, then remove them from the scan. Imagine that.
  • You can select different compression levels when saving the PDF.
  • It can import existing PDFs and OCR them.

Paired with PDFScanner, the P-215 will quickly munch through your paper documents so that you can happily shred them afterwards. The shredding is satisfying.

Did I mention that PDFScanner is $15? It’s $15. I hope the guy makes 10x as much money from it as the fools who write the retarded scanning software that comes with scanners.

Recommendations

I scan stuff at 300dpi: a one-page document is around 800KB. You may want to scan stuff at higher resolutions if you’re paranoid about reproducing documents faithfully should you ever need to re-print them. I don’t think it’s necessary.

I have a folder creatively named “Filing Cabinet” that I throw all my documents into. (Lycaeum is also a cool name if you’re an Ultima geek.) The top-level folder structure inside there mostly resembles the physical filing cabinet: “Banks”, “Cars”, “Healthcare”, etc, and works well. One of the nice things about a digital filing cabinet is that you’re not limited to just one level of dividers: just go create sub-folders inside your top-level folders, e.g. “Insurance” inside “Cars”. (Such advanced technology!) I include the date in the filename for most of my scans in YYYYMMDD format, but not all of them (e.g. I do for bank statements and car service appointments, but not for most work-related material).
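
If you’d rather not type those names by hand, the convention is easy to sketch in Python. This is only an illustration, not something PDFScanner does for you: the helper name filed_path, the location of the folder in your home directory, and putting the date at the front of the filename are all my assumptions (the only firm parts are the “Filing Cabinet” folder, the sub-folders, and the YYYYMMDD format).

```python
from datetime import date
from pathlib import Path

# Assumed location: a "Filing Cabinet" folder in the home directory.
CABINET = Path.home() / "Filing Cabinet"


def filed_path(category, title, scanned_on=None, subfolder=None, root=CABINET):
    """Build a path such as <root>/Cars/Insurance/20130405 Policy renewal.pdf.

    filed_path is a hypothetical helper. It only builds the path; it does
    not create folders or move any files.
    """
    folder = root / category
    if subfolder:
        folder = folder / subfolder
    # The date is optional, since not every scan gets one.
    # Placing it first is an assumption made for this sketch.
    prefix = scanned_on.strftime("%Y%m%d") + " " if scanned_on else ""
    return folder / f"{prefix}{title}.pdf"


if __name__ == "__main__":
    print(filed_path("Cars", "Policy renewal", date(2013, 4, 5), subfolder="Insurance"))
    print(filed_path("Work", "Meeting notes"))
```

One side benefit of leading with YYYYMMDD: sorting a folder by name also sorts it by date.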

Since all these documents contain some sensitive information, you do want to store them securely, but still be able to access them conveniently. I store my stuff on Google Drive since I trust Google’s security (go two-factor authentication).

I expect the virtual filing cabinet to grow to 10-20GB for a few years of data, which is peanuts these days. I’m happy to pay Google the two-digit cost per year to have that much storage space in Google Drive.
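
For a rough sense of what that means in pages, here’s the back-of-the-envelope arithmetic in Python. It assumes every page comes out at about 800KB (my 300dpi figure) and uses decimal units (1GB = 1000MB), so treat the result as a ballpark rather than a measurement; pages_that_fit is just a name made up for this sketch.

```python
PAGE_BYTES = 800 * 1000  # ~800KB per scanned page at 300dpi (assumed average)


def pages_that_fit(gigabytes, page_bytes=PAGE_BYTES):
    """Estimate how many scanned pages fit in the given number of gigabytes."""
    return (gigabytes * 1000**3) // page_bytes


if __name__ == "__main__":
    for gb in (10, 20):
        print(f"{gb}GB holds roughly {pages_that_fit(gb)} pages")
```

With those assumptions, 10GB works out to 12,500 pages and 20GB to 25,000.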

Small Niggles

The workflow’s not perfect, but it really is close enough to perfect that I feel it’s about as streamlined as you can get. You do need to double-check that every physical page made it through to the final PDF correctly, which is just common sense. Don’t expect to scan 1000 pages, click Save and shred things without cross-checking.

Feeding paper into the Canon P-215 in the orientation that you’d expect means that you have to tell PDFScanner to scan in “Portrait (Upside Down)” mode instead of just “Portrait” mode, which seems a bit odd. (I’d blame this on the Canon TWAIN driver rather than PDFScanner, mind you.) No other side-effects besides having to pick the upside-down orientation, though.

Scanning in a mix of landscape and portrait documents doesn’t work if you want to OCR the whole batch, because PDFScanner will OCR the scan in its original orientation only, and won’t let you re-run OCR after you’ve rotated the page. This just means that you have to scan in portrait documents in one batch, and landscape documents in another batch. Not a big deal, especially since scanning a batch in PDFScanner is simply pressing the “Scan” button. I’ve emailed the PDFScanner author about this, and got a response back within a day saying that he’ll consider adding it to the next version. Maybe it’s already fixed.

It’d be a bonus for the P-215 to be wireless, instead of requiring a USB cable. However, the scanner’s so small and portable that I just grab it, scan stuff, then put it back on the shelf, so this isn’t a problem for me. If you really want, you can buy a fairly expensive $170 WiFi adapter for the P-215. At least you get a battery pack for your $170 too.

The Paperless Workflow In Practice

In practice, the workflow’s worked out quite well. I’m now rapidly churning through the entire filing cabinet’s worth of documents, and have shredded 99% of the paper. The workflow is near-perfect, with the very small caveats mentioned above.

I do keep a few physical documents, such as things printed on special paper (e.g. US social security card), birth certificates, etc. Those exceptions are extremely rare though; in total, I’ve managed to reduce an 80-90cm stack of paper (~3 feet) to about an inch. I love going paperless, and would highly encourage anyone who’s been thinking about it to get a Canon P-208 or P-215, PDFScanner, and make the leap. I said that the shredding is satisfying, right? It’s very satisfying.