Sunday, March 16, 2014

NFC Ring Project

I believe I've found my next project.  It appears that the NFC Ring is about to hit the streets.  There's a bowl on my dresser where I toss my keys, pen, and whatever else is in my pockets when I arrive home.  I'm thinking that I could marry a Raspberry Pi, an NFC sensor, and an NFC Ring, so that dropping the ring into the bowl (or other container) turns on the lights in the office and sends a WOL (Wake-on-LAN) packet to the computer.  Conversely, removing the ring from the bowl should ensure that the lights go off (after a certain period of time).
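Here's a rough sketch of the logic, just to see whether the idea hangs together.  It assumes the NFC reader can be polled for "ring present / not present"; the class name BowlMonitor, the action names, and the five-minute default delay are all made up for illustration, and the actual light control is left out.  The Wake-on-LAN part is the standard magic packet: six 0xFF bytes followed by the target MAC address repeated 16 times, sent as a UDP broadcast.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("MAC address must contain 12 hex digits")
    return b"\xff" * 6 + bytes.fromhex(hex_digits) * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local network (port 9 is customary)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


class BowlMonitor:
    """Hypothetical state tracker: turns ring readings into actions."""

    def __init__(self, off_delay_s: float = 300.0):
        self.off_delay_s = off_delay_s  # assumed delay before lights go off
        self.ring_present = False
        self.lights_on = False
        self.removed_at = None

    def update(self, present: bool, now: float) -> list:
        """Feed one sensor reading; return the actions to perform."""
        actions = []
        if present and not self.ring_present:
            # Ring just landed in the bowl: cancel any pending "off" timer.
            self.removed_at = None
            if not self.lights_on:
                self.lights_on = True
                actions += ["lights_on", "send_wol"]
        elif not present and self.ring_present:
            # Ring just left the bowl: start the "off" timer.
            self.removed_at = now
        self.ring_present = present

        timer_expired = (
            self.removed_at is not None
            and now - self.removed_at >= self.off_delay_s
        )
        if not present and self.lights_on and timer_expired:
            self.lights_on = False
            self.removed_at = None
            actions.append("lights_off")
        return actions
```

The main loop on the Pi would call update() with each sensor reading and the current time, then call send_wol() with the computer's MAC address whenever "send_wol" shows up in the returned list.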

I got the above idea while researching NFC tags.  I'd seen a project where someone drops his phone into a bowl with an NFC tag in the bottom, at home, at the office, and at the gym.  Each of the tags causes the phone to change settings, join Wi-Fi, etc.

The ring project looks a bit more attractive in that I only need it at one location, and the charge on the phone remains an issue (I can barely make it through the day on one charge).  What'd'ya think?

- Tim

Monday, March 10, 2014

The 757 web site has been wobbly of late and users are starting to complain about the amount of space taken up by 11+ years of blog posts and wiki entries.  As such, NeighborhoodTechie will become the home of the blog and the wiki is being converted to book form (more about this later).

Sunday, March 9, 2014

PDF text extraction tools

Building a larger tool out of a collection of smaller tools can be quite a learning experience.  For the past few months, I've been working on a document search engine to hold and index a collection of PDF files which were generated via the PrintFriendly browser app.

In the past month, I learned a few things about PDFs and extracting text from them.  

- PDF is not a document language like DocBook or HTML.  Rather, it is more of a typesetting language, in that letters are positioned individually on a page.
- I found no good tools for properly extracting text from a PDF (commercial tools included).
- Most text extraction tools cannot properly handle the letters "f", "o", "ll", and "t".
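One possible contributor to the mangled letters is ligatures, where a PDF stores a pair like "fi" or "ffl" as a single glyph.  That is only an assumption on my part, not something I've confirmed as the cause here.  If the extracted text does contain the Unicode ligature characters, a small cleanup pass like the sketch below would expand them back into plain letters before indexing.  The function name expand_ligatures is made up for illustration.

```python
# Map the Unicode "Latin ligature" code points to their plain-letter spellings.
# Assumption: the extractor emitted these characters; if it dropped the glyphs
# entirely, this pass cannot recover them.
LIGATURES = str.maketrans({
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
})


def expand_ligatures(text: str) -> str:
    """Replace ligature characters with the letters they stand for."""
    return text.translate(LIGATURES)
```

An explicit table is used rather than blanket Unicode normalization so that nothing else in the text gets changed.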

Of the various tools tested, it appears that Calibre's ebook-convert produces the cleanest straight-text output.  I'm using that in the text extraction piece of the search and including the ability to edit the extracted text (to improve the searches).
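For the curious, the extraction step can be sketched roughly as below.  This is not the actual code from the search engine, just a minimal wrapper under the assumption that Calibre is installed and ebook-convert is on the PATH; the function names and directory layout are hypothetical.  ebook-convert picks the output format from the output file's extension, so a ".txt" target yields plain text.

```python
import subprocess
from pathlib import Path


def build_convert_command(pdf_path: str, out_dir: str) -> list:
    """Build the ebook-convert command line: input file, then output file."""
    pdf = Path(pdf_path)
    txt = Path(out_dir) / (pdf.stem + ".txt")
    return ["ebook-convert", str(pdf), str(txt)]


def extract_text(pdf_path: str, out_dir: str) -> str:
    """Run ebook-convert and return the extracted text for later editing."""
    cmd = build_convert_command(pdf_path, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(cmd, check=True)
    return Path(cmd[2]).read_text(encoding="utf-8", errors="replace")
```

Keeping the text in a separate file next to the PDF makes it easy to hand-correct the extraction without touching the original document.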