I believe I've found my next project. It appears that the NFC Ring is about to hit the streets. There's a bowl on my dresser where I toss my keys, pen, and whatever else is in my pockets when I arrive home. I'm thinking that I could marry a Raspberry Pi, an NFC sensor, and an NFC Ring, so that dropping the ring into the bowl (or other container) turns on the lights in the office and sends a WOL (Wake-on-LAN) packet to the computer. Conversely, removing the ring from the bowl should ensure that the lights go off (after a certain period of time).
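Here's a rough sketch of how the Pi side might look in Python. Nothing here is built yet: the `BowlMonitor` class, its callbacks, and the five-minute delay are hypothetical names and values I made up for illustration, and reading the NFC sensor itself is left out because it depends on the hardware. The WOL part follows the standard magic packet layout (six 0xFF bytes followed by the target MAC address repeated 16 times), sent as a UDP broadcast.

```python
import socket
from typing import Callable, Optional


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 6-byte MAC address: {mac!r}")
    return b"\xff" * 6 + bytes.fromhex(hex_digits) * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is a common choice)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


class BowlMonitor:
    """Hypothetical state tracker: fire on_arrive when the ring shows up,
    and on_leave once it has been missing for off_delay seconds."""

    def __init__(self, on_arrive: Callable[[], None],
                 on_leave: Callable[[], None], off_delay: float = 300.0):
        self.on_arrive = on_arrive
        self.on_leave = on_leave
        self.off_delay = off_delay
        self.active = False                      # True while the lights are on
        self.absent_since: Optional[float] = None

    def update(self, ring_present: bool, now: float) -> None:
        """Call on every sensor poll with the current reading and a timestamp."""
        if ring_present:
            self.absent_since = None
            if not self.active:
                self.active = True
                self.on_arrive()
        elif self.active:
            if self.absent_since is None:
                self.absent_since = now          # start the off-delay timer
            elif now - self.absent_since >= self.off_delay:
                self.active = False
                self.absent_since = None
                self.on_leave()
```

In use, `on_arrive` would switch on the lights and call `send_wol()` with the computer's MAC address, and a polling loop would call `update()` with whatever the NFC sensor reports and `time.monotonic()`. Keeping the timer inside `update()` means a ring that is briefly lifted and put back doesn't turn the lights off.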
I got the above idea while researching NFC tags. I'd seen a project where someone keeps a bowl with an NFC tag in the bottom at his home, office, and gym, and drops his phone into it when he arrives. Each of the tags causes the phone to change settings, join Wi-Fi, etc.
The ring project looks a bit more attractive in that I only need it at one location, and the charge on my phone remains an issue (it can barely make it through the day on one charge). What'd'ya think?
- Tim
Sunday, March 16, 2014
Monday, March 10, 2014
Sunday, March 9, 2014
PDF text extraction tools
Building a larger tool out of a collection of smaller tools can be quite a learning experience. For the past few months, I've been working on a document search engine to hold and index a collection of PDF files which were generated via the PrintFriendly browser app.
In the past month, I learned a few things about PDFs and extracting text from them.
- PDF is not a document language like DocBook or HTML. Rather, it is more of a typesetting language, in that letters are positioned individually on a page.
- There are no good tools to properly extract text from a PDF (commercial tools included).
- Most text extraction tools cannot properly handle the letters "f", "o", "ll", and "t".
Of the various tools tested, it appears that Calibre's ebook-convert produces the cleanest straight-text output. I'm using it for the text extraction piece of the search engine, and I'm including the ability to edit the extracted text (to improve the searches).
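For anyone who wants to try the same thing, here's a minimal sketch of calling ebook-convert from Python. It assumes Calibre is installed and `ebook-convert` is on the PATH. The function names are hypothetical, and this is not the actual code from the search engine. ebook-convert picks the output format from the output file's extension, so a `.txt` target gives plain text.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(pdf_path: str, txt_path: str) -> List[str]:
    """Return the ebook-convert command line for a PDF -> plain text run."""
    if Path(txt_path).suffix.lower() != ".txt":
        raise ValueError("output file must end in .txt to get plain text")
    return ["ebook-convert", str(Path(pdf_path)), str(Path(txt_path))]


def extract_text(pdf_path: str, txt_path: str) -> str:
    """Run ebook-convert and return the extracted text."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    subprocess.run(build_convert_command(pdf_path, txt_path), check=True)
    # Assumes UTF-8 output; undecodable bytes are replaced, not fatal.
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")
```

Writing the text to a file (rather than just capturing it) leaves a copy on disk that can be hand-edited before it goes into the index.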