Thursday, February 18, 2016

DMS status

Over the past few weeks, I've added the following features to the Document Management Solution:

- Source URL guesser - I heavily use PrintFriendly to capture web site articles.  Its PDF output includes the source URL in the first 100 lines of the PDF, so it is easily recovered via grep "/URI (http" --text -m1 filename.
- Title guesser - Somewhat less accurate than the above but still useful.  PrintFriendly's PDFs include the page title in the resulting filename (using the format "site-title.pdf").  All it takes is trimming the site name and switching underscores to spaces in the remaining filename.
- Drag and drop processing of files.
- Keyboard shortcuts to "press" various PrintFriendly buttons which are keyboard-unfriendly (i.e., they have no shortcuts or searchable text associated with them).
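
For illustration, the two guessers could be sketched in Python roughly as follows. This is not the DMS code itself: the function names are hypothetical, the 100-line limit is exposed as a parameter, and the title guesser assumes the site name is everything before the first hyphen in the filename, which may not hold for every capture.

```python
import re
from pathlib import Path

# Matches a PDF link annotation such as "/URI (http://example.com/page)".
# Assumption: the URL contains no escaped parentheses.
URI_PATTERN = re.compile(rb"/URI \((https?://[^)]*)\)")


def guess_source_url(pdf_bytes, max_lines=100):
    """Return the first /URI target found in the first max_lines lines, or None."""
    for line in pdf_bytes.splitlines()[:max_lines]:
        match = URI_PATTERN.search(line)
        if match:
            # Latin-1 decodes any byte value, so this never raises.
            return match.group(1).decode("latin-1")
    return None


def guess_title(filename):
    """Guess a title from a "site-title.pdf" style filename."""
    stem = Path(filename).stem  # drop the directory and the ".pdf" extension
    # Assumption: the site name ends at the first hyphen.
    _site, sep, rest = stem.partition("-")
    title = rest if sep else stem  # no hyphen: keep the whole stem
    return title.replace("_", " ").strip()
```

For a file on disk, the URL guesser would be called as guess_source_url(Path(name).read_bytes()). Like the grep one-liner, it treats the PDF as raw bytes rather than parsing it, so it only works when the URL is stored uncompressed.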

The end result is approximately an 80-85% reduction in manual processing time.  Still to go:

- adding incremental indexing
- removing the need for nightly full indexing
- adding "indexed" timestamp to record metadata