Over the past few weeks, I've added the following features to the Document Management Solution:
- Source URL guesser - I rely heavily on PrintFriendly to capture website articles. Its PDF output includes the source URL within the first 100 lines of the PDF, so it is easily recovered via grep "/URI (http" --text -m1 filename.
- Title guesser - Somewhat less accurate than the above, but still useful. PrintFriendly's PDFs include the page title in the resulting filename (in the format "site-title.pdf"). All it takes is trimming off the site name and switching underscores to spaces in the remaining filename.
- Drag and drop processing of files.
- Keyboard shortcuts to "press" various PrintFriendly buttons that are keyboard-unfriendly (i.e., no shortcuts or searchable text associated with them).
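
For illustration, here is a minimal Python sketch of the two guessers described above. It is not the actual implementation: the function names (guess_source_url, guess_title) and the max_lines parameter are hypothetical, the URL pattern mirrors the grep expression rather than parsing the PDF properly, and the title logic assumes the site name itself contains no hyphen.

```python
import re
from pathlib import Path
from typing import Optional

# Same idea as: grep "/URI (http" --text -m1 filename
# Assumption: the URL contains no escaped ")" characters.
URI_PATTERN = re.compile(rb"/URI \((https?://[^)]*)\)")


def guess_source_url(pdf_bytes: bytes, max_lines: int = 100) -> Optional[str]:
    """Return the first /URI target found in the first max_lines lines, or None."""
    # Only look at the head of the file, where the source URL is expected.
    head = b"\n".join(pdf_bytes.split(b"\n")[:max_lines])
    match = URI_PATTERN.search(head)
    return match.group(1).decode("latin-1") if match else None


def guess_title(filename: str) -> str:
    """Guess a title from a "site-title.pdf" style filename."""
    stem = Path(filename).stem  # drop the ".pdf" extension
    # Assumption: everything before the first hyphen is the site name.
    _site, sep, rest = stem.partition("-")
    title = rest if sep else stem  # no hyphen: keep the whole stem
    return title.replace("_", " ").strip()
```

In use, guess_source_url(Path(name).read_bytes()) would stand in for the grep call, and guess_title(name) for the filename cleanup. Both are guesses, so the results are best treated as pre-filled values that can be corrected by hand.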
The end result is an approximately 80-85% reduction in manual processing time. Still to go:
- adding incremental indexing
- removing need for nightly full indexing
- adding "indexed" timestamp to record metadata