I've been experimenting (again) with search engines other than the Sphinx/MySQL-based document management system (DMS) we currently use in-house, in an attempt to come up with a less input-intensive approach to managing thousands of documents (coming up on 11K). The tool I'm currently testing is Recoll, which I've worked with before.
To make each document's metadata more portable, I'm working on embedding it in the document's own EXIF data (via the exif tool). This approach dovetails nicely with my kluge of Gleebox, Chromix, and (recently added) SurfingKeys browser extensions.
I've added an "Edit" link to Recoll's "Open/Download/Preview" menu. Clicking on Edit takes the user to the metadata editor from the existing DMS. Of course, the editor no longer saves to the database; instead, it saves the metadata in the document's EXIF header.
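As a rough illustration of the save step, here is a minimal sketch that assumes the "exif tool" is Phil Harvey's ExifTool command-line program and that the editor shells out to it. The function names, the choice of the Title and Keywords tags, and the example path are my assumptions for illustration, not the editor's actual code.

```python
import subprocess


def build_exiftool_args(path, title=None, keywords=()):
    """Build an ExifTool command line that writes metadata into `path`.

    Hypothetical helper: assumes the ExifTool binary is on PATH as `exiftool`.
    """
    # -overwrite_original stops ExifTool from leaving a "_original" backup copy.
    args = ["exiftool", "-overwrite_original"]
    if title is not None:
        args.append(f"-Title={title}")
    # Assigning a list-type tag several times replaces the stored list
    # with exactly these values.
    for keyword in keywords:
        args.append(f"-Keywords={keyword}")
    args.append(path)
    return args


def write_metadata(path, title=None, keywords=()):
    """Run ExifTool; raises CalledProcessError if the write fails."""
    subprocess.run(build_exiftool_args(path, title, keywords), check=True)
```

Calling `write_metadata("/docs/report.pdf", title="Q3 report", keywords=["finance", "quarterly"])` would then replace the title and keyword list stored in that (made-up) file. Passing the arguments as a list rather than a shell string avoids quoting problems with titles that contain spaces.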
I've enabled display of tags (Recoll calls them "keywords") in the Web-UI's output. This was a simple addition because Recoll already indexes keywords from document EXIF headers (if they exist). In a future version, I intend to modify the template so that each tag is actually a link to a listing of other documents with the same keyword. Implementation will likely require a SQLite3 database that is rebuilt periodically (nightly?).
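The tag-listing idea could be sketched roughly as follows. This is only one possible layout: the `doc_tags` table, the function names, and the assumption that some earlier step has already produced (path, keywords) pairs for every document are all hypothetical.

```python
import sqlite3


def rebuild_tag_index(conn, documents):
    """Recreate the tag table from scratch from (path, keywords) pairs.

    `documents` is any iterable of (path, list_of_keywords); how those
    pairs are extracted from the documents is outside this sketch.
    """
    # Normalise tags (trim, lower-case) and drop duplicates and blanks.
    rows = {
        (keyword.strip().lower(), path)
        for path, keywords in documents
        for keyword in keywords
        if keyword.strip()
    }
    with conn:  # commit on success, roll back on error
        conn.execute("DROP TABLE IF EXISTS doc_tags")
        conn.execute(
            "CREATE TABLE doc_tags ("
            " tag TEXT NOT NULL,"
            " path TEXT NOT NULL,"
            " PRIMARY KEY (tag, path))"
        )
        conn.executemany(
            "INSERT INTO doc_tags (tag, path) VALUES (?, ?)", sorted(rows)
        )


def documents_with_tag(conn, tag):
    """Return the paths of all documents carrying `tag`, sorted by path."""
    cursor = conn.execute(
        "SELECT path FROM doc_tags WHERE tag = ? ORDER BY path",
        (tag.strip().lower(),),
    )
    return [row[0] for row in cursor]
```

A nightly job would call `rebuild_tag_index` once, and each tag link in the template would map to a `documents_with_tag` lookup. Rebuilding the table wholesale keeps the database disposable, which fits the goal of not having a separate store to maintain.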
So far, I have the following opinions about Recoll:
- The approach is much more portable as there's no longer a separate database to replicate/back up/otherwise maintain.
- I don't have to write additional document parsers (\o/ -yay!). Not that I have very many Word documents in the data store but...
- I cannot seem to get consistent output from the same search phrase. When more than one page of results exists, relevance sorting returns a slightly different ordering each time the query is run. Note: this may be a result of my ongoing metadata updates, but it should not affect results to the degree that it does.
- There's no "sort by title" search option. Shouldn't this be a must-have?
Overall, it's a usable tool but the following may make it more attractive:
- Thumbnails in the results.
- Keywords which are individual links to external tag lists.
- Triggering recollindex via inotify.
- Sort by title.
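For the inotify item, one way it might work is a small watcher script. The sketch below assumes the `inotifywait` program from inotify-tools is installed, runs plain `recollindex` for an incremental update, and waits for a quiet period so that a burst of file changes triggers a single indexing pass. The `Debouncer` class, the function name, and the 5-second default are illustrative choices of mine.

```python
import os
import select
import subprocess
import time


class Debouncer:
    """Collapse a burst of events into one action after `quiet` seconds."""

    def __init__(self, quiet):
        self.quiet = quiet
        self.last_event = None  # time of the most recent unhandled event

    def event(self, now):
        self.last_event = now

    def due(self, now):
        """True once per burst, when `quiet` seconds passed since the last event."""
        if self.last_event is not None and now - self.last_event >= self.quiet:
            self.last_event = None
            return True
        return False


def watch_and_index(directory, quiet=5.0):
    """Run recollindex whenever files under `directory` settle after changes."""
    watcher = subprocess.Popen(
        ["inotifywait", "-m", "-r", "-q",
         "-e", "close_write,moved_to,delete", directory],
        stdout=subprocess.PIPE,
    )
    fd = watcher.stdout.fileno()
    debouncer = Debouncer(quiet)
    while watcher.poll() is None:
        # Wake at least once a second so the quiet period can expire.
        ready, _, _ = select.select([fd], [], [], 1.0)
        if ready:
            if not os.read(fd, 65536):
                break  # inotifywait closed its output
            debouncer.event(time.monotonic())
        if debouncer.due(time.monotonic()):
            subprocess.run(["recollindex"], check=False)
```

Calling `watch_and_index("/path/to/documents")` blocks and keeps the index fresh for as long as the watcher process runs. The debounce matters because a single metadata save can generate several filesystem events.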
The new Web-UI result.tpl template is posted on GitHub.