Monday, December 31, 2018

Modify Recoll's Web-UI Template

I've been experimenting (again) with search engines other than the Sphinx/MySQL-based document management system (DMS) currently used in-house, in an attempt to come up with a less input-intensive approach to managing thousands (coming up on 11K) of documents. The tool that I'm currently testing is Recoll, something that I've worked with before.

In an attempt to make each document's metadata more portable, I'm working on embedding it within each document's EXIF data (via the exif tool). This approach dovetails nicely with my kluge of Gleebox, Chromix, and (recently added) SurfingKeys browser extensions.

Mod 1

I've added an "Edit" link to Recoll's "Open/Download/Preview" menu. Clicking on Edit takes the user to the metadata editor from the existing DMS. Of course, the editor no longer saves to the database. Instead, it saves the metadata in the document's EXIF header.

Mod 2

I've enabled display of tags (Recoll calls them "keywords") in the Web-UI's output. This was a simple addition because Recoll already indexes keywords from document EXIF headers (if they exist). In a future version, I intend to modify the template so that each tag is actually a link to a listing of other documents with the same keyword. Implementation will likely require use of a SQLite3 database, which is periodically (nightly?) rebuilt.
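
As a rough idea of what that periodically rebuilt SQLite3 lookup might look like, here is a minimal sketch. The table name doc_keywords, the function names, and the (url, keywords) input shape are all hypothetical; it assumes the keywords have already been pulled out of each document by some other step, and it is not tied to Recoll's own index format.

```python
import sqlite3


def build_keyword_index(conn, documents):
    """Rebuild the keyword -> document table from scratch.

    `documents` is an iterable of (url, keywords) pairs, where `keywords`
    is a list of tag strings already extracted from each document.
    (Hypothetical input shape, chosen for illustration.)
    """
    conn.execute("DROP TABLE IF EXISTS doc_keywords")
    conn.execute(
        "CREATE TABLE doc_keywords ("
        " keyword TEXT NOT NULL,"
        " url TEXT NOT NULL,"
        " PRIMARY KEY (keyword, url))"
    )
    # Normalise tags so "Invoice" and "invoice " land on the same listing,
    # and use a set so duplicate pairs don't violate the primary key.
    rows = {
        (kw.strip().lower(), url)
        for url, keywords in documents
        for kw in keywords
        if kw.strip()
    }
    conn.executemany(
        "INSERT INTO doc_keywords (keyword, url) VALUES (?, ?)", sorted(rows)
    )
    conn.commit()


def documents_for(conn, keyword):
    """Return the URLs of all documents carrying `keyword`, sorted by URL."""
    cur = conn.execute(
        "SELECT url FROM doc_keywords WHERE keyword = ? ORDER BY url",
        (keyword.strip().lower(),),
    )
    return [row[0] for row in cur]
```

Dropping and recreating the table on each run keeps the rebuild simple, which suits a nightly job; a tag link in the template would then only need to call something like documents_for() with the clicked keyword.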

So far, I have the following opinions about Recoll:

Pros

  • The approach is much more portable as there's no longer a separate database to replicate/back up/otherwise maintain.
  • I don't have to write additional document parsers (\o/ -yay!). Not that I have very many Word documents in the data store but...

Cons

  • A C++ engine that uses HTML templates for the front-end, which contain embedded Python commands, with JavaScript and CSS making everything look pretty. Need I say more?
  • I cannot seem to get consistent output from the same search phrase. When more than one page of results exists, relevance sorting returns a slightly different ordering each time the query is run. Note: this may be a result of my ongoing updating of metadata info, but it should not affect results to the degree that it does.
  • There's no "sort by title" search option. Shouldn't this be a must-have?
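
Both sorting complaints could, in principle, be worked around after the fact in the Python side of the Web-UI. The sketch below assumes the results are available as a list of dicts with hypothetical keys "score", "title", and "url" (not Recoll's actual data structure), and it assumes the shifting order comes from documents with tied scores, which I have not confirmed.

```python
def stable_by_relevance(results):
    """Order by descending score, breaking ties on URL so reruns match."""
    return sorted(results, key=lambda r: (-r["score"], r["url"]))


def by_title(results):
    """Case-insensitive title sort; untitled documents fall back to URL."""
    return sorted(
        results,
        key=lambda r: ((r.get("title") or r["url"]).casefold(), r["url"]),
    )
```

The explicit tie-breaker is the point: without a secondary key, equal-scored documents can come back in whatever order the engine happens to produce them. Note that this only helps if the full result set is sorted before it is split into pages.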

Needs

Overall, it's a usable tool but the following may make it more attractive:

  • Thumbnails in the results.
  • Keywords which are individual links to external tag lists.
  • Triggering recollindex via inotify.
  • A "sort by title" option.
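
For the indexing trigger, here is a minimal sketch of the idea. Python's standard library has no inotify binding, so this swaps in plain polling of modification times and sizes instead; a real inotify watcher would react immediately rather than on the next poll. The function names are hypothetical, and the indexing command is passed in as a parameter rather than hard-coded.

```python
import os
import subprocess


def snapshot(root):
    """Map every file path under `root` to its (mtime_ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between the walk and the stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def changed_files(before, after):
    """Return the sorted paths that are new or modified in `after`."""
    return sorted(p for p, sig in after.items() if before.get(p) != sig)


def reindex(paths, command):
    """Run the indexing command with the changed paths appended."""
    if paths:
        subprocess.run(list(command) + list(paths), check=True)
```

A driver loop would take a snapshot, sleep for a while, take another, and hand changed_files(before, after) to reindex(). I'd expect the command to be something like ["recollindex", "-i"], but that invocation is an assumption to check against the recollindex man page. Deleted files are not handled here.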

Link

The new Web-UI result.tpl template is posted on GitHub.

Sunday, December 16, 2018

To-do items

Notes to self/for the to-do list:
  • need a way to reference specific docs as external links (probably need a get function and an internal link)
  • feature for DMS - flag for possible project or useful for pending/existing project
  • after start of new year, clean out deprecated/non-functional feeds from RSS reader