DQL, QueryResultsProcessor, and JNoSQL

Thu Jan 13 14:32:04 EST 2022

Tags: dgqf domino jakartaee jnosql

As I've been adding new technologies to and talking about the XPages Jakarta EE project, I've kind of danced around a major missing layer: data access.

Technically, the toolchain has provided Domino data access all along, by way of having the same contextual sessions and database as XPages. You could use those to access whatever data you want, and they'd do the job as well as they ever do (c'est-?-dire: poorly). Beyond that, though, there's no equivalent to the (questionable) xp:dominoDocument and xp:dominoView components of XPages, and definitely no pre-provided object-to-database mapper.

The answer is pretty clear: Jakarta NoSQL. This API isn't quite finalized, but it's been usable for a long time: I wrote a Darwino driver for it years ago, and that driver is powering this very blog. I also wrote a Domino driver years ago, but it was very much a proof-of-concept: since it pre-dated DQL, it used formula queries for everything, and thus would scale extremely poorly. It was a nice exercise, but not anything useful.

For XPages JEE, I decided to take another swing at that. The implementation of the driver will warrant a tale on its own, but for now I'd like to focus on the DQL side of it.

DQL

I talked a bit about DQL when it came out, back when it wasn't well-understood, but since then I haven't actually had much occasion to put it to use. For the times I've needed complex Domino data access since then, it's been built on pre-existing operations on top of views. While adding DQL has been something I've considered from time to time, it'd never hit the threshold of being worth it: our needs involve extracting tons of data to bulk send it to service clients, and so views have remained necessary. While we could in theory alter our querying and filtering to select documents and project those selections onto the views, it'd have been a lot of work for partial benefits.

DQL itself has gotten more capable in the intervening years, and just on its own it's a perfect match for JNoSQL needs. Since all JNoSQL operations are sent to the driver as either individual doc IDs or an arbitrary query, something like DQL is required, and it's up to the task now.

It's half of the story, though. What DQL (by way of the DominoQuery object) gives you is a DocumentCollection, effectively just the list of note IDs. You can, as I'd hypothesized about doing, apply that against a view to extract data, but that still requires you to separate out the act of view management from the act of doing queries. If you want to have data sorted or categorized, you would still have to create an equivalent or superset view.

QueryResultsProcessor

So that's where the addition of QueryResultsProcessor comes in. QRP is technically distinct from DQL - you can use it to process arbitrary document collections, for example - but they're certainly a conceptual match. If you're comparing it to a SQL statement, DQL is the "FROM foo" and "WHERE x" parts, while QRP is the "SELECT a,b,c", "ORDER BY y", and "GROUP BY z" parts.

The general way it works is that you:

Create a QueryResultsProcessor from a Database instance (as opposed to Session - this distinction becomes important later)
Feed it sources of documents: DQL queries or arbitrary document collections
Add any desired columns to extract data. These are Domino-style columns, and you can also specify sorting and categorization here, as you would when building a view
Since data may come from multiple databases, you can also customize column formulas to account for that
Execute the process and retrieve the results, currently either as JSON or as a "view". More on these "views" later

When I first heard about QRPs, I had a concern with step 2: I'd thought that you could only pass a built DocumentCollection to the processor, which would significantly limit the room for Domino to add behind-the-scenes efficiencies. However, my fears were unfounded; the ability to pass in a DominoQuery object and the DQL directly and let the QRP execute it means that HCL is free to do whatever they want to make it fast. That's the sort of thing that makes SQL queries potentially so stupidly efficient: because you're just asking the database for results, the DB is free to optimize the heck out of them. This pairing potentially brings that to Domino, and that's what makes it important.

JSON Output

The executeToJson method is pretty straightforward if a somewhat-peculiar choice. It has no parameters, and returns the results of your query as reasonably-formatted JSON. It's unfortunate that this returns a String and not an InputStream, which adds some inherent inefficiency to dealing with it on the Java side, but that will only really hurt with very-large data sets.

Along with the requested fields, formula results, and aggregations, the document entries include the note ID (oddly in "formula" format) and the database file path, so you can use that to open up the document.

Anyway, this is a workmanlike format and can be potentially just sent to REST clients directly, though it'd be good form to at least strip out the DB paths and note IDs.

View Output

Now here's the spicy one. The executeToView method stores the results in a very-weird type of view. This has a few big advantages over the JSON mechanism:

The view persists in the database, up to a number of hours you specify programmatically. This allows you to essentially offload some extra caching to the database, which is ideal
You can use ViewNavigator and other efficient mechanisms to work with the view data, meaning you don't have the "here's a big result blob in memory" problem you have with the JSON format
Since it's a "view", anything that works with view data can work with it. This is presumably the reason it's implemented this way at all, rather than as some new kind of entity - building on existing mechanisms
The "anything that works with view data" doesn't just mean things like ViewNavigator: it also means the Notes client and view data sources

Now, these "views" have a lot of weird characteristics. It's useful to see the specifics listed out like that, but they all derive from a core lesson to ingest:

This is not a stored query; it is a cached result.

These views are not auto-updated, nor is there any mechanism I know of to refresh them outside of deleting and re-creating them. They're equivalent in concept to if you took the JSON from the first type and stored it in a document somewhere: it'll only change if you change it. The only way Domino will act on them is to delete them when they're expired.

Anyway, the data in these views is the same data that would go to the JSON format, just stored as Notes collation data instead of a string. It contains columns, potentially categorized and aggregated, for the data you requested, as well as hidden "$DBPath" and "$NoteID" columns at the end. The entry-level note ID (the one from entry.getNoteID()) is arbitrary and intended to not represent an actual document - since, after all, the documents may come from distinct databases. I've found the value of entry.getUniversalID() to be the doc's original UNID, but this is best treated as not a guarantee and so should not be used.

Designer Rights

So here's a fun catch: though any Reader can perform a query, you need Designer access to create a view. This seemed like a problem to me at first, since I'd want the generated results to be from a specific user for reader-field purposes, but it's not really an impediment, at least when you're in an environment like XPages.

Above, I mentioned that the fact that you create a QueryResultsProcessor object from a Database is important, and this is one of the reasons why. Though traditionally you wouldn't mix descendants of session and sessionAsSigner together, there's no actual rule against it. You can re-open your context database with sessionAsSigner, make a QRP object from that, and then feed it a DominoQuery object created from the non-signer database:

Database database = ExtLibUtil.getCurrentDatabase();
Session sessionAsSigner = ExtLibUtil.getCurrentSessionAsSigner();
Database databaseAsSigner = sessionAsSigner.getDatabase(database.getServer(), database.getFilePath());

DominoQuery dominoQuery = database.createDominoQuery();
QueryResultsProcessor qrp = databaseAsSigner.createQueryResultsProcessor();
qrp.addDominoQuery(dominoQuery, "some DQL", null);
View result = qrp.executeToView("some view name");

Because the QueryResultsProcessor uses the provided DominoQuery object as the "engine" for the DQL search, the query will use the normal user's rights while the processing will use the signer rights.

Naming and Expiring Results

As seen there, you have to name your views. While you could in theory use this mechanism to kind of manage your own views for general use and name them things like "People By First Name" or whatever, you'll likely want to work with them programmatically and name them based on your query input.

In the case of this JNoSQL driver, I compute a predictable-from-input hash-based name from the name of the creating class, the current user, and the sort/skip/limit attributes of the incoming query. You could really do whatever you want here, but having at least some sort of hash like this is likely the way to go.

Now there's the matter of detecting when you need to refresh the data. In some applications, it may suffice to go with the "expire in X hours" parameter when creating the view, though that's extremely coarse and only really useful on its own for specific needs (like a daily report).

The tack I took here was to try to do an efficient check of view creation time compared to the last data modification time from the source database. The Database class only has a "last modified" time in general, but I can't very well use that when my results caches are being added as design elements: a second distinct query would "invalidate" the first even when the data hasn't changed. There might be a proper way to get this in lotus.domino, the NAPI has a wrapper for NSFDbModifiedTimeByName: NotesSession.getLastDataModificationDateByName. That lets you get the last data-mod time in epoch seconds, and you can then compare that to the creation time of the view.

While it's unfortunate that you have to remove the view outright to refresh it instead of doing a delta update like NIF would do, I get it, and it's generally fast enough. Plus, there's enough hand-wavy stuff going on with feeding the DQL query to the QRP that Domino would be free to secretly retain results for a bit and do deltas internally if it so desires.

Storing Result Views

The other interesting aspect of creating a QRP object from a Database and not a Session is that that DB serves as the destination to house the views. While in a single-DB environment it would seem very natural to just store the views in the same place as the data, there's no particular requirement to do so. Moreover, if you're querying multiple databases, you're naturally not going to do this for all docs anyway, so you'll be forced to conceptualize this anyway.

Now, personally, I'm fine with a bunch of temporary machine-named views hanging out in the NSF (especially since the names are wrapped in parentheses to hide them from default UI listings), I can see why it could be annoying. For one, these views sync to an ODP in Designer, which I put in as an Aha idea to change, but might actually rightly be called a bug. Beyond that, while these views won't meaningfully contribute to NIF's workload (since NIF will skip them), they're unsightly and would get in the way if you're trying to tend to the design of your NSF like a garden.

So you might want to have a side database to store these views, and this could also be a way to get around the "needing Designer access" requirement if you're in an environment where you don't have a signer session. In the Notes client, you could store the results in a local NSF; on the server, you could make a "scratch" NSF somewhere to house them, and then add readers to the view design note when doing so to prevent leaking data across users and apps.

Conclusion

Anyway, this is all pretty neat. Reusing view design elements to just be static containers for collation data is weird, but I get the practical reasons why it makes sense. Importantly, this pairing solves some very-real problems with querying and extracting data from Domino. For example, if you do all of your querying via this route, you can use DQL's "EXPLAIN" capability to actually get some insight into database performance for once. You could imagine having an optional mode where you log the EXPLAIN results and execution times for all queries your app is performing, and then manually create "index" views to fix hotspots. It's quite satisfying to finally get that kind of ability in Domino. It'd be neat if that also came to QueryResultsProcessor.

I'm looking forward to expanding the JNoSQL driver further and then either using that directly in client work or adapting the code I use there. I'll definitely add such a logging capability, which will go a long way to put some numbers to the "feels slow" problem that can crop up. Beyond that, barring any show stoppers, I'm thoroughly excited by the prospect of moving away from fetching explicitly-named views in code and switching to an idiom of querying the pool of documents and letting the database make it work for me.

New Comment

Name: Body: