Showing posts for tag "open-source"

Writing Domino JEE Apps Outside Domino

Nov 8, 2019 1:47 PM

In my last post, I mentioned that I decided to write the File Store app as a Jakarta EE application running in a web server alongside Domino, instead of as an app running on the server itself. In the comments, Fredrik Norling asked the natural question of whether it'd have been difficult to run this on the server, which in turn implies a lot of questions about deployment, toolkits, and other aspects.

Why Not

My immediate answer is that it would certainly be doable to run this on Domino, but the targeted nature of the app meant that I had some leeway in how I structured it. There are a couple reasons why I went this route, but most of them just boil down to not having to deal with all the gotchas, big and small, of doing development on top of Domino.

As a minor example, I wanted to add some configuration parameters to the app, and for this I used MicroProfile Config. MicroProfile Config is a small spec that standardizes the process of doing key-value-based configuration, allowing me to just say that I want a named value and let the runtime pick it up from the environment, system properties, or a config file as necessary. It wouldn't be difficult to write a configuration checker, but why bother reinventing the wheel when it's right there?

Same goes for having the SFTP server load at startup and be running consistently. If I did this on Domino, I could either put it in an NSF-based XPages app with an ApplicationListener and depend on the preload notes.ini parameter, or I could go my usual route and use an HttpService that manages the lifecycle. Neither route is difficult either, but they're both weirder and more fiddly than using servlet context listeners in a normal JEE app.

And then there's just the death by a thousand cuts: needing to use Tycho to build, having to deal with Eclipse Target Platforms, making sure anything using reflection is wrapped in an AccessController block to avoid Java policy issues, the nightmare of dependency management, the requirement to use either Designer or the okay-but-still-cumbersome Domino HTTP restart development cycle, and on and on. With a JEE app, all of those problems disappear into thin air.

The Main Hurdle

All that said, there's still the hurdle of actually implementing Notes native API access outside of Domino. At its core, what I'm doing is the same as what has been possible for years and years, initializing a Notes/Domino runtime in a secondary process. The setup for this varies platform by platform but generally involves either setting a couple environment variables for your process or (as is the case with the Domino Open Liberty Runtime) spawning the process from Domino itself.

Beyond setting up your process's environment, there's also the matter of initializing each thread of your app. On Domino, all threads are inherently NotesThreads, but outside of that there's some specific management to be done. You can call NotesThread.sinitThread() and NotesThread.stermThread() manually or run your code in specifically-spawned NotesThreads. I largely do the latter, making use of an ExecutorService to handle maintaining a thread pool for me. I then added some supporting code on top of that to let me run blocks of code as an arbitrary named user while retaining a cached set of sessions. That part wouldn't really be necessary if you're writing an app that doesn't need to act as different user names, but it's handy for something like this, where incoming connections must run as a user to enforce security fields properly.

Philosophical Advantages

Beyond my desire to avoid hassles and get to use modern Jakarta EE goodies, I think there are also some more philosophical advantages to writing applications this way, and specifically advantages that line up with some of HCL's stated long-term goals as well. Domino has long been a monolith, and that has largely served it well, but keeping everything from the DB all the way up through to the app-dev stack in the same bag means you're often constrained in your toolkits and deployment choices. By moving things just outside of the main Domino tower, you're freed up to use different languages and techniques that don't have to be integrated and maintained in the core. This could be a much larger jump than I'm doing here, and that's just what HCL has been pushing with the AppDev Pack and the associated Node.js domino-db module.

I think it's beneficial to picture Domino more as a dominant central core with ancillary servers and apps running just adjacent to it - not a full-on Microservices architecture, but just a little decentralized to keep areas of concern nice and separate. Done well, this setup is a lot more flexible and fault-tolerant, while still being fairly straightforward and performant. It's also a perfect match for a project like this that's geared towards implementing a new protocol - it doesn't even have to worry about HTTP SSO or reverse proxies yet.

So I think that this is where things are heading anyway, and it's just a nice cherry on top that it also happens to be a much (much, much) more pleasant way to write Java apps than OSGi plugins.

Another Side Project: NSF SFTP File Store

Nov 5, 2019 4:12 PM

When I Know Some Guys kicked off, we bought a couple of Transporter devices to handle our file-syncing needs without having to rely on Dropbox or another hosted service. Unfortunately, Nexsan killed off Transporters a couple years back, and, though they still kind of work, it's been a back-burnered project for us to find a good replacement.

Ideally, we'd find something that would handle syncing data from our various locations transparently while also allowing for normal file access through some common protocol. Aside from the various hosted commercial services, there are various software packages you can run locally, like OwnCloud and NextCloud. I even got a Raspberry Pi with a USB hard drive to tinker with those, though I never got around to actually doing so.

Yesterday, though, I realized that we already have a fleet of privately-owned servers that replicate seamlessly in the form of our Domino domain. They also, conveniently, have nice capabilities for blob storage, shared user authentication, and fine-grained access control. What they didn't have, though, was any good form of file protocol. I'm pretty sure that Domino still has WebDAV built-in, but that's just for design elements. Years ago, Stephan Wissel created a project that works with file attachments, but that didn't cover all the bases I wanted and I didn't want to adopt the code base to extend it myself. There's also Karsten Lehmann's Mindoo FTP Server from around the same time, but that was non-SSH FTP and targeted at the local filesystem.

So that meant it was time for a new project!

The Plan

I initially looked at WebDAV, since it's commonly supported, but it's also very long in the tooth, and that has led to all of the projects implementing that being pretty old and cumbersome as well.

Then, I found the Apache Mina project, which implements a number of server protocols, SSH included, and is actively maintained. Looking into how its SFTP support works, I found that it's shockingly simple and well-designed. All of the filesystem access is based on the Java NIO packages added in Java 7, which is a pluggable system for making arbitrary filesystems.

Using SFTP and SCP means that it'll work with common tools like Transmit and - critically - rsync. That means that, even in the absence of an custom app like Dropbox, mobile access and syncing with a local filesystem come "for free".

The Project

So out of this was born a new project, NSF File Server. Thanks to how good Mina is, I was able to get a NIO filesystem implementation and SFTP+SCP server up and running in very little time:

Screen shot of the SFTP server in Transmit

In its current form, there aren't a lot of tricks: the files are stored as attachments to normal documents in a "filestore.nsf" database with two views, which allow for directory-contents and individual-file lookup while also being pretty self-explanatory to a Notes client. I have some ideas about other ways to structure this, but there's an advantage to having it be pretty basic:

Screen shot of the File Store NSF

Similarly simple are the authentication mechanisms, which allow for both password and public-key authentication based on the HTTPPassword and sshPublicKey fields in a person document, respectively (and maybe via LDAP in directory assistance? I never remember the mechanics of @NameLookup).

The App

Because this is a scratch-our-itch project and I'm personally tired of dealing with Domino's OSGi environment, the app itself is implemented as a WAR file, expected to be deployed to a modern Jakarta EE server like my precious Open Liberty. Conveniently, I have just the project for that as well, making deploying NSF-accessing WARs to Domino a bit more reasonable.

For now, the app is faceless: the only "web app" bits are some listeners that initialize the Notes environment and then spawn the SSH server. I plan to add at least a basic web UI, though.

Future Plans

My immediate plan is to kick the tires on this enough to get it to a point where it can serve as its original goal of a syncing SFTP server. I do have other potential ideas in mind for the future, though, if I feel so inspired. Most of my current logged issues are optional enhancements like POSIX attribute support, more-efficient handling with the C API, and better security handling.

It's also a good foundation for any number of other interfaces. A normal web UI is the natural next step, but it could easily provide, for example, S3 API compatibility.

For now, though I haven't gotten around to uploading a build to OpenNTF yet, feel free to poke around the code and let me know if any ideas strike your fancy.

Anatomy of a Clean Open-Source Project

Mar 18, 2019 11:54 AM

Tags: open-source

Over the years, initially thanks to Peter Tanner's diligent work as the OpenNTF IP Manager and now my own occupation of that seat, I've learned to really appreciate the virtues of dotting your "i"s and crossing your "t"s when it comes to making an open-source project legally clean and clear.

It's definitely something I underrated early on, though - caring about the specific differences between licenses and, in particular, maintaining things like per-file license/copyright headers felt like annoying busywork. For a project that only you will ever use, it technically is, but the hope of open source is that you'll get other people using your work and, ideally, contributing back in turn, and that's when it's important to make sure you have everything sorted out.

Why Bother?

Well, for one, you or your users could theoretically be sued or otherwise legally entangled if you don't keep track of this stuff. Admittedly, it's fairly unlikely, but the consequence of, for example, unknowingly including GPL software in your proprietary product is potentially significant.

It's for that sort of reason that it's important to make sure everything is clean before some large corporations will risk even looking at your project. IBM is particularly good about this because they were significantly burned in the past, and came out of it with extremely-strict view, and that rubbed off on OpenNTF both culturally and with their gracious technical assistance along the way.

And, since large consumers require this sort of vetting, it's also important to know how to do it if you want to contribute code to an open-source organization like OpenNTF, Eclipse, or Apache.

It's also surprisingly satisfying once you get into the swing of it, I've found.

The Example Project

Since I've been spending a lot of time recently with the NSF ODP Tooling project, we'll look at its GitHub repository.

Common Files

There are a couple common features that tend to show up, and which both people and tooling (like GitHub's license identifier) look for:

  • The LICENSE file, which is the most critical. This contains the text of the license you're using, as well as one of the declarations of the copyright year (though, admittedly, it's easy to forget to include that part). This is what declares the effective license for the code in the repository that you own the copyright to, and should be included right from the start if possible.

  • The NOTICE file, which is vital if you're including any code from any sources not covered under the main copyright. This file should list all of the third-party code you have included in the repository, its license type, and, if possible, where to acquire it. If your project's distributable form includes additional third-party code not included in the repository (such as Maven or npm dependencies), these should be enumerated here as well

    • Writing this file has an important side effect in that it forces you to account for the licenses of your dependencies. More than once, I've run into a situation where I found that a common dependency had an incompatible license (such as the pure GPL). In some cases, this has meant abandoning the dependency outright, while in others it has meant finding a better-licensed alternative. Eclipse Orbit exists in large part for this purpose.
  • A legal directory containing any additional license/redistribution information not covered by the NOTICE. This can also sometimes take the form of files like NOTICE-Weld in the root of a project, and is useful for mass-including copyright/notice information from third parties in their original form.

In addition to including these files in the project repository, you should also make sure to include them in any binary distributions you make. In my projects, this takes the shape of inclusions in a Maven Assembly Plugin packaging file.

File Headers

I originally chafed against the idea of per-file copyright/license headers. They're not strictly necessary when the files are included in the original repository, they're redundant, you end up with massive commits touching hundreds of files just to change a year, and they can dwarf the size of the actual code they're copyrighting.

However, I've really come around to the practice of including them, and the main reason is that it makes the files easier for others to copy and use legally. It's one thing when someone finds their way to the root of your repository or distribution package, but it's another when they find an individual class by doing a web search or hitting F3 in Eclipse. In those cases, they can find their way up to the license (assuming your source package includes it), but it's much easier if it's just declared right up front.

It's also easier to clearly distinguish the third-party code you're including. When each file has its copyright information clearly noted, you can easily tell the difference between a sui-generis project file and an included third-party file without having to parse through the NOTICE every time.

And, fortunately, it doesn't have to be a huge hassle to maintain. In each of my Maven projects, I include a license plugin configuration to declare copyright information, any special data types, and which files to not include. Then, whenever I add new files or make a change after the turn of a year, I can run mvn license:format and it'll keep everything tidy for me.

pom.xml Configuration

Maven (and it's not alone in this) provides a lot of pom.xml-level elements to declare all sorts of metadata about your project, like its SCM repository, issue tracker, and, critically, license and developers. I like to declare the inception year, the license, and the <developers> block:

	<inceptionYear>2018</inceptionYear>

	<licenses>
		<license>
			<name>The Apache Software License, Version 2.0</name>
			<url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
		</license>
	</licenses>

	<developers>
		<developer>
			<name>Jesse Gallagher</name>
			<email>jesse@frostillic.us</email>
		</developer>
	</developers>

I use that <inceptionYear/> value as part of the license-header file to keep track of the copyright range, at least when there's contiguous multi-year development.

OSGi Stuff

Since most of my projects are still OSGi, I've been aiming to improve my licensing setup there too. The main place where this comes into play is in feature.xml files, which have required elements to specify the copyright and license. It's not terribly unusual for these to end up as their "[Enter Copyright Description here.]" defaults, but it's important to fill these in. They're included in the "accept the licenses" dialog when installing into Eclipse/Designer, and are available in the "Installed software" descriptions in the UI.

But What License to Use to Begin With?

I'll finish off this post with what is actually the most important part of the process, but which can usually be answered simply. There are a lot of open-source licenses out there, and you could theoretically make up your own, but for our purposes the choice tends to follow some basic rules:

  • If you're contributing to an established OS organization, use theirs - for example, if you're contributing to Eclipse, use the EPL.

  • If you want your code to be mixed other OS projects and (potentially) proprietary ones, pick Apache or something like it.

    • At OpenNTF, we have a preference for Apache over other similar licenses, because it's well-established and makes copyright handling clearer than the equivalents, something that is critical for large companies. Let past lawyers do your legwork on this one.
  • If you want to require that users of your code keep the code open source, consider the GPL.

    • Be extremely wary of this, however: the GPL is intentionally "infectious" and limits how the code can be used. Various projects carve out little exceptions to the GPL to allow use in otherwise-non-GPL products, but it's still something of a minefield.
    • The GPL is one of the approved licenses for OpenNTF, but we kind of discourage it except in cases where a project is GPL because it's derived from previously-GPL'd code.
  • If you don't want to be bothered too much by copyright and just want the code out there, consider Public Domain. In practice, it's usually best for you to retain copyright, but explicitly declaring Public Domain is certainly an effective way of allowing any use.

For projects in our community, the quick answer is "use Apache". It's permissive, covers copyright, and is known and trusted by pretty much everyone.

More Work Than It's Worth?

Both the earlier parts of this post and Betteridge's Law contribute to making it clear that my answer is "no, it's not more work than it's worth", but I can certainly see why it'd feel that way. The first couple times I submitted projects to OpenNTF and got a "here's some stuff to fix" email from Peter Tanner, part of me definitely chafed at the whole thing. That can be particularly the case for Notes-based contributions - sometimes, you just want to plunk an NTF on the project page and be done with it, and Notes certainly doesn't have a "wrap this NSF copy in a ZIP with LICENSE and NOTICE files" checkbox.

However, as I learned more about the legal importance of having licenses correct and got more practice at doing this stuff from the start, I started to appreciate the whole process. It also turned out to be really helpful to sort this stuff out on smaller projects before working on larger ones, especially ones with established teams and procedures.

In all, it's worth it both to allow you to contribute to larger projects and, regardless of project size, it's worth it for anyone consuming your code.