More Notes on Filesystem and Charset Portability
A few months back, I talked about some localization troubles in the NSF ODP Tooling and how it's important to be explicit in your handling of this sort of thing to make sure your code will work in an environment that isn't specifically "Linux or macOS in an en-US environment".
Well, after making a bunch of little tweaks over the last few days, I have two additional tips in this arena! Specifically, my foes this round came from three sources: Windows, my use of a ZIP file filesystem, and the old reliable charset.
The first bit of trouble had to do with how the first two of those interact. For a long time, I've been in the (commonly-held) habit of using File.separatorChar to get the default path separator for the system - that is, \ on Windows and / on most other platforms. That works well enough on its own - no real trouble there.
However, my problem came from using the Java NIO ZIP filesystem on Windows. Take this bit of code:
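As a minimal sketch of the kind of conversion in question (the helper name toJavaClassName and the class around it are hypothetical stand-ins, not the tooling's actual source), the separator-replacing approach looks roughly like this:

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

public class NaiveClassName {
    // Hypothetical helper: turns a relative source path into a class name
    // by swapping the platform's default separator for dots.
    public static String toJavaClassName(Path path) {
        String name = path.toString();
        if (name.endsWith(".java")) {
            name = name.substring(0, name.length() - ".java".length());
        }
        // File.separatorChar is '\\' on Windows and '/' on most other platforms,
        // regardless of which filesystem the Path actually belongs to
        return name.replace(File.separatorChar, '.');
    }

    public static void main(String[] args) {
        // Built from name elements so the default filesystem supplies its own separator
        Path path = Paths.get("com", "example", "Foo.java");
        System.out.println(toJavaClassName(path));
    }
}
```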
When path is a path on the local filesystem, that works just fine, taking a path like "com/example/Foo.java" and turning it into "com.example.Foo". It also works splendidly on macOS and Linux in all cases, the two systems I actually use. However, when path represents a path within a ZIP file and you're working on Windows, it fails, returning a "class name" like "com/example/Foo".
This is exactly what happens when compiling an ODP using a remote Domino server running on Windows. For the portability reasons mentioned in my previous post, the client sends a ZIP of the ODP to the server and then the compilation pulls directly out of that ZIP instead of writing it out to the filesystem. The way the ZIP filesystem driver in Java is written, it uses / for its path separator on all platforms, which is consistent with dealing with ZIP files generally. But, when mixed with the native filesystem separator, that line resolved to replacing \ with . in a path that only ever contained /, and there's the problem.
The fix is to change the code to instead get the directory separator from the contextual filesystem in question, using path.getFileSystem().getSeparator().
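A minimal sketch of that change, again using a hypothetical toJavaClassName helper, not the tooling's real method. The main method tries it against both the default filesystem and a throwaway ZIP filesystem created in a temp file:

```java
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;

public class SeparatorAwareClassName {
    // Hypothetical helper: asks the path's own filesystem for its separator
    // instead of assuming the platform default.
    public static String toJavaClassName(Path path) {
        String name = path.toString();
        if (name.endsWith(".java")) {
            name = name.substring(0, name.length() - ".java".length());
        }
        String separator = path.getFileSystem().getSeparator();
        return name.replace(separator, ".");
    }

    public static void main(String[] args) throws Exception {
        // A path on the default (platform-native) filesystem
        System.out.println(toJavaClassName(Paths.get("com", "example", "Foo.java")));

        // A path inside a freshly created, empty ZIP filesystem
        Path zip = Files.createTempFile("odp", ".zip");
        Files.delete(zip); // the ZIP provider creates the file itself when "create" is "true"
        URI uri = URI.create("jar:" + zip.toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, Collections.singletonMap("create", "true"))) {
            System.out.println(toJavaClassName(zipFs.getPath("com", "example", "Foo.java")));
        } finally {
            Files.deleteIfExists(zip);
        }
    }
}
```

Both calls should produce the same dotted name, since each path is converted using the separator of the filesystem it came from.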
A little more verbose, sure, but it has the advantage of functioning consistently in all environments.
This also has significant implications if you use static properties to store filesystem-dependent elements. This came into play in my OnDiskProject class, which contains a bunch of path matchers to find design elements to import from the ODP. Originally, I kept these in a static property that was generated by writing them Unix-style, then running them through a generator to use the platform-native separator character. This had to change, since the actual ODP store may or may not be the platform-native filesystem. This sort of thing is pervasive, and it'll take me a bit to get over my long-standing habit.
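One way to sketch the per-filesystem approach, assuming glob-style patterns: keep only the pattern strings static and build the matchers from whatever filesystem the project root lives on. The class name DesignElementMatchers and the two patterns here are invented for illustration and are not taken from OnDiskProject.

```java
import java.nio.file.FileSystem;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DesignElementMatchers {
    // Hypothetical patterns, written Unix-style; only the strings are static
    private static final List<String> GLOBS = Arrays.asList(
        "Code/Java/**.java",
        "XPages/*.xsp"
    );

    // Matchers are instance state because they are tied to one filesystem
    private final List<PathMatcher> matchers = new ArrayList<>();

    public DesignElementMatchers(Path projectRoot) {
        // Native filesystem, ZIP filesystem, or anything else the root came from
        FileSystem fs = projectRoot.getFileSystem();
        for (String glob : GLOBS) {
            matchers.add(fs.getPathMatcher("glob:" + glob));
        }
    }

    // Expects a path relative to the project root, from the same filesystem
    public boolean matches(Path relativePath) {
        for (PathMatcher matcher : matchers) {
            if (matcher.matches(relativePath)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        DesignElementMatchers m = new DesignElementMatchers(Paths.get(""));
        System.out.println(m.matches(Paths.get("Code", "Java", "com", "example", "Foo.java")));
        System.out.println(m.matches(Paths.get("README.md")));
    }
}
```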
Over-Interpreting Character Sets
This one is similar to the charset troubles in my previous post, but it ran into subtle trouble in the ODP compiler. Here was the sequence of events:
- The ODP compiler reads the XSP source of a page or custom control using ODPUtil, which reads in the string as UTF-8
- It then passes that string to a method in the Bazaar
- That method uses StringReader and an IBM Commons ReaderInputStream to read the content
- That content is then read in by FacesReader, which uses the default DOM parser to read the XML
In general, that flow worked just fine, but only because, in general, I write US-ASCII markup. When the page contains, say, Czech diacritics, this goes off the rails. Somewhere in the interpretation and re-interpretation of the file, the UTF-8-iness of it breaks.
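I haven't pinned down which hop is the culprit, so the following is only a sketch of one common way this kind of break happens, not a diagnosis of the chain above: text encoded as UTF-8 gets decoded by some later step using a different charset (ISO-8859-1 here, chosen purely for illustration). ASCII survives the trip, which is why it can go unnoticed for so long.

```java
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Encodes text as UTF-8, then decodes those same bytes as ISO-8859-1,
    // mimicking a step that assumes the wrong charset
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // A Czech word written with Unicode escapes so the source file stays ASCII
        String czech = "P\u0159\u00edli\u0161";
        String ascii = "xp:view";

        // ASCII comes through intact; the Czech word does not
        System.out.println(ascii.equals(misdecode(ascii)));
        System.out.println(czech.equals(misdecode(czech)));
        System.out.println(czech.length() + " chars became " + misdecode(czech).length());
    }
}
```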
Fortunately, this one was a clean one: XML has its own mechanism for declaring its encoding (and it's almost always UTF-8 anyway), so my code doesn't actually need to be responsible for interpreting the bytes of the file before it gets to the DOM parser. So I added a version of the Bazaar method that takes an InputStream directly and modified NSF ODP to use it, with no extra interpretation in between.
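The idea can be sketched with the stock JAXP DOM parser. This is not the Bazaar's or FacesReader's actual code; the class and method names are hypothetical, and the byte array stands in for a file that would normally be opened with something like Files.newInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XspStreamParse {
    // Hands the raw bytes to the parser, which works out the encoding
    // from the XML declaration itself
    public static Document parse(InputStream is)
            throws IOException, SAXException, ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(is);
    }

    // Hypothetical convenience wrapper for the demo: reads the root element's pageTitle
    public static String pageTitleOf(byte[] xmlBytes) {
        try (InputStream is = new ByteArrayInputStream(xmlBytes)) {
            return parse(is).getDocumentElement().getAttribute("pageTitle");
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String title = "P\u0159\u00edli\u0161"; // Czech text, escaped to keep the source ASCII
        String xsp = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<xp:view xmlns:xp=\"http://www.ibm.com/xsp/core\" pageTitle=\"" + title + "\"/>";
        // Stand-in for the bytes as they would sit on disk or in the ZIP
        byte[] onDisk = xsp.getBytes(StandardCharsets.UTF_8);
        System.out.println(title.equals(pageTitleOf(onDisk)));
    }
}
```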