frostillic.us

Showing posts for tag "xml"

Winter Project #2.5: XML Schemas for XPages

Thu Jan 02 10:59:58 EST 2020

Tags: xml xpages

After I did my initial port of my XSP completion assistant to LSP4XML, I got to thinking about improvements I could make to it. One of these was the notion of creating XML Schemas for a project's XPages, similar to how the DXL contributor just passes along the schema files that ship with Notes and Domino.

Doing the same with XSP isn't nearly so easy, and comes with a good number of gotchas that make it not only impossible to fully validate an XPage with schemas, but also difficult-to-impractical to generate such schemas on the fly outside of Designer. But first, two asides!

Aside #1: Mechanisms of Content Assistance

At its core, doing content assistance in a text editor is essentially a matter of the editor saying to the assistance plugin "I have a file here, with this content, at this location, and the user's cursor is in this spot. What should I suggest as the next part to type? And, once they've typed it, can you tell me if it is valid?".

In the simplest case, this could be something like a dictionary of words when writing prose. A content assistance plugin for English-the-language could merely consist of a list of known words, and it would take a prompt from the editor of "ca" and suggest "car", "cat", and so forth. There's an upper limit to what that sort of thing can and should do, and just doing that would probably suffice.

Programming languages are more complicated, and often the best route for autocomplete is to basically also be a whole compiler with full knowledge of the language and the structure of not only the current file, but also of other files in the project and all the dependencies. That way, an IDE can suggest types, method names, parameters, and all sorts of complicated external notions.

XML/HTML content assistance is often somewhere in between there. In both the case of my original XSP implementation and the LSP4XML port, the route I took was to let the in-between layer handle knowledge of raw XML mechanics like tags and attributes, but then basically provide a dictionary of known words when asked. So, when the editor said "the user typed xp:v", my code searched through all of its known components and responded with "maybe xp:view or xp:viewPanel".

I decided over the last couple days to take another route, based on the fact that XML is designed to be a toolkit for making fully-described and -validated markup.

Aside #2: XML Validation

XML itself is something of a meta-language: it has its rules, but it's intended to be a format used to describe other, more-specific grammars. On its own, it has the notion of whether or not a document is well-formed: this means specifically that it follows all the syntax rules of XML, like the proper use of brackets, attribute quotes, element hierarchy, and all that. Well-formedness is comparatively easy to enforce, and basically any XML editor does, but it doesn't say anything about whether a given XML document is a valid example of its kind.

That job is left up to a secondary definition, usually done via either a Document Type Definition file or an XML Schema, though there are more ways than that. These are the things that say, for example, that in XHTML the root element must be <html>, and that element contains zero or one <head> element and one <body> element, and so forth.

With the aid of a document schema, an XML processor (such as a structured text editor) can verify first that a document is well-formed XML and second that all of the elements that it can match up with the schema are valid. Moreover, it can itself maintain the list of potential elements, attributes, and values, and display them in a clean and fast way without the content-assistance plugin having to worry about parsing and substring matching.

That "...elements that it can match up..." caveat comes in to play because XML allows mixing grammars in a single file and the processor may or may not require that all of those grammars be strictly defined. What identifies these grammars in a file is the use of an XML "namespace", which is a URI that may or may not actually go anywhere. For example, the MathML namespace is "http://www.w3.org/1998/Math/MathML" and the SVG one is "http://www.w3.org/2000/svg". Both of those are URLs that resolve to pages, but that's just because the W3C is being nice; the only requirement is that they are unique URIs. In an individual document, these namespaces may be matched up to prefixes, and one can be defined as the base namespace for elements that don't have prefixes.

XSP as an XML Grammar

Which brings us to XPages. XSP-the-markup-language is XML-based and so every XSP document must be well-formed. Designer won't even give you the time of day if, say, you leave off the closing </xp:view> tag at the end of a file. Additionally, XSP has many of the trappings of a fully-validated XML language. It uses namespaces as its prime identifier to identify tags and you can't just write any old tag in one of those namespaces or give an existing tag some random new attribute. Moreover, many attributes and elements have special rules about their content: you can't put text inside <xp:this.data/>, and <xp:text disableOutputTag="foo"/> is illegal. These are all specialities of XML Schema.

XSP does not, though, have any schema. Most of the validation happens at build time (including of XML well-formedness, apparently), and some is even delayed until runtime (like that invalid boolean above). There are some good reasons for this. First and foremost is the fact that XSP really describes Java objects that are themselves contributed programmatically, by way of XSP library classes. Once your path to validity runs through arbitrary Java code, it means that you don't have the option to statically compare the file to a schema - you have to run that code inside your environment. Additionally, the XPages namespace ("http://www.ibm.com/xsp/core") isn't even defined in a single place, and has controls declared across several plugins. Same goes for the ExtLib's "http://www.ibm.com/xsp/coreex" namespace, and then there's the Custom Control namespace, "http://www.ibm.com/xsp/custom", which can't even be determined at a global level and has to be synthesized repeatedly on a project-by-project basis. And then, beyond all of that, XSP has rules that just can't be expressed in XML Schema at all, so Designer would have to have a secondary validator anyway, dampening the benefits of codifying a schema.

But Could It Work, Though?

Still, I figured that, if I could craft a schema that's good enough for basic use, I could get some extra completion and suggestion assistance while hopefully marking everything as vague enough to not run into trouble with the flexible nature of XSP.

My biggest ally here is that, while the core and ExtLib namespaces are technically open at any point, it would be such bad form for, for example, a company-specific library to declare its components as part of it that I can ignore that possibility entirely. In effect, for a given Domino release, those namespaces are sealed and fully describable ahead of time.

So I set out to see if I could make XML schemas to contribute to LSP4XML to give it some more knowledge of what it's working with. Like when I generated the JSON used by the original content assistance, the tack I took was to write a servlet that runs in an XPages context and emits the files I want based on a stock runtime. My initial version of this was too clever: XML Schema allows for subclassing, inheritance, and references, and I originally set out to have the schemas match the structure of the underlying component trees. I got pretty close on this, but ended up spending all my time fighting namespace collisions, and in particular the really-subtle one where the roots of the tree are actually defined in the "http://www.ibm.com/xsp/jsf/core" namespace, which is an IBM fork of JSF's "http://java.sun.com/jsf/core". As a side note, there are no concrete component definitions there, so you can't get any use out of it in an XSP file; it's just an interesting implementation detail.

The Current Results

When I took a step back and gave it another whack, I ended up coming up with something that pretty much works. I'm still not sure if it'll actually be the way I keep going, though. For one, there are currently a bevy of open bugs, some of which may end up being showstoppers. Beyond that, though, since XSP still requires a secondary processor to check validity, it may end up being the most reasonable route to just go back to offering completion ideas and maybe some post-processing to check.

Still, this is one of those types of projects that's worth it just as a learning experience. I hadn't even really looked at XML Schemas since around when they came out, and this sure was a good way to get a crash course. And, in the mean time, I think that the generated schema files are pretty-interesting artifacts.

No comments

Winter Project #1: XPages LSP4XML Extension

Fri Dec 27 16:27:14 EST 2019

Tags: nsfodp xml xpages

The last couple weeks of the year are always a good time to work on some side projects or small utilities to scratch an itch, and this year definitely ended up that way, seeing me work on a couple interesting things over this Christmas week.

The Project

The first of these is an enhancement to the NSF ODP Tooling project. Among the various components that I've put in there over time is an Eclipse content assistance plugin that provides some autocomplete capabilities when working with XPages and Custom Controls within an ODP. Specifically, it knows about the stock and ExtLib controls that ship with Domino, as well as any Custom Controls within the same project, and it allows Eclipse to provide completion suggestions. The code in there is actually hairier than you might think, and that's because it's building on top of Eclipse's generic text completion system, with only a little assistance from an existing implementation I cribbed. Still, it does the job.

The Problem

However, most of my XPages development nowadays takes place first outside Domino and then only ends up back on Domino when I want to make sure it works.

The trouble there is that the content assistance plugin I wrote is tied to both the ODP nature (an Eclipse-ism meaning that it knows a given project is associated with a set of capabilities) and the specific layout of an on-disk project.

The Quick Route

I originally set out to just loosen up that association a bit - take the existing plugin and allow it to work with any .xsp file and to try to find custom controls in a more webapp-type layout of a project.

However, I figured this was a good option to look beyond that. Though the plugin does indeed do what I want, the trouble is that it's thoroughly tied to Eclipse specifically. All of the classes use Eclipse core and XML tooling classes extensively and none of that would be portable to any other IDE. I figured this would be a perfect time to jump into the world of the Language Server Protocol.

Aside: What is LSP?

The Language Server Protocol is a standard for a way to provide IDE-type support in a way that's not dependent on any specific IDE. It grew out of Visual Studio Code and is gradually seeping its way across the whole development landscape.

It allows for the specifics of handling a language - checking validity, resolving classes and other entities, identifying keywords, and so forth - to be separated from the IDE used for editing it. Using LSP, if an IDE wants to support editing, for example, JavaScript, its creator no longer needs to create and maintain a tool to handle all of the intricacies and changing rules of the language - instead, it can bring in the LSP implementation for it and then focus just on the specifics of what the IDE does differently from others.

Additionally, this decoupling means that the LSP implementation doesn't even have to be written in the same language as the IDE using it, and are often just written in the same language that they're implementing. If you look across the list of implementations, you can see that the Swift server is written in Swift, the Ruby one in Ruby, and the Java one is actually Eclipse's Java tooling extracted from the IDE.

Wild Web Developer

Since Eclipse long predates LSP, it's historically had its own implementations for any languages it supports. Though it's had enough clout to support a lot of languages, some of them have trailed behind. Like, really far behind. Prime among these have been its web-language-related editors, which do an okay-enough job editing basic HTML, CSS, and JavaScript, but pretty much missed the boat on newer features, transpiled languages like TypeScript, and project structures like NPM. While there have long been plugins for Angular and TypeScript, they never fully kept up and the whole thing ended up falling far, far behind other IDEs like VS Code.

Enter Wild Web Developer, the whimsically-named project to bring the fruits of the LSP development to bolster Eclipse's web-tech support. Though it's named for its web language implementations, what it really is is a combination of two things: a generic text editor backed by a small array of LSP implementations and a syntax-coloring system derived from TextMate, which itself became a pseudo-standard for syntax coloring.

LSP4XML

That brings us to the piece that ties together LSPs and my immediate desires: LSP4XML, the most-popular XML Language Server implementation, which is used by both VS Code and Eclipse, and just so happens to be written in Java and is designed to be extended.

Since LSP4XML is so smoothly extensible and Wild Web Developer just added a way to contribute these extensions in Eclipse, that meant I could accomplish what I want without having to worry about writing a whole LSP implementation just to support XSP and DXL.

XSP Completion Participant

Contributing to an LSP4XML server involves creating an extension class that then registers the individual capabilities you want to provide.

In this case, I contributed an ICompletionParticipant implementation. ICompletionParticipant has a delightfully-straightforward API, and all you have to do is provide tag, content, and attribute suggestions based on the context the user's cursor is in.

With this simpler API, I was able to significantly refactor down my earlier implementation, making it much more readable and focused.

DXL Schema Contributor

The other piece of XML completion that I added to the NSF ODP Tooling was to provide the (blessedly-redistributable) DXL schemas that ship with Domino to Eclipse. Unlike the XSP completion assistant, this plugin is entirely code-free, consisting solely of the schema file itself and an extension contribution in plugin.xml. The reason this works is that each DXL file declares its XML namespace at the top of the file, and so I can tell Eclipse to look for the schema file I'm providing when editing DXL.

LSP4XML also provides a way to provide schema files, but it's a little more complicated, involving a resolver class implementation. The idea is the same, though, mapping the namespace to the DTD file.

In both cases, the standardized and descriptive nature of XML schemas means that merely providing them to the IDE allows for all sorts of code assistance, even down to the level of suggesting and validating attribute values. It's pretty great.

Side Benefits

Making this switch to LSP4XML accomplished my original goal: by changing the XSP handling in the NSF ODP Tooling for Eclipse, I switched over to the Wild Web Developer editor and got editing in .xsp files anywhere (and a bit snappier to boot, since it's inherently heavily multithreaded).

But, like I mentioned, Language Servers are used across IDEs, most notably Visual Studio Code. Thanks to VS Code's equivalent LSP4XML extension mechanism, I was able to contribute the same extensions used for Eclipse there, and get the same type of results. That's a far cry from being able to get all of the NSF ODP Tooling capabilities outside of Eclipse, but it's a big start.

The Next Version

Currently, these additions are just in the develop branch of the tooling and haven't made their way to a proper release yet, but they've proven themselves so far in my use. My hope is to make a few more improvements, get the VS Code extension into shape, and make it part of a "3.0" release of the Tooling.

No comments

Fun With Old XML Features

Wed Apr 03 01:35:55 EDT 2013

Tags: java xml

One of the side effects of working on the OpenNTF Domino API is that I saw every method in the interfaces, including ones that were either new to me or that I had forgotten about a long time ago. One of these is the "parseXML" method found on Items, RichTextItems, and EmbeddedObjects. This was added back in 5.0.3, I assume for some reason related to the mail template, like everything else added back then. Basically, it takes either the contents of a text item, the text of a rich text item, or the contents of an attached XML document and converts it to an org.w3c.dom.Document object (the usual Java standard for dealing with XML docs).

That's actually kind of cool on its own in some edge cases (along with the accompanying transformXML method), but you can also combine it with ANOTHER little-utilized feature: XPath support in XPages. So say you have an XML document like this attached to a Notes doc (or stored in an Item):

<stuff>
	<thing>
		<foo>bar</foo>
	</thing>
	<thing>
		<foo>baz</foo>
	</thing>
</stuff>

Once you have that, you can write code like this*:

<?xml version="1.0" encoding="UTF-8"?>
<xp:view xmlns:xp="http://www.ibm.com/xsp/core">
	<xp:this.data>
		<xp:dominoDocument var="doc" formName="Doc" action="openDocument">
			<xp:this.postOpenDocument><![CDATA[#{javascript:
				requestScope.put("xmlDoc", doc.getDocument().getFirstItem("Attachment").getEmbeddedObjects()[0].parseXML(false))
			}]]></xp:this.postOpenDocument>
		</xp:dominoDocument>
	</xp:this.data>
	
	<xp:repeat value="#{xpath:xmlDoc:stuff/thing}" var="thing">
		<p><xp:text value="#{xpath:thing:foo}"/></p>
	</xp:repeat>

</xp:view>

I don't really have any immediate use to do this kind of thing, but sometimes it's just fun to explore what the platform lets you do. It's one more thing I'll keep in the back of my mind as a potential technique if the right situation arises.

* Never write code like this.

No comments