uruk

Uruk is the Clojure wrapper for MarkLogic's XML Content Connector for Java (XCC/J). Uruk empowers you to access your Enterprise NoSQL database from Clojure.

With Uruk, you can use MarkLogic's XCC API to:

  • evaluate stored XQuery programs
  • dynamically construct and evaluate XQuery programs
  • manage documents and stream inserts

The name Uruk comes from the ancient Mesopotamian city-state and period in which some of the oldest known writing has been found. One can see Uruk as perhaps the first document database—and it certainly wasn’t organized relationally.

Maintenance Status

Uruk is used in production and is under active maintenance. This project is sponsored by LambdaWerk. For commercial support inquiries please get in touch at dave.liepmann@gmail.com.

Uruk is part of the XQuery-mode stack for working with XQuery in emacs.

Installation

To install, add the following dependency to your project.clj dependencies: [uruk "0.3.11"]

In your namespace: (:require [uruk.core :as uruk]). (I also like ur as an alias, for brevity. Delightfully, Ur is another ancient city-state with ties to the origins of written documents.)

To run Uruk locally, you need MarkLogic installed on your machine. To run Uruk's tests or examples, see "MarkLogic configuration" below.

API docs

Online API docs via Codox and autodoc. Uruk documentation is also available on cljdoc.

Usage

Resources

For some background, see the XCC Developer's Guide and the MarkLogic XCC Javadoc to understand what Uruk is talking to.

For examples of how to use specific types and functions, see test/uruk/core_test.clj. Examples in this README are included for reference in src/uruk/examples/readme.clj.

MarkLogic configuration

To run Uruk's tests or evaluate its examples directly in a REPL, you'll need to configure MarkLogic on your machine to match the settings Uruk expects. If you have an existing MarkLogic install, feel free to skip these steps and instead point your REPL at your own database.

  1. Install and start a local MarkLogic server via the Install Instructions.

  2. Open the Admin Interface at http://localhost:8001/

  3. Create a forest named "UrukForest"

  4. Create a database named "UrukDB". Attach it to UrukForest but otherwise use the default settings.

  5. Create an XDBC Server named "UrukServer" on port 8383.

  6. Create role uruk-tester-role with URI privilege view-uri, execute-privileges any-uri, xdmp:external-binary, and xdmp:timestamp, and all the default document permissions (node-update, execute, update, insert, and read) for xa (these are all needed for specific tests).

  7. Create user uruk-tester with password "password" and roles of xa and uruk-tester-role. This will be used to run tests and README examples.

  8. Finally, add environment variable URUK_TEST_IMG_PATH (e.g. export URUK_TEST_IMG_PATH=/path/to/uruk/resources/ml-favicon.ico) to your Bash profile (.bashrc) and make sure it's available to your environment.

You should now be able to run lein test and, if you start up a REPL, the examples in test/uruk/core_test.clj.
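Before running the tests, it can help to confirm from a REPL that the variable from step 8 is actually visible to the JVM. The following is a minimal sketch; `img-path-status` is a hypothetical helper written for this README, not part of Uruk.

```clojure
(require '[clojure.java.io :as io])

(defn img-path-status
  "Hypothetical helper (not part of Uruk): classifies a candidate path
  as :unset (nil), :ok (an existing regular file), or :missing."
  [path]
  (cond
    (nil? path)              :unset
    (.isFile (io/file path)) :ok
    :else                    :missing))

;; Check the value the current JVM process sees.
(println "URUK_TEST_IMG_PATH:"
         (img-path-status (System/getenv "URUK_TEST_IMG_PATH")))
```

If this prints `:unset`, the REPL was probably started from a shell that had not yet loaded the updated profile.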

Examples of using Uruk

For ease of replication, the examples below are also in src/uruk/examples/readme.clj.

Basic usage takes the form of:

(with-open [session (uruk/create-session {:uri xdbc-uri :content-base database-name
                                          :user database-user :password database-pwd})]
  (uruk/execute-xquery session xquery-string))

...of which a concrete example is:

(with-open [session (uruk/create-session {:uri "xdbc://localhost:8383/"
                                          :user "uruk-tester" :password "password"})]
  (uruk/execute-xquery session "\"hello world\""))

...which in this case should return ("hello world") (if you provide valid credentials).

Let's def our database information for brevity in the rest of our examples:

(def db {:uri "xdbc://localhost:8383/"
         :user "uruk-tester" :password "password"
         :content-base "UrukDB"})

Using that database info, let's take an overview of query functionality. Most use cases are handled by passing an optional configuration map to the functions execute-xquery or execute-module, like so:

(with-open [session (uruk/create-session db)]
  (uruk/execute-xquery session
                       "xquery version \"1.0-ml\"; doc('/bigdoc.xml')"
                       {:types :raw
                        :options {:cache-result false}
                        :variables {:a "a"}
                        :shape :single}))

Each optional key in that configuration map is described below.

Types

Basic type conversion is performed automatically for most XCC types. If for any reason you need access to the raw results, use the :types key in the config map, passing :raw like so:

(with-open [session (uruk/create-session db)]
  (uruk/execute-xquery session "\"hello world\"" {:types :raw}))

=> #object[com.marklogic.xcc.impl.CachedResultSequence 0x2c034c22 "CachedResultSequence: size=1, closed=false, cursor=-1"]

This lets you inspect result types with result->type:

(with-open [session (uruk/create-session db)]
  (uruk/result->type (uruk/execute-xquery session "\"hello world\"" {:types :raw})))

=> "xs:string"

Those result types are matched with :xml-name values in the xcc-types look-up table, which contains the :ml->clj function that Uruk uses to transform result items into more manageable Clojure types. For most types that's as simple as #(.asString %) (for XdmDocuments) or reading the number contained in a string. If you need more in-depth handling of results, you can override the default mappings a la carte by passing a map under the :types key, like so:

(with-open [session (uruk/create-session db)]
  (uruk/execute-xquery session
                       "xquery version \"1.0-ml\"; doc('/dir/unwieldy.xml')"
                       {:types {"document-node()" #(custom-function %)}}))

The keys for this map are used to look up :xml-name, and the values replace :ml->clj.
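The override behaviour can be pictured as a map merge. The sketch below is a simplified stand-in, not Uruk's source: `default-conversions` and `convert-item` are hypothetical names, and the values are plain strings rather than XCC result items.

```clojure
;; Hypothetical, heavily reduced stand-in for the :xml-name -> :ml->clj
;; look-up. The real table works on XCC objects; strings are used here
;; so the example runs without MarkLogic.
(def default-conversions
  {"xs:string"  identity
   "xs:integer" #(Long/parseLong %)})

(defn convert-item
  "Converts raw using the function registered for xml-name.
  Entries in overrides win over the defaults; unknown type names
  fall back to identity (an assumption made for this sketch)."
  [overrides xml-name raw]
  (let [f (get (merge default-conversions overrides) xml-name identity)]
    (f raw)))

;; Default conversion for integers:
(convert-item {} "xs:integer" "42")

;; A la carte override for one type only; other defaults stay in place:
(convert-item {"xs:integer" #(str "raw:" %)} "xs:integer" "42")
```

Only the overridden key changes; every other type continues to use its default conversion.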

Shape

For convenience, you can mold query results by specifying :shape in the configuration map:

| :shape value | Result |
| --- | --- |
| nil | ignore response, returning nil |
| :single | return just the first element of the response |
| :single! | if the response is one element, return just that element; if not (i.e. if the response is more than one element), throw an error |
| anything else | return response as-is |

For example, to clean up our simple example from earlier:

(with-open [session (uruk/create-session db)]
  (uruk/execute-xquery session "\"hello world\"" {:shape :single}))
=> "hello world"
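To make the table's rules concrete, here is a minimal sketch that applies them to an ordinary Clojure sequence. `shape-sketch` is a hypothetical function written for illustration and is not Uruk's implementation. It covers only explicit :shape values and treats an empty response under :single! as an error, which is an assumption.

```clojure
(defn shape-sketch
  "Illustrates the :shape table on a plain sequence of results.
  Hypothetical; not taken from Uruk's source."
  [shape results]
  (cond
    (nil? shape)       nil
    (= shape :single)  (first results)
    (= shape :single!) (if (= 1 (count results))
                         (first results)
                         ;; Assumption: any count other than one is an error.
                         (throw (ex-info "Expected exactly one result"
                                         {:count (count results)})))
    :else              results))

(shape-sketch :single ["hello world" "ignored"])
(shape-sketch :single! ["hello world"])
```

Use :single! when more than one result would indicate a bug in the query, and :single when the remaining results can safely be dropped.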

Options

Uruk enables you to set Request options on your queries.

Request options are passed as a map to the :options key in the config map. All keys in that inner map must be present in valid-request-options. For example, to retrieve a document as a stream, use the :cache-result request option, which corresponds to MarkLogic's RequestOptions.setCacheResult. (Notice that we also specify no type conversion, because otherwise we would get the document content itself.)

(with-open [sess (uruk/create-session db)]
  (uruk/execute-xquery sess "xquery version \"1.0-ml\"; doc('/content-factory/new-doc')"
                       {:types :raw
                        :options {:cache-result false}}))
=> #object[com.marklogic.xcc.impl.StreamingResultSequence 0x6d7f6 "StreamingResultSequence: closed=true"]

Variables

Uruk empowers you to pass XDM variables to your query, through the :variables key in the configuration map. Variables are most easily passed as a simple mapping from name keys to String values, like so:

(with-open [session (uruk/create-session db)]
  (uruk/execute-xquery session "xquery version \"1.0-ml\";
                                declare variable $my-variable as xs:string external;
                                $my-variable"
                       {:variables {"my-variable" "my-value"}
                        :shape :single!}))

If you need a non-XS_STRING variable, then use the more nuanced map-of-variables syntax:

(with-open [session (uruk/create-session db)]
  (uruk/execute-xquery session "xquery version \"1.0-ml\";
                                declare variable $my-variable as xs:integer external;
                                $my-variable"
                       {:variables {"my-variable" {:value 1
                                                   :type :xs-integer}}
                        :shape :single!}))

The value for type should be a keyword corresponding to a key in variable-types, e.g. :document for XML documents (ValueType/DOCUMENT). It defaults to XS_STRING if :type is not specified. For example, the first simple variables map example above could also be described as {"my-variable" {:value "my-value"}}.
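The equivalence between the two syntaxes can be sketched as a small normalization step. `normalize-variable` is a hypothetical helper, and the keyword :xs-string for the XS_STRING default is assumed by analogy with :xs-integer above; neither is confirmed to match Uruk's internals.

```clojure
(defn normalize-variable
  "Expands the shorthand (a bare value) into the map form and fills in
  the default type. Hypothetical; :xs-string is an assumed keyword."
  [v]
  (let [m (if (map? v) v {:value v})]
    (merge {:type :xs-string} m)))

;; Both spellings of the first example normalize to the same description:
(= (normalize-variable "my-value")
   (normalize-variable {:value "my-value"}))

;; An explicit :type is left untouched:
(normalize-variable {:value 1 :type :xs-integer})
```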

Depending on the XdmValue type, conversion of expected Clojure values is automatic, for instance with this booleanNode:

(with-open [session (uruk/create-session db)]
  (uruk/execute-xquery session "xquery version \"1.0-ml\";
                                declare variable $my-variable as boolean-node() external;
                                $my-variable"
                       {:variables {"my-variable" {:value false
                                                   :type :boolean-node}}
                        :shape :single!}))

Of particular interest is that variables that are XML document-nodes or elements can be created by passing either a String representation, a hiccup-style vector, or a clojure.data.xml.node.Element. (Uruk uses clojure.data.xml 0.1.0-beta2 in order to get its namespace support.)

Values are converted according to the :clj->xdm key in xcc-types. If you need to override those conversions, set the :as-is? key to true inside the map describing the variable. This puts the onus of producing the correct object on you. For instance, we could set :as-is? for that booleanNode:

(with-open [session (uruk/create-session db)]
  (uruk/execute-xquery session "xquery version \"1.0-ml\";
                           declare variable $my-variable as boolean-node() external;
                           $my-variable"
                       {:variables {"my-variable" {:value (-> (com.fasterxml.jackson.databind.node.JsonNodeFactory/instance)
                                                              (.booleanNode false)
                                                              ValueFactory/newBooleanNode)
                                                   :type :boolean-node
                                                   :as-is? true}}
                        :shape :single!}))

The variables map syntax also accepts a :namespace key.

Content Sources and Session Creation

In addition to the basic create-session function that we've been using thus far, Uruk also supports session creation through all the various ContentSourceFactory methods in MarkLogic. Functions make-uri-content-source, make-hosted-content-source, and make-cp-content-source are used to create ContentSource objects that can be manipulated for more complex session-management processes in your application. Note also that create-default-session lets you create sessions by directly invoking the default login credentials of your content sources.

Transactions

Multiple database updates that must occur together can take advantage of transactions. To borrow an example from the XCC Developer’s Guide:

The following example demonstrates using multi-statement transactions in Java. The first multi-statement transaction in the session inserts two documents into the database, calling Session.commit to complete the transaction and commit the updates. The second transaction demonstrates the use of Session.rollback. The third transaction demonstrates implicitly rolling back updates by closing the session.

– Programming in XCC > Multi-Statement Transactions

We translate the original Java to Clojure, taking advantage of Clojure’s with-open idiom:

;; Open a session and configure it to trigger multi-statement transaction use:
(with-open [session (uruk/create-session db {:auto-commit? false :update-mode true})]
  ;; The first request (query) starts a new, multi-statement transaction:
  (uruk/execute-xquery session "xdmp:document-insert('/docs/mst1.xml', <data><stuff/></data>)")

  ;; This second request executes in the same transaction as the
  ;; previous request and sees the results of the previous update:
  (uruk/execute-xquery session "xdmp:document-insert('/docs/mst2.xml', fn:doc(\"/docs/mst1.xml\"));")

  ;; After commit, updates are visible to other transactions. Commit
  ;; ends the transaction after current statement completes.
  (uruk/commit session) ;; <-- Transaction ends; updates are kept

  ;; Rollback discards changes and ends the transaction. The following
  ;; document deletion query never occurs, since it is rolled back
  ;; before calling commit:
  (uruk/execute-xquery session "xdmp:document-delete('/docs/mst1.xml')")
  (uruk/rollback session) ;; <-- Transaction ends; updates are lost

  ;; Closing session without calling commit causes a rollback. The
  ;; following update is lost, since we don't commit before the end of
  ;; the (with-open) and its implicit `.close`:
  (uruk/execute-xquery session "xdmp:document-delete('/docs/mst1.xml')"))

Inserting Clojure XML Elements

You can insert clojure.data.xml.node.Element objects as content:

(with-open [session (uruk/create-session db)]
  (uruk/insert-element session
                       "/content-factory/new-doc" ;; uri to insert at
                       (clojure.data.xml/element :foo)))

This function takes an optional map describing document metadata, including Content Creation Options to use during the insert. For example:

(with-open [session (uruk/create-session db)]
  (uruk/insert-element session
                       "/content-factory/another-new-doc"
                       (clojure.data.xml/element :bar)
                       {:quality 2}))

See uruk.core/valid-content-creation-options, which is a Clojurey version of the possibilities described by ContentCreateOptions.

Inserting Text

You can also directly insert text as content, in any of MarkLogic's supported forms (text, binary, JSON, XML):

(with-open [session (uruk/create-session db)]
  (uruk/insert-string session
                      "/content-factory/new-text-doc" ;; uri to insert at
                      "<abc>def</abc>"))

The insert-string function used here automatically detects string type and inserts the correct type of content. For instance, in this example, the string will be automatically inserted as XML, since clojure.data.xml/parse-str successfully parses it as XML. This function takes options just like insert-element.
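The general idea of detecting a string's type by attempting to parse it can be sketched as follows. This is an illustration only: `xml-string?` is a hypothetical helper, and it uses the JDK's built-in parser in place of clojure.data.xml/parse-str so that it runs without extra dependencies. Detection of the other content types is not shown.

```clojure
(import '(javax.xml.parsers DocumentBuilderFactory)
        '(java.io ByteArrayInputStream))

(defn xml-string?
  "Returns true if s parses as a well-formed XML document, else false.
  Hypothetical stand-in using the JDK parser, not Uruk's own check."
  [^String s]
  (try
    (-> (DocumentBuilderFactory/newInstance)
        (.newDocumentBuilder)
        (.parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))
    true
    ;; Any parse failure means the string is not well-formed XML.
    (catch Exception _ false)))

(xml-string? "<abc>def</abc>")
(xml-string? "just some text")
```

The JDK parser may print a diagnostic to stderr when a parse fails; the return value is unaffected.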

Uncovered surface area

Uruk is sturdy and ready for production. However, some aspects of the XCC/J API have not yet been implemented:

TODO

  • update clojure.data.xml preview dependency; see https://github.com/clojure/data.xml/blob/master/CHANGES.md
  • look into possibly using clojure.spec (once Clojure 1.9 is stable)
  • (breaking change) consider namespaced keys for various config options
  • generative testing (for instance, in as-expected-session-config?)
  • ensure insert-element robustly covers needed use cases
  • possibly implement REx to automatically parse XQuery for XDM variable types
  • possibly implement use-fixtures within tests to create user with appropriate permissions

License

Copyright © 2016-2018 David Liepmann

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.