From 309ef2ceb65027fd9d90a5069bf208938c95c6be Mon Sep 17 00:00:00 2001
From: mignatovich
Date: Wed, 31 Jul 2019 16:21:11 -0400
Subject: [PATCH] Article from conference is start page of site now.

---
 content/about.md | 509 +++++++++++++++++---
 content/documentation/WASM.md | 1 +
 content/documentation/article.md | 434 -----------------
 content/documentation/learning_resources.md | 2 +-
 content/documentation/manifest.md | 1 +
 content/documentation/whitepaper.md | 290 -----------
 layouts/documentation/list.html | 4 +-
 layouts/documentation/single.html | 2 +-
 layouts/index.html | 5 +-
 layouts/partials/buttons.html | 3 -
 layouts/samples/list.html | 4 +-
 layouts/samples/single.html | 2 +-
 12 files changed, 467 insertions(+), 790 deletions(-)
 delete mode 100644 content/documentation/article.md
 delete mode 100644 content/documentation/whitepaper.md

diff --git a/content/about.md b/content/about.md
index 4ce6c16..6f35e42 100644
--- a/content/about.md
+++ b/content/about.md
@@ -4,75 +4,476 @@ date: 2018-07-19T18:35:05-04:00
 draft: false
 ---
 
-In kolmoblocks, data chunks (or data blocks) are identified by their cryptographically strong hashes. For example, ”ABABABF”, an ASCII-serialized string, can be can identified by its SHA-256: **CC6460622F8AFE1A4DC7A92E34D53709115B0E39CCC3F323A35FA5965DB006D6** (we will contactinate hashes in the examples to only the first 5 digits of its hex from here to keep it sane).
+# KolmoLD: Data Modeling for the Modern Internet
+---
+## ABSTRACT
+
+KolmoLD is a framework for content-addressable networking with a programmable layer: data objects distributed across the network can contain program code that decodes the data, is executed in a sandboxed environment, and references other data objects through a naming scheme based on a cryptographic hash function.
+
+Traditionally, the algorithms needed to access given data are delivered via a channel separate from the one used to deliver the data itself, a process that usually involves installing custom software that could introduce security risks. In contrast, KolmoLD delivers the algorithms and the data along the same channel while ensuring security. The KolmoLD approach unlocks the promise of information-centric networks: true data deduplication, data reuse and optimal data delivery.
+
+---
+## CCS CONCEPTS
+
+* **Networks** → **Future Internet**; *High-Precision Networking*; *Programmable Networks*
+
+---
+## KEYWORDS
+Future Internet; Algorithmic Data Networks; Information Centered
+Networks; Content Addressable Networks
+ACM Reference Format:
+Dmitry Borzov, Tingqiu(Tim) Yuan, Mikhail Ignatovich, and Jian Li. 2019.
+KolmoLD: Data Modeling for the Modern Internet. In ACM SIGCOMM
+2019 Workshop on Networking for Emerging Applications and Technologies
+(NEAT’19), August 19, 2019, Beijing, China. ACM, New York, NY, USA, 7 pages.
+https://doi.org/10.1145/3341558.3342198
+
+---
+## INTRODUCTION
+
+Whether the messages sent across a network contain images, text, or voice, from the perspective of the network layer of the OSI model they are abstracted into indistinguishable strings of bytes with a destination address. This design principle, the separation of concerns in the representation of data between the application layer and the lower abstraction levels, has been cited as one of the key principles that enabled the Internet's success [[11](#references)].
+
+And yet, over time, the industry came to recognize the limitations inherent in this concept. It transpired that not all data is equal when it comes to networking. For example, it could be argued that VoIP traffic is more sensitive to high latency than, say, traffic for a routine software update, and thus should be given priority in processing, all other things being equal.
The need to differentiate between categories of traffic was one of the major motivating factors behind the rise of the family of technologies referred to as software-defined networking (SDN) [[9](#references)].
+
+How could content distribution, one of the key traffic categories, be enhanced if the network were somehow cognizant of the meaning of the content shared? Algorithmic information theory, specifically the minimum description length (MDL) principle formulated by Rissanen, provides the theoretical basis for fundamentally more powerful network communication. According to this principle, optimal communication involves considering both the character of the message and the context available to the data recipient [[8](#references)].
+
+While applying such an approach to content delivery networking might sound far-fetched at first, there are, in fact, some well-known design patterns implemented in domain-specific content distribution systems that can be interpreted as basic embodiments of it.
+
+One example is software package managers, such as the Advanced Package Tool (APT) [[1](#references)] for the Debian family of Linux operating systems. Such systems are responsible for downloading and installing software packages from a central package repository. It is common for a digital asset, such as a specific version of a software library, to be used in multiple packages. Consider a scenario where the same library is used in two packages, A and B. If a user who already has package A installed wants to install package B, the APT client can recognize that the library is already available locally, and thus request and download only the parts it actually needs. In other words, leveraging aspects of the meaning of the content at hand, in this case the inner structure of the software packages, avoids unnecessary traffic.
+
+Another example is the content encoding negotiation performed as part of HTTP [[6](#references)]. The browser announces to the server the list of data compression formats it supports in the Accept-Encoding request header. This enables the server to select an encoding for the requested content with the certainty that the browser will be able to decode and render it correctly.
+
+This illustrates how the ability to reference different data formats, combined with the recognition that the same message can be represented in different formats, leads to smarter communication.
+
+Note that nothing in either example is specific to the domain at hand (software packages or web content). The optimizations described are general in nature and can be applied to any data shared across the network.
+
+It is therefore reasonable to ask whether it is possible to design a general-purpose content delivery networking technology that introduces primitives reflecting some aspects of the meaning of the content being shared; in other words, whether content delivery can be built on the MDL principle at the network level.
+
+KolmoLD, the framework presented in this paper, aims to be such a design.
+
+---------------
+
+## CONCEPTUAL OVERVIEW
+
+This section provides a high-level conceptual overview of the KolmoLD framework and introduces core terminology. More specific technical details can be found in the Design section.
+
+In our experience, the best way to introduce the core ideas behind our proposal is with the following analogy.
+
+Pirate S obtains a treasure map and needs help to find the treasure, but her pirate associates are dispersed across the Caribbean. She needs to send them the location of the treasure so that they can meet her there.
+
+One way to achieve that is to send a copy of the treasure map to each pirate.
However, pirate S realizes that this may be unnecessary in some cases. She is aware that some of her associates might have access to a map of the treasure island, albeit without the "X" marking the treasure spot. All they might require is a way to identify the map they need and directions to the treasure's location. Others are more knowledgeable and can read navigational charts: these pirates just need the coordinates.
+
+In other words, pirate S only needs to provide the pirates with enough information for them to reconstruct the treasure map on their side. But which pirate needs what? Asking each pirate what she needs would incur a large overhead, which pirate S wants to avoid.
+
+Instead, pirate S produces a **manifest** for the treasure map and sends it to each of her associates:
+
+---
+
+> The treasure map has the hash "A34F6".
+
+> It can be constructed from these two formulas:
+
+> - those who have a map of an island with the hash "52347" can mark the spot fifty feet South of Spyeglass Hill and they shall have the map.
+> - those familiar with the navigational charts specified in the textbook with the hash "45006" will find the treasure at "23.8'N 82.23'W"
+
+---
+
+The **formulas** are ways to construct the manifest's target **data**, in this case the treasure map. Each formula contains references to other data objects in the form of cryptographic hash function-based data object identifiers (or **DOI**s).
+
+Upon receiving the manifest, pirate A realizes that her sister has the map of the treasure island, asks her for it, and fills in the location of the treasure to construct her treasure map. Pirate B uses the textbook "45006" and marks the location of the treasure using the specified coordinates. She can be sure she interprets the coordinates correctly, as she can verify the hash of her navigation reference.
+
+Pirate S was able to optimize the distribution of the treasure map by distributing information about the properties of the content at hand, which the other pirates could leverage to find the optimal way to reconstruct the data they were after.
+
+As with the treasure map, any content can be described with manifests within KolmoLD. Such descriptions contain DOIs of other data objects and reflect the content's inner structure. The naming scheme based on a cryptographic hash function makes such references unambiguous and verifiable. The set of such manifests is independent of the particulars of the underlying network, such as node IDs, and depends only on the data it describes. In that sense, KolmoLD's manifest syntax can be seen as a data modeling language.
+
+The navigation textbook in the analogy corresponds to a special type of data object introduced in this methodology: one that is interpreted as an algorithm to be applied to the referenced data as part of the manifest description. These special data objects are referred to as **kolmoblocks**. For example, a kolmoblock can specify the decoder for the data compression format used.
+
+Altogether, network agents could read and comprehend the data model specified with KolmoLD and make smarter decisions on how to reconstruct the content they are seeking, resulting in a conceptually more powerful content delivery network.
+
+---
+
+## BACKGROUND
+
+This section offers an overview of the key academic results, products and projects relevant to the discussion.
+
+
+### The MDL principle
+
+The concepts of Shannon entropy and Kolmogorov complexity are often cited as the bases of two major approaches to information theory.
+
+As the approach based on Shannon entropy offers a stronger framework for quantitative analysis, it tends to see wider adoption.
However, the assumptions that come with the concept of Shannon entropy tend to break down when applied to complex challenges in networking. Specifically, the concepts of an abstract source and of constant probability distributions do not fit well with the challenges of engineering practical content delivery networks.
+
+Some recent developments in networking theory illustrate this. In recent years, the field of network coding has emerged, in which the topology of the network is taken into account when encoding messages to achieve more efficient communication. The theory attracted considerable interest and contributed to aspects of the 5G family of technologies. Yet this domain cannot be described within the Shannon entropy-based methodology.
+
+We propose using algorithmic information theory (based on the concept of Kolmogorov complexity) instead, specifically the minimum description length (MDL) principle formulated by Rissanen, as the theoretical foundation for discussing and formulating our approach [[8](#references)].
+
+Here is a brief overview of how it applies to our case.
+
+We start with a network node seeking to retrieve the target data object \\( D \\). We are given a set of models \\( M \\) that can be used to characterize \\( D \\). The following cost functions are specified:
+
+(1) \\( t \\): traffic, the amount of data to be retrieved from the network,
+
+(2) \\( c \\): computational cost,
+
+(3) \\( m \\): memory cost.
+
+Applying a given model \\( M_i \\) from the set \\( M \\) to describe (that is, encode) \\( D \\) has an efficiency captured by the cost vector:
+
+
+

+\begin{equation}
+\vec{C}(M_i, D) = \bigl(
+  t \left\{ M_i(D) \right\},
+  c \left\{ M_i(D) \right\},
+  m \left\{ M_i(D) \right\}
+\bigr) \tag{1}
+\end{equation}
+

+
+
+However, this is not the full cost of applying \\( M_i \\). The node also needs to recognize the model applied and use it to decode the target \\( D \\). If the node already has the \\( M_i \\) decoder, for example as a locally cached kolmoblock, it only needs a reference to that kolmoblock. Otherwise, the node first has to retrieve the full description of the decoder, that is, download the kolmoblock. This cost of the \\( M_i \\) description does not depend on the target \\( D \\) and can be represented as follows:
+
+
+

+\begin{equation}
+\vec{R}(M_i) = \bigl(
+  t \left\{ M_i \right\},
+  c \left\{ M_i \right\},
+  m \left\{ M_i \right\}
+\bigr) \tag{2}
+\end{equation}
+

+
+Given both cost vectors for each \\( M_i \\) in \\( M \\), the network node can find the optimal model for obtaining the data object.
+
+The minimal total cost for the network node is:
+
+
+

+\begin{equation}
+  \min_{M_i \in M} U\left\{ \vec{C}(M_i, D) + \vec{R}(M_i) \right\} \tag{3}
+\end{equation}
+

+
+
+Here \\( U \\) is the utility function of the given network node; it reflects the trade-offs between the different types of costs specific to that node.
+
+The models represented above could be any method used to describe data, for example a data compression algorithm such as LZ-76 compression [[12](#references)]. Models could also reproduce a specific data object by leveraging other data objects on the network. For example, the data object that represents the string 'hello world' can be described by a model that appends the data object "world" to the data object "hello".
+
+Since a model is itself a data object, \\( \vec{R}(M_i) \\) can in turn be defined recursively, as the cost of retrieving the data objects needed to construct the \\( M_i \\) model:
+
+
+

+\begin{equation}
+\vec{R}(M_i) = \vec{C}(M_{j^*}, M_i) + \vec{R}(M_{j^*}), \qquad
+  j^* = \mathop{\mathrm{arg\,min}}_{M_j \in M} U\left\{ \vec{C}(M_j, M_i) + \vec{R}(M_j) \right\} \tag{4}
+\end{equation}
+

+
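The selection rule of equations (3) and (4) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of KolmoLD: cost vectors are plain `(t, c, m)` tuples, the utility function \\( U \\) is a weighted sum with hypothetical node-specific weights, a decoder that is cached locally (or has no entry in the hypothetical `DESCRIBE` table) costs nothing to retrieve, and all model names and numbers are invented for illustration.

```python
from functools import lru_cache

# A cost vector (t, c, m): traffic, computational cost, memory cost.
Cost = tuple[float, float, float]
ZERO: Cost = (0.0, 0.0, 0.0)

# Hypothetical cost table: for each target, the candidate models and the cost
# C(M_i, target) of describing the target with that model. "raw" means
# fetching the object verbatim. All names and numbers are illustrative.
DESCRIBE: dict[str, list[tuple[str, Cost]]] = {
    "D": [("raw", (1000.0, 0.0, 1000.0)),
          ("zstd", (300.0, 50.0, 2000.0)),
          ("custom", (100.0, 80.0, 3000.0))],
    "zstd": [("raw", (500.0, 0.0, 500.0))],
    "custom": [("raw", (4000.0, 0.0, 4000.0))],
}


def add(a: Cost, b: Cost) -> Cost:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def utility(cost: Cost, weights: Cost = (1.0, 0.1, 0.01)) -> float:
    # Assumed utility U: a weighted sum with node-specific weights.
    return sum(w * x for w, x in zip(weights, cost))


@lru_cache(maxsize=None)
def retrieval_cost(model: str, cached: frozenset[str]) -> Cost:
    # R(M_i): zero if the decoder is cached locally or built in (no entry in
    # DESCRIBE); otherwise the cheapest way to construct the model (eq. 4).
    # Assumes the DESCRIBE graph has no cycles.
    if model in cached or model not in DESCRIBE:
        return ZERO
    return best_model(model, cached)[1]


def best_model(target: str, cached: frozenset[str]) -> tuple[str, Cost]:
    # Eq. (3): pick the model minimising U{C(M_i, target) + R(M_i)}.
    options = [(name, add(cost, retrieval_cost(name, cached)))
               for name, cost in DESCRIBE[target]]
    return min(options, key=lambda option: utility(option[1]))


# A node that already caches the zstd decoder picks zstd ...
print(best_model("D", frozenset({"zstd"})))    # ('zstd', (300.0, 50.0, 2000.0))
# ... while a node that caches the custom decoder picks the custom codec.
print(best_model("D", frozenset({"custom"})))  # ('custom', (100.0, 80.0, 3000.0))
```

The two calls show the point of the per-node calculation: the same target \\( D \\) and the same set of models lead to different optimal choices depending on what each node already holds.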
+
+The MDL principle suggests that the key to optimal communication is the ability to calculate the cost vectors for the set \\( M \\) and target \\( D \\) **for each given network node**. The KolmoLD design offers network primitives that allow nodes to share the information necessary to calculate these cost vectors.
+
+### Information-Centric Networks
+
+The network protocols [[2](#references)] that form the backbone of the Internet are based on the telephony network model: they enable point-to-point communication between two nodes by specifying the format for messages to be relayed from one node to another. As the Internet matured, the point-to-point communication pattern gave way to other dominant patterns, notably content delivery, where one provider is tasked with delivering the same content to multiple consumers.
+
+While the telephony network model has proven resilient and powerful in numerous contexts, its advantages are more limited for pure content distribution. The resulting complications are comparable to those that would arise when broadcasting television or radio content over the telephone network.
+
+An alternative network communication paradigm, information-centric networking (ICN), has been proposed for content distribution [[14](#references)]. On a point-to-point network, a node that wants to obtain a specific data object must send a request to another node that is known to have that data object. In contrast, on an ICN, requests are defined not by the addressee but by the content (more specifically, the data object) they are looking for. Such content requests can then be served by any node that has the content.
+
+One way to illustrate the advantages of the ICN model is to consider the problem of caching JavaScript libraries.
While each web application's JavaScript code is ultimately unique, it usually includes JavaScript libraries that provide elements and functionality commonly needed in web applications. As a result, multiple websites might include the same version of a library, and such libraries commonly make up a sizeable portion of the overall JavaScript code. According to one analysis [[3](#references)], up to 72% of websites use at least one of these JavaScript libraries. Even more strikingly, 79% of those websites use the most common one, jQuery.
+
+As a best practice, web developers prefer to host all the JavaScript code and libraries they require themselves, rather than referencing shared URLs, to minimize security issues and avoid depending on third parties for their end-user experience. As a result, when the end user navigates from one website to the next, the browser is forced to download the same JavaScript library anew.
+
+ICN's architecture is designed to handle such situations gracefully. Under ICN, a given JavaScript library can be referenced not by a URL but by a content identifier. The copy can then be obtained from the closest node willing to serve it, and cached and reused on local devices.
+
+ICN has received considerable academic interest, resulting in multiple proposals and implementations. Named Data Networking (NDN) [[13](#references)] is an example of a high-profile project [[2](#references)].
+
+### What's in a name?
+
+An ICN architecture proposal must specify numerous aspects of the problem: routing, data discovery, scaling [[14](#references)]. One design decision is at the core of all these aspects and can be used as the basis for classifying ICN proposals: the naming scheme, that is, how the data objects in the network are referenced and identified.
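To make the JavaScript-library scenario concrete, here is a minimal Python sketch of a local cache keyed by hash-based names derived from SHA-256 digests. It is an illustration under stated assumptions, not part of any ICN implementation: the `ContentStore` class and its methods are hypothetical. It shows two properties of such a naming scheme: byte-identical copies shipped by two websites map to one cache entry, and data received from an untrusted peer can be checked against the name that was requested.

```python
import hashlib
from typing import Optional


def doi(data: bytes) -> str:
    # Hash-based name: the hex SHA-256 digest of the data itself.
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Hypothetical local cache keyed by hash-based names."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Byte-identical objects get the same name, so they are stored once.
        name = doi(data)
        self._objects.setdefault(name, data)
        return name

    def accept(self, name: str, data: bytes) -> bool:
        # Self-certifying name: data from any peer is accepted only if it
        # hashes to the name that was requested.
        if doi(data) != name:
            return False
        self._objects.setdefault(name, data)
        return True

    def get(self, name: str) -> Optional[bytes]:
        return self._objects.get(name)

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
library = b"/* the same library bytes, bundled by two different websites */"
name_a = store.put(library)  # first website
name_b = store.put(library)  # second website: same name, no new entry
print(name_a == name_b, len(store))           # True 1
print(store.accept(name_a, library + b" "))   # False: one extra byte, new hash
```

The last line also previews a limitation of pure hash-based naming: a one-byte change produces an unrelated name, so the cache cannot tell that the two objects are almost identical.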
+
+One option is to base the naming scheme on cryptographic hash functions. Cryptohash naming is a well-known design pattern, also referred to as the content-addressable storage pattern. Well-known software projects that can be viewed as ICN proposals, such as IPFS [[4](#references)] and BitTorrent [[5](#references)], adopt a hash-based naming scheme.
+
+On the one hand, this approach provides the following advantages:
+
+ **(1) Universal naming convention**: the hash-based name is unambiguous, global, and self-contained; it does not depend on the context within which it is used;
+
+ **(2) Independence from the circumstances of origination**: the hash-based name is defined by the data alone, not by who produced or published the data object, when, or for what purpose;
+
+ **(3) Self-certifying name**: it is easy for the data receiver to verify whether the data object matches the name.
+
+
+On the other hand, the approach has the following drawbacks:
+
+ **(1) Off-by-one errors**: while data reuse is often cited as an advantage of this approach, some quantitative studies have found its impact to be relatively limited. The reason is fundamental: two data objects that differ by a single bit are identified as two different data objects with two distinct hashes. The network has no way to recognize or leverage the fact that two objects contain related information;
+
+ **(2) No metadata**: the data by itself is virtually useless without information about the data type or data format used.
+
+
+As illustrated above, KolmoLD also adopts a hash-based naming scheme. Its additional contribution is the ability to encode arbitrary algorithms that are pertinent to the data object at hand.
This means that similar data objects can be linked together with formulas describing how to construct one from the other, and that the decoder for a given file format can be described in an algorithm data object and referenced within the manifest.
+
+Thus, KolmoLD's approach retains the advantages of hash-based naming while mitigating its drawbacks.
+
+### The right codec
+As the choice of data compression format (or codec) is key to efficient content delivery networking, we argue that it should be incorporated as a component of network design.
+
+Choosing the optimal codec goes beyond achieving the highest compression ratio [[12](#references)]: application context and end-device capabilities also need to be considered. For example, low-power devices would struggle to run computationally expensive decoders. Other codec features that might influence the choice are domain-specific, such as progressive download, which displays a lower-resolution version of an image to the end user once part of the data object has been downloaded; a higher-resolution version is then shown as the rest of the data arrives.
-Such a naming scheme provides you with the global address space of data, where data block's id only depends on the data itself, not where it came from.
+
+In general, the more assumptions that can be made about the data, the better the codec that can be designed. The new generation of compression algorithms provides tooling to leverage this. For example, Zstandard allows you to "train" the codec by feeding it samples of the data you would like to compress, building up a special data structure, a dictionary, of frequently encountered tokens.
-One can note that such a naming scheme was implemented in [IPFS](ipfs.io) or [git](https://git-scm.com/).
+
+The power of highly specialized codecs can also be illustrated with online multiplayer video games.
The game architecture consists of a game server that keeps track of the state of the shared virtual environment. The information it tracks is specific to that game: for example, player navigation inputs and player inventories. The game server sends this information to all participating player client applications. The program code of both the client and the server is highly optimized to send only the minimum data required. The client application then uses this data to render the video stream that constitutes the game environment. Thus, the video game program code can be thought of as a special type of video codec: one that only works for the video stream of that particular game. The narrower the type of data we target, the more efficient the compression we might achieve.
-The key idea of kolmoblocks is to combine the hash-based naming scheme with the concept of sending data as programs that output that data.
+
+Based on that, one might expect a plethora of narrow-use, domain-specific data compression formats and codecs to be available in the marketplace: for example, video codecs tailored for specific domains, such as videos containing only e-learning content.
-For example, lets say you want to send someone that example string from above, ”ABABABF”. You can send instead a program that outputs that string:
+
+Instead, software engineers are usually stuck with several decades-old general-purpose data compression formats: an LZ-based compressor for structured data such as text and bytecode, JPEG or PNG for images, and so on.
-```python
-def render():
- return "ABABABF"
+The reason is the effort required for a new codec to be adopted by the industry. Not only does the developer of a new codec have to design and implement it; she also needs the end-user agent to be able to recognize and use it.
That usually means that new software supporting the new codec must be deployed on the end-user system before an application can use it.
+
+KolmoLD takes an approach that sidesteps this challenge altogether: what if the decoder algorithm could be distributed along with the data itself? This would enable the content publisher to select the codec that works best for the given data type without worrying about whether it is supported. The recipient would download the decoding algorithm just like any other content and use it to decode the given data.
+
+In this scenario, running the decoder algorithm effectively amounts to executing untrusted program code, which raises security and performance concerns. The execution environment needs to be sandboxed from the host machine, with all security issues carefully considered and addressed.
+
+Fortunately, there are developer communities that have gained experience addressing these challenges. The Ethereum Virtual Machine is an example of deterministic execution of Turing-complete programs that can reference other code through a hash-based naming scheme. Sandboxed execution of JavaScript code in the browser is another example.
+
+KolmoLD proposes a specification for such a sandboxed environment, in which Turing-complete algorithms are executed in a runtime that can only access their input data objects and outputs the target data object.
+
+Our implementation is built on top of WebAssembly [[7](#references)]. This allows us to tap into the research and expertise that community has accumulated in solving a similar problem.
+
+---
+## DESIGN
+At its core, KolmoLD provides primitives to describe the relations between the data objects shared and distributed across the network. The solution consists of the following components:
+
+**(1) KolmoLD manifest syntax** that allows network nodes to describe the data objects they handle and their relations with other data objects.
+
+**(2) WebAssembly-based kolmoblock format** [[7](#references)] to encode arbitrary programs that can be executed to retrieve data objects.
+
+**(3) A versioned registry** of approved and supported WebAssembly-based modules that contain codecs and algorithms.
+
+
+Consider the data objects in the following "Hello World" example:
+
+![image](/img/hello_world.jpg)
+
+The "Hello world!" string can be constructed in two ways: by concatenating the strings "Hello " and "world!", or by concatenating the strings "He" and "llo world!".
+
+This basic data model consists of the following data objects:
+
+(1) kolmoblock "cat", which encodes the logic to concatenate input strings;
+
+(2) strings "Hello world!", "Hello ", "world!", "He" and "llo world!".
+
+This model can be described with the following manifest for the target string "Hello world!" with the hash "B5D40" (in our examples, we use only the first 5 digits of the hex representation of the SHA-256 hash):
+
+----
+```
+(match,
+  (doi, "B5D40"),
+  (exec
+    (doi, "cat",
+      type: "application/wasm+kolmold"
+    ),
+    (doi,
+      type: "text/plain",
+      data: "Hello "
+    ),
+    (doi,
+      type: "text/plain",
+      data: "world!"
+    )
+  ),
+  (exec
+    (doi, "cat",
+      type: "application/wasm+kolmold"
+    ),
+    (doi,
+      type: "text/plain",
+      data: "He"
+    ),
+    (doi,
+      type: "text/plain",
+      data: "llo world!"
+    )
+  )
+)
+```
+-----
+
+
+The data object is a key abstraction of the KolmoLD methodology. It refers to any identifiable chunk of data of arbitrary size (a whole number of bytes). KolmoLD manifests describe the properties of data objects and the relations between them.
+
+The manifest syntax supports three types of expressions: doi, exec and match. All three are used in the example above.
+
+More formally, in Extended Backus-Naur Form (EBNF):
+
+----
+```
+kolmold_expr ::= { doi_expr | exec_expr | match_expr }
+```
+----
+
+
+We will now review each in more detail.
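Before turning to each expression type, the intended semantics of the "Hello World" manifest can be sketched in Python. This is an illustration under stated assumptions, not the KolmoLD runtime: expressions are modelled as small Python classes, the *cat* kolmoblock is stood in for by a Python function rather than a WebAssembly module, DOIs are truncated to the first 5 hex digits of SHA-256 as in the examples, and the target DOI is computed rather than copied from the manifest. The names `Doi`, `Exec`, `evaluate` and `check_match` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Union


def short_doi(data: bytes) -> str:
    # As in the examples: the first 5 hex digits of the SHA-256 digest.
    return hashlib.sha256(data).hexdigest()[:5].upper()


@dataclass(frozen=True)
class Doi:
    data: bytes  # a doi expression carrying a data literal


@dataclass(frozen=True)
class Exec:
    kolmoblock: str             # name of a (hypothetical) registered kolmoblock
    deps: tuple["Expr", ...]    # ordered dependency expressions


Expr = Union[Doi, Exec]

# Python stand-in for the wasm "cat" kolmoblock: concatenates its inputs.
KOLMOBLOCKS: dict[str, Callable[..., bytes]] = {"cat": lambda *deps: b"".join(deps)}


def evaluate(expr: Expr) -> bytes:
    if isinstance(expr, Doi):
        return expr.data
    inputs = [evaluate(dep) for dep in expr.deps]
    return KOLMOBLOCKS[expr.kolmoblock](*inputs)


def check_match(target: str, *alternatives: Expr) -> bytes:
    # A match expression claims that every alternative yields the same object,
    # and the hash-based target name lets the node verify that claim.
    outputs = {evaluate(alt) for alt in alternatives}
    if len(outputs) != 1:
        raise ValueError("match alternatives disagree")
    (data,) = outputs
    if short_doi(data) != target:
        raise ValueError("output does not match the target DOI")
    return data


target = short_doi(b"Hello world!")
result = check_match(
    target,
    Exec("cat", (Doi(b"Hello "), Doi(b"world!"))),
    Exec("cat", (Doi(b"He"), Doi(b"llo world!"))),
)
print(result)  # b'Hello world!'
```

In practice a node would evaluate only the cheapest alternative; evaluating all of them here simply makes the equivalence claimed by the match expression explicit.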
+
+### DOI expression
+Data objects are identified by their Data Object Identifier (DOI). The doi expression references a specific data object.
+
+The doi expression syntax is defined as follows:
+
+----
+```
+doi_expr ::= doi []
+  [ type: ]
+  [ size: ]
+  [ data: ]
+  [ publisher: ]
+  [ sign: ]
+  [ func-verify: kolmoblock_expr ]
+  [ func-sign: kolmoblock_expr ]
+```
+----
+
+
+The first argument is the cryptohash-based ID. It is followed by optional attributes that provide additional metadata about the data object being described:
+
+
+* **type**: MIME type of the data object.
+
+* **size**: size of the data object in bytes.
+
+* **data**: data literal of the data object.
+
+* **publisher**: public key used for the digital signature.
+
+* **sign**: the digital signature.
+
+* **func-verify**: verifier function; expects a registered kolmoblock doi.
+
+* **func-sign**: digital signature verifier; expects a registered kolmoblock doi.
+
+
+The data literal attribute is special: when the doi expression carries the data itself, many other attributes, such as the size or the hash ID, become redundant.
+
+In the "Hello World" example, two different MIME types are used. The concatenation function kolmoblock *cat* has the MIME type *application/wasm+kolmold*, which indicates that the *cat* kolmoblock contains program code in the WebAssembly bytecode format. The dependency data objects containing the strings to be concatenated are of type *text/plain*.
+
+
+### Exec expression
+The keyword *exec* denotes a formula expression, that is, it specifies a data object as the output of a sandboxed kolmoblock execution. The *cat* kolmoblock is an example of this and illustrates how a data object in the KolmoLD framework can be described as the output of an algorithm encoded in a WebAssembly-based format.
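Executing an exec expression requires the runtime to place each dependency data object into the module's linear memory and tell the kolmoblock code where to find it. The sketch below simulates this in Python, with a `bytearray` standing in for WebAssembly linear memory. The layout is an assumption chosen for illustration, not the KolmoLD specification: the dependency count at address 0, then one `(offset, length)` pair of little-endian 32-bit integers per dependency, then the dependency bytes, with memory sized in 64 KiB WebAssembly pages and checked against a page limit. The functions `preload` and `read_dep` are hypothetical.

```python
import struct

PAGE_SIZE = 64 * 1024  # a WebAssembly page is 64 KiB


def preload(deps: list[bytes], mem_size_pages: int) -> bytearray:
    """Lay out dependencies in a simulated linear memory (hypothetical layout).

    Address 0 holds the dependency count; entry i of the table that follows
    holds (offset, length) of dependency i as little-endian u32 values.
    """
    header_size = 4 + 8 * len(deps)
    total = header_size + sum(len(dep) for dep in deps)
    pages = max(1, -(-total // PAGE_SIZE))  # round up to whole pages
    if pages > mem_size_pages:
        raise MemoryError(f"needs {pages} pages, limit is {mem_size_pages}")
    memory = bytearray(pages * PAGE_SIZE)
    struct.pack_into("<I", memory, 0, len(deps))
    offset = header_size
    for index, dep in enumerate(deps):
        struct.pack_into("<II", memory, 4 + 8 * index, offset, len(dep))
        memory[offset:offset + len(dep)] = dep
        offset += len(dep)
    return memory


def read_dep(memory: bytearray, index: int) -> bytes:
    # What kolmoblock code would do to locate dependency `index` (from 0).
    offset, length = struct.unpack_from("<II", memory, 4 + 8 * index)
    return bytes(memory[offset:offset + length])


memory = preload([b"Hello ", b"world!"], mem_size_pages=1)
print(read_dep(memory, 0) + read_dep(memory, 1))  # b'Hello world!'
```

A fixed offset table keeps the contract between runtime and kolmoblock simple: the code only needs a dependency's index to find its bytes, which matches the index-based identification of dependencies used by exec expressions.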
+
+The MIME type for the kolmoblock format is *application/wasm+kolmold*, which indicates that our implementation is based on the WebAssembly (wasm) module format. The kolmoblock format is a subset of the WebAssembly module format, with non-deterministic operations removed.
+
+In addition, the kolmoblock format specifies the following:
+
+(1) **Entry point function**: the wasm module's exported function *main* is used as the entry point for running the kolmoblock bytecode. The *main* function is expected to be of type *-> i32*.
+
+(2) **Determinism**: KolmoLD expressions are deterministic and reproducible; however, version 1.0 of WebAssembly is not fully deterministic. To address this disparity, KolmoLD constrains the behaviour that the WebAssembly specification leaves non-deterministic.
+
+(3) **Input data object loaders**: exec expressions can designate other data objects as input (or dependency) data objects for the execution at hand. The kolmoblock runtime is responsible for preloading the input data objects into the WebAssembly module's linear memory before the module's code executes, and for informing the WebAssembly code of where each input data object is located in linear memory.
+
+Figure 1 illustrates the kolmoblock WebAssembly module's linear memory with the dependency data objects preloaded:
+
+
+![image](/img/wasm_kolmold.jpg)
+*Figure 1: The dependency data objects are preloaded into the kolmoblock's linear memory*
+
+
+The dependency data objects are identified by their index number, starting with 0. The exec expression syntax is as follows:
+
+----
 ```
-The consumer would be able to to retrieve the requested block sent in such format by:
-1. running this block in an interpreter and saving the output into a file
-2. validate that the output is indeed the requested data block by checking its hash
-Sending data as the programs only marginally changes the total size of data sent, yet it provides unique capabilities.
For example, this script outputs the same target block yet is smaller in size: +exec_expr ::= "exec" op_cap_expr* mem_size_expr* + [ op_cap: varint ] + [ mem_size: varint ] + [dep-exprs] +dep-exprs ::= doi_expr [dep-exprs] + +``` +---- + +* **wasm-module-cid**: the identifier of the executable kolmoblock WebAssembly module; + +* **op_cap**: the operation limit, expressed as the maximum number of WebAssembly instructions evaluated; + +* **mem_size**: the WebAssembly memory limit, expressed as a number of 64 KiB pages; -```python -def render(): - return "AB"*3 + "F" +* **dep-exprs**: the ordered list of dependency data objects. + + + +### Match expression + +A key feature of KolmoLD is the ability to express that the same content can be generated in multiple ways. The *match* expression states that multiple descriptions of a data object, whether based on exec or doi expressions, produce the same data object. + +Match expressions are similar to pattern matching expressions in functional programming languages such as Haskell. + +The match expression syntax is as follows: + +---- ``` -Now let's make it possible to reference and "import" other data blocks by their hash in such programs. +match_expr ::= match +expr ::= doi_expr | exec_expr +exprs ::= + +``` +---- + + +Suppose the same data object is referenced by two different DOIs obtained with different hash functions: **'five spaces'** for the *hash_func1* function and **'ffee45'** for *hash_func2*. This can be stated with the following expression: -Consider distributing a data block **F3025**: "ABABABF-FBABABA".
The following script takes the block **CC646** as a dependency block and outputs the target block **F3025**: +---- +``` + +(match, + (doi, 'five spaces', func-verify: hash_func1), + (doi, 'ffee45', func-verify: hash_func2) +) -```python -# kolmoblock -# target block: F3025 -# dependency blocks: CC646 -def render(): - return dep('CC646') + "-" + dep('CC646')[::-1] ``` +---- + +Suppose we need to express that the data object with DOI '1' can be obtained by executing the kolmoblock with DOI 'fibonacci'. -You can import not just raw data, but also "code" from the "codeblocks", the ones that contain the algorithms/logic. - -Consider a codeblock that contains a Huffman decoding function: -```python -# codeblock 77650 -def f(huffman_tree, encoded_string): - output = '' - cur = 0 - while (cur < len(encoded_string)): - node = huffman_tree - while type(node) is list: - bit = encoded_string[cur] == '1' - cur += 1 - node = node[1] if bit else node[0] - output += node - return output +This can be done with the following match expression: + +---- ``` -We can reference and "import" that function in our kolmoblock for the same "ABABABF-FBABABA" string encoded with the Huffman tree: - ```python -# kolmoblock -# target block: F3025 -# dependency blocks: 77650 -def render(): - namespace = {} - exec(dep('77650'), namespace) - huffman_decode = namespace['f'] - huffman_tree = ['AB', - ['BA', - ['F', '-'] - ] - ] - encoded = '000110111110101010' - return huffman_decode(huffman_tree, encoded) + +(match, + (doi, '1'), + (exec + (doi, 'fibonacci') + ) +) ``` +---- + -More details can be found in the [documentation section](/documentation). +With the basic building blocks of **match**, **doi** and **exec** expressions, the manifest syntax provides primitives to describe the underlying structure of the data shared across the network. + +--- +## CONCLUSION +KolmoLD offers new primitives for low-level content distribution networking.
Network nodes can leverage them to share and distribute information about the character and underlying structure of the data objects that make up the shared content. + +This enables network nodes to make better-informed decisions about how to retrieve content, taking into account each node's own context; in other words, it enables network communication based on the MDL principle. + +--- +## REFERENCES +###### 1. [n.d.]. Apt - Debian Wiki. https://wiki.debian.org/Apt Accessed 2019-04-08. +###### 2. [n.d.]. A Future Internet Architecture. http://named-data.net/ Accessed 2019-04-09. +###### 3. [n.d.]. JavaScript Usage Distribution in the Top 1 Million Sites. https://trends.builtwith.com/docinfo/Javascript Accessed 2019-04-08. +###### 4. Juan Benet. 2014. IPFS - Content Addressed, Versioned, P2P File System. (July 2014). +###### 5. Bram Cohen. [n.d.]. BitTorrent.org. http://www.bittorrent.org/beps/bep_0003.html Accessed 2019-04-09. +###### 6. Walter Goralski. 2017. The illustrated network: how TCP/IP works in a modern network (2nd ed.). Morgan Kaufmann. +###### 7. Andreas Haas, Andreas Rossberg, Derek L. Schuff, Ben L. Titzer, Michael Holman, Dan Gohman, Luke Wagner, Alon Zakai, and JF Bastien. 2017. Bringing the Web Up to Speed with WebAssembly. SIGPLAN Not. 52, 6 (June 2017), 185–200. https://doi.org/10.1145/3140587.3062363 +###### 8. Ming Li and Paul Vitányi. 2014. Introduction to Kolmogorov complexity and its applications (3rd ed.). Springer. +###### 9. Thomas D. Nadeau and Kenneth Gray. 2013. SDN: software defined networks. O'Reilly Media, Inc. +###### 10. Afif Osseiran, Jose F. Monserrat, and Patrick Marsch. 2016. 5G mobile and wireless communications technology. Cambridge University Press. +###### 11. Larry L. Peterson and Bruce S. Davie. 2011. Computer networks - a systems approach (5th ed.). Morgan Kaufmann. +###### 12. Khalid Sayood. 2018. Introduction to data compression (5th ed.).
Morgan Kaufmann. +###### 13. Lixia Zhang, Alexander Afanasyev, Jeffrey Burke, Van Jacobson, kc claffy, Patrick Crowley, Christos Papadopoulos, Lan Wang, and Beichuan Zhang. 2014. Named Data Networking. SIGCOMM Comput. Commun. Rev. 44, 3 (July 2014), 66–73. https://doi.org/10.1145/2656877.2656887 +###### 14. Q. Zhang, X. Wang, M. Huang, K. Li, and S. K. Das. 2018. Software Defined Networking Meets Information Centric Networking: A Survey. IEEE Access 6 (2018), 39547–39563. https://doi.org/10.1109/ACCESS.2018.2855135 \ No newline at end of file diff --git a/content/documentation/WASM.md b/content/documentation/WASM.md index 884a435..82b1b4f 100644 --- a/content/documentation/WASM.md +++ b/content/documentation/WASM.md @@ -4,6 +4,7 @@ fullTitle: "WebAssembly" name: "Mikhail Ignatovich" date: 2018-07-19 external_link: "" +weight: 3 draft: false --- diff --git a/content/documentation/article.md b/content/documentation/article.md deleted file mode 100644 index 95a5e80..0000000 --- a/content/documentation/article.md +++ /dev/null @@ -1,434 +0,0 @@ ---- -title: "KolmoLD" -fullTitle: "KolmoLD: Data Modeling for the Modern Internet " -name: "Dmitry Borzov" -date: 2018-07-19 -external_link: "" -draft: false -mathjax: true -menu: - concept: - parent: 'concept' - weight: 30 ---- - - ---- - -# KolmoLD: Data Modeling for the Modern Internet ---- -## ABSTRACT - -KolmoLD is a framework for content-addressable networking with a programmable layer: the data objects distributed across the network can contain program code to decode the data that is specified to be executed in a sandboxed environment and references other data objects based on a cryptographic hash function naming scheme. - -Traditionally, the algorithms to access given data are delivered via a channel that is separate from the one used to deliver the data itself, and usually involves installing custom software that could introduce security risks. 
In contrast, KolmoLD enables the delivery of the algorithms and the data along the same channel and ensures security. The KolmoLD approach unlocks the promise of information-centric networks: enabling true data deduplication, data reuse and optimal data delivery. - - - -> ## CCS CONCEPTS -> Networks $\rightarrow$ High-Precision Networking; Programmable Networks - - -## INTRODUCTION - -Whether the messages sent across a network contain images, text, or voice, from the perspective of the network layer within the OSI model, they are abstracted out to indistinguishable strings of bytes with a destination address. This design principle, the separation of concerns in the representation of data between the application layer and the lower abstraction levels, has been cited as one of the key principles that enabled the Internet's success $[11]$. - -And yet, with time, the industry came to learn the limitations inherent in this concept. It transpired that not all data is equal when it comes to networking. For example, it could be argued that VoIP traffic is more sensitive to high latency than, say, traffic for a routine software update, and thus should be given priority in processing, all other things being equal. The necessity to differentiate the categories of traffic was one of the major motivating factors behind the rise of the family of technologies referred to as software-defined networking (SDN) $[9]$. - -How could content distribution, one of the key traffic categories, be enhanced if the network were somehow cognizant of the meaning of the content shared? The concepts of algorithmic information theory, specifically the minimum description length (MDL) principle formulated by Rissanen, provide the theoretical basis for fundamentally more powerful network communication. According to the MDL principle, optimal communication involves considering both the character of the message and the context of the data recipient $[8]$.
- - While the practicality of applying such an approach to the problem of content delivery networking might sound far-fetched at first, there are, in fact, some well-known design patterns implemented in domain-specific content distribution systems that can be interpreted as its embodiment, even if in a relatively basic way. - -One example is software package distribution managers, such as the Advanced Package Tool (APT) system $[1]$ for the Debian Linux family of operating systems. Such systems are responsible for downloading and installing software packages from a central package repository. It is common for a digital asset, such as a specific version of a software library, to be used in multiple packages. Consider a scenario where the same library is used in two packages: package A and package B. If the user, who already has package A installed, wants to install package B, the APT client program code can recognize that the library is already locally available, and thus only request and download the parts of the package it actually needs. In other words, leveraging aspects of the meaning of the content at hand, in this case the inner structure of the software packages, avoids unnecessary traffic. - -Another example is the encoding negotiation performed as part of HTTP $[6]$. The browser announces to the server the list of data compression formats it supports in the Accept-Encoding request header. This enables the server to select a suitable compression format for the content that the browser requested, with the certainty that the browser will be able to decode and render the content correctly. - -This illustrates how being able to reference different data formats, and to incorporate the fact that the same message can be represented in different data formats, leads to smarter communication. - -Note that there is nothing specific to the domains at hand (software packages or web content) in either example.
The optimizations described in the examples are general in nature and can be applied to any data shared across the network. - -It is therefore reasonable to ask whether it is possible to design a general-purpose content delivery networking technology that introduces primitives to reflect some aspects of the meaning of the content being shared; in other words, whether content delivery can be made more powerful by basing it on the MDL principle at the network level. - -KolmoLD, the framework presented in this paper, aims to be such a design. - ---------------- - -## CONCEPTUAL OVERVIEW - -This section provides a high-level conceptual overview of the KolmoLD framework and introduces core terminology. More specific technical details can be found in the design section. - -In our experience, the best way to introduce the core ideas behind our proposal is with the following analogy. - -Pirate S obtains a treasure map and she needs help to find the treasure, but her pirate associates are dispersed across the Caribbean. She needs to send them the location of the treasure, so that they can meet her there. - -One way to achieve that is to send out a copy of the treasure map to each pirate. However, pirate S realizes that it could be unnecessary in some cases. She is aware that some of her associates might have access to the map of the treasure island, albeit without the "X" marking the treasure spot. All they might require is a way to identify the map they need and directions to the treasure's location. Others are more knowledgeable and can read navigational charts: these pirates just need the coordinates. - -In other words, pirate S only needs to provide the pirates with enough information for them to reconstruct the treasure map on their side. But which pirate needs what? Enquiring what each pirate needs might incur a large overhead and is something pirate S does not want to do.
- -What pirate S does instead is produce a **manifest** for the treasure map and send it to each of her associates: - ---- - -> The treasure map has the hash "A34F6". -> It can be constructed from these two formulas: -> - those who have a map of an island with the hash "52347" can mark the spot fifty feet south of Spyglass Hill and they shall have the map. -> - those familiar with the navigational charts specified in the textbook with the hash "45006" will find the treasure at "23.8'N 82.23'W" - ---- - -The **formulas** are ways to construct the manifest's target **data**, in this case the treasure map. Each formula contains references to other data objects in the form of cryptographic hash function-based data object identifiers (or **DOI**s). - -Upon receiving the manifest, pirate A realises that her sister has the map of the treasure island, asks her for it, and fills in the location of the treasure to construct her treasure map. Pirate B uses the textbook "45006" and marks the location of the treasure using the specified coordinates. She can be sure that she interprets the given coordinates correctly, as she can verify the hash of her navigation reference. - -Pirate S was able to optimize the distribution of the treasure map by distributing information about the properties of the content at hand, which the other pirates were able to leverage to find the optimal way to reconstruct the data they were after. - -As in the case of the treasure map, any content can be described with manifests within KolmoLD. Such descriptions contain DOIs of other data objects and reflect the content's inner structure. The naming scheme based on the cryptohash function makes such references unambiguous and verifiable. The set of such manifests is free from the particulars of the underlying network, such as node IDs, and is based only on the data it describes. In that sense, KolmoLD's manifest syntax can be seen as a data modeling language.
- -The navigation textbook referenced in the manifest is a special type of data object introduced in this methodology: it is interpreted as an algorithm to be applied to the referenced data as part of the manifest description. These special data objects are referred to as **kolmoblocks**. Such algorithms can specify, for example, the decoder for the data compression format used. - -Altogether, network agents could potentially read and comprehend the data model specified with KolmoLD and make smarter decisions on how to reconstruct the content they are seeking, achieving a conceptually more powerful content delivery network. - ---- - -## BACKGROUND - -This section offers an overview of the key academic results, as well as products and projects relevant to the discussion. - - -### The MDL principle - -The concepts of Shannon entropy and Kolmogorov complexity are often quoted as the bases for two major approaches to information theory. - -As the approach based on Shannon entropy offers a stronger framework for quantitative analysis, it tends to see wider adoption. However, the assumptions that come with the concept of Shannon entropy tend to break down when they are applied to complex challenges in networking. Specifically, the concepts of the abstract source and fixed probability distributions do not fit well with the challenges of engineering practical content delivery networking. - -Some recent developments in networking theory illustrate this. In recent years, the new field of network coding has emerged, in which the topology of the network is taken into account in message encoding to achieve more efficient communication. The theory has attracted considerable interest and contributed to aspects of the 5G family of technologies. Yet this domain cannot be described within the Shannon entropy-based methodology.
- - We propose using algorithmic information theory (based on the concept of Kolmogorov complexity) instead, and specifically the minimum description length (MDL) principle formulated by Rissanen, as the theoretical foundation to discuss and formulate our approach $[8]$. - -Let us give a minimal overview of how it applies to our case. - -We start with a network node seeking to retrieve our target data object $D$. We are given a set of models $M$ that can be used to characterize $D$. The following cost functions are specified: - - 1. $t$: traffic, the amount of data to be retrieved from the network; - 2. $c$: computational cost; - 3. $m$: computational memory requirements cost. - - -A given model $M_i$ from the set $M$ can be applied to describe (that is, encode) $D$, with its efficiency described by a cost vector: - - - $$\vec{C}(M_i, D) = \left(t_{M_i(D)},\ c_{M_i(D)},\ m_{M_i(D)}\right)$$ - - - -However, this is not the full cost of applying $M_i$. The node also needs to be able to recognize the model applied and use it to decode the target $D$. If the node already has the $M_i$ decoder, perhaps as a locally cached kolmoblock, it only needs a reference to that kolmoblock. Otherwise, the node first has to retrieve the full description of the decoder by downloading the kolmoblock. This cost of the $M_i$ description does not depend on the target $D$ and can be represented as follows: - -$$\vec{R}(M_i) = \left(t_{M_i},\ c_{M_i},\ m_{M_i}\right)$$ - - - -Given both cost vectors for each $M_i$ in $M$, the network node can find the optimal model for obtaining the data object. - -The total cost for the network node is: - - -$$\min_{M_i \in M} \mathcal{U}\left(\vec{C}(M_i, D) + \vec{R}(M_i)\right)$$ - - - - - Here $\mathcal{U}$ is the utility function of the given network node; it reflects the trade-offs between the different types of costs specific to that node.
- - The models represented above could be any method used to describe data, for example a data compression algorithm such as LZ-76 compression $[12]$. Models could also reproduce a specific data object by leveraging other data objects on the network. For example, the data object that represents the string 'hello world' can be described by a model that appends the data object "world" to the data object "hello". - -Since a model is itself a data object that can be constructed from other data objects, $\vec{R}(M_i)$ is in turn the cost of retrieving the data objects needed to construct $M_i$ through the best available model $M_j$: - -$$\vec{R}(M_i) = \vec{C}(M_{j^*}, M_i) + \vec{R}(M_{j^*}), \qquad j^* = \arg\min_{M_j \in M} \mathcal{U}\left(\vec{C}(M_j, M_i) + \vec{R}(M_j)\right)$$ - - - -The MDL principle suggests that the key to optimal communication is the ability to calculate the cost vectors for the set $M$ and target $D$ **for each given network node**. The KolmoLD design offers network primitives that allow for sharing the information necessary to calculate the cost vectors. - -### Information-Centric Networks - -The network protocols $[2]$ that form the backbone of the Internet are based on the telephony network model: they enable point-to-point communication between two nodes by specifying the format for messages to be relayed from one node to another. As the Internet matured, the point-to-point communication pattern gave way to the dominance of other patterns, notably content delivery, where one provider is tasked with delivering the same content to multiple consumers. - -While the telephony network model has proven to be resilient and powerful in numerous contexts, its advantages are more limited when used for pure content distribution. These complications are comparable to those that would arise when trying to broadcast television or radio content over the telephone network. - -An alternative network communication paradigm, called the information-centric network (ICN), has been proposed as a viable alternative for content distribution $[14]$.
On a point-to-point network, a node that wants to obtain a specific data object must send a request to another node that is known to have that data object. In contrast, on an ICN, requests are defined not by the addressee but by the content (more specifically, the data object) they are looking for. Such content requests can then be served by any node that has the content. - -One way to illustrate the advantages of the ICN model is to consider the problem of caching JavaScript libraries. While each web application's JavaScript code is ultimately unique, it usually includes JavaScript libraries that provide elements and functionality commonly needed by web applications. As a result, multiple websites might include the same version of a library, and such libraries commonly comprise a sizeable portion of the overall JavaScript code. According to one analysis $[3]$, this practice is so common that up to 72% of websites are estimated to use at least one of these JavaScript libraries. Even more strikingly, 79% of those websites use the most common JavaScript library, jQuery. - -As a best practice, web developers prefer hosting all the JavaScript code and libraries they require themselves, rather than referencing shared URLs, to minimize security issues and avoid dependencies on third parties for their end-user experience. Consequently, when the end user navigates from one website to the next, the browser is forced to download the same copy of the JavaScript library anew. - -ICN's architecture is designed to gracefully handle situations such as these. Under ICN, a given JavaScript library can be referenced not by a URL, but by a content identifier. The copy can then be obtained from the closest node willing to serve it, and cached and reused on local devices. - -ICN has received considerable academic interest, resulting in multiple proposals and implementations.
Named Data Networking (NDN) $[13]$ is an example of a high-profile project $[2]$. - -### What's in a name? - -An ICN architecture proposal must specify numerous aspects of the problem: routing, data discovery, scaling $[14]$. One design decision is at the core of all these aspects and can be used as the basis for classifying ICN proposals: the naming scheme, that is, how the data objects in the network are referenced and identified. - -Using cryptohash functions as the basis for the naming scheme is one of the options. Cryptohash naming is a well-known design pattern also referred to as the content-addressable storage pattern. Some well-known software projects that can be viewed as ICN proposals, such as IPFS $[4]$ and BitTorrent $[5]$, adopt a hash-based naming scheme. - -On the one hand, this approach provides the following advantages: - - **(1) Universal naming convention**: the hash-based name is unambiguous, global, and self-contained; it does not depend on the context within which it is used; - - **(2) Does not depend on the circumstances of origination**: the hash-based name is defined only by the data itself, not by who produced or published the given data object, when, or for what purpose; - - **(3) Self-certifying name**: it is easy for the data receiver to verify whether the data object matches the name. - - -On the other hand, the drawbacks of this approach are as follows: - - **(1) Off-by-one errors**: while data reuse is often quoted as an advantage of the approach discussed, its impact has been deemed relatively limited, according to some quantitative studies. The reason is fundamental: two data objects that differ by only a single bit will be identified as two different data objects with two distinct hashes.
The network has no way to recognize, let alone leverage, the fact that two objects contain related information; - - **(2) No metadata**: the data by itself is virtually useless without information about the data type or data format used. - - -As illustrated above, KolmoLD also adopts the hash-based naming scheme. However, KolmoLD additionally provides the ability to encode arbitrary algorithms that are pertinent to the data object at hand. This means that similar data objects can be linked together with formulas describing how to construct one data object from a related one, and that the decoder for a given file format can be described in an algorithm data object and referenced within the manifest. - -Thus, KolmoLD's approach brings about the advantages of hash-based naming while mitigating its drawbacks. - -### The right codec -As the choice of data compression format (or codec) is key to efficient content delivery networking, we argue that it should be incorporated as a component of network design. - -Choosing the optimal codec goes beyond achieving the highest compression ratio $[12]$: application context and end-device capabilities also need to be considered. For example, low-power devices with limited processing power would struggle to run computationally expensive decoders. Other codec features that might influence the ultimate choice are domain specific, such as progressive decoding, which can display a lower-resolution version of an image to the end user once part of the data object has been downloaded; a higher-resolution version is then shown as the rest of the data is received. - -In general, the more assumptions that can be made about the data, the better the codec that can be designed. The new generation of compression algorithms provides tooling to leverage this.
For example, Zstandard allows you to "train" the codec by feeding it samples of the data you would like to compress, building up a special data structure, a dictionary, of frequently encountered tokens. - -Online multiplayer video games offer another illustration of the power of highly specialized codecs. The game architecture consists of a game server that keeps track of the state of the shared virtual environment. The information it needs to keep track of is specific to the given game: player navigational inputs and player inventory, for example. The game server sends this information to all participating player client applications. Both the client's and the server's program code are highly tuned to send only the absolute minimum of data required. The client application then uses it to render the video stream that constitutes the game environment. Thus, the video game program code can be thought of as a special type of video codec: one that only works for the video stream of that given video game. The narrower the type of data we target, the more efficient the compression we might achieve. - -Based on that, it would seem reasonable to expect a plethora of narrow-use, domain-specific data compression formats and codecs available in the marketplace: for example, video codecs tailored to specific domains, such as videos with solely e-learning content. - -Instead, software engineers are usually stuck with a few decades-old general-purpose data compression formats: an LZ-based compressor for structured data such as text and bytecode; JPEG or PNG for images; and so on. - -The reason is directly related to the effort required for new codecs to be adopted in the industry. Not only does the developer of a new codec have to design and implement one, she also needs the end-user agent to be able to recognize and use it.
That usually means that new software supporting the new codec must be deployed on the end-user system before an application can use it. - -KolmoLD takes an approach that sidesteps this challenge altogether: what if you could distribute the decoder algorithm along with the data itself? This would enable the content publisher to select the codec that works best for the given data type without worrying about whether the recipient supports it. The recipient would be able to download the decoding algorithm just like any other content and use it to decode the given data. - -In this scenario, running the decoder algorithm effectively amounts to executing untrusted program code, which raises security and performance concerns. The execution environment needs to be sandboxed from the host machine, with all the security issues carefully considered and addressed. - -Fortunately, there are developer communities who have gained experience addressing these challenges. The Ethereum Virtual Machine is an example of deterministic execution of Turing-complete programs that includes references to other code based on the hash naming scheme. Browser-based encapsulated execution of JavaScript code is another example. - -KolmoLD proposes a specification for such a sandboxed environment, in which Turing-complete algorithms are executed with access only to their input data objects and produce the target data object as output. - -Our practical implementation is built on top of WebAssembly $[7]$. This allows us to tap into the research and expertise accumulated in that community, which has addressed a similar problem. - -## DESIGN -At its core, KolmoLD provides primitives to describe the relations between the data objects shared and distributed across the network. The solution consists of the following components: - - **(1) KolmoLD manifest syntax** that allows network nodes to describe the data objects they handle, and their relations with other data objects.
- - **(2) WebAssembly-based kolmoblock format** $[7]$ to encode arbitrary programs that can be executed to retrieve data objects. - - **(3) A versioned registry** of approved and supported WebAssembly-based modules that contain codecs and algorithms. - - -Consider the data objects in the following "Hello World" example: - -![image](/img/hello_world.jpg) - -The "Hello world!" string can be constructed in two ways: by concatenating the strings "Hello " and "world!", or by concatenating the strings "He" and "llo world!". - -This basic data model consists of the following data objects: - - 1. kolmoblock "cat", which encodes the logic to concatenate input strings; - 2. strings "Hello world!", "He", "llo world!", "world!" and "Hello ". - ->This model can be described with the following manifest for the target string "Hello world!" with the hash "B5D40" (in our examples, we use only the first 5 digits of the hex representation of the SHA-256 hash): - ----- -``` - (match, - (doi, "B5D40"), - (exec - (doi, "cat", - type: "application/wasm+kolmold" - ), - (doi, - type: "text/plain", - data: "Hello " - ), - (doi, - type: "text/plain", - data: "world!" - ) - ), - (exec - (doi, "cat", - type: "application/wasm+kolmold" - ), - (doi, - type: "text/plain", - data: "He" - ), - (doi, - type: "text/plain", - data: "llo world!" - ) - ) - ) -``` ----- - - -The data object is a key abstraction of the KolmoLD methodology. It refers to any identifiable chunk of data of arbitrary size (a whole number of bytes). KolmoLD manifests describe the properties of data objects and the relations between them. - -The manifest syntax supports three types of expressions: doi, exec and match. All were used in the example above. - -More formally, in Extended Backus-Naur form (EBNF): - ----- -``` -kolmold_expr ::= { doi_expr | exec_expr | match_expr } -``` ----- - - -We will now review each in more detail.
- -### DOI expression -Data objects are identified by a Data Object Identifier (DOI). The doi expression references a specific data object. - -The doi expression syntax is defined as follows: - ----- -``` -doi_expr ::= doi [] - [ type: ] - [ size: ] - [ data: ] - [ publisher: ] - [ sign: ] - [ func-verify: kolmoblock_expr ] - [ func-sign: kolmoblock_expr ] -``` ----- - - -The first argument is the cryptohash-based ID. It is followed by optional attributes that provide additional metadata about the data object being described: - - -- **type**: MIME type of the data object, - -- **size**: size of the data object in bytes, - -- **data**: data literal of the data object, - -- **publisher**: public key used for the digital signature, - -- **sign**: the digital signature, - -- **func-verify**: verifier function used; expects a registered kolmoblock doi, - -- **func-sign**: digital signature verifier; expects a registered kolmoblock doi. - - -The data literal attribute is special: when a doi expression carries the data itself, many other attributes, such as the size or the hash ID, are redundant because they can be derived from the literal. - -In the "Hello World" example, two different MIME types are used. The concatenation kolmoblock **cat** has the MIME type `application/wasm+kolmold`, which indicates that it contains program code in the WebAssembly bytecode format. The dependency blocks that contain the strings to be concatenated are of type `text/plain`. - - -### Exec expression -The **exec** keyword introduces a formula expression, that is, it specifies a data object as the output of executing a kolmoblock's code in the sandbox. The **cat** kolmoblock illustrates how a data object in the KolmoLD framework can be described as the output of an algorithm encoded in a WebAssembly-based format.
- -The MIME type for the kolmoblock format is `application/wasm+kolmold`, which indicates that our implementation is based on the WebAssembly (wasm) module format. The format is a subset of the WebAssembly module format with the non-deterministic operations removed. - -In addition, the kolmoblock format specifies the following: - -**(1) Entry point function**: the wasm module's exported function **main** is the entry point for running the kolmoblock bytecode. The **main** function is expected to take no parameters and return an `i32` (type `-> i32`). - -**(2) Determinism**: KolmoLD expressions are deterministic and reproducible, but version 1.0 of WebAssembly is not fully deterministic. To address this disparity, KolmoLD specifies the behaviour that WebAssembly leaves non-deterministic. - -**(3) Input data object loaders**: exec expressions can designate other data objects as inputs (dependencies) of the execution at hand. The kolmoblock runtime is responsible for preloading the input data objects into the WebAssembly module's linear memory before the module's code runs, and for telling the WebAssembly code where each input data object begins and ends in linear memory. - -The kolmoblock WebAssembly module's linear memory with the dependency blocks preloaded can be illustrated as follows: - - -![image](/img/wasm_kolmold.jpg) -*Figure 1: The dependency data objects are preloaded into the kolmoblock's linear memory* - - -The dependency data objects are identified by their index number, starting with 0.
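The input data object loader can be pictured with a small simulation. This is a hedged Python sketch, not the kolmoblock runtime's actual memory layout: a `bytearray` plays the role of WebAssembly linear memory, dependency objects are packed back to back from offset 0, and the `(offset, length)` table handed to the module is a hypothetical convention. The text only states that the runtime preloads dependencies and tells the module where each one lies, indexed from 0. The `mem_pages` parameter stands in for a memory limit expressed in 64 KiB pages.

```python
from typing import List, Tuple

PAGE_SIZE = 64 * 1024  # WebAssembly linear memory is sized in 64 KiB pages

def preload(deps: List[bytes], mem_pages: int) -> Tuple[bytearray, List[Tuple[int, int]]]:
    """Copy dependency objects back to back into a simulated linear memory.

    Returns the memory and an (offset, length) table whose entry i
    describes dependency i, mirroring the 0-based dependency indexing.
    """
    memory = bytearray(mem_pages * PAGE_SIZE)
    table: List[Tuple[int, int]] = []
    offset = 0
    for dep in deps:
        end = offset + len(dep)
        if end > len(memory):
            raise MemoryError("dependency objects do not fit in the allotted pages")
        memory[offset:end] = dep
        table.append((offset, len(dep)))
        offset = end
    return memory, table

memory, table = preload([b"Hello ", b"world!"], mem_pages=1)
offset, length = table[1]
print(bytes(memory[offset:offset + length]))  # b'world!'
```

Handing the module a table of boundaries, rather than fixed addresses, keeps the kolmoblock code independent of how many inputs it receives or how large they are.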
The exec expression syntax is as follows: - ----- -``` - - -exec_expr ::= "exec" wasm-module-cid - [ op_cap: varint ] - [ mem_size: varint ] - [ dep-exprs ] -dep-exprs ::= doi_expr [ dep-exprs ] - -``` ----- - - - **wasm-module-cid**: the cid of the executable kolmoblock wasm module, - - - **op-cap**: operation limit, as the number of WebAssembly instructions evaluated, - - - **mem-size**: WebAssembly memory limit, as a number of 64 KiB pages, - - - **dep-exprs**: ordered list of dependency data objects. - - - -### Match expression - -A key feature of KolmoLD is the ability to express that the same content can be generated in multiple ways. **match** expressions state that multiple descriptions of a data object, whether exec- or doi-based, produce the same data object. - -Match expressions are similar to pattern matching expressions in functional programming languages such as Haskell. - -The match expression syntax is as follows: - ----- -``` - -match_expr ::= "match" exprs -expr ::= doi_expr | exec_expr -exprs ::= expr [ exprs ] - -``` ------ - - -Suppose the same data object is referenced by two different **doi**s obtained with different hash functions: **'five spaces'** for the **hash_func1** function and **'ffee45'** for **hash_func2**. This can be stated with the following expression: - ----- -``` - -(match, - (doi, 'five spaces', func-verify: hash_func1), - (doi, 'ffee45', func-verify: hash_func2) -) - -``` ----- - -Now suppose we need to express that the data object with **doi** '1' can be obtained by executing the kolmoblock with **doi** 'fibonacci'. - -This can be done with the following match expression: - ----- -``` - -(match, - (doi, '1'), - (exec - (doi, 'fibonacci') - ) -) -``` ----- - - -With the basic building blocks of **match**, **doi** and **exec** expressions, the manifest syntax provides primitives to describe the underlying structure of the data shared across the network.
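One way a node might act on a **match** expression is to treat its alternatives as interchangeable recipes and use the first one it can satisfy locally. The sketch below is an assumption about client behaviour, not part of the manifest specification. `Doi`, `Exec`, `resolve`, `resolve_match`, the dictionary used as a local store, and the Python `fibonacci` stand-in for the kolmoblock are all hypothetical. Dependencies are restricted to doi expressions, following the `dep-exprs` rule in the exec grammar.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union

@dataclass
class Doi:
    cid: str  # content identifier of a stored data object

@dataclass
class Exec:
    func: Callable[..., bytes]  # Python stand-in for a wasm kolmoblock
    deps: List["Doi"] = field(default_factory=list)

Expr = Union[Doi, Exec]

def resolve(expr: Expr, store: Dict[str, bytes]) -> Optional[bytes]:
    """Produce an object's bytes from local data, or return None if impossible."""
    if isinstance(expr, Doi):
        return store.get(expr.cid)
    inputs = [resolve(dep, store) for dep in expr.deps]
    if any(item is None for item in inputs):
        return None
    return expr.func(*inputs)

def resolve_match(alternatives: List[Expr], store: Dict[str, bytes]) -> Optional[bytes]:
    """Treat match alternatives as interchangeable; use the first that resolves."""
    for alt in alternatives:
        result = resolve(alt, store)
        if result is not None:
            return result
    return None

def fibonacci() -> bytes:
    """Hypothetical stand-in for the 'fibonacci' kolmoblock (10th Fibonacci number)."""
    a, b = 0, 1
    for _ in range(10):
        a, b = b, a + b
    return str(a).encode()

manifest = [Doi("1"), Exec(fibonacci)]
print(resolve_match(manifest, store={}))            # b'55': computed via exec
print(resolve_match(manifest, store={"1": b"55"}))  # b'55': served from the store
```

A real client would likely rank alternatives by estimated cost rather than take the first one that resolves; first-match keeps the sketch short.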
- - -## CONCLUSION -KolmoLD offers new primitives for low-level content distribution networking. Network nodes can use them to share and distribute information about the character and underlying structure of the data objects that make up the shared content. - -This enables network nodes to make better decisions about how content should be retrieved, taking into account the context of each given network node, which leads to MDL-based network communication. - diff --git a/content/documentation/learning_resources.md b/content/documentation/learning_resources.md index 2e230f9..33ed0b7 100644 --- a/content/documentation/learning_resources.md +++ b/content/documentation/learning_resources.md @@ -4,7 +4,7 @@ fullTitle: "Learning resources" name: "Dmitry Borzov" date: 2018-07-15T18:02:42-04:00 external_link: "" -weight: +weight: 1 draft: false --- diff --git a/content/documentation/manifest.md b/content/documentation/manifest.md index 3d6e6d8..b6f74c7 100644 --- a/content/documentation/manifest.md +++ b/content/documentation/manifest.md @@ -4,6 +4,7 @@ fullTitle: "Manifest" name: "Mikhail Ignatovich" date: 2018-07-19 external_link: "" +weight: 2 draft: false --- diff --git a/content/documentation/whitepaper.md b/content/documentation/whitepaper.md deleted file mode 100644 index 778573b..0000000 --- a/content/documentation/whitepaper.md +++ /dev/null @@ -1,290 +0,0 @@ ---- -title: "whitepaper" -fullTitle: "Kolmoblocks: composability for content distribution based on Merkle DAGs" -name: "Dmitry Borzov" -date: 2018-07-15T18:02:42-04:00 -external_link: "" -weight: -draft: false ---- - -Kolmogorov data blocks (or kolmoblocks) is a data block serialization format for content-addressable network protocols based on a cryptohash naming scheme. Kolmoblocks can be thought of as scripts for a Turing-complete DSL that take other Merkle data blocks as input and output the target data block.
Its design ensures block composability and deterministic reproducibility, and eliminates the problem of serialization format versioning hell. - - -## Table Of Contents - -- [Introduction](#introduction) -- [Background](#background) - - [Content Addressable Network Protocols](#content-addressable-network-protocols) - - [Block Composability](#block-composability) - - [The Great Divide](#the-great-divide) -- [Design](#design) - - [Blocks as Code](#blocks-as-code) - - [Kolmoblock Language](#kolmoblock-language) - - [Performance vs Expressiveness](#performance-vs-expressiveness) - - [Lambdablocks](#lambdablocks) - - [Gas](#gas) - - [Header](#header) - - [Clients and Servers](#clients-and-servers) -- [Applications](#applications) - - [Origin Agnostic CDNs](#origin-agnostic-cdns) - -## Introduction - -Content-addressable network protocols based on a hash naming scheme (hash-based CANPs), such as IPFS [[1](https://github.com/ipfs/specs)] or dat [[2](https://www.datprotocol.com)], provide unique advantages over traditional host-addressable network protocols. The nodes of such networks can request and retrieve content by its cryptographically secure hash from any other peer node. -However, current proposals for hash-based CANPs come with challenges that offset their promise: - -1. **no composability:** let's say the target blob is a slight iteration over an already distributed one: it could differ by just a single bit flip. A content-addressable network based on a hash naming scheme would treat such a block as a completely new one; there is no way to signal that a target block can be composed out of other data blocks. -2. **format versioning hell:** the context of the block's origination (such as which producer software version was used, the assumptions about the target consumer software, and so on) determines the block's serialization and compression format.
Yet the channels for distributing both producer and consumer software usually differ from the channels used to distribute and share the data itself. As an unintended effect, data blocks become non-reusable and non-deserializable outside of their immediate use. - -Kolmoblocks (Kolmogorov data blocks) is a block serialization format that addresses these issues. By design, kolmoblocks enable free-form block composability and self-documentation of the block's format. -Kolmogorov blocks are programming scripts for a special DSL (referred to as the kolmoblock language) that can take other kolmoblocks as input and output (or "render") the target block. The kolmoblock language interpreter specification makes kolmoblocks immutable, reproducible and Turing-complete. - -## Background - -This section outlines the context behind the problem and the value proposition of the kolmoblock design. It also defines the terminology used in the article. - -### Content Addressable Network Protocols - -The network protocols at the backbone of today's Internet (IP, TCP, HTTP) are based on the telephony model: they enable point-to-point communication across the network. However, the overwhelming share of today's network bandwidth is dedicated to what can be called content distribution: consumers retrieving identifiable blocks of data (blobs) such as software, text or multimedia. - -Just as the telephone network is a poor vehicle for traditional broadcast content distribution technologies such as cable television, the host-addressable nature of the current protocol stack leads to network inefficiencies. - -Consider, for example, several people in a room, each using their own device to watch the same video. Each device has to request and download its own copy of the video from the provider's data center, as there is no way for the local devices to coordinate and share the retrieval of the content.
- -Another example is a user of a cloud file storage service such as Dropbox trying to access, from her phone, a document she has on her laptop. The requested data will most likely make a round trip to a cloud provider's server on the other side of the globe, even though the source of the data could be less than a meter away from its destination. - -Content-addressable network protocols (CANPs) address inefficiencies like these by letting network nodes request content by what it is, not by who hosts it. Since content consumers only care about what they get, not whom they get it from, CANPs provide natural advantages for content distribution. - -Different proposals exist for the content naming scheme to be used for CANPs. The naming scheme in which the cryptographic hash of data identifies the content (hash-based CANPs) is an elegant solution that provides some unique advantages over the other proposals: - -1. **self-certifying:** since the content's authenticity can be checked, consumers don't need to extend their trust to the nodes they interact with in order to retrieve content from them. -2. **data reuse:** since a hash-based name includes nothing specific to the data's publisher, hash-based naming leads to data reuse by design. For example, common datasets such as the Human Genome Project datasets or the UTF-8 codepoint mapping table may be shared, published and consumed by multiple applications and agents that may not even be aware of each other's existence. - -Hash-based CANPs are a perfect fit for content distribution. - -### Block Composability - -Let's look at an example of using hash-based CANPs to distribute a computer operating system (OS) distro (such as Ubuntu Linux). The OS developer publishes the hash of its most recent release so that other network nodes may retrieve it, based on this hash, from any network peer willing to serve it to them.
- -Now suppose that one day a security issue is found in the OS distribution and the OS developer issues a security update. They make the necessary fix, distribute the new version and update their published hash of the most recent OS release. - -Even if the updated distro differs from the previous version by a single bit flip, its cryptographic hash is completely different, so the network treats the new data block as completely new data. - -As a result, a consumer that already has the old version of the OS distribution, and could reproduce the new version by applying a small diff to it, is forced to download the new version all over again. - -To help with situations like these, the network protocol needs a mechanism to tell nodes that a data block can be composed from previously distributed blocks. - -In our example, the network peers can be advised that there are two ways to obtain the new distro's data block: either download the whole block, or download only the diff (which can be orders of magnitude smaller) and apply it to the data block of the old version of the distribution. - -With such a feature, consumers that already have the old version would probably opt to download and apply only the diff, while consumers that don't have it (because they joined the network later, for example) would still be able to download the new version as one data block. - -The ability to convey that data blocks can be composed out of other data blocks will be referred to as block composability. - -### The Great Divide - -There is a great divide in today's computing: the data and the software to deserialize and interpret it are distributed along two very different channels.
- -Data only has value if the consumer has access to software that supports the serialization format used for it. However, installing software is no small matter, as it means users have to trust both the original software developers and the software distribution system they rely upon. - -Data and the specification of the serialization format it relies upon need to be distributed along the same channel. - -## Design - -This section provides an architectural overview of kolmoblocks. - -### Blocks as Code - -Let's take the first 5 hex digits of the SHA-256 of data blocks as our illustrative cryptographic hash function. For example, "ABABABF", an ASCII-serialized string, can be identified by its hash **CC646**. - -Nodes of a hash-based CANP network can request this data block from their peers by its hash, and a peer node sends the requested data in response. - -Let's consider the scenario where the serving node sends the requested block to the consumer not in raw byte form, but as the following script in a dynamic programming language (say, Python): - -```python -render = lambda: "ABABABF" -``` - -The consumer would still be able to retrieve the requested block sent in this format by: - -1. running this block in an interpreter and saving the output into a file -2. validating that the output is indeed the requested data block by checking its hash - -Note that sending the data as a program script only marginally changes the total size of the data sent, yet it provides unique capabilities. For example, this script outputs the same target block by exploiting its repetitive structure; for longer repetitive data, such a script can be much smaller than the raw block: - -```python -render = lambda: "AB" * 3 + "F" -``` - -Sending data as programs that output the target block is the key idea behind kolmoblocks. Just as in the contrived example above, consumers run kolmoblocks in a dedicated sandboxed environment to retrieve the target block and validate its authenticity by comparing the hash.
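The render-then-verify step described above can be sketched in Python. As in the text, plain Python callables stand in for kolmoblocks; a real renderer would run them in a sandbox, which this sketch does not attempt. `render_and_verify` is a hypothetical helper name, and the full SHA-256 hex digest is used instead of the 5-digit shorthand.

```python
import hashlib
from typing import Callable

def render_and_verify(render: Callable[[], str], target_hash: str) -> bytes:
    """Run a kolmoblock-like script, then accept its output only if the hash matches."""
    output = render().encode("ascii")
    if hashlib.sha256(output).hexdigest() != target_hash:
        raise ValueError("rendered output does not match the requested hash")
    return output

# The consumer knows only the hash of the block it asked for.
target = hashlib.sha256(b"ABABABF").hexdigest()

literal = lambda: "ABABABF"       # the first script
compact = lambda: "AB" * 3 + "F"  # the second, repetition-based script

# Different programs, same target block: both pass the hash check.
assert render_and_verify(literal, target) == render_and_verify(compact, target)
```

The check depends only on the requested hash, so the consumer never has to trust the node that supplied the script.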
- -Now consider distributing the block **F3025**: "ABABABF-FBABABA". The following script takes the block **CC646** as a dependency block and outputs the target block **F3025**: - -```python -# kolmoblock -# target block: F3025 -# dependency blocks: CC646 -def render(): -    # dep(hash) returns the rendered dependency block with that hash -    return dep('CC646') + "-" + dep('CC646')[::-1] -``` - -Kolmoblocks enable block composability: a kolmoblock can refer to and access other data blocks in its code. Such blocks are referred to as block dependencies, and they need to be rendered and evaluated in order for the target block to be rendered. - -### Kolmoblock Language - -Kolmoblocks cannot be written for an existing general-purpose programming language interpreter: to ensure security and reproducibility, the kolmoblock interpreter needs to run in a sandboxed execution environment. Such a domain-specific interpreter and the programming language it relies upon are referred to as **kolmolang** (a Kolmogorov block domain-specific language). -Kolmolang's domain of use defines the features it needs to have: - -1. **homoiconic macros:** in the kolmoblock language, code is just a data structure: the abstract syntax tree (AST) of the interpreter is the data structure itself -2. **deterministic virtual state machine:** kolmoblocks enable deterministic, reproducible rendering across platforms and nodes. The kolmoblock language is interpreted by a domain-specific virtual machine, defined by its set of recognized opcodes - -Kolmolang is fundamentally a Turing-complete dynamic programming language of the Lisp family [[4](https://arxiv.org/abs/1608.02621)], which lets it meet these requirements. - -### Performance vs Expressiveness - -Among other things, the design decisions behind kolmoblocks were informed by lessons from other domain-specific virtual machine environments: JavaScript, the JVM, WebAssembly and the Ethereum Virtual Machine.
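The homoiconicity requirement, where programs are themselves plain data, can be illustrated with a toy evaluator in Python in which a program is a nested list. This is not kolmolang. The operators `dep`, `cat` and `rev`, and the dictionary used as a block store, are invented for this sketch. It renders the **F3025** block from its dependency **CC646**, as in the earlier Python example.

```python
from typing import Dict, List, Union

Expr = Union[str, List["Expr"]]  # a literal string or a list [operator, *args]

def evaluate(expr: Expr, blocks: Dict[str, str]) -> str:
    """Evaluate a nested-list program; bare strings are literals."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    if op == "dep":  # ["dep", hash]: look up an already rendered dependency block
        return blocks[args[0]]
    values = [evaluate(arg, blocks) for arg in args]
    if op == "cat":  # ["cat", a, b, ...]: concatenate
        return "".join(values)
    if op == "rev":  # ["rev", a]: reverse
        return values[0][::-1]
    raise ValueError(f"unknown operator: {op!r}")

# The program is ordinary data, so it can be serialized, hashed and shipped like any block.
program = ["cat", ["dep", "CC646"], "-", ["rev", ["dep", "CC646"]]]
print(evaluate(program, {"CC646": "ABABABF"}))  # ABABABF-FBABABA
```

Because the program is a data structure, its hash can serve as its identifier in exactly the same way as any other data block's.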
- -### Lambdablocks - -So far, we have introduced kolmolang-based programs in the context of kolmoblocks, whose purpose is to encode target blocks. Rendered (target) data blocks can contain general-purpose kolmolang code as well. - -Let's call data blocks that contain kolmolang code lambdablocks, to highlight their functional programming properties. Just like their namesake, lambda functions, lambdablocks can be considered anonymous in the sense that their identifier (the cryptohash of the code) is based on what the code does, that is, on its essence, not on the context it was created for. -The following lambdablock contains a function that returns the n-th Fibonacci number: - -```python -# lambdablock 509ED -def f(n): -    if n == 0: return 0 -    elif n == 1: return 1 -    else: return f(n-1) + f(n-2) -``` - -Lambdablock code can be imported and executed within kolmoblocks. This lets us separate and reuse decoding code. -As an example, consider the following kolmoblock, in which the data is compressed with Huffman encoding: - -```python -# kolmoblock -# target block: F3025 -# dependency blocks: 77650 -def render(): -    # load the decoder function f from lambdablock 77650 -    namespace = {} -    exec(dep('77650'), namespace) -    huffman_decode = namespace['f'] -    # bit 0 selects the left branch, bit 1 the right branch -    huffman_tree = ['AB', -                    ['BA', -                     ['F', '-'] -                    ] -                   ] -    # a bit string keeps the leading zeros that an int literal would drop -    encoded = '000110111110101010' -    return huffman_decode(huffman_tree, encoded) -``` - -This kolmoblock imports and calls the lambdablock that contains the decoding logic: - -```python -# lambdablock 77650 -def f(huffman_tree, encoded_string): -    output = '' -    cur = 0 -    while cur < len(encoded_string): -        node = huffman_tree -        # walk down the tree until a leaf (a string) is reached -        while isinstance(node, list): -            bit = encoded_string[cur] -            cur += 1 -            node = node[1] if bit == '1' else node[0] -        output += node -    return output -``` - -### Gas - -The Turing completeness of the kolmoblock language raises the halting problem: a malicious kolmoblock may overload the block renderer by making it run in an infinite loop.
- -To address this issue, the kolmoblock's header is required to specify the block's "gas" value: a metric of the computational resources required to render the block. The block renderer discards and ignores the kolmoblock if rendering it is found to exceed its advertised gas limit. - -### Header - -A kolmoblock's header contains the metadata consumers need to make an informed decision on whether to retrieve and render it: - -1. **hash function lambdablock** specifying the cryptographic hash function used to calculate the data block's target hash -2. **hash of the target data block:** the hash of the target data block that the kolmoblock outputs when rendered. Note that multiple kolmoblocks can have the same target data block. -3. **size** of the kolmoblock -4. **list of hashes of dependency data blocks:** the hashes of the data blocks that are needed to render the kolmoblock -5. **gas:** an advertised metric of the computational resources needed to render the block -6. **cryptographic signature** of the block producer - -When a kolmoblock-based CANP node expresses interest in a data block, its peers return only the headers of the kolmoblocks that have that data block as their target. The requesting node can then use the header metadata to decide which kolmoblock to pursue, using an algorithm such as the following: - -```python -def evaluate_rendering_cost_of(kolmoblock): -    # is_rendered() and is_fetched() are assumed local-cache lookups -    total_gas = 0 -    total_bandwidth = 0 -    if is_rendered(kolmoblock): -        return total_gas, total_bandwidth -    total_gas += kolmoblock.gas -    if not is_fetched(kolmoblock): -        total_bandwidth += kolmoblock.size -    for dep in kolmoblock.dependencies: -        # recurse into each dependency, not into the block itself -        dep_gas, dep_band = evaluate_rendering_cost_of(dep) -        total_gas += dep_gas -        total_bandwidth += dep_band -    return total_gas, total_bandwidth -``` - -A kolmoblock header can contain invalid information, whether due to malicious intent or an error.
It is in the best interest of kolmoblock servers and consumers to **validate** all kolmoblocks they handle and to discard and ignore invalid ones. -Validating a kolmoblock includes: - -1. running the block and checking that the output has the advertised target hash -2. making sure that the advertised gas limit is not exceeded - -### Clients and Servers - -While specifying what kolmoblock-based CANPs might look like is outside the scope of this paper, it is illustrative to think through some of the practices that networks using kolmoblocks might adopt. -Let's say a kolmoblock-based CANP client node is interested in fetching a data block. It announces its interest to the other nodes and gets a list of kolmoblocks that have the data block of interest as their target block. The client can estimate the cost of obtaining and rendering each of the available kolmoblocks (an example algorithm for that is outlined in Fig. 1) and then fetch and render the kolmoblock that takes the fewest resources to retrieve and render. -With cryptohash-addressable naming, the same data can be shared as different blocks when it is partitioned into blocks differently. Consider the example illustrated in the following figure: the text of a novel is distributed both as one block, **OC71D**, and as 3 separate blocks, one for each book (**CA590, DDAA0, A099B**). - -![Fig.2](/img/Kolmoblocks/fig2.PNG) - -This leads to redundancy and data duplication. A kolmoblock-based server may address this through **compactification:** composing new, more efficient kolmoblocks that are still able to reproduce all the target blocks. The kolmoblock shown in the figure makes it unnecessary to store **OC71D** in raw form. - -## Applications - -In this section, we go through potential applications of kolmoblocks. - -### Origin Agnostic CDNs - -The adoption of technologies whose value is based on network effects has an inherent chicken-and-egg problem.
Early adopters tend to see little value in such a technology, since few other network nodes support it. For technologies of this nature to be successful, it is important to identify an application in which adopters get value even when other agents have adopted the technology only to a limited extent. For kolmoblocks, origin-agnostic CDNs can be just that. - -Both content providers and ISPs are incentivized to have CDN arrangements. ISPs would like to cache as much content as they can within their network to lower their transit costs. Content providers, on their end, want to lower their hosting costs, and both parties are interested in lower latency and a better experience for customers. - -ISPs can't just start caching all the traffic on their side: - -1. TLS-based encryption is widely adopted as the best practice to certify the content's origin and authenticity to the customer, so ISPs only see encrypted traffic; -2. it is hard to solve the cache expiration problem in general, without being able to make any assumptions about the content provider. - -As a result, ISPs are only able to cache content from the providers they have custom arrangements with (such as a contractual relationship), either directly or via a third party, such as a CDN company. Setting up and managing an arrangement like this usually comes with overhead costs (legal, engineering, and managing relationships with intermediaries). - -If we take the total number of Internet Autonomous Systems (in the tens of thousands, spread across jurisdictions) as a proxy for the number of ISPs worldwide, and use Alexa's list to estimate the number of content producers for which caching would bring a considerable advantage (500), we see that solving this with the current approach would require maintaining on the order of 10,000,000 contractual relationships. This approach clearly does not scale. Let's see how a kolmoblock-based CDN would be able to address this situation.
- -As an example, consider a content producer (let's call them **example-video-site.com**) setting up the following services: - -1. a kolmoblock-based CANP server that serves the hosted content to other kolmoblock-based CANP peers -2. an HTTPS gateway to their content, where the hosted data blocks are identified by their hash-based IDs as part of the URL. For example, would serve the **CC646** data block - -This setup can be announced to the network with DNS records like these: - -```sh -_kolmoblocks._tcp.example-video-site.com. IN URI 10 1 "https://kolmoblocks.example-video-site.com/" -_kolmoblocks._udp.example-video-site.com. IN URI 10 1 "kolmoblocks://" -``` - -The content provider's dynamic site may refer to kolmoblocks via its HTTPS gateway link, so ISPs and customers see it as just another HTTP endpoint. - -However, a CDN server controlled by the ISP would be able to identify that this content provider supports kolmoblocks. It could overwrite the DNS record for the content provider's kolmoblock gateway to point at its own kolmoblock gateway, retrieve the requested kolmoblocks from the content provider using a kolmoblock-based CANP, and serve the cached content to the user. - -Kolmoblocks let parties cache content without having to maintain a direct relationship with each other. diff --git a/layouts/documentation/list.html b/layouts/documentation/list.html index c708070..fdfb569 100644 --- a/layouts/documentation/list.html +++ b/layouts/documentation/list.html @@ -1,7 +1,7 @@ {{ partial "header.html" . }} -
-
+
+
{{ partial "nav-single.html" . }}
diff --git a/layouts/documentation/single.html b/layouts/documentation/single.html index 57a24ad..4b4651c 100644 --- a/layouts/documentation/single.html +++ b/layouts/documentation/single.html @@ -1,7 +1,7 @@ {{ partial "header.html" . }} -
+
{{ partial "nav-single.html" . }}
diff --git a/layouts/index.html b/layouts/index.html index e6407d1..803bbaf 100644 --- a/layouts/index.html +++ b/layouts/index.html @@ -1,3 +1,4 @@ + {{ partial "header.html" . }}