Commit 0b1c0c8

feat(file domain): add support for reading arbitrary files as strings (#726)

* feat(file domain): add support for reading arbitrary files as strings

* support specifying file parser

* adding some words about the file string format
mildwonkey authored Oct 11, 2024
1 parent 5fd0f32 commit 0b1c0c8
Showing 10 changed files with 345 additions and 86 deletions.
116 changes: 94 additions & 22 deletions docs/reference/domains/file-domain.md
@@ -11,33 +11,41 @@ domain:
filepaths:
- name: config
path: grafana.ini
parser: ini # optionally specify which parser to use for the file type
```
## Supported File Types
The file domain uses OPA's [conftest](https://conftest.dev) to parse files into a json-compatible format for validations. Both OPA and Kyverno (using [kyverno-json](https://kyverno.github.io/kyverno-json/latest/)) can validate files parsed by the file domain.
The file domain supports the following file formats for validation:
* CUE
* CycloneDX
* Dockerfile
* EDN
* Environment files (.env)
* HCL and HCL2
* HOCON
* Ignore files (.gitignore, .dockerignore)
* INI
* JSON
* Jsonnet
* Property files (.properties)
* SPDX
* TextProto (Protocol Buffers)
* TOML
* VCL
* XML
* YAML
The file domain uses OPA's [conftest](https://conftest.dev) to parse files into a json-compatible format for validations. Both OPA and Kyverno (using [kyverno-json](https://kyverno.github.io/kyverno-json/latest/)) can validate files parsed by the file domain.
The file domain includes the following file parsers:
* cue
* cyclonedx
* dockerfile
* dotenv
* edn
* hcl1
* hcl2
* hocon
* ignore
* ini
* json
* jsonc
* jsonnet
* properties
* spdx
* string
* textproto
* toml
* vcl
* xml
* yaml
The file domain can also parse arbitrary file types as strings. The entire file contents will be represented as a single string.
The file parser can usually be inferred from the file extension. However, if the file extension does not match the file type you are parsing (for example, if you have a JSON file that does not have a `.json` extension), or if you wish to parse an arbitrary file type as a string, use the `parser` field in the FileSpec to specify which parser to use. The list above contains all of the available parsers.
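For instance, a file-spec along the following lines would force the `json` parser for a file whose extension doesn't reveal its format. This is only a minimal sketch: the `appconfig` name and `settings.conf` path are hypothetical, not part of any real example.
```yaml
domain:
  type: file
  file-spec:
    filepaths:
      - name: appconfig
        path: settings.conf # JSON content despite the non-.json extension
        parser: json
```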

## Validations
When writing validations against files, the filepath `Name` must be included as
When writing validations against files, the filepath `name` must be included as
the top-level key in the validation. The placement varies between providers.
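As a minimal sketch (assuming a filepath named `config`, as in the spec at the top of this page), an OPA policy reaches the parsed file through that user-supplied name as the top-level key of `input`:
```yaml
provider:
  type: opa
  opa-spec:
    rego: |
      package validate
      import rego.v1

      # "config" is the filepath name from the file-spec; the parsed
      # file contents are nested beneath that top-level key
      validate if {
        input["config"] != null
      }
```
The full worked example below shows the same idea against a concrete ini file.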

Given the following ini file:
@@ -124,5 +132,69 @@ provider:
- validate.msg
```

### Parsing files as arbitrary strings
Files that are parsed as strings are represented as a key-value pair where the key is the user-supplied file `name` and the value is a string representation of the file contents, including special characters such as newlines (`\n`).

As an example, let's parse a file similar to the earlier one, this time as an arbitrary string.

When reading the following multiline file contents as a string:
```server.txt
server = https
port = 3000
```

The resources for validation will be formatted as a single string with newline characters:

```
{"config": "server = https\nport = 3000"}
```

And the following validation will confirm whether the server is configured for https:
```validation.yaml
domain:
type: file
file-spec:
filepaths:
- name: 'config'
path: 'server.txt'
parser: string
provider:
type: opa
opa-spec:
rego: |
package validate
import rego.v1
# Default values
default validate := false
default msg := "Not evaluated"
validate if {
check_server_protocol.result
}
msg = check_server_protocol.msg
config := input["config"]
check_server_protocol = {"result": true, "msg": msg} if {
regex.match(
`server = https\n`,
config
)
msg := "Server protocol is set to https"
} else = {"result": false, "msg": msg} if {
regex.match(
`server = http\n`,
config
)
msg := "Server Protocol must be https - http is disallowed"
}

output:
validation: validate.validate
observations:
- validate.msg
```
## Note on Compose
While the file domain is capable of referencing relative file paths in the `file-spec`, Lula does not de-reference those paths during composition. If you are composing multiple files together, you must either use absolute filepaths (including network filepaths), or ensure that all referenced filepaths are relative to the output directory of the compose command.
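For instance, a file-spec intended for composition could reference a network path so the file still resolves after composition; the URL below is purely illustrative.
```yaml
domain:
  type: file
  file-spec:
    filepaths:
      - name: config
        path: https://example.com/configs/grafana.ini
```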
26 changes: 26 additions & 0 deletions src/pkg/common/schemas/validation.json
@@ -381,6 +381,32 @@
},
"path": {
"type": "string"
},
"parser": {
"type": "string",
"enum": [
"cue",
"cyclonedx",
"dockerfile",
"edn",
"hcl1",
"hcl2",
"hocon",
"ignore",
"ini",
"json",
"jsonc",
"jsonnet",
"properties",
"spdx",
"textproto",
"toml",
"vcl",
"xml",
"yaml",
"dotenv",
"string"
]
}
}
}
115 changes: 96 additions & 19 deletions src/pkg/domains/files/files.go
@@ -37,35 +37,36 @@ func (d Domain) GetResources(ctx context.Context) (types.DomainResources, error)
defer os.RemoveAll(dst)

// make a map of rel filepaths to the user-supplied name, so we can re-key the DomainResources later on.
filenames := make(map[string]string, len(d.Spec.Filepaths))
filenames := make(map[string]string, 0)

// unstructuredFiles is used to store a list of files that Lula needs to parse.
unstructuredFiles := make([]FileInfo, 0)
filesWithParsers := make(map[string][]FileInfo, 0)

// Copy files to a temporary location
for _, path := range d.Spec.Filepaths {
file := filepath.Join(workDir, path.Path)
bytes, err := network.Fetch(file)
if err != nil {
return nil, fmt.Errorf("error getting source files: %w", err)
for _, fi := range d.Spec.Filepaths {
if fi.Parser != "" {
if fi.Parser == "string" {
unstructuredFiles = append(unstructuredFiles, fi)
continue
} else {
filesWithParsers[fi.Parser] = append(filesWithParsers[fi.Parser], fi)
continue
}
}

// We'll just use the filename when writing the file so it's easier to reference later
relname := filepath.Base(path.Path)

err = os.WriteFile(filepath.Join(dst, relname), bytes, 0666)
file := filepath.Join(workDir, fi.Path)
relname, err := copyFile(dst, file)
if err != nil {
return nil, fmt.Errorf("error writing local files: %w", err)
}

// and save this info for later
filenames[relname] = path.Name
filenames[relname] = fi.Name
}

// get a list of all the files we just downloaded in the temporary directory
files := make([]string, 0)
err = filepath.WalkDir(dst, func(path string, d fs.DirEntry, err error) error {
if !d.IsDir() {
files = append(files, path)
}
return nil
})
files, err := listFiles(dst)
if err != nil {
return nil, fmt.Errorf("error walking downloaded file tree: %w", err)
}
@@ -79,14 +80,66 @@ func (d Domain) GetResources(ctx context.Context) (types.DomainResources, error)

// clean up the resources so it's using the filepath.Name as the map key,
// instead of the file path.
drs := make(types.DomainResources, len(config))
drs := make(types.DomainResources, len(config)+len(unstructuredFiles)+len(filesWithParsers))
for k, v := range config {
rel, err := filepath.Rel(dst, k)
if err != nil {
return nil, fmt.Errorf("error determining relative file path: %w", err)
}
drs[filenames[rel]] = v
}

// Now for the custom parsing: user-specified parsers and string files.

for parserName, filesByParser := range filesWithParsers {
// make a sub directory by parser name
parserDir, err := os.MkdirTemp(dst, parserName)
if err != nil {
return nil, err
}

for _, fi := range filesByParser {
file := filepath.Join(workDir, fi.Path)
relname, err := copyFile(parserDir, file)
if err != nil {
return nil, fmt.Errorf("error writing local files: %w", err)
}

// and save this info for later
filenames[relname] = fi.Name
}

// get a list of all the files we just downloaded in the temporary directory
files, err := listFiles(parserDir)
if err != nil {
return nil, fmt.Errorf("error walking downloaded file tree: %w", err)
}

parsedConfig, err := parser.ParseConfigurationsAs(files, parserName)
if err != nil {
return nil, err
}

for k, v := range parsedConfig {
rel, err := filepath.Rel(parserDir, k)
if err != nil {
return nil, fmt.Errorf("error determining relative file path: %w", err)
}
drs[filenames[rel]] = v
}
}

// add the string form of the unstructured files
for _, f := range unstructuredFiles {
// we don't need to copy these files, we'll just slurp the contents into
// a string and append that as one big DomainResource
b, err := os.ReadFile(filepath.Join(workDir, f.Path))
if err != nil {
return nil, fmt.Errorf("error reading source files: %w", err)
}
drs[f.Name] = string(b)
}

return drs, nil
}

@@ -103,3 +156,27 @@ func CreateDomain(spec *Spec) (types.Domain, error) {
}
return Domain{spec}, nil
}

// copyFile is a helper function that copies a file from source to dst, and returns the relative file path between the two.
func copyFile(dst string, src string) (string, error) {
bytes, err := network.Fetch(src)
if err != nil {
return "", fmt.Errorf("error getting source files: %w", err)
}

// We'll use the filename when writing the file so it's easier to reference later
relname := filepath.Base(src)

return relname, os.WriteFile(filepath.Join(dst, relname), bytes, 0666)
}

// listFiles walks dir and returns the paths of all regular (non-directory) files it contains.
func listFiles(dir string) ([]string, error) {
files := make([]string, 0)
err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
if !d.IsDir() {
files = append(files, path)
}
return nil
})
return files, err
}
4 changes: 4 additions & 0 deletions src/pkg/domains/files/files_test.go
@@ -16,17 +16,21 @@ func TestGetResource(t *testing.T) {
d := Domain{Spec: &Spec{Filepaths: []FileInfo{
{Name: "foo.yaml", Path: "foo.yaml"},
{Name: "bar.json", Path: "bar.json"},
{Name: "baz", Path: "baz", Parser: "json"},
{Name: "arbitraryname", Path: "nested-directory/baz.hcl2"},
{Name: "stringtheory", Path: "arbitrary.file", Parser: "string"},
}}}

resources, err := d.GetResources(context.WithValue(context.Background(), types.LulaValidationWorkDir, "testdata"))
require.NoError(t, err)
if diff := cmp.Diff(resources, types.DomainResources{
"bar.json": map[string]interface{}{"cat": "Cheetarah"},
"foo.yaml": "cat = Li Shou",
"baz": map[string]interface{}{"lizard": "Snakob"},
"arbitraryname": map[string]any{
"resource": map[string]any{"catname": map[string]any{"blackcat": map[string]any{"name": "robin"}}},
},
"stringtheory": "hello there!",
}); diff != "" {
t.Fatalf("wrong result:\n%s\n", diff)
}
5 changes: 3 additions & 2 deletions src/pkg/domains/files/spec.go
@@ -5,6 +5,7 @@ type Spec struct {
}

type FileInfo struct {
Name string `json:"name" yaml:"name"`
Path string `json:"path" yaml:"path"`
Name string `json:"name" yaml:"name"`
Path string `json:"path" yaml:"path"`
Parser string `json:"parser,omitempty" yaml:"parser,omitempty"`
}
1 change: 1 addition & 0 deletions src/pkg/domains/files/testdata/arbitrary.file
@@ -0,0 +1 @@
hello there!
1 change: 1 addition & 0 deletions src/pkg/domains/files/testdata/baz
@@ -0,0 +1 @@
{ "lizard": "Snakob"}