Validation in Depth

"Defense in depth" is a standard military principle taught to United States Marines. The general idea is to establish multiple, independent measures so that you remain protected even if one line of defense fails. It is also an easily recognizable construct in defensive information security: many organizations, for example, use both network- and host-based firewalls.

We LOVE automation at Recon InfoSec (see here for examples), but the beauty of automation often comes with the fragility of complexity. To combat this fragility, we apply a "defense in depth" approach to validating and testing our automated processes, so that we catch any "bad" code as far left (as early in development) as possible.


Additionally, OpenSOC is an open and distributed team, with contributors from varying backgrounds and specialties. Using technical measures to enable MOAR people to easily contribute awesome material is a double win.

For example, during our last OpenSOC @ DEF CON 28 Safe Mode we had over 20 hours of content and 500+ challenges to create, validate, and run for our 800+ participants. That requires a ton of hours, a ton of material, and a ton of compute power, with only a small team to build it all.

Environment

Use Cases

We use A LOT of YAML and other markup languages to define different elements of our infrastructure: metadata about an attack scenario, indicators of compromise, mappings to MITRE ATT&CK, Infrastructure as Code configurations, and CI pipeline configurations, to name a few. Because markup documents are often ingested at runtime and are incredibly flexible by design, it can be challenging to ensure that they are both syntactically valid AND meet the schema requirements of whatever software, system, or tool consumes them.

Schemas

Many IDEs support syntax validation for languages like YAML, but cannot recognize functional (schema) failures. In the example below, an IDE may flag the second line, but likely not the third.

- name: this is all good      # No problems here!
- name: this: is invalid      # This may be automatically highlighted by your IDE
- nmae: this is also valid    # Syntax is fine, but this will likely break functionally; IDEs struggle to spot it without context

Most common tools that use JSON, YAML, etc. for configuration files have a predefined schema of some sort, even if it only lives in the documentation. A purposefully defined, machine-readable schema, however, can drive automation.

Enter JSON Schema, "a vocabulary that allows you to annotate and validate JSON documents."

This gives us an easy standard for developing and consuming schemas for nearly any system we would like to use. And of course, online libraries of predefined schemas (like JSON Schema Store) exist for popular tools, so there is no need to manually create a schema for CircleCI or Ansible.
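As an illustration, here is a minimal sketch of a schema that would flag the "nmae" typo from the earlier example. The schema is hypothetical: it assumes each list item must carry a string "name" and nothing else. The second item from that example can't be parsed at all, so only the two syntactically valid items are checked, already loaded as Python objects.

```python
from jsonschema import Draft7Validator

# Hypothetical schema: a list of objects, each requiring a string "name"
# and rejecting any other keys (which is what catches typos like "nmae").
schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
        "additionalProperties": False,
    },
}

# The syntactically valid items from the example above, already parsed.
data = [
    {"name": "this is all good"},
    {"nmae": "this is also valid"},
]

validator = Draft7Validator(schema)
for error in sorted(validator.iter_errors(data), key=str):
    # error.path locates the offending element within the document
    print(list(error.path), error.message)
```

The typo produces two errors on the second item: the required "name" is missing, and "nmae" is an unexpected property. Setting additionalProperties to false is a design choice; without it, the schema would only report the missing key.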

The JSON Schema definition specifically references JSON documents, but with the jsonschema Python package we can apply the standard to other formats as well, because validation runs against the parsed data rather than the raw text. For example, equivalent JSON and YAML documents load into identical Python dictionaries.
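A quick way to see this: parse the same record from JSON and from YAML, compare the results, and run both through one validator. The record and schema below are hypothetical, loosely modeled on an ATT&CK mapping; the pattern assumes technique IDs of the form T1234 or T1234.001.

```python
import json

import yaml
from jsonschema import Draft7Validator

# The same hypothetical record written in two formats.
json_text = '{"name": "Initial Access", "techniques": ["T1566", "T1190"]}'
yaml_text = """
name: Initial Access
techniques:
  - T1566
  - T1190
"""

from_json = json.loads(json_text)
from_yaml = yaml.safe_load(yaml_text)

# Both parsers produce plain dicts, lists, and strings, so they compare equal.
print(from_json == from_yaml)

# One schema therefore validates either source.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "techniques": {
            "type": "array",
            "items": {"type": "string", "pattern": r"^T\d{4}(\.\d{3})?$"},
        },
    },
    "required": ["name", "techniques"],
}
validator = Draft7Validator(schema)
print(validator.is_valid(from_json), validator.is_valid(from_yaml))
```

One caveat: YAML's implicit typing can produce values JSON has no equivalent for (PyYAML's safe_load turns an unquoted 2020-08-07 into a datetime.date, for instance), so the equivalence holds for the JSON-compatible subset of YAML.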

Implementation

Given our need to validate multiple types of files, and the tools available for doing so, here is how we currently use them.

IDE

Much of the team uses VS Code, where implementing custom schema validation with JSON Schema is as easy as installing an extension such as the YAML Language Support extension from Red Hat and adding a little configuration to settings.json.

"yaml.schemas": {
  "./path/to/schema.json": "data.yaml"
},
"yaml.schemaStore.enable": true,

In the above example, any file named data.yaml will be automatically validated (including autocomplete and tips!) using the schema.json schema in real time within VS Code.
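Because the IDE, the Git hook, and the CI job all read the same schema.json, a malformed schema breaks all three at once. One option, sketched here under the assumption that the schema is maintained as a Python dict (a hypothetical layout, not necessarily how we store ours), is to check it against the Draft 7 meta-schema before writing it out. The description keyword is the kind of annotation editor tooling can surface as hover text.

```python
import json
from pathlib import Path

from jsonschema import Draft7Validator

# Hypothetical schema for data.yaml; "$schema" declares which draft it targets.
SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "Display name for this entry"},
        },
        "required": ["name"],
        "additionalProperties": False,
    },
}


def write_schema(schema: dict, path: Path) -> None:
    """Write the schema to disk only if it is itself a valid Draft 7 schema."""
    # Raises jsonschema.exceptions.SchemaError if the schema is malformed.
    Draft7Validator.check_schema(schema)
    path.write_text(json.dumps(schema, indent=2) + "\n")
```

Calling write_schema(SCHEMA, Path("schema.json")) regenerates the file that the yaml.schemas setting points at.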

Git Hooks

We use CI jobs for most of our projects, but the goal here is to keep bad code out of both the repo and the CI pipeline. Git hooks are a perfect place to run validation before code reaches the remote repo (e.g. in a "pre-commit" hook).
This is the first place we use the jsonschema Python package.

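import argparse
import json
import sys

from jsonschema import Draft7Validator
import yaml

# -s/-c mirror the flags used in the CI job below (long option names are illustrative)
parser = argparse.ArgumentParser(description="Validate a YAML file against a JSON Schema")
parser.add_argument("-s", "--schema", default="schema.json", help="path to the JSON Schema")
parser.add_argument("-c", "--config", default="data.yaml", help="path to the YAML file to check")
args = parser.parse_args()

with open(args.schema, "r") as f:
    validator = Draft7Validator(json.load(f))

with open(args.config, "r") as f:
    data = yaml.safe_load(f)

# Collect every schema violation rather than stopping at the first one
validation_errors = sorted(validator.iter_errors(data), key=str)
for error in validation_errors:
    print(f"{args.config}: {error.message}", file=sys.stderr)

# A non-zero exit status aborts the commit (or fails the CI job)
sys.exit(1 if validation_errors else 0)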

The above script gives us a list of all of the validation errors using THE SAME schema we used in our IDE. If we detect any errors, we can exit 1 and stop the commit before bad markup is committed. Running a script like this is often significantly faster and lower-overhead than firing up whatever tool consumes the data being validated.
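To wire a check like this into Git, one option is a small script saved as .git/hooks/pre-commit and marked executable. The sketch below is an illustration, not our exact hook: it asks Git for the staged files, keeps the .yaml/.yml ones, and validates each against a single shared schema (SCHEMA_PATH and the helper names are hypothetical). A real repo with several file types would map each path pattern to its own schema.

```python
#!/usr/bin/env python3
import json
import subprocess
import sys

import yaml
from jsonschema import Draft7Validator

SCHEMA_PATH = "schema.json"  # hypothetical location of the shared schema


def staged_yaml_files() -> list:
    """Return staged (added, copied, or modified) YAML paths."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.splitlines() if p.endswith((".yaml", ".yml"))]


def validate_file(validator: Draft7Validator, path: str) -> list:
    """Return readable error lines for one file (empty list if it is clean)."""
    try:
        with open(path, "r") as f:
            data = yaml.safe_load(f)
    except yaml.YAMLError as exc:
        # Syntax errors are reported too, not just schema violations
        return [f"{path}: YAML syntax error: {exc}"]
    return [f"{path}: {e.message}" for e in sorted(validator.iter_errors(data), key=str)]


def main() -> int:
    with open(SCHEMA_PATH, "r") as f:
        validator = Draft7Validator(json.load(f))
    errors = []
    for path in staged_yaml_files():
        errors.extend(validate_file(validator, path))
    for line in errors:
        print(line, file=sys.stderr)
    # Any non-zero exit status makes Git abort the commit
    return 1 if errors else 0

# Saved as .git/hooks/pre-commit, the file would end with:
# sys.exit(main())
```

Note that this reads the working-tree copy of each file, so a file with only some of its changes staged is validated as it sits on disk rather than as it will be committed.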

CI Job

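Neither an IDE extension on a local dev machine nor a Git hook is fully enforceable; both can be skipped or simply never set up (hooks in .git/hooks are not cloned with the repository, and git commit --no-verify bypasses a pre-commit hook). So, we enforce one final validation check as part of the CI job.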

Using the same script AND the same schema as above, we create another job within our CI pipeline to validate the data before building/deploying/etc.

version: 2.1
jobs:
  validate:
    docker:
      - image: circleci/python:3.7
    steps:
      - checkout
      - run:
          name: Install required modules
          command: pip install -r ./requirements.txt
      - run:
          name: Validate
          command: python validate.py -s ./schema.json -c data.yaml

Again, if we get a non-successful exit, we can break the pipeline and catch the errors before we build and deploy.

Conclusion

These validation steps are an addition to other types of testing; they do not replace unit tests, integration tests, or deployments to test environments. But catching a validation error while you are writing code is much faster than catching it on the last unit test of a large test suite. Additionally, processes like these provide a level of abstraction away from your software, enabling a larger group of people to contribute.

More Coolness

As stated earlier, anything we can get into a Python dictionary can potentially be validated against a schema. For example, Graylog rules are not written in YAML or JSON, but with some creative Python parsing they can be converted into a dictionary!
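As a rough illustration, the sketch below assumes the Graylog pipeline rule layout (rule "title" when ... then ... end). It is a regex-based simplification rather than a parser for Graylog's full rule language: splitting actions on ";" would break on string literals containing semicolons, for example. The field names in the resulting dict and the schema constraints are hypothetical.

```python
import re

from jsonschema import Draft7Validator

# Assumed rule shape: rule "<title>" when <condition> then <actions> end
RULE_RE = re.compile(
    r'^\s*rule\s+"(?P<title>[^"]+)"\s+when\s+(?P<when>.+?)\s+then\s+(?P<then>.*?)\s*end\s*$',
    re.DOTALL,
)

# Hypothetical schema for the parsed dict
RULE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "pattern": "^[A-Za-z0-9 _-]+$"},
        "when": {"type": "string", "minLength": 1},
        "then": {"type": "array", "items": {"type": "string"}, "minItems": 1},
    },
    "required": ["title", "when", "then"],
}


def rule_to_dict(text: str) -> dict:
    """Parse one rule into a dict; raise ValueError if it doesn't match the assumed shape."""
    match = RULE_RE.match(text)
    if match is None:
        raise ValueError("text does not look like a single pipeline rule")
    return {
        "title": match["title"],
        "when": match["when"].strip(),
        # Naive split of the action block into statements
        "then": [s.strip() for s in match["then"].split(";") if s.strip()],
    }


example = '''
rule "tag powershell"
when
    has_field("process_name")
then
    set_field("suspicious", true);
end
'''

parsed = rule_to_dict(example)
errors = [e.message for e in Draft7Validator(RULE_SCHEMA).iter_errors(parsed)]
print(parsed, errors)
```

Once a rule is a dict, the same validator, Git hook, and CI job described above can check it alongside our YAML.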

We are always looking for creative ways to apply automation and more effectively build awesome OpenSOC experiences for huge groups of hunters. If you are interested in learning more, or being a part of OpenSOC, stay tuned here on the blog (or on Twitter #OpenSoc) to get the latest on the events we are running.

Join Us

Also, if you love OpenSOC and are interested in private training for your team, check out the Recon InfoSec Network Defense Range.