Harness the true power of YAML
YAML claims it is a human-friendly data serialization standard. “Human-friendly,” said no one who has ever used YAML.
One space out of place, and everything breaks.
With version 1.2, the primary objective was to bring YAML into compliance with JSON, making JSON an official subset. As a superset of JSON, what does YAML add? In this article, I’m going to explore things you might not know about YAML.
Before we begin, let’s have a quick intro to YAML. YAML has ordered lists called sequences, whose items are introduced by dashes. It also has unordered key-value pairs, with each key separated from its value by a colon. These key-value pairs are called mappings. Each of the keys in a mapping must be unique. The text that defines the keys, values, and sequence items is called a scalar value, which is zero or more printable Unicode characters. With that clear, let’s begin our tour of ten things you might not know about YAML.
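To make those terms concrete, here is a minimal sketch. The keys and values are made up for illustration.

```yaml
# A mapping: unordered key-value pairs, key and value separated by a colon
name: Ada
role: engineer
# A sequence: ordered items, each introduced by a dash
skills:
  - YAML
  - JSON
```

Every piece of text here (name, Ada, YAML, and so on) is a scalar.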
YAML is a superset of JSON, which means any valid JSON is valid YAML. On the left is a JSON document and, on the right, the canonical form of that content.
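As a stand-in for that comparison, the sketch below holds two YAML documents separated by three dashes. The first is plain JSON, and the second is the same content in block style. The data is invented for illustration.

```yaml
# A JSON document, which a YAML 1.2 parser also accepts as-is
{"name": "Ada", "languages": ["YAML", "JSON"], "active": true}
---
# The same content in block-style YAML
name: Ada
languages:
  - YAML
  - JSON
active: true
```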
If the YAML parser you’re using supports version 1.2, you can convert all your YAML files to JSON and everything will continue to work. But before you do that, read the rest of this article. You might discover why you want to stick with YAML.
YAML supports comments. A comment starts with a pound sign (#), either at the beginning of a line or after whitespace, and continues until the end of the line. This can be very handy in large documents such as CI/CD pipelines. Adding comments does not change the output in any way.
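A short sketch of where comments can go. The pipeline-like keys are hypothetical and not tied to any real CI system.

```yaml
# A full-line comment
stages:        # a trailing comment after a key
  - build
  - test       # a trailing comment after a value
# Below, the "#" inside the URL has no whitespace before it, so it is part of the value
url: http://example.com/#section
```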
YAML supports anchors and aliases. This prevents you from having to repeat the same value. An anchor is an ampersand (&) followed by a label, followed by a space and the value you want to reuse. When you want to use the value, type an asterisk (*) and the label. When the YAML is processed, all aliases will be replaced with the actual values. This allows you to change the value in one place and have the change propagate through the entire document.
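Here is a minimal sketch of an anchor and two aliases. The label base_image and the keys are names I made up for the example.

```yaml
defaults:
  image: &base_image ubuntu:22.04   # anchor named "base_image"
build:
  image: *base_image                # alias, resolves to ubuntu:22.04
test:
  image: *base_image                # same value again, defined only once
```

Changing the value next to the anchor changes what both aliases resolve to.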
New lines can be preserved using the literal style, indicated by a vertical bar (|), or folded using the folded style, denoted by a greater-than symbol (>).
When using the folded style, each line break is folded into a space unless it ends an empty or more indented line. When using the literal style, each line break is preserved. In both styles, a single trailing new line is kept by default. Adding a minus (-) after the indicator strips all trailing new lines, and adding a plus (+) keeps all of them.
Finally, you can add an indentation indicator, a digit from 1 to 9, to inform the parser of the expected indentation. Any additional spaces beyond that indentation will be added to the front of the value.
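The sketch below puts the styles side by side. The key names are made up, and each comment states the value I would expect from a spec-compliant parser (\n stands for a new line).

```yaml
# Value: "line one\nline two\n"
literal: |
  line one
  line two
# Value: "these two lines become one line\n"
folded: >
  these two lines
  become one line
# Value: "no trailing new line" (the minus strips it)
stripped: |-
  no trailing new line
# Value: "  two leading spaces\n" (content starts at 2 spaces, the rest is kept)
indented: |2
    two leading spaces
# Value: "blank lines are kept\n\n" (the plus keeps the blank line that follows)
kept: |+
  blank lines are kept

last: done
```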
Values can be written plain (without quotes), in double quotes, or in single quotes. Quoting a value is helpful if its first character could lead to ambiguity. When using double quotes, backslashes and double quotes must be escaped with a backslash. When using single quotes, backslashes and double quotes don’t need to be escaped. However, to get a single quote inside single quotes, you must write two in a row.
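A quick sketch of the three styles, with illustrative keys. The double-quoted and single-quoted lines produce the same value.

```yaml
plain: hello world
# "@" is a reserved character, so this value has to be quoted
handle: "@ada"
# Double quotes: backslashes and double quotes are escaped with a backslash
double: "C:\\temp and a \"quoted\" word"
# Single quotes: no escaping needed for backslashes or double quotes
single: 'C:\temp and a "quoted" word'
# Two single quotes in a row give one; the value is: it's
apostrophe: 'it''s'
```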
You can identify the YAML version using a directive at the top of the file. For example, you can start a document with %YAML 1.2 to denote that this document supports version 1.2 of the YAML spec.
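In practice the directive is followed by three dashes, which mark the start of the document. A minimal sketch with a made-up key:

```yaml
%YAML 1.2
---
greeting: hello
```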
You can assign a type to a value using double exclamation marks followed by the type. For example, I can change my age from the default integer to a float. So, I can report that I am 48 and a half years old.
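Here is a small sketch of that idea. The keys are made up, and the exact result of each tag depends on the parser and schema in use, so treat the comments as what I would expect rather than a guarantee.

```yaml
age: 48                    # resolved as an integer by default
age_as_float: !!float 48   # explicitly tagged as a float
age_as_text: !!str 48      # explicitly tagged as a string
precise_age: !!float 48.5  # 48 and a half
```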
YAML supports two authoring styles: block and flow. Block uses indentation to denote structure, which is how most of us see YAML being used. YAML’s flow style can be thought of as a natural extension of JSON and uses double quotes, single quotes, square brackets, curly braces, and commas to denote structure.
In flow style, sequences are written as a comma-separated list within square brackets, and mappings are written within curly braces. The flow style is far more forgiving when it comes to spaces. You can mix both block and flow styles in the same document.
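The sketch below writes the same kind of data in block style, in flow style, and as a mix of the two. All names are invented for illustration.

```yaml
# Block style
fruits:
  - apple
  - banana
owner:
  name: Sam
  role: admin
# Flow style
fruits_flow: [apple, banana]
owner_flow: {name: Sam, role: admin}
# Mixed: a block sequence whose items use flow style
teams:
  - {name: web, members: [ana, li]}
  - name: ops
    members: [kim]
```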
You can use a question mark to denote a key in a key-value pair. This can be useful when the key is null or begins with a character, such as a dash or square bracket, that would lead to ambiguity without quotes. Using a question mark also allows the key to span multiple lines. When using a question mark, the colon must be on a separate line.
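A minimal sketch of explicit keys, with made-up content. The first key spans two lines and is folded into one scalar, and the second key is empty, which a YAML 1.2 parser should read as null.

```yaml
? a key that spans
  two lines
: value for the long key
?
: value for the null key
```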
Tabs cannot be used for indentation; they must be converted to spaces. I always thought the number of spaces was fixed. The number of spaces is not important as long as parallel elements have the same left justification and the hierarchically-nested elements are indented further. The indentation level of a block scalar is detected from the first non-empty line.
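To illustrate, the sketch below uses a different number of spaces under each key. It is valid because the elements at each level line up with their siblings. The key names are arbitrary.

```yaml
two_spaces:
  child: a
  sibling: b
four_spaces:
    child: c
    sibling: d
one_space:
 child: e
```

Mixing widths like this is legal, but picking one width per file is easier on readers.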
Before I did the research for this article, I was always frustrated with YAML, and I wondered why it even existed. Now I understand that it adds a lot of cool features to JSON, and I can make a more informed decision on when to use it.
Editor: Chelsea Brown