Last week, Vector was introduced. This week's demo covers parsing, log-to-metric conversion, Vector agent monitoring, and host metrics (Vector as a node exporter).
1️⃣ Clone the repository
2️⃣ Run the script
```bash
./run.sh
```
3️⃣ Open the Grafana dashboards
The app contains 2 supervisord services:
app.sh writes logs to LOG_BASE_PATH (/workspace/app/log/). The log structure is two lines: the first, `bad logs`, is an invalid key-value entry used to monitor pipeline errors, while the second is parsed as a metric (H is the hotel and T is the fake app's think time in milliseconds).
Ten app.sh instances are configured to show how the graphs scale when the number of agents is high.
```
t=2023-11-20T11:34:15.692975421+00:00 bad logs
t=2023-11-20T11:34:15.694559072+00:00 H=2497 T=2725
```
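For illustration, here is a minimal sketch of what such a log generator could look like; the repository contains the real app.sh, so the file path and value ranges below are assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the fake app: one invalid "bad logs" line and one
# valid key-value line per iteration, tab-delimited.
LOG_FILE="${LOG_BASE_PATH:-/workspace/app/log/}app.log"
while true; do
  ts=$(date --rfc-3339=ns | tr ' ' 'T')   # RFC 3339 timestamp with nanoseconds
  printf 't=%s\tbad logs\n' "$ts"         # invalid entry to exercise pipeline errors
  printf 't=%s\tH=%d\tT=%d\n' "$ts" "$((RANDOM % 10000))" "$((RANDOM % 5000))"
  sleep 1
done >> "$LOG_FILE"
```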
Vector reads the logs from the app and converts them to metrics. Vector itself is also monitored, and a Grafana dashboard is available. Mimir, Loki, and Grafana are used as the backend to visualize the datapoints.
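As an illustration of the backend wiring, here is a minimal sketch of a sink shipping the generated metrics to Mimir over Prometheus remote write; the endpoint and sink name are assumptions, not the demo's exact config:

```toml
[sinks.app_metrics_mimir]
type = "prometheus_remote_write"
inputs = ["applog_file_2_metric"]          # the log_to_metric transform defined below
endpoint = "http://mimir:9009/api/v1/push" # assumed Mimir push endpoint
```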
To convert logs to metrics, the log must be structured and correctly parsed. The Vector Remap Language (VRL) is used to parse each line into a structured log.
VRL is a lightweight, Rust-based expression language that brings safety and true error handling: errors that can be statically verified are caught while writing the parser rather than at runtime. This makes transformations easy to maintain.
In this demo, the following VRL is used to parse each log line into a structured log. This part was introduced in the previous post.
```
. |= parse_key_value!(.message, field_delimiter:"\t", accept_standalone_key:false)
.timestamp = parse_timestamp!(.t, "%Y-%m-%dT%H:%M:%S%.f%:z")
.job = "vector"
del(.message)
```
`.message` is the raw message to be parsed.
1️⃣ Parse the message as key-value pairs and merge the resulting object at the root level (the dot `.`). `parse_key_value` returns a result which can be either the object or an error; the bang operator `!` makes the program fail on error.
2️⃣ Add a `timestamp` field by parsing `.t`, failing on error.
3️⃣ Add a `job` field set to `vector`.
4️⃣ Remove `message` to avoid paying twice for the raw and the structured signal.
⚠️ This program fails on error, meaning that Vector will stop on error.
It is also possible to handle errors inside VRL, but that means repeating the same error handling everywhere.
Instead of failing Vector on error, it is also possible, and recommended, to drop the event on error: Vector keeps running and the message is not lost but rerouted.
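For comparison, here is a minimal sketch of explicit error handling in VRL instead of the abort-on-error `!` variant used above; the `parse_error` field name is an assumption:

```
# Handle the fallible call explicitly instead of aborting with `!`
parsed, err = parse_key_value(.message, field_delimiter: "\t", accept_standalone_key: false)
if err != null {
  .parse_error = err   # hypothetical field: tag the event instead of failing
} else {
  . |= parsed
}
```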
Log-to-metric transformation is a common Vector pattern where logs are visualized only as aggregated metrics. Using logs directly as a visualization source is CPU- and IO-intensive and an inefficient way to render datapoints.
To avoid that resource usage, a log-to-metric transformation can be used.
The log-to-metric demo pipeline:
1️⃣ Set up the source `applog_file_raw`.
2️⃣ Parse each line into a structured log by configuring a remap transform with the VRL program `keyvalue.vrl`. On error or abort, drop the message and reroute it to a dropped output.
3️⃣ Set up the `log_to_metric` transform with 2 counters: one incremented per log event (`app.count`) and one incremented by the `T` value (`app.total`), the think time of the app component.
```toml
# file log
[sources.applog_file_raw]
type = "file"
include = ["${LOG_BASE_PATH}**/*.*"]

[transforms.applog_file]
type = "remap"
inputs = ["applog_file_raw"]
file = "config/vrl/keyvalue.vrl"
# forward to a dead letter queue on error or abort
drop_on_error = true
drop_on_abort = true
reroute_dropped = true

[transforms.applog_file_2_metric]
type = "log_to_metric"
inputs = ["applog_file"]

[[transforms.applog_file_2_metric.metrics]]
type = "counter"
field = "T"
namespace = "app"
name = "count"

[transforms.applog_file_2_metric.metrics.tags]
hotel = ""

[[transforms.applog_file_2_metric.metrics]]
type = "counter"
field = "T"
namespace = "app"
name = "total"
increment_by_value = true

[transforms.applog_file_2_metric.metrics.tags]
hotel = ""
```
Vector can unit test the transformation pipeline, as in the following log-to-metric test.
```toml
[[tests]]
name = "Test applog_file parsing"

[[tests.inputs]]
insert_at = "applog_file"
type = "log"

[tests.inputs.log_fields]
message = "t=2023-11-13T15:53:37.728584030+01:00\th=FR-LT-00410\tH=6666\tT=5663"

[[tests.outputs]]
extract_from = "applog_file"

[[tests.outputs.conditions]]
type = "vrl"
source = '''
assert!(is_timestamp(.timestamp))
'''
1️⃣ Substitute the input of `applog_file` with the structured log.
2️⃣ Inject an entry `message = "t=2023-11-13T15:53:37.728584030+01:00\th=FR-LT-00410\tH=6666\tT=5663"`.
3️⃣ Intercept the pipeline output of `applog_file` in `tests.outputs`.
4️⃣ Assert the output with the VRL assertion `assert!(is_timestamp(.timestamp))`.
5️⃣ Run the unit test:
```bash
docker run --rm -w /vector -v $(pwd):/vector/config/ timberio/vector:0.34.0-debian test --config-toml /vector/config/**/*.toml
```
A vector_test.sh script is provided to ease running these tests.
Vector also has a graph command which produces a DOT graph. A vector_graph.sh script is available to build it:
```bash
docker run --rm -w /vector -v $(pwd):/vector/config/ timberio/vector:0.34.0-debian graph --config-toml /vector/config/**/*.toml | dot -Tsvg > graph.svg
```
As soon as the pipeline runs, it becomes important to monitor errors, saturation, and utilization to trigger alerts properly.
Vector is instrumented with logs and metrics. Dedicated pipeline metrics are exposed, and a Grafana dashboard is provided.
For the sake of the demo, this dashboard is a lightweight version of the full vector.dev one.
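As a sketch of how those pipeline metrics can be exposed (the component names are assumptions, not the demo's exact config):

```toml
[sources.vector_internal]
type = "internal_metrics"

# Expose Vector's own metrics for Prometheus-style scraping
[sinks.vector_metrics]
type = "prometheus_exporter"
inputs = ["vector_internal"]
address = "0.0.0.0:9598"   # Vector's conventional exporter port
```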
In this demo, the log is dropped on error, and all dropped messages of the pipeline can be collected.
```toml
## dead letter queue for dropped messages
[sinks.dlq_loki]
type = "loki"
endpoint = "http://loki:3100"
inputs = ["*.dropped"]
encoding = { codec = "json" }
labels = { application = "dead-letter-queue", host="", pid="" }
```
In this demo, `bad logs` is not a key-value pair, so parsing fails and the log is routed to the dropped output.
On error, Vector produces a log such as the one below, which is helpful to analyze which component is failing (`"component_id": "applog_file"`), the error message (`"message": "function call error for \"parse_key_value\" at (5:82): could not parse whole line successfully"`), and the full raw message (`"message": "t=2023-11-23T13:00:14.872126641+00:00\tbad logs"`).
```json
{
  "file": "/workspace/app/log/03523229af2f_20231123.log",
  "host": "03523229af2f",
  "message": "t=2023-11-23T13:00:14.872126641+00:00\tbad logs",
  "metadata": {
    "dropped": {
      "component_id": "applog_file",
      "component_kind": "transform",
      "component_type": "remap",
      "message": "function call error for \"parse_key_value\" at (5:82): could not parse whole line successfully",
      "reason": "error"
    }
  },
  "source_type": "file"
}
```
The Utilization graph helps monitor component utilization to quickly identify a component using too many resources.
Source and sink bandwidth is monitored through the Component sent and Component received graphs.
Buffer Events is also important to watch: if there is a problem on the sink side, the number of buffered events can grow, so it is important to check that no telemetry is dropped because of a sink issue.
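For example, a sink buffer can be tuned so that events are buffered on disk and back-pressure is applied rather than telemetry being dropped. A minimal sketch, where the size and the choice of sink are assumptions:

```toml
# Hypothetical buffer tuning for the Loki dead-letter sink
[sinks.dlq_loki.buffer]
type = "disk"              # survive restarts and sink outages
max_size = 536870912       # 512 MiB; disk buffers require a sizeable minimum
when_full = "block"        # apply back-pressure instead of dropping events
```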
Vector can also be used as a drop-in replacement for node exporter, reducing the number of agents per host.
A Grafana dashboard has been imported, and Vector has been configured to export host metrics.
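As a sketch of that configuration (the collector list and port are assumptions, not the demo's exact config):

```toml
[sources.host]
type = "host_metrics"
# collect the usual node exporter metric families
collectors = ["cpu", "memory", "disk", "filesystem", "network", "load"]

[sinks.host_metrics_prom]
type = "prometheus_exporter"
inputs = ["host"]
address = "0.0.0.0:9100"   # assumed: node exporter's conventional port
```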