As I have outlined in this blog series, Azure Logging is based on Data Collection Rules (DCRs) and Azure Pipeline.
Because log data serves as documentation for compliance and is crucial in SIEM security hunting, it is vital that we can monitor the health of log ingestion and detect when data is dropped.
This blog covers how we can monitor the health and analyze metrics/performance of the log ingestion using Data Collection Rules.
Content
- Implementation of DCR Diagnostics
- Performance Monitoring
- Health Monitoring
- Alerting with Azure Monitor (Performance)
- Alerting with Azure Monitor (Health)
- Troubleshooting (Microsoft Article)
Implementation of DCR Diagnostics
To monitor the ingestion of data through a Data Collection Rule, we need to enable Diagnostics on the Data Collection Rule.
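As a rough sketch, the diagnostic setting for a DCR can be described by a request body like the one built below. It assumes the usual shape of an Azure diagnostic setting (`workspaceId`, `logs`, `metrics`) and assumes that the DCR error log category is named `LogErrors`; verify both against your own environment. The helper name and the workspace ID placeholder are hypothetical.

```python
import json


def build_dcr_diagnostic_setting(workspace_id, stream_metrics=False):
    """Build a diagnostic-setting body for a DCR (illustrative helper only)."""
    return {
        "properties": {
            # Log Analytics workspace that receives the diagnostics.
            "workspaceId": workspace_id,
            # Assumed category name for DCR error logs (DCRLogErrors table).
            "logs": [{"category": "LogErrors", "enabled": True}],
            # AllMetrics is optional; streaming it to Log Analytics is billable.
            "metrics": [{"category": "AllMetrics", "enabled": stream_metrics}],
        }
    }


if __name__ == "__main__":
    # Placeholder resource ID; replace with your own workspace.
    body = build_dcr_diagnostic_setting("<log-analytics-workspace-resource-id>")
    print(json.dumps(body, indent=2))
```

Keeping `stream_metrics` off by default mirrors the cost consideration: the platform metrics stay visible in Azure Metrics either way.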
Design consideration (cost)
You don’t need to enable AllMetrics to see the data using Azure Metrics, as the data is already available in the Azure platform (free).
But if you want to keep the data in logs for compliance purposes (for example, to document dropped logs), or use it in queries, you can enable AllMetrics in the diagnostic setting. This streams the data into Log Analytics and writes new entries every minute. Data streamed into Log Analytics is billable.
Performance Monitoring
Once diagnostics are enabled, platform metrics data flows into Azure Monitor (free), so we can analyze a number of metrics.
How do I find the metrics?
Now you can select the metric of choice.
You can now see the data in the timeline.
Health Monitoring
The parts of the diagnostics information related to health and errors go into the DCRLogErrors table, which can be queried using Kusto.
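A minimal sketch of such a query is shown below, held in a Python string so it can be handed to whichever query client you use. It relies only on the standard `TimeGenerated` and `_ResourceId` columns; the function name and the default lookback and bin sizes are assumptions for illustration.

```python
def dcr_error_query(lookback="1h", bin_size="5m"):
    """Return a Kusto query that counts DCRLogErrors rows per DCR over time."""
    return (
        "DCRLogErrors\n"
        f"| where TimeGenerated > ago({lookback})\n"
        f"| summarize ErrorCount = count() by bin(TimeGenerated, {bin_size}), _ResourceId\n"
        "| order by TimeGenerated desc"
    )


# Print the query text so it can be pasted into Log Analytics.
print(dcr_error_query())
```

Grouping by `_ResourceId` keeps the errors of each Data Collection Rule separate when several rules send diagnostics to the same workspace.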
Alerting with Azure Monitor (Performance)
Since we have the data in Azure Monitor, we can also configure alerting using Dynamic Thresholds. Using machine learning, Azure Monitor learns the “normal flow” of the metric, so it can detect anomalies dynamically.
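To illustrate the idea behind a learned threshold, the sketch below flags a value that falls outside a band derived from recent history. This is not Azure Monitor's actual model; it swaps in a simple mean plus or minus a multiple of the standard deviation, and the sample values are made up.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, sensitivity=3.0):
    """Flag `latest` if it lies outside mean +/- sensitivity * stdev of history."""
    if len(history) < 2:
        return False  # not enough data points to learn a baseline
    mu = mean(history)
    sigma = stdev(history)
    lower = mu - sensitivity * sigma
    upper = mu + sensitivity * sigma
    return latest < lower or latest > upper


# Made-up "rows received per minute" samples for a healthy DCR.
rows_per_min = [980, 1010, 995, 1005, 990, 1000, 1015, 985]

print(is_anomalous(rows_per_min, 1002))  # within the learned band
print(is_anomalous(rows_per_min, 120))   # sudden drop in ingestion
```

A lower `sensitivity` narrows the band and raises more alerts; a higher one tolerates more variation before firing.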
Alerting with Azure Monitor (Health)
Monitoring of DCR errors can be done with an Azure Monitor custom log search query. If the number of rows returned is greater than 0, errors are happening.
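The alert condition itself is simple and can be sketched as follows. The row dictionaries are hypothetical stand-ins for whatever the query returns; only the "more than 0 rows" rule comes from the text above.

```python
def should_alert(rows, threshold=0):
    """Fire when the error query returns more rows than the threshold."""
    return len(rows) > threshold


# Hypothetical query results: an empty result set means no DCR errors.
no_errors = []
some_errors = [{"TimeGenerated": "2024-01-01T00:00:00Z", "ErrorCount": 3}]

print(should_alert(no_errors))    # False
print(should_alert(some_errors))  # True
```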
Troubleshooting (Microsoft Article)
Microsoft has created a great article on troubleshooting, which can be found here
Article covers: