This blog post is about keeping long-term Sentinel logs, giving you insight into the options available today – with great opportunities to save money 🙂
To be honest, I can get confused by the limitations and feature set myself, as many great changes are arriving rapidly. Therefore I have prepared a detailed overview with 2 tables covering Decision #1: Export vs. Keep and Decision #2: Choosing our long-term log-storage solution.
I covered Azure LogAnalytics basic logs and archive logs in a previous blog post – so I have decided to cover another solution, Azure Data Explorer, in this one.
In many cases, this solution can be a great option, where you can save lots of money. On the flipside, it is also more complex. Hopefully this guide can get you up and running, so you can try it out. I will cover adding Syslog data, but it can also be used for e.g. Defender for Endpoint data (and more).
Azure Data Explorer is an alternative to consumption-based log retention, where you pay a fixed price for the long-term log-storage infrastructure based on an Azure Data Explorer cluster – and then use Azure Data Factory to transform your data.
If you want a tutorial on how to get a complete environment up and running using Azure Data Explorer, Azure Data Factory and export to an Azure Storage Account, then keep reading – I will try to take you through the steps.
Decision #1 – Export vs. Keep
When we have to decide on a solution, the first question is whether we should export data or keep it where it is. Below are the options:
Option | Export | Export | Export | Export | Keep | Keep |
---|---|---|---|---|---|---|
Method | Continuous export | Continuous export | Ad-hoc export (PowerShell) | Scheduled export via query (Logic App, Function, Data Factory) | LogAnalytics Retention | LogAnalytics Archive |
Target | EventHub | Storage Account | | | | |
Format | JSON | JSON | Any format | Any format | | |
Limitations | Link | Link | | | | |
Decision #2 – Choosing our long-term log-storage solution
The big question is which solution to choose for our long-term logs, and this is where the complexity starts, as we have to take many factors into consideration: in-house platform knowledge, limitations, time to enable, management, performance, monthly cost, scalability, security, etc.
Cost-factors
As mentioned above, many factors must be taken into account when you look at costs. Cost differences cannot be determined without asking some important questions. Here are 4 examples of such questions:
- For how long do you require interactive hunting capabilities – or could a search job be sufficient?
  - If it's just limited hunting and audit/compliance, you can get far with LogAnalytics archive.
  - If you need to query the data very often, go with Azure Data Explorer / Azure Data Factory.
  - In the price examples below, I compare a few scenarios, including ones using LogAnalytics archive.
- For how long do you keep your retention in Sentinel (short-term)?
  - It matters whether you only keep 90 days in Sentinel and consider long-term storage from day 91 onwards, or whether you keep short-term data in Sentinel for 1 year.
  - If you keep short-term data for only 90 days and then move everything to long-term storage for a longer period, you can achieve pretty high savings by moving to ADX/ADF.
  - If you keep short-term data for 12 months in Sentinel, the long-term period might only be 12 months, which will make the cost difference smaller.
- How long do you need to keep the long-term data?
  - The longer the retention, the more ADX/ADF outperforms LogAnalytics on cost (infrastructure payment vs. consumption payment).
- What is your duplication-management design? This is one of the main differences between the EventHub and ADF methods.
  - Azure Data Explorer combined with the EventHub method stores some data (the first X months) in both Microsoft Sentinel and Azure Data Explorer, resulting in data duplication.
  - Azure Data Explorer combined with Azure Data Factory enables you to copy data only when it nears its retention limit in Microsoft Sentinel / Log Analytics, avoiding duplication.
I hope the below table will help you 🙂
Choice | LogAnalytics Retention – Analytics logs (lifecycle stage 1) | LogAnalytics Retention – Basic logs (lifecycle stage 1) | LogAnalytics Archive (lifecycle stage 2) | Azure Data Explorer | Storage Account |
---|---|---|---|---|---|
Storage placement | LogAnalytics | LogAnalytics | LogAnalytics | Azure Data Explorer | Storage Account |
Storage limitation | Unlimited | Unlimited | Unlimited | Unlimited | 500 TB per storage account |
Kusto Query Frontend | LogAnalytics (data available for interactive queries) | LogAnalytics (data available for interactive queries) | LogAnalytics (data available using search job or restore) | Azure Data Explorer (data available for interactive queries) | Azure Data Explorer (data available for interactive queries) |
Data move methods | No need | No need | No need | 1: Continuous export to Storage Account, 2: Data Factory daily move | Continuous export to Storage Account |
Kusto | Full support, full schema | Limited support, full schema | Search jobs / data restore | Full cmdlets support, full schema | Full cmdlets support, full schema |
Search performance with date filter (54.056 rows) | Fast: 5,2 sec | Not measured | Very slow: >2-15 min | Very fast: 0,175 sec | Slow: 150 sec (16 GB, 392 blobs) |
Search performance, full table query (54.056 rows) | 5,8-7,5 sec (returned 30.000 rows) | Not measured | Very slow: >2-15 min | 3,7 sec (returned 54.056 rows) | Failed (hit 64 MB limit) |
Performance scaling | ADX cluster | N/A | N/A | Autoscale compute | N/A |
Billing method | Consumption per GB | Consumption per GB | Consumption per GB | Infrastructure (fixed) + some consumption | Consumption per GB |
Data Protection | RBAC (restrict access to databases, tables or even rows within a table) | RBAC (restrict access to databases, tables or even rows within a table) | RBAC (restrict access to databases, tables or even rows within a table) | RBAC (restrict access to databases, tables or even rows within a table) | Immutable storage account |
Retention length (limit, max days) | 730 (2 years) | 8 days | | 36.500 (100 years) | 146.000 (400 years) |
Archive (limit, max days) | 2556 (7 years) | 2556 (7 years) | 2556 (7 years) | No need, handled by retention | No need, handled by retention |
Retention targeting | Workspace / individual tables | Workspace / individual tables | Workspace / individual tables | Database / individual tables | Separate storage accounts per table (max 10 export rules) |
Setup complexity | Easy | Easy | Ad-hoc | Complex (requires knowledge of Data Explorer & Data Factory) | Moderate |
Monthly cost (USD), example #1: 250 GB daily, 2 yr retention | 11.270-20.759. Scenario #1: data retention for 9 months (incl. 3 free): 8.897 + archive for 12 months: 2.373, total 11.270. Scenario #2: data retention for 24 months (incl. 3 free), total 20.759 | 4.745 (data retention for 8 days + archive for 24 months) | Prices are covered in the examples under retention, as it is one solution | 3.052 | 2.372-4.050. ADX: 510 + Storage: average 1.862 for the first 24 months (avg. months 0-12 = 987, avg. months 13-24 = 2.738); 3.540 at full capacity after 24 months (186.000 GB) |
Monthly cost (USD), example #2: 50 GB daily, 2 yr retention | 2.254-4.152. Scenario #1: data retention for 9 months (incl. 3 free): 1.779 + archive for 12 months: 475, total 2.254. Scenario #2: data retention for 24 months (incl. 3 free), total 4.152 | 949 (data retention for 8 days + archive for 24 months) | Prices are covered in the examples under retention, as it is one solution | 2.329 | 890-1.239. ADX: 510 + Storage: average 380 for the first 24 months (avg. months 0-12 = 198, avg. months 13-24 = 562); 729 at full capacity after 24 months (37.200 GB) |
More info | Search jobs | Link | | | |
Let's get started
By default, logs ingested into Microsoft Sentinel are stored in Azure Monitor Log Analytics.
Storing logs in Azure Data Explorer reduces costs while retaining your ability to query your data, and is especially useful as your data grows. For example, while security data may lose value over time, you may be required to retain logs for regulatory requirements or to run periodic investigations on older data.
Azure Data Explorer is a big data analytics platform that is highly optimized for log and data analytics. Since Azure Data Explorer uses Kusto Query Language (KQL) as its query language, it’s a good alternative for Microsoft Sentinel data storage. Using Azure Data Explorer for your data storage enables you to run cross-platform queries and visualize data across both Azure Data Explorer and Microsoft Sentinel.
Microsoft Sentinel provides full SIEM and SOAR capabilities, quick deployment and configuration, as well as advanced, built-in security features for SOC teams. However, the value of storing security data in Microsoft Sentinel may drop after a few months, once SOC users don’t need to access it as often as they access newer data.
If you only need to access specific tables occasionally, such as for periodic investigations or audits, you may consider that retaining your data in Microsoft Sentinel is no longer cost-effective. At this point, we recommend storing data in Azure Data Explorer, which costs less, but still enables you to explore using the same KQL queries that you run in Microsoft Sentinel.
You can access the data in Azure Data Explorer directly from Microsoft Sentinel using the Log Analytics Azure Data Explorer proxy feature. To do so, use cross cluster queries in your log search or workbooks.
Important
Core SIEM capabilities, including Analytic rules, UEBA, and the investigation graph, do not support data stored in Azure Data Explorer.
You may want to retain any data with security value in Microsoft Sentinel to use in detections, incident investigations, threat hunting, UEBA, and so on. Keeping this data in Microsoft Sentinel mainly benefits Security Operations Center (SOC) users, where typically, 3-12 months of storage are enough.
You can also configure all of your data, regardless of its security value, to be sent to Azure Data Explorer at the same time, where you can store it for longer. While sending data to both Microsoft Sentinel and Azure Data Explorer at the same time results in some duplication, the cost savings can be significant as you reduce the retention costs in Microsoft Sentinel.
Before going into details on how to do this, it is important to stress that this method can be considered complex, and it will require competency in Azure Data Factory, Azure Data Explorer, Azure LogAnalytics and Azure Storage.
This solution can be used to store your security logs: Syslog, Defender for Endpoint logs (Advanced Hunting), Azure Activity, etc.
I will cover setting up Syslog (CommonSecurityLog) in detail in this post, but you can easily use the solution for these tables as well:
- Azure
  - AADManagedIdentitySignInLogs
  - AADNonInteractiveUserSignInLogs
  - AADServicePrincipalSignInLogs
  - AuditLogs
  - SigninLogs
- Defender for Endpoint (Advanced Hunting) – requires the Sentinel M365D connector
  - DeviceEvents
  - DeviceFileEvents
  - DeviceLogonEvents
  - DeviceRegistryEvents
  - DeviceImageLoadEvents
  - DeviceNetworkInfo
  - DeviceProcessEvents
  - DeviceFileCertificateInfo
  - DeviceInfo
  - DeviceNetworkEvents
- Active Directory
  - SecurityEvent
Can I use the same Kusto queries for my security hunting?
Yes, you can use the same Kusto queries for searching in the Azure Data Explorer (ADX) cluster as you are using today. You can even search in both environments at the same time.
// This query will search in the ADX-table CommonSecurityLog (long-term log-backup)
//
union adx('https://decxxxplatformmgmtp.westeurope.kusto.windows.net/dedb-longterm-securitylogs').CommonSecurityLog
| where DestinationIP == "13.69.67.61"
| summarize total = count() by SourceIP
// This query will query both the LogAnalytics CommonSecurityLog (short-term) – and the ADX CommonSecurityLog (long-term)
//
union CommonSecurityLog, adx('https://decxxxplatformmgmtp.westeurope.kusto.windows.net/dedb-longterm-securitylogs').CommonSecurityLog
| where TimeGenerated between (datetime(2023-01-01) .. datetime(2023-01-09))
| where DestinationIP == "13.69.67.61"
| summarize total = count() by SourceIP
Solution overview (high-level)
Before starting, I would like to credit the Microsoft team behind this blog post, which I have used to get my own knowledge up to speed. I have used a few pictures, statements and links from the article. I would also like to thank the LogAnalytics product team and the Azure Data Explorer team for helping out with input.
The solution covers these steps:
1: Azure LogAnalytics tables are exported to an Azure Storage Account.
This can, for example, be:
- CommonSecurityLog
- ActivityLog
- DeviceEvents
- DeviceFileEvents
- DeviceLogonEvents
- DeviceRegistryEvents
- DeviceImageLoadEvents
- DeviceNetworkInfo
- DeviceProcessEvents
- DeviceFileCertificateInfo
- DeviceInfo
- DeviceNetworkEvents
2: Azure Data Factory will run a daily copy job from the Azure Storage Account to Azure Data Explorer when the data nears its retention limit in Microsoft Sentinel / LogAnalytics, avoiding duplication of data
- Last Modified Time Scope (1 day of data)
- Start – last modified 86 days ago
- End – last modified 85 days ago
3: When data has been transferred successfully, Azure Data Factory will delete the data in the blob storage that was just transferred into ADX.
How to implement the solution
The implementation covers 7 steps:
- Step 1: Azure Storage Account
- Step 2: Azure Data Explorer (ADX) cluster
- Step 3: Azure Data Factory (ADF)
- Step 4: Establishment of Schema/Tables on ADX cluster
- Step 5: Create Copy-job in Azure Data Factory (ADF)
- Step 6: Finetune Copy job (last modified, testing)
- Step 7: Production ready – enable Delete data
Step 1: Azure Storage Account
1.1 – Create Storage Account (blob)
Create an Azure Storage Account, which will be used for the export of LogAnalytics data
1.2 – Configure LogAnalytics Export of tables to Storage Account blob
For more information, see Log Analytics workspace data export in Azure Monitor.
Step 2: Azure Data Explorer (ADX) cluster
2.1 – Create Cluster
Create an Azure Data Explorer cluster and database
Select the correct compute SKU for your Azure Data Explorer cluster
Choose Review + Create to deploy the cluster
2.2 – Create database
When the cluster is running, we need to set up a database
Step 3: Azure Data Factory (ADF)
3.1 – Create Azure Data Factory environment (or re-use existing)
Choose Review + Create to start the ADF deployment
3.2 – Create Linked service for Azure Data Explorer (Kusto)
3.3 – Create Linked service for Azure Storage Account
Start by delegating permissions to the managed identity for the Data Factory.
As we will only use Azure Blob storage as a source (and not as a sink), we only need to grant the Storage Blob Data Reader role.
Now go into ADF and set up a new linked service for Blob storage. Remember to choose System Assigned Managed Identity.
Lastly, publish the 2 linked services.
Step 4: Establishment of Schema/Tables on ADX cluster
4.1 – Create table based on sample
Now we will create the destination table in ADX, based on the schema from the source (Blob storage).
There are multiple ways to accomplish this, but the fastest is:
Go into ADX and choose the database. Then go into Query, right-click and choose 'Create table'.
You can choose to ingest data or leave it with no data ingestion for the preliminary config. Just click 'ingest data' and choose what you want. Personally, I like to see data coming in.
If you deselect ingest data, remember to choose 'create mapping'.
Now choose Create table. It will create the table and the mapping, start to ingest data, and verify that the schema is correct.
When you have verified that the data looks correct, you can choose to delete the newly ingested data.
Alternatively, I have also added a script on my GitHub, which can be used to export the schema of the table.
On GitHub, I have also included scripts showing ways of setting up the tables, staging table, mapping, etc.
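If you prefer to script this step instead of using the wizard, the sketch below shows the general idea. It only uses an illustrative subset of the CommonSecurityLog columns and an assumed mapping name (CommonSecurityLog_mapping) – generate the full column list from your own source data, for example by running CommonSecurityLog | getschema in LogAnalytics, before creating the table. Run the commands one at a time in the ADX query window.
// Sketch: create the destination table with a subset of CommonSecurityLog columns (extend with your full schema)
.create table CommonSecurityLog (
    TimeGenerated: datetime,
    DeviceVendor: string,
    DeviceProduct: string,
    Activity: string,
    SourceIP: string,
    DestinationIP: string
)
// Sketch: create a JSON ingestion mapping matching the exported blob format (extend with your full schema)
.create table CommonSecurityLog ingestion json mapping "CommonSecurityLog_mapping"
'['
'  { "column": "TimeGenerated", "Properties": { "Path": "$.TimeGenerated" } },'
'  { "column": "DeviceVendor", "Properties": { "Path": "$.DeviceVendor" } },'
'  { "column": "DeviceProduct", "Properties": { "Path": "$.DeviceProduct" } },'
'  { "column": "Activity", "Properties": { "Path": "$.Activity" } },'
'  { "column": "SourceIP", "Properties": { "Path": "$.SourceIP" } },'
'  { "column": "DestinationIP", "Properties": { "Path": "$.DestinationIP" } }'
']'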
Step 5: Create Copy-job in Azure Data Factory (ADF)
Next, we need to configure the Copy-job in Azure Data Factory (ADF).
On the front page in ADF, choose New Ingest.
You can change the random names created by the wizard on the summary page shown above. Just click Edit.
Step 6: Finetune Copy job (last modified, testing)
6.1 – Suspend schedule (stop) + publish (as we will configure it first)
6.2 – Rename Source, Sink, Copy data activity
The wizard creates some odd names, which we will now rename
6.3 – Filtering by Last Modified
For testing we can just choose data from 60 min ago to 50 min ago (10 min). Choose what you want 🙂
@addminutes(utcnow(),-60)
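The matching end expression for the 10-minute test window described above would then be (same ADF expression syntax, just 50 minutes back instead of 60):
@addminutes(utcnow(),-50)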
Choose to Publish and Trigger now
6.4 – Monitoring
We will now verify data is coming into ADX using ADF Monitoring
6.5 – Verification using LogAnalytics query
We can now verify using standard Kusto queries
union adx('https://decxxxxplatformmgmtp.westeurope.kusto.windows.net/dedb-longterm-securitylogs').CommonSecurityLog
| where DestinationIP == "13.69.67.61"
| summarize total = count() by SourceIP
union CommonSecurityLog, adx('https://decxxxxxplatformmgmtp.westeurope.kusto.windows.net/dedb-longterm-securitylogs').CommonSecurityLog
| where TimeGenerated between (datetime(2023-01-01) .. datetime(2023-01-09))
| where DestinationIP == "13.69.67.61"
| summarize total = count() by SourceIP
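As an extra sanity check, you can also run a quick row-count directly against the table in the ADX database (a minimal sketch, assuming you are connected to the dedb-longterm-securitylogs database):
CommonSecurityLog
| summarize Rows = count(), Oldest = min(TimeGenerated), Newest = max(TimeGenerated)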
6.6 – Clear data testing
If we want to reset the test data in the ADX table, this can be done using this command (run as Query)
.clear table CommonSecurityLog data
Step 7: Production ready – enable Delete data
When we are done testing, we need to do 2 things:
- enable support for deleting data in the Azure Storage Account after a successful copy to ADX
- enable the schedule, so the job will run daily
We need to add an additional Delete activity.
Now we need to link the 2 activities together on the SUCCESSFUL status.
Click both boxes – and draw an arrow from Copy_Files to Delete, and a connector will be made.
Change Filter by last modified in both the copy and delete tasks to:
@adddays(utcnow(),-86)
@adddays(utcnow(),-85)
Change the schedule so the job is started. If needed, rename the scheduled job to make it clear which task it runs.
Lastly publish – and we are done 🙂
Design considerations
When storing your Microsoft Sentinel data in Azure Data Explorer, consider the following elements:
Consideration | Description |
---|---|
Cluster size and SKU | Plan carefully for the number of nodes and the VM SKU in your cluster. These factors will determine the amount of processing power and the size of your hot cache (SSD and memory). The bigger the cache, the more data you will be able to query at a higher performance. We encourage you to visit the Azure Data Explorer sizing calculator, where you can play with different configurations and see the resulting cost. Azure Data Explorer also has an autoscale capability that makes intelligent decisions to add/remove nodes as needed based on cluster load. For more information, see Manage cluster horizontal scaling (scale out) in Azure Data Explorer to accommodate changing demand. |
Hot/cold cache | Azure Data Explorer provides control over the data tables that are in the hot cache and therefore return results faster. If you have large amounts of data in your Azure Data Explorer cluster, you may want to break down tables by month, so that you have greater granularity on the data that’s present in your hot cache. For more information, see Cache policy (hot and cold cache) |
Retention | In Azure Data Explorer, you can configure when data is removed from a database or an individual table, which is also an important part of limiting storage costs. For more information, see Retention policy. |
Security | Several Azure Data Explorer settings can help you protect your data, such as identity management, encryption, and so on. Specifically for role-based access control (RBAC), Azure Data Explorer can be configured to restrict access to databases, tables, or even rows within a table. For more information, see Security in Azure Data Explorer and Row level security. |
Data sharing | Azure Data Explorer allows you to make pieces of data available to other parties, such as partners or vendors, and even buy data from other parties. For more information, see Use Azure Data Share to share data with Azure Data Explorer. |
Other cost components | Consider the other cost components for the following methods: Exporting data via an Azure Event Hub: – Log Analytics data export costs, charged per exported GB. – Event Hub costs, charged by throughput unit. Exporting data via Azure Storage and Azure Data Factory: – Log Analytics data export, charged per exported GB. – Azure Storage, charged by GB stored. – Azure Data Factory, charged per copy activity run. |
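To illustrate the retention and hot/cold cache considerations above, here is a minimal sketch of the corresponding ADX control commands, assuming the CommonSecurityLog table and example values of 2 years retention and 30 days hot cache – adjust the values to your own requirements:
// Sketch: keep data in the table for 2 years (730 days) before it is removed
.alter-merge table CommonSecurityLog policy retention softdelete = 730d
// Sketch: keep the most recent 30 days of data in the hot cache for faster queries
.alter table CommonSecurityLog policy caching hot = 30d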
Great guide!
Small note, when adding the “delete data” action, remember to give adequate permissions to the managed identity. It will need contributor on the storage account to delete data at all.
Thanks for the awesome writeup!
Revisiting this, did you make the repository private on github?
There is no repository for this blog post (and there never has been, as there are no scripts). Where do you see a link to that?
Hey Morten!
Love your articles, you’re so detailed.
Question: what if I want to bypass the ingestion costs in Sentinel and just store (and parse) my syslog data in ADX, and then use kql later on to query it from Sentinel if I need to?
Can I use AMA/DCR -> ADX? or how can I get syslog to ADX?
Thanks!