How to save $$$ by storing your Syslog and Defender for Endpoint long-term logs in an Azure Data Explorer cluster using Azure Data Factory and Azure Storage Account export – while keeping Kusto query functionality?

This blog is about keeping long-term Sentinel logs, giving you insight into the options available today – with great opportunities to save money 🙂

To be honest, I can get confused by the limitations and feature sets, as many great changes are arriving rapidly. Therefore I have prepared a detailed overview with 2 tables covering Decision #1: Export vs. Keep and Decision #2: Choosing our long-term log-storage solution.

I covered Azure LogAnalytics basic logs and archive logs in a previous blog post – so I have decided to cover the Azure Data Explorer solution in this blog.

In many cases, this solution can be a great option that saves you a lot of money. On the flip side, it is also more complex. Hopefully this guide can get you up and running so you can try it out. I will cover adding Syslog data, but the solution can also be used for e.g. Defender for Endpoint data (and more).

Azure Data Explorer is an alternative to consumption-based log retention: you pay a fixed price for the long-term log-storage infrastructure, based on an Azure Data Explorer cluster, and then use Azure Data Factory to transform and move your data.

If you want a tutorial on getting a complete environment up and running using Azure Data Explorer, Azure Data Factory and export to an Azure Storage Account, then keep reading – I will take you through the steps.

Decision #1 – Export vs. Keep

When deciding on a solution, we first have to decide whether to export or keep the data. Below are the options:

| Option | Export | Export | Export | Export | Keep | Keep |
|---|---|---|---|---|---|---|
| Method | Continuous export | Continuous export | Ad-hoc export (PowerShell) | Scheduled export via query (LogicApp, Function, Data Factory) | LogAnalytics Retention | LogAnalytics Archive |
| Target | EventHub | Storage Account | | | | |
| Format | JSON | JSON | Any format | Any format | | |
| Limitations | Link | Link | | | | |

Decision #2 – Choosing our long-term log-storage solution

The big question is which solution to choose for our long-term logs, and here the complexity starts, as we have to take many factors into consideration: in-house platform knowledge, limitations, time to enable, management, performance, monthly cost, scalability, security, etc.

Cost-factors

As mentioned above, many factors must be taken into account when you look at costs. Cost differences cannot be determined without asking some important questions. Here are 4 examples of these questions:

  • For how long do you require interactive hunting capabilities – or can a search job be sufficient?
    • If it’s just limited hunting and audit/compliance, you can get far with LogAnalytics Archive.
    • If you need to query the data very often, go with Azure Data Explorer/Azure Data Factory.
    • In the price examples below, I compare a few scenarios, also including LogAnalytics Archive.
  • For how long do you keep your retention in Sentinel (short-term)?
    • Compare keeping only 90 days in Sentinel, with long-term starting from day 91, versus keeping short-term in Sentinel for 1 year.
    • If you keep short-term for only 90 days and then move everything to long-term storage with a longer retention, you can gain pretty high savings by moving to ADX/ADF.
    • If you keep short-term for 12 months in Sentinel, then the long-term period might only be 12 months, which makes the cost difference smaller.
  • For how long do you need to keep the long-term data?
    • The longer the retention, the more ADX/ADF outperforms LogAnalytics on cost (infrastructure payment vs. consumption payment).
  • What is your duplication-management design? This is one of the main differences between the EventHub and ADF methods.
    • Azure Data Explorer combined with the EventHub method stores some data (the first X months) in both Microsoft Sentinel and Azure Data Explorer, resulting in data duplication.
    • Azure Data Explorer combined with Azure Data Factory enables you to copy data only when it nears its retention limit in Microsoft Sentinel / Log Analytics, avoiding duplication.

I hope the table below will help you 🙂

| Choice | LogAnalytics Retention – Analytics logs (lifecycle stage 1) | LogAnalytics Retention – Basic logs (lifecycle stage 1) | LogAnalytics Archive (lifecycle stage 2) | Azure Data Explorer | Storage Account |
|---|---|---|---|---|---|
| Storage placement | LogAnalytics | LogAnalytics | LogAnalytics | Azure Data Explorer | Storage Account |
| Storage limitation | Unlimited | Unlimited | Unlimited | Unlimited | 500 TB per storage account |
| Kusto query frontend | LogAnalytics – data available for interactive queries | LogAnalytics – data available for interactive queries | LogAnalytics – data available using search job or restore | Azure Data Explorer – data available for interactive queries | Azure Data Explorer – data available for interactive queries |
| Data move methods | No need | No need | No need | 1: Continuous export to Storage Account; 2: Data Factory daily move | Continuous export to Storage Account |
| Kusto | Full support, full schema | Limited support, full schema | Search jobs, data restore | Full cmdlets support, full schema | Full cmdlets support, full schema |
| Search performance with date filter (54,056 rows) | Fast: 5.2 sec | Not measured | Very slow: >2–15 min | Very fast: 0.175 sec | Slow: 150 sec (16 GB, 392 blobs) |
| Search performance, full table query (54,056 rows) | 5.8–7.5 sec (returned 30,000 rows) | Not measured | Very slow: >2–15 min | 3.7 sec (returned 54,056 rows) | Failed (hit 64 MB limit) |
| Performance scaling | ADX cluster | N/A | N/A | Autoscale compute | N/A |
| Billing method | Consumption per GB | Consumption per GB | Consumption per GB | Infrastructure (fixed) + some consumption | Consumption per GB |
| Data protection | RBAC – restrict access to databases, tables or even rows within a table | RBAC – restrict access to databases, tables or even rows within a table | RBAC – restrict access to databases, tables or even rows within a table | RBAC – restrict access to databases, tables or even rows within a table | Immutable storage account |
| Retention length (limit, max days) | 730 (2 years) | 8 days | – | 36,500 (100 years) | 146,000 (400 years) |
| Archive (limit, max days) | 2,556 (7 years) | 2,556 (7 years) | 2,556 (7 years) | No need, runs as retention | No need, runs as retention |
| Retention targeting | Workspace, individual tables | Workspace, individual tables | Workspace, individual tables | Database, individual tables | Separate storage accounts per table (max 10 export rules) |
| Setup complexity | Easy | Easy | Ad-hoc | Complex (requires knowledge of Data Explorer & Data Factory) | Moderate |
| Monthly cost (USD), Example #1 – 250 GB daily, 2-year retention | 11,270–20,759. Scenario #1: data retention for 9 months (incl. 3 free): 8,897 + archive for 12 months: 2,373; total 11,270. Scenario #2: data retention for 24 months (incl. 3 free); total 20,759 | 4,745 (data retention for 8 days + archive for 24 months) | Prices are covered in the examples under retention, as it is one solution | 3,052 | 2,372–4,050. ADX: 510 + storage: average 1,862 over the first 24 months (avg months 0–12: 987; avg months 13–24: 2,738); 3,540 at 24 months full capacity (186,000 GB) |
| Monthly cost (USD), Example #2 – 50 GB daily, 2-year retention | 2,254–4,152. Scenario #1: data retention for 9 months (incl. 3 free): 1,779 + archive for 12 months: 475; total 2,254. Scenario #2: data retention for 24 months (incl. 3 free); total 4,152 | 949 (data retention for 8 days + archive for 24 months) | Prices are covered in the examples under retention, as it is one solution | 2,329 | 890–1,239. ADX: 510 + storage: average 380 over the first 24 months (avg months 0–12: 198; avg months 13–24: 562); 729 at 24 months full capacity (37,200 GB) |
| More info | | | Search jobs | Link | |

Let’s get started

By default, logs ingested into Microsoft Sentinel are stored in Azure Monitor Log Analytics.

Storing logs in Azure Data Explorer reduces costs while retaining your ability to query your data, and is especially useful as your data grows. For example, while security data may lose value over time, you may be required to retain logs for regulatory requirements or to run periodic investigations on older data.

Azure Data Explorer is a big data analytics platform that is highly optimized for log and data analytics. Since Azure Data Explorer uses Kusto Query Language (KQL) as its query language, it’s a good alternative for Microsoft Sentinel data storage. Using Azure Data Explorer for your data storage enables you to run cross-platform queries and visualize data across both Azure Data Explorer and Microsoft Sentinel.

Microsoft Sentinel provides full SIEM and SOAR capabilities, quick deployment and configuration, as well as advanced, built-in security features for SOC teams. However, the value of storing security data in Microsoft Sentinel may drop after a few months, once SOC users don’t need to access it as often as they access newer data.

If you only need to access specific tables occasionally, such as for periodic investigations or audits, you may consider that retaining your data in Microsoft Sentinel is no longer cost-effective. At this point, we recommend storing data in Azure Data Explorer, which costs less, but still enables you to explore using the same KQL queries that you run in Microsoft Sentinel.

You can access the data in Azure Data Explorer directly from Microsoft Sentinel using the Log Analytics Azure Data Explorer proxy feature. To do so, use cross cluster queries in your log search or workbooks.

Important: Core SIEM capabilities, including analytics rules, UEBA, and the investigation graph, do not support data stored in Azure Data Explorer.

You may want to retain any data with security value in Microsoft Sentinel to use in detections, incident investigations, threat hunting, UEBA, and so on. Keeping this data in Microsoft Sentinel mainly benefits Security Operations Center (SOC) users, where typically, 3-12 months of storage are enough.

You can also configure all of your data, regardless of its security value, to be sent to Azure Data Explorer at the same time, where you can store it for longer. While sending data to both Microsoft Sentinel and Azure Data Explorer at the same time results in some duplication, the cost savings can be significant as you reduce the retention costs in Microsoft Sentinel.

Before going into details on how to do this, it is important to stress that this method can be considered complex, and it requires competency in Azure Data Factory, Azure Data Explorer, Azure LogAnalytics and Azure Storage.

This solution can be used to store your security logs: Syslog, Defender for Endpoint logs (Advanced Hunting), Azure Activity, etc.

I will cover setting up Syslog (CommonSecurityLog) in detail in this post, but you can easily use it for these tables as well:

  • Azure
    • AADManagedIdentitySignInLogs
    • AADNonInteractiveUserSignInLogs
    • AADServicePrincipalSignInLogs
    • AuditLogs
    • SigninLogs
  • Defender for Endpoint (Advanced Hunting) – requires Sentinel M365D connector
    • DeviceEvents
    • DeviceFileEvents
    • DeviceLogonEvents
    • DeviceRegistryEvents
    • DeviceImageLoadEvents
    • DeviceNetworkInfo
    • DeviceProcessEvents
    • DeviceFileCertificateInfo
    • DeviceInfo
    • DeviceNetworkEvents
  • Active Directory
    • SecurityEvent

Can I use the same Kusto queries for my security hunting?

Yes, you can use the same Kusto queries against the Azure Data Explorer (ADX) cluster as you are using today. You can even search both environments at the same time.

// This query will search in the ADX table CommonSecurityLog (long-term log backup)
//
union adx('https://decxxxplatformmgmtp.westeurope.kusto.windows.net/dedb-longterm-securitylogs').CommonSecurityLog
| where DestinationIP == "13.69.67.61"
| summarize total = count() by SourceIP

// This query will query both the LogAnalytics CommonSecurityLog (short-term) – and the ADX CommonSecurityLog (long-term)
//
union CommonSecurityLog, adx('https://decxxxplatformmgmtp.westeurope.kusto.windows.net/dedb-longterm-securitylogs').CommonSecurityLog
| where TimeGenerated between (datetime(2023-01-01) .. datetime(2023-01-09))
| where DestinationIP == "13.69.67.61"
| summarize total = count() by SourceIP

Solution overview (high-level)

Before starting, I would like to credit the Microsoft team behind this blog, which I have used to get my own knowledge up to speed. I have used a few pictures, statements and links from that article. I would also like to thank the LogAnalytics product team and the Azure Data Explorer team for helping out with input.

The solution covers these steps:

    1: Azure LogAnalytics tables are exported to an Azure Storage Account.

    These can, for example, be:

    • CommonSecurityLog
    • ActivityLog
    • DeviceEvents
    • DeviceFileEvents
    • DeviceLogonEvents
    • DeviceRegistryEvents
    • DeviceImageLoadEvents
    • DeviceNetworkInfo
    • DeviceProcessEvents
    • DeviceFileCertificateInfo
    • DeviceInfo
    • DeviceNetworkEvents

    2: Azure Data Factory runs a daily copy job from the Azure Storage Account to Azure Data Explorer when the data nears its retention limit in Microsoft Sentinel / LogAnalytics, avoiding duplication of data.

    • Last Modified Time Scope (1 day of data)
      • Start – last modified 86 days ago
      • End – last modified 85 days ago

    3: When the data has been transferred successfully, Azure Data Factory deletes the data in the blob storage that was just transferred into ADX.

How to implement the solution

The implementation covers 7 steps:

  • Step 1: Azure Storage Account
  • Step 2: Azure Data Explorer (ADX) cluster
  • Step 3: Azure Data Factory (ADF)
  • Step 4: Establishment of schema/tables on the ADX cluster
  • Step 5: Create the copy job in Azure Data Factory (ADF)
  • Step 6: Fine-tune the copy job (last modified, testing)
  • Step 7: Production ready – enable Delete data

Step 1: Azure Storage Account

1.1 – Create Storage Account (blob)

Create an Azure Storage Account, which will be used for the export from LogAnalytics.

1.2 – Configure LogAnalytics export of tables to the Storage Account (blob)

For more information, see Log Analytics workspace data export in Azure Monitor.

Step 2: Azure Data Explorer (ADX) cluster

2.1 – Create cluster

Create an Azure Data Explorer cluster and database.

Select the correct compute SKU for your Azure Data Explorer cluster.

Enable Purge must be enabled.
It is important to enable the system-assigned identity.
Choose the network configuration depending on your network design.

Choose Review + Create to deploy the cluster.

2.2 – Create database

When the cluster is running, we need to set up a database.

Next, we will continue with setting up Azure Data Factory (ADF).

Step 3: Azure Data Factory (ADF)

3.1 – Create the Azure Data Factory environment (or re-use an existing one)

Choose Review + Create to start the ADF deployment.

3.2 – Create a linked service for Azure Data Explorer (Kusto)

3.3 – Create a linked service for the Azure Storage Account

Start by delegating the managed identity for the Data Factory.

As we will only use Azure Blob storage as the source (and not as a sink), we only need to grant Storage Blob Data Reader.

Now go into ADF and set up a new linked service for Blob storage. Remember to choose System Assigned Managed Identity.

Lastly, publish the 2 linked services.

Step 4: Establishment of schema/tables on the ADX cluster

4.1 – Create table based on a sample

Now we will create the destination table in ADX, based on the schema from the source (Blob storage).

There are multiple ways to accomplish this, but the fastest is:

Go into ADX and choose the database. Then go into Query, right-click and choose ‘Create table’.

You can choose to ingest data or leave it with no data ingestion for the preliminary configuration. Just click ‘Ingest data’ and choose what you want. Personally, I like to see data coming in.

If you deselect ‘Ingest data’, remember to choose ‘Create mapping’.

Now choose Create table. It will create the table and the ingestion mapping, start ingesting data, and verify that the schema is correct.

When you have verified that the data looks correct, you can choose to delete the new data.

Alternatively, I have also added a script on my Github, which can be used to export the schema of the table.

On Github, I have also included scripts for setting up the tables, staging table, mapping, etc.
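As an alternative to the portal wizard, the table and its JSON ingestion mapping can also be created with KQL control commands, run one at a time against the database. The sketch below is only an illustration: the column list is heavily shortened (CommonSecurityLog has far more columns) and the mapping name is just an example – use the schema exported from your own workspace.

// Create the destination table with a reduced, illustrative set of columns
.create table CommonSecurityLog (
    TimeGenerated: datetime,
    DeviceVendor: string,
    DeviceProduct: string,
    SourceIP: string,
    DestinationIP: string,
    Activity: string
)

// Create a JSON ingestion mapping matching the column names in the exported blobs
.create table CommonSecurityLog ingestion json mapping "CommonSecurityLog-mapping"
'[{"column":"TimeGenerated","path":"$.TimeGenerated","datatype":"datetime"},{"column":"DeviceVendor","path":"$.DeviceVendor","datatype":"string"},{"column":"DeviceProduct","path":"$.DeviceProduct","datatype":"string"},{"column":"SourceIP","path":"$.SourceIP","datatype":"string"},{"column":"DestinationIP","path":"$.DestinationIP","datatype":"string"},{"column":"Activity","path":"$.Activity","datatype":"string"}]'

// Inspect the resulting table schema
.show table CommonSecurityLog cslschema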

Step 5: Create the copy job in Azure Data Factory (ADF)

Next, we need to configure the copy job in Azure Data Factory (ADF).

On the front page in ADF, choose New Ingest.

In case of a schema mismatch, you will see this error. Typically I delete this particular value (Internal_workspaceResourceId).

You can change the random names created by the wizard on the summary page shown above. Just click Edit.

Step 6: Fine-tune the copy job (last modified, testing)

6.1 – Suspend the schedule (stop) + publish (as we will configure the job first)

6.2 – Rename the source, sink and copy data activity

The wizard creates some odd names, which we will now rename.

6.3 – Filtering by last modified

For testing, we can just select data from 60 minutes ago to 50 minutes ago (a 10-minute window). Choose what you want 🙂 Set the start to

@addminutes(utcnow(),-60)

and the end to

@addminutes(utcnow(),-50)

Choose Publish and Trigger now.

6.4 – Monitoring

We will now verify that data is coming into ADX using ADF monitoring.

6.5 – Verification using a LogAnalytics query

We can now verify using standard Kusto queries.

union adx('https://decxxxxplatformmgmtp.westeurope.kusto.windows.net/dedb-longterm-securitylogs').CommonSecurityLog
| where DestinationIP == "13.69.67.61"
| summarize total = count() by SourceIP

union CommonSecurityLog, adx('https://decxxxxxplatformmgmtp.westeurope.kusto.windows.net/dedb-longterm-securitylogs').CommonSecurityLog
| where TimeGenerated between (datetime(2023-01-01) .. datetime(2023-01-09))
| where DestinationIP == "13.69.67.61"
| summarize total = count() by SourceIP

6.6 – Clear test data

If we want to reset the test data in the ADX table, this can be done using the following command (run it as a query):

.clear table CommonSecurityLog data
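If you want to check what is currently in the table before or after clearing it, a small sketch like this, run against the ADX database, shows the row count per day:

// Count ingested rows per day to see which date ranges have been copied into ADX
CommonSecurityLog
| summarize rows = count() by bin(TimeGenerated, 1d)
| order by TimeGenerated asc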

Step 7: Production ready – enable Delete data

When we are done testing, we need to do 2 things:

1. Enable support for deleting data in the Azure Storage Account after a successful copy to ADX.
2. Enable the schedule, so the job will run daily.

We need to add an additional Delete activity. Note that the Data Factory managed identity needs adequate permissions on the storage account (contributor access) to be able to delete the blobs.

Now we need to link the 2 activities together with a SUCCESSFUL status.

Click both boxes – and draw an arrow from Copy_Files to Delete, and a connector will be made.

Change ‘Filter by last modified’ in both the copy and the delete task to:

Start: @adddays(utcnow(),-86)

End: @adddays(utcnow(),-85)

Change the schedule so the job is started. If needed, rename the schedule job to make it clear what task it runs.

Lastly, publish – and we are done 🙂
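Once the daily job has been running for a while, a quick sanity check of the long-term table is to look at the time span and row count it now covers. This is just a sketch, using the same placeholder cluster URL as the earlier queries:

// Show the oldest and newest records plus the total row count in the long-term ADX table
adx('https://decxxxxplatformmgmtp.westeurope.kusto.windows.net/dedb-longterm-securitylogs').CommonSecurityLog
| summarize oldest = min(TimeGenerated), newest = max(TimeGenerated), rows = count()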

Design considerations

When storing your Microsoft Sentinel data in Azure Data Explorer, consider the following elements:

Cluster size and SKU: Plan carefully for the number of nodes and the VM SKU in your cluster. These factors will determine the amount of processing power and the size of your hot cache (SSD and memory). The bigger the cache, the more data you will be able to query at a higher performance.

We encourage you to visit the Azure Data Explorer sizing calculator, where you can play with different configurations and see the resulting cost.

Azure Data Explorer also has an autoscale capability that makes intelligent decisions to add/remove nodes as needed based on cluster load. For more information, see Manage cluster horizontal scaling (scale out) in Azure Data Explorer to accommodate changing demand.
Hot/cold cache: Azure Data Explorer provides control over which data tables are in the hot cache and therefore return results faster. If you have large amounts of data in your Azure Data Explorer cluster, you may want to break down tables by month, so that you have greater granularity on the data that is present in your hot cache.

For more information, see Cache policy (hot and cold cache).
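As an illustration, the hot-cache window can be set per table with a caching policy. A minimal sketch, assuming the CommonSecurityLog table from this guide and a 31-day hot period (pick a value matching how far back you typically query):

// Keep the most recent 31 days of CommonSecurityLog in hot cache (SSD/memory)
.alter table CommonSecurityLog policy caching hot = 31d

// Verify the effective caching policy
.show table CommonSecurityLog policy caching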
Retention: In Azure Data Explorer, you can configure when data is removed from a database or an individual table, which is also an important part of limiting storage costs.

For more information, see Retention policy.
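For example, a table-level retention policy could look like the sketch below. The 730-day value is only an assumption matching a 2-year long-term requirement – adjust it to your own compliance needs:

// Keep data in CommonSecurityLog for 2 years, then soft-delete it
.alter-merge table CommonSecurityLog policy retention softdelete = 730d recoverability = disabled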
Security: Several Azure Data Explorer settings can help you protect your data, such as identity management, encryption, and so on. Specifically for role-based access control (RBAC), Azure Data Explorer can be configured to restrict access to databases, tables, or even rows within a table. For more information, see Security in Azure Data Explorer and Row level security.
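Row level security is driven by a function that filters the table. The sketch below is a hypothetical example (the function name and filter are made up) that would only expose rows from one device vendor to users querying the table:

// Hypothetical filtering function used for row level security
.create-or-alter function CommonSecurityLog_RLS() {
    CommonSecurityLog
    | where DeviceVendor == "Palo Alto Networks"
}

// Enable row level security on the table using the function above
.alter table CommonSecurityLog policy row_level_security enable "CommonSecurityLog_RLS"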
Data sharing: Azure Data Explorer allows you to make pieces of data available to other parties, such as partners or vendors, and even buy data from other parties. For more information, see Use Azure Data Share to share data with Azure Data Explorer.
Other cost components: Consider the additional cost components for the following methods:

Exporting data via an Azure Event Hub:
– Log Analytics data export costs, charged per exported GB.
– Event Hub costs, charged by throughput unit.

Exporting data via Azure Storage and Azure Data Factory:
– Log Analytics data export costs, charged per exported GB.
– Azure Storage, charged by GB stored.
– Azure Data Factory, charged per copy activity run.
