How to do data transformation with Azure LogAnalytics – to enrich information, optimize cost, remove sensitive data?

One of the cool features in Azure LogAnalytics is the capability to do data-transformation before the data enters your LogAnalytics workspaces.

If you want to see my latest blog-series of articles about ‘mastering Azure logging’, please check the overview here

If you want to see my latest tutorial on setting up data transformation, please check it out here

I use this feature to normalize and enrich data for Microsoft Sentinel environments and to optimize costs by removing "noise" in logs. I have also used it to temporarily stop millions of unnecessary log events caused by application errors or misconfigurations, giving me time to investigate and fix the root cause of the issue.

To summarize, this feature can help you with:

Remove sensitive data
You may have a data source that sends information you don't want stored for privacy or compliance reasons.

  • Filter sensitive information. Filter out entire rows or just particular columns that contain sensitive information.
  • Obfuscate sensitive information. For example, you might replace the digits in an IP address or telephone number with a common character.

Enrich data with additional or calculated information
Use a transformation to add information to data that provides business context or simplifies querying the data later.

  • Add a column with additional information. For example, you might add a column identifying whether an IP address in another column is internal or external.
  • Add business-specific information. For example, you might add a column indicating a company division based on location information in other columns.

Reduce data costs
Since you're charged ingestion cost for any data sent to a Log Analytics workspace, you want to filter out any data you don't require in order to reduce your costs.

  • Remove entire rows. For example, you might have a diagnostic setting to collect resource logs from a particular resource but not require all of the log entries it generates. Create a transformation that filters out records matching certain criteria.
  • Remove a column from each row. For example, your data may include columns that are redundant or have minimal value. Create a transformation that filters out columns that aren't required.
  • Parse important data from a column. You may have a table with valuable data buried in a particular column. Use a transformation to parse the valuable data into a new column and remove the original.

Examples of where data transformation is useful:

  • We want to remove specific security events from a server that make lots of "noise" in our logs due to a misconfiguration or error that is impossible to fix.
  • We want to remove security events that arrive in high volume but add little value, like Kerberos computer-logon traffic.
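To make the obfuscation and enrichment ideas concrete, here is a minimal transformKql sketch. The ClientIP column is a hypothetical example, and ingestion-time transformations only support a subset of KQL, so verify functions such as replace_regex and ipv4_is_match against Microsoft's supported-features list before relying on them:

source
| extend ClientIP_Masked = replace_regex(ClientIP, @"\.\d+$", ".xxx")                  // mask the last octet
| extend IPType = iff(ipv4_is_match(ClientIP, "10.0.0.0/8"), "internal", "external")   // label internal vs. external
| project-away ClientIP                                                                // drop the original sensitive column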
Disclaimer about security & data-transformation
From a security hunting perspective, it is important to note that you should be very careful with this: you might make security hunting harder, or even impossible, if vital data is removed or the table structure is altered.

Please do proper testing and be aware of the consequences.

Supported ways of doing data-transformation

Currently, data transformation is supported using 4 methods:

  1. XPath filtering using Azure Monitor Agent
  2. Azure Monitor Agent (AMA) using TransformKql in Data Collection Rule (DCR)
  3. Custom application sending data using logs ingestion API via DCR
  4. Workspace transformation DCR

This blog will deep-dive into method 2, ’Azure Monitor Agent (AMA) using TransformKql in Data Collection Rule (DCR)’. I will make another post on method 4 soon.

Before going into configuring data transformation, I want to touch on the "old" method using XPath (XML Path Language).

Transformation using XPath

Since AMA was launched in June 2021, it has supported transformation with XPath. This method works fine and has helped my customers save money.

Microsoft asked me to provide a statement about this in the official go-live announcement when Azure Monitor Agent went GA in June 2021 (https://techcommunity.microsoft.com/t5/azure-observability-blog/a-powerful-agent-for-azure-monitor-and-a-simpler-world-of-data/ba-p/2443285).

Here is an example of using XPath where I am collecting all security events, excluding computer-related security events like computer Kerberos logon traffic:

"Security!*[System[(band(Keywords,13510798882111488))]] and *[(EventData[Data[@Name='TargetUserName'] != 'EXCHANGE01$']) and (EventData[Data[@Name='TargetUserName'] != 'EXCHANGE02$']) and (EventData[Data[@Name='TargetUserName'] != 'EXCHANGE03$']) and (EventData[Data[@Name='TargetUserName'] != 'EXCHANGE04$']) and (EventData[Data[@Name='TargetUserName'] != 'EXCHANGE05$']) ]"

AMA still supports XPath, but I recommend using data transformation with DCRs instead, as it solves some of the limitations of the XPath implementation in AMA.

XPath is a standard, currently at version 3.1 (2017), but Microsoft chose to implement XPath v1 (from 1999) in AMA. V1 doesn't support filtering with like or contains, so it is hard to scale with XPath v1 if you need advanced features.

In the sample above, I had to explicitly define the server names where I wanted to remove computer-related security events. With transformKql, the same intent scales without listing every server, as the sketch below shows.
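A minimal sketch, assuming the standard SecurityEvent schema, where computer account names in TargetUserName end with $:

source
| where not(TargetUserName endswith "$")   // drop computer-account events regardless of server name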

How does transformation work using AMA & DCR?

The main difference between XPath and DCR-based data transformation is where it runs: XPath filtering is done on the endpoint, whereas a DCR transformation runs in the Azure backend in a dedicated pipeline per DCR, applied to each row of incoming data before it is sent into Log Analytics.

Before going into the details on how to create a data-transformation in data collection rules, we will recap the principles about Data Collection Rules (DCRs) and data transformations.

Any resource using AMA always collects the union of all its associated DCRs.

This way, multiple teams/roles can enable what they need without impacting other teams. If Microsoft switched this behavior, one team could accidentally disable another team's collection.

Therefore, a transformation will only impact the particular workspace where the transformation applies.

If you, by mistake, have 2 DCRs collecting SecurityEvent data and only one of them applies a transformation, you will end up with duplicates of all events (except the filtered-out ones).

As mentioned earlier, AMA will union all DCRs, so it is important to have a good overview of the DCRs currently associated with your endpoints: which data is being collected into which tables? A quick way to list them is shown below.
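A minimal PowerShell sketch using the Az.Monitor module (the VM resource ID is a placeholder, and parameter names may differ between Az.Monitor versions):

####################################################
# LIST DCR ASSOCIATIONS FOR A VM
####################################################

# placeholder resource ID - replace with your own VM or Arc-enabled server
$VmResourceId = "/subscriptions/xxxxxx/resourceGroups/rg-servers/providers/Microsoft.Compute/virtualMachines/server01"

Get-AzDataCollectionRuleAssociation -TargetResourceId $VmResourceId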

Implementation of Data Collection using Data Collection Rules (DCRs)

This blog will not go into detail on how to configure a data collection rule, but for context I will touch on it at a high level.

Normally, I configure a set of default DCRs that are applied to all servers, both native Azure VMs and Azure Arc-enabled servers – for example:

  • Collection of VMInsights (using Dependency Agent with AMA-support)
  • Collection of System and Application event log
  • Collection of Security event log

Targeting your DCRs

After defining the DCRs, you need to associate the DCRs to your resources. This can be done in multiple ways:

  • Azure Policy and tags, for example keying off a specific tag.
  • Scripts, for example the PowerShell cmdlet New-AzDataCollectionRuleAssociation (see the sketch below).
  • Terraform

Take your pick depending on how you want to do this 😊
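As an example of the script route, here is a minimal sketch (both resource IDs are placeholders, and parameter names may differ between Az.Monitor versions):

####################################################
# ASSOCIATE A DCR WITH A VM
####################################################

$VmResourceId  = "/subscriptions/xxxxxx/resourceGroups/rg-servers/providers/Microsoft.Compute/virtualMachines/server01"
$DcrResourceId = "/subscriptions/xxxxxx/resourceGroups/rg-logworkspaces/providers/microsoft.insights/dataCollectionRules/dcr-ingest-exclude-security-eventid"

New-AzDataCollectionRuleAssociation -TargetResourceId $VmResourceId -AssociationName "dcr-association" -RuleId $DcrResourceId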

How to transform your data using TransformKql in a DCR using AMA

The process to create a separate DCR with data transformation is done in 7 steps:

  1. Kusto command must be defined
  2. Deploy a new DCR using Azure Monitor – or via Sentinel AMA-connector if transformation is for SecurityEvents
  3. Adding the TransformKql to the DCR
  4. Updating/uploading the modified DCR
  5. Verification of changes in DCR
  6. Assign the DCR to the machine(s) where the transformation must happen
  7. Wait for the DCR and transformation pipeline to kick in

Step 1: Kusto command must be defined

Typically, I start by defining the Kusto query that shows all data except the data we want to filter away.

Below are 4 samples:

SecurityEvent | where EventID != 5145
Shows all Security events except EventID 5145.

SecurityEvent | where (EventID != 8002) and (EventID != 5058) and (EventID != 4662)
Shows all Security events except EventIDs 8002, 5058 and 4662.

Event | where not(EventLog == "Application" and EventID == 10016)
Shows all System and Application events, except Application events with EventID 10016. (Note the not(): writing EventID != 10016 and EventLog == "Application" would drop all System events.)

CommonSecurityLog | where (DeviceVendor !contains "sonicwall") or ((DeviceVendor contains "sonicwall") and (Activity contains "connection opened" or Activity contains "connection closed") and (Protocol != "udp/dns"))
Shows all CEF/Syslog events where DeviceVendor differs from SonicWall (for example Cisco), plus SonicWall "connection opened"/"connection closed" events, except when the protocol is udp/dns.

Note:

When the Kusto query works as expected, copy it into Notepad and change the table name to source. Below, I replaced SecurityEvent with source:

source | where EventID != 5145

Lastly, we need to convert the Kusto query into a single line with escape characters:

  source\n| where EventID != 5145  

Tip

When working with advanced queries, I paste my Kusto query into an online converter, which adds all the escape characters for me.

Personally, I use the online website https://jsonformatter.org/json-escape

Transformation query (Kusto):

source | where (Account != "WINSFTP\\autotest") and (EventID != 4688 and EventID != 8002 and EventID != 4625) and (Account != "WORKGROUP\\WINSFTP$")

Transformation query for transformKql (escaped, one line):

source\r\n| where (Account != \"WINSFTP\\\\autotest\") and (EventID != 4688 and EventID != 8002 and EventID != 4625) and (Account != \"WORKGROUP\\\\WINSFTP$\")
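If you prefer to stay offline, PowerShell can produce the escaped one-line form as well. A minimal sketch:

# paste your working Kusto query into the here-string
$Kql = @'
source
| where EventID != 5145
'@

# ConvertTo-Json escapes line breaks and quotes, outputting one JSON string ready to paste as the transformKql value
$Kql | ConvertTo-Json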

Step 2: Deploy a new DCR

Configure a standard DCR and choose to collect what you want, such as all Security events or all System and Application events.

Make a note of the ResourceId of the DCR. You will use it in step 3.

Step 3: Adding the TransformKql to the DCR

I use a PowerShell script to retrieve the DCR into a flat JSON file, edit the file in Notepad++ to add the needed transformation, and then upload the modified DCR again. Below, I use a sample folder C:\TMP where the file will be stored.

The script can be found here on my GitHub.

Start by setting the variables $ResourceId and $FilePath.

$ResourceId was noted in step 2.

$FilePath can be any path; it is only a temporary file used for this change. Example:

####################################################
# VARIABLES
####################################################

# here you put the ResourceID of the Data Collection Rules (a sample is provided below)
$ResourceId = "/subscriptions/xxxxxx/resourceGroups/rg-logworkspaces/providers/microsoft.insights/dataCollectionRules/dcr-ingest-exclude-security-eventid"
    
# here you put a path and file name where you want to store the temporary file-extract from DCR (a sample is provided below)
$FilePath   = "c:\tmp\dcr-ingest-exclude-security-eventid.txt"

Connect to Azure

####################################################
# CONNECT TO AZURE
####################################################

Connect-AzAccount

Run the DCR export:

$DCR = Invoke-AzRestMethod -Path ("$ResourceId"+"?api-version=2021-09-01-preview") -Method GET

$DCR.Content | ConvertFrom-Json | ConvertTo-Json -Depth 20 | Out-File -FilePath $FilePath

Now you have a JSON file in the c:\tmp folder.

Modify file and add TransformKql

Open the file using Notepad++ and add the transformKql line that you created in step 1:

"transformKql":  "source\n| where (EventID != 5145)"

NOTE: Remember to add a comma at the end of the preceding line so you don't break the JSON structure.
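For orientation, transformKql belongs inside the dataFlows section of the DCR. An abbreviated example of how that part of the JSON could look (the destination name is a placeholder, and your file will contain more properties):

"dataFlows": [
    {
        "streams": [ "Microsoft-SecurityEvent" ],
        "destinations": [ "la-workspace-destination" ],
        "transformKql": "source\n| where (EventID != 5145)"
    }
]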

Step 4: Update the DCR (Uploading the changed DCR from the file)

Now we want to run the last part of the PowerShell script, which updates the DCR by taking the entire content of the local file and making a REST API call. TransformKql requires a specific API version (2021-09-01-preview), so we need to use that particular API version in the REST call.

####################################################
# UPLOAD FILE / UPDATE DCR WITH TRANSFORM
####################################################

$DCRContent = Get-Content $FilePath -Raw 

Invoke-AzRestMethod -Path ("$ResourceId"+"?api-version=2021-09-01-preview") -Method PUT -Payload $DCRContent

You should get a StatusCode 200 from the PUT command, indicating everything was updated correctly.

If there is an error in the file structure, you will get a 400 error.
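A small addition to the script makes the result easy to inspect; the response content usually explains what is wrong. A minimal sketch:

$Result = Invoke-AzRestMethod -Path ("$ResourceId"+"?api-version=2021-09-01-preview") -Method PUT -Payload $DCRContent

# anything other than 200 means the update failed - print the response body for details
if ($Result.StatusCode -ne 200) { $Result.Content }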

Step 5: Verification of changes in DCR

If you check the changes in the GUI, remember to choose the correct API version in the drop-down menu; otherwise you won't be able to see the changes. You will now see the transformKql property.

You can also verify the changes by running the first lines of the script again, extracting the DCR into the local file using the GET command:

####################################################
# EXPORT EXISTING DCR TO FILE
####################################################

$DCR = Invoke-AzRestMethod -Path ("$ResourceId"+"?api-version=2021-09-01-preview") -Method GET

$DCR.Content | ConvertFrom-Json | ConvertTo-Json -Depth 20 | Out-File -FilePath $FilePath

Step 6: Associate the DCR rule to the machine(s)

Now you have to associate the DCR with the machine(s) where you want the transformation to happen, for example using New-AzDataCollectionRuleAssociation as shown earlier.

Step 7: Wait for the DCR and transformation pipeline to kick in

Now we have to wait for the transformation pipeline to kick in. Normally it will start within 5-10 minutes, but I have seen changes take up to 30 minutes. A quick way to verify the effect is shown below.
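A minimal KQL sketch, assuming the EventID 5145 filter from step 1: the filtered EventID should stop appearing for new TimeGenerated values.

SecurityEvent
| where TimeGenerated > ago(1h)
| summarize count() by EventID
| order by count_ desc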

Troubleshooting

If you want to verify that the local Azure Monitor Agent has received the DCR, you can do so using the local AMA files. AMA checks for new DCR changes every 5 minutes.

Verification using local AMA config files

Open your DCR in JSON view and note the ImmutableId of your DCR.

Open the file C:\WindowsAzure\Resources\AMADataStore.DC1\mcs\mcsconfig.latest.xml (or mcsconfig.latest.json).

Search for the ImmutableId. You should now see that the agent is using the new DCR. A quick way to search is sketched below.
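A minimal PowerShell sketch (the AMADataStore folder name varies per machine, and the ImmutableId shown is a placeholder):

$ImmutableId = "dcr-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"   # replace with the ImmutableId from the JSON view

Select-String -Path "C:\WindowsAzure\Resources\AMADataStore.*\mcs\mcsconfig.latest.xml" -Pattern $ImmutableId -SimpleMatch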

If AMA is not receiving the DCR, verify that you have associated the DCR with the server. Otherwise, continue to the AMA troubleshooting below.

Azure Monitor Agent troubleshooting

Azure Monitor Agent ships with a troubleshooting PowerShell script, which might be able to help you.

Run the following PowerShell script:

C:\Packages\Plugins\Microsoft.Azure.Monitor.AzureMonitorWindowsAgent\<latest version on your computer>\CollectAMALogs.ps1

It will generate a zip-file on your desktop with all the log-files.

Open the file maeventtable.csv in Excel and filter:

  • Stream = MonAgentManager.exe
  • File = refreshconfigurations.cpp

You should now be able to see the DCR changes.

More resources about data transformation

More samples on transformation of custom logs
