Updating thousands of Azure Tags


How come this is not built into Azure?

Tagging resources in Azure is a powerful feature: you assign a tag name and a value to almost any resource. Tags can be used for cost management, for example by tagging each resource that belongs to a business unit, or you can tag all your SQL servers so that the tag serves as a target for a policy.

Some quick examples are:

business_unit: Finance
server_type: sql_cluster

These two tags, added to multiple different resources, would let you sort and determine costs for all the resources belonging to the finance department, and target the VMs or services with policies, security scans and patching.

It is very easy to get started with, but things can quickly get out of hand, and instead of a few neatly organized tags you end up with something like:

business_unit: Finance
businessUnit: finance
server_type: sql_cluster
orderedBy: To Be Defined
CreatedON: NA
ChangeNumber: change1234

It's a mess: duplicate tags, tags that carry no information, and a lack of standardisation. In a big Azure tenant where multiple teams, consultants and years of work have accumulated, it is very easy to end up here if no one has taken strict responsibility for tagging.

One might think that Azure would have some sort of tag manager where you could easily change or delete tags on multiple resources at the same time, but no. You are required to either manually go through each resource in the web interface or write some code to do it for you. This is mostly fine for small tasks; you can write a few lines that change one tag on all resources in your current subscription. But sometimes you might need to clean up tens of thousands of tags. One cleanup I needed to do involved changing the structure of tags so that “tagName” became “tag_name”, removing tags with no value or with placeholder values, and fixing a lot of mistyped tags such as “chage”, which should have been “change”.
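As a sketch of the kind of normalization involved (the helper names and the placeholder list below are mine, not from the real script):

```shell
# Sketch only: illustrative helpers, not the actual cleanup script.

# Convert camelCase/PascalCase tag names to snake_case, e.g. tagName -> tag_name
to_snake_case() {
    echo "$1" | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' | tr '[:upper:]' '[:lower:]'
}

# Treat empty and well-known placeholder values as useless (list is illustrative)
is_placeholder() {
    case "$1" in
        ""|"NA"|"N/A"|"TBD"|"To Be Defined") return 0 ;;
        *) return 1 ;;
    esac
}
```

With helpers like these, `to_snake_case businessUnit` prints `business_unit`, and any tag whose value passes `is_placeholder` can be queued for deletion.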

As with any complex operation, start by breaking it into smaller steps and describing what those steps are.

Complete operation: standardize all tags in an Azure tenant. This includes deleting tags that can be classified as useless, such as empty tags or tags whose value is set to something like “To Be Defined”.

1. Store all resources and their tags in a data object.
2. Process everything and identify which tags need to be changed.
3. Construct an Azure CLI command to update the tags in the tenant.
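In outline, and with function names of my own choosing (the real script is much longer), the three steps could be sketched as:

```shell
# Step 1: fetch all resources and their tags (assumes an authenticated az CLI)
fetch_resources() {
    az resource list --query "[].{name:name, id:id, tags:tags}" --output json
}

# Step 2: keep only resources that actually carry tags (a stand-in for the real analysis)
tagged_only() {
    jq -c '[ .[] | select(.tags != null and (.tags | length) > 0) ]'
}

# Step 3: print the az CLI command that would rewrite a resource's tags (dry run)
build_update_cmd() {
    local id=$1; shift
    echo "az tag update --resource-id $id --operation Replace --tags $*"
}
```

Echoing the final command instead of running it makes it easy to review the output before letting it loose on a tenant.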

Time to look at some code. I will show some select lines and explain their role.

I will however abbreviate greatly as there is a lot of edge case handling and fluff that does not impact the core logic.

We start by getting all the resources and their tags from Azure.

resources=$(az resource list --query "[?type!=''].{name:name, id:id, tags:tags}" --output json)

This gives us all the resources within a subscription, meaning several thousand objects if the tenant is large. But let us look at just one resource and its tags for now. Below is what an example object looks like:

[
  {
    "id": "/subscriptions/ee9f3701-9a37-4407-8922-47264e1d2388/resourceGroups/rg-dev-public-tfstate/providers/Microsoft.Storage/storageAccounts/tfstateqz1s9",
    "name": "tfstateqz1s9",
    "tags": {
      "change": "example123"
    }
  }
]
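One way to pull each tag out of such an object for per-tag processing is to flatten the JSON with jq (the TSV layout here is my choice, not the original script's):

```shell
# A single example resource, shaped like the object above
resources='[{"id":"/sub/x","name":"tfstateqz1s9","tags":{"change":"example123"}}]'

# Emit one "id<TAB>key<TAB>value" line per tag; // {} guards against null tags
echo "$resources" | jq -r '.[] | .id as $id | (.tags // {}) | to_entries[] | [$id, .key, .value] | @tsv'
```

Each output line can then be fed to whatever rename/delete logic you need, one tag at a time.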

Usually the number of objects that need processing is massive, and running them as one process takes hours if not days. To avoid that, I split the large JSON array into a series of smaller arrays and send the smaller slices off to be processed concurrently.

json_length=$(echo "$resources" | jq 'length')
chunk_size=500  # resources per worker

for ((i = 0; i < json_length; i += chunk_size)); do
    json_var=$(echo "$resources" | jq -c --argjson limit "$chunk_size" --argjson skip "$i" '(.[$skip:$skip+$limit])')
    process_resources "$json_var" "$chunk_size" "$json_length" "indexfile$((i/chunk_size + 1))" &
done
wait  # block until every background worker has finished

I wanted a progress bar, since processing thousands of tags takes a long time and a progress bar is nice to have. I also wanted it to be somewhat realistic, but it turns out that making a realistic progress bar is not that easy, and introducing multiple processes made it a lot harder. With a single process you can calculate the percentage done by dividing the work done by the total amount of work: if you have processed 200 of 1000 resources, you are 20% done. Easy.

But when multiple processes report progress, tracking it becomes tough. One process could be at 100% while another is at 80%. It was tricky to track that within the shell script itself, so I ended up creating a directory of index files. Each process increments its own index file, and the process that is furthest along dictates the total percentage done. With 10 processes working on a roughly evenly split workload, they will all finish within roughly the same timeframe. Below is a diagram that tries to clarify the process.

resources = [1, 2, 3, 4, 5, 6, 7, 8]
chunk_size = 3

i=0 → [1,2,3] → process_resources → indexfile1  
i=3 → [4,5,6] → process_resources → indexfile2  
i=6 → [7,8]   → process_resources → indexfile3  
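The index-file bookkeeping itself could look roughly like this sketch (the directory handling and helper names are mine; the real script has more error handling):

```shell
# Directory where each worker keeps its own progress counter
progress_dir=$(mktemp -d)

# A worker bumps its counter file after finishing each resource
bump_index() {
    local file="$progress_dir/$1" count=0
    [ -f "$file" ] && count=$(cat "$file")
    echo $((count + 1)) > "$file"
}

# The reporter takes the furthest-along worker as the overall progress
max_index() {
    cat "$progress_dir"/* 2>/dev/null | sort -n | tail -1
}

bump_index indexfile1
bump_index indexfile1
bump_index indexfile2
max_index  # the furthest worker has processed 2 resources, so this prints 2
```

Because the workloads are split roughly evenly, the leading worker's count divided by its chunk size is a reasonable estimate of the total percentage done.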

To be continued.