Azure tag management


Surely this is built-in functionality?

Tagging resources in Azure is a very useful feature. Tags can be used for billing, to put a resource within the scope of a policy, or simply to organize your resources. It is very easy to get started with, but suddenly you have thousands of tags. And then you realize that you want to change one of the tags that is applied to thousands of resources. I thought that surely Azure has some sort of tag manager where you can easily change this on multiple resources at the same time, but no. You are required to either go through each resource manually in the web interface or write some Bash or PowerShell to go through each tag. This is again fine; you can write a few lines that change one tag on all resources in your current subscription.

But life is not easy, so the task I had before me was cleaning up tens of thousands of tags. This usually meant changing the structure of a tag name such that “tagName” became “tag_name”. Additionally, there was a requirement to remove tags with no value and placeholder tags with temporary values.

These tags were likely a product of humans complying with the letter of the law (policy) instead of the intent. The policy for creating a resource likely required one or more tags to be set on the resource. Sometimes the person creating a resource does not really know what they are creating. They just know that someone wanted a so-and-so sized SQL server in that vnet with this list of users, and the resource is required to have one or more tags. So they did what was required and moved on. This is the outcome in a lot of businesses today where workers are living in a “factory” style environment. They have no knowledge of, or care about, the work they do. They have a ticket, an email, or a manager yelling at them. So they just make the thing so they are left alone again. At any rate, it was now up to me to fix this mess.

Naively, I tried to non-destructively update only the tags that were set to be changed. This was extremely slow since each replace was essentially its own little update operation. When there are thousands of resources and some have lots of tags, this quickly ends up taking too long. After settling on doing one operation per resource, the only option was az tag create, which replaces the entire set of tags on a resource. This meant I needed to read all the tags, then decide which to transform, which to delete, and which to recreate as they were.
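To make the difference concrete, here is a minimal sketch that only prints the commands for both approaches instead of running them. The resource ID and tag values are made-up placeholders, and the per-tag variant assumes the slow approach used az tag update with the Merge operation, which the text above does not actually confirm.

```shell
#!/usr/bin/env bash
# Sketch only: prints commands instead of running them, so no Azure access is needed.
# The resource ID and tags below are made-up placeholders.
resource_id="/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg"
declare -A new_tags=( [environment]="Hub" [owned_by]="cloud@example.com" [eo]="1234" )

# Approach 1: one update call per tag (assumption: az tag update --operation Merge).
per_tag_commands() {
    local key
    for key in "${!new_tags[@]}"; do
        printf "az tag update --resource-id %s --operation Merge --tags '%s'='%s'\n" \
            "$resource_id" "$key" "${new_tags[$key]}"
    done
}

# Approach 2: a single create call that sets the complete tag set on the resource.
per_resource_command() {
    local key cmd="az tag create --resource-id $resource_id --tags"
    for key in "${!new_tags[@]}"; do
        cmd+=" '$key'='${new_tags[$key]}'"
    done
    printf '%s\n' "$cmd"
}

per_tag_commands
per_resource_command
```

With three tags, the first approach needs three round trips to Azure and the second needs one, which is where the time goes when this is multiplied by thousands of resources.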

I ended up with this command:

    az tag create --resource-id /subscriptions/52c67a7d-984a-4e5f-b568-39b427c6e123/resourceGroups/Hub-Common-Network-RG  --tags 'creator'='admin.ola.nordmann@business.no' 'environment'='Hub' 'eo'='1234' 'owned_by'='cloud@consultingfirm.no' 

Great, now we need to somehow take all resources in a subscription, get all their tags, and transform, delete, or keep each tag based on a set of rules. Please note that I am not a “real” programmer. What follows is what I managed to cobble together as a jack of all trades who likes to dabble.

I wanted this program to be usable by others, so I decided to require that the user supplies the subscription, that an explicit parameter is present before anything is modified, and that the user confirms before any resources are actually changed. I see way too many people who love to run scripts they do not properly understand, so the least I can do is make it a little hard to destroy something.

# Prompt the user for input until valid input is provided
while true; do
    echo "Please enter Azure subscription ID:"
    read -r user_input

    # Check if the input is empty
    if [ -z "$user_input" ]; then
        echo "Error: Input cannot be empty. Please try again."
    # Validate the subscription ID using Azure CLI
    elif az account show --subscription "$user_input" &> /dev/null; then
        # Set the subscription and break out of the loop
        az account set --subscription "$user_input"
        echo "Subscription set to: $user_input"
        break
    else
        echo "Error: Invalid subscription ID. Please try again."
    fi
done
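A cheap local check can catch obvious typos before the Azure CLI is called at all. The following is a small sketch, not part of the original script: is_guid is a hypothetical helper that only checks the 8-4-4-4-12 hexadecimal shape of a subscription ID. It would not be appropriate if subscription names should be accepted as well.

```shell
#!/usr/bin/env bash
# Returns 0 if the argument looks like a GUID (8-4-4-4-12 hex digits), 1 otherwise.
# This only checks the shape; it does not prove that the subscription exists.
is_guid() {
    local pattern='^[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$'
    [[ "$1" =~ $pattern ]]
}

if is_guid "52c67a7d-984a-4e5f-b568-39b427c6e123"; then
    echo "Looks like a subscription ID"
fi
```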

Store the given information for later.

# Check if the subscription is set. Split name and ID
current_subscription=$(az account show --query '{ Name: name, ID: id }' -o json)
subname=$(echo "$current_subscription" | jq -r '.Name')
subid=$(echo "$current_subscription" | jq -r '.ID')

Set up logging and a directory for storing index files.

# Specify the directory name for logfiles
directory_name="set_tags_logfiles"
index_file_dir="index"

# Create the logfile and index directories if they do not exist
for dir in "$directory_name" "$index_file_dir"; do
    if [ -d "$dir" ]; then
        echo "Directory '$dir' exists."
    elif mkdir "$dir"; then
        echo "Directory '$dir' created successfully."
    else
        echo "Error: Failed to create directory '$dir'."
    fi
done

# Logfile setup
logfile_set_tags="${directory_name}/${subname}_log_set_tags_$(date "+%Y%m%d_%H%M%S").txt"
start=$(date +%s)
# Create file (quoted, since the subscription name may contain spaces)
echo "" > "$logfile_set_tags"

Run a check to see if we are doing a test run or modifying resources. If we are not just doing a test run, then ask for permission.

delay=1

# Ask for confirmation only if resources are modified
if [ "$param" = "start" ]; then
    confirmation="false"
    export confirmation
    while true; do
        read -p "Confirm that you want to attempt to modify tags on all resources under $subname subscription? " yn
        case $yn in
            [Yy]* ) confirmation="true"; break;;
            [Nn]* ) echo "You selected No, exiting in $delay seconds"; sleep $delay; exit 1;;
            * ) echo "Please answer yes or no.";;
        esac
    done
fi

# Runs the script and modifies tags on all resources if the parameter 'start' is used. If not, the script just generates a logfile.
if [ "$param" = "start" ]; then
    echo "Tagfix script starting on $subname in $delay seconds"
else
    echo "Dry run of tagfix script starting on $subname in $delay seconds"
fi

for ((i = delay; i >= 1; i--)); do
    echo "$i"
    sleep 1
done

echo "Starting. See logfile under 'set_tags_logfiles' for more information"
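The listings here test $param, and later divide by $p, but the part that sets them is not shown. One way they could be populated is sketched below. This is an assumption, not the original code: parse_args, the "dryrun" default, and the default of 10 processes are all hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: the mode is the first argument and the process count the second.
parse_args() {
    param="${1:-dryrun}"   # anything other than "start" means a dry run
    p="${2:-10}"           # hypothetical default of 10 parallel processes
    if ! [[ "$p" =~ ^[1-9][0-9]*$ ]]; then
        echo "Error: process count must be a positive integer, got '$p'" >&2
        return 1
    fi
}

# In a real script this would be: parse_args "$@" || exit 1
parse_args start 4
echo "mode=$param processes=$p"
```

Validating the process count up front matters because the chunk size is later computed by dividing by $p, and a zero or empty value would abort the script halfway through.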

I wanted to create a progress bar to have some sense of how far along we were while running.

# Setup progress bar
bar_size=40
bar_char_done="#"
bar_char_todo="-"
bar_percentage_scale=0
counter=0
multi=false
show_progress () {
    local total="$1"
    local current="$2"
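    # Keep the highest percentage seen so far between calls
    percent_highest=${percent_highest:-1}
    percent=0

    # Skip the first two calls when running multiple processes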
    if ((counter < 2)) && [[ $multi == true ]]; then
        ((counter++))
        return
    fi
    # Loop through the index text files

    if [[ $multi == true ]]; then
        for (( i=2; i<=$p+1; i++ )); do
            file="$index_file_dir/indexfile$i"
            # Check if the file exists
            if [ -f "$file" ]; then
                # Read the number from the file and add it to the total
                number=$(<"$file")
                current=$((current + number))
            fi
        done
    else
        file="$index_file_dir/indexfile0"
        # Check if the file exists
        if [ -f "$file" ]; then
            # Read the number from the file and add it to the total
            number=$(<"$file")
            current=$((current + number))
        fi
    fi

    # calculate the progress in percentage 
    percent=$(bc <<< "scale=$bar_percentage_scale; 100 * $current / $total" )

    # Ensure progress bar not jumping around if a lower progress number is reported by a process
    if ((percent < percent_highest)); then
        percent=$percent_highest
    else
        percent_highest=$percent

    fi

    # Check if progress is 100%
    if (( $(bc <<< "$percent == 100") )); then
        echo -ne "\rProcessing completed : [${done_sub_bar}] ${percent}%\r"
    else
        # The number of done and todo characters
        done=$(bc <<< "scale=0; $bar_size * $percent / 100" )
        todo=$(bc <<< "scale=0; $bar_size - $done" )

        # build the done and todo sub-bars
        done_sub_bar=$(printf "%${done}s" | tr " " "${bar_char_done}")
        todo_sub_bar=$(printf "%${todo}s" | tr " " "${bar_char_todo}")

        # output the bar followed by a carriage return, so the next update overwrites it
        echo -ne "\rProcessing : [${done_sub_bar}${todo_sub_bar}] ${percent}%\r"
    fi

}
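The drawing part of a progress bar can be isolated from the bookkeeping. Below is a minimal sketch that uses only integer arithmetic in bash, without bc. render_bar is a hypothetical helper and is not the function used in the script.

```shell
#!/usr/bin/env bash
# render_bar PERCENT [WIDTH]: prints a bar such as [#####-----] 50%
render_bar() {
    local percent="$1" width="${2:-40}"
    # Clamp the percentage to the 0..100 range
    (( percent < 0 )) && percent=0
    (( percent > 100 )) && percent=100
    local done_chars=$(( width * percent / 100 ))
    local todo_chars=$(( width - done_chars ))
    local done_bar todo_bar
    # Build strings of spaces of the right length, then swap in the bar characters
    printf -v done_bar '%*s' "$done_chars" ''
    printf -v todo_bar '%*s' "$todo_chars" ''
    printf '[%s%s] %d%%\n' "${done_bar// /#}" "${todo_bar// /-}" "$percent"
}

render_bar 50 10
```

To redraw in place, as the script does, the trailing newline in the last printf would be replaced with a carriage return.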

Now we come to the part that constructs and runs the Azure CLI command we saw earlier.

construct_azure_cli_command() {
    local imported_tags="$2"
    # Construct the az tag create command
    az_tags_command="timeout 10s az tag create --resource-id $id --tags"
    # Ugly hack to import associative arrays
    eval "declare -A imported_objects="${1#*=}
    for i in "${!imported_objects[@]}"; do
        # Filter out some keys that should not be kept.
        if [ "$i" != "123" ] && [ "$i" != "NA" ] && [ "$i" != "To be defined" ]; then
            az_tags_command+=" '$i'='${imported_objects[$i]}'"
        fi
    done

    echo -e "*******Start of object******************* \n\
    id: $id \n\
    Current tags: $imported_tags \n\
    Azurecli command:\n$az_tags_command \n\
    *******End of Object******************* \n" >> "$logfile_set_tags"

    # Run the finished az cli command
    if [ "$param" = "start" ]; then
        eval "$az_tags_command" > /dev/null 2>&1
    fi
}

construct_azure_cli_command() takes two parameters: first the modified tags, passed as a serialized associative array, and then the original, unmodified tags. This is because I wanted to log what the tags were before and after the operation, and I wanted them one after another in the logfile for clarity. In retrospect, this function should not do any modification itself. But it does strip out the placeholder tags. I think I put that here because, as you will see, the next code block is kind of a mess.
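Building the command as a string and running it through eval is fragile when tag values contain quotes or spaces. A possible alternative, sketched here under the assumption that the tags can be passed as key=value arguments, is to keep the command in a bash array. build_tag_command, tag_command, and the example values are hypothetical names for illustration, not code from the script.

```shell
#!/usr/bin/env bash
# build_tag_command RESOURCE_ID KEY=VALUE...: stores the command in the tag_command array.
build_tag_command() {
    local resource_id="$1"
    shift
    # Each key=value pair stays a single array element, so no extra quoting is needed
    tag_command=(timeout 10s az tag create --resource-id "$resource_id" --tags "$@")
}

# Made-up example tags, including a value with a quote and a space
declare -A example_tags=( [environment]="Hub" [owned_by]="o'brien team" )
pairs=()
for key in "${!example_tags[@]}"; do
    pairs+=("$key=${example_tags[$key]}")
done

build_tag_command "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg" "${pairs[@]}"

# Print the command in a shell-escaped form for the logfile
printf '%q ' "${tag_command[@]}"
echo
# To actually run it: "${tag_command[@]}" > /dev/null 2>&1
```

With this approach, single quotes in tag values would no longer need to be stripped before the command is built.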

# Function that modifies tags
modify_tags() {
    local current_tags="$1"
    local key value transformed_key
    # Create an empty array to store the transformed key-value pairs
    declare -A transformed_object=()
    # Iterate over each key in the object, one per line so that keys with spaces survive
    while IFS= read -r key; do
        # Keep the values in current_tags if they are not in the lookup_table
        if [ -z "${lookup_table[$key]}" ]; then
            value=$(jq -r --arg key "$key" '.[$key]' <<< "$current_tags")
            # Strip ' characters, since we use '' to enclose tag name and tag value
            value=${value//\'/}
            # Remove certain tag names if they have no value, or if they have "bad" values
            if [ "$value" != "null" ] && [ -n "$value" ]; then
                if [ "$key" != "123" ] && [ "$key" != "NA" ] && [ "$key" != "To be defined" ]; then
                    transformed_object["$key"]="$value"
                fi
            fi
        fi
    done < <(jq -r 'keys[]' <<< "$current_tags")
    for key in "${!lookup_table[@]}"; do
        # Get the corresponding new tag name from the lookup table
        transformed_key=${lookup_table[$key]}
        if [ -z "${transformed_object[$key]}" ]; then
            value=$(jq -r --arg key "$key" '.[$key]' <<< "$current_tags")
            # Strip ' characters, since we use '' to enclose tag name and tag value
            value=${value//\'/}
            # Only add the renamed tag if the old tag exists and has a value
            if [ -n "$transformed_key" ] && [ "$value" != "null" ] && [ -n "$value" ]; then
                transformed_object["$transformed_key"]="$value"
            fi
        fi
    done
    # Check if the array is empty
    if [ ${#transformed_object[@]} -eq 0 ]; then
        # Do not build an empty command if there are no tags to set
        echo "No tags to modify on $id, skipping." >> "$logfile_set_tags"
    else
        construct_azure_cli_command "$(declare -p transformed_object)" "$current_tags"
    fi
}
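modify_tags relies on a lookup_table associative array that maps old tag names to new ones, but its definition is not shown. A hypothetical table, together with a stripped-down version of the rename rule, could look like this. The old tag names and the rename_tag helper are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from old tag names to new ones; the real table is not shown here.
declare -A lookup_table=(
    [costCenter]="cost_center"
    [ownedBy]="owned_by"
    [createdBy]="creator"
)

# rename_tag KEY VALUE: prints NEW_KEY=VALUE, or returns 1 if the tag has no value.
rename_tag() {
    local key="$1" value="$2"
    [ -n "$value" ] || return 1
    # Fall back to the original name when the key is not in the table
    printf '%s=%s\n' "${lookup_table[$key]:-$key}" "$value"
}

rename_tag ownedBy "cloud@example.com"
rename_tag environment "Hub"
rename_tag costCenter "" || echo "costCenter dropped (no value)"
```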

Due to the number of resources that needed to be modified per subscription, it became necessary to run multiple processes in parallel to get through them all in a reasonable timeframe. I created a function that takes a limited share of the total resources and does two things. First, it checks that there is actually something to modify, and if so it sends the resource to a different function that does the modifying. Second, it helps keep track of the progress.

You see, I wanted to have a progress bar since it took a long time and having a progress bar is nice. I also wanted it to be somewhat realistic, but making a realistic progress bar is not that easy. It’s not super hard either, but the functionality you get out of it is limited compared to the time invested.

With a single process, you can calculate the percentage of work done by dividing the work done so far by the total amount of work. For a lot of scenarios, that will not work since you do not know the amount of work left. But we have a set number of resources, and they take roughly the same time to go through. So if you have done 200 of 1000 resources, then you have done 20% of the work, easy.

When I introduced multiple processes, it got a lot harder. When you have multiple processes reporting progress, tracking it becomes tough. One process could be at 100% while another could be at 80%. From here on, it becomes very apparent that I was rushing a bit due to time constraints and lack of ability. I ended up creating a directory with index files. Each process would increment its own index file. This was because keeping track of this in bash became a bit tricky. The process that was furthest along would dictate the total percentage done. This sounds a bit backwards, but it was the quickest way to finish.

If there are 10 processes working on a roughly evenly split workload, they will finish within roughly the same timeframe. When a run takes hours to finish, being “stuck” on 100% for a minute is not a big deal. If the workload is smaller, then the difference between the quickest and slowest process is just a few seconds.
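The index-file idea can be sketched in isolation. The example below assumes each worker writes a plain integer counter to its own file in a shared directory; sum_index_files is a hypothetical helper, and the worker counts are simulated values.

```shell
#!/usr/bin/env bash
# Sum the counters that each worker writes to its own file in a directory.
sum_index_files() {
    local dir="$1" total=0 file number
    for file in "$dir"/indexfile*; do
        [ -f "$file" ] || continue
        number=$(<"$file")
        # Ignore files that are empty or caught in the middle of a write
        [[ "$number" =~ ^[0-9]+$ ]] || continue
        total=$(( total + number ))
    done
    echo "$total"
}

# Demo with three simulated workers in a temporary directory
demo_dir=$(mktemp -d)
echo 120 > "$demo_dir/indexfile2"
echo 95 > "$demo_dir/indexfile3"
echo 101 > "$demo_dir/indexfile4"
total_done=$(sum_index_files "$demo_dir")
echo "Done: $total_done of 1000 ($(( 100 * total_done / 1000 ))%)"
rm -r "$demo_dir"
```

Files are a blunt tool for sharing state, but background processes in bash cannot update each other's variables, so a file per worker avoids both that problem and the need for locking.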

index=0
process_resources() {
    local resources="$1"
    local chunk_size="$2"
    local num_resources="$3"
    local indexfile="$index_file_dir/$4"
    current=0

    # Read one compact JSON object per line, so spaces in tag values do not split a resource.
    # File descriptor 3 is used so that commands inside the loop cannot consume the input.
    while IFS= read -r resource <&3; do
        name=$(echo "$resource" | jq -r '.name')
        id=$(echo "$resource" | jq -r '.id')
        # Check if JSON object is not null
        if [[ $(echo "$resource" | jq '.tags') != "null" ]]; then
            current_tags=$(echo "$resource" | jq -r '.tags')
            modify_tags "$current_tags"
        else
            # JSON object is null
            echo "$id has no tags, skipping" >> "$logfile_set_tags"
        fi
        ((index++))
        echo "$index" > "$indexfile"
        show_progress "$num_resources" "$current"
    done 3< <(echo "$resources" | jq -c '.[]')
}
# Start of Main loop
if ! $confirmation && [ "$param" = "start" ]; then
    echo "You either did not select Yes or the script was cancelled, exiting in 5 seconds"
    sleep 5
    exit 1
else
    # Get all resource groups and their tags
    echo "Processing all tags on resource groups under selected subscription"
    
    # Counter for progress bar
    counter=1
    resources=$(az group list --query "[?type!=''].{name:name, id:id, tags:tags}" --output json)
    # Count the resource groups from the JSON we already have instead of calling Azure again
    num_resources=$(echo "$resources" | jq length)
 
    # Multi
    json_length=$(echo "$resources" | jq length)
    chunk_size=$((json_length / $p + 1))

    if ((json_length < 20)); then
        process_resources "$resources" "$chunk_size" "$num_resources" "indexfile0" &
        wait
        echo
    else
        multi=true
        # Split the JSON string into multiple variables
        for ((i = 0; i < json_length; i += chunk_size)); do
            json_var=$(echo "$resources" | jq -c --argjson limit "$chunk_size" --argjson skip "$i" '(.[$skip:$skip+$limit])')
            process_resources "$json_var" "$chunk_size" "$num_resources" "indexfile$((i/chunk_size + 2))" &
        done

        wait
        echo
    fi

    # Reset counter
    counter=1
    echo
    # Then we get all the resources in all the resource groups
    echo "Processing tags on all other resources under selected subscription"
    resources2=$(az resource list --query "[?type!=''].{name:name, id:id, tags:tags}" --output json)
    # Count the resources from the JSON we already have instead of calling Azure again
    num_resources2=$(echo "$resources2" | jq length)

    # Multi
    multi=true
    json_length=$(echo "$resources2" | jq length)
    chunk_size=$((json_length / $p + 1))

    # Split the JSON string into multiple variables
    for ((i = 0; i < json_length; i += chunk_size)); do
        json_var=$(echo "$resources2" | jq -c --argjson limit "$chunk_size" --argjson skip "$i" '(.[$skip:$skip+$limit])')
        process_resources "$json_var" "$chunk_size" "$num_resources2" "indexfile$((i/chunk_size + 2))" &
    done

    wait
    echo
fi
end=`date +%s`

runtime=$((end-start))
echo "Script ran in $runtime seconds" >> "$logfile_set_tags"
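The chunking arithmetic in the main loop is easier to follow with concrete numbers. The sketch below reproduces only the calculation, with no Azure or jq calls. plan_chunks is a hypothetical helper that prints the index file name, the skip offset, and the slice limit for each background process.

```shell
#!/usr/bin/env bash
# plan_chunks TOTAL PROCESSES: prints one line per chunk as "indexfileN skip limit"
plan_chunks() {
    local total="$1" processes="$2"
    # Same formula as the main loop: integer division plus one
    local chunk_size=$(( total / processes + 1 ))
    local skip
    for (( skip = 0; skip < total; skip += chunk_size )); do
        echo "indexfile$(( skip / chunk_size + 2 )) $skip $chunk_size"
    done
}

plan_chunks 1000 10
```

For 1000 resources and 10 processes this gives a chunk size of 101 and index files numbered 2 through 11, which matches the range that show_progress loops over. The last slice starts at 909 and is simply shorter than the others.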

When running this command as part of a bigger program, I noticed that Azure would sometimes be slow about sending a reply. This meant that even when the command had been received and processed, waiting for the reply would sometimes drag on for minutes. Not great when running thousands of commands. Adding a timeout worked perfectly. I set a very safe 10 seconds, since run time was not critical and most commands got their replies well within that.

timeout 10s az tag create --resource-id /subscriptions/52c67a7d-984a-4e5f-b568-39b427c6e123/resourceGroups/Hub-Common-Network-RG  --tags 'creator'='admin.ola.nordmann@business.no' 'environment'='Hub' 'eo'='1234' 'owned_by'='cloud@consultingfirm.no' 
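One thing to keep in mind is that a timed-out command may or may not have been applied. The coreutils timeout command exits with status 124 when the time limit is hit, so such cases can at least be logged for later review. Here is a minimal sketch with a hypothetical wrapper, using sleep as a stand-in for the slow Azure call.

```shell
#!/usr/bin/env bash
# run_with_timeout LIMIT COMMAND...: runs the command and reports if it timed out.
run_with_timeout() {
    local limit="$1"
    shift
    local status=0
    timeout "$limit" "$@" || status=$?
    # GNU timeout exits with 124 when the time limit was reached
    if [ "$status" -eq 124 ]; then
        echo "Timed out after $limit: $*" >&2
    fi
    return "$status"
}

# Stand-in for a slow az call: sleeps longer than the limit allows
run_with_timeout 1s sleep 3 || echo "Exit status: $?"
```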
construct_azure_cli_command() {
    local imported_tags="$2"
    # Construct the az tag create command
    az_tags_command="timeout 10s az tag create --resource-id $id --tags"  
    # Ugly hack to import associative arrays
    eval "declare -A imported_objects="${1#*=}
    for i in "${!imported_objects[@]}"; do
        # Filter out some keys that should not be kept. 
        if [ "$i" != "123" ] && [ "$i" != "NA" ] && [ "$i" != "To be defined" ]; then
            az_tags_command+=" '$i'='${imported_objects[$i]}'"
        fi
    done