In a previous post, Add Search to Hugo Sites With Azure Search, I explained how I added a search capability to my site using Azure Search. In this post, I’ll show you how I trigger Azure Search to reindex the site each time it’s redeployed as part of my existing GitHub Actions configuration.
As I explained in my post Automated Hugo Releases with GitHub Actions, I use GitHub Actions to deploy my site. What I wanted was to make sure that whenever the site was updated, Azure Search would reindex it.
What I did was set up an additional job for the live site. Whenever a deployment succeeds, I not only want to tell Azure Search to reindex the site, but I also want to purge the search results page and the search index data file on my site from the CDN.
This involved two steps:
- Create a service principal (Azure AD application) that will be used to make the changes in Azure
- Update the GitHub Actions workflow
As I said above, I want to do two things in Azure: purge the CDN and re-run the indexer. I wanted to do both of these with the Azure CLI, but its support for Azure Search is quite limited at this time, so I was stuck using the REST API.
However, I can use the CLI for purging the CDN. I’d prefer to have an agent do this rather than a user, so I created a service principal that was granted permission to do this.
To do this, create a new Azure AD application in your tenant from within the Azure portal. You don’t need to grant it any permissions, but you do need to create a client secret.
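If you prefer the command line, the Azure CLI can create the service principal and client secret in one step. This is a sketch, not the exact command I ran; the name and scope below are placeholders you'd swap for your own subscription and resource group:

```shell
# Creates a service principal scoped to a resource group and prints
# its appId (client ID), password (client secret), and tenant.
# <subscription-id> and <resource-group> are placeholders.
az ad sp create-for-rbac \
  --name "my-deploy-bot" \
  --role Contributor \
  --scopes /subscriptions/<subscription-id>/resourceGroups/<resource-group>
```

Save the password from the output somewhere safe; it's the client secret you'll store as a GitHub secret later, and you can't retrieve it again.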
Once you have the app created, head over to your resource group or CDN, whichever level you want to give this app access to, and select the Access control (IAM) menu item:
Here I’ve granted the app the Contributor role on the entire resource group. You can limit this further if you wish; for example, I could have granted just the CDN Endpoint Contributor role on the resource group or on the actual CDN endpoint, but I use this app for other things that are beyond the scope of this article.
With an app created, now I can update the pipeline.
Next, I had to add another secret to my GitHub workflow: AZURE_ACCOM_BOT_CLIENTSECRET. This contains the secret defined for my service principal above.
Now, head over to your workflow. I first updated my environment settings to include details for the new service principal and the Azure Search admin key:
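The environment settings might look something like the following sketch. The variable names here are assumptions, not my exact workflow; the only name taken from this post is the `AZURE_ACCOM_BOT_CLIENTSECRET` secret:

```yaml
# Workflow-level environment values (names are illustrative).
env:
  AZURE_TENANT_ID: <your-tenant-id>
  AZURE_CLIENT_ID: <service-principal-app-id>
  AZURE_CLIENT_SECRET: ${{ secrets.AZURE_ACCOM_BOT_CLIENTSECRET }}
  AZURE_SEARCH_ADMIN_KEY: ${{ secrets.AZURE_SEARCH_ADMIN_KEY }}
```

Keeping the secret values in GitHub secrets rather than inline means they're masked in the workflow logs.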
The last step is to update the workflow by adding a new job after the existing build & deployment job. Here’s what it looks like:
Note on line breaks: I added line breaks to make it easier to read; all the script commands should be on a single line, not broken up like you see in this snippet.
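A minimal sketch of such a job is shown below. This is a reconstruction under stated assumptions, not my exact workflow: the resource group, CDN profile/endpoint, search service, indexer names, and the `AZURE_TENANT_ID`/`AZURE_CLIENT_ID` secrets are all placeholders, and the reindex step is shown here as a raw REST call rather than the marketplace action:

```yaml
search-reindex:
  needs: build-deploy                      # only run after a successful deployment
  if: github.ref == 'refs/heads/master'    # only for builds on master
  runs-on: ubuntu-latest
  steps:
    - name: Login to Azure CLI
      # Secrets are injected as environment variables so they never
      # appear in the pipeline or diagnostic logs.
      run: |
        az login --service-principal \
          --username "$AZURE_CLIENT_ID" \
          --password "$AZURE_CLIENT_SECRET" \
          --tenant "$AZURE_TENANT_ID"
      env:
        AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
        AZURE_CLIENT_SECRET: ${{ secrets.AZURE_ACCOM_BOT_CLIENTSECRET }}
        AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
    - name: Purge CDN
      # Purge the search page and the JSON index data file from the CDN.
      run: |
        az cdn endpoint purge \
          --resource-group <resource-group> \
          --profile-name <cdn-profile> \
          --name <cdn-endpoint> \
          --content-paths '/search/*' '/index.json'
    - name: Reindex the site
      # Trigger the indexer via the Azure Cognitive Search REST API.
      run: |
        curl --fail -X POST \
          "https://<search-service>.search.windows.net/indexers/<indexer>/run?api-version=2020-06-30" \
          -H "api-key: ${{ secrets.AZURE_SEARCH_ADMIN_KEY }}" \
          -H "Content-Length: 0"
```

The `needs` and `if` keys together gate this job on both a successful deployment and the branch, mirroring the conditions described below.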
Let me explain what this does:
- Create a new job: I first create a new job `search-reindex` that depends on the job `build-deploy`. This is done because I only want to reindex the site when the deployment succeeds. In this setup I also add a condition so that this only runs when a build is triggered on the `master` branch. Technically, this isn’t necessary, as `build-deploy` only runs on `master` as well, which would control it.
- Step: Login to Azure CLI: The first step in this job is to login to the Azure CLI. This is where I’m using my service principal that I created. By injecting the IDs and client secrets in as environment variables, they won’t get written to the pipeline execution or diagnostic logs.
- Step: Purge CDN: I’m using an Azure CDN on my site for optimal performance. When the site gets updated, I want to make sure the search page is purged from the CDN as well as that JSON file that Azure Search uses to index the site.
- Step: Reindex the site: The last step is to reindex the site. As I said above, the Azure CLI doesn’t support this at the time of writing, so I’m using their management REST API. To simplify things, I wrapped this step up in a custom GitHub Action that I’ve published to the GitHub marketplace for others to use: GitHub Marketplace: Azure Cognitive Search Reindex
It may seem a bit strange when you look at your indexer results, as it shows 0/0 docs were indexed, but that happens only when the indexer doesn’t see a change to the file. When you actually add content to the site, it will pick up those files.
For instance, with my GitHub Actions setup, I automatically rebuild & deploy the site a few times a day, picking up any content that I wrote that I wanted published at a later time. If nothing changed, then there was nothing new to index. But as you can see from the indexer history above, when I added one new blog post, it picked it up.