Every Microsoft 365 tenant accumulates operational debt over time. Groups without owners. Site collections with a single admin who left the company. App registrations nobody remembers creating. Keeping on top of this kind of governance drift used to mean a full-blown Azure Function, a Logic App with limited PowerShell support, or a scheduled runbook that you forgot to maintain. None of these feels quite right for a task that should run for a few minutes, produce a report, and disappear.
Azure Container App Jobs are a cleaner answer. This post explains what they are, how they work, and walks through the sample project I built to demonstrate the pattern.
## What Are Azure Container App Jobs?
Azure Container Apps is Microsoft’s serverless container platform. You give it a container image and it runs it without you having to think about clusters, nodes, or ingress controllers. Most workloads on Container Apps are long-running services — web APIs, background workers — that run continuously and scale based on load.
Container App Jobs are a different execution model. Instead of a service that stays up, a job runs a container to completion and stops. The job has an execution history, supports retries, and can be triggered in three ways:
- Manual — triggered on demand via CLI, API, or the Azure portal.
- Schedule — triggered by a cron expression, like a cloud-native cron job.
- Event-driven — triggered by an external event source (for example a queue message).
This makes Container App Jobs a natural fit for anything with a clear start and end: nightly reports, data exports, governance checks, tenant housekeeping tasks.
### How a job execution works
When a trigger fires, Container Apps pulls the configured image, starts a container, runs it to completion, and records the result in the job’s execution history. You configure:
- Parallelism — how many container replicas to start per execution.
- Replica completion count — how many replicas must succeed for the execution to be considered successful.
- Retry limit — how many times a failed replica is retried before the execution is marked as failed.
- Timeout — the maximum wall-clock time for a single replica.
For a single-task maintenance job, parallelism and completion count are both 1. The job starts one container, runs it, and either succeeds or retries up to the configured limit.
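In the ARM/azapi job definition these knobs live under `properties.configuration`. A sketch of a single-task scheduled job — the numeric values are illustrative, not the sample's exact settings — looks roughly like this:

```json
{
  "configuration": {
    "triggerType": "Schedule",
    "replicaTimeout": 1800,
    "replicaRetryLimit": 1,
    "scheduleTriggerConfig": {
      "cronExpression": "0 18 * * 1-5",
      "parallelism": 1,
      "replicaCompletionCount": 1
    }
  }
}
```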
### Comparison with alternatives
| | Azure Functions | Logic Apps | Container App Jobs |
|---|---|---|---|
| Language | .NET, Python, JS, PS… | Designer / connectors | Anything in a container |
| Execution model | Function host, cold starts | Workflow engine | Run to completion |
| State | Durable Functions for state | Built-in workflow state | None (stateless) |
| Script complexity | Medium | Low | High |
| Infrastructure | Consumption or Premium plan | Standard or Consumption | Pay-per-execution |
| Managed identity | Yes | Yes | Yes |
Container App Jobs shine when your task is a complex script that needs full access to PowerShell modules, custom binaries, or libraries that don’t fit neatly into a Function runtime.
## The Architecture of the Sample
The sample project, PuntoBello Container App Job, demonstrates a pattern for running PowerShell maintenance scripts against a Microsoft 365 tenant on a recurring schedule. The full infrastructure is defined in Terraform and deployed via the Azure Developer CLI (`azd up`).
### Resources provisioned
```
Azure Subscription
└── Resource Group
    ├── Log Analytics Workspace         ← Container App Environment diagnostics
    ├── Container Registry              ← Stores the PowerShell container image
    ├── Storage Account
    │   └── File Share ("data")         ← Hosts the .ps1 scripts at runtime
    ├── User-Assigned Managed Identity  ← The job's Azure identity
    ├── Container App Environment       ← Managed hosting layer
    └── Container App Job               ← The scheduled job definition
```
### Why a File Share for scripts?
The container image contains PowerShell, the Az module, and PnP.PowerShell — tooling that changes infrequently. The actual maintenance scripts, which change often, are stored separately on an Azure Storage File Share and mounted into the container at /mnt/scripts at runtime.
This separation has a practical benefit: updating a script does not require rebuilding or pushing a new container image. Drop a new .ps1 file into the job/ folder, run terraform apply, and Terraform detects the change via MD5 hash and re-uploads the file to the share. The next job execution picks it up automatically.
```
terraform apply
│
├── azurerm_storage_share_file  ← uploads job/*.ps1 (change-detected via MD5)
└── azapi_resource "caj"        ← job runs: pwsh -File /mnt/scripts/Test-ContainerAppJob.ps1
```
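The upload half of that graph takes only a few lines of HCL. This is a sketch under assumed resource and path names, not the repo's exact code — `azurerm_storage_share_file` re-uploads whenever the local file's MD5 changes:

```hcl
# Sketch: upload every script in job/ to the "data" File Share.
# Resource names and paths are assumptions for illustration.
resource "azurerm_storage_share_file" "scripts" {
  for_each         = fileset("${path.module}/../job", "*.ps1")
  name             = each.value
  storage_share_id = azurerm_storage_share.data.id
  source           = "${path.module}/../job/${each.value}"
  # Change detection: Terraform re-uploads when the local MD5 differs
  content_md5      = filemd5("${path.module}/../job/${each.value}")
}
```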
### Authentication: Managed Identity all the way down
The job container authenticates to Azure and Microsoft 365 exclusively via a User-Assigned Managed Identity — no secrets, no certificates, no credential rotation. The identity is assigned the required application role permissions during deployment:
| Permission | Service | Purpose |
|---|---|---|
| `Group.Read.All` | Microsoft Graph | List M365 groups and their owners |
| `Sites.FullControl.All` | Microsoft Graph | Enumerate SharePoint site collections |
| `Sites.FullControl.All` | SharePoint Online | Connect via PnP PowerShell |
| `Application.Read.All` | Microsoft Graph | Read app registrations |
| `User.Read.All` | Microsoft Graph | Resolve user details |
Inside the script, authentication looks like this:
```powershell
# Azure PowerShell
Connect-AzAccount -Identity -AccountId $env:AZURE_CLIENT_ID `
    -Tenant $env:AZURE_TENANT_ID -Subscription $env:AZURE_SUBSCRIPTION_ID

# PnP PowerShell
$cnAdmin = Connect-PnPOnline -Url $env:SPO_ADMIN_SITE_URL `
    -ManagedIdentity -UserAssignedManagedIdentityClientId $env:AZURE_CLIENT_ID `
    -ReturnConnection
```
The client ID, tenant ID, and subscription ID are injected as environment variables by the Terraform job definition.
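Inside the azapi job template this is a plain `env` list on the container. A sketch — container name, image reference, and placeholder values are illustrative:

```json
"containers": [
  {
    "name": "maintenance",
    "image": "<acr>.azurecr.io/<image>:<tag>",
    "command": ["pwsh", "-File", "/mnt/scripts/Test-ContainerAppJob.ps1"],
    "env": [
      { "name": "AZURE_CLIENT_ID", "value": "<uami-client-id>" },
      { "name": "AZURE_TENANT_ID", "value": "<tenant-id>" },
      { "name": "AZURE_SUBSCRIPTION_ID", "value": "<subscription-id>" }
    ]
  }
]
```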
### The job schedule
The job is configured with a Schedule trigger and the cron expression `0 18 * * 1-5` — weekdays at 18:00 UTC. Switching to on-demand execution, for example, is a one-line change in `infra/main-caj.tf`:

```hcl
triggerType = "Manual"
```
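With a Manual trigger, executions are started on demand from the Azure CLI, which also exposes the execution history. The job and resource-group names below are placeholders:

```shell
# Start an execution on demand
az containerapp job start --name <job-name> --resource-group <rg-name>

# Inspect the execution history (start time, status, duration)
az containerapp job execution list --name <job-name> --resource-group <rg-name> --output table
```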
## What the Test Script Actually Does
The included script, Test-ContainerAppJob.ps1, is not a hello-world. It is a real tenant governance check with two sections:
1. M365 Groups without owners
Using Group.Read.All, it pages through all Microsoft 365 Groups, fetches the owners list for each, and flags any group that has zero owners. Teams-backed groups are identified separately so they can be prioritized.
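In PnP PowerShell terms the check boils down to the sketch below — the cmdlet names are real, but the paging, Teams detection, and output handling of the actual script are simplified away:

```powershell
# Simplified sketch of the ownerless-group check.
# Assumes the $cnAdmin connection created earlier in the script.
$groups    = Get-PnPMicrosoft365Group -IncludeOwners -Connection $cnAdmin
$ownerless = $groups | Where-Object { @($_.Owners).Count -eq 0 }
$ownerless | ForEach-Object { Write-Host "Group without owner: $($_.DisplayName)" }
```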
2. SharePoint site collections with fewer than two admins
Using Sites.FullControl.All, it lists all site collections via Graph, connects to each one with PnP PowerShell, and reports any site that has fewer than two site collection admins. A site with a single admin is one person leaving the company away from being locked out.
At the end it prints a summary to the container log:
```
=== Maintenance Summary ===
M365 Groups without owner : 3
SPO Sites with < 2 admins : 7
```
This output is captured by Log Analytics via the Container App Environment and queryable with KQL. Or you can extend the script to write results to a SharePoint list, send a Teams notification, or push to any other endpoint.
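A starting-point query might look like the sketch below — the `ContainerAppConsoleLogs_CL` table and its columns assume the default Log Analytics integration, so adjust to your diagnostics setup:

```kql
// Pull the summary lines from recent job runs (table/column names assumed)
ContainerAppConsoleLogs_CL
| where Log_s has "Maintenance Summary" or Log_s has "without owner"
| project TimeGenerated, ContainerAppName_s, Log_s
| order by TimeGenerated desc
```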
### Log verbosity
The script respects three environment variables that are set in the Terraform job definition:
| Variable | Default | Effect |
|---|---|---|
| `INFORMATION` | `1` | Standard progress messages |
| `VERBOSE` | `0` | Detailed per-item output |
| `DEBUG` | `0` | Raw API responses and loop traces |
During troubleshooting, set `VERBOSE=1` in the job definition; switch it back to `0` for production. That is all that is needed.
## Deployment in Practice
The entire stack deploys with three commands from inside the dev container:
```shell
azd env new <env-name>
azd env set SPO_ADMIN_SITE_URL "https://<yourtenant>-admin.sharepoint.com"
azd up
```
`azd up` delegates to Terraform, which:

- Provisions all Azure resources.
- Builds the container image from `.devcontainer/Dockerfile` using the `kreuzwerker/docker` Terraform provider and pushes it to the ACR.
- Uploads the `.ps1` scripts from `job/` to the Azure File Share.
- Assigns the Microsoft Graph and SharePoint application roles to the UAMI.
The only prerequisite beyond an authenticated Azure session is that Docker is running locally — the Terraform docker provider needs it to build and push the image.
### One Dockerfile, two roles
There is one detail worth calling out because it is easy to overlook: the same Dockerfile in .devcontainer/ serves two distinct purposes.
As a dev container — VS Code rebuilds and reopens in it, giving you a shell with PowerShell, Azure CLI, azd, Terraform, and all required modules preinstalled. Because azd up calls the kreuzwerker/docker Terraform provider, which talks to the Docker daemon via its socket, the devcontainer needs access to the host Docker socket. This is configured in devcontainer.json:
```json
"mounts": [
    "source=/var/run/docker.sock,target=/var/run/docker.sock,type=bind"
]
```
As the Container App Job image — Terraform builds the same Dockerfile and pushes the resulting image to the ACR. That image runs as the non-root node user (uid 1000), which is the right posture for a production container.
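For orientation, a minimal sketch of the shape of such a dual-use Dockerfile — the base image, module list, and user creation are assumptions for illustration, not the repository's actual file:

```dockerfile
# Sketch only — base image and module list are illustrative assumptions.
FROM mcr.microsoft.com/powershell:lts-ubuntu-22.04

# Rarely changing tooling is baked into the image
RUN pwsh -Command "Install-Module Az, PnP.PowerShell -Force -Scope AllUsers"

# The frequently changing .ps1 scripts are deliberately NOT copied in;
# they are mounted from the File Share at /mnt/scripts at runtime.

# Run as a non-root user (uid 1000) in production
RUN useradd --uid 1000 --create-home node
USER node
```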
The conflict: the Dockerfile ends with USER node, but the node user has no permission to access /var/run/docker.sock, which is owned by root on the host. Running azd up from inside the devcontainer therefore fails with a permission error unless the devcontainer itself runs as root.
The practical workaround for local development is straightforward — comment out the USER node line before rebuilding the devcontainer.
Dev only — do not use in production. Running a container as root removes an important layer of defense-in-depth. This step is purely a local convenience to allow the devcontainer shell to reach the Docker socket. Never ship or deploy an image built from a Dockerfile where `USER node` is commented out, and never use this workaround in a CI/CD pipeline or any shared environment.
```dockerfile
# Switch to the 'node' user
# USER node   ← comment out for devcontainer use
```
Then run azd up from the root shell. Terraform builds the image using the same Dockerfile, but at that point it is the kreuzwerker/docker provider — not the devcontainer shell — doing the build, and it passes USER node through correctly into the final image layers. The container that actually runs in Azure therefore still executes as the non-root node user.
Restore the line before committing so the image definition stays correct for production:
```dockerfile
# Switch to the 'node' user
USER node
```
This is an inherent tension when a single Dockerfile is used for both a dev environment and a production container image. The tradeoff here favours simplicity — one image definition, no separate dev/prod Dockerfiles — at the cost of this one manual step during local setup.
## Why This Pattern?
The combination of Container App Jobs, managed identity, and a File Share for scripts gives you:
- No secrets — the container authenticates with its identity, not stored credentials.
- No image rebuilds for script changes — update the script in `job/`, apply Terraform, done.
- Execution history out of the box — every run is recorded with its start time, duration, and exit status.
- Full PowerShell module support — any module that installs in the Dockerfile is available; no runtime limitations.
- Cost proportional to usage — you pay for the seconds the container runs, not for an always-on host.
The sample is intentionally thin on opinion. The Terraform is readable, the script is self-contained, and swapping in your own maintenance logic is a matter of dropping a new .ps1 file into job/ and updating the container command in infra/main-caj.tf. The rest of the infrastructure stays unchanged.
The full sample is available at github.com/diemobiliar/puntobello-containerappjob.