Every Microsoft 365 tenant accumulates operational debt over time. Groups without owners. Site collections with a single admin who left the company. App registrations nobody remembers creating. Keeping on top of this kind of governance drift used to mean a full-blown Azure Function, a Logic App with limited PowerShell support, or a scheduled runbook that you forgot to maintain. None of these feels quite right for a task that should run for a few minutes, produce a report, and disappear.
Azure Container App Jobs are a cleaner answer. This post explains what they are, how they work, and walks through the sample project I built to demonstrate the pattern.
## What Are Azure Container App Jobs?
Azure Container Apps is Microsoft’s serverless container platform. You give it a container image and it runs it without you having to think about clusters, nodes, or ingress controllers. Most workloads on Container Apps are long-running services — web APIs, background workers — that run continuously and scale based on load.
Container App Jobs are a different execution model. Instead of a service that stays up, a job runs a container to completion and stops. The job has an execution history, supports retries, and can be triggered in three ways:
- Manual — triggered on demand via CLI, API, or the Azure portal.
- Schedule — triggered by a cron expression, like a cloud-native cron job.
- Event-driven — triggered by an external event source (for example a queue message).
This makes Container App Jobs a natural fit for anything with a clear start and end: nightly reports, data exports, governance checks, tenant housekeeping tasks.
### How a job execution works
When a trigger fires, Container Apps pulls the configured image, starts a container, runs it to completion, and records the result in the job’s execution history. You configure:
- Parallelism — how many container replicas to start per execution.
- Replica completion count — how many replicas must succeed for the execution to be considered successful.
- Retry limit — how many times a failed replica is retried before the execution is marked as failed.
- Timeout — the maximum wall-clock time for a single replica.
For a single-task maintenance job, parallelism and completion count are both 1. The job starts one container, runs it, and either succeeds or retries up to the configured limit.
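In the ARM/azapi job definition these knobs live under `properties.configuration`. A sketch of a single-task scheduled job — the numeric values are illustrative, not the sample's exact settings — looks roughly like this:

```json
{
  "configuration": {
    "triggerType": "Schedule",
    "replicaTimeout": 1800,
    "replicaRetryLimit": 1,
    "scheduleTriggerConfig": {
      "cronExpression": "0 18 * * 1-5",
      "parallelism": 1,
      "replicaCompletionCount": 1
    }
  }
}
```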
### Comparison with alternatives
| | Azure Functions | Logic Apps | Container App Jobs |
|---|---|---|---|
| Language | .NET, Python, JS, PS… | Designer / connectors | Anything in a container |
| Execution model | Function host, cold starts | Workflow engine | Run to completion |
| State | Durable Functions for state | Built-in workflow state | None (stateless) |
| Script complexity | Medium | Low | High |
| Infrastructure | Consumption or Premium plan | Standard or Consumption | Pay-per-execution |
| Managed identity | Yes | Yes | Yes |
Container App Jobs shine when your task is a complex script that needs full access to PowerShell modules, custom binaries, or libraries that don’t fit neatly into a Function runtime.
## The Architecture of the Sample
The sample project, PuntoBello Container App Job, demonstrates a pattern for running PowerShell maintenance scripts against a Microsoft 365 tenant on a recurring schedule. The full infrastructure is defined in Terraform and deployed via the Azure Developer CLI (`azd up`).
### Resources provisioned
```
Azure Subscription
└── Resource Group
    ├── Log Analytics Workspace         ← Container App Environment diagnostics
    ├── Container Registry              ← Stores the PowerShell container image
    ├── Storage Account
    │   └── File Share ("data")         ← Hosts the .ps1 scripts at runtime
    ├── User-Assigned Managed Identity  ← The job's Azure identity
    ├── Container App Environment       ← Managed hosting layer
    └── Container App Job               ← The scheduled job definition
```
### Why a File Share for scripts?
The container image contains PowerShell, the Az module, and PnP.PowerShell — tooling that changes infrequently. The actual maintenance scripts, which change often, are stored separately on an Azure Storage File Share and mounted into the container at /mnt/scripts at runtime.
This separation has a practical benefit: updating a script does not require rebuilding or pushing a new container image. Drop a new .ps1 file into the job/ folder, run terraform apply, and Terraform detects the change via MD5 hash and re-uploads the file to the share. The next job execution picks it up automatically.
```
terraform apply
│
├── azurerm_storage_share_file  ← uploads job/*.ps1 (change-detected via MD5)
└── azapi_resource "caj"        ← job runs: pwsh -File /mnt/scripts/Test-ContainerAppJob.ps1
```
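The upload half of that graph takes only a few lines of HCL. This is a sketch under assumed resource and path names, not the repo's exact code — `azurerm_storage_share_file` re-uploads whenever the local file's MD5 changes:

```hcl
# Sketch: upload every script in job/ to the "data" File Share.
# Resource names and paths are assumptions for illustration.
resource "azurerm_storage_share_file" "scripts" {
  for_each         = fileset("${path.module}/../job", "*.ps1")
  name             = each.value
  storage_share_id = azurerm_storage_share.data.id
  source           = "${path.module}/../job/${each.value}"
  # Change detection: Terraform re-uploads when the local MD5 differs
  content_md5      = filemd5("${path.module}/../job/${each.value}")
}
```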
### Authentication: Managed Identity all the way down
The job container authenticates to Azure and Microsoft 365 exclusively via a User-Assigned Managed Identity — no secrets, no certificates, no credential rotation. The identity is assigned the required application role permissions during deployment:
| Permission | Service | Purpose |
|---|---|---|
| `Group.Read.All` | Microsoft Graph | List M365 groups and their owners |
| `Sites.FullControl.All` | Microsoft Graph | Enumerate SharePoint site collections |
| `Sites.FullControl.All` | SharePoint Online | Connect via PnP PowerShell |
| `Application.Read.All` | Microsoft Graph | Read app registrations |
| `User.Read.All` | Microsoft Graph | Resolve user details |
Inside the script, authentication looks like this:
```powershell
# Azure PowerShell
Connect-AzAccount -Identity -AccountId $env:AZURE_CLIENT_ID `
    -Tenant $env:AZURE_TENANT_ID -Subscription $env:AZURE_SUBSCRIPTION_ID

# PnP PowerShell
$cnAdmin = Connect-PnPOnline -Url $env:SPO_ADMIN_SITE_URL `
    -ManagedIdentity -UserAssignedManagedIdentityClientId $env:AZURE_CLIENT_ID `
    -ReturnConnection
```
The client ID, tenant ID, and subscription ID are injected as environment variables by the Terraform job definition.
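Inside the azapi job template this is a plain `env` list on the container. A sketch — container name, image reference, and placeholder values are illustrative:

```json
"containers": [
  {
    "name": "maintenance",
    "image": "<acr>.azurecr.io/<image>:<tag>",
    "command": ["pwsh", "-File", "/mnt/scripts/Test-ContainerAppJob.ps1"],
    "env": [
      { "name": "AZURE_CLIENT_ID", "value": "<uami-client-id>" },
      { "name": "AZURE_TENANT_ID", "value": "<tenant-id>" },
      { "name": "AZURE_SUBSCRIPTION_ID", "value": "<subscription-id>" }
    ]
  }
]
```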
### The job schedule
The job is configured with a Schedule trigger and the cron expression `0 18 * * 1-5` — weekdays at 18:00 UTC. Switching to on-demand execution, for example, is a one-line change in `infra/main-caj.tf`:

```hcl
triggerType = "Manual"
```
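With a Manual trigger, executions are started on demand from the Azure CLI, which also exposes the execution history. The job and resource-group names below are placeholders:

```shell
# Start an execution on demand
az containerapp job start --name <job-name> --resource-group <rg-name>

# Inspect the execution history (start time, status, duration)
az containerapp job execution list --name <job-name> --resource-group <rg-name> --output table
```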
## What the Test Script Actually Does
The included script, Test-ContainerAppJob.ps1, is not a hello-world. It is a real tenant governance check with two sections:
1. M365 Groups without owners
Using Group.Read.All, it pages through all Microsoft 365 Groups, fetches the owners list for each, and flags any group that has zero owners. Teams-backed groups are identified separately so they can be prioritized.
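In PnP PowerShell terms the check boils down to the sketch below — the cmdlet names are real, but the paging, Teams detection, and output handling of the actual script are simplified away:

```powershell
# Simplified sketch of the ownerless-group check.
# Assumes the $cnAdmin connection created earlier in the script.
$groups    = Get-PnPMicrosoft365Group -IncludeOwners -Connection $cnAdmin
$ownerless = $groups | Where-Object { @($_.Owners).Count -eq 0 }
$ownerless | ForEach-Object { Write-Host "Group without owner: $($_.DisplayName)" }
```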
2. SharePoint site collections with fewer than two admins
Using Sites.FullControl.All, it lists all site collections via Graph, connects to each one with PnP PowerShell, and reports any site that has fewer than two site collection admins. A site with a single admin is one person leaving the company away from being locked out.
At the end it prints a summary to the container log:
```
=== Maintenance Summary ===
M365 Groups without owner : 3
SPO Sites with < 2 admins : 7
```
This output is captured by Log Analytics via the Container App Environment and queryable with KQL. Or you can extend the script to write results to a SharePoint list, send a Teams notification, or push to any other endpoint.
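A starting-point query might look like the sketch below — the `ContainerAppConsoleLogs_CL` table and its columns assume the default Log Analytics integration, so adjust to your diagnostics setup:

```kql
// Pull the summary lines from recent job runs (table/column names assumed)
ContainerAppConsoleLogs_CL
| where Log_s has "Maintenance Summary" or Log_s has "without owner"
| project TimeGenerated, ContainerAppName_s, Log_s
| order by TimeGenerated desc
```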
### Log verbosity
The script respects three environment variables that are set in the Terraform job definition:
| Variable | Default | Effect |
|---|---|---|
| `INFORMATION` | `1` | Standard progress messages |
| `VERBOSE` | `0` | Detailed per-item output |
| `DEBUG` | `0` | Raw API responses and loop traces |
During troubleshooting, set `VERBOSE=1` in the job definition; switch it back to `0` for production. That is all that is needed.
## Deployment in Practice
The entire stack deploys with three commands from inside the dev container:
```shell
azd env new <env-name>
azd env set SPO_ADMIN_SITE_URL "https://<yourtenant>-admin.sharepoint.com"
azd up
```
`azd up` delegates to Terraform, which:

- Provisions all Azure resources.
- Builds the container image from `.devcontainer/Dockerfile` using the `kreuzwerker/docker` Terraform provider and pushes it to the ACR.
- Uploads the `.ps1` scripts from `job/` to the Azure File Share.
- Assigns the Microsoft Graph and SharePoint application roles to the UAMI.
The only prerequisite beyond an authenticated Azure session is that Docker is running locally — the Terraform docker provider needs it to build and push the image.
### One Dockerfile, two roles
There is one detail worth calling out because it is easy to overlook: the same Dockerfile in .devcontainer/ serves two distinct purposes.
As a dev container — VS Code rebuilds and reopens in it, giving you a shell with PowerShell, Azure CLI, azd, Terraform, and all required modules preinstalled. Because azd up calls the kreuzwerker/docker Terraform provider, which talks to the Docker daemon via its socket, the devcontainer needs access to the host Docker socket. This is configured in devcontainer.json:
```json
"mounts": [
    "source=/var/run/docker.sock,target=/var/run/docker.sock,type=bind"
]
```
As the Container App Job image — Terraform builds the same Dockerfile and pushes the resulting image to the ACR. That image runs as the non-root node user (uid 1000), which is the right posture for a production container.
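For orientation, a minimal sketch of the shape of such a dual-use Dockerfile — the base image, module list, and user creation are assumptions for illustration, not the repository's actual file:

```dockerfile
# Sketch only — base image and module list are illustrative assumptions.
FROM mcr.microsoft.com/powershell:lts-ubuntu-22.04

# Rarely changing tooling is baked into the image
RUN pwsh -Command "Install-Module Az, PnP.PowerShell -Force -Scope AllUsers"

# The frequently changing .ps1 scripts are deliberately NOT copied in;
# they are mounted from the File Share at /mnt/scripts at runtime.

# Run as a non-root user (uid 1000) in production
RUN useradd --uid 1000 --create-home node
USER node
```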
The conflict: the Dockerfile ends with USER node, but the node user has no permission to access /var/run/docker.sock, which is owned by root on the host. Running azd up from inside the devcontainer therefore fails with a permission error unless the devcontainer itself runs as root.
The practical workaround for local development is straightforward — comment out the USER node line before rebuilding the devcontainer.
Dev only — do not use in production. Running a container as root removes an important layer of defense-in-depth. This step is purely a local convenience to allow the devcontainer shell to reach the Docker socket. Never ship or deploy an image built from a Dockerfile where `USER node` is commented out, and never use this workaround in a CI/CD pipeline or any shared environment.
```dockerfile
# Switch to the 'node' user
# USER node   ← comment out for devcontainer use
```
Then run azd up from the root shell. Terraform builds the image using the same Dockerfile, but at that point it is the kreuzwerker/docker provider — not the devcontainer shell — doing the build, and it passes USER node through correctly into the final image layers. The container that actually runs in Azure therefore still executes as the non-root node user.
Restore the line before committing so the image definition stays correct for production:
```dockerfile
# Switch to the 'node' user
USER node
```
This is an inherent tension when a single Dockerfile is used for both a dev environment and a production container image. The tradeoff here favours simplicity — one image definition, no separate dev/prod Dockerfiles — at the cost of this one manual step during local setup.
## Why This Pattern?
The combination of Container App Jobs, managed identity, and a File Share for scripts gives you:
- No secrets — the container authenticates with its identity, not stored credentials.
- No image rebuilds for script changes — update the script in `job/`, apply Terraform, done.
- Execution history out of the box — every run is recorded with its start time, duration, and exit status.
- Full PowerShell module support — any module that installs in the Dockerfile is available; no runtime limitations.
- Cost proportional to usage — you pay for the seconds the container runs, not for an always-on host.
The sample is intentionally thin on opinion. The Terraform is readable, the script is self-contained, and swapping in your own maintenance logic is a matter of dropping a new .ps1 file into job/ and updating the container command in infra/main-caj.tf. The rest of the infrastructure stays unchanged.
The full sample is available at github.com/diemobiliar/puntobello-containerappjob.