Operations runbook. Last verified 2026-05-05.

Deploy workflow — pushing a tagged release to a region

Pushing a tagged release into a region. Stage → backup → atomic install → restart → health probe → automatic rollback on failure. Operator-triggered, never automatic on tag.


How a tagged binary release lands on a host. End-to-end pipeline:

git tag vX.Y.Z          → release.yml fires automatically
                        → cross-compiles binaries, pushes to GitHub Release
                        # No GHCR push — F-1221 (codex audit-2026-05-12);
                        # docker/<binary>.Dockerfile remains for self-host builds.

operator triggers       → deploy.yml workflow_dispatch (region + version + binaries)
                        → downloads binaries from GitHub Release
                        → SHA256SUMS verification
                        → Ansible playbook over SSH
                        → backup (prev binary → /usr/local/bin/<b>.prev-<tag>)
                        → install → restart → health probe → rollback on fail

This doc covers the deploy half. The release half is in `release-process.md` §Cut.
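The SHA256SUMS verification step in the pipeline can also be run by hand against a local download, for example before triggering a deploy. This is a minimal sketch that assumes GNU coreutils `sha256sum` (for `--ignore-missing`) and a SHA256SUMS file in the standard `<hash>  <file>` layout. The function name `verify_release_dir` is hypothetical and is not part of deploy.yml.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from deploy.yml: check downloaded release
# assets against the release's SHA256SUMS file. Requires GNU coreutils.
verify_release_dir() {
  local dir="$1"
  # Run in a subshell so the caller's working directory is left untouched.
  # --ignore-missing: skip listed assets that were not downloaded.
  # --strict: treat malformed SHA256SUMS lines as a failure.
  # Any checksum mismatch produces a non-zero exit status.
  (cd "$dir" && sha256sum --check --ignore-missing --strict SHA256SUMS)
}
```

For example, after `gh release download v0.2.0 --dir ./v0.2.0`, running `verify_release_dir ./v0.2.0` should report OK for each downloaded binary.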

Triggering a deploy

gh workflow run deploy.yml \
  -f region=r1 \
  -f version=v0.2.0 \
  -f binaries=stellarindex-indexer,stellarindex-aggregator,stellarindex-api

Alternatively, use the GitHub Actions UI: Actions → deploy → Run workflow, then fill in the inputs.

If binaries is omitted, it defaults to stellarindex-indexer,stellarindex-aggregator,stellarindex-api (the three long-running services).

The workflow refuses to run unless version matches vX.Y.Z[-prerelease][+build] and the GitHub Release exists.
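Both preconditions can be checked locally before dispatching the workflow. This is a sketch, not the workflow's own validation step. The regex is an assumption that reads `vX.Y.Z[-prerelease][+build]` with SemVer's allowed identifier characters, and the function names are hypothetical. `gh release view` is a real GitHub CLI command that exits non-zero when the tag has no release.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check mirroring the workflow's input validation.
# Assumed pattern: vMAJOR.MINOR.PATCH, optional -prerelease, optional +build.
SEMVER_TAG_RE='^v[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$'

is_valid_version() {
  # The right-hand side stays unquoted so bash treats it as a regex.
  [[ "$1" =~ $SEMVER_TAG_RE ]]
}

release_exists() {
  # Exits non-zero if no GitHub Release exists for the tag.
  gh release view "$1" >/dev/null 2>&1
}
```

`is_valid_version "$version" && release_exists "$version"` mirrors the refusal conditions. It needs an authenticated `gh` running inside the repository checkout, or given `--repo`.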

Per-region setup

Each region needs four secrets configured in the repo's GitHub Secrets settings:

| Secret | What it is |
|---|---|
| `<REGION>_HOST` | Public IP or hostname of the deploy target (e.g. 136.243.90.96 for r1) |
| `<REGION>_USER` | SSH user (defaults to root if unset) |
| `DEPLOY_SSH_PRIVATE_KEY` | OpenSSH private key whose public counterpart is in the host's ~/.ssh/authorized_keys. Generate with `ssh-keygen -t ed25519 -f deploy-key`; the secret holds the contents of deploy-key (the private half). |
| `<REGION>_SSH_KNOWN_HOSTS` | Base64-encoded output of `ssh-keyscan -t ed25519 <host>`; produce it with `ssh-keyscan -t ed25519 <host> \| base64`. Pinning known_hosts prevents MITM during the deploy connection. |
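The runbook does not show how the workflow turns these secrets into files on the runner. One plausible sketch writes the key with owner-only permissions and decodes the pinned host key into a known_hosts file for strict checking. It assumes the secrets reach the step as environment variables named SSH_KEY and KNOWN_HOSTS_B64, which are hypothetical names, and it assumes GNU coreutils `base64`.

```bash
#!/usr/bin/env bash
# Hypothetical runner-side setup. SSH_KEY and KNOWN_HOSTS_B64 are assumed
# variable names, not necessarily the ones deploy.yml uses.
write_ssh_material() {
  local ssh_dir="$1"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  # umask 077 inside a subshell: the new key file is created as 0600.
  (umask 077 && printf '%s\n' "$SSH_KEY" > "$ssh_dir/deploy_key")
  # Decode the pinned host key(s) from the base64 secret.
  printf '%s' "$KNOWN_HOSTS_B64" | base64 -d > "$ssh_dir/known_hosts"
}
```

A connection then pins the host with `ssh -i "$dir/deploy_key" -o UserKnownHostsFile="$dir/known_hosts" -o StrictHostKeyChecking=yes <user>@<host>`. On the producing side, GNU `base64 -w0` keeps the encoded secret on a single line.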

Currently only r1 is wired up. To add r2 or r3:

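  1. Add R2_* / R3_* versions of the three region-prefixed secrets above (HOST, USER, SSH_KNOWN_HOSTS), and add the DEPLOY_SSH_PRIVATE_KEY public key to the new host's authorized_keys.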
  2. Add the region to the workflow's region choice list in .github/workflows/deploy.yml.
  3. Extend the case in the "Resolve region inventory" step to map the new region's secrets.
  4. Optionally configure a GitHub Environment named after the region with required reviewers (forces manual approval before the deploy job runs).
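Step 3 describes the "Resolve region inventory" step only as a case over regions. The following is a sketch of the shape such a case might take. It assumes each secret is visible to the step as an environment variable of the same name (R1_HOST, R1_USER). The function name and the `user@host` output are hypothetical, and the error text imitates the message listed under Failure modes.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the region → inventory mapping.
# Assumes secrets are exposed as env vars named after the secrets.
resolve_region() {
  local region="$1" host user
  case "$region" in
    r1) host="${R1_HOST:-}"; user="${R1_USER:-root}" ;;
    # r2) host="${R2_HOST:-}"; user="${R2_USER:-root}" ;;  # once R2_* secrets exist
    *)  echo "Unknown region: $region" >&2; return 2 ;;
  esac
  if [[ -z "$host" ]]; then
    echo "Region $region host secret is unset" >&2
    return 1
  fi
  # USER falls back to root, matching the secrets table.
  printf '%s@%s\n' "$user" "$host"
}
```

Adding a region is then one extra case arm plus its secrets. An unknown region fails loudly instead of deploying to an empty host.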

What the playbook does

`configs/ansible/playbooks/deploy-binary.yml` loops over each requested binary and includes `configs/ansible/tasks/deploy-one-binary.yml`.

Per-binary sequence:

  1. Resolve the previous version from the sidecar /var/lib/stellarindex/deployed-versions/<binary>. First-deploy fallback is a UTC timestamp.
  2. Stage the new binary as <install_dir>/<binary>.new (controller → host copy via SSH).
  3. Back up the current <binary> → <binary>.prev-<previous-tag>.
  4. Atomically rename .new → live path.
  5. Write the sidecar with the new version tag.
  6. `systemctl restart <binary>.service`.
  7. Wait a grace period (default 15s) before the health probe.
  8. Health probe:
     - stellarindex-api: curl http://127.0.0.1:3000/v1/healthz, expects 200 (5 retries × 3s).
     - Other binaries: systemctl is-active, expects active (5 retries × 3s).
  9. Rollback on probe failure:
     - Stop the failing service.
     - Move the bad binary to <binary>.failed-<new-version> (preserved for post-mortem).
     - Restore <binary>.prev-<previous-tag> → live path.
     - Restore the previous sidecar.
     - Restart with the old binary.
     - Fail the play (the workflow surfaces a non-zero exit).
  10. Prune backups beyond the most recent 5 to bound disk usage.
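The following shell sketch walks through the same per-binary sequence. It is not the playbook, because the real steps are Ansible tasks. Several details are assumptions made so the logic can be read and exercised in one place: the hook functions (restart_service, stop_service, probe_service), the deploy_one argument order, and backing up with `cp -p` rather than `mv`. Pruning is left out.

```bash
#!/usr/bin/env bash
# Hypothetical rendering of deploy-one-binary.yml's sequence. The *_service
# functions are hooks: redefine them to run the sketch without systemd.
restart_service() { systemctl restart "$1.service"; }
stop_service()    { systemctl stop "$1.service"; }

probe_service() {
  if [[ "$1" == stellarindex-api ]]; then
    # curl prints 000 when the connection itself fails.
    [[ "$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:3000/v1/healthz)" == 200 ]]
  else
    systemctl is-active --quiet "$1.service"
  fi
}

# Run "$@" up to PROBE_ATTEMPTS times, PROBE_DELAY seconds apart (defaults 5, 3).
probe_with_retries() {
  local attempts="${PROBE_ATTEMPTS:-5}" delay="${PROBE_DELAY:-3}" i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then return 0; fi
    if (( i < attempts )); then sleep "$delay"; fi
  done
  return 1
}

# deploy_one <new-binary> <install-dir> <name> <new-tag> <prev-tag> <sidecar>
deploy_one() {
  local src="$1" dir="$2" name="$3" new_tag="$4" prev_tag="$5" sidecar="$6"
  local live="$dir/$name" old_sidecar=""
  if [[ -s "$sidecar" ]]; then old_sidecar="$(head -n 1 "$sidecar")"; fi

  # Stage beside the live path so the later mv is a same-filesystem rename.
  install -m 0755 "$src" "$live.new" || return 1
  # Back up by copying, so the live path never disappears before the swap.
  if [[ -e "$live" ]]; then cp -p "$live" "$live.prev-$prev_tag" || return 1; fi
  # mv within one directory is a rename(2), so the swap is atomic.
  mv -f "$live.new" "$live" || return 1
  printf '%s\n' "$new_tag" > "$sidecar"

  restart_service "$name"
  sleep "${GRACE_SECONDS:-15}"
  if probe_with_retries probe_service "$name"; then
    return 0
  fi

  # Rollback: keep the bad binary for post-mortem, restore backup and sidecar.
  stop_service "$name"
  mv -f "$live" "$live.failed-$new_tag"
  if [[ -e "$live.prev-$prev_tag" ]]; then cp -p "$live.prev-$prev_tag" "$live"; fi
  if [[ -n "$old_sidecar" ]]; then
    printf '%s\n' "$old_sidecar" > "$sidecar"
  else
    rm -f "$sidecar"
  fi
  restart_service "$name"
  echo "Bad binary preserved at $live.failed-$new_tag" >&2
  return 1
}
```

Copying instead of moving for the backup is a deliberate choice in this sketch. Between the backup and the rename, the live path always holds a runnable binary.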

Backup naming + rollback

Backups land at /usr/local/bin/<binary>.prev-<tag>, where <tag> is the SemVer of the previous deploy as resolved from the sidecar (or the UTC-timestamp fallback on a first deploy). Examples after a few deploys:

/usr/local/bin/stellarindex-api
/usr/local/bin/stellarindex-api.prev-v0.2.0
/usr/local/bin/stellarindex-api.prev-v0.1.3
/usr/local/bin/stellarindex-api.prev-v0.1.2
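The sidecar lookup and the five-backup retention from the per-binary sequence could look roughly like the sketch below. It rests on several assumptions. The fallback timestamp format `YYYYMMDDTHHMMSSZ` is a guess, since the runbook only says "a UTC timestamp". "Most recent" is taken to mean newest modification time. GNU find and coreutils are required. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not the playbook's tasks. Assumes GNU find/coreutils.

# Print the previous tag: first line of the sidecar, or a UTC timestamp
# (assumed format YYYYMMDDTHHMMSSZ) when no sidecar exists yet.
resolve_prev_tag() {
  local sidecar="$1"
  if [[ -s "$sidecar" ]]; then
    head -n 1 "$sidecar"
  else
    date -u +%Y%m%dT%H%M%SZ
  fi
}

# Delete <live>.prev-* backups beyond the newest $2 (default 5).
# "Newest" means most recent modification time (an assumption).
prune_backups() {
  local live="$1" keep="${2:-5}"
  find "$(dirname "$live")" -maxdepth 1 -type f -name "$(basename "$live").prev-*" \
      -printf '%T@ %p\n' \
    | sort -rn \
    | tail -n +"$((keep + 1))" \
    | cut -d' ' -f2- \
    | while IFS= read -r old; do rm -f -- "$old"; done
}
```

With the default retention, `prune_backups /usr/local/bin/stellarindex-api` keeps the five newest stellarindex-api.prev-* files and leaves the live binary alone.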

To roll back manually (workflow path is preferred — see release-process.md §Rollback):

ssh root@<host> "
  set -eu  # abort on the first failed step instead of restarting a half-restored install
  systemctl stop stellarindex-api
  mv /usr/local/bin/stellarindex-api /tmp/bad-stellarindex-api
  cp /usr/local/bin/stellarindex-api.prev-v0.1.3 /usr/local/bin/stellarindex-api
  echo v0.1.3 > /var/lib/stellarindex/deployed-versions/stellarindex-api
  systemctl start stellarindex-api
  systemctl is-active stellarindex-api  # should print: active
"

Then re-run gh workflow run deploy.yml -f version=v0.1.3 … to get the workflow's state back in sync (idempotent — it'll be a no-op if the sidecar already says v0.1.3 and the binary is healthy).
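The no-op condition above combines two checks: the sidecar already holds the requested tag, and the service passes its health check. The following sketch of that decision assumes a plain one-line sidecar file. The function name is hypothetical, and the workflow's real check may be structured differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if deploying <tag> would change nothing.
# Usage: deploy_is_noop <sidecar> <tag> <health-probe command...>
deploy_is_noop() {
  local sidecar="$1" want="$2"
  shift 2
  # Condition 1: the sidecar exists and its first line is the requested tag.
  [[ -s "$sidecar" ]] || return 1
  [[ "$(head -n 1 "$sidecar")" == "$want" ]] || return 1
  # Condition 2: the remaining arguments, run as a health probe, succeed.
  "$@"
}
```

After the manual rollback, `deploy_is_noop /var/lib/stellarindex/deployed-versions/stellarindex-api v0.1.3 systemctl is-active --quiet stellarindex-api.service` run on the host should succeed before the workflow re-run.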

Failure modes

| Symptom | Likely cause | Fix |
|---|---|---|
| Workflow fails at "Validate inputs" | `version` doesn't match SemVer | Re-run with a valid vX.Y.Z tag |
| "Region `<X>` host secret is unset" | `<REGION>_HOST` not configured in GitHub Secrets | Add the secret per §Per-region setup |
| "Bad binary preserved at …failed-vX.Y.Z" | New binary failed the health probe and was rolled back | Inspect `/usr/local/bin/<binary>.failed-<v>` on the host; `journalctl -u <binary> -n 200` shows why |
| SSH timeout / "permission denied" | Stale key, removed authorized_keys entry, or host firewall change | Verify DEPLOY_SSH_PRIVATE_KEY is current; SSH in manually from a known-good box |
| "Post-deploy version mismatch" *(future check)* | Currently disabled: the binaries have no --version flag | Track in the launch-readiness backlog |

Cross-references