Deploy workflow — pushing a tagged release to a region — Stellar Index operations

How a tagged binary release lands on a host. End-to-end pipeline:

git tag vX.Y.Z          → release.yml fires automatically
                        → cross-compiles binaries, pushes to GitHub Release
                        # No GHCR push — F-1221 (codex audit-2026-05-12);
                        # docker/<binary>.Dockerfile remains for self-host builds.

operator triggers       → deploy.yml workflow_dispatch (region + version + binaries)
                        → downloads binaries from GitHub Release
                        → SHA256SUMS verification
                        → Ansible playbook over SSH
                        → backup (prev binary → /usr/local/bin/<b>.prev-<tag>)
                        → install → restart → health probe → rollback on fail

This doc covers the deploy half. The release half is in `release-process.md` §Cut.
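
The download and checksum steps of the pipeline above can be reproduced by hand when debugging a failed deploy. The following is a minimal sketch, not the workflow's actual code: it assumes the release assets are named after the binaries, that `SHA256SUMS` is in the format `sha256sum` emits, and that GNU coreutils is available. `verify_release_dir` is a hypothetical helper name.

```shell
# Hypothetical helper: verify every file listed in SHA256SUMS inside a directory.
# Assumes GNU coreutils sha256sum; --ignore-missing skips entries for assets
# that were not downloaded, and the check fails if no file could be verified.
verify_release_dir() {
  dir=$1
  [ -f "$dir/SHA256SUMS" ] || { echo "SHA256SUMS missing in $dir" >&2; return 1; }
  (cd "$dir" && sha256sum --check --ignore-missing --quiet SHA256SUMS)
}

# Usage (asset names are assumptions):
#   gh release download v0.2.0 --dir ./dl \
#     --pattern 'stellarindex-api' --pattern 'SHA256SUMS'
#   verify_release_dir ./dl && echo "checksums ok"
```

Running the check inside a subshell keeps the caller's working directory unchanged.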

Triggering a deploy

gh workflow run deploy.yml \
  -f region=r1 \
  -f version=v0.2.0 \
  -f binaries=stellarindex-indexer,stellarindex-aggregator,stellarindex-api

Or use the GitHub Actions UI: Actions → deploy → Run workflow, fill in the dropdowns.

If `binaries` is omitted, it defaults to `stellarindex-indexer,stellarindex-aggregator,stellarindex-api` (the three long-running services).

The workflow refuses to run unless version matches vX.Y.Z[-prerelease][+build] and the GitHub Release exists.
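
The same two checks can be run locally before triggering the workflow. This is a sketch under assumptions: the workflow's real regular expression is not shown in this doc, so the pattern below is only one plausible reading of `vX.Y.Z[-prerelease][+build]`, and `is_valid_version` and `preflight` are hypothetical names.

```shell
# Approximate the version-format check (assumed pattern, not the workflow's own).
is_valid_version() {
  printf '%s\n' "$1" |
    grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$'
}

# Validate the tag, then confirm the GitHub Release exists.
# `gh release view` exits non-zero when the release is not found.
preflight() {
  is_valid_version "$1" || { echo "invalid version: $1" >&2; return 1; }
  gh release view "$1" >/dev/null || { echo "no release for $1" >&2; return 1; }
}

# Usage:
#   preflight v0.2.0 && gh workflow run deploy.yml -f region=r1 -f version=v0.2.0
```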

Per-region setup

Each region needs four secrets configured in the repo's GitHub Secrets settings:

Secret	What it is
`<REGION>_HOST`	Public IP/hostname of the deploy target (e.g. `136.243.90.96` for r1)
`<REGION>_USER`	SSH user (defaults to `root` if unset)
`DEPLOY_SSH_PRIVATE_KEY`	OpenSSH private key whose public counterpart is in the host's `~/.ssh/authorized_keys`. Generate with `ssh-keygen -t ed25519 -f deploy-key`; the secret holds the contents of `deploy-key` (private).
`<REGION>_SSH_KNOWN_HOSTS`	Base64-encoded output of `ssh-keyscan -t ed25519 <host>`, produced with `ssh-keyscan -t ed25519 <host> | base64`. Pinning known_hosts prevents MITM during the deploy connection.
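
The secrets in the table can be populated from a terminal with the GitHub CLI. The sketch below is illustrative only: `encode_known_hosts`, `secret_name` and `provision_region` are hypothetical helper names, and stripping line wraps from the base64 output is an assumption about what the workflow's decoder tolerates (standard `base64 -d` accepts both forms).

```shell
# Emit single-line base64 (portable alternative to GNU `base64 -w0`).
encode_known_hosts() {
  base64 | tr -d '\n'
}

# Build a region-scoped secret name, e.g. r2 + HOST -> R2_HOST.
secret_name() {
  printf '%s_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')" "$2"
}

# Not invoked here: needs network access and an authenticated gh CLI.
# `gh secret set NAME` reads the secret value from stdin when no --body is given.
provision_region() {
  region=$1 host=$2
  ssh-keyscan -t ed25519 "$host" | encode_known_hosts |
    gh secret set "$(secret_name "$region" SSH_KNOWN_HOSTS)"
  printf '%s' "$host" | gh secret set "$(secret_name "$region" HOST)"
}

# Usage:
#   provision_region r2 <host>
#   gh secret set DEPLOY_SSH_PRIVATE_KEY < deploy-key
```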

Currently only r1 is wired. Adding r2 / r3:

1. Add the region's secrets from the table above, using the R2_* / R3_* prefix for the region-scoped ones.
2. Add the region to the workflow's region choice list in .github/workflows/deploy.yml.
3. Extend the case in the "Resolve region inventory" step to map the new region's secrets.
4. Optionally configure a GitHub Environment named after the region with required reviewers (forces manual approval before the deploy job runs).
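
For orientation, a region-to-secrets mapping of the kind the "Resolve region inventory" step performs might look like the sketch below. The step's real contents are not reproduced in this doc, so treat everything here as an assumption: `R1_HOST`, `R2_HOST` and friends stand in for secrets exported into the step's environment, and `DEPLOY_HOST` / `DEPLOY_USER` are hypothetical output names.

```shell
# Map a region name to its connection details.
# The user falls back to root when the <REGION>_USER secret is unset or empty.
resolve_region() {
  case "$1" in
    r1) DEPLOY_HOST=${R1_HOST:-}; DEPLOY_USER=${R1_USER:-root} ;;
    r2) DEPLOY_HOST=${R2_HOST:-}; DEPLOY_USER=${R2_USER:-root} ;;  # newly added region
    *)  echo "unknown region: $1" >&2; return 1 ;;
  esac
  # Fail early when the host secret was never configured.
  [ -n "$DEPLOY_HOST" ] || { echo "Region $1 host secret is unset" >&2; return 1; }
}
```

Adding a region is then one extra `case` arm plus the matching secrets.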

What the playbook does

`configs/ansible/playbooks/deploy-binary.yml` loops over each requested binary and includes `configs/ansible/tasks/deploy-one-binary.yml`.

Per-binary sequence:

1. Resolve the previous version from the sidecar /var/lib/stellarindex/deployed-versions/<binary>. First-deploy fallback is a UTC timestamp.
2. Stage the new binary as <install_dir>/<binary>.new (controller → host copy via SSH).
3. Back up the current <binary> → <binary>.prev-<previous-tag>.
4. Atomic rename .new → live path.
5. Write sidecar with the new version tag.
6. `systemctl restart <binary>.service`.
7. Grace period (default 15s) before health probe.
8. Health probe:
   - stellarindex-api: curl http://127.0.0.1:3000/v1/healthz expects 200 (5 retries × 3s)
   - other binaries: systemctl is-active expects active (5 retries × 3s)
9. Rollback on probe failure:
   - Stop the failing service.
   - Move bad binary to <binary>.failed-<new-version> (preserved for post-mortem).
   - Restore <binary>.prev-<previous-tag> → live path.
   - Restore the previous sidecar.
   - Restart with old binary.
   - Fail the play (workflow surfaces non-zero).
10. Prune backups beyond the most-recent 5 to bound disk usage.
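
The file handling in the sequence above can be illustrated in plain shell. This is a sketch of the same idea, not a transcription of the Ansible tasks: `deploy_one` and `retry` are hypothetical names, service restarts and the grace period are left out so the logic can be exercised without systemd, and the backup is assumed to be a copy so the live path stays populated until the rename.

```shell
# retry COUNT DELAY CMD...: run CMD until it succeeds, at most COUNT attempts.
retry() {
  attempts=$1 delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# deploy_one INSTALL_DIR STATE_DIR BINARY NEW_FILE NEW_VERSION PROBE_CMD...
# PROBE_CMD is any command that exits 0 when the service is healthy.
deploy_one() {
  install_dir=$1 state_dir=$2 bin=$3 new_file=$4 new_ver=$5; shift 5
  live=$install_dir/$bin
  sidecar=$state_dir/$bin

  # Previous version from the sidecar; UTC timestamp on a first deploy.
  if [ -f "$sidecar" ]; then prev=$(cat "$sidecar"); else prev=$(date -u +%Y%m%dT%H%M%SZ); fi

  cp "$new_file" "$live.new" && chmod 0755 "$live.new" || return 1   # stage
  if [ -f "$live" ]; then cp "$live" "$live.prev-$prev" || return 1; fi  # back up
  mv -f "$live.new" "$live"              # rename within one directory is atomic
  printf '%s\n' "$new_ver" > "$sidecar"

  if "$@"; then return 0; fi             # health probe passed

  # Rollback: keep the bad binary for post-mortem, restore binary and sidecar.
  mv -f "$live" "$live.failed-$new_ver"
  if [ -f "$live.prev-$prev" ]; then
    cp "$live.prev-$prev" "$live"
    printf '%s\n' "$prev" > "$sidecar"
  else
    rm -f "$sidecar"                     # first deploy: nothing to restore
  fi
  echo "Bad binary preserved at $live.failed-$new_ver" >&2
  return 1
}

# Usage (curl -f treats any HTTP status below 400 as success, a looser check than "expects 200"):
#   deploy_one /usr/local/bin /var/lib/stellarindex/deployed-versions \
#     stellarindex-api ./dl/stellarindex-api v0.2.0 \
#     retry 5 3 curl -fsS -o /dev/null http://127.0.0.1:3000/v1/healthz
```

Staging to `.new` in the install directory matters because a rename is only atomic when source and destination are on the same filesystem.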

Backup naming + rollback

Backups land at /usr/local/bin/<binary>.prev-<tag>, where <tag> is the version tag of the previous deploy as resolved from the sidecar (or a UTC timestamp if no sidecar existed yet). Examples after a few deploys:

/usr/local/bin/stellarindex-api
/usr/local/bin/stellarindex-api.prev-v0.2.0
/usr/local/bin/stellarindex-api.prev-v0.1.3
/usr/local/bin/stellarindex-api.prev-v0.1.2
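
Keeping only the five most recent backups could be done along the following lines. This is a sketch with stated assumptions: `prune_backups` is a hypothetical name, "most recent" is taken to mean newest by modification time (the playbook may order differently), and backup file names are assumed to contain no whitespace, which holds for SemVer tags and UTC timestamps.

```shell
# prune_backups INSTALL_DIR BINARY [KEEP]: delete all but the KEEP newest
# <binary>.prev-* files, ordered by modification time (newest first).
prune_backups() {
  keep=${3:-5}
  ls -1t "$1/$2".prev-* 2>/dev/null | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do
      rm -f -- "$f"
    done
}

# Usage:
#   prune_backups /usr/local/bin stellarindex-api
```

`<binary>.failed-*` files are not matched by the glob, so binaries preserved for post-mortem are left alone.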

To roll back manually (workflow path is preferred — see release-process.md §Rollback):

ssh root@<host> "
  set -e  # abort on the first failing step
  systemctl stop stellarindex-api
  mv /usr/local/bin/stellarindex-api /tmp/bad-stellarindex-api
  cp /usr/local/bin/stellarindex-api.prev-v0.1.3 /usr/local/bin/stellarindex-api
  echo v0.1.3 > /var/lib/stellarindex/deployed-versions/stellarindex-api
  systemctl start stellarindex-api
"

Then re-run gh workflow run deploy.yml -f version=v0.1.3 … to get the workflow's state back in sync (idempotent — it'll be a no-op if the sidecar already says v0.1.3 and the binary is healthy).

Failure modes

Symptom	Likely cause	Fix
Workflow fails at "Validate inputs"	`version` doesn't match SemVer	Re-run with a valid `vX.Y.Z` tag
"Region <X> host secret is unset"	`<REGION>_HOST` not configured in GitHub Secrets	Add the secret per §Per-region setup
"Bad binary preserved at …failed-vX.Y.Z"	New binary failed health probe; rolled back	Inspect `/usr/local/bin/<binary>.failed-<v>` on the host; `journalctl -u <binary> -n 200` shows why
SSH timeout / "permission denied"	Stale key, removed `authorized_keys` entry, host firewall change	Verify `DEPLOY_SSH_PRIVATE_KEY` is current; SSH manually from a known-good box
"Post-deploy version mismatch" (future check)	Currently disabled — no `--version` flag on binaries	Track in launch-readiness backlog

Cross-references

`docs/operations/release-process.md` — the cut-tag side of the pipeline
`docs/architecture/semver-policy.md` — version tag rules
`.github/workflows/release.yml` — produces the artefacts this consumes
`.github/workflows/deploy.yml` — the workflow itself
`configs/ansible/playbooks/deploy-binary.yml` — top-level playbook
`configs/ansible/tasks/deploy-one-binary.yml` — per-binary task list