twcloud-scaler/agents.md
Sergey Vanyushkin 191cdd108f feat: add Timeweb Cloud provider for Woodpecker CI autoscaler
- Implement timewebcloud provider with DeployAgent, RemoveAgent, ListDeployedAgentNames
- Add minimal HTTP API client for Timeweb Cloud (create/list/delete servers)
- Register provider in main.go with CLI flags
- Add timeweb-list and timeweb-tester utilities
- Include Dockerfile and docker-compose.yml for deployment
- Update DEPLOY.md with verified OS/preset IDs
2026-05-16 13:09:07 +03:00


# Project: Woodpecker CI Autoscaler — Timeweb Cloud Provider
## Goal
Add a Timeweb Cloud provider to the Woodpecker CI autoscaler so that:
1. The Woodpecker server runs permanently on one VDS.
2. When a CI job appears, the autoscaler dynamically creates a new VDS on Timeweb Cloud.
3. The VDS is bootstrapped via cloud-init, connects to the server as an agent, and runs the job.
4. After the job finishes and the idle timeout expires, the VDS is destroyed.
## Background
### Current Setup
- Woodpecker server and agent run permanently on a single VDS via Docker Compose.
- The goal is to move to a dynamic model where agents are created on demand.
### Woodpecker CI Autoscaler Architecture
- **Repository**: `woodpecker-ci/autoscaler` (separate from the main `woodpecker-ci/woodpecker` repo).
- **Language**: Go.
- **Provider Interface** (3 methods):
```go
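type Provider interface {
	// DeployAgent creates the VM that will run the given agent.
	DeployAgent(context.Context, *woodpecker.Agent) error
	// RemoveAgent terminates the VM that belongs to the given agent.
	RemoveAgent(context.Context, *woodpecker.Agent) error
	// ListDeployedAgentNames returns the names of the agents this provider has deployed.
	ListDeployedAgentNames(context.Context) ([]string, error)
}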
```
- **Provisioning Flow**:
  1. The autoscaler monitors the Woodpecker queue.
  2. When pending tasks exceed capacity, it calls `AgentCreate()` to get a token, then `DeployAgent()`.
  3. `DeployAgent` creates a VM and passes cloud-init user-data.
  4. The VM boots, installs Docker, and runs the Woodpecker agent container via docker compose.
  5. The agent connects to the server via gRPC using the provided token.
  6. On scale-down, `RemoveAgent()` terminates the VM, and the agent is deleted from Woodpecker.
- **Cloud-init**: The autoscaler generates a cloud-init YAML that installs Docker and starts the agent. Custom templates are supported via `WOODPECKER_PROVIDER_USERDATA` / `WOODPECKER_PROVIDER_USERDATA_FILE`.
- **Agent Environment Variables** (set in cloud-init):
  - `WOODPECKER_SERVER`: gRPC address of the server.
  - `WOODPECKER_AGENT_SECRET`: token generated by `AgentCreate()`.
  - `WOODPECKER_MAX_WORKFLOWS`: parallelism per agent.
  - `WOODPECKER_GRPC_SECURE`: TLS flag.
- **Configuration**: The autoscaler uses `urfave/cli` for CLI flags. Providers define their own flags (e.g., `--hetznercloud-api-token`).
- **Registration**: To add a new provider, you must:
  1. Implement the `Provider` interface in a new package under `providers/<name>/`.
  2. Create a `flags.go` file with CLI flags.
  3. Import the package and add a case in `cmd/woodpecker-autoscaler/main.go`.
  4. Append the provider's flags to the global app flags.
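
The cloud-init step described above can be illustrated with Go's `text/template`. This is only a minimal sketch: the template text, the `agentEnv` type, the `/root/agent.env` path, and the `renderUserData` helper are hypothetical and far shorter than whatever the autoscaler actually ships. Only the four environment variable names are taken from the list above.

```go
package main

import (
	"bytes"
	"text/template"
)

// agentEnv holds the per-agent values named in the environment variable list.
// The struct itself is an assumption made for this sketch.
type agentEnv struct {
	Server       string // WOODPECKER_SERVER: gRPC address of the server
	AgentSecret  string // WOODPECKER_AGENT_SECRET: token from AgentCreate()
	MaxWorkflows int    // WOODPECKER_MAX_WORKFLOWS: parallelism per agent
	GRPCSecure   bool   // WOODPECKER_GRPC_SECURE: TLS flag
}

// userDataTemplate is a hypothetical, heavily shortened cloud-init document.
// It only writes an env file; installing Docker and starting the agent are omitted.
const userDataTemplate = `#cloud-config
write_files:
  - path: /root/agent.env
    permissions: "0600"
    content: |
      WOODPECKER_SERVER={{ .Server }}
      WOODPECKER_AGENT_SECRET={{ .AgentSecret }}
      WOODPECKER_MAX_WORKFLOWS={{ .MaxWorkflows }}
      WOODPECKER_GRPC_SECURE={{ .GRPCSecure }}
`

// renderUserData fills the template with the values for one agent.
func renderUserData(env agentEnv) (string, error) {
	tmpl, err := template.New("user-data").Parse(userDataTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, env); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```

The rendered string is what a provider would hand to the cloud API as user-data. A custom template supplied through `WOODPECKER_PROVIDER_USERDATA` would take the place of the constant here.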
### Timeweb Cloud API
- **Public API**: Yes — `https://api.timeweb.cloud`.
- **Official Go SDK**: `github.com/timeweb-cloud/sdk-go` (OpenAPI-generated).
- **Authentication**: JWT Bearer token (`Authorization: Bearer <token>`).
- **VDS Lifecycle Endpoints**:
  - Create: `POST /api/v1/servers`
  - Delete: `DELETE /api/v1/servers/{server_id}`
  - Get: `GET /api/v1/servers/{server_id}`
  - List: `GET /api/v1/servers`
  - Start: `POST /api/v1/servers/{server_id}/start`
  - Shutdown: `POST /api/v1/servers/{server_id}/shutdown`
  - Clone: `POST /api/v1/servers/{server_id}/clone`
- **Create Server Parameters**:
  - `name` (required)
  - `os_id` or `image_id`
  - `preset_id` or `configuration` (CPU, RAM, disk)
  - `ssh_keys_ids`
  - `cloud_init` (**this is critical** for passing user-data)
  - `availability_zone`
  - `hostname`
- **Rate Limit**: 20 requests per second per endpoint.
- **Tags/Labels**: The API does not seem to have a native "label" or "tag" system for servers. We may need to track pool association by server name prefix or by storing state locally. **This is an open question.**
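
As a rough illustration of the create call, the sketch below builds a `POST /api/v1/servers` request with the standard library only. The endpoint path, the Bearer header, and the JSON field names come from the lists above. The Go type and function names, the integer ID types, and the choice of optional fields are assumptions, and response parsing is left out because the response shape has not been verified here.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"net/http"
)

// createServerBody covers a subset of the create parameters listed above.
// Integer IDs are an assumption; check the API reference before relying on them.
type createServerBody struct {
	Name             string `json:"name"`
	OSID             int    `json:"os_id"`
	PresetID         int    `json:"preset_id"`
	SSHKeysIDs       []int  `json:"ssh_keys_ids,omitempty"`
	CloudInit        string `json:"cloud_init"`
	AvailabilityZone string `json:"availability_zone,omitempty"`
}

// encodeCreateServer serializes the request body as JSON.
func encodeCreateServer(b createServerBody) ([]byte, error) {
	return json.Marshal(b)
}

// newCreateServerRequest builds the authenticated request without sending it.
// baseURL would be "https://api.timeweb.cloud" in production.
func newCreateServerRequest(ctx context.Context, baseURL, token string, b createServerBody) (*http.Request, error) {
	payload, err := encodeCreateServer(b)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/api/v1/servers", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```

Keeping request construction separate from sending makes it easy to test against `httptest.NewServer` and to place a rate limiter in front of the client if the per-endpoint limit becomes a concern.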
## Implementation Plan
### Phase 1: Project Setup
1. Fork / vendor `woodpecker-ci/autoscaler` as the base.
2. Add `github.com/timeweb-cloud/sdk-go` as a dependency.
3. Create the provider package: `providers/timewebcloud/`.
### Phase 2: Provider Implementation
1. **Struct & Constructor** (`provider.go`):
   - Fields: API client, config, pool ID, default image/preset/zone.
   - `New(ctx, cli.Command, *config.Config) (types.Provider, error)`.
2. **Flags** (`flags.go`):
   - `--timewebcloud-api-token` (env: `WOODPECKER_TIMEWEBCLOUD_API_TOKEN`)
   - `--timewebcloud-os-id` / `--timewebcloud-image-id`
   - `--timewebcloud-preset-id` / `--timewebcloud-configuration`
   - `--timewebcloud-availability-zone`
   - `--timewebcloud-ssh-key-id`
   - `--timewebcloud-hostname-prefix`
3. **DeployAgent**:
   - Generate cloud-init user-data via `cloudinit.RenderUserDataTemplate()`.
   - Call `CreateServer` with the agent name and user-data.
   - Store the mapping `agent.Name -> server_id` (in memory or via naming convention).
4. **RemoveAgent**:
   - Find server by agent name (list all servers and filter by name, or use a stored mapping).
   - Call `DeleteServer`.
   - Handle "not found" gracefully.
5. **ListDeployedAgentNames**:
   - List all servers.
   - Filter by name prefix (e.g., `pool-<pool-id>-agent-`).
   - Return matching names.
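
The `RemoveAgent` steps above, including the graceful "not found" case, might look roughly like the following. The `serverAPI` interface, the `server` struct, the `errServerNotFound` sentinel, and the in-memory fake are all hypothetical stand-ins for whatever client the provider ends up using; none of them are SDK types.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errServerNotFound is a hypothetical sentinel the client returns for a missing server.
var errServerNotFound = errors.New("server not found")

// server is a minimal stand-in for the API's server object.
type server struct {
	ID   int
	Name string
}

// serverAPI is an assumed seam over the Timeweb client, kept small for testing.
type serverAPI interface {
	ListServers(ctx context.Context) ([]server, error)
	DeleteServer(ctx context.Context, id int) error
}

// removeAgentServer deletes the server with the given name.
// A server that is already gone is treated as success, so retries are safe.
func removeAgentServer(ctx context.Context, api serverAPI, serverName string) error {
	servers, err := api.ListServers(ctx)
	if err != nil {
		return fmt.Errorf("list servers: %w", err)
	}
	for _, s := range servers {
		if s.Name != serverName {
			continue
		}
		err := api.DeleteServer(ctx, s.ID)
		if err != nil && !errors.Is(err, errServerNotFound) {
			return fmt.Errorf("delete server %d: %w", s.ID, err)
		}
		return nil
	}
	return nil // no server with that name: nothing to do
}

// memoryAPI is an in-memory fake used to exercise removeAgentServer.
type memoryAPI struct{ servers map[int]string }

func (m *memoryAPI) ListServers(context.Context) ([]server, error) {
	out := make([]server, 0, len(m.servers))
	for id, name := range m.servers {
		out = append(out, server{ID: id, Name: name})
	}
	return out, nil
}

func (m *memoryAPI) DeleteServer(_ context.Context, id int) error {
	if _, ok := m.servers[id]; !ok {
		return errServerNotFound
	}
	delete(m.servers, id)
	return nil
}
```

Calling `removeAgentServer` twice for the same name returns `nil` both times, which matters because the engine may retry a removal after a partial failure.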
### Phase 3: Integration
1. Import the provider in `main.go`.
2. Add `case "timewebcloud":` to `setupProvider()`.
3. Append `timewebcloud.ProviderFlags` to the global flags.
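
The shape of the registration switch can be sketched without the real packages. Everything below is a stand-in: the `provider` interface replaces `types.Provider`, the two empty structs replace the real provider packages, and the actual `setupProvider()` in `main.go` takes the CLI command and config rather than a bare string.

```go
package main

import "fmt"

// provider is a stand-in for the autoscaler's types.Provider interface.
type provider interface{ Name() string }

// hetznerCloud and timewebCloud are empty placeholders for the real providers.
type hetznerCloud struct{}

func (hetznerCloud) Name() string { return "hetznercloud" }

type timewebCloud struct{}

func (timewebCloud) Name() string { return "timewebcloud" }

// setupProvider picks a provider by the value of the --provider flag.
// Adding a provider means adding one case here.
func setupProvider(name string) (provider, error) {
	switch name {
	case "hetznercloud":
		return hetznerCloud{}, nil
	case "timewebcloud":
		return timewebCloud{}, nil
	}
	return nil, fmt.Errorf("unknown provider: %s", name)
}
```

An unknown name returns an error instead of a nil provider, so a typo in `--provider` fails at startup rather than at the first scale-up.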
### Phase 4: Testing & Deployment
1. Build the binary.
2. Test locally or on a staging VDS:
   - Start the autoscaler with `--provider=timewebcloud`.
   - Trigger a CI job.
   - Verify VDS creation, agent connection, job execution, and cleanup.
3. Update Docker Compose / deployment docs.
## Key Technical Decisions
### 1. How to Track Agent-to-Server Mapping?
**Options**:
- **A. Name Prefix Convention**: Name servers as `wp-<pool>-<agent-name>`. `ListDeployedAgentNames` filters by prefix. Simple, no state needed.
- **B. In-Memory Map**: Store `map[string]int` (agent name -> server ID) in the provider struct. Lost on restart.
- **C. Local State File**: Persist the map to disk. Survives restart.
- **D. API Metadata**: If Timeweb API supports tags/labels, use them. (Currently unclear.)
**Recommendation**: Start with **A** (name prefix). It keeps no state of its own, so the mapping survives autoscaler restarts, and it needs only the server list endpoint. If the API turns out to support tags or labels, migrate to **D**.
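
Option A can be sketched as a pair of pure functions. The `wp-<pool>-<agent-name>` format is the one proposed above. The function names and the integer pool ID are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// serverName builds the server name for an agent in a pool: wp-<pool>-<agent-name>.
func serverName(poolID int, agentName string) string {
	return fmt.Sprintf("wp-%d-%s", poolID, agentName)
}

// agentNamesInPool returns the agent names encoded in the server names that
// belong to the given pool, skipping every server that does not match.
func agentNamesInPool(poolID int, serverNames []string) []string {
	// The trailing dash keeps pool 1 from matching servers of pool 12.
	prefix := fmt.Sprintf("wp-%d-", poolID)
	var agents []string
	for _, name := range serverNames {
		if agent, ok := strings.CutPrefix(name, prefix); ok && agent != "" {
			agents = append(agents, agent)
		}
	}
	return agents
}
```

For example, given the server names `wp-1-agent-a`, `wp-12-agent-b`, and `db-primary`, `agentNamesInPool(1, ...)` returns only `agent-a`. Because the mapping lives in the name itself, it relies on nobody renaming agent servers by hand.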
### 2. How to Handle Server Readiness?
**Question**: After `CreateServer`, the server may take time to boot. Does `DeployAgent` need to wait?
**Answer**: No. The autoscaler engine only requires that the VM creation is initiated. The agent will connect when ready. The engine has `AgentInactivityTimeout` (default 10m) to clean up agents that never connect.
### 3. OS Image Selection
**Question**: What base image should be used for the agent VMs?
**Answer**: Ubuntu 22.04 LTS or Debian 12 (stable, good Docker support). The `os_id` must be fetched from Timeweb's API (`GetOsList`). Alternatively, a custom image with Docker pre-installed could speed up boot time.
### 4. SSH Keys
**Question**: Are SSH keys needed if we use cloud-init?
**Answer**: No. Cloud-init performs the whole bootstrap, so SSH keys are optional. They are still useful for debugging a VM that fails to register, so the provider should allow configuring `ssh_keys_ids`.
## Open Questions
1. Does Timeweb Cloud API support assigning custom tags/labels to servers? (Affects `ListDeployedAgentNames` implementation.)
2. What is the typical boot time for a new VDS? (Affects `AgentInactivityTimeout` tuning.)
3. Does the `cloud_init` field in `CreateServer` accept standard cloud-init YAML? (Needs testing.)
4. Is there a way to use a custom image (snapshot) to pre-install Docker and reduce boot time?
5. What are the `os_id` values for Ubuntu/Debian? (Need to call `GetOsList`.)
6. Does Timeweb charge for stopped (but not deleted) servers? (Affects whether we should stop vs. delete.)
## References
- Woodpecker Autoscaler Repo: `https://github.com/woodpecker-ci/autoscaler`
- Provider Interface: `engine/types/provider.go`
- Hetzner Provider (reference): `providers/hetznercloud/`
- Cloud-init Render: `engine/inits/cloudinit/cloudinit.go`
- Timeweb Cloud Go SDK: `https://github.com/timeweb-cloud/sdk-go`
- Timeweb Cloud API Docs: `https://timeweb.cloud/api-docs`