# Project: Woodpecker CI Autoscaler - Timeweb Cloud Provider

## Goal

Add a Timeweb Cloud provider to the Woodpecker CI autoscaler so that:

1. The Woodpecker server runs permanently on one VDS.
2. When a CI job appears, the autoscaler dynamically creates a new VDS on Timeweb Cloud.
3. The VDS is bootstrapped via cloud-init, connects to the server as an agent, and runs the job.
4. After the job finishes and the idle timeout expires, the VDS is destroyed.

## Background

### Current Setup

- Woodpecker server and agent run permanently on a single VDS via Docker Compose.
- The goal is to move to a dynamic model where agents are created on demand.

### Woodpecker CI Autoscaler Architecture

- **Repository**: `woodpecker-ci/autoscaler` (separate from the main `woodpecker-ci/woodpecker` repo).
- **Language**: Go.
- **Provider Interface** (3 methods):

  ```go
  type Provider interface {
  	DeployAgent(context.Context, *woodpecker.Agent) error
  	RemoveAgent(context.Context, *woodpecker.Agent) error
  	ListDeployedAgentNames(context.Context) ([]string, error)
  }
  ```

- **Provisioning Flow**:
  1. Autoscaler monitors the Woodpecker queue.
  2. When pending tasks exceed capacity, it calls `AgentCreate()` to get a token, then `DeployAgent()`.
  3. `DeployAgent` creates a VM and passes cloud-init user-data.
  4. The VM boots, installs Docker, and runs the Woodpecker agent container via docker compose.
  5. The agent connects to the server via gRPC using the provided token.
  6. On scale-down, `RemoveAgent()` terminates the VM, and the agent is deleted from Woodpecker.
- **Cloud-init**: The autoscaler generates a cloud-init YAML that installs Docker and starts the agent. Custom templates are supported via `WOODPECKER_PROVIDER_USERDATA` / `WOODPECKER_PROVIDER_USERDATA_FILE`.
- **Agent Environment Variables** (set in cloud-init):
  - `WOODPECKER_SERVER`: gRPC address of the server.
  - `WOODPECKER_AGENT_SECRET`: token generated by `AgentCreate()`.
  - `WOODPECKER_MAX_WORKFLOWS`: parallelism per agent.
  - `WOODPECKER_GRPC_SECURE`: TLS flag.
- **Configuration**: The autoscaler uses `urfave/cli` for CLI flags. Providers define their own flags (e.g., `--hetznercloud-api-token`).
- **Registration**: To add a new provider, you must:
  1. Implement the `Provider` interface in a new package under `providers//`.
  2. Create a `flags.go` file with CLI flags.
  3. Import the package and add a case in `cmd/woodpecker-autoscaler/main.go`.
  4. Append the provider's flags to the global app flags.

### Timeweb Cloud API

- **Public API**: Yes, at `https://api.timeweb.cloud`.
- **Official Go SDK**: `github.com/timeweb-cloud/sdk-go` (OpenAPI-generated).
- **Authentication**: JWT Bearer token (`Authorization: Bearer `).
- **VDS Lifecycle Endpoints**:
  - Create: `POST /api/v1/servers`
  - Delete: `DELETE /api/v1/servers/{server_id}`
  - Get: `GET /api/v1/servers/{server_id}`
  - List: `GET /api/v1/servers`
  - Start: `POST /api/v1/servers/{server_id}/start`
  - Shutdown: `POST /api/v1/servers/{server_id}/shutdown`
  - Clone: `POST /api/v1/servers/{server_id}/clone`
- **Create Server Parameters**:
  - `name` (required)
  - `os_id` or `image_id`
  - `preset_id` or `configuration` (CPU, RAM, disk)
  - `ssh_keys_ids`
  - `cloud_init`: **this is critical** for passing user-data.
  - `availability_zone`
  - `hostname`
- **Rate Limit**: 20 requests per second per endpoint.
- **Tags/Labels**: The API does not seem to have a native "label" or "tag" system for servers. We may need to track pool association by server name prefix or by storing state locally. **This is an open question.**

## Implementation Plan

### Phase 1: Project Setup

1. Fork / vendor `woodpecker-ci/autoscaler` as the base.
2. Add `github.com/timeweb-cloud/sdk-go` as a dependency.
3. Create the provider package: `providers/timewebcloud/`.

### Phase 2: Provider Implementation

1. **Struct & Constructor** (`provider.go`):
   - Fields: API client, config, pool ID, default image/preset/zone.
   - `New(ctx, cli.Command, *config.Config) (types.Provider, error)`.
2. **Flags** (`flags.go`):
   - `--timewebcloud-api-token` (env: `WOODPECKER_TIMEWEBCLOUD_API_TOKEN`)
   - `--timewebcloud-os-id` / `--timewebcloud-image-id`
   - `--timewebcloud-preset-id` / `--timewebcloud-configuration`
   - `--timewebcloud-availability-zone`
   - `--timewebcloud-ssh-key-id`
   - `--timewebcloud-hostname-prefix`
3. **DeployAgent**:
   - Generate cloud-init user-data via `cloudinit.RenderUserDataTemplate()`.
   - Call `CreateServer` with the agent name and user-data.
   - Store the mapping `agent.Name -> server_id` (in memory or via naming convention).
4. **RemoveAgent**:
   - Find server by agent name (list all servers and filter by name, or use a stored mapping).
   - Call `DeleteServer`.
   - Handle "not found" gracefully.
5. **ListDeployedAgentNames**:
   - List all servers.
   - Filter by name prefix (e.g., `pool--agent-`).
   - Return matching names.

### Phase 3: Integration

1. Import the provider in `main.go`.
2. Add `case "timewebcloud":` to `setupProvider()`.
3. Append `timewebcloud.ProviderFlags` to the global flags.

### Phase 4: Testing & Deployment

1. Build the binary.
2. Test locally or on a staging VDS:
   - Start the autoscaler with `--provider=timewebcloud`.
   - Trigger a CI job.
   - Verify VDS creation, agent connection, job execution, and cleanup.
3. Update Docker Compose / deployment docs.

## Key Technical Decisions

### 1. How to Track Agent-to-Server Mapping?

**Options**:

- **A. Name Prefix Convention**: Name servers as `wp--`. `ListDeployedAgentNames` filters by prefix. Simple, no state needed.
- **B. In-Memory Map**: Store `map[string]int` (agent name -> server ID) in the provider struct. Lost on restart.
- **C. Local State File**: Persist the map to disk. Survives restart.
- **D. API Metadata**: If Timeweb API supports tags/labels, use them. (Currently unclear.)

**Recommendation**: Start with **A** (name prefix) as the simplest and most robust approach. If Timeweb adds tags later, migrate to **D**.

### 2. How to Handle Server Readiness?

**Question**: After `CreateServer`, the server may take time to boot. Does `DeployAgent` need to wait?

**Answer**: No. The autoscaler engine only requires that the VM creation is initiated. The agent will connect when ready. The engine has `AgentInactivityTimeout` (default 10m) to clean up agents that never connect.

### 3. OS Image Selection

**Question**: What base image should be used for the agent VMs?

**Answer**: Ubuntu 22.04 LTS or Debian 12 (stable, good Docker support). The `os_id` must be fetched from Timeweb's API (`GetOsList`). Alternatively, a custom image with Docker pre-installed could speed up boot time.

### 4. SSH Keys

**Question**: Are SSH keys needed if we use cloud-init?

**Answer**: Cloud-init handles everything. SSH keys are optional but useful for debugging. The provider should allow configuring `ssh_keys_ids`.

## Open Questions

1. Does Timeweb Cloud API support assigning custom tags/labels to servers? (Affects `ListDeployedAgentNames` implementation.)
2. What is the typical boot time for a new VDS? (Affects `AgentInactivityTimeout` tuning.)
3. Does the `cloud_init` field in `CreateServer` accept standard cloud-init YAML? (Needs testing.)
4. Is there a way to use a custom image (snapshot) to pre-install Docker and reduce boot time?
5. What are the `os_id` values for Ubuntu/Debian? (Need to call `GetOsList`.)
6. Does Timeweb charge for stopped (but not deleted) servers? (Affects whether we should stop vs. delete.)

## References

- Woodpecker Autoscaler Repo: `https://github.com/woodpecker-ci/autoscaler`
- Provider Interface: `engine/types/provider.go`
- Hetzner Provider (reference): `providers/hetznercloud/`
- Cloud-init Render: `engine/inits/cloudinit/cloudinit.go`
- Timeweb Cloud Go SDK: `https://github.com/timeweb-cloud/sdk-go`
- Timeweb Cloud API Docs: `https://timeweb.cloud/api-docs`
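
As an illustration of option A from Key Technical Decisions (name prefix convention), the filtering and lookup logic might look roughly like the sketch below. It is not taken from the autoscaler or the Timeweb SDK: the `server` type, both helper functions, and the example prefix are hypothetical stand-ins, and the sketch assumes that each server is named exactly after its agent and that the SDK's list call can be mapped to a slice of ID/name pairs.

```go
package main

import "strings"

// server is a hypothetical, trimmed-down stand-in for an entry returned by
// the Timeweb "list servers" call; only the fields needed here are kept.
type server struct {
	ID   int
	Name string
}

// deployedAgentNames returns the names of all servers whose name starts with
// prefix. This is the list a ListDeployedAgentNames implementation could
// return, assuming server names equal agent names.
func deployedAgentNames(servers []server, prefix string) []string {
	names := make([]string, 0, len(servers))
	for _, s := range servers {
		if strings.HasPrefix(s.Name, prefix) {
			names = append(names, s.Name)
		}
	}
	return names
}

// findServerID looks up a server by exact name, as RemoveAgent would need
// before calling delete. The boolean is false when nothing matches, so the
// caller can treat "not found" as already removed instead of an error.
func findServerID(servers []server, name string) (int, bool) {
	for _, s := range servers {
		if s.Name == name {
			return s.ID, true
		}
	}
	return 0, false
}
```

If the prefix ends with a separator (for example a hypothetical `pool-1-agent-`), a server belonging to pool 10 is not mistaken for one belonging to pool 1. Servers created by hand with a matching name would still be picked up, which is one reason to revisit option D if the API turns out to support tags.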