feat: add Timeweb Cloud provider for Woodpecker CI autoscaler

- Implement timewebcloud provider with DeployAgent, RemoveAgent, ListDeployedAgentNames
- Add minimal HTTP API client for Timeweb Cloud (create/list/delete servers)
- Register provider in main.go with CLI flags
- Add timeweb-list and timeweb-tester utilities
- Include Dockerfile and docker-compose.yml for deployment
- Update DEPLOY.md with verified OS/preset IDs

`agents.md` (new file, 151 lines)

# Project: Timeweb Cloud Provider for the Woodpecker CI Autoscaler

## Goal

Add a Timeweb Cloud provider to the Woodpecker CI autoscaler so that:

1. The Woodpecker server runs permanently on one VDS.
2. When a CI job appears, the autoscaler dynamically creates a new VDS on Timeweb Cloud.
3. The VDS is bootstrapped via cloud-init, connects to the server as an agent, and runs the job.
4. After the job finishes and the idle timeout expires, the VDS is destroyed.

## Background

### Current Setup

- Woodpecker server and agent run permanently on a single VDS via Docker Compose.
- The goal is to move to a dynamic model where agents are created on demand.

### Woodpecker CI Autoscaler Architecture

- **Repository**: `woodpecker-ci/autoscaler` (separate from the main `woodpecker-ci/woodpecker` repo).
- **Language**: Go.
- **Provider Interface** (3 methods):

  ```go
  type Provider interface {
  	// DeployAgent creates the VM that will run the given agent.
  	DeployAgent(context.Context, *woodpecker.Agent) error
  	// RemoveAgent destroys the VM backing the given agent.
  	RemoveAgent(context.Context, *woodpecker.Agent) error
  	// ListDeployedAgentNames returns the names of agents this provider has deployed.
  	ListDeployedAgentNames(context.Context) ([]string, error)
  }
  ```

- **Provisioning Flow**:
  1. Autoscaler monitors the Woodpecker queue.
  2. When pending tasks exceed capacity, it calls `AgentCreate()` to get a token, then `DeployAgent()`.
  3. `DeployAgent` creates a VM and passes cloud-init user-data.
  4. The VM boots, installs Docker, and runs the Woodpecker agent container via Docker Compose.
  5. The agent connects to the server via gRPC using the provided token.
  6. On scale-down, `RemoveAgent()` terminates the VM, and the agent is deleted from Woodpecker.
- **Cloud-init**: The autoscaler generates a cloud-init YAML document that installs Docker and starts the agent. Custom templates are supported via `WOODPECKER_PROVIDER_USERDATA` / `WOODPECKER_PROVIDER_USERDATA_FILE`.
- **Agent Environment Variables** (set in cloud-init):
  - `WOODPECKER_SERVER`: gRPC address of the server.
  - `WOODPECKER_AGENT_SECRET`: token generated by `AgentCreate()`.
  - `WOODPECKER_MAX_WORKFLOWS`: number of workflows the agent runs in parallel.
  - `WOODPECKER_GRPC_SECURE`: whether the gRPC connection uses TLS.
- **Configuration**: The autoscaler uses `urfave/cli` for CLI flags. Each provider defines its own flags (e.g., `--hetznercloud-api-token`).
- **Registration**: To add a new provider:
  1. Implement the `Provider` interface in a new package under `providers/<name>/`.
  2. Create a `flags.go` file with the provider's CLI flags.
  3. Import the package and add a case in `cmd/woodpecker-autoscaler/main.go`.
  4. Append the provider's flags to the global app flags.

### Timeweb Cloud API

- **Public API**: Yes, at `https://api.timeweb.cloud`.
- **Official Go SDK**: `github.com/timeweb-cloud/sdk-go` (OpenAPI-generated).
- **Authentication**: JWT Bearer token (`Authorization: Bearer <token>`).
- **VDS Lifecycle Endpoints**:
  - Create: `POST /api/v1/servers`
  - Delete: `DELETE /api/v1/servers/{server_id}`
  - Get: `GET /api/v1/servers/{server_id}`
  - List: `GET /api/v1/servers`
  - Start: `POST /api/v1/servers/{server_id}/start`
  - Shutdown: `POST /api/v1/servers/{server_id}/shutdown`
  - Clone: `POST /api/v1/servers/{server_id}/clone`
- **Create Server Parameters**:
  - `name` (required)
  - `os_id` or `image_id`
  - `preset_id` or `configuration` (CPU, RAM, disk)
  - `ssh_keys_ids`
  - `cloud_init`: **critical**, since this is how user-data reaches the VDS.
  - `availability_zone`
  - `hostname`
- **Rate Limit**: 20 requests per second per endpoint.
- **Tags/Labels**: The API does not appear to offer a native label or tag system for servers, so pool membership may have to be tracked by server name prefix or by storing state locally. **This is an open question.**

## Implementation Plan

### Phase 1: Project Setup

1. Fork / vendor `woodpecker-ci/autoscaler` as the base.
2. Add `github.com/timeweb-cloud/sdk-go` as a dependency.
3. Create the provider package: `providers/timewebcloud/`.

### Phase 2: Provider Implementation

1. **Struct & Constructor** (`provider.go`):
   - Fields: API client, config, pool ID, default image/preset/zone.
   - `New(ctx, cli.Command, *config.Config) (types.Provider, error)`.
2. **Flags** (`flags.go`):
   - `--timewebcloud-api-token` (env: `WOODPECKER_TIMEWEBCLOUD_API_TOKEN`)
   - `--timewebcloud-os-id` / `--timewebcloud-image-id`
   - `--timewebcloud-preset-id` / `--timewebcloud-configuration`
   - `--timewebcloud-availability-zone`
   - `--timewebcloud-ssh-key-id`
   - `--timewebcloud-hostname-prefix`
3. **DeployAgent**:
   - Generate cloud-init user-data via `cloudinit.RenderUserDataTemplate()`.
   - Call `CreateServer` with the agent name and user-data.
   - Store the mapping `agent.Name -> server_id` (in memory or via naming convention).
4. **RemoveAgent**:
   - Find the server by agent name (list all servers and filter by name, or use a stored mapping).
   - Call `DeleteServer`.
   - Handle "not found" gracefully.
5. **ListDeployedAgentNames**:
   - List all servers.
   - Filter by name prefix (e.g., `pool-<pool-id>-agent-`).
   - Return matching names.

### Phase 3: Integration

1. Import the provider in `main.go`.
2. Add `case "timewebcloud":` to `setupProvider()`.
3. Append `timewebcloud.ProviderFlags` to the global flags.
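
The `setupProvider()` change in step 2 amounts to one more `switch` branch. The following is a self-contained sketch, not the actual `main.go`. It replaces the urfave/cli command with a hypothetical `map[string]string` of flag values and uses a placeholder provider type, so only the selection and validation logic is shown.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Provider mirrors the three-method interface, but takes agent names instead of
// *woodpecker.Agent so the sketch compiles on its own.
type Provider interface {
	DeployAgent(ctx context.Context, agentName string) error
	RemoveAgent(ctx context.Context, agentName string) error
	ListDeployedAgentNames(ctx context.Context) ([]string, error)
}

// timewebProvider is a placeholder for the real providers/timewebcloud type.
type timewebProvider struct{ token string }

func (p *timewebProvider) DeployAgent(context.Context, string) error { return nil }
func (p *timewebProvider) RemoveAgent(context.Context, string) error { return nil }
func (p *timewebProvider) ListDeployedAgentNames(context.Context) ([]string, error) {
	return nil, nil
}

// setupProvider selects a provider by the --provider value. The flags map is a
// hypothetical stand-in for reading values from the urfave/cli command.
func setupProvider(name string, flags map[string]string) (Provider, error) {
	switch name {
	case "timewebcloud":
		token := flags["timewebcloud-api-token"]
		if token == "" {
			return nil, errors.New("timewebcloud: --timewebcloud-api-token is required")
		}
		return &timewebProvider{token: token}, nil
	default:
		return nil, fmt.Errorf("unknown provider %q", name)
	}
}
```

Failing fast on a missing token at startup surfaces configuration errors before the first scale-up attempt, rather than on the first `CreateServer` call.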

### Phase 4: Testing & Deployment

1. Build the binary.
2. Test locally or on a staging VDS:
   - Start the autoscaler with `--provider=timewebcloud`.
   - Trigger a CI job.
   - Verify VDS creation, agent connection, job execution, and cleanup.
3. Update Docker Compose / deployment docs.

## Key Technical Decisions

### 1. How to Track Agent-to-Server Mapping?

**Options**:

- **A. Name Prefix Convention**: Name servers as `wp-<pool>-<agent-name>`. `ListDeployedAgentNames` filters by prefix. Simple, no state needed.
- **B. In-Memory Map**: Store `map[string]int` (agent name -> server ID) in the provider struct. Lost on restart.
- **C. Local State File**: Persist the map to disk. Survives restart.
- **D. API Metadata**: If Timeweb API supports tags/labels, use them. (Currently unclear.)

**Recommendation**: Start with **A** (name prefix) as the simplest and most robust approach. If Timeweb adds tags later, migrate to **D**.
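
Option A reduces to a pair of pure functions:

- `serverName` builds the server name.
- `agentNameFromServer` recovers the agent name and rejects servers from other pools.

Both names are hypothetical. The sketch assumes pool IDs never contain `-`, and that Timeweb accepts server names of this shape and length (not verified).

```go
package main

import (
	"fmt"
	"strings"
)

// serverName encodes option A's convention: wp-<pool>-<agent-name>.
func serverName(pool, agent string) string {
	return fmt.Sprintf("wp-%s-%s", pool, agent)
}

// agentNameFromServer reverses serverName. It reports false for servers that
// do not belong to the pool, so unrelated VDSs in the account are ignored.
func agentNameFromServer(pool, server string) (string, bool) {
	agent, ok := strings.CutPrefix(server, fmt.Sprintf("wp-%s-", pool))
	return agent, ok && agent != ""
}
```

The trailing `-` in the prefix keeps pool `1` from matching servers that belong to pool `12`.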

### 2. How to Handle Server Readiness?

**Question**: After `CreateServer`, the server may take time to boot. Does `DeployAgent` need to wait?

**Answer**: No. The autoscaler engine only requires that VM creation is initiated; the agent connects once it is ready. The engine's `AgentInactivityTimeout` (default 10m) cleans up agents that never connect.

### 3. OS Image Selection

**Question**: What base image should be used for the agent VMs?

**Answer**: Ubuntu 22.04 LTS or Debian 12 (stable, good Docker support). The `os_id` must be fetched from Timeweb's API (`GetOsList`). Alternatively, a custom image with Docker pre-installed could speed up boot time.
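
Rather than hard-coding `os_id`, the provider could pick it from the OS list at startup. The sketch below works on an already-fetched list. The `osImage` fields are an assumed shape, not the documented response format. `pickOS` is a hypothetical helper, and any IDs in example data are placeholders rather than real Timeweb IDs.

```go
package main

import (
	"errors"
	"strings"
)

// osImage is an assumed shape for one entry of the OS list; check the real
// field names in the Timeweb Cloud API docs before relying on them.
type osImage struct {
	ID      int
	Name    string // e.g. "ubuntu"
	Version string // e.g. "22.04"
}

// pickOS returns the ID of the first image whose name matches
// (case-insensitively) and whose version matches exactly.
func pickOS(images []osImage, name, version string) (int, error) {
	for _, img := range images {
		if strings.EqualFold(img.Name, name) && img.Version == version {
			return img.ID, nil
		}
	}
	return 0, errors.New("no matching OS image: " + name + " " + version)
}
```

For example, `pickOS(images, "ubuntu", "22.04")` returns the ID to pass as `os_id`, or an error if the image is no longer offered. Failing at startup is preferable to creating servers with a stale ID.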

### 4. SSH Keys

**Question**: Are SSH keys needed if we use cloud-init?

**Answer**: No. Cloud-init performs all provisioning, so no SSH access is required. SSH keys are still useful for debugging, so the provider should allow configuring `ssh_keys_ids`.

## Open Questions

1. Does Timeweb Cloud API support assigning custom tags/labels to servers? (Affects `ListDeployedAgentNames` implementation.)
2. What is the typical boot time for a new VDS? (Affects `AgentInactivityTimeout` tuning.)
3. Does the `cloud_init` field in `CreateServer` accept standard cloud-init YAML? (Needs testing.)
4. Is there a way to use a custom image (snapshot) to pre-install Docker and reduce boot time?
5. What are the `os_id` values for Ubuntu/Debian? (Need to call `GetOsList`.)
6. Does Timeweb charge for stopped (but not deleted) servers? (Affects whether we should stop vs. delete.)

## References

- Woodpecker Autoscaler Repo: `https://github.com/woodpecker-ci/autoscaler`
- Provider Interface: `engine/types/provider.go`
- Hetzner Provider (reference): `providers/hetznercloud/`
- Cloud-init Render: `engine/inits/cloudinit/cloudinit.go`
- Timeweb Cloud Go SDK: `https://github.com/timeweb-cloud/sdk-go`
- Timeweb Cloud API Docs: `https://timeweb.cloud/api-docs`