[BUG] Cannot start worker on docker-compose setup #1804

Open
@swertz

Describe the issue

When using a self-hosted Hatchet environment with the docker-compose configuration suggested on the website, I cannot start a worker that runs outside of Docker. It fails with this error:

[DEBUG]	🪓 -- 2025-06-03 13:25:05,939 - Retrying <function AdminClient.put_workflow at 0x103f20860>: attempt 1 ended with: <Future at 0x12a99cf80 state=finished raised _InactiveRpcError>
[DEBUG]	🪓 -- 2025-06-03 13:25:07,356 - Retrying <function AdminClient.put_workflow at 0x103f20860>: attempt 2 ended with: <Future at 0x12a822180 state=finished raised _InactiveRpcError>
[DEBUG]	🪓 -- 2025-06-03 13:25:10,091 - Retrying <function AdminClient.put_workflow at 0x103f20860>: attempt 3 ended with: <Future at 0x11fa93f80 state=finished raised _InactiveRpcError>
[DEBUG]	🪓 -- 2025-06-03 13:25:14,697 - Retrying <function AdminClient.put_workflow at 0x103f20860>: attempt 4 ended with: <Future at 0x12a97ac00 state=finished raised _InactiveRpcError>
[ERROR]	🪓 -- 2025-06-03 13:25:23,434 - failed to register workflow: Yahoo bronze
[ERROR]	🪓 -- 2025-06-03 13:25:23,435 - <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "DNS resolution failed for 7077: C-ares status is not ARES_SUCCESS qtype=A name=7077 is_balancer=0: Domain name not found"
	debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2025-06-03T13:25:23.432937+02:00", grpc_status:14, grpc_message:"DNS resolution failed for 7077: C-ares status is not ARES_SUCCESS qtype=A name=7077 is_balancer=0: Domain name not found"}"
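
The `name=7077` in the resolver message suggests that the gRPC target was the bare string `7077`, so the port number ended up being looked up as a hostname. A minimal pre-flight check, assuming the client expects a `host:port` string, could look like the sketch below. The helper name `split_host_port` is hypothetical and not part of the Hatchet SDK.

```python
def split_host_port(target: str) -> tuple[str, int]:
    """Split a 'host:port' string, rejecting values that lack a host part.

    Hypothetical helper for illustration; not a Hatchet SDK function.
    """
    host, sep, port = target.rpartition(":")
    if not sep or not host:
        # A bare value such as "7077" has no ':' separator, so a resolver
        # would treat the whole string as a hostname.
        raise ValueError(f"expected 'host:port', got {target!r}")
    if not port.isdigit():
        raise ValueError(f"port must be numeric, got {port!r}")
    return host, int(port)
```

Calling `split_host_port("7077")` raises `ValueError`, while `split_host_port("localhost:7077")` returns the host and port separately.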

I don't see anything suspicious in the logs of the engine or dashboard.

Environment

  • SDK: Python v1.10.3
  • Engine: self-hosted latest docker

Expected behavior

I expected the worker to start. A quick test with the hatchet-lite setup worked fine.

Code to Reproduce, Logs, or Screenshots

The code used to start the worker:

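from hatchet_sdk import Hatchet

from my_project.tasks import task

# Assumption: the original snippet does not show where `hatchet` is defined,
# so a default client (configured from the environment) is created here.
hatchet = Hatchet()
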

def main() -> None:
    worker = hatchet.worker("test-worker", workflows=[task])
    worker.start()


if __name__ == "__main__":
    main()

My Docker Compose file is below. The only change from the suggested configuration is that it uses Postgres as the message queue:

services:
  db:
    init: true
    restart: always
    image: ${POSTGRES_IMAGE}
    expose:
      - 5432
    ports:
      - 10101:5432
    env_file:
      - .env
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d hatchet -U hatchet"]
      interval: 10s
      timeout: 10s
      retries: 5
      start_period: 10s

  migration:
    image: ghcr.io/hatchet-dev/hatchet/hatchet-migrate:latest
    command: /hatchet/hatchet-migrate
    environment:
      DATABASE_URL: "postgres://hatchet:hatchet@db:5432/hatchet"
    depends_on:
      db:
        condition: service_healthy

  setup-config:
    image: ghcr.io/hatchet-dev/hatchet/hatchet-admin:latest
    command: /hatchet/hatchet-admin quickstart --skip certs --generated-config-dir /hatchet/config --overwrite=false
    environment:
      DATABASE_URL: "postgres://hatchet:hatchet@db:5432/hatchet"
      SERVER_AUTH_COOKIE_DOMAIN: localhost:8080
      SERVER_AUTH_COOKIE_INSECURE: "t"
      SERVER_GRPC_BIND_ADDRESS: "0.0.0.0"
      SERVER_GRPC_INSECURE: "t"
      SERVER_GRPC_BROADCAST_ADDRESS: localhost:7077
      SERVER_DEFAULT_ENGINE_VERSION: "V1"
      SERVER_INTERNAL_CLIENT_INTERNAL_GRPC_BROADCAST_ADDRESS: hatchet-engine:7077
      SERVER_MSGQUEUE_KIND: postgres
    volumes:
      - hatchet_certs:/hatchet/certs
      - hatchet_config:/hatchet/config
    depends_on:
      migration:
        condition: service_completed_successfully

  hatchet-engine:
    image: ghcr.io/hatchet-dev/hatchet/hatchet-engine:latest
    command: /hatchet/hatchet-engine --config /hatchet/config
    restart: on-failure
    depends_on:
      setup-config:
        condition: service_completed_successfully
      migration:
        condition: service_completed_successfully
    ports:
      - "7077:7070"
    environment:
      DATABASE_URL: "postgres://hatchet:hatchet@db:5432/hatchet"
      SERVER_GRPC_BIND_ADDRESS: "0.0.0.0"
      SERVER_GRPC_INSECURE: "t"
    volumes:
      - hatchet_certs:/hatchet/certs
      - hatchet_config:/hatchet/config
 
  hatchet-dashboard:
    image: ghcr.io/hatchet-dev/hatchet/hatchet-dashboard:latest
    command: sh ./entrypoint.sh --config /hatchet/config
    ports:
      - 8080:80
    restart: on-failure
    depends_on:
      setup-config:
        condition: service_completed_successfully
      migration:
        condition: service_completed_successfully
    environment:
      DATABASE_URL: "postgres://hatchet:hatchet@db:5432/hatchet"
    volumes:
      - hatchet_certs:/hatchet/certs
      - hatchet_config:/hatchet/config

volumes:
  postgres-data:
  hatchet_config:
  hatchet_certs:
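
To see which address the generated token points the worker at, one option is to decode the token payload. This is a sketch that assumes the token is a JWT and that its payload carries a claim such as `grpc_broadcast_address` reflecting `SERVER_GRPC_BROADCAST_ADDRESS`. Neither the token format nor the claim name is confirmed in this issue, and `jwt_claims` is a hypothetical helper.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature.

    Hypothetical helper for inspection only; do not use it for authentication.
    """
    payload = token.split(".")[1]
    # base64url segments omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


# Example (assumed claim name; reads the token from the environment):
#   import os
#   print(jwt_claims(os.environ["HATCHET_CLIENT_TOKEN"]).get("grpc_broadcast_address"))
```

If such a claim is present and already reads `localhost:7077`, a separate host/port override in the worker environment may be unnecessary.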

As suggested here, I have the following environment variables loaded when launching the worker:

HATCHET_CLIENT_TOKEN="(generated from the dashboard)"
HATCHET_CLIENT_TLS_STRATEGY=none
HATCHET_CLIENT_HOST_PORT=7077
HATCHET_CLIENT_API_URL=localhost
HATCHET_CLIENT_SERVER_URL=localhost
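
Independently of the SDK, a plain TCP check can confirm whether the engine port published by docker-compose (`7077` on the host in the file above) is reachable from the machine running the worker. A minimal sketch using only the standard library follows. The function name `can_connect` is hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example: can_connect("localhost", 7077)
```

If this returns `True` while the worker still fails with a DNS error, the problem is more likely in how the client address is configured than in the Docker port mapping.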
