Service Discovery: How Microservices Find Each Other
Learn how service discovery works in microservices. Covers client-side vs server-side discovery, Consul, etcd, and Kubernetes DNS with practical examples.
You have 20 microservices. Each one runs on multiple instances. Instances start, crash, scale up, scale down — their IP addresses change constantly.
How does your order service know where to find the payment service right now?
That's the problem service discovery solves.
The Problem with Hardcoded Addresses
In a monolith, everything runs in one process. No routing needed. In microservices, services call each other over the network.
You could hardcode IP addresses:
```python
PAYMENT_SERVICE_URL = "http://10.0.1.45:8080"
```

This breaks immediately when:
- The payment service restarts with a different IP
- You scale to 3 payment service instances — which one do you call?
- The service moves to a different server
You need a dynamic way to find services.
How Service Discovery Works
Service discovery has two parts:
Service registry: A database that tracks which services are running and where. Every service registers itself when it starts and deregisters when it stops.
Discovery mechanism: How clients find a service's location. Either the client asks the registry directly, or a load balancer does it.
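Stripped down, a registry is just a map from service names to live addresses. Here's a minimal in-memory sketch, purely for illustration (the class and method names are hypothetical; real registries add health checks, replication, and change notifications):

```python
from collections import defaultdict

class ServiceRegistry:
    """Hypothetical in-memory registry, just to show the core operations."""

    def __init__(self):
        # service name -> {instance_id: address}
        self._instances = defaultdict(dict)

    def register(self, service: str, instance_id: str, address: str) -> None:
        self._instances[service][instance_id] = address

    def deregister(self, service: str, instance_id: str) -> None:
        self._instances[service].pop(instance_id, None)

    def lookup(self, service: str) -> list[str]:
        return list(self._instances[service].values())

registry = ServiceRegistry()
registry.register("payment-svc", "payment-1", "10.0.1.45:8080")
registry.lookup("payment-svc")  # ["10.0.1.45:8080"]
```

With a registry in place, the discovery flow looks like this: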
```
Payment service starts → registers with registry: "payment-svc at 10.0.1.45:8080"
Order service needs payment → asks registry: "where is payment-svc?"
Registry replies: "10.0.1.45:8080"
Order service calls payment service
```

Client-Side Discovery
The client (calling service) talks to the registry directly, picks an instance, and calls it.
```python
import random

import consul
import requests

c = consul.Consul()

def get_payment_service_url() -> str:
    # Query the registry for healthy instances only
    index, services = c.health.service("payment-service", passing=True)
    if not services:
        raise RuntimeError("No healthy payment service instances")
    # Pick one at random (a real client might round-robin instead)
    instance = random.choice(services)
    address = instance["Service"]["Address"]
    port = instance["Service"]["Port"]
    return f"http://{address}:{port}"

# Usage
url = get_payment_service_url()
response = requests.post(f"{url}/charge", json={"amount": 100})
```

Advantage: The client controls load balancing; it can use any strategy it wants. Disadvantage: Every service needs the discovery logic. If you change how discovery works, you update every service.
Server-Side Discovery
The client calls a load balancer (or proxy). The load balancer queries the registry and forwards to the right instance. The client doesn't know discovery exists.
```
Order Service → Load Balancer → Registry lookup → Payment Instance 1
                                                → Payment Instance 2
                                                → Payment Instance 3
```

This is how Kubernetes works. You call payment-service (a stable DNS name). Kubernetes' internal load balancer routes to a healthy pod.
Advantage: No discovery code in clients. One place to update routing logic. Disadvantage: Load balancer is an extra hop. It's a potential bottleneck and single point of failure (though you run multiple replicas).
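To see what the load balancer does on the client's behalf, here's a rough sketch of the lookup-and-forward step, reusing the Consul client from the client-side example (the `forward` helper is hypothetical, and a real proxy would cache the instance list and balance across it):

```python
import consul
import requests

c = consul.Consul()

def healthy_instances(service: str) -> list[str]:
    # Ask the registry for instances that pass their health checks
    _, nodes = c.health.service(service, passing=True)
    return [f"{n['Service']['Address']}:{n['Service']['Port']}" for n in nodes]

def forward(service: str, path: str, body: dict) -> requests.Response:
    # The caller never sees the registry; this proxy resolves and routes.
    # For simplicity we re-resolve on every call and take the first
    # healthy instance; real load balancers cache and round-robin.
    instances = healthy_instances(service)
    if not instances:
        raise RuntimeError(f"No healthy instances of {service}")
    return requests.post(f"http://{instances[0]}{path}", json=body)

# The order service's view: one stable entry point, no discovery logic
response = forward("payment-service", "/charge", {"amount": 100})
```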
Consul: Popular Service Registry
Consul by HashiCorp is one of the most common service registries. Services register via an API or config file, and Consul runs health checks against them.
```python
import consul

c = consul.Consul(host="consul.internal")

# Register this service on startup
c.agent.service.register(
    name="payment-service",
    service_id="payment-1",
    address="10.0.1.45",
    port=8080,
    check=consul.Check.http("http://10.0.1.45:8080/health", interval="10s"),
)

# Consul calls /health every 10s
# If the check keeps failing, Consul marks the instance unhealthy
# Unhealthy instances don't appear in discovery results
```

Health checks are critical; they prevent traffic from routing to a crashed or unresponsive instance.
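The mirror image of registering on startup is deregistering on clean shutdown, so the instance drops out of lookups immediately instead of waiting for checks to fail. A sketch using the same client (`atexit` only fires on clean exits; crashes are caught by the health check):

```python
import atexit

# On clean shutdown, remove this instance from the registry
atexit.register(lambda: c.agent.service.deregister("payment-1"))
```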
Kubernetes: Built-in Service Discovery
Kubernetes has service discovery built in via DNS. Every Service object gets a stable DNS name.
```yaml
# payment-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: payment-service
spec:
  selector:
    app: payment
  ports:
    - port: 80
      targetPort: 8080
```

Now any pod in the cluster can reach the payment service at:
- payment-service (same namespace)
- payment-service.default.svc.cluster.local (fully qualified name)
Kubernetes DNS maps this to the Service's ClusterIP, which load balances across healthy pods automatically.
```python
import requests

# In Kubernetes, just use the service name
response = requests.post("http://payment-service/charge", json={"amount": 100})
# No registry calls, no IP management; Kubernetes handles it
```

When a pod dies, Kubernetes removes it from the endpoints list. New requests go to healthy pods only.
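How does Kubernetes decide which pods are healthy? Through probes declared in the pod spec. A sketch of a readiness probe for the payment pods (the image name and timing values are illustrative):

```yaml
# In the payment Deployment's pod template (illustrative values)
containers:
  - name: payment
    image: payment:1.0
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /health
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
```

Pods failing the probe are removed from the Service's endpoints until they recover.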
etcd: Discovery for Infrastructure
etcd is a distributed key-value store often used for service discovery at the infrastructure level (it's what Kubernetes itself uses internally).
```python
import etcd3

client = etcd3.client()

# Register
client.put("/services/payment/instance-1", "10.0.1.45:8080")

# Discover
instances = []
for value, metadata in client.get_prefix("/services/payment/"):
    instances.append(value.decode())

# Watch for changes (watch_prefix returns an iterator plus a cancel
# callback; iterating blocks until events arrive)
events, cancel = client.watch_prefix("/services/payment/")
for event in events:
    print(f"Service changed: {event}")
```

etcd is strongly consistent: when a service registers, everyone sees it immediately. Good for infrastructure components (like the Kubernetes control plane) that need consensus.
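One gap in the example above: a crashed process can't delete its own key, so stale registrations would linger forever. The usual fix is to attach the key to a lease with a TTL and keep refreshing it while the service is alive. A sketch (the 10-second TTL is an arbitrary choice):

```python
import time

import etcd3

client = etcd3.client()

# Attach the registration to a 10-second lease; if the lease is
# not refreshed, the key expires and the instance vanishes
lease = client.lease(ttl=10)
client.put("/services/payment/instance-1", "10.0.1.45:8080", lease=lease)

# Heartbeat: refresh the lease while the service is alive
# (in a real service this runs in a background thread)
while True:
    lease.refresh()
    time.sleep(5)
```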
Health Checks: The Critical Part
Service discovery is only useful if unhealthy instances are removed quickly. Most registries support several check types:
- HTTP check: GET /health must return 2xx
- TCP check: connection to the port must succeed
- Script check: run a shell command; exit code 0 = healthy
- TTL check: the service must report "still alive" every N seconds

The recovery loop:
- Instance crashes
- Health check fails
- Registry marks instance unhealthy
- Clients stop receiving that instance in lookups
- Traffic routes to remaining healthy instances
This happens in seconds, not minutes.
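On the service side, all of this hinges on exposing a health endpoint that reflects real readiness. A minimal sketch using Flask (an assumption; `check_database` is a hypothetical stand-in for your own dependency checks):

```python
from flask import Flask, jsonify

app = Flask(__name__)

def check_database() -> bool:
    # Stub: replace with a real dependency check (e.g., SELECT 1)
    return True

@app.route("/health")
def health():
    # Report unhealthy if a critical dependency is down, so the
    # registry pulls this instance out of rotation
    if not check_database():
        return jsonify(status="unhealthy"), 503
    return jsonify(status="ok"), 200
```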
Choosing an Approach
Using Kubernetes? Use built-in DNS. Zero extra infrastructure.
Mixed infrastructure or non-Kubernetes? Use Consul. It integrates with VMs, containers, and bare metal.
Need strong consistency for config/locks? Use etcd.
Simple internal services? Even Docker Compose gives you DNS-based discovery by service name — enough for development and small deployments.
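For example, with a Compose file like this sketch (service names are illustrative), the order container can call http://payment:8080 directly; Compose's embedded DNS resolves the service name:

```yaml
# docker-compose.yml (illustrative)
services:
  order:
    build: ./order
  payment:
    build: ./payment
    expose:
      - "8080"   # reachable from other services as payment:8080
```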
Key Takeaways
- Service discovery tracks which instances are running and routes traffic to healthy ones
- Client-side discovery: the caller queries the registry — more control, more coupling
- Server-side discovery: a proxy/load balancer handles routing — simpler for clients
- Consul is the go-to self-hosted registry; Kubernetes DNS is the easiest if you're on k8s
- Health checks are what make discovery reliable — without them, clients hit dead instances
- Most production systems use server-side discovery without callers knowing it exists
Service discovery is invisible when it works and catastrophic when it doesn't. Set up health checks before anything else.
Related reading: Load Balancing Strategies · API Gateway Pattern · Circuit Breaker Pattern