Service Discovery: How Microservices Find Each Other
Learn how service discovery works in microservices. Covers client-side vs server-side discovery, Consul, etcd, and Kubernetes DNS with practical examples.
You have 20 microservices. Each one runs on multiple instances. Instances start, crash, scale up, scale down — their IP addresses change constantly.
How does your order service know where to find the payment service right now?
That's the problem service discovery solves.
The Problem with Hardcoded Addresses
In a monolith, everything runs in one process. No routing needed. In microservices, services call each other over the network.
You could hardcode IP addresses:
```python
PAYMENT_SERVICE_URL = "http://10.0.1.45:8080"
```

This breaks immediately when:
- The payment service restarts with a different IP
- You scale to 3 payment service instances — which one do you call?
- The service moves to a different server
You need a dynamic way to find services.
How Service Discovery Works
Service discovery has two parts:
Service registry: A database that tracks which services are running and where. Every service registers itself when it starts and deregisters when it stops.
Discovery mechanism: How clients find a service's location. Either the client asks the registry directly, or a load balancer does it.
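Stripped down, a registry is just a map from service names to live addresses. Here's a minimal in-memory sketch, purely for illustration (the class and method names are hypothetical; real registries add health checks, replication, and change notifications):

```python
from collections import defaultdict

class ServiceRegistry:
    """Hypothetical in-memory registry, just to show the core operations."""

    def __init__(self):
        # service name -> {instance_id: address}
        self._instances = defaultdict(dict)

    def register(self, service: str, instance_id: str, address: str) -> None:
        self._instances[service][instance_id] = address

    def deregister(self, service: str, instance_id: str) -> None:
        self._instances[service].pop(instance_id, None)

    def lookup(self, service: str) -> list[str]:
        return list(self._instances[service].values())

registry = ServiceRegistry()
registry.register("payment-svc", "payment-1", "10.0.1.45:8080")
registry.lookup("payment-svc")  # ["10.0.1.45:8080"]
```

With a registry in place, the discovery flow looks like this: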
```
Payment service starts → registers with registry: "payment-svc at 10.0.1.45:8080"
Order service needs payment → asks registry: "where is payment-svc?"
Registry replies: "10.0.1.45:8080"
Order service calls payment service
```

Client-Side Discovery
The client (calling service) talks to the registry directly, picks an instance, and calls it.
```python
import random

import consul
import requests

c = consul.Consul()

def get_payment_service_url() -> str:
    # Query the registry for healthy instances only
    index, services = c.health.service("payment-service", passing=True)
    if not services:
        raise RuntimeError("No healthy payment service instances")
    # Pick one at random (a real client might round-robin instead)
    instance = random.choice(services)
    address = instance["Service"]["Address"]
    port = instance["Service"]["Port"]
    return f"http://{address}:{port}"

# Usage
url = get_payment_service_url()
response = requests.post(f"{url}/charge", json={"amount": 100})
```

Advantage: The client controls load balancing; it can use any strategy it wants. Disadvantage: Every service needs the discovery logic. If you change how discovery works, you update every service.
Server-Side Discovery
The client calls a load balancer (or proxy). The load balancer queries the registry and forwards to the right instance. The client doesn't know discovery exists.
```
Order Service → Load Balancer → Registry lookup → Payment Instance 1
                                                → Payment Instance 2
                                                → Payment Instance 3
```

This is how Kubernetes works. You call payment-service (a stable DNS name). Kubernetes' internal load balancer routes to a healthy pod.
Advantage: No discovery code in clients. One place to update routing logic. Disadvantage: Load balancer is an extra hop. It's a potential bottleneck and single point of failure (though you run multiple replicas).
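To see what the load balancer does on the client's behalf, here's a rough sketch of the lookup-and-forward step, reusing the Consul client from the client-side example (the `forward` helper is hypothetical, and a real proxy would cache the instance list and balance across it):

```python
import consul
import requests

c = consul.Consul()

def healthy_instances(service: str) -> list[str]:
    # Ask the registry for instances that pass their health checks
    _, nodes = c.health.service(service, passing=True)
    return [f"{n['Service']['Address']}:{n['Service']['Port']}" for n in nodes]

def forward(service: str, path: str, body: dict) -> requests.Response:
    # The caller never sees the registry; this proxy resolves and routes.
    # For simplicity we re-resolve on every call and take the first
    # healthy instance; real load balancers cache and round-robin.
    instances = healthy_instances(service)
    if not instances:
        raise RuntimeError(f"No healthy instances of {service}")
    return requests.post(f"http://{instances[0]}{path}", json=body)

# The order service's view: one stable entry point, no discovery logic
response = forward("payment-service", "/charge", {"amount": 100})
```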
Consul: Popular Service Registry
Consul by HashiCorp is one of the most common service registries. Services register via an API or config file, and Consul runs health checks against them.
```python
import consul

c = consul.Consul(host="consul.internal")

# Register this service on startup
c.agent.service.register(
    name="payment-service",
    service_id="payment-1",
    address="10.0.1.45",
    port=8080,
    check=consul.Check.http("http://10.0.1.45:8080/health", interval="10s"),
)

# Consul calls /health every 10s
# If the check keeps failing, Consul marks the instance unhealthy
# Unhealthy instances don't appear in discovery results
```

Health checks are critical; they prevent traffic from routing to a crashed or unresponsive instance.
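The mirror image of registering on startup is deregistering on clean shutdown, so the instance drops out of lookups immediately instead of waiting for checks to fail. A sketch using the same client (`atexit` only fires on clean exits; crashes are caught by the health check):

```python
import atexit

# On clean shutdown, remove this instance from the registry
atexit.register(lambda: c.agent.service.deregister("payment-1"))
```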
Kubernetes: Built-in Service Discovery
Kubernetes has service discovery built in via DNS. Every Service object gets a stable DNS name.
```yaml
# payment-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: payment-service
spec:
  selector:
    app: payment
  ports:
    - port: 80
      targetPort: 8080
```

Now any pod in the cluster can reach the payment service at:
- payment-service (same namespace)
- payment-service.default.svc.cluster.local (fully qualified name)
Kubernetes DNS maps this to the Service's ClusterIP, which load balances across healthy pods automatically.
```python
import requests

# In Kubernetes, just use the service name
response = requests.post("http://payment-service/charge", json={"amount": 100})
# No registry calls, no IP management; Kubernetes handles it
```

When a pod dies, Kubernetes removes it from the endpoints list. New requests go to healthy pods only.
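How does Kubernetes decide which pods are healthy? Through probes declared in the pod spec. A sketch of a readiness probe for the payment pods (the image name and timing values are illustrative):

```yaml
# In the payment Deployment's pod template (illustrative values)
containers:
  - name: payment
    image: payment:1.0
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /health
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
```

Pods failing the probe are removed from the Service's endpoints until they recover.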
etcd: Discovery for Infrastructure
etcd is a distributed key-value store often used for service discovery at the infrastructure level (it's what Kubernetes itself uses internally).
```python
import etcd3

client = etcd3.client()

# Register
client.put("/services/payment/instance-1", "10.0.1.45:8080")

# Discover
instances = []
for value, metadata in client.get_prefix("/services/payment/"):
    instances.append(value.decode())

# Watch for changes (watch_prefix returns an iterator plus a cancel
# callback; iterating blocks until events arrive)
events, cancel = client.watch_prefix("/services/payment/")
for event in events:
    print(f"Service changed: {event}")
```

etcd is strongly consistent: when a service registers, everyone sees it immediately. Good for infrastructure components (like the Kubernetes control plane) that need consensus.
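One gap in the example above: a crashed process can't delete its own key, so stale registrations would linger forever. The usual fix is to attach the key to a lease with a TTL and keep refreshing it while the service is alive. A sketch (the 10-second TTL is an arbitrary choice):

```python
import time

import etcd3

client = etcd3.client()

# Attach the registration to a 10-second lease; if the lease is
# not refreshed, the key expires and the instance vanishes
lease = client.lease(ttl=10)
client.put("/services/payment/instance-1", "10.0.1.45:8080", lease=lease)

# Heartbeat: refresh the lease while the service is alive
# (in a real service this runs in a background thread)
while True:
    lease.refresh()
    time.sleep(5)
```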
Health Checks: The Critical Part
Service discovery is only useful if unhealthy instances are removed quickly. Most registries support several check types:
- HTTP check: GET /health must return 2xx
- TCP check: connection to the port must succeed
- Script check: run a shell command; exit code 0 = healthy
- TTL check: the service must report "still alive" every N seconds

The recovery loop:
- Instance crashes
- Health check fails
- Registry marks instance unhealthy
- Clients stop receiving that instance in lookups
- Traffic routes to remaining healthy instances
This happens in seconds, not minutes.
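On the service side, all of this hinges on exposing a health endpoint that reflects real readiness. A minimal sketch using Flask (an assumption; `check_database` is a hypothetical stand-in for your own dependency checks):

```python
from flask import Flask, jsonify

app = Flask(__name__)

def check_database() -> bool:
    # Stub: replace with a real dependency check (e.g., SELECT 1)
    return True

@app.route("/health")
def health():
    # Report unhealthy if a critical dependency is down, so the
    # registry pulls this instance out of rotation
    if not check_database():
        return jsonify(status="unhealthy"), 503
    return jsonify(status="ok"), 200
```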
Choosing an Approach
Using Kubernetes? Use built-in DNS. Zero extra infrastructure.
Mixed infrastructure or non-Kubernetes? Use Consul. It integrates with VMs, containers, and bare metal.
Need strong consistency for config/locks? Use etcd.
Simple internal services? Even Docker Compose gives you DNS-based discovery by service name — enough for development and small deployments.
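For example, with a Compose file like this sketch (service names are illustrative), the order container can call http://payment:8080 directly; Compose's embedded DNS resolves the service name:

```yaml
# docker-compose.yml (illustrative)
services:
  order:
    build: ./order
  payment:
    build: ./payment
    expose:
      - "8080"   # reachable from other services as payment:8080
```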
Key Takeaways
- Service discovery tracks which instances are running and routes traffic to healthy ones
- Client-side discovery: the caller queries the registry — more control, more coupling
- Server-side discovery: a proxy/load balancer handles routing — simpler for clients
- Consul is the go-to self-hosted registry; Kubernetes DNS is the easiest if you're on k8s
- Health checks are what make discovery reliable — without them, clients hit dead instances
- Most production systems use server-side discovery without callers knowing it exists
Service discovery is invisible when it works and catastrophic when it doesn't. Set up health checks before anything else.
Related reading: Load Balancing Strategies · API Gateway Pattern · Circuit Breaker Pattern