
Istio in Action, Chapter 7 - Observability: Understanding the Behavior of Your Services

mokpolar 2025. 9. 17. 10:14

Hello!
As I study the book Istio in Action, I'm going to write up its contents bit by bit.

Chapter 7 is Observability: Understanding the Behavior of Your Services.

Setting Up the Lab Environment

  • macOS, containers running on OrbStack
  • kind, Kubernetes 1.33.1
  • istioctl 1.27.0
curl -L https://istio.io/downloadIstio | sh -

istioctl install --set profile=demo -y
        |\
        | \
        |  \
        |   \
      /||    \
     / ||     \
    /  ||      \
   /   ||       \
  /    ||        \
 /     ||         \
/______||__________\
____________________
  \__       _____/
     \_____/

WARNING: Istio is being upgraded from 1.13.0 to 1.27.0.
         Running this command will overwrite it; use revisions to upgrade alongside the existing version.
         Before upgrading, you may wish to use 'istioctl x precheck' to check for upgrade warnings.
✔ Istio core installed ⛵️
✔ Istiod installed 🧠
✔ Egress gateways installed 🛫
✔ Ingress gateways installed 🛬
✔ Installation complete

 

7.1 What Is Observability?

  • Observability: the degree to which you can understand and reason about a system's internal state just by looking at its external signals and characteristics
  • It takes application instrumentation, network instrumentation, signal-collection infrastructure, and databases, and beyond that, the ability to sift through and combine vast amounts of data into a complete picture when something unexpected happens

 

7.1.1 Observability vs. Monitoring

  • Monitoring: the practice of collecting and aggregating metrics, logs, traces, and so on, and comparing the system's state against predefined criteria
    • When a value crosses a threshold and the system is heading toward a bad state, we take action to correct it
    • We collect and aggregate metrics to watch for, and alert on, states known to be undesirable
  • Observability, by contrast, is characterized by:
    • Assuming the system is highly unpredictable, so not every failure can be known in advance
    • Therefore collecting far more data, even high-cardinality data, so we can explore it and ask questions quickly

 

7.1.2 How Does Istio Help with Observability?

  • Istio's data-plane proxy, Envoy, sits in the network path of requests between services.
  • That position lets it capture key metrics about request handling and service interactions.
  • For example: requests per second, time taken to process a request, and the number of failed requests.
  • New metrics can also be added dynamically.

 

7.2 Exploring Istio's Metrics

7.2.1 Metrics in the Data Plane

 

Deploying the chapter 7 sample services

k apply -f services/catalog/kubernetes/catalog.yaml
serviceaccount/catalog unchanged
service/catalog created
deployment.apps/catalog created

k apply -f services/webapp/kubernetes/webapp.yaml
serviceaccount/webapp unchanged
service/webapp created
deployment.apps/webapp created

k apply -f services/webapp/istio/webapp-catalog-gw-vs.yaml
gateway.networking.istio.io/coolstore-gateway created
virtualservice.networking.istio.io/webapp-virtualservice created

 

Verifying access to the service

curl http://192.168.97.2:31733/api/catalog -H "Host: webapp.istioinaction.io"
[{"id":1,"color":"amber","department":"Eyewear","name":"Elinor Glasses","price":"282.00"},{"id":2,"color":"cyan","department":"Clothing","name":"Atlas Shirt","price":"127.00"},{"id":3,"color":"teal","department":"Clothing","name":"Small Metal Shoes","price":"232.00"},{"id":4,"color":"red","department":"Watches","name":"Red Dragon Watch","price":"232.00"}]
  • The istio_requests_total entries show metrics for requests coming into the webapp service from the ingress gateway
  • istio_requests_total
  • istio_request_bytes
  • istio_response_bytes
  • istio_request_duration_milliseconds
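
To see these for yourself, you can scrape the sidecar's merged Prometheus endpoint directly. A minimal check, assuming a standard sidecar injection (the stats are served on port 15090):

k exec -it deploy/webapp -c istio-proxy -- curl -s localhost:15090/stats/prometheus | grep istio_requests_total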

 

Configuring the proxy to report more Envoy statistics

  • When an application's call passes through its client-side proxy, the proxy makes a routing decision and routes it to an upstream cluster.
  • Upstream cluster: the service that actually gets called, with the relevant configuration applied (load balancing, security, circuit-breaker settings, etc.)
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: control-plane
spec:
  profile: demo
  meshConfig:
    defaultConfig: <------------- Defines the default proxy config for all services
      proxyStatsMatcher: <------- Customizes which metrics the proxy reports
        inclusionPrefixes: <----- Metrics matching these prefixes, in addition to the defaults
        - "cluster.outbound|80||catalog.istioinaction"
  • Increasing the metrics collected across the entire mesh can overload the metrics-collection system
  • A better approach is to specify the metrics to include per workload, using an annotation

metadata:
  annotations:
    proxy.istio.io/config: |- <------ Proxy config for the webapp replicas
      proxyStatsMatcher:
        inclusionPrefixes:
        - "cluster.outbound|80||catalog.istioinaction"
k apply -f ch7/webapp-deployment-stats-inclusion.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: webapp
  name: webapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      annotations:
        proxy.istio.io/config: |-
          proxyStatsMatcher:
            inclusionPrefixes:
            - "cluster.outbound|80||catalog.istioinaction"
      labels:
        app: webapp
    spec:
      containers:
      - env:
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: CATALOG_SERVICE_HOST
          value: catalog.istioinaction
        - name: CATALOG_SERVICE_PORT
          value: "80"
        - name: FORUM_SERVICE_HOST
          value: forum.istioinaction
        - name: FORUM_SERVICE_PORT
          value: "80"
        image: istioinaction/webapp:latest
        imagePullPolicy: IfNotPresent
        name: webapp
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        securityContext:
          privileged: false
curl http://192.168.97.2:31733/api/catalog -H "Host: webapp.istioinaction.io"
[{"id":1,"color":"amber","department":"Eyewear","name":"Elinor Glasses","price":"282.00"},{"id":2,"color":"cyan","department":"Clothing","name":"Atlas Shirt","price":"127.00"},{"id":3,"color":"teal","department":"Clothing","name":"Small Metal Shoes","price":"232.00"},{"id":4,"color":"red","department":"Watches","name":"Red Dragon Watch","price":"232.00"}]

 

Then scrape the Istio stats

k exec -it deploy/webapp -c istio-proxy -- curl localhost:15000/stats | grep catalog

 

These metrics show whether a circuit breaker is being applied to connections or requests headed for the upstream cluster.

cluster.outbound|80||catalog.circuit_breakers.default.cx_open: 0
cluster.outbound|80||catalog.circuit_breakers.default.cx_pool_open: 0
cluster.outbound|80||catalog.circuit_breakers.default.rq_open: 0
cluster.outbound|80||catalog.circuit_breakers.default.rq_pending_open: 0
cluster.outbound|80||catalog.circuit_breakers.default.rq_retry_open: 0
  • When identifying traffic, Envoy distinguishes whether it originates internally or externally.
    • Internal: originated inside the mesh
    • External: originated outside the mesh (entered through the ingress gateway)
  • cluster_name.internal.*: the number of successful requests that originated inside the mesh
  • cluster_name.ssl.*: whether traffic travels to the upstream cluster over TLS
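
A quick way to eyeball these stats, assuming the inclusion configuration above has been applied and some traffic has been sent, is to filter the admin endpoint for those name patterns:

k exec -it deploy/webapp -c istio-proxy -- curl -s localhost:15000/stats | grep -E "catalog.*(internal|ssl)"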

 

7.2.2 Metrics in the Control Plane

  • Querying the control plane's metrics
    k exec -it -n istio-system deploy/istiod -- curl localhost:15014/metrics
    # HELP citadel_server_csr_count The number of CSRs received by Citadel server.
    # TYPE citadel_server_csr_count counter
    citadel_server_csr_count 2
    # HELP citadel_server_root_cert_expiry_seconds The time remaining, in seconds, before the root cert will expire. A negative value indicates the cert is expired.
    # TYPE citadel_server_root_cert_expiry_seconds gauge
    citadel_server_root_cert_expiry_seconds 3.12768891974063e+08
    # HELP citadel_server_root_cert_expiry_timestamp The unix timestamp, in seconds, when the root cert will expire.
    # TYPE citadel_server_root_cert_expiry_timestamp gauge
    citadel_server_root_cert_expiry_timestamp 2.070837463e+09
    # HELP citadel_server_success_cert_issuance_count The number of certificates issuances that have succeeded.
    # TYPE citadel_server_success_cert_issuance_count counter
    citadel_server_success_cert_issuance_count 2
    # HELP endpoint_no_pod Endpoints without an associated pod.
    # TYPE endpoint_no_pod gauge
    endpoint_no_pod 0
    # HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
    # TYPE go_gc_duration_seconds summary
    go_gc_duration_seconds{quantile="0"} 2.5541e-05
    go_gc_duration_seconds{quantile="0.25"} 5.8291e-05
    go_gc_duration_seconds{quantile="0.5"} 0.000922291
    go_gc_duration_seconds{quantile="0.75"} 0.002063167
    go_gc_duration_seconds{quantile="1"} 0.008150865
    go_gc_duration_seconds_sum 0.022038092
    go_gc_duration_seconds_count 15
    # HELP go_gc_gogc_percent Heap size target percentage configured by the user, otherwise 100. This value is set by the GOGC environment variable, and the runtime/debug.SetGCPercent function. Sourced from /gc/gogc:percent.
    # TYPE go_gc_gogc_percent gauge
    go_gc_gogc_percent 100
    # HELP go_gc_gomemlimit_bytes Go runtime memory limit configured by the user, otherwise math.MaxInt64. This value is set by the GOMEMLIMIT environment variable, and the runtime/debug.SetMemoryLimit function. Sourced from /gc/gomemlimit:bytes.
    # TYPE go_gc_gomemlimit_bytes gauge
    go_gc_gomemlimit_bytes 4.171526144e+09
    # HELP go_goroutines Number of goroutines that currently exist.
    # TYPE go_goroutines gauge
    go_goroutines 673
    # HELP go_info Information about the Go environment.
    # TYPE go_info gauge
    go_info{version="go1.24.4"} 1
    # HELP go_memstats_alloc_bytes Number of bytes allocated in heap and currently in use. Equals to /memory/classes/heap/objects:bytes.
    # TYPE go_memstats_alloc_bytes gauge
    go_memstats_alloc_bytes 1.7508984e+07
    # HELP go_memstats_alloc_bytes_total Total number of bytes allocated in heap until now, even if released already. Equals to /gc/heap/allocs:bytes.
    # TYPE go_memstats_alloc_bytes_total counter
    go_memstats_alloc_bytes_total 7.8924736e+07
    # HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table. Equals to /memory/classes/profiling/buckets:bytes.
    # TYPE go_memstats_buck_hash_sys_bytes gauge
    go_memstats_buck_hash_sys_bytes 1.593119e+06
    # HELP go_memstats_frees_total Total number of heap objects frees. Equals to /gc/heap/frees:objects + /gc/heap/tiny/allocs:objects.
    # TYPE go_memstats_frees_total counter
    go_memstats_frees_total 528895
    # HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata. Equals to /memory/classes/metadata/other:bytes.
    # TYPE go_memstats_gc_sys_bytes gauge
    go_memstats_gc_sys_bytes 4.105184e+06
    # HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and currently in use, same as go_memstats_alloc_bytes. Equals to /memory/classes/heap/objects:bytes.
    # TYPE go_memstats_heap_alloc_bytes gauge
    go_memstats_heap_alloc_bytes 1.7508984e+07
    # HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used. Equals to /memory/classes/heap/released:bytes + /memory/classes/heap/free:bytes.
    # TYPE go_memstats_heap_idle_bytes gauge
    go_memstats_heap_idle_bytes 1.3312e+07
    # HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use. Equals to /memory/classes/heap/objects:bytes + /memory/classes/heap/unused:bytes
    # TYPE go_memstats_heap_inuse_bytes gauge
    go_memstats_heap_inuse_bytes 2.5092096e+07
    # HELP go_memstats_heap_objects Number of currently allocated objects. Equals to /gc/heap/objects:objects.
    # TYPE go_memstats_heap_objects gauge
    go_memstats_heap_objects 90454
    # HELP go_memstats_heap_released_bytes Number of heap bytes released to OS. Equals to /memory/classes/heap/released:bytes.
    # TYPE go_memstats_heap_released_bytes gauge
    go_memstats_heap_released_bytes 9.904128e+06
    # HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system. Equals to /memory/classes/heap/objects:bytes + /memory/classes/heap/unused:bytes + /memory/classes/heap/released:bytes + /memory/classes/heap/free:bytes.
    # TYPE go_memstats_heap_sys_bytes gauge
    go_memstats_heap_sys_bytes 3.8404096e+07
    # HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
    # TYPE go_memstats_last_gc_time_seconds gauge
    go_memstats_last_gc_time_seconds 1.758068543081912e+09
    # HELP go_memstats_mallocs_total Total number of heap objects allocated, both live and gc-ed. Semantically a counter version for go_memstats_heap_objects gauge. Equals to /gc/heap/allocs:objects + /gc/heap/tiny/allocs:objects.
    # TYPE go_memstats_mallocs_total counter
    go_memstats_mallocs_total 619349
    # HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures. Equals to /memory/classes/metadata/mcache/inuse:bytes.
    # TYPE go_memstats_mcache_inuse_bytes gauge
    go_memstats_mcache_inuse_bytes 9664
    # HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system. Equals to /memory/classes/metadata/mcache/inuse:bytes + /memory/classes/metadata/mcache/free:bytes.
    # TYPE go_memstats_mcache_sys_bytes gauge
    go_memstats_mcache_sys_bytes 15704
    # HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures. Equals to /memory/classes/metadata/mspan/inuse:bytes.
    # TYPE go_memstats_mspan_inuse_bytes gauge
    go_memstats_mspan_inuse_bytes 427840
    # HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system. Equals to /memory/classes/metadata/mspan/inuse:bytes + /memory/classes/metadata/mspan/free:bytes.
    # TYPE go_memstats_mspan_sys_bytes gauge
    go_memstats_mspan_sys_bytes 456960
    # HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place. Equals to /gc/heap/goal:bytes.
    # TYPE go_memstats_next_gc_bytes gauge
    go_memstats_next_gc_bytes 3.514093e+07
    # HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations. Equals to /memory/classes/other:bytes.
    # TYPE go_memstats_other_sys_bytes gauge
    go_memstats_other_sys_bytes 1.780657e+06
    # HELP go_memstats_stack_inuse_bytes Number of bytes obtained from system for stack allocator in non-CGO environments. Equals to /memory/classes/heap/stacks:bytes.
    # TYPE go_memstats_stack_inuse_bytes gauge
    go_memstats_stack_inuse_bytes 3.538944e+06
    # HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator. Equals to /memory/classes/heap/stacks:bytes + /memory/classes/os-stacks:bytes.
    # TYPE go_memstats_stack_sys_bytes gauge
    go_memstats_stack_sys_bytes 3.538944e+06
    # HELP go_memstats_sys_bytes Number of bytes obtained from system. Equals to /memory/classes/total:byte.
    # TYPE go_memstats_sys_bytes gauge
    go_memstats_sys_bytes 4.9894664e+07
    # HELP go_sched_gomaxprocs_threads The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously. Sourced from /sched/gomaxprocs:threads.
    # TYPE go_sched_gomaxprocs_threads gauge
    go_sched_gomaxprocs_threads 8
    # HELP go_threads Number of OS threads created.
    # TYPE go_threads gauge
    go_threads 13
    # HELP grpc_server_handled_total Total number of RPCs completed on the server, regardless of success or failure.
    # TYPE grpc_server_handled_total counter
    grpc_server_handled_total{grpc_code="OK",grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary"} 2
    # HELP grpc_server_handling_seconds Histogram of response latency (seconds) of gRPC that had been application-level handled by the server.
    # TYPE grpc_server_handling_seconds histogram
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="0.005"} 1
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="0.01"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="0.025"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="0.05"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="0.1"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="0.25"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="0.5"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="1"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="2.5"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="5"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="10"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="+Inf"} 2
    grpc_server_handling_seconds_sum{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary"} 0.008609167000000001
    grpc_server_handling_seconds_count{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary"} 2
    # HELP grpc_server_msg_received_total Total number of RPC stream messages received on the server.
    # TYPE grpc_server_msg_received_total counter
    grpc_server_msg_received_total{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary"} 2
    # HELP grpc_server_msg_sent_total Total number of gRPC stream messages sent by the server.
    # TYPE grpc_server_msg_sent_total counter
    grpc_server_msg_sent_total{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary"} 2
    # HELP grpc_server_started_total Total number of RPCs started on the server.
    # TYPE grpc_server_started_total counter
    grpc_server_started_total{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary"} 2
    # HELP istio_build Istio component build info
    # TYPE istio_build gauge
    istio_build{component="pilot",tag="1.27.0"} 1
    # HELP istiod_managed_clusters Number of clusters managed by istiod
    # TYPE istiod_managed_clusters gauge
    istiod_managed_clusters{cluster_type="local"} 1
    istiod_managed_clusters{cluster_type="remote"} 0
    # HELP istiod_uptime_seconds Current istiod server uptime in seconds
    # TYPE istiod_uptime_seconds gauge
    istiod_uptime_seconds 787.721140208
    # HELP pilot_conflict_inbound_listener Number of conflicting inbound listeners.
    # TYPE pilot_conflict_inbound_listener gauge
    pilot_conflict_inbound_listener 0
    # HELP pilot_conflict_outbound_listener_tcp_over_current_tcp Number of conflicting tcp listeners with current tcp listener.
    # TYPE pilot_conflict_outbound_listener_tcp_over_current_tcp gauge
    pilot_conflict_outbound_listener_tcp_over_current_tcp 0
    # HELP pilot_debounce_time Delay in seconds between the first config enters debouncing and the merged push request is pushed into the push queue (includes pushcontext_init_seconds).
    # TYPE pilot_debounce_time histogram
    pilot_debounce_time_bucket{le="0.01"} 0
    pilot_debounce_time_bucket{le="0.1"} 0
    pilot_debounce_time_bucket{le="1"} 10
    pilot_debounce_time_bucket{le="3"} 10
    pilot_debounce_time_bucket{le="5"} 10
    pilot_debounce_time_bucket{le="10"} 10
    pilot_debounce_time_bucket{le="20"} 10
    pilot_debounce_time_bucket{le="30"} 10
    pilot_debounce_time_bucket{le="+Inf"} 10
    pilot_debounce_time_sum 1.139648459
    pilot_debounce_time_count 10
    # HELP pilot_destrule_subsets Duplicate subsets across destination rules for same host
    # TYPE pilot_destrule_subsets gauge
    pilot_destrule_subsets 0
    # HELP pilot_dns_cluster_without_endpoints DNS clusters without endpoints caused by the endpoint field in STRICT_DNS type cluster is not set or the corresponding subset cannot select any endpoint
    # TYPE pilot_dns_cluster_without_endpoints gauge
    pilot_dns_cluster_without_endpoints 0
    # HELP pilot_duplicate_envoy_clusters Duplicate envoy clusters caused by service entries with same hostname
    # TYPE pilot_duplicate_envoy_clusters gauge
    pilot_duplicate_envoy_clusters 0
    # HELP pilot_eds_no_instances Number of clusters without instances.
    # TYPE pilot_eds_no_instances gauge
    pilot_eds_no_instances 0
    # HELP pilot_endpoint_not_ready Endpoint found in unready state.
    # TYPE pilot_endpoint_not_ready gauge
    pilot_endpoint_not_ready 0
    # HELP pilot_inbound_updates Total number of updates received by pilot.
    # TYPE pilot_inbound_updates counter
    pilot_inbound_updates{type="config"} 56
    pilot_inbound_updates{type="eds"} 33
    pilot_inbound_updates{type="svc"} 9
    # HELP pilot_info Pilot version and build information.
    # TYPE pilot_info gauge
    pilot_info{version="1.27.0-7359d8be2504f2b191f7d94156af08e6590d2d1c-Clean"} 1
    # HELP pilot_k8s_cfg_events Events from k8s config.
    # TYPE pilot_k8s_cfg_events counter
    pilot_k8s_cfg_events{event="add",type="DestinationRule"} 2
    pilot_k8s_cfg_events{event="add",type="EnvoyFilter"} 6
    pilot_k8s_cfg_events{event="add",type="Gateway"} 1
    pilot_k8s_cfg_events{event="add",type="VirtualService"} 2
    # HELP pilot_k8s_reg_events Events from k8s registry.
    # TYPE pilot_k8s_reg_events counter
    pilot_k8s_reg_events{event="add",type="EndpointSlice"} 9
    pilot_k8s_reg_events{event="add",type="Namespaces"} 7
    pilot_k8s_reg_events{event="add",type="Nodes"} 1
    pilot_k8s_reg_events{event="add",type="Pods"} 17
    pilot_k8s_reg_events{event="add",type="Services"} 9
    pilot_k8s_reg_events{event="delete",type="Pods"} 1
    pilot_k8s_reg_events{event="update",type="EndpointSlice"} 15
    pilot_k8s_reg_events{event="update",type="Nodes"} 39
    pilot_k8s_reg_events{event="update",type="Pods"} 24
    # HELP pilot_no_ip Pods not found in the endpoint table, possibly invalid.
    # TYPE pilot_no_ip gauge
    pilot_no_ip 0
    # HELP pilot_proxy_convergence_time Delay in seconds between config change and a proxy receiving all required configuration.
    # TYPE pilot_proxy_convergence_time histogram
    pilot_proxy_convergence_time_bucket{le="0.1"} 7
    pilot_proxy_convergence_time_bucket{le="0.5"} 7
    pilot_proxy_convergence_time_bucket{le="1"} 7
    pilot_proxy_convergence_time_bucket{le="3"} 7
    pilot_proxy_convergence_time_bucket{le="5"} 7
    pilot_proxy_convergence_time_bucket{le="10"} 7
    pilot_proxy_convergence_time_bucket{le="20"} 7
    pilot_proxy_convergence_time_bucket{le="30"} 7
    pilot_proxy_convergence_time_bucket{le="+Inf"} 7
    pilot_proxy_convergence_time_sum 0.00831075
    pilot_proxy_convergence_time_count 7
    # HELP pilot_proxy_queue_time Time in seconds, a proxy is in the push queue before being dequeued.
    # TYPE pilot_proxy_queue_time histogram
    pilot_proxy_queue_time_bucket{le="0.1"} 7
    pilot_proxy_queue_time_bucket{le="0.5"} 7
    pilot_proxy_queue_time_bucket{le="1"} 7
    pilot_proxy_queue_time_bucket{le="3"} 7
    pilot_proxy_queue_time_bucket{le="5"} 7
    pilot_proxy_queue_time_bucket{le="10"} 7
    pilot_proxy_queue_time_bucket{le="20"} 7
    pilot_proxy_queue_time_bucket{le="30"} 7
    pilot_proxy_queue_time_bucket{le="+Inf"} 7
    pilot_proxy_queue_time_sum 0.0008617910000000001
    pilot_proxy_queue_time_count 7
    # HELP pilot_push_triggers Total number of times a push was triggered, labeled by reason for the push.
    # TYPE pilot_push_triggers counter
    pilot_push_triggers{type="endpoint"} 5
    pilot_push_triggers{type="proxy"} 2
    # HELP pilot_pushcontext_init_seconds Total time in seconds Pilot takes to init pushContext.
    # TYPE pilot_pushcontext_init_seconds histogram
    pilot_pushcontext_init_seconds_bucket{le="0.01"} 3
    pilot_pushcontext_init_seconds_bucket{le="0.1"} 3
    pilot_pushcontext_init_seconds_bucket{le="0.5"} 3
    pilot_pushcontext_init_seconds_bucket{le="1"} 3
    pilot_pushcontext_init_seconds_bucket{le="3"} 3
    pilot_pushcontext_init_seconds_bucket{le="5"} 3
    pilot_pushcontext_init_seconds_bucket{le="+Inf"} 3
    pilot_pushcontext_init_seconds_sum 0.0020785
    pilot_pushcontext_init_seconds_count 3
    # HELP pilot_services Total services known to pilot.
    # TYPE pilot_services gauge
    pilot_services 9
    # HELP pilot_virt_services Total virtual services known to pilot.
    # TYPE pilot_virt_services gauge
    pilot_virt_services 2
    # HELP pilot_vservice_dup_domain Virtual services with dup domains.
    # TYPE pilot_vservice_dup_domain gauge
    pilot_vservice_dup_domain 0
    # HELP pilot_xds Number of endpoints connected to this pilot using XDS.
    # TYPE pilot_xds gauge
    pilot_xds{version="1.27.0"} 2
    # HELP pilot_xds_config_size_bytes Distribution of configuration sizes pushed to clients
    # TYPE pilot_xds_config_size_bytes histogram
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.cluster.v3.Cluster",le="1"} 0
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.cluster.v3.Cluster",le="10000"} 0
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.cluster.v3.Cluster",le="1e+06"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.cluster.v3.Cluster",le="4e+06"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.cluster.v3.Cluster",le="1e+07"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.cluster.v3.Cluster",le="4e+07"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.cluster.v3.Cluster",le="+Inf"} 4
    pilot_xds_config_size_bytes_sum{type="type.googleapis.com/envoy.config.cluster.v3.Cluster"} 85400
    pilot_xds_config_size_bytes_count{type="type.googleapis.com/envoy.config.cluster.v3.Cluster"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",le="1"} 0
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",le="10000"} 9
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",le="1e+06"} 9
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",le="4e+06"} 9
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",le="1e+07"} 9
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",le="4e+07"} 9
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",le="+Inf"} 9
    pilot_xds_config_size_bytes_sum{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment"} 13256
    pilot_xds_config_size_bytes_count{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment"} 9
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.listener.v3.Listener",le="1"} 2
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.listener.v3.Listener",le="10000"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.listener.v3.Listener",le="1e+06"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.listener.v3.Listener",le="4e+06"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.listener.v3.Listener",le="1e+07"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.listener.v3.Listener",le="4e+07"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.listener.v3.Listener",le="+Inf"} 4
    pilot_xds_config_size_bytes_sum{type="type.googleapis.com/envoy.config.listener.v3.Listener"} 6844
    pilot_xds_config_size_bytes_count{type="type.googleapis.com/envoy.config.listener.v3.Listener"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration",le="1"} 0
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration",le="10000"} 2
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration",le="1e+06"} 2
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration",le="4e+06"} 2
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration",le="1e+07"} 2
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration",le="4e+07"} 2
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration",le="+Inf"} 2
    pilot_xds_config_size_bytes_sum{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration"} 1076
    pilot_xds_config_size_bytes_count{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration"} 2
    # HELP pilot_xds_push_time Total time in seconds Pilot takes to push lds, rds, cds and eds.
    # TYPE pilot_xds_push_time histogram
    pilot_xds_push_time_bucket{type="cds",le="0.01"} 4
    pilot_xds_push_time_bucket{type="cds",le="0.1"} 4
    pilot_xds_push_time_bucket{type="cds",le="1"} 4
    pilot_xds_push_time_bucket{type="cds",le="3"} 4
    pilot_xds_push_time_bucket{type="cds",le="5"} 4
    pilot_xds_push_time_bucket{type="cds",le="10"} 4
    pilot_xds_push_time_bucket{type="cds",le="20"} 4
    pilot_xds_push_time_bucket{type="cds",le="30"} 4
    pilot_xds_push_time_bucket{type="cds",le="+Inf"} 4
    pilot_xds_push_time_sum{type="cds"} 0.003247625
    pilot_xds_push_time_count{type="cds"} 4
    pilot_xds_push_time_bucket{type="eds",le="0.01"} 9
    pilot_xds_push_time_bucket{type="eds",le="0.1"} 9
    pilot_xds_push_time_bucket{type="eds",le="1"} 9
    pilot_xds_push_time_bucket{type="eds",le="3"} 9
    pilot_xds_push_time_bucket{type="eds",le="5"} 9
    pilot_xds_push_time_bucket{type="eds",le="10"} 9
    pilot_xds_push_time_bucket{type="eds",le="20"} 9
    pilot_xds_push_time_bucket{type="eds",le="30"} 9
    pilot_xds_push_time_bucket{type="eds",le="+Inf"} 9
    pilot_xds_push_time_sum{type="eds"} 0.0025550819999999998
    pilot_xds_push_time_count{type="eds"} 9
    pilot_xds_push_time_bucket{type="lds",le="0.01"} 4
    pilot_xds_push_time_bucket{type="lds",le="0.1"} 4
    pilot_xds_push_time_bucket{type="lds",le="1"} 4
    pilot_xds_push_time_bucket{type="lds",le="3"} 4
    pilot_xds_push_time_bucket{type="lds",le="5"} 4
    pilot_xds_push_time_bucket{type="lds",le="10"} 4
    pilot_xds_push_time_bucket{type="lds",le="20"} 4
    pilot_xds_push_time_bucket{type="lds",le="30"} 4
    pilot_xds_push_time_bucket{type="lds",le="+Inf"} 4
    pilot_xds_push_time_sum{type="lds"} 0.0073142089999999995
    pilot_xds_push_time_count{type="lds"} 4
    pilot_xds_push_time_bucket{type="rds",le="0.01"} 2
    pilot_xds_push_time_bucket{type="rds",le="0.1"} 2
    pilot_xds_push_time_bucket{type="rds",le="1"} 2
    pilot_xds_push_time_bucket{type="rds",le="3"} 2
    pilot_xds_push_time_bucket{type="rds",le="5"} 2
    pilot_xds_push_time_bucket{type="rds",le="10"} 2
    pilot_xds_push_time_bucket{type="rds",le="20"} 2
    pilot_xds_push_time_bucket{type="rds",le="30"} 2
    pilot_xds_push_time_bucket{type="rds",le="+Inf"} 2
    pilot_xds_push_time_sum{type="rds"} 0.000928501
    pilot_xds_push_time_count{type="rds"} 2
    # HELP pilot_xds_pushes Pilot build and send errors for lds, rds, cds and eds.
    # TYPE pilot_xds_pushes counter
    pilot_xds_pushes{type="cds"} 4
    pilot_xds_pushes{type="eds"} 9
    pilot_xds_pushes{type="lds"} 4
    pilot_xds_pushes{type="rds"} 2
    # HELP pilot_xds_send_time Total time in seconds Pilot takes to send generated configuration.
    # TYPE pilot_xds_send_time histogram
    pilot_xds_send_time_bucket{le="0.01"} 19
    pilot_xds_send_time_bucket{le="0.1"} 19
    pilot_xds_send_time_bucket{le="1"} 19
    pilot_xds_send_time_bucket{le="3"} 19
    pilot_xds_send_time_bucket{le="5"} 19
    pilot_xds_send_time_bucket{le="10"} 19
    pilot_xds_send_time_bucket{le="20"} 19
    pilot_xds_send_time_bucket{le="30"} 19
    pilot_xds_send_time_bucket{le="+Inf"} 19
    pilot_xds_send_time_sum 0.0013460400000000002
    pilot_xds_send_time_count 19
    # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
    # TYPE process_cpu_seconds_total counter
    process_cpu_seconds_total 8.46
    # HELP process_max_fds Maximum number of open file descriptors.
    # TYPE process_max_fds gauge
    process_max_fds 1.073741816e+09
    # HELP process_network_receive_bytes_total Number of bytes received by the process over the network.
    # TYPE process_network_receive_bytes_total counter
    process_network_receive_bytes_total 3.284157e+06
    # HELP process_network_transmit_bytes_total Number of bytes sent by the process over the network.
    # TYPE process_network_transmit_bytes_total counter
    process_network_transmit_bytes_total 951206
    # HELP process_open_fds Number of open file descriptors.
    # TYPE process_open_fds gauge
    process_open_fds 17
    # HELP process_resident_memory_bytes Resident memory size in bytes.
    # TYPE process_resident_memory_bytes gauge
    process_resident_memory_bytes 5.8810368e+07
    # HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
    # TYPE process_start_time_seconds gauge
    process_start_time_seconds 1.75806778225e+09
    # HELP process_virtual_memory_bytes Virtual memory size in bytes.
    # TYPE process_virtual_memory_bytes gauge
    process_virtual_memory_bytes 1.366982656e+09
    # HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
    # TYPE process_virtual_memory_max_bytes gauge
    process_virtual_memory_max_bytes 1.8446744073709552e+19

 

From these, you can see when the root certificate used to sign workload identity CSRs (certificate signing requests) will expire, as well as how many CSRs the control plane has received and how many certificates it has issued.

# HELP citadel_server_root_cert_expiry_seconds The time remaining, in seconds, before the root cert will expire. A negative value indicates the cert is expired.
# TYPE citadel_server_root_cert_expiry_seconds gauge
citadel_server_root_cert_expiry_seconds 3.12768891974063e+08
# HELP citadel_server_root_cert_expiry_timestamp The unix timestamp, in seconds, when the root cert will expire.
# TYPE citadel_server_root_cert_expiry_timestamp gauge
citadel_server_root_cert_expiry_timestamp 2.070837463e+09
# HELP citadel_server_success_cert_issuance_count The number of certificates issuances that have succeeded.
# TYPE citadel_server_success_cert_issuance_count counter
citadel_server_success_cert_issuance_count 2
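
To turn the expiry timestamp into a human-readable date, you can feed it to date. On macOS (BSD date, as used in this lab), with the GNU coreutils form shown for comparison:

date -r 2070837463          # BSD/macOS
date -u -d @2070837463      # GNU coreutils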

Istio version

# HELP istio_build Istio component build info
# TYPE istio_build gauge
istio_build{component="pilot",tag="1.27.0"} 1

The distribution of time taken to push configuration out to the data-plane proxies and get them in sync

# HELP pilot_proxy_convergence_time Delay in seconds between config change and a proxy receiving all required configuration.
# TYPE pilot_proxy_convergence_time histogram
pilot_proxy_convergence_time_bucket{le="0.1"} 7
pilot_proxy_convergence_time_bucket{le="0.5"} 7
pilot_proxy_convergence_time_bucket{le="1"} 7
pilot_proxy_convergence_time_bucket{le="3"} 7
pilot_proxy_convergence_time_bucket{le="5"} 7
pilot_proxy_convergence_time_bucket{le="10"} 7
pilot_proxy_convergence_time_bucket{le="20"} 7
pilot_proxy_convergence_time_bucket{le="30"} 7
pilot_proxy_convergence_time_bucket{le="+Inf"} 7
pilot_proxy_convergence_time_sum 0.00831075
pilot_proxy_convergence_time_count 7
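
Once Prometheus is scraping istiod (set up in section 7.3) and reachable on localhost:9090, PromQL can turn this histogram into percentiles. As a sketch, an estimated p99 convergence time queried through the HTTP API:

curl -s 'localhost:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.99, sum(rate(pilot_proxy_convergence_time_bucket[5m])) by (le))'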

 

The number of services known to the control plane, the number of user-configured VirtualService resources, and the number of connected proxies

# HELP pilot_services Total services known to pilot.
# TYPE pilot_services gauge
pilot_services 9
# HELP pilot_virt_services Total virtual services known to pilot.
# TYPE pilot_virt_services gauge
pilot_virt_services 2
# HELP pilot_vservice_dup_domain Virtual services with dup domains.
# TYPE pilot_vservice_dup_domain gauge
pilot_vservice_dup_domain 0
# HELP pilot_xds Number of endpoints connected to this pilot using XDS.
# TYPE pilot_xds gauge
pilot_xds{version="1.27.0"} 2

 

Update counts for each xDS API

# HELP pilot_xds_pushes Pilot build and send errors for lds, rds, cds and eds.
# TYPE pilot_xds_pushes counter
pilot_xds_pushes{type="cds"} 4
pilot_xds_pushes{type="eds"} 9
pilot_xds_pushes{type="lds"} 4
pilot_xds_pushes{type="rds"} 2

 

7.3 Scraping Istio Metrics with Prometheus

7.3.1 Setting Up Prometheus and Grafana

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
"prometheus-community" has been added to your repositories

helm repo update
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "volcano-sh" chart repository
...Successfully got an update from the "traefik" chart repository
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "milvus" chart repository
...Successfully got an update from the "vector" chart repository
...Successfully got an update from the "localstack-repo" chart repository
...Successfully got an update from the "localstack" chart repository
...Successfully got an update from the "flagger" chart repository
Update Complete. ⎈Happy Helming!⎈
k create ns prometheus
namespace/prometheus created


helm install prom prometheus-community/kube-prometheus-stack --version 13.13.1 -n prometheus -f ch7/prom-values.yaml
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
k get po -n prometheus
NAME                                                   READY   STATUS             RESTARTS      AGE
prom-grafana-57bcb4cc59-rvgsd                          2/2     Running            0             68s
prom-kube-prometheus-stack-admission-patch-tbbc4       0/1     CrashLoopBackOff   3 (26s ago)   66s
prom-kube-prometheus-stack-operator-69888c5fb6-bv2pj   1/1     Running            0             68s
prometheus-prom-kube-prometheus-stack-prometheus-0     2/2     Running            1 (9s ago)    39s

 

7.3.2 Configuring the Prometheus Operator to Scrape the Istio Control Plane and Workloads

  • The Prometheus Operator's custom resources ServiceMonitor and PodMonitor
    • configure Prometheus to collect metrics from Istio
  • The following ServiceMonitor sets up scraping of the Istio control-plane components
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istio-component-monitor
  namespace: prometheus
  labels:
    monitoring: istio-components
    release: prom
spec:
  jobLabel: istio
  targetLabels: [app]
  selector:
    matchExpressions:
    - {key: istio, operator: In, values: [pilot]}
  namespaceSelector:
    any: true
  endpoints:
  - port: http-monitoring
    interval: 15s
k apply -f ch7/service-monitor-cp.yaml
servicemonitor.monitoring.coreos.com/istio-component-monitor created
  • Now you can see control-plane telemetry in Prometheus
    • the number of sidecars connected to the control plane
    • configuration conflicts
    • churn within the mesh
    • basic memory/CPU usage of the control plane
k -n prometheus port-forward statefulset/prometheus-prom-kube-prometheus-stack-prometheus 9090
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090
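
It helps to generate some traffic first so there is telemetry to look at. A simple loop like the following (the iteration count is arbitrary), after which queries such as istiod_uptime_seconds or pilot_xds can be tried in the UI at http://localhost:9090:

for i in $(seq 1 100); do
  curl -s http://192.168.97.2:31733/api/catalog -H "Host: webapp.istioinaction.io" > /dev/null
done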

  • Enabling data-plane collection
    k apply -f ch7/pod-monitor-dp.yaml
    podmonitor.monitoring.coreos.com/envoy-stats-monitor created
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: envoy-stats-monitor
  namespace: prometheus
  labels:
    monitoring: istio-proxies
    release: prom
spec:
  selector:
    matchExpressions:
    - {key: istio-prometheus-ignore, operator: DoesNotExist}
  namespaceSelector:
    any: true
  jobLabel: envoy-stats
  podMetricsEndpoints:
  - path: /stats/prometheus
    interval: 15s
    relabelings:
    - action: keep
      sourceLabels: [__meta_kubernetes_pod_container_name]
      regex: "istio-proxy"
    - action: keep
      sourceLabels: [__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape]
    - sourceLabels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      targetLabel: __address__
    - action: labeldrop
      regex: "__meta_kubernetes_pod_label_(.+)"
    - sourceLabels: [__meta_kubernetes_namespace]
      action: replace
      targetLabel: namespace
    - sourceLabels: [__meta_kubernetes_pod_name]
      action: replace
      targetLabel: pod_name
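
Once the PodMonitor is applied and some traffic has flowed, you can confirm that data-plane metrics are arriving. A quick check through the Prometheus HTTP API, assuming the port-forward above is still running:

curl -s 'localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(istio_requests_total[1m])) by (destination_app)'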

 

7.4 Customizing Istio's Standard Metrics

  • istio_requests_total : COUNTER, incremented for each request
  • istio_request_duration_milliseconds : DISTRIBUTION, a distribution of request durations
  • istio_request_bytes : DISTRIBUTION, a distribution of request body sizes
  • istio_response_bytes : DISTRIBUTION, a distribution of response body sizes
  • istio_request_messages_total : (gRPC) COUNTER, incremented for every message that arrives from the client
  • istio_response_messages_total : (gRPC) COUNTER, incremented for every message the server sends
  • Three key concepts
    • Metric
      • a counter, gauge, or histogram/distribution of telemetry for service calls (inbound and outbound)
      • the istio_requests_total metric counts the total number of requests going into a service (inbound) or coming out of it (outbound).
      • if a service has both inbound and outbound requests, istio_requests_total will show two entries
    • Dimension
      • inbound/outbound, for example
      • statistics are reported separately for each combination of metric and dimensions
      • a metric can have multiple dimensions
      • istio_requests_total's default dimensions include
        • response_code="200" : details about the request
        • reporter="destination" : from whose point of view the metric is reported
        • source_app="istio-ingressgateway" : the caller
        • destination_app="webapp" : the callee
      • if even one dimension differs, it shows up as a new entry for the metric (see the sample entries after this list)
        • e.g., a response code of 500 is shown on a separate line
      • with different dimensions, you see two separate entries for istio_requests_total.
    • Attribute
      • the value of a given dimension comes from an attribute.
      • attributes are the values the Envoy proxy holds at runtime, e.g.
        • request.path : the path portion of the URL
        • request.url_path : the path portion of the URL, without the query string
        • request.host : the host portion
        • request.scheme : the scheme portion (e.g., http)
        • request.method : the request method
        • request.headers : all request headers; header names are lowercased
      • response attributes
      • connection attributes
      • upstream attributes
      • metadata/filter-state attributes
      • WebAssembly attributes
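
To illustrate how differing dimensions split a metric into separate entries, here are two hypothetical lines that differ only in response_code (the counts are made up):

istio_requests_total{reporter="destination",source_app="istio-ingressgateway",destination_app="webapp",response_code="200"} 7
istio_requests_total{reporter="destination",source_app="istio-ingressgateway",destination_app="webapp",response_code="500"} 1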

 

7.4.1 Configuring Existing Metrics

  • Istio's metrics are configured in the stats proxy plugin using EnvoyFilter resources.
k get envoyfilter -n istio-system
NAME                    AGE
stats-filter-1.11       30d
stats-filter-1.12       30d
stats-filter-1.13       30d
tcp-stats-filter-1.11   30d
tcp-stats-filter-1.12   30d
tcp-stats-filter-1.13   30d

 

The EnvoyFilter below directly configures a filter named istio.stats.
This filter is a WebAssembly (Wasm) plugin that implements the statistics functionality.
The Wasm filter is actually compiled directly into the Envoy codebase and runs in a NULL virtual machine,
so it does not run in a WebAssembly virtual machine.

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  creationTimestamp: "2025-08-18T00:37:27Z"
  generation: 1
  labels:
    install.operator.istio.io/owning-resource-namespace: istio-system
    istio.io/rev: default
    operator.istio.io/component: Pilot
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.13.0
  name: stats-filter-1.13
  namespace: istio-system
  resourceVersion: "869"
  uid: a8e8b9ff-3bc4-4ced-8857-e19d400bc151
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
      proxy:
        proxyVersion: ^1\.13.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.stats
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "debug": "false",
                    "stat_prefix": "istio"
                  }
              root_id: stats_outbound
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.stats
                runtime: envoy.wasm.runtime.null
                vm_id: stats_outbound
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
      proxy:
        proxyVersion: ^1\.13.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.stats <--------- Filter name
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config: <--------------- Filter config
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "debug": "false",
                    "stat_prefix": "istio",
                    "disable_host_header_fallback": true,
                    "metrics": [
                      {
                        "dimensions": {
                          "destination_cluster": "node.metadata['CLUSTER_ID']",
                          "source_cluster": "downstream_peer.cluster_id"
                        }
                      }
                    ]
                  }
              root_id: stats_inbound
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.stats
                runtime: envoy.wasm.runtime.null
                vm_id: stats_inbound
  - applyTo: HTTP_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
      proxy:
        proxyVersion: ^1\.13.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.stats
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "debug": "false",
                    "stat_prefix": "istio",
                    "disable_host_header_fallback": true
                  }
              root_id: stats_outbound
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.stats
                runtime: envoy.wasm.runtime.null
                vm_id: stats_outbound

 

Adding dimensions to an existing metric

  • What if we want to add two dimensions to the istio_requests_total metric?
  • Say we want to see, per mesh ID, which proxy version our upstream calls are hitting
  • The configuration below targets the requests_total metric and adds two new dimensions whose values come from attributes
  • It also removes the request_protocol dimension
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: demo
  values:
    telemetry:
      v2:
        prometheus:
          configOverride:
            inboundSidecar:
              metrics:
              - name: requests_total
                dimensions: <----------- The new dimensions being added
                  upstream_proxy_version: upstream_peer.istio_version
                  source_mesh_id: node.metadata['MESH_ID']
                tags_to_remove: <------- The list of tags to remove
                - request_protocol
            outboundSidecar:
              metrics:
              - name: requests_total
                dimensions:
                  upstream_proxy_version: upstream_peer.istio_version
                  source_mesh_id: node.metadata['MESH_ID']
                tags_to_remove:
                - request_protocol
            gateway:
              metrics:
              - name: requests_total
                dimensions:
                  upstream_proxy_version: upstream_peer.istio_version
                  source_mesh_id: node.metadata['MESH_ID']
                tags_to_remove:
                - request_protocol
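
Applying this is just another istioctl install over the operator configuration; the file path below is assumed to match the book's sample layout:

istioctl install -f ch7/metrics/istio-operator-new-dimensions.yaml -y
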
  • Before these dimensions show up in the metrics, we have to make Istio's proxies aware of them.
  • Annotate the Deployment's pod spec with proxy.istio.io/config, listing the new dimensions under extraStatTags
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: webapp
  name: webapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      annotations:
        proxy.istio.io/config: |-
          extraStatTags:
          - "upstream_proxy_version"
          - "source_mesh_id"
      labels:
        app: webapp
    spec:
      containers:
      - env:
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: istioinaction/webapp:latest
        imagePullPolicy: IfNotPresent
        name: webapp
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        securityContext:
          privileged: false
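
With the annotation in place and some traffic sent, the new dimensions should show up on istio_requests_total. A minimal check against the sidecar's merged Prometheus stats:

k exec -it deploy/webapp -c istio-proxy -- curl -s localhost:15090/stats/prometheus | grep istio_requests_total | grep upstream_proxy_version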

7.4.2 Creating New Metrics

  • To create a new metric, define it in the stats plugin
  • Here we define a new metric named istio_get_calls
  • The istio_ prefix is attached automatically
  • The metric's value is a string containing a CEL (Common Expression Language) expression, which for the COUNTER type must return an integer
  • CEL expressions operate on attributes; in this case, the expression counts HTTP GET requests

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: demo
  values:
    telemetry:
      v2:
        prometheus:
          configOverride:
            inboundSidecar:
              definitions:
              - name: get_calls
                type: COUNTER
                value: "(request.method.startsWith('GET') ? 1 : 0)"
            outboundSidecar:
              definitions:
              - name: get_calls
                type: COUNTER
                value: "(request.method.startsWith('GET') ? 1 : 0)"
            gateway:
              definitions:
              - name: get_calls
                type: COUNTER
                value: "(request.method.startsWith('GET') ? 1 : 0)"

  • Just as new dimensions had to be made known to Istio's proxies explicitly, when creating a new metric you must tell Istio to expose it from the proxy.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: webapp
  name: webapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webapp

  template:
    metadata:
      annotations:
        proxy.istio.io/config: |-
          proxyStatsMatcher:
            inclusionPrefixes:
            - "istio_get_calls"
      labels:
        app: webapp
    spec:
      containers:
      - env:
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: istioinaction/webapp:latest
        imagePullPolicy: IfNotPresent
        name: webapp
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        securityContext:
          privileged: false
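
After redeploying webapp with this annotation and sending a few GET requests through the gateway, the new counter should appear. A quick check, reusing the earlier curl:

curl -s http://192.168.97.2:31733/api/catalog -H "Host: webapp.istioinaction.io" > /dev/null
k exec -it deploy/webapp -c istio-proxy -- curl -s localhost:15000/stats | grep istio_get_calls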

 

7.4.3 Grouping Calls with New Attributes

  • You can build new attributes on top of existing ones, either more fine-grained or specific to your domain.
  • For example, we can create a new attribute called istio_operationId
    • combining request.url_path and request.method to track the number of GET calls going to the catalog service's /items API
  • To do this, we use the attribute_gen proxy plugin
    • a WebAssembly extension
    • this plugin complements the stats plugin.
    • it runs before the stats plugin, so every attribute it produces can be used in stats
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: attribute-gen-example
  namespace: istioinaction
spec:
  configPatches:
  ## Sidecar Outbound
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: istio.stats
      proxy:
        proxyVersion: ^1\.13.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.attributegen
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "attributes": [
                      {
                        "output_attribute": "istio_operationId", <-속성이름
                        "match": [
                         {
                           "value": "getitems", <- 속성 값
                           "condition": "request.url_path == '/items' && request.method == 'GET'"
                         },
                         {
                           "value": "createitem",
                           "condition": "request.url_path == '/items' && request.method == 'POST'"
                         },
                         {
                           "value": "deleteitem",
                           "condition": "request.url_path == '/items' && request.method == 'DELETE'"
                         }
                       ]
                      }
                    ]
                  }
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.attributegen
                runtime: envoy.wasm.runtime.null

 

Then, to identify these API calls to catalog, we add a new dimension, upstream_operation, to the istio_requests_total metric that uses this attribute:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: demo
  values:
    telemetry:
      v2:
        prometheus:
          configOverride:
            outboundSidecar:
              metrics:
              - name: requests_total
                dimensions:
                  upstream_operation: istio_operationId
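
If everything is wired up, calls matching the conditions above are counted under the new dimension. A sketch of how to verify, assuming some GET traffic to the catalog /items path has been sent:

k exec -it deploy/webapp -c istio-proxy -- curl -s localhost:15090/stats/prometheus | grep 'upstream_operation="getitems"'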