
Istio in Action, Chapter 7 - Observability: Understanding the Behavior of Your Services

mokpolar 2025. 9. 17. 10:14

Hello!
As I study the book Istio in Action, I'm going to write up its contents bit by bit.

Chapter 7 is Observability: Understanding the Behavior of Your Services.

Setting Up the Lab Environment

  • macOS, containers running on OrbStack
  • kind, Kubernetes 1.33.1
  • istioctl 1.27.0
curl -L https://istio.io/downloadIstio | sh -

istioctl install --set profile=demo -y
        |\
        | \
        |  \
        |   \
      /||    \
     / ||     \
    /  ||      \
   /   ||       \
  /    ||        \
 /     ||         \
/______||__________\
____________________
  \__       _____/
     \_____/

WARNING: Istio is being upgraded from 1.13.0 to 1.27.0.
         Running this command will overwrite it; use revisions to upgrade alongside the existing version.
         Before upgrading, you may wish to use 'istioctl x precheck' to check for upgrade warnings.
✔ Istio core installed ⛵️
✔ Istiod installed 🧠
✔ Egress gateways installed 🛫
✔ Ingress gateways installed 🛬
✔ Installation complete

 

7.1 What Is Observability?

  • Observability: the degree to which you can understand and reason about a system's internal state just by looking at its external signals and characteristics
  • It takes application instrumentation, network instrumentation, signal-collection infrastructure, and databases, and beyond that, the ability to sift through and combine vast amounts of data into a complete picture when something unexpected happens

 

7.1.1 Observability vs. Monitoring

  • Monitoring: the practice of collecting and aggregating metrics, logs, traces, and so on, and comparing the system's state against predefined criteria
    • When a value crosses a threshold and the system is heading toward a bad state, we take action to correct it
    • We collect and aggregate metrics to watch for, and alert on, states known to be undesirable
  • Observability, by contrast, is characterized by:
    • Assuming the system is highly unpredictable, so not every failure can be known in advance
    • Therefore collecting far more data, even high-cardinality data, so we can explore it and ask questions quickly

 

7.1.2 How Does Istio Help with Observability?

  • Istio's data-plane proxy, Envoy, sits in the network path of requests between services.
  • That position lets it capture key metrics about request handling and service interactions.
  • For example: requests per second, time taken to process a request, and the number of failed requests.
  • New metrics can also be added dynamically.

 

7.2 Exploring Istio's Metrics

7.2.1 Metrics in the Data Plane

 

Deploying the chapter 7 sample services

k apply -f services/catalog/kubernetes/catalog.yaml
serviceaccount/catalog unchanged
service/catalog created
deployment.apps/catalog created

k apply -f services/webapp/kubernetes/webapp.yaml
serviceaccount/webapp unchanged
service/webapp created
deployment.apps/webapp created

k apply -f services/webapp/istio/webapp-catalog-gw-vs.yaml
gateway.networking.istio.io/coolstore-gateway created
virtualservice.networking.istio.io/webapp-virtualservice created

 

Verifying access to the service

curl http://192.168.97.2:31733/api/catalog -H "Host: webapp.istioinaction.io"
[{"id":1,"color":"amber","department":"Eyewear","name":"Elinor Glasses","price":"282.00"},{"id":2,"color":"cyan","department":"Clothing","name":"Atlas Shirt","price":"127.00"},{"id":3,"color":"teal","department":"Clothing","name":"Small Metal Shoes","price":"232.00"},{"id":4,"color":"red","department":"Watches","name":"Red Dragon Watch","price":"232.00"}]
  • The istio_requests_total entries show metrics for requests coming into the webapp service from the ingress gateway
  • istio_requests_total
  • istio_request_bytes
  • istio_response_bytes
  • istio_request_duration_milliseconds
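
To see these for yourself, you can scrape the sidecar's merged Prometheus endpoint directly. A minimal check, assuming a standard sidecar injection (the stats are served on port 15090):

k exec -it deploy/webapp -c istio-proxy -- curl -s localhost:15090/stats/prometheus | grep istio_requests_total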

 

Configuring the proxy to report more Envoy statistics

  • When an application's call passes through its client-side proxy, the proxy makes a routing decision and routes it to an upstream cluster.
  • Upstream cluster: the service that actually gets called, with the relevant configuration applied (load balancing, security, circuit-breaker settings, etc.)
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: control-plane
spec:
  profile: demo
  meshConfig:
    defaultConfig: <------------- Defines the default proxy config for all services
      proxyStatsMatcher: <------- Customizes which metrics the proxy reports
        inclusionPrefixes: <----- Metrics matching these prefixes, in addition to the defaults
        - "cluster.outbound|80||catalog.istioinaction"
  • Increasing the metrics collected across the entire mesh can overload the metrics-collection system
  • A better approach is to specify the metrics to include per workload, using an annotation

metadata:
  annotations:
    proxy.istio.io/config: |- <------ Proxy config for the webapp replicas
      proxyStatsMatcher:
        inclusionPrefixes:
        - "cluster.outbound|80||catalog.istioinaction"
k apply -f ch7/webapp-deployment-stats-inclusion.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: webapp
  name: webapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      annotations:
        proxy.istio.io/config: |-
          proxyStatsMatcher:
            inclusionPrefixes:
            - "cluster.outbound|80||catalog.istioinaction"
      labels:
        app: webapp
    spec:
      containers:
      - env:
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: CATALOG_SERVICE_HOST
          value: catalog.istioinaction
        - name: CATALOG_SERVICE_PORT
          value: "80"
        - name: FORUM_SERVICE_HOST
          value: forum.istioinaction
        - name: FORUM_SERVICE_PORT
          value: "80"
        image: istioinaction/webapp:latest
        imagePullPolicy: IfNotPresent
        name: webapp
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        securityContext:
          privileged: false
curl http://192.168.97.2:31733/api/catalog -H "Host: webapp.istioinaction.io"
[{"id":1,"color":"amber","department":"Eyewear","name":"Elinor Glasses","price":"282.00"},{"id":2,"color":"cyan","department":"Clothing","name":"Atlas Shirt","price":"127.00"},{"id":3,"color":"teal","department":"Clothing","name":"Small Metal Shoes","price":"232.00"},{"id":4,"color":"red","department":"Watches","name":"Red Dragon Watch","price":"232.00"}]

 

Then scrape the Istio stats

k exec -it deploy/webapp -c istio-proxy -- curl localhost:15000/stats | grep catalog

 

These metrics show whether a circuit breaker is being applied to connections or requests headed for the upstream cluster.

cluster.outbound|80||catalog.circuit_breakers.default.cx_open: 0
cluster.outbound|80||catalog.circuit_breakers.default.cx_pool_open: 0
cluster.outbound|80||catalog.circuit_breakers.default.rq_open: 0
cluster.outbound|80||catalog.circuit_breakers.default.rq_pending_open: 0
cluster.outbound|80||catalog.circuit_breakers.default.rq_retry_open: 0
  • When identifying traffic, Envoy distinguishes whether it originates internally or externally.
    • Internal: originated inside the mesh
    • External: originated outside the mesh (entered through the ingress gateway)
  • cluster_name.internal.*: the number of successful requests that originated inside the mesh
  • cluster_name.ssl.*: whether traffic travels to the upstream cluster over TLS
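
A quick way to eyeball these stats, assuming the inclusion configuration above has been applied and some traffic has been sent, is to filter the admin endpoint for those name patterns:

k exec -it deploy/webapp -c istio-proxy -- curl -s localhost:15000/stats | grep -E "catalog.*(internal|ssl)"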

 

7.2.2 Metrics in the Control Plane

  • Querying the control plane's metrics
    k exec -it -n istio-system deploy/istiod -- curl localhost:15014/metrics
    # HELP citadel_server_csr_count The number of CSRs received by Citadel server.
    # TYPE citadel_server_csr_count counter
    citadel_server_csr_count 2
    # HELP citadel_server_root_cert_expiry_seconds The time remaining, in seconds, before the root cert will expire. A negative value indicates the cert is expired.
    # TYPE citadel_server_root_cert_expiry_seconds gauge
    citadel_server_root_cert_expiry_seconds 3.12768891974063e+08
    # HELP citadel_server_root_cert_expiry_timestamp The unix timestamp, in seconds, when the root cert will expire.
    # TYPE citadel_server_root_cert_expiry_timestamp gauge
    citadel_server_root_cert_expiry_timestamp 2.070837463e+09
    # HELP citadel_server_success_cert_issuance_count The number of certificates issuances that have succeeded.
    # TYPE citadel_server_success_cert_issuance_count counter
    citadel_server_success_cert_issuance_count 2
    # HELP endpoint_no_pod Endpoints without an associated pod.
    # TYPE endpoint_no_pod gauge
    endpoint_no_pod 0
    # HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
    # TYPE go_gc_duration_seconds summary
    go_gc_duration_seconds{quantile="0"} 2.5541e-05
    go_gc_duration_seconds{quantile="0.25"} 5.8291e-05
    go_gc_duration_seconds{quantile="0.5"} 0.000922291
    go_gc_duration_seconds{quantile="0.75"} 0.002063167
    go_gc_duration_seconds{quantile="1"} 0.008150865
    go_gc_duration_seconds_sum 0.022038092
    go_gc_duration_seconds_count 15
    # HELP go_gc_gogc_percent Heap size target percentage configured by the user, otherwise 100. This value is set by the GOGC environment variable, and the runtime/debug.SetGCPercent function. Sourced from /gc/gogc:percent.
    # TYPE go_gc_gogc_percent gauge
    go_gc_gogc_percent 100
    # HELP go_gc_gomemlimit_bytes Go runtime memory limit configured by the user, otherwise math.MaxInt64. This value is set by the GOMEMLIMIT environment variable, and the runtime/debug.SetMemoryLimit function. Sourced from /gc/gomemlimit:bytes.
    # TYPE go_gc_gomemlimit_bytes gauge
    go_gc_gomemlimit_bytes 4.171526144e+09
    # HELP go_goroutines Number of goroutines that currently exist.
    # TYPE go_goroutines gauge
    go_goroutines 673
    # HELP go_info Information about the Go environment.
    # TYPE go_info gauge
    go_info{version="go1.24.4"} 1
    # HELP go_memstats_alloc_bytes Number of bytes allocated in heap and currently in use. Equals to /memory/classes/heap/objects:bytes.
    # TYPE go_memstats_alloc_bytes gauge
    go_memstats_alloc_bytes 1.7508984e+07
    # HELP go_memstats_alloc_bytes_total Total number of bytes allocated in heap until now, even if released already. Equals to /gc/heap/allocs:bytes.
    # TYPE go_memstats_alloc_bytes_total counter
    go_memstats_alloc_bytes_total 7.8924736e+07
    # HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table. Equals to /memory/classes/profiling/buckets:bytes.
    # TYPE go_memstats_buck_hash_sys_bytes gauge
    go_memstats_buck_hash_sys_bytes 1.593119e+06
    # HELP go_memstats_frees_total Total number of heap objects frees. Equals to /gc/heap/frees:objects + /gc/heap/tiny/allocs:objects.
    # TYPE go_memstats_frees_total counter
    go_memstats_frees_total 528895
    # HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata. Equals to /memory/classes/metadata/other:bytes.
    # TYPE go_memstats_gc_sys_bytes gauge
    go_memstats_gc_sys_bytes 4.105184e+06
    # HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and currently in use, same as go_memstats_alloc_bytes. Equals to /memory/classes/heap/objects:bytes.
    # TYPE go_memstats_heap_alloc_bytes gauge
    go_memstats_heap_alloc_bytes 1.7508984e+07
    # HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used. Equals to /memory/classes/heap/released:bytes + /memory/classes/heap/free:bytes.
    # TYPE go_memstats_heap_idle_bytes gauge
    go_memstats_heap_idle_bytes 1.3312e+07
    # HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use. Equals to /memory/classes/heap/objects:bytes + /memory/classes/heap/unused:bytes
    # TYPE go_memstats_heap_inuse_bytes gauge
    go_memstats_heap_inuse_bytes 2.5092096e+07
    # HELP go_memstats_heap_objects Number of currently allocated objects. Equals to /gc/heap/objects:objects.
    # TYPE go_memstats_heap_objects gauge
    go_memstats_heap_objects 90454
    # HELP go_memstats_heap_released_bytes Number of heap bytes released to OS. Equals to /memory/classes/heap/released:bytes.
    # TYPE go_memstats_heap_released_bytes gauge
    go_memstats_heap_released_bytes 9.904128e+06
    # HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system. Equals to /memory/classes/heap/objects:bytes + /memory/classes/heap/unused:bytes + /memory/classes/heap/released:bytes + /memory/classes/heap/free:bytes.
    # TYPE go_memstats_heap_sys_bytes gauge
    go_memstats_heap_sys_bytes 3.8404096e+07
    # HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
    # TYPE go_memstats_last_gc_time_seconds gauge
    go_memstats_last_gc_time_seconds 1.758068543081912e+09
    # HELP go_memstats_mallocs_total Total number of heap objects allocated, both live and gc-ed. Semantically a counter version for go_memstats_heap_objects gauge. Equals to /gc/heap/allocs:objects + /gc/heap/tiny/allocs:objects.
    # TYPE go_memstats_mallocs_total counter
    go_memstats_mallocs_total 619349
    # HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures. Equals to /memory/classes/metadata/mcache/inuse:bytes.
    # TYPE go_memstats_mcache_inuse_bytes gauge
    go_memstats_mcache_inuse_bytes 9664
    # HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system. Equals to /memory/classes/metadata/mcache/inuse:bytes + /memory/classes/metadata/mcache/free:bytes.
    # TYPE go_memstats_mcache_sys_bytes gauge
    go_memstats_mcache_sys_bytes 15704
    # HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures. Equals to /memory/classes/metadata/mspan/inuse:bytes.
    # TYPE go_memstats_mspan_inuse_bytes gauge
    go_memstats_mspan_inuse_bytes 427840
    # HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system. Equals to /memory/classes/metadata/mspan/inuse:bytes + /memory/classes/metadata/mspan/free:bytes.
    # TYPE go_memstats_mspan_sys_bytes gauge
    go_memstats_mspan_sys_bytes 456960
    # HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place. Equals to /gc/heap/goal:bytes.
    # TYPE go_memstats_next_gc_bytes gauge
    go_memstats_next_gc_bytes 3.514093e+07
    # HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations. Equals to /memory/classes/other:bytes.
    # TYPE go_memstats_other_sys_bytes gauge
    go_memstats_other_sys_bytes 1.780657e+06
    # HELP go_memstats_stack_inuse_bytes Number of bytes obtained from system for stack allocator in non-CGO environments. Equals to /memory/classes/heap/stacks:bytes.
    # TYPE go_memstats_stack_inuse_bytes gauge
    go_memstats_stack_inuse_bytes 3.538944e+06
    # HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator. Equals to /memory/classes/heap/stacks:bytes + /memory/classes/os-stacks:bytes.
    # TYPE go_memstats_stack_sys_bytes gauge
    go_memstats_stack_sys_bytes 3.538944e+06
    # HELP go_memstats_sys_bytes Number of bytes obtained from system. Equals to /memory/classes/total:byte.
    # TYPE go_memstats_sys_bytes gauge
    go_memstats_sys_bytes 4.9894664e+07
    # HELP go_sched_gomaxprocs_threads The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously. Sourced from /sched/gomaxprocs:threads.
    # TYPE go_sched_gomaxprocs_threads gauge
    go_sched_gomaxprocs_threads 8
    # HELP go_threads Number of OS threads created.
    # TYPE go_threads gauge
    go_threads 13
    # HELP grpc_server_handled_total Total number of RPCs completed on the server, regardless of success or failure.
    # TYPE grpc_server_handled_total counter
    grpc_server_handled_total{grpc_code="OK",grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary"} 2
    # HELP grpc_server_handling_seconds Histogram of response latency (seconds) of gRPC that had been application-level handled by the server.
    # TYPE grpc_server_handling_seconds histogram
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="0.005"} 1
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="0.01"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="0.025"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="0.05"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="0.1"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="0.25"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="0.5"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="1"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="2.5"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="5"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="10"} 2
    grpc_server_handling_seconds_bucket{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary",le="+Inf"} 2
    grpc_server_handling_seconds_sum{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary"} 0.008609167000000001
    grpc_server_handling_seconds_count{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary"} 2
    # HELP grpc_server_msg_received_total Total number of RPC stream messages received on the server.
    # TYPE grpc_server_msg_received_total counter
    grpc_server_msg_received_total{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary"} 2
    # HELP grpc_server_msg_sent_total Total number of gRPC stream messages sent by the server.
    # TYPE grpc_server_msg_sent_total counter
    grpc_server_msg_sent_total{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary"} 2
    # HELP grpc_server_started_total Total number of RPCs started on the server.
    # TYPE grpc_server_started_total counter
    grpc_server_started_total{grpc_method="CreateCertificate",grpc_service="istio.v1.auth.IstioCertificateService",grpc_type="unary"} 2
    # HELP istio_build Istio component build info
    # TYPE istio_build gauge
    istio_build{component="pilot",tag="1.27.0"} 1
    # HELP istiod_managed_clusters Number of clusters managed by istiod
    # TYPE istiod_managed_clusters gauge
    istiod_managed_clusters{cluster_type="local"} 1
    istiod_managed_clusters{cluster_type="remote"} 0
    # HELP istiod_uptime_seconds Current istiod server uptime in seconds
    # TYPE istiod_uptime_seconds gauge
    istiod_uptime_seconds 787.721140208
    # HELP pilot_conflict_inbound_listener Number of conflicting inbound listeners.
    # TYPE pilot_conflict_inbound_listener gauge
    pilot_conflict_inbound_listener 0
    # HELP pilot_conflict_outbound_listener_tcp_over_current_tcp Number of conflicting tcp listeners with current tcp listener.
    # TYPE pilot_conflict_outbound_listener_tcp_over_current_tcp gauge
    pilot_conflict_outbound_listener_tcp_over_current_tcp 0
    # HELP pilot_debounce_time Delay in seconds between the first config enters debouncing and the merged push request is pushed into the push queue (includes pushcontext_init_seconds).
    # TYPE pilot_debounce_time histogram
    pilot_debounce_time_bucket{le="0.01"} 0
    pilot_debounce_time_bucket{le="0.1"} 0
    pilot_debounce_time_bucket{le="1"} 10
    pilot_debounce_time_bucket{le="3"} 10
    pilot_debounce_time_bucket{le="5"} 10
    pilot_debounce_time_bucket{le="10"} 10
    pilot_debounce_time_bucket{le="20"} 10
    pilot_debounce_time_bucket{le="30"} 10
    pilot_debounce_time_bucket{le="+Inf"} 10
    pilot_debounce_time_sum 1.139648459
    pilot_debounce_time_count 10
    # HELP pilot_destrule_subsets Duplicate subsets across destination rules for same host
    # TYPE pilot_destrule_subsets gauge
    pilot_destrule_subsets 0
    # HELP pilot_dns_cluster_without_endpoints DNS clusters without endpoints caused by the endpoint field in STRICT_DNS type cluster is not set or the corresponding subset cannot select any endpoint
    # TYPE pilot_dns_cluster_without_endpoints gauge
    pilot_dns_cluster_without_endpoints 0
    # HELP pilot_duplicate_envoy_clusters Duplicate envoy clusters caused by service entries with same hostname
    # TYPE pilot_duplicate_envoy_clusters gauge
    pilot_duplicate_envoy_clusters 0
    # HELP pilot_eds_no_instances Number of clusters without instances.
    # TYPE pilot_eds_no_instances gauge
    pilot_eds_no_instances 0
    # HELP pilot_endpoint_not_ready Endpoint found in unready state.
    # TYPE pilot_endpoint_not_ready gauge
    pilot_endpoint_not_ready 0
    # HELP pilot_inbound_updates Total number of updates received by pilot.
    # TYPE pilot_inbound_updates counter
    pilot_inbound_updates{type="config"} 56
    pilot_inbound_updates{type="eds"} 33
    pilot_inbound_updates{type="svc"} 9
    # HELP pilot_info Pilot version and build information.
    # TYPE pilot_info gauge
    pilot_info{version="1.27.0-7359d8be2504f2b191f7d94156af08e6590d2d1c-Clean"} 1
    # HELP pilot_k8s_cfg_events Events from k8s config.
    # TYPE pilot_k8s_cfg_events counter
    pilot_k8s_cfg_events{event="add",type="DestinationRule"} 2
    pilot_k8s_cfg_events{event="add",type="EnvoyFilter"} 6
    pilot_k8s_cfg_events{event="add",type="Gateway"} 1
    pilot_k8s_cfg_events{event="add",type="VirtualService"} 2
    # HELP pilot_k8s_reg_events Events from k8s registry.
    # TYPE pilot_k8s_reg_events counter
    pilot_k8s_reg_events{event="add",type="EndpointSlice"} 9
    pilot_k8s_reg_events{event="add",type="Namespaces"} 7
    pilot_k8s_reg_events{event="add",type="Nodes"} 1
    pilot_k8s_reg_events{event="add",type="Pods"} 17
    pilot_k8s_reg_events{event="add",type="Services"} 9
    pilot_k8s_reg_events{event="delete",type="Pods"} 1
    pilot_k8s_reg_events{event="update",type="EndpointSlice"} 15
    pilot_k8s_reg_events{event="update",type="Nodes"} 39
    pilot_k8s_reg_events{event="update",type="Pods"} 24
    # HELP pilot_no_ip Pods not found in the endpoint table, possibly invalid.
    # TYPE pilot_no_ip gauge
    pilot_no_ip 0
    # HELP pilot_proxy_convergence_time Delay in seconds between config change and a proxy receiving all required configuration.
    # TYPE pilot_proxy_convergence_time histogram
    pilot_proxy_convergence_time_bucket{le="0.1"} 7
    pilot_proxy_convergence_time_bucket{le="0.5"} 7
    pilot_proxy_convergence_time_bucket{le="1"} 7
    pilot_proxy_convergence_time_bucket{le="3"} 7
    pilot_proxy_convergence_time_bucket{le="5"} 7
    pilot_proxy_convergence_time_bucket{le="10"} 7
    pilot_proxy_convergence_time_bucket{le="20"} 7
    pilot_proxy_convergence_time_bucket{le="30"} 7
    pilot_proxy_convergence_time_bucket{le="+Inf"} 7
    pilot_proxy_convergence_time_sum 0.00831075
    pilot_proxy_convergence_time_count 7
    # HELP pilot_proxy_queue_time Time in seconds, a proxy is in the push queue before being dequeued.
    # TYPE pilot_proxy_queue_time histogram
    pilot_proxy_queue_time_bucket{le="0.1"} 7
    pilot_proxy_queue_time_bucket{le="0.5"} 7
    pilot_proxy_queue_time_bucket{le="1"} 7
    pilot_proxy_queue_time_bucket{le="3"} 7
    pilot_proxy_queue_time_bucket{le="5"} 7
    pilot_proxy_queue_time_bucket{le="10"} 7
    pilot_proxy_queue_time_bucket{le="20"} 7
    pilot_proxy_queue_time_bucket{le="30"} 7
    pilot_proxy_queue_time_bucket{le="+Inf"} 7
    pilot_proxy_queue_time_sum 0.0008617910000000001
    pilot_proxy_queue_time_count 7
    # HELP pilot_push_triggers Total number of times a push was triggered, labeled by reason for the push.
    # TYPE pilot_push_triggers counter
    pilot_push_triggers{type="endpoint"} 5
    pilot_push_triggers{type="proxy"} 2
    # HELP pilot_pushcontext_init_seconds Total time in seconds Pilot takes to init pushContext.
    # TYPE pilot_pushcontext_init_seconds histogram
    pilot_pushcontext_init_seconds_bucket{le="0.01"} 3
    pilot_pushcontext_init_seconds_bucket{le="0.1"} 3
    pilot_pushcontext_init_seconds_bucket{le="0.5"} 3
    pilot_pushcontext_init_seconds_bucket{le="1"} 3
    pilot_pushcontext_init_seconds_bucket{le="3"} 3
    pilot_pushcontext_init_seconds_bucket{le="5"} 3
    pilot_pushcontext_init_seconds_bucket{le="+Inf"} 3
    pilot_pushcontext_init_seconds_sum 0.0020785
    pilot_pushcontext_init_seconds_count 3
    # HELP pilot_services Total services known to pilot.
    # TYPE pilot_services gauge
    pilot_services 9
    # HELP pilot_virt_services Total virtual services known to pilot.
    # TYPE pilot_virt_services gauge
    pilot_virt_services 2
    # HELP pilot_vservice_dup_domain Virtual services with dup domains.
    # TYPE pilot_vservice_dup_domain gauge
    pilot_vservice_dup_domain 0
    # HELP pilot_xds Number of endpoints connected to this pilot using XDS.
    # TYPE pilot_xds gauge
    pilot_xds{version="1.27.0"} 2
    # HELP pilot_xds_config_size_bytes Distribution of configuration sizes pushed to clients
    # TYPE pilot_xds_config_size_bytes histogram
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.cluster.v3.Cluster",le="1"} 0
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.cluster.v3.Cluster",le="10000"} 0
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.cluster.v3.Cluster",le="1e+06"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.cluster.v3.Cluster",le="4e+06"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.cluster.v3.Cluster",le="1e+07"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.cluster.v3.Cluster",le="4e+07"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.cluster.v3.Cluster",le="+Inf"} 4
    pilot_xds_config_size_bytes_sum{type="type.googleapis.com/envoy.config.cluster.v3.Cluster"} 85400
    pilot_xds_config_size_bytes_count{type="type.googleapis.com/envoy.config.cluster.v3.Cluster"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",le="1"} 0
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",le="10000"} 9
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",le="1e+06"} 9
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",le="4e+06"} 9
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",le="1e+07"} 9
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",le="4e+07"} 9
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",le="+Inf"} 9
    pilot_xds_config_size_bytes_sum{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment"} 13256
    pilot_xds_config_size_bytes_count{type="type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment"} 9
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.listener.v3.Listener",le="1"} 2
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.listener.v3.Listener",le="10000"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.listener.v3.Listener",le="1e+06"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.listener.v3.Listener",le="4e+06"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.listener.v3.Listener",le="1e+07"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.listener.v3.Listener",le="4e+07"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.listener.v3.Listener",le="+Inf"} 4
    pilot_xds_config_size_bytes_sum{type="type.googleapis.com/envoy.config.listener.v3.Listener"} 6844
    pilot_xds_config_size_bytes_count{type="type.googleapis.com/envoy.config.listener.v3.Listener"} 4
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration",le="1"} 0
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration",le="10000"} 2
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration",le="1e+06"} 2
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration",le="4e+06"} 2
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration",le="1e+07"} 2
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration",le="4e+07"} 2
    pilot_xds_config_size_bytes_bucket{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration",le="+Inf"} 2
    pilot_xds_config_size_bytes_sum{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration"} 1076
    pilot_xds_config_size_bytes_count{type="type.googleapis.com/envoy.config.route.v3.RouteConfiguration"} 2
    # HELP pilot_xds_push_time Total time in seconds Pilot takes to push lds, rds, cds and eds.
    # TYPE pilot_xds_push_time histogram
    pilot_xds_push_time_bucket{type="cds",le="0.01"} 4
    pilot_xds_push_time_bucket{type="cds",le="0.1"} 4
    pilot_xds_push_time_bucket{type="cds",le="1"} 4
    pilot_xds_push_time_bucket{type="cds",le="3"} 4
    pilot_xds_push_time_bucket{type="cds",le="5"} 4
    pilot_xds_push_time_bucket{type="cds",le="10"} 4
    pilot_xds_push_time_bucket{type="cds",le="20"} 4
    pilot_xds_push_time_bucket{type="cds",le="30"} 4
    pilot_xds_push_time_bucket{type="cds",le="+Inf"} 4
    pilot_xds_push_time_sum{type="cds"} 0.003247625
    pilot_xds_push_time_count{type="cds"} 4
    pilot_xds_push_time_bucket{type="eds",le="0.01"} 9
    pilot_xds_push_time_bucket{type="eds",le="0.1"} 9
    pilot_xds_push_time_bucket{type="eds",le="1"} 9
    pilot_xds_push_time_bucket{type="eds",le="3"} 9
    pilot_xds_push_time_bucket{type="eds",le="5"} 9
    pilot_xds_push_time_bucket{type="eds",le="10"} 9
    pilot_xds_push_time_bucket{type="eds",le="20"} 9
    pilot_xds_push_time_bucket{type="eds",le="30"} 9
    pilot_xds_push_time_bucket{type="eds",le="+Inf"} 9
    pilot_xds_push_time_sum{type="eds"} 0.0025550819999999998
    pilot_xds_push_time_count{type="eds"} 9
    pilot_xds_push_time_bucket{type="lds",le="0.01"} 4
    pilot_xds_push_time_bucket{type="lds",le="0.1"} 4
    pilot_xds_push_time_bucket{type="lds",le="1"} 4
    pilot_xds_push_time_bucket{type="lds",le="3"} 4
    pilot_xds_push_time_bucket{type="lds",le="5"} 4
    pilot_xds_push_time_bucket{type="lds",le="10"} 4
    pilot_xds_push_time_bucket{type="lds",le="20"} 4
    pilot_xds_push_time_bucket{type="lds",le="30"} 4
    pilot_xds_push_time_bucket{type="lds",le="+Inf"} 4
    pilot_xds_push_time_sum{type="lds"} 0.0073142089999999995
    pilot_xds_push_time_count{type="lds"} 4
    pilot_xds_push_time_bucket{type="rds",le="0.01"} 2
    pilot_xds_push_time_bucket{type="rds",le="0.1"} 2
    pilot_xds_push_time_bucket{type="rds",le="1"} 2
    pilot_xds_push_time_bucket{type="rds",le="3"} 2
    pilot_xds_push_time_bucket{type="rds",le="5"} 2
    pilot_xds_push_time_bucket{type="rds",le="10"} 2
    pilot_xds_push_time_bucket{type="rds",le="20"} 2
    pilot_xds_push_time_bucket{type="rds",le="30"} 2
    pilot_xds_push_time_bucket{type="rds",le="+Inf"} 2
    pilot_xds_push_time_sum{type="rds"} 0.000928501
    pilot_xds_push_time_count{type="rds"} 2
    # HELP pilot_xds_pushes Pilot build and send errors for lds, rds, cds and eds.
    # TYPE pilot_xds_pushes counter
    pilot_xds_pushes{type="cds"} 4
    pilot_xds_pushes{type="eds"} 9
    pilot_xds_pushes{type="lds"} 4
    pilot_xds_pushes{type="rds"} 2
    # HELP pilot_xds_send_time Total time in seconds Pilot takes to send generated configuration.
    # TYPE pilot_xds_send_time histogram
    pilot_xds_send_time_bucket{le="0.01"} 19
    pilot_xds_send_time_bucket{le="0.1"} 19
    pilot_xds_send_time_bucket{le="1"} 19
    pilot_xds_send_time_bucket{le="3"} 19
    pilot_xds_send_time_bucket{le="5"} 19
    pilot_xds_send_time_bucket{le="10"} 19
    pilot_xds_send_time_bucket{le="20"} 19
    pilot_xds_send_time_bucket{le="30"} 19
    pilot_xds_send_time_bucket{le="+Inf"} 19
    pilot_xds_send_time_sum 0.0013460400000000002
    pilot_xds_send_time_count 19
    # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
    # TYPE process_cpu_seconds_total counter
    process_cpu_seconds_total 8.46
    # HELP process_max_fds Maximum number of open file descriptors.
    # TYPE process_max_fds gauge
    process_max_fds 1.073741816e+09
    # HELP process_network_receive_bytes_total Number of bytes received by the process over the network.
    # TYPE process_network_receive_bytes_total counter
    process_network_receive_bytes_total 3.284157e+06
    # HELP process_network_transmit_bytes_total Number of bytes sent by the process over the network.
    # TYPE process_network_transmit_bytes_total counter
    process_network_transmit_bytes_total 951206
    # HELP process_open_fds Number of open file descriptors.
    # TYPE process_open_fds gauge
    process_open_fds 17
    # HELP process_resident_memory_bytes Resident memory size in bytes.
    # TYPE process_resident_memory_bytes gauge
    process_resident_memory_bytes 5.8810368e+07
    # HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
    # TYPE process_start_time_seconds gauge
    process_start_time_seconds 1.75806778225e+09
    # HELP process_virtual_memory_bytes Virtual memory size in bytes.
    # TYPE process_virtual_memory_bytes gauge
    process_virtual_memory_bytes 1.366982656e+09
    # HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
    # TYPE process_virtual_memory_max_bytes gauge
    process_virtual_memory_max_bytes 1.8446744073709552e+19

 

From these, you can see when the root certificate used to sign workload identity CSRs (certificate signing requests) will expire, as well as how many CSRs the control plane has received and how many certificates it has issued.

# HELP citadel_server_root_cert_expiry_seconds The time remaining, in seconds, before the root cert will expire. A negative value indicates the cert is expired.
# TYPE citadel_server_root_cert_expiry_seconds gauge
citadel_server_root_cert_expiry_seconds 3.12768891974063e+08
# HELP citadel_server_root_cert_expiry_timestamp The unix timestamp, in seconds, when the root cert will expire.
# TYPE citadel_server_root_cert_expiry_timestamp gauge
citadel_server_root_cert_expiry_timestamp 2.070837463e+09
# HELP citadel_server_success_cert_issuance_count The number of certificates issuances that have succeeded.
# TYPE citadel_server_success_cert_issuance_count counter
citadel_server_success_cert_issuance_count 2
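
To turn the expiry timestamp into a human-readable date, you can feed it to date. On macOS (BSD date, as used in this lab), with the GNU coreutils form shown for comparison:

date -r 2070837463          # BSD/macOS
date -u -d @2070837463      # GNU coreutils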

Istio version

# HELP istio_build Istio component build info
# TYPE istio_build gauge
istio_build{component="pilot",tag="1.27.0"} 1

The distribution of time taken to push configuration out to the data-plane proxies and get them in sync

# HELP pilot_proxy_convergence_time Delay in seconds between config change and a proxy receiving all required configuration.
# TYPE pilot_proxy_convergence_time histogram
pilot_proxy_convergence_time_bucket{le="0.1"} 7
pilot_proxy_convergence_time_bucket{le="0.5"} 7
pilot_proxy_convergence_time_bucket{le="1"} 7
pilot_proxy_convergence_time_bucket{le="3"} 7
pilot_proxy_convergence_time_bucket{le="5"} 7
pilot_proxy_convergence_time_bucket{le="10"} 7
pilot_proxy_convergence_time_bucket{le="20"} 7
pilot_proxy_convergence_time_bucket{le="30"} 7
pilot_proxy_convergence_time_bucket{le="+Inf"} 7
pilot_proxy_convergence_time_sum 0.00831075
pilot_proxy_convergence_time_count 7
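
Once Prometheus is scraping istiod (set up in section 7.3) and reachable on localhost:9090, PromQL can turn this histogram into percentiles. As a sketch, an estimated p99 convergence time queried through the HTTP API:

curl -s 'localhost:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.99, sum(rate(pilot_proxy_convergence_time_bucket[5m])) by (le))'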

 

The number of services known to the control plane, the number of user-configured VirtualService resources, and the number of connected proxies

# HELP pilot_services Total services known to pilot.
# TYPE pilot_services gauge
pilot_services 9
# HELP pilot_virt_services Total virtual services known to pilot.
# TYPE pilot_virt_services gauge
pilot_virt_services 2
# HELP pilot_vservice_dup_domain Virtual services with dup domains.
# TYPE pilot_vservice_dup_domain gauge
pilot_vservice_dup_domain 0
# HELP pilot_xds Number of endpoints connected to this pilot using XDS.
# TYPE pilot_xds gauge
pilot_xds{version="1.27.0"} 2

 

Update counts for each xDS API

# HELP pilot_xds_pushes Pilot build and send errors for lds, rds, cds and eds.
# TYPE pilot_xds_pushes counter
pilot_xds_pushes{type="cds"} 4
pilot_xds_pushes{type="eds"} 9
pilot_xds_pushes{type="lds"} 4
pilot_xds_pushes{type="rds"} 2

 

7.3 Scraping Istio Metrics with Prometheus

7.3.1 Setting Up Prometheus and Grafana

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
"prometheus-community" has been added to your repositories

helm repo update
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "volcano-sh" chart repository
...Successfully got an update from the "traefik" chart repository
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "milvus" chart repository
...Successfully got an update from the "vector" chart repository
...Successfully got an update from the "localstack-repo" chart repository
...Successfully got an update from the "localstack" chart repository
...Successfully got an update from the "flagger" chart repository
Update Complete. ⎈Happy Helming!⎈
k create ns prometheus
namespace/prometheus created


helm install prom prometheus-community/kube-prometheus-stack --version 13.13.1 -n prometheus -f ch7/prom-values.yaml
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
k get po -n prometheus
NAME                                                   READY   STATUS             RESTARTS      AGE
prom-grafana-57bcb4cc59-rvgsd                          2/2     Running            0             68s
prom-kube-prometheus-stack-admission-patch-tbbc4       0/1     CrashLoopBackOff   3 (26s ago)   66s
prom-kube-prometheus-stack-operator-69888c5fb6-bv2pj   1/1     Running            0             68s
prometheus-prom-kube-prometheus-stack-prometheus-0     2/2     Running            1 (9s ago)    39s

 

7.3.2 Configuring the Prometheus Operator to Scrape the Istio Control Plane and Workloads

  • The Prometheus Operator's custom resources ServiceMonitor and PodMonitor
    • configure Prometheus to collect metrics from Istio
  • The following ServiceMonitor sets up scraping of the Istio control-plane components
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istio-component-monitor
  namespace: prometheus
  labels:
    monitoring: istio-components
    release: prom
spec:
  jobLabel: istio
  targetLabels: [app]
  selector:
    matchExpressions:
    - {key: istio, operator: In, values: [pilot]}
  namespaceSelector:
    any: true
  endpoints:
  - port: http-monitoring
    interval: 15s
k apply -f ch7/service-monitor-cp.yaml
servicemonitor.monitoring.coreos.com/istio-component-monitor created
  • Now you can see control-plane telemetry in Prometheus
    • the number of sidecars connected to the control plane
    • configuration conflicts
    • churn within the mesh
    • basic memory/CPU usage of the control plane
k -n prometheus port-forward statefulset/prometheus-prom-kube-prometheus-stack-prometheus 9090
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090
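
It helps to generate some traffic first so there is telemetry to look at. A simple loop like the following (the iteration count is arbitrary), after which queries such as istiod_uptime_seconds or pilot_xds can be tried in the UI at http://localhost:9090:

for i in $(seq 1 100); do
  curl -s http://192.168.97.2:31733/api/catalog -H "Host: webapp.istioinaction.io" > /dev/null
done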

  • Enabling data-plane collection
    k apply -f ch7/pod-monitor-dp.yaml
    podmonitor.monitoring.coreos.com/envoy-stats-monitor created
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: envoy-stats-monitor
  namespace: prometheus
  labels:
    monitoring: istio-proxies
    release: prom
spec:
  selector:
    matchExpressions:
    - {key: istio-prometheus-ignore, operator: DoesNotExist}
  namespaceSelector:
    any: true
  jobLabel: envoy-stats
  podMetricsEndpoints:
  - path: /stats/prometheus
    interval: 15s
    relabelings:
    - action: keep
      sourceLabels: [__meta_kubernetes_pod_container_name]
      regex: "istio-proxy"
    - action: keep
      sourceLabels: [__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape]
    - sourceLabels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      targetLabel: __address__
    - action: labeldrop
      regex: "__meta_kubernetes_pod_label_(.+)"
    - sourceLabels: [__meta_kubernetes_namespace]
      action: replace
      targetLabel: namespace
    - sourceLabels: [__meta_kubernetes_pod_name]
      action: replace
      targetLabel: pod_name
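
Once the PodMonitor is applied and some traffic has flowed, you can confirm that data-plane metrics are arriving. A quick check through the Prometheus HTTP API, assuming the port-forward above is still running:

curl -s 'localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(istio_requests_total[1m])) by (destination_app)'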

 

7.4 Customizing Istio's Standard Metrics

  • istio_requests_total : COUNTER, incremented for each request
  • istio_request_duration_milliseconds : DISTRIBUTION, a distribution of request durations
  • istio_request_bytes : DISTRIBUTION, a distribution of request body sizes
  • istio_response_bytes : DISTRIBUTION, a distribution of response body sizes
  • istio_request_messages_total : (gRPC) COUNTER, incremented for every message that arrives from the client
  • istio_response_messages_total : (gRPC) COUNTER, incremented for every message the server sends
  • Three key concepts
    • Metric
      • a counter, gauge, or histogram/distribution of telemetry for service calls (inbound and outbound)
      • the istio_requests_total metric counts the total number of requests going into a service (inbound) or coming out of it (outbound).
      • if a service has both inbound and outbound requests, istio_requests_total will show two entries
    • Dimension
      • inbound/outbound, for example
      • statistics are reported separately for each combination of metric and dimensions
      • a metric can have multiple dimensions
      • istio_requests_total's default dimensions include
        • response_code="200" : details about the request
        • reporter="destination" : from whose point of view the metric is reported
        • source_app="istio-ingressgateway" : the caller
        • destination_app="webapp" : the callee
      • if even one dimension differs, it shows up as a new entry for the metric (see the sample entries after this list)
        • e.g., a response code of 500 is shown on a separate line
      • with different dimensions, you see two separate entries for istio_requests_total.
    • Attribute
      • the value of a given dimension comes from an attribute.
      • attributes are the values the Envoy proxy holds at runtime, e.g.
        • request.path : the path portion of the URL
        • request.url_path : the path portion of the URL, without the query string
        • request.host : the host portion
        • request.scheme : the scheme portion (e.g., http)
        • request.method : the request method
        • request.headers : all request headers; header names are lowercased
      • response attributes
      • connection attributes
      • upstream attributes
      • metadata/filter-state attributes
      • WebAssembly attributes
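
To illustrate how differing dimensions split a metric into separate entries, here are two hypothetical lines that differ only in response_code (the counts are made up):

istio_requests_total{reporter="destination",source_app="istio-ingressgateway",destination_app="webapp",response_code="200"} 7
istio_requests_total{reporter="destination",source_app="istio-ingressgateway",destination_app="webapp",response_code="500"} 1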

 

7.4.1 Configuring Existing Metrics

  • Istio's metrics are configured in the stats proxy plugin using EnvoyFilter resources.
k get envoyfilter -n istio-system
NAME                    AGE
stats-filter-1.11       30d
stats-filter-1.12       30d
stats-filter-1.13       30d
tcp-stats-filter-1.11   30d
tcp-stats-filter-1.12   30d
tcp-stats-filter-1.13   30d

 

The EnvoyFilter below directly configures a filter named istio.stats.
This filter is a WebAssembly (Wasm) plugin that implements the statistics functionality.
The Wasm filter is actually compiled directly into the Envoy codebase and runs in a NULL virtual machine,
so it does not run in a WebAssembly virtual machine.

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  creationTimestamp: "2025-08-18T00:37:27Z"
  generation: 1
  labels:
    install.operator.istio.io/owning-resource-namespace: istio-system
    istio.io/rev: default
    operator.istio.io/component: Pilot
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.13.0
  name: stats-filter-1.13
  namespace: istio-system
  resourceVersion: "869"
  uid: a8e8b9ff-3bc4-4ced-8857-e19d400bc151
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
      proxy:
        proxyVersion: ^1\.13.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.stats
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "debug": "false",
                    "stat_prefix": "istio"
                  }
              root_id: stats_outbound
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.stats
                runtime: envoy.wasm.runtime.null
                vm_id: stats_outbound
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
      proxy:
        proxyVersion: ^1\.13.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.stats <--------- Filter name
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config: <--------------- Filter config
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "debug": "false",
                    "stat_prefix": "istio",
                    "disable_host_header_fallback": true,
                    "metrics": [
                      {
                        "dimensions": {
                          "destination_cluster": "node.metadata['CLUSTER_ID']",
                          "source_cluster": "downstream_peer.cluster_id"
                        }
                      }
                    ]
                  }
              root_id: stats_inbound
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.stats
                runtime: envoy.wasm.runtime.null
                vm_id: stats_inbound
  - applyTo: HTTP_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
      proxy:
        proxyVersion: ^1\.13.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.stats
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "debug": "false",
                    "stat_prefix": "istio",
                    "disable_host_header_fallback": true
                  }
              root_id: stats_outbound
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.stats
                runtime: envoy.wasm.runtime.null
                vm_id: stats_outbound

 

Adding dimensions to an existing metric

  • What if we want to add two dimensions to the istio_requests_total metric?
  • Say we want to see, per mesh ID, which proxy version our upstream calls are hitting
  • The configuration below targets the requests_total metric and adds two new dimensions whose values come from attributes
  • It also removes the request_protocol dimension
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: demo
  values:
    telemetry:
      v2:
        prometheus:
          configOverride:
            inboundSidecar:
              metrics:
              - name: requests_total
                dimensions: <----------- The new dimensions being added
                  upstream_proxy_version: upstream_peer.istio_version
                  source_mesh_id: node.metadata['MESH_ID']
                tags_to_remove: <------- The list of tags to remove
                - request_protocol
            outboundSidecar:
              metrics:
              - name: requests_total
                dimensions:
                  upstream_proxy_version: upstream_peer.istio_version
                  source_mesh_id: node.metadata['MESH_ID']
                tags_to_remove:
                - request_protocol
            gateway:
              metrics:
              - name: requests_total
                dimensions:
                  upstream_proxy_version: upstream_peer.istio_version
                  source_mesh_id: node.metadata['MESH_ID']
                tags_to_remove:
                - request_protocol
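
Applying this is just another istioctl install over the operator configuration; the file path below is assumed to match the book's sample layout:

istioctl install -f ch7/metrics/istio-operator-new-dimensions.yaml -y
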
  • Before these dimensions show up in the metrics, we have to make Istio's proxies aware of them.
  • Annotate the Deployment's pod spec with proxy.istio.io/config, listing the new dimensions under extraStatTags
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: webapp
  name: webapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      annotations:
        proxy.istio.io/config: |-
          extraStatTags:
          - "upstream_proxy_version"
          - "source_mesh_id"
      labels:
        app: webapp
    spec:
      containers:
      - env:
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: istioinaction/webapp:latest
        imagePullPolicy: IfNotPresent
        name: webapp
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        securityContext:
          privileged: false
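
With the annotation in place and some traffic sent, the new dimensions should show up on istio_requests_total. A minimal check against the sidecar's merged Prometheus stats:

k exec -it deploy/webapp -c istio-proxy -- curl -s localhost:15090/stats/prometheus | grep istio_requests_total | grep upstream_proxy_version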

7.4.2 Creating New Metrics

  • To create a new metric, define it in the stats plugin
  • Here we define a new metric named istio_get_calls
  • The istio_ prefix is attached automatically
  • The metric's value is a string containing a CEL (Common Expression Language) expression, which for the COUNTER type must return an integer
  • CEL expressions operate on attributes; in this case, the expression counts HTTP GET requests

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: demo
  values:
    telemetry:
      v2:
        prometheus:
          configOverride:
            inboundSidecar:
              definitions:
              - name: get_calls
                type: COUNTER
                value: "(request.method.startsWith('GET') ? 1 : 0)"
            outboundSidecar:
              definitions:
              - name: get_calls
                type: COUNTER
                value: "(request.method.startsWith('GET') ? 1 : 0)"
            gateway:
              definitions:
              - name: get_calls
                type: COUNTER
                value: "(request.method.startsWith('GET') ? 1 : 0)"

  • Just as new dimensions had to be made known to Istio's proxies explicitly, when creating a new metric you must tell Istio to expose it from the proxy.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: webapp
  name: webapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webapp

  template:
    metadata:
      annotations:
        proxy.istio.io/config: |-
          proxyStatsMatcher:
            inclusionPrefixes:
            - "istio_get_calls"
      labels:
        app: webapp
    spec:
      containers:
      - env:
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: istioinaction/webapp:latest
        imagePullPolicy: IfNotPresent
        name: webapp
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        securityContext:
          privileged: false
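
After redeploying webapp with this annotation and sending a few GET requests through the gateway, the new counter should appear. A quick check, reusing the earlier curl:

curl -s http://192.168.97.2:31733/api/catalog -H "Host: webapp.istioinaction.io" > /dev/null
k exec -it deploy/webapp -c istio-proxy -- curl -s localhost:15000/stats | grep istio_get_calls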

 

7.4.3 Grouping Calls with New Attributes

  • You can build new attributes on top of existing ones, either more fine-grained or specific to your domain.
  • For example, we can create a new attribute called istio_operationId
    • combining request.url_path and request.method to track the number of GET calls going to the catalog service's /items API
  • To do this, we use the attribute_gen proxy plugin
    • a WebAssembly extension
    • this plugin complements the stats plugin.
    • it runs before the stats plugin, so every attribute it produces can be used in stats
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: attribute-gen-example
  namespace: istioinaction
spec:
  configPatches:
  ## Sidecar Outbound
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: istio.stats
      proxy:
        proxyVersion: ^1\.13.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.attributegen
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "attributes": [
                      {
                        "output_attribute": "istio_operationId", <-속성이름
                        "match": [
                         {
                           "value": "getitems", <- 속성 값
                           "condition": "request.url_path == '/items' && request.method == 'GET'"
                         },
                         {
                           "value": "createitem",
                           "condition": "request.url_path == '/items' && request.method == 'POST'"
                         },
                         {
                           "value": "deleteitem",
                           "condition": "request.url_path == '/items' && request.method == 'DELETE'"
                         }
                       ]
                      }
                    ]
                  }
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.attributegen
                runtime: envoy.wasm.runtime.null

 

Then, to identify these API calls to catalog, we add a new dimension, upstream_operation, to the istio_requests_total metric that uses this attribute:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: demo
  values:
    telemetry:
      v2:
        prometheus:
          configOverride:
            outboundSidecar:
              metrics:
              - name: requests_total
                dimensions:
                  upstream_operation: istio_operationId
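
If everything is wired up, calls matching the conditions above are counted under the new dimension. A sketch of how to verify, assuming some GET traffic to the catalog /items path has been sent:

k exec -it deploy/webapp -c istio-proxy -- curl -s localhost:15090/stats/prometheus | grep 'upstream_operation="getitems"'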