Hello!
As I work through the book Istio in Action, I'm going to summarize its contents bit by bit.
Chapter 7 is "Observability: Understanding the Behavior of Your Services".
Lab environment
- macOS, containers run via OrbStack
- kind, Kubernetes 1.33.1
- istioctl 1.27.0
curl -L https://istio.io/downloadIstio | sh -
istioctl install --set profile=demo -y
WARNING: Istio is being upgraded from 1.13.0 to 1.27.0.
Running this command will overwrite it; use revisions to upgrade alongside the existing version.
Before upgrading, you may wish to use 'istioctl x precheck' to check for upgrade warnings.
✔ Istio core installed ⛵️
✔ Istiod installed 🧠
✔ Egress gateways installed 🛫
✔ Ingress gateways installed 🛬
✔ Installation complete
7.1 What is observability?
- Observability: the degree to which you can understand and reason about a system's internal state from its external signals and characteristics alone
- It takes application instrumentation, network instrumentation, signal-collection infrastructure, and databases — and, when something unexpected happens, the ability to sift through and combine that mass of data into a coherent overall picture.
7.1.1 Observability vs. monitoring
- Monitoring: the practice of collecting and aggregating metrics, logs, traces, etc., and comparing the system's state against predefined criteria
- When a metric crosses a threshold and the system is heading toward a bad state, action is taken to correct it
- Metrics are collected and aggregated to watch for, and alert on, states already known to be undesirable
- Observability, by contrast:
- Assumes the system is too unpredictable for every failure mode to be known in advance
- So it collects more data — including high-cardinality data — so you can explore it quickly and ask new questions
7.1.2 How does Istio help with observability?
- Istio's data-plane proxy, Envoy, sits in the network request path between services.
- This lets it capture important metrics about request handling and service interactions —
- for example, requests per second, time taken to process a request, and the number of failed requests.
- New metrics can also be added dynamically.
7.2 Exploring Istio's metrics
7.2.1 Data-plane metrics
Deploying the example services for chapter 7
k apply -f services/catalog/kubernetes/catalog.yaml
serviceaccount/catalog unchanged
service/catalog created
deployment.apps/catalog created
k apply -f services/webapp/kubernetes/webapp.yaml
serviceaccount/webapp unchanged
service/webapp created
deployment.apps/webapp created
k apply -f services/webapp/istio/webapp-catalog-gw-vs.yaml
gateway.networking.istio.io/coolstore-gateway created
virtualservice.networking.istio.io/webapp-virtualservice created
Verifying access to the service
curl http://192.168.97.2:31733/api/catalog -H "Host: webapp.istioinaction.io"
[{"id":1,"color":"amber","department":"Eyewear","name":"Elinor Glasses","price":"282.00"},{"id":2,"color":"cyan","department":"Clothing","name":"Atlas Shirt","price":"127.00"},{"id":3,"color":"teal","department":"Clothing","name":"Small Metal Shoes","price":"232.00"},{"id":4,"color":"red","department":"Watches","name":"Red Dragon Watch","price":"232.00"}]
- Looking at the istio_requests_total part, you can tell this is a metric for requests coming from the ingress gateway into the webapp service
- istio_requests_total
- istio_request_bytes
- istio_response_bytes
- istio_request_duration_milliseconds
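These standard metrics are exposed in Prometheus text format by the sidecar. As a sketch, here is how you might aggregate `istio_requests_total` by response code — the `sum_by_code` helper and the sample lines are my own illustration, not output from this cluster:

```shell
# In a live mesh you would scrape the sidecar directly, e.g.:
#   kubectl exec deploy/webapp -c istio-proxy -- curl -s localhost:15000/stats/prometheus
# sum_by_code: aggregate istio_requests_total samples by their response_code label
sum_by_code() {
  awk -F'[ }]' '/^istio_requests_total/ {
    # extract the digits inside response_code="..."
    match($0, /response_code="[0-9]+"/)
    code = substr($0, RSTART + 15, RLENGTH - 16)
    sum[code] += $NF          # last field is the counter value
  }
  END { for (c in sum) printf "code=%s total=%d\n", c, sum[c] }'
}

# Illustrative sample lines piped through the same filter:
printf '%s\n' \
  'istio_requests_total{destination_service="webapp",response_code="200"} 6' \
  'istio_requests_total{destination_service="webapp",response_code="200"} 4' \
  'istio_requests_total{destination_service="webapp",response_code="500"} 1' |
sum_by_code
```

In practice you would let Prometheus do this aggregation; the point is just that these metrics are plain labeled counters you can slice by label.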
Configuring the proxy to report more Envoy statistics
- When an application's call passes through its client-side proxy, the proxy makes a routing decision and routes it to an upstream cluster.
- Upstream cluster: the service that actually gets called, with its related configuration (load balancing, security, circuit-breaker settings, etc.) applied
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: control-plane
spec:
  profile: demo
  meshConfig:
    defaultConfig:            # default proxy config for all services
      proxyStatsMatcher:      # customize which metrics are reported
        inclusionPrefixes:    # metrics matching these prefixes, on top of the defaults
        - "cluster.outbound|80||catalog.istioinaction"
- Increasing the metrics collected mesh-wide can overload the metrics collection system.
- A better approach is to specify the metrics to include per workload, via an annotation:
metadata:
  annotations:
    proxy.istio.io/config: |-   # proxy config for the webapp replicas
      proxyStatsMatcher:
        inclusionPrefixes:
        - "cluster.outbound|80||catalog.istioinaction"
k apply -f ch7/webapp-deployment-stats-inclusion.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: webapp
  name: webapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      annotations:
        proxy.istio.io/config: |-
          proxyStatsMatcher:
            inclusionPrefixes:
            - "cluster.outbound|80||catalog.istioinaction"
      labels:
        app: webapp
    spec:
      containers:
      - env:
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: CATALOG_SERVICE_HOST
          value: catalog.istioinaction
        - name: CATALOG_SERVICE_PORT
          value: "80"
        - name: FORUM_SERVICE_HOST
          value: forum.istioinaction
        - name: FORUM_SERVICE_PORT
          value: "80"
        image: istioinaction/webapp:latest
        imagePullPolicy: IfNotPresent
        name: webapp
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        securityContext:
          privileged: false
curl http://192.168.97.2:31733/api/catalog -H "Host: webapp.istioinaction.io"
[{"id":1,"color":"amber","department":"Eyewear","name":"Elinor Glasses","price":"282.00"},{"id":2,"color":"cyan","department":"Clothing","name":"Atlas Shirt","price":"127.00"},{"id":3,"color":"teal","department":"Clothing","name":"Small Metal Shoes","price":"232.00"},{"id":4,"color":"red","department":"Watches","name":"Red Dragon Watch","price":"232.00"}]
Then scrape the Istio stats:
k exec -it deploy/webapp -c istio-proxy -- curl localhost:15000/stats | grep catalog
These metrics show whether circuit breakers are being applied to connections or requests headed for the upstream cluster:
cluster.outbound|80||catalog.circuit_breakers.default.cx_open: 0
cluster.outbound|80||catalog.circuit_breakers.default.cx_pool_open: 0
cluster.outbound|80||catalog.circuit_breakers.default.rq_open: 0
cluster.outbound|80||catalog.circuit_breakers.default.rq_pending_open: 0
cluster.outbound|80||catalog.circuit_breakers.default.rq_retry_open: 0
- Envoy distinguishes traffic by whether its origin is internal or external.
- Internal: originated inside the mesh
- External: originated outside the mesh (entered through the ingress gateway)
- cluster_name.internal.* : successful requests that originated inside the mesh
- cluster_name.ssl.* : whether traffic travels to the upstream cluster over TLS
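A gauge value of 1 on any of the circuit-breaker stats above means that breaker is currently open. As a sketch, a tiny filter that flags tripped breakers — the `cb_tripped` helper and the sample lines are illustrative; in the cluster you would pipe from the `curl localhost:15000/stats` command shown earlier:

```shell
# cb_tripped: print circuit-breaker gauges whose current value is non-zero
cb_tripped() {
  awk -F': ' '/circuit_breaker/ && $2 != 0 { print $1 }'
}

# Illustrative sample stats (here rq_pending_open has tripped):
printf '%s\n' \
  'cluster.outbound|80||catalog.circuit_breakers.default.cx_open: 0' \
  'cluster.outbound|80||catalog.circuit_breakers.default.rq_pending_open: 1' |
cb_tripped
```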
7.2.2 Control-plane metrics
- Querying the control plane's metrics:
k exec -it -n istio-system deploy/istiod -- curl localhost:15014/metrics
# HELP citadel_server_csr_count The number of CSRs received by Citadel server.
# TYPE citadel_server_csr_count counter
citadel_server_csr_count 2
# HELP citadel_server_root_cert_expiry_seconds The time remaining, in seconds, before the root cert will expire. A negative value indicates the cert is expired.
# TYPE citadel_server_root_cert_expiry_seconds gauge
citadel_server_root_cert_expiry_seconds 3.12768891974063e+08
... (hundreds of lines omitted; the interesting metrics are excerpted below)
Among these, you can see when the root certificate used to sign workload identity CSRs expires, as well as how many CSRs the control plane has received and how many certificates it has issued:
# HELP citadel_server_root_cert_expiry_seconds The time remaining, in seconds, before the root cert will expire. A negative value indicates the cert is expired.
# TYPE citadel_server_root_cert_expiry_seconds gauge
citadel_server_root_cert_expiry_seconds 3.12768891974063e+08
# HELP citadel_server_root_cert_expiry_timestamp The unix timestamp, in seconds, when the root cert will expire.
# TYPE citadel_server_root_cert_expiry_timestamp gauge
citadel_server_root_cert_expiry_timestamp 2.070837463e+09
# HELP citadel_server_success_cert_issuance_count The number of certificates issuances that have succeeded.
# TYPE citadel_server_success_cert_issuance_count counter
citadel_server_success_cert_issuance_count 2
Istio version
# HELP istio_build Istio component build info
# TYPE istio_build gauge
istio_build{component="pilot",tag="1.27.0"} 1
Distribution of the time it takes to push configuration to the data plane proxies and get them in sync
# HELP pilot_proxy_convergence_time Delay in seconds between config change and a proxy receiving all required configuration.
# TYPE pilot_proxy_convergence_time histogram
pilot_proxy_convergence_time_bucket{le="0.1"} 7
pilot_proxy_convergence_time_bucket{le="0.5"} 7
pilot_proxy_convergence_time_bucket{le="1"} 7
pilot_proxy_convergence_time_bucket{le="3"} 7
pilot_proxy_convergence_time_bucket{le="5"} 7
pilot_proxy_convergence_time_bucket{le="10"} 7
pilot_proxy_convergence_time_bucket{le="20"} 7
pilot_proxy_convergence_time_bucket{le="30"} 7
pilot_proxy_convergence_time_bucket{le="+Inf"} 7
pilot_proxy_convergence_time_sum 0.00831075
pilot_proxy_convergence_time_count 7
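Prometheus histograms expose cumulative `_bucket` series plus a `_sum` and `_count`, so the mean observation is simply `_sum / _count`. A minimal sketch using the convergence-time values from the output above:

```python
# Average proxy convergence time from a Prometheus histogram:
# _sum / _count gives the mean of all observations.
convergence_sum = 0.00831075   # pilot_proxy_convergence_time_sum
convergence_count = 7          # pilot_proxy_convergence_time_count

avg_seconds = convergence_sum / convergence_count
print(f"average convergence time: {avg_seconds * 1000:.3f} ms")  # ~1.187 ms
```

Sub-millisecond convergence is expected here: the mesh is tiny, with only two connected proxies.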
Number of services known to the control plane, number of user-configured VirtualService resources, and number of connected proxies
# HELP pilot_services Total services known to pilot.
# TYPE pilot_services gauge
pilot_services 9
# HELP pilot_virt_services Total virtual services known to pilot.
# TYPE pilot_virt_services gauge
pilot_virt_services 2
# HELP pilot_vservice_dup_domain Virtual services with dup domains.
# TYPE pilot_vservice_dup_domain gauge
pilot_vservice_dup_domain 0
# HELP pilot_xds Number of endpoints connected to this pilot using XDS.
# TYPE pilot_xds gauge
pilot_xds{version="1.27.0"} 2
Number of updates pushed for each xDS API
# HELP pilot_xds_pushes Pilot build and send errors for lds, rds, cds and eds.
# TYPE pilot_xds_pushes counter
pilot_xds_pushes{type="cds"} 4
pilot_xds_pushes{type="eds"} 9
pilot_xds_pushes{type="lds"} 4
pilot_xds_pushes{type="rds"} 2
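Once these metrics are being scraped into Prometheus (set up in the next section), push activity can be explored with queries along these lines — the metric names are the Istio defaults shown above:

```promql
# Rate of xDS pushes per type over the last 5 minutes
sum by (type) (rate(pilot_xds_pushes[5m]))

# 99th-percentile proxy convergence time
histogram_quantile(0.99, sum by (le) (rate(pilot_proxy_convergence_time_bucket[1m])))
```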
7.3 Scraping Istio metrics with Prometheus
7.3.1 Setting up Prometheus and Grafana
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
"prometheus-community" has been added to your repositories
helm repo update
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "volcano-sh" chart repository
...Successfully got an update from the "traefik" chart repository
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "milvus" chart repository
...Successfully got an update from the "vector" chart repository
...Successfully got an update from the "localstack-repo" chart repository
...Successfully got an update from the "localstack" chart repository
...Successfully got an update from the "flagger" chart repository
Update Complete. ⎈Happy Helming!⎈
k create ns prometheus
namespace/prometheus created
helm install prom prometheus-community/kube-prometheus-stack --version 13.13.1 -n prometheus -f ch7/prom-values.yaml
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/juyoungjung/.kube/config
k get po -n prometheus
NAME READY STATUS RESTARTS AGE
prom-grafana-57bcb4cc59-rvgsd 2/2 Running 0 68s
prom-kube-prometheus-stack-admission-patch-tbbc4 0/1 CrashLoopBackOff 3 (26s ago) 66s
prom-kube-prometheus-stack-operator-69888c5fb6-bv2pj 1/1 Running 0 68s
prometheus-prom-kube-prometheus-stack-prometheus-0 2/2 Running 1 (9s ago) 39s
7.3.2 Configuring the Prometheus Operator to scrape the Istio control plane and workloads
- The Prometheus Operator custom resources ServiceMonitor and PodMonitor
- configure Prometheus to collect metrics from Istio
- For example, the following ServiceMonitor scrapes the Istio control plane components
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istio-component-monitor
  namespace: prometheus
  labels:
    monitoring: istio-components
    release: prom
spec:
  jobLabel: istio
  targetLabels: [app]
  selector:
    matchExpressions:
    - {key: istio, operator: In, values: [pilot]}
  namespaceSelector:
    any: true
  endpoints:
  - port: http-monitoring
    interval: 15s
k apply -f ch7/service-monitor-cp.yaml
servicemonitor.monitoring.coreos.com/istio-component-monitor created
- Telemetry about the control plane is now visible in Prometheus
- number of sidecars connected to the control plane
- configuration conflicts
- churn within the mesh
- basic memory/CPU usage of the control plane
k -n prometheus port-forward statefulset/prometheus-prom-kube-prometheus-stack-prometheus 9090
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090
- Enable data plane metrics collection
k apply -f ch7/pod-monitor-dp.yaml
podmonitor.monitoring.coreos.com/envoy-stats-monitor created
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: envoy-stats-monitor
  namespace: prometheus
  labels:
    monitoring: istio-proxies
    release: prom
spec:
  selector:
    matchExpressions:
    - {key: istio-prometheus-ignore, operator: DoesNotExist}
  namespaceSelector:
    any: true
  jobLabel: envoy-stats
  podMetricsEndpoints:
  - path: /stats/prometheus
    interval: 15s
    relabelings:
    - action: keep
      sourceLabels: [__meta_kubernetes_pod_container_name]
      regex: "istio-proxy"
    - action: keep
      sourceLabels: [__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape]
    - sourceLabels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      targetLabel: __address__
    - action: labeldrop
      regex: "__meta_kubernetes_pod_label_(.+)"
    - sourceLabels: [__meta_kubernetes_namespace]
      action: replace
      targetLabel: namespace
    - sourceLabels: [__meta_kubernetes_pod_name]
      action: replace
      targetLabel: pod_name
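The `__address__` rewrite above joins the pod address with the port from the `prometheus.io/port` annotation (the source labels are concatenated with `;` before the regex is applied). Its effect can be checked with a small Python sketch of the same regex — the sample address and port are hypothetical:

```python
import re

# Same regex as the PodMonitor relabeling; Prometheus joins the
# source labels with ';' before matching.
pattern = re.compile(r"([^:]+)(?::\d+)?;(\d+)")

# __address__ ; __meta_kubernetes_pod_annotation_prometheus_io_port
joined = "10.244.1.7:443;15020"

rewritten = pattern.sub(r"\1:\2", joined)
print(rewritten)  # scrape target becomes 10.244.1.7:15020
```

The existing port (if any) is discarded by the non-capturing group, so the scrape always goes to the annotated metrics port.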
7.4 Customizing Istio's standard metrics
- istio_requests_total : COUNTER, incremented for each request
- istio_request_duration_milliseconds : DISTRIBUTION, distribution of request durations
- istio_request_bytes : DISTRIBUTION, distribution of request body sizes
- istio_response_bytes : DISTRIBUTION, distribution of response body sizes
- istio_request_messages_total : (gRPC) COUNTER, incremented for every message sent by the client
- istio_response_messages_total : (gRPC) COUNTER, incremented for every message sent by the server
- Three key concepts
  - metric
    - a counter, gauge, or histogram/distribution of telemetry between service calls (inbound and outbound)
    - the istio_requests_total metric counts the total number of requests going into a service (inbound) or coming out of it (outbound)
    - if a service has both inbound and outbound requests, istio_requests_total shows two entries
  - dimension
    - e.g. inbound/outbound
    - statistics are reported separately for each combination of metric and dimensions
    - a metric can have multiple dimensions
    - default dimensions of istio_requests_total:
      - response_code="200" : request details
      - reporter="destination" : from whose point of view the metric is reported
      - source_app="istio-ingressgateway" : the caller
      - destination_app="webapp" : the callee
    - if even one dimension differs, it shows up as a new entry of the metric
      - a response code of 500 is reported on a separate line
      - with differing dimensions you see two distinct entries for istio_requests_total
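This "new entry per distinct dimension combination" behavior can be sketched with a tiny counter keyed by its label values; the label names follow the Istio defaults, while the recorded traffic is made up for illustration:

```python
from collections import Counter

# istio_requests_total-style counter: each unique combination of
# dimension values becomes its own entry (its own time series).
requests_total = Counter()

def record(reporter, source_app, destination_app, response_code):
    labels = (reporter, source_app, destination_app, response_code)
    requests_total[labels] += 1

record("destination", "istio-ingressgateway", "webapp", "200")
record("destination", "istio-ingressgateway", "webapp", "200")
record("destination", "istio-ingressgateway", "webapp", "500")  # one dimension differs

print(len(requests_total))  # 2 distinct entries: the 200s and the 500
```

This is also why high-cardinality dimensions (e.g. user IDs) are dangerous: every distinct value spawns another series.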
  - attribute
    - the value of a given dimension comes from an attribute
    - attributes are the values the Envoy proxy holds at runtime, e.g.:
      - request.path : the path portion of the URL
      - request.url_path : the path portion of the URL, without the query string
      - request.host : the host portion
      - request.scheme : the scheme portion (e.g. http)
      - request.method : the request method
      - request.headers : all request headers; header names are lowercased
    - there are also response attributes
    - connection attributes
    - upstream attributes
    - metadata/filter-state attributes
    - and WebAssembly attributes
7.4.1 Configuring existing metrics
- Istio metrics are configured in the stats proxy plugin using the EnvoyFilter resource.
k get envoyfilter -n istio-system
NAME AGE
stats-filter-1.11 30d
stats-filter-1.12 30d
stats-filter-1.13 30d
tcp-stats-filter-1.11 30d
tcp-stats-filter-1.12 30d
tcp-stats-filter-1.13 30d
The EnvoyFilter below configures a filter named istio.stats directly.
This filter is a WebAssembly plugin that implements the stats functionality.
In practice, this Wasm filter is compiled directly into the Envoy codebase and executed in a NULL virtual machine,
so it does not run in an actual WebAssembly VM.
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  creationTimestamp: "2025-08-18T00:37:27Z"
  generation: 1
  labels:
    install.operator.istio.io/owning-resource-namespace: istio-system
    istio.io/rev: default
    operator.istio.io/component: Pilot
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.13.0
  name: stats-filter-1.13
  namespace: istio-system
  resourceVersion: "869"
  uid: a8e8b9ff-3bc4-4ced-8857-e19d400bc151
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
      proxy:
        proxyVersion: ^1\.13.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.stats
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "debug": "false",
                    "stat_prefix": "istio"
                  }
              root_id: stats_outbound
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.stats
                runtime: envoy.wasm.runtime.null
                vm_id: stats_outbound
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
      proxy:
        proxyVersion: ^1\.13.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.stats # <--------- filter name
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config: # <--------------- filter config
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "debug": "false",
                    "stat_prefix": "istio",
                    "disable_host_header_fallback": true,
                    "metrics": [
                      {
                        "dimensions": {
                          "destination_cluster": "node.metadata['CLUSTER_ID']",
                          "source_cluster": "downstream_peer.cluster_id"
                        }
                      }
                    ]
                  }
              root_id: stats_inbound
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.stats
                runtime: envoy.wasm.runtime.null
                vm_id: stats_inbound
  - applyTo: HTTP_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
      proxy:
        proxyVersion: ^1\.13.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.stats
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "debug": "false",
                    "stat_prefix": "istio",
                    "disable_host_header_fallback": true
                  }
              root_id: stats_outbound
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.stats
                runtime: envoy.wasm.runtime.null
                vm_id: stats_outbound
Adding dimensions to an existing metric
- What if we want to add two dimensions to the istio_requests_total metric?
- For example, to see which proxy version is handling upstream calls, broken down by meshID
- The configuration below targets the requests_total metric and adds two new dimensions whose values come from attributes
- It also removes the request_protocol dimension
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: demo
  values:
    telemetry:
      v2:
        prometheus:
          configOverride:
            inboundSidecar:
              metrics:
              - name: requests_total
                dimensions: # <----------- new dimensions added
                  upstream_proxy_version: upstream_peer.istio_version
                  source_mesh_id: node.metadata['MESH_ID']
                tags_to_remove: # <------- tags to remove
                - request_protocol
            outboundSidecar:
              metrics:
              - name: requests_total
                dimensions:
                  upstream_proxy_version: upstream_peer.istio_version
                  source_mesh_id: node.metadata['MESH_ID']
                tags_to_remove:
                - request_protocol
            gateway:
              metrics:
              - name: requests_total
                dimensions:
                  upstream_proxy_version: upstream_peer.istio_version
                  source_mesh_id: node.metadata['MESH_ID']
                tags_to_remove:
                - request_protocol
- Before these dimensions show up in the metric, we also have to make the Istio proxies aware of them.
- Annotate the Deployment's pod template with `proxy.istio.io/config`, listing the new dimensions under `extraStatTags` (older Istio versions used a separate `sidecar.istio.io/extraStatTags` annotation)
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: webapp
  name: webapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      annotations:
        proxy.istio.io/config: |-
          extraStatTags:
          - "upstream_proxy_version"
          - "source_mesh_id"
      labels:
        app: webapp
    spec:
      containers:
      - env:
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: istioinaction/webapp:latest
        imagePullPolicy: IfNotPresent
        name: webapp
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        securityContext:
          privileged: false
7.4.2 Creating new metrics
- To create a new metric, define it in the stats plugin
- Here we define a new metric named istio_get_calls
- the istio_ prefix is added automatically
- The metric's value is a string containing a CEL (Common Expression Language) expression; for a COUNTER type it must return an integer
- CEL expressions operate on attributes; the one below counts HTTP GET requests
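The counting logic of that CEL expression, `(request.method.startsWith('GET') ? 1 : 0)`, contributes 1 to the counter for GET requests and 0 otherwise. A hypothetical Python sketch of the same logic (this is not how Envoy evaluates CEL, just an illustration):

```python
def get_calls_increment(method: str) -> int:
    # Mirrors the CEL expression: (request.method.startsWith('GET') ? 1 : 0)
    return 1 if method.startswith("GET") else 0

print(get_calls_increment("GET"))   # 1
print(get_calls_increment("POST"))  # 0
```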
```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: demo
  values:
    telemetry:
      v2:
        prometheus:
          configOverride:
            inboundSidecar:
              definitions:
              - name: get_calls
                type: COUNTER
                value: "(request.method.startsWith('GET') ? 1 : 0)"
            outboundSidecar:
              definitions:
              - name: get_calls
                type: COUNTER
                value: "(request.method.startsWith('GET') ? 1 : 0)"
            gateway:
              definitions:
              - name: get_calls
                type: COUNTER
                value: "(request.method.startsWith('GET') ? 1 : 0)"
```
- Just as new dimensions must be made known to the Istio proxy explicitly, when creating a new metric you must tell Istio to expose it from the proxy.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: webapp
  name: webapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      annotations:
        proxy.istio.io/config: |-
          proxyStatsMatcher:
            inclusionPrefixes:
            - "istio_get_calls"
      labels:
        app: webapp
    spec:
      containers:
      - env:
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: istioinaction/webapp:latest
        imagePullPolicy: IfNotPresent
        name: webapp
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        securityContext:
          privileged: false
7.4.3 Grouping calls with new attributes
- New attributes can be created from existing ones, either more fine-grained or domain-specific.
- For example, we can create a new attribute named istio_operationId
- combining request.url_path and request.method to track the number of GET calls to the catalog service's /items API
- This uses the attribute_gen proxy plugin
  - a WebAssembly extension
  - it complements the stats plugin
  - it is applied before the stats plugin, so all of its attributes are available to stats
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: attribute-gen-example
  namespace: istioinaction
spec:
  configPatches:
  ## Sidecar Outbound
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: istio.stats
      proxy:
        proxyVersion: ^1\.13.*
    patch:
      operation: INSERT_BEFORE
      value:
        name: istio.attributegen
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              configuration:
                '@type': type.googleapis.com/google.protobuf.StringValue
                value: |
                  {
                    "attributes": [
                      {
                        "output_attribute": "istio_operationId",   <- attribute name
                        "match": [
                          {
                            "value": "getitems",                   <- attribute value
                            "condition": "request.url_path == '/items' && request.method == 'GET'"
                          },
                          {
                            "value": "createitem",
                            "condition": "request.url_path == '/items' && request.method == 'POST'"
                          },
                          {
                            "value": "deleteitem",
                            "condition": "request.url_path == '/items' && request.method == 'DELETE'"
                          }
                        ]
                      }
                    ]
                  }
              vm_config:
                code:
                  local:
                    inline_string: envoy.wasm.attributegen
                runtime: envoy.wasm.runtime.null
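The matching rules in this attribute_gen configuration boil down to a (path, method) → operation lookup; a Python sketch of the same conditions:

```python
def istio_operation_id(url_path, method):
    # Mirrors the attribute_gen match conditions above.
    if url_path == "/items":
        if method == "GET":
            return "getitems"
        if method == "POST":
            return "createitem"
        if method == "DELETE":
            return "deleteitem"
    return None  # the attribute is simply not set when nothing matches

print(istio_operation_id("/items", "GET"))  # getitems
```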
Then, to identify these API calls to the catalog service, add a new dimension, upstream_operation, to the istio_requests_total metric that uses this attribute:
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: demo
  values:
    telemetry:
      v2:
        prometheus:
          configOverride:
            outboundSidecar:
              metrics:
              - name: requests_total
                dimensions:
                  upstream_operation: istio_operationId