應用程序可觀測性實戰 - Logging¶
本教程主要用來展示應用程式如何實作可觀測性三大支柱之一的 Logs 并且與平台系統元件進行整合。
步驟 01 - 環境安裝¶
環境的搭建主要要使用到下列幾個關鍵元件:
- Kubernetes
- Banzai Cloud - Logging operator
- fluentbit
- fluentd
- Grafana Loki
- Prometheus Operator
- grafana
Kubernetes¶
本教程使用 K3D 來構建實驗 K8S 集群, 詳細說明請參考:
執行下列命令來創建實驗 Kubernetes 集群:
確認 Kubernetes 及 Kubectl 是否成功安裝:
(輸出結果)Kubernetes control plane is running at https://0.0.0.0:6443
CoreDNS is running at https://0.0.0.0:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://0.0.0.0:6443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
Grafana Loki¶
本教程使用 grafana/loki-stack
chart 來構建 Grafana Loki, 詳細說明請參考:
使用以下命令添加 Loki 的 chart 存儲庫:
將 Loki 安裝到 logging 命名空間中:
helm upgrade --install --create-namespace --namespace logging loki grafana/loki-stack \
--set promtail.enabled=false
安裝之後檢查:
結果:
$ kubectl get svc -n logging
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
loki ClusterIP 10.43.32.223 <none> 3100/TCP 2m47s
Tip
loki 的服務可通過 http://loki.logging:3100
來連接
logging-operator¶
本教程使用 banzaicloud-stable/logging-operator
chart 來構建 Logging operator, 詳細說明請參考:
添加 banzaicloud-stable
helm 存儲庫並更新本地緩存:
將 Logging operator 安裝到 logging
命名空間中:
helm upgrade --install --wait --create-namespace --namespace logging logging-operator banzaicloud-stable/logging-operator
kube-prometheus-stack¶
本教程使用 kube-prometheus-stack
來構建可觀測性的相關元件, 詳細說明請參考:
添加 Prometheus-Community helm 存儲庫並更新本地緩存:
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
創建要配置的 vlaues 檔案:
grafana:
# change timezone setting base on browser
defaultDashboardsTimezone: browser
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
url: http://kube-stack-prometheus-kube-prometheus.monitoring:9090
access: proxy
isDefault: true
- name: Loki
type: loki
url: http://loki.logging:3100
access: proxy
isDefault: false
grafana.ini:
users:
viewers_can_edit: true
auth:
disable_login_form: false
disable_signout_menu: false
auth.anonymous:
enabled: true
org_role: Viewer
sidecar:
datasources:
logLevel: "DEBUG"
enabled: true
searchNamespace: "ALL"
dashboards:
logLevel: "DEBUG"
# enable the cluster wide search for dashbaords and adds/updates/deletes them in grafana
enabled: true
searchNamespace: "ALL"
label: grafana_dashboard
labelValue: "1"
prometheus:
prometheusSpec:
# enable the cluster wide search for ServiceMonitor CRD
serviceMonitorSelectorNilUsesHelmValues: false
# enable the cluster wide search for PodMonitor CRD
podMonitorSelectorNilUsesHelmValues: false
# enable the cluster wide search for PrometheusRule CRD
ruleSelectorNilUsesHelmValues: false
probeSelectorNilUsesHelmValues: false
Info
在設定 kube-stack-prometheus
的設定中定義了兩個元件 (Prometheus 與 Loki) 在 grafana 中的 Data Source 的設定:
grafana:
# change timezone setting base on browser
defaultDashboardsTimezone: browser
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
url: http://kube-stack-prometheus-kube-prometheus.monitoring:9090
access: proxy
isDefault: true
- name: Loki
type: loki
url: http://loki.logging:3100
access: proxy
isDefault: false
..
..
..
使用 Helm 在命名空間監控中部署 kube-stack-prometheus
chart:
helm upgrade --install --wait --create-namespace --namespace monitoring \
kube-stack-prometheus prometheus-community/kube-prometheus-stack \
--values kube-stack-prometheus-values.yaml
該 Helm chart 安裝了 Prometheus 組件和 Operator、Grafana 以及以下 exporters:
- prometheus-node-exporter 暴露底層硬件和操作系統的相關指標
- kube-state-metrics 監聽 Kubernetes API 服務器並生成有關對象狀態的指標
有關 kube-stack-prometheus
的詳細說明:
連接到 Grafana¶
Grafana Web UI 可通過以下命令通過端口轉發訪問:
kubectl port-forward --namespace monitoring svc/kube-stack-prometheus-grafana 3000:80 --address="0.0.0.0"
打開瀏覽器並轉到 http://localhost:3000 並填寫前一個命令所取得的用戶名/密碼。預設的帳號是:
username
: adminpassword
: prom-operator
步驟 02 - 應用程序 Logging 實戰¶
在雲原生應用開發 The Twelve-Factor App 方法論中的第11個原則說的是:
Quote
- Logs
把日志當作事件流
- 應用程式不應該煩惱 logs 存放位置,統一使用 stdout 直接輸出
- 這些事件流可以輸出至文件,或是在終端實現觀察的
- 輸出流可以發送到 Loki/ELK 諸如此類的日誌索引及分析系統
對很多應用開發工程師不容易理解為什麼把 log 日誌打印在 STDOUT 上是符合現代雲原生的一個作法,其實這個原則在實踐上是要搭配應用運行的平台來實施。
本教程是以 Kubernetes 來作為應用佈署運行的平台的角度來說明平台如何簡化了應用在 Observability 上的提升。
有關 Kubernetes 日誌架構 的詳細說明:
應用程序可觀性針對 Logging 的這個支柱來強化時的整合的手法可簡化成三個步驟:
- 應用程序日誌結構化 (Structured Logging)
- 設定應用程序日誌集中化收集轉換與轉拋的規則
- 應用程序日誌探索與查找
1. 應用程序日誌結構化¶
結構化日誌記錄是為應用程序日誌提供標準化的、預先確定的消息格式的過程,使它們能夠作為數據集而不是文本來處理。結構化日誌記錄的目標是獲取現在作為文本字符串提供的應用程序日誌,並將其轉化為易於搜索和分析的關係數據集合。
結構化日誌記錄組織記錄的數據,使其易於搜索、過濾和處理以進行更複雜的分析。JSON 是結構化日誌記錄的首選格式,但也可以接受其他格式例如在 CNCF 專案中常見的 logmft。最佳實踐是使用可以與日誌管理解決方案交互並支持自定義字段的日誌記錄框架來提供結構化日誌記錄。
參考下面教程來了解日誌結構化的定義與詳細說明:
儀器化應用程序的手法主要可區分成兩種類型:
本教程主要說明如何在不同程式語言裡使用第三方套件(侵入式的強化)來進行應用程序日誌結構化 (Structured Logging)。
參考下面兩個教程來來進行 Dotnet 應用程序日誌結構化:
佈署應用:
kubectl apply -f -<<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: dotnet-todoapi
labels:
app: dotnet-todoapi
spec:
replicas: 1
selector:
matchLabels:
app: dotnet-todoapi
template:
metadata:
labels:
app: dotnet-todoapi
spec:
containers:
- name: dotnet-todoapi
image: witlab/dotnet-todoapi:logging
ports:
- name: web
containerPort: 8080
EOF
創建 Service
:
kubectl apply -f -<<EOF
apiVersion: v1
kind: Service
metadata:
name: dotnet-todoapi
labels:
app: dotnet-todoapi
spec:
selector:
app: dotnet-todoapi
ports:
- name: web
port: 8080
targetPort: 8080
EOF
完成上述的教程後, 應用程式會啟動在 port:8080
並且將己經結構化的日誌打印在 stdout 上。執行下列指令來驗證:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
dotnet-todoapi-6ff77d597-sqsxf 1/1 Running 0 52s
$ kubectl logs dotnet-todoapi-6ff77d597-sqsxf
{"Timestamp":"2022-12-03T09:19:58.3051046+00:00","Level":"Information","Message":"Starting web host","Application":"dotnet-todoapi"}
{"Timestamp":"2022-12-03T09:19:58.4785717+00:00","Level":"Warning","Message":"Overriding address(es) 'http://+:80'. Binding to endpoints defined via IConfiguration and/or UseKestrel() instead.","addresses":"http://+:80","SourceContext":"Microsoft.AspNetCore.Server.Kestrel","Application":"dotnet-todoapi"}
{"Timestamp":"2022-12-03T09:19:58.4903858+00:00","Level":"Information","Message":"Now listening on: http://[::]:8080","address":"http://[::]:8080","EventId":{"Id":14,"Name":"ListeningOnAddress"},"SourceContext":"Microsoft.Hosting.Lifetime","Application":"dotnet-todoapi"}
{"Timestamp":"2022-12-03T09:19:58.4905616+00:00","Level":"Information","Message":"Application started. Press Ctrl+C to shut down.","SourceContext":"Microsoft.Hosting.Lifetime","Application":"dotnet-todoapi"}
{"Timestamp":"2022-12-03T09:19:58.4906005+00:00","Level":"Information","Message":"Hosting environment: Production","envName":"Production","SourceContext":"Microsoft.Hosting.Lifetime","Application":"dotnet-todoapi"}
{"Timestamp":"2022-12-03T09:19:58.4906368+00:00","Level":"Information","Message":"Content root path: /app/","contentRoot":"/app/","SourceContext":"Microsoft.Hosting.Lifetime","Application":"dotnet-todoapi"}
參考下面兩個教程來來進行 Java 應用程序日誌結構化:
佈署應用:
kubectl apply -f -<<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: java-todoapi
labels:
app: java-todoapi
spec:
replicas: 1
selector:
matchLabels:
app: java-todoapi
template:
metadata:
labels:
app: java-todoapi
spec:
containers:
- name: java-todoapi
image: witlab/java-todoapi:logging
ports:
- name: web
containerPort: 8080
EOF
創建 Service
:
kubectl apply -f -<<EOF
apiVersion: v1
kind: Service
metadata:
name: java-todoapi
labels:
app: java-todoapi
spec:
selector:
app: java-todoapi
ports:
- name: web
port: 8080
targetPort: 8080
EOF
完成上述的教程後, 應用程式會啟動在 port:8080
並且將己經結構化的日誌打印在 stdout 上。執行下列指令來驗證:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
java-todoapi-79f8b546f9-x4wcx 1/1 Running 0 18s
$ kubectl logs java-todoapi-79f8b546f9-x4wcx
{"@timestamp":"2022-12-03T11:29:28.727Z","@version":"1","message":"Starting TodoapiApplication v0.0.1-SNAPSHOT using Java 11.0.16.1 on java-todoapi-79f8b546f9-x4wcx with PID 1 (/opt/app.jar started by root in /opt)","logger_name":"com.wistron.todoapi.TodoapiApplication","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:28.736Z","@version":"1","message":"No active profile set, falling back to 1 default profile: \"default\"","logger_name":"com.wistron.todoapi.TodoapiApplication","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:29.283Z","@version":"1","message":"Bootstrapping Spring Data JPA repositories in DEFAULT mode.","logger_name":"org.springframework.data.repository.config.RepositoryConfigurationDelegate","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:29.319Z","@version":"1","message":"Finished Spring Data repository scanning in 28 ms. Found 1 JPA repository interfaces.","logger_name":"org.springframework.data.repository.config.RepositoryConfigurationDelegate","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:29.838Z","@version":"1","message":"Tomcat initialized with port(s): 8080 (http)","logger_name":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:29.846Z","@version":"1","message":"Initializing ProtocolHandler [\"http-nio-8080\"]","logger_name":"org.apache.coyote.http11.Http11NioProtocol","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:29.847Z","@version":"1","message":"Starting service [Tomcat]","logger_name":"org.apache.catalina.core.StandardService","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:29.847Z","@version":"1","message":"Starting Servlet engine: [Apache Tomcat/9.0.68]","logger_name":"org.apache.catalina.core.StandardEngine","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:29.906Z","@version":"1","message":"Initializing Spring embedded WebApplicationContext","logger_name":"org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/]","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:29.906Z","@version":"1","message":"Root WebApplicationContext: initialization completed in 1125 ms","logger_name":"org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:30.038Z","@version":"1","message":"HikariPool-1 - Starting...","logger_name":"com.zaxxer.hikari.HikariDataSource","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:30.233Z","@version":"1","message":"HikariPool-1 - Start completed.","logger_name":"com.zaxxer.hikari.HikariDataSource","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:30.274Z","@version":"1","message":"HHH000204: Processing PersistenceUnitInfo [name: default]","logger_name":"org.hibernate.jpa.internal.util.LogHelper","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:30.316Z","@version":"1","message":"HHH000412: Hibernate ORM core version 5.6.12.Final","logger_name":"org.hibernate.Version","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:30.458Z","@version":"1","message":"HCANN000001: Hibernate Commons Annotations {5.1.2.Final}","logger_name":"org.hibernate.annotations.common.Version","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:30.571Z","@version":"1","message":"HHH000400: Using dialect: org.hibernate.dialect.H2Dialect","logger_name":"org.hibernate.dialect.Dialect","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:31.019Z","@version":"1","message":"HHH000490: Using JtaPlatform implementation: [org.hibernate.engine.transaction.jta.platform.internal.NoJtaPlatform]","logger_name":"org.hibernate.engine.transaction.jta.platform.internal.JtaPlatformInitiator","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:31.026Z","@version":"1","message":"Initialized JPA EntityManagerFactory for persistence unit 'default'","logger_name":"org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:31.47Z","@version":"1","message":"spring.jpa.open-in-view is enabled by default. Therefore, database queries may be performed during view rendering. Explicitly configure spring.jpa.open-in-view to disable this warning","logger_name":"org.springframework.boot.autoconfigure.orm.jpa.JpaBaseConfiguration$JpaWebConfiguration","thread_name":"main","level":"WARN","level_value":30000}
{"@timestamp":"2022-12-03T11:29:32.13Z","@version":"1","message":"Starting ProtocolHandler [\"http-nio-8080\"]","logger_name":"org.apache.coyote.http11.Http11NioProtocol","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:32.145Z","@version":"1","message":"Tomcat started on port(s): 8080 (http) with context path ''","logger_name":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2022-12-03T11:29:32.153Z","@version":"1","message":"Started TodoapiApplication in 4.069 seconds (JVM running for 4.495)","logger_name":"com.wistron.todoapi.TodoapiApplication","thread_name":"main","level":"INFO","level_value":20000}
2. 設定應用程序日誌集中化收集轉換與轉拋的規則¶
Logging Operator 利用 CRD 的方式介入了日誌從採集、路由、輸出這三個階段的配置。它本質上來說還是利用 DaemonSet 和 StatefulSet 在集群內分別部署了 FluentBit 和 Fluentd 兩個組件,FluentBit 將容器日誌採集並初步處理後轉發給 Fluentd 做進一步的解析和路由,最終由 Fluentd 將日誌結果轉發給不同的服務。
Logging Operator CRD
整個 Logging Operator 的核心 CRD 就只有5個,它們分別是:
- logging:用於定義一個日誌採集端(FleuntBit)和傳輸端(Fleuntd)服務的基礎配置;
- flow:用於定義一個 namespaces 級別的日誌過濾、解析和路由等規則。
- clusterflow:用於定義一個集群級別的日誌過濾、解析和路由等規則。
- output: 用於定義namespace級別的日誌的輸出和參數;
- clusteroutput: 用於定義集群級別的日誌輸出和參數,它能把被其他命名空間內的 flow 關聯
參考下面教程來了解 LoggingOperator 的詳細說明:
接著讓我們使用 logging operator 所定義的 CRD 來定義收集日誌的 pipeline。下列的設定是使用了最簡化的配置來啟用 Logging Operator,實務上會採取各細緻的配置。
創建日誌 Logging 資源¶
kubectl -n logging apply -f - <<EOF
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
name: default-logging-simple
spec:
fluentd: {}
fluentbit: {}
controlNamespace: logging
EOF
創建 Loki ClusterOutput 資源¶
kubectl -n logging apply -f - <<EOF
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
name: loki-cluster-output
spec:
loki:
url: http://loki.logging:3100
configure_kubernetes_labels: true
buffer:
flush_mode: interval
flush_interval: 5s
timekey: 1m
timekey_wait: 30s
timekey_use_utc: true
EOF
創建 ClusterFlow 資源¶
kubectl -n logging apply -f - <<EOF
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
name: loki-cluster-flow
spec:
filters:
- tag_normaliser: {}
globalOutputRefs:
- loki-cluster-output
EOF
3. 應用程序日誌探索與查找¶
打開瀏覽器並輸入 Grafana UI的位址 http://localhost:3000
。預設的帳號是:
username
: adminpassword
: prom-operator
Loki 查詢語言 LogQL¶
受 PromQL 的啓發,Loki 也有自己的查詢語言,稱爲 LogQL,它就像一個分佈式的 grep,可以聚合查看日誌。和 PromQL 一樣,LogQL 也是使用標籤和運算符進行過濾的。
想了解更多使用 LogQL 來查詢或分析日誌的語法:
結論¶
現在我們已經有了結構化的、集中的和正規化的日誌記錄的基本工具與平台。我們可以同時查詢我們所有的應用程序,搜索一些特定的異常,並過濾掉出錯誤日誌。
擁有結構化日誌記錄並不意味著每條的日誌都是值得記錄。換句話說,僅僅因為安裝了可觀察性工具並不意味著我們可看到的更多有用的訊息。
因此,團隊應該思考如何來收集真正有幫助我們了解應用程序的見解的日誌,並且積極地應用這些信息來推論出可改善 Day2 運維的 actions。