Preface

Because GitHub Pages rate-limits traffic, I decided to deploy the static pages to an ECS instance instead; without throttling or cross-network transfer loss, the site loads much faster. This article documents the full automation flow: a GitHub Action that runs every Monday at 00:00, crawls the pages, builds an image, pushes it to Docker Hub, and updates the K8s service.

GitHub Actions really is a pleasure: free and easy to use.

Repository Setup

GitHub

1.Upload the crawler files. The crawler uses Python's Scrapy framework; the scrapy startproject tutorial command from the official Scrapy Tutorial quickly scaffolds a crawler project. After pushing the generated project to a GitHub repository, commit the spider script; its contents are covered in the earlier article: Python爬取网站内容做静态页面转发 (crawling site content with Python for static page serving)
PS: You could also create the GitHub repository first, upload only the script, and initialize the Scrapy project inside the Action, but that requires several extra file-moving steps
    e.g.: 1.Move the script file into the spiders directory of the Scrapy project
          2.cd into the Scrapy directory and run scrapy crawl blog
          3.cp the crawled files up to the parent directory (to avoid an extra path segment in the URL)

2.Create a Dockerfile with the following contents:
# Use the official nginx image as the base
FROM nginx:stable-alpine

# Copy everything in the current directory into nginx's web root
COPY . /usr/share/nginx/html

# To customize the nginx configuration, place a config file in the current
# directory and copy it in, e.g.: COPY nginx.conf /etc/nginx/nginx.conf

# Expose port 80
EXPOSE 80

# Run nginx when the container starts
CMD ["nginx", "-g", "daemon off;"]
Reference GitHub project: https://github.com/brook-david/auto-crawl-blog-rebulid-for-static-web-action.git

Docker Hub

Create a repository named my-blog on Docker Hub.

K8s Configuration

Deploy the Deployment

1.Set the Deployment's image to the my-blog repository created above, and set imagePullPolicy to Always so a stale locally cached image is never used.
2.Pin the Pod to the control-plane node via nodeSelector, and add a toleration for the control-plane taint.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: static-blog
  namespace: blog-web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: static-blog
  template:
    metadata:
      labels:
        app: static-blog
    spec:
      containers:
      - name: my-container
        image: dbopen/my-blog:main  # replace with your own image repository and tag
        imagePullPolicy: Always # always pull the latest image
        ports:
        - containerPort: 80  # expose the port your application listens on here
      tolerations:
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Equal"
        value: ""
        effect: "NoSchedule"
      nodeSelector:
        kubernetes.io/hostname: izbp1605iwejf5qgem2c7hz

Deploy the Service

apiVersion: v1
kind: Service
metadata:
  name: my-static-blog
  namespace: blog-web
  labels:
    app: static-blog
spec:
  ports:
  - port: 80  # the Service's port
    targetPort: 80  # matches the containerPort in the Pod
  selector:
    app: static-blog
  type: ClusterIP

Deploy the Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-static-blog-ingress
  namespace: blog-web
  annotations:
    acme.cert-manager.io/http01-edit-in-place: "true"
    cert-manager.io/cluster-issuer: letsencrypt-prod # specify the cert-manager cluster-issuer
    ingress.kubernetes.io/ssl-redirect: "false"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - static.wanderto.top
    secretName: blog-tls-secret-blog-static
  # Ingress rules
  rules:
  - host: static.wanderto.top # set a host here for host-based routing
    http:
      paths:
      - path: / # the path; adjust as needed
        pathType: Prefix # path type: Prefix, Exact, or ImplementationSpecific
        backend:
          service:
            name: my-static-blog # replace with your Service name
            port:
              number: 80 # replace with your Service port number

Obtaining a K8s API Token

For the Action to notify K8s to update the service, the K8s API call needs a token. There are several ways to generate one; here we create a serviceaccount, bind it to view and update permissions on Deployments and Pods in the blog-web namespace, and keep the generated token for later use.
kubectl create serviceaccount update-pod-for-api -n blog-web

kubectl create role pod-update --verb=get --verb=list --verb=watch --verb=update --verb=patch --resource=pods --resource=deployment  -n blog-web

kubectl create rolebinding update-pod-for-api-binding --role=pod-update --serviceaccount=blog-web:update-pod-for-api  -n blog-web

# Token valid for one month (2592000s = 30 days)
kubectl create token update-pod-for-api --duration=2592000s  -n blog-web
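The token is then used as a Bearer credential against the API server. As a stdlib-only sketch (the host and token below are placeholders, and TLS verification against the cluster CA is skipped here just as curl -k does later), this is how a request to list the namespace's Pods would be assembled; actually sending it requires a reachable cluster:

```python
import urllib.request

# Placeholders: substitute your API server address and the generated token
K8S_HOST = "k8s.example.com"
TOKEN = "<output of `kubectl create token update-pod-for-api -n blog-web`>"

# GET /api/v1/namespaces/blog-web/pods lists the Pods the role can read
req = urllib.request.Request(
    f"https://{K8S_HOST}:6443/api/v1/namespaces/blog-web/pods",
    headers={"Authorization": f"Bearer {TOKEN}"},
)

# urllib.request.urlopen(req) would perform the call against a live cluster
print(req.full_url)
```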

GitHub Action Configuration

Python Environment Setup and Page Crawling

Install Python and Scrapy in the runner environment, then run scrapy crawl blog to crawl the pages into the current directory:
      - name: Setup Python # Set Python version
        uses: actions/setup-python@v5
        with:
          python-version: 3.8  # ${{ matrix.python-version }}
      # Install pip and scrapy
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install scrapy
      - name: Crawl
        run: scrapy crawl blog

Build the Image and Push to Docker Hub

Set up the Docker build environment and log in to Docker Hub, then build an image from the current directory according to the Dockerfile and push it. The Docker Hub username and password must be stored as repository secrets, configured under Settings -> Security -> Secrets and variables; see: Using secrets in GitHub Actions.
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        # Docker Image upload
      - name: Log in to Docker Hub
        uses: docker/login-action@f4ef78c080cd8ba55a85445d5b36e214a81df20a
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
        with:
          images: dbopen/my-blog
      - name: Build and push Docker image
        id: push
        uses: docker/build-push-action@3b5e8027fcad23fda98b2e3ac259d8d67585f671
        with:
          context: .
          file: ./Dockerfile
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
      - name: Generate artifact attestation
        uses: actions/attest-build-provenance@v1
        with:
          subject-name: ${{ env.DOCKER_REGISTRY }}/${{ env.DOCKER_IMAGE_NAME_BLOG}}
          subject-digest: ${{ steps.push.outputs.digest }}
          push-to-registry: false

Notify K8s to Update the Service

Use the K8s API to tell K8s to update the service: patching a fresh timestamp into the Pod template's annotations triggers a rollout. The token generated earlier must be saved to the repository secrets.
      - name: Notify
        run: |
          TIMESTAMP=$(date +%s)
          curl -k --location --request PATCH 'https://${{ vars.K8S_HOST }}:6443/apis/apps/v1/namespaces/blog-web/deployments/static-blog' \
          --header 'Authorization: Bearer ${{ secrets.K8S_UPDATE_TOKEN }}' \
          --header 'Content-Type: application/merge-patch+json' \
          --data '{  
                   "spec": {  
                       "template": {  
                           "metadata": {  
                               "annotations": {  
                                   "issued-timestamp": "'$TIMESTAMP'"    
                               }  
                           }  
                       }  
                   }  
               }'
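The curl payload above is a JSON merge patch; its shape can be sanity-checked in Python with the stdlib alone, no cluster needed:

```python
import json
import time

# Changing an annotation in the Pod template alters the template hash,
# which makes the Deployment roll out new Pods even though the image
# tag itself is unchanged
timestamp = str(int(time.time()))
patch = {
    "spec": {
        "template": {
            "metadata": {
                "annotations": {"issued-timestamp": timestamp}
            }
        }
    }
}

# Sent with Content-Type: application/merge-patch+json
body = json.dumps(patch)
print(body)
```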

The Complete Action Workflow File

# Workflow: crawl the blog pages, build a Docker image, and notify K8s
name: Crawl Blog Pages

on:
  schedule:
    # * is a special character in YAML so you have to quote this string
    # Runs at 00:00, only on Monday.
    - cron:  '0 0 * * 1'
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:
  
# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false
  
# env
env:
  DOCKER_REGISTRY: docker.io
  DOCKER_IMAGE_NAME_BLOG: dbopen/my-blog 
  
jobs:
  crawl_page_and_bulid:
    name: Crawl Blog Pages And Build Docker Image
    runs-on: ubuntu-latest
    # strategy:
    #   matrix:
    #     python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
    permissions:
      packages: write
      contents: read
      attestations: write
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python # Set Python version
        uses: actions/setup-python@v5
        with:
          python-version: 3.8  # ${{ matrix.python-version }}
      # Install pip and scrapy
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install scrapy
      - name: Crawl
        run: scrapy crawl blog && ls -l
        # Build
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        # Docker Image upload
      - name: Log in to Docker Hub
        uses: docker/login-action@f4ef78c080cd8ba55a85445d5b36e214a81df20a
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
        with:
          images: dbopen/my-blog
      - name: Build and push Docker image
        id: push
        uses: docker/build-push-action@3b5e8027fcad23fda98b2e3ac259d8d67585f671
        with:
          context: .
          file: ./Dockerfile
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
    
      - name: Generate artifact attestation
        uses: actions/attest-build-provenance@v1
        with:
          subject-name: ${{ env.DOCKER_REGISTRY }}/${{ env.DOCKER_IMAGE_NAME_BLOG}}
          subject-digest: ${{ steps.push.outputs.digest }}
          push-to-registry: false
  notify_k8s_deployment_update:
    name: Notify K8s Deployment Update
    runs-on: ubuntu-latest
    needs: crawl_page_and_bulid
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Notify
        run: |
          TIMESTAMP=$(date +%s)
          curl -k --location --request PATCH 'https://${{ vars.K8S_HOST }}:6443/apis/apps/v1/namespaces/blog-web/deployments/static-blog' \
          --header 'Authorization: Bearer ${{ secrets.K8S_UPDATE_TOKEN }}' \
          --header 'Content-Type: application/merge-patch+json' \
          --data '{  
                   "spec": {  
                       "template": {  
                           "metadata": {  
                               "annotations": {  
                                   "issued-timestamp": "'$TIMESTAMP'"    
                               }  
                           }  
                       }  
                   }  
               }'
Reference: https://github.com/brook-david/auto-crawl-blog-rebulid-for-static-web-action/blob/main/.github/workflows/crawl-page.yml

Reconfigure the Ingress Fallback

Finally, update the Ingress configuration of the main site in K8s so that a 502 falls back to forwarding the static.wanderto.top page.
# Ingress snippets configuration:
    nginx.org/server-snippets: |
      error_page 502 = @fallback;
      location @fallback {
        proxy_pass https://xxxxxx.github.io;
      }
This post is the follow-up to Python爬取网站内容做静态页面转发 (crawling site content with Python for static page serving).
