Generate a Sitemap with a Python Script

Recently, while helping a friend look after a group of CMS sites, I found that the CMS has no built-in sitemap feature. Since Python 3 was already installed on the server, I decided to generate the sitemaps with a script as a stopgap. Here is a record of it.

Script Content:

#!/usr/bin/env python3

import datetime
import mysql.connector
import xml.etree.ElementTree as ET

# Database connection parameters
config = {
    'host': '10.80.0.3',  # For example: '192.168.1.100'
    'user': 'wnote_r',
    'password': 'Wnote#Pss2024',
    'database': 'wnote',
    'raise_on_warnings': True
}

# Web root directory; each site's sitemap is written under sdir + site + '/'
sdir = '/opt/wwwroot/'

def mselect(site, spfile, n1, n2):
    # Connect to the database
    cnx = mysql.connector.connect(**config)
    cursor = cnx.cursor()

    article_ids = []
    tag_ids = []

    # Latest article IDs, newest first
    query1 = "select id from article order by newstime desc limit {};".format(int(n1))
    cursor.execute(query1)
    for row in cursor:
        article_ids.append(row[0])

    # Latest tag IDs (from the phome_ecms_book table), newest first
    query2 = "select id from phome_ecms_book order by newstime desc limit {};".format(int(n2))
    cursor.execute(query2)
    for row in cursor:
        tag_ids.append(row[0])

    # Build the URL lists and merge them
    article_urls = [f"https://{site}/article/{i}.html" for i in article_ids]
    tag_urls = [f"https://{site}/tags/{i}.html" for i in tag_ids]
    urls = article_urls + tag_urls

    # Create the XML tree structure for the sitemap
    root = ET.Element('urlset',
                      {'xmlns': 'http://www.sitemaps.org/schemas/sitemap/0.9',
                       'xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
                       'xmlns:mobile': 'http://www.baidu.com/schemas/sitemap-mobile/1/',
                       'xsi:schemaLocation': 'http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd'})

    # Build one <url> element per URL
    for url in urls:
        s_url = ET.SubElement(root, 'url')
        s_loc = ET.SubElement(s_url, 'loc')
        s_loc.text = url
        s_lastmod = ET.SubElement(s_url, 'lastmod')
        s_lastmod.text = datetime.datetime.now().strftime('%Y-%m-%dT%H:%M:%S+08:00')
        s_changefreq = ET.SubElement(s_url, 'changefreq')
        s_changefreq.text = 'always'
        s_priority = ET.SubElement(s_url, 'priority')
        s_priority.text = '0.95'

    # Write the sitemap to disk with an XML declaration
    ET.ElementTree(root).write(spfile, encoding='utf-8', xml_declaration=True)

    # Close the database connection
    cursor.close()
    cnx.close()

# Each line of sites.list: site domain, sitemap filename, article limit, tag limit
with open('sites.list', 'r') as files:
    for line in files:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        site = fields[0]
        spfile = sdir + site + '/' + fields[1]
        n1 = fields[2]
        n2 = fields[3]
        print(site, spfile)
        mselect(site, spfile, n1, n2)

The content of sites.list is one site per line, with four whitespace-separated fields: the site domain, the sitemap filename, and the LIMIT values for articles and tags.
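For illustration, a couple of lines might look like this (the domains and limits here are made up):

www.example-a.com sitemap.xml 5000 2000
www.example-b.com sitemap.xml 3000 1000

With the list in place, the script can be scheduled from cron so the sitemaps stay fresh. A sketch, assuming the script and sites.list live in /opt/scripts/ (a hypothetical path; the cd matters because sites.list is opened relative to the working directory):

0 3 * * * cd /opt/scripts && /usr/bin/python3 gen_sitemap.py >> /var/log/gen_sitemap.log 2>&1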

A Powerful Tool for Operating Baidu Cloud on Linux

Recently, I was doing a private deployment of our company's SaaS product at a customer site. The customer's network has no internet access, so data can only be brought in through a Linux jump server. Besides the k8s offline deployment packages and images, there are several hundred GB of pre-cut video data, which can only be transferred via Baidu Netdisk. I wondered whether Baidu Netdisk data could be synchronized from the command line; a quick Google search showed that it can. Below is a brief introduction to bypy.
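As a rough sketch of the typical bypy workflow on the jump server (bypy only sees its own app directory on the Netdisk, and the exact commands you need will depend on where the data sits):

pip3 install bypy     # bypy is published on PyPI
bypy info             # the first run prints an authorization URL; paste the code back to link the Netdisk account
bypy list             # list files under bypy's app directory on the Netdisk
bypy syncdown         # pull the remote app directory down to the current local directory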

Recommended Windows Package Management Tool

Having grown used to the convenience of installing packages with Homebrew on the Mac, I recently set up a Windows system on my company's computer and wanted a similar experience. The solution is Scoop, which I am recommending today.

Scoop is an open-source project that installs Windows software packages from the command line. It avoids UAC permission pop-ups, skips GUI installer wizards, automatically resolves and installs dependencies, and automates the whole installation process.
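To give a sense of the workflow (the package name below is just an example):

scoop search git     # look a package up in the configured buckets
scoop install git    # install it into the user profile, with no admin prompt
scoop update *       # update every installed package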

Achieving Multi-User Access on a Single GPU Card Using Alibaba Cloud's Open Source Solution

A new AI project has recently been launched, primarily providing online AI experiments for universities. The project has also purchased a GPU server, but it only has one Nvidia Tesla T4 card, which needs to support multiple students doing experiments online simultaneously.

The current online experiment system runs on Kubernetes, so we need to consider GPU sharing in the k8s environment. We have previously tested the Alibaba Cloud GPU card sharing solution; here, I will just record the steps for using it:

Go Study Notes - Detailed Explanation of Pointers

What is a Pointer?

In Go, a pointer is a variable that stores the memory address of another variable; that is, the pointer holds the address of a value rather than the value itself.

When declaring a pointer variable, you add * in front of the type, indicating that the variable is a pointer to that type, for example:

var p *int

This declares a pointer named p that can point to an int value. The & operator can be used to take the address of a variable, for example:

kind: Build a lightweight Kubernetes cluster

While recently revisiting Golang, I built a web application and got it running locally. Now I wanted to test it in a Kubernetes cluster, but hardware limitations made setting up a full-fledged K8s cluster challenging. I remembered a friend mentioning that Kubernetes can also run inside Docker, so I decided to give it a try.

Today’s focus is kind. What is kind? And what can it do?


1. Introduction to kind

kind stands for Kubernetes IN Docker: a tool that runs local Kubernetes clusters using Docker containers as nodes. It was designed primarily for testing Kubernetes itself, and it is well suited to local development and CI pipelines.
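Spinning up and tearing down a throwaway cluster takes only a couple of commands (the cluster name dev is arbitrary):

kind create cluster --name dev    # boots a single-node cluster inside a Docker container
kubectl get nodes                 # kind switches the kubeconfig context to kind-dev automatically
kind delete cluster --name dev    # remove the cluster when finished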

Customize Jenkins Agent Integration with Docker and kubectl Tools

The official Jenkins agent image does not include the tools we need during builds, such as Helm, kubectl, curl, and ArgoCD, so we have to integrate these tools ourselves.

Note: The official image names have changed:

  • jenkins/agent (formerly jenkins/slave) — renamed starting from version 4.3-2
  • jenkins/inbound-agent (formerly jenkins/jnlp-slave) — renamed starting from version 4.3-2

Dockerfile

FROM jenkins/inbound-agent:4.11-1-alpine-jdk11

USER root

ADD docker/docker /usr/bin/docker
ADD kubectl /usr/bin/kubectl
ADD helm /usr/bin/helm

RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories
RUN chmod +x /usr/bin/docker /usr/bin/kubectl /usr/bin/helm
RUN apk add curl

ENTRYPOINT ["/usr/local/bin/jenkins-agent"]

Build

docker build -t harbor.test.com/tools/jnlp-docker:4.11-1-alpine-jdk11 -f Dockerfile .
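The image presumably also needs to be pushed to the private registry so the Kubernetes agents can pull it (the registry address comes from the tag above):

docker push harbor.test.com/tools/jnlp-docker:4.11-1-alpine-jdk11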

Jenkins Workspace and Local Persistent Storage for Maven Repository

Create Local Storage

Since this is a test cluster, we can directly use local volume storage here.

Because Jenkins runs as the jenkins user (UID 1000), we need to assign ownership of /opt/jenkins_agent/ and /opt/jenkins_maven/ on node01 to that user in advance:

chown 1000:1000 -R /opt/jenkins_agent/
chown 1000:1000 -R /opt/jenkins_maven/

Local Storage: agent-pv-pvc.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: jenkins-agent-pv
spec:
  storageClassName: local # Local PV
  capacity:
    storage: 30Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  local:
    path: /opt/jenkins_agent
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node01
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-agent-pvc
  namespace: kube-ops
spec:
  storageClassName: local
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
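
Once the manifests are written (the maven one below follows the same pattern), they can be applied and checked with kubectl; the file name here matches the heading above:

kubectl apply -f agent-pv-pvc.yaml
kubectl get pv jenkins-agent-pv
kubectl get pvc jenkins-agent-pvc -n kube-ops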

Local Storage: maven-pv-pvc.yaml