About xieydd | Xieydd's Blog

📫 To contact me, send an email to [email protected], or add me on WeChat (run echo -n 'eGlleWRkX2hhaGEK' | base64 -d to get my ID).

Follow me on Xiaohongshu (Chinese) and my WeChat Official Account, both under the name 东仔 AI Infra (Dongzai AI Infra).

💻 As of 2026, I have over 8 years of experience in AI infrastructure, MLSys, Agent infrastructure, databases, and systems engineering. My work has centered on cloud-native AI training and inference platforms, managed vector database platforms, and systems performance optimization. I am familiar with Kubernetes, GCP, AWS, PostgreSQL, storage acceleration, resource scheduling, and Agent infrastructure, and can independently design and deliver large-scale distributed systems and cloud services. I also have extensive project management and cross-team collaboration experience, and I continue to contribute to AI Infra and cloud-native technology through technical exploration and open source work.

2023.5-present TensorChord

Leading the development of the serverless model inference platform ModelZ on GCP, improving cold-start and inference performance through image preheating, P2P distribution, lazy loading, and model caching services:

Reduced model service cold-start time through model caching services and image preheating
Deployed open source models and hosted user LoRA models with vLLM, improving LLM throughput and reducing TTFT (time to first token) and tail latency through KV cache and scheduling optimizations

As Cloud Team Leader, developing VectorChord Cloud, the managed cloud service for the vector database VectorChord, and handling customer support:

Built a PostgreSQL-based vector database service on AWS, with control-plane and data-plane separation and BYOC (Bring Your Own Cloud) and BYOD (Bring Your Own Data) capabilities
Implemented a cloud-native architecture that provides PostgreSQL storage and compute separation, high availability, backup, PITR (Point-In-Time Recovery), and in-place upgrades

Agent Harness and new business exploration:

Built an enterprise-grade knowledge-base RAG system on PostgreSQL as infrastructure for Agent memory, with hybrid vector and BM25 retrieval; integrated 15 categories of workplace systems, including Linear, Gmail, and GitHub
Developed eBPF-based Agent security projects that use a CLI and Skills to provide prompt injection protection, agent behavior detection, and agent-native security capabilities; built a Skill Security SaaS platform to help enterprises improve Skill security and reduce supply-chain risk

2021.2-2023.5 Tencent Cloud

Developed a large-scale AI platform for public cloud:

Built a high-performance, scalable elastic offline training platform using Tencent Cloud EKS (Elastic Kubernetes Service)
Integrated Tencent Cloud object storage and the GooseFS accelerator to create a high-performance cloud cache scheduling system and improve training data access efficiency

Established FinOps infrastructure to help public cloud customers manage and optimize cloud costs more effectively, enhancing cloud resource utilization:

Optimized scheduling and rescheduling, distinguished high-priority from low-priority tasks, and implemented intelligent elastic scaling
Combined optimizations from Tencent’s Ruyi kernel scheduler with observability to reduce costs while maintaining service quality
Launched a large-scale cost reduction initiative in the internal cloud, improving resource utilization through efficient resource allocation

2018-2021.2 (including internship) Unisound

At the AI algorithm company Unisound, I was responsible for the development and operation of the Atlas supercomputing platform, supporting AI Labs teams running NLP, TTS, ASR, and CV model training. Key responsibilities included:

Developing a large-scale intelligent scheduling system to optimize multi-tenant resource allocation and training resource utilization, improving MFU (Model FLOPs Utilization) from 15% to 30%
Enhancing the performance of the high-performance distributed file system Lustre
Building a multi-layer cache cloud-native architecture to accelerate AI model training

I also worked on 8-bit training and inference optimization, including vision model inference optimization on Kunlun chips and NVIDIA edge devices.

Skill set: Kubernetes, EKS, Kubeflow, GCP, AWS, Serverless, Lustre, JuiceFS, GooseFS, scheduling, eBPF, PostgreSQL, vector databases, RAG, AI training platforms, inference services, cold-start optimization, GPU / Edge, Agent infrastructure, etc.

Open Source Projects

🌱 Currently focusing on MLOps, MLSys, PostgreSQL, Agent infrastructure, eBPF, and FinOps, contributing to several open source projects:

fluid: Fluid, elastic data abstraction and acceleration for BigData/AI applications in the cloud. (Project under CNCF)
crane: Crane is a FinOps Platform for Cloud Resource Analytics and Economics in Kubernetes clusters. The goal is to help users manage cloud costs more easily while ensuring application quality.
crane-scheduler: Crane scheduler is a Kubernetes scheduler that can schedule pods based on actual node load.
creator: Creator is the brain of the crane project, containing the core algorithm module and evaluation module.
openmodelz: One-click machine learning deployment (LLM, text-to-image, etc.) at scale on any cluster (GCP, AWS, Lambda labs, your home lab, or even a single machine).
clusternet: [CNCF Sandbox Project] Managing your Kubernetes clusters (including public, private, edge, etc.) as easily as browsing the Internet.
vectorchord: Scalable, fast, and disk-friendly vector search in Postgres, the successor of pgvecto.rs.

Recommended Blogs

Type	Author/Company	Blog URL
Infra	Chris Riccomini	https://materializedview.io/
Infra	Jack Vanlightly	https://jack-vanlightly.com/
Math and Science	苏剑林	https://kexue.fm/
AI Infra	Colfax	https://research.colfax-intl.com/blog/
Postgres	Gabriele Bartolini	https://www.gabrielebartolini.it/articles/
AI	Sebastian Raschka	https://magazine.sebastianraschka.com/archive?sort=new
AI Algorithm	Tom Yeh	https://www.byhand.ai/
AI Infra	Chip Huyen	https://huyenchip.com/blog/