Previously, I worked on Modelz, a Serverless model-inference product. Although we have since pivoted, I still want to share how we optimized the cold start problem of model inference. Since the service was built on container orchestration, this also involves the cold start problem of containers.
Optimizing the Model Inference Cold Start Problem

First, let's look at the Serverless model-inference process, from user request to model inference:
```mermaid
sequenceDiagram
    participant User
    participant Cloudflare
    participant Ingress
    participant AutoScaler
    participant Node
    participant containerd
    User->>Cloudflare: Model Call
    Cloudflare->>Ingress: Request
    Ingress->>AutoScaler: Request
    AutoScaler->>Node: Scale Up
    Node->>containerd: Container
    Note right of containerd: 1.
```
I've been researching PostgreSQL high availability solutions recently, and here's what I've learned.
PostgreSQL High Availability

High Availability Goals

PostgreSQL high availability typically has two main objectives:
- RPO (Recovery Point Objective): the maximum acceptable amount of data loss, measured in time. It represents how much data loss the business can tolerate.
- RTO (Recovery Time Objective): the maximum acceptable downtime, measured from when a disaster occurs until the system is operational again.
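The two objectives can be made concrete with a small worked example. This is a minimal sketch with hypothetical incident timestamps (all values are illustrative, not from a real system):

```python
from datetime import datetime, timedelta

# Hypothetical incident timeline (all timestamps are illustrative).
last_durable_commit = datetime(2024, 1, 1, 11, 58)  # newest data that survived the disaster
disaster_time       = datetime(2024, 1, 1, 12, 0)   # when the primary was lost
service_restored    = datetime(2024, 1, 1, 12, 7)   # when the service was back online

# RPO is about data: how large was the window of lost writes?
actual_data_loss = disaster_time - last_durable_commit

# RTO is about time: how long was the service unavailable?
actual_downtime = service_restored - disaster_time

# Hypothetical business targets.
rpo_target = timedelta(minutes=5)
rto_target = timedelta(minutes=10)

print(actual_data_loss <= rpo_target)  # True: 2 minutes of lost data, within the 5-minute RPO
print(actual_downtime <= rto_target)   # True: 7 minutes of downtime, within the 10-minute RTO
```

Note that the two are independent: a cluster can recover quickly (good RTO) while still losing recent commits (bad RPO), and vice versa.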
Recently, I've been working on NUMA-aware scheduling on an internally developed platform, which involves discovering Kubernetes node resource topology and scheduling against it. Given my limited knowledge of the area, I often struggled to grasp the full picture, so this article is an attempt to summarize and organize my understanding.
Why Topology Awareness Is Needed

According to the official Kubernetes documentation, more and more systems combine CPUs with hardware accelerators such as GPUs and DPUs to support low-latency tasks and high-throughput parallel computing tasks.
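In Kubernetes, this kind of resource alignment is handled by the kubelet's Topology Manager, which coordinates hints from the CPU and device managers. A minimal configuration sketch (the policy values shown are one possible choice, not a recommendation):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Require all resources of a Guaranteed pod to be allocated from a single NUMA node;
# pods whose resources cannot be aligned are rejected on admission.
topologyManagerPolicy: single-numa-node
# The static CPU manager policy grants exclusive CPUs, which is what makes
# NUMA alignment of CPU allocations possible in the first place.
cpuManagerPolicy: static
```

With the default `none` policy the kubelet performs no alignment at all, which is exactly the situation that motivates topology-aware scheduling.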
It has been over a year since I joined Tensorchord, and I haven't had the time to sit down and write articles, mainly because life has become much busier since my daughter Tongtong was born. During this time, I also experienced the business pivot from Serverless model inference (Modelz) to vector search (VectorChord). I may share the story of this pivot in future articles; those interested are also welcome to contact me directly.