Design a Trending Service

A deep dive into the system design and architecture.

Designing a real-time trending service is a fascinating and complex high-level design problem. It's a common feature in products like Twitter (X), Google News, and YouTube, and it tests a candidate's understanding of big data streaming, real-time analytics, and personalization at a massive scale.

In this deep dive, we'll design a system that can generate a personalized "Top 10" trending list for millions of users.

1. Understanding the Problem: Requirements & Goals

The first step is to define what "trending" means and what features our service must support.

Functional Requirements

Personalized Trend Ranking: The core requirement. The service must generate a unique Top 10 list for each user.

Multiple Trend Sources: This personalization must be a blend of three distinct signals:

Global Trends: What is popular across the entire platform right now.

Social Trends: What is popular within the user's immediate social network (people they follow).

Personal Interests: Topics related to the user's historical activity.

Non-Functional Requirements

Real-time: The trending list must feel "live." We'll aim to have global trends updated and visible to users within 5 minutes of a spike in activity.

Scalability: The system must handle a massive firehose of events (posts, likes, clicks) from hundreds of millions of users.

Low Latency: Fetching the personalized list for a user must be extremely fast. Our target is a P99 latency of under 150ms.

Scale and Constraints

Data Ingestion: Assume a peak of 100,000 events per second.

User Base: 300 million Daily Active Users (DAU).

This scale immediately tells us that exact counting of every event for every user is impractical. We must use approximation and pre-computation.
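A quick back-of-envelope calculation makes this concrete (the ~100-byte average event size below is an assumption, not a figure from the requirements):

```python
# Rough scale math: why naive exact counting and storage are impractical.
events_per_second = 100_000
seconds_per_day = 86_400
events_per_day = events_per_second * seconds_per_day         # 8.64 billion events/day

bytes_per_event = 100                                         # assumed average payload size
raw_ingest_per_day_tb = events_per_day * bytes_per_event / 1e12

print(f"{events_per_day:,} events/day")                       # 8,640,000,000 events/day
print(f"~{raw_ingest_per_day_tb:.2f} TB/day of raw events")   # ~0.86 TB/day
```

Keeping exact per-user counters over even a few hours of this stream would dwarf the memory of any single machine, which is what pushes us toward sketches and pre-computed lists.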

2. High-Level Design

Our architecture will be based on a stream-processing pipeline that separates the complex, slow computation of trends from the fast, simple serving of those trends to users. Here is a high-level overview of the system's architecture:

[Figure: High-Level Design]

3. Deep Dive: Core Components

A. Data Ingestion & Stream Processing

The foundation of our system is a pipeline that can handle the massive event stream.

Data Ingestion: We'll use a distributed message queue like Apache Kafka or AWS Kinesis. This acts as a durable, scalable buffer for the billions of events generated by users.

Stream Processor: A framework like Apache Flink or Spark Streaming will consume events from Kafka. This is where the "heavy lifting" of counting and trend detection happens.
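As a rough sketch of the ingestion side, assuming the kafka-python client and a hypothetical user-events topic, the producers embedded in the application tier might look like this:

```python
import json
from kafka import KafkaProducer  # kafka-python client, assumed as the producer library

producer = KafkaProducer(
    bootstrap_servers=["kafka-broker:9092"],               # placeholder broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_event(user_id: int, event_type: str, topic_tag: str) -> None:
    """Fire-and-forget publish of one engagement event (post, like, click)."""
    event = {"user_id": user_id, "type": event_type, "topic": topic_tag}
    producer.send("user-events", value=event)               # partitioning left to Kafka's default
```

The stream processor (Flink or Spark Streaming) would consume this same user-events topic, so the queue cleanly decouples the write path from the trend computation.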

B. The Trending Algorithm

How do we count trending topics from 100,000 events per second?

The Problem with Exact Counting: Maintaining an exact counter for every distinct topic, across sliding windows and hundreds of millions of users, would require an enormous amount of memory and processing power. It's not feasible at this scale.

The Solution: Approximation Algorithms. We can use a probabilistic data structure like a Count-Min Sketch (or, for small windows, a simple hash map) over a sliding time window to track approximate topic frequencies in a fixed amount of memory. The stream processor then surfaces the highest-frequency topics from the massive data stream.
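To make the idea concrete, here is a minimal, self-contained Count-Min Sketch in Python with a simple per-minute bucket scheme for the sliding window; a production pipeline would use the stream processor's native windowing instead:

```python
import hashlib

class CountMinSketch:
    """Fixed-memory approximate counter; estimates can overcount but never undercount."""

    def __init__(self, width: int = 4096, depth: int = 5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

# Sliding window: one sketch per minute; a bucket is reset when the window rotates past it.
WINDOW_MINUTES = 5
buckets = [CountMinSketch() for _ in range(WINDOW_MINUTES)]

def record(topic: str, minute: int) -> None:
    buckets[minute % WINDOW_MINUTES].add(topic)

def windowed_estimate(topic: str) -> int:
    return sum(bucket.estimate(topic) for bucket in buckets)
```

The sketch only answers frequency queries; tracking the actual Top-K set is typically done by keeping a small heap of heavy-hitter candidates alongside it.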

Here's a look at the processing flow:

[Figure: The Trending Algorithm]

C. Trend Storage & Personalization

This is the read path, which needs to be extremely fast. The Personalization Service orchestrates the final ranking for each user.

Trend Cache: We'll use a low-latency in-memory store like Redis. We will store several pre-computed lists:

A single key for global_trends.

A per-user key for social trends, e.g. social_trends:{user_id}.

User Interest Store: A separate database (perhaps a graph database like Neo4j or a document DB) will store a user's long-term interests.

Personalization Service: This is the API service the user's app calls. The interaction to generate a personalized feed is shown below.

[Figure: Personalization Service Sequence]
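Putting the read path together, here is a simplified sketch of the merge-and-rank step, assuming redis-py, sorted sets keyed as described above, and a hypothetical get_user_interests helper in front of the User Interest Store (the 0.5/0.3 weights and 1.5 boost are illustrative, not tuned values):

```python
import redis  # redis-py client, assumed as the cache library

r = redis.Redis(host="trend-cache", port=6379, decode_responses=True)

def get_user_interests(user_id: int) -> set:
    """Hypothetical lookup against the User Interest Store; stubbed for this sketch."""
    return set()

def get_trending(user_id: int, limit: int = 10) -> list:
    """Blend global, social, and interest signals into one Top-N list."""
    # Trend lists are assumed to be stored as sorted sets: member = topic, score = trend score.
    global_trends = r.zrevrange("global_trends", 0, 49, withscores=True)
    social_trends = r.zrevrange(f"social_trends:{user_id}", 0, 49, withscores=True)
    interests = get_user_interests(user_id)

    scores: dict = {}
    for topic, score in global_trends:
        scores[topic] = scores.get(topic, 0.0) + 0.5 * score   # global weight
    for topic, score in social_trends:
        scores[topic] = scores.get(topic, 0.0) + 0.3 * score   # social weight
    for topic in list(scores):
        if topic in interests:
            scores[topic] *= 1.5                                # personal-interest boost

    # Cold start: a brand-new user has no social or interest signal,
    # so the result naturally degrades to the global list.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [topic for topic, _ in ranked[:limit]]
```

Because everything it reads is pre-computed and cached, this service does almost no work per request, which is what keeps the P99 latency target within reach.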

4. Follow-Up Considerations

Cold Start Problem: For a new user with no history or followers, the personalization service would gracefully fall back to showing only global trends.

Social Trend Computation: Calculating social trends in real-time is the hardest part. A common approach is an offline or near-real-time job that pre-computes these for active users and pushes the results to the Redis cache.
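A sketch of such a job, again assuming redis-py and that the social graph and recent per-user topic activity are supplied by upstream services (the function and parameter names are illustrative):

```python
from collections import Counter
import redis  # redis-py client, assumed as the cache library

r = redis.Redis(host="trend-cache", port=6379, decode_responses=True)

def refresh_social_trends(user_id: int, followee_ids: list, recent_topics_by_user: dict,
                          top_n: int = 50, ttl_seconds: int = 15 * 60) -> None:
    """Recompute one active user's social trends and push them to the cache."""
    counts = Counter()
    for followee in followee_ids:
        counts.update(recent_topics_by_user.get(followee, []))

    key = f"social_trends:{user_id}"
    pipe = r.pipeline()
    pipe.delete(key)                                        # replace the previous list
    if counts:
        pipe.zadd(key, dict(counts.most_common(top_n)))     # member = topic, score = count
    pipe.expire(key, ttl_seconds)                           # stale lists age out on their own
    pipe.execute()
```

Running this only for recently active users keeps the fan-in manageable, and the TTL means an inactive user's list simply disappears rather than going stale.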

This design provides a robust, scalable, and personalized solution that meets the demanding requirements of a modern trending service.
