Explain the Phoenix ML model architecture for X recommendations. Use when users ask about embeddings, transformers, how predictions work, or ML model details.
Use the skills CLI to install this skill with one command. Auto-detects all installed AI assistants.
Method 1 - skills CLI
npx skills i CloudAI-X/x-algo-skills/x-algo-ml
Method 2 - openskills (supports sync & update)
npx openskills install CloudAI-X/x-algo-skills
Auto-detects Claude Code, Cursor, Codex CLI, Gemini CLI, and more. One install, works everywhere.
The X recommendation system uses Phoenix, a transformer-based ML system for predicting user engagement. It operates in two stages: retrieval and ranking.
┌─────────────────────────────────────────────────────────────────────────────────┐
│ RECOMMENDATION PIPELINE │
├─────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ │ │ │ │ │ │
│ │ User │────▶│ STAGE 1: │────▶│ STAGE 2: │────▶ Feed│
│ │ Request │ │ RETRIEVAL │ │ RANKING │ │
│ │ │ │ (Two-Tower) │ │ (Transformer) │ │
│ └──────────┘ │ │ │ │ │
│ │ Millions → 1000s │ │ 1000s → Ranked │ │
│ └─────────────────────┘ └─────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
Stage 1: Retrieval (Two-Tower)
Efficiently narrows millions of candidates to thousands using approximate nearest neighbor search.
  User Tower            Candidate Tower
      │                        │
      ▼                        ▼
[B, D] user emb         [N, D] all posts
      │                        │
      └────── dot product ─────┘
                  │
                  ▼
       Top-K by similarity
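A minimal sketch of this step in JAX: score every candidate embedding against the user embedding with a dot product and keep the top K. Names here are illustrative, not from the Phoenix codebase, and production retrieval would use an approximate nearest neighbor index rather than this exact scan:
import jax.numpy as jnp
from jax import lax

def retrieve_top_k(user_emb: jnp.ndarray, post_embs: jnp.ndarray, k: int = 1000):
    # user_emb: [B, D] user-tower output; post_embs: [N, D] candidate-tower output
    scores = user_emb @ post_embs.T             # [B, N] dot-product similarity
    top_scores, top_ids = lax.top_k(scores, k)  # [B, k] best candidates per user
    return top_scores, top_ids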
Stage 2: Ranking (Transformer)
Scores the retrieved candidates using a transformer that predicts multiple engagement actions.
# phoenix/recsys_model.py
from dataclasses import dataclass
from typing import NamedTuple

from jax.typing import ArrayLike

@dataclass
class PhoenixModelConfig:
    model: TransformerConfig                # Grok-1 based transformer (defined elsewhere in the repo)
    emb_size: int                           # Embedding dimension D
    num_actions: int                        # 18 action types
    history_seq_len: int = 128              # User history length
    candidate_seq_len: int = 32             # Candidates per batch
    product_surface_vocab_size: int = ...   # default elided in the source
class RecsysBatch(NamedTuple):
    # User identification
    user_hashes: ArrayLike               # [B, num_user_hashes]
    # User engagement history
    history_post_hashes: ArrayLike       # [B, S, num_item_hashes]
    history_author_hashes: ArrayLike     # [B, S, num_author_hashes]
    history_actions: ArrayLike           # [B, S, num_actions]
    history_product_surface: ArrayLike   # [B, S]
    # Candidates to score
    candidate_post_hashes: ArrayLike     # [B, C, num_item_hashes]
    candidate_author_hashes: ArrayLike   # [B, C, num_author_hashes]
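To make the shapes concrete, here is a hypothetical zero-filled batch (B=2 users, S=128 history items, C=32 candidates, two hashes per entity; all sizes assumed, and assuming the fields above are the complete set):
import jax.numpy as jnp

B, S, C = 2, 128, 32  # assumed batch size, history length, candidate count

batch = RecsysBatch(
    user_hashes=jnp.zeros((B, 2), dtype=jnp.int32),
    history_post_hashes=jnp.zeros((B, S, 2), dtype=jnp.int32),
    history_author_hashes=jnp.zeros((B, S, 2), dtype=jnp.int32),
    history_actions=jnp.zeros((B, S, 18), dtype=jnp.int32),  # multi-hot action vector
    history_product_surface=jnp.zeros((B, S), dtype=jnp.int32),
    candidate_post_hashes=jnp.zeros((B, C, 2), dtype=jnp.int32),
    candidate_author_hashes=jnp.zeros((B, C, 2), dtype=jnp.int32),
)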
Multiple hash functions map IDs to embedding tables:
@dataclass
class HashConfig:
    num_user_hashes: int = 2    # Hash user ID 2 ways
    num_item_hashes: int = 2    # Hash post ID 2 ways
    num_author_hashes: int = 2  # Hash author ID 2 ways
Why hashes? Hashing maps an unbounded, ever-changing ID space onto fixed-size embedding tables, and combining several independent hashes per ID keeps entities distinguishable even when any single hash collides.
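A rough sketch of the idea, where the hash family, table sizes, and names are all assumptions for illustration: each ID is hashed into several buckets, and each bucket indexes its own embedding table.
import jax
import jax.numpy as jnp

NUM_BUCKETS, D = 1 << 16, 64  # small sizes for the sketch; real tables are far larger

def multi_hash(ids: jnp.ndarray, num_hashes: int = 2) -> jnp.ndarray:
    # Toy hash family: a distinct odd multiplier per hash function.
    salts = jnp.array([2 * k + 3 for k in range(num_hashes)], dtype=jnp.uint32)
    return (ids[..., None].astype(jnp.uint32) * salts) % NUM_BUCKETS  # [..., num_hashes]

tables = jax.random.normal(jax.random.PRNGKey(0), (2, NUM_BUCKETS, D))  # one table per hash
buckets = multi_hash(jnp.array([12345, 67890]))                         # [2 ids, 2 hashes]
embs = jnp.stack([tables[h][buckets[:, h]] for h in range(2)], axis=1)  # [2, 2, D]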
Each entity type has a "reduce" function that combines hash embeddings:
# User: Concatenate hash embeddings → project to D
def block_user_reduce(...):
# [B, num_user_hashes, D] → [B, 1, num_user_hashes * D] → [B, 1, D]
user_embedding = user_embeddings.reshape((B, 1, num_user_hashes * D))
user_embedding = jnp.dot(user_embedding, proj_mat_1) # Project down
return user_embedding, user_padding_mask
# History: Combine post + author + actions + product_surface
def block_history_reduce(...):
    ...  # analogous: combines the four streams into a single [B, S, D] sequence
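A self-contained sketch of the user reduce above, with sizes and random initialization assumed purely for illustration:
import jax
import jax.numpy as jnp

B, NUM_USER_HASHES, D = 2, 2, 64  # assumed sizes

user_embeddings = jax.random.normal(jax.random.PRNGKey(0), (B, NUM_USER_HASHES, D))
proj_mat_1 = jax.random.normal(jax.random.PRNGKey(1), (NUM_USER_HASHES * D, D))

# [B, num_user_hashes, D] → [B, 1, num_user_hashes * D] → [B, 1, D]
user_embedding = user_embeddings.reshape((B, 1, NUM_USER_HASHES * D))
user_embedding = jnp.dot(user_embedding, proj_mat_1)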
Final input is concatenation of:
[User (1)] + [History (S)] + [Candidates (C)]
│ │ │
▼ ▼ ▼
[B, 1, D] [B, S, D] [B, C, D]
╲ │ ╱
╲ │ ╱
[B, 1+S+C, D]
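In JAX this is a single concatenation along the sequence axis (function and argument names assumed):
import jax.numpy as jnp

def combine_inputs(user_emb, history_emb, candidate_emb):
    # user_emb: [B, 1, D], history_emb: [B, S, D], candidate_emb: [B, C, D]
    return jnp.concatenate([user_emb, history_emb, candidate_emb], axis=1)  # [B, 1+S+C, D]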
Critical design: candidates cannot attend to each other; each candidate attends only to the user, the history, and itself.
ATTENTION MASK
Keys (what we attend TO)
─────────────────────────────────────────────▶
│ User │ History (S) │ Candidates (C) │
┌────┼──────┼───────────────────┼─────────────────────┤
Q │ U │ ✓ │ ✓ ✓ ✓ ✓ │ ✗ ✗ ✗ ✗ │
u ├────┼──────┼───────────────────┼─────────────────────┤
e │ H │ ✓ │ ✓ ✓ ✓ ✓ │ ✗ ✗ ✗ ✗ │
r │ i │ ✓ │ ✓ ✓ ✓ ✓ │ ✗ ✗ ✗ ✗ │
i │ s │ ✓ │ ✓ ✓ ✓ ✓ │ ✗ ✗ ✗ ✗ │
e │ t │ ✓ │ ✓ ✓ ✓ ✓ │ ✗ ✗ ✗ ✗ │
s ├────┼──────┼───────────────────┼─────────────────────┤
│ C │ ✓ │ ✓ ✓ ✓ ✓ │ ✓ ✗ ✗ ✗ │
│ │ a │ ✓ │ ✓ ✓ ✓ ✓ │ ✗ ✓ ✗ ✗ │
│ │ n │ ✓ │ ✓ ✓ ✓ ✓ │ ✗ ✗ ✓ ✗ │
▼ │ d │ ✓ │ ✓ ✓ ✓ ✓ │ ✗ ✗ ✗ ✓ │
└────┴──────┴───────────────────┴─────────────────────┘
✓ = Can attend ✗ = Cannot attend (diagonal only for candidates)
Why candidate isolation? A candidate's score must not depend on which other candidates happen to be scored in the same batch. Masking candidates off from each other makes every score a function of the user and history alone, so scores stay stable however the candidate set is batched.
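A minimal sketch of building such a mask in JAX; this just reproduces the pattern in the diagram, not the repo's actual masking code:
import jax.numpy as jnp

def candidate_isolation_mask(seq_len: int, candidate_start: int) -> jnp.ndarray:
    # True = query position (row) may attend to key position (column).
    mask = jnp.ones((seq_len, seq_len), dtype=bool)
    # Hide all candidate keys from every query...
    mask = mask.at[:, candidate_start:].set(False)
    # ...then let each candidate attend to itself (the diagonal).
    idx = jnp.arange(candidate_start, seq_len)
    return mask.at[idx, idx].set(True)

mask = candidate_isolation_mask(seq_len=1 + 4 + 4, candidate_start=5)  # matches the diagram above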
The forward pass ties these pieces together:
def __call__(self, batch, recsys_embeddings) -> RecsysModelOutput:
    # 1. Build combined embeddings
    embeddings, padding_mask, candidate_start = self.build_inputs(batch, recsys_embeddings)
    # 2. Pass through transformer (with candidate isolation mask)
    model_output = self.model(
        embeddings,
        padding_mask,
        candidate_start_offset=candidate_start,  # For attention masking
    )
    # 3. Extract candidate outputs (sketch: slice off the trailing candidate positions)
    candidate_output = model_output[:, candidate_start:, :]
    ...  # projection to [B, num_candidates, num_actions] omitted
Output Shape: [B, num_candidates, num_actions]
│
▼
┌─────────────────────────────────────────────┐
│ Like │ Reply │ Retweet │ Quote │ ... (18) │
└─────────────────────────────────────────────┘
Each output is a log-probability. Convert to probability:
probability = exp(log_prob)
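Concretely, over the whole output tensor (the sizes, the stand-in output, and the action index are assumptions for illustration):
import jax
import jax.numpy as jnp

B, C, A = 2, 32, 18  # assumed batch, candidate, and action counts
log_probs = -jax.random.uniform(jax.random.PRNGKey(0), (B, C, A))  # stand-in model output
probs = jnp.exp(log_probs)  # per-action engagement probabilities
p_like = probs[..., 0]      # e.g. the "like" prediction, assuming it sits at index 0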
History actions are encoded as signed vectors:
def _get_action_embeddings(self, actions):
    # actions: [B, S, num_actions] multi-hot vector
    actions_signed = 2 * actions - 1  # 0 → -1, 1 → +1
    action_emb = jnp.dot(actions_signed, action_projection)
    return action_emb
This encodes "did action" (+1) vs "didn't do action" (-1) for each action type.
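For example, with three hypothetical action slots:
import jax.numpy as jnp

actions = jnp.array([1, 0, 1])  # liked, didn't reply, retweeted
print(2 * actions - 1)          # [ 1 -1  1]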
Where the user engaged (home feed, search, notifications, etc.):
def _single_hot_to_embeddings(self, input, vocab_size, emb_size, name):
    # Standard embedding lookup table
    embedding_table = hk.get_parameter(
        name, [vocab_size, emb_size], init=hk.initializers.TruncatedNormal()
    )  # Haiku requires an initializer; TruncatedNormal is an assumption here
    input_one_hot = jax.nn.one_hot(input, vocab_size)
    return jnp.dot(input_one_hot, embedding_table)
The sample transformer implementation is ported from the Grok-1 open source release by xAI: the core architecture comes from Grok-1, adapted for recommendation use with custom input embeddings and the candidate-isolation attention mask.
/x-algo-engagement - The 18 action types the model predicts
/x-algo-scoring - How predictions become weighted scores
/x-algo-pipeline - Where ML fits in the full system