
Palo Memory Engines

Welcome to the detailed documentation for Mpalo's Palo AI Memory Engines. Each engine is designed to provide unique capabilities for integrating persistent, context-aware memory into your applications. Below, you'll find comprehensive information on Palo Mini, Palo Bloom, and the two Palo Research engines (palo-770 and palo-DEEP-R), including their features, API names, technical specifications, and operational modes.

Please note: The previously announced "Palo Large" model is currently postponed as we focus on refining and enhancing our current suite of offerings.

Operating Modes

Palo AI Memory Engines offer two distinct operational modes that you can select based on your application's needs. This choice defines how Palo stores and recalls information, giving you powerful control over your AI's behavior.

1. Personalization Mode

Focus: Adaptive, "humanlike" memory.

How it Works: This mode uses vector reconstruction to recall memories. This can result in "blurry" recall, where the core patterns and context are remembered, but the exact wording might shift, similar to human memory. It allows for creative connections and emergent behavior.

Best For: Conversational chatbots, personal assistants, and creative applications where a humanlike feel is more important than perfect factual recall.

Available on: Palo Mini, Palo Bloom, and both Palo Research engines (palo-770 and palo-DEEP-R).

2. Research Mode

Focus: 100% accurate, factual recall with zero hallucinations.

How it Works: This mode stores the original text as metadata alongside the vector. When recalling a memory, Palo retrieves this exact, unaltered metadata, bypassing reconstruction entirely. This guarantees that what you put in is exactly what you get out.

Best For: Enterprise knowledge bases, legal or medical Q&A bots, technical documentation search—any application where factual precision is non-negotiable.

Available on: Palo Research (palo-770) and Palo Research (palo-DEEP-R).
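
To make the mode choice concrete, here is a minimal Python sketch of selecting a mode when creating a memory space over the REST API. The endpoint URL, payload fields, and header names are assumptions for illustration (official SDKs are still planned); consult the actual Mpalo REST reference for the real contract.

```python
# Hypothetical sketch: selecting an operating mode at memory-space creation.
# The endpoint path, payload fields, and header names are assumptions.
import requests

API_KEY = "YOUR_MPALO_API_KEY"  # placeholder

resp = requests.post(
    "https://api.mpalo.example/v1/memory-spaces",  # hypothetical URL
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "engine": "palo-770",   # Research Mode requires palo-770 or palo-DEEP-R
        "mode": "research",     # or "personalization"
        "name": "compliance-kb",
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```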

Memory Engine Comparison

| Feature | Palo Mini | Palo Bloom | Palo Research (palo-770) | Palo Research (palo-DEEP-R) |
|---|---|---|---|---|
| API Name | palo-lite | palo | palo-770 | palo-DEEP-R |
| Primary Function | Fast contextual memory for LLMs | Balanced memory & performance | Advanced, deep memory for complex applications | Specialized memory for 100% accurate factual recall |
| Supported Context (per Input) | ~4096+ tokens | ~8192+ tokens | Significantly larger | Significantly larger |
| Key Memory Features | Episodic Recall, Semantic Search | Enhanced Recall & Search, Basic Relationship Linking | Memory Mapping, Advanced Accurate Recall, Traversal | All palo-770 features + specialized fine-tuning |
| Primary Use Cases | Simple chatbots, CLI tools, basic personalization | Personal assistants, mobile apps, educational tools | Enterprise knowledge bases, complex robotics, advanced support | Legal/medical Q&A, compliance checks, technical lookups |
| Operational Mode(s) | Personalization | Personalization | Personalization & Research | Personalization & Research |
| Performance | Exceptionally fast | Very fast | Fast, optimized for depth | Fast, optimized for accuracy |

Palo Mini

API Name: palo-lite

Palo Mini is an exceptionally fast and cost-effective AI memory engine designed to augment external LLMs. It provides essential episodic memory, enabling quick contextual recall for LLM-driven applications supporting interaction context windows of approximately 4096 tokens or more. Ideal for scenarios requiring rapid, memory-enhanced responses with minimal latency and resource usage. Operates primarily in "Personalization Mode."

Key Features & Specifications:

Function: AI Memory Engine for external LLMs.
LLM Interaction Context: ~4096+ tokens.
Multilingual Memory: Stores/retrieves text in various languages (optimized primarily for English) for processing by an external LLM.
Vision Memory: Stores references/metadata for images (e.g., URLs, identifiers). External LLM handles actual image processing.
Memory Management for LLM: Basic Episodic Recall (specific past interactions), simple semantic search over recent memories to provide context to the LLM.
Performance: Exceptionally Fast & Cost-Effective. Low latency.
Memory Capacity: Optimized for short-term context, recent interactions (e.g., single conversation, few dozen memory entries).
Primary Mode: Personalization.
Use Cases: Augmenting simple customer service LLM bots, FAQ retrieval with LLMs, CLI tool context for LLM interaction, basic in-app LLM user preference storage.
Code Integration: Accessible via REST API. SDKs (Python, JavaScript, Java) planned. See SDKs page.
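
As a rough illustration of the integration pattern described above, the following Python sketch stores an episodic memory and runs a semantic search with palo-lite over REST. All endpoint paths and response field names are assumptions, not the documented API.

```python
# Hypothetical sketch of storing and recalling an episodic memory with
# palo-lite over REST. Endpoint paths and field names are assumptions.
import requests

BASE = "https://api.mpalo.example/v1"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer YOUR_MPALO_API_KEY"}

# Store one interaction as an episodic memory entry.
requests.post(
    f"{BASE}/memories",
    headers=HEADERS,
    json={"engine": "palo-lite", "text": "User prefers dark mode."},
    timeout=30,
).raise_for_status()

# Semantic search over recent memories to build context for the external LLM.
hits = requests.post(
    f"{BASE}/memories/search",
    headers=HEADERS,
    json={"engine": "palo-lite", "query": "UI preferences", "top_k": 3},
    timeout=30,
).json()
context = "\n".join(h["text"] for h in hits["results"])  # pass to your LLM prompt
```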

Palo Bloom

API Name: palo

Palo Bloom is a versatile, exceptionally fast, and cost-effective AI memory engine that enhances external LLMs. It offers a balance of deeper memory capabilities and high performance, optimized for LLM applications on mobile/edge devices or those requiring robust memory for interaction context windows of approximately 8192 tokens or more. Operates primarily in "Personalization Mode."

Key Features & Specifications:

Function: AI Memory Engine for external LLMs.
LLM Interaction Context: ~8192+ tokens.
Multilingual Memory: Stores/retrieves text in a broader range of languages for processing by an external LLM. Good support for English & major European languages.
Vision Memory: Stores references/metadata for images up to 512px. External LLM handles actual image processing.
Memory Management for LLM: Enhanced Episodic Recall, improved Semantic Search over a larger memory span, basic relationship linking within stored memories to provide rich context to the LLM.
Performance: Exceptionally Fast & Cost-Effective. Low to medium latency.
Memory Capacity: Suitable for medium-term context, user preferences over multiple sessions (few hundred memory entries).
Primary Mode: Personalization.
Use Cases: Augmenting LLM-powered personal assistants, mobile LLM applications, smarter IoT device interactions with LLMs, context-aware educational tools using LLMs.
Code Integration: Accessible via REST API. SDKs (Python, JavaScript, Java) planned. See SDKs page.
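
The sketch below illustrates how Bloom's basic relationship linking might look over REST: two memories are stored and then linked so that recalling one can surface the other. The endpoints, response fields, and `relation` vocabulary are hypothetical.

```python
# Hypothetical sketch of Palo Bloom's basic relationship linking.
# All endpoint paths and field names are assumptions.
import requests

BASE = "https://api.mpalo.example/v1"
HEADERS = {"Authorization": "Bearer YOUR_MPALO_API_KEY"}

def store(text: str) -> str:
    """Store a memory entry and return its (assumed) server-assigned id."""
    r = requests.post(f"{BASE}/memories", headers=HEADERS,
                      json={"engine": "palo", "text": text}, timeout=30)
    r.raise_for_status()
    return r.json()["id"]

trip = store("User booked a trip to Lisbon in May.")
pref = store("User prefers window seats.")

# Link the two entries so recall of one can surface the other.
requests.post(f"{BASE}/memories/links", headers=HEADERS,
              json={"from": pref, "to": trip, "relation": "applies_to"},
              timeout=30).raise_for_status()
```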

Palo Research (palo-770)

API Name: palo-770

Palo Research is an advanced AI memory engine for external LLMs. Engineered for complex applications, it provides highly reliable and profound memory recall. It integrates sophisticated features like Memory Mapping and Memory Traversal for comprehensive semantic network building, ensuring nuanced and dependable context for the external LLM.

Key Features & Specifications:

Function: Advanced AI Memory Engine for external LLMs.
LLM Interaction Context: Significantly larger than Palo Bloom.
Memory Management for LLM: Memory Mapping (interconnected knowledge graphs), Memory Traversal (navigating complex event chains), and options for either adaptive or 100% accurate recall.
Primary Modes: Personalization & Research Mode (User Selectable).
Use Cases: Augmenting enterprise knowledge management LLMs, advanced research assistants, sophisticated robotics requiring long-term learning, complex customer support with deep history.
Multilingual Memory: Extensive multilingual text storage/retrieval for external LLM processing, with strong cross-lingual memory linking capabilities.
Vision Memory: Stores references/metadata for images (e.g., for images >=1024px as reference) and supports semantic links between visual and textual memories. External LLM handles actual image processing.
Performance: Exceptionally Fast & Cost-Effective, with latency optimized for depth and accuracy of recall.
Memory Capacity: Designed for extensive knowledge bases, long-term user/system memory (thousands to millions of entries).
Code Integration: Accessible via REST API. SDKs (Python, JavaScript, Java) planned. Supports complex data structures. See SDKs page.
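
As a hypothetical sketch of the Memory Mapping and Traversal features, the following request seeds a traversal from a semantic query and walks up to three relationship hops, with Research Mode selected for exact recall. The endpoint, parameters, and response shape are illustrative assumptions.

```python
# Hypothetical sketch of a Memory Traversal query on palo-770: starting from
# a seed memory and walking linked events. Paths and fields are assumptions.
import requests

BASE = "https://api.mpalo.example/v1"
HEADERS = {"Authorization": "Bearer YOUR_MPALO_API_KEY"}

resp = requests.post(
    f"{BASE}/memories/traverse",
    headers=HEADERS,
    json={
        "engine": "palo-770",
        "start_query": "onboarding incident last quarter",  # seeds the traversal semantically
        "max_hops": 3,              # follow up to 3 relationship hops in the memory map
        "mode": "research",         # exact metadata recall, no vector reconstruction
    },
    timeout=30,
)
resp.raise_for_status()
for node in resp.json()["path"]:    # assumed response field
    print(node["text"])
```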

Palo Research (palo-DEEP-R)

API Name: palo-DEEP-R

Palo Research is our premier memory engine, specialized for applications where 100% factual accuracy and verifiable recall are non-negotiable. It leverages the full power of the 770 engine and is fine-tuned for understanding and retrieving information from dense, specialized documents. It is the definitive choice for building mission-critical AI systems.

Key Features & Specifications:

Function: Specialized AI Memory Engine for high-fidelity, accurate recall with external LLMs.
Memory Management for LLM: All features of Palo Research (palo-770), with a strong emphasis on providing verifiable, citation-ready context to the LLM, especially when used in Research Mode.
Primary Modes: Personalization & Research Mode (User Selectable).
Use Cases: Legal document analysis and e-discovery, medical record summarization and research, financial compliance checks, academic paper review, and building enterprise-grade "expert" assistants.
LLM Interaction Context: Same as Palo Research (palo-770).
Multilingual Memory: Same as Palo Research (palo-770), including support for evaluating multilingual memory retrieval.
Vision Memory: Same as Palo Research (palo-770), including support for evaluating the memory system with visual references. External LLM handles image processing.
Performance: Optimized for evaluation throughput and detailed logging rather than interactive latency. Exceptionally fast & cost-effective for its purpose.
Memory Capacity: Suitable for very large evaluation datasets.
Code Integration: Accessible via REST API, often used with data analysis and LLM evaluation tools (Python predominant).
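
A hedged sketch of what citation-ready Research Mode recall on palo-DEEP-R could look like: the search returns the verbatim stored text plus source metadata for the LLM to cite. The `include_citations` flag and the response fields are assumptions for illustration.

```python
# Hypothetical sketch: Research Mode recall on palo-DEEP-R returning exact
# stored text plus citation metadata. Field names are assumptions.
import requests

resp = requests.post(
    "https://api.mpalo.example/v1/memories/search",  # hypothetical URL
    headers={"Authorization": "Bearer YOUR_MPALO_API_KEY"},
    json={
        "engine": "palo-DEEP-R",
        "mode": "research",          # bypasses vector reconstruction entirely
        "query": "statute of limitations for contract disputes",
        "top_k": 5,
        "include_citations": True,   # assumed flag for citation-ready context
    },
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json()["results"]:
    print(hit["text"], "--", hit["source"])  # verbatim text + source reference
```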

Comprehensive Pricing Page Analysis: Mpalo Engine Pricing & Features

An expert-level assessment synthesizing current market intelligence, competitive analysis, and strategic positioning recommendations across business economics, technical clarity, value proposition, and gaps/opportunities.

1. Business Economics & Pricing Structure Validation

Pricing Model Architecture: Sound and Competitive

Mpalo's blended input/output rate approach is mathematically defensible and market-competitive. The architecture across tiers is internally consistent:

| Engine | Blended Rate | Input/Output Split | Assessment |
|---|---|---|---|
| Mini | $0.30/1M | 54/46 | Aggressively priced for experimentation |
| Palo Bloom | $0.90/1M | 44/56 | Mid-market sweet spot |
| 770 | $2.10/1M | 46/54 | Enterprise reasoning tier |
| DEEP-R | $2.90/1M | 45/55 | Flagship research-grade |

An output share of 46–56% of the blended rate is realistic for LLM inference, where generation is 2–5x more computationally expensive per token than processing input.
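
For readers unfamiliar with blended rates, the minimal Python sketch below shows the standard computation: a weighted average of per-direction prices by the expected token mix. The per-direction prices used are illustrative, not Mpalo's actual rates.

```python
# Minimal sketch of how a blended per-token rate is typically computed:
# a weighted average of input and output prices by the expected token mix.
# The prices below are illustrative, not Mpalo's actual per-direction rates.
def blended_rate(p_in: float, p_out: float, frac_in: float) -> float:
    """USD per 1M tokens, given per-direction prices and the input token fraction."""
    return frac_in * p_in + (1.0 - frac_in) * p_out

# Example: if output tokens cost 3x input tokens (within the 2-5x range cited
# above) and 60% of tokens are input, the blended rate lands between the two.
print(f"{blended_rate(p_in=0.5, p_out=1.5, frac_in=0.6):.2f}")  # -> 0.90
```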

Critical Pricing Comparison Issue: GPT Baseline Considerations

Current Competitive Reality (December 2025):

  • Gemini 2.5 Flash undercuts Palo Bloom on price ($0.375/1M vs. $0.90/1M) and offers 1M token context windows without memory overhead.
  • GPT-4o mini ($0.375/1M) is cheaper than Palo Bloom if memory features aren't needed.
  • Claude 3.5 Sonnet ($11/1M) is significantly more expensive—Mpalo's real competitive set isn't GPT-4o or Claude, it's Gemini 2.5 Flash and GPT-4o mini on cost, with differentiation on memory architecture.

💡 Recommendation:

Update comparisons to acknowledge Gemini's price advantage and position Mpalo's memory as the differentiator, not price alone.

Memory Feature Pricing: Well-Calibrated

Memory costs—60% of blended rate for Traversal, 80% for Mapping—are substantially cheaper than building equivalent infrastructure separately:

| Approach | Cost (10M docs) | Latency | Ownership |
|---|---|---|---|
| Mpalo Memory Traversal | ~$540/month (at Bloom rates) | Integrated | Managed |
| OpenAI Embeddings + Pinecone | $12,000–$42,000/year (storage only) | API roundtrip | DIY |
| Weaviate/Milvus self-hosted | ~$5,000/year infra + engineering time | Milliseconds | Operational burden |

The bundled approach (memory as inference cost, not storage cost) is architecturally superior and financially efficient.

💡 Suggested Addition:

"Traditional vector DB infrastructure for this memory capacity would cost $X–$Y annually; Mpalo memory features cost proportional to usage."

2. Technical Clarity & Feature Specifications Assessment

Strengths: Image Processing Specifications (Best-in-Class)

Image specs are transparent, comprehensive, and competitive:

| Tier | Max Size | Formats | Per-Call Limit | Cost |
|---|---|---|---|---|
| Mini | 5 MB | JPG, PNG | 1 | $0.0005 |
| Bloom | 10 MB | JPG, PNG, GIF | 3 | $0.001 |
| 770 | 50 MB | JPG, PNG, GIF, WebP, TIFF, BMP | 10 | $0.003 |
| DEEP-R | 100 MB | All + RAW | 25 | $0.015 |

This granularity matches Claude 3.5 and exceeds Gemini 2.5 on transparency. The feature limitations (non-Latin text issues, rotation problems, color-dependency) build credibility through honesty.
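
One practical way to consume this table is to encode it as data and validate uploads client-side before spending an API call. The helper below is an illustration only (not part of any Mpalo SDK), and it reads the DEEP-R "All + RAW" entry as the 770 format set plus RAW.

```python
# Illustrative helper encoding the image-spec table above as data, so an app
# can reject invalid uploads before calling the API. Limits mirror the table.
IMAGE_SPECS = {
    "palo-lite":   {"max_mb": 5,   "formats": {"jpg", "png"}, "per_call": 1},
    "palo":        {"max_mb": 10,  "formats": {"jpg", "png", "gif"}, "per_call": 3},
    "palo-770":    {"max_mb": 50,  "formats": {"jpg", "png", "gif", "webp", "tiff", "bmp"}, "per_call": 10},
    "palo-DEEP-R": {"max_mb": 100, "formats": {"jpg", "png", "gif", "webp", "tiff", "bmp", "raw"}, "per_call": 25},
}

def check_upload(engine: str, size_mb: float, fmt: str, count: int) -> None:
    """Raise ValueError if an upload would exceed the tier's documented limits."""
    spec = IMAGE_SPECS[engine]
    if size_mb > spec["max_mb"]:
        raise ValueError(f"{engine}: image exceeds {spec['max_mb']} MB limit")
    if fmt.lower() not in spec["formats"]:
        raise ValueError(f"{engine}: format {fmt!r} not supported")
    if count > spec["per_call"]:
        raise ValueError(f"{engine}: max {spec['per_call']} images per call")

check_upload("palo", size_mb=4.2, fmt="png", count=2)  # passes silently
```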

Critical Gap: Audio Processing

Status: "Coming Soon" across all fields. This creates ambiguity about timeline and pricing expectations.

Market context: Google Gemini launched audio input/output at $0.30–$1.00 input, $2.00–$12.00 output per 1M tokens; OpenAI audio runs $40/1M input, $80/1M output.

💡 Recommendation:

Either (a) remove audio specs entirely (signals incompleteness less negatively), or (b) add a timeline and pricing estimate (e.g., "Q2 2026 Beta; expected pricing: $1.50–$3.00/1M input, $6.00–$10.00/1M output").

Confusing Positioning: "Palo Output" Abstraction

The statement "Palo Output is a concise, abstract summary of the input, which significantly reduces your output token costs" is potentially misleading:

  • Wording suggests output reduction when it means input summarization/compression
  • No quantification or example of the abstraction provided

💡 Recommendation: Rename to "Smart Context Compression"

  • Example: 50K token raw context → 15K token compressed
  • Typical savings: 60–75% for structured data, 40–50% for unstructured
  • Code/research data processed fully; HTML/JSON abstracted intelligently
  • Result: 770 and DEEP-R process pre-compressed context at 40–60% of the cost of raw input
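
The arithmetic behind that example, priced at the 770 blended rate from the table above, works out as follows (a sketch, assuming the compression step itself carries no extra charge):

```python
# Quick arithmetic behind the compression example above: a 50K-token raw
# context compressed to 15K tokens, priced at the 770 blended rate.
raw_tokens, compressed_tokens = 50_000, 15_000
rate_770 = 2.1 / 1_000_000  # USD per token

raw_cost = raw_tokens * rate_770
compressed_cost = compressed_tokens * rate_770
print(f"raw ${raw_cost:.4f} vs compressed ${compressed_cost:.4f} "
      f"({1 - compressed_tokens / raw_tokens:.0%} fewer tokens)")
# -> raw $0.1050 vs compressed $0.0315 (70% fewer tokens)
```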

3. Value Proposition Analysis: "Memory That Pays for Itself"

The Thesis Is Sound, Evidence Is Weak

The core argument—persistent memory reduces redundant re-prompting, lowering total cost—is backed by market research:

  • Enterprises with AI memory systems show 3x higher user adoption and 2.5x better task completion accuracy
  • AI agents with long-term memory show 78% improvement in complex task execution
  • Memory-optimized systems reduce API spend by 30–60% through fewer redundant context passes

But the pricing page provides zero quantification of this benefit.

Missing: Total Cost of Ownership Comparisons

At face value, Palo Bloom ($0.9/1M) is 2.4x more expensive than GPT-4o mini ($0.375/1M). The pricing page doesn't explain when Bloom's memory features justify this premium.

💡 Suggested Narrative:

Example: Customer runs a personalized recommendation engine with 100K daily users, each averaging 5 turns per session. Without memory, every turn re-inputs 10K tokens of user history. With Mpalo's Memory Traversal, only new queries are input (1K tokens); history is accessed via memory search (~0.5K traversal tokens).

Cost without memory: 100K users × 5 turns × 10K tokens = 5B tokens/day = $1,875/day (GPT-4o mini)

Cost with Mpalo: 100K users × 5 turns × 1.5K tokens = 750M tokens/day = $675/day (Bloom)

Savings: $1,200/day = $36K/month
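
The same numbers in runnable form, so readers can substitute their own traffic. One simplification to note: traversal tokens are billed here at the full Bloom blended rate rather than the discounted 60% memory rate described earlier, which makes the savings estimate conservative.

```python
# The arithmetic behind the narrative above, so readers can vary the inputs.
# Rates: GPT-4o mini $0.375/1M (no memory), Palo Bloom $0.90/1M (with memory).
users, turns = 100_000, 5

# Without memory: every turn re-inputs ~10K tokens of user history.
tokens_without = users * turns * 10_000       # 5B tokens/day
cost_without = tokens_without / 1e6 * 0.375   # -> $1,875/day

# With memory: ~1K new query + ~0.5K traversal tokens per turn.
tokens_with = users * turns * 1_500           # 750M tokens/day
cost_with = tokens_with / 1e6 * 0.90          # -> $675/day

print(f"${cost_without:,.0f}/day vs ${cost_with:,.0f}/day; "
      f"saves ${(cost_without - cost_with) * 30:,.0f}/month")
# -> $1,875/day vs $675/day; saves $36,000/month
```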

Missing: Competitive Differentiation on Memory

The page positions Mpalo against GPT-4o and Claude generically. What it doesn't highlight is architectural differentiation:

| Aspect | Mpalo | GPT-4o | Claude 3.5 | Gemini 2.5 |
|---|---|---|---|---|
| Memory type | Persistent (episodic + temporal) | Context window only | Context window only | Context window only |
| Context limit | Unlimited (via memory) | 128K | 200K | 1M |
| Recall pattern | Semantic search + temporal ordering | Sequential | Sequential | Sequential |
| Cost model | Proportional to usage | Per-token input | Per-token input | Per-token input |
| Use case advantage | Multi-session personalization, long-term reasoning | Single long conversation | Single long conversation | Long documents in one session |

True market position: You're not competing on base LLM cost; you're competing on memory architecture for multi-session, episodic applications. The pricing page should make this explicit.

4. Critical Gaps & Missed Opportunities

A. Pricing Gaps

1. Addon Pricing Missing

Custom Connections, Secure Tunnel, Private Data Spaces mentioned but no pricing. Competitors include these in standard tiers or charge $10–50/month. Add a pricing table or clarify if included in Business/Enterprise plans.

2. Storage Costs Not Quantified

"Memory Storage costs depend on your chosen Provider (BYOVS model)" — unclear to customers. Add context: "You choose your vector storage provider (Pinecone, Weaviate, Milvus). Storage costs typically $0.10–$0.50/GB-month. A 10M-document knowledge base (100 GB) costs $10K–$50K/year storage."

3. Rate Limits and Burstability Undefined

Architect plan: "120M tokens/month" — is this a hard cap or soft limit? Recommendation: "Hard cap; overages auto-billed at $X/1M tokens, or auto-upgrade to Business plan ($35/user/mo)."

4. Latency SLAs and Performance Specs Missing

What's the p50/p99 latency for memory traversal? Concurrent request limits? Failover guarantees? Add a Performance tier table.

B. Feature Definition Gaps

1. Memory Traversal vs. Mapping Trade-offs Unclear

Suggested guidance:

Traversal: 60% cost, Episodic recall only. Best for: chatbots, FAQs, single-session personalization (low cost). Latency: ~50ms.

Mapping: 80% cost, Episodic + Temporal recall. Best for: multi-week reasoning, compliance auditing, long-term user profiling (higher accuracy). Latency: ~200ms.

Add a decision matrix or recommendation engine. Customers will default to the cheaper option without guidance.
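
A toy version of such a recommendation engine, with thresholds taken from the latency and cost figures suggested above; the decision logic is an illustration, not a product feature.

```python
# Illustrative decision helper for the suggested matrix above: picks Traversal
# or Mapping from the needs described in this section. Thresholds are assumptions.
def recommend_memory_feature(needs_temporal_recall: bool,
                             multi_session: bool,
                             latency_budget_ms: int) -> str:
    """Return 'mapping' or 'traversal' per the trade-offs listed above."""
    if needs_temporal_recall or multi_session:
        if latency_budget_ms < 200:
            return "traversal (mapping preferred, but ~200ms exceeds your latency budget)"
        return "mapping"    # 80% cost, episodic + temporal recall, ~200ms
    return "traversal"      # 60% cost, episodic recall only, ~50ms

print(recommend_memory_feature(needs_temporal_recall=False,
                               multi_session=False,
                               latency_budget_ms=100))  # -> traversal
```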

C. Positioning Gaps

1. No Enterprise Tier Details

Menu shows "Business ($35/user/mo)" but no Enterprise tier specs. What's included? Minimum seat count? SLA terms? Dedicated support?

2. No Adoption Path or Use Case Guidance

Suggested guidance:

Mini: Experimentation, non-critical features ($0.3/1M)

Bloom: Production apps, memory-enabled (recommended, $0.9/1M)

770: Complex reasoning, research, high accuracy ($2.1/1M)

DEEP-R: Cutting-edge research, publishing ($2.9/1M)

Synthesis: Scorecard & Recommendations

| Dimension | Rating | Status |
|---|---|---|
| Pricing Structure | 8/10 | Economically sound; comparison outdated |
| Technical Specs | 7/10 | Excellent vision specs; audio incomplete; "Palo Output" confusing |
| Value Messaging | 6/10 | Strong thesis, weak on customer quantification |
| Completeness | 5/10 | Missing addon pricing, storage, SLAs, enterprise tier |
| Clarity | 6.5/10 | Good granularity; poor guidance and contextualization |
| OVERALL | 6.5/10 | Technically sound; needs positioning and marketing refinement |

Top 3 Quick Wins

1. Update Competitive Comparisons

Acknowledge Gemini 2.5 Flash's price advantage; position memory as the differentiator, not cost.

2. Add TCO Calculator

Show "Without memory: $X/month; with Mpalo memory: $Y/month; savings by use case."

3. Clarify Memory Trade-offs

Add decision matrix (Traversal vs. Mapping); include latency/cost trade-offs; provide recommendation logic.

Strategic Priority: Positioning for Product-Market Fit

Mpalo's true market advantage isn't price—it's persistent, episodic memory baked into inference, which is architecturally different from RAG or context windows. The pricing page should emphasize this as the reason to choose Mpalo, not the price alone. This repositioning will reduce customer acquisition friction and justify the Bloom tier's premium over Gemini.

Sources

Based on current 2025 LLM pricing, vector database cost analyses, and enterprise AI memory adoption studies. References include pricepertoken.com, OpenAI platform pricing, Anthropic documentation, Google AI pricing, and industry research reports.

Keep in Mind

All engines include Personalization Mode, which offers humanlike blurry memory and forgetting; the two Palo Research engines additionally offer Research Mode, which aims to enhance accuracy, knowledge breadth, and depth while ensuring that important details are not forgotten.

Our Mission, in short

At Mpalo, we stand against profit-over-people capitalism. The majority of profit is reinvested into research to ensure our technology remains consumer-friendly and transparent. We deliver modular, humanlike memory solutions that safeguard user data, prevent bias, and foster long-term, reliable storage of experiences.

Our commitment is to create technology that serves businesses, developers, and consumers alike—building trust, enhancing engagement, and igniting nostalgia through memory-driven AI that truly resonates.