Palo Memory Engines
Welcome to the detailed documentation for Mpalo's Palo AI Memory Engines. Each engine is designed to provide unique capabilities for integrating persistent, context-aware memory into your applications. Below, you'll find comprehensive information on Palo Mini, Palo Bloom, Palo 770, and Palo Research, including their features, API names, technical specifications, and operational modes.
Please note: The previously announced "Palo Large" model is currently postponed as we focus on refining and enhancing our current suite of offerings.
Operating Modes
Palo AI Memory Engines offer two distinct operational modes that you can select based on your application's needs. This choice defines how Palo stores and recalls information, giving you powerful control over your AI's behavior.
1. Personalization Mode
Focus: Adaptive, "humanlike" memory.
How it Works: This mode uses vector reconstruction to recall memories. This can result in "blurry" recall, where the core patterns and context are remembered, but the exact wording might shift, similar to human memory. It allows for creative connections and emergent behavior.
Best For: Conversational chatbots, personal assistants, and creative applications where a humanlike feel is more important than perfect factual recall.
Available on: Palo Mini, Palo Bloom, Palo 770, and Palo Research.
2. Research Mode
Focus: 100% accurate, factual recall with zero hallucinations.
How it Works: This mode stores the original text as metadata alongside the vector. When recalling a memory, Palo retrieves this exact, unaltered metadata, bypassing reconstruction entirely. This guarantees that what you put in is exactly what you get out.
Best For: Enterprise knowledge bases, legal or medical Q&A bots, technical documentation search—any application where factual precision is non-negotiable.
Available on: Palo 770 and Palo Research.
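To make the two modes concrete, here is a minimal client-side sketch. Everything in it is hypothetical: the endpoint URL, request fields (`engine`, `mode`), and helper names are illustrative assumptions, not a documented Mpalo API.

```python
import requests

API_BASE = "https://api.mpalo.example/v1"  # hypothetical endpoint, for illustration only
API_KEY = "YOUR_API_KEY"


def store_memory(text: str, engine: str = "palo-770") -> dict:
    """Store a memory. In Research Mode the engine would also keep the
    original text as metadata alongside the vector (hypothetical API)."""
    resp = requests.post(
        f"{API_BASE}/memories",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"engine": engine, "text": text},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()


def recall(query: str, engine: str = "palo-770", mode: str = "research") -> dict:
    """Recall a memory. mode="personalization" reconstructs from vectors
    (blurry recall); mode="research" returns the stored text verbatim."""
    resp = requests.post(
        f"{API_BASE}/recall",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"engine": engine, "query": query, "mode": mode},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()


store_memory("The contract renewal deadline is 2025-03-31.")
print(recall("When is the contract renewal deadline?", mode="research"))
```

In this sketch the mode is a per-request choice; whether Mpalo exposes it per request or per engine configuration is not specified here.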
Featured AI Memory Engines at a Glance
All Palo engines are exceptionally fast and cost-effective, designed to provide a powerful memory layer for external LLMs.
Palo Mini
Ultra-fast AI memory engine for LLMs, enabling quick contextual recall for interactions supporting ~4096+ token context windows.
Learn more »
Palo Bloom
Versatile AI memory engine for LLMs, offering a balance of deeper memory and performance for interactions supporting ~8192+ token context windows.
Learn more »
Palo 770
Advanced AI memory engine for LLMs, providing highly reliable, accurate, and deep recall for complex applications with significantly larger interaction contexts.
Learn more »
Palo Research
Specialized for 100% accurate, factual recall. Ideal for enterprise knowledge bases, legal/medical docs, and applications where accuracy is non-negotiable.
Learn more »
Memory Engine Comparison
| Feature | Palo Mini | Palo Bloom | Palo 770 | Palo Research |
|---|---|---|---|---|
| API Name | palo-lite | palo | palo-770 | palo-DEEP-R |
| Primary Function | Fast contextual memory for LLMs | Balanced memory & performance | Advanced, deep memory for complex applications | Specialized memory for 100% accurate factual recall |
| Supported Context (per Input) | ~4096+ tokens | ~8192+ tokens | Significantly Larger | Significantly Larger |
| Key Memory Features | Episodic Recall, Semantic Search | Enhanced Recall & Search, Basic Relationship Linking | Memory Mapping, Advanced Accurate Recall, Traversal | All 770 Features + Specialized Fine-tuning |
| Primary Use Cases | Simple chatbots, CLI tools, basic personalization. | Personal assistants, mobile apps, educational tools. | Enterprise knowledge bases, complex robotics, advanced support. | Legal/Medical Q&A, compliance checks, technical lookups. |
| Operational Mode(s) | Personalization | Personalization | Personalization & Research | Personalization & Research |
| Performance | Exceptionally Fast | Very Fast | Fast, optimized for depth | Fast, optimized for accuracy |
| Learn More | Details » | Details » | Details » | Details » |
Palo Mini
API Name: palo-lite
Palo Mini is an exceptionally fast and cost-effective AI memory engine designed to augment external LLMs. It provides essential episodic memory, enabling quick contextual recall for LLM-driven applications supporting interaction context windows of approximately 4096 tokens or more. Ideal for scenarios requiring rapid, memory-enhanced responses with minimal latency and resource usage. Operates primarily in "Personalization Mode."
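As a mental model for Palo Mini's two headline features, episodic recall and semantic search, here is a toy, self-contained sketch. It is not Mpalo's implementation: a real engine would use learned embeddings and an optimized index rather than bag-of-words counts.

```python
import math
from collections import Counter
from datetime import datetime, timezone

# Toy model: each memory is a timestamped "episode"; recall ranks episodes by
# cosine similarity over bag-of-words vectors (a stand-in for real embeddings).
episodes = []  # list of (timestamp, text, vector)


def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def remember(text: str) -> None:
    episodes.append((datetime.now(timezone.utc), text, vectorize(text)))


def recall(query: str, k: int = 3) -> list:
    qv = vectorize(query)
    ranked = sorted(episodes, key=lambda e: cosine(qv, e[2]), reverse=True)
    return [(ts.isoformat(), text) for ts, text, _ in ranked[:k]]


remember("User prefers dark mode and a compact layout.")
remember("User asked about exporting reports to CSV.")
print(recall("what layout does the user like?"))
```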
Key Features & Specifications:
Palo Bloom
API Name: palo
Palo Bloom is a versatile, exceptionally fast, and cost-effective AI memory engine that enhances external LLMs. It offers a balance of deeper memory capabilities and high performance, optimized for LLM applications on mobile/edge devices or those requiring robust memory for interaction context windows of approximately 8192 tokens or more. Operates primarily in "Personalization Mode."
Key Features & Specifications:
Palo 770
API Name: palo-770
Palo 770 is an advanced AI memory engine for external LLMs. Engineered for complex applications, it provides highly reliable and profound memory recall. It integrates sophisticated features like Memory Mapping and Memory Traversal for comprehensive semantic network building, ensuring nuanced and dependable context for the external LLM.
Key Features & Specifications:
Palo Research
API Name: palo-DEEP-R
Palo Research is our premier memory engine, specialized for applications where 100% factual accuracy and verifiable recall are non-negotiable. It leverages the full power of the Palo 770 engine and is fine-tuned for understanding and retrieving information from dense, specialized documents. It is the definitive choice for building mission-critical AI systems.
Key Features & Specifications:
Comprehensive Pricing Page Analysis: Mpalo Engine Pricing & Features
An expert-level assessment synthesizing current market intelligence, competitive analysis, and strategic positioning recommendations across business economics, technical clarity, value proposition, and gaps/opportunities.
1. Business Economics & Pricing Structure Validation
Pricing Model Architecture: Sound and Competitive
Mpalo's blended input/output rate approach is mathematically defensible and market-competitive. The architecture across tiers is internally consistent:
| Engine | Blended Rate | Input/Output Split | Assessment |
|---|---|---|---|
| Mini | $0.3/1M | 54/46 | Aggressively priced for experimentation |
| Bloom | $0.9/1M | 44/56 | Mid-market sweet spot |
| 770 | $2.1/1M | 46/54 | Enterprise reasoning tier |
| DEEP-R | $2.9/1M | 45/55 | Flagship research-grade |
The output share of blended cost (46–56% across tiers) is realistic for LLM inference, where generating tokens is 2–5x more computationally expensive than processing input tokens.
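To make the billing model concrete, here is a minimal cost estimator built on the blended rates in the table above. It assumes charges are simply blended rate × total tokens (input plus output); the page does not spell out the actual metering formula.

```python
# Blended rates in $ per 1M tokens, taken from the pricing table above.
BLENDED_RATE = {"mini": 0.3, "bloom": 0.9, "770": 2.1, "deep-r": 2.9}


def estimate_cost(engine: str, total_tokens: int) -> float:
    """Estimated charge, assuming cost = blended rate * (input + output tokens)."""
    return BLENDED_RATE[engine] * total_tokens / 1_000_000


# A 10K-token interaction on each tier:
for engine, rate in BLENDED_RATE.items():
    print(f"{engine}: ${estimate_cost(engine, 10_000):.4f} at ${rate}/1M")
```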
Critical Pricing Comparison Issue: GPT Baseline Considerations
Current Competitive Reality (December 2025):
- Gemini 2.5 Flash undercuts Palo Bloom on price ($0.375/1M vs. $0.9/1M) and offers 1M-token context windows without memory overhead.
- GPT-4o mini ($0.375/1M) is cheaper than Palo Bloom if memory features aren't needed.
- Claude 3.5 Sonnet ($11/1M) is significantly more expensive—Mpalo's real competitive set isn't GPT-4o or Claude, it's Gemini 2.5 Flash and GPT-4o mini on cost, with differentiation on memory architecture.
💡 Recommendation:
Update comparisons to acknowledge Gemini's price advantage and position Mpalo's memory as the differentiator, not price alone.
Memory Feature Pricing: Well-Calibrated
Memory costs—60% of blended rate for Traversal, 80% for Mapping—are substantially cheaper than building equivalent infrastructure separately:
| Approach | Cost (10M docs) | Latency | Ownership |
|---|---|---|---|
| Mpalo Memory Traversal | ~$540/month (at Bloom rates) | Integrated | Managed |
| OpenAI Embeddings + Pinecone | $12,000–$42,000/year (storage only) | API roundtrip | DIY |
| Weaviate/Milvus self-hosted | ~$5,000/year infra + eng time | Milliseconds | Operational burden |
The bundled approach (memory as inference cost, not storage cost) is architecturally superior and financially efficient.
💡 Suggested Addition:
"Traditional vector DB infrastructure for this memory capacity would cost $X–$Y annually; Mpalo memory features cost proportional to usage."
2. Technical Clarity & Feature Specifications Assessment
Strengths: Image Processing Specifications (Best-in-Class)
Image specs are transparent, comprehensive, and competitive:
| Tier | Max Size | Formats | Per-Call Limit | Cost |
|---|---|---|---|---|
| Mini | 5 MB | JPG, PNG | 1 | $0.0005 |
| Bloom | 10 MB | JPG, PNG, GIF | 3 | $0.001 |
| 770 | 50 MB | JPG, PNG, GIF, WebP, TIFF, BMP | 10 | $0.003 |
| DEEP-R | 100 MB | All + RAW | 25 | $0.015 |
This granularity matches Claude 3.5 and exceeds Gemini 2.5 on transparency. The openly documented limitations (non-Latin text issues, rotation problems, color dependency) build credibility through honesty.
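For example, assuming the Cost column above is per image (the table does not say, so treat this reading as an assumption), per-call image costs can be estimated as follows:

```python
# Per-image cost ($) and per-call image limits from the tier table above.
IMAGE_COST = {"mini": 0.0005, "bloom": 0.001, "770": 0.003, "deep-r": 0.015}
PER_CALL_LIMIT = {"mini": 1, "bloom": 3, "770": 10, "deep-r": 25}


def image_call_cost(tier: str, n_images: int) -> float:
    """Cost of attaching n_images to one call, enforcing the per-call limit."""
    if n_images > PER_CALL_LIMIT[tier]:
        raise ValueError(f"{tier} allows at most {PER_CALL_LIMIT[tier]} images per call")
    return n_images * IMAGE_COST[tier]


print(image_call_cost("770", 10))     # 0.03
print(image_call_cost("deep-r", 25))  # 0.375
```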
Critical Gap: Audio Processing
Status: "Coming Soon" across all fields. This creates ambiguity about timeline and pricing expectations.
Market context: Google Gemini launched audio input/output at $0.30–$1.00 input, $2.00–$12.00 output per 1M tokens; OpenAI audio runs $40/1M input, $80/1M output.
💡 Recommendation:
Either (a) remove audio specs entirely (signals incompleteness less negatively), or (b) add a timeline and pricing estimate (e.g., "Q2 2026 Beta; expected pricing: $1.50–$3.00/1M input, $6.00–$10.00/1M output").
Confusing Positioning: "Palo Output" Abstraction
The statement "Palo Output is a concise, abstract summary of the input, which significantly reduces your output token costs" is potentially misleading:
- Wording suggests output reduction when it means input summarization/compression
- No quantification or example of the abstraction provided
💡 Recommendation: Rename to "Smart Context Compression"
- Example: a 50K-token raw context → 15K-token compressed context
- Typical savings: 60–75% for structured data, 40–50% for unstructured
- Code/research data processed fully; HTML/JSON abstracted intelligently
- Result: 770 and DEEP-R process pre-compressed context at 40–60% of raw-input cost
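Working the example figures through (note the 70% token reduction here is steeper than the 40–60% cost line above, which presumably folds in compression overhead):

```python
# Worked example from above: a 50K-token raw context compressed to 15K tokens,
# then processed at the 770 blended rate from the comparison table.
RAW_TOKENS = 50_000
COMPRESSED_TOKENS = 15_000
RATE_770 = 2.1  # $ per 1M tokens

reduction_pct = 100 * (1 - COMPRESSED_TOKENS / RAW_TOKENS)
raw_cost = RATE_770 * RAW_TOKENS / 1_000_000
compressed_cost = RATE_770 * COMPRESSED_TOKENS / 1_000_000
print(f"Token reduction: {reduction_pct:.0f}%")                           # 70%
print(f"Cost: ${raw_cost:.4f} raw vs ${compressed_cost:.4f} compressed")  # $0.1050 vs $0.0315
```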
3. Value Proposition Analysis: "Memory That Pays for Itself"
The Thesis Is Sound, Evidence Is Weak
The core argument—persistent memory reduces redundant re-prompting, lowering total cost—is backed by market research:
- Enterprises with AI memory systems show 3x higher user adoption and 2.5x better task completion accuracy
- AI agents with long-term memory show 78% improvement in complex task execution
- Memory-optimized systems reduce API spend by 30–60% through fewer redundant context passes
But the pricing page provides zero quantification of this benefit.
Missing: Total Cost of Ownership Comparisons
At face value, Palo Bloom ($0.9/1M) is 2.4x more expensive than GPT-4o mini ($0.375/1M). The pricing page doesn't explain when Bloom's memory features justify this premium.
💡 Suggested Narrative:
Example: Customer runs a personalized recommendation engine with 100K daily users, each averaging 5 turns per session. Without memory, every turn re-inputs 10K tokens of user history. With Mpalo's Memory Traversal, only new queries are input (1K tokens); history is accessed via memory search (~0.5K traversal tokens).
Cost without memory: 100K users × 5 turns × 10K tokens = 5B tokens/day = $1,875/day (GPT-4o mini)
Cost with Mpalo: 100K users × 5 turns × 1.5K tokens = 750M tokens/day = $675/day (Bloom)
Savings: $1,200/day = $36K/month
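This worked example is exactly what the TCO calculator recommended later in this analysis would automate. A minimal sketch, using the example's workload figures and, for simplicity, billing traversal tokens at the full Bloom blended rate:

```python
# Workload figures from the example above.
USERS = 100_000
TURNS_PER_USER = 5
HISTORY_TOKENS = 10_000       # re-sent every turn without memory
NEW_PLUS_TRAVERSAL = 1_500    # 1K new query + ~0.5K traversal tokens per turn

GPT4O_MINI_RATE = 0.375       # $ per 1M tokens
BLOOM_RATE = 0.9              # $ per 1M tokens


def daily_cost(tokens_per_turn: int, rate: float) -> float:
    tokens = USERS * TURNS_PER_USER * tokens_per_turn
    return rate * tokens / 1_000_000


without_memory = daily_cost(HISTORY_TOKENS, GPT4O_MINI_RATE)  # $1,875/day
with_memory = daily_cost(NEW_PLUS_TRAVERSAL, BLOOM_RATE)      # $675/day
print(f"Without memory: ${without_memory:,.0f}/day")
print(f"With Mpalo:     ${with_memory:,.0f}/day")
print(f"Savings:        ${without_memory - with_memory:,.0f}/day")
```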
Missing: Competitive Differentiation on Memory
The page positions Mpalo against GPT-4o and Claude generically. What it doesn't highlight is architectural differentiation:
| Aspect | Mpalo | GPT-4o | Claude 3.5 | Gemini 2.5 |
|---|---|---|---|---|
| Memory type | Persistent (episodic + temporal) | Context window only | Context window only | Context window only |
| Context limit | Unlimited (via memory) | 128K | 200K | 1M |
| Recall pattern | Semantic search + temporal ordering | Sequential | Sequential | Sequential |
| Cost model | Proportional to usage | Per-token input | Per-token input | Per-token input |
| Use case advantage | Multi-session personalization, long-term reasoning | Single long conversation | Single long conversation | Long documents in one session |
True market position: You're not competing on base LLM cost; you're competing on memory architecture for multi-session, episodic applications. The pricing page should make this explicit.
4. Critical Gaps & Missed Opportunities
A. Pricing Gaps
1. Addon Pricing Missing
Custom Connections, Secure Tunnel, and Private Data Spaces are mentioned, but no pricing is given. Competitors include these in standard tiers or charge $10–50/month. Add a pricing table or clarify whether they are included in Business/Enterprise plans.
2. Storage Costs Not Quantified
"Memory Storage costs depend on your chosen Provider (BYOVS model)" — unclear to customers. Add context: "You choose your vector storage provider (Pinecone, Weaviate, Milvus). Storage costs typically $0.10–$0.50/GB-month. A 10M-document knowledge base (100 GB) costs $10K–$50K/year storage."
3. Rate Limits and Burstability Undefined
Architect plan: "120M tokens/month" — is this a hard cap or soft limit? Recommendation: "Hard cap; overages auto-billed at $X/1M tokens, or auto-upgrade to Business plan ($35/user/mo)."
4. Latency SLAs and Performance Specs Missing
What's the p50/p99 latency for memory traversal? Concurrent request limits? Failover guarantees? Add a Performance tier table.
B. Feature Definition Gaps
1. Memory Traversal vs. Mapping Trade-offs Unclear
Suggested guidance:
- Traversal: 60% of blended rate, episodic recall only. Best for: chatbots, FAQs, single-session personalization (low cost). Latency: ~50ms.
- Mapping: 80% of blended rate, episodic + temporal recall. Best for: multi-week reasoning, compliance auditing, long-term user profiling (higher accuracy). Latency: ~200ms.
Add a decision matrix or recommendation engine. Customers will default to the cheaper option without guidance.
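A sketch of what such recommendation logic could look like; the inputs and thresholds are purely illustrative, derived from the cost and latency figures above rather than any product-defined rule:

```python
def recommend_memory_feature(multi_session: bool, needs_temporal_recall: bool,
                             latency_budget_ms: int) -> str:
    """Illustrative decision rule: prefer Mapping (80% of blended rate, ~200ms,
    episodic + temporal) when the use case and latency budget justify it;
    otherwise default to Traversal (60% of blended rate, ~50ms, episodic only)."""
    if (multi_session or needs_temporal_recall) and latency_budget_ms >= 200:
        return "Mapping"
    return "Traversal"


print(recommend_memory_feature(False, False, 100))  # Traversal: single-session chatbot
print(recommend_memory_feature(True, True, 500))    # Mapping: long-term user profiling
```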
C. Positioning Gaps
1. No Enterprise Tier Details
Menu shows "Business ($35/user/mo)" but no Enterprise tier specs. What's included? Minimum seat count? SLA terms? Dedicated support?
2. No Adoption Path or Use Case Guidance
Suggested guidance:
- Mini: Experimentation, non-critical features ($0.3/1M)
- Bloom: Production apps, memory-enabled (recommended, $0.9/1M)
- 770: Complex reasoning, research, high accuracy ($2.1/1M)
- DEEP-R: Cutting-edge research, publishing ($2.9/1M)
Synthesis: Scorecard & Recommendations
| Dimension | Rating | Status |
|---|---|---|
| Pricing Structure | 8/10 | Economically sound; comparison outdated |
| Technical Specs | 7/10 | Excellent vision specs; audio incomplete; "Palo Output" naming confusing |
| Value Messaging | 6/10 | Strong thesis, weak on customer quantification |
| Completeness | 5/10 | Missing addon pricing, storage, SLAs, enterprise tier |
| Clarity | 6.5/10 | Good granularity; poor guidance and contextualization |
| OVERALL | 6.5/10 | Technically sound; needs positioning and marketing refinement |
Top 3 Quick Wins
1. Update Competitive Comparisons
Acknowledge Gemini 2.5 Flash's price advantage; position memory as the differentiator, not cost.
2. Add TCO Calculator
Show "Without memory: $X/month; with Mpalo memory: $Y/month; savings by use case."
3. Clarify Memory Trade-offs
Add decision matrix (Traversal vs. Mapping); include latency/cost trade-offs; provide recommendation logic.
Strategic Priority: Positioning for Product-Market Fit
Mpalo's true market advantage isn't price—it's persistent, episodic memory baked into inference, which is architecturally different from RAG or context windows. The pricing page should emphasize this as the reason to choose Mpalo, not the price alone. This repositioning will reduce customer acquisition friction and justify the Bloom tier's premium over Gemini.
Sources
Based on current 2025 LLM pricing, vector database cost analyses, and enterprise AI memory adoption studies. References include pricepertoken.com, OpenAI platform pricing, Anthropic documentation, Google AI pricing, and industry research reports.
Keep in Mind
Every engine supports Personalization Mode, which offers humanlike blurry memory and forgetting; Palo 770 and Palo Research additionally offer a Research Mode that enhances accuracy, knowledge breadth, and depth while ensuring that important details are not forgotten.
Our Mission, in short
At Mpalo, we stand against profit-over-people capitalism. The majority of profit is reinvested into research to ensure our technology remains consumer-friendly and transparent. We deliver modular, humanlike memory solutions that safeguard user data, prevent bias, and foster long-term, reliable storage of experiences.
Our commitment is to create technology that serves businesses, developers, and consumers alike—building trust, enhancing engagement, and igniting nostalgia through memory-driven AI that truly resonates.
Get Started
If you're new to Palo Bloom, start here to learn the essentials and make your first API call.