
Palo Memory Engines

Welcome to the detailed documentation for Mpalo's Palo AI Memory Engines. Each engine is designed to provide unique capabilities for integrating persistent, context-aware memory into your applications. Below, you'll find comprehensive information on Palo Mini, Palo Bloom, and the two Palo Research engines (palo-770 and palo-DEEP-R), including their features, API names, technical specifications, and operational modes.

Please note: The previously announced "Palo Large" model is currently postponed as we focus on refining and enhancing our current suite of offerings.

Operating Modes

Palo AI Memory Engines offer two distinct operational modes that you can select based on your application's needs. This choice defines how Palo stores and recalls information, giving you powerful control over your AI's behavior.

1. Personalization Mode

Focus: Adaptive, "humanlike" memory.

How it Works: This mode uses vector reconstruction to recall memories. This can result in "blurry" recall, where the core patterns and context are remembered, but the exact wording might shift, similar to human memory. It allows for creative connections and emergent behavior.

Best For: Conversational chatbots, personal assistants, and creative applications where a humanlike feel is more important than perfect factual recall.

Available on: Palo Mini, Palo Bloom, and both Palo Research engines (palo-770 and palo-DEEP-R).

2. Research Mode

Focus: 100% accurate, factual recall with zero hallucinations.

How it Works: This mode stores the original text as metadata alongside the vector. When recalling a memory, Palo retrieves this exact, unaltered metadata, bypassing reconstruction entirely. This guarantees that what you put in is exactly what you get out.

Best For: Enterprise knowledge bases, legal or medical Q&A bots, technical documentation search—any application where factual precision is non-negotiable.

Available on: Palo Research (palo-770) and Palo Research (palo-DEEP-R).
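
To make the mode choice concrete, here is a minimal Python sketch of selecting a mode when creating a memory space over the REST API. The endpoint URL, payload fields, and header names are assumptions for illustration (official SDKs are still planned); consult the actual Mpalo REST reference for the real contract.

```python
# Hypothetical sketch: selecting an operating mode at memory-space creation.
# The endpoint path, payload fields, and header names are assumptions.
import requests

API_KEY = "YOUR_MPALO_API_KEY"  # placeholder

resp = requests.post(
    "https://api.mpalo.example/v1/memory-spaces",  # hypothetical URL
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "engine": "palo-770",   # Research Mode requires palo-770 or palo-DEEP-R
        "mode": "research",     # or "personalization"
        "name": "compliance-kb",
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```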

Memory Engine Comparison

| Feature | Palo Mini | Palo Bloom | Palo Research (palo-770) | Palo Research (palo-DEEP-R) |
|---|---|---|---|---|
| API Name | palo-lite | palo | palo-770 | palo-DEEP-R |
| Primary Function | Fast contextual memory for LLMs | Balanced memory & performance | Advanced, deep memory for complex applications | Specialized memory for 100% accurate factual recall |
| Supported Context (per Input) | ~4096+ tokens | ~8192+ tokens | Significantly larger | Significantly larger |
| Key Memory Features | Episodic Recall, Semantic Search | Enhanced Recall & Search, Basic Relationship Linking | Memory Mapping, Advanced Accurate Recall, Traversal | All palo-770 features + specialized fine-tuning |
| Primary Use Cases | Simple chatbots, CLI tools, basic personalization | Personal assistants, mobile apps, educational tools | Enterprise knowledge bases, complex robotics, advanced support | Legal/medical Q&A, compliance checks, technical lookups |
| Operational Mode(s) | Personalization | Personalization | Personalization & Research | Personalization & Research |
| Performance | Exceptionally fast | Very fast | Fast, optimized for depth | Fast, optimized for accuracy |

Palo Mini

API Name: palo-lite

Palo Mini is an exceptionally fast and cost-effective AI memory engine designed to augment external LLMs. It provides essential episodic memory, enabling quick contextual recall for LLM-driven applications supporting interaction context windows of approximately 4096 tokens or more. Ideal for scenarios requiring rapid, memory-enhanced responses with minimal latency and resource usage. Operates primarily in "Personalization Mode."

Key Features & Specifications:

Function: AI Memory Engine for external LLMs.
LLM Interaction Context: ~4096+ tokens.
Multilingual Memory: Stores/retrieves text in various languages (optimized primarily for English) for processing by an external LLM.
Vision Memory: Stores references/metadata for images (e.g., URLs, identifiers). External LLM handles actual image processing.
Memory Management for LLM: Basic Episodic Recall (specific past interactions), simple semantic search over recent memories to provide context to the LLM.
Performance: Exceptionally Fast & Cost-Effective. Low latency.
Memory Capacity: Optimized for short-term context, recent interactions (e.g., single conversation, few dozen memory entries).
Primary Mode: Personalization.
Use Cases: Augmenting simple customer service LLM bots, FAQ retrieval with LLMs, CLI tool context for LLM interaction, basic in-app LLM user preference storage.
Code Integration: Accessible via REST API. SDKs (Python, JavaScript, Java) planned. See SDKs page.
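
As a rough illustration of the integration pattern described above, the following Python sketch stores an episodic memory and runs a semantic search with palo-lite over REST. All endpoint paths and response field names are assumptions, not the documented API.

```python
# Hypothetical sketch of storing and recalling an episodic memory with
# palo-lite over REST. Endpoint paths and field names are assumptions.
import requests

BASE = "https://api.mpalo.example/v1"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer YOUR_MPALO_API_KEY"}

# Store one interaction as an episodic memory entry.
requests.post(
    f"{BASE}/memories",
    headers=HEADERS,
    json={"engine": "palo-lite", "text": "User prefers dark mode."},
    timeout=30,
).raise_for_status()

# Semantic search over recent memories to build context for the external LLM.
hits = requests.post(
    f"{BASE}/memories/search",
    headers=HEADERS,
    json={"engine": "palo-lite", "query": "UI preferences", "top_k": 3},
    timeout=30,
).json()
context = "\n".join(h["text"] for h in hits["results"])  # pass to your LLM prompt
```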

Palo Bloom

API Name: palo

Palo Bloom is a versatile, exceptionally fast, and cost-effective AI memory engine that enhances external LLMs. It offers a balance of deeper memory capabilities and high performance, optimized for LLM applications on mobile/edge devices or those requiring robust memory for interaction context windows of approximately 8192 tokens or more. Operates primarily in "Personalization Mode."

Key Features & Specifications:

Function: AI Memory Engine for external LLMs.
LLM Interaction Context: ~8192+ tokens.
Multilingual Memory: Stores/retrieves text in a broader range of languages for processing by an external LLM. Good support for English & major European languages.
Vision Memory: Stores references/metadata for images up to 512px. External LLM handles actual image processing.
Memory Management for LLM: Enhanced Episodic Recall, improved Semantic Search over a larger memory span, basic relationship linking within stored memories to provide rich context to the LLM.
Performance: Exceptionally Fast & Cost-Effective. Low to medium latency.
Memory Capacity: Suitable for medium-term context, user preferences over multiple sessions (few hundred memory entries).
Primary Mode: Personalization.
Use Cases: Augmenting LLM-powered personal assistants, mobile LLM applications, smarter IoT device interactions with LLMs, context-aware educational tools using LLMs.
Code Integration: Accessible via REST API. SDKs (Python, JavaScript, Java) planned. See SDKs page.
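
The sketch below illustrates how Bloom's basic relationship linking might look over REST: two memories are stored and then linked so that recalling one can surface the other. The endpoints, response fields, and `relation` vocabulary are hypothetical.

```python
# Hypothetical sketch of Palo Bloom's basic relationship linking.
# All endpoint paths and field names are assumptions.
import requests

BASE = "https://api.mpalo.example/v1"
HEADERS = {"Authorization": "Bearer YOUR_MPALO_API_KEY"}

def store(text: str) -> str:
    """Store a memory entry and return its (assumed) server-assigned id."""
    r = requests.post(f"{BASE}/memories", headers=HEADERS,
                      json={"engine": "palo", "text": text}, timeout=30)
    r.raise_for_status()
    return r.json()["id"]

trip = store("User booked a trip to Lisbon in May.")
pref = store("User prefers window seats.")

# Link the two entries so recall of one can surface the other.
requests.post(f"{BASE}/memories/links", headers=HEADERS,
              json={"from": pref, "to": trip, "relation": "applies_to"},
              timeout=30).raise_for_status()
```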

Palo Research (palo-770)

API Name: palo-770

Palo Research is an advanced AI memory engine for external LLMs. Engineered for complex applications, it provides highly reliable and profound memory recall. It integrates sophisticated features like Memory Mapping and Memory Traversal for comprehensive semantic network building, ensuring nuanced and dependable context for the external LLM.

Key Features & Specifications:

Function: Advanced AI Memory Engine for external LLMs.
LLM Interaction Context: Significantly larger than Palo Bloom.
Memory Management for LLM: Memory Mapping (interconnected knowledge graphs), Memory Traversal (navigating complex event chains), and options for either adaptive or 100% accurate recall.
Primary Modes: Personalization & Research Mode (User Selectable).
Use Cases: Augmenting enterprise knowledge management LLMs, advanced research assistants, sophisticated robotics requiring long-term learning, complex customer support with deep history.
Multilingual Memory: Extensive multilingual text storage/retrieval for external LLM processing, with strong cross-lingual memory linking capabilities.
Vision Memory: Stores references/metadata for images (e.g., for images >=1024px as reference) and supports semantic links between visual and textual memories. External LLM handles actual image processing.
Performance: Exceptionally Fast & Cost-Effective, with latency optimized for depth and accuracy of recall.
Memory Capacity: Designed for extensive knowledge bases, long-term user/system memory (thousands to millions of entries).
Code Integration: Accessible via REST API. SDKs (Python, JavaScript, Java) planned. Supports complex data structures. See SDKs page.
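
As a hypothetical sketch of the Memory Mapping and Traversal features, the following request seeds a traversal from a semantic query and walks up to three relationship hops, with Research Mode selected for exact recall. The endpoint, parameters, and response shape are illustrative assumptions.

```python
# Hypothetical sketch of a Memory Traversal query on palo-770: starting from
# a seed memory and walking linked events. Paths and fields are assumptions.
import requests

BASE = "https://api.mpalo.example/v1"
HEADERS = {"Authorization": "Bearer YOUR_MPALO_API_KEY"}

resp = requests.post(
    f"{BASE}/memories/traverse",
    headers=HEADERS,
    json={
        "engine": "palo-770",
        "start_query": "onboarding incident last quarter",  # seeds the traversal semantically
        "max_hops": 3,              # follow up to 3 relationship hops in the memory map
        "mode": "research",         # exact metadata recall, no vector reconstruction
    },
    timeout=30,
)
resp.raise_for_status()
for node in resp.json()["path"]:    # assumed response field
    print(node["text"])
```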

Palo Research (palo-DEEP-R)

API Name: palo-DEEP-R

Palo Research is our premier memory engine, specialized for applications where 100% factual accuracy and verifiable recall are non-negotiable. It leverages the full power of the 770 engine and is fine-tuned for understanding and retrieving information from dense, specialized documents. It is the definitive choice for building mission-critical AI systems.

Key Features & Specifications:

Function: Specialized AI Memory Engine for high-fidelity, accurate recall with external LLMs.
Memory Management for LLM: All features of Palo Research (palo-770), with a strong emphasis on providing verifiable, citation-ready context to the LLM, especially when used in Research Mode.
Primary Modes: Personalization & Research Mode (User Selectable).
Use Cases: Legal document analysis and e-discovery, medical record summarization and research, financial compliance checks, academic paper review, and building enterprise-grade "expert" assistants.
LLM Interaction Context: Same as Palo Research (palo-770).
Multilingual Memory: Same as Palo Research (palo-770), including support for evaluating multilingual memory retrieval.
Vision Memory: Same as Palo Research (palo-770), including support for evaluating the memory system with visual references. External LLM handles image processing.
Performance: Optimized for evaluation throughput and detailed logging rather than interactive latency. Exceptionally fast & cost-effective for its purpose.
Memory Capacity: Suitable for very large evaluation datasets.
Code Integration: Accessible via REST API, often used with data analysis and LLM evaluation tools (Python predominant).
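
A hedged sketch of what citation-ready Research Mode recall on palo-DEEP-R could look like: the search returns the verbatim stored text plus source metadata for the LLM to cite. The `include_citations` flag and the response fields are assumptions for illustration.

```python
# Hypothetical sketch: Research Mode recall on palo-DEEP-R returning exact
# stored text plus citation metadata. Field names are assumptions.
import requests

resp = requests.post(
    "https://api.mpalo.example/v1/memories/search",  # hypothetical URL
    headers={"Authorization": "Bearer YOUR_MPALO_API_KEY"},
    json={
        "engine": "palo-DEEP-R",
        "mode": "research",          # bypasses vector reconstruction entirely
        "query": "statute of limitations for contract disputes",
        "top_k": 5,
        "include_citations": True,   # assumed flag for citation-ready context
    },
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json()["results"]:
    print(hit["text"], "--", hit["source"])  # verbatim text + source reference
```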

Comprehensive Pricing Page Analysis: Mpalo Engine Pricing & Features

An expert-level assessment synthesizing current market intelligence, competitive analysis, and strategic positioning recommendations across business economics, technical clarity, value proposition, and gaps/opportunities.

1. Business Economics & Pricing Structure Validation

Pricing Model Architecture: Sound and Competitive

Mpalo's blended input/output rate approach is mathematically defensible and market-competitive. The architecture across tiers is internally consistent:

| Engine | Blended Rate | Input/Output Split | Assessment |
|---|---|---|---|
| Mini | $0.30/1M | 54/46 | Aggressively priced for experimentation |
| Palo Bloom | $0.90/1M | 44/56 | Mid-market sweet spot |
| 770 | $2.10/1M | 46/54 | Enterprise reasoning tier |
| DEEP-R | $2.90/1M | 45/55 | Flagship research-grade |

An output share of 46–56% of the blended rate is realistic for LLM inference, where generation is 2–5x more computationally expensive per token than processing input.
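
For readers unfamiliar with blended rates, the minimal Python sketch below shows the standard computation: a weighted average of per-direction prices by the expected token mix. The per-direction prices used are illustrative, not Mpalo's actual rates.

```python
# Minimal sketch of how a blended per-token rate is typically computed:
# a weighted average of input and output prices by the expected token mix.
# The prices below are illustrative, not Mpalo's actual per-direction rates.
def blended_rate(p_in: float, p_out: float, frac_in: float) -> float:
    """USD per 1M tokens, given per-direction prices and the input token fraction."""
    return frac_in * p_in + (1.0 - frac_in) * p_out

# Example: if output tokens cost 3x input tokens (within the 2-5x range cited
# above) and 60% of tokens are input, the blended rate lands between the two.
print(f"{blended_rate(p_in=0.5, p_out=1.5, frac_in=0.6):.2f}")  # -> 0.90
```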

Critical Pricing Comparison Issue: GPT Baseline Considerations

Current Competitive Reality (December 2025):

  • Gemini 2.5 Flash undercuts Palo Bloom on price ($0.375/1M vs. $0.90/1M) and offers 1M token context windows without memory overhead.
  • GPT-4o mini ($0.375/1M) is cheaper than Palo Bloom if memory features aren't needed.
  • Claude 3.5 Sonnet ($11/1M) is significantly more expensive—Mpalo's real competitive set isn't GPT-4o or Claude, it's Gemini 2.5 Flash and GPT-4o mini on cost, with differentiation on memory architecture.

💡 Recommendation:

Update comparisons to acknowledge Gemini's price advantage and position Mpalo's memory as the differentiator, not price alone.

Memory Feature Pricing: Well-Calibrated

Memory costs—60% of blended rate for Traversal, 80% for Mapping—are substantially cheaper than building equivalent infrastructure separately:

| Approach | Cost (10M docs) | Latency | Ownership |
|---|---|---|---|
| Mpalo Memory Traversal | ~$540/month (at Bloom rates) | Integrated | Managed |
| OpenAI Embeddings + Pinecone | $12,000–$42,000/year (storage only) | API roundtrip | DIY |
| Weaviate/Milvus self-hosted | ~$5,000/year infra + engineering time | Milliseconds | Operational burden |

The bundled approach (memory as inference cost, not storage cost) is architecturally superior and financially efficient.

💡 Suggested Addition:

"Traditional vector DB infrastructure for this memory capacity would cost $X–$Y annually; Mpalo memory features cost proportional to usage."

2. Technical Clarity & Feature Specifications Assessment

Strengths: Image Processing Specifications (Best-in-Class)

Image specs are transparent, comprehensive, and competitive:

| Tier | Max Size | Formats | Per-Call Limit | Cost |
|---|---|---|---|---|
| Mini | 5 MB | JPG, PNG | 1 | $0.0005 |
| Bloom | 10 MB | JPG, PNG, GIF | 3 | $0.001 |
| 770 | 50 MB | JPG, PNG, GIF, WebP, TIFF, BMP | 10 | $0.003 |
| DEEP-R | 100 MB | All + RAW | 25 | $0.015 |

This granularity matches Claude 3.5 and exceeds Gemini 2.5 on transparency. The feature limitations (non-Latin text issues, rotation problems, color-dependency) build credibility through honesty.
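
One practical way to consume this table is to encode it as data and validate uploads client-side before spending an API call. The helper below is an illustration only (not part of any Mpalo SDK), and it reads the DEEP-R "All + RAW" entry as the 770 format set plus RAW.

```python
# Illustrative helper encoding the image-spec table above as data, so an app
# can reject invalid uploads before calling the API. Limits mirror the table.
IMAGE_SPECS = {
    "palo-lite":   {"max_mb": 5,   "formats": {"jpg", "png"}, "per_call": 1},
    "palo":        {"max_mb": 10,  "formats": {"jpg", "png", "gif"}, "per_call": 3},
    "palo-770":    {"max_mb": 50,  "formats": {"jpg", "png", "gif", "webp", "tiff", "bmp"}, "per_call": 10},
    "palo-DEEP-R": {"max_mb": 100, "formats": {"jpg", "png", "gif", "webp", "tiff", "bmp", "raw"}, "per_call": 25},
}

def check_upload(engine: str, size_mb: float, fmt: str, count: int) -> None:
    """Raise ValueError if an upload would exceed the tier's documented limits."""
    spec = IMAGE_SPECS[engine]
    if size_mb > spec["max_mb"]:
        raise ValueError(f"{engine}: image exceeds {spec['max_mb']} MB limit")
    if fmt.lower() not in spec["formats"]:
        raise ValueError(f"{engine}: format {fmt!r} not supported")
    if count > spec["per_call"]:
        raise ValueError(f"{engine}: max {spec['per_call']} images per call")

check_upload("palo", size_mb=4.2, fmt="png", count=2)  # passes silently
```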

Critical Gap: Audio Processing

Status: "Coming Soon" across all fields. This creates ambiguity about timeline and pricing expectations.

Market context: Google Gemini launched audio input/output at $0.30–$1.00 input, $2.00–$12.00 output per 1M tokens; OpenAI audio runs $40/1M input, $80/1M output.

💡 Recommendation:

Either (a) remove audio specs entirely (signals incompleteness less negatively), or (b) add a timeline and pricing estimate (e.g., "Q2 2026 Beta; expected pricing: $1.50–$3.00/1M input, $6.00–$10.00/1M output").

Confusing Positioning: "Palo Output" Abstraction

The statement "Palo Output is a concise, abstract summary of the input, which significantly reduces your output token costs" is potentially misleading:

  • Wording suggests output reduction when it means input summarization/compression
  • No quantification or example of the abstraction provided

💡 Recommendation: Rename to "Smart Context Compression"

  • Example: 50K token raw context → 15K token compressed
  • Typical savings: 60–75% for structured data, 40–50% for unstructured
  • Code/research data processed fully; HTML/JSON abstracted intelligently
  • Result: 770 and DEEP-R process pre-compressed context at 40–60% of the cost of raw input
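
The arithmetic behind that example, priced at the 770 blended rate from the table above, works out as follows (a sketch, assuming the compression step itself carries no extra charge):

```python
# Quick arithmetic behind the compression example above: a 50K-token raw
# context compressed to 15K tokens, priced at the 770 blended rate.
raw_tokens, compressed_tokens = 50_000, 15_000
rate_770 = 2.1 / 1_000_000  # USD per token

raw_cost = raw_tokens * rate_770
compressed_cost = compressed_tokens * rate_770
print(f"raw ${raw_cost:.4f} vs compressed ${compressed_cost:.4f} "
      f"({1 - compressed_tokens / raw_tokens:.0%} fewer tokens)")
# -> raw $0.1050 vs compressed $0.0315 (70% fewer tokens)
```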

3. Value Proposition Analysis: "Memory That Pays for Itself"

The Thesis Is Sound, Evidence Is Weak

The core argument—persistent memory reduces redundant re-prompting, lowering total cost—is backed by market research:

  • Enterprises with AI memory systems show 3x higher user adoption and 2.5x better task completion accuracy
  • AI agents with long-term memory show 78% improvement in complex task execution
  • Memory-optimized systems reduce API spend by 30–60% through fewer redundant context passes

But the pricing page provides zero quantification of this benefit.

Missing: Total Cost of Ownership Comparisons

At face value, Palo Bloom ($0.9/1M) is 2.4x more expensive than GPT-4o mini ($0.375/1M). The pricing page doesn't explain when Bloom's memory features justify this premium.

💡 Suggested Narrative:

Example: Customer runs a personalized recommendation engine with 100K daily users, each averaging 5 turns per session. Without memory, every turn re-inputs 10K tokens of user history. With Mpalo's Memory Traversal, only new queries are input (1K tokens); history is accessed via memory search (~0.5K traversal tokens).

Cost without memory: 100K users × 5 turns × 10K tokens = 5B tokens/day = $1,875/day (GPT-4o mini)

Cost with Mpalo: 100K users × 5 turns × 1.5K tokens = 750M tokens/day = $675/day (Bloom)

Savings: $1,200/day = $36K/month
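
The same numbers in runnable form, so readers can substitute their own traffic. One simplification to note: traversal tokens are billed here at the full Bloom blended rate rather than the discounted 60% memory rate described earlier, which makes the savings estimate conservative.

```python
# The arithmetic behind the narrative above, so readers can vary the inputs.
# Rates: GPT-4o mini $0.375/1M (no memory), Palo Bloom $0.90/1M (with memory).
users, turns = 100_000, 5

# Without memory: every turn re-inputs ~10K tokens of user history.
tokens_without = users * turns * 10_000       # 5B tokens/day
cost_without = tokens_without / 1e6 * 0.375   # -> $1,875/day

# With memory: ~1K new query + ~0.5K traversal tokens per turn.
tokens_with = users * turns * 1_500           # 750M tokens/day
cost_with = tokens_with / 1e6 * 0.90          # -> $675/day

print(f"${cost_without:,.0f}/day vs ${cost_with:,.0f}/day; "
      f"saves ${(cost_without - cost_with) * 30:,.0f}/month")
# -> $1,875/day vs $675/day; saves $36,000/month
```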

Missing: Competitive Differentiation on Memory

The page positions Mpalo against GPT-4o and Claude generically. What it doesn't highlight is architectural differentiation:

| Aspect | Mpalo | GPT-4o | Claude 3.5 | Gemini 2.5 |
|---|---|---|---|---|
| Memory type | Persistent (episodic + temporal) | Context window only | Context window only | Context window only |
| Context limit | Unlimited (via memory) | 128K | 200K | 1M |
| Recall pattern | Semantic search + temporal ordering | Sequential | Sequential | Sequential |
| Cost model | Proportional to usage | Per-token input | Per-token input | Per-token input |
| Use case advantage | Multi-session personalization, long-term reasoning | Single long conversation | Single long conversation | Long documents in one session |

True market position: You're not competing on base LLM cost; you're competing on memory architecture for multi-session, episodic applications. The pricing page should make this explicit.

4. Critical Gaps & Missed Opportunities

A. Pricing Gaps

1. Addon Pricing Missing

Custom Connections, Secure Tunnel, Private Data Spaces mentioned but no pricing. Competitors include these in standard tiers or charge $10–50/month. Add a pricing table or clarify if included in Business/Enterprise plans.

2. Storage Costs Not Quantified

"Memory Storage costs depend on your chosen Provider (BYOVS model)" — unclear to customers. Add context: "You choose your vector storage provider (Pinecone, Weaviate, Milvus). Storage costs typically $0.10–$0.50/GB-month. A 10M-document knowledge base (100 GB) costs $10K–$50K/year storage."

3. Rate Limits and Burstability Undefined

Architect plan: "120M tokens/month" — is this a hard cap or soft limit? Recommendation: "Hard cap; overages auto-billed at $X/1M tokens, or auto-upgrade to Business plan ($35/user/mo)."

4. Latency SLAs and Performance Specs Missing

What's the p50/p99 latency for memory traversal? Concurrent request limits? Failover guarantees? Add a Performance tier table.

B. Feature Definition Gaps

1. Memory Traversal vs. Mapping Trade-offs Unclear

Suggested guidance:

Traversal: 60% cost, Episodic recall only. Best for: chatbots, FAQs, single-session personalization (low cost). Latency: ~50ms.

Mapping: 80% cost, Episodic + Temporal recall. Best for: multi-week reasoning, compliance auditing, long-term user profiling (higher accuracy). Latency: ~200ms.

Add a decision matrix or recommendation engine. Customers will default to the cheaper option without guidance.
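
A toy version of such a recommendation engine, with thresholds taken from the latency and cost figures suggested above; the decision logic is an illustration, not a product feature.

```python
# Illustrative decision helper for the suggested matrix above: picks Traversal
# or Mapping from the needs described in this section. Thresholds are assumptions.
def recommend_memory_feature(needs_temporal_recall: bool,
                             multi_session: bool,
                             latency_budget_ms: int) -> str:
    """Return 'mapping' or 'traversal' per the trade-offs listed above."""
    if needs_temporal_recall or multi_session:
        if latency_budget_ms < 200:
            return "traversal (mapping preferred, but ~200ms exceeds your latency budget)"
        return "mapping"    # 80% cost, episodic + temporal recall, ~200ms
    return "traversal"      # 60% cost, episodic recall only, ~50ms

print(recommend_memory_feature(needs_temporal_recall=False,
                               multi_session=False,
                               latency_budget_ms=100))  # -> traversal
```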

C. Positioning Gaps

1. No Enterprise Tier Details

Menu shows "Business ($35/user/mo)" but no Enterprise tier specs. What's included? Minimum seat count? SLA terms? Dedicated support?

2. No Adoption Path or Use Case Guidance

Suggested guidance:

Mini: Experimentation, non-critical features ($0.3/1M)

Bloom: Production apps, memory-enabled (recommended, $0.9/1M)

770: Complex reasoning, research, high accuracy ($2.1/1M)

DEEP-R: Cutting-edge research, publishing ($2.9/1M)

Synthesis: Scorecard & Recommendations

| Dimension | Rating | Status |
|---|---|---|
| Pricing Structure | 8/10 | Economically sound; comparison outdated |
| Technical Specs | 7/10 | Excellent vision specs; audio incomplete; "Palo Output" confusing |
| Value Messaging | 6/10 | Strong thesis, weak on customer quantification |
| Completeness | 5/10 | Missing addon pricing, storage, SLAs, enterprise tier |
| Clarity | 6.5/10 | Good granularity; poor guidance and contextualization |
| OVERALL | 6.5/10 | Technically sound; needs positioning and marketing refinement |

Top 3 Quick Wins

1. Update Competitive Comparisons

Acknowledge Gemini 2.5 Flash's price advantage; position memory as the differentiator, not cost.

2. Add TCO Calculator

Show "Without memory: $X/month; with Mpalo memory: $Y/month; savings by use case."

3. Clarify Memory Trade-offs

Add decision matrix (Traversal vs. Mapping); include latency/cost trade-offs; provide recommendation logic.

Strategic Priority: Positioning for Product-Market Fit

Mpalo's true market advantage isn't price—it's persistent, episodic memory baked into inference, which is architecturally different from RAG or context windows. The pricing page should emphasize this as the reason to choose Mpalo, not the price alone. This repositioning will reduce customer acquisition friction and justify the Bloom tier's premium over Gemini.

Sources

Based on current 2025 LLM pricing, vector database cost analyses, and enterprise AI memory adoption studies. References include pricepertoken.com, OpenAI platform pricing, Anthropic documentation, Google AI pricing, and industry research reports.

Keep in Mind

All engines include Personalization Mode, which offers humanlike blurry memory and forgetting; the two Palo Research engines additionally offer Research Mode, which aims to enhance accuracy, knowledge breadth, and depth while ensuring that important details are not forgotten.

Our Mission, in short

At Mpalo, we stand against profit-over-people capitalism. The majority of profit is reinvested into research to ensure our technology remains consumer-friendly and transparent. We deliver modular, humanlike memory solutions that safeguard user data, prevent bias, and foster long-term, reliable storage of experiences.

Our commitment is to create technology that serves businesses, developers, and consumers alike—building trust, enhancing engagement, and igniting nostalgia through memory-driven AI that truly resonates.