Federated Semantic Layer - AstroBee Documentation

Coming Soon — This page describes an architecture that is currently in development and not yet generally available. Contact us to learn more.

AstroBee is a semantic orchestration layer, not a data store. Users describe tasks in natural language; an AI agent translates them into queries executed directly on the customer’s own infrastructure, using the user’s own credentials. No customer data is ever ingested, copied, or permanently stored.

How It Works

Task → AI Agent → Semantic Layer → Query Planning → Credential-Delegated Execution → Ephemeral Join (if needed) → Results → Cleanup

Semantic Layer: Admins define a virtual business model of entities, properties, and relationships, mapped to external tables (Snowflake, Databricks, Salesforce) or API endpoints (Gong, Google Calendar). No data is moved; these are pointers.

Credential Delegation: Each user authenticates individually via OAuth to every connected source. Tokens are AES-256 encrypted at rest, decrypted only in memory at query time, automatically refreshed, and never shared across users. AstroBee inherits each source’s native access control (Snowflake RBAC, Databricks Unity Catalog, Salesforce profiles) without replicating it.
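
To make the "pointers, not data" idea concrete, here is a minimal sketch of how a virtual business model could be represented. It is not AstroBee's actual schema: the class names (SourceRef, Entity), the MODEL dictionary, the field mappings, and the "/calls" endpoint are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRef:
    """Pointer to where an entity's data lives; no rows are stored here."""
    system: str    # e.g. "snowflake", "salesforce", "gong"
    kind: str      # "table" or "api"
    locator: str   # table name or endpoint path (illustrative values below)


@dataclass
class Entity:
    name: str
    source: SourceRef
    # business property name -> column or response field in the source
    properties: dict[str, str] = field(default_factory=dict)
    # relationship name -> (target entity, local key, target key)
    relationships: dict[str, tuple[str, str, str]] = field(default_factory=dict)


# Hypothetical model: two entities living in two different systems.
MODEL = {
    "Deal": Entity(
        name="Deal",
        source=SourceRef("salesforce", "table", "Opportunity"),
        properties={"stage": "StageName", "close_date": "CloseDate"},
        relationships={"calls": ("Call", "Id", "crm_deal_id")},
    ),
    "Call": Entity(
        name="Call",
        source=SourceRef("gong", "api", "/calls"),
        properties={"sentiment": "sentiment_score"},
    ),
}


def sources_for(entity_names: list[str]) -> set[str]:
    """Return the distinct external systems a task would touch."""
    return {MODEL[name].source.system for name in entity_names}
```

In this sketch the model holds only names and locators, so resolving a task to its sources never requires reading any customer rows.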

Query Execution Paths

Query execution follows one of three paths, depending on which sources the task touches:

Path	When	What Happens
A — Direct	Single warehouse	Dialect-specific SQL executes on source compute; only the result set crosses the boundary.
B — Extract	Single API	Call API with user’s token, transform response into tabular format in an ephemeral workspace, query, destroy.
C — Federate	Multiple sources	Push filtered queries to each source in parallel, load partial results into an ephemeral workspace, join locally, destroy.
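
The table above amounts to a small routing decision. The sketch below shows one way it could be expressed, assuming a hypothetical classification of connected systems into warehouses and APIs; the names Path, WAREHOUSES, APIS, and choose_path are illustrative and not part of any documented AstroBee interface.

```python
from enum import Enum


class Path(Enum):
    DIRECT = "A"
    EXTRACT = "B"
    FEDERATE = "C"


# Hypothetical classification of connected systems.
WAREHOUSES = {"snowflake", "databricks"}
APIS = {"gong", "google_calendar"}


def choose_path(systems: set[str]) -> Path:
    """Pick an execution path from the set of systems a task touches."""
    if not systems:
        raise ValueError("task touches no data source")
    if len(systems) > 1:
        return Path.FEDERATE          # multiple sources: join in a workspace
    (only,) = systems
    if only in WAREHOUSES:
        return Path.DIRECT            # single warehouse: run SQL at the source
    if only in APIS:
        return Path.EXTRACT           # single API: extract, tabulate, query
    raise ValueError(f"unclassified source: {only}")
```

A task touching only "snowflake" would take Path A, one touching only "gong" would take Path B, and any task touching two or more systems would take Path C.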

Ephemeral Workspace: For Paths B and C, an in-memory workspace is created per query, isolated per user, and destroyed when the query completes, typically within seconds. Results may be cached briefly to support pagination and are then discarded.
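
The workspace lifecycle can be illustrated with an in-memory SQLite database. This is only a sketch under the assumption that SQLite stands in for whatever engine AstroBee actually uses; the helper names (ephemeral_workspace, run_extract_query) and the calls table are hypothetical.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def ephemeral_workspace():
    """Yield a private in-memory database that is closed on exit."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    try:
        yield conn
    finally:
        conn.close()  # closing an in-memory database discards its contents


def run_extract_query(rows: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Load already-fetched API rows into a workspace, query, then discard."""
    with ephemeral_workspace() as ws:
        ws.execute("CREATE TABLE calls (deal_id TEXT, sentiment REAL)")
        ws.executemany("INSERT INTO calls VALUES (?, ?)", rows)
        cur = ws.execute(
            "SELECT deal_id, AVG(sentiment) FROM calls "
            "GROUP BY deal_id ORDER BY deal_id"
        )
        return cur.fetchall()
```

Because the connection is opened and closed inside a single call, each query gets its own workspace and only the result set outlives it.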

Security Model

The architecture is zero-trust by design — AstroBee assumes no inherent rights to customer data.

No service accounts. Every query runs as a specific user with their permissions. If a user can’t access a table in Snowflake, they can’t access it through AstroBee — the warehouse rejects the query.
No permission duplication. No parallel ACL system to maintain or drift out of sync.
Blast radius is per-user. A compromised token exposes only that user’s scope.
Full audit trail. Every query is logged with user identity, source, SQL/API call, and timestamp.
Data residency preserved. Customer data never leaves their cloud region.
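
The first and fourth points above can be sketched together: the orchestration layer passes the call through as a specific user, the source makes the permission decision, and every attempt is logged. Everything here is a stand-in: StubWarehouse, SourceDenied, execute_as, and AUDIT_LOG are hypothetical names, and the outcome field is an assumption added for illustration beyond the fields listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


class SourceDenied(Exception):
    """Raised by the source itself, not by the orchestration layer."""


class StubWarehouse:
    """Stand-in for a warehouse that enforces its own grants."""

    def __init__(self, grants: dict[str, set[str]]):
        self._grants = grants  # user -> tables that user may read

    def run(self, user: str, table: str) -> list[tuple]:
        if table not in self._grants.get(user, set()):
            raise SourceDenied(f"{user} has no access to {table}")
        return [("row-1",)]


@dataclass(frozen=True)
class AuditRecord:
    user: str
    source: str
    statement: str
    timestamp: str  # ISO 8601, UTC
    outcome: str    # assumption: not listed in the text above


AUDIT_LOG: list[AuditRecord] = []


def execute_as(user: str, warehouse: StubWarehouse, table: str) -> list[tuple]:
    """Run a query as one user; the permission check stays with the source."""
    statement = f"SELECT * FROM {table}"
    outcome = "error"
    try:
        rows = warehouse.run(user, table)
        outcome = "ok"
        return rows
    except SourceDenied:
        outcome = "denied"
        raise
    finally:
        # Log every attempt, whether it succeeded or was rejected.
        AUDIT_LOG.append(AuditRecord(
            user, "stub_warehouse", statement,
            datetime.now(timezone.utc).isoformat(), outcome,
        ))
```

Note that execute_as contains no access rules of its own, which mirrors the "no permission duplication" point: the only place a grant is checked is inside the source.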

Compliance: GDPR, HIPAA, and data residency requirements are addressed structurally rather than through policy alone, because AstroBee never persists customer data.

Federation

Tasks that span systems are unanswerable by any single source. Consider “Correlate Gong call sentiment with Salesforce win rates for deals closing this quarter”: Salesforce knows deal stages and Gong knows conversation quality, but neither can answer alone. AstroBee extracts filtered subsets into an ephemeral workspace, joins them on shared keys, computes the answer, and discards everything. Each source independently validates credentials, so federation cannot be used to bypass permissions.

AstroBee acts as a translator and coordinator, never as a database. Customer data stays where it is, governed by the systems that already protect it.
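
The Gong and Salesforce example can be sketched end to end with stubbed sources. This assumes in-memory SQLite as the join engine and a thread pool for the parallel fetch; fetch_deals, fetch_call_sentiment, and sentiment_by_outcome are hypothetical functions returning made-up rows, not real Salesforce or Gong client calls.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fetch_deals(quarter: str) -> list[tuple[str, int]]:
    """Stub for a filtered Salesforce query: (deal_id, won as 0/1).
    A real version would filter by `quarter` at the source."""
    return [("d1", 1), ("d2", 0), ("d3", 1)]


def fetch_call_sentiment(quarter: str) -> list[tuple[str, float]]:
    """Stub for a filtered Gong extraction: (deal_id, sentiment)."""
    return [("d1", 0.75), ("d2", 0.25), ("d3", 0.5)]


def sentiment_by_outcome(quarter: str) -> list[tuple[int, float]]:
    # Push both filtered requests to their sources in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        deals_future = pool.submit(fetch_deals, quarter)
        calls_future = pool.submit(fetch_call_sentiment, quarter)
        deals, calls = deals_future.result(), calls_future.result()

    ws = sqlite3.connect(":memory:")  # ephemeral workspace for the join
    try:
        ws.execute("CREATE TABLE deals (deal_id TEXT, won INTEGER)")
        ws.execute("CREATE TABLE calls (deal_id TEXT, sentiment REAL)")
        ws.executemany("INSERT INTO deals VALUES (?, ?)", deals)
        ws.executemany("INSERT INTO calls VALUES (?, ?)", calls)
        # Join on the shared key and aggregate locally.
        return ws.execute(
            "SELECT d.won, AVG(c.sentiment) FROM deals d "
            "JOIN calls c ON c.deal_id = d.deal_id "
            "GROUP BY d.won ORDER BY d.won"
        ).fetchall()
    finally:
        ws.close()  # discard the partial results
```

Only the two filtered subsets are ever loaded, and both are gone once the connection closes; the caller sees just the aggregated answer.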

The Vision: A Unified Data Layer

The ultimate goal is a semantic data layer that spans multiple systems (warehouses, lakehouses, CRMs, and SaaS APIs), enabling organizations to ask questions across their entire data ecosystem without moving or duplicating data. The plan is to reach this in three phases:

Phase	Capability	Value
Phase 1	Single structured data source	Foundation: Prove the credential delegation model with warehouse queries
Phase 2	API data sources	Expansion: Extend to semi-structured data from SaaS applications
Phase 3	Federated queries across systems	Vision: Unified analytics across all data sources

Each phase builds on the previous, progressively expanding the types of data sources and query complexity supported while maintaining the core principles of zero data ingestion and native permission enforcement.

Deep Dives

Federated Query Layer

Core architecture, credential delegation, and the virtual semantic layer

API Data Sources

Dynamic extraction from Gong, Google Calendar, and other SaaS APIs

Cross-System Federation

Ephemeral joins across warehouses, lakehouses, and APIs

Security & Access Control

Multi-layer security model, zero-trust design, and compliance
