Coming Soon — This page describes an architecture that is currently in development and not yet generally available. Contact us to learn more.
AstroBee is a semantic orchestration layer, not a data store. Users describe tasks in natural language; an AI agent translates them into queries executed directly on the customer’s own infrastructure, using the user’s own credentials. No customer data is ingested or permanently stored; intermediate results exist only in short-lived, in-memory workspaces that are destroyed after each query.

How It Works

Task → AI Agent → Semantic Layer → Query Planning → Credential-Delegated Execution → Ephemeral Join (if needed) → Results → Cleanup
Semantic Layer: Admins define a virtual business model (entities, properties, relationships) mapped to external tables (Snowflake, Databricks, Salesforce) or API endpoints (Gong, Google Calendar). No data is moved; these are pointers.
Credential Delegation: Each user authenticates individually via OAuth to every connected source. Tokens are AES-256 encrypted at rest, decrypted only in-memory at query time, automatically refreshed, and never shared across users. AstroBee inherits each source’s native access control (Snowflake RBAC, Databricks Unity Catalog, Salesforce profiles) without replicating it.
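
The idea that the semantic layer holds only pointers can be made concrete with a small sketch. Everything here is an assumption for illustration: the class names, the `resolve` helper, and the field names such as `crm_deal_id` and `sentiment_score` are hypothetical and do not describe AstroBee's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    name: str          # business-facing name, e.g. "stage"
    source_field: str  # column or field name in the external system


@dataclass(frozen=True)
class Entity:
    name: str               # business entity, e.g. "Deal"
    source: str             # connected system that owns the data
    location: str           # table name or API endpoint (a pointer only)
    properties: tuple = ()  # tuple of Property objects


@dataclass(frozen=True)
class Relationship:
    left: str       # entity name on the left side
    right: str      # entity name on the right side
    left_key: str   # property on the left entity used to join
    right_key: str  # property on the right entity used to join


def resolve(entity, prop_name):
    """Translate a business property into (source, location, source_field)."""
    for prop in entity.properties:
        if prop.name == prop_name:
            return (entity.source, entity.location, prop.source_field)
    raise KeyError(f"{entity.name} has no property {prop_name!r}")


# Hypothetical model: all names below are illustrative.
deal = Entity("Deal", "salesforce", "Opportunity",
              (Property("id", "Id"), Property("stage", "StageName")))
call = Entity("Call", "gong", "calls",
              (Property("deal_id", "crm_deal_id"),
               Property("sentiment", "sentiment_score")))
deal_calls = Relationship("Deal", "Call", "id", "deal_id")
```

Note that the model stores no rows at all. `resolve(deal, "stage")` only says where the value lives, which is the information a query planner would need to generate source-specific SQL or an API call.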

Query Execution Paths

Each query takes one of three execution paths, depending on which sources the task touches:
Path | When | What Happens
--- | --- | ---
A (Direct) | Single warehouse | Dialect-specific SQL executes on source compute; only the result set crosses the boundary.
B (Extract) | Single API | Call API with user’s token, transform response into tabular format in an ephemeral workspace, query, destroy.
C (Federate) | Multiple sources | Push filtered queries to each source in parallel, load partial results into an ephemeral workspace, join locally, destroy.
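
The routing rule in the table above can be sketched as a small function. This is a minimal illustration under assumptions: the `SOURCE_KINDS` registry, the `Path` enum, and `choose_path` are hypothetical names, and the classification of each source as a warehouse or an API is a guess based on the examples this page gives.

```python
from enum import Enum


class Path(Enum):
    DIRECT = "A"    # single warehouse: run SQL on source compute
    EXTRACT = "B"   # single API: fetch, tabularize, query, destroy
    FEDERATE = "C"  # multiple sources: parallel pushdown, local join


# Hypothetical registry; the kind assigned to each source is an assumption.
SOURCE_KINDS = {
    "snowflake": "warehouse",
    "databricks": "warehouse",
    "gong": "api",
    "google_calendar": "api",
}


def choose_path(sources):
    """Pick an execution path from the set of sources a task touches."""
    if not sources:
        raise ValueError("a task must touch at least one source")
    unknown = sorted(s for s in sources if s not in SOURCE_KINDS)
    if unknown:
        raise ValueError(f"unregistered sources: {unknown}")
    if len(sources) > 1:
        return Path.FEDERATE
    (only,) = sources  # exactly one source remains
    return Path.DIRECT if SOURCE_KINDS[only] == "warehouse" else Path.EXTRACT
```

For example, `choose_path({"snowflake"})` yields `Path.DIRECT`, while `choose_path({"snowflake", "gong"})` yields `Path.FEDERATE`.
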
Ephemeral Workspace: For Paths B and C, an in-memory workspace is created per query, isolated per user, and destroyed when the query completes (a lifetime of seconds). Results are cached briefly to support pagination, then discarded.
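
As a rough sketch of the ephemeral workspace lifecycle, the example below uses an in-memory SQLite database as a stand-in. That choice is an assumption made for illustration; this page does not say which engine AstroBee uses. The names `ephemeral_workspace`, `run_extract_query`, and the `api_rows` table are hypothetical.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def ephemeral_workspace():
    """Yield a private in-memory database that is destroyed on exit."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    try:
        yield conn
    finally:
        conn.close()  # closing an in-memory database discards its contents


def run_extract_query(rows, sql):
    """Path B sketch: tabularize API rows, query them, destroy the workspace."""
    with ephemeral_workspace() as ws:
        ws.execute("CREATE TABLE api_rows (call_id TEXT, sentiment REAL)")
        ws.executemany(
            "INSERT INTO api_rows VALUES (:call_id, :sentiment)", rows
        )
        return ws.execute(sql).fetchall()


# Hypothetical API response, already parsed into dictionaries.
api_response = [
    {"call_id": "c1", "sentiment": 0.8},
    {"call_id": "c2", "sentiment": 0.2},
]
positive = run_extract_query(
    api_response, "SELECT call_id FROM api_rows WHERE sentiment > 0.5"
)
```

Because each call opens its own connection, two concurrent queries never share tables, which mirrors the per-query, per-user isolation described above.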

Security Model

The architecture is zero-trust by design — AstroBee assumes no inherent rights to customer data.
  • No service accounts. Every query runs as a specific user with their permissions. If a user can’t access a table in Snowflake, they can’t access it through AstroBee — the warehouse rejects the query.
  • No permission duplication. No parallel ACL system to maintain or drift out of sync.
  • Blast radius is per-user. A compromised token exposes only that user’s scope.
  • Full audit trail. Every query logged with user identity, source, SQL/API call, and timestamp.
  • Data residency preserved. Customer data never leaves their cloud region.
Compliance: GDPR, HIPAA, and data residency requirements are satisfied structurally — not through policy — because AstroBee never holds the data.
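
The audit-trail bullet above names four fields: user identity, source, SQL/API call, and timestamp. A minimal sketch of such a record follows. The `AuditRecord` class, its field names, and the JSON line format are assumptions for illustration, not AstroBee's actual log schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    user: str       # identity the query ran as
    source: str     # connected system that executed it
    statement: str  # the SQL text or API call that was issued
    timestamp: str  # ISO 8601, UTC


def audit_record(user, source, statement, now=None):
    """Build a record; `now` can be injected to make tests deterministic."""
    now = now or datetime.now(timezone.utc)
    return AuditRecord(user, source, statement, now.isoformat())


def to_log_line(record):
    """Serialize a record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical usage with a fixed clock.
fixed = datetime(2025, 1, 1, tzinfo=timezone.utc)
record = audit_record("ana@example.com", "snowflake", "SELECT 1", now=fixed)
line = to_log_line(record)
```

Making the record immutable (`frozen=True`) is a design choice in this sketch: an audit entry should not change after it is created.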

Federation

Tasks that span systems cannot be answered by any single source. Consider “Correlate Gong call sentiment with Salesforce win rates for deals closing this quarter”: Salesforce knows deal stages and Gong knows conversation quality, but neither can answer alone. AstroBee extracts filtered subsets into an ephemeral workspace, joins them on shared keys, computes the answer, and discards everything. Each source independently validates the user’s credentials, so federation offers no way to bypass permissions.
AstroBee acts as a translator and coordinator, never as a database. Customer data stays where it is, governed by the systems that already protect it.
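
The federated flow described above (parallel filtered fetches, a local join on a shared key, then cleanup) can be sketched as follows. The two fetch functions are hypothetical stand-ins that return canned rows instead of calling Salesforce or Gong. The `tokens` dictionary, the table layouts, and the use of in-memory SQLite as the workspace are all assumptions for illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fetch_deals(token):
    """Stand-in for a filtered Salesforce query run with the user's token."""
    return [("d1", "Closed Won"), ("d2", "Closed Lost")]


def fetch_calls(token):
    """Stand-in for a filtered Gong API call run with the user's token."""
    return [("d1", 0.75), ("d1", 0.25), ("d2", 0.25)]


def federate(tokens):
    """Fetch both subsets in parallel, join them locally, then clean up."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        deals_future = pool.submit(fetch_deals, tokens["salesforce"])
        calls_future = pool.submit(fetch_calls, tokens["gong"])
        deals, calls = deals_future.result(), calls_future.result()

    ws = sqlite3.connect(":memory:")  # ephemeral workspace for this query
    try:
        ws.execute("CREATE TABLE deals (deal_id TEXT, stage TEXT)")
        ws.execute("CREATE TABLE calls (deal_id TEXT, sentiment REAL)")
        ws.executemany("INSERT INTO deals VALUES (?, ?)", deals)
        ws.executemany("INSERT INTO calls VALUES (?, ?)", calls)
        return ws.execute(
            "SELECT d.stage, AVG(c.sentiment) "
            "FROM deals d JOIN calls c ON c.deal_id = d.deal_id "
            "GROUP BY d.stage ORDER BY d.stage"
        ).fetchall()
    finally:
        ws.close()  # partial results are discarded with the workspace


# Hypothetical per-user tokens; real ones would come from OAuth.
result = federate({"salesforce": "user-sf-token", "gong": "user-gong-token"})
```

Each fetch receives only the requesting user's token for that source, so in a real system either source could reject its half of the query independently.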

The Vision: A Unified Data Layer

The ultimate goal is to create a semantic data layer that spans multiple systems — warehouses, lakehouses, CRMs, and SaaS APIs — enabling organizations to ask questions across their entire data ecosystem without moving or duplicating data. This vision is achieved through a phased approach:
Phase | Capability | Value
--- | --- | ---
Phase 1 | Single structured data source | Foundation: Prove the credential delegation model with warehouse queries
Phase 2 | API data sources | Expansion: Extend to semi-structured data from SaaS applications
Phase 3 | Federated queries across systems | Vision: Unified analytics across all data sources
Each phase builds on the previous, progressively expanding the types of data sources and query complexity supported while maintaining the core principles of zero data ingestion and native permission enforcement.

Deep Dives