Coming Soon — This page describes an architecture that is currently in development and not yet generally available. Contact us to learn more.
The Power of Federation
The real value emerges when data from different systems can be joined. Consider questions like:
- “Show products with high sales (Snowflake) but low inventory (Databricks)”
- “For deals closing this quarter, correlate Gong call sentiment with win rate”
- “Which customers have the most touchpoints (Calendar meetings + Gong calls + Salesforce activities)?”
Ephemeral Federation Architecture
Since Snowflake and Databricks can’t directly query each other, AstroBee provides an ephemeral workspace: a temporary database that exists only for the duration of the query.
Federated Query Execution Flow
- Agent recognizes federation needed: the query spans two systems, with Sales in Snowflake and Inventory in Databricks
- Federation coordinator generates two SQL queries
  - Snowflake: Aggregate sales by product (last 30 days)
  - Databricks: Current inventory levels by product
- Partial results fetched into ephemeral workspace
  - Snowflake results: ~5,000 product-sales rows
  - Databricks results: ~10,000 product-inventory rows
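The flow above can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the ephemeral workspace, the two lists stand in for partial results already fetched from Snowflake and Databricks, and the function name, column names, and thresholds are hypothetical rather than part of AstroBee.

```python
import sqlite3

# Hypothetical partial results, standing in for rows fetched from each source.
snowflake_sales = [("P1", 1200), ("P2", 90), ("P3", 800)]       # (product_id, units_sold)
databricks_inventory = [("P1", 15), ("P2", 400), ("P3", 500)]   # (product_id, units_on_hand)


def high_sales_low_inventory(sales, inventory, min_sales=500, max_stock=50):
    """Join two partial result sets inside a throwaway in-memory database."""
    workspace = sqlite3.connect(":memory:")  # exists only for this call
    try:
        workspace.execute("CREATE TABLE sales (product_id TEXT, units_sold INTEGER)")
        workspace.execute("CREATE TABLE inventory (product_id TEXT, units_on_hand INTEGER)")
        workspace.executemany("INSERT INTO sales VALUES (?, ?)", sales)
        workspace.executemany("INSERT INTO inventory VALUES (?, ?)", inventory)
        return workspace.execute(
            """
            SELECT s.product_id, s.units_sold, i.units_on_hand
            FROM sales AS s
            JOIN inventory AS i ON i.product_id = s.product_id
            WHERE s.units_sold >= ? AND i.units_on_hand <= ?
            ORDER BY s.product_id
            """,
            (min_sales, max_stock),
        ).fetchall()
    finally:
        workspace.close()  # closing the connection discards the in-memory data


result = high_sales_low_inventory(snowflake_sales, databricks_inventory)
print(result)
```

Only the product that clears both thresholds survives the join; neither source system ever sees the other's rows.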
Cross-Source Analytics: Warehouse + API Federation
Federation extends beyond warehouse-to-warehouse joins. Phase 3 enables joining warehouse data with API-extracted data.
Example question: “For deals closing this quarter, show the correlation between Gong call sentiment, number of meetings, and win rate”
This query draws from three sources simultaneously:
| Source | Data | Fields |
|---|---|---|
| Gong (API) | Call sentiment | sentiment_score |
| Google Calendar (API) | Meeting count | meeting_count |
| Salesforce (Warehouse/API) | Opportunities | stage, amount, close_date |
The three result sets are joined on account_id in the ephemeral workspace, producing a correlation analysis that no single system could provide:
- Salesforce knows deal stages but not conversation quality
- Gong knows call sentiment but not meeting frequency
- Calendar knows meeting load but not deal outcomes
- Together: Complete view of customer engagement → deal success correlation
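As a rough sketch of the join on account_id, the example below combines three small extracts and computes a Pearson correlation between call sentiment and deal outcome. The sample values, the helper names `join_on_account` and `pearson`, and the use of the "Closed Won" stage as the win indicator are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical extracts, each keyed by account_id.
gong = {"A1": 0.8, "A2": 0.2, "A3": 0.6, "A4": 0.1}   # sentiment_score
calendar = {"A1": 6, "A2": 1, "A3": 4, "A4": 2}       # meeting_count
salesforce = {"A1": "Closed Won", "A2": "Closed Lost",
              "A3": "Closed Won", "A4": "Closed Lost"}  # stage


def join_on_account(gong, calendar, salesforce):
    """Inner-join the three extracts on account_id."""
    shared = sorted(set(gong) & set(calendar) & set(salesforce))
    return [
        (acct, gong[acct], calendar[acct], 1 if salesforce[acct] == "Closed Won" else 0)
        for acct in shared
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


rows = join_on_account(gong, calendar, salesforce)
r = pearson([row[1] for row in rows], [row[3] for row in rows])
print(rows)
print(round(r, 3))
```

The same `pearson` helper could be applied to the meeting_count column to compare the two engagement signals.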
Federation Security Model
Security Properties
- Multi-credential validation: Each source system independently verifies user permissions before returning data
- No permission bypass: If a user can’t see sensitive_products in Databricks, the federated query also can’t join it
- Isolated workspaces: Each user query gets a separate ephemeral database (no cross-contamination)
- Automatic cleanup: Workspace destroyed after 30–60 seconds (or on error), ensuring no data leakage
- Audit logging: All source queries and federation operations logged with user identity
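The isolation, cleanup, and audit properties can be illustrated with a context manager. This is a minimal sketch, not AstroBee's implementation: `ephemeral_workspace`, `audit_log`, and the user and query identifiers are hypothetical, an in-memory SQLite database stands in for the workspace, and the 30–60 second time-based expiry is not modeled.

```python
import sqlite3
from contextlib import contextmanager

audit_log = []  # stand-in for a real audit sink


@contextmanager
def ephemeral_workspace(user_id, query_id):
    """Yield a private in-memory database and always destroy it afterwards."""
    conn = sqlite3.connect(":memory:")  # one separate database per query
    audit_log.append((user_id, query_id, "workspace_created"))
    try:
        yield conn
    finally:
        conn.close()  # runs on success and on error
        audit_log.append((user_id, query_id, "workspace_destroyed"))


# Normal completion.
with ephemeral_workspace("alice", "q-1") as ws:
    ws.execute("CREATE TABLE partial_results (product_id TEXT)")

# Failure path: the workspace is still torn down and logged.
try:
    with ephemeral_workspace("bob", "q-2") as ws:
        raise RuntimeError("source query failed")
except RuntimeError:
    pass

print(audit_log)
```

Because each call opens its own connection, tables created for one query are never visible to another.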
Performance Considerations
Query Size Limits
- Practical limit: ~100,000 rows per source (prevents excessive data transfer)
- If a query exceeds the limit, AstroBee prompts: “Please add filters to reduce result size”
- Alternative: Push more computation to the source warehouse (aggregations before transfer)
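One way to enforce a per-source row limit is to stop reading as soon as the cap is exceeded, rather than counting after the full transfer. The sketch below assumes results arrive as an iterable of rows; `fetch_with_limit` and `ResultTooLarge` are hypothetical names, and only the limit and the prompt text come from the description above.

```python
MAX_ROWS_PER_SOURCE = 100_000


class ResultTooLarge(Exception):
    """Raised when a source returns more rows than the workspace accepts."""


def fetch_with_limit(rows, limit=MAX_ROWS_PER_SOURCE):
    """Collect rows from an iterable, aborting as soon as the limit is exceeded."""
    collected = []
    for row in rows:
        if len(collected) >= limit:
            raise ResultTooLarge("Please add filters to reduce result size")
        collected.append(row)
    return collected


# A small limit keeps the demonstration quick.
within_limit = fetch_with_limit(range(3), limit=3)

try:
    fetch_with_limit(range(4), limit=3)
    message = None
except ResultTooLarge as exc:
    message = str(exc)

print(within_limit, message)
```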
Network Latency
- Typical query time: 2–5 seconds (source execution + network transfer + local join)
- Parallel fetching reduces latency (Snowflake and Databricks queried simultaneously)
- For large joins: Consider materializing the shared data in one warehouse (e.g., via Databricks Delta Sharing or Snowflake data shares) so the join runs at the source
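Parallel fetching can be sketched with a thread pool, so the total wait is close to the slowest source rather than the sum of both. The two fetch functions below are hypothetical stand-ins that simulate latency with `time.sleep`; real code would issue the generated SQL through each warehouse's client.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_snowflake():
    time.sleep(0.2)  # simulated source execution + network transfer
    return [("P1", 1200)]


def fetch_databricks():
    time.sleep(0.2)  # simulated source execution + network transfer
    return [("P1", 15)]


def fetch_in_parallel(fetchers):
    """Run every fetcher concurrently and return results keyed by source name."""
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        return {name: future.result() for name, future in futures.items()}


start = time.perf_counter()
results = fetch_in_parallel({"snowflake": fetch_snowflake, "databricks": fetch_databricks})
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")  # roughly one sleep, not two, on an idle machine
```

Threads are a reasonable fit here because the work is I/O-bound: each worker spends its time waiting on a remote system.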
Cost Management
- Warehouse query costs billed directly to the customer (not AstroBee)
- Customer maintains full visibility into query usage via Snowflake/Databricks billing dashboards
- AstroBee can surface cost estimates before query execution (using warehouse APIs)
Data Source Support Matrix
| Source Type | Source | OAuth Support | Federation Support | Notes |
|---|---|---|---|---|
| Warehouse | Snowflake | Snowflake OAuth | Via ephemeral workspace | Key-pair auth also supported |
| Lakehouse | Databricks | OAuth 2.0 | Via ephemeral workspace | Unity Catalog integration |
| CRM | Salesforce | Salesforce OAuth | Via ephemeral workspace | Objects mapped to entities |
| API | Gong | OAuth 2.0 | Via extraction + ephemeral | Call data, transcripts, sentiment |
| API | Google Calendar | Google OAuth | Via extraction + ephemeral | Events, attendees, scheduling |
Source Type Characteristics
- Warehouse/Lakehouse (Snowflake, Databricks): Direct SQL queries against existing tables
- CRM (Salesforce): SOQL queries against objects, mapped to semantic entities
- API (Gong, Calendar): Dynamic extraction with on-the-fly schema inference
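On-the-fly schema inference for API extracts could look roughly like the sketch below, which scans JSON-like records and assigns each field a column type, widening when values disagree. The type names, the widening rules, and the sample fields other than sentiment_score are assumptions for illustration, not a description of AstroBee's extractor.

```python
def infer_schema(records):
    """Map each field to a SQL-style type based on the values observed."""
    type_names = {bool: "BOOLEAN", int: "INTEGER", float: "REAL", str: "TEXT"}
    schema = {}
    for record in records:
        for field, value in record.items():
            if value is None:
                schema.setdefault(field, None)  # type still unknown
                continue
            observed = type_names.get(type(value), "TEXT")
            previous = schema.get(field)
            if previous is None:
                schema[field] = observed
            elif previous != observed:
                # Widen INTEGER/REAL mixes to REAL, any other conflict to TEXT.
                mixed_numeric = {previous, observed} == {"INTEGER", "REAL"}
                schema[field] = "REAL" if mixed_numeric else "TEXT"
    # Fields that were only ever null fall back to TEXT.
    return {field: (col_type or "TEXT") for field, col_type in schema.items()}


# Hypothetical call records from an API extract.
records = [
    {"call_id": "c1", "sentiment_score": 1, "duration_s": 300, "notes": None},
    {"call_id": "c2", "sentiment_score": 0.4, "duration_s": 120},
]
schema = infer_schema(records)
print(schema)
```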

