Coming Soon — This page describes an architecture that is currently in development and not yet generally available. Contact us to learn more.
Phase 3 represents the culmination of the architecture vision: unified analytics across multiple data sources. Building on credential delegation (Phase 1) and API extraction (Phase 2), this phase enables queries that span warehouses, lakehouses, and SaaS APIs — joining data that has never lived in the same system.

The Power of Federation

The real value emerges when data from different systems can be joined. Consider questions like:
  • “Show products with high sales (Snowflake) but low inventory (Databricks)”
  • “For deals closing this quarter, correlate Gong call sentiment with win rate”
  • “Which customers have the most touchpoints (Calendar meetings + Gong calls + Salesforce activities)?”
These insights are impossible with any single system — they require federated queries.

Ephemeral Federation Architecture

Since Snowflake and Databricks can’t directly query each other, AstroBee provides an ephemeral workspace — a temporary database that exists only for the duration of the query.

Federated Query Execution Flow

  1. User asks question
       “Show products with high sales but low inventory”
  2. Agent recognizes federation needed
       The query spans two systems: Sales in Snowflake, Inventory in Databricks
  3. Federation coordinator generates two SQL queries
       • Snowflake: Aggregate sales by product (last 30 days)
       • Databricks: Current inventory levels by product
  4. Queries execute in parallel
       Each query runs with the user’s own token for that system
  5. Partial results fetched into ephemeral workspace
       • Snowflake results: ~5,000 product-sales rows
       • Databricks results: ~10,000 product-inventory rows
  6. Join executes locally
       SELECT s.product_name, s.quantity_sold, i.current_stock
       FROM sales s
       JOIN inventory i ON s.product_id = i.product_id
  7. Final results displayed
       ~5,000 rows displayed in AstroBee UI
  8. Workspace destroyed immediately
       DuckDB instance terminated, no data persists
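
Steps 4 through 8 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AstroBee's implementation: the two fetch functions are hypothetical stand-ins for the Snowflake and Databricks queries, the sample rows are made up, and the standard-library sqlite3 in-memory database stands in for the DuckDB workspace so the example runs without extra dependencies.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fetch_sales():
    # Hypothetical stand-in for the Snowflake query (sales by product, last 30 days).
    return [(1, "Widget", 950), (2, "Gadget", 40), (3, "Gizmo", 700)]


def fetch_inventory():
    # Hypothetical stand-in for the Databricks query (current stock by product).
    return [(1, 12), (2, 500), (4, 80)]


def run_federated_query():
    # Step 4: run both source queries in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        sales_future = pool.submit(fetch_sales)
        inventory_future = pool.submit(fetch_inventory)
        sales_rows = sales_future.result()
        inventory_rows = inventory_future.result()

    # Step 5: load the partial results into a temporary in-memory database.
    workspace = sqlite3.connect(":memory:")
    try:
        workspace.execute(
            "CREATE TABLE sales (product_id INTEGER, product_name TEXT, quantity_sold INTEGER)"
        )
        workspace.execute(
            "CREATE TABLE inventory (product_id INTEGER, current_stock INTEGER)"
        )
        workspace.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales_rows)
        workspace.executemany("INSERT INTO inventory VALUES (?, ?)", inventory_rows)

        # Step 6: the join runs locally, inside the workspace.
        return workspace.execute(
            "SELECT s.product_name, s.quantity_sold, i.current_stock "
            "FROM sales s JOIN inventory i ON s.product_id = i.product_id "
            "ORDER BY s.product_id"
        ).fetchall()
    finally:
        # Step 8: closing the connection discards the in-memory database.
        workspace.close()
```

Because the join is an inner join, products that appear in only one source (product 3 and product 4 in the sample rows) drop out of the result.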

Cross-Source Analytics: Warehouse + API Federation

Federation extends beyond warehouse-to-warehouse joins. Phase 3 enables joining warehouse data with API-extracted data.

Example question: “For deals closing this quarter, show the correlation between Gong call sentiment, number of meetings, and win rate”

This query draws from three sources simultaneously:
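| Source                     | Data           | Fields                    |
|----------------------------|----------------|---------------------------|
| Gong (API)                 | Call sentiment | sentiment_score           |
| Google Calendar (API)      | Meeting count  | meeting_count             |
| Salesforce (Warehouse/API) | Opportunities  | stage, amount, close_date |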
All three datasets are joined on account_id in the ephemeral workspace, producing a correlation analysis that no single system could provide:
  • Salesforce knows deal stages but not conversation quality
  • Gong knows call sentiment but not meeting frequency
  • Calendar knows meeting load but not deal outcomes
  • Together: Complete view of customer engagement → deal success correlation
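
As a rough sketch of what the three-way join and correlation might look like, the example below joins per-account values on account_id and computes a Pearson correlation against deal outcome. Everything here is illustrative: the rows are made up, the account IDs are hypothetical, and treating a stage of "Closed Won" as a win is an assumption about how win rate would be derived.

```python
from math import sqrt

# Made-up sample rows keyed by account_id (hypothetical values).
GONG = {"a1": 0.8, "a2": 0.2, "a3": 0.6, "a4": 0.1, "a5": 0.9}  # sentiment_score
CALENDAR = {"a1": 6, "a2": 1, "a3": 4, "a4": 2}                 # meeting_count
SALESFORCE = {                                                   # stage
    "a1": "Closed Won",
    "a2": "Closed Lost",
    "a3": "Closed Won",
    "a4": "Closed Lost",
}


def join_on_account(gong, calendar, salesforce):
    # Inner join: keep only accounts present in all three sources.
    shared = sorted(gong.keys() & calendar.keys() & salesforce.keys())
    return [
        (acct, gong[acct], calendar[acct], 1 if salesforce[acct] == "Closed Won" else 0)
        for acct in shared
    ]


def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rows = join_on_account(GONG, CALENDAR, SALESFORCE)
won = [row[3] for row in rows]
sentiment_vs_win = pearson([row[1] for row in rows], won)
meetings_vs_win = pearson([row[2] for row in rows], won)
```

Account a5 has Gong data but no calendar or Salesforce rows, so the inner join excludes it from the correlation.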

Federation Security Model

Security Properties

  • Multi-credential validation — Each source system independently verifies user permissions before returning data
  • No permission bypass — If a user can’t see sensitive_products in Databricks, the federated query also can’t join it
  • Isolated workspaces — Each user query gets a separate ephemeral database (no cross-contamination)
  • Automatic cleanup — Workspace destroyed after 30–60 seconds (or on error), ensuring no data leakage
  • Audit logging — All source queries and federation operations logged with user identity
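
The isolation, cleanup, and audit properties above can be illustrated with a context manager. This is a sketch under assumptions rather than the actual mechanism: `ephemeral_workspace` and the list-based audit log are hypothetical names, and sqlite3's in-memory database again stands in for the DuckDB workspace.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def ephemeral_workspace(user_id, audit_log):
    # Each call opens a fresh in-memory database, so queries never share state.
    conn = sqlite3.connect(":memory:")
    audit_log.append((user_id, "workspace_created"))
    try:
        yield conn
    finally:
        # Runs on success and on error alike, so the workspace never outlives the query.
        conn.close()
        audit_log.append((user_id, "workspace_destroyed"))
```

A caller would wrap the whole federated query in `with ephemeral_workspace(user_id, audit_log) as ws:`. If a source query raises partway through, the `finally` branch still closes the database and records the teardown.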

Performance Considerations

Query Size Limits

  • Practical limit: ~100,000 rows per source (prevents excessive data transfer)
  • If a query exceeds the limit, AstroBee prompts: “Please add filters to reduce result size”
  • Alternative: Push more computation to the source warehouse (aggregations before transfer)
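
One way a per-source row limit could be enforced is to stop reading as soon as the limit is exceeded, instead of transferring the full result first. The sketch below assumes this approach; `fetch_with_limit` and `ResultTooLargeError` are hypothetical names, and only the limit value and the prompt text come from this page.

```python
ROW_LIMIT = 100_000  # practical per-source limit described above


class ResultTooLargeError(Exception):
    """Raised when a source returns more rows than the workspace accepts."""


def fetch_with_limit(rows, limit=ROW_LIMIT):
    # Collect rows from any iterable (for example, a database cursor),
    # aborting as soon as one row beyond the limit arrives.
    collected = []
    for row in rows:
        if len(collected) >= limit:
            raise ResultTooLargeError("Please add filters to reduce result size")
        collected.append(row)
    return collected
```

Stopping at the first row past the limit keeps the transfer bounded even when the source result is much larger than expected.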

Network Latency

  • Typical query time: 2–5 seconds (source execution + network transfer + local join)
  • Parallel fetching reduces latency (Snowflake and Databricks queried simultaneously)
  • For large joins: Consider making both datasets available in one warehouse (e.g., via Databricks Delta Sharing or Snowflake data shares) so the join runs inside a single engine

Cost Management

  • Warehouse query costs billed directly to the customer (not AstroBee)
  • Customer maintains full visibility into query usage via Snowflake/Databricks billing dashboards
  • AstroBee can surface cost estimates before query execution (using warehouse APIs)

Data Source Support Matrix

| Source Type | Source          | OAuth Support    | Federation Support         | Notes                             |
|-------------|-----------------|------------------|----------------------------|-----------------------------------|
| Warehouse   | Snowflake       | Snowflake OAuth  | Via ephemeral workspace    | Key-pair auth also supported      |
| Lakehouse   | Databricks      | OAuth 2.0        | Via ephemeral workspace    | Unity Catalog integration         |
| CRM         | Salesforce      | Salesforce OAuth | Via ephemeral workspace    | Objects mapped to entities        |
| API         | Gong            | OAuth 2.0        | Via extraction + ephemeral | Call data, transcripts, sentiment |
| API         | Google Calendar | Google OAuth     | Via extraction + ephemeral | Events, attendees, scheduling     |

Source Type Characteristics

  • Warehouse/Lakehouse (Snowflake, Databricks): Direct SQL queries against existing tables
  • CRM (Salesforce): SOQL queries against objects, mapped to semantic entities
  • API (Gong, Calendar): Dynamic extraction with on-the-fly schema inference
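
To make "on-the-fly schema inference" concrete, here is one simple way column types could be inferred from extracted API records. It is an illustrative sketch, not a description of AstroBee's inference logic: the `infer_schema` function, the widening rules, and the target type names are all assumptions, as are the field names other than sentiment_score.

```python
def infer_schema(records):
    """Map each field name to a column type inferred from the observed values."""
    type_names = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE", str: "VARCHAR"}
    numeric = {"BIGINT", "DOUBLE"}
    schema = {}
    for record in records:
        for field, value in record.items():
            if value is None:
                # A null tells us the field exists but not its type.
                schema.setdefault(field, None)
                continue
            observed = type_names.get(type(value), "VARCHAR")
            current = schema.get(field)
            if current is None:
                schema[field] = observed
            elif current != observed:
                # Conflicting types: widen int/float to DOUBLE, anything else to VARCHAR.
                schema[field] = "DOUBLE" if {current, observed} <= numeric else "VARCHAR"
    # Fields that were only ever null fall back to VARCHAR.
    return {field: (col_type or "VARCHAR") for field, col_type in schema.items()}
```

For example, a field that arrives as an integer in one record and a float in the next ends up as DOUBLE, while a field that is null in every record defaults to VARCHAR.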

Next Steps