
Schema Discovery

Automatic understanding of your data relationships—no manual configuration required.

What is Schema Discovery?

Schema Discovery is how Shadowfax infers the structure of your data. When you import data, the AI analyzes column names, data types, and value patterns to work out how tables relate. It identifies likely foreign keys, suggests joins, and builds a schema map, all without you defining anything manually.

Figure: Schema Map. Automatically discovered relationships between datasets.

Why Schema Discovery Matters

Zero configuration: Import data and start analyzing—no setup needed.

Smart joins: The AI knows how tables relate, so it suggests correct join keys.

Error prevention: Reduces mistakes from incorrect table relationships.

Time savings: Skip hours of manual schema documentation.

Onboarding speed: New team members understand data structure visually.

How It Works

Automatic Analysis

When you add Sources to your Workbook:

  1. Column inspection: Examines column names for patterns (id, user_id, customer_id, etc.)
  2. Data type analysis: Checks if columns contain appropriate data for keys
  3. Relationship inference: Identifies likely foreign key relationships
  4. Cardinality detection: Determines one-to-one, one-to-many, or many-to-many relationships
  5. Naming pattern matching: Recognizes common naming conventions
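To make step 4 (cardinality detection) concrete, here is a minimal sketch of one way it could work: count duplicate values on each side of a candidate key pair. This is an illustration only, not Shadowfax's actual implementation, and the function name `detect_cardinality` is hypothetical.

```python
from collections import Counter


def detect_cardinality(left_values, right_values):
    """Classify how two key columns relate, based on duplicate values.

    Hypothetical helper: returns "one-to-one", "one-to-many",
    or "many-to-many". Nulls (None) are ignored.
    """
    left_counts = Counter(v for v in left_values if v is not None)
    right_counts = Counter(v for v in right_values if v is not None)
    left_unique = all(n == 1 for n in left_counts.values())
    right_unique = all(n == 1 for n in right_counts.values())
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique or right_unique:
        return "one-to-many"
    return "many-to-many"


# customers.id is unique; orders.customer_id repeats per customer.
customer_ids = [1, 2, 3]
order_customer_ids = [1, 1, 2, 3, 3]
print(detect_cardinality(customer_ids, order_customer_ids))  # one-to-many
```

A real system would likely sample large tables rather than scan every value, and would treat the result as a suggestion to confirm in the schema map.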

Visual Schema Map

Access the schema visualization to see:

  • All your Sources as entities
  • Relationship lines connecting related tables
  • Join keys labeled on each connection
  • Relationship types (one-to-many, etc.)

Figure: ER Diagram. Entity-relationship diagram showing discovered schema.

What Gets Discovered

Foreign Key Relationships

Pattern recognition:

  • customer_id in orders table → links to id in customers table
  • product_id → links to products table
  • user_id → links to users table
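The pattern above can be sketched as a simple name-based rule: a column ending in `_id` is matched to a table named after its stem, singular or plural, that has an `id` column. The function `infer_foreign_keys` and the dictionary layout are assumptions made for illustration. The product's real heuristics are not documented here and are probably richer.

```python
def infer_foreign_keys(schema):
    """Guess foreign keys from column names.

    schema: dict mapping table name -> list of column names (assumed layout).
    Returns a list of (table, column, referenced_table, referenced_column).
    """
    links = []
    for table, columns in schema.items():
        for column in columns:
            if not column.endswith("_id"):
                continue
            stem = column[: -len("_id")]  # "customer_id" -> "customer"
            # Try the stem itself, then a naive plural ("customers").
            for candidate in (stem, stem + "s"):
                if (
                    candidate in schema
                    and candidate != table
                    and "id" in schema[candidate]
                ):
                    links.append((table, column, candidate, "id"))
                    break
    return links


schema = {
    "orders": ["id", "customer_id", "product_id"],
    "customers": ["id", "name"],
    "products": ["id", "category"],
}
for link in infer_foreign_keys(schema):
    print(link)
```

Note that this naive pluralization would miss irregular names such as `category_id` referring to `categories`, which is the kind of case where explicit column context helps.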

Naming variations handled:

  • customerId (camelCase)
  • customer_id (snake_case)
  • CustomerID (PascalCase)
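One common way to treat these variations as the same name is to normalize every column name to snake_case before comparing. The sketch below assumes that approach. The helper name `normalize_column_name` is hypothetical, and the page does not state how Shadowfax does this internally.

```python
import re


def normalize_column_name(name):
    """Convert camelCase or PascalCase names to snake_case."""
    # Underscore at a lowercase/digit -> uppercase boundary:
    # "customerId" -> "customer_Id", "CustomerID" -> "Customer_ID"
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Underscore between an acronym and a following capitalized word:
    # "IDNumber" -> "ID_Number"
    s = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", s)
    return s.lower()


for raw in ("customerId", "customer_id", "CustomerID"):
    print(raw, "->", normalize_column_name(raw))
```

All three spellings normalize to `customer_id`, so they can be matched against the same foreign key pattern.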

Join Recommendations

When you ask to combine datasets:

@[orders] and @[customers] Show customer names with their orders

The AI already knows to join on orders.customer_id = customers.id because of schema discovery.
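For reference, the join behind a request like this might look as follows. This is a self-contained sketch using Python's built-in `sqlite3` module with made-up rows. The `name` and `total` columns are assumptions, and the SQL that Shadowfax actually generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
    """
)

# Join on the discovered key pair: orders.customer_id = customers.id
rows = conn.execute(
    """
    SELECT customers.name, orders.id, orders.total
    FROM orders
    JOIN customers ON orders.customer_id = customers.id
    ORDER BY orders.id
    """
).fetchall()

for row in rows:
    print(row)
```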

Data Type Inference

Shadowfax detects:

  • Primary keys: Unique identifiers
  • Foreign keys: References to other tables
  • Dates and times: For temporal analysis
  • Categorical fields: For grouping
  • Numeric measures: For aggregation
  • Text fields: For filtering and display
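As a rough illustration of how a column could be assigned to one of these roles, the sketch below applies simple rules to a column's name and values. The rules, the `max_categories` threshold, and the function name `classify_column` are all assumptions made for this example. They are not Shadowfax's documented behavior.

```python
from datetime import date, datetime


def classify_column(name, values, max_categories=10):
    """Assign a hypothetical role to a column from its name and values."""
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    if name == "id" and non_null and len(distinct) == len(non_null):
        return "primary key"
    if name.endswith("_id"):
        return "foreign key"
    if non_null and all(isinstance(v, (date, datetime)) for v in non_null):
        return "date/time"
    if non_null and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    ):
        return "numeric measure"
    # Few distinct values that repeat: useful for grouping.
    if non_null and len(distinct) <= max_categories and len(distinct) < len(non_null):
        return "categorical"
    return "text"


print(classify_column("id", [1, 2, 3]))
print(classify_column("customer_id", [1, 1, 2]))
print(classify_column("segment", ["SMB", "Enterprise", "SMB"]))
```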

Common Scenarios

Multi-Table Analysis

Your request:

@[orders], @[customers], and @[products] Show revenue by customer segment
and product category

What happens:

  1. AI sees three tables mentioned
  2. Checks discovered relationships
  3. Knows: orders.customer_id → customers.id
  4. Knows: orders.product_id → products.id
  5. Constructs correct joins automatically
  6. Groups by customer segment and product category

You do: Nothing—just mention the tables.
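The steps above correspond to a query along these lines. The sketch uses Python's built-in `sqlite3` module with sample rows. The `segment`, `category`, and `amount` columns are assumed names, and the generated SQL in your Workbook may look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        product_id INTEGER,
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Enterprise'), (2, 'SMB');
    INSERT INTO products VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO orders VALUES
        (1, 1, 1, 100.0), (2, 1, 2, 50.0), (3, 2, 2, 20.0), (4, 1, 1, 30.0);
    """
)

# Two joins from the discovered relationships, then the grouping.
revenue = conn.execute(
    """
    SELECT c.segment, p.category, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.id
    JOIN products AS p ON o.product_id = p.id
    GROUP BY c.segment, p.category
    ORDER BY c.segment, p.category
    """
).fetchall()

for row in revenue:
    print(row)
```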

Figure: Multi-Table Join. AI uses discovered schema to join multiple tables correctly.

Ambiguous Column Names

Scenario: Both "orders" and "returns" have a date column

What happens:

  • Schema discovery notes both columns
  • When you mention "date", AI asks which one you mean
  • Or uses context to infer (e.g., if you mentioned @[orders], uses orders.date)

Your role: Add clarifying context in column annotations to avoid ambiguity.
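The resolution logic described above can be sketched as follows: if only one table has the column, use it; if several do, prefer a table that was mentioned; otherwise stop and ask. The function `resolve_column` is hypothetical, and raising an error stands in for the AI asking a clarifying question.

```python
def resolve_column(column, schema, mentioned=()):
    """Return "table.column", or raise ValueError when the name is ambiguous.

    schema: dict mapping table name -> list of column names (assumed layout).
    mentioned: tables referenced in the request, e.g. via @[orders].
    """
    owners = [table for table, cols in schema.items() if column in cols]
    if not owners:
        raise KeyError(f"no table has a column named {column!r}")
    if len(owners) == 1:
        return f"{owners[0]}.{column}"
    in_context = [table for table in owners if table in mentioned]
    if len(in_context) == 1:
        return f"{in_context[0]}.{column}"
    raise ValueError(f"{column!r} is ambiguous; candidates: {sorted(owners)}")


schema = {"orders": ["id", "date"], "returns": ["id", "date"]}
print(resolve_column("date", schema, mentioned=["orders"]))  # orders.date
```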

Missing Relationships

Scenario: Two tables should relate but discovery missed it

What to do:

  • Tell the AI the join key explicitly:
    Join @[table1] with @[table2] on table1.custom_field = table2.id
  • The AI learns this relationship and reuses it for later queries in that Workbook

Viewing Your Schema

Accessing Schema Map

  1. Click the schema visualization icon (usually in the toolbar)
  2. See all Sources as connected entities
  3. Click relationships to see join keys
  4. Zoom and pan to explore complex schemas

Figure: Schema Visualization UI. Interactive schema map interface.

Understanding the Visualization

Boxes: Represent Sources (tables)

Lines: Represent relationships

Labels on lines: Show the join keys (e.g., "customer_id = id")

Line style: Indicates relationship type

  • Solid: One-to-many
  • Dashed: Many-to-many
  • Arrow direction: Shows the "many" side
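One way to picture what each line in the map encodes is as a small record holding the two endpoints, the join keys, and the relationship type. The `Relationship` class below is a hypothetical model for illustration, not the product's data format. Only the two line styles listed above are mapped, since the style for other types is not stated here.

```python
from dataclasses import dataclass

# Only the styles documented above; other types are left unspecified.
LINE_STYLES = {"one-to-many": "solid", "many-to-many": "dashed"}


@dataclass(frozen=True)
class Relationship:
    source_table: str
    source_column: str
    target_table: str
    target_column: str
    kind: str = "one-to-many"

    def label(self):
        """Text shown on the connecting line, e.g. "customer_id = id"."""
        return f"{self.source_column} = {self.target_column}"

    def line_style(self):
        return LINE_STYLES.get(self.kind, "unspecified")


rel = Relationship("orders", "customer_id", "customers", "id")
print(rel.label(), "|", rel.line_style())
```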

Enhancing Schema Discovery

Add Context to Columns

Help the AI understand your data better:

  1. After importing, add column context in the Source settings
  2. Explain unusual naming conventions
  3. Clarify what IDs represent
  4. Note any data quality issues

Example:

  • Column: cust_ref
  • Context: "This is the customer ID, links to customers.id"

Specify Join Conditions

If automatic discovery misses something:

@[orders] Join with @[special_discounts] where orders.promo_code =
special_discounts.code

The AI will note this relationship for future use.
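Conceptually, noting a relationship for future use resembles keeping a small lookup of table pairs and their join conditions. The sketch below assumes an in-memory registry. `RelationshipRegistry` and its methods are hypothetical names, and how Shadowfax actually stores this is not described on this page.

```python
class RelationshipRegistry:
    """Remembers user-specified join conditions between pairs of tables."""

    def __init__(self):
        self._links = {}

    def add(self, left_table, left_column, right_table, right_column):
        # Key on the unordered pair so lookups work in either direction.
        key = frozenset((left_table, right_table))
        self._links[key] = (
            f"{left_table}.{left_column} = {right_table}.{right_column}"
        )

    def join_condition(self, table_a, table_b):
        """Return the stored condition, or None if the pair is unknown."""
        return self._links.get(frozenset((table_a, table_b)))


registry = RelationshipRegistry()
registry.add("orders", "promo_code", "special_discounts", "code")
print(registry.join_condition("special_discounts", "orders"))
```

Keying on the unordered pair means only one condition is kept per pair of tables, so this sketch does not cover tables that can join on several different keys.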

Tips & Best Practices

Use consistent naming: Stick to one convention (snake_case or camelCase) across datasets.

Name foreign keys clearly: Use patterns like customer_id, product_id rather than generic ref or fk.

Review the schema map: Check that discovered relationships match your expectations.

Add context for unusual patterns: If your schema doesn't follow conventions, explain it in column annotations.

Leverage auto-discovery: Let the AI figure out relationships instead of specifying every join manually.

Verify complex joins: For multi-table joins, check the generated SQL to ensure correctness.

Teach the AI: When you correct a relationship, the AI learns for that Workbook.

Benefits Over Manual Schema Definition

Speed: Instant understanding vs. hours of documentation

Accuracy: Detects patterns you might miss

Maintenance-free: Automatically understands new tables

No configuration files: No need to write schema YAML or config

Smart defaults: Works out-of-the-box for standard patterns

Limitations

Non-standard schemas: Unusual naming patterns may require clarification

Implicit relationships: Relationships without naming hints might be missed

Multiple valid joins: When two tables can join on different keys, you may need to specify

In these cases, simply tell the AI explicitly and it will remember.

Related Topics

  • Sources - Where schema discovery happens
  • Views - Benefit from discovered relationships
  • AI Chat Interface - Where you leverage schema knowledge
  • @Mentions - Reference datasets the AI understands