Schema Discovery
Automatic understanding of your data relationships—no manual configuration required.
What is Schema Discovery?
Schema Discovery is how Shadowfax automatically figures out your data structure. When you import data, the AI analyzes column names, data types, and patterns to understand relationships between tables. It identifies foreign keys, suggests joins, and builds a schema map—all without you defining anything manually.
Automatically discovered relationships between datasets
Why Schema Discovery Matters
Zero configuration: Import data and start analyzing—no setup needed.
Smart joins: The AI knows how tables relate, so it suggests correct join keys.
Error prevention: Reduces mistakes from incorrect table relationships.
Time savings: Skip hours of manual schema documentation.
Onboarding speed: New team members understand data structure visually.
How It Works
Automatic Analysis
When you add Sources to your Workbook:
- Column inspection: Examines column names for patterns (id, user_id, customer_id, etc.)
- Data type analysis: Checks whether each column's values are suitable for use as a key
- Relationship inference: Identifies likely foreign key relationships
- Cardinality detection: Determines one-to-one, one-to-many, or many-to-many relationships
- Naming pattern matching: Recognizes common naming conventions
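To make the steps above concrete, here is a minimal Python sketch of name-based relationship inference. It is not Shadowfax's actual algorithm. The sample `tables` data and the helper names `singular` and `infer_foreign_keys` are hypothetical, and the rules (an `_id` suffix, a naive plural check, a value-subset test) are simplifying assumptions.

```python
# Hypothetical sample data: table name -> {column name -> list of values}
tables = {
    "customers": {"id": [1, 2, 3], "name": ["Ada", "Bo", "Cy"]},
    "orders": {
        "id": [10, 11, 12, 13],
        "customer_id": [1, 1, 2, 3],
        "total": [5.0, 7.5, 3.0, 9.9],
    },
}


def singular(table):
    """Naive singularization (assumption): 'customers' -> 'customer'."""
    return table[:-1] if table.endswith("s") else table


def infer_foreign_keys(tables):
    """Return (child, child_col, parent, parent_col, cardinality) guesses."""
    found = []
    for child, cols in tables.items():
        for col, values in cols.items():
            # Column inspection: only names ending in "_id" are candidates.
            if not col.endswith("_id"):
                continue
            stem = col[:-3]
            for parent, pcols in tables.items():
                if parent == child or singular(parent) != stem or "id" not in pcols:
                    continue
                parent_ids = pcols["id"]
                # Data type analysis: child values must share a type with parent ids.
                if {type(v) for v in values} != {type(v) for v in parent_ids}:
                    continue
                # Relationship inference: every child value must exist in the parent.
                if not set(values) <= set(parent_ids):
                    continue
                # Cardinality detection: repeated child values imply one-to-many.
                kind = "one-to-one" if len(set(values)) == len(values) else "one-to-many"
                found.append((child, col, parent, "id", kind))
    return found


print(infer_foreign_keys(tables))
```

With this sample data, the sketch reports a single one-to-many relationship from `orders.customer_id` to `customers.id`.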
Visual Schema Map
Access the schema visualization to see:
- All your Sources as entities
- Relationship lines connecting related tables
- Join keys labeled on each connection
- Relationship types (one-to-many, etc.)
Entity-relationship diagram showing discovered schema
What Gets Discovered
Foreign Key Relationships
Pattern recognition:
- customer_id in orders table → links to id in customers table
- product_id → links to products table
- user_id → links to users table
Naming variations handled:
- customerId (camelCase)
- customer_id (snake_case)
- CustomerID (PascalCase)
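One plausible way to treat these variations as the same column is to normalize every name to a single convention before comparing. The Python sketch below illustrates that idea. The function name `to_snake_case` is hypothetical and this is not a description of how Shadowfax does it internally.

```python
import re


def to_snake_case(name):
    """Normalize camelCase, PascalCase, or snake_case to lower snake_case."""
    # Break between a lower-case letter or digit and an upper-case letter:
    # "customerId" -> "customer_Id"
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Break between an acronym and a following capitalized word:
    # "HTTPServer" -> "HTTP_Server"
    s = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", s)
    return s.lower()


for raw in ["customerId", "customer_id", "CustomerID"]:
    print(raw, "->", to_snake_case(raw))
```

All three spellings normalize to `customer_id`, so a name-based comparison sees them as the same key.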
Join Recommendations
When you ask to combine datasets:
@[orders] and @[customers] Show customer names with their orders
The AI already knows to join on orders.customer_id = customers.id because of schema discovery.
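For reference, the join implied by that relationship looks roughly like the query below. This is an illustrative sketch using Python's built-in sqlite3 module with made-up rows. The `name` and `total` columns are assumptions, and the SQL that Shadowfax actually generates may differ.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo');
INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
""")

# Join each order to its customer on the discovered key pair.
rows = conn.execute("""
SELECT c.name, o.id, o.total
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id
ORDER BY o.id
""").fetchall()

for row in rows:
    print(row)
```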
Data Type Inference
Shadowfax detects:
- Primary keys: Unique identifiers
- Foreign keys: References to other tables
- Dates and times: For temporal analysis
- Categorical fields: For grouping
- Numeric measures: For aggregation
- Text fields: For filtering and display
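The categories above can be approximated with a few simple checks on a column's name and values. The Python sketch below is only an illustration under assumed rules. The function name `classify_column` and the `max_categories` threshold are hypothetical, and the product's real inference is likely more involved.

```python
from datetime import date, datetime


def classify_column(name, values, max_categories=10):
    """Guess a column's role from its name and values (simplified heuristic)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "unknown"
    distinct = set(non_null)
    lowered = name.lower()
    # Primary key: named "id" and every value is unique.
    if lowered == "id" and len(distinct) == len(non_null):
        return "primary key"
    # Foreign key: name follows the "<table>_id" pattern.
    if lowered.endswith("_id"):
        return "foreign key"
    if all(isinstance(v, (date, datetime)) for v in non_null):
        return "date/time"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        return "numeric measure"
    # Categorical: a small set of values, at least one of which repeats.
    if len(distinct) <= max_categories and len(distinct) < len(non_null):
        return "categorical"
    return "text"


print(classify_column("id", [1, 2, 3]))
print(classify_column("segment", ["smb", "smb", "enterprise"]))
```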
Common Scenarios
Multi-Table Analysis
Your request:
@[orders], @[customers], and @[products] Show revenue by customer segment
and product category
What happens:
- AI sees three tables mentioned
- Checks discovered relationships
- Knows: orders.customer_id → customers.id
- Knows: orders.product_id → products.id
- Constructs correct joins automatically
- Groups by customer segment and product category
You do: Nothing—just mention the tables.
AI uses discovered schema to join multiple tables correctly
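If you want to check the result yourself, the multi-table request above corresponds to a query along these lines. This is a hedged sketch using Python's built-in sqlite3 module. The `segment`, `category`, and `amount` columns and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, segment TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    product_id INTEGER,
    amount REAL
);
INSERT INTO customers VALUES (1, 'Enterprise'), (2, 'SMB');
INSERT INTO products VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO orders VALUES
    (1, 1, 1, 100.0), (2, 1, 2, 50.0), (3, 2, 2, 25.0), (4, 1, 1, 10.0);
""")

# Two joins follow the two discovered relationships, then revenue is grouped.
revenue = conn.execute("""
SELECT c.segment, p.category, SUM(o.amount) AS revenue
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id
JOIN products AS p ON o.product_id = p.id
GROUP BY c.segment, p.category
ORDER BY c.segment, p.category
""").fetchall()

for row in revenue:
    print(row)
```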
Ambiguous Column Names
Scenario: Both "orders" and "returns" have a date column
What happens:
- Schema discovery notes both columns
- When you mention "date", AI asks which one you mean
- Or uses context to infer (e.g., if you mentioned @[orders], uses orders.date)
Your role: Add clarifying context in column annotations to avoid ambiguity.
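The resolution logic described above can be sketched as follows: use the column directly if only one table has it, fall back to the tables you mentioned, and otherwise ask. The function `resolve_column` and the `schema` layout are hypothetical and only illustrate the behavior.

```python
def resolve_column(column, schema, mentioned=()):
    """Return 'table.column', or raise ValueError if the name is ambiguous."""
    candidates = [table for table, cols in schema.items() if column in cols]
    if not candidates:
        raise KeyError(f"no table has a column named '{column}'")
    if len(candidates) == 1:
        return f"{candidates[0]}.{column}"
    # Several tables share the name: prefer a table mentioned in the request.
    in_context = [table for table in candidates if table in mentioned]
    if len(in_context) == 1:
        return f"{in_context[0]}.{column}"
    raise ValueError(
        f"'{column}' is ambiguous (found in {', '.join(candidates)}); "
        "please specify a table"
    )


# Hypothetical schema: table name -> column names
schema = {
    "orders": ["id", "date", "total"],
    "returns": ["id", "date", "order_id"],
}

print(resolve_column("date", schema, mentioned=["orders"]))
```

Calling `resolve_column("date", schema)` with no mentioned table raises `ValueError`, which corresponds to the AI asking which column you mean.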
Missing Relationships
Scenario: Two tables should relate but discovery missed it
What happens:
- You explicitly tell the AI the join key:
Join @[table1] with @[table2] on table1.custom_field = table2.id
- AI learns and remembers this relationship for future queries
Viewing Your Schema
Accessing Schema Map
- Click the schema visualization icon (usually in the toolbar)
- See all Sources as connected entities
- Click relationships to see join keys
- Zoom and pan to explore complex schemas
Interactive schema map interface
Understanding the Visualization
Boxes: Represent Sources (tables)
Lines: Represent relationships
Labels on lines: Show the join keys (e.g., "customer_id = id")
Line style: Indicates relationship type
- Solid: One-to-many
- Dashed: Many-to-many
- Arrow direction: Shows the "many" side
Enhancing Schema Discovery
Add Context to Columns
Help the AI understand your data better:
- After importing, add column context in the Source settings
- Explain unusual naming conventions
- Clarify what IDs represent
- Note any data quality issues
Example:
- Column: cust_ref
- Context: "This is the customer ID, links to customers.id"
Specify Join Conditions
If automatic discovery misses something:
@[orders] Join with @[special_discounts] where orders.promo_code =
special_discounts.code
The AI will note this relationship for future use.
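Conceptually, remembering a relationship amounts to storing the key pair per Workbook and looking it up the next time both tables appear together. The Python sketch below shows one way to model that. The class names `Relationship` and `WorkbookSchema` are hypothetical and say nothing about how Shadowfax stores this internally.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    left_table: str
    left_column: str
    right_table: str
    right_column: str


@dataclass
class WorkbookSchema:
    # Keyed by the unordered pair of table names, so lookup order does not matter.
    relationships: dict = field(default_factory=dict)

    def remember(self, rel):
        """Store a user-specified relationship, replacing any earlier one."""
        self.relationships[frozenset((rel.left_table, rel.right_table))] = rel

    def join_condition(self, table_a, table_b):
        """Return the stored join condition for two tables, or None."""
        rel = self.relationships.get(frozenset((table_a, table_b)))
        if rel is None:
            return None
        return (
            f"{rel.left_table}.{rel.left_column} = "
            f"{rel.right_table}.{rel.right_column}"
        )


workbook = WorkbookSchema()
workbook.remember(Relationship("orders", "promo_code", "special_discounts", "code"))
print(workbook.join_condition("special_discounts", "orders"))
```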
Tips & Best Practices
Use consistent naming: Stick to one convention (snake_case or camelCase) across datasets.
Name foreign keys clearly: Use patterns like customer_id, product_id rather than generic ref or fk.
Review the schema map: Check that discovered relationships match your expectations.
Add context for unusual patterns: If your schema doesn't follow conventions, explain it in column annotations.
Leverage auto-discovery: Let the AI figure out relationships instead of specifying every join manually.
Verify complex joins: For multi-table joins, check the generated SQL to ensure correctness.
Teach the AI: When you correct a relationship, the AI remembers the correction for that Workbook.
Benefits Over Manual Schema Definition
Speed: Instant understanding vs. hours of documentation
Accuracy: Detects patterns you might miss
Maintenance-free: Automatically understands new tables
No configuration files: No need to write schema YAML or config
Smart defaults: Works out-of-the-box for standard patterns
Limitations
Non-standard schemas: Unusual naming patterns may require clarification
Implicit relationships: Relationships without naming hints might be missed
Multiple valid joins: When two tables can join on different keys, you may need to specify which key to use
In these cases, tell the AI the join key explicitly and it will remember the relationship.
Related Features
- Sources - Where schema discovery happens
- Views - Benefit from discovered relationships
- AI Chat Interface - Where you leverage schema knowledge
- @Mentions - Reference datasets the AI understands