Microsoft Acquires Osmos: Agentic Data Engineering Comes to Microsoft Fabric

In January 2026, Microsoft announced the acquisition of Osmos, an agentic AI data engineering platform that automates complex data transformation, integration, and quality tasks. This acquisition signals Microsoft’s commitment to bringing autonomous AI agents into the data engineering workflow within Microsoft Fabric. For data engineers struggling with repetitive ETL development, schema mapping, and data quality issues, Osmos promises to transform hours of manual work into minutes of AI-assisted automation.

What is Osmos?

Osmos is an agentic data engineering platform that uses AI agents to automate the traditionally manual and time-consuming aspects of data pipeline development:

  • Schema Mapping: Automatically map source schemas to target schemas, inferring transformations from examples
  • Data Transformation: Generate transformation logic from natural language descriptions
  • Data Quality: Identify anomalies, validate constraints, and suggest fixes autonomously
  • Pipeline Generation: Create complete ETL/ELT pipelines from high-level requirements
  • Error Resolution: Diagnose and fix pipeline failures without human intervention

Unlike traditional low-code data tools that still require significant manual configuration, Osmos takes an agentic approach—its AI actively explores data, proposes solutions, and iterates until requirements are met.
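
In practice, that propose-validate-iterate loop follows a recognizable control flow. The sketch below is illustrative Python pseudocode, not the Osmos API; `propose`, `validate`, and `refine` are assumed method names standing in for the agent's internal steps.

# Illustrative sketch of an agentic propose-validate-iterate loop.
# These method names are assumptions, not the real Osmos API.
def agentic_loop(agent, requirements: str, max_iterations: int = 5):
    proposal = agent.propose(requirements)           # initial solution
    for _ in range(max_iterations):
        issues = agent.validate(proposal, requirements)
        if not issues:
            return proposal                          # requirements met
        # Feed validation failures back as context and try again
        proposal = agent.refine(proposal, feedback=issues)
    raise RuntimeError("Requirements not met; escalating to a human")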

Why Microsoft Acquired Osmos

The acquisition addresses key challenges in Microsoft’s data platform strategy:

| Challenge | Current State in Fabric | Osmos Solution |
| --- | --- | --- |
| Schema mapping | Manual Data Factory mapping | AI-powered auto-mapping with confidence scores |
| Transformation authoring | Write Spark/SQL manually | Natural language to transformation code |
| Data quality | Separate tools/manual rules | Autonomous anomaly detection and remediation |
| Pipeline debugging | Manual log analysis | AI agent diagnoses and proposes fixes |
| Time to value | Days to weeks | Hours to production |

Osmos Integration Architecture in Fabric

graph TB
    subgraph Sources ["Data Sources"]
        S1["SQL Server"]
        S2["Salesforce"]
        S3["REST APIs"]
        S4["Files (CSV, JSON)"]
    end
    
    subgraph Fabric ["Microsoft Fabric"]
        subgraph Osmos ["Osmos Agent Layer"]
            SchemaAgent["Schema Mapping Agent"]
            TransformAgent["Transformation Agent"]
            QualityAgent["Data Quality Agent"]
            PipelineAgent["Pipeline Agent"]
        end
        
        subgraph Core ["Fabric Core"]
            Lakehouse["OneLake Lakehouse"]
            Warehouse["Synapse Warehouse"]
            Dataflow["Data Factory Pipelines"]
            Notebooks["Spark Notebooks"]
        end
        
        subgraph Output ["Analytics Layer"]
            PowerBI["Power BI"]
            Copilot["Fabric Copilot"]
        end
    end
    
    S1 --> SchemaAgent
    S2 --> SchemaAgent
    S3 --> SchemaAgent
    S4 --> SchemaAgent
    
    SchemaAgent --> TransformAgent
    TransformAgent --> QualityAgent
    QualityAgent --> PipelineAgent
    
    PipelineAgent --> Dataflow
    PipelineAgent --> Notebooks
    Dataflow --> Lakehouse
    Notebooks --> Lakehouse
    Lakehouse --> Warehouse
    Warehouse --> PowerBI
    Warehouse --> Copilot
    
    style Osmos fill:#E8F5E9,stroke:#2E7D32
    style Core fill:#E3F2FD,stroke:#1565C0
    style Output fill:#FFF3E0,stroke:#EF6C00

Core Capabilities

1. Intelligent Schema Mapping

from fabric.osmos import SchemaMapper

# Initialize schema mapper with source and target
mapper = SchemaMapper(
    source_connection="salesforce://myorg",
    target_lakehouse="lakehouse://sales_bronze"
)

# AI analyzes both schemas and proposes mappings
mapping_proposal = await mapper.auto_map()

# Review proposed mappings with confidence scores
for mapping in mapping_proposal.field_mappings:
    print(f"{mapping.source_field} -> {mapping.target_field}")
    print(f"  Confidence: {mapping.confidence:.0%}")
    print(f"  Transform: {mapping.suggested_transform or 'direct'}")
    print(f"  Reasoning: {mapping.reasoning}")

# Example output:
# Account.Name -> customer_name
#   Confidence: 98%
#   Transform: direct
#   Reasoning: Semantic match on 'name' field for customer entity
#
# Account.AnnualRevenue -> annual_revenue_usd
#   Confidence: 94%
#   Transform: CAST(value AS DECIMAL(18,2))
#   Reasoning: Currency field, target expects decimal type
#
# Account.BillingAddress -> billing_address_json
#   Confidence: 87%
#   Transform: TO_JSON(struct(street, city, state, zip))
#   Reasoning: Flattened struct to JSON for flexible querying

# Accept all high-confidence mappings, review others
approved_mappings = mapping_proposal.accept_above_confidence(0.90)
for low_conf in mapping_proposal.below_confidence(0.90):
    # Interactive review; review_with_user is a placeholder for your own
    # human-in-the-loop approval step
    approved = await review_with_user(low_conf)
    if approved:
        approved_mappings.add(approved)

2. Natural Language Transformation

from fabric.osmos import TransformationAgent

agent = TransformationAgent(lakehouse="sales_bronze")

# Describe transformation in natural language
transformation_request = """
From the raw_orders table:
1. Filter to only US orders from the last 90 days
2. Join with customers table on customer_id
3. Calculate total order value including tax (9.5% for CA, 8% for NY, 6% elsewhere)
4. Flag high-value orders (> $10,000) for priority processing
5. Aggregate by customer to get lifetime value
"""

# Agent generates and explains the transformation
result = await agent.generate_transformation(transformation_request)

print("Generated SQL:")
print(result.sql_code)

# Output:
# WITH us_orders AS (
#     SELECT o.*, 
#            c.customer_name,
#            c.state,
#            o.subtotal * (1 + CASE 
#                WHEN c.state = 'CA' THEN 0.095
#                WHEN c.state = 'NY' THEN 0.08
#                ELSE 0.06
#            END) AS total_with_tax,
#            CASE WHEN o.subtotal > 10000 THEN TRUE ELSE FALSE END AS is_priority
#     FROM raw_orders o
#     JOIN customers c ON o.customer_id = c.customer_id
#     WHERE c.country = 'US'
#       AND o.order_date >= CURRENT_DATE - INTERVAL 90 DAYS
# )
# SELECT customer_id,
#        customer_name,
#        COUNT(*) as order_count,
#        SUM(total_with_tax) as lifetime_value,
#        MAX(is_priority) as has_priority_orders
# FROM us_orders
# GROUP BY customer_id, customer_name

print("
Explanation:")
print(result.explanation)

# Validate before execution
validation = await agent.validate_transformation(result)
if validation.is_valid:
    await agent.execute(result, target_table="customer_lifetime_value")

💡 ITERATIVE REFINEMENT

The transformation agent supports iterative refinement. If results don’t match expectations, provide feedback like “tax calculation is wrong for Texas” and the agent will adjust the logic automatically.
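
A minimal sketch of what that feedback loop might look like in code, assuming a hypothetical refine method that accepts the previous result plus plain-English feedback (the method name and signature are assumptions, not documented API):

# Hypothetical refinement step; 'refine' and its parameters are assumptions
# illustrating the feedback loop described in the callout above.
result = await agent.generate_transformation(transformation_request)

# Results looked wrong for one state, so feed the correction back
result = await agent.refine(
    previous=result,
    feedback="Tax calculation is wrong for Texas: use 6.25% instead of the 6% default"
)

# Always re-validate before executing the adjusted logic
validation = await agent.validate_transformation(result)
if validation.is_valid:
    await agent.execute(result, target_table="customer_lifetime_value")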

3. Autonomous Data Quality

from fabric.osmos import DataQualityAgent

quality_agent = DataQualityAgent(
    lakehouse="sales_silver",
    monitoring_mode="continuous"
)

# Agent autonomously profiles data and learns patterns
profile = await quality_agent.profile_table("customer_orders")

print("Discovered Patterns:")
for pattern in profile.patterns:
    print(f"  {pattern.column}: {pattern.description}")
    print(f"    Rule: {pattern.inferred_rule}")

# Output:
# Discovered Patterns:
#   email: Email format with domain validation
#     Rule: REGEXP_LIKE(email, '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
#   order_total: Positive decimal, typically $50-$5000
#     Rule: order_total > 0 AND order_total < 50000
#   created_at: Timestamp, no future dates, monotonically increasing
#     Rule: created_at <= CURRENT_TIMESTAMP AND created_at >= '2020-01-01'

# Configure autonomous remediation
quality_agent.configure_remediation({
    "null_handling": "quarantine",  # Move to quarantine table
    "format_errors": "attempt_fix",  # Try to fix automatically
    "anomalies": "flag_for_review",  # Add flag, don't block
    "duplicates": "keep_latest"  # Dedup by keeping most recent
})

# Start continuous monitoring
await quality_agent.start_monitoring(
    alert_channel="teams://data-engineering",
    check_interval_minutes=15
)
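
As a point of reference, the "keep_latest" policy configured above maps to a dedup pattern you could hand-write in a Fabric Spark notebook. Here is a minimal PySpark sketch; the table and column names (order_id, updated_at) are assumptions for illustration:

# Hand-written equivalent of the "keep_latest" dedup policy.
# Table and column names are illustrative assumptions.
from pyspark.sql import functions as F
from pyspark.sql.window import Window

orders = spark.read.table("sales_silver.customer_orders")

# Rank records per business key, newest first, and keep rank 1
w = Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
deduped = (
    orders.withColumn("_rn", F.row_number().over(w))
          .filter(F.col("_rn") == 1)
          .drop("_rn")
)

deduped.write.mode("overwrite").saveAsTable("sales_silver.customer_orders_deduped")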

4. Pipeline Generation and Self-Healing

from fabric.osmos import PipelineAgent

pipeline_agent = PipelineAgent(workspace="sales-analytics")

# Generate complete pipeline from requirements
pipeline_spec = """
Build a daily pipeline that:
1. Extracts new/updated records from Salesforce (Accounts, Opportunities, Contacts)
2. Lands raw data in bronze layer with CDC tracking
3. Applies business transformations for silver layer
4. Creates aggregated tables for gold layer (sales by region, pipeline forecast)
5. Refreshes Power BI dataset
6. Sends Slack notification on completion or failure
"""

pipeline = await pipeline_agent.generate_pipeline(
    name="salesforce_daily_etl",
    specification=pipeline_spec,
    schedule="0 6 * * *"  # 6 AM daily
)

# Review generated pipeline
print(f"Generated {len(pipeline.activities)} activities:")
for activity in pipeline.activities:
    print(f"  {activity.name}: {activity.type}")
    print(f"    Depends on: {activity.dependencies}")

# Deploy with self-healing enabled
await pipeline.deploy(
    self_healing=True,
    max_auto_retry=3,
    escalate_after_failures=5
)

# When pipeline fails, agent automatically:
# 1. Analyzes error logs
# 2. Identifies root cause
# 3. Attempts automatic fix (schema drift, connection timeout, etc.)
# 4. Re-runs failed activities
# 5. Escalates to humans only if auto-fix fails
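
Conceptually, that behavior is the classic retry-and-escalate loop sketched below. This is generic control flow, not the Osmos implementation; diagnose, apply_fix, and escalate_to_humans are placeholders for the agent's log analysis and remediation steps:

# Generic retry-and-escalate sketch of the self-healing flow above.
# diagnose/apply_fix/escalate_to_humans are placeholder functions.
def run_with_self_healing(activity, max_auto_retry: int = 3):
    for attempt in range(1, max_auto_retry + 2):    # first run + retries
        try:
            return activity.run()
        except Exception as error:
            diagnosis = diagnose(error)              # analyze logs, find root cause
            if attempt > max_auto_retry or not diagnosis.fixable:
                escalate_to_humans(activity, error)  # human review required
                raise
            apply_fix(activity, diagnosis)           # e.g. refresh connection,
                                                     # adjust parameters
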
⚠️ SELF-HEALING BOUNDARIES

Osmos agents cannot modify production schemas or delete data without explicit approval. Self-healing is limited to retry logic, connection refresh, and parameter adjustment. Breaking changes require human review.

Integration with Fabric Copilot

Osmos enhances Fabric Copilot with data engineering-specific capabilities:

User: "Why is the sales dashboard showing incorrect numbers for Q4?"

Fabric Copilot + Osmos:
I've analyzed the data lineage and found the issue:

📊 Root Cause Analysis:
1. The sales_silver.orders table has 2,847 duplicate records
2. Duplicates were introduced on Dec 15 when the Salesforce sync
   ran twice due to a timeout retry
3. This inflated Q4 revenue by approximately $1.2M

🔧 Recommended Fix:
1. Run deduplication on sales_silver.orders keeping latest record
2. Add idempotency check to Salesforce sync pipeline
3. Refresh downstream aggregates

Would you like me to apply these fixes? I'll:
- Create backup of affected tables
- Execute dedup transformation
- Add pipeline guard against duplicate syncs
- Refresh Power BI dataset

[Apply Fixes] [Review Changes First] [Cancel]
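
The idempotency check recommended in that exchange is a pattern worth implementing regardless of tooling. A minimal sketch, assuming a watermark table tracked via get_last_sync and record_sync helpers (all names here are illustrative):

# Idempotency guard sketch: refuse to re-run a sync window that already
# completed. get_last_sync/record_sync/extract_and_load are placeholders.
from datetime import date

def run_salesforce_sync(sync_date: date):
    if get_last_sync("salesforce_orders") == sync_date:
        print(f"Sync for {sync_date} already ran; skipping to avoid duplicate loads")
        return
    extract_and_load("salesforce_orders", sync_date)  # the actual sync work
    record_sync("salesforce_orders", sync_date)       # mark window complete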

Pricing and Availability

| Feature | Availability | Pricing Model |
| --- | --- | --- |
| Schema Mapping Agent | GA (Q1 2026) | Included in Fabric capacity |
| Transformation Agent | GA (Q1 2026) | Included in Fabric capacity |
| Data Quality Agent | Preview (Q1 2026) | Preview: free; GA: capacity units |
| Pipeline Self-Healing | Preview (Q2 2026) | Premium add-on |
| Continuous Monitoring | Preview (Q2 2026) | Per-table pricing |

Key Takeaways

  • Microsoft’s Osmos acquisition brings agentic AI to data engineering workflows in Microsoft Fabric.
  • Schema mapping agents automate the tedious work of mapping sources to targets with confidence scores.
  • Natural language transformations let data engineers describe logic in plain English instead of writing SQL/Spark.
  • Autonomous data quality continuously monitors data, detects anomalies, and can auto-remediate common issues.
  • Self-healing pipelines diagnose and fix failures automatically, escalating to humans only when needed.

Conclusion

The Osmos acquisition positions Microsoft Fabric as not just a unified data platform, but an intelligent data platform where AI agents handle the repetitive, error-prone aspects of data engineering. For enterprises drowning in data integration projects, Osmos promises to dramatically reduce time-to-value while improving data quality. Early adopters should expect the schema mapping and transformation agents in Q1 2026, with more advanced capabilities rolling out throughout the year.
