Data Operations Analyst Guide¶
Role Overview¶
As a Data Operations Analyst, you are responsible for:
- Managing data ingestion from multiple sources (servicers, custodians, trustees)
- Configuring column mappings to normalize heterogeneous data formats
- Ensuring data quality through validation and cleansing
- Resolving data issues including schema drift and format changes
CalcBridge provides intelligent mapping tools and validation pipelines to streamline your data operations.
What CalcBridge Does for You
- Auto-detects column mappings using alias profiles
- Identifies schema drift before it causes calculation failures
- Validates data quality with configurable rules
- Provides detailed diagnostics for troubleshooting
Daily Workflow¶
flowchart LR
A[Upload Files] --> B[Configure Mappings]
B --> C[Validate Data]
C --> D[Resolve Issues]
D --> E[Confirm Import]
style A fill:#DBEAFE,stroke:#3B82F6
style B fill:#FEF3C7,stroke:#F59E0B
style C fill:#DCFCE7,stroke:#22C55E
style D fill:#FEE2E2,stroke:#EF4444
style E fill:#EDE9FE,stroke:#8B5CF6
Step 1: Upload Files¶
Receive and upload data files from various sources:
# Upload via API
curl -X POST https://api.calcbridge.io/api/v1/workbooks/upload \
-H "Authorization: Bearer $TOKEN" \
-F "file=@servicer_report.xlsx" \
-F "name=January 2024 Servicer Report" \
-F "source=servicer_abc"
Step 2: Configure Mappings¶
Review auto-detected mappings and adjust as needed:
Mapping Detection Results
=========================
Auto-matched: 45/50 columns (90%)
Needs Review: 5 columns
Unmatched Columns:
- "Loan Balance (USD)" -> Suggested: par_amount
- "Cpn Rate" -> Suggested: coupon
- "S&P Rtg" -> Suggested: rating_sp
- "Moody Rtg" -> Suggested: rating_moody
- "Sec Type" -> Suggested: security_type
Step 3: Validate Data¶
Run validation checks on imported data:
- Data type validation
- Required field checks
- Range validations
- Referential integrity
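These checks can be triggered via the validate endpoint; a minimal call is shown here, and the full set of options is covered under Running Validation later in this guide:
# Run validation on an uploaded workbook
curl -X POST "https://api.calcbridge.io/api/v1/workbooks/{id}/validate" \
  -H "Authorization: Bearer $TOKEN"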
Step 4: Resolve Issues¶
Address any data quality issues identified:
- Missing required fields
- Invalid data formats
- Out-of-range values
- Duplicate records
Step 5: Confirm Import¶
Finalize the import and make data available for analysis:
# Confirm import
curl -X POST "https://api.calcbridge.io/api/v1/workbooks/{id}/confirm" \
-H "Authorization: Bearer $TOKEN"
Key Features¶
Column Mapping¶
CalcBridge uses a two-tier mapping system:
| Tier | Description | Example |
|---|---|---|
| Alias Profiles | Pre-configured synonyms for common data sources | clo_holdings_standard.json |
| Custom Mappings | Tenant-specific overrides | Your organization's mappings |
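To see which mapping profiles are available to your tenant, list them via the mappings API (also listed in the Quick Reference below):
# List available mapping profiles
curl -X GET https://api.calcbridge.io/api/v1/mappings/profiles \
  -H "Authorization: Bearer $TOKEN"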
Schema Detection¶
Automatic detection of:
- Column names and positions
- Data types (number, text, date)
- Null patterns
- Unique identifiers
Data Validation¶
Built-in validation rules:
Validation Rules:
required_fields:
- cusip
- par_amount
- price
data_types:
par_amount: numeric
price: numeric
maturity_date: date
range_checks:
price:
min: 0
max: 200
par_amount:
min: 0
format_checks:
cusip:
pattern: "^[A-Z0-9]{9}$"
Bulk Operations¶
Support for high-volume data processing:
- Multi-file uploads
- Batch validation
- Scheduled imports
- Delta processing
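Multi-file uploads can also be scripted against the standard upload endpoint. This is a minimal sketch that assumes the month's files sit in a local incoming/ directory; the directory name and file-naming convention are illustrative:
# Upload every Excel file in the incoming directory
for f in incoming/*.xlsx; do
  curl -X POST https://api.calcbridge.io/api/v1/workbooks/upload \
    -H "Authorization: Bearer $TOKEN" \
    -F "file=@$f" \
    -F "name=$(basename "$f" .xlsx)"
done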
Step-by-Step Tutorials¶
Creating Mapping Profiles¶
Learn how to create reusable column mapping profiles for recurring data sources.
Understanding Alias Profiles¶
CalcBridge includes pre-built alias profiles for common data sources:
config/clo_aliases/
- clo_holdings_standard.json
- everest_extended.json
- ratings_export.json
- invesco_clo_holdings.json
- ft_tearsheet_holdings.json
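The shipped profile contents are not reproduced here; an alias profile maps source column names and their synonyms to canonical fields, much like the custom profiles in the tutorial below. As a purely illustrative sketch (the field names and aliases are assumptions, not the actual file contents), an entry in clo_holdings_standard.json might look like:
{
  "profile_name": "clo_holdings_standard",
  "column_mappings": {
    "CUSIP": {
      "target": "cusip",
      "type": "string",
      "aliases": ["Cusip", "CUSIP/ISIN"]
    }
  }
}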
Tutorial: Create a Custom Profile¶
1. Navigate to Mapping Configuration
Go to Data Ops > Mappings > Create New Profile
2. Define Profile Metadata
{
"profile_name": "servicer_xyz_monthly",
"description": "Monthly report format from Servicer XYZ",
"source_type": "excel",
"version": "1.0",
"created_date": "2024-01-15"
}
3. Define Column Mappings
{
"column_mappings": {
"Loan ID": {
"target": "loan_id",
"type": "string",
"required": true
},
"Outstanding Principal Balance": {
"target": "par_amount",
"type": "numeric",
"required": true,
"aliases": ["Principal Balance", "OPB", "Current Balance"]
},
"Coupon Rate (%)": {
"target": "coupon",
"type": "numeric",
"transform": "divide_by_100",
"aliases": ["Rate", "Cpn Rate", "Interest Rate"]
},
"Maturity": {
"target": "maturity_date",
"type": "date",
"format": "MM/DD/YYYY",
"aliases": ["Mat Date", "Maturity Date", "Final Maturity"]
}
}
}
4. Configure Validation Rules
{
"validation_rules": {
"par_amount": {
"min": 0,
"required": true
},
"coupon": {
"min": 0,
"max": 0.25,
"required": true
},
"maturity_date": {
"min": "today",
"required": true
}
}
}
5. Test the Profile
Upload a sample file using the new profile:
curl -X POST https://api.calcbridge.io/api/v1/workbooks/upload \
-H "Authorization: Bearer $TOKEN" \
-F "file=@sample_report.xlsx" \
-F "mapping_profile=servicer_xyz_monthly" \
-F "validate_only=true"
6. Save and Activate
Once testing is complete, save the profile for production use.
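If you manage profiles through the API rather than the UI, the create-profile endpoint from the Quick Reference can save it. This sketch assumes the JSON from steps 2-4 has been combined into a local file named servicer_xyz_monthly.json:
# Save the combined profile definition (file name is illustrative)
curl -X POST https://api.calcbridge.io/api/v1/mappings/profiles \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d @servicer_xyz_monthly.json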
Handling Schema Drift¶
Learn to detect and resolve schema changes in incoming data.
What is Schema Drift?¶
Schema drift occurs when the structure of incoming data changes:
- Column names change
- Columns are added or removed
- Data types change
- Column order changes
Schema Drift Impact
Undetected schema drift can cause:
- Calculation failures
- Missing data in reports
- Incorrect compliance test results
- Data quality issues
Drift Detection¶
CalcBridge automatically detects schema drift during import:
Schema Drift Detection Report
=============================
File: January_Report.xlsx
Profile: servicer_xyz_monthly
Drift Detected: YES
Changes Found:
Added Columns:
- "Spread to Benchmark" (position: 15)
Renamed Columns:
- "Loan ID" -> "Loan Identifier" (similarity: 92%)
Missing Columns:
- "Industry Code" (was required)
Type Changes:
- "Rating": text -> number (possible enum encoding change)
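The same report is available programmatically through the schema-drift endpoint listed in the Quick Reference:
# Fetch the schema drift report for a workbook
curl -X GET "https://api.calcbridge.io/api/v1/workbooks/{id}/schema-drift" \
  -H "Authorization: Bearer $TOKEN"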
Resolution Workflow¶
Step 1: Review Drift Report
Examine the detected changes and assess impact.
Step 2: Update Mapping Profile
For renamed columns, add the new name as an alias:
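For example, the drift report above shows "Loan ID" renamed to "Loan Identifier", so the profile entry gains an alias (excerpt; other fields unchanged):
"Loan ID": {
  "target": "loan_id",
  "type": "string",
  "required": true,
  "aliases": ["Loan Identifier"]
}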
For new columns, decide whether to map or ignore:
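To map the new "Spread to Benchmark" column, add an entry in the same format (the target field name spread_to_benchmark is illustrative); to ignore it, simply leave it out of the profile:
"Spread to Benchmark": {
  "target": "spread_to_benchmark",
  "type": "numeric",
  "required": false
}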
Step 3: Test Updated Profile
curl -X POST https://api.calcbridge.io/api/v1/workbooks/upload \
-H "Authorization: Bearer $TOKEN" \
-F "file=@January_Report.xlsx" \
-F "mapping_profile=servicer_xyz_monthly" \
-F "validate_only=true"
Step 4: Document Changes
Log the schema change for future reference:
{
"change_log": [
{
"date": "2024-01-15",
"change_type": "column_rename",
"old_name": "Loan ID",
"new_name": "Loan Identifier",
"action": "added_alias",
"source_notification": "Servicer email 2024-01-10"
}
]
}
Validating Imported Data¶
Comprehensive data validation workflow.
Validation Pipeline¶
flowchart TB
A[Raw Data] --> B[Schema Validation]
B --> C[Type Validation]
C --> D[Required Fields]
D --> E[Range Checks]
E --> F[Format Validation]
F --> G[Business Rules]
G --> H{All Pass?}
H -->|Yes| I[Import Complete]
H -->|No| J[Generate Error Report]
J --> K[Review & Fix]
K --> A
style A fill:#DBEAFE,stroke:#3B82F6
style I fill:#DCFCE7,stroke:#22C55E
style J fill:#FEE2E2,stroke:#EF4444
Validation Levels¶
| Level | Checks | Action on Failure |
|---|---|---|
| Error | Critical issues | Block import |
| Warning | Potential issues | Flag for review |
| Info | Suggestions | Log only |
Running Validation¶
# Run full validation
curl -X POST "https://api.calcbridge.io/api/v1/workbooks/{id}/validate" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"validation_level": "strict",
"include_warnings": true,
"generate_report": true
}'
Validation Report¶
Data Validation Report
======================
Workbook: January_Report.xlsx
Rows Processed: 1,247
Validation Level: Strict
Summary:
Errors: 3
Warnings: 12
Info: 45
Errors (Must Fix):
Row 156: par_amount is negative (-500000)
Row 892: cusip format invalid (12345)
Row 1003: maturity_date is in the past (2023-06-15)
Warnings (Review Recommended):
Row 45: price unusually high (175.50)
Row 234: coupon unusually low (0.001)
[10 more warnings...]
Info:
45 rows have optional fields missing
Fixing Validation Errors¶
Option 1: Fix at Source
Correct the data in the source file and re-upload.
Option 2: Fix in CalcBridge
Use the data editor to correct individual values:
# Update specific row
curl -X PATCH "https://api.calcbridge.io/api/v1/workbooks/{id}/sheets/{sheet}/data/156" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"par_amount": 500000
}'
Option 3: Apply Transform Rules
Create rules to automatically fix known issues:
{
"transform_rules": [
{
"condition": "par_amount < 0",
"action": "set_absolute_value"
},
{
"condition": "cusip.length < 9",
"action": "left_pad_zeros"
}
]
}
Tips and Best Practices¶
Mapping Management¶
Profile Organization
- Create separate profiles for each data source
- Version your profiles (v1.0, v1.1, v2.0)
- Document the reason for each mapping
- Archive old profiles instead of deleting
Data Quality¶
- Validate early: Run validation before heavy processing
- Track trends: Monitor error rates over time
- Root cause: Fix issues at the source when possible
- Automate: Set up scheduled validation jobs (see the sketch below)
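As a minimal sketch, a scheduled job can simply call the validate endpoint on a timer; this cron entry is illustrative and assumes your own tooling supplies the workbook ID and API token:
# Illustrative crontab entry: validate the latest workbook nightly at 02:00
0 2 * * * curl -s -X POST "https://api.calcbridge.io/api/v1/workbooks/$WORKBOOK_ID/validate" -H "Authorization: Bearer $TOKEN"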
Performance Optimization¶
| File Size | Recommendation |
|---|---|
| < 10 MB | Direct upload |
| 10-50 MB | Chunked upload |
| 50-100 MB | Background processing |
| > 100 MB | Split into multiple files |
Troubleshooting Common Issues¶
| Issue | Cause | Solution |
|---|---|---|
| Columns not mapping | New column name | Add alias to profile |
| Type conversion error | Unexpected format | Update type definition |
| Missing data | Schema drift | Review and update mappings |
| Slow upload | Large file | Use chunked upload |
| Duplicate records | File re-uploaded | Enable deduplication |
Quick Reference¶
File Format Support¶
| Format | Extension | Full Support |
|---|---|---|
| Excel 2007+ | .xlsx | Yes |
| Excel 97-2003 | .xls | Yes |
| Macro-enabled | .xlsm | Yes (macros ignored) |
| CSV | .csv | Yes |
API Endpoints for Data Ops¶
# Upload workbook
POST /api/v1/workbooks/upload
# Get mapping profiles
GET /api/v1/mappings/profiles
# Create mapping profile
POST /api/v1/mappings/profiles
# Validate workbook
POST /api/v1/workbooks/{id}/validate
# Get schema drift report
GET /api/v1/workbooks/{id}/schema-drift
# Get ingest diagnostics
GET /api/v1/workbooks/{id}/diagnostics
# Confirm import
POST /api/v1/workbooks/{id}/confirm
Diagnostic Metrics¶
{
"ingest_diagnostics": {
"schema_drift": {
"detected": true,
"changes": 3
},
"mapping_audit": {
"matched": 45,
"unmatched": 5,
"match_rate": 0.90
},
"input_checksum": "sha256:abc123...",
"formula_risk": {
"circular_refs": 0,
"external_refs": 2
}
}
}
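These metrics are returned by the diagnostics endpoint listed in the Quick Reference:
# Retrieve ingest diagnostics for a workbook
curl -X GET "https://api.calcbridge.io/api/v1/workbooks/{id}/diagnostics" \
  -H "Authorization: Bearer $TOKEN"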
Next Steps¶
- Complete this guide
- Upload your first data file
- Create a custom mapping profile
- Run data validation
- Set up scheduled imports