
Data Operations Analyst Guide

Welcome, Data Operations Analyst

This guide will help you leverage CalcBridge for efficient data ingestion, mapping configuration, and data quality management.


Role Overview

As a Data Operations Analyst, you are responsible for:

  • Managing data ingestion from multiple sources (servicers, custodians, trustees)
  • Configuring column mappings to normalize heterogeneous data formats
  • Ensuring data quality through validation and cleansing
  • Resolving data issues including schema drift and format changes

CalcBridge provides intelligent mapping tools and validation pipelines to streamline your data operations.

What CalcBridge Does for You

  • Auto-detects column mappings using alias profiles
  • Identifies schema drift before it causes calculation failures
  • Validates data quality with configurable rules
  • Provides detailed diagnostics for troubleshooting

Daily Workflow

flowchart LR
    A[Upload Files] --> B[Configure Mappings]
    B --> C[Validate Data]
    C --> D[Resolve Issues]
    D --> E[Confirm Import]

    style A fill:#DBEAFE,stroke:#3B82F6
    style B fill:#FEF3C7,stroke:#F59E0B
    style C fill:#DCFCE7,stroke:#22C55E
    style D fill:#FEE2E2,stroke:#EF4444
    style E fill:#EDE9FE,stroke:#8B5CF6

Step 1: Upload Files

Receive and upload data files from various sources:

# Upload via API
curl -X POST https://api.calcbridge.io/api/v1/workbooks/upload \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@servicer_report.xlsx" \
  -F "name=January 2024 Servicer Report" \
  -F "source=servicer_abc"

Step 2: Configure Mappings

Review auto-detected mappings and adjust as needed:

Mapping Detection Results
=========================
Auto-matched: 45/50 columns (90%)
Needs Review: 5 columns

Unmatched Columns:
- "Loan Balance (USD)" -> Suggested: par_amount
- "Cpn Rate" -> Suggested: coupon
- "S&P Rtg" -> Suggested: rating_sp
- "Moody Rtg" -> Suggested: rating_moody
- "Sec Type" -> Suggested: security_type

Step 3: Validate Data

Run validation checks on imported data:

  • Data type validation
  • Required field checks
  • Range validations
  • Referential integrity

Step 4: Resolve Issues

Address any data quality issues identified:

  • Missing required fields
  • Invalid data formats
  • Out-of-range values
  • Duplicate records

Step 5: Confirm Import

Finalize the import and make data available for analysis:

# Confirm import
curl -X POST "https://api.calcbridge.io/api/v1/workbooks/{id}/confirm" \
  -H "Authorization: Bearer $TOKEN"

Key Features

Column Mapping

CalcBridge uses a two-tier mapping system:

Tier              Description                                        Example
Alias Profiles    Pre-configured synonyms for common data sources    clo_holdings_standard.json
Custom Mappings   Tenant-specific overrides                          Your organization's mappings
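
Because custom mappings are tenant-specific overrides, they take precedence whenever they conflict with an alias profile, so you can redirect a single column without touching the shared profile. An illustrative override, assuming custom mappings use the same column_mappings structure as the tutorial profiles below ("Current Face" is a hypothetical source header):

{
  "column_mappings": {
    "Current Face": {
      "target": "par_amount",
      "type": "numeric"
    }
  }
}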

Schema Detection

Automatic detection of:

  • Column names and positions
  • Data types (number, text, date)
  • Null patterns
  • Unique identifiers

Data Validation

Built-in validation rules:

Validation Rules:
  required_fields:
    - cusip
    - par_amount
    - price

  data_types:
    par_amount: numeric
    price: numeric
    maturity_date: date

  range_checks:
    price:
      min: 0
      max: 200
    par_amount:
      min: 0

  format_checks:
    cusip:
      pattern: "^[A-Z0-9]{9}$"

Bulk Operations

Support for high-volume data processing:

  • Multi-file uploads (see the sketch after this list)
  • Batch validation
  • Scheduled imports
  • Delta processing
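
A minimal multi-file upload sketch built on the documented upload endpoint; the monthly_reports directory, file names, and the servicer_abc source tag are placeholders:

# Upload every Excel file in a directory as a separate workbook
for f in monthly_reports/*.xlsx; do
  curl -X POST https://api.calcbridge.io/api/v1/workbooks/upload \
    -H "Authorization: Bearer $TOKEN" \
    -F "file=@${f}" \
    -F "name=$(basename "$f" .xlsx)" \
    -F "source=servicer_abc"
done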

Step-by-Step Tutorials

Creating Mapping Profiles

Learn how to create reusable column mapping profiles for recurring data sources.

Understanding Alias Profiles

CalcBridge includes pre-built alias profiles for common data sources:

config/clo_aliases/
  - clo_holdings_standard.json
  - everest_extended.json
  - ratings_export.json
  - invesco_clo_holdings.json
  - ft_tearsheet_holdings.json

Tutorial: Create a Custom Profile

1. Navigate to Mapping Configuration

Go to Data Ops > Mappings > Create New Profile

2. Define Profile Metadata

{
  "profile_name": "servicer_xyz_monthly",
  "description": "Monthly report format from Servicer XYZ",
  "source_type": "excel",
  "version": "1.0",
  "created_date": "2024-01-15"
}

3. Define Column Mappings

{
  "column_mappings": {
    "Loan ID": {
      "target": "loan_id",
      "type": "string",
      "required": true
    },
    "Outstanding Principal Balance": {
      "target": "par_amount",
      "type": "numeric",
      "required": true,
      "aliases": ["Principal Balance", "OPB", "Current Balance"]
    },
    "Coupon Rate (%)": {
      "target": "coupon",
      "type": "numeric",
      "transform": "divide_by_100",
      "aliases": ["Rate", "Cpn Rate", "Interest Rate"]
    },
    "Maturity": {
      "target": "maturity_date",
      "type": "date",
      "format": "MM/DD/YYYY",
      "aliases": ["Mat Date", "Maturity Date", "Final Maturity"]
    }
  }
}

4. Configure Validation Rules

{
  "validation_rules": {
    "par_amount": {
      "min": 0,
      "required": true
    },
    "coupon": {
      "min": 0,
      "max": 0.25,
      "required": true
    },
    "maturity_date": {
      "min": "today",
      "required": true
    }
  }
}

5. Test the Profile

Upload a sample file using the new profile:

curl -X POST https://api.calcbridge.io/api/v1/workbooks/upload \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@sample_report.xlsx" \
  -F "mapping_profile=servicer_xyz_monthly" \
  -F "validate_only=true"

6. Save and Activate

Once testing is complete, save the profile for production use.
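
If you manage profiles through the API instead of the UI, the documented profiles endpoint accepts the new profile; a sketch assuming the profile definition above is saved locally as servicer_xyz_monthly.json (the exact request payload shape is not documented here, so treat it as illustrative):

# Create the mapping profile via the API
curl -X POST https://api.calcbridge.io/api/v1/mappings/profiles \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d @servicer_xyz_monthly.json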


Handling Schema Drift

Learn to detect and resolve schema changes in incoming data.

What is Schema Drift?

Schema drift occurs when the structure of incoming data changes:

  • Column names change
  • Columns are added or removed
  • Data types change
  • Column order changes

Schema Drift Impact

Undetected schema drift can cause:

  • Calculation failures
  • Missing data in reports
  • Incorrect compliance test results
  • Data quality issues

Drift Detection

CalcBridge automatically detects schema drift during import:

Schema Drift Detection Report
=============================
File: January_Report.xlsx
Profile: servicer_xyz_monthly

Drift Detected: YES

Changes Found:
  Added Columns:
    - "Spread to Benchmark" (position: 15)

  Renamed Columns:
    - "Loan ID" -> "Loan Identifier" (similarity: 92%)

  Missing Columns:
    - "Industry Code" (was required)

  Type Changes:
    - "Rating": text -> number (possible enum encoding change)

Resolution Workflow

Step 1: Review Drift Report

Examine the detected changes and assess impact.

Step 2: Update Mapping Profile

For renamed columns, add the new name as an alias:

{
  "Loan ID": {
    "target": "loan_id",
    "aliases": ["Loan Identifier", "LoanID", "ID"]
  }
}

For new columns, decide whether to map or ignore:

{
  "Spread to Benchmark": {
    "target": "spread_to_benchmark",
    "type": "numeric",
    "required": false
  }
}

For missing required columns, set a default or mark optional:

{
  "Industry Code": {
    "target": "industry_code",
    "required": false,
    "default": "UNKNOWN"
  }
}

Step 3: Test Updated Profile

curl -X POST https://api.calcbridge.io/api/v1/workbooks/upload \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@January_Report.xlsx" \
  -F "mapping_profile=servicer_xyz_monthly" \
  -F "validate_only=true"

Step 4: Document Changes

Log the schema change for future reference:

{
  "change_log": [
    {
      "date": "2024-01-15",
      "change_type": "column_rename",
      "old_name": "Loan ID",
      "new_name": "Loan Identifier",
      "action": "added_alias",
      "source_notification": "Servicer email 2024-01-10"
    }
  ]
}

Validating Imported Data

Comprehensive data validation workflow.

Validation Pipeline

flowchart TB
    A[Raw Data] --> B[Schema Validation]
    B --> C[Type Validation]
    C --> D[Required Fields]
    D --> E[Range Checks]
    E --> F[Format Validation]
    F --> G[Business Rules]
    G --> H{All Pass?}
    H -->|Yes| I[Import Complete]
    H -->|No| J[Generate Error Report]
    J --> K[Review & Fix]
    K --> A

    style A fill:#DBEAFE,stroke:#3B82F6
    style I fill:#DCFCE7,stroke:#22C55E
    style J fill:#FEE2E2,stroke:#EF4444

Validation Levels

Level     Checks             Action on Failure
Error     Critical issues    Block import
Warning   Potential issues   Flag for review
Info      Suggestions        Log only

Running Validation

# Run full validation
curl -X POST "https://api.calcbridge.io/api/v1/workbooks/{id}/validate" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "validation_level": "strict",
    "include_warnings": true,
    "generate_report": true
  }'

Validation Report

Data Validation Report
======================
Workbook: January_Report.xlsx
Rows Processed: 1,247
Validation Level: Strict

Summary:
  Errors: 3
  Warnings: 12
  Info: 45

Errors (Must Fix):
  Row 156: par_amount is negative (-500000)
  Row 892: cusip format invalid (12345)
  Row 1003: maturity_date is in the past (2023-06-15)

Warnings (Review Recommended):
  Row 45: price unusually high (175.50)
  Row 234: coupon unusually low (0.001)
  [10 more warnings...]

Info:
  45 rows have optional fields missing

Fixing Validation Errors

Option 1: Fix at Source

Correct the data in the source file and re-upload.

Option 2: Fix in CalcBridge

Use the data editor to correct individual values:

# Update specific row
curl -X PATCH "https://api.calcbridge.io/api/v1/workbooks/{id}/sheets/{sheet}/data/156" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "par_amount": 500000
  }'

Option 3: Apply Transform Rules

Create rules to automatically fix known issues:

{
  "transform_rules": [
    {
      "condition": "par_amount < 0",
      "action": "set_absolute_value"
    },
    {
      "condition": "cusip.length < 9",
      "action": "left_pad_zeros"
    }
  ]
}

Tips and Best Practices

Mapping Management

Profile Organization

  • Create separate profiles for each data source
  • Version your profiles (v1.0, v1.1, v2.0)
  • Document the reason for each mapping
  • Archive old profiles instead of deleting

Data Quality

  1. Validate early: Run validation before heavy processing
  2. Track trends: Monitor error rates over time
  3. Root cause: Fix issues at the source when possible
  4. Automate: Set up scheduled validation jobs
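
One way to automate validation is a scheduled job that calls the documented validate endpoint. A minimal sketch, assuming a host with cron and curl, and that WORKBOOK_ID and TOKEN are supplied by the environment:

#!/bin/sh
# validate_workbook.sh - scheduled validation run (illustrative)
curl -s -X POST "https://api.calcbridge.io/api/v1/workbooks/${WORKBOOK_ID}/validate" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"validation_level": "strict", "include_warnings": true, "generate_report": true}'

# Example crontab entry: run every weekday at 06:00
# 0 6 * * 1-5 /opt/calcbridge/validate_workbook.sh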

Performance Optimization

File Size    Recommendation
< 10 MB      Direct upload
10-50 MB     Chunked upload
50-100 MB    Background processing
> 100 MB     Split into multiple files
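
For CSV sources above 100 MB, standard shell tools can split the file while keeping the header row in each piece; a sketch (file names and the 200,000-row chunk size are arbitrary, and Excel sources would need a different approach):

# Keep the header, split the remaining rows into ~200,000-row chunks
head -n 1 large_holdings.csv > header.csv
tail -n +2 large_holdings.csv | split -l 200000 - chunk_
for f in chunk_*; do
  cat header.csv "$f" > "part_${f}.csv"   # each part is a valid CSV with its own header
  rm "$f"
done
# Upload each part_*.csv as in the multi-file example above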

Troubleshooting Common Issues

Issue                    Cause                Solution
Columns not mapping      New column name      Add alias to profile
Type conversion error    Unexpected format    Update type definition
Missing data             Schema drift         Review and update mappings
Slow upload              Large file           Use chunked upload
Duplicate records        File re-uploaded     Enable deduplication

Quick Reference

File Format Support

Format           Extension   Full Support
Excel 2007+      .xlsx       Yes
Excel 97-2003    .xls        Yes
Macro-enabled    .xlsm       Yes (macros ignored)
CSV              .csv        Yes

API Endpoints for Data Ops

# Upload workbook
POST /api/v1/workbooks/upload

# Get mapping profiles
GET /api/v1/mappings/profiles

# Create mapping profile
POST /api/v1/mappings/profiles

# Validate workbook
POST /api/v1/workbooks/{id}/validate

# Get schema drift report
GET /api/v1/workbooks/{id}/schema-drift

# Get ingest diagnostics
GET /api/v1/workbooks/{id}/diagnostics

# Confirm import
POST /api/v1/workbooks/{id}/confirm
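
For example, to list the mapping profiles available to your tenant:

curl -X GET https://api.calcbridge.io/api/v1/mappings/profiles \
  -H "Authorization: Bearer $TOKEN"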

Diagnostic Metrics
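
These metrics come from the ingest diagnostics endpoint listed above:

# Retrieve ingest diagnostics (replace {id} with the workbook id)
curl -X GET "https://api.calcbridge.io/api/v1/workbooks/{id}/diagnostics" \
  -H "Authorization: Bearer $TOKEN"

A typical response: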

{
  "ingest_diagnostics": {
    "schema_drift": {
      "detected": true,
      "changes": 3
    },
    "mapping_audit": {
      "matched": 45,
      "unmatched": 5,
      "match_rate": 0.90
    },
    "input_checksum": "sha256:abc123...",
    "formula_risk": {
      "circular_refs": 0,
      "external_refs": 2
    }
  }
}

Next Steps

  • Complete this guide
  • Upload your first data file
  • Create a custom mapping profile
  • Run data validation
  • Set up scheduled imports
