Data Operations Analyst Guide¶
Role Overview¶
As a Data Operations Analyst, you are responsible for:
- Managing data ingestion from multiple sources (servicers, custodians, trustees)
- Configuring column mappings to normalize heterogeneous data formats
- Ensuring data quality through validation and cleansing
- Resolving data issues including schema drift and format changes
CalcBridge provides intelligent mapping tools and validation pipelines to streamline your data operations.
What CalcBridge Does for You
- Auto-detects column mappings using alias profiles
- Identifies schema drift before it causes calculation failures
- Validates data quality with configurable rules
- Provides detailed diagnostics for troubleshooting
Daily Workflow¶
flowchart LR
A[Upload Files] --> B[Configure Mappings]
B --> C[Validate Data]
C --> D[Resolve Issues]
D --> E[Confirm Import]
style A fill:#DBEAFE,stroke:#3B82F6
style B fill:#FEF3C7,stroke:#F59E0B
style C fill:#DCFCE7,stroke:#22C55E
style D fill:#FEE2E2,stroke:#EF4444
style E fill:#EDE9FE,stroke:#8B5CF6
Step 1: Upload Files¶
Receive and upload data files from various sources:
# Upload via API
curl -X POST https://api.calcbridge.io/api/v1/workbooks/upload \
-H "Authorization: Bearer $TOKEN" \
-F "file=@servicer_report.xlsx" \
-F "name=January 2024 Servicer Report" \
-F "source=servicer_abc"
Step 2: Configure Mappings¶
Review auto-detected mappings and adjust as needed:
Mapping Detection Results
=========================
Auto-matched: 45/50 columns (90%)
Needs Review: 5 columns
Unmatched Columns:
- "Loan Balance (USD)" -> Suggested: par_amount
- "Cpn Rate" -> Suggested: coupon
- "S&P Rtg" -> Suggested: rating_sp
- "Moody Rtg" -> Suggested: rating_moody
- "Sec Type" -> Suggested: security_type
Step 3: Validate Data¶
Run validation checks on imported data:
- Data type validation
- Required field checks
- Range validations
- Referential integrity
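These checks can be triggered via the validate endpoint; a minimal call is shown here, and the full set of options is covered under Running Validation later in this guide:
# Run validation on an uploaded workbook
curl -X POST "https://api.calcbridge.io/api/v1/workbooks/{id}/validate" \
  -H "Authorization: Bearer $TOKEN"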
Step 4: Resolve Issues¶
Address any data quality issues identified:
- Missing required fields
- Invalid data formats
- Out-of-range values
- Duplicate records
Step 5: Confirm Import¶
Finalize the import and make data available for analysis:
# Confirm import
curl -X POST "https://api.calcbridge.io/api/v1/workbooks/{id}/confirm" \
-H "Authorization: Bearer $TOKEN"
Key Features¶
Column Mapping¶
CalcBridge uses a two-tier mapping system:
| Tier | Description | Example |
|---|---|---|
| Alias Profiles | Pre-configured synonyms for common data sources | clo_holdings_standard.json |
| Custom Mappings | Tenant-specific overrides | Your organization's mappings |
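To see which mapping profiles are available to your tenant, list them via the mappings API (also listed in the Quick Reference below):
# List available mapping profiles
curl -X GET https://api.calcbridge.io/api/v1/mappings/profiles \
  -H "Authorization: Bearer $TOKEN"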
Schema Detection¶
Automatic detection of:
- Column names and positions
- Data types (number, text, date)
- Null patterns
- Unique identifiers
Data Validation¶
Built-in validation rules:
Validation Rules:
required_fields:
- cusip
- par_amount
- price
data_types:
par_amount: numeric
price: numeric
maturity_date: date
range_checks:
price:
min: 0
max: 200
par_amount:
min: 0
format_checks:
cusip:
pattern: "^[A-Z0-9]{9}$"
Bulk Operations¶
Support for high-volume data processing:
- Multi-file uploads
- Batch validation
- Scheduled imports
- Delta processing
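Multi-file uploads can also be scripted against the standard upload endpoint. This is a minimal sketch that assumes the month's files sit in a local incoming/ directory; the directory name and file-naming convention are illustrative:
# Upload every Excel file in the incoming directory
for f in incoming/*.xlsx; do
  curl -X POST https://api.calcbridge.io/api/v1/workbooks/upload \
    -H "Authorization: Bearer $TOKEN" \
    -F "file=@$f" \
    -F "name=$(basename "$f" .xlsx)"
done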
Step-by-Step Tutorials¶
Creating Mapping Profiles¶
Learn how to create reusable column mapping profiles for recurring data sources.
Understanding Alias Profiles¶
CalcBridge includes pre-built alias profiles for common data sources:
config/clo_aliases/
- clo_holdings_standard.json
- everest_extended.json
- ratings_export.json
- invesco_clo_holdings.json
- ft_tearsheet_holdings.json
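The shipped profile contents are not reproduced here; an alias profile maps source column names and their synonyms to canonical fields, much like the custom profiles in the tutorial below. As a purely illustrative sketch (the field names and aliases are assumptions, not the actual file contents), an entry in clo_holdings_standard.json might look like:
{
  "profile_name": "clo_holdings_standard",
  "column_mappings": {
    "CUSIP": {
      "target": "cusip",
      "type": "string",
      "aliases": ["Cusip", "CUSIP/ISIN"]
    }
  }
}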
Tutorial: Create a Custom Profile¶
1. Navigate to Mapping Configuration
Go to Data Ops > Mappings > Create New Profile
2. Define Profile Metadata
{
"profile_name": "servicer_xyz_monthly",
"description": "Monthly report format from Servicer XYZ",
"source_type": "excel",
"version": "1.0",
"created_date": "2024-01-15"
}
3. Define Column Mappings
{
"column_mappings": {
"Loan ID": {
"target": "loan_id",
"type": "string",
"required": true
},
"Outstanding Principal Balance": {
"target": "par_amount",
"type": "numeric",
"required": true,
"aliases": ["Principal Balance", "OPB", "Current Balance"]
},
"Coupon Rate (%)": {
"target": "coupon",
"type": "numeric",
"transform": "divide_by_100",
"aliases": ["Rate", "Cpn Rate", "Interest Rate"]
},
"Maturity": {
"target": "maturity_date",
"type": "date",
"format": "MM/DD/YYYY",
"aliases": ["Mat Date", "Maturity Date", "Final Maturity"]
}
}
}
4. Configure Validation Rules
{
"validation_rules": {
"par_amount": {
"min": 0,
"required": true
},
"coupon": {
"min": 0,
"max": 0.25,
"required": true
},
"maturity_date": {
"min": "today",
"required": true
}
}
}
5. Test the Profile
Upload a sample file using the new profile:
curl -X POST https://api.calcbridge.io/api/v1/workbooks/upload \
-H "Authorization: Bearer $TOKEN" \
-F "file=@sample_report.xlsx" \
-F "mapping_profile=servicer_xyz_monthly" \
-F "validate_only=true"
6. Save and Activate
Once testing is complete, save the profile for production use.
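If you manage profiles through the API rather than the UI, the create-profile endpoint from the Quick Reference can save it. This sketch assumes the JSON from steps 2-4 has been combined into a local file named servicer_xyz_monthly.json:
# Save the combined profile definition (file name is illustrative)
curl -X POST https://api.calcbridge.io/api/v1/mappings/profiles \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d @servicer_xyz_monthly.json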
Handling Schema Drift¶
Learn to detect and resolve schema changes in incoming data.
What is Schema Drift?¶
Schema drift occurs when the structure of incoming data changes:
- Column names change
- Columns are added or removed
- Data types change
- Column order changes
Schema Drift Impact
Undetected schema drift can cause:
- Calculation failures
- Missing data in reports
- Incorrect compliance test results
- Data quality issues
Drift Detection¶
CalcBridge automatically detects schema drift during import:
Schema Drift Detection Report
=============================
File: January_Report.xlsx
Profile: servicer_xyz_monthly
Drift Detected: YES
Changes Found:
Added Columns:
- "Spread to Benchmark" (position: 15)
Renamed Columns:
- "Loan ID" -> "Loan Identifier" (similarity: 92%)
Missing Columns:
- "Industry Code" (was required)
Type Changes:
- "Rating": text -> number (possible enum encoding change)
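The same report is available programmatically through the schema-drift endpoint listed in the Quick Reference:
# Fetch the schema drift report for a workbook
curl -X GET "https://api.calcbridge.io/api/v1/workbooks/{id}/schema-drift" \
  -H "Authorization: Bearer $TOKEN"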
Resolution Workflow¶
Step 1: Review Drift Report
Examine the detected changes and assess impact.
Step 2: Update Mapping Profile
For renamed columns, add the new name as an alias:
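For example, the drift report above shows "Loan ID" renamed to "Loan Identifier", so the profile entry gains an alias (excerpt; other fields unchanged):
"Loan ID": {
  "target": "loan_id",
  "type": "string",
  "required": true,
  "aliases": ["Loan Identifier"]
}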
For new columns, decide whether to map or ignore:
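To map the new "Spread to Benchmark" column, add an entry in the same format (the target field name spread_to_benchmark is illustrative); to ignore it, simply leave it out of the profile:
"Spread to Benchmark": {
  "target": "spread_to_benchmark",
  "type": "numeric",
  "required": false
}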
Step 3: Test Updated Profile
curl -X POST https://api.calcbridge.io/api/v1/workbooks/upload \
-H "Authorization: Bearer $TOKEN" \
-F "file=@January_Report.xlsx" \
-F "mapping_profile=servicer_xyz_monthly" \
-F "validate_only=true"
Step 4: Document Changes
Log the schema change for future reference:
{
"change_log": [
{
"date": "2024-01-15",
"change_type": "column_rename",
"old_name": "Loan ID",
"new_name": "Loan Identifier",
"action": "added_alias",
"source_notification": "Servicer email 2024-01-10"
}
]
}
Validating Imported Data¶
Comprehensive data validation workflow.
Validation Pipeline¶
flowchart TB
A[Raw Data] --> B[Schema Validation]
B --> C[Type Validation]
C --> D[Required Fields]
D --> E[Range Checks]
E --> F[Format Validation]
F --> G[Business Rules]
G --> H{All Pass?}
H -->|Yes| I[Import Complete]
H -->|No| J[Generate Error Report]
J --> K[Review & Fix]
K --> A
style A fill:#DBEAFE,stroke:#3B82F6
style I fill:#DCFCE7,stroke:#22C55E
style J fill:#FEE2E2,stroke:#EF4444
Validation Levels¶
| Level | Checks | Action on Failure |
|---|---|---|
| Error | Critical issues | Block import |
| Warning | Potential issues | Flag for review |
| Info | Suggestions | Log only |
Running Validation¶
# Run full validation
curl -X POST "https://api.calcbridge.io/api/v1/workbooks/{id}/validate" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"validation_level": "strict",
"include_warnings": true,
"generate_report": true
}'
Validation Report¶
Data Validation Report
======================
Workbook: January_Report.xlsx
Rows Processed: 1,247
Validation Level: Strict
Summary:
Errors: 3
Warnings: 12
Info: 45
Errors (Must Fix):
Row 156: par_amount is negative (-500000)
Row 892: cusip format invalid (12345)
Row 1003: maturity_date is in the past (2023-06-15)
Warnings (Review Recommended):
Row 45: price unusually high (175.50)
Row 234: coupon unusually low (0.001)
[10 more warnings...]
Info:
45 rows have optional fields missing
Fixing Validation Errors¶
Option 1: Fix at Source
Correct the data in the source file and re-upload.
Option 2: Fix in CalcBridge
Use the data editor to correct individual values:
# Update specific row
curl -X PATCH "https://api.calcbridge.io/api/v1/workbooks/{id}/sheets/{sheet}/data/156" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"par_amount": 500000
}'
Option 3: Apply Transform Rules
Create rules to automatically fix known issues:
{
"transform_rules": [
{
"condition": "par_amount < 0",
"action": "set_absolute_value"
},
{
"condition": "cusip.length < 9",
"action": "left_pad_zeros"
}
]
}
Tips and Best Practices¶
Mapping Management¶
Profile Organization
- Create separate profiles for each data source
- Version your profiles (v1.0, v1.1, v2.0)
- Document the reason for each mapping
- Archive old profiles instead of deleting
Data Quality¶
- Validate early: Run validation before heavy processing
- Track trends: Monitor error rates over time
- Root cause: Fix issues at the source when possible
- Automate: Set up scheduled validation jobs (see the sketch below)
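As a minimal sketch, a scheduled job can simply call the validate endpoint on a timer; this cron entry is illustrative and assumes your own tooling supplies the workbook ID and API token:
# Illustrative crontab entry: validate the latest workbook nightly at 02:00
0 2 * * * curl -s -X POST "https://api.calcbridge.io/api/v1/workbooks/$WORKBOOK_ID/validate" -H "Authorization: Bearer $TOKEN"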
Performance Optimization¶
| File Size | Recommendation |
|---|---|
| < 10 MB | Direct upload |
| 10-50 MB | Chunked upload |
| 50-100 MB | Background processing |
| > 100 MB | Split into multiple files |
Troubleshooting Common Issues¶
| Issue | Cause | Solution |
|---|---|---|
| Columns not mapping | New column name | Add alias to profile |
| Type conversion error | Unexpected format | Update type definition |
| Missing data | Schema drift | Review and update mappings |
| Slow upload | Large file | Use chunked upload |
| Duplicate records | File re-uploaded | Enable deduplication |
Quick Reference¶
File Format Support¶
| Format | Extension | Full Support |
|---|---|---|
| Excel 2007+ | .xlsx | Yes |
| Excel 97-2003 | .xls | Yes |
| Macro-enabled | .xlsm | Yes (macros ignored) |
| CSV | .csv | Yes |
API Endpoints for Data Ops¶
# Upload workbook
POST /api/v1/workbooks/upload
# Get mapping profiles
GET /api/v1/mappings/profiles
# Create mapping profile
POST /api/v1/mappings/profiles
# Validate workbook
POST /api/v1/workbooks/{id}/validate
# Get schema drift report
GET /api/v1/workbooks/{id}/schema-drift
# Get ingest diagnostics
GET /api/v1/workbooks/{id}/diagnostics
# Confirm import
POST /api/v1/workbooks/{id}/confirm
Diagnostic Metrics¶
{
"ingest_diagnostics": {
"schema_drift": {
"detected": true,
"changes": 3
},
"mapping_audit": {
"matched": 45,
"unmatched": 5,
"match_rate": 0.90
},
"input_checksum": "sha256:abc123...",
"formula_risk": {
"circular_refs": 0,
"external_refs": 2
}
}
}
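These metrics are returned by the diagnostics endpoint listed in the Quick Reference:
# Retrieve ingest diagnostics for a workbook
curl -X GET "https://api.calcbridge.io/api/v1/workbooks/{id}/diagnostics" \
  -H "Authorization: Bearer $TOKEN"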
Next Steps¶
- Complete this guide
- Upload your first data file
- Create a custom mapping profile
- Run data validation
- Set up scheduled imports