
Validation Overview

Validation safeguards run at every stage of document processing, from initial document analysis to final data export, to catch errors early and preserve data integrity.

Validation Scope

Document Processing Validation

Validation occurs at multiple stages of the document processing pipeline:

  • Extraction Validation - Verify extracted field data, apply business rules, and validate against reference datasets
  • Segmentation Validation - Ensure proper document section identification and boundary detection
  • Classification Validation - Confirm document type identification and routing accuracy
  • Post-Processing Validation - Validate data transformations and export formatting

Quality Assurance Framework

Multi-layered approach to data quality:

  • Automated Validation - Real-time checks during processing with immediate feedback
  • Business Rule Validation - Custom logic implementation for domain-specific requirements
  • Reference Data Validation - Verification against external datasets and lookup tables
  • Manual Review Integration - Human oversight for complex validation scenarios

Lookup Lists

Validate data against reference datasets and provide autocomplete functionality:

  • Reference Data - Validate extracted values against predefined lists
  • Fuzzy Matching - Handle variations in data formatting and spelling
  • Autocomplete - Provide suggested values during manual data entry
  • Translation Support - Map values to standardized formats
  • Multi-Column Validation - Use multiple data points for validation accuracy
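A minimal Python sketch of how lookup-list validation with fuzzy matching, translation, multi-column checks, and autocomplete could fit together, assuming the list is held in memory as rows of columns. The names `LOOKUP_ROWS`, `validate_against_lookup`, `autocomplete`, the column names, and the 0.8 cutoff are hypothetical; fuzzy matching here uses the standard library's `difflib`, which may behave differently from the platform's own matcher.

```python
from difflib import get_close_matches

# Hypothetical lookup list: several columns per row, so a match can be
# confirmed with more than one data point (multi-column validation).
LOOKUP_ROWS = [
    {"vendor": "Acme GmbH", "country": "DE", "code": "V-001"},
    {"vendor": "Globex Corporation", "country": "US", "code": "V-002"},
    {"vendor": "Initech Ltd", "country": "GB", "code": "V-003"},
]


def validate_against_lookup(vendor, country=None, cutoff=0.8):
    """Return (status, row) for an extracted vendor name."""
    names = [row["vendor"] for row in LOOKUP_ROWS]
    if vendor in names:
        status, matched = "valid", vendor
    else:
        # Fuzzy matching tolerates small spelling and formatting variations.
        close = get_close_matches(vendor, names, n=1, cutoff=cutoff)
        if not close:
            return "error", None
        status, matched = "warning", close[0]  # similar but not exact
    row = next(r for r in LOOKUP_ROWS if r["vendor"] == matched)
    # Multi-column check: a second extracted field must agree with the same row.
    if country is not None and row["country"] != country:
        return "error", None
    # Translation: the row's "code" column is the standardized value to export.
    return status, row


def autocomplete(prefix, limit=5):
    """Suggest lookup values during manual entry (case-insensitive prefix match)."""
    p = prefix.casefold()
    return [r["vendor"] for r in LOOKUP_ROWS if r["vendor"].casefold().startswith(p)][:limit]
```

Returning a warning for a fuzzy match, rather than an error, corresponds to the "Reference Mismatches" warning described under Error Handling.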

Business Rules

Implement complex validation logic through MongoDB aggregation queries:

  • Data Transformation - Modify extracted data based on business logic
  • Cross-Field Validation - Validate data across multiple fields simultaneously
  • Conditional Logic - Apply validation rules based on document context
  • Automated Corrections - Fix common data extraction errors automatically
  • Audit Trails - Track all rule applications and data modifications
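Business rules themselves are written as MongoDB aggregation queries. The Python sketch below only illustrates the ideas in plain code: a conditional, automated correction of a common extraction error, recorded in an audit trail. The rule, the field names (`document_type`, `invoice_number`), and the audit record layout are hypothetical.

```python
from datetime import datetime, timezone


def normalize_invoice_number(doc, audit_log, rule_name="normalize_invoice_number"):
    """Hypothetical rule: remove whitespace from invoice numbers and upper-case them."""
    # Conditional logic: the rule only applies to invoice documents.
    if doc.get("document_type") != "invoice" or "invoice_number" not in doc:
        return doc
    old = doc["invoice_number"]["value"]
    new = "".join(old.split()).upper()  # automated correction
    if new != old:
        doc["invoice_number"]["value"] = new
        # Audit trail: record which rule changed which field, and how.
        audit_log.append({
            "rule": rule_name,
            "field": "invoice_number",
            "old": old,
            "new": new,
            "at": datetime.now(timezone.utc).isoformat(),
        })
    return doc
```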

Validation Workflow

1. Enable Validation

Configure validation in the project settings or in a pipeline step's settings:

  • Navigate to the Validation tab in the project settings
  • Toggle "Enable data validation"
  • Configure field-specific validation rules
  • Set up templates and reference data as needed

2. Configure Field Rules

Define validation rules for each extraction field:

  • Data Types - Set appropriate field types (string, number, date, etc.)
  • Validation Settings - Configure type-specific validation parameters
  • Required Status - Mark fields as mandatory or optional
  • Additional Settings - Set up lookup lists, APIs, or calculated fields

3. Structure Templates

Organize fields into logical templates for consistent presentation:

  • Enable Templates - Toggle template functionality in validation settings
  • Create Groups - Define logical groupings for related fields
  • Arrange Layout - Position fields in grid or text format
  • Set Dependencies - Configure field relationships for validation context

4. Set Up Reference Data

Configure lookup lists and external validation:

  • Create Lookup Lists - Import or create reference datasets
  • Configure APIs - Set up external validation endpoints
  • Map Fields - Connect extraction fields to validation sources
  • Test Validation - Verify lookup functionality with sample data

5. Implement Business Rules

Create advanced validation logic through custom rules:

  • Define Rules - Create MongoDB aggregation queries for data transformation
  • Test Logic - Validate rule behavior with sample documents
  • Set Priority - Order rules for sequential application
  • Monitor Performance - Track rule execution and accuracy
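Sequential application by priority, with per-rule timing for performance monitoring, can be sketched as follows. The `Rule` class, the convention that a lower `priority` runs first, and the example tax rule are assumptions for illustration, not the platform's configuration schema.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    priority: int  # hypothetical convention: lower numbers run first
    apply: Callable[[dict], dict]


def run_rules(doc, rules):
    """Apply rules in priority order and record how long each one took."""
    timings = {}
    for rule in sorted(rules, key=lambda r: r.priority):
        start = time.perf_counter()
        doc = rule.apply(doc)  # each rule sees the previous rule's output
        timings[rule.name] = time.perf_counter() - start
    return doc, timings


# The gross rule depends on the tax rule, so it must run second.
rules = [
    Rule("compute_gross", 2, lambda d: {**d, "gross": d["net"] + d["tax"]}),
    Rule("compute_tax", 1, lambda d: {**d, "tax": d["net"] * 19 // 100}),
]
result, timings = run_rules({"net": 100}, rules)
print(result)  # {'net': 100, 'tax': 19, 'gross': 119}
```

Listing the rules out of order shows why priority matters: applied as written, `compute_gross` would fail because `tax` does not exist yet.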

Validation Types

Data Type Validation

Built-in validation for common data formats:

String Fields

  • Pattern matching with regular expressions
  • Length restrictions (minimum/maximum characters)
  • Character set validation
  • Format normalization
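The four string checks above can be sketched in Python as follows. The function name and its parameters (`pattern`, `min_len`, `max_len`, `allowed`) are hypothetical, and normalization here means Unicode NFKC plus whitespace collapsing, which is one reasonable choice rather than the platform's documented behavior.

```python
import re
import unicodedata


def validate_string(value, pattern=None, min_len=0, max_len=None, allowed=None):
    """Return (normalized_value, errors) for a string field."""
    # Format normalization: NFKC folds compatibility characters such as
    # full-width letters, and runs of whitespace collapse to single spaces.
    normalized = " ".join(unicodedata.normalize("NFKC", value).split())
    errors = []
    if len(normalized) < min_len:
        errors.append(f"shorter than {min_len} characters")
    if max_len is not None and len(normalized) > max_len:
        errors.append(f"longer than {max_len} characters")
    # Character set validation: every character must come from `allowed`.
    if allowed is not None and not set(normalized) <= set(allowed):
        errors.append("contains disallowed characters")
    # Pattern matching: the whole value must match, not just a substring.
    if pattern is not None and re.fullmatch(pattern, normalized) is None:
        errors.append("does not match pattern")
    return normalized, errors
```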

Number Fields

  • Numeric format validation (integer, decimal)
  • Range validation (minimum/maximum values)
  • Decimal separator handling (comma/period)
  • Currency formatting and validation
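Decimal separator handling is the least obvious of these checks. A sketch, assuming the separator is configured per field (a value such as "1,234" cannot be interpreted reliably without that setting). `parse_number`, `validate_number`, and the `CURRENCY_SYMBOLS` set are hypothetical names; `Decimal` avoids binary rounding errors for currency amounts.

```python
from decimal import Decimal, InvalidOperation

CURRENCY_SYMBOLS = "€$£"  # hypothetical set of symbols to strip


def parse_number(text, decimal_sep="."):
    """Parse a localized number string into a Decimal; return None if invalid."""
    thousands_sep = "," if decimal_sep == "." else "."
    cleaned = text.strip().replace(" ", "").replace(thousands_sep, "")
    for symbol in CURRENCY_SYMBOLS:
        cleaned = cleaned.replace(symbol, "")
    cleaned = cleaned.replace(decimal_sep, ".")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    # Reject "NaN" and "Infinity", which Decimal accepts as input.
    return value if value.is_finite() else None


def validate_number(text, decimal_sep=".", min_value=None, max_value=None, integer=False):
    """Return ("valid", value) or ("error", message)."""
    value = parse_number(text, decimal_sep)
    if value is None:
        return "error", "invalid number format"
    if integer and value != value.to_integral_value():
        return "error", "integer expected"
    if min_value is not None and value < min_value:
        return "error", f"below minimum {min_value}"
    if max_value is not None and value > max_value:
        return "error", f"above maximum {max_value}"
    return "valid", value
```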

Date Fields

  • Date format recognition and parsing
  • Relative date calculation
  • Date range validation
  • Weekday and holiday handling
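A sketch of the date checks, assuming a configured list of accepted formats and a holiday calendar. `DATE_FORMATS`, `HOLIDAYS`, the 30-day payment term, and the helper names are hypothetical examples, not platform settings.

```python
from datetime import date, datetime, timedelta

DATE_FORMATS = ["%d.%m.%Y", "%Y-%m-%d", "%d/%m/%Y"]  # hypothetical, tried in order
HOLIDAYS = {date(2024, 12, 25), date(2024, 12, 26)}  # hypothetical calendar


def parse_date(text, formats=DATE_FORMATS):
    """Return the first successful parse, or None if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def next_business_day(d, holidays=HOLIDAYS):
    """Roll a date forward past weekends (Sat=5, Sun=6) and holidays."""
    while d.weekday() >= 5 or d in holidays:
        d += timedelta(days=1)
    return d


def due_date(invoice_date, payment_days=30, holidays=HOLIDAYS):
    """Relative date: invoice date plus payment term, moved to a business day."""
    return next_business_day(invoice_date + timedelta(days=payment_days), holidays)


def in_range(d, earliest, latest):
    """Date range validation (inclusive bounds)."""
    return earliest <= d <= latest
```

Because formats are tried in order, the list also decides how ambiguous dates such as "01/02/2024" are read.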

Calculated Fields

  • Mathematical operations (addition, subtraction, multiplication)
  • Cross-field calculations using other field values
  • Error margin configuration for calculation accuracy
  • Dependency validation ensuring required fields exist
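A sketch of a calculated-field check that combines dependency validation, the three operations, and an error margin. The function signature, field names, and the default margin of 0.01 are assumptions for illustration.

```python
from decimal import Decimal


def check_calculated(fields, operands, operation, target, margin=Decimal("0.01")):
    """Compare an extracted target value with one computed from other fields."""
    # Dependency validation: every operand and the target must be present.
    missing = [name for name in operands + [target] if fields.get(name) is None]
    if missing:
        return "error", f"missing fields: {', '.join(missing)}"
    values = [Decimal(str(fields[name])) for name in operands]
    if operation == "sum":
        expected = sum(values, Decimal(0))
    elif operation == "subtract":
        expected = values[0] - sum(values[1:], Decimal(0))
    elif operation == "multiply":
        expected = Decimal(1)
        for v in values:
            expected *= v
    else:
        raise ValueError(f"unsupported operation: {operation}")
    actual = Decimal(str(fields[target]))
    # Error margin: tolerate small rounding differences in the source document.
    if abs(actual - expected) <= margin:
        return "valid", expected
    return "error", expected
```

For example, net 100.00 plus tax 19.00 against an extracted gross of 119.01 passes with the default margin, while 120.00 fails.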

External Validation

Integration with external systems for advanced validation:

API Lookup

  • Real-time validation against external databases
  • Context-aware validation using multiple field values
  • Custom validation logic through external endpoints
  • Error and warning message handling
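A sketch of an API lookup that sends several field values as context and maps the response to an error or warning. The endpoint URL, the request payload, and the response shape (`valid`, `severity`, `message`) are all hypothetical; the real contract depends on the external service. The transport is injectable so the logic can be exercised without a network, and this sketch chooses to downgrade service outages to warnings, which may not match the platform's behavior.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical endpoint; the request/response schema below is an assumption.
ENDPOINT = "https://validation.example.com/vat-check"


def http_post_json(url: str, payload: dict, timeout: float) -> dict:
    """POST the payload as JSON and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def api_validate(fields: dict, keys: list, post: Callable = http_post_json,
                 timeout: float = 5.0) -> tuple:
    """Validate using several field values as context; return (severity, message)."""
    payload = {k: fields.get(k) for k in keys}  # context-aware: send multiple fields
    try:
        result = post(ENDPOINT, payload, timeout)
    except (OSError, ValueError) as exc:
        # Network errors, timeouts, and malformed JSON become warnings here.
        return "warning", f"validation service unavailable: {exc}"
    if result.get("valid"):
        return "valid", ""
    return result.get("severity", "error"), result.get("message", "rejected by API")
```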

Lookup Lists

  • Internal reference data validation
  • Autocomplete suggestions during data entry
  • Translation between different value representations
  • Multi-column validation for complex matching

Validation Process

1. Extraction Completion

Validation is triggered once field extraction is complete:

  • The system processes all annotated fields
  • Initial data type conversion and formatting
  • Required field validation
  • Basic format validation

2. Field-Level Validation

Each field is validated individually against its configured rules:

  • Data type compatibility checking
  • Range and pattern validation
  • Custom validation rule application
  • Error and warning generation

3. Cross-Field Validation

Validation involving multiple fields simultaneously:

  • Calculated field computation
  • Dependency validation
  • Business rule application
  • Template structure validation

4. External Validation

Integration with lookup lists and external APIs:

  • Reference data matching
  • API endpoint validation calls
  • Response processing and error handling
  • Validation result integration

5. Final Review

Present validation results for manual review:

  • Error highlighting for failed validation
  • Warning indicators for questionable data
  • Validation success confirmation
  • Export readiness determination

Configuration Examples

Basic Field Validation

Simple validation setup for common field types:

Field Name: Invoice Amount
Type: Currency
Required: Yes
Currency: EUR
Min Value: 0
Max Value: 999999

Validation Result:
✓ Value: €1,234.56 - Valid
✗ Value: "abc" - Error: Invalid currency format
⚠ Value: €0.00 - Warning: Zero amount detected
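The Invoice Amount configuration above can be expressed as a small Python function that reproduces the three results shown. The dictionary layout and function name are hypothetical, and parsing assumes a period decimal separator with comma thousands separators, as in the example values.

```python
from decimal import Decimal, InvalidOperation

# Mirrors the example configuration; the dict layout itself is hypothetical.
INVOICE_AMOUNT = {"required": True, "currency": "EUR",
                  "min": Decimal("0"), "max": Decimal("999999")}


def validate_invoice_amount(raw, rule=INVOICE_AMOUNT):
    """Return (status, message) for the Invoice Amount field."""
    if raw is None or raw.strip() == "":
        return ("error", "Missing required field") if rule["required"] else ("valid", "")
    text = raw.strip().replace("€", "").replace(",", "")
    try:
        value = Decimal(text)
    except InvalidOperation:
        return "error", "Invalid currency format"
    if not value.is_finite():
        return "error", "Invalid currency format"
    if value < rule["min"] or value > rule["max"]:
        return "error", f"Value outside range {rule['min']} to {rule['max']}"
    # Zero is within range, so it passes but is flagged for review.
    if value == 0:
        return "warning", "Zero amount detected"
    return "valid", ""
```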

Template Configuration

Structured field organization:

Template Group: Invoice Header
Layout: Table (2 columns)
Fields:
- [Invoice Number] [Invoice Date]
- [Vendor Name] [Amount]

Template Group: Line Items
Layout: Repeating Table
Fields:
- Description | Quantity | Unit Price | Total

Business Rule Example

Data transformation using MongoDB aggregation:

Rule: "Calculate total amount from line items"
Query: [
  {
    $set: {
      "total_amount.value": {
        $sum: "$line_items.total.value"
      }
    }
  }
]
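To make the rule's effect concrete, here is a plain-Python emulation of that single `$set` stage on one document; it is not how the platform executes rules. It assumes `line_items` is an array of objects shaped like `{"total": {"value": ...}}`. When its operand resolves to an array, MongoDB's `$sum` adds the numeric elements and ignores non-numeric ones (returning 0 if there are none), and the emulation follows that behavior.

```python
from numbers import Number


def sum_line_items(doc):
    """Emulate $set {"total_amount.value": {$sum: "$line_items.total.value"}}."""
    values = [item.get("total", {}).get("value") for item in doc.get("line_items", [])]
    # Like $sum, skip non-numeric values (strings, None, booleans).
    total = sum(v for v in values if isinstance(v, Number) and not isinstance(v, bool))
    # Dot notation in $set updates only the embedded "value" field.
    doc.setdefault("total_amount", {})["value"] = total
    return doc
```

A line total stored as the string "3.00" would be skipped silently, so converting values first (for example with `$map` and `$toDecimal`) may be worth considering.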

Error Handling

Validation Errors

Critical validation failures that prevent export:

  • Missing Required Fields - Essential data not found or extracted
  • Invalid Data Format - Data doesn't match expected type or pattern
  • Range Violations - Numeric values outside acceptable limits
  • Lookup Failures - Unable to validate against reference data

Validation Warnings

Non-critical issues requiring review but allowing export:

  • Data Quality Concerns - Values that are valid but look unusual
  • Missing Optional Fields - Non-essential fields without data
  • Low Confidence Scores - Extraction confidence below threshold
  • Reference Mismatches - Similar but not exact matches in lookup data
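The distinction between errors (block export) and warnings (review only) can be sketched as follows. The `Issue` class, the field layout with `value` and `confidence`, and the 0.8 confidence threshold are hypothetical.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # hypothetical threshold


@dataclass
class Issue:
    field_name: str
    severity: str  # "error" blocks export; "warning" only needs review
    message: str


def collect_issues(fields, required, optional, threshold=CONFIDENCE_THRESHOLD):
    """fields maps name -> {"value": ..., "confidence": float}."""
    issues = []
    for name in required:
        if fields.get(name, {}).get("value") in (None, ""):
            issues.append(Issue(name, "error", "Missing required field"))
    for name in optional:
        if fields.get(name, {}).get("value") in (None, ""):
            issues.append(Issue(name, "warning", "Missing optional field"))
    for name, data in fields.items():
        conf = data.get("confidence")
        if conf is not None and conf < threshold:
            issues.append(Issue(name, "warning", f"Low confidence ({conf:.2f})"))
    return issues


def export_ready(issues):
    """Errors block export; warnings alone do not."""
    return not any(i.severity == "error" for i in issues)
```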

Resolution Strategies

Approaches for addressing validation issues:

  • Manual Correction - Edit field values directly in the interface
  • Re-extraction - Trigger new extraction with different parameters
  • Rule Adjustment - Modify validation rules to accommodate data variations
  • Reference Updates - Update lookup lists or API configurations

Performance Considerations

Optimization Strategies

Ensure efficient validation processing:

  • Rule Ordering - Arrange business rules by execution priority
  • Lookup Caching - Cache frequently accessed reference data
  • API Timeouts - Configure appropriate timeouts for external validation
  • Batch Processing - Group validation operations for efficiency

Monitoring and Maintenance

Track validation performance and accuracy:

  • Validation Success Rates - Monitor the percentage of successful validations
  • Processing Times - Track validation execution duration
  • Error Patterns - Identify common validation failures
  • Rule Performance - Assess business rule effectiveness and accuracy
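The monitoring metrics above can be computed from stored validation results. A sketch, assuming each result is recorded as a dict with `status`, `message`, and `seconds` (a hypothetical layout); here only `valid` counts as a success, so warnings lower the rate.

```python
from collections import Counter
from statistics import mean


def summarize(results):
    """Success rate, mean processing time, and the most common error messages."""
    total = len(results)
    valid = sum(1 for r in results if r["status"] == "valid")
    errors = Counter(r["message"] for r in results if r["status"] == "error")
    return {
        "success_rate": valid / total if total else 0.0,
        "mean_seconds": mean(r["seconds"] for r in results) if results else 0.0,
        "top_errors": errors.most_common(3),  # error patterns
    }
```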