Validation Overview
Validation checks data quality and accuracy at every stage of document processing. From initial document analysis to final data export, validation rules catch errors early and help maintain data integrity.
Validation Scope
Document Processing Validation
Validation occurs at multiple stages of the document processing pipeline:
- Extraction Validation - Verify extracted field data, implement business rules, and validate against reference datasets
- Segmentation Validation - Ensure proper document section identification and boundary detection
- Classification Validation - Confirm document type identification and routing accuracy
- Post-Processing Validation - Validate data transformations and export formatting
Quality Assurance Framework
Data quality is checked in several layers:
- Automated Validation - Real-time checks during processing with immediate feedback
- Business Rule Validation - Custom logic implementation for domain-specific requirements
- Reference Data Validation - Verification against external datasets and lookup tables
- Manual Review Integration - Human oversight for complex validation scenarios
Lookup Lists
Validate data against reference datasets and provide autocomplete functionality:
- Reference Data - Validate extracted values against predefined lists
- Fuzzy Matching - Handle variations in data formatting and spelling
- Autocomplete - Provide suggested values during manual data entry
- Translation Support - Map values to standardized formats
- Multi-Column Validation - Use multiple data points for validation accuracy
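The reference-data and fuzzy-matching behavior listed above can be illustrated with a minimal sketch. It uses Python's standard `difflib` module as a stand-in matcher; the platform's actual matching algorithm is not described here, and the `VENDORS` list, the `match_lookup` helper, and the `cutoff` value are hypothetical.

```python
import difflib
from typing import List, Optional, Tuple

# Hypothetical lookup list of canonical vendor names (illustrative data only).
VENDORS = ["Acme Corporation", "Globex GmbH", "Initech Ltd"]


def match_lookup(value: str, choices: List[str], cutoff: float = 0.8) -> Tuple[Optional[str], str]:
    """Return (canonical_value, status), where status is "exact", "fuzzy", or "no_match"."""
    # Compare case-insensitively, but return the canonical spelling from the list.
    normalized = {choice.casefold(): choice for choice in choices}
    key = value.strip().casefold()
    if key in normalized:
        return normalized[key], "exact"
    # difflib ranks candidates by similarity ratio; cutoff is an assumed threshold.
    close = difflib.get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    if close:
        return normalized[close[0]], "fuzzy"
    return None, "no_match"
```

A call such as `match_lookup("Acme Corporaton", VENDORS)` would resolve the misspelling to the canonical entry and report it as a fuzzy match, which a reviewer could then confirm.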
Business Rules
Implement complex validation logic through MongoDB aggregation queries:
- Data Transformation - Modify extracted data based on business logic
- Cross-Field Validation - Validate data across multiple fields simultaneously
- Conditional Logic - Apply validation rules based on document context
- Automated Corrections - Fix common data extraction errors automatically
- Audit Trails - Track all rule applications and data modifications
Validation Workflow
1. Enable Validation
Configure validation settings in project or pipeline step settings:
- Navigate to the Validation tab in project settings
- Toggle "Enable data validation"
- Configure field-specific validation rules
- Set up templates and reference data as needed
2. Configure Field Rules
Define validation rules for each extraction field:
- Data Types - Set appropriate field types (string, number, date, etc.)
- Validation Settings - Configure type-specific validation parameters
- Required Status - Mark fields as mandatory or optional
- Additional Settings - Set up lookup lists, APIs, or calculated fields
3. Structure Templates
Organize fields into logical templates for consistent presentation:
- Enable Templates - Toggle template functionality in validation settings
- Create Groups - Define logical groupings for related fields
- Arrange Layout - Position fields in grid or text format
- Set Dependencies - Configure field relationships for validation context
4. Set Up Reference Data
Configure lookup lists and external validation:
- Create Lookup Lists - Import or create reference datasets
- Configure APIs - Set up external validation endpoints
- Map Fields - Connect extraction fields to validation sources
- Test Validation - Verify lookup functionality with sample data
5. Implement Business Rules
Create advanced validation logic through custom rules:
- Define Rules - Create MongoDB aggregation queries for data transformation
- Test Logic - Validate rule behavior with sample documents
- Set Priority - Order rules for sequential application
- Monitor Performance - Track rule execution and accuracy
Validation Types
Data Type Validation
Built-in validation for common data formats:
String Fields
- Pattern matching with regular expressions
- Length restrictions (minimum/maximum characters)
- Character set validation
- Format normalization
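As a rough illustration of the string checks above, the following sketch normalizes whitespace and then applies length and pattern rules. The function name, its parameters, and the invoice-number pattern in the usage note are assumptions for the example, not platform settings.

```python
import re
from typing import List, Optional, Tuple


def validate_string(
    value: str,
    pattern: Optional[str] = None,
    min_len: int = 0,
    max_len: Optional[int] = None,
) -> Tuple[str, List[str]]:
    """Return the normalized text and a list of error messages (empty when valid)."""
    # Format normalization: trim and collapse runs of whitespace.
    text = " ".join(value.split())
    errors = []
    if len(text) < min_len:
        errors.append(f"Shorter than {min_len} characters")
    if max_len is not None and len(text) > max_len:
        errors.append(f"Longer than {max_len} characters")
    # fullmatch requires the whole value to fit the pattern, not just a prefix.
    if pattern is not None and re.fullmatch(pattern, text) is None:
        errors.append("Does not match the expected pattern")
    return text, errors
```

For example, `validate_string("  INV-2024-001 ", pattern=r"INV-\d{4}-\d{3}")` returns the trimmed value with no errors.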
Number Fields
- Numeric format validation (integer, decimal)
- Range validation (minimum/maximum values)
- Decimal separator handling (comma/period)
- Currency formatting and validation
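Decimal separator handling and range validation can be sketched as follows. This is an assumption about how such parsing might work, using Python's `decimal` module; `parse_number` and `in_range` are hypothetical helper names.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_number(text: str, decimal_separator: str = ".") -> Optional[Decimal]:
    """Parse a number written with either '.' or ',' as the decimal separator.

    The other character is treated as a thousands separator and removed.
    Returns None when the text is not a finite number.
    """
    thousands = "," if decimal_separator == "." else "."
    cleaned = text.strip().replace(thousands, "").replace(decimal_separator, ".")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def in_range(value: Decimal, minimum: Decimal, maximum: Decimal) -> bool:
    """Range validation with inclusive bounds."""
    return minimum <= value <= maximum
```

`Decimal` is used instead of `float` so that amounts such as 1234.56 are compared exactly.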
Date Fields
- Date format recognition and parsing
- Relative date calculation
- Date range validation
- Weekday and holiday handling
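A minimal sketch of date format recognition and weekday handling, assuming a fixed list of accepted formats. The `FORMATS` tuple and both helper names are hypothetical; the formats the platform actually recognizes are not listed here.

```python
from datetime import date, datetime
from typing import Optional, Sequence

# Assumed set of accepted formats, tried in order.
FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def parse_date(text: str, formats: Sequence[str] = FORMATS) -> Optional[date]:
    """Return the first successful parse, or None if no format fits."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # wrong format or impossible date; try the next format
    return None


def is_weekend(day: date) -> bool:
    """Monday is 0 and Sunday is 6, so 5 and 6 are the weekend."""
    return day.weekday() >= 5
```

Because the formats are tried in order, an ambiguous value such as 01/02/2024 is resolved by whichever matching format comes first in the list.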
Calculated Fields
- Mathematical operations (sum, subtract, multiply)
- Cross-field calculations using other field values
- Error margin configuration for calculation accuracy
- Dependency validation ensuring required fields exist
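The error margin and dependency checks for calculated fields might look like the following sketch. The function name, the status strings, and the default margin of 0.01 are assumptions chosen for illustration.

```python
from decimal import Decimal
from typing import Optional, Sequence


def check_calculated(
    parts: Sequence[Optional[Decimal]],
    extracted_total: Optional[Decimal],
    margin: Decimal = Decimal("0.01"),
) -> str:
    """Compare the sum of the input fields with the extracted total.

    Returns "missing_dependency", "ok", or "mismatch".
    """
    # Dependency validation: every input field must exist before calculating.
    if extracted_total is None or any(part is None for part in parts):
        return "missing_dependency"
    computed = sum(parts, Decimal("0"))
    # The margin absorbs small rounding differences between source and result.
    return "ok" if abs(computed - extracted_total) <= margin else "mismatch"
```

With net 100.00 and tax 19.00, an extracted total of 119.01 still passes at a 0.01 margin, while 120.00 is reported as a mismatch.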
External Validation
Integration with external systems for advanced validation:
API Lookup
- Real-time validation against external databases
- Context-aware validation using multiple field values
- Custom validation logic through external endpoints
- Error and warning message handling
Lookup Lists
- Internal reference data validation
- Autocomplete suggestions during data entry
- Translation between different value representations
- Multi-column validation for complex matching
Validation Process
1. Extraction Completion
Validation triggers after field extraction is complete:
- System processes all annotated fields
- Initial data type conversion and formatting
- Required field validation
- Basic format validation
2. Field-Level Validation
Individual field validation according to configured rules:
- Data type compatibility checking
- Range and pattern validation
- Custom validation rule application
- Error and warning generation
3. Cross-Field Validation
Validation involving multiple fields simultaneously:
- Calculated field computation
- Dependency validation
- Business rule application
- Template structure validation
4. External Validation
Integration with lookup lists and external APIs:
- Reference data matching
- API endpoint validation calls
- Response processing and error handling
- Validation result integration
5. Final Review
Present validation results for manual review:
- Error highlighting for failed validation
- Warning indicators for questionable data
- Validation success confirmation
- Export readiness determination
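The ordering of these stages can be sketched as a list of functions that each return issues tagged with the stage that raised them. The field names (`invoice_number`, `net`, `tax`, `total`) and the checks themselves are hypothetical, and the external validation stage is left out so the example stays self-contained.

```python
from typing import Callable, Dict, List, Tuple

Issue = Tuple[str, str, str]  # (stage, severity, message)
Stage = Callable[[Dict[str, object]], List[Issue]]


def required_fields(fields: Dict[str, object]) -> List[Issue]:
    """Stage 1: required field validation after extraction."""
    return [
        ("extraction", "error", f"Missing required field: {name}")
        for name in ("invoice_number", "total")
        if fields.get(name) in (None, "")
    ]


def field_level(fields: Dict[str, object]) -> List[Issue]:
    """Stage 2: data type compatibility for a single field."""
    total = fields.get("total")
    if total is not None and not isinstance(total, (int, float)):
        return [("field", "error", "total is not numeric")]
    return []


def cross_field(fields: Dict[str, object]) -> List[Issue]:
    """Stage 3: a calculated check that spans several fields."""
    net, tax, total = fields.get("net"), fields.get("tax"), fields.get("total")
    if all(isinstance(v, (int, float)) for v in (net, tax, total)):
        if abs(net + tax - total) > 0.01:
            return [("cross-field", "warning", "net + tax does not equal total")]
    return []


STAGES: List[Stage] = [required_fields, field_level, cross_field]


def run_validation(fields: Dict[str, object], stages: List[Stage] = STAGES) -> List[Issue]:
    """Run every stage in order and collect all issues for the final review."""
    issues: List[Issue] = []
    for stage in stages:
        issues.extend(stage(fields))
    return issues
```

Collecting issues from every stage, rather than stopping at the first failure, lets the reviewer see all problems in one pass.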
Configuration Examples
Basic Field Validation
Simple validation setup for common field types:
Field Name: Invoice Amount
Type: Currency
Required: Yes
Currency: EUR
Min Value: 0
Max Value: 999999

Validation Result:
  ✓ Value: €1,234.56 - Valid
  ✗ Value: "abc" - Error: Invalid currency format
  ⚠ Value: €0.00 - Warning: Zero amount detected
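The three outcomes above can be reproduced with a small sketch. It is not the platform's implementation: `validate_amount` is a hypothetical helper, and it assumes that a comma is a thousands separator and that a zero amount within range is reported as a warning.

```python
from decimal import Decimal, InvalidOperation
from typing import Tuple


def validate_amount(
    raw: str,
    symbol: str = "€",
    min_value: Decimal = Decimal("0"),
    max_value: Decimal = Decimal("999999"),
) -> Tuple[str, str]:
    """Return (status, message), where status is "valid", "warning", or "error"."""
    text = raw.strip()
    if text.startswith(symbol):
        text = text[len(symbol):]
    text = text.replace(",", "")  # assumption: ',' is the thousands separator
    try:
        amount = Decimal(text)
    except InvalidOperation:
        return "error", "Invalid currency format"
    if not amount.is_finite():
        return "error", "Invalid currency format"
    if not (min_value <= amount <= max_value):
        return "error", "Value out of range"
    if amount == 0:
        return "warning", "Zero amount detected"
    return "valid", ""
```

Zero passes the range check because the minimum is inclusive, so the warning is a separate rule rather than a range violation.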
Template Configuration
Structured field organization:
Template Group: Invoice Header
  Layout: Table (2 columns)
  Fields:
  - [Invoice Number] [Invoice Date]
  - [Vendor Name] [Amount]

Template Group: Line Items
  Layout: Repeating Table
  Fields:
  - Description | Quantity | Unit Price | Total
Business Rule Example
Data transformation using MongoDB aggregation:
Rule: "Calculate total amount from line items"
Query: [
  {
    $set: {
      "total_amount.value": {
        $sum: "$line_items.total.value"
      }
    }
  }
]
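To make the effect of this rule concrete, here is a plain-Python sketch of what the `$set` stage computes. It assumes that `line_items` is an array of objects that each carry a `total.value`; that document shape is inferred from the field paths in the query. Like MongoDB's `$sum`, the sketch ignores non-numeric values.

```python
from typing import Any, Dict


def apply_total_rule(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of doc with total_amount.value set to the sum of line totals."""
    values = []
    for item in doc.get("line_items", []):
        total = item.get("total")
        values.append(total.get("value") if isinstance(total, dict) else None)
    # Keep numbers only; bool is excluded because it is a subclass of int in Python.
    numeric = [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]
    result = dict(doc)  # shallow copy so the input document is left unchanged
    result["total_amount"] = {**(doc.get("total_amount") or {}), "value": sum(numeric)}
    return result
```

Running the function on sample documents is one way to check the expected result of a rule before enabling it.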
Error Handling
Validation Errors
Critical validation failures that prevent export:
- Missing Required Fields - Essential data not found or extracted
- Invalid Data Format - Data doesn't match expected type or pattern
- Range Violations - Numeric values outside acceptable limits
- Lookup Failures - Unable to validate against reference data
Validation Warnings
Non-critical issues requiring review but allowing export:
- Data Quality Concerns - Values that are valid but look unusual
- Missing Optional Fields - Non-essential fields without data
- Low Confidence Scores - Extraction confidence below threshold
- Reference Mismatches - Similar but not exact matches in lookup data
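The split between errors and warnings can be expressed as a simple export gate. In this sketch the `Issue` class, the match labels, and both helper names are hypothetical; only the rule itself comes from the lists above: errors block export, warnings do not.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Issue:
    field: str
    severity: str  # "error" blocks export, "warning" only flags for review
    message: str


def classify_lookup(field: str, match: str) -> List[Issue]:
    """Map a lookup outcome ("exact", "similar", "none") to validation issues."""
    if match == "none":
        return [Issue(field, "error", "Lookup failure: no reference match")]
    if match == "similar":
        return [Issue(field, "warning", "Reference mismatch: similar but not exact")]
    return []


def export_ready(issues: List[Issue]) -> bool:
    """A document can be exported when no issue has error severity."""
    return all(issue.severity != "error" for issue in issues)
```

A similar-but-not-exact lookup match therefore leaves the document exportable, while a failed lookup holds it back until it is resolved.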
Resolution Strategies
Approaches for addressing validation issues:
- Manual Correction - Edit field values directly in the interface
- Re-extraction - Trigger new extraction with different parameters
- Rule Adjustment - Modify validation rules to accommodate data variations
- Reference Updates - Update lookup lists or API configurations
Performance Considerations
Optimization Strategies
Ensure efficient validation processing:
- Rule Ordering - Arrange business rules by execution priority
- Lookup Caching - Cache frequently accessed reference data
- API Timeouts - Configure appropriate timeouts for external validation
- Batch Processing - Group validation operations for efficiency
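Lookup caching, one of the strategies above, can be illustrated with Python's `functools.lru_cache`. The `fetch_reference` function is a hypothetical stand-in for a slow database or API lookup, and the `CALLS` counter exists only to show how often the slow path runs.

```python
from functools import lru_cache
from typing import Optional

CALLS = {"count": 0}  # counts how often the slow lookup actually runs


def fetch_reference(code: str) -> Optional[str]:
    """Stand-in for a slow reference lookup (hypothetical data)."""
    CALLS["count"] += 1
    return {"DE": "Germany", "FR": "France"}.get(code)


@lru_cache(maxsize=1024)
def cached_reference(code: str) -> Optional[str]:
    """Serve repeated lookups from memory instead of repeating the slow call."""
    return fetch_reference(code)
```

Calling `cached_reference("DE")` twice runs `fetch_reference` only once. If the underlying reference data changes, `cached_reference.cache_clear()` discards the cached entries.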
Monitoring and Maintenance
Track validation performance and accuracy:
- Validation Success Rates - Monitor percentage of successful validations
- Processing Times - Track validation execution duration
- Error Patterns - Identify common validation failures
- Rule Performance - Assess business rule effectiveness and accuracy
Related Documentation
- Field Types - Detailed field validation configuration
- Field Templates - Template structure and organization
- Lookup Lists - Reference data validation systems
- Business Rules - Advanced validation logic and data transformation
- Export Settings - Final data export configuration