
Lookup Lists

Lookup lists are Excel-like data objects stored at the team level in Team Resources that help normalize field values by matching against predefined columns of data.

Overview

Lookup lists help you:

  • Normalize extracted values - Match extracted text against your reference data
  • Validate field content - Ensure extracted values exist in your approved datasets
  • Enable autocomplete - Help annotators find the correct values quickly
  • Store translations - Map extracted values to standardized alternatives
  • Maintain consistency - Keep data clean across all document processing

Accessing Lookup Lists

Lookup lists are managed at the team level under Team Resources:

Navigate to: Team → Resources → Lookup Lists

From here you can:

  • Create new lookup lists
  • Import data from Excel or CSV files
  • View and edit existing lists
  • Manage translations
  • Download or erase list data

How Validation Works

When validating a field against a lookup list:

Column Matching

  • Column names must match the field's name_id (also called export_key)
  • The system searches for the extracted value in the matching column
  • Exact matching only - case is ignored, but there is no fuzzy matching

Validation Results

  • Single match found - Validation passes
  • Multiple matches - Warning thrown (ambiguous data)
  • No matches - Error for required fields, warning for non-required fields
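The matching and classification rules above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: it assumes a hypothetical in-memory model where each lookup list row is a dict of column name to cell value, and it treats "exact" as case-insensitive equality, as described later under Translation System. The names `find_matches` and `classify` are hypothetical.

```python
from typing import Dict, List

# Hypothetical in-memory model: each row maps column name -> cell value.
Row = Dict[str, str]

def find_matches(rows: List[Row], name_id: str, value: str) -> List[Row]:
    """Return rows whose `name_id` column equals `value` exactly, ignoring case."""
    target = value.casefold()
    return [row for row in rows if row.get(name_id, "").casefold() == target]

def classify(matches: List[Row], required: bool) -> str:
    """Map the number of matching rows to a validation outcome."""
    if len(matches) == 1:
        return "valid"
    if len(matches) > 1:
        return "warning"  # ambiguous: more than one row matched
    # No match: severity depends on whether the field is required.
    return "error" if required else "warning"
```

Keeping the search and the classification separate mirrors the two lists above: which column is searched is decided by the field's name_id, while the outcome depends only on the match count and the field requirement.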

Field Configuration

To use a lookup list on a field:

  1. Select field type - Choose "Lookup List" validation
  2. Choose lookup list - Select from your team's available lists
  3. Configure options - Set autocomplete, translation, and context settings

Lookup List Structure

Lookup lists are organized with columns that contain your reference data:

Columns

  • Column names should match your field name_id for validation
  • Up to 26 columns supported (A through Z)
  • Editable columns can be modified directly in the interface
  • Description columns provide additional context

Example: Country List

country_code   country_name     region
US             United States    North America
GB             United Kingdom   Europe
DE             Germany          Europe
JP             Japan            Asia

For a field with name_id = "country_code", validation will search the country_code column.
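To make the column rule concrete, the country list above can be modeled as a list of Python dicts, one per row. This sketch assumes case-insensitive equality and a hypothetical `lookup` helper. It shows that only the column named after the field is searched, so a country name never matches a `country_code` field.

```python
# The example country list, one dict per row (column name -> cell value).
COUNTRY_LIST = [
    {"country_code": "US", "country_name": "United States", "region": "North America"},
    {"country_code": "GB", "country_name": "United Kingdom", "region": "Europe"},
    {"country_code": "DE", "country_name": "Germany", "region": "Europe"},
    {"country_code": "JP", "country_name": "Japan", "region": "Asia"},
]

def lookup(rows, name_id, value):
    """Search only the column named after the field's name_id, ignoring case."""
    target = value.casefold()
    return [row for row in rows if row.get(name_id, "").casefold() == target]

# A "country_code" field is checked against the country_code column only.
print(lookup(COUNTRY_LIST, "country_code", "de"))
# → [{'country_code': 'DE', 'country_name': 'Germany', 'region': 'Europe'}]

# A country name does not match, because country_name is a different column.
print(lookup(COUNTRY_LIST, "country_code", "Germany"))
# → []
```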

Autocomplete Feature

When autocomplete is enabled on a field:

  • Search functionality - Users can type to search the lookup list
  • Dropdown suggestions - Matching values appear as options
  • Quick selection - Click to select the correct value
  • Improved accuracy - Reduces manual typing errors

Autocomplete helps annotators find the right values quickly without memorizing the entire list.
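The documentation does not specify how autocomplete ranks or filters suggestions. The sketch below assumes one plausible behavior: case-insensitive substring search, with prefix matches listed first and results capped at a small limit. The function name `suggest` and the `limit` parameter are hypothetical.

```python
from typing import List

def suggest(values: List[str], query: str, limit: int = 5) -> List[str]:
    """Return up to `limit` values containing `query` (assumed behavior).

    Values that start with the query are listed before values that only contain it.
    """
    q = query.casefold()
    prefix = [v for v in values if v.casefold().startswith(q)]
    contains = [v for v in values if q in v.casefold() and not v.casefold().startswith(q)]
    return (prefix + contains)[:limit]
```

For example, with the country names from the list above, typing "uni" would suggest "United States" and "United Kingdom".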

Translation System

Lookup lists include a translation preprocessing step that transforms annotation values before matching:

How Translation Works

  1. Translation preprocessing - Annotation values are first run through the translation list
  2. Value transformation - Translation maps input values to standardized forms
  3. Lookup matching - Transformed values are then matched against lookup data
  4. Case-insensitive - Both translation and matching ignore case differences

Translation as Preprocessing

  • Input transformation - Raw annotation values are converted before lookup
  • Multiple translations - Single input value can have multiple translation options
  • Standardization - Converts variations into consistent lookup values
  • Fallback - If no translation exists, original value is used for matching
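Translation preprocessing can be pictured as a dictionary lookup that runs before matching. In this minimal sketch the translation list is assumed to be a hypothetical dict keyed by the lower-cased input value, with a list of standardized forms as the value. A single input can therefore yield several candidates, and an untranslated value falls back to itself.

```python
from typing import Dict, List

# Hypothetical translation list: case-folded input value -> standardized forms.
TRANSLATIONS: Dict[str, List[str]] = {
    "usa": ["US"],
    "u.s.": ["US"],
    "england": ["GB"],
    "uk": ["GB"],
    "korea": ["KR", "KP"],  # one input, multiple translation options
}

def translate(value: str, translations: Dict[str, List[str]]) -> List[str]:
    """Return candidate lookup values; fall back to the original value if untranslated."""
    return translations.get(value.casefold(), [value])
```

Each candidate is then matched against the lookup list as usual. An input with several translations, like "korea" here, can produce more than one match and therefore a warning.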

Context-Based Validation

Multi-field context:

  • Combined annotations - Multiple annotation fields can be combined into one lookup action
  • Multi-column matching - Context enables matching against multiple lookup list columns
  • Enhanced accuracy - Additional context improves match precision
  • Relationship validation - Validates related field combinations together
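Multi-column matching with context can be sketched as requiring every supplied field value to match its own column in the same row. The supplier list below is hypothetical example data. It shows how a second field removes an ambiguity that a single field would leave.

```python
from typing import Dict, List

Row = Dict[str, str]

def match_with_context(rows: List[Row], criteria: Dict[str, str]) -> List[Row]:
    """Keep rows where every (column, value) pair in `criteria` matches, ignoring case."""
    return [
        row for row in rows
        if all(row.get(col, "").casefold() == val.casefold() for col, val in criteria.items())
    ]

# Hypothetical supplier list: the same name appears in two countries.
SUPPLIERS = [
    {"supplier_name": "Acme", "country_code": "US", "supplier_id": "S-001"},
    {"supplier_name": "Acme", "country_code": "DE", "supplier_id": "S-002"},
]
```

Matching on `supplier_name` alone finds two rows, which is a multiple-match warning. Adding `country_code` as a context field narrows the result to a single row.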

Metadata Enrichment

When a successful match is found:

  • Full row data - All column data from the matched row is collected
  • Metadata object - Complete row information is appended to the annotation
  • Additional context - Enriches annotations with related lookup data
  • Downstream usage - Metadata can be used by subsequent processing steps
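Metadata enrichment amounts to copying the matched row onto the annotation. The annotation shape used here (a dict with a `metadata` key) is a hypothetical stand-in, not the product's data format. The sketch attaches metadata only when exactly one row matched, which is the "successful match" case.

```python
from typing import Dict, List

Row = Dict[str, str]

def enrich(annotation: Dict[str, object], matches: List[Row]) -> Dict[str, object]:
    """Return a copy of the annotation with the matched row attached as metadata.

    Only a single, unambiguous match is attached; otherwise the copy is unchanged.
    """
    enriched = dict(annotation)
    if len(matches) == 1:
        enriched["metadata"] = dict(matches[0])  # copy all columns of the matched row
    return enriched
```

Returning a copy leaves the original annotation untouched, so a downstream step can always compare the raw value with the enriched one.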

Building Translation Maps

Add translations for:

  • Common misspellings - Frequent OCR or typing errors
  • Alternative formats - Different ways to express the same concept
  • Historical variations - Old naming conventions still appearing in documents
  • Regional differences - Locale-specific terminology

Validation Behavior

Validation Process

  1. Translation preprocessing - Run annotation values through translation list to get standardized forms
  2. Context assembly - Combine multiple annotations if context fields are configured
  3. Lookup matching - Match translated values against lookup list data (single or multi-column)
  4. Metadata enrichment - Append full row data to annotation metadata on successful match
  5. Result classification:
    • Valid - Successful match found, metadata added to annotation
    • Warning - Multiple possible matches detected, or no match for non-required field
    • Error - No match found for required field
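The five steps can be combined into one function. This is a hypothetical end-to-end sketch under the same assumptions as the simpler examples in this article: rows are dicts, translation keys are lower-cased, and comparisons ignore case. The `fields` argument holds the validated field plus any configured context fields, keyed by name_id.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]

def validate(
    rows: List[Row],
    translations: Dict[str, List[str]],
    fields: Dict[str, str],
    required: bool,
) -> Tuple[str, Optional[Row]]:
    """Hypothetical sketch of the validation process; returns (status, metadata)."""
    # 1. Translation preprocessing: each value becomes one or more candidates,
    #    falling back to the original value when no translation exists.
    candidates = {
        name_id: [c.casefold() for c in translations.get(value.casefold(), [value])]
        for name_id, value in fields.items()
    }
    # 2-3. Context assembly and lookup matching: a row matches when each field's
    #      column equals one of that field's candidates.
    matches = [
        row for row in rows
        if all(row.get(name_id, "").casefold() in cands for name_id, cands in candidates.items())
    ]
    # 4. Metadata enrichment on a single match, then 5. result classification.
    if len(matches) == 1:
        return "valid", dict(matches[0])
    if len(matches) > 1:
        return "warning", None
    return ("error" if required else "warning"), None
```

For example, with a translation entry mapping "england" to "GB", an annotation value of "England" in the country_code field matches the GB row of the country list and returns that row as metadata.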

Error Handling

No matches found:

  • Required fields - Generates error, document cannot proceed
  • Non-required fields - Generates warning, document can still be processed
  • Review extracted value for missing variations
  • Add the value directly to lookup list
  • Create translation mapping for the variation

Multiple matches:

  • Warning generated - System cannot determine single correct match
  • Review data to identify cause of multiple matches
  • Refine list values or translation rules for clarity

Validation Rules

Exact matching only:

  • No fuzzy or approximate matching
  • Extracted value must match exactly (ignoring case)
  • Translations provide controlled alternative forms
  • Field requirement determines severity - Required fields throw errors, non-required fields show warnings

Data Import and Management

File Import

  • Supported formats - Excel (.xlsx) or CSV files
  • Maximum size - Up to 100,000 rows per import
  • Column mapping - Import columns automatically map to your lookup list structure
  • Import modes - Choose between "Append" (add to existing) or "Overwrite" (replace all data)
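The two import modes differ only in what happens to the existing rows. The sketch below handles CSV text with Python's standard `csv` module; Excel parsing is left out. Mapping columns by header name and enforcing the 26-column cap at import time are assumptions, and `import_csv` is a hypothetical name. The whole file is checked before the result is built, so a rejected file leaves existing data alone.

```python
import csv
import io
from typing import Dict, List

MAX_ROWS = 100_000  # per-import row limit stated in this article
MAX_COLUMNS = 26    # columns A through Z (assumed to apply at import)

def import_csv(existing: List[Dict[str, str]], csv_text: str, mode: str) -> List[Dict[str, str]]:
    """Hypothetical sketch: parse CSV text, then append to or overwrite existing rows."""
    if mode not in ("append", "overwrite"):
        raise ValueError(f"unknown import mode: {mode!r}")
    reader = csv.DictReader(io.StringIO(csv_text))
    # Header names are assumed to map directly onto lookup list column names.
    if reader.fieldnames is None or len(reader.fieldnames) > MAX_COLUMNS:
        raise ValueError("file must have a header row with at most 26 columns")
    new_rows = [dict(row) for row in reader]
    if len(new_rows) > MAX_ROWS:
        raise ValueError(f"import exceeds {MAX_ROWS} rows")
    # Append keeps existing rows; overwrite replaces them entirely.
    return new_rows if mode == "overwrite" else existing + new_rows
```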

Import Process

  1. Upload file - Drag and drop or select Excel/CSV file
  2. Preview data - Review how columns will be mapped
  3. Choose mode - Append new data or overwrite existing
  4. Import - Process the file (may take up to 1 minute for large overwrites)

Available Actions

On each lookup list:

  • Import data - Add or replace data from Excel/CSV files
  • Download - Export current list data and translations to Excel
  • Erase data - Remove all list content (keeps list structure)
  • Delete list - Permanently remove the entire lookup list

Performance Notes

  • Large overwrites - May cause up to 1 minute of downtime during processing
  • Import validation - Files are validated before import to prevent errors
  • Back up first - Download existing data before large overwrites

Using Context Fields

When configuring lookup list validation, you can specify context fields to include additional data in the validation process:

  • Related field values - Include values from other fields for more accurate matching
  • Enhanced validation - Use context to disambiguate similar values
  • Better translations - Context helps create more accurate translation mappings

Best Practices

Data Organization

  1. Clear column names - Match column names exactly to field name_id values
  2. Consistent formatting - Use consistent data formats within each column
  3. Regular updates - Keep lists current with new values as they appear
  4. Backup before changes - Download lists before major updates

Validation Setup

  1. Enable autocomplete - Help users find correct values quickly
  2. Use translations - Build up translation maps over time
  3. Monitor warnings - Review multiple or missing matches and refine list values or translations to resolve them
  4. Test with real data - Validate lists work with actual document content

Lookup lists provide essential data normalization capabilities while remaining flexible enough to handle diverse document types and evolving business requirements.