Metadata import

The metadata import feature lets a user apply metadata to content based on the contents of a delimited text file or a database table, including content that comes from platforms that would not otherwise support metadata. This is useful when the data was produced during an export routine, for example when exporting content from a platform that the application does not yet support.

Metadata import runs after all content has been processed. It iterates over each row of the import source and attempts to determine the impacted file along with the desired metadata values. If the file can be determined, the necessary metadata is added to that file.

Important CSV Template Information

If you receive an error when importing a CSV file, the file does not meet the expected format. Compare your file to the sample file provided and correct it before attempting the import again.

Not Supported

  • Custom metadata content filters do not apply when the metadata import function is used
  • The Job Filters | Filter by Metadata filter is ignored

Ensure your platform supports metadata and is set up for it

For each platform, you must define your metadata template or the corresponding columns. Refer to the platform's documentation for details on this setup.

File Format

The import source can be any delimited file or a database table. Each row in the import source must contain two types of information: the file path and the metadata values.

File Path

  • The first column(s) in the import source must be the file path. The path column(s) are identified by column name.

    • Some customers prefer to split the path into two columns: one for the directory name and one for the file name

      • Valid directory column names: FolderName, Folder_Name, and Folder_Structure
      • Valid file column names: FileName and File_Name
    • Some customers prefer to specify the full path in a single column
      • Valid column names: FilePath, Path, FullPath, File_Path, and Full_Path
  • The file path can be either relative to the job source or an absolute path within the source platform. For example, if the job source is /C/files/users/ and the file in question is /C/files/users/jdoe/locations.xls, then the file path in the import source could be /jdoe/locations.xls or /C/files/users/jdoe/locations.xls.
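The path rules above can be illustrated with a minimal Python sketch. It is not the application's implementation: the helper names (path_from_row, resolve_against_source) are hypothetical, and matching column names case-insensitively is an assumption suggested only by the lowercase "path" header in the sample file.

```python
import posixpath

# Column names taken from the lists above.
DIRECTORY_COLUMNS = ("FolderName", "Folder_Name", "Folder_Structure")
FILE_COLUMNS = ("FileName", "File_Name")
FULL_PATH_COLUMNS = ("FilePath", "Path", "FullPath", "File_Path", "Full_Path")


def _lookup(row, names):
    """Return the value of the first matching column, or None.

    Assumption: column names are compared case-insensitively.
    """
    lowered = {key.lower(): value for key, value in row.items()}
    for name in names:
        if name.lower() in lowered:
            return lowered[name.lower()]
    return None


def path_from_row(row):
    """Build the file path from a full-path column or a folder/file pair."""
    full = _lookup(row, FULL_PATH_COLUMNS)
    if full is not None:
        return full
    folder = _lookup(row, DIRECTORY_COLUMNS)
    name = _lookup(row, FILE_COLUMNS)
    if folder is None or name is None:
        return None
    return posixpath.join(folder, name.lstrip("/"))


def resolve_against_source(job_source, row_path):
    """Return an absolute path for a row path that may be relative to the job source."""
    source = "/" + job_source.strip("/")
    candidate = "/" + row_path.strip("/")
    # A path that already sits under the job source is treated as absolute.
    if candidate == source or candidate.startswith(source + "/"):
        return candidate
    # Otherwise it is treated as relative to the job source.
    return posixpath.join(source, candidate.lstrip("/"))
```

With the job source /C/files/users/, both /jdoe/locations.xls and /C/files/users/jdoe/locations.xls resolve to the same absolute path in this sketch.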

Example:

Download | Metadata Import Sample CSV

skysync_metadata.csv

path,location,order_num,order_date
/C/files/users/jdoe/locations1.xls,US,123456,10/20/2015
/C/files/users/jdoe/locations2.xls,EU,789103,10/21/2015

* Note that any empty field in the CSV file must be written as empty quotes (""). Currently, the CSV parser does not allow empty fields to be unquoted.
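When the import file is generated by a script, one way to satisfy the empty-quotes requirement is to quote every field. The sketch below uses Python's standard csv module; the function name write_metadata_csv and the blank order_num value are illustrative assumptions, not part of the product.

```python
import csv
import io


def write_metadata_csv(rows, fieldnames):
    """Serialize rows to CSV text with every field quoted.

    csv.QUOTE_ALL writes an empty value as "" instead of leaving it bare.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        quoting=csv.QUOTE_ALL,
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        # Missing or None values are written as empty strings.
        cleaned = {}
        for key in fieldnames:
            value = row.get(key)
            cleaned[key] = "" if value is None else value
        writer.writerow(cleaned)
    return buffer.getvalue()


if __name__ == "__main__":
    columns = ["path", "location", "order_num", "order_date"]
    sample = [
        {
            "path": "/C/files/users/jdoe/locations1.xls",
            "location": "US",
            "order_num": "",  # hypothetical empty value for illustration
            "order_date": "10/20/2015",
        }
    ]
    print(write_metadata_csv(sample, columns))
```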

Metadata

  • schema (or template): The metadata schema to use when mapping the metadata (optional)

  • [*]: Every other column in the import source is considered as metadata values

    • To include metadata from multiple schemas, you can structure the column name in the following format "property;schema" where "property" represents the metadata property name and "schema" represents the secondary metadata schema ID

Ensure the template key/ID is used. Using the template display name will result in a failed import.
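As an illustration of the "property;schema" column format, the following Python sketch splits a column name into its property and schema parts. The function name parse_metadata_column is hypothetical, and falling back to a default schema when no schema is given is an assumption based on the optional schema setting described above.

```python
def parse_metadata_column(column_name, default_schema=None):
    """Split a "property;schema" column name into (property, schema).

    Columns without a ";" fall back to default_schema (assumption).
    """
    prop, separator, schema = column_name.partition(";")
    prop = prop.strip()
    schema = schema.strip() if separator else ""
    return prop, (schema or default_schema)


if __name__ == "__main__":
    header = ["fieldName1;templateKey", "fieldName2;templateKey", "customMetadata_fieldName"]
    for column in header:
        print(column, "->", parse_metadata_column(column))
```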

Example:

Download | Import Metadata from Template and add Custom Metadata Sample CSV

Example | Import to a Box Metadata Template & Create Custom Metadata Fields

path,fieldName1;templateKey,fieldName2;templateKey,customMetadata_fieldName
/C/files/users/jdoe/locations1.xlsx,US,123456,customValue
/C/files/users/jdoe/locations2.xlsx,US,789103,customValue

Hash

  • If the import file contains a column named "SHA1" or "Hash", the system will validate that the hash in the import file matches the hash of the file on the destination platform. If the hash does not match, then a warning message will be logged.
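To populate a "SHA1" column, the hash can be computed before the import file is written. The sketch below uses Python's hashlib; the function names are hypothetical, and writing the value as a lowercase hexadecimal digest is an assumption, since the expected encoding is not stated here.

```python
import hashlib


def sha1_hex(chunks):
    """Return the SHA-1 digest of an iterable of byte chunks as lowercase hex."""
    digest = hashlib.sha1()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Hash a file on disk without loading it into memory all at once."""
    with open(path, "rb") as handle:
        return sha1_hex(iter(lambda: handle.read(chunk_size), b""))
```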

API Usage

To create a transfer job with metadata import using the REST API, include the following in the "transfer" block of the request body (for example, in Postman):

{
    "name": "...",
    "transfer": {
        ...,
        "metadata_import": {
            "schema": "optional default schema",
            "source": {
                "retry_failures": true,
                "text": {
                    "delimiter": ",", //optional, will default to comma
                    "target": {
                        "path": "/skysync_metadata.csv"
                    }
                }
            },
            "output": {
                "format": "text"
            }
        }
    }
}
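The same "metadata_import" block can be assembled in a script before the request is sent. This is a minimal Python sketch that mirrors the JSON above; the function name build_metadata_import and the job name are hypothetical, and the remaining "transfer" settings that the JSON elides are not shown here either.

```python
import json


def build_metadata_import(csv_path, delimiter=",", schema=None, retry_failures=True):
    """Build the "metadata_import" block for a delimited text import source."""
    block = {
        "source": {
            "retry_failures": retry_failures,
            "text": {
                "delimiter": delimiter,  # optional in the API, defaults to comma
                "target": {"path": csv_path},
            },
        },
        "output": {"format": "text"},
    }
    if schema is not None:
        block["schema"] = schema  # optional default schema
    return block


if __name__ == "__main__":
    job = {
        "name": "hypothetical job name",
        # Other transfer settings are omitted; they are outside this sketch.
        "transfer": {"metadata_import": build_metadata_import("/skysync_metadata.csv")},
    }
    print(json.dumps(job, indent=4))
```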

Database Table

The import source can also be a database table:

{
    "name": "...",
    "transfer": {
        ...,
        "metadata_import": {
            "source": {
                "db": {
                    "type": "sqlserver",
                    "connection_string": "SERVER=(local);DATABASE=ContentMetadata;Integrated Security=SSPI",
                    "table": "FileMetadata"
                }
            },
            "output": {
                "format": "text"
            }
        }
    }
}

Metadata Import - Job Details - Post Process

The job will run and complete successfully.

METADATA IMPORT PROCESSES AFTER THE JOB HAS RUN

Wait for this process to complete before you check the results.

Metadata import is a post-process activity. Items transfer to the destination even when the metadata import encounters an error, because metadata import failures are logged as warnings. Review the Log tab for the metadata import job and sort by 'Warnings' to view the metadata that was not applied. On subsequent job executions, the failed metadata import items are marked 'Flagged' for remediation, which further helps identify items that may require intervention.

Review Results

In the source directory where the metadata import file path was configured, review:

  • {{import file name}}-xx-export.csv file indicates the metadata import rows that were successful
  • {{import file name}}-xx-processed.csv file indicates the rows within the import file that DataHub has processed
  • {{import file name}}-xx-failures.csv file indicates the metadata import rows that failed to be applied

Review the destination to ensure metadata values were applied to your content

Rows in the metadata import file that failed to be applied are returned to the original file so they can be retried on subsequent runs, if needed.
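When many result files accumulate, the three file name suffixes described in this section can be used to sort them. The Python sketch below only inspects file names; the function name classify_result_file and the "01" segment in the usage example are hypothetical, since the exact value of the "xx" portion is not specified here.

```python
# Suffixes taken from the result file names described in this section.
RESULT_SUFFIXES = {
    "-export.csv": "successful rows",
    "-processed.csv": "processed rows",
    "-failures.csv": "failed rows",
}


def classify_result_file(file_name):
    """Return what a result file contains based on its suffix, or None."""
    lowered = file_name.lower()
    for suffix, meaning in RESULT_SUFFIXES.items():
        if lowered.endswith(suffix):
            return meaning
    return None


if __name__ == "__main__":
    # Hypothetical file name for illustration only.
    print(classify_result_file("skysync_metadata-01-failures.csv"))
```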

Failed Metadata Import

Metadata that fails to be applied during transfer to the destination is recorded as an audit warning activity in the log: "Metadata import failed The mapped source does not exist or was not transferred successfully to the destination. (path=/folder/folder...)"

  • Failed rows will return to the original import file
  • Review each item and fix the invalid data in the import file

Review Items with Warnings

GET {{url}}v1/transfers/{{job}}/auditing?level=warn
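The request above can also be issued from a script. The following Python sketch builds the URL from the same {{url}} and {{job}} values and sends the request with the standard library. The function names are hypothetical, and both the bearer-token Authorization header and the JSON response body are assumptions; check them against your deployment before relying on them.

```python
import json
import urllib.parse
import urllib.request


def audit_warnings_url(base_url, job_id):
    """Build the auditing URL for warnings on a transfer job."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    job = urllib.parse.quote(str(job_id), safe="")
    query = urllib.parse.urlencode({"level": "warn"})
    return "{}v1/transfers/{}/auditing?{}".format(base, job, query)


def fetch_audit_warnings(base_url, job_id, token):
    """Send the GET request and decode the response.

    Assumptions: bearer-token authentication and a JSON response body.
    """
    request = urllib.request.Request(
        audit_warnings_url(base_url, job_id),
        headers={"Authorization": "Bearer " + token, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```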
