Import and manage Datasets

This topic is part of the manual Genesys Predictive Routing Help for version 9.0.0 of Genesys Predictive Routing.

Predictive Routing Datasets can include a broad range of data used to create Predictors and to train and test Models.

  • Interaction Datasets are created by uploading data directly from Genesys Info Mart. See Configure Data Loader to upload data for instructions.
  • Datasets with outcome data, and other data not derived from Genesys Info Mart, are uploaded from CSV files, as explained in this topic.

Extract an interaction Dataset from Genesys Info Mart

Interaction Datasets are created by extracting the data from the Genesys Info Mart Database.

To extract the desired data, use the following procedure:

  1. Create an SQL query to retrieve the interaction features from the database.
    • The /dl folder in the Data Loader Docker container contains an example SQL query that retrieves agent handle time (AHT) metric data for interactions between the given start and end dates. Your query can extract data for AHT or for any other metric for which you have the necessary data. (A hedged sketch of such a query appears after this procedure.)
    • You can extend the query to extract the columns from the Genesys Info Mart Database that contain user data relevant to the selected interactions.
    1. Place the SQL query into one of the folders on the host that is mapped to the Docker container.
    2. Provide the path to the query file location within the container in the sql-query option in the [dataset-<name>] section, which is configured on the Data Loader Application object.
  2. Upload the extracted Dataset to the GPR application.
    1. Add two sections in the Data Loader Application object: [dataset-<name>] and [schema-<name>], where <name> is the name you specify for the Dataset. You must use the same <name> value for the dataset- and the schema- sections.
    2. Configure the following options in the [dataset-<name>] section:
      • start-date
      • end-date
      • chunk-size
      • sql-query
      • data-type - MUST have the value interactions
      • upload-dataset - set this to false initially, to prevent Data Loader from trying to upload the Dataset while you are still configuring it. When you have finished configuring the Dataset, change the value to true to start the upload process.
    3. Configure the [schema-<name>] section. This configuration consists of adding one option for EACH column name that the SQL query you created extracts from the Genesys Info Mart Database; see the example configuration after this procedure.
      • The options values must specify the datatype for each column.
      • One column must contain the timestamps for the interactions. Use the modifier “created_at” for the value of the option containing the timestamp column. For example, you might configure the following option: START_TS=timestamp,created_at
      • If the Dataset contains columns holding sensitive or PII data, use the “anon” modifier in the values of the options with these column names. For example, you might configure the following option: Customer_ANI=string,anon
  3. When you have completed your configuration, set the upload-dataset option value to true. Data Loader checks the value of this option every 15 minutes, so it starts to upload the data you specified into the configured Dataset within 15 minutes at most.
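
To make these steps concrete, here is a minimal, hypothetical sketch of an extraction query for step 1. The table and column names are illustrative assumptions, not a tested query; your Genesys Info Mart schema determines the actual names, and the example query shipped in the /dl folder is the recommended starting point:

  SELECT
      irf.INTERACTION_ID,      -- interaction identifier (hypothetical column)
      irf.START_TS,            -- interaction timestamp; becomes the created_at column
      irf.TALK_DURATION,       -- example column feeding the AHT metric (hypothetical)
      irf.Customer_ANI         -- example user-data column containing PII (hypothetical)
  FROM INTERACTION_RESOURCE_FACT irf
  WHERE irf.START_TS BETWEEN <start_date> AND <end_date>;

And a matching, equally hypothetical pair of configuration sections for steps 2 and 3, using aht as the Dataset name. The option names, the interactions value, and the created_at and anon modifiers come from this topic; the query path, the placeholder values, and the column datatypes are assumptions to adapt to your environment:

  [dataset-aht]
  sql-query=/dl/aht_query.sql
  start-date=<start_date>
  end-date=<end_date>
  chunk-size=<chunk_size>
  data-type=interactions
  upload-dataset=false

  [schema-aht]
  INTERACTION_ID=string
  START_TS=timestamp,created_at
  TALK_DURATION=numeric
  Customer_ANI=string,anon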

Upload a Dataset from a CSV file

Predictive Routing supports Datasets containing outcome data taken from sources other than Genesys Info Mart. To upload this data, use the following procedure:

  1. Create a CSV file containing the desired data, collected into a consistent schema.
    1. Add two sections in the Data Loader Application object: [dataset-<name>] and [schema-<name>], where <name> is the name you specify for the Dataset. You must use the same <name> value for the dataset- and the schema- sections.
    2. Configure the following options in the [dataset-<name>] section:
      • location
      • csv-separator
      • data-type - MUST have the value outcomes
      • upload-dataset - set this to false initially, to prevent Data Loader from trying to upload the Dataset while you are still configuring it. When you have finished configuring the Dataset, change the value to true to start the upload process.
      When configuring for a CSV file upload, you can omit the following options: start-date, end-date, sql-query.
    3. Configure the [schema-<name>] section. This configuration consists of adding one option for EACH column name that appears in your CSV file; see the example configuration after this procedure.
      • The options values must specify the datatype for each column.
      • One column must contain the timestamps for the interactions. Use the modifier “created_at” for the value of the option containing the timestamp column. For example, you might configure the following option: START_TS=timestamp,created_at
      • If the Dataset contains columns holding sensitive or PII data, use the “anon” modifier in the values of the options with these column names. For example, you might configure the following option: Customer_ANI=string,anon
  2. When you have completed your configuration, set the upload-dataset option value to true. Data Loader checks the value of this option every 15 minutes, so it starts to upload the data you specified into the configured Dataset within 15 minutes at most.
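
To make this configuration concrete, here is a hypothetical sketch using surveys as the Dataset name. The option names, the outcomes value, and the created_at and anon modifiers come from this topic; the file location, separator, column names, and datatypes are illustrative assumptions:

  [dataset-surveys]
  location=/dl/survey_outcomes.csv
  csv-separator=,
  data-type=outcomes
  upload-dataset=false

  [schema-surveys]
  SURVEY_TS=timestamp,created_at
  NPS_SCORE=string
  Customer_ANI=string,anon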

Monitor Dataset upload progress

  1. Click the Settings gear icon, located on the right side of the top menu bar, to open the Settings menu, which appears on the left side of the window.
  2. Click the Datasets tab to open the Datasets window.
    • The right-hand toggle navigation menu displays a tree view of all Datasets associated with your Account, with the Predictors and Models configured for each. To open or close this navigation menu, click the navigation icon.
    • You must reload this page to view updates made using the Predictive Routing API, such as appending data to a Dataset.
  3. Click Create Dataset.
  4. In the Source File field, click in the box to browse to your CSV file, or drag and drop the file into the box.

The Predictive Routing application automatically determines the separator type and the schema. It uses the first 500 rows to set the schema. If there are missing or inconsistent values in these rows, the schema might not appear correctly. The instructions in the following section explain how to verify the schema and upload the data.
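
For illustration, here is a small, entirely hypothetical CSV fragment with a consistent schema; the column names, timestamp format, and values are invented for this example:

  SURVEY_TS,NPS_SCORE,Customer_ANI
  2020-01-10 09:15:00,9,14155550100
  2020-01-10 09:20:00,7,14155550101

Because schema detection uses only the first 500 rows, keeping those rows complete and consistently formatted, as in this fragment, helps the separator and datatypes resolve correctly.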

Verify datatypes and set timestamp field

At this stage, before the data is uploaded, review and configure your Dataset.

  1. Enter a name for your Dataset.
  2. Scroll down the list of fields and ensure that the datatypes have been discovered correctly. If any are incorrect, select the correct datatype from the Type drop-down list.
  3. Click the X in a row to remove that field from the Dataset. If you change your mind before you upload your data (that is, before you click the Create button on the window), scroll to the bottom of the list of fields, click in the None Selected text box, choose the field to be restored, and click Add Back.
  4. Set the Created At field, which determines the time the record was created or the time the interaction occurred, depending on the values stored in the field you select. The drop-down list contains all fields with the Timestamp datatype.
  5. (Optional) Click in the PII Fields text box to select the fields that contain personally identifiable or sensitive information, such as Social Security numbers, passwords, names and addresses, phone numbers, and so on. Data in these fields is anonymized when you create the Dataset. See Data Anonymization for important information about how data is anonymized. Note that you cannot anonymize the Created At timestamp field.
  6. Click Create to upload the data.
  • If you try to upload a file that contains more than 100 columns, GPR generates an error message. Cancel out of the upload procedure, edit your CSV file to reduce the number of columns, and then repeat the upload procedure.
  • If you try to upload a file with too many columns using the GPR API, the error message appears in the Dataset Description field in the GPR application.
  • If you try to upload more than 2.5 million rows to a Dataset, only the first 2.5 million rows are stored and the rest are discarded. A warning message appears on the Dataset Uploads tab specifying how many rows were discarded. The Append button is disabled until you delete some rows from the Dataset.

View the list of Datasets

After you create a Dataset, it appears in the table of Datasets on the Settings: Datasets window.
  • To delete a Dataset from the list, select the check box in the leftmost column, then click Delete Selected.
  • Click a Dataset name to view that Dataset and to append data.

View a Dataset schema

When you click a Dataset name in the Settings: Datasets list, the Dataset Schema tab is displayed. It shows all of the columns in your schema and includes the following information:

  • Field Name and Type - the name of the column in the Dataset and the datatype for that field. The field specified as the Created At Timestamp field is marked with a green identifying box.
  • Cardinality - the number of unique values that occur in that column. If there are more than 1000, this field shows the value as 1000+. Click the cardinality value to open a pop-up window that displays the first 1000 unique values that occur in the field.
  • Missing Values and Invalid Values - The number of rows in which the value for that column is either missing or is in some way invalid. For example, there might be an alphabetical string in a numeric field. The number is followed by the percentage of rows with missing or invalid values. Use these fields to determine whether the data quality is satisfactory.
    • Invalid values are discarded from the Dataset. If the Created At Timestamp field in a row contains a missing or invalid value, the entire row is discarded.
  • PII - Anonymized fields have a check mark in this column.

You can sort the table by clicking any column header.

Append data to a Dataset

To add more data to an existing Dataset:

  1. Open the Schema tab for a specific Dataset.
  2. Click Append Data. The Append Data pop-up window opens.
  3. Select the desired CSV file. It must comply with the schema for the existing Dataset.
    If your appended CSV file has errors, a red banner appears with a notification message. This message contains a link to open a pop-up window where you can view the specific errors. Also, the Missing Values and Invalid Values columns in the Dataset Schema table are updated to display the number and percentage of errors for each Dataset field.
    If you append a CSV file containing more columns than appear in the original schema, the extra columns are automatically discarded.

The Uploads tab

Click the Uploads tab to view a table listing the CSV files uploaded to the current Dataset. For each CSV file, the table shows information to help you evaluate the source and quality of your data.
  • Red numbers in the Missing Values and Invalid Values columns indicate gaps or inconsistencies in the data.
  • The Status column provides a quick view of whether any CSV files have data issues that can cause problems when using the Dataset for training Models or scoring agents. Hover your mouse over the status icon in a row to see a tooltip that explains the reason for the status.
    The way the status is calculated depends on the number of uploads you have done. For the first five uploads, the status is calculated based on a simple percentage of successfully imported values. For the sixth and later uploads, the status is calculated relative to the average results of all previous uploads.
Status                            | Uploads 1-5                                                                         | Uploads 6 and above (calculated relative to the average of the missing + invalid values for all previous uploads)
Green checkmark icon = Success    | Fewer than 5% of all values in the CSV file are missing or invalid.                | From 0% to (average% + 3%)
Yellow caution icon = Warning     | Between 5% and 50% of the values are missing or invalid.                           | From (average% + 3%) to (average% + 13%)
Yellow half-circle icon = Warning | The CSV file contained more than 2.5 million rows, so some rows were not uploaded. | The CSV file contained more than 2.5 million rows, so some rows were not uploaded.
Red stop icon = Error             | More than 50% of the values in the CSV file are missing or invalid.                | From (average% + 13%) to 100%
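
For example, if the missing and invalid values across the first five uploads average 4%, a sixth upload is marked Success at up to 7% (4% + 3%), Warning from above 7% to 17% (4% + 13%), and Error above 17%.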

You can remove a problematic CSV file from the Dataset without deleting the entire Dataset. To do so:

  • Click the check box on the left side of the CSV row, and then click the trashcan icon that appears above the table.