Difference between revisions of "PE-GPR/9.0.0/Help/Datasets"

Line 1: Line 1:
 
{{Article
|Standalone=No
− |DisplayName=Import and manage datasets
+ |DisplayName=Upload interaction and other data
− |TocName=Import and manage datasets
+ |TocName=Upload interaction and other data
− |Context=Predictive Routing datasets can include a broad range of data used to create Predictors and to train and test Models.
+ |Context=Predictive Routing datasets can include a broad range of data used to create Predictors and to train and test Models. This topic explains how to upload this data using the GPR web application.
− *Data Loader creates interaction and Agent Profile datasets by uploading data directly from Genesys Info Mart. See {{Link-SomewhereInThisVersion|manual=Deployment|topic=DL-CfgFile|display text=Configure Data Loader to upload data}} for instructions.
− *Datasets containing outcome data, and other data not derived from Genesys Info Mart, are uploaded from CSV files, as explained in this topic.
|ComingSoon=No
|Platform=PureEngage
Line 14: Line 11:
 
|Status=No
}}{{Section
− |sectionHeading=Extract an interaction dataset from Genesys Info Mart
+ |sectionHeading=Upload a dataset from a CSV file
− |anchor=extractGIM
+ |anchor=uploadDataset
|alignment=Vertical
− |structuredtext=Interaction datasets are created by extracting the data from the Genesys Info Mart Database.
+ |structuredtext=Predictive Routing supports datasets containing interaction and outcome data, and any other data available in your environment relevant to the metrics you intend to optimize.
− To extract the desired data, use the following procedure:
− #Create an SQL query to retrieve the interaction features from the database.
+ [[File:createNewDataset.png|left|thumb|Select CSV file to upload data]]
− #*The '''/dl''' folder in the Data Loader Docker container contains an example SQL query that retrieves agent handle time (AHT) metric data for the interactions between the given start and end dates. Your query can extract data for AHT or for any other metric for which you have the necessary data.
− #*You can extend the query to extract the columns from the Genesys Info Mart Database that contain user data relevant to the selected interactions.
− ##Place the SQL query into one of the folders on the host that is mapped to the Docker container.
− ##Provide the path to the query file location within the container in the '''[dataset-<your_name>].sql-query''' configuration option, which is configured on the Data Loader Application object.
− #Upload the extracted dataset to the GPR application.
− ##Add two sections in the Data Loader Application object: '''[dataset-<name>]''' and '''[schema-<name>]''', where ''<name>'' is the name you specify for the dataset. You must use the same ''<name>'' value for the '''dataset-''' and the '''schema-''' sections.
− ##Configure the following options in the '''[dataset-<name>]''' section:
− ##*'''start-date'''
− ##*'''end-date'''
− ##*'''chunk-size'''
− ##*'''sql-query'''
− ##*'''data-type''' - MUST have the value '''interactions'''
− ##*'''upload-dataset''' - set this to '''false''' initially, to prevent Data Loader from trying to upload the dataset while you are still configuring it. When you have finished configuring the dataset, change the value to '''true''' to start the upload process.
− ##Configure the '''[schema-<name>]''' section. This configuration consists of adding an option for every column name that your SQL query extracts from the Genesys Info Mart Database.
− ##*The option values must specify the datatype for each column.
− ##*One column must contain the timestamps for the interactions. Use the modifier "created_at" in the value of the option containing the timestamp column. For example, you might configure the following option: '''START_TS=timestamp,created_at'''
− ##*If the dataset contains columns holding sensitive or PII data, use the "anon" modifier in the values of the options with these column names. For example, you might configure the following option: '''Customer_ANI=string,anon'''
− #When you have completed your configuration, set the '''upload-dataset''' option value to '''true'''. Data Loader checks the value of this option every 15 minutes, so it starts uploading the data you specified into the configured dataset after a delay of no more than 15 minutes.
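
The removed procedure's first step references an example SQL query shipped in the '''/dl''' folder. Purely as a hedged sketch of such a query: every table and column name below is a placeholder, not the actual Genesys Info Mart schema; consult the packaged example for real names.

<pre>
-- Sketch only: placeholder names, not the real Info Mart schema.
-- The date range corresponds to the start-date and end-date options.
SELECT ix.INTERACTION_ID,
       ix.START_TS,            -- timestamp column, marked created_at in the schema
       ix.CUSTOMER_ANI,        -- PII column, anonymized via the schema section
       ix.HANDLE_TIME AS AHT   -- the metric being optimized
FROM   interaction_fact ix
WHERE  ix.START_TS BETWEEN '2020-01-01' AND '2020-02-01';
</pre>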
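Likewise, the '''[dataset-<name>]''' and '''[schema-<name>]''' sections described above might be configured along these lines. This is a sketch for orientation only: the section name, dates, path, and the '''AHT''' column are placeholders, and the inline comments are annotations, not part of the option values.

<pre>
[dataset-aht]
start-date=2020-01-01          ; placeholder query window
end-date=2020-02-01
chunk-size=50000               ; placeholder chunk size
sql-query=/dl/aht_query.sql    ; path to the query inside the container
data-type=interactions         ; must be "interactions" for this dataset type
upload-dataset=false           ; change to true once configuration is complete

[schema-aht]
START_TS=timestamp,created_at  ; timestamp column, from the example above
Customer_ANI=string,anon       ; PII column, from the example above
AHT=numeric                    ; placeholder metric column
</pre>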
 
− |Status=No
− }}{{Section
− |sectionHeading=Upload a dataset from a CSV file
− |alignment=Vertical
− |structuredtext=Predictive Routing supports datasets containing outcome data taken from sources other than Genesys Info Mart. To upload this data, use the following procedure:
− #Create a CSV file containing the desired data, collected into a consistent schema.
+ To upload this data, use the following procedure:
− ##Add two sections in the Data Loader Application object: '''[dataset-<name>]''' and '''[schema-<name>]''', where ''<name>'' is the name you specify for the dataset. You must use the same ''<name>'' value for the '''dataset-''' and the '''schema-''' sections.
− ##Configure the following options in the '''[dataset-<name>]''' section:
− ##*'''location'''
− ##*'''csv-separator'''
− ##*'''data-type''' - MUST have the value '''outcomes'''
− ##*'''upload-dataset''' - set this to '''false''' initially, to prevent Data Loader from trying to upload the dataset while you are still configuring it. When you have finished configuring the dataset, change the value to '''true''' to start the upload process.
− ##:When configuring for a CSV file upload, omit the following options: '''start-date''', '''end-date''', '''sql-query'''.
− ##Configure the '''[schema-<name>]''' section. This configuration consists of adding an option for every column name in your CSV file.
− ##*The option values must specify the datatype for each column (numeric, string, and so on).
− ##*One column must contain the timestamps for the interactions. Use the modifier "created_at" in the value of the option containing the timestamp column. For example, you might configure the following option: '''START_TS=timestamp,created_at'''
− ##*If the dataset contains columns holding sensitive or PII data, use the "anon" modifier in the values of the options with these column names. For example, you might configure the following option: '''Customer_ANI=string,anon'''
− #When you have completed your configuration, set the '''upload-dataset''' option value to '''true'''. Data Loader checks the value of this option every 15 minutes, so it starts uploading the data you specified into the configured dataset after a delay of no more than 15 minutes.
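
For orientation, the removed CSV-upload procedure amounts to Data Loader configuration along these lines. A hedged sketch only: the section name, path, and all column names are placeholders, and the inline comments are annotations, not part of the option values.

<pre>
[dataset-feedback]
location=/dl/feedback.csv     ; placeholder path to the CSV file
csv-separator=,               ; separator used in the CSV file
data-type=outcomes            ; must be "outcomes" for this dataset type
upload-dataset=false          ; change to true once configuration is complete
; start-date, end-date, and sql-query are omitted for CSV uploads

[schema-feedback]
CREATED_TS=timestamp,created_at   ; placeholder timestamp column
CUSTOMER_ID=string,anon           ; placeholder PII column
NPS_SCORE=numeric                 ; placeholder outcome column
</pre>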
 
− |Status=No
− }}{{Section
− |sectionHeading=Monitor dataset upload progress
− |anchor=newDataset
− |alignment=Vertical
− |structuredtext=[[File:createNewDataset.png|left|thumb|Select CSV file to upload data]]
− #Click the '''Settings''' gear icon [[File:GPMsettingsGear.png|50px]], located on the right side of the top menu bar, to open the '''Settings''' menu (appears on the left side of the window).
+ #Create a CSV file containing the desired data, collected into a consistent schema. See {{Link-SomewhereInThisProduct|product=PE-GPR|version=9.0.0|manual=Deployment|topic=dataReqs|context manual=Help|display text=Set up data for import}} for data requirements and recommendations.
− #Click the '''Datasets''' tab to open the Datasets window.
+ #To open the configuration menu, click the '''Settings''' gear icon, located on the right side of the top menu bar: [[File:GPMsettingsGear.png|50px]].
− #*The right-hand toggle navigation menu displays a tree view of all datasets associated with your Account, with the Predictors and Models configured for each. To open or close this navigation menu, click the [[File:GPRDatPredModNavIcon.png|25px]] icon.
+ #Click the '''Datasets''' tab, then click '''Create Dataset'''. The '''Create Dataset''' window opens.
− #*You must reload this page to view updates made using the Predictive Routing API, such as appending data to a dataset.
+ #Click in the '''Source File''' box, then navigate to your CSV file and select it.
− #Click '''Create Dataset'''.
− #In the '''Source File''' field, click in the box to browse to your CSV file, or drag and drop the file into the box.
− The GPR application automatically determines the separator type and the schema. It uses the first 500 rows to set the schema. If there are missing or inconsistent values in these rows, the schema might not appear correctly. The instructions in the following section explain how to verify the schema and upload the data.
+ GPR reads the CSV file, determines the separator type and the schema, and displays the results. To continue creating the dataset, follow these steps:
− |FAQHeading=How do I upload data from a CSV file?
 
− |Status=No
− }}{{Section
− |sectionHeading=Verify datatypes and set the timestamp field
− |anchor=verifyDataset
− |alignment=Vertical
− |structuredtext=At this stage, before the data is uploaded, review and configure your dataset.
#Enter a name for your dataset.
Line 93: Line 38:
 
*If you try to upload a file with too many columns using the GPR API, the error message appears in the dataset '''Description''' field in the GPR application.
*If you try to upload more than 2.5 million rows to a dataset, only the first 2.5 million rows are stored and the rest are discarded. A warning message appears on the dataset '''Uploads''' tab specifying how many rows were discarded. The '''Append''' button is disabled until you delete some rows from the dataset.
− |FAQHeading=How do I make sure the dataset schema imported correctly?
− |Status=No
− }}{{Section
− |sectionHeading=View the list of datasets
− |anchor=viewDatasets
− |alignment=Vertical
− |structuredtext=After you create a dataset, it appears in the table of datasets on the '''Settings: Datasets''' window.
− [[File:SettingsDatasetList.png|left|700px]]
− *To delete a dataset from the list, select the check box in the leftmost column, then click '''Delete Selected'''.
+ See {{Link-SomewhereInThisVersion|manual=Help|topic=viewDatasets|display text=View uploaded datasets}} for a description of how the GPR web application displays datasets, fields within datasets, and individual dataset uploads.
− *Click a dataset name to view that dataset and to append data.
+ |FAQHeading=How do I upload a dataset using the GPR web application?
− |Status=No
− }}{{Section
− |sectionHeading=View a dataset schema
− |anchor=viewDatasetSchema
− |alignment=Vertical
− |structuredtext=When you click a dataset name in the '''Settings: Datasets''' list, the '''Dataset Schema''' tab is displayed. It shows all of the columns in your schema and includes the following information:
− *'''Field Name''' and '''Type''' - the name of the column in the dataset and the datatype for that field. The field specified as the '''Created At''' timestamp field is marked with a green identifying box.
− *'''Cardinality''' - the number of unique values that occur in that column. If there are more than 1000, this field shows the value as '''1000+'''. Click the cardinality value to open a pop-up window that displays the first 1000 unique values that occur in the field.
− *'''Missing Values''' and '''Invalid Values''' - the number of rows in which the value for that column is either missing or is in some way invalid. For example, there might be an alphabetical string in a numeric field. The number is followed by the percentage of rows with missing or invalid values. Use these fields to determine whether the data quality is satisfactory.
− **Invalid values are discarded from the dataset. If the '''Created At''' timestamp row contains missing or invalid values, the entire row is discarded.
− *'''PII''' - Anonymized fields have a check mark in this column.
− You can sort the table by clicking any column header.
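
To make the removed description concrete, a hedged illustration of how the '''Dataset Schema''' tab might read; every field name and figure here is invented for the example:

<pre>
Field Name    Type       Cardinality  Missing Values  Invalid Values  PII
START_TS      Timestamp  1000+        0 (0.0%)        14 (0.1%)
Customer_ANI  String     1000+        120 (0.5%)      0 (0.0%)        yes
AHT           Numeric    834          95 (0.4%)       3 (0.0%)
</pre>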
 
 
|Status=No
}}{{Section
Line 125: Line 48:
 
|structuredtext=To add more data to an existing dataset:
− [[File:AppendWindow.png|right|thumb|Select CSV file to append]]<br />
+ [[File:AppendWindow.png|right|thumb|Select CSV file to append]]
#Open the '''Schema''' tab for a specific dataset.
#Click '''Append Data'''. The '''Append Data''' pop-up window opens.
#Select the desired CSV file. It must comply with the schema for the existing dataset.
− #:If your appended CSV file has errors, a red banner appears with a notification message. This message contains a link to open a pop-up window where you can view the specific errors. Also, the '''Missing Values''' and '''Invalid Values''' columns in the dataset '''Schema''' table are updated to display the number and percentage of errors for each dataset field.
+ #*If your appended CSV file has errors, a red banner appears with a notification message. This message contains a link to open a pop-up window where you can view the specific errors. Also, the '''Missing Values''' and '''Invalid Values''' columns in the dataset '''Schema''' table are updated to display the number and percentage of errors for each dataset field.
− #:If you append a CSV file containing more columns than appear in the original schema, the extra columns are automatically discarded.
+ #*If you append a CSV file containing more columns than appear in the original schema, the extra columns are automatically discarded.
|FAQHeading=How do I add data to an existing dataset?
− |Status=No
− }}{{Section
− |sectionHeading=The Uploads tab
− |anchor=uploads
− |alignment=Vertical
− |structuredtext=[[File:AppendDatasets.png|left|thumb|List of CSV files in Dataset]] Click the '''Uploads''' tab to view a table listing the CSV files uploaded to the current dataset. For each CSV file, the table shows information to help you evaluate the source and quality of your data.
− *Red numbers in the '''Missing Values''' and '''Invalid Values''' columns indicate gaps or inconsistencies in the data.
− *The '''Status''' column provides a quick view of whether any CSV files have data issues that can cause problems when using the dataset for training Models or scoring agents. Hover your mouse over the status icon in a row to see a tooltip that explains the reason for the status.
− *:The way the status is calculated depends on the number of uploads you have done. For the first five uploads, the status is calculated based on a simple percentage of successfully imported values. For the sixth and later uploads, the status is calculated relative to the average results of all uploads.
 
 
− {{{!}} border="1"
− {{!}}-
− !Status
− !Uploads 1-5
− !Uploads 6 and above<br />(Calculated based on the average of the missing + invalid values for all previous uploads)
− {{!}}-
− {{!}}Green checkmark icon = Success
− {{!}}Fewer than 5% of all values in the CSV file are missing or invalid.
− {{!}}From 0% to (average% + 3%)
− {{!}}-
− {{!}}Yellow caution icon = Warning
− {{!}}Between 5% and 50% of the values are missing or invalid.
− {{!}}From (average% + 3%) to (average% + 13%)
− {{!}}-
− {{!}}Yellow half-circle icon = Warning
− {{!}}The CSV file contained more than 2.5 million rows, so that some rows were not uploaded.
− {{!}}The CSV file contained more than 2.5 million rows, so that some rows were not uploaded.
− {{!}}-
− {{!}}Red stop icon = Error
− {{!}}More than 50% of the values in the CSV file are missing or invalid.
− {{!}}From (average% + 13%) to 100%
− {{!}}}
− You can remove a problematic CSV file from the dataset without deleting the entire dataset. To do so:
− *Click the check box on the left side of the CSV row, and then click the trashcan icon that appears above the table.
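
As a worked example of the thresholds in the removed table: suppose the previous uploads average 4% missing plus invalid values (a purely illustrative figure). For the sixth and later uploads, the status bands would then work out as:

<pre>
Success (green checkmark): 0% up to 7%      = average 4% + 3%
Warning (yellow caution):  7% up to 17%     = average 4% + 13%
Error (red stop):          above 17%, up to 100%
</pre>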
 
 
|Status=No
}}
}}

Revision as of 20:00, March 5, 2020

This topic is part of the manual Genesys Predictive Routing Help for version 9.0.0 of Genesys Predictive Routing.

Predictive Routing datasets can include a broad range of data used to create Predictors and to train and test Models. This topic explains how to upload this data using the GPR web application.


WARNING: Although the Predictive Routing web application includes data upload functionality, its use is deprecated in favor of data uploads using Data Loader. If you upload from the GPR web application, note that using both Data Loader and the UI to upload data creates conflicts and presents a high risk of data corruption.

Upload a dataset from a CSV file

Predictive Routing supports datasets containing interaction and outcome data, and any other data available in your environment relevant to the metrics you intend to optimize.

Select CSV file to upload data

To upload this data, use the following procedure:

  1. Create a CSV file containing the desired data, collected into a consistent schema. See Set up data for import for data requirements and recommendations.
  2. To open the configuration menu, click the Settings gear icon, located on the right side of the top menu bar.
  3. Click the Datasets tab, then click Create Dataset. The Create Dataset window opens.
  4. Click in the Source File box, then navigate to your CSV file and select it.
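
For illustration of step 1 above, a minimal CSV laid out with a consistent schema might look like the following sketch. The START_TS and Customer_ANI names echo the examples used elsewhere in this topic; Agent_ID and AHT, and all values, are hypothetical placeholders.

  START_TS,Customer_ANI,Agent_ID,AHT
  2020-03-01T10:15:00Z,14155550100,a1001,245
  2020-03-01T10:17:30Z,14155550177,a1014,312

From a file like this, GPR would be expected to detect the comma separator and propose Timestamp, String, String, and Numeric datatypes, which you then verify in the steps that follow.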

GPR reads the CSV file, determines the separator type and the schema, and displays the results. To continue creating the dataset, follow these steps:

  1. Enter a name for your dataset.
  2. Scroll down the list of fields and ensure that the datatypes have been discovered correctly. If any are incorrect, select the correct datatype from the Type drop-down list.
    • GPR supports the following datatypes: Boolean, Numeric, List, String, Timestamp, and Dict. See Set up data for import for how to correctly construct a Dictionary field.
  3. Click the X in a row to remove it from the dataset. If you change your mind before you upload your data (that is, before you click the Create button on the window), scroll to the bottom of the list of fields and click in the None Selected text box. Choose the field to be restored, and click Add Back.
  4. Set the Created At field, which determines the time the record was created or the time the interaction occurred, depending on the values stored in the field you select. The drop-down list contains all fields with the Timestamp datatype.
  5. (Optional) Click in the PII Fields text box to select the fields that contain personally identifiable or sensitive information, such as Social Security numbers, passwords, names and addresses, phone numbers, and so on. Data in these fields is anonymized when you create the dataset. See Data Anonymization for important information about how data is anonymized. Note that you cannot anonymize the Created At timestamp field.
  6. Click Create to upload the data.
  • If you try to upload a file that contains more than 100 columns, GPR generates an error message. Cancel out of the upload procedure, edit your CSV file to reduce the number of columns, and then repeat the upload procedure.
  • If you try to upload a file with too many columns using the GPR API, the error message appears in the dataset Description field in the GPR application.
  • If you try to upload more than 2.5 million rows to a dataset, only the first 2.5 million rows are stored and the rest are discarded. A warning message appears on the dataset Uploads tab specifying how many rows were discarded. The Append button is disabled until you delete some rows from the dataset.
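
Continuing the hypothetical CSV sketch above, the reviewed dataset might end up configured as follows (illustrative only):

  Field Name    Type       Notes
  START_TS      Timestamp  selected as the Created At field
  Customer_ANI  String     selected under PII Fields; anonymized at creation
  Agent_ID      String
  AHT           Numeric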

See View uploaded datasets for a description of how the GPR web application displays datasets, fields within datasets, and individual dataset uploads.

Append data to a dataset

To add more data to an existing dataset:

Select CSV file to append
  1. Open the Schema tab for a specific dataset.
  2. Click Append Data. The Append Data pop-up window opens.
  3. Select the desired CSV file. It must comply with the schema for the existing dataset.
    • If your appended CSV file has errors, a red banner appears with a notification message. This message contains a link to open a pop-up window where you can view the specific errors. Also, the Missing Values and Invalid Values columns in the dataset Schema table are updated to display the number and percentage of errors for each dataset field.
    • If you append a CSV file containing more columns than appear in the original schema, the extra columns are automatically discarded.
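
As a final sketch, an append file for the hypothetical dataset above must keep the same schema; an extra column would simply be dropped:

  START_TS,Customer_ANI,Agent_ID,AHT,Extra_Col
  2020-03-02T09:05:00Z,14155550122,a1002,198,ignored

Here Extra_Col is not part of the original schema, so its values are discarded during the append.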