Configure Data Loader for Feature Engineering

From Genesys Documentation

upload-dataset

Default Value: see option description
Valid Values: true, false
Changes Take Effect: After 60 sec timeout

Notifies Data Loader that the dataset is fully configured and the data processing for this dataset can be started. Data Loader checks every 60 seconds to see whether the value of this option has changed.

If set to true, Data Loader starts the dataset upload. If set to false, Data Loader does not upload data.

The default value for this option is true for the dataset-agents-gim dataset and false for the dataset-interactions-gim dataset. This configuration ensures that the agent profile, which needs to be in place first, is uploaded immediately.

use-cloud-feature-engineering

Default Value: True
Valid Values: True, False
Changes Take Effect: Immediately

Controls whether Data Loader should upload data to the cloud feature engineering pipeline (CFEP).

  • true (the default) - Data Loader uploads your data to the GPR Core Platform via the CFEP, where it can be augmented with additional features and joined with other datasets before it is used for predictor and model creation, model training, and agent scoring.
  • false - Data Loader uploads data as it did in previous releases, uploading it to the Agent Profile schema, Customer Profile schema, or a configured interactions or outcome dataset, depending on the value of the data-type option.

end-date

Default Value: 1970-01-01
Valid Values: date in YYYY-MM-DD format
Changes Take Effect: After 15 min timeout

The last date in the period for which Data Loader should retrieve data for a dataset. This date can be in the future.

  • Change the default value to a date suitable for your environment. For example, you might enter 2020-11-04.

This option is required for datasets of the interactions and outcomes types. It is not used for datasets of the customers and agents types.

start-date

Default Value: 1970-01-01
Valid Values: date in YYYY-MM-DD format
Changes Take Effect: After 15 min timeout

The earliest date in the period for which Data Loader should retrieve data for a dataset.

  • Change the default value to a date suitable for your environment. For example, you might enter 2018-11-29.

This option is required for datasets of the interactions and outcomes types. It is not used for datasets of the customers and agents types.
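
As a rough illustration of the format described for start-date and end-date, the following Python sketch checks that both values are written as YYYY-MM-DD and that the start date does not fall after the end date. The function names and the ordering check are assumptions made for illustration; how Data Loader itself validates these options is not described here.

```python
import re
from datetime import date
from typing import Tuple

# Pattern for the documented YYYY-MM-DD format (zero-padded month and day).
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_option_date(value: str) -> date:
    """Parse a start-date or end-date value; raise ValueError if malformed."""
    if not DATE_PATTERN.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    # fromisoformat also rejects impossible dates such as 2020-02-31.
    return date.fromisoformat(value)


def check_date_range(start_value: str, end_value: str) -> Tuple[date, date]:
    """Return (start, end) after checking that start is not later than end."""
    start = parse_option_date(start_value)
    end = parse_option_date(end_value)
    if start > end:
        raise ValueError("start-date must not be later than end-date")
    return start, end


if __name__ == "__main__":
    # The two example dates used in the option descriptions.
    print(check_date_range("2018-11-29", "2020-11-04"))
```

A check like this could be run before entering the values in GAX, since a mistyped date otherwise only becomes visible after the 15-minute timeout.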

sql-query

Default Value: No default value
Valid Values: A string starting with "file:" and followed by a valid path to a file in the Data Loader Docker container containing an SQL query
Changes Take Effect: After 15 min timeout

You need to configure this option only when you are using a customized query to extract data from the Genesys Info Mart database for the Agent Profile and interactions datasets. You do not need to configure the sql-query option to create datasets from .csv files, such as for Customer Profile data, outcomes data, and agent data from sources other than Genesys Info Mart.

Two example SQL queries are provided in the Data Loader Docker container for your reference:

  • /dl/interaction_data_aht.sql - the query used to collect average handling time (AHT) data for Data Loader to upload to the interactions dataset.
  • /dl/agents_data_gim.sql - the query used to collect data to populate the default Agent Profile dataset.

For instructions to create your own SQL query, see Create your own SQL query (https://all.docs.genesys.com/PE-GPR/9.0.0/Deployment/DL-CfgFile#createSQL) in the Deployment and Operations Guide.

The following is an example of a valid value for this option: file:/datasets/outcomes/my_interactions_data_gim.sql

If you do not configure this option in the [dataset-agents-gim] or [dataset-interactions-gim] sections, Data Loader uses the appropriate default query.
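
The value format for sql-query can be sketched as a small Python check. This is only an illustration: the helper name is hypothetical, the use of POSIX-style paths assumes a Linux-based container, and the check covers the "file:" prefix only, because the file's existence inside the Data Loader Docker container cannot be verified from outside it.

```python
from pathlib import PurePosixPath

PREFIX = "file:"


def sql_query_path(option_value: str) -> PurePosixPath:
    """Split a sql-query option value into its container path.

    Raises ValueError if the value lacks the "file:" prefix or has no path.
    """
    if not option_value.startswith(PREFIX):
        raise ValueError(f"value must start with {PREFIX!r}: {option_value!r}")
    raw_path = option_value[len(PREFIX):]
    if not raw_path:
        raise ValueError("no path follows the 'file:' prefix")
    return PurePosixPath(raw_path)


if __name__ == "__main__":
    # The example value given in the option description.
    path = sql_query_path("file:/datasets/outcomes/my_interactions_data_gim.sql")
    print(path, path.suffix)
```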

vq-filter

Default Value: No default value
Valid Values: a comma-separated list of valid virtual queue names
Changes Take Effect: On the next data upload

To have Data Loader upload data only from a subset of virtual queues (VQs) for inclusion in an interaction-type dataset, enter a comma-separated list of the VQs to include. Data Loader uploads records from the Genesys Info Mart database associated with the specified VQs.

trigger-pipeline-execution

Default Value: False
Valid Values: True, False
Changes Take Effect: On next data upload

This option enables you to trigger the execution of the Cloud Feature Engineering pipeline. The pipeline execution happens after the next scheduled interactions data upload is complete (unless the number of newly-uploaded records is 0).

The Cloud Feature Engineering Pipeline (CFEP) facilitates computation of aggregated features and complex data joins for more broad-based predictive analysis.

About feature engineering

The CFEP augments your datasets with engineered features and data joins. It automatically performs complex data selection and manipulation steps that otherwise require much more hands-on effort.

NOTE: Although the CFEP is enabled by default, your Genesys GPR account in the Genesys Engage Cloud must have the CFEP configured before you can use it. Contact your Genesys representative for more information.

After you have arranged to have the CFEP configured on the GPR Core Platform, follow the procedures in this topic to have Data Loader upload your data to the pipeline. The CFEP is available in Data Loader release 9.0.017.01 and higher.

With the CFEP enabled, Data Loader does the following:

  • Automatically extracts user data from Genesys Info Mart.
  • Uploads historical interaction data processed on specified virtual queues.
  • Optionally, triggers execution of the CFEP job every upload period after all dataset chunks are uploaded to the GPR Core Platform.

Turn on feature engineering

Enable feature engineering

Feature engineering is enabled by default: the use-cloud-feature-engineering configuration option is set to true. You can turn off feature engineering separately for any datasets you do not want the CFEP to process. To turn it off, open the Data Loader Application object in GAX and set the value of the use-cloud-feature-engineering option to false.

Start feature engineering automatically

To have the CFEP automatically start processing each time Data Loader uploads fresh data, set the trigger-pipeline-execution option to true. If you prefer to start pipeline processing manually, you can trigger it by sending a request to the GPR API.

Configure data extraction from Genesys Info Mart

SQL Query Templates

The fields that Data Loader extracts from the Genesys Info Mart database for upload are defined by an SQL query file. Genesys provides default SQL query template files with the Data Loader IP to populate the interactions and Agent Profile datasets.

Genesys recommends that you use the standard SQL query templates to upload data to the CFEP. If you need custom SQL queries, review the information in Create your own SQL query for mandatory fields and other important guidelines.

Note: If you use the standard SQL query templates, do not configure the sql-query option for the interactions and Agent Profile datasets.

Upload User Data

Data Loader extracts user data stored in the Genesys Info Mart database, which is expected to follow user data mapping and propagation rules. See User Data Mapping in the Genesys Info Mart Deployment Guide for a discussion of this topic.

  • If a user data key in the Info Mart database has only one configured rule, Data Loader automatically adds it to the interactions dataset.
  • If you have configured multiple mappings for a user data key and have not specified which is the default rule, Data Loader adds each mapped user data value to a different column in the dataset. For example, suppose a key, CustomData, has two mappings configured, IRF_ROUTE and PARTY. With no default rule configured, Data Loader adds the following two columns to the dataset: CustomData_IRF_ROUTE and CustomData_PARTY.
  • If one of the mapped user data values is more important than the others for predictor creation and model building, you can specify it as the default rule. See Configure user data with multiple mappings for instructions.

Note: If you do not have multiple mappings for user data in the Info Mart database, or if you are satisfied with Data Loader assigning each mapping to a separate column, you do not need to do any additional configuration.
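
The column naming described above, for the case where no default rule is configured, can be sketched in Python as follows. The function name is hypothetical, and naming the column after the key alone when there is a single rule is an assumption; the text only states that Data Loader adds such a key to the dataset automatically.

```python
from typing import List


def columns_without_default(key: str, rules: List[str]) -> List[str]:
    """Return the dataset column names for one user data key.

    key   -- the user data key name, for example "CustomData"
    rules -- the mapping rules configured for that key in Info Mart
    """
    if not rules:
        raise ValueError(f"no mapping rules configured for {key!r}")
    if len(rules) == 1:
        # Assumption: with a single rule the column is named after the key.
        return [key]
    # Multiple rules and no default: one column per rule, <key>_<rule>.
    return [f"{key}_{rule}" for rule in rules]


if __name__ == "__main__":
    print(columns_without_default("CustomData", ["IRF_ROUTE", "PARTY"]))
```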

Filter by VQ

To limit the amount of data that Data Loader uploads to the CFEP, list the names of the virtual queues (VQs) from which Data Loader should take historical interaction data.

To specify VQ names, list the names of the desired VQs in the vq-filter option.

Note: The length of the comma-separated list of VQ names should not exceed 4096 characters. Genesys always recommends that you use Genesys Administrator Extension (GAX) to configure options, but this recommendation is especially important if your list of VQ names is longer than 255 characters.
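
A minimal Python sketch of preparing a vq-filter value within the limits mentioned in the note might look like this. The helper name and the VQ names in the example are hypothetical, and trimming whitespace around each name is an assumption, since the handling of spaces in the list is not described here.

```python
from typing import Iterable

MAX_FILTER_LENGTH = 4096   # maximum list length stated in the note
GAX_ADVISED_LENGTH = 255   # beyond this, the note stresses configuring via GAX


def build_vq_filter(vq_names: Iterable[str]) -> str:
    """Join VQ names into a comma-separated vq-filter value."""
    cleaned = [name.strip() for name in vq_names if name.strip()]
    for name in cleaned:
        if "," in name:
            # A comma inside a name would be read as a list separator.
            raise ValueError(f"VQ name contains a comma: {name!r}")
    value = ",".join(cleaned)
    if len(value) > MAX_FILTER_LENGTH:
        raise ValueError(
            f"vq-filter is {len(value)} characters; limit is {MAX_FILTER_LENGTH}"
        )
    return value


def needs_gax(value: str) -> bool:
    """True when the value is longer than 255 characters."""
    return len(value) > GAX_ADVISED_LENGTH


if __name__ == "__main__":
    value = build_vq_filter(["VQ_Sales", " VQ_Support "])
    print(value, needs_gax(value))
```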

Migrate datasets to FE

If you have already-configured datasets that you would now like the CFEP to process, use the following procedure:

  1. Disable interaction processing using GPR for the time it takes to perform the migration.
  2. From Genesys Administrator Extension (GAX), export your current Data Loader configuration and save it as a reference.
  3. Delete the existing Agent Profile schema from the GPR web application. See View the Agent Profile schema in the Predictive Routing Help for instructions.

Use GAX to update your Data Loader configuration:

  1. Set the upload-dataset and use-cloud-feature-engineering options to false for all the previously-uploaded datasets you plan to migrate.
  2. Remove the following configuration sections, if present. They are not used with the CFEP, which performs data aggregation automatically: [dataset-agents-gim-ext] and [schema-agents-gim-ext].
  3. Your configuration needs to include both the [dataset-interactions-gim] and [schema-interactions-gim] pair of configuration sections and a new pair of sections that include the same options and the same initial configuration settings.
  4. NOTE: Data Loader releases prior to 9.0.017.01 did not include [dataset-interactions-gim] and [schema-interactions-gim] in the default template. If you do not have those sections in your environment, use GAX to import the Data Loader Application Template for release 9.0.017.01 or higher, then continue the following procedure.
  5. To create the required configuration sections, use GAX to perform the following steps:
    1. Rename the [dataset-interactions-gim] and [schema-interactions-gim] sections to [dataset-interactions-gim-temp] and [schema-interactions-gim-temp]. This prevents them from being overwritten in the following step.
    2. In GAX, import the DataLoader.cfg template file, which is located in the Data Loader Installation Package. For detailed information about importing Application templates, see Bulk Provisioning of Configuration Options in the Genesys Administrator Extension Help.
    3. Locate the newly-created [dataset-interactions-gim] and [schema-interactions-gim] sections, which were provisioned by default when you imported the .cfg file. Rename them for use with the CFEP. For example, the new names of these sections might be [dataset-interactions-fe] and [schema-interactions-fe]. All further configuration for the CFEP should be done using these sections.
    4. Return to your original sections, now named [dataset-interactions-gim-temp] and [schema-interactions-gim-temp]. Return their names to [dataset-interactions-gim] and [schema-interactions-gim]. These sections do not require any additional configuration.
  6. (Optional) To use multiple SQL queries to extract data from the Genesys Info Mart database into separate interactions datasets, configure additional [dataset-<name>] and [schema-<name>] configuration sections, one for each separate SQL query.
    • NOTE: In this case you need to configure the [dataset-<name>].sql-query option for each additional dataset and provide the correct path to the associated SQL file location.
  7. Configure the date range for each dataset to be used with the CFEP by setting the desired values for the [dataset-<name>].start-date and [dataset-<name>].end-date options.
  8. Save the changes to your configuration.
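
The section layout that the preceding steps aim for can be checked with a short Python sketch. It assumes the exported configuration has already been read into a plain dictionary that maps section names to their options; that structure, the function name, and the "fe" suffix (taken from the example section names above) are illustrative assumptions, not part of Data Loader or GAX.

```python
from typing import Dict, List

# Sections the procedure says to remove because the CFEP aggregates data itself.
OBSOLETE_SECTIONS = ("dataset-agents-gim-ext", "schema-agents-gim-ext")


def find_section_problems(
    sections: Dict[str, Dict[str, str]], cfep_suffix: str = "fe"
) -> List[str]:
    """Return a list of problems found; an empty list means none were found.

    sections    -- hypothetical dict: section name -> {option name: value}
    cfep_suffix -- suffix chosen for the new CFEP interactions sections
    """
    problems = []
    for name in OBSOLETE_SECTIONS:
        if name in sections:
            problems.append(f"[{name}] is not used with the CFEP; remove it")
    # Both the original pair and the new CFEP pair are expected to exist.
    for suffix in ("gim", cfep_suffix):
        for prefix in ("dataset", "schema"):
            name = f"{prefix}-interactions-{suffix}"
            if name not in sections:
                problems.append(f"[{name}] is missing")
    return problems


if __name__ == "__main__":
    example = {
        "dataset-agents-gim": {},
        "dataset-interactions-gim": {},
        "schema-interactions-gim": {},
        "dataset-interactions-fe": {},
        "schema-interactions-fe": {},
    }
    print(find_section_problems(example))
```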

After configuring your datasets in GAX, continue with the following steps:

  1. If you have configured multiple User Data Propagation rules for a single user data key in Genesys Info Mart, see the Configure user data rules section on this page.
  2. Install Data Loader release 9.0.017.01 or higher, following the instructions to Deploy Data Loader. The version you deploy must support the CFEP.
  3. In GAX, set the use-cloud-feature-engineering option to true for the [dataset-agents-gim] and [dataset-interactions-fe] configuration sections in the Data Loader Application object. If you are using additional datasets, make this change in the associated dataset configuration sections as well.
  4. Start Data Loader.
  5. In GAX, set the upload-dataset option to true for the [dataset-agents-gim] and [dataset-interactions-fe] configuration sections in the Data Loader Application object. If you are using additional datasets, make this change in the associated dataset configuration sections as well. This triggers the initial upload to the CFEP.
  6. After the CFEP job is complete, open the GPR web application and review the datasets created. If necessary, refer to View your uploaded data in the Predictive Routing Help for help finding and understanding the data displays. You can now use the uploaded data to create predictors and models.
  7. After you have verified the quality of the datasets produced by CFEP, created predictors with the new datasets, and trained the new models, use GAX to remove your old datasets from the Data Loader Application configuration.
  8. Re-enable GPR interaction processing.

Configure user data rules

If you have used Genesys Info Mart user data mapping rules to have certain user data stored in multiple Info Mart database fields, you can choose to specify one of those as the main value for your dataset. To do so, create an option in the [schema-interactions-*] section of the Data Loader Application. The option name is the user data key name. The option value consists of the datatype followed by the parameter rule:<rule_name>.

Example

  1. If the dataset is identified with the suffix "fe", the section name where you should create the option is [schema-interactions-fe].
  2. The option name should be the name of the user data key. In this example, the key name is CustomData.
  3. Set the option value using the following format: datatype, rule:<rule_name>. In this example, the option value might be: string,rule:IRF_ROUTE.

Data Loader adds a column named CustomData to the uploaded dataset, which contains the values from the Info Mart database column defined by the mapping rule IRF_ROUTE.

Note: Data from the other mapping rules is retained and added to the dataset, in case it might prove useful. The value for each rule is stored in a separate column named <user_data_keyname>_<rule_name>. For example, you might have a rule, PARTY, for the CustomData user data key. In that case, Data Loader adds a column named CustomData_PARTY containing the values for the other mapping.
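
To make the option format and the resulting columns concrete, here is a hedged Python sketch that parses a value such as string,rule:IRF_ROUTE and lists the columns described above. Both function names are hypothetical, and accepting optional spaces around the comma is an assumption based on the two spellings of the format shown in the example.

```python
from typing import List, Tuple

RULE_PREFIX = "rule:"


def parse_user_data_option(value: str) -> Tuple[str, str]:
    """Split an option value into (datatype, rule_name)."""
    datatype, sep, rule_part = value.partition(",")
    datatype = datatype.strip()
    rule_part = rule_part.strip()
    if not sep or not datatype or not rule_part.startswith(RULE_PREFIX):
        raise ValueError(f"expected 'datatype,rule:<rule_name>', got {value!r}")
    rule_name = rule_part[len(RULE_PREFIX):].strip()
    if not rule_name:
        raise ValueError("rule name is empty")
    return datatype, rule_name


def columns_with_default(key: str, rules: List[str], default_rule: str) -> List[str]:
    """List dataset columns when one mapping rule is chosen as the default.

    The default rule's values go into a column named after the key; every
    other rule keeps its own <key>_<rule> column, as the note describes.
    """
    if default_rule not in rules:
        raise ValueError(f"{default_rule!r} is not a rule configured for {key!r}")
    others = [f"{key}_{rule}" for rule in rules if rule != default_rule]
    return [key] + others


if __name__ == "__main__":
    datatype, rule = parse_user_data_option("string,rule:IRF_ROUTE")
    print(datatype, columns_with_default("CustomData", ["IRF_ROUTE", "PARTY"], rule))
```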

Retrieved from "https://all.docs.genesys.com/PE-GPR/9.0.0/Deployment/DL-CFEP (2021-04-14 07:53:33)"