IBM InfoSphere Advanced DataStage – Advanced Data Processing

Course Overview

To be a truly advanced IBM InfoSphere DataStage practitioner, you not only need to have an intimate understanding of the platform’s parallel processing capabilities but also must be well-versed in how to process the different complex data sources you will be asked to integrate. Resources like relational data, unstructured data, Hadoop HDFS files, and XML data all have their own unique processing requirements that you will need to deftly navigate.

Ironside’s 2-day IBM InfoSphere Advanced DataStage – Advanced Data Processing course will prepare you to handle any complex data sources you encounter. You will learn advanced processing techniques for masking data, validating data, and using data rules, establishing a sophisticated knowledge of how to deal with all the data sources listed above and more. In addition, you will become an expert in star schema-based data warehouse updates using DataStage SCD (Slowly Changing Dimensions), which will enable you to more accurately integrate information from this type of source.


This course is intended for experienced DataStage developers who want a deeper understanding of the data handling techniques available to them for a variety of complex data sources. Participants should already have knowledge of DataStage equivalent to completing a DataStage Essentials course and should have at least one year of experience working with parallel jobs in DataStage.

Course Goals

  • Explore database access techniques for the major complex data formats that DataStage developers encounter regularly.

  • Become adept at processing data in both unstructured and big data-oriented formats.

  • Learn the details of data masking, XML data handling, and data rule creation.

  • Gain familiarity with the update procedures for star schema data warehouses.

High-Level Curriculum

  • Use Connector stages to read from and write to database tables and handle any SQL errors occurring in your Connector stages.

  • Use the Unstructured Data and Big Data stages to extract information from Excel spreadsheets and read from and write to Hadoop HDFS files.

  • Establish processes for disguising sensitive data in DataStage jobs using the Data Masking stage.

  • Parse, compose, and transform XML data using the XML stage and import/manage XML schemas using the Schema Library Manager.

  • Validate fields in a DataStage job and create custom validation rules using the Data Rules stage.

  • Design a job that updates star schema data warehouses containing both Type 1 and Type 2 SCDs.

  • Generate surrogate keys using the Surrogate Key Generator.

Request a Quote for Private Training