IBM InfoSphere Advanced DataStage – Parallel Framework
Course Overview
One of the most powerful features of IBM InfoSphere DataStage is its parallel processing functionality. It lets you specify and execute multiple data transformations at the same time, which increases data handling efficiency and delivers the information needed for actionable analytics to where it is needed more quickly.
Ironside’s 3-day IBM InfoSphere Advanced DataStage – Parallel Processing course will prepare you to design more robust parallel processing jobs that are less error-prone, more reusable, and optimized for the best possible performance. During the class, you’ll gain a much deeper understanding of DataStage architecture, including the development process with the tool and how it relates to the runtime environment. By the course’s conclusion, you will be an advanced DataStage practitioner able to easily navigate all aspects of parallel processing.
Prerequisites
This course is intended for intermediate to experienced DataStage users who want to dive deeper into parallel processing capabilities. Ideal students will have experience equivalent to completing the DataStage Essentials course and will have been developing parallel jobs in DataStage for at least a year.
Course Goals
- Understand the Parallel Framework Architecture that enables the parallel processing functionality in DataStage.
- Learn the finer points of compilation, execution, partitioning, collecting, and sorting.
- Recognize how buffering affects parallel jobs and firmly grasp the different Parallel Framework data types available to you.
- Take advantage of reusable components in parallel processing and engage in balanced optimization of your parallel jobs.
High-Level Curriculum
- Describe and discuss the architecture behind parallel processing and the pipeline and partition parallelism methods.
- Recognize the role and elements of a DataStage configuration file and gain deep knowledge of the compile process and how it is represented in the OSH.
- Become comfortable with describing and carrying out the runtime job execution process and recognizing how it is depicted in the Score, as well as describing how data partitioning and collecting work in the Parallel Framework.
- List and select the partitioning and collecting algorithms available.
- Detail the process of sorting, the optimization techniques available for sorting, and the sort key and partitioner key logic in the Parallel Framework.
- Describe buffering and the optimization techniques for buffering in the Parallel Framework.
- Work with Parallel Framework data types and elements including virtual data sets and schemas.
- Use and explain Runtime Column Propagation (RCP) in DataStage parallel jobs.
- Create reusable job components based on shared containers.
- Explain Balanced Optimization and optimize DataStage parallel jobs using it.