Sequential file Stage in datastage

Sequential file  Stage in datastage

The sequential file stage in DataStage allows to read and write the simple flat files. The flat files could be CSV or delimited by any particular character.Fixed width files are also supported by Sequential file stage. The stage has single reject, input and output links. It also supports RCP (runtime column propagation).

Most used properties.

Property Values
Target/File Source or Target Path
Target/File Update Mode Append/ Create/ Overwrite
Target/Write Method Specific File (for Single file) /Generate Multiple Files(Pattern based )
Options/First Line is Column Names True/False
Options/Reject Mode Continue/Fail/Save (Whether to reject data or not)
Options/Filter Command (any unix based command to filter the data)
Options/Schema File Pathname( )

Reading a Sequential file

Specify The Source File

Drag and Drop Sequential file stage on canvas.
Double click on the stage, In the properties tab, Specify the full path in the Source File property.

Reading multiple files

  • To read multiple files, specify the file pattern in the Source file property.
  • Example

Reading Comma separated files (CSV) files.

  1. Go to Output Format tab specify the Record delimiter = comma
  2. Set Final Delimiter = end.
  3. Set row delimiter = Unix newlines
  4. Specify the columns metadata information in columns tab.

    Setting up advanced properties of sequential stage
    Sequential file stage Format tab

Reading fixed width files

  1. Go to Output Format tab, Specify the final delimiter = End
  2. Set  Record delimiter = Unix NewLine
  3. Set Record length = fixed
  4. Set Delimiter – none
  5. Set Quote – none

*Go to Columns tab specify the Column data types as Char with length (It should not be varchar).

View Records

  1.     To check if the formatting has been defined correctly, click view data. If the data is visible as per source file then the properties are set correctly

Writing data in sequential file stage:

Specify the Target File

  • Drag and Drop Sequential file stage and rename it as Sequential_file_1
  • Double click on the Sequential_file_1 stage, In the properties tab, Specify the full path to the target file.

Writing multiple files:

  • To Write multiple files, specify the target file Path multiple times.
  • Example

Writing Comma separated files (CSV) files.

  1. Go to Input Format tab specify the Record delimiter = comma
  2.  Set Final Delimiter = end.
  3.  Set row delimiter = Unix newlines
  4.  Specify the columns metadata information in columns tab.

Writing to fixed width files

  1. Go to Input Format tab, Specify the final delimiter = End
  2. Set  Record delimiter = Unix NewLine
  3. Set Record length = fixed
  4. Set Delimiter – none
  5. Set Quote – none

*Go to Columns tab specify the Column data types as Char.

 View records written to file

Go to Properties of the sequential stage and click view data.

View records via UNIX

To check the result, you could use following UNIX commands via putty.
$cat /home/output.csv
or
$cat /home/output.csv to see input

Preserve partitioning

This property can be set in the advanced tab of the sequential stage. If you set this property than it requests the next stage to preserve the partitioning  .it is ignored for write operations.

 

Null handling

The sequential file stage doesn’t recognize  Nulls. So they have to be handled exclusively.

Go to the Format tab in the sequential file stage then select the Field Defaults option on the left side and select Null field value option at right side .specify the null field value here.

*Nulls are read as empty string in sequential file

Read Sequential file with Schema file

The schema file is a plain text file with the metadata information of columns. they are not saved in repositories .but the schema files can be used to specify the columns at runtime. Some jobs allow us to use partial schema’s so that only you only have to specify required columns.

The runtime column propagation(RCP) must be enabled if you wish to use schema files.

Sample Schema file:

 

record ( emp_name:string[255]; full_address:nullable string[255]; value_1:int32; value_2:int32; date:date)

Each column in schema file is described in the following  format:

column_name:[nullability]datatype;

To specify a default value if the source is NULL: {null_field='[value]’}

Filter option

The filter option in sequential stage properties allows you to filter the records based on the UNIX command.

Skip First and last record

The First and last records can be skipped using sequential stage Filter property as follows.

  • Go to Sequential properties ? Filter
  • Paste the UNIX command sed ‘1d;$d’ in the filter property
  • *Don’t change any partitioning Setting

Fetch First and last record

The First and last records can be fetched using sequential stage Filter property as follows.

  • Paste the UNIX command sed -n ‘1p;$p’ in the filter property

Parallelism

The stage runs in parallel mode in case of reading multiple files but runs in sequential mode for the single file.For fixed width files stage can be configured exclusively by the number of readers per node and read from multiple node properties.

Read From Multiple Nodes

This property allows individual files to be read by several nodes. With the information of node availability and record size of fixed width files, DataStage allocates readers on each within the file which  improves performance on cluster systems.

Parallism concept in sequential stage
Sequential stage: numbers of readers per node

Number Of Readers Per Node

This property allows partitioning the data within the single file.Each node reads the single file with multiple readers and writes data to multiple partitions.For SMP systems it results in better I/O performance.

Parallelism Concpet and sequential stage
Sequential stage read from multiple nodes

Errors and solution

Field “COLUMN_NAME” is null but no null export handling is defined

Solution: Go to Format tab of sequential file stage, set the optional property  Null field value to the empty string to indicate an empty string. The sequential file stage doesn’t recognize nulls.

This solution is also applicable for following errors/warnings.

  • When validating export schema: At field “[field name”: Exporting nullable field without null handling properties.

APT_IMPEXP_ALLOW_ZERO LENGTH_FIXED_NULL

To allow zero length null field values with fixed length fields, set the APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL environment variable.This might be useful for poorly formatted data will cause incorrect results. By default, a zero length null field value might cause an error.

Warning: A sequential operator cannot preserve the partitioning of input data set on input port 0

It is required to clear the preserve partition flag ahead of sequential stage.

Datastage Versions:

This article pertains to datastage version 8.1,8.5,8.7,.9.1,9.5,11.3 and 11.5

Related Examples:

  • Example with fixed width file
  • Concept of readers per node and node per reader
  • how to remove first and the Last record
  • how to add a header to a flat file using datastage

Share this post

Leave a Reply