Internal Sort


Sorting data in a file or combining multiple files is a common requirement in many applications. Sorting is a method that arranges the given records in either ascending or descending order.

There are two techniques in COBOL to sort the files -

  • External sort - It uses the SORT utility in JCL to sort files. DFSORT utility is well explained in the Utilities section.
  • Internal sort - An internal sort is a method to sort data directly within the program's working storage without relying on external utilities. During the sorting process, COBOL reads the records, sorts them in the specified order, and then writes the sorted records either to a new file or back to a file.

Below statements are used in Internal sort process -

  • SORT Statement
  • RETURN Statement
  • RELEASE Statement
  • MERGE Statement

SORT Statement


The SORT statement is a powerful statement used for sorting files internally within the program. It's mainly useful in batch processing, where large amounts of data need to be sorted before further processing.

SORT work-file1 
   ON ASCENDING|DESCENDING KEY key-1
   [USING input-file | INPUT PROCEDURE IS para-1 THRU para-2]
   [GIVING output-file | OUTPUT PROCEDURE IS para-3 THRU para-4]
  • work-file-1 - The name of the sort work file. It's a logical file and does not correspond to any physical file.
  • ASCENDING|DESCENDING KEY - Specifies the sort order. You can sort on multiple keys.
  • key-1, ... - The data item(s) on which to sort.
  • USING input-file - The name of the input file to be sorted.
  • GIVING output-file - The name of the output file where the sorted data will be written.
  • INPUT PROCEDURE - Optional. A procedure is executed for each record of the input file before sorting.
  • OUTPUT PROCEDURE - Optional. A procedure is executed for each record of the sorted file.

Example - Let us assume we have an employee file that contains employee information unsorted. The file should be sorted based on the employee number (1-5 characters) in ascending order.

Input (MATEPK.EMPFILE.NSINPUT1) -

----+----1----+----2----+----3----+----4----+--
E0004EMPLOYEE4     TL   DEPT1LOC1 0000050000
E0002EMPLOYEE2     MGR  DEPT1LOC1 0000080000
E0006EMPLOYEE6     SE   DEPT1LOC1 0000034000
E0001EMPLOYEE1     DIR       LOC1 0000100000

Code -

----+----1----+----2----+----3----+----4----+----5----+
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SORTFLS.
       AUTHOR. MTH.

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT EMPFILE1 ASSIGN TO INPUT1.
           SELECT EMPFILEO ASSIGN TO OUTPUT1.
           SELECT WORKFILE ASSIGN TO WORK1.

       DATA DIVISION. 
       FILE SECTION.
       FD EMPFILE1
           RECORD CONTAINS 47 CHARACTERS
           DATA RECORD EMP-REC1
           RECORDING MODE F.
       01 EMP-REC1.
          COPY EMPREC.
       FD EMPFILEO 
           RECORD CONTAINS 47 CHARACTERS 
           DATA RECORD EMP-RECO 
           RECORDING MODE F.
       01 EMP-RECO.
          COPY EMPREC.
       SD WORKFILE
           RECORD CONTAINS 47 CHARACTERS 
           DATA RECORD WORK-REC
           RECORDING MODE F.
       01 WORK-REC.
          05 WORK-EMP-NUM     PIC 9(05).
          05 WORK-REM-REC     PIC X(42).

       PROCEDURE DIVISION.
           SORT WORKFILE
             ON ASCENDING KEY WORK-EMP-NUM 
                USING EMPFILE1
                GIVING EMPFILEO.
           STOP RUN.

JCL -

///MATEPKRJ JOB MSGLEVEL=(1,1),NOTIFY=&SYSUID
//*
//STEP01  EXEC PGM=SORTFLS
//STEPLIB  DD  DSN=MATEPK.COBOL.LOADLIB,DISP=SHR
//INPUT1   DD  DSN=MATEPK.EMPFILE.NSINPUT1,DISP=SHR
//OUTPUT1  DD  DSN=MATEPK.EMPFILE.NSOUT1,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(TRK,(3,2),RLSE),
//            UNIT=SYSDA,
//            DCB=(DSORG=PS,RECFM=FB,LRECL=47,BLKSIZE=470)
//WORK1    DD  DSN=&&TEMP,
//            DISP=(NEW,DELETE,DELETE),
//            SPACE=(CYL,(10,5),RLSE),
//            UNIT=SYSDA,
//            DCB=(DSORG=PS,RECFM=FB,LRECL=47,BLKSIZE=470)
//SYSOUT   DD  SYSOUT=*

Output (MATEPK.EMPFILE.NSOUT1) -

----+----1----+----2----+----3----+----4----+--
E0001EMPLOYEE1     DIR       LOC1 0000100000
E0002EMPLOYEE2     MGR  DEPT1LOC1 0000080000
E0004EMPLOYEE4     TL   DEPT1LOC1 0000050000
E0006EMPLOYEE6     SE   DEPT1LOC1 0000034000

Explaining Example -

In the above example, SORT statement sorts the input file records based on the employee number (1-5 bytes) and writes into the output file.

MERGE Statement


The MERGE statement is used to combine two or more sequentially ordered files into a single, merged, and sequentially ordered output file. The files should be in sorted order according to the ascending or descending key that is common for both files.

MERGE work-file-1
     ON ASCENDING/DESCENDING KEY key-name-1 [key-name-2 ...]
     [USING input-file-1 [, input-file-2 ...]]
	 [OUTPUT PROCEDURE IS para-1 THRU para-2]
     [GIVING output-file-1 [, output-file-2 ...]]

Example - Let us assume two different employee files contain employee information, and create one merged file containing all employee details from two files.

Input -

MATEPK.EMPFILE.INPUT1

----+----1----+----2----+----3----+----4----+--
E0001EMPLOYEE1     DIR       LOC1 0000100000   
E0002EMPLOYEE2     MGR  DEPT1LOC1 0000080000   
E0004EMPLOYEE4     TL   DEPT1LOC1 0000050000   
E0006EMPLOYEE6     SE   DEPT1LOC1 0000034000

MATEPK.EMPFILE.INPUT2

----+----1----+----2----+----3----+----4----+--
E0003EMPLOYEE3     MGR  DEPT2LOC2 0000075000   
E0005EMPLOYEE5     SSE  DEPT1LOC1 0000045000   
E0007EMPLOYEE7     SSE  DEPT2LOC2 0000046000  

Code -

----+----1----+----2----+----3----+----4----+----5----+
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MERGEFLS. 
       AUTHOR. MTH.

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT EMPFILE1 ASSIGN TO INPUT1.  
           SELECT EMPFILE2 ASSIGN TO INPUT2.  
           SELECT EMPFILEO ASSIGN TO OUTPUT1. 
           SELECT WORKFILE ASSIGN TO WORK1.
 
       DATA DIVISION.
       FILE SECTION.
	  * Input File1
       FD EMPFILE1
           RECORD CONTAINS 47 CHARACTERS
           DATA RECORD EMP-REC1
           RECORDING MODE F.
       01 EMP-REC1.
          COPY EMPREC.
	  * Input File2
       FD EMPFILE2
           RECORD CONTAINS 47 CHARACTERS
           DATA RECORD EMP-REC2
           RECORDING MODE F.
       01 EMP-REC2.
          COPY EMPREC.
	  * Output file
       FD EMPFILEO
           RECORD CONTAINS 47 CHARACTERS
           DATA RECORD EMP-RECO 
           RECORDING MODE F.
       01 EMP-RECO.
          COPY EMPREC.
	  * Merge work file
       SD WORKFILE
           RECORD CONTAINS 47 CHARACTERS
           DATA RECORD WORK-REC
           RECORDING MODE F.
       01 WORK-REC.
          05 WORK-EMP-NUM         PIC 9(05).
          05 WORK-REM-REC         PIC X(42).

       PROCEDURE DIVISION.
           MERGE WORKFILE
              ON ASCENDING KEY WORK-EMP-NUM
                 USING EMPFILE1, EMPFILE2
                 GIVING EMPFILEO.
           STOP RUN.

JCL -

//MATEPKRJ JOB MSGLEVEL=(1,1),NOTIFY=&SYSUID
//*
//STEP01  EXEC PGM=MERGEFLS
//STEPLIB  DD  DSN=MATEPK.COBOL.LOADLIB,DISP=SHR
//INPUT1   DD  DSN=MATEPK.EMPFILE.INPUT1,DISP=SHR
//INPUT2   DD  DSN=MATEPK.EMPFILE.INPUT2,DISP=SHR
//OUTPUT1  DD  DSN=MATEPK.EMPFILE.OUTPUT1,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(TRK,(3,2),RLSE),
//            UNIT=SYSDA,
//            DCB=(DSORG=PS,RECFM=FB,LRECL=47,BLKSIZE=470)
//WORK1    DD  DSN=&&TEMP,
//            DISP=(NEW,DELETE,DELETE),
//            SPACE=(CYL,(10,5),RLSE),
//            UNIT=SYSDA,
//            DCB=(DSORG=PS,RECFM=FB,LRECL=47,BLKSIZE=470)
//SYSOUT   DD  SYSOUT=*  

Output (MATEPK.EMPFILE.OUTPUT1) -

----+----1----+----2----+----3----+----4----+--
E0001EMPLOYEE1     DIR       LOC1 0000100000   
E0002EMPLOYEE2     MGR  DEPT1LOC1 0000080000   
E0003EMPLOYEE3     MGR  DEPT2LOC2 0000075000   
E0004EMPLOYEE4     TL   DEPT1LOC1 0000050000   
E0005EMPLOYEE5     SSE  DEPT1LOC1 0000045000   
E0006EMPLOYEE6     SE   DEPT1LOC1 0000034000   
E0007EMPLOYEE7     SSE  DEPT2LOC2 0000046000   

RELEASE Statement


The RELEASE statement is used to send records from the input procedure to a sort work file in the sorting process. It is only used inside the INPUT PROCEDURE of a SORT operation.

RELEASE record-name [FROM ws-record-name]

Example - Let us use the input file from the above SORT example. But, we need to skip the DIR (having employee number E0001) record before sorting.

Input -

MATEPK.EMPFILE.NSINPUT1

----+----1----+----2----+----3----+----4----+--
E0004EMPLOYEE4     TL   DEPT1LOC1 0000050000
E0002EMPLOYEE2     MGR  DEPT1LOC1 0000080000
E0006EMPLOYEE6     SE   DEPT1LOC1 0000034000
E0001EMPLOYEE1     DIR       LOC1 0000100000

Code -

----+----1----+----2----+----3----+----4----+----5----+
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RELWFILE.
       AUTHOR. MTH.

       ENVIRONMENT DIVISION. 
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT EMPFILE1 ASSIGN TO INPUT1. 
           SELECT EMPFILEO ASSIGN TO OUTPUT1.
           SELECT WORKFILE ASSIGN TO WORK1.

       DATA DIVISION.
       FILE SECTION.
	  * Use declarations from SORT example
       ...

       WORKING-STORAGE SECTION.
       01 WS-VAR.
          05 WS-INPUT1-SW         PIC X(01) VALUE 'N'.
             88 INPUT1-EOF        VALUE 'Y'.
             88 NOT-INPUT1-EOF    VALUE 'N'.

       PROCEDURE DIVISION.
	  * Sorting WORKFILE using a input paragraph
	  * 1000-FILTER-RECORDS and writing to output file
	  * EMPFILEO.
           SORT WORKFILE
             ON ASCENDING KEY WORK-EMP-NUM 
             INPUT PROCEDURE IS 1000-FILTER-RECORDS
                GIVING EMPFILEO.
           STOP RUN.
		   
      * Input paragraph for SORT statement
       1000-FILTER-RECORDS.

           SET  NOT-INPUT1-EOF   TO TRUE
           OPEN INPUT EMPFILE1.
           READ EMPFILE1
                AT END SET INPUT1-EOF TO TRUE
           END-READ

           PERFORM UNTIL INPUT1-EOF
	  * Releases the WORK-REC after processing.
                IF EMP-NUM  OF EMP-REC1
                            NOT EQUAL 'E0001'
                   MOVE EMP-REC1      TO WORK-REC
                   RELEASE WORK-REC
                END-IF 
              READ EMPFILE1
                   AT END SET INPUT1-EOF TO TRUE 
              END-READ
           END-PERFORM.

           CLOSE EMPFILE1.

JCL - Use same JCL from SORT example

Output (MATEPK.EMPFILE.NSOUTPUT) -

----+----1----+----2----+----3----+----4----+--
E0002EMPLOYEE2     MGR  DEPT1LOC1 0000080000   
E0004EMPLOYEE4     TL   DEPT1LOC1 0000050000   
E0006EMPLOYEE6     SE   DEPT1LOC1 0000034000

RETURN Statement


The RETURN statement transfers records from the final phase of a sorting or merging operation to an OUTPUT PROCEDURE. It is used only within the OUTPUT PROCEDURE associated with a SORT or MERGE statement.

RETURN work-file-1 INTO record-1
    AT END statements-block-1
    NOT AT END statements-block-2
END-RETURN

Example - Let us use the input file from the above SORT example. But, we need to skip the DIR (having employee number E0001) record after the sort and before writing it to output file.

Input -

MATEPK.EMPFILE.NSINPUT1

----+----1----+----2----+----3----+----4----+--
E0004EMPLOYEE4     TL   DEPT1LOC1 0000050000
E0002EMPLOYEE2     MGR  DEPT1LOC1 0000080000
E0006EMPLOYEE6     SE   DEPT1LOC1 0000034000
E0001EMPLOYEE1     DIR       LOC1 0000100000

MATEPK.EMPFILE.INPUT2

Code -

----+----1----+----2----+----3----+----4----+----5----+
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RETURNSR.
       AUTHOR. MTH.

       ENVIRONMENT DIVISION. 
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT EMPFILE1 ASSIGN TO INPUT1. 
           SELECT EMPFILEO ASSIGN TO OUTPUT1.
           SELECT WORKFILE ASSIGN TO WORK1.

       DATA DIVISION.
       FILE SECTION.
	  * Use declarations from SORT example
       ...
		  
       WORKING-STORAGE SECTION.
       01 WS-SORTEOF-SW           PIC X(01) VALUE 'N'.
          88 SORT-EOF             VALUE 'Y'. 

       PROCEDURE DIVISION
	  * SORT using OUTPUT PROCEDURE
           SORT WORKFILE
             ON ASCENDING KEY WORK-EMP-NUM
             USING EMPFILE1
             OUTPUT PROCEDURE IS 1000-WRITE-RECORDS.
           STOP RUN.

       1000-WRITE-RECORDS.

           OPEN OUTPUT EMPFILEO.
      * RETURN statement with LOOP to process all records
           PERFORM UNTIL SORT-EOF
                   RETURN WORKFILE
                          AT END SET SORT-EOF   TO TRUE   
                      NOT AT END PERFORM 2000-WRITE-OUTPUT
                   END-RETURN 
           END-PERFORM.

           CLOSE EMPFILEO. 

       2000-WRITE-OUTPUT.
	  * Skipping E0001 record while writing
           IF WORK-EMP-NUM  EQUAL 'E0001'
              CONTINUE
           ELSE
              MOVE  WORK-REC   TO EMP-RECO
              WRITE EMP-RECO
           END-IF.

JCL - Use same JCL from SORT example

Output (MATEPK.EMPFILE.RETURNOP) -

----+----1----+----2----+----3----+----4----+--
E0002EMPLOYEE2     MGR  DEPT1LOC1 0000080000   
E0004EMPLOYEE4     TL   DEPT1LOC1 0000050000   
E0006EMPLOYEE6     SE   DEPT1LOC1 0000034000