Internal Sort
Many applications require sorting data in a file or combining multiple files into a single file. Sorting is a method of arranging the given records in either ascending or descending order.
There are two techniques in COBOL to sort the files -
- External sort - DFSORT (SORT) utility in JCL is used to sort files. DFSORT utility is well explained in the Utilities section.
- Internal sort - An internal sort is a method to sort file directly within the program without relying on external utilities. During the sorting process, COBOL reads the records, sorts them in the specified order, and then writes the sorted records either to a new file or back to a file.
Below statements are used in Internal sort process -
- SORT Statement
- RETURN Statement
- RELEASE Statement
- MERGE Statement
SORT Statement
SORT statement is a powerful statement used for sorting files internally within the program. It's mainly useful in batch processing, where large amounts of data need to be sorted before further processing.
SORT work-file1
ON ASCENDING|DESCENDING KEY key-1
[USING input-file | INPUT PROCEDURE IS para-1 THRU para-2]
[GIVING output-file | OUTPUT PROCEDURE IS para-3 THRU para-4]
- work-file-1 - Specifies sort work file. It's a logical file and does not correspond to any physical file.
- ASCENDING|DESCENDING KEY - Specifies the sort order. You can sort on multiple keys.
- key-1, ... - The data item(s) on which to sort.
- USING input-file - The name of the input file to be sorted.
- GIVING output-file - The name of the output file where the sorted data will be written.
- INPUT PROCEDURE - Optional. A procedure is executed for each record of the input file before sorting.
- OUTPUT PROCEDURE - Optional. A procedure is executed for each record of the sorted file.
Example - Let us assume we have an employee file that contains employee information unsorted. The file should be sorted based on the employee number (1-5 characters) in ascending order.
Input (MATEPK.EMPFILE.NSINPUT1) -
----+----1----+----2----+----3----+----4----+-- E0004EMPLOYEE4 TL DEPT1LOC1 0000050000 E0002EMPLOYEE2 MGR DEPT1LOC1 0000080000 E0006EMPLOYEE6 SE DEPT1LOC1 0000034000 E0001EMPLOYEE1 DIR LOC1 0000100000
Code -
----+----1----+----2----+----3----+----4----+----5----+
...
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT EMPFILE1 ASSIGN TO INPUT1.
SELECT EMPFILEO ASSIGN TO OUTPUT1.
SELECT WORKFILE ASSIGN TO WORK1.
...
DATA DIVISION.
FILE SECTION.
* Use declarations from SORT example
...
PROCEDURE DIVISION.
SORT WORKFILE
ON ASCENDING KEY WORK-EMP-NUM
USING EMPFILE1
GIVING EMPFILEO.
...
JCL -
///MATEPKRJ JOB MSGLEVEL=(1,1),NOTIFY=&SYSUID //* //STEP01 EXEC PGM=SORTFLS //STEPLIB DD DSN=MATEPK.COBOL.LOADLIB,DISP=SHR //INPUT1 DD DSN=MATEPK.EMPFILE.NSINPUT1,DISP=SHR //OUTPUT1 DD DSN=MATEPK.EMPFILE.NSOUT1, // DISP=(NEW,CATLG,DELETE), // SPACE=(TRK,(3,2),RLSE), // UNIT=SYSDA, // DCB=(DSORG=PS,RECFM=FB,LRECL=47,BLKSIZE=470) //WORK1 DD DSN=&&TEMP, // DISP=(NEW,DELETE,DELETE), // SPACE=(CYL,(10,5),RLSE), // UNIT=SYSDA, // DCB=(DSORG=PS,RECFM=FB,LRECL=47,BLKSIZE=470) //SYSOUT DD SYSOUT=*
Output (MATEPK.EMPFILE.NSOUT1) -
----+----1----+----2----+----3----+----4----+-- E0001EMPLOYEE1 DIR LOC1 0000100000 E0002EMPLOYEE2 MGR DEPT1LOC1 0000080000 E0004EMPLOYEE4 TL DEPT1LOC1 0000050000 E0006EMPLOYEE6 SE DEPT1LOC1 0000034000
MERGE Statement
MERGE statement is used to combine two or more sequentially ordered files into a single, merged, and sequentially ordered output file. The files should be in sorted order according to the ascending or descending key that is common for both files.
MERGE work-file-1
ON ASCENDING/DESCENDING KEY key-name-1 [key-name-2 ...]
[USING input-file-1 [, input-file-2 ...]]
[OUTPUT PROCEDURE IS para-1 THRU para-2]
[GIVING output-file-1 [, output-file-2 ...]]
Example - Let us assume two different employee files contain employee information, and create one merged file containing all employee details from two files.
Input -
MATEPK.EMPFILE.INPUT1
----+----1----+----2----+----3----+----4----+-- E0001EMPLOYEE1 DIR LOC1 0000100000 E0002EMPLOYEE2 MGR DEPT1LOC1 0000080000 E0004EMPLOYEE4 TL DEPT1LOC1 0000050000 E0006EMPLOYEE6 SE DEPT1LOC1 0000034000
MATEPK.EMPFILE.INPUT2
----+----1----+----2----+----3----+----4----+-- E0003EMPLOYEE3 MGR DEPT2LOC2 0000075000 E0005EMPLOYEE5 SSE DEPT1LOC1 0000045000 E0007EMPLOYEE7 SSE DEPT2LOC2 0000046000
Code -
----+----1----+----2----+----3----+----4----+----5----+
...
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT EMPFILE1 ASSIGN TO INPUT1.
SELECT EMPFILE2 ASSIGN TO INPUT2.
SELECT EMPFILEO ASSIGN TO OUTPUT1.
SELECT WORKFILE ASSIGN TO WORK1.
...
DATA DIVISION.
FILE SECTION.
* Use declarations from SORT example
...
PROCEDURE DIVISION.
MERGE WORKFILE
ON ASCENDING KEY WORK-EMP-NUM
USING EMPFILE1, EMPFILE2
GIVING EMPFILEO.
...
JCL -
//MATEPKRJ JOB MSGLEVEL=(1,1),NOTIFY=&SYSUID //* //STEP01 EXEC PGM=MERGEFLS //STEPLIB DD DSN=MATEPK.COBOL.LOADLIB,DISP=SHR //INPUT1 DD DSN=MATEPK.EMPFILE.INPUT1,DISP=SHR //INPUT2 DD DSN=MATEPK.EMPFILE.INPUT2,DISP=SHR //OUTPUT1 DD DSN=MATEPK.EMPFILE.OUTPUT1, // DISP=(NEW,CATLG,DELETE), // SPACE=(TRK,(3,2),RLSE), // UNIT=SYSDA, // DCB=(DSORG=PS,RECFM=FB,LRECL=47,BLKSIZE=470) //WORK1 DD DSN=&&TEMP, // DISP=(NEW,DELETE,DELETE), // SPACE=(CYL,(10,5),RLSE), // UNIT=SYSDA, // DCB=(DSORG=PS,RECFM=FB,LRECL=47,BLKSIZE=470) //SYSOUT DD SYSOUT=*
Output (MATEPK.EMPFILE.OUTPUT1) -
----+----1----+----2----+----3----+----4----+-- E0001EMPLOYEE1 DIR LOC1 0000100000 E0002EMPLOYEE2 MGR DEPT1LOC1 0000080000 E0003EMPLOYEE3 MGR DEPT2LOC2 0000075000 E0004EMPLOYEE4 TL DEPT1LOC1 0000050000 E0005EMPLOYEE5 SSE DEPT1LOC1 0000045000 E0006EMPLOYEE6 SE DEPT1LOC1 0000034000 E0007EMPLOYEE7 SSE DEPT2LOC2 0000046000
RELEASE Statement
RELEASE statement is used to send records from the input procedure to a sort work file in the sorting process. It is only used inside the INPUT PROCEDURE of a SORT operation.
RELEASE record-name [FROM ws-record-name]
Example - Let us use the input file from the above SORT example. But, we need to skip the DIR (having employee number E0001) record before sorting.
Input -
MATEPK.EMPFILE.NSINPUT1
----+----1----+----2----+----3----+----4----+-- E0004EMPLOYEE4 TL DEPT1LOC1 0000050000 E0002EMPLOYEE2 MGR DEPT1LOC1 0000080000 E0006EMPLOYEE6 SE DEPT1LOC1 0000034000 E0001EMPLOYEE1 DIR LOC1 0000100000
Code -
----+----1----+----2----+----3----+----4----+----5----+
...
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT EMPFILE1 ASSIGN TO INPUT1.
SELECT EMPFILEO ASSIGN TO OUTPUT1.
SELECT WORKFILE ASSIGN TO WORK1.
...
DATA DIVISION.
FILE SECTION.
* Use declarations from SORT example
...
WORKING-STORAGE SECTION.
01 WS-VAR.
05 WS-INPUT1-SW PIC X(01) VALUE 'N'.
88 INPUT1-EOF VALUE 'Y'.
88 NOT-INPUT1-EOF VALUE 'N'.
...
PROCEDURE DIVISION.
* Sorting WORKFILE using a input paragraph
* 1000-FILTER-RECORDS and writing to output file
* EMPFILEO.
SORT WORKFILE
ON ASCENDING KEY WORK-EMP-NUM
INPUT PROCEDURE IS 1000-FILTER-RECORDS
GIVING EMPFILEO.
STOP RUN.
...
* Input paragraph for SORT statement
1000-FILTER-RECORDS.
SET NOT-INPUT1-EOF TO TRUE
OPEN INPUT EMPFILE1.
READ EMPFILE1
AT END SET INPUT1-EOF TO TRUE
END-READ
PERFORM UNTIL INPUT1-EOF
* Releases the WORK-REC after processing.
IF EMP-NUM OF EMP-REC1
NOT EQUAL 'E0001'
MOVE EMP-REC1 TO WORK-REC
RELEASE WORK-REC
END-IF
READ EMPFILE1
AT END SET INPUT1-EOF TO TRUE
END-READ
END-PERFORM.
CLOSE EMPFILE1.
JCL - Use same JCL from SORT example
Output (MATEPK.EMPFILE.NSOUTPUT) -
----+----1----+----2----+----3----+----4----+-- E0002EMPLOYEE2 MGR DEPT1LOC1 0000080000 E0004EMPLOYEE4 TL DEPT1LOC1 0000050000 E0006EMPLOYEE6 SE DEPT1LOC1 0000034000
RETURN Statement
RETURN statement transfers records from the final phase of a sorting or merging operation to an OUTPUT PROCEDURE. It is used only within the OUTPUT PROCEDURE associated with a SORT or MERGE statement.
RETURN work-file-1 INTO record-1
AT END statements-block-1
NOT AT END statements-block-2
END-RETURN
Example - Let us use the input file from the above SORT example. But, we need to skip the DIR (having employee number E0001) record after the sort and before writing it to output file.
Input -
MATEPK.EMPFILE.NSINPUT1
----+----1----+----2----+----3----+----4----+-- E0004EMPLOYEE4 TL DEPT1LOC1 0000050000 E0002EMPLOYEE2 MGR DEPT1LOC1 0000080000 E0006EMPLOYEE6 SE DEPT1LOC1 0000034000 E0001EMPLOYEE1 DIR LOC1 0000100000
MATEPK.EMPFILE.INPUT2
Code -
----+----1----+----2----+----3----+----4----+----5----+
...
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT EMPFILE1 ASSIGN TO INPUT1.
SELECT EMPFILEO ASSIGN TO OUTPUT1.
SELECT WORKFILE ASSIGN TO WORK1.
DATA DIVISION.
FILE SECTION.
* Use declarations from SORT example
...
WORKING-STORAGE SECTION.
01 WS-SORTEOF-SW PIC X(01) VALUE 'N'.
88 SORT-EOF VALUE 'Y'.
...
PROCEDURE DIVISION
* SORT using OUTPUT PROCEDURE
SORT WORKFILE
ON ASCENDING KEY WORK-EMP-NUM
USING EMPFILE1
OUTPUT PROCEDURE IS 1000-WRITE-RECORDS.
STOP RUN.
1000-WRITE-RECORDS.
OPEN OUTPUT EMPFILEO.
* RETURN statement with LOOP to process all records
PERFORM UNTIL SORT-EOF
RETURN WORKFILE
AT END SET SORT-EOF TO TRUE
NOT AT END PERFORM 2000-WRITE-OUTPUT
END-RETURN
END-PERFORM.
CLOSE EMPFILEO.
2000-WRITE-OUTPUT.
* Skipping E0001 record while writing
IF WORK-EMP-NUM EQUAL 'E0001'
CONTINUE
ELSE
MOVE WORK-REC TO EMP-RECO
WRITE EMP-RECO
END-IF.
JCL - Use same JCL from SORT example
Output (MATEPK.EMPFILE.RETURNOP) -
----+----1----+----2----+----3----+----4----+-- E0002EMPLOYEE2 MGR DEPT1LOC1 0000080000 E0004EMPLOYEE4 TL DEPT1LOC1 0000050000 E0006EMPLOYEE6 SE DEPT1LOC1 0000034000