Page Data Sets
There are three kinds of memory in z Servers:
- Central storage - comparable to RAM in a PC.
- Auxiliary storage - disks.
- Virtual storage - another name for address spaces.
Each task on a z/OS system can address the full architectural range of memory: 16 EB in the case of 64-bit z/OS, even if there are only a few GB of central storage. A page is a 4 KB piece of an address space. It is a small part of a program and can hold data or executable instructions. Pages that are in use are kept in central storage for fast access; they are called active pages. If a page is unused for some time, z/OS can move it to auxiliary storage (disks). Page Data Sets are used to store these pages. Moving pages between auxiliary storage and central storage, in either direction, is called paging. Paging enables the system to run more programs than would normally fit in central storage: the system can use more memory than it physically has.
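To put these numbers in perspective, here is a small illustrative Python sketch. It counts the 4 KB pages in a 16 EB address space and the 4 KB frames in central storage. The 8 GB central storage size is a hypothetical value, not taken from any particular system.

```python
# Illustrative arithmetic only: pages in one 64-bit address space versus
# frames in a hypothetical 8 GB of central storage (binary units).
PAGE_SIZE = 4 * 1024              # 4 KB page
ADDRESS_SPACE = 16 * 1024 ** 6    # 16 EB = 2**64 bytes
CENTRAL_STORAGE = 8 * 1024 ** 3   # hypothetical 8 GB of real storage


def pages(size_in_bytes: int) -> int:
    """Number of 4 KB pages (or frames) that cover the given size."""
    return size_in_bytes // PAGE_SIZE


virtual_pages = pages(ADDRESS_SPACE)
real_frames = pages(CENTRAL_STORAGE)

print(f"Pages in one address space: {virtual_pages:,}")
print(f"Frames in central storage:  {real_frames:,}")
```

The gap of many orders of magnitude between the two values is why only active pages stay in central storage, while the rest can live in Page Data Sets.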
1. What happens when Page Data Sets become full?
2. How do you display the current utilization of Page Data Sets?
3. How do you fix this issue?
Search the documentation for message IRA200E.
Search the “z/OS MVS System Commands” document.
Read about the PAGEADD command in “z/OS MVS System Commands”. To allocate a Page Data Set you'll need IDCAMS.
When Page Data Sets are filling up, the system issues message IRA200E AUXILIARY STORAGE SHORTAGE. This means there is simply no more storage to run additional address spaces: both central and auxiliary storage are close to being full. In this situation the system rejects LOGON, MOUNT and START commands. No user can log on, no new task can be started and no new address spaces are created. You may need the HMC to communicate with the system. By default this happens when Page Data Sets are 70% full. The remaining 30% is a safety margin that lets the system keep running and finish its current workload. In some situations the problem resolves itself when a task that uses a lot of Page Data Set space ends or is canceled.
The current utilization of Page Data Sets is displayed with the 'D ASM' command:
PLPA     100%  FULL  E51A  SYS1.PAGE.PLPA
COMMON     5%  OK    E51A  SYS1.PAGE.COMMON
LOCAL     75%  OK    E72E  SYS1.PAGE.LOCAL
With correct system settings, the COMMON Page Data Set shouldn't cause any problems. As its name suggests, it stores pages of the Common Area, which is shared by all address spaces. We're interested in the LOCAL data set. Local Page Data Sets store pages from the private area of every address space, so their size and number should match the workload on a particular system.
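The check described above can be sketched in Python. This is an illustrative helper, not an IBM tool. It parses the page data set utilization rows shown above (the output of the 'D ASM' command) and flags LOCAL data sets at or above the 70% default that triggers IRA200E. It assumes the row columns are type, percent full, status, device and data set name. It is also a simplification: the system's own check may consider overall local paging space rather than each data set separately.

```python
# Illustrative check of D ASM rows against the default IRA200E threshold.
SHORTAGE_THRESHOLD = 70  # percent, the default described in the text

# Rows copied from the example output; column order is an assumption:
# type, percent full, status, device, data set name.
D_ASM_ROWS = """\
PLPA     100%  FULL  E51A  SYS1.PAGE.PLPA
COMMON     5%  OK    E51A  SYS1.PAGE.COMMON
LOCAL     75%  OK    E72E  SYS1.PAGE.LOCAL
"""


def parse_d_asm(text: str) -> list:
    """Turn each non-empty row into a dict with a numeric 'full' value."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, full, stat, dev, dsname = line.split()
        rows.append({
            "type": kind,
            "full": int(full.rstrip("%")),
            "stat": stat,
            "dev": dev,
            "dsname": dsname,
        })
    return rows


def locals_over_threshold(rows: list, threshold: int = SHORTAGE_THRESHOLD) -> list:
    """Names of LOCAL page data sets at or above the threshold."""
    return [r["dsname"] for r in rows
            if r["type"] == "LOCAL" and r["full"] >= threshold]


rows = parse_d_asm(D_ASM_ROWS)
for dsname in locals_over_threshold(rows):
    print(f"{dsname} is at or above {SHORTAGE_THRESHOLD}%: consider PAGEADD")
```

PLPA at 100% is deliberately ignored here, because the text points to the LOCAL data sets as the ones to watch.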
Adding another Page Data Set fixes this issue. It is done with the command:
/PA PAGE=data.set.name
The data set must already be allocated. All systems should have a spare data set that can be used here; its name should be available in the system documentation. If the system doesn't have such a data set, or its name isn't known, you can allocate one with IDCAMS:
//PL00339A JOB NOTIFY=&SYSUID,MSGLEVEL=(1,1)
//STEP1    EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE PAGESPACE (NAME(SYS1.PAGE.LOCAL2) -
         CYLINDERS(200) -
         VOLUME(ZASMF1) -
         )
You can also use ISPF option 3.2;V;6. Many IDCAMS functions are available from those panels. They are worth checking before you decide to code a job, but in many cases a batch job is the better choice, especially if you are going to use a particular function multiple times.
After the issue is resolved, you can remove the added Page Data Set from use with the command:
/PD DELETE,PAGE=data.set.name
This command only stops the data set from being used as a Page Data Set. The pages in it are moved automatically to the other active Page Data Sets, so you don't need to worry about them. The command doesn't remove the data set from disk; to do that, use the JCL below:
//PL00339A JOB NOTIFY=&SYSUID,MSGLEVEL=(1,1)
//STEP1    EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DELETE SYS1.PAGE.LOCAL2 -
         PURGE -
         PAGESPACE
You can also delete it faster with the IDCAMS DELETE command, which can be entered as a line command in the 3.4 panel. If no one is logged on to the system to allocate such a data set (and no spare Page Data Sets are available), the only option is to free space in the active Page Data Sets. Messages IRA203I and IRA204E indicate which task is using a lot of Page Data Set space. It may be necessary to cancel it.
EREP (Environmental Record Editing and Printing Program) works with yet another log in the z/OS system. SYSLOG stores messages from various jobs and tasks, commands with their responses, WTO and WTOR messages, etc. SMF stores all kinds of statistical information from most products that run on z/OS. In comparison, the records processed by EREP describe hardware errors and events, for example read/write errors or processor machine checks, as well as software errors.
1. Answer the following questions:
- What's LOGREC?
- Where is LOGREC stored on your system?
- Where can you define a different LOGREC?
- What happens when LOGREC becomes full?
- What kinds of reports can you generate with EREP utilities?
2. Copy LOGREC to a data set with your user ID as HLQ.
3. Create a System Summary Report. Work on the History Data Set you created in Task#2. Include all error types in the report.
4. Modify the job from Task#3. This time create a Trends Report for a 7-day period.
5. Modify the job from Task#4. This time create an Event Report for the last 2 hours with System and Software errors only.
6. Modify the job from Task#5. Create a System Exception Report about IPLs. What other errors can you extract with this kind of report?
7. Modify the job from Task#6. Create a report with the THRESHOLD=(50,50) parameter. What will be displayed?
8. Modify the job from Task#7. Create a Detail Report with all available data about all errors except IPLs and Subsystem errors. This job can generate a huge amount of output, so modify the JOB statement so that the job is automatically canceled after generating 100 000 lines.
9. Create a job that offloads LOGREC into a GDG.
Two documents describe the EREP facility: “EREP User's Guide” and “EREP Reference”.
You simply need to copy all records from LOGREC to your data set without any processing. In the documentation, a copy of LOGREC is called a History Data Set.
What's LOGREC?
LOGREC is another name for the ERDS (Error Recording Data Set). This is the data set to which the system writes the error records that EREP processes. Just like the SMF log, LOGREC can take two forms: a data set or a log stream (z/OS System Logger). In this assignment we'll work on LOGREC stored in a data set.
____________________
Where is LOGREC stored on your system?
To display information about LOGREC, issue the 'D LOGREC' command:
CURRENT MEDIUM = DATASET
MEDIUM NAME = SYS1.LOGREC
There isn't much information here. You can only see whether a data set or a log stream is used and what its name is.
____________________
Where can you define a different LOGREC?
The LOGREC name and recording medium are defined in the IEASYSxx member. See the 'LOGREC' parameter in “z/OS MVS Initialization and Tuning Reference” for more details.
____________________
What happens when LOGREC becomes full?
Error records are appended to the LOGREC data set. When the data set is nearly full, the following message is issued: 'IFB080E LOGREC DATA SET NEAR FULL, DSN=dsname'. You can use this message as a trigger for a LOGREC offload task, for example with your System Automation product. Just like with SMF, no ready-made offload task is shipped, so the system programmer must code it. When the data set is full, all new error records are simply lost.
____________________
What kinds of reports can you generate with EREP utilities?
“EREP User's Guide” provides good guidance on the various report types. The main types are:
- System Summary Report
- Trends Report
- Event History Report
- System Exception Report Series
- Threshold Summary Report
- Detail Edit and Summary Report
The reports are listed from the most general to the most specific. In this assignment you'll create a few of them.
//JSADEK01 JOB NOTIFY=&SYSUID
//COPY     EXEC PGM=IFCEREP1,PARM='CARD'
//SERLOG   DD DSN=SYS1.LOGREC,DISP=OLD
//ACCDEV   DD DSN=&SYSUID..LOGREC.COPY,DISP=(NEW,CATLG),
//            SPACE=(CYL,(20,20),RLSE),RECFM=VB,BLKSIZE=4000
//TOURIST  DD SYSOUT=*
//SYSIN    DD *
PRINT=NO
ACC=Y
ENDPARM
The utility used for creating EREP reports is called IFCEREP1. You can also use it to simply copy LOGREC to another data set.
- PARM='CARD' - control statements are passed in the SYSIN DD statement instead of the PARM parameter.
- SERLOG - defines the input data set, in this case the active LOGREC.
- ACCDEV - defines the output data set if 'ACC=Y' is coded. In this example no filter is coded, so all records are copied.
- TOURIST - stores messages about the IFCEREP1 utility execution.
- PRINT=NO - no report is printed; LOGREC is simply copied without any modifications.
- ACC=Y - the ACCDEV DD statement is used as output.
- ENDPARM - ends the control statements.
A LOGREC copy is also called a History Data Set. The system continuously saves new records to LOGREC, so before creating any reports you should copy LOGREC and work on the copy. Imagine that you want to run a few jobs against the active LOGREC: with each run the data would be different, because new records keep being added. A copy gives you consistent data. Also, working on a copy means your jobs don't block the active LOGREC.
//JSADEK01 JOB NOTIFY=&SYSUID
//REPORT   EXEC PGM=IFCEREP1,PARM='CARD'
//ACCIN    DD DSN=&SYSUID..LOGREC.COPY,DISP=SHR
//DIRECTWK DD UNIT=SYSDA,SPACE=(CYL,50)
//EREPPT   DD SYSOUT=*
//TOURIST  DD SYSOUT=*
//SYSIN    DD *
SYSUM
HIST
ACC=N
ENDPARM
- ACCIN - DD statement used when you're working on a History Data Set instead of the active LOGREC.
- DIRECTWK - an additional DD statement for temporary work storage.
- EREPPT - defines where the report is written. In this case it's sent to spool.
- SYSUM - defines the report type, System Summary in this case.
- HIST - defines that the input is a History Data Set, so the ACCIN and DIRECTWK DD statements are required.
- ACC=N - we're not interested in storing the records that match our filter, so the ACCDEV DD statement can be omitted. If we used it, we would have two outputs: EREPPT would hold formatted reports of the selected records, while ACCDEV would hold the same records unformatted.
- TYPE - this control statement defines which kinds of errors are included in the report. It's not coded in this example, so all error types are included.
The System Summary shows the number of errors in each category, so it's mainly useful for finding the areas where problems exist. With this information you can work out what to check in more detail.
//JSADEK01 JOB NOTIFY=&SYSUID
//REPORT   EXEC PGM=IFCEREP1,PARM='CARD'
//ACCIN    DD DSN=&SYSUID..LOGREC.COPY,DISP=SHR
//DIRECTWK DD UNIT=SYSDA,SPACE=(CYL,50)
//EREPPT   DD SYSOUT=*
//TOURIST  DD SYSOUT=*
//SYSIN    DD *
TRENDS
DATE=(16105-16112)
HIST
ACC=N
ENDPARM
- DATE - defines the days from which records are processed, coded in YYDDD format.
Trends reports are used to analyze error patterns and their frequency. This makes them another report type worth reviewing periodically to check for new problems on the system.
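Both EREP's DATE parameter and the SYSLOG date column use the YYDDD (two-digit year plus day of the year) format. Here is a small Python sketch for converting between YYDDD and calendar dates and for building a DATE=(...) control statement. The helper names are hypothetical. The century comes from Python's %y rule, which maps 00-68 to 2000-2068.

```python
from datetime import date, datetime


def yyddd_to_date(yyddd: str) -> date:
    """Convert a YYDDD (year + day-of-year) string to a calendar date."""
    # %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999; %j is day 001-366.
    return datetime.strptime(yyddd, "%y%j").date()


def date_to_yyddd(d: date) -> str:
    """Convert a calendar date to the YYDDD format."""
    return d.strftime("%y%j")


def erep_date_range(first: date, last: date) -> str:
    """Build an EREP DATE=(yyddd-yyddd) control statement."""
    return f"DATE=({date_to_yyddd(first)}-{date_to_yyddd(last)})"


print(yyddd_to_date("16105"), yyddd_to_date("16112"))  # 2016-04-14 2016-04-21
print(erep_date_range(date(2016, 4, 14), date(2016, 4, 21)))  # DATE=(16105-16112)
```

This shows that the Trends example above covers records from 14 April 2016 to 21 April 2016.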
//JSADEK01 JOB NOTIFY=&SYSUID
//REPORT   EXEC PGM=IFCEREP1,PARM='CARD'
//ACCIN    DD DSN=&SYSUID..LOGREC.COPY,DISP=SHR
//DIRECTWK DD UNIT=SYSDA,SPACE=(CYL,50)
//EREPPT   DD SYSOUT=*
//TOURIST  DD SYSOUT=*
//SYSIN    DD *
EVENT
TYPE=S
DATE=(16112)
TIME=(0900-1100)
HIST
ACC=N
ENDPARM
The Event History Report is pretty detailed: each event is shown on a single line. TYPE=S includes only errors related to System and Software failures. To include more than one type of error, simply add the appropriate letters. For example, 'TYPE=SEM' includes three types of errors (S, E and M). For details about specific error types, check “EREP Reference”.
//JSADEK01 JOB NOTIFY=&SYSUID
//REPORT   EXEC PGM=IFCEREP1,PARM='CARD'
//ACCIN    DD DSN=&SYSUID..LOGREC.COPY,DISP=SHR
//DIRECTWK DD UNIT=SYSDA,SPACE=(CYL,50)
//EREPPT   DD SYSOUT=*
//TOURIST  DD SYSOUT=*
//SYSIN    DD *
SYSEXN
TYPE=I
HIST
ACC=N
ENDPARM
IPL records store the date and time of each IPL and how long the LPAR was down. The System Exception Report also supports error types 'C', 'D', 'M' and 'O'. You can find the supported types for each report type in “EREP Summary”.
//JSADEK01 JOB NOTIFY=&SYSUID
//REPORT   EXEC PGM=IFCEREP1,PARM='CARD'
//ACCIN    DD DSN=&SYSUID..LOGREC.COPY,DISP=SHR
//DIRECTWK DD UNIT=SYSDA,SPACE=(CYL,50)
//EREPPT   DD SYSOUT=*
//TOURIST  DD SYSOUT=*
//SYSIN    DD *
THRESHOLD=(50,50)
HIST
ACC=N
ENDPARM
THRESHOLD report is used for viewing tape drive errors. In this example drives with 50 or more read/write errors are displayed.
//JSADEK01 JOB NOTIFY=&SYSUID,LINES=(100,CANCEL)
//REPORT   EXEC PGM=IFCEREP1,PARM='CARD'
//ACCIN    DD DSN=&SYSUID..LOGREC.COPY,DISP=SHR
//DIRECTWK DD UNIT=SYSDA,SPACE=(CYL,50)
//EREPPT   DD SYSOUT=*
//TOURIST  DD SYSOUT=*
//SYSIN    DD *
PRINT=AL
TYPE=ABCDEFHMOTXYZ
HIST
ACC=N
ENDPARM
The LINES parameter of the JOB statement cancels the job when its output reaches the given number of lines. The value is coded in thousands of lines, so LINES=(100,CANCEL) means 100 000 lines. The Detail Report gives you the most control over the report: you can choose what kind of data is included and for which error types. The 'AL' value means that all available data about each error is displayed.
//JSADEK01 JOB NOTIFY=&SYSUID
// SET ERDS=SYS1.LOGREC
// SET ARCHERDS=SYSU.MVS.LOGREC(+1)
//COPY     EXEC PGM=IFCEREP1,PARM='CARD'
//SERLOG   DD DSN=&ERDS,DISP=OLD
//ACCDEV   DD DSN=&ARCHERDS,DISP=(NEW,CATLG),
//            SPACE=(CYL,(20,20),RLSE),RECFM=VB,BLKSIZE=4000
//TOURIST  DD SYSOUT=*
//SYSIN    DD *
PRINT=NO
ACC=Y
ENDPARM
//CLEAR    EXEC PGM=IFCEREP1,PARM='PRINT=NO,ACC=Y,ZERO=Y',
//            COND=(0,NE)
//SERLOG   DD DSN=&ERDS,DISP=OLD
//ACCDEV   DD DUMMY
//TOURIST  DD SYSOUT=*
The first step is the same as in the LOGREC copy job. SET statements define the data set names. The CLEAR step runs only if the first step completed with CC=0. This time the control statements are coded directly in the PARM parameter, so the ENDPARM control statement is not needed. You must code ACC=Y to be able to clear LOGREC. The records were already copied in the previous step, though, so they can simply be discarded, and the DUMMY parameter does the trick.
If you browse the active LOGREC, you'll see that all the data is still there. It wasn't cleared at all. This is because ZERO=Y only resets the pointer that marks where new records are written, so new records will overwrite the existing data from the beginning. This pointer is kept in the LOGREC header; compare the header before and after the job to see the difference.
You can use this job as a LOGREC offload task in one of two ways. The first is to add it to your scheduling tool and offload LOGREC every day. This approach is simpler but not good for a production system. A problem may generate an unusually high number of error records, and LOGREC may fill up before the offload runs. Many records that would be especially useful in such a case would then be lost. The second approach is to convert the job to a started task and start it in response to the 'IFB080E LOGREC DATA SET NEAR FULL, DSN=dsname' message. A System Automation tool is the best way to do it.
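The effect of ZERO=Y can be illustrated with a toy model in Python. This is not the real LOGREC layout. It only assumes, as described above, a header pointer that marks where the next record is written. Resetting the pointer erases nothing: old records stay until they are overwritten. The model also shows that a full data set simply drops new records.

```python
class ToyLogrec:
    """Toy model of a record data set whose header holds the next-write slot."""

    def __init__(self, slots: int):
        self.records = [None] * slots
        self.next_slot = 0  # "header" pointer to the next free slot

    def write(self, record: str) -> bool:
        """Store a record; return False if the data set is full (record lost)."""
        if self.next_slot >= len(self.records):
            return False
        self.records[self.next_slot] = record
        self.next_slot += 1
        return True

    def zero(self) -> None:
        """Model of ZERO=Y: reset the pointer but keep the old data."""
        self.next_slot = 0


log = ToyLogrec(slots=3)
for rec in ("A", "B", "C", "D"):
    log.write(rec)      # "D" is lost because the data set is already full
log.zero()
log.write("E")          # overwrites the oldest slot, "A"
print(log.records)      # ['E', 'B', 'C']
```

This is also why the offload must run before ZERO=Y: after the reset, the old records are only a leftover waiting to be overwritten.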
SYSLOG & OPERLOG Basics
The System Log is one of the most basic components of z/OS. It stores the most important messages issued by the system, tasks, jobs and users. It is probably the most often accessed part of z/OS, which is why SYSLOG is seen as the main z/OS log, even though only a small part of the available data is stored there. Each task and job has additional logs with messages that don't appear in SYSLOG. There are also other system logs, such as SMF or LOGREC, and even more data is available in dumps. So SYSLOG is not the only place to find the information you need, but it's the first place to look. Its data is sufficient for most activities; otherwise, it can point your search in the right direction.
1. Answer the following questions:
- How can you access SYSLOG and OPERLOG?
- What kinds of messages are written to SYSLOG?
- Where is SYSLOG stored?
- When is SYSLOG created and when is it closed?
- What's the SYSLOG buffer and how can you display it?
- What's the difference between SYSLOG and OPERLOG?
2. Describe each column in SYSLOG.
3. Display the following information in OPERLOG:
- WTORs from a single LPAR.
- Only DB2 messages.
- Only messages from the last hour.
- Only DB2 error messages from the last hour.
- Any message that contains your user ID.
4. Send a message to SYSLOG.
5. You get the following message: 'IEA404A SEVERE WTO BUFFER SHORTAGE - 100% FULL'. What will happen to new WTO messages? How do you fix this issue?
6. Workload on your system has increased lately, so you must increase various limits related to message buffers. Here are the requirements for each buffer:
- SYSLOG: 5000
- WTO: 2000
- WTOR: 200
This is a permanent change, so ensure it is retained after IPL.
7. Create a SYSLOG offload task.
8. Create a TWS Application that runs your offload task periodically. The Application should have four Operations:
- Start the External Writer task created in Task#7.
- Close SYSLOG (a new SYSLOG is started automatically).
- Wait 30 seconds.
- Stop the External Writer task.
9. Test an alternate way of starting your offload task. Use the '$TA' JES2 command for this task.
Use the built-in SDSF help for specific panels. There is also a very useful presentation from Bruce Koss and Wells Fargo; search the web for “SDSF - Beyond the Basics”.
Search “z/OS MVS System Commands” for the appropriate command.
The easiest way to offload SYSLOG is to use the JES2 External Writer. See “z/OS JES2 Initialization and Tuning Guide” and “z/OS JES Application Programming” for more details. A standard External Writer procedure is available in the PROCLIB concatenation as SYS1.PROCLIB(XWTR). Use it for analysis and as a base for your own procedure. Note that tasks with an IEFRDER DD statement are stopped in a slightly different way than normal tasks. See the description of the START and STOP commands in “z/OS MVS System Commands”.
Check “z/OS JES2 Commands” for the description of '$TA' command.
How can you access SYSLOG and OPERLOG?
Both logs are available via SDSF:
- 'LOG S' - command for accessing SYSLOG.
- 'LOG O' - command for accessing OPERLOG.
- 'LOG' - by default the LOG command displays SYSLOG, but this can be changed in the SDSF settings.
____________________
What kinds of messages are written to SYSLOG?
SYSLOG contains the messages exchanged between the system and consoles in both directions, including commands issued by operators or tasks and their responses. There are three assembler macros that write messages to the log: WTL (Write To Log), WTO (Write To Operator) and WTOR (Write To Operator with Reply).
____________________
Where is SYSLOG stored?
The System Log is stored in spool. It is actually a started task, so you can prefix it in SDSF and enter it like any other output with the 'S' or '?' action character. Under the '?' option you'll see all SYSLOG outputs, including closed SYSLOGs that haven't been offloaded yet. Scroll right and you'll see the data set name in which SYSLOG is stored, for example “+MASTER+.SYSLOG.STC03001.D0000101.?”. This is a spool data set, so you cannot access it via 3.4 or JCL; you must use SDSF.
____________________
When is SYSLOG created and when is it closed?
SYSLOG is automatically created during IPL. The system never offloads it, so without an offload task SYSLOG will eventually fill up the entire spool. You can spin the SYSLOG output with the WRITELOG command. The log is closed, a new SYSLOG starts, and you can then offload the old SYSLOG from spool. SYSLOG is also closed by the 'Z EOD' command, but this command should be used only during system shutdown (before System Reset).
____________________
What's the SYSLOG buffer and how can you display it?
Normally SYSLOG records are written to the HARDCOPY data set located on spool. If there are problems with it, or logging is turned off manually, the buffer is used. You can display this buffer with the 'D C,HC' command.
MSG:CURR=0 LIM=1500 RPLY:CURR=0 LIM=10 SYS=ADCD PFK=00
HARDCOPY LOG=(SYSLOG) CMDLEVEL=CMDS
ROUT=(ALL)
LOG BUFFERS IN USE: 0 LOG BUFFER LIMIT: 1000
The first line is common to all 'D C' commands; we're interested in the rest of the output. As you can see, the buffer is empty, and it can hold at most 1000 records. You can turn off logging to the HARDCOPY data set with the following command:
'V SYSLOG,HARDCPY,OFF,UNCOND'
You'll soon notice that the log buffer starts to fill up. Activate HARDCOPY again with:
'V SYSLOG,HARDCPY'
____________________
What's the difference between SYSLOG and OPERLOG?
Both logs store the same messages, but OPERLOG provides more functionality:
- OPERLOG is normally used in a Sysplex, so messages from all LPARs in the Sysplex are available in one place.
- OPERLOG is stored in a log stream, while SYSLOG is stored as a spool data set.
- OPERLOG uses different colors for various message types, and you can customize them.
- OPERLOG has more functions that make searching easier. The FILTER command is especially useful.
You can display a description of all SDSF columns in the help panel: enter SYSLOG, press PF1 (Help) and choose option 3 (Fields on the SYSLOG panel).
1 - Record type. This field is quite useful because it clearly shows the line where each message starts and the line where it ends. This may not seem like much, but it makes SYSLOG easier to read and can come in handy during problem investigation. Possible values:
- N - single-line message.
- W - single-line message with reply.
- M - first line of a multi-line message.
- L - multi-line message label line.
- D - multi-line message data line.
- E - multi-line message data/end line.
- S - continuation of previous line.
- O - LOG command input.
- X - non-hardcopy or LOG command source.
The key is to recognize the beginning of a new message, which the N, W, M and X characters indicate.
2 - Request type.
- C - Command.
- R - Reply.
- I - Internally issued command.
- U - Command from unknown console.
- blank - other.
3-9 - Routing code.
11-14 - System name.
20-24 - Date in YYDDD format.
26-36 - Time.
38-45 - Message source. This can be many things; most often you'll find the ID of a JOB, STC or TSU here. Sometimes a console name or a special name such as INTERNAL is shown. For a multi-line message, its number is shown here.
47-54 - User exit flags.
57-133 - Message text.
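The column positions listed above can be used to split a SYSLOG record in a program. Here is a minimal Python sketch. It assumes each record is a plain text line laid out exactly at those 1-based positions. If your offloaded copy carries an ANSI control character in the first byte, strip it first. The sample record is hypothetical, modeled on the /LOG example later in this section.

```python
# 1-based, inclusive column ranges of a SYSLOG record, as listed above.
SYSLOG_COLUMNS = {
    "record_type": (1, 1),
    "request_type": (2, 2),
    "routing_code": (3, 9),
    "system_name": (11, 14),
    "date": (20, 24),        # YYDDD
    "time": (26, 36),
    "source": (38, 45),      # job/STC/TSU ID, console name, ...
    "exit_flags": (47, 54),
    "text": (57, 133),
}

# Record types that mark the beginning of a new message.
NEW_MESSAGE_TYPES = {"N", "W", "M", "X"}


def parse_syslog_record(line: str) -> dict:
    """Split one SYSLOG record into named fields by column position."""
    fields = {name: line[start - 1:end].strip()
              for name, (start, end) in SYSLOG_COLUMNS.items()}
    fields["starts_message"] = fields["record_type"] in NEW_MESSAGE_TYPES
    return fields


# Hypothetical record, padded so each field sits in its documented columns.
sample = ("NC0000000" + " " + "SYSA" + " " * 5 + "16118" + " "
          + "00:49:05.25" + " " + "JSADEK".ljust(8) + " "
          + "00000290" + " " * 2 + "LOG 'HELLO'")

record = parse_syslog_record(sample)
print(record["system_name"], record["date"], record["source"], record["text"])
```

Grouping records into messages then reduces to starting a new group whenever starts_message is true.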
WTORs from a single LPAR.
'RSYS lpar' - shows only WTORs from the given LPAR (WTORs are always shown at the bottom of the log).
'RSYS *' - shows all WTORs again.
____________________
Only DB2 messages.
DSN is the standard prefix for DB2 messages, so you can use the following filter:
'FIL MSGID EQ DSN*'
____________________
Only messages from the last hour.
You can combine two filters by using the '+' character:
'FIL TIME GE 22:00' - turns on the first filter.
'FIL +TIME LE 23:00' - the '+' character adds another filter without removing the existing ones.
'FIL ?' - displays a dialog with all active filters. You can set up a maximum of 25 filters there.
____________________
Only DB2 error messages from the last hour.
Use the same filters as in the previous example and add a third one:
Filtering is ON
AND/OR between columns    AND  (AND/OR)
AND/OR within a column    AND  (AND/OR)
Column    Oper  Value (may include * and %)
TIME      GE    10:00
TIME      LE    11:00
MSGID     EQ    DSN*E
Error messages end with the 'E' character, so the mask 'DSN*E' is what we need. You also need to pay attention to the AND/OR relations.
____________________
Any message that contains your user ID.
This example requires two filters, one for the JOBNAME column and one for MSGTEXT. You also have to change the filter relation to 'OR' in the 'between columns' field:
AND/OR between columns    OR   (AND/OR)
AND/OR within a column    AND  (AND/OR)
Column    Oper  Value (may include * and %)
JOBNAME   EQ    JSADEK
MSGTEXT   EQ    JSADEK
To see all messages related to your user ID, it's a good idea to add two more filters for your TSU ID. In this case you'll also need 'OR' in the 'within a column' field, because you now search for two values in the MSGTEXT column:
AND/OR between columns    OR   (AND/OR)
AND/OR within a column    OR   (AND/OR)
Column    Oper  Value (may include * and %)
JOBNAME   EQ    JSADEK
MSGTEXT   EQ    JSADEK
JOBID     EQ    TSU20300
MSGTEXT   EQ    TSU20300
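To make the AND/OR relations concrete, here is a Python sketch of how the filter dialog combines conditions. Filters on the same column are joined with the 'within a column' relation, and the per-column results are joined with the 'between columns' relation. This is a model, not SDSF code, and it rests on several assumptions. '*' matches any string and '%' exactly one character. MSGTEXT matches when the value occurs anywhere in the text, as the user-ID example relies on. GE/LE compare plain strings, which works for zero-padded times. The sample records are invented.

```python
import re

# Invented sample records; only the columns used by the filters are present.
RECORDS = [
    {"JOBNAME": "JSADEK", "JOBID": "JOB00012", "TIME": "10:15:00.00",
     "MSGID": "IEF403I", "MSGTEXT": "IEF403I JSADEK - STARTED"},
    {"JOBNAME": "DB2AMSTR", "JOBID": "STC00100", "TIME": "10:30:00.00",
     "MSGID": "DSNX999E", "MSGTEXT": "DSNX999E SAMPLE DB2 ERROR"},
    {"JOBNAME": "DB2AMSTR", "JOBID": "STC00100", "TIME": "12:05:00.00",
     "MSGID": "DSNX999E", "MSGTEXT": "DSNX999E SAMPLE DB2 ERROR"},
    {"JOBNAME": "TCPIP", "JOBID": "STC00050", "TIME": "10:40:00.00",
     "MSGID": "XYZ001I", "MSGTEXT": "XYZ001I SESSION FOR TSU20300 ENDED"},
]


def to_regex(value: str) -> str:
    """Translate a filter mask: * = any string, % = exactly one character."""
    return "".join(".*" if ch == "*" else "." if ch == "%" else re.escape(ch)
                   for ch in value)


def matches(record: dict, column: str, oper: str, value: str) -> bool:
    field = record.get(column, "")
    if oper == "EQ":
        regex = to_regex(value)
        if column == "MSGTEXT":
            # Assumption: message text matches if the mask occurs anywhere.
            return re.search(regex, field) is not None
        return re.fullmatch(regex, field) is not None
    if oper == "GE":
        return field >= value  # plain string comparison
    if oper == "LE":
        return field <= value
    raise ValueError(f"unsupported operator: {oper}")


def combine(results: list, relation: str) -> bool:
    return all(results) if relation == "AND" else any(results)


def apply_filters(records, filters, between="AND", within="AND"):
    """Keep records that satisfy the (column, oper, value) filters."""
    by_column = {}
    for column, oper, value in filters:
        by_column.setdefault(column, []).append((oper, value))
    kept = []
    for rec in records:
        per_column = [
            combine([matches(rec, col, op, val) for op, val in conds], within)
            for col, conds in by_column.items()
        ]
        if combine(per_column, between):
            kept.append(rec)
    return kept


db2_errors = apply_filters(RECORDS, [("TIME", "GE", "10:00"),
                                     ("TIME", "LE", "11:00"),
                                     ("MSGID", "EQ", "DSN*E")])
mine = apply_filters(RECORDS, [("JOBNAME", "EQ", "JSADEK"),
                               ("MSGTEXT", "EQ", "JSADEK"),
                               ("JOBID", "EQ", "TSU20300"),
                               ("MSGTEXT", "EQ", "TSU20300")],
                     between="OR", within="OR")
print([r["TIME"] for r in db2_errors])
print([r["JOBNAME"] for r in mine])
```

With 'AND' within a column, the user-ID example would require MSGTEXT to contain both JSADEK and TSU20300, which is why the text switches that relation to 'OR'.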
The easiest way is the MVS 'LOG' command: “/LOG 'Hello!'”. Here is what you'll get after issuing this command:
NC0000000 SYSA     16118 00:49:05.25 JSADEK   00000290  LOG 'HELLO'
O                                                       HELLO
As you already know, the 'NC' characters at the beginning indicate that this is a new message and that it is a command. You can also use another command for this purpose: “/$DMjobid,'Hello!'”, where jobid is the ID of the SYSLOG task. You can send a message to any active address space this way. SYSLOG also works as a Master Console, so the “/$DMR0,'Hello!'” command works as well, as does “/$DMM1,'Hello!'”. However, both of these also send the message to other Master Consoles, so they're not recommended here.
The 'IEA404A SEVERE WTO BUFFER SHORTAGE - 100% FULL' message indicates that the system message buffer is full. Issue the 'D C,B' command to check the details; you'll see the current buffer usage and the name of the problematic task or console. When the limit is reached, all new messages are discarded, which may be a serious problem for many tasks. This is a critical problem that should be fixed as soon as possible. Use the '/K M,REF' command to display the full settings for z/OS consoles:
IEE144I K M,AMRF=Y,MLIM=1000,RLIM=0010,UEXIT=N,LOGLIM=001000,ROUTTIME=000,RMAX=0099
To fix the issue, simply increase the limit:
'/K M,MLIM=2000'
Now that the issue is temporarily fixed, investigate the cause of the problem and deal with it. Perhaps the limit is too small and you should increase it permanently; you'll do that in the next assignment.
Note: The same problem can occur with the WTL buffer (the message buffer for SYSLOG): 'IEE767A SEVERE BUFFER SHORTAGE FOR SYSTEM LOG - 100% FULL'. The difference is that the SYSLOG buffer is checked with the '/D C,HC' command. You can increase the WTL buffer limit with the '/K M,LOGLIM=xxxx' command.
Changes like this are usually made without an IPL, so you'll need to modify two things:
- The current setting.
- The PARMLIB member.
A setting changed with the 'K M' command remains in effect only until the next IPL, because during IPL the setting is read from PARMLIB. Here is an example setting viewed via the 'K M,REF' command:
IEE144I K M,AMRF=Y,MLIM=1000,RLIM=0010,UEXIT=N,LOGLIM=001000,ROUTTIME=000,RMAX=0099
The WTOR limit must be set to 200, but changing RLIM alone is not enough, because RMAX (the maximum reply ID value) restricts the limit to 99. In total you need to change four parameters:
- LOGLIM - WTL buffer limit.
- MLIM - WTO buffer limit.
- RLIM - WTOR buffer limit.
- RMAX - maximum reply ID number.
All these parameters can be changed in the CONSOLxx PARMLIB member. Check “z/OS MVS Initialization and Tuning Reference” for details.
1. Find the suffix of the CONSOLxx member in use. The 'CON' parameter in the IEASYSxx member defines it.
2. Create a backup copy of CONSOLxx.
3. Modify the CONSOLxx member. The RMAX parameter needs to be added or changed in the DEFAULT statement. The other parameters are coded under the INIT statement:
INIT     CMDDELIM(")
         MONITOR(DSNAME)
         MMS(00)
         MPF(00)
         PFK(00)
         UEXIT(N)
         CNGRP(00)
         LOGLIM(5000)
         MLIM(2000)
         RLIM(200)
DEFAULT  ROUTCODE(ALL)
         RMAX(200)
4. Modify the current system setting with the following command:
“K M,LOGLIM=5000,MLIM=2000,RLIM=200,RMAX=200”
5. Use the 'K M,REF' command to verify that the new setting was applied correctly:
IEE144I K M,AMRF=Y,MLIM=2000,RLIM=0200,UEXIT=N,LOGLIM=005000,ROUTTIME=000,RMAX=0200
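The RLIM/RMAX relationship and the final K M command can be checked with a short Python sketch. The helper names are hypothetical. The code only parses the IEE144I lines shown above and applies the rule described in this task: RLIM cannot exceed RMAX.

```python
def parse_iee144i(line: str) -> dict:
    """Parse 'IEE144I K M,KEY=VALUE,...' into a dict of strings."""
    _, _, settings = line.partition("K M,")
    return dict(item.split("=", 1) for item in settings.split(","))


def check_reply_limits(rlim: int, rmax: int) -> None:
    """Rule as described in the text: RLIM must not exceed RMAX."""
    if rlim > rmax:
        raise ValueError(f"RLIM={rlim} exceeds RMAX={rmax}; raise RMAX as well")


def build_k_m(changes: dict) -> str:
    """Build the K M command that applies the given settings."""
    return "K M," + ",".join(f"{key}={value}" for key, value in changes.items())


before = parse_iee144i(
    "IEE144I K M,AMRF=Y,MLIM=1000,RLIM=0010,UEXIT=N,"
    "LOGLIM=001000,ROUTTIME=000,RMAX=0099")
target = {"LOGLIM": 5000, "MLIM": 2000, "RLIM": 200, "RMAX": 200}

# Raising RLIM alone would fail against the current RMAX of 99.
try:
    check_reply_limits(target["RLIM"], int(before["RMAX"]))
except ValueError as err:
    print(err)

check_reply_limits(target["RLIM"], target["RMAX"])
print(build_k_m(target))

after = parse_iee144i(
    "IEE144I K M,AMRF=Y,MLIM=2000,RLIM=0200,UEXIT=N,"
    "LOGLIM=005000,ROUTTIME=000,RMAX=0200")
print(all(int(after[key]) == value for key, value in target.items()))
```

The last line performs the verification from step 5 programmatically. The zero-padded values in IEE144I (for example LOGLIM=005000) compare correctly once they are converted to integers.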
In a few exercises in the JCL category you worked with the Internal Reader. You can think of it as a JES program that moves data from the system into spool. It can also detect whether the data passed to it is a command or JCL and act accordingly. Here we'll use the External Writer, a similar JES program that moves data in the opposite direction: from JES into a DD statement. The External Writer lets you select job output by the following criteria:
- Output class
- Job ID
- Forms specification
- Destination
- Output writing routine
We'll use 'Output class' here. Using other criteria, such as Job ID, requires an Output Separator Routine, SYS1.LINKLIB(IEFSD094). Check “z/OS JES2 Initialization and Tuning Guide” for more information about it.
JCL code:
//XWTRLOG  PROC VS=SVMVS1,DS=SYSU.MVS.SYSLOG(+1)
//EXTWTR   EXEC PGM=IASXWR00,PARM='PL'
//IEFRDER  DD DSN=&DS,DISP=(NEW,CATLG),VOL=SER=&VS,
//            SPACE=(CYL,(20,20),RLSE),LRECL=137,RECFM=VBA
//         PEND
The External Writer utility is called IASXWR00. It saves the selected spool outputs into the DD statement named IEFRDER and then removes them from spool. In this example we're using two symbolic parameters: 'DS' for the data set name and 'VS' for the VOLSER on which the data set is saved.
The PARM parameter defines which outputs are selected for offload. The first character, 'P' here, selects output waiting to be printed, while 'C' selects output for punching. Card punches haven't been used for a long time, so 'P' is always used. The following characters specify the output classes to take output from, in this case class 'L'. The External Writer writes all outputs from the specified classes to a single DD statement. For that reason, assign one class exclusively to SYSLOG processing so that no other spool outputs end up in the same data set.
VBA is the best record format because SYSLOG records vary in length, so FB would waste storage. The 'A' means that ANSI print control characters are saved as well. You don't have to use it, but if anyone ever wants to print SYSLOG, it's better to keep them in the output. The LRECL value (137) is the standard record length for spool data sets.
There is one more characteristic of the External Writer you need to know: it is stopped differently than other tasks:
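The PARM convention described above, one output-type character followed by output classes, can be expressed as a tiny Python function. It covers only this simple form; any other PARM options IASXWR00 may accept are not modeled. The function name is hypothetical.

```python
OUTPUT_TYPES = {"P": "print", "C": "punch"}


def decode_xwtr_parm(parm: str) -> dict:
    """Split a simple External Writer PARM like 'PL' into type and classes."""
    if not parm or parm[0] not in OUTPUT_TYPES:
        raise ValueError("PARM must start with P (print) or C (punch)")
    return {"type": OUTPUT_TYPES[parm[0]], "classes": list(parm[1:])}


print(decode_xwtr_parm("PL"))  # {'type': 'print', 'classes': ['L']}
```

Reading 'PL' this way makes it clear why the SYSLOG class must be reserved: every output in class L goes to the same IEFRDER data set.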
JOBNAME  StepName
XWTRLOG  0AA0
'/P XWTRLOG' won't work because the task uses an IEFRDER DD statement. In such cases you must use the step name to stop the task, in one of two formats:
'/P 0AA0'
'/P XWTRLOG.0AA0'
This makes stopping it harder, because 0AA0 is generated by the system and may change with each run. Fortunately, there is a way around it. You can start the task with the command:
'/S XWTRLOG.WTR'
The name WTR doesn't matter and can be anything; the point is that it stays constant. Now you know the step name and can safely stop the task with the 'P WTR' command. See the description of the START and STOP commands in “z/OS MVS System Commands” for more details.
You can test whether your task works correctly by issuing the following commands:
'/S XWTRLOG.LOG' - starts the External Writer task.
'/WRITELOG L' - closes SYSLOG and puts it in class L. A new SYSLOG is opened automatically.
'/P LOG' - stops the External Writer task.
You can also write a small job that does this:
/*$VS,'S XWTRLOG.LOG'
/*$VS,'WRITELOG L'
/*$VS,'P LOG'
//JSADEK01 JOB
//STEP1    EXEC PGM=IEFBR14
In this case the commands are issued immediately, one after another. This isn't a problem when you use the STOP (P) command for XWTRLOG: the task receives the command but closes only after the log offload completes successfully.
“/*$VS,'command'” is a JES2 command statement. The /*$ prefix issues a JES2 command at the beginning of the job, and the $VS JES2 command in turn issues the MVS command. You must be authorized to use it.
Oper      Duration  Job name  Internal predecessors            Morepreds
ws   no.  HH.MM.SS                                             -IntExt-
NREP 001  00.00.01  ________  ___ ___ ___ ___ ___ ___ ___ ___  0  0
CPUM 010  00.01.00  LOGWTR10  001 ___ ___ ___ ___ ___ ___ ___  0  0
WAIT 015  00.00.30  ________  010 ___ ___ ___ ___ ___ ___ ___  0  0
CPUM 020  00.01.00  LOGWTR20  015 ___ ___ ___ ___ ___ ___ ___  0  0
NREP 255  00.00.01  ________  020 ___ ___ ___ ___ ___ ___ ___  0  0
Operation 015 runs on the WAIT workstation for the time specified as its operation duration. As always, the first operation must be time-dependent, so that the Application runs when scheduled rather than as soon as it is added to the current plan.
LOGWTR10:
/*$VS,'S XWTRLOG.LOG'
/*$VS,'WRITELOG L'
//LOGWTR10 JOB
//STEP1    EXEC PGM=IEFBR14
LOGWTR20:
/*$VS,'P LOG'
//LOGWTR20 JOB
//STEP1    EXEC PGM=IEFBR14
Depending on your RACF configuration, you may need to add authorization for the TWS user. Run cycles depend on your requirements. On busy systems SYSLOG is offloaded every day, usually at 00:00. On smaller systems you may want to offload it once a week:
Name of      Input  Deadline    Type  F day  In effect  Out of effect  Text
period/rule  HH.MM  day HH.MM         rule   YY/MM/DD   YY/MM/DD
WEEKLY__     00.00  00  00.10   R     3      16/05/01   71/12/31
In this example the application runs every Monday at 00:00, even on holidays (F day rule = 3). The rule selects Only, First, Day and Week, that is, the first day of every week:
--- Frequency ---      --- Day ---     --- Cycle Specification ---
S Only                 S Day           S Week   _ January    _ July
_ Every                _ Free day      _ Month  _ February   _ August
                       _ Work day      _ Year   _ March      _ September
S First   _ Last       _ Monday                 _ April      _ October
_ Second  _ 2nd Last   _ Tuesday                _ May        _ November
_ Third   _ 3rd Last   _ Wednesday              _ June       _ December
_ Fourth  _ 4th Last   _ Thursday      Week number __ __ __ __ __ __
_ Fifth   _ 5th Last   _ Friday        Period name ________ ________
___ ___ ___ ___        _ Saturday                  ________ ________
___ ___ ___ ___        _ Sunday
___ ___ ___ ___
Shift default origin by ___ days
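The effect of the WEEKLY rule can be sketched in Python. This is a simplified model, not TWS logic: it assumes Monday is the first day of the week, reads the in-effect and out-of-effect dates 16/05/01 and 71/12/31 as 2016-05-01 and 2071-12-31, and ignores the calendar entirely because free day rule 3 runs on free days anyway.

```python
from datetime import date, datetime, timedelta

IN_EFFECT = date(2016, 5, 1)       # 16/05/01
OUT_OF_EFFECT = date(2071, 12, 31)  # 71/12/31 (assumed to mean 2071)

def weekly_runs(start, count, in_effect=IN_EFFECT, out_of_effect=OUT_OF_EFFECT):
    """Return up to `count` run times on or after `start`:
    the first day of each week (Monday) at input time 00.00."""
    day = max(start, in_effect)
    day += timedelta(days=(7 - day.weekday()) % 7)  # advance to the next Monday (weekday 0)
    runs = []
    while len(runs) < count and day < out_of_effect:
        runs.append(datetime.combine(day, datetime.min.time()))
        day += timedelta(weeks=1)
    return runs
```

For example, `weekly_runs(date(2016, 5, 1), 3)` starts on Sunday 2016-05-01 and yields the three following Mondays at midnight.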
The last thing to do is to include the application in the Long Term Plan (panel 2.2.2) and then in the Current Plan (panel 3.1).
With JES2 you can define commands that are automatically issued at a specific time or at time intervals. You can display currently active automatic commands with the '/$TA,ALL' command. Only JES2 commands can be issued that way, but you can bypass this by using the '$VS' command, which is a JES2 command that issues an MVS command. To test Automatic Commands you can issue:
“/$TATEST,I=20,'$VS,''D T'''”
TEST is the name of the automatic command. Note that each quote inside the command is doubled, so in effect the “$VS,'D T'” command is issued every 20 seconds. To turn it off you need to use the Cancel command:
'/$CATEST'
In this Task we need to issue three commands every day at a specific time:
'S XWTRLOG.LOG'
'WRITELOG L'
'P LOG'
Commands:
/$TALOG1,T=24.00,I=86400,'$VS,''S XWTRLOG.LOG'''
/$TALOG2,T=24.00,I=86400,'$VS,''WRITELOG L'''
/$TALOG3,T=24.01,I=86400,'$VS,''P LOG'''
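The quote doubling is easy to get wrong by hand. A minimal Python sketch that builds such $TA commands is shown below; the helper names `jes2_quote` and `auto_command` are hypothetical, and the code only assembles text in the format used above (it does not talk to JES2).

```python
def jes2_quote(text):
    """Wrap text in single quotes, doubling every quote inside it."""
    return "'" + text.replace("'", "''") + "'"

def auto_command(name, time, interval, mvs_command):
    """Build a $TA automatic command that issues an MVS command through $VS.
    `time` may be None when only an interval is wanted."""
    if not 10 <= interval <= 86400:
        raise ValueError("I= must be between 10 and 86400 seconds")
    inner = "$VS," + jes2_quote(mvs_command)  # e.g. $VS,'S XWTRLOG.LOG'
    parts = [f"$TA{name}"]
    if time is not None:
        parts.append(f"T={time}")
    parts.append(f"I={interval}")
    parts.append(jes2_quote(inner))           # quotes inside $VS,'...' get doubled here
    return ",".join(parts)
```

`auto_command("LOG1", "24.00", 86400, "S XWTRLOG.LOG")` reproduces the first command above, and `auto_command("TEST", None, 20, "D T")` reproduces the test command.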
Setting up the interval (I=86400) alone is not enough, because in that case the command is issued immediately and the 24-hour countdown starts at that moment. By using both the 'T' and 'I' parameters you define the start time of the interval: these commands are issued at the nearest midnight and the interval countdown starts then. Also note that the 'T' parameter does not mean “start time” but “time since last midnight”, so using 00.00 results in the command running immediately. That is why the value 24.00 should be used. The 'I' parameter accepts values from 10 to 86400 seconds, so this method cannot be used if you want to issue a command less often than once per day.
Note: Automatic Commands aren't retained after IPL. To keep them permanently you need to modify the JES2PARM member. You can simply add them at the end of JES2PARM as normal commands; they'll be executed during JES2 startup.
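The timing rules described above can be modelled in a few lines of Python. This is a sketch of the behaviour as described in this section, not JES2 code: T is treated as hh.mm counted from the last midnight, and a T that has already passed fires immediately.

```python
from datetime import datetime, timedelta

def first_runs(now, t, interval, count=3):
    """Return the first `count` times a $TA command fires, per the T=/I= rules above."""
    if not 10 <= interval <= 86400:
        raise ValueError("I= must be between 10 and 86400 seconds")
    hh, mm = (int(part) for part in t.split("."))
    last_midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # 24.00 lands on the coming midnight; a time already in the past fires right away
    first = max(last_midnight + timedelta(hours=hh, minutes=mm), now)
    return [first + timedelta(seconds=interval * k) for k in range(count)]
```

Entered at 15:30, T=24.00 fires at the next two midnights, while T=00.00 fires immediately, which is why 24.00 is used above.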
Installing a basic STC
Installing software on z/OS is often a very complex process that involves many teams (System Programmers, Storage, DBDC, System Automation, Security and so on). Installation manuals are often 500-page bricks that you'll have to review during installation planning. In this Assignment you'll install a very simple piece of software and perform some basic tasks of the Storage, Security and System Programmer teams to gain a broader view of software installation. The software we'll use is ENQWATCH, created by Kevin E. Ferguson and available on the CBT Tape website: www.cbttape.org. Click on the “CBT” tab and download File#844. The task data sets will be stored under the 'TOOLS.ENQWATCH.**' naming convention. This exercise assumes that 'TOOLS.**' data sets don't exist on the system, so we'll have a chance to configure many aspects of the environment.
Warning: Because you modify a lot of z/OS configuration, you shouldn't perform these activities on a system that you or your team does not own.
1. Send ENQWATCH to z/OS and unpack it.
2. Prepare storage environment #1:
- Select a free non-SMS DASD you'll use for 'TOOLS.**' data sets.
- Reinitialize this DASD so it can be used by SMS.
- Create a new Management Class.
- Create a new Storage Class.
- Create a new Storage Group.
- Add your DASD to the Storage Group.
- Validate the SCDS for possible errors.
3. Prepare storage environment #2:
- Modify the ACS routines so that 'TOOLS.**' data sets use the definitions from Task#2.
- Test them.
- Validate the SCDS.
- Activate the updated SCDS.
4. Prepare RACF environment #1:
- Define a data set profile for 'TOOLS.**' data sets.
- Give appropriate access rights to this profile.
5. Prepare storage environment #3:
- Create a new User Catalog that will store only 'TOOLS.**' data sets.
- Create the TOOLS alias.
- Allocate a few 'TOOLS.**' data sets to test the storage settings.
6. Prepare ENQWATCH data sets:
- Move the ENQWATCH data set under the 'TOOLS.ENQWATCH.INSTALL' name.
- Copy the ASSEMBLE member to 'TOOLS.ENQWATCH.CNTL'.
- Compile ENQWATCH into 'TOOLS.ENQWATCH.LINKLIB'.
- Create the ENQWATCH startup procedure in 'TOOLS.ENQWATCH.CNTL'.
7. Prepare MVS environment:
- Add a new library called 'TOOLS.PROCLIB' to the PROCLIB concatenation.
- Copy the ENQWATCH startup procedure to 'TOOLS.PROCLIB'.
- Add 'TOOLS.ENQWATCH.LINKLIB' to the APF list.
8. Prepare RACF environment #2:
- Create the STCTOOLS user.
- Create a profile in the STARTED class that associates STCTOOLS with the ENQWATCH startup procedure.
- Give the STCTOOLS user ALTER authority to 'TOOLS.**' data sets.
9. Test ENQWATCH:
- Start the task and confirm that it runs under the right RACF user.
- Hold a data set with the edit option, then run a job that tries to allocate that data set and wait for an alert.
- Display all available information about ENQWATCH settings.
- Change the cycle time to 30 seconds.
- Restart ENQWATCH and check whether the new cycle time is still used.
File characteristics are described on the CBT website; you'll need to know them to transfer the file successfully. You can check how to send a file to z/OS through PCOMM in the IEBGENER Assignment from the Utilities category. To unpack it you'll need the TSO RECEIVE command. Check the TSO documentation for more details.
Storage configuration can be very complex. If you're not sure about your selection, or you're simply using a z/OS system on which you're not the main system programmer, you should consult the Storage Administrator to select the appropriate STORCLAS, MGMTCLAS, STORGRP and User Catalog.
ACS routines are written in a very simple scripting language used for assigning SMS constructs (DATACLAS, STORCLAS, MGMTCLAS, STORGRP) to data sets. SMS uses those constructs when new data sets are allocated, during Primary Space Management, and during other data set operations such as migration. ACS syntax and variables are described in “Chapter 17. Writing ACS routines” of “DFSMSdfp Storage Administration”.
Every system should have RACF documentation that describes RACF groups and their purpose. If you don't have it, you need to analyze the groups and users on your system and decide what access rights they have and need. Before defining data set profiles you should check the EGN (Enhanced Generic Names) setting. You can read more about EGN in “Appendix A” of “RACF Command Language Reference”.
The publication "DFSMS Managing Catalogs" is all you need to be able to define a catalog. Beforehand you should check how catalogs are configured on your system:
- What their naming convention and parameters are.
- Where they are placed and whether they are managed by SMS.
- How RACF protects them.
You can read about STARTED class profiles in "RACF Security Administrator's Guide".
Using file transfer in PCOMM was already described in the IEBGENER Assignment. The second most popular terminal emulator is x3270 (x3270.bgp.nu). Here is a command you can use to send a file from Windows to z/OS through wc3270, the Windows version of x3270:
transfer direction=send localfile=C:\Users\User\Desktop\FILE844.XMI hostfile=file844.xmi mode=binary recfm=fixed lrecl=80 blksize=8000
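A quick sanity check before uploading can save a failed RECEIVE. The sketch below assumes, as the transfer command above does, that the XMI file consists of fixed 80-byte records, so a correct download must be a whole number of 80-byte records and BLKSIZE must be a multiple of LRECL (8000 = 100 x 80). The function name is hypothetical.

```python
import os
import sys

def check_xmi_upload(size, lrecl=80, blksize=8000):
    """Return a list of problems for a binary RECFM=F upload (empty list = OK).
    `size` is the local file size in bytes."""
    problems = []
    if blksize % lrecl != 0:
        problems.append(f"BLKSIZE {blksize} is not a multiple of LRECL {lrecl}")
    if size == 0 or size % lrecl != 0:
        problems.append(f"{size} bytes is not a whole number of {lrecl}-byte records")
    return problems

if __name__ == "__main__":
    # Usage: python check_xmi.py C:\Users\User\Desktop\FILE844.XMI
    for path in sys.argv[1:]:
        print(path, check_xmi_upload(os.path.getsize(path)) or "OK")
```

A file whose size is not a multiple of 80 was most likely downloaded in text (ASCII) mode instead of binary.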
XMI means that the file was packed with the TSO TRANSMIT command. You can unpack it with the TSO RECEIVE command:
'RECEIVE INDATASET('JSADEK.FILE844.XMI')'
After issuing it you'll be prompted for additional parameters. Now you can specify the output data set:
'DATASET('JSADEK.FILE844.PDS')'
We need to create a few things here:
DASD – let's name it SOFT01
STORGRP – SOFTSG
STORCLAS – SOFTSC
MGMTCLAS – SOFTMC
1. Prepare DASD.
You can check how to reinitialize a DASD in the ADRDSSU assignment. Of course, first you need a free DASD that you can use. On most systems there are spare drives waiting to be used for storage problems or for the creation of a new Storage Group, as in this case. To be usable in a Storage Group, this DASD must be initialized with the SMSDS and SG keywords:
//*---------------------------------------------------------------------
//* PUT VOLUME OFFLINE
//*---------------------------------------------------------------------
//STEP010  EXEC PGM=SDSF
//SYSPRINT DD SYSOUT=*
//ISFOUT   DD SYSOUT=*
//ISFIN    DD *
/V 0A9D,OFFLINE
//*---------------------------------------------------------------------
//* INIT SMS VOLUME
//*---------------------------------------------------------------------
//CLIP     EXEC PGM=ICKDSF,PARM=NOREPLYU,REGION=6M
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  INIT VFY(SPARE4) UNIT(0A9D) VOLID(SOFT01) VTOC(1,0,90) -
       INDEX(0,1,14) SMSDS SG
//*---------------------------------------------------------------------
//* PUT VOLUME ONLINE
//*---------------------------------------------------------------------
//STEP020  EXEC PGM=SDSF
//SYSPRINT DD SYSOUT=*
//ISFOUT   DD SYSOUT=*
//ISFIN    DD *
/V 0A9D,ONLINE
You can now check whether the DASD is in CONVERT status: ISMF >>> 2(Volume) >>> 1(DASD) >>> 1(Physical) >>> Physical Status
VOLUME   PHYSICAL
SERIAL   STATUS
-(2)--   --(22)--
SOFT01   CONVERT
2. Create Management Class.
'TOOLS.**' data sets will hold software libraries such as load modules, panels, procedures, etc. This means that they should never be migrated automatically or deleted, but they should be backed up regularly. To create a new MGMTCLAS, enter ISMF and change your view mode to Administrator:
ISMF >>> 0(ISMF Profile) >>> 0(User Mode Selection) >>> Option 2
You cannot modify the active SMS configuration. To define or modify any SMS construct or ACS routine you must work on the Source Control Data Set (SCDS). Use the 'D SMS,ACTIVE' command to display information about the ACDS and SCDS. Now you can define the MGMTCLAS in the SCDS:
ISMF >>> 3(Management Class) >>> 3(Define)
Here you can see a description of each field by putting the cursor on it and pressing PF1. You need to change the settings regarding Migration:
Migration Attributes
  Primary Days Non-usage  . . . . 9999     (0 to 9999 or blank)
  Level 1 Days Non-usage  . . . . NOLIMIT  (0 to 9999, NOLIMIT or blank)
  Command or Auto Migrate . . . . COMMAND  (BOTH, COMMAND or NONE)
Backup Attributes
  Backup Frequency  . . . . . . . . 1     (0 to 9999 or blank)
  Number of Backup Vers . . . . . . 3     (1 to 100 or blank)
             (Data Set Exists)
  Number of Backup Vers . . . . . . 1     (0 to 100 or blank)
             (Data Set Deleted)
  Retain days only Backup Ver . . . 90    (1 to 9999, NOLIMIT or blank)
             (Data Set Deleted)
  Retain days extra Backup Vers . . 30    (1 to 9999, NOLIMIT or blank)
  Admin or User command Backup  . . BOTH  (BOTH, ADMIN or NONE)
  Auto Backup . . . . . . . . . . . Y     (Y or N)
  Backup Copy Technique . . . . . . S     (P, R, S, VP, VR, CP or CR)
After you're done you can leave this screen by pressing PF3.
3. Create Storage Class.
You define the STORCLAS in a similar way. When you put the cursor on an editable field and press PF1, ISMF displays a description of that parameter. In this case we can use the default settings.
4. Create Storage Group.
We'll need a simple Storage Group of POOL type with Auto-Migration set to NO and Auto Backup set to YES. You've already set the Migration and Backup attributes in the MGMTCLAS, but the STORGRP settings take part as well: a data set is migrated or backed up automatically only if the function is turned on in both its MGMTCLAS and its STORGRP. If a data set has no MGMTCLAS, only the STORGRP setting applies.
5. Add your volume to the Storage Group.
ISMF >>> 6(Storage Group) >>> 4(Volume) >>> 2(Define) – here you can add your volume to the SOFTSG STORGRP and set its initial status. Now you can list your storage group and use the 'LISTV' ISMF command to check details about the volumes in SOFTSG.
6. Validate your modification.
ISMF >>> 8(Control Data Set) >>> 4(Validate) the SCDS you've modified. Next, specify a data set name with your ID as HLQ. ISMF allocates it and writes the validation results there. At this point you should receive the following message:
IGD06023I STORAGE GROUP SOFTSG IS NOT REFERENCED BY THE STORAGE GROUP ACS ROUTINE
This is only an informational message, and you could activate the SCDS with it, but nothing would be written to SOFTSG. The last thing to do is to modify the ACS routines.
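The MGMTCLAS/STORGRP interplay described in step 4 can be summarized as a small Python model. It is a simplified sketch of the rule as stated there (an automatic action happens only when both constructs allow it), with the SOFTMC and SOFTSG values taken from this exercise; it does not reproduce every DFSMShsm rule.

```python
from dataclasses import dataclass

@dataclass
class ManagementClass:
    command_or_auto_migrate: str  # "BOTH", "COMMAND" or "NONE"
    auto_backup: bool             # "Auto Backup . . . Y/N"

@dataclass
class StorageGroup:
    auto_migrate: bool
    auto_backup: bool

def automatic_processing(mc, sg):
    """Simplified: automatic migration/backup happens only if both MGMTCLAS and STORGRP allow it."""
    return {
        # only BOTH permits automatic migration; COMMAND allows just explicit migrate commands
        "auto_migrate": mc.command_or_auto_migrate == "BOTH" and sg.auto_migrate,
        "auto_backup": mc.auto_backup and sg.auto_backup,
    }

SOFTMC = ManagementClass(command_or_auto_migrate="COMMAND", auto_backup=True)
SOFTSG = StorageGroup(auto_migrate=False, auto_backup=True)
```

For 'TOOLS.**' data sets, `automatic_processing(SOFTMC, SOFTSG)` gives no automatic migration but daily automatic backup, which is the goal of this exercise.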
You can use ISMF >>> 8(Control Data Sets) >>> 1(Display) to view the location and names of the ACS routines. This is a critical storage setting, so you must always be very careful while modifying them. You should create a backup copy and pay special attention to the instruction logic. SMS executes the ACS routines one by one in the following order: DATACLAS, STORCLAS, MGMTCLAS, STORGRP. Here are fragments of the ACS routines that will be used for 'TOOLS.**' data sets.
STORCLAS:
FILTLIST TOOLS INCLUDE(TOOLS.**)
IF &DSN=&TOOLS THEN
  DO
    SET &STORCLAS = 'SOFTSC'
    EXIT CODE(0)
  END
MGMTCLAS:
IF &STORCLAS = 'SOFTSC' THEN
  DO
    SET &MGMTCLAS = 'SOFTMC'
    EXIT CODE(0)
  END
STORGRP:
IF &STORCLAS = 'SOFTSC' THEN
  DO
    SET &STORGRP = 'SOFTSG'
    EXIT CODE(0)
  END
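To see how the three fragments chain together, here is a small Python model of them. It is only an illustration: the filter matching is a simplified version of ACS mask rules ('**' = zero or more qualifiers, '*' = characters within one qualifier, '%' = one character), the DATACLAS routine is left out, and nothing here is SMS code.

```python
import re

def acs_filter_match(mask, dsn):
    """Simplified ACS filter matching of a data set name against a mask."""
    def qual_regex(q):
        return "".join("[^.]*" if c == "*" else ("[^.]" if c == "%" else re.escape(c)) for c in q)

    def match(pats, quals):
        if not pats:
            return not quals
        if pats[0] == "**":  # zero or more qualifiers
            return any(match(pats[1:], quals[i:]) for i in range(len(quals) + 1))
        return (bool(quals)
                and re.fullmatch(qual_regex(pats[0]), quals[0]) is not None
                and match(pats[1:], quals[1:]))

    return match(mask.split("."), dsn.split("."))

FILTLIST_TOOLS = ["TOOLS.**"]  # FILTLIST TOOLS INCLUDE(TOOLS.**)

def run_acs(dsn):
    """Apply the STORCLAS, MGMTCLAS and STORGRP fragments in SMS order."""
    result = {"STORCLAS": "", "MGMTCLAS": "", "STORGRP": ""}
    # STORCLAS routine: IF &DSN=&TOOLS THEN SET &STORCLAS = 'SOFTSC'
    if any(acs_filter_match(mask, dsn) for mask in FILTLIST_TOOLS):
        result["STORCLAS"] = "SOFTSC"
    # MGMTCLAS routine: keyed on the STORCLAS, not on the data set name
    if result["STORCLAS"] == "SOFTSC":
        result["MGMTCLAS"] = "SOFTMC"
    # STORGRP routine: also keyed on the STORCLAS
    if result["STORCLAS"] == "SOFTSC":
        result["STORGRP"] = "SOFTSG"
    return result
```

Because only the STORCLAS routine looks at the data set name, changing the 'TOOLS.**' filter in one place changes all three assignments.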
As you can see, the data set prefix 'TOOLS.**' is used only in the STORCLAS routine, and both the STORGRP and MGMTCLAS routines reference the STORCLAS, not the data sets directly. This is the recommended way to code ACS routines.
Before activating the ACS routines we need to test them. As always, you need to work on the SCDS:
ISMF >>> 7(Automatic Class Selection) >>> 2(Translate)
Translation converts the ACS routines to object form. After that operation you'll be able to create test cases that verify whether they correctly assign SMS constructs to specific data sets. Via option 5(Display) you can check the date of the last ACS translation. To test ACS routines you need to create a Test Case first:
ISMF >>> 7(Automatic Class Selection) >>> 4(Test) >>> 1(Define)
You need to allocate a new PDS in which you'll store Test Cases. You reference this PDS on this test panel:
Select one of the following Options:
1  1. DEFINE - Define an ACS Test Case
   2. ALTER  - Alter an ACS Test Case
   3. TEST   - Test ACS Routines
If DEFINE or ALTER Option is Chosen, Specify:
  ACS Test Library . . 'JSADEK.ACS.TEST'
  ACS Test Member  . . TEST1
On the next screen you can specify a wide range of SMS settings to test how the ACS routines will behave. We're interested in whether our new definitions will be assigned to 'TOOLS.**' data sets.
ACS Test Library : JSADEK.ACS.TEST
ACS Test Member  . : TEST1
To DEFINE ACS Test Case, Specify:
  Description ==> NEW SMS SETTING FOR 'TOOLS.**' DATA SETS
  Expected Result
  DSN (DSN/Collection Name) . . TOOLS.SOME.DATASET
  MEMN (Object Name)  . . . . .
Next we can execute the Test Case: ISMF >>> 7(Automatic Class Selection) >>> 4(Test) >>> 3(Test).
To Perform ACS Testing, Specify:
  CDS Name . . . . . . 'SYS1.SCDS'
                       (1 to 44 Character Data Set Name or 'Active')
  ACS Test Library . . 'JSADEK.ACS.TEST'
  ACS Test Member  . . TEST1
                       (fully or partially specified or * for all members)
  Listing Data Set . . 'JSADEK.ACS.LIST'
                       (1 to 44 Character Data Set Name or Blank)
Select which ACS Routines to Test:
  DC ===> Y (Y/N)   SC ===> Y (Y/N)   MC ===> Y (Y/N)   SG ===> Y (Y/N)
In ACS Test Library you point to the Test Case you've just defined. The listing data set doesn't have to exist; it will be allocated automatically. After running the test you should get results in which your data set is assigned to SOFTSC, SOFTMC and SOFTSG (or whatever names you've used). Now that the ACS routines are tested, we can activate the updated SMS configuration. Of course, before that we need to validate the SCDS for errors:
ISMF >>> 8(Control Data Set) >>> 4(Validate)
If you get no errors you can activate it:
ISMF >>> 8(Control Data Set) >>> 5(Activate)
Or with a command:
'SETSMS SCDS(SYS1.SCDS)'
Here are example RACF activities needed to create a new data set profile; they may differ depending on your configuration.
'AG TOOLS OWNER(DATASET) SUPGROUP(DATASET)'
It's good practice to define a separate owner group for each data set profile. Users that have the group-OPERATIONS attribute in the owner group of a data set profile have ALTER authority over all data sets covered by the profile. Using a dedicated owner group helps to avoid such a situation.
'AD 'TOOLS.*.**' OWNER(TOOLS) UACC(NONE) AUDIT(FAILURES(UPDATE))'
This creates a data set profile with universal access set to NONE, which is the recommended option for most data sets. It also makes RACF write SMF records each time someone without authorization tries to UPDATE, CONTROL or ALTER any data set protected by this profile. Now we can add access rights for the desired user groups:
'PE 'TOOLS.*.**' ACCESS(ALTER) GEN ID(SECURITY)'
'PE 'TOOLS.*.**' ACCESS(ALTER) GEN ID(SYSPROG)'
'PE 'TOOLS.*.**' ACCESS(ALTER) GEN ID(STORAGE)'
'PE 'TOOLS.*.**' ACCESS(ALTER) GEN ID(DBDC)'
'PE 'TOOLS.*.**' ACCESS(READ) GEN ID(OPER)'
'PE 'TOOLS.*.**' GEN ID(JSADEK) DELETE'
Unless RACF uses the NOADDCREATOR option, the person who created the profile is automatically added to its ACL with ALTER authority. ACLs shouldn't contain human users, so we remove that entry.
'SETR GENERIC(DATASET) REF'
Refreshes in-storage generic data set profiles.
'LD DA('TOOLS.*.**') GEN AU'
Confirms that your changes were applied successfully.
Note: Mask syntax in data set profiles depends on the EGN (Enhanced Generic Names) option. EGN changes the meaning of a single asterisk in data set profiles and allows the use of the double asterisk.
- With EGN: 'TOOLS.*' - covers data sets with exactly two qualifiers.
- With EGN: 'TOOLS.**' - covers data sets with the TOOLS high-level qualifier followed by any number of qualifiers.
- Without EGN: 'TOOLS.*' - covers data sets with two or more qualifiers.
So 'TOOLS.*.**' with EGN is equivalent to 'TOOLS.*' without EGN.
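The EGN rules in the note above can be checked with a short Python model. This is a simplified sketch of generic profile matching under the three rules listed there, not RACF's own algorithm (RACF also picks the most specific of several matching profiles, which is not modelled); the function name `covers` is hypothetical.

```python
import re

def _qual_match(pattern, qualifier):
    # '*' inside a qualifier = zero or more characters, '%' = exactly one character
    regex = "".join("[^.]*" if c == "*" else ("[^.]" if c == "%" else re.escape(c)) for c in pattern)
    return re.fullmatch(regex, qualifier) is not None

def _match(pats, quals):
    if not pats:
        return not quals
    if pats[0] == "**":  # zero or more qualifiers (EGN only)
        return any(_match(pats[1:], quals[i:]) for i in range(len(quals) + 1))
    return bool(quals) and _qual_match(pats[0], quals[0]) and _match(pats[1:], quals[1:])

def covers(profile, dsn, egn=True):
    """Simplified check whether a generic DATASET profile covers a data set name."""
    pats = profile.split(".")
    if not egn:
        if "**" in pats:
            raise ValueError("'**' requires EGN")
        if pats[-1] == "*":  # without EGN a trailing '.*' covers one or more qualifiers
            pats = pats[:-1] + ["*", "**"]
    return _match(pats, dsn.split("."))
```

Running both masks over the same names shows that 'TOOLS.*.**' with EGN and 'TOOLS.*' without EGN select exactly the same data sets.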
Before allocating the catalog we need to check a few things about user catalogs on our system:
- What the naming convention for catalogs is.
- Where they are placed and whether they are managed by SMS.
- How RACF protects catalogs.
You can display all catalogs used by the system with the 'F CATALOG,REPORT,CATSTATS' command. From this list you get the naming convention, for example 'USERCAT.&sysname.*'. Overall catalog settings can be checked with the “F CATALOG,REPORT” command. Recommendations for catalogs are as follows:
- They should never expire or be migrated.
- They should be backed up on every modification.
- They should reside on a high-performance device.
By using the same HLQ and volume as existing catalogs we don't have to worry about RACF and SMS configuration. Still, to avoid mistakes we need to verify which RACF data set profile protects catalogs:
- 'SR CLASS(DATASET) MASK(USERCAT)' – finds data set profiles protecting user catalogs.
- 'LD DA('USERCAT.*.**') GEN AU' – displays profile information.
There are additional considerations regarding sysplex and catalog sharing; in this exercise we don't worry about those. JCL code for User Catalog allocation:
//JSADEK01 JOB NOTIFY=&SYSUID
//DEFCAT   EXEC PGM=IDCAMS,REGION=8M
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE USERCATALOG -
         (NAME(USERCAT.MVS1.TOOLS) -
         VOLUME(MVS001) -
         CYLINDERS(5 5) -
         FREESPACE(10 10) -
         STRNO(10) -
         BUFND(20) -
         BUFNI(20) )
At this point the catalog is useless: it's connected to the master catalog, but no data set will be recorded in it until you define an alias. It's very simple:
'DEFINE ALIAS(NAME('TOOLS') RELATE('USERCAT.MVS1.TOOLS'))'
Now we're ready to allocate the first data set with the TOOLS HLQ. SMS settings:
Management class . . : SOFTMC
Storage class  . . . : SOFTSC
Volume serial  . . . : SOFT01
TOOLS.FIRST.DATASET 15 ? 1 3390 USERCAT.MVS1.TOOLS
RACF setting: 'LD DA('TOOLS.FIRST.DATASET') GEN AU'
INFORMATION FOR DATASET TOOLS.*.** (G)
LEVEL  OWNER     UNIVERSAL ACCESS  WARNING  ERASE
-----  --------  ----------------  -------  -----
 00    TOOLS     NONE              NO       NO
AUDITING
--------
FAILURES(UPDATE)
...
In this case, the easiest way to move the data set to the desired volume is to rename it so that it's covered by the TOOLS.** HLQ and then migrate it. Its current management class, assigned while it was stored under your HLQ, allows migration. During recall SMS runs the ACS routines and places the data set on the right volume. To assemble ENQWATCH, all you need to do is specify the source and target libraries:
...
//ASMIT EXEC ASMIT,
//         LOADLIB='TOOLS.ENQWATCH.LINKLIB',
//         MEM=ENQWATCH,SOURCE='TOOLS.ENQWATCH.INSTALL'
In ENQWATCH documentation you'll find example startup procedure. It's not very complicated:
//ENQWATCH EXEC PGM=ENQWATCH
//STEPLIB  DD DISP=SHR,DSN=TOOLS.ENQWATCH.LINKLIB
The new library 'TOOLS.PROCLIB' will be used for storing all jobs and started tasks we keep under the 'TOOLS.**' prefix. If you're not familiar with PROCLIB types and operations, check the “PROCLIB concatenation” Assignment in the “JES2 & SDSF” tab. Let's suppose that 'SYS1.TEST.PROCLIB' is currently the last library in the PROC00 concatenation and we want to keep it that way, so we'll insert 'TOOLS.PROCLIB' before it. Current JES2PARM:
...
PROCLIB(PROC00) DD(7)=(DSNAME=SYS1.TEST.PROCLIB)
JES2PARM after modification:
...
PROCLIB(PROC00) DD(7)=(DSNAME=TOOLS.PROCLIB)
PROCLIB(PROC00) DD(8)=(DSNAME=SYS1.TEST.PROCLIB)
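The order of DD entries matters only when the same member exists in more than one library, because the search stops at the first library that contains it. A minimal Python sketch of that first-match search is shown below; the member lists are hypothetical and only illustrate the rule.

```python
def find_procedure(concatenation, member):
    """Return the first library in the concatenation that contains `member`, or None.
    `concatenation` is an ordered list of (dsname, set_of_members)."""
    for dsname, members in concatenation:
        if member in members:
            return dsname
    return None

# Hypothetical PROC00 contents, in concatenation order
PROC00 = [
    ("SYS1.PROCLIB", {"JES2", "HZSPROC"}),
    ("SYS1.TEST.PROCLIB", {"TESTPROC"}),
    ("TOOLS.PROCLIB", {"ENQWATCH"}),
]
```

Here `find_procedure(PROC00, "ENQWATCH")` returns 'TOOLS.PROCLIB' whichever position the library has, as long as no earlier library also holds an ENQWATCH member.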
Of course, we also need to change the concatenation dynamically. You cannot simply insert a PROCLIB at any position you like. In this case you'd have to first replace 'SYS1.TEST.PROCLIB' with 'TOOLS.PROCLIB' and then add 'SYS1.TEST.PROCLIB' as the last DD. This creates a risk that, during those few seconds, something will try to use a procedure from 'SYS1.TEST.PROCLIB'. Because of that, it's best practice to add 'TOOLS.PROCLIB' as the last DD of the running concatenation:
This operation is safe, and the search order is not important unless both libraries contain the same procedures. After IPL the setting from JES2PARM will be applied, but for now we can keep it this way.
As stated in the ENQWATCH documentation, we also need to add the library to the APF list. First let's do that dynamically:
- 'D PROG,APF' – displays the current APF list; alternatively use the 'APF' SDSF panel or the 'TSO ISRDDN APF' command.
- 'SETPROG APF,ADD,DSNAME=TOOLS.ENQWATCH.LINKLIB,SMS' – adds the library to the APF list.
As always, the second step in such activities is the PARMLIB modification:
- 'D PARMLIB' – checks which parmlib concatenation is used.
- 'D IPLINFO' – checks the IEASYSxx suffix.
- In IEASYSxx you'll find the 'PROG' initialization statement. It defines which 'PROGxx' members are used in your system. It's recommended practice to keep LNKLST, LPA and APF definitions in the PROG member concatenation:
In the case above, two PROG members are used by the system: PROGA0 and PROGEE. PROG members store commands executed during IPL, so it doesn't matter where you put the 'APF ADD' command; it only affects where the library is placed in the APF list. Of course, for easier management, APF-related commands should be kept in one place.
APF ADD DSNAME(TOOLS.ENQWATCH.LINKLIB) SMS
As you can see, the syntax of commands executed via PROGxx members is slightly different from the SETPROG command syntax, but basically it's the same thing.
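The correspondence between the two syntaxes can be written down as a tiny Python helper. It is a text-generation sketch only (the name `apf_add` is hypothetical) and covers just the two forms used for APF additions: SMS-managed libraries and libraries identified by volume serial.

```python
def apf_add(dsname, volume=None):
    """Return (SETPROG command, PROGxx statement) that add a library to the APF list.
    volume=None means the library is SMS-managed."""
    if volume is None:
        return (f"SETPROG APF,ADD,DSNAME={dsname},SMS",
                f"APF ADD DSNAME({dsname}) SMS")
    return (f"SETPROG APF,ADD,DSNAME={dsname},VOLUME={volume}",
            f"APF ADD DSNAME({dsname}) VOLUME({volume})")
```

Generating both lines from one place makes it harder for the dynamic change and the PARMLIB change to drift apart.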
If you tried to start ENQWATCH right now, you would receive an error:
ICH408I USER(STARTED2) GROUP(STARTED ) NAME(STC USER #2)
  TOOLS.ENQWATCH.LINKLIB CL(DATASET ) VOL(SOFT01)
  INSUFFICIENT ACCESS AUTHORITY
  FROM TOOLS.*.** (G)
  ACCESS INTENT(READ ) ACCESS ALLOWED(NONE )
We've created the data set profile with UACC(NONE), so the default user assigned to started tasks doesn't have authority to those data sets. Users that own started tasks should be defined with the NOPASSWORD keyword. This prevents any human from logging on with such a user ID, and because it has no password, there is no password that could expire.
'AU STCTOOLS NAME('TOOLS.** STC USER') DFLTGRP(STCGROUP) OWNER(STCGROUP) NOPASSWORD'
'STARTED' is a general resource class that connects a task's start procedure (ENQWATCH in this example) with a RACF user. The task has access to the resources available to this user. When you start the ENQWATCH procedure, RACF searches the 'STARTED' class profiles for a match. If nothing is found, it uses the default profile '**', which should be very restricted so that it has no rights to 'TOOLS.**' data sets. You can check it with the 'RL STARTED ** ALL' command.
'RDEF STARTED ENQWATCH.* UACC(NONE) STDATA(USER(STCTOOLS) GROUP(STCGROUP))'
The above command creates a resource profile in the STARTED class. The profile name is in 'member.jobname' format. Most often the job name and member name are the same, and only the member name is specified in the profile. You also need to refresh the STARTED class so that your definition is not only in the RACF database but also in the in-storage profiles kept in central storage:
'SETR RACL(STARTED) REF'
From now on, each time you start a procedure named ENQWATCH it will run under the STCTOOLS user and will have access to the resources available to that user. The last thing to do is to give the STCTOOLS user authorization to 'TOOLS.**' data sets:
'PE 'TOOLS.*.**' ID(STCTOOLS) ACCESS(ALTER)'
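The lookup described above can be illustrated with a Python sketch. It is a simplified model, not RACF's matching algorithm: it only tries the exact 'member.jobname', then 'member.*', then the catch-all '**'. The default identity STARTED2/STARTED is taken from the ICH408I message above and stands in for whatever your '**' profile assigns.

```python
# STARTED class profiles: profile name -> (user, group)
STARTED = {
    "ENQWATCH.*": ("STCTOOLS", "STCGROUP"),
    "**":         ("STARTED2", "STARTED"),  # default identity seen in ICH408I above
}

def stc_identity(member, jobname, profiles=STARTED):
    """Simplified STARTED lookup: exact member.jobname, then member.*, then '**'."""
    for candidate in (f"{member}.{jobname}", f"{member}.*", "**"):
        if candidate in profiles:
            return profiles[candidate]
    return None
```

Before the RDEF, only the '**' entry exists, which is why ENQWATCH first ran as STARTED2 and failed on 'TOOLS.*.**'.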
ENQWATCH monitors contention between jobs and users. Its purpose is to tell users that they are holding a data set needed by some job or task; it doesn't monitor other types of contention. After you hold a data set that a job wants to use, you should see the following message:
Please free 'JSADEK.MY.PROCLIB'. Other jobs are waiting to use it. (ENQWATCH)
In the ENQWATCH documentation you'll find a description of all available commands, for example:
- 'F ENQWATCH,INFO' – displays task settings.
- 'F ENQWATCH,STATS' – displays statistics from the current task run.
The only parameters you can change dynamically are:
- 'F ENQWATCH,WAIT=0030' – cycle time.
- 'F ENQWATCH,SMFOFF' – SMF recording.
After restarting the task ('F ENQWATCH,STOP' and then starting it again) you'll see that the default values are used again. If you want to change the settings permanently you need to recompile ENQWATCH with the desired parameters.
Setting up Health Checker
Health Checker is a simple tool whose sole purpose is to verify various aspects of system settings and warn us if the configuration deviates from recommended values. Health Checker consists of two main parts:
- HZSPROC, which is a framework for check management and control.
- Checks, which are small programs that verify whether a specific system setting matches the recommended value.
A set of checks is shipped with z/OS. More are installed with updates and software products. You can also write your own checks. In this assignment you'll learn how to configure and start Health Checker on a z/OS system.
1. Customize the Health Checker startup procedure:
- Create the startup procedure.
- Allocate the HZSPDATA data set.
2. Customize PARMLIBs:
- Create an HZSPRMxx member.
- Update IEASYSxx if needed.
- Ensure that Health Checker is started during IPL.
3. Customize the RACF environment:
- Define a user for Health Checker.
- Create a STARTED class profile.
- Define a separate data set profile for the HZSPDATA data set.
- Add UNIX superuser authority for the HZS user.
- Ensure that the HZS user has access to the HZSPRMxx member.
- Check whether the EZB.STACKACCESS.* resource in the SERVAUTH class is defined. If yes, add appropriate access to it for the HZS user.
- Define appropriate HZS.* profiles in the XFACILIT class and add appropriate access rights to them.
4. Start Health Checker:
- Verify that the HZS user was correctly assigned to HZSPROC.
- Enter the CK SDSF panel to see predefined checks and their status.
- What are possible check States?
- What are possible check Statuses?
- What's stored in the Result column?
- What's stored in the Global and GlobalSys columns?
- What's stored in the ExcCount and RunCount columns?
- What's stored in the Severity and WTOType columns?
- What's stored in the WTOType and WTONum columns?
- Select a check that ended with EXCEPTION and view its output.
- Stop Health Checker.
5. Set up and test the HZSPRINT utility.
6. Set up the environment for user-written checks:
- We'll need two libraries: one for REXX scripts and one for load modules used by the scripts.
- Check the SMS environment in order to select an appropriate HLQ for those data sets.
- Create RACF definitions for the libraries (if needed).
- Allocate the libraries and verify that they're correctly protected by RACF.
- Add '*.LOADLIB' to LINKLIST and APF.
- Add '*.REXX' to the System REXX library concatenation. Alternatively you can use the default System REXX library.
7. Test the sample checks available in SYS1.SAMPLIB:
- Two assembler-based checks: HZS_SAMPLE_ONE_TIME, HZS_SAMPLE_INTERVAL.
- Two REXX-based checks: HZS_SAMPLE_REXXIN_CHECK, HZS_SAMPLE_REXXTSO_CHECK.
- Copy the REXX scripts into a library in the System REXX concatenation.
- Allocate the data sets needed by the REXX scripts.
- Compile and link the message table used by all four checks into the '*.LOADLIB' created in Task#6.
- Compile and link the assembler-based checks into the '*.LOADLIB' created in Task#6.
- Copy the ADDREP statements to HZSPRMxx and activate the settings.
- Test the checks.
Check "Customizing the IBM Health Checker for z/OS procedure" chapter in "Health Checker for z/OS: User's Guide". For more information about Health Checker you can also see "Exploiting the IBM Health Checker for z/OS Infrastructure" RedBook.
Check "Create HZSPRMxx parmlib members" in "Health Checker for z/OS: User's Guide".
Detailed description of all needed activities is in "Setting up security for the IBM Health Checker for z/OS started task" chapter of "Health Checker for z/OS: User's Guide". It's very possible that you already have some or all needed definitions on your system. Check it before creating new ones.
For REXX basics see the 'REXX' tab on this website. "REXX Reference" and "REXX User's Guide" are the two basic documents describing this language in the z/OS environment. For APF and LINKLIST operations consult:
- "ABCs of z/OS System Programming Volume 2" chapter "LPA, LNKLST, and authorized libraries"
- "MVS Initialization and Tuning Reference" chapter "Chapter 78. PROGxx (authorized program list, exits, LNKLST sets and LPA)"
- "MVS System Commands" chapter "SETPROG command".
All you need for this task including compilation jobs is available in "3.2 Services available to check routines" chapter of "Exploiting the IBM Health Checker for z/OS Infrastructure" RedBook.
The sample procedure is stored in SYS1.SAMPLIB(HZSPROC). All you need to do is copy it to your standard PROCLIB concatenation, specifically to SYS1.PROCLIB; check the HZSPROC parameter of IEASYSxx to see why it must be in this particular library. The Health Checker procedure executes the HZSINIT program. You can use ISRDDN to quickly confirm that it's in your LINKLIST concatenation. The HZSPDATA data set is optional, but some checks use it to store their data between Health Checker restarts. A sample job that allocates this data set is stored in SYS1.SAMPLIB(HZSALLCP). All you need to do is allocate it with the desired name and point to it in the Health Checker start procedure.
HZSPRM00 is shipped with z/OS and should be available in SYS1.PARMLIB. There is nothing we need to change there right now. This member is only used to modify default check settings and for policy definitions. IEASYSxx changes are optional and depend on how you set up your startup procedure. “HZSPRM='SYSPARM'” means that the HZSPRMxx suffix is specified in IEASYSxx, but you can also specify it directly in the Health Checker startup procedure, or use “HZSPRM='PREV'”, which means that HZSPROC takes the same settings that were used during its previous run. Also, if you are using a procedure name other than HZSPROC, you'll need to specify it in the HZSPROC parameter of the IEASYSxx member. The system automatically searches SYS1.PROCLIB for this procedure.
First, it's a good idea to check whether any RACF profiles are already defined for HZSPROC. You can check it with these two commands:
'SR CLASS(STARTED) MASK(* HZS)'
'SR CLASS(USER) MASK(* HZS)'
If nothing is found, then Health Checker is not configured on your system.
1. Add the user under which Health Checker will run:
'AU HZSUSER NAME('HEALTH CHECKER STC USER') DFLTGRP(STCGROUP) OWNER(STCGROUP) NOPASSWORD'
HZSUSER will use an OMVS UID. To access OMVS, the DFLTGRP of this user must have a GID specified in its OMVS segment.
2. Create the STARTED class profile:
'RDEF STARTED HZSPROC.* UACC(NONE) STDATA(USER(HZSUSER) GROUP(STCGROUP))'
'SETR REF RACL(STARTED)'
3. Create a separate profile for the HZSPDATA data set:
Currently the data set is covered by the SYS1 profile, so it's protected the same way as other SYS1 data sets. To protect it appropriately, we'll create a separate definition for it:
'ADDSD 'SYS1.MVSA.HZSPDATA' OWNER(DATASET) UACC(NONE) GEN'
The profile name does not contain any generic characters such as '*' or '**', so by default a discrete profile would be created. It's not recommended to have any discrete profiles in the RACF database. To make the profile generic without using generic characters, use the 'GEN' keyword.
4. Add appropriate access rights to the HZSPDATA data set:
'PE 'SYS1.MVSA.HZSPDATA' GEN ID(SYSPROG) ACC(ALTER)'
'PE 'SYS1.MVSA.HZSPDATA' GEN ID(HZSUSER) ACC(UPDATE)'
'PE 'SYS1.MVSA.HZSPDATA' GEN ID(JSADEK) DELETE'
If the 'ADDCREATOR' option is active, you'll be added to the access list automatically. In that case remove yourself, according to the rule that human users should never be on ACLs.
5. Give the Health Checker user UNIX superuser authority:
'ALTUSER HZSUSER OMVS(UID(4000) HOME('/') PROGRAM('/bin/sh'))'
'CONNECT HZSUSER GROUP(OMVSGRP)'
'PERMIT BPX.SUPERUSER CLASS(FACILITY) ID(HZSUSER) ACCESS(READ)'
'SETR REF RACLIST(FACILITY)'
Before assigning a specific UID to a user, check that it's free with the 'RLIST UNIXMAP * ALL' command.
6. Verify that HZSUSER has access to the PARMLIBs:
'LD DA('SYS1.PARMLIB') GEN AU'
It's recommended that PARMLIBs have UACC(READ), so if your system follows this convention, HZSUSER will have the appropriate access.
7. Check whether the EZB.STACKACCESS.* resource in the SERVAUTH class is defined:
'SR CLASS(SERVAUTH) NOMASK'
This command displays all profiles in the SERVAUTH class. If such a profile exists, you have to permit HZSUSER to it.
This is pretty much it. Remember that checks are simply programs and some of them may require additional authorizations.
8. Define the HZS.* profile in the XFACILIT class.
There are a few ways in which you can protect access to checks, but since checks are not a critical system feature, we'll go with the easiest way and create one profile that protects access to all of them. First, let's check whether the XFACILIT class is RACLISTed:
'SETR LIST'
If XFACILIT is not RACLISTed, you'll also need to issue:
'SETROPTS CLASSACT(XFACILIT)'
'SETROPTS RACLIST(XFACILIT)'
Now we can create the Health Checker profile:
'RDEF XFACILIT HZS.MVSA.* UACC(NONE)'
'SETR REF RACL(XFACILIT)'
And, of course, add appropriate access rights for user groups, for example:
'PE HZS.MVSA.* CLASS(XFACILIT) ID(SYSPROG) ACCESS(CONTROL)'
'PE HZS.MVSA.* CLASS(XFACILIT) ID(SECURITY) ACCESS(READ)'
'PE HZS.MVSA.* CLASS(XFACILIT) ID(STORAGE) ACCESS(READ)'
'PE HZS.MVSA.* CLASS(XFACILIT) ID(DBDC) ACCESS(READ)'
'PE HZS.MVSA.* CLASS(XFACILIT) ID(OPER) ACCESS(READ)'
'SETR REF RACL(XFACILIT)'
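The PERMIT commands above follow a simple pattern (one group/access pair per command), so they can also be generated. The REXX sketch below only builds and displays the commands while DRYRUN = 1; issuing them with ADDRESS TSO assumes the exec runs in a TSO/E environment under a user with authority over the HZS.MVSA.* profile. The group names and access levels are the ones from the example above; adjust them to your installation.

```rexx
/* REXX - build (and optionally issue) XFACILIT PERMIT commands     */
PROFILE = 'HZS.MVSA.*'
GRANTS  = 'SYSPROG:CONTROL SECURITY:READ STORAGE:READ DBDC:READ OPER:READ'
DRYRUN  = 1                   /* 1 = only display, 0 = issue via TSO */

CMD.0 = WORDS(GRANTS)
DO I = 1 TO CMD.0
  PARSE VALUE WORD(GRANTS, I) WITH GRP ':' ACC
  CMD.I = 'PERMIT' PROFILE 'CLASS(XFACILIT) ID('GRP') ACCESS('ACC')'
  IF DRYRUN THEN SAY CMD.I
  ELSE DO
    ADDRESS TSO CMD.I         /* needs TSO/E and RACF authority      */
    IF RC <> 0 THEN SAY 'RC='RC 'FOR:' CMD.I
  END
END
/* Refresh the in-storage profiles only when commands were issued   */
IF \DRYRUN THEN ADDRESS TSO 'SETROPTS RACLIST(XFACILIT) REFRESH'
```

Keeping the pairs in one string makes it easy to review the intended ACL before switching DRYRUN to 0.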
Health Checker is started and stopped with the standard commands 'S HZSPROC' and 'P HZSPROC'. SDSF provides the CK panel, where you can conveniently view the status of all checks installed on the system.
- What are the possible check states?
A check state is made up of two parts, for example 'ACTIVE(ENABLED)'. The first part is the user-controlled state:
- ACTIVE – by default, all checks are active.
- INACTIVE – if the system programmer determines that a check is not applicable to the system configuration, or simply not needed, he can deactivate it. New checks can also be added in the inactive state from the beginning.
The second part is the status controlled by Health Checker:
- ENABLED – the check is working correctly. Note that INACTIVE checks can also have this status.
- DISABLED – the check encountered a problem and was automatically turned off. The check itself could have ended in error, or a value passed to it could be incorrect. Also, if a check verifies a setting that's not applicable on that system, Health Checker can put the check in DISABLED status.
- GLOBAL – the check verifies a sysplex-wide setting. If there are three systems A, B and C, such a check runs on only one of them (A); on the other two LPARs (B and C) it's marked as DISABLED.
- What are the possible check statuses?
The full list of statuses is described in the HZS0200I message documentation. The ones worth mentioning here are:
- ENV N/A – if the check verifies a setting that's not used on the system, it's marked as not applicable.
- SUCCESSFUL – the check verified that the system setting matches the recommended values.
- EXCEPTION-sev – the check ran successfully but the system setting doesn't match the recommended values. This setting should be investigated and most likely corrected. 'sev' indicates how important the setting is.
- What's stored in the Result column?
This is the return code from the check program. All codes other than 0 need to be investigated. There are two possibilities:
- The check encountered some kind of error. Perhaps the parameter passed to it was wrong, or RACF denied access to some resource.
- The check ended successfully but detected that a system setting differs from the recommended value. In this case the return codes match the check severity (LOW = RC4, MEDIUM = RC8, HIGH = RC12).
- What's stored in the Global and GlobalSys columns?
These two columns describe sysplex-wide checks. The GlobalSys column shows on which system in the sysplex the check was performed.
- What's stored in the ExcCount and RunCount columns?
ExcCount specifies the number of exceptions detected during the last check run. A single check can detect more than one deviation in the system configuration; the number of such deviations is shown here. RunCount stores how many times the check was executed.
- What's stored in the Severity and WTOType columns?
These columns define the check's importance and the message that is issued when the check ends with an exception:
- HZS0001I – WTO INFO
- HZS0002E – WTO EVENTUAL
- *HZS0003E – WTO CRITICAL
You can set up your automation software to handle only HIGH and MEDIUM messages while LOW severity messages are ignored. To display outstanding action messages on your system manually, use the 'D R,L' command.
- Select a check that ended with EXCEPTION and view its output.
You can open check output with the 'S' action character in SDSF. There you'll see both the check description and the result of the last check execution. For example, the CSV_APF_EXISTS check verifies that there are no problems with any data set in the APF list, e.g. that none of them has been migrated or removed for some reason.
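As a sketch of the automation idea mentioned above, the REXX fragment below sorts Health Checker exception WTOs by message ID and ignores the informational HZS0001I messages. It assumes each check uses the default WTO type for its severity (LOW to HZS0001I, MEDIUM to HZS0002E, HIGH to HZS0003E); a check whose WTOType was changed would be classified by its message ID rather than by its real severity. The console lines and their "msgid CHECK(owner,name):" layout are illustrative only; the return codes are the ones listed above.

```rexx
/* REXX - sort Health Checker exception WTOs for automation         */
IDS  = 'HZS0001I HZS0002E HZS0003E'  /* default WTO per severity...  */
SEVS = 'LOW MEDIUM HIGH'             /* ...in the same order         */
RCS  = '4 8 12'                      /* check RC for each severity   */

/* Made-up console lines, for illustration only */
LINE.1 = 'HZS0001I CHECK(IBMSAMPLE,HZS_SAMPLE_ONE_TIME): sample text'
LINE.2 = 'HZS0002E CHECK(IBMCSV,CSV_APF_EXISTS): sample text'
LINE.3 = 'XYZ001I SOME UNRELATED MESSAGE'
LINE.0 = 3

ACT.0 = 0
DO I = 1 TO LINE.0
  PARSE VAR LINE.I MSGID . 'CHECK(' OWNER ',' NAME ')' .
  N = WORDPOS(MSGID, IDS)
  IF N = 0 THEN ITERATE                   /* not a check exception  */
  IF WORD(SEVS, N) = 'LOW' THEN ITERATE   /* LOW is ignored         */
  K = ACT.0 + 1
  ACT.K = OWNER','NAME WORD(SEVS, N) 'RC='WORD(RCS, N)
  ACT.0 = K
  SAY 'ACTION NEEDED:' ACT.K
END
```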
The HZSPRINT utility is fairly simple to use. A sample procedure is stored in the 'SYS1.SAMPLIB(HZSPRINT)' data set. To use the HZSPRINT utility you need access to the HZS.* profile in the XFACILIT class. We've already given users the appropriate access so they can view and manage checks through SDSF. If you want to run this utility via TWS or other scheduling software, the job owner must also have READ access to this resource. The output produced by this utility is pretty much the same as shown in SDSF with the 'S' action character. You can use this utility to create periodic reports from specific checks, for example all checks that ended with EXCEPTION status, as in the example below:
//HZSPRINT EXEC PGM=HZSPRNT,TIME=1440,REGION=0M,
//  PARM=('CHECK(*,*)','EXCEPTIONS')
//SYSOUT   DD DISP=(NEW,CATLG),DSN=&SYSUID..HZS.EXCEPTNS,
//  SPACE=(TRK,(1,1)),LRECL=256,BLKSIZE=27904
To run REXX you need the REXX alternate library in your LINKLIST and LPA concatenations. Simply check whether *.SEAGALT and *.SEAGLPA are in those concatenations; they should be there. The next thing to check is whether System REXX is set up on your system. This is also a standard z/OS configuration task and there shouldn't be any problems with it. The System REXX address space is called AXR.
We'll use two libraries for user checks, named 'SYS1.MVSA.HZS.REXX' and 'SYS1.MVSA.HZS.LOADLIB'. HZSPROC (running as HZSUSER) will need READ access to them. The access list for those data sets can be the same as for the HZSPDATA data set, so we'll use it as a model:
'AD 'SYS1.MVSA.HZS.**' FGENERIC FROM('SYS1.MVSA.HZSPDATA')'
Now HZSUSER will have UPDATE access to them; it needs only READ, but UPDATE is also fine in this case. If the ADDCREATOR option is set in your configuration, your user ID will be added to the ACL with ALTER access; remove it from there:
'PE 'SYS1.MVSA.HZS.**' GEN ID(JSADEK) DELETE'
Alternatively, you can turn this option off with 'SETR NOADDCREATOR', but this changes a general RACF setting and should be considered very carefully.
To execute user checks, you must include their load library in both the APF list and the LINKLIST concatenation:
'D PROG,LNKLST'
'SETPROG LNKLST,DEFINE,NAME=LNKLST01,COPYFROM=CURRENT'
'SETPROG LNKLST,ADD,NAME=LNKLST01,DSNAME=SYS1.MVSA.HZS.LOADLIB'
'SETPROG LNKLST,ACTIVATE,NAME=LNKLST01'
'D PROG,LNKLST'
'F LLA,REFRESH'
'SETPROG APF,ADD,DSNAME=SYS1.MVSA.HZS.LOADLIB,SMS'
'D PROG,APF'
The last thing to do is to add 'SYS1.MVSA.HZS.REXX' to the System REXX library concatenation. You can do that via the AXRxx PARMLIB member:
'REXXLIB ADD DSN(SYS1.MVSA.HZS.REXX)'
System REXX searches for scripts in this order:
- Libraries defined by the REXXLIB statement in the AXRxx member.
- The default System REXX library, SYS1.SAXREXEC.
Unfortunately, you cannot change this setting dynamically, so to add your library to the System REXX concatenation the AXR address space must be restarted, which is not recommended during normal system processing.
If it's a system where you know you can do that, the restart is done with the FORCE command:
- Stop: 'FORCE AXR,ARM'
- Start: 'S AXRPSTRT'
Alternatively, you can simply copy REXX checks to SYS1.SAXREXEC and avoid restarting System REXX. You can check the System REXX concatenation with the 'F AXR,SYSREXX REXXLIB' command.
Note: On older versions of z/OS, System REXX may not support user-defined libraries. In that case, simply go with SYS1.SAXREXEC.
Health Checker comes with a few sample checks which you can use to get familiar with adding and writing user-created checks. They are stored in SYS1.SAMPLIB.
First, let's set up the REXX checks. The HZSSXCHK member contains two REXX-based checks: HZS_SAMPLE_REXXIN_CHECK and HZS_SAMPLE_REXXTSO_CHECK. To be able to use them we need to do three things:
- Copy the member from SYS1.SAMPLIB to the HZS '*.REXX' library.
- Allocate the data sets used by those checks.
- Compile the message table used by the checks.
These REXX checks use two special data sets:
- HZS_SAMPLE_REXXIN_CHECK – rexxhlq.execname.REXXIN.Eentrycode – for passing parameters to the REXX routine.
- HZS_SAMPLE_REXXTSO_CHECK – the name specified in the PARMS keyword – for storing the output of SAY statements. This is done only if the “DEBUG=ON” option is used.
In this example we'll use the TOOLS HLQ: TOOLS.HZSSXCHK.REXXIN.E1 & TOOLS.HZSSXCHK.DATA
REXX checks are executed in the Health Checker address space and therefore have the same authorizations as Health Checker. All resources needed by REXX scripts must be available to HZSUSER:
'PE 'TOOLS.*.**' GEN ID(HZSUSER) ACC(UPDATE)'
JCL for compiling the message table is available in the “3.2.6 Create your first check routine” chapter of the "Exploiting the IBM Health Checker for z/OS Infrastructure" Redbook. All we need to do is to supply our '*.LOADLIB' in the MSGTLOAD symbolic variable. The same message table is used by the assembler checks, so we don't have to worry about it later. The chapter mentioned above also contains JCL for compiling those assembler checks. Just like before, supply your '*.LOADLIB' there and submit the job. Now you should have two modules in your LOADLIB: HZSMSGTB & HZSSCHKR.
At this point we have everything in place:
- The REXX script is in the System REXX concatenation.
- The message table & assembler checks are in a LOADLIB that's added to LINKLIST and APF.
- The data sets needed by the REXX checks are allocated.
- HZSUSER has access to all those resources.
So the last thing to do is to define the checks to Health Checker.
Sample ADDREP statements are available in “3.2.1 Registration (Add) services” of the "Exploiting the IBM Health Checker for z/OS Infrastructure" Redbook. All we need to do is to tailor them to our configuration.
Assembler checks:
ADDREP CHECK(IBMSAMPLE,HZS_SAMPLE_ONE_TIME)
  CHECKROUTINE(HZSSCHKR)
  MESSAGETABLE(HZSMSGTB)
  ACTIVE
  ENTRYCODE(1)
  DATE(20170530)
  REASON('A sample health check to demonstrate a one time health check')
  PARMS('LIMIT(047)')
  SEVERITY(LOW)
  INTERVAL(ONETIME)
  USS(NO)
/*------------------------------------------------------------------*/
ADDREP CHECK(IBMSAMPLE,HZS_SAMPLE_INTERVAL)
  CHECKROUTINE(HZSSCHKR)
  MESSAGETABLE(HZSMSGTB)
  ACTIVE
  ENTRYCODE(2)
  DATE(20170530)
  REASON('A sample check to demonstrate an interval check')
  SEVERITY(LOW)
  INTERVAL(00:05)
  USS(NO)
ADDREP CHECK(IBMSAMPLE,HZS_SAMPLE_REXXIN_CHECK)
  EXEC(HZSSXCHK)
  REXXHLQ(TOOLS)
  REXXTSO(NO)
  REXXIN(YES)
  MSGTBL(HZSMSGTB)
  ENTRYCODE(1)
  USS(NO)
  VERBOSE(NO)
  PARMS('LIMIT(047)')
  SEVERITY(LOW)
  INTERVAL(ONETIME)
  DATE(20170530)
  REASON('A sample check to demonstrate an ',
         'exec check using REXXIN.')
/*------------------------------------------------------------------*/
ADDREP CHECK(IBMSAMPLE,HZS_SAMPLE_REXXTSO_CHECK)
  EXEC(HZSSXCHK)
  REXXHLQ(TOOLS)
  REXXTSO(YES)
  REXXIN(NO)
  MSGTBL(HZSMSGTB)
  ENTRYCODE(2)
  USS(NO)
  VERBOSE(NO)
  PARMS('DSN(TOOLS.HZSSXCHK.DATA)')
  SEVERITY(LOW)
  INTERVAL(00:05)
  EINTERVAL(SYSTEM)
  DATE(20170530)
  REASON('A sample check to demonstrate an ',
         'exec check using TSO services.')
You can apply the new Health Checker settings by issuing the command 'F HZSPROC,ADD,PARMLIB=(01)', where '01' is the suffix of the HZSPRMxx member. But before doing that, you'll need to issue the 'F LLA,REFRESH' command again. When '*.LOADLIB' was added to LINKLIST, it didn't yet contain the compiled message table and assembler check modules, so those modules are not present in the LLA directory; issuing 'F LLA,REFRESH' rebuilds the directory and makes our modules visible. If there are no errors in your ADDREP statements, you should now see your checks in the SDSF CK panel:
HZS_SAMPLE_INTERVAL       IBMSAMPLE  ACTIVE(ENABLED)   SUCCESSFUL
HZS_SAMPLE_ONE_TIME       IBMSAMPLE  ACTIVE(ENABLED)   EXCEPTION-LOW
HZS_SAMPLE_REXXIN_CHECK   IBMSAMPLE  ACTIVE(DISABLED)  UNEXPECTED ERROR
HZS_SAMPLE_REXXTSO_CHECK  IBMSAMPLE  ACTIVE(ENABLED)   SUCCESSFUL
The HZS_SAMPLE_REXXIN_CHECK error '00000001_FFFFFFFD' is expected, as described in the "Exploiting the IBM Health Checker for z/OS Infrastructure" Redbook.
Working with Health Checks
All health checks can be modified by the system programmer. You can disable any of them and change how they're executed, for example their severity or the time when they run; you can even supply different input parameters to check programs. In “Chapter 13. IBM Health Checker for z/OS checks” of the "Health Checker for z/OS: User's Guide" document you can find a detailed description of all IBM-supplied checks along with their possible modifications. The usual way of making such modifications is by using policies. In this assignment you'll learn how to work with policies.
1. Display the following information in the SDSF 'CK' panel:
- General status of Health Checker.
- Full information about a selected check definition.
- Policy of a check.
- Open check output in the ISPF Editor.
2. Dynamically modify the CSV_APF_EXISTS check using a command:
- Change WTOType to Eventual.
- Make it run every three minutes.
- Wait until it runs to verify your changes.
- Refresh the check to go back to the original values.
- Deactivate the check.
- Activate the check.
3. Select any check with EXCEPTION status, fix the condition and rerun the check.
4. Create two HZSPROC policies:
- Create a policy for the four IBMSAMPLE checks.
- This policy should have four statements, one for each check.
- Those statements should remove the specific IBMSAMPLE checks.
- Add a second policy for CSV_APF_EXISTS that modifies the check so it issues an Eventual WTOType and is executed every 30 minutes. The check should also ignore all migrated data sets.
- Activate the new HZSPRMxx member.
5. Perform the following actions on the policies defined in Task #4:
- Activate the CSV_APF_EXISTS modification.
- Restore the original CSV_APF_EXISTS check settings.
- Activate the IBMSAMPLE policy.
- Modify the policy so only the assembler checks are removed. REXX checks should keep working.
- Modify the HZSPRMxx member so the CSV_APF_EXISTS modification is used automatically after HZSPROC startup.
Check SDSF built-in help and "Cheat sheet: examples of MODIFY hzsproc commands" chapter in "Health Checker for z/OS: User's Guide".
Check "Syntax and parameters for HZSPRMxx and MODIFY hzsproc" in "Health Checker for z/OS: User's Guide". For example policy statements, see "Chapter 13. IBM Health Checker for z/OS checks" of "Health Checker for z/OS: User's Guide".
General status of Health Checker.
The 'DS' action character issued against any check displays the status of Health Checker and all defined checks. The 'F HZSPROC,DISPLAY,STATUS' command displays the same information.
Full information about a selected check definition.
The 'DL' action character displays the full check definition.
Policy of a check.
'DP' displays the check's policy. By default the HZSPRMxx member is empty, so it doesn't contain any policy statements. A policy is simply a permanent check modification stored in HZSPRMxx.
Open check output in ISPF Editor.
You can open check output with the 'SE' action character. It creates an ISPF Editor edit session where you can edit the check output in any way you like. Your changes will not be saved in the check output, but you can copy the output with standard ISPF Editor commands.
'F HZSPROC,UPDATE,CHECK=(IBMCSV,CSV_APF_EXISTS),WTOTYPE=EVENTUAL,INTERVAL=00:03' is the command we're looking for. If this check reports an exception, you'll notice that the HZS0002E message is now issued instead of HZS0001I. The 'E' action character or the 'F HZSPROC,REFRESH,CHECK=(IBMCSV,CSV_APF_EXISTS)' command refreshes the check, which means that its default settings are restored. All dynamic changes are retained only while HZSPROC is running; after the task is recycled, all checks are refreshed. All parameters that can be changed dynamically can also be modified directly in the CK panel by overtyping the fields shown in green. Deactivating a check ('H' action character) puts it in the 'INACTIVE(ENABLED)' state, in which it won't run. You can reactivate it with the 'A' action character.
Let's fix the ASM_NUMBER_LOCAL_DATASETS check. It verifies that there are at least three local page data sets on the system.
ILRH0101E Number of local page data sets is below recommended value Explanation: The number of usable local page data sets is 1 (usable meaning not marked 'bad' and not currently in a drained state). This is below the recommended minimum number of 3.
First we need to find a spare volume for them and reinitialize it:
//VARYOFF  EXEC PGM=SDSF
//SYSPRINT DD SYSOUT=*
//ISFOUT   DD SYSOUT=*
//ISFIN    DD *
/V 0AB8,OFFLINE
//*---------------------------------------------------------------------
//REINIT   EXEC PGM=ICKDSF,PARM=NOREPLYU,REGION=6M
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  INIT VFY(XX0AB8) UNIT(0AB8) VOLID(PAGE01) VTOC(1,0,90) -
       INDEX(0,1,14) NODS
//*---------------------------------------------------------------------
//VARYON   EXEC PGM=SDSF
//SYSPRINT DD SYSOUT=*
//ISFOUT   DD SYSOUT=*
//ISFIN    DD *
/V 0AB8,ONLINE
Now we have a brand new 3390-3 with 3332 cylinders free. We'll use that space to allocate three page data sets of 1110 cylinders each. You can check the “Page Data Sets” assignment for more information about them and how to allocate them via batch in the standard way. Alternatively, you can use the IDCAMS DEFINE command from TSO:
“TSO DEFINE PAGESPACE (NAME('SYS1.LOCAL.PAGE1') CYLINDERS(1110) MODEL('SYS1.LOCAL.PAGE') VOLUME(PAGE01))”
where 'SYS1.LOCAL.PAGE' is the local page data set currently in use. Repeat the command for SYS1.LOCAL.PAGE2 and SYS1.LOCAL.PAGE3. Now let's add them to the system and remove the old one:
'PA PAGE=SYS1.LOCAL.PAGE1'
'PA PAGE=SYS1.LOCAL.PAGE2'
'PA PAGE=SYS1.LOCAL.PAGE3'
'D ASM'
'PD DELETE,PAGE=SYS1.LOCAL.PAGE'
'D ASM'
Now we have three new, bigger page data sets. The old one will be kept as a spare in case of a space shortage. Now let's rerun the ASM_NUMBER_LOCAL_DATASETS check:
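The 3332 free cylinders follow from the ICKDSF INIT parameters above, assuming a 3390-3 with 3339 cylinders of 15 tracks each: cylinder 0 holds the volume label and the index (INDEX(0,1,14)), and the VTOC (VTOC(1,0,90)) occupies 90 tracks starting at cylinder 1.

```latex
\begin{aligned}
\text{cylinders used} &= \underbrace{1}_{\text{cyl 0: label, INDEX}} + \underbrace{90/15}_{\text{VTOC}} = 7\\
\text{free cylinders} &= 3339 - 7 = 3332\\
\text{per page data set} &= \lfloor 3332/3 \rfloor = 1110, \qquad 3332 - 3 \cdot 1110 = 2 \text{ cylinders left unused}
\end{aligned}
```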
ILRH0100I The number of usable local page data sets is 3. This is at or above the recommended minimum number of 3.
The last thing to do is to modify IEASYSxx member so new Page Data Sets are still used after IPL:
PAGE=(SYS1.PLPA.PAGE, SYS1.COMMON.PAGE, SYS1.LOCAL.PAGE1, SYS1.LOCAL.PAGE2, SYS1.LOCAL.PAGE3,L),
Note: In a production environment each page data set should be on a different volume to avoid a single point of failure. High-performance DASD should also be selected.
As you can see in the "Syntax and parameters for HZSPRMxx and MODIFY hzsproc" chapter, HZSPRMxx can store the same statements that can be issued with the “F HZSPROC,...” command. A policy is a set of statements that modify checks in some way. Here are a few examples:
- There is a set of non-critical checks that should be run on demand only. Even if a check is ONETIME only, it's still executed when HZSPROC is recycled. Adding such a policy allows you to run those checks only when you really want to: you activate the policy, analyze the checks and remove it.
- You have different sets of checks for different times of the day. You may have checks that run during production hours (8-16) and must be successful, so they generate a high-severity alert. Later at night they end in exception, but it doesn't matter at that time. You can set up automatic commands that switch such a policy on and off so the checks don't generate unnecessary alerts.
Policy definitions:
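The time-of-day example can be sketched in REXX: the fragment below picks the policy that should be active for the current hour and builds the matching MODIFY command. DAY_CHECKS and NIGHT_CHECKS are hypothetical policy names, the 8-16 window is the one from the example above, and how the command is finally issued (automation product, scheduler, console) is left out. As described further down, checks already modified by the previous policy may also need a refresh.

```rexx
/* REXX - pick the Health Checker policy for the current hour       */
/* DAY_CHECKS and NIGHT_CHECKS are hypothetical policy names.        */
HOUR = LEFT(TIME('N'), 2) + 0         /* 'hh:mm:ss' -> hour number   */
IF HOUR >= 8 & HOUR < 16 THEN POLICY = 'DAY_CHECKS'   /* 8-16 window */
ELSE POLICY = 'NIGHT_CHECKS'
CMD = 'F HZSPROC,ACTIVATE,POLICY='POLICY
SAY CMD
```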
/*------------------------------------------------------------------*/
/* 4 IBMSAMPLE POLICY                                               */
/*------------------------------------------------------------------*/
ADDREPLACE POLICY(TEST_CHECKS) STMT(SAMP_DEL_ASSB1)
  DELETE CHECK(IBMSAMPLE,HZS_SAMPLE_ONE_TIME)
  REASON('SAMPLE CHECK, NOT NEEDED')
  DATE(20170602)
ADDREPLACE POLICY(TEST_CHECKS) STMT(SAMP_DEL_ASSB2)
  DELETE CHECK(IBMSAMPLE,HZS_SAMPLE_INTERVAL)
  REASON('SAMPLE CHECK, NOT NEEDED')
  DATE(20170602)
ADDREPLACE POLICY(TEST_CHECKS) STMT(SAMP_DEL_REXX1)
  DELETE CHECK(IBMSAMPLE,HZS_SAMPLE_REXXIN_CHECK)
  REASON('SAMPLE CHECK, NOT NEEDED')
  DATE(20170602)
ADDREPLACE POLICY(TEST_CHECKS) STMT(SAMP_DEL_REXX2)
  DELETE CHECK(IBMSAMPLE,HZS_SAMPLE_REXXTSO_CHECK)
  REASON('SAMPLE CHECK, NOT NEEDED')
  DATE(20170602)
/*------------------------------------------------------------------*/
/* CSV_APF_EXISTS POLICY                                            */
/*------------------------------------------------------------------*/
ADDREPLACE POLICY(CHECK_MODS) STMT(APF_EXISTS_UPDT)
  UPDATE CHECK(IBMCSV,CSV_APF_EXISTS)
  PARM('MIGRATEDOK(YES)')
  INTERVAL(00:30)
  WTOTYPE(EVENTUAL)
  REASON('CHECK CUSTOMIZATION')
  DATE(20170602)
Before activating the new settings, it's a good idea to check the syntax of your statements:
'F HZSPROC,ADD,PARMLIB=(00,CHECK)'
You can activate the new settings with the 'F HZSPROC,REPLACE,PARMLIB=(00)' command. The difference between the ADD and REPLACE options is that with ADD, the member's statements are read and appended to the current Health Checker definitions, while REPLACE removes the previous HZSPRMxx statements and replaces them with the member you specified.
Here are a few considerations when working with policies:
- There can be only one policy active at a time.
- The active policy is applied each time you refresh a check, and also when you use the 'F HZSPROC,ADD,PARMLIB=(00)' command.
- Having a policy defined in HZSPRMxx does not activate it automatically.
- Removing a policy does not restore the previous settings. If that's your goal, you need to remove the policy and then refresh the checks modified by it.
A few useful commands:
- 'F HZSPROC,DISPLAY,POLICY' – displays the currently active policy.
- 'F HZSPROC,DISPLAY,POLICIES' – displays all defined policies.
- 'F HZSPROC,REFRESH,CHECK=(IBMCSV,CSV_APF_EXISTS)' – refreshes a specific check. You can also do that with the 'E' action character in SDSF.
- 'F HZSPROC,REMOVE,POLICY=CHECK_MODS,STATEMENT=APF_EXISTS_UPDT' – removes a policy statement from HZSPROC. This means that it's no longer available for use. If you need it again, you have to redefine it, for example with the 'F HZSPROC,ADD,PARMLIB=(00)' command.
Activate CSV_APF_EXISTS modification.
- 'F HZSPROC,ACTIVATE,POLICY=CHECK_MODS'
Restore original CSV_APF_EXISTS check setting.
At this point the CHECK_MODS policy is in use. This means that whether you refresh the entire configuration:
- 'F HZSPROC,REPLACE,PARMLIB=(00)'
or just specific checks:
- 'F HZSPROC,REFRESH,CHECK=(IBMCSV,CSV_APF_EXISTS)'
the policy will be reapplied after the refresh. To restore the original check settings, you need to either activate another policy and refresh the check:
- 'F HZSPROC,ACTIVATE,POLICY=TEST_CHECKS'
- 'F HZSPROC,REFRESH,CHECK=(IBMCSV,CSV_APF_EXISTS)'
or remove the statement completely and refresh the check:
- 'F HZSPROC,REMOVE,POLICY=CHECK_MODS,STATEMENT=APF_EXISTS_UPDT'
- 'F HZSPROC,REFRESH,CHECK=(IBMCSV,CSV_APF_EXISTS)'
Note that the latter method completely removes the policy statement. It's no longer present in the 'F HZSPROC,DISPLAY,POLICIES' output and cannot be reactivated.
To restore a deleted policy, you need to reapply the HZSPRMxx settings:
- 'F HZSPROC,ADD,PARMLIB=(00)'
Activate IBMSAMPLE Policy.
- 'F HZSPROC,ACTIVATE,POLICY=TEST_CHECKS'
Modify the policy so only the Assembler checks are removed. REXX checks should be working.
After the policy is applied, all IBMSAMPLE checks are removed. Now we can restore the REXX checks in the following way:
- 'F HZSPROC,REMOVE,POLICY=TEST_CHECKS,STATEMENT=SAMP_DEL_REXX*'
Note that you can use masks in both the POLICY and STATEMENT parameters. Now only the statements for the assembler checks are part of the policy. This command does not reactivate the REXX checks, so you need to do that manually:
- 'F HZSPROC,REFRESH,CHECK=(IBMSAMPLE,HZS_SAMPLE_REXX*)'
Modify HZSPRMxx member so CSV_APF_EXISTS is used automatically after HZSPROC startup.
As mentioned earlier, defining policies in HZSPRMxx does not activate them when Health Checker starts or when its settings are refreshed. If that's your goal, you can add the following line to the HZSPRMxx member:
'ACTIVATE,POLICY=CHECK_MODS'
Writing REXX checks
Health Checker provides an easy-to-use API that enables system programmers to write their own health checks in the REXX language. Scripts don't even have to be compiled, so you can modify their code even while they're defined and used by Health Checker. In this assignment you'll learn how to write basic health checks in REXX.
1. Gather technical requirements for user-written REXX checks.
2. Write REXX code that checks whether two data sets reside on the same volume. Later you'll modify it so it can be used by Health Checker. It should issue at least four messages:
- Success: if the data sets reside on different volumes.
- Exception: if both data sets reside on the same volume.
- Badparm: if the data sets are not found.
- Error: if some other error occurs.
Such a check can be used to detect single-point-of-failure conditions for important data sets (the RACF database, for instance).
3. Implement Health Checker functions in the script written in Task #2. Add it to HZSPROC and test it in various conditions.
4. Write a second REXX check:
- The check should verify that no job in the JES2 execution queue is increasing its spool usage too quickly.
- It should have four input parameters:
MAXSPLUSAGE - the maximum spool usage allowed for a job. If any job in the execution queue exceeds this value, the check should end in exception, regardless of whether the job's output grows quickly or not.
SPLUSAGE - defines how large jobs must be to be checked and compared. 1% means that the REXX should only process jobs that currently use more than 1% of the spool.
INCRSPEED - how much a job's spool usage may increase between check runs. If the spool usage of any job has increased by more than this value since the last check run, the check should end in exception.
SPLDATA - the name of the data set used to keep the jobs that match SPLUSAGE between check runs.
5. Implement Health Checker functions in the script written in Task #4:
- Use the "D O" command to check whether JES2 is used. If not, issue the "ENVNA" message. Also, copy the JES2 command prefix from the output.
- Implement INITRUN handling; in this case, jobs from the previous run should not be checked.
- The check should issue at least five message types: SUCCESS, EXCEPTION, BADPARM, ERROR, ENVNA.
- Test your check's behavior in all relevant conditions.
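For Task #4, the core decision logic is independent of how the data is collected. The REXX sketch below evaluates one run of the check with hard-coded sample values: in a real check the parameters would come from HZS_PQE_PARMAREA, the current usage from the $DJQ command output and the previous values from the SPLDATA data set, none of which is shown here. Treating INCRSPEED as percentage points of spool growth between runs is an assumption.

```rexx
/* REXX - decision logic of the spool-growth check (Task #4 sketch) */
MAXSPLUSAGE = 5    /* % of spool: hard limit for any single job      */
SPLUSAGE    = 1    /* % of spool: only jobs above this are tracked   */
INCRSPEED   = 2    /* % points: allowed growth between check runs    */
INITRUN     = 0    /* 1 = first run, previous data must be ignored   */

/* Previous run, as it might be read back from SPLDATA (made up)    */
PREV. = ''                         /* '' = job not seen last time    */
PREV.JOB00101 = 1.5
PREV.JOB00102 = 2

/* Current run: "jobid pct" pairs, as parsed from $DJQ (made up)    */
CUR = 'JOB00101 4.0 JOB00102 2.5 JOB00103 0.4 JOB00104 6.2'

EXC.0 = 0
SAVE  = ''                         /* would be written to SPLDATA    */
DO I = 1 TO WORDS(CUR) BY 2
  JOB = WORD(CUR, I)
  PCT = WORD(CUR, I + 1)
  IF PCT <= SPLUSAGE THEN ITERATE  /* too small to track             */
  SAVE = SAVE JOB PCT
  REASON = ''
  IF PCT > MAXSPLUSAGE THEN REASON = 'EXCEEDS MAXSPLUSAGE'
  ELSE IF \INITRUN & PREV.JOB <> '' THEN
    IF PCT - PREV.JOB > INCRSPEED THEN REASON = 'GROWS TOO FAST'
  IF REASON <> '' THEN DO
    K = EXC.0 + 1
    EXC.K = JOB PCT'%' REASON
    EXC.0 = K
  END
END
SAVE = STRIP(SAVE)

IF EXC.0 = 0 THEN SAY 'SUCCESS: NO JOB GROWS TOO QUICKLY'
ELSE DO K = 1 TO EXC.0
  SAY 'EXCEPTION:' EXC.K
END
```

With INITRUN = 1 the growth comparison is skipped and only the MAXSPLUSAGE limit applies, which matches the INITRUN requirement in Task #5.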
All you need for this Task is "Chapter 8. Writing REXX checks" of "IBM Health Checker for z/OS: User's Guide". Additionally you can check "Chapter 3. Writing checks" in "Exploiting the IBM Health Checker for z/OS Infrastructure".
This assignment requires basic REXX skills. If you're familiar with REXX all you need is "REXX Reference". If not, it's best to practice REXX language in simpler tasks before attempting this one.
In addition to the Health Checker documentation, it may be a good idea to analyze how the sample REXX checks are coded; see "SYS1.SAMPLIB(HZSSXCHK)".
JES2 command you can use: "$DJQ,Q=XEQ,SPL=(%>0.1),SPL=(%)". When issuing MVS commands from REXX you must use ADDRESS CONSOLE; you can read about it in the "REXX Reference". JES2 command output won't be fully routed to the REXX program unless you add "L=Z" at the end of the command, for example:
"$DSPL,L=Z"
"$DJQ,Q=XEQ,SPL=(%>0.1),SPL=(%),L=Z"
To be able to use the CONSOLE environment in System REXX, you need to enable TMP (Terminal Monitor Program), i.e. define the check with REXXTSO(YES). Also remember that Health Checker must have RACF authorization to all resources needed by the check, including the authority to issue MVS or JES2 commands.
Basic workflow of REXX checks:
- Call the HZSLSTRT function, which establishes the connection between your check and Health Checker.
- (Optional) Verify whether your check is applicable to the particular system configuration. If not, issue an HZSLFMSG_REASON='ENVNA' message.
- (Optional) Verify whether the input parameters are correct. If not, issue an HZSLFMSG_REASON='BADPARM' message.
- (Optional) Implement INITRUN functionality. If your check uses any data from its previous run, you should add support for the INITRUN function. See HZS_PQE_FUNCTION_CODE for more details.
- (Optional) Define a message table for the check. In newer versions of z/OS you can issue messages to Health Checker directly from REXX code and avoid defining a separate message table. See HZSLFMSG_REQUEST='DIRECTMSG' for more details.
- Code the actual check functionality.
- You need to define at least one SUCCESS and one EXCEPTION message in your code; other messages are optional.
- (Optional) Remember about error handling. Use HZSLFMSG_REASON='ERROR' to report a REXX error to Health Checker. You can also use predefined Health Checker error messages; see HZSLFMSG_REQUEST="HZSMSG" for more details.
- (Optional) Implement debug instructions for HZSPROC. It's best to copy them from the sample REXX scripts.
- End your script by calling the HZSLSTOP function.
Additional considerations:
- REXXIN is an optional data set used for passing input parameters to your check. You can also pass them via the ADDREP statement, but REXXIN can still be very useful if you want to pass a lot of input parameters.
- REXXOUT is mandatory. It is used for debugging your REXX script: all “SAY” output is written there when the DEBUG=ON option is set. It's named 'rexxhlq.execname.REXXOUT'. You define 'rexxhlq' and 'execname' in the HZSPRMxx definition.
- The REXXOUT data set is used for debugging purposes only. If you want to issue normal messages, use the HZSLFMSG functions.
- Make sure that the HZSPROC user has UPDATE access to the REXXOUT data set.
- Do not set a return code in the RC REXX variable; it will be ignored by Health Checker. Instead, use the HZSLFMSG functions to indicate specific error conditions.
- It's best to code the “SYSTEM” value as the severity of EXCEPTION messages. This value indicates that the check's severity is taken from its definition in the HZSPRMxx member.
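Verifying the input parameters (the BADPARM step of the workflow) can be sketched as follows for the Task #4 parameters. The keyword(value) format of the parameter string and the SYSU.SPLDATA name are assumptions about how you would code PARMS() in ADDREP; the sketch only sets HZSLFMSG_REQUEST and HZSLFMSG_REASON the same way as the SPOF_VOL_DIFF_VERIFY check later in this assignment, and leaves the HZSLFMSG() and HZSLSTOP() calls, which need Health Checker, as a comment.

```rexx
/* REXX - validate keyword(value) check parameters (BADPARM sketch) */
/* Made-up example of the string passed in HZS_PQE_PARMAREA          */
PARMAREA = 'MAXSPLUSAGE(5) SPLUSAGE(1) INCRSPEED(2)',
           'SPLDATA(SYSU.SPLDATA)'

KEYS = 'MAXSPLUSAGE SPLUSAGE INCRSPEED SPLDATA'
VAL. = ''
BAD  = ''
REST = PARMAREA
DO WHILE REST <> ''
  PARSE VAR REST KEY '(' PVAL ')' REST
  KEY = TRANSLATE(STRIP(KEY))              /* keyword in uppercase   */
  IF WORDPOS(KEY, KEYS) = 0 THEN BAD = BAD 'UNKNOWN('KEY')'
  ELSE VAL.KEY = STRIP(PVAL)
END
DO I = 1 TO 3                              /* the numeric parameters */
  K = WORD(KEYS, I)
  IF \DATATYPE(VAL.K, 'N') THEN BAD = BAD K
END
K = 'SPLDATA'
IF VAL.K = '' THEN BAD = BAD K

IF BAD <> '' THEN DO
  HZSLFMSG_REQUEST = 'STOP'
  HZSLFMSG_REASON  = 'BADPARM'
  SAY 'BADPARM:' STRIP(BAD)
  /* In a real check: HZSLFMSG_RC = HZSLFMSG(), then HZSLSTOP()     */
END
ELSE SAY 'PARAMETERS OK'
```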
/* REXX */
SIGNAL ON SYNTAX
SIGNAL ON ERROR
SIGNAL ON FAILURE
SIGNAL ON NOVALUE
SIGNAL ON HALT
/*********************************************************************/
/* MAIN - HEALTH CHECK                                               */
/* VERIFIES IF TWO DATASETS SPECIFIED AS PARAMETER                   */
/* RESIDE ON DIFFERENT VOLUMES                                       */
/*********************************************************************/
DS1 = 'JSADEK.MY.LINKLIB'
DS2 = 'SYS1.ABDTPNL0'
IF LISTDSI("'"DS1"'") <> 0 | LISTDSI("'"DS2"'") <> 0 THEN DO
  CALL TERMINATE "INPUT PARAMETERS INCORRECT"
END
ADDRESS TSO
VOLDS1 = CHECK_VOLSER(DS1)
VOLDS2 = CHECK_VOLSER(DS2)
IF LENGTH(VOLDS1) <> 6 | LENGTH(VOLDS2) <> 6 THEN DO
  CALL TERMINATE "VOLSERS NOT DETECTED CORRECTLY"
END
IF VOLDS1 = VOLDS2 THEN
  SAY "FAILURE: BOTH DATA SETS ARE ON THE: "VOLDS1" VOLUME"
ELSE DO
  SAY "SUCCESS: DATA SETS ARE ON DIFFERENT VOLUMES:",
      VOLDS1" AND "VOLDS2
END
RETURN RC
/*********************************************************************/
/* CHECK VOLSER ON WHICH DATA SET RESIDES                            */
/*********************************************************************/
CHECK_VOLSER:
  PARSE ARG DS .
  VOLSER = ''                /* avoid NOVALUE if no VOLSER line found */
  C = OUTTRAP('CMDOUT.')
  "LISTC ENT('"DS"') ALL"
  C = OUTTRAP('OFF')
  DO I=1 TO CMDOUT.0
    VOLPOS = POS("VOLSER",CMDOUT.I)
    IF VOLPOS <> 0 THEN DO
      VOLSER=SUBSTR(CMDOUT.I,VOLPOS+18,6)
      LEAVE
    END
  END
RETURN VOLSER
/*********************************************************************/
/* HANDLED ERROR ROUTINE                                             */
/*********************************************************************/
TERMINATE:
  PARSE ARG ERROR_MSG
  SAY ERROR_MSG
EXIT
/*********************************************************************/
/* UNHANDLED ERROR ROUTINE                                           */
/*********************************************************************/
SYNTAX: ERROR: FAILURE: NOVALUE: HALT:
  SAY "AN ERROR HAS OCCURRED ON LINE: "SIGL
  SAY "ERROR LINE: "SOURCELINE(SIGL)
  COND = CONDITION('C')
  /* RC is a REXX error number only for SYNTAX; for ERROR/FAILURE  */
  /* it is the command return code.                                 */
  IF COND = 'SYNTAX' THEN CALL TERMINATE "ERROR TEXT: "ERRORTEXT(RC)
  IF COND = 'ERROR' | COND = 'FAILURE' THEN,
    CALL TERMINATE "RETURN CODE: "RC
  CALL TERMINATE "CONDITION: "COND
EXIT
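CHECK_VOLSER above relies on a fixed offset (VOLPOS+18) into the LISTCAT line. A slightly more robust variant, sketched below on made-up lines in the usual "VOLSER------------volser" layout, skips the dashes with PARSE and STRIP and takes the next word. Volume serials can be shorter than six characters, so a caller using this variant would test for an empty result rather than LENGTH = 6. Like the original, it returns only the first volume of a multi-volume data set. LISTDSI, which the script already calls, also sets the SYSVOLUME variable, which may be an even simpler source of the volume serial.

```rexx
/* REXX - extract the VOLSER from trapped LISTCAT ... ALL output    */
/* Made-up trapped lines in the usual LISTCAT layout                 */
CMDOUT.1 = '    VOLUMES'
CMDOUT.2 = "      VOLSER------------PAGE01     DEVTYPE------X'3010200F'"
CMDOUT.0 = 2

VOLSER = ''                      /* stays '' if no VOLSER line found */
DO I = 1 TO CMDOUT.0
  IF POS('VOLSER', CMDOUT.I) = 0 THEN ITERATE
  PARSE VAR CMDOUT.I 'VOLSER' REST
  VOLSER = WORD(STRIP(REST, 'L', '-'), 1)   /* skip the dash filler  */
  LEAVE
END
SAY 'VOLSER:' VOLSER
```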
This type of check can be pretty useful for detecting SPOFs (single points of failure). Many important data sets in z/OS have duplicates; the RACF database and the JES2 checkpoint data set are examples. Backup copies like these should always be stored on separate volumes, so that if there is a problem with one DASD, a second copy is still available.
ADDREP CHECK(USERCHK,SPOF_VOL_DIFF_VERIFY)
  EXEC(SPOFVOL1)
  REXXHLQ(SYSU)
  REXXTSO(YES)
  REXXIN(NO)
  MSGTBL(*NONE)
  USS(NO)
  VERBOSE(NO)
  SEVERITY(MEDIUM)
  INTERVAL(0:10)
  ACTIVE
  EINTERVAL(SYSTEM)
  DATE(20170622)
  PARM('JSADEK.MY.REXX JSADEK.MY.CNTL')
  REASON('Verify if data sets are on different volumes.')
- 'MSGTBL(*NONE)' – means that the check has no message table and issues its messages with the DIRECTMSG request of HZSLFMSG.
- 'REXXHLQ(SYSU)' – defines the HLQ of the REXXOUT and REXXIN data sets. In this case the output data set is named 'SYSU.SPOFVOL1.REXXOUT'. Make sure it exists and that HZSPROC has UPDATE access to it.
- 'USS(NO)' – means that the check does not use z/OS UNIX System Services (OMVS).
REXX:
/* REXX */
SIGNAL ON SYNTAX
SIGNAL ON ERROR
SIGNAL ON FAILURE
SIGNAL ON NOVALUE
SIGNAL ON HALT
/*********************************************************************/
/* If HZSLSTRT is not successful all IBM Health Checker for z/OS */
/* function calls will fail. */
/*********************************************************************/
HZSLSTRT_RC = HZSLSTRT()
IF HZSLSTRT_RC <> 0 THEN DO
  IF HZS_PQE_DEBUG = 1 THEN DO
    SAY "HZSLSTRT RC" HZSLSTRT_RC
    SAY "HZSLSTRT RSN" HZSLSTRT_RSN
    SAY "HZSLSTRT SYSTEMDIAG" HZSLSTRT_SYSTEMDIAG
  END
  EXIT
END
/*********************************************************************/
/* MAIN - HEALTH CHECK */
/* VERIFIES IF TWO DATASETS SPECIFIED AS PARAMETER */
/* RESIDE ON DIFFERENT VOLUMES */
/*********************************************************************/
SAY "CHECK RUN AT: "DATE()", "TIME()
PARSE VAR HZS_PQE_PARMAREA DS1 DS2 .
SAY "ARG 1: "DS1
SAY "ARG 2: "DS2
IF LISTDSI("'"DS1"'") <> 0 | LISTDSI("'"DS2"'") <> 0 THEN DO
  HZSLFMSG_REQUEST = "STOP"
  HZSLFMSG_REASON = "BADPARM"
  CALL TERMINATE "INPUT PARAMETERS INCORRECT"
END
ADDRESS TSO
VOLDS1 = CHECK_VOLSER(DS1)
VOLDS2 = CHECK_VOLSER(DS2)
SAY "VOLSER 1: "VOLDS1
SAY "VOLSER 2: "VOLDS2
IF LENGTH(VOLDS1) <> 6 | LENGTH(VOLDS2) <> 6 THEN DO
  HZSLFMSG_REQUEST = "STOP"
  HZSLFMSG_REASON = "ERROR"
  CALL TERMINATE "VOLSERS NOT DETECTED CORRECTLY"
END
IF VOLDS1 = VOLDS2 THEN DO
  SAY "EXCEPTION CLAUSE EXECUTED"
  HZSLFMSG_REQUEST='DIRECTMSG'
  HZSLFMSG_SEVERITY='SYSTEM'
  HZSLFMSG_REASON='CHECKEXCEPTION'
  HZSLFMSG_DIRECTMSG_ID='HZSUH002E'
  HZSLFMSG_DIRECTMSG_TEXT="EXCEPTION: BOTH DATA SETS ARE ON THE:",
                          VOLDS1" VOLUME"
END
ELSE DO
  SAY "SUCCESS CLAUSE EXECUTED"
  HZSLFMSG_REQUEST='DIRECTMSG'
  HZSLFMSG_REASON='CHECKINFO'
  HZSLFMSG_DIRECTMSG_ID='HZSUH001I'
  HZSLFMSG_DIRECTMSG_TEXT="SUCCESS: DATA SETS ON DIFFERENT VOLUMES:",
                          VOLDS1" AND "VOLDS2
END
CALL END_CHECK
RETURN
/*********************************************************************/
/* CHECK VOLSER ON WHICH DATA SET RESIDES */
/*********************************************************************/
CHECK_VOLSER:
  PARSE ARG DS .
  VOLSER = ''            /* STAYS EMPTY IF LISTCAT SHOWS NO VOLSER LINE */
  C = OUTTRAP('CMDOUT.')
  "LISTC ENT('"DS"') ALL"
  C = OUTTRAP('OFF')
  DO I=1 TO CMDOUT.0
    VOLPOS = POS("VOLSER",CMDOUT.I)
    IF VOLPOS <> 0 THEN DO
      VOLSER=SUBSTR(CMDOUT.I,VOLPOS+18,6)
      LEAVE
    END
  END
RETURN VOLSER
/*********************************************************************/
/* End of Check Function */
/*********************************************************************/
END_CHECK:
  SAY "END_CHECK FUNCTION EXECUTED"
  HZSLFMSG_RC = HZSLFMSG()
  HZSLSTOP_RC = HZSLSTOP()                 /* report check completion */
  IF HZS_PQE_DEBUG = 1 THEN DO      /* Report debug detail in REXXOUT */
    SAY "HZSLSTOP RC" HZSLSTOP_RC
    SAY "HZSLSTOP RSN" HZSLSTOP_RSN
    SAY "HZSLSTOP SYSTEMDIAG" HZSLSTOP_SYSTEMDIAG
  END
EXIT
/*********************************************************************/
/* HANDLED ERROR ROUTINE */
/*********************************************************************/
TERMINATE:
  PARSE ARG ERROR_MSG
  SAY ERROR_MSG
  CALL END_CHECK
EXIT
/*********************************************************************/
/* UNHANDLED ERROR ROUTINE */
/*********************************************************************/
SYNTAX:
ERROR:
FAILURE:
NOVALUE:
HALT:
  SAY "AN ERROR HAS OCCURRED ON LINE: "SIGL
  SAY "ERROR LINE: "SOURCELINE(SIGL)
  SAY "RETURN CODE: "RC
  HZSLFMSG_REQUEST = "STOP"    /* STOP THE CHECK AFTER AN UNEXPECTED ERROR */
  HZSLFMSG_REASON = "ERROR"
  CALL TERMINATE "ERROR TEXT: "ERRORTEXT(RC)
EXIT
This is a basic version of a health check; the script uses only the most important Health Checker functions. From a programming point of view it is important to remember that:
- SAY statements in health checks are used for debugging only. Their output is routed to the REXXOUT data set when the check runs in debug mode.
- HZSLFMSG is the function used for communication between your script and Health Checker.
- HZSLFMSG_REQUEST='DIRECTMSG' – DIRECTMSG can speed up and simplify your REXX code: with it you don't have to define a separate message table.
- HZSLFMSG_SEVERITY='SYSTEM' – the SYSTEM value means that the check severity is taken from the SEVERITY parameter of the HZSPRMxx check definition. It's good practice to always do it this way.
A few useful commands for this task:
- “F HZSPROC,ADD,PARMLIB=(02)” - adds the HZSPRM02 definitions to the current HZSPROC setup.
- “F HZSPROC,UPDATE,CHECK(USERCHK,SPOF_VOL_DIFF_VERIFY),DEBUG=ON” – turns debug mode on.
- “F HZSPROC,UPDATE,CHECK(USERCHK,SPOF_VOL_DIFF_VERIFY),PARM='JSADEK.MY.LINKLIB JSADEK.MY.PROCLIB'” - changes the input parameters of the check.
- The “SBO” action character lets you display the REXXOUT data set from the CK panel.
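When the check runs, Health Checker passes the PARM string to the exec in the HZS_PQE_PARMAREA variable, and the script splits it with PARSE VAR. The sketch below imitates this outside Health Checker by assigning HZS_PQE_PARMAREA by hand (an assumption for illustration only), including a hypothetical third word to show that the trailing period in the template discards everything after the second data set name. The WORDS test at the end is an optional extra, not part of the original check.

```rexx
/* REXX */
/* SKETCH: SPLITTING THE CHECK'S PARM STRING INTO TWO DATA SET NAMES. */
/* IN A REAL CHECK HEALTH CHECKER SETS HZS_PQE_PARMAREA; HERE IT IS   */
/* ASSIGNED BY HAND, WITH A HYPOTHETICAL THIRD WORD "EXTRA".          */
HZS_PQE_PARMAREA = "JSADEK.MY.LINKLIB JSADEK.MY.PROCLIB EXTRA"

/* THE TRAILING PERIOD ABSORBS EVERYTHING AFTER THE SECOND WORD,      */
/* SO DS2 IS ALWAYS A SINGLE WORD WITHOUT BLANKS.                     */
PARSE VAR HZS_PQE_PARMAREA DS1 DS2 .
SAY "DS1=<"DS1">"
SAY "DS2=<"DS2">"

/* OPTIONAL EXTRA VALIDATION (NOT IN THE ORIGINAL CHECK): EXPECT      */
/* EXACTLY TWO WORDS, A CASE A REAL CHECK COULD STOP WITH BADPARM.    */
IF WORDS(HZS_PQE_PARMAREA) <> 2 THEN
  SAY "PARM SHOULD CONTAIN EXACTLY TWO DATA SET NAMES"
```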
/* REXX */
SIGNAL ON SYNTAX
SIGNAL ON ERROR
SIGNAL ON FAILURE
SIGNAL ON HALT
/* SPOOLCK1 - REXX CHECKS IF SPOOL USAGE OF ANY JOB */
/* IS INCREASING TOO FAST */
/* PARAMETERS: */
/* MAXSPLUSAGE - IF ANY JOB USES MORE THAN THIS VALUE CHECK */
/* ENDS IN EXCEPTION */
/* SPLUSAGE - DEFINES SCOPE OF THE CHECK, ALL JOBS THAT USE */
/* MORE SPOOL THAN THIS VALUE ARE VERIFIED BY THE CHECK */
/* INCRSPEED - IF JOBS SPOOL USAGE INCREASED MORE THAN THIS VALUE */
/* BETWEEN CHECK RUNS, CHECK ENDS IN EXCEPTION */
/* SPLDATA - DATA SET NEEDED FOR STORING JOB DATA BETWEEN CHECK RUNS */
IF DATATYPE(MAXSPLUSAGE)<>"NUM" | DATATYPE(SPLUSAGE)<>"NUM",
| DATATYPE(INCRSPEED)<>"NUM" | SYSDSN("'"SPLDATA"'")<>"OK" THEN DO
CALL TERMINATE "INCORRECT INPUT PARAMETERS"
END
"ALLOC FI(INDD) DA('"SPLDATA"') OLD REUSE"
CALL TERMINATE "PROGRAM ENDED SUCCESSFULLY"
/* COMPARES JOBS FROM CURRENT AND PREVIOUS RUN IF ANY GROWS TOO FAST */
DO I=1 TO JOBID.0
DO N=1 TO OJOBID.0
IF JOBID.I=OJOBID.N & (OPERCENT.N+INCRSPEED)
Comments:
- JES2 command output won't be routed correctly to the CONSOLE REXX session unless you add “L=Z” at the end of the JES2 command.
- MVSVAR(SYSNAME) – to use the MVSVAR function you cannot trap the NOVALUE condition (SIGNAL ON NOVALUE), because MVSVAR triggers it even when it ends successfully.
ADDREP CHECK(USERCHK,SPOOL_USAGE_CHECK)
       EXEC(SPOOLCK1)
       REXXHLQ(SYSU)
       REXXTSO(YES)
       REXXIN(NO)
       MSGTBL(*NONE)
       USS(NO)
       VERBOSE(NO)
       SEVERITY(HIGH)
       INTERVAL(0:05)
       ACTIVE
       EINTERVAL(SYSTEM)
       DATE(20170720)
       PARM('5 0.2 0.1 SYSU.SPOOLCK1.SPLDATA')
       REASON('Check if any running job spool usage grows too quickly.')
The ADDREP definition above makes the check run every 5 minutes. The check ends in an exception when any job in the execution queue uses more than 5% of the spool, or when any job that uses more than 0.2% of the spool has grown by more than 0.1% of the spool since the previous check run (that is, by more than 0.1% in the last 5 minutes). Depending on the batch workload, these parameters can be adjusted so that the check alerts you only when it's really needed. REXX script:
/* REXX */
SIGNAL ON SYNTAX
SIGNAL ON ERROR
SIGNAL ON FAILURE
SIGNAL ON HALT
/* SPOOLCK1 - REXX CHECKS IF SPOOL USAGE OF ANY JOB */
/* IS INCREASING TOO FAST */
/* PARAMETERS: */
/* MAXSPLUSAGE - IF ANY JOB USES MORE THAN THIS VALUE CHECK */
/* ENDS IN EXCEPTION */
/* SPLUSAGE - DEFINES SCOPE OF THE CHECK, ALL JOBS THAT USE */
/* MORE SPOOL THAN THIS VALUE ARE VERIFIED BY THE CHECK */
/* INCRSPEED - IF JOBS SPOOL USAGE INCREASED MORE THAN THIS VALUE */
/* BETWEEN CHECK RUNS, CHECK ENDS IN EXCEPTION */
/* SPLDATA - DATA SET NEEDED FOR STORING JOB DATA BETWEEN CHECK RUNS */
/* HEALTH CHECK START */
SAY "CHECK RUN AT: "DATE()", "TIME()
HZSLSTRT_RC = HZSLSTRT()
IF HZSLSTRT_RC <> 0 THEN DO
IF HZS_PQE_DEBUG = 1 THEN DO
SAY "HZSLSTRT RC" HZSLSTRT_RC
SAY "HZSLSTRT RSN" HZSLSTRT_RSN
SAY "HZSLSTRT SYSTEMDIAG" HZSLSTRT_SYSTEMDIAG
END
EXIT
END
/* INPUT PARAMETERS PROCESSING */
PARSE VAR HZS_PQE_PARMAREA MAXSPLUSAGE SPLUSAGE INCRSPEED SPLDATA .
IF DATATYPE(MAXSPLUSAGE)<>"NUM" | DATATYPE(SPLUSAGE)<>"NUM",
| DATATYPE(INCRSPEED)<>"NUM" | SYSDSN("'"SPLDATA"'")<>"OK" THEN DO
HZSLFMSG_REQUEST = "STOP"
HZSLFMSG_REASON = "BADPARM"
CALL TERMINATE "INPUT PARAMETERS INCORRECT"
END
SAY "MAXSPLUSAGE IS: "MAXSPLUSAGE
SAY "SPLUSAGE IS: "SPLUSAGE
SAY "INCRSPEED IS: "INCRSPEED
SAY "OUTPUT DATA SET: "SPLDATA
/* MAIN HEALTH CHECK LOGIC */
"ALLOC FI(INDD) DA('"SPLDATA"') OLD REUSE"
IF HZS_PQE_FUNCTION_CODE = "INITRUN" THEN DO
"EXECIO * DISKW INDD (STEM TEMP. FINIS)"
SAY "INITRUN MODE, SPLDATA CLEARED BEFORE CHECK EXECUTION"
CALL HZS_SUCCESS "CHECK ENDED SUCCESSFULLY"
/* HZS EXCEPTION MESSAGE */
PARSE ARG MESSAGE
SAY "CHECK EXCEPTION: "MESSAGE
CALL TERMINATE MESSAGE
/* HZS SUCCESS MESSAGE */
PARSE ARG MESSAGE
SAY "CHECK SUCCESS: "MESSAGE
CALL TERMINATE MESSAGE
/* COMPARES JOBS FROM CURRENT AND PREVIOUS RUN IF ANY GROWS TOO FAST */
DO I=1 TO JOBID.0
DO N=1 TO OJOBID.0
IF JOBID.I=OJOBID.N & (OPERCENT.N+INCRSPEED)
Comments:
- The Health Checker user must have access to all resources needed by the script, for example: the REXXOUT data set, the SPLDATA data set, the CONSOLE profile in the TSOAUTH class and the JES2 profile protecting the DISPLAY command in the OPERCMDS class.
- To use 'ADDRESS CONSOLE' in System REXX you must enable the TMP (Terminal Monitor Program) for System REXX. To do it, add AXRRXWKD to the authorized command list in IKJEFTxx and then restart System REXX.
Test cases:
- First check run - INITRUN should be executed.
- SPLDATA is empty - no comparison is done, but MAXSPLUSAGE is checked and the jobs from the current run are written to SPLDATA.
- SPLDATA has data from the previous run - MAXSPLUSAGE is checked and the jobs are compared.
- The check is refreshed - INITRUN mode is triggered when the check is refreshed. (During a refresh DEBUG mode is turned off, so you won't be able to see check output in REXXOUT.)
- There are jobs that exceed MAXSPLUSAGE - the check ends in an exception. No comparison is done, but jobs that match the SPLUSAGE criteria are written to SPLDATA. You can modify the parameters dynamically as follows to test this case:
“F HZSPROC,UPDATE,CHECK(USERCHK,SPOOL_USAGE_CHECK),PARM='0.5 0.2 0.1 SYSU.SPOOLCK1.SPLDATA'”
* High Severity Exception * HZSUH002E HZSUH002E JOB EXCEEDED MAX SPOOL USAGE: SYSLOG(0.8496%) SYSLOG(1.4368%)
- One or more jobs grow quickly, that is, their spool usage increased by more than INCRSPEED% since the check last ran. This condition is easy to generate with a simple job:
//REXX     EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSIN    DD DUMMY
//SYSUT2   DD DSN=&&TEMP(KILLSPL),DISP=(,PASS),
//            SPACE=(TRK,(1,1,2)),LRECL=80,RECFM=FB,BLKSIZE=8000
//SYSUT1   DD *
DO I=0 TO 5000000
SAY "YUPII "I
END
RETURN 0
//STEP1    EXEC PGM=IKJEFT01,REGION=6M
//SYSEXEC  DD DISP=SHR,DSN=&&TEMP
//SYSTSPRT DD SYSOUT=*
//SYSTSIN  DD *
%KILLSPL
This job uses 596 track groups (1 group = 3 tracks here, so 1,788 tracks). The message received:
* High Severity Exception * HZSUH002E HZSUH002E JOB INCREASES SPOOL TOO QUICKLY: JSADEKE
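Both exception messages shown above follow from the rule configured by PARM('5 0.2 0.1 SYSU.SPOOLCK1.SPLDATA'). A minimal sketch of that decision logic in plain REXX is shown below. It is not the SPOOLCK1 code itself: the job IDs and percentages are hypothetical, the JES2 query and the SPLDATA read/write are replaced by hard-coded stems, and, as described in the test cases, the growth comparison is skipped when any job already exceeds MAXSPLUSAGE. The growth condition starts the same way as the comparison line in the SPOOLCK1 listing; the rest of that condition is an assumption.

```rexx
/* REXX */
/* SKETCH OF THE SPOOL_USAGE_CHECK DECISION RULE WITH HYPOTHETICAL    */
/* DATA. THE REAL CHECK READS JOBS FROM JES2 AND KEEPS THE PREVIOUS   */
/* RUN IN SPLDATA; HERE BOTH ARE HARD-CODED STEMS.                    */
PARM = "5 0.2 0.1 SYSU.SPOOLCK1.SPLDATA"
PARSE VAR PARM MAXSPLUSAGE SPLUSAGE INCRSPEED .   /* SPLDATA UNUSED */

/* CURRENT RUN: JOB ID AND PERCENTAGE OF SPOOL USED (HYPOTHETICAL) */
JOBID.1 = "JOB00101"; PERCENT.1 = 0.35
JOBID.2 = "JOB00102"; PERCENT.2 = 0.25
JOBID.3 = "JOB00103"; PERCENT.3 = 0.10
JOBID.0 = 3

/* PREVIOUS RUN, AS IF READ BACK FROM SPLDATA (HYPOTHETICAL) */
OJOBID.1 = "JOB00101"; OPERCENT.1 = 0.21
OJOBID.2 = "JOB00102"; OPERCENT.2 = 0.22
OJOBID.0 = 2

/* RULE 1: ANY JOB ABOVE MAXSPLUSAGE PERCENT IS AN EXCEPTION */
MAXHIT = ""
DO I = 1 TO JOBID.0
  IF PERCENT.I > MAXSPLUSAGE THEN MAXHIT = MAXHIT JOBID.I
END

/* RULE 2: ONLY IF RULE 1 DID NOT FIRE, COMPARE JOBS ABOVE SPLUSAGE  */
/* PERCENT WITH THE PREVIOUS RUN AND FLAG GROWTH ABOVE INCRSPEED.    */
GROWTH = ""
IF MAXHIT = "" THEN
  DO I = 1 TO JOBID.0
    IF PERCENT.I <= SPLUSAGE THEN ITERATE
    DO N = 1 TO OJOBID.0
      IF JOBID.I = OJOBID.N & (OPERCENT.N + INCRSPEED) < PERCENT.I THEN
        GROWTH = GROWTH JOBID.I
    END
  END

IF MAXHIT <> "" THEN
  SAY "EXCEPTION: JOB EXCEEDED MAX SPOOL USAGE:" STRIP(MAXHIT)
ELSE IF GROWTH <> "" THEN
  SAY "EXCEPTION: JOB INCREASES SPOOL TOO QUICKLY:" STRIP(GROWTH)
ELSE
  SAY "SUCCESS: NO JOB EXCEEDS THE LIMITS"
```

With this data JOB00101 grew from 0.21% to 0.35% of the spool, which is more than the 0.1% INCRSPEED limit, so the sketch reports it; JOB00102 grew by only 0.03%, and JOB00103 is below the 0.2% SPLUSAGE scope.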