D720 - Operators’ Procedures
DEFINITION
Prepare and instigate operators’ instructions, procedures and manual.
SUMMARY
Prepare detailed instructions for the technical operation and support of the system, including routine and abnormal running. This will normally include the production of an operational-standard and the definition of human / machine interactions.
In the Application Software Delivery-IDDI approach it is common to produce the procedures and instructions in the form of a manual (although “electronic manuals” are becoming increasingly viable instead). In a “Brief Delivery” it is normal practice to rely upon the vendor’s documentation for a majority of the information and produce notes to supplement these as required. Where there has been no customisation of the system, it may be that the documentation supplied with the system is adequate.
Although primarily focused on procedures, this process will involve the definition of an operational environment. It may, therefore, require a prototyping approach to investigate the most effective ways of running and controlling the system. Specialist skills may be required for the technical definition and setting up of the procedures.
PATH PLANNING GUIDANCE
Optional - depending on the suitability of the vendor’s standard materials.
DEPENDENCIES
Prerequisites (Finish-Start):
- Implementation Paper(s) defining the overall operational needs, eg technical design architecture, scheduling requirements, interfaces, controls, backup and recovery etc.
Dependent procedures (Finish-Finish):
- parts of the technical testing (Process D810) concerning operational procedures
Dependent procedures (Finish-Start):
- final operations review (Process D895)
- acceptance of the overall operational system and decision to go live (Process D900)
RECEIVABLES
- relevant Implementation Paper(s)
DELIVERABLES
- working, documented operational environment and procedures ready for formal testing
- operational JCL (or equivalent)
- Operations Manual and/or operators’ instructions and procedures
- New / updated client’s operations / MIS production schedules
- New / updated client’s Disaster Recovery Plan and materials
TOOLS
- various package specific materials
- Example: (none)
DETAILED DESCRIPTION OF TASKS
What are operators’ procedures
In this context, we are referring to the technical operation of the computer system, the software and associated procedures.
In an office system or medium system, it is common for the operation of the system to have been simplified to the extent that “ordinary” users operate, control and manage the system. They would normally have received some specialist training but would not be specialists in computer operations. In such cases it is important that the project team leaves simple, clear, comprehensive instructions for all reasonable aspects of the future operation of the system - often from turning the power on to restoring the database after a failure.
In a mainframe environment there are usually many different specialist skills involved. It is normal, therefore, for a project team to bring in the specialists for some of this work. Specialist resources will be found within most large organisations’ MIS departments. Most will have staff such as operations analysts, system programmers, transaction processing system specialists, database managers, disaster recovery managers, communications network managers, security controllers etc. All of these may need to be consulted regarding the definition, setting up and operation of the system in a live environment.
Although there are typically more issues in a mainframe environment, the task of compiling instructions can be simpler as much of the routine activity is already standard practice carried out by experienced specialist operators.
Job Control Language
Job Control Language (commonly known as JCL) is the generic name for pre-coded instructions that control how the overall computer system processes the various applications and other tasks that it runs. Some computer systems use different names for this (eg DCL - DEC’s DIGITAL Command Language, or SCL - ICL’s System Control Language). The expression JCL is, however, universally understood.
JCL is used to control most activities on the computer, although, in some systems, there may also be a higher level control program effectively replacing many of the operator’s tasks and decisions.
Some examples of the functions of JCL and associated software are:
- chain programs or routines together into an overall suite, eg main batch processing, interfaces, reporting runs etc
- routine backing up of data
- loading and unloading the Transaction Processing service
- running recovery procedures (normally on special request following a problem)
- automatic detection of errors in the processing cycles
- in some cases - automatic recovery from errors in the processing cycles
- routine housekeeping (eg disk reorganisation / optimisation)
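The chaining and error-detection functions listed above can be illustrated with a minimal sketch. It is written in Python rather than any real JCL dialect, and the step names, return codes and the `run_chain` helper are all hypothetical; it only shows the idea of running a suite in sequence and halting when a step reports a failure.

```python
from typing import Callable, List, Tuple

# Hypothetical step type: a name plus a callable returning a return code
# (0 = success, anything higher = a problem).
Step = Tuple[str, Callable[[], int]]


def run_chain(steps: List[Step], max_rc: int = 0) -> List[Tuple[str, int]]:
    """Run steps in order; stop the chain when a return code exceeds max_rc."""
    results = []
    for name, action in steps:
        rc = action()
        results.append((name, rc))
        if rc > max_rc:
            # Later steps are skipped so that, for example, reports are not
            # produced from a failed batch update.
            break
    return results


# Illustrative nightly suite; the failing step is simulated.
nightly = [
    ("backup", lambda: 0),
    ("main_batch", lambda: 8),
    ("reports", lambda: 0),
]

print(run_chain(nightly))  # [('backup', 0), ('main_batch', 8)]
```

In a real installation this logic would normally be expressed in the platform's own control language or scheduling software rather than in application code.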
The standard JCL issued with the package will not normally be suitable for the final live system without amendment. It will normally need to be customised, for example:
- to use the organisation’s specified live disks or database,
- to work on the organisation’s network,
- to use the organisation’s defined printers,
- to include instructions for the backup of system and data,
- to interact properly with other control software
- to take in or output interfaces with required level of procedural control.
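One way to picture this customisation is as filling site-specific values into a vendor-supplied skeleton. The sketch below assumes a made-up job notation and invented names (`VENDOR_JOB`, `SITE`, `LIVEDB01`, `PRT_FINANCE`, `GLPOST`); it is not real JCL and not any vendor's format. It uses Python's standard `string.Template`, whose `substitute` method raises `KeyError` if a placeholder has no value, so a missing site setting is caught before the job is released.

```python
from string import Template

# Hypothetical vendor-supplied job skeleton with site-specific placeholders.
VENDOR_JOB = Template(
    "RUN PROGRAM=$program\n"
    "DATABASE=$database\n"
    "PRINTER=$printer\n"
)

# Hypothetical settings for the organisation's live environment.
SITE = {"database": "LIVEDB01", "printer": "PRT_FINANCE"}


def customise(template: Template, program: str, site: dict) -> str:
    """Fill in the skeleton; raises KeyError if any placeholder is unset."""
    return template.substitute(program=program, **site)


print(customise(VENDOR_JOB, "GLPOST", SITE))
```

Keeping the site values in one place means that a change of live disk, database or printer is made once rather than in every job.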
The writing of JCL instructions is a specialist skill, although many MIS staff will have at least some level of competence with routine aspects. Appropriate effort should be put into the development and testing of the JCL. It can, sometimes, be a significant and time consuming task requiring expertise to be brought together from several sources.
Contents of operators’ procedures
The following table shows aspects that operators’ procedures may need to cover. It also indicates whether these are normal (✔), optional (O), or unlikely (X) for two example types of system - centralised mainframe and local office or medium system. It is assumed, for this example, that standard operational functions will have been fully defined and/or automated at centralised mainframes, but not at local medium systems. Note that there may be exceptions to this, particularly where processes have been fully automated.
Aspect | Central Mainframe | Local Medium System
Turning on and operating equipment (eg changing paper, loading and unloading tapes and cartridges) | X | O
Loading the real-time application(s) | ✔ | ✔
Running batch processes, eg reporting runs, interfaces | ✔ | ✔
Running Backup routines | O | ✔
Shutdown procedures | X | ✔
Actions to take on expected messages or error conditions - ie routine interaction between the operator and the computer | ✔ | ✔
Running database recovery | ✔ | ✔
Run scheduling, dependencies with other systems, and timing | ✔ | O
Linking in vendor’s remote diagnostics or maintenance service | X | O
How to report unresolvable errors, eg contacting local support, contacting the supplier, escalation procedures etc | O | O
When and how to invoke disaster recovery procedure | O | O
Magnetic media rotation requirements | X | O
Output control and distribution procedures | ✔ | X
Off site security backups | X | O
Setting system level access security | X | ✔
Environmental needs, eg temperature, ventilation, clean power supply | X | O
Archiving of data and file retention requirements | X | O
Procedural control requirements | O | O
Physical access to the computer equipment | X | O
Procedures for applying software updates (and falling back to the old system if necessary) | X | O
Report Distribution Management (manual or by Report Distribution Management System - RDMS) | O | O
Handling of special stationery (especially regarding financially valuable documents or stationery) | O | O
Production schedules
The definition of operational procedures may include the setting up of regular run cycles. The scheduling of runs can be a complex activity. Most large organisations with a centralised mainframe will have a specialist section responsible for scheduling the overall system. The project team should work with them to establish a suitable schedule. With smaller, local systems, it will often be the sole responsibility of the project team.
Factors may include:
- processing needs of the new system and its users
- requirements for routine housekeeping, backups, file reorganisation etc
- dependencies with feeder systems
- capability of the computer to handle concurrent loads (including demands from other systems running at the same time)
- required availability of the system for real-time usage
- feasibility of running batch updates and real-time processing simultaneously
- cyclical and special run requirements (eg month end, year end)
- normal and peak run times for the various processing cycles.
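The dependency factor in particular lends itself to a small worked example. The sketch below assumes an invented set of runs and dependencies and uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later) to derive an order in which every run follows the runs it depends on. It ignores timing, load and availability windows, which a real schedule would also have to satisfy.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical nightly cycle: each run maps to the runs it must follow.
DEPENDENCIES: Dict[str, Set[str]] = {
    "feeder_interface": set(),
    "main_batch": {"feeder_interface"},
    "backup": {"main_batch"},
    "reports": {"main_batch"},
    "start_realtime": {"backup"},
}


def run_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return one valid order in which each run follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(DEPENDENCIES)
print(order)
```

Runs with no dependency between them (here, the backup and the reports) may appear in either order, which is where considerations such as concurrent load and peak run times come into play.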
Disaster Recovery
Disaster planning should cover any system that is significant to the successful operation of the organisation. The plan should identify the vital requirements to recover the system in the event that the normal facilities have become unusable. Routine procedures should be put in place to ensure that the plan could be operated successfully if required. Considerable thought and preparation may be required to ensure that the plan is foolproof. It has been found that disaster recovery plans almost always fail unless they have been tested.
Some key factors to consider are:
- Access to appropriate replacement equipment (frequently using a specialist bureau, or similar equipment with adequate capacity at another “friendly” user organisation, possibly on a reciprocal basis).
- Access to telecommunications lines to connect into office networks as required.
- Access to basic system software, configured correctly to run the applications, eg Transaction Processing and database environments properly set up.
- Access to the software to run the applications
- Access to parameter files, databases etc.
- Access to a recent copy of master file data and transaction data.
- Method of identifying lost transactions when the system is restarted from the backup data (eg reprocessing of forms or log files if available)
- Access to any special media, special stationery etc.
- Details of how to contact key personnel required to set up the system
- Secure off-site storage for the items required to set up the system. Note that the location should not suffer the same risks, eg it should be physically isolated, should not be on the banks of the same river etc
- Lists showing what these items are and how to access them
- Access security setup for the emergency system
- Accommodation for vital staff at the backup site
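The point above about identifying lost transactions can be sketched as follows, assuming that a transaction log survives the disaster and records when each transaction was accepted. The record layout, identifiers and timestamps are hypothetical; where no log is available, the equivalent check would be made against source forms.

```python
from datetime import datetime
from typing import List, NamedTuple


class LoggedTxn(NamedTuple):
    """Hypothetical log record: transaction identifier and time accepted."""
    txn_id: str
    logged_at: datetime


def transactions_to_reprocess(
    log: List[LoggedTxn], backup_taken_at: datetime
) -> List[LoggedTxn]:
    """Return logged transactions that are newer than the restored backup."""
    return [txn for txn in log if txn.logged_at > backup_taken_at]


# Illustrative data only.
backup_time = datetime(2024, 1, 15, 2, 0)
log = [
    LoggedTxn("T1001", datetime(2024, 1, 14, 17, 30)),
    LoggedTxn("T1002", datetime(2024, 1, 15, 9, 5)),
    LoggedTxn("T1003", datetime(2024, 1, 15, 9, 40)),
]

for txn in transactions_to_reprocess(log, backup_time):
    print(txn.txn_id)
```

With this data the two transactions logged after the backup (T1002 and T1003) are the ones that would need to be re-entered or replayed.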
Commercially available services usually provide facilities at short notice, funded by a number of potential users, the argument being that it is very unlikely that two organisations would have a disaster at the same time. Typically, the contract would include limited use of the facility to test the procedures.
There are two main types of recovery system:
- Hot Restart - the full system is kept ready in a duplicate form and can be operated at very short notice
- Cold Restart - the basic equipment is available, but the client organisation would have to build the full system on it from scratch.
For absolutely critical systems, it may be appropriate to keep a duplicate system running in parallel with the main system. The systems would typically be linked by fibre optics to allow adequate response times.
Documentation of the procedures
The techniques used to record the procedures will vary according to needs. This may have been defined in an earlier Implementation Paper, or it may be constrained by normal practice within the client organisation’s MIS department.
In large, complex environments it will normally be necessary to document operational procedures in a formal and detailed manner. Detail should normally relate to the specific way in which the system has been set up and the specific way in which it should be operated. It is, however, reasonable to assume that the operators are competent to handle routine matters. The documentation produced under siips’ Delivery-IDDI approach is normally, therefore, in the form of a manual.
With a simple stand-alone system, it may be reasonable to rely primarily on the package vendor’s own documentation. This can be supplemented by specific notes as required. The document produced during a “Brief Delivery” approach is usually, therefore, restricted to brief notes detailing the procedures as a supplement to the vendor’s standard documentation.
Where there has been no customisation, it may be possible to rely entirely upon the documentation supplied with the system. Accordingly, this task is frequently not required in “Implement Only” projects.
Conventionally, documentation is still produced in paper form. It is, however, increasingly valuable to prepare documentation in electronic format - particularly where it offers search facilities for the rapid location of required information, and/or “expert” logic to help operators find the appropriate action for a given set of events. One important point, however, is that instructions for fixing faults are of no value if they reside on the broken system.