Top Ad unit 728 × 90

Recent

recent

Disaster management - crisis management in Banks -- for PO Interview

Role of HR in the BCP context
a) Crisis Management Team: As a core member of the CMT, HR provides guidance to team on people-related issues, including evacuation, welfare, whether to invoke the HR incident line, alternative travel arrangements and what to communicate to staff.
b) HR Incident Line: Operated from within the centralised HR function, the incident helpline is invoked in those instances, where there are possible casualties or missing staff, as a result of an incident. Invoked by the CMT, the line is manned by qualified HR officers trained in how to deal with distressed callers. The staff may be provided with an emergency card, which includes the incident line number. Information on the hotline is updated on a regular basis. The facility enables line managers to keep the central crisis team up to speed on the whereabouts and well-being of staff. Ongoing welfare and support for staff is also provided via an employee assistance provider.
c) Exceptional Travel arrangements: Transportation plans should be considered in the event of the need to relocate. Key staff need to be identified including details of where they are located, and vehicles are on standby to transport them if required.
8. Technology Aspects of BCP
The are many applications and services in banking system that are highly mission critical in nature and therefore requires high availability, and fault tolerance to be considered while designing and implementing the solution. This aspect is to be taken into account especially while designing the data centre solution and the corporate network solution.
Data Recovery Strategies
Prior to selecting a data recovery (DR) strategy, a DR planner should refer to their organisation's BCP, which should indicate key metrics of recovery point objectiveand recovery time objectivefor business processes:
Recovery Point Objective (RPO)–The acceptable latency of data that will be recovered
Recovery Time Objective (RTO)–The acceptable amount of time to restore the function
Recovery Point Objective must ensure that the Maximum Tolerable Data Loss for each activity is not exceeded. TheRecovery Time Objective must ensure that the Maximum Tolerable Period of Disruption (MTPD), for each activity, is not exceeded. The metrics specified for the business processes must then be mapped to the underlying IT systems and infrastructure that support those processes. Once, RTO and RPO metrics have been mapped to the IT infrastructure, the DR planner can determine the most suitable recovery strategy for each system. An important note here, however, is that the business ultimately sets the IT budget. Therefore, RTO and RPO metrics need to fit with the available budget and the critical of the business process/function.
A List of Common Strategies for Data Protection:
  • Backups made to tape and sent off-site at regular intervals (preferably daily)
  • Backups made to disk on-site and automatically copied to off-site disk, or made directly to off-site disk
  • Replication of data to an off-site location, which overcomes the need to restore the data (only the systems then need to be restored or synced). This generally makes use of storage area network (SAN) technology
  • High availability systems that keep both data and system replicated, off-site, enabling continuous access to systems and data
In many cases, an organisation may elect to use an outsourced disaster recovery provider to provide a stand-by site and systems rather than using their own remote facilities. In addition to preparing for the need to recover systems, organisations must also implement precautionary measures with an objective of preventing a disaster in the first place. These may include some of the following:
  • Local mirrors of systems or data. Use of disk protection technology such as RAID
  • Surge protectors—to minimise the effect of power surges on delicate electronic equipment
  • Uninterrupted power supply (UPS) or backup generator to keep systems going in the event of a power failure
  • Fire preventions—alarms, fire extinguishers
  • Anti-virus software and security measures
A disaster recovery plan is a part of the BCP. It dictates every facet of the recovery process, including:
  • What events denote possible disasters;
  • What people in the organisation have the authority to declare a disaster and thereby put the plan into effect;
  • The sequence of events necessary to prepare the backup site once a disaster has been declared;
  • The roles and responsibilities of all key personnel with respect to carrying out the plan;
  • An inventory of the necessary hardware and software required to restore production;
  • A schedule listing the personnel that will be staffing the backup site, including a rotation schedule to support ongoing operations without burning out the disaster team members.
A disaster recovery plan must be a living document; as the data centre changes, the plan must be updated to reflect those changes.
It is to be noted that the technology issues are a derivative of the Business Continuity plan and Management.
For example, BCP and Management will lead to the Business Impact Analysis, which will lead to the Performance Impact Analysis (PIA). That will depend on the Technology Performance of the total IT Solution Architecture.
To amplify business impact analysis is to identify the critical operations and services, key internal and external dependencies and appropriate resilience levels. It also analysis the risks and quantify the impact of those risks from the point of view of the business disruptions. For example, in order to provide state of the art customer services both at the branch level and the delivery channels we need to take into account the services levels that are committed.
If an ATM transaction has to take place in 10 seconds and cash withdrawal or deposit has to take place in 60 seconds at the counter, then based on the load one can compute the number of customers who can be serviced in a day. The above example is to understand the fact that the business latency introduced by the system is a combination of technology, process and people. Therefore, the technical latency is a derivative of the committed business latency and the technology solution architecture has to deliver the same under varying loads.
Technology Solution Architecture to address specific BCM requirements are:
  • Performance
  • Availability
  • Security and Access Control
  • Conformance to standards to ensure Interoperability
Performance of the technology solution architecture for operations needs to be quantified. It should be possible to measure, as and when required, the quantified parameters. (For example, if the latency for a complex transaction initiated at the branch has to be completed in four seconds under peak load, it should be possible to have adequate measuring environments to ensure that performance degradations have not taken place due to increasing loads.)
Solution architecture has to be designed with high-availability, and no single point of failure. It is inevitable that a complex solution architecture with point products from different sources procured and implemented at different points in time will have some outage once in a while and the important issue is that with clearly defined  SLAs, mean time to restore, it should be possible to identify the fault and correct the same without any degradation in performance.
Accordingly, with respect to the performance and availability aspects the following architectures have to be designed and configured to provide high levels of up time round the clock to ensure uninterrupted functioning.
Summation of the required processes:
– Data centre solution architecture 
– DR solution architecture 
– Near site solution architecture 
– Enterprise network and security architecture 
– Branch or delivery channel architecture
– Based on the above observation, banks are required to do the following: Take up the performance and availability audit of the solutions deployed to ensure that the architecture is designed and implemented with no single point of failure.
– Audit the deployed architecture for all the mission critical applications and services and resolve the concerns that arise in a time bound manner.
– Periodically investigate the outages that are experienced from time to time, which are mini disasters that result in non availability of services for a short span of time, systems not responding when transactions are initiated at the branch level, delivery channels not functioning for a brief period of time to ensure that the customer service is not affected.
– Ensure availability of appropriate technology solutions to measure and monitor the functioning of products. And, have competent and capable technical people within the system to resolve issues expeditiously. (Issues relating to manpower training needs are further elaborated at Annex-B)
The issues detailed above have to be borne in mind while finalising the data centre architecture and the corporate network architecture which are expected to have redundancy built in the solution with no single point of failure.
With reference to the network architecture it is recommended that the Banks built in redundancies as under:
  • Link level redundancy
  • Path level redundancy
  • Route level redundancy
  • Equipment level redundancy
  • Service provider level redundancy
Issues in choosing a backup site and implementing a DC or DR solution:
Backup site : Is a location where an organisation can easily relocate following a disaster, such as fire, flood, terrorist threat or other disruptive event. This is an integral part of the disaster recovery plan and wider business continuity planning of an organisation. A backup site can be another location operated by the organisation, or contracted via a company that specialises in disaster recovery services. In some cases, an organisation will have an agreement with a second organisation to operate a joint backup site.
There are three main types of backup sites:
  • cold sites
  • warm sites 
  • hot sites
Differences between them are determined by costs and effort required to implement each. Another term used to describe a backup site is a work area recovery site.
1. Cold Sites : A cold site is the most inexpensive type of backup site for an organisation to operate. It does not include backed up copies of data and information from the original location of the organisation, nor does it include hardware already set up. The lack of hardware contributes to the minimal start up costs of the cold site, but requires additional time following the disaster to have the operation running at a capacity close to that prior to the disaster.
2. Hot Sites : A hot site is a duplicate of the original site of the organisation, with full computer systems as well as near-complete backups of user data. Real-time synchronisation between the two sites may be used to mirror the data environment of the original site, using wide area network links and specialised software. Following a disruption to the original site, the hot site exists so that the organisation can relocate with minimal losses to normal operations. Ideally, a hot site will be up and running within a matter of hours or even less. Personnel may still have to be moved to the hot site so it is possible that the hot site may be operational from a data processing perspective before staff has relocated. The capacity of the hot site may or may not match the capacity of the original site depending on the organisation's requirements. This type of backup site is the most expensive to operate. Hot sites are popular with organisations that operate real time processes such as financial institutions, government agencies and ecommerce providers
3. Warm Sites : A warm site is, quite logically, a compromise between hot and cold. These sites will have hardware and connectivity already established, though on a smaller scale than the original production site or even a hot site. Warm sites will have backups on hand, but they may not be complete and may be between several days and a week old. An example would be backup tapes sent to the warm site by courier.
8.1 The following issues arise in choosing a back up site and implementing a DC/DR solution:
1. Solution architectures of DC and DR are not identical for all the applications and services. Critical applications and services, namely the retail, corporate, trade finance and government business solutions as well as the delivery channels are having the same DR configurations whereas surround or interfacing applications do not have the DR support. Banks will have to conduct periodical review with reference to the above aspect and upgrade the DR solutions from time to time and ensure that all the critical applications and services have a perfect replica in terms of performance and availability.
2. The configurations of servers, network devices and other products at the DC and DR have to be identical at all times. This includes the patches that are applied at the DC periodically and the changes made to the software from time to time by customization and parameterization to account for the regulatory requirements, system changes etc.
3. Periodic checks with reference to ensuring data and transaction integrity between DC and DR are mandatory. It could be done over the week end or as a part of the EoD / BoD process.
4. Solutions have to have a defined Recovery Time Objective (RTO) and Recovery Point Objective (RPO) parameter. These two parameters have a very clear bearing on the technology aspects as well as the process defined for cut over to the DR and the competency levels required to move over in the specified time frame.
5. Values chosen for the RTO and RPO is more to follow the industry practice and not derived from first principles. Therefore, the DR drills that are conducted periodically have to ensure that the above parameters are strictly complied with.
6. Technology operations processes which support business operations (such as EOD/ BOD) need to formally included into the IT Continuity Plan.
7. Banks may also consider Recovery Time Objective and Recovery Point Objectives (RTO/ RPO) for services being offered and not just a specific application. For example--for internet portal and not retail banking. This is done to avoid any inconsistency in business users understanding.
6. DR drills currently conducted periodically come under the category of planned shutdown. Banks have to evolve a suitable methodology to conduct the drills which are closer to the real disaster scenario so that the confidence levels of the technical team taking up this exercise is built to address the requirement in the event of a real disaster.
7. It is also recommended that the support infrastructure at the DC and DR, namely the electrical systems, air-conditioning environment and other support systems have no single point of failure and do have a building management and monitoring system to constantly and continuously monitor the resources. If it is specified that the solution has a high availability of 99.95 measured on a monthly basis and a mean time to restore of 2 hrs in the event of any failure, it has to include the support system also.
8. Data replication mechanism followed between DC and DR is the asynchronous replication mechanism and implemented across the industry either using database replication techniques or the storage based replication techniques. They do have relative merits and demerits. The RTO and RPO discussed earlier, along with the replication mechanism used and the data transfer required to be accomplished during the peak load will decide the bandwidth required between the DC and the DR. The RPO is directly related to the latency permissible for the transaction data from the DC to update the database at the DR. Therefore, the process implemented for the data replication requirement has to conform to the above and with no compromise to data and transaction integrity.
9. Given the need for drastically minimizing the data loss during exigencies and enable quick recovery and continuity of critical business operations, banks may need to consider near site DR architecture. Major banks with significant customer delivery channel usage and significant participation in financial markets/payment and settlement systems may need to have a plan of action for creating a near site DR architecture over the medium term (say, within three years).
To address these issues, RBI may consider:
  • Stipulating a periodic DR exercise with clearly defined ground rules for the same. IT recovery tests are also required to realistically reflect the worst case scenario where all critical systems must be restored concurrently.
  • Sending out detailed questionnaires to the banks based on the questionnaire that is issued by CPSS-IOSCO.
  • Carrying out system wide stress testing simulating various scenarios, elaborated in the next section.
8.2 Issues/Challenges in DC/DR implementation by the Banks
  1. Despite considerable advances in equipment and telecommunications design and recovery services, IT disaster recovery is becoming challenging. Continuity and recovery aspects are impacting IT strategy and cost implications are challenging IT budgets.
  2. The time window for recovery is shrinking in face of the demand for 24 / 365 operations. Some studies claim that around 30 percent of high-availability applications have to be recovered in less than three hours. A further 45 percent within 24 hours, before losses become unsustainable; others claim that 60 percent of Enterprise Resource Planning (ERP) Systems have to be restored in under 24 hours. This means that traditional off-site backup and restore methods are often no longer adequate. It simply takes too long to recover incremental and full image backups of various inter-related applications (backed up at different times), synchronise them and re-create the position as at disaster. Continuous operation–data mirroring to off-site locations and standby computing and telecommunications–may be the only solution.
  3. A risk assessment and business impact analysis should establish the justification for continuity for specific IT and telecommunication services and applications.
  4. Achieving robust security (security assurance) is not a onetime activity. It cannot be obtained just by purchasing and installing suitable software and hardware. It is a continuous process that requires regular assessment of the security health of the organisation and proactive steps to detect and fix any vulnerability. Every bank should have in place quick and reliable access to expertise for tracking suspicious behavior, monitoring users and performing forensics. Adequate reporting to the authorities concerned – such as the RBI/ IDRBT/CERT-In and other institutions should be an automatic sub process whenever such events occur.
Important steps that need to be institutionalised are the following:
a) Rigorous self-assessment of security measures by banks and comprehensive security audit by external agencies, as detailed under the “Chapter on Information Security” earlier.
b) Random Security Preparedness. It is proposed that a sufficiently large ``question bank'' related to security health of the organization be prepared and given to RBI's inspection teams who go for inspection of banks. A random subset of these queries could then be given to a bank’s IT team for which answers need to be provided in near real time. Sample checks related to user accounts could be the number of new accounts, terminated accounts, most active accounts. There could also be demonstrations of data recovery from archives.
(e) Telecommunications issues may also arise: It is important to ensure that relevant links are in place and that communications capability is compatible. The adequacy of voice and data capacity needs to be checked. Telephony needs to be switched from the disaster site to the standby site. A financial institution’s BCP should consider addressing diversity guidelines for its telecommunications capabilities. This is particularly important for the financial services sector that provides critical payment, clearing, and settlement processes; however, diversity guidelines should be considered by all financial institutions and should be commensurate with the institution’s size, complexity, and overall risk profile. Diversity guidelines may include arrangements with multiple telecommunications providers. However, diverse routing may be difficult to achieve since primary telecommunications carriers may have an agreement with the same sub-carriers to provide local access service, and these sub-carriers may also have a contract with the same local access service providers. Financial institutions do not have any control over the number of circuit segments that will be needed, and they typically do not have a business relationship with any of the sub-carriers. Consequently, it is important for financial institutions to understand the relationship between their primary telecommunications carrier and these various sub-carriers and how this complex network connects to their primary and back-up facilities. To determine whether telecommunications providers use the same sub-carrier or local access service provider, banks may consider performing an end-to-end trace of all critical or sensitive circuits to search for single points of failure such as a common switch, router, PBX, or central telephone office.
(f) Banks may consider the following telecommunications diversity components to enhance BCP:
  1. Alternative media, such as secure wireless systems
  2. Internet protocol networking equipment that provides easily configurable re-routing and traffic load balancing capabilities
  3. Local services to more than one telecommunications carrier’s central office, or diverse physical paths to independent central offices
  4. Multiple, geographically diverse cables and separate points of entry
  5. Frame relay circuits that do not require network interconnections, which often causes delays due to concentration points between frame relay providers
  6. Separate power sources for equipment with generator or uninterrupted power supply back-up
  7. Separate connections to back-up locations
  8. Regular use of multiple facilities in which traffic is continually split between the connections; and
  9. Separate suppliers for hardware and software infrastructure needs.
(g) Banks need to monitor their service relationship with telecommunications providers: In order to manage the inherent risks more effectively. In coordination with vendors, management should ensure that risk management strategies include the following, at a minimum:
– Establish service level agreements that address contingency measures and change management for services provided;

– Ensure that primary and back-up telecommunications paths do not share a single point of failure

– Establish processes to periodically inventory and validate telecommunications circuits and routing paths through comprehensive testing.
(h) Some vendors offer a drop-ship service as an alternative to occupying the standby site. That is, in the event of equipment failure, for instance, they will drop off a replacement rather than insist the client occupy the standby site, with all the inconvenience that may involve. But it is essential that a site survey is undertaken to ensure they can be parked on the required site. Most commercial standby sites offering IT and work area recovery facilities do not guarantee a service: the contract merely provides access to the equipment. Although most reputable vendors will negotiate a Service Level Agreement that specifies the quality of the service, it is rarely offered.
It is important to ensure that a bank’s service will not suffer from unacceptable downtime or response. The vendor may have skilled staff available – but this is rarely guaranteed and they come at a cost. In terms of cost, there may be additional fees to pay for testing, on invocation of a disaster, and for occupation in a disaster. The vendor charging structure also needs to be carefully considered.
  1. Outsourcing Risks: In theory a commercial hot or warm standby site is available 24 / 365. It has staff skilled in assisting recovery. Its equipment is constantly kept up to date, while older equipment remains supported. It is always available for use and offers testing periods once or twice a year. The practice may be different. These days, organizations have a wide range of equipment from different vendors and different models from the same vendor. Not every commercial standby site is able to support the entire range of equipment that a bank may have. Instead, vendors form alliances with others – but this may mean that a bank’s recovery effort is split between more than one standby site. The standby site may not have identical IT equipment: instead of the use of an identical piece of equipment, it will offer a partition on a compatible large computer or server. Operating systems and security packages may not be the same version as the client usually uses. These aspects may cause setbacks when attempting recovery of IT systems and applications – and weak change control at the recovery site could cause a disaster on return to the normal site.
Among the questions banks need to consider are:
  • Is the vendor financially sound?
  • If the recovery site is occupied when a bank want to invoke the same, where and how the bank does the recovery?
  • How does the vendor define―disaster– on what conditions the bank can invoke?
  • How quickly can the bank occupy the recovery site on invocation?
  • How much will the annual subscription cost?
  • How much are invocation fees?
  • How much will it cost to test?
  • How much testing time is allowed?
  • Can the vendor personally cover the full range of our equipment and telecommunication needs now?
  •  If not, how will these needs be met?
  •  Does the vendor have standby generators and uninterruptible power supply adequate to maintain the whole installation?
  •  Does the vendor have alternate telecommunication suppliers with separate routing?
  •  Will the vendor keep in step when the bank buys new equipment?
  •  Will the vendor support aging equipment for as long as the bank would need it?
  •  Can a bank’s equipment at the vendor site?
  •  Will the vendor drop-ship small equipment at my site to save me having to relocate to the recovery site in the event of hardware failure or loss of a single component?
  •  If so, will they charge extra or is this included in the annual subscription?
  •  Is the location of the standby site safe for staff?
  •  Is the recovery site convenient for public transport?
  •  Does it have rest, shower and catering facilities for staff?
  •  Does it have adequate parking space?
  •  Is the site secure, and will the bank’s data remain confidential?
  •  What are the qualifications and skills of the vendor‘s support staff?
  •  Are they certified as members of the professional bodies like DRII or BCI?
  •  Will the vendor‘s support staff help the bank recover? If so, how many?
  •  Are there sufficient vendor staff to handle multiple invocations?
  •  Will the vendor‘s support staff help the bank to test?
  •  Does the recovery site agreement contain a Service Level Agreement specifying availability, reliability and performance?
  •  Does the institution have in place quick and reliable access to expertise for tracking suspicious behavior, monitoring users and performing forensics?
  •  Is there a system of automatic reporting to the authorities concerned – such as the RBI/ IDRBT/ CERT-In and other institutions whenever such events take place?
Most standby site vendors provide sound service at reasonable cost and are genuinely dedicated to assisting their clients. They have an enviable record of successful recoveries. But, as in any industry, there are a few unscrupulous suppliers. It is the responsibility of the IT manager to ensure effective recovery by those vendors, who apply the highest standards, supporting this by a stringent contract, clearly defining service specifications and technical requirements, and service-level agreements.
INDUSTRY-WIDE BCP RECOMMENDATIONS
A holistic BCP would incorporate all dimensions of the banking industry including the financial authorities (Reserve Bank of India), the financial institutions (banks) and the financial market infrastructure. Industry level coordination for contingency planning and management efforts of the individual institutions in the area of operational risk is critical to strengthen the operational resilience of the Indian financial system.
Most important industry-wide recommendations are:
  • Establishing an industry-wide alarm and crisis organisation (in which the key market participants and the most important providers of infrastructure services are represented. The heads of BCP from the participating institutions can make up the top level of this crisis organisation, with the lower levels forming a network between those responsible for the areas of liquidity, large-value payments, retail payment transactions and IT. Any of the institutions can invoke the alarm organisation by activating the level affected).
  • website for industry-wide BCP related information for the benefit of constituents of the industry can be considered.
  • Reviewing the extent to which the RBI and the Individual banks, can act on behalf of one another in exceptional situations like:
– Proving funds to other banks customers
– Waving charges over other banks ATM usage
– Honoring cheques of other banks
  • Intensifying contacts with the telecommunications and IT Infrastructure providers to the Industry
  • Examining the extent to which institutions can provide reciprocal support in the event of a crisis
  • Banks may consider allowing customers of one bank to use ATM networks of other banks for cash withdrawals with charges being borne by the parent bank.
  • Banks may consider waiving off penalties to be levied on delay of in-payments of Treasury deals.
  • Banks may consider making a agreement wherein in need of BCP a participatory Bank will accept request (for a treasury deal / forex transaction/ fixed deposits/ loan ) upto a pre-agreed limit on email/ communication basis and accept the required Contract Note/ Promissory Note / Mortgage/ other legal documents at a later stage.
  • Based on the above Industry wide recommendations and BCP testing scenarios, the Industry as a whole should look at conducting a BCP drill on a periodic basis to ensure that the BCP plans and measures are updated and effective individually as well as industry wide business continuity. Such initiatives should be taken up by the Industry wide consortium as proposed at the start of this section. For example the testing could include a BCP scenario where a particular city/ processing hub is unavailable for a day. This would involve a calling for a BCP measure by all banks, government organizations, support service providers like Telecom companies, Infrastructure providers, etc.
  • The Industry driven alarm and crisis management team as discussed above should ensure that the Industry wide plans are formulated post consultation with all stake holders, these plans are implemented, tested and updated periodically with the changing Industry scenario.
  • There are programmes in the US like the Telecommunications Service Priority System (TSPS), Government Emergency Telecommunications service (GETS) and Wireless Priority Service Programme (WPS) for provision of priority telecom availability and recovery services during exigencies for critical infrastructures and institutions. Similarly, Government of India may declare banking sector including financial markets as critical infrastructure and consider instituting such special measures for enabling conduct of critical banking services and market transactions during exigencies.
BCP Considerations for Systematically Important Payment Systems:
  • Payment systems are essential for smooth transferring of funds amongst banks and buyers / payers. In times of disaster important payment channels also act as mediums of transmitting shocks across the Industry. These payment systems (also termed as Systematically Important Payment Systems) become critical for industry wide BCP considerations.
  • Banks should consider giving special consideration to systemically critical process and systems like clearing, settlement, custody and payment processing (RTGS/ NEFT)
  • This may involve identification of the core markets (e.g., money markets, government securities, foreign exchange, commercial paper, equities, and derivatives) and essential functions supporting these markets (e.g., trading, brokering, transaction execution)
  • Banks should consider it essential to identify the highest level of operational resilience that will be required to ensure basic functioning i.e. minimum level service of these processes so that BCP is functional for the Industry involving all major financial institutions. Banks should decide on alternatives, incase a BCP has to be invoked:
– Switching over to an alternative payment site
– Shifting time slots for RTGS and NEFT payments
– Sending these request to an agreed fall over ‘Service Provider’
– Having a manual process planned which will take care of critical processing over phone, fax, etc form the alternative site
– Having duplicate hardware ready which can take care of limited requirements
  • For all of the above the current systems should have a mechanism in place to facilitate overriding the queue of requests, so as to prioritise processing to those identified as critical/important.
  • Banks should consider having BCP requirements of these systems explicitly mentioned in Vendor Contracts, where applicable.
  • Banks should consider streamlining its processes to International Standards from ISO, IEC and BSI for strengthening the payment system processes and ensuring a higher level of Business continuity.
  • Banks should consider having a multi-skill team from across the Industry trained and ready to take care of these processes with increased focus/concentration. As a BCP measure banks should consider having a dedicated and trained team ready to handle these processes at all times.
The above guidelines for Infrastructure aspects will play a critical role in co-coordinating the effort with help of service providers and industry participants. Hence, as described in above, similar considerations must be given to not just the payment system but:
    • Underlying Infrastructure (Telecom and Internet providers and use of dial up when leased line fails)
    • Support staff
    • Secondary process and alternative service providers
  • Banks should also consider having a collateral or security pool arrangement for scenarios of extreme BCP, which can be accepted Industry wide.
  • Banks should also consider putting to test the fall over plan for payment systems in the above mentioned Industry wide considerations.
ANNEXURE:

Annexure B : Suggested training needs to management IT infrastructure
KEY RECOMMENDATIONS
  1. A bank’s Board has ultimately responsibility and oversight over the business continuity planning activity of a bank. The Board approves the Business Continuity policy of the bank.A bank’s Senior Management is responsible for overseeing the business continuity planning process which inter-alia includes determining how the institution will manage and control identified risks, prioritising critical business functions, allocating knowledgeable personnel and sufficient financial resources to implement the BCP.
  2. A senior official needs to be designated as the Head of BCP activity/function.
  3. Since electronic banking has functions which are spread across more than one department, it is necessary that each department understands its role in the plan and the support required to maintain the plan. In case of disaster, each department has to be prepared for the recovery process aimed at protection of the critical functions. To this end, it would be helpful if a set up like the BCP Committee is charged with the implementation of the BCP in an eventuality and all departments expected to fulfill their respective roles in a co-ordinated manner. Hence, a BCP/Crisis Management Committee consisting of senior officials from various departments like HR, IT, Legal, Business functions, Information Security needs to be instituted
  4. There needs to be adequate teams for various aspects of the BCP at Central Office level as well as individual Zonal/Controlling Office and branch level, as required. Among the various teams that can be considered, based on need, include incident response team, emergency action and operations team, team from particular business function, damage Assessment team, IT teams for hardware, software, network support, Supplies team, team for organizing logistics, relocation team, administrative support team, coordination team
  5. Banks should consider various BCP methodologies and standards, like BS 25999 as inputs for their BCP framework.
  6. BCP should include measures to identify and reduce probability of risk to limit the consequences of damaging incidents and enable the timely resumption of essential operations. BCP should amongst others, consider reputation, operational, financial, regulatory risks.
  7. Failure of critical systems, or interruption of vital business processes, could prevent timely recovery of operations. Therefore, banks must fully understand the vulnerabilities associated with interrelationships between various systems, departments, and business processes. These vulnerabilities should be incorporated into the Business Impact Analysis, which analyses the correlation between system components and the services they provide.
  8. People aspect should be an integral part of a BCP. Generally, plans are often too focused on the technical issues, therefore, it is suggested that a separate section relating to people should be incorporated, including details on staff welfare, counseling, relocation considerations, etc.
  9. Pandemic planning needs to be incorporated as part of BCP framework of banks.
  10. Banks must regularly test BCP to ensure that they are up to date and effective. Testing of BCP should include all aspects and constituents of the Bank i.e. People, Processes and resources (including Technology).
  11. They should involve their Internal Auditors (including IS Auditors) to audit the effectiveness of BCP and its periodic testing as part of their Internal Audit work and their findings/ recommendations in this regard should be incorporated in their report to the Board of Directors and Senior management
  12. The institutions should also consider having a BCP drill planned alongwith the critical third parties in order to provide services or support to continue with pre-identified minimal required processes.
  13. Banks should periodically move their operations (including people, processes and resources(IT and Non-IT)) to the planned fall-over /DR site in order to test the effectiveness of the BCP and also gauge the recovery time needed to bring operations to normal functioning. Banks should also perform the above test without movement of bank personnel to the DR site. This will help in testing the readiness of alternative staff at the DR site.
  14. BCP should be maintained by annual reviews and updates to ensure their continued effectiveness
  15. Banks should also consider having unplanned BCP drill, wherein only a restricted set of people and certain identified personnel may be aware of the drill and not the floor/business personnel.
  16. Detailed requirements relating to procedural, infrastructural and HR related aspects of BCP have been provided so that banks can improve BCP processes helping generate best outcomes.
  17. Requirements in respect of various types of testings like table-top, call tree, simulation, component and complete have been indicated.
  18. The are many applications and services in banking system that are highly mission critical in nature and therefore requires high availability and fault tolerance to be considered while designing and implementing the solution. This aspect is to be taken into account especially while designing the data centre solution and the corporate network solution.
  19. The solution architectures of DC and DR are not identical for all the applications and services. Critical applications and services, namely the retail, corporate, trade finance and government business solutions as well as the delivery channels are having the same DR configurations whereas surround or interfacing applications do not have the DR support. Banks will have to conduct periodical review with reference to the above aspect and upgrade the DR solutions from time to time and ensure that all the critical applications and services have a perfect replica in terms of performance and availability.
  20. Configurations of servers, network devices and other products at the DC and DR have to be identical at all times. This includes the patches that are applied at the DC periodically and the changes made to the software from time to time by customisation and parameterisation to account for the regulatory requirements, system changes etc etc.
  21. Periodic checks with reference to ensuring data and transaction integrity between DC and DR is mandatory. This could be accomplished by doing the same during lean periods such as the week end or as a part of the EoD / BoD process. .The report on such conformity could be submitted to the in-charge of the BCP/DR of the banks on a regular periodical basis and deviations if any, promptly addressed.
  22. Values chosen for the RTO and RPO often mimic the industry practice and are not derived from first principles. Therefore, the DR drills that are conducted periodically have to ensure that the above parameters are strictly complied with. It would be optimal to get the RTO and RPO outlines checked by the IS Audit and further ratified by obtaining customer / user feedback for the first time. Thereafter a random check of these could be done to ensure that these are in tune with the requirements and / or expectation of customer as well as user of the systems
  23. DR drills currently conducted periodically come under the category of planned shutdown. Banks have to evolve a suitable methodology to conduct the drills which are closer to the real disaster scenario so that the confidence levels of the technical team taking up this exercise is built to address the requirement in the event of a real disaster.
  24. It is to be ensured that the support infrastructure at the DC and DR, namely the electrical systems, air-conditioning environment and other support systems do not have a single point of failure and do have a building management and monitoring system to constantly and continuously monitor the resources. The monitoring of uptime has to be made as per the requirements and agreements with the respective vendors. The same requirements have to be taken care of in case the DC/DR set up is in an outsourced location or a common shared set up as well.
  25. Success of a BCP depends on the effective data replication mechanism followed between DC and DR, which is again directly related to the requirements of the banks. The process implemented for the data replication requirement has to conform to this with no compromise to data and transaction integrity and should ensure seamless resumption of operations to the maximum extent possible. This should be conformed to in the DR simulations and reported accordingly to the Top Management as well.
  26. Given the need for drastically minimising the data loss during exigencies and enable quick recovery and continuity of critical business operations, banks may need to consider near site DR architecture. Major banks with significant customer delivery channel usage and significant participation in financial markets/payment and settlement systems may need to consider having a plan of action for creating a near site DR architecture over the medium term (say, within three years).
  27. Banks should set in place standardised reporting templates for informing senior management on weaknesses identified through testing or other means, development of action plans, allocation of needed resources, and follow-up reviews to ensure that remedial actions have been effective. Reports should be based on various security metrics typically obtained by systematic and focused log analysis as indicated in information security chapter.
  28. A sufficiently large “question bank”, related to security health of the organisation, should be prepared and given to RBI's inspection teams. A random subset of these queries could then be given to the bank's IT or security teams and related personnel, for eliciting answers in quick time.
  29. An industry-wide alarm and crisis forum or organisation (in which the key market participants and the most important providers of infrastructure services are represented) may be established. BCP heads from the participating institutions can make up the top-level of this organisation, with the lower levels forming a network between those responsible for the areas of liquidity, large-value payments, retail payment transactions and IT. Any of the institutions can invoke the alarm organisation by activating the level affected. Various recommendations relating to measures banks can consider during exigencies have been provided in the report.
  30. A website for industry-wide BCP-related information for the benefit of constituents of the industry can be considered.
  31. There are programmes in the US–Telecommunications Service Priority System (TSPS), Government Emergency Telecommunications service (GETS) and Wireless Priority Service Programme (WPS)–for the provision of priority telecom availability and recovery services during exigencies. Similarly, Government of India may declare banking sector, including financial markets, as critical infrastructure and consider instituting such special measures enabling conduct of critical banking services and market transactions during exigencies
http://www.rbi.org.in/scripts/PublicationReportDetails.aspx?ID=622
Disaster management - crisis management in Banks -- for PO Interview Reviewed by sambasivan srinivasan on 5:49:00 PM Rating: 5

No comments:

All Rights Reserved by Bank Exams © 2009
Technology PartnerNiralcube

Contact Form

Name

Email *

Message *

Powered by Blogger.