OPS Processes

Mind Map: OPS Processes, by OPS Processes

1. Triage Process

1.1. prioritization

1.1.1. highest priority and emergency request

1.1.1.1. Risk Impact

1.1.1.1.1. Patient impact

1.1.1.2. Business Priority

1.1.1.2.1. CIP goals

1.1.1.3. highest cost

1.1.1.4. Management Priority

1.1.1.5. No Last minute requests

1.1.1.5.1. Zoom chat

1.1.1.5.2. notifications

1.1.1.6. Developer Access

1.1.1.6.1. issues known later

1.1.2. Emergency Criteria

1.1.2.1. Patient Impact

1.1.2.2. SLA not met

1.1.2.3. Losing $

1.1.2.4. Disaster Recovery risks not met

1.1.2.5. critical application

1.1.2.5.1. e.g. UMA

1.1.2.5.2. Core Services

1.1.3. Business Criteria

1.1.3.1. Cost

1.1.3.2. reputation

1.1.4. group responsibility

1.1.4.1. Marketing

1.1.4.1.1. uptime

1.1.4.1.2. ability to recover

1.1.4.2. Product team

1.1.4.3. Engineering team

1.1.4.4. Master Contacts List

1.1.4.4.1. Org/role/responsible

1.1.5. Product & Functional Requirements

1.1.5.1. monitoring requirements

1.1.5.1.1. in Confluence

1.1.5.1.2. ACCS

1.1.6. Alert/Notification Tools

1.1.6.1. VictorOps

1.1.6.1.1. Phone Alert - 24/7

1.1.6.1.2. SRE/OPS Manager Configures workflows

1.1.6.1.3. Gathers information from multiple sources into one central location and sends only one alert

1.1.6.1.4. Sends alerts by role, priority, time of day, and escalation path

1.1.6.1.5. API calls to Nagios, Splunk, Network tools

1.1.6.2. Nagios

1.1.6.2.1. thresholds

1.1.6.3. Splunk

1.1.6.4. Stealthwatch

1.1.6.5. Network tools

1.1.6.5.1. Netbrain

1.1.6.5.2. Orion

1.1.6.6. Cloud Tools

1.1.6.6.1. AWS Service Health Dashboard

1.1.6.6.2. Azure Tools

1.1.6.6.3. Not on-prem: managed service

1.1.6.7. Links to corporate email and cell phones: sends alerts as emails, text messages, and phone calls
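
The emergency criteria under 1.1.2 can be read as a checklist. The following is a minimal Python sketch, assuming that any single criterion is enough to qualify a request as an emergency (the mind map does not say whether the criteria combine as any-of or all-of). The `Request` class and its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical triage request; the field names are illustrative only."""
    patient_impact: bool = False
    sla_not_met: bool = False
    losing_money: bool = False
    dr_risk_not_met: bool = False        # disaster recovery risks not met
    critical_application: bool = False   # e.g. UMA or Core Services


def is_emergency(req: Request) -> bool:
    """Return True if at least one emergency criterion from 1.1.2 applies.

    Assumption: the criteria are treated as any-of.
    """
    return any((
        req.patient_impact,
        req.sla_not_met,
        req.losing_money,
        req.dr_risk_not_met,
        req.critical_application,
    ))
```

For example, `is_emergency(Request(sla_not_met=True))` evaluates to `True`, while a request with no criteria set does not qualify.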

1.2. Responsibility Role (dedicated or assigned on a rotation)

1.3. Used Across OPS org

1.3.1. any team working on issues and service changes

1.4. Assigned Based on Expertise

1.4.1. Software

1.4.2. System

1.4.3. Change Type

1.4.3.1. Errors

1.4.3.2. Bug fixes

1.4.3.3. Corrections

1.4.4. Level of Outage based on level of ability and authority

1.4.4.1. Level 1 - On-call engineer

1.4.4.2. Level 2 - Senior Eng to Vet

1.4.4.3. Level 3 - OPS Manager to Vet

1.4.4.3.1. Pulls in outside resources

1.5. Authority to Make Changes
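
The three outage levels in 1.4.4 form an escalation ladder. A minimal sketch of that ladder in Python follows. The enum and function names are hypothetical, and the one-step-at-a-time escalation is an assumption; the mind map only lists the levels and who vets at each.

```python
from enum import IntEnum
from typing import Optional


class OutageLevel(IntEnum):
    """Levels from 1.4.4; the member names are illustrative."""
    ONCALL_ENGINEER = 1   # Level 1
    SENIOR_ENGINEER = 2   # Level 2, vets the issue
    OPS_MANAGER = 3       # Level 3, vets and may pull in outside resources


def next_level(current: OutageLevel) -> Optional[OutageLevel]:
    """Return the next level up, or None when already at the top.

    Assumption: escalation moves exactly one level at a time.
    """
    if current is OutageLevel.OPS_MANAGER:
        return None
    return OutageLevel(current + 1)
```

Escalating from `OutageLevel.ONCALL_ENGINEER` yields `OutageLevel.SENIOR_ENGINEER`; escalating from `OutageLevel.OPS_MANAGER` yields `None`, at which point outside resources would be the remaining option.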

2. Assessing Risk Process - Risk Impact and Risk Scales

2.1. 1

2.1.1. low

2.1.1.1. Does not bring down any services

2.2. 2

2.3. 3

2.3.1. med

2.3.1.1. Devices serve other networks

2.3.1.1.1. indirect impact

2.4. 4

2.5. 5

2.5.1. high

2.6. Criteria

2.6.1. serviceability

2.6.1.1. Downtime of robot/website client applications

2.6.2. Business Risks

2.6.2.1. reputation

2.6.2.2. cost

2.6.2.3. duration of outage downtime

2.6.3. Scope of Impact to Applications/Servers

2.6.3.1. Database maintenance: 200 servers (affecting developers only rather than external customers) vs. 10 servers (with medical devices on them)

2.6.3.1.1. highest for external customer impact

2.6.4. low risk for cosmetic changes
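
The 1-to-5 risk scale in section 2 labels only three points: 1 is low, 3 is med, and 5 is high. A minimal Python sketch of that scale follows. Describing 2 and 4 as falling between their neighbours is an assumption, since the mind map leaves them unlabeled, and the names `RISK_LABELS` and `describe_risk` are hypothetical.

```python
# Labels given in the mind map; scores 2 and 4 carry no label there.
RISK_LABELS = {1: "low", 3: "med", 5: "high"}


def describe_risk(score: int) -> str:
    """Return a label for a risk score on the 1-5 scale.

    Assumption: unlabeled scores (2 and 4) are described as lying
    between the labeled scores on either side.
    """
    if score not in range(1, 6):
        raise ValueError(f"risk score must be 1-5, got {score}")
    if score in RISK_LABELS:
        return RISK_LABELS[score]
    return f"between {RISK_LABELS[score - 1]} and {RISK_LABELS[score + 1]}"
```

Under this reading, a cosmetic change (2.6.4) would score 1, and a change to devices that serve other networks (2.3.1.1) would score 3.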

3. Notification

3.1. notification change process

3.1.1. realtime data

3.1.1.1. inventory database

3.1.1.1.1. dev system

3.1.1.1.2. Infrastructure inventory system

3.1.1.2. Configuration Management

3.1.1.2.1. CMDB

3.2. email

3.3. What gets notified

3.3.1. anything that affects the HW/SW environment

3.4. dashboard

3.5. status meetings

3.5.1. Network status provided (including emergencies)

3.6. Jira

3.7. ServiceNow

3.7.1. inventory management module

3.7.2. Stakeholder Calendar reminder invite

3.7.2.1. defines stakeholder impact activities

3.7.2.2. pre-notification prior to change approval

3.7.3. dashboard

3.7.3.1. Slack

3.7.3.1.1. logs change event

3.7.3.2. integrate VictorOps

3.7.3.2.1. future change

3.7.4. Calendar dashboard (Outlook)

3.8. Confluence

3.9. Centralized Location

3.9.1. Network changes

3.9.2. Database Changes

3.9.3. Development Changes

4. Non-FDA-regulated quick changes

4.1. Networking Shared process

4.1.1. turn off switch Port

4.1.2. Open Firewall

4.1.3. VIP Builds

4.2. removing processes - in SNow

4.3. streamlined Intake Form - SNow

4.4. Quick streamlined non approval changes

4.4.1. out of scope for QMS changes and product changes, UMA-Part 11

4.5. in scope - SNow

4.5.1. add more servers to existing server pool

4.5.1.1. DVMT supporting apps

4.5.1.1.1. Supporting applications - adding resources to a server pool (already existing and tested), e.g. DVMT Parsers

4.5.2. Risk Impact (1): minimal or none

4.6. out of scope

4.6.1. emergency change
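
The scope rules in section 4 can be condensed into one eligibility check. This minimal Python sketch assumes a change qualifies for the quick path only when it is not an emergency, is not FDA-regulated (QMS, product, or UMA Part 11 changes), and has a Risk Impact of 1. The function name and parameters are hypothetical.

```python
def qualifies_for_quick_change(
    risk_impact: int,
    is_emergency: bool,
    fda_regulated: bool,
) -> bool:
    """Return True if a change may use the quick, non-approval path.

    Assumptions drawn from section 4:
      - emergency changes are out of scope (4.6.1)
      - QMS, product, and UMA Part 11 changes are out of scope (4.4.1)
      - only Risk Impact 1 (minimal or none) is in scope (4.5.2)
    """
    if is_emergency or fda_regulated:
        return False
    return risk_impact == 1
```

Adding servers to an existing, already tested pool (4.5.1) would be called as `qualifies_for_quick_change(1, False, False)`, which returns `True`.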

5. Reporting Process

5.1. Status tool

5.1.1. integrated to Slack

5.2. summarize notifications

5.3. Communication

5.3.1. for unforeseen support needs

5.3.2. Awareness of Ongoing/Current Issues

5.4. data for process improvements

5.5. Tracking issue closure

5.6. Data for Service Performance

5.7. Reporting for on-time delivery

5.8. Snow Reporting Dashboard

5.8.1. one dashboard

5.8.1.1. different tabs

5.8.1.1.1. ongoing changes

5.9. Snow Reporting Email Notifications

6. Change Review Board Process

6.1. deny

6.2. approve

6.3. hold

6.4. request more info

6.5. Representative Quorum

6.6. Central location of upcoming change

6.7. CAB Required?

6.7.1. filter: clarify the request, answer questions, confirm ownership, make additional changes missed in UAT, fill out the Intake Form or elaborate on it

6.7.1.1. Move to CAB after approval

6.7.1.1.1. Review and Approve prior to CAB meeting if no concerns - (task(s) in Snow)

6.7.1.1.2. concerns

6.7.1.2. Time limit to prepare for CAB mtg

6.7.2. Risk Assessed, Rollback Plan required

6.7.2.1. Confluence

6.7.2.1.1. risk assessment finalized? SNow Risk Assessment approved

6.7.3. fast streamlined process

6.7.3.1. Change IP address

6.7.3.2. Update DNS- no hardcoded names or IP addresses

6.7.3.3. no cascading effect

6.7.3.4. add additional resources

6.7.3.4.1. double servers

6.7.3.5. turn on Network Switch Port

6.7.3.6. Open Firewall

6.7.3.7. Build F5 VIP

6.7.3.8. open-ended additional field
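
The "CAB Required?" decision in 6.7 and the four board outcomes in 6.1 to 6.4 can be sketched as follows. This is a minimal Python illustration: the change-type identifiers are hypothetical names for the items listed under 6.7.3, and treating any cascading effect as forcing a CAB review is an assumption based on 6.7.3.3.

```python
from enum import Enum


class CabDecision(Enum):
    """Possible board outcomes (6.1 to 6.4)."""
    DENY = "deny"
    APPROVE = "approve"
    HOLD = "hold"
    REQUEST_MORE_INFO = "request more info"


# Hypothetical identifiers for the streamlined change types in 6.7.3.
STREAMLINED_CHANGES = frozenset({
    "change_ip_address",
    "update_dns",
    "add_resources",
    "turn_on_switch_port",
    "open_firewall",
    "build_f5_vip",
})


def cab_required(change_type: str, cascading_effect: bool) -> bool:
    """Return True if the change must go to the CAB.

    Assumption: a change skips the CAB only when its type is on the
    streamlined list and it has no cascading effect.
    """
    return cascading_effect or change_type not in STREAMLINED_CHANGES
```

With these assumptions, `cab_required("open_firewall", False)` is `False`, while the same change with a cascading effect, or any change type not on the list, goes to the board.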

7. Snow Jira Workflow

7.1. Production Change?

7.1.1. fill out Requirements Form/Intake Form

7.1.1.1. Streamlined Process

7.1.1.2. Assigned to CAB for review

7.1.1.2.1. CAB Approval Task & Notification

7.1.2. Owner Manager approval task to move into CAB

7.2. Dev/UAT-Assigned to Engineer

7.2.1. existing Jira & Snow ticket

7.2.2. apply best practice guideline
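
The branching in section 7 can be written out as an ordered list of steps. This is a rough Python sketch under stated assumptions: the step strings paraphrase the nodes above, the function name is hypothetical, and placing the owner manager approval before the CAB review is a reading of 7.1.2 ("to move into CAB"), not something the mind map states as a sequence.

```python
from typing import List


def workflow_steps(is_production: bool, streamlined: bool = False) -> List[str]:
    """Return the workflow steps for a change, following section 7.

    Assumption: non-production (Dev/UAT) work bypasses the intake form
    and CAB, and goes straight to an engineer.
    """
    if not is_production:
        return [
            "assign to engineer",
            "use existing Jira and SNow ticket",
            "apply best practice guideline",
        ]
    steps = ["fill out Requirements/Intake Form"]
    if streamlined:
        steps.append("follow streamlined process")
    else:
        steps.extend([
            "owner manager approval task",
            "assign to CAB for review",
            "CAB approval task and notification",
        ])
    return steps
```

A production change that is not on the streamlined list therefore passes through four steps, ending with the CAB approval task and notification.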

8. Emergency fixes

8.1. Not for unplanned changes that have become critical because of a failure to plan properly

8.2. Authoritative Request

8.3. Detrimental Bug

8.4. Business Reputation