OPS Processes

Mind Map: OPS Processes, by OPS Processes

1. Triage Process

1.1. Prioritization

1.1.1. Highest priority and emergency requests: Risk Impact; Patient impact; Business Priority; CIP goals; highest cost; Management Priority; no last-minute requests; Zoom chat notifications; developer access issues known later

1.1.2. Emergency Criteria: Patient Impact; SLA not met; losing $; Disaster Recovery risks not met; critical application, i.e. UMA Core Services

1.1.3. Business Criteria: cost; reputation

1.1.4. Group responsibility: Marketing; uptime; ability to recover; Product team; Engineering team; Master Contacts List (org/role/responsible)

1.1.5. Product & Functional Requirements: monitoring requirements in Confluence; ACCS

1.1.6. Alert/Notification Tools. Victor Ops: phone alert, 24/7; SRE/OPS Manager configures workflows; gathers information from multiple sources into one central location and sends only 1 alert; sends alerts to different roles, priorities, and times, with an escalation path; API calls to Nagios, Splunk, and network tools. Nagios: thresholds. Splunk. Stealth Watch. Network tools: Netbrain; Orion. Cloud tools: AWS Service Health Dashboard; Azure tools. Not on-prem: managed service; links to corporate emails and cell phones and sends alerts as emails, text alerts, or phone calls.
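The consolidation behaviour noted for the alerting tool (gather from several sources, send only one alert, route by role and priority) could be sketched roughly as below. This is a plain-Python illustration, not the Victor Ops API; the `Alert` fields, the `ROUTES` table, the role names, and the convention that priority 1 is highest are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str    # monitoring source, e.g. "nagios" or "splunk" (illustrative labels)
    service: str   # affected service
    priority: int  # assumed convention: 1 is the highest priority


# Hypothetical routing table: priority -> (role, channel)
ROUTES = {
    1: ("on-call engineer", "phone call"),
    2: ("on-call engineer", "text"),
    3: ("ops team", "email"),
}


def consolidate(alerts):
    """Collapse alerts about the same service into one notification each."""
    merged = {}
    for alert in alerts:
        entry = merged.setdefault(
            alert.service, {"priority": alert.priority, "sources": []}
        )
        # Keep the most urgent priority seen for this service.
        entry["priority"] = min(entry["priority"], alert.priority)
        if alert.source not in entry["sources"]:
            entry["sources"].append(alert.source)

    notifications = []
    for service, entry in merged.items():
        # Unknown priorities fall back to an email to the ops team.
        role, channel = ROUTES.get(entry["priority"], ("ops team", "email"))
        notifications.append(
            {
                "service": service,
                "priority": entry["priority"],
                "sources": entry["sources"],
                "role": role,
                "channel": channel,
            }
        )
    return notifications
```

With two sources reporting on the same service, `consolidate` returns a single notification for that service at the more urgent of the two priorities.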

1.2. Responsibility Role (dedicated or assigned on a rotation)

1.3. Used Across OPS org

1.3.1. Any team working on issues and service changes

1.4. Assigned Based on Expertise

1.4.1. Software

1.4.2. System

1.4.3. Change Type: errors; bug fixes; corrections

1.4.4. Level of Outage, based on level of ability and authority: Level 1, on-call engineer; Level 2, Senior Engineer to vet; Level 3, OPS Manager to vet, pulls in outside resources
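The three outage levels can be written down as a small lookup, sketched below. The level-to-role mapping comes from the outline; treating escalation as cumulative (each level also involves the levels beneath it) is an assumption, and `escalation_chain` is a hypothetical helper name.

```python
# Level -> responsible role, as listed in the outline.
ESCALATION = {
    1: "On-call engineer",
    2: "Senior engineer (vets)",
    3: "OPS manager (vets; outside resources if needed)",
}


def escalation_chain(outage_level):
    """Return the roles involved up to and including the given level.

    Assumption: escalation is cumulative, so a Level 3 outage still
    involves the Level 1 and Level 2 roles.
    """
    if outage_level not in ESCALATION:
        raise ValueError(f"unknown outage level: {outage_level!r}")
    return [ESCALATION[level] for level in range(1, outage_level + 1)]
```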

1.5. Authority to Make Changes

2. Assessing Risk Process: Risk Impact and Risk Scales

2.1. 1

2.1.1. Low: does not bring down any services

2.2. 2

2.3. 3

2.3.1. Medium: devices serve other networks; indirect impact

2.4. 4

2.5. 5

2.5.1. High

2.6. Criteria

2.6.1. Serviceability: robot/website; client applications; downtime

2.6.2. Business risks: reputation; cost; how long an outage or downtime lasts

2.6.3. Scope of Impact to Applications/Servers: database maintenance on 200 servers (affects developers only vs external customers) vs 10 servers (has med. devices on it); highest rating for external customer impact

2.6.4. Low risk for cosmetic changes
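The 1 to 5 scale and its criteria can be illustrated with a short scoring sketch. Only scores 1, 3, and 5 carry labels in the outline, so the function below returns just those three; the parameter names and the rule order (external customer impact outranks indirect impact) are assumptions for illustration, not a defined scoring policy.

```python
# Labels the outline attaches to the scale; 2 and 4 are left unlabeled there.
RISK_LABELS = {1: "low", 3: "medium", 5: "high"}


def assess_risk(*, external_customer_impact=False, indirect_impact=False):
    """Return a risk score on the 1-5 scale.

    Assumed rule order:
      5 - external customers are affected (highest impact)
      3 - impact is indirect, e.g. devices that serve other networks
      1 - no services are brought down (includes cosmetic changes)
    """
    if external_customer_impact:
        return 5
    if indirect_impact:
        return 3
    return 1


def describe(score):
    """Format a score with its label, if the outline defines one."""
    if score not in range(1, 6):
        raise ValueError(f"score must be 1-5, got {score!r}")
    label = RISK_LABELS.get(score, "unlabeled")
    return f"{score} ({label})"
```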

3. Notification

3.1. Notification change process

3.1.1. Real-time data: inventory database; dev system; infrastructure inventory system; Configuration Management (CMDB)

3.2. Email

3.3. What gets notified

3.3.1. Anything that affects the HW/SW environment

3.4. Dashboard

3.5. Status meetings

3.5.1. Network status provided (including emergencies)

3.6. Jira

3.7. ServiceNow

3.7.1. Inventory management module

3.7.2. Stakeholder calendar: reminder invite; defines stakeholder impact activities; pre-notification prior to change approval

3.7.3. Dashboard: Slack logs change event; integrate Victor Ops; future change

3.7.4. Calendar: dashboard; Outlook

3.8. Confluence

3.9. Centralized Location

3.9.1. Network changes

3.9.2. Database Changes

3.9.3. Development Changes

4. Non-FDA-regulated quick changes

4.1. Networking shared process

4.1.1. Turn off switch port

4.1.2. Open firewall

4.1.3. VIP builds

4.2. Removing processes in Snow

4.3. Streamlined intake form (Snow)

4.4. Quick, streamlined non-approval changes

4.4.1. Out of scope for QMS changes and product changes, UMA-Part 11

4.5. In scope (Snow)

4.5.1. Add more servers to an existing server pool: DVMT supporting apps. Supporting applications: adding additional resources to a server pool (already existing and tested), e.g. DVMT Parsers

4.5.2. Risk Impact (1): minimal or none

4.6. Out of scope

4.6.1. Emergency change
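The scope rules for quick changes can be expressed as a single eligibility check, sketched below. The conditions (not FDA regulated, not an emergency, Risk Impact of 1, a known low-risk change type) follow the outline; the `ChangeRequest` fields and the exact strings in `QUICK_CHANGE_TYPES` are hypothetical and would need to match whatever the intake form actually records.

```python
from dataclasses import dataclass

# Change types the outline lists for the quick path (wording is illustrative).
QUICK_CHANGE_TYPES = {
    "turn off switch port",
    "open firewall",
    "vip build",
    "add servers to existing pool",
}


@dataclass
class ChangeRequest:
    change_type: str
    risk_impact: int             # 1-5 risk scale
    fda_regulated: bool = False  # QMS / product changes are out of scope
    emergency: bool = False      # emergency changes are out of scope


def is_quick_change(request):
    """True when the request qualifies for the quick, non-approval path."""
    return (
        not request.fda_regulated
        and not request.emergency
        and request.risk_impact == 1
        and request.change_type.strip().lower() in QUICK_CHANGE_TYPES
    )
```

For example, `is_quick_change(ChangeRequest("Open Firewall", 1))` is `True`, while the same request flagged as an emergency is not eligible.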

5. Reporting Process

5.1. Status tool

5.1.1. Integrated with Slack

5.2. Summarize notifications

5.3. Communication

5.3.1. For unforeseen support needs

5.3.2. Awareness of Ongoing/Current Issues

5.4. Data for process improvements

5.5. Tracking issue closure

5.6. Data for Service Performance

5.7. Reporting for on-time delivery

5.8. Snow Reporting Dashboard

5.8.1. One dashboard; different tabs; ongoing changes

5.9. Snow Reporting Email Notifications

6. Change Review Board Process

6.1. Deny

6.2. Approve

6.3. Hold

6.4. Request more info

6.5. Representative quorum

6.6. Central location of upcoming changes

6.7. CAB Required?

6.7.1. Filter: clarify request, questions answered, ownership, make additional changes missed in UAT, fill out the intake form or elaborate on it. Move to CAB after approval. Review and approve prior to CAB meeting if no concerns (task(s) in Snow). Concerns. Time limit to prepare for CAB meeting.

6.7.2. Risk assessed, rollback plan required: Confluence risk assessment finalized?; Snow risk assessment approved

6.7.3. Fast streamlined process: change IP address; update DNS (no hardcoded names or IP addresses, no cascading effect); add additional resources; double servers; turn on network switch port; open firewall; build F5 VIP; open-ended additional field
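The board outcomes and the pre-CAB prerequisites lend themselves to a small checklist sketch. The four `Decision` values are the outcomes listed for the board; the checklist keys in `PRE_CAB_CHECKLIST` and the `pre_cab_gaps` helper are hypothetical names chosen for this example, not fields of any ServiceNow or Jira record.

```python
from enum import Enum


class Decision(Enum):
    """Outcomes the Change Review Board can record for a request."""
    APPROVE = "approve"
    DENY = "deny"
    HOLD = "hold"
    REQUEST_MORE_INFO = "request more info"


# Hypothetical checklist keys for what must be in place before CAB review.
PRE_CAB_CHECKLIST = (
    "intake_form_complete",
    "risk_assessment_approved",
    "rollback_plan_attached",
)


def pre_cab_gaps(request):
    """Return the checklist items a request is still missing.

    `request` is a plain dict; a missing key counts as not done.
    An empty result means the request can move to the CAB meeting.
    """
    return [item for item in PRE_CAB_CHECKLIST if not request.get(item, False)]
```

A request with gaps would typically stay with its owner until the list is empty, which keeps the CAB meeting itself for decisions rather than fact-finding.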

7. Snow Jira Workflow

7.1. Production Change?

7.1.1. Fill out Requirements Form/Intake Form: streamlined process; assigned to CAB for review; CAB approval task & notification

7.1.2. Owner Manager approval task to move into CAB

7.2. Dev/UAT-Assigned to Engineer

7.2.1. Existing Jira & Snow ticket

7.2.2. Apply best-practice guideline
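The branch on "Production Change?" can be sketched as a routing function. The step names paraphrase the outline; the ordering of the production steps and the idea that a streamlined request stops after the intake form are assumptions, and `next_steps` is a hypothetical helper.

```python
def next_steps(environment, streamlined=False):
    """Return the ordered workflow steps for a change in the given environment."""
    env = environment.strip().lower()

    if env == "production":
        steps = ["fill out requirements/intake form"]
        if streamlined:
            # Assumption: streamlined changes skip manager and CAB review.
            steps.append("follow streamlined process")
            return steps
        steps += [
            "owner manager approval task",
            "assigned to CAB for review",
            "CAB approval task and notification",
        ]
        return steps

    if env in ("dev", "uat"):
        return [
            "assign to engineer",
            "use existing Jira and Snow ticket",
            "apply best-practice guideline",
        ]

    raise ValueError(f"unknown environment: {environment!r}")
```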

8. Emergency fixes

8.1. Not unplanned changes that have become critical through failure to plan properly

8.2. Authoritative Request

8.3. Detrimental Bug

8.4. Business Reputation