IT-Operations in Corona Mode

A pandemic raging at full throttle, shutting down society and major parts of business life, is of course a real IT challenge. The challenge grows when the IT in question belongs to a business that basically keeps running while facing a complete shift in its mode of operation.

This leaves the IT department with two major challenges that I want to discuss today:

  1. Organize a resilient and disaster tolerant IT organisation
  2. Adapt to the quick changes in business needs, user needs and user behavior

Organising structure and workflow within an IT operations team in a resilient way is not necessarily an incredibly complex thing today, since many tools and methods should be available anyway. On top of that, it incorporates a couple of measures that I have considered part of a modern, up-to-date IT organisation style for quite a while.

But this rapid shift may result in a fundamental change in behavior, employee perception and well-being.

Isolation or Social Distancing

First of all, define isolation groups and organize office space so that no isolation group meets another. Even within isolation groups, maximize the use of home office arrangements. While setting up isolation groups, it is a very good idea to maximize operational experience in each group, so that every group has all the knowledge necessary to run, fix and implement your major infrastructure and applications.

If you already have a project- and capability-driven IT organisation, this matches parts of your existing structure, and you may have to leverage experience and other traits within the teams. In many more cases you probably have a Tayloristic division of labor built on competency-based teams, which leaves you in big trouble now. You have to split these teams so that the knowledge and capability of each competency stays available in any case.
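Splitting competency teams across isolation groups can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the names, the competencies and the simple round-robin distribution are hypothetical, and a real plan would also weigh experience levels, office space and home office options.

```python
from collections import defaultdict


def split_into_isolation_groups(staff, group_count):
    """Spread the members of each competency over the isolation groups.

    staff is a list of (name, competency) pairs. Members of the same
    competency are placed in consecutive groups, so a competency with at
    least group_count members ends up in every group.
    """
    by_competency = defaultdict(list)
    for name, competency in staff:
        by_competency[competency].append(name)

    groups = [[] for _ in range(group_count)]
    slot = 0
    for competency in sorted(by_competency):
        for name in by_competency[competency]:
            groups[slot % group_count].append((name, competency))
            slot += 1
    return groups


def missing_competencies(groups, required):
    """For each group, list the required competencies nobody in it covers."""
    return [
        sorted(set(required) - {competency for _, competency in group})
        for group in groups
    ]


# Hypothetical example team: two network, two storage, one database admin.
staff = [
    ("Ann", "network"), ("Ben", "network"),
    ("Cem", "storage"), ("Dana", "storage"),
    ("Eli", "database"),
]
groups = split_into_isolation_groups(staff, group_count=2)
gaps = missing_competencies(groups, ["network", "storage", "database"])
print(gaps)  # [[], ['database']]
```

In this made-up team the second group has nobody for databases. A gap like that shows where a competency rests on a single person, so someone has to be cross-trained or that person needs special protection.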

With the isolation paradigm set, still try to create even smaller groups, keeping in mind that these groups are not meant to meet each other. For example, within the whole set of capabilities, divide front office and back office. If they have to be partially on premises, give them separate office spaces. In particular, if one group has customer contact (external as well as internal), separate it from people who have no need for physical contact.

On top of that, identify critical personnel who need to be protected by all means and therefore remain isolated from the rest of the staff at all times.

And needless to say, private “social distancing” is a good precaution for the whole workforce as well.

Reprioritization

Besides the inevitable business changes, there will probably be a review of cash flow planning, and just as inevitably the IT department will probably be asked to deliver its share of financial savings to maneuver the company through these difficult times.

To prepare decision-making for this purpose, there are probably four categories of projects that are more or less universal and may be applied to the ongoing IT activities.

1. Priority A covers projects and tasks that are inevitable and in high demand even in crisis mode, e.g. in the case of social distancing: enabling home-office users, increasing mobile workforce capacities and meeting the growing demand for network communication. These projects may even need unplanned funding, but they are required to keep the business running at the best possible level. In the best case they are already ongoing and funded; otherwise they put an even higher burden on the cash-flow situation. Nevertheless, they are necessary to keep revenue and cash flow at the best possible levels. Logistics escalation efforts will probably be needed to get the critical components online.

2. Priority B covers projects that are still of strategic value and for which most of the investment has already been made. There is little sense in stopping these projects, since they still provide substantial value to the business, particularly for its recovery after the outages. In a pandemic crisis there will probably be delays in supply chains and in the delivery of needed project materials. Projects in this category may be postponed or used to control workload, since they are either not time-critical or cannot be implemented anyway due to missing components.

3. Priority C covers projects that will not yet be stopped but will be postponed significantly, to remove stress from the time budget and cash flow. They should be postponed in a timely manner, keeping the planning intact. Projects qualify for this category if their investments have not yet started or if they have limited immediate business impact.

4. Priority D, finally, covers projects that are long-term or without any immediate impact. Their financial and time budget can be saved during a crisis anyway, and the corresponding business value can be realized in the medium term at the earliest. These projects may be postponed right away to next year or even later, and in some cases may be canceled completely.
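To make the four categories easier to apply to a project list, here is a rough Python sketch. The field names, the yes/no simplification and the example projects are my own assumptions for illustration; in reality each of these questions is a judgment call made together with the business.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    crisis_critical: bool         # needed right now to keep the business running
    strategic_value: bool         # still of strategic value to the business
    investment_mostly_done: bool  # most of the money is already spent
    near_term_impact: bool        # at least limited business impact soon


def classify(project):
    """Map a project to priority A-D, checked from most to least urgent."""
    if project.crisis_critical:
        return "A"  # keep going, even with unplanned funding
    if project.strategic_value and project.investment_mostly_done:
        return "B"  # do not stop, but use it to control workload
    if project.near_term_impact:
        return "C"  # postpone significantly, keep the planning intact
    return "D"      # postpone to next year or cancel


# Hypothetical portfolio, only to show the mechanics.
portfolio = [
    Project("Remote access capacity", True, True, False, True),
    Project("ERP rollout", False, True, True, True),
    Project("Intranet redesign", False, False, False, True),
    Project("Archive migration study", False, False, False, False),
]
priorities = {p.name: classify(p) for p in portfolio}
print(priorities)
```

Because the classification is just a function of a few answers, it is cheap to run it again whenever the answers change.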

Frequent Reprioritization

This project prioritization should be reviewed and reassessed frequently, since crisis demands may change over the course of time and the business impact develops very dynamically, as do the delivery and team situations. By any measure, a pandemic is one of the most dynamic situations there can be, and this demands frequent reassessment of the current situation and of IT operational progress.

Adapt Communication

The two previous points, creating isolation groups and permanent reprioritization, put even more stress on the staff. The external structure that an average work environment typically provides vanishes overnight in the home office situation. The teams themselves have to communicate more frequently to align workflows and tasks, which IT operations is used to, since it works in an incident-driven mode anyway. But the loss of the implicit clarifications of across-the-aisle communication, and of the implicit structure that breaks, the commute and interaction with colleagues give to the workday, weighs on every single colleague. Managers and team leads therefore need to reach out and check in on individual and personal needs as well as on project or task progress.

In part, IT operations is forced here to adopt work modes typically known from the agile world of software development. It nevertheless suffers from the large share of tasks that are not driven by work packages and steadily disrupt the individual work schedule, whereas work packages are exactly what helps agile software development organize the progress of a project.

Technical Adjustments

Assuming the IT operations team is properly reorganized, it now has to face a massive shift in the required application environment. Massive shifts in the workforce and adaptions of work procedures hit the IT operations team all at once.

High Demand for Remote Access Workspaces

In pandemic situations, a massive shift to home offices typically happens all of a sudden. There is a whole toolset that allows remote work on different levels and under different prerequisites.

1. The Offline Worker

The basic offline worker has everything needed on an isolated system and can work self-sufficiently, with all applications and data at hand offline. Communication is not integrated and probably happens on the mobile phone, perhaps the home landline, or not at all. The scope of work is limited, and results are not exchanged with colleagues and the business in an electronic or application-based manner until the employee returns to the office.

The disadvantages of this work mode are numerous: online applications are not available, synchronizing data on return to the office may produce many data conflicts, and the communication deficits hurt badly in integrated workflows. If there are tools for checking work packages out and in, collaborative work within the team is blocked for all locked resources for the duration of the crisis.

2. The Nonconnected Online Worker

This type of user basically has most data and applications at hand and works similarly to the previous colleague, but uses online tools for interaction with colleagues and customers. Popular methods may be email access in the style of Outlook Anywhere or Outlook Web Access (OWA), Skype for Business through external reverse proxies or federation gateways, and perhaps even SharePoint web frontends for data exchange.

The user's system is largely independent of central resources, which allows independent work while still taking advantage of many modern means of communication. Issues with data synchronization remain, but this work pattern very often matches people in communication-intensive jobs, and there the web-based applications definitely provide advantages: they are robust, easy to use and take load off otherwise critical systems.

3. The VPN Connected Online Worker

This user type basically has the same work setup as in the office, connecting the workplace to the internal company network through an encrypted private tunnel. In terms of organisation, communication and applications, there is no difference from the standard office environment.

Unfortunately, typical IPsec connections have some downsides as soon as you enter the world of application integration. VoIP telephony integration and the database-connected frontends of many business applications (the so-called fat clients) typically suffer from the latency added by the encryption and decryption of the network connection, which results in poor application performance, whatever that means in the respective environment.

Additional issues may arise from the use of printers or from other aspects of the private network connection.
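A bit of back-of-the-envelope arithmetic shows why fat clients in particular react so badly to added latency. The sketch below assumes a client that waits for each database round trip before sending the next one. The numbers (200 round trips, 1 ms in the office, 30 ms added through the tunnel) are purely hypothetical and not measurements.

```python
def screen_load_seconds(round_trips, lan_rtt_ms, added_latency_ms=0.0):
    """Waiting time for a screen that issues its round trips one by one.

    Every round trip pays the office round-trip time plus whatever
    latency the remote connection adds on top.
    """
    return round_trips * (lan_rtt_ms + added_latency_ms) / 1000.0


# Hypothetical fat client that needs 200 sequential queries per screen.
in_office = screen_load_seconds(round_trips=200, lan_rtt_ms=1.0)
via_tunnel = screen_load_seconds(round_trips=200, lan_rtt_ms=1.0,
                                 added_latency_ms=30.0)
print(f"office: {in_office:.1f} s, tunnel: {via_tunnel:.1f} s")
# office: 0.2 s, tunnel: 6.2 s
```

The added latency is multiplied by the number of round trips, so a chatty application that feels instant in the office can become painful at home even though the bandwidth is perfectly fine.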

4. The Virtualized Workplace Worker

Today, in most cases the best approach is to use an entirely virtualized work environment, either through dedicated virtual desktops as in VDI systems or through presented desktop sessions provided by corresponding server environments.

Whichever approach you choose, all of them integrate the work environment into the central resources, close to application servers and corporate data storage, so overall this provides a very seamless work experience. The network transports only display information from the server environment to the user, and user input in return. Network requirements are quite predictable, and data and application integration are not an issue.

The server-based virtual desktop is typically a very convenient way to provide standard office applications to hundreds or thousands of users. Here, many users share virtual instances of the underlying server and its operating system frontend. All typical communication methods can be integrated, as in any other office environment. These installations typically even allow the use of individual applications without starting an entire desktop environment.

Applications that are heavy on hardware consumption, e.g. in terms of memory, compute and above all graphics, are not entirely suitable for the shared operating system; for them, dedicated virtual desktops are the better fit. Besides resource use, bad application design and programming that prohibits starting more than one application process per system may require the dedicated approach as well. Here, basically every user gets a dedicated virtual machine, taken from a pool of system instances prepared for the application and started with a reserved set of resources on the underlying physical system.

The good news is that the most common solutions from Citrix and VMware provide unified access methods that leave you free to choose the usage scenario based on current application needs. In most cases, management, deployment and maintenance are provided by a unified platform environment.

Data Locality

The decision which access method to use depends on the level of preparation and on its suitability for the work to be done. In nearly every remote application scenario there are issues with data locality and with how the underlying applications handle it. For example, non-synchronized data results in broken business processes. Distributed data that is permanently synchronized back and forth through encryption gateways results in poor performance. In the worst case, applications don’t prevent data corruption and continue with wrong information, corrupting even more data …

So for any integrated application, data locality has to be checked and the appropriate access method chosen.
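The choice between the four worker types can be written down as a small decision function. This is a deliberately simplified sketch: the four yes/no questions and the caveat texts are my own condensation of the trade-offs described above, not a complete decision model.

```python
from enum import Enum


class AccessMethod(Enum):
    OFFLINE = 1            # 1. The Offline Worker
    ONLINE_TOOLS = 2       # 2. The Nonconnected Online Worker
    VPN = 3                # 3. The VPN Connected Online Worker
    VIRTUAL_WORKPLACE = 4  # 4. The Virtualized Workplace Worker


def choose_access_method(needs_central_data, uses_fat_clients,
                         virtual_workplace_ready, needs_online_communication):
    """Return (method, caveats) for one group of users.

    All four parameters are hypothetical yes/no simplifications.
    """
    caveats = []
    if needs_central_data:
        # Work close to the data if the platform exists, else tunnel in.
        if virtual_workplace_ready:
            return AccessMethod.VIRTUAL_WORKPLACE, caveats
        if uses_fat_clients:
            caveats.append("fat clients may perform poorly through the tunnel")
        return AccessMethod.VPN, caveats
    # Data stays local, so it has to be reconciled later.
    caveats.append("local data must be reconciled on return to the office")
    if needs_online_communication:
        return AccessMethod.ONLINE_TOOLS, caveats
    return AccessMethod.OFFLINE, caveats


# Hypothetical group using a database fat client, no virtual desktops yet.
method, caveats = choose_access_method(
    needs_central_data=True, uses_fat_clients=True,
    virtual_workplace_ready=False, needs_online_communication=True)
print(method.name, caveats)
```

The first question is always where the data lives. Everything else follows from whether the work has to happen close to the central data or can live with local copies.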

Adaption in Applications

If everything else fails, even the applications themselves may be adjusted to work with different frontends, different access patterns, modified data caching settings and many more methods. That leads to too many options to cover, so I will spare you the details here.

Nevertheless, be aware that during heavy short-term shifts in application access patterns, all kinds of changes in resource needs or application behavior may arise instantly and require immediate action.

Preparations and Clients

To be absolutely clear: although many of the organisational changes can be implemented in a couple of days, the more demanding the technical side gets, the harder it will be to do the same there.

In the best case, the organisation has enough mobile workplaces, some spares in stock and perhaps only a couple of accessories missing. The necessary adjustments can be covered quickly, and that’s it. If, on the other hand, many users have fixed workstations or thin clients, the necessary procurement may be hard to accomplish during a pandemic crisis. Many other companies are trying to overcome the same shortcomings at a time when supply chains start to fall apart and logistics gets more difficult day by day. Besides, the investments should follow a clear client strategy, since otherwise these emergency investments create a long-lasting support burden and jeopardize many client automation efforts. The more desperate the situation gets, the harder it will be to find a good solution that is usable in the long term. And keep in mind that you actually want to remove stress from the cash flow.

The infrastructure, on the other hand, is an entirely different story. Online solutions like 2 and 4 need months or years of planning and appropriate setup. You may implement quick fixes, but with them you buy cleanup burdens. Then again, appropriate data integrity is definitely a valuable asset and should be weighed in the decision. Solution 1 is only appropriate if there is no need for data integrity; in return, it does not need any infrastructure at all. Solution 3 can be set up on short notice, even from scratch and even on a small budget, but again, it has its performance downsides. So an environment that is prepared at least to some level makes sense in any case, so that in an emergency resources can be added but no fundamentally new work needs to be done.

On top of that, consultants and suppliers for this type of infrastructure are probably suffering from the same pandemic, which again delays procurement and logistics.

The Cloud

In all that probable desperation, a move into the cloud will no doubt provide quick and easy-to-use solutions. Again, this needs careful consideration of data locality if it is supposed to work conveniently. There will probably be results on short notice, but without any confidence about how and where your company's information assets will end up in the long term. Cloud access costs or, even worse, cloud extraction costs are often only noticed when it's too late and you have to pay to get access to your own data assets.

Long Term Strategy

So, to avoid short-term decisions during such a massive crisis, it makes sense to invest in platform strategies in good time, so that you can respond quickly in such an emergency.

Kyp. F.

P.S. This is pretty much a follow-up on my team's current actions during the Corona (COVID-19) crisis. We had almost everything in place to manage a 150% increase in home-office usage in about two weeks or less, including global heavy mechanical design with CAD and manufacturing control.
We had to procure capacity extensions for several resource pools, and we had to redesign several solution aspects since they had to scale beyond our previous imagination.
But overall we had to touch nothing that we hadn’t already planned for the short or medium term, and we were able to follow existing plans for 95% of the work. This worked better than at any company we or our suppliers knew of in our surroundings. The feedback from our business units, at least from those that followed our recommendations, was exceptionally positive. Some of them implemented the transition to emergency business operations in a mere two days.
