Bridging the gap between industry and academia in computing science through Chaos Engineering

Yury Niño
Dec 24, 2024


There is a lot of discussion about the gap between industry and academia and about potential ways to close it. Several authors [SPanicker] [Dunne] [Vakaloudis] have documented its causes, among them: different mindsets and goals, since industry requires short-term results whereas academia has a long-term perspective; the level of innovation, since industry prefers low-risk solutions whereas academia is interested in creating new solutions with a high degree of innovation; and cost management, since industry is mainly concerned with costs whereas academia is mainly interested in prestige and recognition.

For this reason, integrating the two requires effort from both sides: academia must produce results in a form that industry can use, and industry must invest in innovation and research. This collaboration makes it possible to identify problems ahead of what industry currently requires, to obtain funding from sources such as government and the finance, energy and digital business sectors, and to open positions for academics and doctoral graduates in each country.

Accordingly, the question for universities is: how can they establish a research plan that satisfies both industry and academic requirements? Several approaches could answer this question, and they need to be addressed one at a time. This article focuses on one of them: applying the scientific method and experimentation to a well-known industrial problem, the development of resilient systems.

The scientific method is a standardized way of making observations, gathering data, forming theories, testing predictions and interpreting results. It applies rigorous, structured methodologies based on characterizing a problem, defining hypotheses and testing predictions through observation and experimentation [Wright].

Some experts realized that the scientific method offered a complete framework for building more resilient systems. They created a new field of study, known as Chaos Engineering, whose purpose is to trigger failures intentionally in a controlled way. These controlled failures allow teams to deal with errors before they occur in production.

The following sections describe Chaos Engineering, its principles and strategies, the challenges it must face and the solutions adopted by the computing industry. They also discuss how universities should begin training chaos engineers, create research groups and pursue extension programs that attract funding from companies interested in deploying more resilient systems.

Chaos Engineering: its history, principles, and practices

A resilient system is a highly available and durable system: it can maintain an acceptable level of service in the face of failure. Chaos Engineering helps build more resilient systems. This branch of engineering is “the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.” The goal of Chaos Engineering is to generate new information about how systems react when their individual components fail. By running experiments, Chaos Engineering makes it possible to study the behaviour of distributed systems and to address their weaknesses proactively, going beyond the reactive processes that currently dominate most incident response models.
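To make “triggering failures intentionally in a controlled way” concrete, here is a minimal Python sketch, not taken from any real chaos tool. It assumes a hypothetical downstream call, `fetch_recommendations`, and wraps it so that a configurable fraction of calls fail on purpose. The `enabled` flag is an illustrative kill switch that keeps the experiment under control, and the fallback in `homepage` is the behaviour being tested.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised in place of a real dependency call when a fault is injected."""


def with_fault_injection(
    call: Callable[[], T],
    failure_rate: float,
    enabled: bool,
    rng: random.Random,
) -> T:
    """Run `call`, but fail it on purpose with probability `failure_rate`.

    `enabled` acts as a kill switch: when False the call always runs
    normally, which keeps the experiment controlled and easy to stop.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if enabled and rng.random() < failure_rate:
        raise InjectedFault("simulated dependency failure")
    return call()


def fetch_recommendations() -> list[str]:
    # Hypothetical downstream dependency, used only for illustration.
    return ["title-1", "title-2"]


def homepage(rng: random.Random, inject: bool) -> list[str]:
    """Degrade gracefully: fall back to a static list if the dependency fails."""
    try:
        # Assumed experiment setting: 30% of dependency calls fail on purpose.
        return with_fault_injection(fetch_recommendations, 0.3, inject, rng)
    except InjectedFault:
        return ["popular-title"]  # fallback keeps the page usable


if __name__ == "__main__":
    rng = random.Random(7)
    pages = [homepage(rng, inject=True) for _ in range(1_000)]
    degraded = sum(page == ["popular-title"] for page in pages)
    print(f"{degraded} of 1000 requests used the fallback")
```

Counting how many requests hit the fallback, and checking that none returned an empty page, is a small instance of generating new information about how a system reacts when one of its components fails.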

Chaos Engineering is an initiative of some of the most important internet companies, pioneers of large-scale distributed systems. These companies had systems so complex that they required a new approach to testing for failure. In 2010, the Netflix Eng Tools team created Chaos Monkey, a tool for ensuring that the loss of an Amazon instance wouldn’t affect the Netflix streaming experience. In 2012, Netflix shared the source code for Chaos Monkey on GitHub, saying that they “have found that the best defense against major unexpected failures is to fail often.” In 2014, Netflix decided to create a new role: the Chaos Engineer. Since October 2014, other companies such as Twilio, LinkedIn, Facebook, Google, Microsoft and Amazon have adopted Chaos Engineering and embraced its principles and methods for building resilient systems.

The following principles describe an ideal application of Chaos Engineering: Build a Hypothesis around Steady State Behavior, Vary Real-world Events, Run Experiments in Production, Automate Experiments to Run Continuously, and Minimize Blast Radius. These principles rest on several foundations: focus on the measurable output of a system rather than its internal attributes; prioritize events either by potential impact or by estimated frequency; consider that systems behave differently depending on their environment and traffic patterns; and automate experiments so that they run continuously.
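Several of these principles can be illustrated with a toy simulation. The sketch below is not a production tool and every name in it is hypothetical: it assumes a service with three replicas and a client that retries once on a different replica. The steady state is the request success rate; the real-world event is the loss of one or more instances; only a small share of traffic (`blast_radius`) goes to the experimental group; and the hypothesis holds if the experimental success rate stays within `tolerance` of the control group.

```python
import random
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy service: each request picks replicas at random and retries once."""
    replicas: int
    down: set[int]
    rng: random.Random

    def handle(self) -> bool:
        # Sample up to two distinct replicas: the first attempt plus one retry.
        attempts = self.rng.sample(range(self.replicas), k=min(2, self.replicas))
        return any(replica not in self.down for replica in attempts)


def success_rate(cluster: Cluster, requests: int) -> float:
    """Steady-state metric: the fraction of requests that succeed."""
    ok = sum(cluster.handle() for _ in range(requests))
    return ok / requests


def run_experiment(
    failed_replicas: int = 1,
    total_requests: int = 10_000,
    blast_radius: float = 0.05,
    tolerance: float = 0.01,
    seed: int = 42,
) -> bool:
    """Return True if the steady-state hypothesis survives the injected failure."""
    rng = random.Random(seed)
    # Minimize blast radius: only a small share of traffic sees the fault.
    exp_requests = max(1, int(total_requests * blast_radius))
    control = Cluster(replicas=3, down=set(), rng=rng)
    # Vary a real-world event: simulate losing `failed_replicas` instances.
    experiment = Cluster(replicas=3, down=set(range(failed_replicas)), rng=rng)
    baseline = success_rate(control, total_requests - exp_requests)
    observed = success_rate(experiment, exp_requests)
    # Hypothesis: the experimental success rate stays within `tolerance` of control.
    return baseline - observed <= tolerance


if __name__ == "__main__":
    print("1 replica lost, hypothesis holds:", run_experiment())
    print("2 replicas lost, hypothesis holds:", run_experiment(failed_replicas=2))
```

With one failed replica, the single retry always reaches a healthy instance, so the hypothesis holds. With two failed replicas, roughly a third of experimental requests fail and the hypothesis is refuted, exposing a weakness to address before it surfaces as an incident.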

Why Do Academia and Industry Need More Resilient Systems?

There are many motivations to build resilient systems: improving uptime, moving fast and keeping systems reliable. Academia and industry need more resilient systems not only because an unavailable system costs technology companies thousands of dollars, but also because many of our activities now depend on technology. Tammy Butow documented some examples in one of her conference talks. Cardiac monitoring systems work via Bluetooth devices implanted in the body and a mobile app, so if one of these devices fails and cannot recover, a patient could die. People are changing jobs, moving homes and traveling all the time, so they need systems with availability above 99.999%. The way people learn has changed significantly, so they need reliable access to teachers, students and learning materials. And people need protection from bushfires, tsunamis, earthquakes and storms, through systems that cannot fail when a disaster happens.
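Availability targets are easier to compare as a yearly downtime budget. The small Python sketch below assumes a 365-day year and ignores leap years; under that assumption, five nines (99.999%) leaves about 5.26 minutes of downtime per year, compared with about 8.76 hours at 99.9%.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget (in minutes) allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {max_downtime_minutes(target):.2f} min/year")
```

Each additional nine divides the budget by ten, which is why the higher targets demand systems that can absorb failures on their own rather than wait for a human response.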

The list could go on. If we want to create a better future for ourselves, for those who come after us, for our customers and for our wider teams, we should focus on building resilient systems.

It is not easy to build resilient systems, but that doesn’t mean we shouldn’t try. Universities have an important responsibility here, as the actors that articulate the generation of new techniques and models for building more resilient systems. Since Chaos Engineering and resilience engineering are tools we can use to create reliability, universities should include their concepts in their academic programs.

REFERENCES

[SPanicker] S. Panicker, Sujata. Bridging the Gap between Industry and Academia: Strategies and Solutions for Higher Education in Oman-Focus on Business Studies. 2012.

[Dunne] Dunne, Elisabeth and Rawlins, Mike. Bridging the Gap Between Industry and Higher Education: Training Academics to Promote Student Teamwork.

[Vakaloudis] Vakaloudis, Alex; O'Keeffe, Michelle; Hayes, Sarah; Horgan, Trevor; Cahill, Brian; and Delaney, Kieran. Bridging the Gap between Academia and Industry: The Role of Technology Gateways in Entrepreneurial Education. 2017.

[Wright] Wright, Gavin. What is the Scientific Method. 2023.

[Perruchoud] Perruchoud, Dominique. Bridging the Gap between Academic Research and Industry.

[Vrande] Van de Vrande, Vareska, et al. Open Innovation in SMEs: Trends, Motives and Management Challenges. Technovation 29.6 (2009): 423–437.


Written by Yury Niño

Cloud Infrastructure Engineer @Google. Chaos Engineer Advocate. Loves building software applications, DevOps, Security and SRE
