PHYSICS, MACHINES AND DATA
The Journey of a Data Scientist: Chemical Engineering Meets Data Science
JUNE 16, 2021 | Author: Tignis | Category: Employee Spotlight

Chemical engineers have a unique blend of math, physics, and chemistry knowledge, and that intersection of expertise has many strengths. What they tend to lack is a strong background in using software to solve data problems. Because software best practices and statistical modeling are not heavily emphasized in chemical engineering, it helps to collaborate with data scientists and to learn data science skills.

When chemical engineers are missing data, they can draw incorrect insights, and a common consequence of incorrect insights is waste. For example, a chemical process that is not running optimally wastes energy or heat, which equates to wasted money. Similarly, processing errors can waste materials, time, and financial resources. Sometimes the stakes are higher. If something goes wrong in a chemical reaction or process, it can have catastrophic safety consequences for the plant operator and potentially for the whole community. In biochemicals, a mistake in a drug, vaccine, clinical trial, or in how a diagnostic tool is used can affect consumers’ lives.

A better understanding and application of data science can alleviate these concerns. Here are three practical examples:

Process optimization: The bread and butter of chemical engineering is knowing how to develop a process to make a product that solves some problem. Data science, machine learning (ML), and digital twins can help optimize this process and improve fault detection. For instance, in the pharmaceutical industry, defining the process window for formulating a particular product or drug is highly experiment- and resource-intensive, and involves numerous bioreactor runs. A well-defined window is needed to prove that the process is stable despite perturbations, that all product quality metrics stay within specified ranges, and that a target yield is achieved. Data science can accelerate this work by simulating the process with a digital twin and applying predictive models to determine how the process window is affected by the different inputs and choices made along the way.
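To make the process-window idea more concrete, here is a minimal, hypothetical sketch: it fits a simple predictive model to a handful of experimental runs and then sweeps candidate set points to see which are predicted to meet spec. The runs, the two inputs (temperature and pH), the linear model, and the `MIN_YIELD` acceptance limit are all invented for illustration; in a real study a digital twin or a richer ML model could stand in for the `predict` function.

```python
"""Sketch: estimate a process window from a fitted predictive model.

All data, variable names, and spec limits here are hypothetical and chosen
only for illustration.
"""
from itertools import product


def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: swap in the row with the largest pivot value.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(xs, ys):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, x1, x2] for x1, x2 in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, x1, x2):
    """Predicted quality metric for one operating point."""
    return coef[0] + coef[1] * x1 + coef[2] * x2


# Hypothetical experimental runs: (temperature in deg C, pH) -> yield in g/L.
runs = [(30, 6.8), (30, 7.2), (34, 6.8), (34, 7.2), (32, 7.0)]
yields = [4.1, 4.5, 4.9, 5.3, 4.7]
coef = fit_linear(runs, yields)

# Sweep candidate set points and keep those predicted to meet spec.
MIN_YIELD = 4.6  # hypothetical acceptance criterion
temps = [30 + 0.5 * i for i in range(9)]           # 30.0 .. 34.0
phs = [round(6.8 + 0.1 * i, 1) for i in range(5)]  # 6.8 .. 7.2
window = [(t, p) for t, p in product(temps, phs)
          if predict(coef, t, p) >= MIN_YIELD]

print(f"{len(window)} of {len(temps) * len(phs)} candidate set points meet spec")
```

A linear least-squares fit keeps the sketch dependency-free. A real process-window study would likely need more inputs, several quality metrics, and some estimate of prediction uncertainty before a set point is trusted.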

Figure 1: Machine learning models can be used for fault detection and process optimization for typical chemical engineering components and processes.
Surrogate modeling: Physical models describe how part of a process, whether a component or a piece of equipment, should work, but models based on first principles or on chemical and physical calculations are only as good as their assumptions. For example, a petrochemical plant’s physical model that predicts impurity concentrations in a distillation column’s product stream will lose accuracy when changes in the crude feed invalidate the model’s assumptions. With data science and ML, a surrogate model built from a physical model can start from the original assumptions and then learn from operating data until it is more accurate than the physical model. Once optimized, the ML surrogate model can replace the physical model.

Efficiency is another advantage of surrogate models. Building an accurate physical model can take an expert a long time, because all the equations must be written down and solved, and physical models are often too computationally expensive to deploy in production. A surrogate model can be faster to implement and quicker to run. For example, chemical engineers work to improve the efficiency of lithium-ion batteries by changing the charging profile to increase their lifespan and capacity. Changing profiles to get more (or less) energy out of a battery, or to reduce its degradation, typically requires a complicated and computationally expensive physical model. A surrogate model can arrive at the same optimal charging profile as the physical model, but much faster.
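One common way to let a surrogate start from a physical model’s assumptions and then learn from data is a residual (hybrid) model: keep the physics prediction and fit a correction to the gap between it and plant measurements. The sketch below illustrates that idea only and is not presented as the approach used in practice here. The `physics_model` formula, the feed `heavy_fraction` variable, and the plant records are all hypothetical, and the data is noise-free so the numbers are easy to check.

```python
"""Sketch: a surrogate that starts from a physics model and learns a
data-driven correction from plant data (a residual or hybrid model).

The physics formula, plant data, and linear correction are hypothetical.
"""


def physics_model(reflux_ratio):
    """Hypothetical first-principles estimate of product impurity (ppm).

    Implicitly assumes a fixed feed composition, the kind of assumption
    that a change in crude would violate.
    """
    return 500.0 / reflux_ratio


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical plant records: (reflux ratio, heavy fraction of feed, measured ppm).
plant = [
    (2.0, 0.20, 250.0),
    (2.5, 0.30, 240.0),
    (4.0, 0.25, 145.0),
    (5.0, 0.35, 160.0),
]

# Learn what the physics model misses, as a function of feed heaviness.
residuals = [ppm - physics_model(r) for r, _, ppm in plant]
a, b = fit_line([h for _, h, _ in plant], residuals)


def surrogate(reflux_ratio, heavy_fraction):
    """Physics baseline plus the learned data-driven correction."""
    return physics_model(reflux_ratio) + a + b * heavy_fraction


# Hypothetical held-out operating point after a heavier crude arrives.
r_new, h_new, ppm_new = 4.0, 0.40, 205.0
physics_error = abs(physics_model(r_new) - ppm_new)
surrogate_error = abs(surrogate(r_new, h_new) - ppm_new)
print(f"physics error: {physics_error:.1f} ppm, "
      f"surrogate error: {surrogate_error:.1f} ppm")
# prints: physics error: 80.0 ppm, surrogate error: 0.0 ppm
```

The correction here is a straight line to keep the example small; any ML regressor could take its place. The sketch does not show the speed benefit, which depends on how expensive the original physical model is to evaluate.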
