March 11th, 2022 Author: Sydney Provence Category: Thought Leadership
The semiconductor industry is rife with opportunities for machine learning, but it also poses atypical challenges for machine learning applications. One such challenge involves data that superficially appears to be a time series, yet displays a number of distinct characteristics that defeat typical time-series analysis techniques.
Let’s use a very simplified film deposition recipe as an example. Imagine a substrate being heated from the base chamber temperature to the process temperature while the film material ramps up to its own temperature (step 1). Once both are heated, a shutter sequence is triggered and the film is deposited on the substrate for a fixed duration (step 2) until the shutter is closed, the deposition ends, and the substrate and film material are cooled (step 3).
There are three sensors in this recipe, tracking the substrate temperature, film material temperature, and the shutter status (open or closed) over the three recipe steps. Nominally, these sensors are providing time-series data, but the data outside of the recipe duration is inconsequential when tracking a particular substrate.
This sample will have three time series as features for whatever response is chosen for quality control over the process. Working with these process-related time series in the semiconductor industry is fundamentally different from traditional time-series analysis: there is no reason to believe that each successive point in the series depends on past values, and any supervised learning typically targets a response that is atemporal and external to the process at hand. Comparing the traces that accompany each sample relies on “stacking” the data sample by sample, rather than processing it as one continuous series.
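To make the “stacking” idea concrete, here is a minimal sketch with entirely synthetic numbers: three sensor traces per run are trimmed to the recipe window and flattened into one row per sample, so the dataset becomes a samples-by-features matrix rather than one long continuous series. The trace shapes and sampling counts are illustrative assumptions.

```python
import numpy as np

n_samples, n_points = 5, 100  # 100 readings per sensor per run (assumed)
rng = np.random.default_rng(0)

def simulate_run(rng, n_points):
    """Simulate one run's traces: substrate temp, film temp, shutter state."""
    t = np.linspace(0.0, 1.0, n_points)
    # Substrate ramps from 25 C to 300 C; film material ramps to 1000 C.
    substrate = 25 + 275 * np.clip(t / 0.3, 0, 1) + rng.normal(0, 2, n_points)
    film = 25 + 975 * np.clip(t / 0.3, 0, 1) + rng.normal(0, 5, n_points)
    shutter = ((t >= 0.3) & (t < 0.7)).astype(float)  # open only in step 2
    return np.concatenate([substrate, film, shutter])

# Each sample becomes one row: all three traces laid side by side.
X = np.vstack([simulate_run(rng, n_points) for _ in range(n_samples)])
print(X.shape)  # 5 samples, 3 sensors x 100 points = 300 columns each
```

Note that even this toy example already produces 300 columns per sample, which previews the dimensionality problem discussed next.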
Here are a few of the challenges that often accompany these types of datasets:
HIGH DIMENSIONAL DATA, LOW SAMPLE SIZE
Without feature engineering, the dimensionality of these types of datasets can be extremely high. Typical processes in semiconductor manufacturing can involve a large number of sensors tracking different recipe parameters. These recipes can be extremely complex, and even in high-volume manufacturing the number of datapoints that can accompany each sample can be so high as to dwarf the overall number of samples under consideration.
Supervised machine learning in this domain can be difficult due to “the curse of dimensionality.” Overfitting, in which a model performs well during training but fails to generalize to new data, is a very likely outcome in this regime. Unsupervised methods such as clustering can also be difficult, as the distance between samples is hard to quantify meaningfully in high-dimensional spaces.
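As a toy illustration of this regime, the sketch below fits an ordinary least-squares model with far more features than samples, on purely synthetic noise with arbitrary sizes. With more features than samples, the fit can interpolate the training data essentially perfectly even though the response is random, so training error says nothing about predictive power on new data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, p = 20, 20, 500  # far more features than samples

X_train = rng.normal(size=(n_train, p))
y_train = rng.normal(size=n_train)          # response is pure noise
X_test = rng.normal(size=(n_test, p))
y_test = rng.normal(size=n_test)

# Minimum-norm least-squares solution interpolates the training data.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
train_err = np.mean((X_train @ w - y_train) ** 2)
test_err = np.mean((X_test @ w - y_test) ** 2)
print(f"train MSE ~ {train_err:.2e}, test MSE ~ {test_err:.2f}")
```

The training error lands near machine precision while the test error stays on the order of the noise variance, which is exactly the overfitting trap described above.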
AUTOMATED FEATURE ENGINEERING CAN BE DIFFICULT WITHOUT DOMAIN KNOWLEDGE
Reducing the number of features is an obvious solution to working with high-dimensional datasets, but semiconductor process datasets often have certain constraints that require technical expertise in the process.
Returning to the simplified film deposition example, there may be features in each trace that are obvious to a process engineer who works with the tool but not to a data scientist. There may not be any reason to consider the ramp steps while the shutter is closed (steps 1 and 3), and the information in step 2 may be entirely sufficient for feature extraction. The duration of the deposition in step 2 is likely an important feature, perhaps more so than the overall duration of the recipe. The temperature overshoot from the substrate temperature ramp might be worth quantifying as a feature, rather than just using the setpoint from the recipe. And rather than taking the film material temperature at face value, it may be more interesting (and interpretable) to extract a deposition rate as a function of time from the data.
There may be interesting opportunities to quantify and gauge the effect of physical phenomena in the discrepancy between the recipe setpoints and the anomalies that occur in real time. These may not be obvious without expertise in the subject matter. Domain knowledge can help to both reduce the number of features in a way that takes into account the physical aspects of the problem and increase the interpretability of the model.
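Features of the kind described above might be extracted along these lines. The trace shape, sampling rate, setpoint, and overshoot definition below are illustrative assumptions, not a real recipe; the point is that a 1,001-point trace collapses into a handful of physically meaningful numbers:

```python
import numpy as np

# Illustrative trace: a 100 s recipe sampled at 10 Hz, with a 300 C
# substrate setpoint and a shutter that is open only during step 2.
t = np.linspace(0.0, 100.0, 1001)           # seconds (assumed sampling)
dt = t[1] - t[0]
setpoint = 300.0                             # substrate setpoint, C (assumed)
substrate = setpoint * np.clip(t / 30.0, 0.0, 1.0)
ramp_done = t > 30.0
substrate[ramp_done] += 8.0 * np.exp(-(t[ramp_done] - 30.0) / 5.0)  # overshoot
shutter = ((t >= 35.0) & (t < 75.0)).astype(int)

features = {
    # Deposition time likely matters more than total recipe time.
    "deposition_duration_s": float(shutter.sum() * dt),
    # Quantify the actual overshoot instead of trusting the setpoint.
    "substrate_overshoot_C": float(substrate.max() - setpoint),
    # Mean substrate temperature while the shutter is open (step 2 only).
    "mean_temp_during_dep_C": float(substrate[shutter == 1].mean()),
}
print(features)
```

Each feature here encodes a hypothesis a process engineer would recognize, which is what makes the resulting model interpretable.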
LOW FEATURE VARIANCE WITHIN RECIPE GROUPS
In high-volume manufacturing, the same product is generally produced by running the same recipe every time, with variation introduced only by tool drift or calibration. In practice, the data may contain many features that are known to be theoretically important to the response but are essentially unchanging in the available dataset. In that case, modeling a process variable’s impact on the response may be limited to studying the impact of outliers, or to examining anomalies in the dataset to identify failure modes present in the tool.
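A common practical response is to screen out near-constant features before modeling. The sketch below applies a simple variance cutoff to synthetic data; the columns and the threshold value are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([
    np.full(n, 300.0),                    # setpoint: identical on every run
    300.0 + rng.normal(0, 0.01, n),       # barely moves with tool drift
    rng.normal(0, 1.0, n),                # a feature that actually varies
])

variances = X.var(axis=0)
keep = variances > 1e-3                   # near-zero-variance cutoff (assumed)
X_reduced = X[:, keep]
print(variances.round(5), X_reduced.shape)
```

The caveat from the text applies: a dropped feature is not unimportant physically, it is merely uninformative in this particular dataset, so the cutoff should be chosen with the process in mind.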
SUPERVISED MACHINE LEARNING CAN BE TRICKY OR MISLEADING WITHOUT A CLEAR IDEA OF THE PITFALLS
Many semiconductor processes have some quality metric that can be used to identify how well the process performed, but there are often unique challenges in gathering and identifying these performance metrics. For example, some firms may not track individual wafers throughout the process, instead relying on sampling from a lot identifier, so any modeling loses granular information about each wafer in the aggregation of data over the lot. Alternatively, quality control for a process may be time-intensive or expensive to perform (SEM, SIMS, etc.) and is therefore only performed intermittently, limiting the number of samples that can be used in an analysis.
Many of the performance metrics are spatially distributed across the wafer, in that they are measured at different surface points across the wafer. When using temporal data from the process time series, there is nothing intrinsic in the data that can identify variations in the spatial distribution of process incongruities. Information about process uniformity can be lost unless it is specifically accounted for.
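One way to keep spatial information is to turn site-level measurements into explicit uniformity features rather than collapsing them into a single wafer average. The sketch below uses illustrative thickness values at five hypothetical measurement sites and one common percent-nonuniformity convention; both are assumptions, not a standard:

```python
import numpy as np

# Hypothetical film thickness (nm) measured at 5 sites across one wafer.
thickness = np.array([98.0, 99.5, 100.2, 101.1, 102.4])

wafer_features = {
    "mean_nm": float(thickness.mean()),
    "range_nm": float(thickness.max() - thickness.min()),
    # Percent nonuniformity: (max - min) / (2 * mean) * 100, one convention.
    "nonuniformity_pct": float(
        (thickness.max() - thickness.min()) / (2 * thickness.mean()) * 100
    ),
}
print(wafer_features)
```

Carrying the range and nonuniformity alongside the mean lets a model distinguish a uniformly thick film from one that merely averages out to the target.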
Furthermore, trace datasets can be limited in scope to a single tool that is merely one of many in a complicated process, with a response that is measured after encountering multiple stages in a process for which little information may be available. In these cases, models risk overstating the impact of a single process on the overall product development, simply because they lack information about what happened to the wafer before and after the process for which there is information.
INCREASING THE ODDS OF A SUCCESSFUL MACHINE LEARNING ENGAGEMENT
When selecting suppliers, contractors, or data scientists as partners on semiconductor machine learning projects, make sure to cover your bases: ask them deep, specific questions about their experience with semiconductor data and similar cases.
– Have they worked with semiconductor manufacturing processes in the past?
– Are they familiar with non-continuous process data sets that may need to be divided and subdivided into steps?
– Have they worked with extremely high dimensionality data sets?
– Do they have domain or process specific knowledge that will allow them to augment model feature selection?
These are all important questions when selecting partners for machine learning in semiconductor manufacturing.
Tignis is a perfect fit for your next machine learning project. Regardless of the potential challenges, Tignis has ample experience working with this type of data. Tignis combines domain expertise in the underlying physical parameters of a problem with machine learning capabilities to identify the most promising directions and applications for the data available.