Consistency is key to maintaining a cohesive ecosystem that enables data-driven decision making. Historical data and live production floor data must be aligned. All data must be cleaned, formatted, contextualized, and organized into a taxonomy in order for machine learning technology to operate effectively.
A taxonomy—a hierarchy of labels and classifications derived from knowledge of the domain—provides context to the data. For example, while there might be separate metrics for machine temperature and material temperature, a taxonomy identifies that both are types of temperature metrics and also indicates which part of the process each one controls or affects.
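As a minimal sketch, a taxonomy like this can be represented as a nested mapping from categories to raw metric names. The metric names, categories, and process stages below are illustrative assumptions, not taken from any specific production system:

```python
from typing import Optional

# Illustrative taxonomy: category -> raw metric -> domain context.
# All names here are hypothetical examples.
TAXONOMY = {
    "temperature": {
        "machine_temperature": {"unit": "C", "process_stage": "extrusion"},
        "material_temperature": {"unit": "C", "process_stage": "cooling"},
    },
    "diameter": {
        "vertical_diameter": {"unit": "cm", "process_stage": "measurement"},
        "horizontal_diameter": {"unit": "cm", "process_stage": "measurement"},
    },
}

def classify(metric_name: str) -> Optional[str]:
    """Return the parent category for a raw metric name, if known."""
    for category, metrics in TAXONOMY.items():
        if metric_name in metrics:
            return category
    return None

print(classify("machine_temperature"))  # -> "temperature"
```

With this structure, two metrics that arrive from different sensors under different names can still be recognized as the same kind of measurement and routed to the same analysis.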
It is also critical to align metrics with metadata such as the product, machine, or quality state in order to make any analysis meaningful. For example, if a product tested offline is defective, but you don't know on which line or during which shift it was produced, it's impossible to go back and find what other products might have been affected by the quality failure.
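One common way to establish that alignment is a join between quality results and production records on a shared identifier. The sketch below assumes hypothetical column names (batch_id, line, shift) purely for illustration:

```python
import pandas as pd

# Hypothetical offline quality tests and production records.
quality = pd.DataFrame({
    "batch_id": ["B101", "B102", "B103"],
    "defective": [False, True, False],
})
production = pd.DataFrame({
    "batch_id": ["B101", "B102", "B103"],
    "line": ["L1", "L2", "L1"],
    "shift": ["day", "night", "day"],
})

# Attach line/shift context to each test result so a failure can be
# traced back to everything produced under the same conditions.
joined = quality.merge(production, on="batch_id")
affected = joined[joined["defective"]]
suspects = production.merge(affected[["line", "shift"]], on=["line", "shift"])
print(suspects["batch_id"].tolist())  # batches from the same line and shift
```

Without the batch-to-line mapping, the defective result is a dead end; with it, every batch produced under the same conditions becomes a candidate for inspection.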
In this step, different types of algorithms are used to perform conversions, transformations, and complex calculations on the contextualized data. These calculations can be as simple as converting Fahrenheit to Celsius or as complex as computing windowed temporal aggregations.
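Both ends of that spectrum fit in a few lines. The sketch below assumes sensor readings at one-minute intervals and a three-minute window; the column names and window size are illustrative choices:

```python
import pandas as pd

# Hypothetical one-minute temperature readings from a single sensor.
readings = pd.DataFrame(
    {"temp_f": [212.0, 210.5, 214.2, 211.8, 213.0]},
    index=pd.date_range("2024-01-01 08:00", periods=5, freq="min"),
)

# Simple conversion: Fahrenheit to Celsius.
readings["temp_c"] = (readings["temp_f"] - 32.0) * 5.0 / 9.0

# More complex: a 3-minute rolling (windowed temporal) aggregation.
readings["temp_c_3min_mean"] = readings["temp_c"].rolling("3min").mean()
print(readings)
```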
One scenario from cable manufacturing is measuring the diameter of a product. You might measure the diameter from top to bottom and then from left to right. The algorithm averages these two measurements and checks whether the result stays within an acceptable deviation from the target. For example, a cable with a target diameter of 10 centimeters might be allowed to vary up to 10.15 centimeters or down to 9.85 centimeters.
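A minimal sketch of that check, assuming the 10 cm target and ±0.15 cm band from the example (the function name and parameters are hypothetical):

```python
from statistics import mean

def check_diameter(vertical_cm: float, horizontal_cm: float,
                   target_cm: float = 10.0,
                   tolerance_cm: float = 0.15) -> bool:
    """Average the two perpendicular measurements and flag whether
    the result falls within the acceptable band around the target."""
    avg = mean([vertical_cm, horizontal_cm])
    return abs(avg - target_cm) <= tolerance_cm

print(check_diameter(10.05, 9.98))   # True: average 10.015 is in range
print(check_diameter(10.30, 10.20))  # False: average 10.25 exceeds 10.15
```

Averaging the two perpendicular readings smooths out ovality in the cable, while the tolerance band turns a raw measurement into a pass/fail signal that downstream analysis can act on.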