The Key Problem Answer can serve as a starting point for analyzing workplace accidents, but it is not detailed enough to be a complete analysis of the situation. We received an email from someone who has worked in the field, explaining what other analyses could be performed on workplace accident data. The following is his critique of the applicability of this lesson:
It's a good lesson, well done. However, the final conclusion regarding the statistical significance of injuries in the steel mill is not correct. I base this on extensive industrial engineering work. It's been a while, but since this kind of example matters, I think it is worth discussing the proper way to do a full analysis of such data. The data is not qualified in ways that would make it useful. The reason is that every accident has an attributable cause, and every accident involves an interaction with some piece of equipment or material, and that interaction can be complex. So while there are random events that result in accidents in a classic Poisson manner, usually the accident is not actually an independent event.
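A rough first check of the independence point above is the index of dispersion (the variance-to-mean ratio) of accident counts per period. For Poisson counts the ratio is close to 1, while a ratio well above 1 suggests the accidents are clustered. The sketch below is only an illustration: the weekly counts are invented, and `dispersion_index` is a hypothetical helper name. A ratio near 1 does not by itself prove independence.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the variance-to-mean ratio of per-period accident counts."""
    avg = mean(counts)
    if avg == 0:
        raise ValueError("no accidents recorded in any period")
    # Sample variance divided by the mean; about 1 for Poisson counts.
    return variance(counts) / avg


# Hypothetical accidents per week, invented for illustration only.
weekly_counts = [0, 0, 5, 0, 0, 6, 0, 1]
ratio = dispersion_index(weekly_counts)
print(f"index of dispersion: {ratio:.2f}")
```

Here the ratio comes out well above 1, which would be a reason to look for common causes before applying a Poisson-based test.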
Here is one way to work that kind of problem in the real world. (I say one way because, depending on circumstances, other views might be appropriate. One has to think it through for each situation.)
First, one should obtain a time qualifier that specifies to the minute when accidents occurred. Then look for clusters using a distance algorithm. Review the raw and processed clusters and frequencies with floor supervisors to find out what is going on at those times. This can suggest relationships between linked but distant parts of the plant. Compare those clusters with process logs for events that occur through the day. Each plant is different, and some may not have very good records, in which case it is incumbent on the investigator to develop a set of records and validate them with shift supervisors.
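The time-clustering step above can be sketched as follows. This is a minimal example under stated assumptions, not the only suitable distance algorithm: it groups accidents by minute of day and starts a new cluster whenever the gap to the previous accident exceeds a threshold. The timestamps, the 20-minute threshold, and the function names are all hypothetical, and the sketch ignores clusters that wrap around midnight.

```python
from datetime import datetime


def minute_of_day(ts):
    """Minutes elapsed since midnight for a timestamp."""
    return ts.hour * 60 + ts.minute


def cluster_by_time_of_day(timestamps, max_gap_minutes=20):
    """Group accidents whose times of day lie within max_gap_minutes
    of the previous accident (single-linkage clustering in one dimension)."""
    minutes = sorted(minute_of_day(ts) for ts in timestamps)
    clusters = []
    for m in minutes:
        if clusters and m - clusters[-1][-1] <= max_gap_minutes:
            clusters[-1].append(m)  # close enough: extend the current cluster
        else:
            clusters.append([m])    # gap too large: start a new cluster
    return clusters


# Hypothetical accident times, invented for illustration only.
accidents = [
    datetime(2023, 3, 1, 6, 5),
    datetime(2023, 3, 2, 6, 12),
    datetime(2023, 3, 4, 14, 0),
    datetime(2023, 3, 7, 6, 20),
]
clusters = cluster_by_time_of_day(accidents)
for c in clusters:
    print(f"{len(c)} accident(s) between minute {c[0]} and minute {c[-1]}")
```

With this data, three accidents fall shortly after 06:00 on different days, which is the kind of cluster one would then compare against shift changes or process logs.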
Second, another distribution should be made by location. This can start with a simple grid of the whole plant. Similarly, running a distance algorithm on that distribution can show significant items. An interesting view can be a nearest neighbor chain. Look at those with the plant supervisors.
Third, another distribution could be made by total number of years of experience, and also by number of months of continuous service in position.
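For the location view described above, a minimal sketch might bin accident coordinates into grid cells and then walk a greedy nearest-neighbor chain through the points. The coordinates (in meters on a plant floor plan), the 10-meter cell size, and both function names are assumptions made for illustration. The chain here is one loose reading of "nearest neighbor chain": start at one accident and repeatedly move to the closest unvisited one.

```python
from collections import Counter
from math import dist


def grid_counts(points, cell_size):
    """Count accidents per square grid cell of side cell_size."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)


def nearest_neighbor_chain(points, start=0):
    """Greedy chain: from the start point, repeatedly visit the
    closest accident location that has not been visited yet."""
    remaining = list(points)
    chain = [remaining.pop(start)]
    while remaining:
        nxt = min(remaining, key=lambda p: dist(chain[-1], p))
        remaining.remove(nxt)
        chain.append(nxt)
    return chain


# Hypothetical accident locations (x, y) in meters, invented for illustration.
locations = [(1, 1), (2, 1), (40, 35), (3, 2), (41, 36)]

cells = grid_counts(locations, cell_size=10)
chain = nearest_neighbor_chain(locations)
print(cells.most_common())
print(chain)
```

A long jump between consecutive links in the chain marks the boundary between two spatial groups, which gives the plant supervisors specific areas to comment on.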
Fourth, the 5 Whys should be applied. What, why, why, why, why? In other words, do a traceback to attempt to determine not just the proximate cause but the initial cause. This is an industrial engineering fundamental. Attempt to do this correctly for each accident.
Last, an activity and root cause distribution could be done. What activity was the accident victim engaged in when the event occurred? What were the proximate and root causes related to the event?
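The activity and cause distributions above amount to tallies over accident records. The sketch below assumes each accident has already been traced back and recorded with an activity, a proximate cause, and a root cause. The records, the field names, and the `tally` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical accident records; every field name and value is invented.
records = [
    {"activity": "crane operation", "proximate": "load swing", "root": "worn sling"},
    {"activity": "crane operation", "proximate": "dropped load", "root": "worn sling"},
    {"activity": "furnace tapping", "proximate": "splash burn", "root": "missing shield"},
    {"activity": "maintenance", "proximate": "pinched hand", "root": "no lockout"},
]


def tally(records, *fields):
    """Count records by the given combination of fields."""
    return Counter(tuple(r[f] for f in fields) for r in records)


by_activity = tally(records, "activity")
by_activity_and_root = tally(records, "activity", "root")

# Most frequent activity / root-cause pairs first.
for key, count in by_activity_and_root.most_common():
    print(count, *key, sep=" | ")
```

In this invented data, two accidents with different proximate causes share one root cause, which illustrates why the accidents should not be treated as independent events.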
Within a subset of data, if it is large enough, it might be appropriate to do a chi-square analysis under some conditions. But one can only do that after correcting for linkage factors. A chi-square or standard deviation would more likely be used as a long-term, month-to-month or year-to-year measure of whether the steps taken resulted in improvement.
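As a sketch of the long-term use described above, the chi-square goodness-of-fit statistic compares observed counts per period with the counts expected if nothing had changed. The quarterly counts below are invented, the function name is hypothetical, and the example assumes the counts have already been corrected for linkage so that they can be treated as independent. The critical value 7.815 is the standard 5% threshold for 3 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all periods."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical accident counts for four consecutive quarters.
quarterly_accidents = [30, 28, 15, 11]

# Null hypothesis: no change, so every quarter expects the same count.
n = len(quarterly_accidents)
expected = [sum(quarterly_accidents) / n] * n

statistic = chi_square_statistic(quarterly_accidents, expected)

# 5% critical value of the chi-square distribution with n - 1 = 3 degrees of freedom.
CRITICAL_5PCT_DF3 = 7.815
changed = statistic > CRITICAL_5PCT_DF3
print(f"chi-square = {statistic:.3f}, significant change: {changed}")
```

A significant result only says that the counts differ across periods. Whether the steps taken caused the change still has to come from the cause analysis.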
The fundamental basis of all of the above is the recognition, pioneered by Shigeo Shingo building on Deming's work, that apparently random events in industrial facilities are mostly not random. Thus, for example, random sampling can find defects. But when a defect is found, it is usually not independent, but the result of, for instance, a drill bit not having been replaced. Thus, the detection method is different from the distribution used to address a problem.