Abbreviated root cause analysis:
1. A nightly update job began failing after a new release because the unchanged job was not built to handle the data found in the production environment.
2. The static data (location information, such as US states) in production was changed with the release, but not any of the dynamic data updated by the job (personnel information, such as names and addresses).
3. The dummy dynamic data used in the testing environments worked with the changes to the static data, because the dummy data was created with the assumption that all personnel have some sort of location entered (non-nullable field), even if it doesn’t match our list of recognized locations.
4. The static location data contains an abbreviation field which was left blank for new, special-purpose locations stored in the same table but irrelevant to the personnel data referenced here. The abbreviation field is also non-nullable.
5. The production data contained surprising, long-standing examples of personnel with blanks for their state abbreviations. For the first time, the static table contained records that also had blanks for their abbreviations (since they don’t really have one). The update query joined the abbreviations, and every person with a blank matched all of the locations with a blank, instead of the one unique state. If either or both of these had been null instead of blank, there wouldn’t have been any crazy matches, as nulls don’t join.
|111388||Rumpel Stiltskin||Special Region Alpha|
|111388||Rumpel Stiltskin||Special Region Beta|
|111388||Rumpel Stiltskin||Special Region Gamma|
|111388||Rumpel Stiltskin||Special Region Delta|
- Use null rather than blank when you mean that there is no such information.
- This applies to the dynamic data.
- This applies to the static data.
- This is a cop-out remedy, because it translates into “don’t make mistakes.”
- Require that the static table’s Abbreviation column only contain unique values.
- This would have been a good change to make when first writing the nightly update job.
- This guarantees that there can only be a single match for a blank abbreviation.
- While the match is probably not what you want for the personnel record, the developer adding the special-purpose locations would have bumped into the restriction which would have lead to the right questions being asked much earlier.
- Test against a copy of production data.
- This is a great solution for finding errors before they hit production.
- This is a no-go when you do not have a testing environment as secure as your production one.
- Analyze production data for the range of values present in each column. Generate test data that includes the full range of values.
- This is a good practice in general, assuming you can’t test against real data.
- This will help catch other, unforeseen errors.
- There is a real cost to spending time on this kind of analysis and data generation, and that requires the will of management.