Post-Training Evaluation Strategies - Part 3

There are a few shining examples of good on-the-job training

Any business professional worth his or her salt knows that training is essential. We've all had that experience of the nightmare job where we walk in with bright eyes and high hopes only to have our will to live crushed under the weight of unreasonable expectations coupled with little to no training. These experiences are more than just frustrating; they undermine employee engagement from day one. A recent study conducted by TINYpulse indicates that employee engagement is worsening and has a direct impact on employee performance. While not cited in this article, HR and training experts agree that employee training during the onboarding process is essential to laying a strong foundation for long-term engagement.

I want to take a moment and address those of my readers who are in the role of boss. Sitting down with your new employee for one hour to explain how things work isn't training. Summarizing how the file system is organized, providing a cursory overview of an outdated employee handbook, and pointing your new hire at a couple of stale Microsoft Word and PowerPoint documents with the encouragement of, "You're smart, you'll figure it out," doesn't mean you've trained them. If you disagree and think, "Hey, that's how I was trained!" I would like to point out that people who rationalize fraternity hazings use the same justification. By the way, I don't recommend adding good-natured humiliation to your employee onboarding process.

"Hannibal Lecture" devours our will to work one mouth-noise at a time

Businesses that take training seriously design training systems with the intent of imparting skills aligned to business processes. Sharing knowledge as if you are a college professor delivering a 101 lecture is not training; however, this approach is everywhere. It's so ubiquitous that the majority of organizations fail to recognize what a horrible approach this is, and they simply can't imagine any other way of training their employees. This poor understanding of training is a direct byproduct of the industrialization of primary and secondary education. Here's the strategy. Gather up as many people as you can and stick them in front of a big ol’ brain with legs that spews forth knowledge and wisdom. Then, expect trainees to take copious notes that they will never reference once the badly designed test is over and that are out of date as soon as the training materials change. Sprinkle trainees with magic dust made from powdered magical thinking and poof, you have trained your employees. At best, they are now capable of reciting facts but can achieve precious little with all that knowledge. This is the underlying strategy behind most liberal arts education, and people are no longer buying what the university systems are selling.

College dropouts can and do succeed

The problem with the industrialization of education in the United States is so bad that the wisdom of, "Go to college, and you will earn more money," is starting to lose its credibility with Generation Y. Philly, a blog that covers the technology industry in Philadelphia, recently posted a follow-up story regarding one Becca Refford. Becca's story goes something like this:

"I dropped out of Temple University's Computer Science program because I was learning more at my paid internship than I was learning in classes I was going into debt just to sit in. When I tried to seek guidance from Temple's administration, I was told to consider a career in a field that was less challenging and required less math, like business management."

I was rather peeved when I read Becca's story since not only is Temple rather pricey, but the administration treated her very legitimate complaint in a highly sexist and discriminatory manner. So Becca dropped out of college, started working full time, and never looked back. When Philly editor Juliana Reyes caught up with Becca two years after she dropped out of Temple University, Becca was still growing in her career, and when asked if she had any regrets about leaving Temple, she's quoted as saying:

"Not at all."

My husband had a similar experience. Jeff works as a software developer for a civil engineering firm here in Philadelphia. He dropped out of Drexel University when his paid internship offered him a job just two years into his undergraduate education. Jeff took the job and never looked back. I am pleased to say that Jeff and I are doing quite well and only one of us has any student loan debt. I can remember several times Jeff has come home at the end of the day and told me that he had to teach a newly graduated software developer how to use source control for code management. For those of you unfamiliar with "source control" and why it's completely unacceptable that university computer science programs are failing to teach students how to use it, let's just say it's on par with a medical resident not knowing how to use a stethoscope or perform CPR.

What's the solution to horrible on-the-job training?

Let's get the most self-serving thing I can say out of the way, "You should hire Populouz to help you design and implement a better training system." However, that's not helpful for most people. So let's take a different approach. 

The inspiration for this three-part post came to me after an initial consultation with the Director of Learning Analytics at a major U.S. insurance company regarding Level 3 Evaluations. The focus of our meeting was on the subject covered in part two of this post and the topic of today's post. In part two of this post, I described in detail how to develop specifications for a training system using performance objectives (POs) and learning objectives (LOs). An essential characteristic of both performance and learning objectives is that they describe observable behaviors. For example:

"As a board game designer, I can draft a set of rules, so my game players know how to play the game."

A bad performance objective would be:

"As a board game designer, I will appreciate the importance of structured rules..."

Do you see the difference? If I were the owner of Milton Bradley, I could assess the ability of my game designers to write board game rules by any number of criteria. I wouldn't care if they appreciate the value of the rules since that's part and parcel of the job.

Building the U.S. Coast Guard Recruit Training System Level 3 Evaluation Survey

Drafting observable performance objectives when speccing out a training system is the approach the U.S. Coast Guard requires all its training systems designers to use, and for a good reason. The Coast Guard requires evaluation of whether active duty members are using the training American tax dollars pay for. When I came onboard the U.S. Coast Guard Recruit Training Center (TRACEN) in Cape May, NJ as the Senior Performance Analyst, I worked for Commander Jennifer Sinclair. Commander Sinclair asked me to develop what we in the training biz call a Level 3 Evaluation of the Recruit Training system, which is just a fancy way of saying, "Hey, let's see if people are doing what we trained them to do." There had been a few other attempts by my predecessors to conduct such evaluations, but they produced some lackluster results, and none of them were able to establish a sustainable evaluation process.

Evaluating an eight-week training program wasn't easy

I want to list just a few of the complications we faced when setting up this system:

  • The training program we're talking about consists of more than 100 performance objectives

  • Not all recruits will use all of the training they received; recruits only specialize in a subset of those skills as they move to the next stage of their career

  • Many recruits may go six months to a year before they have to apply their skills on the job

Here's an example. Seaman Todd may have scored top marks for performing CPR during Recruit Training, but it may be eight months before she will need to resuscitate a fisherman on the deck of his ship. More to the point of how difficult this is to evaluate, no single data system in the Coast Guard tracks the number of times individuals demonstrate specific skills on the job. Unlike systems such as Salesforce, which can track user behaviors at a granular level, there is no Big Brother-like system monitoring Coast Guard members.

After some discussion with my colleagues at TRACEN Cape May, we decided that our best course of action was to perform a behavioral survey of newly minted Coasties about six to eight months after graduation. Sure, there were limitations with this approach, but it was the most methodologically sound approach we could come up with, and doing nothing wasn’t an option.

Designing a behavioral survey to conduct a Level 3 Evaluation

The design of the survey wasn't especially difficult given that the design of the program was one of the best I've ever had the pleasure of working with. I'm not joking when I say that we took the original curriculum architecture of the Recruit Training system and converted it into a survey. We rewrote each performance objective as a survey question. If the performance objective was, 

"As a 3rd class petty officer, I can perform CPR on a mariner suffering from a severe electrical shock, so I can sustain his or her life until emergency medical services are available."

... we wrote the survey question as, 

"I have performed CPR on a person in distress and successfully sustained his/her life until emergency medical services were available."

A little wordy for a survey question, I know, but it was essential we describe the performance we were evaluating given the degree of analysis and research that went into the design of the Recruit Training system in the first place. We broke this question into three parts:

  • "Have you performed this task since graduating from Recruit Training?" - Yes or No

  • "How frequently have you performed this task since completing Recruit Training?" - assessed with a Likert-like scale

  • "How well have you performed this task?" - assessed with a Likert-like scale

We converted all 100+ performance objectives into this question format and used the higher-level grouping of the Recruit Training system's objectives to program skip patterns in the survey. This way, survey-takers weren't answering questions that didn't apply to their job role out in the fleet. For example, the Recruit Training system included broad topic areas such as Seamanship, First Aid, and Damage Control. If a newly minted Seaman was serving as a Damage Controlman aboard a National Security Cutter, then we allowed them to skip the Seamanship section of the survey. Had we failed to create these skip patterns I expect the first run of the survey would have been an abject failure.
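For readers who think better in code, here is a minimal Python sketch of the three-part question format and the skip patterns described above. It is not the survey tool we used; the `SurveyItem` class, the objective IDs, the scale labels, the `SKIP_RULES` table, and the Seamanship statement are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical Likert-like labels; the real scale wording is not given in this post.
FREQUENCY_SCALE = ["Never", "Rarely", "Sometimes", "Often", "Very often"]
QUALITY_SCALE = ["Very poorly", "Poorly", "Adequately", "Well", "Very well"]

# Hypothetical skip rules: job role -> topic areas that role may skip.
SKIP_RULES = {"Damage Controlman": {"Seamanship"}}


@dataclass
class SurveyItem:
    objective_id: str  # made-up identifier, e.g. "FA-01"
    topic: str         # higher-level grouping, e.g. "First Aid"
    statement: str     # the performance objective rewritten as a survey statement

    def questions(self):
        """Expand one objective into the three-part question format."""
        return [
            {"id": f"{self.objective_id}-performed",
             "text": "Have you performed this task since graduating from Recruit Training?",
             "choices": ["Yes", "No"]},
            {"id": f"{self.objective_id}-frequency",
             "text": "How frequently have you performed this task since completing Recruit Training?",
             "choices": FREQUENCY_SCALE},
            {"id": f"{self.objective_id}-quality",
             "text": "How well have you performed this task?",
             "choices": QUALITY_SCALE},
        ]


def build_survey(items, job_role, skip_rules=SKIP_RULES):
    """Return the questions for one respondent, leaving out skipped topic areas."""
    skipped = skip_rules.get(job_role, set())
    survey = []
    for item in items:
        if item.topic in skipped:
            continue  # skip pattern: this topic does not apply to the role
        survey.extend(item.questions())
    return survey


items = [
    # Made-up Seamanship statement, for illustration only.
    SurveyItem("SE-01", "Seamanship", "I have secured a mooring line to a cleat."),
    SurveyItem("FA-01", "First Aid",
               "I have performed CPR on a person in distress and successfully "
               "sustained his/her life until emergency medical services were available."),
]

damage_controlman_survey = build_survey(items, "Damage Controlman")
```

With these two sample items, a Damage Controlman only sees the three First Aid questions, while a role with no skip rule sees all six.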

Ah, but there's a catch, people aren't honest when it comes to self-evaluation

While this survey may have seemed like a rock-solid solution, there is a problem. It only captures the self-report of the person we were evaluating. There is a well-known tendency among survey takers to answer questions about themselves in a flattering way. Social scientists call this behavior Social Desirability Bias, and it manifests itself in the form of inflating one's self-evaluation in a positive manner. To put it another way, if you received a non-anonymous survey asking you to rate your on-the-job performance, would you report poorly on yourself?

However, why are you surveying the employee and not the boss?

We had discussed the idea, but we dismissed it in our initial discussions for one primary reason. The Coast Guard has very high standards for employee performance, and the front-line supervisors out in the fleet are very demanding. This is intentional since the training delivered by TRACEN Cape May needs to be continually reinforced. Furthermore, there is little room for error in the Coast Guard. Coast Guard members work in very dangerous conditions, live in close quarters, and are tasked with protecting the lives of people who find themselves in American waters. Fleet supervisors need to be demanding since the cost of failure is often the life of a civilian or a fellow Coast Guard member. As justified as those high standards are, we also knew that those expectations would color the assessments of the supervisors. We knew they would always see newly minted Seamen as having "room for improvement." In the end, we decided to survey the bosses as well to counterbalance the effects of Social Desirability Bias we knew we would find in our survey. So we created a mirror survey, rewritten to ask each boss about the individual Seaman shipped to them after training. We used some straightforward post-data-collection processing to match the survey of the newly minted Seaman with the mirror survey of their boss.
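The matching step can be sketched in a few lines of Python. This is an illustration under assumptions, not the processing we actually ran: it assumes both surveys carry a shared `seaman_id` field, which is a hypothetical name, and that each response is a plain dictionary.

```python
def match_surveys(seaman_responses, boss_responses, key="seaman_id"):
    """Pair each Seaman response with the boss response about the same Seaman.

    Returns (pairs, unmatched), where pairs is a list of (seaman, boss)
    tuples and unmatched holds Seaman responses with no mirror survey.
    """
    # Index the boss surveys by the shared key (hypothetical field name).
    boss_by_id = {resp[key]: resp for resp in boss_responses}
    pairs, unmatched = [], []
    for resp in seaman_responses:
        boss = boss_by_id.get(resp[key])
        if boss is None:
            unmatched.append(resp)
        else:
            pairs.append((resp, boss))
    return pairs, unmatched


# Made-up sample data: ratings on a 1-5 scale for one objective.
seaman_responses = [
    {"seaman_id": "A1", "FA-01-quality": 5},
    {"seaman_id": "B2", "FA-01-quality": 4},
]
boss_responses = [
    {"seaman_id": "A1", "FA-01-quality": 3},
]

pairs, unmatched = match_surveys(seaman_responses, boss_responses)
```

Only matched pairs feed the comparison between self-ratings and boss ratings; unmatched responses are set aside rather than guessed at.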

How did you analyze the results?

This is an excellent question and is probably the most complicated part of this whole methodology. At the end of the data collection phase of our process, we had a mountain of survey data. On average, we had more than 300 matched pairs of surveys, a Seaman survey and a matching Boss survey, each time we performed the evaluation. The TRACEN Command Staff were pretty happy with this since we maintained an average of about a 50% response rate for the survey, much higher than the accepted 30% response rate for most surveys of this type. All told, each matched-pair survey represented more than 300 individual data points, so there was no way we were looking at each survey individually. Instead, we used inferential statistics to look for differences between how Seamen rated themselves and how Bosses rated them. In retrospect, I shouldn't have been surprised, but when we finished programming our Excel reporting dashboard, we found that there was a statistically reliable difference between the Seaman and Boss ratings of the Seaman's performance on all of the performance objectives. In other words, the Seamen all thought they were doing better than their bosses did. It's at this point that a novice statistician would panic and say, "Oh no, the program is a complete failure, the Fleet is unhappy with the quality of our training system, and the Seamen are delusional!" However, as I like to tell people, this hick from the mountains of Idaho didn't fall off the potato truck yesterday.

A quick lesson in applied statistics and learning analytics

This is where I want to explain the difference between the reliability of a statistical finding and the effect size. When you hear the term "statistically significant" concerning inferential statistics, what it means is that an observed phenomenon is unlikely to be due to random chance alone. In the case of our survey, we know that Social Desirability Bias is a real thing and could explain what we saw in our analysis. We also know that Fleet supervisors have very high expectations no matter how solid the Seaman they receive from TRACEN Cape May. When we are talking about effect size, we are talking about how big the difference was. In the case of the Coast Guard Recruit Training Level 3 Evaluation system, we used what's called a t-test for our analysis. A t-test is a standard statistical test social scientists use to evaluate the differences between mutually exclusive groups. Political Science researchers use this test to assess differences between the political opinions of men and women since, for the most part, people are either a man or a woman (a gender identity discussion is well beyond the scope of this post). There are standard mathematical procedures for determining the effect size for a t-test, but the threshold for whether the effect size means anything in the context of the training system is rather subjective. It took many discussions with the TRACEN Command Staff to decide at what point an effect size was worth digging into further. However, once we had that threshold established, we were able to finish programming our analysis dashboard. Using a pivot table in our Excel-based reporting dashboard, we were able to get the Top 10 performance objectives ranked by effect size. Using conditional formatting, I programmed the dashboard to flag performance objectives that were worth investigating with a bright angry red color; executives love it when the world can be reduced down to red, yellow, and green indicators.
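To make the reliability versus effect size distinction concrete, here is a minimal Python sketch. It is not our Excel dashboard. Because each Seaman rating has a matching Boss rating, the sketch assumes the paired form of the t-test and Cohen's d computed on the paired differences; the post does not pin down either choice. The 0.5 threshold, the objective names, and the ratings are made up, and the real threshold came out of discussions with the Command Staff.

```python
import math
from statistics import mean, stdev


def paired_t_and_effect_size(seaman_scores, boss_scores):
    """Return (t statistic, Cohen's d) for matched Seaman/Boss ratings."""
    diffs = [s - b for s, b in zip(seaman_scores, boss_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two matched pairs")
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0:
        raise ValueError("all differences are identical; t and d are undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(n))  # reliability of the difference
    cohens_d = mean_diff / sd_diff                 # size of the difference
    return t_stat, cohens_d


def rank_objectives(ratings, threshold=0.5, top_n=10):
    """Rank objectives by absolute effect size and flag the large ones red."""
    results = []
    for objective, (seaman, boss) in ratings.items():
        t_stat, d = paired_t_and_effect_size(seaman, boss)
        flag = "red" if abs(d) >= threshold else "green"  # hypothetical cutoff
        results.append({"objective": objective, "t": t_stat, "d": d, "flag": flag})
    results.sort(key=lambda r: abs(r["d"]), reverse=True)
    return results[:top_n]


# Made-up ratings on a 1-5 scale: (Seaman self-ratings, Boss ratings).
ratings = {
    "Pistol marksmanship": ([4, 5, 4, 5], [3, 3, 4, 4]),
    "CPR": ([4, 4, 5, 3], [4, 3, 5, 4]),
}

report = rank_objectives(ratings)
```

The t statistic grows with the number of matched pairs, so with hundreds of pairs even a small gap can be statistically reliable; Cohen's d does not grow that way, which is why it makes the better yardstick for deciding what deserves attention. Turning the t statistic into a p-value needs the t distribution, which a library such as SciPy provides through `scipy.stats.ttest_rel`.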

At the end of all this effort, my team and I had a Level 3 Evaluation system that required relatively little effort to execute on a biannual basis. Regarding effort, all I had to do was upload a list of email addresses for recruits, trigger the survey, and then allow a few weeks for data collection. Once the data collection period was over, it took less than five minutes to import all the survey results into the Excel-based dashboard and generate an updated report for the TRACEN Command Staff.

What did you find?

Oddly enough, our findings weren't all that surprising. The same performance objectives kept coming up over and over again for areas of the program we knew didn't allow for enough practice time due to schedule and budgetary constraints. Despite the best possible design of the training activities, there simply was not enough time to allow recruits more practice. One such performance objective that kept popping up in our reports was pistol marksmanship. The bosses out in the Fleet wanted trained sharpshooters for when boarding parties were required. Field units often have to invest lots of time and effort into more advanced training on marksmanship and situational awareness, the type of training police officers typically receive. However, those kinds of programs often require weeks of dedicated practice on a firing range and time in simulators to ensure a high level of performance from trainees. That level of training simply was not possible given all the resource constraints TRACEN Cape May had to deal with, and in the grand scheme of things, the broader leadership of the Coast Guard was not overly concerned about this particular shortcoming of the Recruit Training system. In all the areas that were deemed essential competencies for recruits to have upon graduation, our reports consistently came up in the green.

Proper training systems design is the key to a successful evaluation effort

I attribute this to two primary factors. First, the Coast Guard has top-notch people, and the American public should be proud of our Coast Guard service. Working with the Coast Guard right out of college was a wonderful experience. The Guard's strong esprit de corps and its emphasis on teamwork, respect, and professionalism have deeply influenced me to this day. Heck, I still use a 24-hour clock on my smartwatch. Their training staff are top-notch and took the execution of their duties seriously. I have tremendous respect for the Coast Guard and wouldn't hesitate to partner with a Coast Guard training veteran if the opportunity arose. Do you want to guess what the other factor was? It's simple; the system was designed to train recruits on how to save the life of a person in distress through the deliberate practice of CPR and not just "appreciate" or "understand" the finer points of why CPR can save someone's life.