Storm Water’s Stat Line

Leveraging big data to improve storm water infrastructure performance

May 3, 2018

About the author:

Seth Brown is director for storm water programs for the Water Environment Federation and principal founder of Storm and Stream Solutions. Brown can be reached at [email protected].

A new baseball season is starting, so with this in mind I want to draw a connection between a shift in baseball and one that may be happening now in the storm water sector. I recently spent a few days in the Oakland, Calif., area for a meeting with some great minds in the storm water sector. I mention Oakland for a specific reason: “Moneyball,” the well-known book (and movie starring Brad Pitt) about the rise of big data in baseball, focused on the 2002 Oakland Athletics and the organization’s embrace of a new approach to measuring baseball statistics. The connection between storm water and a book about baseball data may seem obscure, so let me explain.

“Moneyball” is a term that refers as much to a revolutionary shift in the mindset of an industry as it does to a specific baseball team or season. For more than 100 years, baseball was led by professional scouts, managers and coaches who made choices using intuition and limited statistical metrics rather than robust data mining and in-depth statistical analysis. Decisions made this way ranged from which players would be drafted or promoted through the minor league system, to where players were positioned in the field, to the batting order.

The result of this approach was the informal development of a set of heuristic rules that were not often questioned. For instance, a player might be struggling in the lower levels of the minors, but a scout might say, “Don’t worry. Be patient. He just looks like a player, so I’m sure he’ll develop.” Another example is the reliance upon simple statistics, such as batting average for batters or winning percentage for pitchers, to judge the value of a player.

This approach to the game held steady until three dynamics converged in the early 2000s: significant disparities in team payrolls, growing computing power, and a new, more data-driven approach to the game. Teams such as the New York Yankees and the Boston Red Sox had payrolls several times larger than those of smaller-market teams such as the Minnesota Twins and the Athletics. This disparity created a demand for a more cost-efficient way to win games. Could a team with one of the smallest payrolls make the playoffs or even win the World Series? Concurrently, the field of sabermetrics, named after the Society for American Baseball Research (SABR), was gaining traction and proposing methods of data analysis far more sophisticated than those previously used in baseball.

New statistics such as OPS (on-base plus slugging) and WHIP (walks plus hits per inning pitched) went far beyond the basic statistics historically used in baseball. These new measurements provided a clearer picture because they were more meaningfully tied to performance. However, they required greater computing power and larger data sets, both of which had become far more available by this time.
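For readers unfamiliar with these metrics, both come down to simple arithmetic on standard counting stats. The Python sketch below computes OPS as on-base percentage plus slugging percentage, and WHIP as walks plus hits divided by innings pitched. The class, function names and sample season line are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical season totals for one hitter."""
    at_bats: int
    hits: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int


def on_base_percentage(b: BattingLine) -> float:
    # OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
    times_on_base = b.hits + b.walks + b.hit_by_pitch
    denominator = b.at_bats + b.walks + b.hit_by_pitch + b.sacrifice_flies
    return times_on_base / denominator


def slugging_percentage(b: BattingLine) -> float:
    # SLG = total bases / AB, counting a single as 1 base and a home run as 4
    singles = b.hits - b.doubles - b.triples - b.home_runs
    total_bases = singles + 2 * b.doubles + 3 * b.triples + 4 * b.home_runs
    return total_bases / b.at_bats


def ops(b: BattingLine) -> float:
    # OPS is simply the sum of the two rate statistics above
    return on_base_percentage(b) + slugging_percentage(b)


def whip(walks: int, hits: int, outs_recorded: int) -> float:
    # Innings pitched = outs / 3, which sidesteps the "6.1 means 6 1/3" notation
    innings_pitched = outs_recorded / 3
    return (walks + hits) / innings_pitched


line = BattingLine(at_bats=500, hits=150, doubles=30, triples=2,
                   home_runs=25, walks=70, hit_by_pitch=5, sacrifice_flies=5)
print(round(ops(line), 3))           # 0.906
print(round(whip(50, 180, 600), 2))  # 1.15 (200 innings pitched)
```

Neither metric is complicated; the shift was in deciding that numbers like these, rather than a scout's impression, should drive decisions.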

The result of this convergence was that Billy Beane, general manager of the Oakland Athletics in the early 2000s and desperate for a way to compete with larger-market teams, adopted this data-driven approach to baseball. It changed the statistical landscape of baseball in major ways. Beane required his scouts to make decisions based not on who “looked like a player,” but on sabermetric statistics, which led to the assembly of seemingly weak-to-middling players who won many more games than they were ever expected to win. The 2002 team even won 20 consecutive games, then an American League record.

So how does this tie into the storm water sector? Storm water, like baseball, has been governed largely by simplified analytical methods (e.g., the rational method) and informal approaches based more often on heuristics than on data and complex analysis. And as in baseball, these approaches have evolved, with simple analytical methods giving way to two-dimensional modeling platforms; however, the legacy of simpler times remains. One example is the use of technology-based standards for storm water management practices.
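The rational method shows how little data such simplified approaches consume: peak runoff is a single product, Q = C·i·A, of a runoff coefficient C, a rainfall intensity i for a storm lasting the time of concentration, and a drainage area A. A minimal Python sketch, assuming U.S. customary units (cubic feet per second, inches per hour, acres) and hypothetical site values:

```python
def rational_peak_flow_cfs(runoff_coefficient: float,
                           intensity_in_per_hr: float,
                           area_acres: float,
                           unit_factor: float = 1.008) -> float:
    """Rational method peak flow, Q = k * C * i * A, in cubic feet per second.

    k = 1.008 converts acre-inches per hour to cfs; many designers round it to 1.0.
    """
    if not 0.0 <= runoff_coefficient <= 1.0:
        raise ValueError("runoff coefficient C must be between 0 and 1")
    return unit_factor * runoff_coefficient * intensity_in_per_hr * area_acres


# Hypothetical 10-acre, mostly impervious site (C = 0.8), design intensity 3 in/hr
q = rational_peak_flow_cfs(0.8, 3.0, 10.0)
print(round(q, 1))  # 24.2
```

Three numbers in, one number out: useful for sizing, but it says nothing about how a built system actually performs.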

Assumed pollutant removal capacities are assigned to various storm water control measures (SCMs), and these assumptions are used to select and size SCMs in design. But what is the actual performance of an SCM, and is it adequate for the pollutants being delivered? This approach is defended by the argument that pollutants in storm water runoff, and their removal by SCMs, are too difficult to monitor and quantify. To be sure, urban runoff flows are not as easily contained and measured as flows in water or wastewater systems, but as sensor technology and computing power have developed, it is becoming increasingly hard to defend throwing up one's hands and claiming that tracking the performance of storm water systems and programs is simply too hard.
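Checking an assumed removal credit against monitoring data can be as simple as comparing pollutant loads in and out across storm events. The Python sketch below is a hypothetical illustration, not a regulatory procedure: it computes a load-based removal efficiency (one of several ways to summarize SCM performance) and compares it with an assumed 80 percent design credit, a placeholder value rather than any particular standard. The class, field and function names and the event data are all invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable

ASSUMED_REMOVAL_CREDIT = 0.80  # hypothetical design credit, not a standard value


@dataclass
class MonitoredEvent:
    """Paired inflow/outflow monitoring results for one storm (hypothetical fields)."""
    inflow_liters: float
    influent_mg_per_l: float   # event mean concentration entering the SCM
    outflow_liters: float
    effluent_mg_per_l: float   # event mean concentration leaving the SCM


def load_kg(volume_liters: float, concentration_mg_per_l: float) -> float:
    # mg/L * L = mg; divide by 1e6 to convert mg to kg
    return volume_liters * concentration_mg_per_l / 1e6


def measured_load_reduction(events: Iterable[MonitoredEvent]) -> float:
    """Fraction of total inflow load not discharged, summed over all events."""
    events = list(events)
    load_in = sum(load_kg(e.inflow_liters, e.influent_mg_per_l) for e in events)
    load_out = sum(load_kg(e.outflow_liters, e.effluent_mg_per_l) for e in events)
    if load_in <= 0:
        raise ValueError("no inflow load recorded")
    return 1.0 - load_out / load_in


events = [
    MonitoredEvent(100_000, 120.0, 90_000, 30.0),  # 12.0 kg in, 2.7 kg out
    MonitoredEvent(50_000, 80.0, 50_000, 40.0),    # 4.0 kg in, 2.0 kg out
]
reduction = measured_load_reduction(events)
print(f"measured {reduction:.1%} vs assumed {ASSUMED_REMOVAL_CREDIT:.0%}")
# measured 70.6% vs assumed 80%
print(reduction >= ASSUMED_REMOVAL_CREDIT)  # False
```

In this made-up case the practice underperforms its assumed credit, which is exactly the kind of gap that goes unnoticed when performance is assigned rather than measured.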

Sensors have evolved, and so has our ability to capture and store data and to use it to improve storm water infrastructure performance. I recall the days when someone would set up a flow or water quality monitoring station in a remote location, and some unfortunate soul would have to trudge out regularly to a wetland or a stream in the middle of nowhere to collect the stored data.

Thanks to cloud-based storage and cellular technology, the staff time needed to collect monitoring data has been greatly reduced. Performance data can now be measured in real time as well as stored remotely. The impact of this cannot be overstated. For instance, these advancements make it possible to know at any time whether an SCM is performing as needed or expected, which can reduce maintenance costs by allowing for as-needed inspections rather than scheduled ones. Additionally, real-time monitoring can enable real-time controls, transforming a passive or static SCM into dynamically controlled infrastructure and allowing professionals to squeeze more performance out of each practice in new and value-producing ways.
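As-needed inspection can be driven by a simple rule applied to real-time sensor data. This Python sketch assumes a hypothetical water-level sensor in a bioretention cell or similar practice, an assumed 48-hour design drawdown time and an assumed 0.05-meter "empty" depth (actual criteria vary by jurisdiction and practice type). It flags the SCM for inspection when more time than the design allows has passed since the storm peak before the ponded water drains. All names are illustrative.

```python
from typing import List, Tuple

# Each reading: (hours since start of record, ponded water depth in meters)
Reading = Tuple[float, float]


def hours_after_peak(readings: List[Reading], empty_depth_m: float) -> Tuple[float, bool]:
    """Return (hours elapsed since peak depth, whether the practice has drained)."""
    if not readings:
        raise ValueError("no sensor readings")
    peak_index = max(range(len(readings)), key=lambda i: readings[i][1])
    peak_time = readings[peak_index][0]
    for time_h, depth_m in readings[peak_index:]:
        if depth_m <= empty_depth_m:
            return time_h - peak_time, True
    # Not drained yet: report time elapsed up to the latest reading
    return readings[-1][0] - peak_time, False


def needs_inspection(readings: List[Reading],
                     design_drawdown_hours: float = 48.0,  # assumed criterion
                     empty_depth_m: float = 0.05) -> bool:   # assumed "empty" threshold
    # Drained too slowly or still ponded: either way, more time has passed
    # since the peak than the assumed design drawdown allows.
    elapsed, _drained = hours_after_peak(readings, empty_depth_m)
    return elapsed > design_drawdown_hours


healthy = [(0.0, 0.0), (2.0, 0.60), (12.0, 0.30), (30.0, 0.04)]
clogged = [(0.0, 0.0), (2.0, 0.60), (24.0, 0.50), (60.0, 0.40)]
print(needs_inspection(healthy))  # False: drained 28 h after the peak
print(needs_inspection(clogged))  # True: still ponded 58 h after the peak
```

A rule this simple would need refinement in practice (sensor noise, back-to-back storms), but it shows how a live data stream can stand in for a fixed inspection calendar.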

Like baseball, the storm water sector has always generated lots of data. In 1858, Henry Chadwick, a New York sportswriter, developed the box score, which formalized the organization of the data generated by a single baseball game. The storm water parallel is the program performance information that must be tracked and reported to meet municipal separate storm sewer system (MS4) permit requirements.

As with baseball's limited traditional statistics, much of the storm water program and performance data that is collected is not easily manipulated, and in many instances it is rarely reviewed or analyzed. Meanwhile, new, cheaper sensors can send real-time data immediately to cloud storage, rapidly increasing the volume of data collected.

New technologies are not only capturing and storing performance data, but also using it to make infrastructure dynamic and smart. The storm water sector is becoming a big data sector, and increasingly we expect to see decisions on SCM selection and program management driven by high-quality, dynamic data sets. Storm water has always had to scrounge for funding and is constantly seeking legitimacy as an infrastructure sector. Like the underdog Oakland Athletics, perhaps storm water will go on a long winning streak. And with cleaner water, we all win.