How machine learning (ML) is changing the CSO monitoring game

Combined Sewer Overflow (CSO) monitoring: Opportunities for artificial intelligence and machine learning
May 19, 2025

Combined sewer systems are an older infrastructure practice in which a single pipe collects both wastewater and stormwater. There are approximately 700 combined sewer systems in the United States, primarily in the Northeast and Midwest[1], where new sanitary sewers were connected to existing storm sewers. When wastewater treatment plants were built in the first half of the 20th century, sewer interceptors were constructed to convey all wastewater for treatment during dry weather. During wet weather, combined flow that exceeds the interceptor capacity is discharged to receiving waters as a combined sewer overflow.

Combined sewer overflow (CSO) discharges can cause exceedances of water quality standards (WQS). Such exceedances may pose risks to human health, threaten aquatic life and its habitat, and impair recreational activity[2]. The 1994 National CSO Policy provided a framework for combined sewer communities to make significant investments to capture and treat CSO discharges. The following year, the U.S. Environmental Protection Agency (EPA) released guidance for nine minimum controls for combined sewer systems, including guidance on monitoring CSOs to identify overflow frequency, duration and magnitude at individual outfalls[3]. Over the past 30 years, CSO monitoring has evolved to include visual inspection, depth monitoring, area velocity (A/V) monitoring and receiving stream gauging to accurately measure the frequency, duration and volume of CSO discharges.

Many CSO communities deploy a complex network of sensors connected to their supervisory control and data acquisition (SCADA) system or a separate metering platform. Some communities use their monitoring data to provide public alerts to discourage recreation in waterbodies that may be impaired by CSO discharges, and the state of Massachusetts requires public notification within two hours of the community becoming aware of a CSO discharge[4] from their monitoring data.

Artificial intelligence and machine learning basics

Artificial intelligence and machine learning (AI/ML) are emerging technologies in the wastewater industry and have been successfully applied in asset management, operational decision support and predictive modeling. While there may be a perception that all forms of AI/ML are generative AI such as ChatGPT, wastewater applications typically rely on traditional machine learning (ML), which identifies repeatable patterns and relationships in data to create a predictive model that is not bound by the mathematical equations used in traditional modeling approaches.

A combined sewer community can leverage the following from machine learning:

  1. Algorithm speed can provide a result in seconds compared to traditional models that may require hours or days. The speed of ML allows for forecasting the future or real-time decision support;
  2. By identifying repeatable patterns, a reliable ML model can be developed in situations where there is insufficient data to develop a traditional mathematical model.

Citizens Energy Group in Indianapolis uses ML to forecast inflows to its CSO storage tunnel. The utility maintains a cloud tool to ingest the 72-hour public domain National Oceanic and Atmospheric Administration (NOAA) forecast, predict tunnel inflow volume and present the predictions in a dashboard available to all utility staff[5].

While Citizens Energy Group and many other CSO communities have found value in using ML, they have found it is most valuable when deployed under a framework that formally defines the objectives, stakeholders and expectations for accuracy before the ML model is trained.

How machine learning can enhance combined sewer overflow monitoring

CSO communities understand that their monitoring approach will see higher expectations in the future, whether from a desire to better inform the public, identify structures for additional maintenance or comply with new regulatory requirements. ML can augment the existing monitoring network in the following ways:

  • Soft sensor: Additional confirmation of a metered CSO discharge before reporting to the public or regulatory agencies.
  • Forecasting: Predicting CSO discharges in the future for public notifications, recreational closures or operational interventions to reduce the discharge.
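In its simplest form, the soft sensor idea above amounts to cross-checking the metered depth against the depth predicted by the ML model before a discharge is reported. The Python sketch below is illustrative only. The 1.0 ft weir crest is borrowed from the example structure described later in this article, and the `Reading` class, the `classify` function and the three status labels are hypothetical names, not part of any deployed system.

```python
from dataclasses import dataclass

WEIR_CREST_FT = 1.0  # assumed weir height above the pipe invert


@dataclass
class Reading:
    measured_depth_ft: float   # from the depth sensor
    predicted_depth_ft: float  # from the ML model (soft sensor)


def classify(reading: Reading, weir_ft: float = WEIR_CREST_FT) -> str:
    """Compare the metered and modeled depth against the weir crest."""
    metered = reading.measured_depth_ft > weir_ft
    modeled = reading.predicted_depth_ft > weir_ft
    if metered and modeled:
        return "confirmed"      # both agree: report the discharge
    if metered or modeled:
        return "needs_review"   # disagreement: possible sensor fault or model miss
    return "no_overflow"


status = classify(Reading(measured_depth_ft=1.2, predicted_depth_ft=1.1))
```

A community could route "needs_review" results to staff for a visual check rather than publishing them automatically; that policy choice belongs to the stakeholders, not to the code.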

Figure 2: Roadmap for the development of an ML solution for CSO monitoring

For successful training and testing of an ML model, stakeholders should review the collected data to confirm it is representative of wet weather conditions observed in the combined sewer system. The data is also pre-processed for compatibility with the ML algorithm, which includes normalization. The ML model is then trained and tested for accuracy in predicting CSO discharges. This step includes a parameter sensitivity analysis to confirm that the relative importance of input data (such as rainfall and temperature) aligns with stakeholder institutional knowledge; in effect, the analysis evaluates whether the artificial intelligence is consistent with the human intelligence of the stakeholders. Following successful training and testing, the trained ML model is compiled, and the re-training process and deployment strategy are determined.
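As a rough illustration of the pre-processing step, the Python sketch below splits a series chronologically and applies min-max normalization. The article does not state which normalization method or split ratio was used, so min-max scaling, the 80/20 split, the function names and the sample rainfall values are all assumptions. The scaling limits are fitted on the training portion only so that information from the test period does not leak into training.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def min_max_fit(values):
    """Return the (min, max) limits used for scaling."""
    return min(values), max(values)


def min_max_apply(values, lo, hi):
    """Scale values so the fitted range [lo, hi] maps to [0, 1]."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant series: nothing to scale
    return [(v - lo) / span for v in values]


# Made-up hourly rainfall depths in inches, for illustration only
rainfall_in = [0.0, 0.1, 0.5, 0.0, 0.2, 1.0, 0.0, 0.3, 0.0, 0.4]

train, test = chronological_split(rainfall_in)
lo, hi = min_max_fit(train)                 # fit on training data only
train_scaled = min_max_apply(train, lo, hi)
test_scaled = min_max_apply(test, lo, hi)   # may fall outside [0, 1]
```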

Figure 3: Conceptual architecture for deployment of an ML tool

To demonstrate the above concept, an ML model was developed with historical data from a real CSO structure. This CSO has an overflow weir at approximately 1 foot from the invert of the pipe and is monitored by a depth sensor pointing at the weir. The long short-term memory network (LSTM) algorithm was applied for time series forecasting of flow depth in the CSO structure based on the following data sources:

  • Rainfall from the community’s gauge network
  • Daily temperature
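Sequence models such as an LSTM are typically trained on fixed-length windows of past inputs paired with a future target. The Python sketch below shows one way the rainfall, temperature and depth series could be arranged into such pairs. It is not the model described above: the `make_windows` function, the lookback and horizon values, and the sample data are hypothetical, and the daily temperature is assumed to be repeated at each time step of the rainfall record.

```python
def make_windows(rainfall, temperature, depth, lookback=3, horizon=1):
    """Build (input window, target depth) pairs for sequence forecasting.

    Each input is `lookback` time steps of [rainfall, temperature];
    the target is the depth `horizon` steps after the window ends.
    """
    if not (len(rainfall) == len(temperature) == len(depth)):
        raise ValueError("all series must share the same time index")
    samples = []
    last_start = len(depth) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback
        window = [[rainfall[t], temperature[t]] for t in range(start, end)]
        target = depth[end + horizon - 1]
        samples.append((window, target))
    return samples


# Made-up values: rainfall (in), temperature (deg F), depth (ft)
rain = [0.0, 0.2, 0.6, 0.1, 0.0, 0.0]
temp = [68.0] * 6
depth = [0.2, 0.3, 0.9, 1.2, 0.7, 0.4]

samples = make_windows(rain, temp, depth, lookback=3, horizon=1)
```

Each window has the shape (time steps, features) that most LSTM implementations expect for a single sample; past depth could be added as a third feature if stakeholders want the model to use it.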

Table 1: A comparison of the CSO event frequency by year from the monitoring data and the ML model predictions
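A yearly event frequency depends on how an overflow event is counted, and the same rule has to be applied to the measured and the predicted depth for the comparison to be fair. The Python sketch below shows one possible counting rule. It is an assumption, not the method used for Table 1: depth above the weir crest counts as an exceedance, and exceedances less than six hours apart are merged into a single event. The `count_events_by_year` function, the six-hour gap and the sample readings are illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_events_by_year(readings, weir_ft=1.0, min_gap=timedelta(hours=6)):
    """Count overflow events per calendar year.

    `readings` is a time-ordered list of (timestamp, depth_ft) pairs.
    Exceedances separated by less than `min_gap` are merged into one event.
    """
    counts = Counter()
    last_exceedance = None
    for timestamp, depth_ft in readings:
        if depth_ft <= weir_ft:
            continue  # depth at or below the weir crest: no overflow
        if last_exceedance is None or timestamp - last_exceedance >= min_gap:
            counts[timestamp.year] += 1  # a new event starts here
        last_exceedance = timestamp
    return dict(counts)


t0 = datetime(2023, 6, 1)
readings = [
    (t0, 0.5),
    (t0 + timedelta(hours=1), 1.2),    # event 1 begins
    (t0 + timedelta(hours=2), 1.1),
    (t0 + timedelta(hours=3), 0.6),
    (t0 + timedelta(hours=4), 1.05),   # within 6 h: still event 1
    (t0 + timedelta(hours=24), 1.3),   # event 2
]
events = count_events_by_year(readings)
```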

The example ML model could be deployed consistent with the architecture in Figure 3 for short-term or long-term predictions based on one or more of the following public domain NOAA forecasts:

  • High-Resolution Rapid Refresh (HRRR), 18-hour duration, 2.5 km grid resolution, one-hour temporal resolution[6]
  • National Digital Forecast Database (NDFD), 72-hour duration, 2.5 km grid resolution, six-hour temporal resolution[7]
  • National Blend of Models (NBM) includes multiple forecasts from two to seven days with varying resolution[8]

The appropriate forecast for the CSO ML model should be selected based on stakeholder objectives and the desired resolution of rainfall data. In an on-premises or cloud deployment, the scripted tools could query the monitoring data and NOAA server every hour to collect current data and generate predictions of potential overflows.
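One hourly cycle of such a scripted tool might look like the Python sketch below. It is a sketch under stated assumptions: the data sources and the model are passed in as plain functions because the actual SCADA query, NOAA request and trained model are specific to each deployment, and the stub functions here return made-up numbers in place of real services.

```python
def run_cycle(fetch_recent_depth, fetch_rain_forecast, predict_depth, weir_ft=1.0):
    """Run one prediction cycle and return the forecast hours with overflow.

    fetch_recent_depth:  callable returning recent measured depths (ft)
    fetch_rain_forecast: callable returning forecast rainfall per hour (in)
    predict_depth:       callable (depths, rainfall) -> predicted depth per hour (ft)
    """
    recent_depth = fetch_recent_depth()
    rain_forecast = fetch_rain_forecast()
    predicted = predict_depth(recent_depth, rain_forecast)
    # Hours are numbered from 1 (the first forecast hour)
    return [hour for hour, depth in enumerate(predicted, start=1) if depth > weir_ft]


# Hypothetical stand-ins for the monitoring platform, NOAA forecast and ML model
def stub_depth():
    return [0.3, 0.4]


def stub_rain():
    return [0.0, 0.8, 0.2]


def stub_model(depths, rainfall):
    # Toy relationship for illustration only: last depth plus forecast rainfall
    return [depths[-1] + r for r in rainfall]


alert_hours = run_cycle(stub_depth, stub_rain, stub_model)
```

In practice the cycle would be triggered by a scheduler, such as a cron job or a cloud timer, rather than by a loop inside the script, so that a failed run does not block the next one.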

Figure 4: A comparison of the measured depth from the CSO monitoring, the rainfall data, and the predicted depth from the ML model for the summer of 2023

Summary

While significant progress has been made in controlling combined sewer overflows, communities will continue to experience overflows and should expect additional demands on accurate and timely CSO monitoring. ML can augment existing monitoring infrastructure in the form of a soft sensor or forecasting system.

References

[1] US EPA Where Combined Sewer Overflow Outfalls are Located https://www.epa.gov/npdes/where-combined-sewer-overflow-outfalls-are-located

[2] Federal Register Vol. 59, No.75 April 19, 1994. Combined Sewer Overflow (CSO) Control Policy Notice. https://www.epa.gov/sites/default/files/2015-10/documents/owm0111.pdf

[3] US EPA 832-B-95-003, May 1995. Combined Sewer Overflows Guidance for Nine Minimum Controls. https://www.epa.gov/sites/default/files/2015-10/documents/owm0030_2.pdf

[4] Massachusetts Department of Environmental Protection, March 2022. Combined Sewer Overflow (CSO) Preliminary Public Notification Plan - Instructions https://www.mass.gov/doc/instructions-combined-sewer-overflow-public-notification-plan/download

[5] Sutton, D., Bowers, C., and Ranck, C. (2023) Lessons Learned in 18 Months of Deploying Machine Learning for Predictive Operational Support; Water Environment Federation Technical Exhibition and Conference 2023, Chicago, IL.

[6] NOAA High Resolution Rapid Refresh https://rapidrefresh.noaa.gov/hrrr/

[7] NOAA National Digital Forecast Database https://www.ncei.noaa.gov/products/weather-climate-models/national-digital-forecast-database

[8] NOAA National Blend of Models: https://vlab.noaa.gov/web/mdl/nbm

About the Author

Chris Ranck

Chris Ranck is the national planning leader for Black & Veatch’s collection systems and wet-weather programs. He has 25 years of experience with a focus in combined sewer systems, including hydraulic modeling, water quality modeling, hydraulic design, regulatory negotiations, program management, post-construction monitoring and applications of machine learning. Chris is the chair of WEF’s Collection System Community.
