Meter tampering is a common threat to the business side of utility services and also a public safety threat, since uncontrolled modifications to the network may increase the risk of accidents. Fraud detection is therefore a core concern for any utility company.
Users who manipulate the network typically drive up the price of the service for compliant users: to keep the service sustainable, the remaining users absorb the additional load costs as well as the cost of the inspection procedures needed to identify fraud.
This post showcases one of our previous AI projects on detecting meter tampering in water services. The same approach can, however, be applied to other types of services such as electricity, gas, and TV. The challenge we faced was identifying users with a high probability of meter tampering.
The first issue we faced was how reliable our labels were. While we had the outcomes of previous inspections (legal or fraud), an inspection with a legal outcome may, in some cases, be the result of a bribe. So, while you can trust positive fraud labels, negative cases are a mixture of genuinely legal cases and corruption. Another source of error in the legal observations is difficult-to-detect tampering, for example, cases where the fraud was not on the meter itself but consisted of bypassing the system at a different, hidden point of the network.
We can model this learning task with multiple paradigms.
In this challenge, most houses won’t have been inspected. How can you use this data? You can treat them as negatives for training, taking care not to introduce too much noise from hidden fraud, or you can use semi-supervised learning to regularize the model with this data (here).
Also, we discussed how to tackle this task with Positive Unlabeled learning in our previous blog post (here).
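To give a flavor of what Positive Unlabeled learning can look like, here is a minimal sketch of one well-known correction (Elkan and Noto's label-frequency rescaling). It is not necessarily the method from the linked post. It assumes you already have a classifier trained to separate confirmed frauds from everything else, and that confirmed frauds are a random sample of all frauds, which is a strong assumption in practice. All names and scores below are hypothetical.

```python
from statistics import mean


def estimate_label_frequency(scores_of_confirmed_frauds):
    """Estimate c = P(labeled | fraud) as the mean score on held-out confirmed frauds."""
    return mean(scores_of_confirmed_frauds)


def corrected_fraud_probability(scores, c):
    """Rescale P(labeled | x) into an estimate of P(fraud | x), clipped to 1."""
    return [min(1.0, s / c) for s in scores]


# Hypothetical scores from a classifier trained on
# "confirmed fraud" vs. "everything else" (legal outcomes and uninspected houses).
held_out_confirmed_fraud_scores = [0.30, 0.50, 0.40]
c = estimate_label_frequency(held_out_confirmed_fraud_scores)

# Scores of houses we are considering for inspection.
candidate_scores = [0.10, 0.20, 0.60]
fraud_probabilities = corrected_fraud_probability(candidate_scores, c)
```

The rescaling does not change the ranking of houses, but it gives probabilities on a more meaningful scale when they are later combined with costs.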
Since, in this case, we have access to the test set, Transductive Learning techniques may have a role to play. If you don’t know what Transductive Learning is, take the time to read about it; chances are you will benefit from it (here).
Finally, this problem tends to be extremely imbalanced. Take a look at one of our previous papers on how to reformulate class imbalance as a ranking task (here), which can be combined with any of the solutions mentioned above.
The Data and Some Relevant Patterns for Fraud Detection
We trained our models by combining three categories of data: contractual, consumption, and context data.
Contractual:
We included information about the:
Contract category: residential vs. industrial, dual-fee vs. flat-fee, contracted capacity, demographic data in the contract (e.g., house size, family size, etc.).
The victim itself, that is, the potentially tampered meter: brand, model, capacity, technical specifications, etc. These features proved to be especially valuable when combined with context data (we will discuss this later).
While some features, such as the ZIP code and the income level of the household, may be correlated with fraud, we should treat them with caution, since they may increase bias and unfairness in the decision making (check this book).
Consumption:
Meter tampering is often associated with dramatic drops in consumption. Thus, we extracted features from aggregated consumption over time. Since consumption is seasonal and changes from house to house, it is essential to consider relative values instead of absolute ones. For example, you should look at the percentage change in usage between months instead of the absolute difference. We also normalized these features with respect to the global trend between any two months to compensate for seasonality.
Other fraud-correlated patterns we found were consumption with low inter-month variability (i.e., the standard deviation of the consumption being too small) and “capped” consumption (i.e., over the last year, the maximum value is repeated multiple times).
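The consumption features described above can be sketched in a few lines. This is an illustration rather than our production code: the function names are made up, and subtracting the global relative change from the house's relative change is just one simple way to normalize against the global trend.

```python
from statistics import mean, pstdev


def relative_change(previous, current):
    """Change between two periods as a fraction of the previous value; None if undefined."""
    if previous == 0:
        return None
    return (current - previous) / previous


def trend_adjusted_change(house_prev, house_curr, global_prev, global_curr):
    """House-level relative change minus the global relative change (assumed normalization)."""
    house = relative_change(house_prev, house_curr)
    overall = relative_change(global_prev, global_curr)
    if house is None or overall is None:
        return None
    return house - overall


def coefficient_of_variation(monthly):
    """Standard deviation relative to the mean; small values signal suspiciously flat usage."""
    avg = mean(monthly)
    return pstdev(monthly) / avg if avg else None


def max_repetitions(monthly):
    """How many times the maximum monthly value occurs ("capped" consumption)."""
    return monthly.count(max(monthly))


# Hypothetical monthly consumption for one house over the last year.
house_last_year = [15, 15, 12, 15, 9, 15, 14, 15, 15, 13, 15, 6]

# Last month: the house dropped from 15 to 6 while global usage went from 1000 to 900.
drop = trend_adjusted_change(15, 6, 1000, 900)
flatness = coefficient_of_variation(house_last_year)
cap_count = max_repetitions(house_last_year)
```

Dividing the standard deviation by the mean keeps the variability feature comparable between small and large consumers, in the same spirit as using relative changes.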
Contextual:
Context data proved to be the most important category. Meter tampering is like a virus and spreads among neighbors, especially among neighbors with the same model/brand of meter. Therefore, we extracted information such as the density of (recent) fraud within a radius of K meters (for multiple values of K), with and without stratification per meter model. Note our emphasis on density rather than raw counts. In this project, making features relative to the local context of the individual and the region was critical to learning efficiently. If you don’t know how densely populated an area is (pretty common in countries with scarce open data), consider internal data such as the number of contracts you have in the region or the density of your internal assets (e.g., pipes) as a proxy.
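As a rough sketch of the density feature, the snippet below computes the share of neighboring contracts within a given radius that had a recent fraud, optionally restricted to neighbors with the same meter model. The record fields (`id`, `lat`, `lon`, `meter_model`, `recent_fraud`) are hypothetical, and the number of contracts is used as the proxy for how dense the area is. The brute-force loop is only for readability; a spatial index would be needed at real scale.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def fraud_density(target, contracts, radius_m, same_model_only=False):
    """Fraction of neighboring contracts within radius_m that had a recent fraud."""
    neighbors = 0
    frauds = 0
    for c in contracts:
        if c["id"] == target["id"]:
            continue  # never count the house itself
        if same_model_only and c["meter_model"] != target["meter_model"]:
            continue
        dist = haversine_m(target["lat"], target["lon"], c["lat"], c["lon"])
        if dist <= radius_m:
            neighbors += 1
            frauds += 1 if c["recent_fraud"] else 0
    return frauds / neighbors if neighbors else 0.0


# Hypothetical contracts around a target house.
target = {"id": 0, "lat": 0.0, "lon": 0.0, "meter_model": "X", "recent_fraud": False}
contracts = [
    target,
    {"id": 1, "lat": 0.0005, "lon": 0.0, "meter_model": "X", "recent_fraud": True},
    {"id": 2, "lat": 0.0, "lon": 0.0005, "meter_model": "Y", "recent_fraud": False},
    {"id": 3, "lat": 0.01, "lon": 0.0, "meter_model": "X", "recent_fraud": True},
]

# One feature per radius, with and without stratification by meter model.
features = {
    (k, same): fraud_density(target, contracts, k, same_model_only=same)
    for k in (100, 2000)
    for same in (False, True)
}
```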
Reliable estimation of the model performance
When time and space are part of our learning system, we must pay attention to the way we split our data for performance estimation. Random splits tend to “leak” information into training, leading to overestimated model performance. In this case, we suggest splitting both temporally and geographically, as illustrated in the following graph.
Other variables to consider when splitting train and test include the inspection round and the inspection team. The right split and its granularity (e.g., splitting between months/years, city/district/region) depend on business requirements and the way we intend to deploy our models.
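One possible way to implement a split that is both temporal and geographic is sketched below. Training uses only inspections before a cutoff date in the training regions, and testing uses only inspections from the cutoff onwards in held-out regions. The field names, the cutoff, and the choice to discard the remaining records are assumptions for illustration.

```python
from datetime import date


def spatio_temporal_split(records, cutoff, test_regions):
    """Train on past inspections in training regions; test on later ones in held-out regions."""
    train, test = [], []
    for r in records:
        in_test_region = r["region"] in test_regions
        if r["inspected_on"] < cutoff and not in_test_region:
            train.append(r)
        elif r["inspected_on"] >= cutoff and in_test_region:
            test.append(r)
        # Records that are "past + test region" or "future + train region"
        # are discarded so that neither time nor space leaks across the split.
    return train, test


# Hypothetical inspection records.
records = [
    {"id": 1, "region": "north", "inspected_on": date(2020, 3, 1)},
    {"id": 2, "region": "north", "inspected_on": date(2021, 6, 1)},
    {"id": 3, "region": "south", "inspected_on": date(2020, 5, 1)},
    {"id": 4, "region": "south", "inspected_on": date(2021, 8, 1)},
]

train, test = spatio_temporal_split(
    records, cutoff=date(2021, 1, 1), test_regions={"south"}
)
```

Discarding the off-diagonal records costs data, so the granularity of the regions and the cutoff should follow how the model will actually be deployed.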
How to use these predictions?
Predictions by themselves are not useful. We need to combine them with a decision process that optimizes a KPI. In this case, we could use the fraud predictions to build cost-effective inspection schedules and routes, finding the right trade-off between the expected benefit of inspecting a house and the cost of that inspection. We will cover this topic in a different blog post.
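As a deliberately simplified illustration of that trade-off, the sketch below ranks houses by expected net benefit (fraud probability times recoverable value, minus inspection cost) and keeps the best ones within an inspection budget. The field names and figures are hypothetical, and routing between houses is ignored entirely.

```python
def plan_inspections(houses, max_inspections):
    """Pick the houses with the highest positive expected net benefit."""
    scored = []
    for h in houses:
        expected_net = (
            h["fraud_probability"] * h["recoverable_value"] - h["inspection_cost"]
        )
        if expected_net > 0:  # skip inspections expected to lose money
            scored.append((expected_net, h["id"]))
    scored.sort(reverse=True)
    return [house_id for _, house_id in scored[:max_inspections]]


# Hypothetical candidates with model predictions and business estimates.
houses = [
    {"id": "A", "fraud_probability": 0.8, "recoverable_value": 500, "inspection_cost": 50},
    {"id": "B", "fraud_probability": 0.1, "recoverable_value": 300, "inspection_cost": 50},
    {"id": "C", "fraud_probability": 0.5, "recoverable_value": 200, "inspection_cost": 50},
]

schedule = plan_inspections(houses, max_inspections=2)
```

Note that house B is never inspected here regardless of the budget, because its expected recovery does not cover the cost of the visit.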
Other use cases in the industry beyond fraud detection
We discussed here how to detect water meter fraud. However, the utility industry is a vast field for the application of AI, including identifying regions with high potential for customer acquisition, churn prediction, and anomaly detection in the network, among others. If you want to explore any of these ideas, just message us!