Uncovering Patterns and Trends in Ausgrid Power Outage Data

Outage Frequency Patterns in Electricity Transmission

Herman Wandabwa
6 min read · May 7, 2023
Photo by Fré Sonneveld on Unsplash

Power outages are a prevalent challenge encountered by utility companies, highlighting the need for a thorough analysis of historical data to understand patterns and trends. This article presents an analysis of the historical outage data for Ausgrid, Australia’s largest electricity distributor, which services 1.7 million customers across Sydney, the Hunter Valley, and the Central Coast. Ausgrid’s network infrastructure includes substations, powerlines, underground cables, and power poles, spanning an area of 22,275 square kilometers.

The outage data examined in this analysis covers the period from 2013 to 2022, encompassing outages that impacted 50 or more customers and lasted for over five minutes. The data is categorized by Local Government Areas (LGAs) and includes information on the average duration of outages and their possible causes. The individual files containing the data can be accessed via the following link.

The focus of this analysis was on the frequency of outages, which was examined in relation to the time of day, day of the week, year, and location.

Data Collection: The data was sourced from Excel files spanning from 2013 to 2022. The code used to read the Excel files and merge them into a single dataframe for further manipulation is presented below.
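A minimal sketch of this loading step follows. It assumes the yearly files sit together in one folder and share the same columns. The function name `read_outage_files`, the `source_file` column, and the folder layout are illustrative assumptions, not the article's original code.

```python
from pathlib import Path

import pandas as pd


def read_outage_files(folder, pattern="*.xlsx", reader=pd.read_excel):
    """Read every matching file in `folder` and stack them into one dataframe.

    `reader` defaults to pandas.read_excel (which needs openpyxl for .xlsx
    files). It is a parameter so that CSV exports can be loaded the same way.
    """
    paths = sorted(Path(folder).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files matching {pattern!r} in {folder!r}")

    frames = []
    for path in paths:
        frame = reader(path)
        # Hypothetical helper column: remember which file each row came from.
        frame["source_file"] = path.name
        frames.append(frame)

    # ignore_index gives the merged dataframe a clean 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Keeping the source file name on each row makes it easier to trace a suspicious record back to the yearly file it came from.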

Additional features were then derived from the “Start Time” column. The column was first converted to date-time format, and the year, month, day, day of the week, and hour were extracted from it. A day label was created by mapping the numeric day of the week to its day name, and a boolean flag records whether each day falls on a weekend. These features support the more detailed analysis and modeling that follow.
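The feature extraction described above can be sketched as follows. Only “Start Time” is named in the article. The output column names (`Year`, `Month`, `Day`, `Day_of_Week`, `Hour`, `Day_Label`, `Is_Weekend`) are assumptions made for this sketch.

```python
import pandas as pd


def add_time_features(df, start_col="Start Time"):
    """Return a copy of `df` with calendar features derived from `start_col`."""
    out = df.copy()
    out[start_col] = pd.to_datetime(out[start_col])
    start = out[start_col].dt

    out["Year"] = start.year
    out["Month"] = start.month
    out["Day"] = start.day
    out["Day_of_Week"] = start.dayofweek   # Monday = 0 ... Sunday = 6
    out["Hour"] = start.hour
    out["Day_Label"] = start.day_name()    # e.g. "Saturday"
    out["Is_Weekend"] = out["Day_of_Week"] >= 5
    return out
```

Working on a copy leaves the merged dataframe untouched, which helps when the raw data needs to be re-checked later.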

Data Integrity:

During the analysis, approximately 284 records turned out to be duplicated outage events, most likely because of data integrity problems when the events were captured. On closer examination, the records for Event_IDs 1592611 and 1598617 referred to two distinct outages that occurred in the HORNSBY, RYDE, and KU-RING-GAI areas at the same date and time. These were kept as independent events, and the remaining 282 duplicate records were removed. The analysis was ultimately conducted on 13,175 unique outage events.
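One way to express this clean-up is sketched below. The article does not say which columns defined a duplicate, so `key_cols` is left to the caller and is an assumption. Only the two Event_IDs come from the text.

```python
import pandas as pd

# The two events the article identifies as genuinely distinct outages.
DISTINCT_EVENT_IDS = frozenset({1592611, 1598617})


def drop_duplicate_events(df, key_cols, keep_ids=DISTINCT_EVENT_IDS, id_col="Event_ID"):
    """Drop rows that repeat an earlier row on `key_cols`.

    Rows whose Event_ID is in `keep_ids` are always retained, even if they
    share their key columns with another row.
    """
    protected = df[id_col].isin(keep_ids)
    is_duplicate = df.duplicated(subset=key_cols, keep="first")
    return df[~is_duplicate | protected].reset_index(drop=True)
```

A call might look like `drop_duplicate_events(df, key_cols=["Start Time", "LGA"])`, where the `LGA` column name is hypothetical.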

Frequency Analysis:

Frequency of outages by location

Frequency analysis helps identify patterns and trends that can inform decisions. A useful starting point is the frequency of outages by location, which can reveal potential hotspots for future outages. This only requires a simple count of Event IDs per Local Government Area (LGA). Areas with high counts are more prone to outages, so the network operator can investigate the underlying issues there first and improve the reliability of supply for those customers.
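The count per LGA can be sketched as below. The column name `LGA` is an assumption about the dataset, and the sketch counts distinct Event IDs so that an event listed more than once within an LGA is not double counted.

```python
import pandas as pd


def outages_per_lga(df, lga_col="LGA", id_col="Event_ID"):
    """Count distinct outage events per LGA, most affected LGAs first."""
    return (
        df.groupby(lga_col)[id_col]
        .nunique()
        .sort_values(ascending=False)
    )
```

The returned Series can be passed straight to a bar plot, for example with `outages_per_lga(df).plot.bar()` when matplotlib is installed.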

The bar plot below illustrates the count of events per location as generated by the aforementioned code. The data reveals that Gosford, Hornsby, and Wyong were the locations most significantly impacted by outages during the collection period. These findings suggest that additional attention and resources may need to be directed towards these areas to minimize the frequency and duration of outages. Further analysis can be performed to determine the underlying causes of the outages and identify potential solutions.

Outage Frequency by LGAs

Outage Frequency by Year

In the data analyzed, 2020 recorded the highest number of outages. The overall trend is a gradual increase up to that peak, followed by a notable drop in 2021 and a further reduction in 2022. One possible explanation is that Ausgrid resolved a number of recurring faults by the end of 2020, although the dataset alone cannot confirm this. A bar chart of outage counts per year shows the trend.

Equipment faults are the most prominent cause of outages for Ausgrid in this dataset, although the nature of these faults is not specified. Environmental issues are the second-most common cause. The decline in outage frequency after 2020 is consistent with Ausgrid having taken steps to address these issues, but the data does not show this directly.

Line plot showing frequency of outages by “Year” and “Reason”

The aforementioned plot is a visual representation of the total outage occurrences aggregated by year and reason, as derived from the dataset using the following code:
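A minimal sketch of this aggregation follows, assuming the dataframe has `Year` and `Reason` columns as the plot title suggests. The function name is illustrative.

```python
import pandas as pd


def outages_by_year_and_reason(df, year_col="Year", reason_col="Reason"):
    """Return a table of outage counts: one row per year, one column per reason."""
    return pd.crosstab(df[year_col], df[reason_col])
```

Because each reason becomes its own column, calling `.plot()` on the result draws one line per reason across the years.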

Average Outage Duration by Days of the Week:

On average, it takes Ausgrid workers longer to fix outages on Fridays and Saturdays. Specifically, it takes approximately 206 minutes for fixes on Saturdays, followed by Fridays. This trend may be attributed to a reduced number of workers available over the weekend or, potentially, a deliberate choice by workers for better callout rates over the weekend. However, further analysis of the time taken to resolve outages of a similar nature across different days of the week is necessary to draw more definitive conclusions. By doing so, Ausgrid could gain more insight into the factors influencing outage resolution times and optimize its workforce accordingly.
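The average duration per weekday can be computed along these lines. The column names `Day_Label` and `Duration_Minutes` are assumptions, since the article does not name the duration column.

```python
import pandas as pd

DAY_ORDER = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def mean_duration_by_day(df, day_col="Day_Label", duration_col="Duration_Minutes"):
    """Mean outage duration per weekday, in calendar order (NaN if no outages)."""
    return df.groupby(day_col)[duration_col].mean().reindex(DAY_ORDER)
```

For the like-for-like comparison suggested above, the same grouping could be extended to `[reason_col, day_col]`, so that durations are compared within one outage reason at a time.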

Average outage duration by day of the week

According to the heatmap below, it appears that the majority of outages occur between the hours of 12 PM and 7 PM. It is possible that this pattern is simply due to the fact that most people are at home during these hours and are therefore more likely to report an outage. Conversely, outages are less frequent during the late night and early morning hours. This information is valuable to Ausgrid as it can inform worker planning. To accommodate the higher frequency of outages during peak hours, Ausgrid may consider rostering additional workers to cover the period between 12 PM and 9 PM. This is typically the busiest time for line workers who handle outages.
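The table behind such a heatmap can be sketched as follows, assuming `Day_Label` and `Hour` columns like those derived from the start time. Days and hours with no outages are filled with zero so the grid is always 7 by 24.

```python
import pandas as pd

DAY_ORDER = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def outages_by_day_and_hour(df, day_col="Day_Label", hour_col="Hour"):
    """Count outages per weekday (rows) and hour of day 0-23 (columns)."""
    table = pd.crosstab(df[day_col], df[hour_col])
    # Force a complete grid so quiet days and hours still appear in the heatmap.
    return table.reindex(index=DAY_ORDER, columns=range(24), fill_value=0)
```

The result can be handed to any heatmap function, for example `seaborn.heatmap(table)` if seaborn is available.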

Frequency of Outages by Day of Week and Hour

A scatter mapbox was utilized to visualize outages by Local Government Area. The plot incorporated geodata sourced from a dataset that contained the latitudes and longitudes of LGAs within Ausgrid’s jurisdiction. The resulting plot revealed a color-coded representation of outages across different LGAs, with areas displaying a darker shade of red indicating a higher frequency of outages. In contrast, areas with a lighter shade of red were relatively less affected.

Scatter Map showing outage counts by LGAs

The code to plot the scatter mapbox is below. LGA names in Ausgrid’s data were formatted to match the GeoJSON data before the two were merged.
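A hedged sketch of the name matching and plotting steps follows. The column names (`LGA`, `Outages`, `lat`, `lon`), the normalisation rule, and both function names are assumptions. The article does not state how the names were formatted. The plotting function uses Plotly Express, imported lazily so the merge step works without Plotly installed.

```python
import pandas as pd


def normalise_lga(name):
    """Upper-case a name and collapse whitespace (assumed matching rule)."""
    return " ".join(str(name).upper().split())


def attach_coordinates(counts, coords, lga_col="LGA"):
    """Left-join LGA coordinates onto outage counts using normalised names."""
    left = counts.assign(_key=counts[lga_col].map(normalise_lga))
    right = coords.assign(_key=coords[lga_col].map(normalise_lga)).drop(columns=[lga_col])
    return left.merge(right, on="_key", how="left").drop(columns="_key")


def plot_outage_map(merged):
    """Draw the scatter map; darker red marks LGAs with more outages."""
    import plotly.express as px  # optional dependency, only needed for plotting

    return px.scatter_mapbox(
        merged,
        lat="lat",
        lon="lon",
        color="Outages",
        size="Outages",
        hover_name="LGA",
        color_continuous_scale="Reds",
        zoom=7,
        mapbox_style="open-street-map",
    )
```

A left join keeps every LGA from the outage counts, so any name that failed to match shows up with missing coordinates and can be corrected by hand. Recent Plotly releases also offer `px.scatter_map` as the successor to `px.scatter_mapbox`.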

As a final step, I estimated the probability of an outage by Local Government Area (LGA) and reason. The outages were grouped by LGA and reason, and the number of outages in each group was divided by the total number of outages in the dataset. The result is the empirical share of all outages that fall in a given LGA for a given reason, so it reflects both how outage-prone an LGA is and what causes its outages. These shares were displayed as a tree map showing the LGA, the reason, and the calculated probability.
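The calculation described above can be sketched as below, again assuming `LGA` and `Reason` column names. Each probability is the group count divided by the total number of rows, so the values sum to one across the whole table.

```python
import pandas as pd


def outage_probabilities(df, lga_col="LGA", reason_col="Reason"):
    """Share of all outages that fall in each (LGA, reason) group."""
    counts = (
        df.groupby([lga_col, reason_col])
        .size()
        .rename("Outages")
        .reset_index()
    )
    counts["Probability"] = counts["Outages"] / len(df)
    return counts
```

The resulting long-format table has one row per LGA and reason, which is the shape that treemap functions such as Plotly's `px.treemap` expect.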

Tree map showing LGA and probability of an outage based on a specific reason

The treemap visualization shows the probabilistic ratio of outage occurrences by reason. From the chart, the probability of an outage being caused by “environmental” factors is higher in Gosford than in Bankstown, where “equipment faults” were more likely to be experienced.

That's it for now. I look forward to writing more in terms of the optimization process that can be deployed for rostering line workers for Ausgrid based on this outage data.

In summary:

  1. The analysis shows that equipment faults have consistently been a major reason for power outages in the Ausgrid network.
  2. The year 2020 recorded the highest number of outages during the period covered by the dataset.
  3. Based on the data, Gosford, Hornsby, and Wyong were the locations most affected by power outages. In Gosford, outages were mostly attributed to environmental factors.
  4. Power outages were found to be more prevalent in the afternoons, with a peak at around 6 PM across all days of the week. As such, Ausgrid’s rostering for the afternoon shift should consider having more workers on standby. Mornings and late evenings are usually quieter periods in terms of power outages.
  5. The analysis indicates that outages take longest to resolve on Saturdays, at approximately 206 minutes on average, followed by Fridays. Therefore, there is a need for better workforce planning on these days.
