OPINION: How Government Can Avoid Another FAA System Outage

THE CHALLENGE

On January 11th, 2023, the Federal Aviation Administration (FAA) made international news when it grounded all U.S. domestic flights due to an outage of the system that provides pilots with pre-flight safety notices.

According to the FAA, the outage originated in the Notice to Air Missions (NOTAM) database, where a data file was damaged by personnel who failed to follow procedures.

The FAA then issued a statement saying that contract personnel unintentionally deleted files while working to correct synchronization between the live primary database and a backup database.

The NOTAM system is vital: it compiles essential preflight information for pilots, airline dispatchers and others. The information shared includes details about potential bad weather on the route, taxiway changes at airports and closed airspace that must be avoided.

As a CNN article on the outage highlighted, the technology behind NOTAM was developed in the 1990s, and Transportation Secretary Pete Buttigieg is making the update of this system a more urgent priority.

Before the system went down, the FAA had requested $29.4 million in its 2023 budget for its Aeronautical Information Management Program, which includes the NOTAM system.

MITIGATION

The NOTAM outage reinforces the need for overall IT modernization, effective data management, and the development of processes to minimize human error. Until these legacy systems are modernized, scripting can mitigate this type of error, especially in a production environment, by automating repetitive tasks and reducing the opportunity for manual mistakes.
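To make the scripting point concrete, here is a minimal sketch in Python of a synchronization script that copies new or changed files from a primary directory to a backup and never deletes anything. The function name, the directory layout and the dry-run default are assumptions made for illustration; they do not describe how the FAA's systems actually work.

```python
import filecmp
import shutil
from pathlib import Path


def sync_to_backup(primary: Path, backup: Path, dry_run: bool = True) -> list:
    """Copy new or changed files from primary to backup; never delete anything.

    Returns the relative paths that were (or, in a dry run, would be) copied.
    """
    copied = []
    for src in sorted(primary.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(primary)
        dst = backup / rel
        # Skip files whose backup copy already has identical contents.
        if dst.exists() and filecmp.cmp(src, dst, shallow=False):
            continue
        copied.append(rel.as_posix())
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
    return copied
```

Because the default is a dry run, an operator sees the list of planned changes first and has to pass dry_run=False explicitly to modify the backup.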

In addition, these kinds of issues can be mitigated by ensuring that backup and archived transactions are updated in near real-time, for example every 15 minutes, a practice that is very common in banking and mortgage systems.
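A monitoring check for that interval could look like the following sketch. The 15-minute threshold comes from the paragraph above; the function name and the idea of passing in the timestamp of the last successful backup are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Maximum acceptable age of the most recent backup (interval from the text).
MAX_BACKUP_AGE = timedelta(minutes=15)


def backup_is_stale(last_backup: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when the last successful backup is older than the limit."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_backup > MAX_BACKUP_AGE
```

A scheduler could call this every minute and raise an alert as soon as it returns True, so a stalled backup is noticed before it is needed.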

In the case of an emergency with the primary system, it is possible to turn the backup system into the primary system. This allows transaction processing to continue without interruption.
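The promotion step can be sketched as a simple role swap. The Database and Cluster classes below are hypothetical stand-ins; a real deployment would rely on the failover features of its database product rather than hand-written code like this.

```python
from dataclasses import dataclass


@dataclass
class Database:
    name: str
    healthy: bool = True


class Cluster:
    """Tracks which database is currently primary and which is the backup."""

    def __init__(self, primary: Database, backup: Database):
        self.primary = primary
        self.backup = backup

    def failover_if_needed(self) -> bool:
        """Promote the backup when the primary is unhealthy.

        Returns True if the roles were swapped, False if nothing changed.
        """
        if self.primary.healthy:
            return False
        if not self.backup.healthy:
            raise RuntimeError("no healthy database available to promote")
        # Swap roles: the old backup now serves traffic as the primary.
        self.primary, self.backup = self.backup, self.primary
        return True
```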

To perform these tasks against a production environment, code and procedure verification is also vital, especially with rapidly changing data. Files should be locked down so that they cannot be deleted after certain points, allowing valid records to be maintained.
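One way to picture the lock-down rule is an application-level guard that refuses to change or delete a record once it has been locked. The RecordStore class below is a hypothetical in-memory sketch of that idea, not a description of any real NOTAM component.

```python
class RecordStore:
    """In-memory store whose records become immutable once locked."""

    def __init__(self):
        self._records = {}
        self._locked = set()

    def put(self, key: str, value: str) -> None:
        if key in self._locked:
            raise PermissionError(f"record {key!r} is locked and cannot be changed")
        self._records[key] = value

    def lock(self, key: str) -> None:
        """Mark an existing record as final."""
        if key not in self._records:
            raise KeyError(key)
        self._locked.add(key)

    def delete(self, key: str) -> None:
        if key in self._locked:
            raise PermissionError(f"record {key!r} is locked and cannot be deleted")
        del self._records[key]

    def get(self, key: str) -> str:
        return self._records[key]
```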

Error messages should also describe not only the issue, but the consequences of performing certain actions against a production system. There should also be a final check that requires the operator to acknowledge the overall system status before proceeding.
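As an illustration, a confirmation step might spell out the consequence and require the operator to type the name of the target system before anything runs. The function name, the wording of the message and the example target are assumptions chosen for this sketch.

```python
def confirm_destructive_action(action: str, target: str, consequence: str,
                               ask=input) -> bool:
    """Show what is about to happen and require an explicit acknowledgement.

    Returns True only when the operator types the exact target name.
    """
    message = (
        f"WARNING: you are about to {action} on {target}.\n"
        f"Consequence: {consequence}\n"
        f"Type the target name to proceed: "
    )
    return ask(message).strip() == target
```

Typing a bare "y" is not enough here: the operator has to name the system, which makes it harder to confirm an action against production by reflex.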

Maintaining complex IT systems is always a challenge. However, with the right tools and processes in place, it is possible to avoid significant system outages like the one that recently happened at the FAA.

Makpar is an experienced solutions-oriented federal government contractor focused on IT infrastructure modernization. Since 2009, the company has offered mission-critical, cutting-edge technology for federal agencies seeking enterprise-wide solutions.
