Integrated Data Environments – An Introduction

Aug 6, 2024

Admittedly, the Department of Defense (DoD), Government and military organizations alike, uses a lot of acronyms. Acronyms are everywhere. To an extent, I think the volume of acronyms used in DoD and the Coast Guard is hilarious. This is why every time I perform an analysis, I force the title of the work into an acronym. My personal gold standard is when the acronym actually catches on and people refer to my analysis by the acronym. (The platinum standard will be people referring to my analysis by the acronym… and other people knowing immediately what they are talking about. But we are driving cultural change in the Coast Guard. It is slow work.)

The first problem with acronyms being so common is that “IDE” was already taken. When I see IDE, I think Integrated Development Environment. As a data scientist, I am used to the Spyder and PyCharm IDEs for Python, and the RStudio IDE for R. So, at least in my head, IDE was already taken by Integrated Development Environment.

When you Google “Integrated Data Environment,” you are presented with results pertaining to Integrated Development Environments, but also with a Privacy Impact Assessment (PIA) from the Defense Logistics Agency (DLA). A PIA is a document created for Government systems and posted publicly so citizens know (at least at an ambiguous, overarching level) what their Government is doing and what information it is tracking. In its PIA, DLA defines an Integrated Data Environment (hereafter referred to as an IDE) as:

Integrates data and computer software systems for DLA and USTRANSCOM. IDE creates a common information technology environment for the management of supply chain distribution and logistics information for Combatant Commands and Military Services. IDE also manages a PKI-enabled secure website that allows Combatant Commands and Military Services to review potential data services with descriptions of information available to them. ¹

DLA’s IDE sounds nice: integrated data and software systems, and a common IT environment for what I assume is the agency’s core work (i.e., managing supply chains and logistics). I can see why the Coast Guard would want to emulate DLA’s IDE. We are starting to get at what the Coast Guard may mean when it references an IDE (again, Integrated Data Environment).

The Data Readiness Task Force’s (DRTF) Charter document references an IDE four times within specific milestones set for the DRTF:

Approve C5I requirements for the tools, systems, network, and other technical needs determined to be part of the Integrated Data Environment.
Finalize the plan for Coast Guard data ingestion into the Integrated Data Environment.
Compete the contract for the Coast Guard Integrated Data Environment.
Begin the Coast Guard data transition to the Integrated Data Environment.

Additionally, under the Technical Way Ahead Line of Effort, the DRTF Charter states:

“Identify requirements for a Coast Guard IDE. Invoke architecture and data standards, as well as best practice and principles into the System Engineering Lifecycle to enable interoperability and integration interfaces to support data exchange. Conduct market research into existing technological solutions, including tools and infrastructure to assist with data analysis and clean up.”

From these passages, it is apparent the Coast Guard is interested in an IT procurement. It is also worth noting that shortly before the DRTF Charter was signed, a whitepaper titled “Big Data Platform: Moving Coast Guard Data Analytics into the 21st Century” began circulating throughout the Coast Guard. This whitepaper highlighted that the Coast Guard collects data in many structured or semi-structured databases, but the “information is stored in a manner that is incompatible with the types of analysis required by the 21st century.” Similarly, it noted, “Many of our systems are disconnected from each other, creating silos of information that are difficult to access or use to conduct robust analysis.” It further argued that these inherent issues prevent timely analysis of even seemingly simple inquiries.

To me, it is beginning to look like the Coast Guard wants a data lake. According to Google Cloud,

“A data lake is a centralized repository designed to store, process, and secure large amounts of structured, semi-structured, and unstructured data.” ²

To understand why the Coast Guard is interested in a data lake, it is probably most productive to introduce the Coast Guard’s current data stance.

These views are mine and should not be construed as the views of the U.S. Coast Guard.