What is a Big Data Platform?

What do you think of when you hear “Big Data Platform”? When I first reported to the Data Readiness Task Force (DRTF), I assumed a Big Data Platform was interchangeable with an Integrated Data Environment (IDE). “Big Data” and “Platform”: it obviously must be some type of cloud-based data analytics system, right? Cloud-based, because otherwise average (and, for the Coast Guard, below-average) computers could not scale to a level considered “Big Data”. A data analytics system, because that is what the combination of “Big Data” and “Platform” suggests. That logic and its overarching assumptions are misleading.

Big Data Platform - Defined

The Big Data Platform (BDP) is a government-owned, cloud-based analytics platform originally developed by Enlighten IT Consulting, LLC for the Defense Information Systems Agency (DISA). It can be deployed by a government agency or contractor in any environment. The code is hosted in a collaboration environment at devforce.disa.mil. Over many years and across many partners, the BDP network has cultivated a large data repository. Because the BDP is backed by the Department of Defense, the Intelligence Community, and other Federal services, it enables sharing across different information networks. The BDP also boasts the power of distributed query: it is a NoSQL database in which a query issued from one BDP can run in parallel across the network of BDPs, returning data from previously disconnected sources, departments, and agencies.
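To make the idea of a distributed query concrete, here is a minimal sketch in Python. It is not the BDP's actual interface, which this post does not describe. The node names, the records, and the functions `query_node` and `distributed_query` are all hypothetical, and in-memory lists stand in for separate BDP instances. It only illustrates the pattern: the same query is fanned out to every node in parallel, and the partial results are merged.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for separate BDP instances (assumption: a real
# deployment would reach each node over the network, not a local dict).
NODES = {
    "bdp_a": [{"id": 1, "port": "Boston"}, {"id": 2, "port": "Miami"}],
    "bdp_b": [{"id": 3, "port": "Boston"}],
    "bdp_c": [{"id": 4, "port": "Seattle"}],
}


def query_node(name, predicate):
    """Run the query against one node and label each hit with its origin."""
    return [{**record, "node": name} for record in NODES[name] if predicate(record)]


def distributed_query(predicate):
    """Send the same query to every node in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        futures = [pool.submit(query_node, name, predicate) for name in NODES]
        merged = []
        for future in futures:  # collect in submission order for stable output
            merged.extend(future.result())
    return merged


if __name__ == "__main__":
    for hit in distributed_query(lambda r: r["port"] == "Boston"):
        print(hit)
```

The caller writes one query and never needs to know which node holds which record, which is the property that makes previously disconnected sources searchable together.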

Big Data Platform as the Coast Guard’s Integrated Data Environment

Clearly, the Coast Guard had already identified a Government-Off-The-Shelf (GOTS) solution for its IDE. The same whitepaper referenced in a previous post outlines a very specific strategy for Coast Guard use of a DISA BDP capability. 1

  1. The BDP can accept structured and unstructured data and use tags to organize the information and make it easily searchable for analysis.
  2. Similarly, data from multiple sources can be analyzed together (effectively a Coast Guard data lake). These sources include large volumes of mission narratives and operational summary databases that were previously unavailable for analysis, because meaningful text analysis at that scale remains out of reach for current Coast Guard analytic capabilities.
  3. Further, the Coast Guard’s data can be expanded to include data sets from within the DISA BDP network, and enrichment data from other open-source data sets.
  4. Finally, tagged information in the data lake can inform future investments in machine learning, which would represent the cutting edge of information-based decision-making and information-driven operations for the Coast Guard.
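The first capability in the list, tagging structured and unstructured data so that it becomes searchable, can be sketched in a few lines of Python. This is an illustration of the general technique only. The `TagIndex` class, its methods, and the sample records are hypothetical and are not drawn from the BDP or the whitepaper.

```python
from collections import defaultdict


class TagIndex:
    """Toy tag index: any payload (dict, free text, ...) is found via its tags."""

    def __init__(self):
        self._records = {}                # record id -> payload of any shape
        self._by_tag = defaultdict(set)   # normalized tag -> set of record ids

    def add(self, record_id, payload, tags):
        """Store a payload and register it under each (case-insensitive) tag."""
        self._records[record_id] = payload
        for tag in tags:
            self._by_tag[tag.lower()].add(record_id)

    def search(self, *tags):
        """Return payloads of records that carry every one of the given tags."""
        if not tags:
            return []
        id_sets = [self._by_tag.get(tag.lower(), set()) for tag in tags]
        matching_ids = set.intersection(*id_sets)
        return [self._records[rid] for rid in sorted(matching_ids)]


if __name__ == "__main__":
    index = TagIndex()
    # Structured record (hypothetical fields).
    index.add("r1", {"case": 17, "hours": 4.5}, ["SAR", "District 1"])
    # Unstructured mission narrative (hypothetical text).
    index.add("r2", "Crew located the vessel adrift and towed it to port.",
              ["sar", "narrative"])
    print(index.search("sar"))
    print(index.search("sar", "narrative"))
```

Because the index keys on tags rather than on a schema, a spreadsheet row and a free-text narrative can sit side by side and be retrieved by the same query.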

A Pivot from the Full Extent of Possibilities Implied by a Coast Guard Data Lake

As identified in a previous post, a Coast Guard data lake would begin to address some of the Operations Research and Data Analytics community’s largest problems. 2

Specifically, the data calls and data hunts that come at the start of any analysis would be streamlined if Coast Guard data were stored and shared in a central location.

However, by limiting a Coast Guard data lake to the capabilities of a GOTS DISA BDP, the idea of killing two birds with one stone with respect to data scientists’ desires is no longer on the table. 3

Because the Coast Guard explicitly selected a solution to implement, imposing requirements for Python and R kernels on the existing GOTS solution was impossible. A Request For Proposals to implement a BDP would mean implementing the GOTS solution as is, and the GOTS BDP did not already have Python and R kernel integrations. If those integrations were truly desired, the onus would fall on the Coast Guard to commit resources to integrating the Python and/or R kernels into the BDP, and then, hopefully, to push those changes across the BDP network for the betterment of the GOTS solution.

While this closing point is largely a matter of semantics, the Coast Guard’s insistence on buying an established solution makes its initial work in establishing an enterprise IDE a procurement rather than an acquisition.


These views are mine and should not be construed as the views of the U.S. Coast Guard.