What is a Big Data Platform?

What do you think of when you hear “Big Data Platform” (BDP)?  When I first reported to the Data Readiness Task Force (DRTF), I assumed that BDP was interchangeable with Integrated Data Environment (IDE).  “Big Data” plus “Platform”: it obviously must be some type of cloud-based data analytics system.  Cloud-based, because average (and, for the Coast Guard, below-average) computers could not scale to a level considered “Big Data”.  A data analytics system, because that is what the combination of “Big Data” and “Platform” suggests.  This logic, and the overarching assumption behind it, is misleading.

Big Data Platform – Defined

The BDP is a government-owned, cloud-based analytics platform originally developed by Enlighten IT Consulting, LLC, for the Defense Information Systems Agency (DISA).  It can be deployed by a government agency or contractor in any environment.  The code is hosted in a collaboration environment at devforce.disa.mil.  Over many years and with many partners, the BDP network has built up a large data repository.  Because the BDP is backed by the DoD, the Intelligence Community, and other Federal services, it enables data sharing across different information networks.

The BDP boasts the power of distributed query.  That is to say, the BDP is a NoSQL database in which a query issued from one BDP can run in parallel across the network of BDPs, returning data from previously disconnected sources, departments, and agencies.
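To make the idea of a distributed query concrete, here is a minimal sketch in Python.  It is not the BDP's actual interface, which this post does not describe.  The node names, the records, and the functions `query_node` and `distributed_query` are all hypothetical stand-ins, with small in-memory lists playing the role of separate BDP instances.  The sketch only shows the general pattern: send the same query to every node in parallel, then merge the answers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for separate BDP instances.
# Real nodes would be remote systems; these records are illustrative only.
NODES = {
    "agency_a": [
        {"id": 1, "type": "vessel", "name": "Alpha"},
        {"id": 2, "type": "port", "name": "Bravo"},
    ],
    "agency_b": [{"id": 3, "type": "vessel", "name": "Charlie"}],
    "agency_c": [{"id": 4, "type": "aircraft", "name": "Delta"}],
}


def query_node(node_name, predicate):
    """Run the query against one node and label each hit with its source."""
    return [
        {**record, "source": node_name}
        for record in NODES[node_name]
        if predicate(record)
    ]


def distributed_query(predicate):
    """Send the same query to every node in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        partial_results = list(
            pool.map(lambda name: query_node(name, predicate), NODES)
        )
    merged = [row for rows in partial_results for row in rows]
    # Sort so the merged output does not depend on which node answered first.
    return sorted(merged, key=lambda row: row["id"])


vessels = distributed_query(lambda record: record["type"] == "vessel")
for row in vessels:
    print(row["source"], row["name"])
```

The analyst writes one query and gets back rows from every participating node, each labeled with where it came from.  In a real deployment the nodes would be separate networked systems with their own access controls, which this sketch leaves out entirely.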

Big Data Platform as the Coast Guard’s Integrated Data Environment

Clearly, the Coast Guard has already identified a Government-Off-The-Shelf (GOTS) solution for its Integrated Data Environment (IDE).  The same whitepaper referenced in a previous post outlines a very specific strategy for Coast Guard use of a DISA BDP capability.[1]

  1. The BDP can accept structured and unstructured data and use tags to organize the information and make it easily searchable for analysis.
  2. Similarly, data from multiple sources can be analyzed together (effectively a Coast Guard data lake).  These sources include large volumes of mission narratives and operational summary databases previously unavailable for analysis (meaningful text analysis at the scale of Coast Guard mission narrative and operational summary data remains out of reach for current Coast Guard analytic capabilities).
  3. Further, the Coast Guard’s data can be expanded to include data sets from within the DISA BDP network, and enrichment data from other open-source data sets.
  4. Finally, tagged information in the data lake can inform future investments in machine learning, which would represent the cutting edge of information-based decision-making and information-driven operations for the Coast Guard.
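The tagging idea in the first item can be illustrated with a small Python sketch.  This is an assumption-laden toy, not how the BDP stores or tags data: the `TaggedStore` class, its methods, and the sample records are all hypothetical.  It shows only why tags make mixed data searchable: structured rows and free-text narratives can sit side by side, and a tag index finds both.

```python
from collections import defaultdict


class TaggedStore:
    """Toy store that holds any kind of record and indexes it by tag."""

    def __init__(self):
        self.records = []              # payloads of any shape
        self.index = defaultdict(set)  # tag -> set of record ids

    def add(self, payload, tags):
        """Store a payload and register it under each (lowercased) tag."""
        record_id = len(self.records)
        self.records.append(payload)
        for tag in tags:
            self.index[tag.lower()].add(record_id)
        return record_id

    def search(self, *tags):
        """Return the records that carry every one of the given tags."""
        if not tags:
            return []
        id_sets = [self.index.get(tag.lower(), set()) for tag in tags]
        matching = set.intersection(*id_sets)
        return [self.records[i] for i in sorted(matching)]


store = TaggedStore()
# Illustrative records only: one structured row and two free-text narratives.
store.add({"case_id": 101, "hours": 4.5}, ["sar", "structured"])
store.add("Crew located the disabled vessel and towed it to port.", ["sar", "narrative"])
store.add("Routine buoy maintenance completed.", ["aton", "narrative"])

print(store.search("sar"))               # both search-and-rescue records
print(store.search("sar", "narrative"))  # only the free-text narrative
```

Because the index refers to records by tag rather than by schema, the same search works whether the underlying record is a database row or a paragraph of text.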

A Pivot from the Full Extent of Possibilities Implied by a Coast Guard Data Lake

As identified in a previous post, a Coast Guard data lake would begin to address some of the analytic community’s largest problems.[1]  Specifically, the data calls and data hunts that precede any analysis would be streamlined if Coast Guard data were stored and shared in a central location.  However, by limiting a Coast Guard data lake to the capabilities of a GOTS DISA BDP, the idea of killing two birds with one stone with respect to analysts’ desires is no longer on the table.[2]  Because the Coast Guard has already selected a solution, it is impossible to impose requirements for Python and R kernels on that existing GOTS product.

[1] https://joebreaker.com/2022/05/11/integrated-data-environments-an-introduction/

[1] https://joebreaker.com/2022/06/06/desires-of-a-coast-guard-analyst/

[2] https://joebreaker.com/2022/06/06/desires-of-a-coast-guard-analyst/
