Initial Data Scientist Hopes for a Coast Guard Integrated Data Environment
When I reported to the Data Readiness Task Force (DRTF) and first heard discussions about procuring a Big Data Platform (BDP) for the Coast Guard, it was, understandably, not apparent to me that BDP was a specific data analytics solution the Coast Guard had already selected. In fact, I thought BDP and Integrated Data Environment (IDE) were being used interchangeably. (Many people were using them interchangeably, and I was certainly not the only person who did not realize BDP referred to a specific data analytics solution.)
The name the Defense Information Systems Agency (DISA) chose for its data analytics platform is truly frustrating. Another officer jokingly stated, “DISA named their dog, ‘Dog’!” and I wish I had thought of the analogy. Because DISA did name their dog, “Dog.” Nevertheless, it is embarrassing how long I went on assuming there would be an opportunity for me to impose my thoughts as a data scientist customer on the Coast Guard’s new IDE. Ultimately, I found the IDE solution was already selected. However, what were those thoughts of mine for an IDE?
My Personal Data Scientist Desires
Speaking candidly as a Coast Guard data scientist, the limited availability and capability of Python and R on our CGONE NIPRnet is debilitating. These tools alone could free many stagnant data scientists. When I was presented with the idea of an IDE where the Coast Guard would begin to centralize its data storage, I began to get excited about the possibilities.
There are any number of online Python and R courses available (ranging from free to well-established and expensive). Many of these courses provide Python and/or R environments directly in a web browser. This is where my initial thoughts about an IDE for the Coast Guard went.
What if data scientists were not only given access to the totality of Coast Guard data within a central data lake, but were also given Python and R kernels, side by side with the data, in the same environment? Loading the data into an analytic sandbox for technical users could be as easy as importing a new library or package in Python, R, JavaScript, etc.
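To make the "as easy as an import" idea concrete, here is a minimal sketch of what loading data inside such a sandbox might feel like. Everything in it is an assumption for illustration: the in-memory SQLite database stands in for a central data lake, and the `build_demo_lake` and `load_dataset` helpers, the `records` table and its rows are all hypothetical and made up. It does not describe any actual Coast Guard system.

```python
import sqlite3


def build_demo_lake() -> sqlite3.Connection:
    """Create an in-memory stand-in for a central data lake (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and made-up rows; not real Coast Guard data.
    conn.execute("CREATE TABLE records (record_id INTEGER, unit TEXT, hours REAL)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [(1, "A", 2.5), (2, "B", 4.0), (3, "B", 1.5)],
    )
    conn.commit()
    return conn


def load_dataset(conn: sqlite3.Connection, name: str) -> list[dict]:
    """Return every row of a named table as a list of dictionaries."""
    # Only accept names that exist in the catalog, because a table name
    # cannot be passed as a bound SQL parameter.
    tables = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if name not in tables:
        raise KeyError(f"unknown dataset: {name}")
    cursor = conn.execute(f'SELECT * FROM "{name}"')
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


lake = build_demo_lake()
records = load_dataset(lake, "records")
print(len(records), "rows loaded; first row:", records[0])
```

The point of the sketch is the last three lines: for the analyst, getting at the data is one function call made next to the kernel, with no software installed on the local machine.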
Speaking to the idea for a Coast Guard IDE, I initially wrote the following lines.
The Coast Guard IDE implementation shall include the following capabilities:
a text editing code workbook with recognition of syntax and keywords for major open-source data and analytic programming languages to include Python, R and SQL,
community standard open-source analytic packages for both Python and R with open-source analytic package updates desired at a frequency attempting to keep pace with respective community standards,
point-and-click graphical analysis tools in a user-friendly format for performing filters and generating visualizations of professional appearance,
upload capabilities for offline individual user data to be brought into private analytic sandboxes for manipulation within Python, R and SQL; but also to enable point-and-click graphical analysis and visualization generation,
export capabilities for user queried data to comma-separated-values and text file formats,
export capabilities for user generated code/text files, and
export capabilities for visualizations and analysis.
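The export item for queried data in the list above could look something like the following. This is only a sketch under stated assumptions: the `export_query_to_csv` helper, the `records` table and its values are hypothetical, and an in-memory SQLite database again stands in for whatever store a real IDE would expose.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn: sqlite3.Connection, sql: str, out) -> int:
    """Run a query and write the header plus rows as CSV; return the row count."""
    cursor = conn.execute(sql)
    writer = csv.writer(out)
    # Column names come from the cursor, so the header follows the query.
    writer.writerow([col[0] for col in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# Hypothetical sandbox table with made-up values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (unit TEXT, hours REAL)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [("A", 2.5), ("B", 4.0), ("B", 1.5)],
)

buffer = io.StringIO()
exported = export_query_to_csv(
    conn,
    "SELECT unit, SUM(hours) AS total_hours FROM records GROUP BY unit ORDER BY unit",
    buffer,
)
print(buffer.getvalue())
```

To write to disk instead of a string buffer, pass a file opened with `open(path, "w", newline="")`, which is how the `csv` module documentation recommends opening files for writing.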
Honestly, it appears these thoughts were a pipe dream. In fact, I would be confident labeling them the data scientist pipe dream for Coast Guard data scientists.
In a previous post, I highlighted that a day-to-day user of CGONE NIPRnet will face an inordinate number of Information Technology (IT) problems, struggles and frustrations. The fact that my desired capabilities amount to a pipe dream compounds the day-to-day IT issues data scientists already face.
Essentially, data scientists came up with a creative solution. Yes, Python and R may represent security risks to the administrators of cybersecurity targets like NIPRnet. But if these capabilities are embedded in a website visited through a web browser, then the risk is absorbed, because Python and/or R are never installed on the C:\ drive of the machine. The idea of analytic sandboxes that make Python, R and SQL available beside and against Coast Guard data is feasible with the Coast Guard’s current technology. It is even available on CGONE NIPRnet! And yet, because of decision-making, it is as far away as it ever was. Considering Python, R and SQL are free and open-source languages, the continued capability gap is increasingly frustrating, and it is a major shortcoming of the Coast Guard’s IDE.
These views are mine and should not be construed as the views of the U.S. Coast Guard.