In a previous post I highlighted that, as a data scientist in the Coast Guard, every analysis begins the same way: there is never an assumption that the Coast Guard has data ready to go. Common preliminary questions include:
“Where is the data?”
“You don’t have data. Where can we get data?”
Simply put, a data lake could begin to address some of the Coast Guard Operations Research and Data Analytics (ORDA) community’s largest problems. But is there an opportunity to address more of what Coast Guard data scientists want, potentially killing two (or even more) birds with one stone?
Analytic Tool Availability on CGONE NIPRnet
A quick internet search for industry-standard data analytic tools will show both the Python and R programming languages ranking high on nearly every list. Data scientists know this is largely because analytic development and open-source development are concentrated in Python and R. These tools are therefore indispensable to data scientists who want to keep up with industry trends and leverage the wisdom of crowds through crowdsourced capability development.
However, it is this very flexibility that poses a risk to NIPRnet network administrators. Or at least, that is my understanding from the Coast Guard NIPRnet (CGONE) network administrators.
In a previous post I highlighted that any Coast Guard user “will have countless stories of Information Technology (IT) related issues” [1] pertaining to the CGONE NIPRnet.
For the Millennial and Gen Z generations, these occurrences are particularly frustrating. An affordable personal computer is bulletproof compared to what users experience on CGONE NIPRnet. So, imagine being a data scientist who is sent to school by your organization and trained on industry-standard tools, only to be denied those same tools upon your return. To add insult to injury, the tools are not denied because of high cost. They are free and open source. On your personal computer, where you have administrative privileges, you can safely download and use the tools you learned on, all free of charge.
So, the same thing that makes the Python and R programming languages so indispensable to a data scientist (the free capabilities developed for them by the open-source community) is the very poison pill that makes them so difficult for network administrators on NIPRnet. The situational irony is undeniable!
Python and R through the Coast Guard’s Integrated Data Environment
There is a plethora of online Python and R courses available, ranging from free to well-established and expensive. Many of these courses direct users to download and install the Python and R kernels (again, free and open source) and an Integrated Development Environment (also free) for each language. However, many courses instead provide Python or R environments directly in a web browser. In these scenarios the capabilities of the environments are often limited. Nevertheless, the proof of concept is strong. Any Python or R user familiar with these web-embedded environments would logically conclude that the full capabilities of Python and R could be made available the same way, given adequate server compute and storage behind the website.
These were my initial thoughts pertaining to an Integrated Data Environment for the Coast Guard. What if data scientists were given not only the totality of Coast Guard data within a central data lake, but also the Python and R kernels, side by side with the data, in the same environment?
These views are mine and should not be construed as the views of the U.S. Coast Guard.