Thus far we have established that the Coast Guard is seeking a data lake. We also established that the Coast Guard insists on the persistent use of legacy software systems, which results in disparate, semi-structured data silos. The problem is compounded because those silos sit inside legacy software systems that are themselves disparate. When modern technology solutions are sought to connect or reconcile the Coast Guard’s data silos, the legacy software systems cannot support the connection and transfer of data.
Without a data lake, what does the Coast Guard do with video footage from a boarding? What does it do with audio recordings of distress calls? Affordable modern technologies (microphones, video cameras, etc.) make collecting these types of data easy. Thanks to this ease and to diligent military processes, the Coast Guard collects data effectively from various “sensors” throughout its worldwide operational network. Nevertheless, benefiting from this data to its fullest extent remains significantly more difficult.
In fact, the Coast Guard largely remains stuck between paper logs and the full benefit of electronic logs. As I have alluded to in previous posts, mariners and seagoing services have strong roots in data collection (e.g. diligent military processes). There is a good chance the Coast Guard’s logbooks date back farther than anyone would anticipate, which speaks to how deep those roots run. However, as analog (paper) logs moved to digital (electronic) ones, the Coast Guard did not want to compromise on the logbook formats it was used to seeing. So digital log adoption settled on Adobe PDF files.
Obviously, there are ways to use Adobe PDF files to feed structured databases. However, if you are remotely familiar with mariner logbooks, you would understand they lend themselves extremely easily to tab-separated, comma-separated, or other spreadsheet formats. These logs track, at a specific interval, things like time, weather (air temperature, water temperature, wind speed, etc.), location, heading, and course.
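To make that concrete, here is a minimal sketch of what a logbook could look like as comma-separated text, and how little code it takes to read it. Everything in it is my own assumption for illustration: the column names, the `LogEntry` type, the `parse_log` helper, and the two sample rows are invented and do not reflect any actual Coast Guard log format.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Invented sample rows with hypothetical column names (illustration only).
SAMPLE_LOG = """time_utc,air_temp_c,water_temp_c,wind_speed_kt,latitude,longitude,heading_deg,course_deg
2020-01-15T08:00:00,12.5,9.0,15,36.85,-75.98,90,92
2020-01-15T09:00:00,13.0,9.1,18,36.86,-75.80,95,96
"""


@dataclass
class LogEntry:
    """One logbook row, recorded at a fixed interval."""
    time_utc: datetime
    air_temp_c: float
    water_temp_c: float
    wind_speed_kt: float
    latitude: float
    longitude: float
    heading_deg: float
    course_deg: float


def parse_log(text: str) -> List[LogEntry]:
    """Parse comma-separated logbook text into typed entries."""
    reader = csv.DictReader(io.StringIO(text))
    entries = []
    for row in reader:
        # The timestamp is parsed as ISO 8601; every other column is numeric.
        timestamp = datetime.fromisoformat(row.pop("time_utc"))
        numeric = {name: float(value) for name, value in row.items()}
        entries.append(LogEntry(time_utc=timestamp, **numeric))
    return entries


entries = parse_log(SAMPLE_LOG)
strongest_wind = max(entry.wind_speed_kt for entry in entries)
```

Once the log is in this shape, a question like "what was the strongest wind recorded on this patrol?" becomes a one-line query instead of a manual read through a PDF.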
At this point, it should be apparent that the Coast Guard is struggling to leverage its data to its fullest extent, which is why the idea of a data lake is so appealing: bring all the data silos together and enable users to interact with the entirety of Coast Guard data. That alone would be a previously unaccomplished feat. So, what would data scientists in the Coast Guard want from a Coast Guard data lake?
These views are mine and should not be construed as the views of the U.S. Coast Guard.