In today’s business world, organizations are amassing and storing more data than ever before. We truly are in the age of Big Data, which often presents as many challenges for growing companies as it does benefits.
Most of the data being gathered by your organization is going to be used to improve something about the way you do business. Whether it’s information about how your users are utilizing your product, results gathered from your marketing efforts, or internal statistics about your development processes, your company’s constantly growing data is a major asset that, with the correct analysis, can increase your bottom line.
Deloitte’s annual Technology Trends report analyzes the trends that could disrupt businesses in the next 18-24 months. The 2017 report recognizes dark analytics as one of several disruptive technologies.
Reports state that 70 percent or more of enterprise data is usually inaccessible for analysis. This data is either locked away or remains hidden in the form of email message files, word processing documents, spreadsheets, PDF files, drawings, photographs, handwritten notes, scanned documents, and flags. Enterprises use the term “Dark Data” for such data types.
Insights generated from these types of data can be combined with insights from the structured data that is already available, and used to address various industry pain points. Computer vision and pattern recognition have made it possible for enterprises to unlock insights from unstructured data that was considered lost until now. A separate field, called “Dark Analytics”, is dedicated to this.
Understanding Dark Analytics and Dark Data
“Dark data” is typically thought of as data that is collected, but that remains unused for anything more than its intended purpose. Gartner originally coined the term and defined it as, “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing)”.
The bulk of that data is unstructured data. Unstructured, or raw, data includes raw text, text messages, emails (such as internal organizational emails), videos (such as surveillance footage), audio (such as call center recordings), image files, data from Internet of Things (IoT) sensors, and geographic (geolocation) data. Dark data can include structured data too, assuming the data is not being analyzed.
The definition of dark data doesn’t have to be restricted to unused data, however. It can be expanded to include any data that is unfound or flat-out hidden. One example is data buried in the deep web, which most companies do not collect and which is hard to find. Deep-web data is data that isn’t indexed by typical search engines, and it can include data from academia, government agencies, user communities, and more. The expanded definition can also include data on the dark web, which consists of sites that are only accessible through special means and are largely untraceable.
Data that falls within this expanded definition of dark data used to be all but unusable due to its location, its sheer quantity, and the resources needed to find and analyze it. However, tools have evolved or have been created to allow organizations to analyze dark data along with big data.
As a result, dark analytics transcends the barriers of structured data, casting a much wider net that can capture a broad array of unstructured data that was previously untapped or hidden.
The three data dimensions that dark analytics emphasizes are:
Traditional unstructured data: Most organizations have heaps of structured and unstructured data. Emails, notes, messages, documents, logs, and notifications usually constitute the “traditional” unstructured data. These are usually text-based, and largely remain untapped. Analyzed with dark analytics, they could reveal information on pricing, customer behavior, and competitors.
Nontraditional unstructured data: The second dimension of dark analytics focuses on a different category of unstructured data: audio and video files and still images, among others. New opportunities for signal detection and response open up when a layer of analytics is added in real time to audio and video feeds.
Data that lies within the deep web: The third dimension to dark analytics features the infamous deep web. This could contain the largest body of untapped information, comprising data curated by academics, consortia, government agencies, communities, and other third-party domains.
The sheer size of the domain and its lack of structure make data search a daunting task for businesses. However, the intelligence community constantly monitors the volume and context of deep web activity to identify potential threats. Along the same lines, enterprises might soon have the tools required to curate competitive intelligence from the deep web. For instance, Deep Web Technologies designs search tools for retrieving and analyzing data that is usually inaccessible to standard search engines.
The Sources of Dark Data
Before discussing use cases involving dark data, let’s take a quick look at where dark data comes from and why it is so prevalent in many businesses.
Infrastructure and business operations generate much more data than many companies are equipped to interpret. For example, your networking devices probably generate huge amounts of information. Even if you take the time to collect all that machine data, it remains dark unless you analyze it.
In other cases, an inability to work with data efficiently is the reason the data stays in the dark. If the data is stored in a format that your analytics tools don’t support, you lack the ability to turn it into actionable information. Dark data may also be stored on devices from which it is difficult to offload into analytics platforms.
Pitfalls in mining dark data
The field is relatively new, and venturing into the landscape could imply risks for the continued business health and well-being of enterprises. Some of the typical risks faced by an enterprise leveraging dark analytics include:
Legal and regulatory risk: Data covered by mandate or regulation, for instance confidential, financial information (credit card or other account data), or patient records could end up appearing anywhere in dark data collections. This exposure could involve legal and financial liability.
Intelligence risk: Dark data can often contain proprietary or sensitive information, which reflects upon business operations, practices, competitive advantages, important partnerships, and joint ventures. Disclosure of such classified information could compromise important business activities and relationships.
Open-ended exposure: Dark data can contain unknown and unevaluated sources of intelligence, entailing exposure to loss or harm.
Reputation risk: A data breach usually reflects badly on the organizations affected. Enterprises could end up losing customer trust and reputation if such an incident occurs.
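To make the legal and regulatory risk concrete, here is a minimal sketch of how an audit might flag payment-card-like numbers hiding in free text such as old support tickets. It is only an illustration: the regular expression, the function names, and the sample text are assumptions made for this example, and a real audit would cover many more data types than card numbers. The check itself is the standard Luhn checksum.

```python
import re

# Assumed pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_like_numbers(text: str) -> list[str]:
    """Return digit sequences in the text that look like payment card numbers."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


# Hypothetical snippet of dark data (the card number is a well-known test value).
sample = "Customer asked to update card 4111 1111 1111 1111; ticket 1234567890123 closed."
flagged = find_card_like_numbers(sample)
print(flagged)
```

A scan like this only narrows down where to look; anything it flags still needs human review before it is deleted, masked, or encrypted.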
Handling dark data better with Dark Analytics
Dark analytics undoubtedly promises growth for enterprises in the coming years, despite the risks involved. The space will draw a lot of investment from companies worldwide. However, it’s unlikely that all dark data will be valuable, so enterprises will have to devise a strategy before approaching the space.
Care must be taken to regularly audit and trim the database. Old data must be structured and categorized so that businesses can swiftly retrieve stored information later. To address data security concerns, it’s advisable to encrypt the data. Companies must ensure that encryption covers both data on in-house servers and data in cloud storage.
Furthermore, businesses must put data retention and safe disposal policies in place, which allow them to retain valuable data for later use. The policies must be aligned with the prescriptions of the Department of Defense.
Putting Your Dark Data to Work
The crucial point to understand about dark data is that it doesn’t have to remain dark. The minute you take dark data and leverage it to gain insights, the data becomes actionable and is no longer dark. To illustrate the point, consider the following examples of ways in which common forms of dark data can be used:
• Networking machine data. As noted above, servers, firewalls, network monitoring tools and other parts of your environment generate large amounts of machine data related to network operations. Avoid dark networking data by using this information to analyze network security, as well as to monitor network activity patterns to ensure that your network infrastructure is never under- or over-utilized.
• Customer support logs. Most businesses maintain records of customer-support interactions that include information such as when a customer contacted the business, which type of communication channel was used, how long the engagement lasted and so on. Don’t make the mistake of leaving this data in the dark, or using it only when you need to research a customer issue. Instead, build it into your analytics workflows by leveraging it to help understand when your customers are most likely to contact you, what their preferred methods of contact are and so on.
• “Legacy” system logs. If you have mainframes or other older types of systems running in your environment, you may think that there is no way to use modern analytics tools to understand them. But you can. By offloading system logs and other data from these systems into an analytics platform like Hadoop, you can make sure you are not leaving this “legacy” data in the dark.
• Non-textual data. Most data analytics workflows are built around textual data, which is easier to ingest. However, you can also make use of video, audio or other non-textual files. You can analyze the metadata associated with them, or, if appropriate, translate speech to text in order to gain more insight into the content of the data itself.
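As a rough illustration of the customer support example above, the sketch below counts contacts by hour of day and by channel. The record layout, the sample data, and the function name are all hypothetical; real support systems export their logs in many different shapes.

```python
from collections import Counter
from datetime import datetime

# Hypothetical support-log records: (ISO timestamp, channel, duration in minutes).
support_log = [
    ("2017-11-06T09:15:00", "phone", 12),
    ("2017-11-06T09:40:00", "email", 5),
    ("2017-11-06T14:05:00", "chat", 8),
    ("2017-11-07T09:55:00", "phone", 20),
    ("2017-11-07T16:30:00", "chat", 6),
]


def summarize_contacts(records):
    """Count contacts per hour of day and per communication channel."""
    by_hour = Counter()
    by_channel = Counter()
    for timestamp, channel, _duration in records:
        by_hour[datetime.fromisoformat(timestamp).hour] += 1
        by_channel[channel] += 1
    return by_hour, by_channel


by_hour, by_channel = summarize_contacts(support_log)
busiest_hour, busiest_count = by_hour.most_common(1)[0]
print(f"Busiest hour: {busiest_hour}:00 ({busiest_count} contacts)")
print("Contacts by channel:", dict(by_channel))
```

Even a simple tally like this turns records that were kept only for troubleshooting into input for staffing and channel decisions.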
How can Dark Analytics lead to unexplored opportunities?
Dark data is potentially a land of undiscovered, neglected opportunities. It has much to offer across the entire length and breadth of industry. Companies can gain valuable insights to drive their business.
Dark analytics can help organizations forecast demand for products and services more precisely by analyzing clickstream data or product telematics. It can also help them isolate and resolve customer issues, and it can help companies build a powerful supply chain by furnishing them with granular information.
A case in point is a project at Copenhagen Airport, which serves as a good example of dark analytics. The airport gathered useful information by crunching the data in the log files of its Wi-Fi routers. Passengers’ smartphones would “ping” routers as they walked through the terminals, yielding data on passenger movements. This data could answer commercial questions such as, “Which is the most visited area of duty free?”
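The idea behind the airport example can be sketched in a few lines. The log format, router names, and device identifiers below are invented for illustration and are not taken from the Copenhagen Airport project; the sketch simply counts distinct devices seen by each router and picks the busiest area.

```python
from collections import defaultdict

# Hypothetical router-log lines: "timestamp,router_id,hashed_device_id".
log_lines = [
    "2017-11-06T10:00:01,duty_free_A,dev1",
    "2017-11-06T10:00:05,duty_free_A,dev2",
    "2017-11-06T10:01:10,duty_free_B,dev1",
    "2017-11-06T10:02:30,duty_free_A,dev1",
    "2017-11-06T10:03:00,gate_C,dev3",
    "2017-11-06T10:04:45,duty_free_A,dev3",
]


def unique_visitors_per_area(lines):
    """Count the distinct devices that pinged each router."""
    devices = defaultdict(set)
    for line in lines:
        _timestamp, router_id, device_id = line.strip().split(",")
        devices[router_id].add(device_id)  # a set ignores repeat pings
    return {area: len(ids) for area, ids in devices.items()}


visitors = unique_visitors_per_area(log_lines)
most_visited = max(visitors, key=visitors.get)
print(visitors)
print("Most visited area:", most_visited)
```

Counting distinct devices rather than raw pings keeps one lingering phone from inflating an area’s score. Using hashed identifiers, as assumed here, is one way to limit the privacy exposure of this kind of analysis.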
In recent years, growing numbers of retailers have begun exploring different approaches to developing digital experiences. Some are analyzing previously dark data culled from customers’ digital lives and using the resulting insights to develop merchandising, marketing, customer service, and even product development strategies that offer shoppers a targeted and individualized customer experience.
Stitch Fix, for example, is an online subscription shopping service that uses images from social media and other sources to track emerging fashion trends and evolving customer preferences. Its process begins with clients answering a detailed questionnaire about their tastes in clothing. Then, with client permission, the company’s team of 60 data scientists augments that information by scanning images on customers’ Pinterest boards and other social media sites, analyzing them, and using the resulting insights to develop a deeper understanding of each customer’s sense of style. Company stylists and artificial intelligence algorithms use these profiles to select style-appropriate items of clothing to be shipped to individual customers at regular intervals.
So, as we can see, dark data can be a tremendously powerful asset when put to work. It’s the key to finding new insights, creating new revenue opportunities, developing new partnerships and shifting your business into the data-driven century. What do you think: will dark analytics really become the hot spot of the coming year? Can you point to any other technologies that will be among the most interesting of 2018? Please share your thoughts and estimates with me, and we will soon know whether you were right or wrong, because the New Year is almost here!
It surely sounds exciting … but it seems digital privacy is more at risk.