Tackling Challenges and Mastering Strategies for Integrating Disparate Data Sources

Written by: Alin Mircea

In 1597, Sir Francis Bacon wrote a saying that became famous: “knowledge is power”. More than four centuries later, the saying is more relevant than ever, as modern businesses depend entirely on their ability to gather, process, and interpret data.

As industrial automation systems such as DCS (Distributed Control System), SCADA (Supervisory Control And Data Acquisition), MES (Manufacturing Execution System) and ERP (Enterprise Resource Planning) evolved over the years, they generated more and more data, leading to the vast collections of disparate data sources that exist today.

Disparate data sources are different types or formats of data that reside in separate systems, databases, or file formats, and that are not necessarily designed to be integrated or compatible with each other. In industrial automation environments, the term often refers to the actual devices designed specifically to monitor and control automated processes such as assembly lines, machine functions, or robotic devices.

These devices range from programmable logic controllers (PLCs) and remote terminal units (RTUs) to computer numerical control (CNC) machines, injection molding machines, torque tool controllers (DC tools), and much more. They generate data during process control activities, and this data can be tremendously valuable to other systems and parts of the business. To obtain insights from these highly disparate potential data sources, companies need to find a way to provide access to them from a unified system, which enables a more comprehensive view of their operations.

This can prove to be one of the biggest challenges companies face today.

How do you integrate different data sources?

Keeping our ultimate goal of industrial system integration in mind, let’s explore the technology required for system integration – both the physical act of systems integration and the digital integration steps required after physical integration is complete.

  • Physical integration

Different devices in industrial automation systems use different physical communication mediums. The first type of wired medium to be used broadly was coaxial cable. It consists of an inner conductor surrounded by a concentric conducting shield, with the two separated by an insulating material. Coaxial cabling has certain limitations but is still widely used today, albeit in more specific circumstances.

Coaxial interfaces are typically converted into a different medium before integration into the higher-level systems that acquire, combine, and store data from disparate systems. Sometimes this conversion and integration is accomplished by a controller device such as a PLC that has other broad capabilities. Other times, the conversion is done by a simpler converter device designed for this express purpose.

Another physical medium that has seen wide adoption within industrial automation environments is the Serial cable, or Serial communications interface. This method uses one or more transmission lines to send and receive data, and that data is sent and received one bit at a time. Three standards derived from this technology. In chronological order, they are RS-232, RS-422 and RS-485. Serial communication mediums are more limited in performance and network flexibility than other modern media; however, Serial cables and interfaces are still widely used today.

As with Coaxial, integration with the higher-level parts of an environment where data is acquired and served to other systems, especially off-the-shelf computer equipment like servers and PCs, is typically accomplished by converting the physical Serial medium and data format into something else (read: Ethernet) before it is connected to upper-level systems.
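To make the "one bit at a time" idea concrete, here is a minimal sketch of how a single byte might be framed on an asynchronous serial line. It assumes the common "8N1" style layout (one start bit, eight data bits sent least significant bit first, optional parity, one stop bit). The function names `frame_byte` and `unframe_bits` are hypothetical and used only for illustration; real devices may be configured differently.

```python
from typing import List


def frame_byte(value: int, parity: str = "none") -> List[int]:
    """Return the bit sequence for one byte in an asynchronous serial frame."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    # Data bits go on the wire least significant bit first.
    data_bits = [(value >> i) & 1 for i in range(8)]
    bits = [0] + data_bits  # the start bit is logic 0
    if parity == "even":
        bits.append(sum(data_bits) % 2)
    elif parity == "odd":
        bits.append((sum(data_bits) + 1) % 2)
    elif parity != "none":
        raise ValueError("parity must be 'none', 'even' or 'odd'")
    bits.append(1)  # the stop bit is logic 1
    return bits


def unframe_bits(bits: List[int]) -> int:
    """Recover the byte from a 10-bit frame without parity (8N1)."""
    if len(bits) != 10 or bits[0] != 0 or bits[-1] != 1:
        raise ValueError("not a valid 8N1 frame")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


if __name__ == "__main__":
    frame = frame_byte(0x41)  # ASCII "A"
    print(frame)
    print(hex(unframe_bits(frame)))
```

Because every byte carries at least two extra framing bits in this layout, the usable data rate is lower than the raw bit rate of the line.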

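Ethernet was first developed in 1973 and initially ran over coaxial cable. The twisted pair of copper wires, invented originally by Alexander Graham Bell, was later refined and adapted for it, and a new communication medium commonly called the Ethernet cable (or RJ-45 cable, after its connector) appeared.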

Ethernet twisted pair consists of eight wires, arranged as four pairs, with the pairs bundled together and covered by a protective outer jacket. This twisted pair cabling was designed specifically for use with the Ethernet standard and quickly surpassed coaxial cabling, the previous physical medium used with Ethernet, in general adoption.

The speed and flexibility that Ethernet provides have allowed this physical medium to become a mainstay of most modern computer networks, with nearly all computer technology and software operating systems today supporting Ethernet and its ubiquitous twisted pair “LAN cable” as a native network type. Twisted-pair copper cabling is still a common medium for Ethernet communication; however, the Ethernet standard has been mapped to wireless transmission technologies as well.

Another wired medium common in both industrial operations and information technology environments is fiber optic cable. A fiber optic cable is a thin, flexible, transparent medium made of very fine glass or plastic fibers. It relies on the physical principle of Total Internal Reflection (TIR) for signal propagation and low-loss data transmission.

Unlike twisted pair or coaxial cables, which carry lower-frequency electrical signals, fiber optic networking uses higher-frequency electromagnetic radiation (light, typically in the near-infrared range) in the form of pulses generated by a laser or an injection diode. In the simplest encoding schemes, each pulse of light represents a single bit of data. Pulses occur extremely rapidly, theoretically offering 44 terabits per second (44,000 gigabits per second) of bandwidth across a single optical cable. Fiber optic mediums may be present in various parts of Ethernet and non-Ethernet networks, but unlike ubiquitous Ethernet twisted pair, fiber is typically converted to something like Ethernet twisted pair before integration with common systems such as servers, PCs, and other higher-level systems.

Apart from these, there are also wireless communication mediums that can be used in certain scenarios. Various wireless signaling technologies are commonly leveraged for system integration such as radio frequency (RF), Bluetooth, NFC, and Wi-Fi. With the exception of Wi-Fi, especially in industrial environments these wireless mediums are typically converted into something more ubiquitous with more network design possibilities before integration to higher-level systems: for example, Bluetooth to Ethernet conversion.

Bluetooth serves as an excellent short-distance, high-bandwidth wireless data transmission technology, but its implementation is point-to-point and lacks the layered, segmented, and interconnected networking designs possible with other wireless and wired mediums. It is possible to integrate higher-level systems with low-level data sources using point-to-point physical mediums like Bluetooth, but such integration is difficult, costly, and administratively burdensome. A simple conversion from non-Ethernet wireless to Ethernet (wireless or twisted pair) removes these obstacles.

  • Digital integration

Across these physical mediums, data can be represented in multiple ways, defined by communication or network protocols. Today there is a very large collection of communication protocols in use across disparate industrial automation systems. Some of the most used ones for serial communication mediums are Modbus ASCII or RTU, Allen-Bradley DF1 and DH+, and Siemens MPI, AS511 and 3964R. For Ethernet-based communication mediums the list is even longer, with some of the most used protocols within industrial environments being Modbus TCP, EtherNet/IP (Common Industrial Protocol), OPC DA and OPC UA, Siemens S7 Industrial Ethernet, and web-oriented protocols like MQTT and HTTP.
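As an illustration of how the same request is represented differently depending on the protocol and medium, the following sketch builds a Modbus "read holding registers" request (function code 0x03) in both its TCP and RTU forms. The frame layouts follow the publicly documented Modbus framing (an MBAP header for TCP, a trailing CRC-16 for RTU). The function names and the sample unit ID and addresses are assumptions chosen for the example, and no network or serial I/O is performed.

```python
import struct

READ_HOLDING_REGISTERS = 0x03


def build_tcp_request(transaction_id: int, unit_id: int,
                      start_address: int, quantity: int) -> bytes:
    """Build a Modbus TCP request: MBAP header followed by the PDU."""
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, start_address, quantity)
    # MBAP: transaction id, protocol id (always 0), length of what follows, unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def crc16_modbus(data: bytes) -> int:
    """Compute the CRC-16 used by Modbus RTU (initial value 0xFFFF, polynomial 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def build_rtu_request(unit_id: int, start_address: int, quantity: int) -> bytes:
    """Build a Modbus RTU request: address, PDU, then CRC with the low byte first."""
    body = struct.pack(">BBHH", unit_id, READ_HOLDING_REGISTERS,
                       start_address, quantity)
    return body + struct.pack("<H", crc16_modbus(body))


if __name__ == "__main__":
    print(build_tcp_request(1, 1, 0, 2).hex(" "))
    print(build_rtu_request(1, 0, 2).hex(" "))
```

The payload (function code, start address, quantity) is identical in both frames; only the envelope changes, which is exactly the kind of difference a conversion layer has to handle.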

The need to integrate disparate industrial data sources into shared systems became increasingly critical as software-based SCADA and HMI grew in popularity. These software-based control and acquisition systems often run on common server or PC hardware, where Serial RS-232 and Ethernet are often the only physical interfaces available. Even if physical conversion into RS-232 or Ethernet media is possible, the data itself (the language used across the physical medium) is still an area of complexity where conversion typically must also occur.

To facilitate digital integration of industrial systems, an organization called the OPC Foundation was started in 1996. OPC originally stood for “OLE for Process Control” and was later redefined to mean “Open Platform Communications.” The goal of this organization was to manage a global structure in which users, vendors and consortia would collaborate to create data transfer standards for multi-vendor, multi-platform, secure and reliable interoperability in industrial automation. Microsoft Windows and its Component Object Model (COM) of software design was selected as the framework, and within COM, Ethernet protocols and Ethernet physical media are used to integrate systems.

Taking advantage of the OPC Foundation’s efforts, software applications emerged that would act as middleware and bridge that digital connectivity gap. From a conceptual point of view, a middleware application can aggregate data from multiple disparate data sources through one set of digital and physical interfaces and expose it to upper-level systems through different digital and physical interfaces. In the industrial automation space, these types of applications quickly came to be known as OPC Servers. The middleware application would communicate with lower-level industrial devices and systems using a mixture of physical and digital mediums (though most commonly Serial and Ethernet) and expose the data to upper-level systems using OPC Data Access, a Windows-based COM-compliant protocol suitable for traversal across Ethernet networks as required.
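The middleware concept described above can be sketched in a few lines. This is a conceptual illustration only, not how any particular OPC server is implemented: the `Driver` interface, the two simulated drivers, the `TagServer` class, and all tag names and addresses are hypothetical and stand in for real protocol drivers and a real client-facing interface.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Tuple


class Driver(Protocol):
    """Anything that can read a value from a device-specific address."""

    def read(self, address: str) -> float:
        ...


@dataclass
class SimulatedSerialDriver:
    """Stands in for a serial device exposing raw integer registers."""
    registers: Dict[str, int]

    def read(self, address: str) -> float:
        return float(self.registers[address])


@dataclass
class SimulatedEthernetDriver:
    """Stands in for an Ethernet device exposing named values."""
    values: Dict[str, float]

    def read(self, address: str) -> float:
        return self.values[address]


class TagServer:
    """Maps unified tag names to (driver, device address) pairs."""

    def __init__(self) -> None:
        self._tags: Dict[str, Tuple[Driver, str]] = {}

    def add_tag(self, name: str, driver: Driver, address: str) -> None:
        if name in self._tags:
            raise ValueError(f"duplicate tag name: {name}")
        self._tags[name] = (driver, address)

    def read(self, name: str) -> float:
        driver, address = self._tags[name]
        return driver.read(address)

    def read_all(self) -> Dict[str, float]:
        return {name: self.read(name) for name in self._tags}


if __name__ == "__main__":
    plc = SimulatedSerialDriver({"40001": 72})
    cnc = SimulatedEthernetDriver({"spindle.rpm": 1200.0})
    server = TagServer()
    server.add_tag("Line1.Oven.Temperature", plc, "40001")
    server.add_tag("Line1.CNC.SpindleSpeed", cnc, "spindle.rpm")
    print(server.read_all())
```

The design choice worth noting is that upper-level clients only ever see the unified tag names, so a device or protocol can be swapped by changing one driver without touching the clients.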

In recent times, most of the major equipment vendors have adjusted their portfolios to include compatibility with Ethernet and even OPC standards to allow easier integration between multiple systems and devices. Most devices offer native Ethernet interfaces or optional Ethernet modules, and some offer built-in OPC servers. Furthermore, the connectivity capabilities of middleware “OPC server” applications have often been expanded into territory similar to API integration platforms like MuleSoft or Boomi by including HTTP publish and server interfaces, data modeling, and event-based data collection and transmission.

However, the connectivity dilemma still exists for systems new and old. Even if automation devices implement Ethernet and a modern, open integration protocol like OPC, pressures to reduce connections to sensitive automation devices and the need for digital conversion of data to highly disparate data streams like MQTT and HTTP have further expanded both the need for and the functionality of third-party “OPC servers” or industrial middleware. These tools can offer connectivity to both new and legacy systems while providing multiple ways to deliver the data to higher-level systems: OPC, HTTP, MQTT and otherwise.
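The digital conversion step mentioned above, from a device value to a web-oriented stream, might look roughly like the sketch below. It only prepares an MQTT-style topic and a JSON payload; nothing is actually published. The topic layout, the payload field names, and the `to_mqtt_message` helper are assumptions made for illustration, not a format defined by MQTT or by any product.

```python
import json
from datetime import datetime, timezone
from typing import Tuple


def to_mqtt_message(site: str, device: str, tag: str,
                    value: float, timestamp: datetime) -> Tuple[str, str]:
    """Return a (topic, payload) pair for one tag reading."""
    # Hypothetical topic hierarchy: site/device/tag.
    topic = f"{site}/{device}/{tag}"
    payload = json.dumps(
        {
            "value": value,
            "timestamp": timestamp.isoformat(),
            "quality": "good",  # assumed quality flag for the example
        },
        sort_keys=True,
    )
    return topic, payload


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    topic, payload = to_mqtt_message("plant1", "oven3", "temperature", 72.5, now)
    print(topic)
    print(payload)
```

The same payload could be sent as the body of an HTTP request instead, which is one reason a single middleware layer can serve several kinds of upper-level consumers.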

Old and new challenges

Unifying disparate data sources found in industrial automation environments can be a complex and challenging process as it involves establishing sometimes diverse physical and digital connections to process control equipment and machinery, and integrating multiple sets of data that may be stored in different formats, locations, or systems. Some of the challenges associated with unifying disparate data sources include:

  • Data quality issues

Different data sources may have varying levels of accuracy, consistency, completeness, and timeliness, which can make it difficult to obtain a unified view of the data. Data cleaning and transformation may be required to ensure that the data is consistent and reliable.
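A minimal sketch of the kind of cleaning and transformation this implies is shown below. It assumes three simple rules chosen for the example: drop readings with no value (completeness), drop readings older than a cutoff (timeliness), and convert Fahrenheit to Celsius (consistency). The `Reading` type, the unit labels, and the 30-second default are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Reading:
    tag: str
    value: Optional[float]
    unit: str
    timestamp: datetime


def clean(readings: List[Reading], now: datetime,
          max_age: timedelta = timedelta(seconds=30)) -> List[Reading]:
    """Drop incomplete or stale readings and normalize temperature units."""
    cleaned = []
    for reading in readings:
        if reading.value is None:
            continue  # incomplete: the source reported no value
        if now - reading.timestamp > max_age:
            continue  # stale: too old to be part of a current view
        value, unit = reading.value, reading.unit
        if unit == "degF":
            value = (value - 32.0) * 5.0 / 9.0
            unit = "degC"
        cleaned.append(Reading(reading.tag, round(value, 2), unit, reading.timestamp))
    return cleaned


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    raw = [
        Reading("oven.temp", 212.0, "degF", now),
        Reading("tank.temp", None, "degC", now),
        Reading("line.temp", 40.0, "degC", now - timedelta(minutes=5)),
    ]
    for reading in clean(raw, now):
        print(reading)
```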

  • Data integration

Data from different sources may be stored in different formats, structures, or schemas, which can make it challenging to integrate the data into a unified system. This requires careful planning and coordination to ensure that the data is mapped correctly and that there are no inconsistencies or redundancies. For example, with serial communications the physical length of the transmission path is a significant limitation. This can be made worse by the lack of legacy interfaces on modern servers that need to be connected to older devices. Ethernet connections also introduce a series of challenges: some devices that lack Ethernet might need expensive add-on modules, whereas others that do have it might ship configured with duplicate private IP addresses, resulting in additional reconfiguration of the existing network infrastructure.
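The duplicate private IP problem can be caught early with a simple inventory check. The sketch below uses Python's standard `ipaddress` module to group devices by address and report collisions; the inventory contents, device names, and the `find_duplicate_addresses` helper are hypothetical.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List


def find_duplicate_addresses(inventory: Dict[str, str]) -> Dict[str, List[str]]:
    """Return each IP address that is assigned to more than one device."""
    by_address: Dict[str, List[str]] = defaultdict(list)
    for device, address in inventory.items():
        # ip_address() validates the string and normalizes its text form.
        ip = ipaddress.ip_address(address.strip())
        by_address[str(ip)].append(device)
    return {
        address: sorted(devices)
        for address, devices in by_address.items()
        if len(devices) > 1
    }


if __name__ == "__main__":
    inventory = {
        "press-1": "192.168.0.10",
        "press-2": "192.168.0.10",  # same factory default as press-1
        "robot-1": "192.168.0.20",
    }
    for address, devices in find_duplicate_addresses(inventory).items():
        private = ipaddress.ip_address(address).is_private
        print(address, devices, "private" if private else "public")
```

Collisions found this way typically have to be resolved by readdressing devices or isolating them behind separate network segments before a central system can reach them all.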

  • Technical compatibility

Different data sources may use different technologies or platforms, which can create technical compatibility issues when trying to integrate the data. This requires expertise in different technologies and a thorough understanding of data integration techniques. Every communication protocol, regardless of the communication medium it uses, has its own particularities and its own way of functioning. Add varying degrees of complexity and sometimes limited documentation, and the result is a very steep learning curve in understanding how to connect to disparate data sources.
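One small but common example of such a particularity is how a 32-bit floating-point value is split across two 16-bit registers. Devices differ in which register holds the high word, so the same two register values can decode to very different numbers. The sketch below assumes big-endian bytes within each register; the `registers_to_float` helper and its `word_swapped` flag are hypothetical names for illustration.

```python
import struct
from typing import Tuple


def registers_to_float(registers: Tuple[int, int],
                       word_swapped: bool = False) -> float:
    """Decode two 16-bit registers into an IEEE 754 single-precision float."""
    first, second = registers
    if word_swapped:
        # Some devices send the low word first, so swap before decoding.
        first, second = second, first
    raw = struct.pack(">HH", first, second)
    return struct.unpack(">f", raw)[0]


if __name__ == "__main__":
    # 1.0 is 0x3F800000 as a single-precision float.
    print(registers_to_float((0x3F80, 0x0000)))
    print(registers_to_float((0x0000, 0x3F80), word_swapped=True))
    # Decoding with the wrong word order silently yields a wrong value.
    print(registers_to_float((0x0000, 0x3F80)))
```

Details like this are rarely discoverable from the data alone, which is why device documentation, or a driver that already encodes this knowledge, matters so much.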

  • Data governance

Unifying disparate data sources also requires attention to data governance issues, such as data security, privacy, and regulatory compliance. These issues can be complex and require careful planning and implementation to ensure that the data is protected and used ethically and legally.

How ThingWorx Kepware Server can help address these challenges

The challenges mentioned in the previous section highlight the difficulties in creating a good-quality software solution that’s capable of interacting with so many disparate data sources. Kepware has been tackling these hurdles since its inception in 1994.

Now, after almost 30 years of existence, our solution called Kepware Server (also known widely as KEPServerEX) is the industry’s leading connectivity platform for integrating disparate industrial data sources. Kepware Server supports hundreds of protocols and thousands of disparate devices and systems, allowing users to integrate almost any equipment within the industrial automation space with a single platform, from legacy to cutting-edge. The collected data can be exposed through numerous interfaces and protocols like OPC DA, OPC UA, MQTT, and REST (HTTP) to upper-level systems like SCADA, MES, ERP, cloud services or IoT platforms.

The overall configuration workflow has been simplified through constant product iterations to minimize the learning curve and reduce integration time and complexity. Furthermore, new enhancements are constantly added to keep up with the latest protocols and industry best practices, especially as they relate to security and overall system stability.

As part of PTC, Kepware also has a comprehensive Support & Maintenance program developed to enrich and lengthen the lifetime of Kepware software applications. This program combines software updates and upgrades with expert Technical Support services to help keep critical automation projects performing at an optimum level. It also increases personal operational efficiency by providing access to Kepware’s industry-leading expertise in the form of live conversations and access to self-driven learning material. All these factors across 30 years of successful software development have resulted in Kepware Server being used at over 75,000 sites around the world, operating as part of PTC, an important American technology company and member of the S&P 500.


The integration of disparate industrial automation systems can be complex and difficult due to a variety of factors including physical and digital conversion. ThingWorx Kepware Server is a software solution that creates a bridge between new and legacy industrial hardware and software components and modern applications, providing a seamless integration of data between disparate systems and data sources.

With ThingWorx Kepware Server, users can provide secure and reliable connectivity to an enormous variety of devices and systems and present access to, and data from, all systems in a single, secure and unified manner. Kepware Server provides a critical part of machine control and data acquisition solutions, allowing for real-time data exchange in the most diverse and demanding environments. The consistent, repeatable workflows allow for rapid time-to-value for any automation or digital transformation project for Industry 4.0, and the wide variety of protocol support enables users to leverage the benefits of modern technology without having to replace legacy systems or disrupt existing operations.

By providing a flexible and scalable integration platform, ThingWorx Kepware Server can help organizations streamline operations, reduce costs, and improve overall efficiency, while enabling a smooth transition to new technologies and applications.