Nearly Half of Data Engineers Say Data Quality Issues Are ‘Biggest Frustration’

A poll of data engineers has found that data quality remains their biggest frustration and a showstopper for delivering impact and value quickly – with almost half ranking it above other common pipeline challenges.

Over 100 data engineers responded to the poll, which was published by Pipeliner, a recently launched data transformation and infrastructure management tool.

The poll found that almost 1 in 5 engineers consider integrations with other systems their biggest challenge, while just under 20 per cent point to performance bottlenecks. Respondents also cited GDPR compliance, poor team collaboration and a lack of access and permissions as high on their list of frustrations when carrying out their role.

Commenting on the results of the poll, Xavi Forde, founding engineer of Pipeliner and a data engineer himself, said:

“It’s no secret that data quality continues to be a root cause of major frustration for many data engineers. Couple this with a growing number of organisations looking to adopt AI to support enterprise growth – and data engineers are under increasing pressure to ensure data is insight- and AI-ready.

“We know data is never perfect – but there are absolutely ways engineers can reduce the chances of data being compromised as it moves through the pipeline, and it all starts with a well-documented pipeline with complete traceability between your intended data transformation rules and your data transformation code, so that no engineer has to spend hours and hours trying to untangle someone else’s badly written SQL.”
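For illustration only, the sketch below shows one way such traceability can look in practice: each piece of transformation SQL carries a comment naming the specification rule it implements. The rule ID, schema and column names are hypothetical, not taken from Pipeliner.

```python
# A minimal sketch of rule-to-code traceability (all names hypothetical;
# this is not Pipeliner's actual convention). Each generated SQL statement
# carries a comment pointing back to the mapping-specification rule that
# defines it, so the code can always be traced to the documented intent.

CLEAN_CUSTOMERS_SQL = """
-- Implements rule R-014 of the mapping specification:
-- "normalise customer email: trim whitespace, lowercase".
SELECT customer_id,
       LOWER(TRIM(email)) AS email
FROM raw.customers;
"""

print(CLEAN_CUSTOMERS_SQL)
```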

To support data engineers in tackling some of their most common obstacles, Pipeliner launched its metadata-driven data transformation and infrastructure management tool in July. It takes a mapping specification as input and delivers data pipeline and infrastructure code directly to a data engineer’s GitHub repository, accelerating the development of data lakes while enforcing data governance.

With Pipeliner, an engineer can go from a mapping specification to a live pipeline in minutes rather than hours or days. The tool specialises in bespoke data pipeline design and implementation, enabling organisations to streamline data integration, optimise workflows, and uphold data quality through automated end-to-end pipeline creation.

Talking about the innovation behind Pipeliner, founder Svetlana Tarnagurskaja, who will host a panel of Great British Data Founders at Big Data LDN, says: “Pipeliner can help you build complex, production-grade data transformation pipelines significantly faster – it’s a tool built by engineers for engineers, with users retaining full control and ownership of their code, which was of paramount importance to us.

“The mission of Pipeliner is to make the build and maintenance of high-quality bespoke data lakes more affordable and accessible for the industry, whether it’s a small team in the charity sector or an established engineering team under pressure to unlock cost savings in a large enterprise. Pipeliner automates the most time-consuming parts of infrastructure and data transformation code creation to remove bottlenecks, increase productivity and reduce cloud costs. It could save engineers days, even weeks, of time.”

Pipeliner works through a three-stage process:

  • Define – analysts or engineers define the source-to-target transformation logic and the data structures to be created, captured in a mapping specification (see the sketch after this list)
  • Generate – Pipeliner takes the mapping specification as input and generates the ETL jobs and infrastructure code.
  • Deploy – Pipeliner delivers fully editable code straight to the Git repo of your choice, ready to be deployed, allowing the engineering team to retain full control of their code.
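As a rough illustration of the Define and Generate steps, the sketch below renders a SQL statement from a toy mapping specification. The specification format, field names and generated SQL are invented for this example and are not Pipeliner’s actual schema or output.

```python
# A toy sketch of the Define -> Generate flow (all formats hypothetical;
# this is not Pipeliner's actual specification schema or output).

# Define: a mapping specification capturing source, target and
# source-to-target column transformations.
spec = {
    "source": "raw.orders",
    "target": "lake.orders_clean",
    "columns": [
        # Passthrough column: copied as-is.
        {"expr": "order_id", "as": "order_id"},
        # Derived column: pence converted to pounds.
        {"expr": "amount_pence / 100.0", "as": "amount_gbp"},
    ],
}

def generate_sql(spec: dict) -> str:
    """Generate: render a CREATE TABLE AS SELECT statement from the spec."""
    select_list = ",\n       ".join(
        f"{c['expr']} AS {c['as']}" for c in spec["columns"]
    )
    return (
        f"CREATE TABLE {spec['target']} AS\n"
        f"SELECT {select_list}\n"
        f"FROM {spec['source']};"
    )

# "Deploy" in this sketch is just printing; per the article, Pipeliner
# instead commits the generated code to the team's Git repository.
print(generate_sql(spec))
```

In Pipeliner’s case the generated artefacts are full ETL jobs and infrastructure code delivered to the team’s own repo rather than a single statement, but the spec-in, code-out shape is the same.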

Backed by a team of experienced data engineers and leveraging cutting-edge technology, Pipeliner empowers businesses to extract actionable insights, make informed decisions, and foster growth through efficient data management.