Blog Archives

Articles via RSS from IBM Big Data Hub

Innocens BV leverages IBM Technology to Develop an AI Solution to help detect potential sepsis events in high-risk newborns

From the moment of birth to discharge, healthcare professionals collect a wealth of data about an infant’s vitals—for instance, heart rate or every rise and drop in blood oxygen level. Although medicine continues to advance, much remains to be done to reduce the number of premature births and infant deaths. The worldwide statistics are staggering—the University of Oxford estimates that neonatal sepsis causes 2.5 million infant deaths annually.1

Babies born prematurely are susceptible to health problems. Sepsis, or bloodstream infection, is life threatening and a common complication for infants admitted to a Neonatal Intensive Care Unit (NICU).

At Innocens BV, the belief is that earlier identification of sepsis-related events in newborns is possible, especially given the vast number of data points collected from the moment a baby is born. Years’ worth of aggregated data in the NICU could help lead us to a solution. The challenge was gleaning relevant insights from that data to help identify the infants at risk. This mission is how Innocens BV began in the NICU at Antwerp University Hospital in Antwerp, Belgium, in cooperation with the University of Antwerp. The hospital’s NICU is closely associated with the University, and its focus is on improving care for premature and low birthweight infants. We joined forces with a bioinformatics research group from the University of Antwerp and took the first steps in developing a solution.

Using IBM technology and the expertise of IBM data scientists, along with the knowledge and insights of the hospital’s NICU medical team, we kicked off a project to develop these ideas into a solution: one that uses clinical signals routinely collected in clinical care to help doctors detect, in a timely fashion, the patterns in that data associated with a sepsis episode. Our approach combined AI and edge computing to create a predictive model that could process years of anonymized data, helping doctors observe and monitor the thousands of available data points and make informed decisions.

How AI powers the Innocens Project

When the collaboration began, data scientists at IBM understood they were dealing with a sensitive topic and sensitive information. The Innocens team needed to build a model that could detect subtle changes in neonates’ vital signs while generating as few false alarms as possible. This required a model with a high level of precision that is also built upon key principles of trustworthy AI, including transparency, explainability, fairness, privacy and robustness.

Using IBM Watson Studio, a service available on IBM Cloud Pak for Data, to train and monitor the solution’s machine learning models, Innocens BV could help doctors by providing data-driven insights associated with a potential onset of sepsis. Early results on historical data show that many severe sepsis cases can be identified multiple hours in advance. The user interface presenting the output of the predictive AI model is designed to give doctors and other medical personnel insights on individual patients and to augment their clinical intuition.

Innocens worked closely with IBM and medical personnel at Antwerp University Hospital to develop a purposeful platform with a user interface that is consistent and easy to navigate, built on a comprehensible AI model with explainable AI capabilities. With doctors and nurses in mind, the team aimed to create a model whose intended users could readily reap its benefits. This work was imperative for building trust between the users and the instruments that help inform a clinician’s diagnosis. Innocens also involved doctors in building the user interface and respected the privacy and confidentiality of the anonymized historical patient data used to train the model within a robust data architecture.

The technology and outcomes of this research project have the potential not only to help the patients at Antwerp University Hospital, but to scale to other NICU centers and help other hospitals as they work to combat neonatal sepsis. Innocens BV is working in collaboration with IBM to explore how Innocens can continue to leverage data to help train transparent and explainable AI models capable of finding patterns in patient data, providing doctors with additional data insights and tools that help inform clinical decision-making.

The impact of the Innocens technology is being investigated in clinical trials and is not yet commercially available.

To learn more about how Innocens BV is putting data and AI to work, visit the case study and short documentary film here.

The post Innocens BV leverages IBM Technology to Develop an AI Solution to help detect potential sepsis events in high-risk newborns appeared first on Journey to AI Blog.

A step-by-step guide to setting up a data governance program

In our last blog, we delved into the seven most prevalent data challenges that can be addressed with effective data governance. Today we will share our approach to developing a data governance program to drive data transformation and fuel a data-driven culture.

Data governance is a crucial aspect of managing an organization’s data assets. The primary goal of any data governance program is to deliver against prioritized business objectives and unlock the value of your data across your organization.

Realize that a data governance program cannot exist on its own – it must solve business problems and deliver outcomes. Start by identifying business objectives, desired outcomes, key stakeholders, and the data needed to deliver these objectives. Technology and data architecture play a crucial role in enabling data governance and achieving these objectives.

Don’t try to do everything at once! Focus and prioritize what you’re delivering to the business, determine what you need, deliver and measure results, refine, expand, and deliver against the next priority objectives. A well-executed data governance program ensures that data is accurate, complete, consistent, and accessible to those who need it, while protecting data from unauthorized access or misuse.

Diagram showing the intersection of technology, processes and people that creates data governance programs and policy

Consider the following four key building blocks of data governance:

  • People refers to the organizational structure, roles, and responsibilities of those involved in data governance, including those who own, collect, store, manage, and use data.
  • Policies provide the guidelines for using, protecting, and managing data, ensuring consistency and compliance.
  • Process refers to the procedures for communication, collaboration and managing data, including data collection, storage, protection, and usage.
  • Technology refers to the tools and systems used to support data governance, such as data management platforms and security solutions.
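The four building blocks can also be captured in a simple, machine-readable inventory so that gaps are easy to spot. A minimal Python sketch, in which all role names, policy names and tool names are illustrative rather than prescribed:

```python
# Minimal sketch of a data governance inventory organized around the four
# building blocks: People, Policies, Process, Technology.
# Every name below is an illustrative example, not a standard.
governance_program = {
    "people": {
        "data_owner": "Head of Customer Analytics",
        "data_steward": "Customer Data Team",
    },
    "policies": [
        "customer-data-retention",
        "pii-access-control",
    ],
    "process": [
        "quarterly data quality review",
        "access request and approval workflow",
    ],
    "technology": [
        "data catalog",
        "data quality monitoring",
    ],
}

def missing_building_blocks(program: dict) -> list:
    """Return any of the four building blocks not yet defined."""
    required = ("people", "policies", "process", "technology")
    return [block for block in required if not program.get(block)]
```

Keeping the inventory as data makes it trivial to check that every new initiative has covered all four blocks before it starts.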

For example, if the goal is to improve customer retention, the data governance program should focus on where customer data is produced and consumed across the organization, ensuring that the organization’s customer data is accurate, complete, protected, and accessible to those who need it to make decisions that will improve customer retention.

It’s important to coordinate and standardize policies, roles, and data management processes to align them with the business objectives. This will ensure that data is being used effectively and that all stakeholders are working towards the same goal.

Starting a data governance program may seem like a daunting task, but by starting small and focusing on delivering prioritized business outcomes, data governance can become a natural extension of your day-to-day business.

Building a data governance program is an iterative and incremental process

Step 1: Define your data strategy and data governance goals and objectives

What are the business objectives and desired results for your organization? You should consider both long-term strategic goals and short-term tactical goals and remember that goals may be influenced by external factors such as regulations and compliance.

A data strategy identifies, prioritizes, and aligns business objectives across your organization and its various lines of business. Across multiple business objectives, a data strategy will identify data needs, measures and KPIs, stakeholders, and required data management processes, technology priorities and capabilities.

It is important to regularly review and update your data strategy as your business and priorities change. If you don’t have a data strategy, you should build one – it doesn’t take long, but you do need the right stakeholders to contribute.

Once you have a clear understanding of business objectives and data needs, set data governance goals and priorities. For example, an effective data governance program may:

  • Improve data quality, which can lead to more accurate and reliable decision making
  • Increase data security to protect sensitive information
  • Enable compliance and reporting against industry regulations
  • Improve overall trust and reliability of your data assets
  • Make data more accessible and usable, which can improve efficiency and productivity.
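A goal like “improve data quality” becomes actionable once it is expressed as concrete, automated rules with a measurable KPI. A minimal sketch in Python, where the record fields and rules are hypothetical examples:

```python
def check_record(record: dict) -> list:
    """Return a list of data quality rule violations for one customer record."""
    violations = []
    # Completeness: required fields must be present and non-empty.
    for field in ("customer_id", "email"):
        if not record.get(field):
            violations.append(f"missing required field: {field}")
    # Validity: a crude format rule -- email must contain exactly one '@'.
    email = record.get("email", "")
    if email and email.count("@") != 1:
        violations.append("invalid email format")
    return violations

def quality_score(records: list) -> float:
    """Fraction of records that pass every rule -- a simple KPI to track."""
    if not records:
        return 1.0
    passing = sum(1 for r in records if not check_record(r))
    return passing / len(records)
```

Tracking a score like this over time turns the quality objective into something the governance program can report against.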

Clearly defining your goals and objectives will guide the prioritization and development of your data governance program, ultimately driving revenue, cost savings, and customer satisfaction.

Step 2: Secure executive support and essential stakeholders

Identify key stakeholders and roles for the data governance program and who will need to be involved in its execution. This should include employees, managers, IT staff, data architects, line-of-business owners, and data custodians within and outside your organization.

An executive sponsor is crucial – an individual who understands the significance and objectives of data governance, recognizes the business value that data governance enables, and who supports the investment required to achieve these outcomes.

With key sponsorship in place, assemble the team to understand the compelling narrative, define what needs to be accomplished, how to raise awareness, and how to build the funding model that will be used to support the implementation of the data governance program.

The following is an example of typical stakeholder levels that may participate in a data governance program:

Pyramid chart showing typical Data Governance structure and roles

By effectively engaging key stakeholders, identifying and delivering clear business value, the implementation of a data governance program can become a strategic advantage for your organization.

Step 3: Assess, build & refine your data governance program

With your business objectives understood and your data governance sponsors and stakeholders in place, it’s important to map these objectives against your existing People, Process and Technology capabilities.

Current state analysis for data governance

Data management frameworks such as the EDM Council’s DCAM and CDMC offer a structured way to assess your data maturity against industry benchmarks with a common language and set of data best practices.

Look at how data is currently being governed and managed within your organization. What are the strengths and weaknesses of your current approach? What is needed to deliver key business objectives?

Remember, you don’t have to (nor should you) do everything at once. Identify areas for improvement, in context of business objectives, to prioritize your efforts and focus on the most important areas to deliver results to the business in a meaningful way. An effective and efficient data governance program will support your organization’s growth and competitive advantage.

Step 4: Document your organization’s data policies

Data policies are a set of documented guidelines for how an organization’s data assets are consistently governed, managed, protected and used. They are driven by your organization’s data strategy, align with business objectives and desired outcomes, and may be influenced by internal and external regulatory factors. Data policies may cover topics such as data collection, storage and usage, data quality, and security:

Examples of types of data governance policies

Data policies ensure that your data is being used in a way that supports the overall goals of your organization and complies with relevant laws and regulations. This can lead to improved data quality, better decision making, and increased trust in the organization’s data assets, ultimately leading to a more successful and sustainable organization. 
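Documented policies can also be expressed as machine-checkable rules, so compliance is verified automatically rather than by periodic manual review. A minimal policy-as-code sketch, where the policy name and dataset attributes are hypothetical:

```python
# A documented policy expressed as data: personal data must be encrypted
# at rest and retained no longer than the stated maximum. Illustrative only.
RETENTION_POLICY = {
    "name": "personal-data-handling",
    "applies_to": "personal",
    "max_retention_days": 365,
    "encryption_required": True,
}

def evaluate_policy(dataset: dict, policy: dict = RETENTION_POLICY) -> list:
    """Return a list of policy violations for a dataset's metadata record."""
    violations = []
    if dataset.get("classification") != policy["applies_to"]:
        return violations  # policy does not apply to this dataset
    if dataset.get("retention_days", 0) > policy["max_retention_days"]:
        violations.append("retention period exceeds policy maximum")
    if policy["encryption_required"] and not dataset.get("encrypted", False):
        violations.append("encryption at rest is required")
    return violations
```

Dedicated policy engines exist for this pattern; the point of the sketch is simply that a written policy and its enforcement check can share one definition.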

Step 5: Establish roles and responsibilities

Define clear roles and responsibilities of those involved in data governance, including those responsible for collecting, storing, and using data. This will help ensure that everyone understands their role and can effectively contribute to the data governance effort.

A table showing typical Data Governance roles and responsibilities

The structure of data governance can vary depending on the organization. In a large enterprise, data governance may have a dedicated team overseeing it (as in the table above), while in a small business, data governance may be part of existing roles and responsibilities. A hybrid approach may also be suitable for some organizations. It is crucial to consider company culture and to develop a data governance framework that promotes data-driven practices. The key to success is to start small, learn and adapt, while focusing on delivering and measuring business outcomes.

Having a clear understanding of the roles and responsibilities of data governance participants can ensure that they have the necessary skills and knowledge to perform their duties.
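One lightweight way to keep responsibilities unambiguous is a RACI-style mapping from each governance activity to the roles involved. A sketch, with all roles and activities as illustrative examples:

```python
# RACI-style assignment: each activity names who is Responsible for doing
# the work and who is Accountable for the outcome. Illustrative examples.
RACI = {
    "define data policies": {
        "responsible": "data governance lead",
        "accountable": "chief data officer",
    },
    "maintain business glossary": {
        "responsible": "data steward",
        "accountable": "data owner",
    },
    "approve access requests": {
        "responsible": "data owner",
        "accountable": "data owner",
    },
}

def accountable_for(activity: str) -> str:
    """Look up the single accountable role for a governance activity."""
    return RACI[activity]["accountable"]
```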

Step 6: Develop and refine data processes

Data governance processes ensure effective decision making and enable consistent data management practices by coordinating teams across (and outside of) your organization. Additionally, data governance processes can also ensure compliance with regulatory standards and protect sensitive data.

Data processes provide formal channels for direction, escalation, and resolution. Data governance processes should be lightweight to achieve your business goals without adding unnecessary burden or hindering innovation.

Processes may be automated through tools, workflow, and technology.
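A lightweight process such as an access request with escalation can be modeled as a small state machine before any workflow tool is chosen. A minimal sketch, with hypothetical states and actions; real tooling would add notifications, audit logging and escalation timers:

```python
# Minimal sketch of an approval workflow as a state machine.
# States, actions and transitions are illustrative examples.
TRANSITIONS = {
    "submitted": {"approve": "approved", "reject": "rejected",
                  "escalate": "escalated"},
    "escalated": {"approve": "approved", "reject": "rejected"},
}

def advance(state: str, action: str) -> str:
    """Apply an action to a request and return its new state."""
    allowed = TRANSITIONS.get(state, {})
    if action not in allowed:
        raise ValueError(f"action '{action}' not allowed in state '{state}'")
    return allowed[action]
```

Making the allowed transitions explicit is what provides the “formal channels for direction, escalation, and resolution” without heavyweight process.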

It is important to establish these processes early to prevent issues or confusion that may arise later in the data management implementation.

Step 7 – Implement, evaluate, and adapt your strategy

Once you have defined the components of your data governance program, it’s time to put them into action. This could include implementing new technologies or processes, or making changes to existing ones.

Data governance program - Drive data transformation and fuel a data-driven culture

It is important to remember that data governance programs can only be successful if they demonstrate value to the business, so you need to measure and report on the delivery of the prioritized business outcomes. Regularly monitoring and reviewing your strategy will ensure that it is meeting your goals and business objectives.

Continuously evaluate your goals and objectives and adjust as needed. This will allow your data governance program to evolve and adapt to the changing needs of the organization and the industry. An approach of continuous improvement will enable your data governance program to stay relevant and deliver maximum value to the organization.

Get started on your data governance program

In conclusion, by following an incremental, structured approach and engaging key stakeholders, you can build a data governance program that aligns with the unique needs of your organization and supports the delivery of accelerated business outcomes.

Implementing a data governance program can present unique challenges such as limited resources, resistance to change and a lack of understanding of the value of data governance. These challenges can be overcome by effectively communicating the value and benefits of the program to all stakeholders, providing training and support to those responsible for implementation, and involving key decision-makers in the planning process.

By implementing a data governance program that delivers key business outcomes, you can ensure the success of your program and drive measurable business value from your organization’s data assets while effectively managing your data, improving data quality, and maintaining the integrity of data throughout its lifecycle.

So where are you in your data governance journey? Reach out to IBM Expert Labs – we’d be happy to help.

In case you missed them, check out our earlier blogs, Understanding Data Governance and Unlocking the power of data governance by understanding key challenges.


3 key reasons why your organization needs Responsible AI

Responsibility is a learned behavior. Over time we connect the dots, understanding the need to meet societal expectations, comply with rules and laws, and to respect the rights of others. We see the link between responsibility, accountability and subsequent rewards. When we act responsibly, the rewards are positive; when we don’t, we can face negative consequences including fines, loss of trust or status, and even confinement. Adherence to responsible artificial intelligence (AI) standards follows similar tenets.

Gartner predicts that the market for artificial intelligence (AI) software will reach almost $134.8 billion by 2025. 

Achieving Responsible AI

As building and scaling AI models becomes more business critical for your organization, achieving Responsible AI (RAI) should be considered a highly relevant goal. There is a growing need to proactively drive responsible, fair, and ethical decisions, designed to:

Manage risk and reputation

No organization wants to be in the news for the wrong reasons, and recently there have been a lot of stories in the press regarding issues of unfair, unexplainable, or biased AI. Organizations need to protect individuals’ privacy and build trust. Incorrect or biased use of AI based on faulty datasets or assumptions can result in lawsuits and an erosion of stakeholder, customer, stockholder and employee trust. Ultimately, this can lead to reputational damage, lost sales and decreased revenue.

Adhere to ethical principles

Driving ethical decisions – not favoring one group over another – requires AI systems that achieve fairness. This necessitates detecting bias during data acquisition and while building, training, deploying and monitoring models. Fair decisions require the ability to adjust to changes in behavioral patterns and profiles, which may demand model retraining or rebuilding.

Protect and scale against government regulations

AI regulations are growing and changing at a rapid pace and noncompliance can lead to costly audits, fines and negative press. Global organizations with branches in multiple countries are challenged to meet local and country specific rules and regulations. Organizations in highly regulated markets such as healthcare, government and financial services have additional challenges in meeting industry regulations around data and models.

“The average cost of compliance came in at $5.47 million, while the average cost of non-compliance was $14.82 million. The average cost of non-compliance has risen more than 45% in 10 years. The true cost of non-compliance for organizations due to a single non-compliance event is an average of $4 million in revenue.”  The True Cost of Noncompliance

Responsible AI requires governance

Despite good intentions and evolving technologies, achieving responsible AI can be challenging. AI requires AI governance, not after the fact but baked into the AI strategy of your organization. So what is AI governance? It is the process of defining policies and establishing accountability to guide the creation and deployment of AI systems.

For many of today’s organizations, governing AI requires a lot of manual work that includes the use of multiple tools, applications and platforms. Lack of automation can lead to lengthy model approval, validation and deployment cycles, during which model drift and bias can occur. Manual processes can also lead to “black box” models that lack transparent and explainable analytic results.

Explainable results are crucial when facing questions about the performance of AI algorithms and models. Your company’s management, stakeholders and stockholders expect accountability. Your customers deserve explanations for analytics-based decisions and are holding your organization accountable for them. These decisions may include credit, mortgage and school admission denials, or the details of a healthcare diagnosis or treatment. Documented, explainable model facts are necessary when defending analytic decisions.

An AI Governance solution driving responsible, transparent and explainable AI workflows

The right AI governance solution can help you better direct, manage and monitor your organization’s AI activities. With the right end-to-end automated platform, your organization can strengthen its ability to meet regulatory requirements, protect its reputation and address ethical concerns.

The IBM AI Governance solution automates across the AI lifecycle, from data collection through model building, deployment and monitoring. Model facts are centralized for AI transparency and explainability. This comprehensive solution comes without the excessive costs of switching from your current data science platform. Components of the solution include:

Lifecycle governance

Monitor, catalog and govern AI models from wherever they reside. Automate the capture of model metadata and increase predictive accuracy to identify how AI is used and where models need to be reworked.
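Automated metadata capture can be pictured as recording a timestamped fact sheet entry at each lifecycle stage. A minimal sketch, where the field names are illustrative and not IBM’s actual schema:

```python
from datetime import datetime, timezone

def record_model_fact(facts: list, model_id: str, stage: str, **details) -> dict:
    """Append one timestamped metadata entry for a model lifecycle stage."""
    entry = {
        "model_id": model_id,
        "stage": stage,            # e.g. "trained", "validated", "deployed"
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "details": details,        # metrics, dataset versions, approvers...
    }
    facts.append(entry)
    return entry

def lifecycle_of(facts: list, model_id: str) -> list:
    """Return the ordered list of stages recorded for a model."""
    return [f["stage"] for f in facts if f["model_id"] == model_id]
```

Centralizing entries like these is what makes it possible to answer, per model, “how is this AI used and when was it last reworked?”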

Risk management

Automate model facts and workflows for compliance with business standards. Identify, manage, monitor and report on risk and compliance at scale. Dynamic dashboards provide customizable results for your stakeholders and enhance collaboration across multiple regions and geographies.

Regulatory compliance

Translate external AI regulations into policies for automated enforcement. This results in enhanced adherence to regulations for audit and compliance purposes and provides customized reporting to key stakeholders.


Are you interested in learning more about IBM AI Governance? Read the e-book or visit the website.

Want to see how IBM AI governance can help your organization? Access the free trial or book a meeting.


5 misconceptions about cloud data warehouses

In today’s world, data warehouses are a critical component of any organization’s technology ecosystem. They provide the backbone for a range of use cases such as business intelligence (BI) reporting, dashboarding, and machine learning (ML)-based predictive analytics that enable faster decision-making and insights.

The rise of cloud has allowed data warehouses to provide new capabilities such as cost-effective data storage at petabyte scale, highly scalable compute and storage, pay-as-you-go pricing and fully managed service delivery. Companies are shifting their investments to cloud software and reducing their spend on legacy infrastructure. In 2021, cloud databases accounted for 85%1 of the market growth in databases. These developments have accelerated the adoption of hybrid-cloud data warehousing; industry analysts estimate that almost 50%2 of enterprise data has been moved to the cloud.

What is holding back the other 50% of datasets on-premises? Based on our experience speaking with CTOs and IT leaders in large enterprises, we have identified the most common misconceptions about cloud data warehouses that cause companies to hesitate to move to the cloud.

Misconception 1: Cloud data warehouses are more expensive

When considering moving data warehouses from on-premises to the cloud, companies often get sticker shock at the total cost of ownership. However, a more detailed analysis is needed to make an informed decision. Traditional on-premises warehouses require a significant initial capital investment and ongoing support fees, as well as additional expenses for managing the enterprise infrastructure. In contrast, cloud data warehouses may carry a higher annual subscription fee, but that fee incorporates what would otherwise be upfront investment and ongoing overhead. Cloud warehouses also provide customers with elastic scalability, cheaper storage, savings on maintenance and upgrade costs, and cost transparency, which gives customers greater control over their warehousing costs. Industry analysts estimate that organizations that implement best practices around cloud cost controls and cloud migration see an average savings of 21%3 when using a public cloud, and a 13x5 revenue growth rate for adopters of hybrid cloud through end-to-end reinvention.
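The cost comparison above is straightforward to make concrete over a multi-year horizon. A simplified total-cost-of-ownership sketch, with entirely hypothetical figures:

```python
def on_prem_tco(capex: float, annual_support: float, annual_ops: float,
                years: int) -> float:
    """Upfront hardware/licence investment plus recurring support and
    infrastructure management costs."""
    return capex + years * (annual_support + annual_ops)

def cloud_tco(annual_subscription: float, years: int) -> float:
    """Subscription fee that already incorporates infrastructure,
    maintenance and upgrades."""
    return years * annual_subscription

# Hypothetical figures: $2M upfront, $300k/yr support, $400k/yr operations,
# versus a $900k/yr cloud subscription, compared over five years.
on_prem = on_prem_tco(2_000_000, 300_000, 400_000, years=5)  # 5,500,000
cloud = cloud_tco(900_000, years=5)                          # 4,500,000
```

The sticker-shock effect comes from comparing the subscription line against only the on-premises support line, while omitting the amortized capital and operations costs.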

Misconception 2: Cloud data warehouses do not provide the same level of security and compliance as on-premises warehouses

Companies in highly regulated industries such as finance, insurance, transportation and manufacturing have a complex set of compliance requirements for their data, often leading to an additional layer of complexity when it comes to migrating data to the cloud. In addition, companies have complex data security requirements. However, over the past decade, a vast array of compliance and security standards, such as SOC2, PCI, HIPAA, and GDPR, have been introduced, and met by cloud providers. The rise of sovereign clouds and industry specific clouds are addressing the concerns of governmental and industry specific regulatory requirements. In addition, warehouse providers take on the responsibility of patching and securing the cloud data warehouse, to ensure that business users stay compliant with the regulations as they evolve.

Misconception 3: All data warehouse migrations are the same, irrespective of vendors

While migrating to the cloud, CTOs often feel the need to revamp and “modernize” their entire technology stack – including moving to a new cloud data warehouse vendor. However, a successful migration usually requires multiple rounds of data replication, query optimization, application re-architecture and retraining of DBAs and architects.

To mitigate these complexities, organizations should evaluate whether a hybrid-cloud version of their existing data warehouse vendor can satisfy their use cases, before considering a move to a different platform. This approach has several benefits, such as streamlined migration of data from on-premises to the cloud, reduced query tuning requirements and continuity in SRE tooling, automations, and personnel. It also enables organizations to create a decentralized hybrid-cloud data architecture where workloads can be distributed across on-prem and cloud.

Misconception 4: Migration to cloud data warehouses needs to be 0% or 100%

Companies undergoing cloud migrations often feel pressure to migrate everything to the cloud to justify the investment of the migration. However, different workloads may be better suited for different deployment environments. With a hybrid-cloud approach to data management, companies can choose where to run specific workloads, while maintaining control over costs and workload management. It allows companies to take advantage of the benefits of the cloud, such as scale and elasticity, while also retaining the control and security of sensitive workloads in-house. For example, Marriott International built a decentralized hybrid-cloud data architecture while migrating from their legacy analytics appliances, and saw a nearly 90% increase in performance. This enabled data-driven analytics at scale across the organization4.

Misconception 5: Cloud data warehouses reduce control over your deployment

Some DBAs believe that cloud data warehouses lack the control and flexibility of on-prem data warehouses, making it harder to respond to security threats, performance issues or disasters. In reality, cloud data warehouses have evolved to provide the same control maturity as on-prem warehouses. Cloud warehouses also provide a host of additional capabilities such as failover to different data centers, automated backup and restore, high availability, and advanced security and alerting measures. Organizations looking to increase adoption of ML are turning to cloud data warehouses that support new, open data formats to catalog, ingest, and query unstructured data types. This functionality provides access to data by storing it in an open format, increasing flexibility for data exploration and ML modeling used by data scientists, facilitating governed data use of unstructured data, improving collaboration, and reducing data silos with simplified data lake integration.

Additionally, some DBAs worry that moving to the cloud reduces the need for their expertise and skillset. However, in reality, cloud data warehouses only automate the operational management of data warehousing such as scaling, reliability and backups, freeing DBAs to work on high value tasks such as warehouse design, performance tuning and ecosystem integrations.

By addressing these five misconceptions of cloud data warehouses and understanding the nuances, advantages, trade-offs and total cost ownership of both delivery models, organizations can make more informed decisions about their hybrid-cloud data warehousing strategy and unlock the value of all their data.

Getting started with a cloud data warehouse

At IBM we believe in making analytics secure, collaborative and price-performant across all deployments, whether running in the cloud, hybrid, or on-premises. For those considering a hybrid or cloud-first strategy, our data warehousing SaaS offerings, including IBM Db2 Warehouse and Netezza Performance Server, are available across AWS, Microsoft Azure, and IBM Cloud, and are designed to provide customers with the availability, elastic scaling, governance, and security required for SLA-backed, mission-critical analytics.

When it comes to moving workloads to the cloud, IBM’s Expert Labs migration services ensure 100% workload compatibility between on-premises workloads and SaaS solutions.

No matter where you are in your journey to cloud, our experts are here to help customize the right approach to fit your needs. See how you can get started with your analytics journey to hybrid cloud by contacting an IBM database expert today.



Unlocking the power of data governance by understanding key challenges

In our last blog, we introduced Data Governance: what it is and why it is so important. In this blog, we will explore the challenges that organizations face as they start their governance journey.

Organizations have long struggled with data management and understanding data in a complex and ever-growing data landscape. While operational data runs day-to-day business operations, gaining insights and leveraging data across business processes and workflows presents a well-known set of data governance challenges that technology alone cannot solve.

Every organization deals with the following challenges of data governance, and it is important to address these as part of your strategy:

Multiple data silos with limited collaboration

Data silos make it difficult for organizations to get a complete and accurate picture of their business. Silos exist naturally when data is managed by multiple operational systems. Silos may also represent the realities of a distributed organization. Breaking down these silos to encourage data access, data sharing and collaboration will be an important challenge for organizations in the coming years. The right data architecture to link and gain insight across silos requires the communication and coordination of a strategic data governance program.

Inconsistent or lacking business terminology, master data, hierarchies

Raw data without clear business definitions and rules is ripe for misinterpretation and confusion. Any use of data – such as combining or consolidating datasets from multiple sources – requires a level of understanding of that data beyond the physical formats. Combining or linking data assets across multiple repositories to gain greater analytics and insight requires alignment: consistent master data, reference data, data lineage and hierarchies. Building and maintaining these structures requires the policies and coordination of effective data governance.

A need to ensure data privacy and data security

Data privacy and data security are major challenges when it comes to managing the increasing volume, usage, and complexity of new data. As more and more personal or sensitive data is collected and stored digitally, the risks of data breaches and cyber-attacks increase. To address these challenges and practice responsible data stewardship, organizations must invest in solutions that can protect their data from unauthorized access and breaches.

Ever-changing regulations and compliance requirements

As the regulatory landscape surrounding data governance continues to evolve, organizations need to stay up to date on the latest requirements and mandates and ensure that their enterprise data governance practices remain compliant. They need the ability to:

  • Monitor data issues
  • Ensure data conformity with data quality
  • Establish and manage business rules, data standards and industry regulations
  • Manage risks associated with changing data privacy regulations

Lack of a 360-degree view of organization data

A 360-degree view of data refers to having a comprehensive understanding of all the data within an organization, including its structure, sources, and usage. Think about use cases like Customer 360, Patient 360 or Citizen 360 which provide organizational-specific views. Without these views, organizations will struggle to make data-driven business decisions, as they may not have access to all the information they need to fully understand their business and drive the right outcomes.
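A Customer 360 view of the kind described above boils down to stitching each system's fragment of a customer onto one key. The sketch below is a minimal illustration; the source systems, customer ID and fields are invented for the example:

```python
# Fragments of one customer held in different silos (invented data).
crm = {"C42": {"name": "Ada Lovelace"}}
billing = {"C42": {"plan": "premium", "balance": 12.5}}
support = {"C42": {"open_tickets": 1}}

def customer_360(customer_id, *sources):
    """Merge every source's view of a customer into one record."""
    view = {"customer_id": customer_id}
    for source in sources:
        # Each silo contributes whatever it knows about this customer.
        view.update(source.get(customer_id, {}))
    return view

print(customer_360("C42", crm, billing, support))
# {'customer_id': 'C42', 'name': 'Ada Lovelace', 'plan': 'premium',
#  'balance': 12.5, 'open_tickets': 1}
```

In practice the hard part is the governance work the article describes: agreeing on the shared key and on which system owns which attribute.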

The growing volume and complexity of data

As the amount of data generated by organizations continues to grow, it will become increasingly challenging to manage and govern this data effectively. This may require implementing new technologies and data management processes to help handle the volume and complexity of data. These technologies and processes must be adopted to work within the data governance sphere of influence.

The challenges of remote work

The COVID-19 pandemic led to a significant shift towards remote work, which can present challenges for data governance initiatives. Organizations must find ways to effectively manage data and track compliance across data sources and stakeholders in a remote work environment. With remote work becoming the new normal, organizations need to ensure that their data is being accessed and used appropriately, even when employees are not physically present in the office. This requires a set of data governance best practices – including policies, procedures, and technologies – to control and monitor access to data and systems.


If any or all of these seven challenges feel familiar, and you need support with your data governance strategy, know that you aren’t alone. Our next blog will discuss the building blocks of a data governance strategy and share our point of view on how to establish a data governance framework from the ground up.

In the meantime, learn more about building a data-driven organization with The Data Differentiator guide for data leaders.

The post Unlocking the power of data governance by understanding key challenges appeared first on Journey to AI Blog.

Understanding Data Governance

If you’re in charge of managing data at your organization, you know how important it is to have a system in place for ensuring that your data is accurate, up-to-date, and secure. That’s where data governance comes in.

What exactly is data governance and why is it so important?

Simply put, data governance is the process of establishing policies, procedures, and standards for managing data within an organization. It involves defining roles and responsibilities, setting standards for data quality, and ensuring that data is being used in a way that is consistent with the organization’s goals and values.

But don’t let the dry language fool you – data governance is crucial for the success of any organization. Without it, you might as well be throwing your data to the wolves (or the intern who just started yesterday and has no idea what they’re doing). Poor data governance can lead to all sorts of problems, including:

Inconsistent or conflicting data

Imagine trying to make important business decisions based on data that’s all over the place. Not only is it frustrating, but it can also lead to costly mistakes.

Data security breaches

If your data isn’t properly secured, you’re leaving yourself open to all sorts of nasty surprises. Hackers and cyber-criminals are always looking for ways to get their hands on sensitive data, and without proper data governance, you’re making it way too easy for them.

Loss of credibility

If your data is unreliable or incorrect, it can seriously damage your organization’s reputation. No one is going to trust you if they can’t trust your data.

As you can see, data governance is no joke. But that doesn’t mean it can’t be fun! Okay, maybe “fun” is a stretch, but there are definitely ways to make data governance less of a chore. Here are a few best practices to keep in mind:

Establish clear roles and responsibilities

Make sure everyone knows who is responsible for what. Provide the necessary training and resources to help people do their jobs effectively.

Define policies and procedures

Set clear guidelines for how data is collected, stored, and used within your organization. This will help ensure that everyone is on the same page and that your data is being managed consistently.

Ensure data quality

Regularly check your data for accuracy and completeness. Put processes in place to fix any issues that you find. Remember: garbage in, garbage out.
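A first pass at the accuracy-and-completeness check described above can be a simple rule-driven scan. The rules and sample records below are assumptions made up for this sketch, not a prescribed standard:

```python
# Minimal data-quality rules: required fields plus one validity check.
REQUIRED = ("customer_id", "email")

def quality_issues(record):
    """Return a list of quality problems found in one record."""
    issues = []
    for field in REQUIRED:
        if not record.get(field):
            issues.append(f"missing {field}")
    email = record.get("email", "")
    if email and "@" not in email:
        issues.append("malformed email")
    return issues

records = [
    {"customer_id": 1, "email": "ada@example.com"},
    {"customer_id": 2, "email": "not-an-address"},
    {"customer_id": None, "email": ""},
]
report = {r["customer_id"]: quality_issues(r) for r in records}
print(report)
# {1: [], 2: ['malformed email'], None: ['missing customer_id', 'missing email']}
```

Running a scan like this on a schedule, and feeding the report into a remediation process, is one way to operationalize "garbage in, garbage out."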

Break down data silos

Data silos are the bane of any data governance program. By breaking down these silos and encouraging data sharing and collaboration, you’ll be able to get a more complete picture of what’s going on within your organization.

Of course, implementing a successful data governance program isn’t always easy. You may face challenges like getting buy-in from stakeholders, dealing with resistance to change, and managing data quality. But with the right approach and a little bit of persistence, you can overcome these challenges and create a data governance program that works for you.

So don’t be afraid to roll up your sleeves and get your hands dirty with data governance. Your data – and your organization – will thank you for it.

In future posts, my Data Elite team and I will help guide you in this journey with our point of view and insights on how IBM can help accelerate your organization’s data readiness with our solutions.

If you’d like to explore more about data governance now, we recommend you check out The Data Differentiator.

The post Understanding Data Governance appeared first on Journey to AI Blog.

Make data protection a 2023 competitive differentiator

Data privacy regulations, such as the General Data Protection Regulation (GDPR) in the European Union or the California Consumer Privacy Act (CCPA) in the state of California, are inescapable. By 2024, for instance, 75% of the entire world’s population will have its personal data protected by encryption, multifactor authentication, masking and erasure, as well as data resilience. It’s clear data protection, and its three pillars—data security, data ethics and data privacy—is the baseline expectation for organizations and stakeholders, both now and into the future.

While this trend is about protecting customer data and maintaining trust amid shifting regulations, it can also be a game-changing competitive advantage for organizations. By implementing cohesive data protection initiatives, organizations that secure their users’ data see huge wins in brand image and customer loyalty and stand out in the marketplace.

The key to differentiation comes in getting data protection right, as part of an overall data strategy. Keep reading to learn how investing in the right data protection supports and optimizes your brand image.

Use a data protection strategy to maintain your brand image and customer trust

How a company collects, stores and protects consumer data goes beyond cutting data storage costs—it is a central driving force of its reputation and brand image. As a baseline, consumers expect organizations to adhere to data privacy regulations and compliance requirements; they also expect the data and AI lifecycle to be fair, explainable, robust, equitable and transparent.

Operating with a ‘data protection first’ point of view forces organizations to ask the hard hitting, moral questions that matter to clients and prospects: Is it ethical to collect this person’s data in the first place? As an organization, what are we doing with this information? Have we shared our intentions with respondents from whom we’ve collected this data? How long and where will this data be retained? Are we going to harm anybody by doing what we do with data?

Differentiate your brand image with privacy practices rooted in data ethics

When integrated appropriately, data protection and the surrounding data ethics creates a deep trust with clients and the market overall. Take Apple, for example. They have been exceedingly clear in communicating with consumers what data is collected, why they’re collecting that data, and whether they’re making any revenue from it. They go to great lengths to integrate trust, transparency and risk management into the DNA of the company culture and the customer experience. A lot of organizations aren’t as mature in this area of data ethics.

One of the key ingredients to optimizing your brand image through data protection and trust is active communication, both internally and externally. This requires organizations to rethink the way they do business in the broadest sense. To do this, organizations must lean into data privacy programs that build transparency and risk management into everyday workflows. It goes beyond preventing data breaches or having secure means for data collection and storage. These efforts must be supported by integrating data privacy and data ethics into an organization’s culture and customer experiences.

When cultivating a culture rooted in data ethics, keep these three things in mind:

  • Regulatory compliance is a worthwhile investment, as it mitigates risk and helps generate revenue and growth.
  • The need for compliance is not disruptive; it’s an opportunity to differentiate your brand and earn consumer trust.
  • Laying the foundation for data privacy allows your organization to manage its data ethics better.

Embrace the potential of data protection at the core of your competitive advantage

Ultimately, data protection fosters ongoing trust. It isn’t a one-and-done deal. It’s a continuous, iterative journey that evolves with changing privacy laws and regulations, business needs and customer expectations. Your ongoing efforts to differentiate your organization from the competition should include strategically adopting and integrating data protection as a cultural foundation of how work gets done.

By enabling an ethical, sustainable and adaptive data protection strategy that ensures compliance and security in an ever-evolving data landscape, you are building your organization into a market leader.

To learn more about the three pillars of data protection and how you can incorporate them into your data strategy to get the most out of your data, visit The Data Differentiator, an educational guide for data leaders.

The post Make data protection a 2023 competitive differentiator appeared first on Journey to AI Blog.

Data platform trinity: Competitive or complementary?

Data platform architecture has an interesting history. Toward the turn of the millennium, enterprises began to realize that reporting and business intelligence workloads required a solution separate from their transactional applications. A read-optimized platform that could integrate data from multiple applications emerged: the data warehouse.

Within another decade, the internet and mobile devices started to generate data of unforeseen volume, variety and velocity, which required a different data platform solution. Hence the data lake emerged, handling structured and unstructured data at huge volume.

Yet another decade passed, and it became clear that the data lake and data warehouse were no longer enough to handle the business complexity and new workloads of the enterprise. They are too expensive. The value of data projects is difficult to realize. Data platforms are difficult to change. Time demanded a new solution, again.

Guess what? This time, at least three different data platform solutions are emerging: Data Lakehouse, Data Fabric, and Data Mesh. While this is encouraging, it is also creating confusion in the market. The concepts and values overlap, and at times different interpretations emerge depending on who is asked.

This article endeavors to alleviate that confusion. The concepts will be explained, and then a framework will be introduced that shows how these three concepts may lead to one another or be used together.

Data lakehouse: A mostly new platform

The concept of the lakehouse was made popular by Databricks, which defined it as: “A data lakehouse is a new, open data management architecture that combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses, enabling business intelligence (BI) and machine learning (ML) on all data.”

While traditional data warehouses made use of an Extract-Transform-Load (ETL) process to ingest data, data lakes instead rely on an Extract-Load-Transform (ELT) process. Extracted data from multiple sources is loaded into cheap BLOB storage, then transformed and persisted into a data warehouse, which uses expensive block storage.
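The ELT flow just described can be sketched in a few lines of Python. This is only an illustration of the pattern: a local temp directory stands in for cheap BLOB storage, and the source records and transform are invented for the example:

```python
import json
import tempfile
from pathlib import Path

# Extract: rows pulled from two hypothetical source systems.
orders = [{"id": 1, "amount": "19.99"}, {"id": 2, "amount": "5.00"}]
refunds = [{"order_id": 2, "amount": "5.00"}]

# Load: raw records land in cheap object storage as-is.
# (A temp directory stands in for an S3 bucket here.)
bucket = Path(tempfile.mkdtemp())
(bucket / "orders.json").write_text(json.dumps(orders))
(bucket / "refunds.json").write_text(json.dumps(refunds))

# Transform: only after loading is the raw data shaped for
# analytics -- net revenue per order, as a warehouse-style table.
raw_orders = json.loads((bucket / "orders.json").read_text())
raw_refunds = json.loads((bucket / "refunds.json").read_text())
refunded = {r["order_id"]: float(r["amount"]) for r in raw_refunds}
table = [
    {"id": o["id"], "net": float(o["amount"]) - refunded.get(o["id"], 0.0)}
    for o in raw_orders
]
print(table)  # [{'id': 1, 'net': 19.99}, {'id': 2, 'net': 0.0}]
```

The key contrast with ETL is ordering: the raw files are persisted first and transformed afterwards, which is exactly what creates the synchronization cost the next paragraph describes.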

This storage architecture is inflexible and inefficient. Transformation must be performed continuously to keep the BLOB and data warehouse storage in sync, adding costs. And continuous transformation is still time-consuming. By the time the data is ready for analysis, the insights it can yield will be stale relative to the current state of transactional systems.

Furthermore, data warehouse storage cannot support workloads like Artificial Intelligence (AI) or Machine Learning (ML), which require huge amounts of data for model training. For these workloads, data lake vendors usually recommend extracting data into flat files to be used solely for model training and testing purposes. This adds an additional ETL step, making the data even more stale.

Data lakehouse was created to solve these problems. The data warehouse storage layer is removed from lakehouse architectures. Instead, continuous data transformation is performed within the BLOB storage. Multiple APIs are added so that different types of workloads can use the same storage buckets. This is an architecture that’s well suited for the cloud since AWS S3 or Azure DLS2 can provide the requisite storage.

Data fabric: A mostly new architecture

The data fabric represents a new generation of data platform architecture. It can be defined as: A loosely coupled collection of distributed services, which enables the right data to be made available in the right shape, at the right time and place, from heterogeneous sources of transactional and analytical natures, across any cloud and on-premises platforms, usually via self-service, while meeting non-functional requirements including cost effectiveness, performance, governance, security and compliance.

The purpose of the data fabric is to make data available wherever and whenever it is needed, abstracting away the technological complexities involved in data movement, transformation and integration, so that anyone can use the data. Some key characteristics of data fabric are:

A network of data nodes

A data fabric comprises a network of data nodes (e.g., data platforms and databases), all interacting with one another to provide greater value. The data nodes are spread across the enterprise’s hybrid and multicloud computing ecosystem.

Each node can be different from the others

A data fabric can consist of multiple data warehouses, data lakes, IoT/Edge devices and transactional databases. It can include technologies that range from Oracle, Teradata and Apache Hadoop to Snowflake on Azure, RedShift on AWS or MS SQL in the on-premises data center, to name just a few.

All phases of the data-information lifecycle

The data fabric embraces all phases of the data-information-insight lifecycle. One node of the fabric may provide raw data to another that, in turn, performs analytics. These analytics can be exposed as REST APIs within the fabric, so that they can be consumed by transactional systems of record for decision-making.
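One fabric node exposing an analytic to other nodes over REST can be illustrated with only the Python standard library. The endpoint path and payload below are invented for the sketch; a real fabric would add authentication and governance around this exchange:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# This node's analytic result, exposed to the rest of the fabric
# (the figures are made up for the example).
DAILY_TOTALS = {"2023-01-01": 120, "2023-01-02": 95}

class AnalyticsNode(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/analytics/daily-totals":  # hypothetical endpoint
            body = json.dumps(DAILY_TOTALS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), AnalyticsNode)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Another node (e.g., a transactional system of record) consumes it.
url = f"http://127.0.0.1:{server.server_port}/analytics/daily-totals"
with urllib.request.urlopen(url) as resp:
    totals = json.load(resp)
print(totals)  # {'2023-01-01': 120, '2023-01-02': 95}
server.shutdown()
```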

Analytical and transactional worlds come together

Data fabric is designed to bring together the analytical and transactional worlds. Here, everything is a node, and the nodes interact with one another through a variety of mechanisms. Some of these require data movement, while others enable data access without movement. The underlying idea is that data silos (and differentiation) will eventually disappear in this architecture.

Security and governance are enforced throughout

Security and governance policies are enforced whenever data travels or is accessed throughout the data fabric. Just as Istio applies security governance to containers in Kubernetes, the data fabric will apply policies to data according to similar principles, in real time.

Data discoverability

Data fabric promotes data discoverability. Here, data assets can be published into categories, creating an enterprise-wide data marketplace. This marketplace provides a search mechanism, utilizing metadata and a knowledge graph to enable asset discovery. This enables access to data at all stages of its value lifecycle.
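A marketplace search of the kind described can be approximated with a small metadata index. The asset names, owners and tags below are made up for illustration; a production catalog would back this with a metadata store and knowledge graph:

```python
# A toy enterprise catalog: each published asset carries metadata
# (owner, tags) that powers discovery -- entries are invented.
catalog = [
    {"name": "sales_orders", "owner": "finance", "tags": {"orders", "revenue"}},
    {"name": "web_clicks", "owner": "marketing", "tags": {"web", "behaviour"}},
    {"name": "refunds", "owner": "finance", "tags": {"orders", "returns"}},
]

def search(catalog, tag=None, owner=None):
    """Return asset names matching all supplied metadata filters."""
    hits = []
    for asset in catalog:
        if tag is not None and tag not in asset["tags"]:
            continue
        if owner is not None and asset["owner"] != owner:
            continue
        hits.append(asset["name"])
    return hits

print(search(catalog, tag="orders"))  # ['sales_orders', 'refunds']
print(search(catalog, tag="orders", owner="finance"))
```

The point of the sketch is that discovery works against metadata, not against the data itself, which is why metadata quality matters so much in a fabric.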

The advent of the data fabric opens new opportunities to transform enterprise cultures and operating models. Because data fabrics are distributed but inclusive, their use promotes federated but unified governance. This will make the data more trustworthy and reliable. The marketplace will make it easier for stakeholders across the business to discover and use data to innovate. Diverse teams will find it easier to collaborate, and to manage shared data assets with a sense of common purpose.

Data fabric is an inclusive architecture, in which some new technologies (e.g., data virtualization) play a key role. But it allows existing databases and data platforms to participate in a network, where a data catalogue or data marketplace helps in discovering new assets. Metadata plays a key role here in discovering data assets.

Data mesh: A mostly new culture

Data mesh as a concept was introduced by Thoughtworks, which defined it as: “…An analytical data architecture and operating model where data is treated as a product and owned by teams that most intimately know and consume the data.” The concept stands on four principles: domain ownership, data as a product, self-serve data platforms, and federated computational governance.

Data fabric and data mesh overlap as concepts. For example, both recommend a distributed architecture – unlike centralized platforms such as the data warehouse, data lake and data lakehouse. Both promote the idea of a data product offered through a marketplace.

Differences exist as well. As the definition above makes clear, unlike data fabric, data mesh is about analytical data; it is narrower in focus than data fabric. Secondly, it emphasizes operating model and culture, meaning it goes beyond architecture alone. The nature of a data product can be generic in a data fabric, whereas data mesh clearly prescribes domain-driven ownership of data products.

The relationship between data lakehouse, data fabric and data mesh

Clearly, these three concepts have their own focus and strength. Yet, the overlap is evident.

Lakehouse stands apart from the other two. It is a new technology, like its predecessors. It can be codified. Multiple products exist in the market, including Databricks, Azure Synapse and Amazon Athena.

Data mesh requires a new operating model and cultural change. Often such cultural changes require a shift in the collective mindset of the enterprise. As a result, data mesh can be revolutionary in nature. It can be built from the ground up in a smaller part of the organization before spreading to the rest of it.

Data fabric has no such prerequisites as data mesh; it does not expect such a cultural shift. It can be built up from existing assets in which the enterprise has invested over the years. Its approach is thus evolutionary.

So how can an enterprise embrace all these concepts?

Address old data platforms by adopting a data lakehouse

An enterprise can embrace adoption of a lakehouse as part of its own data platform evolution journey. For example, a bank may retire its decade-old data warehouse and deliver all BI and AI use cases from a single data platform by implementing a lakehouse.

Address data complexity with a data fabric architecture

If the enterprise is complex and has multiple data platforms, if data discovery is a challenge, and if data delivery to different parts of the organization is difficult, data fabric may be a good architecture to adopt. Along with existing data platform nodes, one or more lakehouse nodes may also participate. Even transactional databases may join the fabric network as nodes to offer or consume data assets.

Address business complexity with a data mesh journey

To address business complexity, if the enterprise embarks on a cultural shift toward domain-driven data ownership, promotes self-service in data discovery and delivery, and adopts federated governance, it is on a data mesh journey. If a data fabric architecture is already in place, the enterprise may use it as a key enabler of that journey. For example, the data fabric marketplace may offer domain-centric data products – a key data mesh outcome. The metadata-driven discovery already established through the data fabric can be useful in discovering the new data products coming out of the mesh.

Every enterprise can look at its respective business goals and decide which entry point suits it best. But even though entry points and motivations can differ, an enterprise may readily use all three concepts together in its quest for data-centricity.

The post Data platform trinity: Competitive or complementary? appeared first on Journey to AI Blog.

What is an open data lakehouse and why you should care?

A data lakehouse is an emerging data management architecture that converges data warehouse and data lake capabilities, driven by the need to improve efficiency and obtain critical insights faster. Let’s start with why data lakehouses are becoming increasingly important.

Why is a data lakehouse architecture becoming increasingly important?

To start, many organizations demand a better return on their datasets to improve decision-making. However, most of their raw data remains unused and trapped in data silos, making extracting insights difficult.

With the prohibitive costs of high-performance analytics solutions, such as cloud data warehouses, and the performance challenges associated with legacy data lakes, neither data warehouses nor data lakes satisfy the need for analytical flexibility, cost-to-performance and manageability.

The rise of cloud object storage has driven down the cost of storage, and new technologies have evolved to access and query the data stored there more efficiently. A data lakehouse platform takes advantage of this low-cost storage and leverages the latest query engines to provide warehouse-like performance.

These new technologies and approaches, along with the desire to reduce data duplication and complex ETL pipelines, have resulted in a new architectural data platform approach known as the data lakehouse – offering the flexibility of a data lake with the performance and structure of a data warehouse.

Essential components of a data lakehouse architecture and what makes an open data lakehouse

The core of a data lakehouse architecture includes the storage, a metadata service and the query engine, typically alongside a data governance component made up of a policy engine and a data dictionary.

A data lakehouse strives to provide customers with flexibility and options. An open data lakehouse takes this even further by leveraging open-source technologies such as Presto, enabling open governance and giving data scientists and data teams the option to leverage lakehouse components already in place while extending or adopting new components based on business intelligence needs.

Storage: This is the layer that physically stores the data. The most common data lake/lakehouse storage types are AWS S3-compatible object storage or HDFS. In this layer, data is stored as files and can be stored in open data file formats such as Parquet, Avro and more. Furthermore, metadata defining the table format may also be stored with the file. Open data formats are file specifications and protocols made available to the open-source community so that anyone can ingest and enhance them, leading to widespread adoption and large communities.

Technical metadata storage/service: This component is required to understand what data is available in the storage layer. The query engine needs the metadata for the unstructured data and tables to understand where the data is located, what it looks like, and how to read it. The de facto open metadata storage solution is the Hive Metastore.

In an open data lakehouse an open data governance approach is also supported. Organizations can bring their existing or preferred governance solution, preventing vendor data and metadata lock-in and eliminating or minimizing additional migration efforts.

SQL Query Engine: This component is at the heart of the open data lakehouse. It executes queries against the data and is often referred to as the “compute” component. There are many open-source query engines for lakehouse in the market, such as Presto or Spark. In a lakehouse architecture, the query engine is fully modular and ephemeral, meaning the engine can be dynamically scaled to meet big data workload demands and concurrency. SQL query engines can attach to any number of catalogs and storage.
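The idea that one engine can attach to any number of catalogs and storage locations can be demonstrated with SQLite as a stand-in "compute" component. This is only an analogy for the lakehouse pattern, not Presto or Spark themselves, and the table names and rows are invented:

```python
import sqlite3
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
sales_path = workdir / "sales.db"
crm_path = workdir / "crm.db"

# Two independent "catalogs", each with its own storage location.
sales = sqlite3.connect(sales_path)
sales.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
sales.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 19.99), (2, 5.0)])
sales.commit()
sales.close()

crm = sqlite3.connect(crm_path)
crm.execute("CREATE TABLE customers (order_id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Bo")])
crm.commit()
crm.close()

# An ephemeral engine spins up, attaches both catalogs, and joins
# across them -- then can be thrown away, leaving the data in place.
engine = sqlite3.connect(":memory:")
engine.execute(f"ATTACH DATABASE '{sales_path}' AS sales")
engine.execute(f"ATTACH DATABASE '{crm_path}' AS crm")
rows = engine.execute(
    "SELECT c.name, o.amount FROM sales.orders o "
    "JOIN crm.customers c ON c.order_id = o.id ORDER BY o.id"
).fetchall()
print(rows)  # [('Ada', 19.99), ('Bo', 5.0)]
```

The separation matters: storage persists while the engine is disposable, which is what makes elastic scaling of "compute" possible in a lakehouse.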

Beyond the basics with data lakehouse governance

Aside from the core lakehouse components, an organization will also want an enterprise data governance solution to support data quality and security. At a basic level, a data catalog and policy engine are used to define rules with business semantics, and a plugin enables the engine to enforce governance policies during query execution.

Data catalogs: These enable organizations to store business metadata, such as business terminologies and tags, to enable search and data protection. A data catalog is essential to help users find the correct data for the job and semantic information for policies and rules.

Policy engine: This component enables users to define data protection policies and enables the engine to enforce those policies. This component is critical for an organization to achieve scalability in its governance framework. A policy engine is often deployed with the technical metadata service and the data catalog, so new/proprietary solutions often merge these components into a single service.
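The interplay between a policy engine and a query engine can be sketched as a filter applied at query time. The roles, masking rules and records below are invented for illustration, not taken from any particular product:

```python
# Invented governance policies: column-level masking rules keyed
# by the querying user's role.
POLICIES = {
    "analyst": {"mask_columns": {"ssn", "email"}},
    "steward": {"mask_columns": set()},
}

ROWS = [
    {"name": "Ada", "email": "ada@example.com", "ssn": "111-22-3333"},
    {"name": "Bo", "email": "bo@example.com", "ssn": "444-55-6666"},
]

def run_query(rows, role):
    """Return query results with masking enforced during execution."""
    masked = POLICIES[role]["mask_columns"]
    return [
        {col: ("***" if col in masked else val) for col, val in row.items()}
        for row in rows
    ]

print(run_query(ROWS, "analyst")[0])
# {'name': 'Ada', 'email': '***', 'ssn': '***'}
```

Because the policy is applied inside the query path rather than in each application, the rule is enforced consistently no matter which tool issues the query, which is the scalability point the paragraph makes.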

A word on managed services

Perhaps it occurred to you that other data lake implementations already offer some of these same features. Unfortunately, for many organizations, maintaining these deployments can be complex. Studies have shown that the biggest hurdle to data lake adoption is the lack of IT skills required to manage them, which is why a managed SaaS service is key to more modern, open data lakehouse implementations.

Data lakehouse architecture is getting attention, and organizations will want to optimize the components most critical to their business. An open lakehouse architecture brings the flexibility, modularity and cost-effective extensibility that your modern data science and data analytics use cases demand and simplifies taking advantage of future enhancements.


If you found this blog interesting and would like to discuss it further, please contact me, Kevin Shen.

Stay tuned for more blogs and updates on data lakehouse from IBM’s perspective!

The post What is an open data lakehouse and why you should care? appeared first on Journey to AI Blog.

It’s 2023… are you still planning and reporting from spreadsheets?

I have worked with IBM Planning Analytics with Watson, the platform formerly known as TM1, for over 20 years. I have never been more excited to share in our customers’ enthusiasm for this solution and how it is revolutionising many manual, disconnected, and convoluted processes to support an organisation’s planning and decision-making activities.

Over the last few months, we have been collecting stories from our customers and projects delivered in 2022 and hearing the feedback on how IBM Planning Analytics has helped support organisations across not only the office of finance – but all departments in their organisation.

The need for a better planning and management system

More and more we are seeing demand for solutions that bring together a truly holistic view of both operational and financial business planning data and models across the entire width and breadth of the enterprise. Having such a platform that also allows planning team members to leverage predictive forecasting and integrate decision management and optimisation models puts organisations at a significant advantage over those that continue to rely on manual worksheet or spreadsheet-based planning processes.

A better planning solution in action

One such example comes from a recent project with Oceania Dairy. Oceania Dairy operates a substantial plant in the small South Island town of Glenavy, New Zealand, on the banks of the Waitaki River. The plant can convert 65,000 litres of milk per hour into 10 tons of powder, or 47,000 tons of powder per year, from standard whole milk powders through to specialty powders including infant formula. The site runs infant formula blending and canning lines and UHT production lines, and produces Anhydrous Milk Fat. In total, the site handles more than 250 million litres of milk per year and generates export revenue of close to NZD$500 million.

Oceania Supply Chain Manager Leo Zhang shares in our recently published case study: “Connectivity has two perspectives: people to people, facilitating information flows between our 400 employees on site. Prior to CorPlan’s work to implement the IBM Planning Analytics product, there was low information efficiency, with people working on legacy systems or individual spreadsheets. The second perspective is integration. While data is supposed to be logically connected, decision makers were collating Excel sheets, resulting in poor decision efficiency.”

“CorPlan,” adds Zhang, “has fulfilled this aspect by delivering a common platform which creates a single version of the truth, and a central system where data updates uniformly.” In terms of collaboration, he says teams managing physical stock flows throughout the supply chain are being connected from the start to the finish of product delivery. “It’s hard for people to work towards a common goal in the absence of a bigger picture. Collaboration brings that bigger picture to every individual while CorPlan provides that single, common version of the truth,” Zhang comments.

The merits of a holistic planning platform

While selecting a platform to address a single piece of the planning puzzle – such as merchandise planning, S&OP (sales and operations planning), workforce planning or even FP&A (financial planning and analysis) – may be an organisation’s desired strategy, selecting a platform that can grow to support all planning elements across the organisation has significant merits. Customers such as Oceania Dairy are realising true ROI by having:

  • All an organisation’s stakeholders operating from a single set of agreed planning assumptions, permissions, variables, and results
  • A platform that supports the ability to run any number of live forecast models to support the data analysis and what-if scenarios that are needed to support stakeholder decision-making
  • An integrated consolidation of the various data sources capturing the actual transactional data sets, such as ERP, Payroll, CRM, Data Marts/ Warehouses, external data stores and more
  • Enterprise-level security
  • Delivery in the cloud, as a service

My team and I get a big kick out of delivering that first real-time demonstration to a soon-to-be customer, showing them what the IBM Planning Analytics platform can do. It is not just the extensive features and workflow functionality that generate excitement. It is that moment of sudden, striking realisation – an epiphany – that this product is going to revolutionise the painful, challenging, and time-consuming process of pulling together a plan.

What is even better is then delivering a project successfully, on time and within budget, that delivers the desired results and exceeds expectations.

If you want to experience what “good planning” looks like, feel free to reach out. The CorPlan team would love to help you start your performance management journey. We can help with product trials, proof of concept or simply supply more information about the solution to support your internal business case. It’s time to address the challenges that a disconnected, manual planning and reporting process brings.


To read the full Oceania Dairy case study, please visit the CorPlan website here.

If you’re interested to see how IBM Planning Analytics works, try it for free by accessing the trial here.

The post It’s 2023… are you still planning and reporting from spreadsheets? appeared first on Journey to AI Blog.