Monthly Archives: June 2021

insideBIGDATA Guide to Big Data for Finance

This insideBIGDATA technology guide, sponsored by Dell Technologies, gives enterprise thought leaders direction on using big data technologies to support analytics capabilities that work more independently and effectively across a few distinct areas of today’s financial services institutions (FSI).

Tableau Extends Augmented Analytics, Bringing the Power of AI to Everyone

Tableau, a leading analytics platform (NYSE: CRM), is bringing data analytics and AI together in a suite of new and expanded augmented analytics features. Tableau’s latest release will empower more people with the right technology to make smarter and faster decisions regardless of their role and skill level.

How Accelerated Computing Is Transforming Data Science

In this contributed article, Scott McClellan, head of Data Science at NVIDIA, discusses how big companies and startups increasingly use software to speed decision-making in creating new products and services. Businesses can use AI to learn from massive amounts of data captured from a broad range of sensors and sources, but none of this knowledge can be gained without processing these volumes of data.

How Can AI Enhance Human-to-Human Communication and Improve Customer Service?

In this special guest feature, Krishna Raj Raja, founder and CEO of SupportLogic, discusses how AI can help service agents detect customer signals – picking up on how a customer is feeling and also understanding context, then dynamically flagging or escalating a case before an annoyance turns into a major problem.

Real-time data brings real-time business value

Exploitation of data is critical to business success, and quicker data processing improves an organization’s ability to react to business events in real time. As a result, organizations are bringing together new types of data from a variety of internal and external sources for real-time or near-real-time analytics. This can involve building data lakes and information hubs, often on public clouds, fed by real-time streaming technologies, to process and gain value from this variety of data. All these trends drive a growing need for capabilities that can effectively feed data into information hubs, data lakes and data warehouses and then quickly process large data sets. These capabilities enable quick responses to changing business events, better engagement with clients, and more.

“Modern data management infrastructure needs to be dynamic — to evolve architectural patterns over time, enable new connections and support diverse use cases.”  ~ Gartner1

As organizations struggled to manage the ingestion of rapidly changing structured operational data, a pattern emerged in which organizations leverage data initially delivered to Kafka-based information hubs.

Kafka was conceived as a distributed streaming platform. It provides a very low latency pipeline that enables real-time event processing, movement of data between systems and applications, and real-time transformation of data. However, Kafka is more than just a pipeline; it can also store data. Kafka-based information hubs go well beyond feeding a data lake; they also deliver continuously changing data for downstream data integration with everything from the cloud to AI environments and more.

To help organizations deliver transactional data from the OLTP databases that power their mission-critical business applications into Kafka-based information hubs, IBM® Data Replication provides a Kafka target engine that applies data with very high throughput into Kafka. The Kafka target engine is fully integrated with all of the IBM data replication low-impact log-based captures from a wide variety of sources, including Db2® z/OS®; Db2 for iSeries; Db2 for UNIX, Linux® and Windows; Oracle; Microsoft SQL Server; PostgreSQL; MySQL; Sybase; Informix®; and even IBM Virtual Storage Access Method (VSAM) and Information Management System (IMS).

In the event that the requirement does not involve delivery to Kafka, the IBM data replication portfolio also provides a comprehensive solution for delivery of data to other targets such as databases, Hadoop, files, and message queues.

There is often little room for latency when delivering the data that will optimize decision making or provide better services to your customers. Hence, you need the right data replication capability that can incrementally replicate changes captured from database logs in near-real time. In turn, this capability can facilitate streaming analytics, feeding a data lake, and more, using the data landed by IBM replication into Kafka.
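The idea of incrementally applying captured changes can be illustrated with a minimal sketch. It assumes a hypothetical `ChangeEvent` record with `op`, `key` and `row` fields and an in-memory dictionary as the target; it does not represent the message format or API of IBM Data Replication or Kafka.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change captured from a database log (hypothetical format)."""
    op: str                               # "insert", "update", or "delete"
    key: str                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new column values; None for deletes


def apply_changes(target: Dict[str, Dict[str, Any]],
                  events: Iterable[ChangeEvent]) -> Dict[str, Dict[str, Any]]:
    """Apply change events in log order so the target converges on the source."""
    for event in events:
        if event.op == "insert":
            target[event.key] = dict(event.row or {})
        elif event.op == "update":
            # Merge only the changed columns into the existing row.
            target.setdefault(event.key, {}).update(event.row or {})
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return target


# Illustrative events, as they might be read in order from a topic.
events = [
    ChangeEvent("insert", "acct-1", {"balance": 100}),
    ChangeEvent("insert", "acct-2", {"balance": 50}),
    ChangeEvent("update", "acct-1", {"balance": 75}),
    ChangeEvent("delete", "acct-2"),
]
replica = apply_changes({}, events)
```

Because only the changes travel, the target stays current without repeated full copies of the source tables. The sketch relies on events being applied in the order they were captured.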

Learn more

To see how you can use IBM Data Replication for optimized incremental delivery of transactional data to feed your Hadoop-based data lakes or Kafka-based data hubs, read the IBM Data Replication for Big Data solution brief. And read this blog to learn more and register for a planned, fully managed replication service on IBM Cloud® infrastructure that will address real-time replication for cloud-to-cloud and on-premises-to-cloud use cases.

1Smarter with Gartner

The post Real-time data brings real-time business value appeared first on Journey to AI Blog.

Accelerate the modernization of your information architecture with expert tools and advice

Over the last decade, IT architectures have become increasingly dispersed and segmented as new data and new technology have made their impact, and 62% of enterprises now plan to modernize their existing on-premises data platforms[1]. These changes are needed to address the lack of cohesion among widespread data repositories and to meet business needs around data and analytics. Of course, modernizing can be difficult without the right planning and experience, which leads some organizations to abandon modernization efforts, or never start them, rather than develop high-performance ecosystems with automation to drive better AI models and customer experiences. Fortunately, modernizing doesn’t need to be difficult. IBM has developed not only IBM Cloud Pak® for Data, a data and AI platform, but also a Modernization Factory experience to deliver the planning and expertise crucial to modernization success.

Digital Transformation with Modernization Factory

IBM Cloud & Cognitive Expert Labs has designed a Modernization Factory powered by expertise, proven practices and accelerators to support your journey. Our prescriptive approach is built on the foundational principle of pairing the right expertise with the right method to drive success. Our playbook, designed to move customers from existing legacy on-premises software landscapes to cartridges on Cloud Pak for Data, includes the following:

  • An understanding of your landscape
  • Identification of business opportunities that can be leveraged with the new architecture
  • A technical discovery that involves an inventory, assessment and roadmap
  • A mobilization & planning exercise
  • Installation & provisioning of your Cloud Pak for Data instance
  • Workload Test: Minimum Viable Product (MVP)
  • Adoption and implementation: Scale with continuous workload modernization

Through the Modernization Factory engagement with IBM, you will have the opportunity to reduce costs and gain a trusted roadmap for moving forward, all while partnering with IBM to leverage our expertise and experience. We believe that you need to start strong to finish strong, and the best way to start strong is to leverage our deep skills, unmatched expertise and unique methods to help you succeed in your modernization journey.

The advice and tools you need to make modernization and optimization easy

As part of Modernization Factory, our experts from IBM Cloud and Cognitive Expert Labs will work with you to better understand your current landscape, use case, functionality and the business opportunity you want to realize. As a first step we will engage with you in a technical discovery which includes building an inventory, an architecture assessment and a roadmap. From there, the solution is installed and provisioned, a workload test is run and any remaining adoption and implementation activities are completed.

Our flexible model allows us to accelerate your journey whether you are moving from an on-premises bare metal implementation to containerized cartridges on Cloud Pak for Data or Cloud Pak for Data as a service. The Modernization Factory also brings immense benefit to customers who are using Db2, DataStage or IBM Cognos. These benefits include:


  • The ability to containerize databases in minutes
  • No exposure of raw data
  • Maintaining full database integrity


  • Automated assessment / unit-testing
  • Automated modernization of workload
  • Automated conversion of enterprise stages to CPD connectors


  • Automated conversion of security setting to CPD native security
  • Automated modernization of workload

How to start modernizing

There has never been a better time to modernize your architecture with a data and AI platform. The era of AI demands a flexible interconnected data architecture and the experts at IBM are ready to help you accelerate success.

To learn more about the importance of upgrading, read our white paper Upgrade to agility: The value of modernizing data and AI services to IBM Cloud Pak for Data, learn more about your modernization options on our website, or reach out directly to IBM Expert Labs to kick start your acceleration journey beginning with a modernization workshop.

You can also read about two clients who have already experienced success with the Cloud Pak for Data platform:

Danske Bank

IBM’s global CDO office


The post Accelerate the modernization of your information architecture with expert tools and advice appeared first on Journey to AI Blog.

Data utility can be preserved while enhancing data privacy

Organizations need to strike a difficult balance when launching strategic AI initiatives: data needs to be accessible without compromising privacy regulation compliance or the speed of business innovation. Customer trust and brand reputation are key competitive advantages, so accelerated digital transformation and growth rely on businesses being smart about protecting sensitive customer data while still preserving data utility for AI and analytics teams.

Three questions organizations need to confront when it comes to leveraging customer data are:

  • How can initiatives inside and outside my organization work securely with personal information (PI) and sensitive data?
  • How can I remove PI from datasets without affecting the integrity of the data or accuracy of my projects’ results?
  • How can I actively protect PI and sensitive data whenever they are accessed, wherever they reside?

When organizations do not have ready answers to the questions above, AI projects often stall and collaboration using meaningful data is limited. Gartner predicts that by 2024, the use of data protection techniques will increase industry collaborations on AI projects by 70%.

In my blog, I discussed the new IBM® AutoPrivacy framework and the key use cases delivered via IBM Cloud Pak® for Data. Today I will expand on the advanced data protection use case, which is one of the key capabilities in the AutoPrivacy framework.

Data protection and de-identification of sensitive data are not new concepts. Although these concepts have been well known for many years, most enterprises did not employ these practices consistently. The enforcement of GDPR has drastically changed that, and in the post-GDPR era, enterprises are hyperaware of the data protection regulations that they must adhere to. With the enforcement of GDPR (Europe), CCPA (California), LGPD (Brazil) and many other data protection laws in recent months, consumers are now well aware of their privacy rights and are demanding that enterprises provide transparent privacy protection approaches.

Historically, enterprises have used many methods of sensitive data protection, including redaction and various forms of masking such as substitution, shuffling or randomization. However, with the use of deep learning neural networks in AI, data science and analytical modeling, the risk of re-identification has been increasing. Hence, there is a need for newer data protection techniques and robust encryption algorithms that can enhance privacy while preserving the utility of the data.

By far, the most important requirement from IBM customers has been the consistent enforcement of data protection policies, regardless of where the data resides.

Data cannot simply be de-identified randomly; important relationships must be maintained. Format preservation is a fundamental requirement. Values must be de-identified consistently across the enterprise, respecting relationships across multiple data assets. For example, de-identification of a credit card number, a person’s first and last names, or any other entity identifier must be consistently repeatable across data sources in on-premises and hybrid cloud environments.
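A minimal sketch of what consistent, format-preserving de-identification means in practice, assuming a secret key shared across systems. The function name `pseudonymize` and the keyed-hash approach are illustrative assumptions, not the method used by any IBM product. This is also not a standardized format-preserving encryption scheme: it is one-way, and distinct inputs could in principle map to the same output.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace each ASCII digit or letter with a keyed pseudorandom one.

    Separators, length, letter case and character classes are preserved, and
    the same (value, key) pair always produces the same output.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]  # reuse digest bytes for long inputs
        if ch.isascii() and ch.isdigit():
            out.append(str(b % 10))
        elif ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr(base + b % 26))
        else:
            out.append(ch)  # keep separators such as "-" or " " untouched
    return "".join(out)


key = b"demo-key-not-for-production"  # hypothetical key for illustration
card = "4111-1111-1111-1111"
masked_in_warehouse = pseudonymize(card, key)
masked_in_data_lake = pseudonymize(card, key)
```

Because the mapping depends only on the value and the key, the same card number masked in two different systems yields the same token, so joins across data assets still line up.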

In addition, I have often encountered unique industry use cases that call for special treatment of certain data elements. For example, in financial services and healthcare, the time intervals between certain dates should be the same whether unmasked or masked. The accuracy of disease treatment dates in healthcare is critical for biomedical research, so when shifting dates, it’s very important to maintain the right intervals. Similarly, the interval between a date of birth and the date of an auto policy agreement (in other words, the customer’s age) may make a very big difference in the cost and available features of auto insurance.
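Interval-preserving date shifting can be sketched as follows, assuming one secret key and a per-person identifier. The helper names `shift_for` and `shift_dates` and the one-year window are hypothetical choices for illustration. Every date belonging to the same person moves by the same offset, so the gaps between that person's dates are unchanged.

```python
import hashlib
import hmac
from datetime import date, timedelta
from typing import List


def shift_for(person_id: str, key: bytes, max_days: int = 365) -> timedelta:
    """Derive a stable per-person offset in the range [-max_days, +max_days]."""
    digest = hmac.new(key, person_id.encode("utf-8"), hashlib.sha256).digest()
    days = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    return timedelta(days=days)


def shift_dates(person_id: str, dates: List[date], key: bytes) -> List[date]:
    """Shift all of one person's dates by the same offset."""
    offset = shift_for(person_id, key)
    return [d + offset for d in dates]


key = b"demo-key-not-for-production"  # hypothetical key for illustration
diagnosis, treatment = date(2020, 3, 1), date(2020, 3, 15)
masked_diagnosis, masked_treatment = shift_dates(
    "patient-42", [diagnosis, treatment], key
)
```

The masked dates no longer reveal when the events happened, but `masked_treatment - masked_diagnosis` is still 14 days, the same as in the original data.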

Most customers require support for custom de-identification involving complex, multi-field computation, delivered through a low-code or no-code approach. There are also several use cases that require the addition of statistical noise to hide individual data and only surface group-level information for analytics.
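One common way to add statistical noise to group-level results is the Laplace mechanism from differential privacy. The sketch below swaps in that technique as an illustration; the post does not say which noise method is used, and the function names and the `epsilon` value are assumptions.

```python
import random
from typing import Callable, Iterable


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(values: Iterable[int],
                predicate: Callable[[int], bool],
                epsilon: float,
                rng: random.Random) -> float:
    """Return a group count with noise added so no single record stands out."""
    true_count = sum(1 for v in values if predicate(v))
    # Adding or removing one person changes a count by at most 1, so the
    # sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # seeded only to make the illustration repeatable
ages = [23, 35, 41, 52, 67, 29, 38]
released = noisy_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
```

A smaller `epsilon` adds more noise and therefore more privacy at the cost of accuracy. This is the privacy-utility trade-off in its simplest form.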

These rich data protection and consistent policy enforcement capabilities are available via IBM Watson® Knowledge Catalog Enterprise Edition to address a wide range of use cases.

The future is bright as the latest privacy enhancing technologies such as differential privacy, synthetic data fabrication and more are brought into the solution. These technologies, paired with the power of IBM Cloud Pak for Data, will allow data science teams to make choices along the privacy-utility spectrum and continue to push the boundaries of AI initiatives.

Learn more

Read more about the IBM unified data privacy framework that can help you understand how sensitive data is used, stored, and accessed throughout your organization.

Try IBM Watson Knowledge Catalog for free.

The post Data utility can be preserved while enhancing data privacy appeared first on Journey to AI Blog.

Video Highlights: Emil Eifrem on the Origins of Neo4j and the Ubiquity of Graphs

The video below is from a webinar for Neo4j’s APAC Quarterly Customer Update. It includes a fascinating conversation between Emil Eifrem, Co-Founder and CEO, and Nik Vora, the Vice President of Neo4j APAC.

The Amazing Applications of Graph Neural Networks

In this contributed article, editorial consultant Jelani Harper points out that a generous portion of enterprise data is Euclidean and readily vectorized. However, there’s a wealth of non-Euclidean, multidimensional data serving as the catalyst for astounding machine learning use cases.

DALL-E – A Human-like Intelligence through Multimodality

In this special guest feature, Sahar Mor, founder of AirPaper, discusses DALL-E – a new powerful API from OpenAI that creates images from text captions. With this, Sahar is planning to build a few products such as a chart generator based on text and a text-based tool to generate illustrations for landing pages.