Monthly Archives: March 2022

Building an E-commerce Scraper

In this sponsored post, we sit down with Aleksandras Šulženko, Product Owner at, to discuss the past, present, and future of web scraping. He has been involved with both the technical and business side of automated collection for several years and has helped create some of the leading web scraping solutions on the market such as the E-commerce Scraper API, dedicated to collecting external publicly available data from e-commerce websites.

The Power of Controlling Data and Not Having Data Control You

In this special guest feature, Ram Venkatesh, CTO at Cloudera, discusses how leveraging data to delight customers, improve decision making, and increase operational efficiency is now possible for companies that commit to becoming data-driven. As with Deutsche Telekom, the key is to use a hybrid data platform to better control and leverage all data and ensure data quality, so analytics teams can deliver truly meaningful business value.

State of Data Report Emphasizes Emerging Shift to a Decentralized Model

New market research commissioned by Starburst and Red Hat uncovers that 55% of organizations claim the pandemic has made data access more critical, a slight increase from the 2021 study. As a result, enterprises plan to prioritize multi-cloud flexibility and ease of use when it comes to selecting data infrastructure solutions. The second annual report, “The State of Data and What’s Next,” conducted by independent research firm Enterprise Management Associates (EMA), found that the shift to quick and flexible deployments is imperative for driving the business functions and insights required to deliver valuable customer experiences in today’s fast-paced, distributed environment.

insideBIGDATA Guide to Government

This white paper from Dell Technologies and AMD examines big data analytics projects in government and recommends 15 lessons government agencies can learn. Big data is big business, particularly in the government sector. According to market researchers at IDC, worldwide spending on big data and business analytics solutions grew 10.1% in 2021 to total an estimated $215.7 billion. And a lot of that spending came from the public sector as the government was among the top six industries for overall expenditures related to big data analytics.

insideBIGDATA Latest News – 3/28/2022

In this regular column, we’ll bring you all the latest industry news centered around our main topics of focus: big data, data science, machine learning, AI, and deep learning. Our industry is constantly accelerating, with new products and services being announced every day. Fortunately, we’re in close touch with vendors from this vast ecosystem, so we’re in a unique position to inform you about all that’s new and exciting. Our massive industry database is growing all the time, so stay tuned for the latest news items describing technology that may make you and your organization more competitive.

Businesses in Cornwall using AI to Commercialize Space Data 

As the space and satellite industry in the county of Cornwall, UK, scales at pace, so too do the region’s AI capabilities. From edge AI for manufacturing to AI algorithms being developed to remove cloud cover and unlock satellite data for business transformation, Cornwall is home to a unique mix of companies tapping into the mutually beneficial relationship between the two technologies, in turn becoming a hub for innovation in AI applications.

The Framework that will Connect Tomorrow’s IT is Edge Data Fabric

In this special guest feature, Lewis Carr, Senior Director of Marketing at Actian, expands on his definition of data fabric and why this technology is quickly becoming a necessity for modern enterprises. Data fabric enables data that is distributed across different areas to be accessed in real time in a unifying data layer, under the same management. A data fabric is agnostic to deployment platforms, data process, data use, geographical locations and architectural approach. It facilitates the use of data as an enterprise asset and ensures various kinds of data can be successfully combined, accessed and governed efficiently and effectively, making it an innovation in embedded databases and a transformative solution at the edge.

New Quantum Computing Research Shows Promising Path to Commercialization

Agnostiq, Inc., the quantum computing SaaS startup, announced its latest benchmark research which analyzed the state of quantum computing hardware to determine its current and future practicality as a mainstream solution. The findings show that quantum computing hardware has improved over time and that application-specific benchmarks can serve as a more practical yardstick for comparing the capabilities of alternative types of quantum hardware.

Accenture Technology Vision 2022: “Metaverse Continuum” Redefining How the World Works, Operates and Interacts

A new report from Accenture (NYSE: ACN) reveals that the “Metaverse Continuum,” a spectrum of digitally enhanced worlds, realities and business models, is redefining how the world works, operates and interacts. According to the Accenture Technology Vision 2022, “Meet Me in the Metaverse: The Continuum of Technology and Experience Reshaping Business,” businesses are racing toward a future that is very different from the one they were designed to operate in — as technologies, such as extended reality, blockchain, digital twins and edge computing, are converging to reshape human experiences. 

What is a data fabric architecture?

To simplify data access and empower users to leverage trusted information, organizations need a better approach that provides better insights and business outcomes faster, without sacrificing data access controls. There are many different approaches, but you’ll want an architecture that can be used regardless of your data estate. A data fabric is an architectural approach that enables organizations to simplify data access and data governance across a hybrid multicloud landscape for better 360-degree views of the customer and enhanced MLOps and trustworthy AI. In other words, the obstacles of data access, data integration and data protection are minimized, rendering maximum flexibility to the end users.

With this approach, organizations don’t have to move all their data to a single location or data store, nor do they have to take a completely decentralized approach. Instead, a data fabric architecture implies a balance between what needs to be logically or physically decentralized and what needs to be centralized.

Thanks to that balance, there is no limitation to the number of purpose-fit data stores that can participate in the data fabric ecosystem. This means you get a global data catalog that serves as an abstraction layer, single source of truth and single point of data access with infused governance.
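
The idea of a global catalog acting as a single point of access can be illustrated with a minimal sketch. Everything named here (`DataFabricCatalog`, `register`, `read`, the two sample stores) is hypothetical and invented for illustration; it is not the API of any data fabric product. The point is only that data stays in its purpose-fit store while consumers go through one catalog.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class CatalogEntry:
    """One data asset known to the catalog (hypothetical structure)."""
    name: str
    store: str                      # which purpose-fit store holds the data
    reader: Callable[[], List[Row]]  # how to fetch rows from that store


@dataclass
class DataFabricCatalog:
    """Toy abstraction layer: data stays in place, access goes through here."""
    entries: Dict[str, CatalogEntry] = field(default_factory=dict)

    def register(self, name: str, store: str, reader: Callable[[], List[Row]]) -> None:
        self.entries[name] = CatalogEntry(name, store, reader)

    def read(self, name: str) -> List[Row]:
        # The caller does not need to know which store backs the asset.
        return self.entries[name].reader()


# Two stand-in "stores"; in practice these would be separate systems.
warehouse_orders = [{"order_id": 1, "total": 40.0}]
object_store_clicks = [{"session": "a1", "page": "/home"}]

catalog = DataFabricCatalog()
catalog.register("orders", "warehouse", lambda: list(warehouse_orders))
catalog.register("clicks", "object_store", lambda: list(object_store_clicks))
```

Calling `catalog.read("orders")` and `catalog.read("clicks")` uses the same interface even though the assets live in different stores, and more stores can be registered without changing how consumers read data.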

Six core capabilities are essential for a data fabric architecture:

  1. A knowledge catalog: This abstraction layer provides a common business understanding of the data for 360-degree customer views, which allows for transparency and collaboration. The knowledge catalog serves as a library with insights about your data. To help you understand your data, the catalog contains a business glossary, taxonomies, data assets (data products) with relevant information like quality scores, business terms associated with each data element, data owners, activity information, related assets and more.
  2. Automated data enrichment: To create the knowledge catalog, you need automated data stewardship services. These services include the ability to auto-discover and classify data, to detect sensitive information, to analyze data quality, to link business terms to technical metadata and to publish data to the knowledge catalog. To deal with such a large volume of data within the enterprise, automated data enrichment requires intelligent services driven by machine learning.
  3. Self-service governed data access: These services enable users to easily find, understand, manipulate and use the data with key governance capabilities like data profiling, data preview, adding tags and annotations to datasets, collaborating in projects and accessing data anywhere using SQL interfaces or APIs.
  4. Smart integration: Data integration capabilities are crucial to extract, ingest, stream, virtualize and transform data regardless of where it’s located. Using data policies designed to simultaneously maximize performance and minimize storage and egress costs, smart integration helps ensure that data privacy protection is applied on each data pipeline.
  5. Data governance, security, and compliance: With a data fabric, there’s a unified and centralized way to create policies and rules. These policies and rules can be linked automatically to the various data assets through metadata such as data classifications, business terms, user groups, roles and more. The policies and rules, which include data access controls, data privacy, data protection and data quality, can then be applied and enforced at scale across all the data during data access or data movement.
  6. Unified lifecycle: An end-to-end lifecycle to compose, build, test, deploy, orchestrate, observe and manage the various aspects of the data fabric, such as a data pipeline, in a unified experience using MLOps and AI.
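
Two of the capabilities above, automated data enrichment and policy enforcement through metadata, can be sketched together in a few lines. This is a deliberately simplified illustration under stated assumptions: the regular expression stands in for the machine-learning-driven classification the text describes, and the names `classify_columns`, `apply_policy` and `MASKING_POLICY`, along with the roles and sample rows, are hypothetical.

```python
import re
from typing import Dict, List, Set

Row = Dict[str, object]

# Assumption: a simple pattern stands in for ML-driven sensitive-data detection.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def classify_columns(rows: List[Row]) -> Dict[str, str]:
    """Tag a column 'sensitive:email' when every non-empty value looks like an email."""
    tags: Dict[str, str] = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [str(r[col]) for r in rows if r.get(col) not in (None, "")]
        is_email = bool(values) and all(EMAIL_PATTERN.match(v) for v in values)
        tags[col] = "sensitive:email" if is_email else "general"
    return tags


# Hypothetical policy: classification -> roles allowed to see raw values.
MASKING_POLICY: Dict[str, Set[str]] = {"sensitive:email": {"data_steward"}}


def apply_policy(rows: List[Row], tags: Dict[str, str], role: str) -> List[Row]:
    """Mask values at access time based on the column's classification, not its name."""
    result = []
    for row in rows:
        out: Row = {}
        for col, value in row.items():
            allowed = MASKING_POLICY.get(tags.get(col, "general"))
            # No policy for this classification means the value is visible to all roles.
            out[col] = value if allowed is None or role in allowed else "***"
        result.append(out)
    return result


customers = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": "bo@example.org"},
]
tags = classify_columns(customers)
analyst_view = apply_policy(customers, tags, role="analyst")
steward_view = apply_policy(customers, tags, role="data_steward")
```

Because the policy is keyed on the classification rather than on a specific table or column, any newly discovered asset that receives the same tag would be covered by the same rule, which is the effect the governance capability describes.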

These six crucial capabilities of a data fabric architecture enable data citizens to use data with greater trust and confidence. Irrespective of what that data is, or where it resides — whether in a traditional datacenter or a hybrid cloud environment, in a conventional database or Hadoop, object store or elsewhere — the data fabric architecture provides a simple and integrated approach for data access and use, empowering users with self-service and enabling enterprises to use data to maximize their value chain.

Learn more about a data fabric approach and how it applies to governance and privacy, multicloud data integration, 360-degree customer views and MLOps and trustworthy AI use cases.

The post What is a data fabric architecture? appeared first on Journey to AI Blog.