Active Learning
A process in which a model is first trained on a limited amount of labeled data and then selects the most informative samples to be labeled by a human expert. The newly labeled data is used to retrain the model, yielding a more accurate model with less labeled data. This iterative process reduces the labeling effort while improving the model’s performance.
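The loop can be sketched with a toy 1D classifier and uncertainty sampling; the `oracle` function (standing in for the human expert), the threshold model, and all the numbers below are illustrative assumptions, not a reference implementation.

```python
import random

def oracle(x):
    # Hypothetical labeling function standing in for the human expert:
    # points above 5.0 are class 1, others class 0.
    return 1 if x > 5.0 else 0

def fit_threshold(labeled):
    # "Train" a toy 1D classifier: threshold halfway between class means.
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def most_uncertain(unlabeled, threshold):
    # Uncertainty sampling: the point closest to the decision boundary.
    return min(unlabeled, key=lambda x: abs(x - threshold))

random.seed(0)
labeled = [(1.0, 0), (9.0, 1)]                   # small labeled seed set
unlabeled = [random.uniform(0, 10) for _ in range(50)]

for _ in range(5):                               # five query rounds
    t = fit_threshold(labeled)
    x = most_uncertain(unlabeled, t)
    unlabeled.remove(x)
    labeled.append((x, oracle(x)))               # "human" labels the query

print(round(fit_threshold(labeled), 2))          # threshold drifts toward 5.0
```

Each round the model queries only the sample it is least sure about, which is where a new label carries the most information.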
Artificial Intelligence (AI)
The simulation of human intelligence in machines that are programmed to perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation. AI systems can be designed to learn and improve over time, adapting to new situations and data. AI is a rapidly growing field that has the potential to revolutionize many industries and aspects of our daily lives.
Big Data
Data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze. Big Data is commonly defined by the 3Vs: Volume, Variety & Velocity.
Computer Vision
A field of study that focuses on enabling computers to interpret and understand visual data from the world around them. It involves developing algorithms and models that can recognize and analyze images and videos and extract useful information from them. Example tasks include scene understanding, video interpolation and 3D model generation.
Database Management System (DBMS)
It is a software system used to store, retrieve, define, and manage data. A DBMS serves as an interface between the end user and the database, allowing users to create, modify and delete data in the database.
Data Catalog
An inventory of all the organization’s available data assets. It permits users (both business & technical) to have a clear & accessible view of the data the organization has, where it came from, where it is located, who has access to it, etc.
Data Value Chain
The full data lifecycle from collection to analysis and usage. In other words, it describes all of the steps & activities involved, from identifying a need for data to its final use and possible re-use.
Data Dictionary
The technical documentation of data elements, with detailed descriptions of their format, relationships, meaning, source and usage.
Data Governance
The system for defining the people, processes, and technologies needed to manage, oversee, and protect the data assets of an organization.
Data Integrity
The overall accuracy, consistency, and trustworthiness of data across its lifecycle, which underpins a company’s goals and business strategy.
Data Lake
A centralized storage repository hosting very large volumes & types of data in their native format, with no fixed limit on account or file size.
Data Lineage
Permits tracking the lifecycle of data, showing how it flows from origin to destination.
Data Management
The creation and implementation of architectures, policies, and procedures that manage the full data lifecycle needs of an organization. The goal is for the data to be accurate, consistent, accessible and secure.
Data Management Platform (DMP)
It is a platform that collects, organizes, and analyzes data from various sources (online, offline and mobile) & levels (first-, second-, and third-party audience data). It permits organizations to better understand prospects & customers and to drive strategic & personalized actions.
Data Mesh
A decentralized data architecture in which data is organized by domain (marketing, sales, client services, etc.) to make it more accessible to users across an organization.
Data Mining
The process of extracting & analyzing data: discovering anomalies, patterns, and correlations within large volumes of data in order to transform it into exploitable information.
Data Operations (DataOps)
An approach to data management that focuses on the entire data lifecycle, from data acquisition to data delivery, with the goal of improving the speed and quality of data-driven insights. DataOps integrates people, processes, and technologies to enable continuous integration, testing, and deployment of data pipelines. This agile and collaborative approach to data management can improve the efficiency, reliability, and security of data operations.
Data Quality
Indicates how reliable & suitable data is for a specific purpose. Data quality is usually defined by 6 main components: accuracy, completeness, consistency, validity, uniqueness, and timeliness. Data quality is only one pillar of data integrity.
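A few of these components can be checked mechanically. The sketch below tests completeness, validity and uniqueness on hypothetical customer records; the field names and rules are illustrative assumptions.

```python
# Hypothetical customer records to validate.
rows = [
    {"id": 1, "email": "a@x.com", "age": 34},
    {"id": 2, "email": None,      "age": 29},   # incomplete: missing email
    {"id": 2, "email": "b@x.com", "age": 29},   # duplicate id
    {"id": 3, "email": "c@x.com", "age": -5},   # invalid age
]

# Completeness: no missing values in any field.
complete = [r for r in rows if all(v is not None for v in r.values())]
# Validity: values fall inside an assumed plausible range.
valid = [r for r in complete if 0 <= r["age"] <= 120]
# Uniqueness: the id column should contain no duplicates.
ids = [r["id"] for r in rows]
unique = len(ids) == len(set(ids))

print(len(complete), len(valid), unique)   # 3 2 False
```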
Data Security
The process of protecting digital information throughout its lifecycle from corruption, fraud or unauthorized access. It applies to hardware, software, storage and user devices; access and administrative controls; and organizational policies and procedures.
Data Warehouse
Allows the storage of large amounts of historical data that has been structured to fit a relational database schema. It permits aggregating & summarizing data for business decisions.
Descriptive Analytics
The process of using statistical techniques to describe or summarize data (with measures of frequency, dispersion, etc.). It doesn’t make predictions about the future.
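In Python, the standard-library `statistics` module covers the basic descriptive measures; the sales figures below are made-up sample data.

```python
import statistics

# Hypothetical daily sales figures to summarize.
sales = [120, 135, 150, 110, 160, 145, 130]

print(statistics.mean(sales))     # central tendency: arithmetic mean
print(statistics.median(sales))   # central tendency: middle value
print(statistics.stdev(sales))    # dispersion: sample standard deviation
```

Note that none of these numbers say anything about future sales; that is the province of predictive analytics.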
ETL
Stands for “extract, transform and load”. It is a data integration process that combines data from multiple data sources into a single data store: Extract the data from its original sources; Transform the data by deduplicating it, combining it, and ensuring quality; then Load the data into the target database.
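The three steps can be sketched end to end with an in-memory SQLite target; the two source lists, the field names and the dedup-on-email rule are illustrative assumptions.

```python
import sqlite3

# Extract: hypothetical records pulled from two source systems.
source_a = [{"email": "A@X.com", "name": "Ann"}, {"email": "b@x.com", "name": "Bob"}]
source_b = [{"email": "a@x.com", "name": "Ann"}, {"email": "c@x.com", "name": "Cal"}]

# Transform: merge, normalize emails, deduplicate on the email key.
merged = {}
for rec in source_a + source_b:
    merged[rec["email"].lower()] = rec["name"]

# Load: write the cleaned rows into a single target store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (email TEXT PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", merged.items())

print(db.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 3
```

The duplicate "A@X.com"/"a@x.com" record collapses to one row during the transform step, so only three customers are loaded.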
General Data Protection Regulation (GDPR)
The regulation that unified data privacy (the protection of private information) laws across the European Union.
Graph Neural Networks (GNNs)
A type of neural network designed to operate on graph data structures such as social networks, molecular structures, and traffic networks. They are commonly used in industry for tasks such as node classification, link prediction, and graph clustering, and have applications in fields such as drug discovery, recommendation systems, and fraud detection.
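At its core a GNN layer is message passing: each node aggregates features from its neighbors. The toy graph, the scalar features and the plain mean aggregation below are illustrative assumptions; real GNN layers add learned weights and nonlinearities.

```python
# Toy graph: adjacency list and one scalar feature per node.
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
features = {0: 1.0, 1: 0.0, 2: 0.0, 3: 4.0}

def message_pass(feats):
    # One GNN layer: each node averages its own feature with its neighbors'.
    out = {}
    for node, nbrs in neighbors.items():
        vals = [feats[node]] + [feats[n] for n in nbrs]
        out[node] = sum(vals) / len(vals)
    return out

h = message_pass(features)   # layer 1: one-hop information
h = message_pass(h)          # layer 2: information spreads two hops
print({k: round(v, 2) for k, v in h.items()})
```

Stacking k layers lets each node "see" its k-hop neighborhood, which is what node classification and link prediction build on.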
Machine Learning Operations (MLOps)
It is defined as the set of guiding principles that makes the development and maintenance of machine learning systems in production seamless and efficient. MLOps is a set of tools and techniques that helps you test and evaluate your machine learning models, deploy them to production with the best inference performance, track their accuracy through their life cycle, and detect performance decreases early on.
Markov Processes
Stochastic models that describe the evolution of a system over time based on the probability of transitioning between different states. In industry, Markov processes are commonly used in combination with AI techniques such as Reinforcement Learning to model and optimize decision-making in dynamic and uncertain environments, such as in robotics, finance, supply chain, and healthcare.
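A minimal example: a two-state weather chain whose next state depends only on the current one. The transition probabilities are made up for illustration; with these numbers the long-run (stationary) fraction of sunny days works out to 2/3.

```python
import random

# Hypothetical two-state weather model: transition probabilities.
transitions = {"sunny": {"sunny": 0.8, "rainy": 0.2},
               "rainy": {"sunny": 0.4, "rainy": 0.6}}

def step(state):
    # Sample the next state from the current state's distribution
    # (the Markov property: only the current state matters).
    r, total = random.random(), 0.0
    for nxt, p in transitions[state].items():
        total += p
        if r < total:
            return nxt
    return nxt

random.seed(42)
state, sunny_days = "sunny", 0
for _ in range(10_000):
    state = step(state)
    sunny_days += state == "sunny"

print(sunny_days / 10_000)   # close to the stationary probability 2/3
```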
Model Interpretability
The ability to understand and explain how a machine learning model arrives at its predictions or decisions. It involves identifying the key features and factors that influence the model’s output, and making the model’s decision-making process transparent and comprehensible to humans.
Natural Language Processing (NLP)
A field of study that focuses on enabling computers to understand, interpret, and generate human language. It involves developing algorithms and models that can process and analyze natural language data, such as text and speech, and extract useful information and insights from it.
Natural Language Understanding (NLU)
A subset of Natural Language Processing (NLP) that focuses on enabling computers to derive meaning from human language.
Optical Character Recognition (OCR)
A technology that enables the conversion of printed or handwritten text into digital text that can be edited and searched. It is commonly used in data entry, document management, and information retrieval applications.
Data Pipeline
The series of steps required to move data from one system (source) to another (destination).
Prescriptive Analytics
The process of analyzing & using data to recommend the best actions & next steps. It’s especially useful in driving data-informed decision-making.
Recommendation Engines
AI-powered systems that provide personalized recommendations to users based on their preferences and behaviors. In the industry, recommendation engines are widely used in e-commerce, entertainment, and advertising to improve user engagement, increase sales, and optimize marketing campaigns.
Reinforcement Learning
A type of machine learning where an agent (the decision maker) learns to make a sequence of decisions by interacting with an environment. The agent receives rewards or punishments based on its actions, and its goal is to maximize the cumulative reward. Reinforcement learning algorithms use trial-and-error to learn the optimal actions in a given environment.
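The trial-and-error loop can be sketched with tabular Q-learning on a toy environment; the corridor world, the hyperparameters and the reward scheme are all illustrative assumptions.

```python
import random

# Toy corridor: states 0..4, reward only at the right end (state 4).
N_STATES, ACTIONS = 5, [1, -1]           # move right or left

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

random.seed(0)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2        # learning rate, discount, exploration

for _ in range(500):                      # episodes of trial and error
    s, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        a = random.choice(ACTIONS) if random.random() < eps else max(ACTIONS, key=lambda b: q[(s, b)])
        s2, r, done = step(s, a)
        # Q-learning update toward reward + discounted best future value.
        q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
        s = s2

policy = [max(ACTIONS, key=lambda b: q[(s, b)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy moves right in every state: the reward signal at the far end has propagated backwards through the Q-values.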
SQL / No SQL
SQL (Structured Query Language) is the standard language of relational database management systems, which use structured data and predefined schemas to store and manage data, making them ideal for handling structured data that follows a fixed schema.
NoSQL (Not Only SQL), on the other hand, is a non-relational database management system that can handle unstructured, semi-structured, and structured data without predefined schemas, making it more suitable for handling large, complex, and dynamic data sets that require flexible schemas and high scalability.
Which to use depends on the type of data being handled and the specific needs of the application: SQL is better suited for structured data that needs strict schema validation, while NoSQL is more appropriate for unstructured or semi-structured data that needs to be processed quickly and flexibly.
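The schema contrast can be shown side by side: SQLite (via Python's standard library) enforces a fixed table schema, while a document-style store accepts records whose shapes differ. The table, field names and sample documents are illustrative assumptions.

```python
import sqlite3
import json

# SQL side: a fixed schema is declared up front and enforced on insert.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute("INSERT INTO users VALUES (1, 'Ann')")
row = db.execute("SELECT name FROM users WHERE id = 1").fetchone()

# NoSQL-style side: schemaless JSON documents, each record free to differ.
docs = [
    json.loads('{"id": 1, "name": "Ann"}'),
    json.loads('{"id": 2, "name": "Bob", "tags": ["vip"]}'),  # extra field, no migration
]

print(row[0], docs[1].get("tags"))   # Ann ['vip']
```

Adding the `tags` field to one document required no schema change; doing the same in the SQL table would require an `ALTER TABLE` migration first.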
Time To Value
The time it takes for a customer to get value from a product or service. The faster a product or service delivers results, the better the customer experience and ROI.