
Good product data from a PIM system: The key to a successful e-commerce business

Reading Time 8 mins | May 19, 2021 | Written by: AX Semantics

Online retail is booming. As a result, the number of products and services offered in e-commerce is growing, and so is the volume of underlying data. Companies therefore need high-quality, unique product descriptions to an ever greater extent. Creating these descriptions is essential, yet it is also among the biggest organizational, manual and financial challenges in online retail. However, if companies approach the topic without preconceptions, it quickly becomes clear what high-quality product descriptions can accomplish:

  • Direct influence on conversion in the buying process
  • Lower number of returned goods
  • Higher visibility through search engines (e.g. Google)
  • More website traffic
  • Less effort spent on handling customer queries
  • Positive product and customer experience and resulting customer loyalty

With these points in mind, it is worth investing in high-quality product descriptions. Descriptions that have to be created at short notice deserve particular attention, such as those for seasonal products, long-running products or new collections. The challenge is to manage the large number of product descriptions and keep them up to date, all within a financially acceptable budget. This is where automated content creation based on Natural Language Generation (NLG) comes into play.

The PIM system improves the quality of your product data

Data-to-text NLG can only work with a good database. Well-prepared and constantly maintained data is a must. Data can be collected, archived and used in different ways. A PIM can improve the quality and accuracy of the data, which in turn optimizes subsequent business processes. The PIM system (Product Information Management system) is a central place for all product-relevant information, specifications and digital files. The product information needed for online shops can be exported from a PIM system to countless channels.


Data quality and completeness influence the quality of the automated descriptions

In automated content generation, data quality has a decisive influence on the quality of the results. Top-quality product data is the foundation for successfully selling products and services. Manufacturers supply product information, which merchants must prepare for the intended audience. This is where weak points often appear: manufacturer standards frequently meet legal requirements but are not necessarily understandable to a layperson. As a rule, manufacturer information therefore needs to be enhanced. The motto is: content is king. The more precise and targeted the data, the better. The completeness and quality of product data are crucial for a positive shopping experience.

Data Quality Rules facilitate controlling and monitoring data quality

A PIM system helps to ensure basic data quality and data structure at a technical level. Data quality can be influenced directly by specifying default values and mandatory fields. These settings are part of far-reaching data quality rules that can be configured within a PIM system, making it easier to control and monitor data quality. A good PIM can also generate new attributes from existing ones.
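To make the idea of data quality rules more concrete, here is a minimal sketch in Python. It is not the configuration format of any particular PIM system: the class `QualityRules`, the functions `apply_rules` and `derive_attributes`, and all attribute names are assumptions made up for illustration. The sketch fills in default values, reports mandatory fields that are still empty, and derives one new attribute from two existing ones.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QualityRules:
    """Hypothetical rule set: mandatory fields plus default values."""
    required: set[str]
    defaults: dict[str, str] = field(default_factory=dict)


def apply_rules(product: dict[str, str], rules: QualityRules) -> tuple[dict[str, str], list[str]]:
    """Fill in defaults, then list mandatory fields that are still missing."""
    # Empty strings count as "not maintained" and do not override a default.
    maintained = {name: value for name, value in product.items() if value}
    cleaned = {**rules.defaults, **maintained}
    missing = sorted(name for name in rules.required if name not in cleaned)
    return cleaned, missing


def derive_attributes(product: dict[str, str]) -> dict[str, str]:
    """Generate a new attribute from existing ones (illustrative example)."""
    derived = dict(product)
    if "width_cm" in product and "height_cm" in product:
        area = float(product["width_cm"]) * float(product["height_cm"])
        derived["area_cm2"] = str(area)
    return derived


rules = QualityRules(required={"name", "color", "material"},
                     defaults={"material": "unknown"})
cleaned, missing = apply_rules({"name": "Shelf", "color": ""}, rules)
print(cleaned)  # the default for "material" is filled in
print(missing)  # "color" is reported as missing
```

Reporting gaps instead of rejecting the record is a deliberate choice in this sketch: it allows incomplete products to stay in the system while the missing values are requested.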

In practice, however, the quality of product data in many planned projects is not yet sufficient to fully exploit the potential of NLG-based content automation and achieve the necessary content quality. In these cases, different scenarios for starting automated content generation are conceivable. For example, the rule sets can be created in the text engine in parallel with ongoing data preparation. Multi-stage procedures are also conceivable: the attributes already available are used in a first step, and further attributes are added to the descriptions as soon as they become available.
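The multi-stage idea can be illustrated with a very small template sketch. This is not how an NLG text engine works internally; it only shows the principle that a description can start with the attributes already available and grow as more are maintained. The rule list `RULES`, the function `describe` and the attribute names are hypothetical.

```python
# Each rule pairs the attributes it needs with a sentence template.
# Rules and attribute names are made up for illustration.
RULES = [
    ({"name", "color"}, "The {name} comes in {color}."),
    ({"material"}, "It is made of {material}."),
    ({"weight_kg"}, "It weighs {weight_kg} kg."),
]


def describe(product: dict) -> str:
    """Build a description from every rule whose attributes are available."""
    available = {name: value for name, value in product.items() if value}
    sentences = [
        template.format(**available)
        for needed, template in RULES
        if needed.issubset(available)
    ]
    return " ".join(sentences)


# Stage 1: only basic attributes are maintained.
print(describe({"name": "trail jacket", "color": "green"}))
# Stage 2: the material has been added to the product data.
print(describe({"name": "trail jacket", "color": "green", "material": "nylon"}))
```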

A PIM system forms the technical basis for data quality

Product data need to be complete, available, granular and consistent

In order to be able to automatically create text from product data, such as product descriptions, the data must be: 1. complete, 2. available, 3. granular and 4. consistent.

1. Completeness of data

In automated content generation, it is essential to ensure that the necessary data or product attributes are complete. It is not necessary to list every aspect of a product, but it is crucial to name the attributes that are relevant for adding text and to design a corresponding structure system. This structural system can then be transferred into a data model within a PIM system.

2. Availability of data

At first glance, the data source does not play a major role for content generation with an NLG system. It is irrelevant whether the structured data is read from a table or exchanged directly with the corresponding source systems via a REST API. In a fully automated workflow, however, it is necessary to identify a leading data source that provides the product data. Usually, this is the PIM system, which is connected via a connector, for example.

3. Granularity of data

Granularity means that the available attributes are recorded as separately from each other as possible. This requirement is not only relevant for the automation of product descriptions; it also improves product search. If the product data is not sufficiently granular, it can be split and structured in an intermediate step, which makes it usable for the automation process.

4. Consistency of data

The fourth important point is the consistency of the maintained product data. Different spellings of the same attribute value make it more difficult to create the rule sets (trainings), as these inconsistencies have to be intercepted and corrected by the NLG software. Going forward, consistent notation of attribute values can be ensured, for example, through default values for individual data fields. Optimizing already existing product data is a further challenge.

Inconsistent data complicates the creation of the set of rules, as different spellings have to be corrected.

High-quality data through systematic product data onboarding

How can the necessary completeness, availability, granularity and consistency of product data be achieved? The basis is systematic product data onboarding, which brings the data to the desired quality so that high-quality texts can be generated automatically.

In the onboarding process, certain steps are particularly relevant to data quality. The completeness of the data can be ensured in the attribute mapping step, i.e. the category-specific assignment of the supplier attributes to the PIM attributes. This step checks whether the supplier has delivered the mandatory attributes defined for each category. The mapping of the supplier values to the PIM default values also reveals possible gaps. Based on these two onboarding steps, missing product information is identified and can be requested from the supplier or manufacturer in a targeted manner. It is also possible to enhance the information with content from a content provider who makes product data and other information available for a fee.
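The attribute mapping and the check for mandatory attributes can be sketched as follows. The mapping table, the mandatory attributes and the category name are invented for this example; a real onboarding system would hold them as maintained configuration.

```python
# Hypothetical mapping of supplier attribute names to PIM attribute names,
# defined per category.
ATTRIBUTE_MAPPING = {
    "shoes": {"colour": "color", "size_eu": "size", "upper": "material"},
}

# Hypothetical mandatory PIM attributes per category.
MANDATORY = {
    "shoes": {"color", "size", "material"},
}


def map_attributes(category: str, supplier_record: dict) -> dict:
    """Rename supplier attributes to PIM attributes; drop unmapped or empty ones."""
    mapping = ATTRIBUTE_MAPPING[category]
    return {
        mapping[name]: value.strip()
        for name, value in supplier_record.items()
        if name in mapping and value.strip()
    }


def missing_attributes(category: str, pim_record: dict) -> list:
    """List the mandatory attributes the supplier did not deliver."""
    return sorted(MANDATORY[category] - set(pim_record))


delivered = {"colour": "black", "size_eu": "42", "upper": " ", "ean": "123"}
record = map_attributes("shoes", delivered)
print(record)                               # mapped PIM attributes
print(missing_attributes("shoes", record))  # gaps to request from the supplier
```

The resulting list of gaps is what would be sent back to the supplier or manufacturer as a targeted request.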

Availability of data depends on the degree of automation of the onboarding process

Availability depends primarily on the degree of automation of onboarding. The more automated the process, the faster the data can be transmitted to the PIM and used for automated content generation. The more accessible the PIM systems are in terms of data import and export, the easier it is to use external onboarding systems and assure effective and problem-free data exchange.

Granularity of the data through text extraction

In order to increase the granularity of product data, text extraction is often mandatory, as certain attributes are not provided in a structured form by the supplier. In this process, the relevant attributes are extracted from semi-structured content from the manufacturer, such as the product titles or the unstructured product descriptions. The attributes extracted from the texts can then be used for additional filters in the online shop, for example.
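A simple form of text extraction can be sketched with regular expressions. The patterns, attribute names and the product title below are assumptions for illustration only; real catalogs usually need category-specific rules or more robust extraction methods.

```python
import re

# Illustrative patterns for attributes hidden in a product title.
PATTERNS = {
    "screen_size_inch": re.compile(r"(\d+(?:\.\d+)?)\s*inch", re.IGNORECASE),
    "storage_gb": re.compile(r"(\d+)\s*GB\b", re.IGNORECASE),
    "color": re.compile(r"\b(black|white|silver|blue|red)\b", re.IGNORECASE),
}


def extract_attributes(title: str) -> dict:
    """Return every attribute whose pattern matches the title."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(title)
        if match:
            found[name] = match.group(1).lower()
    return found


# "Acme" is a made-up brand name.
print(extract_attributes("Acme Notebook 15.6 inch 512 GB SSD Silver"))
```

Each extracted value becomes a separate attribute, which is exactly what makes it usable both for text generation and for filters in the shop.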

Consistency of data through standardization of values

The consistency of product data can be ensured in the normalization step: In this process, the numbers, units, lists of values or spellings present in the data are standardized. Abbreviations can also be resolved or synonyms can be converted into a matching designation.
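The normalization step might look like the following sketch. The lookup tables `UNIT_ALIASES` and `COLOR_SYNONYMS` are hypothetical and deliberately tiny; the point is only to show how different spellings of units and values are converted into one canonical designation.

```python
import re

# Hypothetical lookup tables; a real project would maintain these per attribute.
UNIT_ALIASES = {
    "cm": "cm", "centimeter": "cm", "centimetre": "cm",
    "mm": "mm", "millimeter": "mm", "millimetre": "mm",
}
COLOR_SYNONYMS = {"navy": "blue", "navy blue": "blue", "anthracite": "grey", "gray": "grey"}

# A number (with "." or "," as decimal separator) followed by a unit name.
MEASURE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*([a-zA-Z]+)\.?\s*$")


def normalize_measure(raw: str):
    """Return 'value unit' in canonical form, or None if the input is not understood."""
    match = MEASURE.match(raw)
    if not match:
        return None
    number, unit = match.groups()
    canonical_unit = UNIT_ALIASES.get(unit.lower())
    if canonical_unit is None:
        return None
    value = float(number.replace(",", "."))
    return f"{value:g} {canonical_unit}"


def normalize_color(raw: str) -> str:
    """Lowercase, collapse whitespace and resolve synonyms."""
    key = " ".join(raw.lower().split())
    return COLOR_SYNONYMS.get(key, key)


print(normalize_measure("12,5 Centimeter"))
print(normalize_measure("80cm"))
print(normalize_color("  Navy  Blue "))
```

Returning `None` for values that cannot be interpreted keeps unclear cases visible, so they can be reviewed manually instead of being silently guessed.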

Once all these steps have been completed, the prepared product data is of high quality and available. Then, it can be imported into the PIM system where it is made available for further use. Thus, an onboarding platform is a valuable addition to the PIM to ensure that the data is complete and available, and has the required granularity and consistency.

This video explains what to look for when implementing a PIM, how data can be used in multiple ways and how different perspectives can be mapped in the texts.


In conclusion, a lack of granularity, consistency and completeness in the data fields makes the trainings created within the NLG solution error-prone, unnecessarily complex and more difficult to maintain. And if the availability of the product data is not guaranteed, no automation can take place at all.

The PIM has a key role in an NLG system landscape. As a central data source, it provides all relevant product data for the NLG software and enables their transfer to different channels, such as the online shop. 

About the author

hmmh is a partner and managed service provider of AX Semantics and has written this article on the topic of good product data from a PIM system. As a managed service provider, hmmh supports customers with a full-service offering. AX Semantics highly recommends hmmh to customers if external manpower is required.

The company is a market leader in connected commerce: hmmh brought e-commerce to Germany more than 20 years ago. Since then, the company has been shaping the developments in this field. For hmmh, connected commerce is the logical continuation of the multi-channel business, in which channels become touch points and boundaries between online and offline disappear – at any time, at any place and across all devices, always the right content.
