HABAULT Maxime
Data / Web Analyst
Automating Product Creation Through Supplier Data Scraping
Automating Product Creation Through Supplier Data Scraping
Creating product pages from supplier data is traditionally a time-consuming and repetitive process. It requires manual research, data extraction, image collection, content writing, and SEO formatting.
This project was designed to automate the entire workflow, from data collection to content generation, while ensuring data quality, consistency, and scalability.
Overview of the automated product creation workflow.
A Google Sheets Application as the Entry Point
The workflow starts with a Google Sheets–based application used by internal teams. This file acts as a simple and accessible interface where collaborators can initiate the creation of new products for the website.
Users only need to provide minimal input to identify the product. From there, an automated scenario is triggered and dynamically adapts its logic depending on the selected supplier or brand.
Google Sheets interface used by teams to trigger the automation.
Multi-Supplier Scraping and Data Structuring
The automation scenario includes multiple modules designed to handle different suppliers. Each module identifies the supplier, retrieves the supplier ID, and constructs the correct product URL directly on the supplier’s website.
Once the URL is identified, the system scrapes all relevant product data, including descriptions, images, and technical documents. The collected data is then normalized into a standardized JSON structure to ensure consistency across all suppliers.
Scraping process and normalization into structured JSON.
AI-Assisted Content Generation
After structuring the raw data, it is sent to a ChatGPT module responsible for generating a clean and optimized product description based on predefined criteria.
The generated content is organized into multiple paragraphs: an introduction explaining the product, a section highlighting key benefits, a technical characteristics paragraph, and when applicable, an installation or usage recommendation.
Video demonstration of scraping → formatting → AI content generation.
Technical Attributes Matching and Categorization
In parallel, a separate reference file stores standardized technical attributes by product domain, including five key characteristics, warranty information, and certifications.
A Make.com module identifies the product type, assigns the most relevant category, and matches the appropriate technical attributes based on the scraped supplier data. This guarantees structured and consistent product specifications.
Automated categorization and technical attributes matching.
Product Placement and Breadcrumb Logic
Beyond data creation, the project also focuses on proper product placement within the website. The site’s breadcrumb structure is used as a reference to determine the most relevant category path.
By matching product attributes with existing category hierarchies, the system ensures products are positioned where users are most likely to find them.
SEO, Media Formatting, and Server Integration
The automation also generates SEO-friendly meta titles and meta descriptions based on predefined optimization rules.
Images and technical documents are automatically formatted and uploaded to the server, making them immediately ready for integration into the e-commerce platform.
SEO metadata generation and automated media processing.
Measured Results and Impact
The impact of this automation is significant. Previously, creating six product pages manually required approximately two hours.
With the automated system, six products can be generated in about five minutes, followed by a short review and validation phase of approximately fifteen minutes.
This represents a time saving of around 1 hour and 30 minutes per batch of six products, while improving consistency, structure, and overall content quality.
Conclusion
This project demonstrates how combining data scraping, automation, structured data processing, and AI-driven content generation can transform a complex and time-consuming business process into a fast, reliable, and scalable workflow.