Here's a situation every sustainability lead recognises. You're working on a new product line, and a key regional botanical extract isn't in any commercial Life Cycle Assessment (LCA) database. The scientifically rigorous thing to do would be to request the necessary data, ideally a verifiable LCA of the compound's production, from your supplier. You try, only to find that your supplier faces the same issue with its own supplier in turn, and doesn't have records of the energy or water use of their production site at hand. And in any case, they hint that they are not inclined to follow through, given the weeks or months of effort it would take to produce the data. Unless you represent a massive portion of their revenue, they are unlikely to dedicate that time. Your hope, rightly, is that other customers, perhaps larger ones, will request the same data in the future, and that the supplier will eventually provide primary data for their production. Until then, you need a different approach.
In a world where companies and LCA practitioners operate with limited resources, fully modelling every detailed material or sub-version from scratch simply isn't a realistic standard. It's tempting to treat database gaps as a temporary problem, something the next database update will solve. In reality they're a permanent condition of LCA, and any methodology that doesn't account for them is incomplete. Coverage improves over time, but it's an asymptote, not a finish line. So until we approach a situation of abundant LCAs across supply chains, the question becomes 'how do we handle data gaps in a way that is defensible, scalable, and transparent?'
This post explores how ingredient mapping LCA works in practice, why proxy methodology is a necessity, and why automated product footprinting platforms are uniquely positioned to make product-level LCA work at scale for food and beverage manufacturers.
Primary data and proxies aren't the only options. The whole point of Life Cycle Inventory (LCI) databases is to make much of the individual data collection redundant by providing standardised reference datasets. But even here, there is an issue of scale. Various LCI databases are available, each with thousands of datasets; some are general, others specific to certain sectors. Even accounting for their overlap, this adds up to a huge number of secondary datasets. Nevertheless, it represents only a tiny fraction of the unique goods and services the global economy provides. And even for what a database does cover, it might have many fruits or vegetables, but not all of them in both conventional and organic forms. It might have a given plastic in virgin and recycled variants, but not every kind of recycling route. And while it provides many regionalised datasets, it still often offers only a handful of locations.
Even institutions and consultants whose full-time job is to research and model LCI datasets will never fully represent the real world. So they, and by extension all of us, sometimes have to rely on proxies. Given the pace of new developments, from alternative proteins and fermentation-derived ingredients to new chemicals and processing technologies, one might argue that the world is outpacing database coverage.
Ultimately, database completeness isn't a roadmap milestone we will eventually hit; it's an asymptote.
Because resources are finite, using proxies is a daily reality for LCA practitioners. What separates a scientifically defensible LCA from a back-of-the-napkin estimate is, not least, methodological consistency. When comparisons between two LCAs are made, whether across time or across different products, both need to rely on a consistent, standardised proxy selection process. Consistency is arguably as important as the choice itself: two sensible proxies might both be defensible, but if a product is assessed using one and re-assessed two years later using the other, comparability breaks down.
A suitable proxy should reflect all environmental impacts as closely as possible. The factors that matter vary by sector. One we're setting aside here is regional approximation, where for example a Spanish apple is approximated with a South African one. That decision tends to be made separately and is graded separately in many Data Quality Rating (DQR) systems.
For agricultural products, the following all play a part in the proxy selection:
Plant or animal taxonomy: As a first indicator, it is usually advisable to consider plants or animals from the same taxonomic group, starting at the species or genus level and moving up through family and order. So for example, if there is no dataset for broccoli, one for cauliflower, or alternatively cabbage, could be selected, as all three are cultivars of the same species, Brassica oleracea.
Cultivation methods: Comparing cultivation methods is important, too. This includes whether a plant is perennial or annual, its seasonality, its typical yield per area, and whether it tends to be grown in static monoculture, in crop rotation, or in polycultures. It also includes whether a crop is cultivated at all or instead harvested in the wild. For example, if there is no dataset for lemongrass or another plant in its family, a dataset for basil, grown under similar conditions, might be used.
Cultivation technologies: The technological aspects of cultivation matter as well: how much water tends to be used for irrigation and in what form, whether the product is typically grown in greenhouses, under foil, or on open fields, and how fertilisers, pesticides, or manure are applied. Consider tomatoes and cucumbers, for example: both are often grown under cover or in greenhouses, and both are frequently cultivated on vertical trellising (a string or wire structure that lets them grow upwards) or on rockwool.
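To make the selection process above concrete, here is a minimal sketch of how such criteria could be turned into a deterministic ranking. Everything here is illustrative: the attribute names, weights, and tie-breaking rule are hypothetical assumptions for the example, not Sustained's actual mapping logic.

```python
# Illustrative proxy-ranking heuristic. Attribute names and weights are
# hypothetical; a real methodology would be far more nuanced.

# Closer taxonomic matches score higher (species > genus > family > order).
TAXONOMY_SCORE = {"species": 4, "genus": 3, "family": 2, "order": 1, "none": 0}


def closest_shared_rank(a: dict, b: dict) -> str:
    """Return the most specific taxonomic rank shared by two crops."""
    for rank in ("species", "genus", "family", "order"):
        if a[rank] == b[rank]:
            return rank
    return "none"


def score_candidate(target: dict, candidate: dict) -> int:
    """Score how well a candidate dataset approximates the target crop."""
    score = TAXONOMY_SCORE[closest_shared_rank(target, candidate)]
    # Cultivation methods: growth habit and whether the crop is farmed at all.
    if target["perennial"] == candidate["perennial"]:
        score += 1
    if target["cultivated"] == candidate["cultivated"]:
        score += 1
    # Cultivation technologies: protected cultivation and irrigation style.
    if target["greenhouse"] == candidate["greenhouse"]:
        score += 1
    if target["irrigation"] == candidate["irrigation"]:
        score += 1
    return score


def select_proxy(target: dict, candidates: list[dict]) -> dict:
    # Highest score wins; ties are broken deterministically by dataset name,
    # so the same inputs always yield the same proxy (the consistency
    # requirement discussed above).
    return max(candidates, key=lambda c: (score_candidate(target, c), c["name"]))
```

The deterministic tie-break is the detail worth noting: it is what keeps a re-assessment two years later from silently landing on a different, equally plausible proxy.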
A well chosen proxy from a peer-reviewed database can actually be more transparent than a custom-built model. Most published LCAs list the database, the flows and the amounts, but not the specific datasets used. Being explicit about which datasets are in use, and which of them are proxies, is what makes an LCA reproducible and trustworthy.
At Sustained, we already show in the modelling tool whether the employed databases have a dataset for the selected ingredient and location combination. And in the overview of datasets in use, we flag which flows rely on a proxy and which specific dataset has been chosen.
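The kind of per-flow transparency described above boils down to a small amount of bookkeeping. The sketch below is a hypothetical representation, not Sustained's actual schema, but it shows the minimum that makes proxy use auditable: the ingredient as entered, the dataset actually used, and an explicit proxy flag.

```python
from dataclasses import dataclass


@dataclass
class FlowMapping:
    ingredient: str  # the ingredient as entered by the user
    dataset: str     # the LCI dataset actually used in the model
    is_proxy: bool   # True when the dataset approximates rather than matches


def overview(mappings: list[FlowMapping]) -> list[str]:
    """Render a dataset overview that flags proxies explicitly."""
    return [
        f"{m.ingredient} -> {m.dataset}" + (" [proxy]" if m.is_proxy else "")
        for m in mappings
    ]
```

With this in place, reproducing an assessment means re-reading the mapping table, not reverse-engineering which datasets a practitioner happened to pick.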
The standard proxy is the floor of ingredient mapping LCA, not the ceiling. When an ingredient warrants more confidence (perhaps because of its strategic importance, because it's a major impact contributor, or because of higher-stakes use) there are graduated mechanisms to close the gap. None of them require sending the customer to a six-month consultancy engagement.
Before getting to those, it's worth noting that coverage improves in the background. LCI databases release regular updates, often with meaningful improvements in both coverage and methodology, and Sustained incorporates these as they ship. A dataset that didn't exist last year may exist this year, and an existing dataset may be substantially refined between releases.
Beyond that baseline, if a better fitting dataset is needed sooner, an ingredient can be modelled from a published scientific paper covering its LCA. Many ingredients have such papers, though not all. A model built this way warrants more scepticism than one from a peer-reviewed secondary database, but it can be substantially closer to the real ingredient than a distant proxy.
There is also an option to override only part of an ingredient's impact. If a supplier has, for example, conducted an extensive carbon footprint accounting exercise but cannot provide primary data for any of the other impact categories, only the available category can be overridden.
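A partial override of this kind can be sketched as a simple per-category merge. The category names and figures below are invented for illustration, and the function is an assumption about how such a merge could work, not a description of Sustained's implementation.

```python
# Hypothetical sketch: override a single impact category with primary data
# while keeping proxy values for all other categories.

def apply_partial_override(proxy_impacts: dict, primary_impacts: dict) -> dict:
    """Merge per-category impacts, preferring primary data where present."""
    merged = dict(proxy_impacts)
    merged.update(primary_impacts)  # primary values win for their categories
    # Track provenance so reporting can state, per category, which source
    # each number came from.
    provenance = {
        cat: ("primary" if cat in primary_impacts else "proxy/secondary")
        for cat in merged
    }
    return {"impacts": merged, "provenance": provenance}
```

For example, a supplier-provided climate change figure would replace the proxy's value for that category only, while water use and land use would continue to come from the secondary dataset, each labelled accordingly.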
There are clear boundaries to how proxy data should be used. Flagged and well selected proxies are generally well suited for hotspot identification, internal reformulation, portfolio screening, and order-of-magnitude reporting.
However, flagged proxies, especially of dominant ingredients or flows, are not the right basis for single-SKU public comparative claims under ISO 14044. For those types of claims, a deeper primary-data study is the right tool.
Proxies aren't going away at portfolio scale, and pretending otherwise produces worse LCAs, not better ones. The work is in handling them transparently, flagging confidence, and offering a route to higher rigour where it matters. Manual LCA can do this for a single product. Automated product footprinting can do it for a thousand.
Sustained's ingredient mapping methodology aligns with PEF and ISO 14044 data quality requirements. Every mapping decision is documented, traceable, and reproducible. Confidence is surfaced inline at the data point, not buried in methodology documentation. This is screening-level LCA done transparently: the right tool for portfolio hotspot identification, with clear signposting when a deeper primary-data study is the right next step.
Q: What happens when an ingredient I use isn't in the LCI database?
A: Modern tools handle it through a documented mapping methodology: the closest defensible match is identified, flagged, and made visible to the user. In Sustained, there is the additional option to add a placeholder and thus flag an ingredient for further investigation by Sustained.
Q: Are mapped proxies reliable for sustainability reporting?
A: For screening, hotspot identification, and reformulation, yes, provided the methodology is transparent. For single-SKU public comparative claims under frameworks such as PEF and ISO 14044, the bar is higher: these typically require a primary-data study and a certain level of accuracy in the respective Data Quality Rating.
Q: What if a standard proxy isn't accurate enough for a strategically important ingredient?
A: There are two additional mechanisms beyond the standard library: literature-derived modelling and primary data override (where supplier data or internal measurements override secondary mapping from LCI databases).
Q: Can I request that a new ingredient be added?
A: Yes, Sustained’s library expands continuously based on customer requests; new mappings become available to all customers once added.
Q: What's the difference between primary and secondary data in food LCAs?
A: For a deeper dive into this topic, read our post on Balancing Primary and Secondary Data in Life Cycle Assessment.