
Improving data for categorization and calculation

The quality of the carbon calculation depends heavily on the quality of the data.

Data arrives at CarbonLink via an API (Application Programming Interface). The integration is read-only: CarbonLink does not alter the data or its contents in any way.

From the data, CarbonLink identifies various traits that the system then scores. For example, if the sender of the invoice is “Helsinki Energy” or the description field reads “Helsinki Electricity”, the invoice is most likely an electricity invoice. If the invoice shows the number of kilowatt hours (kWh), CarbonLink can take that into account and use a more accurate activity-based coefficient. If the number of kWh is not shown on the invoice, CarbonLink uses the euro amount to calculate the carbon footprint on a spend basis.
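The trait-scoring idea can be illustrated with a minimal Python sketch. It is not CarbonLink’s actual algorithm: the categories, keywords and weights below are hypothetical, and the real system combines many more factors. Each matching trait adds its weight to a category, and the highest-scoring category wins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InvoiceLine:
    supplier: str
    description: str


# Hypothetical traits: (category, keyword, field to search, weight).
TRAITS = [
    ("electricity", "energy", "supplier", 1.0),
    ("electricity", "electricity", "description", 2.0),
    ("food", "yogurt", "description", 2.0),
    ("advertising", "campaign", "description", 2.0),
]


def score_categories(line: InvoiceLine) -> dict[str, float]:
    """Add up the weight of every trait whose keyword appears in its field."""
    scores: dict[str, float] = {}
    for category, keyword, field, weight in TRAITS:
        if keyword in getattr(line, field).lower():
            scores[category] = scores.get(category, 0.0) + weight
    return scores


def best_category(line: InvoiceLine) -> str | None:
    """Return the highest-scoring category, or None if no trait matched."""
    scores = score_categories(line)
    return max(scores, key=scores.get) if scores else None
```

A plain keyword match like this would label a line such as “HEALTHY YOGURT 2025” as food, which is exactly the kind of error described in the next section.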

Once the data has been identified, a category is selected, and the category provides a scientifically validated coefficient for calculating the amount of carbon. For example, if the invoice states that electricity was purchased for 700 euros, the Finnish Environment Institute SYKE’s ENVIMAT coefficient for electricity is 1.7 kg per euro:

700 EUR × 1.7 kg/EUR = 1,190 kilograms of carbon
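The spend-based arithmetic above is a single multiplication. The sketch below assumes the coefficient is expressed in kilograms per euro; the function name is illustrative and not part of CarbonLink.

```python
def spend_based_emissions(amount_eur: float, kg_per_eur: float) -> float:
    """Spend-based estimate: euro amount multiplied by a kg-per-euro coefficient."""
    return amount_eur * kg_per_eur


# The electricity example from the text: 700 EUR at 1.7 kg per euro.
print(round(spend_based_emissions(700, 1.7), 2))  # 1190.0
```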

An activity-based coefficient, which uses the kilowatt hours consumed, is more accurate than the spend-based coefficient described above (see Spend vs. Activity below).

CarbonLink also recognizes when an invoice states that the electricity is emission-free and can take this into account.

CarbonLink’s recognition weighs many factors together before the category is determined. The description above is a highly simplified account of just one of the steps the program performs.

Why data is sometimes categorized incorrectly

CarbonLink uses automation and algorithms to recognize invoice data.

If the description field is vague or misleading, the data may be recognized incorrectly. Advertising expenses are a good example, since the invoice description may simply name the campaign, for example:

“HEALTHY YOGURT 2025”, for an advertising campaign for yogurt. Because nothing in the invoice information indicates that this is advertising, the algorithm may identify the invoice line as food.

Sometimes invoices, travel invoices or expense claims use internal company jargon or slang, or a line contains only the product number of the item sold or an internal abbreviation for the service. Such lines are very difficult for the algorithm to identify. For identification, it is important that the line explanation, that is, the product or service description, is as accurate and clear as possible on every line of the invoice.
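A company could screen its own invoice lines for descriptions an algorithm is likely to struggle with before they are sent. The rule below is a hypothetical heuristic, not a CarbonLink feature; the regular expression and the two-word minimum are assumptions.

```python
import re

# Hypothetical rule: a "code-like" description is a single token of letters,
# digits and separators that contains at least one digit (e.g. "SKU-88421").
CODE_LIKE = re.compile(r"^(?=.*\d)[A-Za-z0-9_./-]+$")


def is_vague(description: str, min_words: int = 2) -> bool:
    """Flag descriptions that are empty, code-like, or shorter than min_words words."""
    text = description.strip()
    if not text:
        return True
    if CODE_LIKE.match(text):
        return True
    return len(text.split()) < min_words
```

Such a check catches bare product codes and one-word lines, but a clear yet misleading description such as “HEALTHY YOGURT 2025” still passes, so it cannot replace a descriptive line text.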

CarbonLink receives invoice data in various text formats, most often as XML through the interface. CarbonLink does not read invoice images or attachments. If line-level details exist only in an invoice attachment, for example a PDF file, the system does not receive them as structured text and recognition cannot be performed as accurately.
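To show why structured line data matters, the sketch below reads line-level details from an XML invoice with Python’s standard xml.etree.ElementTree module. The element names (Invoice, Row, Description, Quantity, AmountEUR) are hypothetical and simplified; real e-invoice formats such as Finvoice use their own schemas.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical, simplified invoice XML.
SAMPLE = """<Invoice>
  <Seller>Helsinki Energy</Seller>
  <Row>
    <Description>Electricity, emission-free</Description>
    <Quantity unit="kWh">1200</Quantity>
    <AmountEUR>180.00</AmountEUR>
  </Row>
  <Row>
    <Description>Basic fee</Description>
    <AmountEUR>5.90</AmountEUR>
  </Row>
</Invoice>"""


def read_rows(xml_text: str) -> list[dict]:
    """Extract description, quantity, unit and euro amount from each invoice row."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.iter("Row"):
        qty = row.find("Quantity")  # None when the row has no quantity
        rows.append({
            "description": row.findtext("Description", default="").strip(),
            "quantity": float(qty.text) if qty is not None else None,
            "unit": qty.get("unit") if qty is not None else None,
            "amount_eur": float(row.findtext("AmountEUR", default="0")),
        })
    return rows
```

Details that exist only in a PDF attachment never appear in such an element tree, so there is nothing for this kind of parsing to extract.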

How to improve calculation and identification

For CarbonLink to identify an invoice and assign it to a category, the invoice must contain enough identifying information. If, for example, the invoice data includes nothing but the supplier, which may be a fairly broad-based operator, categorizing the invoice precisely is difficult. The description on the invoice line and any quantities and units are therefore the most important individual factors for successful identification.

If the invoice itself does not make clear what is being invoiced, the sender of the invoice needs to update it to contain the necessary identifiers. This usually has to be requested directly from the invoicer, so that the invoice lines describe exactly which product or service has been purchased. This applies in particular to electricity bills and emission-free electricity: the invoice description must make it clear that the electricity is emission-free. Many electricity providers sell emission-free electricity under a specific product name. In that case, it is sufficient for the invoice to include this product name and to identify the electricity company, either as a party to the invoice or in the description.
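The rule about product names can be sketched as a simple check. This is an illustration under assumptions, not CarbonLink’s implementation: “Helsinki Energy” comes from the example above, while “Example Power” and both product names are hypothetical.

```python
# Hypothetical mapping from electricity company to its emission-free product names.
EMISSION_FREE_PRODUCTS = {
    "helsinki energy": ["greenflow"],
    "example power": ["windpower plus"],
}
EXPLICIT_KEYWORDS = ("emission-free", "emission free")


def is_emission_free(supplier: str, description: str) -> bool:
    """True if the line says so explicitly or names a known product of the company."""
    desc = description.lower()
    if any(keyword in desc for keyword in EXPLICIT_KEYWORDS):
        return True
    for company, products in EMISSION_FREE_PRODUCTS.items():
        # The company may appear as the invoicing party or in the description itself.
        company_found = company in supplier.lower() or company in desc
        if company_found and any(product in desc for product in products):
            return True
    return False
```

Requiring the company as well as the product name avoids matching a generic word that another supplier happens to use.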

CarbonLink also uses accounting information in categorization. If transactions posted to a specific account are known to always represent a specific activity, an identifier can be added for that account to aid categorization.
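One way to picture an account-based identifier is a lookup table consulted alongside the text-based prediction. The account numbers and categories below are hypothetical, and letting the account rule always override the prediction is an assumed design choice; how CarbonLink actually weighs account rules is not described here.

```python
from __future__ import annotations

# Hypothetical chart of accounts: account number -> the single activity it always contains.
ACCOUNT_RULES = {
    "4410": "electricity",
    "7350": "advertising",
}


def category_from_account(account: str, predicted: str | None) -> str | None:
    """Prefer a fixed account rule; otherwise keep the category predicted from the text."""
    return ACCOUNT_RULES.get(account, predicted)
```

A softer alternative would be to treat the account as one more scored trait, so that a strong text signal could still win.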

Including cost centers also improves identification and makes it possible to categorize and visualize emissions by cost center in the CarbonLink UI.
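Reporting by cost center amounts to grouping line-level results and summing them. A minimal sketch, assuming each line already carries a cost center and a calculated emission in kilograms; the function and cost center names are illustrative.

```python
from __future__ import annotations

from collections import defaultdict


def emissions_by_cost_center(lines: list[tuple[str, float]]) -> dict[str, float]:
    """Sum emissions (kg) per cost center; each line is a (cost_center, kg) pair."""
    totals: defaultdict[str, float] = defaultdict(float)
    for cost_center, kg in lines:
        totals[cost_center] += kg
    return dict(totals)


print(emissions_by_cost_center([("Sales", 120.0), ("R&D", 40.5), ("Sales", 30.0)]))
```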

From lump sums to more refined information

Information on invoices is sometimes a lump sum composed of many items, without any detail on what it consists of. This is especially common with travel invoices: flights, hotel, taxis, restaurants and other expenses are added together, usually for the convenience of the person drafting the travel expense claim. The practice is also common with subcontractors.

This creates challenges for carbon calculation, and it also makes it harder for financial management to track what has been bought and when. A better structure than a single lump sum with the description “Berlin travel expenses, XXX,XX euros” could be, for example:

Business trip, 1.1.2020-6.1.2020

  • Flights HEL-BER, AY1152, XX,XXX €
  • Hotel expenses, Radisson Berlin, 4 nights, XX,XXX €
  • Taxi ride to the airport, XX,XXX €
  • Restaurant expenses 1st day, XX,XXX €

And so on.

Each line now shows more precisely how the money was spent, so in carbon calculation the correct coefficient can be determined for each invoice line instead of applying a general coefficient to the lump sum. This gives a much more accurate result, and the breakdown of expenses is also better recorded for the company’s accounting purposes.
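The difference between the two approaches can be sketched in a few lines. All coefficients and amounts below are made up for illustration and do not represent real emission factors; the point is only that per-line coefficients and one general coefficient give different totals for the same spending.

```python
from __future__ import annotations

# Hypothetical spend-based coefficients in kg per euro (illustrative only).
COEFFICIENTS = {
    "flight": 1.0,
    "hotel": 0.3,
    "taxi": 0.5,
    "restaurant": 0.4,
    "travel, general": 0.6,
}

# Hypothetical itemized trip: (category, amount in euros).
TRIP = [("flight", 250.0), ("hotel", 480.0), ("taxi", 40.0), ("restaurant", 60.0)]


def itemized_kg(lines: list[tuple[str, float]]) -> float:
    """Apply the matching coefficient to every line separately."""
    return sum(amount * COEFFICIENTS[category] for category, amount in lines)


def lump_sum_kg(lines: list[tuple[str, float]]) -> float:
    """Add the lines together and apply one general travel coefficient."""
    return sum(amount for _, amount in lines) * COEFFICIENTS["travel, general"]
```

With these made-up figures, itemized_kg(TRIP) gives 438 kg and lump_sum_kg(TRIP) gives 498 kg.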

More detailed info on flights

Flights produce a lot of carbon emissions, and studies show that a disproportionately large share of an aircraft’s emissions occurs during takeoff and landing. Because these phases occur on every flight regardless of its length, emissions per kilometer differ with distance, and flights therefore have different coefficients depending on whether the flight is short (less than 1,500 kilometers), medium-length, or long.
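Assigning a flight to a distance band is a simple threshold check. The 1,500 km limit for short flights comes from the text; the boundary between medium-length and long flights is not stated, so the 3,700 km value below is an assumption, and the coefficient for each band is deliberately left out.

```python
SHORT_MAX_KM = 1500  # from the text: short flights are under 1,500 km
LONG_MIN_KM = 3700   # assumption: the text does not give this boundary


def distance_band(distance_km: float) -> str:
    """Classify a flight as 'short', 'medium' or 'long' by its distance."""
    if distance_km < SHORT_MAX_KM:
        return "short"
    if distance_km < LONG_MIN_KM:
        return "medium"
    return "long"
```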

In addition, airline ticket prices fluctuate and can be subsidized by frequent flyer programs (see example below). To calculate the true carbon emissions of a flight, the best information on the invoice is the departure and arrival airports plus the flight number. If this information is available on the invoice line, CarbonLink can calculate the carbon emissions of the flight very accurately.
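Knowing the airports makes the flight distance computable regardless of the ticket price. The sketch below uses the haversine great-circle formula, a common simplification that is not necessarily CarbonLink’s method; the coordinates are approximate, and the route actually flown is usually somewhat longer than the great circle.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Approximate airport coordinates (latitude, longitude in degrees), for illustration.
AIRPORTS = {
    "HEL": (60.317, 24.963),
    "BER": (52.367, 13.503),
}


def great_circle_km(origin: str, destination: str) -> float:
    """Haversine distance between two airports listed in AIRPORTS."""
    lat1, lon1 = map(math.radians, AIRPORTS[origin])
    lat2, lon2 = map(math.radians, AIRPORTS[destination])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

For the HEL-BER flight in the travel example, this gives roughly 1,100 km, which falls in the short-flight band.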

Spend vs. Activity

Refining the calculation from spend-based (in euros) to activity-based (in grams, kilowatt hours and liters) also helps with categorization and provides a more accurate result.

Activity-based calculation estimates the carbon footprint from actual consumption. Estimates based on euro amounts cannot take many factors into account, for example:

The flight was paid for partly with frequent flyer program discounts. Because part of the price is omitted from the invoice, the carbon footprint calculated from the price is lower than in reality.

The purchased products received a bulk discount because a larger quantity was bought at once. Producing them used the same amount of resources and generated the same amount of carbon as a full-price purchase, but because of the discounted price, the spend-based carbon footprint of the products is artificially lower than in reality.
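The bulk-discount effect can be reproduced with two hypothetical coefficients chosen so that both methods agree at full price. Neither value is a real emission factor; the sketch only shows that a discount lowers the spend-based figure while the activity-based figure stays the same.

```python
from __future__ import annotations

# Hypothetical coefficients: 10 EUR per unit * 0.2 kg/EUR == 2.0 kg per unit.
SPEND_KG_PER_EUR = 0.2
ACTIVITY_KG_PER_UNIT = 2.0


def compare(units: float, unit_price_eur: float, discount: float = 0.0) -> tuple[float, float]:
    """Return (spend-based kg, activity-based kg) for a purchase with an optional discount."""
    paid_eur = units * unit_price_eur * (1 - discount)
    return paid_eur * SPEND_KG_PER_EUR, units * ACTIVITY_KG_PER_UNIT


print(compare(100, 10.0))                # both methods give about 200 kg
print(compare(100, 10.0, discount=0.2))  # spend-based drops to about 160 kg
```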

In activity-based calculation, the carbon produced is calculated from the activity performed: kilometers traveled, grams, liters and kilowatt hours used, and so on. Activity-based calculation is therefore more accurate than spend-based calculation.

In the CarbonLink user interface, an icon on each calculation row indicates which calculation method the system was able to use. If both options are available, the system always prioritizes activity-based calculation.

Spend-based calculations are marked in CarbonLink with a euro sign and activity-based calculations with a lightning bolt icon.
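The priority rule can be sketched as a small function that tries the activity-based route first and falls back to spend-based. Only the priority order and the two symbols come from the text; the Line structure, its field names and the example factors in the usage line are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

ICONS = {"activity": "⚡", "spend": "€"}


@dataclass
class Line:
    amount_eur: float
    quantity: float | None = None         # e.g. kWh, liters or km, if on the invoice
    activity_factor: float | None = None  # kg per unit of quantity, if known
    spend_factor: float | None = None     # kg per euro, if known


def calculate(line: Line) -> tuple[str, float]:
    """Prefer activity-based calculation; fall back to spend-based; else raise."""
    if line.quantity is not None and line.activity_factor is not None:
        return ICONS["activity"], line.quantity * line.activity_factor
    if line.spend_factor is not None:
        return ICONS["spend"], line.amount_eur * line.spend_factor
    raise ValueError("no coefficient available for this line")


print(calculate(Line(700, quantity=1000, activity_factor=0.1, spend_factor=1.7)))
```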

Carbon calculations are evaluations

Finally, it is worth noting that although CarbonLink’s carbon calculator is based on financial data, carbon calculation is not financial management or a one-size-fits-all solution. It produces estimates based on assessment and scientific research. The field of carbon calculation is constantly evolving, new research results are published all the time, and calculation methods and emission factors are refined along with them.
