Energizta

Energizta is a collaborative project to collect and report open data on the energy consumption of servers.

Warning: this is still a very early stage project. Any feedback or contribution will be highly appreciated.

(Figure: principle of Energizta)

What

The science of evaluating the energy consumption of computers is still at an early stage.

Several approaches have been used to measure (or model) the power consumption of computers:

  • RAPL can be used on all recent Intel and AMD CPUs. Most energy metrology agents use it to estimate CPU+RAM consumption, which represents the majority of the power consumption of most common servers (CPU, RAM, SSD or disk… no GPU). However, it is not always possible to access RAPL data: it requires a recent Linux kernel, recent hardware, and root access. It also requires a monitoring agent that logs data frequently, because you cannot ask RAPL once a year how much energy was consumed over the past year. Finally, in environments such as public clouds or VMs, the RAPL interfaces are not accessible.

  • IPMI (DCMI) can provide the power consumption reported by the power supply unit, but we don't know for sure how this metric is calculated. Access to DCMI also seems to be very rare at public bare-metal hosting providers, so we cannot get this metric on every dedicated server.

  • lm-sensors can provide the ACPI power, but again this seems to be very rare, and we don't know how this metric is calculated; more research is needed.

  • A PDU could provide the data, but most users do not have access to the datacenter's PDUs.

  • A wattmeter between the server and its power outlet could provide the data, but obviously this is not available on public bare-metal servers.

  • Models have been developed to estimate consumption from proxy metrics. They often rely on SPECpower data or on data collected through unnormalized processes. Unfortunately, the granularity of these data is often too coarse to infer good-quality models.

These methods give different results in the same environment. For instance, the first tests comparing RAPL measurements (CPU+RAM) to total power consumption (from DCMI or a wattmeter) seem to indicate that the global power can be between 1 and 10 times what RAPL reports (again, on standard dedicated servers without GPU).

So how do we bridge these gaps? Can we estimate the total power from RAPL alone? Should we add some fixed costs for hardware? Should we add storage I/O? Network? Can temperature help? How much precision can we get with partial information? How precisely can we estimate a server's yearly energy consumption with only hardware specs and proxy metrics?
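To make the "RAPL plus a fixed cost" question concrete, the simplest candidate model is a straight line, total_watt = slope × rapl_watt + intercept, where the intercept plays the role of the fixed hardware cost. The sketch below fits it by ordinary least squares from paired samples taken on the same host (for instance RAPL alongside DCMI or wattmeter readings). The helper name fit_linear and the linear form are assumptions for illustration only, not a model Energizta has validated.

```shell
# fit_linear  (hypothetical helper)
# Reads "rapl_watt total_watt" pairs on stdin, one per line, and prints the
# least-squares slope and intercept of: total = slope * rapl + intercept
fit_linear() {
    awk '
        { n++; sx += $1; sy += $2; sxx += $1 * $1; sxy += $1 * $2 }
        END {
            if (n < 2) { print "need at least 2 points" > "/dev/stderr"; exit 1 }
            den = n * sxx - sx * sx
            if (den == 0) { print "all RAPL values are identical" > "/dev/stderr"; exit 1 }
            slope = (n * sxy - sx * sy) / den
            intercept = (sy - slope * sx) / n
            printf "slope=%.3f intercept=%.3f\n", slope, intercept
        }'
}
```

With synthetic points lying exactly on total = 2 × rapl + 30, such as `printf '10 50\n20 70\n30 90\n' | fit_linear`, it prints `slope=2.000 intercept=30.000`. Real data will scatter around any line, which is exactly the precision question raised above.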

Energizta is trying to address these problems and provide a set of tools to report and model the power consumption of servers with as much precision as possible.

How?

  1. With a script that collects the hardware configuration and retrieves power consumption metrics on bare-metal servers in different states, using different methods.
  2. With a "citizen science" database where anyone can contribute by uploading the information returned by the script. This database will be open data and should allow research scientists to work on models and equations describing power usage based on hardware specs and proxy metrics (real-time or average).
  3. With an API that will compile the results of these models and equations to provide power consumption estimates based on what is available to users in their context.

Quickstart for Energizta contributors

This page is dedicated to people who want to contribute quickly to the Energizta data repository. If you wish to understand the details of the project, please read the rest of the documentation.

wget https://raw.githubusercontent.com/Boavizta/Energizta/main/energizta/energizta.sh
chmod +x energizta.sh
sudo apt-get install gawk sed curl lshw stress-ng

sudo ./energizta.sh --stresstest --send-to-db

This will run a series of stress tests while collecting power consumption data, then send the collected data to the energizta-db.boavizta.org database with your consent. If your machine has not sent any data yet, energizta.sh will also collect a series of variables characterizing the technical configuration of the host.

-- 2023-03-03 18:13:46 - INFO: This test should take 240s
-- 2023-03-03 18:13:46 - INFO: Running "sleep 120" for 60 seconds
{"host": "d04e0818-3d98-41ad-b516-7b735809a0bf_1f9684433f690826ce392919c8022beb_82e8c7a4c323a6df478f95c0683fb084","interval_us": 6400207,"duration_us": 51201667,"nb_states": 8,"cpu_iowait_pct": 0,"cpu_sys_pct": 2,"cpu_usr_pct": 97,"load1": 11.61,"mem_free_MB": 2099,"mem_total_MB": 7708,"mem_used_MB": 2339,"sda_pct_busy": 4,"sda_read_kBps": 1271,"sda_write_kBps": 198,"powers": {"rapl_dram_0_watt": 1,"rapl_package_0_watt": 8,"rapl_total_watt": 9},"energizta_version": "0.1a"}
 
-- 2023-03-03 18:16:11 - INFO: Running "stress-ng -q --cpu 4" for 60 seconds
{"host": "d04e0818-3d98-41ad-b516-7b735809a0bf_1f9684433f690826ce392919c8022beb_82e8c7a4c323a6df478f95c0683fb084","interval_us": 6457377,"duration_us": 51659033,"nb_states": 8,"cpu_iowait_pct": 0,"cpu_sys_pct": 2,"cpu_usr_pct": 97,"load1": 13.35,"mem_free_MB": 2133,"mem_total_MB": 7708,"mem_used_MB": 2317,"sda_pct_busy": 0,"sda_read_kBps": 3,"sda_write_kBps": 202,"powers": {"rapl_dram_0_watt": 0,"rapl_package_0_watt": 8,"rapl_total_watt": 8},"energizta_version": "0.1a"}

... 

-- 2023-03-03 18:21:24 - INFO: Running "stress-ng -q --cpu 8" for 60 seconds
{"host": "d04e0818-3d98-41ad-b516-7b735809a0bf_1f9684433f690826ce392919c8022beb_82e8c7a4c323a6df478f95c0683fb084","interval_us": 6602975,"duration_us": 52823821,"nb_states": 8,"cpu_iowait_pct": 0,"cpu_sys_pct": 0,"cpu_usr_pct": 99,"load1": 17.32,"mem_free_MB": 2076,"mem_total_MB": 7708,"mem_used_MB": 2330,"sda_pct_busy": 0,"sda_read_kBps": 104,"sda_write_kBps": 139,"powers": {"rapl_dram_0_watt": 0,"rapl_package_0_watt": 8,"rapl_total_watt": 8},"energizta_version": "0.1a"}
 
=> Do you still want to send above data to Boavizta's Energizta database? (y/n) y
 
Checking if d04e0818-3d98-41ad-b516-7b735809a0bf_1f9684433f690826ce392919c8022beb_82e8c7a4c323a6df478f95c0683fb084 is registered in Boavizta's Energizta database…
We need to register some information about your hardware and software.
It should be completely anonymous:

{
"id": "d04e0818-3d98-41ad-b516-7b735809a0bf_1f9684433f690826ce392919c8022beb_82e8c7a4c323a6df478f95c0683fb084",
"hardware": [
...
],
"software": {...}
}

=> Do you allow to register this on Boavizta's Energizta database? (y/n) y

Registering d04e0818-3d98-41ad-b516-7b735809a0bf_1f9684433f690826ce392919c8022beb_82e8c7a4c323a6df478f95c0683fb084 in Boavizta's Energizta database…
This host is now registered.

Sending results to Energizta collaborative database…

Done. Thank you!
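Each state in the output above is a single-line JSON object, so states can be post-processed with standard tools. The sketch below is a hypothetical helper, states_table, that prints selected numeric fields of every state line and skips the log lines. It assumes the output was saved to a file beforehand (states.jsonl is a placeholder name) and that every state line contains a "powers" object, as in the samples above; a real JSON parser such as jq would be more robust.

```shell
# states_table FIELD...  (hypothetical helper)
# For each JSON state line on stdin, print the numeric value of each FIELD
# (space-separated), or NA when the field is missing or not numeric.
states_table() {
    awk -v fields="$*" '
        function num(line, key,    s) {
            if (!match(line, "\"" key "\": *[0-9.]+")) return "NA"
            s = substr(line, RSTART, RLENGTH)
            sub(/.*: */, "", s)          # keep only the number
            return s
        }
        /"powers"/ {                     # state lines only, log lines are skipped
            n = split(fields, f, " ")
            out = ""
            for (i = 1; i <= n; i++) out = out (i > 1 ? " " : "") num($0, f[i])
            print out
        }'
}

# Example: CPU usage, load and RAPL power for each state
# states_table cpu_usr_pct load1 rapl_total_watt < states.jsonl
```

Applied to the last state shown above, `states_table cpu_usr_pct rapl_total_watt` would print `99 8`.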

FAQ

Shouldn't this be done at another level?

Yes, of course. Power usage should be provided by the datacenter operator or the hosting provider, and we believe (hope?) it will be soon, because clients will start asking for it.

But for now, it is not. And even once it is, how can you tell whether the value you get is plausible? Take this example: "I have a server with an Intel Xeon E3-1240v6, 32GB RAM, 2 500GB SSD. I know last year it had a load average of 3." Can you guess its power consumption? We have discussed it among ourselves: some say 20W, some say 60W, some say 200W.

We need to get a feeling for the numbers we are looking at and working with. And this project should at least get us there.

Is this all I need to calculate environmental impacts?

No, of course not. This project will only help to measure or model the power consumption of a given server. To estimate usage-related impacts, you should at least apply the datacenter's PUE, and then look at the impacts of the energy mix where the server runs.
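As arithmetic, the usage-phase greenhouse gas estimate described above is roughly: server energy × PUE × carbon intensity of the local electricity mix. The hypothetical helper below just multiplies the three factors; the values in the usage comment are placeholders, not real PUE or grid data.

```shell
# usage_ghg_kg SERVER_KWH PUE INTENSITY_KG_PER_KWH  (hypothetical helper)
# Facility energy = server energy * PUE; emissions = facility energy * grid intensity.
usage_ghg_kg() {
    awk -v e="$1" -v pue="$2" -v ci="$3" 'BEGIN { printf "%.1f\n", e * pue * ci }'
}

# Placeholder inputs: 500 kWh/year, PUE 1.5, 0.1 kgCO2e/kWh
# usage_ghg_kg 500 1.5 0.1
```

With these placeholder inputs, the result is 75.0 kgCO2e. This covers only greenhouse gases during usage; the other lifecycle stages and impact criteria discussed next are not included.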

But you should also include the other stages of the server lifecycle, such as raw material extraction, manufacturing, transport and end of life.

Finally, you should not only look at greenhouse gas emissions but take a multi-criteria approach that accounts for other impacts such as abiotic depletion, water usage, ...

What about cloud usage? What about VMs? How can I know how much my application consumes?

This is a matter of allocation: how should the impacts of a physical element (a server) be allocated to the different functions it fulfills? Since we only focus on the physical layer (servers), this question is out of the scope of the project.

See our approach for the cloud: https://doc.dev.api.boavizta.org/Explanations/devices/cloud/

Setup energizta.sh

energizta.sh is a simple script that focuses on retrieving all the information that can be used to estimate the power consumption of bare-metal servers as precisely as possible.

It tries to find all available power metrics. Some are partial (RAPL), some should be global (DCMI, lm-sensors, PDU…), and some can even be entered by a user reading a wattmeter. The primary goal is to collect as much data as possible for scientists to build models on.

This first version is written in Bash 4 and only depends on a few standard tools (awk, sed, curl, lshw). The goal is to provide a simple script that anyone can run on any recent Linux server.

How to install

wget https://raw.githubusercontent.com/Boavizta/Energizta/main/energizta/energizta.sh
chmod +x energizta.sh
sudo apt-get install gawk sed curl lshw

How to use

./energizta.sh --help
sudo ./energizta.sh

It will run until you use Ctrl+C to stop it.

energizta.sh provides various options, documented in ./energizta.sh --help

Main options

--interval INTERVAL   Measure server state each INTERVAL seconds (default 5)
--duration DURATION   Stop each stresstest after DURATION seconds (default 60)
--once                Do not loop, print one state and exit
--manual-input        Ask the user to enter power metrics manually (from a Wattmeter…)

--debug               Display debug outputs
--continuous          Display the current state every INTERVAL seconds instead of an average state every DURATION seconds
--energy-only         Only displays energy variables instead of global state (load, cpu, etc.)
--with_timestamp      Include timestamp in displayed variables
--with_date           Include datetime in displayed variables
--short-host-id       Use shorter string as HOST_ID and avoid the need for lshw (not compatible with --send-to-db)
--force-host-id ID    Force an alternative HOST_ID, use $(hostname) for instance (not compatible with --send-to-db)

This script should not be used with an INTERVAL lower than 1, because each loop can take 500ms and would therefore cause significant load. Also, the longer the INTERVAL between loops, the lower the margin of error on interval-dependent measures (disk usage, RAPL power). On the other hand, some metrics are real-time metrics (temperature, DCMI, used memory), so the longer the interval, the less representative those metrics are of the period. I believe an INTERVAL of 2 to 10s is ideal.
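The trade-off above can be put in rough numbers. Assuming the script is busy for about LOOP seconds per loop (500ms, as quoted above), it occupies roughly LOOP / INTERVAL of the wall-clock time; and if each counter read is off by a fixed timing error (a hypothetical 50ms default here), the relative error on a rate such as RAPL power is roughly that error divided by INTERVAL. The helper name and the timing-error value are assumptions, not measurements of energizta.sh.

```shell
# interval_tradeoff INTERVAL_S [LOOP_S] [TIMING_ERR_S]  (hypothetical helper)
# script_busy ~ LOOP_S / INTERVAL_S; rate_error ~ TIMING_ERR_S / INTERVAL_S
interval_tradeoff() {
    awk -v i="$1" -v loop="${2:-0.5}" -v err="${3:-0.05}" 'BEGIN {
        printf "interval=%ss script_busy=%.0f%% rate_error=~%.1f%%\n", i, 100 * loop / i, 100 * err / i
    }'
}

for i in 1 2 5 10; do interval_tradeoff "$i"; done
```

Both percentages shrink as the interval grows, while real-time metrics such as temperature get sampled less often, which is why an intermediate interval is a reasonable compromise.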

Default mode

By default, energizta.sh collects and reports metrics describing the current use of the machine it runs on.

Stress test mode

To collect the widest variety of data, Energizta can run stress tests that put your server at various load levels. It makes your server work at 10%, 50%, 100%… and takes measurements for each state.

To do this we use https://github.com/ColinIanKing/stress-ng

On Debian: sudo apt-get install stress-ng

sudo ./energizta.sh --stresstest [--debug]

By default, it will run… TODO

Alternate stress tests

If you want to run your own stress tests, you can provide your own file. Each line of the file should be a stress test command that runs for at least DURATION seconds (you don't want the stress test to stop before the measurements end). The command does not have to stop by itself: energizta.sh kills it after DURATION seconds.

sudo ./energizta.sh --stressfile my_stress_tests.txt
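A stress file can be generated rather than written by hand. The sketch below is a hypothetical helper that writes an idle baseline (a long sleep, as in the sample run above) followed by CPU load levels in the spirit of the 10%, 50%, 100% steps mentioned earlier. It assumes stress-ng's --cpu-load option and expands $(nproc) when the file is generated, since it is not documented whether energizta.sh expands shell substitutions inside the file.

```shell
# write_stress_file FILE  (hypothetical helper)
# Writes one long-running stress command per line: an idle baseline first,
# then all CPUs loaded at increasing percentages. energizta.sh stops each
# command after DURATION seconds.
write_stress_file() {
    ncpu=$(nproc)
    {
        echo "sleep 3600"                 # idle baseline
        for load in 10 25 50 75 100; do
            echo "stress-ng -q --cpu $ncpu --cpu-load $load"
        done
    } > "$1"
}

# Usage:
# write_stress_file my_stress_tests.txt
# sudo ./energizta.sh --stressfile my_stress_tests.txt
```

Intermediate load levels give more points between idle and full load, which helps when fitting power against CPU usage.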

About the "host" variable

The "host" variable will be used in our database to group states by host, to study one host, or to exclude one host from the study.

It is composed of 3 parts:

  • the UUID of the / partition. This UUID will not change between runs and should be unique to your computer, but it is also completely anonymous and cannot be used to identify your computer on the internet. That's why we did not use the hostname or the MAC address.
  • the md5sum of the hardware description: lshw -short (with some filtering)
  • the md5sum of the software description: arch, uname -a (minus the hostname) and lsb_release -ds

The idea is that hardware and software upgrades can affect power consumption, so states recorded after an upgrade must be grouped under a different ID.
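The three-part structure can be sketched as follows. This is an approximation under stated assumptions: make_host_id is a hypothetical helper, the exact filtering that energizta.sh applies to the lshw and uname output is not reproduced, and the commented commands (findmnt, lshw, uname without -n, lsb_release) are only one plausible way to gather the inputs.

```shell
# make_host_id ROOT_UUID HARDWARE_TEXT SOFTWARE_TEXT  (hypothetical helper)
# Builds UUID_md5(hardware)_md5(software): a hardware or software change
# alters one hash, so the host gets a new ID while the UUID part stays stable.
make_host_id() {
    hw=$(printf '%s' "$2" | md5sum | cut -d' ' -f1)
    sw=$(printf '%s' "$3" | md5sum | cut -d' ' -f1)
    printf '%s_%s_%s\n' "$1" "$hw" "$sw"
}

# Possible inputs on a real host (lshw needs root; no filtering applied here):
# root_uuid=$(findmnt -n -o UUID /)
# hardware=$(lshw -short 2>/dev/null)
# software="$(uname -srvm) $(lsb_release -ds)"   # uname without -n: no hostname
# make_host_id "$root_uuid" "$hardware" "$software"
```

Because only the last part depends on the software description, upgrading the kernel changes the third hash but keeps the first two parts identical, which makes it easy to relate the old and new IDs of the same machine.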

If you want a shorter or custom ID, you can use:

--short-host-id       Use shorter string as HOST_ID and avoid the need for lshw
--force-host-id ID    Force an alternative HOST_ID, use $(hostname) for instance

These options are not compatible with --send-to-db.

How to send us your results!

The main goal of this tool is to stress test your computer or bare-metal server and send the results to the Energizta collaborative database, along with a description of its hardware and OS.

sudo ./energizta.sh --stresstest --send-to-db

The data sent to the collaborative database should be completely anonymous and should not be enough to identify your computer or server (no hostname, no IP, no MAC, etc.). The script displays the data and asks for your confirmation before sending it.

How to run in the background (with systemd)

First, move energizta.sh to /usr/local/sbin/energizta.sh.

Then, create a new file /etc/systemd/system/energizta.service

[Unit]
Description=Energizta

[Service]
ExecStart=/bin/sh -c '/usr/local/sbin/energizta.sh --interval 10 --duration 60 --with-timestamp --with-date --short-host-id >> /var/log/energizta.log'

[Install]
WantedBy=multi-user.target

Then enable and start the service:

systemctl daemon-reload
systemctl enable energizta
systemctl start energizta
tail -f /var/log/energizta.log

Please be aware that energizta.sh outputs JSON lines that can take up a lot of space over time. Use --duration to set the time between log lines (60s by default), and configure logrotate accordingly.
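A minimal logrotate sketch, assuming the service appends to /var/log/energizta.log (adjust the path to match your ExecStart= line) and that the file is saved as /etc/logrotate.d/energizta. The retention values are arbitrary examples. copytruncate is used because the running shell keeps the log file open in append mode, so the file has to be truncated in place rather than moved.

```
/var/log/energizta.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```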

Also be aware that, due to the current implementation, energizta.sh accumulates a large number of variables after a few days, which can cause significant load. You should restart this daemon at least once a day.
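One way to automate the daily restart is a systemd drop-in, sketched below under the assumption that your systemd version supports RuntimeMaxSec= for this service type. After one day the service is stopped as timed out, and Restart=always starts it again. Save it as /etc/systemd/system/energizta.service.d/restart.conf, then run systemctl daemon-reload and systemctl restart energizta. A daily cron job running systemctl restart energizta is a simpler alternative.

```
[Service]
RuntimeMaxSec=1d
Restart=always
```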

Questions to be answered

To help you understand how to use the data collected in Energizta, here is a list of questions we would like to answer.

  • I have a server with an Intel Xeon E3-1240v6, 32GB RAM, 2 500GB SSD. I know last year it had a load average of 3. Can we tell how much energy it consumed? With what precision? Looking at existing data for similar hardware, we should at least be able to provide a minimum and a maximum power consumption. See Boaviztapi's consumption profile for more information on this use case.

  • I am running a metrology agent on the same server (Xeon…, 32GB RAM… etc.). RAPL tells me that CPU+RAM currently draw 14W, and I have a load1 of 3, 14% cpu_user, 10% cpu_sys, 0% cpu_iowait, 5GB RAM used, etc… but I don't have access to IPMI. Can we tell the global power usage? Again, with what precision?

  • I have an AWS EC2 a1.medium instance. I know my average CPU load from the AWS API. Looking at existing data for similar hardware, we should at least be able to provide a minimum and a maximum power consumption. See cloud-scanner for more information on this use case.
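Once a minimum and a maximum power are available, turning them into a yearly energy range is simple arithmetic: energy (kWh) = power (W) × hours / 1000, with 8760 hours in a non-leap year. The hypothetical helper below does this; the 20W and 200W figures in the usage comment are the guesses quoted in the FAQ, not measured data.

```shell
# yearly_kwh_range P_MIN_W P_MAX_W [HOURS]  (hypothetical helper)
# Converts a power range in watts into an energy range in kWh (default: 8760 h = 1 year).
yearly_kwh_range() {
    awk -v lo="$1" -v hi="$2" -v h="${3:-8760}" 'BEGIN {
        printf "min=%.1f kWh max=%.1f kWh\n", lo * h / 1000, hi * h / 1000
    }'
}

# yearly_kwh_range 20 200
```

For the 20W to 200W guesses, this gives 175.2 to 1752.0 kWh per year: a tenfold range, which shows why better data is needed to narrow it down.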

Data pipeline

In progress...