How to model a database
e-footprint does not have a first-class abstraction for databases. A
database is modeled by composing the primitives used for any backend
workload: a Server, a Storage, and a set of Jobs.
A database-flavoured Service, plugged in through the server's
installed-services mechanism, would bundle a sensible engine baseline
and canonical query templates — that would be a welcome contribution.
How a database maps onto the primitives
- Engine baseline. Use Server.base_ram_consumption and Server.base_compute_consumption for the RAM and CPU the engine holds independently of queries — buffer pool, connection workers, background tasks. Both default to zero, so they are easy to forget.
- On-disk footprint. Use Storage.base_storage_need for existing tables and indexes at t=0, Storage.data_storage_duration for retention, Storage.data_replication_factor for replicas. Storage already accounts for both the fabrication and the operational energy of the persistent volumes; do not model them separately.
- Query workload. Use one Job per kind of operation, not one averaged job. Set Job.compute_needed, Job.ram_needed, and Job.request_duration for what a single query holds; Job.data_transferred for the network payload; Job.data_stored for the net per-call storage growth.
Python sketch
storage = Storage(
"PostgreSQL storage",
base_storage_need=SourceValue(100 * u.GB_stored),
)
db_server = Server(
"PostgreSQL server",
base_ram_consumption=SourceValue(2 * u.GB_ram),
base_compute_consumption=SourceValue(0.1 * u.cpu_core),
storage=storage,
# remaining params fall back to defaults
)
read_query = Job(
"SELECT", server=db_server,
request_duration=SourceValue(20 * u.ms),
compute_needed=SourceValue(0.1 * u.cpu_core),
ram_needed=SourceValue(20 * u.MB_ram),
data_transferred=SourceValue(5 * u.kB),
data_stored=SourceValue(0 * u.kB),
)
write_query = Job(
"INSERT", server=db_server,
request_duration=SourceValue(40 * u.ms),
compute_needed=SourceValue(0.2 * u.cpu_core),
ram_needed=SourceValue(50 * u.MB_ram),
data_transferred=SourceValue(2 * u.kB),
data_stored=SourceValue(0.5 * u.kB),
)
The values above sketch a modest PostgreSQL deployment under steady
load: a couple of gigabytes of shared buffers, continuous background
workers (checkpointer, autovacuum), and queries in the millisecond
range. They are illustrative, not measured — profile your own setup or
cite a credible source in a Source.
Pitfalls
- Averaging reads and writes into a single job. They differ by
orders of magnitude on
data_storedand often oncompute_needed. Model the mix via multiple Job instances. - Confusing transfer and storage. Job.data_transferred is the per-call network payload; Job.data_stored is the per-call contribution to durable storage. Both can be non-zero on the same job.
- Forgetting the engine baseline. The defaults for Server.base_ram_consumption and Server.base_compute_consumption are zero. A database with no queries still consumes resources.
See server_to_server_interaction for how to wire these jobs into a usage journey alongside an upstream web server.
Load this scenario in the e-footprint interface: Database modeling