All Statsd Metrics

account-auditor Metrics

========================  ===========
Metric Name               Description
========================  ===========
account-auditor.errors    Count of audit runs (across all account databases) which caught an Exception.
account-auditor.passes    Count of individual account databases which passed audit.
account-auditor.failures  Count of individual account databases which failed audit.
account-auditor.timing    Timing data for individual account database audits.
========================  ===========
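
Each counter and timer listed here reaches the StatsD server as a single plain-text line. The sketch below builds such lines. It assumes the common StatsD line protocol: ``<name>:<value>|c`` for counters, ``<name>:<value>|ms`` for timers, and an optional ``|@<rate>`` suffix for sampled metrics. The helper names are hypothetical and not part of Swift. The UDP helper assumes a StatsD daemon listening on localhost port 8125, which is a common default.

.. code-block:: python

   import socket


   def statsd_line(name: str, value: float, metric_type: str,
                   sample_rate: float = 1.0) -> str:
       """Build one StatsD line: <name>:<value>|<type>[|@<sample_rate>]."""
       line = f"{name}:{value}|{metric_type}"
       if sample_rate < 1.0:
           # Sampled metrics carry their rate so the server can scale them up
           line += f"|@{sample_rate}"
       return line


   def send_line(line: str, host: str = "localhost", port: int = 8125) -> None:
       """Fire-and-forget UDP send (hypothetical helper)."""
       with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
           sock.sendto(line.encode("utf-8"), (host, port))


   # One audited database: bump the pass counter and report how long it took
   print(statsd_line("account-auditor.passes", 1, "c"))      # account-auditor.passes:1|c
   print(statsd_line("account-auditor.timing", 12.5, "ms"))  # account-auditor.timing:12.5|ms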

account-reaper Metrics

============================================  ===========
Metric Name                                   Description
============================================  ===========
account-reaper.errors                         Count of devices failing the mount check.
account-reaper.timing                         Timing data for each reap_account() call.
account-reaper.return_codes.X                 Count of HTTP return codes from various operations (e.g. object listing, container deletion, etc.). The value for X is the first digit of the return code (2 for 201, 4 for 404, etc.).
account-reaper.containers_failures            Count of failures to delete a container.
account-reaper.containers_deleted             Count of containers successfully deleted.
account-reaper.containers_remaining           Count of containers which failed to delete with zero successes.
account-reaper.containers_possibly_remaining  Count of containers which failed to delete with at least one success.
account-reaper.objects_failures               Count of failures to delete an object.
account-reaper.objects_deleted                Count of objects successfully deleted.
account-reaper.objects_remaining              Count of objects which failed to delete with zero successes.
account-reaper.objects_possibly_remaining     Count of objects which failed to delete with at least one success.
============================================  ===========
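
The sketch below shows how the return_codes.X bucket and the container outcome counters could be derived from the per-replica responses to a single container DELETE. It treats any 2xx response as a success. It counts a container as deleted when successes outnumber failures. Both rules are assumptions for illustration, because the table above only defines the two “remaining” cases. The function names are hypothetical.

.. code-block:: python

   from collections import Counter
   from typing import List


   def return_code_metric(status: int) -> str:
       """Map an HTTP status to account-reaper.return_codes.X (X = first digit)."""
       if not 100 <= status <= 599:
           raise ValueError(f"not an HTTP status code: {status}")
       return f"account-reaper.return_codes.{status // 100}"


   def reap_container_metrics(replica_statuses: List[int]) -> Counter:
       """Tally the metrics one container DELETE would emit (illustrative only)."""
       metrics: Counter = Counter()
       successes = failures = 0
       for status in replica_statuses:
           metrics[return_code_metric(status)] += 1
           if 200 <= status < 300:  # assumption: any 2xx counts as a success
               successes += 1
           else:
               failures += 1
               metrics["account-reaper.containers_failures"] += 1
       # Assumed outcome rule: more successes than failures means "deleted"
       if successes > failures:
           metrics["account-reaper.containers_deleted"] += 1
       elif successes == 0:
           metrics["account-reaper.containers_remaining"] += 1
       else:
           metrics["account-reaper.containers_possibly_remaining"] += 1
       return metrics

For example, replica responses of 204, 503 and 404 would record one success and two failures. Under these assumptions, that container is counted as possibly remaining.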

account-server Metrics

.. note::

   “Not Found” is not considered an error. Requests that result in an error are not included in the regular timing data; they are recorded in the corresponding errors.timing metric instead.

======================================  ===========
Metric Name                             Description
======================================  ===========
account-server.DELETE.errors.timing     Timing data for each DELETE request resulting in an error: bad request, not mounted, missing timestamp.
account-server.DELETE.timing            Timing data for each DELETE request not resulting in an error.
account-server.PUT.errors.timing        Timing data for each PUT request resulting in an error: bad request, not mounted, conflict, recently-deleted.
account-server.PUT.timing               Timing data for each PUT request not resulting in an error.
account-server.HEAD.errors.timing       Timing data for each HEAD request resulting in an error: bad request, not mounted.
account-server.HEAD.timing              Timing data for each HEAD request not resulting in an error.
account-server.GET.errors.timing        Timing data for each GET request resulting in an error: bad request, not mounted, bad delimiter, account listing limit too high, bad accept header.
account-server.GET.timing               Timing data for each GET request not resulting in an error.
account-server.REPLICATE.errors.timing  Timing data for each REPLICATE request resulting in an error: bad request, not mounted.
account-server.REPLICATE.timing         Timing data for each REPLICATE request not resulting in an error.
account-server.POST.errors.timing       Timing data for each POST request resulting in an error: bad request, bad or missing timestamp, not mounted.
account-server.POST.timing              Timing data for each POST request not resulting in an error.
======================================  ===========
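
The note above effectively routes each request's duration to one of two timing metrics. The sketch below assumes that any 2xx or 3xx response, or a 404 Not Found, is a non-error, and that every other status is an error. This status-code mapping is an assumption for illustration, not something the tables state. timing_metric_name and timed_call are hypothetical helpers.

.. code-block:: python

   import time
   from typing import Callable, Tuple


   def timing_metric_name(server: str, method: str, status: int) -> str:
       """Choose <server>.<METHOD>.timing or <server>.<METHOD>.errors.timing."""
       # Assumption: 2xx, 3xx and 404 are non-errors; every other status is an error
       is_error = not (200 <= status < 400 or status == 404)
       suffix = "errors.timing" if is_error else "timing"
       return f"{server}.{method}.{suffix}"


   def timed_call(server: str, method: str,
                  handler: Callable[[], int]) -> Tuple[str, float]:
       """Run a handler that returns an HTTP status; return (metric, elapsed ms)."""
       start = time.monotonic()
       status = handler()
       elapsed_ms = (time.monotonic() - start) * 1000
       # Each request is recorded under exactly one of the two timing metrics
       return timing_metric_name(server, method, status), elapsed_ms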

account-replicator Metrics

===================================  ===========
Metric Name                          Description
===================================  ===========
account-replicator.diffs             Count of syncs handled by sending differing rows.
account-replicator.diff_caps         Count of “diffs” operations which failed because “max_diffs” was hit.
account-replicator.no_changes        Count of accounts found to be in sync.
account-replicator.hashmatches       Count of accounts found to be in sync via hash comparison (broker.merge_syncs was called).
account-replicator.rsyncs            Count of completely missing accounts which were sent via rsync.
account-replicator.remote_merges     Count of syncs handled by sending the entire database via rsync.
account-replicator.attempts          Count of database replication attempts.
account-replicator.failures          Count of database replication attempts which failed due to corruption (quarantined) or inability to read, plus replication attempts to individual nodes which failed.
account-replicator.removes.<device>  Count of databases on <device> deleted because the delete_timestamp was greater than the put_timestamp and the database had no rows, or because it was successfully synced to other locations and doesn’t belong here anymore.
account-replicator.successes         Count of replication attempts to an individual node which were successful.
account-replicator.timing            Timing data for each database replication attempt not resulting in a failure.
===================================  ===========
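
The removal rule behind account-replicator.removes.<device> combines two independent conditions. The container-replicator counterpart uses the same rule. The sketch below restates the rule as the table describes it. The DbState fields are hypothetical stand-ins for what the replicator knows about a database. The real replicator may apply further checks that are not described here.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class DbState:
       """Hypothetical snapshot of one account or container database."""
       put_timestamp: float
       delete_timestamp: float
       row_count: int
       belongs_here: bool      # does the ring place this database on this node?
       synced_elsewhere: bool  # has it been synced to its proper locations?


   def should_remove(db: DbState) -> bool:
       # Case 1: deleted after its last PUT and holding no rows
       deleted_and_empty = (db.delete_timestamp > db.put_timestamp
                            and db.row_count == 0)
       # Case 2: a copy that does not belong here and was already synced away
       handed_off = not db.belongs_here and db.synced_elsewhere
       return deleted_and_empty or handed_off


   def removes_metric(server: str, device: str) -> str:
       """Per-device counter name, e.g. account-replicator.removes.sdb1."""
       return f"{server}.removes.{device}"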

container-auditor Metrics

==========================  ===========
Metric Name                 Description
==========================  ===========
container-auditor.errors    Incremented when an Exception is caught in an audit pass (at most once per pass).
container-auditor.passes    Count of individual containers passing an audit.
container-auditor.failures  Count of individual containers failing an audit.
container-auditor.timing    Timing data for each container audit.
==========================  ===========

container-replicator Metrics

=====================================  ===========
Metric Name                            Description
=====================================  ===========
container-replicator.diffs             Count of syncs handled by sending differing rows.
container-replicator.diff_caps         Count of “diffs” operations which failed because “max_diffs” was hit.
container-replicator.no_changes        Count of containers found to be in sync.
container-replicator.hashmatches       Count of containers found to be in sync via hash comparison (broker.merge_syncs was called).
container-replicator.rsyncs            Count of completely missing containers which were sent via rsync.
container-replicator.remote_merges     Count of syncs handled by sending the entire database via rsync.
container-replicator.attempts          Count of database replication attempts.
container-replicator.failures          Count of database replication attempts which failed due to corruption (quarantined) or inability to read, plus replication attempts to individual nodes which failed.
container-replicator.removes.<device>  Count of databases on <device> deleted because the delete_timestamp was greater than the put_timestamp and the database had no rows, or because it was successfully synced to other locations and doesn’t belong here anymore.
container-replicator.successes         Count of replication attempts to an individual node which were successful.
container-replicator.timing            Timing data for each database replication attempt not resulting in a failure.
=====================================  ===========

container-server Metrics

.. note::

   “Not Found” is not considered an error. Requests that result in an error are not included in the regular timing data; they are recorded in the corresponding errors.timing metric instead.

========================================  ===========
Metric Name                               Description
========================================  ===========
container-server.DELETE.errors.timing     Timing data for DELETE request errors: bad request, not mounted, missing timestamp, conflict.
container-server.DELETE.timing            Timing data for each DELETE request not resulting in an error.
container-server.PUT.errors.timing        Timing data for PUT request errors: bad request, missing timestamp, not mounted, conflict.
container-server.PUT.timing               Timing data for each PUT request not resulting in an error.
container-server.HEAD.errors.timing       Timing data for HEAD request errors: bad request, not mounted.
container-server.HEAD.timing              Timing data for each HEAD request not resulting in an error.
container-server.GET.errors.timing        Timing data for GET request errors: bad request, not mounted, parameters not UTF-8, bad accept header.
container-server.GET.timing               Timing data for each GET request not resulting in an error.
container-server.REPLICATE.errors.timing  Timing data for REPLICATE request errors: bad request, not mounted.
container-server.REPLICATE.timing         Timing data for each REPLICATE request not resulting in an error.
container-server.POST.errors.timing       Timing data for POST request errors: bad request, bad X-Container-Sync-To, not mounted.
container-server.POST.timing              Timing data for each POST request not resulting in an error.
========================================  ===========

container-sync Metrics

=============================  ===========
Metric Name                    Description
=============================  ===========
container-sync.skips           Count of containers skipped because they don’t have syncing enabled.
container-sync.failures        Count of failures syncing individual containers.
container-sync.syncs           Count of individual containers synced successfully.
container-sync.deletes         Count of container database rows synced by deletion.
container-sync.deletes.timing  Timing data for each container database row synchronization via deletion.
container-sync.puts            Count of container database rows synced by PUT.
container-sync.puts.timing     Timing data for each container database row synchronization via PUT.
=============================  ===========

container-updater Metrics

============================  ===========
Metric Name                   Description
============================  ===========
container-updater.successes   Count of containers which successfully updated their account.
container-updater.failures    Count of containers which failed to update their account.
container-updater.no_changes  Count of containers which didn’t need to update their account.
container-updater.timing      Timing data for processing a container; only includes timing for containers which needed to update their accounts (i.e. “successes” and “failures” but not “no_changes”).
============================  ===========
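
container-updater.timing leaves out containers counted as no_changes. As a result, the number of timing samples in a pass should equal successes plus failures. The sketch below aggregates per-container results to show that relationship. The (outcome, elapsed_ms) input format and the function name are hypothetical.

.. code-block:: python

   from collections import Counter
   from typing import Iterable, List, Tuple

   OUTCOMES = {"successes", "failures", "no_changes"}


   def summarize_updater_pass(
           results: Iterable[Tuple[str, float]]) -> Tuple[Counter, List[float]]:
       """Split (outcome, elapsed_ms) pairs into counters and timing samples."""
       counts: Counter = Counter()
       timings: List[float] = []
       for outcome, elapsed_ms in results:
           if outcome not in OUTCOMES:
               raise ValueError(f"unknown outcome: {outcome}")
           counts[f"container-updater.{outcome}"] += 1
           # Only containers that needed to update their account are timed
           if outcome != "no_changes":
               timings.append(elapsed_ms)
       return counts, timings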

object-auditor Metrics

==========================  ===========
Metric Name                 Description
==========================  ===========
object-auditor.quarantines  Count of objects which failed audit and were quarantined.
object-auditor.errors       Count of errors encountered while auditing objects.
object-auditor.timing       Timing data for each object audit (does not include any rate-limiting sleep time for max_files_per_second, but does include rate-limiting sleep time for max_bytes_per_second).
==========================  ===========
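
The object-auditor.timing description implies where the two rate limiters sit relative to the timer. The sketch below shows one arrangement that matches it. The max_files_per_second sleep runs before the timer starts, while the max_bytes_per_second sleep runs inside the timed read loop. audit_one, the limiter callables, and FakeClock are hypothetical. The clock is injectable only so the example is deterministic.

.. code-block:: python

   import time
   from typing import Callable, Iterable


   def audit_one(chunks: Iterable[bytes],
                 files_limiter: Callable[[], None],
                 bytes_limiter: Callable[[int], None],
                 clock: Callable[[], float] = time.monotonic) -> float:
       """Audit one object and return the duration to report, in milliseconds."""
       files_limiter()                # max_files_per_second sleep: not timed
       start = clock()
       for chunk in chunks:
           # (checksum verification of the chunk would happen here)
           bytes_limiter(len(chunk))  # max_bytes_per_second sleep: timed
       return (clock() - start) * 1000


   class FakeClock:
       """Manually advanced clock so the example is deterministic."""

       def __init__(self) -> None:
           self.now = 0.0

       def __call__(self) -> float:
           return self.now

       def sleep(self, seconds: float) -> None:
           self.now += seconds


   clock = FakeClock()
   elapsed = audit_one(
       [b"abcd", b"efgh"],
       files_limiter=lambda: clock.sleep(1.0),     # excluded from the timing
       bytes_limiter=lambda n: clock.sleep(0.25),  # included, once per chunk
       clock=clock,
   )
   print(elapsed)  # 500.0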

object-expirer Metrics

======================  ===========
Metric Name             Description
======================  ===========
object-expirer.objects  Count of objects expired.
object-expirer.errors   Count of errors encountered while attempting to expire an object.
object-expirer.timing   Timing data for each object expiration attempt, including ones resulting in an error.
======================  ===========

object-reconstructor Metrics

====================================================  ===========
Metric Name                                           Description
====================================================  ===========
object-reconstructor.partition.delete.count.<device>  A count of partitions on <device> which were reconstructed and synced to another node because they didn’t belong on this node. This metric is tracked per device to allow for “quiescence detection” for object reconstruction activity on each device.
object-reconstructor.partition.delete.timing          Timing data for partitions reconstructed and synced to another node because they didn’t belong on this node. This metric is not tracked per device.
object-reconstructor.partition.update.count.<device>  A count of partitions on <device> which were reconstructed and synced to another node, but also belong on this node. As with delete.count, this metric is tracked per device.
object-reconstructor.partition.update.timing          Timing data for partitions reconstructed which also belong on this node. This metric is not tracked per device.
object-reconstructor.suffix.hashes                    Count of suffix directories whose hash (of filenames) was recalculated.
object-reconstructor.suffix.syncs                     Count of suffix directories reconstructed with ssync.
====================================================  ===========

object-replicator Metrics

=================================================  ===========
Metric Name                                        Description
=================================================  ===========
object-replicator.partition.delete.count.<device>  A count of partitions on <device> which were replicated to another node because they didn’t belong on this node. This metric is tracked per device to allow for “quiescence detection” for object replication activity on each device.
object-replicator.partition.delete.timing          Timing data for partitions replicated to another node because they didn’t belong on this node. This metric is not tracked per device.
object-replicator.partition.update.count.<device>  A count of partitions on <device> which were replicated to another node, but also belong on this node. As with delete.count, this metric is tracked per device.
object-replicator.partition.update.timing          Timing data for partitions replicated which also belong on this node. This metric is not tracked per device.
object-replicator.suffix.hashes                    Count of suffix directories whose hash (of filenames) was recalculated.
object-replicator.suffix.syncs                     Count of suffix directories replicated with rsync.
=================================================  ===========
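
The per-device partition.delete.count.<device> counters of the object-reconstructor and object-replicator are what make quiescence detection possible. The sketch below assumes you can already obtain, for each per-device counter, the number of increments seen during a recent time window (for example, from your StatsD backend). It treats a device as quiescent when that number is zero. The function names and the window-count input are hypothetical.

.. code-block:: python

   from typing import Dict, List

   DELETE_PREFIX = "object-replicator.partition.delete.count"


   def device_from_metric(name: str, prefix: str = DELETE_PREFIX) -> str:
       """Extract <device> from a per-device counter name."""
       if not name.startswith(prefix + "."):
           raise ValueError(f"{name!r} is not a {prefix} metric")
       return name[len(prefix) + 1:]


   def quiescent_devices(window_counts: Dict[str, int],
                         prefix: str = DELETE_PREFIX) -> List[str]:
       """Devices whose counter saw no increments during the sampled window."""
       return sorted(device_from_metric(name, prefix)
                     for name, count in window_counts.items() if count == 0)

The same functions apply to the object-reconstructor counters if you pass its metric prefix instead.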

object-server Metrics

=====================================  ===========
Metric Name                            Description
=====================================  ===========
object-server.quarantines              Count of objects (files) found bad and moved to quarantine.
object-server.async_pendings           Count of container updates saved as async_pendings (may result from PUT or DELETE requests).
object-server.POST.errors.timing       Timing data for POST request errors: bad request, missing timestamp, delete-at in past, not mounted.
object-server.POST.timing              Timing data for each POST request not resulting in an error.
object-server.PUT.errors.timing        Timing data for PUT request errors: bad request, not mounted, missing timestamp, object creation constraint violation, delete-at in past.
object-server.PUT.timeouts             Count of object PUTs which exceeded max_upload_time.
object-server.PUT.timing               Timing data for each PUT request not resulting in an error.
object-server.PUT.<device>.timing      Timing data per kB transferred (ms/kB) for each non-zero-byte PUT request on each device. Useful for monitoring problematic devices; higher values are worse.
object-server.GET.errors.timing        Timing data for GET request errors: bad request, not mounted, header timestamps before the epoch, precondition failed. File errors resulting in a quarantine are not counted here.
object-server.GET.timing               Timing data for each GET request not resulting in an error. Includes requests which couldn’t find the object (including disk errors resulting in file quarantine).
object-server.HEAD.errors.timing       Timing data for HEAD request errors: bad request, not mounted.
object-server.HEAD.timing              Timing data for each HEAD request not resulting in an error. Includes requests which couldn’t find the object (including disk errors resulting in file quarantine).
object-server.DELETE.errors.timing     Timing data for DELETE request errors: bad request, missing timestamp, not mounted, precondition failed. Includes requests which couldn’t find or match the object.
object-server.DELETE.timing            Timing data for each DELETE request not resulting in an error.
object-server.REPLICATE.errors.timing  Timing data for REPLICATE request errors: bad request, not mounted.
object-server.REPLICATE.timing         Timing data for each REPLICATE request not resulting in an error.
=====================================  ===========
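
object-server.PUT.<device>.timing normalizes the PUT duration by the amount of data written. The sketch below computes that ms/kB value for one request. It assumes 1 kB = 1000 bytes, because the table does not say whether kB means 1000 or 1024 bytes. Zero-byte uploads are skipped, as the table describes. put_rate_metric is a hypothetical helper.

.. code-block:: python

   from typing import Optional, Tuple

   BYTES_PER_KB = 1000  # assumption: decimal kilobytes


   def put_rate_metric(device: str, elapsed_ms: float,
                       upload_bytes: int) -> Optional[Tuple[str, float]]:
       """Return (metric name, ms per kB) for one PUT, or None if nothing was sent."""
       if upload_bytes <= 0:
           return None  # zero-byte PUTs are not reported
       ms_per_kb = elapsed_ms / (upload_bytes / BYTES_PER_KB)
       return f"object-server.PUT.{device}.timing", ms_per_kb


   # A 2,000,000-byte upload that took 4 seconds: 4000 ms / 2000 kB = 2.0 ms/kB
   print(put_rate_metric("sdb1", 4000.0, 2_000_000))  # ('object-server.PUT.sdb1.timing', 2.0)

Because the value is normalized per kB, a slow disk stands out even when upload sizes vary widely.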

object-updater Metrics

==========================  ===========
Metric Name                 Description
==========================  ===========
object-updater.errors       Count of drives which were not mounted and of async_pending files with an unexpected name.
object-updater.timing       Timing data for object sweeps to flush async_pending container updates. Does not include object sweeps which did not find an existing async_pending storage directory.
object-updater.quarantines  Count of async_pending container updates which were corrupted and moved to quarantine.
object-updater.successes    Count of successful container updates.
object-updater.failures     Count of failed container updates.
object-updater.unlinks      Count of async_pending files unlinked. An async_pending file is unlinked either when it is successfully processed or when the updater sees that there is a newer async_pending file for the same object.
==========================  ===========
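
The object-updater.errors and object-updater.unlinks counters both depend on the names of async_pending files. The sketch below assumes names of the form <object-hash>-<timestamp>, which is an assumption for illustration. It shows which files would count toward errors, which would be processed, and which older files would be unlinked because a newer async_pending exists for the same object. It does not model the other unlink case, which is removal after successful processing. plan_sweep is hypothetical.

.. code-block:: python

   from typing import Dict, List, Tuple


   def plan_sweep(filenames: List[str]) -> Tuple[List[str], List[str], int]:
       """Return (files to process, older files to unlink, count of bad names).

       Assumes async_pending names look like <object-hash>-<timestamp>.
       """
       newest: Dict[str, Tuple[float, str]] = {}
       to_unlink: List[str] = []
       errors = 0
       for name in filenames:
           parts = name.split("-")
           if len(parts) != 2:
               errors += 1  # unexpected name: would count toward object-updater.errors
               continue
           obj_hash, raw_ts = parts
           try:
               stamp = float(raw_ts)
           except ValueError:
               errors += 1
               continue
           current = newest.get(obj_hash)
           if current is None:
               newest[obj_hash] = (stamp, name)
           elif stamp > current[0]:
               # The previously seen file is older, so it would be unlinked
               to_unlink.append(current[1])
               newest[obj_hash] = (stamp, name)
           else:
               to_unlink.append(name)
       to_process = sorted(name for _, name in newest.values())
       return to_process, sorted(to_unlink), errors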

proxy-server Metrics

In the table, <type> is the proxy-server controller responsible for the request and will be one of account, container, or object.

======================================  ===========
Metric Name                             Description
======================================  ===========
proxy-server.errors                     Count of errors encountered while serving requests before the controller type is determined. Includes invalid Content-Length, errors finding the internal controller to handle the request, invalid UTF-8, and bad URLs.
proxy-server.<type>.handoff_count       Count of node hand-offs; only tracked if log_handoffs is set in the proxy-server config.
proxy-server.<type>.handoff_all_count   Count of times only hand-off locations were used; only tracked if log_handoffs is set in the proxy-server config.
proxy-server.<type>.client_timeouts     Count of client timeouts (client did not read within client_timeout seconds during a GET or did not supply data within client_timeout seconds during a PUT).
proxy-server.<type>.client_disconnects  Count of detected client disconnects during PUT operations (does NOT include exceptions caught in the proxy-server that caused a client disconnect).
======================================  ===========

Additionally, middleware often emit their own metrics, such as the following.

proxy-logging Middleware

In the table, <type> is either the proxy-server controller responsible for the request (account, container, or object) or the string SOS if the request came from the Swift Origin Server middleware. The <verb> portion is one of GET, HEAD, POST, PUT, DELETE, COPY, OPTIONS, or BAD_METHOD. The list of valid HTTP methods is configurable via the log_statsd_valid_http_methods config variable. With the default setting, any request method not in that list is reported as BAD_METHOD.

==================================================  ===========
Metric Name                                         Description
==================================================  ===========
proxy-server.<type>.<verb>.<status>.timing          Timing data for requests, start to finish. The <status> portion is the numeric HTTP status code for the request (e.g. “200” or “404”).
proxy-server.<type>.GET.<status>.first-byte.timing  Timing data up to completion of sending the response headers (only for GET requests). <status> and <type> are as for the main timing metric.
proxy-server.<type>.<verb>.<status>.xfer            This counter metric is the sum of bytes transferred in (from clients) and out (to clients) for requests. The <type>, <verb>, and <status> portions of the metric are just like the main timing metric.
==================================================  ===========

For object requests, the proxy-logging middleware also groups these metrics by storage policy. The <policy-index> portion represents a policy index:

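========================================================================  ===========
Metric Name                                                               Description
========================================================================  ===========
proxy-server.object.policy.<policy-index>.<verb>.<status>.timing          Timing data for requests, aggregated by policy index.
proxy-server.object.policy.<policy-index>.GET.<status>.first-byte.timing  Timing data up to completion of sending the response headers, aggregated by policy index.
proxy-server.object.policy.<policy-index>.<verb>.<status>.xfer            Sum of bytes transferred in and out, aggregated by policy index.
========================================================================  ===========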
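
The naming rules in the two proxy-logging tables above can be sketched as follows. The default verb list is the one given in the text. proxy_logging_metrics, xfer_value, and their parameters are hypothetical. The sketch emits per-policy names only for object requests whose policy index is known.

.. code-block:: python

   from typing import List, Optional, Sequence

   DEFAULT_VALID_METHODS = ("GET", "HEAD", "POST", "PUT", "DELETE", "COPY",
                            "OPTIONS")


   def proxy_logging_metrics(req_type: str, method: str, status: int,
                             policy_index: Optional[int] = None,
                             valid_methods: Sequence[str] = DEFAULT_VALID_METHODS
                             ) -> List[str]:
       """Return the metric names one request would be reported under."""
       # Methods outside the configured list are collapsed into BAD_METHOD
       verb = method if method in valid_methods else "BAD_METHOD"
       prefixes = [f"proxy-server.{req_type}"]
       if req_type == "object" and policy_index is not None:
           prefixes.append(f"proxy-server.object.policy.{policy_index}")
       names = []
       for prefix in prefixes:
           base = f"{prefix}.{verb}.{status}"
           names.append(f"{base}.timing")
           names.append(f"{base}.xfer")
           if verb == "GET":
               names.append(f"{base}.first-byte.timing")
       return names


   def xfer_value(bytes_in: int, bytes_out: int) -> int:
       """The xfer counter adds bytes received from and sent to the client."""
       return bytes_in + bytes_out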

tempauth Middleware

In the table, <reseller_prefix> represents the actual configured reseller_prefix or NONE if the reseller_prefix is the empty string:

=======================================  ===========
Metric Name                              Description
=======================================  ===========
tempauth.<reseller_prefix>.unauthorized  Count of regular requests which were denied with HTTPUnauthorized.
tempauth.<reseller_prefix>.forbidden     Count of regular requests which were denied with HTTPForbidden.
tempauth.<reseller_prefix>.token_denied  Count of token requests which were denied.
tempauth.<reseller_prefix>.errors        Count of errors.
=======================================  ===========
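
A minimal sketch of the <reseller_prefix> substitution described above follows. tempauth_metric is hypothetical. Ignoring surrounding whitespace in the configured prefix is an assumption.

.. code-block:: python

   TEMPAUTH_EVENTS = {"unauthorized", "forbidden", "token_denied", "errors"}


   def tempauth_metric(reseller_prefix: str, event: str) -> str:
       """Build tempauth.<reseller_prefix>.<event>, using NONE for an empty prefix."""
       if event not in TEMPAUTH_EVENTS:
           raise ValueError(f"unknown tempauth event: {event}")
       prefix = reseller_prefix.strip()  # assumption: surrounding whitespace ignored
       return f"tempauth.{prefix or 'NONE'}.{event}"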