The primary design constraints are these:
- We may receive logging events on any thread, and don't want to block the thread while communicating with the service.
- We generally want to batch individual messages to improve throughput, although the service may have constraints on either number of messages or bytes in a batch.
- Services may reject our requests, either temporarily or permanently.
To meet these constraints, the appender creates a separate thread for communication with the service, with a concurrent queue to hold appended messages. When Log4J calls `append()`, the appender converts the passed `LoggingEvent` into a textual representation, verifies that it conforms to limitations imposed by the service, and adds it to the queue.
The writer consumes that queue, attempting to batch together messages into a single request. Once it has a batch (either based on size or a configurable timeout) it attempts to write those messages to the service. In addition to retries embedded within the AWS SDK, the writer will requeue messages that can't be sent, dropping messages once a user-configurable threshold is reached.
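As a rough illustration of this split, the following sketch shows the append-side half of such a design. It is not the library's actual code; the class name, size limit, and queue handling are simplified placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.helpers.LogLog;
import org.apache.log4j.spi.LoggingEvent;

// Simplified sketch: append() formats and validates the event on the calling
// thread, then hands it to a queue that a separate writer thread consumes.
public class SketchAppender extends AppenderSkeleton
{
    private final BlockingQueue<String> messageQueue = new LinkedBlockingQueue<String>();

    @Override
    protected void append(LoggingEvent event)
    {
        // no network I/O here; the calling thread is never blocked on the service
        String message = getLayout().format(event);

        // verify service-imposed limits; the 256 KB figure is illustrative only
        if (message.getBytes(StandardCharsets.UTF_8).length > 256 * 1024)
        {
            LogLog.warn("message exceeds service limit, discarding");
            return;
        }

        messageQueue.offer(message);
    }

    @Override
    public void close() { /* signal the writer thread to shut down (omitted) */ }

    @Override
    public boolean requiresLayout() { return true; }
}
```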
The writer thread is lazily started on the first call to `append()`. There's a factory for writer objects and writer threads, to support testing. If unable to start the writer thread, messages are dropped and the situation is reported to the internal logger.
The writer thread handles most exceptions internally. Unexpected exceptions are reported using an uncaught exception handler in the appender. This will trigger the appender to discard the writer and create a new one (and also to report the failure to the internal logger).
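A sketch of this lazy-start and restart behavior might look like the following; the class and field names are hypothetical, and the real appender delegates writer creation to its factory.

```java
import org.apache.log4j.helpers.LogLog;

// Illustrative only: lazily start a writer thread, restart it after an
// unexpected failure, and drop messages if it can't be started at all.
public class WriterStarter
{
    private final Runnable writerLoop;      // would be produced by the writer factory
    private volatile Thread writerThread;

    public WriterStarter(Runnable writerLoop)
    {
        this.writerLoop = writerLoop;
    }

    // called from append(); creates the thread on first use or after a failure
    public synchronized void ensureWriterRunning()
    {
        if ((writerThread != null) && writerThread.isAlive())
            return;

        try
        {
            Thread thread = new Thread(writerLoop, "aws-log-writer");
            thread.setDaemon(true);
            thread.setUncaughtExceptionHandler((t, ex) -> {
                // report via the internal logger and discard the writer;
                // the next call to ensureWriterRunning() creates a new one
                LogLog.error("log writer failed unexpectedly", ex);
                writerThread = null;
            });
            thread.start();
            writerThread = thread;
        }
        catch (RuntimeException ex)
        {
            LogLog.error("unable to start log writer; messages will be dropped", ex);
            writerThread = null;
        }
    }
}
```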
The writer uses the default constructor for each AWS service client, which in turn uses the default credential provider chain. This allows you to specify explicit credentials via several mechanisms, or to rely on the instance role (EC2) or execution role (Lambda) of the application's runtime environment.
Most AWS services allow batching of messages for efficiency. While sending maximum-sized requests is more efficient when there's a high volume of logging, it could cause an excessive delay in writing if there's a low volume (and will leave more messages unwritten if the program shuts down without waiting for all messages to be sent).
The appenders support message batching via the `batchDelay` configuration parameter. All messages go on an internal queue, and the writer thread blocks on this queue until a message is available. Once the writer pulls a message off the queue, it starts a countdown timer and waits for additional messages until it either fills the batch (based on AWS limits) or the timer reaches zero. At that point it sends the batch and waits for another initial message.
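In pseudo-Java, this batch-building loop might look roughly like the sketch below; the batch-size limit and the `send()` call are placeholders, not actual service limits or SDK calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative batch loop: block for the first message, then gather more
// until the batch is full or the configured delay expires.
public class BatchLoopSketch
{
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
    private final long batchDelayMillis = 2000;     // corresponds to batchDelay
    private final int maxBatchSize = 10000;         // placeholder for the AWS limit

    public void writerLoop() throws InterruptedException
    {
        while (true)
        {
            List<String> batch = new ArrayList<String>();

            // wait (indefinitely) for the first message, then start the countdown
            batch.add(queue.take());
            long deadline = System.currentTimeMillis() + batchDelayMillis;

            while (batch.size() < maxBatchSize)
            {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0)
                    break;

                String message = queue.poll(remaining, TimeUnit.MILLISECONDS);
                if (message == null)    // timed out
                    break;

                batch.add(message);
            }

            send(batch);
        }
    }

    private void send(List<String> batch)
    {
        // placeholder: the real writer builds a service-specific request here,
        // retries failures, and requeues messages that can't be sent
    }
}
```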
The default value, 2000, is intended as a tradeoff between keeping the log up to date and minimizing the amount of network traffic generated by the logger. For long-running applications this should be fine, but for applications that only run for a few seconds it may cause message loss (the appenders do not attach a shutdown hook, and the JVM does not necessarily respect those anyway). For such applications it makes sense to reduce the batch delay to perhaps 250 milliseconds, but beware that the resulting higher request rate may cause AWS to throttle requests, which could extend the actual delay between a logging event and that event being written to its destination.
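For example, a short-lived job might configure a smaller delay (the appender name here is illustrative; use whatever name your configuration defines):

`log4j.appender.cloudwatch.batchDelay=250`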
If you absolutely, positively cannot lose messages, you should use a different appender, one that writes messages synchronously. Beware, however, that even writing to a file is not guaranteed: the operating system buffers writes, so a system crash means that you may lose messages.
The appenders will attempt to deliver every message, requeuing messages that fail (this is particularly relevant to the Kinesis appender, since some messages in a batch may fail while others succeed). If there are persistent errors, such as a network outage, this could cause an out-of-memory situation.
To avoid such errors you can configure the `discardThreshold` and `discardAction` parameters, which control how messages are discarded. The threshold controls the maximum number of messages that will be maintained; once that threshold is crossed, messages will be discarded according to one of the following rules:
- `oldest` - the oldest message in the queue is discarded; this is the default, as it allows you to track the current behavior of the application once the failure condition is resolved.
- `newest` - the newest message in the queue is discarded. This is useful if you want to see what's happening at the time the failure condition occurs.
- `none` - no messages are discarded. If you expect intermittent connectivity problems, have lots of memory, and don't want to miss any logging, then this option may be reasonable. However, it's probably better to increase the threshold and use one of the other discard actions.
The default threshold is 10,000 messages. Assuming 1 KB per message, that's 10 MB of heap that will be used by the queue.
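For example, to retain more messages during an outage and drop the newest ones once the limit is reached (again, the appender name and threshold value are illustrative):

`log4j.appender.cloudwatch.discardThreshold=50000`
`log4j.appender.cloudwatch.discardAction=newest`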
In order to support all AWS SDK releases in the 1.11.x sequence, the appenders natively use the default service-client constructors. However, these constructors have a few limitations:
- While they use a credentials provider chain that looks for client credentials in multiple locations, they don't do the same for region. Instead, they default to the `us-east-1` region and expect applications to change the region as appropriate.
- Some use cases require providing credentials other than those that will be picked up by the default provider chain. For example, you might direct logging to a stream or topic owned by a different AWS account.
To work around these limitations, the appenders provide several mechanisms for application control of the service client. These are applied in the order listed:
- You can specify a static factory method to create the client, using the `clientFactory` configuration parameter. This is specified as a fully-qualified classname, followed by a dot and the method name. For example, to use the default CloudWatch Logs factory (this is shown for example only, as the appender will use this method if available):

  `log4j.appender.cloudwatch.clientFactory=com.amazonaws.services.logs.AWSLogsClientBuilder.defaultClient`

  A sketch of a custom factory method appears after this list.
- You can specify the client endpoint using the `clientEndpoint` configuration parameter; see the AWS docs for a list of endpoint names. This parameter is primarily intended for applications that must use an older AWS SDK version (including 1.10.x) but want to log outside the `us-east-1` region. For example, to direct Kinesis logging to a stream in the `us-west-1` region:

  `log4j.appender.kinesis.clientEndpoint=kinesis.us-west-1.amazonaws.com`
- The appender will use reflection to look for the presence of the default client factory. This first appeared in SDK release 1.11.16, so it is probably available to your application (and if not, you should consider upgrading).
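As an example of the first mechanism, here is a sketch of a custom factory method that builds a CloudWatch Logs client for an explicit region, using an assumed role in another account. The class name, region, and role ARN are placeholders; this is not part of the library, just one way such a method could be written against AWS SDK 1.11.16 or later.

```java
package com.example.logging;

import com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.logs.AWSLogs;
import com.amazonaws.services.logs.AWSLogsClientBuilder;

public class LoggingClientFactory
{
    // invoked via reflection by the appender when clientFactory is configured
    public static AWSLogs createClient()
    {
        // assume a role in the account that owns the log group (ARN is a placeholder)
        STSAssumeRoleSessionCredentialsProvider credentials
            = new STSAssumeRoleSessionCredentialsProvider.Builder(
                  "arn:aws:iam::123456789012:role/LoggingRole", "log4j-appender")
              .build();

        return AWSLogsClientBuilder.standard()
                                   .withRegion(Regions.US_WEST_2)
                                   .withCredentials(credentials)
                                   .build();
    }
}
```

The corresponding configuration would reference the method by its fully-qualified name:

`log4j.appender.cloudwatch.clientFactory=com.example.logging.LoggingClientFactory.createClient`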