Configuration

Shotover proxy accepts two separate YAML-based configuration files: a configuration file specified by --config-file and a topology file specified by --topology-file.

configuration.yaml

The configuration file is used to change general behavior of Shotover. Currently it supports two values:

  • main_log_level
  • observability_interface

main_log_level

This is a single string that you can use to configure logging with Shotover. It supports env_filter style configuration and filtering syntax. Log levels and filters can be dynamically changed while Shotover is still running.
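As a rough illustration, an env_filter string is a comma-separated list of directives: a bare level sets the default, and target=level overrides it for one target. The target name shotover_proxy below is an assumption, so substitute the crate or module you actually want to filter.

```yaml
# configuration.yaml (fragment)
# Default level is info; the assumed `shotover_proxy` target logs at debug.
main_log_level: "info,shotover_proxy=debug"
```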

observability_interface

Shotover has an observability interface for you to collect Prometheus data from. This value defines the address and port for Shotover's observability interface. It is configured as a string in the format 127.0.0.1:8080 for IPv4 addresses or [2001:db8::1]:8080 for IPv6 addresses. More information is available on the observability page.
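A minimal sketch of the two address formats described above. The port 9001 and the loopback addresses are arbitrary choices for illustration, and the IPv6 form is commented out because the key should appear only once in the file.

```yaml
# configuration.yaml (fragment)
# IPv4: address and port separated by a colon.
observability_interface: "127.0.0.1:9001"

# IPv6: the address is wrapped in square brackets before the port.
# observability_interface: "[::1]:9001"
```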

topology.yaml

The topology file is currently the primary method for defining how Shotover behaves. Within the topology file you can configure sources, transforms and transform chains.

The documentation below describes what each section does and walks through a complete example of a Shotover topology file.

sources

The sources top level resource is a map of named sources to their definitions.

The sources section of the configuration file allows you to specify a source or origin for requests. You can have multiple sources and even multiple sources of the same type. Each is named to allow you to easily reference it.

A source will generally represent a database protocol and will accept connections and queries from a compatible driver. For example, the Redis source will accept connections from any Redis (RESP2) driver such as redis-py.

---
# The source section
sources:
  
  # The configured name of the source
  my_named_redis_source:
    # The source and any configuration needed for it
    # This will generally include a listen address and port
    Redis:
      listen_addr: "127.0.0.1:6379"
  
  # The configured name of the source
  my_cassandra_prod:

    # The source and any configuration needed for it
    # This will generally include a listen address and port
    Cassandra:
      listen_addr: "127.0.0.1:9042"
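Since several sources of the same type are allowed, a sketch of that case may help. Both source names and the second port are hypothetical; each source simply needs a unique name and its own listen address.

```yaml
---
sources:
  # Two sources of the same type, distinguished by name and port.
  # Both names are hypothetical.
  redis_app_a:
    Redis:
      listen_addr: "127.0.0.1:6379"
  redis_app_b:
    Redis:
      listen_addr: "127.0.0.1:6380"
```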

chain_config (Chain Configuration)

The chain_config top level resource is a map of named chains to their definitions.

The chain_config section of the configuration file allows you to name and define a transform chain. A transform chain is represented as an array of transforms and their respective configuration. The order in which transforms are listed in a chain is the order in which a query will traverse them. So the first transform in the chain will get the request from the source first and pass it to the second transform in the chain.

As each transform chain is synchronous, with each transform being able to call the next transform in its chain, the response from the upstream database or generated by a transform down the chain will be passed back up the chain, allowing each transform to handle the response.

The last transform in a chain should be a "terminating" transform. That is, one that passes the query on to the upstream database (e.g. CassandraSinkSingle) or one that returns a response on its own (e.g. DebugReturner).

For example:

chain_config:
  example_chain:
    - One
    - Two
    - Three
    - TerminatingTransform

A query from a client will go:

  • Source -> One -> Two -> Three -> TerminatingTransform

The response (returned to the chain by the TerminatingTransform) will follow the reverse path:

  • TerminatingTransform -> Three -> Two -> One -> Source
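To make the placeholder chain above more concrete, here is a small sketch that uses only transforms that appear elsewhere on this page. The chain name and the single contact point are hypothetical; check the Transforms page for each transform's actual options.

```yaml
chain_config:
  # Hypothetical chain name
  counted_redis_chain:
    # Non-terminating: counts each request, then calls the next transform
    - QueryCounter:
        name: "Example chain"
    # Terminating: sends the request to the upstream Redis cluster
    - RedisSinkCluster:
        first_contact_points: [ "127.0.0.1:2220" ]
        connect_timeout_ms: 3000
```

Here a request travels Source -> QueryCounter -> RedisSinkCluster, and the response returns along the reverse path.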

Under the hood, each transform is able to call its down-chain transform and wait on its response. Each transform has its own set of configuration values, options and behavior. See Transforms for details.

The following example chain_config defines a single chain, redis_chain, made up of three transforms:

  • Tee - Asynchronously copies each request to a child chain (QueryTypeFilter, Coalesce, QueryCounter and RedisSinkCluster) while also passing the request down-chain. Very similar to the tee Linux program.
  • QueryCounter - Counts the requests passing through the main chain under the name "Main chain".
  • RedisSinkCluster - The terminating transform, which sends the request to the Redis cluster at the listed contact points.

# This example will replicate all commands to the DR datacenter on a best effort basis
---
chain_config:
  # The name of the first chain
  redis_chain:
    # The first transform in the chain, in this case it's the Tee transform
    - Tee:
        behavior: Ignore
        # The number of message batches that the tee can hold onto in its buffer of messages to send.
        # If they aren't sent quickly enough and the buffer is full then tee will drop new incoming messages.
        buffer_size: 10000
        # The child chain, that Tee will asynchronously pass requests to
        chain:
          - QueryTypeFilter:
              filter: Read
          - Coalesce:
              flush_when_buffered_message_count: 2000
          - QueryCounter:
              name: "DR chain"
          - RedisSinkCluster:
              first_contact_points: [ "127.0.0.1:2120", "127.0.0.1:2121", "127.0.0.1:2122", "127.0.0.1:2123", "127.0.0.1:2124", "127.0.0.1:2125" ]
              connect_timeout_ms: 3000
    # The rest of the chain, these transforms are blocking
    - QueryCounter:
        name: "Main chain"
    - RedisSinkCluster:
        first_contact_points: [ "127.0.0.1:2220", "127.0.0.1:2221", "127.0.0.1:2222", "127.0.0.1:2223", "127.0.0.1:2224", "127.0.0.1:2225" ]
        connect_timeout_ms: 3000

source_to_chain_mapping (Chain Mapping)

The source_to_chain_mapping top level resource is a map of source names to chain names. This is the binding that links a defined source to a chain and allows messages/queries generated by a source to traverse a given chain.

The snippet below completes the example:

source_to_chain_mapping:
  # Source name (as defined under sources): chain name (as defined under chain_config)
  my_named_redis_source: redis_chain

With this mapping in place, the example behaves as follows:

  • All Redis requests are first batched and then sent to a remote Redis cluster in another region. This happens asynchronously, so if the remote Redis cluster is unavailable it will not block operations to the current cluster.
  • Subsequently, all Redis actions are identified by command type, counted and exposed as a set of metrics.
  • The Redis request is then transformed into a cluster-aware request and routed to the correct node.
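Putting the three top-level sections together, a complete topology file might look roughly like the sketch below. It reuses the source and chain from the examples on this page, with a trimmed child chain and shortened contact point lists. The addresses are placeholders. The point it illustrates is that the names in source_to_chain_mapping must match keys under sources and chain_config.

```yaml
---
sources:
  my_named_redis_source:
    Redis:
      listen_addr: "127.0.0.1:6379"
chain_config:
  redis_chain:
    # Asynchronous copy to the remote (DR) cluster
    - Tee:
        behavior: Ignore
        buffer_size: 10000
        chain:
          - Coalesce:
              flush_when_buffered_message_count: 2000
          - RedisSinkCluster:
              first_contact_points: [ "127.0.0.1:2120" ]
              connect_timeout_ms: 3000
    # Blocking path to the local cluster
    - QueryCounter:
        name: "Main chain"
    - RedisSinkCluster:
        first_contact_points: [ "127.0.0.1:2220" ]
        connect_timeout_ms: 3000
source_to_chain_mapping:
  # source name: chain name
  my_named_redis_source: redis_chain
```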