MongoDB

TL;DR: This is a nice experiment, but don’t do this unless you want to be a full-time MongoDB database admin.

After several attempts with different configurations, I was unable to find one in which the ReplicaSet heals itself after a single member goes down.

In almost every case the ReplicaSet loses the member permanently and the lost member ends up in state OTHER. The remaining members continue to function for a while, but eventually the database goes offline as members are restarted during regular Kubernetes operations.

Table of contents

  1. Default image as an init container
  2. Initialize a ReplicaSet

Default image as an init container

The default behaviour of most (all?) official database images on Docker Hub is to initialize the database and then immediately run the database.

For MongoDB this causes issues if we want to start a new ReplicaSet. Some command line parameters required at run time cannot be used during database init.
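
The run-time options in question are presumably the replication ones. As a rough sketch, assuming the second container starts `mongod` with `--replSet` and `--keyFile` (both are real `mongod` options; the replica set name `rs0` and the helper `mongod_run_args` are placeholders and not taken from the original setup), the command line could be assembled like this:

```shell
#!/bin/bash

# Hypothetical helper: print the mongod arguments for the run-time container,
# one per line. Arguments: replica set name, path to the shared key file.
mongod_run_args() {
    local rs_name="$1"
    local keyfile="$2"
    # --replSet:     name of the replica set this node belongs to
    # --keyFile:     shared key used for authentication between members
    # --bind_ip_all: listen on all interfaces so other pods can connect
    printf '%s\n' \
        --replSet "${rs_name}" \
        --keyFile "${keyfile}" \
        --bind_ip_all
}

# Example: show the arguments for a replica set named "rs0" (assumed name)
mongod_run_args "rs0" "/data/db/mongodb.key"
```

Whatever name is passed to `--replSet` has to match the `_id` later given to `rs.initiate()`.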

I solved this by first initializing the database in one container, then running the database with different startup options in a second container.

The script below also copies a pregenerated key to the database folder. Each node of the ReplicaSet uses this key to authenticate to the other nodes.

#!/bin/bash

# Remove the final exec from the entrypoint
sed -i.bak \
  -e 's|^exec "\$\@"$|exit 0|g' /usr/local/bin/docker-entrypoint.sh

# Execute the entrypoint; it will eventually exit.
# Use the absolute path so this works from any working directory.
/usr/local/bin/docker-entrypoint.sh "$@"

# Copy the MongoDB key to the DB folder,
# MongoDB will otherwise complain about too open permissions
cat /run/secrets/mongodb/MONGODB_KEYFILE > /data/db/mongodb.key
chown mongodb:mongodb /data/db/mongodb.key
chmod 400 /data/db/mongodb.key

exit 0
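
The key file has to exist before this script runs. A minimal sketch of how such a key could be generated, assuming `head` and `base64` are available; the MongoDB documentation suggests `openssl rand -base64 756` for the same purpose. The function name `generate_keyfile` and the temporary output path are placeholders:

```shell
#!/bin/bash

# Hypothetical helper: write a random base64-encoded key to the given path
# and make it readable by the owner only.
generate_keyfile() {
    local path="$1"
    # umask 077 keeps the file private while it is being written
    (umask 077 && head -c 756 /dev/urandom | base64 > "${path}")
    chmod 400 "${path}"
}

# Example: generate a key in a temporary file
KEYFILE="$(mktemp)"
generate_keyfile "${KEYFILE}"
echo "Wrote key file to ${KEYFILE}"
```

The resulting file could then be stored as a Kubernetes Secret and mounted at `/run/secrets/mongodb/MONGODB_KEYFILE`, where the init script above expects it.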

Initialize a ReplicaSet

On Kubernetes I initialize the ReplicaSet with a Kubernetes Job.

The Job uses the standard MongoDB container image, but the container does not start a database of its own; it only connects to the other pods as a client. Replace the entrypoint script with the following:

#!/bin/bash

PODS=(
    "mongodb-0"
    "mongodb-1"
    "mongodb-2"
)
PORT="27017"
SERVICE="mongodb"

# Quote the expansions so credentials containing spaces survive intact
MONGO_ARGS=(
    --quiet
    --port "${PORT}"
    --username "$(< "${MONGO_INITDB_ROOT_USERNAME_FILE}")"
    --password "$(< "${MONGO_INITDB_ROOT_PASSWORD_FILE}")"
)

# Ensure all MongoDB pods are up
echo "Waiting for all MongoDB pods to become available."
PODS_UP=0
while [ "${PODS_UP}" -ne "${#PODS[@]}" ]
do
    sleep 1
    PODS_UP=0
    for POD in "${PODS[@]}"
    do
        # db.stats() reports "ok: 1" once the pod accepts authenticated connections
        MONGO_DB_STATUS=$(mongo "${MONGO_ARGS[@]}" --host "${POD}.${SERVICE}" --eval "JSON.stringify(db.stats())" | jq -r '.ok' 2> /dev/null)
        if [ "${MONGO_DB_STATUS}" = "1" ]
        then
            echo "MongoDB pod '${POD}' is available!"
            PODS_UP=$((PODS_UP + 1))
        fi
    done
done
echo "All MongoDB pods have become available!"

# Check if we already have a ReplicaSet
MONGO_RS_STATUS=$(mongo "${MONGO_ARGS[@]}" --host "${PODS[0]}.${SERVICE}" --eval "JSON.stringify(rs.status())" | jq -r '.ok')
if [ "${MONGO_RS_STATUS}" = "1" ]
then
    echo "ReplicaSet already exists! Nothing to do."
    exit 0
fi

# All nodes are up but give them time to settle
sleep 5

MEMBERS="{ _id: 0, host: '${PODS[0]}.${SERVICE}:${PORT}' }"
for (( i=1; i<${#PODS[@]}; i++ ))
do
    MEMBERS="${MEMBERS}, { _id: $i, host: '${PODS[$i]}.${SERVICE}:${PORT}' }"
done

# The _id must match the replica set name mongod was started with (--replSet)
mongo "${MONGO_ARGS[@]}" \
    --host "${PODS[0]}.${SERVICE}" \
    --eval "rs.initiate({
                _id: '${MONGO_INITDB_DATABASE}',
                members: [ ${MEMBERS} ]
            })"
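
The member list assembled by the loop above can be checked without a running cluster. A small sketch, with the same logic wrapped in a hypothetical `build_members` function (the function name and its argument order are illustrative only):

```shell
#!/bin/bash

# Hypothetical helper: build the member list for rs.initiate().
# Arguments: service name, port, then one or more pod names.
build_members() {
    local service="$1"
    local port="$2"
    shift 2
    local members=""
    local id=0
    local pod
    for pod in "$@"
    do
        # Separate entries with a comma, but not before the first one
        if [ -n "${members}" ]
        then
            members="${members}, "
        fi
        members="${members}{ _id: ${id}, host: '${pod}.${service}:${port}' }"
        id=$((id + 1))
    done
    echo "${members}"
}

# Example: the three pods used in the script above
build_members "mongodb" "27017" "mongodb-0" "mongodb-1" "mongodb-2"
```

Printing the list this way makes it easy to spot a wrong host name or port before the Job calls `rs.initiate()`.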