VAMI Backup Error

VMware Cloud Director VAMI Backup

You may want to take a VMware Cloud Director VAMI backup, only to run into the error: Unable to create backup: A failure occurred while creating the backup on the primary.

Introduction

Sometimes the best efforts still lead to situations you cannot fix.

In your research to solve this dastardly problem, you may come across the document on how to prepare the transfer server storage.

https://docs.vmware.com/en/VMware-Cloud-Director/10.5/VMware-Cloud-Director-Install-Configure-Upgrade-Guide/GUID-BCC3CFF0-E85A-450C-8A5E-3723DFC1A093.html

Now that doc has great setup tips and this should work 90% of the time. But what happens if you use a Debian-based NFS share? Alternatively, what happens if you *don’t* use a Debian-based NFS share and it *still* doesn’t work?

Debian

You can try the fixes outlined in this KB: https://kb.vmware.com/s/article/94755

Disable the RPCMOUNTDOPTS="--manage-gids" setting.

  1. Shut down all cells.
  2. Edit /etc/default/nfs-kernel-server and change RPCMOUNTDOPTS="--manage-gids" to RPCMOUNTDOPTS=""
  3. Reboot the NFS server.
  4. Modify the /etc/passwd file on all cells to fix the group ID of the postgres user (the group might be listed as 100 instead of the proper 1002):
    • postgres:x:1002:1002::/var/vmware/vpostgres/14/:/bin/bash
  5. Start the cells back up.
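
Steps 2 and 4 above can be sketched as two small shell helpers. This is only an illustration under a few assumptions: the function names disable_manage_gids and postgres_gid are made up for this example (they are not part of the KB or any VMware tooling), the NFS server has GNU sed, and the default file paths are the ones named in the steps. Each function also accepts a path argument so you can try it on a copy first.

```shell
#!/bin/bash
# Illustrative helpers only; the names are hypothetical, not VMware tooling.

# Step 2 (run on the NFS server): set RPCMOUNTDOPTS to an empty value.
# A copy of the original file is kept next to it as <file>.bak.
disable_manage_gids() {
    local file="${1:-/etc/default/nfs-kernel-server}"
    cp -p "$file" "$file.bak" || return 1
    # Replace whatever the option line currently holds (GNU sed -i assumed).
    sed -i 's/^RPCMOUNTDOPTS=.*/RPCMOUNTDOPTS=""/' "$file"
}

# Step 4 (run on each cell): print the group ID stored for the postgres user.
# Field 4 of a passwd entry is the primary GID.
postgres_gid() {
    local file="${1:-/etc/passwd}"
    awk -F: '$1 == "postgres" { print $4 }' "$file"
}

# Example usage:
#   disable_manage_gids      # then reboot the NFS server (step 3)
#   postgres_gid             # the fix above expects this to print 1002
```

If postgres_gid prints anything other than 1002, edit the postgres line so it matches the entry shown in step 4.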

That might fix it. I used Ubuntu for two different VMware Cloud Director Instances. One instance was fixed with the steps above. However, the other instance *still* would not work.

What happens if it still doesn’t work?

For those cases where it still does not work, VMware support provided me with a new create-backups.sh script that solved the problem on my other VCD instance.

SSH into your VMware Cloud Director cell(s). On every cell, take a backup of the existing create-backups.sh script in /opt/vmware/appliance/bin, then replace it with the script below. Name the new file create-backups.sh and make sure it is owned by root with permissions 755.

mv /opt/vmware/appliance/bin/create-backups.sh /opt/vmware/appliance/bin/create-backups.sh.bak
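
If you would rather do the backup and the replacement in one step, something like the helper below might work. It is a sketch, not a VMware-provided procedure: the function name install_backup_script and the /tmp/create-backups.sh.new path in the usage comment are hypothetical. It copies the old script to .bak if one is present, installs the new one with mode 755, and sets root ownership when run as root.

```shell
#!/bin/bash
# Hypothetical helper: install_backup_script NEW_SCRIPT [TARGET]
install_backup_script() {
    local new_script="$1"
    local target="${2:-/opt/vmware/appliance/bin/create-backups.sh}"

    # Keep the current script as <target>.bak, preserving mode and timestamps.
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi

    # install(1) copies the file and applies the requested mode in one step.
    install -m 755 "$new_script" "$target" || return 1

    # Only root can hand ownership to root; skip this when run unprivileged.
    if [ "$(id -u)" -eq 0 ]; then
        chown root:root "$target"
    fi
}

# Example usage (as root, on every cell), assuming the new script was
# saved to the hypothetical path /tmp/create-backups.sh.new:
#   install_backup_script /tmp/create-backups.sh.new
```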

New create-backups.sh Script

#!/bin/bash
# Copyright 2019-2022 VMware, Inc.  All rights reserved.
VCLOUD_HOME=/opt/vmware/vcloud-director
VMWARE_POSTGRES_BIN=/opt/vmware/vpostgres/current/bin
NFSMOUNT=$VCLOUD_HOME/data/transfer
LOG_DIR="/opt/vmware/var/log/vcd"
LOG_FILE="$LOG_DIR/backup.log"
NODE_TYPE_FILE="$VCLOUD_HOME/appliance-type"
APPL_HOME="/opt/vmware/appliance"
BIN_DIR="$APPL_HOME/bin"
APPL_SSL="$APPL_HOME/etc/ssl"
VCD_PHASE_COMPLETE_FILE="$APPL_HOME/etc/vcd-configuration-completed"

touch $LOG_FILE
source $BIN_DIR/common-utils.sh
get_error_codes

# Prevent simultaneous executions
LOCK_FILE=/tmp/backup.lock
if [ -e "$LOCK_FILE" ]
then
    log_and_echo_error "Another Backup creation is in progress. Please retry later."
    exit $LOCK_CONFLICT_ERROR
else
    touch $LOCK_FILE
    chown vcloud.vcloud $LOCK_FILE
fi
trap "rm -f $LOCK_FILE" EXIT

log_and_echo "Invoking Backup utility ..."

if [ $# -eq 0 ]; then
    log "Command line usage to create embedded PG DB backup: create-backup.sh <DBNAME>"
    log_and_echo "Using vcloud as default PG DB to backup"
fi

DBNAME=${1:-vcloud}                                # Database name whose backup needs to be created
BACKUPS_DIR="/var/vmware/vpostgres/backups"        # Directory where the backups/dumps will be stored
DATE=`date -d now --iso-8601=seconds`              # iso8601 formatted date
DATE=${DATE//:}                                    # Updated date with : trimmed
BACKUP_DIR="$BACKUPS_DIR/backup-$DATE"
DB_DUMP="$DBNAME-database.sql"
DB_DUMP_PATH="$BACKUP_DIR/$DB_DUMP"
ZIP_FILE="$BACKUPS_DIR/backup-$DATE.zip"

HTTP_CERT_FILE=`cat $NFSMOUNT/responses.properties |grep user.certificate.path|cut -d'=' -f2`
if [ -z "${HTTP_CERT_FILE// }" ]; then
        log_and_echo_error "Missing user.certificate.path value in $NFSMOUNT/responses.properties. Diagnose the missing HTTP cert file path issue before reattempting the backup creation."
        exit $FILE_NOT_FOUND
fi
HTTP_KEY_FILE=`cat $NFSMOUNT/responses.properties |grep user.key.path|cut -d'=' -f2`
if [ -z "${HTTP_KEY_FILE// }" ]; then
        log_and_echo_error "Missing user.key.path value in $NFSMOUNT/responses.properties. Diagnose the missing HTTP key file path issue before reattempting the backup creation."
        exit $FILE_NOT_FOUND
fi

PGMI_CERT_FILE="$APPL_SSL/vcd_ova.crt"
PGMI_KEY_FILE="$APPL_SSL/vcd_ova.key"
log_and_echo "HTTP_CERT_FILE =$HTTP_CERT_FILE"
log_and_echo "HTTP_KEY_FILE =$HTTP_KEY_FILE"
log_and_echo "PGMI_CERT_FILE=$PGMI_CERT_FILE"
log_and_echo "PGMI_KEY_FILE=$PGMI_KEY_FILE"

SAVE_METADATA() {
    APPL_VERSION=$(/usr/bin/xmllint --xpath 'string(//fullVersion)' /opt/vmware/etc/appliance-manifest.xml)
    VCD_VERSION=$(grep product.version /opt/vmware/vcloud-director/etc/global.properties)

    echo "appliance.version=$APPL_VERSION" > $BACKUP_DIR/metadata.properties
    echo "$VCD_VERSION" >> $BACKUP_DIR/metadata.properties
}

DB_BACKUP() {
    su - postgres -c "$VMWARE_POSTGRES_BIN/pg_dump -v -Fc $DBNAME > $DB_DUMP_PATH" &>> $LOG_FILE
}

DB_USER_BACKUP() {
    su - postgres -c "$VMWARE_POSTGRES_BIN/pg_dumpall --roles-only | grep -e 'CREATE ROLE vcloud;\|ALTER ROLE vcloud WITH' > $BACKUP_DIR/vcloud-user.sql"
}

ARCHIVE_DUMP_PROPS_CERTS() {
    cp -p $VCLOUD_HOME/etc/global.properties $BACKUP_DIR
    cp -p $NFSMOUNT/responses.properties $BACKUP_DIR
    cp -p $VCLOUD_HOME/etc/truststore.pem $BACKUP_DIR

    cp -p $HTTP_KEY_FILE $BACKUP_DIR
    if [ $? -ne 0 ]; then
         EXIT_FILE_NOT_FOUND $HTTP_KEY_FILE
    fi

    cp -p $HTTP_CERT_FILE $BACKUP_DIR
    if [ $? -ne 0 ]; then
         EXIT_FILE_NOT_FOUND $HTTP_CERT_FILE
    fi

    cp -p $PGMI_KEY_FILE $BACKUP_DIR
    if [ $? -ne 0 ]; then
        EXIT_FILE_NOT_FOUND $PGMI_KEY_FILE
    fi
    cp -p $PGMI_CERT_FILE $BACKUP_DIR
    if [ $? -ne 0 ]; then
        EXIT_FILE_NOT_FOUND $PGMI_CERT_FILE
    fi
    cd $BACKUP_DIR
    zip $ZIP_FILE *
    mv $ZIP_FILE /opt/vmware/vcloud-director/data/transfer/backups
}

EXIT_FILE_NOT_FOUND() {
        log_and_echo_error "Failed to take a backup of $1. Check if $1 exists."
        CLEANUP
        exit $FILE_NOT_FOUND
}

CLEANUP() {
    [ -d $BACKUP_DIR ] && rm -rf $BACKUP_DIR
    if [ $archive_return_code -ne 0 ]; then
        [ -f $ZIP_FILE ] && rm $ZIP_FILE
    fi
}

if [ ! -e "$VCD_PHASE_COMPLETE_FILE" ]; then
    log_and_echo_error "This Cloud Director appliance has not yet been successfully configured."
    exit $VCD_PHASE_NOT_COMPLETE_ERROR
fi

if [ ! -e $NODE_TYPE_FILE ]; then
    log_and_echo_error "Unable to determine appliance type. $NODE_TYPE_FILE is not present on appliance."
    exit $NO_APPLIANCE_TYPE_FOUND_ERROR
fi

node_type=$(<$NODE_TYPE_FILE)
if [ $node_type != "primary" ]; then
    log_and_echo_error "Backup utility should be invoked from primary node."
    exit $EXIT_NOT_APPLICABLE
fi


if grep -qs "$NFSMOUNT" /proc/mounts; then
    /usr/bin/timeout 5s /usr/bin/ls $NFSMOUNT &> /dev/null
    if [ $? -ne 0 ]; then
        log_and_echo_error "Timed out listing nfs mounted share. Backup creation cannot continue."
        exit $NFS_TIME_ERROR
    fi

    if [ ! -d "$BACKUPS_DIR" ]; then
        log_and_echo "Creating top level backups directory $BACKUPS_DIR because it does not exist..."
        mkdir -p $BACKUPS_DIR
        log_and_echo "Changing the permissions to 770 and ownership to vcloud user on $BACKUPS_DIR"
        chmod 770 $BACKUPS_DIR
        chown vcloud.vcloud $BACKUPS_DIR
    fi

    log_and_echo "Creating back up directory $BACKUP_DIR"
    mkdir $BACKUP_DIR
    if [ $? -ne 0 ]; then
        log_and_echo_error "Failed to create backup directory $BACKUP_DIR. Ensure that nfs server settings are correct before reattempting."
        exit $OS_COMMAND_FAILED
    fi
    log_and_echo "Changing the permissions to 770 and ownership to vcloud user on $BACKUP_DIR"
    chmod 770 $BACKUP_DIR
    chown vcloud.vcloud $BACKUP_DIR
else
    log_and_echo_error "$NFSMOUNT is not mounted. Ensure that the NFS is mounted before reinvoking this again."
    exit $NFS_MOUNTPOINT_DOES_NOT_EXIST_ERROR
fi


log_and_echo "Saving metadata to $BACKUP_DIR/metadata.properties ..."
SAVE_METADATA

log_and_echo "Saving the vcloud DB user to $BACKUP_DIR/vcloud-user.sql..."
DB_USER_BACKUP
db_user_backup_return_code=$?
if [ $db_user_backup_return_code -ne 0 ]; then
    log_and_echo_error "Failed saving the vcloud DB user to $BACKUP_DIR/vcloud-user.sql"
    log_and_echo_error "Ensure that all the requirements on the NFS server configuration are met before reattempting this operation again. This includes but not limited to checking the export settings on NFS server and other settings based on the distribution on which NFS server is setup."
    log_and_echo_error "Refer to the product documentation on 'Preparing the Transfer Server Storage for the VMware Cloud Director Appliance' regarding the requirements to be met for configuring the NFS server."
    exit $db_user_backup_return_code
fi

log_and_echo "Creating the $DBNAME DB backup at $BACKUP_DIR..."

DB_BACKUP
db_backup_return_code=$?
if [ $db_backup_return_code -eq 0 ]; then
    log_and_echo "$DBNAME DB backup successfully created."
    log_and_echo "Archiving DB backup, primary node's properties files, certs, metadata ..."

    ARCHIVE_DUMP_PROPS_CERTS
    archive_return_code=$?
    if [ $archive_return_code -eq 0 ]; then
        chown vcloud.vcloud $ZIP_FILE
        log_and_echo "Tar archive successfully created."
        log_and_echo "$DBNAME DB dump, properties files, certs and metadata have been archived to $ZIP_FILE"
        log_and_echo "Backup Creation Complete."
    else
        log_and_echo_error "Failed to archive $DBNAME DB dump, properties files, certs and metadata."
        log_and_echo_error "Backup Creation Failed."
        log_and_echo_error "Please check the logs at $LOG_FILE"
    fi

    CLEANUP
    exit $archive_return_code
else
    archive_return_code=1                # To indicate archive was not taken which would happen when DB backup could not be successfully taken
    CLEANUP
    log_and_echo_error "$DBNAME DB backup failed."
    log_and_echo_error "Please check the logs at $LOG_FILE"
    exit $db_backup_return_code
fi

The next time you run a VMware Cloud Director VAMI backup, it should complete successfully.
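
To confirm the result from the shell, a small check like the one below may help. It is a sketch: the function name latest_backup is made up for this example, and the default directory is the transfer share location that the script above moves the zip file into. It relies on the backup file names embedding an ISO 8601 timestamp, so a plain name sort puts the newest last (assuming all backups were taken with the same UTC offset).

```shell
#!/bin/bash
# Hypothetical helper: print the newest backup-*.zip in a directory.
# Returns non-zero when no backup archive is found.
latest_backup() {
    local dir="${1:-/opt/vmware/vcloud-director/data/transfer/backups}"
    local newest
    # Names look like backup-<iso8601 date>.zip, so sorting by name
    # also sorts by creation time.
    newest=$(ls -1 "$dir"/backup-*.zip 2>/dev/null | sort | tail -n 1)
    [ -n "$newest" ] && echo "$newest"
}

# Example usage on the primary cell:
#   latest_backup || echo "no backup archive found"
#   tail -n 5 /opt/vmware/var/log/vcd/backup.log
```

A successful run of the script above ends with "Backup Creation Complete." in /opt/vmware/var/log/vcd/backup.log.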

Check out my other VMware articles at https://explosive.cloud/category/vmware/ – I look to only post challenges I run across that aren’t easily solved.