Back to Blog Posts

Using APScheduler in Python to run a MySQL database backup

Blog

Wednesday, 06 November 2019

The straightforward approach to running a database backup on a schedule can be solved by creating a cron job to run mysqldump and then send the output to a file.


0 23 * * 6 mysqldump -u 'db_username' -p'db_password' db_name > /backup/db_name.sql

Instead, I wanted to schedule and run backups from within the Flask application to make for one less configuration step to remember whenever the application is installed fresh. A common approach for running background processes is to use an asynchronous task queue like Celery or Redis Queue, however these would be considered overkill for a weekly job that runs for 1 second at a time. These options can be revisited for larger or more frequent background processes in future.

A more suited tool for this scenario is the Advanced Python Scheduler (APScheduler) library. This would allow the application to execute a function periodically in the background by decorating it as a job and adding it to a defined schedule. This library can be installed with pip.


pip install apscheduler

There are several types of schedulers offered by APScheduler however the most suitable for this use case is the BackgroundScheduler, as the documentation states to use this option when you want the scheduler to run in the background inside your application. This will mean that the backup can take place without interrupting the main thread and a user will not have to wait until it has completed before continuing to use the site.

The scheduler gets imported and initialised in the applications __init__.py file.


from apscheduler.schedulers.background import BackgroundScheduler

[..]

task_schedule = BackgroundScheduler(daemon=True)
task_schedule.start()

A file named tasks.py has been created to contain the code for jobs that are to be run on schedules. The task_schedule that was initialised earlier is imported along with configuration values and modules needed to run the backup.


import os
import time
import pipes
import pathlib
from app import app, task_schedule

The pathlib module is used to create the backup directory if it does not exist. Specifying parents=True will create any parent directories required for the full path and exist_ok=True won't cause errors if the directory has already been created. The folder for each backup will be named after the time the backup occurred to create unique and organised folders.

The function for running the backup - create_database_backup() has been decorated with the scheduled_job from the imported task_schedule. This is then given a trigger which for this job is cron. This is intended to work similarly to the Unix utility by running the job when the current time satisfies all of the conditions passed to it. I've specified Saturday and 23:00 - so a backup will run weekly (unless the function is called manually).

The code that actually runs the backup is basically the same as the example at the top of the page, as the imported OS module will allow Python to run system commands. The mysqldump_process contains the written out command for running the backup by concatenating the variables imported and created so far. pipes.quote is used to escape characters within the backup path so the command executes correctly in the shell. This is then passed to os.system to execute.

Other than that, the backup time gets logged before and after the backup to keep on top of how long a backup takes.


@task_schedule.scheduled_job('cron', day_of_week='sat', hour=23)
def create_database_backup():

    backup_time = time.strftime('%Y%m%d-%H%M%S')
    db_backup_path = backup_path + '/database/' + backup_time + '/'
    pathlib.Path(db_backup_path).mkdir(parents=True, exist_ok=True)

    app.logger.info("Starting database backup at: " + backup_time)

    mysqldump_process = "mysqldump" + " -u " + mysql_backup_user + " -p" + mysql_backup_pass + " " + mysql_backup_db\
                        + " > " + pipes.quote(db_backup_path) + mysql_backup_db + ".sql"
    os.system(mysqldump_process)

    backup_complete_time = time.strftime('%Y%m%d-%H%M%S')
    app.logger.info("Completed database backup at: " + backup_complete_time)


Leave a Comment

Comments (0)