Liquibase Lock Release with K8s Scheduled Job
If you have ever run a serious production database (is there any other kind, really? 🤔 ), you are most likely using some sort of database migration tool. Tools like Flyway, Liquibase, or Knex for JS are pretty good at the job and hadn't caused much trouble in our lives until we started treating our apps as cattle and running them in K8s: replicated, rescheduled every now and then, without us even knowing. They scale up and down automatically, sometimes even in the middle of the night while we sleep soundly in our beds. Then something interesting starts to happen….
Problem statement: Lock it & throw away the Pod
All of the tools mentioned above have a safety mechanism that places a temporary lock on a table while schema validation or migration is taking place, in order to ensure consistency. However, in a Kubernetes context (pun intended), it can happen that the app takes longer to validate the schema, or fails its healthcheck, at exactly the moment when the lock is present on the database.
That leads to the Pod being deleted and replaced by another one, which will start but, as soon as it sees a lock from a different Pod, will fail to start and crash. Welcome to CrashLoopBackOff.
Fixing the problem in less than 80 lines of Go
This solution is specific to Liquibase, but it can easily be adapted to other DB migration tools out there with a few tweaks. Our goal (and hence, our logic) is simple: we want to check the table for any locks and remove them. Hmmm, wait: what if the lock is legit? We don't want to accidentally remove a lock that we're not supposed to, right? In that case, let's set up a threshold and say that if our lock is older than X minutes, we remove it. Let's dig into it!
Pseudocode (it’s still code, bro)
package main

func main() {
    // Connect to DB
    // Check table for locks
    // If locked
    //   check time of lock
    //   if lock time is higher than X minutes
    //     remove lock
    // end
}
Less pseudo and more code
In order to connect to the database, we will need a driver. The official package go-sql-driver looks good for the job. Looking at the documentation, we can see that connecting to the database is pretty straightforward:
package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/go-sql-driver/mysql"
)

func main() {
    // Connect to DB
    db, err := sql.Open("mysql", "username:pass@tcp(localhost:3306)/schema_name")
    if err != nil {
        log.Fatal(err)
    }
    db.SetConnMaxLifetime(time.Minute * 3)
    db.SetMaxOpenConns(1)
    db.SetMaxIdleConns(1)
    defer db.Close()

    // Check table for locks
    // If locked
    //   check time of lock
    //   if lock time is higher than X minutes
    //     remove lock
    // end
}
At this point, we can check the table for locks by running a query. We only need 3 fields from the table: LOCKED - a boolean that tells us if the lock is present or not; LOCKGRANTED - the timestamp of the lock; and LOCKEDBY - which shows us which Pod has locked the table, purely for traceability purposes. We will also need a struct to work with the data:
type lock struct {
    isLocked    bool
    lockGranted string
    lockedBy    string
}

// connect to database....

result := db.QueryRow("SELECT LOCKED, LOCKGRANTED, LOCKEDBY FROM DATABASECHANGELOGLOCK")

var l = lock{}
err := result.Scan(&l.isLocked, &l.lockGranted, &l.lockedBy)
if err != nil {
    log.Fatal(err)
}

// ....
However, here I ran into a few issues. First, my LOCKED field was of type tinyint, so I couldn't deserialize it properly into a bool field, which resulted in error messages like ...sql/driver: couldn't convert "\x01" into type bool....
I saw that the sql package has some pretty useful field wrappers like NullBool and NullString that can be used in case of empty results and the like. Unfortunately, it still didn't work for me, so I ended up using a small hack found in this GitHub issue, which made me rewrite my query like so:
result := db.QueryRow("SELECT (LOCKED = b'1'), LOCKGRANTED, LOCKEDBY FROM DATABASECHANGELOGLOCK")
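An alternative to this query-side cast would be to do the conversion in Go: scan the column into a []byte and interpret the raw tinyint value yourself. asBool below is a hypothetical helper, not part of the driver:

```go
package main

import "fmt"

// asBool interprets the raw bytes the MySQL driver hands back for a
// tinyint(1) column ("\x00" or "\x01") as a Go bool. A hypothetical
// app-side alternative to the (LOCKED = b'1') query hack.
func asBool(raw []byte) bool {
	return len(raw) > 0 && raw[0] == 1
}

func main() {
	fmt.Println(asBool([]byte("\x01"))) // true
	fmt.Println(asBool([]byte("\x00"))) // false
}
```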
I did end up, however, using sql.NullString for my other two struct fields, which is useful in case our query returns an empty result.
The end result
Now, this is how all the code looks together along with the logic for checking the lock and removing it:
package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/go-sql-driver/mysql"
)

type lock struct {
    isLocked    bool
    lockGranted sql.NullString
    lockedBy    sql.NullString
}

func main() {
    // Connect to DB
    db, err := sql.Open("mysql", "username:pass@tcp(localhost:3306)/schema_name")
    if err != nil {
        log.Fatal(err)
    }
    db.SetConnMaxLifetime(time.Minute * 3)
    db.SetMaxOpenConns(1)
    db.SetMaxIdleConns(1)
    defer db.Close()

    // Check table for locks
    result := db.QueryRow("SELECT (LOCKED = b'1'), LOCKGRANTED, LOCKEDBY FROM DATABASECHANGELOGLOCK")
    var l = lock{}
    err = result.Scan(&l.isLocked, &l.lockGranted, &l.lockedBy)
    if err != nil {
        log.Fatal(err)
    }

    // If locked
    if l.isLocked {
        // check time of lock
        lockTime, err := time.Parse("2006-01-02 15:04:05", l.lockGranted.String)
        if err != nil {
            log.Fatal(err)
        }
        now := time.Now()
        diff := now.Sub(lockTime)
        maxLockTime, err := time.ParseDuration("3m") // we are choosing 3 minutes as our max lock time
        if err != nil {
            log.Fatal(err)
        }
        // if lock time is higher than X minutes
        if diff > maxLockTime {
            lockedBy, _ := l.lockedBy.Value()
            log.Printf("Lock from %v older than %v detected, will try to delete...", lockedBy, diff.Minutes())
            // remove lock
            res, err := db.Exec("DELETE FROM DATABASECHANGELOGLOCK")
            if err != nil {
                log.Fatal(err)
            }
            rows, err := res.RowsAffected()
            if err != nil {
                log.Fatal(err)
            }
            log.Printf("Deleted locks: %v", rows)
        }
    }
    // end
}
STOP! Refactor time
The code above works, but it's very ugly, so we will proceed to refactor it. The following things are eye stingers:
- Everything is crammed into the main function. We want to separate the code into smaller, easier-to-read functions.
- Remove the comments.
- Extract strings into constants.
- Extract the database connection details into environment variables, since we don't want to store them in plain text in the code.
- Extract the maxLockTime into an environment variable, because we want this value to be easily configurable.
- Move error handling to a separate function, since it's making the code very unreadable.
- Leave a comment about our SQL query hack for the tinyint, so the next person (or ourselves in 6 months) knows why it's there.
After all of the above, here is how our code looks like now:
package main

import (
    "database/sql"
    "log"
    "os"
    "time"

    _ "github.com/go-sql-driver/mysql"
)

type lock struct {
    isLocked    bool
    lockGranted sql.NullString
    lockedBy    sql.NullString
}

const (
    // (LOCKED = b'1') is done to convert tinyint to boolean. See: https://github.com/go-sql-driver/mysql/issues/440
    DB_GET_LOCK_QUERY    = "SELECT (LOCKED = b'1'), LOCKGRANTED, LOCKEDBY FROM DATABASECHANGELOGLOCK"
    DB_DELETE_LOCK_QUERY = "DELETE FROM DATABASECHANGELOGLOCK"
    DB_ENGINE            = "mysql"
    DB_PORT              = "3306"
    TIME_LAYOUT          = "2006-01-02 15:04:05"
)

var (
    maxLockTime = os.Getenv("MAX_LOCK_TIME")
    DBUser      = os.Getenv("DB_USER")
    DBPass      = os.Getenv("DB_PASS")
    DBURL       = os.Getenv("DB_URL")
    DBName      = os.Getenv("DB_NAME")
)

func main() {
    db := connectToDb()
    defer db.Close()
    result := db.QueryRow(DB_GET_LOCK_QUERY)
    var l = lock{}
    e := result.Scan(&l.isLocked, &l.lockGranted, &l.lockedBy)
    handleError(e)
    if l.isLocked {
        handleDBLock(l, db)
    }
}

func handleDBLock(l lock, db *sql.DB) {
    lockTime, e := time.Parse(TIME_LAYOUT, l.lockGranted.String)
    handleError(e)
    now := time.Now()
    diff := now.Sub(lockTime)
    maxDuration, e := time.ParseDuration(maxLockTime)
    handleError(e)
    if diff > maxDuration {
        lockedBy, _ := l.lockedBy.Value()
        log.Printf("Lock from %v older than %v detected, will try to delete...", lockedBy, diff.Minutes())
        res, e := db.Exec(DB_DELETE_LOCK_QUERY)
        handleError(e)
        rows, e := res.RowsAffected()
        handleError(e)
        log.Printf("Deleted locks: %v", rows)
    }
}

func connectToDb() *sql.DB {
    connstr := DBUser + ":" + DBPass + "@tcp(" + DBURL + ":" + DB_PORT + ")/" + DBName
    db, err := sql.Open(DB_ENGINE, connstr)
    handleError(err)
    db.SetConnMaxLifetime(time.Minute * 3)
    db.SetMaxOpenConns(1)
    db.SetMaxIdleConns(1)
    return db
}

func handleError(e error) {
    if e != nil {
        log.Fatal(e)
    }
}
Now that’s more like it….
Containerize and K8s…ize
We want to run this inside a Kubernetes cluster as a CronJob. My other alternative was to run it as an AWS Lambda function, but the issue was that the database credentials would need to be stored as plain-text environment variables. I did some research and saw that we could use AWS SSM and store the credentials there, but then we would have to write our code with the Serverless Framework, which feels like adding a lot of unnecessary bloat for such a simple task. So let's put the code in a container and run it as a K8s CronJob.
For building the Docker image, we will use some pretty standard stuff: a Docker multi-stage build, running the binary from a scratch image:
FROM golang:alpine AS builder
WORKDIR $GOPATH/src/app/
COPY main.go .
COPY go.mod .
RUN go get -d -v
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags='-w -s -extldflags "-static"' -a -o /go/bin/app .
FROM scratch
COPY --from=builder /go/bin/app /go/bin/app
ENTRYPOINT ["/go/bin/app"]
The resulting image is pretty small (4MB!) and doesn't have a lot of room for exploits. I am sure it can be made more secure by using an unprivileged user, but I had some issues trying to set that up and couldn't be bothered since my time for this task was limited.
The last piece of the puzzle is the CronJob manifest, as well as the Secret where we will store our database credentials, which results in something like this:
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: liquibase-lock-release
  namespace: example
spec:
  schedule: "*/3 * * * *"
  concurrencyPolicy: Replace
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: liquibase-lock-release
              image: my-secure-registry.org/liquibase-lock-release:v1
              imagePullPolicy: Always
              env:
                - name: MAX_LOCK_TIME
                  value: "3m"
              envFrom:
                - secretRef:
                    name: liquibase-lock-release-secret
          restartPolicy: OnFailure
          imagePullSecrets:
            - name: my-image-pull-secret
---
apiVersion: v1
kind: Secret
metadata:
  name: liquibase-lock-release-secret
  namespace: example
type: Opaque
data:
  DB_USER: <base64-encoded-db-user>
  DB_PASS: <base64-encoded-db-pass>
  DB_URL: <base64-encoded-db-url>
  DB_NAME: <base64-encoded-schema-name>
At this point, we only need to kubectl apply -f this baby and we are ready to profit!
Hope you have enjoyed this post. The code for this can be found in this GitHub repository!
Keep on hacking!