Here’s a step-by-step guide on how to install Airflow 2.2.0 on AWS EC2.
AWS EC2 details:
- Amazon Linux 2 AMI (HVM), SSD Volume Type (64-bit x86)
- type: t2.micro (Free Tier Eligible)
- vCPU: 1
- Memory: 1 GB
All the configuration is left by default. Just click review and launch instances.
Since we are going to install the lightest version of Airflow, for testing only, we will use SQLite as the database.
Prerequisite:
- Python 3.6 or above
- SQLite 3.15.0 or above
Now on to the actual step by step:
Check python version
$ python3
Python 3.7.10 (default, Jun 3 2021, 00:02:01)
[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Since it’s already python 3.6 or above, we don’t need to upgrade python. But, if yours is below 3.6, my condolences.
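The same check can be done without opening the interactive console. Here is a minimal sketch, assuming the 3.6 minimum from the prerequisites above; the function name is only illustrative:

```python
import sys

# Minimum Python version taken from the prerequisites list.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if (major, minor) of the interpreter is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


print("Python", sys.version.split()[0], "supported:", python_is_supported())
```

Comparing tuples rather than strings keeps the check correct for versions such as 3.10.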
Check SQLite version
There are two SQLite versions that need to be checked.
- The actual installed version from the SQLite console
- The version used by Python.
To check the actual installed version of the SQLite, type in sqlite3, and you should see the version as shown below. Use Ctrl + D to exit console.
$ sqlite3
SQLite version 3.7.17 2013-05-20 00:56:22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>
To check the version used by Python, open the Python console as above and type the following commands.
>>> import sqlite3
>>> sqlite3.sqlite_version
'3.7.17'
>>>
sqlite3.sqlite_version is different from sqlite3.version. The latter is the version of the Python sqlite3 module itself, not of the SQLite library, so it usually shows something much lower. See the links below for reference.
Anyway, since the version (3.7.17) is lower than the prerequisite (3.15.0+), we need to upgrade SQLite, which in this case means reinstalling it.
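If you would rather let Python do the comparison, the sketch below checks the SQLite version it sees against the 3.15.0 minimum. The helper names are made up for illustration. Note that the versions are compared as tuples of integers: compared as plain strings, "3.7.17" would wrongly sort above "3.15.0".

```python
import sqlite3

# Minimum SQLite version taken from the prerequisites list.
MIN_SQLITE = (3, 15, 0)


def parse_version(text):
    """Turn a dotted version string such as '3.7.17' into (3, 7, 17)."""
    return tuple(int(part) for part in text.split("."))


def sqlite_is_supported(version_text, minimum=MIN_SQLITE):
    """Return True if the dotted version is at least `minimum`."""
    return parse_version(version_text) >= minimum


current = sqlite3.sqlite_version
print("SQLite", current, "supported:", sqlite_is_supported(current))
```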
Install SQLite
- Go to https://www.sqlite.org/download.html, and find the source code. It should be the one with “autoconf” in the filename.
- Copy link address for that file, and use wget command to download the source file in your EC2 server.
wget https://www.sqlite.org/2021/sqlite-autoconf-3360000.tar.gz
- Extract source file, and go to the extracted folder
tar zxvf sqlite-autoconf-3360000.tar.gz
cd sqlite-autoconf-3360000/
- Before compiling and installing the source code, we need to install development tools from yum
sudo yum groupinstall "Development Tools" -y
- Run the command below one by one to compile and install sqlite
./configure
make
sudo make install
Now, when you check the installed version from the SQLite console, it will show the version we just installed
$ sqlite3
SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite>
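The 3360000 in the tarball name is this same version, 3.36.0, in the packed form that the SQLite download page uses for filenames (3XXYY00 for release 3.X.Y). Here is a small sketch that decodes it, which can help when picking a file from the download page; the helper name is hypothetical:

```python
def decode_sqlite_filename_version(encoded):
    """Turn a packed download version such as '3360000' into '3.36.0'."""
    if len(encoded) != 7 or not encoded.isdigit():
        raise ValueError("expected 7 digits, got %r" % (encoded,))
    major = int(encoded[0])
    minor = int(encoded[1:3])
    patch = int(encoded[3:5])
    branch = int(encoded[5:7])  # non-zero only for branch releases
    parts = [major, minor, patch]
    if branch:
        parts.append(branch)
    return ".".join(str(part) for part in parts)


print(decode_sqlite_filename_version("3360000"))  # prints 3.36.0
```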
Unfortunately, when you check the version used by python, it will still show the old version, like nothing happened.
To fix this, find the sqlite3.so files to see where the different versions reside. Quote the pattern so the shell does not expand it before find runs.
sudo find / -name "*sqlite3.so*"
In my case, it shows something like this:
$ sudo find / -name "*sqlite3.so*"
/usr/lib64/libsqlite3.so.0
/usr/lib64/libsqlite3.so.0.8.6
/usr/lib64/python2.7/lib-dynload/_sqlite3.so
/usr/local/lib/libsqlite3.so.0.8.6
/usr/local/lib/libsqlite3.so.0
/usr/local/lib/libsqlite3.so
/home/ec2-user/sqlite-autoconf-3360000/.libs/libsqlite3.so.0.8.6
/home/ec2-user/sqlite-autoconf-3360000/.libs/libsqlite3.so.0
/home/ec2-user/sqlite-autoconf-3360000/.libs/libsqlite3.so
The SQLite that comes with the OS is installed under /usr/lib or /usr/lib64. Meanwhile, the one that we installed ourselves is under /usr/local: the library in /usr/local/lib, as the listing shows, and the sqlite3 binary in /usr/local/bin.
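To see which of these copies a running Python process has actually loaded, one option on Linux is to read /proc/self/maps after importing sqlite3. This is only a sketch under that assumption: it returns an empty list where /proc is unavailable or where SQLite is linked statically into Python, and the function name is illustrative.

```python
import sqlite3  # importing the module loads the underlying C library


def loaded_sqlite_libraries(maps_path="/proc/self/maps"):
    """List the libsqlite3 shared objects mapped into this process."""
    found = set()
    try:
        with open(maps_path) as maps:
            for line in maps:
                # Columns: address perms offset dev inode pathname
                fields = line.split(None, 5)
                if len(fields) == 6 and "libsqlite3" in fields[5]:
                    found.add(fields[5].strip())
    except OSError:
        pass  # not Linux, or /proc is not mounted
    return sorted(found)


print("SQLite version seen by Python:", sqlite3.sqlite_version)
for path in loaded_sqlite_libraries():
    print("loaded from:", path)
```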
To make Python use the version that we just installed, replace the libsqlite3.so files in /usr/lib64 with the ones from the sqlite-autoconf folder. Feel free to back up the original files first. Use sudo if your user account is not allowed to replace files in /usr/lib64.
cp /home/ec2-user/sqlite-autoconf-3360000/.libs/libsqlite3.so.0.8.6 /usr/lib64/libsqlite3.so.0.8.6
cp /home/ec2-user/sqlite-autoconf-3360000/.libs/libsqlite3.so.0 /usr/lib64/libsqlite3.so.0
cp /home/ec2-user/sqlite-autoconf-3360000/.libs/libsqlite3.so /usr/lib64/libsqlite3.so
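If you want the backup mentioned above to happen automatically, here is a minimal sketch of a copy helper that saves the original once before overwriting it. The function name and the ".orig" suffix are my own choices, not part of any tool, and it needs to run as root to write under /usr/lib64.

```python
import os
import shutil


def backup_then_replace(new_file, target, suffix=".orig"):
    """Copy `target` to `target + suffix` (only once), then overwrite it with `new_file`."""
    backup = target + suffix
    # Keep the very first backup so a second run does not clobber the original.
    if os.path.exists(target) and not os.path.exists(backup):
        shutil.copy2(target, backup)
    shutil.copy2(new_file, target)
    return backup


# Example with the paths from this guide (run as root):
# src = "/home/ec2-user/sqlite-autoconf-3360000/.libs/"
# for name in ("libsqlite3.so.0.8.6", "libsqlite3.so.0", "libsqlite3.so"):
#     backup_then_replace(src + name, "/usr/lib64/" + name)
```

To roll back, copy the ".orig" files over the replaced ones.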
See if your python is already pointing to the newly installed version of SQLite
$ python3
Python 3.7.10 (default, Jun 3 2021, 00:02:01)
[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> sqlite3.sqlite_version
'3.36.0'
>>>
Yay we’re almost done.
Install Airflow
Learning from experience, I know for a fact that the Airflow example DAGs need virtualenv, and the Python in EC2 doesn’t have that by default. So you need to install it first.
pip3 install virtualenv
Then, install airflow’s latest version using the following command
pip3 install apache-airflow
Or, if you want to pin the Airflow version and use the constraints file that matches your Python version, you can use the following commands
AIRFLOW_VERSION=2.2.2
PYTHON_VERSION="$(python3 --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip3 install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
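To make it clearer what the shell variables expand to, here is the same URL construction sketched in Python. The function name is illustrative; the URL pattern is the one used in the commands above.

```python
import sys


def constraint_url(airflow_version, python_version=None):
    """Build the constraints URL for an Airflow version and a major.minor Python version."""
    if python_version is None:
        # Same result as the cut pipeline above: keep only major.minor.
        python_version = "%d.%d" % tuple(sys.version_info[:2])
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        "constraints-%s/constraints-%s.txt" % (airflow_version, python_version)
    )


print(constraint_url("2.2.2", "3.7"))
```

On the EC2 instance in this guide (Python 3.7.10), this yields the constraints-3.7.txt file for Airflow 2.2.2.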
Now you can start Airflow magically with just one command as follows:
airflow standalone
which is actually a combination of the commands below. You can also run them one by one to see what each of them does.
airflow db init
airflow users create \
--username admin \
--firstname Peter \
--lastname Parker \
--role Admin \
--email spiderman@superhero.org
airflow webserver --port 8080
airflow scheduler
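Once the webserver and scheduler are running, you can check them from the instance itself. The sketch below assumes the webserver is on port 8080 as above and that it serves a JSON document at /health with one entry per component, each carrying a "status" field. The function names are made up for illustration.

```python
import json
import urllib.error
import urllib.request


def fetch_health(base_url="http://localhost:8080", timeout=5):
    """Return the parsed /health JSON, or None if the webserver is unreachable."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None


def component_statuses(health):
    """Map each component name in the payload to its reported status."""
    return {name: info.get("status") for name, info in (health or {}).items()}


# Usage on the EC2 instance, after starting Airflow:
# print(component_statuses(fetch_health()))
```

An empty result means the webserver could not be reached, which usually means it has not finished starting yet.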
Important links:
- Airflow quick start installation documentation: https://airflow.apache.org/docs/apache-airflow/stable/start/local.html
- How to make python use the updated SQLite version: https://sqlite.org/forum/info/233dca76521f207b