Install Airflow 2.0 on AWS’s Free Tier EC2

Dinda
Jan 29, 2022

Here’s a step-by-step guide on how to install Airflow 2.2 on an AWS EC2 instance.

AWS EC2 details:

  • Amazon Linux 2 AMI (HVM), SSD Volume Type (64-bit x86)
  • type: t2.micro (Free Tier Eligible)
  • vCPU: 1
  • Memory: 1 GB

All other configuration is left at its defaults. Just click "Review and Launch" to start the instance.

Since we are installing the lightest possible Airflow setup, for testing only, we will use SQLite as the database.

Prerequisites:

  • Python 3.6 or above
  • SQLite 3.15.0 or above

Now on to the actual step by step:

Check python version

$ python3
Python 3.7.10 (default, Jun 3 2021, 00:02:01)
[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)] on linux
Type "help", "copyright", "credits" or "license" for more information.

Since it’s already python 3.6 or above, we don’t need to upgrade python. But, if yours is below 3.6, my condolences.
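If you would rather check this from a script than by eye, here is a minimal sketch. The helper name `meets_minimum` is made up for illustration, and the (3, 6) minimum is simply the prerequisite listed above.

```python
import sys

def meets_minimum(version_info, minimum=(3, 6)):
    """Return True if the (major, minor, ...) version is at least `minimum`."""
    # Compare as tuples of ints, so 3.10 correctly counts as newer than 3.6.
    return tuple(version_info[:2]) >= tuple(minimum)

# Check the interpreter that is running this script.
print("Python is new enough:", meets_minimum(sys.version_info))
```

Run it with python3 so it checks the same interpreter you will use for Airflow.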

Check SQLite version

There are two SQLite versions that need to be checked.

  • The actual installed version from the SQLite console
  • The version used by Python.

To check the actual installed version of SQLite, type in sqlite3, and you should see the version as shown below. Use Ctrl + D to exit the console.

$ sqlite3 
SQLite version 3.7.17 2013-05-20 00:56:22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>

To check the version used by Python, open the Python console as above and type the following commands.

>>> import sqlite3 
>>> sqlite3.sqlite_version
'3.7.17'
>>>

sqlite3.sqlite_version is different from sqlite3.version. The latter is the version of the Python sqlite3 module itself, not of the SQLite library, so it will usually show something way lower and is not the number we care about here.

Anyway, since the version (3.7.17) is lower than the prerequisite (3.15.0+), we need to upgrade SQLite, which in this case means building and installing a newer version from source.
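The comparison is easy to get wrong if you compare the version strings directly ("3.7.17" sorts after "3.15.0" as text). A small sketch of the check, where `parse_version` and `sqlite_is_new_enough` are illustrative names and 3.15.0 is the minimum from the prerequisites above:

```python
import sqlite3

MIN_SQLITE = (3, 15, 0)  # minimum taken from the prerequisites in this post

def parse_version(text):
    """Turn a dotted version string such as '3.7.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def sqlite_is_new_enough(version_text, minimum=MIN_SQLITE):
    """Compare numerically, component by component, instead of as strings."""
    return parse_version(version_text) >= minimum

# sqlite3.sqlite_version is the SQLite library that Python is linked against.
print("SQLite used by Python:", sqlite3.sqlite_version)
print("New enough:", sqlite_is_new_enough(sqlite3.sqlite_version))
```

On the fresh instance described here, the second line would report that the library is too old.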

Install SQLite

  • Go to https://www.sqlite.org/download.html, and find the source code. It should be the one with “autoconf” in the filename.
  • Copy the link address for that file, and use the wget command to download the source file to your EC2 server.
wget https://www.sqlite.org/2021/sqlite-autoconf-3360000.tar.gz
  • Extract the source file, and go to the extracted folder
tar zxvf sqlite-autoconf-3360000.tar.gz
cd sqlite-autoconf-3360000/
  • Before compiling and installing the source code, we need to install development tools from yum
sudo yum groupinstall "Development Tools" -y
  • Run the commands below one by one to compile and install SQLite
./configure
make
sudo make install
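A side note on the filename: the seven digits in the tarball name encode the version, so sqlite-autoconf-3360000.tar.gz is SQLite 3.36.0. As far as I know the scheme is one digit for the major version followed by two digits each for the remaining parts. Here is a sketch that decodes it, with `version_from_tarball` being a made-up helper name:

```python
import re

def version_from_tarball(filename):
    """Decode e.g. 'sqlite-autoconf-3360000.tar.gz' into '3.36.0'."""
    # Assumed layout: X YY ZZ PP (major, minor, patch, fourth component).
    match = re.search(r"-(\d)(\d{2})(\d{2})(\d{2})\.tar\.gz$", filename)
    if match is None:
        raise ValueError("unexpected filename: " + filename)
    major, minor, patch, _fourth = (int(group) for group in match.groups())
    return "{}.{}.{}".format(major, minor, patch)

print(version_from_tarball("sqlite-autoconf-3360000.tar.gz"))
```

This is handy for confirming that the file you picked from the download page meets the 3.15.0 minimum before you spend time compiling it.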

Now, when you check the installed version from the SQLite console, it will show the version we just installed

$ sqlite3 
SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite>

Unfortunately, when you check the version used by python, it will still show the old version, like nothing happened.

To fix this, find the sqlite3.so files to see where the different versions reside.

sudo find / -name "*sqlite3.so*"

In my case, it will show something like this:

$ sudo find / -name "*sqlite3.so*"
/usr/lib64/libsqlite3.so.0
/usr/lib64/libsqlite3.so.0.8.6
/usr/lib64/python2.7/lib-dynload/_sqlite3.so
/usr/local/lib/libsqlite3.so.0.8.6
/usr/local/lib/libsqlite3.so.0
/usr/local/lib/libsqlite3.so
/home/ec2-user/sqlite-autoconf-3360000/.libs/libsqlite3.so.0.8.6
/home/ec2-user/sqlite-autoconf-3360000/.libs/libsqlite3.so.0
/home/ec2-user/sqlite-autoconf-3360000/.libs/libsqlite3.so

The SQLite that comes with the OS is installed under /usr/lib or /usr/lib64. Meanwhile, the one we installed ourselves lives under /usr/local: the sqlite3 binary in /usr/local/bin and the library in /usr/local/lib.

To make Python use the version we just installed, replace the libsqlite3.so files in /usr/lib64 with the ones from the sqlite-autoconf build folder. Feel free to back up the original files first. Use sudo if your user account is not allowed to replace files in /usr/lib64.

cp /home/ec2-user/sqlite-autoconf-3360000/.libs/libsqlite3.so.0.8.6 /usr/lib64/libsqlite3.so.0.8.6
cp /home/ec2-user/sqlite-autoconf-3360000/.libs/libsqlite3.so.0 /usr/lib64/libsqlite3.so.0
cp /home/ec2-user/sqlite-autoconf-3360000/.libs/libsqlite3.so /usr/lib64/libsqlite3.so
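Since this overwrites libraries that came with the OS, keeping a copy of the originals is cheap insurance. Below is a rough sketch of the same copy step with a backup added. The helper `replace_with_backup` and the backup folder in the usage comment are hypothetical, and it would need root privileges to write to /usr/lib64. The backups deliberately go to a separate folder so that no extra libsqlite3 files end up sitting in the library directory.

```python
import shutil
from pathlib import Path

def replace_with_backup(src_dir, dst_dir, backup_dir, names):
    """Copy each named file from src_dir over dst_dir, saving the old one first."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    for name in names:
        src = Path(src_dir) / name
        dst = Path(dst_dir) / name
        backup = backup_dir / name
        # Only back up once, so a second run cannot overwrite the real original.
        if dst.exists() and not backup.exists():
            shutil.copy2(dst, backup)
        shutil.copy2(src, dst)

# Example call for the paths in this post (the backup folder is an assumption):
# replace_with_backup(
#     "/home/ec2-user/sqlite-autoconf-3360000/.libs",
#     "/usr/lib64",
#     "/home/ec2-user/sqlite-backup",
#     ["libsqlite3.so.0.8.6", "libsqlite3.so.0", "libsqlite3.so"],
# )
```

To roll back, copy the files from the backup folder over /usr/lib64 again.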

See if your python is already pointing to the newly installed version of SQLite

$ python3 
Python 3.7.10 (default, Jun 3 2021, 00:02:01)
[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> sqlite3.sqlite_version
'3.36.0'
>>>

Yay we’re almost done.

Install Airflow

Learning from experience, I know for a fact that the Airflow example DAGs need virtualenv, and the Python on EC2 doesn’t have it by default. So you need to install it first.

pip3 install virtualenv

Then, install airflow’s latest version using the following command

pip3 install apache-airflow

Or, if you want to pin the Airflow version and use the constraints file that matches your Python version, you can use the following commands

AIRFLOW_VERSION=2.2.2
PYTHON_VERSION="$(python3 --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip3 install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
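To see what the shell variables above expand to, here is the same URL built in Python. The function name `constraint_url` is just for illustration, and the URL pattern is copied from the command above.

```python
import sys

def constraint_url(airflow_version, python_version=None):
    """Build the constraints file URL for an Airflow and Python version pair."""
    if python_version is None:
        # Same idea as the cut pipeline above: keep only major.minor.
        python_version = "{}.{}".format(sys.version_info.major, sys.version_info.minor)
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        "constraints-{}/constraints-{}.txt".format(airflow_version, python_version)
    )

print(constraint_url("2.2.2"))
```

On the instance in this post (Python 3.7.10) this points at the constraints-3.7.txt file for Airflow 2.2.2.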

Now you can start Airflow magically with just one command as follows:

airflow standalone

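which is actually a combination of the commands below. You can also run them one by one to see what each of them does. Note that the webserver and the scheduler each keep running in the foreground, so start them in separate terminals.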

airflow db init
airflow users create \
--username admin \
--firstname Peter \
--lastname Parker \
--role Admin \
--email spiderman@superhero.org
airflow webserver --port 8080
airflow scheduler
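Once the webserver is up, a quick way to confirm things are alive is its /health endpoint. The sketch below assumes the webserver is reachable at http://localhost:8080 (the port used above) and that the response is a JSON object with one entry per component, each carrying a "status" field. Both `fetch_health` and `summarize_health` are made-up helper names.

```python
import json
from urllib.request import urlopen

def summarize_health(payload):
    """Map each component in the /health JSON to its reported status."""
    return {name: info.get("status") for name, info in payload.items()}

def fetch_health(base_url="http://localhost:8080"):
    """Fetch and decode the webserver's /health response (base_url is an assumption)."""
    with urlopen(base_url + "/health", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))

# With the webserver running, you could call:
# print(summarize_health(fetch_health()))
```

If the scheduler shows up as unhealthy, it is usually because only the webserver was started.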
