Exploring Crypto Prices with QuestDB and Pandas

A photograph of a laptop displaying candle charts of stock market data
Photo by Pierre Borthiry via Unsplash

This submission comes from one of our community contributors Yitaek Hwang.

In Part I of this series, we used Google Data Studio to quickly import multiple data sources and compare the price action of various cryptocurrencies over time. Although Google Data Studio provides an easy user interface and some nice graphing features, it is limited in terms of analyzing the dataset. In this post, we will use popular Python libraries (pandas, matplotlib, and seaborn) to explore the dataset further.

Prerequisites#

To follow along, you will need:

- Docker (to run QuestDB locally)
- Python 3 with pip, or an environment manager such as Anaconda
- The cryptocurrency CSV datasets used in Part I (e.g. coin_Solana.csv)

Importing Data#

NOTE: If you still have QuestDB running from Part I on Google Cloud, you can skip to the next step.

For simplicity, we will run QuestDB locally via Docker and import the datasets via the console UI. Download and run the latest version of QuestDB:

$ docker run -p 9000:9000 \
-p 9009:9009 \
-p 8812:8812 \
-p 9003:9003 \
questdb/questdb

Navigate to localhost:9000, click on the “Upload” icon on the left-hand panel and import the CSV files of interest. This example will use the Solana dataset, but any of the other coins in the dataset will work as well.

Import CSV section of QuestDB Web Console
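
If you would rather script the upload than use the web console, QuestDB also exposes an HTTP import endpoint on the same port. A minimal sketch with the requests library (assuming coin_Solana.csv sits in your working directory):

import requests

# POST the CSV to QuestDB's /imp endpoint; the new table is named after the file
with open('coin_Solana.csv', 'rb') as f:
    response = requests.post('http://localhost:9000/imp', files={'data': f})
print(response.text)  # plain-text summary of the rows imported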

Setting up Jupyter Notebook#

Install Jupyter Notebook via pip or with your favorite Python environment management tool, such as conda, mamba, or pipenv. Alternatively, you can download Anaconda Navigator for your OS.

$ pip install notebook

Now install the packages we will use to explore the dataset:

$ pip install numpy pandas matplotlib seaborn psycopg2

If you used Anaconda Navigator, go under Environments > Search Packages to install:

A screenshot of Anaconda Navigator

Now we’re ready to launch the notebook and start exploring the data:

$ jupyter notebook

Connecting to QuestDB#

First, we need to use the psycopg2 library to connect to QuestDB and import the data.

import pandas as pd
import numpy as np
import psycopg2 as pg

# connect to QuestDB over the PostgreSQL wire protocol (port 8812)
engine = pg.connect("dbname='qdb' user='admin' host='127.0.0.1' port='8812' password='quest'")

# QuestDB names imported tables after the uploaded file, hence 'coin_Solana.csv'
df = pd.read_sql('select * from coin_Solana.csv', con=engine)

(NOTE: if you are re-using QuestDB from Part I, change the host to the IP address of the load balancer.)

Now we can run some quick queries to make sure our import was successful. The head and tail functions are useful in this case for a quick sanity check:

df.head()
df.tail()
A screenshot showing head and tail functions in Jupyter Notebook

Alternatively, we can use the info and describe functions to get a sense of the data types and distributions:

df.info()
A screenshot showing an info function in Jupyter Notebook
df.describe()
A screenshot showing a describe function in Jupyter Notebook

For good measure, we can also check for null or na values to make sure we’re working with a clean dataset.

df.isnull().any()
df.isna().any()

These queries should return False for all of the columns. If you have missing values in your dataset, you can use the dropna function to remove the affected rows or columns, as in the sketch below.
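
As a minimal sketch (assuming you want to drop incomplete rows rather than columns):

# drop any row containing at least one missing value;
# pass axis=1 instead to drop columns with missing values
df = df.dropna(axis=0, how='any')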

Exploring the Data#

Now that we have our dataset in Jupyter, we can answer some simple questions. For example, we can find the five lowest prices of Solana by running the nsmallest function on the column we’re interested in (e.g. High, Low, Open, Close). Similarly, we can use the nlargest function for the opposite:

df.nsmallest(5, 'High')
# df.nlargest(10, 'Low')
A screenshot showing nlargest function in Jupyter Notebook

We can also find days when the open price was lower than the closing price by using the query function:

df.query('Open < Close').head()
A screenshot showing query function in Jupyter Notebook

To get a better sense of the trends, we can resample the dataset. For example, we can get the mean prices by week by using the resample function on the Date column:

df_weekly = df.resample("W", on="Date").mean()
df_weekly.head()
A screenshot showing resample function on date in Jupyter Notebook

Now our data is aggregated by weeks. Similarly, we can compute a rolling average over a set window via the rolling function. Here we use a seven-day window and index by Date so that we can slice the result by time later:

df_rolling_mean = df.set_index('Date').rolling(7).mean()
df_rolling_mean.tail()
A screenshot showing rolling function in Jupyter Notebook

Visualizing the Data#

To visualize our crypto dataset, we’ll use the seaborn library built on top of matplotlib. First import both libraries, and let’s plot a simple graph of Solana’s opening price over time:

import matplotlib.pyplot as plt
import seaborn as sns

sns.set(rc={'figure.figsize': (11, 4)})  # default figure size for the plots below
df = df.set_index('Date')  # index by timestamp for time-series plotting
df['Open'].plot(linewidth=0.5);
A screenshot showing price over time plot

Note that matplotlib chose some sane defaults for the x-axis. We can also specify our own labels as well as legends, as in the plot below comparing the High, Low, and Open prices of Solana over time:

cols_plot = ['High', 'Low', 'Open']
axes = df[cols_plot].plot(marker='.', alpha=0.5, linestyle='None', figsize=(11, 9), subplots=True)
for ax in axes:
    ax.set_ylabel('Price ($)')
A screenshot showing high, low and open price plots

From a quick glance, it is hard to tell the trend of any of these price series. Let’s dive into the time period when the Solana price starts to become more volatile (i.e. after Jan 2021). We can select a subset of the dataset using the loc function and apply it to our full dataset, our weekly mean resample, and our seven-day rolling mean.

After giving each of the datasets a different marker, we can plot all of them on the same graph to see when the opening price was above or below the weekly or rolling-window average:

fig, ax = plt.subplots()
ax.plot(df.loc['2021-01-01':, 'Open'], marker='.', linestyle='-', linewidth=0.5, label='Daily')
ax.plot(df_weekly.loc['2021-01-01':, 'Open'], marker='o', markersize=8, linestyle='-', label='Weekly Mean Resample')
ax.plot(df_rolling_mean.loc['2021-01-01':, 'Open'], marker='.', linestyle='-', label='7-d Rolling Mean')
ax.set_ylabel('Price ($)')
ax.legend();
A screenshot showing moving window average plot

We can zoom in further on the May to July timeframe to capture the volatility. Alternatively, we can apply the popular 200-day moving average metric to analyze price trends, or use the ewm function in pandas to calculate an exponentially weighted moving average and derive momentum indicators from it. While past performance is not indicative of future performance, these momentum indicators can be used to backtest and formulate new trading or price analysis strategies.
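
As a minimal sketch of those last two ideas (the 200-day window follows the metric’s name, while the span=20 smoothing parameter and the plotted date range are assumptions for illustration):

# 200-day simple moving average of the closing price
df['SMA_200'] = df['Close'].rolling(window=200).mean()

# exponentially weighted moving average; span=20 is an assumed smoothing parameter
df['EWMA_20'] = df['Close'].ewm(span=20, adjust=False).mean()

# zoom in on the volatile May-July 2021 window and compare the three series
df.loc['2021-05-01':'2021-07-31', ['Close', 'SMA_200', 'EWMA_20']].plot(figsize=(11, 4));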

Conclusion#

With QuestDB and pandas, it’s easy to visualize and calculate various price and momentum indicators, and if you are interested in more crypto analysis, make sure to check out the rest of this series. By polling live data from CoinGecko or CoinMarketCap, anyone can start analyzing crypto trends.
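
As a sketch of that polling idea, here is one way to fetch the current Solana price from CoinGecko’s public simple-price endpoint (the coin id and polling interval are assumptions; check the API docs for rate limits):

import time
import requests

URL = 'https://api.coingecko.com/api/v3/simple/price'

# poll the current Solana price in USD a few times
for _ in range(3):
    quote = requests.get(URL, params={'ids': 'solana', 'vs_currencies': 'usd'}).json()
    print(quote['solana']['usd'])
    time.sleep(60)  # stay well within the public API's rate limits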