
This tutorial shows how to use QuestDB with a Jupyter notebook. We will learn
how to create a database, query data and read table contents into a Pandas
dataframe, and use matplotlib
to create charts of our table data.
#
Getting startedTo get started, you'll need a few things installed and set up. This should be quick.
QuestDB: To download QuestDB, you can find complete instructions for getting started with Docker, Homebrew on macOS, or the binaries.
Jupyter Notebooks: These are interactive Python environments that will help you run a complete version of this tutorial. To run it, you should:
- make sure you are running Python 3.x and not Python 2.7. If you're in doubt,
python --version
will tell you. - install Jupyter Notebooks with
pip3 install --upgrade ipython jupyter
make sure that the libraries we use in this tutorial are also installed withpip3 install requests urlib matplotlib pandas
- clone this repository
(
git clone https://github.com/davidgs/QuestNotebook
) in the repository directory run Jupyter Notebook
That will get you right back to a page that looks eerily similar to this that is interactive, allowing you to run the code and interact with the database yourself.
If you get errors like ModuleNotFoundError: No module named 'requests'
for any
of the libraries you installed above, double-check to make sure that you are
actually using Python 3.x jupytper --path
will let you know if Jupyter is
using 2.7 or 3.x
#
Create a databaseWe will need someplace to store our data, so let's create a test database where we can put some random data. We will create a simple table with 5 columns, one of which is a timestamp.
The create operation in QuestDB appends records to the bottom of a table. If the table has a designated new record, timestamps must be greater than, or equal to the latest timestamp. Attempts to add a timestamp in middle of a table will result in a timestamp out of order error.
cust_id
is the customer identifier. It uniquely identifies a customer.balance_ccy
balance currency. We use char in this example, but we could useSYMBOL
to avoid storing text against each record to save space and increase database performance.balance
is the current balance for customer and currency tuple.inactive
is used to flag deleted records.timestamp
timestamp in microseconds of the record. Note that if you receive the timestamp data as a string, it could also be inserted usingto_timestamp
.
This should return a 200
status the first time you run it. If you run it more
than once, subsequent runs will return 400
because the database already
exists.
#
Generate some dataSince we have a new setup, we should add some data to QuestDB so that we can
have something to query. We will add some random data, for now. You can re-run
this section as many times as you want to add 100 entries at a time, or simply
change the range(100)
to add as many datapoints as you wish.
#
Query data from QuestDBNow that we have data available, let's try querying some of it to see what we get back!
#
Read the content into pandas dataframeSo you'll notice that the returned data is just a massive CSV string. If you'd
rather have JSON data, then you would change the endpoint to
http://localhost:9000/exec
, but since we're going to use Pandas to frame our
data, we'll stick with CSV.
We are also telling pandas to parse the timestamp
field as a date. This is
important since we're dealing with time series data.
#
Narrow the searchThat's just getting us all the data, but let's narrow the search using some SQL
clauses. Let's look for a specific cust_id
and only balances of that customer
that are in USD.
We are also only interested in times the customer was active. Since this is SQL,
you can make this query as simple, or as complex, as you'd like. All of the data
was generated randomly, so this exact query may return no results, you may have
to adjust the cust_id
below until you get results back.
Note: The query string must be URL-encoded before it is sent.
#
Plot the dataWe will use matplotlib to plot the data
From that query we should get a nice little plot of our data, like this:
#
Clean upNow we will clean everything up for the next time.
You can now stop your QuestDB instance, if you'd like, or leave it running and find some great uses for it! If you enjoyed this tutorial, please follow us on Twitter, and make sure you give our GitHub repository a star!