Ingestion Primer

QuestDB makes high-performance "data-in" easy.

Choose from existing third party tools, first-party protocols, or both.

Whether you're working with:

  • financial market data
  • sensors
  • analytics
  • event-driven apps

And whether you're applying third party tools, first-party protocols, or both, this guide will prepare you to get the most out of (and into!) QuestDB.

Protocol ingest methods#

QuestDB's core ingestion methods are based on popular protocols.

You may interact with them directly, or through third party tools.

Network Endpoint            Default Port   Inserting & modifying data
InfluxDB Line Protocol      9000           High performance streaming (Recommended!)
PostgreSQL Wire Protocol    8812           SQL INSERT, UPDATE
HTTP REST API               9000           SQL INSERT, UPDATE, CSV
Web Console                 9000           SQL INSERT, UPDATE, CSV

All protocols benefit from the core features and benefits of QuestDB.

Deduplication, out-of-order indexing, and performance with wide tables and high-cardinality data are always present.

And no matter which method you choose, querying (data-out) is handled via extended SQL.

If you're unsure which method is right for you, consider joining the Slack community to speak to other developers.

We'll introduce the protocols one-by-one, and link out to deeper reference materials.

InfluxDB Line Protocol (ILP)#

Recommended!

InfluxDB Line Protocol (ILP) is the recommended ingestion method for high performance applications. ILP is an insert-only protocol that bypasses SQL INSERT statements, thus achieving significantly higher throughput. It is the fastest way to insert data, and it excels with high volume data streaming. The QuestDB clients leverage ILP by default, and many of the third party tools and integrations utilize it too.

An example of "data-in" via ILP looks like this:

# temperature sensor example
readings,city=London temperature=23.2 1465839830100400000\n
readings,city=London temperature=23.6 1465839830100700000\n
readings,make=Honeywell temperature=23.2,humidity=0.443 1465839830100800000\n

As the example shows, ILP is a text protocol over HTTP (or TCP) and is easy to use. No upfront schema is required; tables are created automatically if they do not already exist. The protocol thrives in situations where multiple streams ingest into a single instance, and it supports on-the-fly, concurrent schema changes. For health management, there is error handling and a health check endpoint.
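
To make this concrete, here is a minimal sketch using the QuestDB Python client, which speaks ILP over HTTP by default. It assumes a local instance on port 9000 and the questdb package installed; the readings table mirrors the example above.

from questdb.ingress import Sender, TimestampNanos

# ILP over HTTP against a local QuestDB instance (default port 9000)
conf = 'http::addr=localhost:9000;'
with Sender.from_conf(conf) as sender:
    # The 'readings' table is created automatically if it does not already exist
    sender.row(
        'readings',
        symbols={'city': 'London'},
        columns={'temperature': 23.2},
        at=TimestampNanos.now(),
    )
    # Rows are buffered and flushed automatically; flush() forces it
    sender.flush()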

If you'd like to apply the InfluxDB Line Protocol, see the InfluxDB Line Protocol documentation.

PostgreSQL Wire Protocol (PGWire)#

PostgreSQL Wire Protocol (PGWire) provides ingestion interoperability with the PostgreSQL ecosystem. QuestDB supports most PostgreSQL keywords and functions, including parameterized queries and psql on the command line. Altogether, PGWire support for ingestion (data-in) and PostgreSQL-style querying (data-out) lets QuestDB connect with a wide variety of third party PostgreSQL client libraries and tools.

import psycopg as pg

# Connect to an existing QuestDB instance using the with statement
conn_str = 'user=admin password=quest host=127.0.0.1 port=8812 dbname=qdb'
with pg.connect(conn_str, autocommit=True) as connection:
    # The connection is now ready for INSERT statements and queries
    ...

By default, PGWire listens on TCP port 8812 and accepts SQL INSERT and COPY statements. In contrast to streaming cases - for which we recommend the InfluxDB Line Protocol - PGWire is better suited for applications that INSERT via SQL programmatically. PGWire also supports parameterized queries, which help avoid tricky SQL injection issues.
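
As a brief sketch of that programmatic path, the snippet below batches parameterized inserts with psycopg; the readings table and its columns are hypothetical.

import psycopg as pg

conn_str = 'user=admin password=quest host=127.0.0.1 port=8812 dbname=qdb'
with pg.connect(conn_str, autocommit=True) as connection:
    with connection.cursor() as cur:
        # Parameterized batch INSERT into a hypothetical "readings" table;
        # placeholders keep user input out of the SQL string itself
        cur.executemany(
            'INSERT INTO readings (city, temperature) VALUES (%s, %s)',
            [('London', 23.2), ('London', 23.6)],
        )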

If PostgreSQL is the answer for your team, see:

REST HTTP API#

The HTTP REST API provides endpoints for importing data, exporting data, and querying. It is compatible with a wide range of libraries and tools, and it is what powers the QuestDB Web Console.

curl -F data=@data.csv http://localhost:9000/imp
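
Querying works over the same API. As a small sketch, assuming a local instance and a hypothetical readings table, the /exec endpoint runs SQL and returns JSON:

import requests

# Run SQL via the REST /exec endpoint (default port 9000)
resp = requests.get(
    'http://localhost:9000/exec',
    params={'query': 'SELECT * FROM readings LIMIT 3'},  # hypothetical table
)
resp.raise_for_status()
print(resp.json())  # column metadata plus a "dataset" array of rows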

To continue with the REST HTTP API, check out the REST API documentation.

Easy CSV upload#

For GUI-driven CSV upload which leverages the REST HTTP API, use the Import tab in the Web Console:

[Screenshot of the Web Console import UI]

For all CSV import methods, including using the APIs directly, see the CSV Import Guide.
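
If you'd rather script the upload than use the GUI, a minimal sketch with Python's requests library mirrors the earlier curl call (assuming a local instance and a data.csv file):

import requests

# Upload a CSV via the REST /imp endpoint, equivalent to the curl example above
with open('data.csv', 'rb') as f:
    resp = requests.post('http://localhost:9000/imp', files={'data': f})
resp.raise_for_status()
print(resp.text)  # plain-text summary of the import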

QuestDB and toolchains#

QuestDB is an essential part of high performance data architecture. As such, it provides interoperability with other tools and services. Depending on your needs, QuestDB may help process, ingest, organize, accelerate or store your data. The use cases are many!

[Architecture diagram: QuestDB + your favourite tools]

For ingest specifically, it is common for QuestDB to sit on the receiving end of a service such as Apache Kafka, a fault-tolerant message broker that excels at streaming. Given Kafka's popularity, its ecosystem tooling is also used by Kafka-compatible services such as Redpanda, much as tools built for the InfluxDB Line Protocol work with QuestDB. There are several strategies for ingesting data from Kafka into QuestDB:

  1. Apply the Kafka Connect based QuestDB Kafka connector
  2. Apply our Kafka Connect based generic JDBC Connector
  3. Write a custom program to read data from Apache Kafka and write to QuestDB
  4. Use a stream processing engine

Each strategy has different trade-offs.

The rest of this section discusses each strategy in turn; it assumes some familiarity with the Kafka ecosystem.

Apache Kafka#

QuestDB connector#

Recommended for most people!

QuestDB develops a first-party QuestDB Kafka connector. The connector is built on top of the Kafka Connect framework and uses the InfluxDB Line Protocol for communication with QuestDB. Kafka Connect handles concerns such as fault tolerance and serialization. It also provides facilities for message transformations, filtering and so on.

The underlying InfluxDB Line Protocol ensures operational simplicity and excellent performance: the connector can comfortably insert hundreds of thousands of rows per second. Building on Kafka Connect also allows QuestDB to work with Kafka-compatible applications like Redpanda.

Read our QuestDB Kafka connector guide to get started, with either self-hosted or QuestDB Cloud instances.

JDBC connector#

Similar to the QuestDB Kafka connector, the JDBC connector also uses the Kafka Connect framework. However, instead of a dedicated InfluxDB Line Protocol stream, it relies on a generic JDBC driver and QuestDB's PGWire compatibility. And just as the QuestDB Kafka connector extends to Kafka-compatible services like Redpanda, the JDBC route extends to other JDBC-capable tools such as Apache Spark.

The JDBC connector requires objects in Kafka to have an associated schema, and overall it is more complex to set up and run. Compared to the QuestDB Kafka connector it has significantly lower performance, but it offers the following advantages:

  • Higher consistency guarantees than the fire-and-forget QuestDB Kafka connector
  • Various Kafka-as-a-Service providers often have the JDBC connector pre-packaged

Recommended if the QuestDB Kafka connector cannot be used.

Customized program#

Writing a dedicated program that reads from Kafka topics and writes to QuestDB tables offers great flexibility. The program can do arbitrary data transformations and filtering, including stateful operations.

On the other hand, it's the most complex strategy to implement: you'll have to deal with different serialization formats, handle failures, and so on. This strategy is recommended for very advanced use cases only; a sketch follows below.
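
For a sense of the moving parts, here is a hedged sketch of such a bridge. It assumes the confluent-kafka and questdb Python packages, a local Kafka broker and QuestDB instance on their default ports, and a hypothetical "readings" topic carrying JSON messages:

import json
from confluent_kafka import Consumer
from questdb.ingress import Sender, TimestampNanos

# Consume JSON messages from a hypothetical "readings" topic and
# forward them to QuestDB via ILP over HTTP
consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'questdb-bridge',
    'auto.offset.reset': 'earliest',
})
consumer.subscribe(['readings'])

with Sender.from_conf('http::addr=localhost:9000;') as sender:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue  # no message yet, or a transient consumer error
        payload = json.loads(msg.value())  # e.g. {"city": "London", "temperature": 23.2}
        sender.row(
            'readings',
            symbols={'city': payload['city']},
            columns={'temperature': payload['temperature']},
            at=TimestampNanos.now(),
        )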

Not recommended for most people.

Stream processing#

Stream processing engines provide a middle ground between writing a dedicated program and using one of the connectors. Engines such as Apache Flink provide rich APIs for data transformations, enrichment, and filtering; at the same time, they handle shared concerns such as fault tolerance and serialization. However, they often have a non-trivial learning curve.

QuestDB offers a connector for Apache Flink. It is the recommended strategy if you are an existing Flink user, and you need to do complex transformations while inserting entries from Kafka into QuestDB.

Other third party tools#

For a full list of third party tools supported by QuestDB, see third party tools.

Next step - queries#

By now, it should be apparent which ingestion method best suits your infrastructure.

Of course, ingestion (data-in) is only half the battle.

Your next best step? Learn how to query and explore data-out from the Query & SQL Overview.

It might also be a solid bet to review timestamp basics.

We also assumed that you have data already.

No data yet? Just starting? No worries. We've got you covered!

There are several quick scaffolding options:

  1. QuestDB demo instance: Hosted, fully loaded and ready to go. Quickly explore the Web Console and SQL syntax.
  2. Create my first data set guide: Create tables, use rnd_ functions and make your own data.
  3. Sample dataset repos: IoT, e-commerce, finance or git logs? Check them out!
  4. Quick start repos: Code-based quick starts that cover ingestion, querying and data visualization using common programming languages and use cases. Also, a cat in a tracksuit.
  5. Time series streaming analytics template: A handy template for near real-time analytics using open source technologies.

There are also one-click sample data sets available in QuestDB Cloud.


โญ Something missing? Page not helpful? Please suggest an edit on GitHub.