The technicalities



What you need to know before deploying Open Source LLM

Navigating the complexities of deploying open-source Large Language Models (LLMs) can be daunting. From understanding licensing restrictions and making crucial decisions about accuracy, speed, and cost trade-offs, to comprehending benchmark evaluations and exploring deployment strategies, this guide provides essential insights for leveraging open-source LLMs effectively in your projects.

The technicalities 3 min read

RDD in Apache Spark

Learn how to use the RDD API in Apache Spark to check partition details or perform low-level operations. Although largely superseded by the Dataset and DataFrame APIs, the RDD API remains accessible via the .rdd method on Datasets and DataFrames. Discover how to check the number of partitions with the getNumPartitions method and determine partition sizes using the glom method. Explore the remaining useful operations that the RDD API offers for low-level hacking and internal Spark tasks.

The technicalities 5 min read

Datasets and DataFrames

Understanding Spark's .as[T] Method: Best Practices and Defensive Programming

The technicalities 3 min read

Monitoring Airflow jobs with TIG 2: data quality metrics

In the first article on Monitoring Airflow jobs with TIG, “System Metrics”, we saw an example of an Airflow installation with a TIG stack set…

Apache Airflow min read

Monitoring Airflow jobs with TIG 1: system metrics

Like many server applications, Airflow can – and should – be monitored for metrics and logs. In this article, we will look into the former…

Apache Airflow 5 min read

Databricks – Photon

The Databricks platform offers its clients two execution engines: the standard Apache Spark (available as an open-source application) and one with Photon enhancement that…

Apache Airflow 6 min read

Airflow — pools and mutexes.

Although the ideal data pipeline is made of idempotent and independent tasks, there are some cases when setting up a mutex (a.k.a. part of the…

Apache Airflow 7 min read

Passing information between DAGs in Airflow.

There are data pipelines where you must pass some values between tasks: not complete datasets, but roughly kilobytes of data. This can be managed even within…

LLM min read

NeMo-Guardrails

Building a dedicated chatbot is both challenging and dangerous. At company X, the model should talk about X’s offer and, ideally, nothing else to save…