Apache Spark and Apache Iceberg
I’m building a personal data lake, just for fun, using Apache Iceberg and Apache Spark. I’m writing a technical post that explains the process and shows how to run a small data lake on your own computer.
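As a rough idea of what such a local setup can look like, here is a minimal sketch in Python. It is not the configuration from the post itself. The catalog name `local`, the warehouse path, the Iceberg runtime version, and the helper names `iceberg_local_conf` and `build_session` are all assumptions made for illustration. The configuration keys follow the pattern described in the Iceberg Spark documentation for a file-based (Hadoop) catalog.

```python
# Hypothetical helper: collects the Spark settings for a local Iceberg catalog.
def iceberg_local_conf(warehouse_dir, catalog="local", iceberg_version="1.5.0"):
    """Return Spark config entries for a file-based (Hadoop) Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Assumed runtime: Spark 3.5 / Scala 2.12. Adjust to match your Spark build.
        "spark.jars.packages": (
            f"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{iceberg_version}"
        ),
        # Enables Iceberg's SQL extensions in Spark.
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Registers a catalog that stores tables under a local directory.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse_dir,
    }


def build_session(conf):
    """Create a local SparkSession from the config (requires pyspark)."""
    # Imported lazily so the helper above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[*]").appName("personal-datalake")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Assumed warehouse location; any writable local directory should work.
conf = iceberg_local_conf("/tmp/iceberg-warehouse")
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

With pyspark installed, `build_session(conf)` should start a local session, and tables can then be addressed as `local.<database>.<table>` in Spark SQL. Keeping the settings in a plain dictionary makes them easy to inspect or reuse with `spark-submit --conf`.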
After using ZSH for a couple of years, I decided to switch back to the Fish shell. Fish works out of the box, batteries included, and the basic installation provides all the features I expect from a shell environment. I have been using Unix-based systems for over 20 years, including SCO Unix, Solaris, and BSDi. I have been using Linux since 1996; Slackware was my first distribution, and now I’m a happy Fedora user....
It was fun to attend FOSDEM for the first time. The event is huge: multiple talks run simultaneously, and approximately 8000 people from all over the globe attended. It’s a chance to meet new people, learn about new technology, get stickers (I’m trying to find more space on my laptop for them) and help out at the event. It was my first FOSDEM and my first time volunteering, as a herald for the lightning talks. What a great way to get immersed in the event!...
Fedora is my primary Linux distribution, as it offers a good balance between cutting-edge packages and stability. New versions are released every six months, so you can expect recent versions of the main packages in a stable operating system. Being a data professional, I enjoy trying new software and staying abreast of the newest industry innovations....
Here’s how I set up ZSH on Fedora Linux.

Install the basic packages:
    dnf install zsh zsh-autosuggestions zsh-syntax-highlighting util-linux-user

Install Oh My ZSH. Follow the instructions on the Oh My ZSH website:
    sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

Install the Powerlevel10k theme. Install the MesloLGS font family first to get the best results.

Install the fonts:
    MesloLGS NF Regular.ttf
    MesloLGS NF Bold.ttf
    MesloLGS NF Italic.ttf
    MesloLGS NF Bold Italic.ttf

Clone the repository:
    git clone --depth=1 https://github....
Python is my main programming language at work; together with SQL, it’s the lingua franca for data engineers. I have always been curious to learn a functional programming language, and Scala was a natural choice. For data engineers, Python, Scala and Java are the most common programming languages. These three dominate the main tools, components and frameworks used by data professionals, for example: Hadoop (Java), Pandas (Python), Airflow (Python), Kafka (Scala and Java), Spark (Scala), Pulsar (Java), (…). It’s a long list; if you are still not convinced, take a look at the Apache Projects Directory for Big Data....
I’m starting this new space to write articles on data engineering and on other topics like privacy, productivity tools and personal growth. Why a new blog? Before I had my own blog platform, I wrote a few posts on Medium, like this one: Data Engineer and Infrastructure as a code. I am very interested in sharing knowledge freely, without paywalls. This is one of the reasons why I decided to stick with my own platform instead of using Medium....