Fryboyter

Create PDF files with a searchable text layer

Surely everyone has received a PDF file whose text cannot be selected and copied. This is usually because the scanned text has been embedded in the PDF file as an image. The multifunction printer I use at work, for example, does exactly this. Fortunately, the problem can be solved relatively easily and reliably.

The tool OCRmyPDF adds an additional text layer on top of the image in the PDF file, and the content of this layer can be selected, copied and searched. Tesseract (https://github.com/tesseract-ocr/tesseract) is used for the text recognition.

In the simplest case, you just run ocrmypdf input.pdf output.pdf, where input.pdf is the original file and output.pdf is the new file with the additional text layer. Further functions and possible optimisation options are described in the documentation. Depending on the language of the original file's content, you may first have to install a language package such as tesseract-data-eng (English language package). If it is missing, OCRmyPDF displays a corresponding message and terminates.
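For a whole folder of scans, the call can be wrapped in a small loop. The following is a minimal sketch, assuming bash and an installed OCRmyPDF. The function names, the _ocr.pdf naming scheme and the DRY_RUN switch are my own illustrative conventions, not OCRmyPDF features; -l is OCRmyPDF's option for the Tesseract language (e.g. eng or deu), whose language package must be installed.

```shell
# Sketch: run OCRmyPDF on every PDF in a directory (not recursive) and write
# the result as NAME_ocr.pdf next to the original. The "_ocr" suffix and the
# DRY_RUN switch are hypothetical conventions for this example.

# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# ocr_dir DIR [LANG]: LANG is a Tesseract language code (default: eng).
ocr_dir() {
    local dir=${1:-.} lang=${2:-eng} pdf
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue                  # no PDFs: the glob stays unexpanded
        case $pdf in *_ocr.pdf) continue ;; esac   # skip results of earlier runs
        run ocrmypdf -l "$lang" "$pdf" "${pdf%.pdf}_ocr.pdf"
    done
}
```

For example, DRY_RUN=1 ocr_dir ~/Scans deu only lists the commands, and ocr_dir ~/Scans deu then runs them. By default, OCRmyPDF aborts if a page already contains text; its --skip-text option skips such pages instead.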

I tested OCRmyPDF with a few simple PDF files containing easily readable text. With these, the additional text layer was positioned so precisely over the image that the text of both layers overlapped almost exactly, and selecting and copying the text was no problem. Searching the file also worked. Of course, the new file needs more storage space, but in my opinion the increase is within limits. For example, the original PDF file, a DIN A4 page containing the first lines of “The Raven” by Edgar Allan Poe, is 49.4 KB in size. The PDF file with the added text layer is 49.8 KB.
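Put as a percentage, the example above grows by (49.8 - 49.4) / 49.4, i.e. about 0.8 %. To check this for your own files, a small helper like the following can be used; it is a sketch, and the function name size_growth is hypothetical.

```shell
# Sketch: compare the size of the original PDF with the OCR'd version.
# size_growth ORIGINAL NEW prints both sizes in bytes and the change in percent.
size_growth() {
    local old new
    old=$(wc -c < "$1")
    new=$(wc -c < "$2")
    [ "$old" -gt 0 ] || return 1   # avoid dividing by zero for empty files
    awk -v o="$old" -v n="$new" \
        'BEGIN { printf "%d -> %d bytes (%+.2f%%)\n", o, n, (n - o) * 100 / o }'
}
```

For example, size_growth input.pdf output.pdf.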

General

Insert content at the end of the front matter area

When I write an article for fryboyter.de, I create a Markdown file with a front matter area at the beginning. For example, it looks like this:

---
title: Insert content at the end of the front matter area
date: 2022-11-02T20:21:11+0100
categories:
- General
tags:
- Front Matter
- Markdown
- Sed
slug: insert-content-at-the-end-of-the-front-matter-area
---

In some cases, you may want to extend this area later, for example with nositemap: true so that the article does not appear in the sitemap of the website. For a single article, this is quickly done by editing the file manually. But what if not one but several files are affected? Then it is better to automate the change. However, the area in question does not always consist of the same number of lines, for example because I sometimes use more tags and sometimes fewer. In the end, I arrived at the following solution.

find . -type f -name "*.md" -print0 | while IFS= read -r -d '' file
do
lines=$( sed -n '/^---$/=' "$file" | sed -n 2p )
[ -n "$lines" ] && sed -i -e "$lines i nositemap: true" "$file"
done

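The first line searches the current directory and all of its subdirectories for files with the extension .md. The third line determines where the front matter area ends: sed -n '/^---$/=' prints the line numbers of all lines that consist only of ---, and sed -n 2p picks the second of them, i.e. the closing delimiter. The fourth line inserts nositemap: true before that line, i.e. at the end of the front matter area, so that it looks like this: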

---
title: Insert content at the end of the front matter area
date: 2022-11-02T20:21:11+0100
categories:
- General
tags:
- Front Matter
- Markdown
- Sed
slug: insert-content-at-the-end-of-the-front-matter-area
nositemap: true
---

If you want to test the script without changing the files, remove the -i parameter in line 4. sed then prints the modified content of each file to the terminal instead of writing it back to the file.
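Printing every complete file quickly becomes confusing with many articles, and running the loop twice inserts the line twice. A somewhat more defensive variant could look like the following sketch. It assumes bash and GNU sed (for -i and the one-line i command); the function names are hypothetical.

```shell
# Print the line number of the "---" that closes the front matter, or nothing.
front_matter_end() {
    # Only files whose very first line is "---" have a front matter area.
    [ "$(head -n 1 "$1")" = "---" ] || return 0
    sed -n '/^---$/=' "$1" | sed -n 2p
}

# Show the planned change as a unified diff without touching the file.
preview_nositemap() {
    local end
    end=$(front_matter_end "$1")
    [ -n "$end" ] || return 0
    sed "${end}i nositemap: true" "$1" | diff -u "$1" - || true
}

# Insert "nositemap: true" only if the file has a front matter area and the
# key is not already in it, so running the script twice is harmless.
add_nositemap() {
    local end
    end=$(front_matter_end "$1")
    [ -n "$end" ] || return 0
    if sed -n "1,${end}p" "$1" | grep -q '^nositemap:'; then
        return 0
    fi
    sed -i "${end}i nositemap: true" "$1"
}

# Apply FUNC to every .md file below DIR; file names with spaces are fine.
for_each_markdown() {
    local func=$1 dir=${2:-.} file
    find "$dir" -type f -name '*.md' -print0 |
    while IFS= read -r -d '' file; do
        "$func" "$file"
    done
}
```

for_each_markdown preview_nositemap . shows only the lines that would change, and for_each_markdown add_nositemap . then applies them.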

General

Clone all repositories of a user on GitHub

Sometimes you want to clone all of a user’s repositories on GitHub to your own computer. Depending on the number of repositories, downloading each one manually with “git clone” can be quite time-consuming. The following script automates this. In this example, the script is saved as gitdownload.sh.

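#!/bin/bash

if [ $# -eq 0 ]; then
    echo "Usage: $0 <user_name>"
    exit 1
fi

user=$1

# git_url uses the unencrypted git:// protocol, which GitHub no longer supports,
# so the HTTPS clone_url is used instead. The API caps per_page at 100.
for repo in $(curl -s "https://api.github.com/users/$user/repos?per_page=100" | grep '"clone_url"' | awk '{print $2}' | sed 's/"\(.*\)",/\1/'); do
    git clone "$repo"
done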

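To download the repositories, make the script executable (chmod +x gitdownload.sh) and run ./gitdownload.sh username, replacing username with the name of the user on GitHub. The repositories are cloned into the current directory. Note that the GitHub API returns at most 100 repositories per request, so for users with more repositories only the first 100 are cloned.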
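For users with more than 100 repositories, the requests have to be repeated page by page. The following sketch uses the page parameter of the GitHub REST API and stops at the first empty page. It assumes, like the script above, that the API returns pretty-printed JSON with one key per line; the function names are hypothetical. Without authentication, GitHub allows only a limited number of API requests per hour.

```shell
# Read the JSON of /users/USER/repos from stdin and print the HTTPS clone
# URL of every repository, one per line.
extract_clone_urls() {
    grep '"clone_url"' | sed 's/.*"clone_url": *"\([^"]*\)".*/\1/'
}

# Clone all public repositories of a user, following the pagination.
clone_all_repos() {
    local user=$1 page=1 urls url
    while :; do
        urls=$(curl -fsS "https://api.github.com/users/${user}/repos?per_page=100&page=${page}" | extract_clone_urls)
        # An empty page (or a failed request) ends the loop.
        [ -n "$urls" ] || break
        while IFS= read -r url; do
            git clone "$url"
        done <<< "$urls"
        page=$((page + 1))
    done
}
```

After sourcing the functions, clone_all_repos username clones all pages into the current directory.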

General

Find an active fork on GitHub

From time to time I use sshfs to mount a remote directory locally via SSH. However, development of the project stopped a few months ago. Today I wanted to find out whether there is an actively developed fork of sshfs.

Unfortunately, this is not so easy, at least not directly on github.com. At https://github.com/libfuse/sshfs/network/members you can view the forks, but they are sorted alphabetically. Since there are currently 403 forks of sshfs on GitHub, finding the most active one could take a while. Alternatively, you can use https://techgaun.github.io/active-forks/index.html. There you only have to enter the address of the project (e.g. https://github.com/libfuse/sshfs) to get a list of forks sorted by activity. The “Add Condition” button lets you adjust the search query to your needs.
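If you prefer the command line, a rough substitute is to ask the GitHub REST API for the forks (endpoint /repos/OWNER/REPO/forks) and sort them by the time of their last push. This is only a sketch: the time of the last push is merely a proxy for activity, the active-forks page may weigh things differently, and the parser assumes pretty-printed JSON with one key per line. The function names are hypothetical.

```shell
# Fetch all pages of forks (at most 100 per page) and print the raw JSON.
fetch_all_forks() {
    local repo=$1 page=1 body
    while :; do
        body=$(curl -fsS "https://api.github.com/repos/${repo}/forks?per_page=100&page=${page}") || return 1
        # An empty page ("[]") means there are no more forks.
        [[ $body == *'"full_name"'* ]] || break
        printf '%s\n' "$body"
        page=$((page + 1))
    done
}

# Read the JSON from stdin and print "pushed_at full_name" for each fork,
# most recent push first.
forks_by_activity() {
    awk -F'"' '
        $2 == "full_name" { name = $4 }
        $2 == "pushed_at" { print $4, name }
    ' | sort -r
}
```

For example, fetch_all_forks libfuse/sshfs | forks_by_activity | head -n 10 lists the ten forks with the most recent pushes. With 403 forks, this takes five API requests.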

Of course, it would be better if the list could be sorted directly on GitHub. However, Codeberg.org, which is based on Gitea, is no better in this respect and also only offers an alphabetically sorted list.

General

Version 0.13.0 of the Isso comment system released

Isso is a Python-based commenting system that you can host yourself, so that users’ comments are not stored on third-party servers.

For a long time, Isso had two problems. First, development was rather slow at times. Second, and more seriously, only one person could publish new versions, and this person had, and probably still has, a lot to do in real life.

This has changed since version 0.12.6: new developers are now involved, and they have the rights needed to publish new versions. As a result, version 0.13.0 was released a little less than 24 hours ago. The changes are quite extensive and are listed at https://isso-comments.de/news/#isso-version-0-13-0-released.

In addition, Isso can now be found at https://isso-comments.de and is to be continued as a community project.

I have already updated the Isso instance I use on fryboyter.de to version 0.13.0. The only thing I had to do afterwards was update the CSS file, because the Isso developers had fundamentally overhauled it.

General