Gabor Szabo

Posted on Mar 14, 2023 • Originally published at code-maven.com

One-liner: Remove first two characters of every line in thousands of files

#perl #programming #productivity #beginners

I have a project creating a Ladino dictionary, in which I have a few thousand YAML files. They used to include lists of values, but a while ago I split them up into individual entries. I did this because the people who edit them are not used to YAML files, and it makes it a lot easier to explain to them what to do.

However, the previous change left me with 1-item lists in each file. I wanted to clean that up.

Example files

Here are a few example files, reduced in size for this demo.

- ladino: kaza

- ladino: komer
  inglez: to eat

- ladino: biervo
  inglez: word
# some comment

As you can see each one has an entry for a Ladino expression. Some of the files have translations to English. Other files in the real data-set had further translations to Hebrew, Turkish, French, Portuguese, and Spanish.

Some files had comments.

The dash on the first row and the indentation are left over from the time when each file held more than one of these entries.

So I wanted to get rid of the first two columns of every line, except for lines that start with a hash mark (#).

Here is the Perl one-liner to do so.

perl -p -i -e 's/^[^#].//' *.yaml

The '*.yaml' at the end is a shell glob that expands to all the YAML files in the current directory, which are passed as the parameters of this command.
The -p tells perl to read the content of each file line-by-line and print it.
The -i tells perl to replace the original files with the content that was printed.
The -e tells perl that the following string is a perl program and not the name of the file where the perl program is.
The perl program 's/^[^#].//' will be executed on every line read from the files.
The 's///' is regex substitution. It works on the current line and changes the current line. So the lines that are saved back to the files are the modified lines.
Between the 1st and 2nd slash is the regex.
The first ^ means the match must start at the beginning of the line.
The [^#] means that there must be a character that is not #. This will match any character in the first position of the line except #.
The . means match any character (except a newline).
The string that is between the 2nd and 3rd slash is the replacement. It is an empty string so if there is a match it will be replaced by the empty string.

That's the whole thing.

Improvement

Now that I am explaining it, it occurred to me that this would be a safer solution:

perl -p -i -e 's/^[- ] //' *.yaml

Here the substitution is s/^[- ] // which means the first character must be either a dash or a space, the second character must be a space, and those two characters are removed.
So if a line starts with anything else, it will not be changed. This is safer because it is more specific about what we would like to match for replacement.

Results

For this article I saved the resulting files in a separate place:

ladino: kaza

ladino: komer
inglez: to eat

ladino: biervo
inglez: word
# some comment

Top comments (5)

Masashi • Mar 15 '23

Perl seems amazing. I don't think that it is that popular nowadays, but if I want to learn it where can I? I learnt Perl regex but maybe there will be some good "Top to bottom" Perl guide. I'm really interested in Perl after a few of your posts.

Gabor Szabo • Mar 15 '23 • Edited

I am not sure what you mean by "top to bottom", but I can point you to the Perl Tutorial I wrote.

Masashi • Mar 15 '23

Thanks :).

𒎏Wii 🏳️‍⚧️ • Mar 15 '23

Nice one, but you can do this with sed instead of perl too and save a bunch of characters 😀

sed -i -e 's/^[ -] //' *.yaml

will do the trick just fine, and if you really have a lot of work to do:

find . -name '*.yaml' | xargs -P $(nproc) sed -i -e 's/^[- ] //'

will run it in parallel on as many threads as you have CPU cores 😁

Randal L. Schwartz • Mar 28 '23

You kids and your fancy "sed -i". Back in the day, sed didn't have that, but Perl had -i long before! The sed folks wisened up and took that idea as their own!
