Age of AI
Given the rise of ChatGPT and other Gen AI tools, I have to reinvent my posts to keep them relevant, so a short introduction is in order. From now on my posts will be less technical and more functional. ChatGPT (from now on I will use the pronoun "it") granted me permission to use our conversation in this blog and to reference ChatGPT as "it".
Introduction
Let's consider a common problem: we wish to optimize the storage of media files.
Now, let's imagine we have made a dump of photos and videos grouped by year, in different formats, and that we have a magic tool that can convert them to the format of our choice.
However, we first need to do some preparation work: group the files by type instead of by year. How do we achieve this using ChatGPT?
Do the work
Here is our dump of media:
Let's get some input from ChatGPT:
windows using git bash and linux commands, I wish to rename the folders in the current dir having name 'Photos from ' simply to year
for dir in Photos\ from\ *; do
  if [[ -d "$dir" ]]; then
    year="${dir##* }"  # Extract the year from the folder name
    mv "$dir" "$year"  # Rename the folder
  fi
done
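The expansion ${dir##* } strips everything up to the last space, leaving just the year. If you want to be careful, here is a dry-run variant (my addition, not part of ChatGPT's answer) that only prints the planned renames:

# Dry run: print the planned renames without touching anything
for dir in Photos\ from\ *; do
  [[ -d "$dir" ]] && echo "mv '$dir' -> '${dir##* }'"
done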
Result:
Great, let's move on!
Metadata
The very first thing you should consider is fixing the metadata, which in the narrow sense means that each file should contain the Date taken attribute for image files and Media created for video files. In the best case the image name should contain:
- a formatted timestamp in the form date_time, like 20240101_000000, or even longer
- an origin discriminator, anything that would help you to classify the file; when compressing, this could be the original extension
- some counter with as many digits as necessary to avoid collisions during any compression job
Some modern phones do create a speaking file id, some do not. In any case, for old files it may happen that the data is present only in an adjacent json file, whose name usually corresponds to the following pattern: <filename_with_extension>{<.suffix_of_type_metadata>}.json. Hence, before any processing, it is wise to go through all the images and all the videos and update the time attributes, for example with exiftool for images and ffmpeg for videos. It is also a good idea to copy each processed file to a new directory under a name prefixed with the formatted date, and to move the files that cannot be dated to another folder to fix them up manually.
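For illustration, here is what such a sidecar might look like and how to query it with jq (the file and field names here are hypothetical, modeled on a Google Takeout-style dump; check the exact names in your own data):

# Hypothetical sidecar: IMG_0001.JPG.supplemental-metadata.json
# {
#   "creationTime": { "timestamp": "1704067200" },
#   "description": "New Year fireworks"
# }
jq -r '.creationTime.timestamp' "IMG_0001.JPG.supplemental-metadata.json"
# -> 1704067200, a Unix timestamp convertible with: date -d @1704067200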
Feel free to use the script:
#!/bin/bash

# Hardcoded input and output directories
INPUT_DIR="./input"                              # Input directory set to ./input
OUTPUT_DIR_JPG="${INPUT_DIR}/jpg"                # Directory for modified JPGs
OUTPUT_DIR_JPG_TO_FIX="${INPUT_DIR}/jpg_to_fix"  # Directory for JPGs needing manual fixing

# Create the output directories if they don't exist
mkdir -p "$OUTPUT_DIR_JPG"
mkdir -p "$OUTPUT_DIR_JPG_TO_FIX"

# Iterate over all JPG files in the input directory and its subdirectories
find "$INPUT_DIR" -type f -iname "*.jpg" | while read -r jpg_file; do
  # Get the base name of the JPG file without the extension
  base_name=$(basename "$jpg_file" .jpg)

  # Look for the corresponding JSON sidecar that starts with the base name and ends with .json
  json_file=$(find "$(dirname "$jpg_file")" -type f -iname "${base_name}.jpg*.json" | head -n 1)

  if [[ -f "$json_file" ]]; then
    # Extract the creation time and description from the JSON file
    creation_time=$(jq -r '.creationTime.timestamp' "$json_file")
    description=$(jq -r '.description' "$json_file")

    # Convert the Unix timestamp to ExifTool's standard "YYYY:mm:dd HH:MM:SS" format
    if [[ "$creation_time" =~ ^-?[0-9]+$ ]]; then
      exif_date=$(date -d @"$creation_time" +"%Y:%m:%d %H:%M:%S" 2>/dev/null) || exif_date=""
    else
      exif_date=""
    fi

    # Read the current tags so that only empty ones get filled in
    current_date_taken=$(exiftool -DateTimeOriginal -s -s -s "$jpg_file")
    current_description=$(exiftool -Description -s -s -s "$jpg_file")

    # Update DateTimeOriginal only if it's empty
    if [[ -z "$current_date_taken" && -n "$exif_date" ]]; then
      exiftool -overwrite_original -DateTimeOriginal="$exif_date" "$jpg_file" >/dev/null 2>&1
    fi

    # Update Description only if it's empty (jq prints the string "null" for a missing field)
    if [[ -z "$current_description" && -n "$description" && "$description" != "null" ]]; then
      exiftool -overwrite_original -Description="$description" "$jpg_file" >/dev/null 2>&1
    fi
  else
    # If the JSON sidecar is not found, notify but still check Date taken below
    echo "JSON file not found for: $jpg_file"
  fi

  # Now check whether Date taken ended up set in the JPG file
  date_taken=$(exiftool -DateTimeOriginal -s -s -s "$jpg_file")

  if [[ -n "$date_taken" ]]; then
    # Make the date filename-safe: "2024:01:01 00:00:00" -> "20240101_000000"
    formatted_date=$(echo "$date_taken" | sed -e 's/://g' -e 's/ /_/g')
    safe_date_taken="${formatted_date:0:8}_${formatted_date:9:6}"
    new_filename="${OUTPUT_DIR_JPG}/${safe_date_taken}_$(basename "$jpg_file")"
    cp "$jpg_file" "$new_filename"
  else
    # No usable date: prefix with the parent directory name and park the file for manual fixing
    last_dir=$(basename "$(dirname "$jpg_file")")
    new_filename="${OUTPUT_DIR_JPG_TO_FIX}/${last_dir}_$(basename "$jpg_file")"
    cp "$jpg_file" "$new_filename"
  fi
done

echo "Processing complete."
Feel free to adjust the logic for different file types using ChatGPT or another Generative AI of your choice.
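For the videos, here is a minimal sketch along the same lines (my own adaptation, assuming MP4 files and the same kind of JSON sidecars); ffmpeg can stamp the creation_time while remuxing, without re-encoding:

#!/bin/bash
# A sketch for MP4 files, assuming the same sidecar layout as for the JPGs above
INPUT_DIR="./input"
OUTPUT_DIR_MP4="${INPUT_DIR}/mp4"
mkdir -p "$OUTPUT_DIR_MP4"

find "$INPUT_DIR" -type f -iname "*.mp4" | while read -r video; do
  # Pick the first matching JSON sidecar next to the video
  json_file=$(find "$(dirname "$video")" -type f -iname "$(basename "$video")*.json" | head -n 1)
  [[ -f "$json_file" ]] || { echo "JSON file not found for: $video"; continue; }

  ts=$(jq -r '.creationTime.timestamp' "$json_file")
  [[ "$ts" =~ ^[0-9]+$ ]] || continue

  # ffmpeg expects an ISO 8601 creation_time; -c copy remuxes without re-encoding
  iso_date=$(date -u -d @"$ts" +"%Y-%m-%dT%H:%M:%S")
  prefix=$(date -u -d @"$ts" +"%Y%m%d_%H%M%S")
  ffmpeg -nostdin -loglevel error -i "$video" -c copy \
    -metadata creation_time="$iso_date" \
    "${OUTPUT_DIR_MP4}/${prefix}_$(basename "$video")"
done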
One step closer!
Thank you, now I wish to go through all the folders and, instead of grouping by year, group by file extension; the year folders should disappear, and at the end of the procedure I wish to have only extension-name folders containing all the files of that extension from all the folders.
for dir in [0-9]*; do
  if [[ -d "$dir" ]]; then
    for file in "$dir"/*; do
      if [[ -f "$file" ]]; then
        ext="${file##*.}"   # Extract the file extension
        mkdir -p "$ext"     # Create a directory for the extension if it doesn't exist
        mv "$file" "$ext/"  # Move the file into the extension directory
      fi
    done
    rmdir "$dir"  # Remove the now-empty year directory
  fi
done
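One caveat: if two year folders contain a file with the same name, the plain mv silently overwrites the earlier copy. A collision-safe variant of the move step could look like this (a sketch that leaves the skipped file in place for manual review):

# Skip the move when the target name already exists in the extension folder
if [[ -e "$ext/$(basename "$file")" ]]; then
  echo "Name collision, skipping: $file"
else
  mv "$file" "$ext/"
fi

If anything is skipped, the final rmdir will fail for that year folder, which conveniently flags it for review.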
Here it is:
I will rename the folders to lowercase (you may want to learn more about file-system case sensitivity first).
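A minimal sketch for the folder renames (my addition, using the same temporary-name trick as the file renames below, since a direct case-only mv can fail on a case-insensitive file system):

for dir in */; do
  dir="${dir%/}"                                      # Strip the trailing slash from the glob
  lower=$(echo "$dir" | tr '[:upper:]' '[:lower:]')   # Lowercase the name
  [[ "$dir" == "$lower" ]] && continue                # Already lowercase, nothing to do
  mv "$dir" "tmp_$lower" && mv "tmp_$lower" "$lower"  # Two-step rename via a temporary name
done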
And I will also rename all the files to lowercase, in two steps: on a case-insensitive file system, mv may refuse a direct case-only rename because source and target resolve to the same file, so a temporary tmp_ prefix is used first:
find ./ -type f | while read -r file; do
  dir=$(dirname "$file")
  base=$(basename "$file")
  lower_base=$(echo "$base" | tr '[:upper:]' '[:lower:]')
  mv "$file" "$dir/tmp_$lower_base"
done
find ./ -type f -name 'tmp_*' | while read -r file; do
  dir=$(dirname "$file")
  base=$(basename "$file")
  new_base="${base#tmp_}"  # Remove the tmp_ prefix
  mv "$file" "$dir/$new_base"
done
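A quick sanity check (my addition): the following should print nothing once every name is lowercase.

# List anything that still contains an uppercase character
find . -name '*[A-Z]*'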
Okay, cool: the data preparation is done. Now let's do some optimization.