Project 1: Phone Number and Email Address Extractor
Welcome to the first project of this tutorial series. In this project, you will learn how to extract phone numbers and email addresses from your clipboard using regex to search and manipulate text patterns in strings of characters.
What you will gain from this project:
- Learning how to use the Pyperclip module for copying and pasting text.
- Creating and working with complex regular expressions.
Prerequisites
- Familiarity with "Automate the Boring Stuff: Python Programming Basics" and pattern matching with Regular Expressions.
- Optionally, Python 3.9 or an earlier version.
- A strong willingness to learn and tackle more complex exercises.
To access the project README file, click here.
You might be tempted to start coding immediately. Take some time to think through the project and add comments as you plan your approach.
Step 1: Write comments
def extract_email():
    # get the text off the clipboard
    # create a regex for email addresses
    # find all matches in the clipboard text
    # return the matches
    pass

def extract_phone_number():
    # get the text off the clipboard
    # create a regex for phone numbers
    # find all matches in the clipboard text
    # return the matches
    pass

if __name__ == '__main__':
    extract_email()
    extract_phone_number()
Step 2: Import Required modules
Now that you have outlined the comments, let's import the necessary modules. For example, you need the Pyperclip module for copying and pasting.
Note: It was stated in the prerequisites that this project works with Python 3.9 or earlier, primarily due to the Pyperclip module's compatibility. If you are using a more recent Python version, you have three options:
- Look for an alternative module for copying and pasting.
- Use file manipulation, such as reading from a file and writing to a file, instead.
- If you are a bit more advanced in Python, consider forking the Pyperclip module, improving it, and open-sourcing it.
import re, pyperclip
This line imports both modules: re for working with regular expressions and pyperclip for reading from and writing to the clipboard.
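If Pyperclip is not an option for you, the file-based alternative from the note could look roughly like the sketch below. It is only an illustration: the helper names read_input_text and write_results are hypothetical and not part of this project, and you choose the file paths yourself.

```python
from pathlib import Path


def read_input_text(path):
    """Return the whole file as one string (stands in for pyperclip.paste())."""
    return Path(path).read_text(encoding="utf-8")


def write_results(path, matches):
    """Write one match per line (stands in for pyperclip.copy())."""
    Path(path).write_text("\n".join(matches), encoding="utf-8")
```

With helpers like these, the rest of the project stays the same: the text to search comes from a file instead of the clipboard, and the results go to a second file.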
Step 3: Paste Text from Clipboard
This step can be a bit tricky. First, make sure you have copied some text to the clipboard using the standard Ctrl + C keyboard shortcut. Next, call pyperclip.paste(), which returns the clipboard contents as a string, and store the result in a copied_text variable. Think of it this way: instead of pasting manually, you are storing the text in a variable. Now you are ready to work with the copied text.
def extract_email():
    # get the text off the clipboard
    copied_text = pyperclip.paste()
Step 4: Create an Email Regex
To test whether the regex in this step works correctly, you can click here. You can also check out this repository for curated collections of Regex.
Note: This regex does not cover every rule for a valid email address. Writing regexes can be complicated and takes a lot of practice. Depending on your programming role, you might use regex often or only occasionally. Don't cram it; just understand the basics and be able to interpret a common regex pattern. Remember, the goal is to work smarter, not harder.
    # create a regex for email addresses
    email_regex = re.compile(r'''(
        [a-zA-Z0-9._%+-]+      # username
        @                      # @ symbol
        [a-zA-Z0-9.-]+         # domain name
        (\.[a-zA-Z]{2,4})      # dot-something
        )''', re.VERBOSE)
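To get a feel for what each part of the pattern captures, you can try it on a short string before involving the clipboard. This is a minimal sketch; sample_text is made up for the illustration.

```python
import re

email_regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)

sample_text = "Contact jobseeker@email.com or hrmanager@company.com today."

first = email_regex.search(sample_text)
print(first.group(1))  # the whole address: jobseeker@email.com
print(first.group(2))  # the dot-something part: .com
```

One limitation worth knowing about: because the last part allows only two to four letters, an address such as a@b.museum would be matched only up to a@b.muse. That is one of the ways this regex falls short of full validation.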
Step 5: Find All Matches in the Clipboard Text
Use the findall() method to find all matches. Remember that findall() always returns a list: a list of strings when the regex has no groups, and a list of tuples when it has groups.
Quote from the book as a quick refresher:
To summarize what the findall() method returns, remember the following:
- When called on a regex with no groups, such as \d\d\d-\d\d\d-\d\d\d\d, the method findall() returns a list of string matches, such as ['415-555-9999', '212-555-0000'].
- When called on a regex that has groups, such as (\d\d\d)-(\d\d\d)-(\d\d\d\d), the method findall() returns a list of tuples of strings (one string for each group), such as [('415', '555', '1122'), ('212', '555', '0000')].
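The two cases from the quote can be checked side by side. This is a small sketch; the text variable is invented for the demonstration.

```python
import re

text = "Cell: 415-555-9999 Work: 212-555-0000"

no_groups = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d")
with_groups = re.compile(r"(\d\d\d)-(\d\d\d)-(\d\d\d\d)")

# No groups: a list of strings.
print(no_groups.findall(text))    # ['415-555-9999', '212-555-0000']

# Groups: a list of tuples, one string per group.
print(with_groups.findall(text))  # [('415', '555', '9999'), ('212', '555', '0000')]
```

The email regex from Step 4 has two groups, so it falls into the second case.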
    # find all matches in the clipboard text
    for groups in email_regex.findall(copied_text):
        print(groups)
Result in terminal:
('jobseeker@email.com', '.com')
('hrmanager@company.com', '.com')
('projectmanager@consultingfirm.net', '.net')
('finance@corporation.org', '.org')
Because the regex contains two groups, each item the loop prints is a tuple: the full email address first, then the dot-something part. To access the text file, click here.
Step 6: Create a 'matches' Variable and Append the Groups to It
Create a variable called matches and assign it an empty list. The first element of each tuple is the full email address, which is all you need, so use indexing to get groups[0] and append it to matches.
matches = []

def extract_email():
    ...
    # find all matches in the clipboard text
    for groups in email_regex.findall(copied_text):
        matches.append(groups[0])

if __name__ == '__main__':
    extract_email()
    print(matches)
Result in terminal:
['jobseeker@email.com', 'hrmanager@company.com', 'projectmanager@consultingfirm.net', 'finance@corporation.org']
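The plan in Step 1 ends with "return the matches". If you prefer that over a module-level list, one possible variation is sketched below. It is not the approach used in this project: the function name find_emails and its text parameter are hypothetical, and the text is passed in rather than read from the clipboard so the function can be tried on any string.

```python
import re

EMAIL_REGEX = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)


def find_emails(text):
    """Return every email address found in text, in order of appearance."""
    # Each findall() item is a (full_address, dot_something) tuple.
    return [groups[0] for groups in EMAIL_REGEX.findall(text)]


sample = "Write to finance@corporation.org or projectmanager@consultingfirm.net."
print(find_emails(sample))
# ['finance@corporation.org', 'projectmanager@consultingfirm.net']
```

Returning the list keeps each call independent, so running the function twice does not add the same addresses to a shared list a second time.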
Step 7: Update the main
Now that extract_email() is working, update the main block. Check whether the length of matches is greater than 0. If it is, the if branch runs: join the matches with the '\n' newline character, copy the joined string to the clipboard, print 'Copied to clipboard.' to confirm, and then print the matches to the terminal.
Open a blank Notepad document and press Ctrl+V, and you will see that the matches have been copied to the clipboard. If no matches are found, the else branch runs instead.
if __name__ == '__main__':
    extract_email()
    # Copy results to the clipboard if a match was found else execute the else statement
    if len(matches) > 0:
        pyperclip.copy('\n'.join(matches))
        print('Copied to clipboard.')
        print('\n'.join(matches))
    else:
        print('No phone numbers or email addresses found.')
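If you would like to try this branching logic without touching the real clipboard, a sketch like the following may help. The function report_matches and its copy_to_clipboard parameter are hypothetical; in the actual script you would pass pyperclip.copy, while here a plain list stands in for the clipboard.

```python
def report_matches(matches, copy_to_clipboard):
    """Copy the matches (one per line) and return the message to print."""
    if len(matches) > 0:
        joined = '\n'.join(matches)
        copy_to_clipboard(joined)
        return 'Copied to clipboard.\n' + joined
    return 'No phone numbers or email addresses found.'


copied = []  # stand-in for the clipboard
print(report_matches(['a@example.com', 'b@example.org'], copied.append))
print(report_matches([], copied.append))
```

After both calls, copied holds a single string with one address per line, because the empty list takes the else path and nothing is copied.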
Exercise
Now, it's your turn to implement the extract_phone_number() function. It follows the same steps as above. When you are done, you can check my solution here.
You might be tempted to start coding immediately, but remember, think, think, and then code. All the best.
Conclusion
I hope that by now you have been able to implement the other function. If you struggled with it, that's perfectly fine; everyone does at times. This project has covered working with complex Regex and using Pyperclip to copy and paste. Be sure to check out the next series for more projects.
If you have any questions, want to connect, or just fancy a chat, feel free to reach out to me on LinkedIn and Twitter. Until next time, happy coding.