I wrote the following documentation with some hesitation.
As a big fan of Perplexity AI and the innovative approaches they've taken to adding real-time data augmentation to large language models, I'm not one to attack their platform for no reason.
However, as a privacy-conscious user, I've been disappointed to note that the team seems to have disregarded the feedback raised by several users on Reddit and other forums.
The purpose of this quick demonstration was to show that, at the time of writing (December 29th, 2024), Perplexity AI seems to be employing a "security through obscurity" approach to user file uploads.
Depending on their file type, user uploads are routed either to Cloudinary (images) or to an AWS S3 bucket (other files). The latter appears to have somewhat more robust protections in place than the former, such as signed URLs that carry an expiry time.
Both, however, were freely accessible from unauthenticated user sessions. From a user security standpoint, this is a troubling finding.
Assuming (as it is reasonable to do) that vast amounts of personally identifiable information have been uploaded by users to Perplexity to date, and have found their way into these cloud buckets, the fact that anyone can access them or potentially attempt to scrape them is worrying.
For the purpose of this demonstration, I used a repository containing synthetic data generated by a large language model, but modeled after credible data that real users might submit.
That repository is available here.
Method Used
A test account was created on Perplexity AI, and various types of mock PII were uploaded by adding them (using drag and drop) to prompts.
User-Uploaded Images
An image from the synthetic data store was added to a prompt:
The completion included the photo:
The URL was accessed:
The URL indicates that the photo had been uploaded to a Cloudinary CDN bucket:
The full URL:
https://pplx-res.cloudinary.com/image/upload/v1735497206/user_uploads/jTRfKPcJgmDGGva/family-pic.jpg
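To make the structure of that URL easier to read, here is a minimal sketch that splits it into its parts using only the Python standard library. The function name `describe_cloudinary_url` is my own, and the interpretation of the `v…` segment as a Unix upload timestamp is an assumption based on how Cloudinary versions assets by default, not something confirmed by Perplexity.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse

# The URL observed in the completion (copied from this write-up).
IMAGE_URL = (
    "https://pplx-res.cloudinary.com/image/upload/v1735497206/"
    "user_uploads/jTRfKPcJgmDGGva/family-pic.jpg"
)


def describe_cloudinary_url(url):
    """Split a Cloudinary-style delivery URL into its visible components.

    Hypothetical helper for illustration only. It assumes the path layout
    <resource_type>/<delivery_type>/v<version>/<public id...>.
    """
    parsed = urlparse(url)
    resource_type, delivery_type, version, *public_id = parsed.path.strip("/").split("/")
    info = {
        "host": parsed.netloc,
        "resource_type": resource_type,
        "delivery_type": delivery_type,
        "public_id": "/".join(public_id),
        # No query string means no signature, token, or expiry travels with the URL.
        "has_query_string": bool(parsed.query),
    }
    # Assumption: the version segment is a Unix timestamp of the upload.
    if version.startswith("v") and version[1:].isdigit():
        uploaded = datetime.fromtimestamp(int(version[1:]), tz=timezone.utc)
        info["version_as_utc"] = uploaded.isoformat()
    return info
```

Calling `describe_cloudinary_url(IMAGE_URL)` shows that the URL carries no query parameters at all, and that, under the timestamp assumption, the version segment decodes to December 29th, 2024, which matches the date of this test.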
To verify that the resource could be accessed without authentication, the URL was pasted into a browser in a new session that was not logged into Perplexity.
No authentication requirement prevented the resource from being accessed.
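The browser check described above can also be approximated in a script. The following is a minimal sketch, assuming a plain GET request with no cookies or authorization headers is a fair stand-in for a logged-out browser session. The function names are my own and the `opener` parameter exists only so the logic can be exercised without touching the network.

```python
import urllib.error
import urllib.request


def fetch_status_without_credentials(url, opener=None, timeout=10):
    """Return the HTTP status of a GET request sent with no session data.

    Hypothetical helper for illustration. The default opener created by
    build_opener() has no cookie jar attached, and no Cookie or
    Authorization header is added to the request.
    """
    if opener is None:
        opener = urllib.request.build_opener()
    request = urllib.request.Request(url, method="GET")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 401, 403, 404 and similar responses arrive as exceptions.
        return err.code


def is_publicly_readable(status):
    """Treat any 2xx response as 'readable without authentication'."""
    return 200 <= status < 300
```

A status in the 2xx range for a user-upload URL would correspond to what I saw in the logged-out browser session, while a 401 or 403 would indicate that some access control is in place. Only run this against resources you uploaded yourself.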
User-Uploaded Documents
A document containing personally identifiable information, including an address and a phone number, was uploaded with a prompt asking for feedback on a resume:
The URL path showed that the asset was stored in an S3 bucket.
It contained the following structure:
https://ppl-ai-file-upload.s3.amazonaws.com/web/direct-files/{user_id}/{file_id}/{file_name}?AWSAccessKeyId={access_key}&Signature={signature}&x-amz-security-token={security_token}&Expires={expiration_timestamp}
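The query parameters in that structure (`AWSAccessKeyId`, `Signature`, `Expires`) are what an S3 pre-signed URL looks like: the signature travels in the URL itself, so anyone holding the full link can fetch the object until the expiry time passes. Here is a minimal sketch that pulls those parts out. The function name is my own, and every value in `EXAMPLE_URL` other than the bucket host and path prefix is a made-up placeholder, not a real key, token, or file.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse

# Placeholder values only; this is not a working link.
EXAMPLE_URL = (
    "https://ppl-ai-file-upload.s3.amazonaws.com/web/direct-files/"
    "USER_ID/FILE_ID/resume.pdf"
    "?AWSAccessKeyId=EXAMPLEKEY&Signature=EXAMPLESIG"
    "&x-amz-security-token=EXAMPLETOKEN&Expires=1735500806"
)


def describe_presigned_url(url, now=None):
    """Summarise the parts of a pre-signed S3-style URL.

    Hypothetical helper for illustration. `Expires` is read as a Unix
    timestamp in seconds.
    """
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    expires_at = datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = int((expires_at - now).total_seconds())
    return {
        "bucket_host": parsed.netloc,
        "object_key": parsed.path.lstrip("/"),
        "query_parameters": sorted(query),
        "expires_at": expires_at.isoformat(),
        # Zero once the link has expired.
        "seconds_remaining": max(0, remaining),
    }
```

The expiry limits how long a leaked link stays usable, which is why I describe the S3 path as somewhat better protected than the Cloudinary one. It does not tie access to the logged-in user, though, which is consistent with what the next step shows.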
The file could also be downloaded from a non-authenticated browser session:
User-Uploaded Code With Secrets
To test the handling of code provided by the user, a Python program from this repository containing a hard-coded secret was uploaded alongside a prompt:
The file structure indicated that the code had also been uploaded to an AWS bucket:
The structure was as follows:
https://<bucket-name>.s3.amazonaws.com/<folder-path>/<unique-id>/<file-name>?AWSAccessKeyId=<access-key-id>&Signature=<signature>&x-amz-security-token=<security-token>&Expires=<expiration-timestamp>
As before, the script was accessible from an unauthenticated session:
A Request From A Privacy-Concerned User
Paying customers deserve better than to have the security of the personal data they commit to Perplexity AI reduced to a game of probabilities.
It is entirely reasonable to assume that, in the course of ordinary prompting, users of LLM services upload files containing highly personal information alongside their prompts.
A job seeker might upload their resume containing their home address and phone number. A personal user might upload a medical file. While many might regard these practices as ill-advised, it's nevertheless reasonable to assume they are occurring.
It is therefore likely that, at the time of writing, a large amount of user-submitted data sits unprotected in Cloudinary buckets. The fact that this remains the case in spite of repeated requests by users to improve security is disappointing.
"Security through obscurity" has failed in the past and better methods can be employed to protect user data.