Tossing out Fake Amazon Reviews & reRanking the Results
Amazon is my outlet for retail therapy, where I drown myself in product research until I am fully satisfied that I have found the holy grail while it’s on sale. If you spend as much time filtering and sorting as I do, you have probably found yourself frustrated with Amazon’s ranking.
My Top Amazon Shopping Frustrations:
- The 5-star product with two reviews is next to the 5-star product with a thousand reviews.
- The 5-star product with a thousand reviews also has 200 1-star reviews and nothing in between.
- You have to skim the comments to ensure the product isn’t a knock-off, refurbished or expired.
- The product has changed over time and those 5-star reviews were for an older version before they started cutting costs.
That is how the reRank project was born:
A pet project that generates an unbiased Amazon product ranking by identifying the untrustworthy reviews and re-sorting the list of results.
Note: Currently only available for US Amazon.com
Now whenever I want to search for a product, I can go through my reRank site and get a list of top products with reviews I can trust. I also added direct links to the full report on ReviewMeta and to the price history on Keepa.
Technical Details
I found a site, ReviewMeta, that generates a dynamic report of suspicious reviews on Amazon products and provides an adjusted star rating. What a genius idea! What if I could take it a step further? I used the Amazon API to fetch the top 30 products for a search, combined that with the ReviewMeta API, and added checks for a couple of my own pet peeves, such as the percentage of one-star reviews, to create my own ranking.
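To make the idea concrete, here is a minimal sketch of that kind of combined score. It is written in Python for brevity rather than the project's Laravel/PHP, and the field names, the penalty weight, and the formula itself are assumptions for illustration, not the exact formula reRank uses.

```python
from dataclasses import dataclass


@dataclass
class Product:
    title: str
    adjusted_rating: float  # hypothetical field: star rating after suspicious reviews are discounted
    star_counts: dict       # hypothetical field: {stars: number_of_reviews}


def one_star_share(product):
    """Fraction of all reviews that are 1-star (0.0 if there are no reviews)."""
    total = sum(product.star_counts.values())
    return product.star_counts.get(1, 0) / total if total else 0.0


def rerank_score(product, penalty=2.0):
    """Assumed formula: adjusted rating minus a penalty proportional to the 1-star share."""
    return product.adjusted_rating - penalty * one_star_share(product)


def rerank(products):
    """Return the products sorted from best to worst score."""
    return sorted(products, key=rerank_score, reverse=True)


# A polarized product (lots of 5-star and 1-star reviews, nothing else)
# versus a slightly lower-rated product with very few 1-star reviews.
polarized = Product("Polarized", 4.8, {5: 800, 1: 200})
steady = Product("Steady", 4.6, {5: 70, 4: 25, 1: 5})
ranked_titles = [p.title for p in rerank([polarized, steady])]
```

With these made-up numbers, the steadier product ends up ahead of the polarized one even though its adjusted rating is lower, which is the behavior I was after.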
Technology
Laravel 5.7 using the following packages:
- Amazon ECS (E-Commerce Services) Package for Laravel https://github.com/JoeDawson/amazon-ecs
- DiDOM, a simple and fast HTML parser https://github.com/Imangazaliev/DiDOM
Getting Started
To begin, I experimented in the Amazon Scratchpad to test different queries.
https://webservices.amazon.com/scratchpad/index.html
Roadblocks
1) API Limits
My goal was to allow users to do their own searches, but Amazon limits the API to one query per second, and ReviewMeta’s free API would not appreciate thousands of simultaneous hits either. Instead, I researched the top items searched for over the past year and ran the queries only for those lists.
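Running a fixed list of searches under a one-query-per-second limit comes down to spacing the calls out. Here is a minimal sketch of that throttling in Python (not the project's actual Laravel code). `Throttle`, `run_searches`, and the `search` callable are hypothetical names, and `search` stands in for whatever function performs the real API request.

```python
import time


class Throttle:
    """Blocks so that consecutive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the class can be tested without real waiting
        self._sleep = sleep
        self._last = None     # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def run_searches(keywords, search, throttle):
    """Run search(keyword) for each keyword, pausing between calls as needed."""
    results = {}
    for keyword in keywords:
        throttle.wait()
        results[keyword] = search(keyword)
    return results
```

Calling `run_searches(["blender", "headphones"], search, Throttle(1.0))` would issue the two requests at least one second apart. The keywords here are placeholders.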
2) Caching Restrictions
Amazon has strict policies against caching prices, so I used their “Amazon Associate” widget to display each product price.
3) No Reviews API
When I call the Amazon Product API, it returns a JSON result, which I parse and store in a MySQL database. Unfortunately, the Amazon Product API only provides an iframe for the reviews, so I had to write manual pattern extractions to grab the number of reviews for each star count. This became another barrier because Amazon saw my script as a bot and threw up captchas. I have to give kudos to Hartley Brody’s blog for helping me figure out what to do next: https://blog.hartleybrody.com/scrape-amazon/
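For a sense of what such a pattern extraction looks like, here is a rough Python sketch. The HTML snippet is made up for illustration (Amazon's real markup is different and changes often), and treating any page that contains the word "captcha" as a block page is an assumed heuristic, not a documented signal.

```python
import re

# Hypothetical markup for a star histogram; NOT Amazon's real HTML.
SAMPLE_HTML = """
<table id="histogram">
  <tr><td>5 star</td><td>1,200</td></tr>
  <tr><td>4 star</td><td>300</td></tr>
  <tr><td>3 star</td><td>100</td></tr>
  <tr><td>2 star</td><td>50</td></tr>
  <tr><td>1 star</td><td>350</td></tr>
</table>
"""

# "<n> star", then any non-digit characters, then a count that may contain commas.
STAR_ROW = re.compile(r"([1-5])\s*star\D*?(\d[\d,]*)", re.IGNORECASE)


class CaptchaError(Exception):
    """Raised when the response looks like a bot-check page instead of reviews."""


def parse_star_counts(html):
    """Return {stars: review_count} extracted from the histogram markup."""
    if "captcha" in html.lower():  # assumed heuristic for detecting a block page
        raise CaptchaError("blocked by a captcha page")
    return {
        int(stars): int(count.replace(",", ""))
        for stars, count in STAR_ROW.findall(html)
    }


star_counts = parse_star_counts(SAMPLE_HTML)
```

Raising a dedicated error for the captcha case lets the calling code back off and retry later instead of silently storing empty counts.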
My MVP
After a few days of energy drinks and sleep deprivation, I have a minimum viable product (MVP) that I am releasing to the world to see if it’s worth spending more time to develop.