Scraping Facebook Search Results
Today we will build a recipe to find local shops and extract the following business data:
- Business name
- Phone number
- Facebook email
- Website
- Additional phone numbers found on their website
- Contact emails found on their website
- Website technology (WordPress and so on)
Here's the workflow outline:
In the end, we will gather all this data in a nice Google sheet.
First, let's find some friendly restaurant pages in New York. I am starving:
This is a generic search, but we could refine it by adding filters such as shops, location, and category:
Technically, Facebook's limits allow us to run:
- Unlimited page URL searches, BUT
- Only 15 executions per hour
A bit of maths: (15 potential leads per hour) x (24 hours a day) x (365 days a year) = 131,400 potential leads per year for one Facebook account! Nice 🤪.
But in reality, it will be less. Still enough to find nice spots to prospect and eat NY pasta for life.
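For the curious, here is the same arithmetic as a small Python sketch. The `uptime` parameter is a hypothetical knob (not a TexAu setting) for the share of hours the recipe actually runs, which is one reason the real figure ends up lower than the ceiling.

```python
EXECUTIONS_PER_HOUR = 15  # hourly limit quoted above
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def yearly_ceiling(per_hour: int = EXECUTIONS_PER_HOUR, uptime: float = 1.0) -> int:
    """Upper bound on leads per year for one account.

    `uptime` is an assumed fraction (0.0 to 1.0) of hours the recipe really runs.
    """
    return int(per_hour * HOURS_PER_DAY * DAYS_PER_YEAR * uptime)


print(yearly_ceiling())            # 131400
print(yearly_ceiling(uptime=0.5))  # 65700
```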
Add the "link" variable (the business page URL from the previous search). Here we will visit and load each restaurant's Facebook page from the search results and scrape its details:
Set "Max Depth" to 2 or 3 to crawl the website's directories and find emails and phone numbers. Generally, you can find this information in the website footer.
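To picture what a depth setting does, here is a rough Python sketch of a depth-limited crawl. It is not how TexAu works internally: the `pages` dictionary stands in for a real website, the regular expressions are simplified guesses at what an email or phone looks like, and counting the start page as depth 0 is an assumption.

```python
import re
from collections import deque

# Simplified patterns; real-world emails and phones are messier than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
LINK_RE = re.compile(r'href="([^"]+)"')


def crawl_contacts(pages, start, max_depth=2):
    """Breadth-first walk over `pages` (url -> HTML), following links up to max_depth hops."""
    emails, phones = set(), set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        html = pages.get(url, "")
        emails.update(EMAIL_RE.findall(html))
        phones.update(match.strip() for match in PHONE_RE.findall(html))
        if depth < max_depth:  # stop following links once the depth limit is reached
            for link in LINK_RE.findall(html):
                if link in pages and link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return emails, phones


# Hypothetical three-page site: the contact details sit two clicks from the home page.
SITE = {
    "/": '<a href="/about">About</a>',
    "/about": '<a href="/contact">Contact</a>',
    "/contact": "<footer>hello@example.com, +1 212-555-0100</footer>",
}

print(crawl_contacts(SITE, "/", max_depth=1))  # nothing found, the crawl stops at /about
print(crawl_contacts(SITE, "/", max_depth=2))  # reaches /contact and finds both values
```

In this toy site a depth of 1 misses the contact page, which is why a value of 2 or 3 is a sensible starting point.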
Add the "website" variable in the "WEBSITE URL" field:
To filter only the websites using WordPress, we will use a filter as follows:
- If technology name
- Text contains
- "wordpress" (keep it wrapped in quotation marks)
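In plain Python, the filter above boils down to a "contains" check like the sketch below. The row data and field names are made up for illustration, and the case-insensitive match is an assumption: the TexAu filter may compare text differently.

```python
def uses_technology(technology_name, needle="wordpress"):
    """Return True when `needle` appears in the technology name (case-insensitive)."""
    return needle.lower() in (technology_name or "").lower()


# Hypothetical rows as they might come out of the website technology step.
rows = [
    {"name": "Shop A", "technology": "WordPress, PHP"},
    {"name": "Shop B", "technology": "Shopify"},
    {"name": "Shop C", "technology": None},  # no technology detected
]

wordpress_rows = [row for row in rows if uses_technology(row["technology"])]
print([row["name"] for row in wordpress_rows])  # ['Shop A']
```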
Here we will consolidate all the data we found at each step of our automation recipe and send it to a Google Sheet.
Create a new Sheet:
Set its sharing permission to "Editor" and copy the Sheet link:
Here's the header template we will use:
Map each column of our Google Sheet by cherry-picking the variables you want to output in it:
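Conceptually, the mapping is just "one column, one variable". The Python sketch below shows the idea with a header derived from the data list at the top of this article; the column titles and variable names are placeholders, not the exact ones TexAu exposes.

```python
# Placeholder header: one column per data point listed at the start of the article.
HEADER = [
    "Business name", "Phone", "Facebook email", "Website",
    "Website phones", "Website emails", "Technology",
]

# Hypothetical variable name feeding each column.
VARIABLE_FOR_COLUMN = {
    "Business name": "name",
    "Phone": "phone",
    "Facebook email": "email",
    "Website": "website",
    "Website phones": "site_phones",
    "Website emails": "site_emails",
    "Technology": "technology",
}


def to_row(record):
    """Build one sheet row in HEADER order; lists are joined, missing values left blank."""
    row = []
    for column in HEADER:
        value = record.get(VARIABLE_FOR_COLUMN[column])
        if isinstance(value, (list, tuple)):
            value = "; ".join(value)
        row.append(value or "")
    return row


sample = {"name": "Shop A", "phone": "+1 212-555-0100", "site_emails": ["a@example.com", "b@example.com"]}
print(to_row(sample))
```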
Let's launch that recipe. I am hungry, dammit!
Logs showing the automation processing in real-time:
Sample data after running for 15 minutes (slowly, to fly under the FB police radar). This limit is hardcoded by default in TexAu for obvious safety reasons.
You might see an #ERROR in the phone number column. Don't worry. The phones are present. It's just a formatting issue. Let's fix that cheese in my pasta:
Click the column, open search-and-replace (CTRL+H in Google Sheets, or CTRL+F and then the more-options menu), and check "Also search within formulas". Done.
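If you export the data yourself before it reaches the sheet, an alternative to search-and-replace is to mark phone values as text up front. The sketch below assumes the #ERROR comes from the leading "+" being read as the start of a formula, which is a guess about the cause, and relies on the spreadsheet convention that a leading apostrophe forces plain text. The helper name is made up.

```python
def as_sheet_text(value: str) -> str:
    """Prefix an apostrophe so a spreadsheet keeps the value as text, not a formula.

    Assumption: only values starting with '+' or '=' need the prefix.
    """
    if value[:1] in ("+", "="):
        return "'" + value
    return value


print(as_sheet_text("+1 212-555-0100"))  # '+1 212-555-0100
print(as_sheet_text("212-555-0100"))     # 212-555-0100
```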
Oh, I see on the right column that 230 Fifth Restaurant in NYC has a WordPress site. Let's see if it's correct.
Oh yeah, excellent WordPress pasta.