Many people don't understand a crucial point about scrapers: a scraper can never be as stable as automation apps built on official APIs, like Zapier or Integromat.
Scrapers have many dependencies (multiple platforms and page layouts). When you use any scraping tool to access data from platforms that don't want to share it via an API (which is the very purpose of scraping tools), a single change in a page's code can break your entire automation.
Social media pages are very dynamic: they change often for experimentation and conversion rate optimization (CRO) purposes, and a page's layout can also vary depending on your location.
TexAu does have a test automation process to alert us when a module breaks because of this. But what people forget is that, at the end of the day, a developer still has to fix the code, until the page changes again.
There is no magical auto-healing process that can fix this, sadly. It's the same for all scraping tools.
In its current state of development, TexAu lacks a relational database management system.
This is indeed an issue that we want to address in the future.
TexAu cannot evenly join the output results of two automation modules when two or more modules each generate multiple output URLs for one input URL.
These are called 1:2 ratio automation modules.
Ex: the Scraping profiles from LinkedIn Search Results automation will require:
- one search URL (one input)
- and will output multiple profile URLs (2 or more outputs)
What does that mean? If you have more than one 1:2 ratio automation module in your workflow, you will have to split the data and consolidate it in separate Google Sheets.
You can find more information and a list of all 1:2 ratio automations in this article:
Below is the most classic example, where you have only one 1:2 ratio automation at the beginning of your workflow, and all the following automation modules are 1:1 ratio automations.
In this case, you will have no issues consolidating all the data in one Google Sheet.
All the outputs from the first automation module will iterate individually through all the following modules, so all good.
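In code terms, this first case looks like a simple flattening loop. Here is a minimal Python sketch, where `search_to_profiles` and `enrich_profile` are hypothetical stand-ins for TexAu modules, not real API calls:

```python
# Hypothetical stand-ins for TexAu modules (illustration only).

def search_to_profiles(search_url):
    # 1:2 ratio: one search URL in, several profile URLs out
    return [f"{search_url}/profile-{i}" for i in range(3)]

def enrich_profile(profile_url):
    # 1:1 ratio: one profile URL in, one enriched row out
    return {"profile": profile_url, "email": "found@example.com"}

rows = []
for profile_url in search_to_profiles("https://linkedin.com/search?q=ceo"):
    rows.append(enrich_profile(profile_url))

# Every row lines up 1:1 after the first module,
# so a single Google Sheet can hold all the results.
```

Each of the three profile URLs produced by the 1:2 module becomes exactly one row after the 1:1 module, which is why consolidation into one sheet is trivial here.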
In this second example, all the automation modules in the workflow are 1:2 ratio, and you will have to send each module's results to a different Google Sheet (or spreadsheet).
Here, each module generates multiple outputs from one input. You won't be able to consolidate this into a single sheet.
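To see why the rows stop lining up, consider two chained 1:2 modules. In this sketch, `company_to_employees` and `employee_to_posts` are hypothetical stand-ins, not real TexAu calls:

```python
# Hypothetical 1:2 modules chained together (illustration only).

def company_to_employees(company_url):
    # 1:2 ratio: one company URL in, several employee URLs out
    return [f"{company_url}/employee-{i}" for i in range(2)]

def employee_to_posts(employee_url):
    # 1:2 ratio again: one employee URL in, several post URLs out
    return [f"{employee_url}/post-{i}" for i in range(3)]

employees = company_to_employees("https://linkedin.com/company/acme")
posts_per_employee = {e: employee_to_posts(e) for e in employees}

# 2 employee rows vs 6 post rows: the counts no longer match,
# so each module's results need their own sheet.
total_posts = sum(len(p) for p in posts_per_employee.values())
```

Two rows out of the first module become six rows out of the second, so there is no single row-per-row mapping left to consolidate.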
With experience, you will know when to split those results in your workflows.
Here's another illustration for scraping LinkedIn Jobs listings and finding decision-makers of the companies posting these Jobs:
Another limitation is the lack of scheduling for the automation modules composing a workflow. Again, this is mainly an issue for automations such as auto-likes or skill endorsements, i.e., "click automation".
Click automation is the easiest to detect by social media platforms, and you should use it sparingly.
The best way to automate these actions is to run them in multiple batches throughout the day.
For example, ten likes or profile skill endorsements every hour, randomly.
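The batching pattern described above can be sketched in a few lines of Python. The function names and the jitter values are illustrative assumptions, not anything TexAu exposes as code:

```python
import random

def split_into_batches(items, batch_size=10):
    # Split the input rows (e.g. profile URLs) into batches
    # of at most `batch_size` actions each.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def jittered_delay(base_s=3600, jitter_s=300):
    # An interval of roughly one hour, randomized by a few minutes
    # so the run pattern doesn't look robotic.
    return base_s + random.uniform(-jitter_s, jitter_s)

profiles = [f"profile-{i}" for i in range(25)]
batches = split_into_batches(profiles, batch_size=10)
# 25 profiles -> batches of 10, 10, and 5,
# each batch run roughly one hour after the previous one.
```

Spacing small randomized batches across the day mimics human behavior far better than firing all the clicks at once.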
While you can easily do this by scheduling a single automation from a CSV or Google Sheet input, doing it in a workflow involving multiple automation modules with different daily limits is, needless to say, a real brain-teaser 🧠.
That's also why some automation limits are kept low: to prevent any rate-limiting when they are used in workflows.
To solve this, we need to implement a scheduling setting for those "click-based automations".
That way, instead of capping them at an arbitrary daily limit of, let's say, ten per day, we will be able to run them in multiple batches of ten automated actions every "x" hours inside a workflow.