This collection groups all the automations around website data.
This automation extracts the advertising pixels installed on a website.
It is useful on several levels:
- First, you can know if a website is running ads or not to sell ad services to its owners.
- Second, you can reach out to competitors and partners about a pixel ad exchange so both of you benefit from shared data intelligence.
You will be able to extract Google, Facebook, and LinkedIn pixels with it.
To run this automation, input the website you want to extract pixels from in TexAu.
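TexAu handles the detection for you, but the core idea is simple: scan a page's HTML for the known snippets each pixel injects. A minimal sketch (the signature patterns and sample HTML are illustrative, not exhaustive):

```python
import re

# Signature patterns for common ad/analytics pixels (illustrative, not exhaustive)
PIXEL_PATTERNS = {
    "google": re.compile(r"googletagmanager\.com|google-analytics\.com|gtag\("),
    "facebook": re.compile(r"connect\.facebook\.net|fbq\("),
    "linkedin": re.compile(r"snap\.licdn\.com|_linkedin_partner_id"),
}

def detect_pixels(html: str) -> list[str]:
    """Return the names of the pixels whose signatures appear in the HTML."""
    return [name for name, pattern in PIXEL_PATTERNS.items() if pattern.search(html)]

sample_html = """
<script>fbq('init', '1234567890');</script>
<script src="https://www.googletagmanager.com/gtag/js?id=G-XXXX"></script>
"""
print(detect_pixels(sample_html))  # ['google', 'facebook']
```

A real detector would also fetch the page and follow tag-manager scripts, but the matching step looks like this.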
Another handy automation: this one converts any company name into its domain name.
It helps you collect additional data on the domain and find professional emails from a LinkedIn or Sales Navigator profile search (more on this in the workflow chapter).
To run this automation, input any company name to find its domain in TexAu.
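TexAu resolves the domain for you. As a rough illustration of the idea, a naive approach slugifies the name and generates candidate domains to test (the helper below only generates candidates; verifying them would require HTTP requests):

```python
import re

def candidate_domains(company_name: str, tlds=(".com", ".io", ".co")) -> list[str]:
    """Generate naive domain candidates from a company name (illustrative only)."""
    # Strip common legal suffixes, then keep only lowercase alphanumerics
    name = re.sub(r"\b(inc|llc|ltd|corp|gmbh)\b\.?", "", company_name, flags=re.I)
    slug = re.sub(r"[^a-z0-9]", "", name.lower())
    return [slug + tld for tld in tlds]

print(candidate_domains("Acme Corp."))  # ['acme.com', 'acme.io', 'acme.co']
```

Real resolvers rely on search or enrichment APIs rather than guessing, which is why the automation is more reliable than this sketch.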
This automation takes screenshots of a website: either just the hero section or the entire page.
These screenshots will be stored in the cloud as temporary URLs in the TexAu result section, or stored locally in your computer's download folder if you use TexAu Desktop.
To run this automation, input the website URL you want to capture with TexAu.
These files will be stored in the cloud as temporary URLs in the TexAu result section, or stored locally in your computer's download folder if you use TexAu Desktop.
To use this automation, input the website URL and give the folder a name (filename).
Once the run completes, TexAu will store the downloaded files in this folder.
Cloud users will be able to download this folder as a zip file. Desktop users will find this folder in their computer's download folder.
This automation will extract phones and emails from any website in bulk.
There are two ways to use this automation:
- Extract the emails and phones from all the website's pages in bulk.
- Extract the emails and phones from individual page URLs of the site.
For site-wide bulk email extraction, you can use these settings:
- input URL: a domain name
- crawl depth = 3 (maximum subdirectory level)
- Max URLs = 1000 (maximum, for small sites 100-200 is enough)
For individual-page email and phone extraction:
- input URL: a domain name
- crawl depth = 1
- Max URLs = 1
The latter settings are suitable for processing URLs from a website's sitemap (e.g., a subdirectory or the listing section of a site).
There are two additional settings:
Return single row: pulls all the emails and phones into two cells (one for emails, one for phones), with the data comma-separated.
Extensive Search: this will take more execution time and crawl the site deeper for more results.
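TexAu runs the crawl for you, but the extraction step on a single page boils down to pattern matching. A simplified sketch (these regexes are deliberately loose and will not catch every email or phone format):

```python
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
# Simplified phone pattern: optional "+", then 9-16 digits with common separators
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,14}\d")

def extract_contacts(page_text: str) -> dict[str, list[str]]:
    """Return de-duplicated emails and phone-like strings found in the text."""
    emails = sorted(set(EMAIL_RE.findall(page_text)))
    phones = sorted(set(m.strip() for m in PHONE_RE.findall(page_text)))
    return {"emails": emails, "phones": phones}

sample = "Contact sales@example.com or call +1 (555) 010-4477. Support: help@example.com"
result = extract_contacts(sample)
print(result["emails"])  # ['help@example.com', 'sales@example.com']
```

The "Return single row" setting then amounts to `", ".join(...)` over each list.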
Here is an easy example describing how you can use this automation to find emails and phones on a listing site. We will process individual URLs of a directory, not the entire website. This process will work with directories with a simple pagination structure.
This website has simple pagination. Each page URL contains the same URL parameter ?p= where the value equals the page number followed by the listing category and distance.
The pagination goes up to 35 pages:
URL parameter for page=1
Seeing the pagination structure, we can easily deduce the other pages' URLs up to page 35.
Open a new Google Sheet, paste the first page URL containing the page parameter in one column, and then the URL's end in another column.
Then drag the fill handle of the first cell down to autofill the page numbers for all the pages.
Finally, concatenate both columns in a third column to rebuild the full URL of each page.
Now use this Google Sheet as input in the "Extract Emails And Phones From Website" automation to scrape all the data on each page.
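The Google Sheet steps above amount to simple string concatenation; an equivalent Python sketch (the directory URL and query parameters here are made up for illustration):

```python
# Hypothetical listing URL: ?p=<page> followed by a fixed category/distance suffix
base = "https://www.example-directory.com/listings?p="
suffix = "&category=restaurants&distance=25"

# Rebuild the full URL for every page, 1 through 35
page_urls = [f"{base}{page}{suffix}" for page in range(1, 36)]

print(len(page_urls))  # 35
print(page_urls[0])    # first page URL, ending in p=1 plus the suffix
```

Either way, the result is a clean list of page URLs to feed into the extraction automation.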
This automation will extract all the social media links of a website.
This one is useful to find the official social media links of a company. This is data you can later reuse as input to find company employees on LinkedIn, for instance.
Usually, you can find these social links in the footer of a website.
To extract this data in TexAu, input the website URL and run.
Same as before, you'll find a setting to pull results in a single cell if a page has multiple links from the same social platform (ex: subsidiaries or franchises).
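Conceptually, this extraction walks the page's anchor tags and keeps the ones pointing at social platforms. A minimal sketch with the standard library (the footer HTML is a made-up example):

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

SOCIAL_DOMAINS = {"linkedin.com", "facebook.com", "twitter.com", "instagram.com", "youtube.com"}

class SocialLinkParser(HTMLParser):
    """Collect hrefs pointing at known social media domains."""
    def __init__(self):
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href", "")
        host = urlparse(href).netloc.lower().removeprefix("www.")
        if host in SOCIAL_DOMAINS:
            self.links.append(href)

footer_html = """
<footer>
  <a href="https://www.linkedin.com/company/example">LinkedIn</a>
  <a href="https://twitter.com/example">Twitter</a>
  <a href="/contact">Contact</a>
</footer>
"""
parser = SocialLinkParser()
parser.feed(footer_html)
print(parser.links)  # ['https://www.linkedin.com/company/example', 'https://twitter.com/example']
```

Relative links like `/contact` are skipped because they have no social domain in their host.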
This automation will extract the JSON-LD data of a website or web page.
JSON-LD uses schema markup data to describe the content on a page. This also plays a crucial role in SEO.
But what most people don't realize is that you can also use JSON-LD data to scrape websites.
This structured data extraction will work on all sites using JSON-LD.
Here is an example with a Shopify e-Commerce website:
Here, the JSON-LD data of the page contains information such as the product price, quantity, model, and description.
To use this automation, input the website domain or a list of URLs you want to scrape with TexAu.
After extraction completion, download the scraped data as a CSV file in TexAu.
Finally, go to the JSON to CSV converter website below, then paste the JSON-LD file content.
Upon completion, download and open the CSV file. You will see the JSON-LD data in table form.
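If you prefer to skip the online converter, the same extraction and JSON-to-CSV flattening can be scripted. A sketch using only the standard library (the sample HTML mimics a minimal product page):

```python
import csv, io, json, re

def extract_json_ld(html: str) -> list[dict]:
    """Pull every <script type="application/ld+json"> block out of the HTML."""
    pattern = re.compile(
        r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
        re.DOTALL | re.IGNORECASE,
    )
    return [json.loads(block) for block in pattern.findall(html)]

sample_html = """
<script type="application/ld+json">
{"@type": "Product", "name": "Blue Mug", "offers": {"price": "12.50"}}
</script>
"""
records = extract_json_ld(sample_html)

# Flatten to CSV in memory; nested objects are stringified for simplicity
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=records[0].keys())
writer.writeheader()
writer.writerows({k: json.dumps(v) if isinstance(v, dict) else v for k, v in r.items()}
                 for r in records)
print(buffer.getvalue())
```

A production version would fetch the live page and recursively flatten nested objects into their own columns, but the pipeline is the same: find the script tags, parse the JSON, write rows.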
This automation detects the front-end technologies used by a website. For example, you can find the CMS or cart in use, the web framework, tracking system, CDN, chat support system, and more.
This data is helpful if you want to find and target specific site owners using these technologies.
Ex: finding all the WordPress site owners in your area.
To run this automation, input the website whose technologies you want to identify in TexAu.
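Under the hood, tools like this match the page's HTML against known technology fingerprints. A toy version (the signatures below are illustrative and far from exhaustive):

```python
import re

# Fingerprints for a few common technologies (illustrative, far from exhaustive)
TECH_SIGNATURES = {
    "WordPress": re.compile(r"/wp-content/|/wp-includes/"),
    "Shopify": re.compile(r"cdn\.shopify\.com|Shopify\.theme"),
    "React": re.compile(r"data-reactroot|__NEXT_DATA__"),
}

def detect_technologies(html: str) -> list[str]:
    """Return the technologies whose fingerprints appear in the HTML."""
    return [tech for tech, sig in TECH_SIGNATURES.items() if sig.search(html)]

sample = '<link rel="stylesheet" href="/wp-content/themes/shop/style.css">'
print(detect_technologies(sample))  # ['WordPress']
```

Full-featured detectors also inspect HTTP headers, cookies, and script URLs, which is why the automation finds far more than a regex scan would.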
This automation extracts the meta description and keywords of any web page.
It is helpful for SEO applications, and it also gives information about a website's niche and industry.
Ex: finding and outreaching owners of non-ranking websites on Google to sell them SEO services.
To run this automation, input the target website URL in TexAu.
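This extraction simply reads the `<meta>` tags in the page head. A small standard-library sketch (the sample page is made up):

```python
from html.parser import HTMLParser

class MetaParser(HTMLParser):
    """Collect the description and keywords meta tags of a page."""
    def __init__(self):
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in ("description", "keywords"):
            self.meta[name] = attrs.get("content", "")

page = """
<head>
  <meta name="description" content="Handmade ceramic mugs, shipped worldwide.">
  <meta name="keywords" content="ceramics, mugs, handmade">
</head>
"""
parser = MetaParser()
parser.feed(page)
print(parser.meta["description"])  # Handmade ceramic mugs, shipped worldwide.
```

A page with no description tag simply leaves the dictionary empty, which is itself a useful signal when prospecting for SEO clients.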