Websites



This collection groups all the automations around website data.

Extract Pixels From Websites

This automation extracts the tracking pixels installed on websites.

This one is useful on many levels:

  • First, you can tell whether a website is running ads or not, which helps you sell ad services to its owners.
  • Second, you can reach out to competitors and partners for a pixel ad exchange, so both parties benefit from shared data intelligence.

You will be able to extract Google, Facebook, and LinkedIn pixels with it.
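Under the hood, pixel detection comes down to scanning a page's source for known script markers. Here is a minimal Python sketch; the signature patterns are illustrative assumptions, not TexAu's actual detection logic:

```python
import re

# Hypothetical markers commonly found in page source when a pixel is installed.
PIXEL_SIGNATURES = {
    "Facebook": r"connect\.facebook\.net/[^\"']*fbevents\.js|fbq\(",
    "Google": r"googletagmanager\.com/gtag/js|gtag\(",
    "LinkedIn": r"snap\.licdn\.com/li\.lms-analytics|_linkedin_partner_id",
}

def detect_pixels(html: str) -> list[str]:
    """Return the names of ad pixels whose markers appear in the HTML."""
    return [name for name, pattern in PIXEL_SIGNATURES.items()
            if re.search(pattern, html)]

sample = '<script src="https://connect.facebook.net/en_US/fbevents.js"></script>'
print(detect_pixels(sample))  # ['Facebook']
```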



Document image





Document image



To run this automation, input in TexAu the website you want to extract the pixels from.

Document image





Company Name To Domain

Another handy automation: this one converts any company name to its domain name.

This automation helps collect additional data on the domain. It also lets you find professional emails from a LinkedIn or Sales Navigator profile search (more on this in the workflow chapter).

To run this automation, input any company name to find its domain in TexAu.
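TexAu's resolution logic is not public, but a naive version of name-to-domain conversion can be sketched by slugifying the company name and generating candidate domains to verify. The TLD list below is an arbitrary assumption:

```python
import re

def candidate_domains(company: str, tlds=(".com", ".io", ".co")) -> list[str]:
    """Generate naive domain guesses from a company name.
    A real resolver would verify each candidate (DNS lookup, HTTP check)."""
    slug = re.sub(r"[^a-z0-9]", "", company.lower().replace("&", "and"))
    return [slug + tld for tld in tlds]

print(candidate_domains("Acme Corp"))  # ['acmecorp.com', 'acmecorp.io', 'acmecorp.co']
```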

Document image





Take A Website Screenshot

This automation takes screenshots of a website, either of the hero section only or of the entire page.

These screenshots will be stored in the cloud as temporary URLs in the TexAu result section, or locally in your computer's download folder if you use TexAu Desktop.

Document image


To run this automation, input the website URL you want to capture with TexAu.

Document image





Extract HTML From A Page

This automation will extract the HTML, CSS, and JavaScript of a web page, including all pictures and attachments.

These files will be stored in the cloud as temporary URLs in the TexAu result section, or locally in your computer's download folder if you use TexAu Desktop.

Document image


To use this automation, input the website URL and give the destination folder a name (filename).

Once the run completes, TexAu will save the downloaded files in this folder.

Cloud users will be able to download this folder as a zip file. Desktop users will find this folder in the download folder of their computer.
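Downloading a full page means fetching not just the HTML but every asset it references. Here is a minimal sketch of that first step, collecting asset URLs with Python's standard-library parser; the sample markup is hypothetical:

```python
from html.parser import HTMLParser

class AssetCollector(HTMLParser):
    """Collect the URLs of scripts, stylesheets, and images on a page --
    the assets a full-page download must fetch alongside the HTML."""
    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "img") and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.assets.append(attrs["href"])

page = '<link rel="stylesheet" href="/site.css"><script src="/app.js"></script><img src="/logo.png">'
collector = AssetCollector()
collector.feed(page)
print(collector.assets)  # ['/site.css', '/app.js', '/logo.png']
```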

Document image





Extract Emails And Phones From Website

This automation will extract phones and emails from any website in bulk.

Document image


There are two ways to use this automation:

  • Extract the emails and phones from all the website's pages in bulk.
  • Extract the emails and phones from individual page URLs of the site.

For site-wide bulk email extraction, you can use these settings:

  • Input URL: a domain name
  • Crawl depth = 3 (maximum subdirectory level)
  • Max URLs = 1000 (the maximum; for small sites, 100-200 is enough)

For individual page email and phone extraction:

  • Input URL: an individual page URL
  • Crawl depth = 1
  • Max URLs = 1

The latter settings are suitable for processing URLs from the website's sitemap (e.g., a subdirectory or the listing section of a site).

There are two additional settings:

Return single row: pulls all the emails and phones into two separate cells (one for emails, one for phones), with the data comma-separated.

Extensive Search: this will take more execution time and crawl the site deeper for more results.
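Conceptually, the extraction step is pattern matching over page text. Here is a simplified Python sketch, including the single-row behavior described above; the regexes are rough illustrations, not TexAu's actual patterns:

```python
import re

# Deliberately simple patterns; production extractors are far more thorough.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def extract_contacts(text: str, single_row: bool = False):
    """Pull emails and phones from page text. With single_row=True the
    results are joined into two comma-separated cells, mirroring the
    'Return single row' setting."""
    emails = EMAIL_RE.findall(text)
    phones = PHONE_RE.findall(text)
    if single_row:
        return ", ".join(emails), ", ".join(phones)
    return emails, phones

page_text = "Contact sales@example.com or support@example.com, tel +1 212 555 0134"
print(extract_contacts(page_text, single_row=True))
```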

Here is an easy example describing how you can use this automation to find emails and phones on a listing site. We will process individual URLs of a directory, not the entire website. This process will work with directories with a simple pagination structure.

Document image


This website has simple pagination. Each page URL contains the same URL parameter ?p= where the value equals the page number followed by the listing category and distance.

The pagination goes up to 35 pages:

Document image

URL parameter for page=1


Given the pagination structure, we can easily deduce the other pages' URLs up to page 35.

Open a new Google Sheet. In one column, paste the beginning of the first page URL up to and including the page parameter; in another column, paste the end of the URL.

Then select the first cell and drag its fill handle down so the page number increments on every row, up to page 35.

Finally, concatenate both columns in a third column to rebuild the full URL of each page.
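If you prefer code to spreadsheets, the same 35 URLs can be rebuilt in a few lines of Python. The base URL and parameter values below are placeholders standing in for the real site's values:

```python
# Rebuild the 35 page URLs following the ?p= pattern described above.
# The domain, category, and distance values are hypothetical placeholders.
base = "https://www.example-directory.com/listings?p={page}&category=cars&distance=50"
page_urls = [base.format(page=n) for n in range(1, 36)]

print(len(page_urls))   # 35
print(page_urls[0])     # first page URL, ?p=1
```

Paste the resulting list into a Google Sheet (or feed it to TexAu directly) exactly as in the spreadsheet method.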

Document image

Now use this Google Sheet as input in the "Extract Emails And Phones From Website" automation to scrape all the data on each page.

Document image







Extract Social Media Links From Website

This automation will extract all the social media links of a website.

This one is useful for finding the official social media links of a company. You can later reuse this data as input to find company employees on LinkedIn, for instance.

Usually, you can find these social links in the footer of a website.
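Conceptually, the automation scans a page's anchor tags for known social hosts. Here is a minimal Python sketch; the host list and sample footer are illustrative:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Illustrative subset; a real extractor would cover many more platforms.
SOCIAL_HOSTS = ("facebook.com", "twitter.com", "linkedin.com", "instagram.com", "youtube.com")

class SocialLinkCollector(HTMLParser):
    """Collect anchor hrefs whose host is a known social platform."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            host = urlparse(href).netloc.removeprefix("www.")
            if host in SOCIAL_HOSTS:
                self.links.append(href)

footer = '<a href="https://www.linkedin.com/company/acme">LinkedIn</a><a href="/about">About</a>'
c = SocialLinkCollector()
c.feed(footer)
print(c.links)  # ['https://www.linkedin.com/company/acme']
```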

Document image


To extract this data in TexAu, input the website URL and run.

As before, you'll find a setting to pull the results into a single cell if a page has multiple links from the same social platform (e.g., subsidiaries or franchises).

Document image





Get The JSON LD Of A Website

This automation will extract the JSON-LD data of a website or web page.

JSON-LD uses schema.org markup to describe the content of a page. It also plays a crucial role in SEO.

But what most people don't realize is that you can also use JSON-LD data to scrape websites.

This structured data extraction will work on all sites using JSON-LD.

Here is an example with a Shopify e-Commerce website:

Document image


Here, the page's JSON-LD data contains information such as product price, quantity, model, and description.
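JSON-LD lives in script blocks of type application/ld+json, so scraping it amounts to collecting and parsing those blocks. A standard-library Python sketch, with a hypothetical product page as the sample:

```python
import json
from html.parser import HTMLParser

class JsonLdCollector(HTMLParser):
    """Grab and parse every <script type="application/ld+json"> block on a page."""
    def __init__(self):
        super().__init__()
        self._in_ld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld = True

    def handle_data(self, data):
        if self._in_ld and data.strip():
            self.blocks.append(json.loads(data))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld = False

page = '<script type="application/ld+json">{"@type": "Product", "name": "Blue T-Shirt", "offers": {"price": "19.99"}}</script>'
c = JsonLdCollector()
c.feed(page)
print(c.blocks[0]["offers"]["price"])  # 19.99
```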

To use this automation, input the website domain or a list of URLs you want to scrape with TexAu.

Document image

Once the extraction completes, download the scraped data as a CSV file in TexAu.

Finally, paste the JSON-LD file content into an online JSON-to-CSV converter.

Upon completion, download and open the CSV file. You will see the JSON-LD data in table form.
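An online JSON-to-CSV converter essentially flattens nested keys into columns. Here is a rough Python equivalent for simple, one-level-nested JSON-LD records; it is a sketch, not a general-purpose converter:

```python
import csv
import io

def json_ld_to_csv(records: list[dict]) -> str:
    """Flatten one level of nesting (e.g. offers.price) and emit CSV text."""
    flat = []
    for rec in records:
        row = {}
        for key, value in rec.items():
            if isinstance(value, dict):
                for sub, subval in value.items():
                    row[f"{key}.{sub}"] = subval
            else:
                row[key] = value
        flat.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=sorted({k for r in flat for k in r}))
    writer.writeheader()
    writer.writerows(flat)
    return out.getvalue()

data = [{"@type": "Product", "name": "Blue T-Shirt", "offers": {"price": "19.99"}}]
print(json_ld_to_csv(data))
```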

Document image







Check A Website Tech Stack Automatically

This automation detects the front-end technologies used by a website. For example, you can find the CMS or cart in use, the web framework, the tracking system, the CDN, the chat support system, and more.

This data is helpful if you want to find and target specific site owners using these technologies.

Ex: finding all the WordPress site owners in your area.
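Tech-stack detection is typically fingerprint matching against the page source. A toy Python sketch follows; the signatures are illustrative (real tools such as Wappalyzer ship thousands of them):

```python
# Hypothetical fingerprints mapping a source-code marker to a technology.
TECH_SIGNATURES = {
    "WordPress": "/wp-content/",
    "Shopify": "cdn.shopify.com",
    "Cloudflare": "/cdn-cgi/",
    "Intercom": "widget.intercom.io",
}

def detect_stack(html: str) -> list[str]:
    """Return the technologies whose marker appears in the page source."""
    return [tech for tech, marker in TECH_SIGNATURES.items() if marker in html]

sample = '<link href="https://example.com/wp-content/themes/shop/style.css">'
print(detect_stack(sample))  # ['WordPress']
```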

Document image


To run this automation, input the website whose technologies you want to identify.

Document image





Collect The Meta Tags Of Any Site

This automation extracts the meta description and meta keywords of any web page.

This automation is helpful for SEO applications, and it also gives information about a website's niche and industry.

Ex: finding and reaching out to owners of websites that don't rank on Google to sell them SEO services.
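Meta tag collection itself is a small parsing task. Here is a standard-library Python sketch of the kind of data this automation extracts; the sample head markup is hypothetical:

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect the content of <meta name="description"> and <meta name="keywords">."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = attrs.get("name", "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content", "")

head = '<meta name="description" content="Handmade leather goods"><meta name="keywords" content="leather, wallets">'
c = MetaCollector()
c.feed(head)
print(c.meta)
```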

Document image


To run this automation, input the target website URL in TexAu.

Document image







Updated 13 May 2022