How Do Spy Tools Actually Work?

ironbull · Jul 19, 2016

Hi,

I would like to know how spy tools actually work behind the scenes. I know it's a tricky question and not a lot of people know about this, but I hope I can get a good answer from someone in the business who is willing to share some information.

My guess is that the best ones make use of botnets, but I'm not really sure. It's not about IPs, user agents, browsers or anything like that; from what I have seen, they can mimic all of that. And it's impossible to do this manually, so there must be a lot of automation in the scraping process.

I mean, how can a spy tool:

- Visit a site/app that has an ad network on it
- Click around/refresh the screen to surface different ads/landers
- Detect the campaign URL
- Detect the lander URL
- Click around inside the lander
- Detect the click URL
- Detect the subsequent redirects to the offer

And do it daily across a lot of sites/apps? Plus scrape from different countries, from different carriers and from different devices.
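A rough sketch of only the last steps in that list (click URL, then redirects, then the offer), assuming a hypothetical `fetch` callable that returns the HTTP status and `Location` header for a URL. It follows HTTP 3xx redirects only; JavaScript and meta-refresh redirects, which landers also use, would need a real or headless browser. This is an illustration, not how any particular spy tool is known to work.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetch signature: url -> (HTTP status, Location header or None).
# A real crawler would wrap an HTTP client; injecting it keeps this testable offline.
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}

def trace_redirects(start_url: str, fetch: Fetch, max_hops: int = 10) -> List[str]:
    """Follow HTTP redirects from start_url, returning every URL visited in order."""
    chain = [start_url]
    url = start_url
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            break  # no further redirect: this is the final page (e.g. the offer)
        url = urljoin(url, location)  # Location headers may be relative
        if url in chain:
            break  # redirect loop; stop instead of spinning
        chain.append(url)
    return chain

# Usage with a fake, in-memory "web":
fake = {
    "https://ads.example/click?id=1": (302, "https://tracker.example/r/abc"),
    "https://tracker.example/r/abc": (301, "/offer"),
    "https://tracker.example/offer": (200, None),
}
print(trace_redirects("https://ads.example/click?id=1", lambda u: fake.get(u, (404, None))))
```

Recording the whole chain, rather than just the final URL, is what would let a tool report the click URL, the tracker hops and the offer separately.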

If someone has valuable information, feel free to send me a PM if you don't want to share it in public.

Thanks

T J Tutor · Jul 19, 2016

ironbull said:
I would like to know how spy tools actually work behind the scenes. [...]

They use bots that target banner ads and then save the data to a database. These companies hire some extremely talented developers to write the scripts the bots execute every time they spider a site, and it happens fast, lightning fast. The data points you outline above are only a fraction of the queries and qualifiers used to disassemble and reassemble the information they encounter and deliver to their clients. This is why the good ones are all very expensive: WhatRunsWhere is $329 a month and BoxOfAds runs from $247 to $599 a month.

ironbull · Jul 19, 2016

T J Tutor said:
They use bots that target banner ads and then save the data to a database. [...]

They are using something more than simple bots. Bots are not able to fake mobile carriers, for example. Also, they are using legit residential IPs, not cheap VPN/proxy IPs that are already flagged.

T J Tutor · Jul 20, 2016

ironbull said:
They are using something more than simple bots.

I agree that many bots are simple; however, bots that crawl and index are anything but simple. They are, in fact, fairly complex, Googlebot's spider being one example. Googlebot visits many sites in parallel, its "legs" spanning a large area of the "web" simultaneously. Spiders (bots) can crawl through and read all of a site's pages, follow the hypertext links in each page to their destinations, and identify the behavioral aspects of the sites.
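A minimal, single-threaded sketch of the "read every page and follow every link" idea, assuming a hypothetical `fetch_html` callable that returns a page's HTML (or `None` on failure). It leaves out everything that makes real spiders hard: parallel fetching, politeness and robots.txt, JavaScript rendering and deduplication at scale.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin

class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url: str, fetch_html: Callable[[str], Optional[str]],
          max_pages: int = 100) -> Dict[str, List[str]]:
    """Breadth-first crawl; returns a map of page URL -> absolute links found on it."""
    seen = {start_url}
    queue = deque([start_url])
    graph: Dict[str, List[str]] = {}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        html = fetch_html(url)
        if html is None:
            continue  # fetch failed or the response was not HTML
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        # Resolve relative links and drop #fragments so the same page isn't queued twice.
        links = [urldefrag(urljoin(url, href)).url for href in parser.links]
        graph[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return graph
```

The breadth-first queue plus a `seen` set is the standard way to visit each page once; a production crawler would run many of these fetches concurrently.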

These types of bots often work together in what is called a "client/server" design. I have credentials in this area of software development and deployment. I can easily build a set of routines that crawl a network and report its primary code architectures, and then have that routine (bot) report to specialized subordinate bots that catalog their findings in a database.

Bots can issue commands, respond to the results of those commands, and request and/or respond to database queries.
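A small sketch of that coordinator/worker split, using a thread pool for the "subordinate bots" and an in-memory SQLite table as the catalog. The `analyze` task and its "/lp/" rule for spotting landers are hypothetical placeholders; a real worker would fetch and parse the page.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple

def analyze(url: str) -> Tuple[str, str]:
    """Hypothetical worker task: classify a URL. The '/lp/' rule is only a placeholder."""
    kind = "lander" if "/lp/" in url else "page"
    return url, kind

def run(urls: Iterable[str], worker: Callable[[str], Tuple[str, str]],
        workers: int = 4) -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE findings (url TEXT PRIMARY KEY, kind TEXT)")
    # The coordinator hands URLs to a pool of worker threads...
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(worker, urls))
    # ...then catalogs their reports in one place, so only one thread touches the DB.
    db.executemany("INSERT OR REPLACE INTO findings VALUES (?, ?)", results)
    db.commit()
    return db

db = run(["https://x.example/lp/1", "https://x.example/about"], analyze)
print(db.execute("SELECT url, kind FROM findings ORDER BY url").fetchall())
```

Keeping database writes on the coordinator side avoids sharing one SQLite connection across threads; at larger scale the workers would more likely be separate machines reporting over a queue.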

Now, I haven't built any software for spying on ads or carriers and such; I stated that bots are part of the solution because this is what I hear from fellow developers. According to my source, scrapers today are all considered bots. He may be wrong, but given that he works in a university computer science department as an adjunct professor, I tend to believe him.

We do have a couple of spy tool reps here, and I would be very interested in getting more accurate answers to your questions. Maybe we can get @johnnygood from BoxOfAds and @davidkelly from WhatRunsWhere to lay some knowledge on us.

ironbull said:
Bots are not able to fake mobile carriers, for example. Also, they are using legit residential IPs, not cheap VPN/proxy IPs that are already flagged.

I'm not sure why you are mentioning "fake mobile carriers". Your question was "how do spy tools actually work".

ironbull · Jul 20, 2016

T J Tutor said:
I'm not sure why you are mentioning "fake mobile carriers". Your question was "how do spy tools actually work".

Well, because I know some spy tools offer carrier filters, and that's hard to mimic with code alone.

Maybe @AdsXposed and @MobileadMonitor could add something here as well.

MxyzptlkFishStix · Jul 20, 2016

Perl or Python Programmer + SoC + 3G Modem + Carrier Data Plan

ironbull · Jul 21, 2016

mxyzptlkfishstiks said:
Perl or Python Programmer + SoC + 3G Modem + Carrier Data Plan

Do you think they have a carrier data plan for every carrier in the world? Plus, they have several IPs per carrier.

T J Tutor · Jul 21, 2016

mxyzptlkfishstiks said:
Perl or Python Programmer + SoC + 3G Modem + Carrier Data Plan

Well, I guess I can see an implementation in Perl or Python, though in my opinion Perl is a bit sloppy for this type of use. Are you talking Perl 5 or 6? Two different animals, I think. Python, on the other hand, I can see being more adaptable for writing this kind of code. I'm not sure what SoC has to do with it; it's chip technology. Unless maybe the code is able to seek out and identify devices based on their individual chip signatures.

MxyzptlkFishStix · Jul 21, 2016

ironbull said:
Do you think they have a carrier data plan for every carrier in the world? Plus, they have several IPs per carrier.

No. They probably use Pareto for any geo they are targeting.

ironbull · Jul 21, 2016

mxyzptlkfishstiks said:
No. They probably use Pareto for any geo they are targeting.

What's Pareto? Any useful link? It's a really generic word.
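If "Pareto" here means the Pareto principle (the 80/20 rule), one plausible reading is: per geo, pay for data plans only on the few carriers that carry most of the traffic rather than on every carrier. That interpretation is an assumption, and the market shares below are hypothetical placeholders, not real data.

```python
from typing import Dict, List

def pareto_carriers(shares: Dict[str, float], target: float = 0.8) -> List[str]:
    """Smallest set of carriers (largest share first) whose combined share reaches target."""
    chosen: List[str] = []
    covered = 0.0
    for carrier, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= target:
            break
        chosen.append(carrier)
        covered += share
    return chosen

# Hypothetical shares for one geo:
print(pareto_carriers({"CarrierA": 0.50, "CarrierB": 0.25, "CarrierC": 0.15, "CarrierD": 0.10}))
```

Taking the largest carriers first is what keeps the set minimal: no smaller group of carriers can reach the same coverage.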

MxyzptlkFishStix · Jul 21, 2016

T J Tutor said:
I'm not sure what SoC has to do with it; it's chip technology. [...]

Sorry, I meant to say SBC (single board computer).

ironbull · Jul 23, 2016

It seems like a secret topic, since no one from the existing tools is replying.

T J Tutor · Jul 23, 2016

Yeah, I can't say I am surprised. The questions we've posed are about code and code architecture, and everyone in development circles likes to stay tight-lipped about these types of questions. Many, if not all, are likely under non-disclosure agreements preventing them from being forthcoming. Still, one would think someone outside those environments would have more detailed answers for us.

MxyzptlkFishStix · Jul 23, 2016

Well, none of the spy tools work against the more savvy guys out there, whose last line of defense is rDNS lookups, OS+MTU/MSS+User-Agent detection and fingerprinting.
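A sketch of the rDNS part only: reverse-resolve the visitor's IP and flag hostnames under a watchlist of cloud/hosting suffixes. The watchlist here is a hypothetical example (EC2 reverse names do end in `amazonaws.com`); the resolver is injectable so the logic can be tested offline. A missing PTR record is treated as inconclusive, since many residential IPs lack one too.

```python
import socket
from typing import Callable, Iterable, Optional

# Hypothetical watchlist of hostname suffixes associated with cloud/hosting providers.
DATACENTER_SUFFIXES = ("amazonaws.com",)

def reverse_dns(ip: str) -> Optional[str]:
    """Return the PTR hostname for ip, or None if there is no reverse record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None

def looks_like_datacenter(ip: str,
                          resolver: Callable[[str], Optional[str]] = reverse_dns,
                          suffixes: Iterable[str] = DATACENTER_SUFFIXES) -> bool:
    host = resolver(ip)
    if host is None:
        return False  # no PTR record: inconclusive on its own
    host = host.rstrip(".").lower()
    return any(host == s or host.endswith("." + s) for s in suffixes)
```

Matching on "." plus the suffix avoids false hits such as a hostname merely ending in the same letters.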

T J Tutor · Jul 23, 2016

mxyzptlkfishstiks said:
Well, none of the spy tools work against the more savvy guys out there, whose last line of defense is rDNS lookups, OS+MTU/MSS+User-Agent detection and fingerprinting.

This is interesting. Do these "savvy guys" develop their own custom scripts, or is there an automated tool or script that can be bought? An open source version, maybe?

MxyzptlkFishStix · Jul 24, 2016

T J Tutor said:
This is interesting. Do these "savvy guys" develop their own custom scripts, or is there an automated tool or script that can be bought? An open source version, maybe?

It's an open source C program that I will not reveal the name of.

However, I've gone a step further. I paid a Linux kernel developer to convert the code from a userspace program into a kernel patch. Whereas the userspace daemon needed around 30ms on average to identify spy tool nodes, the patch now does it in under 7ms. I keep the unique fingerprint details of the nodes in a database and also drop them on the fly with iptables/ip6tables.

Most of these nodes are on Amazon EC2 instances, which wouldn't be a problem if they didn't stick out like a sore thumb: the spoofed browser user agent doesn't match the TCP stack.

Example:

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/13.

Detected OS TCP Stack: Linux 4.4.X kernel
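The poster doesn't name the program, so this is not it; it is a deliberately simplified sketch of the same kind of check using one passive signal, the IP TTL. Linux, Android, macOS and iOS typically send packets with an initial TTL of 64 and Windows with 128, and each router hop decrements it, so the observed value is rounded up to the nearest of those. Real fingerprinting combines more signals (TCP window size, MSS, option order) and is far more reliable. How the server obtains the observed TTL (packet capture, kernel hook) is out of scope here.

```python
from typing import Optional

def os_family_from_ua(user_agent: str) -> Optional[str]:
    """Very coarse OS family claimed by a User-Agent string."""
    ua = user_agent.lower()
    if "windows nt" in ua:
        return "windows"
    if any(token in ua for token in ("android", "linux", "mac os x", "iphone")):
        return "unix-like"
    return None

def os_family_from_ttl(observed_ttl: int) -> Optional[str]:
    """Guess the sender's OS family by rounding the observed TTL up to a common initial value."""
    if 0 < observed_ttl <= 64:
        return "unix-like"  # initial TTL 64: Linux, Android, macOS, iOS
    if 64 < observed_ttl <= 128:
        return "windows"    # initial TTL 128: Windows
    return None             # e.g. initial TTL 255; not classified in this sketch

def ua_mismatch(user_agent: str, observed_ttl: int) -> bool:
    """True when the User-Agent's OS disagrees with what the TCP/IP stack suggests."""
    claimed = os_family_from_ua(user_agent)
    observed = os_family_from_ttl(observed_ttl)
    return claimed is not None and observed is not None and claimed != observed

edge_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
           "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246")
print(ua_mismatch(edge_ua, 52))  # Windows UA, but TTL suggests a Linux stack
```

A TTL of 52 implies an initial 64, which is what the Linux 4.4 stack in the example above would send, so the "Windows" User-Agent gets flagged.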

-----

Here's a lazy solution for the spy tools that reside on Amazon instances: take the IP address ranges found here and drop them with iptables.
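A sketch of turning a list of CIDR ranges into DROP rules and checking whether a given IP is covered. The linked range list isn't reproduced here, so the prefixes below are placeholder documentation ranges, not Amazon's. The script only prints commands; it does not run them. For thousands of ranges, a single ipset match is usually more efficient than one rule per range.

```python
import ipaddress
from typing import Iterable, List

def build_drop_rules(cidrs: Iterable[str]) -> List[str]:
    """Turn CIDR strings into iptables/ip6tables DROP commands (returned, not executed)."""
    rules = []
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        tool = "iptables" if net.version == 4 else "ip6tables"
        rules.append(f"{tool} -A INPUT -s {net} -j DROP")
    return rules

def is_listed(ip: str, cidrs: Iterable[str]) -> bool:
    """True if ip falls inside any of the ranges (IPv4/IPv6 mismatches never match)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c, strict=False) for c in cidrs)

# Placeholder ranges (RFC 5737 / RFC 3849 documentation prefixes):
ranges = ["203.0.113.0/24", "2001:db8::/32"]
for rule in build_drop_rules(ranges):
    print(rule)
```

Picking `iptables` or `ip6tables` from the parsed network's version matches the poster's note that both are needed.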

Dr. Forum · Jul 24, 2016

Spy tools are always applications developed to do data mining and crawling in the background. They are usually built to crawl certain webpages in order to find the relevant information. This is made possible by the data mining algorithms available. If you are interested in what goes on behind the scenes of spy tools, you can find some tutorials on YouTube. I will get a few links for you and post them in this thread. All the best, buddy.

tyoussef · Jul 24, 2016

All that is true, and we also can't deny that some advertising networks sell their own data to others.

T J Tutor · Jul 24, 2016

mxyzptlkfishstiks said:
It's an open source C program that I will not reveal the name of.

Now that's something to pursue.

ironbull · Jul 25, 2016

Dr. Forum said:
I will get a few links for you and post them in this thread. [...]

Looking forward to those links.

The Most Active and Friendliest Affiliate Marketing Community Online!