[discussion] Content Filtering

Added by ask low 9 months ago

I wish I could get the best of both worlds, as you said (#182#note-36). But it's not worth it at all.

I was just pondering your comment #182#note-34 and wondering how that extra information in the URL gets treated as an advert.
DNS filtering can't block URLs based on the path or subpage, because every host list entry has to be a domain or IP at the very least. That's why it isn't super effective or invasive, and why Linus Sebastian was able to promote Pi-hole even though he's against adblocking.
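
To illustrate the limitation described above, here is a minimal sketch of a host-list check. The blocklist entries and the function name `dns_filter_allows` are hypothetical, and real DNS filters work on lookups rather than full URLs; the point is only that the path never enters the decision.

```python
from urllib.parse import urlsplit

# Hypothetical host list; each entry is a bare domain, never a path.
BLOCKED_HOSTS = {"ads.example.net", "tracker.example.org"}


def dns_filter_allows(url: str) -> bool:
    """Return True if a host-list filter would let this request through.

    Only the hostname is compared against the list, so two URLs on the
    same host always get the same answer regardless of path or query.
    """
    host = urlsplit(url).hostname or ""
    return host not in BLOCKED_HOSTS


# Same host, different paths: the filter cannot tell them apart.
print(dns_filter_allows("https://news.example.com/article"))
print(dns_filter_allows("https://news.example.com/ads/banner.js"))
# A dedicated third-party ad host is the only thing it can block.
print(dns_filter_allows("https://ads.example.net/banner.js"))
```

An ad served from a path on the site's own domain passes exactly like the article does, which is why this kind of filtering only catches third-party hosts.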

I've understood the pattern of adverts, and it turns out that the marketing industry stopped running their own domains and started sponsoring mainstream journalists/media/content creators, so that the adverts get built into the content itself. As an example, this is how YouTube works in general (sponsored segments).

DNS/host filtering is able to block third-party requests, so it is somewhat usable for HTML-based content blocking. But it's not worth much on sites that integrate ads within their own JS and CSS. That's how cosmetic filtering extensions like uBO, Adblock Plus, AdGuard, etc. emerged.

Replies (12)

RE: [discussion] Content Filtering - Added by Soren Stoutner 9 months ago

From an ad filtering perspective, I can see three broad categories. 1) Ads that can be filtered by domain. 2) Ads that can be filtered by other aspects of the URL. 3) Ads that can be filtered by modifying the page contents. Currently YouTube ads can only be filtered by this third method, and they are starting to redesign their system to prevent even that.
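
As a rough sketch of those three categories (not how Privacy Browser or any particular blocker implements them), the first two can be modeled as checks on the request URL and the third as an edit to the page markup. All rule values and function names below are made up for illustration, and the markup is assumed to be well-formed so the standard XML parser can read it.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

BLOCKED_DOMAINS = {"ads.example.net"}          # category 1 (hypothetical rule)
BLOCKED_URL_PATTERNS = [re.compile(r"/ads/")]  # category 2 (hypothetical rule)
HIDDEN_CLASSES = {"sponsored"}                 # category 3 (hypothetical rule)


def request_allowed(url: str) -> bool:
    """Categories 1 and 2: decide from the URL alone, before loading."""
    host = urlsplit(url).hostname or ""
    if host in BLOCKED_DOMAINS:
        return False
    return not any(p.search(url) for p in BLOCKED_URL_PATTERNS)


def strip_elements(markup: str) -> str:
    """Category 3: remove elements from the already loaded page."""
    root = ET.fromstring(markup)
    # Snapshot the tree first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            classes = set(child.get("class", "").split())
            if classes & HIDDEN_CLASSES:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


page = '<div><p>News</p><p class="sponsored">Buy now</p></div>'
print(request_allowed("https://news.example.com/ads/banner.js"))
print(strip_elements(page))
```

Note that the third function only works once the content has already been downloaded, which is why it is the last resort when the URL gives nothing to match on.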

Note that in each case, if the webpage really wants to, they can bypass all three types of filtering. For type 1, they can serve the ad from their own domain instead of a third party. For type 2, they can either use the same URL attributes to populate the ad as they use for important content on the page, or they can use random attributes for both. For type 3, they can use the same page attributes to display ads as important content, or they can use random values for both.

Something similar can be said about tracking features. Sometimes they can be filtered by domain. Sometimes they can be filtered by other aspects of the URL. Rarely would modifying the page DOM contents block tracking, as tracking features rarely display visual elements that wouldn't be removed by filtering for type 1 or 2. Websites can bypass all three of these by embedding tracking JavaScript into the same JavaScript file that serves important content. Currently, most filtering doesn't remove individual elements from a JavaScript file, although Feature #270: Fine grained JavaScript controls plans to be able to do so.

In the case of fighting tracking, disabling JavaScript is particularly powerful, as most of the advanced tracking techniques require running code on the user's device. Most advertising also currently depends on JavaScript, but that is more of a convenience than a necessity. Websites could ship most ads in HTML and CSS if they wanted to.

RE: [discussion] Content Filtering - Added by ask low 9 months ago

I guess in terms of article based sites, such as news, informative and journalistic media, content blocking is already a lot more effective, even if we consider category 1. I'm already satisfied with DNS based results regarding this.

For categories 2 & 3, of course. Marketing engineers are surely always one step ahead in keeping their business running with the majority userbase (who mostly use Google Chrome as the default browser). Hence, I believe that they typically don't even care about the revenue lost to the small portion of geeky users who evade their tactics.

As for YouTube, they fail in whatever they do to prevent anything that violates their ToS, as long as their API is public. (Invidious?)

In the fight over tracking and the privacy/anonymity race, we've already lost the game, whether JS is executed or not. Most tracking and ID leaks happen at the ISP level. That's where the whole cat-and-mouse chase of VPN/Tor/I2P proxies comes in, and I don't even want to dabble in those.

I believe in either ALL IN or ALL OUT. Either you reap the benefits of internet services with personalized ads, or you don't use the services at all. Staying somewhere in the middle doesn't help. And I'm mostly all out.

RE: [discussion] Content Filtering - Added by Soren Stoutner 9 months ago

Regarding ISP level tracking, it isn't as complete as most people think, because an ISP has a hard time identifying all of a user's browsing history, for two reasons.

1. Most people don't use the same ISP all day long. You might use one ISP at home, another at work, another at a restaurant you visit, and another that provides cell phone data when you are away from Wi-Fi.

2. Most often you share a single IP address with a number of users. For example, when you are at work the IP address your ISP sees will likely be the same as that of all the other workers.

Using a VPN actually makes this worse, because the VPN gets all of your traffic even when you switch ISPs. Being able to sell your complete browsing history is a lot more valuable than only being able to sell a portion of it. And despite whatever fake promises a VPN provider might make on their website, you know they are all spying on you and selling your data. In fact, you are paying them to spy on you and sell your browsing history. They are laughing all the way to the bank.

But JavaScript is even more invasive than this. Large internet companies, like Google and Facebook, have specifically designed their systems to track individual users across devices and IP addresses. They can do this because they have tricked a large portion of the internet into embedding Google's and Facebook's code into their websites through things like ads or analytics. This goes far beyond what an ISP can ever collect. Not only can they identify users across all their devices and all the ISPs they use, but they can even differentiate different people who use the same computer. All of this because of the power of JavaScript.

RE: [discussion] Content Filtering - Added by ask low 9 months ago

Totally agree with the VPN argument. Although I believe setting up your own VPS is a whole other thing, one of the more advanced DIY projects one can do.

I don't understand this scripting concept at all. I've heard there are many JS successors that may pan out in the future, such as TypeScript, WebAssembly, etc., though I'm not at all sure it'll happen. Time and fate decide these things.

Btw identity personalisation is not a crime, because they (large companies) literally show us their Terms of Agreement about the data collection before we install their product. And we literally hit the Agree button, just TL;DR lmao.

I hope content filtering stays alive throughout the future web breakthroughs.

RE: [discussion] Content Filtering - Added by Soren Stoutner 9 months ago

Large companies almost always do things with your data that you would be very angry about if you knew what was happening. And, despite what their lawyers argue, these bad things are almost always in conflict with a plain reading of their terms of agreement. They do this because they know they will almost never be held accountable.

There will be no privacy on the internet until web browsers stop downloading arbitrary programs (scripts) from the internet and running them on users' devices. One of the odd things about the cell phone world is we have become desensitized to the idea of being able to download untrusted code to our devices (random apps) and have the expectation that somehow the app store or our devices will be able to protect us from them. This is a fundamental impossibility. In the old days, would you have run some random program off a floppy disk someone mailed you? If you run untrusted code on your device it will inevitably be compromised.

Web browsers fall for this same problem. We need to get back to the world where we only download untrusted data (like HTML and CSS) from the internet, and all the code that runs on our devices is trusted. Disabling JavaScript by default is a good first step along this path.

RE: [discussion] Content Filtering - Added by ask low 9 months ago

Disabling JavaScript (which has been the web's content delivery mechanism for almost 2 decades) is probably a good first step for now. But I wonder whether that will still be the case in the future (or even now, because 80% of the internet is broken without it).

As you said, the large companies have always been dictating how the world should run, since the beginning of the software boom. Normie people are just herds who don't learn new things, aren't aware of anything at all, and keep holding each other's tails while they wait for a domino effect.

If the data that we receive from the internet is untrusted, or at least the adverts to be precise, then the elephant in the room is: who decides what data is trustworthy and what isn't?

RE: [discussion] Content Filtering - Added by Soren Stoutner 9 months ago

One of my purposes with Privacy Browser is to get enough usage (about 20% of all web traffic) that web developers start redesigning their websites to work without JavaScript.

How do you personally decide what is trusted code? For me, the code needs to be open source. It needs to have been reviewed by at least one independent entity, like F-Droid or Debian. Often other parties run automated scans on all the programs in F-Droid's or Debian's repositories, catching even further problems. It is still possible for bad things to slip through all of this, but this is a minimum for what I would start to consider to be trusted code.

The important distinction between data and code is that untrusted data can't usually do you any harm if the code that processes it is trusted and free of vulnerabilities. So, you can safely open a document someone sends you as long as that document doesn't include code, or data that somehow exploits a vulnerability in the program you use to open it.

RE: [discussion] Content Filtering - Added by ask low 9 months ago

If that's the path you're going down, then you're definitely successful in your work, I'd say. You're following one principle and you're doing it well. That's literally the UNIX philosophy.

I do have a positive attitude towards the current 20% web traffic, because that's where I fall in.

But the problem in Content Filtering is the CONTENT which is not in our hands. Unfortunately, not all code can be open source. As long as Giant Companies control mainstream code, the content control is in their hands. Their philosophy is all about centralization.

Btw, I don't think focusing on disabling JS will be of much benefit. The roughly 20% of traffic that we focus on is mostly non-invasive. To up our strategy, shouldn't we actually focus on securing the other 80% of traffic, the JS part?

RE: [discussion] Content Filtering - Added by Soren Stoutner 9 months ago

If Privacy Browser gains a 20% market share, 80% of all internet websites will reprogram themselves to work without JavaScript. Sometimes it takes my breath away to consider how daunting the challenge is of changing the way the entire internet thinks and functions. But I see no other path that can lead to good results.

Note that there is a very small amount of trusted JavaScript on the internet. You can decide for yourself if JavaScript is trusted based on who wrote it and if they provide the source (non-minified) for it. That is the entire concept of Domain Settings, which lets you enable JavaScript for sites you trust (or which you desire to use so badly that you will take the risk with untrusted JavaScript). But trusted JavaScript will never be the norm. Trying to find safe ways of running untrusted JavaScript is a fool's errand that will never produce good results. You can chase that whack-a-mole game for all time and never come out ahead.

RE: [discussion] Content Filtering - Added by ask low 9 months ago

Realistically speaking... PB or any PB-like project (even Mozilla) cannot achieve a 20% market share.

But the effort that we put into projects like PB will definitely have at least some ripple effect and keep the door open for opposing minds. That's what matters and that's all we can do ;P

Does #270 come under the content filtering idea? Or is it a completely different strategy?

RE: [discussion] Content Filtering - Added by Soren Stoutner 9 months ago

Time will tell. I think that there will be future events when the general population realizes how much their lives are negatively impacted by web tracking and suddenly starts caring a whole lot more about internet privacy.

Remember when Eric Schmidt claimed that the expectation of privacy on the internet was dead?

It was only a few short years later that everyone figured out that wasn't going to fly with people. And suddenly, everyone became a "privacy company".

Right now there is a lot of money to be made in pretending to sell privacy. In the future, I think there will be a lot of money to be made in selling actual privacy. Once that happens, I think that Privacy Browser, or something like it, could gain a 20% market share.

Feature #270: Fine grained JavaScript controls is much more fine grained than what most people think of with content filtering. But it is a natural extension of what the AdBlock syntax currently does. Instead of just loading some parts of a website and not others, it is basically loading just some commands in a JavaScript file and not others.
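
Purely as an illustration of the idea of loading some commands in a JavaScript file and not others, and not a description of how Feature #270 is or will be designed, a very naive version could drop individual lines of a script that match a blocked call. The rule list and the function name `filter_statements` are hypothetical, and matching line by line with a regular expression is a simplification; a real implementation would have to parse the script properly.

```python
import re

# Hypothetical rule: drop any line that calls sendBeacon(...).
BLOCKED_CALLS = [re.compile(r"\bsendBeacon\s*\(")]


def filter_statements(source: str) -> str:
    """Return the script with every line matching a blocked call removed.

    This treats one line as one command, which real JavaScript does not
    guarantee; it is only meant to show the shape of the idea.
    """
    kept = []
    for line in source.splitlines():
        if any(pattern.search(line) for pattern in BLOCKED_CALLS):
            continue
        kept.append(line)
    return "\n".join(kept)


script = "\n".join([
    "renderArticle();",
    'navigator.sendBeacon("/t", data);',
    "showComments();",
])
print(filter_statements(script))
```

Here the page's own commands survive while the tracking call in the same file is skipped, which is the difference from filtering that can only allow or block the whole file.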

RE: [discussion] Content Filtering - Added by ask low 5 months ago

CISA recommends adblocking lmao.

"Deploy adblocking software"