Feature #182

open

Allow users to manually or automatically update the filter lists

Added by Soren Stoutner over 6 years ago. Updated 5 months ago.

Status: New
Priority: 4.x
Start date: 08/18/2017
Due date:
% Done: 0%
Estimated time:

Description

In the Filter Lists Activity, display the last update timestamp of the filter list (which will be updated every so often with new releases).

Allow the users to manually update the list, which will pull the current version from EasyList.

Allow the users to set updates to happen automatically, like once a day or once a week. This will run an asynchronous task when Privacy Browser starts.
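As a minimal sketch of the scheduling check described above (not Privacy Browser's actual code), the startup task could compare the stored last-update timestamp against the interval the user selected. The function name `is_update_due` and the interval keys are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical interval choices; the names are illustrative only.
UPDATE_INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def is_update_due(last_update: Optional[datetime], interval: str, now: datetime) -> bool:
    """Return True when the filter lists should be refreshed."""
    if last_update is None:
        # The lists have never been updated since installation.
        return True
    return now - last_update >= UPDATE_INTERVALS[interval]
```

Passing `now` in explicitly keeps the check easy to test; a real implementation would read the timestamp from persistent storage and run the download off the main thread.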

Actions #1

Updated by Soren Stoutner almost 6 years ago

  • Subject changed from All users to manually or automatically update the ad block list to Allow users to manually or automatically update the ad block list
Actions #2

Updated by Soren Stoutner over 5 years ago

  • Subject changed from Allow users to manually or automatically update the ad block list to Allow users to manually or automatically update the blocklists
  • Description updated (diff)
Actions #3

Updated by Soren Stoutner 11 months ago

  • Subject changed from Allow users to manually or automatically update the blocklists to Allow users to manually or automatically update the filter lists
Actions #4

Updated by Soren Stoutner 11 months ago

  • Description updated (diff)
Actions #5

Updated by ask low 7 months ago

I think shipping the lists with the browser is a bad idea. One way around this is to download all the lists on the first launch of the application from the EasyList and Fanboy list websites, as they're just text files. Typically, these lists are stored as data files in either the user config directory or the data directory, and they are updated on a fixed schedule, usually every 4 days.

Actions #6

Updated by Soren Stoutner 7 months ago

Yes, but it ends up being a much more complicated procedure than might be initially apparent.

For example, you need to pause all resource requests while new lists are being loaded. Otherwise, traffic that should be blocked by one of the lists could periodically pass through, which would be an unacceptable experience.
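One way to picture the pausing requirement is a gate that resource checks must pass through, which closes while a reload is in progress. This is only a sketch under assumptions: the class name `FilterListGate` is hypothetical, entries are matched as plain substrings for brevity, and blocking on timeout ("fail closed") is a design choice made here, not something stated in this thread.

```python
import threading


class FilterListGate:
    """Holds the active filter entries and pauses lookups during a reload."""

    def __init__(self, entries):
        self._entries = list(entries)
        self._ready = threading.Event()
        self._ready.set()  # The lists are usable at startup.

    def reload(self, load_entries):
        # Close the gate so no request is checked against a half-loaded list.
        self._ready.clear()
        try:
            self._entries = list(load_entries())
        finally:
            # Reopen the gate even if loading fails; the previous list stays active.
            self._ready.set()

    def should_block(self, url, timeout=None):
        # Wait until any reload in progress has finished.
        if not self._ready.wait(timeout):
            return True  # Fail closed: block if the lists are still not ready.
        return any(entry in url for entry in self._entries)
```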

You need to design a recovery mechanism that can detect lists being only partially downloaded or parsed if the process gets interrupted by the Android Activity Lifecycle.

https://developer.android.com/guide/components/activities/activity-lifecycle
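A common pattern for surviving an interrupted download is to write the new list to a temporary file and swap it into place only after it has been verified. The sketch below assumes the expected size is known in advance (for example from the server's response headers, which is an assumption); the function name `install_downloaded_list` is hypothetical.

```python
import os
import tempfile


def install_downloaded_list(data: bytes, expected_size: int, destination: str) -> bool:
    """Install a downloaded list only if it is complete; return True if installed."""
    if len(data) != expected_size:
        # The download was cut short; leave the existing list untouched.
        return False
    directory = os.path.dirname(os.path.abspath(destination))
    # Write to a temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(data)
            temp_file.flush()
            os.fsync(temp_file.fileno())
        # os.replace swaps the file in one step, so a reader sees either the
        # old list or the new list, never a partial one.
        os.replace(temp_path, destination)
        return True
    except OSError:
        # Clean up the temporary file and report the failure to the caller.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

If the process is killed before `os.replace` runs, the previous list is still intact and only a stray temporary file is left behind.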

You need to handle problems with entries in the lists that are either malformed for Privacy Browser's implementation or written in such a way that they cause Privacy Browser to crash. A few years ago there was an entry in one of the lists that caused the browser to crash. Because I test and modify the lists before shipping them, I was able to remove the problematic entry (it was eventually removed upstream). But if the browser is fetching the lists itself, users could end up in a scenario where the only way to fix the crashing browser is to remove all their settings, which can lead to a lot of user dissatisfaction. For a bit of context on this, I would recommend you read the following links.

https://www.stoutner.com/privacy-browser-android/filter-lists/
https://www.stoutner.com/privacy-browser-android/filter-lists/easylist/
https://www.stoutner.com/privacy-browser-android/filter-lists/easyprivacy/
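To illustrate the malformed-entry problem, here is a rough sketch of one possible mitigation: parse each entry in isolation and fall back to the bundled list when a downloaded list contains entries that fail. The names `parse_entries`, `choose_list`, and `max_rejected` are hypothetical, and this only covers failures that surface as exceptions; a crash inside native code would need a different safeguard.

```python
def parse_entries(lines, parse_entry):
    """Parse each line on its own so one bad entry cannot abort the whole list."""
    parsed, rejected = [], []
    for line in lines:
        try:
            parsed.append(parse_entry(line))
        except Exception:
            rejected.append(line)
    return parsed, rejected


def choose_list(downloaded_lines, bundled_lines, parse_entry, max_rejected=0):
    """Use the downloaded list unless too many entries fail; otherwise fall back."""
    parsed, rejected = parse_entries(downloaded_lines, parse_entry)
    if len(rejected) > max_rejected:
        # Fall back to the list that shipped with the app and was tested beforehand.
        fallback, _ = parse_entries(bundled_lines, parse_entry)
        return fallback, "bundled"
    return parsed, "downloaded"
```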

All of these are difficulties that can be designed around, but it does make this a very difficult and time-consuming feature to implement.

Actions #7

Updated by ask low 7 months ago

I know a Chinese browser that keeps the list files in user directory (Internal-Storage/Android/data/<package>) and it replaces those existing lists in a specific period of time. The replacement itself doesn't cause the blocklists to leak out, since the updation happens immediately.

If there are malformed lists, make sure that you update the lists at a time when the upstream fixes are already in place.

Or, instead of removing those entries, it might be better to think about why it happens in the first place, because other browsers might not have behaved the same way.

Actions #8

Updated by Soren Stoutner 7 months ago

This is a planned feature and it will be implemented at the appropriate time.

"since the updation happens immediately"

The above statement makes me think that you don't understand how computers work.

Actions #9

Updated by ask low 7 months ago

I think you misunderstood me. "The updation happens immediately" I meant that the lists that are kept local, can be replaced by the new pair, at the time when we aren't browsing.

I know that this is just a discussion of the idea, so I don't think anyone rushing the implementation is gonna help us. At the end of the day, it's you, sir, who'll make the decisions, according to what's possible for you ;P

I do know how computers work but only hardware, just not the coding & software engineering. As a FOSS friendly user, I think I should only focus on the proper usage & debugging. I do daily drive arch on i3wm/sway, not to geek out, but only to be productive & for the sake of minimalism. So I have very little knowledge & time to focus on the inner workings though.

Actions #10

Updated by Soren Stoutner 7 months ago

For reference, the second link in comment 6 above has some benchmark data about how long it takes to read the filter lists from storage and parse them into the array lists in memory.

"Android stores all assets (like the filter list data) in compressed files in the APK. On a Pixel 5, decompressing and parsing the filter lists when Privacy Browser starts takes about half a second. Checking a resource URL against the lists takes about 20-50 milliseconds."

Actions #11

Updated by ask low 7 months ago

Isn't it possible to store the lists outside the application package ? I've used some other browsers that download lists directly into the userdata & then parse through them. But most of those are closed source btw.

Actions #12

Updated by Soren Stoutner 7 months ago

Yes. Anything downloaded after the APK is installed could be stored in the app's private data folder. In the case of Privacy Browser that is something like /data/user/0/com.stoutner.privacybrowser.standard. However, parsing the lists from that location will take about the same amount of time as parsing them from the APK.

This is mostly because the way the lists are stored in text files is substantially different than how they are stored in nested array lists in RAM, and parsing them is very CPU-intensive. In my design, I have worked really hard to minimize the CPU hit, with a particular focus on minimizing how long it takes to check an individual URL against the lists. Most websites will load dozens or even hundreds of resource requests for a single page. I feel pretty happy that I was able to get things down to 20-50 milliseconds per check. I wish I could get the parsing of the lists down to less than 500 milliseconds, but so far that is the best I have been able to do.

For some reference, this is the code that parses the filter lists.

https://gitweb.stoutner.com/?p=PrivacyBrowserAndroid.git;a=blob;f=app/src/main/java/com/stoutner/privacybrowser/helpers/ParseFilterListHelper.kt;hb=HEAD
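For readers who want a feel for the text-to-memory conversion without reading the linked helper, the following is a heavily simplified sketch and not the code linked above. It assumes entries are plain URL patterns with `*` wildcards, and it treats options, anchors, exceptions, and element-hiding rules as ordinary text rather than interpreting them. The function names are hypothetical.

```python
def parse_filter_list(text):
    """Convert filter list text into a list of entries, each a list of substrings."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments ("!"), and the "[Adblock Plus ...]" header.
        if not line or line.startswith("!") or line.startswith("["):
            continue
        # Split on wildcards; each piece must appear in the URL in order.
        pieces = [piece for piece in line.split("*") if piece]
        if pieces:
            entries.append(pieces)
    return entries


def url_matches(url, entries):
    """Return True if every piece of some entry occurs in the URL in order."""
    for pieces in entries:
        position = 0
        for piece in pieces:
            index = url.find(piece, position)
            if index == -1:
                break
            position = index + len(piece)
        else:
            # The inner loop finished without a break, so all pieces matched.
            return True
    return False
```

Doing the splitting once at startup means each URL check is only a series of substring searches, which is the kind of trade-off between parse time and per-request cost discussed above.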

Actions #13

Updated by ask low 7 months ago

Nice. That's already sorted, but the focus of this issue is to allow users to manually update the existing filter lists. To achieve that, there should be an interface within the browser, or an automated background task that can trigger list updates at a specific interval and under specific conditions. That can be challenging, as you said, because there are a lot of tasks to manage, like properly downloading the lists and updating them periodically without interrupting the user.

I think the best way is manual updating by users: provide a button to update the local lists, with a notice beneath it indicating whether the lists are up to date.

Right now, users have no choice but to wait until the next browser update. It's at least better than nothing isn't it...

Actions #14

Updated by Soren Stoutner 7 months ago

Actually, the most difficult issues are the other ones I mentioned in comment 6. Pausing the loading of all resource requests while new filter lists are being loaded is by far the easiest of all the issues to address.

One of my core design philosophies is that I am not going to intentionally release a half-baked feature that has the potential to cause massive frustration for users.

As such, this feature will be released when it is ready. But it will require significant effort to make it work correctly and there are many other things that are more important and hence earlier in line.

Actions #15

Updated by ask low 7 months ago

Wise decision. Is there any beta branch you're maintaining? I'd love to participate if it exists !

Actions #16

Updated by Soren Stoutner 7 months ago

I don't currently have a beta branch, but that is something I might do in the future.

If you want you can always download the code and build it yourself. The cloning syntax is available at https://www.stoutner.com/privacy-browser-android/.

At least one person has done this to receive updated filter lists faster than official releases, as can be seen at #566.

Actions #17

Updated by ask low 6 months ago

Can I ask if there's any other possible way to update the lists except through recompilation?
On some news sites, ads and a couple of banners slip through, which doesn't happen with updated lists (in other browsers).
There's a fork of PB called Monocles (which was updated recently and didn't show any ads on the same site, btw), which also confirms this.

Actions #18

Updated by Soren Stoutner 6 months ago

Not currently. Although I should note that Privacy Browser is very easy to compile yourself. Basically, just download Android Studio, do the basic setup, clone Privacy Browser from the instructions on the main website, open it up, and build it.

https://developer.android.com/studio

https://www.stoutner.com/privacy-browser-android/

Actions #19

Updated by Soren Stoutner 6 months ago

A recent example of how automatically updating the filter lists can introduce problems can be found at:

https://github.com/easylist/easylist/issues/17295

Actions #20

Updated by Soren Stoutner 6 months ago

  • Priority changed from 3.x to 4.x

For stability reasons, I think I am going to put this off until Feature #1081: Expose detailed resource request types in Privacy WebView is implemented.

Actions #21

Updated by ask low 6 months ago

But I'm not sure how other webviewless browsers were able to handle those strings without any breakages. Unfortunately, they're closed source.

Actions #22

Updated by Soren Stoutner 6 months ago

If they aren't based on WebView, then they are operating under a different set of constraints.

Actions #23

Updated by ask low 6 months ago

By "webviewless" I meant browsers that don't come bundled with a WebView, i.e. the ones that rely on the system's default WebView.
Anyway, I hope Privacy WebView is ready soon. Can't wait for it ;P

Actions #24

Updated by Soren Stoutner 6 months ago

I am not aware of any other browser that uses Android System WebView and processes the AdBlock syntax used by EasyList.

Lightning uses a hosts file.
FOSS Browser uses a hosts file.
jQuarks uses a list from pgl.yoyo.org, which is simply a list of domains that doesn't look at any other part of the URL.

All of these are fairly simple (and prone to miss things) compared to what Privacy Browser has implemented.

Which browser do you think is based on Android System WebView that has managed to implement something more sophisticated than what Privacy Browser currently offers?

Actions #25

Updated by ask low 6 months ago

Oh. I said closed-source ones, not FOSS. It's unfortunately a Chinese browser (https://play.google.com/store/apps/details?id=mark.via.gp). It does have a hosts file config. It even has the ability to add new ones, btw.
I do take some precautions to run these shady apps securely.

Actions #26

Updated by Soren Stoutner 6 months ago

If all they have is a hosts file, then it isn't very sophisticated (or very good at protecting your privacy).

For example, in the example linked to from #note-19, the entry is designed to block HTTP pings, but only if they are third-party requests. Nothing about a hosts.txt file would even notice that is happening.
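The third-party condition depends on two pieces of information: the URL being requested and the page that triggered the request. A rough sketch of such a check follows. It approximates the registrable domain as the last two host labels, which is an assumption that breaks for suffixes like `co.uk`; a real implementation would need a public suffix list. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def base_domain(url):
    """Approximate the registrable domain as the last two host labels (an assumption)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def is_third_party(request_url, page_url):
    """A request is third-party when its base domain differs from the page's."""
    return base_domain(request_url) != base_domain(page_url)
```

A hosts file only ever sees the host being resolved, so it has no `page_url` to compare against and cannot express this rule.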

So, Privacy Browser uses filter lists that are much more sophisticated than what these other browsers are using, but because of limitations in Android System WebView, sometimes the upstream lists need to be modified before they will function correctly. Shipping the lists with Privacy Browser allows me to make those modifications. Automatically updating those lists could result in undesirable behavior for users.

However, with the advent of Privacy WebView in the 4.x series, it will be possible to more fully implement the AdBlock format (used by EasyList), meaning it will be safer to allow auto-updating of the lists.

Actions #27

Updated by ask low 6 months ago

Is it possible for a browser function to modify a hosts file through regexes & then store it locally? Maybe Via could be doing the same thing.
When I hit update in Via's hosts config, it processes the text file first (for couple of seconds), then stores in my Android/data directory.

Actions #28

Updated by Soren Stoutner 6 months ago

Host files, by definition, only apply to hosts (which is another way of saying domain names).

The domain name part of a URL provides a little bit of information, but, especially in relation to privacy invasions, often doesn't provide enough information to know if something should be blocked. For that, a deeper analysis of the part of the URL that comes after the host is often needed.

Reading over the AdBlockPlus documentation can provide some sense of the nuance of information that can be checked once you move beyond just dealing with host files.

https://help.adblockplus.org/hc/en-us/articles/360062733293

Actions #29

Updated by stein chen 6 months ago

@ask low Maybe "https://github.com/Slion/Fulguris", which is FOSS and available at F-Droid would be worth checking out? It has the ability to add your own abp-syntax-lists.

Actions #30

Updated by Soren Stoutner 6 months ago

That's an interesting browser. I will have to take a look at it.

Actions #31

Updated by ask low 5 months ago

Current workaround. I've been using personalDNSfilter (https://github.com/IngoZenz/personaldnsfilter) on rooted mode (without VPN) which is a systemwide implementation of local DNS. It supports custom filter lists, as well as auto updation period too.
With this, I disable PB's host files altogether. Works perfectly.

Actions #32

Updated by Soren Stoutner 5 months ago

You might want to run Privacy Browser's filter lists in addition to a DNS filter, as DNS is basically the same as host files. It only looks at the domain name and can't filter based on anything else in the URL.

Actions #33

Updated by ask low 5 months ago

I do understand the difference between host based filtering and query based filtering. But I don't think one can achieve a better result than the other. You can easily convert Easylist into hosts and use it as it is.
https://github.com/ProgramComputer/Easylist_adservers_hosts
The above is only for Easylist Adservers. The same applies for EasyPrivacy, EasyCookies & Annoyances. I did request for it though.
https://github.com/ProgramComputer/Easylist_adservers_hosts/issues/4
Not sure if it'll be done. For now, I disabled the primary Easylist from PBA, & kept the rest enabled.

Actions #34

Updated by Soren Stoutner 5 months ago

But, by converting them, all you do is strip out the extra information and only consider the domain name. And for entries that don't even have a domain name, how would you convert them? For example:

/getads/*

Or, something like this:

/googleadservices/*

Or, something like this:

/track_click?

These are examples from EasyList and EasyPrivacy.
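The difference can be shown with a small sketch that compares a hosts-style check against a check using the three entries above. It treats each entry as a substring match after dropping a trailing wildcard, which is a simplification of the AdBlock syntax; the function names and the sample hosts are hypothetical.

```python
from urllib.parse import urlsplit

# The three host-less entries quoted above.
PATH_ENTRIES = ["/getads/*", "/googleadservices/*", "/track_click?"]


def blocked_by_hosts(url, blocked_hosts):
    """A hosts-style filter can only compare the host name."""
    return urlsplit(url).hostname in blocked_hosts


def blocked_by_path_entries(url, entries=PATH_ENTRIES):
    """Treat each entry as a substring match after dropping a trailing wildcard."""
    return any(entry.rstrip("*") in url for entry in entries)
```

For a URL such as `https://news.example.com/getads/slot1.js`, the hosts check only helps if `news.example.com` itself is listed, which would block the whole site, while the path check catches the request regardless of the host.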

Actions #35

Updated by ask low 5 months ago

I haven't thought of that. That is why some reddit posts claim that DNS adblocking is not as effective as dedicated browser content blocking. I didn't take them seriously.

Actions #36

Updated by Soren Stoutner 5 months ago

There should be no problem enabling all of Privacy Browser's filter lists as well as using a DNS filter that you can update yourself. You get all the benefits of both worlds.
