/e/OS: An Easy and Private Custom ROM

1 May 2023 ~ admin ~ 1 Comment

My Samsung S9 is still usable hardware. However, its Android version feels slow and is no longer supported by Samsung, i.e., it hasn’t received security updates since 2022. One option is to sell it and buy a new phone. Another option is to install a new operating system on it, thereby reducing e-waste and saving money.

I can install a new OS for free on my S9, improving the phone’s longevity and giving it a fresh new look. Let me introduce /e/OS (“e OS”).

/e/OS is a modified version of Android, a custom ROM, that is maintained independently of Samsung or Google’s Android. /e/OS is also a fork of an operating system called LineageOS and removes almost all of the closed-source Google code from Android.

/e/OS is open-source Android at its core, with no Google apps or Google services accessing your personal data. So if privacy is important to you, /e/OS is a good option.

There are other custom ROMs out there but I find /e/OS to work well with most apps. It also looks clean and is easy to install with their Easy Installer app which walks you nicely through the process.

Is Privacy Traded for Functionality?

For most applications, /e/OS is great, but not all. The App Lounge is your Play Store/App Store but focuses on Privacy. I like how it scores an app’s privacy out of 10. I try to minimise the number of low privacy-scoring apps on my phone.

The App Lounge has pretty much all the apps that you would want. I can get working Instagram and Discord, even if they are not the most privacy-respecting applications. However, Discord was hard to find and some other common apps did not show up readily in search. The App Lounge is also filled with some weird and random apps, but at least it doesn’t have ads all over like the Google Play Store.

I find for messaging, most apps work fine except Facebook Messenger which didn’t work for me. One surprisingly great app is NewPipe, a client for YouTube. NewPipe gives you all the perks of YouTube Premium for free and doesn’t even track you as YouTube does. The consequence is that you are not provided with recommended content.

So /e/OS has me covered for YouTube and messaging. Navigation/maps, however, is a big trade-off (in my opinion).

The default maps app is Magic Earth. Magic Earth’s routing can be quite off, especially for the London Underground. It will recommend poor routes – I don’t really trust it. Google Maps is far superior as a service. I use TFL Go in tandem with Magic Earth when navigating London.

As for e-mail, I can use my Gmail account just fine, as well as my Murena e-mail that I got with my Murena cloud account (Murena is behind /e/OS). More on that later.

/e/OS has some advanced privacy features which I like. You can toggle on the use of the Tor network, and you can block Trackers on apps.

When using /e/OS, there is a small trade-off between functionality and privacy, but not in all aspects.

/e/OS – e Foundation

Murena Cloud

Your /e/OS phone has good (optional) integration with Murena Cloud. Part of the point of /e/OS is moving away from Google. Murena Cloud is an alternative to Google Cloud but with only 1GB free compared to Google’s 15GB of free space.

However, I like Murena’s transparency in telling me which country my data is held in. They also give you an e-mail alias if you don’t want to always give out your e-mail.

Closing

/e/OS looks good and works well for me (apart from Maps). /e/OS is not for everyone and I think it depends on the person – the advanced privacy features are probably not worth the small functionality trade-off for most people.

An S9 specific issue: I couldn’t find a way to map the Bixby button to anything. I would also like to have a cap on battery charging (cap at 85%) like my S21 has to prevent overcharging and increase battery longevity. Clearing all open tabs was also not obvious to me and should be more prominent. Another problem was enabling 2FA for Murena cloud but I managed it – this needs to be easier.

/e/OS has good privacy out of the box and most Meta and Google apps like Instagram, WhatsApp and Gmail still work – it’s good they’re there in case you still really need them. However, it almost defeats the point of the OS.

Overall a great OS and alternative to Google and Apple operating systems.

Self-hosting with Mini PCs: Discord bot & Minecraft server

24 April 2023 ~ admin ~ 2 Comments

Not a data science blog today! I wanted to briefly share some project ideas for self-hosting and just say that Mini PCs are great! Mini PCs can be powerful and quiet little desktops, but they can also function as servers.

I got my hands on two Intel NUCs, which are small form-factor computers. A NUC is just like a normal desktop but can fit in your hands (~10 x 10 cm).

If you leave them with networking and no I/O, they will sip power (~30 Watts) and act like a little server.
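To put that power figure in context, here is a rough back-of-the-envelope calculation. The 30 W draw is the estimate above; the price per kWh is a made-up placeholder, so substitute your own tariff.

```python
def annual_kwh(watts, hours_per_day=24.0, days=365):
    """Energy used in a year, in kilowatt-hours, for a constant power draw."""
    return watts * hours_per_day * days / 1000


def annual_cost(watts, price_per_kwh):
    """Yearly running cost in whatever currency price_per_kwh is quoted in."""
    return annual_kwh(watts) * price_per_kwh


# Assumption: 0.30 per kWh is a placeholder, not a real tariff.
print(f"{annual_kwh(30):.1f} kWh per year")
print(f"{annual_cost(30, 0.30):.2f} per year at 0.30 per kWh")
```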

NUCs and other similar mini PCs such as the Antec Asrock have many use cases. For example, they can host websites (like this one!), act as home media servers (with Jellyfin or Plex) and run home automation servers, among other things.

Below are two examples: Hosting a Discord bot and Bedrock Minecraft Server. As for software prerequisites, my NUC has Ubuntu server, Python, Docker and Docker Compose already installed.

Discord bot

Let’s create our own Discord bot and self-host it. First, you want to visit https://discord.com/developers, create an application and add a bot. You will need to copy the token and save it somewhere safe. For permissions, it depends on the bot, but usually I enable send and read messages at the least. For Privileged Gateway Intents, I would usually enable all.

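A typical bot could start like this:

import discord
intents = discord.Intents.all()  # enable every intent, including the privileged ones
intents.members = True  # already covered by Intents.all(); kept to make the member intent explicit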

This gives your bot the ability to receive member-related events. Next, we could make our bot invoke commands when a user types in ‘$’ in chat:

from discord.ext import commands
client = commands.Bot(command_prefix = "$", intents = intents)

@client.event
async def on_ready():
    print('We have logged in as {0.user}'.format(client))

The on_ready() function will let us know when the bot is connected to Discord and ready to start processing events.

If we want the bot to say hello when we type $hello in the discord server we can use “context” (ctx).

@client.command()
async def hello(ctx):
    await ctx.send('Hello!')

Don’t forget to run the bot with your token: client.run("TOKEN"). You can add your bot to your server using the OAuth2 URL generator in the Developer Portal: tick send and read messages, then paste the generated URL into your browser.

I recommend using ctx for sending messages – it can make things easier. The context contains information about the message that triggered the command. This includes the channel, server, and author of the message.

You can run the bot with python3 name_of_bot.py in the terminal.

Check out my GitHub for a Discord bot that uses the Natural Language Toolkit (NLTK) Python package to find the most negative user (silly usage I know!). Fabio-RibeiroB/NLPdisrespectBOT: Discord Bot for Sentiment Analysis (github.com)

This is not a dedicated server for my bot so let’s containerise the application and let it run in the background.

I don’t want my secret token in the container in case I want to share the image, so I created a .env file containing TOKEN=my_token (no quotes) and added it to both the .gitignore and .dockerignore files. In the bot script, you will then need to load the environment variables, for example with the python-dotenv library.
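Reading the token in the bot script could look roughly like the sketch below. It assumes the variable is called TOKEN, as in the .env file above; get_token is a hypothetical helper name. The import is guarded because inside a container started with --env-file the variable is already in the environment, so python-dotenv is only needed for local runs.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:
    load_dotenv = None  # fine when the variables are injected some other way


def get_token(name="TOKEN"):
    """Return the bot token from the environment, reading .env first if possible."""
    if load_dotenv is not None:
        load_dotenv()  # reads a .env file if one is found; existing variables are kept
    token = os.getenv(name)
    if not token:
        raise RuntimeError(f"{name} is not set; add it to .env or pass --env-file")
    return token
```

The bot would then start with client.run(get_token()) instead of a hard-coded string.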

Our Dockerfile could look like this:

FROM python:3
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "sentimental_analysis_bot.py"]

This assumes you have a requirements.txt file with all your dependencies.

While in the directory of the bot, let’s build the image from a Dockerfile and name it.

docker build -t my_container_name -f Dockerfile .

Now we run the image as a container in the background and pass it the .env file:

docker run -d --env-file=.env my_container_name

After entering the container I can see that there is no .env file. Great!

Minecraft Bedrock server

No need to spend $5/month on a server when you can host it yourself. Thanks to the itzg/minecraft-bedrock-server Docker image on Docker Hub, having your own Minecraft server is not too difficult. I have only played around with the Bedrock version but I believe the Java steps are similar.

You will need docker to pull the image above (see the GitHub or Docker hub page for more info) and I recommend using docker-compose to get the container up in the background (command: docker-compose up -d).

As for configurable settings, I would enable the “allow list”, which specifies which players can join the server; it’s just a security feature. However, the easiest way to add people to the allow list from the terminal is not clear to me. In the end, I entered the running container (command: sudo docker exec -it my_server_name bash), installed vim (apt update && apt upgrade && apt install vim) and edited the allow list JSON. Here is an example allowlist.json.

[{"ignoresPlayerLimit":false,"name":"Your_name","xuid":"Your_xuid"}]
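One way to avoid hand-editing the JSON is a short script. The sketch below assumes you can reach allowlist.json from wherever the script runs (for example through the mounted data volume) and that entries follow the format shown above; add_player and the path you pass in are hypothetical, not part of the itzg image.

```python
import json
from pathlib import Path


def add_player(allowlist_path, name, xuid, ignores_player_limit=False):
    """Append a player to an allowlist.json file unless the xuid is already listed."""
    path = Path(allowlist_path)
    text = path.read_text().strip() if path.exists() else ""
    entries = json.loads(text) if text else []
    if any(entry.get("xuid") == xuid for entry in entries):
        return False  # already on the list, nothing to do
    entries.append(
        {"ignoresPlayerLimit": ignores_player_limit, "name": name, "xuid": xuid}
    )
    path.write_text(json.dumps(entries))
    return True
```

After changing the file you would still restart the container, as with any other settings change.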

To find your xuid use this site: https://www.cxkes.me/xbox/xuid.

Once the container is running, you must set up port forwarding – WikiHow has a nice guide for doing this.

Below is an example docker-compose.yml that uses itzg’s Minecraft image and sets the Minecraft server to survival mode, online, with an allow list and a server name. If you ever want to change anything you can edit this file or server.properties and restart the container. For example, you may want to allow cheats.

version: '3.4'

services:
  bds:
    image: itzg/minecraft-bedrock-server
    restart: always
    environment:
      EULA: "TRUE"
      GAMEMODE: survival
      DIFFICULTY: easy
      ONLINE_MODE: "true"
      ALLOW_LIST: "true"
      SERVER_NAME: "My World"

    ports:
      - 19132:19132/udp
    volumes:
      - bds:/data
    stdin_open: true
    tty: true

volumes:
  bds: {}

To join the server, make sure you are on the allow list, type the server IP (same public IP as the NUC) and the port in “add server” on Minecraft.

I haven’t yet found a better way to change the allow list than installing vim in the container and editing the allowlist.json file. Alternatively, if friends are having trouble connecting, you can disable the allow list so anyone can join.

Class Balancing: SMOTE & Variations

23 November 2022 (updated 15 April 2023) ~ admin ~ Leave a comment

Frequency and Bias

An important consideration in any classification task is class frequency. Class imbalances are problematic because the classifier becomes less sensitive to the minority classes. Consider a training set with a majority class A and minority B. An algorithm trained on this imbalanced data will develop a bias toward predicting A just because it appears more often. Another dataset could in theory contain more of B, for example. This classifier would function poorly in this case because it learned a preference for A.

To avoid this bias, perform class balancing. There are different ways to accomplish balancing. One method is to oversample the minority classes by duplicating observations. However, oversampling can cause the algorithm to overfit, and the data becomes skewed toward the replicated observations. Alternatively, one could undersample the majority class, leading to a loss of valuable information for a classifier [1]. I prefer to maintain as much information as possible.
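As a point of comparison for the methods below, random oversampling by duplication can be sketched in a few lines of plain Python. This is an illustrative toy, not the imblearn implementation; random_oversample is a hypothetical name.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))  # an exact copy of an existing row
            y_out.append(label)
    return X_out, y_out
```

Because every added row is an exact copy, the classifier sees the same minority points many times, which is where the overfitting risk comes from.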

In this article, I explain variations of oversampling, including using existing observations from the minority to synthesise new data. Balancing via artificial means is known as Synthetic Minority Oversampling Technique, or SMOTE [3].

SMOTE and Tomek Links

A data point can be represented as a vector, where each entry of the vector is an attribute. SMOTE works by first selecting a feature vector from the minority class at random. Then, the algorithm chooses a random neighbouring feature vector from k-nearest (usually five) neighbours. The new, synthesised, feature vector lies at an arbitrary point along the line connecting the two [3]. Fig.1 depicts synthetic data generation in SMOTE with an example dataset.

Fig.1: How SMOTE generates new points by connecting feature vectors. Image from `imblearn` [2].
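The interpolation step described above can be sketched in plain Python. This is a simplified illustration that generates one synthetic point; smote_sample is a hypothetical name and real implementations such as imblearn differ in detail.

```python
import math
import random


def smote_sample(minority, k=5, rng=None):
    """Create one synthetic point between a random minority point and one of its k nearest neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = rng or random.Random()
    base = rng.choice(minority)
    # k nearest neighbours of the chosen point, excluding the point itself
    others = [p for p in minority if p is not base]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the line segment, in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]
```

Calling this repeatedly until the minority class is as large as the majority gives the balanced training set.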

The problem with SMOTE is that it can generate noise by interpolating points between outliers [2], and alone does not necessarily improve on the more straightforward random oversampling method. It is unlikely that SMOTE adds any additional information by using existing data, but SMOTE still shifts the bias toward the minority class [2]. Applying SMOTE may increase the sensitivity of the minority class but could decrease accuracy and precision. The authors of SMOTE found that combining SMOTE with under-sampling methods can improve classification performance [3].

To address some issues with SMOTE, particularly synthetic noise, there is a modified version, SMOTE+Tomek, which tries to clean the feature space after oversampling. A Tomek link is a pair of points from opposing classes that are each other’s nearest neighbours, as illustrated in Fig.2 [2]. SMOTE+Tomek finds these links and removes the majority class point of each pair.

Fig.2: Illustration of a Tomek link [2].

Undersampling the majority class with Tomek links thus removes boundary cases between classes and class label noise.
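Finding Tomek links, meaning pairs of points with different labels that are each other's nearest neighbours, can be sketched with a brute-force search. This is an illustrative toy for small datasets; tomek_links and nearest_index are hypothetical names.

```python
import math


def nearest_index(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Return index pairs (i, j) that are mutual nearest neighbours with different labels."""
    links = []
    for i in range(len(points)):
        j = nearest_index(points, i)
        if i < j and labels[i] != labels[j] and nearest_index(points, j) == i:
            links.append((i, j))
    return links
```

To undersample, you would then drop whichever member of each pair belongs to the majority class.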

Another variation of SMOTE is SMOTE+ENN. Here the cleaning step is edited nearest neighbours (ENN), which removes any point whose class disagrees with the majority of its k nearest neighbours, so it tends to remove more points than deleting Tomek links [4]. There are more variations of SMOTE, including KMeans+SMOTE [5], which applies KMeans clustering before SMOTE.

It is important to note that class balancing must be performed after the train-test split, and only on the training set. Otherwise, SMOTE will interpolate between points that later end up in the test set, and the new feature vectors in the training set could leak information about the location of points in the test data.

An alternative method to deal with imbalanced data and avoid SMOTE entirely is using balanced ensemble classifiers. In balanced ensemble methods, bootstrapping can be used to sample the data so that the constituent classifiers (e.g., trees in a forest) train on subsets with the classes present in equal amounts [6].
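The balanced bootstrap idea can be sketched as follows: each constituent classifier gets a sample with the same number of rows from every class, drawn with replacement. This is a minimal illustration; balanced_bootstrap is a hypothetical name, and sizing each class to the smallest class is an assumption of this sketch.

```python
import random
from collections import defaultdict


def balanced_bootstrap(y, seed=0):
    """Return row indices for a bootstrap sample with equal counts per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)
    n_per_class = min(len(indices) for indices in by_class.values())
    sample = []
    for indices in by_class.values():
        sample.extend(rng.choices(indices, k=n_per_class))  # with replacement
    return sample
```

Each tree in the ensemble would be trained on the rows picked by a different seed.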

Example in Python: imblearn library

from imblearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE

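# imblearn's pipeline applies SMOTE only during fit, so the test data is never resampled
pipeline = [SMOTE(random_state=0),
           RandomForestClassifier(random_state=0, min_samples_split=4, n_estimators=500)
          ]
model = make_pipeline(*pipeline)
model.fit(X_train, y_train)  # X_train, y_train come from your own train-test split
y_pred = model.predict(X_test)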

See more on the imblearn website: https://imbalanced-learn.org/dev/references/generated/imblearn.over_sampling.SMOTE.html#

References:

[1] Ma Y, He H. Imbalanced learning: foundations, algorithms, and applications. John Wiley & Sons; 2013.

[2] imblearn library documentation; By the Imbalanced-learn developers. Available from: https://imbalanced-learn.org/dev/index.html.

[3] Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research. 2002;16:321-57.

[4] Batista GE, Bazzan AL, Monard MC, et al. Balancing Training Data for Automated Annotation of Keywords: a Case Study. In: WOB; 2003. p. 10-8.

[5] Last F, Douzas G, Bacao F. Oversampling for imbalanced learning based on k-means and smote. arXiv preprint arXiv:1711.00837. 2017.

[6] Chen C, Liaw A, Breiman L, et al. Using random forest to learn imbalanced data. University of California, Berkeley. 2004;110(1-12):24.

ML App with Flask

2 September 2022 ~ admin ~ 1 Comment

You built an ML model with TensorFlow or scikit-learn and now want it deployed on a website. This article is a quick guide on loading an ML model in Python and using it to make predictions on a web app with Flask. This is based on my image classifier app found on my GitHub Fabio-RibeiroB/image_classifier_app: App to classify happy or sad images (github.com). In this app, the user uploads a file and presses predict. Specifically, the user uploads images for binary classification. It all depends on your model. You can change this code to upload CSV data instead, for example.

I appreciate this article is high-level and lacking detail. It is more of an outline, to keep it as short as possible. To see more, check out my aforementioned repo on GitHub. Anyway, let’s begin.

Save and Load

Let’s say you have a Sequential model that you compiled.

model = Sequential()
# ... add the layers of your model here ...
model.compile(...)

Now save the model, for example, as a .h5 file. I saved mine in a “models” folder. You can also use pickle to dump and load models as .pkl files.

from tensorflow.keras.models import load_model
import os
model.save(os.path.join('models','model.h5'))

Now load it in your Flask app.

from flask import Flask, render_template, request, redirect, flash, session # useful flask modules
from werkzeug.utils import secure_filename # security
import logging
import os # used later to build and remove upload paths

logging.basicConfig(level=logging.DEBUG)
logging.info('program starting')

from tensorflow.keras.models import load_model
model = load_model('models/model.h5') # loaded model

Static Uploads Folder

We also need a folder where we can upload data for the model. Make a directory called static, and within that, a directory called uploads.

UPLOAD_FOLDER = './static/uploads'
ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg'} # change depending on model
app = Flask(__name__)
app.secret_key = b'somesecretkey'
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER

def allowed_file(filename):
    """
    Check the uploaded file has an allowed extension
    """
    return '.' in filename and \
          filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

Routes and Prediction

Now create the routes in your app that make a prediction. We will later write home.html, which lets the user start a prediction. The predict.html page shows the results.

@app.route('/')
def home():
    return render_template('home.html')

@app.route('/predict', methods=['POST'])
def predict():
    file = request.files.get('file') # None if the form had no file field
    # check the uploaded file has an allowed extension
    if file and allowed_file(file.filename):
        filename = secure_filename(file.filename)
        data_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)

        file.save(data_path) # save data
        # read data in, for example with pandas; load_data is a hypothetical helper you write for your model
        data = load_data(data_path)
        prediction = model.predict(data) # in my example prediction is one number, a probability.

        # remember to delete file after use
        os.remove(data_path)
        return render_template('predict.html', data=prediction)

    # missing or unsupported upload: tell the user and return to the form
    flash('Please upload a file of an allowed type')
    return redirect('/')

if __name__ == "__main__":
    app.run(debug=True)

The above code saves the valid data in the uploads folder and loads the data. For example, you could load your data into a pandas data frame. The /predict route passes the prediction variable containing a single prediction. The prediction variable is passed to predict.html to render the result on the web page. If your model outputs a lot of predictions, like a CSV of predictions, these lines will need to be modified. You probably want the user to download the predictions as a CSV instead of displaying the results on the screen. In this case, you need a download button in your HTML.
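For the many-predictions case, building the CSV itself needs only the standard library. The sketch below is one possible shape, with hypothetical names and column headers; how you wire it into a route depends on your app.

```python
import csv
import io


def predictions_to_csv(filenames, predictions):
    """Build CSV text with one row per input and its predicted value."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["filename", "prediction"])
    for name, value in zip(filenames, predictions):
        writer.writerow([name, value])
    return buffer.getvalue()


print(predictions_to_csv(["a.png", "b.png"], [0.9, 0.1]))
```

In a Flask route, the returned text could be sent back in a Response with mimetype text/csv and a Content-Disposition attachment header, so the browser downloads it instead of rendering it.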

However, continuing my example, we have a home page called home.html with a form for the user to upload data. I simply removed the rest of my HTML tags to declutter the code snippet below. See my repo for the entire HTML file.

In home.html

<form method="POST" action="{{url_for('predict')}}" enctype="multipart/form-data">
    <input type="file" name="file">
    <input type="submit" value="predict">
</form>

And now for predict.html. This page will simply output the results with a back button.

{{data}}
<form>
<input type="button" value="Try again" onclick="history.back()">
</form>

Start the app with the flask run command, and it should be running on localhost.

Data Science & Tech Blog

Quick summaries and guides
