AI Usage at Work
A few years ago, we wondered if we would one day see an AI capable of writing coherent text or producing an image. Now, we have hundreds of AI models that can do this in different ways, in a matter of seconds.
The accessibility of those tools has also greatly improved: it used to be a pain to install all the required dependencies for TensorFlow to work properly, but tools such as Anaconda, Docker, PyTorch, Transformers and Diffusers have made it incredibly easy to get started. Today, we can even try out those models in a browser without any GPU or installation.
At my previous job, my team became interested in how AI could improve our work. They created a sheet with potential ideas for AI integration in their workflow.
Their biggest concern was the confidentiality of their data: you must be careful about what you upload, especially if it contains sensitive information such as future project plans.
I had already worked with AI in the past, so I wondered where we could integrate it using self-hosted tools to avoid any risk of leaking confidential data.
Code Auto-completion#
My first thought was code auto-completion. The best-known example is GitHub Copilot, a tool that suggests code snippets based on context, but it is not open source.
My company used an old C++ IDE called Code::Blocks. It is a basic tool for writing C++ code, but it is missing many important features such as a linter, build tools, or any AI capabilities.
I found the TabbyML project instead. It is a self-hosted tool that lets you run both a chat model, for a ChatGPT-like experience, and a dedicated code completion model such as StarCoder. It is straightforward to install with Docker, and it integrates with any modern IDE such as VSCode, the JetBrains IDEs, or even Eclipse.
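Getting a server running is essentially a single Docker command. A minimal sketch, following the shape of the TabbyML documentation (exact flags and model names may differ between Tabby releases, so check the docs for your version):

```shell
# Launch a self-hosted Tabby server with GPU support.
# - the model weights are cached in ~/.tabby on the host
# - the completion API and web UI are served on port 8080
docker run -it --gpus all \
  -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  tabbyml/tabby serve \
    --model StarCoder-3B \
    --device cuda
```

Once the server is up, the IDE extensions only need the server's URL and a token to start suggesting completions.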
I tried it on the small NVIDIA GPU of my work computer with StarCoder-3B, and it somewhat worked. It was capable of guessing some lines that I wanted to write, but it was not very accurate.
My work GPU was way too small to load anything meaningful. I tried again on my university GPU, an RTX 6000 Ada, and the results were much better: I was able to try out bigger models such as StarCoder2-7B, CodeLlama-13B, DeepSeekCoder-6.7B, CodeGemma-7B, and Qwen2.5-Coder-14B, but the most impressive to me was Codestral-22B, Mistral's biggest code model.
With this model, you can write a comment or the documentation for a function, and the model will write the entire function for you. It can even feel like mind-reading!
You can also ask it questions about the code, and it will try to answer them from the files in the repository. This could be incredibly useful for introducing new developers to a code base.
While you still have to verify the results, it did a great job of helping me write code faster, and I did not feel the same hesitation as with GitHub Copilot, where everything I wrote was sent to the cloud.
Chat Models#
While TabbyML is a great tool, it is most useful for a developer inside an IDE. For more general usage, other self-hosted tools such as Ollama are a better fit.
Ollama offers a simple chat interface similar to ChatGPT (in the terminal by default, or through a web front end), is straightforward to set up, and supports a large number of models.
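Getting started takes only a couple of commands. A minimal sketch, assuming Ollama is already installed and the model tag exists in the Ollama library (pick a size that fits your GPU's VRAM):

```shell
# Download the model weights once, then open an interactive chat
# session with it in the terminal.
ollama pull gemma3:4b
ollama run gemma3:4b
```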
I was able to try some small models on my work computer, such as Gemma3-4B, but it is pretty challenging to have a coherent discussion with such a small model. Larger models need better GPUs, so I also tried it on my university GPU with models such as CodeLlama-13B, DeepSeekCoder-16B and MistralSmall-24B.
The results were much more impressive: the conversation was far more coherent, and the model understood my intentions better. Ollama can also be configured to support document uploads through retrieval-augmented generation (RAG), which lets it answer based on a document's content. For example, I had a question about a very specific assembly language we used at work and its compiler, and the tool was able to answer it from the manual I gave it.
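Beyond the interactive chat, Ollama also exposes an HTTP API (on port 11434 by default), which makes it easy to script one-off questions or wire the model into other tools. A sketch with `curl`, assuming the server is running locally and the `mistral-small:24b` tag is available:

```shell
# Ask a one-off question through Ollama's REST API.
# "stream": false returns the whole answer as a single JSON response
# instead of token-by-token chunks.
curl http://localhost:11434/api/generate -d '{
  "model": "mistral-small:24b",
  "prompt": "Explain what a linker script does, in two sentences.",
  "stream": false
}'
```

This is the same API the various web front ends talk to, so anything you can do in the chat interface can be automated this way.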