Apertus – Open Foundation Model for Sovereign AI

(apertvs.ai)

148 points | by T-A 2 hours ago

18 comments

maxloh 2 hours ago
Other fully open LLMs include Allen AI's OLMo 3.1 and MBZUAI's K2 Think V2, both of which have released their full training pipelines and datasets.
Nvidia Nemotron is also an open training source model, though a portion of its dataset remains proprietary.
Quoting lambda's comment:
> Note that the Nemotron models are generally stronger than Olmo and K2 Think V2 (according to Artificial Analysis benchmarks), and there is a lot of overlap in their datasets (lots of datasets are based on the same sources with different filtering, Olmo and K2 Think V2 both have used some Nemotron datasets).
> But yeah, Nemotron is a modern and fairly capable LLM, even the 122b is more capable than Deepseek R1 (a 671b model) on most benchmarks, and there's also the recently released 550b Ultra now.
https://news.ycombinator.com/item?id=48492439
neom 19 minutes ago
I'm curious what stuff like this means for Cohere. Their whole value prop is Sovereign AI. It seems they spent a lot of money developing models but own none of their own infra, so what is the point of a country spending a lot of money on Cohere's solutions when stuff like this is becoming increasingly available and usable? Feels like I must be missing something here.
SwellJoe 2 hours ago
I like the idea, and it has become more pressing that everyone outside the US think about tech sovereignty because the US has become an unsafe place to keep your data, but the impression I get from Apertus is that it moves at the speed of a committee. I have no expectation they'll deliver a competitive model. At least, not competitive with current models. Maybe competitive with models a year ago (though they haven't even done that yet, right?).
- nezuzen 1 hour ago
  "the US has become an unsafe place to keep your data"
  I empathize with this, but I'm curious what would make any other country a better safe haven for your data. I personally like the EU's approach to data safeguards, but are there other locales/data protections you have in mind that would keep your data "safe"?
  - digitaltrees 1 hour ago
    The rule of law exists in other countries in a way it does not in the US right now.
    - SubiculumCode 19 minutes ago
      Can you give examples?
dangoodmanUT 9 minutes ago
How are they going to be competitive with top models at 70B size?
dTal 1 hour ago
It's good that there is a movement for open LLMs, but it's not where the battleground is right now. The battleground is local vs service LLMs, and we are losing that battle badly despite all the software being here now and viable, entirely because UX sucks.
How many normal people do you know who use "ChatGPT"? A lot, probably.
How many even know what "Gemma" is, let alone have downloaded llama.cpp and a GGUF file from Hugging Face, and run "llama-server" from a text console with all the correct command arguments? How many are thinking about this use case when speccing out their next computer? Where is the breathless marketing copy boasting x tok/s?
We are sleepwalking into slavery.
- 627467 51 minutes ago
  "Normal people" have never bothered to host their own photos, music, videos, documents, communications, etc., to the point that for many their computer is essentially a thin client into someone else's server. Why would we think these same people would care about "personal" inference?
- 8note 1 hour ago
  normal people don't really have the hardware to run local models
  - manithree 5 minutes ago
    They may not right now, but the whole point of Microsoft's Copilot+ PC standard (even though it's somewhat anemic) is to run models locally. Apple Silicon with enough unified memory is capable. Not to mention that modern iPhones and Pixels have fairly capable NPUs and routinely run local models. So we may not be at the point where most normal people have the hardware to run local models, but it is rapidly approaching.
  - sosodev 10 minutes ago
    They have it, we just haven’t enabled them. The smart model with a chat box is the wrong abstraction for local. Ideally we would have it built into applications as a clear and easy to use opt-in feature. Like allowing a user to index a folder on their hard drive and then search it semantically via embeddings. You could do that on fairly low end hardware these days. Like 2GB of RAM with any processor made within the last 10 years.
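As a rough illustration of the "index a folder, then search it via embeddings" idea in the comment above, here is a minimal sketch. Everything in it is an assumption for illustration: `embed`, `index_folder`, and `search` are hypothetical names, and `embed` is a toy hashed bag-of-words stand-in that only matches shared words. A real setup would swap it for a small local embedding model to get actual semantic matching; the index-once, compare-by-cosine structure would stay the same.

```python
import hashlib
import math
import re
from pathlib import Path

DIM = 256  # size of the toy embedding vector (arbitrary choice for this sketch)


def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Map each word to a fixed bucket so the same word always lands in the same slot.
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def index_folder(folder: str) -> dict[str, list[float]]:
    """Embed every .txt file under `folder` once, so later queries stay cheap."""
    return {
        str(path): embed(path.read_text(encoding="utf-8", errors="ignore"))
        for path in Path(folder).rglob("*.txt")
    }


def search(index: dict[str, list[float]], query: str, top_k: int = 3) -> list[tuple[float, str]]:
    """Rank indexed files by cosine similarity to the query (vectors are unit length)."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), path) for path, vec in index.items()]
    return sorted(scored, reverse=True)[:top_k]
```

Usage would look like `search(index_folder("/some/folder"), "tomato soup")`, which returns `(score, path)` pairs with the best match first. The per-file vectors are small, which is why this kind of index fits comfortably on low-end hardware.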
- theptip 38 minutes ago
  Why do you feel the important part _now_ is where the weights get run?
  I can see this as a future battleground but access to frontier models (which you cannot run locally) seems a lot more relevant today.
- 0gs 45 minutes ago
  it's funny because i made this thing (called enough) that aims to make it easy for non-technical people to get up and running with local models quickly, but it is impossible to figure out how to break through the noise. every thread and comment like this breaks my heart a lil bit
- azinman2 53 minutes ago
  > We are sleepwalking into slavery.
  That’s a bit hyperbolic…
- idiotsecant 1 hour ago
  Better UX does not buy you a datacenter farm to train state of the art cutting edge models. Right now the only people who can do that are the technobility class.
  - dTal 1 hour ago
    It does not, but it might encourage more people to care. Worrying about training is a luxury when you are starting from a baseline of "OpenAI spies upon me and controls my access". Let's focus on getting every Tom, Dick and Harry 1) on board with LLMs, because they're happening, 2) habitually using local software.
- double0jimb0 1 hour ago
  Yea, anyone who understands what makes products actually usable is opting to get paid for said skill.
- wmf 1 hour ago
  LM Studio
mrshu 1 hour ago
By far the most impactful product of the Apertus project is its people. To quote a memorable line from Dominique Paul (https://www.thisiscrispin.com/):
> What most people miss IMO is that this is not a team who is doing this for the fourth time like virtually any other LLM provider and who could learn from its own past experiences. I bet if the team would do another model training they could get way better results at one fourth of the costs.
pferde 2 hours ago
For a model that claims to focus on many languages, it's quite unreliable when it comes to simple questions like "how to say X in language Y" or "how to conjugate verb X in language Y". It keeps hallucinating words that do not exist, and when corrected, it only hallucinates a new lie.
- 8note 1 hour ago
  it probably doesn't know what language each set of words is referencing.
  i doubt they are including a lot of training data labeled with the language.
  "how to say X in language Y" is a different task from saying X in language Y
reconnecting 1 hour ago
A chat interface where you can try Apertus:
https://chat.publicai.co
jawns 1 hour ago
I am curious about how opt-outs and PII removal work.
Who confirms those requests are legit?
throwaw12 2 hours ago
Looks like their instruct models are Llama 3.1 fine-tunes from last year. Is there any progress on new models?
My last hope for sovereign AI is Chinese open models
- kordlessagain 2 hours ago
  Sovereign AI is not about using just one model. It's about using the right model for the right job, and getting them to talk through the solution TOGETHER before presenting the answer.
  If you want to mix models like this, check out https://github.com/deepbluedynamics/nemesis8
trvz 2 hours ago
The previous version of this model was pretty bad, but it claimed to adhere to copyright laws. However, based on my testing, that's not true either. So in my view this is completely useless.
- embedding-shape 2 hours ago
  As long as the following remains true, this release ends up a bigger contribution to science at large than most other models trained "behind closed doors":
  > Fully open model: open weights + open data + full training details including all data and training recipes
  - coder543 1 hour ago
    Is a recipe useful if no one likes it?
    There are equally open, much more useful models out there: https://artificialanalysis.ai/?models=nvidia-nemotron-3-ultr...
- simonw 2 hours ago
  It uses fineweb, which is derived from Common Crawl, which is an unlicensed scrape of web pages.
maxloh 2 hours ago
Great to see more fully open LLMs.
I think a problem with open-weight models is that while you can improve them, you are not going to create the next generation of LLMs by fine-tuning. We are at the mercy of frontier labs for access to SOTA LLMs. For example, Anthropic recently started requiring identity verification for Claude [0], same for OpenAI [1].
If one day China's distillation labs stop releasing their LLMs as open-weight, I doubt American labs will continue to release free LLM weights without that competition.
That's where fully open pipelines shine: they enable the community to create the next generation of SOTA LLMs. That is the only way LLMs truly become sovereign.
[0]: https://news.ycombinator.com/item?id=48618455
[1]: https://news.ycombinator.com/item?id=48618606
- anon373839 1 hour ago
  > China's distillation labs
  This notion that Chinese labs are merely distilling frontier models is quite an unwarranted slur. Those labs have published WAY more useful research than US labs on RL techniques, novel model architectures, training pipelines, etc. They have also hit intelligence-per-parameter densities that US labs have yet to attain.
  Apart from that, merely training a model on outputs from another model, off policy and without the logits, doesn’t really work that well.
  The Chinese labs know how to build frontier level models. GLM-5.2 shows that they no longer even need Nvidia chips to do it.
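To make the "without the logits" point in the comment above concrete, here is a small sketch contrasting the two training signals. It is an illustration under toy assumptions, not any lab's actual pipeline: the function names are hypothetical and the logits are hand-written lists. With teacher logits, the student is penalised against the teacher's whole probability distribution at each position; with only sampled text, it sees a single token per position.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_distillation_loss(teacher_logits: list[float], student_logits: list[float]) -> float:
    """Logit-based distillation: KL(teacher || student) over the full vocabulary."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sampled_token_loss(sampled_token: int, student_logits: list[float]) -> float:
    """Output-only training: cross-entropy on the one token the teacher emitted."""
    q = softmax(student_logits)
    return -math.log(q[sampled_token])
```

The first loss carries information about every alternative token the teacher considered; the second only says which token was picked. This sketch covers only the missing-logits part of the argument, not the off-policy part.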
  - halJordan 31 minutes ago
    But have they? I understand that the Chinese side is illuminated and the American side is dark. I disagree that the Chinese labs have created anything that isn't in an American research lab or production dc. Sure the Chinese have published their findings and not for nothing. But are they novel? Unlikely imo
    - chriskanan 21 minutes ago
      They are doing a tremendous amount of novel research, while American AI companies have "war rooms" to study their papers and models and American labs publish next to nothing. They often have to do more with less. As an AI researcher, I see Chinese labs doing a tremendous service to science, whereas some American companies (and I'm American) seem to think only they are able to do AI research responsibly (I've been working on neural networks for 25+ years). I'm pretty sure Fable sabotaged my research codebase (see the news stories about this).
  - Vaslo 31 minutes ago
    I recently watched a video about one of these “Chinese Models”, and it kept insisting it was Claude when the user asked. Sorry, there’s no “slur” here, just legit suspicion.
    - c0rruptbytes 21 minutes ago
      https://blog.kilo.ai/p/did-claude-opus-48-distill-alibabas
      it happens to all models…when the internet is increasingly generated, things happen
- dofm 1 hour ago
  > We are at the mercy of frontier labs for access to SOTA LLMs
  I disagree with this use of SOTA, and this topic is why.
  Anthropic and OpenAI have “cutting-edge” models. These are beyond the state of the art but they are closed, secretive, hard to quantify.
  The “state of the art” is open source, open weights models that can be inspected, studied, shared and critiqued, because that is what is meant by “the art”: it is the knowledge and principles and evidence and materials available to all. The “state of the art” is the highest point of that.
  I wish we could make this distinction and stop blessing two secretive, unverifiable loss-making companies with so much power.
  (Putting that aside, I suspect, though without evidence, that the endless march to solving models by making them bigger is not the solution anyway.)
  - sockaddr 1 hour ago
    Sorry, but I think your requirement that something only be “the art” if any arbitrary person can critique it is off. The frontier labs are working on the state of the art, but it’s just art that you aren’t allowed to see. Unfortunately.
    - dofm 1 hour ago
      It is work using the principles of the art, obviously.
      But "state of the art" implies the highest state of general availability, not just in terms of access to some product, but of use of the ideas, concepts, methodologies etc.
      Anthropic and OpenAI have "cutting edge" models; the state of the art is behind the cutting edge.
      The state of the art is the best open source, open weights model available. More or less by definition.
      I am probably tilting at windmills here.
      - bnj 19 minutes ago
        I appreciate this distinction. There are multiple senses of SOTA, and one that has been taking on greater mindshare is as a synonym of “the best available”. By rebasing on SOTA as generally available and understood versus cutting edge, which has limited distribution and leads the way, we expand the vocabulary we have available to describe what’s going on. Thanks.
    - 8note 1 hour ago
      the art is the standard engineering practices that go into building the thing
      it's things you would be trained in as part of a bachelor's degree and some graduate coursework
yreg 2 hours ago
previous thread: https://news.ycombinator.com/item?id=45108401
markab21 10 minutes ago
I'm mildly surprised that more people aren't using Nemo models for this reason. We've moved most of our processing to a combination of Nemo Ultra and Super, with some support for multi-model-specific tasks on Omni. The setup is working REALLY well for us, and I'm comfortable with the more measured pace of improvements. We work with many long-context problems, and the ecosystem is great.
There were a number of use cases where we needed to use Gemini (audio modality), and Ultra has been a VERY cost-effective alternative once we got through the nuances.
atemerev 2 hours ago
I use it extensively. It is not ready for agentic use, but as a generic driving model for RAG use cases, it is pretty competent. You can build useful software with it.
- MASNeo 1 hour ago
  I use Apertus too, including as the driver for an agent (not a coding agent), and find it useful enough. What was your challenge?
_pdp_ 2 hours ago
I want to believe.