Nex-AGI's Brazilian LLM Exposed as Open-Source Merge

Brazilian AI company Nex-AGI is facing significant backlash after its celebrated "homegrown" large language model, Nex-N2-7B, was identified as a merge of existing open-source models. The discovery, detailed in a public GitHub issue, directly contradicts the company's claims of developing the model from the ground up. This incident shines a harsh light on the importance of verification and transparency in the increasingly competitive AI landscape.

A 'Homegrown' Claim Under Scrutiny

Nex-AGI initially presented its Nex-N2-7B model as a landmark achievement for Brazil's technology sector. The company's narrative centered on creating a powerful, sovereign AI built with local expertise. This claim garnered positive attention, positioning Nex-AGI as a leader in the Latin American AI scene and a testament to the country's growing technological prowess.

However, the lack of transparency surrounding the model's training data, architecture, and computational resources quickly drew suspicion from the open-source AI community. As developers began to probe the model's files, inconsistencies began to emerge, leading to a full-blown community investigation.

The Community's Code-Level Investigation

A detailed analysis posted on the project's GitHub repository laid out compelling evidence against Nex-AGI's claims. The investigation, which anyone can review, pointed to several red flags indicating the model was not an original creation.

Key findings include:

Identical Architecture: The model's config.json file was a perfect match for another open-source model, Emerald-7B-V2.1.
Matching File Sizes: The model's weight files, known as .safetensors, had the exact same byte-for-byte size as the suspected source models.
Tensor Analysis: A deeper dive into the model's tensors (the core data structures of a neural network) revealed that the model appears to be a direct merge of two specific open-source models: 'Emerald-7B-V2.1' and 'Wey-L-7B-V3.'

Model merging is a legitimate technique where developers combine existing models to create new ones with hybrid capabilities. However, presenting a merged model as one built "from scratch" is a serious misrepresentation that violates the trust-based principles of the open-source community. For weekly insights into open-source developments and AI ethics, subscribe to the AI Breaking Wire newsletter and join over 10,000 AI professionals staying informed.

Why It Matters

The Nex-AGI controversy is a powerful case study in the power of community oversight in the AI industry. As companies race to announce new and more powerful models, claims require rigorous scrutiny. This incident serves as a critical reminder that transparency regarding training data, methods, and model origins is not just an ethical ideal but a fundamental requirement for a healthy and trustworthy AI ecosystem. Without it, the collaborative spirit of open-source innovation is put at risk.

Brazil's 'Homegrown' Nex-AGI LLM Exposed as a Model Merge

A 'Homegrown' Claim Under Scrutiny

The Community's Code-Level Investigation

Why It Matters

Comments

Comments