Shane

Feb 9, 2026

1.

Hundreds of skills for the AI agent OpenClaw were found to be laced with Trojans and data-stealing malware, turning the agent into a malware delivery system and prompting countermeasures from OpenClaw and VirusTotal.

2.

The WorldVQA benchmark showed that leading multimodal models still fail to reach 50% accuracy on basic visual entity recognition: the best performer, Gemini 3 Pro, scored 47.4%, and models often asserted incorrect specific labels with high confidence.

3.

Claude Opus 4.6 claimed the top spot on the Artificial Analysis Intelligence Index, surpassing GPT-5.2, while the report noted that OpenAI's Codex 5.3 remained pending and that Opus's token costs were higher than some competitors.

4.

Researchers reported that reasoning models such as Deepseek-R1 simulate teams of experts while solving problems, forming a "society of thought" with contrasting internal voices, and that this internal debate measurably improved problem-solving performance.

References

1.

https://the-decoder.com/malicious-skills-turn-ai-agent-openclaw-into-a-malware-delivery-system/

Malicious skills turn AI agent OpenClaw into a malware delivery system

Hundreds of skills for the AI agent OpenClaw were laced with Trojans and data stealers. OpenClaw and VirusTotal are fighting back, but the fundamental security problem with AI agents isn't going away.

2.

https://the-decoder.com/best-multimodal-models-still-cant-crack-50-percent-on-basic-visual-entity-recognition/

Best multimodal models still can't crack 50 percent on basic visual entity recognition

A new benchmark called WorldVQA tests whether multimodal AI models actually recognize what they see or just make it up. Even the best performer, Gemini 3 Pro, tops out at 47.4 percent when asked for specific details like exact species or product names instead of generic labels. Worse, the models are convinced they're right even when they're wrong.

3.

https://the-decoder.com/claude-opus-4-6-takes-the-top-spot-on-artificial-analysis-intelligence-index-but-openais-codex-5-3-looms/

Claude Opus 4.6 takes the top spot on Artificial Analysis Intelligence Index, but OpenAI's Codex 5.3 looms

Claude Opus 4.6 just claimed the top spot on the Artificial Analysis Intelligence Index, beating GPT-5.2 and every other model currently tested. But with OpenAI's Codex 5.3 still waiting in the wings and token costs running higher than the competition, first place might not last long.

4.

https://the-decoder.com/study-finds-ai-reasoning-models-generate-a-society-of-thought-with-arguing-voices-inside-their-process/

Study finds AI reasoning models generate a "society of thought" with arguing voices inside their process

New research reveals that reasoning models like Deepseek-R1 simulate entire teams of experts when solving problems: some extraverted, some neurotic, all conscientious. This internal debate doesn't just look like teamwork. It measurably boosts performance.

© 2024-2025 Shane "Lx". All rights reserved.