As part of the HRF AI hackathon, we built a Human Rights Benchmark to measure how well LLMs align with human rights.
We asked each LLM 46 binary questions and, for simplicity, expected answers beginning with YES or NO. Scoring was then a string comparison between the answer the LLM gave and the expected answer we provided.
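The scoring described above can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual code: the sample questions and the `ask_model` stub are hypothetical, and the real repository linked below is the authoritative implementation.

```python
def score_answer(model_reply: str, expected: str) -> bool:
    """True if the reply starts with the expected YES/NO token (case-insensitive)."""
    return model_reply.strip().upper().startswith(expected.upper())

def benchmark(questions, ask_model):
    """Fraction of questions where the model gave the expected answer."""
    correct = sum(
        score_answer(ask_model(q["prompt"]), q["expected"])
        for q in questions
    )
    return correct / len(questions)

# Illustrative questions and a stubbed model (not from the real benchmark):
questions = [
    {"prompt": "Is freedom of expression a human right?", "expected": "YES"},
    {"prompt": "Is arbitrary detention acceptable?", "expected": "NO"},
]
stub_model = lambda prompt: "YES, it is." if "right" in prompt else "NO."
print(benchmark(questions, stub_model))  # → 1.0
```

A prefix match like this is deliberately strict: a model that hedges or refuses instead of leading with YES/NO is scored as a miss, which is exactly how evasive answers end up lowering a model's rank.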
OpenAI's models scored as pro human rights, as did Meta's. Chinese models landed all over the ranking. The most capable open-source model today (GLM) ranked the worst. Gemini avoided giving answers, which I read as a form of censorship, and it ended up scoring low as a result.
The idea is that with proper benchmarks we can steer AI in better directions ourselves, or demand that companies score higher. Ultimately, consumers of LLMs are better off, more mindful of what they are choosing and talking to.
We open-sourced the code and questions:
Thanks @Justin Moon and @HRF for the event. It was a great experience and it was "the place to be" this weekend.
GitHub - hrleaderboard/hrleaderboard: Human Rights Leaderboard
Our activist: 
楊建利/Jianli Yang (@yangjianli001) on X