[Air-L] CDT online event - May 24, 2023 (10 am ET) - Can Large Language Models Analyze Non-English Content?
Dhanaraj Thakur
dthakur at cdt.org
Wed May 10 13:45:04 PDT 2023
Hi everyone,
Please see details below about an online event CDT is hosting on May 24
at 10am ET. This will follow the upcoming launch of our research report
"Lost in Translation: Large Language Models in Non-English Content
Analysis." In the meantime, please RSVP for our event here
<https://www.eventbrite.com/e/mind-the-gap-can-large-language-models-analyze-non-english-content-tickets-631677633807>.
Thanks,
Dhanaraj
*Mind the Gap: Can Large Language Models Analyze Non-English Content?*
*Time:* 10:00 AM EDT
*Date:* May 24, 2023
From search engines to social media to hiring algorithms, automated
systems increasingly shape people’s online experiences worldwide.
Despite internet users speaking thousands of languages, most of these
systems are primarily trained using English-language data. Computer
scientists claim that they have found a solution to this linguistic gap
in a new technology called “multilingual language models.” Multilingual
language models work similarly to the language models that power new
generative systems like ChatGPT, but instead of being trained on
millions of examples of text in mostly one language, they pull text from
dozens or hundreds of languages and learn connections between them.
But do these multilingual language models work as well as companies say
they do? A new technical primer
<https://cdt.org/insights/languages-left-behind-automated-content-analysis-in-non-english-languages/> by
CDT shows that these systems may have key shortcomings which only
compound when used to analyze non-English languages.
This panel will convene NLP researchers building systems and digitizing
languages spoken by millions of people in India and South Africa,
content policy experts evaluating the impact these systems have on
users’ rights, and CDT’s research and policy team members for a deep
dive into how these multilingual language models work, what their
capabilities and limitations are, how they can be improved, and what’s
at stake when these systems fall short.
Speakers:
* Aliya Bhatia <https://cdt.org/staff/aliya-bhatia/>, Center for
Democracy & Technology
* Gabriel Nicholas <https://cdt.org/staff/gabriel-nicholas/>, Center
for Democracy & Technology
* Dr Monojit Choudhury
<https://www.microsoft.com/en-us/research/people/monojitc/>, Turing
Institute
* Dr Vukosi Marivate
<https://africa.harvard.edu/people/vukosi-marivate>, Masakhane
* Jacqueline Rowe <https://www.gp-digital.org/team/jacqueline-rowe/>,
Global Partners Digital
*RSVP here*
<https://www.eventbrite.com/e/mind-the-gap-can-large-language-models-analyze-non-english-content-tickets-631677633807>
--
*Dhanaraj Thakur* (he/him) | Research Director
Center for Democracy & Technology | *cdt.org <https://cdt.org/>*
*E:* dthakur at cdt.org | *P:* +1 202 407 8849