[Air-L] CDT online event - May 24, 2023 (10 am ET) - Can Large Language Models Analyze Non-English Content?

Dhanaraj Thakur dthakur at cdt.org
Wed May 10 13:45:04 PDT 2023


Hi everyone,

Please see details below about an online event CDT is hosting on May 24 
at 10am ET. This will follow the upcoming launch of our research report 
"Lost in Translation: Large Language Models in Non-English Content 
Analysis." In the meantime please RSVP for our event here 
<https://www.eventbrite.com/e/mind-the-gap-can-large-language-models-analyze-non-english-content-tickets-631677633807>. 


Thanks,

Dhanaraj


*Mind the Gap: Can Large Language Models Analyze Non-English Content?*

*Time:* 10:00 AM EDT

*Date:* May 24, 2023

From search engines to social media to hiring algorithms, automated 
systems increasingly shape people’s online experiences worldwide. 
Despite internet users speaking thousands of languages, most of these 
systems are primarily trained using English-language data. Computer 
scientists claim that they have found a solution to this linguistic gap 
in a new technology called “multilingual language models.” Multilingual 
language models work similarly to the language models that power new 
generative systems like ChatGPT, but instead of being trained on 
millions of examples of text in mostly one language, they pull text from 
dozens or hundreds of languages and learn connections between them.

But do these multilingual language models work as well as companies say 
they do? A new technical primer 
<https://cdt.org/insights/languages-left-behind-automated-content-analysis-in-non-english-languages/> by 
CDT shows that these systems may have key shortcomings that are only 
compounded when they are used to analyze non-English languages.

This panel will convene NLP researchers building systems and digitizing 
languages spoken by millions of people in India and South Africa, 
content policy experts evaluating the impact these systems have on 
users’ rights, and CDT’s research and policy team members for a deep 
dive into how these multilingual language models work, what their 
capabilities and limitations are, how they can be improved, and what’s 
at stake when these systems fall short.

Speakers:

  * Aliya Bhatia <https://cdt.org/staff/aliya-bhatia/>, Center for
    Democracy & Technology
  * Gabriel Nicholas <https://cdt.org/staff/gabriel-nicholas/>, Center
    for Democracy & Technology
  * Dr Monojit Choudhury
    <https://www.microsoft.com/en-us/research/people/monojitc/>, Turing
    Institute
  * Dr Vukosi Marivate
    <https://africa.harvard.edu/people/vukosi-marivate>, Masakhane
  * Jacqueline Rowe <https://www.gp-digital.org/team/jacqueline-rowe/>,
    Global Partners Digital

*RSVP here* 
<https://www.eventbrite.com/e/mind-the-gap-can-large-language-models-analyze-non-english-content-tickets-631677633807>



-- 

*Dhanaraj Thakur* (he/him) | Research Director
Center for Democracy & Technology | *cdt.org <https://cdt.org/>*
*E:* dthakur at cdt.org | *P:* +1 202 407 8849

