[Air-L] Research on Twitter - a couple of questions

xDxD.vs.xDxD xdxd.vs.xdxd at gmail.com
Mon Jan 16 22:52:08 PST 2012


Dear Monica

On Tue, Jan 17, 2012 at 6:29 AM, Monica Barratt <tronica at gmail.com> wrote:

> basically what the authors are doing
> is using Twitter's advanced search function to compile a database of
> Tweets which they then subject to a simple content analysis.

[...]

> (see
> https://twitter.com/#!/search-advanced ). But wouldn't that only bring
> up the tweets from people who have nominated their home city or
> geolocated their tweets? Don't many users do neither of these things
> and therefore the corpus of data would be incomplete?
>
>
From my experience (a series of projects ranging from research to business),
this is a complex question.

Twitter allows handling geographic data in more than one way.

The "near this place" functionality is only partially reliable: it uses the
GPS coordinates coming from your smartphone, but only if that is the device
you are using to tweet and only if you have opted in to the collection of
this kind of data.

Even then, it is reliable only if your GPS has enough satellite coverage.
Or, at least, this depends on how much detail you need: if you only need a
general indication of the person's location (e.g. city level), I'd say you
can use all of these results without any problem.

The reasoning is different if the user is posting from a browser and has
also opted in to geo-data collection.

In this case everything immediately becomes less reliable, because it relies
on using the network topology to map geographical locations, and this depends
largely on your ISP's policies and infrastructure. For example, I am writing
from Rome, but if I post to Twitter my tweet would show up as coming from
Florence (my provider only lets other service providers see its network up to
a gateway located in that city).

Little of this is made evident in the data formats Twitter uses to send you
back the various kinds of results. In these formats the geographical position
can be found in multiple places (user-related and tweet-related), and it can
be delivered either as points (e.g. from GPS) or as geo-referenced polygons
(e.g. when you geolocate through the browser and the estimate can only place
you inside a city's administrative boundaries). And, from what I was able to
see, there is no explicit information about the reliability of the position
provided.

Researchers doing this kind of work also usually need to ask themselves
whether they really need the location of the user, or whether they need to
identify the place being talked about.

In this second case there are several truly reliable techniques which can
be used, such as Named Entity Recognition and Natural Language Processing,
and the two are most commonly used in combination (e.g. using NER to
identify names of places such as cities, shopping malls, landmarks,
neighbourhoods etc., and NLP to understand whether the linguistic context
suggests that the word(s) describe the user talking about that particular
place).
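As a rough illustration of that two-step idea, here is a toy sketch. It is not real NER or NLP: a small hypothetical gazetteer (dictionary lookup) stands in for the entity recogniser, and a hypothetical list of cue words just before the match stands in for the linguistic-context analysis. A real pipeline would use a trained NER model and a proper parser.

```python
import re

# Hypothetical, tiny gazetteer standing in for a trained NER model.
GAZETTEER = {
    "rome": "city",
    "florence": "city",
    "colosseum": "landmark",
    "trastevere": "neighbourhood",
}

# Hypothetical cue words suggesting the user is at or going to the place.
PRESENCE_CUES = {"at", "in", "near", "to", "visiting", "arrived"}

TOKEN = re.compile(r"[a-z']+")


def find_places(text: str) -> list[dict]:
    """Return gazetteer matches plus a naive 'is the user there?' context flag."""
    tokens = TOKEN.findall(text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        kind = GAZETTEER.get(tok)
        if kind is None:
            continue
        # Look at up to two preceding words for a presence cue ("in Rome").
        window = tokens[max(0, i - 2):i]
        hits.append({
            "place": tok,
            "type": kind,
            "presence_cue": any(w in PRESENCE_CUES for w in window),
        })
    return hits
```

For instance, "Just arrived in Rome" flags Rome with a presence cue, while "Rome is in the news again" finds the same place without one: the entity is identical, only the context differs, which is exactly what the NLP step is meant to separate.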

From my experience, using hashtags is effective only in selected scenarios,
for example if you need to understand brand usage (#nike) or very specific,
clearly identifiable words (for example if you need to analyse events,
situations related to the news, or things like that).

In all other cases the analysis of the data needs much more depth, and I
have rarely found cases in which I could avoid using natural language
processing techniques.

But, again, it depends on what you're doing. If you are analysing the ways
in which people digitally participate in an event by using the event's
hashtag, it is perfectly fine; maybe you lose something, but it works. If
you're interested in understanding how people feel about a certain topic or
issue, hashtags won't work, or, at least, they won't allow you to gain a
good, reliable, complete understanding of the scenario.
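A minimal sketch of hashtag-based collection shows both why it is convenient and what it loses. The regex below is a simplified assumption, not Twitter's own hashtag-parsing rules, and the function names are hypothetical.

```python
import re

# Simplified hashtag pattern (an assumption; Twitter's real rules are stricter).
HASHTAG = re.compile(r"#(\w+)")


def hashtags(text: str) -> list[str]:
    """Extract hashtags, lower-cased so #Nike and #nike count as the same tag."""
    return [h.lower() for h in HASHTAG.findall(text)]


def filter_by_hashtag(tweets: list[str], tag: str) -> list[str]:
    """Keep only the tweets that carry the given hashtag."""
    tag = tag.lower().lstrip("#")
    return [t for t in tweets if tag in hashtags(t)]
```

Given `["Loving my new #Nike shoes", "nike run this morning", "#nike #running"]`, filtering on `#nike` keeps the first and third tweets and silently drops the second, even though it is about the same brand. That is acceptable when the question is how people use the hashtag itself, but not when the question is how people feel about the topic.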

> 2. The authors use profile images to ascertain the approximate age and
> gender of account holders. In my experience, many people use profile
> images that do not represent themselves - eg. celebrities or past
> images of themselves or images of themselves with others.
>
>
No, completely unreliable.


> Is this an accurate or useful way of dealing with Twitter profile data
> or is it too flawed as a technique to be useful?
>

We use profile data only on rare and specific occasions, and consider it
(home location, for example) to be largely unreliable.

I hope this has been useful.

And, by the way: hello to all of you AoIR members! :)

I have been lurking for a while now, following the wonderful discussions on
this list. It has been a pleasure reading them, and it will be a pleasure to
join in the discussion again as well.

I am an artist, interaction designer, hacker and robotic engineer. I teach
ubiquitous publishing, cross media and digital design at La Sapienza
University in Rome, at the Rome University of Fine Arts and at ISIA Design in
Florence, and my research focuses on the human mutation brought on by
digital technologies and networks, from the points of view of knowledge,
relationships, learning, and living in urban contexts. Maybe we'll have a
chance to speak more about it.

All the best to you all,
Salvatore



-- 
Salvatore Iaconesi

salvatore.iaconesi at artisopensource.net
xdxd.vs.xdxd at gmail.com
salvatore at fakepress.net

skype: xdxdVSxdxd
---
Art is Open Source
http://www.artisopensource.net

---
FakePress
http://www.fakepress.it


