Facebook begins using artificial intelligence to describe photos to blind users: Can we use this type of technology in libraries for visually impaired users?
Ask a member of Facebook’s growth team what
feature played the biggest role in getting the company to a billion daily
users, and they’ll likely tell you it was photos. The endless stream of
pictures, which users have been able to upload since 2005, a year after
Facebook’s launch, makes the social network irresistible to a global audience.
It’s difficult to imagine Facebook without
photos. Yet for millions of blind and visually impaired people, that’s been the
reality for over a decade.
Not anymore. Today Facebook will begin
automatically describing the content of photos to blind and visually impaired
users. Called "automatic alternative text," the feature was created
by Facebook’s 5-year-old accessibility team. Led by Jeff Wieland, a former user
researcher in Facebook’s product group, the team previously built closed
captioning for videos and implemented an option to increase the default font
size on Facebook for iOS, a feature 10 percent of Facebook users take advantage
of.
Using VoiceOver to read descriptions of photos out loud
Automatic alt text, which is coming to iOS
today and later to Android and the web, recognizes objects in photos using
machine learning. Machine learning helps to build artificial intelligences by
using algorithms to make predictions. If you show a piece of software enough
pictures of a dog, for example, in time it will be able to identify a dog in a
photograph. Automatic alt text identifies things in Facebook photos, then uses
the iPhone’s VoiceOver feature to read descriptions of the photos out loud to
users. While still in its early stages, the technology can reliably identify
concepts in categories including transportation ("car,"
"boat," "airplane"), nature ("snow,"
"ocean," "sunset"), sports ("basketball court"),
and food ("sushi"). The technology can also describe people
("baby," "smiling," beard"), and identify a selfie.
Last week, I traveled to Facebook’s
accessibility lab in Menlo Park to see the technology in action. Wieland was
there, along with Matt King, a Facebook engineer who is blind. King, who was
born with limited sight and became blind in college, has been advocating for
more accessible computers since the 1980s. Today, he represents Facebook on a
World Wide Web Consortium group responsible for the technical specifications that
make web pages accessible.
The primary way that blind people access the
internet is through a screen reader — software that describes the elements
displayed on a screen (a link, a button, some text, and so on) and makes it
possible to interact with them. The web has evolved over the years to be
friendlier to blind people. For example, the downward-facing triangle you see
on every Facebook post, which allows you to hide the post or report it as spam,
gets described by the screen reader not as a triangle but as "story
options, collapsed pop-up button." That way, blind users know they can
interact with it.
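As a rough illustration of why those labels matter, the Python sketch below composes what a screen reader might announce for a single on-screen element from its accessible name, state, and role. The field names and the composition rule are simplified assumptions; real screen readers derive this information from the platform's accessibility APIs and markup such as ARIA attributes.

```python
# Hypothetical sketch of a screen-reader announcement for one element.
# Field names ("label", "role", "state") are illustrative, not a real API.

def announce(element: dict) -> str:
    """Compose what a screen reader might say for one on-screen element."""
    name = element.get("label") or element.get("filename") or "unlabeled"
    role = element.get("role", "element")
    state = element.get("state")
    if state:
        return f"{name}, {state} {role}"
    return f"{name}, {role}"

# The triangle on every Facebook post, with an accessible name attached:
labeled = {"label": "story options", "role": "pop-up button", "state": "collapsed"}
# An image link that exposes nothing but its file name:
unlabeled = {"filename": "IMG_4032.jpg", "role": "link"}

print(announce(labeled))    # -> "story options, collapsed pop-up button"
print(announce(unlabeled))  # -> "IMG_4032.jpg, link"
```

An element that exposes only a file name, as in the next paragraph, gives a blind user almost nothing to act on.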
But much of the web has long been out of
reach for blind people. "You used to hear file names, and you didn’t know
if they were clickable," King says. "It was a big Easter Egg hunt —
and it wasn’t any fun at all. Even when I found the eggs, a lot of the eggs
were photos. People talk in pictures, and talking in pictures is inherently out
of reach for me." Facebook considered a range of approaches to the
problem. "We don’t want to add a lot of friction," King says.
"We could probably require people when they upload a photo: ‘please
describe this for blind people.’ It would drive people nuts — that would never
work at scale." (This is the actual approach Twitter is taking
to the problem, though adding descriptions is optional.)
Facebook’s scale is enormous: each day, users
upload 2 billion photos across Facebook, Instagram, Messenger, and WhatsApp.
And so the accessibility team turned to Facebook’s artificial intelligence
division, which is building software that recognizes images automatically.
"We need a solution to that problem if people who cannot see photos and
understand what’s in them are going to be part of the community and get the
same enjoyment and benefit out of the platform as the people who can,"
King says.
In a demonstration, King pulled up a few
stories on Facebook that include photos. He set the screen to black so we
couldn’t see anything. If you’d like to re-think everything you ever thought
you knew about web design, watch a blind person use the internet for five
minutes. King normally has his screen reader speak to him incredibly quickly;
the slightest audio cues orient him on the page as the software reads Facebook
posts out loud, identifies links, and exposes various buttons. His fingers were
a blur as he entered commands on a standard MacBook Air. Save for the handful
of words that described what we were seeing on Facebook, I remained totally
lost until King turned the screen back on.
One Facebook post had a photo with the
caption "Sunday night splurge," and the description read aloud by the
phone was "pizza, food." When King turned the screen back on, there
was a photo of a giant pepperoni pizza with olives. Another photo had the
caption "celebrations," and the phone described the photo as
"three people smiling outdoors." It turned out to be … three people
smiling outdoors. "Now I’m really understanding the essence of the
story," King says. "Sometimes it’s just really amazing what one word
can do."
Facebook is not alone in using machine
learning to understand photos; it’s one of a few things artificial intelligence
can currently do with any level of sophistication. Similar technology powers
keyword searches in Google Photos and Flickr. But the technology is still prone
to errors, and millions of objects have yet to be parsed. Last year, Google was
forced to apologize after Photos tagged two black people as
"gorillas."
By default, Facebook will only suggest a tag
for a photo if it is 80 percent confident that it knows what it’s looking at.
But in sensitive cases — including ones involving race, the company told me —
it will require a much higher level of confidence before offering a suggestion.
When it isn’t confident, Facebook simply won’t suggest a description. "In
some cases, no data is better than bad data," Wieland says.
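A rough Python sketch of that rule might look like the following. The 80 percent default comes from the article; the stricter 98 percent floor and the particular sensitive categories are assumptions made up for illustration, since Facebook has not published those details.

```python
# Hypothetical sketch of the thresholding described above: a default confidence
# floor, a stricter floor for sensitive categories, and no suggestion at all
# when the model is not confident enough.

DEFAULT_THRESHOLD = 0.80       # the 80 percent figure cited in the article
SENSITIVE_THRESHOLD = 0.98     # assumption: "much higher" is not quantified
SENSITIVE_CATEGORIES = {"race", "age", "gender"}  # illustrative assumptions

def accept_tag(category: str, confidence: float) -> bool:
    """Return True only if the prediction clears the bar for its category."""
    floor = SENSITIVE_THRESHOLD if category in SENSITIVE_CATEGORIES else DEFAULT_THRESHOLD
    return confidence >= floor

def suggest_tags(predictions):
    """predictions: iterable of (concept, category, confidence) tuples.

    "No data is better than bad data": anything below its floor is dropped
    rather than offered as a weak guess.
    """
    return [concept for concept, category, confidence in predictions
            if accept_tag(category, confidence)]

print(suggest_tags([("pizza", "food", 0.94),
                    ("smiling", "people", 0.83),
                    ("beard", "people", 0.65)]))
# -> ['pizza', 'smiling']
```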
It’s a cliché for tech companies to describe
a project as "just the beginning," but in this case it feels
particularly true. Today it only works on one platform, and only in English.
There are still millions of objects that Facebook can’t recognize with 80
percent confidence. ("Pizza" it knows. "Pepperoni pizza with
olives" is still a ways away.) But the team is already pushing hard on two
new tools: recognizing objects in videos, a technology it first demonstrated in
November; and something it calls "visual Q&A," which will allow
users to ask questions about pictures and receive an answer from Facebook’s AI.
You might ask who is in a photo, for example, and it would tell you the names
of the Facebook friends who appear in it.
At this stage, automatic alt tags represent a
fascinating demonstration of technology. But at scale, they could also
represent a growth opportunity — people with disabilities have been less likely
to use Facebook on average, for obvious reasons. "Inclusion is really
powerful and exclusion is really painful," King says. "The impact of
doing something like this is really telling people who are blind, your ability
to participate in the social conversation that’s going on around the world is
really important to us. It’s saying as a person, you matter, and we care about
you. We want to include everybody — and we’ll do what it takes to include
everybody."
Source | http://www.theverge.com
Pralhad Jadhav
Senior Manager @ Library
Khaitan & Co
Upcoming Event | National Conference on Future Librarianship: Innovation for Excellence (NCFL 2016), April 22-23, 2016.
Note | If anybody uses this post for forwarding on social media or in a newsletter, please give due credit to those who have taken the effort to compile it.