Scraping Open Web Discussions

We’re talking about Scraping Open Web Discussions, with Pedro Maioli, Data Analyst.
Agenda
- Reputation
- Figurative vs. literal
- AI rep vs. human rep
- Reputation data collection methods
- Traditional vs. online
- Planned vs. spontaneous
- Open Web reputation data
- Some examples
- Turning scraping into product
- Cases
Reputation Precedes Transaction
Why do I want to talk about reputation? Reputation precedes transaction, in every sense of these words.
We can talk about the figurative sense: the buyer believes the seller they are buying from is trustworthy, and that they are going to receive a good-quality product or service for the sum they’re going to spend.
In many ways, what I’ve caught myself doing was either selling the client (access to/knowledge about) its own reputation or even selling reputation itself!
You can’t even see organic results unless you scroll down.
But – BUT – Google does not lie. If the user wants Expedia, he or she WILL have Expedia. A whole Google box for Expedia.
My take is this: if building reputation amongst users makes even a small fraction of them search for the brand (like Airbnb does), those users will still take the click-the-organic-result route. Elon Musk said Google is like a mountain: you can climb it, but not move it.
It won’t solve the problem, but it helps. It gives us time to figure out our next move.
We all know what SEO is …
Offline methods
Traditional “offline” methods included door-to-door/street surveying, telemarketing, and even SMS texting. Clients tend to see these methods as boring, mildly inconvenient or even invasive.
Refusing to participate is a common answer.
Online methods
And then, there was the internet. Feedback pop-ups, slightly less inconvenient, spread across many websites in all sorts of forms.
Of course, by then, customers are already aware of your service to some extent: sometimes they have even completed the transaction (“how’d you enjoy buying with us?”), other times they are just wandering around your website, like a person walking into a department store.
The point is, they have to be actively interacting with the company in order to be prompted to give feedback and tell us about the reputation we want to know. Everyone here has probably seen a “rate our app on Play Store/App Store” dialog. That’s another good example. Again, the user has already downloaded the app, which only allows us to receive feedback from a user “inside” our dept. store.
Methods/Landscape
- Users ‘inside dept. store’
- Willing to participate, but only if it’s really quick
- Still hard for companies to identify key points on reputation
- Medium-to-hard to identify key points on product/service class (not only client brand)
- Customers hardly see the ‘we value your opinion’ concept behind this
Differ
Constant factor: users WEREN’T giving their opinion in the first place;
They were ALL doing different things when asked;
They won’t give it spontaneous or genuinely interested thought.
What will they do?
Dear every hotel and airline, every @expedia and @hotwire:
I don’t want to tell you about my stay. I don’t want to tell you about my flight. I don’t want to tell you about check-in or check-out.
The worst part of my user experience is your incessant asking about it.
STOP.
— Jeff Speck (@JeffSpeckAICP) August 29, 2019
The mere act of answering a survey about one’s perception of someone else’s reputation might change the outcome.
(That’s the thing about quantum computing as well! Qubits instead of bits: if you measure a qubit, its state collapses, so before measuring you can only work with probabilities of its value.)
So where do people share these opinions spontaneously? Guess what: humans came up with this thing called “conversation”, where you gather with friends and ramble about things you did or will do, like the location or platform you’ve chosen to book hotels for the next holiday. This can happen either AFK, like these lovely people from stock photos at a bar, or virtually, through messaging/social networking apps where we can find our friends, in group chats or one-on-one conversations.
Unfortunately, this kind of data is usually private and protected by laws such as GDPR, unless you are Facebook or Google (technically, they DO share it, but on their own terms, making money off of it by selling API plans to automation tools).
Conversations Become Discussions
Enter the open web: whenever we expand the scope of our conversations to a searchable/indexable place, they usually become available not only to friends but also to acquaintances, people with common interests, brands or even total strangers.
Here’s what I find magical about these places: each channel will obviously have a different userbase and focus. Some of them will hold angrier people, and we might find lots of bad PR or free hate, but we also find genuine opinions from around the globe. In many of them, it’s culturally acceptable to weigh in on a discussion even as a total stranger.
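Since the open web is, by definition, the searchable/indexable part, a practical first step before collecting from any of these channels is checking what a site’s robots.txt allows. The sketch below uses Python’s standard `urllib.robotparser`; the robots.txt content, the `reputation-bot` user agent and the forum URLs are hypothetical, and the file is parsed from a string so the example runs offline. Keep in mind robots.txt is a crawling convention, not a legal clearance: each platform’s terms of service still apply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example forum. A real crawler would
# fetch the live file with rp.set_url(".../robots.txt") and rp.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def check_urls(robots_txt, agent, urls):
    """Map each URL to True/False: may `agent` fetch it under this robots.txt?"""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {url: rp.can_fetch(agent, url) for url in urls}


if __name__ == "__main__":
    urls = [
        "https://forum.example.com/t/best-booking-sites",
        "https://forum.example.com/private/messages",
    ]
    for url, ok in check_urls(ROBOTS_TXT, "reputation-bot", urls).items():
        print("allowed" if ok else "blocked", url)
```

Against a live site you would swap `parse()` for `set_url()` plus `read()`, and also honor `rp.crawl_delay(agent)` when the site declares one.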
Landscape
- 3.48 billion social media users in 2019
- 71% of consumers who have had a good social media interaction experience with a brand are likely to recommend it to others
- 96% of the people that discuss brands online do not follow those brands’ owned profiles
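That last figure suggests that listening only to @-mentions of a brand’s owned profiles misses most of the conversation. Here is a minimal sketch of matching brand mentions in free text, assuming a plain list of post strings and a hand-picked list of brand terms (both hypothetical here). It catches the plain name, the @handle and the #hashtag form, case-insensitively and on whole words only.

```python
import re


def find_brand_mentions(posts, brand_terms):
    """Return the posts that mention any brand term.

    Case-insensitive and whole-word: "Expedia", "@expedia" and "#Expedia"
    match, while a term buried inside a longer word does not.
    """
    if not brand_terms:
        return []  # an empty alternation would match every post
    alternatives = "|".join(re.escape(term) for term in brand_terms)
    # (?<!\w) and (?!\w) act as word boundaries that also allow a leading @ or #.
    pattern = re.compile(rf"(?<!\w)[@#]?(?:{alternatives})(?!\w)", re.IGNORECASE)
    return [post for post in posts if pattern.search(post)]


if __name__ == "__main__":
    # Hypothetical posts, written for illustration only.
    posts = [
        "Booked my flight through Expedia, smooth as always",
        "@Expedia still waiting on my refund",
        "Notes are in myexpedia_trip.txt",
        "Still need to plan where to stay in Lisbon",
    ]
    for post in find_brand_mentions(posts, ["Expedia"]):
        print(post)
```

Note that the last post, an unbranded planning question, is not caught by brand matching at all; spotting those needs a separate search on topic keywords.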
Here are a few examples from the 2-3 days I spent checking Twitter (yeah, I do like Twitter) around August 28-29th.
Each conversation is unique; for some of them, we can even argue it’s acceptable for the company itself to interact. Lots of brands do this, including media providers like Netflix or Amazon, and credit card providers as well.
- Acceptable example? The company can sympathize with the user and point them to a support channel (Expedia does a nice job of that from what I’ve seen in the past weeks)
- (déjà vu!) Unacceptable example?
- Unbranded search (“still need to plan”): an opportunity to weigh in
- Simple opportunity to endorse good PR through like/retweeting actions
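The cases above can be roughed out as a first-pass triage rule set. This is a hypothetical keyword-based sketch, not how Expedia or any other brand actually does it: the cue lists and action names are made up, matching is naive substring search, and anything that doesn’t fit a rule is left for a human to monitor.

```python
from dataclasses import dataclass

# Hypothetical cue lists; a real setup would tune these per brand and language.
COMPLAINT_CUES = ("refund", "cancelled", "canceled", "still waiting", "worst", "never again")
PRAISE_CUES = ("love", "great", "smooth", "thank you", "thanks", "best")
PLANNING_CUES = ("still need to plan", "any recommendations", "where to stay", "looking for")


@dataclass
class Triage:
    action: str  # hypothetical action name for the social media team
    reason: str


def triage(post: str, brand: str) -> Triage:
    """Assign a first-pass action to a post using naive substring matching."""
    text = post.lower()
    branded = brand.lower() in text
    # Complaints are checked first so they win over polite/sarcastic praise words.
    if branded and any(cue in text for cue in COMPLAINT_CUES):
        return Triage("reply_with_support_channel", "branded complaint")
    if branded and any(cue in text for cue in PRAISE_CUES):
        return Triage("like_or_retweet", "branded praise")
    if not branded and any(cue in text for cue in PLANNING_CUES):
        return Triage("consider_weighing_in", "unbranded planning question")
    return Triage("monitor_only", "no clear opportunity")


if __name__ == "__main__":
    for post in [
        "@Expedia still waiting on my refund",
        "Booked through Expedia, smooth as always",
        "Still need to plan my trip to Tokyo",
    ]:
        print(triage(post, "Expedia"))
```

Because complaint cues are checked before praise cues, a sarcastic “@Expedia thanks, still waiting on my refund” lands in the support queue rather than getting a retweet.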
Also, look at how global open web discussions are: Japan, India, US, Europe, professional, personal…
Things to avoid when joining these discussions:
- Trying desperately to make people buy your product
- Trying desperately to look cool and pulling a “How do you do, fellow kids”
- Scheduling a billion automated posts that no one will engage with because they’re too focused on your product
The range of open web discussions is wide, and user engagement is very targeted in some places.
Detecting good reputation is as valuable as detecting bad reputation, since we can learn from both kinds of occurrences.
Since Reddit doesn’t have the 280-character-per-post limit of the previous Twitter examples, I think it’s better suited to analyzing in-depth discussions.
If this wasn’t a coordinated inside job, give this guy a raise.
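Reddit’s nested replies are what make its threads good for in-depth analysis, but they also mean the data comes back as a tree. Here is a minimal sketch that flattens a thread into rows, assuming the shape of Reddit’s public JSON listing (comments as `t1` children, `replies` holding another Listing or an empty string, and `more` stubs for collapsed branches). The sample is a hand-written, trimmed stand-in rather than real data, and fetching plus authentication are left out.

```python
def flatten_comments(listing, depth=0):
    """Yield (depth, author, score, body) for every comment in a Reddit-style
    comment Listing, depth-first, so replies follow the comment they answer."""
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":
            continue  # skip "more" stubs; expanding them needs another request
        data = child.get("data", {})
        yield depth, data.get("author"), data.get("score", 0), data.get("body", "")
        replies = data.get("replies")
        if isinstance(replies, dict):  # a comment with no replies has "" here
            yield from flatten_comments(replies, depth + 1)


# Hand-written, trimmed stand-in for the comments half of a thread's JSON
# (assumed shape; authors and text are hypothetical).
SAMPLE = {
    "kind": "Listing",
    "data": {"children": [
        {"kind": "t1", "data": {
            "author": "traveler_a", "score": 12,
            "body": "Booked the same hotel on two sites, prices differed a lot.",
            "replies": {"kind": "Listing", "data": {"children": [
                {"kind": "t1", "data": {
                    "author": "traveler_b", "score": 5,
                    "body": "Same here, I always compare first.",
                    "replies": ""}},
            ]}},
        }},
        {"kind": "more", "data": {"count": 3}},
    ]},
}

if __name__ == "__main__":
    for depth, author, score, body in flatten_comments(SAMPLE):
        print("  " * depth + f"[{score}] {author}: {body}")
```

Keeping `depth` in each row preserves who is answering whom, which is most of what makes a long thread worth reading as a discussion rather than as loose posts.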
Turning Scraping into Product/Cases
Turning this kind of research into a product can be done in many different ways, which I’ve chosen to group into three classes for agencies:
- The social media listening/managing way
- The ‘create-your-platform’ way
- The ‘insightfulness/BI’ way
Let’s keep in mind that each of these has its own set of tools and is usually biased towards delivering a specific outcome.
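As one illustration of the ‘insightfulness/BI’ class, labeled mentions can be rolled up into per-channel shares that a dashboard could plot. A minimal sketch, assuming mentions have already been collected and labeled as `(channel, label)` pairs; the channels, labels and counts below are hypothetical.

```python
from collections import Counter, defaultdict


def label_shares(mentions):
    """Return {channel: {label: share of that channel's mentions}}."""
    counts = defaultdict(Counter)
    for channel, label in mentions:
        counts[channel][label] += 1
    return {
        channel: {label: n / sum(c.values()) for label, n in c.items()}
        for channel, c in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical labeled mentions, e.g. the output of an earlier triage step.
    mentions = [
        ("twitter", "complaint"),
        ("twitter", "praise"),
        ("twitter", "praise"),
        ("reddit", "question"),
    ]
    for channel, shares in label_shares(mentions).items():
        print(channel, {k: round(v, 2) for k, v in shares.items()})
```

Shares rather than raw counts make channels of very different sizes comparable, which matters given how differently each channel’s userbase behaves.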