
Ways of income for the Engagement Project + major code update (with explanation) to check comment quality

@amr008

Good evening everyone, I hope you are doing well.

This post is to notify you of the major changes I made to the code so that copy-pasted and very similar comments can be identified.

Before going into the code, I would like to mention the ways I am trying to raise income for the Engagement Project.

All the tokens earned go to the Engagement Project itself.

SPORTS

Currently @amr008.sports earns through -

  1. Curation
  2. Authoring " SPORTS TODAY " discussion thread daily

From today -

  1. Daily actifit post .

CTP

Currently @amr008.ctp earns through -

  1. Curation
  2. Authoring " Curation Report of Engagement Project " daily .

STEM and LEO

  1. Curation alone .

In near future -

  1. All the Hive earned from @amr008.sports and @amr008.ctp will be used to buy LEO and STEM to give a boost to the Engagement Project.

If you have any suggestions for what more can be added, please let me know in the comments.

Python code update - using FuzzyWuzzy - to keep spammers out of the top 25

The main intention of the Engagement Project is to reward those who engage with others and put effort into that engagement. It would be unfair if I let people who copy-paste a number of comments surpass the efforts of genuine users.

Fuzzy Wuzzy

FuzzyWuzzy is a library which lets us find the similarity between two strings (in layman's terms, two sentences). I have used it in my code to find similar comments -

How does it work? Example
from fuzzywuzzy import fuzz 
from fuzzywuzzy import process 
 
 
s1="Thanks" 
s2="Thanks buddy" 
s3="Thanks a lot buddy" 
 
compare=process.extract(s1,[s2,s3],scorer=fuzz.token_set_ratio) 
print(compare) 

Here I am comparing the first string s1 with s2 and s3. The output is -

O/P = [('Thanks buddy', 100), ('Thanks a lot buddy', 100)]

This means every word of s1 also appears in s2 and s3, so token_set_ratio scores both comparisons at 100.

Example 2-

s1="Thanks a lot " 
s2="Thanks buddy" 
s3="Thanks a lot buddy" 
 
compare=process.extract(s1,[s2,s3],scorer=fuzz.token_set_ratio) 
print(compare) 

O/P = [('Thanks a lot buddy', 100), ('Thanks buddy', 67)]

This means s1, which is "Thanks a lot", is 100% similar to "Thanks a lot buddy" and 67% similar to "Thanks buddy".
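To see where scores like 100 and 67 come from, here is a rough sketch of the token-set idea using only Python's standard library. It is an illustration, not FuzzyWuzzy's actual code: the names `token_set_score` and `_ratio` are made up for this example, and the real library may give slightly different numbers on other strings.

```python
import re
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    """Character-level similarity of two strings as a 0-100 integer."""
    if not a or not b:
        return 0
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_score(s1: str, s2: str) -> int:
    """Approximate token-set similarity (illustration only, not the library's code)."""
    # Lowercase and split into unique word tokens, ignoring punctuation.
    t1 = set(re.findall(r"[a-z0-9]+", s1.lower()))
    t2 = set(re.findall(r"[a-z0-9]+", s2.lower()))

    shared = " ".join(sorted(t1 & t2))   # words both strings have
    only1 = " ".join(sorted(t1 - t2))    # words only in s1
    only2 = " ".join(sorted(t2 - t1))    # words only in s2

    combined1 = f"{shared} {only1}".strip()
    combined2 = f"{shared} {only2}".strip()

    # The best of the three pairwise comparisons is the final score.
    return max(
        _ratio(shared, combined1),
        _ratio(shared, combined2),
        _ratio(combined1, combined2),
    )


print(token_set_score("Thanks", "Thanks buddy"))        # 100
print(token_set_score("Thanks a lot", "Thanks buddy"))  # 67
```

Because the shared words are compared against each full string, a short comment whose words all appear in a longer one scores 100, which is why "Thanks" matches both longer strings completely.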

Let's take some real examples now -

@thatgermandude came forward and told me that he runs a lottery and replies to various authors with similar comments, so he is getting an unfair advantage over others even though his comments are nearly identical.

His latest comments -

Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery Are you sure you have the right LEO Token? I am talking about the LeoFinance Token on hive. Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery Thank you for participating! Today you had no luck... Maybe you will do better in my next Not-a-Lottery

So when I applied the string comparison and ran a test -

Observe the quality points, which determine the rank. Here the value is 0.0239.

Without applying string comparison

He was actually in the top 25 today with a comment quality of 2.2218 (because his comment length, number of comments, and number of people talked to are all high).

I would like to apologize to @thatgermandude for using his example here. I am using it only because he was so honest and voluntarily came forward to tell me about this.

Prevents copy-pasted and highly similar comments from getting upvotes

If you look at the above example, @thatgermandude was at 18th rank, but after I implemented the comment comparison he is now at 163rd rank.

So what if someone decides to take advantage by using a bot in the future? It will be very difficult to rank higher without answering comments manually.

Which comments do I consider similar?

  1. I compare your 1st comment with all your other comments. If any of them returns a match of 60% or above, the 1st comment is considered a similar comment and the count of similar comments goes up.
  2. Then I move to the 2nd comment, compare it with the 3rd through the last comment, and repeat the process from step 1 until all the comments are done.

I arrived at 60% by manually checking a lot of different strings and taking samples from real users' comments.
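The two steps above can be sketched roughly like this. This is a minimal illustration, not the project's actual code: `count_similar` and `simple_ratio` are hypothetical names, and `simple_ratio` is a standard-library stand-in for the scorer (in the real setup you would pass something like `fuzz.token_set_ratio` instead).

```python
from difflib import SequenceMatcher
from typing import Callable, List


def simple_ratio(a: str, b: str) -> int:
    """Stand-in scorer: 0-100 character similarity, case-insensitive."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def count_similar(
    comments: List[str],
    scorer: Callable[[str, str], int] = simple_ratio,
    threshold: int = 60,
) -> int:
    """Count comments that match at least one later comment at or above the threshold."""
    similar = 0
    for i, comment in enumerate(comments):
        later = comments[i + 1:]  # only compare with comments not yet checked
        if any(scorer(comment, other) >= threshold for other in later):
            similar += 1
    return similar


sample = ["same text", "same text", "same text", "QQQ"]
print(count_similar(sample))  # 2
```

Each comment is only compared with the ones after it, so the last comment never adds to the count on its own.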

Spam Alerts

I have also set up an alert in the code using this logic -

  1. If 50% or more of a user's comments are very similar to each other, the code will show the user's name to me.
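As a rough sketch of that alert, assuming the similar-comment count and total comment count per user are already known (the function name `spam_alerts`, the input shape and the user names below are made up for illustration):

```python
from typing import Dict, List, Tuple


def spam_alerts(stats: Dict[str, Tuple[int, int]], min_share: float = 0.5) -> List[str]:
    """Return users whose share of similar comments is at or above min_share.

    stats maps a user name to (similar_comments, total_comments).
    """
    flagged = []
    for user, (similar, total) in stats.items():
        # Skip users with no comments to avoid dividing by zero.
        if total > 0 and similar / total >= min_share:
            flagged.append(user)
    return flagged


# Hypothetical numbers, not real data from the project.
stats = {"user_a": (8, 10), "user_b": (1, 12), "user_c": (5, 10)}
print(spam_alerts(stats))  # ['user_a', 'user_c']
```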

This is how I found out -

The code told me that @erarium has 100% similar comments and it was actually true -

You can check the latest comments here and see -

Is this intentional? Absolutely not. It is a curation project just like mine. It is not their job to tell me they will post the same comments; it is my job to figure it out.

Ranking of @erarium before this code implementation - 11
Ranking of @erarium after this code implementation - 160

I wanted to do this for a very long time and I am very happy I got it working to some extent. This code will be used from tomorrow to rank and curate. This doesn't mean the other factors don't matter anymore. Of course they do. Everything has its own weightage. It just became harder for non-quality comments to be at the top.

@abh12345 and @crokkon, what do you think about this?

Posted with STEMGeeks