Monthly Archives: April 2015

Removing posts from a revenge pornography Website – is it worth it?

Using all the data I have collected, I have been trying to find a way to describe accurately how quickly new posts to a revenge pornography Website are consumed (by which I mean viewed) by the community that uses it.  Some posts gather large numbers of views.  Previous blog posts have shown that posts depicting females attract more views than posts depicting males.

My previous posts have also described how some posts were removed before I was able to collect a full month's data on them.  In total 32% of the posts were removed and 68% remained live.  This has led me to wonder whether the people in the removed posts were better off than those in the posts that remained live, as their pictures could no longer be seen.

Obviously, once a post has been removed it can no longer be seen or commented on.  However, if material posted to the Website is consumed quickly and the majority of views happen in a relatively short time, then speed of removal is also very important in reducing harm to the person depicted.  If the majority of views happen in a short space of time, then it is probable that this is also when the material is most likely to be downloaded and shared.

In order to investigate this further I took the data for all the current posts and worked out the mean views per hour.  I then identified the hour at which 25%, 50% and 75% of the total views had been reached.  I did the same for the male and female data separately.  Here are the results.

Hour Percentiles Reached 
Gender 25% 50% 75%
Male 12 35 139
Female 9 22 64
All 9 23 72
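
As a rough illustration of how figures like these can be worked out, here is a minimal Python sketch.  It assumes the hourly figures are available as a plain list of mean views per hour; the function name and the toy numbers are made up for the example and are not my data.  It reports the first hour by which a given share of all views has accumulated.

```python
from itertools import accumulate

def hour_percentile_reached(mean_views_per_hour, fractions=(0.25, 0.50, 0.75)):
    """Return {fraction: first hour (1-based) by which that share of all views is reached}."""
    total = sum(mean_views_per_hour)
    running = list(accumulate(mean_views_per_hour))  # cumulative views hour by hour
    result = {}
    for fraction in fractions:
        target = fraction * total
        # first hour whose running total meets or exceeds the target
        result[fraction] = next(
            hour for hour, seen in enumerate(running, start=1) if seen >= target
        )
    return result

# Toy data only (hypothetical): views fall away quickly after posting.
toy_hours = [400, 300, 100, 100, 50, 50]
print(hour_percentile_reached(toy_hours))
```

Running the same function on the male-only and female-only series would give one row per group.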

New content is quickly consumed, and it would appear that posts depicting females are more quickly consumed than those depicting males.

Once material has been posted without consent then it is quickly consumed and presumably distributed within the revenge pornography community, suggesting that posts need to be removed as quickly as possible in order to reduce harm.


Revenge Pornography – From Description to Inference

So far I have analysed the data I have collected using descriptive statistics.  The next step is to see if the data can be used to infer anything about the community who use the revenge pornography Website.

The question I would like to answer here is as follows:

Is there a difference between how the community views and engages with posts containing females than with posts containing males?

The rationale behind this question is that I expect that the community of the revenge pornography website I am studying will be more interested in viewing and engaging with material containing females rather than that of males.  I will be exploring this theory through the page views and comments that each group receives.  While the previous work I have done shows that for the data I have gathered this is true, I need to run further tests in order to test whether or not this is applicable to the revenge pornography community as a whole and not just true for the data I have collected.

If you wish to skip the boring statistical bits, you can skip ahead to the conclusion.

Firstly I need to identify the correct test to run in order to test my hypothesis.  If you have read my previous blog posts you will know that my sample contains a much larger number of posts depicting females than posts depicting males.  I have run some tests in order to check for normality of the data.  If the data is normal I can run an independent samples t-test, otherwise I will need to find another test.

I am running the tests here on the posts that remained current throughout the whole 28 days.

Firstly I ran tests of normality checking the distribution of page views depending on gender.

Tests of Normality: Maximum Page Views
Gender K-S Statistic K-S df K-S Sig. S-W Statistic S-W df S-W Sig.
Male .242 18 .007 .847 18 .007
Female .047 378 .044 .956 378 .000
K-S = Kolmogorov-Smirnov (with Lilliefors Significance Correction), S-W = Shapiro-Wilk

Here the important numbers are those that come under Sig.  If these are less than 0.05 then it can be concluded that the data is not normally distributed and an independent samples t-test would be the incorrect test to use.  Using the Kolmogorov-Smirnov and Shapiro-Wilk tests of normality I can state that my data is not normally distributed, so non-parametric tests are needed in order to test my hypothesis.
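
For anyone curious what the Kolmogorov-Smirnov statistic actually measures, here is a minimal Python sketch.  It is not how SPSS produced the figures above: it only computes the statistic (the largest gap between the sample's cumulative distribution and a normal curve fitted to the sample), not the Sig. value, which relies on the Lilliefors correction.  The function name and the two toy samples are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_vs_normal(values):
    """Largest gap between the empirical CDF of the sample and a normal CDF
    fitted with the sample mean and sample standard deviation."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    gap = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # the empirical CDF steps from (i - 1) / n up to i / n at x; check both sides
        gap = max(gap, i / n - cdf, cdf - (i - 1) / n)
    return gap

# Toy samples (hypothetical): an evenly spread sample and a heavily skewed one.
even_sample = list(range(1, 21))
skewed_sample = [2 ** k for k in range(20)]
print(ks_statistic_vs_normal(even_sample), ks_statistic_vs_normal(skewed_sample))
```

The heavily skewed sample gives the larger statistic, which is the kind of departure from normality the test is designed to flag.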

I ran a Mann-Whitney test in order to test my hypothesis and below are the results.

Test Statistics: Maximum Page Views (grouping variable: Gender)
Mann-Whitney U 11.000
Wilcoxon W 102.000
Z -6.042
Asymp. Sig. (2-tailed) .000
Exact Sig. (2-tailed) .000
Exact Sig. (1-tailed) .000
Point Probability .000

Ranks: Maximum Page Views
Gender N Mean Rank Sum of Ranks
Male 13 7.85 102.00
Female 257 141.96 36483.00
Total 270

The important point here is that the difference in page views between the genders is highly significant, with a significance value of less than 0.001.  The mean rank of page views for females in the Ranks table is higher than that for males.  Therefore we can conclude that posts depicting females will attract more views from the revenge pornography Website community than those depicting males.
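
The U and Z values reported by SPSS can be checked by hand from the Ranks table.  The sketch below is a minimal Python version of that arithmetic, using the standard rank-sum formula for U and the normal approximation for Z without any correction for ties (an assumption on my part; SPSS applies a tie correction, so its Z can differ slightly when values are tied).  The function name is my own.

```python
from math import sqrt

def mann_whitney_from_rank_sum(rank_sum_1, n1, n2):
    """U for group 1 and its normal-approximation Z, given group 1's sum of ranks."""
    u = rank_sum_1 - n1 * (n1 + 1) / 2          # U from the sum of ranks
    mean_u = n1 * n2 / 2                        # expected U if the groups do not differ
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # standard deviation of U, no tie correction
    z = (u - mean_u) / sd_u
    return u, z

# Male posts: N = 13, Sum of Ranks = 102.00; female posts: N = 257.
u, z = mann_whitney_from_rank_sum(rank_sum_1=102.0, n1=13, n2=257)
print(u, round(z, 3))
```

With the male sum of ranks of 102.00 this gives U = 11, the value in the Test Statistics table, and a Z that is very close to the SPSS figure.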

Statistical Write Up

A Mann-Whitney test indicated that maximum page views were greater for females (median = 36791) than for males (median = 7847), U = 11.00, p < 0.001.

Is there a difference in the number of comments that a post receives dependent on the gender of the person in the post?

Firstly I tested for normality.

Tests of Normality: Maximum Comments
Gender K-S Statistic K-S df K-S Sig. S-W Statistic S-W df S-W Sig.
Male .261 13 .016 .909 13 .177
Female .142 257 .000 .811 257 .000
K-S = Kolmogorov-Smirnov (with Lilliefors Significance Correction), S-W = Shapiro-Wilk

The Sig. value for the Kolmogorov-Smirnov test is less than 0.05 for both males and females, indicating a deviation from normality and suggesting that a non-parametric test is required in order to explore the data further.

I ran the Mann-Whitney test in order to test my hypothesis and the results are as below.

Ranks: Maximum Comments
Gender N Mean Rank Sum of Ranks
Male 13 55.38 720.00
Female 257 139.55 35865.00
Total 270

Test Statistics: Maximum Comments (grouping variable: Gender)
Mann-Whitney U 629.000
Wilcoxon W 720.000
Z -3.798
Asymp. Sig. (2-tailed) .000
Exact Sig. (2-tailed) .000
Exact Sig. (1-tailed) .000
Point Probability .000

The important point here is that the difference in comments between the genders is highly significant, with a significance value of less than 0.001.  The mean rank of comments for females in the Ranks table is higher than that for males.  Therefore we can conclude that posts depicting females will attract more comments from the revenge pornography Website community than those depicting males.

Statistical Write Up

A Mann-Whitney test indicated that maximum comments were greater for females (median = 14) than for males (median = 9), U = 629.00, p < 0.001.

Conclusion

The community of the revenge pornography Website will engage more with posts containing females than posts containing males.

Next Steps

I have data detailing the number of page views each post gathers on an hourly basis.  I would like to use this data to discover when the most views and comments are likely to occur, showing how quickly material is distributed within the revenge pornography community.  However, this is likely to take a little bit of time to be written up as I need to do a lot of reading first.

I wanna get statistical, Let me hear your numbers talk

I apologise profusely for the terrible pun that is the title of this blog post.  I like statistical analysis and have been looking forward to this point for a few weeks.  Firstly, some boring stuff that you should know.  All analysis for this blog post was completed using SPSS version 22.  The data was collected from a revenge pornography Website over 28 days using a custom built webscraper written in Python with Selenium WebDriver.  The data was stored in an SQL database.  Initial data analysis was undertaken using Excel 2010 and is detailed in previous blog posts.

Data collection was 97.27% successful, with a failure rate of 2.73%.  Reasons for the webscraper failing to run successfully are varied and dependent on different factors.

Descriptive Statistics

The table below details how many posts remained current or were removed over the full data collection period.  This has been split by the gender of the person depicted in the material posted.

Gender Post Status Count %
Male Removed 5 1.3
Male Current 13 3.3
Male Total 18 4.5
Female Removed 121 30.6
Female Current 257 64.9
Female Total 378 95.5
All Removed 126 31.8
All Current 270 68.2
All Total 396 100

As stated before, the majority of posts feature women, and this is true both for posts that remained current and for posts that were removed.

The table below details where the person depicted in the post resides.  While it is tempting to assume that the uploader and the subject are in the same country, it would be incorrect to do so.  It is clear from some of the comments posted that the only contact between the uploader and the subject took place online, and sometimes the two were living in different countries.

Country Count % of all posts
USA 303 76.5%
UK 23 5.8%
Canada 17 4.3%
Denmark 5 1.3%
France 4 1.0%
Australia 4 1.0%
Germany 4 1.0%
Italy 3 .8%
New Zealand 3 .8%
Mexico 3 .8%
Brazil 3 .8%
Netherlands 3 .8%
Venezuela 2 .5%
Russia 2 .5%
Czech Republic 2 .5%
South Africa 2 .5%
Puerto Rico 2 .5%
Poland 2 .5%
Sweden 1 .3%
Austria 1 .3%
Kenya 1 .3%
Japan 1 .3%
Lebanon 1 .3%
Albania 1 .3%
Romania 1 .3%
Hungary 1 .3%
Switzerland 1 .3%
Total 396 100.0%

I also split this data by gender of the subject in the material.

Gender Country Count % of all posts
Male USA 13 3.3%
Canada 2 .5%
UK 1 .3%
Mexico 1 .3%
Kenya 1 .3%
Total 18 4.5%
Female USA 290 73.2%
UK 22 5.6%
Canada 15 3.8%
Denmark 5 1.3%
France 4 1.0%
Australia 4 1.0%
Germany 4 1.0%
Italy 3 .8%
New Zealand 3 .8%
Brazil 3 .8%
Netherlands 3 .8%
Venezuela 2 .5%
Mexico 2 .5%
Russia 2 .5%
Czech Republic 2 .5%
South Africa 2 .5%
Puerto Rico 2 .5%
Poland 2 .5%
Sweden 1 .3%
Austria 1 .3%
Japan 1 .3%
Lebanon 1 .3%
Albania 1 .3%
Romania 1 .3%
Hungary 1 .3%
Switzerland 1 .3%
Total 378 95.5%

The people who were posted to the revenge pornography Website without consent came from 27 different countries, with the majority coming from the USA and other English speaking countries.

I then focused on the age of the person depicted in the material.  I grouped the ages into groups of 10 years, starting at 18.

Age group Count % of all posts
18 – 27 275 69.4%
28 – 37 73 18.4%
38 – 47 37 9.3%
48 – 57 10 2.5%
58 – 67 1 .3%
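
The grouping itself is simple arithmetic.  As a minimal sketch in Python (my grouping was done in SPSS, so the function name and the toy ages here are purely illustrative), each age can be mapped to its 10 year band starting at 18 and the bands then counted.

```python
from collections import Counter

def age_band(age, start=18, width=10):
    """Return the (lowest, highest) age of the band an age falls into, e.g. 30 -> (28, 37)."""
    if age < start:
        raise ValueError("age is below the first band")
    lower = start + ((age - start) // width) * width
    return lower, lower + width - 1

# Toy ages only (hypothetical), counted per band.
toy_ages = [18, 19, 27, 28, 33, 41, 60]
print(Counter(age_band(age) for age in toy_ages))
```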

I split this data into gender of the subject in the images.

Gender Age group Count % of all posts
Male 18 – 27 13 3.3%
28 – 37 4 1.0%
38 – 47 1 .3%
48 – 57 0 0.0%
58 – 67 0 0.0%
Total 18 4.5%
Female 18 – 27 262 66.2%
28 – 37 69 17.4%
38 – 47 36 9.1%
48 – 57 10 2.5%
58 – 67 1 .3%
Total 378 95.5%

I also split this data into groups depending on whether the post stayed current or was removed.

Gender Status Age group Count % of all posts
Male Current 18 – 27 10 2.5%
28 – 37 2 .5%
38 – 47 1 .3%
48 – 57 0 0.0%
58 – 67 0 0.0%
Total 13 3.3%
Removed 18 – 27 3 .8%
28 – 37 2 .5%
38 – 47 0 0.0%
48 – 57 0 0.0%
58 – 67 0 0.0%
Total 5 1.3%
Female Current 18 – 27 167 42.2%
28 – 37 53 13.4%
38 – 47 30 7.6%
48 – 57 6 1.5%
58 – 67 1 .3%
Total 257 64.9%
Removed 18 – 27 95 24.0%
28 – 37 16 4.0%
38 – 47 6 1.5%
48 – 57 4 1.0%
58 – 67 0 0.0%
Total 121 30.6%

So far these tables have helped in analysing the people who appear on the Website.  Next I will be using this data to infer something about the population who use the revenge pornography Website.

Is there a gender bias in the type of posts that attract the most views and comments?

Using the maximum number of page views as the dependent variable and gender as the independent factor, I compared the means of the maximum page views for all the posts that remained current throughout the 672 hours.

Here are the results for a 95% confidence level.

Gender Statistic Std. Error
Male Mean 7936.23 421.44
95% Confidence Interval for Mean Lower Bound 7017.98
Upper Bound 8854.48
Female Mean 38083.40 960.52
95% Confidence Interval for Mean Lower Bound 36191.88
Upper Bound 39974.93  

There is no overlap between the confidence intervals at the 95% level.  This indicates there is a real difference between the mean page views received by posts depicting men and by posts depicting women on a revenge pornography Website.  Running the same test at a 99% confidence level gives us the same result, as can be seen from the table below.

Gender Statistic Std. Error
Male Mean 7936.23 421.44
99% Confidence Interval for Mean Lower Bound 6648.91
Upper Bound 9223.55
Female Mean 38083.40 960.52
99% Confidence Interval for Mean Lower Bound 35590.69
Upper Bound 40576.11
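
The intervals in the tables above can be reproduced, to within rounding, from the mean and the standard error alone.  Here is a minimal Python sketch of that calculation for the 95% level.  The two critical values of Student's t are approximate figures I have looked up and hard-coded for the sketch (an assumption, they are not taken from the SPSS output), and the function name is my own.

```python
def confidence_interval(mean, std_error, t_critical):
    """Two-sided confidence interval for a mean: mean +/- t * standard error."""
    margin = t_critical * std_error
    return mean - margin, mean + margin

# Approximate two-sided 95% critical values of Student's t (looked up, not computed):
T_95_DF12 = 2.1788    # 13 male posts, so 12 degrees of freedom
T_95_DF256 = 1.9693   # 257 female posts, so 256 degrees of freedom

male = confidence_interval(7936.23, 421.44, T_95_DF12)
female = confidence_interval(38083.40, 960.52, T_95_DF256)

# The intervals overlap only if the male upper bound reaches the female lower bound.
overlap = male[1] >= female[0]
print(male, female, overlap)
```

Swapping in the 99% critical values widens both intervals, but as the second table shows they still do not come close to overlapping.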

I ran similar tests replacing page views with maximum comments the post received.  Again this test was only run against posts that remained current for the whole 672 hours and was run at the 99% confidence level.

Gender Statistic Std. Error
Male Mean 8.92 1.04
99% Confidence Interval for Mean Lower Bound 5.74
Upper Bound 12.10
Female Mean 15.63 0.50
99% Confidence Interval for Mean Lower Bound 14.33
Upper Bound 16.93

There is no overlap between the confidence intervals at the 99% level.  This indicates there is a real difference between the mean number of comments received by posts depicting men and by posts depicting women on a revenge pornography Website.

Therefore, there is a difference in the comments and views that men and women who appear on a revenge pornography Website attract, and it would appear that the community who use and engage with revenge pornography are more likely to engage with posts depicting women.

However, as the keen eyed amongst you may have noticed, there is a large difference between the number of women who appeared on the site and the number of men.  Therefore, it will be necessary to run further tests to see if the size of the sample makes a difference.  This will be discussed in my next blog post.

 

Analysis of 28 Days of Data Scraped From a Revenge Pornography Website.

All of my data has been collected and subjected to very simple analysis.  To recap, data was scraped from a revenge pornography Website every hour for 28 days.  The site I identified accepted submissions depicting both male and female subjects.  The home-made pornographic material is displayed on the Website without the consent of the person depicted within it.

In my study I use “post” to describe the original posting to the revenge pornography Website submitted by the uploader.  This contains details about the person depicted, including geographical details, name and age.  There is also some detail about why the uploader is posting the material.  This is followed by the images or video.

Posts are then viewed and commented on.

Here is a summary of the data collected.

Day Number of Posts Male Female Current Removed
1 22 1 21 15 7
2 16 1 15 12 4
3 11 0 11 8 3
4 19 1 18 15 4
5 11 0 11 10 1
6 0 0 0 0 0
7 25 0 25 20 5
8 14 1 13 10 4
9 29 1 28 18 11
10 6 0 6 5 1
11 9 1 8 8 1
12 24 4 20 13 11
13 2 0 2 1 1
14 28 0 28 14 14
15 12 1 11 9 3
16 17 1 16 12 5
17 10 0 10 6 4
18 12 1 11 9 3
19 0 0 0 0 0
20 15 0 15 12 3
21 36 2 34 25 11
22 13 0 13 11 2
23 10 0 10 7 3
24 13 0 13 8 5
25 10 0 10 10 0
26 10 0 10 6 4
27 14 3 11 6 8
28 8 0 8 0 8
Totals 396 18 378 270 126

In total there were 396 posts to the Website over 28 days.

18 of these had men depicted in the posts and 378 had women depicted as the subject of the post.

270 posts stayed current for 672 hours (28 days).  126 were removed before the 672 hours finished.

There were 2 days when there were no posts to the Website.

Below the same data is presented in graphical format in order to make it easier to visualise.

Chart 1

The above chart is pretty self-explanatory, showing how many posts were made on each day over a 28 day period.

Chart 3

Above we can see the posts that remained current and those that were removed grouped by the day they were posted.  68% of posts remained current throughout the data collection and 32% were removed before the 672 hours concluded.

Chart 2

The above chart shows the gender difference of the people depicted in the posts, grouped by the day the post went live.  It is obvious from this chart that more women than men were posted to the revenge pornography Website.  In fact, 95% of the posts showed women, with only 5% showing men.

Not only is there a gender bias in who is depicted in the posts, but there is also a gender bias in the number of views a post receives.

Chart 4

This chart shows the number of views each post receives if the person depicted in it is male and where the post remained current for the whole 28 days.

Chart 5

This chart shows the number of views each post receives if the person depicted in it is female and where the post remained current for the whole 28 days.

Chart 6

The data has been amalgamated into the above chart where it can be seen that there is a slight crossover between the highest viewed male and lowest viewed female.

Below are the messy graphs.  The first one shows the cumulative number of views each post that remains current receives over the 672 hours for female subjects.  From the graph you can see that the highest number of views happen early on in the 672 hours.

Chart 7

The second graph is easier to read and shows the cumulative number of views each post that remains current receives over the 672 hours for male subjects.  Similar to the above graph, you can see that the highest number of views happen early on in the 672 hours.

Chart 8

The next two graphs show the discrete views each post receives every hour.  The first graph shows the female posts and the second graph shows the male posts.  Both show that the greatest number of views happen early on in the life span of the post.

Chart 9 Chart 10
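
The discrete hourly figures come from differencing the cumulative counts.  A minimal Python sketch of that step is below.  It assumes each post's cumulative view count is available as a list with one reading per hour, and it treats the first reading as the views gained in the first hour; the function name and the toy numbers are hypothetical.

```python
def hourly_from_cumulative(cumulative_views):
    """Turn a running total of views into the number of new views in each hour."""
    previous = 0
    hourly = []
    for total in cumulative_views:
        hourly.append(total - previous)  # new views since the previous reading
        previous = total
    return hourly

# Toy data only: a post that gathers most of its views straight away.
print(hourly_from_cumulative([120, 200, 230, 235]))
```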

This stage has been useful in getting to know my data and identifying trends in the data.  I will next be trying out some simple statistical techniques in order to understand the community that surrounds a revenge pornography Website.

Revenge Pornography – Analysing Week 4’s Data

The final week’s data has been analysed and here are the results.

The summary of the data is as shown below.

Day Number of Posts Male Female Current Removed
22 13 0 13 11 2
23 10 0 10 7 3
24 13 0 13 8 5
25 10 0 10 10 0
26 10 0 10 6 4
27 14 3 11 6 8
28 8 0 8 0 8

There were 3 posts depicting men on the Website during this week, only 1 of which remained current throughout the whole 28 days (672 hours).  Day 28 is the only day where all the posts were removed.

This data was converted into chart form in an attempt to make it easier to understand.  The first chart shows the posts made to the Website by daily frequency.

Chart 1

The second chart shows the same data, but here it is split by the gender of the person who is the subject of the pornographic material posted.

Chart 2

The same data is again displayed, this time by status of the post: whether it stayed up (current) for the whole 672 hours or whether it was removed.

Chart 3

 

The posts that remained current for the whole month were then analysed to see how many views each post received.  Two charts are presented, one for females and one for males.

 

Chart 5 Chart 6

Again, the following charts are a mess, but useful for identifying trends.  The following two charts show the number of cumulative views per hour for each post with a status of current, split by gender.

Chart 8

Chart 8

The next two charts show the number of actual views per hour for all current posts split by gender.

Chart 9 Chart 10

 

Next I will be focusing on all the data that has been collected over the 28 days.

Revenge Pornography – Analysing Week 3’s Data

The analysis of the data I have scraped from a revenge pornography website has continued, and here is the data from week 3.

Firstly, here is the summary of the data that was collected.

Day Number of Posts Male Female Current Removed
15 12 1 11 9 3
16 17 1 16 12 5
17 10 0 10 6 4
18 12 1 11 9 3
19 0 0 0 0 0
20 15 0 15 12 3
21 36 2 34 25 11

The first thing to note is that on day 19 there were no posts to the revenge pornography Website.  There were 5 posts depicting men on the Website during this week, and all 5 remained current throughout the whole 28 days (672 hours).

Again the data was converted into chart form in an attempt to make it easier to understand.  The first chart shows the posts made to the Website by daily frequency.

Chart 1

 

The second chart shows the same data, but here it is split by the gender of the person who is the subject of the pornographic material posted.

Chart 2

 

The same data is again displayed, this time by status of the post: whether it stayed up (current) for the whole 672 hours or whether it was removed.

Chart 3

 

The posts that remained current for the whole month were then analysed to see how many views each post received.  Two charts are presented, one for females and one for males.

Chart 4 Chart 5

 

Again, the following charts are a mess, but useful for identifying trends.  The following two charts show the number of cumulative views per hour for each post with a status of current, split by gender.

Chart 6 Chart 7

 

The next two charts show the number of actual views per hour for all current posts split by gender.

Chart 8 Chart 9

 

As before, it would appear from this data that most views to the posts take place early on, within the first 50 hours.  However, further analysis will need to be done.