Feature #14601

Feature #12562: Have a web analytics platform like Matomo

Know which resources we would need to run Matomo on our infrastructure

Added by sajolida about 2 years ago. Updated 7 months ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
Infrastructure
Target version:
-
Start date:
09/04/2017
Due date:
% Done:

0%

Feature Branch:
Type of work:
Research
Blueprint:
Starter:
Affected tool:

Description

When importing logs on the prototype:

  • Were all CPU cores used during this process?
  • Was I/O a blocker, i.e. were processes blocked waiting for I/O?
  • Was all available memory used by this process?
  • Did you configure MariaDB in any way to optimize for large DBs?

To start with, we need:

  • The list of package dependencies
  • What access you need besides a shell (e.g. write access to file X, ability to run command Y as root)
  • The list of DBs and directories to back up
  • Resource requirements (ideally: current needs & what you'll need in 2 years).

import logs.webm (1.43 MB) sajolida, 10/26/2017 03:33 PM


Related issues

Related to Tails - Bug #11680: Upgrade server hardware (2017-2019 edition) Resolved 09/19/2016
Related to Tails - Feature #14846: Understand the user agent issue in the logs of our website Resolved 10/13/2017
Related to Tails - Feature #14872: Use Matomo to analyze the 2017 donation campaign Resolved 10/20/2017

History

#1 Updated by intrigeri about 2 years ago

  • Related to Bug #11680: Upgrade server hardware (2017-2019 edition) added

#2 Updated by sajolida almost 2 years ago

  • Blocks Feature #14761: Core work 2017Q4 → 2018Q1: User experience added

#3 Updated by sajolida almost 2 years ago

  • Target version set to Tails_3.3

#4 Updated by sajolida almost 2 years ago

Regarding the resource usage when importing logs, I'm attaching a screencast of my prototype machine importing some logs and running iotop and top. I hope this helps!

  • Were all CPU cores used during this process?

The import_logs script simulates web requests and hits Apache back. MySQL was the biggest CPU eater on 1 thread, and then Apache with 4 threads.

I'm not very good at top, how can I see which cores are used and how much?

  • Was I/O a blocker, i.e. were processes blocked waiting for I/O?

It doesn't seem so, but again, I'm not sure how to check that. I tried running the OS from my hard disk plugged inside the computer instead of from USB as before, and the import speed was roughly the same: 1.5 h per day of logs.
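For scale, here is a rough sketch of what that rate implies for longer periods. It assumes the measured 1.5 h per day of logs stays constant across log files, which this thread does not verify; the function name is only illustrative.

```python
# Measured on the prototype machine in this thread: 1.5 hours of import
# time per day of web server logs.
HOURS_PER_LOG_DAY = 1.5


def import_hours(log_days, hours_per_log_day=HOURS_PER_LOG_DAY):
    """Estimate the wall-clock hours needed to import `log_days` days of logs.

    Assumes a constant import rate (an assumption, not a measurement).
    """
    return log_days * hours_per_log_day


month_hours = import_hours(30)        # 45.0 hours
year_hours = import_hours(365)        # 547.5 hours
year_in_days = year_hours / 24        # 22.8125 days of continuous importing
```

At this rate the import keeps up with incoming logs (1.5 h of work per 24 h of logs), but backfilling a full year would take about three weeks of continuous importing.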

  • Was all available memory used by this process?

No. I have 4GB and only 320MB were used.

  • Did you configure MariaDB in any way to optimize for large DBs?

No.

#5 Updated by sajolida almost 2 years ago

  • Related to Feature #14846: Understand the user agent issue in the logs of our website added

#6 Updated by sajolida almost 2 years ago

  • Related to Feature #14872: Use Matomo to analyze the 2017 donation campaign added

#7 Updated by sajolida almost 2 years ago

  • Assignee changed from sajolida to intrigeri
  • QA Check set to Info Needed

Now I'm wondering if it's crazy to ask you for a disposable VM to test Piwik on our infra as part of the analysis of the donation campaign 2017. See #14872.

I know that I can get Piwik running in 1-2 hours and the donation campaign analysis might be a good occasion to experiment with it with a clear objective (and also avoid the headaches and doubts I had parsing the logs with custom code this year). See #14846.

It might help us get a better idea of the resources we'll need. Then we can destroy that VM or discuss improving it.

But I can also continue to run Piwik on my prototype machine like I've been doing until now if you think that the extra work is not worth it.

#8 Updated by intrigeri almost 2 years ago

  • Assignee changed from intrigeri to sajolida

Now I'm wondering if it's crazy to ask you for a disposable VM to test Piwik on our infra as part of the analysis of the donation campaign 2017.

When would you need it? (I'm asking because last year's analysis was done a looong time after the end of the campaign, so perhaps a relaxed timeframe would work even though I'm sure you want to be faster this time.)

#9 Updated by sajolida almost 2 years ago

  • Target version changed from Tails_3.3 to Tails_3.5

#10 Updated by intrigeri almost 2 years ago

sajolida wrote:

Regarding the resource usage when importing logs, I'm attaching a screencast of my prototype machine importing some logs and running iotop and top. I hope this helps!

It does!

  • Were all CPU cores used during this process?

The import_logs script simulates web requests and hits Apache back. MySQL was the biggest CPU eater on 1 thread, and then Apache with 4 threads.

I'm not very good at top, how can I see which cores are used and how much?

In top, type "1" and you'll get per-core (really: per-hyperthread) usage. Or use htop instead, which is nicer in many ways :)
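If a scripted check is handier than watching top, the same per-core figures can be derived from two snapshots of /proc/stat on Linux. This is only a minimal sketch: the function names are made up for illustration, and "busy" is taken here to mean everything except idle and iowait time.

```python
import time


def parse_cpu_times(stat_text):
    """Return {core_name: (busy_ticks, total_ticks)} from /proc/stat content."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-core lines are named "cpu0", "cpu1", ...; skip the aggregate
        # "cpu" line and every non-CPU line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Keep user, nice, system, idle, iowait, irq, softirq, steal only:
        # the guest fields are already accounted for in user and nice.
        values = [int(v) for v in fields[1:9]]
        not_busy = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values)
        times[fields[0]] = (total - not_busy, total)
    return times


def per_core_busy_percent(before_text, after_text):
    """Percentage of time each core was busy between two snapshots."""
    before = parse_cpu_times(before_text)
    after = parse_cpu_times(after_text)
    usage = {}
    for name, (busy_after, total_after) in after.items():
        busy_before, total_before = before[name]
        total_delta = total_after - total_before
        busy_delta = busy_after - busy_before
        usage[name] = 100.0 * busy_delta / total_delta if total_delta else 0.0
    return usage


def sample_proc_stat(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart (Linux only)."""
    with open(path) as stat_file:
        before = stat_file.read()
    time.sleep(interval)
    with open(path) as stat_file:
        after = stat_file.read()
    return per_core_busy_percent(before, after)
```

Calling sample_proc_stat() while the import runs returns one percentage per hyperthread, which is the same view top gives after typing "1".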

  • Was I/O a blocker, i.e. were processes blocked waiting for I/O?

It doesn't seem so, but again, I'm not sure how to check that.

The "wa" number is "time waiting for I/O completion"; that's around 10% in your case.
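For reference, top derives that number from the kernel's tick counters. A minimal sketch of the computation, assuming Linux and the aggregate "cpu" line of /proc/stat (fifth value is iowait); the function name and the sample figures in the usage are invented for illustration:

```python
def iowait_percent(before_line, after_line):
    """Share of CPU time spent waiting for I/O between two 'cpu' lines.

    Each argument is the aggregate "cpu ..." line of /proc/stat, read at
    two different moments.
    """
    def ticks(line):
        fields = line.split()
        if not fields or fields[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
        # user, nice, system, idle, iowait, irq, softirq, steal
        return [int(v) for v in fields[1:9]]

    deltas = [a - b for a, b in zip(ticks(after_line), ticks(before_line))]
    total = sum(deltas)
    # Index 4 is iowait: time the CPU was idle with I/O requests outstanding.
    return 100.0 * deltas[4] / total if total else 0.0


# Made-up counters chosen so that 10 of the 100 elapsed ticks are iowait.
example = iowait_percent(
    "cpu 100 0 50 800 50 0 0 0 0 0",
    "cpu 150 0 70 820 60 0 0 0 0 0",
)
```

A value that stays low while the CPUs are partly idle suggests the disk is not the limiting factor.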

I tried running the OS from my hard disk plugged inside the computer instead of from USB as before, and the import speed was roughly the same: 1.5 h per day of logs.

Thanks, this is interesting, see below.

  • Was all available memory used by this process?

No. I have 4GB and only 320MB were used.

+ 1.1GB (increasing) in buffer/cache.

So looking at your video, there's no obvious bottleneck: not much I/O wait, not many I/O operations, plenty of free memory, and I see that your CPUs are idle ~40% of the time. So I suspect that either the import script does not send as many HTTP requests in parallel as it could, or Apache/PHP is configured to handle too many parallel requests at a time.

What value did you pass to the --recorders option? The doc says: "It should be set to the number of CPU cores in your server. You can also experiment with higher values which may increase performance until a certain point". If you did not use this option, then I think this explains your results, and I'd like to see a new benchmark that actually tries to use the available hardware resources :)
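As a starting point for such a benchmark, the sketch below builds an import command with --recorders set to the number of CPUs Python reports. The script path, --url and --idsite values are the ones quoted elsewhere in this ticket; the helper name and the fallback to 1 recorder are assumptions, and the command is only built here, not executed.

```python
import os


def build_import_command(log_file, recorders=None):
    """Return the import_logs.py argument list for one log file.

    If `recorders` is not given, use the CPU count reported by the OS,
    following the advice quoted from the documentation above.
    """
    if recorders is None:
        # os.cpu_count() can return None; fall back to a single recorder.
        recorders = os.cpu_count() or 1
    return [
        "/var/www/misc/log-analytics/import_logs.py",
        "--url=http://localhost/",
        "--idsite=1",
        f"--recorders={recorders}",
        log_file,
    ]


command = build_import_command("access.log-YYYY-MM-DD.gz", recorders=4)
```

The resulting list can be passed to subprocess.run() as is; trying a few higher values of recorders would show where the gain levels off.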

In passing, you did not pass --enable-reverse-dns nor --disable-bulk-tracking, did you?

#11 Updated by sajolida almost 2 years ago

It seems like you read the doc much more than me. The command I ran was:

/var/www/misc/log-analytics/import_logs.py --url=http://localhost/ \
                                           --enable-http-errors --enable-http-redirects --enable-static --enable-bots \
                                           --idsite=1 \
                                           --log-format-regex='.* ((?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+) "(?P<referrer>.*?)" "(?P<user_agent>.*?)").*' \
                                           access.log-YYYY-MM-DD.gz
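To show what that regex extracts, here is a small sketch applying it to one log line with Python's re module. The sample line, including the host name prefix before the IP, is invented for illustration and is not taken from our logs; note that the regex requires some prefix followed by a space before the IP address.

```python
import re

# The --log-format-regex value from the command above, unchanged.
LOG_FORMAT_REGEX = re.compile(
    r'.* ((?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+) '
    r'"(?P<referrer>.*?)" "(?P<user_agent>.*?)").*'
)

# Hypothetical log line: a virtual host prefix followed by the usual
# combined log format fields.
SAMPLE_LINE = (
    'www.example.org 203.0.113.7 - - [26/Oct/2017:15:33:00 +0000] '
    '"GET /index.en.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"'
)


def parse_line(line):
    """Return the named fields of a log line, or None if it does not match."""
    match = LOG_FORMAT_REGEX.match(line)
    return match.groupdict() if match else None


fields = parse_line(SAMPLE_LINE)
```

Here fields["ip"] is "203.0.113.7", fields["path"] is "/index.en.html" and fields["status"] is "200"; a line without any prefix before the IP returns None.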

So yeah, it's with only 1 recorder :)

I want to play more with Piwik in the coming weeks and have the prototype machine up and running for days. I should even be able to give you a hand on the machine if you want.

I'll do some more benchmarking and report again here.

#12 Updated by sajolida almost 2 years ago

As you mentioned on XMPP, I would be super happy to try this on a VM in the cloud first. Maybe we should have a closer look at what kind of data we are confident to give to a VM in the cloud (I don't know if you're thinking about Amazon or Greenhost or whatever).

#13 Updated by sajolida over 1 year ago

  • Subject changed from Know which resources we would need to run Piwik on our infrastructure to Know which resources we would need to run Matomo on our infrastructure

#14 Updated by anonym over 1 year ago

  • Target version changed from Tails_3.5 to Tails_3.6

#15 Updated by sajolida over 1 year ago

  • Target version changed from Tails_3.6 to Tails_3.7

#16 Updated by sajolida over 1 year ago

  • Blocks deleted (Feature #14761: Core work 2017Q4 → 2018Q1: User experience)

#17 Updated by sajolida over 1 year ago

  • Blocks Feature #15392: Core work 2018Q2 → 2018Q3: User experience added

#18 Updated by sajolida over 1 year ago

  • Target version deleted (Tails_3.7)

#19 Updated by intrigeri about 1 year ago

sajolida wrote:

I want to play more with Piwik in the coming weeks and have the prototype machine up and running for days. I should even be able to give you a hand on the machine if you want.

I'll do some more benchmarking and report again here.

Reminder: next time you play with Matomo, please use as many recorders as your machine supports and report back here :)

#20 Updated by sajolida 11 months ago

  • Blocks deleted (Feature #15392: Core work 2018Q2 → 2018Q3: User experience)

#21 Updated by sajolida 7 months ago

  • Status changed from Confirmed to Rejected
  • Assignee deleted (sajolida)

I'm rejecting because I don't think we'll work on this any time soon.
