Bug #9900

Improve Website search

Added by BitingBird over 4 years ago. Updated 7 months ago.

Status: Confirmed
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 11/09/2014
Due date:
% Done: 33%
Feature Branch:
Type of work: Research
Blueprint:
Starter:
Affected tool:

Description

Right now, it's barely usable.

jvoisin said he would work on that.


Subtasks

Bug #9899: Website search should prioritize titles (Confirmed)

Bug #8247: Website search doesn't support quotes properly (Confirmed)

Bug #9898: Website search should allow to sort by language (Confirmed)

Feature #9904: Have an option to include mailing-lists to website search (Resolved)

Bug #12473: Search bar on the website points to 404 (Confirmed)

Bug #13575: Website search should identify the different translations of one document, and only provide one result. (Duplicate)


Related issues

Related to Tails - Bug #11650: Analyze third-party search engine requests Confirmed 08/16/2016
Related to Tails - Bug #11649: Analyze internal search engine requests Confirmed 08/16/2016
Related to Tails - Feature #6569: Make the Tails documentation searchable offline Rejected 01/05/2014

History

#1 Updated by jvoisin about 4 years ago

It seems that this is a known bug from ikiwiki, and there is little that I can do about this :/

#2 Updated by intrigeri about 4 years ago

jvoisin wrote:

It seems that this is a known bug from ikiwiki, and there is little that I can do about this :/

Perhaps that explains #9899, but probably none of the other subtasks.

#3 Updated by jvoisin over 3 years ago

Ikiwiki has some plugins to improve its search capabilities, but they are either cumbersome or inefficient.

I took a look at what other privacy-minded websites are doing, and it seems that they are mostly using external search engines. For example, Qubes is using DuckDuckGo (at the bottom of the page).

This is an easy, zero-maintenance and effective way: I trust a generic web search engine more than an ikiwiki plugin to do effective, secure, meaningful and semantic searches.

The question is: are we OK with externalising the search feature?

#4 Updated by intrigeri over 3 years ago

  • Type of work changed from Website to Discuss

#5 Updated by elouann over 3 years ago

We have talked again about xapian today. See also https://ikiwiki.info/todo/different_search_engine/

#6 Updated by sajolida over 2 years ago

  • Related to Bug #12473: Search bar on the website points to 404 added

#7 Updated by u over 2 years ago

I personally hate websites which send me to some external search because it's often quite unusable.

#8 Updated by intrigeri over 2 years ago

elouann wrote:

We have talked again about xapian today. See also https://ikiwiki.info/todo/different_search_engine/

JFTR ikiwiki already uses xapian, as written on top of the page you're linking to.

#9 Updated by intrigeri over 2 years ago

  • Related to deleted (Bug #12473: Search bar on the website points to 404)

#10 Updated by intrigeri over 2 years ago

u wrote:

I personally hate websites which send me to some external search because it's often quite unusable.

I can relate to this feeling, and I agree that ideally we should fix all the major problems of the ikiwiki internal search engine and not rely on an external one. Now, as the many subtasks of this ticket show, the current search UX on our website is already pretty bad, and the DDG results for the search you used as an example on #12473 are much better than what we provide currently. So switching to DDG would be a great incremental improvement, and a great first step compared to what we currently have; I believe it's easy to implement, and then we can discuss, next time we update our roadmap, how many resources we want to put into improving ikiwiki's internal search engine.

What do you think? If you disagree, I would appreciate it if you could elaborate a bit on "quite unusable", e.g. with a few examples applied to how it would work for our website :)

#11 Updated by u over 2 years ago

Report of our discussion at the monthly meeting of May:

We agree that there are serious problems currently.
- Improving the search feature might also help decrease the frontdesk backlog.

Are we okay to externalise the search feature?
- One of us feels uncomfortable giving up autonomy over the website.
- Two of us think it's fine.
- One of us is asking whether giving a user's search string, or several search strings, to a third-party website would have security implications. What's the worst that could happen? The user's IP could be linked to these strings and the person could be identified as a Tails user, if not using Tor Browser.
- Most of us think that the security implications for users should be reconsidered with more core people present at the meeting before taking any decision.

So let's brainstorm first some more on the implications of this.

As a sidenote, we could research a better solution than the ikiwiki search:
something like Apache Solr, or another local search engine like phinde.

#12 Updated by u over 2 years ago

  • Type of work changed from Discuss to Research

#13 Updated by sajolida over 2 years ago

intrigeri wrote:

I would appreciate if you could elaborate a bit on "quite unusable", e.g. with a few examples applied to how it would work for our website.

I'm not a big user of search bars and maybe this comes from being too
frequently frustrated by the search results. But my gut feeling is that
I'm not often frustrated by bad in-house implementations (like ours)
than externalized search.

But yes, I'm very interested in learning about Ulrike's experience.
A first thing that comes to my mind is whether the search results
provided by DDG would be integrated in our website design and navigation
or whether they would be a redirection to DDG's website. Because being
dragged outside of the website would be a clear usability downside.

What else?

#14 Updated by u over 2 years ago

sajolida wrote:

I'm not a big user of search bars and maybe this comes from being too
frequently frustrated by the search results. But my gut feeling is that
I'm not often frustrated by bad in-house implementations (like ours)
than externalized search.

Same as me then.

A first thing that comes to my mind is whether the search results
provided by DDG would be integrated in our website design and navigation
or whether they would be a redirection to DDG's website. Because being
dragged outside of the website would be a clear usability downside.

In general, when using externalized search, one is redirected to the external search engine page, see how it's done at qubes-os.org.

What else?

Well, for me there are indeed some privacy considerations to look into before externalizing this feature.

#15 Updated by u over 2 years ago

intrigeri wrote:

I personally hate websites which send me to some external search because it's often quite unusable.

I can relate to this feeling, and I agree that ideally we should fix all the major problems of the ikiwiki internal search engine and not rely on an external one. Now, as the many subtasks of this ticket show, the current search UX on our website is already pretty bad, and the DDG results for the search you used as an example on #12473 are much better than what we provide currently. So switching to DDG would be a great incremental improvement, and a great first step compared to what we currently have; I believe it's easy to implement, and then we can discuss, next time we update our roadmap, how many resources we want to put into improving ikiwiki's internal search engine.

What do you think? If you disagree, I would appreciate it if you could elaborate a bit on "quite unusable", e.g. with a few examples applied to how it would work for our website :)

By unusable I mean that with an externalized search I'm sent to another page with a very different design, so I might get lost. Once there, I need to click again to get back to where I want to be, and get the answer I was actually searching for.

After thinking about this issue for some days now, I feel that we should not do it because of the aforementioned UX reasons, as well as privacy considerations.
Furthermore, while discussing this at the last contributor meeting, some people said they never even use the internal search but they always use the browser built-in search themselves.

Can we have some stats about the usage of search pages on the websites?
Is it possible to update the Xapian DB more frequently using a cron job?

#16 Updated by u over 2 years ago

Oh, and let me add one more thing: while DDG is very good with search results in English, I sometimes find it hard to find information in other languages.

#17 Updated by sajolida over 2 years ago

Sidenote, I'm not going to do stats manually about that on the Apache logs. I'd rather spend that time trying to set up Piwik or something like this.

Still, I'm happy to help people learn how to download the logs.

#18 Updated by u over 2 years ago

sajolida wrote:

Sidenote, I'm not going to do stats manually about that on the Apache logs. I'd rather spend that time trying to set up Piwik or something like this.

Awstats maybe? :)

#19 Updated by intrigeri over 2 years ago

Hi!

u wrote:

By unusable I mean that with an externalized search I'm sent to another page with a very different design, so I might get lost. Once there, I need to click again to get back to where I want to be, and get the answer I was actually searching for.

OK, thanks for clarifying! I personally find this UX better than not finding the page I was searching for at all, or finding N copies (#9898), or finding a 404 (#12473), but whatever, I'm not going to argue further on this point since I seem to be the only one who would find almost anything better than the current situation :]

Furthermore, while discussing this at the last contributor meeting, some people said they never even use the internal search but they always use the browser built-in search themselves.

Indeed, I happen to do this myself as well (when I don't simply use find or git grep locally), but that's because our website's search engine is so crappy I've entirely given up using it. I would be curious why these other people don't use it. And then we'll see what conclusion we can draw from this info.

Can we have some stats about the usage of search pages on the websites?

Sure, in theory all tails@ members have access to the raw data. I did it myself this time, because I felt somewhat responsible for it after having restarted this discussion.

So, there have been 48k searches over the last 2 months, i.e. one every 1.8 minutes. I had a quick look at the last 200 of them, and the vast majority seem legit (the user agent pretends it's a real browser, and the query string is plausibly the kind of thing I would expect humans interested in Tails to search for on our website). But I have no clue how to interpret this figure, other than "lots of people try using our internal search engine": we don't know if they find what they were looking for, and if they fail then we don't know if they'll try again next time or will simply give up using this search engine. On this topic I concur with sajolida wrt. the need for more powerful web analytics tools.
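
For anyone who wants to redo this kind of count, here is a minimal Python sketch, not the method actually used above. It assumes the Apache logs are in Combined Log Format and that searches hit a CGI at /ikiwiki.cgi with the query in a P parameter; both names are assumptions to be checked against the real setup, and the function names are made up for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Request field of a Combined Log Format line: "METHOD /path?query HTTP/x.y"
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')


def search_query(log_line, cgi_path="/ikiwiki.cgi", param="P"):
    """Return the searched string if the line is a search request, else None.

    cgi_path and param are assumptions about how the search form submits
    its query; adjust them to the actual website configuration.
    """
    match = REQUEST_RE.search(log_line)
    if not match:
        return None
    url = urlsplit(match.group(1))
    if url.path != cgi_path:
        return None
    values = parse_qs(url.query).get(param)
    return values[0] if values else None


def minutes_per_search(search_count, days):
    """Average number of minutes between two searches over a period."""
    return days * 24 * 60 / search_count


def count_searches(log_lines):
    """Count the lines that look like search requests."""
    return sum(1 for line in log_lines if search_query(line) is not None)
```

With 48000 searches over roughly 60 days, minutes_per_search(48000, 60) gives 1.8, which matches the figure quoted above. Like the manual look at the logs, this only counts requests: it says nothing about whether people found what they were looking for.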

Is it possible to update the Xapian DB more frequently using a cron job?

I see no reason why it wouldn't be possible, but that's not a trivial coding task (nothing is trivial once concurrent access to data is involved). Rough guesstimates for an experienced Perl software developer, assuming the expected outcome is a bit more clearly specified first: probably 3-4 hours for someone who's already at ease with the ikiwiki code base and the way our production website runs, and rather 10-12 hours otherwise. As usual, doubling or tripling these estimates would be sound. And then add a couple of hours to deploy this thing on our production website. From there, two comments:

  • Actually fixing the root cause of #12473 would probably take less time than implementing this workaround.
  • This workaround, or a more proper solution, would address #12473 only, and in the end we would still be left with the other issues this search engine suffers from. So I don't think it's worth doing it in isolation; but it could surely be made part of a Great Plan™ to fix all the biggest issues at once without relying on an external provider, since that's apparently the option preferred by most of us.

As a sidenote, we could research a better solution than the ikiwiki search:
something like Apache Solr, or another local search engine like phinde.

I don't know any of these tools so I'll shut up. If someone has time to evaluate them, I suggest first looking at the subtasks to better understand the problems we're trying to solve here. And keep in mind that the hardest part with such tools might be integrating them into the (rather restrictive, for security's sake) setup our production website runs on.

At this point I'm giving up on the external search engine idea. I don't feel like trying to actively lead this discussion to a conclusion myself. I hope someone else will catch the ball and we won't be stuck for too long in the (crappy) status quo: there's no way not to make any decision, as not deciding anything or postponing is de facto equivalent to deciding that we're fine with keeping what we currently have until something happens.

Regardless, if required by the project, I could be the one working on the ikiwiki search engine, or any other solution that requires Perl skills. As I have probably made more than clear enough already, I still am rather unconvinced it would be a good use of our precious software development time, but if the project wants to prioritize this over some other things I could do, in the same time, with my Foundations Team hat, then I'll comply without complaining too much: it'll actually be fun hacking time for me ;)

Thanks everyone for your input!

#20 Updated by sajolida over 2 years ago

  • Related to Bug #11650: Analyze third-party search engine requests added

#21 Updated by sajolida over 2 years ago

  • Related to Bug #11649: Analyze internal search engine requests added

#22 Updated by sajolida over 2 years ago

From the DuckDuckGo website: https://duckduckgo.com/search_box

« Because of the way we generate our search results, we do not have the syndication rights to allow you to host our results on your site (e.g. in a frame). When your users click on the results they will be instead taken to our site. »

#23 Updated by sajolida over 2 years ago

#24 Updated by sajolida over 2 years ago

  • Assignee changed from jvoisin to sajolida

jvoisin: I'm shamelessly stealing this one from you. I hope you don't mind.

I read a bit about search and UX lately and I want to better understand the technologies behind them and see if we can keep this under our control.

#25 Updated by sajolida about 2 years ago

#26 Updated by sajolida over 1 year ago

  • Assignee changed from sajolida to jvoisin

I tried to read the documentation of Xapian and Lucene and it's definitely not written for human beings like me. I'm dropping the ball...

#27 Updated by u over 1 year ago

  • Related to Feature #6569: Make the Tails documentation searchable offline added

#28 Updated by intrigeri 8 months ago

  • Assignee deleted (jvoisin)

I'm de-assigning from jvoisin because AFAIK there's no WIP by him on this front and I'd rather make it clear that we need someone to lead this conversation to a conclusion.

Then let's first check whether DDG's search results are substantially better than what we currently have (if they are not, it's pointless to discuss privacy concerns about DDG). My claim that they were much better has not been challenged so far except perhaps:

u wrote:

while DDG is very good with search results in English, I sometimes find it hard to find information in other languages.

u, was this feedback about using DDG for searching the web in general, or about searching our website specifically?

FWIW I've just given it a try with the "contraseña" Spanish word:

I see that our current internal search engine returns 3 results, that are all relevant. DuckDuckGo returns many more results, most of them being relevant and missing in ikiwiki's own search results. Both have pros & cons wrt. ordering and presentation of results.
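
To make this kind of comparison easy to repeat with other words, here is a small Python sketch that builds a site-restricted DuckDuckGo query URL using the site: operator. The default hostname tails.boum.org and the function name are assumptions added for illustration; this is not necessarily how the test above was run.

```python
from urllib.parse import urlencode


def ddg_site_search_url(terms, site="tails.boum.org"):
    """Build a DuckDuckGo URL restricted to one site.

    The default site is an assumption; pass the hostname you want to test.
    """
    query = f"site:{site} {terms}"
    # urlencode percent-encodes non-ASCII characters as UTF-8.
    return "https://duckduckgo.com/?" + urlencode({"q": query})


if __name__ == "__main__":
    print(ddg_site_search_url("contraseña"))
```

Opening the printed URL in a browser next to the internal search results for the same word gives a side-by-side comparison of coverage and ordering.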
