
2336 Commits

Author SHA1 Message Date
Veloman Yunkan 8f2f93371b Changed a test in order to avoid a bug in Xapian
Xapian version 1.4.18 contains a bug in snippet generation caused by
incorrect handling of stemming.

The test-point with a search pattern "beatles" produced snippets with no
highlights of the search term. Debugging showed that the search pattern
"beatles" was transformed to a search term "beatl" which then didn't
match the word "beatles" in the text from which a snippet had to be
extracted.

The test case passed on my development machine as well as for most CI
configurations. However, the "Packages / build-deb (ubuntu-bionic)"
variant failed because of a slightly different handling of punctuation
at the snippet boundaries:

Test context:
  url: /ROOT/search?pattern=beatles&content=zimfile
  actual snippet:   ...side "Yellow Submarine" ...........
  expected snippet: ...-side "Yellow Submarine" ...........

The above mismatch resulted in a looser comparison of the snippet contents,
which then failed the requirement that the snippet MUST contain highlights
(this is how the said bug in Xapian was discovered).

An attempt to change the search pattern to "field" didn't eliminate the
problem. Despite the search pattern itself being in singular form (i.e.
identical to its stemmed version) the plural form "fields" in the
snippet was still not highlighted.

Using an adjective instead of a noun as the search pattern achieved the
desired outcome.
2022-05-18 13:28:52 +04:00
Veloman Yunkan eeca88573b Validation of snippets in search results
The "expected" snippets in the test data must be a union of all possible
snippets produced at runtime for a given (document, search terms) pair
on all platforms of interest:

- Overlapping snippets must be properly merged

- Non-overlapping snippets can be joined with a " ... " in between.
2022-05-18 13:20:27 +04:00
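The merging rule described in eeca88573b can be sketched roughly as follows. This is only an illustration, not the test code from the commit: it assumes snippets are represented as `(start, end)` character offsets into the document text, and the name `merge_snippets` is hypothetical.

```python
def merge_snippets(text, spans, sep=" ... "):
    """Return the union of snippet spans over `text`.

    `spans` is a list of (start, end) character offsets, end exclusive.
    (This representation is an assumption made for the sketch.)
    """
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlapping (or touching) spans collapse into one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # Non-overlapping spans stay separate and get joined with `sep`.
            merged.append((start, end))
    return sep.join(text[s:e] for s, e in merged)
```

For example, spans `(0, 5)` and `(3, 8)` would be merged into a single snippet, while a distant span `(20, 23)` would be appended after a " ... " separator.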
Veloman Yunkan 4521249452 Excluded snippets from search results validation 2022-05-18 13:05:29 +04:00
Veloman Yunkan 21e183c2e4 First test for a non-first page of search results 2022-05-18 12:45:47 +04:00
Veloman Yunkan d56ccbd019 First search results test-point with pagination 2022-05-18 12:45:47 +04:00
Veloman Yunkan 825cf1c948 Added a test-point for a large unpaginated search 2022-05-18 12:45:47 +04:00
Veloman Yunkan 57c31a43a4 Another simple test-point for /search endpoint 2022-05-18 12:45:47 +04:00
Veloman Yunkan 84c68d4d7b Search results pagination bugfix
Search results pagination is disabled for a single page outcome too.
2022-05-18 12:45:47 +04:00
Veloman Yunkan f2cf42427a New unit-test TaskbarlessServerTest.searchResults
This is a preliminary implementation checking only the following
cases:

- no search results
- all search results fitting on a single page

The second test-case fails because of a bug in the search renderer (leading
to the pagination footer being pointlessly enabled). Will fix it in the
next commit.
2022-05-18 12:45:47 +04:00
Veloman Yunkan 612ecc975d Support for testing a server without a taskbar
The taskbar injected by the server is a distraction in unit-tests focusing
on the HTML contents of the returned pages. The new test-suite
TaskbarlessServerTest will have the taskbar disabled.
2022-05-18 12:45:47 +04:00
Veloman Yunkan ae56d399b7 Explained why search_result.html needs inline CSS
In #727 inline CSS [was extracted](e4a4b2f961)
from `static/templates/no_search_result.html` into a separate stylesheet
resource. The purpose was to later

1. get rid of the custom `static/templates/no_search_result.html` error
   template and use a general purpose error template instead (this was
   accomplished by PR #744).

2. deduplicate the CSS code between `static/templates/no_search_result.html` and
   `static/templates/search_result.html` by making the latter also refer to
   an internal CSS resource rather than contain inline stylesheet code.

While preparing to implement the 2nd point, I figured out that
`kiwix::SearchRenderer` is used as a component in `kiwix-desktop` too,
which would probably be upset by a link to an internal libkiwix CSS resource.

This commit documents that finding.
2022-05-18 12:45:47 +04:00
Kelson eaa8c3c91c
Merge pull request #776 from kiwix/fix_i18n_windows
Specify utf8 encoding when opening i18n resource file.
2022-05-17 22:50:20 +02:00
Matthieu Gautier 26c06d8c2a Specify utf8 encoding when opening i18n resource file.
Otherwise, on Windows, we would try to open the files with the "local" encoding (cp1252).
2022-05-17 18:36:35 +02:00
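A minimal sketch of the kind of fix described in 26c06d8c2a, assuming the i18n resource is a JSON file read from a Python script. The function name `load_i18n_resource` is hypothetical and not taken from the repository.

```python
import json

def load_i18n_resource(path):
    """Load a translation file, always decoding it as UTF-8."""
    # Without an explicit encoding, open() falls back to the locale's
    # preferred encoding, which is often cp1252 on Windows.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```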
Matthieu Gautier eee6803328
Merge pull request #774 from kiwix/manually_generate_i18n_resource_list 2022-05-17 14:57:51 +02:00
Matthieu Gautier d19ae1b054 Update i18n_resources_list.txt using generate_i18n_resources_list.py 2022-05-16 14:27:48 +02:00
Matthieu Gautier abe2fa0179 Add a script to generate the i18n resource list automatically. 2022-05-16 14:27:48 +02:00
Matthieu Gautier 6e93bad565 Do not auto discover i18n files.
Revert to the plain old 'i18n_resources_list.txt' file.

Auto-discovery of i18n files has a main flaw (and a small bug):
- The main flaw is that rerunning configure will not detect new
  translation files. It means that if we use a cache in our CI,
  new translations will not be included.
- The bug is that on Windows, meson fails with an error about a nonexistent
  `` (empty) file name. I suppose it is because Python replaces
  `\n` with `\r\n` on Windows, and the `.strip().split('\n')` keeps empty
  lines.

The small bug could be fixed, but the main flaw makes it better overall to
use a script to generate the listing.

This commit is more or less a half revert of 2eff5b55a6
2022-05-16 14:27:37 +02:00
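The Windows problem suspected in 6e93bad565 can be reproduced in isolation. The sketch below is an illustration of the supposed cause only (the commit itself says "I suppose"); the sample output and the helper name `parse_listing` are made up for the example.

```python
def parse_listing(output):
    """Split a script's stdout into file names, tolerating CRLF and blank lines."""
    return [line.strip() for line in output.splitlines() if line.strip()]

# Hypothetical listing as it might arrive with Windows line endings.
windows_output = "en.json\r\nfr.json\r\n\r\nhy.json\r\n"

# Splitting on '\n' alone leaves stray '\r' characters behind, including an
# entry that consists of nothing but '\r' (empty once stripped).
naive = windows_output.strip().split("\n")
```

Here `naive` ends up containing a `'\r'` entry, whereas `parse_listing(windows_output)` yields only the three file names.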
Kelson 5fb919e73e
Merge pull request #772 from kiwix/roundHomepage 2022-05-15 10:02:05 +02:00
Nikhil Tanwar 2771a95d40 Floor the value returned by viewPortToCount()
Previously, the value returned by viewPortToCount() could be a non-integer number; it is now floored.
This makes for cleaner requests and better caching.
Fix #766
2022-05-15 08:02:32 +05:30
Kelson 8dbf015689
Merge pull request #770 from kiwix/magnetLink
Use real magnet link in download modal
2022-05-14 17:05:05 +02:00
Nikhil Tanwar 6cdc47eb62 Use real magnet link in download modal
Previously, clicking Magnet redirected to a different site:
https://download.kiwix.org/zim/other/xyzBookWithDate.zim.magnet

That page had the real magnet link as its content.
Now we use the real magnet link in the href, so the download starts right away without a redirect.
Fix #767
2022-05-14 17:00:14 +02:00
Matthieu Gautier cbd37073e8
Merge pull request #761 from kiwix/translatewiki 2022-05-11 17:04:33 +02:00
translatewiki.net d131b732d8 Localisation updates from https://translatewiki.net. 2022-05-11 16:11:17 +02:00
Matthieu Gautier 17c1b3b82f
Merge pull request #759 from kiwix/diacritics_insensitive_suggestions 2022-05-10 15:51:18 +02:00
Veloman Yunkan 744dd87fb0 Testing that /suggest is diacritics insensitive 2022-05-10 15:15:19 +02:00
Matthieu Gautier d469e2aed8
Merge pull request #768 from kiwix/update_ci 2022-05-10 15:13:42 +02:00
Matthieu Gautier 73d2d47ca7 Run the CI on Ubuntu Bionic and Fedora 35
Xenial and Fedora 31 are EOL.
2022-05-10 14:58:56 +02:00
Matthieu Gautier 55149407d2
Merge pull request #763 from kiwix/i18n_resource_discovery 2022-05-09 15:11:02 +02:00
Veloman Yunkan 2eff5b55a6 Automatic discovery of i18n resources
Excluding qqq.json, any .json file under static/i18n is now considered to
be an i18n resource. This eliminates the need to update the
i18n_resources_list.txt file every time a new language json file is
added. Thus Translatewiki PRs will not require extra work.
2022-05-09 15:12:16 +04:00
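The discovery rule from 2eff5b55a6 could look roughly like this in Python. This is a sketch under assumptions: the function name `discover_i18n_resources` is hypothetical, and only files directly inside the directory are considered.

```python
from pathlib import Path

def discover_i18n_resources(i18n_dir):
    """List translation files directly under `i18n_dir`, skipping qqq.json."""
    return sorted(
        p.name for p in Path(i18n_dir).glob("*.json") if p.name != "qqq.json"
    )
```

As the log shows, this approach was later half reverted by 6e93bad565 in favour of a generated `i18n_resources_list.txt`.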
Kelson 26eccb5a5f
Merge pull request #712 from kiwix/static_resource_versioning
Static resource versioning
2022-05-02 23:49:55 +02:00
Veloman Yunkan 1b81ccc5e5 Using a regular expression with named groups 2022-05-02 20:48:05 +04:00
Veloman Yunkan 091786c7d8 A slight simplification of resource preprocessing
Now the whole content of a resource is preprocessed with a single
invocation of `re.sub()` rather than line-by-line.

Also, the function `get_preprocessed_resource()` returns a single value
rather than a (preprocessed_content, modification_count) pair; the
situation when the preprocessed resource is identical to the source
version is signalled by a return value of None.
2022-05-02 20:38:08 +04:00
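The contract described in 091786c7d8 (a single `re.sub()` pass, `None` when nothing changed), together with the named-group regular expression from 1b81ccc5e5, can be sketched as below. The pattern, the `cache_id_of` callback and the exact signature are assumptions made for illustration; only the function name and the `None` convention come from the commit message.

```python
import re

# Hypothetical pattern: a /skin/ path followed by the KIWIXCACHEID placeholder.
PLACEHOLDER = re.compile(r"(?P<path>/skin/[\w./-]+)\?KIWIXCACHEID")

def get_preprocessed_resource(content, cache_id_of):
    """Return the preprocessed content, or None if preprocessing is a no-op."""
    def expand(match):
        path = match.group("path")
        # str.format() rather than an f-string, which also runs on Python 3.5.
        return "{}?cacheid={}".format(path, cache_id_of(path))

    # One substitution over the whole content instead of a line-by-line loop.
    preprocessed = PLACEHOLDER.sub(expand, content)
    return None if preprocessed == content else preprocessed
```

Returning `None` lets the caller distinguish an unmodified resource without having to carry a separate modification count.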
Veloman Yunkan c0b9e2a466 Cache-id of resources with account for dependency
The cache-id of resources now includes dependency information. This commit
illustrates that property with the changed cache-id of skin/index.js which
depends on skin/{download,hash,magnet,bittorent}.png.

The implementation is not fool-proof - cyclic dependency between
resources is not detected and will lead to infinite recursion.
2022-05-02 20:37:22 +04:00
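A rough sketch of a dependency-aware cache-id as described in c0b9e2a466: the id of a resource is derived from its preprocessed content, so a change in a dependency propagates upwards. The use of SHA-1, the in-memory `sources` dictionary and the pattern are assumptions. The cycle guard is an addition of this sketch; the commit states that the actual implementation does not detect cycles.

```python
import hashlib
import re

# Hypothetical pattern for links carrying the KIWIXCACHEID placeholder.
LINK = re.compile(r"(?P<path>skin/[\w./-]+)\?KIWIXCACHEID")

def cache_id(path, sources, _stack=()):
    """8-hex-digit digest of the *preprocessed* content of `path`.

    `sources` maps resource paths to their source text.
    """
    if path in _stack:
        # Guard added for this sketch only (not part of the commit).
        raise ValueError("cyclic dependency: " + " -> ".join(_stack + (path,)))

    def expand(match):
        dep = match.group("path")
        # Computed on demand, recursively, for each referenced resource.
        return "{}?cacheid={}".format(dep, cache_id(dep, sources, _stack + (path,)))

    preprocessed = LINK.sub(expand, sources[path])
    return hashlib.sha1(preprocessed.encode("utf-8")).hexdigest()[:8]
```

With this scheme, editing `skin/download.png` would change the cache-id of `skin/index.js` even though the source of `index.js` itself is untouched.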
Veloman Yunkan 03ab2f67dd Using global variables for base & output directories 2022-05-02 20:37:22 +04:00
Veloman Yunkan 157f01e951 Preparing to handle inter-resource dependency
The current implementation of resource preprocessing contains a bug
(with respect to the problem that it tries to solve): it doesn't take
into account the dependence of static resources on each other. If
resource A refers to B and B refers to C, then a change in C would
result in its cache id being updated in the preprocessed version of B.
However the cache id of B won't change since the cache id is derived
from the source rather than from the preprocessed output.

This commit is the first step towards addressing the described issue.

Now the cache-id of a resource is computed on demand rather than precomputed
for all resources. The only thing remaining is to compute the cache-id
from the preprocessed content.
2022-05-02 20:37:22 +04:00
Veloman Yunkan 42fd6e8926 Made kiwix-resources work with python 3.5-
Formatted string literals appeared in Python 3.6. Some CI platforms
still use older versions of Python.
2022-05-02 20:37:22 +04:00
Veloman Yunkan 707df3d10b Removing the old preprocessed resource, if any
If during an earlier build a resource was symlinked in the build
directory (because it wasn't modified by preprocessing) and later
changes are made to the resource that result in its preprocessing no
longer being a no-op, then the preprocessing is performed (in place) on
the original resource directly (via the symlink). Therefore any symlinks
must be removed before preprocessing a resource.
2022-05-02 20:37:22 +04:00
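The pitfall described in 707df3d10b (writing through a leftover symlink modifies the original resource) and its remedy can be illustrated as follows. The helper name `write_preprocessed` is hypothetical; this is a sketch of the idea rather than the code of `kiwix-resources`.

```python
import os

def write_preprocessed(out_path, content):
    """Write `content` to `out_path` without writing through a stale symlink."""
    # lexists() is true even for a dangling symlink, unlike exists().
    if os.path.lexists(out_path):
        # Removes the link itself (or an old regular file), not the link target.
        os.remove(out_path)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(content)
```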
Veloman Yunkan c016dfd2ce Resource preprocessing handles relative links
... but only if they contain "/skin/" as a substring.
2022-05-02 20:37:22 +04:00
Veloman Yunkan 150851b33d kiwix-resources preprocesses all resources
kiwix-resources preprocesses all resources rather than only templates. At
this point this doesn't change anything since only (some) template resources
contain KIWIXCACHEID placeholders. But this enhancement opens the door
to the preprocessing of static/skin/index.js (after preprocessing is
able to handle relative links, which comes in the next commit).
2022-05-02 20:37:22 +04:00
Veloman Yunkan 3b9f28b2b5 Applied cache-id to search_results.css
The story of search_results.css

static/skin/search_results.css was extracted from
static/templates/no_search_result.html before the latter was dropped.

static/templates/no_search_result.html in turn seems to be a copied and
edited version of static/templates/search_result.html.

In the context of exploratory work on the internationalization of
kiwix-serve (PR #679) I noticed duplication of inline CSS across those
two templates and intended to eliminate it. That goal was not fully
accomplished (static/templates/search_result.html remained untouched)
because by that time PR #679 grew too big and the efforts were diverted
into splitting it into smaller ones. Thus search_results.css slipped
into one of those small PRs, without making much sense because nothing
really justifies preserving custom CSS in the "Fulltext search unavailable"
error page.

At the same time, it served as the only case where a link to a cacheable
resource is generated in C++ code (rather than found in a template).
This poses certain problems to the handling of cache-ids. A workaround
is to expel the URL into a template so that it is processed by
`kiwix-resources`. This commit merely demonstrates that solution. But
whether it should be preserved (or rather the "Fulltext search
unavailable" page should be deprived of CSS) is questionable.
2022-05-02 20:37:22 +04:00
Veloman Yunkan fc85215ea0 Preprocessing of template resources
In template resources (found under static/templates), strings of the
form "PATH/TO/STATIC/RESOURCE?KIWIXCACHEID" are expanded into
"PATH/TO/STATIC/RESOURCE?cacheid=CACHEIDVAL" where CACHEIDVAL is an
8-digit hexadecimal hash digest of the file at
static/PATH/TO/STATIC/RESOURCE.
2022-05-02 20:37:22 +04:00
Veloman Yunkan acdc1dfb27 New unit-test ServerTest.CacheIdsOfStaticResources
Introduced a new unit-test which will ensure that static resources of
kiwix-serve have the cache ids applied to them in the links embedded into
the HTML code.

At this point there are no cache ids. The new unit-test will help to
visualize how they come into existence.
2022-05-02 20:37:22 +04:00
Matthieu Gautier f90cc39a52
Merge pull request #757 from kiwix/gzip_compression 2022-04-28 14:36:51 +02:00
Matthieu Gautier fba0f09f4f Do not compress content smaller than 1400 Bytes 2022-04-27 18:23:39 +02:00
Matthieu Gautier 0d294c50a5 [SERVER] Support gzip encoding instead of deflate.
The `compress` function is copied from httplib
2022-04-27 18:23:38 +02:00
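The combined effect of 0d294c50a5 and fba0f09f4f (gzip encoding, but only for bodies of at least 1400 bytes) might be sketched like this. The server itself is C++; this Python version only illustrates the decision logic. The function name and the simplified `Accept-Encoding` handling (quality values are ignored) are assumptions, and the real server may apply further conditions.

```python
import gzip

MIN_COMPRESSIBLE_SIZE = 1400  # bytes, threshold taken from fba0f09f4f

def encode_body(body, accept_encoding):
    """Return (payload, content_encoding) for a response body given as bytes."""
    # Simplification: strip any ";q=..." parameters and ignore their values.
    accepted = [e.split(";")[0].strip() for e in accept_encoding.split(",")]
    if "gzip" in accepted and len(body) >= MIN_COMPRESSIBLE_SIZE:
        return gzip.compress(body), "gzip"
    # Small bodies, or clients without gzip support, get the raw payload.
    return body, None
```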
Kelson dc42f831c0
Merge pull request #756 from kiwix/doc-badge
Add documentation badge in README
2022-04-23 11:20:42 +02:00
Emmanuel Engelhart 1757f7f168
Add documentation badge in README 2022-04-23 10:38:15 +02:00
Matthieu Gautier c43c637bea
Merge pull request #679 from kiwix/kiwix-serve-i18n 2022-04-14 15:21:47 +02:00
Veloman Yunkan 927c12574a Preliminary support for Accept-Language: header
In the absence of the "userlang" query parameter in the URL, the value
of the "Accept-Language" header is used. However, it is assumed that
"Accept-Language" specifies a single language (rather than a comma
separated list of languages possibly weighted with quality values).

Example:

Accept-Language: fr
// should work

Accept-Language: fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5
// The requested language will be considered to be
// "fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5".
// The i18n code will fail to find resources for such a language
// and will use the default "en" instead.
2022-04-13 16:40:20 +02:00
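The behaviour described in 927c12574a can be modelled in a few lines. This sketch folds the language selection and the i18n fallback into one function for brevity; `select_language` and the `AVAILABLE_LANGUAGES` set are hypothetical and do not reflect the actual C++ code.

```python
AVAILABLE_LANGUAGES = {"en", "fr", "hy"}  # hypothetical set of translations

def select_language(query_params, headers, default="en"):
    """Pick the UI language from "userlang" or, failing that, Accept-Language."""
    requested = query_params.get("userlang") or headers.get("Accept-Language", "")
    # The header value is used verbatim; a weighted list is not parsed,
    # so it never matches an available language and the default wins.
    return requested if requested in AVAILABLE_LANGUAGES else default
```

With these assumptions, `Accept-Language: fr` selects "fr", while the weighted list from the commit message falls back to "en".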
Veloman Yunkan 9987fbd488 Fixed CI build failure under android_arm* 2022-04-13 16:40:20 +02:00