As a result of this clean-up the /suggest endpoint, too, stopped
generating confusing 404 Not Found errors (which, as in /meta's case,
is not that important). Another functional change is that the "term"
parameter became optional.
Before this fix the /meta endpoint could return a 404 Not Found page
saying
The requested URL "/meta" was not found on this server.
Error cases producing such a result were:
- `/meta?content=NON-EXISTING-BOOK&name=metaname`
- `/meta?content=book&name=BAD-META-NAME`
Now a proper message is shown for each of those cases.
This fix is being done just for consistency (the /meta endpoint is not
a user-facing one and the scripts don't bother about error texts).
Now Response::build_404() takes the URL instead of the entire
RequestContext object. An empty url suppresses the
The requested URL "url" was not found on this server.
part of the error text.
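A minimal sketch of that behaviour, assuming a free function
`make404Text()` and a "Not Found" heading (both hypothetical, this is
not the actual `Response::build_404()` signature or page text):

```cpp
#include <string>

// Hypothetical helper, not the libkiwix API: builds the text of a 404
// page. An empty url suppresses the "requested URL" sentence.
std::string make404Text(const std::string& url, const std::string& details)
{
  std::string text = "Not Found\n";
  if (!url.empty()) {
    text += "The requested URL \"" + url
          + "\" was not found on this server.\n";
  }
  text += details;  // endpoint-specific explanation
  return text;
}
```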
Before this fix the /random endpoint could return a 404 Not Found page
saying
The requested URL "/random" was not found on this server.
Error cases producing such a result were:
- `/random?content=NON-EXISTING-BOOK` (can happen when a server is
restarted or the library is reloaded and the current book is no longer
available).
- Failure of the libkiwix routine for picking a random article.
Now a proper message is shown for each of those cases.
Library became thread-safe with the exception of `getBookById()`
and `getBookByPath()` methods - thread safety in those accessors is
rendered meaningless by their return type (they return a reference
to a book which can be removed any time later by another thread).
Introducing a mutex in `Library` necessitates manually implementing the
move constructor and assignment operator. It's better to still delegate
that work to the compiler to eliminate any possibility of bugs when new
data members are added to `Library`. The trick is to move the data into
an auxiliary class `LibraryBase` and derive `Library` from it.
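The trick can be sketched roughly as follows. The data members shown
are placeholders rather than the real members of `Library`, and the
sketch assumes that the moved-from object is not used concurrently by
another thread:

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// All movable state lives here, so the compiler-generated move
// operations automatically pick up any member added later.
struct LibraryBase
{
  std::map<std::string, std::string> m_books;  // placeholder member
  int m_revision = 0;                          // placeholder member
};

class Library : private LibraryBase
{
public:
  Library() = default;

  // Only the base sub-object is moved; every Library owns its mutex.
  Library(Library&& other) : LibraryBase(std::move(other)) {}

  Library& operator=(Library&& other)
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    LibraryBase::operator=(std::move(other));
    return *this;
  }

  void addBook(const std::string& id, const std::string& title)
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_books[id] = title;
    ++m_revision;
  }

  std::size_t getBookCount() const
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_books.size();
  }

private:
  mutable std::mutex m_mutex;  // not movable, hence kept out of the base
};
```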
Originally `LibraryManipulator` was an abstract class completely decoupled
from `Library`. Its `addBookToLibrary()` and `addBookmarkToLibrary()`
methods could be defined in an arbitrary way. Now `LibraryManipulator` has to be
bound to a library object, those methods are no longer virtual, they always
update the library and allow for some additional actions via virtual
functions `bookWasAddedToLibrary()` and `bookmarkWasAddedToLibrary()`.
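A simplified sketch of the new design. `Book` and `Library` below are
reduced stand-ins and `RecordingManipulator` is a made-up subclass, so
the real signatures may differ:

```cpp
#include <string>
#include <vector>

// Reduced stand-ins for the real classes (assumptions for illustration).
struct Book { std::string id; };

struct Library
{
  std::vector<Book> books;
  bool addBook(const Book& book) { books.push_back(book); return true; }
};

class LibraryManipulator
{
public:
  explicit LibraryManipulator(Library* library) : m_library(*library) {}
  virtual ~LibraryManipulator() = default;

  // Non-virtual: the library is always updated, then the hook is called.
  bool addBookToLibrary(const Book& book)
  {
    const bool added = m_library.addBook(book);
    if (added) {
      bookWasAddedToLibrary(book);
    }
    return added;
  }

protected:
  // Customisation point; does nothing by default.
  virtual void bookWasAddedToLibrary(const Book& /*book*/) {}

private:
  Library& m_library;
};

// Made-up subclass that records the ids of the books it has added.
class RecordingManipulator : public LibraryManipulator
{
public:
  using LibraryManipulator::LibraryManipulator;
  std::vector<std::string> addedIds;

protected:
  void bookWasAddedToLibrary(const Book& book) override
  {
    addedIds.push_back(book.id);
  }
};
```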
Deduplicated the mustache templates static/templates/catalog_v2_entries.xml
and static/templates/catalog_v2_complete_entry.xml (the latter was
renamed to static/templates/catalog_v2_entry.xml).
This allows the handle_suggest API to accept two arguments, `start` and
`suggestionLength`, so that handle_suggest can retrieve suggestions in
the given range rather than the default 0-10 range.
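The range selection could look roughly like this. `selectRange()` is a
hypothetical helper working on an already computed list, whereas the
real code presumably passes the range on to the suggestion search:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the suggestions in
// [start, start + suggestionLength), clamped to the available items.
std::vector<std::string> selectRange(const std::vector<std::string>& all,
                                     std::size_t start = 0,
                                     std::size_t suggestionLength = 10)
{
  const std::size_t begin = std::min(start, all.size());
  const std::size_t count = std::min(suggestionLength, all.size() - begin);
  const auto first = all.begin() + static_cast<std::ptrdiff_t>(begin);
  return std::vector<std::string>(
      first, first + static_cast<std::ptrdiff_t>(count));
}
```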
Language code to human friendly name translation is now done with the
help of the ICU library. It works if the line
```
-include $(LANGSRCDIR)/resfiles.mk
```
in the file `source/data/Makefile.in` of the icu4c dependency is not
commented out. Currently, the said line is commented out (along with
some other include's) by the `icu4c_custom_data.patch` patch of the
`kiwix-build` tool.
Introduces a new member mp_search that houses the zim::Search object
and adds a new constructor for this purpose. This commit also adds an
overload for getHtml that takes start and end integers as arguments
since they are not part of the search object we include.
With openzim/libzim#540 we now have a new function to get the
illustration (previously the favicon, in 48x48 size and unity scale) in
multiple sizes. We need to replace getFaviconEntry with this new
getIllustrationItem method.
This changes the output of `/catalog/search` as follows:
- Entire search query (rather than only the value of the `q` parameter)
is put in the <title> node.
- Search performed with an empty query presents itself as "All zims".
- The feed id remains stable for identical searches on the same
library.
/catalog/v2/entries is intended to play the combined role of
/catalog/root.xml and /catalog/search of the old OPDS API. Currently,
the latter role is not yet implemented.
Implementation note: instead of tweaking and reusing
`OPDSDumper::dumpOPDSFeed()`, the generation of the OPDS feed is done via `mustache`
and a new template `static/catalog_v2_entries.xml`.
Note: This commit somewhat relaxes validation of non-variable
`<updated>` elements in the OPDS feed - the content of any `<updated>`
element is replaced with the YYYY-MM-DDThh:mm:ssZ placeholder.
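Such masking can be sketched with `std::regex`. The helper name
`maskUpdatedElements()` is made up and the actual test code may do this
differently:

```cpp
#include <regex>
#include <string>

// Hypothetical test helper: replaces the content of every <updated>
// element with a fixed placeholder so that feeds can be compared.
std::string maskUpdatedElements(const std::string& feed)
{
  static const std::regex updatedRe("<updated>[^<]*</updated>");
  return std::regex_replace(feed, updatedRe,
                            "<updated>YYYY-MM-DDThh:mm:ssZ</updated>");
}
```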
Each suggestion used to be stored as a vector of strings holding various
values such as title, path, etc. With this commit, we use the new
dedicated class `SuggestionItem` to do the same.
With openzim/libzim#545 we now support snippet generation of titles,
which can be used as the display label in the UI for highlighted titles
via the "label" field.
The old version used plain title which is still available in the value
field.
After switching to Xapian-based search in the library/catalog, an empty
query stopped acting as a match-all query. This commit restores the old
behaviour in that regard.
Returning status code 204 in case of empty results doesn't show the
empty results page, as described in #466. Reverting the changes in #396
fixes the issue.
Catalog filtering should now be case/diacritics insensitive for all
fields. However it is not validated for language, name and category
fields, and is validated for tags, creator & publisher only for text
supplied in the filter (but not for values read from the book).
Catalog filtering by titles/description was sensitive to diacritics
present in the query string. Fixed that.
Also enhanced the unit test to validate the insensitivity to diacritics
present in either the title/description or the query string.
This change fixes the failure of the LibraryTest.filterByPublisher
unit-test broken by the previous commit.
The previous approach used in `publisherQuery()` for building a phrase
query enforcing the specified prefix for all terms fails if
1. the input phrase contains a non-word term that Xapian's query parser
doesn't like (e.g. a standalone ampersand character, 1/2, a#1, etc);
2. the input phrase contains at least three terms that Xapian's query
parser has no issue with.
Using the `quest` tool (coming with xapian-tools under Ubuntu) the
issue can be demonstrated as follows:
```
$ quest -o phrase -d some_xapian_db "Energy & security"
Parsed Query: Query((energy@1 PHRASE 11 Zsecur@2))
Exactly 0 matches
MSet:
$ quest -o phrase -d some_xapian_db "Energy & security act"
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries
$ quest -o phrase -d some_xapian_db 'Energy 1/2 security act'
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries
$ quest -o phrase -d some_xapian_db "Energy a#1 security act"
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries
```
The problem comes from parsing the query with the default operation set
to `OP_PHRASE` (exemplified by the `-o phrase` option in above
invocations of `quest`). A workaround is to parse the phrase with a
default operation of `OP_OR` and then combine all the terms with
`OP_PHRASE`.
Besides, stemming should be disabled in order to target an exact phrase
match (save for the non-word terms, if any, that are ignored by the
query parser).
Moved the `filter.hasQuery()` check inside `buildXapianQuery()`.
`Library::filterViaBookDB()` only cares if the query that is going to be
run on the book DB would match all documents. The rest of the changes
related to enhancing the usage of Xapian for the catalog search will
happen inside `buildXapianQuery()` and `updateBookDB()`.
Language code is converted from ISO 639-3 to ISO 639 (which is
understood by Xapian) via ICU. The previous approach via an explicit
map had its advantages since Xapian has more than one stemmer
implementation for some languages (selectable via Xapian-specific
identifiers). This commit relies on the defaults associated with the
ISO 639 language codes.
The search text in the catalog query is interpreted as partial by
default, but partial query mode can be disabled in C++. The latter
possibility is not exposed via the /catalog/search kiwix-serve endpoint,
though.
1. Get the subset of books matching the q (title/description) parameter
of the search
2. Filter out books not matching the other parameters of the search.
Stage 1. currently works in the old way, but will be replaced by Xapian
based search in subsequent commits.
The kiwixlib java wrapper unit test can be run manually via the
src/wrapper/java/org/kiwix/testing/compile_test.sh script.
The test ZIM files in src/wrapper/java/org/kiwix/testing were created
using the create_test_zimfiles script. They must be updated/re-generated and
committed in git whenever their source data or the create_test_zimfiles
script changes. Note: small.zim.embedded is not used at this point, it
was created for testing the enhancement coming in a few commits.
A mimetype may contain parameters.
In that case the mimetype would be something like
"text/html;foo=bar;foz=baz".
It then contains `;` and `=`, which conflict with the same characters
we use to separate the items in our list.
We have to use a more advanced algorithm which takes the context into
account.
Fix #416
Use a heap allocated buffer (with lifetime of Aria2 class) instead of
a stack allocated one.
Original fix made by @ZaWertun. Kudos to him.
Fix kiwix/kiwix-desktop#123, kiwix/kiwix-desktop#513
and kiwix/kiwix-desktop#423
WaitingThread reads some memory shared with the SubProcess
(`mutex`, `m_running`).
When we destroy the SubProcess, we must be sure that WaitingThread has
correctly finished; otherwise we may have invalid reads/writes on freed
memory.
On the CI, the native_dyn docker image is set up with a packaged version
of libmicrohttpd for which `MHD_HTTP_RANGE_NOT_SATISFIABLE` is not
defined.
Once the CI is fixed, we can revert this commit.
Android clang complains about the fact it cannot move the
`std::unique_ptr<ContentResponse>` into a `std::unique_ptr<Response>&&`
(for the implicit `std::unique_ptr<Response>` constructor).
Let's help it a bit.
This is only an "interface" for now as other types of response (entry) may
be "transformed" into a ContentResponse.
We cannot move all the code into the class.
With #403, the article mimetype may be different from "text/html".
It can also be "text/html; raw=true".
(And in fact it could already have any kind of optional argument.)
The response detects whether the taskbar must be added depending on the
mimetype.
Now, `set_taskbar` can be called unconditionally
(no need to check for the mimetype).
And we don't need to call set_taskbar if we have no information to set.
Some HTML articles are meant to be displayed through a viewer. In this case,
we know we don't want the server to inject the taskbar nor the link blocker
because the content is not a user-ready web page but a partial element of it.
Such articles still need to be `text/html` to be parsed properly by browsers.
This changes the way we decide whether to display the taskbar.
Previously, we were adding it to every article with a MIME __starting with__ `text/html`.
Now, we're additionally preventing it on `text/html` MIME if there is a `;raw=true` string inside.
This leaves articles with MIME `text/html;raw=true` (warc2zim convention) outside
of the taskbar target.
For similar reasons, the external-link blocker is set to apply to the same set of articles.
Previously, it was applied to all articles, which was an (unnoticeable) mistake.
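The rule can be sketched as a small predicate. The function name is
hypothetical, and the sketch matches the literal `;raw=true` string
only, as described above:

```cpp
#include <string>

// Hypothetical predicate: true if the taskbar (and the external-link
// blocker) should be injected into content of the given MIME type.
bool wantsTaskbar(const std::string& mimeType)
{
  const std::string html = "text/html";
  const bool startsWithHtml = mimeType.compare(0, html.size(), html) == 0;
  const bool isRaw = mimeType.find(";raw=true") != std::string::npos;
  return startsWithHtml && !isRaw;
}
```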
Originally reported against case sensitivity of the Range header
(see issue #387), this fix applies to all request headers (since
according to RFC 7230 all header fields are case-insensitive, see
https://tools.ietf.org/html/rfc7230#section-3.2). However, a
corresponding unit-test was added only for the Range header.
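One common way to get case-insensitive header lookup is to normalise
the names when they are stored and when they are queried. The
`HeaderMap` class below is an illustrative sketch, not the actual
request-handling code:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Lower-cases an ASCII string (header field names are ASCII).
std::string toLower(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Illustrative container: header names are compared case-insensitively.
class HeaderMap
{
public:
  void set(const std::string& name, const std::string& value)
  {
    m_headers[toLower(name)] = value;
  }

  // Returns the header value, or an empty string if it is absent.
  std::string get(const std::string& name) const
  {
    const auto it = m_headers.find(toLower(name));
    return it == m_headers.end() ? std::string() : it->second;
  }

private:
  std::map<std::string, std::string> m_headers;
};
```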
The previous API used an internal vector to store the suggestion search
results.
The new API takes a vector as an out argument, so users can call the
functions without having to protect the search.
We should change the Android API to reflect the change but it is a bit
more complex to do at the JNI level. As Android does not call it from
multiple threads, we are safe for now. And we need the new API asap for
kiwix-desktop.
So we keep the same API on Android for now; the new API will be made
in the next version.
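The idea behind the out-argument API can be sketched as follows.
`TitleIndex` and its prefix matching are invented for illustration and
do not reflect the real reader class or search algorithm:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Invented stand-in for the reader: it keeps no per-search state, so
// several threads can search concurrently without extra locking.
class TitleIndex
{
public:
  explicit TitleIndex(std::vector<std::string> titles)
    : m_titles(std::move(titles))
  {}

  // Results are written into the caller-owned vector.
  bool searchSuggestions(const std::string& prefix,
                         std::size_t maxCount,
                         std::vector<std::string>& results) const
  {
    results.clear();
    for (const auto& title : m_titles) {
      if (results.size() >= maxCount) break;
      if (title.compare(0, prefix.size(), prefix) == 0) {
        results.push_back(title);
      }
    }
    return !results.empty();
  }

private:
  std::vector<std::string> m_titles;
};
```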
Some architectures, specifically armel, mipsel, m68k & powerpc in
Debian, need to explicitly link to atomic.
Use meson to see if the target's CPU family is one of those, and if so,
pass -latomic to the linker.
Tested on armel and mipsel machines to verify passing -latomic works, and
on armhf and amd64 to ensure normal builds aren't broken.
Fixes #371.
libmicrohttpd handles HEAD requests by dropping the body of the response
(if any). Hence letting a HEAD request through into the code that
processes GET requests is safe.
Also added server unit-tests related to the handling of HEAD requests.
Response::set_entry() was upgraded from a simple setter to a method
performing certain business logic that was previously taken care of by
InternalServer::handle_content().
This is surprising, but C++11 fstream doesn't have a constructor
that takes a wchar path.
So, on Windows, we cannot open a stream on a path containing non-ASCII
characters. VC++ provides an extension for that, but it is not standard
and g++ (MinGW) doesn't provide it.
So move all our read/write tool functions to the plain old C versions,
using _wopen to open wide paths on Windows.
We must use the wide version of getenv to correctly handle the case
where we have accents in the user directory.
This also changes the default dataDirectory on Windows from $APPDATA to
$APPDATA/kiwix.
Fixed a regression introduced in the block-external-links feature.
For cleaner source, the taskbar (and the block-external JS file) were both
attached to `<head>\n`.
Unfortunately, this isn't safe enough as ZIM files may contain all kinds of HTML
syntax. Sotoki, for instance, has no newline after head, making the attachment impossible.
Note: this method is still somewhat fragile, as any HTML content with an extra attribute
on the `<head>` tag or without a `<head>` tag would break the taskbar and the block-external feature.
- `setBlockExternalLinks()` on server
- zero-dependency JS code
- JS script added in `inject_externallinks_blocker()`
- changed URL to `/catch/external?source=<source>`
In many use cases, it is undesirable for users to accidentally click on external links
and leave the served ZIM content.
This could be because the result is unpredictable (reader not implementing this properly),
or because the user running the server knows there's no backup internet connection, or because there is
an induced cost behind external links that doesn't affect served content.
Using a new flag (`blockExternalLinks`) on `Response`/`setTaskBar`, a piece of JS code
is injected into the taskbar code.
This code adds a JS handler on all link click events and verifies the destination.
If the destination appears to be an external link (1), the link target is changed to
a specific URL:
```
/external?source=<original_uri>
```
(1) an external link is one that's not on the same origin and starts with either `http:`, `https:` or `//`.
Server implements a new handler on `/external` that displays a new page (`captured_external.html`)
which returns a generic message explaining the situation and offering to click on the link
again should the user really want to.
This is done by specifically asking `set_taskbar` to not block external requests on that page.
This approach allows integrators using a reverse proxy to handle that endpoint differently (rebrand it)
1. `Server` now has an `m_blockExternalLinks` defaulting to `false`
1. `Server.setTaskbar` is extended to support an additional bool to set the variable.
1. `Response` now has an `m_blockExternalLinks`
1. `Response` constructor expects an additional bool for `blockExternalLinks`.
1. `Response.set_taskbar` is extended to support an additional bool to set the variable.
1. JNI/Java Wrapper reflects the extensions.
1. New resource file `templates/block_external.js` (included in head_part). Should it be in skin?
1. New resource file `templates/captured_external.html` for `handle_captured_external()`
1. Added a comment on `head_part.html` to help with JS insertion at the right place
1. `introduce_taskbar()` conditionally inserts the JS inside the taskbar
We must correctly quote paths containing spaces on Windows.
This is needed as we can't launch a command using an array of strings on
Windows, only by giving a single string using space as separator.
Fix kiwix/kiwix-desktop#268
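A simplified sketch of such quoting. `buildCommandLine()` is a
hypothetical helper that only wraps arguments containing a space in
double quotes and does not handle embedded quotes or trailing
backslashes:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: joins the arguments into a single command line,
// quoting every argument that contains a space.
std::string buildCommandLine(const std::vector<std::string>& args)
{
  std::string commandLine;
  for (const auto& arg : args) {
    if (!commandLine.empty()) {
      commandLine += ' ';
    }
    if (arg.find(' ') != std::string::npos) {
      commandLine += '"' + arg + '"';
    } else {
      commandLine += arg;
    }
  }
  return commandLine;
}
```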
No real change. Reordering the setting and dumping of attributes in
(mostly) the same order as they are declared in book.h makes it easier to
detect a missing attribute.
Downloader::startDownload has a new option parameter, which is a vector of
pairs representing the options that can be set when adding a URI with aria2
through the function Aria2::addUri.
Aria2::addUri uses this parameter to set the struct of parameters for the
aria2 command.
This mainly uses the "new memory system". No need to call the dispose function.
Rename the class to Library to conform with the naming semantics
(JNIKiwix* use old memory system)
The JNIKiwixManager is used to manage (insertion of book in) the library.
It is created, as needed, using an existing Library as input.
It is then used to add books, parse library.xml or opds content.
Then it can (and must) be destroyed with the `dispose` method.
```java
library = JNILibrary(...);
manager = JNIManager(library);
manager.parseOpds(opdscontent);
manager.dispose();
// library contains the books declared in the opds content.
// Use the library methods to get the books' info.
```
All paths must be UTF-8. This is already the case in all our projects.
(If this is not the case, it is a bug.)
So we don't need to have both a native and a UTF-8 version.
Set the different filter fields only when we are requested to filter
on them. Otherwise, we end up requesting that some fields be empty.
If the request has no such argument, an exception is raised (and caught)
and so we don't set the corresponding field in the filter.
Fix #303
The default value of this parameter is false, in which case all the
bookmarks are returned; otherwise only those related to books of the
library are returned.
API changes:
- removeLastPathElement no longer takes the extra arguments
`removePreSeparator` and `removePostSeparator`.
They are not needed as paths do not need a special trailing separator.
- Only one `split` function. Arguments can be implicitly converted to
string. No need for overloaded functions to explicitly cast them.
- The `split` function takes another argument, `trimEmpty`. If true, empty
elements are removed.
Path manipulation now mostly goes through a vector<string> storing each
part of the path.
Most of the complex work is now done in the normalizeParts function.
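A possible shape for such a `split` function is sketched below. The
exact signature and the default value of `trimEmpty` are assumptions:

```cpp
#include <string>
#include <vector>

// Sketch: splits str on any of the characters in delims. With
// trimEmpty set, empty parts (e.g. between two consecutive separators)
// are dropped.
std::vector<std::string> split(const std::string& str,
                               const std::string& delims,
                               bool trimEmpty = true)
{
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  while (true) {
    const auto pos = str.find_first_of(delims, start);
    const auto len = (pos == std::string::npos) ? pos : pos - start;
    const std::string part = str.substr(start, len);
    if (!trimEmpty || !part.empty()) {
      parts.push_back(part);
    }
    if (pos == std::string::npos) {
      break;
    }
    start = pos + 1;
  }
  return parts;
}
```

With `trimEmpty` set, a path such as `/a//b/` yields just the parts `a`
and `b`, which is convenient when the parts are later normalised and
joined again.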
There are two executable paths:
- The user one (the appimage path)
- The real one (in the appimage archive)
When we search for `library.xml` we need the user one.
But when we search for `aria2c` or `kiwix-serve` we need the real one.
Fix kiwix/kiwix-desktop#256
If kiwix-desktop uses a `library.xml` in the same directory as the
executable, we need to use it instead of the default one.
Instead of detecting again which `library.xml` to use, let `kiwix-desktop`
set the library to use.
This also fixes an issue where `/` is not a valid path separator on Windows.
AppImage works by decompressing the "program" in a temporary directory.
So the executable path is not the path of the AppImage file.
By using the environment variables set by appimage we can find the correct
"path" of the executable.
Fix kiwix/kiwix-desktop#46
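A minimal sketch of the lookup, assuming the `APPIMAGE` variable (set
by the AppImage runtime to the path of the AppImage file) is the one
consulted. The function name and the fallback parameter are
hypothetical:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the path of the AppImage file when
// running from an AppImage, otherwise the given fallback (e.g. the
// executable path reported by the OS).
std::string getUserExecutablePath(const std::string& fallback)
{
  const char* appImage = std::getenv("APPIMAGE");
  if (appImage != nullptr && *appImage != '\0') {
    return appImage;
  }
  return fallback;
}
```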
Android needs to handle the redirection by doing a redirection in the web
view, not by providing the content of the targeted article.
This is already what we do in kiwix-serve or on iOS.
The API would be far better if it returned an Entry but for now,
we just change the given url if the article is a redirection.
The server will be running some code on behalf of the calling code.
We really don't want to crash the library (and the binary) because
of a wrong request.
This code is mainly copied from kiwix-tools.
But:
- Move all the response handling into a new class Response.
- This Response class is responsible for handling all the MHD_response
configuration. This way the server handles a global object and makes
no calls to MHD_response*
- The server uses the mustache templating system a lot more.
There are still a few regex operations (because we need to
change already existing content).
- By default, the server serves the content using the id as name.
- The server creates a new Searcher per request. This way, we don't have
to protect the search for multi-threading and we can do several searches
at the same time.
- Search results are not cached; this will allow future improvements in the
search algorithm.
- The home page is not cached.
- A bit more verbose information (number of requests served, time spent
responding to a request).
TODO:
- Re-add interface selection.
- Do Android wrapper.
- Remove KiwixServer (which uses an external process).