Commit Graph

63 Commits

Author SHA1 Message Date
Veloman Yunkan 18871b4b15 Helper function Library::getBookPropValueSet()
Introduced a helper function `Library::getBookPropValueSet()` and
deduplicated Library::getBooks{Languages,Creators,Publishers}() methods.
2021-08-03 11:32:38 +02:00
Matthieu Gautier b70c92cade Move back used helper functions to the public API.
- Add docstring
- Move the declaration in kiwix namespace.
- Adapt our include to include the right headers.
2021-07-07 14:43:13 +02:00
Maneesh P M 940368b8ac Add m_archives and getArchiveById to Library
These members will mirror the functionality offered by equivalent usage
of Reader class.
2021-07-03 14:02:31 +05:30
Veloman Yunkan b259afa408 Library::getBooksCategories()
Note: no unit test added
2021-06-08 20:37:00 +04:00
Veloman Yunkan 41276341d0 Empty query acts as a match-all query
After switching to Xapian-based search in the library/catalog, an empty
query stopped acting as a match-all query. This commit restores the old
behaviour in that regard.
2021-05-09 15:14:43 +02:00
Veloman Yunkan 3879b82112 const-correct kiwix::Library
- Made most methods of kiwix::Library const.
- Also added const versions of getBookById() and getBookByPath()
  methods.
2021-04-28 11:42:55 +04:00
Veloman Yunkan 63e9a09259 Cleaned up/beautified Library::updateBookDB() 2021-04-27 16:59:21 +04:00
Veloman Yunkan 4178c169dd Xapian documents in book DB store only the book id 2021-04-27 16:59:21 +04:00
Veloman Yunkan f751aff2fb Full case/diacritics insensitivity in catalog filtering
Catalog filtering should now be case/diacritics insensitive for all
fields. However it is not validated for language, name and category
fields, and is validated for tags, creator & publisher only for text
supplied in the filter (but not for values read from the book).
2021-04-27 16:59:21 +04:00
Veloman Yunkan 87dc9d2723 Made catalog filtering by query diacritics insensitive
Catalog filtering by titles/description was sensitive to diacritics
present in the query string. Fixed that.

Also enhanced the unit test to validate the insensitivity to diacritics
present in either the title/description or the query string.
2021-04-27 16:59:21 +04:00
Veloman Yunkan 9c7366890d Catalog filtering by tags works via Xapian 2021-04-27 16:59:21 +04:00
Veloman Yunkan 19e195cb7d Filter::Tags typedef 2021-04-27 16:59:21 +04:00
Veloman Yunkan 3d5fd8f585 Catalog filtering by creator works via Xapian 2021-04-27 16:59:21 +04:00
Veloman Yunkan d3d5abe14d Handling of non-words in publisher query
This change fixes the failure of the LibraryTest.filterByPublisher
unit-test broken by the previous commit.

The previous approach used in `publisherQuery()` for building a phrase
query enforcing the specified prefix for all terms fails if

1. the input phrase contains a non-word term that Xapian's query parser
   doesn't like (e.g. a standalone ampersand character, 1/2, a#1, etc);
2. the input phrase contains at least three terms that Xapian's query
   parser has no issue with.

Using the `quest` tool (coming with xapian-tools under Ubuntu) the
issue can be demonstrated as follows:

```
$ quest -o phrase -d some_xapian_db "Energy & security"
Parsed Query: Query((energy@1 PHRASE 11 Zsecur@2))
Exactly 0 matches
MSet:

$ quest -o phrase -d some_xapian_db "Energy & security act"
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries

$ quest -o phrase -d some_xapian_db 'Energy 1/2 security act'
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries

$ quest -o phrase -d some_xapian_db "Energy a#1 security act"
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries
```

The problem comes from parsing the query with the default operation set
to `OP_PHRASE` (exemplified by the `-o phrase` option in above
invocations of `quest`). A workaround is to parse the phrase with a
default operation of `OP_OR` and then combine all the terms with
`OP_PHRASE`.

Besides stemming should be disabled in order to target an exact phrase
match (save for the non-word terms, if any, that are ignored by the
query parser).
2021-04-27 16:59:21 +04:00
Veloman Yunkan a759ab989f Catalog filtering by publisher works via Xapian 2021-04-27 16:59:21 +04:00
Veloman Yunkan 7ccd9ffcce Catalog filtering by language works via Xapian 2021-04-27 16:59:21 +04:00
Veloman Yunkan 0c0a37073b Catalog filtering by category works via Xapian 2021-04-27 16:59:21 +04:00
Veloman Yunkan 415c65cf03 Catalog filtering by book name works via Xapian 2021-04-27 16:59:21 +04:00
Veloman Yunkan 8287f351e7 Final logic of Library::filterViaBookDB()
Moved the `filter.hasQuery()` check inside `buildXapianQuery()`.
`Library::filterViaBookDB()` only cares if the query that is going to be
run on the book DB would match all documents. The rest of changes
related to enhancing the usage of Xapian for the catalog search will
happen inside `buildXapianQuery()` and `updateBookDB()`.
2021-04-27 16:59:21 +04:00
Veloman Yunkan ea779ac200 Extracted buildXapianQuery() 2021-04-27 16:59:21 +04:00
Veloman Yunkan 80cd1fc989 Renamed 2 functions in Filter and Library 2021-04-27 16:59:21 +04:00
Veloman Yunkan 2d76f8395e Dropped unused functions from Filter's private API
This should have been done back in PR #460
2021-04-27 16:59:21 +04:00
Veloman Yunkan ec9186b174 Library::removeBookById() updates the search DB
This fix makes the `XmlLibraryTest.removeBookByIdUpdatesTheSearchDB`
unit-test pass.
2021-04-09 17:06:45 +04:00
Veloman Yunkan aaaa5a637e Library::filter() doesn't create empty books
This changes how the `XmlLibraryTest.removeBookByIdUpdatesTheSearchDB`
unit-test fails.
2021-04-09 17:06:45 +04:00
Veloman Yunkan 24ed96a38c Library.removeBookById() drops the reader too
This fix makes the `XmlLibraryTest.removeBookByIdDropsTheReader`
unit-test pass.
2021-04-09 17:05:56 +04:00
Veloman Yunkan aa2a031ba4 Xapian headers are not exposed through libkiwix 2021-04-07 18:24:33 +04:00
Veloman Yunkan e214efecd4 Language code conversion via ICU
Language code is converted from ISO 639-3 to ISO 639 (which is
understood by Xapian) via ICU. The previous approach via an explicit
map had its advantages since Xapian has more than one stemmer
implementations for some languages (selectable via Xapian-specific
identifiers). This commit relies on the defaults associated with the
ISO 639 language codes.
2021-03-17 14:32:03 +01:00
Veloman Yunkan 09233bf4f3 Support for partial queries in catalog search
The search text in the catalog query is interpreted as partial by
default, but partial query mode can be disabled in C++. The latter
possibility is not exposed via the /catalog/search kiwix-serve endpoint,
though.
2021-03-17 14:32:03 +01:00
Veloman Yunkan a599fb3892 Initial version of Xapian-based catalog search 2021-03-17 14:32:03 +01:00
Veloman Yunkan a17fc0ef2d Library::getBooksByTitleOrDescription() 2021-03-17 14:32:03 +01:00
Veloman Yunkan db06b2c7ca Library::BookIdCollection typedef 2021-03-17 14:32:03 +01:00
Veloman Yunkan a20f9e2ce1 Library::filter() works in two stages
1. Get the subset of books matching the q (title/description) parameter
   of the search

2. Filter out books not matching the other parameters of the search.

Stage 1. currently works in the old way, but will be replaced by Xapian
based search in subsequent commits.
2021-03-17 14:32:03 +01:00
Veloman Yunkan e55bf514e8 Dedicated 'category' parameter in catalog search 2021-03-17 14:10:57 +04:00
Matthieu Gautier 46626a3f98 Add the method get bookByPath in library. 2020-03-06 12:08:05 +01:00
Matthieu Gautier f560a1f815 Be able to filter the books by name. 2020-01-30 19:02:33 +01:00
Matthieu Gautier 49aa0fbb9f Use a macro to write the filters. 2020-01-30 15:42:54 +01:00
luddens c9a15c9961 Add a parameter to getBookmarks fct to get valid bookmarks only
The default value of this parameter is false, in this case all the bookmarks
are returned, otherwise only those who are related to books of the library.
2019-10-31 14:05:21 +02:00
Matthieu Gautier 598dd3c175 [API Break] Fix pathTools (and a bit stringTools).
Api changes :
 - removeLastPathElement do not takes extra arguments
   `removePreSeparator` and `removePostSeparator`.
   This is not needed as path do not need special tailing separator.
 - Only one function `split`. Arguments can be implicitly convert to
   string. No need for overloading functions to explicitly cast them.
 - `split` function takes another argument `trimEmpty`. If true, empty
   element are removed.

Path manipulation now almost pass trough a vector<string> to store each
path's part.

Most of the complex works is now made in the normalizeParts function.
2019-09-19 18:16:06 +02:00
Matthieu Gautier ce8fff0b42 Make the library create the reader. 2019-08-11 10:19:48 +02:00
Matthieu Gautier 72223d69fe Fix include in pathTools.h 2019-08-10 11:02:23 +02:00
Matthieu Gautier 31c9375a3a Better API to filter books in a library.
Instead of having a single method `listBooksIds` that tries to be
exhaustive about all the filter and sort option, split the method in
two separated methods `filter` and `sort`.

The `filter` method takes a `Filter` object that represent on what we are
filtering. This object has to be construct before calling `filter`.

```cpp
Filter filter;
filter.query("Astring");
filter.acceptTags({"nopic"});
// return all book in eng and with "Astring" in the tile or description".
library.filter(filter);
//equivalent to
library.listBooksIds(ALL, UNSORTED, "Astring", "", "", "", {"nopic"});
// or better
library.filter(Filter().query("Astring").acceptTags({"nopic"}));
```

The method `listBooksIds` has been marked as deprecated.

Add a small test on the library.
2019-06-26 16:41:01 +02:00
Matthieu Gautier c6254d9504 Allow the library to be filtered by tags.
This add an argument to `listBooksIds` to filter by tags.
So, this is an API break.
2019-03-07 17:08:39 +01:00
Matthieu Gautier af7689e3e8 [API break] Move all the tools in the tools directory instead of common.
The `common` name is from the time where kiwix was only one repository
for all the project (android, desktop, server...).

Now we have split the repositories and kiwix-lib is the "common" repo,
the "common" directory is somehow nonsense.
2019-01-23 15:31:38 +01:00
Matthieu Gautier 12498e2cfe Add bookmarks support.
The library now contains (simple) methods to handle bookmarks.
The bookmark are stored in a separate xml file.

Bookmark are mainly a couple (`zimId`, `articleUrl`).
However, in the xml we store a bit more data :
- The article's title (for display)
- The book's title, lang and date (for potential update of zim files)
2018-12-02 15:47:29 +01:00
Matthieu Gautier b5ce60a627 Move the dump of the library into library.xml in a specific class.
The same way the dump into a opds feed is in a specific class.
2018-11-28 12:09:28 +01:00
Matthieu Gautier cf1cfe774e Correctly check for ArticleCount and MediaCount before writing them. 2018-11-12 10:58:10 +01:00
Matthieu Gautier 2682fa8f9c Remove unecessary variable or output. 2018-10-26 14:19:10 +02:00
Matthieu Gautier b1508c0b98 Better listBooksIds supported mode.
Only have REMOTE or LOCAL is a bit restrictive. By using flags a user
can specify for complex request.
2018-10-24 11:50:11 +02:00
Matthieu Gautier a73ef23f6e Keep the book size in byte in memory (instead of in kb)
We keep the size in kb in library.xml for compatibility.
2018-10-24 10:47:12 +02:00
Matthieu Gautier fe6d5fa93e Store the downloadId in the book (and in the library). 2018-10-24 10:47:12 +02:00