Hello. I need to process some HTML on Linux, but I've run into diacritics that I can't deal with. How can I easily strip the diacritics from characters? E.g. turn the word "čeština" into "cestina".
amd64 x86_64 AMD Athlon(tm) 64 Processor 3000+ GNU/Linux
The most fun computer tasks are usually the ones that are completely useless in practice.
I don't know about C, but in PHP I saw it done in some script roughly like this:
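You simply list the characters that carry diacritics, list their equivalents without diacritics next to them, and swap one for the other. Code:

    // $s_diakritikou = text with diacritics, $bez_diakritiky = text without them.
    // The string form of strtr() maps byte to byte, so this only works when the script
    // and the input share a single-byte encoding (cp1250, ISO-8859-2); for UTF-8 use
    // the array form, strtr($s_diakritikou, array("á" => "a", ...)).
    $bez_diakritiky = StrTr($s_diakritikou, "áäčďéěëíňóöřšťúůüýžÁÄČĎÉĚËÍŇÓÖŘŠŤÚŮÜÝŽ", "aacdeeeinoorstuuuyzAACDEEEINOORSTUUUYZ");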
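The same two-list idea can be sketched in C for UTF-8 input. In UTF-8 each of these letters takes two bytes, so the lists have to be compared as strings rather than as single bytes. A minimal sketch, assuming both the source file and the input are UTF-8; `strip_diacritics`, `with_marks` and `without_marks` are hypothetical names, and the lists cover only the letters from the PHP example:

```c
#include <string.h>

/* Letters with diacritics (UTF-8 strings), in the same order as the PHP example. */
static const char *const with_marks[] = {
    "á", "ä", "č", "ď", "é", "ě", "ë", "í", "ň", "ó", "ö", "ř", "š", "ť",
    "ú", "ů", "ü", "ý", "ž",
    "Á", "Ä", "Č", "Ď", "É", "Ě", "Ë", "Í", "Ň", "Ó", "Ö", "Ř", "Š", "Ť",
    "Ú", "Ů", "Ü", "Ý", "Ž"
};
/* Plain letter for each entry above (same index). */
static const char without_marks[] =
    "aacdeeeinoorstuuuyz"
    "AACDEEEINOORSTUUUYZ";

/* Copy `in` to `out`, replacing every listed letter with its plain form.
   `out` must hold at least strlen(in) + 1 bytes; the output never grows. */
void strip_diacritics(const char *in, char *out)
{
    const size_t n = sizeof with_marks / sizeof with_marks[0];

    while (*in) {
        size_t i, len = 0;
        for (i = 0; i < n; i++) {
            len = strlen(with_marks[i]);
            if (strncmp(in, with_marks[i], len) == 0)
                break;
        }
        if (i < n) {                 /* found: emit the plain letter, skip the whole sequence */
            *out++ = without_marks[i];
            in += len;
        } else {                     /* anything else is copied byte by byte */
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Because no UTF-8 sequence is a prefix of another, the first match found is always the right one, so the order of the lists does not matter.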
...everything can be done somehow; the question is how...
(In reply to Gappa) PHP is no good to me :]. My function looks like this:
Code:

    #include <stdio.h>

    /* Strips diacritics from a cp1250-encoded file (cp1250 sucks).
       cp1250 chart: http://www.columbia.edu/kermit/cp1250.html */
    int DiakritikaPryc(const char *Stranka, const char *StrankaBezDiakritiky)
    {
        /* ASCII replacement for every cp1250 byte 0x80..0xFF (index = byte - 128);
           a space stands for "no sensible ASCII equivalent" */
        static const char mapa[] =
            "E         S<STZZ"    /* 0x80: € ‚ „ … † ‡ ‰ Š ‹ Ś Ť Ž Ź */
            "  '\"\"     s>stzz"  /* 0x90: ‘ ’ “ ” • dashes ™ š › ś ť ž ź */
            "   L A    S    Z"    /* 0xA0: NBSP ˇ ˘ Ł ¤ Ą ¦ § ¨ © Ş « ¬ SHY ® Ż */
            "   l     as L lz"    /* 0xB0: ° ± ˛ ł ´ µ ¶ · ¸ ą ş » Ľ ˝ ľ ż */
            "RAAAALCCCEEEEIID"    /* 0xC0: Ŕ Á Â Ă Ä Ĺ Ć Ç Č É Ę Ë Ě Í Î Ď */
            "DNNOOOO*RUUUUYT "    /* 0xD0: Đ Ń Ň Ó Ô Ő Ö × Ř Ů Ú Ű Ü Ý Ţ ß */
            "raaaalccceeeeiid"    /* 0xE0: ŕ á â ă ä ĺ ć ç č é ę ë ě í î ď */
            "dnnoooo ruuuuyt ";   /* 0xF0: đ ń ň ó ô ő ö ÷ ř ů ú ű ü ý ţ ˙ */
        FILE *fd, *fds;
        int c;                    /* int, not unsigned char, so EOF stays distinguishable */

        fd = fopen(Stranka, "rb");
        if (fd == NULL)
            return -1;
        fds = fopen(StrankaBezDiakritiky, "wb");
        if (fds == NULL) {
            fclose(fd);
            return -1;
        }
        while ((c = fgetc(fd)) != EOF) {
            if (c >= 128)
                c = (unsigned char)mapa[c - 128];   /* direct lookup, no need to scan the table */
            fputc(c, fds);                          /* written byte by byte, no buffer sized in advance */
        }
        fclose(fds);
        fclose(fd);
        return 0;
    }
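The cp1250 table in the function above is only valid for cp1250. If the pages are ISO-8859-2 (the usual 8-bit Czech encoding on Linux), bytes 0xC0-0xFF hold the same letters as in cp1250, but Š, Ť, Ž, š, ť, ž and a few others sit in 0xA0-0xBF, and 0x80-0x9F are control codes. In cp1250 the byte 0xB9 is ą, so the cp1250 table turns an ISO-8859-2 "š" into "a". A minimal in-memory sketch for ISO-8859-2, assuming the input really is ISO-8859-2; `strip_latin2` and `latin2_map` are hypothetical names:

```c
/* ASCII replacement for ISO-8859-2 bytes 0xA0..0xFF (index = byte - 0xA0).
   A space means "no sensible ASCII letter". */
static const char latin2_map[] =
    " A L LS  SSTZ ZZ"    /* 0xA0: NBSP Ą ˘ Ł ¤ Ľ Ś § ¨ Š Ş Ť Ź SHY Ž Ż */
    " a l ls  sstz zz"    /* 0xB0: ° ą ˛ ł ´ ľ ś ˇ ¸ š ş ť ź ˝ ž ż */
    "RAAAALCCCEEEEIID"    /* 0xC0: same letters as cp1250 from here on */
    "DNNOOOO*RUUUUYT "    /* 0xD0 */
    "raaaalccceeeeiid"    /* 0xE0 */
    "dnnoooo ruuuuyt ";   /* 0xF0 */

/* Strip diacritics in place. Bytes below 0x80 are left alone,
   C1 control bytes 0x80..0x9F become spaces. */
void strip_latin2(char *s)
{
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (c >= 0xA0)
            *s = latin2_map[c - 0xA0];
        else if (c >= 0x80)
            *s = ' ';
    }
}
```

Only the 0xA0-0xBF rows differ from the cp1250 table, so one function per encoding is enough; which one to call depends on the charset the page declares.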
And C is no good to me, because I can't program in it )))))
If you want it in Perl (or just want to pull the tables out of it), look for cstocs... Or, I think, the tables should also be in the MySQL sources...
Hrrrr, will you stop using people as human-driven search engines? Google.com has all the answers you need.
In my program I use this to remove diacritics (in Unicode, i.e. UTF-8):
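Code:

    #include <string>
    using std::string;

    // UTF-8 byte pairs of the accented letters, written as signed char values
    // (e.g. -60, -116 = 0xC4 0x8C = Č)
    int arrayutf[96] = {
        -61, -127, -60, -116, -60, -114, -61, -119, -60, -102, -61, -115, -60, -67, -59, -121, // Á Č Ď É Ě Í Ľ Ň
        -61, -109, -59, -104, -59, -96, -59, -92, -61, -102, -59, -82, -61, -99, -59, -67,     // Ó Ř Š Ť Ú Ů Ý Ž
        -61, -95, -60, -115, -60, -113, -61, -87, -60, -101, -61, -83, -60, -66, -59, -120,    // á č ď é ě í ľ ň
        -61, -77, -59, -103, -59, -95, -59, -91, -61, -70, -59, -81, -61, -67, -59, -66,       // ó ř š ť ú ů ý ž
        -61, -124, -61, -117, -61, -106, -61, -100, -61, -92, -61, -85, -61, -74, -61, -68,    // Ä Ë Ö Ü ä ë ö ü
        -61, -76, -61, -108, -60, -71, -60, -70, -60, -67, -60, -66, -59, -108, -59, -107};    // ô Ô Ĺ ĺ Ľ ľ Ŕ ŕ

    // ASCII code of the plain letter for each pair above
    int arraywin[48] = {
        65, 67, 68, 69, 69, 73, 76, 78, 79, 82, 83, 84, 85, 85, 89, 90,
        97, 99, 100, 101, 101, 105, 108, 110, 111, 114, 115, 116, 117, 117, 121, 122,
        65, 69, 79, 85, 97, 101, 111, 117, 111, 79, 76, 108, 76, 108, 82, 114};

    // Util is a class of the poster's own program
    string Util::disableCzChars(string message)
    {
        string s = "";
        for (unsigned int j = 0; j < message.length(); j++) {
            int zn = (signed char)message[j];    // signed even where plain char is unsigned (e.g. ARM)
            int zzz = -1;                        // index of the matching pair, -1 = none
            if (j + 1 < message.length()) {      // a pair needs a second byte
                int zn2 = (signed char)message[j + 1];
                for (int l = 0; l < 96; l += 2) {
                    if (zn == arrayutf[l] && zn2 == arrayutf[l + 1]) {
                        zzz = l / 2;
                        break;
                    }
                }
            }
            if (zzz >= 0) {
                s += (char)arraywin[zzz];        // two-byte letter -> one ASCII letter
                j++;                             // skip the second byte of the pair
            } else {
                s += message[j];                 // everything else is copied unchanged
            }
        }
        return s;
    }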
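Every letter in the arrayutf table is a two-byte UTF-8 sequence (lead byte 0xC3-0xC5, i.e. -61 to -59 as signed char). Instead of comparing raw byte pairs, one can decode each pair into its Unicode code point, ((b0 & 0x1F) << 6) | (b1 & 0x3F), and look that up in a sorted table with bsearch(). A minimal C sketch, assuming well-formed UTF-8 input; `strip_utf8`, `cz_map` and `cz_table` are hypothetical names, and the table covers the same letters as arrayutf:

```c
#include <stdlib.h>

/* Code point -> plain ASCII letter, sorted by code point for bsearch(). */
struct cz_map { unsigned cp; char ascii; };

static const struct cz_map cz_table[] = {
    {0x00C1,'A'}, {0x00C4,'A'}, {0x00C9,'E'}, {0x00CB,'E'}, {0x00CD,'I'},
    {0x00D3,'O'}, {0x00D4,'O'}, {0x00D6,'O'}, {0x00DA,'U'}, {0x00DC,'U'},
    {0x00DD,'Y'}, {0x00E1,'a'}, {0x00E4,'a'}, {0x00E9,'e'}, {0x00EB,'e'},
    {0x00ED,'i'}, {0x00F3,'o'}, {0x00F4,'o'}, {0x00F6,'o'}, {0x00FA,'u'},
    {0x00FC,'u'}, {0x00FD,'y'}, {0x010C,'C'}, {0x010D,'c'}, {0x010E,'D'},
    {0x010F,'d'}, {0x011A,'E'}, {0x011B,'e'}, {0x0139,'L'}, {0x013A,'l'},
    {0x013D,'L'}, {0x013E,'l'}, {0x0147,'N'}, {0x0148,'n'}, {0x0154,'R'},
    {0x0155,'r'}, {0x0158,'R'}, {0x0159,'r'}, {0x0160,'S'}, {0x0161,'s'},
    {0x0164,'T'}, {0x0165,'t'}, {0x016E,'U'}, {0x016F,'u'}, {0x017D,'Z'},
    {0x017E,'z'}
};

static int cmp_cp(const void *key, const void *elem)
{
    unsigned k = *(const unsigned *)key;
    unsigned e = ((const struct cz_map *)elem)->cp;
    return (k > e) - (k < e);
}

/* Copy UTF-8 `in` to `out`, replacing listed two-byte letters with ASCII.
   The output is never longer than the input. */
void strip_utf8(const char *in, char *out)
{
    const unsigned char *p = (const unsigned char *)in;

    while (*p) {
        /* two-byte sequence: 110xxxxx 10xxxxxx (p[1] == 0 never matches) */
        if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            unsigned cp = ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);
            const struct cz_map *m = bsearch(&cp, cz_table,
                sizeof cz_table / sizeof cz_table[0], sizeof cz_table[0], cmp_cp);
            if (m != NULL) {
                *out++ = m->ascii;
                p += 2;
                continue;
            }
        }
        *out++ = (char)*p++;   /* unlisted bytes are copied unchanged */
    }
    *out = '\0';
}
```

The lookup is logarithmic instead of a linear scan, and the table reads as code points, which makes it easy to check against a Unicode chart.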
EC5410 + Chill 400w, AMD Athlon XP 2500+@3200+, AC Copper Silent 2 (rev2), Out: AC Fan Pro TC, DFI NFII Ultra-AL nForce2 Ultra 400, 2x256MB+512MB DDR 333 CL2.5 Dual Channel, inno3D GeForce4 Ti4200-8x 128MB + AC Fan Pro TC, Seagate ST3160023A 160GB, Seagate ST360021A 60GB, Teac CD-W552E, LG SuperMulti GSA-4160B, SB Live! 5.1 Player + Audigy MOD, Windows XP Pro SP2, Karneval TURBO 2000/300
Ahem, so you want to rewrite the encoding tables of every possible and impossible code page? There is 100% a ready-made library for that, but if you absolutely insist on doing it yourself, don't forget to clip the resulting range to characters 0-127 (or even further, to alphanumerics only).
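Clipping to characters 0-127, as suggested above, works as a safety net after any of the tables in this thread: whatever a table missed simply disappears from the output. A minimal sketch; `clip_ascii` and its `alnum_only` flag are hypothetical, and dropping the bytes (rather than replacing them with, say, '?') is an assumption:

```c
#include <ctype.h>
#include <stddef.h>

/* Copy `in` to `out`, keeping only bytes 0..127; with alnum_only set, keep
   only ASCII letters and digits. Returns the length of the result. */
size_t clip_ascii(const char *in, char *out, int alnum_only)
{
    size_t n = 0;

    for (; *in; in++) {
        unsigned char c = (unsigned char)*in;
        if (c >= 128)
            continue;                   /* outside 0..127: drop */
        if (alnum_only && !isalnum(c))
            continue;                   /* punctuation, spaces, tags: drop */
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

Note that the alphanumeric-only mode also removes spaces and HTML markup, so it suits things like identifiers or file names rather than whole pages.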
IMPROBE AMOR, QUID NON MORTALIA PECTORA COGIS - cruel love, to what lengths do you not drive mortal hearts -- Vergilius
There are many remedies that cure love, but none of them is reliable.
What others have done to us we somehow come to terms with. What we have done to ourselves is harder. -- Francois La Rochefoucauld
Offering friendship to someone who wants love is like giving bread to someone dying of thirst.